305 (IX.11)
Obstructions

A. History

The theory of obstructions aims at measuring the extensibility of mappings by means of algebraic tools. Such classical results as the Brouwer mapping theorem and Hopf's extension and classification theorems in homotopy theory might be regarded as the origins of this theory. A systematic study of the theory was initiated by S. Eilenberg [1] in connection with the notions of homotopy and cohomology groups, which were introduced at the same time. A. Komatu and P. Olum [2] extended the theory to mappings into spaces not necessarily $n$-simple. For mappings of polyhedra into certain special spaces, the homotopy classification problem, closely related to the theory of obstructions, was solved in the following cases ($K^m$ denotes an $m$-dimensional polyhedron): $K^{n+1} \to S^n$ (N. Steenrod [5]), $K^{n+2} \to S^n$ (J. Adem), $K^{n+k} \to Y$, where $\pi_i(Y) = 0$ for $i < n$ and $n < i < n+k$ (M. Nakaoka). There are similar results by L. S. Pontryagin, M. Postnikov, and S. Eilenberg and S. MacLane. Except for the special cases already noted, it is extremely difficult to discuss higher obstructions in general since they involve many complexities. Nevertheless, it is significant that the idea of obstructions has given rise to various important notions in modern algebraic topology, including cohomology operations (→ 64 Cohomology Operations) and characteristic classes (→ 56 Characteristic Classes).

The notion of obstruction is also very useful in the treatment of cross sections of fiber bundles (→ 147 Fiber Bundles), diffeomorphisms of differentiable manifolds, etc.

B. General Theory for an n-Simple Space Y

The question of whether two (continuous) mappings of a topological space $X$ into another space $Y$ are homotopic to each other can be reduced to the extensibility of the given mapping $(X \times \{0\}) \cup (X \times \{1\}) \to Y$ to a mapping of the product $X \times I$ of $X$ and the unit interval $I = [0, 1]$ into $Y$. Therefore the problem of classifying mappings can be treated in the same way as that of the extension of mappings.

Let $K$ be a polyhedron, $L$ a subpolyhedron of $K$, and $\bar K^n = L \cup K^n$ the union of $L$ and the $n$-skeleton $K^n$ of $K$. Let $Y$ be an arcwise connected $n$-simple space, and $f'$ be a mapping of $L$ into $Y$. Denote by $\Phi^n(f')$ the set of mappings of $\bar K^n$ into $Y$ that are extensions of $f'$, and by $\bar\Phi^n(f')$ the set of homotopy classes of mappings in $\Phi^n(f')$ relative to $L$. The set $\bar\Phi^0(f')$ consists of a single element because of the arcwise connectedness of $Y$, $\bar\Phi^1(f')$ is nonempty, and $\bar\Phi^n(f')$ ($n \ge 2$) may be empty. Let $f^n$ be an element of $\Phi^n(f')$. If we consider the restriction of $f^n$ to the boundary $\dot\sigma^{n+1}$ of an oriented $(n+1)$-cell $\sigma^{n+1}$ of $K$, then $f^n : \dot\sigma^{n+1} \to Y$ determines an element $c(f^n, \sigma^{n+1})$ of the homotopy group $\pi_n(Y)$ (→ 202 Homotopy Theory). This element gives a measure of obstruction for extending $f^n$ to the interior of $\sigma^{n+1}$. We obtain an $(n+1)$-cocycle $c^{n+1}(f^n)$ of the simplicial pair $(K, L)$ with coefficients in $\pi_n(Y)$, called the obstruction cocycle of $f^n$, by assigning $c(f^n, \sigma^{n+1})$ to each $(n+1)$-cell $\sigma^{n+1}$. This obstruction cocycle $c^{n+1}(f^n)$ is the measure of obstruction for extending $f^n$ to $\bar K^{n+1}$. A necessary and sufficient condition for the extensibility is given by $c^{n+1}(f^n) = 0$. Clearly, $c^{n+1}(f^n)$ is uniquely determined for each element $\bar f^n$ of $\bar\Phi^n(f')$. The set of all $c^{n+1}(f^n)$ with $f^n \in \Phi^n(f')$ forms a subset $o^{n+1}(f')$ of the group of cocycles $Z^{n+1}(K, L; \pi_n(Y))$. $\Phi^{n+1}(f')$ is nonempty if and only if $o^{n+1}(f')$ contains the zero element 0.

Let $K^\square = K \times I$, $L^\square = (K \times 0) \cup (L \times I) \cup (K \times 1)$. Given two mappings $f_0, f_1 : K \to Y$ satisfying $f_0|L = f_1|L$, we can define a natural mapping $F' : L^\square \to Y$ such that an element $F^n$ of $\Phi^n(F')$ corresponds to a homotopy $h^{n-1}$ relative to $L$ connecting $f_0|\bar K^{n-1}$ with $f_1|\bar K^{n-1}$. Given an element $F^n \in \Phi^n(F')$, we have the element $c^{n+1}(F^n)$ of $Z^{n+1}(K^\square, L^\square; \pi_n(Y))$, which we identify with $Z^n(K, L; \pi_n(Y))$ through the natural isomorphism of chain groups of the pair $(K^\square, L^\square)$ to those of the pair $(K, L)$. Thus we can regard $c^{n+1}(F^n)$ as an element of $Z^n(K, L; \pi_n(Y))$, which is denoted by $d^n(f_0, h^{n-1}, f_1)$, and call it the separation (or difference) cocycle. If $f_0|\bar K^{n-1} = f_1|\bar K^{n-1}$, we have the canonical mapping $F^n : L^\square \cup (K^\square)^n \to Y$, and the separation cocycle is denoted simply by $d^n(f_0, f_1)$. The set of separation cocycles corresponding to elements of $\Phi^n(F')$ is considered to be a subset of $Z^n(K, L; \pi_n(Y))$ and is denoted by $o^n(f_0, f_1)$. A necessary and sufficient condition for $h^{n-1}$ to be extensible to a homotopy on $\bar K^n$ is $d^n(f_0, h^{n-1}, f_1) = 0$. Therefore a necessary and sufficient condition for $f_0|\bar K^n \simeq f_1|\bar K^n$ (rel $L$) (i.e., relative to $L$) is $0 \in o^n(f_0, f_1)$. Given $f_0^n, f_1^n : \bar K^n \to Y$ with $f_0^n|L = f_1^n|L$, then $d^n(f_0^n, h^{n-1}, f_1^n)$ ($\in o^n(f_0^n, f_1^n)$) is an element of $Z^n(\bar K^n, L; \pi_n(Y))$, which is also considered to be a cochain of the pair $(K, L)$. In this sense, we call $d^n(f_0^n, h^{n-1}, f_1^n)$ the separation (or deformation) cochain over $(K, L)$. The coboundary of the separation cochain $d^n(f_0^n, h^{n-1}, f_1^n)$ coincides (except possibly for sign) with $c^{n+1}(f_0^n) - c^{n+1}(f_1^n)$.

For a fixed $f_0^n \in \Phi^n(f')$, any $n$-cochain $d^n$
of the pair $(K, L)$ with coefficients in $\pi_n(Y)$ is expressible as a separation cochain $d^n = d^n(f_0^n, f_1^n)$, where $f_1^n \in \Phi^n(f')$ is a suitable mapping such that $f_0^n|\bar K^{n-1} = f_1^n|\bar K^{n-1}$ (existence theorem).

Therefore if we take an element $f^{n-1}$ of $\Phi^{n-1}(f')$ whose obstruction cocycle $c^n(f^{n-1})$ is zero, the set of all obstruction cocycles $c^{n+1}(f^n)$ of all such $f^n \in \Phi^n(f')$ that are extensions of $f^{n-1}$ forms a subset of $o^{n+1}(f')$ and coincides with a coset of $Z^{n+1}(K, L; \pi_n(Y))$ factored by $B^{n+1}(K, L; \pi_n(Y))$. Thus a cohomology class $\bar c^{n+1}(f^{n-1}) \in H^{n+1}(K, L; \pi_n(Y))$ corresponds to an $f^{n-1} \in \Phi^{n-1}(f')$ such that $c^n(f^{n-1}) = 0$, and $\bar c^{n+1}(f^{n-1}) = 0$ is a necessary and sufficient condition for $f^{n-1}$ to be extensible to $\bar K^{n+1}$ (first extension theorem).

For the separation cocycle, $\bar d^n(f_0, h^{n-2}, f_1) \in H^n(K, L; \pi_n(Y))$ corresponds to each homotopy $h^{n-2}$ on $\bar K^{n-2}$ such that $d^{n-1}(f_0, h^{n-2}, f_1) = 0$, and $\bar d^n(f_0, h^{n-2}, f_1) = 0$ is a necessary and sufficient condition for $h^{n-2}$ to be extensible to a homotopy on $\bar K^n$ (first homotopy theorem).

The subset of $H^{n+1}(K, L; \pi_n(Y))$ corresponding to $o^{n+1}(f')$ is denoted by $O^{n+1}(f')$ and is called the obstruction to an $(n+1)$-dimensional extension of $f'$. Similarly, the subset $O^n(f_0, f_1)$ of $H^n(K, L; \pi_n(Y))$ corresponding to $o^n(f_0, f_1)$ is called the obstruction to an $n$-dimensional homotopy connecting $f_0$ with $f_1$. Clearly, a necessary and sufficient condition for $f'$ to be extensible to $\bar K^{n+1}$ is given by $0 \in O^{n+1}(f')$, and a necessary and sufficient condition for $f_0|\bar K^n \simeq f_1|\bar K^n$ (rel $L$) is given by $0 \in O^n(f_0, f_1)$.

A continuous mapping $\varphi : (K', L') \to (K, L)$ induces homomorphisms of cohomology groups $\varphi^* : H^{n+1}(K, L; \pi_n(Y)) \to H^{n+1}(K', L'; \pi_n(Y))$, $H^n(K, L; \pi_n(Y)) \to H^n(K', L'; \pi_n(Y))$. Then for $f' : L \to Y$, $O^{n+1}(f' \circ \varphi) \supset \varphi^* O^{n+1}(f')$, and for $f_0, f_1 : K \to Y$ such that $f_0|L = f_1|L$, $O^n(f_0 \circ \varphi, f_1 \circ \varphi) \supset \varphi^* O^n(f_0, f_1)$. Therefore we also find that the obstruction to an extension and the obstruction to a homotopy are independent of the choice of subdivisions of $K$, $L$, and consequently are topological invariants.

Let $f_0$, $f_1$, and $f_2$ be mappings $K \to Y$ such that $f_0|L = f_1|L = f_2|L$. Given homotopies $h_{01}^{n-1} : f_0|\bar K^{n-1} \simeq f_1|\bar K^{n-1}$ (rel $L$), $h_{12}^{n-1} : f_1|\bar K^{n-1} \simeq f_2|\bar K^{n-1}$ (rel $L$), then for the composite $h_{02}^{n-1} = h_{12}^{n-1} \circ h_{01}^{n-1}$, we have

$$d^n(f_0, h_{02}^{n-1}, f_2) = d^n(f_0, h_{01}^{n-1}, f_1) + d^n(f_1, h_{12}^{n-1}, f_2),$$

and for the inverse homotopy $h_{10}^{n-1} : f_1|\bar K^{n-1} \simeq f_0|\bar K^{n-1}$ of $h_{01}^{n-1}$, clearly

$$d^n(f_1, h_{10}^{n-1}, f_0) = -d^n(f_0, h_{01}^{n-1}, f_1).$$

Therefore $O^n(f_0, f_0)$ forms a subgroup of $H^n(K, L; \pi_n(Y))$ that is determined by the homotopy class of $f_0|\bar K^{n-1}$ relative to $L$. In general, if $O^n(f_0, f_1)$ is nonempty, it is a coset of $H^n(K, L; \pi_n(Y))$ factored by the subgroup $O^n(f_0, f_0)$. Combined with the existence theorem on separation cochains, this can be utilized to show the following theorem.

Assume that $\Phi^n(f')$ is nonempty. The set of all elements of $\bar\Phi^n(f')$ that are extensions of an element of $\bar\Phi^{n-1}(f')$ is put in one-to-one correspondence with the quotient group of $H^n(\bar K^n, L; \pi_n(Y))$ modulo $O^n(f_0^n, f_0^n)$ by pairing the obstruction $O^n(f_0^n, f^n)$ with each $f^n$ for a fixed $f_0^n$. Among such elements of $\bar\Phi^n(f')$, the set of $f^n$ that are extensible to $\bar K^{n+1}$ is in one-to-one correspondence with the quotient group of $H^n(\bar K^{n+1}, L; \pi_n(Y)) = H^n(K, L; \pi_n(Y))$ modulo the subgroup $O^n(f_0^{n+1}, f_0^{n+1})$, assuming that $f_0^n$ is extended to $f_0^{n+1}$ (first classification theorem).

C. Primary Obstructions

Assume that $H^{i+1}(K, L; \pi_i(Y)) = H^i(K, L; \pi_i(Y)) = 0$, where $0 < i < p$ (e.g., $\pi_i(Y) = 0$, $0 < i < p$). In this case, by consecutive use of the first extension theorem and the first homotopy theorem, we can show that each $\bar\Phi^i(f')$ ($i < p$) consists of a single element and $O^{p+1}(f')$ also consists of a single element $\bar c^{p+1}(f') \in H^{p+1}(K, L; \pi_p(Y))$. The element $\bar c^{p+1}(f')$, called the primary obstruction of $f'$, vanishes if and only if $f'$ can be extended to $\bar K^{p+1}$ (second extension theorem). When $H^{i+1}(K, L; \pi_i(Y)) = 0$ for $i > p$ (for example, when $\pi_i(Y) = 0$ for $p < i < \dim(K - L)$), $f'$ is extendable to $K$ if and only if the primary obstruction of $f'$ vanishes (third extension theorem).

Correspondingly, if $H^i(K, L; \pi_i(Y)) = H^{i-1}(K, L; \pi_i(Y)) = 0$ ($0 < i < p$), then for any two mappings $f_0, f_1 : K \to Y$, $f_0|L = f_1|L$, $O^p(f_0, f_1)$ consists of a single element $\bar d^p(f_0, f_1) \in H^p(K, L; \pi_p(Y))$, which we call the primary difference of $f_0$ and $f_1$. This element vanishes if and only if $f_0|\bar K^p \simeq f_1|\bar K^p$ (rel $L$) (second homotopy theorem). Moreover, when $H^i(K, L; \pi_i(Y)) = 0$ ($i > p$), the primary difference is zero if and only if $f_0 \simeq f_1$ (rel $L$) (third homotopy theorem).

Assume that the hypotheses of the second extension theorem and second homotopy theorem are satisfied. If we assign to each element $f^p$ of $\bar\Phi^p(f')$ the primary difference of $f^p$ and the fixed element $f_0^p$, then $\bar\Phi^p(f')$ is in one-to-one correspondence with $H^p(\bar K^p, L; \pi_p(Y))$ by the first classification theorem (second classification theorem). Similarly, assume that the hypotheses of the third extension theorem and third homotopy theorem are satisfied. If $f_0 : K \to Y$, $f' = f_0|L$, then homotopy classes relative to $L$ of extensions $f$ of $f'$ are put in one-to-one correspondence with the
elements of $H^p(K, L; \pi_p(Y))$ by pairing $\bar d^p(f, f_0)$ with $f$ (third classification theorem).

D. Secondary Obstructions

For simplicity, assume that $\pi_i(Y) = 0$ ($i < p$ and $p < i < q$). If the primary obstruction $\bar c^{p+1}(f') \in H^{p+1}(K, L; \pi_p(Y))$ of $f' : L \to Y$ vanishes, we can define $O^{q+1}(f') \subset H^{q+1}(K, L; \pi_q(Y))$, which we call the secondary obstruction of $f'$. When $Y = S^p$, $q = p + 1$, $p > 2$, the secondary obstruction $O^{p+2}(f')$ coincides with a coset of $H^{p+2}(K, L; \mathbf{Z}_2)$ modulo the subgroup $Sq^2(H^p(K, L; \mathbf{Z}))$, where $Sq^2$ denotes the Steenrod square operation [5]. In this case, if $L = K^p$, then $O^{p+2}(f')$ reduces to a cohomology class, $Sq^2 (i^*)^{-1} f'^*(\sigma)$ with $i : L \to K$, where $\sigma$ is a generator of $H^p(S^p, \mathbf{Z})$ (in this case $(i^*)^{-1} f'^*(\sigma) \neq \emptyset$ is equivalent to $\bar c^{p+1}(f') = 0$) [5]. Moreover, if $Sq^2 f'^*(\sigma) = 0$, then there exists a suitable extension $f^{p+2} : \bar K^{p+2} \to Y = S^p$ of $f'$. The set of obstruction cocycles of all such $f^{p+2}$ defines the tertiary obstruction $O^{p+3}(f')$, which coincides with a coset of $H^{p+3}(K, L; \mathbf{Z}_2)$ modulo the subgroup $Sq^2(H^{p+1}(K, L; \mathbf{Z}_2))$. By using the secondary cohomology operation $\Phi$ of J. Adem, it can be expressed as $\Phi((i^*)^{-1} f'^*(\sigma))$ (→ 64 Cohomology Operations).

All the propositions in this article remain true if we take CW complexes instead of polyhedra $K$.

References

[1] S. Eilenberg, Cohomology and continuous mappings, Ann. Math., (2) 41 (1940), 231-251.
[2] P. Olum, Obstructions to extensions and homotopies, Ann. Math., (2) 52 (1950), 1-50.
[3] P. Olum, On mappings into spaces in which certain homotopy groups vanish, Ann. Math., (2) 57 (1953), 561-574.
[4] N. E. Steenrod, The topology of fibre bundles, Princeton Univ. Press, 1951.
[5] N. E. Steenrod, Products of cocycles and extensions of mappings, Ann. Math., (2) 48 (1947), 290-320.
[6] E. H. Spanier, Algebraic topology, McGraw-Hill, 1966.

306 (XII.20)
Operational Calculus

A. General Remarks

The term "operational calculus" in the usual sense means a method for solving linear differential equations by reducing the operations of differentiation and integration into algebraic ones in a symbolic manner. The idea was initiated by P. S. Laplace in his Théorie analytique des probabilités (1812), but the method has acquired popularity since O. Heaviside used it systematically in the late 19th century to solve electric-circuit problems. The method is therefore also called Heaviside calculus, but Heaviside gave only a formal method of calculus without bothering with rigorous arguments. The mathematical foundations were given in later years, first in terms of Laplace transforms, then by applying the theory of distributions. One of the motivations behind L. Schwartz's creation of this latter theory in the 1940s was to give a sound foundation for the formal method, but the theory obtained has had a much larger range of applications. Schwartz's theory was based on the newly developed theory of topological linear spaces. On the other hand, J. Mikusiński gave another foundation, based only on elementary algebraic notions and on Titchmarsh's theorem, whose proof has recently been much simplified.

In this article, we first explain the simple theory established by Mikusiński [2] and later discuss its relation to the classical Laplace transform method.

B. The Operational Calculus of Mikusiński

The set $\mathcal{C}$ of all continuous complex-valued functions $a = \{a(t)\}$ defined on $t \ge 0$ is a linear space with the usual addition and scalar multiplication. $\mathcal{C}$ is a commutative algebra with multiplication $a \cdot b$ defined by the convolution $\{\int_0^t a(t-s)\,b(s)\,ds\}$. The ring $\mathcal{C}$ has no zero divisors (Titchmarsh's theorem). (There have been several interesting proofs of Titchmarsh's theorem since the first demonstration given by Titchmarsh himself [3]. For example, a simple proof has been published by C. Ryll-Nardzewski (1952).) Hence we can construct the quotient field $\mathcal{Q}$ of the ring $\mathcal{C}$. An element of $\mathcal{Q}$ is called a Mikusiński operator, or simply an operator. If we define $a(t) = 0$ for $t < 0$ for the elements $\{a(t)\}$ in $\mathcal{C}$, then $\mathcal{C}$ is a subalgebra of $\mathcal{L}$, which is the set of all locally integrable (locally $L_1$) functions in $(-\infty, \infty)$ whose support is bounded below. Here we identify two functions that coincide almost everywhere. The algebra $\mathcal{L}$ has no zero divisor, and its quotient field is also $\mathcal{Q}$.

The unity element for multiplication in $\mathcal{Q}$, denoted by $\delta = b/b$ ($b \neq 0$), plays the role of the Dirac $\delta$-function. It is sometimes called the impulse function. The operator $l = \{1\}$ belongs to $\mathcal{C}$.
The operator $l = \{1\}$ is the function that takes the values 0 and 1 according as $t < 0$ or $t > 0$. This operator is Heaviside's function and is sometimes called the unit function. Usually it is denoted by $1(t)$ or simply $1$. The value $1(0)$ may be arbitrary, but usually it is defined as $1/2$, the mean of the limit values from both sides. The operator $l$ is an integral operator, because, as an operator carrying $a$ into $l \cdot a$, it yields

$$l \cdot a = \left\{\int_0^t a(s)\,ds\right\} = \text{the integral of } a \text{ over } [0, t].$$

More generally, the operator $\{t^{\lambda-1}/\Gamma(\lambda)\}$ ($\operatorname{Re}\lambda > 0$) gives the $\lambda$th-order integral. The operator $s = \delta/l$, which is the inverse operator of $l$, is a differential operator. If $a \in \mathcal{C}$ is of class $C^1$, then we have

$$s \cdot a = a' + a(+0)\,\delta = a' + \{a(+0)\}/\{1\}. \tag{1}$$

Similarly, if $a \in \mathcal{C}$ is of class $C^n$, we have

$$s^n \cdot a = a^{(n)} + a^{(n-1)}(0)\,\delta + a^{(n-2)}(0)\,s + \dots + a(0)\,s^{n-1}. \tag{2}$$

The operator $a \to s \cdot a$ can be applied to functions $a$ that are not differentiable in the ordinary sense, and considering the application of $s$ to be the operation of differentiation, we can treat the differential operator algebraically in the field $\mathcal{Q}$. In particular, we have $s \cdot l = \delta$, and this relation is frequently represented by the formula

$$d1(t)/dt = \delta(t). \tag{3}$$

A rational function of $s$ whose numerator is of lower degree than its denominator is an elementary function of $t$. For example, we have the relations

$$1/(s-\alpha)^n = \{t^{n-1}e^{\alpha t}/(n-1)!\},$$

$$s/(s^2+\beta^2) = \{\cos\beta t\}. \tag{4}$$

The solution of an ordinary linear differential equation with constant coefficients $\sum_{r=0}^{n} \alpha_r \varphi^{(r)}(t) = f(t)$ under the initial condition $\varphi^{(i)}(0) = \gamma_i$ ($0 \le i \le n-1$) is thus reduced to an equation in $s$ by using formulas (1) and (2), and is computed by decomposing the following operator into partial fractions:

$$\varphi = \frac{f + \beta_{n-1}s^{n-1} + \beta_{n-2}s^{n-2} + \dots + \beta_0}{\alpha_n s^n + \alpha_{n-1}s^{n-1} + \dots + \alpha_0}, \tag{5}$$

where $\beta_r = \alpha_{r+1}\gamma_0 + \alpha_{r+2}\gamma_1 + \dots + \alpha_n\gamma_{n-r-1}$, $0 \le r \le n-1$. The general solution is represented by (5) if we consider the constants $\gamma_0, \dots, \gamma_{n-1}$ or $\beta_0, \dots, \beta_{n-1}$ as arbitrary parameters. If the rational function in the right-hand side of (5) is $M(s)/D(s)$ and the degree of the numerator is less than that of the denominator, then the right-hand side of (5) is explicitly represented by

$$\frac{1}{(l-1)!}\left[\frac{d^{\,l-1}}{d\lambda^{\,l-1}}\left(\frac{(\lambda-\lambda_0)^{l}\,M(\lambda)}{D(\lambda)}\,e^{\lambda t}\right)\right]_{\lambda=\lambda_0} + \sum_{i=1}^{m}\frac{M(\lambda_i)}{D'(\lambda_i)}\,e^{\lambda_i t},$$

where we assume that $\lambda_0, \lambda_1, \dots, \lambda_m$ exhaust the roots of the equation $D(\lambda) = 0$, $\lambda_0$ is a multiple root of degree $l$, and all other roots are simple ($m = n - l$). The above formula is called the expansion theorem.

C. Limits of Operators

A sequence $a_n$ of operators is said to converge to the limit $a = b/q$ if there exists an operator $q$ ($\neq 0$) such that $q \cdot a_n \in \mathcal{C}$ and the sequence of functions $q \cdot a_n$ converges uniformly to $b$ on compact sets. The limit $a$ is determined uniquely without depending on the operator $q$. Based on this notion of limits of operators, we can construct the theory of series of operators and differential and integral calculus of functions of an independent variable $\lambda$ whose values are operators. They are completely parallel to the usual theories of elementary calculus (→ 106 Differential Calculus; 216 Integral Calculus; 379 Series). A linear partial differential equation in the function $\varphi(x, t)$ of two variables, in particular its initial value problem, reduces to a linear ordinary differential equation of an operational-valued function of an independent variable $x$.

For a given operator $w$, the solution (if it exists) of the differential equation $\varphi'(\lambda) = w \cdot \varphi(\lambda)$ with the initial condition $\varphi(0) = \delta$ is unique, is called the exponential function of an operator $w$, and is denoted by $\varphi(\lambda) = e^{\lambda w}$. If the power series

$$\sum_{n=0}^{\infty} \lambda^n w^n / n! \tag{6}$$

converges, the limit is identical to the exponential function $e^{\lambda w}$. However, there are several cases in which $e^{\lambda w}$ exists even when the series (6) of operators does not converge.

For example, for $w = -\sqrt{s}$, we have

$$e^{-\lambda\sqrt{s}} = \left\{\frac{\lambda}{2\sqrt{\pi t^3}}\exp\left(-\frac{\lambda^2}{4t}\right)\right\}, \tag{7}$$

and for $w = -s$, we have

$$e^{-\lambda s} = h^{\lambda} = s \cdot H_\lambda(t), \tag{8}$$

where the function $H_\lambda(t)$ takes the values 0 and 1 according as $t < \lambda$ or $t > \lambda$. $H_\lambda(t)$ belongs to the ring $\mathcal{L}$ of locally integrable functions and is called the jump function at $\lambda$. For $f(t) \in \mathcal{C}$, we have

$$h^{\lambda}\{f(t)\} = \{f(t-\lambda)\},$$

and hence we call (8) the translation operator.
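
The action of the translation operator on functions can be sketched in a few lines of Python. The helper names `shift` and `integrate` are hypothetical, and the midpoint rule is only one convenient way to approximate the integral operator $l$. Under those assumptions the sketch checks two consequences of $h^{\lambda}\{f(t)\} = \{f(t-\lambda)\}$: translations compose additively, and translation commutes with integration.

```python
def shift(f, lam):
    """Return h^lam f: the function t -> f(t - lam) for t >= lam, and 0 for t < lam."""
    def shifted(t):
        return f(t - lam) if t >= lam else 0.0
    return shifted

def integrate(f, t, steps=4000):
    """Approximate (l . f)(t) = integral_0^t f(u) du by the midpoint rule."""
    h = t / steps
    return h * sum(f((k + 0.5) * h) for k in range(steps))

f = lambda t: t * t   # sample function, understood to vanish for t < 0

# Composition of translations: h^1 h^2 f = h^3 f.
print(shift(shift(f, 2.0), 1.0)(5.0), shift(f, 3.0)(5.0))

# Translation commutes with integration: l . (h^lam f) = h^lam (l . f).
lam, t = 1.0, 3.0
left = integrate(shift(f, lam), t)
right = shift(lambda u: integrate(f, u), lam)(t)
print(left, right)    # both approximate 8/3
```

Both values in the first line equal $f(2) = 4$, and both values in the second line approximate $\int_1^3 (u-1)^2\,du = 8/3$.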
The operator $h^{\lambda}$ is also called the shift operator. For $w = -s$, the series (6) does not converge, but if we apply the formal relation $e^{-\lambda s} = \sum_{n=0}^{\infty}(-\lambda s)^n/n!$ to $f(t)$, we have a formal Taylor expansion

$$f(t-\lambda) = \sum_{n=0}^{\infty}\frac{(-\lambda)^n}{n!}\,f^{(n)}(t).$$

The solutions of linear difference equations are represented by rational functions of $h^{\lambda}$. The power series $\sum \alpha_n (h^{\lambda})^n$ of operators always converges. This fact gives an explicit example of a representation by formal power series.

Note that the operators $e^{-\lambda s}$ and $e^{-\lambda\sqrt{s}}$ play an essential role in the solution of the wave equation

$$\frac{\partial^2\varphi}{\partial x^2} = \alpha^2\frac{\partial^2\varphi}{\partial t^2}$$

and the heat equation

$$\frac{\partial^2\varphi}{\partial x^2} = \alpha^2\frac{\partial\varphi}{\partial t}.$$

The operator (7) converges to $\delta$ for $\lambda \to 0$, and this gives a regularization of the Dirac $\delta$-function.

D. Laplace Transform

For every function $\{f(t)\} \in \mathcal{C}$, the limit

$$\lim_{\beta\to\infty}\int_0^{\beta} e^{-\lambda s} f(\lambda)\,d\lambda = \int_0^{\infty} e^{-\lambda s} f(\lambda)\,d\lambda \tag{9}$$

always exists (in the sense of the limit of operators), and as an operator coincides with the original function $\{f(t)\}$. Therefore, if the usual Laplace transform (→ 240 Laplace Transform) of the function $f(t)$ exists and (9) is a function $g(s)$, then as a function of the differential operator $s$, $g(s)$ is the operator that is given by the inverse Laplace transform $f(t)$ of $g(s)$. Formulas (4) and (7) are indeed typical examples of this relation, where the left-hand side is the usual Laplace transform of the right-hand side. In the practical computation of (5), we can compute the Laplace inverse transform of the right-hand side. However, if we took the Laplace transform as the foundation of the theory, it would not only be complicated but also be subject to the artificial restriction caused by the convergence condition on Laplace transforms.

In the theory of operational calculus, the transform

$$g(p) = p\int_0^{\infty} e^{-pt} f(t)\,dt \tag{10}$$

is sometimes used instead of the Laplace transform itself. But the difference is not essential; we obtain the latter transform merely by changing the variable from $s$ to $p$ and multiplying the former transform by $p$.

E. Relation to Distributions

For $f \in \mathcal{C}$, an operator of the form $h^{\lambda} \cdot s^k \cdot f$ is identified with a distribution of L. Schwartz with support bounded from below. We can identify with a Schwartz distribution the limit of a sequence $f_n$ (or a suitable equivalence class of sequences) of operators of the form $h^{\lambda} \cdot s^k \cdot f$ such that $f_n, f_{n+1}, \dots$ are identical in the interval $(-n, n)$. The notions of Schwartz distributions and of Mikusiński operators do not include each other, but both are generalizations of the notions of functions and their derivatives. For formulas and examples → Appendix A, Table 12.

References

[1] O. Heaviside, On operators in physical mathematics I, II, Proc. Roy. Soc. London, ser. A, 52 (1893), 504-529; 54 (1893), 105-143.
[2] J. Mikusiński, Operational calculus, Pergamon, 1959. (Original in Polish, 1953.)
[3] E. C. Titchmarsh, The zeros of certain integral functions, Proc. London Math. Soc., 25 (1926), 283-302.
[4] K. Yosida, Operational calculus: A theory of hyperfunctions, Springer, 1984.

307 (XIX.13)
Operations Research

A. General Remarks

Operations research in the most general sense can be characterized as the application of scientific methods, techniques, and tools to the operations of systems so as to provide those in control with optimum solutions to problems. This definition is due to Churchman, Ackoff, and Arnoff [1]. Operational research began in a military context in the United Kingdom during World War II, and it was quickly taken up under the name operations research (OR) in the United States. After the war it evolved in connection with industrial organization, and its many techniques have found expanding areas of application in the United States, the United Kingdom, and in other industrial countries. Nowadays OR is used widely in industry for solving practical problems, such as planning, scheduling, inventory, transportation, and marketing. It also has various important applications in the fields of agriculture, commerce, economics, education, public service, etc., and some techniques developed in OR have influenced other fields of science and technology.

B. Applications

Applications of OR to practical problems are often carried out by project teams because knowledge of disparate aspects of the problems is required, and interdisciplinary cooperation is indispensable. The following are the major phases of an OR project: (i) formulating the problem, (ii) constructing a mathematical model to represent the system under study, (iii) deriving a solution from the model, (iv) testing the model and the solution derived from it, (v) establishing controls over the solution and putting it to work (implementation). When the mathematical model that has been constructed in phase (ii) is complicated and/or the amount of data to be handled is large, a digital computer is often utilized in phases (iii) and (iv).

C. Mathematical Models [2]

Typical mathematical models and tools that appear frequently in OR are:

(1) Optimization model (→ 264 Mathematical Programming). This model is characterized by one or more real-valued functions, which are called objective functions, to be minimized (or maximized) under some constraints. According to the number of objective functions, the types of objective functions, and the types of constraints, this model is classified roughly as follows: (i) single-objective model, which includes linear, quadratic, convex, nonlinear, and integer programming models (→ 215 Integer Programming, 255 Linear Programming, 292 Nonlinear Programming, 349 Quadratic Programming); (ii) multi-objective model; (iii) stochastic programming model (→ 408 Stochastic Programming); (iv) dynamic programming model (→ 127 Dynamic Programming); (v) network flow model (→ 281 Network Flow Problems).

(2) Game-theoretic model (→ 173 Game Theory). Game theory is a powerful tool for deriving a solution to practical problems in which more than one person is involved, with each player having different objectives.

(3) Inventory model (→ 227 Inventory Control). It is necessary for most firms to control stocks of resources, products, etc., in order to carry out their activities smoothly; various inventory models have been developed for such problems. Mathematically, optimization techniques (→ 264 Mathematical Programming), Markovian decision processes (→ 127 Dynamic Programming, 261 Markov Processes), and basic probability theory are used to construct models for these problems.

(4) Queuing model (→ 260 Markov Chains H). In a telephone system, calls made when all the lines of the system are busy are lost. The problem of computing the probability of loss involved was first solved in the pioneering article on queuing theory by A. K. Erlang in 1917. For systems in which calls can be put on hold when all lines are busy, one deals with the waiting time distribution instead of the probability of loss. In the 1930s F. Pollaczek and A. Ya. Khinchin gave explicit formulas for the characteristic function of the waiting time distribution. In many situations, such as in telephone systems, air and surface traffic, production lines, and computer systems, various congestion phenomena are often observed, and many kinds of queuing models are utilized to analyze the congestion. Mathematically, almost all such models are formulated by using Markov processes. For practical uses, approximation and computational methods are important as well as theoretical results.

(5) Scheduling model (→ 376 Scheduling and Production Planning). Network scheduling is used to schedule complicated projects (for example, construction of buildings) that consist of a large number of jobs related to each other in some natural order. PERT (program evaluation and review technique) and CPM (critical path method) are popular computational methods for this model (→ 281 Network Flow Problems). Job shop scheduling is used when we have $m$ jobs and $n$ machines and each job requires some of the machines in a given order. The model allows us to find an optimal order (in a certain sense) of jobs to be implemented on each machine.

(6) Replacement model. There are two typical cases. One is the preventive maintenance model, which is suitable when replacements are done under a routine policy because a replacement or a repair before a failure is more effective than after a failure. Probabilistic treatments are mainly used, and this model resembles those for queues and Markov processes. The other is a model for deciding whether to replace a piece of equipment in use. In this case, we need to compare costs of both used and new equipment, and evaluations of various types of present cost are important.

(7) Simulation. This is a numerical experiment in a simulated model of a phenomenon which we want to analyze. Simulation is one of the most popular techniques in OR.

(8) Other models. Besides the models listed above, many problems are formulated by way of various other models in OR.
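
The loss probability in the telephone example under (4) is given by what is now usually called the Erlang B formula, $B(m, a) = \dfrac{a^m/m!}{\sum_{k=0}^{m} a^k/k!}$, for $m$ lines and offered load $a$ (arrival rate times mean holding time), assuming Poisson arrivals and that blocked calls are lost. The Python sketch below evaluates it with the standard recursion; the function name `erlang_b` and the sample loads are illustrative assumptions and do not come from the article.

```python
def erlang_b(servers, load):
    """Blocking probability for `servers` lines and offered load `load` (in erlangs).

    Uses the recursion B(0) = 1, B(k) = load * B(k-1) / (k + load * B(k-1)),
    which avoids the large factorials of the closed form.
    """
    if servers < 0 or load < 0:
        raise ValueError("servers and load must be non-negative")
    b = 1.0
    for k in range(1, servers + 1):
        b = load * b / (k + load * b)
    return b

# A single line offered 1 erlang loses half of the calls.
print(erlang_b(1, 1.0))    # 0.5

# At a fixed load of 2 erlangs, adding lines lowers the loss probability.
print([round(erlang_b(m, 2.0), 4) for m in range(1, 6)])
```

The recursion is algebraically equivalent to the closed form; it is preferred in practice because it stays numerically stable for large numbers of lines.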
Probability theory and mathematical statistics, especially Markov chains, multivariate analysis, design of experiments, regression analysis, time series analysis, etc., often play important roles in modeling.

References

[1] C. W. Churchman, R. L. Ackoff, and E. L. Arnoff, Introduction to operations research, Wiley, 1957.
[2] H. M. Wagner, Principles of operations research, Prentice-Hall, 1975.

308 (XII.19)
Operator Algebras

A. Preliminaries

Let $\mathfrak{H}$ be a Hilbert space. The set of bounded linear operators on $\mathfrak{H}$ is denoted by $\mathfrak{B}(\mathfrak{H}) = \mathfrak{B}$. It contains the identity operator $I$. The notions of operator sum $A + B$, operator product $AB$, and adjoint $A^*$ are defined on it. A subalgebra of $\mathfrak{B}(\mathfrak{H})$ is called an operator algebra. In this article we consider mainly von Neumann algebras. For C*-algebras → 36 Banach Algebras G-K.

Any Hermitian operator $A$ (i.e., an operator such that $A = A^*$) has the property that $(Ax, x)$ is always real for any $x \in \mathfrak{H}$. If $(Ax, x) \ge 0$ for any $x$, $A$ is called positive, and we write $A \ge 0$. When Hermitian operators $A$ and $B$ satisfy $A - B \ge 0$, we write $A \ge B$. Thus we introduce an ordering $A \ge B$ between Hermitian operators. A set $\{A_\alpha\}$ of positive Hermitian operators is said to be an increasing directed set if any two of them $A_\alpha$, $A_\beta$ always have a common majorant $A_\gamma$, that is, $A_\alpha \le A_\gamma$ and $A_\beta \le A_\gamma$. If a Hermitian operator $A$ satisfies $(Ax, x) = \sup_\alpha (A_\alpha x, x)$ for such a set, it is called the supremum and is denoted by $\sup A_\alpha$. The supremum $\sup A_\alpha$ exists if and only if $\sup_\alpha (A_\alpha x, x)$ is finite for any $x$, and then $A_\alpha$ converges to $A$ with respect to the weak and strong operator topologies.

B. Topologies in $\mathfrak{B}$

Various topologies are introduced in $\mathfrak{B} = \mathfrak{B}(\mathfrak{H})$: the uniform operator topology, the strong operator topology, and the weak operator topology (→ 251 Linear Operators). These topologies are listed above in order of decreasing fineness. The operation in $\mathfrak{B}$ of taking the adjoint, $A \to A^*$, is continuous with respect to the uniform operator topology and weak operator topology, but not with respect to the strong operator topology. The operation from $\mathfrak{B} \times \mathfrak{B}$ to $\mathfrak{B}$ of taking the product, $(A, B) \to AB$, is continuous with respect to the uniform operator topology, is continuous with respect to the strong operator topology when the first factor is restricted to a set bounded in the operator norm, but is not continuous on $\mathfrak{B} \times \mathfrak{B}$. It is continuous with respect to the weak operator topology when one of the factors is fixed (i.e., it is separately continuous).

The set $\mathfrak{B}$ is a Banach space with respect to the operator norm, or, more precisely, a C*-algebra. It is a locally convex topological linear space with respect to the strong or weak operator topology.

The Banach space $\mathfrak{B}$ is the dual of the Banach space $\mathfrak{B}_*$ of all nuclear operators in $\mathfrak{H}$ (→ 68 Compact and Nuclear Operators I). The weak* topology in $\mathfrak{B}$ as the dual of $\mathfrak{B}_*$ is called the $\sigma$-weak topology.

C. Von Neumann Algebras

A subset $\mathfrak{M}$ of $\mathfrak{B}$ is called a *-subalgebra if it is a subalgebra (i.e., $A, B \in \mathfrak{M}$ implies $\lambda A + \mu B$, $AB \in \mathfrak{M}$) and contains the adjoint $A^*$ of any $A \in \mathfrak{M}$. The commutant $\mathfrak{A}'$ of a subset $\mathfrak{A}$ of $\mathfrak{B}$ is the set of operators that commute with both $A$ and $A^*$ for every $A \in \mathfrak{A}$. The commutant is a *-subalgebra, and $\mathfrak{A}' = \mathfrak{A}'''$.

A von Neumann algebra $\mathfrak{M}$ is a *-subalgebra of $\mathfrak{B}$ that is defined by one of the following four equivalent conditions: (i) $\mathfrak{M}$ is a *-subalgebra of $\mathfrak{B}$ containing $I$, closed under the weak operator topology; (ii) $\mathfrak{M}$ is a *-subalgebra of $\mathfrak{B}$ containing $I$, closed under the strong operator topology; (iii) $\mathfrak{M}$ is the commutant of a subset of $\mathfrak{B}$ (or, equivalently, $\mathfrak{M} = \mathfrak{M}''$); (iv) $\mathfrak{M}$ is a *-subalgebra of $\mathfrak{B}$ containing $I$, closed under the uniform operator topology, and, as a Banach space, coinciding with the conjugate space of some Banach space. Note that a *-subalgebra of $\mathfrak{B}$, closed under the uniform operator topology, is a C*-algebra. Von Neumann algebras are also called rings of operators or W*-algebras. The latter term is usually used for a C*-algebra *-isomorphic to a von Neumann algebra in contrast to a concrete von Neumann algebra. The study of these algebras was started by J. von Neumann in 1929. He showed the equivalence of conditions (i)-(iii) (von Neumann's density theorem), and established a foundation for the theory named after him [1]. The notion of von Neumann algebras can be regarded as a natural extension of the notion of matrix algebras in a finite-dimensional space, and therein lies the importance of the theory. The
fourth condition of the definition was given by S. Sakai (Pacific J. Math., 1956).

The following theorem is of use in the theory of von Neumann algebras: Given a *-subalgebra 𝒜 of ℬ containing 1, its closure ℳ with respect to the weak or strong operator topology is a von Neumann algebra; and when we denote the set of elements of operator norm ≤ 1 in 𝒜 (resp. ℳ) by 𝒜_1 (resp. ℳ_1), ℳ_1 is likewise the closure of 𝒜_1 with respect to the weak or strong operator topology (Kaplansky's density theorem (Ann. Math., 1951)).

If E is a projection operator in a von Neumann algebra ℳ, then EℳE = {EAE | A ∈ ℳ} is a *-subalgebra of ℬ closed with respect to the weak operator topology. It is not a von Neumann algebra because it does not contain 1, but since its elements operate exclusively in the closed subspace Eℋ we can regard it as an algebra of operators on Eℋ. In this sense, EℳE can be regarded as a von Neumann algebra on Eℋ, which we call the reduced von Neumann algebra of ℳ on Eℋ, and write ℳ_E. If E is a projection operator in ℳ', EℳE = ℳE restricted to the subspace Eℋ is called the induced von Neumann algebra of ℳ on Eℋ and is denoted also by ℳ_E. In the latter case, the mapping A ∈ ℳ → EA ∈ ℳ_E is a *-homomorphism and is called the induction of ℳ onto ℳ_E.

The tensor product ℋ_1 ⊗ ℋ_2 of two Hilbert spaces ℋ_i (i = 1, 2) is the †completion of their †tensor product as a complex linear space equipped with the unique †inner product satisfying (f_1 ⊗ f_2, g_1 ⊗ g_2) = (f_1, g_1)(f_2, g_2) for all f_i, g_i ∈ ℋ_i. For any A_i ∈ ℬ(ℋ_i), there exists a unique operator in ℬ(ℋ_1 ⊗ ℋ_2) denoted by A_1 ⊗ A_2 satisfying (A_1 ⊗ A_2)(f_1 ⊗ f_2) = A_1 f_1 ⊗ A_2 f_2 for all f_i ∈ ℋ_i. For von Neumann algebras ℳ_i ⊂ ℬ(ℋ_i), the von Neumann algebra generated by A_1 ⊗ A_2 with A_i ∈ ℳ_i is denoted by ℳ_1 ⊗ ℳ_2 and is called the tensor product of ℳ_1 and ℳ_2. The *-isomorphism A ∈ ℳ → A ⊗ 1 ∈ ℳ ⊗ 1, where 1 is the trivial von Neumann algebra consisting solely of complex multiples of the identity operator, is called an amplification.

For two von Neumann algebras ℳ_i ⊂ ℬ(ℋ_i) (i = 1, 2), a *-isomorphism π from ℳ_1 into ℳ_2 is called spatial if there exists a unitary (i.e., a bijective isometric linear) mapping U from ℋ_1 to ℋ_2 such that UAU* = πA for all A ∈ ℳ_1, and a *-homomorphism π is called normal if sup_α πA_α = π(sup_α A_α) whenever A_α is a bounded increasing net in ℳ_1. Any normal *-homomorphism is continuous in the strong and weak operator topologies and is a product of an amplification, a spatial *-isomorphism, and an induction. Its kernel is of the form Eℳ_1, where E is a projection operator belonging to the center ℨ = ℳ_1 ∩ ℳ_1' of ℳ_1. This gives a complete description of all possible normal representations (i.e., a normal *-homomorphism into some ℬ(ℋ)) of a von Neumann algebra.

D. States, Weights, and Traces

A state φ of a C*-algebra 𝒜 is a complex-valued function on 𝒜 that is (1) complex linear: φ(A + B) = φ(A) + φ(B), φ(cA) = cφ(A) for A, B ∈ 𝒜, c ∈ C, (2) positive: φ(A*A) ≥ 0 for A ∈ 𝒜, and (3) normalized: ‖φ‖ = 1 (equivalent to φ(1) = 1 if 1 ∈ 𝒜). For any positive linear functional φ on 𝒜, there exists a triplet (ℋ_φ, π_φ, ξ_φ) (unique up to unitary equivalence) of a Hilbert space ℋ_φ, a representation π_φ (i.e., a *-homomorphism into ℬ(ℋ_φ)) of 𝒜, and a vector ξ_φ in ℋ_φ, such that φ(A) = (π_φ(A)ξ_φ, ξ_φ) and ℋ_φ is the closure of π_φ(𝒜)ξ_φ. The space ℋ_φ is constructed by defining the inner product (η(A), η(B)) = φ(B*A) in the quotient of 𝒜 by its left ideal {A | φ(A*A) = 0}, where η is the quotient mapping, and by completion. Then π_φ is defined by π_φ(A)η(B) = η(AB). This is called the GNS construction after its originators I. M. Gel'fand, M. A. Naĭmark, and I. E. Segal.

A weight φ on a von Neumann algebra ℳ is a function defined on the positive elements of ℳ, with positive real or infinite values, which is additive and homogeneous (φ(A + B) = φ(A) + φ(B) and φ(λA) = λφ(A) for all positive A, B ∈ ℳ and λ ≥ 0, with the convention 0·∞ = 0 and ∞ + a = ∞). It is said to be faithful if it does not vanish except for φ(0) = 0, normal if φ(A) = sup φ(A_α) whenever A_α is an increasing net of positive elements of ℳ and A = sup A_α, and semifinite if the left ideal 𝔑_φ, consisting of all elements A ∈ ℳ for which φ(A*A) is finite, has the property that the linear span 𝔐_φ of 𝔑_φ*𝔑_φ is dense in ℳ. The restriction of φ to positive elements of 𝔐_φ has a unique extension to a linear functional on 𝔐_φ, which we denote by the same letter φ. Canonically associated with a normal semifinite weight φ, there exists a Hilbert space ℋ_φ, a normal *-representation π_φ of ℳ, and a complex linear mapping η from 𝔑_φ into a dense subset of ℋ_φ such that (η(B'), η(B)) = φ(B*B'), and π_φ(A)η(B) = η(AB), where B, B' ∈ 𝔑_φ and A ∈ ℳ. If φ is finite (i.e., 𝔑_φ = ℳ), then its extension to ℳ is a positive linear functional for which the triplet (ℋ_φ, π_φ, η(1)) is given by the GNS construction.

The linear span of all normal states of a von Neumann algebra ℳ is a norm-closed subspace of its dual ℳ*, called its predual and denoted by ℳ_*, because ℳ turns out to be the dual of ℳ_*.

A trace t on a von Neumann algebra ℳ is a weight satisfying t(UAU*) = t(A) for U unitary
in ℳ and for all positive A in ℳ (equivalently, t(A*A) = t(AA*) for all A ∈ ℳ).

E. The von Neumann Classification

A von Neumann algebra for which a semifinite normal trace does not exist is called a purely infinite von Neumann algebra or von Neumann algebra of type III. In contrast to this, a von Neumann algebra ℳ is called semifinite (resp. finite) if for each positive Hermitian operator A (≠ 0) in ℳ there is a semifinite (resp. finite) normal trace t such that t(A) ≠ 0. Every Abelian von Neumann algebra is finite. If there are no central projection operators E ≠ 0 such that ℳE is finite, ℳ is called properly infinite. Purely infinite ℳ and ℬ(ℋ) for infinite ℋ, for example, are properly infinite. A nonzero projection operator E in ℳ is called Abelian when ℳ_E is Abelian. We call ℳ a von Neumann algebra of type I (or discrete) when it contains an Abelian projection E for which 1 is the only central projection P covering E (i.e., E ≤ P). A von Neumann algebra is of type II if it is semifinite and contains no Abelian projection. A von Neumann algebra of type II is called of type II_1 if it is finite and of type II_∞ if it is properly infinite. A finite von Neumann algebra is also characterized as a von Neumann algebra in which the operation of taking the adjoint is continuous with respect to the strong operator topology on bounded spheres (Sakai, Proc. Japan Acad., 1957). A properly infinite von Neumann algebra ℳ is characterized by the property that ℳ and ℳ ⊗ ℬ(ℋ) for any separable ℋ are *-isomorphic.

Given a von Neumann algebra ℳ, there exist mutually orthogonal projections E_I, E_II, E_III in the center ℨ of ℳ such that E_I + E_II + E_III = 1, and ℳE_I, ℳE_II, ℳE_III are von Neumann algebras of type I, type II, type III, respectively. This decomposition is unique. There also exist unique central projections E_f and E_i such that ℳE_f is finite, ℳE_i is properly infinite, and E_f + E_i = 1. The two decompositions can be combined. (If some of the projections E are 0, the condition on the corresponding ℳE is to be waived.)

F. Factors

A von Neumann algebra whose center consists exclusively of scalar multiples of the identity operator is called a factor. Von Neumann's reduction theory (→ Section G) reduces the study of arbitrary von Neumann algebras on a separable Hilbert space more or less to the study of factors. Factors are classified into types I_n (n = ∞, 1, 2, ...), II_1 (i.e., of type II and finite), II_∞ (i.e., of type II and not finite), and III. A factor of type I_n is isomorphic to the algebra ℬ(ℋ) of all bounded operators on an n-dimensional Hilbert space ℋ. Since the discovery of two nonisomorphic examples of factors of type II_1 by F. J. Murray and von Neumann (1943), classification of factors has been a central problem in the theory of von Neumann algebras. After 1967, great progress was made in the investigation of isomorphism classes of factors, and we have uncountably many nonisomorphic examples of factors of types II_1, II_∞, and III. After the discovery of the third to ninth nonisomorphic examples of factors of type II_1 by J. Schwartz (the third example, 1963), W. Ching (the fourth), Sakai (the fifth), J. Dixmier and E. C. Lance (the sixth and seventh), and G. Zeller-Meier (the eighth and ninth), D. McDuff showed that there exist countably many nonisomorphic examples of factors of type II_1, and finally McDuff (Ann. Math., 1969) and Sakai (J. Functional Anal., 1970) showed the existence of uncountably many nonisomorphic examples of factors of type II_1. For type III factors → Section I.

G. The Integral Direct Sum and Decomposition into Factors

The Hilbert spaces considered in this section are all †separable. Let (𝔐, 𝔅, μ) be a †measure space; with each ζ ∈ 𝔐 we associate a Hilbert space ℋ(ζ). We consider functions x(ζ) on 𝔐 whose values are in ℋ(ζ) for each ζ. Let K be a set of these functions having the following properties: (i) ‖x(ζ)‖ is measurable for x(ζ) ∈ K; (ii) if for a function y(ζ), the numerical function (x(ζ), y(ζ)) is measurable for any x(ζ) ∈ K, then y(ζ) ∈ K; (iii) there is a countable family {x_1(ζ), x_2(ζ), ...} of functions in K such that for each fixed ζ ∈ 𝔐, the set {x_1(ζ), x_2(ζ), ...} is dense in ℋ(ζ). Then K is a linear space. We call each function in K a measurable vector function. We introduce in the set of measurable vector functions x(ζ) with

$$\int \|x(\zeta)\|^2 \, d\mu(\zeta) < \infty$$

an equivalence relation by defining x(ζ) and y(ζ) as equivalent when

$$\int \|x(\zeta) - y(\zeta)\|^2 \, d\mu(\zeta) = 0.$$

Thus we obtain a space of equivalence classes which we denote by ℋ. ℋ is a Hilbert space with the inner product

$$(x, y) = \int (x(\zeta), y(\zeta)) \, d\mu(\zeta),$$
which is called the integral direct sum (or direct integral) of ℋ(ζ). An operator function A(ζ) that associates with each ζ ∈ 𝔐 a bounded linear operator A(ζ) on ℋ(ζ) is called measurable if for any measurable x(ζ), A(ζ)x(ζ) is also measurable. If, moreover, ‖A(ζ)‖ is bounded, A(ζ) transforms a function in ℋ to a function in ℋ and thus defines a bounded linear operator on ℋ. This operator is called the integral direct sum (or direct integral) of A(ζ), and an operator on ℋ that can be reduced to this form is called decomposable.

Generally, let ℋ be a Hilbert space, and consider an Abelian von Neumann algebra 𝒜 on ℋ. Then we construct a measure space (𝔐, 𝔅, μ) and represent 𝒜 as the set of bounded measurable functions on 𝔐. (This is possible in different ways. The Gel'fand representation is an example.) Then a Hilbert space ℋ(ζ) can be constructed so that ℋ is represented as the integral direct sum of ℋ(ζ). Operators in 𝒜 are all decomposable and are called diagonalizable. A von Neumann algebra ℳ on ℋ whose elements are all decomposable is characterized by ℳ ⊂ 𝒜'. The A(ζ) yielded by the decomposition of operators A in ℳ generate a von Neumann algebra ℳ(ζ) on ℋ(ζ). If we take as 𝒜 the center ℨ of ℳ, then almost all the ℳ(ζ) are factors (von Neumann's reduction theory), and if we take as 𝒜 a maximal Abelian von Neumann algebra contained in ℳ', then almost all the ℳ(ζ) are type I factors (F. I. Mautner, Ann. Math., 1950).

H. Tomita-Takesaki Theory

Motivated by the problem of proving the commutant theorem for tensor products (i.e., (ℳ_1 ⊗ ℳ_2)' = ℳ_1' ⊗ ℳ_2'), which remained unsolved for algebras of type III up until that time, Tomita succeeded in 1967, after years of effort, in generalizing the theory of Hilbert algebras, previously developed only for semifinite von Neumann algebras. The most important ingredient of this theory lies in certain one-parameter groups of *-automorphisms of a von Neumann algebra, called modular automorphisms (see below), each one-parameter group of modular automorphisms being intrinsically associated with a faithful semifinite normal weight of the algebra. Tomita's theory was perfected by Takesaki [13], who also showed that modular automorphisms satisfy (and are characterized by) a condition originally introduced in statistical mechanics by the physicists R. Kubo, P. C. Martin, and J. Schwinger and accordingly known as the KMS condition. In the mathematical foundations of statistical mechanics, this condition characterizes equilibrium states of a physical system for a given one-parameter group of automorphisms (of a C*-algebra) describing the time-development of the system. It was a fortunate coincidence that this condition was formulated in a so-called C*-algebra approach to statistical mechanics by R. Haag, N. M. Hugenholtz, and M. Winnink [14] at about the same time that Tomita's work appeared in 1967. The original proofs of the Tomita-Takesaki theory have been simplified considerably by the work of M. Rieffel [16] and A. Van Daele [15]. Deeper insight into the significance of modular automorphisms is also provided by the work of A. Connes [19], showing that the group of modular automorphisms (up to inner automorphisms) is intrinsic to the von Neumann algebra (i.e., independent of the weight) and belongs to the center of Out ℳ (the group Aut ℳ of all *-automorphisms of the von Neumann algebra ℳ modulo the subgroup Int ℳ of all inner *-automorphisms).

Some of the basic definitions and results of the Tomita-Takesaki theory are as follows. If φ is a normal semifinite faithful weight on ℳ, the antilinear operator S_φ, defined on a dense subset η(𝔑_φ ∩ 𝔑_φ*) of ℋ_φ (→ Section D) by the relation S_φ η(A) = η(A*), is †closable and the polar decomposition J_φ Δ_φ^{1/2} of its closure defines a positive self-adjoint operator Δ_φ, called a modular operator, and an antiunitary involution J_φ. The principal results of the Tomita-Takesaki theory are (1) if A ∈ ℳ, then σ_t^φ(A) ≡ Δ_φ^{it} A Δ_φ^{−it} ∈ ℳ for all real t, and this defines a continuous one-parameter group of *-automorphisms σ_t^φ of ℳ, called modular automorphisms, and (2) if A ∈ ℳ, then j_φ(A) = J_φ A J_φ ∈ ℳ', and j_φ is a conjugate-linear isomorphism of ℳ onto ℳ'. A weight φ on ℳ is said to satisfy the KMS condition at β (a real number) relative to a one-parameter group of *-automorphisms σ_t of ℳ if, for every pair A, B ∈ 𝔑_φ ∩ 𝔑_φ*, there exists a bounded continuous function F(z) (depending on A, B) on 0 ≤ Im z ≤ β, holomorphic in 0 < Im z < β and such that F(t) = φ(Aσ_t(B)) and F(t + iβ) = φ(σ_t(B)A). A given one-parameter group σ_t coincides with a group of modular automorphisms σ_t^φ if and only if φ satisfies the KMS condition at β = −1 relative to σ_t. (In statistical mechanics, β = (kT)^{−1}, where k is the Boltzmann constant and T the absolute temperature of the system.)

I. Structure and Classification of Factors of Type III

At the Baton Rouge Conference in 1967, R. T. Powers reported his results [17] on nonisomorphism of the one-parameter family
of factors of type III (now called Powers's factors), which had been constructed by von Neumann in 1938 in terms of an infinite tensor product of factors of type I (abbreviated as ITPFI). Prior to Powers's work only three different kinds of factors of type III, along with the same number of factors of type II_1, had been distinguished. A systematic classification of ITPFIs was subsequently given by H. Araki and E. J. Woods [18] in terms of two invariants, i.e., the asymptotic ratio set r_∞(ℳ) and the ρ-set ρ(ℳ) of the von Neumann algebra ℳ. Using the Tomita-Takesaki theory, Connes [19] introduced two invariants, namely the S-set S(ℳ) (the intersection of the spectra of all modular operators) and the T-set T(ℳ) (the set of all real t for which the modular automorphism σ_t^φ is inner), and, when ℳ is an ITPFI, proved the equality S(ℳ) = r_∞(ℳ) and the relation T(ℳ) = 2π |log ρ(ℳ)|^{−1}. The S-set S(ℳ) of a factor of type III on a separable Hilbert space is either the set of all nonnegative reals (type III_1), the set of all integral powers of λ (where 0 < λ < 1) and 0 (type III_λ), or the set {0, 1} (type III_0). The work of Araki and Woods shows that there exists only one ITPFI of type III_λ for each λ ∈ (0, 1) (the examples of Powers) as well as for λ = 1, while there exist continuously many ITPFIs of type III_0. Woods [20] has shown that the classification of ITPFIs of type III_0 is not smooth.

A structural analysis of factors of type III, given independently by Connes [19], Takesaki (Acta Math., 1973), and Araki (Publ. Res. Inst. Math. Sci., 1973), expressed a certain class of factors of type III as a kind of crossed product of semifinite von Neumann algebras with their injective endomorphisms (automorphism in the case of Connes). These analyses led Takesaki [21] to the discovery of a duality theorem for crossed products of von Neumann algebras with locally compact groups of their *-automorphisms (→ Section J) and its application to the following structure theorem for von Neumann algebras of type III. The crossed product of a von Neumann algebra ℳ with the group of modular automorphisms σ_t^φ is a von Neumann algebra 𝒩 of type II_∞, with a canonical action θ_s of the dual group as a one-parameter group of *-automorphisms which is trace-scaling, i.e., τ ∘ θ_s = e^{−s}τ for some faithful normal trace τ. If ℳ is properly infinite, the crossed product of 𝒩 with θ_s is isomorphic to the original von Neumann algebra ℳ. In particular, any von Neumann algebra ℳ of type III can be written as the crossed product of a von Neumann algebra 𝒩 of type II_∞ with a one-parameter group of trace-scaling *-automorphisms θ_s. The isomorphism class of ℳ is determined by the isomorphism class of 𝒩 together with the conjugacy class of θ_s modulo inner automorphisms. The restriction θ̃_s of θ_s to the center ℨ of 𝒩 is of special importance. ℳ is a factor if and only if θ̃_s is ergodic. In that case, ℳ is of type III_1 if 𝒩 is a factor, ℳ is of type III_λ, 0 < λ < 1, if θ̃_s is periodic with period −log λ, and ℳ is of type III_0 if θ̃_s is aperiodic and not isomorphic to the one-parameter group of *-automorphisms of L_∞(R) induced by the translations of the real line R. (The excluded case does not occur for ℳ of type III.)

A von Neumann algebra on a separable Hilbert space is called approximately finite-dimensional (or approximately finite or hyperfinite) if it is generated by an increasing sequence of finite-dimensional *-subalgebras. This class of von Neumann algebras includes many important examples, such as ITPFI and the von Neumann algebra generated by any representation of canonical commutation (or anticommutation) relations on a separable Hilbert space. The classification of approximately finite-dimensional factors is almost complete. In fact the uniqueness of an approximately finite-dimensional factor of type II_1 has been known since the work of von Neumann. It is called the hyperfinite factor. The uniqueness of approximately finite-dimensional factors of type II_∞ (which is then the tensor product of the hyperfinite factor with ℬ(ℋ)) and of type III_λ, 0 < λ < 1 (which are then Powers's factors), has been demonstrated by Connes [22]. Approximately finite-dimensional factors of type III_0 are classified exactly by the isomorphism classes of the ergodic groups θ̃_s of *-automorphisms of commutative von Neumann algebras ℨ. Any such factor is a Krieger's factor, i.e., a crossed product of a commutative von Neumann algebra with a single *-automorphism. Examples of such factors have been extensively studied by Krieger, who has also shown [23] that isomorphism of a Krieger's factor is equivalent to weak equivalence of the associated nonsingular transformation of the standard measure space.

A von Neumann algebra on a separable Hilbert space is approximately finite-dimensional if and only if it is injective (→ 36 Banach Algebras H).

J. Crossed Products

The crossed product ℳ ⊗_α G of a von Neumann algebra ℳ (acting on a Hilbert space ℋ) and a locally compact Abelian group G relative to a continuous action α of G on ℳ (by *-automorphisms α_g, g ∈ G) is the von Neumann algebra 𝒩 generated by the operators π(A), A ∈ ℳ, and λ(h), h ∈ G, defined on the Hilbert space L_2(G, ℋ) of all ℋ-valued L_2-
functions on G (relative to the Haar measure) by

$$[\pi(A)\xi](g) = \alpha_g^{-1}(A)\,\xi(g), \qquad [\lambda(h)\xi](g) = \xi(h^{-1}g),$$

where ξ ∈ L_2(G, ℋ). The canonical action α̂ of the dual Ĝ on 𝒩 is defined by α̂_p(B) = μ(p)Bμ(p)* for B ∈ 𝒩 and p ∈ Ĝ, where μ(p) is defined by [μ(p)ξ](g) = ⟨g, p⟩ξ(g). The duality theorem of Takesaki [21] asserts that [ℳ ⊗_α G] ⊗_α̂ Ĝ is isomorphic to ℳ ⊗ ℬ(L_2(G)), where the second factor ℬ(L_2(G)) is the algebra of all bounded linear operators on L_2(G).

K. Natural Positive Cone

The closure V^α of the set of vectors Δ_φ^α η(A) for all positive A in 𝔑_φ ∩ 𝔑_φ* reflects certain properties of the von Neumann algebra ℳ for 0 ≤ α ≤ 1/2 [24, 25]. In particular, V^{1/4} is called the natural positive cone. It is a self-dual closed convex cone, and is intrinsic to the von Neumann algebra ℳ (i.e., independent of the weight φ). Every normal positive linear functional ψ on ℳ has a unique representative ξ(ψ) in this cone (i.e., ψ(A) = (π_φ(A)ξ(ψ), ξ(ψ))), and the mapping ξ is a concave, monotone, bijective homeomorphism, homogeneous of degree 1/2. The group of all *-automorphisms of ℳ has a natural unitary representation U(g), g ∈ Aut ℳ, satisfying the relations U(g)AU(g)* = g(A), U(g)ξ(φ) = ξ(φ ∘ g^{−1}).

L. C*-Algebras and von Neumann Algebras

Let a C*-algebra 𝒜 be given. A *-representation x → T_x gives rise to a von Neumann algebra ℳ, generated by T_x, x ∈ 𝒜. The type of this representation is defined according to the type of ℳ. A C*-algebra is called a C*-algebra of type I if its *-representations are always of type I. It is known that this class is exactly the class of GCR algebras (→ 36 Banach Algebras H). It is also known that a separable non-type I C*-algebra has a representation of type II and a general non-type I C*-algebra has a representation of type III (J. Glimm, Ann. Math., 1961; Sakai [8]).

For a C*-algebra 𝒜, all its representations generate †injective von Neumann algebras if and only if 𝒜 is †nuclear [26, 27].

M. Topological Groups and von Neumann Algebras

Consider a unitary representation g → U_g of a locally compact Hausdorff group G (→ 423 Topological Groups). If this representation is a †factor representation, the type of this representation is defined according to the type of the von Neumann algebra ℳ generated by U_g, g ∈ G. A †group of type I is a group whose factor representations are all of type I. For example, connected semisimple Lie groups and connected nilpotent Lie groups are of type I (Harish-Chandra, Trans. Amer. Math. Soc., 1953; J. M. G. Fell, Proc. Amer. Math. Soc., 1962). Examples of groups that are not of type I are known (→ 437 Unitary Representations E).

References

[1] J. von Neumann, Collected works II, III, Pergamon, 1961.
[2] W. Arveson, An invitation to C*-algebras, Springer, 1976.
[3] O. Bratteli and D. W. Robinson, Operator algebras and quantum statistical mechanics, Springer, 1979.
[4] J. Dixmier, Von Neumann algebras, North-Holland, 1981.
[5] J. Dixmier, Les C*-algèbres et leurs représentations, Gauthier-Villars, 1964.
[6] I. Kaplansky, Rings of operators, Benjamin, 1968.
[7] G. K. Pedersen, C*-algebras and their automorphism groups, Academic Press, 1979.
[8] S. Sakai, C*-algebras and W*-algebras, Springer, 1971.
[9] J. T. Schwartz, W*-algebras, Gordon & Breach, 1967.
[10] S. Stratila and L. Zsido, Lectures on von Neumann algebras, Editura Academiei/Abacus Press, 1975.
[11] M. Takesaki, Theory of operator algebras I, Springer, 1979.
[12] Y. Nakagami and M. Takesaki, Duality for crossed products of von Neumann algebras, Lecture notes in math. 731, Springer, 1979.
[13] M. Takesaki, Tomita's theory of modular Hilbert algebras and its applications, Lecture notes in math. 128, Springer, 1970.
[14] R. Haag, N. M. Hugenholtz, and M. Winnink, On the equilibrium states in quantum statistical mechanics, Comm. Math. Phys., 5 (1967), 215-236.
[15] A. Van Daele, A new approach to the Tomita-Takesaki theory of generalized Hilbert algebras, J. Functional Anal., 15 (1974), 378-393.
[16] M. Rieffel and A. van Daele, A bounded approach to Tomita-Takesaki theory, Pacific J. Math., 69 (1977), 187-221.
[17] R. T. Powers, Representations of uniformly hyperfinite algebras and their asso-
ciated von Neumann rings, Ann. Math., (2) metric, and eclipsing binaries can be deter-
86 (1967), 138-171. mined by similar methods.
[18] H. Araki and E. J. Woods, A classilica-
tion of factors, Publ. Res. Inst. Math. Sci., (A)
4 (1968) 51-130.
B. Kepler’s Orbital Elements
[ 191 A. Connes, Une classification des facteurs
de type III, Ann. Sci. Ecole Norm. Sup., (4) 6
(1973), 1333252. Consider, for example, an asteroid moving on
1203 E. .I. Woods, The classification of factors an ellipse with one focus at the sum. The ellip-
is not smooth, Canad. J. Math., 25 (1973), 966 tic orbit is fixed by the initial conditions of the
102. motion or the integration constant,3 of the
[21] M. Takesaki, Duality for crossed prod- tHamilton-Jacobi equation (- 55 Celestial
ucts and the structure of von Neumann alge- Mechanics) and is described by Kepler’s orbital
bras of type III, Acta Math., 131 (1973) elements a, e, w, i, Q, and t, (Fig. 1).
249-310.
[22] A. Connes, Classification of injective north
factors, Ann. Math., 104 (1976) 733115.
1231 W. Krieger, On ergodic flows and the
isomorphism of factors, Math. Ann., 223
(1976) 19-70.
[24] H. Araki, Some properties of modular
conjugation operators of von Neumann alge-
bras and a non-commutative Radon-Nikodym
theorem with a chain rule. Pacific J. Math., 50
(1974), 3099354.
[25] A. Connes, Caracterisation des espaces Fig. 1
vectoriels ordonnes sous-jacents aux algebres Orbital elements
de von Neumann, Ann. Inst. Fourier, 24, Fast.
4 (1974): 127-156. The size and shape of the ellipse are deter-
[26] M. Choi and E. Effros, Separable nuclear mined by the semimajor axis (half I:he tmajor
C*-algebras and injectivity, Duke Math. J., 43 axis) a and the teccentricity e, while the argu-
(1976) 309-322. ment w of perihelion, measured from the as-
[27] M. Choi and E. Effros, Nuclear C*- cending node to the perihelion, shows the
algebras and injectivity: The general case, direction of the major axis. (Sometimes, we
Indiana Univ. Math. J., 26 (1977) 443-446. adopt as one of the main parameters the peri-
[28] R. V. Kadison and J. R. Ringrose, Funda- helion distance 4 = a( 1 -e) instead of the semi-
mentals of the theory of operators I, Academic major axis a.) The position of the orbital plane
Press, 1983. is determined by the inclination angle i to the
[29] S. Stratila, Modular theory in operator ecliptic and the longitude R of the ascending
algebras, Editura Academiei/Abacus Press, node, and then the position of the asteroid
1981. on the orbit is determined by the time t, of
the perihelion passage. The period T of one
revolution, or mean motion n = 2n/T, which
is the mean angular velocity, is computed by
Kepler’s third law u2a3 = p, with p a constant
309 (XX.7) depending on the mass of the asteroid. The
mean motion is a fundamental frequency in
Orbit Determination the solution of the Hamilton-Jacobi equation
and is obtained by differentiating the energy
A. General Remarks constant -p/2a with respect to an action
variable fi.
The purposes of the theory of orbit determi- To express the position of the asteroid on
nation are (1) to study properties of orbits of the ellipse as a function of time, we use the
celestial bodies, (2) to determine orbital ele- true anomaly u, which is the angular distance
ments from observed positions of the celestial of the asteroid from the perihelion. the eccen-
bodies, and (3) to compute their predicted tric anomaly E, and the mean anomaly M =
positions utilizing the orbital elements. Celes- n(t -to). Of these three anomalies the mean
tial bodies to which the theory is applied are anomaly can be derived directly from Kepler’s
mainly planets, asteroids, comets, satellites, elements, although it must be transformed to
and artificial satellites in the solar system, the true anomaly or to the eccentric anomaly
although orbits of meteors and visual, photo- when we compute the coordinates of the aster-
1163 309E
Orbit Determination

oid. Kepler’s equation Because of the perturbations, the orbit


deviates from the fixed ellipse, although at
E-esinE=M (1) every moment the instantaneous velocity and
holds between E and M. Solving this equation, position of the asteroid determines an ellipse.
we obtain an expression for E as a function of The orbital elements of the ellipse thus defined
M: at each moment, called osculating elements, are
variable with time. To compute perturbations
E = M +“~i~JJne)sinnM, that cause this change of osculating elements,
it is necessary to observe the initial conditions
where J,, is the tBesse1 function of order n. of motion, i.e., the osculating elements at the
However, in practical computations, we often initial moment. During a time interval shorter
solve equation (1) directly by numerical than the period of one revolution, the vari-
methods or by using tables. ations of the osculating elements are usually
very small. Therefore, by three sets of observa-
tions made at three moments at short inter-
C. Orbit Determination vals, it is possible to determine the orbital
elements that can.be identified with the oscu-
An astrometric observation of a celestial body lating elements observed at the mean moment.
usually consists of measurements of two co- However, if the intervals are very short, errors
ordinates (right ascension and declination) on in the determined values often are very large,
the celestial sphere. Therefore, to derive six and it becomes necessary to carry out obser-
orbital elements, three sets of observations vations at distant moments also. When such
should be made at three moments separated by appropriate time intervals. If the topocentric distance of the celestial body is known, the orbital elements can be computed directly from observations. However, since the distance is not usually known, special methods have had to be developed. A method for orbit determination was worked out by C. F. Gauss at the beginning of the 19th century to find the orbit of Ceres, the first asteroid to be discovered. Although the topocentric distances are not known, we know that orbits of asteroids are planar, and Kepler’s second law, the law of conservation of areal velocity, holds approximately. Therefore we can assume that the area of the triangle made by the sun and the two positions of the asteroid observed at different moments is proportional to the corresponding time interval. Using this property of the orbit we can derive the topocentric distance and then the orbital elements. This method is called the indirect method, and similar methods can be developed for parabolic and hyperbolic orbits.

D. Osculating Elements and Orbit Improvement

For the two-body problem the orbit is a fixed and invariable ellipse, and therefore Kepler’s orbital elements are constants. On the other hand, when gravitational interactions from other bodies cannot be disregarded, the orbital elements are found to be variable by computing the perturbations by the method of variation of constants. The perturbations are expressed as sums of periodic, secular, and long-periodic terms.

additional observations are made, those data are compared with the respective values that follow from the initial observations, and the perturbations computed from them; then the method of least squares is used to improve the estimation of the orbital elements.

E. Artificial Satellites

Since the periods of revolution of asteroids are of the order of a few years, the osculating elements change very little in a few weeks. On the other hand, for artificial satellites moving around the earth, the periodic as well as secular perturbations become very large after a few hours because the period of revolution may be as short as two hours. Therefore, to determine orbital elements for artificial satellites, observed positions should be corrected by subtracting the effects of periodic perturbations computed from approximate orbital elements already known. By using the observations thus corrected, mean orbital elements are derived by the method of orbit improvement. The approximate orbital elements can be computed if the launching conditions of the satellites are known. In this manner, mean orbital elements can be derived every day, and variations of the mean orbital elements, or amounts of secular perturbations, for a certain period (say, for 100 days) are found. From them, information on atmospheric density and the gravitational potential of the earth is derived. It should be remarked that for artificial satellites distance measurements have been made by radar, and velocity determinations have been made by measuring the Doppler effect.
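The derivation of secular variations from daily mean elements can be illustrated numerically. The following is a minimal sketch, assuming purely synthetic data: it estimates the secular rate of a single mean orbital element over 100 days by an ordinary least-squares line fit. The function name `secular_rate`, the starting value of 120 degrees, the drift of -3 degrees per day, and the small alternating residual are all hypothetical and are not taken from the text.

```python
def secular_rate(days, values):
    """Return the least-squares slope of `values` against `days`.

    This is the ordinary linear-regression slope; it stands in for the
    secular rate of one mean orbital element (an illustrative assumption).
    """
    n = len(days)
    mean_t = sum(days) / n
    mean_v = sum(values) / n
    # Sum of squared deviations of the epochs
    sxx = sum((t - mean_t) ** 2 for t in days)
    # Sum of cross products of epoch and element deviations
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(days, values))
    return sxy / sxx


# Hypothetical daily mean values of one element, in degrees, for 100 days
days = list(range(100))
true_rate = -3.0  # hypothetical secular drift in degrees per day
values = [
    120.0 + true_rate * t + (0.05 if t % 2 else -0.05)  # small residual
    for t in days
]

rate = secular_rate(days, values)
print(f"estimated secular rate: {rate:.4f} deg/day")
```

In practice the fit would be applied to each mean element separately, and the residuals would be examined for periodic terms that the correction step failed to remove.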
309 F 1164
Orbit Determination

For satellites of other planets, measurements of two coordinates with respect to the centers of the planets are made. Masses of planets can be computed by Kepler’s third law when the orbital elements of satellites are known, and gravitational potentials of the planets can be determined from their secular perturbations.

F. Binaries

In the study of visual binaries, methods similar to those for satellites can be applied, although the exact estimation of the distances to binaries is often impossible. For photometric binaries radial components of velocities are derived by measuring the Doppler effect; and for eclipsing binaries important information, such as their masses, densities, and sizes, as well as data regarding their internal constitutions, can be derived from the observed orbital elements.

References

[1] G. Stracke, Bahnbestimmung der Planeten und Kometen, Springer, 1929.
[2] A. D. Dubyago, The determination of orbits, Macmillan, 1961.
[3] C. F. Gauss, Theoria motus, Dover, 1963.
[4] P. R. Escobal, Methods of orbit determination, Wiley, 1965.

310 (XII.4)
Ordered Linear Spaces

A. History

Many spaces used in functional analysis, such as Hilbert spaces, Banach spaces, and topological linear spaces, are generalizations of Euclidean spaces, where the leading idea has been to generalize the distance in Euclidean spaces in various ways. On the other hand, generalizing the order concept for real numbers has led to spaces of another kind: ordered linear spaces and vector lattices. The theory of vector lattices was presented in a lecture by F. Riesz at the International Congress of Mathematicians in 1928 [1] and has been developed by many authors. Among them we cite H. Freudenthal, L. V. Kantorovitch (Mat. Sb., 2 (44) (1937)), Riesz (Ann. Math., (2) 41 (1940)), S. Kakutani, F. Bohnenblust, G. Birkhoff [2], H. Nakano [3], B. C. Vulikh [5], and H. H. Schaefer [8]. Vector lattices have been used in lattice-theoretic treatments of integration (→ Section I), spectral resolution, and ergodic theory. The central notion is Banach lattices (→ Section F), but the theory has been extended to the case where $E$ is a locally convex topological linear space with the structure of a vector lattice [5-8].

B. Definitions

A real linear space $E$ is said to be an ordered linear space if $E$ is supplied with an order relation $\ge$ with the following two properties: (i) $x \ge y \Rightarrow x + z \ge y + z$; (ii) $x \ge y$ and $\lambda \ge 0$ ($\lambda$ is a real number) $\Rightarrow \lambda x \ge \lambda y$.

If, in addition, $E$ forms a lattice under this order $\ge$, we call $E$ a vector lattice (Riesz space or lattice ordered linear space).

For Sections B through E, we assume $E$ to be a vector lattice. For any $x, y \in E$, the join and meet of $x, y$ are denoted by $x \vee y$ and $x \wedge y$ respectively. The following relations are obvious:

$$(x + z) \vee (y + z) = (x \vee y) + z,$$

$$\lambda x \vee \lambda y = \lambda (x \vee y), \quad \lambda x \wedge \lambda y = \lambda (x \wedge y) \quad (\lambda \ge 0),$$

$$\lambda x \vee \lambda y = \lambda (x \wedge y), \quad \lambda x \wedge \lambda y = \lambda (x \vee y) \quad (\lambda \le 0),$$

and

$$(x \vee y) \wedge z = (x \wedge z) \vee (y \wedge z).$$

The last relation means that $E$ is a distributive lattice.

For $x \in E$, the elements $x \vee 0$, $(-x) \vee 0$, and $x \vee (-x)$ are called respectively the positive part, negative part, and absolute value of the element $x$, and are denoted respectively by $x^+$, $x^-$, and $|x|$. The following identities hold: $x = x^+ - x^-$ (Jordan decomposition), $|x| = x^+ + x^-$, $x^+ \wedge x^- = 0$, $x \vee y + x \wedge y = x + y$, $|\lambda x| = |\lambda| |x|$, and $|x - y| = x \vee y - x \wedge y$.

For $a, b \in E$ with $a \le b$, the set $\{x \mid a \le x \le b\}$ is called an interval and is denoted by $[a, b]$. A subset of $E$ is called (order) bounded if it is included in an interval. An element $e$ of $E$ is said to be a unit or an Archimedean unit if for any $x \in E$ there exists a natural number $n$ such that $x \le ne$. A linear subspace $I$ of $E$ is called an ideal (or order ideal) of $E$ if $x \in I$ and $|y| \le |x|$ imply $y \in I$.

C. Order Limits

Given a subset $\{x_\alpha\}$ of $E$, if an element $x$ of $E$ is an upper bound of $\{x_\alpha\}$ and any upper bound $y$ of $\{x_\alpha\}$ satisfies the relation $y \ge x$, then it is called the least upper bound (or su-
premum) of $\{x_\alpha\}$ and is denoted by $\sup_\alpha x_\alpha$ or $\bigvee_\alpha x_\alpha$. The greatest lower bound (or infimum) of $\{x_\alpha\}$, denoted by $\inf_\alpha x_\alpha$ or $\bigwedge_\alpha x_\alpha$, is defined dually.

A sequence $\{x_n\}$ ($x_n \in E$) is said to be order convergent to $x$ if there exists a nonincreasing sequence $\{u_n\}$ ($u_n \in E$) such that $\bigwedge_n u_n = 0$ and $|x_n - x| \le u_n$. In this case $x$ is called the order limit of $\{x_n\}$ and is denoted by $x = \text{o-lim}\, x_n$. For order convergent sequences $\{x_n\}$ and $\{y_n\}$, we can show the following relations: $\text{o-lim}(\lambda x_n + \mu y_n) = \lambda(\text{o-lim}\, x_n) + \mu(\text{o-lim}\, y_n)$, $\text{o-lim}(x_n \vee y_n) = (\text{o-lim}\, x_n) \vee (\text{o-lim}\, y_n)$, and $\text{o-lim}(x_n \wedge y_n) = (\text{o-lim}\, x_n) \wedge (\text{o-lim}\, y_n)$. We say that $E$ is Archimedean if the relations $0 \le nx \le y$ ($n = 1, 2, 3, \ldots$) imply $x = 0$. If $E$ is Archimedean, then the relations $x = \text{o-lim}\, x_n$ and $\lambda = \lim \lambda_n$ imply $\lambda x = \text{o-lim}\, \lambda_n x_n$. We say that $E$ is complete ($\sigma$-complete) if any (countable) subset of $E$ that is bounded above has a least upper bound. A $\sigma$-complete vector lattice is always Archimedean. If $E$ is $\sigma$-complete, for any sequence $\{x_n\}$ bounded above, $\text{o-lim sup}\, x_n$ is defined to be $\bigwedge_n \bigvee_{m \ge n} x_m$; we define $\text{o-lim inf}\, x_n$ similarly. With these definitions, $x = \text{o-lim}\, x_n$ is equivalent to $x = \text{o-lim sup}\, x_n = \text{o-lim inf}\, x_n$. Any Archimedean vector lattice can be extended to a complete vector lattice in the same way as the real numbers are constructed from the rational numbers by Dedekind cuts (→ 294 Numbers).

D. Examples of Vector Lattices

Sequence spaces, such as $c$, $m$, and $l_p$, and function spaces, such as $C$, $M$, and $L_p$, form vector lattices under pointwise ordering (→ 168 Function Spaces). Among these spaces $c$ and $C$ are not $\sigma$-complete, but the others are complete. We give two examples. First, let $\Sigma$ be a $\sigma$-algebra of subsets of a space $\Omega$, and let $A(\Omega, \Sigma)$ be the set of all finite $\sigma$-additive set functions defined on $\Sigma$. Then $A(\Omega, \Sigma)$ is a complete vector lattice if we define $\mu_1 \ge \mu_2$ to mean $\mu_1(S) \ge \mu_2(S)$ for any $S \in \Sigma$. The second example is an ordered space consisting of all bounded symmetric operators $T$ on a Hilbert space $H$, where we define $T_1 \ge T_2$ to mean $(T_1 x, x) \ge (T_2 x, x)$ for any $x \in H$. In general, this space is not a vector lattice. However, if $A$ is a commutative $W^*$-algebra of operators on $H$ and $S$ is the set of symmetric operators belonging to $A$, then $S$ is a complete vector lattice under the ordering just defined. We can replace the conditions of finiteness in $A(\Omega, \Sigma)$ and boundedness in $S$ with weaker ones and still obtain the same situation. The Radon-Nikodym theorem in $A(\Omega, \Sigma)$ and the spectral resolution theorem of symmetric operators in $S$ can be extended to theorems of spectral representations in general vector lattices.

Let $E_1$ be a linear space of functions defined on a set $\Omega$ ordered pointwise. If there exists a bijective mapping defined on a vector lattice $E$ onto $E_1$ that is linear and order isomorphic, we call $E_1$ a representation of $E$. If $E$ has an Archimedean unit and is simple (which means that $E$ and $\{0\}$ are the only ideals of $E$), then $E$ can be represented as the set of real numbers such that the Archimedean unit of $E$ is represented by the number 1 (H. Freudenthal, Proc. Akad. Amsterdam, 39 (1936)).

E. Dual Spaces

Let $\mathfrak{L}(E, F)$ be the set of order bounded linear mappings of a vector lattice $E$ into a vector lattice $F$, where order boundedness means that any bounded (in the sense of the order) subset of $E$ is mapped into a bounded set of $F$. For any $\varphi_1, \varphi_2 \in \mathfrak{L}(E, F)$, define $\varphi_1 \ge \varphi_2$ to mean $\varphi_1(x) \ge \varphi_2(x)$ ($x \ge 0$, $x \in E$). If $F$ is complete, then $\mathfrak{L}(E, F)$ is a complete vector lattice. An element $\varphi \in \mathfrak{L}(E, F)$ is called a positive operator if $\varphi \ge 0$. If $F$ is the set of real numbers $\mathbf{R}$, then $\mathfrak{L}(E, F)$ is the set of all (order) bounded linear functionals on $E$. This space, called the dual lattice of the vector lattice $E$ and denoted by $E^b$, is a complete vector lattice. For $f \in E^b$ and $x \ge 0$, $x \in E$, we have

$$|f|(x) = \sup_{|y| \le x} f(y).$$

F. Banach Lattices

A linear space $E$ is called a normed vector lattice if $E$ is a vector lattice having the structure of a normed space satisfying $|x| \le |y| \Rightarrow \|x\| \le \|y\|$. Furthermore if a normed vector lattice $E$ is complete relative to the norm, we call $E$ a Banach lattice. The examples in Section D are Banach lattices (for $\mu \in A(\Omega, \Sigma)$ we define $\|\mu\| = |\mu|(\Omega)$).

In Banach lattices, $\|x_n - x\| \to 0$ and $\|y_n - y\| \to 0$ imply $\|x_n \vee y_n - x \vee y\| \to 0$ and $\|x_n \wedge y_n - x \wedge y\| \to 0$. Among relations between order convergence and norm convergence in Banach lattices, the following is one of the most fundamental: In a Banach lattice $E$, norm convergence of a sequence $\{x_n\}$ to $x$ is equivalent to relative uniform star convergence of $\{x_n\}$ to $x$, i.e., for any subsequence $\{x_{n(m)}\}$ of $\{x_n\}$, there exists a subsequence $\{x_{n(m(l))}\}$ of $\{x_{n(m)}\}$ and an element $y$ of $E$ satisfying the relations $|x_{n(m(l))} - x| \le y/l$ ($l = 1, 2, \ldots$).

Any set bounded relative to the order is bounded relative to the norm, but the converse does not hold in general. For a linear functional, however, these two concepts of boundedness coincide, and the order dual of
$E$ is the same as the norm dual of $E$. Moreover, the dual (in any sense) of a Banach lattice is also a Banach lattice.

G. Abstract M Spaces and Abstract L Spaces

For a Banach lattice $E$, we consider the following three conditions: (M) $x, y \ge 0 \Rightarrow \|x \vee y\| = \max(\|x\|, \|y\|)$. (L) $x, y \ge 0 \Rightarrow \|x + y\| = \|x\| + \|y\|$. ($L_p$) $x \wedge y = 0 \Rightarrow \|x + y\|^p = \|x\|^p + \|y\|^p$ ($1 < p < \infty$). If $E$ satisfies one of the conditions (M), (L), or ($L_p$), we say that $E$ is an abstract M space, an abstract L space, or an abstract $L_p$ space, written AM, AL, and AL$_p$, respectively. If the unit ball of an AM space has a greatest element, it is called the Kakutani unit of $E$. The duals of AM spaces, AL spaces, and AL$_p$ spaces are AL spaces, AM spaces with the Kakutani units, and AL$_q$ spaces ($1/p + 1/q = 1$), respectively. An AM space with a Kakutani unit is represented by $C(\Omega)$, i.e., the set of all real-valued continuous functions defined on a compact Hausdorff space $\Omega$. The AL spaces and AL$_p$ spaces are represented by $L_1$ and $L_p$, respectively, on a measure space. Here the representation of a Banach lattice means a representation of a vector lattice preserving the norm (Kakutani, Ann. Math., (2) 42 (1941); Bohnenblust, Duke Math. J., 6 (1940)).

H. Spectral Properties of Positive Operators

The $n$-dimensional real vector space $E_n$ is a vector lattice under pointwise order (→ Section D). An element $x$ in $E_n$ is called strictly positive if $x_i > 0$ for all $i$. A square matrix $A = (a_{i,j})$ of order $n$ is called positive if $a_{i,j} \ge 0$ for all $i$ and $j$. It corresponds to a positive operator in $E_n$ (→ Section E). $A$ is called irreducible if there exists no permutation matrix $P$ such that $P^{-1}AP$ is block triangular with diagonal blocks $A_1$ and $A_2$, where $A_1$ and $A_2$ are square matrices of order $n_i$ ($1 \le n_i < n$). We denote by $\sigma(A)$ the spectrum of $A$ and by $r(A)$ the spectral radius of $A$, i.e., $\sup\{|\lambda| \mid \lambda \in \sigma(A)\}$. The spectral circle of $A$ is the circle of radius $r(A)$ having the origin as its center. O. Perron (Math. Ann., 64 (1907)) and G. Frobenius (S.-B. Preuss. Akad. Wiss., 1908 and 1912) established the following remarkable result on the spectral properties of positive matrices.

Theorem (Perron-Frobenius). Let $A$ be a positive square matrix. Then its spectral radius $r(A)$ belongs to $\sigma(A)$, and for this spectrum $r(A)$ there exists an eigenvector $x \ge 0$. Assume further that $A$ is irreducible and the order of $A$ is greater than 1. Then $r(A) > 0$ and the eigenspace of $A$ for $r(A)$ is a 1-dimensional subspace spanned by a strictly positive element. In this case the eigenvalues of $A$ on the spectral circle are the $k$th roots of unity for some $k$ multiplied by $r(A)$, each of which is a simple root of the eigenequation of $A$.

Since a positive matrix of order $n$ corresponds to a positive operator in $E_n$, extensions of this theorem to positive operators in ordered linear spaces have been studied by many mathematicians. For these extensions, see the following articles: M. G. Krein and M. A. Rutman (Amer. Math. Soc. Transl., 26 (1950); original in Russian, 1948), F. F. Bonsall (J. London Math. Soc., 30 (1955)), S. Karlin (J. Math. Mech., 8 (1959)), T. Ando (J. Fac. Sci. Hokkaido Univ., ser. 1, 13 (1957)), H. H. Schaefer [8], H. P. Lotz (Math. Z., 108 (1968)), F. Niiro and I. Sawashima (Sci. Pap. Coll. Gen. Educ., Univ. Tokyo, 1966), I. Sawashima and F. Niiro (Nat. Rep. Ochanomizu Univ., 30 (1979)) and S. Miyajima (J. Fac. Sci. Univ. Tokyo, 27 (1980)).

I. Integrals Based on Ordering

As applications of ordered linear space theory, we state the integrals of Daniell-Stone and of Banach. Let us begin with a set $\mathfrak{E}$ of real-valued functions defined on an abstract space $S$ and assume that $\mathfrak{E}$ is a vector lattice with respect to the usual order relation, addition, and scalar multiplication. Assume further that a functional $E(f)$ defined on $\mathfrak{E}$ satisfies the following conditions: (i) additivity, i.e., $E(f + g) = E(f) + E(g)$; (ii) positivity, i.e., $f \ge 0$ implies $E(f) \ge 0$; (iii) $f, f_n \in \mathfrak{E}$ ($n = 1, 2, \ldots$) and $|f| \le \sum_{n=1}^{\infty} |f_n|$ imply $E(|f|) \le \sum_{n=1}^{\infty} E(|f_n|)$, where $|f|$ means $f \vee (-f)$. A functional on $\mathfrak{E}$ satisfying both conditions (i) and (ii) is called a positive linear functional. A positive linear functional on $\mathfrak{E}$ satisfies M. H. Stone’s condition (iii) if and only if it satisfies P. J. Daniell’s condition (iii'): $f_1 \ge f_2 \ge \cdots$ and $\lim_{n \to \infty} f_n = 0$ imply $\lim_{n \to \infty} E(f_n) = 0$. Next, we define, for every function $\varphi$ on $S$ admitting $\pm\infty$ as values, a functional $N(\varphi)$ as follows:

$$N(\varphi) = \inf \left\{ \sum_{n=1}^{\infty} E(|f_n|) \;\middle|\; f_n \in \mathfrak{E},\ |\varphi| \le \sum_{n=1}^{\infty} |f_n| \right\}.$$

Here we put $N(\varphi) = +\infty$ when for a function $\varphi$ there are no functions $\{f_n\}$ such that $|\varphi| \le \sum_{n=1}^{\infty} |f_n|$. A function $\varphi$ is, by definition, a null function if $N(\varphi) = 0$ holds, and a set $A$ is a null set if its characteristic function is a null function. Since each function of $\mathfrak{F}_0 = \{\varphi \mid N(\varphi) < +\infty\}$ takes finite values except on a null set, we can define addition and the scalar multiplication for such functions except on a null set. Let $\mathfrak{F}$ be the set of equivalence classes of $\mathfrak{F}_0$
with respect to the relation $\varphi \sim \psi$ defined as $N(\varphi - \psi) = 0$. Then $\mathfrak{F}$ is a Banach lattice with the norm $N$, and $\mathfrak{E}$ is included in $\mathfrak{F}$ (by identifying $f$ and $g$ of $\mathfrak{E}$ when $E(|f - g|) = 0$). Let us denote now by $\mathfrak{L}$ the closure of $\mathfrak{E}$ in $\mathfrak{F}$. Then any function $\varphi$ belonging to $\mathfrak{L}$ is said to be Daniell-Stone integrable, and $L(\varphi) = N(\varphi^+) - N(\varphi^-)$ ($\varphi^+ = \frac{1}{2}(|\varphi| + \varphi)$, $\varphi^- = \frac{1}{2}(|\varphi| - \varphi)$) is called the Daniell-Stone integral of $\varphi$. The integral $L$ thus defined is, as a functional on $\mathfrak{L}$, an extension of the functional $E$ on $\mathfrak{E}$. For this integral, Lebesgue’s convergence theorem is easily proved, and a result corresponding to Fubini’s theorem has been obtained [9]. Furthermore, the concepts of measurable functions, measurable sets, and measure can be defined by using $L$ and $\mathfrak{L}$. Also, the relation between $L$ and the Lebesgue integral with respect to this measure is known [9]. Since $\mathfrak{L}$ is an abstract L-space, the Daniell-Stone integral $L(\varphi)$ is represented by the Lebesgue integral of $\varphi$ on a certain measure space. The Daniell-Stone integral introduced above is due to M. H. Stone [9]. Daniell (Ann. Math., (2) 19 (1917-1918)) originally defined the upper integral $\bar{I}(\varphi)$ by using $E(f)$ on $\mathfrak{E}$ satisfying conditions (i), (ii), and (iii'). Also, he defined the set $\mathfrak{L}$ of Daniell-Stone integrable functions by $\mathfrak{L} = \{\varphi \mid \bar{I}(\varphi) = -\bar{I}(-\varphi)\}$. S. Banach defined an integral by using methods similar to Daniell’s, replacing condition (iii') for a positive linear functional $E(f)$ on $\mathfrak{E}$ by condition (iii''): $\lim_{n \to \infty} f_n = 0$, $|f_n| \le g$, and $f_n, g \in \mathfrak{E}$ imply $\lim_{n \to \infty} E(f_n) = 0$ [10]. Furthermore, N. Bourbaki [11] and E. J. McShane (Proc. Nat. Acad. Sci. US, 1946) have defined a more general integral than the Daniell-Stone with a condition analogous to (iii'), replacing the sequence in it by a directed family of functions.

Specifically if, in the Daniell-Stone integral, $S$ is a locally compact Hausdorff space and $\mathfrak{E}$ is the set of continuous functions with compact supports, then a functional $E(f)$ on $\mathfrak{E}$ satisfying conditions (i) and (ii) is proved to satisfy the condition (iii'), and the Daniell-Stone integral $L(\varphi)$ can be constructed from $E(f)$ [11].

Banach also defined another integral for all real-valued bounded functions on $[0, 1)$ by using the Hahn-Banach extension theorem [12]. His definition is as follows: Let $\mathfrak{F}$ be the set of all real-valued bounded functions on $[0, 1)$ and $\mathfrak{A}$ be the family of all finite sets of real numbers $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$. Furthermore, we define, for $x(s) \in \mathfrak{F}$ and $\alpha \in \mathfrak{A}$,

$$M(x, \alpha) = \sup_{-\infty < s < +\infty} \frac{1}{n} \sum_{k=1}^{n} x(s + \alpha_k),$$

where $x(s)$ is considered as the periodic extension to $(-\infty, +\infty)$ of the function defined originally on $[0, 1)$ and

$$p(x) = \inf_{\alpha} M(x, \alpha).$$

Then, by the Hahn-Banach extension theorem, there exists a linear functional $F$ on $\mathfrak{F}$ satisfying $F(x) \le p(x)$. If we write $\int x(s)\,ds$ for $F(x)$, then we can prove immediately that $\int x(s)\,ds$ has the following properties: (1) $\int \{a x(s) + b y(s)\}\,ds = a \int x(s)\,ds + b \int y(s)\,ds$, where $a$ and $b$ are real constants. (2) $x(s) \ge 0$ implies $\int x(s)\,ds \ge 0$. (3) $\int x(s + s_0)\,ds = \int x(s)\,ds$, where $s_0$ is an arbitrary real number. (4) $\int 1\,ds = 1$. If necessary, we can add the property (5) $\int x(1 - s)\,ds = \int x(s)\,ds$ by defining

$$\int x(s)\,ds = \tfrac{1}{2}\{F(x(s)) + F(x(1 - s))\}.$$

Then

$$F(x) = \int x(s)\,ds$$

or

$$\tfrac{1}{2}\{F(x(s)) + F(x(1 - s))\} = \int x(s)\,ds$$

is called the Banach integral of $x(s)$.

The construction of the Daniell-Stone integral and the Banach integral opened avenues to several other abstract integrals based on the order relation, such as an integral for more general functions with values in a vector lattice, or an integral considered as a mapping from a vector lattice into another vector lattice (or from an ordered set into another ordered set). Indeed, if the function takes values in a complete vector lattice, then almost all results in this section (e.g., the Hahn-Banach extension theorem) hold trivially. For discussions of these and other abstract integrals → [13-15].

References

[1] F. Riesz, Sur la décomposition des opérations fonctionnelles linéaires, Atti del Congresso Internazionale dei Matematici, 1928, Bologna, 3, 143-148.
[2] G. Birkhoff, Lattice theory, Amer. Math. Soc. Colloq. Publ., revised edition, 1948.
[3] H. Nakano, Modulared semi-ordered linear spaces, Maruzen, 1950.
[4] M. M. Day, Normed linear spaces, Erg. Math., Springer, 1958.
[5] B. C. Vulikh, Introduction to the theory of partially ordered vector spaces, Groningen, 1967.
[6] I. Namioka, Partially ordered linear topological spaces, Mem. Amer. Math. Soc., 1957.
[7] A. L. Peressini, Ordered topological vector spaces, Harper & Row, 1967.
311 A 1168
Ordering

[8] H. H. Schaefer, Banach lattices and positive operators, Springer, 1974.
[9] M. H. Stone, Notes on integration, Proc. Nat. Acad. Sci. US I-III, 34 (1948), 336-342, 447-455, 483-490; IV, 35 (1949), 50-58.
[10] S. Banach, The Lebesgue integral in abstract spaces, Theory of the Integral, S. Saks (ed.), Stechert, 1937, 320-330 (Dover, 1964).
[11] N. Bourbaki, Éléments de mathématique, Intégration, ch. 1-9, Actualités Sci. Ind., Hermann: ch. 1-4, 1175a, second edition, 1965; ch. 5, 1244b, second edition, 1967; ch. 6, 1281a, 1959; ch. 7, 8, 1306, 1963; ch. 9, 1343, 1969.
[12] S. Banach, Théorie des opérations linéaires, Warsaw, 1932, 30-32. (Chelsea, 1963).
[13] S. Izumi, N. Matuyama, M. Nakamura, M. Orihara, and G. Sunouchi, An abstract integral, Proc. Imp. Acad. Tokyo, I-III, 16 (1940), 21-25, 87-89, 518-523; IV, 17 (1941), 1-4; V-X, 18 (1942), 45-49, 50-52, 53-56, 535-538, 539-542, 543-547.
[14] E. J. McShane, Order-preserving maps and integration processes, Ann. Math. Studies, Princeton Univ. Press, 1953.
[15] T. H. Hildebrandt, Integration in abstract spaces, Bull. Amer. Math. Soc., 59 (1953), 111-139.
[16] N. Dunford and J. T. Schwartz, Linear operators, Wiley, I (1958), III (1971).
[17] K. Yosida, Functional analysis, Springer, 1965.

311 (II.12)
Ordering

A. Ordering

The concept of ordering is abstracted from various relations, such as the inequality relation between real numbers and the inclusion relation between sets. Suppose that we are given a set $X = \{x, y, z, \ldots\}$; the relation between the elements of $X$, denoted by $\le$ or other symbols, is called an ordering (partial ordering, semiordering, order relation, or simply order) if the following three laws hold: (i) the reflexive law, $x \le x$; (ii) the antisymmetric law, $x \le y$ and $y \le x$ imply $x = y$; and (iii) the transitive law, $x \le y$ and $y \le z$ imply $x \le z$. A set $X$ with an ordering between its elements is called an ordered set (partially ordered set or semiordered set). A subset of an ordered set $X$ is also an ordered set with respect to the same ordering as in $X$. If for an arbitrary pair of elements $x, y$ of an ordered set $A$ either $x \le y$ or $y \le x$ must hold, then the ordering $\le$ is called a total (or linear) ordering, and $A$ is called a totally ordered (or linearly ordered) set.

We sometimes write $x \le y$ as $y \ge x$. The binary relation $\ge$ is called the dual ordering of $\le$; it is also an ordering. More generally, the duals of concepts, conditions, and propositions concerning an ordering are defined by replacing the ordering with its dual. For example, $x < y$ means that $x \le y$ and $x \ne y$, while $x > y$ means that $x \ge y$ and $x \ne y$; and $>$ is the dual of $<$. If a universal proposition concerning the ordering is true, then its dual is also true; this principle is called the duality principle for ordering. Incidentally, $x \le y$ is equivalent to the statement $x < y$ or $x = y$, according to the definition of $<$.

B. Definitions

A subset of an ordered set $X$ of the form $\{x \mid a < x < b\}$ is denoted by $(a, b)$, and a set of the form $(a, b)$, $\{x \mid x < a\}$, or $\{x \mid x > a\}$ is called an interval. In particular, $S(c) = \{x \mid x < c\}$ is called the segment of $X$ determined by $c$. A pair of elements $a, b$ satisfying $a \le b$ is called a quotient of $X$ and is denoted by $b/a$.

When $a < c < b$ or $b < c < a$, $c$ is said to lie between $a$ and $b$. A totally ordered set $A$ is said to be dense if for any pair of distinct elements $a$ and $b$ in $A$ there exists a third element $c$ lying between $a$ and $b$. When $a < b$ and there is no element lying between $a$ and $b$, then $a$ is called a predecessor of $b$, and $b$ a successor of $a$. In this manner, most of the terminology associated with the inequality of numbers is carried over to general ordering.

In an ordered set $A$, an element $a$ is called an upper bound of a subset $X$ if $x \le a$ for every element $x$ of $X$. When an upper bound exists, $X$ is said to be bounded from above (or bounded above). The dual concept of an upper bound is a lower bound of the subset; and if the subset has a lower bound, it is said to be bounded from below (or bounded below). A set bounded both from above and from below is simply said to be bounded. When $a$ is an upper bound of $X$ and $a \in X$, then $a$ is called the greatest element (or maximum element) of $X$. Such an element $a$ (if it exists) is unique and is denoted by $\max X$; its dual is the least element (or minimum element) and is denoted by $\min X$. If there is a least element in the set of upper bounds of $X$, it is called the least upper bound (or supremum) of $X$ and is denoted by l.u.b. $X$ or $\sup X$. Its dual is the greatest lower bound (or infimum) and is denoted by g.l.b. $X$ or $\inf X$.

If the ordered set $X$ is the image $\varphi(\Lambda)$ of a set $\Lambda$ under a mapping $\varphi$, where $\Lambda$ is of the form $\{\lambda \mid C(\lambda)\}$, then $\sup X$ is also written
$\sup_{C(\lambda)} \varphi(\lambda)$ and is called the supremum of $\varphi(\lambda)$ for all $\lambda$ that satisfy $C(\lambda)$. When there is no danger of misunderstanding, it may be written as $\sup_\lambda \varphi(\lambda)$ or $\sup \varphi(\lambda)$ and called simply the supremum of $\varphi(\lambda)$; similar conventions hold for inf, max, min, etc.

An element $a$ of a set $X$ is called a maximal element if $a < x$ never holds for any element $x$ of $X$; its dual is a minimal element. If the greatest (least) element exists, it is the only maximal (minimal) element. But in general, a maximal (minimal) element is not necessarily unique.

C. Chain Conditions

An ordered set $X$ is said to satisfy the minimal condition if every nonempty subset of $X$ has a minimal element. The dual condition is called the maximal condition. An infinite sequence $\{a_1, a_2, \ldots, a_n, \ldots\}$ of elements of an ordered set $X$ such that $a_1 < a_2 < \cdots < a_n < \cdots$ is called an ascending chain, and the condition that $X$ has no ascending chain is called the ascending chain condition. The notions dual to those of ascending chain and ascending chain condition are descending chain and descending chain condition, respectively. By the chain condition, we mean either the ascending or the descending chain condition. Under the axiom of choice, the maximal condition is equivalent to the ascending chain condition, and the minimal condition to the descending chain condition.

If a totally ordered set $X$ satisfies the minimal condition or, equivalently, if every nonempty subset of $X$ has a least element, then the set $X$ is called a well-ordered set, and its ordering is called a well-ordering.

The following theorem is called the principle of transfinite induction: Let $P(x)$ be a proposition concerning an element $x$ of a well-ordered set $X$ such that (i) $P(x_0)$ is true for the least element $x_0$ of $X$, and (ii) $P(x)$ is true if $P(y)$ is true for all $y$ satisfying $y < x$. Then $P(x)$ is true for all $x$ in $X$. Mathematical induction is a special case of this principle, where $X$ is the set of all natural numbers. To define a mapping $F$ from a well-ordered set $X$ into a set $Y$, we may use the following principle: Suppose that $F(x_0)$ is defined for the least element $x_0$ of $X$, and for each element $x$ of $X$ there is given a method to associate an element $G(f)$ of $Y$ uniquely with each mapping $f: S(x) \to Y$ with domain $S(x)$, where $S(x)$ is the segment of $X$ determined by $x$. Then there exists a unique mapping $F: X \to Y$ satisfying $F(x) = G(F \mid S(x))$ for all $x$. The definition of the mapping $F$ by this principle is called a definition by transfinite induction. The principles of induction are often used for proving propositions or giving definitions concerning ordinal numbers (→ 312 Ordinal Numbers).

D. Directed Sets

An ordered set (or in general a preordered set (→ Section H)) in which every finite subset is bounded from above is called a directed set. Let $B$ be a subset of a directed set $A$. If $\{b \mid b \ge a\} \cap B \ne \emptyset$ for every element $a$ of $A$, then $B$ is said to be cofinal in $A$; such a subset $B$ is itself a directed set. If $\{b \mid b \ge a\} \subset B$ for some element $a$ of $A$, then $B$ is said to be residual in $A$; such a subset $B$ is also cofinal in $A$. The condition that $B$ is cofinal in $A$ is equivalent to the condition that $A - B$ is not residual in $A$.

E. Order-Preserving Mappings

A mapping $\varphi: A \to A'$ of an ordered set $A$ into an ordered set $A'$ is called an order-preserving mapping (monotone mapping or order homomorphism) if $a \le b$ always implies $\varphi(a) \le \varphi(b)$. Moreover, if $\varphi$ is bijective and $\varphi^{-1}$ is also an order-preserving mapping from $A'$ onto $A$, then $\varphi$ is called an order isomorphism. $A'$ is said to be order homomorphic (order isomorphic) to $A$ when there exists an order homomorphism (order isomorphism) $\varphi$ such that $A' = \varphi(A)$. If a mapping $\varphi: A \to A'$ gives an order isomorphism of $A$ to the dual of $A'$, $\varphi$ is called a dual isomorphism (or anti-isomorphism).

F. Direct Sum and Direct Product

Let $S$ be a set that is the disjoint union of a family $\{A_\lambda\}_{\lambda \in \Lambda}$ of its subsets, and suppose that each $A_\lambda$ is an ordered set. For $a, b \in S$, define $a \le b$ to mean that $a, b \in A_\lambda$ for some $\lambda \in \Lambda$ and $a \le b$ with respect to the ordering in $A_\lambda$. The ordered set $S$ obtained in this way is called the direct sum (or cardinal sum) of the family $\{A_\lambda\}_{\lambda \in \Lambda}$ of ordered sets. When $(a_\lambda)_{\lambda \in \Lambda}$ and $(b_\lambda)_{\lambda \in \Lambda}$ are elements of the Cartesian product $P = \prod_{\lambda \in \Lambda} A_\lambda$ of a family $\{A_\lambda\}_{\lambda \in \Lambda}$ of ordered sets, we define $(a_\lambda)_{\lambda \in \Lambda} \le (b_\lambda)_{\lambda \in \Lambda}$ to mean that $a_\lambda \le b_\lambda$ holds for all $\lambda \in \Lambda$. The ordered set $P$ obtained in this way is called the direct product (or cardinal product) of the family $\{A_\lambda\}_{\lambda \in \Lambda}$ of ordered sets.

G. Ordinal Sum and Ordinal Product

Suppose that $\mathfrak{A} = \{A, B, \ldots\}$ is a family of mutually disjoint ordered sets and is itself an
ordered set. Then an ordering $\le$ can be defined in the disjoint union $S = \bigcup X$ ($X \in \mathfrak{A}$) as follows: $x \le y$ in $S$ means that either (i) there exists an $A$ satisfying $x, y \in A \in \mathfrak{A}$ and $x \le y$ holds with respect to the ordering in $A$; or (ii) for $A$ and $B$ satisfying $x \in A \in \mathfrak{A}$, $y \in B \in \mathfrak{A}$, we have $A < B$. The ordered set $S$ obtained in this way is called the ordinal sum obtained from $\mathfrak{A}$ and is denoted by $\sum_{\mathfrak{A}} X$. In particular, if $\mathfrak{A} = \{A, B\}$ and $A < B$, the ordinal sum is denoted by $A + B$.

Suppose that $X$ is a subset of the Cartesian product $\prod_\lambda X_\lambda$ of a family of ordered sets indexed by an ordered set $\Lambda$, and the subset $\{\lambda \mid x_\lambda \ne y_\lambda\}$ of $\Lambda$ has a least element whenever $x = (x_\lambda)_{\lambda \in \Lambda}$ and $y = (y_\lambda)_{\lambda \in \Lambda}$ are two distinct elements of $X$. The ordering in $X$ defined by setting $x < y$ when $x_\mu < y_\mu$ for the least element $\mu$ of $\{\lambda \mid x_\lambda \ne y_\lambda\}$ is called the lexicographic ordering in $X$. It can be applied to $X = \prod_\lambda X_\lambda$ if $\Lambda$ is well-ordered; $X$ is then called the ordinal product. When $A, B, \ldots$ are ordered sets, $AB \cdots$ is often used to denote the ordinal product obtained from $X_1 = A$, $X_2 = B$, $\ldots$ with the ordering $1 < 2 < \cdots$ of indices; the ordering in this ordinal product is called the lexicographic ordering in the Cartesian product $A \times B \times \cdots$.

H. Preordering

A relation $R$ between elements of a set $X$ is called a preordering (or pseudoordering) if it satisfies the reflexive law and the transitive law, but not necessarily the antisymmetric law. By defining $(x_1, y_1) R (x_2, y_2) \Leftrightarrow x_1 \le x_2$, for example, a preordering of pairs $(x, y)$ of real numbers is obtained. From a preordering $R$ an equivalence relation $\sim$ can be defined in $X$ by $x \sim y \Leftrightarrow (xRy$ and $yRx)$. Let $[X] = X/\!\sim$ be the quotient set of set $X$ by this equivalence relation, and let $[x]$, $[y]$ be the equivalence classes determined by $x, y \in X$; then an ordering $\le$ can be defined in $[X]$ by $[x] \le [y] \Leftrightarrow xRy$. (For further topics → 52 Categories and Functors; 409 Structures.)

References

[1] N. Bourbaki, Éléments de mathématique, I. Théorie des ensembles, ch. 3, Actualités Sci. Ind., 1243b, Hermann, second edition, 1967; English translation, Theory of sets, Addison-Wesley, 1968.
[2] G. Birkhoff, Lattice theory, Amer. Math. Soc. Colloq. Publ., 1940, revised edition, 1948.
Also → references to 381 Sets.

312 (II.13)
Ordinal Numbers

A. General Remarks

Let $A \cong B$ mean that two ordered sets $A$, $B$ are order isomorphic; then the relation $\cong$ is an equivalence relation. An equivalence class under this relation is called an order type, and the class to which an ordered set $A$ belongs is called the order type of $A$. Historically, an ordinal number was first defined as the order type of a well-ordered set (Cantor [2]). However, it was found that a contradiction occurs if order types defined in this way are considered to form a set. Hence, another definition was given by J. von Neumann [3], which is stated in Section B. A similar situation was found concerning the definition of cardinal numbers, which led to a new definition of cardinal numbers using ordinal numbers, which is given in Section D.

B. Definitions

A set $\alpha$ is called an ordinal number if it satisfies the following two conditions: (i) $\alpha$ is a well-ordered set with the binary relation $\in$ as its ordering; and (ii) $\beta \in \alpha$ implies $\beta \subset \alpha$. According to this definition, the empty set is an ordinal number, which is denoted by 0. Also, $1 = \{0\}$, $2 = \{0, 1\}$, $3 = \{0, 1, 2\}$, $\ldots$ are ordinal numbers. These ordinal numbers, which are finite sets, are called finite ordinal numbers. The finite ordinal numbers are identified with the natural numbers (including 0). The set $\omega = \{0, 1, 2, \ldots\}$ of all natural numbers is also an ordinal number. An ordinal number that is an infinite set, like $\omega$, is called a transfinite ordinal number.

For every well-ordered set $A$, there exists one and only one ordinal number order isomorphic to $A$. This ordinal number is called the ordinal number of $A$. (Throughout this article, lower-case Greek letters denote ordinal numbers.) We also write $\alpha \in \beta$ as $\alpha < \beta$, which defines an ordering of the ordinal numbers. The least ordinal number is 0, and the ordering of the finite ordinal numbers coincides with the usual ordering of the natural numbers. The least transfinite ordinal number is $\omega$. The ordering $\le$, introduced by defining $\alpha \le \beta$ to mean either $\alpha < \beta$ or $\alpha = \beta$, is a linear ordering and, in fact, a well-ordering of the ordinal numbers. Therefore transfinite induction can be applied to ordinal numbers.

For any ordinal number $\alpha$, the set $\alpha' = \{\xi \mid \xi \le \alpha\}$ is also an ordinal number, and is the successor of $\alpha$. There exists at most one
ordinal number that is the †predecessor of $\alpha$. A transfinite ordinal number without a predecessor is called a limit ordinal number, and all the other ordinal numbers are called isolated ordinal numbers. The first limit ordinal number is $\omega$. For any set $A$ of ordinal numbers, $\{\xi \mid \exists \eta\, (\xi < \eta \in A)\}$ is an ordinal number and is $\sup A$, the †supremum of $A$.

C. Sum, Product, and Power

The sum $\alpha + \beta$, the product $\alpha \cdot \beta$ (or $\alpha\beta$), and the power $\alpha^\beta$ of ordinal numbers $\alpha$, $\beta$ are defined by transfinite induction on $\beta$ and have the following properties:

$\alpha + 0 = \alpha$, $\alpha + \beta' = (\alpha + \beta)'$, $\alpha + \gamma = \sup\{\alpha + \xi \mid \xi < \gamma\}$;
$\alpha \cdot 0 = 0$, $\alpha \cdot \beta' = \alpha \cdot \beta + \alpha$, $\alpha \cdot \gamma = \sup\{\alpha \cdot \xi \mid \xi < \gamma\}$;
$\alpha^0 = 1$, $\alpha^{\beta'} = \alpha^\beta \cdot \alpha$, $\alpha^\gamma = \sup\{\alpha^\xi \mid \xi < \gamma\}$.

Here $\gamma$ is a limit ordinal number, and for the power we assume that $\alpha > 0$. The sum and product thus defined satisfy the associative laws $(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma)$, $(\alpha \cdot \beta) \cdot \gamma = \alpha \cdot (\beta \cdot \gamma)$ and the left distributive law $\alpha \cdot (\beta + \gamma) = \alpha \cdot \beta + \alpha \cdot \gamma$; the power satisfies the laws $\alpha^{\beta + \gamma} = \alpha^\beta \cdot \alpha^\gamma$, $\alpha^{\beta \cdot \gamma} = (\alpha^\beta)^\gamma$. If $\alpha$ and $\beta$ are the ordinal numbers of the well-ordered sets $A$ and $B$, respectively, then $\alpha + \beta$ is the ordinal number of the †ordinal sum $A + B$, and $\alpha \cdot \beta$ is the ordinal number of the †ordinal product $BA$.
When $\pi > 1$, any ordinal number $\alpha$ can be written uniquely in the form

$\alpha = \pi^{\beta_1} \cdot \gamma_1 + \pi^{\beta_2} \cdot \gamma_2 + \cdots + \pi^{\beta_n} \cdot \gamma_n$; $\beta_1 > \beta_2 > \cdots > \beta_n \ge 0$, $0 < \gamma_i < \pi$, $1 \le i \le n$,

which is called the $\pi$-adic normal form for $\alpha$; when $\pi = \omega$, it is called Cantor's normal form.
Let $f$ be an ordinal number-valued function of ordinal numbers. We say that $f$ is strictly monotone when $\alpha < \beta$ implies $f(\alpha) < f(\beta)$. If $f$ is strictly monotone, then $\alpha \le f(\alpha)$. We say that $f$ is continuous when $f(\gamma) = \sup\{f(\xi) \mid \xi < \gamma\}$ for each limit ordinal number $\gamma$. A strictly monotone continuous function is called a normal function. If $f$ is a normal function, then for any $\alpha$ there exists a $\beta$ that satisfies $f(\beta) = \beta > \alpha$. In fact, it suffices to define $\beta_n$ ($n < \omega$) by $\beta_0 = f(\alpha + 1)$, $\beta_{n+1} = f(\beta_n)$ and put $\beta = \sup\{\beta_n \mid n < \omega\}$. Since $f(\alpha) = \omega^\alpha$ is a normal function, there exists an $\varepsilon$ that satisfies $\omega^\varepsilon = \varepsilon$. Such an ordinal number $\varepsilon$ is called an $\varepsilon$-number. We say that $\beta$ is cofinal to $\alpha$ when there exists a monotone function $f$ that satisfies $\alpha = \sup\{f(\xi)' \mid \xi < \beta\}$. The first ordinal number that is cofinal to $\alpha$ is called the cofinality of $\alpha$ and is denoted by $\mathrm{cf}(\alpha)$.

D. Cardinal Numbers

Let $M \sim N$ mean that a one-to-one correspondence exists between the two sets $M$ and $N$. An ordinal number $\alpha$ with the property that $\alpha \sim \xi$ implies $\alpha \le \xi$ is called an initial number or a cardinal number.
With the †axiom of choice, it can be shown that for each set $M$ there exists one and only one cardinal number $\alpha$ satisfying $M \sim \alpha$. This unique $\alpha$ is called the cardinality (or cardinal number) of the set $M$ and is denoted by $\overline{\overline{M}}$. All finite ordinal numbers are cardinal numbers, and $\omega$ is the least transfinite cardinal number. There exists one and only one monotone function that maps the class of ordinal numbers onto the class of transfinite cardinal numbers, and it is a normal function. The value of this function corresponding to $\alpha$ is denoted by $\aleph_\alpha$ (aleph alpha) or $\omega_\alpha$. In particular, $\aleph_0 = \omega$, and $\aleph_1$ is both the smallest uncountable cardinal number and the smallest uncountable ordinal number. A finite ordinal number is called an ordinal number of the first number class, and an ordinal number $\alpha$ satisfying $\aleph_0 \le \alpha < \aleph_1$ is called an ordinal number of the second number class. The concept of ordinal number of the third (or higher) number class is defined similarly.

E. Inaccessible Ordinal Numbers

The cofinality $\mathrm{cf}(\alpha)$ of $\alpha$ always satisfies $\mathrm{cf}(\alpha) \le \alpha$. An ordinal number is said to be regular when $\mathrm{cf}(\alpha) = \alpha$ and singular when $\mathrm{cf}(\alpha) < \alpha$. For any ordinal number $\alpha$, $\mathrm{cf}(\alpha)$ is a regular cardinal number; therefore any regular ordinal number is a cardinal number. When $\alpha = \omega_\beta$ is regular and $\beta$ is a limit ordinal number, $\alpha$ is said to be weakly inaccessible. Let $R$ be the set-valued function of ordinal numbers, defined by $R(0) = \emptyset$ and $R(\alpha) = \bigcup\{\mathfrak{P}(R(\xi)) \mid \xi < \alpha\}$ (by †transfinite induction), where $\mathfrak{P}(M)$ denotes the †power set of $M$. A regular ordinal number $\alpha$ is said to be strongly inaccessible when $\alpha > \omega$ and the following condition is satisfied: If $x$, $y$ are a pair of sets such that $x \in R(\alpha)$, $y \subset R(\alpha)$, and there exists a mapping of $x$ onto $y$, then $y \in R(\alpha)$. If a regular ordinal number $\alpha$ is strongly inaccessible, it is weakly inaccessible. A strongly inaccessible ordinal number is usually defined as a regular number $\alpha > \omega$ such that $\beta < \alpha$ implies $\overline{\overline{\mathfrak{P}(\beta)}} < \alpha$. Under the axiom of choice, this definition is equivalent to the one given here. Moreover, under the †generalized continuum hypothesis, strong
inaccessibility and weak inaccessibility are equivalent.

References

[1] N. Bourbaki, Eléments de mathématique, I. Théorie des ensembles, ch. 3, Actualités Sci. Ind., 1243b, second edition, Hermann, 1967; English translation, Theory of sets, Addison-Wesley, 1968.
[2] G. Cantor, Beiträge zur Begründung der transfiniten Mengenlehre II, Math. Ann., 49 (1897), 207-246. (Gesammelte Abhandlungen, Springer, 1932; English translation, Contributions to the founding of the theory of transfinite numbers, Open Court, 1915.)
[3] J. von Neumann, Zur Einführung der transfiniten Zahlen, Acta Sci. Math. Szeged., 1 (1923), 199-208. (Collected works I, Pergamon, 1961.)
Also → references to 381 Sets.

313 (XIII.2)
Ordinary Differential Equations

A. General Remarks

Let $x$ be a real (complex) variable and $y$ a real (complex) function of $x$. Assume that $y = F(x)$ is a differentiable function of class $C^n$ if $x$, $y$ are real, and a holomorphic function if $x$, $y$ are complex. We write $y', y'', \ldots, y^{(n)}$ for the first $n$ derivatives of $y$. A relation among $x, y, y', \ldots, y^{(n)}$,

$$f(x, y, y', \ldots, y^{(n)}) = 0 \tag{1}$$

(which holds identically with respect to $x$), is called an ordinary differential equation for the function $y = F(x)$. Here we assume that the function $f$ in the left-hand side of (1) is a real (complex) function of the $n + 2$ variables $x, y, y', \ldots, y^{(n)}$ and is defined in a given domain of $\mathbf{R}^{n+2}$ ($\mathbf{C}^{n+2}$). Usually we assume further that $f$ has a certain regularity, such as being of class $C^r$ ($r = 0, 1, \ldots, \infty$), †real analytic, or †complex analytic. A function $y = F(x)$ that satisfies (1) is called a solution of (1). To find a solution of (1) is to solve or integrate it. Ordinary differential equations may be contrasted to partial differential equations, which are equations similar to (1) but in which $y$ is a function of two or more variables $x_1, x_2, \ldots$ and which contain the partial derivatives $\partial y/\partial x_1, \partial y/\partial x_2, \ldots$ (→ 320 Partial Differential Equations). Ordinarily, the term differential equation refers to an ordinary or partial differential equation. In this article, since we are concerned only with ordinary differential equations, we omit the word "ordinary." If the left-hand side $f$ of (1) contains $y^{(n)}$ explicitly or $\partial f/\partial y^{(n)} \neq 0$, then we say that the order of (1) is $n$, and if further $f$ is a polynomial in $y, y', \ldots, y^{(n)}$ that is of degree $m$ with respect to $y^{(n)}$, we say that the degree of (1) is $m$. In particular, if $f$ is a linear form in $y, y', \ldots, y^{(n)}$, then (1) is said to be linear. A differential equation that is not linear is said to be nonlinear (→ 252 Linear Ordinary Differential Equations; 291 Nonlinear Problems).
Let $\varphi(x, y, c_1, \ldots, c_n)$ be a function of the $n + 2$ variables $x, y, c_1, \ldots, c_n$ of class $C^r$ in a domain $D$, and let $(x_0, y_0, c_1^0, \ldots, c_n^0) \in D$, $\varphi(x_0, y_0, c_1^0, \ldots, c_n^0) = 0$, and $\varphi_y(x_0, y_0, c_1^0, \ldots, c_n^0) \neq 0$. Then the equation $\varphi(x, y, c_1^0, \ldots, c_n^0) = 0$ defines an †implicit function $y(x)$ of class $C^r$ satisfying the condition $y(x_0) = y_0$. Consider $c_1, \ldots, c_n$ to be constants in $\varphi(x, y, c_1, \ldots, c_n) = 0$ and differentiate $\varphi$ $n$ times with respect to $x$. Then we obtain a system of $n$ equations in the variables $x, y, y', \ldots, y^{(n)}, c_1, \ldots, c_n$. If we can eliminate $c_1, \ldots, c_n$ from these $n$ equations and $\varphi = 0$, then we obtain an $n$th-order differential equation of the form (1). Conversely, a solution of an $n$th-order differential equation can usually be written in the form

$$\varphi(x, y, c_1, \ldots, c_n) = 0, \tag{2}$$

which contains $n$ arbitrary constants $c_1, \ldots, c_n$ (sometimes called integration constants). A solution containing $n$ arbitrary constants of the form (2) of an $n$th-order differential equation is called a general solution, and a solution $\varphi(x, y, c_1^0, \ldots, c_n^0) = 0$ obtained from a general solution $\varphi = 0$ by giving particular values $c_1^0, \ldots, c_n^0$ to the arbitrary constants is called a particular solution. Some equations admit solutions that are not particular solutions. They are called singular solutions (for example, †Clairaut differential equations; → Appendix A, Table 14.I).

B. Systems of Differential Equations

A set of $n$ differential equations containing $n$ unknown functions $y_1, \ldots, y_n$ of a variable $x$ is called a system of ordinary differential equations. Here each equation of the system has a form similar to (1), but each left-hand side contains $y_1, \ldots, y_n$ and their derivatives. A set of $n$ functions $y_1, \ldots, y_n$ of $x$ is called a solution if the functions satisfy the given system of differential equations. The highest order of derivatives in the left-hand sides is called the order of the system of differential equations. We consider most frequently a first-order system of the form

$$y_i' = f_i(x, y_1, \ldots, y_n), \quad i = 1, 2, \ldots, n. \tag{3}$$
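A first-order system of the form (3) is also the shape that most step-by-step numerical integrators take as input. The following is a minimal sketch, not a method from this article: it assumes the classical fourth-order Runge-Kutta step, and the function names `rk4_system` and `example_rhs` as well as the sample right-hand sides $f_1 = y_2$, $f_2 = -y_1$ are hypothetical choices made only for illustration.

```python
import math

def rk4_system(f, x0, y0, x_end, steps):
    """Integrate y_i' = f_i(x, y_1, ..., y_n) from x0 to x_end with classical RK4."""
    h = (x_end - x0) / steps
    x, y = x0, list(y0)
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, [yi + h / 2 * k for yi, k in zip(y, k1)])
        k3 = f(x + h / 2, [yi + h / 2 * k for yi, k in zip(y, k2)])
        k4 = f(x + h, [yi + h * k for yi, k in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        x += h
    return y

def example_rhs(x, y):
    # Hypothetical right-hand sides of (3) with n = 2: f_1 = y_2, f_2 = -y_1.
    return [y[1], -y[0]]

# Initial values y_1(0) = 1, y_2(0) = 0; the exact solution is (cos x, -sin x).
y_at_pi = rk4_system(example_rhs, 0.0, [1.0, 0.0], math.pi, 1000)
```

With these initial values the computed pair at $x = \pi$ should lie close to $(-1, 0)$, the value of the exact solution $(\cos x, -\sin x)$.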
If we put $y = y_1$, $y' = y_2$, $\ldots$, $y^{(n-1)} = y_n$ and solve (1) with respect to $y^{(n)}$ to get $y^{(n)} = f_n(x, y_1, \ldots, y_n)$, then (1) is equivalent to a system of equations of the form (3), where $f_1 = y_2$, $f_2 = y_3$, $\ldots$, $f_{n-1} = y_n$. In an analogous way, a general system of equations can be transformed to a system of the form (3). Therefore (3) is called the normal form of differential equations.

C. The Geometric Interpretation

When $x, y_1, \ldots, y_n$ are real, (3) can be interpreted as follows: Let $I = (a, b)$ be an open interval and $D$ a domain of $\mathbf{R}^n$. Let

$$y_i = \varphi_i(x, c_1, \ldots, c_n), \quad i = 1, 2, \ldots, n, \tag{4}$$

be functions of class $C^1$ defined for $(x, c_1, \ldots, c_n) \in I \times D$, and let $\mathfrak{D}(x_0)$ be the image in the $y_1, \ldots, y_n$-space of $D$ under the mapping $y_i = \varphi_i(x_0, c_1, \ldots, c_n)$ ($i = 1, 2, \ldots, n$) for a fixed $x_0 \in I$. We assume that for each fixed $x_0 \in I$ we have $\partial(\varphi_1, \ldots, \varphi_n)/\partial(c_1, \ldots, c_n) \neq 0$ in $D$. Then for every $x = x_0 \in I$, $c_1, \ldots, c_n$ are considered to be functions of $(y_1, \ldots, y_n)$ defined in a neighborhood of every point $(y_1^0, \ldots, y_n^0)$ of $\mathfrak{D}(x_0)$, and we have $y_i' = \varphi_i'(x, c_1, \ldots, c_n) = f_i(x, y_1, \ldots, y_n)$ ($i = 1, 2, \ldots, n$), i.e., $y_1, \ldots, y_n$ satisfy a system of differential equations of the form (3). On the other hand, (4) represents a family of curves of class $C^1$ in the $x, y_1, \ldots, y_n$-space $\mathbf{R}^{n+1}$ containing $n$ parameters $c_1, \ldots, c_n$, for which $(y_1', \ldots, y_n')$ is the tangent vector (in the terminology of physics, $(y_1', \ldots, y_n')$ gives the speed and the direction of a stationary flow in $\mathbf{R}^{n+1}$ at each point). By solving (3) we find the family of curves of class $C^1$ in $\mathbf{R}^{n+1}$ (in the terminology of physics, we find a stationary flow of which the speed and the direction are given at each point). A solution containing $n$ parameters analogous to (4) is called a general solution of (3), and a solution obtained from a general solution by giving particular values to the $n$ parameters is called a particular solution.
As may be imagined by the interpretation in this section, there exists in general one and only one particular solution passing through the point $(x_0, y_1^0, \ldots, y_n^0)$ for $x_0 \in I$, $(y_1^0, \ldots, y_n^0) \in \mathfrak{D}(x_0)$. The problem of finding this solution, i.e., the solution of (3) for which $y_i(x_0) = y_i^0$ for $x = x_0$, is called the initial value problem (→ 316 Ordinary Differential Equations (Initial Value Problems)).

D. Methods of Integration

We have different methods of solving differential equations. To solve differential equations by a finite number of integrations is called the method of quadrature. This method is useful for some special types of differential equations (→ Appendix A, Table 14.I). S. Lie gave theoretical foundations for this method by using Lie transformation groups (→ 431 Transformation Groups; Appendix A, Table 14.III).
There are many other methods, for example, power series methods (assuming that the solution can be expanded in a power series $\sum a_n (x - a)^n$, substituting the series for $y$ in (1), and finding its coefficients); methods of successive approximation; methods using †Laplace transforms or †Fourier transforms; †perturbation methods; numerical methods; etc. (→ 303 Numerical Solution of Ordinary Differential Equations).
Historically, finding explicit solutions of various kinds of differential equations has been the main object of the theory. Recently, however, the importance of qualitative studies, in particular theorems on the existence and uniqueness of solutions, has been recognized. For example, if a solution with a property A is given, and if the uniqueness of the solution having the property A and the existence of solutions having the properties A and B can be shown, then the given solution necessarily has the property B. In this way, topological and analytic studies of differential equations are applied to find their solutions (→ 314 Ordinary Differential Equations (Asymptotic Behavior of Solutions); 315 (Boundary Value Problems); 316 (Initial Value Problems); 126 Dynamical Systems).

References

[1] W. Kaplan, Ordinary differential equations, Addison-Wesley, 1958.
[2] P. Hartman, Ordinary differential equations, Wiley, 1964.
[3] E. Hille, Lectures on ordinary differential equations, Addison-Wesley, 1968.
Also → references to 316 Ordinary Differential Equations (Initial Value Problems).

314 (XIII.5)
Ordinary Differential Equations (Asymptotic Behavior of Solutions)

A. Linear Differential Equations

A system of linear ordinary differential equations can be written as

$$x' = A(t)x, \tag{1}$$
where $t$ is a real independent variable, $x = (x_1, \ldots, x_n)$ is an $n$-dimensional complex vector function of $t$, and $A(t)$ is an $n \times n$ matrix whose elements are complex-valued functions of $t$. If $A(t)$ is a continuous function of $t$ defined on an open interval $I$, any solution of (1) is continuously differentiable for $t \in I$. The question naturally arises as to how the solutions behave as $t$ approaches either one of the endpoints of $I$; that is, the question of the asymptotic properties of the solutions. The interval $I$ can always be taken to be $0 \le t < \infty$, by applying a suitable transformation of the independent variable if necessary.
The study of the †asymptotic expansions of solutions when the coefficient $A(t)$ is an analytic function of $t$ was initiated by H. Poincaré in 1880. This work was continued by J. Horn, J. C. C. A. Kneser, and others in the direction of removing assumptions on the structure of $A(t)$ and extending the domain where the expansions are valid. The theory has been almost completed by W. J. Trjitzinsky, J. Malmquist, and M. Hukuhara (→ 254 Linear Ordinary Differential Equations (Local Theory)). On the other hand, O. Perron initiated a new direction of research by weakening the regularity conditions on the coefficients. His work was continued by F. Lettenmeyer, R. A. Späth, Hukuhara, and others. The methods used in these two lines of investigation were originally distinct, but Hukuhara established a unified method of treating the problems arising in these two different types of investigations. Furthermore, he succeeded in sharpening those results previously obtained.
Here we assume that $A(t)$ need not be analytic. The following asymptotic properties of a solution $x(t)$ as $t \to \infty$ are considered: (i) boundedness of $\limsup t^{-1} \log|x(t)|$; (ii) boundedness of solution: $\limsup |x(t)| < \infty$; (iii) convergence of solution: $\lim x(t)$; (iv) integrability: $\int^\infty |x(s)|^p\, ds < \infty$, etc. We call $\chi(x) = \limsup t^{-1} \log|x(t)|$ the type number (or Lyapunov characteristic number) of the solution $x(t)$. The fact that all solutions of (1) are bounded is equivalent to the †stability of the solution $x = 0$, and the fact that all solutions of (1) tend to zero as $t \to \infty$ is equivalent to the †asymptotic stability of the solution $x = 0$.

B. Constant Coefficients and Periodic Coefficients

We begin with the particular case of (1), where $A(t)$ is a constant matrix:

$$x' = Ax. \tag{2}$$

To study the asymptotic properties of the solutions of (2), it suffices to transform the matrix $A$ into a †Jordan canonical form, since the structure of the solution space of (2) is completely determined by the Jordan canonical form of $A$. Thus all solutions of (2) are bounded if and only if every eigenvalue of $A$ has a real part not greater than zero, and those with zero real parts are of simple type, that is, the corresponding blocks in the Jordan canonical form are all $1 \times 1$ matrices; all solutions of (2) tend to zero as $t \to \infty$ if and only if every eigenvalue of $A$ has negative real part.
Consider the linear system

$$x' = A_1(t)x, \tag{3}$$

where $A_1(t)$ is a periodic matrix function of period $\omega$. According to †Floquet's theorem, (3) is transformed into a system with constant coefficients by means of a suitable transformation $x = P(t)y$, where $P(t)$ is a nonsingular periodic matrix of period $\omega$. Thus, at least theoretically, the information on the asymptotic behavior of the solutions of the periodic system (3) can be derived from the corresponding theory for the system with constant coefficients (2).

C. Asymptotic Integration

Suppose that $A(t)$ is bounded. Then the type number $\chi(x)$ is finite for any nontrivial solution $x(t)$ of (1), and the number of distinct type numbers does not exceed $n$.
Consider the linear system

$$x' = [A + B(t)]x, \tag{4}$$

where $A$ is a constant matrix and $B(t)$ is a matrix function such that $\int_t^{t+1} \|B(s)\|\, ds \to 0$ as $t \to \infty$. For any nontrivial solution $x(t)$ of (4), the limit $\mu = \lim t^{-1} \log|x(t)|$ exists and is equal to the real part of one of the eigenvalues of $A$. Conversely, if at least one eigenvalue of $A$ has real part $\mu$, then there exists a nontrivial solution $x(t)$ of (4) satisfying $\lim t^{-1} \log|x(t)| = \mu$. Suppose in addition that $B(t) \to 0$ as $t \to \infty$. Let $\mu_1 < \mu_2 < \cdots < \mu_m$ be the real parts of the eigenvalues of $A$. Then there exists a †fundamental system of solutions of (4), $\{x_1(t), \ldots, x_n(t)\}$, such that for any $c_j$, $c_k \neq 0$,

A sharp estimate of the term $o(t)$ was given by Hukuhara.
Next consider the linear system

$$x' = [A(t) + B(t)]x, \tag{5}$$

where the matrices $A(t)$ and $B(t)$ satisfy $\int^\infty \|A'(s)\|\, ds < \infty$ and $\int^\infty \|B(s)\|\, ds < \infty$. Let $\lambda_1(t), \ldots, \lambda_n(t)$ and $\lambda_1, \ldots, \lambda_n$, $\lambda_j = \lim \lambda_j(t)$,
be the eigenvalues of $A(t)$ and $A = \lim A(t)$, respectively. N. Levinson proved the following theorem: Assume that $\lambda_1, \ldots, \lambda_n$ are mutually distinct and $M_{jk}(t) = \mathrm{Re} \int_0^t [\lambda_j(s) - \lambda_k(s)]\, ds$ satisfy either $M_{jk}(t) \to \infty$ as $t \to \infty$ and, for each pair $(j, k)$, $M_{jk}(t_2) - M_{jk}(t_1) > -K$ for $t_1 \le t_2$, or $M_{jk}(t) \to -\infty$ as $t \to \infty$ and $M_{jk}(t_2) - M_{jk}(t_1) < K$ for $t_1 \le t_2$, or $|M_{jk}(t_2) - M_{jk}(t_1)| < K$ for all $t_1$, $t_2$, where $K$ is a positive constant. Then (5) has a fundamental system of solutions $\{x_1(t), \ldots, x_n(t)\}$ such that

$$x_j(t) = \exp\left(\int_0^t \lambda_j(s)\, ds\right) [\xi_j + o(1)], \quad j = 1, \ldots, n,$$

where $\xi_j$ is an eigenvector of $A$ corresponding to $\lambda_j$.

D. Boundedness and Convergence of Solutions

Consider again the linear system (5) satisfying $\int^\infty \|A'(s)\|\, ds < \infty$ and $\int^\infty \|B(s)\|\, ds < \infty$. Suppose that all eigenvalues of $A(t)$ have nonpositive real parts and that the eigenvalues of $A = \lim A(t)$ whose real parts vanish are simple. Then all solutions of (5) are bounded. This result is a generalization, due to L. Cesari, of the so-called Dini-Hukuhara theorem.
In the case of general $A(t)$, it is known that not all solutions of (5) are bounded even if all solutions of (1) tend to zero as $t \to \infty$ and if the matrix $B(t)$ is such that $\int^\infty \|B(s)\|\, ds < \infty$ and $B(t) \to 0$ as $t \to \infty$. However, if $A(t)$ is periodic or satisfies $\liminf \mathrm{Re} \int^t \mathrm{tr}\, A(s)\, ds > -\infty$, then under the assumption that $\int^\infty \|B(s)\|\, ds < \infty$, the boundedness of all solutions of (1) implies the boundedness of all solutions of (5).
The following inequalities often provide useful information about the asymptotic behavior of solutions of (1):

where $\mu[A(t)] = \lim_{h \to +0} [\|I + hA(t)\| - 1]/h$. ($\mu[A(t)]$ was introduced by Lozinskii.) If $\limsup \int^t \mu[A(s)]\, ds < \infty$, then all solutions of (1) are bounded; if $\lim \int^t \mu[A(s)]\, ds$ exists, then for every solution $x(t)$ of (1), $|x(t)|$ tends to a finite limit as $t \to \infty$; and if $\lim \int^t \mu[A(s)]\, ds = -\infty$, then all solutions of (1) tend to zero as $t \to \infty$. It can be shown that every solution of (1) tends to a finite limit as $t \to \infty$, provided that $\int^\infty \|A(s)\|\, ds < \infty$.
If all solutions of (1) are bounded, then $\limsup \mathrm{Re} \int^t \mathrm{tr}\, A(s)\, ds < \infty$. If $\liminf \mathrm{Re} \int^t \mathrm{tr}\, A(s)\, ds > -\infty$, then (1) has a solution $x(t)$ with the property that $\limsup |x(t)| > 0$. When $\lim |x(t)|$ exists for every solution $x(t)$ of (1), if there exists a nontrivial solution $x(t)$ of (1) such that $\lim x(t) = 0$, then $\lim \mathrm{Re} \int^t \mathrm{tr}\, A(s)\, ds = -\infty$, but if there is no such solution, then $\mathrm{Re} \int^t \mathrm{tr}\, A(s)\, ds$ is bounded.

E. Nonlinear Differential Equations

Consider a system of nonlinear differential equations of the form

$$x' = Ax + f(t, x), \tag{6}$$

where $A$ is an $n \times n$ constant matrix and $f(t, x)$ is an $n$-vector function that is continuous for $t \ge 0$, $|x| < \Delta$, and that satisfies $f(t, 0) = 0$. Suppose that $f(t, x)/|x| \to 0$ as $|x| \to 0$ and $t \to \infty$. Then for every eventually nontrivial solution $x(t)$ of (6) that tends to zero as $t \to \infty$, $\mu = \lim t^{-1} \log|x(t)|$ exists and equals the real part of one of the eigenvalues of $A$. Conversely, if at least one eigenvalue of $A$ has real part $\mu < 0$, then there exists a solution $x(t)$ of (6) such that $\lim t^{-1} \log|x(t)| = \mu$. Suppose that $f(t, x)/|x| \to 0$ as $|x| \to 0$ uniformly with respect to $t$. Then if all eigenvalues of $A$ have negative real parts, the zero solution $x(t) = 0$ of (6) is asymptotically stable, and if $A$ has an eigenvalue whose real part is positive, then the zero solution of (6) is †unstable. Suppose that $f_x(t, x) = (\partial f_j(t, x)/\partial x_k) \to 0$ as $|x| \to 0$ uniformly with respect to $t$. In this case, if $A$ is a matrix such that its $k$ eigenvalues have negative real parts and the other $n - k$ eigenvalues have positive real parts, then there exists a $k$-dimensional manifold $S$ containing the origin with the following property: For $t_0$ sufficiently large, any solution $x(t)$ of (6) tends to zero as $t \to \infty$, provided that $x(t_0) \in S$, and if $x(t_0) \notin S$, $x(t)$ cannot remain in the vicinity of the origin no matter how close $x(t_0)$ is to the origin.
In the nonlinear system

$$x' = F(t, x), \tag{7}$$

suppose that $F(t, x)$ is of period $\omega$ with respect to $t$ and has continuous partial derivatives with respect to $x$. Suppose, moreover, that (7) has a solution $p(t)$ of period $\omega$. If all the †characteristic exponents of the †variational system of (7) with respect to $p(t)$, $y' = F_x(t, p(t))y$ with $F_x(t, x) = (\partial F_j(t, x)/\partial x_k)$, have negative real parts, then the periodic solution $p(t)$ is asymptotically stable. If an autonomous system $x' = F(x)$ has a periodic solution $p(t)$ and the corresponding variational system $y' = F_x(p(t))y$ has $n - 1$ characteristic exponents with negative real parts, then there exists an $\varepsilon > 0$ such that for any solution $x(t)$ satisfying $|x(t_1) - p(t_0)| < \varepsilon$ for some $t_0$ and $t_1$, we have $|x(t) - p(t + c)| \to 0$ as $t \to \infty$ for a suitable choice of $c$ (asymptotic phase).
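The eigenvalue test for the zero solution of (6) can be sketched in code for the case $n = 2$. This is only an illustration under stated assumptions: the perturbation $f(t, x)$ is taken to satisfy $f(t, x)/|x| \to 0$ as $|x| \to 0$ uniformly in $t$, the matrix is real and $2 \times 2$, and the function names `eigenvalues_2x2` and `classify_zero_solution` are hypothetical.

```python
import cmath

def eigenvalues_2x2(a):
    """Eigenvalues of a 2 x 2 matrix, computed from its trace and determinant."""
    (a11, a12), (a21, a22) = a
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

def classify_zero_solution(a):
    """Apply the real-part test to the linear part A of x' = Ax + f(t, x)."""
    real_parts = [lam.real for lam in eigenvalues_2x2(a)]
    if all(r < 0 for r in real_parts):
        return "asymptotically stable"
    if any(r > 0 for r in real_parts):
        return "unstable"
    # Some real part vanishes and none is positive: the test is silent.
    return "not decided by this test"

# Hypothetical example: damped pendulum x1' = x2, x2' = -sin(x1) - 0.5*x2,
# whose linear part at the origin is the matrix below.
pendulum = [[0.0, 1.0], [-1.0, -0.5]]
verdict = classify_zero_solution(pendulum)
```

For the pendulum matrix both eigenvalues have real part $-0.25$, so the sketch reports asymptotic stability; a matrix with purely imaginary eigenvalues falls outside the two cases covered by the statement above, which is why the third return value exists.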
F. Scalar Differential Equations

The aforementioned results can be specialized to the case of higher-order scalar (or single) ordinary differential equations. Much sharper results can often be derived through direct analysis of scalar equations themselves. In particular, detailed and deep results have been obtained for second-order linear differential equations of the form

$$x'' + q(t)x = 0, \tag{8}$$

e.g., †Mathieu's equation.
If $\int^\infty s|q(s)|\, ds < \infty$, then (8) has a fundamental system of solutions $\{x_1(t), x_2(t)\}$ satisfying, for $t \to \infty$,

$x_1(t) = 1 + o(1)$, $x_2(t) = t[1 + o(1)]$,
$x_1'(t) = t^{-1} o(1)$, $x_2'(t) = 1 + o(1)$;

if $\int^\infty |q(s) + 1|\, ds < \infty$, then (8) has a fundamental system of solutions satisfying

$x_1(t) = e^{t}[1 + o(1)]$, $x_2(t) = e^{-t}[1 + o(1)]$,
$x_1'(t) = e^{t}[1 + o(1)]$, $x_2'(t) = -e^{-t}[1 + o(1)]$;

and if $\int^\infty |q(s) - 1|\, ds < \infty$, then (8) has a fundamental system of solutions

$x_1(t) = e^{it}[1 + o(1)]$, $x_2(t) = e^{-it}[1 + o(1)]$,
$x_1'(t) = ie^{it}[1 + o(1)]$, $x_2'(t) = -ie^{-it}[1 + o(1)]$.

Suppose that $q(t) \to c > 0$ as $t \to \infty$ and $\int^\infty |q'(s)|\, ds < \infty$. Then $x(t)$ and $x'(t)$ are bounded for every solution $x(t)$ of (8). The same is true if $q(t)$ is a positive periodic function of period $\omega$ such that $\omega \int_0^\omega q(s)\, ds \le 4$. If $q(t)$ is negative, then (8) always has both bounded and unbounded monotone solutions.
The number of linearly independent solutions $x(t)$ of (8) satisfying $\int^\infty |x(s)|^2\, ds < \infty$ plays an important role in †eigenvalue problems. It is known that the ordinary differential operator $l[x] = x'' + q(t)x$ is of †limit point type at infinity if there exist a positive function $M(t)$ and positive constants $k_1$, $k_2$ such that $q(t) \le k_1 M(t)$, $|M'(t) M^{-3/2}(t)| \le k_2$, and $\int^\infty M^{-1/2}(s)\, ds = \infty$, and that $l[x]$ is of †limit circle type at infinity if $q(t) > 0$, $\int^\infty q^{-1/2}(s)\, ds < \infty$, and $\int^\infty |[q^{-3/2}(s) q'(s)]' + (1/4) q^{-5/2}(s) q'^2(s)|\, ds < \infty$.
Finally consider the nonlinear equation

$$x'' + q(t)|x|^\gamma\, \mathrm{sgn}\, x = 0, \tag{9}$$

where $\gamma$ is a positive constant and $q(t)$ is a positive function. If $q'(t) \ge 0$, then all solutions of (9) are bounded; if either $q'(t) \ge 0$ and $\lim q(t) < \infty$ or $q'(t) \le 0$ and $\lim q(t) > 0$, then all solutions $x(t)$ of (9) are bounded together with their derivatives $x'(t)$; and if $q'(t) \ge 0$, $\lim q(t) = \infty$ and either $q''(t) \ge 0$ or $q''(t) \le 0$, then all solutions of (9) converge to zero as $t \to \infty$.
Equation (9) is said to be oscillatory if every solution of (9) that is continuable to $t = \infty$ has arbitrarily large zeros. If (9) is oscillatory and if $q_1(t) \ge q(t)$, then the equation $x'' + q_1(t)|x|^\gamma\, \mathrm{sgn}\, x = 0$ is also oscillatory. When $\gamma = 1$, (9) is oscillatory if $q(t) \ge (1 + \varepsilon)/4t^2$ for some $\varepsilon > 0$, and is not oscillatory if $q(t) \le 1/4t^2$. A necessary and sufficient condition for equation (9) with $\gamma \neq 1$ to be oscillatory is as follows: $\int^\infty s q(s)\, ds = \infty$ if $\gamma > 1$; $\int^\infty s^\gamma q(s)\, ds = \infty$ if $0 < \gamma < 1$.

References

[1] M. Hukuhara, Sur les points singuliers des équations différentielles linéaires; domaine réel, J. Fac. Sci. Hokkaido Univ., (I) 2 (1934), 13-88.
[2] R. Bellman, Stability theory of differential equations, McGraw-Hill, 1953.
[3] L. Cesari, Asymptotic behavior and stability problems in ordinary differential equations, Springer, third edition, 1970.
[4] W. A. Coppel, Stability and asymptotic behavior of differential equations, Heath, 1965.
[5] P. Hartman, Ordinary differential equations, Wiley, 1964.
[6] E. A. Coddington and N. Levinson, Theory of ordinary differential equations, McGraw-Hill, 1955.
[7] J. S. W. Wong, On the generalized Emden-Fowler equation, SIAM Rev., 17 (1975), 339-360.

315 (XIII.4)
Ordinary Differential Equations (Boundary Value Problems)

A. General Remarks

Consider the differential equation in the real variable $x$

$$f(x, y, y', \ldots, y^{(n)}) = 0. \tag{1}$$

Let $a_1, \ldots, a_k$ be points in an interval $I \subset \mathbf{R}$ and consider several relations between $nk$ values $y(a_i), y'(a_i), \ldots, y^{(n-1)}(a_i)$, $i = 1, \ldots, k$. The problem of finding solutions of (1) satisfying these relations is called a boundary value problem of (1), and the relations considered are called boundary conditions. When $k = 2$ and $a_1$, $a_2$ are the endpoints of $I$, the problem, called a two-point boundary value problem, has been a main subject of study. We can consider boundary value problems in the same way for systems of differential equations.
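A two-point boundary value problem can be approached numerically by turning it into a family of initial value problems. The sketch below uses the shooting method, a technique that this article does not discuss and that is named here only as an assumption of the example: the unknown initial slope is adjusted by secant iteration until the second boundary condition is met. The sample problem $y'' = -y$, $y(0) = 0$, $y(\pi/2) = 1$ and the function names `integrate` and `shoot` are hypothetical.

```python
import math

def integrate(slope, n_steps=2000):
    """RK4 for y'' = -y on [0, pi/2] with y(0) = 0, y'(0) = slope; returns y(pi/2)."""
    h = (math.pi / 2) / n_steps
    y, v = 0.0, slope
    for _ in range(n_steps):
        k1y, k1v = v, -y
        k2y, k2v = v + h / 2 * k1v, -(y + h / 2 * k1y)
        k3y, k3v = v + h / 2 * k2v, -(y + h / 2 * k2y)
        k4y, k4v = v + h * k3v, -(y + h * k3y)
        y += h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return y

def shoot(target, s0=0.0, s1=2.0, tol=1e-12, max_iter=50):
    """Secant iteration on the unknown initial slope y'(0)."""
    f0, f1 = integrate(s0) - target, integrate(s1) - target
    for _ in range(max_iter):
        if abs(f1) < tol or f1 == f0:
            break
        # Secant update; the right-hand side uses the old s0, s1, f0, f1.
        s0, s1, f0 = s1, s1 - f1 * (s1 - s0) / (f1 - f0), f1
        f1 = integrate(s1) - target
    return s1

# Boundary conditions y(0) = 0 and y(pi/2) = 1; the exact solution is sin x.
slope = shoot(1.0)
```

Because the sample equation is linear, the boundary value $y(\pi/2)$ depends linearly on the slope and the secant step lands on the answer almost immediately; the recovered slope should be close to 1, the derivative of $\sin x$ at 0. For nonlinear equations the same loop may need several iterations and a reasonable starting bracket.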
B. Linear Differential Equations where G(x, 5) = G(x, <, 0). For I


function G(x, 5, A) of (5) and th
C0nsider.a linear ordinary differential opera- tion G*(x, &II) of L*[y] =ly,
tor L defined by have the relation G(x, 5, A) = C?
the assumption that (5) is self-
the following four proposition
w,here pk(x) is a complex-valued function of has only real eigenvalues whit
Aclass C”-’ defined on a compact interval or countably infinite discrete s
a<x<bandp,(x)#Oforanyxe[a,b]. We functions corresponding to tw
define a system of linear boundary operators values are orthogonal to each
u l,...,U,,,by is an torthonormal set of eiger
that no eigenfunction is linear
Ui[Y] = f M,y’j-“(a)+ t Iv,y”-l’(b). of {rp,}, then the system {cp,} i;
j=l j=l
orthonormal set in the Hilbert
consisting of functions that arc
Given a function f(x) and complex constants
grable on (a, b), and hence for
yl, . . . , y,, the linear boundary value problem
defined by expansion f=a,cp,+a,~,+.
L,(a, b) the tParseva1 equality
LCYl=f(x), VCYI=Yi, i=1,...9n, (2) if S is a function of class C” sal
0, then the Fourier expansion
is a two-point boundary value problem. When
uniformly to ,f on [a, b].
f(x) = 0, yi = 0, i = 1, . . . , n, the problem is called
The boundary value problet
homogeneous; otherwise it is called inhomoge-
order equation
neous. Let L*[y] be a formally tadjoint dif-
ferential operator of L[y]. A set of m* linear (P(4Y’)’ + M4 + W))Y = 0,
boundary conditions 7JF [y] = 0, i = 1, . . . , m*,
is said to be an adjoint boundary condition of aY(4 + PJY’(4= 0, YY@)+ 6Y’
U,[y]=O,i=l,..., m, if for any function y of is called a Sturm-Liouville pro1
class C” satisfying Ui [ y] = 0, i = 1, . . . , m, and r are real-valued functions deli
any function y* of class C” satisfying Vi* [ y*] and CI, fi, y, are real constants. I
=O,i=l,..., m*, we have J,bL[y]pdx= q, r are continuous and p(x) > (
fiyL*[y*]dx. The boundary value problem [a, b]. Then (i) the eigenvalues
ing sequence tending to +co; (
L*cYl=o, ui” CYI =o, i=l,...,m*, (3)
tion q,(x) associated with A, h,
is said to be an adjoint boundary value problem zeros in a <x <b, and there ex:
of adjacent zeros of q,,(x) a zero (
(iii) the set of eigenfunctions is
uYl=o, YCYI =O* i=l ,...,m. (4) set on [a, b] with weight functi

We say that the problem (4) is self-adjoint if


L[y] = L*[y] and the conditions Ui[y] =O,
i=l , . . . , m, are equivalent to the conditions
U,*[y]=O,i=l,..., m*. When the coefficients pO, . . !
The boundary value problem containing a defined in an open interval -0
parameter 1 b < co and pk is of class Cnmk, L
natural way operators in the H
LCYI = lY, uiCYl=“, i=l ,...,n, (5) consisting of functions that are
grable in a XX < b, and the gen
admits nontrivial solutions only for special
based on operator theory in H
values of 1. Such values of 1 are called the
(- 390 Spectral Analysis of 01
eigenvalues (or proper values) of (5), and the
corresponding solutions $0 are called the
eigenfunctions (or proper functions) of (5). For
any value of 1 that is not an eigenvalue, there C. Nonlinear Differential Equ:
exists a unique function G(x, 5, A) such that the
conditions L[y] =,ly+J Ui[y] =0 are equiva- Boundary value problems for 1
lent to y = c G(x, 5, l)f(<)d<. The function ferential equations are very dif
G(x, 5, I) is called the Green’s function of (5). If sults are obtained only for equ
1= 0 is not an eigenvalue, then (5) is equivalent form.
to Consider, for example, the SI
equation
Y(X) = a * W, 5M5) dt,
s Ia Y” =f(x, Y, Y’)
and boundary conditions $y(a) = A$, $y(b) = B$. The following theorem has been proved: Suppose that $f(x, y, y')$ is continuous for $a \le x \le b$, $\underline{\omega}(x) \le y \le \overline{\omega}(x)$, $-\infty < y' < +\infty$, and $|f(x, y, y')| \le M(1 + y'^2)$; $\underline{\omega}''(x) > f(x, \underline{\omega}(x), \underline{\omega}'(x))$ and $\overline{\omega}''(x) < f(x, \overline{\omega}(x), \overline{\omega}'(x))$ for $a \le x \le b$; and $\underline{\omega}(a) \le A \le \overline{\omega}(a)$ and $\underline{\omega}(b) \le B \le \overline{\omega}(b)$. Then (6) admits a solution $y(x)$ such that $y(a) = A$, $y(b) = B$, and $\underline{\omega}(x) \le y(x) \le \overline{\omega}(x)$ for $a \le x \le b$. If in addition $f$ is an increasing function with respect to $y$, the solution is unique. Moreover, under suitable conditions, the solution is obtainable by the method of successive approximations.

The boundary value problem

$$y''' + 2yy'' + 2\lambda(k^2 - y'^2) = 0, \quad y(0) = y'(0) = 0, \quad y'(x) \to k \ (x \to \infty),$$

where $\lambda$ and $k$ are constants, appears in the theory of fluid dynamics. It is known that if $\lambda > 0$, the problem has a solution, and that if $0 < \lambda < 1$, the solution is unique.

Consider the system of differential equations

$$y_j' = f_j(x, y_1, \ldots, y_n), \quad j = 1, \ldots, n.$$

The problem of finding a solution such that $y_j(a_j) = b_j$, $j = 1, \ldots, n$, called Hukuhara's problem, reduces to the initial value problem when the $a_j$ coincide. The problem of solving

$$y^{(n)} = f(x, y, y', \ldots, y^{(n-1)}), \quad y(a_j) = b_j, \quad j = 1, \ldots, n,$$

is reduced to Hukuhara's problem by a suitable change of variables. The following result is a generalization of †Perron's theorem: Let $\underline{\omega}_j(x)$, $\overline{\omega}_j(x)$, $j = 1, \ldots, n$, be continuous and right and left differentiable functions and $\underline{\omega}_j(x) \le \overline{\omega}_j(x)$ for $\alpha < x < \beta$. Suppose that the $f_j(x, y_1, \ldots, y_n)$ are continuous for $\alpha \le x \le \beta$ and $\underline{\omega}_k(x) \le y_k \le \overline{\omega}_k(x)$, $k = 1, \ldots, n$; satisfy $(x - a_j)(D^{\pm}\overline{\omega}_j(x) - f_j(x, y_1, \ldots, y_n)) \ge 0$ for $y_j = \overline{\omega}_j(x)$ and $\underline{\omega}_k(x) \le y_k \le \overline{\omega}_k(x)$, $k \ne j$; satisfy $(x - a_j)(D^{\pm}\underline{\omega}_j(x) - f_j(x, y_1, \ldots, y_n)) \le 0$ for $y_j = \underline{\omega}_j(x)$ and $\underline{\omega}_k(x) \le y_k \le \overline{\omega}_k(x)$, $k \ne j$; and satisfy $\underline{\omega}_j(a_j) \le b_j \le \overline{\omega}_j(a_j)$. Then there exists a solution $y(x)$ such that $y_j(a_j) = b_j$ and $\underline{\omega}_j(x) \le y_j(x) \le \overline{\omega}_j(x)$. This theory was applied by M. Hukuhara to the study of singular points of ordinary differential equations.

References

[1] E. A. Coddington and N. Levinson, Theory of ordinary differential equations, McGraw-Hill, 1955.
[2] P. Hartman, Ordinary differential equations, Wiley, 1964.
[3] E. Hille, Lectures on ordinary differential equations, Addison-Wesley, 1969.
[4] M. A. Naimark (Neumark), Linear differential operators I, II, Ungar, 1967, 1968. (Original in Russian, 1954.)
[5] N. Dunford and J. T. Schwartz, Linear operators II, Interscience, 1963.
[6] B. M. Levitan and I. S. Sargsyan (Sargsjan), Introduction to spectral theory: Self-adjoint ordinary differential operators, Amer. Math. Soc. Transl. of Math. Monographs, 1975. (Original in Russian, 1970.)

316 (XIII.3)
Ordinary Differential Equations (Initial Value Problems)

A. General Remarks

Consider a system of ordinary differential equations

$$\frac{dy_i}{dx} = f_i(x, y_1, \ldots, y_n), \quad i = 1, \ldots, n. \tag{1}$$

A. L. Cauchy first gave a rigorous proof for the existence and uniqueness of solutions: If $f_i$, $i = 1, \ldots, n$, and their derivatives $\partial f_i/\partial y_k$ are continuous in a neighborhood of a point $(a, b_1, \ldots, b_n)$, then there exists a unique solution of (1) satisfying the conditions $y_i(a) = b_i$, $i = 1, \ldots, n$. These conditions are called initial conditions, and the values $a, b_1, \ldots, b_n$ initial values. The problem of finding solutions that satisfy initial conditions is called an initial value problem (or Cauchy problem). If we consider $(x, y_1, \ldots, y_n)$ as the coordinates of a point in the $(n+1)$-dimensional space $\mathbf{R}^{n+1}$, then a solution of (1) represents a curve in this space called a solution curve (or integral curve). The statement that a solution satisfies initial conditions $y_i(a) = b_i$, $i = 1, \ldots, n$, means that the integral curve represented by it passes through the point $(a, b_1, \ldots, b_n)$.

Since, in general, we can transform a differential equation of higher order into a system of differential equations of the form (1) by introducing new dependent variables, all definitions and theorems concerning the system (1) can be interpreted as applying to a higher-order equation. For example, for the equation $y^{(n)} = f(x, y, y', \ldots, y^{(n-1)})$ the conditions $y(a) = b$, $y'(a) = b'$, $\ldots$, $y^{(n-1)}(a) = b^{(n-1)}$ constitute initial conditions, and the values $a, b, b', \ldots, b^{(n-1)}$ are initial values. If $f$ and its derivatives $\partial f/\partial y^{(k)}$ are continuous, then there exists a unique solution satisfying given initial conditions.

Suppose that the $f_i$ are continuous. Then a system of functions $(y_1(x), \ldots, y_n(x))$ is a solu-

tion of (1) if and only if

$$y_i(x) = b_i + \int_a^x f_i(x, y_1(x), \ldots, y_n(x))\,dx, \quad i = 1, \ldots, n.$$

When the $f_i$ are not continuous, we define $(y_1(x), \ldots, y_n(x))$ to be a solution of (1) for the initial value problem $y_i(a) = b_i$ if $(y_1(x), \ldots, y_n(x))$ satisfies the integral equation just given.

We use the vectorial notation: $y = (y_1, \ldots, y_n)$, $f = (f_1, \ldots, f_n)$ together with $\|y\|^2 = y_1^2 + \cdots + y_n^2$. The equations (1) are then written as the single equation

$$y' = f(x, y).$$

B. Equations in the Real and Complex Domains

We state main theorems for differential equations in the real domain in Sections C-F and in the complex domain in Section G.

C. Existence Theorems

Suppose that $f(x, y)$ is continuous for $|x - a| \le r$ and $\|y - b\| \le \rho$, and that $\|f(x, y)\| \le M$ there. Then equation (1) admits a solution satisfying $y(a) = b$ and defined in an interval $|x - a| \le \min(r, \rho/M)$ (existence theorem). There are two methods of proving this theorem, one using †Cauchy polygons and one using †fixed-point theorems for function spaces. From this theorem we deduce that if $f(x, y)$ is continuous in a domain $D$ of $\mathbf{R}^{n+1}$, then there exists a solution curve passing through any point of $D$.

Let $y = \varphi_1(x)$ and $y = \varphi_2(x)$ be solutions of (1) defined in the intervals $I_1$ and $I_2$, respectively. If $I_1 \subset I_2$ and $\varphi_1(x) = \varphi_2(x)$ for $x \in I_1$, we say that $\varphi_2$ is a prolongation or extension of $\varphi_1$. Given a solution of (1), there exists a nonextendable solution that is an extension of the solution. The solution curve of a nonextendable solution tends to the boundary of $D$ as $x$ tends to any one of the ends of its interval of definition.

D. Uniqueness Theorems

Continuity does not imply uniqueness of the solution. If (1) admits at most one solution satisfying a given condition, we call this condition a uniqueness condition. Various kinds of uniqueness theorems, which state uniqueness conditions, are known.

The Lipschitz condition:

$$\|f(x, y) - f(x, z)\| \le L\|y - z\|, \quad L \ge 0 \text{ a constant},$$

is one of the simplest. When $f$ is continuous and satisfies the Lipschitz condition, the method of successive approximation, initiated by C. E. Picard, is often used to prove the existence of solutions. This method is as follows: We choose a suitable function, for example $y_0(x) = b$, and then define $y_k(x)$, $k = 1, 2, \ldots$, recursively by $y_k(x) = b + \int_a^x f(x, y_{k-1}(x))\,dx$. Then $\{y_k(x)\}$ is uniformly convergent, and its limit is a solution of (1) satisfying $y(a) = b$.

Assuming the continuity of $f$, H. Okamura gave a necessary and sufficient condition for uniqueness: Suppose that $f$ is continuous in $D$. Then a necessary and sufficient condition for there to exist a unique solution curve of (1) going from any point of $D$ to the right is that there exist a $C^1$-function $\varphi(x, y, z)$ defined for $(x, y, z)$ such that $(x, y)$ and $(x, z) \in D$ and satisfying the conditions $\varphi(x, y, z) = 0$ for $y = z$, $\varphi(x, y, z) > 0$ for $y \ne z$, and

$$\frac{\partial \varphi}{\partial x} + \sum_{j=1}^{n} \frac{\partial \varphi}{\partial y_j} f_j(x, y) + \sum_{j=1}^{n} \frac{\partial \varphi}{\partial z_j} f_j(x, z) \le 0.$$

E. Perron's Theorem

Consider the scalar equation $y' = f(x, y)$. We have Perron's theorem: Let $\underline{\omega}(x)$ and $\overline{\omega}(x)$ be continuous functions that are right differentiable in $\alpha \le x < \beta$ and satisfy $\underline{\omega}(x) \le \overline{\omega}(x)$, and let $f$ be a continuous function defined on $D$: $\alpha \le x < \beta$, $\underline{\omega}(x) \le y \le \overline{\omega}(x)$. Suppose that $D^+\underline{\omega}(x) \le f(x, \underline{\omega}(x))$ and $D^+\overline{\omega}(x) \ge f(x, \overline{\omega}(x))$. ($D^+\omega$ denotes the †right derivative of $\omega$.) Then for any $(a, b) \in D$ there exists a solution defined on $a \le x < \beta$ and satisfying $y(a) = b$. The fact that the interval of definition is $a \le x < \beta$ can be expressed by saying that if we denote the set $-\infty < x < \beta$, $|y| < \infty$ by $\Omega$, then $D$ is closed in $\Omega$ and there exists, among solution curves going from a point in $D$ to the right, a curve that reaches the boundary of $\Omega$.

Perron's theorem was generalized by M. Hukuhara and M. Nagumo. Let $\Omega$ be an open set in $\mathbf{R}^{n+1}$, $D$ a closed set in $\Omega$, and $f$ a continuous function in $D$. A necessary and sufficient condition for (1) to admit a solution curve going from any point $(a, b)$ in $D$ to the right is that there exist a sequence of points in $D$, $\{(a_k, b_k)\}$, such that $a_k \downarrow a$ and $(b_k - b)/(a_k - a) \to f(a, b)$. Moreover, every solution curve is prolonged to the right to the boundary of $\Omega$.

Let $S(y)$ be a continuous †subadditive and positively homogeneous function and $\omega(x)$ a function continuous and right differentiable on $\alpha \le x < \beta$. A sufficient condition for $D$: $\alpha \le x < \beta$, $S(y) \le \omega(x)$ to possess the property in the statement of Perron's theorem is given by $D^+\omega(x) \ge S(f(x, y))$ for $\|y\| = \omega(x)$.

A continuous function $\omega(x)$ is said to be a

right majorizing function of (1) with respect to $S(y)$ if for any solution $\varphi(x)$, $S(\varphi(a)) \le \omega(a)$ implies $S(\varphi(x)) \le \omega(x)$ for $x > a$ if both $S(\varphi(x))$ and $\omega(x)$ are defined. In order for $\omega(x)$ to be a right majorizing function, it suffices that $D^+\omega(x) > S(f(x, y))$ for $\|y\| = \omega(x)$. A function satisfying this inequality is called a right superior function of (1) with respect to $S(y)$. If $F(x, S(y)) > S(f(x, y))$, then any solution of $y' = F(x, y)$ is a right superior function of (1). Theorems stating such facts are called comparison theorems.

If (1) has a unique solution, the condition $D^+\omega(x) \ge S(f(x, y))$ for $S(y) = \omega(x)$ implies that $\omega(x)$ is a right majorizing function of (1). Conversely, we can derive from comparison theorems general uniqueness theorems, one of which we state. Suppose that $G(x, y)$ is continuous for $\alpha < x < \beta$ and $0 \le y \le r(x)$; $G(x, 0) = 0$; a solution of $y' = G(x, y)$ such that $y = o(r(x))$ as $x \to \alpha + 0$ vanishes identically; and finally that $S(f(x, y_1) - f(x, y_2)) \le G(x, S(y_1 - y_2))$. Then for two solutions $\varphi_1$, $\varphi_2$ of (1) such that $S(\varphi_1 - \varphi_2) = o(r(x))$ as $x \to \alpha + 0$, we have $\varphi_1 \equiv \varphi_2$. Assuming that $f$ is continuous at $(a, b)$ and taking $y/(x - a)$ as $G$, we obtain Nagumo's condition $(x - a)\,S(f(x, y_1) - f(x, y_2)) \le S(y_1 - y_2)$.

G. Peano proved the following theorem: With the same notation and assumption as in Perron's theorem, there exist a maximum solution $\overline{\varphi}$ and a minimum solution $\underline{\varphi}$ of $y' = f(x, y)$ such that $y(a) = b$ for $\underline{\omega}(a) \le b \le \overline{\omega}(a)$, and such that there exists a solution curve passing through any point in $a \le x < \beta$, $\underline{\varphi}(x) \le y \le \overline{\varphi}(x)$. This theorem was extended by Hukuhara as follows. Suppose that $f(x, y)$ is continuous and bounded in $D$: $\alpha \le x \le \beta$, $\|y\| < \infty$. Let $C$ be a †continuum in $D$, and let $\mathfrak{F}(C)$ denote the set of solutions intersecting $C$. Then $\mathfrak{F}(C)$ is a continuum of the †function space $C([\alpha, \beta])$. From this theorem we can deduce the Kneser-Nagumo theorem, which says that the intersection of the set of points belonging to the members of $\mathfrak{F}(C)$ and a hyperplane $x = \xi$ is a continuum. It was proved by Hukuhara that if $C$ is in the hyperplane $x = \alpha$, then (1) admits a solution connecting the two hyperplanes $x = \alpha$ and $x = \beta$ and passing through the boundary of the set of points belonging to the members of $\mathfrak{F}(C)$.

F. Equations Containing Parameters

We assume uniqueness of the solution of

$$y' = f(x, y, \lambda), \quad \lambda = (\lambda_1, \ldots, \lambda_m), \tag{2}$$

where $f$ is a continuous function of $(x, y, \lambda)$. Let $\varphi(x, a, b, \lambda)$ denote the solution of (2) satisfying $y(a) = b$. Then $\varphi(x, a, b, \lambda)$ is continuous with respect to $(x, a, b, \lambda)$ in its region of definition.

If the derivatives $\partial f/\partial y_k$ are also continuous, then $\varphi(x, a, b, \lambda)$ is a continuously differentiable function of $(x, b)$; $z_{jk} = \partial\varphi_j/\partial b_k$, $j = 1, \ldots, n$, satisfy the system of linear ordinary differential equations and the initial condition

$$\frac{dz_{jk}}{dx} = \sum_{l=1}^{n} \Bigl(\frac{\partial f_j}{\partial y_l}\Bigr) z_{lk}, \quad z_{jk}(a) = \delta_{jk},$$

and $z_j = \partial\varphi_j/\partial a$, $j = 1, \ldots, n$, satisfy the same system with the initial condition $z_j(a) = -f_j(a, b, \lambda)$, where $(\partial f_j/\partial y_l)$ means $(\partial f_j/\partial y_l)(x, \varphi(x, a, b, \lambda), \lambda)$. If $f$ further admits continuous derivatives $\partial f/\partial\lambda_l$, then $\varphi(x, a, b, \lambda)$ is continuously differentiable with respect to $\lambda_l$, and moreover, $w_{jl} = \partial\varphi_j/\partial\lambda_l$, $j = 1, \ldots, n$, satisfy the system

$$\frac{dw_{jl}}{dx} = \sum_{k=1}^{n} \Bigl(\frac{\partial f_j}{\partial y_k}\Bigr) w_{kl} + \Bigl(\frac{\partial f_j}{\partial \lambda_l}\Bigr).$$

These differential systems are called the variational equations of (1).

C. Carathéodory proved the existence of solutions of (1) under the less restrictive assumption that $f$ is continuous with respect to $y$ for any fixed $x$ and measurable with respect to $x$ for any fixed $y$.

Suppose that $f$ is continuous and satisfies a Lipschitz condition. Let $z(x)$ be a function such that $z(a) = b$ and $\|z'(x) - f(x, z(x))\| \le \varepsilon(x)$, and let $y(x)$ be a solution of (1) such that $y(a) = b$. Then we obtain

$$\|z(x) - y(x)\| \le e^{L|x - a|} \left| \int_a^x \varepsilon(x)\,e^{-L|x - a|}\,dx \right|,$$

which gives approximate solutions of (1).

G. Equations in the Complex Domain

We assume that the variables $x, y_1, \ldots, y_n$ all have complex values. We have the following theorem: If $f$ is holomorphic at $(a, b)$, then (1) has a unique solution that is holomorphic at $x = a$ and takes the value $b$ at $x = a$. This theorem can be proved by utilizing the method of successive approximations and fixed-point theorems. Cauchy proved the theorem by using majorant series. This method, called the method of majorants, proceeds for the scalar equations as follows: Let $f(x, y) = \sum a_{jk}(x - a)^j (y - b)^k$ and $y = \sum c_n (x - a)^n$. Substituting the latter series into both members, we can successively determine the coefficients $c_n$ by the method of undetermined coefficients. Assuming that $|f| \le M$ for $|x - a| < r$ and $|y - b| < \rho$, consider the solution $Y = \sum C_n (x - a)^n$ of

$$\frac{dY}{dx} = \frac{M}{(1 - (x - a)/r)(1 - (Y - b)/\rho)}$$

satisfying $Y(a) = b$. We have $C_n \ge |c_n|$ for any $n$,

which shows that $\sum C_n (x - a)^n$ is a majorant series of $\sum c_n (x - a)^n$.

We have the following uniqueness theorem: Suppose that $f(x, y)$ is holomorphic at $(a, b)$. Let $C$ be a curve having the point $a$ as one of its ends and $\varphi(x)$ be a solution with the following properties: $\varphi$ is holomorphic on $C$ except possibly at $x = a$, and there exists a sequence of points on $C$, $\{a_k\}$, such that $a_k \to a$ and $\varphi(a_k) \to b$. Then $\varphi(x)$ is holomorphic at $a$. By a †theorem of identity, the analytic continuation of a solution continues to be a solution if it does not encounter any singularity of $f$. If a solution $\varphi(x)$ is holomorphic on a smooth curve $x = x(t)$, with $0 \le t \le t_1$ and $x(0) = a$, and $\varphi(a) = b$, then $y = \varphi(x(t))$ is a solution for $0 \le t \le t_1$ of

$$y' = x'(t)\,f(x(t), y) \tag{3}$$

satisfying $y(0) = b$. Conversely, if (3) has a solution $y = \psi(t)$ defined in $0 \le t \le t_1$ and satisfying $y(0) = b$, and if $f(x, y)$ is holomorphic at $(x(t), \psi(t))$ for $0 \le t \le t_1$, then (1) has a solution $\varphi(x)$ holomorphic on $C$ and $\psi(t) = \varphi(x(t))$ for $0 \le t \le t_1$.

Suppose that $f = f_1/f_2$, where $f_1$ and $f_2$ are holomorphic at $(a, b)$. If $f_1(a, b) \ne 0$, $f_2(a, y) \not\equiv 0$, and $f_2(a, b) = 0$, then the equation $y' = f(x, y)$ admits a unique solution such that $y \to b$ as $x \to a$, and this solution can be expanded into a †Puiseux series:

$$y = \sum_{n=0}^{\infty} c_n (x - a)^{n/k}.$$

If $f$ is holomorphic at $(a, b)$, then the solution $y = \varphi(x, x_0, y_0)$ of (1) satisfying $y(x_0) = y_0$ is holomorphic with respect to $(x, x_0, y_0)$ at $(a, a, b)$. If $f(x, y, \lambda)$ is holomorphic at $(a, b, \lambda_0)$, then the solution of (2), $y = \varphi(x, x_0, y_0, \lambda)$, satisfying $y(x_0) = y_0$ is holomorphic at $(a, a, b, \lambda_0)$.

Suppose that $x$ is a real variable and $y$ is a complex vector. If $f$ is continuous with respect to $(x, y)$ and holomorphic with respect to $y$, then the solution $\varphi(x, x_0, y_0)$ is holomorphic with respect to $y_0$. If $f(x, y, \lambda)$ is continuous with respect to $(x, y, \lambda)$ and holomorphic with respect to $(y, \lambda)$, then $\varphi(x, x_0, y_0, \lambda)$ is holomorphic with respect to $(y_0, \lambda)$.

References

[1] E. Kamke, Differentialgleichungen reeller Funktionen, Akademische Verlag, 1930.
[2] E. A. Coddington and N. Levinson, Theory of ordinary differential equations, McGraw-Hill, 1955.
[3] P. Hartman, Ordinary differential equations, Wiley, 1964.
[4] H. Okamura, Condition nécessaire et suffisante remplie par les équations différentielles ordinaires sans points de Peano, Mem. Coll. Sci. Univ. Kyoto, (A) 24 (1941).
[5] O. Perron, Ein neuer Existenzbeweis für die Integrale der Differentialgleichung y' = f(x, y), Math. Ann., 76 (1915).
[6] M. Hukuhara, Sur la théorie des équations différentielles ordinaires, J. Fac. Sci. Univ. Tokyo, sec. I, vol. VII, pt. 5, 1958.
[7] M. Nagumo, Über die Lage der Integralkurven gewöhnlicher Differentialgleichungen, Proc. Phys.-Math. Soc. Japan, 24 (1943).
[8] E. Hille, Ordinary differential equations in the complex domain, Wiley, 1976.
[9] R. Abraham and J. E. Marsden, Foundations of mechanics, Benjamin-Cummings, second edition, 1978.

317 (X.21)
Orthogonal Functions

A. Orthogonal Systems

Let $(X, \mu)$ be a †measure space. For complex-valued functions $f$, $g$ on $X$ belonging to the †function space $L_2(X)$, we define the inner product $(f, g) = \int_X f(x)\overline{g(x)}\,d\mu(x)$ and the norm $\|f\| = (f, f)^{1/2}$. If $(f, g) = 0$, then we say that $f$ and $g$ are orthogonal on $X$ with respect to the measure $\mu$. If $X$ is a subset of a Euclidean space and $\mu$ is the †Lebesgue measure $m$, then we simply say that they are orthogonal. If the measure has a †density function $\varphi(x)$ with respect to the Lebesgue measure and $(f, g) = \int_X f(x)\overline{g(x)}\varphi(x)\,dm(x) = 0$, we say that they are orthogonal with respect to the weight function $\varphi(x)$. If $\|f\|^2 = 1$, then $f$ is said to be normalized. A set of functions $\{f_n(x)\}$ ($n = 1, 2, \ldots$) is said to be an orthogonal system (or orthogonal set), and we write $\{f_n\} \in O(X)$, if any pair of functions in the set are orthogonal. The orthogonal set $\{f_n(x)\}$ is said to be orthonormal, and we write $\{f_n\} \in ON(X)$ if each $f_n$ is normalized.

Let $\{f_n\}$ be a set of linearly independent functions in $L_2(X)$, and let $R$ be a subset of $L_2(X)$. If we can approximate any function $f \in R$ arbitrarily closely by a finite linear combination of the $f_n(x)$ with respect to the norm in $L_2(X)$, we say that $\{f_n\}$ is total in $R$. Let $\{f_n\} \in O(X)$. If $(\varphi, f_n) = 0$ for all $n$ implies $\varphi(x) = 0$ almost everywhere (a.e.), then $\{f_n\}$ is said to be complete in $L_2(X)$. An orthogonal system $\{f_n\}$ is complete in $L_2(X)$ if and only if the system is total in $L_2(X)$.

If $\{f_n\} \in O(X)$, then the series $\sum_{n=1}^{\infty} c_n f_n(x)$ is called an orthogonal series. If the series converges to $\varphi(x)$ †in the mean of order 2, then $c_n = (\varphi, f_n)/\|f_n\|^2$. We call the $c_n$ ($n = 1, 2, \ldots$) the

expansion coefficients or Fourier coefficients of $\varphi(x)$ with respect to $\{f_n\}$.

If $\{g_n\} \subset L_2(X)$ are linearly independent, we can construct an orthonormal system $\{f_n\}$ by forming suitable linear combinations of the $g_n$; $\{f_n\}$ spans the same subspace as $\{g_n\}$. For this purpose we set $f_1(x) = g_1(x)/\|g_1\|$, $f_n(x) = \varphi_n(x)/\|\varphi_n\|$, where $\varphi_n(x) = g_n(x) - \sum_{k=1}^{n-1} (g_n, f_k) f_k(x)$, $n \ge 2$. This procedure is called Schmidt orthogonalization or Gram-Schmidt orthogonalization.

If the $c_n$ are Fourier coefficients of $\varphi(x) \in L_2(X)$ with respect to $\{f_n(x)\} \in ON(X)$, then we have the †Bessel inequality $\sum_{n=1}^{\infty} |c_n|^2 \le \|\varphi\|^2$. Equality in the Bessel inequality for all $\varphi \in L_2(X)$ (the †Parseval identity) is equivalent to completeness of $\{f_n\}$ in $L_2(X)$. In this case $\sum_{n=1}^{\infty} c_n f_n(x)$ is called the orthogonal expansion of $\varphi$ with respect to $\{f_n\}$, and conversely, for any sequence $\{c_n\}$ such that $\sum |c_n|^2 < \infty$, there is a function $\varphi \in L_2(X)$ that has the $c_n$ as its Fourier coefficients, and

$$\lim_{N \to \infty} \Bigl\| \varphi - \sum_{n=1}^{N} c_n f_n \Bigr\| = 0.$$

This is called the †Riesz-Fischer theorem.

B. Orthogonal Systems on the Real Line

We assume that $X$ is a finite interval $(a, b)$ and that functions on $X$ are real-valued. We write $O(a, b)$ or $ON(a, b)$ instead of $O(X)$, $ON(X)$.

(1) If $\{f_n\} \in ON(a, b)$, $|f_n(x)| \le M$ (const.), and $\sum c_n f_n(x)$ converges †a.e., then $c_n \to 0$ as $n \to \infty$.

(2) We can construct a complete orthonormal system $\{f_n(x)\}$ and a function $\varphi(x) \in L_2(a, b)$ such that its orthogonal expansion $\sum c_n f_n(x)$ diverges everywhere.

(3) If $\{f_n(x)\} \in ON(a, b)$ and $\sum c_n^2 \log^2 n < \infty$, then $\sum c_n f_n(x)$ converges a.e. The factor $\log^2 n$ cannot be replaced by any other monotone increasing factor $\omega(n)$ satisfying $0 < \omega(n) = o(\log^2 n)$ (Rademacher-Men'shov theorem). K. Tandori proved that if $c_n \downarrow 0$ and $\sum c_n \varphi_n$ converges a.e. for any orthonormal system $\{\varphi_n\}$, then $\sum |c_n|^2 \log^2 n < \infty$.

(4) If the orthogonal expansion of a function $\varphi \in L_2(a, b)$ is †summable by Abel's method on a set $E$, then it is †$(C, 1)$-summable a.e. on $E$. $(C, 1)$-summability a.e. of the orthogonal expansion of a function $\varphi \in L_2(a, b)$ is equivalent to convergence a.e. of the partial sums $s_{2^n}(x)$ ($n = 1, 2, \ldots$) of its expansion.

(5) Suppose that $\{f_n(x)\} \in ON(a, b)$, $|f_n(x)| \le M$. Then: (i) If the $a_n$ are the Fourier coefficients of $\varphi(x)$ with respect to $\{f_n(x)\}$, then

$$\Bigl(\sum_n |a_n|^{p'}\Bigr)^{1/p'} \le M^{(2/p)-1} \Bigl(\int_a^b |\varphi(x)|^p\,dx\Bigr)^{1/p},$$

where $1 < p \le 2$, $1/p + 1/p' = 1$. Conversely, if $(\sum |a_n|^p)^{1/p} < \infty$ ($1 < p \le 2$), there exists a function $\varphi(x)$ which has the $a_n$ as its Fourier coefficients and such that

$$\Bigl(\int_a^b |\varphi(x)|^{p'}\,dx\Bigr)^{1/p'} \le M^{(2/p)-1} \Bigl(\sum_n |a_n|^p\Bigr)^{1/p}$$

(F. Riesz's theorem). When the orthonormal system is the trigonometric system, this is called the Hausdorff-Young theorem. (ii) Let $\{a_n^*\}$ be the decreasing rearrangement of $\{|a_n|\}$; then

$$\sum_{n=1}^{\infty} a_n^{*p}\,n^{p-2} \le A_p \int_a^b |\varphi(x)|^p\,dx \quad (1 < p \le 2).$$

If $q \ge 2$ and $\sum a_n^{*q} n^{q-2} < \infty$, then there exists a function $\varphi(x)$ which has the $a_n$ as its Fourier coefficients and such that

$$\int_a^b |\varphi(x)|^q\,dx \le A_q \sum_{n=1}^{\infty} a_n^{*q}\,n^{q-2}$$

(Paley's theorem). When the system is trigonometric, this is called the Hardy-Littlewood theorem.

(6) If for some positive $\varepsilon$ we have $\sum |c_n|^{2-\varepsilon} < \infty$, then $\sum c_n f_n(x)$ converges a.e.

(7) If we set $s^*(x) = \sup_n |\sum_{k=1}^{n} c_k f_k(x)|$, then $\|s^*\|_q \le A_q (\sum c_n^{*q} n^{q-2})^{1/q}$ ($q > 2$), where $\{c_n^*\}$ is the decreasing rearrangement of $\{|c_n|\}$.

C. Examples of Orthogonal Systems

(1) $\{\cos nx\} \in O(0, \pi)$, $\{\sin nx\} \in O(0, \pi)$.

(2) $\{1, \cos nx, \sin nx\} \in O(0, 2\pi)$ (→ 159 Fourier Series).

(3) Suppose that $A(x)$ is positive and continuous, and let $y_n(x)$ be solutions of $y''(x) + \lambda_n A(x) y(x) = 0$ satisfying the condition $y_n(a) = y_n(b) = 0$, where $\lambda_n$ is any †eigenvalue. Then $\{\sqrt{A(x)}\,y_n(x)\} \in O(a, b)$ (for orthogonality of eigenfunctions → 315 Ordinary Differential Equations (Boundary Value Problems) B).

(4) Set $r_n(x) = -1$ or $1$ according as the $n$th digit of the binary expansion of $x$ ($0 < x < 1$) is 1 or 0, and $r_n(x) = 0$ if $x$ is expandable in two ways. Then $\{r_n(x)\} \in O(0, 1)$. This is called Rademacher's system of orthogonal functions. The system is not complete, but it is interpreted as a †sample space of coin tossing. Rademacher's system is useful for constructing various counter-examples.

(5) Let the binary expansion of $n$ be $n = 2^{\nu_1} + 2^{\nu_2} + \cdots + 2^{\nu_p}$ ($\nu_1 < \nu_2 < \cdots < \nu_p$), and set $w_n(x) = r_{\nu_1+1}(x)\,r_{\nu_2+1}(x) \cdots r_{\nu_p+1}(x)$. Then $\{w_n(x)\}$ is a complete orthonormal system called Walsh's system of orthogonal functions. This system is interpreted as a system of characters of the group of binary numbers, and there are many theorems for this system analogous to those for the trigono-

metric system.

(6) In the interval $[0, 1]$, set

$$\chi_m^{(k)}(x) = \begin{cases} \sqrt{2^m}, & x \in ((k-1)/2^m,\ (k-1/2)/2^m), \\ -\sqrt{2^m}, & x \in ((k-1/2)/2^m,\ k/2^m), \\ 0, & x \in ((l-1)/2^m,\ l/2^m),\ l \ne k,\ 1 \le l \le 2^m. \end{cases}$$

The orthonormal system $\chi_m^{(k)}(x)$ ($1 \le k \le 2^m$, $1 \le m$) is called Haar's system of orthogonal functions. The Haar expansion of the continuous function $f(x)$ converges to $f(x)$ uniformly.

D. Orthogonal Polynomials (→ Appendix A, Table 20)

Suppose that we are given a weight function $\varphi(x) \ge 0$ ($\varphi(x) > 0$ a.e.) defined on $(a, b)$ and that the inner product of functions $f$, $g$ on $(a, b)$ is defined by $(f, g) = \int_a^b f(x) g(x) \varphi(x)\,dx$. We orthogonalize $\{x^n\}$ by Schmidt orthogonalization and obtain polynomials $p_n(x)$ of degree $n$. Here the sign of $p_n(x)$ can be determined so that the sign of the coefficient of the highest power of $x$ is positive. We call $\{p_n(x)\}$ the system of orthogonal polynomials belonging to the weight function $\varphi(x)$. This system is complete in $L_2^{(\varphi)}(a, b)$, which is defined to be the space of functions $f$ such that $\int_a^b |f(x)|^2 \varphi(x)\,dx < \infty$. In other words, the system $\{\sqrt{\varphi(x)}\,p_n(x)\}$ is a complete orthonormal system in the ordinary $L_2(a, b)$ space. Concerning the convergence problem of the orthogonal expansion by $\{p_n(x)\}$, the Christoffel-Darboux formula

$$\sum_{k=0}^{n} p_k(t)\,p_k(x) = c_n\,\frac{p_n(x)\,p_{n+1}(t) - p_n(t)\,p_{n+1}(x)}{t - x}$$

plays an important role.

Several important special functions in classical mathematical physics are given by orthogonal polynomials:

(1) Setting $\varphi(x) = (1 - x)^{\alpha}(1 + x)^{\beta}$ ($\alpha > -1$, $\beta > -1$) in $[-1, 1]$, we get the Jacobi polynomials, although they are sometimes defined in $[0, 1]$ with respect to $\varphi(x) = x^{\alpha}(1 - x)^{\beta}$ (→ Appendix A, Table 20.V). If we set $\alpha = \beta$ in the Jacobi polynomials, we get the ultraspherical polynomials (or Gegenbauer polynomials) (→ Appendix A, Table 20.I). Furthermore, if $\alpha = \beta = 0$, then we get the †Legendre polynomials, and if $\alpha = \beta = -1/2$, we get the Chebyshev polynomials $T_n(x) = \cos(n \arccos x)$. The $T_n(x)$ also appear in the best approximation problem (→ Appendix A, Table 20.II; 336 Polynomial Approximation).

(2) If we set $\varphi(x) = x^{\alpha} e^{-x}$ in $(0, \infty)$, we get the Sonine polynomials (or associated Laguerre polynomials) with appropriate constant factors. If $\alpha$ is a positive integer $m$, we get

…

The particular case $m = 0$ gives the Laguerre polynomials. In this case, however, it is customary to normalize them as $L_n(x) = (e^x/n!)(d^n/dx^n)(x^n e^{-x})$ (→ Appendix A, Table 20.VI). Laguerre polynomials are used in †numerical integrations of a Gaussian type in $(0, \infty)$. Furthermore, associated Laguerre polynomials appear in the solutions of the Schrödinger equation for the behavior of hydrogen atoms. This system of orthogonal polynomials is useful in the expansions of approximate eigenfunctions of atoms analogous to hydrogen, velocity distribution functions of molecules in gas theory, and so on.

(3) Setting $\varphi(x) = e^{-x^2}$ (or $e^{-x^2/2}$) in $(-\infty, \infty)$, we get Hermite polynomials $H_n(x) = (-1)^n e^{x^2} (d^n e^{-x^2}/dx^n)$, modulo constant factors (→ Appendix A, Table 20.VI). Hermite polynomials are special cases of parabolic cylinder functions (→ 167 Functions of Confluent Type). These polynomials appear as eigenfunctions of the Schrödinger equation for harmonic oscillators. They are also connected with probability integrals and are used in mathematical statistics.

(4) Replacing the integral by a finite sum $\sum_m f(m) g(m)$ in the definition of inner product, we get so-called orthogonality for a finite sum. (Regarding orthogonal polynomials with respect to a finite sum (→ Appendix A, Table 20.VII) and their application to the mean square approximation → 19 Analog Computation F.) Since orthogonal polynomials with respect to a finite sum are often called simply orthogonal polynomials by engineers, one must be careful not to confuse these with the ordinary ones.

References

[1] S. Kaczmarz and H. Steinhaus, Theorie der Orthogonalreihen, Warsaw, 1935 (Chelsea, 1951).
[2] G. Szegő, Orthogonal polynomials, Amer. Math. Soc. Colloq. Publ., 1939.
[3] R. Courant and D. Hilbert, Methods of mathematical physics I, Interscience, 1953.
[4] F. G. Tricomi, Vorlesungen über Orthogonalreihen, Springer, 1955.
[5] G. Alexits, Convergence problems of orthogonal series, Pergamon, 1961, revised translation of the German edition of 1960.
[6] G. Sansone, Orthogonal functions, Interscience, revised English edition, 1959.

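The Rademacher and Walsh systems of 317 C can be checked for orthonormality on a dyadic grid, since each $w_n$ with $n < 2^N$ is constant on the cells $(j/2^N, (j+1)/2^N)$. The sketch below is a hedged illustration only: the helper names `rademacher` and `walsh`, the grid depth `N = 3`, and the convention $w_0 \equiv 1$ (the empty product) are assumptions, not taken from the text.

```python
def rademacher(n, j, N):
    """Value of r_n on the dyadic cell (j / 2^N, (j + 1) / 2^N), 1 <= n <= N.

    r_n is -1 where the nth binary digit of x is 1 and +1 where it is 0.
    """
    digit = (j >> (N - n)) & 1
    return -1 if digit else 1


def walsh(n, j, N):
    """w_n as the product of r_{nu + 1} over the binary digits nu of n.

    w_0 = 1 (empty product) is a convention assumed for this sketch.
    """
    value = 1
    nu = 0
    while n:
        if n & 1:
            value *= rademacher(nu + 1, j, N)
        n >>= 1
        nu += 1
    return value


N = 3
cells = 2 ** N
# gram[m][n] equals 2^N times the integral of w_m w_n over (0, 1).
gram = [
    [sum(walsh(m, j, N) * walsh(n, j, N) for j in range(cells)) for n in range(cells)]
    for m in range(cells)
]
for row in gram:
    print(row)
```

Because the functions are piecewise constant on the cells, the sums are exact integrals up to the factor $2^N$, so no quadrature error enters the check.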
318 (XX.14)
Oscillations

A. General Remarks

A vibration or oscillation is a phenomenon that repeats periodically, either exactly or approximately. Exactly periodic oscillations are studied in the theory of †periodic solutions of differential equations. The †period of a solution $f(t)$ is called the period of the oscillation, and its reciprocal the frequency. The difference between the greatest and least values of $f(t)$ (globally or in an interval) is the amplitude. The theory of vibrations has its origin in the study of mechanical vibrations, but its nomenclature has been used also for electric circuits. As examples of practical applications of the theory of oscillations, we mention, in engineering, the prevention of vibrations and the generation of stable sustained oscillations, and in geophysics, investigations concerning the free oscillation of the earth, the existence of which has recently been confirmed.

B. Linear Oscillation

Periodic solutions of †linear differential equations have been studied in detail for a long time. Perhaps the simplest case of such an oscillation is represented by the differential equation

$$\frac{d^2x}{dt^2} + n^2 x = 0, \tag{1}$$

where the restitutive force is proportional to the displacement from the equilibrium position. Typical examples are the free vibration of a simple pendulum with small amplitude and an electric circuit composed of a self-inductance and capacity (without resistance). The solution is given by $x = A\cos(nt + \alpha)$. This is called harmonic oscillation or simple harmonic motion. Here the amplitude is $A$, the period is $2\pi/n$, $n$ is the circular frequency, and $\alpha$ is the initial phase.

A system of $m$ †degrees of freedom $(x_1, \ldots, x_m)$ is said to be in free harmonic oscillation if the coordinates can be expressed as

$$x_i = \sum_{k=1}^{m} A_{ik} \cos(n_k t + \alpha_k), \quad i = 1, 2, \ldots, m.$$

Each of these simple harmonic oscillations is called a normal vibration. As a limiting case, where the number of degrees of freedom is infinite, we have the vibration of a string:

$$\frac{\partial^2 u}{\partial t^2} = c^2 \frac{\partial^2 u}{\partial x^2}, \quad u = u(x, t), \quad u(0, t) = u(l, t) = 0.$$

The solution is given by a series

$$\sum_k A_k \sin(k\pi x/l) \cos(k\pi c t/l),$$

which is just the superposition of the fundamental vibration (corresponding to $k = 1$) and simple harmonic motions of frequencies equal to multiples of the fundamental frequency.

If a resisting force proportional to the velocity is acting, the equation becomes

$$\frac{d^2x}{dt^2} + 2\varepsilon \frac{dx}{dt} + n^2 x = 0, \quad n > \varepsilon, \tag{2}$$

whose solution

$$x = A e^{-\varepsilon t} \cos(\sigma t + \alpha), \quad \sigma = \sqrt{n^2 - \varepsilon^2} \tag{3}$$

is not periodic. However, $x$ becomes zero at a fixed interval $\pi/\sigma$, and the extremal values in the intervals decrease to zero in a geometrical progression with the common ratio $u = \exp(-\pi\varepsilon/\sigma)$. This phenomenon is called damped oscillation with damping ratio $u$ and logarithmic decrement $\log u = -\pi\varepsilon/\sigma$. In this case, too, $2\pi/\sigma$ is called the period.

When a driving force term $\varphi(t)$ is present in the right-hand side of (2), the solution takes on the additional term

$$\frac{e^{-\varepsilon t}}{\sigma} \left( \sin\sigma t \int \varphi(t)\,e^{\varepsilon t} \cos\sigma t\,dt - \cos\sigma t \int \varphi(t)\,e^{\varepsilon t} \sin\sigma t\,dt \right),$$

which represents the forced oscillation due to $\varphi(t)$.

If $\varepsilon < 0$ in (2) (negative resistance), the solution (3) increases in amplitude, so that a small disturbance is amplified, resulting in an automatic generation of oscillation. This phenomenon is called self-excited vibration. Besides being caused by some special kinds of circuit elements (e.g., tunnel diodes), such a situation often occurs when the vibrating system has time delay characteristics (→ 163 Functional-Differential Equations).

Among sustained vibrations, other than forced oscillations and self-excited vibrations, are the parametrically sustained vibrations caused by periodic variation of a parameter of the vibrating system. Electric wires and pantographs for use in high-speed electric railways must be designed to prevent unwanted parametrically sustained vibrations. On the other hand, a parametron is an electric element utilizing parametrically sustained vibration.

C. Nonlinear Oscillation

Actual vibrating systems contain more or less nonlinear elements, which give rise to various kinds of oscillations different from those described by the linear theory (→ 290 Nonlinear Oscillation). For example, $d^2x/dt^2 - \varepsilon(1 - x^2)\,dx/dt + x = 0$ ($\varepsilon > 0$) represents a stable sustained oscillation such that for large values of $\varepsilon$, two nearly stationary states occur alternately, the transition from one to the other taking place abruptly. This is called relaxation oscillation.
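
The sustained character of this oscillation can be illustrated numerically. The following is a minimal sketch, not a definitive implementation: it integrates $d^2x/dt^2 - \varepsilon(1 - x^2)\,dx/dt + x = 0$ with a classical fourth-order Runge-Kutta step and records the largest $|x|$ over the second half of the run. The names `van_der_pol_rhs`, `rk4_step`, and `late_amplitude`, and the values $\varepsilon = 5$, step size `dt = 0.001`, and final time `t_end = 100`, are all assumptions chosen for illustration.

```python
def van_der_pol_rhs(x, v, eps):
    """First-order form of x'' - eps (1 - x^2) x' + x = 0, with v = x'."""
    return v, eps * (1.0 - x * x) * v - x


def rk4_step(x, v, dt, eps):
    """One classical Runge-Kutta step for the pair (x, v)."""
    k1x, k1v = van_der_pol_rhs(x, v, eps)
    k2x, k2v = van_der_pol_rhs(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v, eps)
    k3x, k3v = van_der_pol_rhs(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v, eps)
    k4x, k4v = van_der_pol_rhs(x + dt * k3x, v + dt * k3v, eps)
    x += dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
    v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
    return x, v


def late_amplitude(eps=5.0, dt=0.001, t_end=100.0, x0=0.1, v0=0.0):
    """Largest |x| seen during the second half of the integration."""
    x, v = x0, v0
    steps = int(round(t_end / dt))
    amp = 0.0
    for i in range(steps):
        x, v = rk4_step(x, v, dt, eps)
        if i * dt > 0.5 * t_end:
            amp = max(amp, abs(x))
    return amp


amplitude_small_start = late_amplitude(x0=0.1)
amplitude_large_start = late_amplitude(x0=3.0)
print(amplitude_small_start, amplitude_large_start)
```

Starting from a small and from a large initial displacement, the two late-time amplitudes should come out close to each other (near 2), which is what "stable sustained oscillation" means here: the motion settles onto the same closed orbit regardless of the initial disturbance.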

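For the damped oscillation (2)-(3) of Section B, the statement that successive extremal values form a geometric progression with ratio $\exp(-\pi\varepsilon/\sigma)$ can be checked directly from the closed-form solution. The sketch below is illustrative only; the helper names `damped_x` and `extremum_times` and the parameter values ($A = 1$, $\varepsilon = 0.3$, $n = 2$, $\alpha = 0.4$) are assumptions. The extrema are located by solving $dx/dt = 0$, that is, $\tan(\sigma t + \alpha) = -\varepsilon/\sigma$.

```python
import math


def damped_x(t, A, eps, n, alpha):
    """x(t) = A exp(-eps t) cos(sigma t + alpha), sigma = sqrt(n^2 - eps^2)."""
    sigma = math.sqrt(n * n - eps * eps)
    return A * math.exp(-eps * t) * math.cos(sigma * t + alpha)


def extremum_times(eps, n, alpha, count):
    """Times where dx/dt = 0, i.e. tan(sigma t + alpha) = -eps / sigma."""
    sigma = math.sqrt(n * n - eps * eps)
    base = math.atan2(-eps, sigma)  # one solution of the tangent equation
    return [(base - alpha + k * math.pi) / sigma for k in range(count)]


A, eps, n, alpha = 1.0, 0.3, 2.0, 0.4
sigma = math.sqrt(n * n - eps * eps)
times = extremum_times(eps, n, alpha, 6)
values = [abs(damped_x(t, A, eps, n, alpha)) for t in times]

# Ratio of each extremal magnitude to the previous one.
ratios = [values[k + 1] / values[k] for k in range(len(values) - 1)]
expected = math.exp(-math.pi * eps / sigma)
print(ratios, expected)
```

Consecutive extremum times differ by $\pi/\sigma$, the same spacing as the zeros of $x$, so each ratio reduces to $e^{-\varepsilon\pi/\sigma}$ independently of $A$ and $\alpha$.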
References

[1] Lord Rayleigh, The theory of sound, Macmillan, second revised edition, I, 1894; II, 1896 (Dover, 1945).
[2] A. A. Andronov and C. E. Chaikin, Theory of oscillations, Princeton Univ. Press, 1949. (Original in Russian, 1937.)
[3] L. S. Pontryagin, Ordinary differential equations, Addison-Wesley, 1962. (Original in Russian, 1961.)
Also → references to 287 Nonlinear Lattice Dynamics, 290 Nonlinear Oscillation, 291 Nonlinear Problems.

319 (1.3) with integral coefficients. From the specified


enumeration of all expressions in the English
Paradoxes language, by striking out those which do not
define a real number in the interval (0, 11,
A. General Remarks we obtain an enumeration of those which
do. Consider the following expression: “The
A statement that is apparently absurd but not greatest real number represented by a proper
easily disproved is called a paradox. A con- nonterminating decimal fraction whose nth
tradiction between a proposition and its neg- digit, for any natural number n, is not equal to
ation is called an antinomy if both statements the nth digit of the nonterminating decimal
can be supported by logically equivalent rea- fraction representing the real number defined
soning. In practical use, however, “paradox” by the nth expression in the last-described
and “antinomy” often mean the same thing. enumeration.” Then we have before us a de&
nition of a real number in the interval (0, l] by
means of an expression in the English lan-
guage. This real number, by its definition,
B. Paradoxes in Set Theory
must differ from every real number definable
by an expression in the English language. This
(1) The Russell Paradox (1903). We classify is contradictory.
sets into two kinds as follows: Any set that The following paradox was given by Berry
does not contain itself as an element is called a (1906): “The least natural number not name-
set of the first kind, and any set that contains able in fewer than twenty-two syllables” is
itself as an element is called a set of the second actually named by this expression, swhich has
kind. Every set is either a set of the first kind twenty-one syllables. The Epimenides paradox
or of the second kind. Denote the set of all sets is a traditional ancient Greek paradox of this
of the first kind by M. If M is a set of the first kind: Epimenides (a Cretan) said, “Cretans are
kind, M cannot be an element of M. But if M always liars. . .”
is of the first kind, then M must be an element
of M, by definition. This is contradictory. On
the other hand, if M is a set of the second kind,
C. Paradoxes of the Continuum
A4 must be an element of M; but since M is an
element of M, M is a set of the first kind, so M
cannot be an element of M, by definition. This The problem of the continuum is important in
is contradictory. both mathematics and philosophy. There are
Since the kind of reasoning employed in this several paradoxes of Zeno concerning the
paradox is very simple and is often utilized in continuum, among which the following two
mathematics, it became popular in set theory. are best known
To remove this paradox from set theory, Rus- (1) Assume that Achilles and a tortoise start
sell suggested tramified type theory. If we simultaneously from the points A and B, re-
adopt this theory, however, it becomes very spectively, Achilles running after the tortoise.
hard to develop even an ordinary theory of When Achilles reaches the point i?, the tortoise
real numbers (- 156 Foundations of Math- advances to a point B,. When Achilles reaches
ematics). On the other hand, this paradox, the point B,, the tortoise advances further to a
together with the Burali-Forti paradox, indi- point B,. Thus Achilles can never overtake
cates that the definition of a set should be the tortoise.
restrictive. This realization led to the develop- (2) A flying arrow occupies a certain point at
ment of +axiomatic set theory. each moment. In other words, at each moment
the arrow stands still. Therefore the arrow can
never move.
(2) The Burali-Forti Paradox (1897). Let W=
{0, 1,2, , w, } be the +well-ordered set
(- 3 12 Ordinal Numbers A) of all tordinal
numbers. Let R be the ordinal number of B! References
Then every ordinal number, being an element
of w is less than Q But s2 is an ordinal num-
[l] A. N. Whitehead and B. Russell, Principia
ber. Hence, R < R. This is contradictory.
mathematics, second edition, Cambridge
Univ. Press, 1925.
(3) The Richard Paradox (1905). The expres- [2] S. C. Kleene, Introduction to metamath-
sions in the English language can be enum- ematics, Van Nostrand, 1952.
erated by the device that is applied to the [3] E. Mendelson, Introduction to mathemat-
usual enumeration of the algebraic equations ical logic, Van Nostrand, 1966.
320 (XIII.19)
Partial Differential Equations

A. General Remarks

A partial differential equation is a functional equation

$$F\left(x_1, \ldots, x_n, z, \frac{\partial z}{\partial x_1}, \ldots, \frac{\partial z}{\partial x_n}, \frac{\partial^2 z}{\partial x_1^2}, \frac{\partial^2 z}{\partial x_1 \partial x_2}, \ldots\right) = 0$$

that involves a function $z$ of independent variables $x_1, x_2, \ldots, x_n$, its †partial derivatives, and the independent variables $x_1, \ldots, x_n$. The definition of a system of partial differential equations is similar to that of a system of †ordinary differential equations. (The partial differential equation becomes an ordinary differential equation if the number of independent variables is one.)

The †order of the highest derivative appearing in a partial differential equation is called the order of the partial differential equation. Usually we write $p_i$ for $\partial z/\partial x_i$, $x$ for $x_1$, and $y$ for $x_2$ when $n = 2$, and $p = \partial z/\partial x$, $q = \partial z/\partial y$, $r = \partial^2 z/\partial x^2$, $s = \partial^2 z/\partial x \partial y$, $t = \partial^2 z/\partial y^2$ when $z$ is a function of $x$ and $y$.

A partial differential equation is called linear if it is a linear relation with respect to $z$ and its partial derivatives. For example, the equation

$$A(x, y) r + B(x, y) s + C(x, y) t + D(x, y) p + E(x, y) q + F(x, y) z = G(x, y)$$

is a linear partial differential equation. A partial differential equation is quasilinear if it is a linear relation with respect to the highest-order partial derivatives. A partial differential equation is called nonlinear if it is not linear (→ 291 Nonlinear Problems). A function $z = \varphi(x_1, x_2, \ldots, x_n)$ that satisfies the given partial differential equation is called a solution of the partial differential equation. Obtaining such a solution for a given partial differential equation is called solving this equation, and by analogy to the case $n = 2$, the integral hypersurface of the equation is

$$z - \varphi(x_1, x_2, \ldots, x_n) = 0.$$

For a system of partial differential equations, we define solutions in the same manner.

Example 1. $X_1, X_2, \ldots, X_n$ are functions of $n$ independent variables $x_1, x_2, \ldots, x_n$. Then solving the partial differential equation

$$X_1 \frac{\partial z}{\partial x_1} + X_2 \frac{\partial z}{\partial x_2} + \cdots + X_n \frac{\partial z}{\partial x_n} = 0 \qquad (1)$$

is equivalent to solving the system of ordinary differential equations

$$\frac{dx_1}{X_1} = \frac{dx_2}{X_2} = \cdots = \frac{dx_n}{X_n}. \qquad (2)$$

In other words, if $f_1, f_2, \ldots, f_{n-1}$ are $n - 1$ independent †integrals of (2), then for an arbitrary function $\Phi$, $z = \Phi(f_1, \ldots, f_{n-1})$ is a general solution of (1) (→ Section C).

Example 2. If $P_1, P_2, \ldots, P_n, R$ are functions of independent variables $x_1, \ldots, x_n$ and the dependent variable $z$, and if the quasilinear partial differential equation (Lagrange's differential equation)

$$P_1 \frac{\partial z}{\partial x_1} + P_2 \frac{\partial z}{\partial x_2} + \cdots + P_n \frac{\partial z}{\partial x_n} = R \qquad (3)$$

has an integral hypersurface $V(z, x_1, \ldots, x_n) = 0$, then we have

$$P_1 \frac{\partial V}{\partial x_1} + P_2 \frac{\partial V}{\partial x_2} + \cdots + P_n \frac{\partial V}{\partial x_n} + R \frac{\partial V}{\partial z} = 0, \qquad (4)$$

which is an equation of type (1). From this we can obtain a general solution by the method of Example 1. The same procedure is applicable to solving other systems of partial differential equations.

B. Characteristic Manifolds

We consider a partial differential equation of the $n$th order of two independent variables $x, y$:

$$F(x, y, z, p_{10}, p_{01}, \ldots, p_{n0}, \ldots, p_{0n}) = 0, \quad \text{where } p_{jk} = \frac{\partial^{j+k} z}{\partial x^j \partial y^k}. \qquad (5)$$

With this equation, we associate a manifold defined by a real parameter $\lambda$,

$$x = x(\lambda), \quad y = y(\lambda), \quad p_{jk} = p_{jk}(\lambda), \quad j, k = 0, 1, \ldots, n-1; \; j + k \le n - 1, \qquad (6)$$

and consider the following problem: Find a solution $\varphi(x, y)$ of (5) that satisfies

$$\frac{\partial^{j+k} \varphi(x, y)}{\partial x^j \partial y^k} = p_{jk}(\lambda), \quad j, k = 0, 1, 2, \ldots, n-1; \; j + k \le n - 1$$

on the curve $x = x(\lambda)$, $y = y(\lambda)$. We call this problem the Cauchy problem for equation (5). If $F$ vanishes for a system of values $x^0, y^0, p^0_{jk}$ ($j, k = 0, 1, 2, \ldots, n$; $p^0_{00} = z^0$; $j + k \le n$) and is †holomorphic in a neighborhood of this system of values, if $x, y, p_{jk}$ are holomorphic functions of $\lambda$ in a neighborhood of $\lambda = 0$ and take the respective values $x^0, y^0, p^0_{jk}$ at $\lambda = 0$, and if

$$P_{n0}\, dy^n - P_{n-1,1}\, dx\, dy^{n-1} + \cdots + (-1)^n P_{0n}\, dx^n \ne 0 \qquad (7)$$
for $(x^0, y^0, p^0_{jk})$, then we have a unique holomorphic solution of this Cauchy problem in a neighborhood of $(x^0, y^0, p^0_{jk})$. (Here we use the notation $P_{jk} = \partial F/\partial p_{jk}$.) This is Cauchy's existence theorem. We cannot apply this theorem when the left-hand side of (7) vanishes at $(x^0, y^0, p^0_{jk})$. At such a point, uniqueness of the solution fails, and there may be several solutions through the point $(x^0, y^0, p^0_{jk})$. There are $n$ curves on which the left-hand side of (7) vanishes on the integral surface $z = \varphi(x, y)$. These curves are called characteristic curves of (5). We associate the values

$$p_{jk}(\lambda) = \left.\frac{\partial^{j+k} \varphi}{\partial x^j \partial y^k}\right|_{x = x(\lambda),\, y = y(\lambda)}, \quad j, k = 0, 1, \ldots, n; \; j + k \le n,$$

with each point $(x, y)$ on these curves. The manifold $\{x(\lambda), y(\lambda), p(\lambda)\}$ ($p(\lambda) = \{p_{jk}(\lambda)\}$) of the parameter $\lambda$ is called a characteristic manifold of equation (5). Cauchy's existence theorem cannot be applied on the characteristic manifold.

The foregoing considerations can be extended, to some extent, to the space of higher dimensions $\mathbf{R}^n$ or $\mathbf{C}^n$. Let $P$ be a linear partial differential operator of order $m$:

$$P(x, D) = \sum_{|\alpha| \le m} a_\alpha(x) D^\alpha.$$

The coefficients are assumed smooth and real in $\mathbf{R}^n$ or holomorphic in $\mathbf{C}^n$. Its homogeneous part of order $m$, denoted by $P_m(x, D)$, is called the principal part of $P$. Let $S$ be a hypersurface defined by $\varphi(x) = 0$ with $\operatorname{grad} \varphi(x) \ne 0$. $S$ is called a characteristic surface if $P_m(x, \operatorname{grad} \varphi(x)) = 0$ holds for $x$ in a neighborhood of $S$ or merely for $x \in S$, and $\varphi(x)$ is called a phase function. A real vector $\xi^0$ ($\ne 0$), or a complex vector $\xi^0 \ne 0$, is called a characteristic direction at the point $x^0$ if $P_m(x^0, \xi^0) = 0$. The zeros $(x, \xi)$ ($\xi \ne 0$) of $P_m(x, \xi)$ are called the characteristic set. Furthermore $(x^0, \xi^0)$ is called simple if $\operatorname{grad}_\xi P_m(x^0, \xi^0) \ne 0$. Suppose that $(x^0, \xi^0)$ is simple. The integral curve $(x(t), \xi(t))$ that satisfies

$$\frac{dx}{dt} = \operatorname{grad}_\xi P_m(x, \xi), \qquad \frac{d\xi}{dt} = -\operatorname{grad}_x P_m(x, \xi)$$

is called the bicharacteristic strip of $P_m$. Evidently $P_m(x, \xi)$ is constant along the bicharacteristic strip. In particular, if this constant is zero, it is said to be null-bicharacteristic. The phase function $\varphi(x)$ can be obtained at least locally by using the bicharacteristic strip.

C. Classification of Solutions

First, we consider a partial differential equation of the first order of two independent variables:

$$F(x, y, z, p, q) = 0. \qquad (8)$$

A solution of (8) that contains two arbitrary constants is called a complete solution. If we get one complete solution of (8), then we can obtain all the solutions by differentiations and eliminations. Let (9) be a complete solution:

$$V(x, y, z, a, b) = 0. \qquad (9)$$

Differentiating this, we get

$$\frac{\partial V}{\partial x} + p \frac{\partial V}{\partial z} = 0, \qquad \frac{\partial V}{\partial y} + q \frac{\partial V}{\partial z} = 0. \qquad (10)$$

Eliminating $a$ and $b$ from (9) and (10), we get the original equation (8). Furthermore, solving equation (8) is equivalent to getting three functions $a, b, z$ of $x, y$ from (8), (9), and (10). If we regard $a, b$ as functions of $x, y$ in (9), we get

$$\frac{\partial V}{\partial a} \frac{\partial a}{\partial x} + \frac{\partial V}{\partial b} \frac{\partial b}{\partial x} = 0, \qquad \frac{\partial V}{\partial a} \frac{\partial a}{\partial y} + \frac{\partial V}{\partial b} \frac{\partial b}{\partial y} = 0. \qquad (11)$$

Therefore we can replace (9) and (10) by (9) and (11). Hence we have the following three cases: (i) When $a, b$ are constants, we get a complete solution. (ii) When $V = 0$, $\partial V/\partial a = 0$, and $\partial V/\partial b = 0$, we get a solution that does not contain arbitrary constants, because $z, a, b$ are all functions of $x$ and $y$. We call this solution a singular solution of (8). (iii) When $\partial V/\partial a$, $\partial V/\partial b$ do not vanish simultaneously, the Jacobian $D(a, b)/D(x, y)$ vanishes because of (11). This means that there exists a functional relation between $a$ and $b$. If there are two such relations, $a$ and $b$ are constants and the solution $z$ becomes a complete solution. Therefore we assume that there is only one such relation between $a$ and $b$, whose form is assumed to be $b = \varphi(a)$. Then we get

$$V(x, y, z, a, \varphi(a)) = 0, \qquad \frac{\partial V}{\partial a} + \frac{\partial V}{\partial b} \varphi'(a) = 0. \qquad (12)$$

If we solve (12) for the unknowns $a$ and $z$, we get a solution $z$ of (8) that contains an arbitrary function $\varphi$ instead of arbitrary constants. Such a solution is called a general solution of the partial differential equation (8) of the first order. By specializing this function $\varphi$, we obtain a particular solution of (8). Thus (i)-(iii) exhaust all the cases, and by obtaining a complete solution of (8) we can get all the solutions of (8). The number of complete solutions may be more than 1, or it may be infinite. These complete solutions can be transformed into each other by †contact transformations.
Moreover, they are contained in the general solutions.

Now we consider the case where the number of independent variables is $n$. Take an equation that contains $n - r + 1$ arbitrary constants $a_1, a_2, \ldots, a_{n-r+1}$:

$$V(x_1, x_2, \ldots, x_n; a_1, a_2, \ldots, a_{n-r+1}) = 0. \qquad (13)$$

Differentiate this equation assuming that $a_1, \ldots, a_{n-r+1}$ take fixed values. Then we get

$$\frac{\partial V}{\partial x_i} + p_i \frac{\partial V}{\partial z} = 0, \quad p_i = \frac{\partial z}{\partial x_i}, \quad i = 1, 2, \ldots, n. \qquad (14)$$

If we eliminate $a_1, a_2, \ldots, a_{n-r+1}$ from (13) and (14), we obtain $r$ partial differential equations of the first order:

$$F_j(x_1, \ldots, x_n, z, p_1, \ldots, p_n) = 0, \quad j = 1, 2, \ldots, r. \qquad (15)$$

We call (13) a complete solution of (15). In this case, as in the case when $n = 2$, we get all the solutions of (15) from a complete solution (13) of (15). We have the same classification as in the case $n = 2$: (i) When we regard $a_1, \ldots, a_{n-r+1}$ as constants, then we have a complete solution of (15). (ii) When we can eliminate the constants $a_1, \ldots, a_{n-r+1}$ from equations

$$V = 0, \quad \frac{\partial V}{\partial a_1} = 0, \quad \ldots, \quad \frac{\partial V}{\partial a_{n-r+1}} = 0,$$

we get a solution that does not contain an arbitrary constant. Such a solution is called a singular solution of (15). (iii) If not all of the $\partial V/\partial a_i$ vanish, there exists at least one functional relation among $a_1, a_2, \ldots, a_{n-r+1}$. We assume that there exist exactly $k$ ($\le n - r$) relations among $a_1, a_2, \ldots, a_{n-r+1}$:

$$f_j(a_1, a_2, \ldots, a_{n-r+1}) = 0, \quad j = 1, \ldots, k. \qquad (16)$$

Then there exist numbers $\lambda_1, \lambda_2, \ldots, \lambda_k$ such that

$$\frac{\partial V}{\partial a_l} = \sum_{j=1}^{k} \lambda_j \frac{\partial f_j}{\partial a_l}, \quad l = 1, \ldots, n - r + 1. \qquad (17)$$

Hence, by eliminating $a_1, a_2, \ldots, a_{n-r+1}$, $\lambda_1, \lambda_2, \ldots, \lambda_k$ from (13), (16), and (17), we generally obtain exactly one relation between $x_1, \ldots, x_n$ and $z$. This is a solution of (15) that contains exactly $k$ arbitrary functions $f_1, f_2, \ldots, f_k$. Such a solution is called a general solution of (15). In particular, if $k = n - r + 1$, then it is a complete solution. We might think that there are $n - r$ general solutions corresponding to $k = 1, \ldots, n - r$. But these general solutions are not essentially different. For the partial equation of the second order $F(x, y, z, p, q, r, s, t) = 0$, this definition of a general solution is not applicable since we cannot successfully define a general solution by using the number of arbitrary functions contained in a solution.

Instead, we now use the following definition, due to J. G. Darboux: A solution of a general partial differential equation is called a general solution if by specializing its arbitrary functions and constants appropriately we obtain a solution whose existence is established by Cauchy's existence theorem. A solution $z = \varphi(x, y)$ of a general partial differential equation is called a singular solution if Cauchy's existence theorem cannot be applied on any curves on the manifold formed by $z = \varphi(x, y)$, $p = \partial \varphi/\partial x$, $q = \partial \varphi/\partial y$.

D. Cauchy's Method

We can regard equation (8) as a relation between the point $(x, y, z)$ on the integral surface $S$ and the direction cosines of a tangent plane at that point. Therefore the tangent planes at all points of the surface form a one-parameter family. They envelop a cone $(T)$ whose vertex is $(x, y, z)$ on $S$. The tangent plane at a point $M$ on the integral surface $S$ is tangent to this cone $(T)$ along one generating line $G$ of $(T)$. A curve on $S$ whose tangents are all generating lines of $(T)$ is a characteristic curve. If we write

$$P = \frac{\partial F}{\partial p}, \quad Q = \frac{\partial F}{\partial q}, \quad X = \frac{\partial F}{\partial x}, \quad Y = \frac{\partial F}{\partial y}, \quad Z = \frac{\partial F}{\partial z},$$

then the characteristic curve is given by the system of ordinary differential equations:

$$\frac{dx}{P} = \frac{dy}{Q} = \frac{dz}{Pp + Qq} = \frac{-dp}{X + pZ} = \frac{-dq}{Y + qZ}. \qquad (18)$$

We call this system the characteristic differential equation or Charpit subsidiary (auxiliary) equation of the partial differential equation (8) of the first order. System (18) determines not only $x, y, z$ but also $p$ and $q$. The set of these †surface elements $(x, y, z, p, q)$ is the characteristic manifold. This characteristic manifold is considered as a part of the integral surface with infinitesimal width, and in this case we call it the characteristic strip. The characteristic strip is represented by the equations $x = x(\lambda)$, $y = y(\lambda)$, $z = z(\lambda)$, $p = p(\lambda)$, $q = q(\lambda)$ containing a parameter. On the integral surface $z = z(x, y)$, we have

$$\frac{dz}{d\lambda} = \frac{\partial z}{\partial x} \frac{dx}{d\lambda} + \frac{\partial z}{\partial y} \frac{dy}{d\lambda}$$

and

$$dz = p\, dx + q\, dy. \qquad (19)$$
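As an illustration of the Charpit system (18), the following minimal sketch integrates it numerically for one concrete equation. The choice $F = pq - z$, the initial surface element, the common parameter $t$ (with $dx/dt = P$, and so on), and all function names are assumptions made for this example only; they are not taken from the text.

```python
import math


def charpit_rhs(state):
    """Charpit system (18) for the sample equation F(x, y, z, p, q) = p*q - z."""
    x, y, z, p, q = state
    P, Q = q, p                 # P = dF/dp, Q = dF/dq
    X, Y, Z = 0.0, 0.0, -1.0    # X = dF/dx, Y = dF/dy, Z = dF/dz
    return (P, Q, P * p + Q * q, -(X + p * Z), -(Y + q * Z))


def rk4_step(state, h):
    """One classical Runge-Kutta step for the five-component strip."""
    def shifted(base, slope, factor):
        return tuple(b + factor * s for b, s in zip(base, slope))

    k1 = charpit_rhs(state)
    k2 = charpit_rhs(shifted(state, k1, 0.5 * h))
    k3 = charpit_rhs(shifted(state, k2, 0.5 * h))
    k4 = charpit_rhs(shifted(state, k3, h))
    return tuple(s + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


# Initial surface element chosen so that F = p*q - z vanishes at t = 0.
p0, q0 = 2.0, 0.5
state = (0.0, 1.0, p0 * q0, p0, q0)

n_steps, h = 1000, 0.001        # integrate up to t = 1
for _ in range(n_steps):
    state = rk4_step(state, h)

x, y, z, p, q = state
residual = p * q - z                 # F evaluated at the end of the strip
z_exact = p0 * q0 * math.exp(2.0)    # closed form z(t) = p0*q0*exp(2t) at t = 1
print(residual, z - z_exact)
```

For this sample equation the strip can be written in closed form ($p = p_0 e^t$, $q = q_0 e^t$, $z = p_0 q_0 e^{2t}$), so the sketch can be compared against it; $F$ stays zero along the strip, and $dz/dt = p\,dx/dt + q\,dy/dt$ holds by construction of the right-hand side.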
Equation (19) is called the strip condition. The equations (18) evidently satisfy this condition.

The solution $u(t, x) = u(t, x_1, \ldots, x_n)$ of

$$\frac{\partial u}{\partial t} + f\left(t, x, \frac{\partial u}{\partial x}\right) = 0, \qquad u(0, x) = \varphi(x)$$

is obtained (at least locally) as follows. Let the solution of the differential equations

$$\frac{dx_i}{dt} = f_{\xi_i}(t, x_1, \ldots, x_n, \xi_1, \ldots, \xi_n), \qquad \frac{d\xi_i}{dt} = -f_{x_i}(t, x_1, \ldots, x_n, \xi_1, \ldots, \xi_n), \qquad 1 \le i \le n,$$

issuing from $(x^0, \xi^0)$ at $t = 0$ be $(x(t; x^0, \xi^0), \xi(t; x^0, \xi^0))$. Then specializing $\xi^0_j = \varphi_{x_j}(x^0)$ ($1 \le j \le n$), the solution $u(t, x)$ is obtained by quadrature along these curves (characteristic strips) from the relation

$$\frac{du}{dt} = -f(t, x, \xi) + \sum_j \xi_j f_{\xi_j}(t, x, \xi).$$

In particular, when $f(t, x, \xi)$ is homogeneous of degree 1 in $\xi$, by Euler's identity the right-hand side is identically zero. This means that $u$ is constant along the characteristic strip.

E. Homogeneous Partial Differential Equations

Assume that $f(\xi_1, \xi_2, \ldots, \xi_m)$ is a †homogeneous polynomial of $m$ independent variables $\xi_1, \xi_2, \ldots, \xi_m$. We denote the differential operator $\partial/\partial x_i$ by $D_i$. Then consider a homogeneous partial differential equation

$$f(D_1, D_2, \ldots, D_m) u = 0. \qquad (20)$$

We can obtain a homogeneous equation from an inhomogeneous partial differential equation by transformation of the dependent variable. For example, $D_1^2 w = D_2 w$ becomes the homogeneous partial differential equation $(D_1^2 - D_2 D_3) u = 0$ by the transformation of the dependent variable $u = e^{x_3} w$. The equation $(D_1^2 - D_2 D_3) u = 0$ corresponds to the homogeneous polynomial $f(\xi_1, \xi_2, \xi_3) = \xi_1^2 - \xi_2 \xi_3$. Generally, for equation (20), we consider the solution

$$u = F(\theta_1, \theta_2, \ldots, \theta_s), \qquad (21)$$

where $\theta_1, \theta_2, \ldots, \theta_s$ are functions of $x_1, \ldots, x_m$ and $F$ is an arbitrary function of $\theta_i$. Such a solution is called a primary solution of (20). For example, for the equation $(D_1^2 - D_2^2) u = 0$, there are two primary solutions, $u = F(x_1 + x_2)$ and $u = F(x_1 - x_2)$. For †Laplace's equation

$$\Delta u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = 0,$$

we have a primary solution $u = F(z + ix \cos \alpha + iy \sin \alpha)$, where $\alpha$ is a parameter. Second, for the †wave equation

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} - \frac{1}{c^2} \frac{\partial^2 u}{\partial t^2} = 0, \qquad (22)$$

we have a solution

(23)

where $r^2 = x^2 + y^2 + z^2$, $f$ is an arbitrary function, and $\gamma$ is a particular solution of (22). Furthermore, $u = f(\alpha, \beta)$ satisfies the equation of a characteristic curve of (22):

Such a solution, which is the product of particular solutions and some function that contains an arbitrary function, is called a primitive solution of the original equation. Equation (22) has another primitive solution of the type

where $g$ is an arbitrary function.

Laplace's equation has a primitive solution

$$u = \frac{1}{r} f\left(\frac{z - r}{x + iy}\right).$$

A basic equation is an equation, such as Laplace's equation, that has a primary solution and a primitive solution. A solution of an equation that has the same characteristic curves as a basic equation can be obtained from a particular solution of the basic equation by integrations and additions. For example, if we choose a particular solution $u = r^{-1}$ of Laplace's equation, which is a specialization of the primitive solution $u = r^{-1} f((z - r)/(x + iy))$, then

$$u = \iiint \left[(x - \xi)^2 + (y - \eta)^2 + (z - \zeta)^2\right]^{-1/2} F(\xi, \eta, \zeta)\, d\xi\, d\eta\, d\zeta$$

is a solution of

$$\Delta u + 4\pi F(x, y, z) = 0$$

in the interior of the domain of integration.

F. Determined Systems

The general form of a system of partial differential equations in two independent variables is

$$F_i(x, y, u^{(1)}, u^{(2)}, \ldots, u^{(m)}, u_x^{(1)}, u_y^{(1)}, \ldots, u_x^{(m)}, u_y^{(m)}, u_{xx}^{(1)}, \ldots) = 0, \quad i = 1, 2, \ldots, h, \qquad (24)$$

i.e., a system of $h$ equations for $m$ functions
$u^{(1)}, u^{(2)}, \ldots, u^{(m)}$ of the independent variables $x$ and $y$. The system is called a determined system if $h = m$, an overdetermined system if $h > m$, and an underdetermined system if $h < m$.

An example of a determined system is the †Cauchy-Riemann equation

$$u_x - v_y = 0, \qquad u_y + v_x = 0,$$

for $u(x, y)$, $v(x, y)$, which can be further reduced to two determined equations $\Delta u = 0$ and $\Delta v = 0$.

A simple example of an overdetermined system is

$$u_x = f(x, y), \qquad u_y = g(x, y).$$

A necessary and sufficient condition for the existence of a solution of this system is $f_y = g_x$. The Cauchy-Riemann differential equations for a holomorphic function $f(z_1, z_2) = u + iv$ of two complex variables $z_1 = x_1 + iy_1$, $z_2 = x_2 + iy_2$ are

$$u_{x_1} = v_{y_1}, \quad u_{x_2} = v_{y_2}, \quad u_{y_1} = -v_{x_1}, \quad u_{y_2} = -v_{x_2},$$

which can be reduced to

$$u_{x_1 x_1} + u_{y_1 y_1} = 0, \quad u_{x_1 x_2} + u_{y_1 y_2} = 0, \quad u_{x_2 x_2} + u_{y_2 y_2} = 0, \quad u_{x_1 y_2} - u_{x_2 y_1} = 0,$$

which is also an overdetermined system.

An example of an underdetermined system is

$$\frac{\partial(u, v)}{\partial(x, y)} = u_x v_y - u_y v_x = 0.$$

If this equation holds, there exists a functional relation $w(u, v) = 0$ that can be regarded as a solution of this underdetermined system.

G. General Theory of Differential Operators

In recent developments of the theory of partial differential equations, there is a trend to construct a general theory for †differential operators regardless of the classical types of differential equations (→ 112 Differential Operators). For example, we take a property that is satisfied by some equation of classical type (e.g., an †elliptic differential equation) and proceed to characterize all equations that have this property. (For example, †hypoellipticity is a property of classical †parabolic and elliptic equations.) There are several basic problems in this general theory: the existence of a fundamental solution, the existence of a local solution, unique continuation of solutions, the differentiability and analyticity of solutions, and the propagation of smoothness. We explain here only two of them: the fundamental solution and the local existence of solutions.

H. Fundamental Solutions

$L$ is assumed to be a linear partial differential operator with constant coefficients. If a †distribution $E$ satisfies the equation

$$L E(x) = \delta(x),$$

where $\delta(x)$ is the †Dirac $\delta$-function, then we call $E(x)$ a fundamental solution (or elementary solution) of $L$. Also, if $L$ is a linear differential operator and $E$ satisfies the equation

$$L_x E(x, y) = \delta(x - y),$$

then we call this distribution $E(x, y)$ a fundamental kernel (or elementary kernel) of $L$.

Let $L$ be a differential operator with constant coefficients and $E(x)$ be a fundamental solution of $L$. Then $E(x - y)$ is a fundamental kernel of $L$. Sometimes $E(x, y)$ itself is called a fundamental solution. L. Ehrenpreis and B. Malgrange proved that any linear differential operator with constant coefficients has a fundamental solution [4].

If we take a fundamental solution (fundamental kernel) $E$ and add to it an arbitrary solution of the equation $Lu = 0$, then we get another fundamental solution (fundamental kernel) of $L$. This freedom of the fundamental solution (fundamental kernel) can be used to construct †Green's functions of the boundary value problem of elliptic equations and of the mixed initial-boundary value problem for parabolic equations. A Green's function is a fundamental solution (fundamental kernel) that satisfies given boundary conditions (→ 188 Green's Functions; 189 Green's Operator). The fundamental solutions (fundamental kernels) relative to the Cauchy problem are also defined as in this section. For example, consider a fundamental solution of the Cauchy problem concerning the future behavior of a differential operator $L = \partial/\partial t - P(\partial/\partial x)$, namely, a distribution $E(t, x)$ that satisfies $LE = 0$ ($t > 0$) and $E(t, x)|_{t=0} = \delta(x)$. If we put $\tilde{E}(t, x) = E(t, x)$ ($t > 0$) and $\tilde{E}(t, x) = 0$ ($t < 0$), then $\tilde{E}(t, x)$ is a fundamental solution (or kernel) of the differential operator $L$, that is, $L\tilde{E} = \delta(t, x)$. Sometimes a fundamental solution of the Cauchy problem for a parabolic equation is called a Green's function. On the other hand, a fundamental solution (or kernel) of the Cauchy problem for a hyperbolic equation is called a †Riemann function. A Riemann function actually is not always a function; in general it is a distribution.

Example 1. A fundamental solution of the 3-dimensional Laplacian

$$\Delta = \frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2} + \frac{\partial^2}{\partial x_3^2}$$

is $E(x) = -1/(4\pi r)$, where $r = \sqrt{x_1^2 + x_2^2 + x_3^2}$.
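Example 1 can be checked numerically away from the singularity. The following is a minimal sketch, assuming central finite differences; the sample point, the step sizes, and the function names are illustrative choices, and the flux computation relies on the radial symmetry of $E$. It checks that $\Delta E \approx 0$ for $r \ne 0$ and that the outward flux of $\operatorname{grad} E$ through a sphere centered at the origin equals 1, as expected from $\Delta E = \delta$.

```python
import math


def E(x1, x2, x3):
    """Candidate fundamental solution E(x) = -1/(4*pi*r) of the 3-D Laplacian."""
    r = math.sqrt(x1 * x1 + x2 * x2 + x3 * x3)
    return -1.0 / (4.0 * math.pi * r)


def laplacian(f, point, h=1e-3):
    """Second-order central-difference approximation of the Laplacian of f."""
    total = 0.0
    for i in range(3):
        plus = list(point)
        minus = list(point)
        plus[i] += h
        minus[i] -= h
        total += (f(*plus) - 2.0 * f(*point) + f(*minus)) / (h * h)
    return total


def flux_through_sphere(f, radius, h=1e-5):
    """Outward flux of grad f through the sphere |x| = radius.

    Assumes f is radially symmetric, so the flux is the radial derivative
    (sampled along the x1-axis) times the surface area 4*pi*radius^2.
    """
    radial_derivative = (f(radius + h, 0.0, 0.0) - f(radius - h, 0.0, 0.0)) / (2.0 * h)
    return 4.0 * math.pi * radius * radius * radial_derivative


lap = laplacian(E, (0.3, -0.4, 0.5))
flux = flux_through_sphere(E, 1.0)
print(lap, flux)
```

The unit flux is independent of the radius, which is how the normalizing constant $-1/(4\pi)$ arises.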
Example 2. A fundamental solution of the Cauchy problem for the future of the 3-dimensional wave operator

$$L = \frac{\partial^2}{\partial t^2} - \left(\frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2} + \frac{\partial^2}{\partial x_3^2}\right),$$

i.e., a distribution $E(t, x)$ ($t \ge 0$) that satisfies $LE = 0$ ($t > 0$), $E(0, x) = 0$, and $(\partial/\partial t) E(0, x) = \delta(x)$, is given by $E(t, x) = (1/4\pi t)\, \delta(r - t)$ ($t > 0$), $r = \sqrt{x_1^2 + x_2^2 + x_3^2}$. A fundamental solution for $L$ is given by $\tilde{E}(t, x) = E(t, x)$ ($t > 0$); $= 0$ ($t < 0$) (→ Appendix A, Table 15.V).

I. Existence of Local Solutions

Given a linear differential operator $L$ and the equation $Lu = f$, we have the problem of determining whether this equation always has a solution in some neighborhood of a given point. If the coefficients of $L$ and $f$ are holomorphic in a neighborhood of this point and if the homogeneous part of highest order does not vanish, then there exists a solution that is holomorphic in a neighborhood of the given point (†Cauchy-Kovalevskaya existence theorem).

If $L$ is a linear differential operator with constant coefficients, $E$ is a fundamental solution of $L$, and $f$ is a function (or distribution) that is zero outside of a compact set, then we have a solution $u$ that is the †convolution of $E$ and $f$: $u = E * f$. On the other hand, H. Lewy proposed the following example [3]:

$$-\frac{\partial u}{\partial x_1} - i \frac{\partial u}{\partial x_2} + 2i(x_1 + i x_2) \frac{\partial u}{\partial x_3} = f(x_3),$$

where $f$ is a real function of $x_3$. He showed that if this equation has a solution that is of class $C^1$, then $f$ must be real analytic. Therefore, if $f$ is of class $C^\infty$ but not real analytic, then this equation has no $C^1$-solution. Actually, no solution exists even in the distribution sense. (Note that, since the coefficients of $L$ are now complex-valued, the results mentioned at the beginning of this section are no longer applicable.)

For linear differential operators $L$, a study by L. Hörmander gives some necessary conditions and also some sufficient conditions for the local existence of a solution of the equation $Lu = f$ for sufficiently general $f$ [4]. This result has been developed and completed by L. Nirenberg and F. Treves [18] and by R. Beals and C. Fefferman [19]. The operator considered by S. Mizohata (J. Math. Kyoto Univ., 1 (1962)),

$$L = \frac{\partial}{\partial x_1} + i x_1^k \frac{\partial}{\partial x_2},$$

serves as a standard model in this problem. In a neighborhood of the origin, if $k$ is even, $L$ is locally solvable, and if $k$ is odd, it is not locally solvable. However, for linear partial differential operators with multiple characteristics, the problem of local solvability becomes extremely difficult.

References

[1] E. Goursat, Cours d'analyse mathématique, Gauthier-Villars, II, second edition, 1911; III, fourth edition, 1927.
[2] R. Courant and D. Hilbert, Methods of mathematical physics II, Interscience, 1962.
[3] H. Lewy, An example of a smooth linear partial differential equation without solution, Ann. Math., (2) 66 (1957), 155-158.
[4] L. Hörmander, Linear partial differential operators, Springer, 1963.
[5] S. Mizohata, Theory of partial differential equations, Cambridge Univ. Press, 1973. (Original in Japanese, 1965.)
[6] C. Carathéodory, Calculus of variations and partial differential equations of the first order, Holden-Day, 1967.
[7] I. G. Petrovskii, Über das Cauchysche Problem für ein System linearer partieller Differentialgleichungen im Gebiete der nichtanalytischen Funktionen, Bull. Univ. Moscou, sér. internat., sec. A, vol. 1, fasc. 7 (1938), 1-74.
[8] J. Hadamard, Le problème de Cauchy et les équations aux dérivées partielles linéaires hyperboliques, Hermann, 1932.
[9] I. G. Petrovskii, Lectures on partial differential equations, Interscience, 1954. (Original in Russian, 1953.)
[10] S. L. Sobolev, Partial differential equations of mathematical physics, Pergamon, 1964. (Original in Russian, 1954.)
[11] J. L. Lions, Equations différentielles opérationnelles et problèmes aux limites, Springer, 1961.
[12] J. L. Lions and E. Magenes, Problèmes aux limites non homogènes et applications, Dunod, I, 1968; II, 1968; III, 1969.
[13] A. Friedman, Partial differential equations, Holt, Rinehart and Winston, 1969.
[14] F. Treves, Locally convex spaces and linear partial differential equations, Springer, 1967.
[15] F. Treves, Linear partial differential equations with constant coefficients, Gordon & Breach, 1966.
[16] F. John, Partial differential equations, Springer, 1971.
[17] L. Bers, F. John, and M. Schechter, Partial differential equations, Interscience, 1964.
[18] L. Nirenberg and F. Treves, On local solvability of linear partial differential equations I, II, Comm. Pure Appl. Math., 23 (1970), 1-38, 459-510.
[19] R. Beals and C. Fefferman, On local solvability of linear partial differential equations, Ann. Math., 97 (1973), 482-498.
[20] F. Cardoso and F. Treves, A necessary condition of local solvability for pseudodifferential equations with double characteristics, Ann. Inst. Fourier, 24 (1974), 225-292.

321 (XIII.21)
Partial Differential Equations (Initial Value Problems)

A. General Remarks

First, we give two examples of initial value problems for †partial differential equations.

(I) Consider the partial differential equation $u_x - u_y = 0$ of two independent variables $x$ and $y$. If the function $\varphi(y)$ is of class $C^1$, then $u = \varphi(x + y)$ is a solution of this equation that satisfies $u(0, y) = \varphi(y)$.

(II) We denote a point of $\mathbf{R}^{n+1}$ (or $\mathbf{C}^{n+1}$) by $(t, x)$, $x = (x_1, \ldots, x_n)$. Let $L$ be a linear partial differential operator of order $m$:

$$L = \frac{\partial^m}{\partial t^m} + \sum_{|\nu| \le m,\ \nu_0 < m} a_{\nu_0 \nu_1 \ldots \nu_n}(t, x) \frac{\partial^{|\nu|}}{\partial t^{\nu_0} \partial x_1^{\nu_1} \cdots \partial x_n^{\nu_n}}, \qquad |\nu| = \nu_0 + \nu_1 + \cdots + \nu_n,$$

where the coefficients $a_{\nu_0 \nu_1 \ldots \nu_n}(t, x)$ are functions of class $C^\omega$ (i.e., †real analytic functions or holomorphic functions) in a neighborhood of $(t, x) = (0, 0)$. If the functions $v(t, x)$ and $w_k(x)$ ($0 \le k \le m - 1$) are of class $C^\omega$ in a neighborhood of $(t, x) = (0, 0)$, then there exists a unique solution $u(t, x)$ of class $C^\omega$ in a neighborhood of $(t, x) = (0, 0)$ that satisfies

$$L[u] = v(t, x), \qquad \frac{\partial^k u}{\partial t^k}(0, x) = w_k(x), \quad 0 \le k \le m - 1. \qquad (1)$$

This is called the Cauchy-Kovalevskaya existence theorem (for linear partial differential equations).

As in (II) we choose one of the independent variables as the principal variable and regard the others as parameters. When we assign a value $a$ to the principal variable $t$, the values of the dependent variables (unknown functions) and their derivatives are called initial values, initial data, or Cauchy data. Conditions to determine initial values are called initial conditions. The problem of finding a solution of (1) under given initial conditions is called an initial value problem or Cauchy problem. We may consider initial value problems not only for initial conditions on a hyperplane $t = a$, but also for initial conditions on a hypersurface, called an initial surface (→ Section C).

Let $a(x, D)$ be a linear partial differential operator of order $m$:

$$a(x, D) = \sum_{|\alpha| \le m} a_\alpha(x) D^\alpha, \qquad D^\alpha = \frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}, \qquad |\alpha| = \alpha_1 + \cdots + \alpha_n,$$

where the coefficients $a_\alpha(x)$ are of class $C^\omega$ in a neighborhood of $x = 0$. Its characteristic polynomial is

$$h(x, \xi) = \sum_{|\alpha| = m} a_\alpha(x) \xi^\alpha.$$

Let $S$: $s(x) = 0$ be a regular surface (i.e., $s_x = (\partial s/\partial x_1, \ldots, \partial s/\partial x_n) \ne (0)$) of codimension 1. We suppose that $S$ is a †noncharacteristic surface, that is, $h(x, s_x) \ne 0$ on $S$. Let $v(x)$ and $w_k(x)$ ($0 \le k \le m - 1$) be functions of class $C^\omega$ in a neighborhood of $x = 0$ and on $S$, respectively. We consider the Cauchy problem

$$a(x, D) u(x) = v(x), \qquad \frac{\partial^k u}{\partial n^k}(x) = w_k(x) \ \text{on } S, \quad 0 \le k \le m - 1, \qquad (1')$$

where $n$ is the outward normal direction of $S$. $S$ is thus the initial surface. Then there exists a unique solution $u(x)$ of class $C^\omega$ in a neighborhood of $x = 0$. In fact, by the change of variables $X_1 = s(x)$, $X_i = x_i$ ($2 \le i \le n$) if $\partial s/\partial x_1 \ne 0$ on $S$, this problem can be reduced to the Cauchy problem (1) by taking account of the fact that $h(x, s_x) \ne 0$ on $S$.

The Cauchy-Kovalevskaya theorem asserts the local existence of a solution when the initial values are of class $C^\omega$. Indeed, J. Hadamard noted that if the initial values are not of class $C^\omega$, the initial value problem does not always have a solution. For example, consider the initial value problem

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = 0$$

with the initial values

$$u(0, y, z) = w(y, z), \qquad \frac{\partial u}{\partial x}(0, y, z) = 0.$$

If the function $w(y, z)$ is not of class $C^\omega$ in any neighborhood of $y = z = 0$, the solution of this problem can never exist in (or even on one side $x > 0$ of) any neighborhood of $x = y = z = 0$.

B. The Cauchy-Kovalevskaya Existence Theorem for a System of Partial Differential Equations in the Normal Form

The Cauchy-Kovalevskaya existence theorem (1) is extended for more general systems of
321 C 1196
PDEs (Initial Value Problems)

partial differential equations in the normal form studied by Kovalevskaya. Consider

$$\frac{\partial^{p_i} u_i}{\partial t^{p_i}} = F_i\left(t, x, u_1, \ldots, u_m, \ldots, \frac{\partial^{|\nu|} u_j}{\partial t^{\nu_0}\, \partial x_1^{\nu_1} \cdots \partial x_n^{\nu_n}}, \ldots\right),$$

where $1 \le i, j \le m$, $|\nu| = \nu_0 + \nu_1 + \cdots + \nu_n \le p_j$, $\nu_0 < p_j$, and $x = (x_1, \ldots, x_n)$. We assume that the $F_i$ ($1 \le i \le m$) are functions of class $C^\omega$ with respect to the arguments $t, x, u_1, \ldots, u_m, \ldots, \partial^{|\nu|} u_j / \partial t^{\nu_0} \partial x_1^{\nu_1} \cdots \partial x_n^{\nu_n}, \ldots$ in a neighborhood of $(0, 0, \ldots, 0)$. If the functions $w_{ik}(x)$ ($1 \le i \le m$, $0 \le k \le p_i - 1$) are of class $C^\omega$ in a neighborhood of $x = 0$, then there exists a unique solution $u_1(t, x), \ldots, u_m(t, x)$ of class $C^\omega$ in a neighborhood of $(t, x) = (0, 0)$ that satisfies the equations and the initial values

$$\frac{\partial^k u_i}{\partial t^k}(0, x) = w_{ik}(x), \quad 1 \le i \le m, \quad 0 \le k \le p_i - 1$$

[2, 4].

C. Single Equations of the First Order

For a single partial differential equation given in the normal form

$$\frac{\partial u}{\partial x} = F\left(x, y_1, \ldots, y_k, u, \frac{\partial u}{\partial y_1}, \ldots, \frac{\partial u}{\partial y_k}\right), \tag{2}$$

we assume that $F(x, y_1, \ldots, y_k, u, q_1, \ldots, q_k)$ is a real-valued function of class $C^2$ in a neighborhood of $x = a$, $y_i = b_i$, $u = c$, $q_i = d_i$, and that $\varphi(y_1, \ldots, y_k)$ is also a function of class $C^2$ such that $\varphi = c$, $\partial\varphi/\partial y_i = d_i$ at $y_i = b_i$. Then there is a solution $u$ of (2) in a neighborhood of $x = a$ and $y_i = b_i$ that satisfies $u(a, y_1, \ldots, y_k) = \varphi(y_1, \ldots, y_k)$ and is of class $C^2$.

For the uniqueness of solutions of this Cauchy problem, A. Haar (Atti del Congresso Internazionale dei Matematici, 1928, Bologna, vol. 3) showed that if $F$ satisfies the Lipschitz condition

$$|F(x, y, u', q') - F(x, y, u, q)| \le A \sum_{i=1}^{k} |q_i' - q_i| + B\,|u' - u|,$$

then the solution of the initial value problem for (2) is unique. To obtain this result he studied the partial differential inequality

$$\left|\frac{\partial u}{\partial x}\right| \le A \sum_{i=1}^{k} \left|\frac{\partial u}{\partial y_i}\right| + B\,|u|. \tag{3}$$

Next, we consider more general equations of the first order

$$F\left(x_1, \ldots, x_k, u, \frac{\partial u}{\partial x_1}, \ldots, \frac{\partial u}{\partial x_k}\right) = 0. \tag{4}$$

Suppose that $F(x_1, \ldots, x_k, u, p_1, \ldots, p_k)$ is a real-valued function of class $C^2$ in a neighborhood of $x_i = a_i$, $u = b$, $p_i = c_i$; that $\varphi(x_1, \ldots, x_k)$ and $S(x_1, \ldots, x_k)$ are functions of class $C^2$ in a neighborhood of $x_i = a_i$ that satisfy $b = \varphi(a_1, \ldots, a_k)$, $c_i = (\partial\varphi/\partial x_i)_{x=a}$, $S(a_1, \ldots, a_k) = 0$; and that

$$\sum_{i=1}^{k} \frac{\partial F}{\partial p_i}\,\frac{\partial S}{\partial x_i} \ne 0 \quad \text{at } x = a,\ u = b,\ p = c. \tag{5}$$

Then there exists a solution $u$ of (4) of class $C^2$ in a neighborhood of $x = a$ that satisfies $u = \varphi(x)$ on the hypersurface $S(x) = 0$.

Furthermore, if $F$, $\varphi$, $S$ are of class $C^1$ and satisfy (5), then there is at most one solution $u$ of (4) of class $C^1$ in a neighborhood of $a$ that satisfies $u = \varphi(x)$ on the hypersurface $S(x) = 0$.

These facts can be proved in the following way. By choosing $S_1(x), \ldots, S_{k-1}(x)$ and then $S(x)$ so that the Jacobian $\partial(S, S_1, \ldots, S_{k-1})/\partial(x_1, \ldots, x_k)$ does not vanish and by changing variables from $x$ to $S, S_1, \ldots, S_{k-1}$, we obtain a normal form solved for $\partial u/\partial S$ by condition (5) (this condition means that the hypersurface $S(x) = 0$ is not tangent to the characteristic curves).

D. Quasilinear Equations of the Second Order

Consider the equation

where $x = (x_1, \ldots, x_k)$, $p = (p_1, \ldots, p_k)$, and $p_i = \partial u/\partial x_i$. We assume that the initial conditions are $u = \varphi(x)$ on $S(x) = 0$ and

on the same hypersurface. Taking the other functions $S_1(x), \ldots, S_{k-1}(x)$ and then $S(x)$ so that the Jacobian $\partial(S, S_1, \ldots, S_{k-1})/\partial(x_1, \ldots, x_k)$ does not vanish, we change the variables $x$ to $s_0, s_1, \ldots, s_{k-1}$ ($s_0 = S$, $s_i = S_i$). Then we get

(6)

where

When $s_0 = 0$, the initial conditions are trans-

formed to

$$u = \varphi^*(s_1, \ldots, s_{k-1}) = \varphi(x(0, s_1, \ldots, s_{k-1})), \qquad \frac{\partial u}{\partial s_0} = \psi^*(s_1, \ldots, s_{k-1}).$$

Suppose that $Q(S, S) \ne 0$, and set $\partial u/\partial s_i = q_i$. Then equation (6) added to the initial conditions $u = \varphi^*(s_1, \ldots, s_{k-1})$, $q_0 = \psi^*(s_1, \ldots, s_{k-1})$ at $s_0 = 0$ is equivalent to the system in the normal form of partial differential equations of the first order

$$\frac{\partial u}{\partial s_0} = q_0, \qquad \frac{\partial q_\mu}{\partial s_0} = \frac{\partial q_0}{\partial s_\mu}, \quad \mu = 1, \ldots, k - 1,$$

together with the equation for $\partial q_0/\partial s_0$ obtained by solving (6), which is possible because $Q(S, S) \ne 0$, with initial conditions

$$u = \varphi^*(s_1, \ldots, s_{k-1}), \qquad q_0 = \psi^*(s_1, \ldots, s_{k-1}), \qquad q_\mu = \frac{\partial \varphi^*}{\partial s_\mu}$$

when $s_0 = 0$. Thus, if the coefficients are of class $C^\omega$, the preceding theory applies to this equation.

E. Continuous Dependence of Solutions on the Initial Values

First, we consider the following simple example of a linear equation. The wave equation

$$\frac{\partial^2 v}{\partial t^2} = c^2 \frac{\partial^2 v}{\partial x^2}$$

for $v(x, t)$ is the simplest hyperbolic equation. Its solution satisfying the initial conditions $v(x, 0) = f(x)$, $(\partial v/\partial t)(x, 0) = g(x)$ is given by

$$v(x, t) = \frac{1}{2}\bigl(f(x + ct) + f(x - ct)\bigr) + \frac{1}{2c} \int_{x - ct}^{x + ct} g(\alpha)\, d\alpha.$$

It is obvious from this expression that if we regard $f(x)$ and $g(x)$ as elements of the function space $C(\mathbf{R})$ of continuous functions of $x \in \mathbf{R}$ with the topology of uniform convergence on compact sets, then $v(x, t)$ is determined as the value of a linear operator from $C(\mathbf{R})^2$ to $C(\mathbf{R}^2)$ on $xt$-space. If such continuous dependence on the initial values is established, or more precisely, if there is a unique solution for sufficiently smooth initial values that depends on the initial values continuously in a suitable sense, we say that the initial value problem is well posed (properly posed or correctly posed).

Systematic research on the well-posedness of Cauchy problems was initiated by I. G. Petrovskii, who considered the following system of partial differential equations, which is more general than differential equations of the normal form. (The coefficients are all assumed to be functions of $t$ only.)

$$\frac{\partial^{n_j} u_j}{\partial t^{n_j}} = \sum_{k=1}^{N} \sum_{\nu} a_{jk}^{\nu_0 \nu_1 \cdots \nu_n}(t)\, \frac{\partial^{\nu_0 + \nu_1 + \cdots + \nu_n} u_k(x, t)}{\partial t^{\nu_0}\, \partial x_1^{\nu_1} \cdots \partial x_n^{\nu_n}} + B_j(x_1, \ldots, x_n, t), \quad j = 1, \ldots, N,$$

$$|\nu| = \nu_1 + \nu_2 + \cdots + \nu_n, \qquad \nu_0 < n_k. \tag{7}$$

This has a normal form if $n_k = m$ ($k = 1, \ldots, N$). Now, taking the derivatives

$$\left(\frac{\partial}{\partial t}\right)^{n_j - 1} u_j(x, t), \quad \left(\frac{\partial}{\partial t}\right)^{n_j - 2} u_j(x, t), \quad \ldots, \quad \frac{\partial}{\partial t} u_j(x, t)$$

as new unknowns, we get another system:

$$\frac{\partial u_j}{\partial t} = \sum_{k=1}^{N'} \sum_{\nu} a_{jk}^{\nu_1 \cdots \nu_n}(t)\, \frac{\partial^{\nu_1 + \cdots + \nu_n} u_k}{\partial x_1^{\nu_1} \cdots \partial x_n^{\nu_n}} + c_j(x, t), \quad j = 1, \ldots, N'. \tag{8}$$

We take as the space of initial values the topological linear space composed of all functions whose derivatives up to a sufficiently large order are bounded on the whole space $\mathbf{R}^n$ and equipped with the topology determined by the seminorms that are the maximums of derivatives up to a given order on the whole space, and we take as the range space a similar space on the $xt$-space, where $x \in \mathbf{R}^n$ and $0 \le t \le T$. Then we can formulate a necessary and sufficient condition for the well-posedness of the initial value problem for the future (the problem is regarded as specifying a mapping that assigns to the initial values on $t = 0$ the values $u(x, t)$ for $t > 0$). To give such a condition we consider the following system of ordinary differential equations, which are given by a Fourier transformation on the $x$-space of system (8):

$$\frac{d \tilde{u}_j}{dt} = \sum_{k=1}^{N'} \sum_{\nu} a_{jk}^{\nu_1 \cdots \nu_n}(t)\, (2\pi i \xi_1)^{\nu_1} \cdots (2\pi i \xi_n)^{\nu_n}\, \tilde{u}_k(\xi, t) + \tilde{c}_j(\xi, t), \quad j = 1, \ldots, N'. \tag{9}$$

If the fundamental system of solutions of (9) is $v_j^{(i)}(\xi, t)$ ($i = 1, \ldots, N'$, $j = 1, \ldots, N'$), then the condition is that these functions satisfy the inequalities

$$|v_j^{(i)}(\xi, t)| \le C(1 + |\xi|)^L, \quad 0 \le t \le T, \tag{10}$$

where $C$ and $L$ are constants independent of $\xi$. This is the condition obtained by Petrovskii. If system (7) is of normal form and is well posed for the future, it is also well posed for the past. In this case, equation (7) is said to be of hyperbolic type (→ 325 Partial Differential Equations of Hyperbolic Type). An example that is not of a normal form and is well posed is a parabolic equation (→ 327 Partial Differential Equations of Parabolic Type). In Petrovskii's theory it is sufficient to assume that the coefficients in (7) are continuous and bounded, and we can take $T$ arbitrarily large provided that (10) is satisfied. Hence Petrovskii's theory guarantees global existence of solutions.

F. Uniqueness of Solutions

If a linear partial differential equation of the first order of normal form has coefficients of class $C^\omega$, then the solution of class $C^1$ satisfying the prescribed initial conditions is unique (Holmgren's uniqueness theorem, 1901). Every system of partial differential equations of higher order of normal form can be reduced to a system of the first order of normal form. Therefore, if the coefficients are of class $C^\omega$, there is only one solution for the original initial value problem with continuous partial derivatives up to the order of the equation. Moreover, if an analytic manifold $S$ of dimension $n - 1$ ($n$ is the number of independent variables) is not characteristic for the given equation of order $m$, there is at most one solution whose derivatives of order up to $m - 1$ coincide with given functions on the manifold $S$. The proof of this fact relies on the Cauchy-Kovalevskaya existence theorem.

The uniqueness problem for the initial value problem is in general very difficult even for linear equations when the coefficients are not of class $C^\omega$.

In particular, if the number of independent variables is 2 and the coefficients are all real, then we have a result of T. Carleman (1939) about the system:

where the $a_{\mu\nu}$ are of class $C^2$ and the $b_{\mu\nu}$ are continuous. He proved that if the eigenvalues of the matrix $(a_{\mu\nu})$ are all distinct, then even when some of the eigenvalues are complex, there is at most one solution of class $C^1$ for the initial value problem. If we omit the assumption about the eigenvalues, however, the theorem does not hold in general, because we have a counterexample due to A. Pliś (1954) where all coefficients are of class $C^\infty$, $m = 2$, $b_{\mu\nu} = 0$.

A. P. Calderón showed that Carleman's result can be extended to the case $n > 2$ (Amer. J. Math., 80 (1958)). Consider the following linear partial differential equation of the $k$th order:

where $P_j(x, y, \xi)$ is a homogeneous polynomial of degree $j$ of $\xi = (\xi_1, \ldots, \xi_n)$ with real coefficients and $B$ is a differential operator in $(x, y)$ of order at most $k - 1$. We assume that the coefficients of $P_j(x, y, \xi)$ are functions of $(x, y)$ of class $C^1$, their derivatives are Hölder continuous, and the coefficients of $B$ are bounded and continuous. If the characteristic equation of (11),

$$\lambda^k + \sum_{j=1}^{k} P_j(x, y, \xi)\, \lambda^{k-j} = 0,$$

has only distinct roots for $\xi \ne 0$, then the $C^k$-solution of the Cauchy problem is unique in a neighborhood of $x = a$. Calderón proved this except for the cases $k \ge 4$, $n = 2$, where a certain topological difficulty arises. S. Mizohata (J. Math. Soc. Japan, 11 (1959)) succeeded in obtaining the proof for the exceptional cases. This result can be extended to systems of equations under similar assumptions. See L. Hörmander [3] for an extension to the complex coefficient case. S. Mizohata, T. Shirota, and H. Kumano-go discuss the uniqueness theorem for equations of double characteristics or of parabolic type.

For nonlinear equations there are, in general, very few results about the global existence of solutions. For example, if the function $F$ in equation (2) in the normal form satisfies the Lipschitz condition with constants $A$ and $B$ that are independent of $x$, then we get a global existence theorem. The method of proof of this theorem is as follows: First, we prove the existence for $|x| \le \varepsilon_1$ and sufficiently small $\varepsilon_1$ by Picard's successive iteration method. Then we regard $x = \varepsilon_1$ as the hyperplane on which the initial values are assigned and prove the existence of a solution on $\varepsilon_1 \le |x| \le \varepsilon_2$, and so on. The same method can be applied to nonlinear systems of the first order if they are special types of quasilinear systems.

G. Construction of Solutions by Asymptotic Expansion

Let $a(x, D) = \sum_{|\alpha| \le m} a_\alpha(x) D^\alpha$ be a linear partial differential operator of order $m$, with coefficients of class $C^\infty$. We write $a(x, \xi) = \sum_{|\alpha| \le m} a_\alpha(x) \xi^\alpha = h(x, \xi) + h'(x, \xi) + \cdots$ with $h$ and $h'$ homogeneous in $\xi$ of degree $m$ and $m - 1$, respectively. Let $K : \varphi(x) = 0$ be a regular

surface of codimension 1. We assume that $K$ is a simple characteristic, i.e., $h(x, \varphi_x) = 0$ and $(\partial h/\partial \xi)(x, \varphi_x) \ne (0)$ on $K$. Let $f_j(t)$ ($j = 0, 1, \ldots$) be a sequence of functions satisfying $(df_j/dt)(t) = f_{j-1}(t)$, $j = 1, 2, \ldots$. Then the equation $a(x, D)u(x) = 0$ has a formal solution in the form

$$u(x) = \sum_{j=0}^{\infty} f_j(\varphi(x))\, u_j(x).$$

The coefficients $u_j(x)$ are obtained by solving successively the equations

$$L_1[u_j] = \sum_{k=2}^{m} L_k[u_{j-k+1}], \tag{13}$$

where

$$L_1 = \sum_{i} \frac{\partial h}{\partial \xi_i}(x, \varphi_x)\, \frac{\partial}{\partial x_i} + \frac{1}{2} \sum_{i,j} \frac{\partial^2 h}{\partial \xi_i\, \partial \xi_j}(x, \varphi_x)\, \frac{\partial^2 \varphi}{\partial x_i\, \partial x_j} + h'(x, \varphi_x)$$

and $L_k$ ($2 \le k \le m$) are differential operators of order $k$ depending only on $a$ and $\varphi$. By this method of asymptotic expansion of solution, fundamental solutions and singularities of solutions for hyperbolic equations have been studied (Hadamard [5, 6]; R. Courant and P. Lax, Proc. Nat. Acad. Sci. US, 42 (1956); Lax, Duke Math. J., 24 (1957); D. Ludwig, Comm. Pure Appl. Math., 22 (1960); S. Mizohata, J. Math. Kyoto Univ., 1 (1962); → 325 Partial Differential Equations of Hyperbolic Type).

If $a(x, D)$ is an operator with analytic coefficients, this formal solution is convergent. By using this fact, Mizohata constructed null solutions. In fact, put $f_j(t) = t^{p+m+j}/\Gamma(p + m + j + 1)$ ($p > 0$), $j = 0, 1, \ldots$. Define $u(x)$ by $u(x) = 0$ for $\varphi(x) < 0$ and $u(x) = \sum_{j=0}^{\infty} f_j(\varphi(x)) u_j(x)$ for $\varphi(x) \ge 0$ obtained by the preceding process. Then $u(x)$ is a null solution of $a(x, D)u(x) = 0$ (Mizohata, J. Math. Kyoto Univ., 1 (1962)).

The Cauchy problem in the case when the initial surface has characteristic points had been studied by J. Leray and by L. Gårding, T. Kotake, and Leray. Let $a(x, D)$ be a differential operator with holomorphic coefficients in a complex domain. Let $S : s(x) = 0$ be a regular surface and $T$ be a subvariety of codimension 1 of $S$. Suppose that $S$ is noncharacteristic on $S - T$, but characteristic on $T$. Consider the Cauchy problem (1) of Section A. Then the solution $u(x)$ is ramified around a characteristic surface $K$ that is tangent to $S$ on $T$, and it can be uniformized (Leray, Bull. Soc. Math. France, 85 (1957); Gårding, Kotake, and Leray, Bull. Soc. Math. France, 92 (1964)).

Next we consider the Cauchy problem (1) when the initial surface $S$ ($x_1 = 0$) is noncharacteristic, but the initial values $w_k(x)$ ($0 \le k \le m - 1$) have singularities on a regular subvariety $T$ ($x_1 = x_2 = 0$) of $S$. We assume that $h(0, \xi_1, 1, 0, \ldots, 0) = 0$ has $m$ distinct roots. Then there exist $m$ characteristic surfaces $K_i : \varphi_i(x) = 0$ ($1 \le i \le m$) originating from $T$. $\varphi_i(x)$ is obtained by solving the Hamilton-Jacobi equation $h(x, \varphi_x) = 0$, $\varphi(0, x_2, \ldots, x_n) = x_2$. Now, if $w_k(x)$ ($0 \le k \le m - 1$) has a pole along $T$, the Cauchy problem (1) has a unique solution in the form

$$u(x) = \sum_{i=1}^{m} \left\{ \frac{F_i(x)}{\varphi_i(x)^{p_i}} + G_i(x) \log \varphi_i(x) \right\} + H(x),$$

where $F_i(x)$, $G_i(x)$ ($1 \le i \le m$) and $H(x)$ are holomorphic functions in a neighborhood of $x = 0$ and $p_i$ ($1 \le i \le m$) are integers $\ge 0$. In order to obtain this solution, we set $u(x)$ in the form $u(x) = \sum_{i=1}^{m} \sum_{j=0}^{\infty} f_j(\varphi_i(x))\, u_{i,j}(x)$, where $f_0(t)$ is a function with a pole or a logarithmic singularity at $t = 0$ chosen so that $u(x)$ satisfies the initial conditions. Thus we can determine the coefficients $u_{i,j}(x)$ by solving successively the equations (13) on each $K_i$ ($1 \le i \le m$) (Y. Hamada, Publ. Res. Inst. Math. Sci., 5 (1969); C. Wagschal, J. Math. Pures Appl., 51 (1972)). When the multiplicity of characteristic roots is more than 1, the situation is not the same. For example, consider the Cauchy problem

$$D_1^2 u - D_2 u = 0, \qquad u(0, x_2) = \frac{1}{x_2}, \qquad D_1 u(0, x_2) = 0.$$

The solution is

with essential singularities along $x_2 = 0$. This situation occurs quite generally. We factor $h(x, \xi) = h_1(x, \xi)^{\nu_1} \cdots h_s(x, \xi)^{\nu_s}$, where $h_i$ ($1 \le i \le s$) are irreducible polynomials of degree $m_i$ in $\xi$ with holomorphic coefficients. We assume that the equation $\prod_{i=1}^{s} h_i(0, \xi_1, 1, 0, \ldots, 0) = 0$ has $p$ distinct roots ($p = m_1 + \cdots + m_s$). Then there exist $p$ characteristic surfaces $K_i$ ($1 \le i \le p$) originating from $T$. We suppose more generally that the initial values $w_k(x)$ are multivalued functions ramified around $T$. Then the Cauchy problem (1) has a unique holomorphic solution on the universal covering space of $V - \bigcup_{i=1}^{p} K_i$, where $V$ is a neighborhood of $x = 0$. In fact, this is solved by transforming this problem into integrodifferential equations. Such a method of solution is closely related to the method discussed in this section. See Hamada, Leray, and Wagschal (J. Math. Pures Appl., 55 (1976)). In this case, even if the initial values have only poles, the solution in general may have essential singularities along $\bigcup_{i=1}^{p} K_i$, but if $a(x, D)$ satisfies Levi's condition ($a(x, D)$ is well decomposable), the solution does not yield essential singularities along $\bigcup_{i=1}^{p} K_i$. See J. De Paris (J. Math. Pures Appl., 51 (1972)).

References

[1] R. Courant and D. Hilbert, Methods of mathematical physics II, Interscience, 1962.
[2] I. G. Petrovskii, Lectures on partial differential equations, Interscience, 1954. (Original in Russian, second edition, 1953.)
[3] L. Hörmander, Linear partial differential operators, Springer, 1963.
[4] E. Goursat, Cours d'analyse mathématique, Gauthier-Villars, II, second edition, 1911; III, fourth edition, 1927.
[5] J. Hadamard, Leçons sur la propagation des ondes et les équations de l'hydrodynamique, Hermann, 1903 (Chelsea, 1949).
[6] J. Hadamard, Le problème de Cauchy et les équations aux dérivées partielles linéaires hyperboliques, Hermann, 1932.
[7] C. Carathéodory, Calculus of variations and partial differential equations of first order, Holden-Day, 1965. (Original in German, 1935.)
[8] S. Mizohata, Theory of partial differential equations, Cambridge University Press, 1973. (Original in Japanese, 1965.)
[9] F. Treves, Linear partial differential equations with constant coefficients, Gordon & Breach, 1966.
[10] F. John, Partial differential equations, Springer, 1971.
[11] J. Dieudonné, Éléments d'analyse, 4, 7, 8, Gauthier-Villars, 1971-1978.

322 (XIII.20)
Partial Differential Equations
(Methods of Integration)

A. General Remarks

The methods of integrating partial differential equations are not as simple as those for ordinary differential equations. For ordinary differential equations, we often obtain the desired solution by first finding the general solution containing several arbitrary constants and then specializing those constants to satisfy prescribed additional conditions. The situation is more complicated, however, for partial differential equations. In the formal general solution of a partial differential equation we have arbitrary functions instead of arbitrary constants, and there are cases where it is impossible, or very difficult, to find a suitable specialization of these functions so that the given additional conditions are fulfilled. For this reason, methods of solution are rather specific and are classified according to the types of additional conditions. In many cases, we assume that a problem is mathematically well posed, guess the possible solutions, and verify it directly.

A problem is said to be mathematically well posed (properly posed or correctly posed) if, under assigned additional conditions, the solution (i) exists, (ii) is uniquely determined, and (iii) depends continuously on the assigned data. By carefully examining problems in physics and engineering, we usually obtain many well-posed and important problems. In these problems, usually the data are sufficiently smooth functions. In these cases, part (iii) of the above definition (the continuous dependence of the solutions on the data) often follows from assumptions (i) and (ii). For example, elliptic equations like Laplace's equation $u_{xx} + u_{yy} = 0$ for $u(x, y)$ describe laws of static or stationary phenomena such as the field of universal gravitation, the electrostatic field, the magnetostatic field, the steady flow of incompressible fluids without vortices, and the steady flow of electricity or heat. For this equation, boundary value problems are well posed, but initial value problems are not (→ 323 Partial Differential Equations of Elliptic Type). By contrast, for hyperbolic equations like $u_{tt} - u_{xx} = 0$ for $u(x, t)$ and parabolic equations like $u_t - u_{xx} = 0$ for $u(x, t)$, which control the change (in reference to time) of the various stationary phenomena, initial value problems or mixed problems with both boundary conditions and initial conditions are well posed (→ 325 Partial Differential Equations of Hyperbolic Type; 327 Partial Differential Equations of Parabolic Type).

For the rest of this article we explain fundamental and typical methods of integration (→ Appendix A, Table 15).

B. The Lagrange-Charpit Method

For the partial differential equation of the first order

$$F(x, y, u, p, q) = 0, \qquad p = \frac{\partial u}{\partial x}, \quad q = \frac{\partial u}{\partial y}, \tag{1}$$

we consider a system of ordinary differential equations called characteristic differential equations:

$$\frac{dx}{F_p} = \frac{dy}{F_q} = \frac{du}{pF_p + qF_q} = \frac{-dp}{F_x + pF_u} = \frac{-dq}{F_y + qF_u}. \tag{2}$$

If we obtain at least one integral of this system containing $p$, $q$, and an arbitrary constant $a$ in the form

$$G(x, y, u, p, q) = a, \tag{3}$$

and if we find $p$ and $q$ from (1) and (3), then $du = p\,dx + q\,dy$ is an exact differential form, and by integrating it we get a solution $\Phi(x, y, u, a, b)$
1201 322 D
PDEs (Methods of Integration)

$= 0$ of (1). A solution of (1) containing two arbitrary constants is called a complete solution. Here, setting $b = g(a)$ and eliminating $a$ from the two equations $\Phi(x, y, u, a, g(a)) = 0$ and $\Phi_a + \Phi_b\, g'(a) = 0$, we obtain a solution involving an arbitrary function. Such a solution is called a general solution of (1). For example, consider $pq = 1$. Since the characteristic differential equations are

$$\frac{dx}{q} = \frac{dy}{p} = \frac{du}{2pq} = \frac{-dp}{0} = \frac{-dq}{0},$$

we have $p = a$ (constant), and hence from the original equation we get $q = a^{-1}$. Then $du = a\,dx + a^{-1}\,dy$ is an exact differential form, and by integrating it a complete solution $u = ax + a^{-1}y + b$ is obtained. A general solution is found by eliminating $a$ from $u = ax + a^{-1}y + g(a)$ and $0 = x - a^{-2}y + g'(a)$.

Likewise, when we have $n$ independent variables $x_1, \ldots, x_n$ in the equation

$$F(x_1, \ldots, x_n, u, p_1, \ldots, p_n) = 0, \qquad p_i = \frac{\partial u}{\partial x_i}, \tag{1'}$$

we can use the characteristic differential equations to find a complete solution and a general solution (the Lagrange-Charpit method; → 82 Contact Transformations).

C. Separation of Variables and the Principle of Superposition

The simplest and most useful method is the separation of variables. Concerning the equation $u_x^2 + u_y^2 = 1$ in $u(x, y)$, for example, by setting $u = \varphi(x) + \psi(y)$ we obtain $\varphi'(x)^2 + \psi'(y)^2 = 1$ or $\varphi'(x)^2 = 1 - \psi'(y)^2$. Since the right-hand side is independent of $x$ and the left-hand side is independent of $y$, both sides must be equal to the same constant $\alpha^2$. From this we get a (complete) solution involving two arbitrary constants $\alpha$, $\beta$:

$$u = \alpha x + \sqrt{1 - \alpha^2}\, y + \beta.$$

For linear equations, it is often effective to write the solution as a product $u = \varphi(x)\psi(y)$. From the heat equation $u_y = u_{xx}$, we obtain by this process a relation $\psi'(y)/\psi(y) = \varphi''(x)/\varphi(x)$, from which we get a particular solution $u = e^{-\nu^2 y} \sin \nu(x - \alpha)$ containing a parameter $\nu$.

Next, when the equation is linear and homogeneous, by forming a linear combination of particular solutions that correspond to various values of a parameter $\nu$, we obtain a new solution (the principle of superposition). For example, by integrating the solution $e^{-\nu^2 y} \cos \nu x$ with respect to the parameter $\nu$ between the limits $-\infty$ and $\infty$ (namely, by a superposition consisting of a linear combination and a limiting process), we obtain a new solution

$$u = \int_{-\infty}^{\infty} e^{-\nu^2 y} \cos \nu x \, d\nu = \sqrt{\frac{\pi}{y}}\, \exp\left(-\frac{x^2}{4y}\right) \quad (y > 0), \tag{4}$$

which is the fundamental solution of the heat equation. This name refers to the fact that the function (4) can be used to obtain solutions of the heat equation under some initial conditions. More exactly, a solution of $u_y - u_{xx} = 0$ that is a function of class $C^2$ in $y > 0$, is continuous in $y \ge 0$, and coincides with a bounded continuous function $\varphi(x)$ on $y = 0$ can be obtained by a superposition of the solution (4) such as

$$u(x, y) = \frac{1}{2\sqrt{\pi y}} \int_{-\infty}^{\infty} \varphi(\xi) \exp\left(-\frac{(x - \xi)^2}{4y}\right) d\xi.$$

The method of separation of variables applied after a suitable transformation of variables is often successful. In particular, by using orthogonal coordinates, polar coordinates, or cylindrical coordinates according to the form of the boundary, we often obtain satisfactory results. For example, consider the boundary value problem of finding a solution of Laplace's equation $\Delta u = u_{xx} + u_{yy} = 0$ that is smooth in the disk $r^2 = x^2 + y^2 < 1$ and takes the values of a given continuous function $g(\theta)$ on the circumference $r = 1$. We can use polar coordinates to rewrite the equation in the form

$$\Delta u = u_{rr} + \frac{1}{r} u_r + \frac{1}{r^2} u_{\theta\theta} = 0$$

and apply the method of separation of variables to obtain particular solutions $r^n \cos n\theta$, $r^n \sin n\theta$. Hence it is reasonable to suspect that by a superposition of these particular solutions we can obtain the desired solution:

$$u(x, y) = \frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos n\theta + b_n \sin n\theta)\, r^n.$$

In fact, this series is a desired solution if the coefficients $a_n$, $b_n$ can be chosen so that the series converges uniformly for $0 \le r \le 1$ and can be differentiated twice term by term for $0 \le r < 1$, and if we have

$$g(\theta) = \frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos n\theta + b_n \sin n\theta).$$

By virtue of the uniqueness of the solution of an elliptic equation, this is the unique desired solution. The boundary value problem in the preceding paragraph is well posed.

D. Mixed Problems

For linear homogeneous equations of hyperbolic or parabolic type, mixed problems fre-

quently appear, i.e., problems in which both initial and boundary conditions are assigned. These problems are, furthermore, classified into two types.

For the first type, homogeneous boundary conditions are assigned. For example, in vibration problems of a nonhomogeneous string between $0 \le x \le l$, we must find the solution of the equation

$$\rho(x)\, \frac{\partial^2 u}{\partial t^2} = \frac{\partial}{\partial x}\left(T(x)\, \frac{\partial u}{\partial x}\right)$$

satisfying an initial condition $u = f(x)$, $u_t = g(x)$ for $t = 0$, under homogeneous boundary conditions: (i) $u = 0$ for $x = 0$ and $x = l$ in the case of two fixed ends; (ii) $u_x = 0$ for $x = 0$ and $x = l$ in the case of two free ends; (iii) $-u_x + \sigma_0 u = 0$ for $x = 0$, and $u_x + \sigma_1 u = 0$ for $x = l$ ($\sigma_0 > 0$, $\sigma_1 > 0$), where the two ends are tied elastically to the fixed points; (iv) $T(0)u(0) = T(l)u(l)$, $T(0)u'(0) = T(l)u'(l)$ (periodicity condition); (v) $u$, $u'$ are finite at $x = 0$ and $x = l$ (regularity condition).

The method of separation of variables is applicable to problems of this type also. For example, suppose that we have two fixed ends. Disregarding the initial condition for a while, we find a particular solution fulfilling only the boundary condition by setting $u(x, t) = y(x) \exp i\nu t$. Here, $y(x)$ must satisfy

$$(T(x)y')' + \nu^2 \rho(x) y = 0, \qquad y(0) = y(l) = 0. \tag{5}$$

Except when the solution is trivial (i.e., $u(x, t) \equiv 0$), $y(x) \not\equiv 0$ must hold. But in the special case $T(x) = 1$, $\rho(x) = 1$, $l = \pi$, the values of $\nu$ for which functions of this kind exist are only $1, 2, 3, \ldots$ (the corresponding $y(x)$ is $\sin \nu x$). Also, in more general cases, nontrivial solutions $y(x)$ exist only for some discrete values of $\nu$. These $\nu$ are called eigenvalues of (5), and the corresponding solutions $y(x)$ are called eigenfunctions for the eigenvalues $\nu$. That is, the desired particular solution can be obtained by solving the eigenvalue problem (5). Again, when $T(x) = \rho(x) = 1$, $l = \pi$, particular solutions are $\sin nx \exp int$, $\sin nx \exp(-int)$ ($n = 1, 2, \ldots$). We consider

$$u(x, t) = \sum_{n=1}^{\infty} (a_n \cos nt + b_n \sin nt) \sin nx, \tag{6}$$

which is obtained by a superposition of these particular solutions. If we can determine the coefficients $a_n$, $b_n$ so that this series converges uniformly and is twice differentiable term by term, and

$$f(x) = \sum_{n=1}^{\infty} a_n \sin nx, \qquad g(x) = \sum_{n=1}^{\infty} n b_n \sin nx,$$

then the series (6) is a solution of the mixed problem in question. If the uniqueness of the solution of the mixed problem is proved, it is not necessary to look for any solution other than the one obtained by combining the method of separation of variables and the principle of superposition.

Furthermore, the method described in this section is applicable to solving the following nonhomogeneous equation, which characterizes the motion of a string under the influence of an external force $f(x, t)$:

$$u_{tt} - u_{xx} = f(x, t) \tag{7}$$

under the boundary condition for the first type of mixed problem and the initial condition $u = u_t = 0$ for $t = 0$. In this case, we expand the unknown $u$ and the function $f(x, t)$ in terms of the system of eigenfunctions $\{\sin nx\}$, and by substituting

$$u = \sum_{n=1}^{\infty} a_n(t) \sin nx, \qquad f(x, t) = \sum_{n=1}^{\infty} A_n(t) \sin nx$$

into (7), we reduce the problem to determining $a_n(t)$ ($n = 1, 2, \ldots$). When the external force varies with a harmonic oscillation over time as in $f(x, t) = -\varphi(x) \exp(-i\omega t)$, a similar method can be applied by setting $u = v(x) \exp(-i\omega t)$.

For mixed problems of the second type, the homogeneous initial condition $u = 0$, $u_t = 0$ for $t = 0$ is assigned, but the boundary condition is nonhomogeneous. For example, when an oscillating string is at rest until $t = 0$, and for $t > 0$ its right end is fixed and its left end moves subject to an assigned rule, the behavior of the string is described by the solution of $u_{tt} - u_{xx} = 0$ under the boundary condition $u(0, t) = f(t)$, $u(l, t) = 0$ ($t > 0$). This is called a transient problem. If we now choose an arbitrary function $B(x, t)$ that fulfills all the boundary and initial conditions, and if we set $u - B(x, t) = v(x, t)$ and $B_{xx} - B_{tt} = f(x, t)$, then $v$ satisfies

$$v_{tt} - v_{xx} = f(x, t) \quad \text{for } t > 0$$

with $v = v_t = 0$ for $t = 0$; $v = 0$ for $x = 0$ and $x = l$. Then $v(x, t)$ describes the oscillation of a string that is at rest until $t = 0$ and moves under the effect of an external force represented by $f(x, t)$ for $t > 0$. Problems of this kind often appear in electrical engineering.

Such problems can be reduced to problems of the first type in the manner described in the previous paragraph, but there are some direct methods that are more effective, the first being Duhamel's method. Consider the case where $f(t)$ is the unit impulse function:

$$f(t) = \begin{cases} 1, & t > 0, \\ 0, & t < 0, \end{cases}$$

and let $U(x, t)$ be a solution corresponding to this case and vanishing for $t \le 0$, $x > 0$. Then a solution corresponding to the general case is given by

$$u(x, t) = \frac{\partial}{\partial t} \int_0^t U(x, t - \tau) f(\tau)\, d\tau.$$
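The reduction of (7) to ordinary differential equations can be made concrete. Substituting the two sine expansions into (7) gives $a_n'' + n^2 a_n = A_n(t)$ with $a_n(0) = a_n'(0) = 0$, whose solution is $a_n(t) = \frac{1}{n}\int_0^t \sin n(t - \tau)\, A_n(\tau)\, d\tau$. The Python sketch below evaluates this for the case $T = \rho = 1$, $l = \pi$ with two fixed ends. The function names are illustrative, and the trapezoidal rule is an assumption made for simplicity rather than a method taken from the text.

```python
import math


def mode_response(n, forcing, t, steps=2000):
    """Approximate a_n(t) = (1/n) * integral_0^t sin(n (t - tau)) forcing(tau) d tau.

    Uses the composite trapezoidal rule with `steps` subintervals.
    """
    if t == 0.0:
        return 0.0
    h = t / steps
    total = 0.0
    for k in range(steps + 1):
        tau = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.sin(n * (t - tau)) * forcing(tau)
    return total * h / n


def forced_string(x, t, forcing_modes):
    """Sum a_n(t) sin(n x) over the modes given as {n: A_n}, A_n a function of time."""
    return sum(
        mode_response(n, a_n, t) * math.sin(n * x)
        for n, a_n in forcing_modes.items()
    )


# Example: f(x, t) = sin x, i.e. A_1(t) = 1 and all other A_n = 0.
# The mode equation a_1'' + a_1 = 1 with zero initial data gives a_1(t) = 1 - cos t.
modes = {1: lambda tau: 1.0}
displacement = forced_string(math.pi / 2, 1.0, modes)
```

For the forcing in the example, `displacement` should agree with $1 - \cos 1$ up to the quadrature error, since $\sin(\pi/2) = 1$.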

The second method is based on application 2. For the linear partial differential operator
of the tLaplace transformation. Denoting the
Laplace transform of a solution u(x, t) by &+c(x.y)~w
Lw=A(x,Y)~+2B(x,y)
mue-Pldt=!&@,
s 0 P +D(x,y)~+E(x,y)aU+F(x,y)u
ay
we multiply both sides of u,, - u,, = 0 by e-p’
and its adjoint partial differential operator
and integrate the result with respect to t from
0 to co. Then, taking account of the initial a2(Av) + 2 aw4 + awd
M(v)=axZ - ~
condition, we have axay ay2
v,,-p%=o
-- ww -- a(w + Fv
with the boundary condition ax ay ’

we have tGreen’s formula:


v(O,p)=p mf(t)e-p’dt, v(l, p) = 0.
s0
(vL(u)-uM(v))dxdy
If for p = LX+ i/3 (tl > clo) we can find a solution
v(x, p) of this boundary value problem for the
ordinary differential equation, then the desired (8)
solution is given by

4x,
t)=&
s L
4x9 PI
-ePdp,
P
where
P=v{Au,+Bu,}-u{(Av),+(Bu),}+Duu,

Q=v{Bu,+Cu,}-u{(Bv),+(Cv),}+Euv, (9)
where L is a path in c(> a0 parallel to the
imaginary axis. This is called the Bromwich and dD denotes the boundary curve of the
integral. domain D, n the internal normal of aD at a
E. Green's Formula and Application of Fundamental Solutions

Given a linear partial differential operator

$$L[u] = \sum_{p} a_p(x) D^p u, \qquad p = (p_1, \dots, p_n), \qquad D^p = \frac{\partial^{p_1 + \dots + p_n}}{\partial x_1^{p_1} \cdots \partial x_n^{p_n}},$$

we call the operator

$$L^*[v] = \sum_{p} (-1)^{p_1 + \dots + p_n} D^p\bigl(\overline{a_p(x)}\, v\bigr)$$

the adjoint operator of $L$, and the operator

$${}^tL[v] = \sum_{p} (-1)^{p_1 + \dots + p_n} D^p\bigl(a_p(x)\, v\bigr)$$

the transposed operator of $L$. Sometimes ${}^tL$ is also called the adjoint operator of $L$. In the complex Hilbert space, the adjoint operators are more appropriate than the transposed operators. In this case, the operator $L^*$ defined above is usually called the formal adjoint operator to distinguish it from the one defined by

$$(L[u], v) = (u, L^*[v]) \quad \text{for all } u \in D(L),$$

$D(L)$ being the domain of definition of $L$ (→ 251 Linear Operators). These transposed or adjoint operators are often used to represent (at least locally) the solutions $u$ of $L[u] = f$.

We explain in more detail the specific case where the number of independent variables is

point of $\partial D$, and $s$ the arc length.

We can apply this formula for solving a nonhomogeneous equation $L(u) = f$ as follows. Assume that there exists a solution of $L(u) = f$ satisfying the assigned additional condition, and choose a †fundamental solution of $M(v) = 0$ having an adequate singularity at a point $(x_0, y_0)$ of $D$ and fulfilling a suitable boundary condition. Then, if we substitute these solutions $u$ and $v$ into (8), we obtain an explicit representation of $u(x_0, y_0)$. If we can verify that the function $u(x, y)$ thus obtained is a solution of $L(u) = f$ fulfilling the assigned additional conditions, then we see, under the assumption of uniqueness of solutions, that this and only this function $u(x, y)$ is the desired solution. For example, consider a boundary problem for

$$L(u) = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}, \quad \text{for which} \quad M(v) = \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2}.$$

The problem is, for a circle of radius $r$ with center at the origin, to find a function $u(x, y)$ that is continuous in the interior and on the circumference $C_r$ of the circle, and that satisfies $L(u) = f$ in the interior of the circle ($f$ is bounded and continuous in the interior of the circle) and is equal to a given continuous function $g$ on the circumference $C_r$. In this case, we consider the circumference $K_\varepsilon$ of sufficiently small radius $\varepsilon$ with center $(x_0, y_0)$ contained in the interior of the first circle, and set

$$v(x, y) = \frac{1}{2\pi} \log\frac{1}{\rho} + h(x, y), \qquad \rho = \bigl((x - x_0)^2 + (y - y_0)^2\bigr)^{1/2}. \qquad (10)$$
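The singular part $(1/2\pi)\log(1/\rho)$ of (10) has two properties that the argument relies on: it is harmonic away from $(x_0, y_0)$, and its radial derivative integrated around a small circle about $(x_0, y_0)$ equals $-1$. The following is a minimal numerical sketch of both facts, not part of the original text; the pole location, the helper names, and the finite-difference step sizes are assumptions made for illustration.

```python
import math

# Hypothetical pole (x0, y0), chosen only for illustration.
X0, Y0 = 0.3, -0.2

def v_sing(x, y):
    """Singular part of (10): (1/(2*pi)) * log(1/rho)."""
    rho = math.hypot(x - X0, y - Y0)
    return math.log(1.0 / rho) / (2.0 * math.pi)

def laplacian(f, x, y, h=1e-3):
    """Five-point finite-difference approximation of f_xx + f_yy."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h)
            - 4.0 * f(x, y)) / (h * h)

def radial_flux(f, eps, n=720, h=1e-6):
    """Approximate the integral of (df/drho) * rho over theta in [0, 2*pi]
    on the circle rho = eps centered at the pole (midpoint rule in theta,
    central difference in rho)."""
    dtheta = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dtheta
        c, s = math.cos(t), math.sin(t)
        outer = f(X0 + (eps + h) * c, Y0 + (eps + h) * s)
        inner = f(X0 + (eps - h) * c, Y0 + (eps - h) * s)
        total += (outer - inner) / (2.0 * h) * eps * dtheta
    return total

lap = laplacian(v_sing, 0.9, 0.5)   # a point away from the pole
flux = radial_flux(v_sing, 0.05)    # small circle around the pole
print("Laplacian away from the pole:", lap)
print("radial flux around the pole: ", flux)
```

The first value should be close to zero and the second close to $-1$, which is the normalization that lets Green's formula isolate the value $u(x_0, y_0)$.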

We apply formula (8) to the domain $D_\varepsilon$ enclosed by the circumferences $K_\varepsilon$ and $C_r$. If, in (10), $h$ satisfies $M(h) = 0$ in the interior of the circle enclosed by $C_r$ and $v$ vanishes on $C_r$, then we get

$$\iint_{D_\varepsilon} v f \, dx\, dy = -\int_{K_\varepsilon} (v u_n - u v_n)\, ds + \int_{C_r} g v_n \, ds.$$

By letting $\varepsilon$ tend to zero, the first term on the right-hand side yields

$$-u(x_0, y_0) \int_0^{2\pi} \frac{1}{2\pi} \frac{\partial \log \rho}{\partial \rho}\, \rho \, d\theta = -u(x_0, y_0),$$

by the logarithmic singularity of $v$ at the point $(x_0, y_0)$. Therefore we have an explicit representation of $u$,

$$u(x_0, y_0) = \int_{C_r} g v_n \, ds - \iint_{x^2 + y^2 < r^2} v f \, dx\, dy,$$

and it is easily verified that this is the desired solution.

As stated in the previous paragraph, to apply Green's formula it is necessary to find a solution $v$ of $M(v) = 0$ possessing a fundamental singularity such as the logarithmic singularity, i.e., a so-called fundamental solution. As fundamental singularities for

$$M(v) = \frac{\partial v}{\partial y} - \frac{\partial^2 v}{\partial x^2} \quad \text{and} \quad M(v) = \frac{\partial^2 v}{\partial y^2} - \frac{\partial^2 v}{\partial x^2}$$

of parabolic type and of hyperbolic type, we must take those given respectively by

$$v(x, y) = \begin{cases} \dfrac{1}{2\sqrt{\pi (y - y_0)}} \exp\left(-\dfrac{(x - x_0)^2}{4(y - y_0)}\right), & y > y_0, \\ 0, & y \le y_0, \end{cases}$$

and

$$v(x, y) = \begin{cases} 1/2, & |x - x_0| < y - y_0, \\ 0, & |x - x_0| > y - y_0 \end{cases}$$

(→ 323 Partial Differential Equations of Elliptic Type; 325 Partial Differential Equations of Hyperbolic Type; 327 Partial Differential Equations of Parabolic Type).

References

[1] R. Courant and D. Hilbert, Methods of mathematical physics, Interscience, I, 1953; II, 1962.
[2] E. Goursat, Cours d'analyse mathématique, Gauthier-Villars, III, 1927.
[3] A. G. Webster and G. Szegő, Partielle Differentialgleichungen der mathematischen Physik, Teubner, 1930.
[4] L. Schwartz, Mathematics for the physical sciences, Hermann, revised edition, 1966.
[5] J. L. Lions, Equations différentielles opérationnelles et problèmes aux limites, Springer, 1961.
[6] A. Friedman, Partial differential equations, Holt, Rinehart and Winston, 1969.

323 (XIII.24)
Partial Differential Equations of Elliptic Type

A. General Remarks

Suppose that we are given a †linear second-order partial differential equation

$$L[u] \equiv \sum_{i,j=1}^{n} a_{ij}(x) \frac{\partial^2 u}{\partial x_i \partial x_j} + \sum_{i=1}^{n} b_i(x) \frac{\partial u}{\partial x_i} + c(x) u = f(x), \qquad (1)$$

where $x = (x_1, \dots, x_n)$, $a_{ji} = a_{ij}$. If the quadratic form $\sum a_{ij} \xi_i \xi_j$ in $\xi$ is †positive definite at every point $x$ of a domain $G$, this equation (or the operator $L$) is said to be elliptic (or of elliptic type) in $G$. For $n = 2$, $a_{11}(x) a_{22}(x) - (a_{12}(x))^2 > 0$ is the condition of ellipticity. In this case, by a change of independent variables, the equation is transformed locally into the canonical form

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + b_1(x, y) \frac{\partial u}{\partial x} + b_2(x, y) \frac{\partial u}{\partial y} + c(x, y) u = f(x, y).$$

The operator $\sum_{i=1}^{n} \partial^2 / \partial x_i^2$, denoted by $\Delta$, is called Laplace's operator (or the Laplacian). The simplest examples of elliptic equations are $\Delta u = 0$ (Laplace's differential equation) and $\Delta u = f(x)$ (Poisson's differential equation) (→ Appendix A, Table 15).

B. Fundamental Solutions

Let $K$ be the $n$-dimensional ball with radius $R$, center $x_0$, boundary $\Omega$ (an $(n - 1)$-dimensional sphere), and area $S_n$, and let $r$ be the distance from $x_0$ to $x$. Then for any function $u(x)$ of class $C^2$ we have

$$u(x_0) = \frac{1}{S_n} \int_{\Omega} u \, dS - \frac{R^{n-1}}{(n - 2) S_n} \int_K \left(r^{2-n} - R^{2-n}\right) \Delta u \, dx \quad \text{for } n > 2,$$

$$u(x_0) = \frac{1}{S_2} \int_{\Omega} u \, dS - \frac{1}{2\pi} \int_K \log\frac{R}{r}\, \Delta u \, dx \quad \text{for } n = 2.$$
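For $n = 2$ the mean value identity with a Laplacian correction, $u(x_0) = \frac{1}{2\pi R}\int_\Omega u\,dS - \frac{1}{2\pi}\int_K \log(R/r)\,\Delta u\,dx$, can be checked numerically. The sketch below is an illustration only: the test function, the center, the radius, and the quadrature resolutions are all assumptions, and the integrals are approximated by a simple midpoint rule in polar coordinates.

```python
import math

def u(x, y):
    """Hypothetical C^2 test function (not harmonic)."""
    return x * x + y ** 3 + x * y

def lap_u(x, y):
    """Laplacian of the test function, computed by hand: u_xx + u_yy."""
    return 2.0 + 6.0 * y

def boundary_mean(x0, y0, R, n=2000):
    """Mean of u over the circle of radius R about (x0, y0)."""
    total = 0.0
    for k in range(n):
        t = 2.0 * math.pi * k / n
        total += u(x0 + R * math.cos(t), y0 + R * math.sin(t))
    return total / n

def correction(x0, y0, R, nr=400, nt=400):
    """(1/(2*pi)) * integral over the disk of log(R/r) * Laplacian(u),
    using the midpoint rule in r and theta (area element r dr dtheta)."""
    dr, dt = R / nr, 2.0 * math.pi / nt
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * dr
        weight = math.log(R / r) * r * dr * dt
        for j in range(nt):
            t = (j + 0.5) * dt
            total += weight * lap_u(x0 + r * math.cos(t), y0 + r * math.sin(t))
    return total / (2.0 * math.pi)

x0, y0, R = 0.3, -0.2, 0.5
lhs = u(x0, y0)
rhs = boundary_mean(x0, y0, R) - correction(x0, y0, R)
print(lhs, rhs)
```

The two printed numbers should agree to several digits. Replacing `lap_u` by a given right-hand side $f$ gives the corresponding representation of a solution of Poisson's equation at the center of the disk.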

Thus, if $u$ is a solution of Poisson's equation $\Delta u = f(x)$, we have a representation of $u(x_0)$ by replacing $\Delta u$ by $f$ in the integrals just given. Next, concerning the solutions of the equation $\Delta u + cu = 0$ ($c > 0$, constant), the following relation holds:

$$\frac{1}{S_n} \int_{\Omega} u \, dS = \Gamma\left(\frac{n}{2}\right) \left(\frac{2}{\sqrt{c}\, R}\right)^{n'} J_{n'}(\sqrt{c}\, R)\, u(x_0),$$

where $\Gamma$ is the †gamma function, $n' = (n - 2)/2$, and $J_\nu$ is the †Bessel function of order $\nu$.

Now, if we put

$$V(x, \xi) = \begin{cases} \left(\sum (x_i - \xi_i)^2\right)^{(2-n)/2} = r^{2-n}, & n \ge 3, \\ \log\left(\sum (x_i - \xi_i)^2\right)^{1/2} = \log r, & n = 2, \end{cases}$$

then the function $u(x)$ defined by

$$u(x) = -\frac{1}{(n - 2)\omega_n} \int_G V(x, \xi) f(\xi)\, d\xi \quad (n \ge 3), \qquad u(x) = \frac{1}{2\pi} \int_G V(x, \xi) f(\xi)\, d\xi \quad (n = 2), \qquad \omega_n = \frac{2\pi^{n/2}}{\Gamma(n/2)},$$

represents a †particular solution of $\Delta u = f(x)$, where $V(x, \xi)$ as a function of the variables $x$ satisfies $\Delta V(x, \xi) = 0$ except at $x = \xi$.

Consider now the more general case (1). A function $E(x, \xi)$ is called a fundamental solution (or elementary solution) of (1) or of $L$ if

$$u(x) = \int_G E(x, \xi) f(\xi)\, d\xi$$

provides a solution of (1) for any $f \in C^\infty$ with compact support. A fundamental solution $E(x, \xi)$ is a solution of the equation $L[E] = 0$ having a singularity at $x = \xi$ of the form $-\omega_n^{-1} \sqrt{a(\xi)}\, r^{2-n}$ ($n \ge 3$) or $\omega_2^{-1} \sqrt{a(\xi)} \log r$ ($n = 2$), where $a(\xi) = \det(a^{ij}(\xi))$ and $r = \left(\sum a^{ij}(\xi)(x_i - \xi_i)(x_j - \xi_j)\right)^{1/2}$ and $(a^{ij})$ is the inverse matrix of $(a_{ij})$.

Roughly, there are three different methods of constructing fundamental solutions. The first and most general is to use pseudodifferential operators (→ 345 Pseudodifferential Operators). The second is that of J. Hadamard [1], which uses the geodesic distance between two points $x$ and $\xi$ with respect to the Riemannian metric $\sum a^{ij}(x)\, dx_i\, dx_j$. This is important in applications of elliptic equations to geometry. The third method is due to E. E. Levi [2] and is as follows: Let

$$V(x, \xi) = \begin{cases} r^{2-n}, & n \ge 3, \\ -\log r, & n = 2. \end{cases}$$

To obtain a solution of $L[u] = f(x)$ we set

$$u(x) = \int_G V(x, \xi) \varphi(\xi)\, d\xi.$$

Writing $L[V(x, \xi)] = \chi(x, \xi)$, we obtain the following †integral equation of Fredholm type in $\varphi(x)$:

If we denote the †resolvent of this equation by $\Gamma(x, \xi)$ and put

then $E(x, \xi) = -\gamma(x, \xi) \sqrt{a(\xi)} / \omega_n$ is seen to be a fundamental solution of $L$. Thus Levi's method enables us to construct a fundamental solution locally by successive approximation, because the integral equation in $\varphi$ as above has a unique solution expressible by Neumann series if the domain $G$ is small enough (→ 189 Green's Operator).

C. The Dirichlet Problem

Let $G$ be a bounded domain with boundary $\Gamma$. We call the problem of finding a solution $u$ of the given elliptic equation in $G$ that is continuous on $G \cup \Gamma$ and takes the assigned continuous boundary values on $\Gamma$ the first boundary value problem (or Dirichlet problem). In particular, the Dirichlet problem for $\Delta u = 0$ has been studied in detail (→ 120 Dirichlet Problem).

If $c(x) < 0$ and $f(x) \le 0$, or if $c(x) \le 0$ and $f(x) < 0$ in (1), a solution $u$ does not attain its local negative minimum in $G$. This is called the strong maximum principle (→ [3, 4] for Hopf's maximum principle and Giraud's theorem). The maximum principle is one of the most powerful tools available for the treatment of elliptic equations of the second order with real coefficients. From this it follows that the solution of the Dirichlet problem for (1) is unique if $c(x) \le 0$. Furthermore, concerning the uniqueness of the solution, we have the following criterion: If there exists a function $w(x) > 0$ of class $C^2$ in $G$ and continuous on $G \cup \Gamma$ such that $L[w(x)] < 0$, then the solution of the Dirichlet problem is unique.

Let $G$ be a †regular domain in the plane. If we denote †Green's function of $\Delta$ relative to the Dirichlet problem in $G$ by $K(x, y; \xi, \eta)$, then the solution $u$ of $\Delta u = f(x, y)$ vanishing on $\Gamma$ is given by

$$u(x, y) = -\frac{1}{2\pi} \iint_G K(x, y; \xi, \eta) f(\xi, \eta)\, d\xi\, d\eta,$$

where $f(x, y)$ satisfies a †Hölder condition. The Dirichlet problem for

$$L[u] \equiv \Delta u + a(x, y) \frac{\partial u}{\partial x} + b(x, y) \frac{\partial u}{\partial y} + c(x, y) u = f(x, y)$$

with prescribed boundary values reduces to the problem in the previous paragraph. In fact, let $h(x, y)$ be a function that coincides with the prescribed boundary value on $\Gamma$. Then if we put $u = h(x, y) + v$, the problem is reduced to finding a solution $v$ satisfying an equation similar to the one for $L[u]$ (replacing $f$ by $f - L[h]$) and vanishing on $\Gamma$. Now suppose that $\Gamma$ consists of a finite number of †Jordan curves whose †curvatures vary continuously, $a$ and $b$ are continuously differentiable, and $c$ and $f$ satisfy Hölder conditions. Let $K(x, y; \xi, \eta)$ be Green's function for the Dirichlet problem relative to $\Delta$. To find a solution of $L[u] = f$ vanishing on $\Gamma$ we set

$$u(x, y) = -\frac{1}{2\pi} \iint_G K(x, y; \xi, \eta) \rho(\xi, \eta)\, d\xi\, d\eta.$$

Then $\rho$ is a solution of the integral equation

$$\rho(x, y) + \iint_G H(x, y, \xi, \eta) \rho(\xi, \eta)\, d\xi\, d\eta = f(x, y),$$

where $H(x, y, \xi, \eta) = (-1/2\pi)\{a(x, y) K_x(x, y; \xi, \eta) + b(x, y) K_y(x, y; \xi, \eta) + c(x, y) K(x, y; \xi, \eta)\}$. Therefore we have the following alternatives: Either $L[u] = f$ has a unique solution for any $f$ and any given boundary values on $\Gamma$, or $L[u] = 0$ has nontrivial solutions vanishing on $\Gamma$ (in this case the number of linearly independent solutions is finite).

More general equations of type (1) have been studied by J. Schauder and others. Schauder proved, first, that when the $b_i(x)$ and $c(x)$ are zero, $a_{ij}(x)$ and $f(x)$ satisfy Hölder conditions, and the boundary $\Gamma$ of $G$ is of class $C^2$, then there exists a unique solution $u(x)$ vanishing on $\Gamma$. Next, when the $b_i(x)$ and $c(x)$ are continuous and $a_{ij}$ and $f$ satisfy Hölder conditions, he showed the following alternatives: Either $L[u] = f$ admits a unique solution vanishing on $\Gamma$ for every $f$, or $L[u] = 0$ has nontrivial solutions vanishing on $\Gamma$ (in this case, the number of linearly independent solutions is finite).

In (1), suppose that $a_{ij}$, $b_i$, and $c$ are Hölder continuous of exponent $\alpha$ ($0 < \alpha < 1$) uniformly on $\bar{G}$ and that $\Gamma$ is of class $C^{2+\alpha}$. Then we have the following inequality (Schauder's estimate) for any $u \in C^{2+\alpha}(\bar{G})$:

$$\|u\|_{2+\alpha, G} \le K_1 \left( \|L[u]\|_{\alpha, G} + \|u\|_{2+\alpha, \Gamma} \right) + K_2 \|u\|_{0, G}, \qquad (2)$$

where $\|f\|_{k, S}$ stands for the norm in the function space $C^k(S)$ (→ 168 Function Spaces). $K_1$ and $K_2$ are positive constants depending on $L$, $G$, and $\alpha$ but independent of $u$. More precisely, $K_1$ depends only on the ellipticity constant $\lambda$ of $L$ defined as the smallest number $\ge 1$ such that

$$\lambda^{-1} |\xi|^2 \le \sum_{i,j=1}^{n} a_{ij}(x) \xi_i \xi_j \le \lambda |\xi|^2 \qquad (3)$$

for any $(x, \xi) \in \bar{G} \times \mathbf{R}^n$. The inequality (2) is one of the most important a priori estimates in the theory of elliptic equations [5, 6] (→ inequality (9) in Section H).

D. Quasilinear Partial Differential Equations

Consider the second-order partial differential equation in $u(x_1, \dots, x_n)$:

$$F(x_1, \dots, x_n, u, p_1, \dots, p_n, p_{11}, \dots, p_{ij}, \dots, p_{nn}) = 0,$$

where $p_i = \partial u / \partial x_i$, $p_{ij} = \partial^2 u / \partial x_i \partial x_j$. If, for a solution $u(x)$, $\sum_{i,j=1}^{n} (\partial F / \partial p_{ij}) \xi_i \xi_j$ is a positive definite form, we say that the equation is elliptic at $u(x)$. Moreover, if for any values of $u$, $p_i$, $p_{ij}$ this quadratic form is positive definite, we say simply that the equation is of elliptic type. The equation is called quasilinear if $F$ is linear in $p_{ij}$. For example, the equation of †minimal surfaces (→ 334 Plateau's Problem)

$$\left(1 + u_y^2\right) u_{xx} - 2 u_x u_y u_{xy} + \left(1 + u_x^2\right) u_{yy} = 0$$

is a quasilinear elliptic equation. Furthermore,

$$\Delta u = f(x, y, u, u_x, u_y)$$

is also elliptic. E. Picard solved this equation by the method of successive approximation [7]. Specifically, let $h(x, y)$ be the †harmonic function taking the assigned boundary values on $\Gamma$. Starting from $u_0(x, y) = h(x, y)$, we define the functions $u_n(x, y)$ successively as the solutions of

$$\Delta u_n = f\left(x, y, u_{n-1}, \frac{\partial u_{n-1}}{\partial x}, \frac{\partial u_{n-1}}{\partial y}\right)$$

coinciding with $h$ on $\Gamma$. Let $K(x, y; \xi, \eta)$ be Green's function in $G$, and in the region defined by $|u - h(x, y)| \le A$, $|p - h_x| \le B$, $|q - h_y| \le B$, $(x, y) \in G$, let the supremum of $|f(x, y, u, p, q)|$ be $C$. Assume now that

and that $f$ satisfies a Hölder condition in $(x, y)$. Moreover, let

$$|f(x, y, u', p', q') - f(x, y, u, p, q)| \le L |u - u'| + L' \left( |p' - p| + |q' - q| \right).$$

Assume finally that

$$\iint_G \left( L K + L' \left( |K_x| + |K_y| \right) \right) d\xi\, d\eta \le \gamma < 1.$$

Under these assumptions, $\{u_n(x, y)\}$ is uniformly convergent, and the limit $u(x, y)$ coinciding with $h(x, y)$ on $\Gamma$ satisfies

$$\Delta u = f\left(x, y, u, \frac{\partial u}{\partial x}, \frac{\partial u}{\partial y}\right).$$
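The successive approximation $u_0 = h$, $\Delta u_n = f(\cdot, u_{n-1})$ can be illustrated on a one-dimensional analogue. The sketch below is not Picard's original construction: it assumes the model problem $u'' = f(x, u)$ on $(0, 1)$ with $u(0) = 0$, $u(1) = 1$, a hypothetical right-hand side $f(x, u) = \tfrac12 \cos u$ (Lipschitz constant $1/2$ in $u$, small enough for the iteration to contract), and a standard second-order finite-difference discretization. All function names are illustrative.

```python
import math

def solve_poisson_1d(rhs, left, right):
    """Solve u'' = rhs at the interior nodes of a uniform grid on (0, 1)
    with u(0) = left, u(1) = right, using the Thomas algorithm for the
    tridiagonal system u[i-1] - 2 u[i] + u[i+1] = h^2 rhs[i]."""
    n = len(rhs)
    h = 1.0 / (n + 1)
    d = [h * h * r for r in rhs]
    d[0] -= left
    d[-1] -= right
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = 1.0 / -2.0
    dp[0] = d[0] / -2.0
    for i in range(1, n):
        denom = -2.0 - cp[i - 1]
        cp[i] = 1.0 / denom
        dp[i] = (d[i] - dp[i - 1]) / denom
    u = [0.0] * n
    u[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        u[i] = dp[i] - cp[i] * u[i + 1]
    return u

def picard(f, left, right, n=99, iterations=30):
    """Successive approximation: start from the 'harmonic' (here linear)
    function matching the boundary values, then repeatedly solve
    u_k'' = f(x, u_{k-1}) with the same boundary values."""
    h = 1.0 / (n + 1)
    xs = [(i + 1) * h for i in range(n)]
    u = [left + (right - left) * x for x in xs]
    for _ in range(iterations):
        u = solve_poisson_1d([f(x, ui) for x, ui in zip(xs, u)], left, right)
    return xs, u

def residual(f, xs, u, left, right):
    """Maximum of |u'' - f(x, u)| in the discrete sense."""
    h = xs[1] - xs[0]
    full = [left] + u + [right]
    return max(abs((full[i - 1] - 2.0 * full[i] + full[i + 1]) / (h * h)
                   - f(xs[i - 1], full[i]))
               for i in range(1, len(full) - 1))

f = lambda x, u: 0.5 * math.cos(u)   # hypothetical nonlinearity
xs, u = picard(f, 0.0, 1.0)
res = residual(f, xs, u, 0.0, 1.0)
print("discrete residual after iteration:", res)
```

The printed residual should be at the level of rounding error, reflecting the contraction that the smallness assumptions in the text are designed to guarantee.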



Furthermore, it is known that the solution is unique within the region mentioned above. This method can be applied when $G$ is small and the values of $h_x$ and $h_y$ are limited.

When $f$ does not contain $p$ and $q$, the following method is known. Let $\omega(x, y)$ and $\bar\omega(x, y)$ both be continuous on $G \cup \Gamma$ and of class $C^2$ in $G$. Suppose that $f(x, y, u)$ satisfies a Hölder condition in $\omega(x, y) \le u \le \bar\omega(x, y)$, and that

$$\Delta \omega \ge f(x, y, \omega(x, y)), \qquad \Delta \bar\omega \le f(x, y, \bar\omega(x, y)).$$

Then, given a continuous function $\varphi$ on $\Gamma$ such that $\omega \le \varphi \le \bar\omega$, there exists a unique solution $u$ of $\Delta u = f(x, y, u)$ such that

$$\omega(x, y) \le u(x, y) \le \bar\omega(x, y) \text{ in } G, \qquad u(x, y) = \varphi \text{ on } \Gamma.$$

Finally, consider the equation

$$A \frac{\partial^2 u}{\partial x^2} + 2B \frac{\partial^2 u}{\partial x \partial y} + C \frac{\partial^2 u}{\partial y^2} = 0,$$

where $A$, $B$, and $C$ are functions of $x, y, u, p, q$ and $AC - B^2 > 0$. Under the following conditions, there exists a solution of the Dirichlet problem: $A$, $B$, $C$ are of class $C^2$, and their derivatives of order 2 always satisfy Hölder conditions; $G$ is †convex; and the boundary value $\varphi$ along $\Gamma$ considered as a curve in $xyu$-space represented by the parameter of arc length is of class $C^{3+\alpha}$ ($\alpha > 0$). Moreover, any plane having 3 common points with this curve has slope less than a fixed number $\Delta$. The proof of this theorem is carried out in the following way: For any function $u$ satisfying

$$|u(x, y)| \le \max |\varphi|, \qquad |u_x| \le \Delta, \qquad |u_y| \le \Delta,$$

replacing $u, p, q$ by $u(x, y), u_x(x, y), u_y(x, y)$, respectively, in $A$, $B$, and $C$, we have the linear equation in $v$:

$$A[u] \frac{\partial^2 v}{\partial x^2} + 2B[u] \frac{\partial^2 v}{\partial x \partial y} + C[u] \frac{\partial^2 v}{\partial y^2} = 0.$$

We can obtain the solution $v$ taking the boundary value $\varphi$ on $\Gamma$. Thus we have a mapping $u \to v$. Applying the †fixed-point theorem in function space to this mapping, we have the desired solution $v(x, y) = u(x, y)$.

Concerning the Dirichlet problem for the second-order semilinear elliptic partial differential equation

$$\sum_{i,j=1}^{n} a_{ij}(x) \frac{\partial^2 u}{\partial x_i \partial x_j} = f\left(x, u, \frac{\partial u}{\partial x_1}, \dots, \frac{\partial u}{\partial x_n}\right),$$

a work by M. Nagumo (Osaka Math. J., 6 (1954)) establishes a general existence theorem.

For the general nonlinear equation

$$F(x_1, \dots, x_n, u, p_1, \dots, p_n, p_{11}, \dots, p_{ij}, \dots, p_{nn}) = 0,$$

if $\sum (\partial F / \partial p_{ij}) \xi_i \xi_j > 0$, $F_u \le 0$, the solution of the Dirichlet problem for this equation is unique. Even when $F_u \le 0$ does not hold, the conclusion remains the same if we can reduce the equation to this case by a suitable change of variables.

Quasilinear equations in divergence form

$$\sum_{i=1}^{n} \frac{\partial}{\partial x_i} a_i\left(x, u, \frac{\partial u}{\partial x_1}, \dots, \frac{\partial u}{\partial x_n}\right) = f\left(x, u, \frac{\partial u}{\partial x_1}, \dots, \frac{\partial u}{\partial x_n}\right), \qquad (4)$$

or more generally, any quasilinear elliptic equation

$$\sum_{i,j=1}^{n} a_{ij}\left(x, u, \frac{\partial u}{\partial x_1}, \dots, \frac{\partial u}{\partial x_n}\right) \frac{\partial^2 u}{\partial x_i \partial x_j} = f\left(x, u, \frac{\partial u}{\partial x_1}, \dots, \frac{\partial u}{\partial x_n}\right), \qquad (5)$$

and even quasilinear elliptic systems have been treated in detail in several recent works [8, 9]. J. Serrin [10] treated the Dirichlet problem and established the existence and the uniqueness of solutions for some classes of equations of type (5) containing the minimal surface equation. His method is to estimate the maximum norms of $u$ and of its first derivatives, to apply a result of O. A. Ladyzhenskaya and N. N. Ural'tseva [9] and the Schauder estimate (2), and finally to use the Leray-Schauder fixed-point theorem [11].

E. Relation to the Calculus of Variations

Consider the bilinear form

$$J[u] = \int_G \left( \sum_{i,j} a_{ij}(x) \frac{\partial u}{\partial x_i} \frac{\partial u}{\partial x_j} + 2 \sum_i b_i(x) \frac{\partial u}{\partial x_i} u + c(x) u^2 + 2 f(x) u \right) dx,$$

where we assume $\sum a_{ij}(x) \xi_i \xi_j > 0$. Under the boundary conditions imposed on $u$, if there exists a function $u$ that makes $J$ minimum, then assuming some differentiability condition on $a_{ij}(x)$, $b_i(x)$, $c(x)$, we have the †Euler-Lagrange equation

$$\sum_{i,j} \frac{\partial}{\partial x_i}\left(a_{ij}(x) \frac{\partial u}{\partial x_j}\right) + \left(\sum_i \frac{\partial b_i}{\partial x_i} - c(x)\right) u - f(x) = 0,$$

which is a linear second-order self-adjoint elliptic equation.

B. Riemann treated the simplest case, where $a_{ij}(x) = \delta_{ij}$, $b_i(x) = c(x) = 0$, i.e., the case $\Delta u = 0$. He proved, assuming the existence of the minimum of $J$, the existence of the solution of $\Delta u = 0$ with assigned boundary values. This result,

called Dirichlet's principle, was used by D. Hilbert, R. Courant, H. Weyl, O. Nikodym, and others to show the existence of solutions for linear self-adjoint elliptic equations.

If $F(x_1, \dots, x_n, u, p_1, \dots, p_n)$ satisfies $\sum (\partial^2 F / \partial p_i \partial p_j) \xi_i \xi_j > 0$ (and $F$ has some regularity), then the function that minimizes the integral

$$J = \int F\left(x_1, \dots, x_n, u, \frac{\partial u}{\partial x_1}, \dots, \frac{\partial u}{\partial x_n}\right) dx_1 \cdots dx_n$$

with the given boundary condition satisfies the Euler-Lagrange equation (of type (4))

$$\sum_{i=1}^{n} \frac{\partial}{\partial x_i}\left(\frac{\partial F}{\partial p_i}\right) - \frac{\partial F}{\partial u} = 0$$

($p_i = \partial u / \partial x_i$) and the boundary condition as well. This is also an elliptic partial differential equation in $u$. The case where $F$ is a function of $p$ alone (in particular, the case of †minimal surfaces) has been studied by A. Haar, T. Radó, J. Serrin, and others, particularly for $F = (1 + p_1^2 + \dots + p_n^2)^{1/2}$ in the case of the minimal surface equation.

F. The Second and Third Boundary Value Problems

Let $G$ be a domain in $\mathbf{R}^n$ with a smooth boundary consisting of a finite number of hypersurfaces. Also, let $B[u]$ be the boundary operator defined by

$$B[u] = a \frac{\partial u}{\partial \nu} + \beta u, \qquad (6)$$

where $a = \left(\sum_{i=1}^{n} \left(\sum_{j=1}^{n} a_{ij} \cos(\nu_0 x_j)\right)^2\right)^{1/2}$, $\nu_0$ is the outer normal of unit length at the point $x \in S$, and $\nu$ is the conormal defined by

$$\cos(\nu x_i) = \sum_{j=1}^{n} a_{ij} \cos(\nu_0 x_j) / a, \qquad i = 1, \dots, n.$$

The problem of finding the solution $u(x)$ of the equation $L[u] = f$ continuous on the closed domain $\bar{G}$ and satisfying $B[u] = \varphi$ on the boundary $S$ of $G$ is called the second boundary value problem (or Neumann problem) when $\beta \equiv 0$, and the third boundary value problem (or Robin problem) when $\beta \not\equiv 0$. In general, in boundary value problems, the condition that the solutions must satisfy at the boundary is called the boundary condition. We assume that the boundary $S$ of $G$ is expressed locally by a function with †Hölder continuous first derivatives ($G$ is then called a domain of class $C^{1,\lambda}$). Assume that $G$ is such a domain, $c \le 0$, $\beta \ge 0$, and at least one of $c$ and $\beta$ is not identically 0. Then the second and third boundary value problems admit one and only one solution. When $c \equiv 0$ and $\beta \equiv 0$, the solutions of the second boundary value problem are determined uniquely up to additive constants. Furthermore, let $M$ be the †adjoint operator of $L$, and let

$$B'[v] = a \frac{\partial v}{\partial \nu} + (\beta - b) v,$$

where

$$b = \sum_{i=1}^{n} \cos(\nu_0, x_i) \left[ b_i - \sum_{j=1}^{n} \frac{\partial a_{ij}}{\partial x_j} \right].$$

Then if the boundary $S$ of $G$ is of class $C^{1,\lambda}$ and $f$, $\varphi$ are continuous, in order that there exist at least one solution $u$ of the second or third boundary value problem relative to $L[u] = f$, it is necessary and sufficient that

$$\int_G f v \, dx - \int_S \varphi v \, dS = 0,$$

where $v$ is any solution of $M[v] = 0$ with the boundary condition $B'[v] = 0$. Here the necessity is easily derived from Green's formula. G. Giraud used the notion of fundamental solution to reduce the second and third boundary value problems relative to $L[u] = f$ to a problem of integral equations, under the assumptions that $G$ is a domain of class $C^{1,\lambda}$, the coefficients of $L$ and $f$ satisfy Hölder conditions, and $\varphi$ and $\beta$ are continuous [3; also 12].

G. Method of Orthogonal Projection

The theory of †Hilbert spaces is applicable to the boundary value problems in Section F. In general, let $H^m(G)$ be the space of functions in $L_2(G)$ whose partial derivatives in the sense of †distributions up to order $m$ belong to $L_2(G)$. For elements $f$ and $g$ in $H^m(G)$, we define the following inner product:

$$(f, g)_m = \sum_{|\alpha| \le m} \int_G D^\alpha f(x)\, \overline{D^\alpha g(x)}\, dx,$$

where

$$D^\alpha = \frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}, \qquad |\alpha| = \alpha_1 + \dots + \alpha_n$$

(→ 168 Function Spaces). With respect to this inner product, $H^m(G)$ is a Hilbert space. When $u$ satisfies $L[u] = f$ ($f \in L_2$) and $\varphi(x)$ is an arbitrary element of $H^1(G)$, Green's formula yields

$$-\sum_{i,j} \left(a_{ij} \frac{\partial u}{\partial x_j}, \frac{\partial \varphi}{\partial x_i}\right) + \sum_i \left(b_i' \frac{\partial u}{\partial x_i}, \varphi\right) + (c(x) u, \varphi) + \int_S a \frac{\partial u}{\partial \nu} \bar\varphi \, dS = (f, \varphi),$$

where $b_i'(x) = b_i(x) - \sum_j (\partial a_{ij} / \partial x_j)$. Taking account of the boundary condition on $u$:

$a\, \partial u / \partial \nu + \beta u = 0$, we get

$$\sum_{i,j} \left(a_{ij} \frac{\partial u}{\partial x_j}, \frac{\partial \varphi}{\partial x_i}\right) - \sum_i \left(b_i' \frac{\partial u}{\partial x_i}, \varphi\right) - (c(x) u, \varphi) + \int_S \beta u \bar\varphi \, dS = -(f, \varphi).$$

Thus the problem is reduced to finding $u(x) \in H^1(G)$ satisfying this equation for all $\varphi \in H^1(G)$. This equation can be regarded as an equation in $H^1(G)$. If necessary, by replacing $c(x)$ by $c(x) - t$ for a large $t$, we can show that for any $f(x) \in L_2(G)$, there exists a unique solution $u(x) \in H^1(G)$ [12, 13]. Now, for a solution of this functional equation, if we take $\varphi(x) \in \mathscr{D}(G)$, we have $(u, L^*[\varphi]) = (f, \varphi)$, where $L^*$ is the †adjoint operator of $L$. This means that $u(x)$ is a solution of $L[u] = f$ in the sense of distributions, and we call such a $u(x)$ a weak solution. Such a treatment may be called the method of orthogonal projection, following Weyl. In this case, it can be shown that if we assume smoothness of the coefficients, the boundary $S$, and $\beta$, then the solution $u(x) \in H^1(G)$ belongs to $H^{s+2}(G)$ when $f \in H^s(G)$ ($s = 0, 1, \dots$). Thus, if we apply Green's formula, we can see that $u$ satisfies the boundary condition $a\, \partial u / \partial \nu + \beta u = 0$. In particular, when $s > n/2$, we see that $u(x) \in C^2(\bar{G})$ by †Sobolev's theorem [12]. In other words, $u(x)$ is a genuine solution.

Next, we introduce the complex parameter $\lambda$ and consider the boundary value problem $(L + \lambda I)[u] = f$, $f \in L_2(G)$, $a\, \partial u / \partial \nu + \beta u = 0$. If $t$ is large, $(L - tI)$ is a one-to-one mapping from the domain $\mathscr{D}(L) = \{u \in H^2(G) \mid a\, \partial u / \partial \nu + \beta u = 0\}$ onto $L_2(G)$. Thus, denoting its inverse, which acts on the equation from the left, by $G_t$, we have

$$(I + (\lambda + t) G_t)[u] = G_t f.$$

Conversely, since the solution $u(x)$ contained in $L_2(G)$ (hence also contained in $\mathscr{D}(L)$) satisfies the equation and the boundary condition, the problem is reduced to the displayed equation in $L_2(G)$ considered above. Now, since $G$ is bounded and $G_t$ is a continuous mapping from $L_2(G)$ into $H^2(G)$, Rellich's theorem yields that $G_t$ is a †compact operator when it is regarded as an operator in $L_2(G)$. So we can apply the †Riesz-Schauder theorem (→ 189 Green's Operator).

H. Elliptic Equations of Higher Order

The differential operator of order $m$:

$$L = \sum_{|\alpha| \le m} a_\alpha(x) D^\alpha$$

is called an elliptic operator if $\sum_{|\alpha| = m} a_\alpha(x) \xi^\alpha \ne 0$ ($\xi \ne 0$). In particular, if

$$\operatorname{Re} \sum_{|\alpha| = m} a_\alpha(x) \xi^\alpha \ge c |\xi|^m, \qquad c > 0,$$

$L$ is called a strongly elliptic operator. In this case, $m$ is even. L. Gårding studied the Dirichlet problem for strongly elliptic operators [14]. If we put $m = 2b$, the boundary value condition is stated as $\partial^j u / \partial \nu^j = f_j(x)$ ($j = 0, 1, \dots, b - 1$), where $\nu$ is the normal of unit length at the boundary. Using the notion of function space, this boundary condition means that the solutions belong to the closure $\mathring{H}^b(G)$ of $\mathscr{D}(G)$ in $H^b(G)$ (→ 168 Function Spaces). In this treatment, Gårding's inequality

$$(-1)^b \operatorname{Re}(L[u], u) \ge \delta \|u\|_b^2 - c \|u\|^2, \qquad u \in \mathscr{D}(G), \qquad (7)$$

where $\delta$ and $c$ are positive constants, plays an important role.

In general, for an elliptic operator $L$ defined in an open set $G$, if $u(x)$ satisfies $L[u] = f(x)$ and $f(x)$ belongs to $H^s$ on any compact set in $G$, then $u(x)$ belongs to $H^{s+m}$ on every compact set in $G$ (Friedrichs's theorem [15]).

General boundary value problems for elliptic equations of higher order have been considered by S. Agmon, A. Douglis, and L. Nirenberg [16], M. Schechter [17], and others. These problems are formulated as follows:

$$L[u] = f(x), \qquad B_j(x, D) u(x) = \varphi_j(x), \quad x \in S, \quad j = 1, 2, \dots, b \ (= m/2), \qquad (8)$$

where the $B_j(x, D)$ are differential operators at the boundary and $f$ and $\{\varphi_j\}$ are given functions. Under certain algebraic conditions (Shapiro-Lopatinskii conditions) on $(L, \{B_j\})$, the problems are treated also in $H^m(G)$; hence the $L_2$ a priori estimates play a fundamental role: If $u \in H^m(G)$,

$$\|u\|_m \le K \left( \|Lu\| + \sum_{j=1}^{b} \|B_j u\|_{m - m_j - (1/2), S} + \|u\| \right), \qquad (9)$$

where $K$ is a constant determined by $(L, B_j, G)$, $\|\cdot\|_{k, S}$ is the norm in $H^k(S)$, and $m_j$ are the orders of $B_j$ (compare (9) with (2)). Under these estimates the boundary value problem is said to be coercive. In applications, the theory of interpolation of function spaces is also used [18] (→ 168 Function Spaces). Variational general boundary value problems have been treated by D. Fujiwara and N. Shimakura (J. Math. Pures Appl., 49 (1970)) and others. For systems of such equations, there are works by F. E. Browder (Ann. math. studies 33, Princeton Univ. Press, 1954, 15-51) and others.
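The two definitions in Section H (ellipticity: the principal part $\sum_{|\alpha| = m} a_\alpha \xi^\alpha$ does not vanish for real $\xi \ne 0$; strong ellipticity: its real part is bounded below by $c|\xi|^m$) can be explored numerically for constant-coefficient operators in two variables. Because the principal part is homogeneous of degree $m$, it is enough to look at unit vectors. The sketch below is a heuristic check on sampled directions, not a proof; the function names, the sample count, and the tolerance are assumptions, and the four example symbols are chosen only for illustration.

```python
import math

def sample_unit_circle(n=360):
    """Equally spaced real unit vectors (xi1, xi2)."""
    return [(math.cos(2.0 * math.pi * k / n), math.sin(2.0 * math.pi * k / n))
            for k in range(n)]

def classify(symbol, n=360, tol=1e-9):
    """Return (looks_elliptic, looks_strongly_elliptic) for a principal
    symbol given as a function of (xi1, xi2), judged on sampled unit vectors."""
    values = [complex(symbol(a, b)) for a, b in sample_unit_circle(n)]
    elliptic = min(abs(v) for v in values) > tol
    strongly = min(v.real for v in values) > tol
    return elliptic, strongly

laplace = lambda a, b: a * a + b * b             # order m = 2
biharmonic = lambda a, b: (a * a + b * b) ** 2   # order m = 4
cauchy_riemann = lambda a, b: a + 1j * b         # order m = 1
wave = lambda a, b: a * a - b * b                # order m = 2, vanishes on a = +-b

results = {
    "laplace": classify(laplace),
    "biharmonic": classify(biharmonic),
    "cauchy_riemann": classify(cauchy_riemann),
    "wave": classify(wave),
}
for name, flags in results.items():
    print(name, flags)
```

The first-order example is elliptic but not strongly elliptic, in line with the remark that strong ellipticity forces $m$ to be even: for odd $m$ the principal part changes sign under $\xi \to -\xi$.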

I. Analyticity of Solutions

In a linear elliptic equation $Lu = f$, suppose that all the coefficients and $f$ are of class $C^\infty$ (resp. real analytic) in an open set $G$ and that $u$ is a distribution solution in $G$. Then $u$ is also of class $C^\infty$ (resp. real analytic) in $G$ [19]. Hence the linear elliptic operators are hypoelliptic (resp. analytically hypoelliptic) (→ 112 Differential Operators). In particular, †harmonic functions (i.e., solutions of $\Delta u = 0$) are (real) analytic in the domain of existence, whatever the boundary values may be.

Hilbert conjectured that when $F(x, y, u, p, q, r, s, t)$ ($p = p_1$, $q = p_2$, $r = p_{11}$, $s = p_{12}$, $t = p_{22}$) is analytic in the arguments, then any solution $u$ of the elliptic equation $F = 0$ is analytic on the domain of existence (1900, Hilbert's 19th problem; → 196 Hilbert). This conjecture was proved by S. Bernshtein, Radó, and others, and then H. Lewy proposed a method of extending this equation to a complex domain so that it can be regarded as a †hyperbolic equation (Math. Ann., 101 (1929)). This result was further extended by I. G. Petrovskii to a general system of nonlinear differential equations of elliptic type (Mat. Sb., 5 (47) (1939)).

J. The Unique Continuation Theorem

Since all the solutions of Laplace's equation $\Delta u = 0$ are analytic, it follows that if $u(x)$ vanishes on an open set in a domain, then $u(x)$ vanishes identically in this domain. This unique continuation theorem can be extended to linear elliptic partial differential equations with analytic coefficients in view of the analyticity of solutions. This fact is also proved by applying †Holmgren's uniqueness theorem (→ 321 Partial Differential Equations (Initial Value Problems)). The unique continuation theorem, first established by T. Carleman for second-order elliptic partial differential equations $L[u] = 0$ with $C^1$-coefficients in the case of two independent variables, was extended to second-order linear elliptic equations with $C^2$-coefficients in the case of any number of independent variables by C. Müller, E. Heinz, and finally by N. Aronszajn [20]. This research was extended by A. P. Calderón [21] and others in the direction of establishing the uniqueness of the Cauchy problem. However, it is to be remarked that even if we assume that the coefficients are of class $C^\infty$, we cannot affirm the unique continuation property for general elliptic equations. A counterexample was given by A. Pliś (Comm. Pure Appl. Math., 14 (1961)). See also the work of K. Watanabe (Tôhoku Math. J., 23 (1971)).

K. Elliptic Pseudodifferential Operators and the Index

A pseudodifferential operator $P(x, D)$ with symbol $p(x, \xi) \in S^m_{1,0}$ (→ 345 Pseudodifferential Operators) is said to be elliptic, provided there exists a positive constant $c$ such that $|p(x, \xi)| \ge c(1 + |\xi|)^m$ for all $x \in \mathbf{R}^n$ and $|\xi| \ge c^{-1}$. The notion of ellipticity can be extended to operators on a manifold. The theory of elliptic pseudodifferential operators has been widely applied to the study of elliptic differential equations, and is particularly useful in the calculation of the †index of elliptic operators. B. R. Vainberg and V. V. Grushin [22] calculated the index $i$ of the †coercive boundary value problem for an elliptic operator by showing that $i$ is equal to the index of some elliptic pseudodifferential operator on the boundary.

Example [23]: Given a real vector field $(\nu_1, \nu_2)$ on the unit circle $x_1^2 + x_2^2 = 1$, suppose the vector $(\nu_1(x), \nu_2(x))$ rotates $l$ times around the origin as the point $x = (x_1, x_2)$ moves once around the unit circle in the positive direction. Then the index of the boundary value problem

$$\left(\frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2}\right) u(x) = f(x), \qquad x_1^2 + x_2^2 < 1,$$

$$\nu_1(x) \frac{\partial u}{\partial x_1} + \nu_2(x) \frac{\partial u}{\partial x_2} = g(x), \qquad x_1^2 + x_2^2 = 1,$$

is equal to $2 - 2l$.

M. F. Atiyah and I. M. Singer determined the index of a general elliptic operator on a manifold in terms of certain topological invariants of the manifold (→ 237 K-Theory H). The index of noncoercive boundary value problems has also been studied by Vainberg and Grushin, R. Seeley (Topics in pseudo-differential operators, C.I.M.E. 1968, 335-375), and others.

L. The Giorgi-Nash-Moser Result

Let us state the following result (J. Moser [24]): Let $L$ be of the form

$$Lu = \sum_{i,j=1}^{n} \frac{\partial}{\partial x_j}\left(a_{ij}(x) \frac{\partial u}{\partial x_i}\right),$$

where $a_{ij} = a_{ji}$ are real-valued, of class $L_\infty(G)$, and such that the ellipticity condition (3) holds almost everywhere in $G$ with some $\lambda \ge 1$. Also, let $G'$ be any subdomain of $G$ whose distance from $\partial G$ is not smaller than $\delta > 0$. Then, for any weak solution $u \in H^1(G)$ of the equation $Lu = 0$ and for any two points $x$ and $y$ in $G'$, we have the inequality

$$|u(x) - u(y)| \le A |x - y|^\alpha \|u\|_{L_2(G)},$$

where $A$ and $\alpha$ ($A > 0$, $0 < \alpha < 1$) depend only on $(n, \lambda, \delta)$ and are independent of the particular choice of $(L, G, G', u)$. Moser proved that the above inequality is a corollary to a Harnack-type inequality (→ 327 Partial Differential Equations of Parabolic Type G).

M. Asymptotic Distribution of Eigenvalues

Let $L$ be an elliptic operator on $G$ of order $m$ with smooth coefficients realized as a self-adjoint operator in $L_2(G)$ under a nice boundary condition, where $G$ is a bounded domain in $\mathbf{R}^n$ with smooth boundary. Let $N(T)$ ($T > 0$) be the number of eigenvalues of $L$ smaller than $T$. Then, it holds that

$$N(T) = C T^{n/m} + \text{an error term}, \quad \text{as } T \to +\infty,$$

$$C = \frac{(2\pi)^{-n}}{n} \int_G dx \int_{S^{n-1}} a(x, \xi)^{-n/m}\, dS_\xi, \qquad (10)$$

where $a(x, \xi)$ is the †principal symbol of $L$ and $S^{n-1}$ is the unit sphere in $\mathbf{R}^n$. $C$ is independent of the boundary condition and, if $L$ is of constant coefficients, of the shape of $G$.

Formula (10) was at first established by H. Weyl [25] for the case of $L$ = Laplacian, and hence it is often called Weyl's formula. Weyl's method is based on the minimax principle [26]. T. Carleman (Ber. Math.-Phys. Klasse der Sächs. Akad. Wiss. Leipzig, 88 (1936)) studied the behavior of the trace of the Green's function of $zI - L$ as $|z| \to \infty$ in the complex plane (→ [27]). S. Minakshisundaram (Canad. J. Math., 1 (1949)) discussed this formula in connection with the heat equation; see also S. Mizohata and R. Arima (J. Math. Kyoto Univ., 4 (1964)). L. Hörmander (Acta Math., 121 (1968)) treated the case of compact manifolds without boundary and obtained the best possible error estimate. H. P. McKean and I. M. Singer (J. Differential Geometry, 1 (1967)) treated the case of manifolds and discussed the geometric meaning of this formula. In general, $N(T)$ is no more than $O(T^{n/m})$ if $L$ is of degenerate elliptic type (C. Nordin, Ark. Mat., 10 (1972)).

N. Equations of Degenerate Elliptic Type

An operator $L$ of the form (1) is said to be degenerate at $x^0 \in \bar{G}$ in the direction $\xi \in \mathbf{R}^n$ if $\xi$ is a null vector of the matrix $(a_{ij}(x^0))$. $L$ is said to be of degenerate elliptic type if $(a_{ij}(x))$ is nonnegative definite at any $x \in \bar{G}$ and if $L$ is degenerate at some point of $\bar{G}$ in some direction.

Suppose that the coefficients of $L$ and the boundary $\Gamma$ of $G$ are smooth enough. At $x \in \Gamma$, denote by $\nu(x) = (\nu_1(x), \dots, \nu_n(x))$ the unit outer normal vector to $G$. Let $\Sigma_3$ be the set of $x \in \Gamma$ at which $L$ is not degenerate in the normal direction. Also, let $\Sigma_2$, $\Sigma_1$, and $\Sigma_0$ be the sets of $x \in \Gamma \setminus \Sigma_3$ at which $b(x) > 0$, $< 0$, and $= 0$, respectively, where $b(x)$ is defined by

$$b(x) = \sum_{i=1}^{n} \nu_i(x) \left\{ b_i(x) - \sum_{j=1}^{n} \frac{\partial a_{ij}(x)}{\partial x_j} \right\}. \qquad (11)$$

Then the Dirichlet problem for equation (1) is to find a function $u(x)$ defined on $G \cup \Sigma_2 \cup \Sigma_3$ satisfying

$$L[u] = f \text{ in } G, \qquad (1)$$

$$u = g \text{ on } \Sigma_2 \cup \Sigma_3, \qquad (12)$$

where $f$ and $g$ are given functions.

Let $1 < p < \infty$. We set $q = p/(p - 1)$. We also put

$$c^*(x) = c(x) - \sum_{i=1}^{n} \frac{\partial b_i(x)}{\partial x_i} + \sum_{i,j=1}^{n} \frac{\partial^2 a_{ij}(x)}{\partial x_i \partial x_j}. \qquad (13)$$

We have the following existence theorem [28]: If (i) either $c < 0$ on $\bar{G}$ or $c^* < 0$ on $\bar{G}$ and if (ii) $pc + qc^* < 0$ on $\bar{G}$, the Dirichlet problem (1) and (12) (with $g = 0$) has a weak solution $u \in L_p(G)$ for any $f \in L_p(G)$. The regularity of solutions is also discussed in [28]. The value of $b(x)$ is closely related to the regularity near the point $x \in \Gamma$ if $L$ is degenerate at $x$ in the normal direction (→ also M. S. Baouendi and C. Goulaouic, Arch. Rational Mech. Anal., 34 (1969)).

Degenerate elliptic equations of type (1) have also been investigated from the probabilistic viewpoint (→ 115 Diffusion Processes). The general boundary value problems for degenerate elliptic equations of higher order have been treated by M. I. Vishik and V. V. Grushin [29], N. Shimakura (J. Math. Kyoto Univ., 9 (1969)), P. Bolley and J. Camus (Ann. Scuola Norm. Sup. Pisa, IV-1 (1974)), and others.

References

[1] J. Hadamard, Lectures on Cauchy's problem in linear partial differential equations, Yale Univ. Press and Oxford Univ. Press, 1923.
[2] E. E. Levi, Sulle equazioni lineari totalmente ellittiche alle derivate parziali, Rend. Circ. Mat. Palermo, 24 (1907), 275-317.
[3] C. Miranda, Partial differential equations of elliptic type, Springer, 1970.
[4] M. H. Protter and H. F. Weinberger, Maximum principles in differential equations, Prentice-Hall, 1967.
[5] J. Schauder, Über lineare elliptische Differentialgleichungen zweiter Ordnung, Math. Z., 38 (1934), 257-282.
[6] J. Schauder, Numerische Abschätzungen in

elliptischen linearen Differentialgleichungen, [25] H. Weyl, Das asymptotische Verteilungs-


Studia Math., 5 (1934) 34442. gesetze der Eigenverte linearer partieller Dif-
[7] E. Picard, Lecon sur quelques problemes ferentialgleichungen, Math. Ann., 71 (1912)
aux limites de la theorie des equations differen- 441-479.
tielles, Gauthier-Villars, 1930. [26] R. Courant and D. Hilbert, Methods of
[8] C. B. Morrey, Multiple integrals in the mathematical physics I, II, Interscience, 1962.
calculus of variations, Springer, 1966. [27] S. Agmon, Lectures on elliptic boundary
[9] 0. A. Ladyzhenskaya and N. N. Ural’ts- value problems, Van Nostrand, 1965.
eva, Linear and quasilinear elliptic equations, [28] 0. A. Oleinik and E. V. Radkevich,
Academic Press, 1968. (Original in Russian, Second-order equations with non-negative
1964.) characteristic form, Plenum, 1973. t:Original in
[lo] J. Serrin, The problem of Dirichlet for Russian, 1971.)
quasilinear elliptic equations, Philos. Trans. [29] M. I. Vishik and V. V. Grushin, Bound-
Roy. Sot. London, (A) 264 (1969), 413-493. ary value problems for elliptic equations de-
[ 1 l] J. Leray and J. Schauder, Topologie et generate on the boundary of a domain, Math.
equations fonctionnelles, Ann. Ecole Norm. USSR-Sb., 9 (1969), 423-454. (Original in
Sup., 51 (1934) 45-78. Russian, 1969.)
Cl23 S. Mizohata, The theory of partial dif-
ferential equations, Cambridge Univ. Press,
1973. (Original in Japanese, 1965.)
324 (XIII.22)
Partial Differential Equations of First Order

A. Quasilinear Partial Differential Equations and Their Characteristic Curves

Suppose that we are given a †quasilinear partial differential equation

$$\sum_{i=1}^{n} P_i(x,u)\,\frac{\partial u}{\partial x_i} = Q(x,u), \qquad x = (x_1,\dots,x_n). \tag{1}$$

A curve defined by a solution $x_i = x_i(t)$, $u = u(t)$ of the system of ordinary differential equations

$$\frac{dx_i}{dt} = P_i(x,u), \quad i = 1,\dots,n; \qquad \frac{du}{dt} = Q(x,u)$$

is called a characteristic curve of (1) (→ 320 Partial Differential Equations; 322 Partial Differential Equations (Methods of Integration)). A necessary and sufficient condition for $u = u(x)$ to be a solution of (1) is that the characteristic curve passing through any point on the hypersurface $u = u(x)$ (in the $(n+1)$-dimensional $xu$-space) always be contained in this hypersurface. For example, the characteristic curve of $\sum_{i=1}^{n} x_i\,\partial u/\partial x_i = ku$ is $x_i = x_i^0 e^{t}$, $u = u^0 e^{kt}$ (a solution of $x_i' = x_i$, $u' = ku$). Therefore the solution $u = u(x)$ is a function such that $u(\lambda x_1,\dots,\lambda x_n) = \lambda^k u(x_1,\dots,x_n)$ ($\lambda = e^t > 0$), i.e., a homogeneous function of degree $k$.
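
The example above can be checked numerically. The following is a minimal sketch, not part of the original article: it integrates the characteristic system $x_i' = x_i$, $u' = ku$ with a classical Runge-Kutta step and verifies that a homogeneous function stays on the characteristic. The test function $u(x_1,x_2) = x_1^2 x_2$ (so $k = 3$), the step count, and all helper names are assumptions chosen for illustration.

```python
import math

def rk4_step(f, y, h):
    """One classical Runge-Kutta step for the autonomous system y' = f(y)."""
    k1 = f(y)
    k2 = f([yi + 0.5 * h * ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5 * h * ki for yi, ki in zip(y, k2)])
    k4 = f([yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h * (a + 2 * b + 2 * c + d) / 6
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def characteristic(x0, u0, k, t_end, steps=1000):
    """Integrate x_i' = x_i, u' = k*u from t = 0 to t_end; return (x, u)."""
    n = len(x0)
    # Right-hand side (P_1, ..., P_n, Q) = (x_1, ..., x_n, k*u).
    rhs = lambda y: y[:n] + [k * y[n]]
    y = list(x0) + [u0]
    h = t_end / steps
    for _ in range(steps):
        y = rk4_step(rhs, y, h)
    return y[:n], y[n]

def u_hom(x):
    """Hypothetical test solution: homogeneous of degree 3."""
    return x[0] ** 2 * x[1]

K = 3
x0 = [1.0, 2.0]
x_end, u_end = characteristic(x0, u_hom(x0), K, t_end=0.5)
lam = math.exp(0.5)  # lambda = e^t along the characteristic
```

Here `u_end` should agree both with `u_hom(x_end)` (the characteristic stays on the graph of $u$) and with `lam**K * u_hom(x0)` (homogeneity of degree $k$), up to the integration error.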

B. Nonlinear Partial Differential Equations and Their Characteristic Strips

We denote the value of $\partial u/\partial x_i$ by $p_i$ and define the surface element (or hypersurface element)
by the $(2n+1)$-dimensional vector $(x, u, p) = (x_1,\dots,x_n, u, p_1,\dots,p_n)$. Consider the partial differential equation

$$F(x_1,\dots,x_n,u,p_1,\dots,p_n) = 0, \qquad p_i = \partial u/\partial x_i. \tag{2}$$

A set $(x(t), u(t), p(t))$ of surface elements depending on a parameter $t$ and satisfying the system of ordinary differential equations

$$\frac{dx_i}{dt} = F_{p_i}, \qquad \frac{du}{dt} = \sum_{i=1}^{n} p_i F_{p_i}, \qquad \frac{dp_i}{dt} = -(F_{x_i} + p_i F_u)$$

is called a characteristic strip of equation (2), and the curve $x = x(t)$, $u = u(t)$ is called a characteristic curve of (2). For quasilinear equations, this definition of characteristic curve coincides with the one mentioned in Section A. Furthermore, an $r$-dimensional †differentiable manifold consisting of surface elements satisfying the relation

$$du - \sum_{i=1}^{n} p_i\,dx_i = 0$$

is called an $r$-dimensional union of surface elements. A solution of the partial differential equation (2) is, in general, formed by the set of all characteristic strips possessing, as initial values, surface elements belonging to an $(n-1)$-dimensional union of surface elements satisfying $F(x,u,p) = 0$.

An example of a nonlinear partial differential equation of first order is $pq - z = 0$ (where $x_1 = x$, $x_2 = y$, $u = z$, $p_1 = p$, $p_2 = q$). The equations of the characteristic strip are $x' = q$, $y' = p$, $z' = 2pq$, $p' = p$, $q' = q$, and therefore the characteristic strip is given by $y = y_0 + (p_0/q_0)x$, $z = z_0 + 2p_0 x + (p_0/q_0)x^2$, $p = p_0 + (p_0/q_0)x$, $q = q_0 + x$ (if we take $x$ as an independent variable and impose an initial condition $y = y_0$, $z = z_0$, $p = p_0$, $q = q_0$ at $x = 0$). Putting $z_0 = W(y_0)$ (an arbitrary function) for $x = 0$, we have, furthermore, $y = y_0 + W(y_0)\,x/(W'(y_0))^2$, $z = W(y_0) + 2W(y_0)\,x/W'(y_0) + W(y_0)\,x^2/(W'(y_0))^2$. The elimination of $y_0$ from these expressions yields a general solution $z = z(x,y)$. In this case, a †complete solution is $4az = (x + ay + b)^2$ (where $a$, $b$ are constants), and a †singular solution is $z = 0$.

C. Complete Systems of Linear Partial Differential Equations

For functions $P_\nu(x)$ ($\nu = 1,\dots,n$; $x = (x_1,\dots,x_n)$) of class $C^\infty$, define a differential operator $X$ by

$$X = \sum_{\nu=1}^{n} P_\nu(x)\,\frac{\partial}{\partial x_\nu}.$$

We call $k$ differential operators $X_i = \sum_{\nu=1}^{n} P^i_\nu(x)\,\partial/\partial x_\nu$ ($i = 1,\dots,k$) mutually independent when the rank of the matrix $(P^i_\nu)$ is equal to $k$. If a system of $k$ independent linear partial differential equations involving one unknown function $f(x)$,

$$X_1 f = 0, \quad \dots, \quad X_k f = 0, \tag{3}$$

has the maximum number $n - k$ of independent †integrals, then the system (3) is called a complete system. A necessary and sufficient condition for the system (3) to be a complete system is that there exist $k^3$ functions $\lambda^l_{ij}(x)$ of class $C^\infty$ such that

$$[X_i, X_j] f = \sum_{l=1}^{k} \lambda^l_{ij}(x)\,X_l f, \qquad i, j = 1,\dots,k,$$

that is,

$$X_i(X_j f) - X_j(X_i f) = \sum_{l=1}^{k} \lambda^l_{ij}(x)\,X_l f.$$

Here, $[X_i, X_j]$ is a differential operator of first order, called the commutator of the differential operators $X_i$, $X_j$ or the Poisson bracket.

D. Involutory Systems

For two functions $F(x,u,p)$, $G(x,u,p)$ of $x$, $u$, $p$ of class $C^\infty$, we define the Lagrange bracket $[F, G]$ by

$$[F,G] = \sum_{\nu=1}^{n} \left( \frac{\partial F}{\partial p_\nu}\Bigl(\frac{\partial G}{\partial x_\nu} + p_\nu \frac{\partial G}{\partial u}\Bigr) - \frac{\partial G}{\partial p_\nu}\Bigl(\frac{\partial F}{\partial x_\nu} + p_\nu \frac{\partial F}{\partial u}\Bigr) \right).$$

If $F$, $G$ do not contain $u$ and are homogeneous linear forms with respect to $p$, then $F$ and $G$ are differential operators $F = X_1 u$ and $G = X_2 u$ with respect to $u$ (for $p_\nu = \partial u/\partial x_\nu$), and we see that $[F, G] = [X_1, X_2]u$. This bracket has the following properties:

$$[F,G] = -[G,F],$$

$$[F, \varphi(G_1,\dots,G_l)] = \sum_{i=1}^{l} \frac{\partial \varphi}{\partial G_i}\,[F, G_i],$$

$$[[F,G],H] + [[G,H],F] + [[H,F],G] = -\Bigl(\frac{\partial F}{\partial u}[G,H] + \frac{\partial G}{\partial u}[H,F] + \frac{\partial H}{\partial u}[F,G]\Bigr).$$

When $F$, $G$ are functions of $x$ and $p$ only, we usually use the notation $(F, G)$ and call it also the Poisson bracket. In this case, the right-hand side of the third relation vanishes.

Consider $k$ partial differential equations involving one unknown function $u(x_1,\dots,x_n)$,

$$F_i(x,u,p) = 0, \quad i = 1,\dots,k; \qquad p_\nu = \frac{\partial u}{\partial x_\nu}. \tag{4}$$

If a common solution $u(x)$ of these equations exists, it is also a solution of $[F_i, F_j] = 0$ ($i, j =$
$1,\dots,k$). Therefore, from the equations thus obtained, we take independent equations and add them to the original system. If we then have more than $n + 1$ equations, the original system has no solution. Otherwise, we obtain a system $F_j = 0$ ($j = 1,\dots,r$) for which the $F_j$ are independent (i.e., the rank of $(\partial F_j/\partial p_\nu)$ is equal to $r$), and all $[F_i, F_j] = 0$ can be derived from $F_j = 0$. A system (4) such that $[F_i, F_j] \equiv 0$ for all $i$, $j$ is called an involutive (or involutory) system. We always treat a system by extending it to an involutory system. When $k = 1$, we regard the equation itself as an involutory system.

When the equations (4) are mutually independent, a necessary and sufficient condition for them to have in common a solution with $n - k$ degrees of freedom (a solution that coincides with an arbitrary function on an adequate manifold of dimension $n - k$) is that the system (4) be a system of equations involving unknowns $p$ and equivalent to an involutory system.

An involutory system (4) can be extended to an involutory system consisting of $n$ independent equations by adding $n - k$ suitable equations

$$F_{k+1}(x,u,p) = a_{k+1}, \quad \dots, \quad F_n(x,u,p) = a_n. \tag{4'}$$

That is, we can find successively $f = F_l$ ($l = k+1,\dots,n$) such that $F_l$ satisfies the system of equations

$$[F_i, f] = 0, \qquad i = 1,\dots,l-1,$$

which is a complete system of linear partial differential equations for $f$, and the $F_i$ ($i = 1,\dots,n$) are mutually independent. Then, if we find the $p_\nu$ as functions of $(x, u, a)$ ($a = (a_{k+1},\dots,a_n)$) from (4) and (4'), the system of †total differential equations

$$\frac{\partial u}{\partial x_\nu} = p_\nu(x,u,a), \qquad \nu = 1,\dots,n,$$

is †completely integrable, and we can find $u$ as a solution containing essentially $n - k + 1$ parameters $c, a_{k+1},\dots,a_n$, that is, a complete solution of (4). Moreover, if we have an involutory system of $n + 1$ independent equations $F_1 = 0,\dots,F_k = 0$, $F_{k+1} = a_{k+1},\dots,F_{n+1} = a_{n+1}$, then we find a complete solution by eliminating $p_1,\dots,p_n$ between the equations. This method of integrating an involutory system is called Jacobi's second method of integration.

E. Relation to the Calculus of Variations

Consider a partial differential equation of first order $F(x_1,\dots,x_n,u,p_1,\dots,p_n) = 0$. If a solution $u(x)$ of this equation is given as an implicit function by $\varphi(x_1,\dots,x_n,u) = 0$, we have

$$F\Bigl(x_1,\dots,x_n,u,\ -\frac{\partial\varphi/\partial x_1}{\partial\varphi/\partial u},\ \dots,\ -\frac{\partial\varphi/\partial x_n}{\partial\varphi/\partial u}\Bigr) = 0.$$

This gives formally a partial differential equation with independent variables $u, x_1,\dots,x_n$ and a dependent variable $\varphi$. It does not contain $\varphi$ explicitly. That is, this equation has the form (with $u$ written as $x_{n+1}$)

$$F\Bigl(x_1,\dots,x_{n+1},\ \frac{\partial\varphi}{\partial x_1},\ \dots,\ \frac{\partial\varphi}{\partial x_{n+1}}\Bigr) = 0.$$

Then, by finding a partial derivative, say $\partial\varphi/\partial x_{n+1}$, as a function of the remaining ones from the displayed equation, we get a partial differential equation of the form

$$\frac{\partial\varphi}{\partial t} + H\Bigl(t, x, \frac{\partial\varphi}{\partial x}\Bigr) = 0,$$

which is called the normal form of the partial differential equation of first order. Setting $p_i = \partial\varphi/\partial x_i$, the equations of the characteristic curve of this equation are

$$\frac{dx_i}{dt} = H_{p_i}(t,x,p), \qquad \frac{dp_i}{dt} = -H_{x_i}(t,x,p), \qquad i = 1,\dots,n,$$

which are called Hamilton's differential equations.

Now, consider the †Euler-Lagrange differential equations

$$\frac{d}{dt}F_{x_i'} - F_{x_i} = 0, \qquad i = 1,\dots,n,$$

for the integral

$$J = \int_{t_0}^{t_1} F(t,x,x')\,dt, \qquad x' = \frac{dx}{dt}.$$

Under the assumption that $\det(F_{x_i' x_j'}) \neq 0$, we put $F_{x_i'} = p_i$, and solve these relations with respect to $x_i'$ in the form, say, $x_i' = \varphi_i(t,x,p)$. Furthermore, if we put

$$H(t,x,p) = -F(t,x,\varphi) + \sum_{i=1}^{n} p_i\,\varphi_i,$$

then the Euler-Lagrange equations are equivalent to Hamilton's differential equations

$$\frac{dx_i}{dt} = H_{p_i}, \qquad \frac{dp_i}{dt} = -H_{x_i}, \qquad i = 1,\dots,n,$$

since $F(t,x,x') = \sum_{i=1}^{n} p_i H_{p_i} - H$.

A curve represented by a solution of the Euler-Lagrange equations is called a stationary curve. Now consider a family of stationary curves in a domain $G$ of the $(n+1)$-dimensional $tx$-space such that passing through every point of $G$ there is one and only one curve in this family, and suppose that the family is †transversal to an $r$-dimensional manifold $\mathfrak{A}$ ($r \le n$) (that is, $F\,\delta t - \sum_{i} F_{x_i'}(x_i'\,\delta t - \delta x_i) = 0$ for the differentials $\delta t$, $\delta x_i$ along $\mathfrak{A}$; in particular, if $\mathfrak{A}$ consists of only one point ($r = 0$), a stationary curve passing through this point is transversal to $\mathfrak{A}$). In
this case, if we denote by $V(t,x)$ the value of the integral $J$ along the stationary curve from $\mathfrak{A}$ to any point $(t,x)$ of $G$, then the equation

$$\frac{\partial V}{\partial t} + H\Bigl(t, x, \frac{\partial V}{\partial x}\Bigr) = 0$$

holds. This equation is called the Hamilton-Jacobi differential equation or the canonical or eikonal equation. Conversely, a solution of this equation is equal to the value of the integral $J$ for a family of stationary curves transversal to an adequate $\mathfrak{A}$.

F. The Monge Differential Equation

Consider the partial differential equation (2). By eliminating $p$ and $t$ between

$$\frac{dx_i}{dt} = F_{p_i}, \qquad \frac{du}{dt} = \sum_{i=1}^{n} p_i F_{p_i}, \qquad \text{and} \qquad F(x,u,p) = 0$$

(for example, by eliminating $p$ between $dx_i/dx_1 = F_{p_i}/F_{p_1}$ and $F = 0$ when $F_{p_1} \neq 0$), we obtain

$$M\Bigl(x_1,\dots,x_n,u,\ \frac{\partial u}{\partial x_1},\ \frac{\partial x_2}{\partial x_1},\ \dots,\ \frac{\partial x_n}{\partial x_1}\Bigr) = 0.$$

This equation is called the Monge differential equation, and the curve represented by its solution is called an integral curve of the equation. In the $(n+1)$-dimensional $xu$-space, a curve that is an envelope of a 1-parameter family of characteristic curves of the partial differential equation (2) is a solution of the Monge equation. A characteristic curve is also an integral curve. When $n = 2$, an integral curve that is not a characteristic curve is a †line of regression of the surface generated by the family of characteristic curves tangent to the integral curve under consideration (which is an integral surface of $F(x,u,p) = 0$). If $F$ is linear in $p_i$, i.e., quasilinear, all integral curves coincide with characteristic curves.

325 (XIII.25)
Partial Differential Equations of Hyperbolic Type

A. Second-Order Linear Hyperbolic Equations

A †linear partial differential equation in $n + 1$ variables $t$, $x = (x_1,\dots,x_n)$ of the second order,

$$L[u] \equiv \frac{\partial^2 u}{\partial t^2} - \sum_{i=1}^{n} a_{0i}\frac{\partial^2 u}{\partial t\,\partial x_i} - \sum_{i,j=1}^{n} a_{ij}\frac{\partial^2 u}{\partial x_i\,\partial x_j} - a_0\frac{\partial u}{\partial t} - \sum_{i=1}^{n} a_i\frac{\partial u}{\partial x_i} - au = 0, \tag{1}$$

with coefficients $a_{0i},\dots,a$ that are functions in $(t,x)$ is said to be hyperbolic (or of hyperbolic type) (with respect to the $t$-direction) in $tx$-space if the characteristic equation of equation (1) considered at each point of $tx$-space,

$$H(t,x;\lambda,\xi) \equiv \lambda^2 - \sum_{i=1}^{n} a_{0i}\lambda\xi_i - \sum_{i,j=1}^{n} a_{ij}\xi_i\xi_j = 0, \tag{2}$$

has two distinct real roots $\lambda = \lambda_1(t,x,\xi)$, $\lambda_2(t,x,\xi)$ for any $n$-tuple of real numbers $\xi = (\xi_1,\dots,\xi_n) \neq (0,\dots,0)$. In particular, (1) is called regularly hyperbolic if these two roots are separated uniformly, that is,

$$\inf_{(t,x),\,|\xi|=1} |\lambda_1(t,x,\xi) - \lambda_2(t,x,\xi)| = c > 0.$$

A typical example of hyperbolic equations is the wave equation

$$\Box u \equiv \frac{\partial^2 u}{\partial t^2} - \frac{\partial^2 u}{\partial x_1^2} - \dots - \frac{\partial^2 u}{\partial x_n^2} = 0. \tag{3}$$

Equation (3) is also called the equation of a vibrating string, the equation of a vibrating membrane, or the equation of sound propagation according as $n = 1$, $2$, or $3$. Another example is

$$\frac{\partial^2 u}{\partial t^2} - c^2\frac{\partial^2 u}{\partial x^2} - 2\frac{\partial u}{\partial t} = 0,$$

which describes the propagation of electric current in a conducting wire with leakage and is called the telegraph equation (→ Appendix A, Table 15).
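
The definition of hyperbolicity above can be illustrated with a small numerical check. This is a sketch under stated assumptions, not part of the original article: the coefficients $a_{0i}$, $a_{ij}$ are taken constant, $n = 2$ in the sampling helper, and all function names are hypothetical. It solves the characteristic equation (2) for $\lambda$ and reports the smallest gap between the two roots over sampled unit vectors $\xi$. Sampling finitely many directions can suggest, but not prove, hyperbolicity.

```python
import math

def characteristic_roots(a0, A, xi):
    """Roots of lam^2 - (sum a0[i] xi[i]) lam - sum A[i][j] xi[i] xi[j] = 0.

    Returns the pair (smaller, larger) of real roots, or None when the
    roots are not real and distinct (discriminant <= 0)."""
    b = sum(a0i * x for a0i, x in zip(a0, xi))
    c = sum(A[i][j] * xi[i] * xi[j]
            for i in range(len(xi)) for j in range(len(xi)))
    disc = b * b + 4.0 * c
    if disc <= 0.0:
        return None
    r = math.sqrt(disc)
    return ((b - r) / 2.0, (b + r) / 2.0)

def min_root_gap_2d(a0, A, samples=360):
    """Smallest |lam_1 - lam_2| over sampled unit vectors xi in the plane.

    Returns 0.0 if some sampled direction lacks two distinct real roots."""
    gap = float("inf")
    for k in range(samples):
        th = 2.0 * math.pi * k / samples
        roots = characteristic_roots(a0, A, (math.cos(th), math.sin(th)))
        if roots is None:
            return 0.0
        gap = min(gap, roots[1] - roots[0])
    return gap

# Wave equation (3) with n = 2: a_0i = 0, a_ij = identity.
wave_gap = min_root_gap_2d([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
# u_tt + u_x1x1 + u_x2x2 = 0 (Laplace type in t, x): a_ij = -identity.
laplace_gap = min_root_gap_2d([0.0, 0.0], [[-1.0, 0.0], [0.0, -1.0]])
```

For the wave equation the roots are $\lambda = \pm|\xi|$, so the gap on the unit sphere is 2; for the elliptic counterexample the roots are not real and the helper returns 0.0.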
A hyperplane $\lambda(t - t^0) + \xi_1(x_1 - x_1^0) + \dots + \xi_n(x_n - x_n^0) = 0$ passing through a point $p^0 = (t^0, x^0)$ in $tx$-space and having normal direction $(\lambda, \xi)$ is called a characteristic hyperplane of (1) at $p^0$ if the direction $(\lambda, \xi)$ satisfies the characteristic equation at $p^0$: $H(t^0, x^0; \lambda, \xi) = 0$. A hypersurface $S$: $s(t,x) = 0$ in $tx$-space is a characteristic hypersurface of (1) if at each point of $S$ the tangent hyperplane of $S$ is a characteristic hyperplane of (1), that is, $H(t,x; s_t, s_x) = 0$ everywhere on $S$. According to the theory of first-order partial differential equations, a characteristic hypersurface $S$ is generated by so-called bicharacteristic curves, i.e., solution curves $t = t(\tau)$, $x = x(\tau)$ of a system of ordinary differential equations

$$\frac{dt}{d\tau} = H_\lambda, \quad \frac{dx_1}{d\tau} = H_{\xi_1}, \ \dots, \ \frac{dx_n}{d\tau} = H_{\xi_n},$$

$$\frac{d\lambda}{d\tau} = -H_t, \quad \frac{d\xi_1}{d\tau} = -H_{x_1}, \ \dots, \ \frac{d\xi_n}{d\tau} = -H_{x_n},$$

$$H(t,x;\lambda,\xi) = 0.$$

Now if (1) is hyperbolic, the set of all characteristic hyperplanes at $p^0$: $\{\lambda(t - t^0) + \dots + \xi_n(x_n - x_n^0) = 0 \mid H(t^0, x^0; \lambda, \xi) = 0\}$ has as its envelope a cone $C(p^0)$ with the vertex $p^0$. Moreover, since the intersection of any hyperplane $t = \text{constant}$ and the cone $C(p^0)$ is an $(n-1)$-dimensional ellipsoid or two points for $n = 1$, a conical body $D_+(p^0)$ ($D_-(p^0)$) is determined, whose boundary consists of the part of $C(p^0)$ with $t > t^0$ ($t < t^0$) and the interior of the ellipsoid on the hyperplane $t = \text{constant}$. A †smooth curve $\gamma$ in $tx$-space is called timelike if the tangent vector of $\gamma$ at each point $p$ on $\gamma$ belongs to $D_+(p)$ or $D_-(p)$. Consider the set of points that can be connected with the point $p^0$ by a timelike curve. We call its closure an emission, and a subset $\mathcal{D}_+(p^0)$ ($\mathcal{D}_-(p^0)$) of the closure for which $t \ge t^0$ ($t \le t^0$) a forward (backward) emission. An emission is a conical body surrounded by characteristic hyperplanes in some neighborhood of the vertex $p^0$. If the coefficients of (1) are bounded functions, the emissions $\mathcal{D}_\pm(p^0)$ are contained in a conical body

$$\Bigl\{(t,x) \Bigm| \lambda_{\max}^2 (t - t^0)^2 \ge \sum_{i=1}^{n} (x_i - x_i^0)^2 \Bigr\}$$

independently of the situation of $p^0$, where

$$\lambda_{\max} = \max_{(t,x),\,|\xi|=1} \bigl(|\lambda_1(t,x,\xi)|,\ |\lambda_2(t,x,\xi)|\bigr).$$

B. The Cauchy Problem

Important for the hyperbolic equation (1) is the †Cauchy problem, i.e., the problem of finding a function $u = u(t,x)$ that satisfies (1) in $t > 0$ and the initial conditions

$$u(0,x) = u_0(x), \qquad \frac{\partial u}{\partial t}(0,x) = u_1(x), \tag{4}$$

where the functions $u_0(x)$ and $u_1(x)$ are given on the initial hyperplane $t = 0$.

Suppose that (1) is regularly hyperbolic and the coefficients are bounded and sufficiently smooth (i.e., of class $C^\nu$ with $\nu$ sufficiently large). Then for the Cauchy problem the following theorem holds. Theorem (C): There exists a positive integer $l$ ($= [n/2] + 3$), depending on the dimension $n + 1$ of the $tx$-space, such that if the functions $u_0(x)$ and $u_1(x)$ in (4) are of class $C^l$, then there exists a unique solution $u = u(t,x)$ of class $C^2$ in the domain $0 \le t < \infty$, $-\infty < x_i < \infty$ ($1 \le i \le n$). Moreover, this correspondence $\{u_0(x), u_1(x)\} \to u(t,x)$ is continuous in the following sense: If a sequence of initial functions $\{u_{0k}(x), u_{1k}(x)\}$ ($k = 1, 2, \dots$) and their derivatives up to the $l$th order tend to 0 uniformly on every compact set in the hyperplane $t = 0$, then the sequence of corresponding solutions $u_k(t,x)$ also tends to 0 uniformly on every compact set in each hyperplane $t = \text{constant}$. In other words, the Cauchy problem for regularly hyperbolic equations is †well posed in the sense of Hadamard [2].

For dependence of the solution on initial data, the following proposition is valid: The values of the solution $u$ at a point $p^0 = (t^0, x^0)$ depend only on the initial data on a domain $G_0$ (domain of dependence) of the initial hyperplane, which is determined as the intersection of the backward emission $\mathcal{D}_-(p^0)$ and the initial hyperplane. We have the following dual proposition: A change in the initial conditions in a neighborhood of a point $Q_0$ of the initial hyperplane induces a change of values of the solution only in some neighborhood of the forward emission $\mathcal{D}_+(Q_0)$ (domain of influence). If the coefficients of the equation are bounded, the intersection of emissions $\mathcal{D}_\pm(p^0)$ and the hyperplane $t = \text{constant}$ is always compact. It follows that the domain of dependence and the domain of influence are bounded. In some special cases, there exists a proper subdomain of $G_0$ such that the solution depends only on the initial data on the subdomain. For example, for the wave equation (3) with $n = 3$, the solution for the Cauchy problem at a point $p^0 = (t^0, x^0)$ (→ Section D) is determined, as can be seen from the solution formula (12), by the initial data in a neighborhood of the cone with vertex $p^0$: $(t - t^0)^2 = \sum_{i=1}^{3} (x_i - x_i^0)^2$, namely, in a neighborhood of the intersection of bicharacteristic curves (lines, in this case) passing through $p^0$ and the initial hyperplane. If the solution of the Cauchy problem has such a property, it is said that Huygens's principle is valid, or that diffusion of waves does not occur. For the wave equation (3), Huygens's principle is valid only for odd $n > 1$.

C. The Energy Inequality

The energy conservation law for the wave equation (3),

$$E(t) \equiv \frac{1}{2}\int \Bigl\{ \Bigl(\frac{\partial u}{\partial t}\Bigr)^2 + \sum_{i=1}^{n} \Bigl(\frac{\partial u}{\partial x_i}\Bigr)^2 \Bigr\}\,dx = \text{constant},$$
is generalized to the so-called energy inequality for hyperbolic equations, which plays an essential role in deducing the well-posedness of the Cauchy problem and the properties of the domain of dependence of the solution. Let the coefficients of a hyperbolic equation (1) be bounded functions, and let $G(\tau)$ be the intersection of the conical body $K = \{(t,x) \mid \lambda_{\max}^2 (t - t^1)^2 \ge \sum_{i=1}^{n} (x_i - x_i^1)^2\}$ with the hyperplane $t = \tau$ ($\tau < t^1$). The $k$ ($\ge 1$)th-order energy integral of the solution $u(t,x)$ of (1) on $G(\tau)$ is defined by

$$E^{(k)}(u, G(\tau))$$

Then the following inequality holds:

$$E^{(k)}(u, G(\tau)) \le C\,E^{(k)}(u, G(t^0)), \qquad t^0 < \tau < t^1, \tag{5}$$

where the constant $C$ is independent of $u$. We call (5) the energy inequality (J. Schauder [5]). For the wave equation (3), the hypothesis $l = [n/2] + 3$ in theorem (C) can be replaced by a weaker condition $l = [n/2] + 2$, but if we take $l = [n/2] + 1$, there is an example for which no global solution of class $C^2$ exists. In general, even though the initial functions are of class $C^l$, the solution in the Cauchy problem for hyperbolic equations may not be of class $C^l$, while if the energy of the initial functions is bounded, the energy of the solution is also bounded.

D. Representation Formulas for Solutions of the Cauchy Problem

We consider solution formulas that represent solutions of the Cauchy problem explicitly as functionals of the initial functions. The problem of solving (1) under the condition (4), or more generally, the problem of solving an equation $L[u] = f(t,x)$ under (4), can be reduced, by transforming the unknown function $u$ and applying †Duhamel's method, to the Cauchy problem with initial conditions on the hyperplane $t = \tau$:

$$u(\tau, x) = 0, \qquad \frac{\partial u}{\partial t}(\tau, x) = \varphi(x) \tag{4'}$$

for arbitrary $\tau$. We define a function $\chi_q(s)$ of a real variable $s$ by

$$\chi_q(s) = \begin{cases} \dfrac{|s|^q}{4(2\pi i)^{n-1}\,q!} & (n \text{ odd}), \\[2ex] -\dfrac{s^q \log|s|}{(2\pi i)^{n}\,q!} & (n \text{ even}), \end{cases} \tag{6}$$

where $n$ is the dimension of the $x$-space and $q$ is a positive integer such that $q + n$ is even. Then for a function $\varphi(x)$ of class $C^\nu$ (with $\nu$ sufficiently large) with compact support, the following equality holds [7]:

$$\varphi(x) = \int_{-\infty}^{\infty} \Delta_y^{(n+q)/2}\varphi(y)\,dy \int_{|\omega|=1} \chi_q((x - y)\cdot\omega)\,d\omega, \tag{7}$$

where $\Delta_y$ is the †Laplacian with respect to the variables $y = (y_1,\dots,y_n)$ and $d\omega$ is the surface element of the unit sphere $|\omega| = 1$ in $x$-space. Now, since the †principle of superposition is valid because (1) is linear, we can infer from formula (7) that the Cauchy problem (1) with initial condition (4') can be reduced to the one for initial conditions with parameters $y$, $\omega$:

$$u(\tau, x) = 0, \qquad \frac{\partial u}{\partial t}(\tau, x) = \chi_q((x - y)\cdot\omega). \tag{4''}$$

In fact, since $\chi_q(s)$ is $(q-1)$-times differentiable by definition, the Cauchy problem (1) with initial condition (4'') has a unique solution $R_q(t,x;\tau,y;\omega)$ for $q$ chosen large enough so that theorem (C) can be applied to (1) with initial condition (4''). Moreover, $R_q(t,x;\tau,y;\omega)$ is a function of $(t,x,\tau,y,\omega)$ of class $C^\nu$, and $\nu$ increases with $q$. Now, let $\varphi(x)$ be a function of class $C^\nu$ with sufficiently large $\nu$ and with compact support. Then, by (7) and the definition of $R_q$, the integral

$$\int_{-\infty}^{\infty} \Delta_y^{(n+q)/2}\varphi(y)\,dy \int_{|\omega|=1} R_q(t,x;\tau,y;\omega)\,d\omega \tag{8}$$

is a solution of the Cauchy problem (1) with initial condition (4'). Therefore, when $R_q$ is found explicitly, (8) yields a solution formula of the Cauchy problem (1) with initial condition (4') as a functional of the initial functions. Since in (8), the integral $\int_{|\omega|=1} R_q\,d\omega$ is not necessarily of class $C^{n+q}$ as a function of $(t,x;\tau,y)$, $\Delta_y^{(n+q)/2}\int_{|\omega|=1} R_q\,d\omega$ is in general not a function in the ordinary sense. But we denote it by $R(t,x;\tau,y)$ formally, and understand that a linear operator

$$u(t,x) = \int R(t,x;\tau,y)\,\varphi(y)\,dy \tag{9}$$

is defined by (8). The kernel $R(t,x;\tau,y)$ in this sense is called a fundamental solution or Riemann function of the Cauchy problem. If we extend the function $\int_{|\omega|=1} R_q(t,x;\tau,y;\omega)\,d\omega$ defined for $t > \tau$ to $t < \tau$, assigning it the value 0 there, then $R(t,x;\tau,y) = \Delta_y^{(n+q)/2}\int_{|\omega|=1} R_q(t,x;\tau,y;\omega)\,d\omega$ can be considered a †distribution on $(t,x;\tau,y)$-space, and the equality $L(t,x,\partial/\partial t,\partial/\partial x)R(t,x;\tau,y) = L^*(\tau,y,\partial/\partial\tau,\partial/\partial y)R(t,x;\tau,y) = \delta(t - \tau)\,\delta(x - y)$ is valid, where $L^*$ is the †adjoint operator of $L$ and $\delta$ is †Dirac's $\delta$-function. In other words, $R(t,x;\tau,y)$ is a fundamental solution of $L$ in the sense of distribution theory.

The fundamental solution $R(t,x;\tau,y)$ can be analyzed using the asymptotic expansion with
respect to the sequence of functions $\{\chi_q(s)\}$, and we have the following important result: If the coefficients of (1) are of class $C^\infty$ (resp. real analytic), then the fundamental solution $R(t,x;\tau,y)$ is of class $C^\infty$ (real analytic) in $(t,x)$ except for points that are on bicharacteristic curves of (1) passing through the point $(\tau, y)$. In the language of the Cauchy problem, the smoothness of the solution $u$ at a point $p = (t,x)$ depends only on the smoothness of the initial conditions on a neighborhood of the intersection of the initial hyperplane and all bicharacteristic curves passing through $p$. This fact is called Huygens's principle in the wider sense. Behavior of the fundamental solution $R(t,x;\tau,y)$ near discontinuous points has also been investigated [2].

For the wave equation (3), the fundamental solution can be constructed, and therefore we can write the solution formula explicitly. The solution formula for (3) with initial condition (4') for $n \ge 3$ is

$$u(t,x) = \frac{1}{(n-2)!}\,\frac{\partial^{n-2}}{\partial t^{n-2}} \int_0^t (t^2 - \tau^2)^{(n-3)/2}\,\tau\,Q(x,\tau)\,d\tau,$$

$$Q(x,\tau) = \frac{1}{\omega_n} \int_{|\omega|=1} \varphi(x + \tau\omega)\,d\omega,$$

where $\omega_n = 2\sqrt{\pi^n}/\Gamma(n/2)$ is the surface area of the unit sphere of $n$-dimensional space. Solutions of the Cauchy problem (3) with initial condition (4) for $n = 1$, $2$, and $3$ are, respectively,

$$u(t,x) = \frac{u_0(x+t) + u_0(x-t)}{2} + \frac{1}{2}\int_{x-t}^{x+t} u_1(\xi)\,d\xi \tag{10}$$

(d'Alembert's solution),

$$u(t,x_1,x_2) = \frac{1}{2\pi}\frac{\partial}{\partial t}\iint_{C_t} \frac{u_0(\xi_1,\xi_2)\,d\xi_1 d\xi_2}{\sqrt{t^2 - (x_1-\xi_1)^2 - (x_2-\xi_2)^2}} + \frac{1}{2\pi}\iint_{C_t} \frac{u_1(\xi_1,\xi_2)\,d\xi_1 d\xi_2}{\sqrt{t^2 - (x_1-\xi_1)^2 - (x_2-\xi_2)^2}} \tag{11}$$

(Poisson's solution), and

$$u(t,x_1,x_2,x_3) = \frac{1}{4\pi}\frac{\partial}{\partial t}\iint_{\Omega_t} \frac{u_0(\xi_1,\xi_2,\xi_3)}{t}\,d\omega_t + \frac{1}{4\pi}\iint_{\Omega_t} \frac{u_1(\xi_1,\xi_2,\xi_3)}{t}\,d\omega_t \tag{12}$$

(Kirchhoff's solution), where $C_t$ is a disk in the $\xi_1\xi_2$-plane with center $(x_1, x_2)$ and radius $t$, $\Omega_t$ is a sphere in the $\xi_1\xi_2\xi_3$-space with center $(x_1, x_2, x_3)$ and radius $t$, and $d\omega_t$ is the surface element of $\Omega_t$.

E. Second-Order Nonlinear Hyperbolic Equations

A second-order nonlinear differential equation

$$\frac{\partial^2 u}{\partial t^2} = A\Bigl(t, x, u, \frac{\partial u}{\partial t}, \frac{\partial u}{\partial x_i}, \frac{\partial^2 u}{\partial t\,\partial x_i}, \frac{\partial^2 u}{\partial x_i\,\partial x_j}\Bigr), \qquad 1 \le i, j \le n, \tag{13}$$

is called hyperbolic in a neighborhood of a function $U(t,x)$ if the linear equation of the form (1) obtained from (13) is hyperbolic, where $a_{0i}(t,x)$ and $a_{ij}(t,x)$ are determined by substituting $u$ by $U(t,x)$ in the partial derivatives of $A$ with respect to $\partial^2 u/\partial t\,\partial x_i$ and $\partial^2 u/\partial x_i\,\partial x_j$, respectively. If the functions $A$ in (13) and $U = u_0(x) + t\,u_1(x)$ determined by (4) are sufficiently smooth with respect to $t, x, u, \dots, \partial^2 u/\partial x_i\,\partial x_j$, the Cauchy problem for (13) with initial condition (4) has a unique solution in some neighborhood of the initial hyperplane under the condition that equation (13) is hyperbolic in a neighborhood of $U$. In general, initial value problems for nonlinear equations have only local solutions.

F. Higher-Order Hyperbolic Equations

An $N$th-order linear differential equation in $n + 1$ variables $t$, $x = (x_1,\dots,x_n)$ with constant coefficients

$$L\Bigl(\frac{\partial}{\partial t}, \frac{\partial}{\partial x}\Bigr)u \equiv \sum_{\alpha_0 + |\alpha| \le N} a_{\alpha_0\alpha}\,\frac{\partial^{\alpha_0 + |\alpha|} u}{\partial t^{\alpha_0}\,\partial x^{\alpha}} = 0, \tag{14}$$

where $\alpha = (\alpha_1,\dots,\alpha_n)$, $|\alpha| = \alpha_1 + \dots + \alpha_n$, and

$$\frac{\partial^{|\alpha|}}{\partial x^{\alpha}} = \frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}},$$

is hyperbolic in the sense of Gårding if the following two conditions are satisfied: (i) the partial derivative $\partial^N/\partial t^N$ appears in $L$; (ii) the real parts of the roots $\lambda = \lambda_1(\xi),\dots,\lambda_N(\xi)$ of the characteristic equation $L(\lambda, i\xi) = 0$ are bounded functions of real variables $\xi = (\xi_1,\dots,\xi_n)$. When $L$ is a homogeneous equation of the $N$th order, condition (ii) is equivalent to the following condition: (ii') $\lambda_1(\xi),\dots,\lambda_N(\xi)$ are purely imaginary for all real $\xi = (\xi_1,\dots,\xi_n) \neq (0,\dots,0)$. The principal part (consisting of the highest-order terms) of a hyperbolic equation is also hyperbolic. If (14) is hyperbolic, a theorem analogous to theorem (C) holds for the Cauchy problem for (14) with initial conditions

$$\frac{\partial^k u}{\partial t^k}(0,x) = u_k(x), \qquad 0 \le k \le N - 1; \tag{15}$$

that is, the Cauchy problem for (14) with initial condition (15) is well posed in the sense of Hadamard. Conversely, if the Cauchy problem
for (14) with initial condition (15) is well posed, then (14) must satisfy the hyperbolicity conditions (i) and (ii) in the sense of Gårding. In other words, well-posedness of the Cauchy problem is equivalent to hyperbolicity in the sense of L. Gårding [14]. Gårding's conditions for hyperbolicity cannot be generalized to the case of variable coefficients, since the influence of the lower-order terms in the equation is taken into account in the definition of hyperbolicity. However, in the case of constant coefficients, an $N$th-order homogeneous equation remains hyperbolic for any addition of lower-order terms if and only if the characteristic equation has $N$ distinct purely imaginary roots for any real $\xi = (\xi_1,\dots,\xi_n) \neq (0,\dots,0)$. In this case the equation is called hyperbolic in the strict sense. Thus a linear equation with variable coefficients

$$L[u] \equiv \frac{\partial^N u}{\partial t^N} + \sum_{\substack{\alpha_0 + |\alpha| \le N \\ \alpha_0 < N}} a_{\alpha_0\alpha}(t,x)\,\frac{\partial^{\alpha_0 + |\alpha|} u}{\partial t^{\alpha_0}\,\partial x^{\alpha}} = 0 \tag{16}$$

is called hyperbolic in the sense of Petrovskii if the characteristic equation

$$\lambda^N + \sum_{\alpha_0 + |\alpha| = N} a_{\alpha_0\alpha}(t,x)\,\lambda^{\alpha_0}(i\xi)^{\alpha} = 0$$

has $N$ distinct purely imaginary roots (called characteristic roots) $\lambda_1(t,x,\xi),\dots,\lambda_N(t,x,\xi)$ for each point $p = (t,x)$ and each $\xi \neq 0$. Moreover, if the characteristic roots $\lambda_1,\dots,\lambda_N$ are separated uniformly, i.e., the inequality

$$\inf_{(t,x),\,|\xi|=1,\,j \neq k} |\lambda_j(t,x,\xi) - \lambda_k(t,x,\xi)| = c > 0$$

holds, (16) is said to be regularly hyperbolic. In the second-order case, this definition is equivalent to the previous one. Theorem (C) holds for the Cauchy problem for a regularly hyperbolic equation (16) with initial conditions (15). For the domain of dependence of the solution, a result analogous to the case of the second-order equation can be obtained using an energy inequality [9, 10]. If the coefficients are of class $C^\infty$, Huygens's principle in the wider sense is valid, that is, discontinuity of the solution is carried over only along bicharacteristic curves.

G. Systems of Hyperbolic Equations

For systems of equations

$$\sum_{j=1}^{l} L_{ij}[u_j] = 0, \qquad 1 \le i \le l,$$

where the $L_{ij}$ are higher-order linear differential operators of the form (16), several types of hyperbolicity are formulated in connection with the well-posedness of the Cauchy problem. We take up two important types, hyperbolicity in the sense of Petrovskii and symmetric hyperbolicity due to Friedrichs.

We call a system of linear differential equations

$$\frac{\partial^{n_i} u_i}{\partial t^{n_i}} = \sum_{j=1}^{l} \sum_{\substack{\alpha_0 + |\alpha| \le n_j \\ \alpha_0 < n_j}} a^{ij}_{\alpha_0\alpha}(t,x)\,\frac{\partial^{\alpha_0 + |\alpha|} u_j}{\partial t^{\alpha_0}\,\partial x^{\alpha}}, \qquad 1 \le i \le l, \tag{17}$$

a system of hyperbolic differential equations (in the sense of Petrovskii) if the determinant

$$\det\Bigl(\lambda^{n_i}\delta_{ij} - \sum_{\alpha_0 + |\alpha| = n_j} a^{ij}_{\alpha_0\alpha}(t,x)\,\lambda^{\alpha_0}(i\xi)^{\alpha}\Bigr), \tag{18}$$

calculated formally using the matrix of differential operators in the system, is hyperbolic in the sense of Petrovskii as a single equation of $N$ ($= \sum_{j=1}^{l} n_j$)th order. Petrovskii showed that the Cauchy problem for a system that is hyperbolic in this sense is well posed [10]. There were some imperfections in his argument, which have been corrected by others (→ S. Mizohata [12]). In the case of constant coefficients, the Cauchy problem for (17) is well posed if and only if (18) is hyperbolic in the sense of Gårding.

K. O. Friedrichs, observing that the energy inequality played an essential role in Petrovskii's research, studied symmetric hyperbolic systems of equations, since for them the energy inequalities are valid most naturally. A system of first-order linear differential equations

$$A_0(t,x)\,\frac{\partial u}{\partial t} + \sum_{i=1}^{n} A_i(t,x)\,\frac{\partial u}{\partial x_i} + B(t,x)\,u = f(t,x)$$

is called symmetric hyperbolic (in the sense of Friedrichs) if the matrices $A_i(t,x)$ ($0 \le i \le n$) are symmetric and $A_0(t,x)$ is positive definite. A typical example is provided by Maxwell's equations. For this system it has been shown that the Cauchy problem is well posed and the domain of dependence of the solution is bounded [13].

H. Weakly Hyperbolic Operators

We adopt the following definition of hyperbolicity: a linear differential operator of $N$th order

is called hyperbolic if the Cauchy problem for $L[u] = 0$ with initial condition (15) is well posed in Hadamard's sense. A hyperbolic operator $L$ is called strongly hyperbolic if $L$ remains hyperbolic for any addition of lower-order terms, and weakly hyperbolic otherwise.
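
The difference between strongly and weakly hyperbolic operators can be seen in a constant-coefficient toy case. The following is a hedged numerical sketch, not part of the original article. It uses Gårding's criterion from Section F: the real parts of the roots $\lambda$ of $L(\lambda, i\xi) = 0$ must stay bounded for real $\xi$. The wave operator $\partial^2/\partial t^2 - \partial^2/\partial x^2$ (distinct characteristic roots) and the operator $\partial^2/\partial t^2$ (a double characteristic root) are each perturbed by the same first-order term $-\partial/\partial x$. The choice of operators, the helper names, and the sample frequencies are assumptions made for illustration.

```python
import cmath

def max_real_part(c0, xi):
    """Largest real part among the two roots of lam^2 + c0(xi) = 0."""
    root = cmath.sqrt(-c0(xi))
    return max(root.real, -root.real)

# u_tt - u_xx - u_x = 0:  L(lam, i*xi) = lam^2 - (i*xi)^2 - (i*xi)
wave_perturbed = lambda xi: xi * xi - 1j * xi
# u_tt - u_x = 0:         L(lam, i*xi) = lam^2 - (i*xi)
double_root_perturbed = lambda xi: -1j * xi

freqs = [10.0 ** k for k in range(1, 7)]
wave_growth = [max_real_part(wave_perturbed, xi) for xi in freqs]
double_growth = [max_real_part(double_root_perturbed, xi) for xi in freqs]
```

In this sketch `wave_growth` stays below 1/2 for every sampled frequency, consistent with the wave operator being hyperbolic in the strict sense, whereas `double_growth` grows like $\sqrt{\xi/2}$, so the perturbed operator $\partial^2/\partial t^2 - \partial/\partial x$ violates condition (ii) and $\partial^2/\partial t^2$ is only weakly hyperbolic.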
A necessary condition for hyperbolicity is that all characteristic roots of $L_N(t,x,\lambda,\xi) = 0$ be real for any $(t,x,\xi)$ (P. D. Lax [15], Mizohata [16]). In the case of constant coefficients, strongly hyperbolic operators have been characterized by K. Kasahara and M. Yamaguti (Mem. Coll. Sci. Kyoto Univ. (1960)).

As for operators with variable coefficients, it is known that not only regularly hyperbolic operators but also some special classes of not regularly hyperbolic operators are strongly hyperbolic (V. Ya. Ivrii, Moscow Math. Soc., 1976). Let us consider the hyperbolicity of an operator $L$ which is not regularly hyperbolic under the assumption that the multiplicities of the characteristic roots are constant and at most 2; namely, the characteristic polynomial

$$L_N(t,x,\lambda,\xi) = \lambda^N + \sum_{\substack{\alpha_0 + |\alpha| = N \\ \alpha_0 \le N-1}} a_{\alpha_0\alpha}(t,x)\,\lambda^{\alpha_0}\xi^{\alpha}$$

is decomposed in the following way:

$$L_N(t,x,\lambda,\xi) = \prod_{j=1}^{s} (\lambda - \lambda_j(t,x,\xi))^2 \prod_{j=s+1}^{N-s} (\lambda - \lambda_j(t,x,\xi)),$$

$$\inf_{(t,x),\,|\xi|=1,\,j \neq k} |\lambda_j(t,x,\xi) - \lambda_k(t,x,\xi)| = c > 0.$$

Assume that the $\lambda_j(t,x,\xi)$, $j = 1, 2, \dots, N-s$, are real. Then $L$ is hyperbolic if and only if it satisfies E. E. Levi's condition, i.e.,

for all $(t,x,\xi)$, $j = 1, 2, \dots, s$,

where $L_{N-1}$ denotes the homogeneous part of $(N-1)$th order among lower-order terms of $L$ (Mizohata and Y. Ohya [17]). Thus, for a weakly hyperbolic operator with variable coefficients, even the principal part is

for $\alpha_0 + |\alpha| + q\beta_0 + r|\beta| < p$, then it is necessary for the well posedness of the Cauchy problem that

($\xi \in \mathbf{R}^n \setminus 0$) for $\alpha_0 + |\alpha| + q\beta_0 + r|\beta| + s(1+q) < p$ are satisfied, where $L_{N-s}$ ($1 \le s \le N$) are the homogeneous lower-order terms of order $N - s$ of $L$ (Ivrii and V. M. Petkov [20]).

On the other hand, the sufficient condition in the case of multiplicity 2 is given by some conditions related to the subprincipal symbol

$$L_{N-1} - \frac{i}{2}\Bigl(\frac{\partial^2 L_N}{\partial t\,\partial\lambda} + \sum_{j=1}^{n} \frac{\partial^2 L_N}{\partial x_j\,\partial\xi_j}\Bigr),$$

which corresponds to Levi's condition in the case of constant multiplicity, and to the †Poisson brackets (A. Menikoff, Amer. J. Math., 1976; Ohya, Ann. Scuola Norm. Sup. Pisa, 1977; L. Hörmander, J. Anal. Math., 1977).

The Cauchy problem for a weakly hyperbolic system of equations is more complicated, because of the essential difficulty that the matrix structure

$$\Bigl(\lambda^{n_i}\delta_{ij} - \sum_{\alpha_0 + |\alpha| = n_j} a^{ij}_{\alpha_0\alpha}(t,x)\,\lambda^{\alpha_0}\xi^{\alpha}\Bigr)$$

associated with (17) is not clear in general (→ references in [20]).

I. Gevrey Classes

Classically, the functions of †class $s$ ($s > 1$) of Gevrey (→ 58 $C^\infty$-Functions and Quasi-Analytic Functions G; 168 Function Spaces B (14)) were introduced into the study of the fundamental solution for the heat equation:

$\gamma^{(s)}_{\mathrm{loc}}(\mathbf{R}^n) = \{\varphi(x) \in C^\infty(\mathbf{R}^n) \mid$ for any compact subset $K$ of $\mathbf{R}^n$ and any multi-indices
not necessarily hyperbolic. J. Chazarain [ 1 S] cc,there exist constants C, and A
has studied weakly hyperbolic operators such that ~~p,I(a/ax)“cp(x)I G
with characteristic roots of arbitrary constant C,A’a’lcrl!“}.
multiplicity. 0. A. Oleinik [ 191 studied the This class of functions was used efficiently in
Cauchy problem with nonconstant multiplic- the studies of the Cauchy problem for tweakly
ities for second-order equations. For higher- hyperbolic partial differential equations:
order equations, if the multiplicity of charac-
teristic roots at (F, 2) is at most p, or more
precisely, if there exist positive rational num-
bers q and r (q 2 r) such that in [O,T]xR”=Q

j=O, 1,...,h’-1.

We assume that the multiplicities of the char-


acteristic roots are constant, i.e.,

L,(t,x,l,r)=Ij(~.-ii(t,x,t))‘i,
(“~ER”\O) i=l

where vi is constant for any (t, x, 5) ER x R”, when xeK(A,0)\W(A,@). Here,q=mk-IpI-n


&(t, x, 5) is real and distinct, and Zf=, vi = N. is the degree of homogeneity of the left-hand
Let max 14iakvi=p. If we suppose that L(t,x, side,andw=~j”=,(-l)j-‘5jd51/\...Ad~A
a/at,a/ax)- Z(t, X,a/at,a/ax) is a (tpseudo) . . Ed&. The integrands are closed trational
differential operator of order at most q, where (n - 1)-forms on (n - 1)-dimensibnal complex
tprojective space and are integrated over cer-
tain thomology classes c(* = a(A, 8, x)* and t, .
&*. These formulas provide means of obtain-
is a (pseudo)differential operator with a,(t, x, ing topological criteria for lacunas. Let Y c
a/at, a/ax) being strictly hyperbolic (pseudo) g be a maximal connected open set, where
differential operators associated with L,, E(L, 0, x) is holomorphic. dip is said to be a
then for any s such that 1 <s < p/q, the weak (strong) lacuna of L if E(L, 0, x) is the
Cauchy problem is well posed in yj@), restriction of an entire function to P(E(L, 8, x)
provided that all a,&, x) of L and f(t, x) =0 in 2). In [25] it is shown that x belongs
belong to y!:\(Q), and that {u~(x)J,,~~~~-~ are to a weak lacuna for all E(Lk,, 0;) if and only
given in $,L(R”) (Ohya [21], J. Leray and if ac(* = 0. The sufficiency directly follows from
Ohya [22]). This result was proved even for (20’) and the necessity follows from a theo-
the case of arbitrary nonconstant multiplicity rem of A. Grothendieck (Publ. Math. Inst.
of characteristic roots by M. D. Bronshtein HES, 1966) which implies that the rational
1231. forms which appear in (203 span all the tco-
homology classes in question.
J. Lacunas for Hyperbolic Operators

The theory of lacunas of fundamental solutions


K. Mixed Initial-Boundary Value Problems
of hyperbolic operators, initiated by Petrovskii
[24], has been developed further in a paper
[25] by M. F. Atiyah, R. Bott, and L. Girding. Let Q be a domain in R” with a sufficiently
Let L(t) = &(I;) + M(t) be a hyperbolic smooth boundary I-, let L(t, x, a/at, a/ax) be a
polynomial with respect to the vector &R” - 0, linear hyperbolic operator of N th order de-
where J&(C) is the principal part of L; this fmedin[O,co)xfi={(t,x)It~[O,co),x~fi},
means that LN(0) # 0 and L(< + t0) # 0 when and let Bj(t, x, a/at, a/ax), j = 1,2, ., . , b, be linear
IIm t 1is sufficiently large. Then L has a funda- differential operators of N,th order defined in a
mental solution E = E(L, 0, x) in the form neighborhood of [0, co) x r.
The problem of finding a function u(t, x)
E(L,&x)=(27q” ~(5 -ice)-leix(C-ic8)dg, satisfying the conditions
s
L[u]=O in (0,co)xQ
where c is sufficiently large and the integral
is taken in the sense of tdistribution. The con- Bj[u]=O on (O,co)xr, j=l,2 ,..., b,
vex hull of the support of E, denoted by K =
K(L, 0) = K(A, f?),is a cone depending only aku/atk(o,x)=uk(x), O<k<N-1, (21)
on the real part Re A of the complex hyper- is called a mixed initial-boundary value prob-
surface A: L(l) = 0, and contained in the union lem. A typical example of such a mixed prob-
of the origin and the half-space x. 0 > 0. Let A, lem is provided by the case L = IJ (n = 2) and
be the ttangent conoid of A at 5, transported B[u] = u(t, x), which describes the vibration of
to the origin, and define the wavefront surface membranes with a fixed boundary.
W(A, 0) by the union of all K(A,, 0) for 5 #O. The mixed problem (21) is said to be well
Then it can be shown that the singular sup- posed if for any initial data uk(x)~Cm(iZ),
ports of E(L, 0) and all the E(Lk,, 0) are con- 0 < k < N - 1, which are compatible with the
tained in W(A, 0) and, moreover, that they are boundary conditions, there exists a unique
locally holomorphic outside W. In [25], the solution u(t, x)E Cm( [0, 00) x a).
Herglotz-Petrovskii-Leray formulas are gen- Mixed problems for second-order hyper-
eralized to any nonstrict L(t). Thus we have bolic equations are considered in [6]. In regard
DpE(L;,o,x) to mixed problems for hyperbolic equations
of higher order, we make the following four
= const Jx. 5)q5flLrm-k45)> q>o, (20) assumptions: (i) Sz= R”, = {(x’, x,) 1x’ER”-I,
5 x, > 0); (ii) r = {x Ix, = 0) is not characteristic
DBE(L:,, 0, x)
for L or Bj; (iii) L is regularly hyperbolic;
(iv) Nj<N-1 and Nj#Nkifj#k.
We denote the tprincipal parts of L and Bj
= const (x~r)“PGr(Kk45b qso
s t; Jo* by L,(t,x’,x,,i?/i% alax’, a/ax,) and Bjo(t,x’,
(20’) apt, a/ax’, a/ax,), respectively. By the hyper-

bolicity of L, an equation in K where cp is a smooth function satisfying the


eikonalequationJV~~Z=l.Ifuj,j=C~,l,...,N,
L,(t,x’,O,i,(‘,~)=0 for Imi<O, t’ER”-’
satisfy the transport equations
has p roots K: with ImK; >O, and N-p roots
ic: with Im ~~~ < 0, and the number p is inde- 22+2Vo.V0j+Aqnj= -ilJujml, u-1 =o,
pendent of (t, x’) and (A, 5’). A necessary con-
dition for the well-posedness of the mixed we have
problem (21) is that the number of boundary
conditions coincide with this integer p. The
q w=O(kmN).
function R defined by Then w of (22) is an approximate solution of
q u = 0 for large k, and it represents a wave
W, x’, At’)
propagating in the direction Vq.
When supp w fl(0, co) x I # cp, if M’ hits the
= det
boundary I transversally, we can construct

where L’ = n&i (K - $ (t, x’, A,[‘)) and C is a w+(t, x; k),eWe+W-O


j$o uj’ 0, x)k -j

contour enclosing all xj’, is called a Lopatinski


determinant.
We say that L and Bj satisfy the uniform such that
Lopatinski condition if lVcp+l’=l inR, (p+=‘p, and
inf IR(t,x’,&t’)l=c>O.
(r,x’,,lmA-<(l
K’l+l4=1

When the uniform Lopatinski condition is


satisfied, the mixed problem (21) is well posed, and vj+ satisfy the transport equations and vj+
and (21) represents a phenomenon with a finite = - uj on (0, co) x I, where v is the unit inner
propagation speed, which is the same as that normal of I. Then w + w+ is an approximate
of the Cauchy problem for L[u] = 0 (T. Bala- solution, and w+ represents a reflected wave
ban [26], H. 0. Kreiss [27], and R. Sakamoto propagating in the direction
[28]). An analogous result holds in the case vcp + = vq - 2(Vcp v)v.
of a domain R with a compact boundary I,
provided that L and Bj satisfy the uniform These asymptotic solutions show that the
Lopatinski condition at every point of I. high-frequency waves propagate approxi-
In the treatment of mixed problems for L mately according to the laws of tgeometric
and Bj not satisfying the uniform Lopatin- optics.
ski condition, the well-posed problems have If asymptotic solution (22) has a caustic, i.e.,
been characterized for operators with con- { {x + IVq(x) 1IE R} 1x E supp w} has an envel-
stant coefficients when fi=R’!+ (Sakamoto ope, w of the form (22) cannot be an asympto-
[29]). For general domains, however, the well- tic solution near the caustic. The asymptotic
posedness of mixed problems depends not behavior of high-frequency solutions near the
only on the properties of the Lopatinski deter- caustic was first considered by G. B Airy
minant but also on the shape of the domain (Trans. Cambridge Philos. Sot., 1838). Under
(M. Ikawa [30]). the condition that the principal curvatures of
the caustic are positive, w in the form (22) can
be prolonged to a domain containing the
caustic satisfying the asymptotic solution
L. Asymptotic Solutions
w(t, x; k) = eik(e(x)-‘){Ai( - kz’3p(x))go(t, x; k)

In order to explain some properties of phe- + ikmn3Ai’( - k2’3p(x))g1(t,~; k)},


nomenon governed by hyperbolic equations, (23)
asymptotic solutions play an important role.
where Ai is the Airy function
Consider, for example, the acoustic problem

q u=O in (0, co)xR, Ai(r i ei(zf+f3/3)dt


s m
u=O on (0, c0) x r.
and
Let w(t,x; k) be a function defined in (0, co) x
R with parameter k > 1 of the form g~(,(t,x;k)=Cg,j(t,x)k”‘-j
j
w(t, x; k) ,= ,+Ax)--f) jio uj(t, x)k-‘, (D. Ludwig [31]).
G53
Concerning the reflection of grazing rays by

strictly convex obstacles, the reflected wave can be constructed by the superposition of asymptotic solutions of the type (23).

The methods of construction of asymptotic solutions of the forms (22) and (23) are also applicable to Maxwell equations or more general hyperbolic systems (R. K. Luneberg [32]; Ludwig and Granoff, J. Math. Anal. Appl., 1968; Guillemin and Sternberg, Amer. Math. Soc. Math. Surveys, 14 (1977)).

M. Propagation of Singularities

Let $L$ be a hyperbolic operator with $C^\infty$ coefficients and consider the Cauchy problem $L[u] = 0$ with initial condition (15). When the initial data have singularities, the solution also has singularities for $t > 0$, which is a property of hyperbolic equations quite different from the properties of parabolic ones. It should be noted that the †propagation of singularities cannot be derived from the Huygens principle in the wider sense, i.e., even for regularly hyperbolic operators of second order we cannot determine the location and the type of singularities of the solutions for initial data with singularities directly from the singularities of the fundamental solution $R(t, x; \tau, y)$.

Suppose that the multiplicities of characteristic roots of $L$ are constant. Assume that $u_k$, $k = 0, 1, \ldots, N - 1$, have, on either side of a sufficiently smooth $(n - 1)$-dimensional manifold $\Gamma$, continuous derivatives of sufficiently high order that suffer jump discontinuities across $\Gamma$. Then the solution $u$ has continuous partial derivatives of sufficiently high order everywhere except on the characteristic surfaces of $L$ issuing from $\Gamma$, and across these the partial derivatives of $u$ have jump discontinuities (Courant and Hilbert [1]).

For more general singularities of initial data, it is known that the †wavefront propagates along the †bicharacteristic strips satisfying $\varphi_t - \lambda_i(t, x, \nabla\varphi) = 0$, $i = 1, 2, \ldots, N - s$, that is, $WF\,u(\cdot, t)$ is contained in

$$\Bigl\{ (x(t), \xi(t)) \in T^*(\mathbf{R}^n) \Bigm| \frac{dx_j}{dt}(s) = \frac{\partial \lambda_i}{\partial \xi_j}(s, x, \xi),\ \frac{d\xi_j}{dt}(s) = -\frac{\partial \lambda_i}{\partial x_j}(s, x, \xi),\ (x(0), \xi(0)) \in \bigcup_k WF(u_k) \Bigr\}$$

(J. Chazarain [18]).

The propagation of singularities is more complicated in mixed problems because of the reflections of singularities at the boundary. For the †acoustic problem, R. B. Melrose [33] showed the following: Suppose $\mathcal{O} = \complement\Omega \subset \{x \mid |x| < R\}$ for some $R > 0$, and all the broken rays according to the geometric optics starting from $\Omega_R = \Omega \cap \{|x| < R\}$ go out of $\Omega_R$ in a fixed time. Then for initial data with singularities in $\Omega_R$, the solution becomes smooth in $\Omega_R$ for sufficiently large $t$.

References

[1] R. Courant and D. Hilbert, Methods of mathematical physics II, Interscience, 1962.
[2] J. Hadamard, Le problème de Cauchy et les équations aux dérivées partielles linéaires hyperboliques, Hermann, 1932.
[3] J. Hadamard, Leçons sur la propagation des ondes et les équations de l'hydrodynamique, Hermann, 1903.
[4] R. Courant and K. O. Friedrichs, Supersonic flow and shock waves, Interscience, 1948.
[5] J. Schauder, Das Anfangswertproblem einer quasilinearen hyperbolischen Differentialgleichung zweiter Ordnung, Fund. Math., 24 (1935), 213-246.
[6] M. Krzyżański and J. Schauder, Quasilineare Differentialgleichungen zweiter Ordnung vom hyperbolischen Typus, gemischte Randwertaufgaben, Studia Math., 6 (1936), 162-189.
[7] F. John, Plane waves and spherical means applied to partial differential equations, Interscience, 1955.
[8] S. L. Sobolev, Applications of functional analysis in mathematical physics, Amer. Math. Soc., 1963. (Original in Russian, 1950.)
[9] J. Leray, Hyperbolic differential equations, Lecture notes, Institute for Advanced Study, Princeton, 1952.
[10] I. G. Petrovskii, Über das Cauchysche Problem für Systeme von partiellen Differentialgleichungen, Mat. Sb. (N.S.), 2 (44) (1937), 815-870.
[11] I. G. Petrovskii, Lectures on partial differential equations, Interscience, 1954. (Original in Russian, 1953.)
[12] S. Mizohata, The theory of partial differential equations, Cambridge, 1973. (Original in Japanese, 1965.)
[13] K. O. Friedrichs, Symmetric hyperbolic linear differential equations, Comm. Pure Appl. Math., 7 (1954), 345-392.
[14] L. Gårding, Linear hyperbolic equations with constant coefficients, Acta Math., 85 (1951), 1-62.
[15] P. D. Lax, Asymptotic solutions of oscillatory initial value problems, Duke Math. J., 24 (1957), 627-646.
[16] S. Mizohata, Some remarks on the Cauchy problem, J. Math. Kyoto Univ., 1 (1961), 109-127.
[17] S. Mizohata and Y. Ohya, Sur la condition de E. E. Levi concernant des équations hyperboliques, Publ. Res. Inst. Math. Sci., 4 (1968), 511-526; Sur la condition d'hyperbolicité pour les équations à caractéristiques multiples II, Japan. J. Math., 40 (1971), 63-104.
[18] J. Chazarain, Opérateurs hyperboliques à caractéristiques de multiplicité constante, Ann. Inst. Fourier, 24 (1974), 173-202; Propagation des singularités pour une classe d'opérateurs à caractéristiques multiples et résolubilité locale, Ann. Inst. Fourier, 24 (1974), 203-223.
[19] O. A. Oleinik, On the Cauchy problem for weakly hyperbolic equations, Comm. Pure Appl. Math., 23 (1970), 569-586.
[20] V. Ya. Ivrii and V. M. Petkov, Necessary conditions for the correctness of the Cauchy problem for nonstrictly hyperbolic equations, Russian Math. Surveys, 29 (1974), 1-70. (Original in Russian, 1974.)
[21] Y. Ohya, Le problème de Cauchy pour les équations hyperboliques à caractéristique multiple, J. Math. Soc. Japan, 16 (1964), 268-286.
[22] J. Leray and Y. Ohya, Systèmes linéaires, hyperboliques non stricts, 2ième Colloque sur l'Analyse Fonctionnelle, CBRM (1964), 105-144.
[23] M. D. Bronshtein, The parametrix of the Cauchy problem for hyperbolic operators with characteristics of variable multiplicity, Functional Anal. Appl., 10 (1976) 4, 83-84. (Original in Russian, 1976.)
[24] I. G. Petrovskii, On the diffusion of waves and the lacunas for hyperbolic equations, Mat. Sb. (N.S.), 17 (59) (1945), 289-370.
[25] M. F. Atiyah, R. Bott, and L. Gårding, Lacunas for hyperbolic differential operators with constant coefficients I, II, Acta Math., 124 (1970), 109-189; 134 (1973), 145-206.
[26] T. Balaban, On the mixed problem for a hyperbolic equation, Mem. Amer. Math. Soc., 112 (1971).
[27] H. O. Kreiss, Initial boundary value problems for hyperbolic systems, Comm. Pure Appl. Math., 23 (1970), 277-298.
[28] R. Sakamoto, Mixed problems for hyperbolic equations I, II, J. Math. Kyoto Univ., 10 (1970), 349-373, 403-417.
[29] R. Sakamoto, $\mathcal{E}$-well posedness for hyperbolic mixed problems with constant coefficients, J. Math. Kyoto Univ., 14 (1974), 93-118.
[30] M. Ikawa, On the mixed problems for the wave equation in an interior domain II, Osaka J. Math., 17 (1980), 253-279.
[31] D. Ludwig, Uniform asymptotic expansions at a caustic, Comm. Pure Appl. Math., 19 (1966), 215-250.
[32] R. K. Luneberg, Mathematical theory of optics, Brown Univ. Press, 1944.
[33] R. B. Melrose, Singularities and energy decay in acoustical scattering, Duke Math. J., 45 (1979), 43-59.

326 (XIII.27)
Partial Differential Equations of Mixed Type

A. General Remarks

Let $A[u(x)] = 0$ be a †quasilinear second-order partial differential equation. The type (†elliptic, †hyperbolic, or †parabolic) of the equation depends on the location of the point $x$. If the type varies as the point $x$ moves, the equation is said to be of mixed type. An example is the equation

$$\left(1 - \frac{u^2}{c^2}\right)\frac{\partial^2\varphi}{\partial x^2} - \frac{2uv}{c^2}\,\frac{\partial^2\varphi}{\partial x\,\partial y} + \left(1 - \frac{v^2}{c^2}\right)\frac{\partial^2\varphi}{\partial y^2} = 0 \qquad (1)$$

of 2-dimensional stationary flow without rotation of a compressible fluid without viscosity, where $\varphi$ is the velocity potential, $u = \partial\varphi/\partial x$ and $v = \partial\varphi/\partial y$ are the velocity components, and $c$ is the local speed of sound, which is a known function of the speed $q = (u^2 + v^2)^{1/2}$ of the flow. Equation (1) is of elliptic type if $q < c$, i.e., the flow is subsonic, and of hyperbolic type if $q > c$, i.e., the flow is supersonic. If there exist points where the flow is subsonic as well as points where it is supersonic, (1) is of mixed type. The study of equations of mixed type has become important with the development of high-speed jet planes.

B. Chaplygin's Differential Equation

It is difficult to solve equation (1) directly since it is nonlinear. However, we can linearize it by taking $q$ and $\theta = \arctan(v/u)$ as independent variables (the hodograph transformation). The linearized equation takes the form

$$\frac{\partial^2 z}{\partial x^2} - K(x)\,\frac{\partial^2 z}{\partial y^2} = 0, \qquad xK(x) > 0, \qquad (2)$$

which is called Chaplygin's differential equation. Equation (2) is hyperbolic for $x > 0$ and elliptic for $x < 0$. The study of general equations of mixed type, even when they are linear, is much more difficult and less developed than the study of equations of nonmixed type. Almost all research so far has been on equation (2) or slight modifications of it.

C. Tricomi's Differential Equation

The simplest equation of the form (2) is

$$\frac{\partial^2 z}{\partial x^2} - x\,\frac{\partial^2 z}{\partial y^2} = 0, \qquad (3)$$

which is called Tricomi's differential equation. F. G. Tricomi considered the following boundary value problem for (3): In Fig. 1, $AC$ and $BC$ are two †characteristic curves of (3) and $\sigma$ is a Jordan curve connecting $A$ and $B$. We seek a solution of (3) in the domain $D$ bounded by $AC$, $BC$, and $\sigma$ that takes given values on $\sigma$ and on one of the two characteristic curves, say on $AC$. This boundary value problem is called the Tricomi problem. Tricomi proved the existence and uniqueness of the solution of his problem under some conditions on the shape of $\sigma$ and the smoothness of the boundary values. After Tricomi, much research has been done on his and similar problems for equations of form (2) [2]. We can also consider problems such as finding a solution of (3) (or of (2)) satisfying the initial conditions

$$z(0, y) = z_0(y), \qquad \frac{\partial z}{\partial x}(0, y) = z_1(y)$$

on the common boundary $x = 0$ of the elliptic domain and the hyperbolic domain of the equation. This is called the singular initial value problem. S. Bergman [3] obtained an integral formula for the solution under the condition that $z_0(y)$ and $z_1(y)$ are real analytic.

[Fig. 1: the domain $D$ in the $(x, y)$-plane, bounded by the Jordan curve $\sigma$ joining $A$ and $B$ and by the characteristic curves $AC$ and $BC$.]

D. Friedrichs's Theory

For the study of equations of mixed type it would of course be most convenient if there existed a general theory of boundary value problems independent of the type of the equation. However, constructing such a general theory is considered very difficult, because the †well-posedness of boundary conditions as well as the analytic properties of solutions are quite different according to the type. The first contributor to the solution of this difficult problem was K. O. Friedrichs [4], who noticed that although the methods of solving the †Cauchy problem and the †Dirichlet problem are quite different, both methods utilize energy integrals in the proof of the uniqueness of solutions. Using this observation, he succeeded in constructing a unified theory that enables us to treat various types (including the mixed type) of linear equations in a single scheme: an admissible boundary value problem for a symmetric positive system of first-order linear partial differential equations. There is, however, a difficulty in Friedrichs's theory since it does not give a unified procedure for reducing a given boundary value problem for a given equation to an admissible boundary value problem for a symmetric positive system of partial differential equations. The study of equations of mixed type that are of more general form than (2) by means of Friedrichs's theory is an open problem.

E. Further Studies

Work on equations of more general type than (2) or (3) has appeared (not all depending on Friedrichs's theory). For example, the following equations are treated in [5, 6, 7], respectively:

[displayed equations not recoverable]

where $z = (z_1, \ldots, z_n)$ and $G(y)$ and $K(y)$ are symmetric matrices.

References

[1] F. Tricomi, Sulle equazioni lineari alle derivate parziali di 2° ordine di tipo misto, Atti Accad. Naz. Lincei, (5) 14 (1923), 133-247.
[2] L. Bers, Mathematical aspects of subsonic and transonic gas dynamics, Wiley, 1958.
[3] S. Bergman, An initial value problem for a class of equations of mixed type, Bull. Amer. Math. Soc., 55 (1949), 165-174.
[4] K. O. Friedrichs, Symmetric positive linear differential equations, Comm. Pure Appl. Math., 11 (1958), 333-418.
[5] A. V. Bitsadze, On the problem of equations of mixed type in multi-dimensional domains (in Russian), Dokl. Akad. Nauk SSSR, 110 (1956), 901-902.
[6] V. I. Zhegalov, Boundary value problem for higher-order equations of mixed type (in Russian), Dokl. Akad. Nauk SSSR, 136 (1961), 274-276.
[7] V. P. Didenko, Some systems of differential equations of mixed type (in Russian), Dokl. Akad. Nauk SSSR, 144 (1962), 709-712.
[8] A. V. Bitsadze, Equations of the mixed type, Pergamon, 1964. (Original in Russian, 1959.)
[9] M. M. Smirnov, Equations of mixed type, Amer. Math. Soc. Transl., 1978. (Original in Russian, 1970.)
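As a small numerical illustration of the change of type discussed in Sections A to C of this article, the following Python sketch classifies an equation $a z_{xx} + 2b z_{xy} + c z_{yy} + \cdots = 0$ at a point by the sign of $b^2 - ac$. It is only a sketch under stated assumptions: equation (1) is taken in the form $(1 - u^2/c^2)\varphi_{xx} - (2uv/c^2)\varphi_{xy} + (1 - v^2/c^2)\varphi_{yy} = 0$, Tricomi's equation (3) in the form $z_{xx} - x z_{yy} = 0$, and the function names are hypothetical, chosen for this illustration.

```python
def classify_second_order(a, b, c):
    """Type of a*z_xx + 2*b*z_xy + c*z_yy + (lower order) = 0 at one point,
    decided by the sign of the discriminant b^2 - a*c."""
    disc = b * b - a * c
    if disc > 0:
        return "hyperbolic"
    if disc < 0:
        return "elliptic"
    return "parabolic"


def tricomi_type(x):
    # Assumed form of (3): z_xx - x*z_yy = 0, i.e. a = 1, b = 0, c = -x.
    return classify_second_order(1.0, 0.0, -x)


def potential_flow_type(u, v, c):
    # Assumed form of (1); u, v are velocity components, c the sound speed.
    a = 1.0 - u * u / (c * c)
    b = -u * v / (c * c)
    cc = 1.0 - v * v / (c * c)
    return classify_second_order(a, b, cc)


if __name__ == "__main__":
    for x in (-1.0, 0.0, 1.0):
        print("Tricomi, x =", x, "->", tricomi_type(x))
    print("flow, q < c ->", potential_flow_type(0.3, 0.4, 1.0))
    print("flow, q > c ->", potential_flow_type(1.2, 0.9, 1.0))
```

For the assumed form of (1) the discriminant reduces to $q^2/c^2 - 1$, which is why the type depends only on whether the speed $q$ lies below or above the local speed of sound $c$.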
327 A 1226
PDEs of Parabolic Type

327 (XIII.26)
Partial Differential Equations of Parabolic Type

A. General Remarks

Consider a second-order linear partial differential equation

$$L[u] \equiv \sum_{i,j=1}^{n} a_{ij}(x, t)\,\frac{\partial^2 u}{\partial x_i\,\partial x_j} + \sum_{i=1}^{n} b_i(x, t)\,\frac{\partial u}{\partial x_i} + c(x, t)\,u - \frac{\partial u}{\partial t} = f(x, t) \qquad (1)$$

for an unknown function $u$ of $(n + 1)$ independent variables $(x, t) = (x_1, \ldots, x_n, t)$, where $a_{ij} = a_{ji}$. This equation is said to be parabolic (or of parabolic type) if and only if the quadratic form $\sum a_{ij}\xi_i\xi_j$ in $\xi$ is positive definite at each point $(x, t)$ of the region under consideration. $x$ and $t$ are sometimes called the variables of space and of time, respectively.

The most widely studied of the parabolic equations is the equation of heat conduction (or the heat equation):

$$L[u] \equiv \Delta u - \frac{\partial u}{\partial t} = 0, \qquad (2)$$

where $\Delta = \sum_{i=1}^{n} \partial^2/\partial x_i^2$ is the Laplacian taken over the space variables.

B. The Equation of Heat Conduction

The 1-dimensional case of the heat equation is

$$\frac{\partial^2 u}{\partial x^2} - \frac{\partial u}{\partial t} = 0 \qquad (3)$$

for the temperature $u(x, t)$ in a rod, considered as a function of the distance $x$ measured along the rod and the time $t$. Equation (3) was one of the first treated in the theory of partial differential equations. Consider a finite rod with constant temperature 0 at its ends $x = a$ and $x = b$. Thermodynamics suggests that the initial temperature $\varphi(x)$ ($\varphi(a) = \varphi(b) = 0$) prescribed at $t = 0$ is sufficient to determine the distribution of heat $u(x, t)$ in the rod at all later times $t > 0$. On such physical grounds, we can expect that a solution to the following problem exists: Find a continuous function $u(x, t)$ that satisfies equation (3) for $a < x < b$, $t > 0$, and the boundary conditions

$$u(a, t) = u(b, t) = 0, \qquad \lim_{t \downarrow 0} u(x, t) = \varphi(x), \quad a < x < b. \qquad (4)$$

According to J. Fourier the answer to this problem is expressed in a series $\sum_{n=1}^{\infty} c_n u_n$ constructed by superposition of the particular solutions $u_n = \sin\sqrt{\lambda_n}\,(x - a)\exp(-\lambda_n t)$. Here the $\lambda_n$ are the roots of $\sin\sqrt{\lambda_n}\,(b - a) = 0$, and the $c_n$ are chosen so that $\sum_{n=1}^{\infty} c_n u_n(x, 0) = \varphi(x)$. In fact, if $\varphi(x)$ is continuously differentiable, then $\sum_{n=1}^{\infty} c_n u_n(x, t)$ is the required solution. Thus we are led to the problem of expanding a given function $\varphi(x)$ in a †Fourier series.

The temperature distribution in an infinite rod is given by a continuous function $u(x, t)$ that satisfies equation (3) for $t > 0$ and that, for $t = 0$, takes values given by $\varphi(x)$, where

$$\lim_{t \downarrow 0} u(x, t) = \varphi(x). \qquad (5)$$

If $\varphi(x)$ is bounded, then it can also be represented by superposition of particular solutions $t^{-1/2}e^{-(\alpha - x)^2/4t}$ of (3) as

$$u(x, t) = \begin{cases} \dfrac{1}{2\sqrt{\pi t}} \displaystyle\int_{-\infty}^{\infty} \varphi(\alpha)\,e^{-(\alpha - x)^2/4t}\,d\alpha, & t > 0, \\[2ex] \varphi(x), & t = 0. \end{cases} \qquad (6)$$

Partial differential equations of parabolic type are important because of their connection with various phenomena in the physical world; they include not only equations that govern the flow of heat but also those that describe diffusion processes (→ 115 Diffusion Processes).

C. Partial Differential Equations of Parabolic Type in Two Variables

We are concerned mainly with the partial differential equation of parabolic type in two variables:

$$a(x, t)\frac{\partial^2 u}{\partial x^2} + 2b(x, t)\frac{\partial^2 u}{\partial x\,\partial t} + c(x, t)\frac{\partial^2 u}{\partial t^2} + d(x, t)\frac{\partial u}{\partial x} + e(x, t)\frac{\partial u}{\partial t} + f(x, t)\,u = g, \qquad (7)$$

with $ac = b^2$. In the region where $|a| + |c| > 0$, equation (7) can be reduced to the form

[displayed equation not recoverable]

by an appropriate change of variables $\xi = U(x, t)$, $\tau = V(x, t)$. If $e' < 0$ in this region, we can assume without loss of generality that our equation takes the canonical form

$$a(x, t)\frac{\partial^2 u}{\partial x^2} + b(x, t)\frac{\partial u}{\partial x} + c(x, t)\,u - \frac{\partial u}{\partial t} = 0 \qquad (8)$$

with $a > 0$, from the outset. It has the single family of †characteristics

$$t = \text{constant}. \qquad (9)$$

There are four typical problems to be posed with regard to equation (8).
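The Fourier construction for the finite rod in Section B can be sketched numerically. The following Python sketch is an illustration under stated assumptions rather than part of the article: it takes $\lambda_n = (n\pi/(b - a))^2$, computes $c_n = \frac{2}{b - a}\int_a^b \varphi(x)\sin\sqrt{\lambda_n}\,(x - a)\,dx$ by a midpoint rule, and sums finitely many terms. The names heat_series, n_terms, and n_quad are hypothetical.

```python
import math


def heat_series(phi, a, b, x, t, n_terms=20, n_quad=2000):
    """Partial sum of sum_n c_n * sin(sqrt(lam_n) * (x - a)) * exp(-lam_n * t)
    for u_xx - u_t = 0 on a < x < b with u(a, t) = u(b, t) = 0."""
    length = b - a
    h = length / n_quad
    total = 0.0
    for n in range(1, n_terms + 1):
        root = n * math.pi / length  # sqrt(lambda_n)
        lam = root * root
        # Midpoint rule for the Fourier sine coefficient c_n of phi.
        integral = 0.0
        for k in range(n_quad):
            xk = a + (k + 0.5) * h
            integral += phi(xk) * math.sin(root * (xk - a))
        c_n = 2.0 / length * integral * h
        total += c_n * math.sin(root * (x - a)) * math.exp(-lam * t)
    return total


def phi(x):
    # Initial temperature on [0, 1]; a single sine mode, so the exact
    # solution is sin(pi * x) * exp(-pi**2 * t).
    return math.sin(math.pi * x)


approx = heat_series(phi, 0.0, 1.0, 0.5, 0.1)
exact = math.exp(-math.pi ** 2 * 0.1)

if __name__ == "__main__":
    print(approx, exact)
```

The single-mode initial temperature is chosen so that the exact solution is known in closed form and the partial sum can be compared with it directly; for a general continuously differentiable $\varphi$ the number of terms would need to be chosen by the reader.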

The first consists of determining, in some This problem, posed for equation (lo), is also a
neighborhood of a given curve C nowhere mathematical formulation of the problem of
tangent to a characteristic, a solution u that heat conduction in a rod [4]. If p(x) is of class
possesses prescribed values of u and au/an, C’, then the solution to this problem can be
or of a linear combination of u and au/an, expanded as
along C. For instance, the problem of finding
a solution u(x, t) such that u(x,, t)= g(t) and u(x, t)= fJ c,e-“n’cp,(x),
u,(xO, t) = h(t) for given functions g(t) and n=1
h(t) is a problem of this type. Consider the
equation

ah au
---= 0 (10)
CT,=
s
a
b

dxh(x)dx,

where cp,(x) is a normalized function (~~q$(x)dx


(17)

ax2 at = 1) that satisfies the boundary conditions

in the region a < t <b, x,, <x. According to E. d(a) - bA4 = 0, d(b)+HvnW=O (18)
Holmgren, a solution u(x, t) satisfying the
and the equation
conditions

$; 4x9 t) = s(t), lg u,(x, t) = h(t), (11) (19)

where g’(t) is bounded and continuous, exists if The fourth type of problem is to find a
and only if function u(x, t) that satisfies (8) for t > 0 and
the initial condition limtlo u(x, t) = q(x). It
corresponds to the problem of heat conduc-
(12)
tion in an infinite rod.

is of class C” and satisfies

1k’“‘(t)1 < M(n!)‘/r” (13) D. Green’s Formula

for positive constants M and r.


The tadjoint of the differential operator L[u]
In the second type of problem we are re-
in (3) is given by
quired to find in a region of the form

cPl(t)~xGcpz(tX t,<t<t,, (14) (20)


a solution of (8) that takes prescribed values
Integration by parts yields the identity
on part of the boundary of that region. Here
we impose the hypothesis on the curves x =
cpl(t) and x= q2(t) that they are nowhere tan-
gent to a characteristic (9). M. Gevrey [3]
showed that if such a solution does exist, the
functions cpl(t) and q2(t) must satisfy the tH61-
der condition with exponent c(> l/2:
where G is the region bounded by the closed
I~i(t+h)-~i(t)l~clhl”, c = constant, (15) curve C, and the line integral on the right is
evaluated in the counterclockwise direction
for sufficiently small h. The problem of heat
over C. We call (21) Green’s formula for the
conduction mentioned in Section B corre-
partial differential equation of parabolic type
sponds to the particular case ‘pi(t) = constant,
(3). As in the case of partial differential equa-
cp,(t)=constant, for which condition (15) is
tions of elliptic type (- 323 Partial Differen-
automatically fulfilled.
tial Equations of Elliptic Type), this formula
The third type of problem is to find in a
is used to establish the uniqueness of the solu-
region of the form
tion of (3) and to derive an integral repre-
a<x<b, t>o, (14’) sentation for it.
For example, the uniqueness of the solution
a solution of (8) that satisfies the conditions is established in the following way: Let the
ljLyu(x, t) = cpb); curve fi and B^E in Fig. 1 be such that no
characteristic meets either of them in more
than one point. If u(x, t) is continuous in the
g-hu=O for x=a, h = constant > 0; closed region (ABED), vanishes on A^D, B^E,
and the segment AB, and satisfies equation (3)
au in (ABED) except on AB, then it vanishes
z+H~=O for x=b, H = constant > 0.
identically. Green’s formula (21), applied to the
(16) region G=(ABQP) and the functions $ = 1,

$\varphi = u^2$, yields

In a similar way, by extending Green's formula suitably, we are able to prove uniqueness theorems for the four problems stated in Section C for more general linear parabolic equations.

Fig. 1

To obtain a representation for solutions we proceed as follows: Let u(x, t) be a solution of (3) and let $U(x_0, t_0, x, t)$, given by (22), be a particular solution. Applying Green's formula to the region (PABQMP) and the functions $\varphi = u(x, t)$, $\psi = U(x_0, t_0 + h, x, t)$, where h is a positive number and M is a point with coordinates $(x_0, t_0)$, we obtain

Since the integral in the left-hand side of this equality approaches $u(x_0, t_0)$ as $h \downarrow 0$, we can establish the basic representation formula

(23)

for solutions u of (3). Formula (23) shows that $u(x_0, t_0)$ is determined in terms of the particular solution (22) if we know the values of u and $\partial u/\partial x$ on the part PABQ of the boundary of the region (ABED). The function (22) is called the fundamental solution of (3) because it plays the same role as the fundamental solution $\log r$ ($r = ((a - x)^2 + (b - y)^2)^{1/2}$) of Laplace's equation $\partial^2 u/\partial x^2 + \partial^2 u/\partial y^2 = 0$.

Similarly, the following function E (called the Gauss kernel) is a fundamental solution of equation (2):

$$E(\alpha, \beta; x, t) = (4\pi(\beta - t))^{-n/2} \exp\left\{-\frac{|\alpha - x|^2}{4(\beta - t)}\right\}, \tag{24}$$

where $(x, \alpha) \in \mathbf{R}^n \times \mathbf{R}^n$ and $t < \beta$. For equation (3), the following maximum principle holds: In the region (ABED) of Fig. 1, suppose that $L[u] \ge 0$ and that u takes its maximum value K at an interior point M. Then u is identically equal to K on the segment QP and in the region (ABQP). More generally, various versions of the maximum principle are known for equation (1) [5].

E. The Laplace Transform Method

Let u(x, t) be a solution of (3) for t > 0 and

$$u(x, \lambda) = \int_0^\infty e^{-\lambda t} u(x, t)\,dt, \quad \lambda > 0, \tag{25}$$

be its Laplace transform with respect to t. Utilizing integration by parts, we have

$$\int_0^\infty e^{-\lambda t} u_t(x, t)\,dt = \left[e^{-\lambda t} u(x, t)\right]_{t=0}^{t=\infty} + \lambda \int_0^\infty e^{-\lambda t} u(x, t)\,dt = -\varphi(x) + \lambda u(x, \lambda), \tag{26}$$

provided that $\lim_{t \to \infty} e^{-\lambda t} u(x, t) = 0$ and $\lim_{t \downarrow 0} u(x, t) = \varphi(x)$. We find in view of (3):

$$\frac{\partial^2 u(x, \lambda)}{\partial x^2} = \lambda u(x, \lambda) - \varphi(x). \tag{26'}$$

Once the solution of (26') has been found, the desired solution of (3) can be derived by inverting the Laplace transform (25). This idea can also be applied to the solution of parabolic equations with constant coefficients in (n + 1) variables, such as (1).

F. General Second-Order Equations of Parabolic Type

Consider the equation (1) with $f \equiv 0$, which can be written as

$$\frac{\partial u}{\partial t} = A(t)u, \tag{27}$$

where A(t) is a second-order elliptic operator with parameter t. Let D be a region (bounded or unbounded) of points x whose boundary is a smooth hypersurface S. We pose the following initial boundary value problem for (27): Find a function u(x, t) that satisfies in $D \times (0, \infty)$ equation (27) together with the conditions

$$\lim_{t \downarrow 0} u(x, t) = \varphi(x), \quad x \in D,$$
$$\partial u(x, t)/\partial n + h(x, t)u(x, t) = f(x, t), \quad x \in S, \tag{16'}$$

where $\partial/\partial n$ is the directional derivative in the

outward conormal direction at (x, t) ($x \in S$), and $h(x, t) \ge 0$.

The Laplace transform is not suitable for solving problems (27) and (16'). Instead, the theory of 1-parameter semigroups of linear operators (→ 378 Semigroups of Operators and Evolution Equations) can be applied to establish similar fundamental results. Let m be a large positive integer. For t > 0, put $t_k = k\delta$ for k = 0, 1, ..., m - 1 with $\delta = t/m$. By the Laplace transform method as described above, we can associate with $\psi$ a unique solution v of $A(t_k)v = \lambda v - \psi$ with $\lambda = 1/\delta$. We put $R_k\psi = \lambda v$. By iterating this procedure m times, we have a function $u_m(x, t) = R_{m-1}R_{m-2} \cdots R_0\varphi$ starting from the initial value $\varphi$ at t = 0. Then we obtain a solution u(x, t) of (27) and (16') as the limit of $u_m(x, t)$ as $m \to \infty$. The following results are known [6, 7]: (i) There exists a $U(\xi, \tau, x, t)$ ($x, \xi \in D$, $t > \tau \ge 0$) that, as a function of x and t, satisfies equation (27) and the homogeneous boundary conditions (16') with $\varphi = 0$, f = 0. (ii) The function u(x, t) defined by

is a solution of (27) and satisfies (16'). Thus $U(\xi, \tau, x, t)$ is a generalization of the function (24), called the fundamental solution of the linear parabolic equation (27) with boundary conditions (16'). Besides the properties (i) and (ii), the fundamental solution satisfies $U(\xi, \tau, x, t) \ge 0$, $\int_D U(\xi, \tau, z, w)U(z, w, x, t)\,dz = U(\xi, \tau, x, t)$ ($\tau < w < t$), and further $\int_D U(\xi, \tau, x, t)\,dx = 1$ under some additional assumptions. Therefore this theory is of considerable significance from the point of view of the theory of probability (→ 115 Diffusion Processes).

It can be shown that a weak solution of the parabolic equation (27) is a genuine solution. That is, if u(x, t) is locally summable and

$$\int_0^\infty\!\!\int_D u(x, t)\left(\frac{\partial \varphi(x, t)}{\partial t} + A(t)^*\varphi(x, t)\right)dx\,dt = 0$$

for any function $\varphi(x, t)$ of class $C^2$ in $D \times (0, \infty)$ and vanishing outside a compact subset of D ($A(t)^*$ is the adjoint of the partial differential operator A(t) and $dx = dx_1 \cdots dx_n$), then u(x, t) satisfies (27) in $D \times (0, \infty)$ in the usual sense. In particular, when the coefficients of A(t) are infinitely differentiable, any solution u(x, t) of (27) in the distribution sense is a genuine solution (→ 125 Distributions and Hyperfunctions).

If the function h in the boundary conditions (16') and the coefficients of A are independent of t, then the fundamental solution $U(\xi, \tau, x, t)$ depends only on $\xi$, x, and $t - \tau$ and is written as $U(\xi, x; t - \tau)$ ($t > \tau$). Furthermore, if A is self-adjoint, then there exist a sequence of eigenfunctions $\{\psi_p(x; \lambda) \mid p = 1, 2, \ldots\}$ ($A\psi_p + \lambda\psi_p = 0$) and a sequence $\{\rho_p(\lambda)\}$ of measures on the real line for which the following hold: (1) The fundamental solution $U(\xi, x; t)$ is expressed in the form

$$U(\xi, x; t) = \sum_{p=1}^\infty \int_{-\infty}^\infty e^{-\lambda t}\psi_p(\xi; \lambda)\psi_p(x; \lambda)\,d\rho_p(\lambda).$$

(2) The solution u(x, t) of (27) satisfying (16') with f(x, t) = 0 is expanded as

$$u(x, t) = \sum_{p=1}^\infty \int_{-\infty}^\infty e^{-\lambda t}\psi_p(x; \lambda)\varphi_p(\lambda)\,d\rho_p(\lambda),$$

where

$$\varphi_p(\lambda) = \int_D \varphi(x)\psi_p(x; \lambda)\,dx$$

($\varphi(x)$ is the function given in (16')).

G. Nash's Results

Let us consider a parabolic equation

$$\frac{\partial u}{\partial t} = \sum_{i,j=1}^n \frac{\partial}{\partial x_i}\left(a_{ij}(x, t)\frac{\partial u}{\partial x_j}\right), \tag{28}$$

where $a_{ij} = a_{ji}$ are real-valued functions of class $C^\infty$ and equal to constants outside a fixed compact set of $\mathbf{R}^n$ for all $t \ge t_0$ (this regularity assumption can be relaxed). Suppose that there exists a constant $\lambda \ge 1$ such that $\lambda^{-1}|\xi|^2 \le \sum a_{ij}(x, t)\xi_i\xi_j \le \lambda|\xi|^2$ for all $(t, x, \xi) \in (t_0, \infty) \times \mathbf{R}^n \times \mathbf{R}^n$. Then, for any bounded solution u(x, t) of (28) in $(t_0, \infty) \times \mathbf{R}^n$ and for any (x, y, t, s) such that $x \in \mathbf{R}^n$, $y \in \mathbf{R}^n$, and $t_0 < t < s$, the inequality

$$|u(x, t) - u(y, s)| \le AB\left[\left(\frac{|x - y|}{\sqrt{t - t_0}}\right)^{\alpha} + \left(\frac{s - t}{t - t_0}\right)^{\beta}\right] \tag{29}$$

holds, where $B = \sup\{|u(x, t)| \mid t > t_0,\ x \in \mathbf{R}^n\}$ and $\beta = \alpha/(2\alpha + 2)$. In this inequality, the constants $\alpha$ and A are positive, depending on $(n, \lambda)$ but independent of the particular choice of $(a_{ij})$, $t_0$, and of u [9].

As a corollary to this theorem, J. Nash proved that, if the $a_{ij}$ do not contain t and if v(x) is a bounded solution in $\mathbf{R}^n$ of the elliptic equation obtained by replacing $\partial u/\partial t$ by 0 in (28), the inequality

$$|v(x) - v(y)| \le A'B'|x - y|^{\alpha'} \tag{30}$$

holds for any $(x, y) \in \mathbf{R}^n \times \mathbf{R}^n$, where $\alpha' = \alpha/(\alpha + 1)$ with $\alpha$ in (29) and $B' = \sup|v(x)|$. The constant

A' depends only on $(n, \lambda)$ (→ 323 Partial Differential Equations of Elliptic Type L).

H. Partial Differential Equations of p-Parabolic Type

Let p and m be given positive integers. Let us consider an equation for an unknown function u of (n + 1) independent variables (x, t) of the type

$$\frac{\partial^m u}{\partial t^m} + \sum_{\alpha, j} a_{\alpha,j}(x, t)\frac{\partial^{|\alpha| + j}u}{\partial x^\alpha\,\partial t^j} = f, \tag{31}$$

where $\alpha = (\alpha_1, \ldots, \alpha_n)$ and $\partial^\alpha/\partial x^\alpha = (\partial/\partial x_1)^{\alpha_1} \cdots (\partial/\partial x_n)^{\alpha_n}$. We write also $|\alpha| = \alpha_1 + \cdots + \alpha_n$. In (31), $\sum_{\alpha,j}$ is the summation taken over the $(\alpha, j)$ such that $pj + |\alpha| \le pm$ and $0 \le j < m$. Let us denote by $\{\lambda_k(x, t, \xi)\}_{k=1}^m$ the roots $\lambda$ of the equation

$$\lambda^m + {\sum_{\alpha,j}}' a_{\alpha,j}(x, t)(i\xi)^\alpha\lambda^j = 0, \tag{32}$$

where $\sum'_{\alpha,j}$ is the summation over the $(\alpha, j)$'s such that $pj + |\alpha| = pm$ and $0 \le j < m$. We say that the equation (31) is p-parabolic (or of p-parabolic type) in the sense of I. G. Petrovskii if and only if there exists a positive number $\delta$ such that

$$\operatorname{Re}\lambda_k(x, t, \xi) \le -\delta|\xi|^p, \quad 1 \le k \le m, \tag{33}$$

for any (x, t) in the region under consideration and for any $\xi \in \mathbf{R}^n$. The integer p is then seen to be even. Equation (27) is p-parabolic if $-A(t)$ is strongly elliptic of order p. The heat equation (2) is 2-parabolic in this sense. Similarly, we can define the p-parabolic systems of equations [10].

p-parabolic equations are known to be hypoelliptic if the coefficients are of class $C^\infty$ [11]. S. D. Eidel'man obtained precise estimates of the fundamental solutions and of their derivatives for p-parabolic equations [10]. The mixed initial boundary value problems are investigated in detail also by Eidel'man [10] and by R. Arima (J. Math. Kyoto Univ., 4 (1964)).

References

[1] L. Bers, S. Bochner, and F. John, Contributions to the theory of partial differential equations, Ann. Math. Studies, Princeton Univ. Press, 1954.
[2] A. Friedman, Partial differential equations of parabolic type, Prentice-Hall, 1964.
[3] M. Gevrey, Les équations aux dérivées partielles du type parabolique, J. Math. Pures Appl., 9 (1913), 305-471; 10 (1914), 105-147.
[4] R. Courant and D. Hilbert, Methods of mathematical physics, Interscience, I, 1953; II, 1962.
[5] M. H. Protter and H. F. Weinberger, Maximum principles in differential equations, Prentice-Hall, 1967.
[6] S. Ito, Fundamental solutions of parabolic differential equations and boundary value problems, Japan. J. Math., 27 (1957), 55-102.
[7] A. M. Il'in, A. S. Kalashnikov, and O. A. Oleinik, Linear equations of the second order of parabolic type, Russian Math. Surveys, 17-3 (1962), 1-143. (Original in Russian, 1962.)
[8] J. L. Lions, Equations différentielles opérationnelles et problèmes aux limites, Springer, 1961.
[9] J. Nash, Continuity of solutions of parabolic and elliptic equations, Amer. J. Math., 80 (1958), 931-954.
[10] S. D. Eidel'man, Parabolic systems, North-Holland, 1969. (Original in Russian, 1963.)
[11] S. Mizohata, Hypoellipticité des équations paraboliques, Bull. Soc. Math. France, 85 (1957), 15-50.

328 (V.6)
Partitions of Numbers

A partition of a positive integer n is an expression of n as the sum of positive integers. The number of partitions of n, where the order of the summands is ignored and repetition is permitted, is denoted by p(n) and is called the number of partitions of n. For example p(5) = 7 since 5 = 4 + 1 = 3 + 2 = 3 + 1 + 1 = 2 + 2 + 1 = 2 + 1 + 1 + 1 = 1 + 1 + 1 + 1 + 1. Therefore, p(n) equals the number of conjugate classes of the symmetric group of degree n and is closely related to the representation theory of this group.

The generating function of p(n) is

$$f(x) = 1 + \sum_{n=1}^\infty p(n)x^n = \prod_{n=1}^\infty (1 - x^n)^{-1}.$$

The unit circle |x| = 1 is the natural boundary of f(x), which is holomorphic in |x| < 1. The Dedekind eta function, which is closely related to f(x), is defined by the following formula for the complex variable $\tau$ taking values in the upper half-plane:

$$\eta(\tau) = \exp(\pi i\tau/12)\prod_{n=1}^\infty (1 - \exp(2\pi in\tau)).$$

Hence $\eta(\tau + 1) = \exp(\pi i/12)\eta(\tau)$. L. Euler (1748) obtained the following formula (called the pentagonal number theorem because n(3n - 1)/2

is a pentagonal number):

$$\prod_{n=1}^\infty (1 - x^n) = 1 + \sum_{n=1}^\infty (-1)^n\left(x^{n(3n-1)/2} + x^{n(3n+1)/2}\right).$$

This follows easily from the Jacobi-Biehler equality

$$\prod_{n=1}^\infty (1 - q^{2n})(1 + q^{2n-1}z)(1 + q^{2n-1}z^{-1}) = 1 + \sum_{n=1}^\infty q^{n^2}(z^n + z^{-n}) \quad (|q| < 1,\ z \ne 0).$$

By using the transformation formula for $\vartheta$-functions, we can infer from the pentagonal number theorem that $\eta(-1/\tau) = \sqrt{\tau/i}\,\eta(\tau)$. Hence, if a, b, c, d are integers satisfying ad - bc = 1 and c > 0, then

$$\eta\left(\frac{a\tau + b}{c\tau + d}\right) = \varepsilon\sqrt{\frac{c\tau + d}{i}}\,\eta(\tau),$$

where $\varepsilon$ is a 24th root of unity. It is known that $\eta(\tau)$ is a cusp form of weight -1/2 with respect to the full modular group $\Gamma(1)$ [9, 11]. C. L. Siegel (1955) gave a simple proof of the formula $\eta(-1/\tau) = \sqrt{\tau/i}\,\eta(\tau)$. Later S. Iseki (1957) gave another proof by using a new method, known as the $\alpha$-$\beta$ formula [12].

The size of p(n) increases rapidly with n; for instance p(10) = 42 and p(100) = 190,569,292. By making use of a remarkable identity, G. H. Hardy and S. Ramanujan (1918) proved the following inequalities, where A and B are suitable constants:

$$(A/n)e^{2\sqrt{n}} < p(n) < (B/n)e^{2\sqrt{2}\sqrt{n}}.$$

Subsequently they obtained

$$p(n) \sim \frac{1}{4n\sqrt{3}}\exp\left(\pi\sqrt{2n/3}\right).$$

After P. Erdős (1942) and D. J. Newman (1951), A. G. Postnikov [8] succeeded in proving

by means of an elementary function-theoretic method. Multiplying both sides of Euler's formula by the generating function of p(n) and comparing the coefficients, we obtain, for $n \ge 1$ and with p(0) = 1,

$$\sum_{0 \le \omega_k \le n} (-1)^k p(n - \omega_k) = 0,$$

where $\omega_k = k(3k - 1)/2$ ($k = 0, \pm 1, \pm 2, \ldots$) is a pentagonal number. From this formula we can calculate p(n) successively; in fact P. A. MacMahon obtained in this way the values of p(n) for n up to 200.

Hardy and Ramanujan proved the following transformation formula for the generating function f(x) of p(n):

$(h, k) = 1$, $hh' \equiv -1 \pmod{k}$,

where $\omega_{h,k}$ is defined by

$$\omega_{h,k} = \exp(\pi i\,s(h, k))$$

and the value of s(h, k) was given by Rademacher in the form

$$s(h, k) = \sum_{\mu=1}^{k}\left(\left(\frac{\mu}{k}\right)\right)\left(\left(\frac{h\mu}{k}\right)\right).$$

Here the symbol ((t)) in the sum denotes the function that is 0 for integral t and t - [t] - 1/2 otherwise ([ ] is the Gauss symbol). With regard to the Dedekind sum s(h, k), we have the reciprocity law for Dedekind sums:

$$s(h, k) + s(k, h) = -\frac{1}{4} + \frac{1}{12}\left(\frac{h}{k} + \frac{k}{h} + \frac{1}{hk}\right).$$

If we make substitutions a = h', b = -(hh' + 1)/k, c = k, and d = -h (so that ad - bc = 1) in the Hardy-Ramanujan transformation formula, then the $\varepsilon$ appearing in the transformation formula of $\eta(\tau)$ is seen to be equal to $\exp(-\pi i\,s(c, d) + \pi i(a + d)/12c)$. A direct proof of the transformation formula and the reciprocity law was given by K. Iseki (1952).

According to Cauchy's integral formula, p(n) can be represented as an integral:

$$p(n) = \frac{1}{2\pi i}\int_\Gamma \frac{f(x)}{x^{n+1}}\,dx,$$

where the contour $\Gamma$ is taken inside the unit circle around the origin. The generating function f(x) varies greatly: namely, letting $r \to 1 - 0$ in $x = r\exp(2\pi ip/q)$, where p and q are fixed integers, it follows from the transformation formula that $f(x) \sim \exp(\pi^2/6q^2(1 - r))$. Nevertheless, we can deal with the integral by the circle method, introduced by Hardy and Ramanujan, which threw light on recent additive number theory. Hardy and Ramanujan thus obtained an expression for p(n) with an error term $O(\exp(D\sqrt{n}))$.

The theory was improved by Rademacher (1937, 1943), who expanded p(n) into the

series

$$p(n) = \frac{1}{\pi\sqrt{2}}\sum_{k=1}^\infty A_k(n)\sqrt{k}\,\frac{d}{dn}\left(\frac{\sinh\left(\frac{\pi}{k}\sqrt{\frac{2}{3}\left(n - \frac{1}{24}\right)}\right)}{\sqrt{n - \frac{1}{24}}}\right),$$

where

$$A_k(n) = \sum_{h \bmod k,\ (h,k)=1} \omega_{h,k}\exp(-2\pi inh/k).$$

Rademacher (1954) proved further that

where

$$L_\nu(z) = \sum_{n=0}^\infty \frac{z^n}{n!\,\Gamma(\nu + n + 1)}.$$

Rademacher (1943) had developed an ingenious proof by taking "Ford's circle" as the contour $\Gamma$.

Ramanujan observed that $p(5m + 4) \equiv 0 \pmod 5$, $p(7m + 5) \equiv 0 \pmod 7$, and $p(11m + 6) \equiv 0 \pmod{11}$. Rademacher (1942) and Newman studied these cases by using $\eta(\tau)$. More generally, A. O. L. Atkin proved that if $24n \equiv 1 \pmod{5^a 7^b 11^c}$ then $p(n) \equiv 0 \pmod{5^a 7^{1 + [b/2]} 11^c}$ (Glasgow Math. J., 8 (1967)). At present, this is the best result.

Let $n = n_1 + n_2 + \cdots + n_s$ be a partition of n. Many problems arise when we put additional conditions on the $n_j$. For instance, we may require that the $n_j$ satisfy certain congruence relations (L. K. Hua (1942), S. Iseki (1959)) or are powers of integers (E. M. Wright (1934), L. Schoenfeld (1944), S. Iseki (1959)) or are powers of primes (T. Mitsui (1957)). The partition problem can also be extended to the case of an algebraic number field of finite degree (Rademacher, G. Meinardus (1953), Mitsui (1978)).

References

[1] R. G. Ayoub, An introduction to the analytic theory of numbers, Amer. Math. Soc. Math. Surveys, 1963.
[2] G. H. Hardy and S. Ramanujan, Asymptotic formulae in combinatory analysis, Proc. London Math. Soc., (2) 17 (1918), 75-115.
[3] G. H. Hardy and E. M. Wright, An introduction to the theory of numbers, Clarendon Press, fourth edition, 1965.
[4] K. Iseki, A proof of a transformation formula in the theory of partitions, J. Math. Soc. Japan, 4 (1952), 14-26.
[5] H. A. Rademacher, Lectures on analytic number theory, Tata Inst. Fund. Res., 1954-1955.
[6] S. Ramanujan, Collected papers of Srinivasa Ramanujan, Cambridge Univ. Press, 1927 (Chelsea, 1962).
[7] H. Petersson, Über Modulfunktionen und Partitionenprobleme, Abh. Deutsch. Akad. Wiss., Berlin Kl. Math. Allg. Nat., 1954, no. 2.
[8] A. G. Postnikov, Introduction to the analytical theory of numbers (in Russian), Moscow, 1971.
[9] M. I. Knopp, Modular functions in analytic number theory, Markham, 1970.
[10] T. Mitsui, On the partition problem in an algebraic number field, Tokyo J. Math., 1 (1978).
[11] H. A. Rademacher, Topics in analytic number theory, Springer, 1973.
[12] S. Iseki, The transformation formula for the Dedekind modular function and related functional equations, Duke Math. J., 24 (1957), 653-662.

329 (XXI.38)
Pascal, Blaise

Blaise Pascal (June 19, 1623-August 19, 1662) was born in Clermont-Ferrand in southern France. He lost his mother when still an infant and was brought up by his father, Etienne Pascal (discoverer of the curve called Pascal's limaçon). As a youth, he demonstrated a remarkable ability for mathematics. In 1640, under the influence of Desargues, he discovered Pascal's theorem on conic sections, and in 1642 invented an adding machine. After hearing of Torricelli's experiments in 1646, he became interested in the theory of fluids and on his own began to conduct experiments; this research put to rest the prevailing opinion that nature abhors a vacuum and that, therefore, a vacuum cannot exist. Pascal formulated the principle stating that pressure, when applied at any point within a contained liquid, is transmitted throughout the fluid. By means of this principle, he explained various phenomena concerning fluids such as the atmosphere and laid the foundations for hydrostatics.

Between 1652 and 1654, Pascal was preoccupied with social affairs, but subsequently he began to devote himself to religion. He entered the Abbey Port-Royal of the Jansenist sect, where he remained until his death. Immediately before his entry, however, he and Fermat exchanged correspondence about games of chance, and these letters proved to be the beginning of the theory of probability. Concerning games of chance, Pascal had conducted research on Pascal's triangle, and in this study he formulated and used mathematical induction. He also indicated a way to


obtain the sum of the mth powers of the consecutive terms of an arithmetic progression, and with an intuitive idea of limits obtained the formula $\int_0^a x^m\,dx = a^{m+1}/(m + 1)$. While in Port-Royal, he published Lettres provinciales (1657), in which he carried on a dispute with the Jesuits. His book Pensées shows his deep involvement with religion; however, he did not abandon mathematics. In 1658, he determined the area enclosed by a cycloid and its base, the barycenter and area of the figure enclosed by a cycloid and straight lines parallel to its base, and the volume of the figure obtained by rotating it around these lines. The study of the methods used by Pascal to obtain these results, which were forerunners of differential and integral calculus, led Leibniz to discover the fundamental theorem of calculus. Pascal also formulated clear ideas about axioms.

References

[1] L. Brunschvicg (ed.), B. Pascal, Oeuvres I-XIV, Hachette, 1904-1914.
[2] J. Chevalier (ed.), B. Pascal, Oeuvres complètes, Bibliothèque de la Pléiade, Gallimard, 1954.
[3] Kôkiti Hara, L'Oeuvre mathématique de Pascal, Mém. de la Faculté des Lettres de l'Univ. d'Osaka, no. 21, 1981.

330 (II.8)
Permutations and Combinations

Let there be given a set $\Omega$ of n elements. If we choose k distinct elements of $\Omega$ and arrange them in a row, we have a k-array or k-permutation of elements of $\Omega$. The number of such arrays is $(n)_k = n(n - 1) \cdots (n - k + 1)$. The polynomial $(x)_k = x(x - 1) \cdots (x - k + 1)$ in x of degree k is called the Jordan factorial of degree k. In particular, $(n)_n = n!$, n factorial, is the number of permutations of $\Omega$. A subset of $\Omega$ is called a k-subset if it contains exactly k elements. The number of k-subsets (or k-combinations) of $\Omega$ is $\binom{n}{k} = (n)_k/k!$. The binomial coefficients $\binom{x}{n}$ are defined by the generating function $(1 + z)^x = \sum_{n=0}^\infty \binom{x}{n}z^n$. For any complex number x, the series is convergent for |z| < 1, and it is verified that $\binom{x}{n} = (x)_n/n!$ in terms of the Jordan factorial $(x)_n$. The same results hold in a complete field with valuation, in particular in a p-adic number field. In any case, we have the recursive relation

$$\binom{x + 1}{n + 1} = \binom{x}{n + 1} + \binom{x}{n},$$

and in general

$$\binom{x + y}{n} = \sum_{k=0}^{n}\binom{x}{k}\binom{y}{n - k},$$

which leads to many identities involving binomial coefficients. The recursive relation allows us to compute the values of $\binom{n}{k}$ easily for small integers n, k, as was noticed by Pascal. The arrangement of these values in a triangular form:

            1
          1   1
        1   2   1
      1   3   3   1
    1   4   6   4   1

is called Pascal's triangle. For nonnegative integral values of x, $(1 + z)^x$ are polynomials, and we have $(a + b)^n = \sum_{k=0}^n \binom{n}{k}a^{n-k}b^k$ (binomial theorem). As a generalization, we have

$$(a_1 + \cdots + a_m)^n = \sum \frac{n!}{p_1! \cdots p_m!}\,a_1^{p_1} \cdots a_m^{p_m}$$

(multinomial theorem), where the sum is extended over all nonnegative $p_i$ with $\sum p_i = n$. The number of ways of choosing k elements, allowing repetition, from a set of n elements is $(-1)^k\binom{-n}{k} = \binom{n + k - 1}{k}$. This is also the number of nonnegative integral solutions of $\sum_{i=1}^n x_i = k$. As an example of binomial coefficients with noninteger arguments, we have

$$\binom{-1/2}{n} = (-1)^n 2^{-2n}\binom{2n}{n}.$$

References

[1] E. Netto, Lehrbuch der Combinatorik, Teubner, second edition, 1927 (Chelsea, 1958).
[2] P. A. MacMahon, Combinatory analysis I, II, Cambridge Univ. Press, 1915-1916 (Chelsea, 1960).
[3] J. Riordan, Combinatorial identities, Wiley, 1968.
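The Jordan factorial, the generalized binomial coefficient, and the recursive relation noticed by Pascal can be checked numerically. The following is a minimal Python sketch; the function names `jordan_factorial`, `binomial`, and `pascal_triangle` are illustrative choices rather than notation from the article, and exact rational arithmetic is assumed so that noninteger arguments such as -1/2 can be compared exactly.

```python
from fractions import Fraction
from math import comb, factorial


def jordan_factorial(x, k):
    """Jordan factorial (x)_k = x(x-1)...(x-k+1); the empty product (k = 0) is 1."""
    result = Fraction(1)
    for i in range(k):
        result *= Fraction(x) - i
    return result


def binomial(x, n):
    """Generalized binomial coefficient (x)_n / n! for integer or rational x."""
    return jordan_factorial(x, n) / factorial(n)


def pascal_triangle(rows):
    """First `rows` rows of Pascal's triangle, built only from the
    recursive relation C(n, k) = C(n-1, k-1) + C(n-1, k)."""
    triangle = [[1]]
    for _ in range(rows - 1):
        prev = triangle[-1]
        inner = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        triangle.append([1] + inner + [1])
    return triangle


def multiset_count(n, k):
    """Number of ways to choose k elements with repetition from n elements,
    computed as (-1)^k * C(-n, k)."""
    return (-1) ** k * binomial(-n, k)
```

For instance, `pascal_triangle(5)` reproduces the five rows displayed in the article, `multiset_count(4, 3)` agrees with `comb(6, 3)`, and `binomial(Fraction(-1, 2), n)` can be compared with `(-1)**n * Fraction(comb(2*n, n), 4**n)` for small n.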

331 (XII.13)
Perturbation of Linear Operators

A. General Remarks

Historically, the perturbation method was developed as an approximation device in classical and quantum mechanics. In the perturbation theory of eigenvalues and eigenfunctions, created by L. Rayleigh and E. Schrödinger, the main concern was to find solutions as power series in a parameter $\kappa$ that could be regarded as small. In the perturbation theory for linear operators, however, we are concerned more generally with the behavior of spectral properties of linear operators when the operators undergo small change. The foundation of the mathematical theory, including a complete convergence proof of perturbation series, was laid down by F. Rellich [1] and T. Kato [2, 3]. Another major topic in the perturbation theory for linear operators is the perturbation of continuous spectra, which was initiated by K. O. Friedrichs (Math. Ann., 115 (1938); [4]). It is closely related to scattering theory and is discussed more fully in 375 Scattering Theory. A standard reference in this field is [5] (also → [6, 7]). Most of the material presented in this article is taken from [5].

For problems in Hilbert spaces there are two general frameworks in which to formulate perturbation situations: the operator formulation and the form formulation. In the former we deal with a family of operators $T(\kappa)$ directly, while in the latter, we deal with associated semibounded Hermitian (or, more generally, sectorial) forms $t(\kappa)$. The latter is applicable only when there is semiboundedness (or a sectorial property) inherent in the problem, but is usually more general than the former in such problems, since the latter (resp. the former) requires roughly the constancy of the domain of the "square root" of $T(\kappa)$ (resp. the domain of $T(\kappa)$). In this article we discuss problems in the operator formulation. For the form formulation → [5] and [7].

In this article X, Y, ..., are complex Banach spaces and T, A, ..., are linear operators unless other specifications are made. The following notations defined in 251 Linear Operators are used without further explanation: D(T), R(T), B(X, Y), B(X), $\Gamma(T)$ (the graph of T), $\sigma(T)$ (the spectrum of T), $\rho(T)$ (the resolvent set of T), and $R(\zeta; T)$ (the resolvent of T). We also use C(X, Y) (resp. A(X, Y)) to denote the set of all closed linear operators (resp. all linear operators) from X to Y and C(X) = C(X, X).

B. Stability of Basic Properties

(1) Let $T \in C(X, Y)$. Important notions for characterizing the smallness of $A \in A(X, Z)$ relative to T are the following. (i) A is said to be relatively bounded with respect to T (or simply T-bounded) if $D(A) \supset D(T)$ and there exist a, b > 0 such that

(*) $\|Au\| \le a\|u\| + b\|Tu\|$ for all $u \in D(T)$.

The infimum, denoted by $\|A\|_T$, of b for which (*) holds with some a is called the T-bound of A. (ii) A is said to be relatively compact with respect to T (or T-compact) if $D(A) \supset D(T)$ and A is compact from D(T) with the graph norm of T to Z (→ 68 Compact and Nuclear Operators F). T-compactness of A implies T-boundedness (and in Hilbert spaces $\|A\|_T = 0$).

(2) Let $T \in C(X, Y)$, and let $A \in A(X, Y)$ be T-bounded. (i) If $\|A\|_T < 1$ (or if A is T-compact), then $T + A \in C(X, Y)$. (ii) If, in addition, X = Y is a Hilbert space, T is self-adjoint, and A is symmetric, then T + A is self-adjoint (Rellich-Kato theorem) [1, 8]. (iii) Suppose that T is a Fredholm operator. If either A is T-compact or the inequality (*) holds with constants a, b satisfying bp + a < p for a certain positive number p determined by T, then T + A is a Fredholm operator and ind(T + A) = ind T (for ind T, nul T, and def T → 251 Linear Operators). In the latter case where bp + a < p, we also have nul(T + A) $\le$ nul T and def(T + A) $\le$ def T.

C. Continuity and Analyticity of Families of Closed Operators

In order to handle unbounded operators, which are important in applications, it is necessary to introduce generalized notions of convergence and analyticity of families of closed operators.

(1) C(X, Y) becomes a metric space by a distance function d(S, T) having the property that $\hat\delta(\Gamma(S), \Gamma(T)) \le d(S, T) \le 2\hat\delta(\Gamma(S), \Gamma(T))$, where for closed subspaces M and N we put

$$\hat\delta(M, N) = \max[\delta(M, N), \delta(N, M)],$$
$$\delta(M, N) = \sup_{u \in M,\ \|u\| = 1} \operatorname{dist}(u, N); \quad \delta(0, N) = 0.$$

$\hat\delta(M, N)$ is called the gap between M and N [5]. When $T_n \to T$ in this metric, $T_n$ is said to converge to T in the generalized sense. This generalized convergence coincides with the norm convergence if $T_n, T \in B(X, Y)$. If X = Y and $\rho(T) \ne \emptyset$, then $T_n \to T$ in the generalized sense if and only if for some (or equivalently all) $\zeta \in \rho(T)$ we have $\zeta \in \rho(T_n)$ for sufficiently large n and $\|R(\zeta; T_n) - R(\zeta; T)\| \to 0$, $n \to \infty$. This is called norm resolvent convergence.

(2) When X = Y, there is also the notion of strong convergence in the generalized sense [5], which is roughly the strong convergence of resolvents. In particular, when $T_n$ and T are self-adjoint operators in a Hilbert space, $T_n \to T$ strongly in the generalized sense if $R(\zeta; T_n) \to R(\zeta; T)$ strongly for some (or equivalently all) $\zeta$ with $\operatorname{Im}\zeta \ne 0$. This is called strong resolvent convergence.

(3) Let $D \subset \mathbf{C}$ be a domain. The notion of analyticity (holomorphic property) of a family $T(\kappa) \in B(X, Y)$, $\kappa \in D$, of bounded operators is well known (→ 37 Banach Spaces K). This notion is generalized to a family $T(\kappa) \in C(X, Y)$, $\kappa \in D$, of closed operators [1, 5]. Namely, $T(\kappa)$ is said to be holomorphic in D if at each $\kappa_0 \in D$ there exist a Banach space Z and $U(\kappa) \in B(Z, X)$, $V(\kappa) \in B(Z, Y)$, defined near $\kappa_0$, such that (i) $U(\kappa)$ and $V(\kappa)$ are holomorphic at $\kappa_0$ as families of bounded operators; (ii) $U(\kappa)$ is one to one and onto $D(T(\kappa))$; (iii) $T(\kappa)U(\kappa) = V(\kappa)$. Let us mention several special cases. (I) If X = Y and if $\zeta \in \rho(T(\kappa))$ for all $\kappa \in D$, then $T(\kappa)$ is holomorphic in D if and only if $R(\zeta; T(\kappa))$ is holomorphic in D. (II) If $D(T(\kappa))$ is independent of $\kappa$ and if $T(\kappa)u$ is holomorphic in D for every $u \in D(T(\kappa))$, then $T(\kappa)$ is holomorphic in D (holomorphic family of type (A) [5]). (III) Let $T \in C(X, Y)$, and let $T^{(n)} \in A(X, Y)$ be such that $D(T^{(n)}) \supset D(T)$ and $\|T^{(n)}u\| \le c^{n-1}(a\|u\| + b\|Tu\|)$, $u \in D(T)$, where a, b, c > 0. Then $T(\kappa)u = Tu + \kappa T^{(1)}u + \cdots + \kappa^n T^{(n)}u + \cdots$, $u \in D(T)$, defines a holomorphic family of type (A) in $D = \{\kappa \mid |\kappa| < (b + c)^{-1}\}$. (IV) If X = Y is a Hilbert space and if $T(\kappa)$ is self-adjoint for real $\kappa$, $T(\kappa)$ is said to be a self-adjoint family. In particular, the family discussed in (III) is a self-adjoint holomorphic family if T is self-adjoint and $T^{(n)}$ is symmetric.

D. Perturbation of Isolated Eigenvalues

(1) Separation of the spectrum. Let $T \in C(X)$. Suppose that a bounded subset $\Lambda$ of $\sigma(T)$ is separated from the rest of $\sigma(T)$ by a simple closed contour $\Gamma$ (i.e., $\Gamma \subset \rho(T)$ and $\Lambda$ (resp. $\sigma(T) \setminus \Lambda$) lies inside (resp. outside) of $\Gamma$). Then the operator

$$P = \frac{1}{2\pi i}\int_\Gamma R(\zeta; T)\,d\zeta,$$

which is independent of $\Gamma$, is a projection (i.e., $P \in B(X)$ and $P^2 = P$). The closed subspaces $X_1 = PX$ and $X_2 = (I - P)X$ reduce T and give rise to the decomposition $T = T|_{X_1} \oplus T|_{X_2} = T_1 \oplus T_2$. In particular, $\sigma(T_1) = \sigma(T) \cap \{\text{inside of } \Gamma\}$ and $\sigma(T_2) = \sigma(T) \cap \{\text{outside of } \Gamma\}$.

(2) Let $T(\kappa)$ be holomorphic in D. We assume that $0 \in D$ and regard $T^{(0)} = T(0)$ as the unperturbed operator. Suppose that $\Lambda$ and $\Gamma$ are as in (1) with T replaced by $T^{(0)}$. Then there exists $\delta > 0$ such that $|\kappa| < \delta$ implies $\Gamma \subset \rho(T(\kappa))$. This follows from the upper continuity of compact components of the spectrum with respect to the metric d of C(X) [5]. Thus the separation of the spectrum discussed in (1) is applicable to $T(\kappa)$. In particular, corresponding to the projection

$$P(\kappa) = \frac{1}{2\pi i}\int_\Gamma R(\zeta; T(\kappa))\,d\zeta, \quad |\kappa| < \delta,$$

$T(\kappa)$ is decomposed as $T(\kappa) = T_1(\kappa) \oplus T_2(\kappa)$; and the problem of determining the spectrum of $T(\kappa)$ inside $\Gamma$ is reduced to the problem of determining the spectrum of $T_1(\kappa)$ ($|\kappa| < \delta$).

Suppose now that $\Lambda = \{\lambda_0\}$ is an isolated eigenvalue of $T^{(0)}$ and that $m = \dim P(0)X < \infty$. Then $\dim P(\kappa)X = m$, $|\kappa| < \delta$. Moreover, a base $\{\varphi_1(\kappa), \ldots, \varphi_m(\kappa)\}$ of $P(\kappa)X$ can be constructed in such a way that the $\varphi_j(\kappa)$ are holomorphic in $\{|\kappa| < \delta' < \delta\}$ [3, 5]. Thus the problem for $T_1(\kappa)$ in this case is just the finite-dimensional eigenvalue problem $\det\{\lambda\delta_{jk} - (T(\kappa)\varphi_j(\kappa), \varphi_k(\kappa))\} = 0$. The totality $\{\lambda_j(\kappa)\}$ of solutions of this equation, i.e., the totality of eigenvalues of $T(\kappa)$ near $\lambda_0$, is expressed by one or several power series of $\kappa^{1/p}$ with a suitable integer p > 0. If $T(\kappa)$ is a self-adjoint family, we can take p = 1 so that the eigenvalues are holomorphic near $\lambda_0$. If $H(\kappa) = H^{(0)} + \kappa H^{(1)} + \cdots$ is a self-adjoint holomorphic family described in example (IV) in (3) of Section C and if m = 1, the power series $\lambda(\kappa) = \sum \kappa^j\lambda_j$ can be explicitly computed as $\lambda_1 = (H^{(1)}u_0, u_0)$, $\lambda_2 = (H^{(2)}u_0, u_0) + (SH^{(1)}u_0, H^{(1)}u_0)$, ..., where $H^{(0)}u_0 = \lambda_0 u_0$ with $\|u_0\| = 1$ and where $S = \lim_{\zeta \to \lambda_0} R(\zeta; H^{(0)})(I - P(0))$ is the reduced resolvent. This series is known as the Rayleigh-Schrödinger series. The power series for the associated eigenvectors $u(\kappa) = \sum \kappa^j u_j$ can also be computed. For details, including the case of a degenerate $\lambda_0$ (m > 1), in which the situation becomes more complicated due to the splitting of eigenvalues, → [5]. The perturbation theory discussed in this subsection is called analytic (or regular) perturbation theory.

(3) Even when a problem cannot be handled by means of analytic perturbation, it may happen that the coefficients $\lambda_j$ and $u_j$ of formal power series can be computed up to a certain j. In many such cases it can be shown under general assumptions that an asymptotic expansion such as $\lambda(\kappa) = \lambda_0 + \lambda_1\kappa + o(\kappa)$ is valid as long as the coefficients involved can be computed legitimately [2, 5]. Estimates for $o(\kappa)$ can also be given. This provides a rigorous foundation for the perturbation method in many important practical problems. The case of degenerate $\lambda_0$ can be treated similarly. The strong convergence in the generalized sense mentioned in (2) of Section C is used here. This theory is called asymptotic perturbation theory.
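The first two Rayleigh-Schrödinger coefficients can be illustrated on the smallest nontrivial case. The sketch below is only an illustration under stated assumptions: it takes a real symmetric 2 x 2 family $H(\kappa) = H^{(0)} + \kappa H^{(1)}$ with $H^{(0)} = \operatorname{diag}(\lambda_0, \mu_0)$, $\lambda_0 < \mu_0$, $H^{(2)} = 0$, and $u_0 = (1, 0)$, and it assumes the resolvent convention $R(\zeta; T) = (\zeta - T)^{-1}$, so that the reduced resolvent acts as $(\lambda_0 - \mu_0)^{-1}$ on the second basis vector. The function names and the matrix entries a, b, c are hypothetical.

```python
import math


def exact_lower_eigenvalue(kappa, lam0, mu0, a, b, c):
    """Smaller eigenvalue of the symmetric matrix
    [[lam0 + kappa*a, kappa*b], [kappa*b, mu0 + kappa*c]] (closed form)."""
    p = lam0 + kappa * a
    q = mu0 + kappa * c
    mean = (p + q) / 2.0
    radius = math.hypot((p - q) / 2.0, kappa * b)
    return mean - radius


def rayleigh_schrodinger_order2(kappa, lam0, mu0, a, b, c):
    """lam0 + kappa*lam1 + kappa^2*lam2 with
    lam1 = (H1 u0, u0) = a and lam2 = (S H1 u0, H1 u0) = b^2 / (lam0 - mu0),
    assuming lam0 < mu0, H2 = 0, and R(zeta; T) = (zeta - T)^(-1)."""
    lam1 = a
    lam2 = b * b / (lam0 - mu0)
    return lam0 + kappa * lam1 + kappa ** 2 * lam2


if __name__ == "__main__":
    params = dict(lam0=1.0, mu0=3.0, a=0.5, b=1.0, c=-0.25)
    for kappa in (0.1, 0.01, 0.001):
        exact = exact_lower_eigenvalue(kappa, **params)
        approx = rayleigh_schrodinger_order2(kappa, **params)
        # The discrepancy should shrink roughly like kappa**3.
        print(kappa, exact, approx, abs(exact - approx))
```

In this toy case the truncation error behaves like a multiple of $\kappa^3$, which is what a convergent power series in $\kappa$ predicts; the sketch says nothing about the radius of convergence or about the degenerate case m > 1.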

E. Perturbation of Continuous Spectra

For continuous spectra, studying the mode of change under perturbation is not usually a tractable problem. Rather, certain parts of continuous spectra tend to be stable under perturbation; and the study of this stability has been a major topic in perturbation theory (also → 375 Scattering Theory). In this section we discuss only self-adjoint operators and let $H = \int \lambda\,dE(\lambda)$, $H_0$, ..., be self-adjoint operators in a Hilbert space X. For $B_p$ and $\|\cdot\|_p$ to be used below → 68 Compact and Nuclear Operators.

(1) The essential spectrum (→ 390 Spectral Analysis of Operators E) is stable under compact perturbation. Namely, if $H = H_0 + K$ with compact K, then $\sigma_e(H) = \sigma_e(H_0)$ (H. Weyl, Rend. Circ. Mat. Palermo, 27 (1909)). More generally, it suffices to assume that $R(\zeta; H) - R(\zeta; H_0)$ is compact for some (or equivalently all) $\zeta \in \rho(H) \cap \rho(H_0)$. Conversely, if X is separable and if $\sigma_e(H) = \sigma_e(H_0)$, then there exist a unitary operator U and a compact operator K such that $H = UH_0U^{-1} + K$ (J. von Neumann, Actualités Sci. Ind., 229 (1935)). Moreover, any self-adjoint operator H in a separable Hilbert space can be changed into H + K with a pure point spectrum by adding a $K \in B_p$ with $\|K\|_p < \varepsilon$ for any p > 1 and $\varepsilon > 0$ (S. T. Kuroda, Proc. Japan Acad., 34 (1958)). I. D. Berg (1971), W. Sikonia (1971), J. Voigt (1977), and D. Voiculescu (1979) have extended these results to normal operators and m-tuples of commutative self-adjoint operators. Also → 390 Spectral Analysis of Operators I, J.

(2) The absolutely continuous spectrum (→ 390 Spectral Analysis of Operators E) is stable under perturbation by the trace class. Namely, if $H = H_0 + K$, with $K \in B_1$, then the absolutely continuous parts of $H_0$ and H are unitarily equivalent, and in particular $\sigma_{ac}(H) = \sigma_{ac}(H_0)$ (M. Rosenblum, Pacific J. Math., 7 (1957); T. Kato, Proc. Japan Acad., 33 (1957)). Among generalizations we mention the following two. (i) If $R(\zeta; H) - R(\zeta; H_0) \in B_1$ for some $\zeta \in \rho(H) \cap \rho(H_0)$, then the absolutely continuous parts of $\varphi(H_0)$ and $\varphi(H)$ are unitarily equivalent for any smooth strictly increasing real function $\varphi$ (M. Sh. Birman, Izv. Akad. Nauk SSSR, ser. mat., 27 (1963); T. Kato, Pacific J. Math., 15 (1965)). (ii) If $H_0$ and H act in different Hilbert spaces $X_0$ and X, respectively, and if there exists $J \in B(X_0, X)$ such that $JD(H_0) \subset D(H)$ and such that the closure of $HJ - JH_0$ belongs to $B_1(X_0, X)$, then the same conclusion

the existence and the completeness of the latter implies the unitary equivalence of absolutely continuous parts. All the results mentioned above are proved by scattering-theoretic methods, either by the wave operator approach or by the abstract stationary approach (→ 375 Scattering Theory, esp. B, C).

F. Some Other Topics

(1) For the perturbation theory for semigroups of operators and evolution equations, not discussed in this article, → [5, 7, 9].

(2) The detailed structure of continuous spectra is hard to analyze. An eigenvalue $\lambda_0$ of $H_0$ which is embedded in the continuous spectrum may diffuse into the continuous spectrum in the presence of a perturbation. In such a case, $H(\kappa)$, $\kappa \ne 0$, has no eigenvalues near $\lambda_0$ but may have a continuous spectrum highly concentrated around $\lambda_0$. This phenomenon of spectral concentration is studied, especially for some concrete problems, in relation to resonance poles (or poles of the holomorphic continuation of the resolvent or the scattering matrix). In some problems, it is proved that the first few terms of the perturbation series for $\lambda(\kappa)$ that are still computable are related to the real part of the resonance. Some problems of resonance can be treated by the technique of dilation analyticity, a technique which is also effective in other problems of spectral analysis (J. Aguilar and J. M. Combes, Comm. Math. Phys., 22 (1971)).

(3) A vast quantity of results in the spectral theory of the Schrödinger operators appearing in the Schrödinger equation in quantum mechanics can be obtained by perturbation methods.

For the topics mentioned in (2) and (3) → [7].

References

[1] F. Rellich, Störungstheorie der Spektralzerlegung I-V, Math. Ann., 113 (1937), 600-619; 113 (1937), 677-685; 116 (1939), 555-570; 117 (1940), 356-382; 118 (1942), 462-484.
[2] T. Kato, On the convergence of the perturbation method, J. Fac. Sci. Univ. Tokyo, sec. I, 6 (1951), 145-226.
[3] T. Kato, On the perturbation theory of closed linear operators, J. Math. Soc. Japan, 4 (1952), 323-337.
as in (i) holds (D. Pearson, J. Functional Anal., [4] K. 0. Friedrichs, On the perturbation of
28 (1978)). (i) can be derived from (ii). Perturba- continuous spectra, Comm. Pure Appl. Math.,
tion theory for absolutely continuous spectra 1 (1948), 361-406.
is closely related to the study of generalized [5] T. Kato, Perturbation theory for linear
wave operators in scattering theory. In fact, operators, Springer, 1966.

[6] N. Dunford and J. T. Schwartz, Linear operators I-III, Wiley-Interscience, 1958, 1963, 1971.
[7] M. Reed and B. Simon, Methods of modern mathematical physics I-IV, Academic Press, 1972, 1975, 1979, 1978.
[8] T. Kato, Fundamental properties of Hamiltonian operators of Schrödinger type, Trans. Amer. Math. Soc., 70 (1951), 195-211.
[9] H. Tanabe, Equation of evolution, Pitman, 1979. (Original in Japanese, 1975.)

332 (VI.7)
Pi (π)

The ratio of the circumference of a circle to its diameter in a Euclidean plane is denoted by π, the initial letter of περίμετρος (perimeter). Thus π can be defined as

[formula missing]

The symbol π has been used since W. Jones (1675-1749) and L. Euler. The fact that this ratio is a constant is stated in Euclid's Elements; however, Euclid gave no statement about the numerical value of π. As an approximate value of π, 3 has been used from antiquity. According to the Rhind Papyrus, $(4/3)^4$ was used in ancient Egypt. Let $L_n$ ($l_n$) be the perimeter of a regular n-gon circumscribed about (inscribed in) a circle of diameter 1. Then the relations

$$L_n > \pi > l_n, \qquad \frac{2}{L_{2n}} = \frac{1}{L_n} + \frac{1}{l_n}, \qquad l_{2n} = \sqrt{l_n L_{2n}}$$

hold. Archimedes obtained $3\tfrac{10}{71} < \pi < 3\tfrac{1}{7}$ by calculating $L_{96}$ and $l_{96}$. In 3rd-century China Liu Hui used $\pi \approx 3.14$. In 5th-century China, Tsu Chung-Chih mentioned 22/7 as an inaccurate approximate value and 355/113 as an accurate approximate value of π. These values were obtained by methods similar to those of Archimedes. In 5th-century India, Aryabhatta obtained $\pi \approx 3.1416$, and in 16th-century Europe, Adriaen van Roomen obtained $\pi \approx 355/113$.

F. Viète represented 2/π in the following infinite product:

$$\frac{2}{\pi} = \sqrt{\tfrac{1}{2}}\;\sqrt{\tfrac{1}{2} + \tfrac{1}{2}\sqrt{\tfrac{1}{2}}}\;\sqrt{\tfrac{1}{2} + \tfrac{1}{2}\sqrt{\tfrac{1}{2} + \tfrac{1}{2}\sqrt{\tfrac{1}{2}}}}\cdots$$

Using this formula, L. van Ceulen (1540-1610) calculated π to 35 decimals. In the 17th and 18th centuries, the Japanese mathematicians T. Seki, K. Takebe, and Y. Matunaga computed π to 50 decimals. Since the 17th century, many formulas that represent π as a sum of infinite series or as a limit have been used to obtain more accurate approximate values. The following are representations of π known in those days:

$$\frac{\pi}{2} = \frac{2 \cdot 2 \cdot 4 \cdot 4 \cdot 6 \cdot 6 \cdots}{1 \cdot 3 \cdot 3 \cdot 5 \cdot 5 \cdot 7 \cdots} \quad \text{(J. Wallis)}$$

$$\frac{\pi}{4} = \cfrac{1}{1 + \cfrac{1^2}{2 + \cfrac{3^2}{2 + \cfrac{5^2}{2 + \cdots}}}} \quad \text{(W. Brouncker; for the notation → 83 Continued Fractions)}$$

$$\phantom{\frac{\pi}{4}} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots \quad \text{(J. Gregory, G. W. F. Leibniz)}$$

$$\phantom{\frac{\pi}{4}} = 4 \operatorname{Arc\,tan} \frac{1}{5} - \operatorname{Arc\,tan} \frac{1}{239} \quad \text{(J. Machin).}$$

A formula combining Machin's representation of π and the power series $\operatorname{Arc\,tan} x = x - (1/3)x^3 + (1/5)x^5 - \cdots$ is called Machin's formula and was often used for calculating an approximate value of π. By utilizing this formula, in 1873 W. Shanks obtained an approximate value of π up to 707 decimals. No improvement of his approximation was obtained until 1946 when D. F. Ferguson calculated 710 digits of π and found that Shanks's value was correct only up to the 527th digit. The computation of an accurate approximate value of π has been made easier by the recent development of computing machines, and an approximate value up to 1,000,000 decimals has been obtained. P. Beckmann [2] gives a detailed and humorous historical account of the calculation of π from ancient times up to the present computer age. Various numerical results obtained by electronic computers are not formally published, some being deposited in the UMT repository of the editorial office of the journal Mathematics of Computation. Choong et al. [3], using information in [1], obtained the first 21,230 partial denominators of the regular continued fraction representation of π and described how their numerical evidence tallies with theoretical results, obtained by the metrical theory of continued fractions, which is valid for almost all irrational numbers (e.g. → [4]).

In 1761, J. H. Lambert used Brouncker's expression of π in a continued fraction to prove that π is irrational. In 1882, C. L. F. Lindemann proved that π is a †transcendental number using Euler's formula $e^{\pi i} = -1$. The approximate value of π up to 50 decimals is 3.14159265358979323846264338327950288419716939937510... (→ Appendix B, Table 6).
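The polygon relations attributed above to Archimedes can be checked numerically. The following is a minimal sketch, assuming a circle of diameter 1 (so that the perimeters bracket π itself) and starting from the regular hexagon, for which $l_6 = 3$ and $L_6 = 2\sqrt{3}$. The function name `archimedes_bounds` is illustrative only and does not come from the article.

```python
import math


def archimedes_bounds(doublings):
    """Return (n, l_n, L_n) after doubling the side count of a hexagon.

    l_n and L_n are the perimeters of the regular n-gons inscribed in and
    circumscribed about a circle of diameter 1 (an assumption of this sketch).
    """
    n = 6
    inscribed = 3.0                         # l_6: six sides of length 1/2
    circumscribed = 2.0 * math.sqrt(3.0)    # L_6: six sides of length 1/sqrt(3)
    for _ in range(doublings):
        # 2 / L_{2n} = 1 / L_n + 1 / l_n  (harmonic mean)
        circumscribed = 2.0 * circumscribed * inscribed / (circumscribed + inscribed)
        # l_{2n} = sqrt(l_n * L_{2n})     (geometric mean)
        inscribed = math.sqrt(inscribed * circumscribed)
        n *= 2
    return n, inscribed, circumscribed


if __name__ == "__main__":
    n, lower, upper = archimedes_bounds(4)  # 6 -> 12 -> 24 -> 48 -> 96
    print(n, lower, upper)
```

Four doublings reach the 96-gon, and the two perimeters then lie inside the interval from 3 10/71 to 3 1/7 quoted above.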
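Machin's formula, as described in the article, combines $\pi/4 = 4 \operatorname{Arc\,tan}(1/5) - \operatorname{Arc\,tan}(1/239)$ with the power series for Arc tan. A minimal sketch in exact integer arithmetic follows. The scaling by a power of ten, the ten guard digits, and the names `arctan_inverse` and `machin_pi` are assumptions of this illustration, not part of the historical computations.

```python
def arctan_inverse(x, unity):
    """Return arctan(1/x) scaled by `unity`, truncated, for an integer x > 1.

    Uses Arc tan t = t - t^3/3 + t^5/5 - ... with t = 1/x, in integers only.
    """
    power = unity // x          # t^(2k+1) scaled by unity, starting at k = 0
    total = power
    x_squared = x * x
    k = 1
    sign = -1
    while power:
        power //= x_squared
        total += sign * (power // (2 * k + 1))
        sign = -sign
        k += 1
    return total


def machin_pi(digits):
    """Return floor(pi * 10**digits) as an integer via Machin's formula."""
    guard = 10                  # extra digits that absorb truncation error
    unity = 10 ** (digits + guard)
    pi_scaled = 4 * (4 * arctan_inverse(5, unity) - arctan_inverse(239, unity))
    return pi_scaled // 10 ** guard


if __name__ == "__main__":
    print(machin_pi(50))
```

With `digits = 50` the result agrees with the 50-decimal value quoted in the article. The guard digits are a pragmatic choice, not a proven error bound for arbitrary `digits`.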

References

[1] D. Shanks and J. W. Wrench, Calculation of π to 100,000 decimals, Math. Comp., 16 (1962), 76-99.
[2] P. Beckmann, A history of pi, second edition, Golem Press, 1971.
[3] K. Y. Choong, D. E. Daykin, and C. R. Rathbone, Rational approximations to π, Math. Comp., 25 (1971), 387-392.
[4] A. Khinchin, Continued fractions, Noordhoff, 1963.

333 (II.18)
Plane Domains

A. Domains in the Complex Plane

A †domain (i.e., a †connected open set) in the †complex plane or on the †complex sphere is called a plane domain. The †closure of such a domain is called a closed plane domain. In this article, we consider only subsets of the complex plane (or sphere), and a plane domain is called simply a domain. The †interior of a †Jordan curve $J$ in the complex plane is a domain called a Jordan domain. In a domain $D$, a Jordan arc whose two endpoints lie on the boundary of $D$ is called a cross cut of $D$. For a domain $D$, each of the following three conditions is equivalent to the condition that $D$ is †simply connected: (1) For every cross cut $Q$ of $D$, $D - Q$ has exactly two †connected components. (2) Every Jordan curve in $D$ is †homotopic to one point, that is, it can always be continuously deformed to a point. (3) The †monodromy theorem holds in $D$.

If $D$ is a domain on a complex sphere, each of the following three conditions is equivalent to the condition that $D$ is simply connected: (4) The boundary of $D$ consists of a single †continuum or a single point. (5) For every Jordan curve $C$ in $D$, either the interior or the exterior of $C$ is contained in $D$. (6) The complement of $D$ with respect to the complex sphere is a connected (not necessarily arcwise connected) closed set. Jordan domains are simply connected.

Let $n \geq 2$ be an integer and $D$ a plane domain. The †homology group $H_1(D, \mathbf{Z})$ is identical to $\mathbf{Z}^{n-1}$ if and only if the complement of $D$ in the complex sphere has $n$ connected components. Then $D$ is said to be n-ply connected or multiply connected without specifying $n$. If $D$ is an n-ply connected domain, there exist $n - 1$ suitable mutually disjoint cross cuts $Q_1, \ldots, Q_{n-1}$ such that $D - (Q_1 \cup \cdots \cup Q_{n-1})$ is simply connected.

Some typical examples of domains are as follows: (1) Circular domain: $|z - c| < r$. (2) Half-plane: $\operatorname{Re} z > 0$, or $\operatorname{Im} z > 0$. (3) Angular domain: $\alpha < \arg(z - c) < \beta$. (4) Annular domain: $r < |z - c| < R$. (5) Slit domain: a domain obtained by excluding a Jordan arc $\Gamma$ from a domain $D$, where all points on $\Gamma$ (except an endpoint lying on the boundary of $D$) are contained in $D$. In this case, the Jordan arc $\Gamma$ is called the slit of the domain.

B. Boundary Elements

A boundary point $P$ of a domain $D$ is called accessible if there exists a sequence of points $P_n$ tending to $P$ such that the line segments $P_1P_2, P_2P_3, \ldots$ lie completely in $D$. For example, for the domain obtained by removing $x = 1/(n + 1)$, $0 < y \leq 1/2$ $(n = 1, 2, \ldots)$ from the square $0 < x < 1$, $0 < y < 1$, the boundary points with $x = 0$, $0 < y < 1/2$ are all inaccessible (Fig. 1).

Fig. 1

Let the domain $D$ be bounded by a smooth Jordan curve, and let $P$ be a boundary point of $D$. Take an angular domain $D'$ with vertex at the point $P$ and the initial parts of the two sides of $D'$ lying in $D$. A curve in $D$ converging to the point $P$ from the interior of the angular domain $D'$ is called a Stolz's path or a nontangential path ending at the point $P$.

Let $D$ be a simply connected domain. A sequence $\{q_\nu\}$ of cross cuts mutually disjoint except for their endpoints is called a fundamental sequence of cross cuts if it satisfies the following two conditions (Fig. 2): (1) Every $q_\nu$ separates $q_{\nu-1}$ and $q_{\nu+1}$ on $D$. (2) For $\nu \to \infty$, the sequence $q_\nu$ tends to a point on the boundary. Let $\{q_\nu\}$ be a fundamental sequence of cross cuts, and denote by $D_\nu$ the subdomain of $D$ separated by $q_\nu$ that contains $q_{\nu+1}$. The intersection $\bigcap_\nu \bar{D}_\nu$ consists only of the boundary points of $D$. Two fundamental sequences $\{q_\nu\}$, $\{q'_\mu\}$ of cross cuts are equivalent if every $D_\nu$ contains all $q'_\mu$ except for a finite number of $\mu$, and every $D'_\mu$ contains all $q_\nu$ except for a finite number of $\nu$. Here $D_\nu$, $D'_\mu$ are the subdomains constructed from $q_\nu$ and $q'_\mu$ as above. This condition determines an equivalence relation, under which the equivalence class of fundamental sequences of cross cuts is called a boundary element. This notion is due to C. Carathéodory [2]. The boundary element of a

multiply connected domain is defined similarly for each isolated component of the boundary. For example, each point of a slit domain, except for the endpoint of $\Gamma$ lying on the boundary of the domain, determines two distinct boundary elements on each side. A closed domain is usually considered to be the union of a domain and the set of all its boundary elements. Various notions of †ideal boundary come from considering suitable boundary elements for various purposes (→ 207 Ideal Boundaries).

Fig. 2

C. Domain Kernels

Let $\{G_\nu\}$ be a sequence of domains containing the origin 0. If a suitable neighborhood of the origin is contained in $G_\nu$ for all $\nu$, there exists a domain $G$ such that every closed domain containing the origin and contained in $G$ is contained in $G_\nu$ except for a finite number of $\nu$. The union $K$ of such domains $G$ is called the domain kernel of the sequence $\{G_\nu\}$ (Carathéodory). If there is no neighborhood of the origin contained in $G_\nu$ for all $\nu$, we put $K = \{0\}$. If every infinite subsequence of $\{G_\nu\}$ has the same domain kernel $K$, then we say that the sequence $\{G_\nu\}$ converges to $K$. The notion of domain kernel is important in considering the limits of a sequence of conformal mappings (→ 77 Conformal Mappings).

References

[1] M. H. A. Newman, Elements of the topology of plane sets of points, Cambridge Univ. Press, second edition, 1951.
[2] C. Carathéodory, Über die Begrenzung einfach zusammenhängender Gebiete, Math. Ann., 73 (1913), 323-370.

334 (X.33)
Plateau's Problem

A. Origin

Because of surface tension, a soap membrane bounded by a given closed space curve takes the shape of a minimal surface, i.e., a surface of the least area. This experiment was performed by the Belgian scientist J. A. Plateau (1873) to realize minimal surfaces; hence Plateau's problem is that of determining the minimal surfaces bounded by given closed space curves. It is a problem of the †calculus of variations.

B. Formulation

Let $\Gamma$ be a †simple closed curve in xyz-space such that its projection $C$ on the xy-plane is also a simple closed curve. Let $D$ be the finite domain bounded by $C$. We consider surfaces $z = z(x, y)$ having common boundary $\Gamma$. Then under suitable assumptions on the smoothness of $z(x, y)$, the problem is to minimize the †functional

$$J[z] = \iint_D \sqrt{1 + p^2 + q^2}\; dx\, dy; \qquad p = \frac{\partial z}{\partial x}, \quad q = \frac{\partial z}{\partial y},$$

with the condition that $z = z(x, y)$ has $\Gamma$ as its boundary. The †Euler-Lagrange differential equation for the functional $J[z]$ is

$$\frac{\partial}{\partial x}\left(\frac{p}{\sqrt{1 + p^2 + q^2}}\right) + \frac{\partial}{\partial y}\left(\frac{q}{\sqrt{1 + p^2 + q^2}}\right) = 0,$$

or $(1 + q^2) r - 2pqs + (1 + p^2) t = 0$, $r = \partial^2 z/\partial x^2$, $s = \partial^2 z/\partial x\, \partial y$, $t = \partial^2 z/\partial y^2$, which is a second-order †quasilinear partial differential equation of elliptic type and whose geometric interpretation had already been given by M. C. Meusnier (1776).

To formulate the problem more generally, let a surface be expressed in vector form $\mathbf{x} = \mathbf{x}(u, v)$ by means of parameters $u, v$. Let its †first fundamental form be $d\mathbf{x}^2 = E\, du^2 + 2F\, du\, dv + G\, dv^2$ and its †second fundamental form be $-d\mathbf{x}\, d\mathbf{n} = L\, du^2 + 2M\, du\, dv + N\, dv^2$, with $\mathbf{n} = \mathbf{n}(u, v)$ the unit normal vector. By equating to zero the †first variation of the areal functional

$$\iint \sqrt{EG - F^2}\; du\, dv,$$

based on infinitesimal displacement in the normal direction, we obtain the Euler-Lagrange equation in the form

$$2H = (NE - 2MF + LG)/(EG - F^2) = 0,$$

where $H = (R_1^{-1} + R_2^{-1})/2$ is the †mean curvature of the surface and $R_1$ and $R_2$ are the †radii of principal curvature. Since †Beltrami's second differential form satisfies $\Delta_2 \mathbf{x} = H\mathbf{n}$, the condition for a minimal surface becomes $\Delta \mathbf{x} = 0$ (with $\Delta$ the †Laplace operator) provided that isothermal parameters $u, v$ satisfying $E =$

$G$, $F = 0$ are chosen, i.e., the vector $\mathbf{x}(u, v)$ representing a minimal surface is harmonic (the components of this vector are †harmonic functions of $u, v$). Let $\mathbf{y}(u, v)$ be a harmonic vector conjugate to $\mathbf{x}(u, v)$. Then isothermality is expressed by the condition that the analytic vector $\mathbf{F}(w) = \mathbf{x}(u, v) + i\,\mathbf{y}(u, v)$ ($w = u + iv$, $i = \sqrt{-1}$) satisfies $\mathbf{F}'(w)^2 = 0$ (Weierstrass). In general, a minimal surface is defined as a surface with everywhere vanishing mean curvature, and Plateau's problem is to determine the minimal surface with a preassigned boundary. In this formulation, the problem can be easily generalized to an n-dimensional Euclidean space $\mathbf{R}^n$ (→ 275 Minimal Submanifolds).

C. Existence of a Solution

The existence of a solution of Plateau's problem was discussed by S. N. Bernshtein (1910) from the viewpoint of a †boundary value problem of the first kind for the elliptic partial differential equation in the previous section. A. Haar (1927) dealt with the minimal problem for the functional $J[z]$ by a †direct method in the calculus of variations. Previously, Riemann, Weierstrass, Schwarz, and others discussed the case where the given space curve $\Gamma$ is a polygon, in connection with the †monodromy group concerning a second-order linear ordinary differential equation. Subsequently, R. Garnier (1928) investigated the existence of a solution by the limit process when $\Gamma$ is a simple closed curve with bounded curvature. However, when $\Gamma$ is assumed merely to be †rectifiable, the existence of a solution was first shown by the limit process by T. Radó (1930). He further discussed the general case where $\Gamma$ can bound a surface with finite area. On the other hand, by introducing a new functional depending on boundary values instead of the areal functional, J. Douglas (1931) succeeded in giving a satisfactory result for the existence problem. R. Courant (1937) gave another existence proof by reducing Plateau's problem to the †Dirichlet principle [3].

At present, the methods of discussing the existence of solutions of Plateau's problem can be classified into the following three sorts (represented, respectively, by Radó, Courant, and Douglas):

(1) The first method is to minimize directly the areal functional $\iint \sqrt{EG - F^2}\; du\, dv$. The variational equation of the areal functional becomes $H = 0$.

(2) Dirichlet's functional for a scalar function $f(u, v)$ is defined by $D[f] = \iint (f_u^2 + f_v^2)\, du\, dv$, and for an n-dimensional vector function $\mathbf{f}(u, v)$ with components $f_j(u, v)$ $(j = 1, \ldots, n)$ by $D[\mathbf{f}] = \sum_{j=1}^{n} D[f_j]$. The existence of a solution of Plateau's problem can be discussed by starting from the variational problem of minimizing $D[\mathbf{f}]$. The variational equation of $D[\mathbf{f}]$ is $\Delta \mathbf{f} = 0$.

(3) An analytic vector $\mathbf{F}(w)$ is representable in terms of the boundary values of its real part. For instance, if the domain of $w$ is the unit disk $|w| < 1$, then Poisson's integral formula

$$\mathbf{F}(w) = \frac{1}{2\pi} \int_0^{2\pi} \mathbf{g}(\theta)\, \frac{e^{i\theta} + w}{e^{i\theta} - w}\, d\theta + i \operatorname{Im} \mathbf{F}(0)$$

with the boundary function $\mathbf{g}(\theta) = \operatorname{Re} \mathbf{F}(e^{i\theta})$ can be used. On the other hand, the vector function that minimizes the Dirichlet integral among functions with fixed boundary values is harmonic. Based on these facts, Douglas transformed Dirichlet's functional with harmonic argument functions into a functional whose arguments are boundary functions. Specifically, by starting from the problem of minimizing Douglas's functional

$$\int_0^{2\pi}\!\!\int_0^{2\pi} \frac{[\mathbf{g}(\theta) - \mathbf{g}(\varphi)]^2}{4 \sin^2((\theta - \varphi)/2)}\; d\theta\, d\varphi,$$

we can prove the existence of solution of Plateau's problem satisfactorily.

D. The Generalized Case

Up to now we have been concerned with Plateau's problem in the case of a single simple closed curve. Douglas, Courant, and others treated the generalized case of a finite number of boundary curves, where †genus and orientability are assigned as the topological structure of the surface to be found. The existence of a solution has been shown in this case also. The problem is further generalized from the case of fixed boundary to the case where the boundary is merely restricted to lie on a given manifold [3]. On the other hand, C. B. Morrey (1948) generalized the problem by replacing the ambient space $\mathbf{R}^n$ by an n-dimensional †Riemannian manifold and gave the existence proof in considerable generality [6].

E. Relation to Conformal Mappings

There is a notable relation between Plateau's problem and conformal mapping when the dimension of the space is 2. Namely, the existence proof of the solution of the former for a Jordan domain implies †Riemann's mapping theorem together with W. F. Osgood and C. Carathéodory's result on boundary correspondence (→ 77 Conformal Mappings).

F. New Developments

Among recent contributions to the study of Plateau's problem, the following remarkable

results have emerged. One of them is connected with the final result of Douglas (1939) on the existence of solution surfaces. The mapping of a 2-dimensional manifold with boundary into $\mathbf{R}^n$ defining Douglas's solution of the Plateau problem for a finite number of simple closed curves is a †minimal immersion with the possible exception of isolated points where it fails to be an immersion. These points are called branch points. It was then proved by R. Osserman (1970) and R. D. Gulliver (1973) that for $n = 3$ the mapping of Douglas's theorem, which is a surface of least area, is free of branch points, i.e., is an immersion. Osserman also gave examples of generalized minimal surfaces in $\mathbf{R}^n$ $(n > 3)$ with true branch points. In this connection, Gulliver also dealt with an analogous problem for surfaces of prescribed mean curvature.

Next, we mention the question of boundary regularity. H. Lewy (1951) proved that if the boundary of a minimal surface is analytic, then the surface is analytic up to the boundary. Subsequently, S. Hildebrandt (1969) and others proved that if the boundary is of class $C^{m,\alpha}$, $m \geq 1$, the surface is also of class $C^{m,\alpha}$ up to the boundary. There are also some recent results on the number of solutions of Plateau's problem. For instance, J. C. C. Nitsche (1973) proved the uniqueness of solutions for analytic boundaries of †total curvature at most $4\pi$.

Further developments in connection with Plateau's problem have emerged in the work of E. R. Reifenberg (1960) and others, who sought to minimize the †Hausdorff measure among general classes of geometric objects, not as parametrized manifolds, but as subsets of $\mathbf{R}^n$. The existence and regularity of solutions of Plateau's problem from this point of view have been discussed by H. Federer (1969), W. H. Fleming, F. J. Almgren, and others [10].

References

[1] T. Radó, On the problem of Plateau, Erg. Math., Springer, 1933.
[2] J. Douglas, Solution of the problem of Plateau, Trans. Amer. Math. Soc., 33 (1931), 263-321.
[3] R. Courant, Dirichlet's principle, conformal mapping and minimal surfaces, Interscience, 1950.
[4] J. C. C. Nitsche, Vorlesungen über Minimalflächen, Springer, 1975.
[5] R. Osserman, A survey of minimal surfaces, Van Nostrand, 1969.
[6] C. B. Morrey, Multiple integrals in the calculus of variations, Springer, 1966.
[7] R. Osserman, A proof of the regularity everywhere of the classical solution to Plateau's problem, Ann. Math., (2) 91 (1970), 550-569.
[8] S. Hildebrandt, Boundary behavior of minimal surfaces, Arch. Rational Mech. Anal., 35 (1969), 47-82.
[9] J. C. C. Nitsche, A new uniqueness theorem for minimal surfaces, Arch. Rational Mech. Anal., 52 (1973), 319-329.
[10] H. Federer, Geometric measure theory, Springer, 1969.
[11] F. J. Almgren, Plateau's problem. An invitation to varifold geometry, Benjamin, 1966.

335 (XXI.39)
Poincaré, Henri

Henri Poincaré (April 29, 1854-July 17, 1912) was born in Nancy, France. After graduating from the École Polytechnique, he taught at the University of Caen in 1879, then at the University of Paris in 1881. He was made a member of the Académie des Sciences in 1887 and of the Académie Française in 1908. He died in Paris.

His achievements center on analysis and applications to theoretical physics and astronomy. However, his work covered many fields of mathematics, including arithmetic, algebraic geometry, spectral theory, and topology. His †uniformization of analytic functions by means of the theory of †automorphic functions in 1880 is especially notable. His paper on the †three-body problem won the prize offered by the king of Sweden in 1889.

The methods he developed in his three-volume Mécanique céleste (1892-1899) began a new epoch in celestial mechanics. In addition, Poincaré opened the road to †algebraic topology and made suggestive contributions to the †theory of relativity and †quantum theory. He asserted that science is for science's own sake [4], and his popular philosophical works concerning the foundations of natural science and mathematics exhibit a lucid style.

References

[1] H. Poincaré, Oeuvres I-XI, Gauthier-Villars, 1916-1956.
[2] H. Poincaré, Les méthodes nouvelles de la mécanique céleste I-III, Gauthier-Villars, 1892-1899.
[3] H. Poincaré, Science et hypothèse, Flammarion, 1903.
[4] H. Poincaré, La valeur de la science, Flammarion, 1914.

[5] H. Poincaré, Science et méthode, Flammarion, 1908.

336 (X.20)
Polynomial Approximation

A. General Remarks

On the existence of polynomial approximations, we have Weierstrass's approximation theorem, which is formulated in the following two forms: (i) If $f(x)$ is a function that is continuous in the finite interval $[a, b]$, then for every $\varepsilon > 0$ there exists a polynomial $P_n(x)$ of degree $n = n(\varepsilon)$ such that the inequality $|f(x) - P_n(x)| \leq \varepsilon$ holds throughout the interval $[a, b]$. (ii) If $f(\theta)$ is a continuous function of period $2\pi$, then corresponding to every positive number $\varepsilon$ there exists a trigonometric polynomial of degree $n = n(\varepsilon)$,

$$P_n(\theta) = a_0 + \sum_{k=1}^{n} (a_k \cos k\theta + b_k \sin k\theta), \qquad (1)$$

such that the inequality $|f(\theta) - P_n(\theta)| \leq \varepsilon$ holds for all values of $\theta$. The second form of Weierstrass's theorem follows from the first, and conversely. M. H. Stone obtained a theorem that generalizes Weierstrass's theorem to the case of functions of several variables. Of the many direct proofs now available for Weierstrass's theorem, we mention two simple ones. To prove version (i) of the theorem, we can assume that the given function $f(x)$ is defined in the segment $[0, 1]$. Consider the Bernshtein polynomial

$$B_n(x) = \sum_{k=0}^{n} f\!\left(\frac{k}{n}\right) \binom{n}{k} x^k (1 - x)^{n-k}.$$

Then $B_n(x)$ converges to $f(x)$ †uniformly. To prove (ii), we can apply †Fejér's theorem on †Fourier series. We have the following generalization of (i): Let $p_1, p_2, \ldots$ be a sequence of positive numbers such that $\lim p_n = \infty$. Then linear combinations of $x^0 = 1, x^{p_1}, x^{p_2}, \ldots$ can uniformly approximate each continuous function on $[0, 1]$ with arbitrary precision if and only if $\sum p_n^{-1} = \infty$ (Müntz's theorem).

B. Best Approximations

Let $\varphi_0(x), \varphi_1(x), \ldots$ be a sequence of linearly independent continuous functions on a bounded closed domain $A$ in $\mathbf{R}^n$. For any given continuous function $f(x)$, a function $P_n(x) = \sum_{k=0}^{n} c_k \varphi_k(x)$ that attains

$$\inf_{c_0, \ldots, c_n} \max_{x \in A} |f(x) - P_n(x)|$$

is called the best approximation of $f(x)$ by a linear combination of $\{\varphi_k(x)\}$. For any given $n$ there is a best approximation of $f(x)$ by a linear combination of $\varphi_0(x), \ldots, \varphi_n(x)$, but such an approximation is not always unique. For such an approximation to be unique it is necessary and sufficient that the determinant of the matrix $(\varphi_k(x_i))$ $(k, i = 0, 1, 2, \ldots, n)$ is not zero, where $x_0, x_1, \ldots, x_n$ are $n + 1$ arbitrary distinct points of $A$ (Haar's condition) (Math. Ann., 78 (1918)). If $\{\varphi_k(x)\}$ satisfies this condition, the system of functions $\{\varphi_k\}_{k=0,\ldots,n}$ is called a Chebyshev system (or unisolvent system). The sets $\{1, x, x^2, \ldots, x^n\}$ on $[a, b]$, $\{1, \cos x, \ldots, \cos nx\}$ on $[0, \pi]$ and $\{\sin x, \ldots, \sin nx\}$ on $[0, \pi]$ are Chebyshev systems. For a Chebyshev system $\{\varphi_k(x)\}$ on $[a, b]$, let $P_n(x)$ be a linear combination of $\varphi_0(x), \ldots, \varphi_n(x)$ that is not identical to the function $f \in C[a, b]$. Then $P_n(x)$ is the best approximation for $f(x)$ if and only if there are at least $n + 2$ distinct points $x_1 < \cdots < x_{n+2}$ of $[a, b]$, where $|f(x) - P_n(x)|$ attains its maximum (these points are called deviation points) and $(f(x_i) - P_n(x_i))(f(x_{i+1}) - P_n(x_{i+1})) < 0$ $(i = 1, \ldots, n + 1)$ (Chebyshev's theorem).

For example, consider the polynomial $P_n(x) = a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$ with real coefficients such that

$$\max_{-1 \leq x \leq 1} |x^n - a_{n-1} x^{n-1} - \cdots - a_0|$$

takes its smallest value. Then $x^n - P_n(x) = 2^{-(n-1)} T_n(x)$, where $T_n(x) = \cos(n \arccos x)$ is the Chebyshev polynomial of degree $n$.

Since the best approximation is desired for numerical computation, several methods have been developed to find it (→ 300 Numerical Methods). However, when the set $A \subset \mathbf{R}^n$ $(n \geq 2)$ contains three nonintersecting arcs emanating from a common point, $A$ admits no Chebyshev system. Thus we do not always have a unique best-approximation polynomial [16].

C. Degrees of Approximation and Moduli of Continuity

For a continuous function $f(x)$ defined on $[a, b]$, the modulus of continuity of kth order is defined by

$$\omega_k(f; t) = \sup_{|h| \leq t}\; \max_{a \leq x,\; x + kh \leq b} \left| \sum_{\nu=0}^{k} (-1)^{k-\nu} \binom{k}{\nu} f(x + \nu h) \right|$$

for $t \leq (b - a)/k$. In particular, $\omega_1$ is the ordinary modulus of continuity. Put $E_n^*(f) = \inf_{a_0, \ldots, a_n, b_1, \ldots, b_n} \max_{0 \leq x \leq 2\pi} |f(x) - P_n(x)|$, where $f$ is a continuous periodic function of period $2\pi$ and $P_n(x)$ is a trigonometric polynomial of the form (1). Then $E_n^*(f) \leq c_k\, \omega_k(f; 1/(n + 1))$ (Jackson's theorem [1]), where $c_k$ is indepen-

dent of $f$. The best possible coefficient $c_k$ has been determined by J. Favard [2]. Further investigations on the relation between $E_n^*(f)$ and $\omega_k(f; t)$ have been carried out by S. N. Bernshtein [3] and A. Zygmund [4]. S. B. Stechkin obtained the following results:

[formulas missing]

[5, 6]. For the approximation of $f \in C([-1, 1])$ by polynomials, there exists a polynomial $P_n(x)$ of degree at most $n$ such that for any $x \in [-1, 1]$,

$$|f(x) - P_n(x)| \leq M_r\, (t(x))^r\, \omega_k(f^{(r)}; t(x)),$$

where $M_r$ is a constant not depending on $f$, $x$, and $n$, $f^{(r)}(x)$ is the rth derivative of $f(x)$, and $t(x) = (1/n)(\sqrt{1 - x^2} + (|x|/n))$. We also have theorems evaluating $\omega_k(f^{(r)}; t)$ in terms of $|f(x) - P_n(x)|$. For the proof of these theorems, estimation of the magnitude of the derivative of the polynomial of degree $n$ plays an essential role. For example, we have the Bernshtein inequality $\max_x |T_n'(x)| \leq n \max_x |T_n(x)|$ for any trigonometric polynomial $T_n(x)$ of degree $n$ and the Markov inequality

$$|P_n'(x)| \leq n^2 \max_{-1 \leq x \leq 1} |P_n(x)|$$

for $x \in [-1, 1]$ and any polynomial $P_n(x)$ of degree $n$.

D. Approximation by Fourier Expansions

If $\{\varphi_k(x)\}$ is an †orthonormal system of functions in $L_2(a, b)$ and $f$ is any function in $L_2(a, b)$, then among all linear combinations of $\varphi_0(x), \ldots, \varphi_n(x)$ the one that gives the best mean square approximation to $f$ (i.e., the one for which the integral

$$\int_a^b \left| f(x) - \sum_{k=0}^{n} c_k \varphi_k(x) \right|^2 dx$$

attains its minimum) is the Fourier polynomial $\sum_{k=0}^{n} a_k \varphi_k(x)$, where $a_k = \int_a^b f(x) \varphi_k(x)\, dx$. Consequently, the least square approximation (or best approximation with respect to the $L_2$-norm) by trigonometric polynomials is given by the partial sum $s_n(x)$ of the Fourier series of $f(x)$. For $L_p$ $(1 < p < \infty)$, $s_n(x)$ also gives the best approximation up to a constant factor, but in the case of uniform approximation we have $|f(x) - s_n(x)| \leq A (\log n)\, \omega_1(f; n^{-1})$, and this result cannot be improved in general. There is no linear operation that gives the best trigonometric approximation. In approximation with a linear combination of $\varphi_0(x), \ldots, \varphi_n(x)$, the saturation phenomenon of approximation often appears. For example, observe the arithmetic means of $s_n(x)$ (i.e., †Fejér means $\sigma_n(x)$). If $f \in \operatorname{Lip} \alpha$ $(0 < \alpha < 1)$ (→ 84 Continuous Functions A), then $|f(x) - \sigma_n(x)| = O(n^{-\alpha})$. However, $|f(x) - \sigma_n(x)| = O(n^{-1})$ if and only if the †conjugate function $\tilde{f}(x) \in \operatorname{Lip} 1$; $|f(x) - \sigma_n(x)| = o(n^{-1})$ if and only if $f(x)$ is constant (M. Zamansky [7], G. Sunouchi and C. Watari [8]).

E. Trigonometric Interpolation Polynomials

Since the trigonometric system is a Chebyshev system, given $2n + 1$ distinct points $x_0, x_1, \ldots, x_{2n}$ and arbitrary numbers $c_0, c_1, \ldots, c_{2n}$, there is always a unique trigonometric polynomial of degree $n$ with prescribed values $c_k$ at the points $x_k$. Given any continuous function $f(x)$ with period $2\pi$, the trigonometric polynomial that coincides with $f(x)$ at the points $x_k$ is called the trigonometric interpolation polynomial with nodes at $x_k$. If $x_k = 2\pi k/(2n + 1)$ $(k = 0, 1, \ldots, 2n)$, then the interpolating trigonometric polynomial is given by

$$U_n(f; x) = \frac{1}{2n + 1} \sum_{j=0}^{2n} f(x_j)\, \frac{\sin((n + 1/2)(x - x_j))}{\sin((x - x_j)/2)} = \frac{1}{2\pi} \int_0^{2\pi} f(t)\, \frac{\sin((n + 1/2)(x - t))}{\sin((x - t)/2)}\, d\varphi_n(t),$$

where $\varphi_n(t)$ is a step function that has the value $2\pi j/(2n + 1)$ in $[2\pi j/(2n + 1), 2\pi(j + 1)/(2n + 1))$. $U_n(f; x)$ resembles the partial sum $s_n(x)$ of the Fourier series of $f(x)$. If $f(x)$ is continuous and of †bounded variation, then $U_n(f; x)$ converges uniformly to $f(x)$ (D. Jackson [1]). Although the partial sum $s_n(x)$ of the Fourier series of a continuous function $f(x)$ converges almost everywhere to $f(x)$, there is a continuous function for which $U_n(f; x)$ diverges everywhere (J. Marcinkiewicz [9]). Moreover, there exists a continuous function for which $(1/n)\left(\sum_{k=1}^{n} U_k(f; x)\right)$ diverges everywhere (P. Erdős [10], G. Grünwald [11]). Restating these facts for the algebraic polynomial case, we can conclude that there is a continuous function defined in $[-1, 1]$ for which the †Lagrange interpolation polynomial and its arithmetic mean are both divergent everywhere if we take as nodes the roots of the Chebyshev polynomial of degree $n$.

F. The Case of a Complex Domain

If a given function $f(z)$ is holomorphic in a bounded †simply connected domain $E$ in the complex plane and continuous in $E$, then $f(z)$ is approximated uniformly by polynomials on any compact set in $E$ (Runge's theorem). This theorem was first studied by C. Runge, and his results were developed by J. L. Walsh and M. V. Keldysh (e.g., [12]). When $E$ contains no interior point, the polynomial approxima-
336 G 1244
Polynomial Approximation

tion of a continuous function defined in E cessive coefficients of the polynomtal P,,(z) can
was given by M. A. Lavrent’ev. Unifying these be calculated by +finite differences. Conver-
two results, S. N. Mergelyan obtained the gence of Newton’s interpolation polynomial is
following theorem [ 131: A necessary and sufh- closely connected to convergence of +Dirichlet
cient condition for an arbitrary function con- series.
tinuous on a compact set E and holomorphic
inside E to be approximated on E uniformly
by polynomials is that the set E does not H. Chebyshev Approximation
divide the complex plane.
On the degree of approximation of poly- Let D be a bounded closed subset of the com-
nomials to f‘(z) on a simply connected domain, plex plane, and f(z) a continuous function on
there are the following results: Let D be a D. Then there exists a polynomial n,(z) of de-
closed bounded set whose complement K is gree n such that max,,, If(z) - rc,(z)( attains
connected and regular in the sense that K the infimum E,,(j). The polynomial n,(z) is
possesses a iGreen’s function C(x, y) with a unique and is called the best approximation
pole at infinity. Let D, be the locus G(x, y) = polynomial (in the sense of Chebyshev). If D is
log R > 0. When ,f‘(z) is holomorphic on D, simply connected and j’(z) is single-valued and
there exists the largest number p with the holomorphic on D, then n,(z) converges to f(z)
following property: ,f(z) is single-valued and uniformly on D. Moreover, in this case there
holomorphic at every interior point of D,. If exist a number M that does not depend on n
R < p, there exist polynomials P,(z) of degree n and a number R > 1 such that If(z) - n,,(z)1 $
(n = 1,2, . ) such that I,f’(z) - P,,(z)1 < M/R” for M/R”. Assuming that S(z) satisfies certain ad-
ZE D, where M is a constant independent of II ditional conditions, W. E. Sewell [ 141 proved
and z. On the other hand, there exist no such the existence of a constant r such that I,f(z) -
polynomials P.(z) on D for R > p (Bernshtein n,(z)1 < M/n’R”. Furthermore, by approximat-
and Walsh [12]). ing f(z) = z” by polynomials of degree n - 1,
we can show that there exists a polynomial
T,(z) of degree n such that
G. Lagrange’s Interpolation Formula
min ~~~~z”+a,z”~‘+...+a,l =:IT,(z)].
For each n (n = 0, 1, ), let z(i”), zy), . , z!$:!, be 1 I
a given set of real or complex numbers, and let T,(z) is called a Chebyshev polynomial of de-
f(z) be an arbitrary function. Then there is a gree n with respect to the domain D. Similar
unique polynomial of degree n that coincides statements are valid for functions of a real
with f(z) at each point zp) (k = 1, , n + 1). variable. In particular, when D = [ -1, 11, we
This is called Lagrange’s interpolation poly- have
nomial and is given by
T,(x)=cos(narccosx)/2”-‘,

which is the ordinary (real) Chebyshev poly-


nomial. Generally, the limit
w(z)=(z-zyy...(z-z?$).
The sequence P,,(z) does not always converge
to j(z). For example, if we take f(z) = l/z and
exists, and p(D) coincides with the tcapacity
the (n $- 1)st roots of I as zp), then P,,(z) = z”
and ttranstinite diameter of D [ 151. For new
and P,,(z) converges to j(z) only at the point 1.
results and applications of Chebyshev poly-
For real variables also, there are examples
nomials - [17].
of divergent P,(z). However, if ,f(z) is holo-
If the method of evaluating the degree of
morphic in Iz( <p (p> l), then P,,(z) with the
approximation using the absolute value I.f(z) -
(n+ 1)st roots of 1 as nodes converges to j’(z)
n,(z)] is replaced by methods using a +curvi-
uniformly in IzI < 1.
linear integral or tsurface integral, as explained
When zp’ is independent of the choice of n,
below, we still obtain similar results. Let D be
P,(z) coincides with the sum of the first n terms
a closed domain in the complex plane with a
of iNewton’s interpolation formula. In this
boundary C that is a rectifiable Jordan curve.
case, P,,(z) is called Newton’s interpolation
If .f(z) is single-valued and holomorphic on D,
polynomial and is given by
then there exists a polynomial rr,( z) of degree
P,(z)=a,+a,(z-z,)+a,(z-z,)(z-z,)+... n that minimizes the integral ~eu(.z)If‘(z)-
n,(z)lPldzl (p>O), where u(z) is a given posi-
+a,(z-z,)...(z-z,),
tive continuous function on C. Moreover,
where ao=S(zl); al=(~(z,)-f(z,))l(z,-~,) I.f‘(z) - n,(z)] < M/R” for some R > 1 (actually
(z,#z,), a, =.f’(z1)(z2=z,); and so on. Suc- {Z,,(Z)} is +overconvergent). If D is a closed
Jordan domain and if $f(z)$ is single-valued and holomorphic in $D$, then there exists a polynomial $\pi_n(z)$ of degree $n$ that minimizes the integral $\iint_D u(z)\,|f(z) - \pi_n(z)|^p\,dS$, where $u(z)$ is a given positive continuous function on $D$. Moreover, $|f(z) - \pi_n(z)| < M/R^n$ for some $R > 1$ on $D$.

I. Approximation by Orthogonal Polynomials on a Curve

Let $C$ be a rectifiable Jordan curve in the complex plane, and let $p_k(z)$ belong to †$L_2(C)$. If $\int_C p_k(z)\,\overline{p_l(z)}\,|dz| = \delta_{kl}$, then $\{p_k(z)\}$ is called an orthonormal system on $C$. Given a holomorphic function $f(z)$ on $D$, we set $a_k = \int_C f(z)\,\overline{p_k(z)}\,|dz|$ and consider the formal series $\sum_{k=0}^{\infty} a_k p_k(z)$. If we denote the $n$th partial sum of this series by $s_n(z)$, then $s_n(z)$ is the least square approximation by a linear combination of $p_0(z), \dots, p_n(z)$. This and other results, such as †Bessel's inequality, the †Riesz-Fischer theorem, etc., are all valid here as in the theory of general †orthogonal systems. In particular, if we take $|z| = R$ as $C$, then $\{1, z, z^2, \dots\}$ is an orthogonal system. Since in this case

$$a_k = \frac{1}{2\pi R^{2k+1}} \int_C f(z)\,\bar{z}^k\,|dz| = \frac{1}{2\pi i} \int_C \frac{f(z)}{z^{k+1}}\,dz,$$

and $s_n(z) = a_0 + a_1 z + \cdots + a_n z^n$, the †Taylor expansion of $f(z)$ coincides with the orthogonal expansion of $f(z)$.

Given a compact domain $D$ and a holomorphic function on $D$, if there exist orthogonal polynomials $p_n(z)$ such that the orthogonal expansion of $f(z)$ with respect to $p_n(z)$ converges to $f(z)$ uniformly on $D$, we say that $\{p_n(z)\}$ belongs to the domain $D$. The problem of existence and determination of such polynomials for any given domain was proposed and first solved by G. Faber. Generalizations were given by G. Szegő, T. Carleman, and Walsh. Roughly speaking, $p_n(z)$ is given by the orthogonalization of the system $\{1, z, z^2, \dots\}$ with respect to the curvilinear integral on $C = \partial D$ or the surface integral on $D$.

J. Numerical Approximation of Functions

The accuracy of the approximation of a given function $f(x)$ by the partial sums of its †Taylor expansion $\sum a_n (x - x_0)^n$ decreases rapidly as the distance $|x - x_0|$ increases. The accuracy of the approximation of $f(x)$ defined on a compact interval $[A, B]$ by a (polynomial) function $\varphi(x)$ can be evaluated by means of the least square approximation, the best approximation with respect to the uniform norm, and so on. The second method is best suited to numerical calculation of functions. To get the best approximating polynomial $p(x) = P_n(x) = \sum_{k=0}^{n} c_k \varphi_k(x)$ of $f(x)$ (→ Section B), we must determine coefficients $c_k$ that satisfy the conditions of Chebyshev's theorem. The first step in this process is the orthogonal development of $f(x)$ by Chebyshev polynomials $\{T_k(x)\}$: $\varphi_n(x) = \sum_{k=0}^{n} a_k T_k(u)$, $u = (x - (A + B)/2)/((B - A)/2)$. The error $|f(x) - \varphi_n(x)|$ is estimated by a constant multiple of $T_{n+1}(u)$: $|f(x) - \varphi_n(x)| < K\,|T_{n+1}(u)|$. This Chebyshev interpolation is actually given by $a_0 = N^{-1} \sum_{i=1}^{N} f(x_i)$, $a_k = 2N^{-1} \sum_{i=1}^{N} f(x_i)\,T_k(u_i)$ ($k = 1, \dots, n$), where $N = n + 1$ and the $u_i = (x_i - (A + B)/2)/((B - A)/2)$ ($i = 1, \dots, N$) are the roots of $T_N(u)$. Let $M$ be the extremum of the error $|f(x) - \varphi_n(x)|$ of such an approximation, and set $f(x_i) - \varphi_n(x_i) = \pm M_i$ ($i = 1, 2, \dots, N$). Consider a function $\Phi_n(x) = \sum a_k' T_k(u)$ satisfying $f(x_i) - \Phi_n(x_i) = \pm M$. Then solve the linear equation $\Phi_n(x_i) - \varphi_n(x_i) = \pm(M - M_i)$ with respect to $\Delta a_k = a_k' - a_k$ and $M$. Repeat this process until $\Delta a_k$ becomes sufficiently small.

A computer can perform the division very quickly, and the rational approximation of a function, for example by its †continued fraction expansion, is often useful.

References

[1] D. Jackson, The theory of approximation, Amer. Math. Soc. Colloq. Publ., 1930.
[2] J. Favard, Sur les meilleurs procédés d'approximation des certains classes des fonctions par des polynomes trigonométriques, Bull. Sci. Math., 61 (1937), 209-224, 243-265.
[3] S. N. Bernshtein (Bernstein), Sur l'ordre de meilleure approximation des fonctions continues par des polynomes de degré donné, Mém. Acad. Roy. Belgique, sér. 2, 4 (1912), 1-104.
[4] A. Zygmund, Smooth functions, Duke Math. J., 12 (1945), 47-75.
[5] N. I. Achieser, Theory of approximation, Ungar, 1956. (Original in Russian, 1947.)
[6] A. Zygmund, Trigonometric series, Cambridge, 1959.
[7] M. Zamansky, Classes de saturation de certains procédés d'approximation des séries de Fourier des fonctions continues et applications à quelques problèmes d'approximation, Ann. Sci. Ecole Norm. Sup., sér. 3, 66 (1949), 19-93.
[8] G. Sunouchi and C. Watari, On the determination of the class of saturation in the theory of approximation of functions II, Tôhoku Math. J., 11 (1959), 480-488.
[9] J. Marcinkiewicz, Sur la divergence des polynomes d'interpolation, Acta Sci. Math. Szeged., 8 (1937), 131-135.
[10] P. Erdős, Some theorems and remarks on interpolations, Acta Sci. Math. Szeged., 12 (1950), 11-17.
[11] G. Grünwald, Über Divergenzerscheinungen der Lagrangeschen Interpolationspolynome der stetigen Funktionen, Ann. Math., (2) 37 (1936), 908-918.
[12] J. L. Walsh, Interpolation and approximation by rational functions in the complex domain, Amer. Math. Soc. Colloq. Publ., 1935; fifth edition, 1981.
[13] S. N. Mergelyan (Mergeljan), On the representation of functions by series of polynomials on closed sets, Amer. Math. Soc. Transl., ser. 1, 3 (1962), 287-293. (Original in Russian, 1951.)
[14] W. E. Sewell, Degree of approximation by polynomials in the complex domain, Ann. Math. Studies 9, Princeton Univ. Press, 1942.
[15] E. Hille, Analytic function theory II, Ginn, 1962.
[16] J. R. Rice, The approximation of functions, Addison-Wesley, I, 1964; II, 1969.
[17] S. J. Karlin and W. J. Studden, Tchebysheff systems with applications in analysis and statistics, Interscience, 1966.

337 (III.4)
Polynomials

A. Polynomials in One Variable

Let $R$ be a commutative †ring and $a_0, a_1, \dots, a_n$ elements of $R$. An expression $f(X)$ of the form

$$f(X) = a_0 + a_1 X + \cdots + a_n X^n \qquad (1)$$

is called a polynomial in a variable $X$ over $R$; if $a_n \ne 0$, the number $n$ is called the degree of the polynomial $f(X)$ and is denoted by $\deg f$. If $a_n = 1$, the polynomial (1) is called a monic polynomial. The totality of polynomials in $X$ over $R$ forms a commutative ring with respect to ordinary addition and multiplication (whose definition will be given later). It is called the ring of polynomials (or the polynomial ring) of $X$ over $R$ and is denoted by $R[X]$. We say that we adjoin $X$ to $R$ to obtain $R[X]$.

B. Polynomials in Several Variables

Let $R[X, Y]$ denote the ring $R[X][Y]$, namely, the ring obtained by adjoining $Y$ to $R[X]$. An element of $R[X, Y]$ can then be expressed as $\sum a_{\mu\nu} X^{\mu} Y^{\nu}$. This expression is called a polynomial in $X$ and $Y$ over $R$. Generally, $R[X_1, \dots, X_m] = R[X_1, \dots, X_{m-1}][X_m]$ is called the polynomial ring in $m$ variables (on $m$ indeterminates) $X_1, \dots, X_m$ over $R$, and its element

$$F(X_1, X_2, \dots, X_m) = \sum a_{\nu_1 \nu_2 \cdots \nu_m} X_1^{\nu_1} X_2^{\nu_2} \cdots X_m^{\nu_m} \qquad (2)$$

($\sum$ denotes a finite sum for nonnegative integral $\nu_i$ beginning with $\nu_1 = \nu_2 = \cdots = \nu_m = 0$) is called a polynomial in $m$ variables $X_1, \dots, X_m$ over $R$. We call each summand a term of the polynomial, $a_{\nu_1 \nu_2 \cdots \nu_m}$ the coefficient and $\nu_1 + \nu_2 + \cdots + \nu_m$ the degree of this term. The greatest degree of terms is called the degree of the polynomial $F$. The term $a_{00\cdots0}$ of degree 0 is called the constant term of $F$. If a polynomial $F$ in $X_1, \dots, X_m$ is composed of terms of the same degree $n$, then $F$ is called a homogeneous polynomial (or form) of degree $n$; a polynomial consisting of a single term, such as $a X_1^{\nu_1} X_2^{\nu_2} \cdots X_m^{\nu_m}$, is called a monomial.

Now let $\alpha_1, \alpha_2, \dots, \alpha_m$ be elements of $R$ (or a commutative ring $S$ containing $R$), and let $F(\alpha_1, \alpha_2, \dots, \alpha_m)$ denote the element of $R$ (or $S$) obtained by substitution of $\alpha_1, \alpha_2, \dots, \alpha_m$ for $X_1, X_2, \dots, X_m$ in $F(X_1, X_2, \dots, X_m)$. It is also called a polynomial in $\alpha_1, \alpha_2, \dots, \alpha_m$. If $F(\alpha_1, \alpha_2, \dots, \alpha_m) = 0$, then $(\alpha_1, \alpha_2, \dots, \alpha_m)$ is called a zero point (in $S$) of the polynomial $F(X_1, X_2, \dots, X_m)$ (or a solution of the algebraic equation $F(X_1, \dots, X_m) = 0$). In the case of one variable, it is called a root of $F(X_1)$ (or of $F(X_1) = 0$).

C. Polynomial Rings

Addition and multiplication in $R[X]$ are defined by

$$\sum_i a_i X^i + \sum_i b_i X^i = \sum_i (a_i + b_i) X^i, \qquad \Bigl(\sum_i a_i X^i\Bigr)\Bigl(\sum_j b_j X^j\Bigr) = \sum_k \Bigl(\sum_{i+j=k} a_i b_j\Bigr) X^k.$$

A polynomial $f(X) \in R[X]$ can be regarded as a function of a commutative ring $R'$ containing $R$ into itself such that $c \mapsto f(c)$. In this sense, $f(X) + g(X)$ and $f(X)g(X)$ are the functions such that $c \mapsto f(c) + g(c)$ and $c \mapsto f(c)g(c)$, respectively.

It holds that $\deg(f(X) + g(X)) \le \max\{\deg f(X), \deg g(X)\}$, $\deg f(X)g(X) \le \deg f(X) + \deg g(X)$. If $R$ is an †integral domain, then the latter inequality is an equality, and therefore $R[X]$ is an integral domain. For these inequalities and for convenience elsewhere, we define the degree of 0 to be indefinite.

Assume that $R$ is a field. For given $f, g \in R[X]$ ($\deg g \ge 1$), we can find unique $q, r \in R[X]$ such that $f = gq + r$ and $\deg r < \deg g$ or $r = 0$ (division algorithm). This $q$ is called the integral quotient of $f$ by $g$, and $r$ is called the remainder of $f$ divided by $g$. The same fact remains true in the general $R[X]$ if $g(X)$ is monic. (→ 369 Rings of Polynomials).
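
The division algorithm of Section C can be illustrated by a short Python sketch. It is not part of the original article: the function name `poly_divmod`, the representation of a polynomial as a list of coefficients in ascending order of degree, and the use of `fractions.Fraction` to model the field $\mathbf{Q}$ are all assumptions made only for this example.

```python
from fractions import Fraction


def _trim(p):
    """Drop trailing zero coefficients so that the last entry is the leading one."""
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_divmod(f, g):
    """Return (q, r) with f = g*q + r and deg r < deg g (r == [] means r = 0).

    Polynomials are lists of coefficients in ascending order of degree,
    e.g. [4, 0, -2, 1] stands for 4 - 2X^2 + X^3. Coefficients are
    converted to Fraction, i.e. the computation is done over Q.
    """
    r = _trim([Fraction(c) for c in f])
    g = _trim([Fraction(c) for c in g])
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 0)
    while len(r) >= len(g):
        shift = len(r) - len(g)      # degree of the next quotient term
        coeff = r[-1] / g[-1]        # ratio of leading coefficients
        q[shift] = coeff
        for i, gc in enumerate(g):   # subtract coeff * X^shift * g from r
            r[i + shift] -= coeff * gc
        _trim(r)                     # the leading term has cancelled
    return q, r


# X^3 - 2X^2 + 4 divided by the monic polynomial X - 3
quotient, remainder = poly_divmod([4, 0, -2, 1], [-3, 1])
```

Here `quotient` represents $X^2 + X + 3$ and `remainder` is the constant 13. When the divisor is monic, as in this call, every division by the leading coefficient is a division by 1, so integer inputs give integer outputs; this mirrors the remark that the algorithm remains valid over a general ring $R$ for monic $g$.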
D. Factorization into Primes

Let $k$ be an integral domain. Since $k[X]$ and hence $k[X_1, \dots, X_m]$ are integral domains, we can define the concepts concerning divisibility (such as a divisor, a multiple, etc.) (→ 67 Commutative Rings). If $k$ is a †unique factorization domain, then so are $k[X]$ and $k[X_1, \dots, X_m]$. A polynomial over $k$ is said to be primitive if the greatest common divisor of all the coefficients is equal to 1. Every polynomial over $k$ can be uniquely expressed as a product of some primitive polynomials and an element of $k$; a product of primitive polynomials is primitive (Gauss's theorem).

If $k$ is a field, then $k[X]$ and $k[X_1, \dots, X_m]$ are unique factorization domains. Furthermore, to find the greatest common divisor $(f, g)$ of $f, g \in k[X]$, we can use the Euclidean algorithm, that is, apply the division algorithm repeatedly to obtain

$$f = g q_1 + r_1, \quad g = r_1 q_2 + r_2, \quad r_1 = r_2 q_3 + r_3, \ \dots, \qquad \deg g > \deg r_1 > \deg r_2 > \cdots,$$

so that after a finite number of steps we attain $r_{\nu-1} = r_\nu q_{\nu+1}$ ($r_{\nu+1} = 0$). Then $r_\nu = (f, g)$. Accordingly, $k[X]$ is a †principal ideal domain. This algorithm is applied to Euclid rings (→ 67 Commutative Rings L).

E. The Remainder Theorem

Let $k$ be an integral domain, $f(X) \in k[X]$, and let $g(X) = X - \alpha$ ($\alpha \in k$). Then using the division algorithm, we get

$$f(X) = (X - \alpha)\,q(X) + r, \qquad q(X) \in k[X], \ r \in k.$$

Therefore, $f(\alpha) = r$; that is, the remainder of $f(X)$ divided by $X - \alpha$ is equal to $f(\alpha)$. This is called the remainder theorem. If $f(\alpha) = 0$, then $f(X)$ is divisible by $X - \alpha$ in $k[X]$.

F. Irreducible Polynomials

Let $k$ be a field. A polynomial $f(X) \in k[X]$ of degree $n$ is said to be reducible over $k$ if $f$ is divisible by a polynomial of degree $\nu < n$ in $k[X]$ ($\nu \ne 0$); otherwise, it is said to be irreducible over $k$. Any polynomial of degree 1 is irreducible. A polynomial $f$ is a †prime element of $k[X]$ if and only if $f$ is irreducible over $k$.

Let $I$ be a unique factorization domain. If $f(X)$ is a polynomial (1) in $I[X]$ such that for a prime element $p$ in $I$, $a_n \not\equiv 0 \pmod{p}$, $a_{n-1} \equiv a_{n-2} \equiv \cdots \equiv a_0 \equiv 0 \pmod{p}$ but $a_0 \not\equiv 0 \pmod{p^2}$, then $f(X)$ is irreducible over the field of quotients of $I$ (Eisenstein's theorem). If a polynomial (2) in $m$ variables over an †algebraic number field $k$ is irreducible, we can obtain an irreducible polynomial in $X_1, \dots, X_p$ ($0 < p < m$) from the polynomial $F(X_1, \dots, X_m)$ by assigning appropriate values in $k$ to $X_{p+1}, \dots, X_m$ (Hilbert's irreducibility theorem, J. Reine Angew. Math., 110 (1892)). These two theorems have been generalized in many ways and given precise formulations. In Hilbert's irreducibility theorem, the algebraic number field may be replaced, for example, by any infinite field that is †finitely generated over its †prime field (K. Dörge, W. Franz, E. Inaba).

G. Derivatives

Given a polynomial

$$f(X_1, \dots, X_m) = \sum a_{\nu_1 \nu_2 \cdots \nu_m} X_1^{\nu_1} X_2^{\nu_2} \cdots X_m^{\nu_m}$$

over a field $k$, we define the (formal) derivative of $f$ with respect to $X_i$ as $f_i(X_1, \dots, X_m) = \sum \nu_i\, a_{\nu_1 \nu_2 \cdots \nu_m} X_1^{\nu_1} \cdots X_i^{\nu_i - 1} \cdots X_m^{\nu_m}$ and denote it by $\partial f/\partial X_i$. The map $f \mapsto \partial f/\partial X_i$ is called the (formal) derivative with respect to $X_i$. In particular, if $m = 1$, then $\partial f/\partial X$ is denoted by $df/dX$. The usual rules of †derivatives also hold for the formal derivative. If $df/dX = 0$ for an irreducible polynomial $f(X)$ in $k[X]$, then $f(X)$ is said to be inseparable; otherwise, $f(X)$ is separable. If the †characteristic of the field $k$ is 0, then every irreducible polynomial $f(X)$ ($\ne 0$) is separable. When $k$ is of characteristic $p \ne 0$, an irreducible polynomial $f(X)$ is inseparable if and only if we can write $f(X) = g(X^p)$.

H. Rational Expressions

The †field of quotients of the polynomial ring $k[X_1, \dots, X_n]$ over a field $k$ is denoted by $k(X_1, \dots, X_n)$ and is called the field of rational expressions (or field of rational functions) in variables $X_1, \dots, X_n$ over $k$. Its element is called a rational expression in $X_1, \dots, X_n$. It can be written as a quotient of one polynomial $f(X_1, \dots, X_n)$ by another polynomial $g(X_1, \dots, X_n) \ne 0$. Also, an expression $f(\alpha_1, \dots, \alpha_n)/g(\alpha_1, \dots, \alpha_n)$ obtained by replacing $X_1, \dots, X_n$ with elements $\alpha_1, \dots, \alpha_n$ of $k$ in the above expression is called a rational expression in $\alpha_1, \dots, \alpha_n$ (provided that $g(\alpha_1, \dots, \alpha_n) \ne 0$).

I. Symmetric Polynomials and Alternating Polynomials

Let $f(X_1, \dots, X_n)$ be a polynomial in variables $X_1, \dots, X_n$ over an integral domain $I$. If
$f(X_1, \dots, X_n)$ is invariant under every permutation of $X_1, \dots, X_n$, it is called a symmetric polynomial (or symmetric function) of $X_1, \dots, X_n$. If $f(X_1, \dots, X_n)$ is transformed into $-f(X_1, \dots, X_n)$ by every †odd permutation of $X_1, \dots, X_n$, it is called an alternating polynomial (or alternating function). Also, an expression $f(\alpha_1, \dots, \alpha_n)$ obtained from $f(X_1, \dots, X_n)$ by replacing $X_1, \dots, X_n$ with elements $\alpha_1, \dots, \alpha_n$ of $I$ is called a symmetric (alternating) function of $\alpha_1, \dots, \alpha_n$ if $f(X_1, \dots, X_n)$ is symmetric (alternating).

Let the coefficient of $X^{n-k}$ in the expansion of $(X - X_1) \cdots (X - X_n)$ be denoted by $(-1)^k \sigma_k$. Then we have $\sigma_1 = \sum X_i = X_1 + \cdots + X_n$, $\sigma_2 = \sum X_i X_j = X_1 X_2 + X_1 X_3 + \cdots + X_{n-1} X_n$, ..., $\sigma_n = X_1 X_2 \cdots X_n$. Obviously, these are symmetric polynomials of $X_1, \dots, X_n$. Moreover, for every element $\varphi$ of the polynomial ring $I[Y_1, Y_2, \dots, Y_n]$, $\varphi(\sigma_1, \sigma_2, \dots, \sigma_n)$ is a symmetric polynomial of $X_1, \dots, X_n$. Conversely, every symmetric polynomial of $X_1, \dots, X_n$ can be uniquely expressed as a polynomial $\varphi(\sigma_1, \sigma_2, \dots, \sigma_n)$. Thus the totality of symmetric polynomials of $X_1, \dots, X_n$ is identical with the ring $I[\sigma_1, \sigma_2, \dots, \sigma_n]$. This is called the fundamental theorem on symmetric polynomials, and $\sigma_1, \sigma_2, \dots, \sigma_n$ are called elementary symmetric polynomials (or elementary symmetric functions). For example, for $s_\nu = \sum X_i^{\nu}$ ($\nu = 1, 2, \dots$), we have $s_1 = \sigma_1$, $s_2 = \sigma_1^2 - 2\sigma_2$, $s_3 = \sigma_1^3 - 3\sigma_1\sigma_2 + 3\sigma_3$, $s_4 = \sigma_1^4 - 4\sigma_1^2\sigma_2 + 2\sigma_2^2 + 4\sigma_1\sigma_3 - 4\sigma_4$.

Concerning the elementary symmetric polynomials and the $s_\nu$, we have the relations $s_\nu - \sigma_1 s_{\nu-1} + \sigma_2 s_{\nu-2} - \cdots + (-1)^{\nu-1} \sigma_{\nu-1} s_1 + (-1)^{\nu} \nu \sigma_\nu = 0$ ($\nu = 1, 2, \dots, n$), and $s_\nu - \sigma_1 s_{\nu-1} + \cdots + (-1)^n \sigma_n s_{\nu-n} = 0$ ($\nu = n + 1, n + 2, \dots$) (Newton's formulas).

Let $p(X_1, \dots, X_n) = (X_1 - X_2)(X_1 - X_3) \cdots (X_1 - X_n)(X_2 - X_3) \cdots (X_{n-1} - X_n)$ be the product of $n(n-1)/2$ differences between $X_1, \dots, X_n$. Then the polynomial $p$ is invariant under even permutations of $X_1, \dots, X_n$, and $p$ becomes $-p$ under odd permutations. Hence $p$ is an alternating polynomial of $X_1, \dots, X_n$. It is called the simplest alternating polynomial of these variables. Because of its particular expression, $p$ is also called the difference product of $X_1, \dots, X_n$. If the characteristic of $I$ is different from 2, an alternating polynomial $f$ is divisible by the simplest alternating polynomial $p$; it can be written as $f = ps$, where $s$ stands for a symmetric polynomial.

J. Discriminants

The square $D(X_1, \dots, X_n) = p^2(X_1, \dots, X_n)$ of the simplest alternating polynomial $p$ is a symmetric polynomial, and it is therefore a polynomial in $\sigma_1, \dots, \sigma_n$. $D(\alpha_1, \alpha_2, \dots, \alpha_n) = 0$ gives a criterion for the condition that some of $\alpha_1, \dots, \alpha_n$ are equal. If $\alpha_1, \dots, \alpha_n$ are the roots of an †algebraic equation $a_0 X^n + a_1 X^{n-1} + \cdots + a_n = 0$ of degree $n$, then $D(\alpha_1, \dots, \alpha_n)$ is called the discriminant of the equation. It can be expressed in terms of coefficients $a_0, a_1, \dots, a_n$ of the equation. For instance, if $n = 2$, we have $a_0^2 D = a_1^2 - 4a_0 a_2$; if $n = 3$, we have $a_0^4 D = a_1^2 a_2^2 + 18 a_0 a_1 a_2 a_3 - 4 a_0 a_2^3 - 4 a_1^3 a_3 - 27 a_0^2 a_3^2$.

References

[1] B. L. van der Waerden, Algebra I, Springer, seventh edition, 1966.
[2] N. Bourbaki, Eléments de mathématique, Algèbre, ch. 4, Actualités Sci. Ind., 1102b, Hermann, second edition, 1959.
[3] A. G. Kurosh, Lectures on general algebra, Chelsea, 1970. (Original in Russian, 1955.)
[4] R. Godement, Cours d'algèbre, Hermann, 1963.
For Hilbert's irreducibility theorem,
[5] S. Lang, Diophantine geometry, Interscience, 1962, ch. 8.

338 (X.28)
Potential Theory

A. Newtonian Potential

In dynamics, a potential means a function $u$ of $n$ variables $x_1, \dots, x_n$ such that $-\operatorname{grad} u = -(\partial u/\partial x_1, \dots, \partial u/\partial x_n)$ gives a field of force in the $n$-dimensional ($n \ge 2$) Euclidean space $\mathbf{R}^n$. Given a point $P$ in $\mathbf{R}^n$ and a measure $\mu$, the functions $u(P)$ given by the integrals

$$u(P) = -\int \log \overline{PQ}\; d\mu(Q), \quad n = 2,$$

$$u(P) = \int \overline{PQ}^{\,2-n}\, d\mu(Q), \quad n \ge 3,$$

are typical examples of potential functions. They are called the logarithmic potential and Newtonian potential, respectively. However, some authors mean by Newtonian potential the function $u(P) = \int \overline{PQ}^{\,-1}\, d\mu(Q)$ in $\mathbf{R}^3$. Usually, the measure $\mu$ is taken to be a nonnegative †Radon measure with compact †support. These potentials are †superharmonic in $\mathbf{R}^n$ and harmonic outside the support of $\mu$. Conversely, any harmonic function defined on a domain in $\mathbf{R}^n$ can be expressed as the sum of a potential of a single layer and a potential of a double layer (defined in the next paragraph). Because of this close relation between potentials and harmonic functions, sometimes potential
theory means the study of harmonic functions (→ 193 Harmonic Functions and Subharmonic Functions). (For the representation by potentials of superharmonic functions → 193 Harmonic Functions and Subharmonic Functions S.)

Suppose that the measure $\mu$ of $\mathbf{R}^3$ satisfies $d\mu = \rho\,d\tau$ with sufficiently smooth density $\rho$ and volume element $d\tau$. Then the Newtonian potential $u$ of $\mu$ satisfies Poisson's equation $\Delta u = -4\pi\rho$. If the support of $\mu$ is contained in a surface $S$ and $d\mu = \rho\,d\sigma$ with density $\rho$ and surface element $d\sigma$, then the potential $u$ of $\mu$ is called the potential of a single layer (or simple distribution). If $\rho$ is continuous on $S$, then $u$ is continuous in the whole space, and the †directional derivative of $u$ at a point $P$ in the direction of the normal line to $S$ at $P_0$ tends to

$$-2\pi\rho(P_0) + \int_S \rho(Q)\, \frac{\partial}{\partial n_P} \frac{1}{\overline{PQ}} \bigg|_{P = P_0} d\sigma(Q)$$

as $P$ approaches $P_0$ along the normal line. Therefore, as $P$ moves on the line, the directional derivative jumps by $-4\pi\rho(P_0)$ at $P_0$. If $\rho$ satisfies the †Hölder condition at $P_0 \in S$, then the derivative at $P$ in the direction of any fixed tangent line at $P_0$ has a finite limit as $P$ tends to $P_0$ along the normal line. The integral

$$u(P) = \int_S \rho(Q)\, \frac{\partial}{\partial n_Q} \frac{1}{\overline{PQ}}\, d\sigma(Q)$$

is called the potential of a double layer (or double distribution). If $\rho$ is continuous on $S$, then the limits at $P_0$ of $u$ from the two directions along the normal line at $P_0$ exist and are $2\pi\rho(P_0) + u(P_0)$ and $-2\pi\rho(P_0) + u(P_0)$. If, further, $\rho$ is of class $C^2$ on $S$, then each partial derivative of $u$ has a finite limit as $P$ tends to a point of $S$.

B. Generalized Potential

The classical notion of potentials is generalized as follows: Let $\Omega$ be a space supplied with a measure $\mu$ ($\ge 0$) and $\Phi(P, Q)$ a real-valued function on the product space $\Omega \times \Omega$. When the integral $\int \Phi(P, Q)\,d\mu(Q)$ is well defined at each point $P \in \Omega$, it is called the potential of $\mu$ with kernel $\Phi$ and is denoted by $\Phi(P, \mu)$ or $\Phi\mu(P)$. The function $\check{\Phi}(P, Q) = \Phi(Q, P)$ is called the adjoint kernel of $\Phi$. When

$$(\mu, \nu) = \iint \Phi(P, Q)\, d\mu(Q)\, d\nu(P)$$

exists for measures $\mu, \nu \ge 0$, the value is called the mutual energy of $\mu$ and $\nu$. In particular, $(\mu, \mu)$ is called the energy of $\mu$. The definition of potential given above may be too general, and some restrictions are called for. We assume that $\Omega$ is a †locally compact Hausdorff space; $\Phi$ is a †lower semicontinuous function on $\Omega \times \Omega$ satisfying $-\infty < \Phi \le \infty$; and $\mu$, $\nu$, and $\lambda$ are nonnegative Radon measures with compact support in $\Omega$. In particular, when $\Omega = \mathbf{R}^n$, the potential with the kernel $\Phi(P, Q) = \overline{PQ}^{\,-\alpha}$ ($0 < \alpha < n$) is called a potential of order $\alpha$ (sometimes of order $n - \alpha$) or a Riesz potential.

C. The Maximum Principle and the Continuity Principle

Let $\Omega = \mathbf{R}^n$, and let $\Phi(P, \mu)$ be the kernel of the Newtonian potential. Then $\Phi(P, \mu)$ satisfies the following principles: (1) Frostman's maximum principle (first maximum principle): $\sup_{P \in \Omega} \Phi(P, \mu) \le \sup_{P \in S_\mu} \Phi(P, \mu)$ for any $\mu$, where $S_\mu$ is the support of $\mu$. (2) Ugaheri's maximum principle (dilated maximum principle): There is a constant $c > 0$ such that $\sup_{P \in \Omega} \Phi(P, \mu) \le c \sup_{P \in S_\mu} \Phi(P, \mu)$ for any $\mu$. (3) A variation of Ugaheri's maximum principle: Given any compact set $K \subset \Omega$, there exists a constant $c$ which may depend on $K$ such that $\sup_{P \in K} \Phi(P, \mu) \le c \sup_{P \in S_\mu} \Phi(P, \mu)$ for any $\mu$ with $S_\mu \subset K$. (4) Upper boundedness principle: If $\Phi(P, \mu)$ is bounded from above on $S_\mu$, then it is bounded on $\Omega$ also. (5) For any compact set $K \subset \Omega$ and any $\mu$ with $S_\mu \subset K$, if $\Phi(P, \mu)$ is bounded above on $S_\mu$, then it is bounded on $K$ also. (6) Continuity principle: If $\Phi(P, \mu)$ is continuous as a function on $S_\mu$, then it is also continuous in $\Omega$. Generally, the relations shown in Fig. 1 hold among principles (1)-(6), where (a)→(b) means that (a) implies (b), and (c)↛(d) means the negation of (c)→(d).

Fig. 1 (diagram of implications among principles (1)-(6))

If the continuity principle holds for a general kernel $\Phi(P, \mu)$ of a potential and if for any $\mu$ there is a sequence $\{P_k\}$ of points in $\Omega - S_\mu$ that has an accumulation point on $S_\mu$ and along which $\Phi(P_k, \mu) \to \sup_{P \in \Omega} \Phi(P, \mu)$, then (1) holds also. The second condition is valid, for instance, when $\Omega = \mathbf{R}^n$, $\Phi(P, \mu)$ is †subharmonic in $\mathbf{R}^n - S_\mu$, and $\limsup \Phi(P, \mu) \le \sup_{Q \in S_\mu} \Phi(Q, \mu)$ as $P$ tends to the point at infinity. T. Ugaheri, G. Choquet, and N. Ninomiya studied (2) and (6). Ugaheri showed that for any nonnegative decreasing function $\varphi(r)$ defined in $[0, \infty)$ and satisfying $\varphi(0) = \infty$, the kernel $\Phi(P, Q) = \varphi(\overline{PQ})$ satisfies (2) in $\mathbf{R}^n$. M. Ohtsuka established
that (6)→(5) and (5)↛(6) in general and proved (5)→(6) in the special case where $\Phi$ is continuous on $\Omega \times \Omega$ in the wider sense (i.e., $\Phi$ may have $\infty$ as its value) and $\Phi$ is finite outside the diagonal set of $\Omega \times \Omega$. The examples in Sections F, I, and J show that the potentials with kernels satisfying a weak condition such as (6) possess a number of important properties. We note that (6) does not necessarily hold in general. (For literature on (1)-(6) and other related principles → [18].)

D. The Energy Principle

Denote by $E$ the class of all measures of finite energy. A symmetric kernel is called positive definite (or of positive type) if $(\mu - \nu, \mu - \nu) = (\mu, \mu) + (\nu, \nu) - 2(\mu, \nu) \ge 0$ for any $\mu, \nu \in E$. If the equality $(\mu - \nu, \mu - \nu) = 0$ always implies $\mu = \nu$, then the kernel is said to satisfy the energy principle. Some characterizations for a kernel to be of positive type or to satisfy the energy principle were given by Ninomiya. Using them, he showed that a symmetric kernel that satisfies Frostman's maximum principle or the domination principle (→ Section L) and a certain additional condition is of positive type. Choquet and Ohtsuka generalized these definitions and results [18].

E. Topologies on Classes of Measures

Let $C_0$ be the space of continuous functions with compact support in $\Omega$ and $M_0^+$ be the class of measures in $\Omega$. The †seminorms $\mu - \nu \mapsto |\int f\,d\mu - \int f\,d\nu|$ ($f \in C_0$) define the vague topology on $M_0^+$. The class of unit distributions can be topologized by the vague topology, which induces a topology on $\Omega$ itself. This topology coincides with the original topology in $\Omega$. A subclass $M$ of $M_0^+$ is relatively compact with respect to the vague topology if on every compact set in $\Omega$, the values of the measures of $M$ are bounded. Denote by $L$ the class of measures $\lambda$ such that $(\lambda, \mu)$ is finite for all $\mu \in M_0^+$, and define the fine topology on $M_0^+$ by the seminorms $\mu - \nu \mapsto |(\lambda, \mu) - (\lambda, \nu)|$ ($\lambda \in L$). This topology was introduced by H. Cartan [4] for the Newtonian kernel. It is the weakest topology that makes each $\Phi(P, \lambda)$, $\lambda \in L$, continuous. Further, when the kernel is of positive type, the weak topology is defined on $E$ by the seminorms $\mu - \nu \mapsto |(\lambda, \mu) - (\lambda, \nu)|$ ($\lambda \in E$). The strong topology is defined on $E$ by the seminorm $\sqrt{(\mu - \nu, \mu - \nu)}$.

For the Newtonian kernel, Cartan [5] showed that vague < fine < weak < strong on $E$, where vague < fine, for instance, means that the fine topology is stronger than the vague topology. He proved that the fine, weak, and strong convergences are equivalent for any sequence $\{\mu_n\}$ of measures with bounded energies. B. Fuglede [16] called a kernel of positive type consistent when any †Cauchy net with respect to the strong topology that converges vaguely to a measure converges strongly to the same measure. This notion is used to give conditions for $E$ to be †complete with respect to the strong topology [11, 18]. Moreover, Fuglede called a consistent kernel satisfying the energy principle perfect, and studied the cases where †convolution kernels on a locally compact topological group are consistent or perfect [11]. For instance, $\overline{PQ}^{\,-\alpha}$ ($0 < \alpha < n$) in $\mathbf{R}^n$ and the Bessel kernel, which was studied by N. Aronszajn and K. T. Smith, are perfect.

F. Convergence of Sequences of Potentials

We are concerned with determining when a family of potentials corresponding to a class of measures $\{\mu_\omega\}$ with indices $\omega$ in a directed set converges. If all $S_{\mu_\omega}$ are contained in a fixed compact set and $\mu_\omega$ converges vaguely to $\mu_0$, then $\liminf \Phi(P, \mu_\omega) \ge \Phi(P, \mu_0)$ in $\Omega$. If, moreover, $\Phi$ is continuous in the wider sense and both $\Phi$ and $\check{\Phi}$ satisfy the continuity principle, then equality holds quasi-everywhere (q.e.) in $\Omega$ in this inequality [18]; we now define the notion q.e. First, for a nonempty compact set $K$ in $\Omega$, define $W(K)$ to be $\inf (\mu, \mu)$, where $S_\mu \subset K$ and $\mu(K) = 1$, and $W(\emptyset)$ to be $\infty$ for the empty set $\emptyset$. Next, set $W_i(X) = \inf_{K \subset X} W(K)$ for an arbitrary set $X$ in $\Omega$ and $W_e(X) = \sup_{X \subset G} W_i(G)$, where $G$ is an open set in $\Omega$. When a property holds except on a set $X$ such that $W_e(X) = \infty$ (resp. $W_i(X) = \infty$), we say that the property holds quasi-everywhere (q.e.) (resp. nearly everywhere (n.e.)). The terms q.e. and n.e. are also used in the theory of †capacity, although their meaning is not the same as here. (For results on the convergence of sequences of potentials → [18].)

G. Thin Sets

A set $X \subset \Omega$ is called thin at $P_0$ when either $P_0$ is an isolated point of the set $X \cup \{P_0\}$ with respect to the original topology of $\Omega$ or there exists a measure $\mu$ such that $\liminf \Phi(P, \mu) > \Phi(P_0, \mu)$ as $P \in X - \{P_0\}$ tends to $P_0$. If $P_0$ is an isolated point of $X \cup \{P_0\}$ with respect to the topology weakest among those stronger than both the original topology and the fine topology, then $X$ is thin with respect to the adjoint kernel $\check{\Phi}$. The converse is true in a special case [5]. The notion of thinness was
introduced in 1940 by M. Brelot and investigated in detail in [3]. Let $\varphi(r)$ be a positive decreasing function. Suppose there exist positive numbers $r_0$, $\delta$, $a$ such that $\varphi(r) \le a\varphi((1+\delta)r)$ in $0 < r < r_0$. Assume that $\Omega$ is a metric space with distance $\rho$, and take $\varphi(\rho(P, Q))$ as the kernel $\Phi(P, Q)$. Then a necessary and sufficient condition for $X$ to be thin at $P_0$ is $\sum_{j=1}^{\infty} s^j / W_e(X_j) < \infty$ ($s > 1$) [19], where $X_j = \{P \in X \mid s^j \le \varphi(\rho(P, P_0)) < s^{j+1}\}$. This criterion was obtained by N. Wiener in 1924 and utilized to give a condition for a boundary point $P_0$ of a domain $D$ in $\mathbf{R}^3$ to be †regular with respect to the †Dirichlet problem for $D$. In this situation, $P_0$ is regular if and only if the complement of $D$ is not thin at $P_0$. When every compact subset of $X$ is thin at $P_0$, $X$ is called internally thin at $P_0$. A necessary and sufficient condition for $X$ to be internally thin at $P_0$ is $\sum_j s^j / W_i(X_j) < \infty$.

H. Polar Sets

Brelot (1941) called a set $A$ polar when there is a measure $\mu$ for which $\Phi(P, \mu) = \infty$ on $A$. Consider $V_i(\Omega, K)$ ($= V(K, \Omega)$) as defined in 48 Capacity C, and define $V_e(\Omega, X)$ by $\sup_{G \supset X} \inf_{K \subset G} V_i(\Omega, K)$ for an arbitrary set $X$, where $G$ is an open set. Then for any $\mu$, $X = \{P \mid \Phi(P, \mu) = \infty\}$ is a $G_\delta$ set for which the value of $V_e(\Omega, X)$ is infinite. Conversely, given a $G_\delta$ set $A$ of †Newtonian outer capacity zero in $\mathbf{R}^n$ ($n \ge 3$), there is a measure $\mu$ such that the set of points where the Newtonian potential of $\mu$ is equal to $\infty$ coincides with $A$ and $\mu(\mathbf{R}^n - A) = 0$ (Choquet [7]). This result is called Evans's theorem (or the Evans-Selberg theorem) (→ 48 Capacity) in the special case where $A$ is compact.

I. Quasicontinuity

A function $f$ in $\Omega$ is called quasicontinuous if there is an open subset $G$ in $\Omega$ of arbitrarily small capacity such that the restriction of $f$ to $\Omega - G$ is continuous. Naturally, quasicontinuity depends on the definition of capacity. Suppose that whenever the potential of a measure $\mu$ with kernel $\Phi$ is continuous as a function on $S_\mu$, it is quasicontinuous in $\Omega$; $\Phi$ is then said to satisfy the quasicontinuity principle. Assuming in addition that $\Phi$ is positive symmetric and taking $1/U(\Omega, G)$ as the capacity of $G$, we find that every potential is quasicontinuous in $\Omega$ (M. Kishi). A similar result is valid for a nonsymmetric kernel if the continuity principle is assumed (Choquet [6]).

J. The Gauss Variational Problem

Given a compact set $K$ and a function $f$ on $K$, the problem of minimizing the Gauss integral $(\mu, \mu) - 2\int f\,d\mu$ for a measure $\mu$ such that $S_\mu \subset K$ is called the Gauss variational problem. When an additional condition is imposed on $\mu$, the problem is called conditional. Among many results obtained for this problem [18], the following is typical: If $\Phi$ is symmetric, $K$ supports a nonzero measure of finite energy, and $f$ is finite upper semicontinuous on $K$, then there exists a $\mu$ such that $f(P) \le \Phi(P, \mu)$ n.e. on $K$ and $f(P) \ge \Phi(P, \mu)$ on $S_\mu$. When $\Phi$ is not symmetric, the same relations hold for some $\mu$ if $\Phi$ is positive and the adjoint kernel $\check{\Phi}$ satisfies the continuity principle (Kishi [15]), although the method using Gauss variation is not applicable. When $\Phi$ is symmetric and of positive type, the Gauss integral with $f(P) = \Phi(P, \nu)$ ($\nu \in E$) is equal to $\|\mu - \nu\|^2 - \|\nu\|^2$, and the minimizing problem is equivalent to finding the projection of $\nu$ to $\{\mu \in E \mid S_\mu \subset K\}$. In some cases, this projection is equal to the measure obtained by the balayage of $\nu$ to $K$ (→ Section L).

K. Equilibrium Mass Distributions

A unit measure $\mu$ supported by a compact set $K$ is called an equilibrium mass distribution on $K$ if $\Phi(P, \mu)$ is equal to a constant $a$ n.e. on $K$ and $\Phi(P, \mu) \le a$ in $\Omega$. The kernel is said to satisfy the equilibrium principle if there exists an equilibrium mass distribution on every compact set. If $a > 0$, $1/a$ can be regarded as a kind of capacity, and $\mu/a$ is called a capacitary mass distribution. Corresponding to †inner and outer capacities, inner and outer capacitary mass distributions and their coincidence can be discussed [11]. When $\Phi$ is symmetric, Frostman's maximum principle is equivalent to the equilibrium principle.

L. The Sweeping-Out Principle

A kernel is said to satisfy the sweeping-out principle (or balayage principle) if for any compact set $K$ and measure $\mu$ there exists a measure $\nu$ supported by $K$ such that $\Phi(P, \nu) = \Phi(P, \mu)$ n.e. on $K$ and $\Phi(P, \nu) \le \Phi(P, \mu)$ in $\Omega$. When we find such a $\nu$, we say that we sweep out $\mu$ to $K$, and finding $\nu$ is called a sweeping-out process (or balayage). This terminology originated in the classical process for the Newtonian potential in which the exterior of $K$ is covered by a countable number of balls and the masses inside the balls are repeatedly swept out onto the spherical surfaces. For any general kernel, the balayage principle implies

the domination principle (also called Cartan's maximum principle), which asserts that if the inequality $\Phi(P, \mu) \le \Phi(P, \nu)$ is valid on $S_\mu$ for $\mu \in E$ and any $\nu$, then the same inequality holds in $\Omega$. The converse is true if $\Phi$ is positive, symmetric, continuous in the wider sense, and finite outside the diagonal set. In contrast to the domination principle, $\Phi$ is said to satisfy the inverse domination principle if the inequality $\Phi(P, \mu) \le \Phi(P, \nu)$ on $S_\nu$ for $\mu \in E$ and any $\nu$ implies the same inequality in $\Omega$. In a special case, the domination principle implies Frostman's maximum principle [17]. For the equilibrium and domination principles for nonsymmetric kernels → [14].

Corresponding to inner and outer capacitary mass distributions, we can examine inner and outer balayage mass distributions and inner and outer Gauss variational problems and their coincidences [5, 18]. With respect to the Newtonian potential, a point $P$ is called an internally (externally) irregular point of $X$ if the inner (outer) balayage mass distribution to $X$ of the unit measure $\varepsilon_P$ at $P$ is different from $\varepsilon_P$. $X$ is thin (internally thin) at $P$ if and only if $P$ is an externally (internally) irregular point of $X$ (Cartan [5]).

M. Other Principles

A kernel $\Phi$ is said to satisfy the uniqueness principle if $\mu = \nu$ follows from the equality $\Phi(P, \mu) = \Phi(P, \nu)$, which is valid n.e. in $\Omega$. Ninomiya and Kishi studied this principle. A kernel $\Phi$ is said to satisfy the lower envelope principle if given $\mu$ and $\nu$, there is a $\lambda$ such that $\Phi(P, \lambda) = \min(\Phi(P, \mu), \Phi(P, \nu))$. If $\Phi$ satisfies the domination principle and the adjoint kernel $\check{\Phi}$ satisfies the continuity principle, then $\Phi$ satisfies the lower envelope principle on every compact set considered as a space. Conversely, if $\Phi$ satisfies the lower envelope principle on every compact set considered as a space, then $\Phi$ satisfies the domination principle or the inverse domination principle under some additional conditions (Kishi). A kernel is said to satisfy the complete maximum principle if the inequality $\Phi(P, \mu) \le \Phi(P, \nu) + a$ on $S_\mu$ implies the same inequality in $\Omega$ for $\mu \in E$, any $\nu$, and $a \ge 0$ (Cartan and J. Deny, 1950). This principle implies both Frostman's maximum principle and the domination principle. Potentials of order $\alpha$ ($n - 2 \le \alpha < n$) in $\mathbf{R}^n$ and the Yukawa potential with kernel $\overline{PQ}^{-1} \exp(-\lambda\,\overline{PQ})$ in $\mathbf{R}^3$ satisfy the complete maximum principle. Relations between this principle and some other principles were studied by Kishi [14].

While all principles discussed so far are global, Choquet and Ohtsuka made a local study.

N. Diffusion Kernels

By means of the bilinear form $\int f\,d\mu$ ($f \in C$, $\mu \in M_0$), we now introduce weak topologies in the space $C$ of continuous functions in $\Omega$ and in the class $M_0$ of Radon measures of general sign with compact support. Similarly, we introduce weak topologies in $C_0$, and in the class $M$ of measures of general sign with not necessarily compact support. A positive linear mapping $G$ of $M_0$ into $M$ that is continuous with respect to these two weak topologies is called a diffusion kernel. The linear mapping $G^*$ of $C_0$ into $C$ that is determined by $\int f\,dG\mu = \int G^*f\,d\mu$ is called a transposed mapping. We can define the balayage principle for $G$ and the domination principle for $G^*$ as in the case where kernels are functions. Then $G$ satisfies the balayage principle if and only if $G^*$ satisfies the domination principle (Choquet and Deny). The complete maximum principle has been defined and studied for $G^*$ (Deny). G. A. Hunt obtained a relation between this principle and the representation of $G^*f$ in the form $\int_0^\infty P_t f\,dt$ with a †semigroup $P_t$. His result is important in the theory of †stochastic processes (→ 261 Markov Processes).

O. Convolution Kernels

A diffusion kernel $G$ on a locally compact Abelian group induces a convolution kernel $\kappa$ if $G$ is translation-invariant. It is called a Hunt kernel when there exists a vaguely continuous semigroup $\{\alpha_t\}_{t \ge 0}$ such that $\kappa = \int_0^\infty \alpha_t\,dt$ and $\alpha_0 = \varepsilon_0$, the Dirac measure at the origin. A Hunt kernel satisfies the domination principle and the balayage principle to all open sets, and it satisfies the complete maximum principle if and only if $\{\alpha_t\}_{t \ge 0}$ is sub-Markovian, $\int d\alpha_t \le 1$. The Fourier transform of such a semigroup has a close connection with a negative-definite function [1].

For a convolution kernel $\kappa$ satisfying the domination principle, or, equivalently, the balayage principle, the inequality … is valid for all relatively compact open sets $\omega_1$ and $\omega_2$ [8]. It has a unique decomposition $\kappa = \varphi \cdot (\kappa_0 + \kappa_1)$, where $\varphi$ is a continuous exponential function, $\kappa_0$ is equal to 0 or a Hunt kernel satisfying the complete maximum principle, and $\kappa_1$ is a singular kernel satisfying the domination principle such that $\kappa_1 * \varepsilon_x = \kappa_1$ for every $x \in S_{\kappa_1}$ [12].

P. Potentials with Distribution Kernels

A function $f$ in $\mathbf{R}^n$ is called slowly increasing in the sense of Deny if there exists a positive

integer $q$ such that $\int |f(P)|(1 + \overline{OP}^2)^{-q}\,d\tau(P) < \infty$. A †distribution $K$ is called a distribution kernel if the †Fourier transform $FK$ of $K$ is a function $k > 0$ and both $k$ and $1/k$ are slowly increasing in the sense of Deny. Given such a distribution $K$, the family $W$ of distributions $T$ for which $FT$ are functions and $\|T\|^2 = \int k\,|FT|^2\,d\tau < \infty$ (this is called the energy of $T$) is a †Hilbert space with inner product $(T_1, T_2) = \int k\,FT_1\,\overline{FT_2}\,d\tau$. However, the family of Newtonian potentials of measures with finite energy is not a Hilbert space (Cartan). The family of functions of class $C^\infty$ with compact support is †dense in $W$. For every $T \in W$ the function $FK \times FT$ is slowly increasing in the sense of Deny. There exists a distribution $U = U^T$ that satisfies $FU = FK \times FT$, called the $K$-potential of $T$. Since $W$ is complete, the method of projection is applicable, and problems of equilibrium, balayage, and capacity can be examined. For instance, if †Dirac's distribution $\delta$ is taken as a distribution kernel, then the corresponding capacity is the Lebesgue measure. In the case of the Newtonian kernel, $\partial U^T/\partial x_j = f_j$ is defined a.e. in $\mathbf{R}^n$ for any $T \in W$. These $f_j$ are square integrable, $T = -c_n \sum_{j=1}^{n} \partial f_j/\partial x_j$, and $\|T\|^2 = c_n \sum_{j=1}^{n} \int |f_j|^2\,d\tau$, where $1/c_n = 2(n-2)\pi^{n/2}/\Gamma(n/2)$. Furthermore,

$$U^T(P) = (n-2)\,c_n \sum_{j=1}^{n} \int (x_j - y_j)\,f_j(Q)\,\overline{PQ}^{-n}\,d\tau(Q),$$

where $x_j$, $y_j$ are components of $P$, $Q$, respectively. Every ordinary potential of a double layer is a special case of $U^T$. Conversely, let $f$ be a function in $\mathbf{R}^n$ that is absolutely continuous along almost every line parallel to each coordinate axis and whose partial derivatives are square integrable. Then $f$ is equal to the potential of some $T \in W$ with Newtonian kernel up to an additive constant. These results are due to Deny [9].

Q. Dirichlet Spaces

In this section, functions are assumed to be complex-valued. Let $\Omega$ be a locally compact Hausdorff space, $\xi \ge 0$ a Radon measure in $\Omega$, and $C_0$ the space of continuous functions with compact support. A Hilbert space $D$ consisting of locally $\xi$-integrable functions is called a Dirichlet space if $C_0 \cap D$ is dense in both $C_0$ and $D$, the relations $|v(P) - v(Q)| \le |u(P) - u(Q)|$ and $|v(P)| \le |u(P)|$ for $u \in D$ and a function $v$ always imply $v \in D$ and $\|v\| \le \|u\|$, and for any compact set $K \subset \Omega$, there exists a constant $A(K)$ such that $\int_K |u|\,d\xi \le A(K)\|u\|$ for every $u \in D$. The notion of Dirichlet space was introduced by A. Beurling. A function $u \in D$ is called a potential if there exists a Radon measure $\mu$ such that $(u, \varphi) = \int \bar{\varphi}\,d\mu$ holds for every $\varphi \in C_0 \cap D$. If in addition $\mu \ge 0$, then $u$ is called a pure potential. For any pure potential, the lower envelope principle, the equilibrium principle, the balayage principle, and the complete maximum principle hold [2]. Suppose that $\Omega$ is a locally compact Abelian group and $D$ is a Dirichlet space such that $U_y u(x) = u(x - y) \in D$ and $\|U_y u\| = \|u\|$ for every $u \in D$ and $y \in \Omega$. Then we call $D$ special and characterize it in terms of a real-valued continuous function on $\Omega$ [2]. (For axiomatic potential theory → 193 Harmonic Functions and Subharmonic Functions.)

References

[1] C. Berg and G. Forst, Potential theory on locally compact Abelian groups, Springer, 1975.
[2] A. Beurling and J. Deny, Dirichlet spaces, Proc. Nat. Acad. Sci. US, 45 (1959), 208-215.
[3] M. Brelot, Sur les ensembles effilés, Bull. Sci. Math., (2) 68 (1944), 12-36.
[4] M. Brelot, Eléments de la théorie classique du potentiel, Centre de Documentation Universitaire, third edition, 1965.
[5] H. Cartan, Théorie générale du balayage en potentiel newtonien, Ann. Univ. Grenoble (N.S.), 22 (1946), 221-280.
[6] G. Choquet, Sur les fondements de la théorie fine du potentiel, C. R. Acad. Sci. Paris, 244 (1957), 1606-1609.
[7] G. Choquet, Potentiels sur un ensemble de capacité nulle, Suites de potentiels, C. R. Acad. Sci. Paris, 244 (1957), 1707-1710.
[8] G. Choquet and J. Deny, Noyaux de convolution et balayage sur tout ouvert, Lecture notes in math. 404, Springer, 1974, 60-112.
[9] J. Deny, Les potentiels d'énergie finie, Acta Math., 82 (1950), 107-183.
[10] O. Frostman, Potentiel d'équilibre et capacité des ensembles avec quelques applications à la théorie des fonctions, Medd. Lunds Univ. Mat. Sem., 3 (1935).
[11] B. Fuglede, On the theory of potentials in locally compact spaces, Acta Math., 103 (1960), 139-215.
[12] M. Itô, Caractérisation du principe de domination pour les noyaux de convolution non-bornés, Nagoya Math. J., 57 (1975), 167-197.
[13] O. D. Kellogg, Foundations of potential theory, Springer, 1929.
[14] M. Kishi, Maximum principles in the potential theory, Nagoya Math. J., 23 (1963), 165-187.
[15] M. Kishi, An existence theorem in potential theory, Nagoya Math. J., 27 (1966), 133-137.

[16] N. S. Landkof, Foundations of modern potential theory, Springer, 1972.
[17] N. Ninomiya, Etude sur la théorie du potentiel pris par rapport au noyau symétrique, J. Inst. Polytech. Osaka City Univ., (A) 8 (1957), 147-179.
[18] M. Ohtsuka, On potentials in locally compact spaces, J. Sci. Hiroshima Univ., 25 (1961), 135-352.
[19] M. Ohtsuka, On thin sets in potential theory, Sem. Anal. Functions, Inst. for Adv. Study, Princeton, 1957, vol. 1, p. 302-313.

339 (XI.2)
Power Series

A. General Remarks

Let $a$ and $c_0, c_1, c_2, \ldots$ be elements of a †field $K$ and $z$ be a variable. A series of the form $P = \sum_{n=0}^{\infty} c_n (z - a)^n$ is called a power series (in one variable). We assume that $K$ is the field of complex numbers. For a given power series $P$, we can determine a unique real number $R$ ($0 \le R \le \infty$) such that $P$ converges if $|z - a| < R$ and diverges if $R < |z - a|$. We call $R$ the radius of convergence and the circle $|z - a| = R$ (sometimes $|z - a| < R$) the circle of convergence of $P$. The value of $R$ is given by $R = 1/\limsup_{n \to \infty} \sqrt[n]{|c_n|}$ (Cauchy-Hadamard formula) with the conventions $0 = 1/\infty$, $\infty = 1/0$. Also $R = \lim |c_n/c_{n+1}|$, provided that the limit on the right-hand side exists.

A power series †converges absolutely and uniformly on every compact subset inside its circle of convergence and defines there a single-valued complex function. Since the series is †termwise differentiable, the function is actually a holomorphic function of a complex variable. Conversely, any function $f(z)$ holomorphic in a domain can be represented by a power series in a neighborhood of each point $a$ of the domain. Such a representation is called the Taylor expansion of $f(z)$ at $a$ (or in the neighborhood of $a$). A power series that represents a holomorphic function is called a holomorphic function element. K. Weierstrass defined an analytic function as the set of all elements that can be obtained by †analytic continuations starting from a given function element (→ 198 Holomorphic Functions).

Besides the series $\sum_{n=0}^{\infty} c_n (z - a)^n$, a series of the form $Q = \sum_{n=0}^{\infty} c_n z^{-n}$ is called a power series with center at the point at infinity, and its value at $\infty$ is defined to be $c_0$. By putting $z - a = t$ when its center $a$ is a finite point and $z^{-1} = t$ when its center is $\infty$, every power series can be written in the form $\sum_{n=0}^{\infty} c_n t^n$, and such a $t$ is called a local canonical parameter.

When $t$ is a local canonical parameter, a series of the form $\sum_{n=-\infty}^{\infty} c_n t^n$ is called a Laurent series and a series $\sum_{n=-\infty}^{\infty} c_n t^{n/k}$ ($k$ a fixed natural number) is called a Puiseux series, after the French mathematicians A. Laurent and V. A. Puiseux. Power series are sometimes called Taylor series.

If we perform †analytic continuations of a power series from its center along radii of its circle of convergence, we encounter a †singularity on the circumference for at least one radius. For a power series with the circle of convergence $|z| = R$, the argument $\alpha$ of the singularity on $|z| = R$ nearest $z = R$ is given in the following way. Suppose, for simplicity, that the radius of convergence $R$ of $\sum c_n z^n$ equals 1, and put

$$P(h) = \limsup_{n \to \infty} \cdots$$

Then $\alpha$ is obtained from

$$\cos\alpha = P'_+(0) = \lim_{h \to +0} (P(h) - 1)/h$$

(S. Mandelbrojt, 1937). In particular, if all the $c_n$ are real and nonnegative, $z = R$ is a singularity (Vivanti's theorem).

B. Abel's Continuity Theorem

As a property of the power series on the circle of convergence, we have Abel's continuity theorem: If the radius of convergence of $f(z) = \sum_{n=0}^{\infty} a_n z^n$ is equal to 1 and $\sum_{n=0}^{\infty} a_n$ converges (or is †$(C, k)$-summable ($k > -1$)) to $A$, then $f(z) \to A$ when $z$ approaches 1 in any sector $\{z \mid |z| < 1,\ |\arg(1 - z)| < (\pi/2) - \delta\}$, $\delta > 0$ (†Stolz's path).

The converse of this theorem is not always true. The existence of $\lim_{z \to 1} f(z)$ does not necessarily lead to the convergence of $\sum a_n$. Even Cesàro summability of $\sum a_n$ does not always follow from the existence of $\lim_{z \to 1} f(z)$. If $a_n = o(1/n)$ and $f(z) \to A$ when $z$ approaches 1 along a curve ending at $z = 1$, then $\sum a_n$ converges to $A$ (Tauber's theorem, 1897). The theorems concerning additional sufficient conditions for the validity of the converse of Abel's theorem are called theorems of Tauberian type (or Tauberian theorems). In Tauber's theorem, the hypothesis on the $a_n$ may be replaced by $a_n = O(1/n)$ or $n \operatorname{Re} a_n$, $n \operatorname{Im} a_n$ may be bounded from above but not necessarily from below (G. H. Hardy and J. E. Littlewood). Here, the condition $a_n = O(1/n)$ cannot be weakened (Littlewood). Sufficient condi-

tions for $\sum a_n$ to be †summable for various summation methods are also known.

C. Lambert Series

A series of the form

$$\sum_{n=1}^{\infty} a_n \frac{z^n}{1 - z^n} \qquad (1)$$

is called a Lambert series. If $\sum a_n$ is convergent, (1) converges for any $z$ with $|z| \ne 1$, and moreover, it converges uniformly on any compact set contained in $|z| < 1$ or $|z| > 1$. If $\sum a_n$ is divergent, (1) and the power series $\sum a_n z^n$ converge or diverge simultaneously for $|z| \ne 1$. There is a detailed study of Lambert series by K. Knopp (1913). If $R$ is the radius of convergence of $\sum a_n z^n$ and $\sum_{d \mid n} a_d = A_n$ is the sum extending over all divisors of $n$, then we have the reciprocity relation

$$\sum_{n=1}^{\infty} a_n \frac{z^n}{1 - z^n} = \sum_{n=1}^{\infty} A_n z^n,$$

which holds for $|z| < \min(R, 1)$. As special cases of this relation, we have

$$\sum_{n=1}^{\infty} \mu(n) \frac{z^n}{1 - z^n} = z, \qquad \sum_{n=1}^{\infty} \varphi(n) \frac{z^n}{1 - z^n} = \frac{z}{(1 - z)^2},$$

where $\mu$ and $\varphi$ are the †Möbius function and †Euler function, respectively.

If the $n a_n$ are real and bounded from below,

$$\lim_{z \to 1-0} \sum_{n=1}^{\infty} (1 - z)\, n a_n \frac{z^n}{1 - z^n} = s$$

implies $\sum a_n = s$. Hardy and Littlewood (1921) showed that this theorem of Tauberian type is equivalent to the †prime number theorem.

D. Singularities of Power Series

Given a power series $P = \sum a_n z^n$, if the †branch in $|z| < R^*$ of the analytic function $f(z)$ determined by $P$ is single-valued meromorphic but the branch in $|z| < R'$ with $R' > R^*$ has singularities other than poles, then $R^*$ is called the radius of meromorphy of $P$ and $|z| = R^*$ (sometimes $|z| < R^*$) is called the circle of meromorphy of $P$. $R^*$ can be computed in the following way. Put

$$l_p = \limsup_{n \to \infty} \sqrt[n]{|D_n^{(p)}|}, \qquad D_n^{(p)} = \begin{vmatrix} a_n & a_{n+1} & \cdots & a_{n+p} \\ a_{n+1} & a_{n+2} & \cdots & a_{n+p+1} \\ \cdots & \cdots & \cdots & \cdots \\ a_{n+p} & a_{n+p+1} & \cdots & a_{n+2p} \end{vmatrix}.$$

Then the sequence of numbers $l_{p-1}/l_p$ is increasing, and $\lim l_{p-1}/l_p = R^*$. If $R_1, R_2, \ldots$ are the different values of $l_{p-1}/l_p$, then $f(z)$ is holomorphic at points not lying on $|z| = R_i$ (Hadamard's theorem, 1892).

When a point $a$ and a set $A$ of points in the complex plane are given, the set of points that can be joined with $a$ by a segment disjoint from $A$ is called the star region determined by $a$ and $A$. Take any half-line starting at $a$; the point of $A$ lying on it and nearest to $a$ is called a vertex. For a power series $\sum c_n z^n$, the set of centers of the function elements obtained by analytic continuations along half-lines starting at the origin is called the star region of $\sum c_n z^n$ with respect to the origin. Let $\{\alpha\}$ and $\{\beta\}$ be the set of vertices of the star regions with respect to the origin of $\sum a_n z^n$ and $\sum b_n z^n$, respectively. Then the star region determined by the origin and the set $\{\alpha\beta\}$ is contained in the star region of $\sum a_n b_n z^n$ (Hadamard's multiplication theorem, 1892).

The following are some results concerning conditions for the coincidence of the circle of convergence of a power series and its †natural boundary. Let the $a_n$ be positive numbers and $b$ a natural number greater than 1. If the radius of convergence of $\sum_{n=0}^{\infty} a_n z^{b^n}$ is equal to 1, then $|z| = 1$ is its natural boundary (Weierstrass, E. I. Fredholm). If the radius of convergence of $\sum_{n=0}^{\infty} a_n z^{\lambda_n}$ (with $\{\lambda_n\}$ an increasing sequence of natural numbers) is 1 and $\liminf_{n \to \infty} (\lambda_{n+1} - \lambda_n)/\lambda_n > 0$, then $|z| = 1$ is the natural boundary (Hadamard's gap theorem). The latter condition was weakened to $\liminf_{n \to \infty} (\lambda_{n+1} - \lambda_n)/\sqrt{\lambda_n} > 0$ by E. Borel (1896). E. Fabry (1896) showed that with the radius of convergence of $\sum_{n=0}^{\infty} a_n z^n$ being 1, if there exist a suitable sequence of natural numbers $m_1 < m_2 < \cdots$ and a number $\theta$ ($0 < \theta < 1$) such that $\lim s_i/m_i = 0$, where $s_i$ is the number of nonzero $a_n$ contained in the interval $(m_i(1 - \theta), m_i(1 + \theta))$, then $|z| = 1$ is the natural boundary of $\sum a_n z^n$. By applying this theorem to $\sum a_n z^{\lambda_n}$ with radius of convergence 1, it can be shown that if $\lim_{n \to \infty} \lambda_n/n = \infty$, then $|z| = 1$ is its natural boundary. These theorems are called gap theorems because they concern power series with gaps in their exponents. A generalization of Fabry's theorem was obtained by G. Pólya. It is known that Fabry's last condition above is in a sense the best possible (Pólya, 1942).

Regarding the natural boundary of a power series, we also have the following result: When the radius of convergence of $\sum a_n z^n$ is 1, by a suitable choice of the sequence $\{\varepsilon_n\}$ ($\varepsilon_n = \pm 1$), the series $\sum \varepsilon_n a_n z^n$ has $|z| = 1$ as its natural boundary (A. Hurwitz, P. Fatou, Pólya).
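
The gap theorems above can be illustrated numerically. The following is a minimal sketch, not taken from the source: it assumes the particular series $\sum_{n \ge 0} z^{2^n}$ (the Weierstrass-Fredholm case with $a_n = 1$, $b = 2$), and the helper names `lacunary_partial_sum` and `radial_growth` are hypothetical. Along a radius ending at a root of unity $w$ with $w^{2^k} = 1$, all terms with $n \ge k$ are positive reals, so $|f(rw)|$ should grow without bound as $r \to 1$. Since such points are dense on $|z| = 1$, this is consistent with the circle being a natural boundary. The code only samples a few radii and proves nothing.

```python
import cmath

def lacunary_partial_sum(z, max_terms=64, tol=1e-16):
    """Approximate sum_{n>=0} z**(2**n) for |z| < 1 by repeated squaring."""
    total = 0j
    term = complex(z)          # z**(2**0)
    for _ in range(max_terms):
        total += term
        if abs(term) < tol:    # remaining terms are negligible
            break
        term = term * term     # z**(2**n) -> z**(2**(n+1))
    return total

def radial_growth(p, k, radii=(0.9, 0.99, 0.999, 0.9999)):
    """Return |f(r*w)| for each r, where w = exp(2*pi*i*p / 2**k)."""
    w = cmath.exp(2j * cmath.pi * p / 2 ** k)
    return [abs(lacunary_partial_sum(r * w)) for r in radii]

if __name__ == "__main__":
    # A few dyadic roots of unity: w = 1, -1, i, exp(3*pi*i/4).
    for p, k in [(0, 0), (1, 1), (1, 2), (3, 3)]:
        print(p, k, [round(v, 3) for v in radial_growth(p, k)])
```

The modulus grows roughly like $\log_2(1/(1-r))$ along each sampled radius, so the growth is slow but unbounded.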

E. Overconvergence

If the radius of convergence of $f(z) = \sum a_n z^n$ is 1, the sequence of partial sums $S(1, z), S(2, z), \ldots$, where $S(n, z) = \sum_{\nu=0}^{n} a_\nu z^\nu$, is naturally divergent for $|z| > 1$, but a suitable subsequence $S(n_k, z)$ ($k = 1, 2, \ldots$) may still be convergent. A. Ostrowski [7, 8] called this phenomenon overconvergence and proved the following result: By definition, $f(z) = \sum_{n=0}^{\infty} a_n z^{\lambda_n}$ has a lacunary structure if the sequence $\{\lambda_n\}$ has a subsequence $\{\lambda_{n_k}\}$ such that $\lambda_{n_k+1} > \lambda_{n_k}(1 + \theta)$ ($\theta > 0$). If this situation occurs, then in a sufficiently small neighborhood of a point on $|z| = 1$ where $f(z)$ is holomorphic, $S(\lambda_{n_k}, z)$ ($k = 1, 2, \ldots$) converges uniformly. This result includes Hadamard's gap theorem as a special case. Conversely, any power series for which overconvergence takes place can be represented as the sum of a power series having a lacunary structure and a power series whose radius of convergence is greater than 1. G. Bourion [9] gave a unified theory of these results using †superharmonic functions.

R. Jentsch [10] showed that all singularities of a power series on its circle of convergence are accumulation points of the zeros of the partial sums. On the other hand, if the zeros of a subsequence $S(n_k, z)$ ($k = 1, 2, \ldots$) have no accumulation point on $|z| = 1$, then the power series has a lacunary structure and overconvergence takes place for $S(n_k, z)$ ($k = 1, 2, \ldots$). If $\log n_{k+1} = O(n_k)$ and $S(n_k, z)$ ($k = 1, 2, \ldots$) is overconvergent, then all boundary points of the domain of overconvergence are accumulation points of the zeros of $S(n_k, z)$ (Ostrowski).

A power series is completely determined by its coefficients, but little is known about the relations between the arithmetical properties of its coefficients and the function-theoretic properties of the function represented by the series. A known result is that if the power series $\sum c_n z^n$ with rational coefficients represents a branch of an †algebraic function, then we can find an integer $\gamma$ such that the $c_n \gamma^n$ ($n \ge 1$) are all integers (Eisenstein's theorem, 1852).

For power series of several variables → 21 Analytic Functions of Several Complex Variables; for formal power series → 370 Rings of Power Series. For power series expansions → Appendix A, Table 10.IV.

References

For the general theory of power series,
[1] E. G. H. Landau, Darstellung und Begründung einiger neuerer Ergebnisse der Funktionentheorie, Springer, second edition, 1929.
[2] P. Dienes, The Taylor series, Clarendon Press, 1931.
[3] L. Bieberbach, Analytische Fortsetzung, Springer, 1955.
Also → references to 198 Holomorphic Functions.
For singularities,
[4] S. Mandelbrojt, Les singularités des fonctions analytiques représentées par une série de Taylor, Gauthier-Villars, 1932.
For theorems of Tauberian type,
[5] N. Wiener, Tauberian theorems, Ann. Math., (2) 33 (1932), 1-100.
[6] H. R. Pitt, Tauberian theorems, Oxford Univ. Press, 1958.
For overconvergence,
[7] A. Ostrowski, Über eine Eigenschaft gewisser Potenzreihen mit unendlich vielen verschwindenden Koeffizienten, S.-B. Preuss. Akad. Wiss., (1921), 557-565.
[8] A. Ostrowski, Über vollständige Gebiete gleichmässiger Konvergenz von Folgen analytischer Funktionen, Abh. Math. Sem. Univ. Hamburg, 1 (1922), 327-350.
[9] G. Bourion, L'ultraconvergence dans les séries de Taylor, Actualités Sci. Ind., Hermann, 1937.
[10] R. Jentsch, Fortgesetzte Untersuchungen über die Abschnitte von Potenzreihen, Acta Math., 41 (1918), 253-270.

340 (XVII.17)
Probabilistic Methods in Statistical Mechanics

A. Introduction

Probabilistic methods are often very useful in the rigorous treatment of the mathematical foundations of statistical mechanics and also in some other problems related to statistical mechanics. As examples of such methods, we explain here (1) Ising models, (2) Markov statistical mechanics, (3) percolation processes, (4) random Schrödinger equations, and (5) the †Boltzmann equation.

B. Ising Models

The Ising model was proposed by E. Ising [1] to explain the phenomena of phase transitions of a ferromagnet, in which either a + or - spin is put on each site of a crystal lattice, and interaction between nearest neighboring sites is taken into consideration.

Let $V$ be a cube in the $d$-dimensional integer lattice space $\mathbf{Z}^d$ and $X_V = \{+1, -1\}^V$ be the totality of spin configurations in $V$. Each ele-

ment of $X_V$ is denoted by $\sigma = \{\sigma_i\}_{i \in V}$ ($\sigma_i = +1$ or $-1$). We suppose that a spin configuration $\sigma$ has a potential of the type

$$U_V(\sigma) = -\sum_{\langle i, j \rangle \subset V} \sigma_i \sigma_j + h \sum_{i \in V} \sigma_i,$$

where $\langle i, j \rangle$ means that $(i, j)$ is a nearest neighboring pair of sites and $h$ stands for the parameter of an external field. A †probability measure on $X_V$ is called a state on $V$. For each state $\mu$, the free energy is defined by

$$F_V(\mu) = \sum_{\sigma \in X_V} \mu(\sigma) U_V(\sigma) + \frac{1}{\beta} \sum_{\sigma \in X_V} \mu(\sigma) \log \mu(\sigma),$$

where $\beta = 1/kT$ ($k$ is the †Boltzmann constant, $T$ is the †absolute temperature). Then there exists a unique state

$$g_V^{\beta,h}(\sigma) = \frac{\exp(-\beta U_V(\sigma))}{\sum_{\sigma' \in X_V} \exp(-\beta U_V(\sigma'))}, \qquad \sigma \in X_V,$$

on $V$ which minimizes the free energy $F_V$ (variational principle). $g_V^{\beta,h}$ is called a Gibbs state on $V$ of the Ising model with parameter $(\beta, h)$. Physically, a Gibbs state is an equilibrium state, in which various physical quantities are calculated.

Since for each $\mathbf{Z}^d$-homogeneous (i.e., translationally invariant with respect to $\mathbf{Z}^d$) state $\mu$ the mean free energy

$$f(\mu) = \lim_{V \to \infty} \frac{1}{|V|} F_V(\mu)$$

is well defined, a (limiting) Gibbs state can also be defined for the infinite domain $V = \mathbf{Z}^d$ by the above mentioned variational principle. However, at present the probabilistic definition of Gibbs states given by Dobrushin [2] and Lanford and Ruelle [3] is prevalent.

It is known that if an external field is present (i.e., $h \ne 0$) there is only one Gibbs state for any $\beta$. On the other hand when an external field is absent ($h = 0$) and $d \ge 2$, there are at least two Gibbs states, i.e., a phase transition occurs, for a sufficiently low temperature.

Finally, we mention some known facts in this field. In the following we assume $h = 0$. (1) In the 1-dimensional case, the phase transition never occurs. (2) For $d \ge 2$, there exists a critical value $\beta_c(d)$ such that the phase transition does not occur for any $\beta < \beta_c(d)$ but it occurs for every $\beta > \beta_c(d)$. The calculation of $\beta_c(2)$ has been carried out by Onsager [4]. (3) In the 2-dimensional case, every Gibbs state is $\mathbf{Z}^2$-homogeneous [5, 6]. (4) For $d \ge 3$, there is a $\mathbf{Z}^d$-inhomogeneous Gibbs state for sufficiently large $\beta$ [7].

C. Markov Statistical Mechanics

Stochastic Ising models, infinite interacting particle systems, and many models occurring in physics, biology, and sociology are formulated as a class of infinite-dimensional †Markov processes. The field of investigation of stationary states and statistical or ergodic properties of these processes is called Markov statistical mechanics, which has made rapid progress during the last decade. We explain this field by looking at a typical class of processes.

Let $\mathbf{Z}^d$ be the $d$-dimensional integer lattice space. Putting + or - spin on each site on $\mathbf{Z}^d$, let us consider a random motion of spins which evolves while interacting with neighboring spins. Let $X = \{+1, -1\}^{\mathbf{Z}^d}$ be the totality of spin configurations and an element of $X$ be denoted by $\eta = \{\eta(i)\}_{i \in \mathbf{Z}^d}$ ($\eta(i) = +1$ or $-1$). The process is described in terms of a collection of nonnegative functions $c_i(\eta)$ defined for $i \in \mathbf{Z}^d$ and $\eta \in X$. For the configuration $\eta_t$ at time $t$, $\eta_t(i)$ changes to $-\eta_t(i)$ in the time interval $[t, t + \Delta t]$ with probability $c_i(\eta_t)\Delta t + o(\Delta t)$. This process on $X$ is called a spin-flip model. For an initial distribution $\mu$ we denote by $\mu_t$ the distribution at time $t$. If $\mu_t = \mu$ for all $t \ge 0$, $\mu$ is called a stationary state.

Example 1. Stochastic Ising models. A stochastic Ising model was proposed by Glauber [8] to describe the random motion in a ferromagnet upon contact with a heat bath. Then the flip rate $\{c_i(\eta)\}$ is defined by the potential of the Ising model. It is known that any Gibbs state of the Ising model is a reversible stationary state of the stochastic Ising model, and the converse is also valid. Free energy plays an important role in the study of the ergodic properties of these models. In particular, the mean free energy is a nonincreasing functional along the distributions $\mu_t$, $0 \le t < \infty$.

Example 2. Contact processes. A contact process was introduced by Harris [11] to investigate the spread of infection. The flip rate of the contact process is given by

$$c_i(\eta) = \begin{cases} 1 & \text{if } \eta(i) = +1, \\ k\lambda & \text{if } \eta(i) = -1 \text{ and } \#\{j \mid |j - i| = 1,\ \eta(j) = +1\} = k. \end{cases}$$

Here $+1$ denotes an infected individual and $-1$ denotes a healthy one. Denote by $\mathbf{-1}$ the configuration at which all sites are healthy and by $\delta_{\mathbf{-1}}$ the unit point mass at $\mathbf{-1}$; then $\delta_{\mathbf{-1}}$ is a stationary state. The most important result is the following: There exists a critical value $\lambda_c$ ($0 < \lambda_c < \infty$) such that if $\lambda < \lambda_c$, $\delta_{\mathbf{-1}}$ is a unique stationary state, and if $\lambda > \lambda_c$ there is another stationary state $\mu$ satisfying

$$\mu[\eta \in X;\ \eta(i) = +1 \text{ for infinitely many } i] = 1.$$

D. Percolation Processes

A percolation process is a mathematical model which describes the random spread of a fluid

through a medium. It can be used to describe phenomena such as the penetration through a porous solid by a liquid or the spread of an infectious disease [14]. Usually, the process is identified as a site percolation process or a bond percolation process. Here we describe the latter only.

Let $L = \{S, B\}$ be a countable connected †graph with a set of sites (†vertices) $S$ and a set of bonds (†edges) $B$. Each bond $b$ is open with probability $p$ and closed with probability $1 - p$ independently of all other bonds. Suppose that a fixed point $o$ is the source of a fluid which flows from $o$ along open bonds only.

Let $\theta(p)$ be the probability that the fluid spreads infinitely far, and define the critical percolation probability $p_H = \inf\{p \mid \theta(p) > 0\}$. Then it is known that (1) $p_H = 1/2$ for the square lattice [16], (2) $p_H = 2\sin(\pi/18)$ for the triangular lattice, and (3) $p_H = 1 - 2\sin(\pi/18)$ for the honeycomb lattice.

E. Random Schrödinger Equations

Random Schrödinger equations are †Schrödinger equations in $\mathbf{R}^n$ with random potentials $U(x, \omega)$; therefore the corresponding operators are of the form

$$A(\omega) = -\Delta + U(\cdot, \omega),$$

where $\omega$ denotes a random parameter in a probability space $(\Omega, \mathfrak{B}, P)$ and $\Delta$ denotes the †Laplacian in $\mathbf{R}^n$. It is assumed that this system of potentials forms a spatially homogeneous random field with the †ergodic property. This system of equations is considered to be a model describing the motion of quantum-mechanical particles in a random medium. Mathematically, the problem is to investigate various spectral properties of the self-adjoint operators $A(\omega)$. Since $A(\omega)$ and its shifted operator $A(\omega(\cdot + x))$ are †unitarily equivalent, the above assumption on the potentials $U(\cdot, \omega)$ implies that the spectral structures are independent of each sample $\omega$ a.s. if the structures of $A(\omega)$ are measurable with respect to $(\Omega, \mathfrak{B})$.

Several rigorous results have been obtained. In the 1-dimensional case, if the potentials $U(\cdot, \omega)$ are functionals of a strongly ergodic †Markov process, then it is proved that $A(\omega)$ has only a pure point spectrum [17], and each eigenfunction decays exponentially fast [18]. In multidimensional cases, the asymptotic behavior at the left edge of the mean of the resolutions of the identity for a certain $A(\omega)$ has been investigated [19]. It is assumed that the random potential for this $A(\omega)$ takes the form

$$U(x, \omega) = \int_{\mathbf{R}^n} \varphi(x - y)\,\pi(dy, \omega),$$

where $\{\pi(dy, \omega)\}$ is a †Poisson random measure with mean measure $dy$ and $\varphi(x)$ is a nonnegative measurable function satisfying $\varphi(x) = O(|x|^{-n-2})$ as $|x| \to \infty$. Let $E_\lambda(x, y, \omega)$ be the continuous kernel for the resolution of the identity for $A(\omega)$, and denote by $N(\lambda)$ the mean of $E_\lambda(0, 0, \omega)$. Then $N(\lambda)$ is a nondecreasing function vanishing on $(-\infty, 0)$ and has the following asymptotic form at $\lambda = 0$:

$$\lambda^{n/2} \log N(\lambda) \to -\gamma_n^{n/2} \quad \text{as } \lambda \downarrow 0,$$

where $\gamma_n$ is the first eigenvalue of $-\Delta$ with a Dirichlet boundary condition on the ball in $\mathbf{R}^n$ with unit volume. The quantity $N(\lambda)$ can be identified with a limiting state density of $A(\omega)$, namely, the limit function of

$$\frac{1}{|V|}\,\#\{j : \lambda_j(\omega) \le \lambda\}$$

as $V$ tends to $\mathbf{R}^n$ regularly, where $\{\lambda_j(\omega)\}$ is the set of eigenvalues of $A(\omega)$ in a smooth bounded domain $V$ in $\mathbf{R}^n$ with a Dirichlet boundary condition and $|V|$ is the volume of $V$. To obtain the above asymptotic behavior of $N(\lambda)$, the theory of †large deviation for Markov processes plays a crucial role [20].

F. Boltzmann Equation

In the kinetic theory of gases the †Boltzmann equation is derived from the †Liouville equation by considering the BBGKY hierarchy of particle distribution functions for $N$ particles and then by taking the limit $N \to \infty$ under certain conditions (→ 402 Statistical Mechanics). Mathematically rigorous discussions were given by O. E. Lanford [21] for a gas of hard spheres; he showed that solutions of the BBGKY hierarchy converge to those of the Boltzmann hierarchy for small time under the Boltzmann-Grad limit ($N \to \infty$, $Nd^2 \to 1$, $d$ = the diameter of the hard spheres).

An approach to the Boltzmann equation in the spatially homogeneous case can also be based on a †master equation. M. Kac [22] considered a Poisson-like process describing the random time evolution of the $n$-tuple of the velocities of $n$ particles. For a gas of hard spheres this is determined by the master equation

$$\frac{\partial}{\partial t} u(t, x_1, \ldots, x_n) = \frac{1}{n} \sum_{1 \le i < j \le n} \int_{S^2} \{u(t, x_1, \ldots, x_i', \ldots, x_j', \ldots, x_n) - u(t, x_1, \ldots, x_n)\}\,|(x_i - x_j, l)|\,dl,$$
$$t > 0, \quad x_1, \ldots, x_n \in \mathbf{R}^3, \qquad (1)$$

where $S^2$ is the 2-dimensional unit sphere, $dl$ is

the uniform distribution on $S^2$ and

$$x_i' = x_i + (x_j - x_i, l)\,l, \qquad x_j' = x_j - (x_j - x_i, l)\,l.$$

Let $a$ be a positive constant and $S(\sqrt{na})$ denote the $(3n - 1)$-dimensional sphere with center 0 and radius $\sqrt{na}$. Given a symmetric probability density $u_n$ on $S(\sqrt{na})$ for each $n \ge 1$, a sequence $\{u_n\}$ is said to have Boltzmann's property or to be chaotic (or $u$-chaotic to stress $u$), if there exists a probability density $u$ on $\mathbf{R}^3$ such that

$$\lim_{n \to \infty} \int_{S(\sqrt{na})} \varphi_1(x_1) \cdots \varphi_m(x_m)\,u_n(x_1, \ldots, x_n)\,dx_1 \cdots dx_n = \prod_{k=1}^{m} \int_{\mathbf{R}^3} \varphi_k(x)\,u(x)\,dx$$

for each $m \ge 1$ and $\varphi_k \in C_0(\mathbf{R}^3)$, $1 \le k \le m$. Kac's assertion is that the Boltzmann equation is to be derived from the master equation via the propagation of chaos; more precisely, if $\{u_n\}$ is a $u$-chaotic sequence, then $\{u_n(t)\}$ is also $u(t)$-chaotic, where $u_n(t)$ is the solution of the master equation (1) with $u_n(0) = u_n$ and $u(t)$ is the solution of the following Boltzmann equation with $u(0) = u$:

$$\frac{\partial}{\partial t} u(t, x) = \int_{S^2 \times \mathbf{R}^3} \{u(t, x')\,u(t, y') - u(t, x)\,u(t, y)\}\,|(y - x, l)|\,dl\,dy,$$
$$x' = x + (y - x, l)\,l, \qquad y' = y - (y - x, l)\,l.$$

The propagation of chaos was verified by Kac [22], H. P. McKean [24], and others for a considerably wide class of nonlinear equations of Boltzmann type (with cutoff). The propagation of chaos is the stage corresponding to the †law of large numbers. The next stage is the †central limit theorem or fluctuation theory, which was also discussed by Kac [23] and McKean [25]. Moreover, based on Kac's work [22], McKean [26] introduced a class of †Markov processes associated with certain nonlinear evolution equations including the Boltzmann equation; a process of this type describes the time evolution of the velocity of a particle interacting with other similar particles. In the case of the spatially homogeneous Boltzmann equation of Maxwellian molecules without cutoff, such a Markov process was constructed by solving a certain †stochastic differential equation [27]. This implies the existence of probability measure-valued solutions of the equation.

References

[1] E. Ising, Beitrag zur Theorie des Ferromagnetismus, Z. Phys., 31 (1925), 253-264.
[2] R. L. Dobrushin, The description of a random field by means of conditional probabilities and conditions of its regularity, Theory Prob. Appl., 13 (1968), 197-224. (Original in Russian, 1968.)
[3] O. E. Lanford III and D. Ruelle, Observables at infinity and states with short range correlations in statistical mechanics, Comm. Math. Phys., 13 (1969), 194-215.
[4] L. Onsager, Crystal statistics I: A two-dimensional model with an order-disorder transition, Phys. Rev., 65 (1944), 117-149.
[5] M. Aizenman, Translation invariance and instability of phase coexistence in the two dimensional Ising system, Comm. Math. Phys., 73 (1980), 83-94.
[6] Y. Higuchi, On the absence of non-translationally invariant Gibbs states for the two-dimensional Ising model, Proc. Conf. on Random Fields, Esztergom, North-Holland, 1981.
[7] R. L. Dobrushin, Gibbs state describing coexistence of phases for a three-dimensional Ising model, Theory Prob. Appl., 17 (1972), 582-600. (Original in Russian, 1972.)
[8] R. J. Glauber, Time-dependent statistics of the Ising model, J. Math. Phys., 4 (1963), 294-307.
[9] R. L. Dobrushin, Markov processes with a large number of locally interacting components: Existence of a limit process and its ergodicity, Problems of Information Transmission, 7 (1971), 149-164; Markov processes with many locally interacting components: The reversible case and some generalizations, ibid., 235-241. (Originals in Russian, 1971.)
[10] F. Spitzer, Interaction of Markov processes, Advances in Math., 5 (1970), 246-290.
[11] T. E. Harris, Contact interactions on a lattice, Ann. Probability, 2 (1974), 969-988.
[12] R. Holley and T. M. Liggett, The survival of contact processes, Ann. Probability, 6 (1978), 198-206.
[13] T. M. Liggett, The stochastic evolution of infinite systems of interacting particles, Lecture notes in math. 598, Springer, 1977, 187-248.
[14] S. R. Broadbent and J. M. Hammersley, Percolation processes I: Crystals and mazes, Proc. Cambridge Philos. Soc., 53 (1957), 629-641.
[15] R. T. Smythe and J. C. Wierman, First-passage percolation on the square lattice, Lecture notes in math. 671, Springer, 1978.
[16] H. Kesten, The critical probability of bond percolation on the square lattice equals 1/2, Comm. Math. Phys., 74 (1980), 41-59.
[17] I. J. Goldseid, S. A. Molchanov, and L. A. Pastur, One-dimensional random Schrödinger operator with purely point spectrum, Functional Anal. Appl., 11 (1977), 1-10. (Original in Russian, 1977.)

[18] S. A. Molchanov, The structure of eigenfunctions of one-dimensional unordered structures, Math. USSR-Izv., 12 (1978), 69-101. (Original in Russian, 1978.)
[19] S. Nakao, On the spectral distribution of the Schrödinger operator with random potential, Japan. J. Math., 3 (1977), 111-139.
[20] M. D. Donsker and S. R. S. Varadhan, Asymptotics for the Wiener sausage, Comm. Pure Appl. Math., 28 (1975), 525-565.
[21] O. E. Lanford III, Time evolution of large classical systems, Lecture notes in physics 38, Springer, 1975.
[22] M. Kac, Probability and related topics in physical sciences, Interscience, 1959.
[23] M. Kac, Some probabilistic aspects of the Boltzmann-equation, Acta Phys. Austriaca Supp. 10, Springer, 1973, 379-400.
[24] H. P. McKean, Speed of approach to equilibrium for Kac's caricature of a Maxwellian gas, Arch. Rational Mech. Anal., 21 (1966), 347-367.
[25] H. P. McKean, Fluctuations in the kinetic theory of gases, Comm. Pure Appl. Math., 28 (1975), 435-455.
[26] H. P. McKean, A class of Markov processes associated with nonlinear parabolic equations, Proc. Nat. Acad. Sci. US, 56 (1966), 1907-1911.
[27] H. Tanaka, Probabilistic treatment of the Boltzmann equation of Maxwellian molecules, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 46 (1978), 67-105.

341 (XVII.2)
Probability Measures

A. General Remarks

A probability measure $\Phi$ on a †measurable space $(S, \mathfrak{S})$ is defined to be a †measure on $(S, \mathfrak{S})$ with $\Phi(S) = 1$ (→ 270 Measure Theory). In probability theory, a probability measure usually appears as the †probability distribution of a †random variable (→ 342 Probability Theory). Unless stated otherwise, we regard a †topological space $T$ as a measurable space endowed with the topological $\sigma$-algebra $\mathfrak{B}(T)$ on $T$, i.e., the †$\sigma$-algebra generated by the †open subsets of $T$. Hence the distribution of an $\mathbf{R}^n$-valued random variable is a probability measure on $(\mathbf{R}^n, \mathfrak{B}^n = \mathfrak{B}(\mathbf{R}^n))$. From this probabilistic background we often call a probability measure on $(\mathbf{R}^n, \mathfrak{B}^n)$ an $n$-dimensional (probability) distribution. For probability measures on topological spaces → 270 Measure Theory.

B. Quantities Characterizing Probability Distributions

Several different quantities characterize the properties of probability distributions in one dimension: the mean (or mathematical expectation) $m = \int_{-\infty}^{\infty} x\,d\Phi(x)$, the variance $\sigma^2 = \int_{-\infty}^{\infty} |x - m|^2\,d\Phi(x)$, the standard deviation $\sigma$, the $k$th moment $\alpha_k = \int_{-\infty}^{\infty} x^k\,d\Phi(x)$, the $k$th absolute moment $\beta_k = \int_{-\infty}^{\infty} |x|^k\,d\Phi(x)$, the $k$th moment about the mean $\mu_k = \int_{-\infty}^{\infty} (x - m)^k\,d\Phi(x)$, etc.

A one-to-one correspondence exists between a 1-dimensional distribution $\Phi$ and its (cumulative) distribution function $F$ defined by $F(x) = \Phi((-\infty, x])$. A distribution function is characterized by the following properties: (1) It is monotone nondecreasing; (2) it is right continuous; (3) $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$. Similar statements hold for the multidimensional case.

Let $X(\omega)$ be a real random variable on a †probability space $(\Omega, \mathfrak{B}, P)$. Then the distribution of $X$ is given by $\Phi(E) = P(\{\omega \mid X(\omega) \in E\})$, $E \in \mathfrak{B}^1$, and the characteristic quantities of $\Phi$ defined above are given in terms of $X(\omega)$ as follows: $m = E(X)$, $\sigma^2 = E(X - m)^2$, $F(x) = P(\{\omega \mid X(\omega) \le x\})$, etc. The moments and the moments about the mean are connected by the relation $\mu_r = \sum_{k=0}^{r} \binom{r}{k} \alpha_{r-k} (-m)^k$ ($r = 1, 2, \ldots$). When $\Phi$ is an $n$-dimensional distribution, the following quantities are frequently used: the mean vector, which is an $n$-dimensional vector whose $i$th component is given by $m_i = \int x_i\,d\Phi(x)$; the covariance matrix, which is an $n \times n$ matrix whose $(i, j)$-element is $\sigma_{ij} = \int (x_i - m_i)(x_j - m_j)\,d\Phi(x)$; the moment matrix, which is an $n \times n$ matrix whose $(i, j)$-element is $m_{ij} = \int x_i x_j\,d\Phi(x)$. (The covariance matrix is also called the variance matrix or the variance-covariance matrix.) The covariance matrix and the moment matrix are †positive definite and symmetric. The quantities listed above are defined only under some integrability conditions.

C. Characteristic Functions

Consider a probability measure $\Phi$ defined on a measurable space $(\mathbf{R}^n, \mathfrak{B}^n)$, where $\mathfrak{B}^n$ is the $\sigma$-algebra of all †Borel sets in $\mathbf{R}^n$. The characteristic function of $\Phi$ is the †Fourier transform $\varphi$ defined by

$$\varphi(z) = \int_{\mathbf{R}^n} e^{i(z, x)}\,d\Phi(x), \qquad z \in \mathbf{R}^n, \qquad (1)$$

where $(z, x)$ denotes the †scalar product of $z$ and $x$ ($z, x \in \mathbf{R}^n$). Let $X$ be an $n$-dimensional random variable with probability distribution $\Phi$ defined on a †probability space $(\Omega, \mathfrak{B}, P)$.
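The characteristic quantities of Section B and the transform (1) can be checked numerically by taking the empirical distribution of a finite sample as the distribution. The following Python sketch is only an illustration under that assumption: the function names (`raw_moment`, `central_moment`, `central_from_raw`, `char_fn`), the sample size, and the choice of a standard normal sample are hypothetical and not part of the text. It compares the moments about the mean computed from their definition with the values given by the binomial relation between moments, and compares the empirical characteristic function with $e^{-z^2/2}$, the characteristic function of $N(0, 1)$.

```python
import cmath
import math
import random
from math import comb


def raw_moment(xs, k):
    """k-th moment alpha_k of the empirical distribution of the sample xs."""
    return sum(x ** k for x in xs) / len(xs)


def central_moment(xs, r):
    """r-th moment about the mean, mu_r, computed directly from its definition."""
    m = raw_moment(xs, 1)
    return sum((x - m) ** r for x in xs) / len(xs)


def central_from_raw(xs, r):
    """mu_r via the relation mu_r = sum_k C(r, k) * alpha_{r-k} * (-m)^k."""
    m = raw_moment(xs, 1)
    return sum(comb(r, k) * raw_moment(xs, r - k) * (-m) ** k for k in range(r + 1))


def char_fn(xs, z):
    """Characteristic function (1) of the empirical distribution: mean of exp(i*z*x)."""
    return sum(cmath.exp(1j * z * x) for x in xs) / len(xs)


random.seed(0)  # fixed seed so that the sketch is reproducible
sample = [random.gauss(0.0, 1.0) for _ in range(20000)]  # assumed N(0, 1) sample

print("mean:", raw_moment(sample, 1), "variance:", central_moment(sample, 2))
for r in (2, 3, 4):
    # The two columns agree up to rounding: the relation is an algebraic identity.
    print(r, central_moment(sample, r), central_from_raw(sample, r))
for z in (0.0, 0.5, 1.0):
    # Empirical characteristic function against exp(-z^2 / 2) for N(0, 1).
    print(z, char_fn(sample, z), math.exp(-z * z / 2))
```

The relation between the two kinds of moments holds exactly for any distribution with the required moments, so the first comparison differs only by floating-point rounding, whereas the second comparison is subject to sampling error of order $1/\sqrt{N}$ for a sample of size $N$.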

Then the Fourier transform of $\Phi$ is also called the characteristic function of $X$, which can also be written as $E(e^{i(z, X)})$ (→ 342 Probability Theory).

The following properties play a fundamental role in the study of the relationship between probability distributions and characteristic functions: (i) The correspondence defined by (1) between the $n$-dimensional probability distribution $\Phi$ and its characteristic function $\varphi$ is one-to-one. (ii) For any $a_p, b_p \in \mathbf{R}$, $a_p < b_p$ ($p = 1, 2, \ldots, n$), we have

$$\int_{\mathbf{R}^n} \prod_{p=1}^{n} f(x_p; a_p, b_p)\,d\Phi(x) = \lim_{c \to \infty} \frac{1}{(2\pi)^n} \int_{-c}^{c} \cdots \int_{-c}^{c} \prod_{p=1}^{n} \frac{e^{-i z_p a_p} - e^{-i z_p b_p}}{i z_p} \times \varphi(z_1, \ldots, z_n)\,dz_1 \cdots dz_n, \qquad (2)$$

where $f(t; a, b)$ denotes the modified indicator function of $[a, b]$ defined by

$$f(t; a, b) = \begin{cases} 1, & t \in (a, b), \\ 1/2, & t = a \text{ or } b, \\ 0, & t \notin [a, b], \end{cases}$$

and $x = (x_1, \ldots, x_n) \in \mathbf{R}^n$. If an $n$-dimensional interval $I = [a_1, \ldots, a_n; b_1, \ldots, b_n]$ defined by $a_i \le x_i \le b_i$ ($i = 1, 2, \ldots, n$) is an interval of continuity for the probability distribution $\Phi$, i.e., $\Phi(\partial I) = 0$, where $\partial I$ denotes the boundary of $I$, then the left-hand side of (2) is equal to $\Phi(I)$. Equation (2) is called the inversion formula for the characteristic function $\varphi$.

The characteristic function $\varphi$ of an $n$-dimensional probability distribution has the following properties: (i) For any points $z^{(1)}, \ldots, z^{(p)}$ of the $n$-dimensional space $\mathbf{R}^n$ and any complex numbers $a_1, \ldots, a_p$, we have

$$\sum_{j,k=1}^{p} \varphi(z^{(j)} - z^{(k)})\,a_j \bar{a}_k \ge 0.$$

(ii) $\varphi(z^{(k)})$ converges to $\varphi(0)$ as $z^{(k)} \to 0$. (iii) $\varphi(0) = 1$. A complex-valued function $\varphi$ of $z \in \mathbf{R}^n$ is called †positive definite if it satisfies the inequality in (i). Any continuous positive definite function $\varphi$ on $\mathbf{R}^n$ such that $\varphi(0) = 1$ is the characteristic function of an $n$-dimensional probability distribution (†Bochner's theorem) (→ 192 Harmonic Analysis). A counterpart to Bochner's theorem holds for any positive definite sequence as well (†Herglotz's theorem).

The characteristic function is often useful for giving probability distributions explicitly. (For characteristic functions of typical probability distributions → Appendix A, Table 22. For general information about criteria that can be used to decide whether a given function is a characteristic function → [8].)

The moment generating function defined by $f(z) = \int \exp(-(z, x))\,d\Phi(x)$ ($z \in \mathbf{R}^n$) does not necessarily exist for all $n$-dimensional distributions but does exist for a number of useful probability distributions $\Phi$, and then $f(z)$ uniquely determines $\Phi$.

Given a 1-dimensional distribution $\Phi$ with $\beta_k < +\infty$, we denote by $\gamma_k$ the coefficient of $(iz)^k/k!$ in the †Maclaurin expansion of $\log \varphi(z)$. We call $\gamma_k$ the ($k$th order) semi-invariant of $\Phi$. The moments and semi-invariants are connected by the relations $\gamma_1 = \alpha_1$, $\gamma_2 = \alpha_2 - \alpha_1^2 = \sigma^2$, $\gamma_3 = \alpha_3 - 3\alpha_1\alpha_2 + 2\alpha_1^3$, $\gamma_4 = \alpha_4 - 3\alpha_2^2 - 4\alpha_1\alpha_3 + 12\alpha_1^2\alpha_2 - 6\alpha_1^4$, ....

D. Specific Distributions

Given an $n$-dimensional distribution $\Phi$, a point $a$ with $\Phi(\{a\}) > 0$ is called a discontinuity point of $\Phi$. The set $D$ of all discontinuity points of $\Phi$ is at most countable. When $\Phi(D) = 1$, $\Phi$ is called a purely discontinuous distribution. In particular, if $D$ is a lattice, $\Phi$ is called a lattice distribution. If the distribution function of $\Phi$ is a continuous function, $\Phi$ is called a continuous distribution. By virtue of the †Lebesgue decomposition theorem, every probability distribution can be expressed in the form

$$\Phi = a_1\Phi_1 + a_2\Phi_2 + a_3\Phi_3, \qquad a_1, a_2, a_3 \ge 0, \quad a_1 + a_2 + a_3 = 1,$$

where $\Phi_1$ is purely discontinuous, $\Phi_2$ is †absolutely continuous with respect to †Lebesgue measure, and $\Phi_3$ is continuous and †singular.

Let $\Phi$ be an absolutely continuous distribution. Then there exists a unique (up to Lebesgue measure zero) measurable nonnegative function $f(x)$ ($x \in \mathbf{R}^n$) such that $\Phi(E) = \int_E f(x)\,dx$. This function $f(x)$ is called the probability density of $\Phi$.

We now list some frequently used 1-dimensional lattice distributions (for explicit data → Appendix A, Table 22): the unit distribution with $\Phi(\{0\}) = 1$; the binomial distribution $\mathrm{Bin}(n, p)$ with parameters $n$ and $p$; the Poisson distribution $P(\lambda)$ with parameter $\lambda$; the geometric distribution $G(p)$ with parameter $p$; the hypergeometric distribution $H(N, n, p)$ with parameters $N$, $n$, and $p$; and the negative binomial distribution $NB(m, q)$ with parameters $m$ and $q$. The following $k$-dimensional lattice distributions are used frequently: the multinomial distribution $M(n, p)$ with parameters $n$ and $p$; the multiple hypergeometric distribution; the negative multinomial distribution; etc.

The following 1-dimensional distributions are absolutely continuous: the normal distribution (or Gaussian distribution) $N(\mu, \sigma^2)$ with mean $\mu$ and variance $\sigma^2$ (sometimes $N(0, 1)$ is called the standard normal distribution); the

Cauchy distribution $C(\mu, a)$ with parameters $\mu$ (†median) and $a$; the uniform distribution $U(\alpha, \beta)$ on an interval $[\alpha, \beta]$; the exponential distribution $e(a)$ with parameter $a$; the gamma distribution $\Gamma(p, \sigma)$; the †$\chi^2$ distribution $\chi^2(n)$; the beta distribution $B(p, q)$; the $F$-distribution $F(m, n)$; the $Z$-distribution $Z(m, n)$; the $t$-distribution $t(n)$; etc. Furthermore, there are several $k$-dimensional absolutely continuous distributions, such as the $k$-dimensional normal distribution $N(\mu, \Sigma)$ with mean vector $\mu = (\mu_1, \mu_2, \ldots, \mu_k)$ and covariance matrix $\Sigma = (\sigma_{ij})$, the Dirichlet distribution, etc.

E. Convolution

Given any two $n$-dimensional distributions $\Phi_1$, $\Phi_2$, the $n$-dimensional distribution $\Phi(E) = \int_{\mathbf{R}^{2n}} \chi_E(x + y)\,d\Phi_1(x)\,d\Phi_2(y)$ is called the composition (or convolution) of $\Phi_1$ and $\Phi_2$ and is denoted by $\Phi_1 * \Phi_2$, where $\chi_E$ is the indicator function of the set $E$. Let $X_1$ and $X_2$ be †independent random variables with distributions $\Phi_1$ and $\Phi_2$. Then the distribution of $X_1 + X_2$ is $\Phi_1 * \Phi_2$. When $F_i$ is the distribution function of $\Phi_i$ ($i = 1, 2$), the distribution function $F_1 * F_2$ of $\Phi_1 * \Phi_2$ is expressed in the form $F_1 * F_2(x) = \int_{\mathbf{R}^n} F_1(x - y)\,dF_2(y)$. If $\Phi_1$ has a density $f_1(x)$, then $\Phi_1 * \Phi_2$ has a density $f(x) = \int_{\mathbf{R}^n} f_1(x - y)\,dF_2(y)$. If $\varphi(z)$ is the characteristic function of the convolution of two probability distributions $\Phi_1$ and $\Phi_2$ with characteristic functions $\varphi_1$ and $\varphi_2$, then $\varphi$ is the product of $\varphi_1$ and $\varphi_2$: $\varphi(z) = \varphi_1(z)\varphi_2(z)$. Therefore, for every $k$, the $k$th order semi-invariant of the convolution of two distributions is equal to the sum of their $k$th order semi-invariants. Suppose that we are given a family of distributions $\Phi = \{\Phi(\alpha, \beta, \ldots)\}$ indexed with parameters $\alpha, \beta, \ldots$. If for $(\alpha_1, \beta_1, \ldots)$ and $(\alpha_2, \beta_2, \ldots)$ there exists $(\alpha_3, \beta_3, \ldots)$ such that $\Phi(\alpha_1, \beta_1, \ldots) * \Phi(\alpha_2, \beta_2, \ldots) = \Phi(\alpha_3, \beta_3, \ldots)$, then we say that $\Phi$ has a reproducing property. Some of the distributions listed above have the reproducing property: $P(\lambda_1) * P(\lambda_2) = P(\lambda_1 + \lambda_2)$, $\mathrm{Bin}(n_1, p) * \mathrm{Bin}(n_2, p) = \mathrm{Bin}(n_1 + n_2, p)$, $NB(m_1, q) * NB(m_2, q) = NB(m_1 + m_2, q)$, $N(\mu_1, \sigma_1^2) * N(\mu_2, \sigma_2^2) = N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$, $\Gamma(p_1, \sigma) * \Gamma(p_2, \sigma) = \Gamma(p_1 + p_2, \sigma)$, $C(\mu_1, a_1) * C(\mu_2, a_2) = C(\mu_1 + \mu_2, a_1 + a_2)$, etc.

Given a 1-dimensional distribution function $F(x)$,

$$Q_F(l) = \max_{-\infty < x < \infty} (F(x + l) - F(x - l)), \qquad l > 0,$$

is called the maximal concentration function of $F$ (P. Lévy [6]). Since it satisfies the relation $Q_{F_1 * F_2}(l) \le Q_{F_i}(l)$ ($i = 1, 2$), we can use it to study the properties of sums of independent random variables. The mean concentration function defined by

$$C_F(l) = \frac{1}{2l} \int_{-\infty}^{\infty} (F(x + l) - F(x - l))^2\,dx$$

is also useful for similar purposes.

Let $N(m, v)$ be the 1-dimensional normal distribution with mean $m$ and variance $v$, and let $P(\lambda, a)$ be the distribution obtained through translation by $a$ of the Poisson distribution with parameter $\lambda$. If 1-dimensional distributions $\Phi_k$, $\Psi_k$ ($k = 1, 2$) exist such that $N(m, v) = \Phi_1 * \Phi_2$, $P(\lambda, a) = \Psi_1 * \Psi_2$, we have $\Phi_k = N(m_k, v_k)$, $\Psi_k = P(\lambda_k, a_k)$ ($k = 1, 2$) for some $m_k, v_k, \lambda_k, a_k$ ($k = 1, 2$). These are known, respectively, as Cramér's theorem and Raikov's theorem. Yu. V. Linnik proved a similar fact (the decomposition theorem) for a more general family with reproducing property by using the theory of analytic functions [9].

F. Convergence of Probability Distributions

The concept of convergence of distributions plays an important role in limit theorems and other fields of probability theory. When $\Omega$ is a topological space, we consider convergence of probability measures on $\Omega$ with respect to the †weak topology introduced in the space of measures on $\Omega$ (→ 37 Banach Spaces). Such convergence is called weak convergence in probability theory. For a sequence of $n$-dimensional distributions $\Phi_k$ ($k = 1, 2, \ldots$) to converge to $\Phi$ weakly, each of the following conditions is necessary and sufficient. (1) For every continuous function $f$ with compact support, $\lim_{k \to \infty} \int_{\mathbf{R}^n} f(x)\,d\Phi_k(x) = \int_{\mathbf{R}^n} f(x)\,d\Phi(x)$. (2) At every continuity point of the distribution function $F(x_1, \ldots, x_n)$ of $\Phi$, $\lim_{k \to \infty} F_k(x_1, \ldots, x_n) = F(x_1, \ldots, x_n)$ ($F_k$ is the distribution function of $\Phi_k$). (3) For every continuity set $E$ of $\Phi$ (namely, a set such that $\Phi(\bar{E} - E^\circ) = 0$), $\lim_{k \to \infty} \Phi_k(E) = \Phi(E)$. (4) For all open $G \subset \mathbf{R}^n$, $\liminf_{k \to \infty} \Phi_k(G) \ge \Phi(G)$. (5) For all closed $F \subset \mathbf{R}^n$, $\limsup_{k \to \infty} \Phi_k(F) \le \Phi(F)$. (6) $\lim_{k \to \infty} \rho(\Phi_k, \Phi) = 0$, where $\rho$ is a metric defined in the following way: Given any $n$-dimensional distributions $\Phi_1$, $\Phi_2$, we put $\varepsilon_{12} = \inf\{\varepsilon \mid \Phi_1(F) < \Phi_2(F^\varepsilon) + \varepsilon \text{ for every closed } F\}$ ($F^\varepsilon$ is the $\varepsilon$-neighborhood of $F$) and define $\rho(\Phi_1, \Phi_2) = \max(\varepsilon_{12}, \varepsilon_{21})$. The metric $\rho$, called the Lévy distance, was introduced by Lévy [6] in one dimension and by Yu. V. Prokhorov in metric spaces [10]. Each of these conditions except (2) is still necessary and sufficient for $\Phi_n$ to converge weakly to $\Phi$ when $\Phi_n$ and $\Phi$ are probability measures on a †complete separable metric space. It should also be noted that the probability measures on a complete separable metric space constitute a complete separable metric space with respect to the Lévy distance.

A family $\Phi_\alpha$ ($\alpha \in \Lambda$) of probability measures on a complete separable metric space is said to be tight if for every $\varepsilon > 0$ there exists a †compact set $K = K(\varepsilon)$ such that $\Phi_\alpha(K^c) < \varepsilon$ for all $\alpha \in \Lambda$. A family $\Phi_\alpha$ ($\alpha \in \Lambda$) is tight if and only if it is †totally bounded with respect to the topology induced by the Lévy distance. Hence a tight family $\Phi_\alpha$ ($\alpha \in \Lambda$) has a weakly convergent subsequence.

We can give a criterion for the convergence of probability measures in terms of their characteristic functions. Suppose that $\Phi_k$ and $\Phi$ are $n$-dimensional probability measures with characteristic functions $\varphi_k$ and $\varphi$. Then $\Phi_k$ converges weakly to $\Phi$ if and only if for every $z$, $\lim_k \varphi_k(z) = \varphi(z)$. Let $\varphi_k$ be the characteristic function of an $n$-dimensional probability distribution $\Phi_k$. If the sequence $\{\varphi_k\}$ converges pointwise to a limit function $\varphi$ and the convergence of $\varphi_k$ is uniform in some neighborhood of the origin, then $\varphi$ is also the characteristic function of an $n$-dimensional probability distribution $\Phi$ and the sequence $\{\Phi_k\}$ converges weakly to $\Phi$ [7] (Lévy's continuity theorem).

For any probability distribution concentrated on $[0, \infty)$, the use of †Laplace transforms as a substitute for Fourier transforms provides a powerful tool. The method of probability generating functions is available for the study of arbitrary probability distributions concentrated on the nonnegative integers [14]. The method of moment-generating functions is also useful. There are many results on the relation between these functions, probability distributions, and their convergence [14-16].

Let $\Phi_1, \Phi_2, \ldots$ and $\Phi$ be 1-dimensional distributions. If all absolute moments exist, $\sum_{j} \beta_j^{-1/j} = \infty$ for $\beta_j = \int_{-\infty}^{\infty} |x|^j\,d\Phi(x) < \infty$, and $\lim_{k \to \infty} \int_{-\infty}^{\infty} x^j\,d\Phi_k(x) = \int_{-\infty}^{\infty} x^j\,d\Phi(x)$ ($j = 0, 1, 2, \ldots$), then $\Phi_k$ weakly converges to $\Phi$. This condition is sufficient but not necessary.

G. Infinitely Divisible Distributions

An $n$-dimensional probability distribution $\Phi$ is called infinitely divisible if for every positive integer $k$, there exists a probability distribution $\Phi_k$ such that $\Phi = \Phi_k * \Phi_k * \cdots * \Phi_k$ ($= \Phi_k^{k*}$). Both normal distributions and Poisson distributions are infinitely divisible. If an $n$-dimensional distribution $\Phi$ satisfies the condition $\int_{|x| > \varepsilon} \Phi(dx) < \varepsilon$, we say that $\Phi \in \mathfrak{U}(\varepsilon)$. Then $\Phi$ is an infinitely divisible distribution if and only if for every $\varepsilon > 0$ we can find $\Phi_1, \Phi_2, \ldots, \Phi_k \in \mathfrak{U}(\varepsilon)$ such that $\Phi = \Phi_1 * \Phi_2 * \cdots * \Phi_k$.

Let $X_{ki}$, $i = 1, 2, \ldots, n(k)$, be independent random variables for every $k$, and assume that the distribution of $X_{ki}$ belongs to $\mathfrak{U}(\varepsilon_k)$, $i = 1, 2, \ldots, n(k)$, where $\varepsilon_k \to 0$ as $k \to \infty$. If the probability distributions of the sums $X_k = \sum_{i=1}^{n(k)} X_{ki}$ converge to a probability distribution as $k \to \infty$, then the limit distribution is infinitely divisible.

The characteristic function of a 1-dimensional infinitely divisible distribution can be written in the form

$$\varphi(z) = \exp\left( i\gamma z - \frac{\sigma}{2} z^2 + \int_{-\infty}^{\infty} A(u, z)\,\frac{1 + u^2}{u^2}\,dG(u) \right), \qquad (3)$$

where $\gamma$ is a constant, $\sigma$ is a nonnegative constant, $G(u)$ is a nondecreasing bounded function with $G(-\infty) = 0$, $A(u, z) = \exp(iuz) - 1 - izu/(1 + u^2)$, and the value of $A(u, z)(1 + u^2)/u^2$ at $u = 0$ is defined to be $-z^2/2$. Formula (3) is called Khinchin's canonical form. For the characteristic function of an infinitely divisible $n$-dimensional distribution, the canonical form is as follows:

$$\varphi(z) = \exp\left( i(m, z) - \frac{1}{2} \sum_{p,q=1}^{n} c_{pq} z_p z_q + \int_{\mathbf{R}^n} \left( e^{i(z, x)} - 1 - \frac{i(z, x)}{1 + |x|^2} \right) n(dx) \right),$$
$$z = (z_1, \ldots, z_n) \in \mathbf{R}^n, \qquad (4)$$

where $m \in \mathbf{R}^n$, $(c_{pq})$ is a positive semidefinite matrix, and $n(dx)$ is a measure on $\mathbf{R}^n$ such that $n(\{0\}) = 0$ and

$$\int_{\mathbf{R}^n} \frac{|x|^2}{1 + |x|^2}\,n(dx) < \infty.$$

Formula (4) is called Lévy's canonical form. If a 1-dimensional infinitely divisible distribution $\Phi$ satisfies $\int_{\mathbf{R}^1} x^2\,d\Phi(x) < \infty$, then its characteristic function is given by

$$\varphi(z) = \exp\left( imz - \frac{v}{2} z^2 + \int_{-\infty}^{\infty} (e^{izu} - 1 - izu)\,\frac{1}{u^2}\,dK(u) \right), \qquad (5)$$

where $m$ is a real constant, $v$ is a nonnegative constant, and $K(u)$ is a nondecreasing bounded function such that $K(-\infty) = 0$. It is called Kolmogorov's canonical form. (For infinitely divisible distributions on a homogeneous space → 5 Additive Processes.)

Let $\Phi$ and $\Psi$ be $n$-dimensional distributions. If for some $\lambda > 0$, $\Psi(E) = \Phi(\lambda E)$ ($\lambda E = \{\lambda x \mid x \in E\}$) for every set $E$, we say that $\Phi$ and $\Psi$ are equivalent. Let $\Phi$ and $\Psi$ be probability distributions with distribution functions $F$ and $G$

and characteristic functions $\varphi$ and $\psi$. Then the following three statements are equivalent: (1) $\Phi$ and $\Psi$ are equivalent; (2) $G(x) = F(\lambda x)$ for every $x$; and (3) $\psi(\lambda z) = \varphi(z)$ for every $z$. We call $\Phi$ a stable distribution if for every pair of distributions $\Phi_1$, $\Phi_2$ equivalent to $\Phi$, the convolution $\Phi_1 * \Phi_2$ is equivalent to $\Phi$. If $\Phi$ is stable, every distribution equivalent to $\Phi$ is also stable. We can characterize stable distributions in terms of their characteristic functions $\varphi(z)$ as follows: For every pair $\lambda_1, \lambda_2 > 0$, there exists a $\lambda = \lambda(\lambda_1, \lambda_2) > 0$ such that $\varphi(\lambda z) = \varphi(\lambda_1 z)\varphi(\lambda_2 z)$. We can restate this characterization as follows: $\Phi$ is stable if and only if for every pair of independent random variables $X_1$ and $X_2$ with identical distribution $\Phi$ and for any positive numbers $\lambda_1$ and $\lambda_2$, there exists a positive number $\lambda$ such that $(\lambda_1 X_1 + \lambda_2 X_2)/\lambda$ has the distribution $\Phi$. By the definition we see that all stable distributions are infinitely divisible.

In the 1-dimensional case, putting $\varphi(z) = \exp \psi(z)$, we have $\psi(\lambda z) = \psi(\lambda_1 z) + \psi(\lambda_2 z)$, which implies $\psi(z) = (-c_0 + i(z/|z|)c_1)|z|^\alpha$, where $c_0 \ge 0$, $-\infty < c_1 < \infty$, $0 < \alpha \le 2$. The parameter $\alpha$ is called the exponent (or index) of the stable distribution. The stable distributions with exponent $\alpha = 2$ are the normal distributions, and the stable distributions with exponent $\alpha = 1$ are the Cauchy distributions. We have $\psi(z) = -c_0 |z|^\alpha$ for a symmetric stable distribution. (For the stable distribution with exponent 1/2 → Appendix A, Table 22.)

Generalizing stable distributions, we can define quasistable distributions, which B. V. Gnedenko and A. N. Kolmogorov [17] called stable distributions also. Let $F$ be the distribution function of a 1-dimensional distribution $\Phi$. $\Phi$ is said to be quasistable if to every $b_1 > 0$, $b_2 > 0$ and real $\lambda_1$, $\lambda_2$ there correspond a positive number $b$ and a real number $\lambda$ such that we have the relation $F((x - \lambda_1)/b_1) * F((x - \lambda_2)/b_2) = F((x - \lambda)/b)$.

Let $\{X_k\}$ be a sequence of independent random variables with identical distribution. If for suitably chosen constants $A_n$ and $B_n$ the distributions of the sums $B_n^{-1}(\sum_{k=1}^{n} X_k) - A_n$ converge to a distribution, the limit distribution is a quasistable distribution (Lévy). A necessary and sufficient condition for a distribution to be quasistable is that its characteristic function $\varphi(z)$ satisfy the relation $\varphi(b_1 z)\varphi(b_2 z) = \varphi(bz)e^{i\gamma z}$ ($\gamma = \lambda - \lambda_1 - \lambda_2$). The characteristic function of a quasistable distribution has the canonical representation

$$\varphi(z) = \exp \psi(z),$$
$$\psi(z) = imz - c|z|^\alpha \left(1 + i\beta (z/|z|)\,\omega(z, \alpha)\right),$$

where $m$ is a real number, $c \ge 0$, $0 < \alpha \le 2$, $|\beta| \le 1$, and $\omega(z, \alpha) = \tan(\pi\alpha/2)$ ($\alpha \ne 1$), $\omega(z, \alpha) = (2/\pi)\log|z|$ ($\alpha = 1$). The parameter $\alpha$ is called the exponent of the quasistable distribution. A quasistable distribution with $\alpha \ne 1$ is obtained from a stable distribution by translation, but quasistable distributions with $\alpha = 1$ are not.

Semistable distributions are another generalization of stable distributions. A distribution is called semistable if its characteristic function $\varphi(z)$ satisfies the relation $\psi(qz) = q^\alpha \psi(z)$ for a positive number $q$ ($\ne 1$), where $\varphi(z) = \exp(\psi(z))$. Also in this case, the general form was obtained by Lévy [6].

A 1-dimensional probability distribution $\Phi$ is called an L-distribution if the distribution function $F$ of $\Phi$ is the convolution of $F(x/a)$ and some other distribution function $F_a(x)$ for every $0 < a < 1$. $\Phi$ is an L-distribution if and only if there exists a sequence of independent random variables $\{X_k\}$ such that for suitably chosen constants $B_n > 0$ and $A_n$ the distributions of the sums $B_n^{-1}(\sum_{k=1}^{n} X_k) - A_n$ converge to $\Phi$ and $\sup_{1 \le k \le n} P(|X_k/B_n| > \varepsilon) \to 0$ as $n \to \infty$ for every $\varepsilon > 0$. Quasistable distributions are L-distributions.

H. The Shape of Distributions

Let $F(x)$ be a 1-dimensional distribution function. The quantity $\zeta_p$ such that $F(\zeta_p - 0) \le p \le F(\zeta_p)$ ($0 < p < 1$) is called the quantile of order $p$ of $F$. In particular, the quantity $\zeta_{1/2}$ is called a median. If $F$ satisfies the relation $1 - F(m + x) = F(m - x)$, it is called symmetric. In any 1-dimensional symmetric distribution, every moment of odd order about the mean (if it exists) is equal to zero.

The ratio $\gamma_1 = \mu_3/\sigma^3$ is used as a measure of departure from symmetry of a distribution and is called the coefficient of skewness. Furthermore, the ratio $\gamma_2 = \mu_4/\sigma^4 - 3$ is called the coefficient of excess. For the normal distribution, we have $\gamma_1 = \gamma_2 = 0$. If $\gamma_2 \ne 0$, $\gamma_2$ expresses the degree of deviation from the normal distribution.

A distribution function $F(x)$ is called unimodal if there exists one value $x = a$ such that $F(x)$ is convex for $x < a$ and concave for $x > a$. All L-distributions (and hence quasistable distributions) are unimodal [18].

I. Kolmogorov's Extension Theorem

Let $\Omega = \mathbf{R}^T$, where $T$ is an arbitrary index set. We associate with $\mathbf{R}^T$ the $\sigma$-algebra $\mathfrak{B}^T$ generated by the cylinder sets, i.e., $\{\omega \in \Omega \mid \pi_{t_1}(\omega) \in E_1, \ldots, \pi_{t_n}(\omega) \in E_n\}$, where $\pi_t(\omega)$ denotes the $t$th coordinate of $\omega$, $E_k \in \mathfrak{B}(\mathbf{R}^1)$, $1 \le k \le n$, $t_1 < t_2 < \cdots < t_n$, and $n = 1, 2, \ldots$. Given a probability

measure $\Phi$ on $(\mathbf{R}^T, \mathfrak{B}^T)$, we can define a finite-dimensional †marginal distribution $\Phi_S$ for any finite subset $S$ of $T$ by $\Phi_S(E) = \Phi(\pi_S^{-1}(E))$, $E \in \mathfrak{B}^S$, where $\pi_S$ is the natural †projection $\pi_S: \mathbf{R}^T \to \mathbf{R}^S$. The measures $\{\Phi_S\}$ satisfy the following consistency condition: If $S_1 \subset S_2$ ($\subset T$) are finite and if $E \in \mathfrak{B}^{S_1}$, then

\[ \Phi_{S_1}(E) = \Phi_{S_2}(\pi_{S_1,S_2}^{-1}(E)), \tag{6} \]

where $\pi_{S_1,S_2}: \mathbf{R}^{S_2} \to \mathbf{R}^{S_1}$ is the natural projection. Conversely, if we are given a family of finite-dimensional probability measures $\{\Phi_S\}$ which satisfies the consistency condition (6), then Kolmogorov's extension theorem [1] asserts that there exists a unique probability measure $\Phi$ on $(\mathbf{R}^T, \mathfrak{B}^T)$ such that $\Phi_S(E) = \Phi(\pi_S^{-1}(E))$, $E \in \mathfrak{B}^S$, for any finite $S \subset T$.
This theorem is useful in constructing †stochastic processes. For example, let $\Phi_{n_1, n_2, \ldots, n_k}$ be the †product measure of $k$ copies of a given probability measure $\Phi$ on $\mathbf{R}^1$. Then the family $\{\Phi_{n_1, n_2, \ldots, n_k};\ n_1, n_2, \ldots, n_k \in \mathbf{Z},\ k \in \mathbf{N}\}$ satisfies the consistency condition and hence, by Kolmogorov's extension theorem, determines a probability measure on $\mathbf{R}^{\mathbf{Z}}$, which is denoted by $\Phi^{\mathbf{Z}}$. Thus $X_n(\omega) = \pi_n(\omega)$, $n \in \mathbf{Z}$ ($\omega \in (\mathbf{R}^{\mathbf{Z}}, \mathfrak{B}^{\mathbf{Z}}, \Phi^{\mathbf{Z}})$), are independent identically $\Phi$-distributed random variables.
Kolmogorov's extension theorem is generalized to the case where the component spaces are †standard measurable spaces (→ 270 Measure Theory) instead of $\mathbf{R}^1$, and also to the case where product spaces are replaced by †projective systems [19].

J. Characteristic Functionals on Infinite-Dimensional Spaces

Contrary to the finite-dimensional case, Bochner's theorem does not necessarily hold in infinite-dimensional spaces. For example, let $(T, \|\cdot\|)$ be an infinite-dimensional †Hilbert space and $\varphi(t) = \exp(-\|t\|^2)$. $\varphi$ is continuous and positive definite, and $\varphi(0) = 1$. But it is known that there is no probability measure on $T = T^*$ (topological dual of $T$) which corresponds to $\varphi$. Bochner's theorem is generalized to infinite-dimensional spaces as follows.
Let $T$ be a real †vector space endowed with the topology $\tau$ defined by a system of †Hilbertian seminorms $\{\|\cdot\|_\alpha,\ \alpha \in A\}$. Define a new topology $I(\tau)$ of $T$ by all Hilbertian seminorms $\|\cdot\|$ each of which is HS-dominated by some $\|\cdot\|_\alpha$, $\alpha \in A$, i.e., $\sup\{(\sum_i \|e_i\|^2)^{1/2};\ \{e_i\}: \alpha\text{-orthonormal}\} < \infty$. If $I(\tau) = \tau$, then $(T, \tau)$ is called a †nuclear space.
Let $T_\tau^*$ be the topological dual of $(T, \tau)$ (i.e., the set of all $\tau$-continuous real valued linear functionals on $T$). Define a Borel structure $\mathfrak{B}(T_\tau^*)$ of $T_\tau^*$ as the $\sigma$-algebra generated by the system of half-spaces $\{x \in T_\tau^*;\ x(t) < a\}$, $t \in T$, $a \in \mathbf{R}^1$. For a probability measure $\Phi$ on $(T_\tau^*, \mathfrak{B}(T_\tau^*))$, define

\[ \varphi(t) = \int_{T_\tau^*} \exp(i\langle x, t\rangle)\, d\Phi(x), \qquad t \in T, \]

which is called the characteristic functional of $\Phi$. A functional $\varphi$ on $T$ is the characteristic functional of a probability measure $\Phi$ on $(T_\tau^*, \mathfrak{B}(T_\tau^*))$ such that $\Phi^*(\bigcup_n T_{\alpha_n}^*) = 1$ for some sequence $\{\alpha_n\} \subset A$, where $\Phi^*$ is the †outer measure (→ 270 Measure Theory) and $T_\alpha^*$ is the topological dual of $(T, \|\cdot\|_\alpha)$, if and only if $\varphi$ is positive definite, $\varphi(0) = 1$, and continuous with respect to the topology $I(\tau)$ [23].
As special cases of the foregoing theorem, we have the following. If $(T, \tau)$ is a nuclear space, then every positive definite $\tau$-continuous functional $\varphi$ with $\varphi(0) = 1$ is the characteristic functional of a probability measure on $T_\tau^*$ (Minlos [24]). Schwartz's spaces $\mathcal{S}(\mathbf{R}^n)$ and $\mathcal{D}(\mathbf{R}^n)$ are nuclear. Let $(T, \tau = \|\cdot\|)$ be a Hilbert space. A †Hilbert-Schmidt operator $U$ is, by definition, a †bounded linear operator on $T$ such that $\sum_i \|Ue_i\|^2 < \infty$ for any †complete orthonormal system $\{e_i\}$ (this quantity does not depend on the choice of $\{e_i\}$). Define a seminorm $\|\cdot\|_U$ by $\|t\|_U = \|Ut\|$, $t \in T$, for a Hilbert-Schmidt operator $U$. Then the topology $I(\tau)$ coincides with the topology induced by the system of seminorms $\|\cdot\|_U$, where $U$ are Hilbert-Schmidt operators, which is called the Sazonov topology. Thus every functional $\varphi$ on $T$, which is positive definite, $\varphi(0) = 1$, and continuous with respect to the Sazonov topology, is the characteristic functional of a probability measure on $T_\tau^*$ (Sazonov [25]).
The probability measure on $\mathcal{S}'$ with the characteristic functional $\exp(-\int_{-\infty}^{\infty} |f(s)|^2\, ds)$, $f \in \mathcal{S}$, is the probability measure of a Gaussian †white noise on $\mathcal{S}'$.

References

[1] A. N. Kolmogorov, Grundbegriffe der Wahrscheinlichkeitsrechnung, Springer, 1933; English translation, Foundations of the theory of probability, Chelsea, 1950.
[2] H. Cramér, Mathematical methods of statistics, Princeton Univ. Press, 1946.
[3] J. L. Doob, Stochastic processes, Wiley, 1953.
[4] M. M. Loève, Probability theory, Van Nostrand, third edition, 1963.
[5] W. Feller, An introduction to probability theory and its applications II, Wiley, 1966.
[6] P. Lévy, Théorie de l'addition des variables aléatoires, Gauthier-Villars, 1937.
[7] P. Lévy, Calcul des probabilités, Gauthier-Villars, 1925.
[8] E. Lukacs, Characteristic functions, Hafner, 1960.
[9] Yu. V. Linnik, Decomposition of probability distributions, Oliver & Boyd, 1964. (Original in Russian, 1960.)
[10] Yu. V. Prokhorov, Convergence of random processes and limit theorems in probability theory, Theory of Prob. Appl., 1 (1956), 155-214. (Original in Russian, 1956.)
[11] L. LeCam, Convergence in distribution of stochastic processes, Univ. California Publ. Statist., 2 (1957), 207-236.
[12] K. R. Parthasarathy, Probability measures on metric spaces, Academic Press, 1967.
[13] P. Billingsley, Convergence of probability measures, Wiley, 1968.
[14] W. Feller, An introduction to probability theory and its applications I, Wiley, second edition, 1957.
[15] J. A. Shohat and J. D. Tamarkin, The problem of moments, Amer. Math. Soc. Math. Surveys, 1943.
[16] D. V. Widder, Laplace transform, Princeton Univ. Press, 1941.
[17] B. V. Gnedenko and A. N. Kolmogorov, Limit distributions for sums of independent random variables, Addison-Wesley, 1954. (Original in Russian, 1949.)
[18] M. Yamazato, Unimodality of infinitely divisible distribution functions of class L, Ann. Prob., 6 (1978), 523-531.
[19] S. Bochner, Harmonic analysis and the theory of probability, Univ. of California Press, 1955.
[20] E. Nelson, Regular probability measures on function space, Ann. Math., (2) 69 (1959), 630-643.
[21] I. M. Gel'fand and N. Ya. Vilenkin, Generalized functions IV, Applications of harmonic analysis, Academic Press, 1964. (Original in Russian, 1961.)
[22] Yu. V. Prokhorov, The method of characteristic functionals, Proc. 4th Berkeley Symp. Math. Stat. Prob. II, Univ. of California Press, 1961, 403-419.
[23] A. N. Kolmogorov, A note on the papers of R. A. Minlos and V. Sazonov, Theory of Prob. Appl., 4 (1959), 221-223. (Original in Russian, 1959.)
[24] R. A. Minlos, Generalized random processes and their extension to a measure, Selected Transl. Math. Stat. and Prob., Amer. Math. Soc., 3 (1962), 291-313. (Original in Russian, 1959.)
[25] V. Sazonov, A remark on characteristic functionals, Theory of Prob. Appl., 3 (1958), 188-192. (Original in Russian, 1958.)
[26] M. Kac, Probability and related topics in physical sciences, Interscience, 1959.

342 (XVII.1)
Probability Theory

A. History

The origin of the theory of probability goes back to the mathematical problems connected with dice throwing that were discussed in letters exchanged by B. Pascal and P. de Fermat in the 17th century. These problems were concerned primarily with concepts such as †permutations, †combinations, and †binomial coefficients, whose theory was established at about the same time [1]. This elementary theory of probability was later enriched by the work of scholars such as Jakob Bernoulli [2], A. de Moivre [3], T. Bayes, L. de Buffon, Daniel Bernoulli, A. M. Legendre, and J. L. Lagrange. Finally, P. S. Laplace completed the classical theory of probability in his book Théorie analytique des probabilités (1812). In this work, Laplace not only systematized but also greatly extended previous important results by introducing new methods, such as the use of †difference equations and †generating functions. Since the 19th century, the theory of probability has been extensively applied to the natural sciences and even to the social sciences.
The definition of a priori probability due to Laplace provoked a great deal of argument when it was applied. For example, R. von Mises advocated an empirical theory of probability based on the notion of Kollektiv (collective), which is a mathematical model of mass phenomena [5]. However, these arguments are concerned with philosophical rather than mathematical aspects. Nowadays, the main concern of mathematicians lies not in the intuitive or practical meaning of probability but in the logical setup governing probability. From this viewpoint the mathematical model of a random phenomenon is given by a probability measure space $(\Omega, \mathfrak{B}, P)$, where $\Omega$ is the set of all possible outcomes of the phenomenon, $P(E)$ represents the probability that an outcome belonging to $E$ be realized, and $\mathfrak{B}$ is a $\sigma$-algebra consisting of all sets $E$ for which $P(E)$ is defined. All probabilistic concepts, such as random variables, independence, etc., are defined on $(\Omega, \mathfrak{B}, P)$ in terms of measure theory. Such a measure-theoretic basis of probability theory is due to A. Kolmogorov [6], though similar considerations had been made before him for special problems, for example, in the work of E. Borel concerning the strong law of large numbers [7] and in the rigorous definition of Brownian motion by N. Wiener [8].
Ever since probability theory was given
solid foundations by Kolmogorov, it has made tremendous progress. The most important concept in today's probability theory is that of †stochastic processes, which correspond to functions in analysis. In applications a stochastic process is used as the mathematical model of a random phenomenon varying with time. The following types of stochastic processes have been investigated extensively: †additive processes, †Markov processes and †Markov chains, †martingales, †stationary processes, and †Gaussian processes. †Brownian motion and †branching processes are important special stochastic processes. In the same way as functions are often defined by differential equations, there are stochastic processes which can be defined by †stochastic differential equations. The theory of stochastic processes and stochastic differential equations can be applied to †stochastic control, †stochastic filtering, and †statistical mechanics. The †ergodic theory that originated in statistical mechanics is now regarded as an important branch of probability theory closely related to the theory of stationary processes.

B. Probability Spaces

Let $\Omega$ be an †abstract space and $\mathfrak{B}$ be a †$\sigma$-algebra of subsets of $\Omega$. A probability measure (or probability distribution) over $\Omega(\mathfrak{B})$ is a set function $P(E)$ defined for $E \in \mathfrak{B}$ and satisfying the following conditions: (P1) $P(E) \ge 0$; (P2) for every sequence $\{E_n\}$ ($n = 1, 2, \ldots$) of pairwise disjoint sets in $\mathfrak{B}$,

\[ P\Bigl(\bigcup_n E_n\Bigr) = \sum_n P(E_n); \]

(P3) $P(\Omega) = 1$. The triple $(\Omega, \mathfrak{B}, P)$ is called a probability space. The space $\Omega$ (resp. each element $\omega$ of $\Omega$) is called the basic space, space of elementary events, or sample space (resp. sample point or elementary event). We say that a condition $\varepsilon(\omega)$ involving a generic sample point $\omega$ is an event; in particular, it is called a measurable event or random event if the set $E$ of all sample points satisfying $\varepsilon(\omega)$ belongs to $\mathfrak{B}$. We assume that an event is always a measurable one, since we encounter only measurable events in the theory of probability. Because of the obvious one-to-one correspondence between measurable events and $\mathfrak{B}$-measurable sets (i.e., the correspondence of each event $\varepsilon$ with the set $E$ of all sample points $\omega$ satisfying $\varepsilon$), a $\mathfrak{B}$-measurable set itself is frequently called an event. If $\varepsilon$ is an event and $E$ is the $\mathfrak{B}$-measurable set corresponding to $\varepsilon$, we call $P(E)$ or $\Pr(\varepsilon)$ the probability that the event $\varepsilon$ occurs, i.e., the probability of the event $\varepsilon$. The complementary event (resp. impossible event, sure event) is the complementary set $E^c$ (empty set $\emptyset$, whole space $\Omega$). For a finite or infinite family $\{E_\lambda\}$ ($\lambda \in \Lambda$), the sum event (resp. intersection or product event) of $E_\lambda$ is the set $\bigcup_\lambda E_\lambda$ ($\bigcap_\lambda E_\lambda$). If $E \cap F = \emptyset$, then we say that $E$ and $F$ are mutually exclusive or that they are exclusive events.
By the definition of $P$, we have $0 \le P(E) \le 1$ for any event $E$, $P(\emptyset) = 0$, and $P(\Omega) = 1$. Moreover, if $\{E_n\}$ ($n = 1, 2, \ldots$) is a sequence of pairwise exclusive events and $E$ is the sum event of $E_n$, we have

\[ P(E) = \sum_{n=1}^{\infty} P(E_n). \]

This property is called the additivity of probability. If $P(E) = 1$, the event $E$ is said to occur almost certainly (almost surely (abbrev. a.s.), for almost all $\omega$, or with probability 1).
Given a finite sequence $\{E_n\}$ ($n = 1, 2, \ldots, N$) of events, we say that the events $E_n$ ($n = 1, 2, \ldots, N$) are mutually independent or that the sequence $\{E_n\}$ ($n = 1, 2, \ldots, N$) is independent if every subsequence satisfies

\[ P(E_{i_1} \cap E_{i_2} \cap \cdots \cap E_{i_k}) = \prod_{j=1}^{k} P(E_{i_j}). \]

Given an infinite family $\{E_\lambda\}$ ($\lambda \in \Lambda$) of events, we say that the events $E_\lambda$ ($\lambda \in \Lambda$) are mutually independent or that the family $\{E_\lambda\}$ ($\lambda \in \Lambda$) is independent if every finite subfamily is independent. The concept of independence of events can be generalized to a family $\{\mathfrak{B}_\lambda\}$ ($\lambda \in \Lambda$) of $\sigma$-subalgebras of $\mathfrak{B}$ as follows. A family $\{\mathfrak{B}_\lambda\}$ ($\lambda \in \Lambda$) of $\sigma$-subalgebras of events is said to be independent if for every choice of $E_\lambda \in \mathfrak{B}_\lambda$, the family $\{E_\lambda\}$ ($\lambda \in \Lambda$) of events is independent.
For a sequence $\{E_n\}$ ($n = 1, 2, \ldots$) of events, the sets $\limsup_n E_n$ and $\liminf_n E_n$ are called the superior limit event and inferior limit event, respectively. The superior limit event (inferior limit event) is the set of all $\omega$ for which infinitely many events among $E_n$ (all events except finitely many $E_n$) occur. Therefore $P(\limsup_n E_n)$ is the probability that infinitely many events among $E_n$ occur, and $P(\liminf_n E_n)$ is the probability that the events $E_n$ occur for all $n$ after some number $n_0$, where $n_0$ depends on $\omega$ in general. The Borel-Cantelli lemma, which is concerned with the evaluation of $P(\limsup_n E_n)$, reads as follows: Given a sequence $\{E_n\}$ ($n = 1, 2, \ldots$) of events, we have (i) whether the events $E_n$ ($n = 1, 2, \ldots$) are mutually independent or not, $\sum_n P(E_n) < \infty$ implies that $P(\limsup_n E_n) = 0$; and (ii) if the events $E_n$ ($n = 1, 2, \ldots$) are mutually independent, $\sum_n P(E_n) = \infty$ implies that $P(\limsup_n E_n) = 1$. Frequently, applications of part (ii) are greatly hampered by the requirement of independence; a number of sufficient conditions for depen-
dent events to have the same conclusion as (ii) have been discovered. The Chung-Erdős theorem [9] is quite useful in this connection.

C. Random Variables

Let $(\Omega, \mathfrak{B}, P)$ be a probability space. A random variable is a real-valued function $X$ defined on $\Omega$ that is $\mathfrak{B}$-measurable (i.e., for every real number $a$, the set $\{\omega \mid X(\omega) < a\}$ is in $\mathfrak{B}$). If $X_1, X_2, \ldots, X_n$ are random variables, the mapping $X = (X_1, X_2, \ldots, X_n)$ from $\Omega$ into $\mathbf{R}^n$ is said to be an $n$-dimensional (or $\mathbf{R}^n$-valued) random variable. More generally, a mapping $X$ from $(\Omega, \mathfrak{B})$ into another †measurable space $(S, \mathfrak{S})$ is called an $(S, \mathfrak{S})$-valued random variable if it is measurable, that is, for every set $A$ of $\mathfrak{S}$, the set $\{\omega \mid X(\omega) \in A\}$ belongs to $\mathfrak{B}$.
Let $\mathfrak{B}^1$ be the $\sigma$-algebra of all †Borel subsets of the real line $\mathbf{R}$. Then each random variable $X$ induces a probability measure $\Phi$ on $(\mathbf{R}, \mathfrak{B}^1)$ such that

\[ \Phi(A) = P(\{\omega \mid X(\omega) \in A\}), \qquad A \in \mathfrak{B}^1. \]

The measure $\Phi$ is called the (1-dimensional) probability distribution of the random variable $X$ or simply the distribution of $X$. The point function $F$ defined by

\[ F(a) = P(\{\omega \mid X(\omega) \le a\}), \qquad a \in \mathbf{R}, \]

is a monotone nondecreasing and right continuous function such that $\lim_{a \to \infty} F(a) = 1$, $\lim_{a \to -\infty} F(a) = 0$. The function $F$ is called the cumulative distribution function (or simply the distribution function) of the random variable $X$. Similarly, an $n$-dimensional random variable $X = (X_1, \ldots, X_n)$ induces its $n$-dimensional probability distribution (or simply $n$-dimensional distribution) and its $n$-dimensional distribution function $F(a_1, \ldots, a_n) = P(\{\omega \mid X_1(\omega) \le a_1, \ldots, X_n(\omega) \le a_n\})$. If the $X_n$ ($n = 1, 2, \ldots, N$) are $k_n$-dimensional random variables ($n = 1, 2, \ldots, N$), we say that the $l$ ($= \sum_{n=1}^{N} k_n$)-dimensional random variable $X = (X_1, X_2, \ldots, X_N)$ is the joint random variable of $X_n$ ($n = 1, 2, \ldots, N$) and that the ($l$-dimensional) distribution $\Phi$ of $X$ is the joint distribution (or simultaneous distribution) of $X_n$ ($n = 1, 2, \ldots, N$). On the other hand, the $k_n$-dimensional distribution $\Phi_n$ of $X_n$ is called the marginal distribution of the $l$-dimensional distribution $\Phi$.
Given a finite sequence $\{X_n\}$ ($n = 1, 2, \ldots, N$) of random variables, if the relation

\[ P(\{\omega \mid X_n(\omega) \in A_n\ (n = 1, 2, \ldots, N)\}) = \prod_{n=1}^{N} P(\{\omega \mid X_n(\omega) \in A_n\}) \tag{1} \]

holds for every choice of 1-dimensional †Borel sets $A_n$ ($n = 1, 2, \ldots, N$), we say that the random variables $X_n$ ($n = 1, 2, \ldots, N$) are mutually independent or that the sequence $\{X_n\}$ ($n = 1, 2, \ldots, N$) is independent. Given an infinite family $\{X_\lambda\}$ ($\lambda \in \Lambda$) of random variables, we say that the random variables are mutually independent or that the family is independent if every finite subfamily is independent. The latter definition of independence of random variables is compatible with the previous definition of independence of $\sigma$-subalgebras of $\mathfrak{B}$; i.e., if $\mathfrak{B}[X_\lambda]$ denotes the $\sigma$-subalgebra of $\mathfrak{B}$ generated by the sets $\{\omega \mid X_\lambda(\omega) \in A_\lambda\}$, with $A_\lambda$ an arbitrary 1-dimensional Borel set, the independence of the family $\{X_\lambda\}$ ($\lambda \in \Lambda$) in the latter sense is equivalent to the independence of the family $\{\mathfrak{B}[X_\lambda]\}$ ($\lambda \in \Lambda$) in the previous sense. If the $X_n$ ($n = 1, 2, \ldots, N$) ($X_\lambda$ ($\lambda \in \Lambda$)) are $k_n$- ($k_\lambda$-) dimensional random variables, then the independence of the family $\{X_n\}$ ($n = 1, 2, \ldots, N$) ($\{X_\lambda\}$ ($\lambda \in \Lambda$)) is defined similarly; it is enough to take $k_n$- ($k_\lambda$-) dimensional Borel sets $A_n$ ($A_\lambda$) for 1-dimensional Borel sets in equation (1).
Given a family $\{X_\lambda\}$ ($\lambda \in \Lambda$) of random variables, the smallest $\sigma$-algebra with respect to which every $X_\lambda$ is measurable is called the $\sigma$-algebra generated by $\{X_\lambda\}$ ($\lambda \in \Lambda$) and is denoted by $\mathfrak{B}[X_\lambda \mid \lambda \in \Lambda]$. Each element of this class is said to be measurable with respect to the family $\{X_\lambda\}$ ($\lambda \in \Lambda$) of random variables.
Since a random variable $X$ is a $\mathfrak{B}$-measurable function, we can speak of the †integral of $X$ relative to the measure $P$ on $\mathfrak{B}$. If $X$ is integrable relative to $P$, the integral of $X$ over $A$ is denoted by $E(X; A)$. $E(X; \Omega)$, usually denoted by $E(X)$, is called the mean, expectation, or expected value of $X$, denoted also by $M(X)$ or $m_X$. If $(X - E(X))^2$ is integrable,

\[ V(X) = E((X - E(X))^2) \]

is called the variance of $X$, denoted by $\sigma^2(X)$. The standard deviation of $X$ is the nonnegative square root $\sigma(X)$ of the variance. If $X$ and $Y$ are two random variables for which $E((X - E(X))(Y - E(Y)))$ exists, the value $E((X - E(X))(Y - E(Y)))$ is called the covariance of $X$ and $Y$. When $X$ and $Y$ have finite variances, the correlation coefficient of $X$ and $Y$ is defined by

\[ \rho(X, Y) = \frac{E((X - E(X))(Y - E(Y)))}{\{E((X - E(X))^2)\,E((Y - E(Y))^2)\}^{1/2}}. \]

It follows that $E(aX + bY) = aE(X) + bE(Y)$, $V(aX + b) = a^2 V(X)$ for any real numbers $a$, $b$, and that, in particular, $E(XY) = E(X)E(Y)$, $V(X + Y) = V(X) + V(Y)$ for mutually independent random variables $X$ and $Y$. It also follows from the definition that $-1 \le \rho(X, Y) \le 1$ in all cases. The independence of $X$ and $Y$ implies that $\rho(X, Y) = 0$, but the converse is false in general. The variance is important because of the well-known Chebyshev inequal-
ity: If $X$ is a random variable with finite variance $\sigma^2$,

\[ P(|X - E(X)| \ge c) \le \sigma^2/c^2 \]

for every positive number $c$.

D. Convergence of Random Variables

If $P(\lim_{n \to \infty} X_n = X_\infty) = 1$, the sequence $\{X_n\}$ is said to converge almost everywhere (almost certainly, almost surely (a.s.), or with probability 1) to $X_\infty$. If $\lim_{n \to \infty} P(|X_n - X_\infty| > \varepsilon) = 0$ for every positive number $\varepsilon$, the sequence $\{X_n\}$ is said to converge in probability to $X_\infty$. For a given positive number $p$ the sequence $\{X_n\}$ is said to converge in the mean of order $p$ to $X_\infty$ if $\lim_{n \to \infty} E(|X_n - X_\infty|^p) = 0$. Finally, if the random variables $X_n$ ($n = 1, 2, \ldots, \infty$) have distributions $\Phi_n$ ($n = 1, 2, \ldots, \infty$), respectively, and if

\[ \lim_{n \to \infty} \int_{-\infty}^{\infty} f(x)\, d\Phi_n(x) = \int_{-\infty}^{\infty} f(x)\, d\Phi_\infty(x) \]

for every continuous function $f$ with compact support, the sequence $\{X_n\}$ is said to converge in distribution to $X_\infty$. Note that the sequence of random variables converging in distribution may not converge in any ordinary sense. For example, random variables converging in distribution may even be defined on different probability spaces. On one hand, almost sure convergence does not in general imply convergence in the mean. On the other hand, either almost sure convergence or convergence in the mean implies convergence in probability, and convergence in probability implies convergence in distribution. However, P. Lévy [10] proved that if the $X_n$ ($n = 1, 2, \ldots$) are mutually independent, the sequence

\[ Y_k = \sum_{n=1}^{k} X_n, \qquad k = 1, 2, \ldots, \]

is convergent almost everywhere if and only if it is convergent in distribution (or in probability). The famous three-series theorem of Khinchin and Kolmogorov [11] claims that the series $\sum_n X_n$ with $X_1, X_2, \ldots$ independent is convergent almost surely if and only if there exists a sequence of independent random variables $X_1', X_2', \ldots$ such that each of the three series

\[ \sum_n P(X_n \ne X_n'), \qquad \sum_n E(X_n'), \qquad \sum_n V(X_n') \]

is convergent.

E. Conditional Probability and Conditional Expectation

Let $(\Omega, \mathfrak{B}, P)$ be a probability space and $\mathfrak{F}$ a $\sigma$-subalgebra of $\mathfrak{B}$. If $X$ is a random variable with finite mean, the function

\[ \mu(E) = \int_E X(\omega)\, dP, \qquad E \in \mathfrak{F}, \]

defines on $(\Omega, \mathfrak{F})$ a completely additive set function which is †absolutely continuous with respect to $P$. Therefore, by the †Radon-Nikodym theorem, there is an $\mathfrak{F}$-measurable function $f$ such that

\[ \mu(E) = \int_E f(\omega)\, dP \quad \text{for every } E \in \mathfrak{F}. \]

This function is unique up to a set of $P$-measure zero and is called the conditional expectation (or conditional mean) of $X$ relative to $\mathfrak{F}$, denoted by $E(X \mid \mathfrak{F})$. When $\mathfrak{F}$ is generated by a random variable $Y$, we also write $E(X \mid Y)$ for $E(X \mid \mathfrak{F})$ and call it the conditional expectation of $X$ relative to $Y$. In this case, there is a †Borel measurable function $f$ such that $E(X \mid Y) = f(Y(\omega))$, and we write $E(X \mid Y = y)$ for $f(y)$. The same fact holds when $Y$ is a multidimensional random variable. It follows from the definition that the conditional expectation has the following properties, up to a set of $P$-measure zero: (i) if $X \ge 0$, then $E(X \mid \mathfrak{F}) \ge 0$; (ii) $E(aX + bY \mid \mathfrak{F}) = aE(X \mid \mathfrak{F}) + bE(Y \mid \mathfrak{F})$; (iii) $E(E(X \mid \mathfrak{F})) = E(X)$; (iv) if $X$ and $\mathfrak{F}$ are mutually independent, i.e., $\mathfrak{B}[X]$ and $\mathfrak{F}$ are mutually independent, then $E(X \mid \mathfrak{F}) = E(X)$; (v) if $X$ is $\mathfrak{F}$-measurable, then $E(X \mid \mathfrak{F}) = X$ and $E(XY \mid \mathfrak{F}) = XE(Y \mid \mathfrak{F})$; (vi) if $\lim_{n \to \infty} X_n = X_\infty$ with $|X_n| \le Y$ and $Y$ is an integrable random variable, then $\lim_{n \to \infty} E(X_n \mid \mathfrak{F}) = E(X_\infty \mid \mathfrak{F})$; (vii) if $\mathfrak{G}$ is a $\sigma$-subalgebra of $\mathfrak{F}$, then $E(E(X \mid \mathfrak{F}) \mid \mathfrak{G}) = E(X \mid \mathfrak{G})$; (viii) if $X^2$ is integrable and $Y$ is any $\mathfrak{F}$-measurable random variable, then $E((X - E(X \mid \mathfrak{F}))^2) \le E((X - Y)^2)$.
When $X$ is the indicator function (i.e., the †characteristic function) $\chi_E$ of a set $E$ in $\mathfrak{B}$, $E(\chi_E \mid \mathfrak{F})$ is called the conditional probability of $E$ relative to $\mathfrak{F}$ and is denoted by $P(E \mid \mathfrak{F})$. In particular, if $\mathfrak{F} = \{F, F^c, \emptyset, \Omega\}$ with $1 > P(F) > 0$, $P(E \mid \mathfrak{F})$ is the simple function which takes the values $P(E \cap F)/P(F)$ on $F$ and $P(E \cap F^c)/P(F^c)$ on $F^c$. These values are denoted respectively by $P(E \mid F)$ and $P(E \mid F^c)$. The definition of $P(E \mid Y)$ or $P(E \mid Y = y)$ is also the same as in the case of the conditional expectation.
Let $\mathfrak{F}$ be a $\sigma$-subalgebra of $\mathfrak{B}$ and $Y$ a real random variable. According to the foregoing definition, $P(Y \in E \mid \mathfrak{F})$ or $P(Y^{-1}(E) \mid \mathfrak{F})$ is the conditional probability of the occurrence of the event $Y \in E$ under $\mathfrak{F}$. Since $P(Y \in E \mid \mathfrak{F})$ is determined except on a $P$-null set depending on $E$, an arbitrary version of $P(Y \in E \mid \mathfrak{F})$, viewed as a function of $E$, does not always satisfy the conditions of a probability measure. However, we can prove that there exists a nice version of $P(Y \in E \mid \mathfrak{F})$ which is a proba-
bility measure in $E \in \mathfrak{B}^1$ for every $\omega \in \Omega$ and that such a version is unique almost surely. This version is called a regular conditional probability of $Y \in E$ under $\mathfrak{F}$ or the conditional probability distribution of $Y$ under $\mathfrak{F}$; this is written as $P_Y(E \mid \mathfrak{F})$. $P(Y \in E \mid X)$, $P(Y \in E \mid X = x)$, $P_Y(E \mid X)$, and $P_Y(E \mid X = x)$ are interpreted similarly. The conditional probability distribution can be defined not only for real random variables but also for every random variable which takes values in an †analytic measurable space.

F. Bayes's Formula

Let $E_1, E_2, \ldots, E_n$ be pairwise exclusive events, and assume that one of them must occur. If $E$ is another random event, we have

\[ P(E_i \mid E) = \frac{P(E_i)P(E \mid E_i)}{P(E_1)P(E \mid E_1) + \cdots + P(E_n)P(E \mid E_n)}, \]

where $P(E_i)$ is the probability of the event $E_i$ and $P(E \mid E_i)$ is the conditional probability of $E$ under the assumption that the event $E_i$ has occurred. This is called Bayes's formula. In practical applications $E_1, \ldots, E_n$ usually represent $n$ unknown hypotheses. Suppose that the probabilities on the right-hand side of the formula are given. We then apply Bayes's formula to reevaluate the probability of each hypothesis $E_i$ knowing that some event $E$ has occurred as the result of a trial. This is why $P(E_i)$ ($P(E_i \mid E)$) is called the a priori (a posteriori) probability. However, the determination of the values of a priori probabilities is sometimes difficult, and we often set $P(E_i) = 1/n$ in practical applications, although this has caused a great deal of criticism.
When $X$ is a random variable subject to the distribution with continuous probability density $f(x)$, Bayes's formula is extended to the following form:

\[ f(x_0 \mid E) = \frac{f(x_0)P(E \mid X = x_0)}{\int_{-\infty}^{\infty} P(E \mid X = x) f(x)\, dx}, \]

where $f(x \mid E)$ is the conditional probability density of the random variable $X$ under the assumption that the event $E$ has occurred, and $P(E \mid X = x_0)$ is the conditional probability of $E$ relative to $X$.

G. Zero-One Laws

In probability theory there are many theorems claiming that an event with certain properties has probability 0 or 1. Such theorems are called zero-one laws. Here, we mention two famous examples, Kolmogorov's zero-one law [6] and the Hewitt-Savage zero-one law. Let $\alpha = \alpha(X_1, X_2, \ldots)$ be an event concerning a sequence of random variables $\{X_n\}$. $\alpha$ is called a tail event concerning $\{X_n\}$ if for every $n$, occurrence or nonoccurrence of $\alpha$ depends only on $\{X_n, X_{n+1}, \ldots\}$. For example, $\{\lim_{n \to \infty} X_n = 0\}$ is a tail event. $\alpha$ is called a symmetric event concerning $\{X_n\}$ if occurrence or nonoccurrence of $\alpha$ is invariant under every finite permutation of $X_1, X_2, \ldots$. For example, the event that $X_1 + \cdots + X_n > 0$ for infinitely many $n$'s is a symmetric event. Kolmogorov's zero-one law: Every tail event concerning a sequence of independent random variables has probability 0 or 1. Hewitt-Savage zero-one law: Every symmetric event concerning a sequence of independent and identically distributed random variables has probability 0 or 1.
Kolmogorov's zero-one law can be extended as follows: Let $\mathfrak{B}_n$, $n = 1, 2, \ldots$, be a sequence of independent $\sigma$-subalgebras of $\mathfrak{B}$. Then the $\sigma$-algebra $\mathfrak{T} = \bigcap_k \bigvee_{n > k} \mathfrak{B}_n$, called the tail $\sigma$-algebra of $\{\mathfrak{B}_n\}$, is trivial, i.e., $P(A) = 0$ or 1 for every $A \in \mathfrak{T}$. Kolmogorov's zero-one law is a special case where $\mathfrak{B}_n$ is the $\sigma$-algebra generated by $X_n$ for every $n$.

References

[1] I. Todhunter, A history of the mathematical theory of probability from the time of Pascal to that of Laplace, Macmillan, 1865 (Chelsea, 1949).
[2] J. Bernoulli, Ars conjectandi, 1713.
[3] A. de Moivre, The doctrine of chances, 1718.
[4] P. S. Laplace, Théorie analytique des probabilités, Paris, 1812.
[5] R. von Mises, Vorlesungen aus dem Gebiete der angewandten Mathematik I, Wahrscheinlichkeitsrechnung und ihre Anwendung in der Statistik und theoretischen Physik, Franz Deuticke, 1931.
[6] A. Kolmogorov (Kolmogoroff), Grundbegriffe der Wahrscheinlichkeitsrechnung, Erg. Math., Springer, 1933; English translation, Foundations of the theory of probability, Chelsea, 1950.
[7] E. Borel, Les probabilités dénombrables et leurs applications arithmétiques, Rend. Circ. Mat. Palermo, 27 (1909), 247-271.
[8] N. Wiener, Differential space, J. Math. Phys., 2 (1923), 131-174.
[9] K. L. Chung and P. Erdős, On the application of the Borel-Cantelli lemma, Trans. Amer. Math. Soc., 72 (1952), 179-186.
[10] P. Lévy, Théorie de l'addition des variables aléatoires, Gauthier-Villars, 1937.
[11] J. L. Doob, Stochastic processes, Wiley, 1953.
[12] M. M. Loève, Probability theory, Van Nostrand, third edition, 1963.
[13] W. Feller, An introduction to probability theory and its applications, Wiley, I, second edition, 1957; II, 1966.
[14] L. Breiman, Probability, Addison-Wesley, 1968.
[15] J. Lamperti, Probability, Benjamin, 1966.

343 (VI.14)
Projective Geometry

A. Introduction

Projective geometry is the most fundamental of classical geometries and one of the first examples of axiomatized mathematics.

B. Construction of Projective Geometry

We construct projective geometry axiomatically [4]. Given two sets $P$, $Q$ and a †relation $\Gamma \subset P \times Q$, consider the triple $\mathfrak{P} = \{P, Q, \Gamma\}$. We call each element of $P$ a point and each element of $Q$ a line. If $(p, l) \in \Gamma$ holds for a point $p$ and a line $l$, then we say that the line $l$ contains the point $p$. When two lines $l_1$ and $l_2$ contain a point $p$, we say that they intersect at $p$. When several points are contained in the same line, these points are said to be collinear, and when several lines contain the same point, these lines are said to be concurrent. For $\mathfrak{P}$ we impose the following axioms:
(I) There exists one and only one line that contains two given distinct points.
(II) Suppose that we are given noncollinear points $p_0$, $p_1$, and $p_2$, and distinct points $q_1$, $q_2$. Now suppose that $\{p_0, p_1, q_1\}$ and $\{p_0, p_2, q_2\}$ are collinear triples. Then the line containing $p_1$, $p_2$ and the line containing $q_1$, $q_2$ necessarily intersect (Fig. 1).

Fig. 1

(III) Every line contains at least three distinct points.
The $\mathfrak{P}$ that satisfy axioms (I) and (II) and axioms (I), (II), and (III) are called general projective geometry and projective geometry, respectively. The set of all points that are contained in a line is called the point range with the line as its base. In projective geometry, there exists a one-to-one correspondence between the set of lines and the set of point ranges, so we can identify every line with a point range. In this case a line $l \in Q$ is represented as a subset of $P$, and the relation $(p, l) \in \Gamma$ means that the point $p$ belongs to the set $l$. Let $S$ be a subset of $P$ and $p_1$, $p_2$ be any two distinct points of $S$. If the line that contains $p_1$ and $p_2$ is always contained in $S$, then $S$ is called a subspace. Points and lines are subspaces. Now we impose the following axiom:
(IV) There exist a finite number of points such that any subspace that contains all of them contains $P$.
We call a projective geometry satisfying axiom (IV) a finite-dimensional projective geometry, which from this point on will be the sole object of our consideration. We call $P$ a projective space. Consider sequences of subspaces of the type $P \supsetneq P_{n-1} \supsetneq \cdots \supsetneq P_1 \supsetneq P_0 \ne \emptyset$, where $\emptyset$ is the empty set. The number $n$ of the longest sequence is called the dimension of $P$. If $P$ is of dimension $n$, we write $P^n$ instead of $P$. We call $P^1$ a projective line and $P^2$ a projective plane. Each subspace $S$ of $P$, together with the set of lines of $P$ contained in $S$, gives a finite-dimensional projective geometry, and so $S$ is a projective space. Lines and points are projective spaces of dimensions 1 and 0, respectively. By convention, the empty set is a $(-1)$-dimensional projective space. We call each 2-dimensional subspace a plane and each $(n - 1)$-dimensional subspace in $P^n$ a hyperplane.
Let $M$, $N$ be subspaces of $P$, and for a pair of points $p \in M$, $q \in N$ consider the set $p \cup q$ of all points on the line that contains $p$ and $q$. The set $\{p \cup q \mid p \in M,\ q \in N\}$ is denoted by $M \cup N$, and we call it the set spanned by $M$ and $N$. By convention, we put $\emptyset \cup M = M$ and $p \cup p = p$. Then $P^r \cup P^s$ is the projective space of the lowest dimension which contains $P^r$ and $P^s$. On the other hand, if we denote the intersection of $P^r$ and $P^s$ by $P^r \cap P^s$, then it is the projective space of highest dimension that is contained in both of them. We call $P^r \cup P^s$ and $P^r \cap P^s$ the join and the intersection of $P^r$ and $P^s$, respectively. When the dimension of the space spanned by $r + 1$ points is $r$, we say that these points are independent; otherwise they are dependent. If any $r + 1$ points of a given subset $M$ of $P^n$ are independent for each $r \le n$, we say that points of $M$ lie in a general position. The space $P^r$ necessarily contains $r + 1$ independent points, and there necessarily exists a $P^r$ that contains $r + 1$ arbitrary given points in a projective space; it is unique if the points are indepen-
dent. If $P^r \cup P^s = P^t$ and $P^r \cap P^s = P^u$, then $r + s = t + u$. We call the latter the dimension theorem (or intersection theorem) of projective geometry.
The set $\Sigma_1$ of all hyperplanes that contain a $P_0^{n-2}$ in $P^n$ is called a pencil of hyperplanes, and the $P_0^{n-2}$ common to them is called the center of $\Sigma_1$. If a pencil of hyperplanes contains two distinct hyperplanes of $P^n$, then the pencil is determined uniquely by these two. When $n = 2$ and 3, it is called a pencil of lines and a pencil of planes, respectively. Each pencil of hyperplanes of $P^n$, or more generally, each pencil of hyperplanes of a subspace of an arbitrary dimension in $P^n$, is called a linear fundamental figure of $P^n$ or simply a fundamental figure. In $P^n$, the set $\Sigma_r$ of all $P^{n-r}, P^{n-r+1}, \ldots, P^{n-1}$ that contain the same $P_0^{n-r-1}$ is called the star with center $P_0^{n-r-1}$. Each set that consists of the totality of subspaces of an arbitrary dimension in the same $P^r$ or a subset of it is called a $P^r$-figure.
Under the assumption that $P^r$ and $P^s$ do not have points in common, the operation of constructing $P^r \cup P^s$ from $P^r$ and $P^s$ is called pro-

have no points in common with them and project each point of $P_1^r$ onto $P_2^r$ from $P_0^{n-r-1}$. The one-to-one correspondence $P_1^r \to P_2^r$ thus obtained is called a perspective mapping. If a one-to-one correspondence $P_1^r \to P_2^r$ is represented as the composite of a finite number of perspective mappings, then we call it a projective mapping. These mappings are extended to those of fundamental figures, too.
Suppose that in a proposition or a figure in $P^n$, we interchange $P^r$ and $P^{n-r-1}$ ($0 \le r < n$) and also interchange contains and is contained (and related terms). The proposition or the figure thus obtained is said to be dual to the original one. In projective geometry, if a proposition is true, then its dual proposition is also true (duality principle). This is assured because propositions dual to axioms (I)-(IV) hold; and $P^r$ and $\Sigma_r$ are dual to each other. The projective space obtained by the principle of duality, by regarding the hyperplanes of $P^n$ as its points, is called the dual space of $P^n$.
jecting P” from P’. Assuming that P’ and P”
C. Projective Coordinates
have points in common the operation of con-
structing P’n P” from P’ and P” is called cut-
ting P” by P’. Suppose that we are given spaces Here we introduce projective coordinates in P”.
P,,, P, , and Pz, and a fundamental figure Z in Consider Desargues’s theorem: Suppose that
the space P, By projecting Z from P, and then p,, p2, p3 and ql, q,, q3, are two sets of points
cutting it by P2, we can construct a funda- in P”, each of which is independent and satis-
mental figure c’ on P2. This operation is called fies pi # q, (i = 1,2,3). If the three lines pi U qi
projection of Z from PO onto P2, and we call P, (i = 1,2,3) are concurrent, then the three
the center of projection (Fig. 2). In this case, we points(~,U~,)n(q,Uq,),(~,U~,)n(q,Uq,),
say that C and L” are in perspective and denote (pl U p2) n (ql U q2) are collinear. The converse
the relation by C X c’. If for two fundamental is also true. This theorem holds for n > 3 gen-
figures Z and c’ there exist a finite number of erally. However, when n = 2, there exist projec-
fundamental figures F, (1 < i < I) such that C, tive geometries for which it does not hold; we
F,~...~F,~C’,thenwesaythatCandZ’are call these non-Desarguesian geometries. In
projectively related to each other and denote such cases it is impossible to introduce coordi-
this by C,c’ (Fig. 3). Now for arbitrary sub- nates, so we assume Desargues’s theorem for
spaces P;, P; (0 < r < n), we take Ponrm’ that n=2.
When four points pi (1 < i < 4) in P” lie on
the same plane and in general position, we call
the figure that consists of these four points and
the six straight lines gij = p, U pj (1 :; i d j < 4) a
complete quadrangle p,p2p3p4; ea’sh pi is called
a vertex, and each gij is called a side. If six
points qi (I d i < 6) on a line I are points of
intersection of six sides gi2, gi3, g,4r g34, gz4,
Fig. 2 g23 of a complete quadrangle with I, we call

P?
P4
P3
AIYi!kL 1
91 q1 92% 9, 9;

Fig. 3 Fig. 4

them a quadrangular set of six points (Fig. 4). By Desargues's theorem, we can show that if there are given three fixed distinct points on a line $l$, then any pair of distinct points on $l$ determines uniquely a point on $l$ such that the six points thus obtained constitute a quadrangular set. The quadrangular property is invariant under projective mappings. On a line $l$ we fix three mutually distinct points $p_0$, $p_1$, $p_\infty$. For any two points $p_x$, $p_y$ different from $p_\infty$ on $l$, we take the point $s$ such that $p_\infty, p_x, p_0, p_\infty, p_y, s$ constitute a quadrangular set of six points and call $s$ the sum of $p_x$ and $p_y$ with respect to $[p_0, p_\infty, p_1]$ (Fig. 5). On the other hand, the point $t$ such that $p_0, p_x, p_1, p_\infty, p_y, t$ constitute a quadrangular set of six points is called the product of $p_x$ and $p_y$ with respect to $[p_0, p_\infty, p_1]$ (Fig. 6). When we are given a fixed triple $[p_0, p_\infty, p_1]$ on a line $l$, as before, the set of points on $l$ not equal to $p_\infty$ is called a point range of the number system, provided that we exclude $p_\infty$ from the point range. We call the set of three points $[p_0, p_\infty, p_1]$ a frame (or projective frame) of $l$, and we call $p_0$ the origin, $p_1$ the unit point, and $p_\infty$ the supporting point.

Fig. 5

Fig. 6

A point range of the number system constitutes a †field (which may be noncommutative) with respect to the previously defined sum and product. We call the field a Staudt algebra, and an abstract algebra isomorphic to it is called a coefficient field of $P^n$. We denote by $K(p_0, p_\infty, p_1)$ the Staudt algebra that is determined by a frame $[p_0, p_\infty, p_1]$. A projective mapping of $l$ onto itself that leaves invariant each of three distinct points $p_0$, $p_\infty$, $p_1$ on $l$ is necessarily an †inner automorphism of the field $K(p_0, p_\infty, p_1)$. Denoting by $[p_0, p_\infty, p_1]$ a frame on a line $l$ of $P^n$ with coefficient field $K$, we call each isomorphism $\theta: K(p_0, p_\infty, p_1) \to K$ a coordinate system of $l$. For each point $p$ on $l$ we call the element $\xi = \theta(p)$ of $K$ the inhomogeneous coordinate of $p$ with respect to this frame. Also, we call the pair $(x^0, x^1)$ such that $x^0, x^1 \in K$ and $x^1 (x^0)^{-1} = \xi$ homogeneous coordinates of $p$. Since the supporting point $p_\infty$ is excluded from $K(p_0, p_\infty, p_1)$, we fix $(0, x^1)$ such that $x^1 \ne 0$ as the homogeneous coordinates of $p_\infty$. In order for $(x^0, x^1)$ and $(y^0, y^1)$ to be homogeneous coordinates of the same point, it is necessary and sufficient that there exist an element $\lambda \ne 0$ of $K$ such that $y^\alpha = x^\alpha \lambda$ $(\alpha = 0, 1)$.

In conformity with these results, we now introduce coordinates in $P^n$. A set $\mathfrak{F} = [a_0, a_1, \ldots, a_n, u]$ of ordered $n + 2$ points in a general position is called a frame (or projective frame) of $P^n$; each of $a_\alpha$ $(0 \le \alpha \le n)$ is called a fundamental point, and $u$ is called a unit point. For $A = a_{\alpha_0} \cup \cdots \cup a_{\alpha_r}$ $(0 \le \alpha_0 < \cdots < \alpha_r \le n)$, we denote by $A^*$ the space spanned by the remaining fundamental points. For any point $p$ that is not contained in $A^*$, we put $p_A = A \cap (p \cup A^*)$ and call it the component of $p$ on $A$. Then $\mathfrak{F}_A = [a_{\alpha_0}, \ldots, a_{\alpha_r}, u_A]$ is a frame of $A$. Hereafter, we shall omit $u_A$ for brevity. Suppose that isomorphisms $\theta_{\alpha\beta}: K(a_\alpha, a_\beta) \to K$ are assigned for each pair $\alpha$, $\beta$ $(0 \le \alpha < \beta \le n)$. Under a certain condition, the system $\{\theta_{\alpha\beta}\}$ is determined by one of the $\theta_{\alpha\beta}$. In this case we denote $\{\theta_{\alpha\beta}\}$ by $\theta$ and call $\{\mathfrak{F}, \theta\}$ a projective coordinate system of $P^n$. For any point $p$ of $P^n$ not contained in $A_0 = a_1 \cup \cdots \cup a_n$, we denote by $p_i$ the component of $p$ on $a_0 \cup a_i$ $(1 \le i \le n)$, and we put $\xi^i = \theta(p_i)$. The elements of the ordered set $(\xi^1, \xi^2, \ldots, \xi^n)$ are called the inhomogeneous coordinates of $p$ with respect to $\mathfrak{F}$, and those of the set $(x^0, x^1, \ldots, x^n)$ such that $x^i (x^0)^{-1} = \xi^i$ are called the homogeneous coordinates of $p$. When $p$ is contained in $A_0$, we define $(0, x^1, \ldots, x^n)$ as homogeneous coordinates of $p$ with respect to $\mathfrak{F}$, provided that $(x^1, \ldots, x^n)$ are homogeneous coordinates of $p$ with respect to $\mathfrak{F}_{A_0}$.

Now we represent the point whose coordinates are $(x^0, x^1, \ldots, x^n)$ simply by $x$. In $P^n$, when coordinates are introduced, a necessary and sufficient condition for points $z$ to be on the line that passes through two distinct points $x$ and $y$ is that $z^\alpha = x^\alpha \lambda + y^\alpha \mu$ $(0 \le \alpha \le n)$, where $\lambda, \mu \in K$ are parameters. More generally, a point $z$ is contained in the space spanned by $r + 1$ independent points $x_\beta$ $(0 \le \beta \le r)$ in $P^n$ if and only if $z^\alpha = \sum_{\beta=0}^{r} x_\beta^\alpha \lambda_\beta$ $(0 \le \alpha \le n,\ \lambda_\beta \in K)$. In particular, the equation of a hyperplane is represented in the form $\sum_{\alpha=0}^{n} X_\alpha z^\alpha = 0$ $(X_\alpha \in K)$ with respect to variable coordinates $z^\alpha$. Therefore each hyperplane is uniquely determined by the ratio of $X_0, X_1, \ldots, X_n$. We call $X_0, X_1, \ldots, X_n$ hyperplane coordinates of the hyperplane. If $n = 2$, they are called coordinates of a line, and if $n = 3$, plane coordinates of a plane. (For coordinates of $P^r$ in $P^n$ → 90 Coordinates B.)
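The incidence conditions for homogeneous coordinates can be tried out numerically. The following is a minimal Python sketch, assuming the coefficient field is the rational numbers (commutative, chosen only for convenience) and $n = 2$; the helper names `same_point`, `line_through`, and `incident` are hypothetical and introduced only for this illustration. It checks that a point $z^\alpha = x^\alpha\lambda + y^\alpha\mu$ satisfies the equation $\sum_\alpha X_\alpha z^\alpha = 0$ of the line through $x$ and $y$.

```python
from fractions import Fraction
from itertools import combinations


def same_point(x, y):
    """True if the nonzero tuples x and y are homogeneous coordinates of the
    same point, i.e. y = x * lam for some nonzero lam (all 2x2 minors vanish)."""
    return all(x[i] * y[j] == x[j] * y[i]
               for i, j in combinations(range(len(x)), 2))


def line_through(x, y):
    """Line coordinates (X0, X1, X2) of the line joining two distinct points
    of the projective plane, computed as the cross product of x and y."""
    return (x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0])


def incident(X, z):
    """True if the point z lies on the hyperplane with coordinates X."""
    return sum(a * b for a, b in zip(X, z)) == 0


x = (Fraction(1), Fraction(2), Fraction(3))
y = (Fraction(0), Fraction(1), Fraction(1))
lam, mu = Fraction(2), Fraction(-5)

# A point on the line through x and y, in the parametric form of the text.
z = tuple(a * lam + b * mu for a, b in zip(x, y))
X = line_through(x, y)

print(incident(X, x), incident(X, y), incident(X, z))
print(same_point(x, tuple(3 * a for a in x)), same_point(x, y))
```

Only the ratio of the entries of `X` matters, which mirrors the statement that a hyperplane is determined by the ratio of its coordinates. Over a noncommutative field the side on which scalars act would have to be tracked, which this sketch does not attempt.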

D. Projective Transformations

A one-to-one correspondence $\varphi$ between the point sets of two projective spaces $P^n$ and $\bar{P}^n$ is called a collineation in the wider sense if for any three points $p_1, p_2, p_3$ that are collinear, $\varphi(p_i)$ $(i = 1, 2, 3)$ are also collinear and vice versa. If $\bar{P}^n = P_*^n$, we call $\varphi$ a correlation; if $\bar{P}^n = P^n$, we call $\varphi$ a collineation. If we denote a correlation by $\tau_0$, any other correlation is obtained as a composite of $\tau_0$ and a collineation. If $\tau$ is a correlation, it naturally induces a mapping $P_*^n \to P^n$, which we also denote by $\tau$. Then $\tau \circ \tau$ is a collineation. If $\tau \circ \tau$ is an identity, we call $\tau$ an involutive correlation. Suppose that $\varphi: P^n \to \bar{P}^n$ is a collineation in the wider sense and $0 \le r \le n - 1$. Then $\varphi$ induces a one-to-one correspondence between the set of $r$-dimensional subspaces of $P^n$ and the set of $r$-dimensional subspaces of $\bar{P}^n$; and if $P^r \supset P^s$ in $P^n$, then $\varphi P^r \supset \varphi P^s$.

Next, suppose that we are given two projective spaces $P^n$ and $\bar{P}^n$ that are subspaces of a space $P^N$ $(n < N)$. (When Desargues's theorem holds, any two projective spaces of the same dimension can be identified with subspaces of a projective space of higher dimension.) In this case, when a collineation in the wider sense $\varphi: P^n \to \bar{P}^n$ is a projective mapping, we call it a projective collineation in the wider sense. A projective collineation is also called a projective transformation. The totality of collineations of $P^n$ constitutes a †transformation group and is called the group of collineations of $P^n$; we denote it by $\mathfrak{K}(P^n)$. The totality of projective transformations of $P^n$ constitutes a †normal subgroup of $\mathfrak{K}(P^n)$; we denote it by $\mathfrak{G}(P^n)$ and call it the group of projective transformations. The totality of projective transformations that leave invariant a frame $\mathfrak{F}$ of $P^n$ constitutes a subgroup of $\mathfrak{G}(P^n)$. It is isomorphic to the group of †inner automorphisms $\mathfrak{J}(K)$ of the coefficient field $K$ of $P^n$. A collineation is not necessarily a projective transformation. The former is obtained as a composite of a projective transformation and an automorphism of the coefficient field. Specifically, if we denote the group of †automorphisms of $K$ by $\mathfrak{A}(K)$, then $\mathfrak{K}(P^n)/\mathfrak{G}(P^n) \cong \mathfrak{A}(K)/\mathfrak{J}(K)$. Hence in order for all collineations to be projective transformations, it is necessary and sufficient that all automorphisms of the coefficient field be inner automorphisms. If the coefficient field is the real number field, then collineations are always projective transformations. For the complex number field, however, this is not necessarily true.

Now, we consider the following three propositions: (1) The coefficient field of $P^n$ is commutative. (2) Given frames $\mathfrak{F}$ and $\mathfrak{F}'$ of $P^n$, there exists a unique projective transformation sending $\mathfrak{F}$ onto $\mathfrak{F}'$. (3) Given two distinct lines $l_1$ and $l_2$ contained in a plane in $P^n$ and two sets of three distinct points $p_i$ $(i = 1, 2, 3)$ and $q_i$ $(i = 1, 2, 3)$ that lie on $l_1$ and $l_2$ respectively, then the three points $(p_2 \cup q_3) \cap (p_3 \cup q_2)$, $(p_3 \cup q_1) \cap (p_1 \cup q_3)$, and $(p_1 \cup q_2) \cap (p_2 \cup q_1)$ are collinear. These three propositions are mutually equivalent. We call proposition (2) the fundamental theorem of projective geometry and proposition (3) the theorem of Pappus. If the coefficient field is the real (complex) number field, we call the projective space a real (complex) projective space. In classical geometry, only these cases were studied.

Suppose that the coefficient field is commutative. Then, if we assign an isomorphism $\theta_0: K(p_0, p_\infty, p_1) \to K$ for the Staudt algebra $K(p_0, p_\infty, p_1)$ on a line in a space, the isomorphism $\theta$ of the Staudt algebra $K(q_0, q_\infty, q_1)$ on an arbitrary line onto $K$ can be uniquely determined so that $\theta^{-1} \circ \theta_0$ is a projective mapping. Utilizing such isomorphisms, we can determine homogeneous coordinates in an arbitrary subspace of $P^n$ by a frame on it.

Suppose that the coefficient field is a commutative field whose characteristic is not 2. For four collinear points $p_i$ $(1 \le i \le 4)$ in $P^n$, where $p_1, p_2, p_3$ are distinct and $p_4 \ne p_1$, we consider a frame such that $p_1$, $p_2$, and $p_3$ are, respectively, the supporting point, the origin, and the unit point. The inhomogeneous coordinate $\lambda$ of $p_4$ with respect to this frame is called the anharmonic ratio (cross ratio or double ratio) of these four points and is denoted by $[p_1, p_2; p_3, p_4]$. If we denote the homogeneous coordinates of $p_i$ with respect to a general frame by $(x_i^0, x_i^1)$ $(i = 1, 2, 3, 4)$, then $\lambda$ can be expressed as

$$\lambda = \frac{\begin{vmatrix} x_1^0 & x_1^1 \\ x_3^0 & x_3^1 \end{vmatrix} \begin{vmatrix} x_2^0 & x_2^1 \\ x_4^0 & x_4^1 \end{vmatrix}}{\begin{vmatrix} x_2^0 & x_2^1 \\ x_3^0 & x_3^1 \end{vmatrix} \begin{vmatrix} x_1^0 & x_1^1 \\ x_4^0 & x_4^1 \end{vmatrix}}.$$

Moreover, if we interchange the order of the four points, then we have

$$\lambda = [p_2, p_1; p_4, p_3] = [p_3, p_4; p_1, p_2] = [p_4, p_3; p_2, p_1],$$

$$[p_1, p_2; p_4, p_3] = \frac{1}{\lambda}, \qquad [p_1, p_3; p_2, p_4] = 1 - \lambda.$$
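The behaviour of the anharmonic ratio under permutations of the four points can be checked by direct computation. The following Python sketch assumes the rational numbers as coefficient field and represents each point of a line by a pair of homogeneous coordinates $(x^0, x^1)$; the determinant expression used in `cross_ratio` is one normalization consistent with the convention that $p_1$, $p_2$, $p_3$ are the supporting point, the origin, and the unit point, and the function names are hypothetical.

```python
from fractions import Fraction


def det2(a, b):
    """2x2 determinant whose rows are the homogeneous pairs a and b."""
    return a[0] * b[1] - a[1] * b[0]


def cross_ratio(p1, p2, p3, p4):
    """Anharmonic ratio [p1, p2; p3, p4] of four collinear points given by
    homogeneous pairs; equals the inhomogeneous coordinate of p4 in the frame
    with supporting point p1, origin p2, and unit point p3."""
    return Fraction(det2(p1, p3) * det2(p2, p4),
                    det2(p2, p3) * det2(p1, p4))


def permuted_values(lam):
    """The values taken by the anharmonic ratio when the four points are
    permuted; at most six of them are distinct."""
    return {lam, 1 / lam, 1 - lam, 1 / (1 - lam),
            (lam - 1) / lam, lam / (lam - 1)}


# Frame points: supporting point (0, 1), origin (1, 0), unit point (1, 1).
p1, p2, p3 = (0, 1), (1, 0), (1, 1)
p4 = (1, -1)                      # inhomogeneous coordinate -1

lam = cross_ratio(p1, p2, p3, p4)
print(lam)                                        # coordinate of p4 in the frame
print(cross_ratio(p1, p2, p4, p3) == 1 / lam)     # interchange p3 and p4
print(cross_ratio(p1, p3, p2, p4) == 1 - lam)     # interchange p2 and p3
print(len(permuted_values(lam)))                  # harmonic case
print(len(permuted_values(Fraction(-1, 2))))      # generic case
```

For $\lambda = -1$ the set of permuted values collapses to three elements, whereas a generic value such as $-1/2$ gives six distinct ones.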

In general, these six values are different; however, there are the following two exceptions: when $\lambda = -1$, $1/2$, and $2$; and when $\lambda$ is a root of $\lambda^2 - \lambda + 1 = 0$. When $\lambda = -1$, these four points are called a harmonic range of points, and the points $p_3$, $p_4$ are called harmonic conjugates with respect to $p_1$, $p_2$; or $p_1$, $p_2$ and $p_3$, $p_4$ are said to be harmonically separated from each other. When $\lambda^2 - \lambda + 1 = 0$, these four points are said to be an equianharmonic range of points. For the dual of these, we can consider the anharmonic ratio of four hyperplanes of a pencil of hyperplanes. The concept of the anharmonic ratio can be extended further to the case of four elements of fundamental figures in general. The anharmonic ratio is a quantity that is invariant under projective transformations.

Each projective transformation $x \to \bar{x}$ is expressed with respect to homogeneous coordinates $x^\alpha$ $(0 \le \alpha \le n)$ of $P^n$ as

$$\rho\,\bar{x}^\alpha = \sum_{\beta=0}^{n} t^\alpha_\beta\, x^\beta \quad (0 \le \alpha \le n), \qquad \rho \ne 0, \quad \det(t^\alpha_\beta) \ne 0. \tag{1}$$

Conversely, if $T = (t^\alpha_\beta)$ is a †regular matrix $(t^\alpha_\beta \in K)$, then (1) determines a projective transformation. So there is a one-to-one correspondence between projective transformations and †equivalence classes of the regular matrices $T = (t^\alpha_\beta)$ with the †equivalence relation $T \sim \rho T$ $(\rho \in K \setminus \{0\})$. Therefore, when $K$ is commutative, the group of projective transformations $\mathfrak{G}(P^n)$ of $P^n$ is isomorphic to the factor group $PGL(n+1, K)$ of the †general linear group $GL(n+1, K)$ with the coefficient field $K$ by its center $\{\rho I \mid \rho \in K \setminus \{0\}\}$; that is, $\mathfrak{G}(P^n) \cong PGL(n+1, K)$.

Extending the definition of projective transformations, we call the transformation represented by (1), with an arbitrary square matrix that is not necessarily regular, a projective transformation. When $T$ is regular, it is called a regular projective transformation, and when $T$ is not regular, it is called a singular projective transformation. In particular, if the †rank of $T$ is $n + 1 - h$, then we say that the projective transformation is singular of the $h$th species. If (1) is singular of the $h$th species, the $n + 1$ hyperplanes $\sum_{\beta=0}^{n} t^\alpha_\beta x^\beta = 0$ $(0 \le \alpha \le n)$ have a space $P^{h-1}$ in common. We call $P^{h-1}$ the singular subspace of this transformation. A projective transformation is not defined on its singular subspace. A singular projective transformation of the $h$th species is the composite of the projection of $P^n$ onto some $P^{n-h}$ with the singular subspace as its center and a regular projective transformation of $P^{n-h}$.

If the coordinates of a point are denoted by $(x^\alpha)$ and hyperplane coordinates with respect to some frame by $(X_\alpha)$, then the linear transformation

$$\rho X_\alpha = \sum_{\beta=0}^{n} t_{\alpha\beta}\, x^\beta \quad (0 \le \alpha \le n) \tag{2}$$

is a correlation. (Here also, we extend the definition of correlation and include the case where $T_* = (t_{\alpha\beta})$ is not regular.) The condition that $\tau_*$ is an involutive correlation is given by $T_* = \pm\,{}^{t}T_*$. When $T_* = -{}^{t}T_*$, the involutive correlation $\tau_*$ is called a null system. The correlation $\tau_*$ is a null system if and only if any point $x$ of $P^n$ is contained in the hyperplane $\tau_*(x)$. When $T_* = {}^{t}T_*$, we call the involutive correlation $\tau_*$ a polar system. For a polar system $\tau_*$, the set of points $x$ that are contained in hyperplanes $\tau_*(x)$ constitutes a quadric hypersurface (or hyperquadric).

E. Quadric Hypersurfaces

Let $\tau_*$ be a polar system, and let $Q_2^{n-1}$ be the totality of points $x$ contained in $\tau_*(x)$. Then the equation of the quadric hypersurface $Q_2^{n-1}$ is given by

$$\sum_{\alpha,\beta=0}^{n} t_{\alpha\beta}\, x^\alpha x^\beta = 0. \tag{3}$$

For such a correlation $\tau_*$ we call a relation between the set of points $x$ of $P^n$ and the set of hyperplanes $\tau_*(x)$ a polarity with respect to $Q_2^{n-1}$. We call $\tau_*(x)$ the polar of $x$ with respect to $Q_2^{n-1}$, and $x$ the pole of $\tau_*(x)$ with respect to $Q_2^{n-1}$. If the points of intersection of a line passing through a point $x$ with $Q_2^{n-1}$ and $\tau_*(x)$ are denoted by $z_1$, $z_2$; $y$, then $x, y; z_1, z_2$ is a harmonic range of points. When a point $x$ lies on the polar of a point $y$, we say that $x$ and $y$ are mutually conjugate. Each point on $Q_2^{n-1}$ is conjugate with itself, and the converse is also true. We call the polar of a point on $Q_2^{n-1}$ the tangent hyperplane of $Q_2^{n-1}$ at that point.

If $\tau_*$ is regular or singular of the $h$th species, we call the corresponding quadric hypersurface regular or singular of the $h$th species. If $\tau_*$ is singular of the $h$th species, its singular subspace is contained in $Q_2^{n-1}$. We call points on this singular subspace singular points of $Q_2^{n-1}$. A $Q_2^{n-1}$ that is singular of the first species (i.e., $Q_2^{n-1}$ with just one singular point) is called a cone.

We call a subspace contained in $Q_2^{n-1}$ a generating space. If it is a line we call it a generating line. We put $q = (n-2)/2$ or $(n-1)/2$ according as $n$ is even or odd. Then, if the coefficient field is an †algebraically closed field, for each regular $Q_2^{n-1}$ there necessarily exist $q$-dimensional generating spaces. Also, $Q_2^2$ is a †ruled surface covered by two families

of generating lines, and $Q_2^{2k}$ is covered by two families of $k$-dimensional generating spaces.

If $p_i$ $(1 \le i \le 5)$ are five points in a general position in a plane, then there exists one and only one $Q_2^1$ passing through these points; we call $Q_2^1$ a conic. In order that six points $p_i$ $(1 \le i \le 6)$ in a plane lie on $Q_2^1$, it is necessary and sufficient that the three points $(p_1 \cup p_2) \cap (p_4 \cup p_5)$, $(p_2 \cup p_3) \cap (p_5 \cup p_6)$, and $(p_3 \cup p_4) \cap (p_6 \cup p_1)$ be collinear (Pascal's theorem). The dual of the last theorem is called Brianchon's theorem.

Given two hypersurfaces $Q_2^{n-1}$, $\bar{Q}_2^{n-1}$ in $P^n$, we consider another $\tilde{Q}_2^{n-1}$ such that the polar of an arbitrary point $x$ with respect to $\tilde{Q}_2^{n-1}$ belongs to the pencil of hyperplanes determined by polars of $x$ with respect to $Q_2^{n-1}$ and $\bar{Q}_2^{n-1}$. The set of all such $\tilde{Q}_2^{n-1}$ is called a pencil of quadric hypersurfaces. It is the set of all $\tilde{Q}_2^{n-1}$ that pass through the intersection of $Q_2^{n-1}$ and $\bar{Q}_2^{n-1}$. In the cases $n = 2$ and $3$, we call it a pencil of conics and a pencil of quadrics, respectively.

Denoting by $l^{ij}$ and $\bar{l}^{ij}$ $(0 \le i < j \le 3)$ the †Plücker coordinates of two straight lines $l$ and $\bar{l}$ in $P^3$, we put

$$(l, \bar{l}) = l^{01}\bar{l}^{23} - l^{02}\bar{l}^{13} + l^{03}\bar{l}^{12} + \bar{l}^{01}l^{23} - \bar{l}^{02}l^{13} + \bar{l}^{03}l^{12}. \tag{4}$$

Then $(l, l) = 0$ holds. If we regard these $l^{ij}$ as homogeneous coordinates of $P^5$, then there exists a one-to-one correspondence between the points on the regular quadric hypersurface $Q_2^4$ defined by $(l, l) = 0$ and the lines in $P^3$. We say that each point of $Q_2^4$ is the image of the line corresponding to it in $P^3$. Two lines $l$ and $\bar{l}$ intersect if and only if $(l, \bar{l}) = 0$. Geometrically, this means that the images of $l$ and $\bar{l}$ are conjugate with respect to $Q_2^4$. Therefore the line passing through the images of $l$ and $\bar{l}$ is a generating line of $Q_2^4$. The image of a pencil of lines in $P^3$ is a generating line of $Q_2^4$. Quadric hypersurfaces and sets of lines in $P^3$ are important objects of study in both projective and algebraic geometry. In particular, linear line congruences (linear line complexes) that are families of lines dependent upon two (three) parameters are of great interest. In these theories, quadric hypersurfaces play a fundamental role. When the coefficient field is noncommutative, the above theory has to be greatly modified.

F. Projective Geometry and the Erlangen Program

From the standpoint of the †Erlangen program of F. Klein, the aim of projective geometry is to study properties that are invariant under the group of projective transformations. Utilizing various subgroups of this group, we can reconstruct various classical geometries. For example, consider the projective space $P^n$ whose coefficient field is the real number field, and fix a hyperplane $\Pi_\infty$. Let $\mathfrak{A}(P^n)$ be the subgroup of projective transformations formed by all projective transformations that leave $\Pi_\infty$ invariant. Then the geometry that belongs to this group is †affine geometry. Similarly, fix an imaginary regular quadric hypersurface $Q_2^{n-2}$ in $\Pi_\infty$ and consider the geometry that belongs to the subgroup of $\mathfrak{A}(P^n)$ leaving this $Q_2^{n-2}$ invariant. We thus obtain Euclidean geometry. Moreover, if we assign some regular quadric hypersurface $Q_2^{n-1}$, then the geometry belonging to the subgroup of $\mathfrak{G}(P^n)$ that leaves the $Q_2^{n-1}$ invariant is a †non-Euclidean or †conformal geometry according as the transformation space is the set of inner points of $Q_2^{n-1}$ or the whole $Q_2^{n-1}$.

G. Projective Geometry and Modular Lattices

†Lattices (lattice-ordered sets) and projective geometry are intimately related. The totality of subspaces of each dimension in general projective geometry $\mathfrak{P}$ constitutes a †complete †modular lattice $L(\mathfrak{P})$ with respect to the inclusion relation. If $\mathfrak{P}$ is a finite-dimensional projective geometry, then it is an †irreducible complemented modular lattice of finite †height. Conversely, suppose that $L$ is a modular lattice with †minimum element $\emptyset$, and denote by $P$ the totality of elements $p$ †prime over $\emptyset$ (i.e., †atomic elements) and by $Q$ the totality of elements $l$ prime over atomic elements. Then, if $(p, l) \in F$ is taken to mean $p \le l$, $\mathfrak{P}(L) = \{P, Q, F\}$ is a general projective geometry. If $L$ is an irreducible complemented modular lattice of finite height, then $\mathfrak{P}(L)$ is a finite-dimensional projective geometry; in this case we have $\mathfrak{P} \cong \mathfrak{P}(L(\mathfrak{P}))$ and $L \cong L(\mathfrak{P}(L))$. So we may consider projective geometry and irreducible complemented modular lattices as having the same mathematical structure. If a lattice $L$ is an $n$-dimensional projective geometry, its †dual lattice is also an $n$-dimensional projective geometry, and this is the principle of duality.

H. Analytic Representations of Projective Geometry

Let $K$ be an arbitrary field, commutative or noncommutative. For an arbitrary natural number $n$, we consider an $(n+1)$-dimensional (for the noncommutative case, right or left) linear space $V^{n+1}(K)$ over $K$. The totality of linear subspaces in it constitutes an irreducible complemented modular lattice $P^n(K)$ with

respect to the inclusion relation, and $P^n(K)$ gives rise to an $n$-dimensional projective geometry. We call it a right or left projective space. Points of $P^n(K)$ correspond to (right or left) 1-dimensional linear subspaces. Conversely, it can be shown that an $n$-dimensional projective geometry over $K$ is isomorphic to $P^n(K)$. Therefore projective geometries can be completely classified by means of the natural number $n$ and the field $K$ except when $n = 2$ and the geometry is non-Desarguesian. We may restate this fact as follows: We consider a space $\tilde{P}^n = V^{n+1}(K) - \{0\}$. If we fix a basis of $V^{n+1}(K)$, then we can represent $\tilde{P}^n = \{x = (x^0, x^1, \ldots, x^n) \mid x^\alpha \in K,\ 0 \le \alpha \le n\}$, $x \ne (0, 0, \ldots, 0)$. If there exists a nonzero element $\lambda$ of $K$ such that $y = x\lambda$, then the elements $x$ and $y$ are called equivalent; we write $x \sim y$. We denote by $P^n(K)$ the factor set of $\tilde{P}^n$ under the foregoing equivalence relation, and by $[x]$ the equivalence class that contains $x$. We put $l([x], [y]) = \{[z] \mid z^\alpha = x^\alpha \lambda + y^\alpha \mu,\ \lambda, \mu \in K\}$ and $Q = \{l([x], [y]) \mid [x] \ne [y] \in P^n(K)\}$. We call each element of $P^n(K)$ a point and each element of $Q$ a line. Then these points and lines and the natural inclusion relation satisfy axioms (I)-(IV) and give an $n$-dimensional projective geometry. When $K$ is a †topological field (e.g., the real number field, the complex number field, or the †quaternion field), we may define the topology of $P^n(K)$ as the factor space $P^n(K) = \tilde{P}^n/\sim$. In particular, if $K$ is the real number field $\mathbf{R}$, then $P^n(\mathbf{R})$ is homeomorphic to the factor space obtained from the $n$-dimensional hypersphere $S^n: (x^0)^2 + \cdots + (x^n)^2 = 1$ in the $(n+1)$-dimensional Euclidean space $E^{n+1}$ by identifying the end points of each diameter. Hence $P^n(\mathbf{R})$ is compact. Similar facts hold for the cases of the complex and quaternion number fields. Since the group of projective transformations $\mathfrak{G}(P^n(K))$ acts †transitively on $P^n(K)$, if $K$ is a topological field we can regard $P^n(K)$ as a †homogeneous space of the topological group $\mathfrak{G}(P^n(K))$. Moreover, the totality of $r$-dimensional subspaces in $P^n(K)$ constitutes a †Grassmann manifold. In algebraic geometry the †direct product of two projective spaces is important; we call it a biprojective space.

I. Tits's Theory of Buildings (Generalization of Projective Geometry)

In a situation when a triple $(G, B, N)$ consisting of a group $G$ and its subgroups $B$, $N$ satisfies the axioms of a BN-pair or Tits system (→ 13 Algebraic Groups R), a new geometric object, called a "building," was introduced by J. Tits [9]. His theory contains projective geometry as a particular case. The theory of buildings has deep connection with algebraic groups. The Tits system corresponds to a projective geometry in the following case. Let $k$ be any commutative field, and let $G$ be the general linear group of degree $n$ over $k$, i.e., $G$ consists of all nonsingular square matrices of degree $n$ with entries in $k$. Let $B$ be the subgroup of $G$ consisting of all upper triangular matrices (i.e., matrices whose entries below the principal diagonal are all zero). Let $N$ be the subgroup of $G$ consisting of all monomial matrices (i.e., matrices such that each column and each row contain just one nonzero entry). Then $(G, B, N)$ forms a Tits system called type $(A_{n-1})$. The corresponding theory of buildings of the type above is nothing but the projective geometry. Thus by means of Tits's theory of buildings the relationships among projective geometry and other geometries have been clarified [9].

References

[1] G. Birkhoff, Lattice theory, Amer. Math. Soc. Colloq. Publ., revised edition, 1967.
[2] W. V. D. Hodge and D. Pedoe, Methods of algebraic geometry I, Cambridge, 1947.
[3] O. Schreier and E. Sperner, Einführung in die analytische Geometrie und Algebra, Vandenhoeck & Ruprecht, II, 1951; English translation, Projective geometry of n-dimensions, Chelsea, 1961.
[4] O. Veblen and J. W. Young, Projective geometry I, II, Ginn, 1910-1938.
[5] E. Artin, Geometric algebra, Interscience, 1957.
[6] S. Iyanaga and K. Matsuzaka, Affine geometry and projective geometry, J. Fac. Sci. Univ. Tokyo, 14 (1967), 171-196.
[7] A. Seidenberg, Lectures in projective geometry, Van Nostrand, 1962.
[8] R. Hartshorne, Foundations of projective geometry, Benjamin, 1967.
[9] J. Tits, Buildings of spherical type and finite BN-pairs, Lecture notes in math. 386, Springer, 1974.

344 (VII.22)
Pseudoconformal Geometry

A. Definitions

Let $A$ and $A'$ be subsets (with relative topology) of †complex manifolds $X$ and $X'$ of dimension $n$, respectively. A homeomorphism $f$ of $A$ onto $A'$ is called a pseudoconformal transformation if there exists a †biholomorphic mapping $\tilde{f}$ of an open neighborhood of $A$ in $X$ onto an open neighborhood of $A'$ in $X'$ such

that $\tilde{f}(x) = f(x)$ for $x \in A$. If there exists such a mapping $\tilde{f}$, $A$ is said to be pseudoconformally equivalent to $A'$. Pseudoconformal geometry is a geometry that studies geometric properties invariant under the pseudoconformal equivalence. However, most studies in pseudoconformal geometry so far have concentrated mainly on the investigation of smooth hypersurfaces in a complex manifold; more specifically, the smooth (or real analytic) boundaries of bounded domains in $\mathbf{C}^n$. In fact, to pseudoconformal geometry on hypersurfaces we can apply the methods of differential geometry as well as those of the theory of functions of several complex variables.

H. Poincaré [1] studied perturbations of the boundary of the unit ball in $\mathbf{C}^2$ that are pseudoconformally equivalent. E. Cartan [2] studied the equivalence problem of hypersurfaces in $\mathbf{C}^2$ and gave the complete list of all simply connected hypersurfaces on which the group of pseudoconformal automorphisms acts transitively. Such a hypersurface is called homogeneous.

Let $M$ be a smooth hypersurface in a complex manifold $X$ with the †almost complex structure tensor $J$, i.e., $J_x: T_xX \to T_xX$ is the linear automorphism of the tangent space $T_xX$ of $X$ at $x$, satisfying $J_x^2 = -1$, induced by the complex structure of $X$. Put $H_xM = T_xM \cap J_xT_xM$ for $x \in M$. The union of all $H_xM$ is called the bundle of holomorphic tangent vectors of $M$ and is denoted by $H(M)$. $H(M)$ is also called the CR (Cauchy-Riemann) structure of $M$. Let $M'$ be a smooth hypersurface in a complex manifold $X'$. A diffeomorphism $f: M \to M'$ is called a CR-equivalence if the †differential mapping $f_*: TM \to TM'$ of $f$ preserves the CR-structures, where $TM$ denotes the †tangent bundle of $M$. If $f: M \to M'$ is a pseudoconformal transformation, then $f$ is clearly a CR-equivalence. Let $E_x$ be the annihilator of $H_xM$ in $T_x^*(M)$. Then the union of $E_x$ $(x \in M)$ defines a †line bundle $E$ over $M$. The Levi form $L_x$ at $x \in M$, defined only up to a multiplier, is the quadratic form on $H_xM$ defined by $L_x(u, v) = d\theta(u, J_x v)$ for $u, v \in H_xM$, where $\theta$ is a nonvanishing section of $E$ in a neighborhood of $x$. If the Levi form is nondegenerate at every point of $M$, $M$ is called a nondegenerate hypersurface. In particular, if the Levi form is definite, then $M$ is called strictly pseudoconvex.

B. Equivalence Problem

Cartan studied the equivalence problem for the case $n = 2$, and obtained a criterion for two hypersurfaces in $\mathbf{C}^2$ to be pseudoconformally equivalent. N. Tanaka (1965) generalized the method of Cartan for the case $n \ge 3$ and obtained a criterion in terms of †Cartan connections in some fiber bundle over the hypersurfaces. However, he did not publish the proof of his result until S. S. Chern and J. Moser [4], independently of Tanaka [3], obtained a similar result and gave the first proof of this result. Let $M$ be a real analytic hypersurface in $\mathbf{C}^{n+1}$ $(n \ge 1)$ whose Levi form has $p$ positive and $q$ negative eigenvalues $(p + q = n)$. Let $H$ be the subgroup of $SU(p+1, q+1)$ leaving the point $(1, 0, \ldots, 0) \in \mathbf{C}^{n+2}$ fixed. According to the Cartan-Tanaka-Chern-Moser result, we can construct functorially a principal fiber bundle $Y$ over $M$ with structure group $H$ and a Cartan connection $\omega$ on $Y$ with values in the Lie algebra of $SU(p+1, q+1)$ such that if $M$ and $M'$ are pseudoconformally equivalent, then there is a bundle isomorphism $\varphi$ of $Y$ to $Y'$ preserving the Cartan connections: $\varphi^*\omega' = \omega$, where $Y'$ is the corresponding principal fiber bundle over $M'$ and $\omega'$ is the Cartan connection on $Y'$. Conversely, if there is a bundle isomorphism $\varphi$ of $Y$ to $Y'$ such that $\varphi^*\omega' = \omega$, then $M$ and $M'$ are pseudoconformally equivalent. By using this solution of the equivalence problem, we can prove that the group $A(M)$ of all pseudoconformal automorphisms of a nondegenerate real analytic hypersurface $M$ in a complex manifold $X$ of dimension $n$ is a Lie transformation group of dimension not exceeding $n^2 + 2n$. H. Jacobowitz [5] constructed a similar bundle $B$ over $M$ and a Cartan connection on $B$ in a different way from that of Chern and Moser. We do not know whether $B$ and $Y$ actually coincide.

C. Classification

Cartan (1932) classified all simply connected homogeneous hypersurfaces in $\mathbf{C}^2$. In particular, he proved that if $M$ is a compact homogeneous strictly pseudoconvex hypersurface with $\dim M = 3$, then $M$ is pseudoconformally equivalent to either (1) $S^3$ or its quotient by the action of a root of unity or (2) the hypersurface given in the 2-dimensional projective space by the equation in homogeneous coordinates: $(z_1\bar{z}_1 + z_2\bar{z}_2 + z_3\bar{z}_3)^2 = m^2\,|z_1^2 + z_2^2 + z_3^2|^2$ $(m > 1)$ or the double covering of such a surface. A. Morimoto and T. Nagano [6] and later H. Rossi [7] tried to generalize this result and obtained a partial classification of simply connected compact homogeneous hypersurfaces with dimension $\ge 5$. D. Burns and S. Shnider [8] classified all simply connected compact homogeneous strictly pseudoconvex hypersurfaces $M$ with $\dim M = 2n + 1 \ge 5$. They proved that $M$ is pseudoconformally equivalent to $S^{2n+1}$ or the tangent sphere bundle of a rank one †symmetric space
1279 344 E
Pseudoconformal Geometry

or the unit circle bundle of a homogeneous negative line bundle over a homogeneous algebraic manifold.

On the other hand, a real hypersurface M in a complex manifold X of complex dimension $n + 1$ is called spherical if at every point $p \in M$, there is a neighborhood U of p in X such that $U \cap M$ is pseudoconformally equivalent to an open submanifold of $S^{2n+1}$. The hyperquadric $Q^{n+1} = \partial U_{n+1}$ is spherical, where $U_{n+1} = \{(z_1, \ldots, z_{n+1}) \in \mathbf{C}^{n+1} \mid \operatorname{Im}(z_{n+1}) > |z_1|^2 + \cdots + |z_n|^2\}$. If M is spherical, then the universal covering space $\tilde M$ of M is also spherical. If M is a homogeneous spherical hypersurface, then there is a covering into mapping $f: \tilde M \to S^{2n+1}$, and $f(\tilde M)$ is a homogeneous domain in $S^{2n+1}$. We know that the only compact simply connected spherical M is $S^{2n+1}$. Burns and Shnider classified all homogeneous domains M in $S^{2n+1}$: M is pseudoconformally equivalent to (I) or (II) of the following: (Ia) $S^{2n+1} - V \cap S^{2n+1}$, where V is a complex vector subspace of $\mathbf{C}^{n+1}$ with $0 \le \dim_{\mathbf{C}} V < n$. (Ib) $Q^{n+1} - L_m \cdot 0$, $0 < m < n$, where $L_m$ is a certain subgroup of SU(n + 1, 1)/(center). (II) $S^{2n+1} - S^{2n+1} \cap \mathbf{R}^{n+1}$. At present, it seems difficult to extend Cartan's classification of all simply connected homogeneous hypersurfaces to higher dimensions.

K. Yamaguchi (1976) treated a hypersurface M in a complex manifold of dimension n with a large automorphism group A(M). He showed that if $\dim A(M) = n^2 + 2n$, then M is a real hyperquadric in the n-dimensional complex projective space $P_n\mathbf{C}$ (→ Section B). He then showed that the second largest dimension for A(M) is equal to $n^2 + 1$ except when $n = 3$ and the index $r = 1$, for which $\dim A(M) = 11$ $(= n^2 + 2)$. Under the additional assumption that M is homogeneous, he showed that if $\dim A(M) = n^2 + 1$, then M is the affine part of a real hyperquadric in $P_n\mathbf{C}$ (except when $n = 5$ and $r = 2$). He also obtained a similar result in the nonhomogeneous case.

D. Relations to Other Equivalences

Let $D_1$ and $D_2$ be bounded domains in $\mathbf{C}^n$ with smooth boundary $\partial D_i = M_i$ ($i = 1, 2$) for which we denote by $H(M_i)$ the CR structures. We consider the following propositions (A)-(D) for these domains: (A) $D_1$ is biholomorphically equivalent to $D_2$. (B) $M_1$ is CR equivalent to $M_2$. (C) $M_1$ is pseudoconformally equivalent to $M_2$. (D) There is a diffeomorphism $f: \bar D_1 \to \bar D_2$ such that $f|_{D_1}: D_1 \to D_2$ is biholomorphic. It is clear that (C) implies (D) and that (D) implies (A). On the other hand, we can prove that (B) is equivalent to (D). When does (A) imply (B) and when does (B) imply (C)?

C. Fefferman [10] proved that (A) implies (D) when $D_1$ and $D_2$ are strictly pseudoconvex. S. Bell generalized the result of Fefferman in the case when one of $D_1$ and $D_2$ is strictly pseudoconvex. Bell and E. Ligocka [11] proved that if $M_1$ and $M_2$ are real analytic and if $D_1$ and $D_2$ are pseudoconvex, then (A) implies (D). When $D_1$ and $D_2$ are not strictly pseudoconvex and $M_i$ is not real analytic, we do not know whether (A) implies (D) or not. As remarked by Burns, Shnider, and Wells (1978), by using the theorem of Fefferman, we can prove that (A) implies (C) when $M_1$ and $M_2$ are real analytic and if $D_1$ and $D_2$ are strictly pseudoconvex. I. Naruki [12] obtained the same result. We do not know whether (A) implies (C) when $M_1$ and $M_2$ are real analytic and $D_1$ and $D_2$ are pseudoconvex, though we know that (A) implies (B). We do not know whether (B) implies (C) in general.

S. I. Pinchuk [13] proved the following: Let $D$, $D'$ be strictly pseudoconvex domains in $\mathbf{C}^n$ with simply connected real analytic boundaries $\partial D$, $\partial D'$. Let $f: U \to \mathbf{C}^n$ be a nonconstant holomorphic mapping from a connected neighborhood U of a point $p \in \partial D$ in $\mathbf{C}^n$ into $\mathbf{C}^n$ such that $f(U \cap \partial D) \subset \partial D'$. Then we can find a holomorphic mapping $\tilde f: D \to D'$ such that $\tilde f(x) = f(x)$ for $x \in D \cap U$. Combining this theorem with Fefferman's result we see that for two domains as above, $D$ is biholomorphically equivalent to $D'$ if and only if $\partial D$ is locally pseudoconformally equivalent to $\partial D'$, i.e., there are neighborhoods U and V of a point $p \in \partial D$ and $q \in \partial D'$, respectively, such that $U \cap \partial D$ and $V \cap \partial D'$ are pseudoconformally equivalent.

Concerning the †proper holomorphic mappings rather than diffeomorphisms, Burns and Shnider (1979) proved the following theorem: Let $M_i = \partial D_i$ ($i = 1, 2$) be strictly pseudoconvex, and let $f: D_1 \to D_2$ be a proper holomorphic mapping. (a) If $D_1 = D_2$, then f extends smoothly up to the boundary $\partial D_1$. (b) If $\partial D_i$ is real analytic for $i = 1, 2$, then f extends holomorphically past the boundary.

E. Deformations of Domains

Let M be a compact connected strictly pseudoconvex real hypersurface in a complex manifold X of dimension $n + 1$. Let $\varphi$ be a smooth strictly †plurisubharmonic function defined on a neighborhood V of M such that $M = \{x \in V \mid \varphi(x) = 0\}$ and $d\varphi \ne 0$ on M. Let $U = \{x \in V \mid -\varepsilon < \varphi(x) < \varepsilon\}$ for small $\varepsilon > 0$ such that $\bar U$ is compact and $\partial U$ is smooth. Let $\mathcal{P}(\bar U)$ be the open set in $C^\infty(\bar U)$ of strictly plurisubharmonic functions $\psi$ with $d\psi \wedge d^c\psi \wedge (dd^c\psi)^n \ne 0$ on $\bar U$. Let $B \subset \mathbf{R}^k$ be a small open ball around 0. We denote by $\mathcal{P}(\bar U \times B)$ the set of $\psi \in C^\infty(\bar U$

$\times B)$ such that $\psi(x, t) = \psi_t(x) \in \mathcal{P}(\bar U)$ for all $t \in B$. For $\psi \in \mathcal{P}(\bar U \times B)$ we set $M_{t,\delta} = \{x \in U \mid \psi_t(x) = \delta\}$. After introducing these notations, Burns, Shnider, and Wells (1978) proved the following theorem. There exists an open dense set $\mathcal{V} \subset \mathcal{P}(\bar U \times B)$ with $\varphi \in \mathcal{V}$ and a set of †second category $\mathcal{W} \subset \mathcal{V}$ such that for every $\psi \in \mathcal{W}$, $t_i \in B$ and $\delta_i \in \mathbf{R}$ small enough, (i) $M_{t_1,\delta_1}$ is CR-equivalent to $M_{t_2,\delta_2}$ if and only if $t_1 = t_2$, $\delta_1 = \delta_2$. (ii) The group of CR-automorphisms of $M_{t,\delta}$ reduces to the identity only. For $\psi \in \mathcal{P}(\bar U \times B)$, taking $t \in B$, $\delta \in \mathbf{R}$ small enough, $M_{t,\delta}$ is a compact connected strictly pseudoconvex hypersurface in X. If M bounds the relatively compact region D in X then $M_{t,\delta}$ also bounds a relatively compact region $D_{t,\delta}$. In particular, there exist smooth families of deformations of the unit ball in $\mathbf{C}^{n+1}$ of arbitrarily high dimension. There are arbitrarily small perturbations of the unit sphere in $\mathbf{C}^{n+1}$ that admit no pseudoconformal transformations other than the identity.

F. Topics Related to Pseudoconformal Geometry

(1) Pinchuk (1975) proved the following: Let $D_1$, $D_2$ be strictly pseudoconvex domains in $\mathbf{C}^n$ with $C^\omega$ boundary $\partial D_1$, $\partial D_2$. Let U be a neighborhood of a point $p \in \partial D_1$ in $\mathbf{C}^n$. If there is a $C^1$-mapping $f: U \cap \bar D_1 \to \mathbf{C}^n$ such that f is holomorphic on $U \cap D_1$ and $f(U \cap \partial D_1) \subset \partial D_2$, then there is a holomorphic mapping $\tilde f: U' \to \mathbf{C}^n$ of a neighborhood $U'$ of $U \cap \partial D_1$ into $\mathbf{C}^n$ such that $\tilde f(x) = f(x)$ for $x \in U \cap \bar D_1 \cap U'$. This result is related to the implication (B) ⇒ (C) in Section D.

(2) H. Alexander [14] proved the following: Let U be a connected neighborhood of a point $p \in S^{2n-1}$ in $\mathbf{C}^n$ and $f: U \to \mathbf{C}^n$ a holomorphic mapping such that $f(U \cap S^{2n-1}) \subset S^{2n-1}$. Then either f is a constant mapping or there is a biholomorphic automorphism $\tilde f: B_n \to B_n$ of the unit open ball $B_n$ such that $\tilde f(x) = f(x)$ for $x \in U \cap B_n$. He also proved that every proper holomorphic mapping $f: B_n \to B_n$ is necessarily an automorphism of $B_n$ if $n > 1$. G. M. Henkin (1973) proved that every proper holomorphic mapping $f: D_1 \to D_2$ of a strictly pseudoconvex domain $D_1$ into a strictly pseudoconvex $D_2$ can be extended continuously to a function $\tilde f: \bar D_1 \to \bar D_2$. More precisely, there is a constant $c > 0$ such that $|f(z_1) - f(z_2)| < c|z_1 - z_2|^{1/2}$ for every $z_1, z_2 \in D_1$.

(3) Let M be a real hypersurface in $\mathbf{C}^{n+1}$ with H(M) the bundle of holomorphic tangent vectors to M. We take a real nonvanishing 1-form $\theta$ that annihilates H(M). S. M. Webster (1978) called the pair $(M, \theta)$ a pseudo-Hermitian manifold. He considered the equivalence problem of pseudo-Hermitian manifolds by applying Cartan's method of equivalence. He proves, among other things, that the group of all pseudo-Hermitian transformations of the nondegenerate pseudo-Hermitian manifold $(M, \theta)$ of dimension $2n + 1$ is a Lie transformation group of dimension not exceeding $(n + 1)^2$. Webster considered the relation between pseudo-Hermitian manifolds and pseudoconformal geometry and proved that for $n > 2$ the ellipsoid E given by the equation $A_1x_1^2 + B_1y_1^2 + \cdots + A_{n+1}x_{n+1}^2 + B_{n+1}y_{n+1}^2 = 1$, where $z_k = x_k + iy_k$ ($k = 1, \ldots, n + 1$), is pseudoconformally equivalent to the hypersphere $S^{2n+1}$ if and only if $A_k = B_k$ ($k = 1, \ldots, n + 1$). This result gives, by virtue of Fefferman's theorem, a necessary and sufficient condition for an ellipsoidal domain to be biholomorphically equivalent to the unit ball.

References

[1] H. Poincaré, Les fonctions analytiques de deux variables et la représentation conforme, Rend. Circ. Mat. Palermo, 23 (1907), 185-220.
[2] E. Cartan, Sur la géométrie pseudo-conforme des hypersurfaces de l'espace de deux variables complexes I, II, Ann. Mat. Pura Appl., 11 (1932), 17-90; Ann. Scuola Norm. Sup. Pisa, 1 (1932), 333-354.
[3] N. Tanaka, On nondegenerate real hypersurfaces, graded Lie algebras and Cartan connections, Japan. J. Math., 2 (1976), 131-190.
[4] S. S. Chern and J. Moser, Real hypersurfaces in complex manifolds, Acta Math., 133 (1974), 219-271.
[5] H. Jacobowitz, Induced connections on hypersurfaces in $\mathbf{C}^{n+1}$, Inventiones Math., 43 (1977), 109-123.
[6] A. Morimoto and T. Nagano, On pseudo-conformal transformations of hypersurfaces, J. Math. Soc. Japan, 14 (1963), 289-300.
[7] H. Rossi, Homogeneous strongly pseudoconvex hypersurfaces, Rice Univ. Studies, 59 (1973), 131-145.
[8] D. Burns and S. Shnider, Spherical hypersurfaces in complex manifolds, Inventiones Math., 33 (1976), 223-246.
[9] K. Yamaguchi, Non-degenerate hypersurfaces in complex manifolds admitting large groups of pseudoconformal transformations I, Nagoya Math. J., 62 (1976), 55-96.
[10] C. Fefferman, The Bergman kernel and biholomorphic mappings of pseudoconvex domains, Inventiones Math., 26 (1974), 1-65.
[11] S. Bell and E. Ligocka, A simplification and extension of Fefferman's theorem on biholomorphic mappings, Inventiones Math., 57 (1980), 283-289.
1281 345 A
Pseudodifferential Operators

[12] I. Naruki, On extendability of isomorphisms of Cartan connections and biholomorphic mappings of bounded domains, Tôhoku Math. J., 28 (1976), 117-122.
[13] S. I. Pinchuk, On holomorphic mappings of real analytic hypersurfaces, Math. USSR-Sb., 34 (1978), 503-519.
[14] H. Alexander, Proper holomorphic mappings in $\mathbf{C}^n$, Indiana Math. J., 26 (1977), 137-146.
[15] S. M. Webster, Pseudo-Hermitian structure on a real hypersurface, J. Differential Geometry, 13 (1978), 25-41.

345 (XIII.33)
Pseudodifferential Operators

A. Pseudodifferential Operators

Pseudodifferential operators are a natural extension of linear partial differential operators. The theory of pseudodifferential operators grew out of the study of singular integral operators, and developed rapidly after 1965 with the systematic studies by J. J. Kohn and L. Nirenberg [1], L. Hörmander [2], and others. The term "pseudodifferential operator" first appeared in Kohn and Nirenberg [1].

Let P be a †linear partial differential operator of the form

$$P = p(x, D_x) = \sum_{|\alpha| \le m} a_\alpha(x) D_x^\alpha \tag{1}$$

and let $u(x)$ be a function of class $C_0^\infty(\Omega)$ $(\subset C_0^\infty(\mathbf{R}^n))$. Then by means of the †Fourier inversion formula, $Pu(x)$ can be written in the form

$$Pu(x) = (2\pi)^{-n/2} \int_{\mathbf{R}^n} \exp(ix \cdot \xi)\, p(x, \xi)\, \hat u(\xi)\, d\xi, \tag{2}$$

where $\hat u(\xi)$ denotes the †Fourier transform of $u(x)$ (→ 160 Fourier Transform H). But this representation of $Pu(x)$ has a meaning even if $p(x, \xi)$ is not a polynomial in $\xi$. Thus, for a general function $p(x, \xi)$, the pseudodifferential operator $P = p(x, D_x)$ with the symbol $p(x, \xi)$ is defined by (2). A symbol class is determined in accordance with various purposes, but it is always required that the corresponding operators have essential properties in common with partial differential operators. Hörmander [3] defined a symbol class $S^m_{\rho,\delta}(\Omega)$ for real numbers m, $\rho$, and $\delta$ with $\rho \ge 0$ and $\delta \ge 0$ in the following way: Let $p(x, \xi)$ be a $C^\infty$-function defined in $\Omega \times \mathbf{R}^n$. If for any pair of multi-indices $\alpha$, $\beta$ and any compact set $K \subset \Omega$, there exists a constant $C_{\alpha,\beta,K}$ such that

$$|D_x^\alpha D_\xi^\beta p(x, \xi)| \le C_{\alpha,\beta,K} (1 + |\xi|)^{m + \delta|\alpha| - \rho|\beta|}, \quad x \in K,\ \xi \in \mathbf{R}^n,$$

then $p(x, \xi)$ is said to be of class $S^m_{\rho,\delta}(\Omega)$. The operator P defined by (2) is called a pseudodifferential operator (of order m) of class $S^m_{\rho,\delta}(\Omega)$ and is often denoted by $P = p(x, D_x) \in S^m_{\rho,\delta}(\Omega)$. When $\Omega = \mathbf{R}^n$ and constants $C_{\alpha,\beta,K} = C_{\alpha,\beta}$ are independent of K, we denote $S^m_{\rho,\delta}(\mathbf{R}^n)$ simply by $S^m_{\rho,\delta}$, and set

$$S^{-\infty} = \bigcap_{-\infty < m < \infty} S^m_{\rho,\delta}, \qquad S^\infty_{\rho,\delta} = \bigcup_{-\infty < m < \infty} S^m_{\rho,\delta}.$$

Differential operators (1) with coefficients of class $\mathcal{B}$ (→ 168 Function Spaces B(13)) belong to $S^m_{1,0}$. The complex power $(1 - \Delta)^{s/2}$ of $1 - \Delta = 1 - \sum_{j=1}^n \partial^2/\partial x_j^2$ is defined as a pseudodifferential operator of class $S^{\operatorname{Re} s}_{1,0}$ by the symbol $(1 + |\xi|^2)^{s/2}$. Operators of class $S^m_{\rho,\delta}$ are continuous mappings of $\mathcal{S}$ into $\mathcal{S}$. Therefore, for any real s, the operator $(1 - \Delta)^{s/2}$ can be uniquely extended to be a mapping of $\mathcal{S}'$ into $\mathcal{S}'$ by the relation

$$((1 - \Delta)^{s/2} u, v) = (u, (1 - \Delta)^{s/2} v), \quad u \in \mathcal{S}',\ v \in \mathcal{S}.$$

For any $1 < r < \infty$ and real s, the †Sobolev space $H^{s,r}$ is defined by

$$H^{s,r} = \{u \in \mathcal{S}' \mid (1 - \Delta)^{s/2} u \in L_r(\mathbf{R}^n)\},$$

which is a Banach space provided with the norm $\|u\|_{s,r} = \|(1 - \Delta)^{s/2} u\|_{L_r}$. In particular, $H^s = H^{s,2}$ is a Hilbert space with the norm $\|u\|_s = \|u\|_{s,2}$. Set

$$H^{-\infty,r} = \bigcup_{-\infty < s < \infty} H^{s,r}, \quad H^{-\infty} = H^{-\infty,2}, \qquad H^{\infty,r} = \bigcap_{-\infty < s < \infty} H^{s,r}, \quad H^{\infty} = H^{\infty,2}.$$

Then

$$\mathcal{S} \subset H^{\infty,r} \subset H^{-\infty,r} \subset \mathcal{S}'.$$

Choosing the Hörmander class $S^m_{\rho,\delta}$ in the case $0 \le \delta \le \rho \le 1$ and $\delta < 1$ as a model class, we here list the main results of the theory of pseudodifferential operators:

(i) Pseudolocal property. The operator P of class $S^m_{\rho,\delta}$ in general does not have the local property $u \in \mathcal{S}' \Rightarrow \operatorname{supp} Pu \subset \operatorname{supp} u$, but if $\rho > 0$, then P has the pseudolocal property $u \in \mathcal{S}' \Rightarrow \operatorname{sing\,supp} Pu \subset \operatorname{sing\,supp} u$ [3].

(ii) Algebra of pseudodifferential operators. Let $P = p(x, D_x) \in S^m_{\rho,\delta}$ and $P_j = p_j(x, D_x) \in S^{m_j}_{\rho,\delta}$, $j = 1, 2$. Then there exist $P^* = p^*(x, D_x) \in S^m_{\rho,\delta}$ and $Q = q(x, D_x) \in S^{m_1+m_2}_{\rho,\delta}$ such that $(Pu, v) = (u, P^*v)$ for $u, v \in \mathcal{S}$, i.e., $P^*$ is the formal ad-

joint of P, and $Q = P_1P_2$. Furthermore, if we set $p^*_\alpha(x, \xi) = (iD_\xi)^\alpha D_x^\alpha \overline{p(x, \xi)}$ and $q_\alpha(x, \xi) = (iD_\xi)^\alpha p_1(x, \xi)\, D_x^\alpha p_2(x, \xi)$, then for any integer N we have

$$p^*(x, \xi) - \sum_{|\alpha| < N} \frac{1}{\alpha!}\, p^*_\alpha(x, \xi) \in S^{m - (\rho - \delta)N}_{\rho,\delta} \tag{3}$$

and

$$q(x, \xi) - \sum_{|\alpha| < N} \frac{1}{\alpha!}\, q_\alpha(x, \xi) \in S^{m_1 + m_2 - (\rho - \delta)N}_{\rho,\delta}. \tag{4}$$

Hence the operator class $S^\infty_{\rho,\delta}$ is an algebra in the sense

$$P_j \in S^{m_j}_{\rho,\delta}\ (j = 1, 2) \ \Rightarrow\ P_1 + P_2 \in S^{m_0}_{\rho,\delta}, \quad P_1P_2 \in S^{m_1+m_2}_{\rho,\delta}, \quad P_j^* \in S^{m_j}_{\rho,\delta},$$

where $m_0 = \max(m_1, m_2)$. In particular, if $0 \le \delta < \rho \le 1$, we have $m - (\rho - \delta)N \to -\infty$ and $m_1 + m_2 - (\rho - \delta)N \to -\infty$ as $N \to \infty$. Then, we say that $p^*(x, \xi)$ and $q(x, \xi)$ have asymptotic expansions in the sense of (3) and (4), respectively:

$$p^*(x, \xi) \sim \sum_\alpha \frac{1}{\alpha!}\, p^*_\alpha(x, \xi)$$

and

$$q(x, \xi) \sim \sum_\alpha \frac{1}{\alpha!}\, q_\alpha(x, \xi)$$

[3, 4].

(iii) $H^s$-boundedness. For $P \in S^m_{\rho,\delta}$ and any real s there exists a constant $C_s$ such that

$$\|Pu\|_s \le C_s \|u\|_{s+m}, \quad u \in H^{s+m} \tag{5}$$

[4].

(iv) A sharp form of Gårding's inequality. Let $p(x, \xi) = (p_{jk}(x, \xi);\ j, k = 1, \ldots, l)$ be a Hermitian symmetric and nonnegative matrix of $p_{jk}(x, \xi) \in S^m_{\rho,\delta}$. Then for $P = p(x, D_x)$ there exists a constant C such that

$$\operatorname{Re}(Pu, u) \ge -C \|u\|^2_{(m - (\rho - \delta))/2}, \tag{6}$$

where $u = (u_1, \ldots, u_l)$ with $u_j \in H^{m/2}$, $j = 1, \ldots, l$, and $\|u\|_s^2 = \sum_{j=1}^l \|u_j\|_s^2$.

(v) Invariance under coordinate transformations. Assume that $0 \le 1 - \rho \le \delta < \rho \le 1$. Let $x(y) = (x_1(y), \ldots, x_n(y))$ be a $C^\infty$-coordinate transformation from $\mathbf{R}^n_y$ onto $\mathbf{R}^n_x$ such that $\partial x_k(y)/\partial y_j \in \mathcal{B}$, $j, k = 1, \ldots, n$, and $C^{-1} \le |\det(\partial_y x(y))| \le C$ for a constant $C > 0$, where $\det(\partial_y x(y))$ denotes the determinant of the Jacobian matrix $(\partial_y x(y)) = (\partial x_k(y)/\partial y_j)$. Then for any $P = p(x, D_x) \in S^m_{\rho,\delta}$ in $\mathbf{R}^n_x$, there exists an operator $Q = q(y, D_y) \in S^m_{\rho,\delta}$ in $\mathbf{R}^n_y$ such that $(Qw)(y) = (Pu)(x(y))$ for $w(y) = u(x(y)) \in \mathcal{S}$. This fact enables us to define pseudodifferential operators on $C^\infty$-manifolds [3, 4].

(vi) Parametrix. For a given $P \in S^m_{\rho,\delta}$, an operator $E \in S^\infty_{\rho,\delta}$ is called a left (resp. right) parametrix of P if $EP - I$ (resp. $PE - I$) is of class $S^{-\infty}$. If E is a left and right parametrix of P, we call it a parametrix of P. For a differential operator P, the existence of a left (resp. right) parametrix is a sufficient condition for P to be †hypoelliptic if $\rho > 0$ (resp. the equation $Pu = f \in \mathcal{S}'$ is locally solvable).

The estimate (5), in particular, when $m = s = 0$, has been obtained by Hörmander [3], V. V. Grushin (Functional Anal. Appl., 4 (1970)), H. Kumano-go (J. Fac. Sci. Univ. Tokyo, 17 (1970)) when $0 \le \delta < \rho \le 1$, and A. P. Calderón and R. Vaillancourt [5], H. O. Cordes (J. Functional Anal., 18 (1975)), T. Kato (Osaka J. Math., 13 (1976)), Kumano-go [4], and others when $0 \le \delta \le \rho \le 1$ and $\delta < 1$. A sharp form of Gårding's inequality has been proved by Hörmander [6], P. D. Lax and L. Nirenberg [7], and sharpened by A. Melin (Ark. Mat., 9 (1971)), C. Fefferman and D. H. Phong (Proc. Nat. Acad. Sci. US, 76 (1979)), and Hörmander [8]. A general sufficient condition for the existence of a parametrix for an operator of class $S^m_{\rho,\delta}$ was obtained by Hörmander [3]. Let $P = p(x, D_x)$ belong to $S^m_{\rho,\delta}$ with $0 \le \delta < \rho \le 1$. Assume that the symbol $p(x, \xi)$ satisfies the following conditions: (i) for some $C_0 > 0$, real $m'$ $(\le m)$, and $R > 0$, we have $|p(x, \xi)| \ge C_0 |\xi|^{m'}$ $(|\xi| \ge R)$; (ii) for any $\alpha$, $\beta$ there exists a constant $C_{\alpha,\beta}$ such that

$$|D_\xi^\alpha D_x^\beta p(x, \xi) / p(x, \xi)| \le C_{\alpha,\beta} |\xi|^{\delta|\beta| - \rho|\alpha|}, \quad |\xi| \ge R.$$

Then there exists a parametrix $Q = q(x, D_x)$ of P in the class $S^{-m'}_{\rho,\delta}$.

By means of operators of class $S^m_{1,0}(\Omega)$ we can define the wave front set of $u \in \mathcal{D}'(\Omega)$, which enables us to resolve sing supp u on $T^*(\Omega) \setminus 0$, the cotangent bundle of $\Omega$ minus its zero section. An operator $P = p(x, D_x) \in S^m_{1,0}(\Omega)$ is said to be microlocally elliptic at $(x^0, \xi^0) \in T^*(\Omega) \setminus 0$ if $\liminf_{\tau \to \infty} |p(x^0, \tau\xi^0)| / |\tau\xi^0|^m > 0$. For a distribution $u \in \mathcal{D}'(\Omega)$, we say that a point $(x^0, \xi^0)$ of $T^*(\Omega) \setminus 0$ does not belong to the wave front set (or the singular spectrum) of u, denoted by WF(u), if there exist $a(x), b(x) \in C_0^\infty(\Omega)$, $a(x^0) \ne 0$, $b(x^0) \ne 0$, and $P \in S^m_{1,0}(\Omega)$, which is microlocally elliptic at $(x^0, \xi^0)$, such that $aPbu \in C_0^\infty(\Omega)$. Then we easily see that WF(u) is a closed conic subset of $T^*(\Omega) \setminus 0$. An important fact is that the relation sing supp u = $\operatorname{Proj}_x$ WF(u) (the projection of WF(u) on $\Omega$) holds, from which we can perform a so-called microlocal analysis, the analysis on $T^*(\Omega) \setminus 0$, of sing supp u. As the sharp form of the pseudolocal property of an operator $P \in S^m_{\rho,\delta}$, if $0 \le \delta < \rho \le 1$, P has the micro-pseudolocal property: $u \in \mathcal{S}' \Rightarrow$ WF(Pu) $\subset$ WF(u).

Pseudodifferential operators of multiple symbol have been defined by K. O. Friedrichs (Courant Inst., 1968) and Kumano-go [4]. More refined and useful classes of pseudo-

differential operators have been defined by R. Beals (Duke Math. J., 42 (1975)), Hörmander [8], and others.

The theory of pseudodifferential operators has found many fields of application, such as M. F. Atiyah and R. Bott (Ann. Math., 86 (1967)) on the †Lefschetz fixed-point formula; Friedrichs and P. D. Lax (Comm. Pure Appl. Math., 18 (1965)) on symmetrizable systems; Hörmander [6], Yu. V. Egorov (Russian Math. Surveys, 30 (1975)) on subelliptic operators; Kumano-go (Comm. Pure Appl. Math., 22 (1969)), F. Treves (Amer. J. Math., 94 (1972)), S. J. Alinhac and M. S. Baouendi (Amer. J. Math., 102 (1980)) on uniqueness of the Cauchy problem; S. Mizohata and Y. Ohya (Publ. Res. Inst. Math. Sci., 4 (1968)), Hörmander (J. Analyse Math., 32 (1977)) on †weakly hyperbolic equations; C. Morawetz, J. V. Ralston, and W. A. Strauss (Comm. Pure Appl. Math., 30 (1977)), M. Ikawa (Publ. Res. Inst. Math. Sci., 14 (1978)) on the exponential decay of solutions; and Nirenberg and Treves [16], Beals and Fefferman [17] on local solvability theory.

For recent developments in the theory of pseudodifferential operators and its applications → Kumano-go [4], M. Taylor (Princeton Univ. Press, 1981), Treves (Plenum, 1981), and others.

B. Fourier Integral Operators

A Fourier integral operator $A: C_0^\infty(\mathbf{R}^n) \to \mathcal{D}'(\mathbf{R}^n)$ is a locally finite sum of linear operators of the type

$$Af(x) = (2\pi)^{-(n+N)/2} \int_{\mathbf{R}^{N+n}} \exp(i\varphi(x, \theta, y))\, a(x, \theta, y)\, f(y)\, dy\, d\theta. \tag{7}$$

Here $a(x, \theta, y)$ is a $C^\infty$-function satisfying the inequality

$$|D_x^\alpha D_\theta^\beta D_y^\gamma a(x, \theta, y)| \le C (1 + |\theta|)^{m - \rho|\beta| + (1 - \rho)(|\alpha| + |\gamma|)}$$

for some fixed m and $\rho$, $1/2 < \rho \le 1$, and any triple of multi-indices $\alpha$, $\beta$, $\gamma$, and $\varphi(x, \theta, y)$ is a real-valued function of class $C^\infty$ for $\theta \ne 0$ and homogeneous of degree 1 in $\theta$ there. The function $\varphi$ is called the phase function and a the amplitude function.

Let $C_\varphi = \{(x, \theta, y) \mid d_\theta\varphi(x, \theta, y) = 0,\ \theta \ne 0\}$ and $W = \{(x, y) \in \mathbf{R}^n \times \mathbf{R}^n \mid \exists\, \theta \ne 0 \text{ such that } (x, \theta, y) \in C_\varphi\}$. If $d_{x,\theta,y}\varphi(x, \theta, y) \ne 0$ for $\theta \ne 0$, then the kernel distribution $k(x, y)$ of A is of class $C^\infty$ outside W. There have been detailed studies of the case where the $d_{x,\theta,y}(\partial\varphi(x, \theta, y)/\partial\theta_j)$, $j = 1, 2, \ldots, N$, are linearly independent at every point of $C_\varphi$. In this case, $C_\varphi$ is a smooth manifold in $\mathbf{R}^n \times (\mathbf{R}^N \setminus 0) \times \mathbf{R}^n$, and the mapping $\Phi: C_\varphi \ni (x, \theta, y) \to (x, y, \xi, \eta)$, $\xi = d_x\varphi(x, \theta, y)$, $\eta = d_y\varphi(x, \theta, y)$, is an immersion of $C_\varphi$ to $T^*(\mathbf{R}^n \times \mathbf{R}^n) \setminus 0$, the cotangent bundle of $\mathbf{R}^n \times \mathbf{R}^n$ minus its zero section. The image $\Phi C_\varphi = \Lambda_\varphi$ is a conic Lagrange manifold, i.e., the canonical 2-form $\sigma = \sum_j d\xi_j \wedge dx_j - \sum_j d\eta_j \wedge dy_j$ vanishes on $\Lambda_\varphi$ and the multiplicative group of positive numbers acts on $\Lambda_\varphi$. Let $\lambda_1, \ldots, \lambda_{2n}$ be a system of local coordinates in $\Lambda_\varphi$. These, together with $\partial\varphi/\partial\theta_1, \ldots, \partial\varphi/\partial\theta_N$, constitute a system of local coordinate functions of $\mathbf{R}^n \times (\mathbf{R}^N \setminus 0) \times \mathbf{R}^n$ in a conic neighborhood of $C_\varphi$. Let J denote the Jacobian determinant

$$J = \frac{D(\lambda_1, \ldots, \lambda_{2n}, \partial\varphi/\partial\theta_1, \ldots, \partial\varphi/\partial\theta_N)}{D(x, \theta, y)}.$$

The function $a_{\Lambda_\varphi} = (a_{C_\varphi}/\sqrt{J}) \circ \Phi^{-1}$ is called the local symbol of A. Here $a_{C_\varphi}$ is the restriction of a to $C_\varphi$. The conic Lagrange manifold $\Lambda_\varphi = \Lambda_\varphi(A)$ and the symbol $a_{\Lambda_\varphi} = a_{\Lambda_\varphi}(A)$ essentially determine the singularity of the kernel distribution $k(x, y)$ of the Fourier integral operator A. Conversely, given a conic Lagrange manifold $\Lambda$ in $T^*(\mathbf{R}^n \times \mathbf{R}^n) \setminus 0$ and a function $a_\Lambda$ on it, one can construct a Fourier integral operator A such that $\Lambda_\varphi(A) = \Lambda$ and $a_{\Lambda_\varphi}(A) = a_\Lambda$. Those Fourier integral operators whose associated conic Lagrange manifolds are the graphs of homogeneous †canonical transformations of $T^*(\mathbf{R}^n) \setminus 0$ are most frequently used in the theory of linear partial differential equations. Let A be a Fourier integral operator such that $\Lambda_\varphi(A)$ is the graph of a homogeneous canonical transformation $\chi$. Then the adjoint of A is a Fourier integral operator such that the associated conic Lagrange manifold is the graph of the inverse transformation $\chi^{-1}$. Let $A_1$ be another such operator; if the associated conic Lagrange manifold is the graph of $\chi_1$, then the composed operator $A_1A$ is also a Fourier integral operator and the associated conic Lagrange manifold is the graph of the composed homogeneous canonical transformation $\chi_1\chi$.

A pseudodifferential operator of class $S^m_{\rho,1-\rho}(\mathbf{R}^n)$ is a particular type of Fourier integral operator. In fact, a Fourier integral operator A is a pseudodifferential operator of class $S^m_{\rho,1-\rho}(\mathbf{R}^n)$ if and only if $\Lambda_\varphi(A)$ is the graph of the identity mapping of $T^*(\mathbf{R}^n) \setminus 0$. Hence for any Fourier integral operator A, $A^*A$ and $AA^*$ are pseudodifferential operators.

The following theorem is due to Egorov [11]: Let P and Q be pseudodifferential operators of class $S^m_{\rho,1-\rho}(\mathbf{R}^n)$ with the symbols $p(x, \xi)$ and $q(x, \xi)$, respectively, and let A be a Fourier integral operator such that the associated conic Lagrange manifold $\Lambda_\varphi(A)$ is the graph of a homogeneous canonical transformation $\chi$ of $T^*(\mathbf{R}^n) \setminus 0$. If the equality $PA = AQ$

holds, then 9(x, <)-p(x(x, 5)) belongs to the function independent of y of class S;l,. Then
class S,1~k;2p(R”). by (7) the Fourier integral operator A = A, is
Assume that m = 1, p = 1, and that p1 (x, 5) defined by
is a real-valued ?-function, homogeneous
of degree 1 in r for /<I> 1, such that p(x, 5) A,u(x)=(27c-“‘2 exp(iS(x, 5)M:c W(5M.
-pl(x,<)~S~,,(R”) and drp,(xo,~O)#O at s R”
(x0,5’), [loI> 1, where p,(x”,to)=O. Then one (8)
can find a Fourier integral operator A such Let T be the canonical transformation with
that the function 9(x, 5) of Egorov’s theorem the tgenerating function S(x, <), i.e., T is de-
satisfies the relation 9(x, 5) - 5, E Sp, o(R”). fined by y = V$(x, q), 5 = V,S(x, q). Then for
The theory of Fourier integral operators has the Fourier integral operator A, we have
its origin in the asymptotic representation of
solutions of the wave equation (- 325 Partial WF(A,u)= {(x> 5)= T(y,~)I(y,rl)~‘~F(u)},
Differential Equations of Hyperbolic Type UEY. (9)
L; also, e.g., [12,13,14]). G. I. Eskin (Math.
USSR-,%., 3 (1976)) used a type of Fourier Next consider a hyperbolic operator L = 0, +
integral operator in deriving the energy esti- p(t, x, 0,) for a real-valued symbol p(t, x, 5)~
mates and constructing the fundamental solu- B”( [O, To]; Sl,,) with some To > 0. .For a small
tions for strict hyperbolic operators. H6r- 0 < Tg To the solution S(t, x, 4) of the eikonal
mander (Acta Math., 121 (1968)) introduced equation a,S+p(t,x,V,S)=O on [O, T] with
the term “Fourier integral operators,” and the initial condition SI,,, = x. 5 exists in g”
applied these operators to the derivation of ([0, T]; S:,,). Consider the Cauchy problem
highly accurate asymptotic formulas for spec- Lu = 0 on [0, T], ultEO = uo. Then t.nere exists
tral functions of elliptic operators. Egorov an amplitude function e(t, x, <)E&?’ ([0, 7’1;
(Math. USSR-Sb., 11 (1970)) applied his theo- SF,,) such that the solution u(t) is found in
rem and the corollary stated above to the the form u(t) = E,(t)u,. On the other hand,
study of hypoellipticity and local solvability let (x, 5) =(X(t, y, q), z(t, y, a)) be the bichar-
for pseudodifferential operators of principal acteristic strip defined by +Hamilton’s ca-
type. Using Egorov’s theorem and the same nonical equation dxJdt = V,p(t, x, 0, d</dt =
corollary, Nirenberg and Treves [ 161 obtained - V,P(~, x, 5) with (x, 5),,=, = (Y, 4. ‘Then
decisive results concerning local solvability for (X(t, y, q), E(t, y, q)) can be solved by means
linear partial differential operators of principal of the relations y = V,S(t, X, q), E = V,S(t, X, II),
type; these results were completed by Beals as a family of canonical transformations with
and Fefferman [ 171. Hiirmander and J. J. a parameter tc [0, T]. Thus by means of (9)
Duistermaat [9,15] constructed a general we have
global theory of Fourier integral operators
WF(u(O)c {(x, 5) = (XQ, Y, d, W> Y, 4)
making use of Maslov’s theory [14], which
was originally published in 1965. By virtue l(~~d~WF(uo))> (10)
of this research, the Fourier integral oper- which is the fundamental result in the study of
ator has come to be recognized as a powerful the propagation of wave front sets as solutions
tool in the theory of linear partial differential of general hyperbolic equations (-. 325 Partial
operators. An interesting application of the Differential Equations of Hyperbolic Type M).
global theory of Fourier integral operators The works of Egorov, Nirenberg and
appeared in J. Chazarain (Inventiones Math., Treves, and Hörmander motivated the theory
24 (1974)). The boundedness of Fourier inte- of hyperfunctions developed by M. Sato and
gral operators in the spaces L*(R”) (or the gave rise to the concept of tquantized contact
space H”) has been studied in several cases. transformations, which correspond to Fourier
Some sufficient conditions for boundedness integral operators in the theory of distribu-
have been obtained by Eskin (Math. USSR- tions. The above-stated transformation theo-
Sb., 3 (1967)), Hörmander [9], D. Fujiwara rem of Egorov has been studied in detail with
[18], Kumano-go (Comm. Partial Diff. Eq., 1 reference to systems of pseudodifferential
(1976)), K. Asada and Fujiwara (Japan. J. equations with analytic coefficients [ 19](-
Math., 4 (1978)), and others. A calculus of 274 Microlocal Analysis).
Fourier integral operators in R” was given in
Kumano-go [4].
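
The bicharacteristic strip $(X(t, y, \eta), \Xi(t, y, \eta))$ defined by Hamilton's canonical equations in this section can be traced numerically. The following is a minimal sketch, not part of the original article: it integrates $dx/dt = \nabla_\xi p$, $d\xi/dt = -\nabla_x p$ with a classical Runge-Kutta step. The function names and the sample symbol $p(t, x, \xi) = |\xi|$ are illustrative assumptions.

```python
import math

def bicharacteristic(grad_xi_p, grad_x_p, y, eta, t_end, steps=200):
    """Integrate dx/dt = grad_xi p, dxi/dt = -grad_x p from (y, eta) by RK4.

    grad_xi_p and grad_x_p are user-supplied callables (t, x, xi) -> list;
    they are hypothetical helpers, not notation from the text.
    """
    n = len(y)
    state = list(y) + list(eta)      # (x, xi) packed into one list
    h = t_end / steps

    def rhs(t, s):
        x, xi = s[:n], s[n:]
        dx = list(grad_xi_p(t, x, xi))
        dxi = [-g for g in grad_x_p(t, x, xi)]
        return dx + dxi

    t = 0.0
    for _ in range(steps):
        k1 = rhs(t, state)
        k2 = rhs(t + h / 2, [s + h / 2 * k for s, k in zip(state, k1)])
        k3 = rhs(t + h / 2, [s + h / 2 * k for s, k in zip(state, k2)])
        k4 = rhs(t + h, [s + h * k for s, k in zip(state, k3)])
        state = [s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
        t += h
    return state[:n], state[n:]

# Illustrative symbol p(t, x, xi) = |xi| (an assumption chosen for simplicity).
def grad_xi_p(t, x, xi):
    norm = math.sqrt(sum(c * c for c in xi))
    return [c / norm for c in xi]

def grad_x_p(t, x, xi):
    return [0.0 for _ in x]

X, Xi = bicharacteristic(grad_xi_p, grad_x_p,
                         y=[0.0, 0.0], eta=[3.0, 4.0], t_end=2.0)
```

For this sample symbol the strip is a straight ray: $\Xi$ stays equal to $\eta$ and $X$ moves from $y$ with unit speed in the direction $\eta/|\eta|$, which is the path along which (10) says the wave front set can travel.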
The propagation of wave front sets by means of a Fourier integral operator is described as follows. Let us consider a phase function of the form $\varphi(x, \xi, y) = S(x, \xi) - y \cdot \xi$ in $\mathbf{R}^n_x \times \mathbf{R}^n_\xi \times \mathbf{R}^n_y$, and let $a(x, \xi)$ be an amplitude

References

[1] J. J. Kohn and L. Nirenberg, An algebra of pseudo-differential operators, Comm. Pure Appl. Math., 18 (1965), 269-305.
1285 346 C
Psychometrics

[2] L. Hörmander, Pseudo-differential operators, Comm. Pure Appl. Math., 18 (1965), 501-517.
[3] L. Hörmander, Pseudo-differential operators and hypoelliptic equations, Amer. Math. Soc. Proc. Symp. Pure Math., 10 (1967), 138-183.
[4] H. Kumano-go, Pseudo-differential operators, MIT Press, 1981. (Original in Japanese, 1974.)
[5] A. P. Calderón and R. Vaillancourt, A class of bounded pseudo-differential operators, Proc. Nat. Acad. Sci. US, 69 (1972), 1185-1187.
[6] L. Hörmander, Pseudo-differential operators and non-elliptic boundary problems, Ann. Math., 83 (1966), 129-209.
[7] P. D. Lax and L. Nirenberg, On stability for difference schemes, a sharp form of Gårding's inequality, Comm. Pure Appl. Math., 19 (1966), 473-492.
[8] L. Hörmander, The Weyl calculus of pseudo-differential operators, Comm. Pure Appl. Math., 32 (1979), 359-443.
[9] L. Hörmander, Fourier integral operators I, Acta Math., 128 (1971), 79-183.
[10] J. J. Duistermaat, Fourier integral operators, Lecture notes, Courant Institute, 1973.
[11] Yu. V. Egorov, On canonical transformations of pseudo-differential operators (in Russian), Uspekhi Mat. Nauk, 25 (1969), 235-236.
[12] P. D. Lax, Asymptotic solutions of oscillatory initial value problems, Duke Math. J., 24 (1957), 627-646.
[13] D. Ludwig, Exact and asymptotic solutions of the Cauchy problem, Comm. Pure Appl. Math., 13 (1960), 473-508.
[14] V. P. Maslov, Théorie des perturbations et méthodes asymptotiques, Gauthier-Villars, 1972.
[15] J. J. Duistermaat and L. Hörmander, Fourier integral operators II, Acta Math., 128 (1972), 183-269.
[16] L. Nirenberg and F. Treves, On local solvability of linear partial differential equations I, II, Comm. Pure Appl. Math., 23 (1970), 1-38, 459-510.
[17] R. Beals and C. Fefferman, On local solvability of linear partial differential equations, Ann. Math., (2) 97 (1973), 482-498.
[18] D. Fujiwara, On the boundedness of integral transformations with highly oscillatory kernels, Proc. Japan Acad., 51 (1975), 96-99.

346 (XVIII.17)
Psychometrics

A. General Remarks

Psychometrics is a collection of methods for drawing statistical conclusions from various psychological phenomena which are expressed numerically or quantitatively. It consists chiefly of statistical methods to deal with psychological measurements and of theories dealing with mathematical models concerning learning processes, social attitudes, and mental abilities.

B. Sensory Tests

A measurement wherein human senses are taken as the gauge is called a sensory test. The panel of judges must be composed appropriately, and the examining circumstances must be controlled. Various methods of psychological measurements are applied. In the following sections we describe the basic statistical procedures used in sensory testing.

C. Paired Comparison

When there are t objects (treatments or stimuli in some cases) $O_1, O_2, \ldots, O_t$, the method of comparing them two at a time in every possible way is called paired comparison. The following are typical mathematical models of this method.

(1) Thurstone-Mosteller Model. Suppose that the probability that $O_i$ is preferred to $O_j$ for a pair $(O_i, O_j)$ is $p_{ij}$. Of the n judges who compare this pair, the number who prefer $O_i$ is $n_{ij}$, and the number who prefer $O_j$ is $n_{ji} = n - n_{ij}$. In this comparison it is assumed that the strengths of the stimuli $O_i$, $O_j$ to the senses are random variables $X_i$, $X_j$, and $O_i$ is preferred when $X_i > X_j$. Furthermore, it is assumed that the joint probability distribution of $X_i$ and $X_j$ is the 2-dimensional †normal distribution with $\mu_i$ and $\sigma^2$ as mean and variance of $X_i$, and $\rho$ as correlation coefficient of $X_i$ and $X_j$. There is no loss of generality in assuming that $2\sigma^2(1 - \rho) = 1$ and $\sum_i \mu_i = 0$. Let $\Phi(x)$ be the standardized normal distribution function and $p_{ij} = \Phi(\mu_i - \mu_j)$. Using $p'_{ij} = n_{ij}/n$ as estimates of the true $p_{ij}$, we can obtain the estimates $\hat\mu_i$. Using $p''_{ij} = \Phi(\hat\mu_i - \hat\mu_j)$ and $p'_{ij}$ we can test the hypothesis that $\mu_1 = \mu_2 = \cdots = \mu_t$.
[19] H. Komatsu (ed.), Hyperfunctions and
pseudo-differential equations, Lecture notes in (2) The Bradley-Terry Model. The experi-
math. 287, Springer, 1973. mental method in the Bradley-Terry model
is the same as in the Thurstone-Mosteller model. It is postulated that, associated with $O_1, O_2, \dots, O_t$, there exist parameters $\pi_i$ for $O_i$ ($\pi_i > 0$, $\sum_i \pi_i = 1$) such that $p_{ij} = \pi_i/(\pi_i + \pi_j)$. Obtaining the †maximum likelihood estimator of $\pi_i$, we can test the appropriateness of the models.

(3) Scheffé's Model. Each pair $O_i$, $O_j$ is presented to $2n$ judges; $n$ of them examine $O_i$ first and $O_j$ next, and the remaining $n$ examine the pair in the opposite order. A judgment is recorded on a 7-point (or 9-, 5-, or 3-point) scale. In the 7-point scaling system a judge presented with the ordered pair $(O_i, O_j)$ marks one of the seven points 3, 2, 1, 0, -1, -2, -3, meaning, respectively, that $O_i$ is strongly preferable to $O_j$, $O_i$ is moderately preferable to $O_j$, $O_i$ is slightly preferable to $O_j$, no preference, $O_j$ is slightly preferable to $O_i$, etc. The mark given by the $k$th judge on his preference of $O_i$ to $O_j$ is denoted by $X_{ijk}$, which can be regarded as the sum of a main effect, deviation of subtractivity, order effect, and error. Significance tests for these effects and estimates of various parameters are given by using statistical †linear models. †BIBD, †PBIBD, etc., can also be applied to paired comparisons.

D. The Pair Test, Triangle Test, and Duo-Trio Test

The pair test, triangle test, and duo-trio test are sensory difference tests. The methods are as follows. Pair test: A judge is requested to designate a preference between the paired samples A and B. Triangle test: A judge is requested to select two samples of the same kind out of A, A, B. Duo-trio test: A judge is first acquainted with a sample A and then is requested to choose from A and B the one he has seen in the previous step. In all the above cases, the hypothesis that A and B are different and that the judge has no ability to determine the difference between them is tested by using the †binomial distribution.

E. Scaling

(1) One-Dimensional Case. Psychometric scaling methods are procedures for constructing scales for psychological phenomena. Some of them require judgments concerning a particular attitude that is considered unidimensional. Under the assumption that a psychological phenomenon is a random variable with some distribution law and the parameters of the distribution law determine psychological scales, a psychological scaling is given by estimating the parameters. The Thurstone-Mosteller model is a method for scaling a set of stimuli by means of observable proportions.

(2) Multidimensional Scaling (MDS). Multidimensional scaling is a collection of methods to deal with data consisting of many measurements on many objects and to characterize the mutual distance (dissimilarity), or closeness (affinity), by representing those objects by a small number of indices or by points in a small-dimensional Euclidean space. It has seen useful applications in the analysis of people's attitude and perception and their characterizations by means of a few numbers or points in a space of low dimension.
Historically, MDS was first developed by Torgerson (1958) and refined further by Shepard (1962) and Kruskal (1964). The method developed by Torgerson and also the INDSCAL method by Carroll and Chang (1970) are called metric multidimensional scaling, while the method by Shepard and Kruskal is called the nonmetric MDS. The former is applied when the data are represented in continuous scales and the latter when the data are in discrete nominal or ordinal scales. Techniques of multidimensional scaling are closely related and sometimes actually equivalent to various methods of multivariate analysis, especially principal component analysis, canonical correlation analysis, and discriminant analysis (→ 280 Multivariate Analysis).

F. Factor Analysis

Though factor analysis can be considered to be a method to deal with multivariate data in general (→ 280 Multivariate Analysis), it has had close connections with psychometric studies, in both theoretical developments and applications. Historically, it was initiated by Spearman (1927) and developed further by Thurstone (1945) in order to measure human abilities from test scores. Mathematically, the model of factor analysis is formulated as follows: Let $z_{jk}$ be the standardized score of the $j$th test achieved by the $k$th subject, $j = 1, \dots, p$; $k = 1, \dots, N$; then it is assumed that it can be represented as a linear combination of $r$ common factors and one specific factor as

$$z_{jk} = a_{j1} f_{1k} + a_{j2} f_{2k} + \dots + a_{jr} f_{rk} + u_j v_{jk}, \qquad (1)$$

where $f_{ik}$, $i = 1, \dots, r$, $k = 1, \dots, N$, represents the magnitude of the $i$th common factor (ability) in the $k$th subject and $a_{ji}$ is the size of contribution of the $i$th factor to the score of the $j$th test.
Usually it is assumed that (i) $V(z_j) = 1$, (ii) $V(f_i) = 1$ and $\operatorname{Cov}(f_i, f_{i'}) = 0$ for $i \neq i'$, (iii) $V(v_j) = 1$ and $\operatorname{Cov}(v_j, v_{j'}) = 0$, $j \neq j'$, (iv) $\operatorname{Cov}(f_i, v_j) = 0$.
Then it follows that

$$\operatorname{Cov}(z_j, z_{j'}) = \sum_{i=1}^{r} a_{ji} a_{j'i} \quad (j \neq j')$$

and

$$V(z_j) = \sum_{i=1}^{r} a_{ji}^2 + u_j^2 = h_j^2 + u_j^2 = 1,$$

where $h_j^2$ is called the communality of the $j$th variable $z_j$ and $u_j^2$ is called the specificity. It follows easily from (1) that any orthogonal transformation of the scores does not affect the model.
Problems of factor analysis are classified into three types:
(1) Estimation of communality: There are several methods of determining communality or initial estimates of it when some iterative procedure is used.
(2) Determination of factor loadings, which is the estimation of the $a_{ji}$: A number of methods have been proposed, among which those often used are the MINRES by Harman (1967), the varimax by Kaiser (1958), and the †maximum likelihood based on the normal model by Lawley and Maxwell.
(3) Estimation of factor scores $f_{ik}$: Usually factor scores are estimated after factor loadings have been determined. Thurstone proposed $\hat F = Z R^{-1} A$ and Harman $\hat F = Z A (A'A)^{-1}$, where $F$, $Z$, $A$ are the matrices of factor scores, test scores, factor loadings, respectively, and $R$ is the correlation matrix of the $z$'s.

G. Learning Theory

(1) General Description. Assume that a sequence of trials is done in order to study some given behavior and that on each trial particular events occur (stimuli, responses, reinforcements, etc.) that influence the ensuing behavior. Then the behavior itself is modified by such a sequence of trials. Learning models refer to such processes of behavior modification, and they are frequently represented by recursive formulas for response probabilities.
Assume that two mutually exclusive response alternatives $A_1$ and $A_2$ occur on the $n$th trial ($n = 1, 2, \dots$) with respective probabilities $P_n$ and $1 - P_n$, and that an event $E_i$ occurs on the $n$th trial with $\Pr(\mathcal{E}_n = E_i) = \pi_i$ ($i = 1, 2, \dots, t$; $\sum_{i=1}^{t} \pi_i = 1$). Then the recursive formula for $P_n$ is of the form $P_{n+1} = f(P_n; \mathcal{E}_n, \mathcal{E}_{n-1}, \dots, \mathcal{E}_1)$. If the formula can be written as $f = f(P_n; \mathcal{E}_n, \mathcal{E}_{n-1}, \dots, \mathcal{E}_{n-d})$, then the response probability is called $d$-trial path dependent. In the special case $d = 0$, it is called path independent. For simplicity we write $f(P_n; \mathcal{E}_n = E_i, \mathcal{E}_{n-1} = E_j, \dots, \mathcal{E}_1 = E_k) = f_{ij \dots k}(P_n)$ ($i, j, \dots, k = 1, 2, \dots, t$). If the response probability is path independent, then $f_{ij \dots k}(P_n) = f_i f_j \cdots f_k(P_1)$, where $f_\nu(P_n) = f(P_n; \mathcal{E}_n = E_\nu)$ ($\nu = i, j, \dots, k$). When the recursive formula can be expressed as $f = f(P_n; \mathcal{E}_n, n)$, the response probability is said to be quasi-independent of path [11]. In the recursive formula $f$, two events $E_i$ and $E_j$ ($i \neq j$) are said to be commutative if $f_{\dots i \dots j \dots}(P_n) = f_{\dots j \dots i \dots}(P_n)$. If any two events are commutative, the condition of event commutativity is satisfied. By making $f$ explicit with respect to $n$, we write $P_n = F(n; \mathcal{E}_{n-1}, \dots, \mathcal{E}_1; P_1)$. Under the condition of event commutativity, the explicit formula can be written $P_n = F(N_1, N_2, \dots, N_t; P_1)$, where $N_i$ is the frequency of occurrence of $E_i$ in the first $(n - 1)$ trials ($\sum_{i=1}^{t} N_i = n - 1$). If both event commutativity and path independence of response probability are satisfied, the explicit formula $P_n = f_1^{N_1} f_2^{N_2} \cdots f_t^{N_t}(P_1)$ can be obtained.

(2) Linear Models. In a linear model, the recursive formula is written as a linear function of $P_n$.
Example (1). Bush-Mosteller model [6]. The Bush-Mosteller model assumes the response probability to be path independent. The recursive formula is expressed as $f_i(P_n) = \alpha_i P_n + (1 - \alpha_i)\lambda_i$, $\mathcal{E}_n = E_i$. Here $\alpha_i$ ($0 \le \alpha_i \le 1$) represents the degree of ineffectiveness of $E_i$ for learning and $\lambda_i$ ($0 \le \lambda_i \le 1$) is the †fixed point of $f_i$. A necessary and sufficient condition for $E_i$ and $E_j$ ($i \neq j$) to be commutative is that either $f_i$ or $f_j$ be an †identity operator or $\lambda_i = \lambda_j$.
Example (2). Estes's stimulus-sampling model [7]. We can consider the stimulus as a set composed of $m$ elements, each of which corresponds to either response $A_1$ or $A_2$; the manner of their correspondences depends on each trial. If $J_n$ elements correspond to $A_1$ on the $n$th trial, then we have $P_n = J_n/m$. Suppose that on the $n$th trial $s$ ($\le m$) elements are sampled, among which $X_n$ elements correspond to $A_1$, and the remaining $X'_n$ ($= s - X_n$) elements correspond to $A_2$. As a result of the $n$th trial, if $A_1$ is reinforced, we set $Y_n = 1$; otherwise, we set $Y_n = 0$. Furthermore, assume that $J_{n+1} = J_n + X'_n Y_n - X_n(1 - Y_n)$. Hence, we obtain the recursive formula $P_{n+1} = P_n + \{X'_n Y_n - X_n(1 - Y_n)\}/m$. In this model, the response probability is path independent and $\mathcal{E}_n = (X_n, Y_n)$. Other linear models have been proposed in which the response probability is either quasi-independent of path [8] or path dependent [10].

(3) Nonlinear Models. In nonlinear models the recursive formula cannot be written as a linear function of $P_n$.
Example (3). Luce's $\beta$-model [9]. Let the response strengths of $A_1$ and $A_2$ on the $n$th trial be $v_n$ and $v'_n$, respectively (both positive), and assume that $P_n$, the response probability of $A_1$, is expressed as $v_n/(v_n + v'_n)$. The response
strengths $v_n$ and $v'_n$ depend on each trial. Under the assumption that the response strength is path independent and that $v_{n+1}$ changes independently from $v'_n$, the recursive formula of $v_n$ is written as $v_{n+1} = \varphi_i(v_n)$, $\mathcal{E}_n = E_i$. Here, if we assume $\varphi_i(v) > 0$ for $v > 0$ and $\varphi_i(cv) = c\varphi_i(v)$ for $v > 0$ and $c > 0$, then $\varphi_i(v_n) = \beta_i v_n$ with $\beta_i > 0$. In a similar way, the recursive formula for $v'_n$ can be expressed as $\varphi'_i(v'_n) = \beta'_i v'_n$ ($\beta'_i > 0$). Therefore we have $P_{n+1} = P_n/\{P_n + b_i(1 - P_n)\}$ ($b_i = \beta'_i/\beta_i$), $\mathcal{E}_n = E_i$. This model is nonlinear, and the response probability is path independent. By making the recursive formula explicit, we obtain $P_n = P_1/\{P_1 + (1 - P_1)\exp(\sum_{i=1}^{t} N_i \log b_i)\}$. Hence it is clear that the events are commutative. Other nonlinear models in which the response probability is either quasi-independent of path [8] or path dependent [9] have also been proposed.
Here we have taken up only the case in which the number of response alternatives is 2, but we can generalize to the case of more than two alternatives. For fitting a model and experimental data, expected response probabilities and various other statistics deduced from the model (total error, trial number of first success or last error, and sequential statistics such as length of response †run or †autocorrelation between responses) are used. Estimation methods have also been devised for the parameters involved.

References

For sensory tests,
[1] H. A. David, The method of paired comparisons, Hafner, 1963.
[2] J. P. Guilford, Psychometric methods, McGraw-Hill, second edition, 1954.
[3] M. G. Kendall, Rank correlation methods, Griffin, third edition, 1962.
[4] W. S. Torgerson, Theory and method of scaling, Wiley, 1958.
For learning theory,
[5] R. J. Audley and A. R. Jonckheere, The statistical analysis of the learning process, Brit. J. Statist. Psychol., 9 (1956), 87-94.
[6] R. R. Bush and F. Mosteller, Stochastic models for learning, Wiley, 1955.
[7] W. K. Estes, Component and pattern models with Markovian interpretations, Studies in Mathematical Learning Theory, R. R. Bush and W. K. Estes (eds.), Stanford Univ. Press, 1959, 9-52.
[8] M. I. Hanania, A generalization of the Bush-Mosteller model with some significance tests, Psychometrika, 24 (1959), 53-68.
[9] R. D. Luce, Individual choice behavior: A theoretical analysis, Wiley, 1959.
[10] S. H. Sternberg, A path-dependent linear model, Studies in Mathematical Learning Theory, R. R. Bush and W. K. Estes (eds.), Stanford Univ. Press, 1959, 308-339.
[11] S. H. Sternberg, Stochastic learning theory, Handbook of Mathematical Psychology II, R. D. Luce, R. R. Bush, and E. Galanter (eds.), Wiley, 1963, 1-120.
For MDS,
[12] J. B. Kruskal, Nonmetric multidimensional scaling: A numerical method, Psychometrika, 29 (1964), 115-129.
[13] R. N. Shepard, The analysis of proximities: Multidimensional scaling with an unknown distance function, Psychometrika, 27 (1962), 125-219.
[14] L. Guttman, Multiple rectilinear prediction and the resolution into components, Psychometrika, 5 (1940), 75-99.
[15] J. D. Carroll and J. J. Chang, Analysis of individual difference in multidimensional scaling via an N-way generalization of Eckart-Young decomposition, Psychometrika, 35 (1970), 283-319.
For factor analysis,
[16] C. Spearman, The abilities of man, Macmillan, 1927.
[17] H. H. Harman, Modern factor analysis, Univ. of Chicago Press, 1967.
[18] H. F. Kaiser, The varimax criterion for analytic rotation in factor analysis, Psychometrika, 23 (1958), 187-240.
[19] D. N. Lawley and A. E. Maxwell, Factor analysis as a statistical method, Butterworth, 1963.
[20] K. Takeuchi, H. Yanai, and B. N. Mukherjee, Foundations of multivariate analysis, Wiley, 1982.
347 (V.12)
Quadratic Fields

A. General Remarks

Any †extension field of the rational number field $\mathbf{Q}$ of degree 2 is called a quadratic field. Any quadratic field $k$ is obtained from $\mathbf{Q}$ by adjoining a square root of a square-free integer (i.e., an integer $\neq 0, 1$ with no square factor) $m$: $k = \mathbf{Q}(\sqrt{m})$. If $m$ is positive (negative), $k$ is called a real (imaginary or complex) quadratic field (→ 14 Algebraic Number Fields). Let

$$\omega = \begin{cases} (1 + \sqrt{m})/2 & \text{for } m \equiv 1 \pmod 4, \\ \sqrt{m} & \text{for } m \equiv 2, 3 \pmod 4. \end{cases}$$

Then $(1, \omega)$ is a †minimal basis of $k$. That is, any †algebraic integer $x$ of $k$ has the unique expression $x = a + b\omega$ with $a, b \in \mathbf{Z}$. The †discriminant $d$ of $k$ is given by $d = m$ in case $m \equiv 1 \pmod 4$ and $d = 4m$ in case $m \equiv 2, 3 \pmod 4$. The conjugate element of an element $\alpha = a + b\sqrt{m}$ ($a, b \in \mathbf{Q}$) of $k$ over $\mathbf{Q}$ is given by $\alpha' = a - b\sqrt{m}$. The mapping $\sigma: \alpha \to \alpha'$ is an †automorphism of the field $k$.

B. Units

Let $k$ be an imaginary quadratic field. The †units of $k$ are $\pm 1, \pm i$ in case $m = -1$; $\pm 1, \pm\omega, \pm\omega^2$ ($\omega = (1 + \sqrt{-3})/2$) in case $m = -3$; and $\pm 1$ in all other cases.
Let $k$ be a real quadratic field. There exists a unit $\varepsilon_0$ that is the smallest one among the units ($> 1$) of $k$. Any unit $\varepsilon$ of $k$ can be uniquely expressed in the form $\varepsilon = \pm\varepsilon_0^n$ ($n \in \mathbf{Z}$). That is, $\pm\varepsilon_0^{\pm 1}$ is a †fundamental unit of $k$. The fundamental unit $\varepsilon_0 = (x + y\sqrt{d})/2$ ($> 1$) can be calculated by finding a minimal positive integral solution $(x, y)$ of †Pell's equation $x^2 - dy^2 = \pm 4$ by using continued fractions (→ 83 Continued Fractions; for a table of the fundamental unit of $k$ for $m < 100$ → [1]).

C. Prime Ideals

The decomposition of a prime number $p$ in the †principal order $\mathfrak{o}$ of $k$ is given as follows: (i) Let $p \mid d$, where $d$ is the discriminant of $k$. Then $p$ is decomposed in $\mathfrak{o}$ in the form $(p) = \mathfrak{p}^2$. (ii) Let $p \neq 2$ and $(p, m) = 1$. If $(m/p) = 1$, then $(p) = \mathfrak{p}\mathfrak{p}'$ in $\mathfrak{o}$ ($\mathfrak{p} \neq \mathfrak{p}'$) and $N(\mathfrak{p}) = N(\mathfrak{p}') = p$. If $(m/p) = -1$, then $(p) = \mathfrak{p}$ in $\mathfrak{o}$ and $N(\mathfrak{p}) = p^2$. Here $\mathfrak{p}$, $\mathfrak{p}'$ are prime ideals of $\mathfrak{o}$, $N$ means the †norm, and $(m/p)$ is the †Legendre symbol. (iii) Let $2 \nmid d$, that is, $m \equiv 1 \pmod 4$. If $m \equiv 1 \pmod 8$, then $(2) = \mathfrak{p}\mathfrak{p}'$ ($\mathfrak{p} \neq \mathfrak{p}'$) and $N(\mathfrak{p}) = N(\mathfrak{p}') = 2$. If $m \equiv 5 \pmod 8$, then $(2) = \mathfrak{p}$ and $N(\mathfrak{p}) = 4$.

D. The Kronecker Symbol

Let $k = \mathbf{Q}(\sqrt{m})$, and let $d$ be the discriminant of $k$. We define the symbol $\chi$ for $k$ as follows: (i) $\chi(p) = 1$ if $(p) = \mathfrak{p}\mathfrak{p}'$ ($\mathfrak{p} \neq \mathfrak{p}'$) in $\mathfrak{o}$; (ii) $\chi(p) = -1$ if $(p) = \mathfrak{p}$ in $\mathfrak{o}$; (iii) $\chi(p) = 0$ if $(p) = \mathfrak{p}^2$ in $\mathfrak{o}$; and (iv) $\chi(n) = \prod_i \chi(p_i)^{e_i}$ for $n = \prod_i p_i^{e_i} > 0$. In particular, we define $\chi(1) = 1$. If $(n, d) = 1$, the symbol $\chi$ can also be defined using the †Jacobi symbol as follows: If $m \equiv 1 \pmod 4$, then $\chi(n) = (n/|m|)$; and if $m \equiv 2, 3 \pmod 4$, then $\chi(n) = \chi^*(n)(n/|m'|)$ for $d = 2^e m'$, where (i) $\chi^*(n) = (-1)^{(n-1)/2}$ for $e = 2$, $m' \equiv 3 \pmod 4$; (ii) $\chi^*(n) = (-1)^{(n^2-1)/8}$ for $e = 3$, $m' \equiv 1 \pmod 4$; and (iii) $\chi^*(n) = (-1)^{(n^2-1)/8 + (n-1)/2}$ for $e = 3$, $m' \equiv 3 \pmod 4$. If $(n, d) \neq 1$, then $\chi(n) = 0$. For a negative integer $-n$, we define $\chi(-n) = (\operatorname{sgn} d)\chi(n)$. The symbol $\chi(n)$ for $n \in \mathbf{Z}$ is called the Kronecker symbol for $k$.
The Kronecker symbol for $k$ has the following four properties: (1) $\chi(n) = 0$ if $(n, d) \neq 1$, and $\chi(n) = \pm 1$ if $(n, d) = 1$; (2) $\chi(m) = \chi(n)$ if $m \equiv n \pmod d$; (3) $\chi(mn) = \chi(m)\chi(n)$; (4) $\chi(n) = 1$ if and only if $n \equiv N(\mathfrak{a}) \pmod d$ for some integral ideal $\mathfrak{a}$ of $k$ such that $(\mathfrak{a}, (d)) = 1$. (Property (4) shows that a quadratic field provides a class field; → 59 Class Field Theory.)

E. Ideal Classes

The †class number $h$ of $k$ was calculated by P. G. L. †Dirichlet (1840) by analytical methods as follows:

$$h \log \varepsilon_d = -\frac{1}{2} \sum_{r=1}^{d-1} \chi(r) \log \sin\frac{\pi r}{d} \quad \text{for } m > 0,$$

$$h = -\frac{w}{2|d|} \sum_{r=1}^{|d|-1} \chi(r)\, r \quad \text{for } m < 0,$$

where $d$ is the discriminant of $k$, $\varepsilon_d$ is the positive fundamental unit ($> 1$) of $k$, $w$ is the number of roots of unity in $k$, and $\chi$ is the Kronecker symbol.
Denote by $h(d)$ the class number of the imaginary quadratic field with discriminant $d$. It was conjectured from the time of C. F. †Gauss that $h(d) \to \infty$ as $|d| \to \infty$. This conjecture was proved by H. Heilbronn (1934). More precisely, C. L. Siegel (Acta Arith., 1 (1935)) proved

$$\lim_{|d| \to \infty} \frac{\log h(d)}{\log \sqrt{|d|}} = 1.$$

$h(d) = 1$ holds for $|d| = 3, 4, 7, 8, 11, 19, 43, 67, 163$. In 1934, Heilbronn and E. H. Linfoot proved that there can be at most one more such $d$, and finally A. Baker and H. M. Stark independently proved that these nine numbers are the only ones for which $h(d) = 1$ (Baker, Mathematika, 13 (1966); Stark, Michigan Math
J., 14 (1967)). Also, Baker and Stark proved independently (Ann. Math., (2) 94 (1971); Math. Comp., 29 (1975)) that $h(d) = 2$ holds only for $|d| = 15, 20, 24, 35, 40, 51, 52, 88, 91, 115, 123, 148, 187, 232, 235, 267, 403$, and $427$.
For real quadratic fields, we have

$$\lim_{d \to \infty} \frac{\log(h(d) \log \varepsilon_d)}{\log \sqrt{d}} = 1,$$

where $\varepsilon_d$ is the fundamental unit of $k = \mathbf{Q}(\sqrt{d})$ ($\varepsilon_d > 1$). However, it is not yet determined whether there exist infinitely many $d$ with $h(d) = 1$ (→ Appendix B, Table 4).

F. Genera

Let $G$ be the group of all (†fractional) ideals of $k$, and let $H$ be the group of all †principal ideals $(\alpha)$ of $k$ such that $N(\alpha) > 0$. Each coset of $G$ modulo $H$ is called an ideal class in the narrow sense. (This notion is a special case of the notion of ideal classes in the narrow sense of algebraic number fields; → 14 Algebraic Number Fields G.) In the cases (i) $m < 0$ and (ii) $m > 0$, $N(\varepsilon_0) = -1$, the usual classification of ideals and classification of ideals in the narrow sense are identical. When $m > 0$, $N(\varepsilon_0) = 1$, each ideal class is divided into two ideal classes in the narrow sense. We call an ideal class in the narrow sense simply an ideal class.
Let $p_1, \dots, p_t$ be the set of all prime numbers dividing $d$. For $n \in \mathbf{Z}$ with $(n, d) = 1$, we define $\chi_1(n), \dots, \chi_t(n)$ as follows: For $p_i \neq 2$, we define $\chi_i(n) = (n/p_i)$; for $p_i = 2$, we identify $\chi_i$ with $\chi^*$ in the definition of the Kronecker symbol. In order that $\chi_1(n) = \dots = \chi_t(n) = 1$ for $n \in \mathbf{Z}$ with $(n, d) = 1$, it is necessary and sufficient that $n \equiv N(\alpha) \pmod d$ for an integer $\alpha$ of $k$ (where $N(\alpha) = \alpha\alpha'$). Since $\chi_1\chi_2 \cdots \chi_t$ is equal to the Kronecker symbol, it follows that $n \equiv N(\mathfrak{a}) \pmod d$ for an integral ideal $\mathfrak{a}$ of $k$ is a necessary and sufficient condition for $\chi_1(n) \cdots \chi_t(n) = 1$ to hold for $n \in \mathbf{Z}$ with $(n, d) = 1$. Put $\varepsilon_i = \chi_i(N(\mathfrak{a}))$ ($i = 1, \dots, t$), where $\mathfrak{a}$ is an integral ideal with $(\mathfrak{a}, (d)) = 1$. Then $(\varepsilon_1, \dots, \varepsilon_t)$ is uniquely determined for the ideal class $C$ containing $\mathfrak{a}$ and does not depend on the choice of $\mathfrak{a}$. The set $\mathfrak{H}$ of all ideal classes of $k$ such that $(\varepsilon_1, \dots, \varepsilon_t) = (1, \dots, 1)$ is called the principal genus of $k$, and $\mathfrak{H}$ is a subgroup of the ideal class group $\mathfrak{C}$ of $k$. Each coset of $\mathfrak{C}$ modulo $\mathfrak{H}$ is called a genus of $k$. For each genus, the values $\varepsilon_i = \chi_i(N(\mathfrak{a}))$ ($i = 1, \dots, t$) are uniquely determined. We call $(\varepsilon_1, \dots, \varepsilon_t)$ the character system of this genus, and each genus is uniquely determined by its character system. A necessary and sufficient condition for $(\varepsilon_1, \dots, \varepsilon_t)$ to be a character system for some genus is that $\varepsilon_i = \pm 1$ and $\varepsilon_1\varepsilon_2 \cdots \varepsilon_t = 1$. Hence there are $2^{t-1}$ genera of $k$, and $\mathfrak{C}/\mathfrak{H}$ is an Abelian group of type $(2, 2, \dots, 2)$.
In order that an ideal class $C$ belong to the principal genus, it is necessary and sufficient that $C = C_1^{1-\sigma} = C_1^2$ hold for some ideal class $C_1$. From this it follows that there are $t - 1$ †invariants of the ideal class group $\mathfrak{C}$ of $k$ that are powers of 2. Each ideal class $C$ such that $C^\sigma = C$ is called an ambig class of $k$. There are $2^{t-1}$ ambig classes, and they form an Abelian group of type $(2, 2, \dots, 2)$. Each ideal $\mathfrak{a}$ of $k$ with $\mathfrak{a}^\sigma = \mathfrak{a}$ is called an ambig ideal of $k$. Let $(p_i) = \mathfrak{p}_i^2$ ($i = 1, \dots, t$). Then each ambig ideal is uniquely expressed in the form $\mathfrak{a} = \mathfrak{p}_1^{\nu_1} \cdots \mathfrak{p}_t^{\nu_t}(a)$ by some $a \in \mathbf{Q}$ and $\nu_i = 0, 1$. Each ambig class contains exactly two ambig ideals of the form $\mathfrak{p}_1^{\nu_1} \cdots \mathfrak{p}_t^{\nu_t}$. For example, for $k = \mathbf{Q}(\sqrt{-65})$ we have $d = -2^2 \cdot 5 \cdot 13$, $t = 3$, $h = 8$, $\mathfrak{C} = \{E, A, A^2, A^3, B, AB, A^2B, A^3B\}$, where $A^4 = E$, $B^2 = E$, $A^\sigma = A^3$, and $B^\sigma = B$; the principal genus is $\mathfrak{H} = \{E, A^2\}$, and the ambig classes are $\{E, A^2, B, A^2B\}$.

G. Norm Residues

A quadratic field $k$ is the †class field over $\mathbf{Q}$ for an ideal group $H$. The †conductor of $H$ is said to be the conductor of $k/\mathbf{Q}$ or simply the conductor of $k$. The conductor $f = \prod_p f_p$ of $k = \mathbf{Q}(\sqrt{m})$ is given by $f = d$ for $m > 0$ and $f = d\,p_\infty$ for $m < 0$. That is, the $p$-conductor $f_p = p$ for $p \mid d$, $p \neq 2$; and $f_2 = 2^e$ for $2 \mid d$, $d = 2^e m'$ ($(2, m') = 1$). By means of the †Hilbert norm-residue symbol, the Kronecker symbol $\chi$ is expressed by

$$\chi(a) = \begin{cases} \displaystyle\prod_{p \mid d} \left(\frac{a, d}{p}\right) & \text{for } (a, d) = 1, \\ 0 & \text{for } (a, d) \neq 1. \end{cases}$$

H. History

The arithmetic of quadratic fields was originally developed in terms of the theory of binary quadratic forms with rational integral coefficients by Gauss and Dirichlet [2]. The theory was then translated into the terms of ideal theory by J. W. R. Dedekind [2] (→ 348 Quadratic Forms M). For example, the theory of genera for quadratic fields explained in Section F was first developed by Gauss in terms of binary quadratic forms, and the class number formula was obtained by Dirichlet as a formula for binary quadratic forms. Hilbert [4, ch. 2] developed the arithmetic of quadratic fields systematically by introducing the Hilbert norm-residue symbol (→ [1, 5, 6]). Later the arithmetic of quadratic fields assumed the aspect of a simple example of class field theory (→ 59 Class Field Theory).
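
The definition of the Kronecker symbol in Section D and Dirichlet's formula for imaginary quadratic fields in Section E can be checked numerically. The sketch below assumes that $d < 0$ is the discriminant of an imaginary quadratic field and uses the formula in the form $h = -\frac{w}{2|d|}\sum_{r=1}^{|d|-1}\chi(r)\,r$, with $\chi(p)$ computed from the decomposition law of Section C ($(d/p)$ for odd $p$, and $d \bmod 8$ for $p = 2$). The function names `legendre`, `kronecker`, and `class_number_imaginary` are hypothetical and chosen only for this illustration.

```python
from math import gcd


def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def kronecker(d, n):
    """chi(n) for the field of discriminant d and an integer n >= 1.

    chi is built multiplicatively from chi(p): +1 if p splits, -1 if p is
    inert, and 0 if p divides d (p ramifies).
    """
    if gcd(d, n) != 1:
        return 0
    result = 1
    p = 2
    while n > 1:
        if n % p == 0:          # p is prime here: smaller factors were removed
            e = 0
            while n % p == 0:
                n //= p
                e += 1
            if p == 2:
                # d is odd in this branch, so d = m = 1 (mod 4)
                chi_p = 1 if d % 8 == 1 else -1
            else:
                chi_p = legendre(d, p)
            result *= chi_p ** e
        p += 1
    return result


def class_number_imaginary(d):
    """Class number h(d) for a discriminant d < 0 via Dirichlet's formula."""
    w = {-3: 6, -4: 4}.get(d, 2)            # number of roots of unity in k
    s = sum(kronecker(d, r) * r for r in range(1, -d))
    h, rem = divmod(-w * s, 2 * (-d))
    if rem != 0:
        raise ValueError("d does not look like a field discriminant")
    return h


for d in (-3, -4, -15, -23, -163, -260):
    print(d, class_number_imaginary(d))
```

The values for $d = -163$, $d = -15$, and $d = -260$ (the field $\mathbf{Q}(\sqrt{-65})$) can be compared with the class numbers 1, 2, and 8 quoted in Sections E and F.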
References

[1] Z. I. Borevich and I. R. Shafarevich, Number theory, Academic Press, 1966. (Original in Russian, 1964.)
[2] P. G. L. Dirichlet, Vorlesungen über Zahlentheorie, herausgegeben und mit Zusätzen versehen von R. Dedekind, Vieweg, fourth edition, 1894 (Chelsea, 1969).
[3] H. Hasse, Vorlesungen über Zahlentheorie, Springer, 1950.
[4] D. Hilbert, Die Theorie der algebraischen Zahlkörper, Jber. Deutsch. Math. Verein. 4, I-XVIII (1897), 175-546. (Gesammelte Abhandlungen I, Springer, 1932, 63-363; Chelsea, 1967.)
[5] B. W. Jones, The arithmetic theory of quadratic forms, Math. Assoc. of America and Wiley, 1950.
[6] E. G. H. Landau, Vorlesungen über Zahlentheorie I, II, III, Hirzel, 1927 (Chelsea, 1969).

348 (III.15)
Quadratic Forms

A. General Remarks

A quadratic form $Q$ is a quadratic homogeneous polynomial with coefficients in a †field $K$, written

$$Q(x_1, \dots, x_n) = \sum_{1 \le i \le k \le n} c_{ik} x_i x_k.$$

If the coefficients $c_{ik}$ belong to the field of real (complex) numbers, we call $Q$ a real (complex) quadratic form. Let $V$ be an $n$-dimensional vector space over $K$. For a vector $x$ in $V$ whose coordinates are $x_1, \dots, x_n$, we put $Q(x) = Q(x_1, \dots, x_n)$. This gives rise to a mapping $x \to Q(x)$ of $V$ into $K$. Such a mapping satisfies the following two conditions: (i) $Q(ax) = a^2 Q(x)$ ($a \in K$); and (ii) $Q(x + y) - Q(x) - Q(y) = B(x, y)$ is a †symmetric bilinear form on $V$. Conversely, if a mapping $Q: V \to K$ satisfies these two conditions, then $Q$ must come from a quadratic form (→ 256 Linear Spaces). $B(x, y)$ is called the symmetric bilinear form associated with $Q$.
We assume that the †characteristic of $K$ is not 2. Putting $a_{ik} = a_{ki} = c_{ik}/2$ ($i < k$), $a_{ii} = c_{ii}$ ($i = 1, \dots, n$), we have $Q(x) = \sum_{i,k=1}^{n} a_{ik} x_i x_k$. The matrix $A = (a_{ik})$ is the matrix of the quadratic form $Q$, and the determinant $|A|$ is the discriminant of $Q$, denoted by $\Delta(Q)$. (Sometimes, instead of $|A|$, we call $(-1)^{n(n-1)/2} 2^n |A|$ the discriminant of $Q$.) The rank of the matrix $A$ is called the rank of $Q$. Using the notation for the †inner product of vectors, we can write $Q(x) = (x, Ax) = {}^t x A x$; $2^{-1}B(x, y) = (x, Ay) = {}^t x A y$. We say that $Q$ is nondegenerate if $B(x, y)$ is nondegenerate (i.e., if $|A| \neq 0$).
Consider a linear substitution $x_i = \sum_{j=1}^{m} p_{ij} x'_j$ (i.e., $x = Px'$, with an $n \times m$ matrix $P$). Then we get a new quadratic form $Q'(x')$ with the matrix ${}^t P A P$. If each $p_{ij}$ belongs to the field $K$ (to a subring $R$ of $K$ that contains the unit element of $K$), we say that $Q$ represents $Q'$ over $K$ (resp. $R$). A basic problem in the theory of quadratic forms is to determine the exact conditions under which a given quadratic form $Q$ represents another quadratic form $Q'$. The problem of representing numbers by a quadratic form (representation problem) is the particular case corresponding to $m = 1$. Any quadratic form $Q$ represents 0 by taking the zero matrix as $P$. Hence, by the expression "$Q$ represents 0 over $K$," we usually mean the nontrivial representation of 0 over $K$ by $Q$, i.e., $Q(x) = 0$ for some nonzero vector $x$. If $Q$ is nondegenerate and represents 0, then it represents any element of $K^*$. Given an element $\rho$ in $K^*$, we consider the quadratic form $Q'$ defined by $Q'(x_1, \dots, x_{n+1}) = Q(x) - \rho x_{n+1}^2$. Then $Q$ represents $\rho$ if and only if $Q'$ represents 0.
Another important special case is that of $n = m$. Then the discriminant of $Q'$ is given by $|P|^2 |A|$. In particular, if $|P| \neq 0$ ($|P|$ is an invertible element of $R$), we say that $Q$ is equivalent to $Q'$ over $K$ (resp. $R$). This gives rise to an equivalence relation. Equivalent forms have the same rank. On the other hand, if the rank of $Q$ is $r$, then $Q$ is equivalent to a form $\sum_{i=1}^{r} a_i x_i^2$ over $K$ ($a_i \neq 0$, $i = 1, \dots, r$). Generally, for elements $a$ and $b$ in $K^* = K - \{0\}$, we write $a \sim b$ if $a \cdot b^{-1} \in (K^*)^2$. Then if $Q$ is equivalent to $Q'$, we have $\Delta(Q) \sim \Delta(Q')$.
When we specify a field $K$, we assume that the coefficients of the quadratic forms and the coordinates of linear transformations are all contained in the field $K$. In particular, the equivalence of the forms is equivalence over $K$.

B. Complex Quadratic Forms

If $K$ is the field of complex numbers $\mathbf{C}$, then a form of rank $r$ is equivalent to the form $\sum_{i=1}^{r} x_i^2$; hence over $\mathbf{C}$ two forms of the same dimension are equivalent if and only if they are of the same rank.

C. Real Quadratic Forms

Now let $K$ be the field of real numbers $\mathbf{R}$. If $Q$ is of rank $r$, then it is equivalent to the form $\sum_{i=1}^{p} x_i^2 - \sum_{j=1}^{q} x_{p+j}^2$ ($p + q = r$). Here, $p$ and $q$ are uniquely determined by $Q$ (Sylvester's law of inertia). We call $(p, q)$ the signature of $Q$. Two quadratic forms of the same dimension
1293 348 E
Quadratic Forms

are equivalent if and only if they have the Q~(x~,...,x,)+Q,(~~+~,...,x.+,).Q,~Q,


same signature. A quadratic form with the is called the direct sum of Q, and Q2. The
signature (n, 0) (resp. (0, n)) is called a positive matrix of Qi 0 Qz is the direct sum of the
(negative) definite quadratic form. Q is called a matrices of Qi and Q2. If Qi and Q; are
definite quadratic form if it is either positive or equivalent and Q1 @ Qz and Q; @ Q; are also
negative definite; otherwise it is an indefinite equivalent, then Qz and Q; are equivalent
quadratic form. Each of the following con- (Witt’s theorem).
ditions is necessary and sufficient for a form Q The quadratic form xi x2 +x3x4 + . . . +
to be positive definite: (i) for any nonzero real x+~x~~ is called the kernel form and is de-
vector x we have Q(x) > 0; (ii) all the tprincipal noted by N,. Any nondegenerate quadratic
minors of the matrix of Q are positive. Q is form Q(x, , . . . ,x,) is equivalent to the direct
negative definite if and only if -Q is positive sum of a kernel form N,(x, , . . , x,,) and a form
definite. A form with n variables is called a Q&,+1. . . . >x,), where if Q,,#O, i.e., n> 2r, we
positive (negative) semidefinite quadratic form have Q,,(x2,+i, . . . . x,)=0 only if xa,+i = . . . =x,
if its signature is (r, 0) (resp. (0, r)), where =O. N, and Q,, are uniquely determined by Q
1 <r<n. up to equivalence. The decomposition N, @ Q.
A linear transformation x’+x = 6%’ that is called the Witt decomposition of Q (E. Witt
leaves invariant the unit form Cy=i XT is an [S]). The number r is called the index of Q. An
orthogonal transformation. Then P is an tor- element x in I’ is said to be singular with re-
thogonal matrix. Any quadratic form can be spect to Q if Q(x) = 0. A subspace IV of V is
transformed into a diagonal form CG1 a,xf via said to be totally singular if all the elements
an orthogonal transformation. Here a,, . . . , a, in IV are singular. Let B be the symmetric
are the teigenvalues of the matrix of the form. bilinear form associated with Q. Then x is
Two forms Q and Q’ are equivalent with re- singular with respect to Q if and only if B(x, x)
spect to an orthogonal transformation if and = 0 (characteristic of K # 2). We say that x is
only if the corresponding matrices have the isotropic if B(x, x) = 0. Thus a subspace W is
same eigenvalues. totally singular if and only if it is totally iso-
tropic (i.e., B(x, y) = 0 for all x, y E W). The
index I of Q is the dimension of a maximal
D. Quadratic Forms over Finite Fields and totally singular subspace of K In particular, if
p-adic Number Fields

Let $Q$ and $Q'$ be nondegenerate quadratic forms over the †finite field $F_q$. They are equivalent if and only if they have the same rank and $\Delta(Q) \sim \Delta(Q')$. Moreover, if the rank of $Q$ is not less than 3, then $Q$ represents 0.
Next, suppose that $Q$ and $Q'$ are nondegenerate quadratic forms over the †p-adic number field $K$. They are equivalent if and only if they have the same rank, $\Delta(Q) \sim \Delta(Q')$, and they have the same Minkowski-Hasse character $\chi$, where $\chi$ is defined as follows: Let $C(Q)$ be the †Clifford algebra of $Q$, and let $C^*(Q)$ denote $C(Q)$ if $n$ is even and $C^+(Q)$ if $n$ is odd. Then $\chi(Q) = 1$ or $-1$ according as $C^*(Q) \cong M_t(K)$ or $M_t(K) \otimes D(K)$. (Here $M_t$ is the total matrix algebra of degree $t$ over $K$ and $D(K)$ is the unique †quaternion algebra over $K$.) Also, if $Q$ has rank not less than 5, then $Q$ represents 0 in $K$.

E. Quadratic Forms over a General Field K

The following facts are valid on any field $K$ whose characteristic is not 2. Given a quadratic form $Q_1$ with variables $x_1, \dots, x_n$ and another form $Q_2$ with variables $x_{n+1}, \dots, x_{n+m}$, we get a new quadratic form $Q_1 \oplus Q_2$ or $Q_1 + Q_2$ defined by $Q_1 \oplus Q_2(x_1, \dots, x_{n+m}) =$

$K = \mathbf{R}$ and $(p, q)$ is the signature of $Q$, then the index $r = \min(p, q)$. Here we must be careful, since some authors call the number $p - q$ or $p$ or $q$ the index of $Q$. To make the distinction clear, we also call our $r$ the index of total isotropy, and the number $p - q$ the index of inertia.
Necessary and sufficient conditions for a nondegenerate $Q$ to be a kernel form are: $n$ = the rank of $Q \equiv 0 \pmod 2$ when $K = \mathbf{C}$; $n \equiv 0 \pmod 2$ and $p - q = 0$ when $K = \mathbf{R}$; $n \equiv 0 \pmod 2$ and $\Delta(Q) \sim 1$ when $K = F_q$; $n \equiv 0 \pmod 2$, $\Delta(Q) \sim 1$, and $\chi(Q) = 1$ when $K$ is a p-adic number field.
Let $N \oplus Q_0$, $N' \oplus Q_0'$ be Witt decompositions of $Q$ and $Q'$, respectively. We say that $Q$ and $Q'$ belong to the same type if $Q_0$ and $Q_0'$ are equivalent and denote the set of types of nondegenerate quadratic forms over $K$ by $W$. We define the sum of the types of $Q$ and $Q'$ as the type of $Q \oplus Q'$, and this gives $W$ the structure of a commutative group. The type of a kernel form is the identity element of this group $W$, called the Witt group. The structure of the Witt group depends on $K$. If $K = \mathbf{C}$, then $W \cong \mathbf{Z}/2\mathbf{Z}$; if $K = \mathbf{R}$, then $W \cong \mathbf{Z}$; if $K$ is a †local field with a †non-Archimedean valuation, then $W$ is a finite group; if $K = F_q$, then $W \cong (\mathbf{Z}/2\mathbf{Z}) + (\mathbf{Z}/2\mathbf{Z})$ if $q \equiv 1 \pmod 4$, $W \cong \mathbf{Z}/4\mathbf{Z}$ if $q \equiv 3 \pmod 4$, and $W \cong \mathbf{Z}/2\mathbf{Z}$ if $q$ is a power of 2.
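
The two indices defined above for $K = \mathbf{R}$ can be illustrated by a minimal Python sketch. It is not part of the original article: it assumes the form is given by a symmetric matrix with rational entries, and the helper names `signature` and `real_indices` are hypothetical. The matrix is diagonalized by congruence over the rationals, the signature $(p, q)$ is read off, and the index of total isotropy $\min(p, q)$, the index of inertia $p - q$, and the kernel-form condition for $K = \mathbf{R}$ are reported.

```python
from fractions import Fraction


def signature(matrix):
    """Return (p, q), the numbers of positive and negative squares of a
    symmetric matrix with rational entries (congruence diagonalization)."""
    n = len(matrix)
    a = [[Fraction(x) for x in row] for row in matrix]
    p = q = 0
    for k in range(n):
        # Look for a nonzero diagonal entry in the remaining block.
        piv = next((i for i in range(k, n) if a[i][i] != 0), None)
        if piv is None:
            # All remaining diagonal entries vanish: use an off-diagonal one.
            pair = next(((i, j) for i in range(k, n)
                         for j in range(i + 1, n) if a[i][j] != 0), None)
            if pair is None:
                break  # the remaining block is zero
            i, j = pair
            for col in range(n):   # row_i += row_j
                a[i][col] += a[j][col]
            for row in a:          # col_i += col_j, so a[i][i] becomes nonzero
                row[i] += row[j]
            piv = i
        # Move the pivot to position k by a simultaneous row/column swap.
        a[k], a[piv] = a[piv], a[k]
        for row in a:
            row[k], row[piv] = row[piv], row[k]
        d = a[k][k]
        if d > 0:
            p += 1
        else:
            q += 1
        # Clear row k and column k outside the diagonal.
        for i in range(k + 1, n):
            f = a[i][k] / d
            if f:
                for col in range(n):
                    a[i][col] -= f * a[k][col]
                for row in a:
                    row[i] -= f * row[k]
    return p, q


def real_indices(matrix):
    """Indices of a real quadratic form given by its symmetric matrix."""
    p, q = signature(matrix)
    n = len(matrix)
    return {
        "rank": p + q,
        "signature": (p, q),
        "total_isotropy": min(p, q),   # the index r = min(p, q)
        "inertia": p - q,              # the index of inertia p - q
        # kernel form over R: nondegenerate, n even, and p - q = 0
        "kernel_form": p + q == n and n % 2 == 0 and p == q,
    }


print(real_indices([[0, 1], [1, 0]]))   # the form 2xy
```

For the form $2xy$ the sketch reports signature $(1, 1)$, index of total isotropy 1, and index of inertia 0, so the kernel-form condition for $K = \mathbf{R}$ is met.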

F. Hermitian Forms

An expression $H(x) = \sum_{i,k=1}^{n} a_{ik}\bar{x}_i x_k$ is called a Hermitian form if $a_{ik} \in \mathbf{C}$, $a_{ik} = \bar{a}_{ki}$. (Here $\bar{a}_{ik}$, $\bar{x}_i$ are the complex conjugates of $a_{ik}$, $x_i$, respectively.) The value $H(x)$ is a real number. As for quadratic forms, we define the notions of the matrix of $H$ and the discriminant, rank, and †sesquilinear form associated with $H$. The matrix $A$ of $H$ is a †Hermitian matrix whose principal minors are real numbers. If we apply a linear transformation $P(x') = x$, we obtain a Hermitian form with respect to $x'$ whose matrix is given by ${}^t\bar{P}AP$. Any Hermitian form is equivalent to a form $\sum_{i=1}^{p} \bar{x}_i x_i - \sum_{j=1}^{q} \bar{x}_{p+j} x_{p+j}$; $(p, q)$ is called the signature of $H$. We define the notions of positive definite, negative definite, and indefinite Hermitian forms as we did for quadratic forms over the field of real numbers. Each of the following conditions is necessary and sufficient for $H$ to be positive definite: (i) $H(x) > 0$ for any nonzero complex vector $x$; (ii) all the principal minors of the matrix of $H$ are positive. The definition of a semidefinite Hermitian form is given in the same manner as for a quadratic form.
A linear transformation that leaves the Hermitian form $\sum_{i=1}^{n} \bar{x}_i x_i$ invariant is called a unitary transformation, and its matrix is a †unitary matrix. Any Hermitian form can be transformed via a unitary transformation into a diagonal form $\sum_{i=1}^{n} \alpha_i \bar{x}_i x_i$, where $\alpha_1, \dots, \alpha_n$ are the eigenvalues of the matrix of the Hermitian form.
The notion of Hermitian forms can be generalized as follows: Suppose that $K$ is a †division ring with an involution $a \to \bar{a}$ ($a \in K$) (i.e., $a \to \bar{a}$ is a linear mapping of $K$ onto itself such that $\bar{\bar{a}} = a$, $\overline{ab} = \bar{b}\bar{a}$). Then a Hermitian form $H$ over $K$ is defined by
$H(x) = \sum_{i,k=1}^{n} \bar{x}_i a_{ik} x_k,$
where $x_i \in K$, $a_{ik} \in K$, $a_{ik} = \bar{a}_{ki}$. In particular, if we have, for any given vector $x$ whose coordinates belong to $K$, an element $a$ in $K$ such that $H(x) = a + \bar{a}$, then we have a Witt decomposition for $H$. Two examples of such $K$ having involutions that differ from the identity mapping are a separable quadratic extension $K$ of a field $L$, and a †quaternion algebra $K$ over a field $L$.

G. Quadratic Forms over Algebraic Number Fields

Let $K$ be an algebraic number field of finite degree, $\mathfrak{p}$ be an †Archimedean or non-Archimedean †prime divisor of $K$, $K_\mathfrak{p}$ be the $\mathfrak{p}$-completion of $K$, and $Q$ and $Q'$ be nondegenerate quadratic forms over $K$. Then $Q$ represents $Q'$ over $K$ if and only if $Q$ represents $Q'$ over $K_\mathfrak{p}$ for all $\mathfrak{p}$, and $Q$ represents 0 in $K$ (i.e., there exists a nonzero vector $x$ whose coordinates belong to $K$ such that $Q(x) = 0$) if and only if it represents 0 in $K_\mathfrak{p}$ for all $\mathfrak{p}$ [3,4]. In particular, $Q$ and $Q'$ are equivalent over $K$ if and only if they are equivalent over $K_\mathfrak{p}$ for all $\mathfrak{p}$. Hence the invariants with respect to equivalence over $K$ of a nondegenerate quadratic form $Q$ over $K$ are $n$ = the rank of $Q$, $\Delta$ = the discriminant of $Q$, the Minkowski-Hasse character $\chi_\mathfrak{p}$ for †prime divisors $\mathfrak{p}$ of $K$, and the index of inertia $j_i$ of $Q$ over $K_{\mathfrak{p}_{\infty,i}}$ for each †real infinite prime divisor $\mathfrak{p}_{\infty,i}$ ($i = 1, \dots, r_1$) of $K$. Here the following properties hold: (i) $\chi_\mathfrak{p} = 1$ for all but finitely many $\mathfrak{p}$; (ii) $\prod_\mathfrak{p} \chi_\mathfrak{p} = 1$ (this is equivalent to the †product formula of norm-residue symbols); (iii) $\Delta \sim (-1)^{(n^2 + j_i)/2}$ in $K_{\mathfrak{p}_{\infty,i}}$; and (iv) $\chi_{\mathfrak{p}_{\infty,i}} = 1$ if $j_i \equiv 0, 1, 2, 7 \pmod 8$, $= -1$ if $j_i \equiv 3, 4, 5, 6 \pmod 8$ [3,4]. Conversely, if the system $\{n, \chi_\mathfrak{p}, \chi_{\mathfrak{p}_{\infty,i}}, j_i, \Delta\}$ satisfies conditions (i)-(iv), then it is the set of invariants of a quadratic form over $K$ (Minkowski-Hasse theorem). In general, if a property concerning $K$ holds if and only if it holds for all $K_\mathfrak{p}$, we say that the Hasse principle holds for the property.

H. Class and Genus of a Quadratic Form

Let $K$ be an algebraic number field of finite degree. Quadratic forms $Q$ and $Q'$ over $K$ are said to be of the same class if they are equivalent over the †principal order $\mathfrak{o}$ in $K$. On the other hand, $Q$ and $Q'$ are said to be of the same genus if (i) they are equivalent over the principal order $\mathfrak{o}_\mathfrak{p}$ in $K_\mathfrak{p}$ for all non-Archimedean prime divisors $\mathfrak{p}$ of $K$ and (ii) they are equivalent over $K_\mathfrak{p}$ for all the Archimedean prime divisors $\mathfrak{p}$ of $K$. A genus is decomposed into a finite number of classes. For example, if $K$ is the field of rational numbers, the number of classes in the genus of $\sum_{i=1}^{m} x_i^2$ is 1 for $m \le 8$, while it is $\ge 2$ for $m > 8$.

I. Reduction of Real Quadratic Forms

Let $A$ be an $m \times m$ matrix and $X$ an $m \times n$ matrix. We put $A[X] = {}^tXAX$. Then we can write $Q(x) = S[x] = {}^txSx$, where $S$ is the matrix of the quadratic form $Q$. In this section we put $K = \mathbf{R}$ and define two forms to belong to the same class if they are equivalent over the ring of rational integers. We identify the form $Q$ with its matrix $S = (s_{ij})$. Let $S$ be a positive definite form in $m$ variables. Then $S$ is said to be a reduced quadratic form if $S[g_k] \ge s_{kk}$ and $s_{l,l+1} \ge 0$ ($1 \le k \le m$, $1 \le l \le m - 1$), where $g_k$ is an
arbitrary vector whose coordinates $g_1, \dots, g_m$ are integers such that $(g_k, \dots, g_m) = 1$. Any class of positive definite quadratic forms contains at least one (and generally only one) reduced form. For a reduced form $R = (r_{kl})$, the following inequalities hold: $0 < r_{11} \le r_{22} \le \dots \le r_{mm}$; $\pm 2r_{kl} \le r_{kk}$, $k < l$; $r_{11} r_{22} \cdots r_{mm} < c(m)|R|$, where $c(m)$ depends only on $m$. The set of all symmetric matrices of degree $m$ forms a linear space of dimension $m(m + 1)/2$ in which the subset $\mathfrak{P}$ formed by the positive definite symmetric matrices is a convex open subset. Moreover, the subset $\mathfrak{R}$ formed by all reduced positive definite symmetric matrices is a convex cone whose boundary consists of finitely many hypersurfaces and whose vertex is the origin. Let $S$ be an indefinite quadratic form whose signature is $(n, m - n)$. The set of positive definite quadratic forms $H$ such that $S^{-1}[H] = S$ forms a variety of dimension $n(m - n)$, which is denoted by $H(S)$. We say that $S$ is reduced if $H(S) \cap \mathfrak{R} \ne \emptyset$. Given a natural number $D$, there are only a finite number of definite or indefinite reduced quadratic forms with rational integral coefficients whose discriminant is $\pm D$. Hence the number of classes of quadratic forms with rational integral coefficients and discriminant $\pm D$ is also finite.

J. Units

Let $S$ be a symmetric matrix of degree $m$ with rational coordinates. Let $O(S)$ be the set of all real $m \times m$ matrices $W$ for which $S[W] = S$, and $\Gamma(S)$ be the subset of $O(S)$ consisting of the integral matrices. An element of $\Gamma(S)$ is called a unit of $S$. $\Gamma(S)$ is a finite group if $S$ is definite, but otherwise it is infinite (except for the case $m = 2$, $-|S| = r^2$, with rational $r$). $O(S)$ is a †Lie group, and $\Gamma(S)$ is a discrete subgroup with a finite number of generators. The †homogeneous space $O(S)/\Gamma(S)$ is of finite measure with respect to a †Haar measure defined on the space.

K. Minkowski-Siegel-Tamagawa Theory

Let $S$ and $T$ be rational integral positive definite symmetric matrices of degree $m$ and $n$, respectively ($m \ge n$). Let $A(S, T)$ be the number of rational integral solutions for the equation $S[X] = T$, and $E(S)$ be the order of the group of units $\Gamma(S)$. We put
$M(S, T) = \frac{A(S_1, T)}{E(S_1)} + \frac{A(S_2, T)}{E(S_2)} + \dots,$
$M(S) = \frac{1}{E(S_1)} + \frac{1}{E(S_2)} + \dots,$
$A_0(S, T) = \frac{M(S, T)}{M(S)},$
where $S_1, S_2, \dots$ is a complete system of representatives of the classes in the genus of $S$. $M(S)$ is called the measure of genus of $S$. On the other hand, for a natural number $q$, we denote by $A_q(S, T)$ the number of the solutions of the congruence equation $S[X] \equiv T \pmod q$. If $q$ is a prime power $p^a$, then the ratio $\varepsilon_{m,n}\, q^{-mn + n(n+1)/2} A_q(S, T)$ takes a constant value $\alpha_p(S, T)$ for sufficiently large $a$ (where $\varepsilon_{m,n} = 1/2$ if $m = n \ge 2$; $= 1$ otherwise). Furthermore, let us consider a domain $B$ in the Euclidean space of dimension $n(n + 1)/2$ formed by the set of $n \times n$ symmetric matrices containing $T$, and let $B_1$ be the domain formed by the matrices $X$ such that $S[X] \in B$. Then $B_1$ is a domain in the space of dimension $mn$ formed by all $m \times n$ matrices. Let $\alpha_\infty(S, T)$ be the limit of the ratio $v(B_1)/v(B)$ of the volumes of $B$ and $B_1$ as the domain $B$ shrinks toward the point $T$. Then Siegel's theorem states: $\alpha_\infty(S, T) \prod_p \alpha_p(S, T) = \varepsilon A_0(S, T)$, where $\varepsilon = 2$ if $m = n + 1$ or $m = n \ge 2$ and $\varepsilon = 1$ otherwise. The infinite product of the left-hand side of this equation does not converge absolutely if either $m = n = 2$ or $m = n + 2$, and in those cases the order of the product $\prod_p$ is considered to be the natural order of the primes $p$.
A special case of Siegel's theorem was proved by H. Minkowski, but it was C. L. Siegel [6] who proved it in its general form. Except for a finite number of $p$, the numbers $\alpha_p(S, T)$ have been calculated. The explicit form of $\alpha_\infty(S, T)$ is also known. In particular, if we take the identity matrix $E^{(m)}$ of degree $m$ as $S$, then the formula in Siegel's theorem is related to the problem of expressing natural numbers as sums of $m$ squares. For $m = 2, 3, \dots, 8$, the genus of $E^{(m)}$ contains only one class. Hence, putting $n = 1$, $T = t$ (= a natural number), we obtain from Siegel's theorem the number of ways in which we can express $t$ as the sum of $m$ squares [6, pt. I]. Siegel's result was generalized by Siegel himself to the case where the form $S$ is indefinite [6, pt. II] and where the coefficients of the forms are elements of an algebraic number field of finite degree [6, pt. III]. Also, regarding the number of possible ways to express a natural number $t$ as a sum of $m$ squares, the following formula was obtained by C. G. J. Jacobi for the case where $m = 4$, $n = 1$:
$A(E^{(4)}, t) = 8 \sum_{d \mid t,\ 4 \nmid d} d.$
For the case $m = 3$, $n = 1$, it is known that if $t$ is odd and $A(E^{(3)}, t) > 0$, then $t \not\equiv 7 \pmod 8$ (for details → P. T. Bateman, Trans. Amer. Math. Soc., 71 (1951)).
T. Tamagawa used the theory of †adelized algebraic groups and proved that the †Tamagawa number $\tau(SO(n, S))$ of the special orthogonal group is 2. He also showed that from this fact, Siegel's theory in this section can be deduced (→ 13 Algebraic Groups P) [10].

L. Theta Series

Let $Q(x_1, \dots, x_m)$ be a positive definite form with integral coefficients. For a complex number $z$, we put
$F(z, Q) = \sum_{x_1, \dots, x_m} \exp(2\pi i\, Q(x_1, \dots, x_m) z),$
where $x_1, \dots, x_m$ run over all the integers. If $\operatorname{Im} z > 0$, the series converges and represents an †entire function of $z$. These series are called theta series. If we denote by $A(n)$ the number of integral solutions of the equation $Q(x_1, \dots, x_m) = n$, we have
$F(z, Q) = \sum_{n=0}^{\infty} A(n) e^{2\pi i n z}.$
Moreover, if $m = 2k$, we have the following transformation formula:
$F\left(\frac{az + b}{cz + d}, Q\right) = \varepsilon(d)(cz + d)^k F(z, Q),$
where $a, b, c, d$ are integers such that $ad - bc = 1$, $c \equiv 0 \pmod N$, $N$ is a natural number determined by $Q$, and $\varepsilon$ is a character mod $N$. In other words, $F(z, Q)$ is a †modular form with respect to the †congruence subgroup of level $N$. Using the theory of modular forms, E. Hecke showed that $A(n) = A_0(n) + O(n^{k/2})$, where $A_0(n)$ is a number-theoretic function of $n$ determined by the genus of $Q$.

M. Binary Quadratic Forms with Integral Coefficients

Now put $m = 2$. Given a form $Q(x, y) = ax^2 + bxy + cy^2$, we put $D(Q) = b^2 - 4ac$ and call it the discriminant of $Q$ (i.e., $D(Q) = -4\Delta(Q)$). $Q$ is said to be primitive if $(a, b, c) = 1$. When $D(Q)$ is not a square, the theory concerning $Q$ is closely related to the arithmetic theory of the †quadratic field $\mathbf{Q}(\sqrt{D}) = k$. Let $d$ be the †discriminant of $k$ and put $D = df^2$. When $f = 1$, there is a one-to-one correspondence between the †ideal classes of $k$ and the classes of quadratic forms with discriminant $d$ (when $D < 0$, we consider the classes of positive definite forms). The correspondence is given in the following manner: If $\mathfrak{a}$ is an ideal in $k$ with a basis $\alpha_1, \alpha_2$, then the corresponding form is given by $Q(x, y) = N(\mathfrak{a})^{-1} N(\alpha_1 x + \alpha_2 y)$, where $N$ is the absolute †norm. If $f > 1$, we must replace the ring of integers $\mathfrak{o}$ by the †order of the †conductor $f$. That is, if we consider the ring formed by the elements $x + fy\omega$ ($x$, $y$ are rational integers; the meaning of $\omega$ is explained shortly), we again have a one-to-one correspondence between the classes of ideals of this order and the classes of quadratic forms [3]. We let $\omega = \sqrt{d}/2$ if $d \equiv 0 \pmod 4$; $= (1 + \sqrt{d})/2$ if $d \equiv 1 \pmod 4$. When $D > 0$, we can introduce the notion of proper equivalence as follows: $Q$ and $Q'$ are properly equivalent if the matrix of $Q$ is transformed to the matrix of $Q'$ by a linear transformation $P$ whose determinant is 1. Then, in the correspondence for the case $f = 1$, if we take $\alpha_1 = r > 0$, $\alpha_2 = s + t\omega$, $t > 0$ ($r, s, t \in \mathbf{Q}$), then we get a relation between the classification of the forms and the classification in the finer sense of the ideals in $k$.
Suppose that $D > 0$ is not a square. Let $(t, u)$ be an integral solution of †Pell's equation $t^2 - Du^2 = \pm 4$. Then the units of the form $Q(x, y) = ax^2 + bxy + cy^2$ with discriminant $D$ are given by
$\pm \begin{pmatrix} (t - bu)/2 & -cu \\ au & (t + bu)/2 \end{pmatrix}.$
Let $(t_0, u_0)$ be the smallest positive integral solution of $t^2 - Du^2 = 4$, put $\varepsilon_D = (t_0 + u_0\sqrt{D})/2$, and let $h_D$ be the class number in the finer sense of the forms of discriminant $D$. Then the following formula holds (Dirichlet):
$h_D \log \varepsilon_D = \sqrt{D} \sum_{n=1}^{\infty} \left(\frac{D}{n}\right) \frac{1}{n},$
where $(D/n)$ is the †Kronecker symbol (here $D = f^2 d$; we put $(D/n) = 0$ if $(f, n) \ne 1$, and $(D/n) = (d/n)$ if $(f, n) = 1$). For $D < 0$, the order $w_D$ of the units is known: it is 6 if $D = -3$; 4 if $D = -4$; and 2 otherwise. We also have
$h_D = \frac{w_D \sqrt{|D|}}{2\pi} \sum_{n=1}^{\infty} \left(\frac{D}{n}\right) \frac{1}{n}.$
With respect to the numbers $h_D$ and $\varepsilon_D$, little else is known.

References

[1] B. W. Jones, The arithmetic theory of quadratic forms, Wiley, 1950.
[2] N. Bourbaki, Eléments de mathématique, Algèbre, ch. 9, Actualités Sci. Ind., 1272a, Hermann, 1959.
[3] M. Eichler, Quadratische Formen und orthogonale Gruppen, Springer, 1952.
[4] O. T. O'Meara, Introduction to quadratic forms, Springer, 1963.
[5] E. Witt, Theorie der quadratischen Formen in beliebigen Körpern, J. Reine Angew. Math., 176 (1937), 31-44.
[6] C. L. Siegel, Über die analytische Theorie der quadratischen Formen I, II, III, Ann. Math., (2) 36 (1935), 527-606; 37 (1936), 230-263; 38 (1937), 212-291 (Gesammelte Abhandlungen, Springer, 1966, vol. 1, 326-405, 410-443, 469-548).
[7] C. L. Siegel, On the theory of indefinite quadratic forms, Ann. Math., (2) 45 (1944), 577-622. (Gesammelte Abhandlungen, Springer, 1966, vol. 2, 421-466.)
[8] E. Hecke, Analytische Arithmetik der positiven quadratischen Formen, Kgl. Danske Videnskab. Selskab, Mat.-Fys. Medd., XIII, 12 (1940). (Mathematische Werke, Vandenhoeck & Ruprecht, 1959, 789-918.)
[9] G. L. Watson, Integral quadratic forms, Cambridge Univ. Press, 1960.
[10] T. Tamagawa, Adèles, Amer. Math. Soc. Proc. Symp. Pure Math., 9 (1966), 113-121.
For binary quadratic forms,
[11] P. G. L. Dirichlet, Vorlesungen über Zahlentheorie, Vieweg, fourth edition, 1894 (Chelsea, 1969).

349 (XIX.3)
Quadratic Programming

A. Problems

A quadratic programming problem is a special type of mathematical programming (→ 264 Mathematical Programming) where the objective function is quadratic while the constraints are linear. A typical formulation of the problem is as follows.
(Q) Maximize $z = c'x - \frac{1}{2}x'Dx$ under the condition $Ax \le b$ and $x \ge 0$, $x \in \mathbf{R}^n$.
Let the Lagrangian form for this problem be
$\varphi(x, \lambda) = c'x - \frac{1}{2}x'Dx + \lambda'(b - Ax).$
Then, from the general properties of Lagrangian forms, the following theorem can be proved.
Theorem: If $x = x^*$ is an optimal solution of the problem (Q), there exists a vector $\lambda^*$ satisfying the conditions
$-Dx^* + c \le A'\lambda^*$, $\lambda^* \ge 0$;
$b'\lambda^* = (c - Dx^*)'x^*.$
Moreover, if the matrix $D$ is nonnegative definite, the above conditions are also sufficient for $x = x^*$ to be optimal. The second condition can be shown to be equivalent to
$\lambda^{*\prime}(b - Ax^*) = 0$ and $x^{*\prime}(A'\lambda^* + Dx^* - c) = 0.$
By introducing the slack vectors $u \ge 0$ and $v \ge 0$, the conditions can be expressed as
(C) $x \ge 0$, $y \ge 0$, $u \ge 0$, $v \ge 0$;
    $Ax + u = b$, $A'y + Dx - v = c$;
    $y'u = 0$, $x'v = 0$.
When $D$ is nonnegative definite, any feasible solution of the above system of equalities gives an optimal solution of the primary problem (Q), and when $D$ is positive definite, the solution is unique. When $D$ is not nonnegative definite, the optimal solution of (Q), if it exists, is one of the feasible solutions of (C). The last line of (C) implies that the solution must be a basic solution of the linear system of equalities, and it also restricts the possible combinations of the basic variables. Since there exist only a finite number of possible combinations of the basic variables, the quadratic programming problem can be solved in a finite number of steps, if it has an optimal solution.

B. Duality

The dual problem of (Q) is the following.
(QD) Minimize $w = b'y + \frac{1}{2}x'Dx$ under the condition $A'y + Dx \ge c$ and $x \ge 0$, $y \ge 0$.
If $D$ is nonnegative definite, the following theorem holds.
Theorem: If the primary problem (Q) has a solution $x = x^*$, then the dual problem has a solution $x = x^*$ and $y = y^*$, and $\max z = \min w$.
A more general form of the quadratic programming problem can be given as follows.
(Q) Maximize $z = c'x - \frac{1}{2}x'Dx$ under the condition $x \in V$ and $b - Ax \in W$, where $V$ and $W$ are closed convex cones in $\mathbf{R}^n$ and $\mathbf{R}^m$, respectively.
Then the dual problem is expressed as follows.
(QD) Minimize $w = b'y + \frac{1}{2}x'Dx$ under the condition $x \in V$, $y \in W^*$ and $A'y + Dx - c \in V^*$, where $V^*$ and $W^*$ are the dual cones of $V$ and $W$.
The above theorem holds for both (Q) and (QD).

C. Algorithms

Various algorithms have been proposed for quadratic programming [1,2,4], most of which are based on condition (C). Wolfe [4] proposed a method based on the simplex method for linear programming. If we introduce the artificial vectors $\xi$ and $\eta$, we can find a feasible solution of (C) by solving the following linear programming problem.
(LQ) Maximize $z = -1'\xi - 1'\eta$ under the condition that
$Ax + u - \xi = b$, $A'y + Dx - v + \eta = c$;
$x \ge 0$, $y \ge 0$, $u \ge 0$, $v \ge 0$, $\xi \ge 0$, $\eta \ge 0$;
$y'u = 0$, $x'v = 0$.
(LQ) can be solved by applying the simplex algorithm with the only modification being
that the last line of the condition restricts the possible changes in the basic variables. When $D$ is positive definite, we can always obtain a solution if there is a feasible solution of the original problem, and Wolfe proposed a modification of the foregoing algorithm for the case when $D$ is nonnegative definite that tells whether or not it has an optimal solution, and gives it if it has one. Some other algorithms are also effective when $D$ is positive or nonnegative definite, but when $D$ is not nonnegative definite, no simple effective method has been found to reach the optimal solution even when its existence has been established.

References

[1] E. M. L. Beale, On quadratic programming, Naval Res. Logistic Quart., 6 (1959), 221-243.
[2] C. Hildreth, A quadratic programming procedure, Naval Res. Logistic Quart., 4 (1957), 79-85.
[3] W. S. Dorn, Duality in quadratic programming, Quart. Appl. Math., 18 (1960), 155-162.
[4] P. Wolfe, The simplex method for quadratic programming, Econometrica, 27 (1959), 382-398.

350 (VI.10)
Quadric Surfaces

A. Introduction

A subset $F$ of a 3-dimensional Euclidean space $E^3$ is called a quadric surface (surface of the second order or simply quadric) if $F$ is the set of zeros of a quadratic equation $G(x, y, z) = 0$, where the coefficients of $G$ are real numbers. The equation $G(x, y, z) = 0$ is written as
$ax^2 + by^2 + cz^2 + d + 2fyz + 2gzx + 2hxy + 2f'x + 2g'y + 2h'z = 0.$ (1)
In general, a straight line intersects a quadric surface at two points. If it intersects the surface at more than two points, then the whole straight line lies on the surface. Suppose that we are given a quadric surface and a point $O$. Suppose further that we are given a straight line passing through the point $O$ and intersecting the quadric surface at two points $A$ and $A'$. If $AO = OA'$ for all such straight lines, then the point $O$ is called the center of the quadric surface.

B. Classification

The subset defined by equation (1) may be empty; for example, $x^2 + y^2 + z^2 + 1 = 0$. In this article, we consider only quadric surfaces that are not the empty set. When a quadric surface $F$ without †singular points has a center or centers, we say that $F$ is central.
If we choose a suitable rectangular coordinate system, the equation of a central quadric surface is written in one of the following forms:
$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1,$ (2)
$\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = 1,$ (3)
$-\frac{x^2}{a^2} - \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1,$ (4)
$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1,$ (5)
$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1,$ (6)
$\frac{x^2}{a^2} = 1.$ (7)
When the equation takes the form (2), (3), (4), (5), or (6), we call the quadric surface an ellipsoid, hyperboloid of one sheet, hyperboloid of two sheets, elliptic cylinder (or elliptic cylindrical surface), or hyperbolic cylinder (hyperbolic cylindrical surface), respectively. When the equation takes the form (7), the surface coincides with a pair of parallel planes. If $a = b$ in (2), (3), (4), or (5) the surface is a †surface of revolution with the z-axis as the axis of revolution. In this case, we call the surface an ellipsoid of revolution, hyperboloid of revolution of one sheet, hyperboloid of revolution of two sheets, or circular cylinder (or circular cylindrical surface), respectively. If $a = b = c$ for an ellipsoid of revolution, then the surface is a sphere with radius $a$.
If we choose a suitable rectangular coordinate system, the equation of a noncentral surface of the second order is written in one of the following forms:
(8)
(9)
(10)
When the equation takes the form (8), (9), or (10), we call the surface an elliptic paraboloid, hyperbolic paraboloid, or parabolic cylinder (or parabolic cylindrical surface), respectively. If $a$
$= b$ in (8), the surface is called an elliptic paraboloid of revolution.
Among these, (2), (3), (4), (8), and (9) are sometimes called proper quadric surfaces, and the others degenerate quadric surfaces.
Equations (2)-(10) are called the canonical forms of the equations of these surfaces ($a$, $b$, $c$ in canonical forms should not be confused with $a$, $b$, $c$ in (1)). The planes $x = 0$, $y = 0$, and $z = 0$ in surfaces (2), (3), and (4) and the planes $x = 0$ and $y = 0$ in surfaces (8) and (9) are called principal planes of the respective surfaces; and lines of intersection of principal planes are called principal axes. For a surface of revolution, positions of principal planes and principal axes are indeterminate. We call $a$, $b$, $c$ in equations of canonical form the lengths of the principal axes, or simply the principal axes. If $F$ is a hyperboloid of one sheet or a hyperbolic paraboloid, there are two systems of straight lines lying on $F$; two straight lines belonging to the same system never meet (and are not parallel), and two straight lines belonging to different systems always meet (or are parallel). If $F$ satisfies (3), these systems of straight lines are given by

If $F$ satisfies (9), then two such systems are given by

($\lambda$ and $\mu$ are parameters). We call these straight lines generating lines of the respective surfaces. A hyperboloid of one sheet and a hyperbolic paraboloid are †ruled surfaces described by these generating lines.
When a quadric surface has singular points, they are double points. The set of double points of a quadric surface $F$ is either a single point $O$, a straight line $l$, or a plane $\pi$. In the second case, $F$ consists of two planes passing through $l$ or $l$ itself, and in the third case, $F$ coincides with $\pi$. In the first case, we say that $F$ is a quadric conical surface (or quadric cone) with vertex $O$. Its equation is written in the form $Ax^2 + By^2 + Cz^2 = 0$ ($ABC \ne 0$). When $A$, $B$, $C$ are of the same sign, $F$ consists of only one point $O$. Otherwise, we can assume that $A, B > 0$, $C = -1$. In this case, if $A = B$, $F$ is called a right circular cone, and if $A \ne B$, $F$ is called an oblique circular cone.
Given hyperboloids (3) and (4), we call the quadric cones
$\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = 0$ (3')
and
$-\frac{x^2}{a^2} - \frac{y^2}{b^2} + \frac{z^2}{c^2} = 0$ (4')
asymptotic cones of (3) and (4), respectively.

C. Poles and Polar Planes

Suppose that we are given a straight line $S$ passing through a fixed point $P$ not contained in a quadric surface $F$, and $S$ intersects the surface at two points $X$, $Y$. The locus of the point $Q$ that is the †harmonic conjugate of $P$ with respect to $X$ and $Y$ is a plane. We call this plane $\pi$ the polar plane of $P$ with respect to the quadric surface $F$, and $P$ the pole of the plane $\pi$. If the polar plane of a point $P$ contains a point $Q$, then the polar plane of $Q$ contains $P$. In this case, we say that the two points $P$ and $Q$ are conjugate to each other with respect to the quadric surface. When the point $P$ is on the quadric surface, the tangent plane at $P$ is regarded as the polar plane of $P$. If the polar plane (with respect to a quadric surface) of each vertex of a tetrahedron is the face corresponding to that vertex, we call this tetrahedron a self-polar tetrahedron. If the polar planes (with respect to a quadric surface) of four vertices of a tetrahedron $A$ are four faces of a tetrahedron $B$, the same property holds when we interchange $A$ and $B$. We say that such tetrahedrons are polar tetrahedrons with respect to the quadric surface. Suppose that we are given a quadric surface and two planes. If the pole (with respect to the quadric surface) of one plane is on the other plane, these two planes are said to be conjugate with respect to the surface.
When we are given two †pencils of planes in †projective correspondence, the locus of lines of intersection of two corresponding planes is generally a hyperboloid of one sheet or a hyperbolic paraboloid. In particular, if the axes of these pencils of planes intersect, the locus is a quadric conical surface, and if the axes are parallel, the locus is a quadric cylindrical surface (i.e., an elliptic or hyperbolic cylinder). When there exists a projective correspondence between two straight lines not on a plane, the locus of lines joining corresponding points is a quadric surface (M. Chasles).

D. Surfaces of the Second Class

A surface $F$ in $E^3$ is called a surface of the second class if it admits two tangent planes
passing through an arbitrary straight line $L$ provided that $F \cap L = \emptyset$. This surface can be represented as the set of zeros of a homogeneous equation of the second order in †plane coordinates $u_1, u_2, u_3, u_4$. It is possible that a surface of the second class degenerates into a conic or two points. In general, quadrics are surfaces of the second class, and vice versa.
As in the case of quadrics, we can define poles, polar planes, and polar tetrahedrons with reference to surfaces of the second class. Four straight lines joining corresponding vertices of two tetrahedrons polar with respect to a surface of the second class are on the same quadric. We say that two such tetrahedrons are in hyperboloid position.

E. Confocal Quadrics

A family of central quadrics represented by the following equations is called a family of confocal quadrics:
$\frac{x^2}{a + k} + \frac{y^2}{b + k} + \frac{z^2}{c + k} = 1, \quad a > b > c > 0,$ (11)
where $k$ is a parameter. For a quadric belonging to this family, any point on the ellipse $x^2/(a - c) + y^2/(b - c) = 1$, $z = 0$ or the hyperbola $x^2/(a - b) - z^2/(b - c) = 1$, $y = 0$ is called a focus. This ellipse and hyperbola are called focal conics of the quadric.
Given an ellipsoid $F$ and a point $X(x, y, z)$ not contained in the principal plane, we can draw three quadrics $F'$, $F''$, $F'''$ passing through $X$ and confocal with $F$. These surfaces $F'$, $F''$, $F'''$ intersect each other and are mutually perpendicular. One of them is an ellipsoid, another one a hyperboloid of one sheet, and the third a hyperboloid of two sheets. Let $k_1$, $k_2$, $k_3$ be the values of the parameter $k$ in (11) corresponding to these three surfaces. Then the coordinates $x$, $y$, $z$ of the point $X$ are given by
$x = \sqrt{\frac{(a + k_1)(a + k_2)(a + k_3)}{(b - a)(c - a)}},$
$y = \sqrt{\frac{(b + k_1)(b + k_2)(b + k_3)}{(a - b)(c - b)}},$
$z = \sqrt{\frac{(c + k_1)(c + k_2)(c + k_3)}{(a - c)(b - c)}}.$
We call $k_1$, $k_2$, $k_3$ the elliptic coordinates of the point $X$.
Two points $(x, y, z)$, $(x', y', z')$ are called corresponding points if they belong to confocal quadrics of the same kind,
$\frac{x^2}{a} + \frac{y^2}{b} + \frac{z^2}{c} = 1,$
$\frac{(x')^2}{a + k} + \frac{(y')^2}{b + k} + \frac{(z')^2}{c + k} = 1,$
and satisfy
$\frac{x'}{x} = \sqrt{\frac{a + k}{a}}, \quad \frac{y'}{y} = \sqrt{\frac{b + k}{b}}, \quad \frac{z'}{z} = \sqrt{\frac{c + k}{c}}.$
If $P_1$, $P_2$; $Q_1$, $Q_2$ are corresponding points, then $P_1Q_2 = P_2Q_1$ (J. Ivory).

F. Circular Sections

When the intersection of a plane and a quadric is a circle, the intersection is called a circular section. In general, circular sections are cut off by two systems of parallel planes through a quadric. The point of contact on the tangent plane parallel to these is an †umbilical point of the quadric.

G. Quadric Hypersurfaces

A subset $F$ of an n-dimensional Euclidean space $E^n$ is called a quadric hypersurface (or simply hyperquadric) if it is the set of points $(x_1, \dots, x_n)$ satisfying the following equation of the second degree:
$\sum_{i,k=1}^{n} a_{ik} x_i x_k + 2\sum_{i=1}^{n} b_i x_i + c = 0,$ (12)
where $a_{ik}$, $b_i$, $c$ are all real numbers. We can assume without loss of generality that the matrix $A = (a_{ik})$ is symmetric. Assume that $A$ is not a zero matrix. In the case $n = 2$, $F$ is a conic, and in the case $n = 3$, it is a quadric surface. The theory of classification of quadric surfaces can be generalized to the n-dimensional case as follows: Let $r(A^*) = r^*$ be the rank of the $(n + 1) \times (n + 1)$ matrix
$A^* = \begin{pmatrix} a_{11} & \cdots & a_{1n} & b_1 \\ \vdots & & \vdots & \vdots \\ a_{n1} & \cdots & a_{nn} & b_n \\ b_1 & \cdots & b_n & c \end{pmatrix} = \begin{pmatrix} A & b \\ {}^tb & c \end{pmatrix},$
and put $r(A) = r$. Then we have the following three cases: (I) $r = r^*$; (II) $r + 1 = r^*$; and (III) $r + 2 = r^*$. Corresponding to each case, equation (12) can be simplified (by a coordinate transformation in $E^n$) to the following canonical forms, respectively:
(I) $\sum_{i=1}^{r} \lambda_i x_i^2 = 0,$
(II) $\sum_{i=1}^{r} \lambda_i x_i^2 + 1 = 0,$
(III) $\sum_{i=1}^{r} \lambda_i x_i^2 + 2x_{r+1} = 0,$
where $(\lambda_1, \dots, \lambda_r, 0, \dots, 0)$ (with $n - r$ zeros) is proportional to the †eigenvalues of the matrix $A$. In general, we have $1 \le r \le n$. In the cases where $r = n$ in forms (I) and (II) and $r + 1 = n$ in (III), the hypersurface is called a properly $(n - 1)$-dimensional quadric hypersurface, and in other cases, a quadric cylindrical hypersurface. In cases (I) and (II), the quadric cylindrical hypersurface is the locus of $(n - r)$-dimensional subspaces passing through each point of a properly $(r - 1)$-dimensional quadric hypersurface and parallel to a fixed $(n - r)$-dimensional subspace. In case (III), the quadric cylindrical hypersurface is the locus of $(n - r - 1)$-dimensional subspaces passing through each point of a properly r-dimensional quadric hypersurface and being parallel to a fixed $(n - r - 1)$-dimensional subspace. For form (I) with $\lambda_i > 0$ ($i = 1, \dots, n$), a properly $(n - 1)$-dimensional quadric hypersurface reduces to a point in $E^n$; for form (II) with $\lambda_i > 0$ ($i = 1, \dots, n$) it becomes the empty set. Suppose that we are given a quadric hypersurface $F$ that is neither a point nor empty. Then the system $\{\lambda_1, \dots, \lambda_r\}$ associated with $F$ in its canonical equation is unique up to order (and signature in form (III)). Suppose that we are given a quadric surface $F$ and a point $P$ on $F$. Suppose further that if a point $X$ other than $P$ is on $F$, then the whole straight line $PX$ lies on $F$. In this case, the hypersurface is called a quadric conical hypersurface (or simply quadric cone). For example, for case (I), we can take $P = O$ (the origin), and the hypersurface is a quadric cone. In cases (I) and (II), the hypersurface is symmetric with respect to the origin. In these cases, a hypersurface is called a central quadric hypersurface, and in case (III), it is called a noncentral quadric hypersurface or parabolic quadric hypersurface. When we cut a parabolic quadric hypersurface by a (2-dimensional) plane containing the $x_{r+1}$-axis, the section is a parabola. If $\lambda_i < 0$ ($i = 1, \dots, r$) in form (II), then the surface is called an elliptic quadric hypersurface, and if there are both positive and negative numbers among the $\lambda_i$, the surface is called a hyperbolic quadric hypersurface. The section of an elliptic quadric hypersurface by a plane is always an ellipse. The section of a hyperbolic quadric hypersurface by a plane is an ellipse, a hyperbola, or two straight lines. In general, the section of a quadric hypersurface by a subspace is a quadric hypersurface on that subspace.

H. Quadric Hypersurfaces in an Affine Space

transformation of coordinates in $E^n$. If we regard $E^n$ as an n-dimensional †affine space over the real number field and reduce (12) to the simplest form by a coordinate transformation in the affine space, we have the following canonical forms corresponding to cases (I), (II), and (III) discussed in Section G:
(I) $(s, t)$: $\sum_{i=1}^{s} x_i^2 - \sum_{j=s+1}^{r} x_j^2 = 0,$
(II) $(s, t)$: $\sum_{i=1}^{s} x_i^2 - \sum_{j=s+1}^{r} x_j^2 + 1 = 0,$
(III) $(s, t)$: $\sum_{i=1}^{s} x_i^2 - \sum_{j=s+1}^{r} x_j^2 + 2x_{r+1} = 0,$
where $0 \le s \le r$ and $r - s = t$. The terms properly $(n - 1)$-dimensional, cylindrical, conical, parabolic, elliptic, and hyperbolic can be defined in terms of this affine classification. For example, a cone is of type (I), a parabolic hypersurface is of type (III), an elliptic hypersurface is of type (II) $(0, r)$, a hyperbolic hypersurface is of type (II) $(s, t)$ ($s, t > 0$), and type (II) $(s, 0)$ represents the empty set. A necessary and sufficient condition for a (nonempty) hypersurface to be represented by two canonical forms $N(s, t)$, $N'(s', t')$ is that (i) $N = N'$ and (ii) if $N$ = (I) or $N$ = (III), then $s = s'$, $t = t'$ or $s = t'$, $t = s'$, and if $N$ = (II), then $s = s'$, $t = t'$.

I. Quadric Hypersurfaces in a Projective Space

Suppose that we are given a field $K$ of characteristic not equal to 2 and an n-dimensional †projective space $P^n$ over $K$. A subset $F$ of $P^n$ is called a quadric hypersurface (or simply hyperquadric) if $F$ is represented by a homogeneous equation of the second degree $\sum_{i,k=0}^{n} a_{ik} x_i x_k = 0$, where $(x_0, x_1, \dots, x_n)$ are homogeneous coordinates in $P^n$ and $a_{ik} \in K$, $A = (a_{ik})$ is a nonzero symmetric matrix. The problem of classifying such surfaces is reduced to that of †quadratic forms or, equivalently, to that of symmetric matrices in $K$. Two symmetric matrices $A$ and $B$ are equivalent if there exists a regular matrix $T$ such that $B = {}^tTAT$ (→ 348 Quadratic Forms). In particular, when $K$ is an †algebraically closed field or a †real closed field, a simple result is obtained. If $K$ is an algebraically closed field, then the equation of the quadric hypersurface is reduced to the canonical form $\sum_{i=0}^{r-1} x_i^2 = 0$, where $r = r(A)$ = the rank of $A$. When $K$ is a real closed field, then the canonical form is $\sum_{i=0}^{s} x_i^2 - \sum_{j=s+1}^{r-1} x_j^2 = 0$.


References
In Section G we considered a quadric hyper-
surface defined by (12) in an n-dimensional [l] G. Salmon, A treatise on the analytic geom-
Euclidean space E” and transformed the equa- etry of three dimensions, Hodges, Figgis &
tion to canonical form by an orthogonal Co., seventh edition, 1928 (Chelsea, 1958).
351 A 1302
Quantum Mechanics

[2] G. Salmon and W. Fiedler, Analytische Geometrie des Raumes I, Teubner, fourth edition, 1898.
[3] H. F. Baker, Principles of geometry III, Solid geometry, Cambridge Univ. Press, 1922.
[4] O. Schreier and E. Sperner, Einführung in die analytische Geometrie und Algebra II, Vandenhoeck & Ruprecht, 1951; English translation, Projective geometry of n dimensions, Chelsea, 1961.
[5] M. Protter and C. Morrey, Analytic geometry, Addison-Wesley, 1975.

351 (XX.23)
Quantum Mechanics

A. Historical Remarks

†Newtonian mechanics (classical mechanics) successfully explained the motion of mechanical objects, both celestial and terrestrial, on a macroscopic scale. It failed, however, to explain blackbody radiation, which was discovered in the last decade of the 19th century. M. Planck introduced a hypothesis of discrete energy quanta, each of which contains an amount of energy $E$ equal to the frequency of the radiation $\nu$ multiplied by a universal constant $h$ (called Planck's constant). He applied this hypothesis to derive a new formula for radiation that gives predictions in good agreement with observations. A. Einstein proposed the hypothesis of the photon as a particlelike discrete unit of light rays. Assuming that many physical quantities, including energy, have only discrete values, N. H. Bohr explained the stability of electronic states in atoms. As illustrated in these examples, quantum mechanics is applied to study the motion of microscopic objects, including molecules, atoms, nuclei, and elementary particles.

B. Quantum-Mechanical Measurement

Fundamental differences between the new mechanics and classical mechanics are due to the facts that many physical quantities, for example, energy, can take only discrete values in the microscopic world, and that states of microscopic objects are disturbed by observation.

A (pure) state at a certain time is expressed by a unit vector $\psi$ in a †Hilbert space $X$, and observables, or physical quantities, are expressed by †self-adjoint operators in such a space.

Let $a_n$ $(n = 1, 2, \dots)$ be †eigenvalues of an observable $A$, and let $P_n$ be a †projection operator onto the eigenspace spanned by †eigenvectors belonging to the eigenvalue $a_n$. Suppose that $A = \sum_n a_n P_n$. Then the hypothesis on measurement in quantum mechanics is given as follows.

When an observable $A$ is observed in a state $\psi$, one of the eigenvalues $a_n$ is found with probability proportional to $(\psi, P_n\psi)$. When an eigenvalue $a_n$ is once observed, a state jumps from $\psi$ to an eigenstate $P_n\psi$ which belongs to the eigenvalue $a_n$. Quantum mechanics predicts only a probability $p_n$ with which a certain value $a_n$ is found when an observable $A$ is observed. This probability, given by $(\psi, P_n\psi)$, is not changed even if $\psi$ is replaced by $e^{i\theta}\psi$, $0 \le \theta < 2\pi$. Therefore $e^{i\theta}\psi$ represents the same state as $\psi$. The set of $e^{i\theta}\psi$, $0 \le \theta < 2\pi$, for a fixed $\psi$ ($\|\psi\| = 1$) is called a unit ray.

If $P_n$ is 1-dimensional and $P_n\varphi = \varphi$, $\|\varphi\| = 1$, then $(\psi, P_n\psi) = |(\psi, \varphi)|^2$ is called the transition probability between the two states.

The expectation value (or expectation) of an operator $A$ in a state $\psi$, usually normalized to $\|\psi\| = 1$, is defined to be $\langle A\rangle = (\psi, A\psi) = \sum_n a_n p_n$.

A general self-adjoint operator $A$ can be written as $A = \int \lambda\, dP(\lambda)$. When $A$ is observed in a state $\psi$, the probability for a value to be found between $\lambda_1$ and $\lambda_2$ ($\lambda_2 > \lambda_1$; $\lambda_2$ included and $\lambda_1$ excluded if $P(\lambda)$ is right continuous) is $(\psi, (P(\lambda_2) - P(\lambda_1))\psi)$ (→ 390 Spectral Analysis of Operators).

The quantity $(\varphi, A\psi)$ is called the matrix element of $A$ between the two states $\varphi$ and $\psi$.

A state $\psi$ can be viewed as a functional $\psi(A) = \langle A\rangle$ on the set of all observables $A$ (its value being the expectation), which is linear in $A$, positive in the sense $\psi(A^*A) \ge 0$ for any operator $A$, and normalized: $\psi(1) = 1$. If $0 < \lambda < 1$ and $\psi(A) = \lambda\psi_1(A) + (1 - \lambda)\psi_2(A)$ for all observables $A$, then the state $\psi$ is called a mixture of states $\psi_1$ and $\psi_2$ with weights $\lambda$ and $(1 - \lambda)$. If a state is not a nontrivial (i.e., $\lambda \ne 0, 1$, $\psi_1 \ne \psi_2$) mixture, it is called pure. The state $\langle A\rangle = (\psi, A\psi)$ on the set of all self-adjoint operators $A$ given by a vector $\psi$ is pure in this sense. If $\sup_\alpha \psi(A_\alpha) = \psi(A)$ whenever $A_\alpha$ is an increasing net of positive operators with $A$ as its limit, then $\psi$ is called normal. Any normal positive linear functional on the set of all self-adjoint operators can be described by a trace-class positive operator $\rho$, called the density matrix, as $\langle A\rangle = \operatorname{tr}(A\rho)$. If $\{\psi_n\}$ is a complete orthonormal set, where each $\psi_n$ is an eigenvector of $\rho$ belonging to the eigenvalue $\lambda_n$, then $\langle A\rangle = \sum_n \lambda_n(\psi_n, A\psi_n)$.

C. Canonical Commutation Relations

In quantum mechanics, canonical variables are represented by the self-adjoint operators
$Q_k$ (coordinates) and $P_k$ (momenta), $k = 1, \dots, N$, which satisfy the canonical commutation relations

$$[Q_k, P_l] = i\hbar\,\delta_{kl}\,1, \qquad [Q_k, Q_l] = [P_k, P_l] = 0,$$

where 1 is the identity operator, $[A, B]$ denotes the commutator $AB - BA$, $\hbar = h/(2\pi)$, and the relation is supposed to hold on a certain dense domain of vectors. Self-adjoint operators $Q_k$ and $P_k$ satisfying the above relations are unique up to quasi-equivalence under a suitable domain assumption, e.g., if $\sum_k Q_k^2 + \sum_k P_k^2$ is essentially self-adjoint on a dense domain invariant under multiplication of the $Q$'s and $P$'s and on which the above relations are satisfied. Under such an assumption, $Q_k$ and $P_k$ are unitarily equivalent to a direct sum of the Schrödinger representation on $L_2(\mathbf{R}^N, dx_1 \dots dx_N)$, where $Q_k$ is multiplication by the $k$th coordinate $x_k$ and $P_k$ is the differentiation $-i\hbar(\partial/\partial x_k)$ (Rellich-Dixmier theorem).

The above Schrödinger representation is called the position representation (or q-representation). The formulation using the function space $L_2$ of real variables $p_k$, $k = 1, 2, \dots, N$, on which the operators $P_k$ act as multiplications by $p_k$, is called the momentum representation (or p-representation).

If Hermitian operators $A$ and $B$ satisfy the canonical commutation relations in the form $(A\psi, B\psi) - (B\psi, A\psi) = i\hbar(\psi, \psi)$, then the following Heisenberg uncertainty relation holds for the expectation:

$$\langle (A - \langle A\rangle)^2\rangle\,\langle (B - \langle B\rangle)^2\rangle \ge \hbar^2/4.$$

This gives the uncertainty in observations, which means that two observables $A$ and $B$ cannot simultaneously be observed with accuracy. This is another important property of microscopic motion that cannot be found in macroscopic motion.

In a direct sum of the Schrödinger representation of the canonical commutation relations, the unitary operators

$$U(a) = \exp i\sum_k a_kQ_k, \qquad V(b) = \exp i\sum_k b_kP_k$$

with real parameters $a_k$ and $b_k$, $k = 1, \dots, N$, satisfy the following Weyl form of the canonical commutation relations:

$$U(a)U(a') = U(a + a'), \qquad V(b)V(b') = V(b + b'),$$

$$U(a)V(b) = V(b)U(a)\exp\Bigl(-i\sum_k a_kb_k\Bigr).$$

Conversely, any pair of families of unitary operators $U(a)$ and $V(b)$ satisfying these relations and depending continuously on parameters $a$ and $b$ are unitarily equivalent to those obtained as above (von Neumann uniqueness theorem).

D. Time Evolution and the Schrödinger Equation

The time $t$ of an observation is fixed in the foregoing discussion. A state changes, however, as the time $t$ changes, in such a way that the transition probability between states is preserved. By Wigner's theorem (→ Section H), this time evolution of states can be implemented by unitary operators $U_t$ defined by the transformation of vectors $\psi \to U_t\psi = \psi_t$. Furthermore, under some continuity assumption, such as that of $(\varphi, U_t\psi)$, $U_t$ can be made a continuous one-parameter group. By Stone's theorem, $U_t = e^{-iHt/\hbar}$ for a self-adjoint operator $H$. This operator is called the Hamiltonian operator (or simply Hamiltonian) determined by the structure of a system. An infinitesimal change in $\psi$ corresponding to an infinitesimal change in $t$ can be generated by this operator $H$ as follows:

$$i\hbar\,\frac{d\psi_t}{dt} = H\psi_t.$$

This equation is called the time-dependent Schrödinger equation.

A state $\psi$ changes but observables do not change with time in the Schrödinger picture above. The other picture, known as the Heisenberg picture, is equally possible. In this picture, the state is expressed by a time-independent vector, while operators $A$ vary with time as follows: $A \to U_t^*AU_t = A(t)$. Rates of change of operators $A(t)$ can be calculated by means of the equation

$$\frac{dA(t)}{dt} = \frac{i}{\hbar}[H, A(t)],$$

which is called the Heisenberg equation of motion. When time $t$ changes, the expectation value of an operator $A$ in a state $\psi$ changes in both pictures according to

$$d\langle A\rangle/dt = i\langle [H, A]\rangle/\hbar.$$

According to classical analytical dynamics, a change of a dynamical quantity that is a function of †canonical variables $q_i$ (positions) and $p_i$ (momenta) is given by

$$dA/dt = -(H, A),$$

where $H$ is a †Hamiltonian function and the parentheses $(\ ,\ )$ denote the †Poisson bracket. A replacement of the Poisson bracket $(A, B)$ by $[A, B]/i\hbar$ transforms this classical equation into the quantum-mechanical equation above. It should be noticed that the mathematical structure of the Poisson bracket is similar to that of the commutator. In this transition from classical to quantum mechanics the correspondence principle can be used. This requires that the laws of quantum mechanics must lead
to the equations of classical mechanics in the classical situation, where many quanta are involved and $\hbar$ can be regarded as infinitesimally small in the commutation relation.

The correspondence principle suggests that Hamiltonian operators in quantum mechanics can be obtained from Hamiltonian functions $H(p_k, q_k)$ of canonical variables $p_k$ and $q_k$ in classical mechanics after replacing $p_k$ and $q_k$ by the operators $P_k$ and $Q_k$ in the Schrödinger representation (up to uncertainty about the order of operators). This process of moving from canonical variables and the Hamiltonian function in classical mechanics to canonical operators and the Hamiltonian in quantum mechanics is called quantization. Taking a system of $s$ particles and letting $x_k$, $y_k$, and $z_k$ be the Cartesian coordinates of the $k$th particle, we usually write the equation of motion as

$$i\hbar\,\frac{\partial\psi}{\partial t} = \Bigl\{-\sum_{k=1}^{s}\frac{\hbar^2}{2m_k}\Delta_k + V(x_1, y_1, z_1, \dots, x_s, y_s, z_s)\Bigr\}\psi,$$

which is a second-order partial differential equation. Here $m_k$ is the mass of the $k$th particle; $\Delta_k$ is the †Laplacian of $x_k$, $y_k$, and $z_k$; and $V$ is a real function called the potential energy. This equation is the time-dependent Schrödinger equation. The partial differential operator on the right-hand side is called the ($s$-body) Schrödinger operator and $\psi(x_1, y_1, z_1, \dots, x_s, y_s, z_s, t)$ is called a wave function. The probability of finding a particle in the volume $dx_k\,dy_k\,dz_k$ bounded by $x_k$, $x_k + dx_k$, $y_k$, $y_k + dy_k$, and $z_k$, $z_k + dz_k$ is proportional to $|\psi(x_1, y_1, z_1, \dots, x_s, y_s, z_s, t)|^2$. Usually $|\psi|^2$ is normalized so that its integral over the whole space is 1. We sometimes call $\psi$ the probability amplitude. When $\psi$ is given by $e^{-iEt/\hbar}\varphi(x_1, \dots, z_s)$, the expectation value of an operator $A$ in a state $\psi$, $\langle A\rangle = \int \psi^*A\psi\,dx_1 \dots dz_s$, does not depend on time. When this is the case, $\psi$ is called a stationary state.

A real value $E$ and a function $\varphi(x_1, \dots, z_s)$ are found by solving an †eigenvalue problem $H\varphi = E\varphi$. This equation is the time-independent Schrödinger equation, which is a second-order partial differential equation. Since the Hamiltonian $H$ stands for the energy of this system, the eigenvalues $E$ are the energy values that this system can take.

When a potential function $V$ is given, it is a nontrivial matter to prove that the ($s$-body) Schrödinger operator with the given $V$ is essentially self-adjoint on the set of, for example, all $C^\infty$-functions with compact supports so that its closure $H$ defines mathematically the continuous one-parameter group of unitaries $U_t = e^{-iHt/\hbar}$ for the time evolution of the quantum system of $s$ particles with the given interaction potential $V$. If $V$ satisfies an estimate $\|V\psi\| \le \lambda\|H_0\psi\| + \mu\|\psi\|$ (called the Kato perturbation on $H_0$) for some nonnegative $\lambda < 1$ and $\mu \ge 0$ and for all $\psi$ in a dense domain on which $H_0$ is essentially self-adjoint, where $H_0$ denotes the Schrödinger operator with $V = 0$ (called the free Hamiltonian or the kinetic energy term), then $H_0 + V$ is essentially self-adjoint on the same domain. For the case where $V$ consists of Coulomb interactions between electrons and Coulomb potentials on electrons by fixed nuclei, for example, such an estimate and hence the essential self-adjointness of Hamiltonians for atoms and molecules were established first by T. Kato (Trans. Amer. Math. Soc., 70 (1951)).

For a 1-body Schrödinger operator (or 2-body Schrödinger operator after the center of mass motion has been separated out), the point spectrum is that of the particle trapped by the potential, and the state represented by its eigenvector is called a bound state. The eigenvalue is nonpositive for a reasonable class of potentials $V$ (for example, if $V(x)$ $(x \in \mathbf{R}^3)$ is continuous and $O(|x|^{-1-\varepsilon})$ as $|x| \to \infty$ for some $\varepsilon > 0$), and its absolute value is called the binding energy. The eigensolutions of the Schrödinger equation are what have been called stationary states above.

There are also stationary solutions that do not correspond to the point spectrum and hence are not square integrable. They are used in the stationary methods of scattering theory (→ 375 Scattering Theory).

E. Some Exact Solutions for the 1-Body Schrödinger Equation

(1) Harmonic oscillator. First consider the case in which the space is of 1 dimension, so that the Laplacian $\Delta$ is $(d/dx)^2$. Let $m$ be the mass of the particle and $V(x) = m\omega^2x^2/2$ for a positive constant $\omega$ (called the angular frequency). The Hamiltonian

$$H = -(\hbar^2/2m)(d/dx)^2 + m\omega^2x^2/2$$

has simple eigenvalues

$$E_n = \hbar\omega(n + (1/2)), \qquad n = 0, 1, 2, \dots,$$

with a complete orthonormal set of eigenfunctions

$$\psi_n(x) = c_nH_n(q)e^{-q^2/2}, \qquad q = (m\omega/\hbar)^{1/2}x,$$

where $H_n(q)$ is a †Hermite polynomial and $c_n$ is the normalization constant:

$$H_n(q) = \sum_{k=0}^{[n/2]} (-1)^k n!\,(2q)^{n-2k}/\{(n-2k)!\,k!\},$$

$$c_n = \{2^{2n}(n!)^2\pi\hbar/(m\omega)\}^{-1/4}.$$
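The harmonic-oscillator formulas quoted above for $E_n$, $\psi_n$, $H_n$, and $c_n$ can be checked numerically. The following is a minimal sketch, assuming NumPy is available and taking $m = \omega = \hbar = 1$ by default; the function names (hermite, psi, integrate, overlap, energy) and the integration grid are illustrative choices, not part of the source. It evaluates $H_n$ from the finite sum, builds $\psi_n$ with the stated $c_n$, and estimates $(\psi_i, \psi_j)$ and $(\psi_n, H\psi_n)$ by the trapezoidal rule, with the kinetic term integrated by parts.

```python
import math
import numpy as np


def hermite(n, q):
    """Hermite polynomial H_n(q), evaluated from the finite sum quoted in the text."""
    return sum(
        (-1) ** k * math.factorial(n) * (2 * q) ** (n - 2 * k)
        / (math.factorial(n - 2 * k) * math.factorial(k))
        for k in range(n // 2 + 1)
    )


def psi(n, x, m=1.0, omega=1.0, hbar=1.0):
    """Eigenfunction psi_n(x) = c_n H_n(q) exp(-q^2/2) with q = sqrt(m*omega/hbar) * x."""
    q = math.sqrt(m * omega / hbar) * x
    c_n = (2 ** (2 * n) * math.factorial(n) ** 2 * math.pi * hbar / (m * omega)) ** -0.25
    return c_n * hermite(n, q) * np.exp(-q ** 2 / 2)


def integrate(y, x):
    """Trapezoidal rule on the grid x (written out explicitly)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)


def overlap(i, j, x):
    """Inner product (psi_i, psi_j) on the grid x, default units m = omega = hbar = 1."""
    return integrate(psi(i, x) * psi(j, x), x)


def energy(n, x, m=1.0, omega=1.0, hbar=1.0):
    """Expectation (psi_n, H psi_n); the kinetic term is (hbar^2/2m) * integral of |psi'|^2."""
    p = psi(n, x, m, omega, hbar)
    dp = np.gradient(p, x)  # finite-difference derivative of psi_n
    kinetic = hbar ** 2 / (2 * m) * dp ** 2
    potential = 0.5 * m * omega ** 2 * x ** 2 * p ** 2
    return integrate(kinetic + potential, x)


if __name__ == "__main__":
    grid = np.linspace(-12.0, 12.0, 24001)  # wide enough that psi_n is negligible at the ends
    for n in range(4):
        print(n, overlap(n, n, grid), energy(n, grid))
```

With these units the printed norms should be close to 1 and the energies close to $n + 1/2$, up to discretization error; the grid width and spacing would need adjusting for larger $n$ or other values of $m$, $\omega$, $\hbar$.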
When the space is of $r$ dimensions, $n$ in $E_n$ is replaced by $n_1 + \dots + n_r$ with nonnegative integers $n_1, \dots, n_r$, and the corresponding eigenfunction is $\prod_{j=1}^{r} \psi_{n_j}(x_j)$.

(2) One-dimensional square-well potential. Let $V(x) = V$ for $|x| \le a/2$ and $V(x) = 0$ for $|x| > a/2$ $(x \in \mathbf{R})$. If $V > 0$, there are no point spectra. If $V < 0$ and

$$(N - 1)\pi < a(-2mV)^{1/2}/\hbar \le N\pi,$$

then there are $N$ eigenvalues $(N = 1, 2, \dots)$ obtained as the roots $E < 0$ of one of the following equations:

$$\{(V/E) - 1\}^{1/2} = \tan\{a(-2mE)^{1/2}/(2\hbar)\},$$

$$\{(V/E) - 1\}^{1/2} = -\cot\{a(-2mE)^{1/2}/(2\hbar)\}.$$

(3) Separation of angular dependence for central potential. If $V(x)$ $(x \in \mathbf{R}^3)$ depends only on $r = |x| = (\sum_{j=1}^{3} (x_j)^2)^{1/2}$ (called a central potential), then all eigenvalues $E$ and a basis for eigenfunctions $\psi(x)$ can be obtained in terms of the polar coordinates $r$, $\theta$, $\varphi$ ($x_1 = r\sin\theta\cos\varphi$, $x_2 = r\sin\theta\sin\varphi$, $x_3 = r\cos\theta$) as

$$\psi(x) = Y_{lm}(\theta, \varphi)\,r^{-1}u(r),$$

$$-(\hbar^2/(2m))u''(r) + \{\hbar^2l(l + 1)/(2mr^2) + V(r) - E\}u(r) = 0,$$

$$\|\psi\|^2 = \int_0^\infty |u(r)|^2\,dr < \infty,$$

where the angular function $Y_{lm}$ is an eigenfunction of the square $L^2$ of the orbital angular momentum $L = -ix \times \nabla$:

$$Y_{lm} = c_{lm}P_l^m(\cos\theta)e^{im\varphi},$$

$$c_{lm} = (-1)^m\{(2l + 1)(l - m)!/(4\pi(l + m)!)\}^{1/2},$$

$$m = l, l - 1, \dots, -l + 1, -l, \qquad l = 0, 1, 2, \dots.$$

Here $P_l^m(x)$ is an †associated Legendre polynomial:

$$P_l^m(x) = (1 - x^2)^{m/2}(d^{l+m}(x^2 - 1)^l/dx^{l+m})/(2^l\,l!).$$

The above equation for $u(r)$ is called the radial equation. The nonnegative integer $l$ is the azimuthal quantum number, and the integer $m$ is the orbital magnetic quantum number. The wave function $\psi(x)$ with the angular dependence $Y_{lm}(\theta, \varphi)$ is called the S-wave, P-wave, D-wave, ... according as $l = 0, 1, 2, \dots$.

(4) Hydrogen-type atom. Let $V(r) = -Ze^2/r$ $(Z > 0)$. For each $l$ and $m$, there are eigenvalues $-e^2Z^2/(2an^2)$ with eigenfunctions $\psi_{nlm} = r^{-1}u_{nl}(r)Y_{lm}(\theta, \varphi)$, where $n = l + 1, l + 2, \dots$ is the principal quantum number,

$$u_{nl}(r) = c_{nl}L_{n+l}^{(2l+1)}(s)\,s^{l+1}e^{-s/2}, \qquad s = 2Zr/(na),$$

$$c_{nl} = -\{(n - l - 1)!\,[2Z/(na(n + l)!)]^3/(2n)\}^{1/2},$$

and $L_q^{(p)}(x)$ is the $p$th derivative of the †Laguerre polynomial $L_q(x)$.

The eigenvalue is determined by $n$, and its multiplicity is $n^2$, corresponding to the different possible values of $l$ and $m$.

F. Path Integrals

R. P. Feynman (Rev. Mod. Phys., 20 (1948)) has given the solution of the Schrödinger equation as an integral of $e^{iL/\hbar}$ over all possible paths $q(t)$, where $L = L(q, \dot q)$ ($\dot q = (d/dt)q(t)$) is the classical Lagrangian for the Hamiltonian system. This integral is called the Feynman path integral. Mathematical reformulation of the formula in terms of the Wiener measure has been given by M. Kac (Proc. 2nd Berkeley Symp. Math. Statist. Probability, 1950; Probability and related topics in the physical sciences, Wiley, 1959).

Consider the 1-body Schrödinger operator $H = H_0 + V$ (form sum), where $V$ is the sum of a locally integrable function bounded below and a Kato perturbation on $H_0$. Let $b(t)$ $(t \ge 0)$ be the Wiener process and $q(t) = \hbar b(t)/(2m)^{1/2}$. For any $L_2$ function $f$,

[display formula lost in extraction]

for almost all $x$, where $E$ denotes the expectation for the Wiener process. If $V$ is a sum of $L_2$ and $L_\infty$ functions (for spatial dimension $\le 3$), then the right-hand side is continuous in $x$ for $t > 0$. This is called the Feynman-Kac formula.

Let $L_0$ be the Hamiltonian for a 1-dimensional harmonic oscillator with $m = \omega = \hbar = 1$ and $\psi_0$ be the eigenfunction $\psi_0(x) = \pi^{-1/4}\exp(-x^2/2)$. Consider $H = L_0 + V$ (form sum), where $V$ is a sum of a locally integrable function bounded below and a Kato perturbation on $L_0$. Let $q(t)$ $(t \in \mathbf{R})$ be Gaussian random variables with mean 0 and covariance $E(q(t)q(s)) = 2^{-1}\exp(-|t - s|)$ (called the oscillator process). For any $f_j(x)$ in $L_\infty(\mathbf{R}, \psi_0^2\,dx)$ $(j = 1, \dots, n)$ and $t_0 < t_1 < \dots < t_n < t_{n+1}$,

[display formula lost in extraction]

The above path integral formulas are closely related to the Trotter product formula

$$e^{-t(A+B)} = \lim_{n\to\infty}(e^{-tA/n}e^{-tB/n})^n \qquad (t \ge 0),$$

where $A$ and $B$ are self-adjoint operators
bounded below and $A + B$ is essentially self-adjoint. (The same formula holds without the boundedness assumption when $t \in i\mathbf{R}$.)

G. The Dirac Equation

The Schrödinger equation is not relativistically invariant. The Klein-Gordon equation

$$-\hbar^2\,\frac{\partial^2\psi}{\partial t^2} = (m^2c^4 - \hbar^2c^2\Delta)\psi$$

is obtained by replacing $p_k$ by $(\hbar/i)\partial/\partial x_k$ $(k = 1, 2, 3)$ and $E$ by $i\hbar\,\partial/\partial t$ in the relativistic identity

$$E^2 = m^2c^4 + p^2c^2,$$

where $c$ is the speed of light. Wave functions of free particles are believed to satisfy this equation. P. A. M. Dirac assumed that the $\psi$ of a free electron is expressed in terms of a †spinor with four components satisfying a linear differential equation that automatically implies the Klein-Gordon equation. Relativity requires the equal handling of space and time. The Dirac equation

$$i\hbar\sum_{\mu=0}^{3}\gamma^\mu\,\frac{\partial\psi}{\partial x^\mu} - mc\,\psi = 0$$

($x^0 = ct$) satisfies these requirements. The coefficients $\gamma^\mu$ can be so determined that every component of $\psi$ also satisfies the Klein-Gordon equation. Thus the $\gamma^\mu$ are found to be $4 \times 4$ matrices satisfying the commutation relations $\gamma^\mu\gamma^\nu + \gamma^\nu\gamma^\mu = 2g^{\mu\nu}$ $(\mu, \nu = 0, 1, 2, 3)$, where $g^{\mu\nu} = 0$ for $\mu \ne \nu$ and $g^{00} = -g^{kk} = 1$ $(k = 1, 2, 3)$. Sixteen linearly independent matrices are obtained by repeated multiplication of five matrices, which include the four matrices $\gamma^0$, $\gamma^1$, $\gamma^2$, $\gamma^3$ and the identity matrix. Any $4 \times 4$ matrix can be expressed as a linear combination of these sixteen matrices.

The Dirac equation has †plane wave solutions

$$\psi = u\exp\{i(p \cdot x - Et)/\hbar\},$$

where the energy eigenvalues $E$ are

$$\pm\sqrt{m^2c^4 + p^2c^2}.$$

There are four independent eigensolutions $u^{(1)}$, $u^{(2)}$, $u^{(3)}$, and $u^{(4)}$, because $u$ has four components. Two of them are of positive energy and the other two are of negative energy. Although the negative energy case is physically undesirable, it has to be taken into account in order to obtain a mathematically complete set. To solve this difficulty, Dirac proposed the hypothesis (Dirac's hole theory) that all the negative energy states are filled up by an infinite number of electrons in the normal state of the vacuum. The absence from the vacuum of a negatively charged electron in a negative energy state could then be expected to manifest itself as a positively charged particle (positron) with positive mass and kinetic energy. If gamma rays are absorbed to excite an electron from a negative energy state into a positive energy state, an electron-positron pair must be created. Y. Nishina and O. Klein calculated the cross section of Compton scattering (the Klein-Nishina formula) by use of the Dirac equation and found good agreement with observations, thus providing evidence that the Dirac equation is correct. The existence of negative energy states, however, forces us to give up considering the Dirac equation as an equation of one electron. The positron theory is introduced, and the Dirac equation is considered as the classical field of electron waves and is second-quantized (→ 311 Second Quantization).

The Klein-Gordon equation can also be considered to be the classical wave equation of matter and can be second-quantized. Motions of particles with zero spin, pi mesons ($\pi$) for example, obey this equation.

We can rewrite the Dirac equation as $i\hbar\,\partial\psi/\partial t = H\psi$, $H = c\alpha \cdot p + mc^2\beta$, where $\gamma^k = \beta\alpha_k$ $(k = 1, 2, 3)$, and $\gamma^0 = \beta$. Since $H$ cannot commute with the orbital angular momentum of an electron $L = r \times p = -i\hbar r \times \nabla$, $L$ is not conserved. However, the total angular momentum $J = L + (\hbar/2)\sigma$ can be conserved when $\sigma$ is a vector whose components can be given as

$$\begin{pmatrix} \sigma_k & 0 \\ 0 & \sigma_k \end{pmatrix}, \quad\text{where}\quad \beta = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad \alpha_k = \begin{pmatrix} 0 & \sigma_k \\ \sigma_k & 0 \end{pmatrix},$$

and the $\sigma_k$, called Pauli spin matrices, are given by

$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

(The $\gamma^\mu$ are called Dirac's $\gamma$ matrices.) The quantity $S = (\hbar/2)\sigma$ is the intrinsic angular momentum of the electron, also called the spin. Many particles besides the electron, the neutron for example, have spin. The matrix $S^2 = S_x^2 + S_y^2 + S_z^2$ is diagonal and is equal to $s(s + 1)\hbar^2I$ ($I$ is the identity matrix). For the electron $s = 1/2$, and $\hbar/2$ is called the absolute value of the spin. Therefore we say that electrons are particles of spin $\hbar/2$. This was predicted in the theory of light spectra.

When the speed of an electron is very small, so that $(v/c)^2$ can be neglected, states of the electron can be expressed in terms of two-component wave functions. This approximation is called the Pauli approximation. If the spin-orbit term that appears in the Pauli approximation is also neglected, these two components become independent of each
other and individually satisfy the Schrödinger equation.

H. Application of Representation Theory of Lie Groups

A symmetry (with active interpretation) is a bijective mapping of pure states (represented by unit rays in a Hilbert space) preserving transition probabilities between them. Wigner's theorem says that any symmetry can be implemented by either a unitary or an antiunitary mapping of the underlying Hilbert space as a mapping of unit rays onto themselves. Furthermore a connected Lie group of symmetries is implemented by unitaries, which form a projective representation. Eigenstates $\psi$ of the Schrödinger equation are functions of the coordinates of each particle. Let these coordinates be denoted together as $x$. Suppose that an operator $T$ operates on the $x$, as, for example, a rotation of the coordinate system or a permutation of the labels of the particles. If $T$ commutes with $H$ or $H$ is invariant under the transformation $x \to x' = Tx$, then $T\psi(x) = \psi'(x) = \psi(T^{-1}x)$ satisfies the same Schrödinger equation as $\psi$, where the transformation of the function is defined by $\psi'(x') = \psi(x)$. The set of transformations $x \to x' = Tx$ forms a †group $\{T\}$, and the corresponding transformations $\psi \to T\psi$ give a (generally infinite-dimensional) representation of this group, which should be †unitary on $L_2$-space relative to the Lebesgue measure $dx$ if $T$ leaves the measure invariant. There are degeneracies of the energy eigenvalues, each of which is equal to the dimension of the corresponding representation of the group $\{T\}$. If the representation for each eigenvalue is decomposed into irreducible ones, then the decomposed stationary state can be labeled by an †irreducible representation.

When $H$ is spherically symmetric, i.e., $H$ is invariant under the 3-dimensional rotation group, states are classified by the irreducible representations $D_L$ of the †rotation group (→ 258 Lorentz Group). The square of the sum $L$ of all orbital angular momenta has eigenvalues $L(L + 1)\hbar^2$, where $L$ must be 0 or a positive integer. There are $2L + 1$ degenerate states, each of which belongs to a different $M$, the $z$-component of $L$, where $M$ ranges from $L$ to $-L$ by unit steps. Even when there is an interaction between the orbital angular momentum and the spin angular momentum, states are labeled by the irreducible representations $D_J$ of the rotation group, where $J$ is the sum of the orbital angular momentum $L$ and the spin angular momentum $S$ ($J = L + S$) and eigenvalues of $J^2$ are given by $J(J + 1)\hbar^2$. Each $J$ must be zero, a positive integer, or a half-integer. Adding inversions to the pure rotations, we obtain the 3-dimensional orthogonal group (→ 60 Classical Groups I). Irreducible representations of this group are written as $D_J^{\pm}$, where $\pm$ corresponds to the characters of the inversion relative to the origin. States with $+$ are called even states and those with $-$, odd states. For example, energy levels of atoms and nuclei can be classified by $D_J^{\pm}$.

To obtain matrix elements of observables between two stationary states, group representation theory is useful. The transformation of every observable obeys a certain rule under the transformation of coordinates. The scalar is transformed according to $D_0^+$, the vector according to $D_1^-$, the pseudovector according to $D_1^+$, and the traceless tensor according to $D_2^+$. If the transformation of an observable is given by $D_J$, then a matrix element of this observable between the states belonging to $D_{J'}$ and $D_{J''}$ vanishes unless the tensor product representation $D_J \otimes D_{J'}$ contains as a factor a representation equivalent to $D_{J''}$. In electromagnetic transitions in atoms or nuclei, $D_1 \otimes D_{J'} = D_{J'+1} + D_{J'} + D_{J'-1}$ $(J' \ge 1)$ if the electric dipole transition dominates ($J = 1$). This implies the selection rule $J' \to J' + 1, J', J' - 1$. When $J' = 0$, only the transition $0 \to 1$ is possible. More general selection rules can be obtained in the same way for general multipole transitions. Representation theory is useful in determining general formulas of transition strengths.

There is a class of particles, many of which can occupy the same state, called bosons. There is another class of particles, of which only one can occupy a given state, called fermions. For example, the electron, neutron, and proton are fermions, while the photon and pi meson are bosons. Two identical particles, both of which are either fermions of the same kind or bosons of the same kind, cannot be distinguished. Therefore the Hamiltonian should be invariant under permutations between identical fermions, or between identical bosons. A system consisting of $N$ identical particles can be classified by the irreducible representations of the †symmetric group $S_N$ on $N$ elements. When the particles are fermions, two of them cannot occupy the same state (this law is called the Pauli principle), so that only totally antisymmetric states are permissible for fermions. When a system consists only of fermions of the same kind with spin $\hbar/2$ and a Hamiltonian of this system does not include terms depending on spins, then the wave functions are just products of spin and orbital parts. In order to make wave functions totally
antisymmetric, orbital wave functions with a total spin $\sigma/2$ should be limited to those corresponding to the †Young diagram $[2^{(N-\sigma)/2}, 1^{\sigma}] = T(2, 2, \dots, 1, 1)$.

I. Polyatomic Molecules

The group generated by the 2-dimensional rotations about the axis connecting two atoms and the reflections with respect to the planes containing this axis is used to classify states of diatomic molecules. Stationary states can be classified by the absolute value $\Lambda$ of the angular momentum of the diatomic system around this axis, which can be zero or a positive integer. When $\Lambda \ge 1$, the corresponding state has twofold degeneracy, whereas when $\Lambda = 0$, two states (labeled by $\pm$) arise, depending on the character of the reflections. If these two atoms are identical, the molecular states can be classified further as even and odd according to the character of reflections with respect to the plane containing the center of mass and perpendicular to the axis. The classification of the spectral terms of a polyatomic molecule is related to its symmetry, described by the set of all transformations that interchange identical atoms. For example, stationary states of methane molecules CH$_4$ are classified by the irreducible representations of the group $T_d$ (which is generated by adjoining reflection symmetry to the †tetrahedral group $T$). Level structures of a crystal are classified by the irreducible representations of its symmetry groups.

In the approximation of many-body problems by means of independent particles, the wave function of the total system is constructed by multiplying the wave functions of the individual particles. To construct such a wave function, it is useful to consider the reduction to irreducible parts of the †tensor products of representations of the groups attached to the individual particles. For example, an atom with two electrons carrying the angular momenta $J_1$ and $J_2$ has $2J' + 1$ different angular momentum states, where $J' = \min(J_1, J_2)$, corresponding to the decomposition $D_{J_1} \otimes D_{J_2} = D_{J_1+J_2} + D_{J_1+J_2-1} + \dots + D_{|J_1-J_2|}$. The right-hand side of this equation gives all possible states of the atom.

J. Charge Symmetry

The proton and the neutron can be considered to be different states of the same particle, called the nucleon, because these two particles

and neutrons may be taken to be invariant under the interchange of protons and neutrons. This invariance is called charge symmetry. Analogous to ordinary spin, isospin can be introduced to describe the two states of nucleons. The up state of the isospin corresponds to the proton and the down state corresponds to the neutron. Consider transformations belonging to the †special unitary group $SU(2)$ in the 2-dimensional space spanned by the proton state and the neutron state. If a Hamiltonian of $N$ nucleons is invariant under any transformation belonging to $SU(2)$, then the eigenstate of these $N$ nucleons is classified by its irreducible representations $D_T$, where $T$ stands for the total isospin of each state. This invariance, called isospin invariance, holds in the nucleus and elementary particles if electromagnetic and weak interactions, and possibly the interaction responsible for the proton-neutron mass difference, are neglected. When a state of $N$ nucleons has an isospin $T = \nu/2$, the orbital-spin wave function of this state must correspond to the Young diagram $[2^{(N-\nu)/2}, 1^{\nu}]$, since the isospin wave function multiplied by the orbital-spin wave function is a totally antisymmetric wave function. If this Hamiltonian is also independent of spin, it is invariant under unitary transformations in the 4-dimensional space spanned by the four internal states of the nucleon: up and down spins, up and down isospins. Therefore the states of $N$ nucleons can be classified by the irreducible representations of the group $U(4)$ (Wigner's supermultiplet theory).

K. The C*-Algebra Approach

The uniqueness of operators satisfying the canonical commutation relations (representations of CCR's) up to quasi-equivalence (→ Section C) no longer holds if the number of canonical variables becomes infinite (a so-called system of infinitely many degrees of freedom), a point first emphasized by K. O. Friedrichs (Mathematical aspects of the quantum theory of fields, Interscience, 1953), and physical examples illustrating this point were given by L. van Hove (Physica, 18 (1952)) and R. Haag (Mat. Fys. Medd. Danske Vid. Selsk., 29 (1955)). The use of C*-algebras in physics was first advocated by I. E. Segal (Ann. Math., 48 (1947)), and the physical relationship among all inequivalent representations of a C*-algebra was first discussed by R. Haag and D. Kastler (J. Math. Phys., 5 (1964)).
have very similar natures except for their In C*-algebra approach, a physical obser-
charges, masses, etc. As an approximation, the vable is an element of a C*-algebra ‘QI and a
Hamiltonian of a system consisting of protons state is a functional cp on VI (its value q(A) is
the expectation value of the observable $A \in \mathfrak{A}$ when measured in that state)
that is linear, positive in the sense $\varphi(A^*A) \geq 0$ for any $A \in \mathfrak{A}$,
and normalized, i.e., $\|\varphi\| = 1$ or equivalently $\varphi(1) = 1$ if
$1 \in \mathfrak{A}$. The GNS construction associates with every state $\varphi$ a Hilbert
space $H_\varphi$, a representation $\pi_\varphi(A)$, $A \in \mathfrak{A}$, of
$\mathfrak{A}$ by bounded linear operators on $H_\varphi$, and a unit cyclic vector
$\Omega_\varphi$ in $H_\varphi$ such that
$\varphi(A) = (\Omega_\varphi, \pi_\varphi(A)\Omega_\varphi)$. Two states $\varphi$ and
$\psi$ (or rather $\pi_\varphi$ and $\pi_\psi$) are called disjoint if there is no nonzero
mapping $T$ from $H_\varphi$ to $H_\psi$ such that $T\pi_\varphi(A) = \pi_\psi(A)T$ for all
$A \in \mathfrak{A}$. Abundant disjoint states occur for a system of infinitely many degrees
of freedom, e.g., equilibrium states of an infinitely extended system with different
temperatures (→ 402 Statistical Mechanics), superselection sectors explained below (→ 150
Field Theory), and equilibrium or ground states with broken symmetry.
Because actual measurement can be performed only on a finite number of observables (though
chosen at will from an infinite number of possibilities) and only with nonzero experimental
errors, information on any state $\varphi$ can be obtained by measurements only up to a
neighborhood in the weak topology: $|\varphi(A_i) - a_i| < \varepsilon_i$, $i = 1, \dots, n$.
A set $K_1$ of states can describe measured information on states at least equally well as
another set $K_2$ ($K_1$ physically contains $K_2$) if the closure of $K_1$ in the weak
topology contains $K_2$. From another viewpoint, all states of $K_2$ are weak limits of
states in $K_1$ and are physically relevant if states in $K_1$ are physically relevant. The
set of all mixtures of vector states $(\psi, \pi(A)\psi)$ for any fixed faithful
representation $\pi$ of $\mathfrak{A}$ is weakly dense in the set of all states of
$\mathfrak{A}$, a point emphasized by Haag and Kastler as a foundation of the algebraic
viewpoint in the formulation of quantum theory.
Under 360° rotation a vector representing a state of a particle with spin $\hbar/2$
acquires a factor $-1$ (→ 258 Lorentz Group), while the vacuum vector would be unchanged. A
nontrivial linear combination (superposition) of these two would then be changed to a
vector in a different ray. If the 360° rotation is not to produce a physically observable
effect, then we should either forbid nontrivial superpositions of states of the two classes
or, equivalently, restrict observables to those leaving the subspace spanned by vectors in
each class invariant so that the relevant linear combinations of vectors, when considered
as states on the algebra of observables in the form of expectation functionals
$(\psi, A\psi)$, are actually mixtures (rather than superpositions) of states in two classes
and are invariant under the 360° rotation. This is called the univalence superselection
rule and has been pointed out by A. S. Wightman, G. C. Wick, and E. P. Wigner (Phys. Rev.,
88 (1952)).
In quantum field theory, the vacuum state can be taken to be pure (by central decomposition
if necessary) and in the associated GNS representation (called the vacuum sector) all
vectors can be assumed to be physically relevant pure states. In principle, all physically
relevant information is in the vacuum representation; for example, a particle with spin
$\hbar/2$ can also be discussed in the vacuum sector if we consider a state of this particle
in the presence of its antiparticle at a far distance, such as behind the moon (the
behind-the-moon argument). However, it is mathematically more convenient to consider the
states of the particle without any compensating object (in the same way that an infinitely
extended gas is more convenient for some purposes than a finitely extended gas surrounded
by walls), which can be obtained as weak limits of states in the vacuum sector by removing
the compensating particle to spatial infinity and which produce inequivalent representations
called superselection sectors.

L. Foundation of Quantum Mechanics

Hilbert spaces and the underlying field of complex numbers, which constitute a mathematical
background for quantum mechanics, are not immediately discernible from physical
observations, and hence there are various attempts to find axioms for quantum mechanics
that imply the usual mathematical structure and at the same time allow direct physical
interpretation.
One approach of this kind focuses attention on the set of all observables that have only
two possible measured values 1 (yes) and 0 (no), called questions, together with their order
structure (logical implications) and associated lattice structure (join, meet, and
orthocomplementation as logical sum, product, and negation). This is called quantum logic
in contradistinction to the situation in classical physics, where it would form a Boolean
lattice. The lattice of all orthogonal projections (corresponding to all closed subspaces of
a Hilbert space) in quantum mechanics is a complete, orthocomplemented, weakly modular
(also called orthomodular) atomic lattice satisfying the covering law, where weak modularity
means $c \wedge (c' \vee b) = b$ and $b \vee (b' \wedge c) = c$ whenever $b \leq c$, and the
covering law means that every $b \neq 0$ possesses an atom $p$ under it ($p \leq b$) and
that if an atom $q$ satisfies $q \wedge b = 0$, then any $c$ between $q \vee b$ and $b$
($q \vee b \geq c \geq b$) is $b$ or $q \vee b$. Conversely any such lattice is a di-
rect sum of irreducible ones, each of which, if of dimension (length of longest chain) > 3,
can be obtained as the lattice of subspaces $V$ satisfying $(V^{\perp})^{\perp} = V$ in a
vector space over a (generally noncommutative) field with an antiautomorphic involution *,
equipped with a nondegenerate Hermitian form. In this approach, an additional requirement
is needed to restrict the underlying field and its * to be more familiar ones, such as real,
complex, or quaternion fields and their usual conjugations *. If that is done, then the set
of all probability measures on the lattice (i.e., assignment of expectation values
$0 \leq p(a) \leq 1$ for all elements $a$ in the lattice such that
$p(\bigvee_i a_i) = \sum_i p(a_i)$ if $a_i \perp a_j$ for all pairs $i \neq j$,
$p(a) \geq 0$, $p(1) = 1$) is exactly the restriction to questions of states
$p(A) = \operatorname{tr}(\rho A)$ given by the density matrices $\rho$ (Gleason's theorem).
It is also possible to characterize the set of all states equipped with the convex
structure (mixtures) geometrically. The set of all states (without the normalization
condition) of a finite-dimensional, formally real, irreducible Jordan algebra over the field
of reals (the positivity of a state $\varphi$ is defined by $\varphi(a^2) \geq 0$) has been
characterized as a transitively homogeneous self-dual cone in a finite-dimensional real
vector space (a cone $V$ is transitively homogeneous if the group of all nonsingular linear
transformations leaving $V$ invariant is transitive on the topological interior of $V$) by
E. B. Vinberg (Trans. Moscow Math. Soc., 12 (1963); 13 (1965)), where the relevant Jordan
algebras were completely classified earlier by P. Jordan, J. von Neumann, and E. P. Wigner
(Ann. Math., 36 (1934)) as direct sums of the following irreducible ones: the Jordan algebra
(with the product $A \circ B = (AB + BA)/2$) of all Hermitian $n \times n$ matrices over the
real, complex, or quaternion field, all $3 \times 3$ Hermitian matrices over octonions, or
the so-called spin balls (the set of all normalized states being a ball) linearly generated
by the identity and $\gamma_j$ ($j = 1, \dots, n$) satisfying $\gamma_j \circ \gamma_k = 0$
if $j \neq k$ and $\gamma_j^2 = 1$.
In infinite-dimensional cases, this type of characterization extends to the "natural"
positive cones of vectors (A. Connes, Ann. Inst. Fourier, 24 (1974); J. Bellissard and
B. Iochum, Ann. Inst. Fourier, 28 (1978)), while the convex cone of all states (without
normalization) of Jordan algebras and C*-algebras has been characterized in terms of a
certain class of projections associated with faces of the cone, called P-projections, by
E. M. Alfsen, F. W. Shultz, and others (Acta Math., 140 (1978); 144 (1980)). In
finite-dimensional cases, Araki (Commun. Math. Phys., 75 (1980)) has given a
characterization allowing direct physical interpretation by replacing P-projection with a
notion of filtering corresponding to quantum-mechanical measurement.
Due to some features of quantum-mechanical measurement not in conformity with common sense,
there have arisen hidden variable theories that are deterministic and reproduce the
quantum-mechanical prediction. For a situation where a pair of (correlated) particles in
states $a$ and $b$ are created and their spins (1 or $-1$, i.e., up or down spin) measured
at positions distant from each other, the expectation value $E(a, b)$ for the product would
be given in a hidden variables theory by
$E(a, b) = \int A_a(\lambda) B_b(\lambda)\, d\mu(\lambda)$ for a probability measure $\mu$
and the functions $A_a$ and $B_b$ of hidden variables $\lambda$, representing spins and
hence satisfying $|A_a| \leq 1$ and $|B_b| \leq 1$. Then the following Bell's inequality
holds:

$$|E(a, b) - E(a, b') + E(a', b) + E(a', b')| \leq 2.$$

This contradicts both quantum-mechanical predictions and experimental results, so that
hidden variable theories of this type have been rejected.

References

[1] P. A. M. Dirac, Principles of quantum mechanics, Clarendon Press, fourth edition, 1958.
[2] A. Messiah, Quantum mechanics I, II, North-Holland, 1961, 1962. (Original in French, 1959.)
[3] W. Pauli, General principles of quantum mechanics, Springer, 1980. (Original in German, 1958.)
[4] S. Tomonaga, Quantum mechanics I, II, North-Holland, 1962-1966.
[5] J. von Neumann, Mathematical foundations of quantum mechanics, Princeton Univ. Press, 1955. (Original in German, 1932.)
[6] B. L. van der Waerden, Sources of quantum mechanics, Dover, 1967.
[7] B. L. van der Waerden, Group theory and quantum mechanics, Springer, 1974.
[8] H. Weyl, The theory of groups and quantum mechanics, Dover, 1949. (Original in German, 1928.)
[9] E. P. Wigner, Group theory and its application to the quantum mechanics of atomic spectra, Academic Press, 1959.
[10] B. Simon, Functional integration and quantum mechanics, Academic Press, 1979.
[11] G. G. Emch, Algebraic methods in statistical mechanics and quantum field theory, Wiley, 1972.
[12] T. Bastin (ed.), Quantum theory and beyond, Cambridge Univ. Press, 1971.
1311 352 C
Quasiconformal Mappings

[13] B. d'Espagnat, Conceptual foundations of quantum mechanics, Benjamin, second edition, 1976.
Also → references to 150 Field Theory, 375 Scattering Theory, 377 Second Quantization,
and 386 S-Matrices.

352 (XI.15)
Quasiconformal Mappings

A. History

H. Grötzsch (1928) introduced quasiconformal mappings as a generalization of conformal
mappings. Let $f(z)$ be a continuously differentiable homeomorphism with positive Jacobian
between plane domains. The image of an infinitesimal circle $|dz| = $ constant is an
infinitesimal ellipse with major axis of length $(|f_z| + |f_{\bar z}|)|dz|$ and minor axis
of length $(|f_z| - |f_{\bar z}|)|dz|$. When the ratio
$K(z) = (|f_z| + |f_{\bar z}|)/(|f_z| - |f_{\bar z}|)$ is bounded, $f$ is called
quasiconformal. If $K = 1$, then $f$ is conformal. Grötzsch noticed that Picard's theorem
still holds under the weaker condition; he determined the quasiconformal mappings between
two given domains, which are not conformally equivalent to each other, providing the
smallest $\sup K$, that is, those closest to conformality [1].
We cannot speak of the history of quasiconformal mappings without mentioning the discovery
of extremal length by A. Beurling and L. V. Ahlfors (→ 143 Extremal Length), which has led
to the precise definition for quasiconformality itself.
Quasiconformal mappings have less rigidity than conformal mappings, and for this reason
they have been utilized for the type problem or the classification of open Riemann surfaces
(Ahlfors, S. Kakutani, O. Teichmüller, K. I. Virtanen, Y. Tôki; → 367 Riemann Surfaces).
Quasiconformal mappings have important applications in other fields of mathematics, e.g.,
in the theory of partial differential equations of elliptic type (M. A. Lavrent'ev [2]) and
especially in the problem of moduli of Riemann surfaces, including the theory of
Teichmüller spaces (→ 416 Teichmüller Spaces). These applications are explained in
Sections C and D.

B. Definitions

The current definitions of quasiconformality, which dispense with continuous
differentiability, are due to Ahlfors [3], A. Mori [4], and L. Bers [5] (→ C. B. Morrey
[6]). Consider an orientation-preserving topological mapping $f$ of a domain $D$ on the
$z$ ($= x + iy$)-plane. The quasiconformality of $f$ is defined as follows. (1) (the
geometric definition) Let $Q$ be a curvilinear quadrilateral, i.e., a closed Jordan domain
with four specified points on the boundary, and let the interior of $Q$ be mapped
conformally onto a rectangular domain $I$. The ratio ($\geq 1$) of the sides of $I$, called
the modulus of $Q$ and denoted by $\operatorname{mod} Q$, is uniquely determined. If
$\operatorname{mod} f(Q) \leq K \operatorname{mod} Q$ for any curvilinear quadrilateral $Q$
in $D$, then $f$ is called a K-quasiconformal mapping of $D$. This is equivalent to: (2)
(the analytic definition) $f$ is absolutely continuous on almost every line segment
parallel to the coordinate axes contained in $D$ (this condition is often referred to as
ACL in $D$) and satisfies the inequality

$$|f_{\bar z}| \leq \frac{K-1}{K+1}\,|f_z|$$

almost everywhere in $D$ with some constant $K \geq 1$. When the value of $K$ is irrelevant
to the problem considered, K-quasiconformal mappings are simply said to be quasiconformal.
The K-quasiconformal mapping $f$ satisfies the so-called Beltrami differential equation

$$f_{\bar z} = \mu f_z$$

almost everywhere in $D$ with the measurable coefficient $\mu$. The maximal dilatation
$(1 + \|\mu\|_\infty)/(1 - \|\mu\|_\infty)$ does not exceed $K$. Sometimes $f$ is called,
for short, a $\mu$-conformal mapping. These notions are also defined for mappings between
Riemann surfaces, where the $(-1, 1)$-form $\mu\, d\bar z\, dz^{-1}$ is independent of the
choice of the local parameter $z$.
If in the above statements $f$ is not necessarily topological but merely a continuous
function satisfying the same requirements, we call it a $\mu$-conformal function. (If in
addition $\|\mu\|_\infty \leq (K-1)/(K+1)$, we call it a K-quasiregular function or
K-pseudoanalytic function.) A $\mu$-conformal function is represented as the composite
$g \circ h$ of an analytic function $g$ with a $\mu$-conformal mapping $h$.

C. Principal Properties and Results

The inverse mapping of a K-quasiconformal mapping is also K-quasiconformal. The composite
mapping $f_2 \circ f_1$ of a $K_1$-quasiconformal mapping $f_1$ with a $K_2$-quasiconformal
mapping $f_2$, if it can be defined, is $K_1K_2$-quasiconformal. A 1-quasiconformal mapping
is conformal. Every quasiconformal mapping is totally differentiable a.e. (almost
everywhere), its Jacobian is positive a.e., and
$(|f_z| + |f_{\bar z}|)/(|f_z| - |f_{\bar z}|) \leq K$ a.e.
Let $f$ be a K-quasiconformal mapping of $|z| < 1$ onto $|w| < 1$. Then $f$ extends to a homeo-
morphism of $|z| \leq 1$ onto $|w| \leq 1$. If, furthermore, $f(0) = 0$, then the Hölder
condition

$$|f(z_1) - f(z_2)| \leq 16\,|z_1 - z_2|^{1/K}$$

holds for $|z_1| \leq 1$, $|z_2| \leq 1$, and 16 is the best coefficient obtainable
independently of $K$ (Mori). This shows that any family of K-quasiconformal mappings of
$|z| < 1$ onto $|w| < 1$ is normal. For further properties and bibliography → O. Lehto and
Virtanen [7] and Ahlfors [8].

(a) Boundary Correspondences and Extensions. Ahlfors and Beurling characterized the
correspondence between $|z| = 1$ and $|w| = 1$ induced by $f$ [9]. What amounts to the same
thing, the following theorem holds: Let $\mu(x)$ be a real-valued monotone increasing
continuous function on $\mathbf{R}$ such that $\lim_{x \to \pm\infty} \mu(x) = \pm\infty$.
Then there exists a quasiconformal mapping of the upper half-plane $y > 0$ onto itself with
boundary correspondence $x \mapsto \mu(x)$ if and only if

$$\frac{1}{\rho} \leq \frac{\mu(x+t) - \mu(x)}{\mu(x) - \mu(x-t)} \leq \rho$$

for some constant $\rho \geq 1$ and for all $x, t \in \mathbf{R}$.
Theorem of quasiconformal reflection (Ahlfors [10]). Let $L$ denote a curve which passes
through $\infty$ and divides $\mathbf{C} \cup \{\infty\}$ into two domains $\Omega$,
$\Omega^*$ such that $\Omega \cup L \cup \Omega^* = \mathbf{C} \cup \{\infty\}$. Then there
exists an orientation-reversing quasiconformal mapping of $\Omega$ onto $\Omega^*$ which
keeps every point of $L$ fixed if and only if some constant $C$ exists satisfying
$|\zeta_3 - \zeta_1| / |\zeta_2 - \zeta_1| \leq C$ for any three points $\zeta_1$,
$\zeta_2$, $\zeta_3$ on $L$ such that $\zeta_3$ lies on the arc of $L$ between $\zeta_1$ and
$\zeta_2$.

(b) Mapping Problem. Given a measurable function $\mu$ in a simply connected domain $D$
with $\|\mu\|_\infty < 1$, there exists a $\mu$-conformal mapping of $D$ onto a plane domain
$\Delta$ which is unique up to conformal mappings of $\Delta$ [8]. When $\mu$ is real
analytic and the derivatives of functions are defined in the usual manner, a classical
result concerning the conformal mapping of surfaces asserts the existence of a solution of
Beltrami's differential equation $f_{\bar z} = \mu f_z$.
Concerning the dependence of $\mu$-conformal mapping on $\mu$, Ahlfors and Bers [11]
obtained the following important result: Denote by $f^\mu$ a $\mu$-conformal mapping of the
whole finite plane onto itself that preserves 0 and 1. The space of functions $\mu$ has the
structure of a Banach space with $L_\infty$-norm, and the space of mappings $f^\mu$ also has
the structure of a Banach space with respect to a suitable norm. If
$\{\mu(t) = \mu(z; t)\}$ is a family of $\mu$ depending on the parameter $t$ with
$\|\mu(t)\|_\infty \leq k < 1$ and $\mu(t)$ is continuous (resp. continuously
differentiable, real analytic, complex analytic) in $t$, then $f^{\mu(t)}$ is also
continuous (continuously differentiable, real analytic, complex analytic). For the proofs
of these important results, which have opened up a new way to study Teichmüller space,
essential use is made of the extension and reflection of quasiconformal mappings.

(c) Extremal Quasiconformal Mappings. Let $K(f)$ denote the maximal dilatation of a
quasiconformal mapping $f$. Suppose that a family $\mathcal{F} = \{f\}$ of quasiconformal
mappings is given. If some $f_0 \in \mathcal{F}$ exists such that $K(f_0)$ attains the
infimum of $K(f)$ for all $f \in \mathcal{F}$, then $f_0$ is called an extremal
quasiconformal mapping in $\mathcal{F}$.
Let $R = \{(x, y) \mid 0 < x < a,\ 0 < y < b\}$,
$R' = \{(x', y') \mid 0 < x' < a',\ 0 < y' < b'\}$ be a pair of rectangular domains. Let
$\mathcal{F}$ be the family of all quasiconformal mappings of $R$ onto $R'$ which map each
vertex to a vertex with $(0, 0) \mapsto (0, 0)$. Then the unique extremal quasiconformal
mapping for $\mathcal{F}$ is the affine mapping $x' = (a'/a)x$, $y' = (b'/b)y$ (Grötzsch
[1]).
Next suppose that we are given two homeomorphic closed Riemann surfaces $R$, $S$ and a
homotopy class $\mathcal{F}$ of orientation-preserving homeomorphisms of $R$ onto $S$. Then
$\mathcal{F}$ contains a unique extremal quasiconformal mapping. More precisely, either $R$
and $S$ are conformally equivalent to each other or else $R$ admits an essentially unique
analytic $(2, 0)$-form $\Phi$ such that the respective local coordinates $z$, $w$ of $R$,
$S$ satisfy the differential equation

$$\frac{\partial w/\partial \bar z}{\partial w/\partial z} = \frac{K-1}{K+1}\,\frac{\overline{\Phi}}{|\Phi|} \qquad (1)$$

with some constant $K > 1$ everywhere on $R$ at which $\Phi \neq 0$ (Teichmüller [12],
Ahlfors [3]). This turns out to be a generalization of Grötzsch's extremal affine mapping.
The extremal mapping $f$ satisfying equation (1) is sometimes referred to as the
Teichmüller mapping.
Consider again a $\mu$-conformal mapping $g$ of the unit disk $D$: $|z| < 1$ onto itself
which induces a topological automorphism of the boundary $|z| = 1$. If we define
$\mathcal{F}$ as the family of all quasiconformal automorphisms $f$ of $D$ satisfying
$f(e^{i\theta}) = g(e^{i\theta})$, then the extremal quasiconformal mapping in $\mathcal{F}$
exists but is not always determined uniquely (K. Strebel [13]). As to the Teichmüller
mapping, the uniqueness theorem is as follows: If the norm
$\|\Phi\| = \iint_D |\Phi(z)|\,dx\,dy$ of $\Phi$ in (1) is finite, the Teichmüller mapping
is the unique extremal quasiconformal mapping in $\mathcal{F}$. Otherwise, the uniqueness
does not hold in general (Strebel [13]). On the other hand, a necessary and sufficient
condition is proved for the Beltrami
coefficient $\mu$ of a quasiconformal mapping of $\mathcal{F}$ to be extremal (R. S.
Hamilton [14], E. Reich and Strebel in [15]). Moreover, this last result can be extended to
the extremal quasiconformal mapping between arbitrary Riemann surfaces.

D. Applications

In the earlier stage of development of this theory, quasiconformal mappings were applied
only to the type problem of simply connected Riemann surfaces and to the classification of
Riemann surfaces of infinite genus (→ 367 Riemann Surfaces). This application is based on
the fact that it is often possible to find a quasiconformal mapping with the prescribed
boundary correspondence even when no equivalent conformal mapping exists and the fact that
the classes $O_G$ and $O_{HD}$ (→ 367 Riemann Surfaces) of Riemann surfaces are invariant
under quasiconformal mappings, as they are under conformal mappings.
It is worth remarking that the investigation of quasiconformal mappings is intimately
connected with the recent development of the theory of Kleinian groups via Teichmüller
spaces.
The theory of quasiconformal mappings was also applied by Lavrent'ev [2] and Bers [16] to
partial differential equations, particularly to those concerning the behavior of fluids.
They utilized the fact that if the density and its reciprocal are bounded in a steady flow
of a 2-dimensional compressible fluid, then the mapping of the physical plane to the
potential plane (the plane on which the values of the velocity potential and the stream
function are taken as coordinates) is quasiconformal, and that if in addition the supremum
of the Mach number is smaller than 1, then the mapping from the physical plane to the
hodograph plane is pseudoanalytic.

E. Similar Notions

The term quasiconformal was used differently by Lavrent'ev, as follows: A topological
mapping $f = u + iv$ is called quasiconformal with respect to a certain system of linear
partial differential equations when $u$ and $v$ satisfy the system. This is a generalized
definition because the system may not be equivalent to a Beltrami equation. However, it is
reduced to a quasiconformal mapping if the system is uniformly elliptic. Bers used the term
pseudoanalytic to describe a certain function related to linear partial differential
equations of elliptic type. This function is pseudoanalytic in the sense of Section B on
every relatively compact subset and has properties similar to those of analytic functions.
Analytic transformations in the theory of functions of several variables are called
pseudoconformal by some mathematicians, and there is a similar term quasi-analytic. The
latter is an entirely different notion from the one discussed in this article.

F. Generalization to Higher Dimensions

Let $f$ be a continuous ACL-mapping of a subdomain $G$ of $\mathbf{R}^n$ into
$\mathbf{R}^n$ whose Jacobian matrix is denoted by $f'(x)$. Furthermore, the operator norm
and the determinant of $f'$ are denoted by $\|f'\|$ and $\det f'$, respectively. Then $f$ is
said to be quasiregular if all the partial derivatives of $f$ are locally of class $L^n$ on
$G$ and if there exists a constant $K \geq 1$ such that
$\|f'(x)\|^n \leq K \cdot \det f'(x)$ almost everywhere in $G$. The smallest $K \geq 1$ for
which this inequality is true is called the outer dilatation of $f$ and is denoted by
$K_O(f)$. If $f$ is quasiregular, then the smallest $K \geq 1$ for which the inequality
$\det f'(x) \leq K \cdot \min_{|y|=1} |f'(x)y|^n$ holds almost everywhere in $G$ is called
the inner dilatation of $f$ and is denoted by $K_I(f)$. If
$\max(K_O(f), K_I(f)) \leq K'$, then $f$ is said to be $K'$-quasiregular. An
orientation-preserving mapping is called K-quasiconformal (J. Väisälä [17]) if it is a
K-quasiregular homeomorphism. When $n = 2$, these definitions agree with those given in
Section B.
For $n \geq 3$ the following properties also still hold: A quasiregular mapping is
discrete, open, totally differentiable a.e., and absolutely continuous (O. Martio,
S. Rickman, and Väisälä [18]). Quasiconformal extensions of higher-dimensional half-spaces
have been studied by Ahlfors and L. Carleson [15].

References

[1] H. Grötzsch, Über möglichst konforme Abbildungen von schlichten Bereichen, Ber. Verh. Sächs. Akad. Wiss. Leipzig, 84 (1932), 114-120.
[2] M. A. Lavrent'ev, Variational methods for boundary value problems for systems of elliptic equations, Noordhoff, 1963. (Original in Russian, 1962.)
[3] L. V. Ahlfors, On quasiconformal mappings, J. Analyse Math., 3 (1954), 1-58, 207-208.
[4] A. Mori, On quasi-conformality and pseudo-analyticity, Trans. Amer. Math. Soc., 84 (1957), 56-77.
[5] L. Bers, On a theorem of Mori and the definition of quasi-conformality, Trans. Amer. Math. Soc., 84 (1957), 78-84.
[6] C. B. Morrey, On the solutions of quasi-linear elliptic partial differential equations, Trans. Amer. Math. Soc., 43 (1938), 126-166.
[7] O. Lehto and K. I. Virtanen, Quasikonforme Abbildungen, Springer, 1965; English translation, Quasiconformal mappings in the plane, Springer, second edition, 1973.
[8] L. V. Ahlfors, Lectures on quasiconformal mappings, Van Nostrand, 1966.
[9] A. Beurling and L. V. Ahlfors, The boundary correspondence under quasiconformal mappings, Acta Math., 96 (1956), 125-142.
[10] L. V. Ahlfors, Quasiconformal reflections, Acta Math., 109 (1963), 291-301.
[11] L. V. Ahlfors and L. Bers, Riemann's mapping theorem for variable metrics, Ann. Math., (2) 72 (1960), 385-404.
[12] O. Teichmüller, Extremale quasikonforme Abbildungen und quadratische Differentiale, Abh. Preuss. Akad. Wiss. Math.-Nat. Kl., 22 (1939), 1-197.
[13] K. Strebel, Zur Frage der Eindeutigkeit extremaler quasikonformer Abbildungen des Einheitskreises, Comment. Math. Helv., 36 (1962), 306-323; 39 (1964), 77-89.
[14] R. S. Hamilton, Extremal quasiconformal mappings with prescribed boundary values, Trans. Amer. Math. Soc., 138 (1969), 399-406.
[15] L. V. Ahlfors et al., Contributions to analysis, Academic Press, 1974.
[16] L. Bers, Mathematical aspects of subsonic and transonic gas dynamics, Wiley, 1958.
[17] J. Väisälä, Lectures on n-dimensional quasiconformal mappings, Lecture notes in math. 229, Springer, 1971.
[18] O. Martio, S. Rickman, and J. Väisälä, Definitions for quasiregular mappings, Ann. Acad. Sci. Fenn., 448 (1969), 1-40.
[19] L. Bers, A new proof of a fundamental inequality for quasiconformal mappings, J. Analyse Math., 36 (1979), 15-30.
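
The relation between the Beltrami coefficient and the maximal dilatation stated in Section B
of the article above can be checked on the simplest case, a real-linear map. The sketch
below is illustrative only: it assumes the map $f(z) = az + b\bar z$ with $|a| > |b|$, for
which $f_z = a$ and $f_{\bar z} = b$ are constant, and the function names
`affine_dilatation` and `stretch_dilatation` are hypothetical, chosen here for the example.

```python
def affine_dilatation(a: complex, b: complex):
    """Beltrami coefficient and dilatation of f(z) = a*z + b*conj(z).

    For this real-linear map f_z = a and f_zbar = b, so mu = b/a and
    K = (|a| + |b|) / (|a| - |b|).  The map preserves orientation
    only when |a| > |b|.
    """
    if abs(a) <= abs(b):
        raise ValueError("need |a| > |b| for an orientation-preserving map")
    mu = b / a
    K = (abs(a) + abs(b)) / (abs(a) - abs(b))
    return mu, K


def stretch_dilatation(alpha: float, beta: float):
    """Same quantities for the affine stretch x' = alpha*x, y' = beta*y.

    Writing x = (z + conj(z))/2 and y = (z - conj(z))/(2i) gives
    a = (alpha + beta)/2 and b = (alpha - beta)/2.
    """
    return affine_dilatation((alpha + beta) / 2, (alpha - beta) / 2)


if __name__ == "__main__":
    mu, K = stretch_dilatation(2.0, 1.0)
    # K computed from f_z, f_zbar and K recomputed from |mu| should agree.
    print(mu, K, (1 + abs(mu)) / (1 - abs(mu)))
```

For the affine mapping $x' = (a'/a)x$, $y' = (b'/b)y$ between rectangles described in
Section C(c), the same computation with $\alpha = a'/a$ and $\beta = b'/b$ (both positive)
gives $K = \max(\alpha/\beta, \beta/\alpha)$.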
353 A 1316
Racah Algebra

353 (XX.25)
Racah Algebra

A. General Remarks

Racah algebra is a systematic method of calculating the matrix element $(\psi, A\psi')$ in
quantum mechanics, where $A$ is a dynamical quantity and $\psi$ and $\psi'$ are irreducible
components of the state obtained by combining $n$ angular momenta. The angular momentum
$j$ has x-, y-, z-components $j_x$, $j_y$, $j_z$, respectively. Each component is $i$ times
the infinitesimal rotation around the respective axis and is the generator of the
infinitesimal rotation for every irreducible component $\psi$. The addition of two angular
momenta leads to a tensor representation $D(j_1) \otimes D(j_2)$ of two irreducible
representations of the 3-dimensional rotation group. The problem is to decompose this
tensor representation into the direct sum of irreducible representations.

B. Irreducible Representations of the Three-Dimensional Rotation Group

Irreducible representations of the group SO(3) of 3-dimensional rotations can be obtained
from irreducible representations $D(j)$ ($j = 0, 1, 2, \dots$) of its universal covering
group SU(2) of $2 \times 2$ unitary matrices with determinant 1, through the 2-fold
covering isomorphism $SO(3) \cong SU(2)/\{\pm I\}$ (→ 60 Classical Groups I). The
representation $D(j)$ ($j = 0, 1/2, 1, 3/2, \dots$) of SU(2) is the $2j$-fold tensor
product $A \otimes \dots \otimes A$ of $A \in SU(2)$ restricted to the totally symmetric
part of the $2j$-fold tensor product space. Let $u = \binom{1}{0}$ and $v = \binom{0}{1}$
be a basis for the complex 2-dimensional space on which SU(2) operates. The symmetrized
tensor product of $(j+m)$-fold $u$ and $(j-m)$-fold $v$ multiplied by a positive
normalization constant ($m = j, j-1, \dots, -j$) defines an orthonormal basis of the
representation space of $D(j)$, which we shall denote by $\psi(jm)$.
Decomposition of the tensor product of two irreducible representations $D(j_1)$ and
$D(j_2)$ into irreducible components leads to

$$D(j_1) \otimes D(j_2) = \sum_j D(j), \qquad j = j_1+j_2,\ j_1+j_2-1,\ \dots,\ |j_1-j_2|.$$

For the basis we can write

$$\psi(j_1 j_2 j m) = \sum_{m_1, m_2} \psi(j_1 m_1)\,\psi(j_2 m_2)\,(j_1 m_1 j_2 m_2 \mid j_1 j_2 j m),$$

and the coefficients are called the Clebsch-Gordan coefficients or Wigner coefficients. The
vectors $\psi(jm)$ in each irreducible representation space are determined only up to an
overall phase factor (a complex number of modulus 1). By a suitable choice of the resulting
arbitrary phase (which may depend on $j_1$, $j_2$, $j$), the coefficients are given by

$$(j_1 m_1 j_2 m_2 \mid j_1 j_2 j m) = \delta_{m_1+m_2,\,m}
\sqrt{\frac{(2j+1)\,(j_1+j_2-j)!\,(j+j_1-j_2)!\,(j+j_2-j_1)!}{(j_1+j_2+j+1)!}}$$
$$\times \sum_{\nu} \frac{(-1)^{\nu}\sqrt{(j_1+m_1)!\,(j_1-m_1)!\,(j_2+m_2)!\,(j_2-m_2)!\,(j+m)!\,(j-m)!}}
{\nu!\,(j_1+j_2-j-\nu)!\,(j_1-m_1-\nu)!\,(j_2+m_2-\nu)!\,(j-j_2+m_1+\nu)!\,(j-j_1-m_2+\nu)!}.$$

They satisfy orthogonality relations. Another concrete expression for the same
coefficients, but of a different appearance, was obtained earlier by Wigner. Wigner
introduced the 3j-symbol, given by

$$\begin{pmatrix} j_1 & j_2 & j_3 \\ m_1 & m_2 & m_3 \end{pmatrix}
= (-1)^{j_1-j_2-m_3}\,(2j_3+1)^{-1/2}\,(j_1 m_1 j_2 m_2 \mid j_1 j_2 j_3\, {-m_3})$$

for $m_1 + m_2 + m_3 = 0$ and zero otherwise. This is invariant under cyclic permutations
of 1, 2, 3 and is multiplied by $(-1)^{j_1+j_2+j_3}$ under transpositions of indices as
well as under the simultaneous sign change of all the $m$'s. The 3j-symbol multiplied by
$(-1)^{j_2+j_3-j_1}$ is the V-coefficient of Racah.
There are two ways, $(D(j_1) \otimes D(j_2)) \otimes D(j_3)$ and
$D(j_1) \otimes (D(j_2) \otimes D(j_3))$, to reduce the tensor product of three
irreducible representations, and two corresponding sets of basis vectors. The
transformation coefficient for the two ways of reduction is written in the form

$$(j_1 j_2 (j_{12}) j_3 j \mid j_1, j_2 j_3 (j_{23}) j) = \sqrt{(2j_{12}+1)(2j_{23}+1)}\; W(j_1 j_2 j j_3; j_{12} j_{23}).$$

Here $W(abcd; ef)$, called the Racah coefficient, can be written as the sum of products of
four Wigner coefficients. $W$ has the following symmetry properties:

$$W(abcd; ef) = W(badc; ef) = W(cdab; ef) = W(acbd; fe)$$
$$= (-1)^{e+f-a-d}\, W(ebcf; ad) = (-1)^{e+f-b-c}\, W(aefd; bc)$$

and satisfies an orthogonality relation. The 6j-symbol is related to the Racah coefficient
by

$$W(abdc; ef) = (-1)^{a+b+c+d} \begin{Bmatrix} a & b & e \\ c & d & f \end{Bmatrix}.$$
1317 354 B
Random Numbers

C. Irreducible Tensors

A dynamical quantity $T^k_q$ ($q = k, k-1, \dots, -k$) that transforms in the same way as the basis of $D(k)$ under rotation of coordinates is called an irreducible tensor of rank $k$. That is, it satisfies

$[J_x \pm iJ_y,\ T^k_q] = \sqrt{(k \mp q)(k \pm q + 1)}\; T^k_{q \pm 1}$,

$[J_z,\ T^k_q] = q\,T^k_q$.

Here $[a, b] = ab - ba$. The matrix element of this quantity between two irreducible components can be written in the form

where $\alpha$ is a parameter to distinguish multiple components with the same $j$, and components of different $\alpha$ are assumed to be orthogonal. In this formula the Clebsch-Gordan coefficients are determined from group theory, while $(\alpha j \| T^{(k)} \| \alpha' j')$ depends on the dynamics of the system.

When $T^{(k)}$ and $U^{(k)}$ operate only on the state vectors in the subspaces $H_1$ and $H_2$, respectively, of the total space (Hilbert space) $H = H_1 \otimes H_2$, their scalar product $(T^{(k)}, U^{(k)}) = \sum_q (-1)^q\, T^k_q U^k_{-q}$ has the matrix element

$(\alpha_1 \alpha_2 j_1 j_2 j m \,|\, (T^{(k)}, U^{(k)}) \,|\, \alpha_1' \alpha_2' j_1' j_2' j m) = (-1)^{j_1 + j_2' - j}\; W(j_1 j_2 j_1' j_2'; jk) \times \cdots$

For an irreducible component of the tensor product of two irreducible tensors,

$[T^{(k_1)} \otimes U^{(k_2)}]^{(k)}_q$,

the matrix can be written as

$(\alpha j_1 j_2 j \,\|\, [T^{(k_1)} \otimes U^{(k_2)}]^{(k)} \,\|\, \alpha' j_1' j_2' j') = \sqrt{(2k+1)(2j+1)(2j'+1)} \times \sum_{\alpha''} (\alpha j_1 \| T^{(k_1)} \| \alpha'' j_1')\,(\alpha'' j_2 \| U^{(k_2)} \| \alpha' j_2') \begin{Bmatrix} j_1 & j_1' & k_1 \\ j_2 & j_2' & k_2 \\ j & j' & k \end{Bmatrix}$.

The last factor, the 9j-symbol, is defined as the matrix element between basis vectors of $[D(j_1) \otimes D(j_2)] \otimes [D(j_3) \otimes D(j_4)]$ and $[D(j_1) \otimes D(j_3)] \otimes [D(j_2) \otimes D(j_4)]$:

$(j_1 j_2 (j_{12})\, j_3 j_4 (j_{34})\, jm \,|\, j_1 j_3 (j_{13})\, j_2 j_4 (j_{24})\, jm) = \sqrt{(2j_{12}+1)(2j_{34}+1)(2j_{13}+1)(2j_{24}+1)} \begin{Bmatrix} j_1 & j_2 & j_{12} \\ j_3 & j_4 & j_{34} \\ j_{13} & j_{24} & j \end{Bmatrix}$.

The 9j-symbol can be written as a weighted sum of the products of the three $W$'s.

See [6] and [7] for explicit formulas of Clebsch-Gordan coefficients and [8] for Racah coefficients.

References

[1] E. P. Wigner, Group theory and its application to the quantum mechanics of atomic spectra, Academic Press, 1959.
[2] M. E. Rose, Elementary theory of angular momentum, Wiley, 1957.
[3] U. Fano and G. Racah, Irreducible tensorial sets, Academic Press, 1959.
[4] A. R. Edmonds, Angular momentum in quantum mechanics, Princeton Univ. Press, 1959.
[5] F. Bloch, S. G. Cohen, A. De-Shalit, S. Sambursky, and I. Talmi, Spectroscopic and group-theoretical methods in physics (Racah memorial volume), North-Holland, 1968.
[6] E. U. Condon and G. H. Shortley, Theory of atomic spectra, Cambridge Univ. Press, 1935 (reprinted with corrections, 1951).
[7] M. Morita, R. Morita, T. Tsukada, and M. Yamada, Clebsch-Gordan coefficients for $j_2 = 5/2$, 3, and 7/2, Prog. Theoret. Phys., Suppl., 26 (1963), 64-74.
[8] L. C. Biedenharn, J. M. Blatt, and M. E. Rose, Some properties of the Racah and associated coefficients, Rev. Mod. Phys., 24 (1952), 249-257.

354 (XVI.5)
Random Numbers

A. General Remarks

A sequence of numbers that can be regarded as realizations of independent and identically distributed random variables is called a sequence or table of random numbers. It is a basic tool for the Monte Carlo method, simulation of stochastic phenomena in nature or in society, and sampling or randomization techniques in statistics. Random numbers used in practice are pseudorandom numbers (→ Section B); theoretically, the definition of random numbers leads to an algorithmic approach to the foundations of probability [1, 2].

B. Pseudorandom Numbers

Tables of numbers generated by random mechanisms have been statistically tested and

published. To generate random numbers on a large scale, electronic devices based on stochastic physical phenomena, such as thermoelectron noise or radioactivity, can be used. For digital computers, however, numbers generated by certain simple algorithms can be viewed practically as a sequence of random numbers; this is called a sequence of pseudorandom numbers.

The distribution of random numbers that is easily generated and suitable for general use is the continuous uniform distribution on the interval (0, 1), which is approximated by the discrete distribution on $\{0, 1, \dots, N-1\}$ ($N \gg 1$). Random numbers with distribution function $F(\cdot)$ are obtained by transforming uniform random numbers by $F^{-1}(\cdot)$. For typical distributions, computation tricks avoiding the direct computation of $F^{-1}(\cdot)$ have been devised. Among them the use of order statistics and acceptance-rejection techniques have wide applicability.

For the generation of uniform pseudorandom numbers on $\{0, 1, \dots, N-1\}$, $N = n^s$ ($s$ = a computer word length), the following algorithms are used. Each of them is written in terms of simple computer instructions. (1) The middle-square method was proposed by von Neumann. We square an integer of $s$ digits of radix (or base) $n$ and take out the middle $s$ digits as the next term. We repeat this process and obtain a sequence of pseudorandom numbers. The sequence thus generated might be cyclic with a short period, possibly after many repetitions. The lengths from initial values to the terminal cycles are empirically checked. (2) The Fibonacci sequence $\{u_k\}$ defined by $u_{k+1} = u_{k-1} + u_k \pmod{n^s}$ is apparently regular, but it is uniformly distributed. (3) The congruence method [3]: Define a sequence by $u_{k+1} = a u_k + c$ (mod $n^s$) or (mod $n^s \pm 1$). If $c = 0$, the procedure is called the multiplicative congruence method, otherwise the mixed congruence method. The cycle, that is, the minimum $k$ such that $u_k = u_0$, and the constants $a$, $c$, and $u_0$ that make the cycle maximum for given $n$ and $s$ are determined by number theory. The points $(u_{kl}, u_{kl+1}, \dots, u_{kl+l-1})$, $k = 0, 1, 2, \dots$, lie on a small number of parallel hyperplanes in the $l$-dimensional cube. Good choice of the constant $a$ makes the sequence quite satisfactory. (4) H. Weyl considered sequences $f(k) = k\alpha \pmod 1$, where $\alpha$ is an irrational number and $k = 1, 2, \dots$, whose values are uniformly distributed on the interval (0, 1). They are not independent, though they can be used for some special purposes. A modified sequence $x_k = k^2\alpha \pmod 1$ is known to be random for any irrational $\alpha$ in the sense that the serial correlation $N^{-1} \sum_{k=1}^{N} x_k x_{k+l} - 1/4$ converges to 0 uniformly in $l$ as $N \to \infty$.

C. Statistical Tests

To check uniform random numbers on (0, 1) the following tests are used: (1) Divide (0, 1) into subintervals; then the frequency of random numbers falling into these is a multinomial sample. Goodness of fit can be tested by the chi-square test; independence can be tested by observing the frequency of transitions of subintervals in which a pair of consecutive numbers falls, as well as by observing the overall properties, such as uniformity of the frequency of patterns of subintervals in which a set of random numbers falls. (2) For a set of random numbers, the distance of the empirical distribution function from the theoretical one is tested by the Kolmogorov-Smirnov test. (3) Observe the rank orders of a set of random numbers, and test the randomness of their permutations (test the number of runs up and down).

D. Kolmogorov-Chaitin Complexity and Finite Random Sequences

As Shannon's entropy is a quantity for measuring the randomness of random variables, the Kolmogorov-Chaitin complexity [4, 5] is that of individual objects based on logic instead of probability. For constructive objects $x \in X$, $y \in Y$ and a partial recursive function $A: Y \times \{1, 2, \dots\} \to X$, define

$K_A(x \mid y) = \min\{\log_2 n \mid A(y, n) = x\}$, and $K_A(x \mid y) = \infty$ if $A(y, n) = x$ for no $n$.

The function $A$ is said to be asymptotically optimal if for any $B$ there exists a constant $C$ such that $K_A(x \mid y) \le K_B(x \mid y) + C$ for any $x \in X$ and $y \in Y$. For an asymptotically optimal $A$, which is known to exist, $K_A(x \mid y)$ is simply denoted by $K(x \mid y)$ and is called the Kolmogorov-Chaitin complexity of $x$ given $y$.

P. Martin-Löf [6] discussed a relation between complexity and randomness. Consider any statistical test for the randomness on the set of (say) finite decimal sequences which is effective in the sense that it has a finite algorithm. Then there exists a constant $C$ independent of $L$ and $M$ such that

implies the acceptance of the decimal sequence $\xi_1, \dots, \xi_L$ by the test at the level $1 - 2^{-M-C}$. This condition on the complexity is satisfied by at least $(1 - 2^{-M})\,10^L$ sequences among the decimal sequences of length $L$.

E. Collective and Infinite Random Sequences

For finite sequences, the notion of randomness is obscure by nature. For infinite sequences,

however, clearer definition is possible. Based on the notion of collectives by R. von Mises, a definition of infinite random sequences has been given by A. Church [7]. A selection function is a $\{0, 1\}$-valued function $\varphi$ on the set of (say) finite decimal sequences such that $\{n;\ \varphi(\xi_1, \xi_2, \dots, \xi_{n-1}) = 1\}$ is an infinite set for any infinite sequence $\xi_1, \xi_2, \dots$. For a selection function $\varphi$ and an infinite sequence $\xi_1, \xi_2, \dots$, the $\varphi$-subsequence is defined as $\xi_{n_1}, \xi_{n_2}, \dots$, where $\{n_1 < n_2 < \cdots\} = \{n;\ \varphi(\xi_1, \dots, \xi_{n-1}) = 1\}$. For a class $\Psi$ of selection functions, an infinite decimal sequence is called a $\Psi$-collective if each of the numbers 0, 1, ..., 9 appears in it with a limiting relative frequency of 1/10, and the same thing holds for any $\varphi$-subsequence with $\varphi \in \Psi$. By definition, a random sequence is a $\Psi$-collective for the class $\Psi$ of recursive selection functions. Almost all real numbers are random in their decimal expansions.

F. Normal Numbers

Let $x - [x] = \sum_{n=1}^{\infty} x_n r^{-n}$ be the $r$-adic expansion of the fractional part of a real number $x$. For any ordered set $B_k = (b_1, \dots, b_k)$ of numbers $0, 1, \dots, r-1$, let $N_n(x, B_k)$ be the number of occurrences of the block $B_k$ in the sequence $x_1, \dots, x_n$. If $N_n(x, B_k)/n \to r^{-k}$ as $n \to \infty$ for every $k$ and every $B_k$, then $x$ is said to be normal to base $r$. Almost all real numbers are normal to any $r$. D. G. Champernowne [8] constructed a normal number given by the decimal expansion 0.12345678910111213.... No one has so far been able to prove or disprove the normality of such irrational numbers as $\pi$, $e$, $\sqrt{2}$, $\sqrt{3}$, .... W. Schmidt [9] proved that the normality to base $r$ implies the normality to base $p$ if and only if $\log r/\log p$ is rational. A real number whose decimal expansion is random in the above sense is normal to base 10. For the converse, a necessary and sufficient condition for a selection function $\varphi$ (for which $\varphi(\xi_1, \dots, \xi_L)$ depends only on $L$) to have the property that the normality implies the $\{\varphi\}$-collectiveness has been obtained in [10].

References

[1] D. E. Knuth, The art of computer programming II, second edition, Addison-Wesley, 1981, ch. 3.
[2] E. R. Sowey, A second classified bibliography on random number generation and testing, Int. Statist. Rev., 46 (1978), 89-102.
[3] D. H. Lehmer, Mathematical methods in large-scale computing units, Proc. 2nd Symp. on Large-Scale Digital Calculating Machinery, Harvard Univ. Press, 1951, 141-146.
[4] A. N. Kolmogorov, Logical basis for information theory and probability theory, IEEE Trans. Information Theory, IT-14 (1968), 662-664.
[5] G. J. Chaitin, Algorithmic information theory, IBM J. Res. Develop., 21 (1977), 350-359.
[6] P. Martin-Löf, The definition of random sequences, Information and Control, 9 (1966), 602-619.
[7] A. Church, On the concept of a random sequence, Bull. Amer. Math. Soc., 47 (1940), 130-135.
[8] L. Kuipers and H. Niederreiter, Uniform distribution of sequences, Wiley-Interscience, 1974.
[9] W. Schmidt, On normal numbers, Pacific J. Math., 10 (1960), 661-672.
[10] T. Kamae, Subsequences of normal sequences, Israel J. Math., 16 (1973), 121-149.

355 (II.10)
Real Numbers

A. Axioms for the Real Numbers

The set R of all real numbers has the following properties:

(1) Arithmetical properties: (i) For each pair of numbers $x, y \in R$, there exists one and only one number $w \in R$, called their sum and denoted by $x + y$, for which $x + y = y + x$ (commutative law) and $(x + y) + z = x + (y + z)$ (associative law) hold. Furthermore, there exists a unique number 0 (zero) such that $x + 0 = x$ for every $x \in R$ (existence of zero element). Also, for each $x$, there exists one and only one number $-x \in R$ for which $x + (-x) = 0$. (ii) For each pair of numbers $x, y \in R$, there exists one and only one number $w \in R$, called their product and denoted by $xy$, for which $xy = yx$ (commutative law), $(xy)z = x(yz)$ (associative law), and $(x + y)z = xz + yz$ (distributive law) hold. Furthermore, there exists a unique number 1 (unity) $\in R$ such that $1x = x$ for every $x \in R$ (existence of unity element). Also, for each $x \ne 0$ ($x \in R$) there exists one and only one number $x^{-1} \in R$ for which $x x^{-1} = 1$. Owing to properties (i) and (ii), all four arithmetic operations obey the usual laws (with the single exception of division by zero); in other words, R is a field.

(2) Order properties: (i) For each $x, y \in R$, one and only one of the following three relations holds: $x < y$, $x = y$, or $x > y$. With $x \le y$ meaning $x < y$ or $x = y$, the relation $\le$ obeys the transitive law: $x \le y$ and $y \le z$ imply $x \le z$, which makes R totally ordered. (ii) Order and

arithmetical properties are related by: $x < y$ implies $x + z < y + z$ for any $z \in R$, and $x < y$ and $0 < z$ imply $xz < yz$; in other words, R is an ordered field.

In particular, $x \in R$ with $x > 0$ is called a positive number, and $x \in R$ with $x < 0$ a negative number. We write $|x| = x$ if $x \ge 0$ and $|x| = -x$ if $x < 0$, and call $|x|$ the absolute value of $x$.

(3) Continuity property: If nonempty subsets $A$ and $B$ of R, with $a < b$ for each pair $a \in A$ and $b \in B$, satisfy $R = A \cup B$ and $A \cap B = \emptyset$ (empty set), then the pair $(A, B)$ of sets is called a cut of R. For each cut $(A, B)$ of R, there exists a number $x \in R$ (necessarily unique) such that for every $a \in A$, $a \le x$, and for every $b \in B$, $b \ge x$ (i.e., $x = \sup A = \inf B$). This property of R is called Dedekind's axiom of continuity (→ 294 Numbers).

The set R of all real numbers is determined uniquely up to an isomorphism, with respect to arithmetic operations and ordering, by properties (1)-(3). R forms an additive Abelian group; its subgroup $\{0, \pm 1, \pm 2, \dots, \pm n, \dots\}$ generated by 1 can be identified with the group Z of integers. The subset of all positive integers $\{1, 2, \dots, n, \dots\}$ may be identified with the set N of all natural numbers. The subset $\{m/n \mid m, n \in Z,\ n \ne 0\}$ of R forms the subfield of R generated by 1. It can be identified with the field Q of all rational numbers. A real number that is not rational is called an irrational number.

B. Properties of Real Numbers

(1) For each pair of positive numbers $a$ and $b$, there exists a natural number $n$ with $a < nb$ (Archimedes' axiom).

(2) For each pair of positive real numbers $a$ and $b$ with $a < b$, there exists a rational number $x$ such that $a < x < b$ (denseness of rational numbers).

(3) For any subset $A$ in R bounded from above (below), the least upper bound of $A$: $a = \sup A$ (greatest lower bound of $A$: $b = \inf A$) exists.

Let $\{a_n\}$ be a sequence of real numbers. Assume that for each arbitrary positive number $\varepsilon$ there exists a number $n_0$ such that $|a_n - b| < \varepsilon$ for all $n > n_0$. Then we write $\lim_{n \to \infty} a_n = b$ (or $a_n \to b$) and call $b$ the limit of $\{a_n\}$. We also say that $\{a_n\}$ is a convergent sequence or that $a_n$ converges to $b$.

(4) If for two sequences $\{a_n\}$, $\{b_n\}$, we have $a_1 \le a_2 \le \cdots \le a_n \le \cdots \le b_n \le \cdots \le b_2 \le b_1$ and $\lim (b_n - a_n) = 0$, then there exists one and only one number $c \in R$ with $\lim a_n = \lim b_n = c$ (principle of nested intervals).

(5) Let $\{a_n\}$ be a sequence of real numbers. If for any arbitrary positive $\varepsilon$ there exists a natural number $n_0$ satisfying $|a_m - a_n| < \varepsilon$ for all $m, n > n_0$, then $\{a_n\}$ is called a fundamental sequence or Cauchy sequence. Any fundamental sequence of real numbers is convergent (completeness of real numbers).

For a set with properties 1 and 2 of Section A, it can be proved that property 3 of Section A is equivalent to property 3, or properties 1 and 4, or properties 1 and 5 of this section.

C. Intervals

For two numbers $a, b \in R$ with $a < b$, we write

$(a, b) = \{x \mid a < x < b\}$,
$(a, b] = \{x \mid a < x \le b\}$,
$[a, b) = \{x \mid a \le x < b\}$,
$[a, b] = \{x \mid a \le x \le b\}$,

and call them (finite) intervals, of which $a$ and $b$ are their left and right endpoints, respectively. Specifically, $(a, b)$ is called an open interval and $[a, b]$ a closed interval. The symbols $\infty$ and $-\infty$ are introduced as satisfying $\infty > x$, $x > -\infty$, $\infty > -\infty$ for all $x \in R$. Writing $+\infty$ for $\infty$, we call $+\infty$ and $-\infty$ positive infinity and negative infinity, respectively. To extend the concept of intervals, we define $(-\infty, b) = \{x \mid x < b,\ x \in R\}$, $(-\infty, b] = \{x \mid x \le b,\ x \in R\}$, $(a, \infty) = \{x \mid a < x,\ x \in R\}$, $[a, \infty) = \{x \mid a \le x,\ x \in R\}$, and $(-\infty, \infty) = R$, and call them infinite intervals.

Let $\{a_n\}$ be a sequence of real numbers. If for each infinite interval $(a, \infty)$ ($(-\infty, a)$) there exists a number $n_0$ such that $a_n \in (a, \infty)$ ($a_n \in (-\infty, a)$) for all $n > n_0$, then we write $a_n \to +\infty$ ($a_n \to -\infty$) and call $\infty$ ($-\infty$) the limit of $a_n$, denoted as before by $\lim a_n$.

D. Topology of R

With the collection of all its open intervals $(a, b)$ as an open base, R is a topological space (order topology) that satisfies the separation axioms $T_2$, $T_3$, $T_4$. In R every (finite or infinite) interval (including R itself) is connected, and the set Q of rational numbers is dense. A necessary and sufficient condition for a subset $F$ of R to be compact is that $F$ be bounded and closed (Weierstrass's theorem). In particular, any finite closed interval is compact. R is a locally compact space satisfying the second countability axiom. Further, any (finite or infinite) open interval is homeomorphic to R. The topology of R may also be

defined by the notion of convergence (→ 87 Convergence).

Arithmetic operations in R are all continuous: If $a_n \to a$ and $b_n \to b$, then $a_n + b_n \to a + b$, $a_n - b_n \to a - b$, $a_n b_n \to ab$, and $a_n/b_n \to a/b$ (where $b \ne 0$, $b_n \ne 0$). Hence R is a topological field (regarding the characterization of R as a topological group or a topological field → 422 Topological Abelian Groups).

R as a topological Abelian group (with respect to addition) is isomorphic to the topological Abelian group $R^+$ of all positive real numbers with respect to multiplication. To be precise, there exist topological mappings $f: R \to R^+$ with $f(x + y) = f(x) f(y)$ and $g: R^+ \to R$ with $g(xy) = g(x) + g(y)$. If $f(1) = a$, $g(b) = 1$, then $f$, $g$ are uniquely determined and are written $f(x) = a^x$, $g(x) = \log_b x$.

Regarding R as a topological Abelian group (with respect to addition), any proper closed subgroup $\Gamma$ of R is discrete and isomorphic to the additive group Z of integers. That is, for some $a > 0$ we have $\Gamma = \{na \mid n \in Z\}$. In particular, the quotient group R/Z as a topological group is isomorphic to the rotation group of a circle (1-dimensional torus group). Elements of R/Z are called real numbers mod 1.

E. The Real Line

Let $l$ be a Euclidean straight line considered to lie horizontally, say from left to right. Let $p_0$, $p_1$ be two distinct points on $l$, with $p_0$ situated to the left of $p_1$. Then there exists one and only one bijection $\varphi$ from the set $L$ of all points of $l$ to R satisfying (i) $\varphi(p_0) = 0$, $\varphi(p_1) = 1$; (ii) if $p$ lies to the left of $q$, then $\varphi(p) < \varphi(q)$; and (iii) for two line segments $pq$ and $p'q'$ (where $p$ and $p'$ are to the left of $q$ and $q'$, respectively), $pq \equiv p'q'$ ($pq$ and $p'q'$ are congruent) $\Leftrightarrow$ $\varphi(q) - \varphi(p) = \varphi(q') - \varphi(p')$. Then $\varphi(p)$ is called the coordinate of the point $p$, and $(p_0, p_1)$ the frame of the line $l$. A Euclidean straight line with a fixed frame is called a real line (identified with R by the mapping $\varphi$) and is usually denoted by the same notation R or $R^1$.

References

[1] N. Bourbaki, Eléments de mathématique III. Topologie générale, ch. 4. Nombres réels, Actualités Sci. Ind., 1143c, Hermann, third edition, 1960; English translation, General topology pt. 1, Addison-Wesley, 1966.
[2] G. Cantor, Gesammelte Abhandlungen, Springer, 1932.
[3] R. Dedekind, Gesammelte mathematische Werke I-III, Braunschweig, 1930-1932.
[4] J. Dieudonné, Foundations of modern analysis, Academic Press, 1960, enlarged and corrected printing, 1969.
[5] K. Weierstrass, Gesammelte Abhandlungen 1-7, Mayer & Müller, Akademische Verlag., 1894-1927.
Also → references to 381 Sets.

356 (I.9)
Recursive Functions

A. General Remarks

A function whose domain and range are both the set of natural numbers $\{0, 1, 2, \dots\}$ is called a number-theoretic function. In this article, the term natural number is used to mean a nonnegative integer. Hilbert (1926) and K. Gödel [1] considered certain number-theoretic functions, called recursive functions by them and now called primitive recursive functions after S. C. Kleene [2] (the definition is given in Section B). Gödel introduced an efficient method of arithmetizing metamathematics based on representing certain finitary procedures in metamathematics by primitive recursive functions. Then the following problem naturally arises: How shall we define a finitary method? In other words, how shall we characterize a number-theoretic function that is effectively computable, or provided with an algorithm of computation? Gödel defined the notion of general recursive function by introducing a formal system for the elementary calculation of functions, following the suggestion given by J. Herbrand. Kleene later improved Gödel's definition and developed the theory of general recursive functions [2]. Furthermore, A. Church and Kleene defined $\lambda$-calculable functions using the $\lambda$-notation (Church [4]), and E. L. Post and A. M. Turing defined the notion of computable functions by introducing the concept of Turing machines. These notions, introduced independently and almost simultaneously, were found to be equivalent. Hence such functions are now simply called recursive functions. Here, instead of giving the definition of recursive functions in the original style (the Herbrand-Gödel-Kleene definition), we give it by utilizing the idea of introducing schemata, a natural extension of the notion of primitive recursive functions. We employ the letters $x, y, z, x_1, x_2, \dots$ for variables ranging over the natural numbers.

B. Primitive Recursive Functions

Consider the following five definition schemata:

(I) $\varphi(x) = x'$ ($= x + 1$),

(II) $\varphi(x_1, \dots, x_n) = q$ ($q$ a given natural number),

(III) $\varphi(x_1, \dots, x_n) = x_i$ ($1 \le i \le n$),

(IV) $\varphi(x_1, \dots, x_n) = \psi(\chi_1(x_1, \dots, x_n), \dots, \chi_m(x_1, \dots, x_n))$,

(V) $\varphi(0, x_2, \dots, x_n) = \psi(x_2, \dots, x_n)$, $\quad \varphi(y', x_2, \dots, x_n) = \chi(y, \varphi(y, x_2, \dots, x_n), x_2, \dots, x_n)$,

where $\psi(\ )$ is a constant natural number if $n = 1$. A function is called primitive recursive if it is definable by a finite series of applications of the operations (IV) and (V) ($\psi, \chi, \chi_1, \dots, \chi_m$ are already-introduced functions) starting from functions each of which is given by (I), (II), or (III). Given the functions $\psi_1, \dots, \psi_l$, we define the relativization (with respect to $\psi_1, \dots, \psi_l$) of the definition of primitive recursive functions as follows: A function is called primitive recursive in $\psi_1, \dots, \psi_l$ if it is definable by a finite series of applications of (IV) and (V) starting from $\psi_1, \dots, \psi_l$ and from functions each of which is given by (I), (II), or (III).

We say that a function $\varphi(x_1, \dots, x_n)$ is the representing function of a predicate $P(x_1, \dots, x_n)$ if $\varphi$ takes only 0 and 1 as values and satisfies

$P(x_1, \dots, x_n) \Leftrightarrow \varphi(x_1, \dots, x_n) = 0$.

Then we call $P$ a primitive recursive predicate if its representing function $\varphi$ is primitive recursive. The following functions and predicates are examples of primitive recursive ones: $a + b$, $a \cdot b$, $a^b$, $a!$, $\min(a, b)$, $\max(a, b)$, $|a - b|$, $a = b$, $a < b$, $a \mid b$ ($a$ divides $b$), $\Pr(a)$ ($a$ is a prime number), $p_i$ (the $(i+1)$st prime number, $p_0 = 2$, $p_1 = 3$, ...), $(a)_i$ (the exponent of $p_i$ of the unique factorization of $a$ into prime numbers if $a \ne 0$; otherwise, 0).

Whenever we are given a concept or a theorem, we always transform it by replacing the predicates contained in it (if any) by corresponding representing functions. Then an operation $\Omega$ is called primitive recursive if the function or the predicate $\Omega(\psi_1, \dots, \psi_l, Q_1, \dots, Q_m)$ that results from the application of $\Omega$ to functions $\psi_1, \dots, \psi_l$ and predicates $Q_1, \dots, Q_m$ ($l, m \ge 0$, $l + m > 0$) is primitive recursive in $\psi_1, \dots, Q_m$. Put $\varphi(x_1, \dots, x_n, z) = \sum_{y<z} \psi(x_1, \dots, x_n, y)$. Then $\varphi$ is primitive recursive in $\psi$, and the finite sum $\sum_{y<z}$ is a primitive recursive operation in this sense. Similarly, the following operations are primitive recursive: the finite product $\prod_{y<z}$, the logical connectives $\neg$, $\vee$, $\wedge$, $\to$ (→ 411 Symbolic Logic), definitions by cases, the bounded quantifiers $(\exists y)_{y<z}$, $(\forall y)_{y<z}$, and the bounded $\mu$-operator $\mu y_{y<z}$ defined as follows: $\mu y_{y<z} R(x, y)$ is the least $y$ such that $y < z$ and $R(x, y)$ holds, if there exists such a number $y$; otherwise, it equals $z$. The following operation is also primitive recursive:

$\varphi(y, x_2, \dots, x_n) = \chi(y, \tilde\varphi(y; x_2, \dots, x_n), x_2, \dots, x_n)$,

where

$\tilde\varphi(y; x_2, \dots, x_n) = \prod_{i<y} p_i^{\varphi(i, x_2, \dots, x_n)}$.

A function $\varphi$ is said to be primitive recursive uniformly in $\psi_1, \dots, \psi_l$ when $\varphi$ is definable by applying a primitive recursive operation to $\psi_1, \dots, \psi_l$.

Almost all results mentioned in this section were given by Gödel [1]. There are further investigations on primitive recursive functions by R. Péter (1934), R. Robinson (1947), and others. Note that a function definable by a double recursion is not necessarily primitive recursive. Péter (1935, 1936) investigated in detail functions that are definable, in general, by $k$-fold recursions for every positive integer $k$ [7].

C. General Recursive Functions

The following $\mu$-operator is used to define general recursive functions by extending primitive recursive functions. For a predicate $R(y)$ on the natural numbers, $\mu y R(y)$ is the least $y$ such that $R(y)$, if $\exists y R(y)$; otherwise, $\mu y R(y)$ is undefined. Generally, $\mu y(\psi(x_1, \dots, x_n, y) = 0)$ is not necessarily defined for each $n$-tuple $(x_1, \dots, x_n)$ of natural numbers. Now, a function is called a general recursive function (or simply recursive function) if it is definable by a series of applications of schemata including a new schema

(VI) $\varphi(x_1, \dots, x_n) = \mu y(\psi(x_1, \dots, x_n, y) = 0)$

for the definition of $\varphi$ from any function $\psi$ that satisfies

$\forall x_1 \cdots \forall x_n \exists y\,(\psi(x_1, \dots, x_n, y) = 0)$,

in addition to those used to define the primitive recursive functions. Thus, by definition, a primitive recursive function is general recursive. A general recursive predicate is a predicate such that its representing function is general recursive. The facts, including the ones concerning relativization, that are valid for primitive recursive functions are also valid for general recursive functions.

Kleene's Normal Form Theorem. For each $n$, we can construct a primitive recursive predicate $T_n(z, x_1, \dots, x_n, y)$ and a primitive recursive function $U(y)$ such that given any general recursive function $\varphi(x_1, \dots, x_n)$, a natural number $e$ can be found such that

$\forall x_1 \cdots \forall x_n \exists y\, T_n(e, x_1, \dots, x_n, y)$, (1)

$\varphi(x_1, \dots, x_n) = U(\mu y\, T_n(e, x_1, \dots, x_n, y))$. (2)

Any natural number $e$ for which (1) and (2) hold is said to define $\varphi$ recursively or to be a Gödel number of a recursive function $\varphi$. Let $\psi_1, \dots, \psi_l$ (abbreviated $\Psi$) be any given functions. We can relativize Kleene's normal form theorem with respect to them as follows: For each $n$, we can construct a predicate $T_n^{\Psi}(z, x_1, \dots, x_n, y)$ that is primitive recursive in $\Psi$ such that given any function $\varphi$ that is general recursive in $\Psi$, a natural number $e$ can be found such that

$\forall x_1 \cdots \forall x_n \exists y\, T_n^{\Psi}(e, x_1, \dots, x_n, y)$, (3)

$\varphi(x_1, \dots, x_n) = U(\mu y\, T_n^{\Psi}(e, x_1, \dots, x_n, y))$, (4)

where $U(y)$ is the primitive recursive function mentioned in Kleene's normal form theorem. A natural number $e$ for which (3) and (4) hold is said to define $\varphi$ recursively in $\Psi$ or to be a Gödel number of $\varphi$ from $\Psi$. In particular, a Gödel number $e$ of $\varphi$ from $\Psi$ can be found independently of $\Psi$ (except for $l$ and the respective numbers of variables of $\psi_1, \dots, \psi_l$) when $\varphi$ is general recursive uniformly in $\Psi$.

Now let $S$ be a formal system containing ordinary number theory. A number-theoretic predicate $P(x_1, \dots, x_n)$ is said to be decidable within $S$ if there is a formula $\mathbf{P}(a_1, \dots, a_n)$ (with no free variables other than the distinct variables $a_1, \dots, a_n$) of $S$ such that for each $n$-tuple $(x_1, \dots, x_n)$ of natural numbers (the symbol $\vdash$ means provable in $S$),

(i) $\vdash \mathbf{P}(\bar{x}_1, \dots, \bar{x}_n)$ or $\vdash \neg \mathbf{P}(\bar{x}_1, \dots, \bar{x}_n)$

and

(ii) $P(x_1, \dots, x_n) \Leftrightarrow\ \vdash \mathbf{P}(\bar{x}_1, \dots, \bar{x}_n)$,

where $\bar{x}_1, \dots, \bar{x}_n$ designate the numerals corresponding to $x_1, \dots, x_n$ in $S$. If $S$ is a consistent system such that primitive recursive predicates are decidable within $S$ and the predicates $\mathrm{Pf}_A$ (for any formula $A$, $\mathrm{Pf}_A(x_1, \dots, x_n, y)$ means that $y$ is the Gödel number of a proof of $A(\bar{x}_1, \dots, \bar{x}_n)$) are primitive recursive, then a necessary and sufficient condition for $P$ to be decidable within $S$ is that $P$ is a general recursive predicate (A. Mostowski, 1947).

Church (1936) proposed the following statement: Every effectively calculable function is a general recursive function. The converse of this is evidently true by the definition of recursiveness. So Church's thesis and its converse provide the exact definition of the notion of effectively computable functions. Though this notion is somewhat vague and intuitive, the definition seems to be satisfactory, as mentioned at the beginning of this article. Therefore, any function with a computation procedure or algorithm can be assumed to be general recursive. Utilizing this, various decision problems have been negatively solved (→ 97 Decision Problem). Furthermore, traditional descriptive set theory can be reinvestigated from this point of view, and the concept of effectiveness used in semi-intuitionism is clarified using general recursive functions (→ 22 Analytic Sets).

D. Recursive Enumerability

A set $\{\varphi(0), \varphi(1), \varphi(2), \dots\}$ enumerated by a general recursive function $\varphi$ (allowing repetitions) is called a recursively enumerable set. The empty set is also considered recursively enumerable. It is known that in this definition "general recursive" can be replaced by "primitive recursive" (J. B. Rosser, 1936). A set $E$ of natural numbers is recursively enumerable if and only if there is a primitive recursive predicate $R(x, y)$ such that $x \in E \Leftrightarrow \exists y R(x, y)$ (Kleene [2]).

Generally, a predicate $E(x_1, \dots, x_n)$ is called a recursively enumerable predicate if there is a general recursive predicate $R(x_1, \dots, x_n, y_1, \dots, y_m)$ such that $E(x_1, \dots, x_n) \Leftrightarrow \exists y_1 \cdots \exists y_m R(x_1, \dots, x_n, y_1, \dots, y_m)$. (Here "general recursive" can be replaced by "primitive recursive.")

We call a set $E$ a recursive set if the predicate $x \in E$ is general recursive. The set $C = \{x \mid \exists y\, T_1(x, x, y)\}$ is an example of a set that is recursively enumerable but not recursive, and it has the following remarkable property: For every recursively enumerable set $E$, there is a primitive recursive function $\varphi$ such that $x \in E \Leftrightarrow \varphi(x) \in C$. In this sense, the set $C$ is said to be complete for the class of recursively enumerable sets. Post's problem, which asked whether the sets that are recursively enumerable but not recursive have the same degree of (recursive) unsolvability as that of $C$, was negatively solved simultaneously by R. M. Friedberg (1957) and A. A. Muchnik (1956-1958). A recursively enumerable set $E$ is general recursive if and only if there is a general recursive predicate $R(x, y)$ such that $x \notin E \Leftrightarrow \exists y R(x, y)$ (Kleene [5], Post [6]).

E. Partial Recursive Functions

A function $\varphi(x_1, \ldots, x_n)$ is called a partial function if it is not necessarily defined for all $n$-tuples $(x_1, \ldots, x_n)$ of natural numbers. For two partial functions $\psi(x_1, \ldots, x_n)$ and $\chi(x_1, \ldots, x_n)$, $\psi(x_1, \ldots, x_n) \simeq \chi(x_1, \ldots, x_n)$ means that if either $\psi(x_1, \ldots, x_n)$ or $\chi(x_1, \ldots, x_n)$ is defined for $x_1, \ldots, x_n$, so is the other, and the values are the same. For any given natural number $e$, $\varphi(x_1, \ldots, x_n) \simeq U(\mu y\, T_n(e, x_1, \ldots, x_n, y))$ (or $\simeq U(\mu y\, T_n^{\Psi}(e, x_1, \ldots, x_n, y))$) is a partial function, in general. We say that such a function is partial recursive (partial recursive (uniformly) in $\Psi$) and that a natural number $e$ defines $\varphi$ recursively ((uniformly) in $\Psi$) or is a Gödel number of a partial recursive function $\varphi$ (from $\Psi$). When a natural number $e$ is a Gödel number of $\varphi(x_1, \ldots, x_n)$ (a Gödel number of $\varphi$ from $\Psi$), $\varphi(x_1, \ldots, x_n)$ is sometimes written as $\{e\}(x_1, \ldots, x_n)$ ($\{e\}^{\Psi}(x_1, \ldots, x_n)$). If a predicate $R(x_1, \ldots, x_n, y)$ is general recursive, then $\mu y\, R(x_1, \ldots, x_n, y)$ is partial recursive. Therefore, $\{z\}(x_1, \ldots, x_n)$ is a partial recursive function of the variables $z$ and $x_1, \ldots, x_n$.

On the partial recursive functions, the following two theorems, given by Kleene [3], are most useful. (1) For natural numbers $m$, $n$, a primitive recursive function $S_n^m(z, y_1, \ldots, y_m)$ can be found such that, for any natural number $e$, $\{e\}(y_1, \ldots, y_m, x_1, \ldots, x_n) \simeq \{S_n^m(e, y_1, \ldots, y_m)\}(x_1, \ldots, x_n)$. (2) For any partial recursive function $\psi(z, x_1, \ldots, x_n)$, a natural number $e$ can be found such that $\{e\}(x_1, \ldots, x_n) \simeq \psi(e, x_1, \ldots, x_n)$.

The notion of partial recursive functions appeared first in the theory of constructive ordinal numbers of Church and Kleene (1936). Partial recursive functions can be defined in the Herbrand-Gödel-Kleene style as a natural extension of general recursive functions, and they are also definable by a finite series of applications of the schemata (IV), (V), and (VI) (in each schema, the $=$ used for the definition of $\varphi$ should be replaced by $\simeq$) starting from functions given by (I), (II), and (III).

F. Extension of Recursive Functions to Number-Theoretic Functionals

Let $\alpha_1, \ldots, \alpha_m$ be number-theoretic functions of one variable. If $\varphi(x_1, \ldots, x_n)$ is (partial) recursive uniformly in $\alpha_1, \ldots, \alpha_m$, then a Gödel number $e$ of $\varphi$ is found independently of $\alpha_1, \ldots, \alpha_m$, and $\varphi(x_1, \ldots, x_n)$ is expressed as $U(\mu y\, T_n^{\alpha_1, \ldots, \alpha_m}(e, x_1, \ldots, x_n, y))$. Now suppose that $\alpha_1, \ldots, \alpha_m$ range over the set $N^N$ of all number-theoretic functions of one variable, and put

$$\varphi(\alpha_1, \ldots, \alpha_m, x_1, \ldots, x_n) = U(\mu y\, T_n^{\alpha_1, \ldots, \alpha_m}(e, x_1, \ldots, x_n, y)).$$

We call such a functional $\varphi(\alpha_1, \ldots, \alpha_m, x_1, \ldots, x_n)$ (partial) recursive, and with it we can develop a theory of recursive functions of variables of two types.

Extending the notion of recursive functionals, Kleene introduced and investigated the recursive functionals of variables of arbitrary (finite) types [10, 11]. The natural numbers are the objects of type 0, and the one-place functions from type-$j$ objects to natural numbers are objects of type $j+1$. Denote variables ranging over the type-$j$ objects by $\alpha^j, \beta^j, \gamma^j, \ldots$, or $\alpha_1^j, \alpha_2^j, \alpha_3^j, \ldots$, etc. Consider a functional (simply called a function) of a given finite number of such variables of types taking natural numbers as values. A function $\varphi$ is called a primitive recursive function if it is definable by a finite series of applications of the following schemata (I)-(VIII), where $a$ is a variable of type 0, $\mathfrak{b}$ is any list (possibly empty) of variables that are mutually distinct and different from the other variables of the schema, and $\psi$, $\chi$ are given functions of the indicated variables.

(I) $\varphi(a, \mathfrak{b}) = a'$;
(II) $\varphi(\mathfrak{b}) = q$ ($q$ is a natural number);
(III) $\varphi(a, \mathfrak{b}) = a$;
(IV) $\varphi(\mathfrak{b}) = \psi(\chi(\mathfrak{b}), \mathfrak{b})$;
(V) $\varphi(0, \mathfrak{b}) = \psi(\mathfrak{b})$, $\varphi(a', \mathfrak{b}) = \chi(a, \varphi(a, \mathfrak{b}), \mathfrak{b})$;
(VI) $\varphi(\mathfrak{a}) = \psi(\mathfrak{a}_1)$ ($\mathfrak{a}_1$ is a list of variables from which $\mathfrak{a}$ is obtained by changing the order of two variables of the same type);
(VII) $\varphi(\alpha^1, a, \mathfrak{b}) = \alpha^1(a)$;
(VIII) $\varphi(\alpha^j, \mathfrak{b}) = \alpha^j(\lambda\alpha^{j-2}\, \chi(\alpha^j, \alpha^{j-2}, \mathfrak{b}))$ ($\lambda\alpha^{j-2}$ designates that $\chi$ is regarded as a function of the variable $\alpha^{j-2}$).

We assign to each function $\varphi(\mathfrak{a})$ a natural number called an index (which plays the same role as a Gödel number) in such a way that it reflects the manner of application of the schemata used to introduce $\varphi(\mathfrak{a})$. Now, we write $\varphi(\mathfrak{a})$ with an index $e$ as $\{e\}(\mathfrak{a})$. We call a function $\varphi(\mathfrak{a})$ partial recursive if it is definable by a finite series of applications of the schemata (I)-(VIII) ($\simeq$ is employed instead of $=$ in (IV)-(VI) and (VIII)), and (IX) $\varphi(a, \mathfrak{b}, \mathfrak{c}) \simeq \{a\}(\mathfrak{b}, \mathfrak{c})$ ($\mathfrak{c}$ is a finite list of variables of arbitrary types). In particular, $\varphi(\mathfrak{a})$ is called a general recursive function if it is defined for all values of the argument $\mathfrak{a}$. These notions can be relativized also with respect to any given functions. Note that for the case of types $\le 1$, primitive recursive functions, partial recursive functions, and also general recursive functions in the present sense are equivalent to the corresponding notions (introduced via relativization with uniformity) in the ordinary sense already described. The following theorem is important: Let $r$ be the maximum type of $\mathfrak{a}$. Then there

are two primitive recursive predicates $M$, $N$ such that

$$\{e\}(\mathfrak{a}) \simeq w \Leftrightarrow \forall\xi^{r-1}\, \exists\eta^{r-2}\, M(e, \mathfrak{a}, w, \xi^{r-1}, \eta^{r-2}), \quad r \ge 2,$$
$$\{e\}(\mathfrak{a}) \simeq w \Leftrightarrow \exists\xi^{r-1}\, \forall\eta^{r-2}\, N(e, \mathfrak{a}, w, \xi^{r-1}, \eta^{r-2}), \quad r > 2.$$

Every function definable using (IX') $\varphi(\mathfrak{a}) \simeq \mu x(\psi(\mathfrak{a}, x) = 0)$ instead of (IX) is partial recursive. However, not all the partial recursive functions of variables of types $\ge 2$ can be obtained by applying schemata (I)-(VIII) and (IX').

Further developments have been pursued by J. E. Fenstad, J. Moldestad, and others in abstract computation theory [20-23].

G. Recursive Functions of Ordinal Numbers

G. Takeuti introduced a notion of primitive recursiveness for functions from a segment of the ordinal numbers to ordinal numbers. Using this, he constructed a model of set theory in ordinal number theory. In connection with recursive functions of ordinal numbers, there are also investigations by A. Lévy, M. Machover, Takeuti and A. Kino, T. Tugué, S. Kripke, and others.

Early treatments of recursive functions of ordinal numbers dealt only with functions on infinite cardinals. For example, Takeuti considered functions with a fixed infinite cardinal $\kappa$ as a domain and a range, and defined $\kappa$-recursive functions using schemata similar to the abovementioned (I)-(VI). Subsequently Kripke observed that the assumption that $\kappa$ is a cardinal is not necessary, and introduced the notion of admissible ordinals. An admissible ordinal $\kappa$ has the closure properties required for the construction of the calculus, and whenever $\alpha, \beta < \kappa$ and $\beta = f(\alpha)$ is computable, then $\beta = f(\alpha)$ is computable in fewer than $\kappa$ stages. Given an admissible ordinal $\kappa$, $\kappa$-recursiveness can be defined, as in the case of general recursiveness, by various equivalent methods, e.g., schemata, the equation calculus, and definability in both quantifier forms. Most of the elementary properties of general recursive functions (e.g., the normal form theorem, parametrization theorem, enumeration theorem, etc.) are also valid for $\kappa$-recursive functions. The notions of degrees of unsolvability and recursive enumerability can also be generalized, yielding the notions of $\kappa$-degrees and $\kappa$-recursive enumerability, respectively. The fine structures of these properties are currently the objects of intensive research.

Every infinite cardinal is admissible. The least admissible ordinal is $\omega$, and the next is the ordinal $\omega_1$ of Church and Kleene, i.e., the first nonconstructive ordinal. In fact, for every $n \ge 1$, the first ordinal not expressible as the order type of a $\Delta_n^1$ predicate is admissible (→ Section H). For each infinite cardinal $\kappa$ there are $\kappa^+$ admissible ordinals of power $\kappa$. Platek investigated recursion theory in a still wider setting. He dealt with functions defined, not on a segment of ordinal numbers, but on a set, and introduced the notion of admissible sets, i.e., sets on which a well-behaved recursion theory can be developed. An admissible set is a transitive $\in$-model of a certain weak set theory, and an ordinal $\kappa$ is admissible if and only if there exists an admissible set $A$ such that $A \cap O_n = \kappa$, where $O_n$ is the class of all ordinal numbers.

Recent developments have shown that generalized recursion theory, set theory, and infinitary logic are closely related. In addition to the abovementioned, there are some investigations by Y. N. Moschovakis and others [14-27].

H. Hierarchies

Utilizing the theory of recursive functions, S. C. Kleene succeeded in establishing a theory of hierarchies that essentially contains classical descriptive set theory as an extreme case [5, 10, 31, 32]. Although research following a similar line had also been done by M. Davis, A. Mostowski, and others, it was Kleene who succeeded in bringing the theory to its present form.

Sets or functions are described by predicates, which we classify as follows. Let $a, b, \ldots, a_1, a_2, \ldots, x, y, \ldots$ be variables ranging over the set $N$ of natural numbers, and $\alpha, \beta, \ldots, \alpha_1, \alpha_2, \ldots, \xi, \eta, \ldots$ be variables ranging over the set $N^N$ of all number-theoretic functions with one argument. Let $\psi_1, \ldots, \psi_l$ ($l \ge 0$) be number-theoretic functions. A predicate $P(\alpha_1, \ldots, \alpha_m, a_1, \ldots, a_n)$ ($m, n \ge 0$, $m + n > 0$) with variables of two types is called analytic in $\psi_1, \ldots, \psi_l$ ($l \ge 0$) if it is expressible syntactically by applying a finite number of logical symbols $\to$, $\lor$, $\land$, $\lnot$, $\exists x$, $\forall x$, $\exists\xi$, $\forall\xi$ to general recursive predicates in $\psi_1, \ldots, \psi_l$. In particular, when $P$ is expressible without function quantifiers $\exists\xi$, $\forall\xi$, it is called arithmetical in $\psi_1, \ldots, \psi_l$ ($l \ge 0$). When $l = 0$, they are called simply analytic and arithmetical, respectively.

For brevity, consider the case $l = 0$, and denote by $\mathfrak{a}$ a finite list of variables $(\alpha_1, \ldots, \alpha_m, a_1, \ldots, a_n)$. Every arithmetical predicate $P(\mathfrak{a})$ is expressible in a form contained

in the following table (a):

(a) $R(\mathfrak{a})$;
    $\exists x\, R(\mathfrak{a}, x)$, $\forall x\, \exists y\, R(\mathfrak{a}, x, y)$, ...,
    $\forall x\, R(\mathfrak{a}, x)$, $\exists x\, \forall y\, R(\mathfrak{a}, x, y)$, ...,

where each $R$ is general recursive. In order to obtain such an expression we first transform the given predicate into its prenex normal form and then contract successive quantifiers of the same kind by the formula

$$\exists x_0 \cdots \exists x_{n-1}\, A(x_0, \ldots, x_{n-1}) \Leftrightarrow \exists x\, A((x)_0, \ldots, (x)_{n-1}) \tag{1}$$

and its "dual form." Each form in (a) (or the class of all predicates with that form) is denoted by $\Sigma_k^0$ or $\Pi_k^0$, where the suffix $k$ refers to the number of quantifiers prefixed, and $\Sigma$ or $\Pi$ shows that the outermost quantifier is existential or universal, respectively. A predicate that is expressible in both forms $\Sigma_k^0$ and $\Pi_k^0$ (or the class of such predicates) is denoted by $\Delta_k^0$. A predicate belongs to $\Delta_1^0$ if and only if it is general recursive (an analog of Suslin's theorem).

For $k \ge 1$, there exists in $\Sigma_k^0$ (or $\Pi_k^0$) an enumerating predicate that specifies every predicate in $\Sigma_k^0$ ($\Pi_k^0$). For example, for $\Pi_2^0$ and $m = n = 1$, there is a primitive recursive predicate $S(\alpha, z, a, x, y)$ such that, given a general recursive predicate $R(\alpha, a, x, y)$, we have a natural number $e$ such that

$$\forall x\, \exists y\, R(\alpha, a, x, y) \Leftrightarrow \forall x\, \exists y\, S(\alpha, e, a, x, y)$$

(enumeration theorem). In this theorem, we can take $T_2^{\alpha}(z, a, x, y)$ (→ Section F) as $S(\alpha, z, a, x, y)$. For each $k \ge 0$, there exists a $\Sigma_{k+1}^0$ ($\Pi_{k+1}^0$) predicate that is not expressible in its dual form $\Pi_{k+1}^0$ ($\Sigma_{k+1}^0$) (hence, of course, in neither $\Sigma_k^0$ nor $\Pi_k^0$) (hierarchy theorem). Therefore, table (a) gives the classification of the arithmetical predicates in a hierarchy. This hierarchy is called the arithmetical hierarchy. For each $k \ge 1$, there exists a complete predicate with respect to $\Sigma_k^0$ ($\Pi_k^0$), that is, a $\Sigma_k^0$ ($\Pi_k^0$) predicate with only one variable such that any $\Sigma_k^0$ ($\Pi_k^0$) predicate is expressible by substituting a suitable general (or more strictly, primitive) recursive function for its variable (theorem on complete form). When $m = 0$, the predicates that are general recursive in $\Sigma_k^0$ predicates exhaust $\Delta_{k+1}^0$ (Post's theorem).

Concerning the function quantifiers, we have

$$\exists\alpha_0 \cdots \exists\alpha_{m-1}\, A(\alpha_0, \ldots, \alpha_{m-1}) \Leftrightarrow \exists\alpha\, A(\lambda t\,(\alpha(t))_0, \ldots, \lambda t\,(\alpha(t))_{m-1}), \tag{2}$$
$$\exists x\, A(x) \Leftrightarrow \exists\alpha\, A(\alpha(0)), \tag{3}$$
$$\forall x\, \exists\alpha\, A(x, \alpha) \Leftrightarrow \exists\alpha\, \forall x\, A(x, \lambda t\, \alpha(2^x \cdot 3^t)), \tag{4}$$

and their dual forms. For any general recursive predicate $R$, there is a primitive recursive predicate $S$ such that

$$\exists\alpha\, R(\mathfrak{a}, \alpha) \Leftrightarrow \exists\alpha\, \exists x\, S(\bar{\alpha}(x), \mathfrak{a}) \Leftrightarrow \exists y\, S(y, \mathfrak{a}) \tag{5}$$

and its dual hold. Using these facts, we can classify the forms of all analytic predicates by the table (b):

(b) $A(\mathfrak{a})$;
    $\forall\alpha\, \exists x\, R(\mathfrak{a}, \alpha, x)$, $\exists\alpha\, \forall\beta\, \exists x\, R(\mathfrak{a}, \alpha, \beta, x)$, ...,
    $\exists\alpha\, \forall x\, R(\mathfrak{a}, \alpha, x)$, $\forall\alpha\, \exists\beta\, \forall x\, R(\mathfrak{a}, \alpha, \beta, x)$, ...,

where $A$ is arithmetical and each $R$ is general recursive. Similarly, denote by $\Sigma_k^1$, $\Pi_k^1$ each form of predicate in (b) (or the class of all predicates reducible to that form), where $k$ is the number of function quantifiers prefixed; also, denote by $\Delta_k^1$ the (class of) predicates expressible in both forms $\Sigma_k^1$ and $\Pi_k^1$. For $\Sigma_k^1$, $\Pi_k^1$ ($k \ge 1$), we have the enumeration theorem, the hierarchy theorem, and the theorem on complete form. The hierarchy given by table (b) is called the analytic hierarchy.

For $l > 0$ (namely, when predicates are arithmetical or analytic in $\psi_1, \ldots, \psi_l$), we can uniformly relativize the above results with respect to $\psi_1, \ldots, \psi_l$. Now let $\{\Sigma_k^{r,\psi_1,\ldots,\psi_l}, \Pi_k^{r,\psi_1,\ldots,\psi_l}\}_k$ ($r = 0, 1$) be the corresponding hierarchy relative to $\psi_1, \ldots, \psi_l$. Given a set $C$ ($\subset N^N$) of functions with one argument, we can consider hierarchies of predicates which are arithmetical or analytic in a finite number of functions in $C$. Such a hierarchy is called a $C$-arithmetical or $C$-analytic hierarchy and denoted by $\{\Sigma_k^0[C], \Pi_k^0[C]\}_k$ or $\{\Sigma_k^1[C], \Pi_k^1[C]\}_k$, respectively. That is, when we regard $\Sigma_k^r[C]$ as a class of predicates (or sets) $P$, it is the family $\{P \mid P \in \Sigma_k^{r,\xi_1,\ldots,\xi_l};\ \xi_1, \ldots, \xi_l \in C,\ l = 0, 1, 2, \ldots\}$. These notations have been given by J. W. Addison [28, 29]. The $N^N$-arithmetical hierarchy and the $N^N$-analytic hierarchy for sets correspond respectively to the finite Borel hierarchy and the projective hierarchy in the space of irrational numbers. Addison called the theory of those hierarchies classical descriptive set theory, and in contrast to this, the theory of arithmetical and analytic hierarchies for sets ($C = \emptyset$) effective descriptive set theory [28].

We now restrict our consideration to predicates for natural numbers (i.e., to the case $m = 0$). Define the predicates $L_k$ by $L_0(a) \Leftrightarrow a = a$, $L_{k+1}(a) \Leftrightarrow \exists x\, T_1^{L_k}(a, a, x)$. For each $k \ge 0$, $L_{k+1}(a)$ is a $\Sigma_{k+1}^0$ predicate which is of the highest degree of recursive unsolvability among the $\Sigma_{k+1}^0$ predicates, and its degree is properly higher than that of $L_k(a)$. Thus $L_k$, $k = 0, 1, 2, \ldots$, determine the arithmetical hierarchy of degrees of recursive unsolvability. Kleene has extended the series of $L_k$ by using

the system $S_1$ (→ 81 Constructive Ordinal Numbers) of notations for the constructive ordinal numbers as follows [6]: $H_1(a) \Leftrightarrow a = a$; for $y \in O$, $H_{2^y}(a) \Leftrightarrow \exists x\, T_1^{H_y}(a, a, x)$; for $3 \cdot 5^y \in O$, $H_{3 \cdot 5^y}(a) \Leftrightarrow H_{y_{(a)_1}}((a)_0)$, where $y_n = \{y\}(n_O)$. This $H_y$ is defined for each $y \in O$, and it is of a properly higher degree than that of $H_z$ when $z <_O y$. If $|y| = |z|$ ($|y|$ is the ordinal number represented by $y$), then $H_y$ and $H_z$ are of the same degree (C. Spector [34]). Thus a hierarchy of degrees is uniquely determined by constructive ordinal numbers. This hierarchy is called the hyperarithmetical hierarchy of degrees of recursive unsolvability. A function or predicate is said to be hyperarithmetical if it is recursive in $H_y$ for some $y \in O$. These concepts and the results mentioned below can be relativized to any given functions or predicates.

A necessary (Kleene [31]) and sufficient (Kleene [32]) condition for a predicate to be hyperarithmetical is that it be expressible in both one-function quantifier forms $\Delta_1^1$ (an effective version of Suslin's theorem). Denote by Hyp the set ($\subset N^N$) of all hyperarithmetical functions $\alpha$. For an arithmetical predicate $A(a, \alpha)$, $\exists\alpha_{\alpha \in \mathrm{Hyp}}\, A(a, \alpha)$ is always a $\Pi_1^1$ predicate (Kleene [33]). Conversely, for any $\Pi_1^1$ predicate $P$, there is a general recursive predicate $R$ such that $P(a) \Leftrightarrow \exists\alpha_{\alpha \in \mathrm{Hyp}}\, \forall x\, R(a, \alpha, x)$ (Spector [35]). As to uniformization, for a $\Pi_1^1$ predicate $P(a, b)$, we have $\forall x\, \exists y\, P(x, y) \Rightarrow \exists\alpha_{\alpha \in \mathrm{Hyp}}\, \forall x\, P(x, \alpha(x))$ (G. Kreisel, 1962). Let $E$ be an object of type 2 defined by: $E(\alpha) = 0$ if $\exists x(\alpha(x) = 0)$, otherwise $E(\alpha) = 1$. A function $\varphi(a_1, \ldots, a_n)$ is hyperarithmetical if and only if it is general recursive in $E$ (Kleene [10]). A predicate that is hyperarithmetical relative to $\Pi_k^1$ predicates ($k \ge 0$) is of $\Delta_{k+1}^1$ (Kleene [32]), but the converse does not hold in general (Addison and Kleene, 1957).

Kleene extended his theory of hierarchy to the case of predicates of variables of any type by utilizing the theory of general recursive functions with variables of finite types $0, 1, 2, \ldots$ [10]. Let $\mathfrak{a}^t$ be a list of variables of types $\le t$. We say a predicate $P(\mathfrak{a}^t)$ is of order $r$ in completely defined functions $\psi_1, \ldots, \psi_l$ ($l \ge 0$) (for brevity, denote them by $\Psi$) if $P$ is syntactically expressible in terms of variables of finite types, predicates that are general recursive in $\Psi$, and symbols of the predicate calculus with quantification consisting only of variables of types $< r$. The predicates of order 0 in $\Psi$ are exactly the general recursive ones in $\Psi$. When $t \ge 1$, and $\Psi$ are functions of variables of type 0, a predicate $P(\mathfrak{a}^t)$ is of order 1 (of order 2) in $\Psi$ if and only if $P$ is arithmetical (analytic).

We have theorems similar to (2)-(4) and the following theorem and its dual for $r \ge 2$: For any given general recursive predicate $P(\mathfrak{a}^r, \sigma^{r-1}, \xi^{r-2})$, there is a primitive recursive predicate $R(\mathfrak{a}^r, \eta^{r-1}, \xi^{r-2})$ such that

$$\exists\sigma^{r-1}\, \forall\xi^{r-2}\, P(\mathfrak{a}^r, \sigma^{r-1}, \xi^{r-2}) \Leftrightarrow \exists\eta^{r-1}\, \forall\xi^{r-2}\, R(\mathfrak{a}^r, \eta^{r-1}, \xi^{r-2}). \tag{6}$$

Using these equivalences, each predicate $P(\mathfrak{a}^t)$ of order $r + 1$ ($r > 0$) is expressible in one of the following forms:

(c) $B(\mathfrak{a})$;
    $\forall\alpha^r\, \exists\xi^{r-1}\, R(\mathfrak{a}, \alpha^r, \xi^{r-1})$, $\exists\alpha^r\, \forall\beta^r\, \exists\xi^{r-1}\, R(\mathfrak{a}, \alpha^r, \beta^r, \xi^{r-1})$, ...,
    $\exists\alpha^r\, \forall\xi^{r-1}\, R(\mathfrak{a}, \alpha^r, \xi^{r-1})$, $\forall\alpha^r\, \exists\beta^r\, \forall\xi^{r-1}\, R(\mathfrak{a}, \alpha^r, \beta^r, \xi^{r-1})$, ...,

where $B$ is of order $r$ and each $R$ is general recursive. When $t = r + 1$, table (c) gives the classification of the predicates of order $r + 1$ into the hierarchy. In fact, for the predicates $P(\mathfrak{a}^{r+1})$ in each form, we have the enumeration theorem, the hierarchy theorem, and the theorem on complete form (Kleene [10]). D. A. Clarke [30] has published a detailed review of the general theory of hierarchies.

References

[1] K. Gödel, Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I, Monatsh. Math. Phys., 38 (1931), 173-198.
[2] S. C. Kleene, General recursive functions of natural numbers, Math. Ann., 112 (1936), 727-742.
[3] S. C. Kleene, On notation for ordinal numbers, J. Symbolic Logic, 3 (1938), 150-155.
[4] A. Church, The calculi of lambda-conversion, Ann. Math. Studies, Princeton Univ. Press, 1941.
[5] S. C. Kleene, Recursive predicates and quantifiers, Trans. Amer. Math. Soc., 53 (1943), 41-73.
[6] E. L. Post, Recursively enumerable sets of positive integers and their decision problems, Bull. Amer. Math. Soc., 50 (1944), 284-316.
[7] S. C. Kleene, Introduction to metamathematics, Van Nostrand, 1952.
[8] R. Péter, Rekursive Funktionen, Akademische Verlag., 1951.
[9] A. A. Markov, Theory of algorithms (in Russian), Trudy Mat. Inst. Steklov., 42 (1954).
[10] S. C. Kleene, Recursive functionals and quantifiers of finite types I, Trans. Amer. Math. Soc., 91 (1959), 1-52.
[11] S. C. Kleene, Recursive functionals and quantifiers of finite types II, Trans. Amer. Math. Soc., 108 (1963), 106-142.
[12] H. Hermes, Enumerability, decidability, computability, Springer, 1965.

[13] H. Rogers, Theory of recursive functions and effective computability, McGraw-Hill, 1967.
[14] J. Barwise, Infinitary logic and admissible sets, J. Symbolic Logic, 34 (1969), 226-252.
[15] R. B. Jensen and C. Karp, Primitive recursive set functions, Proc. Symposia in Pure Math., XIII (1971), 143-176.
[16] S. Kripke, Transfinite recursions on admissible ordinals, J. Symbolic Logic, 29 (1964), 161-162.
[17] M. Machover, The theory of transfinite recursion, Bull. Amer. Math. Soc., 67 (1961), 575-578.
[18] G. Takeuti, A formalization of the theory of ordinal numbers, J. Symbolic Logic, 30 (1965), 295-317.
[19] T. Tugué, On the partial recursive functions of ordinal numbers, J. Math. Soc. Japan, 16 (1964), 1-31.
[20] J. Moldestad, Computation in higher types, Springer, 1977.
[21] J. E. Fenstad, R. O. Gandy, and G. E. Sacks (eds.), Generalized recursion theory, North-Holland, 1974.
[22] J. E. Fenstad, R. O. Gandy, and G. E. Sacks (eds.), Generalized recursion theory II, North-Holland, 1978.
[23] J. E. Fenstad, General recursion theory, Springer, 1980.
[24] Y. N. Moschovakis, Elementary induction on abstract structures, North-Holland, 1974.
[25] Y. N. Moschovakis, Descriptive set theory, North-Holland, 1980.
[26] J. Barwise, Admissible sets and structures, Springer, 1975.
[27] P. G. Hinman, Recursion theoretic hierarchies, Springer, 1978.
[28] J. W. Addison, Separation principles in the hierarchies of classical and effective descriptive set theory, Fund. Math., 46 (1958-1959), 123-135.
[29] J. W. Addison, Some consequences of the axiom of constructibility, Fund. Math., 46 (1958-1959), 337-357.
[30] D. A. Clarke, Hierarchies of predicates of finite types, Mem. Amer. Math. Soc., 1964.
[31] S. C. Kleene, Arithmetical predicates and function quantifiers, Trans. Amer. Math. Soc., 79 (1955), 312-340.
[32] S. C. Kleene, Hierarchies of number-theoretic predicates, Bull. Amer. Math. Soc., 61 (1955), 193-213.
[33] S. C. Kleene, Quantification of number-theoretic functions, Compositio Math., 14 (1959), 23-40.
[34] C. Spector, Recursive well-orderings, J. Symbolic Logic, 20 (1955), 151-163.
[35] C. Spector, Hyperarithmetical quantifiers, Fund. Math., 48 (1959-1960), 313-320.

357 (VI.6)
Regular Polyhedra

A. Regular Polygons

A polygon in a Euclidean plane bounding a convex cell whose sides and interior angles are all respectively congruent is called a regular polygon. When the number of vertices (which equals the number of sides) is $n$, it is called a regular $n$-gon. There exist a circle (circumscribed circle) passing through all the vertices of a regular $n$-gon and a concentric circle (inscribed circle) tangent to all the sides. We call the center of these circles the center of the regular $n$-gon. The $n$ vertices of a regular $n$-gon are obtained by dividing a circle into $n$ equal parts. (When a polygon in a Euclidean plane bounds a convex cell, this 2-cell is sometimes called a convex polygon. Thus regular polygon sometimes means the convex cell bounded by a regular polygon as described above.) A necessary and sufficient condition for a regular $n$-gon to be geometrically constructible is that $n$ be decomposable into the product of prime numbers $n = 2^m p_1 \cdots p_k$ ($m \ge 0$), where the $p_i$ ($i = 1, 2, \ldots$) are different Fermat numbers (→ 179 Geometric Construction).

B. Regular Polyhedra

Consider a regular polygon on a plane, and take a point on the line perpendicular to the plane at the center of the polygon. The set of points on all half-lines joining this point and points on the polygon (considered as a convex cell) is called a regular polyhedral angle having this point as vertex (Fig. 1).

Fig. 1

When a convex polyhedron $\mathfrak{F}$ in $E^3$ satisfies the following two conditions, we call it a regular polyhedron: (1) Each face of $\mathfrak{F}$, which is a 2-dimensional cell, is a regular polygon, and all faces of $\mathfrak{F}$ are congruent to each other. (2) Its vertices are all surrounded alike. That is, by the projection of $\mathfrak{F}$ from each vertex of $\mathfrak{F}$, we obtain a regular polyhedral angle; these regular polyhedral angles are all congruent to each other. From (2) we see that the number of edges emanating from each vertex of $\mathfrak{F}$ is

independent of the vertex. It has been known since Plato's time that there are only five kinds of regular polyhedra: tetrahedrons (Fig. 2), octahedrons (Fig. 3), icosahedrons (Fig. 4), cubes or hexahedrons (Fig. 5), and dodecahedrons (Fig. 6) (see also Table 1).

Fig. 2. Regular tetrahedron.
Fig. 3. Regular octahedron.
Fig. 4. Regular icosahedron.
Fig. 5. Regular hexahedron (or cube).
Fig. 6. Regular dodecahedron.

From a given regular polyhedron, we can obtain another one by taking as vertices the centers of all the faces of the given polyhedron (Fig. 7). We say that the given regular polyhedron and the one obtained in this way are dual to each other. The octahedron and hexahedron are dual to each other, as are the icosahedron and dodecahedron. The tetrahedron is dual to itself. For a regular polyhedron $\mathfrak{F}$, there exist concentric circumscribed and inscribed spheres whose center is the center of symmetry of $\mathfrak{F}$ and is called the center of $\mathfrak{F}$. Drawing tangent planes to the circumscribed sphere at each vertex, we can obtain a regular polyhedron dual to the given one (Fig. 8).

Fig. 7
Fig. 8

In a regular polyhedron, let $a$ be the length of an edge, $\theta$ the magnitude of the dihedral angle at each edge, and $R$ and $r$ the radii of circumscribed and inscribed spheres, respectively. Then the following relations hold (we assume that each face is a regular $p$-gon and $q$ faces meet at each vertex):

$$\sin\frac{\theta}{2} = \frac{\cos(\pi/q)}{\sin(\pi/p)}, \qquad R = \frac{a}{2}\tan\frac{\pi}{q}\tan\frac{\theta}{2}, \qquad r = \frac{a}{2}\cot\frac{\pi}{p}\tan\frac{\theta}{2}, \qquad \frac{R}{r} = \tan\frac{\pi}{p}\tan\frac{\pi}{q} \tag{1}$$

(see Table 2). Corresponding to these polyhedra, we have finite subgroups of $O(3)$ called regular polyhedral groups (→ 151 Finite Groups).

C. Higher-Dimensional Cases

It is possible to generalize these considerations to higher dimensions to define inductively

Table 1. Regular Polyhedra in 3-Dimensional Euclidean Space $E^3$

Figure                 Face                   Number of   Number of   Number of   Number of Faces
                                              Vertices    Edges       Faces       around a Vertex
Regular tetrahedron    Equilateral triangle   4           6           4           3
Regular octahedron     Equilateral triangle   6           12          8           4
Regular icosahedron    Equilateral triangle   12          30          20          5
Regular hexahedron     Square                 8           12          6           3
Regular dodecahedron   Regular pentagon       20          30          12          3
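
The counts in Table 1 can be recovered from the face type and the number of faces around a vertex alone. The following is a minimal Python sketch, assuming the standard counting argument (each edge lies on two faces and joins two vertices) together with Euler's formula V - E + F = 2, neither of which is stated in the text above. The function name `counts` and the parameter names `p` (sides per face) and `q` (faces around a vertex) are illustrative choices.

```python
def counts(p, q):
    """Return (vertices, edges, faces) for p-gonal faces with q faces around each vertex."""
    # Counting edge-face and edge-vertex incidences gives p*F = 2*E = q*V.
    # Substituting into Euler's formula V - E + F = 2 (an assumption of this
    # sketch) yields E = 2pq / (2p + 2q - pq).
    denom = 2 * p + 2 * q - p * q
    if p < 3 or q < 3 or denom <= 0:
        raise ValueError("no convex regular polyhedron with these p, q")
    edges, remainder = divmod(2 * p * q, denom)
    if remainder:
        raise ValueError("edge count is not an integer")
    return 2 * edges // q, edges, 2 * edges // p


# Collect every admissible (p, q) among small values of p and q.
solutions = {(p, q): counts(p, q)
             for p in range(3, 10) for q in range(3, 10)
             if 2 * p + 2 * q - p * q > 0}

for (p, q), (v, e, f) in sorted(solutions.items()):
    print(f"p={p} q={q}: V={v} E={e} F={f}")
```

Only five pairs (p, q) satisfy the positivity condition, one for each row of Table 1.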

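Table 2. Numerical Values for Eqs. (1)

Number of Faces   $\sin\theta$    $\theta$        $R/a$                        $r/a$
4                 $2\sqrt{2}/3$   70°31'43.6"     $\sqrt{6}/4$                 $\sqrt{6}/12$
6                 $1$             90°             $\sqrt{3}/2$                 $1/2$
8                 $2\sqrt{2}/3$   109°28'16.4"    $1/\sqrt{2}$                 $1/\sqrt{6}$
12                $2/\sqrt{5}$    116°33'54.2"    $\sqrt{3}(\sqrt{5}+1)/4$     $\sqrt{25+11\sqrt{5}}/(2\sqrt{10})$
20                $2/3$           138°11'22.8"    $\sqrt{5+\sqrt{5}}/(2\sqrt{2})$   $(3+\sqrt{5})/(4\sqrt{3})$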
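
The entries of Table 2 can be checked numerically. The sketch below is a hedged illustration, not the source's own derivation: it assumes the standard relations sin(θ/2) = cos(π/q)/sin(π/p), R/a = (1/2) tan(π/q) tan(θ/2), and r/a = (1/2) cot(π/p) tan(θ/2), which are taken here to be what Eqs. (1) state. The function name `dihedral_and_radii` is hypothetical.

```python
import math


def dihedral_and_radii(p, q):
    """Return (theta in radians, R/a, r/a) for p-gonal faces with q faces per vertex."""
    # Half of the dihedral angle, from sin(theta/2) = cos(pi/q) / sin(pi/p).
    half_theta = math.asin(math.cos(math.pi / q) / math.sin(math.pi / p))
    # Circumscribed and inscribed radii in units of the edge length a.
    big_r = 0.5 * math.tan(math.pi / q) * math.tan(half_theta)
    small_r = 0.5 * math.tan(half_theta) / math.tan(math.pi / p)
    return 2 * half_theta, big_r, small_r


solids = [("tetrahedron", 3, 3), ("hexahedron", 4, 3), ("octahedron", 3, 4),
          ("dodecahedron", 5, 3), ("icosahedron", 3, 5)]
for name, p, q in solids:
    theta, big_r, small_r = dihedral_and_radii(p, q)
    print(f"{name:13s} theta={math.degrees(theta):9.4f} deg  "
          f"R/a={big_r:.6f}  r/a={small_r:.6f}")
```

Converting the printed angles to degrees, minutes, and seconds allows a direct comparison with the θ column of the table.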

Table 3. Regular Polyhedra in 4-Dimensional Euclidean Space $E^4$

                       3-Dimensional Regular Polyhedra
Figure                 Kind            Number    Number of Vertices   Duality
Regular 5-hedron       Tetrahedron     5         5                    a
Regular 8-hedron       Cube            8         16                   b
Regular 16-hedron      Tetrahedron     16        8                    b
Regular 24-hedron      Octahedron      24        24                   a
Regular 120-hedron     Dodecahedron    120       600                  b
Regular 600-hedron     Tetrahedron     600       120                  b

a: dual to itself; b: dual to each other (the two rows of each consecutive pair marked b)

Table 4. Regular Polyhedra in n-Dimensional Euclidean Space ($n \ge 5$)

                          Regular Polyhedron in $R^{n-1}$
Figure                    Kind                        Number    Number of Vertices   Duality
Regular (n + 1)-hedron    Regular n-hedron            $n + 1$   $n + 1$              a
Regular 2n-hedron         Regular (2n - 2)-hedron     $2n$      $2^n$                b
Regular $2^n$-hedron      Regular n-hedron            $2^n$     $2n$                 b

a: dual to itself; b: dual to each other
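
The three families of Table 4 and their duality pattern can be tabulated for any dimension. The sketch below is illustrative only: the names `simplex`, `hypercube`, and `cross_polytope` are the usual modern names for the regular (n + 1)-hedron, 2n-hedron, and 2^n-hedron and do not appear in the text, and duality is modeled simply as exchanging the number of bounding cells with the number of vertices.

```python
def simplex(n):
    """Regular (n + 1)-hedron in E^n: n + 1 bounding cells, n + 1 vertices."""
    return {"cells": n + 1, "vertices": n + 1}


def hypercube(n):
    """Regular 2n-hedron in E^n: 2n bounding cells, 2**n vertices."""
    return {"cells": 2 * n, "vertices": 2 ** n}


def cross_polytope(n):
    """Regular 2**n-hedron in E^n: 2**n bounding cells, 2n vertices."""
    return {"cells": 2 ** n, "vertices": 2 * n}


def dual(figure):
    """Model duality by swapping the cell count and the vertex count."""
    return {"cells": figure["vertices"], "vertices": figure["cells"]}


for n in range(4, 8):
    print(n, simplex(n), hypercube(n), cross_polytope(n))
```

For n = 4 the three functions give the 5-hedron, 8-hedron, and 16-hedron rows of Table 3; the 24-, 120-, and 600-hedra are special to four dimensions and are not produced by this sketch.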

regular polyhedra in $E^n$, $n \ge 4$. When $n = 4$ we have 6 kinds of regular polyhedra (Table 3). For $n \ge 5$ we have only 3 kinds (Table 4) (→ 70 Complexes).

References

[1] J. S. Hadamard, Leçons de géométrie élémentaire II, Armand Colin, second edition, 1906, 425-427.
[2] D. Hilbert and S. Cohn-Vossen, Anschauliche Geometrie, Springer, 1932; English translation, Geometry and the imagination, Chelsea, 1952.
[3] E. Steinitz and H. Rademacher, Vorlesungen über die Theorie der Polyeder, Springer, 1934.
[4] H. S. M. Coxeter, Regular polytopes, Methuen, 1948; third edition, Dover, 1973.
[5] H. S. M. Coxeter, Regular complex polytopes, Cambridge Univ. Press, 1974.

358 (II.2)
Relations

A. General Remarks

In its wider sense the term relation means $n$-ary relation ($n = 1, 2, 3, \ldots$) (→ 411 Symbolic Logic G), but in this article we restrict ourselves to its most ordinary meaning, i.e., to the case $n = 2$. Let $X$, $Y$ be two sets and $x$, $y$ be two variables taking their values in $X$, $Y$, respectively. A proposition $R(x, y)$ containing $x$, $y$ is called a relation or a binary relation if it can be determined whether $R(a, b)$ is true or false for each pair $(a, b)$ in the Cartesian product $X \times Y$. For example, if both $X$ and $Y$ are the set of rational integers, then the following propositions are relations: $x < y$, $x - y$ is even, $x$ divides $y$. A relation $R(x, y)$ is sometimes written as $xRy$.

For a given relation $R$, we define its inverse relation $R^{-1}$ by $yR^{-1}x \Leftrightarrow xRy$. Then $R$ is the

inverse relation of $R^{-1}$. In the example above, the inverse relation of $x < y$ is $y > x$, and the inverse relation of "$x$ is a divisor of $y$" is "$y$ is a multiple of $x$." A relation $R$ is called reflexive if $xRx$ holds. $R$ is called symmetric if $xRy \Leftrightarrow yRx$ (namely, if $R$ and $R^{-1}$ are identical). $R$ is called transitive if $xRy$ and $yRz$ imply $xRz$. $R$ is called antisymmetric if $xRy$ and $yRx$ imply $x = y$. A reflexive, symmetric, and transitive relation is called an equivalence relation (→ 135 Equivalence Relations). A reflexive and transitive relation is called a preordering. A reflexive, transitive, and antisymmetric relation is called an ordering (→ 311 Ordering).

Suppose that we are given a relation $xRy$ ($x \in X$, $y \in Y$). Then the set $G = \{(x, y) \mid xRy\}$, which consists of elements $(x, y)$ of the Cartesian product $X \times Y$ satisfying $xRy$, is called the graph of the relation $R$. Conversely, for any subset $G$ of $X \times Y$, there exists a unique relation $R$ with the graph $G$ given by $xRy \Leftrightarrow (x, y) \in G$.

B. Correspondences

For a subset $G$ of the Cartesian product $X \times Y$, the triple $\Gamma = (G, X, Y)$ is called a correspondence from $X$ to $Y$. The set $X$ is called the initial set of the correspondence $\Gamma$, and $Y$ the final set of $\Gamma$. A relation $xRy$ ($x \in X$, $y \in Y$) determines a correspondence $\Gamma = (G, X, Y)$ by its graph $G$, and conversely, a correspondence $\Gamma$ determines a relation $R$. Given a correspondence $\Gamma = (G, X, Y)$, the sets $A = \mathrm{pr}_X G$ and $B = \mathrm{pr}_Y G$, where $\mathrm{pr}_X: X \times Y \to X$ and $\mathrm{pr}_Y: X \times Y \to Y$ are the canonical projections, are called the domain and range of the correspondence $\Gamma$, respectively. For $x \in X$, the set $\{y \in Y \mid (x, y) \in G\}$ is denoted by $G(x)$ or $\Gamma(x)$, and we say that any element $y$ of $G(x)$ corresponds to $x$ by $\Gamma$.

For a subset $G$ of $X \times Y$, we define a subset $G^{-1}$ of $Y \times X$ by $(x, y) \in G \Leftrightarrow (y, x) \in G^{-1}$. Given a correspondence $\Gamma = (G, X, Y)$, the correspondence $(G^{-1}, Y, X)$ is denoted by $\Gamma^{-1}$ and is called the inverse correspondence of $\Gamma$. If $G$ is the graph of a relation $R$, then $G^{-1}$ is the graph of the inverse relation $R^{-1}$. The domain of the correspondence $\Gamma$ is the range of $\Gamma^{-1}$, and vice versa. We have $(\Gamma^{-1})^{-1} = \Gamma$.

Suppose that we are given correspondences $\Gamma_1 = (G_1, X, Y)$ and $\Gamma_2 = (G_2, Y, Z)$. We define a subset $G$ of $X \times Z$ by: $(x, z) \in G \Leftrightarrow$ there exists $y \in Y$ satisfying $(x, y) \in G_1$ and $(y, z) \in G_2$. Then the correspondence $\Gamma = (G, X, Z)$ is denoted by $\Gamma_2 \circ \Gamma_1$ and is called the composite of $\Gamma_1$ and $\Gamma_2$. We have the associative law $(\Gamma_3 \circ \Gamma_2) \circ \Gamma_1 = \Gamma_3 \circ (\Gamma_2 \circ \Gamma_1)$ and the law $(\Gamma_2 \circ \Gamma_1)^{-1} = \Gamma_1^{-1} \circ \Gamma_2^{-1}$.

Let $\Gamma$ be a correspondence from $X$ to $Y$, and assume that to any $x$ belonging to the domain $A$ of $\Gamma$ there corresponds one and only one $y \in Y$, namely, $\Gamma(x) = \{y\}$ for any $x \in A$. Then $\Gamma$ is called a univalent correspondence. If $\Gamma$ and $\Gamma^{-1}$ are both univalent correspondences, $\Gamma$ is called a one-to-one correspondence. For given sets $A$ and $B$, a univalent correspondence with domain $A$ and range $B$ is called a mapping (or function) with domain $A$ and range $B$ (→ 381 Sets C).

References

See references to 381 Sets.

359 (XX.21)
Relativity

A. History

The theory of relativity is a system of theoretical physics established by A. Einstein and is composed of special relativity and general relativity. Toward the end of the 19th century, it was believed that electromagnetic waves propagate through the ether, a hypothetical medium. A number of experimenters tried to find the motion of the earth relative to the ether, but all these attempts were unsuccessful (A. A. Michelson, E. W. Morley). Studying these results, in 1905 Einstein proposed the special theory of relativity, which extended Galileo's relativity principle of Newtonian mechanics to electromagnetism and radically revised the concepts of space and time. Almost all the conclusions of special relativity theory are now confirmed by experiments, and this theory has even become a guiding principle for developing new theories in physics. By extending special relativity, Einstein established (1915) the general theory of relativity. Its principal part is a new theory of gravitation containing Newton's theory as a special case. Its conclusions about the solar system are compatible with observed facts that are regarded as experimental support for the theory. Effects due to general relativity other than those just described have been studied to a considerable extent, but it is hard at present to test theoretical results experimentally, and there are some doubts about the limit of its applicability.

B. Special Relativity

In Newtonian mechanics, natural phenomena are described in a 3-dimensional Euclidean

space considered independent of time. In spe- the following two postulates: (i) Special prin-
cial relativity, however, it is postulated that ciple of relativity: A physical law should be
space and time cannot be separated but are expressed in the same form in all inertial sys-
unified into a 4-dimensional pseudo-Euclidean tems, namely, in all coordinate systems that
snace with the tfundamental form move relative to each other with uniform
velocity. (ii) Principle of invariance of the speed
$$ds^2 = \sum_{a,b} g_{ab}\, dx^a\, dx^b = c^2\, dt^2 - dx^2 - dy^2 - dz^2,$$
of light: The speed of light in a vacuum is the
same in all inertial systems and in all direc-
where a, b=O, 1, 2, 3 and
tions, irrespective of the motion of the light
(x0, x’,x2,x3)=(ct, x, y, z). source. From these assumptions Einstein
derived (1) as the transformation formula
Here (x, y, z) are spatial Cartesian coordinates,
between inertial systems x = (ct, x, y, z) and
t is time, and c is the speed of light. This space
x’ = (ct’, x’, y’, z’) that move relative to each
was introduced by H. Minkowski and is called
other with uniform velocity u along the com-
Minkowski space-time. By means of it Min-
mon x-axis. This was the first step in special
kowski gave an ingenious geometric interpre-
relativity, and along this line of thought, Ein-
tation to special relativity.
stein solved successively the problems of the
A nonzero vector I’ is called timelike, null
Lorentz-Fitzgerald contraction, the dilation of
(or lightlike), or spacelike according as V2 > 0,
time as measured by moving clocks, the aber-
= 0, or < 0, where V2 = C,, b gob V” Vb.
ration of light, the Doppler effect, and Fresnel’s
The group of motions in Minkowski space-
dragging coefficient.
time is called the inhomogeneous Lorentz
group. Its elements can be written as
I
xi’=~c:x~+-ci, CgabCqCjb=gijr C. Relativity and Electromagnetism
0 o,b

where cj and ci are constants. The transfor- In special relativity, a physical quantity is
mations with ci = 0 are usually called Lorentz represented by a ttensor (or a scalar or a vec-
transformations, and the group G composed of tor) in Minkowski space-time, and physical
these transformations is called the homoge- laws are written in tensor form and are invar-
neous Lorentz group or simply the Lorentz iant under Lorentz transformations of coordi-
group. These are important concepts in special nates. This is the mathematical expression of
relativity. If Go denotes the tconnected compo- the special principle of relativity. Since the
nent of the identity element of G, the factor transformation (1) tends to a Galileo transfor-
group G/G, is an Abelian group of type (2,2) mation in Newtonian mechanics as c + co, the
and of order 4. We call G, the proper Lorentz special principle of relativity is a generalization
group. A frequently used element of Go is of the Newton-Galileo principle of relativity.
x-vt t -(v/2)x To summarize mathematically, it may safely
Lu:xr=Jjq7 t’=*p’ be said that special relativity is a theory of
invariants with respect to the Lorentz group.
Y’=Y, z’=z; IuI<c. (1) To illustrate this conclusion we consider elec-
tromagnetic theory.
Such transformations form a l-dimensional The electric field E is usually represented by
subgroup of Go with u as a parameter, and the
a “polar vector” and the magnetic field H by
composition law of the subgroup is given by
an “axial vector” in a 3-dimensional Euclidean
u+v space. Even if the magnetic field does not exist
L; L,= L,,
w=ix@+ in one inertial system, the field can arise in
another system that moves uniformly rela-
Elements of G not belonging to G, are
tive to the original system. In view of this,
~~~0’~ --a, xi’=x’; i=1,2,3, electric and magnetic fields are considered in
xiI, -xi. , relativity to form one physical quantity with
fj:xO’=xO i=l,2,3.
components
Both T (time reversal) and S (space reflection
or parity transformation) have aroused much
interest among physicists.
Historically, the transformation formula (1)
was first obtained by H. A. Lorentz, under the
assumption of contraction of a rod in the This quantity transforms as an talternating
direction of its movement in order to over- tensor of degree 2 under Lorentz transfor-
come the difftculties of the ether hypothesis, mations. In like manner, the electric charge
but his theoretical grounds were not satisfac- density p and the electric current density J
tory. On the other hand, Einstein started with are unified into a tcontravariant vector with

respect to Lorentz transformations:

$$s = (s^0, s^1, s^2, s^3) = (\rho, J_x/c, J_y/c, J_z/c).$$

Such a vector $s$ in Minkowski space-time is sometimes called a four-vector as distinguished from an ordinary vector such as $J$. If the electromagnetic field $F_{ij}$ and the current four-vector $s$ are thus defined, the Maxwell equations, the basic equations of electromagnetism, can be written in tensor form:

where

$$\sum_a g^{ia} g_{ja} = \delta^i_j.$$

In the same way, the equation of motion for a charged particle in an electromagnetic field can be expressed as

where $e$ and $m$ are the charge and mass of the particle, respectively, and $s$ is the arc length along the particle trajectory (→ 130 Electromagnetism).

Though special relativity originated in studies of electromagnetic phenomena, it has gradually become clear that the theory is valid also for other phenomena. One interesting result is that the energy of a particle moving with uniform velocity $v$ is given by

$$E = \frac{mc^2}{\sqrt{1 - (v/c)^2}},$$

and accordingly even a particle at rest has energy $mc^2$ (rest energy). This shows the equivalence of mass and energy, with the conversion formula given by $E = mc^2$. This conclusion has been verified experimentally by studies of nuclear reactions and has become the basis of the development of nuclear power. The special principle of relativity also showed its validity in the electron theory of P. A. M. Dirac (1928) and the quantum electrodynamics of S. Tomonaga (1943) and others. It has, however, been shown that the invariance for space reflection (namely for the coset $SG_0$ of the Lorentz group $G$) is violated in the decay of elementary particles (T. D. Lee and C. N. Yang, 1956; C. S. Wu et al., 1957). Similar results have been obtained for time reversal (J. H. Christenson, J. W. Cronin, V. L. Fitch, and R. Turlay, 1964).

D. General Relativity

Special relativity has its origin in studies of electromagnetic phenomena, while the central part of general relativity is a theory of gravitation founded on the general principle of relativity and the principle of equivalence. The first principle is an extension of the special principle of relativity to accelerated systems in general. It requires that a physical law should be independent of the choice of local coordinates in a 4-dimensional differentiable manifold representing space and time (space-time manifold). Since a physical quantity is represented by a tensor on the space-time manifold, physical laws are expressed in tensor form, in agreement with the first principle. The second principle claims that gravitational and inertial mass are equal, and accordingly fictitious forces due to acceleration (such as centrifugal force) cannot be distinguished from gravitational force. This had been shown with high accuracy by the experiments of R. von Eötvös (1890) and others.

Starting from these two principles, Einstein was led to the following conclusion. If a gravitational field is produced by matter, the corresponding space-time structure is altered; namely, flat Minkowski space-time is changed into a curved 4-dimensional manifold with pseudo-Riemannian metric of signature (1,3). The fundamental tensor $g_{ij}$ of this manifold represents the gravitational potential, and the gravitational equation satisfied by $g_{ij}$ can be expressed as a geometric law of the manifold. Gravitational phenomena are thus reduced to properties of the geometric structure of the space-time manifold. This idea, which was not seen in the older physics, became the motif in the development of unified field theories.

Now the gravitational law proposed by Einstein is an analog of the Poisson equation in Newtonian mechanics. Let $R_{ij}$ and $R$ be the Ricci tensor and the scalar curvature formed from $g_{ij}$. Then outside the source of a gravitational field, $g_{ij}$ must satisfy

$$G_{ij} = R_{ij} - g_{ij}R/2 = 0, \quad \text{that is,} \quad R_{ij} = 0, \tag{2}$$

and inside the source, $G_{ij}$ is proportional to the energy-momentum tensor $T_{ij}$, with a constant of proportionality fixed by $\kappa$ and $c$, (3)

where $\kappa$ is the gravitational constant. Here the energy-momentum tensor $T_{ij}$ is a symmetric tensor representing the dynamical state of matter (energy, momentum, and stress). Usually (2) and (3) are called the exterior and interior field equations, respectively.

Next, the equation of motion of a particle in a gravitational field is given by

$$\frac{\delta}{\delta s}\frac{dx^i}{ds} = 0, \qquad \sum_{a,b} g_{ab}\frac{dx^a}{ds}\frac{dx^b}{ds} = 1, \tag{4}$$

if the particle mass is so small that its effect on the field is negligible. Here $\delta/\delta s$ stands for covariant differentiation with respect to the

arc length $s$ along the particle trajectory. In other words, a particle in a gravitational field moves along a timelike geodesic in the space-time manifold. Similarly, the path of light is represented by a null geodesic, whose equation is formally obtained from (4) by replacing the right-hand side of the second equation by zero.

Experimental verification of the theory of general relativity has been obtained by detection of the following effects: the shift of spectral lines due to the gravity of the earth and of white dwarfs, the deflection of light or radio waves passing near the sun, the time delay of radar echo signals passing near the sun, and the advance of the perihelion of Mercury. All the observational data are compatible with the theoretical results. Time delay and the advance of the perihelion have been observed in a binary system of neutron stars. It is generally accepted that these results are experimental verifications of general relativity.

It should be noted that (2) has wave solutions, which have no counterpart in Newton’s gravitational theory. This fact implies that gravitational effects propagate with the velocity of light. The gravitational waves transport energy and momentum, and the gravitational mass is decreased by the emission of the waves. Experiments to detect gravitational waves generated in the universe have been planned, and the decrease of the orbital period of a binary star system due to the emission of gravitational waves has been observed.

The concept of gravitational waves suggests the existence of a quantum of the gravitational field (graviton); however, the detection of the graviton is far from feasible.

In the interior equation (3), the matter producing a gravitational field is represented by a tensor $T_{ij}$ of class $C^0$. But there is also a way of representing it by singularities of a solution of the exterior equation (2). From this point of view, the equation of motion of a material particle (i.e., singularity) is not assumed a priori as in (4), but is derived as a result of (2) (A. Einstein, L. Infeld, and B. Hoffmann, 1938).

In the static and weak field limits, the fundamental form is approximately given by Newton’s gravitational potential $\varphi$ as

-(1+(p/cZ)(dx2+dy2+dz2),

and (2) and (3) reduce in this limit to Laplace’s and Poisson’s equations, respectively. Newton’s theory of gravity is valid in the limit $|\varphi|/c^2 \ll 1$.

Stimulated by the discoveries of neutron stars and black holes and by the big-bang theory of the universe in the 1960s, numerous studies of general relativity have been carried out on such problems as the gravitational field of a spinning mass, the dynamical process of gravitational collapse, the space-time structure of black holes, the generation of gravitational waves, the global structure and dynamics of the universe, and so on. A comparison of the theoretical predictions and the observations is generally favorable, but the phenomena in the universe are so complex that the effects of general relativity cannot always be isolated.

E. Solutions of Einstein’s Equations

The isometric symmetry of space-time is described by Killing vectors. The stationary metric is characterized by a timelike Killing vector, in which case equation (2) reduces to an elliptic partial differential equation on a 3-dimensional manifold. If the space-time is axially symmetric as well as stationary, (2) reduces to the Ernst equation:

$$(\mathcal{E} + \mathcal{E}^*)\,\nabla^2 \mathcal{E} = 2\,\nabla\mathcal{E} \cdot \nabla\mathcal{E}, \tag{5}$$

where $\nabla$ represents divergence in a flat space. The metric tensors are derived from the complex potential $\mathcal{E}$. The solutions of (5) can be obtained using techniques developed for the soliton problem.

One example of stationary and axially symmetric solutions is the Kerr metric, which is written as

$$ds^2 = c^2\,dt^2 - \frac{2mr}{\rho^2}\,(a \sin^2\theta\, d\varphi - c\,dt)^2 - \rho^2\left(\frac{dr^2}{\Delta} + d\theta^2\right) - (r^2 + a^2)\sin^2\theta\, d\varphi^2, \tag{6}$$

with $\rho^2 = r^2 + a^2\cos^2\theta$ and $\Delta = r^2 - 2mr + a^2$. This metric solution represents a gravitational field around a spinning mass with mass $M = mc^2/G$ and angular momentum $J = Mac$. When $a = 0$, this metric reduces to the Schwarzschild metric.

By applying a Bäcklund transformation to the Kerr metric, one can derive an infinite series of stationary and axially symmetric solutions. All these solutions belong to the space-time metric with, in general, two Killing vectors.

The dynamical evolution of space-time structure has been studied by means of the Cauchy problem of general relativity. With an appropriate choice of dynamical variables, equation (2) or (3) is divided into constraint equations in terms of the initial data and evolution equations in terms of the dynamical variables. The latter hyperbolic equations may also be written in Hamiltonian form.

A typical example of such a problem is the dynamics of a spatially homogeneous 3-

dimensional manifold; this has been studied as a cosmological model. The space-time with a constant scalar curvature $R$ is called de Sitter space, and reduces to the Minkowski space if $R = 0$. If the 3-dimensional space is isotropic as well as homogeneous, the metric takes the form

$$ds^2 = c^2\,dt^2 - a(t)^2\{d\chi^2 + f(\chi)^2\,(d\theta^2 + \sin^2\theta\, d\varphi^2)\},$$

where $f(\chi) = \sin\chi$, $\chi$, or $\sinh\chi$. These are called Robertson-Walker metrics and are considered to describe a realistic expanding universe.

F. Global Structure of Space-Time

Following the advances of modern differential geometry, manifestly coordinate-independent techniques to analyze space-time properties have been applied to general relativity. The mathematical model of space-time is a connected 4-dimensional Hausdorff $C^\infty$-manifold endowed with a metric of signature (1,3). The metric allows the physical description of local causality and of local conservation of energy and momentum. The metric functions obey the Einstein field equation (2) or (3).

In order to clarify the global structure of the solutions of Einstein’s equations, maximally analytic extension of the solutions has been studied. The maximal extension of the Schwarzschild metric is given as

$$ds^2 = \frac{32m^3}{r}\,e^{-r/2m}\,(dv^2 - du^2) - r^2\,(d\theta^2 + \sin^2\theta\, d\varphi^2),$$

using the Kruskal coordinates, which are related to the coordinates of (6) by

$$\left(\frac{r}{2m} - 1\right)e^{r/2m} = u^2 - v^2, \qquad \tanh\frac{t}{4m} = \frac{u}{v} \ \text{or}\ \frac{v}{u}.$$

To study the global structure at infinity, a conformal mapping of the metric is used. For example, the Minkowski metric is written in the form $ds^2 = \Omega^2\, d\bar{s}^2$, where

and $\Omega = \sec p \sec q$, $\tan p = t + r$, $\tan q = t - r$. By means of this mapping, all points, including infinity, are assigned finite $p$, $q$ coordinate values in $-\pi/2 < q < p < \pi/2$.

Singularities in space-time are one of the major problems concerning the global structure of the manifold. For some Cauchy problems relevant to cosmology and gravitational collapse, the inevitable occurrence of a singularity has been proved (singularity theorem). A sufficient condition for occurrence of a singularity is that there be some point $p$ such that all the null geodesics starting from $p$ converge to $p$ again. In addition to this condition, for the proof of the singularity theorem it is presumed that the space-time is free of closed nonspacelike curves, that a Cauchy surface exists, and that the energy-momentum tensor satisfies the condition

for any timelike vector $V^i$. The singularity whose existence is implied by this theorem means that the space-time manifold is geodesically incomplete (the space-time is complete if every geodesic can be extended to arbitrary values of its affine parameter).

The causal structure of space-time is also related to the global structure of the manifold. In this regard, black holes have been introduced as the final state of gravitational collapse. In the black-hole structure of space-time, there exists a closed surface called an event horizon in an asymptotically flat space. The event horizon is the boundary of the set of points in space-time from which one can escape to infinity, or the boundary of the set of points that one can see from the infinite future. Then the black hole is a region from which no signal can escape to the exterior of the event horizon.

If we assume that singularities do not exist in the exterior of the event horizon, a stationary black-hole structure is uniquely described by the Kerr metric [6]. In the case of spherically symmetric collapse, this assumption is verified and the final metric is given by the Schwarzschild metric. However, it is not known whether this assumption is true in more general gravitational collapse.

References

[1] A. Einstein, The meaning of relativity, Princeton Univ. Press, fifth edition, 1956.
[2] H. Weyl, Space, time, matter, Dover, 1950. (Original in German, 1923.)
[3] A. Lichnerowicz, Théories relativistes de la gravitation et de l’électromagnétisme, Masson, 1955.
[4] C. W. Misner, K. S. Thorne, and J. A. Wheeler, Gravitation, Freeman, 1973.
[5] L. Witten (ed.), Gravitation, an introduction to current research, Wiley, 1962.
[6] C. DeWitt and B. S. DeWitt (eds.), Black holes, Gordon & Breach, 1972.
[7] S. W. Hawking and G. F. R. Ellis, The large scale structure of space-time, Cambridge Univ. Press, 1973.
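
The conformal mapping of the Minkowski metric described in Section F, with tan p = t + r and tan q = t - r, can be checked numerically. The following Python sketch is illustrative only: the function names `compactify` and `conformal_factor` are hypothetical, the radial coordinate r is assumed to be non-negative, and the factor sec p sec q is taken as quoted in the text. Under these assumptions every event, however distant, receives finite coordinates with q no larger than p, equality holding only on the axis r = 0.

```python
import math


def compactify(t, r):
    """Map Minkowski (t, r) with r >= 0 to (p, q), where tan p = t + r, tan q = t - r."""
    if r < 0:
        raise ValueError("radial coordinate r must be non-negative")
    return math.atan(t + r), math.atan(t - r)


def conformal_factor(p, q):
    """Return sec p * sec q, the factor quoted in the text (an assumption of this sketch)."""
    return 1.0 / (math.cos(p) * math.cos(q))


if __name__ == "__main__":
    # Sample events, including very distant ones, all land in a bounded region.
    for t, r in [(0.0, 0.0), (1.0, 2.0), (1e6, 1e3), (-5.0, 1e9)]:
        p, q = compactify(t, r)
        inside = -math.pi / 2 < q <= p < math.pi / 2
        print(f"t={t:g} r={r:g} -> p={p:.9f} q={q:.9f} inside={inside}")
```

Because the arctangent is monotone and bounded, the ordering of p and q follows directly from r being non-negative, which is why the sketch rejects negative r rather than silently reflecting it.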
360 Ref. 1336
Renaissance Mathematics

[8] M. Carmeli, Group theory and general relativity, McGraw-Hill, 1977.
[9] S. W. Hawking and W. Israel (eds.), General relativity, an Einstein centenary survey, Cambridge Univ. Press, 1979.
[10] A. Held (ed.), General relativity and gravitation I, II, Plenum, 1980.

360 (XXI.8)
Renaissance Mathematics

Toward the middle of the 13th century, scholastic theology and philosophy were at their height with the Summa theologiae of Thomas Aquinas (1225?-1274); but in the latter half of the century, the English philosopher Roger Bacon (1214-1294) attacked Aquinian philosophy in his Opus majus, insisted on the importance of experimental methods in science, and strongly urged the study of mathematics. The Renaissance flourished first in Italy, then in other European countries in the 15th and 16th centuries, in the domains of the arts and literature. Newer ideas in mathematics and the natural sciences dominated the 17th century. However, it was the invention of printing in the 15th century, the translation of the Greek texts of Euclid and Archimedes into European languages, and the importation of Arabian science into Europe during the Renaissance that prepared for this development.

In the 15th century, the German priest Nicolaus Cusanus (1401-1464) discussed infinity, the convergence of infinite series, and some problems of quadrature. During the same period, the German scholar Regiomontanus (1436-1476) wrote the first systematic treatise on trigonometry independent of astronomy. Leonardo da Vinci (1452-1519), the all-encompassing genius born in the same century, left manuscripts in which he wrote about mechanics, geometric optics, and perspective. Da Vinci’s contemporary, the German painter A. Dürer (1471-1528), wrote a textbook on perspective. In 1494, L. Pacioli (1445?-1514) published Summa de arithmetica, one of the first printed books on mathematics. Its content, influenced by Arabian mathematics, includes practical arithmetic and double-entry bookkeeping. The book enjoyed wide popularity.

The best known result of 16th-century mathematics is the solution of algebraic equations of degrees 3 and 4 by the Italian mathematicians Scipione del Ferro, N. Tartaglia (1506-1557), G. Cardano (1501-1576), and L. Ferrari (1522-1565). Cardano published the solution of equations of the third degree in his book Ars magna (1545). The solution was due to Tartaglia, to whom acknowledgment was made, although publication of the method was against his will. This constitutes a famous episode in the history of mathematics, but what is historically more important is the fact that essential progress beyond Greek mathematics was made by mathematicians of this period, since the Greeks were able to solve equations only of degrees 1 and 2. Algebra was subsequently systematized by the French mathematician F. Viète (1540-1603).

By the end of the 15th century, practical mathematics (influenced by the Arabians) had become popular in Europe, and more advanced mathematics began to be studied in European universities, especially in Italy. In 1543, N. Copernicus (1473-1543) published his heliocentric theory; G. Galilei (called Galileo) (1564-1642), the indomitable proponent of this theory, was also born in the 16th century. Copernicus studied at the Universities of Bologna, Padua, and Ferrara; Galileo studied at the University of Pisa and taught at the Universities of Pisa, Padua, and Florence.

A system of numeration was imported from Arabia to Europe in the 13th century; by the time of S. Stevin (1548?-1620?) it took the definite form of a decimal system, and with the development and acceptance of printing, the forms of the numerals became fixed.

References

[1] M. Cantor, Vorlesungen über Geschichte der Mathematik II, Teubner, second edition, 1900.
[2] J. Cardan (G. Cardano), The book of my life (trans. J. Stoner), Dover, 1963.
[3] G. Cardano, Opera, 10 vols., Lyon, 1663.
[4] N. Bourbaki, Eléments d’histoire des mathématiques, Hermann, 1960.

361 (XX.32)
Renormalization Group

A. Introduction

The concept of renormalization was introduced by S. Tomonaga, J. S. Schwinger, M. Gell-Mann, and F. E. Low in order to overcome the difficulty of divergence in field theory. If the upper bound of the momentum is limited to a finite cutoff value $\Lambda$, then physical quantities, for example, the mass $m$ of an electron, can be obtained as finite quantities by letting

A go to infinity after summing all divergent satisfies the renormalization equation


terms. This is called the renormalization method
using subtraction. Since the cutoff A is arbitrary
insofar as it is finite, the Green’s functions are
indefinite because they depend on A. This
(P$+P$-2NY
>
d’2N’({Pi},/4g)=0,

where fl= p dg/dp and y = fp d log Z,( p)/dp. If


dependence on the cutoff A corresponds to the coefficients jl and y are calculated pertur-
the response for the scale transformation of bationally up to a certain order, the renormal-
length, and this transformation is a certain ized Green’s function is obtained up to the
(semi) group, called a renormalization group. same order by solving the foregoing renor-
Several kinds of renormalization group have malization equation. This is the first kind of
been used in field theory, as well as in the renormalization group. A second kind ex-
statistical mechanics of phase transition. presses the response of the renormalized
Green’s function to the change in the mass and
coupling constant, and is expressed by the
B. Renormalization Group in Field Theory Callan-Symanzik equation [4,5]
[1-3]
m&+/?(g)$+2Ng,(g) GfN)=AGgN),
>
A typical method to resolve the ultraviolet
divergence is to add subtraction terms in the where /?(g) = Zm, ag/am,, and y,(g) = +Zrn,.
Lagrangian so that they cancel the divergence. alogZ,/am,, and where Z is determined by
This cancellation is usually performed in each Zm,am/am, = m. The inhomogeneous term is
order of the perturbation expansion. When defined by
the addition of a finite number of subtraction a
terms cancels the divergence, the relevant AGjfN)=Zmo-G~2N)
Hamiltonian is said to be renormalizable. am0
Otherwise it is called unrenormalizable. The
+Zm,~(NlogZ,)G~2N’~
Lagrangian density 0

The irreducible Green’s function rihZN) satisfies

m&+B(g)$-ZNy,(g) rA2N)=iArA2N).
for example, can be renormalized by the trans- >
formation ‘pO= Zj’2q, g,, = Z, Z;*g, and rni =
m2 + am’. All the divergences are taken into Since the inhomogeneous term can be neg-
the renormalization constants Z,, Z,, and lected in the high-energy region, the foregoing
6m2, so that the renormalized quantities cp, equation becomes homogeneous, and con-
g, and m are finite. These renormalization sequently its solution is
constants can be calculated by means of a
perturbation
circumventing

indeterminacy
method, but the requirement
the divergences alone is not
sufficient to determine them explicitly. This
is usually expressed as Z,(p)
of

=exp
[ So.)
s -2N
1 y(g’M-W&’
and Z,(p), i.e., in terms of a parameter p,
called the renormalization point. The p-
dependence of these functions can be deter-
xrA2N’(Pi,
m,
s(4). 9

Here g(n) is the solution of the equation


mined by means of the following renormaliza- ~,S(“)p-‘(g’)dg’ =log1. Since dimensional analy-
tion conditions: sis yields Th2N)(lpi, m, g) = 14-2NTR (pi, m/A, g),
the foregoing solution shows that high-energy
phenomena can be described by the low-
energy phenomena whose coupling constant is
r(Pi,PL)= l, pipj+l -46,). given by g, = g( co). In particular, when gm = 0,
high-energy phenomena are described by the
Since the renormalization constants depend on asymptotically free field. This circumstance is
the continuous parameter p, the renormalized called asymptotic freedom.
Green’s function and coupling constant g
are also functions of p, and consequently the
quantity dCzN)({pi}, p, g(p)) defined by C. Renormalization Group Theory in
Statistical Physics

The renormalization group technique has


proved to be powerful in statistical physics,

particularly in studies of phase transitions and critical phenomena [6-10]. The correlation length $\xi$ diverges like $\xi(T) \sim (T - T_c)^{-\nu}$ near the critical point $T_c$, where $\nu$ is called the critical exponent of $\xi$. Similarly, the correlation function $C(R)$ for the distance $R$ behaves like $C(R) \sim R^{-(d-2+\eta)} \exp(-\kappa R)$, $\kappa = \xi^{-1}$, where $d$ is the dimensionality of the system and $\eta$ is the exponent describing the deviation of the singular behavior of $C(R)$ from classical theory. Renormalization is useful in evaluating these critical exponents systematically. The fundamental idea is to eliminate some degrees of freedom, to find recursion formulas for interaction parameters, and then to evaluate critical exponents from their asymptotic behavior near the fixed point. There are many different ways of carrying out this idea explicitly. Roughly classifying these into two groups, we have (i) momentum-space renormalization group theories [6-10] and (ii) real-space renormalization group theories [10, 11].

The common fundamental structure of these renormalization group techniques is explained as follows. First the momentum space or real space is divided into cells and rapidly fluctuating parts, namely, small-momentum parts inside each cell are integrated or eliminated, and consequently the remaining slowly fluctuating parts, namely, long-wave parts, are renormalized. The original Hamiltonian $\mathcal{H}_0$ is transformed into $\mathcal{H}_1$ by means of this elimination process and by some scale transformation that preserves the phase space volume. This renormalization operation is written as $R_b$: i.e., $\mathcal{H}_1 = R_b \mathcal{H}_0$. Similarly we have $\mathcal{H}_2 = R_b \mathcal{H}_1 = R_b^2 \mathcal{H}_0, \ldots, \mathcal{H}_n = R_b \mathcal{H}_{n-1} = R_b^n \mathcal{H}_0, \ldots$. This transformation $R_b$ has the (semi) group property $R_{bb'} = R_b R_{b'}$. A generator $G$ is defined by $G = \lim_{b \to 1+0} (R_b - 1)/(b - 1)$. That is, $R_b = \exp(lG)$, $e^l = b$. The transformation of $\mathcal{H}$ is expressed as $d\mathcal{H}/dl = G[\mathcal{H}]$. The fixed point $\mathcal{H}^* = R_b \mathcal{H}^*$ is the solution of $G[\mathcal{H}^*] = 0$. In order to find critical exponents from the asymptotic behavior of $G$ near $\mathcal{H}^*$, we consider a Hamiltonian of the form $\mathcal{H} = \mathcal{H}^* + \omega Q$ and expand $G[\mathcal{H}]$ as $G[\mathcal{H}^* + \omega Q] = \omega KQ + O(\omega^2)$. If the operator $K$ thus defined has a negative eigenvalue $\lambda_i$, the corresponding physical quantity $Q_i$ becomes irrelevant after repeating the renormalization procedure, and the physical quantity $Q_j$ corresponding to a positive eigenvalue $\lambda_j > 0$ becomes relevant. Thus, by introducing a field $h_j$ conjugate to the relevant operator $Q_j$, we study the Hamiltonian $\mathcal{H} = \mathcal{H}^* + \sum_j h_j Q_j$. The free energy per unit volume $f[\mathcal{H}] = f(h_1, h_2, \ldots)$ is found to have the scaling property

$$f(h_1, \ldots, h_j, \ldots) \simeq b^{-d} f(b^{\lambda_1} h_1, \ldots, b^{\lambda_j} h_j, \ldots).$$

By taking $Q_1$ as the energy operator, we have $h_1 \sim T - T_c = t$. By the normalization $b^{\lambda_1} h_1 = 1$, we obtain the scaling law

$$f(t, \ldots, h_j, \ldots) \simeq t^{d/\lambda_1} f(1, \ldots, h_j/t^{\varphi_j}, \ldots).$$

The critical exponent of the specific heat defined by $C \sim t^{-\alpha}$ is given by the formula $\alpha = 2 - d/\lambda_1$. Other scaling exponents $\{\varphi_j\}$ can be obtained via the formula $\varphi_j = \lambda_j/\lambda_1$ from the eigenvalues of $K$. The simplest example of $R_b$ is the case where a single interaction parameter $K$ is transformed into a new parameter $K'$ by $K' = f_b(K)$. The fixed point $K^*$ is given by the solution of $K^* = f_b(K^*)$. The correlation exponent $\nu$ defined by $\xi \sim (K - K^*)^{-\nu}$ is given by the Wilson formula $\nu = \log b/\log \Lambda$, $\Lambda = (df_b/dK)_{K=K^*}$. In most cases, $R_b$ is constructed perturbationally, and critical exponents are usually calculated in power series of $\varepsilon \equiv d_c - d$, as $\varphi = \varphi_0 + \varphi_1 \varepsilon + \varphi_2 \varepsilon^2 + \cdots$, where $d_c$ denotes the critical dimension. This is called the $\varepsilon$-expansion. The first few terms are calculated explicitly for specific models, such as the $\varphi^4$-model. By applying the Borel sum method to these $\varepsilon$-expansions, one can estimate critical exponents [10, 11].

The renormalization group method can be applied to other many-body problems, such as the Kondo effect [12].

References

[1] E. C. G. Stueckelberg and A. Petermann, La normalisation des constantes dans la théorie des quanta, Helv. Phys. Acta, 26 (1953), 499-520.
[2] M. Gell-Mann and F. E. Low, Quantum electrodynamics at small distances, Phys. Rev., 95 (1954), 1300-1312.
[3] N. N. Bogolyubov and D. V. Shirkov, Introduction to the theory of quantized fields, Interscience, 1959.
[4] C. G. Callan, Jr., Broken scale invariance in scalar field theory, Phys. Rev., D2 (1970), 1541-1547.
[5] K. Symanzik, Small distance behaviour in field theory and power counting, Comm. Math. Phys., 18 (1970), 227-246; Small distance-behaviour analysis and Wilson expansions, Comm. Math. Phys., 23 (1971), 49-86.
[6] C. DiCastro and G. Jona-Lasinio, On the microscopic foundation of scaling laws, Phys. Lett., 29A (1969), 322-323; C. DiCastro, The multiplicative renormalization group and the critical behavior in $d = 4 - \varepsilon$ dimensions, Lett. Nuovo Cimento, 5 (1972), 69-74.
[7] K. G. Wilson, Renormalization group and critical phenomena. I, Renormalization group and the Kadanoff scaling picture, Phys. Rev.,
1339 362 B
Representations

B4 (1971), 3174-3183; II, Phase-space cell analysis of critical behavior, Phys. Rev., B4 (1971), 3184-3205.
[8] K. G. Wilson and M. E. Fisher, Critical exponents in 3.99 dimensions, Phys. Rev. Lett., 28 (1972), 240-243.
[9] K. G. Wilson and J. Kogut, The renormalization group and the $\varepsilon$ expansion, Phys. Rep., 12C (1974), 75-200.
[10] C. Domb and M. S. Green (eds.), Phase transitions and critical phenomena VI, Academic Press, 1976.
[11] A. A. Migdal, Recursion equations in gauge field theories, Sov. Phys. JETP, 42 (1975), 413-418; Phase transitions in gauge and sublattice systems, Sov. Phys. JETP, 42 (1975), 743-746. (Original in Russian, 1975.)
[12] K. G. Wilson, The renormalization group: Critical phenomena and the Kondo problem, Rev. Mod. Phys., 47 (1975), 773-840.
[13] L. P. Kadanoff, Notes on Migdal’s recursion formulas, Ann. Phys., 100 (1976), 359-394.

G-set. Giving a permutation representation of $G$ in $M$ is equivalent to giving the structure of a left $G$-set to $M$. A reciprocal permutation representation of $G$ in $M$ is an antihomomorphism $G \to \mathfrak{S}_M$, which becomes a homomorphism if we define the multiplication in $\mathfrak{S}_M$ by the right notation $x(fg) = (xf)g$. If the product $xa \in M$ of $a \in G$ and $x \in M$ is defined and satisfies the conditions $x(ab) = (xa)b$ and $x1 = x$, then, as before, $G$ is said to operate on $M$ from the right, and $M$ is called a right $G$-set. Giving a reciprocal permutation representation of $G$ in $M$ is equivalent to giving the structure of a right $G$-set to $M$.

A (reciprocal) permutation representation is said to be faithful if it is injective; the corresponding $G$-set is also said to be faithful. In particular, we can take $G$ itself as $M$ and define the left (right) operation by the multiplication from the left (right). Then we have a faithful permutation representation (reciprocal permutation representation), which is called the left (right) regular representation of $G$. For $a \in G$, the induced permutation $a_G : x \to ax$ ($xa$) is called the left (right) translation by $a$.
We call a left G-set simply a G-set. If a sub-
362 (IV.1 6) set N of a G-set M satisfies the condition that
a E G, x E N implies ax EN, then N forms a G-
Representations set, which is called a G-subset of M. If a G-set
M has no proper G-subset (i.e., one different
A. General Remarks from M itself and the empty subset), then for
any two elements x, ye M there exists an ele-
For a mathematical system A, a mapping from ment a E G satisfying ax = y. In this case, the
A to a similar (but in general “more concrete”) operation of G on M is said to be transitive,
system preserving the structure of A is called a and the corresponding permutation represen-
representation of A. In this article, we consider tation is also said to be transitive. If an equiva-
the representations of tgroups and tassociative lence relation R in a G-set M is compatible
algebras. For representations of other alge- with the operation of G (i.e., R satisfies the
braic systems - 42 Boolean Algebras; 231 condition that a E G, R(x, y) implies R(ax, ay)),
Jordan Algebras; 248 Lie Algebras. For topo- then the quotient set M/R forms a G-set in the
logical, analytic, and algebraic groups - 13 natural way, called the quotient G-set of M by
Algebraic Groups; 69 Compact Groups; 249 R. If a G-set M has no nontrivial quotient G-
Lie Groups; 422 Topological Abelian Groups; set, i.e., if the only equivalence relations com-
423 Topological Groups; 437 Unitary Repre- patible with the operation are
sentations. For specific groups - 60 Classical
R(x, x’) for any x, X’E M
Groups; 61 Clifford Algebras.
and
B. Permutation Representations of Groups R(x, x’) if and only if x = x’,

We denote by 6, the group of all tpermuta- then the operation of G on M and the corre-
tions of a set M (- 190 Groups B). A permuta- sponding permutation representation are said
tion representation of a group G in M is a to be primitive.
homomorphism G-+Gw. We denote by aM the A mapping f: M+ M’ of G-sets is called a
permutation of M corresponding to a~ G and G-mapping (G-map) if the condition f(ax) =
write Q(X) = ax (x E M). Then we have a con- af(x) (a~ G, x E M) is satisfied. G-injection, G-
dition (ab)x = a(bx), lx = x (a, be G, 1 is the surjection, and G-bijection are defined natur-
identity element, x E M). In general, if the prod- ally. The inverse mapping of a G-bijection is
uct axcM of aEG and XCM is defined and also a G-bijection. Two permutation represen-
satisfies this condition, then G is said to oper- tations are said to be similar if there exists a
ate on M from the left, and M is called a left G-bijection between the corresponding G-sets.
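The notions of transitive and primitive operation can be checked mechanically for small permutation groups. The following is a minimal sketch, not part of the original article: permutations of {0, ..., n-1} are stored as tuples, and the helper names (`compose`, `generate`, `is_transitive`, `is_primitive`) are hypothetical. It assumes a finite group given by generators and tests primitivity by brute force over all equivalence relations (set partitions), so it is only practical for very small n.

```python
def compose(a, b):
    """Return the permutation x -> a(b(x)); this realizes (ab)x = a(bx)."""
    return tuple(a[b[x]] for x in range(len(b)))

def generate(gens, n):
    """Finite permutation group generated by gens (closure under composition)."""
    identity = tuple(range(n))
    group = {identity}
    frontier = [identity]
    while frontier:
        a = frontier.pop()
        for s in gens:
            b = compose(s, a)
            if b not in group:
                group.add(b)
                frontier.append(b)
    return group

def is_transitive(group, n):
    """Transitive iff the orbit of one point (here 0) is the whole set."""
    return {a[0] for a in group} == set(range(n))

def partitions(elems):
    """Yield every set partition of the list elems (every equivalence relation)."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for p in partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]
        yield [[first]] + p

def is_compatible(group, partition):
    """R is compatible with the operation iff every a in G maps classes to classes."""
    blocks = {frozenset(b) for b in partition}
    return all(frozenset(a[x] for x in b) in blocks for a in group for b in blocks)

def is_primitive(group, n):
    """Primitive iff the only compatible equivalence relations are the two trivial ones."""
    for p in partitions(list(range(n))):
        trivial = len(p) == 1 or len(p) == n
        if not trivial and is_compatible(group, p):
            return False
    return True

# Cyclic group of order 4 acting on {0,1,2,3} by rotation: transitive, not primitive
# (the relation with classes {0,2} and {1,3} is compatible with the operation).
c4 = generate([(1, 2, 3, 0)], 4)
# Symmetric group on {0,1,2}: transitive and primitive.
s3 = generate([(1, 0, 2), (1, 2, 0)], 3)
print(len(c4), is_transitive(c4, 4), is_primitive(c4, 4))
print(len(s3), is_transitive(s3, 3), is_primitive(s3, 3))
```

The rotation example shows that transitivity does not imply primitivity: the quotient of {0,1,2,3} by the relation with classes {0,2}, {1,3} is a nontrivial quotient G-set.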
Let M be a transitive G-set, and fix any element $x \in M$. If we view G as a G-set, the mapping $f: G \to M$ defined by $f(a) = ax$ is a G-surjection and induces a G-bijection $\bar{f}: G/R \to M$. Here an equivalence class of R is precisely a left coset of the stabilizer (stability group or isotropy group) $H_x = \{a \in G \mid ax = x\}$. Hence we have a G-bijection $G/H_x \to M$. Conversely, for any subgroup H of G, G/H is a transitive left G-set. A transitive G-set is called a homogeneous space of G.
For a family $\{M_\lambda\}_{\lambda \in \Lambda}$ of G-sets, the Cartesian product $\prod_{\lambda \in \Lambda} M_\lambda$ and the †direct sum $\sum_{\lambda \in \Lambda} M_\lambda$ become G-sets in the natural way; they are called the direct product of G-sets and the direct sum (i.e., disjoint union) of G-sets, respectively. Every G-set M is the direct sum of a family $\{M_\lambda\}$ of transitive G-subsets, and each $M_\lambda$ is called an orbit (or system of transitivity). For a G-set M, the direct product G-set $M^k = M \times \cdots \times M$ (k times) contains a G-subset $M^{(k)} = \{(x_1, \ldots, x_k) \mid i \neq j \text{ implies } x_i \neq x_j\}$. If $M^{(k)}$ is transitive, M is said to be k-ply transitive. If M is transitive and the stabilizer of each point of M consists of the identity element alone, M is said to be simply transitive.
If M has n elements, a permutation representation of a group G in M is said to be of degree n. When G is a group of permutations of M, the canonical injection $G \to \mathfrak{S}_M$ is a faithful permutation representation; this case has been studied in detail (→ 151 Finite Groups G).

C. Linear Representations of Groups and Associative Algebras

Let K be a †commutative ring with unity element and M be a K-module. Though we shall mainly treat the case where K is a field and M is a finite-dimensional †linear space over K, the case where K is an †integral domain and M is a †free module over K of finite rank is also important. Since K is commutative, we can write $\lambda x = x\lambda$ ($\lambda \in K$, $x \in M$). Let $\mathcal{E}_K(M)$ be the †associative algebra over K consisting of all K-endomorphisms of M, and let GL(M) be the group of all †invertible elements in $\mathcal{E}_K(M)$, where we assume $M \neq \{0\}$. Let A be an associative algebra over K. A linear representation of the algebra A in M is an algebra homomorphism $A \to \mathcal{E}_K(M)$. We always assume that A has a unity element and the homomorphisms are unitary. For convenience, we can also consider a linear representation in the trivial space $M = \{0\}$, which is called the zero representation. A reciprocal linear representation is an antihomomorphism $A \to \mathcal{E}_K(M)$. A linear representation of a group G in M is a group homomorphism $G \to GL(M)$. This can be extended uniquely to a linear representation of the †group ring K[G] in M, and conversely, the restriction of a linear representation of K[G] in M to G is a linear representation of G; and similarly for reciprocal linear representations. Thus the study of (reciprocal) linear representations of a group G in M can be reduced to the study of (reciprocal) linear representations of the group ring K[G] in M.
We now consider the linear representation of associative algebras, which we call simply "algebras." (Note that a group ring has a canonical basis, the group itself, and allows a more detailed investigation; → Sections G, I.)
Given a commutative ring K with unity and a linear representation $\rho$ of a K-algebra A in a K-module M, we introduce the structure of a left A-module into M by defining $ax = \rho(a)x$ ($a \in A$, $x \in M$); the structure of a K-module in M obtained by the canonical homomorphism $K \to A$ coincides with the original one. This A-module is called the representation module of $\rho$. Conversely, for any left A-module M we can define a linear representation $\rho$ of A in M (with M viewed as a K-module via $K \to A$) by putting $\rho(a)x = ax$; the representation module of $\rho$ coincides with the original one. This representation $\rho$ is called the linear representation associated with M. A reciprocal linear representation of A corresponds to a right A-module. Thus the study of (reciprocal) linear representations of A is equivalent to the study of left (right) A-modules. For instance, if the operation of a group G on M is trivial: $\sigma x = x$ ($\sigma \in G$, $x \in M$), the corresponding representation of G in M assigns the identity mapping $1_M$ to every $\sigma \in G$. Furthermore, if M = K, this representation is called the unit representation of G (over K).
Let $\rho$, $\rho'$ be linear representations of A in the K-modules M, M', respectively. Then an A-homomorphism $M \to M'$ is precisely a K-homomorphism $f: M \to M'$ satisfying the condition $f \circ \rho(a) = \rho'(a) \circ f$ ($a \in A$); this is sometimes called a homomorphism from $\rho$ to $\rho'$. In particular, an A-isomorphism is a K-isomorphism $f: M \to M'$ satisfying the condition $f \circ \rho(a) \circ f^{-1} = \rho'(a)$ ($a \in A$); in this case we say that $\rho$ and $\rho'$ are similar (isomorphic or equivalent) and write $\rho \cong \rho'$.
Let M be the representation module of a linear representation $\rho$ of an algebra A. If $\rho$ is injective, $\rho$ and the corresponding M are said to be faithful. For example, the linear representation associated with the left A-module A is faithful; this is called the (left) regular representation of A. If M is †simple as an A-module, $\rho$ is said to be irreducible (or simple). A homomorphism from an irreducible representation $\rho$ to $\rho$ must be an isomorphism or the zero
homomorphism (Schur's lemma). In particular, if K is an †algebraically closed field and M is finite-dimensional, then such a homomorphism is a scalar multiplication. A linear representation is said to be reducible if it is not irreducible. If M is †semisimple as an A-module, $\rho$ is said to be completely reducible (or semisimple). If A is a semisimple ring, any linear representation of A is completely reducible. The converse also holds (→ 368 Rings G). The linear representations associated with a submodule and a quotient module of M as an A-module are called a subrepresentation and a quotient representation, respectively. The linear representation associated with the direct sum $M_1 + \cdots + M_r$ of the representation modules $M_1, \ldots, M_r$ of linear representations $\rho_1, \ldots, \rho_r$ is written $\rho_1 + \cdots + \rho_r$ and called the direct sum of representations. If $\rho$ is never similar to the direct sum of two nonzero linear representations, then $\rho$ is said to be indecomposable; this means that M is †indecomposable as an A-module.
For linear representations $\rho$, $\rho'$ of a group G in M, M', we define the linear representation $\rho \otimes \rho'$ in $M \otimes M'$ by $(\rho \otimes \rho')(g) = \rho(g) \otimes \rho'(g)$ ($g \in G$); this is called the tensor product of representations $\rho$ and $\rho'$.

D. Matrix Representations

Let $K^n$ be the K-module consisting of all n-tuples $(\xi_i)$ of elements in a commutative ring K. $\mathcal{E}_K(K^n)$ is identified with the K-algebra $M_n(K)$ of all $n \times n$ matrices $(\lambda_{ij})$ over K: $(\lambda_{ij})(\xi_j) = (\sum_{j=1}^{n} \lambda_{ij}\xi_j)$. Thus a linear representation of A in $K^n$, i.e., a homomorphism $A \to M_n(K)$, is called a matrix representation of A over K, and n is called its degree. A matrix representation of a group G over K of degree n is a homomorphism $G \to GL(n, K)$, where $GL(n, K)$ is the group of all $n \times n$ invertible matrices. If $(e_1, \ldots, e_n)$ is a †basis of a K-module M, then by the K-isomorphism $K^n \to M$ given by the assignment $(\xi_i) \to \sum_{i=1}^{n} e_i\xi_i$, we have a bijective correspondence between the matrix representations of A of degree n and the linear representations of A in M, and the corresponding representations are similar. Explicitly, the linear representation $\rho$ corresponding to a matrix representation $a \to (\lambda_{ij}(a))$ is given by
\[ \rho(a)e_j = \sum_{i=1}^{n} e_i\lambda_{ij}(a), \quad a \in A. \]
Hence giving the finite-dimensional linear representations over a field K is equivalent to giving the matrix representations over K. Let T, T' be matrix representations of degree n, n'. Then a homomorphism from T to T' is an $n' \times n$ matrix P satisfying $PT(a) = T'(a)P$ ($a \in A$). Therefore T and T' are similar if and only if $n = n'$ and $PT(a)P^{-1} = T'(a)$ ($a \in A$) for some $n \times n$ invertible matrix P. For a representation of a group G, it suffices that this equation is satisfied by all $a \in G$.
We always assume that K is a field. Then a K-module is a linear space over K. A linear representation $\rho$ over K of a K-algebra A is said to be of degree n if its representation module M is of dimension n over K. Suppose that a sequence $\{0\} = M_0 \subset M_1 \subset \cdots \subset M_r = M$ of A-submodules of M is given. We take a basis $(e_1, \ldots, e_n)$ of M over K such that $(e_1, \ldots, e_{m_i})$ forms a basis of $M_i$ over K ($1 \le i \le r$). Then the matrix representation corresponding to $\rho$ relative to the basis $(e_1, \ldots, e_n)$ has the form
\[ a \to T(a) = \begin{pmatrix} T_{11}(a) & T_{12}(a) & \cdots & T_{1r}(a) \\ & T_{22}(a) & \cdots & T_{2r}(a) \\ & & \ddots & \vdots \\ 0 & & & T_{rr}(a) \end{pmatrix}, \]
where, if we put $n_i = \dim M_i/M_{i-1} = m_i - m_{i-1}$, $T_{ij}(a)$ is an $n_i \times n_j$ matrix and $T_{ij}(a) = 0$ for $i > j$. The residue classes of $e_{m_{i-1}+1}, \ldots, e_{m_i}$ form a basis of the quotient space $M_i/M_{i-1}$ over K, and the matrix representation corresponding to the linear representation $\rho_i$ associated with $M_i/M_{i-1}$ relative to this basis is given by $T_{ii}$. The sequence $\{M_i\}$ is a †composition series if and only if each $\rho_i$ (hence $T_{ii}$) is irreducible. In this case, $\rho_1, \ldots, \rho_r$ are uniquely determined by $\rho$ up to their order and similarity (Jordan-Hölder theorem). An irreducible representation $\rho'$ similar to some $\rho_i$ is called an irreducible component of $\rho$. The number $p > 0$ of $\rho_i$ similar to $\rho'$ is called the multiplicity of $\rho'$ as an irreducible component of $\rho$. We also say that $\rho$ contains $\rho'$ p times or $\rho'$ appears p times as an irreducible component of $\rho$. The representation $\rho$ is completely reducible if and only if it is similar to the direct sum of its irreducible components (admitting repetition). In this case, $\rho$ is similar to the matrix representation
\[ a \to \begin{pmatrix} T_{11}(a) & & 0 \\ & \ddots & \\ 0 & & T_{rr}(a) \end{pmatrix}. \]

E. Coefficients and Characters of Linear Representations

We consider the linear representations of an algebra over a field K. A right (left) A-module M is regarded as a linear space over K. In its dual space M*, we introduce the structure of a left (right) A-module using the inner product ( , ) as follows: $(x, ax^*) = (xa, x^*)$ ($(x, x^*a)$
$= (ax, x^*)$), where $a \in A$, $x \in M$, $x^* \in M^*$. If $\rho$ is the representation associated with M, the representation associated with M* is called the transposed representation (dual representation or adjoint representation) of $\rho$, and is denoted by ${}^t\rho$. The linear mapping ${}^t\rho(a)$ is the †transposed mapping of $\rho(a)$. If M is finite-dimensional over K, we have $(M^*)^* = M$ as an A-module. For a linear representation $\rho$ of a group G, the mapping $g \to {}^t\rho(g)^{-1}$ ($g \in G$) is called the contragredient representation of $\rho$. The reciprocal linear representation associated with the right A-module A is called the right regular representation of A, and its transposed representation (i.e., the representation associated with the left A-module A*) is called the coregular representation of A. For any finite-dimensional semisimple algebra and group ring of a finite group, the regular representation and the coregular representation are similar (→ 29 Associative Algebras H).
Let $\rho$ be a linear representation of A over K and M be its representation module. For any $x \in M$, $x^* \in M^*$, we define a †linear form $\rho_{x,x^*} \in A^*$ on A by $\rho_{x,x^*}(a) = (ax, x^*)$ ($a \in A$). This is called the coefficient of $\rho$ relative to $x$, $x^*$ and is determined by its values at generators of A as a linear space. In particular, a coefficient of a linear representation $\rho$ of a group G can be regarded as a function on G taking values in K. For a fixed $x^* \in M^*$ the assignment $x \to \rho_{x,x^*}$ gives an A-homomorphism $M \to A^*$, where A* is considered as a left A-module. Therefore any nonzero coefficient $\rho_{x,x^*}$ of an irreducible representation $\rho$ generates an A-submodule of A* isomorphic to M. In other words, any irreducible representation of A is similar to some subrepresentation of the coregular representation of A. In particular, any irreducible representation of a finite-dimensional semisimple algebra or a finite group is an irreducible component of the regular representation. Let $A^*_\rho$ be the subspace of A* generated by all coefficients $\rho_{x,x^*}$ ($x \in M$, $x^* \in M^*$) for a given linear representation $\rho$. Then $\rho \cong \rho'$ implies $A^*_\rho = A^*_{\rho'}$. If $\rho_1, \ldots, \rho_r$ are irreducible representations of A such that $\rho_i$ and $\rho_j$ are not similar unless $i = j$, the sum $A^*_{\rho_1} + \cdots + A^*_{\rho_r}$ in A* is direct. In particular, for a semisimple algebra A, let the $\rho_i$ ($1 \le i \le r$) be the irreducible representations associated with the minimal left ideals of the †simple components $A_i$ of A. Then any irreducible representation is similar to one and only one of $\rho_1, \ldots, \rho_r$, and A* can be decomposed into the direct sum of $A^*_{\rho_1}, \ldots, A^*_{\rho_r}$. In addition, each $A^*_{\rho_i}$ is canonically identified with $A_i^*$.
We shall treat finite-dimensional representations exclusively. Let $(e_1, \ldots, e_n)$ be a basis of the representation module M of $\rho$ over K, and let $a \to T(a) = (\lambda_{ij}(a))$ be the matrix representation that corresponds to $\rho$ with respect to this basis. Then $\lambda_{ij} = \rho_{e_j, e_i^*}$ ($1 \le i, j \le n$), where $(e_1^*, \ldots, e_n^*)$ is the dual basis. If K is algebraically closed and $\rho$ is irreducible (or more generally, †absolutely irreducible), then $\{\lambda_{ij}\}$ is linearly independent; therefore we have $\dim A^*_\rho = n^2$ (G. Frobenius and I. Schur). We take a matrix representation T corresponding to $\rho$ and put $\chi_\rho(a) = \operatorname{tr} T(a)$ ($a \in A$). Then $\chi_\rho$ is a function on A that is uniquely determined by $\rho$ and belongs to $A^*_\rho$; $\chi_\rho$ is called the character of $\rho$. For a linear representation $\rho$ of a group G, the character of $\rho$ can be regarded as a function on G. Moreover, it can be viewed as a function on the set of all †conjugate classes of G. The character of $\rho$ is equal to the sum of the characters of the irreducible components of $\rho$ taken with their multiplicities. The character of an irreducible representation is called an irreducible character (or simple character). If K is of characteristic 0, then $\rho \cong \rho'$ is equivalent to $\chi_\rho = \chi_{\rho'}$, and the different irreducible characters are linearly independent. The character of an absolutely irreducible representation (→ Section F) is called an absolutely irreducible character. If we consider absolutely irreducible characters only, the statement holds irrespective of the characteristic of K.
The sum of all absolutely irreducible characters of A is called the reduced character (or reduced trace) of A. The direct sum of all absolutely irreducible representations of A is called the reduced representation of A, and its character is equal to the reduced character. The determinant of the reduced representation is called the reduced norm of A.

F. Scalar Extension of Linear Representations

Let K, L be commutative rings with unity element, and fix a homomorphism $\sigma: K \to L$. We denote by $M^\sigma$ the scalar extension $\sigma^*(M) = M \otimes_K L$ of a K-module M relative to $\sigma$: $x\lambda \otimes \mu = x \otimes \lambda^\sigma\mu$ ($x \in M$; $\lambda \in K$, $\mu \in L$) (→ 277 Modules L). For an algebra A over K, the scalar extension $A^\sigma$ of the K-module A has the natural structure of an algebra over L: $(a \otimes \lambda)(b \otimes \mu) = ab \otimes \lambda\mu$ ($a, b \in A$; $\lambda, \mu \in L$). For a group G, we can regard $(K[G])^\sigma = L[G]$: $g \otimes \lambda = g\lambda$ ($g \in G$, $\lambda \in L$). If M is a left A-module, then $M^\sigma$ has the natural structure of a left $A^\sigma$-module: $(a \otimes \lambda)(x \otimes \mu) = ax \otimes \lambda\mu$ ($a \in A$, $x \in M$; $\lambda, \mu \in L$). For the linear representation $\rho$ associated with M, the linear representation $\rho^\sigma$ over L associated with $M^\sigma$ is called the scalar extension of $\rho$ relative to $\sigma$: $\rho^\sigma(a \otimes 1) = \rho(a) \otimes 1_L$. Let $(e_1, \ldots, e_n)$ be a basis of M over K. If the matrix representation $a \to (\lambda_{ij}(a))$ corresponds to M relative to this basis, then the matrix representation corre-
sponding to $M^\sigma$ relative to the basis $(e_1 \otimes 1, \ldots, e_n \otimes 1)$ over L is given by $a \otimes 1 \to (\lambda_{ij}(a)^\sigma)$. A linear representation over L is said to be realizable in K if it is similar to the scalar extension $\rho^\sigma$ of some linear representation $\rho$ over K.
In particular, if $\sigma: K \to L$ is an isomorphism, $\rho^\sigma$ is called the conjugate representation of $\rho$ relative to $\sigma$. The conjugate representation relative to the automorphism $\sigma: \lambda \to \bar{\lambda}$ (complex conjugation) of the complex number field is called the complex conjugate representation. If $\mathfrak{m}$ is an ideal of K and $\sigma: K \to K/\mathfrak{m}$ (†residue class ring) is the canonical homomorphism, then the construction of $\rho^\sigma$ from $\rho$ is called the reduction modulo $\mathfrak{m}$ (→ Section I). If $\mathfrak{p}$ is a prime ideal of K and $\sigma: K \to K_\mathfrak{p}$ (†local ring) is the canonical homomorphism, then the construction of $\rho^\sigma$ from $\rho$ is called the localization relative to $\mathfrak{p}$. If K is an integral domain and $\mathfrak{p} = \{0\}$ (so that the elements inverted are those of $K - \{0\}$), then $K_\mathfrak{p}$ is the †field of quotients of K. We can also consider the "completion of representation" with respect to $\mathfrak{p}$.
Let K be a field, L a field extension, and $\sigma: K \to L$ the canonical injection. Then for a linear representation $\rho$ of A and its representation module M, the scalar extensions $\rho^\sigma$, $M^\sigma$ are written $\rho^L$, $M^L$, respectively. In view of $M \subset M^L$, $A \subset A^L$, $\mathcal{E}_K(M) \subset \mathcal{E}_L(M^L)$ by the natural injections, we can regard $\rho^L$ as an extension of the mapping $\rho$. We shall consider finite-dimensional representations exclusively. For linear representations $\rho_1$, $\rho_2$ over K, $\rho_1 \cong \rho_2$ is equivalent to $\rho_1^L \cong \rho_2^L$. An irreducible representation $\rho$ over K is said to be absolutely irreducible if its scalar extension $\rho^L$ to any field extension L is irreducible; an equivalent condition is that the scalar extension $\rho^{\bar{K}}$ to the †algebraic closure $\bar{K}$ is irreducible. Another equivalent condition is that every endomorphism of the representation module M of $\rho$ must be a scalar multiplication. If every irreducible representation of A over K is absolutely irreducible, K is called a splitting field for A. For a group G, if the field K is a splitting field for the group ring K[G], then K is called a splitting field for G. Let A be finite-dimensional over K. If K is a splitting field for A, any irreducible representation of $A^L$ is realizable in K for any field extension L of K.
For an arbitrary field K, the scalar extension $\rho^L$ of an irreducible representation $\rho$ to a †separable algebraic extension L of K is completely reducible. For simplicity, we assume that K is †perfect and $L = \bar{K}$. Then the multiplicities of all irreducible components of $\rho^L$ are the same; this multiplicity is called the Schur index of $\rho$.
The set S(K) of †algebra classes over K, each of which is represented by a (central) simple component of the group algebra K[G] of some finite group G, is a subgroup of the †Brauer group B(K) of K, known as the Schur subgroup of B(K). Recent research has clarified considerably the structure of this group [19].

G. Linear Representations of Finite Groups

Let G be a finite group of order g. The linear representation of G over K is equivalent to the linear representation of the group ring K[G], concerning which we have already stated the general facts. If K is the ring Z of rational integers, a linear representation over K is sometimes called an integral representation. We assume that K is a field. If the characteristic of K is zero or more generally not a divisor of g, every linear representation of G over K is completely reducible (H. Maschke). Such a representation is called an ordinary representation. If g is divisible by the characteristic of K, we have a modular representation (→ Section I).
The exponent of G is the smallest positive integer n satisfying $a^n = 1$ for every element $a \in G$. A field containing all the nth roots of unity is a splitting field for G (R. Brauer, 1945). Consequently, for such a field K, any scalar extension of an irreducible representation over K is irreducible, and any irreducible representation over any field extension of K is realizable in K. We fix a splitting field K for G and assume that K is of characteristic 0; for example, we can assume K = C.
The number of nonsimilar irreducible representations of G is equal to the number of conjugate classes in G. Each irreducible representation appears as an irreducible component of the regular representation with multiplicity equal to the degree. In addition, each degree is a divisor of the order g of G. Let $\rho$ be a linear representation of a subgroup H of G and M be its representation module. Then the linear representation of G associated with the K[G]-module $K[G] \otimes_{K[H]} M$ is called the induced representation and is denoted by $\rho^G$. If the matrix representation T corresponds to $\rho$, then using the partition of G into the cosets $G = a_1H \cup \cdots \cup a_rH$ we can write the matrix representation corresponding to $\rho^G$ as
\[ a \to \begin{pmatrix} T(a_1^{-1}aa_1) & \cdots & T(a_1^{-1}aa_r) \\ \vdots & & \vdots \\ T(a_r^{-1}aa_1) & \cdots & T(a_r^{-1}aa_r) \end{pmatrix}, \]
where we define $T(b) = 0$ for $b \notin H$. The induced representation from a representation of degree 1 of a subgroup is called a monomial representation. To such a representation corresponds a matrix representation T such that T(a) has exactly one nonzero entry in each row and column for every $a \in G$. For the trivial sub-
group $H = \{e\}$, we obtain the regular representation of G. In general, for an irreducible representation $\sigma$ of G and an irreducible representation $\rho$ of a subgroup H, the multiplicity of $\sigma$ in $\rho^G$ coincides with that of $\rho$ in the restriction $\sigma_H$ of $\sigma$ to H (the Frobenius theorem). The following orthogonality relations hold for irreducible characters $\chi$ and $\psi$ of G:
\[ \sum_{a \in G} \chi(a)\psi(a^{-1}) = \begin{cases} g & (\chi = \psi), \\ 0 & (\chi \neq \psi); \end{cases} \qquad \sum_{\chi} \chi(a)\chi(b^{-1}) = \begin{cases} g/g_a & (C(a) = C(b)), \\ 0 & (C(a) \neq C(b)). \end{cases} \]
In the second formula, $\chi$ ranges over all the irreducible characters of G, C(a) denotes the conjugate class of G containing a, and $g_a$ is the number of elements in C(a).

H. Linear Representations of Symmetric Groups

All irreducible representations of a †symmetric group $\mathfrak{S}_n$ over the field Q of rational numbers are absolutely irreducible. Hence the representation theory of $\mathfrak{S}_n$ over a field of characteristic zero reduces to that over Q. Since the group algebra $A = Q[\mathfrak{S}_n]$ is semisimple, to obtain an irreducible representation of $\mathfrak{S}_n$, it is sufficient to find a †primitive idempotent (i.e., an idempotent that is not the sum of two orthogonal nonzero idempotents) of A. Such an idempotent can be obtained in the following way. As in Fig. 1, we draw a diagram T consisting of n squares arranged in rows of decreasing lengths, the left ends of which are arranged in a single column. Such a diagram T is called a Young diagram; if it has k rows of lengths $f_1 \ge f_2 \ge \cdots \ge f_k > 0$, $f_1 + f_2 + \cdots + f_k = n$, then it is written $T = T(f_1, \ldots, f_k)$. We put the numerals 1 to n in any order into the n squares of $T = T(f_1, f_2, \ldots, f_k)$, as in Fig. 1, for example. We then denote by $\sigma$ any permutation of $\mathfrak{S}_n$ that preserves each row and construct an element of A: $s = \sum \sigma$. Similarly, we denote by $\tau$ any permutation of $\mathfrak{S}_n$ that preserves each column and set $t = \sum (\operatorname{sgn} \tau)\tau$. If we set $u = t \cdot s = \sum \pm \tau \cdot \sigma$, then u is a primitive idempotent of A except for a numerical factor. This implies that u yields an irreducible representation of $\mathfrak{S}_n$. The element u of A is called the Young symmetrizer associated with T.

Fig. 1
Young diagram. $n = 26$; $f_1 = 8$, $f_2 = 6$, $f_3 = 6$, $f_4 = 4$, $f_5 = 2$.

If we put the numerals 1 to n into the n squares of T in a different order, we obtain another symmetrizer u' associated with T. However, these two irreducible representations associated with u and u' are similar. Hence there corresponds to T a fixed class of irreducible representations of $\mathfrak{S}_n$, i.e., a fixed irreducible character of $\mathfrak{S}_n$. Moreover, any two different Young diagrams yield different irreducible characters, and any irreducible character is obtained by a suitable Young diagram. Thus there exists a one-to-one correspondence between the Young diagrams and the irreducible characters of $\mathfrak{S}_n$.
The method of determining the character associated with a given diagram was found by Schur and H. Weyl (→ 60 Classical Groups).

I. Modular Representations of Finite Groups

Let G be a finite group of order g, and let K be a splitting field of G of characteristic $p \neq 0$. If p is a divisor of g, we have the case of modular representation, in which the situation is quite different from the case of ordinary representation. The theory of modular representations of a finite group was developed mainly by Brauer after 1935.
The elements of G whose orders are prime to p are called p-regular. Let k be the number of p-regular classes of G, i.e., conjugate classes of G containing the p-regular elements. Then there exist exactly k nonsimilar absolutely irreducible modular representations $F_1, F_2, \ldots, F_k$. The number of nonsimilar indecomposable components of the regular representation R of G is also equal to k, and we denote these representations by $U_1, U_2, \ldots, U_k$. We can number them in such a way that $F_\kappa$ appears in $U_\kappa$ as both its top and bottom component. If the degree of $F_\kappa$ is $f_\kappa$ and that of $U_\kappa$ is $u_\kappa$, then $U_\kappa$ appears $f_\kappa$ times in R and $F_\kappa$ appears $u_\kappa$ times in R. The multiplicities $c_{\kappa\lambda}$ of $F_\lambda$ in $U_\kappa$ are called the Cartan invariants of G.
Take an algebraic number field $\Omega$ that is a splitting field of G. Let $\mathfrak{p}$ be a prime ideal in $\Omega$ dividing p, and let $\mathfrak{o}$ be the domain of †$\mathfrak{p}$-integers of $\Omega$. Then the residue class field $\mathfrak{o}/\mathfrak{p}$ is a finite field of characteristic p and a splitting field of G. Hence we can assume that $\mathfrak{o}/\mathfrak{p} = K$, where K is the field considered at the beginning of this section. Let $Z_1, Z_2, \ldots, Z_n$ be the nonsimilar irreducible representations of G in $\Omega$. We can assume that all the coefficients of $Z_i$ are contained in $\mathfrak{o}$. Replacing every coefficient in $Z_i$ by its residue class mod $\mathfrak{p}$, we obtain a modular representation $\bar{Z}_i$. The modular representations $\bar{Z}_1, \ldots, \bar{Z}_n$ thus obtained may be reducible. The multiplicities $d_{i\kappa}$ of $F_\kappa$ in $\bar{Z}_i$ are called the decomposition numbers of G.
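For a small group the decomposition numbers can be worked out by hand. The sketch below is an illustration added here, not taken from the article: it treats the symmetric group on three letters with p = 3, whose 3-regular classes are the identity and the transpositions. It assumes as known input that the irreducible Brauer characters are the trivial and the sign character, finds each row of the decomposition matrix by a bounded brute-force search, and then forms the Cartan invariants through the standard relation $c_{\kappa\lambda} = \sum_i d_{i\kappa} d_{i\lambda}$. All names in the code are hypothetical.

```python
from itertools import product

# Ordinary irreducible characters of S_3, restricted to the two 3-regular
# classes (identity, transpositions).
ordinary = {
    "trivial": (1, 1),
    "sign": (1, -1),
    "standard": (2, 0),
}

# Irreducible Brauer characters for p = 3 on the same classes
# (assumed known for this example: trivial and sign).
brauer = [(1, 1), (1, -1)]

def decomposition_row(chi, brauer, bound=4):
    """Nonnegative integers d with chi = sum_k d[k] * brauer[k] on p-regular classes."""
    for d in product(range(bound), repeat=len(brauer)):
        combo = tuple(
            sum(dk * phi[c] for dk, phi in zip(d, brauer))
            for c in range(len(chi))
        )
        if combo == tuple(chi):
            return d
    raise ValueError("no decomposition found within the search bound")

# Decomposition matrix D: one row per ordinary character, one column per Brauer character.
D = [decomposition_row(chi, brauer) for chi in ordinary.values()]

# Cartan matrix C = D^T D.
k = len(brauer)
C = [[sum(row[a] * row[b] for row in D) for b in range(k)] for a in range(k)]
det_C = C[0][0] * C[1][1] - C[0][1] * C[1][0]

print("D =", D)
print("C =", C, "det =", det_C)
```

Here the two-dimensional representation becomes reducible modulo 3, with the trivial and sign representations as its irreducible components, and the determinant of the resulting Cartan matrix is 3, a power of p.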
They are related to the Cartan invariants by the fundamental relations
\[ c_{\kappa\lambda} = \sum_{i=1}^{n} d_{i\kappa}d_{i\lambda}. \]
The determinant $|c_{\kappa\lambda}|$ of degree k is a power of p. We set $g = p^e g'$, $(p, g') = 1$. Then we may assume that $\Omega$ contains a primitive g'th root of unity $\delta$ ($\in \mathfrak{o}$). Since $(p, g') = 1$, the residue class $\bar{\delta}$ ($\in K$) of $\delta$ is a primitive g'th root of unity. Let M be a modular representation of G. The characteristic roots of M(a) for a p-regular element a are powers $\bar{\delta}^\nu$ of $\bar{\delta}$. We replace each $\bar{\delta}^\nu$ by $\delta^\nu$ and obtain an element $\zeta(a)$ of $\Omega$ as the sum of these $\delta^\nu$. In this manner we define a complex-valued function $\zeta$ on the set of p-regular elements of G. We call $\zeta$ the modular character (or Brauer character) of M. Two modular representations have the same irreducible components if and only if their modular characters coincide. Denoting by $\varphi_\kappa$ the modular character of $F_\kappa$ and by $\eta_\kappa$ that of $U_\kappa$, we have the following orthogonality relations for the modular characters:
\[ \sum_{a} \varphi_\kappa(a)\eta_\lambda(a^{-1}) = \begin{cases} g & (\kappa = \lambda), \\ 0 & (\kappa \neq \lambda); \end{cases} \qquad \sum_{\kappa} \varphi_\kappa(a)\eta_\kappa(b^{-1}) = \begin{cases} g/g_a & (C(a) = C(b)), \\ 0 & (C(a) \neq C(b)). \end{cases} \]
In the first sum, a ranges over all p-regular elements of G.
We say that $F_\kappa$ and $F_\lambda$ belong to the same block if there exists a sequence of indices $\kappa, \alpha, \beta, \ldots, \gamma, \lambda$ such that $c_{\kappa\alpha} \neq 0$, $c_{\alpha\beta} \neq 0$, ..., $c_{\gamma\lambda} \neq 0$. This is obviously an equivalence relation, and $F_1, F_2, \ldots, F_k$ are classified into a finite number, say s, of blocks $B_1, B_2, \ldots, B_s$. If $F_\kappa$ belongs to a block $B_\tau$, we say by a stretch of language that the corresponding $U_\kappa$ also belongs to $B_\tau$. All the irreducible components of $\bar{Z}_i$ belong to the same block since $c_{\kappa\lambda} \neq 0$ if $d_{i\kappa} \neq 0$ and $d_{i\lambda} \neq 0$. If the irreducible components of $\bar{Z}_i$ belong to $B_\tau$, we say that $Z_i$ belongs to $B_\tau$. Let $x_\tau$ be the number of $Z_i$ belonging to $B_\tau$ and $y_\tau$ the number of $F_\kappa$ belonging to $B_\tau$. Then $x_\tau \ge y_\tau$. If $\chi_i$ is the ordinary character of $Z_i$, then $\chi_i$ can be considered as the modular character of $\bar{Z}_i$. If we denote the degree of $Z_i$ by $z_i$, then $g_a\chi_i(a)/z_i$ for $a \in G$ is an algebraic integer and hence belongs to $\mathfrak{o}$. Now $Z_i$ and $Z_j$ belong to the same block if and only if $g_a\chi_i(a)/z_i \equiv g_a\chi_j(a)/z_j \pmod{\mathfrak{p}}$ for all p-regular elements a of G.
If $p^\alpha$ is the highest power of p that divides all the degrees $z_i$ of $Z_i$ belonging to $B_\tau$, then it is also the highest power of p dividing all the degrees $f_\kappa$ of $F_\kappa$ belonging to $B_\tau$. We call $d = e - \alpha$ the defect of $B_\tau$; obviously $0 \le d \le e$. If $Z_i$ belongs to a block of defect d, then the power of p dividing $z_i$ is $p^{e-d+h_i}$ ($h_i \ge 0$). A block of defect 0 contains exactly one ordinary representation $Z_i$, hence also exactly one modular representation $F_\kappa$ ($x_\tau = y_\tau = 1$). Moreover, we have $\bar{Z}_i = F_\kappa = U_\kappa$. It follows that all the degrees $z_i$ of $Z_i$ belonging to a block of defect 1 are exactly divisible by $p^{e-1}$; the converse is also true. $Z_i$ belongs to a block of defect 0 if and only if $\chi_i(a) = 0$ for any element a of G whose order is divisible by p.
Let D be any p-Sylow subgroup of the †centralizer $C_G(a)$ of an element a of G, and let $(D : 1) = p^d$. Then d is called the defect of the class C(a), and D is called a defect group of C(a). The number of blocks of defect e is equal to the number of p-regular classes of defect e. Let $B_\tau$ be a block of defect d. Then there exists a p-regular class of defect d containing an element a such that $g_a\chi_i(a)/z_i \not\equiv 0 \pmod{\mathfrak{p}}$ for any $Z_i$ in $B_\tau$. The defect group D of C(a) is called the defect group of $B_\tau$, and D is uniquely determined up to conjugacy in G. The number of blocks of G with defect group D is equal to the number of blocks of the †normalizer $N_G(D)$ with defect group D.
An arbitrary element x of G can be written uniquely as a product $x = sr = rs$, where s, called the p-factor of x, is an element whose order is a power of p, and r is a p-regular element. We say that two elements of G belong to the same section if and only if their p-factors are conjugate in G. This is an equivalence relation. Obviously, each section is the union of conjugate classes of G. If the p-factor of x is not conjugate to any element of the defect group D of $B_\tau$, then $\chi_i(x) = 0$ for all $Z_i$ in $B_\tau$. Let $\varphi_1^s, \varphi_2^s, \ldots, \varphi_{k(s)}^s$ be the absolutely irreducible modular characters of $C_G(s)$, and let $\chi_\mu^s$ be the absolutely irreducible ordinary characters of $C_G(s)$. Since
\[ \chi_\mu^s(sr) = \varepsilon_\mu\chi_\mu^s(r) = \varepsilon_\mu\sum_{\kappa} d_{\mu\kappa}\varphi_\kappa^s(r), \quad r \in C_G(s), \]
we have
\[ \chi_i(sr) = \sum_{\mu} r_{i\mu}\chi_\mu^s(sr) = \sum_{\kappa} d_{i\kappa}^s\varphi_\kappa^s(r). \]
The $d_{i\kappa}^s$ are called the generalized decomposition numbers of G. If the order of s is $p^t$, then the $d_{i\kappa}^s$ are algebraic integers of the field of the $p^t$th roots of unity. Let s be conjugate to an element of D. There corresponds to $B_\tau$ a union $\tilde{B}_\tau$ of blocks of $C_G(s)$, and if $\sigma \neq \tau$, then $\tilde{B}_\sigma$ and $\tilde{B}_\tau$ contain no irreducible modular representations in common. We have $d_{i\kappa}^s = 0$ whenever $Z_i$ belongs to $B_\tau$ and $\varphi_\kappa^s$ does not belong to $\tilde{B}_\tau$. Brauer's original proof of this result was considerably complicated; simpler proofs were given independently by K. Iizuka and H. Nagao. From these relations we get the following refinement of the orthogonality relations for group characters. If $Z_i$ and $Z_j$ belong to different blocks of G, then $\sum_{a \in S} \chi_i(a)\chi_j(a^{-1}) = 0$, where a ranges over all
362 J 1346
Representations

the elements belonging to a fixed section $S$ of $G$. If elements $a$ and $b$ of $G$ belong to different sections, then $\sum_i \chi_i(a)\chi_i(b^{-1}) = 0$, where $\chi_i$ ranges over all the characters of $G$ belonging to a fixed block $B_\sigma$.

J. Projective Representations of Finite Groups

Let $V$ be a finite-dimensional linear space over a field $K$, and let $P(V)$ be the projective space associated with $V$ (→ 343 Projective Geometry). The set of all projective transformations of $P(V)$ forms the group $PGL(V)$, which can be identified with the quotient group $GL(V)/K^* 1_V$. Here $K^* = K - \{0\}$, and $K^* 1_V$ is the set of all scalar multiples of the identity transformation $1_V$ of $V$ and is the center of $GL(V)$. A homomorphism $\rho: G \to PGL(V)$ is called a projective representation of $G$ in $V$ or simply a projective representation of $G$ over $K$. Two projective representations $(\rho, V)$ and $(\rho', V')$ of $G$ are said to be similar if there exists an isomorphism $\varphi: PGL(V) \to PGL(V')$ induced by a suitable isomorphism $V \to V'$ such that $\varphi \circ \rho(a) \circ \varphi^{-1} = \rho'(a)$ ($a \in G$). Let $V_1 \neq \{0\}$ be a subspace of $V$. We can assume that $P(V_1) \subset P(V)$. If $(\rho, V)$ is a projective representation of $G$ such that each $\rho(a)$ ($a \in G$) leaves $P(V_1)$ invariant, we get a projective representation $(\rho_1, V_1)$ by restricting the $\rho(a)$ to $P(V_1)$. In this case $(\rho_1, V_1)$ is called a subrepresentation of $\rho$. A projective representation $\rho$ is said to be irreducible if there exists no proper subrepresentation of $\rho$.

A mapping $\sigma: G \to GL(V)$ is called a section for $(\rho, V)$ if $\pi(\sigma(a)) = \rho(a)$ for each $a \in G$, where $\pi$ is the natural projection of $GL(V)$ onto $PGL(V)$. Any section $\sigma$ defines a mapping $f: G \times G \to K^*$ satisfying $\sigma(a)\sigma(b) = f(a, b)\sigma(ab)$ ($a, b \in G$). The set $\{f(a, b)\}_{a, b \in G}$ is called the factor set of $\rho$ with respect to $\sigma$. The mapping $f$ is a 2-cocycle of $G$ with values in $K^*$. The 2-cohomology class $c_\rho \in H^2(G, K^*)$ of $f$ is determined by $\rho$ and is independent of the choice of sections for $\rho$. A projective representation $\rho$ has a section $\sigma$ which is a linear representation of $G$ in $V$ if and only if $c_\rho = 1$. If $G$ is a finite group, for any $c \in H^2(G, K^*)$ there exists an irreducible projective representation $\rho$ of $G$ over $K$ which belongs to $c$, i.e., $c_\rho = c$. If $\rho$ and $\rho'$ are similar, then $c_\rho = c_{\rho'}$. The tensor product $\rho \otimes \rho'$ of two projective representations $\rho$ and $\rho'$ can be defined as in the case of linear representations, and we have $c_{\rho \otimes \rho'} = c_\rho c_{\rho'}$. If $K$ is algebraically closed, then $H^2(G, K^*)$ is determined by the characteristic of $K$. When $K$ is the complex field $\mathbf{C}$, the group $H^2(G, \mathbf{C}^*) = \mathfrak{M}(G)$ is called the multiplier of $G$. If $\mathfrak{M}(G) = 1$, then $G$ is called a closed group, and any projective representation of $G$ is induced by a linear representation of $G$. In general, if $\rho$ is a projective representation of $G$ over $\mathbf{C}$, then the order of $c_\rho$ is a divisor of the degree of $\rho$ (dimension of $V$). Moreover, if $\rho$ is irreducible, then both the degree of $\rho$ and the square of the order of $c_\rho$ are divisors of the order of $G$. K. Yamazaki, among others, studied the projective representations of finite groups in detail.

K. Integral Representations

Every complex matrix representation of $G$ is equivalent to a matrix representation in the ring of algebraic integers. If an algebraic number field $K$ is specified, every $K[G]$-module $V$ contains $G$-invariant $R$-lattices (briefly, $G$-lattices), where $R$ is the ring of integers in $K$. A $G$-lattice $L$ is characterized as an $R[G]$-module which is finitely generated and torsion free (hence projective) as an $R$-module. It provides an integral representation of $G$ as an automorphism group of the $R$-projective module $L$.

$R[G]$-modules $L$ and $M$ need not be isomorphic even when the $K[G]$-modules $K \otimes L$ and $K \otimes M$ are isomorphic. The set of $G$-lattices in a fixed $K[G]$-module $V$ is divided into a finite number of $R[G]$-isomorphism classes (Jordan-Zassenhaus theorem). Let $\mathfrak{p}$ be a prime ideal of $R$ and $R_\mathfrak{p}$ be the localization of $R$ at $\mathfrak{p}$. The study of $R_\mathfrak{p}$-representations is intimately related with modular representation theory. For any $R[G]$-module $L$ there is an associated family of $R_\mathfrak{p}[G]$-modules $L_\mathfrak{p} = R_\mathfrak{p} \otimes L$, where $\mathfrak{p}$ ranges over all primes of $R$. $G$-lattices $L$ and $M$ in a $K[G]$-module $V$ are said to be of the same genus if $L_\mathfrak{p} \cong M_\mathfrak{p}$ for every $\mathfrak{p}$. The number of genera of $G$-lattices in $V$ is given by $\prod_{\mathfrak{p} \mid g} h_\mathfrak{p}$ ($g$ = order of $G$), where $h_\mathfrak{p}$ denotes the number of $R_\mathfrak{p}[G]$-equivalence classes of $R_\mathfrak{p}[G]$-lattices in $V$. When $V$ is absolutely irreducible, the number of $R[G]$-equivalence classes in a genus equals the (ideal) class number of $K$ (J. M. Maranda and S. Takahashi).

The Krull-Schmidt theorem, asserting the uniqueness of a direct sum decomposition into indecomposable $R[G]$-modules, holds if $R$ is a complete discrete valuation ring or if $R$ is a discrete valuation ring and $K$ is a splitting field of $G$. The condition for the finiteness of the number of nonisomorphic indecomposable $G$-lattices is known. In particular, for $R = \mathbf{Z}$ it reduces to the requirement that the Sylow $p$-subgroup of $G$ be cyclic of order $p$ or $p^2$ for every $p \mid g$. Regarding projective $\mathbf{Z}[G]$-modules → 200 Homological Algebra G.

The isomorphism problem, i.e., the question of whether the isomorphism $\mathbf{Z}[G] \cong \mathbf{Z}[H]$ of integral group algebras implies the isomorphism $G \cong H$ of groups, has been answered affirmatively for certain special cases such as meta-Abelian groups.

$R[G]$ is an $R$-order in $K[G]$, and in this context, a considerable portion of the integral representation theory has been extended to more general orders in separable algebras [14-16].

References

[1] B. L. van der Waerden, Gruppen von linearen Transformationen, Springer, 1935 (Chelsea, 1948).
[2] B. L. van der Waerden, Algebra II, Springer, fifth edition, 1967.
[3] H. Boerner, Darstellungen von Gruppen, Springer, 1955.
[4] N. Bourbaki, Eléments de mathématique, Algèbre, ch. 8, Actualités Sci. Ind., 1261a, Hermann, 1958.
[5] C. W. Curtis and I. Reiner, Representation theory of finite groups and associative algebras, Interscience, 1962.
[6] R. Brauer and C. Nesbitt, On the modular characters of groups, Ann. Math., (2) 42 (1941), 556-590.
[7] R. Brauer, Zur Darstellungstheorie der Gruppen endlicher Ordnung I, II, Math. Z., 63 (1956), 406-444; 72 (1959-1960), 25-46.
[8] M. Osima, Notes on blocks of group characters, Math. J. Okayama Univ., 4 (1955), 175-188.
[9] I. Schur, Über die Darstellung der endlichen Gruppen durch gebrochene lineare Substitutionen, J. Reine Angew. Math., 127 (1904), 20-50.
[10] I. Schur, Untersuchungen über die Darstellung der endlichen Gruppen durch gebrochene lineare Substitutionen, J. Reine Angew. Math., 132 (1907), 85-137.
[11] I. Schur, Über die Darstellung der symmetrischen und der alternierenden Gruppe durch gebrochene lineare Substitutionen, J. Reine Angew. Math., 139 (1911), 155-250.
[12] R. Brauer, Representations of finite groups, Lectures on modern mathematics, T. L. Saaty (ed.), Wiley, 1963, vol. 1, 133-175.
[13] W. Feit, Characters of finite groups, Benjamin, 1967.
[14] I. Reiner, A survey of integral representation theory, Bull. Amer. Math. Soc., 76 (1970), 159-227.
[15] R. Swan, K-theory of finite groups and orders, Lecture notes in math. 149, Springer, 1970.
[16] K. W. Roggenkamp (with V. Huber-Dyson), Lattices over orders I, II, Lecture notes in math. 115, 142, Springer, 1970.
[17] L. Dornhoff, Group representation theory, Dekker, A, 1971; B, 1972.
[18] J.-P. Serre, Représentations linéaires des groupes finis, second edition, Hermann, 1972.
[19] T. Yamada, The Schur subgroup of the Brauer group, Lecture notes in math. 397, Springer, 1974.

363 (XXI.40)
Riemann, Georg Friedrich Bernhard

Georg Friedrich Bernhard Riemann (September 17, 1826-July 20, 1866) was born the son of a minister in Breselenz, Hanover, Germany. He attended the universities of Göttingen and Berlin. In 1851 he received his doctorate at the University of Göttingen and in 1854 became a lecturer there. In 1857 he rose to assistant professor, and in 1859 succeeded P. G. L. Dirichlet as full professor. In 1862 he contracted tuberculosis, and he died at age 40. Despite his short life, his contributions encompassed all aspects of mathematics.

His doctoral thesis (1851) stated the basic theorem on conformal mapping and became the foundation for the geometric theory of functions. In his paper presented for the position of lecturer (1854), he defined the Riemann integral and gave the conditions for convergence of trigonometric series. In his inaugural lecture in the same year, he discussed the foundations of geometry, introduced $n$-dimensional manifolds, formulated the concept of Riemannian manifolds, and defined their curvature. In his paper of 1857 on Abelian functions, he systematized the theory of Abelian integrals and Abelian functions. In his paper of 1858 on the distribution of prime numbers, he considered the Riemann zeta function as a function of a complex variable and stated Riemann's hypothesis concerning the distribution of its zeros. It remains for modern mathematics to investigate whether this hypothesis is correct. In his later years, influenced by W. Weber, Riemann became interested in theoretical physics. He gave lectures on the uses of partial differential equations in physics that were edited and published by H. Weber.

References

[1] G. F. B. Riemann, Gesammelte mathematische Werke und wissenschaftlicher Nachlass, R. Dedekind and H. Weber (eds.), Teubner, second edition, 1892 (Nachträge, M. Noether and W. Wirtinger (eds.), 1902) (Dover, 1953).
364A 1348
Riemannian Manifolds

[2] F. Klein, Vorlesungen über die Entwicklung der Mathematik im 19. Jahrhundert I, Springer, 1926 (Chelsea, 1956).

364 (VII.3)
Riemannian Manifolds

A. Riemannian Metrics

Let $M$ be a differentiable manifold of class $C^r$ ($1 \le r \le \omega$), and $g$ be a Riemannian metric of class $C^{r-1}$ on $M$. Then $(M, g)$ or simply $M$ is called a Riemannian manifold (or Riemannian space) of class $C^r$ (→ 105 Differentiable Manifolds). The metric $g$ is a covariant tensor field of order 2 and of class $C^{r-1}$; it is called the fundamental tensor of $M$. Using the value $g_p$ of $g$ at each point $p \in M$, a positive definite inner product $g_p(X, Y)$, $X, Y \in T_p$, is introduced on the tangent vector space $T_p$ to $M$ at $p$, and hence $T_p$ can be considered as a vector space over $\mathbf{R}$ with inner product that can be identified with the Euclidean space $E^n$ of dimension $n = \dim M$. Utilizing the properties of the space $E^n$, we can introduce various notions on $T_p$ and $M$. (For example, given a tangent vector $L \in T_p$, we define the length $\|L\| = \|L\|_{g,p}$ of $L$ to be the quantity $g_p(L, L)^{1/2}$. A normal vector at a point $p$ of a submanifold $N$ of $M$ is well defined as an element of the orthogonal complement of the subspace $T_p(N)$ of $T_p(M)$ with respect to $g_p$; a differential form of degree 1 is identified with a tangent vector field.) A necessary and sufficient condition for a differentiable manifold $M$ of class $C^r$ to have a Riemannian metric is that $M$ be paracompact. A Euclidean space $E^n$ has a Riemannian metric expressed by $\sum_i dx^i \otimes dx^i$ in terms of an orthogonal coordinate system $(x^i)$.

We assume that $M$ is connected and of class $C^\infty$. A curve $x: [a, b] \to M$ is called piecewise smooth or of class $D^\infty$ if $x$ is continuous and there exists a partition of $[a, b]$ into finitely many subintervals $[t_{i-1}, t_i]$ such that the restrictions $x|[t_{i-1}, t_i]$ are immersions of class $C^\infty$. The length $\|x\|$ of such a curve $x$ is defined to be $\int_a^b \|x'(t)\|\,dt$, where $x'(t)$ is the tangent vector of $x$ defined for almost all values of $t$. As in a Euclidean space, the length $\|x\|$ is independent of the choice of parameter $t$, and the concepts of canonical parameter and orientation of $x$ can be defined (→ 111 Differential Geometry of Curves and Surfaces). A function $d: M \times M \to [0, \infty)$ is defined so that the value $d(p, q)$, $p, q \in M$, is the infimum of the lengths of curves of class $D^\infty$ joining $p$ and $q$. The function $d$ is a distance function on $M$, and the topology of $M$ defined by $d$ coincides with the original topology of $M$. There exists an essentially unique structure of a Riemannian manifold on (real or complex) elliptic or hyperbolic space (→ 285 Non-Euclidean Geometry), and $d$ is the distance function of these spaces.

If there exists an immersion $\varphi$ of a differentiable manifold $N$ in a Riemannian manifold $(M, g)$, then a Riemannian metric $\varphi^* g$ is defined on $N$ by the pullback process ($\|L\|_{\varphi^* g} = \|d\varphi(L)\|_g$). (For example, a submanifold and a covering manifold of $M$ have Riemannian manifold structures induced by the natural mappings (→ 365 Riemannian Submanifolds).) If $M = E^3$ and $N$ is a 2-dimensional submanifold of $M$, then $\varphi^* g$ is the first fundamental form of $N$. Assume further that $\varphi$ is a diffeomorphism and $N$ has a Riemannian metric $h$. If $\varphi^* g = h$, then $(N, h)$ is said to be isometric to $(M, g)$, and $\varphi$ is called an isometry. The set $I(M)$ of all isometries (isometric transformations) of $M$ onto $M$ is a group. A necessary and sufficient condition for a mapping $\psi: N \to M$ to be an isometry is that $d_N(p, q) = d_M(\psi(p), \psi(q))$, $p, q \in N$. In particular, $I(E^n)$ is the congruent transformation group.

If a differentiable manifold $M$ is the product manifold of Riemannian manifolds $(M_1, g_1)$ and $(M_2, g_2)$, then $(M, \pi_1^* g_1 + \pi_2^* g_2)$ is called the Riemannian product of $M_1$ and $M_2$, where $\pi_\alpha$, $\alpha = 1, 2$, are projections from $M$ to $M_\alpha$.

Let $F$ be the tangent $n$-frame bundle over $M$ and $B = B_g(M)$ be the subset of $F$ consisting of all orthonormal frames with respect to $g$. Then $B$ is an $O(n)$-subbundle of $F$ of class $C^\infty$, called the tangent orthogonal $n$-frame bundle (or orthogonal frame bundle). In this way we get a one-to-one correspondence between the set of all $O(n)$-subbundles of $F$ and the set of all Riemannian metrics of $M$.

B. Riemannian Connections

There exists a unique affine connection in the orthogonal frame bundle $B$ whose torsion tensor is zero. This connection is called the Riemannian connection (or Levi-Civita connection; → 80 Connections K). Let $\nabla$ denote the covariant differential operator defined by this connection (→ 80 Connections, 417 Tensor Calculus). (For a vector field $X$, the covariant differential operator $\nabla_X$ acts on any tensor field $T$ defined on a submanifold having $X$ as a tangent vector field.) The covariant differential $\nabla g$ of the fundamental tensor $g$ vanishes identically. The connection form of the Riemannian connection is expressed by $n^2$ differential 1-forms $(\omega^i_j)_{1 \le i, j \le n}$ on $B$, and we have $\omega^i_j + \omega^j_i = 0$. Let $(\omega^i)_{1 \le i \le n}$ be the canonical 1-forms on $B$. Then $(\omega^i)_{1 \le i \le n}$ together with $(\omega^i_j)_{1 \le i < j \le n}$ give rise to an absolute parallelism on $B$ (that is, they are linearly independent at

every point). Let $(\theta^i)$ and $(\theta^i_j)$ be the corresponding set of differential 1-forms on the orthogonal frame bundle $B_N$ over another Riemannian manifold $N$ with $\dim N = \dim M$. If there exists an isometry $\psi: M \to N$, then the differential $d\psi$ is a diffeomorphism from $B = B_M$ to $B_N$, and we get $(d\psi)^*(\theta^i) = \omega^i$, $(d\psi)^*(\theta^i_j) = \omega^i_j$. Conversely, if there exists a diffeomorphism $\Psi: B_M \to B_N$ satisfying $\Psi^*(\theta^i) = \omega^i$, $\Psi^*(\theta^i_j) = \omega^i_j$ and $M$ is orientable, then there exists an isometry $\psi: M \to N$ such that $d\psi = \Psi$ holds on a connected component $B_0$ of $B$. Moreover, $\psi$ is uniquely determined if we choose one $B_0$. In this way the problem of the existence of an isometry from $M$ may be reduced to one of the existence of a diffeomorphism from $B$ preserving absolute parallelism (as well as the order of the basis $(\omega^i, \omega^i_j)$).

According to the general theory of affine connections, the Riemannian connection on $M$ determines a Cartan connection uniquely with $E^n = I(E^n)/O(n)$ as fiber, which is called the Euclidean connection. As a consequence, every tangent vector space $T_p(M)$ is regarded as a Euclidean space $E^n_p$, and for a given curve $x: [a, b] \to M$ of class $D^\infty$ and for $t \in [a, b]$ there exists an isometry $I_{x,t}: E^n_{x(t)} \to E^n_{x(a)}$ satisfying the following three conditions (we denote $I_{x,b}$ by $I_x$): (1) If $x$ is a composite of two curves $y$ and $z$, then $I_x = I_y \cdot I_z$. (2) Differentiability: If $x$ is of class $C^\infty$ at $t_0$, then $t \to I_{x,t}$ is of class $C^\infty$ at $t_0$. (3) $I_x$ depends on the orientation of $x$ but not on the choice of its parameter. The development $\bar{x}$ of $x$ is the curve in $E^n_{x(a)}$ defined by $\bar{x}(t) = I_{x,t}(x(t))$, and we get $\|\bar{x}\| = \|x\|$. ($I_x$ is sometimes called the development along $x$.) Utilizing the concept of development, the theory of curves in $E^n$ can be used to study curves on $M$ (→ 111 Differential Geometry of Curves and Surfaces). For example, if $\bar{x}$ is a segment, then $x$ or $x([a, b])$ is called the geodesic arc (→ 80 Connections L); the Frenet formula is automatically formulated and proved. The rotation part $I_x^0$ of $I_x$ (the composite of $I_x$ and the parallel displacement of $E^n_{x(a)}$ translating $I_x(x(b))$ to $x(a)$) is regarded as an isomorphism of the inner product space $T_{x(b)}$ to $T_{x(a)}$. $I_x^0$ is extended to an isomorphism of the tensor algebra $\mathcal{T}(T_{x(b)})$ to $\mathcal{T}(T_{x(a)})$, which is denoted by the same symbol $I_x^0$ and called the parallel displacement or parallel translation along $x$. Given a tensor field $K$ on $M$, we have $\nabla_{x'(a)} K = [\,dI^0_{x,t}(K(x(t)))/dt\,]_{t=a}$. In particular, a necessary and sufficient condition for $\nabla K = 0$ is that $I_x^0(K(x(b))) = K(x(a))$ for any $x$, in which case $K$ is said to be parallel.

C. Exponential Mapping (→ 178 Geodesics)

A curve $x$ on $M$ or the image of $x$ is called a geodesic if any subarc $x|[a, b]$ of $x$ is a geodesic arc. Let $N(S)$ be the normal bundle of a submanifold $S$ of $M$, that is, the differentiable vector bundle over $S$ consisting of all normal vectors at all points of $S$. Then $S$ is contained in $N(S)$ as the set of zero vectors at all points of $S$. There exist a neighborhood $U$ of $S$ in $N(S)$ and a mapping $\mathrm{Exp}_S: U \to M$ of class $C^\infty$ with the following property: There exists a geodesic arc $x$ with the initial tangent vector $L \in U$, length $\|x\| = \|L\|$, and final point $\mathrm{Exp}_S(L)$. Let $U_S$ be the largest $U$ with this property. Then $\mathrm{Exp}_S: U_S \to M$ is determined uniquely by $S$. The mapping $\mathrm{Exp}_S$ is called the exponential mapping on $S$. If the rank of the Jacobian matrix of $\mathrm{Exp}_S$ is less than $n$ at $L \in U_S$, then $L$ or $\mathrm{Exp}_S(L)$ is called the focal point of $S$ on the geodesic $s \to \mathrm{Exp}_S(sL)$ ($0 \le s$, $sL \in U_S$). If $S$ is compact, then $S$ has an open neighborhood $V_S$ in $N(S)$ satisfying the following three conditions: (i) $V_S \subset U_S$; (ii) $\|L\| = d(\mathrm{Exp}_S(L), S)$ for $L \in V_S$, where the right-hand member expresses the infimum of the distance between the point $\mathrm{Exp}_S(L)$ and points of $S$; (iii) the restriction $\mathrm{Exp}_S|V_S$ is an embedding. The image $\mathrm{Exp}_S(V_S)$ is the tubular neighborhood of $S$. In the special case where $S$ consists of only one point $p$, $N(\{p\})$ coincides with the tangent vector space $T_p(M)$, and the focal point of $p$ is called the conjugate point of $p$, given as the zero point of the Jacobi field (→ 178 Geodesics, 279 Morse Theory). In this case, $V_S$ is denoted by $V_p$. If $T_p$ is identified with $\mathbf{R}^n$ (or $E^n$) by means of an orthonormal basis of $T_p$, then $(\mathrm{Exp}_p)^{-1}$ defined on $\mathrm{Exp}_p(V_p)$ is a coordinate mapping, called the normal coordinate mapping. Furthermore, $\mathrm{Exp}_p(V_p)$ contains a neighborhood $W_p$ of $p$ such that there exists a unique geodesic arc $x$ joining any two points $q$ and $r$ of $W_p$ with $\|x\| = d(q, r)$ and contained in $W_p$. $W_p$ is called a convex neighborhood of $p$.

D. Curvature

The set of differential 1-forms $(\omega^i, \omega^i_j)$, by means of which absolute parallelism is given in the orthogonal frame bundle $B$ of $M$, satisfies the structure equation $d\omega^i = -\sum_j \omega^i_j \wedge \omega^j$, $d\omega^i_j = -\sum_k \omega^i_k \wedge \omega^k_j + \Omega^i_j$, and $(\Omega^i_j)$ is called the curvature form of the Riemannian connection of $M$. This form is expressed by a tensor field $R$ (→ 80 Connections; 417 Tensor Calculus) of type (1,3) on $M$, called the curvature tensor; if $R^i_{jkl}$ are the components of $R$ with respect to an orthonormal frame $b \in B$ of the tangent vector space $T_p$ of $M$, then $\Omega^i_j = (1/2)\sum R^i_{jkl}\,\omega^k \wedge \omega^l$ at $b$. Let $(X, Y)$ be an orthonormal basis of a 2-dimensional subspace $P$ of $T_p$. Then the inner product $K_p(P)$ of $X$ and $R(X, Y)Y$ is determined by $P$ independently of the choice of the basis $(X, Y)$, where the $i$-component of

$R(X, Y)Z$ with respect to the basis $b$ of $T_p$ is given by $\sum R^i_{jkl} Z^j X^k Y^l$. $K_p(P)$ is the Gaussian curvature of the surface $\mathrm{Exp}_p(V_p \cap P)$ and is called the sectional curvature (or Riemannian curvature) of $P$. The curvature tensor $R$ is uniquely determined by the function $K_p(P)$ of $p$ and $P$. If $\dim M \ge 3$ and if at every point $p$ of $M$, $K_p(P)$ has a constant value $M_p$ independent of the choice of $P$, then $M_p$ is a constant independent of the choice of $p$ (F. Schur). If $K_p(P)$ is constant, then $M$ is called a space of constant curvature. If $\nabla R = 0$, then $M$ is called a locally symmetric space (→ 412 Symmetric Riemannian Spaces and Real Forms; 413 Symmetric Spaces). In a local sense, Riemannian metrics of these spaces are uniquely determined by the curvature tensor $R$ up to a constant factor. If $M$ is of constant curvature $K$, complete, and simply connected, then $M$ is isometric to $E^n$, the sphere (which is the universal covering Riemannian manifold of a real elliptic space), or a real hyperbolic space according as $K$ is 0, positive, or negative. The compact spaces of positive constant curvature, that is, the Riemannian manifolds having the sphere as the universal covering Riemannian manifold, were completely classified by J. A. Wolf [1]. A complete, simply connected, and locally symmetric space is a symmetric Riemannian space. The Ricci tensor $(R_{ij})$ is defined by $R_{ij} = -\sum_k R^k_{ijk}$. Let $Q$ be the quadratic form on $T_p$ given by $(R_{ij})$. Then the value $Q(L)$ for a unit vector $L \in T_p$ is the mean of $K_p(P)$ for all sections $P$ (2-dimensional subspaces of $T_p$) containing $L$ and is called the Ricci curvature (or mean curvature) of the direction $L$ at $p$. The mean $R$ of $Q(L)$ for all the unit vectors $L$ at $p$ is called the scalar curvature at $p$ (→ 417 Tensor Calculus). $Q(L)$ and $R$ are expressed by $Q(L) = \sum_i g_p(R(X_i, L)L, X_i)$ and $R = \sum_i Q(X_i)$, up to positive constant factors, in terms of an orthonormal basis $(X_i)$ of $T_p$. If the Ricci tensor of $M$ is a scalar multiple of the fundamental tensor, then $M$ is called an Einstein space. (When $\dim M \ge 3$, this scalar is constant.) If $M$ is a Kähler manifold and $P$ is restricted to a complex plane (invariant under the almost complex structure), then $K(P)$ is called the holomorphic sectional curvature. A Kähler manifold $M$ of constant holomorphic sectional curvature is locally isometric to a complex Euclidean space, elliptic space, or hyperbolic space.

The properties of the sectional curvature and the Ricci curvature are closely related to the behavior of geodesics of Riemannian manifolds, and these properties reflect those of the topological structures of the manifolds (→ 178 Geodesics). The compact simply connected homogeneous Riemannian manifolds of strictly positive sectional curvature have been classified [2-4]. Related to algebraic geometry, as the solution of the Frankel conjecture, the following holds: If a compact Kähler manifold has strictly positive sectional curvature, then it is biholomorphic to the complex projective space [5, 6] (→ 232 Kähler Manifolds). Furthermore, curvature tensors are related to characteristic classes. For example, we have the Gauss-Bonnet formula: If $M$ is an even-dimensional compact and oriented Riemannian manifold, the integral of $a_n K_{(n)}\,\omega$ on $M$ is equal to the Euler-Poincaré characteristic, where

$a_n = n!/(2^n \pi^{n/2} (n/2)!)$,

$\omega$ is the volume element of $M$, and $K_{(s)}$ is defined as follows: For a positive even number $s$, $K_{(s)}$ is a real-valued function of the $s$-dimensional subspaces $P$ of the tangent vector spaces $T_p$ of $M$, which is given by

[displayed formula missing in the source]

in terms of an orthonormal basis $(X_1, \ldots, X_s)$ of $P$, where $b_s = (-1)^{s/2}/(2^{s/2} s!)$, $\sum$ is summation over all pairs of $s$-tuples satisfying $\{i_1, \ldots, i_s\}, \{j_1, \ldots, j_s\} \subset \{1, 2, \ldots, n\}$, $\varepsilon_{i_1 \ldots i_s}$ is the sign of $(i_1, \ldots, i_s)$, $(\ ,\ )$ is the inner product in $T_p$ with respect to $g_p$, $R_p$ is the value of the tensor $R$ at $p$, and $R_p(X_i, X_j)X_k$ is as already defined at the beginning of this section. In particular, $K_{(2)} = K$. If $K_{(s)}$ of a compact and orientable $M$ is constant for a certain $s$, then the $k$th Pontryagin class of $M$ (with real coefficients) vanishes for all $k \ge s/2$.

E. Holonomy Groups

Let $p$ be a fixed point of $M$, and let $\Omega_p$ be the set of all closed oriented curves of class $D^\infty$ with initial and final points $p$ and with parameters neglected. The set $H = \{I_x \mid x \in \Omega_p\}$, called the holonomy group of $M$, is a subgroup of $I(T_p)$ ($T_p$ is identified with $E^n$) independent of the choice of $p$ (→ 80 Connections), and $x \to I_x$ is a homomorphism from $\Omega_p$ to $H$. The restriction $H^0$ of this homomorphism to all closed curves homotopic to zero is called the restricted holonomy group. The rotation part $h$ of $H$, called the homogeneous holonomy group, is a subgroup of the orthogonal group $O(n)$ of $T_p$. The rotation part $h^0$ of $H^0$, called the restricted homogeneous holonomy group, is a connected component of $h$ and a compact Lie group. The Lie algebra of $h^0$ is spanned by $\{I_x^0(R_{x(b)}(X, Y)) \mid x: [a, b] \to M$ is of class $D^\infty$, $x(a) = p$, and $X, Y \in T_{x(b)}\}$, where $R_{x(b)}(X, Y)$ is the endomorphism of the linear space $T_{x(b)}$ defined in Section D.
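The rotation part of the holonomy can be observed numerically in the simplest curved case. The sketch below is an illustration added here, not part of the source: it assumes the unit sphere $S^2 \subset E^3$ with the induced metric, transports a tangent vector once around the circle of colatitude `theta0`, and compares the result with the classical value, a rotation through the enclosed area $2\pi(1 - \cos\theta_0)$. The function names and the RK4 step count are choices made for this example.

```python
import numpy as np


def latitude_circle(theta0, t):
    """Point x(t) and velocity x'(t) of the circle of colatitude theta0 on the unit sphere."""
    s, c = np.sin(theta0), np.cos(theta0)
    point = np.array([s * np.cos(t), s * np.sin(t), c])
    velocity = np.array([-s * np.sin(t), s * np.cos(t), 0.0])
    return point, velocity


def transport_rhs(theta0, t, v):
    """Parallel transport on S^2 in E^3: v' = -(v . x'(t)) x(t).

    This is the condition that the tangential part of v' vanishes
    while v stays tangent to the sphere.
    """
    point, velocity = latitude_circle(theta0, t)
    return -np.dot(v, velocity) * point


def holonomy_cosine(theta0, steps=4000):
    """Cosine of the angle between a unit tangent vector at x(0) and its
    parallel transport once around the latitude circle (classical RK4)."""
    h = 2.0 * np.pi / steps
    # Unit tangent vector at x(0) = (sin theta0, 0, cos theta0), pointing south.
    v0 = np.array([np.cos(theta0), 0.0, -np.sin(theta0)])
    v = v0.copy()
    t = 0.0
    for _ in range(steps):
        k1 = transport_rhs(theta0, t, v)
        k2 = transport_rhs(theta0, t + h / 2, v + h / 2 * k1)
        k3 = transport_rhs(theta0, t + h / 2, v + h / 2 * k2)
        k4 = transport_rhs(theta0, t + h, v + h * k3)
        v = v + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return float(np.dot(v0, v))


if __name__ == "__main__":
    theta0 = np.pi / 3
    expected = np.cos(2 * np.pi * (1 - np.cos(theta0)))
    print(holonomy_cosine(theta0), expected)
```

For the equator (`theta0 = pi/2`), which is a geodesic, the transported vector returns unchanged. For other latitudes the rotation is nontrivial, which reflects the fact that the sphere is not flat.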

If $M = E^n$, then $H = \{e\}$, where $e$ is the identity element. If $h = \{e\}$ ($h^0 = \{e\}$), then $M$ is called flat (locally flat) (→ 80 Connections E). Local flatness is equivalent to $M$ being locally isometric to $E^n$. If $M$ is complete and $H$ (regarded as a transformation group of $E^n$) has a fixed point, then $M$ is isometric to $E^n$. Any finite rotation group $h$ is the homogeneous holonomy group of some locally flat and compact Riemannian manifold.

With respect to the linear group $h$ of $T_p$, we get a unique decomposition $T_p = V_{(0)} \oplus V_{(1)} \oplus \cdots \oplus V_{(r)}$ of mutually orthogonal subspaces, where $V_{(0)}$ ($\dim V_{(0)} \ge 0$) consists of all $h$-invariant vectors and $V_{(i)}$, $i = 1, \ldots, r$, are irreducible $h$-invariant subspaces. If $h$ or $h^0$ is irreducible (reducible) on $T_p$, then $M$ is called irreducible (reducible). If $M$ is complete and simply connected (hence $h = h^0$), then $M$ is the Riemannian product of closed submanifolds $M_{(\alpha)}$, $\alpha = 0, 1, \ldots, r$, satisfying $V_{(\alpha)} = T_p(M_{(\alpha)})$. This decomposition $M = \prod M_{(\alpha)}$ is determined uniquely by $M$ and called the de Rham decomposition of $M$ [7]. In this case $h$ is the direct product of closed subgroups $h_{(\alpha)}$, where every $h_{(\alpha)}$ acts on $V_{(\beta)}$, $\beta \neq \alpha$, as the identity, and can be regarded as the homogeneous holonomy group of $M_{(\alpha)}$. If $h^0$ is irreducible and $M$ is not locally symmetric, then $h^0$ acts transitively on the unit sphere of $T_p$. The classification of possible candidates for such $h^0$ has been made [8, 9]. For example, if $n$ is even and $h^0$ is the unitary group $U(n/2)$, then $h$ acts transitively on the unit sphere. A necessary and sufficient condition for $h$ to be contained in $U(n/2)$ is that $M$ have a complex structure and the structure of a Kähler manifold.

The group $h$ acts naturally on the tensor algebra $\mathcal{T}(T_p)$ of $T_p$. If a tensor field $A$ on $M$ is parallel, then $A_p$ is invariant under $h$. Conversely, if $A_0 \in \mathcal{T}(T_p)$ is invariant under $h$, there exists a unique parallel tensor field $A$ satisfying $A_p = A_0$. The orthogonal frame bundle $B$ is reducible to the $h$-bundle.

F. Transformation Groups

The group $I(M)$ consisting of all isometries of $M$ with the compact-open topology is a Lie transformation group. The isotropy subgroup at any point is compact. In particular, if $M$ is compact, so is $I(M)$. The differential $d\varphi$ of $\varphi \in I(M)$ is a transformation of the orthogonal frame bundle $B$. If $b_0$ is a fixed point of $B$, then the mapping $\beta$ defined by $\varphi \to d\varphi(b_0)$ embeds $I(M)$ as a closed submanifold of $B$, and the differentiable structure of $I(M)$ is thus determined. If $\beta$ is surjective, it follows from the structure equation that $M$ is of constant curvature and equals $E^n$, a real hyperbolic space, or a real elliptic space (or a sphere). A necessary and sufficient condition for the image of $\beta$ to be a subbundle of $B$ is that $I(M)$ be transitive. If the image of $\beta$ contains the $h$-bundle, then $M$ is a symmetric space. If $M$ is compact and $I(M)$ is transitive, then the image of $\beta$ is contained in the $h$-bundle (→ 191 G-Structures). If $I(M)$ is transitive on $M$, then $M$ is complete and is the homogeneous space of $I(M)$. Conversely, a homogeneous space $M = G/K$ of a Lie group $G$ by a compact subgroup $K$ has a Riemannian metric invariant under $G$. In general, an element of $I(M)$ preserves quantities uniquely determined by the Riemannian metric $g$, such as the Riemannian connection, its curvature, the set of all geodesics, etc. Furthermore, any element of $I(M)$ commutes with $\nabla$ and the Laplace-Beltrami operator. If $M$ is compact and oriented, then the connected component $I_0(M)$ of $I(M)$ preserves any harmonic differential form. If $M$ is complete and simply connected, then $I_0(M)$ is clearly decomposed into a direct product by the de Rham decomposition of $M$. An element of the Lie algebra of $I(M)$ is regarded as a vector field $X$ on $M$, called the infinitesimal motion, which satisfies the equation $L_X g = 0$; that is, $\nabla_j \xi_i + \nabla_i \xi_j = 0$, where $L_X$ denotes Lie derivation and the $\xi_i$ are covariant components of $X$ with respect to a natural frame $(\partial/\partial x^i)$, $i = 1, \ldots, n$ (→ 417 Tensor Calculus). This equation is called Killing's differential equation, and a solution $X$ of this equation is called a Killing vector field. The set of all Killing vector fields is a Lie algebra of finite dimension ($\le \dim B$). If $M$ is complete, then this Lie algebra coincides with that of $I(M)$. If $M$ is compact and the Ricci tensor is negative definite, then $I(M)$ is discrete. If, furthermore, the sectional curvature is nonpositive, then an isometry of $M$ homotopic to the identity transformation is the identity transformation itself.

It is known that $\dim I(M) \le n(n + 1)/2$ if $\dim M = n$, and the maximum dimension is attained only when $M$ is a space of constant curvature. For Riemannian manifolds with large $I(M)$, extensive work on the structures of $M$ and $I(M)$ has been done by I. P. Egorov, S. Ishihara, N. H. Kuiper, L. N. Mann, Y. Muto, T. Nagano, M. Obata, H. Wakakuwa, K. Yano, and others [10, 11].

The fixed point set of a family of isometries has interesting differential geometric properties [10]. For example, let $G$ be any subset of $I(M)$ and $F$ the set of points of $M$ which are left fixed by all the elements of $G$. Then each connected component of $F$ is a closed totally geodesic submanifold of $M$. If $M$ is compact and $f$ is an isometry of $M$, then $\Lambda_f = \chi(F)$, where $\Lambda_f$ denotes the Lefschetz number and

$\chi(F)$ the †Euler characteristic of the fixed point set F of f. As for the existence of fixed points of an isometry, the following are known: Let f be an isometry of a compact, orientable Riemannian manifold M with positive sectional curvature. If dim M is even and f is orientation preserving, or if dim M is odd and f is orientation reversing, then f has a fixed point. In the case of nonpositive curvature, the following is basic: Every compact group of isometries of a complete, simply connected Riemannian manifold with nonpositive sectional curvature has a fixed point (E. Cartan). If a compact, orientable Riemannian manifold admits a fixed-point-free 1-parameter group of isometries, then its †Pontryagin numbers vanish.

On a Riemannian manifold M, a transformation of M which preserves the Riemannian connection, or equivalently which commutes with covariant differentiation $\nabla$, is called an affine transformation. Let A(M) denote the group of all affine transformations of M. A transformation preserving the set of all geodesics is called a projective transformation. Let P(M) denote the group of all projective transformations of M. A transformation preserving the angle between tangent vectors is called a conformal transformation. Let C(M) denote the group of all conformal transformations. They are Lie transformation groups with respect to suitable topologies. Clearly, $I(M) \subset A(M) \subset P(M)$ and $I(M) \subset C(M)$ (→ 191 G-Structures).

$A_0(M)$, the connected component of the identity of A(M), is decomposed into a direct product according to the de Rham decomposition of M when M is complete and simply connected (J. Hano). If M is complete and irreducible, then A(M) = I(M) except when M is a 1-dimensional Euclidean space. If M is complete and its restricted homogeneous holonomy group $h_0$ leaves no nonzero vector invariant, then $A_0(M) = I_0(M)$. If M is compact, then $A_0(M) = I_0(M)$ always.

If M is complete and has a parallel Ricci tensor, then the connected component $P_0(M) = A_0(M)$, unless M is a space of positive constant sectional curvature (n > 2) (Nagano, N. Tanaka, Y. Tashiro). If M is compact, simply connected, and has constant scalar curvature, then $P_0(M) = I_0(M)$, unless M is a sphere (n > 2) (K. Yamauchi).

Similarly to the case of P(M), it is known that if M is complete and has a parallel Ricci tensor, then the connected component $C_0(M) = I_0(M)$, unless M is a sphere (n > 2) (Nagano).

A conformal transformation remains conformal if the Riemannian metric g is changed conformally, namely, to $e^{2f}g$, f being any smooth function on M. A subset of C(M, g) is called essential if it cannot be reduced to a subset of $I(M, \bar g)$ for any metric $\bar g$ conformal to g. When M is compact, C(M) or $C_0(M)$ is essential if and only if it is not compact. If $C_0(M)$ is essential, then M is conformally diffeomorphic to a sphere or a Euclidean space (n > 2) [12-15]. When M is compact and has constant scalar curvature and $C_0(M) \ne I_0(M)$, sufficient conditions for M to be isometric to a sphere have been obtained by S. I. Goldberg and S. Kobayashi, C. C. Hsiung, S. Ishihara, A. Lichnerowicz, Obata, S. Tanno, Tashiro, K. Yano, and others. For example, if $C_0(M)$ is essential, then M is a sphere [14]. In general, however, there are compact Riemannian manifolds with constant scalar curvature for which $C_0(M) \ne I_0(M)$ (N. Ejiri).

G. Spheres as Riemannian Manifolds

A Euclidean n-sphere $S^n$ (n > 2) has the properties of a Riemannian manifold. It is a space of positive constant sectional curvature $1/r^2$ (r = radius) with respect to the natural Riemannian metric as a hypersurface of the Euclidean (n + 1)-space $E^{n+1}$. A sphere is characterized by the existence of solutions of certain differential equations on a Riemannian manifold. On a unit sphere $S^n$ in $E^{n+1}$, the eigenvalues of the †Laplace-Beltrami operator $\Delta$ on smooth functions are given by $0 < \lambda_1 < \cdots < \lambda_k < \cdots$, $\lambda_k = k(n + k - 1)$. It is known that eigenfunctions f corresponding to $\lambda_k$, $\Delta f = \lambda_k f$, are the restrictions to $S^n$ of harmonic homogeneous polynomial functions F of degree k on $E^{n+1}$. On a compact Riemannian manifold M, if the Ricci curvature of M is not less than that of $S^n$, then the first eigenvalue $\lambda_1$ of $\Delta$ on M satisfies $\lambda_1 \ge n$, the first eigenvalue of $S^n$ [16]. Conversely, under the same assumption on the Ricci curvature, if $\lambda_1 = n$, then M is a sphere (Obata). On the other hand, if g is the standard metric on $S^n$, then $\Delta f = nf$ is equivalent to the system of differential equations

$$\nabla_j \nabla_i f + f g_{ji} = 0. \tag{$E_1$}$$

A complete Riemannian manifold M (n > 2) admits a nontrivial solution of ($E_1$) if and only if M is a sphere (Obata, Tashiro). In general, the restriction f to $S^n$ of a harmonic homogeneous polynomial of degree k satisfies $\Delta f = k(n + k - 1)f$ as well as a certain system ($E_k$) of differential equations of degree k + 1 involving the Riemannian metric. For example,

$$\nabla_k \nabla_j \nabla_i f + 2 g_{ji} \nabla_k f + g_{ki} \nabla_j f + g_{kj} \nabla_i f = 0. \tag{$E_2$}$$

If a complete Riemannian manifold M admits a nontrivial solution of ($E_k$) for some integer $k \ge 2$, then M is locally isometric to a sphere (Obata, Tanno, S. Gallot [17]). The gradient of a solution of ($E_1$) is an infinitesimal conformal
transformation and that of ($E_2$) is an infinitesimal projective transformation.

As the Kähler or quaternion Kähler version of ($E_2$), there is a system of differential equations characterizing the complex projective space or the quaternion projective space as a Kähler manifold (Obata, Tanno, D. E. Blair, Y. Maeda).

On a sphere, a Riemannian metric which is conformal to the standard metric and has the same scalar curvature as the standard one is always standard, namely, it has a positive constant sectional curvature [14].

H. Scalar Curvature

On a 2-dimensional Riemannian manifold M, the sectional curvature, the Ricci curvature, and the scalar curvature all coincide with the †Gaussian curvature, which is a function on M. If M is compact, by the †Gauss-Bonnet theorem the Gaussian curvature K of M must satisfy the following sign condition in terms of the †Euler characteristic $\chi(M)$:
if $\chi(M) > 0$, then K is positive somewhere;
if $\chi(M) = 0$, then K changes sign unless it is identically zero;
if $\chi(M) < 0$, then K is negative somewhere.
This sign condition is also sufficient for a given function K to be the Gaussian curvature of some metric on M. More precisely, starting with a Riemannian metric with constant Gaussian curvature, one can say that a smooth function K is the Gaussian curvature of some metric conformally equivalent to the original metric if and only if K satisfies the foregoing sign condition [18].

H. Yamabe [19] announced that on every compact Riemannian manifold (M, g) of dimension $n \ge 3$, there exists a strictly positive function u such that the Riemannian metric $\bar g = u^{4/(n-2)} g$ has constant scalar curvature. N. S. Trudinger, however, pointed out that his original proof contains a gap in some cases. The problem reduces to the following nonlinear partial differential equation on a compact manifold M:

$$\bar R\, u^{(n+2)/(n-2)} = 4\,\frac{n-1}{n-2}\,\Delta u + R u,$$

where R is the scalar curvature of g and $\bar R$ a constant which should be the scalar curvature of $\bar g = u^{4/(n-2)} g$ (→ 183 Global Analysis). Nevertheless, Yamabe's original proof can be pushed to cover a large class of metrics with $\int_M R\, dM < 0$. Furthermore, it has since been solved for a wider class: namely, if M is not conformally flat and $n \ge 6$, or if it is conformally flat and its fundamental group is finite, then the problem has been solved affirmatively [20].

On the other hand, any smooth function on a compact manifold M of dimension $n \ge 3$ that is negative somewhere is the scalar curvature of some metric on M. In particular, on a compact manifold ($n \ge 3$) there always exists a Riemannian metric with constant negative scalar curvature [18]. Any smooth function can be the scalar curvature if and only if M admits a metric of constant positive scalar curvature. The foregoing results show that there is no topological obstruction to the existence of metrics with negative scalar curvature on a compact manifold of dimension $n \ge 3$.

For positive scalar curvature, there is a topological obstruction. A compact manifold with a †spin structure (spin manifold) having nonvanishing †$\hat A$-genus cannot carry a Riemannian metric of positive scalar curvature. The existence of such a manifold has been shown. If a compact, simply connected manifold M of dimension $n \ge 5$ is not a spin manifold, then there exists a Riemannian metric of positive scalar curvature. Furthermore, if M is a spin manifold and spin †cobordant to M' with positive scalar curvature, then M carries a Riemannian metric of positive scalar curvature [22]. A torus $T^n$ cannot carry a metric of positive scalar curvature. In fact, any metric of nonnegative scalar curvature on $T^n$ must be flat [22].

Let $K_g$ and $R_g$ denote the sectional curvature and the scalar curvature, respectively, of a Riemannian metric g. Then the following are known for a compact manifold M of dimension $\ge 3$: If M carries a metric g with $K_g < 0$, then it carries no metric with $R \ge 0$. If M carries a metric g with $K_g \le 0$, then it carries no metric with $R > 0$. If M carries metrics $g_1$, $g_2$ with $K_{g_1} \le 0$ and $R_{g_2} \ge 0$, then both metrics are flat [22].

If the assignment of the scalar curvature to a Riemannian metric is viewed as a mapping of a space of Riemannian metrics into a space of functions on a manifold M, then locally it is almost always surjective when M is compact (A. E. Fischer and J. E. Marsden, O. Kobayashi, J. Lafontaine).

I. Ricci Curvature and Einstein Metrics

In this paragraph the manifolds under consideration are assumed to be of dimension $n \ge 3$. The Ricci tensor $(R_{ij})$ is a symmetric tensor field of type (0,2) on a Riemannian manifold. The problem of finding a Riemannian metric g which realizes a given Ricci tensor reduces to the one of solving a system of nonlinear second-order partial differential equations for g. The Bianchi identity (→ 417
Tensor Calculus) must be satisfied. There is a symmetric (0,2)-tensor on $R^n$ which cannot be the Ricci tensor for any Riemannian metric in a neighborhood of $0 \in R^n$. However, if a $C^\infty$ (or $C^\omega$) symmetric tensor field $(R_{ij})$ of type (0,2) is invertible at a point p, then in a neighborhood of p there exists a $C^\infty$ (or $C^\omega$) Riemannian metric g such that $(R_{ij})$ is the Ricci tensor of g [24].

The positivity of the Ricci curvature on a Riemannian manifold puts rather strong restrictions on the topology of the manifold (→ 178 Geodesics). However, nonnegative Ricci curvature and positive Ricci curvature are not too far from each other. If, on a complete Riemannian manifold M with nonnegative Ricci curvature, there is a point at which the Ricci curvature is positive, then there exists a complete metric on M with positive Ricci curvature [25-27].

If a Riemannian manifold (M, g) is an Einstein space, then g is called an Einstein metric on the manifold M. Let $v_g$ denote the volume element determined by g. When M is compact, $\mathcal{M}_1$ denotes the space of Riemannian metrics on M with total volume 1. The integral of the scalar curvature $\mathcal{S}(g) = \int_M R_g v_g$ is a functional on $\mathcal{M}_1$. The critical points of $\mathcal{S}$ are Einstein metrics (D. Hilbert). Let $\mathcal{C}_1$ ($\subset \mathcal{M}_1$) denote the space of metrics with constant scalar curvature. If $\mathcal{S}$ is restricted to $\mathcal{C}_1$, then the †nullity and †coindex at the critical point are finite [28, 29].

An Einstein metric is always real analytic in some coordinate system. In particular, if two simply connected Einstein spaces have neighborhoods on which metrics are isometric, then they are isometric [30]. Though $S^n$ with standard Riemannian metric is a typical example of an Einstein space, $S^{4k+3}$ ($k \ge 1$) carries an Einstein metric that is not standard [31].

References

[1] J. A. Wolf, Spaces of constant curvature, Publish or Perish, 1977.
[2] M. Berger, Les variétés riemanniennes homogènes simplement connexes à courbure strictement positive, Ann. Scuola Norm. Sup. Pisa, (3) 15 (1961), 179-246.
[3] N. R. Wallach, Compact homogeneous Riemannian manifolds with strictly positive curvature, Ann. Math., (2) 96 (1972), 277-295.
[4] L. Bérard-Bergery, Les variétés riemanniennes homogènes simplement connexes de dimension impaire à courbure strictement positive, J. Math. Pures Appl., (9) 55 (1976), 47-67.
[5] S. Mori, Projective manifolds with ample tangent bundles, Ann. Math., (2) 110 (1979), 593-606.
[6] Y. T. Siu and S. T. Yau, Compact Kähler manifolds of positive bisectional curvature, Inventiones Math., 59 (1980), 189-204.
[7] G. de Rham, Sur la réductibilité d'un espace de Riemann, Comment. Math. Helv., 26 (1952), 328-344.
[8] M. Berger, Sur les groupes d'holonomie homogène des variétés à connexion affine et des variétés riemanniennes, Bull. Soc. Math. France, 83 (1955), 279-330.
[9] J. Simons, On the transitivity of holonomy systems, Ann. Math., (2) 76 (1962), 213-234.
[10] S. Kobayashi, Transformation groups in differential geometry, Erg. Math., 70, 1972.
[11] K. Yano, The theory of Lie derivatives and its applications, North-Holland, 1957.
[12] J. Lelong-Ferrand, Transformations conformes et quasi conformes des variétés riemanniennes compactes (démonstration de la conjecture de Lichnerowicz), Acad. Roy. Belg., Cl. Sc. Mém. Colloq. 39, no. 5 (1971), 1-44.
[13] A. J. Ledger and M. Obata, Compact Riemannian manifolds with essential group of conformorphisms, Trans. Amer. Math. Soc., 150 (1970), 645-651.
[14] M. Obata, The conjectures on conformal transformations of Riemannian manifolds, J. Differential Geometry, 6 (1971), 247-258.
[15] D. V. Alekseevskii, Groups of conformal transformations of Riemann spaces, Math. USSR-Sb., 18 (1972), 285-301.
[16] A. Lichnerowicz, Géométrie des groupes de transformations, Dunod, 1958.
[17] S. Gallot, Équations différentielles caractéristiques de la sphère, Ann. Sci. Ecole Norm. Sup., 12 (1979), 235-267.
[18] J. L. Kazdan and F. W. Warner, Existence and conformal deformation of metrics with prescribed Gaussian and scalar curvature, Ann. Math., (2) 101 (1975), 317-331.
[19] H. Yamabe, On a deformation of Riemannian structures on compact manifolds, Osaka Math. J., 12 (1960), 21-37.
[20] T. Aubin, Équations différentielles non linéaires et problème de Yamabe concernant la courbure scalaire, J. Math. Pures Appl., 55 (1976), 269-296.
[21] N. Hitchin, Harmonic spinors, Advances in Math., 14 (1974), 1-55.
[22] M. Gromov and H. B. Lawson, The classification of simply connected manifolds of positive scalar curvature, Ann. Math., (2) 111 (1980), 423-434.
[23] R. Schoen and S. T. Yau, The structure
of manifolds with positive scalar curvature, Manuscripta Math., 28 (1979), 159-183.
[24] D. DeTurck, Existence of metrics with prescribed Ricci curvature: Local theory, Inventiones Math., 65 (1981), 179-207.
[25] T. Aubin, Métriques riemanniennes et courbure, J. Differential Geometry, 4 (1970), 383-424.
[26] P. Ehrlich, Metric deformations of curvature I: local convex deformations, Geometriae Dedicata, 5 (1976), 1-23.
[27] J. P. Bourguignon, Ricci curvature and Einstein metrics, Lecture notes in math. 838, Springer, 1981, 42-63.
[28] Y. Muto, On Einstein metrics, J. Differential Geometry, 9 (1974), 521-530.
[29] N. Koiso, On the second derivative of the total scalar curvature, Osaka J. Math., 16 (1979), 413-421.
[30] D. M. DeTurck and J. L. Kazdan, Some regularity theorems in Riemannian geometry, Ann. Sci. Ecole Norm. Sup., 14 (1981), 249-260.
[31] J. E. D'Atri and W. Ziller, Naturally reductive metrics and Einstein metrics on compact Lie groups, Mem. Amer. Math. Soc., 18 (1979).
See also references to 80 Connections, 105 Differentiable Manifolds, 109 Differential Geometry, 111 Differential Geometry of Curves and Surfaces, 178 Geodesics, 191 G-Structures, 365 Riemannian Submanifolds, 417 Tensor Calculus.

365 (VII.13)
Riemannian Submanifolds

A. Introduction

If an †immersion (or an †embedding) f of a †Riemannian manifold (M, g) into a Riemannian manifold $(\tilde M, \tilde g)$ satisfies the condition $f^*\tilde g = g$, then f is called an isometric immersion (or embedding) and M is called a Riemannian submanifold of $\tilde M$. In this article, f(M) will be identified with M except where there is danger of confusion. Suppose dim M = n and dim $\tilde M$ = n + p. Then the †bundle F(M) of orthonormal tangent frames of M, the bundle $F_\nu(M)$ of orthonormal normal frames of M, and their †Whitney sum $F(M) \oplus F_\nu(M)$ are †principal fiber bundles over M with †structure groups O(n), O(p), and O(n) × O(p), respectively. These are subbundles of the restriction to M of the bundle $F(\tilde M)$ of orthonormal frames of $\tilde M$. The vector bundles associated with F(M), $F_\nu(M)$, and $F(M) \oplus F_\nu(M)$ are, respectively, the †tangent bundle T(M), the †normal bundle $\nu(M)$, and their Whitney sum $T(M) \oplus \nu(M)$.

B. General Results for Immersibility

An n-dimensional real analytic Riemannian manifold can be locally isometrically embedded into any real analytic Riemannian manifold of dimension n(n + 1)/2 (M. Janet (1926), E. Cartan (1927)). The generalization to the $C^\infty$ case is an open question even when the ambient space is Euclidean.

An n-dimensional compact $C^r$ Riemannian manifold ($3 \le r \le \infty$) can be isometrically embedded into an (n(3n + 11)/2)-dimensional Euclidean space (J. F. Nash (1956)). An n-dimensional noncompact $C^r$ Riemannian manifold ($3 \le r \le \infty$) can be isometrically embedded into a 2(2n + 1)(3n + 7)-dimensional Euclidean space (Nash (1956), R. E. Greene (1970)).

Let M be an n-dimensional Riemannian manifold with †sectional curvature $K_M$, and $\tilde M$ an (n + p)-dimensional Riemannian manifold with sectional curvature $K_{\tilde M}$. Then M cannot be isometrically immersed into $\tilde M$ in the following cases:
(1) $p \le n - 2$ and $K_M < K_{\tilde M}$ (T. Otsuki (1954));
(2) $p \le n - 1$, $K_M \le K_{\tilde M} \le 0$, M is compact, and $\tilde M$ is complete and simply connected (C. Tompkins (1939), S. S. Chern and N. H. Kuiper (1952), B. O'Neill (1960));
(3) $p \le n - 1$, $K_M \le 0$, $K_{\tilde M}$ is constant ($\le 0$), M is compact, and $\tilde M$ is complete and simply connected [2].

C. Fundamental Equations

Let $f: (M, g) \to (\tilde M, \tilde g)$ be an isometric immersion. Let $\nabla$ and $\tilde\nabla$ denote the †covariant differentiations with respect to the †Riemannian connections of M and $\tilde M$, respectively. For vector fields X and Y on M, the tangential component of $\tilde\nabla_X Y$ is equal to $\nabla_X Y$. Put

$$\sigma(X, Y) = \tilde\nabla_X Y - \nabla_X Y. \tag{1}$$

Then $\sigma$ is a $\nu(M)$-valued symmetric (0,2) tensor field on M, which is called the second fundamental form of M (or of f). For a normal vector $\xi$ at $x \in M$, put $g(A_\xi X, Y) = \tilde g(\sigma(X, Y), \xi)$. Then $A_\xi$ defines a symmetric linear transformation on $T_x(M)$, which is called the second fundamental form in the direction of $\xi$. The eigenvalues of $A_\xi$ are called the principal curvatures in the direction of $\xi$. The connection on $\nu(M)$ induced from the Riemannian connection of $\tilde M$ is called the normal connection of M (or of f). Let $\nabla^\perp$ denote the covariant differentiation with respect to the normal con-
nection. For a tangent vector field X and a normal vector field $\xi$ on M, the tangential (resp. normal) component of $\tilde\nabla_X \xi$ is equal to $-A_\xi X$ (resp. $\nabla^\perp_X \xi$), that is to say, the relation

$$\tilde\nabla_X \xi = -A_\xi X + \nabla^\perp_X \xi \tag{2}$$

holds. (1) is called the Gauss formula, and (2) is called the Weingarten formula.

Let R, $\tilde R$, and $R^\perp$ be the †curvature tensors of $\nabla$, $\tilde\nabla$, and $\nabla^\perp$, respectively. Then the †integrability condition for (1) and (2) implies

$$\tilde R(X, Y)Z = R(X, Y)Z + A_{\sigma(X,Z)}Y - A_{\sigma(Y,Z)}X + (\nabla'_X \sigma)(Y, Z) - (\nabla'_Y \sigma)(X, Z) \tag{3}$$

for vector fields X, Y, Z tangent to M, where $\nabla'$ denotes covariant differentiation with respect to the connection in $T(M) \oplus \nu(M)$. (3) is called the equation of Gauss and Codazzi. More precisely, the tangential component of (3) is given by the equation of Gauss and the normal component of (3) is given by the equation of Codazzi. Similarly, for vector fields $\xi$ and $\eta$ normal to M, the relation

$$\tilde g(\tilde R(X, Y)\xi, \eta) = \tilde g(R^\perp(X, Y)\xi, \eta) - g([A_\xi, A_\eta]X, Y) \tag{4}$$

holds, which is called the equation of Ricci. Formulas (1)-(4) are the fundamental equations for the isometric immersion $f: M \to \tilde M$.

As a particular case, suppose $\tilde M$ is a †space form of constant curvature c. Then the equations of Gauss, Codazzi, and Ricci reduce respectively to

$$R(X, Y)Z = c\{g(Y, Z)X - g(X, Z)Y\} + A_{\sigma(Y,Z)}X - A_{\sigma(X,Z)}Y, \tag*{$(3_c)_G$}$$

$$(\nabla'_X \sigma)(Y, Z) - (\nabla'_Y \sigma)(X, Z) = 0, \tag*{$(3_c)_C$}$$

$$\tilde g(R^\perp(X, Y)\xi, \eta) = g([A_\xi, A_\eta]X, Y). \tag*{$(4_c)$}$$

Conversely, let (M, g) be an n-dimensional simply connected Riemannian manifold, and suppose there is given a p-dimensional †Riemannian vector bundle $\nu(M)$ over M with curvature tensor $R^\perp$ and a $\nu(M)$-valued symmetric (0,2) tensor field $\sigma$ on M. For a †cross section $\xi$ of $\nu(M)$, define $A_\xi$ by $g(A_\xi X, Y) = \langle \sigma(X, Y), \xi \rangle$, where $\langle\ ,\ \rangle$ is the fiber metric of $\nu(M)$. If they satisfy $(3_c)_G$, $(3_c)_C$, and $(4_c)$, then M can be immersed isometrically into an (n + p)-dimensional complete and simply connected space form $M^{n+p}(c)$ of curvature c in such a way that $\nu(M)$ is the normal bundle and $\sigma$ is the second fundamental form. Moreover, such an immersion is unique up to an †isometry of $M^{n+p}(c)$.

Let $(e_A)_{1 \le A \le n+p}$ be a local cross section of $F(\tilde M)$ such that its restriction to M gives a local cross section of $F(M) \oplus F_\nu(M)$, and let $(\omega^A)$ be its dual. Then $f^*\omega^\alpha = 0$ for $n + 1 \le \alpha \le n + p$. Let $(\tilde\omega^A_B)_{1 \le A,B \le n+p}$ and $(\tilde\Omega^A_B)_{1 \le A,B \le n+p}$ be the †connection form and the †curvature form of $\tilde M$ with respect to $(e_A)$, and put $\omega^A_B = f^*\tilde\omega^A_B$. Then $(\omega^i_j)_{1 \le i,j \le n}$ is the connection form of M with respect to $(e_i)_{1 \le i \le n}$. $(\omega^\alpha_i)_{1 \le i \le n < \alpha \le n+p}$ gives the second fundamental form, that is,

$$\sigma(e_i, e_j) = \sum_\alpha \omega^\alpha_i(e_j)\, e_\alpha. \tag{1'}$$

Put $\omega^\alpha_i = \sum_j h^\alpha_{ij}\, \omega^j$. Then $(h^\alpha_{ij})$ is the matrix representing the symmetric linear transformation $A_{e_\alpha}$ with respect to $(e_i)$, that is,

$$A_{e_\alpha} e_i = \sum_j h^\alpha_{ij}\, e_j. \tag{2'}$$

Moreover, $(\omega^\alpha_\beta)_{n+1 \le \alpha,\beta \le n+p}$ is the connection form of the normal connection with respect to $(e_\alpha)_{n+1 \le \alpha \le n+p}$. Let $(\Omega^i_j)_{1 \le i,j \le n}$ and $(\Omega^\alpha_\beta)_{n+1 \le \alpha,\beta \le n+p}$ be the curvature forms of $(\omega^i_j)$ and $(\omega^\alpha_\beta)$, respectively. Then the equations of Gauss, Codazzi, and Ricci are given respectively by

D. Basic Notions

Let M be a Riemannian submanifold of $\tilde M$. A point $x \in M$ is called a geodesic point if $\sigma = 0$ at x. If every point of M is a geodesic point, then M is called a totally geodesic submanifold of $\tilde M$. M is a totally geodesic submanifold of $\tilde M$ if and only if every geodesic of M is a geodesic of $\tilde M$.

A mapping $h: M \to \nu(M)$ defined by $x \mapsto \frac{1}{n}\sum_{i=1}^{n} \sigma(e_i, e_i)$ is independent of the choice of an orthonormal basis $(e_i)$. h is called the mean curvature vector and $\|h\|$ is called the mean curvature. M is called a minimal submanifold of $\tilde M$ if h = 0 (→ 275 Minimal Submanifolds).

A point $x \in M$ is called an umbilical point if $\sigma = g \otimes h$ at x. $x \in M$ is an umbilical point if and only if $A_\xi$ is proportional to the identity transformation for all $\xi \in \nu_x(M)$. If every point of M is an umbilical point, then M is called a totally umbilical submanifold of $\tilde M$.

A point $x \in M$ is called an isotropic point if $\|\sigma(X, X)\|/\|X\|^2$ does not depend on $X \in T_x(M)$. If every point of M is an isotropic point, then M is called an isotropic submanifold of $\tilde M$. It is clear that an umbilical point is an isotropic point.

$\mu = \dim \bigcap_{\xi \in \nu_x(M)} \ker A_\xi$ is called the index of relative nullity at $x \in M$.
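
The notions above can be checked numerically for a surface in Euclidean 3-space. The following is a minimal sketch, not part of the original article: it approximates the second fundamental form in the direction of the unit normal by central finite differences, forms the matrix of the corresponding symmetric linear transformation, and returns its eigenvalues (the principal curvatures). The function names `principal_curvatures`, `sphere`, and `cylinder`, the step size `h`, and the parametrizations are assumptions chosen for illustration.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def principal_curvatures(f, u, v, h=1e-3):
    """Return (k1, k2), k1 >= k2: eigenvalues of the shape operator of the
    parametrized surface f(u, v) in E^3, taken with respect to the unit
    normal xi = f_u x f_v / |f_u x f_v|. Derivatives are central differences."""
    p = f(u, v)
    fu = [(a - b) / (2 * h) for a, b in zip(f(u + h, v), f(u - h, v))]
    fv = [(a - b) / (2 * h) for a, b in zip(f(u, v + h), f(u, v - h))]
    fuu = [(a - 2 * c + b) / h**2 for a, c, b in zip(f(u + h, v), p, f(u - h, v))]
    fvv = [(a - 2 * c + b) / h**2 for a, c, b in zip(f(u, v + h), p, f(u, v - h))]
    fuv = [(a - b - c + d) / (4 * h**2)
           for a, b, c, d in zip(f(u + h, v + h), f(u + h, v - h),
                                 f(u - h, v + h), f(u - h, v - h))]
    n = _cross(fu, fv)
    norm = math.sqrt(_dot(n, n))
    xi = [c / norm for c in n]
    # Induced metric g and the normal component of sigma in direction xi.
    E, F, G = _dot(fu, fu), _dot(fu, fv), _dot(fv, fv)
    L, M, N = _dot(fuu, xi), _dot(fuv, xi), _dot(fvv, xi)
    det_g = E * G - F * F
    # Matrix of the shape operator in the basis (f_u, f_v): g^{-1} * (L M; M N).
    a11 = (G * L - F * M) / det_g
    a12 = (G * M - F * N) / det_g
    a21 = (E * M - F * L) / det_g
    a22 = (E * N - F * M) / det_g
    half_trace = (a11 + a22) / 2
    det_a = a11 * a22 - a12 * a21
    root = math.sqrt(max(half_trace**2 - det_a, 0.0))
    return half_trace + root, half_trace - root


def sphere(r):
    """Sphere of radius r (hypothetical helper)."""
    return lambda u, v: [r * math.cos(u) * math.cos(v),
                         r * math.sin(u) * math.cos(v),
                         r * math.sin(v)]


def cylinder(u, v):
    """Right circular cylinder of radius 1 (hypothetical helper)."""
    return [math.cos(u), math.sin(u), v]


k1, k2 = principal_curvatures(sphere(2.0), 0.3, 0.4)
c1, c2 = principal_curvatures(cylinder, 0.3, 0.4)
print("sphere:  ", k1, k2, "mean curvature", abs(k1 + k2) / 2)
print("cylinder:", c1, c2, "mean curvature", abs(c1 + c2) / 2)
```

For the sphere the two principal curvatures agree, so the sample point is umbilical, and their product is close to $1/r^2$, as the equation of Gauss requires for a hypersurface of Euclidean space (c = 0). For the cylinder they differ, so no point is umbilical, and their product vanishes. The sign of the principal curvatures depends on the choice of unit normal.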

E. Rigidity

An isometric immersion $f: M \to \tilde M$ is said to be rigid if it is unique up to an isometry of $\tilde M$, that is, if $f': M \to \tilde M$ is another isometric immersion, then there exists an isometry $\varphi$ of $\tilde M$ such that $f' = \varphi \circ f$. If $f: M \to \tilde M$ is rigid, then every isometry of M can be extended to an isometry of $\tilde M$.

An isometric immersion $f: M \to M^{n+1}(c)$ of an n-dimensional Riemannian manifold into an (n + 1)-dimensional complete and simply connected space form is rigid in each of the following cases:
(1) n = 2, c = 0, and M is compact and of positive curvature (S. Cohn-Vossen (1929)).
(2) The index of relative nullity is $\le n - 3$ at each point (R. Beez (1876); → [S]).
(3) $n \ge 5$, c > 0, M is complete, and the index of relative nullity is $\le n - 2$ at each point (D. Ferus (1970)).
(4) $n \ge 4$, $c \ne 0$, and the †scalar curvature of M is constant ($\ne n(n - 1)c$) (C. Harle (1971)).
A generalization of (2) for the case of higher codimension was obtained by C. Allendoerfer (1939). Various rigidity conditions have been studied by S. Dolbeault-Lemoine, R. Sacksteder, E. Kaneda and N. Tanaka, and others.

F. Totally Geodesic and Totally Umbilical Submanifolds

A totally geodesic submanifold of a space form is also a space form of the same curvature. Totally geodesic submanifolds of compact †symmetric spaces of rank 1 were completely classified by J. A. Wolf (1963), and totally geodesic submanifolds of symmetric spaces of rank $\ge 2$ were studied by Wolf and B. Y. Chen and T. Nagano [4].

Let $f: M \to M^{n+p}(\tilde c)$ be a totally umbilical immersion of an n-dimensional Riemannian manifold M into an (n + p)-dimensional space form. Then M is a space form $M^n(c)$ with $c \ge \tilde c$, and f(M) is contained in a certain (n + 1)-dimensional totally geodesic submanifold $M^{n+1}(\tilde c)$ of $M^{n+p}(\tilde c)$. If $\tilde c > 0$, then f(M) is locally a hypersphere; if $\tilde c = 0$, then f(M) is locally a hyperplane or a hypersphere; if $\tilde c < 0$, then f(M) is locally a geodesic sphere, a horosphere, or a parallel hypersurface of a totally geodesic hypersurface [2].

G. Minimal Submanifolds

For general properties of minimal submanifolds → 275 Minimal Submanifolds.

There is no compact minimal submanifold in a simply connected Riemannian manifold with nonpositive sectional curvature (O'Neill (1960)). On the contrary, a sphere has plenty of compact minimal submanifolds.

For each positive integer s, an n-dimensional sphere of curvature $\dfrac{n}{s(s + n - 1)}$ can be minimally immersed into a $\left\{(2s + n - 1)\dfrac{(s + n - 2)!}{s!\,(n - 1)!} - 1\right\}$-dimensional unit sphere, and the immersion is rigid if n = 2 or $s \le 3$ (E. Calabi (1967), M. do Carmo and N. Wallach (1971)).

Among all n-dimensional compact minimal submanifolds of an (n + p)-dimensional unit sphere, the totally geodesic submanifold is isolated in the sense that it is characterized by each of the following conditions:
(1) sectional curvature $> \dfrac{n}{2(n + 1)}$ (T. Itoh),
(2) Ricci curvature $> n - 2$ (N. Ejiri (1979)),
(3) scalar curvature $> n(n - 1) - \dfrac{n}{2 - 1/p}$ (J. Simons (1968)).

H. Submanifolds of Constant Mean Curvature

A manifold of constant mean curvature is a solution to a variational problem. In particular, with respect to any volume-preserving variation of a domain D in a Euclidean space, the mean curvature of $M = \partial D$ is constant if and only if the volume of M is critical.

The interesting question “If the mean curvature of an isometric immersion $f: M \to M^{n+1}(c)$ of an n-dimensional compact Riemannian manifold M into an (n + 1)-dimensional space form $M^{n+1}(c)$ is constant, is M a sphere?” has not yet been completely solved, where $M^{n+1}(c)$ denotes a Euclidean space, a hyperbolic space, or an open hemisphere according as c = 0, < 0, or > 0. The answer is affirmative in the following cases: (1) dim M = 2, and the †genus of M is zero (H. Hopf (1951), Chern (1955)). (2) f is an embedding (A. D. Alexandrov (1958); → [S]). These results remain true even if the assumption “the mean curvature is constant” is replaced by the weaker condition “the principal curvatures $k_1 \ge \cdots \ge k_n$ satisfy a relation $\varphi(k_1, \ldots, k_n) = 0$ such that $\partial\varphi/\partial k_i > 0$.”

Unlike an open hemisphere, a sphere $S^{n+1}$ admits many compact hypersurfaces of constant mean curvature, among which totally umbilical hypersurfaces and the product of two spheres are the only ones with nonnegative sectional curvature (B. Smyth and K. Nomizu (1969)).

A nonnegatively or nonpositively curved complete surface of nonzero constant mean curvature in a 3-dimensional space form $M^3(c)$ is either a sphere or a †Clifford torus if c > 0 and is either a sphere or a right circular
cylinder if $c \le 0$ (T. Klotz and R. Osserman (1966-1967), D. Hoffman (1973)).

I. Isoparametric Hypersurfaces

A hypersurface M of $\tilde M$ is said to be isoparametric if M is locally defined as the †level set of a function f on (an open set of) $\tilde M$ with property

$$df \wedge d\|df\|^2 = 0 \quad \text{and} \quad df \wedge d(\Delta f) = 0.$$

A hypersurface M of a complete and simply connected space form $M^{n+1}(c)$ is isoparametric if and only if M has locally constant principal curvatures (Cartan). If $c \le 0$, M has at most two distinct principal curvatures (Cartan). If c > 0, the number of distinct principal curvatures of M is 1, 2, 3, 4, or 6 (H. Münzner (1980)). If c = 0, then M is locally $S^k \times E^{n-k}$, and if c < 0, then M is locally $E^n$ or $S^k \times H^{n-k}$ (Cartan). Isoparametric hypersurfaces of $S^{n+1}$ having at most three distinct principal curvatures were completely classified by Cartan. R. Takagi, T. Takahashi, and H. Ozeki and M. Takeuchi obtained several results for isoparametric hypersurfaces of $S^{n+1}$ with four or six distinct principal curvatures [7].

If a subgroup of the isometry group of $M^{n+1}(c)$ acts transitively on M, then M is isoparametric. The converse is true if $c \le 0$, or if c > 0 and M has at most three distinct principal curvatures (Cartan), but not true in general (Ozeki and Takeuchi [7]).

J. Isometric Immersions between Space Forms

Let $f: M^n(c) \to M^{n+p}(\tilde c)$ be an isometric immersion of an n-dimensional space form into an (n + p)-dimensional space form.
(1) If n = 2, p = 1, c > 0, $c > \tilde c$, $M^2(c)$ is complete, and $M^3(\tilde c)$ is complete and simply connected, then f is totally umbilical (H. Liebmann (1901); → [2]).
(2) If n = 2, p = 1, c < 0, $c < \tilde c$, $M^2(c)$ is complete, and $M^3(\tilde c)$ is complete and simply connected, then f does not exist (D. Hilbert (1901); → [2]).
(3) If n = 2, p = 1, $c = 0 < \tilde c$, $M^2(0)$ is complete, and $M^3(\tilde c)$ is complete and simply connected, then there exist infinitely many f (L. Bianchi (1896); → [2]).
(4) If n = 2, p = 1, $c = 0 > \tilde c$, $M^2(0)$ is complete, and $M^3(\tilde c)$ is complete and simply connected, then $f(M^2(0))$ is either a horosphere or a set of points at a fixed distance from a geodesic (J. Volkov and S. Vladimirova, S. Sasaki; → [2]).
(5) If $n \ge 3$, p = 1, and $c > \tilde c$, then f is totally umbilical.
(6) If p = 1, $c = \tilde c = 0$, $M^n(0)$ is complete, and $M^{n+1}(0)$ is complete and simply connected, then f is cylindrical (A. Pogorelov (1956), P. Hartman and L. Nirenberg (1959), and others).
(7) If $p \le n - 1$, $c = \tilde c > 0$, and both $M^n(c)$ and $M^{n+p}(\tilde c)$ are complete, then f is totally geodesic (D. Ferus (1975)).
(8) If $p \le n - 1$, $\tilde c > c > 0$, $M^n(c)$ is complete, and $M^{n+p}(\tilde c)$ is complete and simply connected, then f does not exist (J. Moore (1972)).

K. Homogeneous Hypersurfaces

Let M be an n-dimensional †homogeneous Riemannian manifold which is isometrically immersed into an (n + 1)-dimensional complete and simply connected space form $M^{n+1}(c)$.
(1) If c = 0, then M is isometric to $S^k \times E^{n-k}$ (S. Kobayashi (1958), Nagano (1960), Takahashi [9]).
(2) If c > 0, then M is isometric to $E^2$ or else is given as an orbit of a subgroup of the isometry group of $M^{n+1}(c)$ (W. Y. Hsiang, H. B. Lawson, Takagi; → [7]).
(3) If c < 0, then M is isometric to $E^n$, $S^k \times H^{n-k}$, or a 3-dimensional group manifold B with the metric $ds^2 = e^{-2t}dx^2 + e^{2t}dy^2 + dt^2$ (Takahashi (1971)).
Each of the hypersurfaces above except $E^2$ in (2) and B in (3) is given as an orbit of a certain subgroup of the isometry group of $M^{n+1}(c)$.

L. Kähler Submanifolds

A †complex submanifold of a †Kähler manifold is a Kähler manifold with respect to the induced Riemannian metric. A complex analytic and isometric immersion of a Kähler manifold (M, J, g) into a Kähler manifold $(\tilde M, \tilde J, \tilde g)$ is called a Kähler immersion, and M is called a Kähler submanifold of $\tilde M$. A Kähler submanifold is a minimal submanifold. A compact Kähler submanifold M of a Kähler manifold $\tilde M$ can never be homologous to 0, that is, there exists no submanifold M' of $\tilde M$ such that $M = \partial M'$. If $[M] \in H_*(\tilde M, Z)$ denotes the †homology class represented by a Kähler submanifold M of $\tilde M$, then $\mathrm{vol}(M) \le \mathrm{vol}(M')$ holds for any submanifold $M' \in [M]$, with equality if and only if M' is a Kähler submanifold (W. Wirtinger (1936)).

A Kähler manifold of constant †holomorphic sectional curvature is called a complex space form. An n-dimensional complete and simply connected complex space form is either $P_n(C)$, $C^n$, or $D_n$. Every Kähler submanifold of a complex space form is rigid (Calabi [10]). Kähler immersions of complex space forms
into complex space forms were completely determined by Calabi [10] and by H. Nakagawa and K. Ogiue (1976).

$C^n$ (resp. $D_n$) is the only †Hermitian symmetric space that can be immersed in $C^m$ (resp. $D_m$) as a Kähler submanifold (Nakagawa and Takagi [11]), and Kähler immersions of Hermitian symmetric spaces into $P_m(C)$ were precisely studied by Nakagawa, Y. Sakane, Takagi, Takeuchi, and others. More generally, Kähler immersions of homogeneous Kähler manifolds into $P_m(C)$ were determined by Takeuchi (1978).

$Q^n = \{[z_i] \in P_{n+1}(C) \mid \sum z_i^2 = 0\}$ in $P_{n+1}(C)$ is the only Einstein-Kähler hypersurface of a complex space form that is not totally geodesic (B. Smyth (1967), S. S. Chern (1967)). The result remains true even if “Einstein” is replaced by “parallel Ricci tensor” (Takahashi (1967)). Besides linear subspaces, $Q^n$ is the only Einstein-Kähler submanifold of $P_m(C)$ that is a complete intersection (J. Hano (1975)).

Integral theorems and pinching problems with respect to various curvatures for compact Kähler submanifolds of $P_m(C)$ have been studied by K. Ogiue, S. Tanno (1973), S. T. Yau (1975), and others [12]. For example, if the holomorphic sectional curvature of $P_{n+p}(C)$ is 1, then each of the following is sufficient for an n-dimensional compact Kähler submanifold to be totally geodesic:
(1) holomorphic sectional curvature > 1/2 (A. Ros (1985)),
(2) sectional curvature > 1/8 (A. Ros and L. Verstraelen (1984)),
(3) Ricci curvature > n/2 [12],
(4) embedded and scalar curvature > $n^2$ (J. H. Cheng (1981)).

generally, let $f: M \to M^m(c)$ be an isometric immersion of M into a complete and simply connected space form $M^m(c)$. If the image of each geodesic of M is contained in a 2-dimensional totally geodesic submanifold of $M^m(c)$, then f is either a totally geodesic immersion, a totally umbilical immersion, or a minimal immersion of a compact symmetric space of rank 1 by harmonic functions of degree 2; the last case occurs only when c > 0 (S. L. Hong (1973), J. Little (1976), K. Sakamoto [14]).

A Kähler submanifold of a complete and simply connected complex space form with the same property as above is either a totally geodesic submanifold or a Veronese submanifold (a Kähler immersion of $P_n(C)$ into $P_{n(n+3)/2}(C)$) (K. Nomizu).

Submanifolds with the above property are closely related to isotropic submanifolds with $\nabla'\sigma = 0$. Submanifolds with $\nabla'\sigma = 0$ in symmetric spaces have been studied by Ferus, H. Naito, Takeuchi, K. Tsukada, and others.

O. Total Curvature

Let $f: M \to E^m$ be an isometric immersion of an n-dimensional compact Riemannian manifold M into a Euclidean space. Let $\nu_1(M)$ be the unit normal bundle, $S^{m-1}$ the unit sphere centered at $0 \in E^m$, and let $\tilde f: \nu_1(M) \to S^{m-1}$ be the parallel translation. Let $\omega$ and $\Omega$ be the †volume elements of $\nu_1(M)$ and $S^{m-1}$, respectively. Then for each $\xi \in \nu_1(M)$, $\tilde f^*\Omega = (\det A_\xi)\,\omega$ holds. As a generalization of the †total curvature for a space curve, the total curvature of the immersion f is defined as
The index of relative nullity p(x) of an n-
dimensional complete KHhler submanifold M If?Ql
of P,,,(C) satisfies Min,,,p(x)=O or 2n (K. Abe
(1973)). 1
ldetA<lo.
voU~“-‘) s v,(M)
If /l(M) is the least number of critical points of
M. Totally Real Submanifolds
a tMorse function on M, then

An isometric immersion of a Riemannian iqf$f)=/?(M)>2


manifold (M, g) into a KBhler manifold
(a, J,g) satisfying JT,(M) c v,.(M) at each holds (Chern and R. Lashof (1957, 1958),
point x E M is called a totally real immersion, Kuiper (1958)). 7(f) = 2 if and only if f is an
and M is called a totally real submanifold of embedding and f(M) is a convex hypersurface
iii. A totally geodesic submanifold P,(R) in of some En+’ in E” (Chern and Lashof (1958)).
P,(C), S’ x S’ in P2(C) and an immersion P.(C) If 7(f) < 3, then M is homeomorphic to S”
+P n(n+Z)(C) defined by [zi]+[zizj] give typical (Chern and Lashof (1958)). These results gen-
examples of totally real submanifolds. eralize theorems for space curves by W. Fen-
chel(1929), I. Fary (1949), J. Milnor (1950),
and others.
N. Submanifolds with Planar Geodesics An isometric immersion f which attains
inf(f) is called a minimum immersion or a
A surface in E3 whose geodesics are all plane tight immersion. tExotic spheres do not have
curves is (a part of) a plane or sphere. More minimum immersions (Kuiper (1958)). tR-
365 Ref. 1360
Riemannian Submanifolds

spaces have minimum immersions and, in [ 131 B. Y. Chen and K. Ogiue, On totally real
particular, a minimum immersion of a sym- submanifolds, Trans. Amer. Math. Sot., 193
metric R-space is a tminimal immersion into a (1974), 257-266.
hypersphere (Kobayashi and Takeuchi (1968)). [ 141 K. Sakamoto, Planar geodesic immer-
The total mean curvature of an isometric sions, TBhoku Math. J., 29 (1977), 25-56.
immersion f: M-E,,, of an n-dimensional [ 151 D. Ferus, Totale Absolutkriimmung in
compact Riemannian manifold into a Euclid- Differentialgeometrie und Topologie, Springer,
ean space is, by definition, 1968.

i llbll"* 1.
JM
It satisfies
366 (Vlll.6)
Riemann-Roth Theorems
1 llbll”*13vol(S”),
JM
where S” is the n-dimensional unit sphere. The A. General Remarks
equality holds if and only if f is totally umbil-
ical (T. J. Willmore (1968), Chen [3]). The +Riemann-Roth theorem (abbreviation:
R. R. theorem) is one of the most significant
results in the classical theory of ‘algebraic
References
functions of one variable. Let X be a compact
+Riemann surface of tgenus g, and let D =
[l] S. Kobayashi and K. Nomizu, Founda-
C mi Pi be a tdivisor on X. We denote by
tions of differential geometry II, Interscience,
deg D the degree of D, which is defined to be
1969.
C mi. The divisor D is said to be positive if
[2] M. Spivak, A comprehensive introduction
D # 0 and mi > 0 for all i. A nonzero tmero-
to differential geometry III, IV, V, Publish or
morphic function f on X determines a divisor
Perish, 1975.
(f)=CaiQi-CbjRj(ai,bj>O), where the Qi are
[3] B. Y. Chen, Geometry of submanifolds,
the zeros of order ai and the Rj are the poles of
Dekker, 1973.
order bj. The set of meromorphic functions .f
[4] B. Y. Chen and T. Nagano, Totally geo-
such that (f) + D is positive, together with the
desic submanifolds of symmetric spaces I, II,
constant ,f= 0, forms a finite-dimensional
III, Duke Math. J., 44 (1977), 745-755; 45
linear space L(D) over C. The R. R. theorem
(1978), 405-425.
asserts that dim L(D) = deg D - g + 1 + r(D),
[S] S. S. Chern, M. doCarmo, and S. Koba-
where r(D) is a nonnegative integer determined
yashi, Minimal submanifolds of a sphere with
by D. If K is the tcanonical divisor of X, then
second fundamental form of constant length,
r(D) = dim L(K -D) (- 9 Algebraic Curves C;
Functional Analysis and Related Fields,
11 Algebraic Functions D). (For the R. R.
Springer, 1970.
theorem for algebraic surfaces - 15 Algebraic
[6] A. D. Alexandrov, A characteristic prop-
Surfaces D.)
erty of spheres, Ann. Mat. Pura Appl., 58
Generalizations of this important theorem
(1962), 303-315.
to the case of higher-dimensional compact
[7] H. Ozeki and M. Takeuchi, On some types
tcomplex manifolds were obtained by K.
of isoparametric hypersurfaces I, II, TBhoku
Kodaira, F. Hirzebruch, A. Grothendieck, M.
Math. J., 27 (1975), 515-559; 28 (1976), 7-55.
F. Atiyah and I. M. Singer, and others. Let X
[S] P. Ryan, Homogeneity and some curva-
be a compact complex manifold, B be a +com-
ture conditions for hypersurfaces, TBhoku
plex line bundle over X, and B(B) be the +sheaf
Math. J., 21 (1969), 363-388.
of germs of holomorphic cross sections of B.
[9] T. Takahashi, Homogeneous hypersurfaces
When B is determined by a divisor D of X, we
in spaces of constant curvature, J. Math. Sot.
have H”(X, @(B))z L(D). Hence a desirable
Japan, 22 (1970), 395-410.
generalization of the R. R. theorem will pro-
[lo] E. Calabi, Isometric imbedding of com-
vide a description of dim, H’(X, 19(B)) in terms
plex manifolds, Ann. Math., (2) 58 (1953), l-
of quantities relating to the properties of X
23.
and B. Following this idea, various theorems
[ 1 l] H. Nakagawa and R. Takagi, On locally
of Riemann-Roth type have been obtained.
symmetric Klhler submanifolds in a com-
plex projective space, J. Math. Sot. Japan, 28
(1976), 638-667. B. Hirzebruch’s Theorem of R. R. Type
[ 121 K. Ogiue, Differential geometry of Kahler
submanifolds, Advances in Math., 13 (1974), Keeping the notation given in Section A, we
73-l 14. put x(X, c’(B)) = C,( -1)4 dim H4(X, 8(B)).
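
Before passing to higher dimensions, the classical formula of Section A can be sanity-checked on the simplest case. The following minimal Python sketch assumes that $X$ is the Riemann sphere, so that $g = 0$ and $\deg K = -2$, and that $L(D)$ consists of the functions $f$ with $(f) + D$ effective, which gives $\dim L(D) = \max(0, \deg D + 1)$. These closed forms and all function names are assumptions made for the illustration, not statements taken from the text above.

```python
# Illustrative check of dim L(D) = deg D - g + 1 + r(D) on the Riemann sphere.
# Assumptions (not from the article): g = 0, deg K = -2, and on the sphere
# dim L(D) depends only on deg D and equals max(0, deg D + 1).

def dim_L_sphere(deg_d: int) -> int:
    """Dimension of L(D) on the sphere for a divisor of degree deg_d."""
    return max(0, deg_d + 1)


def r_sphere(deg_d: int) -> int:
    """r(D) = dim L(K - D), using deg(K - D) = -2 - deg D on the sphere."""
    return dim_L_sphere(-2 - deg_d)


def riemann_roch_rhs(deg_d: int, genus: int, r: int) -> int:
    """Right-hand side of the classical formula: deg D - g + 1 + r(D)."""
    return deg_d - genus + 1 + r


if __name__ == "__main__":
    for d in range(-5, 6):
        lhs = dim_L_sphere(d)
        rhs = riemann_roch_rhs(d, 0, r_sphere(d))
        print(f"deg D = {d:2d}: dim L(D) = {lhs}, deg D - g + 1 + r(D) = {rhs}")
```

On the sphere the correction term $r(D)$ vanishes as soon as $\deg D \ge -1$, and it is exactly what keeps the right-hand side nonnegative when $\deg D \le -2$.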
Generally, if $\mathcal{F}$ is an arbitrary †coherent analytic sheaf on $X$, we can define $\chi(X, \mathcal{F})$ using the same formula (replacing $\mathcal{O}(B)$ by $\mathcal{F}$). The quantity $\chi(X, \mathcal{F})$ has simple properties in various respects. For example, if the sequence $0 \to \mathcal{F}' \to \mathcal{F} \to \mathcal{F}'' \to 0$ is exact, we have $\chi(X, \mathcal{F}) = \chi(X, \mathcal{F}') + \chi(X, \mathcal{F}'')$. If an †analytic vector bundle $F$ depends continuously on auxiliary parameters, then $\chi(X, \mathcal{O}(F))$ remains constant.

Let $X$ be a projective algebraic manifold of complex dimension $n$. We consider the †Chern class $c = 1 + c_1 + \dots + c_n$ of $X$ and express it formally as the product $\prod_{i=1}^n (1 + \gamma_i)$. Thus the $i$th Chern class $c_i$ is expressed as the $i$th elementary symmetric function of $\gamma_1, \dots, \gamma_n$. Consider the formal expression $T(X) = \prod_{i=1}^n \gamma_i/(1 - e^{-\gamma_i})$. $T(X)$ can be expanded as a formal power series in the $\gamma_i$, and each homogeneous term, being symmetric in the $\gamma_i$, can be expressed as a polynomial in $c_1, \dots, c_n$, and thus determines a cohomology class of $X$. Similarly, we consider the formal expression of the Chern class of the vector bundle $F$ as $1 + d_1 + \dots + d_q = \prod_{j=1}^q (1 + \delta_j)$, where $q$ is the dimension of the fiber of $F$. We put $\operatorname{ch}(F) = \sum_{j=1}^q e^{\delta_j}$. The formal series $\operatorname{ch}(F)$ is also an element of the cohomology ring of $X$ whose $(\nu + 1)$st term consists of a $2\nu$-dimensional cohomology class. We call $\operatorname{ch}(F)$ the †Chern character of $F$ (→ 237 K-Theory B), and define $T(X, F)$ to be the value of $\operatorname{ch}(F)\,T(X)$ at the †fundamental cycle $X$. (The multiplication $\operatorname{ch}(F) \cdot T(X)$ is formal. $T(X, F)$ is determined by the term of dimension $2n$ alone.) $T(X, F)$ is called the Todd characteristic with respect to $F$. Hirzebruch's theorem of R. R. type asserts that $\chi(X, \mathcal{O}(F)) = T(X, F)$. In particular, when $n = 1$, $F = [D]$ (the line bundle determined by the divisor $D$), Hirzebruch's formula yields the classical R. R. theorem. If $F$ satisfies the conditions for the vanishing theorem of cohomologies, the formula gives an estimate for $\dim H^0(X, \mathcal{O}(F))$ [11].

In 1963, Atiyah and Singer developed a theory on indices of elliptic differential operators on a compact orientable differentiable manifold and obtained a general result that includes the proof of Hirzebruch's theorem for an arbitrary compact complex manifold [4, 5] (→ 237 K-Theory H).

C. R. R. Theorem for Surfaces

If $X$ is a compact complex surface, i.e., a compact complex manifold of dimension 2, then for complex line bundles $F_1$ and $F_2$, the intersection number $(F_1 F_2)$ is defined to be $c_1(F_1) \cup c_1(F_2)[X]$. The R. R. theorem for a line bundle $F$ on $X$ is stated as follows: $\chi(X, \mathcal{O}(F)) = (F^2)/2 - (KF)/2 + ((K^2) + c_2(X))/12$, where $K$ is the canonical line bundle of $X$ and $c_2(X)$ denotes the value at $X$ of the 2nd Chern class of $X$, that is, the Euler number of $X$.

The Noether formula $\chi_X = ((K^2) + c_2(X))/12$ follows from the above identity. The R. R. theorem for surfaces is a powerful tool for the study of compact complex surfaces.

D. Grothendieck's Theorem of R. R. Type

Grothendieck took an entirely new point of view in generalizing Hirzebruch's theorem. The following is a description of his idea as reformulated by A. Borel and J.-P. Serre [8]. We consider a nonsingular quasiprojective algebraic variety $X$ (→ 16 Algebraic Varieties) over a ground field of arbitrary characteristic. Namely, $X$ is a closed subvariety of an open set in a projective space (over an algebraically closed ground field). Consider the group $K(X)$, which is the quotient of the free Abelian group generated by the equivalence classes of algebraic vector bundles over $X$ modulo the subgroup generated by the elements of the form $F - F' - F''$, where $F$, $F'$, $F''$ are classes of bundles such that there exists an exact sequence $0 \to F' \to F \to F'' \to 0$. A similar construction for the (equivalence classes of the) †coherent algebraic sheaves instead of the vector bundles yields another Abelian group $K'(X)$. It can be shown that $K(X)$ is isomorphic to $K'(X)$ by the correspondence $F \to \mathcal{O}(F)$ (= the sheaf of germs of regular cross sections of $F$). Addition in $K(X)$ is induced by the †Whitney sum of the bundles, and $K(X)$ has the structure of a ring with multiplication induced by the tensor product. For a vector bundle $F$, its Chern class $c(F) = 1 + c_1(F) + \dots + c_q(F)$ ($q$ = the dimension of the fiber) is defined as an element of the †Chow ring $A(X)$ with appropriate properties. ($A(X)$ is the ring of the rational equivalence classes of algebraic cycles on $X$, and $c_i(F)$ is the class of a cycle of codimension $i$.) We define $\operatorname{ch}(F)$ as before. It can be shown that $c(F)$ and $\operatorname{ch}(F)$ are determined by the image of $F$ in $K(X)$, and we have $c(\xi + \eta) = c(\xi)c(\eta)$, $\operatorname{ch}(\xi + \eta) = \operatorname{ch}(\xi) + \operatorname{ch}(\eta)$, $\operatorname{ch}(\xi\eta) = \operatorname{ch}(\xi)\operatorname{ch}(\eta)$ ($\xi, \eta \in K(X)$). If we have a †proper morphism $f: Y \to X$ between quasiprojective algebraic varieties $Y$ and $X$, we have homomorphisms $f^!: K(X) \to K(Y)$ and $f_!: K(Y) \to K(X)$. The former is defined by taking the induced vector bundle and the latter by the correspondence

$$\mathcal{F} \to \sum_q (-1)^q (R^q f)\mathcal{F},$$

where $\mathcal{F}$ is a coherent algebraic sheaf on $Y$ and $(R^q f)\mathcal{F}$ is the $q$th †direct image of $\mathcal{F}$ under $f$. (Since $f$ is proper, $(R^q f)\mathcal{F}$ is coherent.) Between Chow rings we have homomorphisms $f^*: A(X) \to A(Y)$ and $f_*: A(Y) \to A(X)$, defined by taking inverse and direct images of cycles. With this notation, the theorem asserts that if $X$ and $Y$ are quasiprojective and $f: Y \to X$ is a proper morphism, then $f_*(\operatorname{ch}(\xi)\,T(Y)) = \operatorname{ch}(f_!(\xi))\,T(X)$. This is called Grothendieck's theorem of R. R. type. If $X$ consists of a single point, the theorem gives Hirzebruch's theorem for algebraic bundles. Since algebraic and analytic theories of coherent sheaves on a complex projective space are isomorphic, this result covers Hirzebruch's theorem (→ 237 K-Theory).

The subgroup $R(Y)$ of $A(Y)$ given by $R(Y) = \{\operatorname{ch}(\xi)\,T(Y) \mid \xi \in K(Y)\}$ is called the Riemann-Roch group of $Y$. Thus, using the notions developed by Grothendieck, the R. R. theorem can be expressed as follows: $R(Y)$ is mapped into $R(X)$ by a proper morphism $f: Y \to X$. Generalizations to †almost complex manifolds and †differentiable manifolds were made by Atiyah and Hirzebruch in this latter form [1]. One of the remarkable results is that an element of $R(Y)$ takes an integral value at the fundamental cycle. This theorem is obtained by taking $X$ to be a single point.

E. R. R. Theorem for Singular Varieties

Let $X$ be a projective variety over $\mathbf{C}$ and let $H_\bullet(X)$ (resp. $H^\bullet(X)$) denote the singular homology (resp. cohomology) group with rational coefficients. Note that $K(X)$ may not agree with $K'(X)$ when $X$ is singular. The R. R. theorem for $X$ formulated by P. Baum, W. Fulton, and R. MacPherson [6] says that there exists a unique natural transformation $\tau: K'(X) \to H_\bullet(X)$ such that (1) if $\xi \in K(X)$ and $\eta \in K'(X)$, then $\xi \otimes \eta \in K'(X)$ and $\tau(\xi \otimes \eta) = \operatorname{ch}(\xi) \cap \tau(\eta)$; (2) whenever $X$ is nonsingular, $\tau(\mathcal{O}_X) = T(X) \cap [X]$. Note that the naturality of $\tau$ means that for any $f: X \to Y$ and any $\eta \in K'(X)$, $f_*\tau(\eta) = \tau(f_!\eta)$.

References

[1] M. F. Atiyah and F. Hirzebruch, Riemann-Roch theorems for differentiable manifolds, Bull. Amer. Math. Soc., 65 (1959), 276-281.
[2] M. F. Atiyah, R. Bott, and V. K. Patodi, On the heat equation and the index theorem, Inventiones Math., 19 (1973), 279-330.
[3] M. F. Atiyah, V. K. Patodi, and I. M. Singer, Spectral asymmetry and Riemann geometry, Bull. Lond. Math. Soc., 5 (1973), 229-234.
[4] M. F. Atiyah, G. B. Segal, and I. M. Singer, The index of elliptic operators, Ann. Math., (2) 87 (1968), 484-604 (I by Atiyah and Singer, II by Atiyah and Segal, III by Atiyah and Singer).
[5] M. F. Atiyah and I. M. Singer, The index of elliptic operators on compact manifolds, Bull. Amer. Math. Soc., 69 (1963), 422-433.
[6] P. Baum, W. Fulton, and R. MacPherson, Riemann-Roch for singular varieties, Publ. Math. Inst. HES, no. 45 (1975), 101-145.
[7] P. Berthelot, A. Grothendieck, L. Illusie, et al., Théorie des intersections et théorème de Riemann-Roch, SGA 6, Lecture notes in math. 225, Springer, 1971.
[8] A. Borel and J.-P. Serre, Le théorème de Riemann-Roch, Bull. Soc. Math. France, 86 (1958), 97-136.
[9] W. Fulton, Rational equivalence on singular varieties, Publ. Math. Inst. HES, no. 45 (1975), 147-167.
[10] W. Fulton and R. MacPherson, Intersecting cycles on an algebraic variety, Real and Complex Singularities, Sijthoff & Noordhoff, 1976, 179-197.
[11] F. Hirzebruch, Topological methods in algebraic geometry, Springer, third edition, 1966.
[12] K. Kodaira, On compact analytic surfaces I, Ann. Math., (2) 71 (1961), 111-152.
[13] R. MacPherson, Chern classes of singular varieties, Ann. Math., (2) 100 (1974), 423-432.
[14] B. Segre, Nuovi metodi e risultati nella geometria sulle varietà algebriche, Ann. Mat. (4) 35 (1953), 1-128.
[15] J. Todd, The geometrical invariants of algebraic loci, Proc. London Math. Soc. (2) 43 (1937), 127-138.

367 (XI.12)
Riemann Surfaces

A. General Remarks

Riemann considered certain surfaces, now named after him, obtained by modifying in a suitable manner the domains of definition of multiple-valued †analytic functions on the complex plane in order to obtain single-valued functions defined on the surfaces. For example, consider the function $z = f(w) = w^2$, where $w$ varies in the complex plane, and its inverse function $w = g(z) = \sqrt{z}$. Then $g(0) = 0$ and $g(\infty) = \infty$, whereas if $z \ne 0, \infty$, there exist two values of $w$ satisfying $g(z) = w$. By setting $z = re^{i\theta}$ ($r > 0$, $0 \le \theta < 2\pi$), the corresponding two values of $w$ are $w_1 = \sqrt{r}\,e^{(\theta/2)i}$ and $w_2 = \sqrt{r}\,e^{(\theta/2 + \pi)i}$. Now consider how we should modify the complex $z$-plane so that we can obtain a single-valued function on the modified surface representing the same relationship
between $z$ and $w$. Let $\pi_1$ and $\pi_2$ be two copies of the complex plane. Delete the nonnegative real axes from $\pi_1$ and $\pi_2$ and patch them crosswise along the slits (Fig. 1). The surface $R$ thus obtained is locally homeomorphic to the complex plane except for the origin and the point at infinity, and situations at the origin and the point at infinity are as indicated in Fig. 1. For $z \ne 0, \infty$, there are two points $z_1$ and $z_2$ in $\pi_1$ and $\pi_2$, respectively, with the same coordinate $z$. Let $w_1$ and $w_2$ correspond to $z_1$ and $z_2$, respectively. Then the function $w = \sqrt{z}$ becomes single-valued on $R$, and $w_1$ and $w_2$ are holomorphic functions of $z_1$ and $z_2$, respectively. The surface $R$ is called the Riemann surface determined by $w = \sqrt{z}$.

Fig. 1

Working from the idea illustrated by this example, H. Weyl and T. Radó gave rigorous definitions of Riemann surfaces. The usual definition nowadays is as follows: Let $\mathfrak{A}$ be a set of pairs $(U, \psi_U)$ of open sets $U$ in a †connected †Hausdorff space $R$ and topological mappings $\psi_U$ of $U$ onto plane regions satisfying the following two conditions: (i) $R = \bigcup_{(U, \psi_U) \in \mathfrak{A}} U$; (ii) for each $(U_1, \psi_{U_1}), (U_2, \psi_{U_2}) \in \mathfrak{A}$ with $V = U_1 \cap U_2 \ne \emptyset$, $\psi_{U_2} \circ \psi_{U_1}^{-1}$ gives an (orientation-preserving) †conformal mapping of each connected component of $\psi_{U_1}(V)$ onto a corresponding one of $\psi_{U_2}(V)$. Two such sets $\mathfrak{A}_1$ and $\mathfrak{A}_2$ are equivalent, by definition, if $\mathfrak{A}_1 \cup \mathfrak{A}_2$ also satisfies conditions (i) and (ii). The equivalence class $(\mathfrak{A})$ of such $\mathfrak{A}$ is called a conformal structure (analytic structure or complex structure) on $R$ (→ 72 Complex Manifolds). A pair $(R, (\mathfrak{A}))$ consisting of a connected Hausdorff space $R$ and a conformal structure $(\mathfrak{A})$ is called a Riemann surface, with $R$ its base space and $(\mathfrak{A})$ its conformal structure. (A Riemann surface in this sense is sometimes called an abstract Riemann surface.) It is a complex manifold of †complex dimension 1. For $(U, \psi_U)$ in $\mathfrak{A} \in (\mathfrak{A})$, $(U, \psi_U)$ (or sometimes $U$ itself) is called an analytic neighborhood, and $\psi_U$ is called a local uniformizing parameter (or simply a local parameter). In the remainder of this article we call $R$ itself a Riemann surface.

From condition (i) it follows that a Riemann surface $R$ is a real 2-dimensional †topological manifold. Moreover, by condition (ii) it can be deduced that $R$ satisfies the †second countability axiom and consequently is a †surface and admits a †simplicial decomposition (T. Radó, 1925). It is also orientable (→ 410 Surfaces). Therefore $R$ is a †locally compact metric space. It is not possible to define curve lengths on $R$ compatible with the conformal structure, but since angles can be defined, $R$ is considered to be a real 2-dimensional space with a †conformal connection. It is customary in the theory of functions to call $R$ closed if it is compact and open otherwise. A plane region $D$ is considered an open Riemann surface with the conformal structure $\mathfrak{A} = (D, 1: D \to D)$. A Riemann sphere is also considered to be a closed Riemann surface whose analytic neighborhoods are given by $\{U_1, \varphi\}$ and $\{U_2, 1/\varphi\}$, where $U_1$ ($U_2$) is the domain corresponding to $\{|z| < 2\}$ ($\{|z| > 1/2\} \cup \{\infty\}$) under the stereographic projection $\varphi$.

A function $f$ on a Riemann surface is said to be meromorphic, holomorphic, or harmonic on $R$ if $f \circ \psi_U^{-1}$ is †meromorphic, †holomorphic, or †harmonic in the usual sense on $\psi_U(U)$ for every analytic neighborhood $(U, \psi_U)$. More generally, suppose that for mappings between plane regions we are given a property $\mathfrak{P}$ that is invariant under conformal mappings. A mapping $T$ of a Riemann surface $R_1$ onto another Riemann surface $R_2$ is said to have the property $\mathfrak{P}$ if the mapping $\psi_{U_2} \circ T \circ \psi_{U_1}^{-1}$ of $\psi_{U_1}(U_1)$ into $\psi_{U_2}(U_2)$ has the property $\mathfrak{P}$ for every pair of analytic neighborhoods $(U_1, \psi_{U_1})$ and $(U_2, \psi_{U_2})$ in $R_1$ and $R_2$, respectively. Thus such a mapping $T$ may be conformal, †analytic, †quasiconformal, harmonic, etc. If there exists a one-to-one conformal mapping of a Riemann surface $R_1$ onto another Riemann surface $R_2$, then $R_1$ and $R_2$ are said to be conformally equivalent. Two such Riemann surfaces are sometimes identified with each other.

B. Covering Surfaces

One of the main themes of the theory of functions is the study of analytic mappings of a Riemann surface $R$ into another Riemann surface $R_0$, i.e., the theory of covering surfaces of Riemann surfaces.

Suppose, in general, that there are two surfaces $R$ and $R_0$ and a continuous open mapping $T$ of $R$ into $R_0$ such that the inverse image of a point in $R_0$ under $T$ is an †isolated set in $R$. Then $T$ is called an inner transformation in the sense of Stoilow and $(R, R_0; T)$ a covering surface with $R_0$ its basic surface and $T$ its projection. Often $R$ is called simply a covering surface of $R_0$. A point $p_0$ with $p_0 =$
$T(p)$ is called the projection of $p$, and $p$ is said to lie above $p_0$. In this case, there exist surface coordinates $(U, \psi)$ and $(U_0, \psi_0)$ at $p$ and $p_0$, respectively, such that $\psi(U) = \{z \mid |z| < 1\}$, $\psi(p) = 0$, $\psi_0(U_0) = \{w \mid |w| < 1\}$, $\psi_0(p_0) = 0$, and $w = (\psi_0 \circ T \circ \psi^{-1})(z) = z^n$, with the positive integer $n$ independent of the choice of coordinate neighborhoods. If $n > 1$, then $p$ is called a branch point, $n$ the multiplicity, and $n - 1$ the degree of ramification. The set of all branch points forms an at most countably infinite set of isolated points. If there is no branch point, then the covering surface $(R, R_0; T)$ and the projection $T$ are said to be unramified. For a given curve $C_0$ in $R_0$ and a point $p$ in $R$ lying above the initial point of $C_0$, a curve $C$ in $R$ with $p$ its initial point satisfying $T(C) = C_0$ is called the prolongation along $C_0$ (or the lift of $C_0$) starting from $p$. If any proper subarc of $C_0$ sharing the initial point with $C_0$ admits a prolongation along itself starting from $p$ but $C_0$ does not admit a prolongation along itself starting from $p$, then $R$ is said to have a relative boundary above the endpoint of $C_0$. A covering surface without a relative boundary is called unbounded. A †simply connected surface $R^\infty$ that is an unramified unbounded covering surface of $R_0$ is said to be a universal covering surface of $R_0$. The universal covering surface exists for every $R$.

Suppose that $R$ is an unbounded covering surface of $R_0$. Then the number of points on $R$ that lie above each point of $R_0$ is always constant, say $n$ ($\le +\infty$), where the branch points of $R$ are counted with their multiplicities. $n$ is called the number of sheets of $R$ over $R_0$, or $R$ is said to be $n$-sheeted over $R_0$. If $R$ and $R_0$ are compact surfaces with †Euler characteristic $\chi$ and $\chi_0$, respectively, and if $R$ is an $n$-sheeted covering surface of $R_0$, then we have the Riemann-Hurwitz relation: $\chi = n\chi_0 - V$, $V$ being the sum of the degrees of ramification. A topological mapping $S$ of an unramified unbounded covering surface $R$ of $R_0$ onto itself such that $T \circ S = T$ is called a covering transformation. The group of all covering transformations of a universal covering surface of $R_0$ is isomorphic to the †fundamental group (i.e., the 1-dimensional homotopy group) of $R_0$.

In a covering surface $(R, R_0; T)$ whose basic surface $R_0$ is a Riemann surface, $T$ can be regarded as an analytic mapping of $R$ onto $R_0$ by giving $R$ a conformal structure in a natural manner. In particular, if $R_0$ is the sphere, then its covering surface is a Riemann surface. Conversely, any Riemann surface can be regarded as a covering surface of the sphere. This fact had long been known for closed Riemann surfaces; for open Riemann surfaces, it can be deduced from the existence theorem of analytic functions proved by H. Behnke and K. Stein, which states that an open Riemann surface is a Stein manifold. Historically, by a Riemann surface mathematicians meant either the abstract Riemann surface or a covering surface of the sphere, until the two notions were proved identical.

Suppose that $R$ is a covering surface of the $z$-sphere $R_0$ with the projection $T: R \to R_0$, and denote by $R_a$ the region that lies above $0 < |z - a| < r_a$. If there exists a topological mapping $\psi_a$ of $R_a$ onto $\{(r, \theta) \mid 0 < r < r_a, -\infty < \theta < \infty\}$ such that $a + re^{i\theta} = T(\psi_a^{-1}(r, \theta))$, then $R$ is said to have a logarithmic branch point above $a$; in contrast, a branch point of multiplicity $n$ of the type defined previously is sometimes called an algebraic branch point.

Ahlfors's theory of covering surfaces, which treats covering surfaces $(R, R_0; T)$ not only from the topological viewpoint but also from the metrical one, is particularly important. Let $R$ and $R_0$ be either compact surfaces with simplicial decompositions or their closed subregions with boundaries consisting of 1-simplexes and vertices such that $T$ preserves simplicial decompositions. Here we call the part of the boundary of $R$ whose projection is in the interior of $R_0$ the relative boundary. With respect to a suitable metric on $R_0$, let $S$ be the ratio of areas of $R$ and $R_0$, $L$ the length of the relative boundary, and $-\rho$ and $-\rho_0$ the †Euler characteristics of $R$ and $R_0$, respectively. Then Ahlfors's principal theorem asserts that $\max(0, \rho) \ge \rho_0 S - hL$, where $h > 0$ is a constant determined only by $R_0$. This has been applied widely in various branches of mathematics, including the theory of distribution of values of analytic mappings between Riemann surfaces.

C. Uniformization

Suppose that we are given a correspondence between the $z$-plane and $w$-plane determined by a †function element $p_0 = (z_0^*, w_0^*)$ ($z_0^* = P_0(t)$, $w_0^* = Q_0(t)$). This correspondence generally gives rise to a multiple-valued analytic function $w = f(z)$. We show how to construct a Riemann surface so that the function $w = f(z)$ can be considered a single-valued function on it. We use $f$ again to mean the connected component of the set of function elements $p = (z^*, w^*)$ ($z^* = P(t)$, $w^* = Q(t)$) in the wider sense containing $p_0$, where the analytic neighborhood of each point $p$ is defined to be the set of elements that are direct analytic continuations of $p$. Then $f$ is a Riemann surface. For a point $p = (z^*, w^*)$ ($z^* = P(t)$, $w^* = Q(t)$) in $f$, set $z = F(p) = P(0)$, $w = G(p) = Q(0)$. Then two meromorphic functions $z = F(p)$ and $w = G(p)$
1365 367 E
Riemann Surfaces

are defined on J and f can be considered as a bolic (Picard’s theorem). The Nevanlinna
covering surface of the z-sphere and the w- theory of meromorphic functions stimulated
sphere. We call f an tanalytic function in the this type problem. However, it is difticult to
wider sense. Thus we obtain a single-valued measure the ramifications of covering surfaces,
function w = G(p) on the Riemann surface f and many detailed reuslts of the type problem
that can be regarded as a modification of the obtained in the 1930s are limited mainly to
original function w = f (z). Suppose that there the case where all branch points lie above a
exist two meromorphic functions z = cp(c) and finite number of points on the sphere. A s&Ii-
w = $([) on a region D in the c-plane, and let cient condition for R to be of parabolic type,
z = P(< - [,,) and w = Q([ - &,) be tLaurent given by Z. Kobayashi (using the so-called
expansions of cp and 1(1at each point &, in D. If Kobayashi net, and a sufficient condition for
the function element psO= (z*, w*) (z* = P(t), w* R to be of hyperbolic type, given by S. Kaku-
= Q(t)) belongs to the Riemann surface f; then tani (using quasiconformal mappings), are
the correspondence w = f(z) determined by the significant results on the type problem. The
function element pro is said to be locally unifor- type problem had by then been extensively
mized on D by z = cp(<) and w = $([). In par- generalized to the following classification
ticular, if { pF 1[ED} =A then f is said to be theory.
uniformized by z = cp([) and w = $(c). If an
analytic function fin the wider sense, consid-
ered as a Riemann surface, is conformally E. Classification Theory of Riemann Surfaces
equivalent to a region D in the c-plane, then f
can be uniformized by z = F(p) and w = G(p). Riemann surfaces are, as pointed out by Weyl,
In general, f is not conformally equivalent to a “not merely devices for visualizing the many-
plane region, but if an unramified unbounded valuedness of analytic functions, but rather an
covering surface (f;” f; T) off is conformally essential component of the theory . . the only
equivalent to a region D in the [-plane, then f soil in which the functions grow and thrive.”
is uniformized by z = F o T(c) and w = Go T(c) So the problem naturally arises of how to
(Schottky’s uniformization). In particular, since extend various results in the theory of analytic
the universal covering surface (f m,f; T) off is functions of a complex variable to the theory
simply connected, it is conformally equivalent of analytic mappings between Riemann sur-
to the sphere l[l i co, the finite plane ill < co, faces. In general, open Riemann surfaces can
or the unit disk Ill< 1. Consequently, f is have infinite genus and are quite complicated.
uniformized by z = F o T(c) and w = Go T(c). So to obtain fruitful results and systematic
Therefore analytic functions in the wider sense development, one usually sets certain restric-
are always uniformizable. tions on the properties of the Riemann sur-
For example, an talgebraic function f consid- faces. In connection with this, R. Nevanlinna,
ered as a Riemann surface is always closed. If L. Sario, and others initiated the classification
its tgenus g =0, then f is the sphere and is thus theory of Riemann surfaces, which classifies
uniformized by rational functions z = F(c) and Riemann surfaces by the existence (or non-
w = G(c). If g > 0, then (f m, f; T) is conformally existence) of functions with certain properties.
equivalent to I(1 < 1 or l[l< co, and hence f is Denote by X(R) the totality of functions on
uniformized by z = F o T(c) and w = Go T(c). a Riemann surface R with a certain property
When l<l< 1, z and w are tautomorphic func- X. The set of all Riemann surfaces R for which
tions with respect to the group of linear trans- X(R) does not contain any function other than
formations preserving l<l< 1, while if l[l< co, constants is denoted by Ox. The family of
they are telliptic functions. analytic functions and the family of harmonic
functions are denoted by A(R) and H(R), re-
spectively. The family of positive functions,
D. The Type Problem that of bounded functions, and that of func-
tions with finite Dirichlet integrals are denoted
A simply connected Riemann surface R is by P(R), B(R), and D(R), respectively. From
conformally equivalent to the sphere, the finite these families, various new families are created,
plane, or the unit disk. Then R is said to be e.g., MD(R) = ,4(R) n B(R) tl D(R). Usually
elliptic, parabolic, or hyperbolic, respectively. OHB, OHD, OHBD, Oas, Oao, OaBD, and also OG,
The problem of determining the types of sim- the family of Riemann surfaces on which there
ply connected covering surfaces of the sphere are no Green’s functions, are considered. P. J.
by their structures, such as the distributions of Myrberg found an example of a Riemann
their branch points, is called the type problem surface of infinite genus which has a large
for Riemann surfaces. For example, if a simply boundary but belongs to O,,. The idea behind
connected covering surface does not cover Myrberg’s example is often used to construct
three points on the sphere, it must be hyper- examples in classification theory. From works

of Y. Tôki, L. Sario, K. I. Virtanen, H. L. Royden, M. Parreau, M. Sakai, and others, it can be seen that there are inclusion relations, as indicated in Fig. 2, among the classes just mentioned. There is no inclusion relation between O,, and O,,. For Riemann surfaces of finite genus, O_G = O,,. Closed Riemann surfaces are all in O_G.
Open Riemann surfaces in O_G are also said to be parabolic (or of null boundary), and those not in O_G, hyperbolic (or of positive boundary). Several characterizations for parabolic Riemann surfaces are known.

Fig. 2

From a similar point of view, the classification problem for subregions was studied in detail by Parreau, A. Mori, T. Kuroda, and others. We call a noncompact region Ω which is the complement of a compact subset of a Riemann surface a Heins’s end. M. Heins called the minimal number (≤ ∞) of generators of the semigroup of the additive class of HP-functions that vanish continuously at the relative boundary of a Heins’s end Ω the harmonic dimension of Ω. Its properties were investigated by Z. Kuramochi, M. Ozawa, and others. Generally, a function f is said to be X-minimal if f is positive and contained in X(R) and there exists a constant C_g for every g in X(R) with f ≥ g > 0 such that g = C_g f. The family of Riemann surfaces R not belonging to O_G and admitting X-minimal functions on R is denoted by U_X. C. Constantinescu and A. Cornea and others studied Riemann surfaces in U,, and r/,, where m is the class of positive functions in HD or limits of monotone decreasing sequences of such functions. There are inclusion relations U,, !$ O,, - O,, Um $ On, - O,, and UHDg U,,. One of the interesting results in classification theory is the fact, discovered by Kuramochi, that U,,U O,$ O,, and U,,UO,~O,,.
Classification theory has a very deep connection with the theory of †ideal boundaries. A. Pfluger and Royden showed that the classes O_G and O,, are invariant under quasiconformal mappings. However, it is still an open problem whether O,, is invariant in this sense.

F. Prolongations of Riemann Surfaces

As classification theory shows, pathological phenomena occur for Riemann surfaces from the viewpoint of function theory in the plane. These are caused by the complexity of ideal boundaries of Riemann surfaces, and in particular by the complexity of the set of ideal boundary points at which handles of Riemann surfaces (i.e., parts with cycles not homologous to 0) accumulate. Hence it is desirable to find larger Riemann surfaces so that ideal boundary points of original surfaces that are not accumulating points of handles become interior points. Suppose that a Riemann surface R is conformally equivalent to a proper subregion R’ of another Riemann surface R_1. Then R_1 is said to be a prolongation of R, and R is prolongable. A nonprolongable Riemann surface is said to be maximal. Closed Riemann surfaces are always maximal, but there also exist maximal open Riemann surfaces (Radó). However, every open Riemann surface is homeomorphic to a prolongable Riemann surface (S. Bochner). Characterizations of prolongable Riemann surfaces and relationships between the various null classes mentioned in Section E and prolongations were investigated from several viewpoints by R. de Possel, J. Tamura, and others.

G. Analytic Mappings of Riemann Surfaces

Apart from the development of the classification theory of Riemann surfaces, efforts have been made to extend various results in the theory of analytic functions of a complex variable to the case of analytic mappings between Riemann surfaces. L. Sario studied the method of normal operators, which is utilized to construct harmonic functions on Riemann surfaces with given singularities at their ideal boundaries; and he extended the main theorems of Nevanlinna to analytic mappings between arbitrary Riemann surfaces (→ 124 Distributions of Values of Functions of a Complex Variable). M. Heins introduced the notions of Lindelöf type, Blaschke type, and others which are special classes of analytic mappings. Utilizing these notions, Constantinescu and Cornea, Kuramochi, and others extended various results in the theory of cluster sets (→ 62 Cluster Sets) by studying the behavior of analytic mappings at ideal boundaries. The theory of †capacities on ideal boundaries has also been developed.
On every open Riemann surface R there exists a nonconstant holomorphic function (Behnke and Stein). Furthermore, Gunning and Narasimhan proved that there exists a holomorphic function on R whose derivative never vanishes. In other words, R is conformally equivalent to an unramified covering surface of the sphere. Such a locally homeomorphic analytic mapping is called the immersion of R. The proof is based on the following deep result (S. Mergelyan’s theorem): Suppose that K is a compact set on R such that R − K has no relatively compact connected component. Then every continuous function which is holomorphic on the interior of K can be approximated uniformly on K by a holomorphic function on R [22].
A surface of genus 0 is said to be planar (or of planar character or schlichtartig). A simply connected surface is planar. Using the Dirichlet principle, P. Koebe proved the following general uniformization theorem: Every planar Riemann surface R can be mapped conformally to the canonical slit regions on the extended complex plane C̄. More precisely, given a point p on R, there exists the extremal horizontal (vertical) slit mapping F_1 (F_2) such that (i) F_1 (F_2) maps R conformally to a region on C̄ whose boundary consists of horizontal (vertical) slits and possibly points; (ii) F_1 and F_2 have a simple pole at p with residue 1; (iii) the total area of the slits and points is zero. Suppose that F_i = 1/z + a_i + c_i z + ... (i = 1, 2) in terms of local parameter z at p. Then s = (c_1 − c_2)/2 is called the span of R. It is known that s (= ‖d(F_1 − F_2)/2‖²) ≥ 0, where the equality holds if and only if R belongs to O,,. In the case of finite genus g, there also exist the conformal mappings of R onto the parallel slit regions on the (g + 1)-sheeted covering surface of C̄ (Z. Nehari, Y. Kusunoki, and others).
L. Ahlfors proved that a Riemann surface of genus g bounded by m contours can be mapped conformally to an at most (2g + m)-sheeted unbounded covering surface of the unit disk.
The structures of closed Riemann surfaces are determined by the algebraic structures of meromorphic function fields on them. H. Iss’sa obtained a noteworthy result which established that open Riemann surfaces are also determined by their meromorphic function fields [24]. It is known too that open Riemann surfaces are determined by their rings of holomorphic functions.

H. Differential Forms on Riemann Surfaces

Since Riemann surfaces are considered real 2-dimensional differentiable manifolds of class C^∞, differential 1-forms ω = u dx + v dy and differential 2-forms Ω = c dx ∧ dy are defined on them, and operations such as the exterior derivative dω = (∂v/∂x − ∂u/∂y) dx ∧ dy and exterior product can be defined (→ 105 Differentiable Manifolds). Since coordinate transformations satisfy the Cauchy-Riemann differential equation, the conjugate differential *ω = −v dx + u dy of ω can be defined. A differential form ω satisfying dω = d*ω = 0 is called a harmonic differential, and one with *ω = −iω is said to be pure. A pure differential is expressed as ω = f dz. Here, if f is a holomorphic function, then ω is called a holomorphic (or analytic) differential, and if f is a meromorphic function, then ω is called a meromorphic (or Abelian) differential. The differential form ω is a holomorphic differential if and only if it is closed (i.e., dω = 0) and pure. The differential ω is called exact (or total) if ω is written as dF = F_x dx + F_y dy with a globally single-valued function F.
Next, the set of all measurable differentials ω with ‖ω‖² = ∬_R ω ∧ *ω̄ < ∞ forms a †Hilbert space with respect to the norm ‖ω‖. The method of orthogonal decomposition in the theory of Hilbert spaces is the main device to study this space and also its suitable subspaces and to obtain the existence theorem of harmonic and holomorphic differentials with various properties (→ 194 Harmonic Integrals). However, in contrast to differentiable manifolds, it should be noted that finer orthogonal decompositions into subspaces with specific properties hold for open Riemann surfaces. For instance, let Γ_a (Γ_h) be the Hilbert space of analytic (harmonic) differentials with finite norm, and set Γ_he = {ω ∈ Γ_h | ω is exact}, Γ_hse = {ω ∈ Γ_h | ω is †semiexact}; then we have the orthogonal decompositions

r, = *r,,, i r,, = r,,, i *rhm, etc.,

where *Γ_x stands for the space {ω | *ω ∈ Γ_x} and I,, and I,, are known as the space of analytic Schottky differentials and the space of differentials of harmonic measures, respectively. Both spaces r,, and F,, have remarkable properties [9].

I. Abelian Integrals on Open Riemann Surfaces

The systematic effort to extend the theory of Abelian integrals on closed Riemann surfaces to open Riemann surfaces was initiated by Nevanlinna in 1940. At the first stage of the development, only those Riemann surfaces with small boundaries (i.e., ones in O_G or O,,) were treated; later a more general treatment was made possible by the discovery of the notion of semiexact differentials (K. Virtanen, Ahlfors).
Let R be an arbitrary open Riemann surface. A 1-dimensional cycle C is called a dividing cycle if for any compact set in R there exists a cycle outside the compact set homologous to C. A differential is said to be semiexact if its period along every dividing cycle vanishes.
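The definitions of dividing cycle and semiexact differential can be checked on a small example. The following is only an illustrative sketch: the punctured disk, the circles C_r, and the differential dz/z are assumptions chosen here for concreteness and do not come from the article.

```latex
% Assumed example surface: the punctured unit disk.
\[
  R=\{\,z\in\mathbb{C} : 0<|z|<1\,\},\qquad
  C_r=\{\,|z|=r\,\}\quad(0<r<1).
\]
% C_r is a dividing cycle: a compact set K in R lies in some closed annulus
% \varepsilon \le |z| \le 1-\varepsilon, and the circle C_{\varepsilon/2}
% lies outside K and is homologous to C_r.
\[
  K\subset\{\varepsilon\le|z|\le 1-\varepsilon\}
  \;\Longrightarrow\;
  C_{\varepsilon/2}\cap K=\emptyset,\qquad C_{\varepsilon/2}\sim C_r .
\]
% The period of dz/z along the dividing cycle C_r does not vanish,
% so dz/z is not semiexact.
\[
  \int_{C_r}\frac{dz}{z}=2\pi i\neq 0 .
\]
% An exact differential dF (F globally single-valued) has vanishing period
% along every cycle, in particular along every dividing cycle.
\[
  \int_{C} dF=0\ \text{ for every cycle } C
  \;\Longrightarrow\; dF \text{ is semiexact}.
\]
```

In this example every exact differential is semiexact, while the closed differential dz/z fails the condition because its period along C_r is 2πi.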

Ahlfors defined the distinguished (complex harmonic) differentials with polar singularities and obtained in terms of them a generalization of Abel’s theorem. Independently, Y. Kusunoki defined the semiexact canonical (meromorphic) differentials and gave in terms of them a formulation of Abel’s theorem and the Riemann-Roch theorem on R [27]. It was proved that a meromorphic differential df = du + i dv is semiexact canonical if and only if du is (real) distinguished, and then u is almost constant on every ideal boundary component of R (in appropriate †compactification of R). Hence every meromorphic function f for which df is distinguished is almost constant on every ideal boundary component, and therefore f reduces to a constant by the boundary theorem of Riesz type if R has a large boundary. By contrast, the Riemann-Roch theorem above shows that a nonconstant meromorphic function f such that df is (exact) canonical exists on any open Riemann surface R with finite genus, and f gives a canonical parallel slit mapping of R (→ Section G). H. L. Royden [28] and B. Rodin also gave generalizations of the Riemann-Roch theorem. M. Yoshida, H. Mizumoto, M. Shiba, and others further generalized the Kusunoki type theorems. The Riemann-Roch theorem for a closed Riemann surface can be deduced from that for open Riemann surfaces by considering an open Riemann surface obtained from a closed surface by deleting a point.
Riemann’s period relation on R has been studied for various classes of differentials, but for the case of infinitely many nonvanishing periods, no definitive result has been obtained. The same is true for the theory of Abelian differentials with infinitely many singularities. On the other hand, the analogy to the classical theory is lost completely if no restriction is posed on the differentials on R. In this context the following results due to Behnke and Stein [26] are outstanding: (1) There exists an Abelian differential of the first kind on R with infinitely many given periods. (2) For two discrete sequences {p_n} and {q_n} of points in R, there exists a single-valued meromorphic function with zeros at p_n and poles at q_n. It is further proved that there exists an Abelian differential with prescribed divisor and periods (Kusunoki and Sainouchi). This generalizes the results above and the Gunning-Narasimhan theorem.

References

[1] H. Weyl, Die Idee der Riemannschen Fläche, Teubner, 1913, third edition, 1955; English translation, The concept of a Riemann surface, Addison-Wesley, 1964.
[2] S. Stoilow, Leçons sur les principes topologiques de la théorie des fonctions analytiques, Gauthier-Villars, 1938.
[3] L. Ahlfors, Zur Theorie der Überlagerungsflächen, Acta Math., 65 (1935), 157-194.
[4] R. Nevanlinna, Uniformisierung, Springer, 1953.
[5] H. Behnke and F. Sommer, Theorie der analytischen Funktionen einer komplexen Veränderlichen, Springer, second revised edition, 1962.
[6] A. Pfluger, Theorie der Riemannschen Flächen, Springer, 1957.
[7] G. Springer, Introduction to Riemann surfaces, Addison-Wesley, 1957.
[8] L. V. Ahlfors (ed.), Contributions to the theory of Riemann surfaces, Ann. Math. Studies, Princeton Univ. Press, 1953.
[9] L. V. Ahlfors and L. Sario, Riemann surfaces, Princeton Univ. Press, 1960.
[10] M. Schiffer and D. C. Spencer, Functionals of finite Riemann surfaces, Princeton Univ. Press, 1954.
[11] L. Bers, Riemann surfaces, Lecture notes, New York Univ., 1957.
[12] M. Tsuji, Potential theory in modern function theory, Maruzen, 1959 (Chelsea, 1975).
[13] R. C. Gunning, Lectures on Riemann surfaces, Princeton Univ. Press, 1966.
For the classification of Riemann surfaces → [6,9] and the following:
[14] C. Constantinescu and A. Cornea, Ideale Ränder Riemannscher Flächen, Springer, 1963.
[15] L. Sario and M. Nakai, Classification theory of Riemann surfaces, Springer, 1970.
[16] M. Sakai, Analytic functions with finite Dirichlet integrals on Riemann surfaces, Acta Math., 142 (1979), 199-220.
For analytic mappings of Riemann surfaces → [4,6,9] and the following:
[17] B. Rodin and L. Sario, Principal functions, Van Nostrand, 1968.
[18] L. Sario and K. Noshiro, Value distribution theory, Van Nostrand, 1966.
[19] L. Sario and K. Oikawa, Capacity functions, Springer, 1969.
[20] L. Ahlfors, Open Riemann surfaces and extremal problems on compact subregions, Comment. Math. Helv., 24 (1950), 100-134.
[21] R. Gunning and R. Narasimhan, Immersion of open Riemann surfaces, Math. Ann., 174 (1967), 103-108.
[22] E. Bishop, Subalgebras of functions on a Riemann surface, Pacif. J. Math., 8 (1958), 29-50.
[23] M. H. Heins, Algebraic structure and conformal mapping, Trans. Amer. Math. Soc., 89 (1958), 267-276.

[24] H. Iss’sa, On the meromorphic function field of a Stein variety, Ann. Math., (2) 83 (1966), 34-46.
For the generalization of algebraic functions and Abelian integrals → [4,6,9] and the following:
[25] R. Nevanlinna, Quadratisch integrierbare Differentiale auf einer Riemannschen Mannigfaltigkeit, Ann. Acad. Sci. Fenn., 1 (1941), 1-34.
[26] H. Behnke and K. Stein, Entwicklung analytischer Funktionen auf Riemannschen Flächen, Math. Ann., 120 (1949), 430-461.
[27] Y. Kusunoki, Theory of Abelian integrals and its applications to conformal mappings, Mem. Coll. Sci. Univ. Kyôto, (A, Math.) 32 (1959), 235-258; 33 (1961), 429-433.
[28] H. L. Royden, The Riemann-Roch theorem, Comment. Math. Helv., 34 (1960), 37-51.
Also → reference to 11 Algebraic Functions, 416 Teichmüller Spaces.


368 (III.9)
Rings

A. Definition

A nonempty set A is called a ring if the following conditions are satisfied.
(1) Two †operations, called addition and multiplication (the ring operations), are defined, which send an arbitrary pair of elements a, b of A to elements a + b and ab of A.
(2) For arbitrary elements a, b, c of A, these operations satisfy the following four laws: (i) a + b = b + a (commutative law of addition); (ii) (a + b) + c = a + (b + c), (ab)c = a(bc) (associative laws); (iii) a(b + c) = ab + ac, (a + b)c = ac + bc (distributive laws); and (iv) for every pair a, b of elements of A, there exists a unique element c of A such that a + c = b.
Thus a ring A is an †Abelian group under addition. Each element a of a ring A determines operations L_a and R_a defined by L_a(x) = ax, R_a(x) = xa (x ∈ A). Thus a ring A has the structure of a †left A-module and a †right A-module. Since the operations L_a and R_b commute for every pair a, b of elements of A, the ring A is also an A-A-bimodule (→ 277 Modules).
The identity element of A under addition is called the zero element and is denoted by 0. It satisfies the equation a0 = 0a = 0 (a ∈ A). An element e ∈ A is called a unity element (identity element or unit element) of A if it satisfies ae = ea = a (a ∈ A). If A has such a unity element, it is unique and is often denoted by 1. A ring with unity element is called a unitary ring. Most of the important examples of rings are unitary. Hence we often call a unitary ring simply a ring. If a ring has only one member (namely, 0), then 0 is the unity element of the ring. Such a ring is called a zero ring. However, if a ring has more than one element, the unity element is distinct from the zero element. A ring is called a commutative ring if it satisfies (v) ab = ba (a, b ∈ A) (commutative law for multiplication) (→ 67 Commutative Rings).
In this article we shall discuss associative rings. Certain nonassociative rings are important; an example is †Lie algebra. (An algebra is a ring having a †ground ring.)

B. Further Definitions

An element a ≠ 0 of a ring A is called a zero divisor if there exists an element b ≠ 0 such that ab = 0 or ba = 0. A commutative unitary ring having more than one element is called an integral domain if it has no zero divisors (→ 67 Commutative Rings). Elements a and b of a ring are said to be orthogonal if ab = ba = 0. An element a satisfying a^n = 0 for some positive integer n is called a nilpotent element, and a nonzero element a satisfying a² = a is called an idempotent element. An idempotent element is said to be primitive if it cannot be represented as the sum of two orthogonal idempotent elements. For any subsets S and T of a ring A, let S + T (ST) denote the set of elements s + t (st) (s ∈ S, t ∈ T). In particular, SS is denoted by S² (similarly for S³, S⁴, etc.), and furthermore, {a} + S ({a}S) is denoted by a + S (aS). If ST = TS = {0}, then subsets S and T are said to be orthogonal. A subset S of a ring is said to be nilpotent if S^n = 0 for some positive integer n, and idempotent if S² = S.
For an element a of a unitary ring A, an element a’ such that a’a = 1 (aa’ = 1) is called a left (right) inverse element of a. There exists a left (right) inverse element of a if and only if A is generated by a as a left (right) A-module. If there exist both a right inverse and a left inverse of a, then they coincide and are uniquely determined by a. This element is called the inverse element of a and is denoted by a^{-1}. An element that has an inverse element is called an invertible element (regular element or unit). The set of all invertible elements of a unitary ring forms a group under multiplication. A nonzero unitary ring is called a †skew field (or †division ring) if every nonzero element is invertible. Furthermore, a skew field that satisfies the commutative law is called a commutative field or simply a field (→ 149 Fields). In a general ring A, if we define a new operation (a, b) → a∘b by setting a∘b = a + b − ab, then A is a †semigroup with the identity element 0 with respect to this operation. The inverse

element under this operation is called the quasi-inverse element, and an element that has a quasi-inverse element is called a quasi-invertible element (or quasiregular element). An element a of a unitary ring is quasi-invertible if and only if the element 1 − a is invertible.

C. Examples

(1) Rings of numbers. The ring Z of rational integers, the rational number field Q, the real number field R, and the complex number field C are familiar examples (→ 14 Algebraic Number Fields, 257 Local Fields).
(2) Rings of functions. The set K^I of functions defined on a set I and taking values in a ring K forms a commutative ring under pointwise addition and multiplication. In particular, let K = R, and let I be an interval of R. Then the set C^0(I) of continuous functions, the set C^r(I) of functions that are r-times continuously differentiable, and the set C^ω(I) of analytic functions on I are subrings (→ Section E) of R^I.
(3) Rings of expressions. The set K[X_1, ..., X_n] of polynomials and the set K[[X_1, ..., X_n]] of †formal power series in indeterminates X_1, ..., X_n with coefficients in a commutative ring K are commutative rings (→ 369 Rings of Polynomials, 370 Rings of Power Series).
(4) Endomorphism rings of modules. The set ℰ_K(M) of †endomorphisms of a †module M over a ring K is in general a noncommutative ring. In particular, if M is a finite-dimensional †linear space over a field K, then ℰ_K(M) can be identified with a †full matrix ring (→ 256 Linear Spaces, 277 Modules).
(5) For other examples → 29 Associative Algebras, 36 Banach Algebras, 67 Commutative Rings, 284 Noetherian Rings, and 439 Valuations.

D. Homomorphisms

A mapping f: A → B of a ring A into a ring B satisfying conditions (i) f(a + b) = f(a) + f(b) and (ii) f(ab) = f(a)f(b) (a, b ∈ A) is called a homomorphism. If a homomorphism f is †bijective, then the inverse mapping f^{-1}: B → A is also a homomorphism, and in this case f is called an isomorphism. More precisely, a homomorphism (isomorphism) of rings is often called a ring homomorphism (ring isomorphism). There exists only one homomorphism of any ring onto the zero ring. For unitary rings A and B, a homomorphism f: A → B is said to be unitary if it maps the unity element of A to the unity element of B. By a homomorphism, a unitary homomorphism is usually meant. In this sense there exists a unique homomorphism of the ring Z of rational integers into an arbitrary unitary ring. A composite of homomorphisms is also a homomorphism. The identity mapping 1_A of a ring A is an isomorphism. A homomorphism of a ring A into itself is called an endomorphism, and an isomorphism of A onto itself is called an automorphism of A. If a is an invertible element of a unitary ring A, then the mapping x → a x a^{-1} (x ∈ A) is an automorphism of A, called an inner automorphism.
When condition (ii) for a homomorphism is replaced by (ii’) f(ab) = f(b)f(a) (a, b ∈ A), a mapping satisfying (i) and (ii’) is called an antihomomorphism. In particular, if an antihomomorphism f is bijective, then the inverse mapping f^{-1} is also an antihomomorphism, and f is called an anti-isomorphism. Antiendomorphisms and antiautomorphisms are defined similarly.

E. Subrings, Factor Rings, and Direct Products

A subset S of a ring A is called a subring of A if a ring structure is given on S and the canonical injection S → A is a ring homomorphism. Thus the ring operations of S are the restrictions of those of A. If we deal only with unitary rings and unitary homomorphisms, then a subring S necessarily contains the unity element of A. The smallest subring containing a subset T of a ring A is called the subring generated by T. The set of elements that commute with every element of T forms a subring and is called the commuter (or centralizer) of T. In particular, the commuter of A itself is called the center of A.
A †quotient set A/R of a ring A by an equivalence relation R is called a factor ring (quotient ring) of A if a ring structure is given on A/R and the canonical surjection A → A/R is a ring homomorphism. This is the case if and only if the equivalence relation R is compatible with the ring operations (i.e., aRa’ and bRb’ imply (a + b)R(a’ + b’) and (ab)R(a’b’)). Let α and β be elements of the factor ring A/R, represented by a and b, respectively. Then the definition of factor ring implies that α + β (αβ) is the equivalence class represented by a + b (ab). Every ring A has two trivial factor rings, namely, A itself and the zero ring 0. If A has no nontrivial factor rings, then A is called a quasisimple ring (→ Section G). If f: A → B is a ring homomorphism, then the image f(A) is a subring of B, and the equivalence relation R in A defined by f (aRb ⇔ f(a) = f(b)) is compatible with the ring operations of A. Thus f induces an isomorphism A/R → f(A) (→ Section F).
If {A_i}_{i∈I} is a family of rings, the Cartesian

product A = ∏_{i∈I} A_i forms a ring under the componentwise operations (a_i) + (b_i) = (a_i + b_i) and (a_i)(b_i) = (a_i b_i). This ring is called the direct product of the family of rings {A_i}_{i∈I}. The mapping p_i: A → A_i that assigns to each (a_i) its ith component a_i is called a canonical homomorphism. For any set of homomorphisms f_i: B → A_i (i ∈ I), there exists a unique homomorphism f: B → A such that f_i = p_i∘f for each i.

F. Ideals

A subset of a ring A is called a left (right) ideal of A if it is a submodule of the left (right) A-module of A (→ 277 Modules). In other words, a left (right) ideal J of A is an additive subgroup of A such that AJ ⊂ J (JA ⊂ J). Under the operations induced from A, J is a ring (however, J is not necessarily unitary). A subset of A is called a two-sided ideal or simply an ideal of A if it is a left and right ideal.
For an ideal J of a ring A, we define a relation R in A by aRb ⇔ a − b ∈ J. Then R is an equivalence relation that is compatible with the operations of A. Each equivalence class is called a residue class modulo J, and the quotient ring A/R is denoted by A/J and called the residue (class) ring (or factor ring) modulo J. If it is a field, it is called a residue (class) field. Conversely, given an equivalence relation R that is compatible with the operations of A, the equivalence class of 0 forms an ideal J of A, and the equivalence relation defined by J coincides with R.
If f: A → B is a ring homomorphism, then the †kernel of f as a homomorphism of additive groups forms an ideal J of A, and f induces an isomorphism A/J → f(A). If S is a subring and J is an ideal of a ring A, then S + J is a subring of A and S ∩ J is an ideal of S. Furthermore, the natural homomorphism S → (S + J)/J induces an isomorphism S/(S ∩ J) → (S + J)/J (isomorphism theorem).
A left (right) ideal of a ring A is said to be maximal if it is not equal to A and is properly contained in no left (right) ideal of A other than A. Similarly, a left (right) ideal of A is said to be minimal if it is nonzero and properly contains no nonzero left (right) ideal of A.
If e is an idempotent element of a unitary ring A, then 1 − e and e are orthogonal idempotent elements, and A = Ae + A(1 − e) is the direct sum of left ideals. This is called Peirce’s left decomposition. Peirce’s right decomposition is defined similarly. A left ideal J of A can be expressed as J = Ae for some idempotent element e if and only if there exists a left ideal J’ such that A = J + J’ is a direct sum decomposition. More generally, if e_1, ..., e_n are orthogonal idempotent elements whose sum is equal to 1, then A = Ae_1 + ... + Ae_n is the direct sum of left ideals. Conversely, if A = J_1 + ... + J_n is the direct sum of left ideals and 1 = e_1 + ... + e_n (e_i ∈ J_i) is the corresponding decomposition of the unity element, then e_1, ..., e_n are orthogonal idempotent elements. In particular, if J_1, ..., J_n are two-sided ideals, then each J_i is a ring with unity element e_i, and by a natural correspondence, the ring A is isomorphic to the direct product ∏_{i=1}^{n} J_i. In this case, A is called the direct sum of ideals J_1, ..., J_n and is denoted by A = ⊕_{i=1}^{n} J_i, or A = Σ_{i=1}^{n} J_i.
A ring A is called a left (right) Artinian ring if it is †Artinian as a left (right) A-module (i.e., if A satisfies the †minimal condition for left (right) ideals of A). A ring A is called a left (right) Noetherian ring if it is †Noetherian as a left (right) A-module (i.e., if A satisfies the †maximal condition for left (right) ideals of A). If A is commutative, left and right are omitted in these definitions. The property of being Artinian or Noetherian is inherited by quotient rings and the direct product of a finite family of rings, but not necessarily by subrings. For general rings, the maximal and minimal conditions for left (right) ideals are independent, but for unitary rings, left (right) Artinian rings are necessarily left (right) Noetherian (Y. Akizuki, C. Hopkins).

G. Semisimple Rings

The statement that a unitary ring A is †semisimple as a left A-module is equivalent to the statement that A is semisimple as a right A-module; in this case A is called a semisimple ring (→ Section H). Every module over a semisimple ring is also semisimple. A semisimple ring is left (right) Artinian and Noetherian. A semisimple ring is called a simple ring if it is nonzero and has no proper ideals except {0}, that is, if A is a quasisimple ring. Thus A is a simple ring if and only if A is a nonzero, unitary, quasisimple, left (right) Artinian ring. If A is a semisimple ring, then it has only a finite number of minimal ideals A_1, ..., A_n, and A is expressible as the direct sum A = A_1 + ... + A_n, where each A_i is a simple ring, called a simple component of A. Any ideal of A is the direct sum of a finite number of simple components of A. Quotient rings of a semisimple ring and the direct product of a finite number of semisimple rings are also semisimple.
Any left (right) ideal of a semisimple ring A is expressible as Ae (eA) for some idempotent element e, and Ae (eA) is minimal if and only if e is primitive. In particular, a minimal left (right) ideal is a simple left (right) A-module that is contained in a certain simple component. Two minimal left (right) ideals are isomorphic as A-modules if and only if they are contained in the same simple component. If A_i, 1 ≤ i ≤ n, are the simple components of A, then for each simple left A-module M there exists a unique i such that A_i M ≠ {0}, and M is isomorphic to a minimal left ideal contained in A_i.
If M is a finite-dimensional linear space over a (skew) field D, then the endomorphism ring A = ℰ_D(M) of M over D is a simple ring. Conversely, for any ring A, the endomorphism ring D = ℰ_A(M) of a simple A-module M is a (skew) field (Schur’s lemma). If A is a simple ring, then any simple A-module M can be considered as a finite-dimensional linear space over D = ℰ_A(M), and A is isomorphic to ℰ_D(M) (Wedderburn’s theorem). Furthermore, if r is the dimension of M over D, then ℰ_D(M) is isomorphic to the full matrix ring M_r(D°) of degree r over the field D°, which is anti-isomorphic to D. The dimension r is also equal to the †length of A as an A-module. The center of A = ℰ_D(M) is isomorphic to the center of D, which is a commutative field. Thus a simple ring is an associative algebra over a commutative field (→ 29 Associative Algebras).

H. Radicals

Let A be a ring. Then among ideals consisting only of quasi-invertible elements of A, there exists a largest one, which is called the radical of A and denoted by ℜ(A). The radical of the residue ring A/ℜ(A) is {0}. A ring A is called a semiprimitive ring if ℜ(A) is {0}. On the other hand, A is called a left (right) primitive ring if it has a †faithful simple left (right) A-module. The radical ℜ(A) is equal to the intersection of all ideals J such that A/J is a left (right) primitive ring. In a unitary ring A, ℜ(A) coincides with the intersection of all maximal left (right) ideals of A. Furthermore, in a left (right) Artinian ring A, ℜ(A) is the largest nilpotent ideal of A, and the condition ℜ(A) = {0} is equivalent to the condition that A is a semisimple ring.
Among ideals consisting only of nilpotent elements of A, there exists a largest one, which is called the nilradical (or simply the radical) and denoted by 𝔑(A) (→ 67 Commutative Rings). The nilradical of A/𝔑(A) is {0}. In general, 𝔑(A) is contained in ℜ(A), and if A is left (right) Artinian, we have 𝔑(A) = ℜ(A). A ring A is called a semiprimary ring if A/ℜ(A) is left (right) Artinian and therefore semisimple. Furthermore, a ring A is called a primary ring (completely primary ring) if A/ℜ(A) is a simple ring (skew field). A primary ring is isomorphic

References

[1] B. L. van der Waerden, Algebra, Springer, I, 1966; II, 1967.
[2] N. Jacobson, The theory of rings, Amer. Math. Soc. Math. Surveys, 1943.
[3] E. Artin, C. J. Nesbitt, and R. M. Thrall, Rings with minimum condition, Univ. of Michigan Press, 1944.
[4] N. Jacobson, Structure of rings, Amer. Math. Soc. Colloq. Publ., 1956.
[5] N. Bourbaki, Eléments de mathématique, Algèbre, ch. 1, 8, Actualités Sci. Ind., 1144b, 1261a, Hermann, 1964, 1958.
[6] C. Chevalley, Fundamental concepts of algebra, Academic Press, 1956.
[7] N. Jacobson, Lectures in abstract algebra I, Van Nostrand, 1951 (Springer, 1976).
[8] R. Godement, Cours d’algèbre, Hermann, 1963.
[9] J. P. Jans, Rings and homology, Holt, Rinehart and Winston, 1964.
[10] S. Lang, Algebra, Addison-Wesley, 1965.
[11] N. H. McCoy, The theory of rings, Macmillan, 1964.


369 (III.13)
Rings of Polynomials

A. General Remarks

In this article, we mean by a ring a †commutative ring with †unity element. Let R be a ring, and let X_1, ..., X_n be variables (letters, indeterminates, or symbols). Then the set of †polynomials in X_1, ..., X_n with coefficients in R is called the ring of polynomials (or polynomial ring) in n variables X_1, ..., X_n over R and is denoted by R[X_1, ..., X_n] (→ 67 Commutative Rings; 284 Noetherian Rings; 337 Polynomials; 368 Rings). On the other hand, when R and R’ are rings with common unity element and R ⊂ R’, then for a subset S of R’ we denote the subring of R’, generated by S over R, by R[S]. When S = {x_1, ..., x_n}, then there is a homomorphism φ of the polynomial ring R[X_1, ..., X_n] onto R[S] defined by

φ(f(X_1, ..., X_n)) = f(x_1, ..., x_n).

If φ is an isomorphism, then x_1, ..., x_n are said to be algebraically independent over R; and otherwise, algebraically dependent over R. Thus the ring of polynomials in n variables over R may be regarded as a ring R[x_1, ..., x_n]
to a full matrix ring over a completely primary generated by a finite system of algebraically
ring. independent elements xi, . ,x, over R.
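
The substitution homomorphism $\varphi$ can be made concrete with a small computation. The following is only a sketch under assumptions chosen for illustration: $R = \mathbf{Z}$, $R' = \mathbf{Z}[t]$, $x_1 = t^2$, $x_2 = t^3$, and hypothetical helper names (`poly_add`, `poly_mul`, `poly_pow`, `substitute`). It shows that $X_1^3 - X_2^2$ lies in the kernel of $\varphi$, so $t^2$ and $t^3$ are algebraically dependent over $\mathbf{Z}$.

```python
from collections import defaultdict

# A polynomial in t over Z is stored as {exponent: coefficient}.
def poly_add(p, q):
    out = defaultdict(int)
    for e, c in p.items():
        out[e] += c
    for e, c in q.items():
        out[e] += c
    return {e: c for e, c in out.items() if c != 0}

def poly_mul(p, q):
    out = defaultdict(int)
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] += c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def poly_pow(p, k):
    result = {0: 1}
    for _ in range(k):
        result = poly_mul(result, p)
    return result

def substitute(F, xs):
    """Image under phi of F in Z[X_1..X_n].

    F maps exponent tuples (i_1, ..., i_n) to integer coefficients;
    xs lists the elements x_1, ..., x_n of Z[t] substituted for the X_j.
    """
    total = {}
    for exps, coeff in F.items():
        term = {0: coeff}
        for x, i in zip(xs, exps):
            term = poly_mul(term, poly_pow(x, i))
        total = poly_add(total, term)
    return total

x1 = {2: 1}                      # t^2
x2 = {3: 1}                      # t^3
F = {(3, 0): 1, (0, 2): -1}      # X1^3 - X2^2
G = {(1, 0): 1, (0, 1): 1}       # X1 + X2

print(substitute(F, [x1, x2]))   # {} : F is a nonzero element of the kernel
print(substitute(G, [x1, x2]))   # {2: 1, 3: 1} : t^2 + t^3
```

Because a nonzero polynomial maps to zero, $\varphi$ is not injective here; replacing $x_2$ by a second independent variable would require a two-variable target ring, which this sketch does not model.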

B. Ideals, Homogeneous Rings, and Graded Rings

Consider the polynomial ring $R[X] = R[X_1, \ldots, X_n]$ in $n$ variables over a ring $R$. A polynomial $f \in R[X]$ is a †zero divisor if and only if there is a nonzero member $a$ of $R$ for which $af = 0$. If $\mathfrak{a}$ is an †ideal of $R$, then $R[X]/\mathfrak{a}R[X] \cong (R/\mathfrak{a})[X_1, \ldots, X_n]$. Therefore, if $\mathfrak{p}$ is a †prime ideal of $R$, then $\mathfrak{p}R[X]$ is a prime ideal of $R[X]$. If $R$ is a †unique factorization domain (u.f.d.), then $R[X]$ is also a u.f.d. If $R$ is a normal ring, then so is $R[X]$. By the †Hilbert basis theorem, if $R$ is †Noetherian, then $R[X]$ is also Noetherian. If $m$ is the †Krull dimension of $R$, then Krull dim $R[X] \geq n + m$; the equality holds if $R$ is Noetherian. If $R$ is a field, then $R[X]$ is not only a u.f.d. but also a †Macaulay ring.

A homogeneous ideal of $R[X]$ is an ideal generated by a set of †homogeneous polynomials $f_\lambda$ (the degree of $f_\lambda$ may depend on $\lambda$). When $\mathfrak{a}$ is a homogeneous ideal, an element in $R[X]/\mathfrak{a}$ is defined to be a homogeneous element of degree $d$ if it is the class of a homogeneous polynomial of degree $d$ modulo $\mathfrak{a}$, and the quotient ring $R[X]/\mathfrak{a}$ is called a homogeneous ring. More generally, assume that a ring $R$ is, as a †module, the †direct sum $\sum_{i=0}^{\infty} R_i$ of its submodules $R_i$ ($i = 0, 1, 2, \ldots$) and that $R_i R_j \subset R_{i+j}$ for every pair $(i, j)$. Then we call $R$ a graded ring, and an element in $R_d$ a homogeneous element of degree $d$. (In some literature the term graded ring is used in a wider sense; see below.) In a graded ring $R$, if an ideal is generated by homogeneous elements, then the ideal is called a homogeneous ideal (or graded ideal). In a graded ring $R = \sum_{i=0}^{\infty} R_i$, if the ideal $\sum_{i \geq 1} R_i$ has a finite basis, then $R$ is generated (as a ring) by a finite number of elements over its subring $R_0$. Therefore, the graded ring $R = \sum R_i$ is Noetherian if and only if $R_0$ is Noetherian and $R$ is generated by a finite number of elements over $R_0$. In this case, every homogeneous ideal is the intersection of a finite number of homogeneous †primary ideals, and every prime divisor of a homogeneous ideal is a homogeneous prime ideal.

The notion of a graded ring is generalized further as follows: A ring $R$ is graded by an additive semigroup $I$ (containing 0) if $R$ is $\sum_{i \in I} R_i$ (direct sum) and if $R_i R_j \subset R_{i+j}$.

C. Zero Points

(1) Zero Points in an Affine Space. We consider the polynomial ring $K[X] = K[X_1, \ldots, X_n]$ in $n$ variables over a field $K$ and a field $\Omega$ containing $K$. A point $(a_1, \ldots, a_n)$ of an $n$-dimensional †affine space $\Omega^n = \{(a_1, \ldots, a_n) \mid a_i \in \Omega\}$ is called a zero point of a subset $S$ of $K[X]$ if $f(a_1, \ldots, a_n) = 0$ for every $f(X_1, \ldots, X_n) \in S$. A point $(a_1, \ldots, a_n)$ is called algebraic over $K$ ($K$-rational) if every $a_i$ is algebraic over $K$ (is an element of $K$). In this way we define algebraic zero points and rational zero points. Zero points of $S$ are zero points of the †ideal generated by $S$. Therefore, in order to investigate the set of zero points of $S$, we may restrict ourselves to the case where $S$ is an ideal. Denote by $V(S)$ the set of zero points of $S$. If $\mathfrak{a}_1$, $\mathfrak{a}_2$ are ideals of $K[X]$, then (i) $V(\mathfrak{a}_1 \cap \mathfrak{a}_2) = V(\mathfrak{a}_1 \mathfrak{a}_2) = V(\mathfrak{a}_1) \cup V(\mathfrak{a}_2)$; (ii) $V(\mathfrak{a}_1 + \mathfrak{a}_2) = V(\mathfrak{a}_1) \cap V(\mathfrak{a}_2)$; and (iii) if $\mathfrak{a}_1$ and $\mathfrak{a}_2$ have a common †radical, then $V(\mathfrak{a}_1) = V(\mathfrak{a}_2)$.

(2) Zero Points in a Projective Space. A point $(\lambda a_1, \ldots, \lambda a_n)$ of an $(n - 1)$-dimensional †projective space over $\Omega$ (with $a_i \in \Omega$, some $a_i \neq 0$, $\lambda \in \Omega$, $\lambda \neq 0$) is called a zero point of a polynomial $f(X_1, \ldots, X_n)$ if, $f$ being expressed as $\sum f_i$ with homogeneous polynomials $f_i$ of degree $i$, $f_i(a_1, \ldots, a_n) = 0$ for every $i$ (this condition holds if and only if $f(\lambda a_1, \ldots, \lambda a_n) = 0$ for any element $\lambda$ in $\Omega$, provided that $\Omega$ contains infinitely many elements). Therefore, zero points of a subset $S$ of $K[X]$ are zero points of the smallest homogeneous ideal containing $S$. Thus, in order to study the sets of zero points, it is sufficient to consider sets of zero points of homogeneous ideals, and propositions similar to (i), (ii), and (iii) of part 1 of this section hold for homogeneous ideals $\mathfrak{a}_1$, $\mathfrak{a}_2$.

D. The Normalization Theorem

Let $\mathfrak{a}$ be an ideal of †height $h$ in the polynomial ring $K[X] = K[X_1, \ldots, X_n]$ in $n$ variables over a field $K$. Then there exist elements $Y_1, \ldots, Y_n$ of $K[X]$ such that (i) $K[X]$ is †integral over $K[Y] = K[Y_1, \ldots, Y_n]$ and (ii) $Y_1, \ldots, Y_h$ generate $\mathfrak{a} \cap K[Y]$ (normalization theorem for polynomial rings).

Using this theorem, we obtain the following important theorems on finitely generated rings.

(1) Normalization theorem for finitely generated rings. If a ring $R$ is finitely generated over an integral domain $I$, then there exist an element $a$ ($\neq 0$) of $I$ and algebraically independent elements $z_1, \ldots, z_t$ of $R$ over $I$ such that the ring of quotients $R_S$ (where $S = \{a^n \mid n = 1, 2, \ldots\}$) is integral over $I[a^{-1}, z_1, \ldots, z_t]$.

(2) If $\mathfrak{p}$ is a prime ideal of an integral domain $R$ that is finitely generated over a field $K$, then (height of $\mathfrak{p}$) + (†depth of $\mathfrak{p}$) = (†transcendence degree of $R$ over $K$), and the depth of $\mathfrak{p}$ coincides with the transcendence degree of $R/\mathfrak{p}$ over $K$. In particular, if $\mathfrak{m}$ is a maximal ideal of $R$, then $R/\mathfrak{m}$ is algebraic over $K$.

(3) Hilbert's zero point theorem (Hilbert Nullstellensatz). Let $\mathfrak{a}$ be an ideal of the polynomial ring $K[X] = K[X_1, \ldots, X_n]$ over the field $K$, and assume that the field $\Omega$ containing $K$ is †algebraically closed. If $f \in K[X]$ satisfies the condition that every algebraic zero point (→ Section C) of $\mathfrak{a}$ is a zero point of $f$, then some power of $f$ is contained in $\mathfrak{a}$.

E. Elimination Theory

Let $f_1, \ldots, f_N$ be elements of the polynomial ring $R = I[X_1, \ldots, X_m, Y_1, \ldots, Y_n]$ in $m + n$ variables over an integral domain $I$. For each maximal ideal $\mathfrak{m}$ of $I$, let $\varphi_{\mathfrak{m}}$ be the canonical homomorphism with modulus $\mathfrak{m}$, and let $\Omega_{\mathfrak{m}}$ be an algebraically closed field containing $I/\mathfrak{m}$. Let $W_{\mathfrak{m}}$ be the set of points $(a_1, \ldots, a_n)$ of the $n$-dimensional affine space $\Omega_{\mathfrak{m}}^n$ over $\Omega_{\mathfrak{m}}$ such that the system of equations $\varphi_{\mathfrak{m}}(f_i)(X_1, \ldots, X_m, a_1, \ldots, a_n) = 0$ ($i = 1, 2, \ldots, N$) has a solution in $\Omega_{\mathfrak{m}}^m$. To eliminate $X_1, \ldots, X_m$ from $f_1, \ldots, f_N$ is to obtain $g(Y_1, \ldots, Y_n) \in I[Y_1, \ldots, Y_n]$ such that every point of $W_{\mathfrak{m}}$ is a zero point of $\varphi_{\mathfrak{m}}(g)$ for every $\mathfrak{m}$; such a $g$ (or an equation $g = 0$) is called a resultant of $f_1, \ldots, f_N$. The set $\mathfrak{a}$ of resultants forms an ideal of $I[Y_1, \ldots, Y_n]$, and $\{g_1, \ldots, g_M\}$ is called a system of resultants if the radical of the ideal generated by it coincides with $\mathfrak{a}$. If $I$ is finitely generated over a field, then, denoting by $\mathfrak{b}$ the radical of the ideal generated by $f_1, \ldots, f_N$, we have $\mathfrak{a} = \mathfrak{b} \cap I[Y_1, \ldots, Y_n]$. In particular, let $I$ be a field. It is obvious that $W_{(0)}$ is contained in the set $V$ of zero points of $\mathfrak{a}$. However, it is not necessarily true that $V = W_{(0)}$. If every $f_i$ is homogeneous in $X_1, \ldots, X_m$ and also in $Y_1, \ldots, Y_n$, then we have $V = W_{(0)}$.

If we wish to write a system of resultants explicitly, we can proceed as follows: Regard the $f_i$ as polynomials in $X_1$ with coefficients in $I[X_2, \ldots, X_m, Y_1, \ldots, Y_n]$, and obtain resultants $R(f_i, f_j)$ by eliminating $X_1$ from the pairs $f_i$, $f_j$. Then eliminate $X_2$ from these resultants, and so forth. To obtain $R(f_i, f_j)$, we may use Sylvester's elimination method. Namely, let $f$ and $g$ be polynomials in $x$ with coefficients in $I$: $f = a_0 x^m + a_1 x^{m-1} + \cdots + a_m$, $g = b_0 x^n + b_1 x^{n-1} + \cdots + b_n$. Let $D(f, g)$ be the following determinant of degree $m + n$, whose first $n$ rows are shifted copies of $(a_0, \ldots, a_m)$ and whose last $m$ rows are shifted copies of $(b_0, \ldots, b_n)$:

$$D(f, g) = \begin{vmatrix}
a_0 & a_1 & \cdots & a_m & 0 & \cdots & 0 \\
0 & a_0 & \cdots & a_{m-1} & a_m & \cdots & 0 \\
\vdots & & \ddots & & & \ddots & \vdots \\
0 & \cdots & 0 & a_0 & a_1 & \cdots & a_m \\
b_0 & b_1 & \cdots & b_n & 0 & \cdots & 0 \\
\vdots & & \ddots & & & \ddots & \vdots \\
0 & \cdots & 0 & b_0 & b_1 & \cdots & b_n
\end{vmatrix}$$

Then $D(f, g) = 0$ if and only if either $f$ and $g$ have a common root or $a_0 = b_0 = 0$. Therefore, if $I$ is a u.f.d. and $a_0$ and $b_0$ have no common factor, then $D(f, g)$ is the required resultant $R(f, g)$. (For other methods of elimination see B. L. van der Waerden, Algebra, vol. II. For criteria on whether a finitely generated ring over a field is †regular → 370 Rings of Power Series B.)

F. Syzygy Theory

(1) Classical Case. The notion of syzygy was introduced by Sylvester (Phil. Trans., 143 (1853)), then generalized and clarified by Hilbert [3], whose definition can be formulated as follows: Let $R = k[X_1, \ldots, X_n]$ be a polynomial ring of $n$ variables over a field $k$. $R$ has the natural gradation (i.e., $R$ is a graded ring in which each $X_i$ ($1 \leq i \leq n$) is of degree 1 and elements of $k$ are of degree 0). Let $M$ be a finitely generated graded $R$-module. If $f_1, \ldots, f_m$ form a minimal basis of $M$ over $R$ consisting of homogeneous elements, we introduce $m$ indeterminates $u_1, \ldots, u_m$ and put $F = \sum_{1 \leq j \leq m} R u_j$, the free $R$-module generated by $u_1, \ldots, u_m$. Set $\deg(u_j) = \deg(f_j)$ ($1 \leq j \leq m$) and supply $F$ with the structure of a graded $R$-module. Let $\varphi$ be the graded $R$-homomorphism of $F$ onto $M$ defined by $\varphi(u_j) = f_j$. Then $N = \operatorname{Ker}(\varphi)$ is a graded $R$-module uniquely determined by $M$ up to isomorphism (of graded $R$-modules); $N$ is called the first syzygy of $M$. For a positive integer $r$, the $r$th syzygy of $M$ is inductively defined as the first syzygy of the $(r - 1)$st syzygy of $M$. The Hilbert syzygy theorem states that for any finitely generated graded $R$-module $M$, the $n$th syzygy of $M$ is free. In other words, $M$ admits a free resolution of length $\leq n$, i.e., an exact sequence of the form

$$0 \to F^{(\nu)} \to F^{(\nu - 1)} \to \cdots \to F^{(0)} \to M \to 0,$$

where $\nu \leq n$ and each $F^{(i)}$ ($0 \leq i \leq \nu$) is a finitely generated free graded $R$-module. It follows that if $M_d$ denotes the homogeneous part of degree $d$ in $M$, there exists a polynomial $P(X)$ of degree $\leq n - 1$ such that $\dim_k(M_d) = P(d)$ for sufficiently large $d$; $P(X)$ is called the Hilbert polynomial (or characteristic function) of the graded $R$-module $M$.

(2) Generalization by Serre. The syzygy theory was generalized by J.-P. Serre [2] as follows: Let $R$ be a Noetherian ring and $M$ a finitely generated $R$-module. Then we can find a finitely generated free $R$-module $F$ and an $R$-homomorphism $\varphi$ of $F$ onto $M$. The kernel of $\varphi$, called a first syzygy of $M$, is not uniquely

determined by $M$. However, if $N_1$ and $N_2$ are first syzygies of $M$, then there exist finitely generated †projective $R$-modules $P_1$ and $P_2$ such that $N_1 \oplus P_1 \cong N_2 \oplus P_2$ (→ 277 Modules K). For a positive integer $r$, an $r$th syzygy is defined inductively as in (1) of this section. An important result of Serre is that $R$ is a †regular ring of †Krull dimension at most $n$ if and only if an $n$th syzygy of every finitely generated $R$-module is †projective.

(3) Serre Conjecture. D. Quillen (Inventiones Math., 36 (1976)) and A. Suslin (Dokl. Akad. Nauk SSSR (26 Feb. 1976)) solved the Serre conjecture by proving that every projective module over a ring of polynomials over a field is free.

(4) Special Cases. In the following special cases, we can define the first syzygy of $M$ uniquely up to isomorphism: (i) $R$ is a Noetherian †local ring and $M$ is a finitely generated $R$-module; and (ii) $R$ is a graded Noetherian ring $\sum_{d \geq 0} R_d$, where $R_0$ is a field and $M$ is a finitely generated graded $R$-module [4].

References

[1] M. Nagata, Local rings, Wiley, 1962 (Krieger, 1975).
[2] J.-P. Serre, Algèbre locale, multiplicités, Lecture notes in math., Springer, 1965.
[3] D. Hilbert, Über die Theorie der algebraischen Formen, Math. Ann., 36 (1890), 473-534.
[4] J.-P. Serre, Sur la dimension homologique des anneaux et des modules noethériens, Proc. Intern. Symp. Alg. Number Theory, Tokyo and Nikko (1955), 175-189.
Also → references to 284 Noetherian Rings.

370 (III.14)
Rings of Power Series

A. Rings of Formal Power Series (→ 67 Commutative Rings; 284 Noetherian Rings; 368 Rings)

Let $R$ be a commutative ring with unity element 1. Let $F_d$ be the module of †homogeneous polynomials of degree $d$ in $X_1, \ldots, X_n$ with coefficients in $R$. A formal infinite sum $\sum_{d=0}^{\infty} a_d = a_0 + a_1 + \cdots + a_d + \cdots$ of elements $a_d \in F_d$ is called a formal power series or simply power series in $n$ variables $X_1, \ldots, X_n$ with coefficients in $R$, and $a_d$ is called the homogeneous part of degree $d$ of the power series. The homogeneous part $a_0$ of degree zero is called the constant term. Addition and multiplication are defined by $(\sum a_d) + (\sum b_d) = \sum (a_d + b_d)$, $(\sum a_d)(\sum b_d) = \sum_d \bigl(\sum_{i+j=d} a_i b_j\bigr)$. By these operations, the set of power series forms a commutative ring, which is called the ring of (formal) power series (or (formal) power series ring) in $X_1, \ldots, X_n$ over $R$ and is denoted by $R[[X_1, \ldots, X_n]]$ or $R\{X_1, \ldots, X_n\}$. If there is a natural number $N$ such that $a_d = 0$ for every $d > N$, then the power series $\sum a_d$ is identified with the polynomial $a_0 + a_1 + \cdots + a_N$. Thus $R[X_1, \ldots, X_n] \subset R[[X_1, \ldots, X_n]]$. Set $\mathfrak{X} = \sum_i X_i R[[X_1, \ldots, X_n]]$. Then $R[[X_1, \ldots, X_n]]$ is †complete under the $\mathfrak{X}$-adic topology (→ 284 Noetherian Rings B).

Assume that $R'$ is a commutative ring containing $R$ and having a unity element in common with $R$, $\mathfrak{a}'$ is an ideal of $R'$ such that $R'$ is complete under the $\mathfrak{a}'$-adic topology, and $x_1, \ldots, x_n$ are elements of $\mathfrak{a}'$. Then an infinite sum $\sum c_{i_1 \cdots i_n} x_1^{i_1} \cdots x_n^{i_n}$ (each $i_j$ ranges over nonnegative rational integers and $c_{i_1 \cdots i_n} \in R$) has a well-defined meaning in $R'$ (namely, if $S_d$ is the finite sum of those terms with $\sum i_j \leq d$, then the infinite sum is defined to be $\lim_{d \to \infty} S_d$). This element $\sum c_{i_1 \cdots i_n} x_1^{i_1} \cdots x_n^{i_n}$ is also called a power series in $x_1, \ldots, x_n$ with coefficients in $R$. The set of such power series in $x_1, \ldots, x_n$ is a subring of $R'$, called the power series ring in $x_1, \ldots, x_n$ over $R$ and denoted by $R[[x_1, \ldots, x_n]]$ or $R\{x_1, \ldots, x_n\}$. Defining $\varphi$ by $\varphi(\sum c_{i_1 \cdots i_n} X_1^{i_1} \cdots X_n^{i_n}) = \sum c_{i_1 \cdots i_n} x_1^{i_1} \cdots x_n^{i_n}$, we obtain a ring homomorphism $\varphi: R[[X_1, \ldots, X_n]] \to R[[x_1, \ldots, x_n]]$. If $\varphi$ is an isomorphism, then we say that $x_1, \ldots, x_n$ are analytically independent over $R$.

If $\mathfrak{M}$ is a †maximal ideal of the formal power series ring $R[[X_1, \ldots, X_n]]$, then $\mathfrak{m} = \mathfrak{M} \cap R$ is a maximal ideal of $R$ and $\mathfrak{M}$ is generated by $\mathfrak{m}$ and $X_1, \ldots, X_n$. An element $f$ of the power series ring is †invertible if and only if its constant term $f_0$ is an invertible element of $R$, and in this case $f^{-1} = \sum_{n=0}^{\infty} f_0^{-n-1} (f_0 - f)^n$. If $R$ is one of the following, then $R[[X_1, \ldots, X_n]]$ is also of the same kind: (i) †Noetherian ring, (ii) †local ring, (iii) †semilocal ring, (iv) †integral domain, (v) †regular local ring, (vi) Noetherian †normal ring. But even if $R$ is a †unique factorization domain (u.f.d.), $R[[X_1, \ldots, X_n]]$ need not be a u.f.d. (If $R$ is a field, or more generally, if $R$ is a regular semilocal ring, then $R[[X_1, \ldots, X_n]]$ is a u.f.d.) In particular, a formal power series ring $k[[X]]$ in one variable $X$ over a field $k$ is an integral domain whose field of quotients is called the field of (formal) power series (or (formal) power series field) in one variable $X$ over $k$ and is denoted by $k((X))$; an element of $k((X))$ is expressed uniquely in the form $\sum_{n=r}^{\infty} a_n X^n$ ($a_n \in k$, $r \in \mathbf{Z}$).
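
The inversion formula $f^{-1} = \sum_{n \geq 0} f_0^{-n-1}(f_0 - f)^n$ of this section can be tried out on truncated series. The following is a minimal sketch, assuming one variable, coefficients in $\mathbf{Q}$, and truncation after degree $N - 1$; the names `N`, `mul`, and `inverse` are hypothetical. Since $f_0 - f$ has zero constant term, $(f_0 - f)^n$ has no terms of degree below $n$, so summing over $n = 0, \ldots, N - 1$ already gives the exact coefficients up to degree $N - 1$.

```python
from fractions import Fraction

N = 6  # keep the coefficients of X^0 .. X^(N-1)

def mul(p, q):
    """Product of two truncated power series (coefficient lists of length N)."""
    out = [Fraction(0)] * N
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < N:
                out[i + j] += a * b
    return out

def inverse(f):
    """Truncation of f^{-1} = sum_{n>=0} f0^{-n-1} (f0 - f)^n; f[0] must be invertible."""
    f0 = f[0]
    g = [-c for c in f]
    g[0] = Fraction(0)                                # g = f0 - f, zero constant term
    power = [Fraction(1)] + [Fraction(0)] * (N - 1)   # g^0
    result = [Fraction(0)] * N
    for n in range(N):                                # g^n starts at degree n
        scale = f0 ** (-n - 1)
        result = [r + scale * c for r, c in zip(result, power)]
        power = mul(power, g)
    return result

f = [Fraction(c) for c in (2, 1, 0, 0, 0, 0)]   # f = 2 + X
inv = inverse(f)
print(inv)          # coefficients 1/2, -1/4, 1/8, ...
print(mul(f, inv))  # 1 followed by zeros
```

Exact rational arithmetic is used so that the product `mul(f, inv)` equals 1 exactly rather than up to rounding.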

B. Rings of Convergent Power Series

Let $K$ be a field with multiplicative †valuation $v$ (for instance, $K = \mathbf{C}$, the complex number field, and $v(z) = |z|$ for $z \in \mathbf{C}$). A formal power series $f(X_1, \ldots, X_n) = \sum c_{i_1 \cdots i_n} X_1^{i_1} \cdots X_n^{i_n}$ is said to be a convergent power series if there are positive numbers $r_1, \ldots, r_n$, $M$ such that $v(c_{i_1 \cdots i_n}) r_1^{i_1} \cdots r_n^{i_n} \leq M$ for every $(i_1, \ldots, i_n)$. In this case, if $a_i \in K$ and $v(a_i) < r_i$, then $\sum c_{i_1 \cdots i_n} a_1^{i_1} \cdots a_n^{i_n}$ has its sum in the †completion of $K$. The set of convergent power series is a subring of $K[[X_1, \ldots, X_n]]$. It is called the ring of convergent power series (or convergent power series ring) in $n$ variables over $K$ and is denoted by $K\langle\langle X_1, \ldots, X_n \rangle\rangle$ or $K\{X_1, \ldots, X_n\}$. It is a regular local ring of †Krull dimension $n$. Hence it is a u.f.d. and its completion is $K[[X_1, \ldots, X_n]]$. If $v$ is a †trivial valuation, then $K\langle\langle X_1, \ldots, X_n \rangle\rangle = K[[X_1, \ldots, X_n]]$.

Weierstrass's Preparation Theorem. For an element $f = \sum c_{i_1 \cdots i_n} X_1^{i_1} \cdots X_n^{i_n} \in K\langle\langle X_1, \ldots, X_n \rangle\rangle$, assume that $c_{0, \ldots, 0, i} = 0$ for $i = 0, 1, \ldots, r - 1$ and $c_{0, \ldots, 0, r} \neq 0$. Then for an arbitrary element $g$ of $K\langle\langle X_1, \ldots, X_n \rangle\rangle$, there exists a unique $q \in K\langle\langle X_1, \ldots, X_n \rangle\rangle$ such that $g - qf \in \sum_{i=0}^{r-1} X_n^i K\langle\langle X_1, \ldots, X_{n-1} \rangle\rangle$. In particular (considering the case where $g = X_n^r$), there is an invertible element $u$ of $K\langle\langle X_1, \ldots, X_n \rangle\rangle$ such that $fu = f_0 + f_1 X_n + \cdots + f_{r-1} X_n^{r-1} + X_n^r$ ($f_i \in K\langle\langle X_1, \ldots, X_{n-1} \rangle\rangle$).

By this theorem, we see easily that if $\mathfrak{a}$ is an ideal of †height $h$ of $K\langle\langle X_1, \ldots, X_n \rangle\rangle$, then $K\langle\langle X_1, \ldots, X_n \rangle\rangle / \mathfrak{a}$ is isomorphic to a ring that is a finite module over $K\langle\langle X_1, \ldots, X_{n-h} \rangle\rangle$. If $\mathfrak{p}$ is a †prime ideal of $K\langle\langle X_1, \ldots, X_n \rangle\rangle$, then $\mathfrak{p} K[[X_1, \ldots, X_n]]$ is a prime ideal.

The Jacobian Criterion. Let $K$ be a field, and let $R$ be the ring of polynomials $K[X_1, \ldots, X_n]$, the ring of formal power series $K[[X_1, \ldots, X_n]]$, or the ring of convergent power series $K\langle\langle X_1, \ldots, X_n \rangle\rangle$ in $n$ variables $X_1, \ldots, X_n$ over $K$. †Partial derivatives $\partial/\partial X_i$ are well defined in $R$. For $f_1, \ldots, f_t \in R$, a †Jacobian matrix $J(f_1, \ldots, f_t)$ is defined to be the $t \times n$ matrix whose $(i, j)$-entry is $\partial f_i / \partial X_j$. Let $\mathfrak{p}$ be a †prime divisor of the ideal $\sum_i f_i R$, and let $\mathfrak{q}$ be a prime ideal containing $\mathfrak{p}$. If the †rank of ($J(f_1, \ldots, f_t)$ modulo $\mathfrak{q}$) is equal to the height of $\mathfrak{p}$, then the ring $R_{\mathfrak{q}} / \sum_i f_i R_{\mathfrak{q}}$ is a regular local ring. The converse is also true if $K$ is a †perfect field (if $K$ is not a perfect field, then, modifying $J(f_1, \ldots, f_t)$, we can have a similar criterion [1]).

C. Hensel Rings

A Hensel ring (or Henselian ring) is a commutative ring with unity element satisfying the following two conditions: (i) $R$ has only one maximal ideal $\mathfrak{m}$ (i.e., $R$ is a †quasilocal ring); and (ii) if $f$, $g_0$, $h_0$ are monic polynomials in one variable $x$ (here, a polynomial in $x$ is called monic if the coefficient of the term of the highest degree is 1) such that $f - g_0 h_0 \in \mathfrak{m}R[x]$, $g_0 R[x] + h_0 R[x] + \mathfrak{m}R[x] = R[x]$, then there are monic polynomials $g, h \in R[x]$ such that $f = gh$ and $g \equiv g_0$, $h \equiv h_0$ modulo $\mathfrak{m}$.

Important examples of Hensel rings are complete local rings, rings of convergent power series, and †complete valuation rings. When $R$ is a Hensel ring, a commutative ring $R'$ with unity element such that $R'$ is a finite $R$-module is the direct sum of a finite number of Hensel rings. For any quasilocal ring $Q$, there exists a Hensel ring $\tilde{Q}$, called the Henselization of $Q$ (for details → [1]), for which the following statements hold: (i) $\tilde{Q}$ is a †faithfully flat $Q$-module; (ii) if $\mathfrak{m}$ is the maximal ideal of $Q$, then the maximal ideal of $\tilde{Q}$ is $\mathfrak{m}\tilde{Q}$, and $\tilde{Q}/\mathfrak{m}\tilde{Q} = Q/\mathfrak{m}$; (iii) if $R$ is a Hensel ring that contains $Q$ and has a maximal ideal $\mathfrak{n}$, and $\mathfrak{n} \cap Q = \mathfrak{m}$, then there is one and only one $Q$-homomorphism $\varphi$ of $\tilde{Q}$ into $R$; (iv) if $Q$ is a †normal ring, then the $Q$-homomorphism $\varphi$ is an injection; (v) if $Q$ is a local ring, then $\tilde{Q}$ is also a local ring, and $Q$ is dense in $\tilde{Q}$.

References

[1] M. Nagata, Local rings, Interscience, 1962 (Krieger, 1975).
[2] O. Zariski and P. Samuel, Commutative algebra II, Van Nostrand, 1960; new edition, Springer, 1975.

371 (XVIII.10)
Robust and Nonparametric Methods

A. General Remarks

Robust and nonparametric or distribution-free methods are statistical procedures specifically devised to deal with broad families of probability distributions.

In the theory of statistical inference it is usual to assume that the probability distribution of the population from which the observed values are chosen at random is specified exactly except for a small number of unknown parameters (→ 401 Statistical Inference). In practical applications, however, it often happens that the assumptions made for the model,

especially those about the shape of the distribution, may not hold for the actual data. In such cases robust and/or nonparametric procedures that do not require exact knowledge of the shape of the distribution and yet prove to be relatively efficient or valid are required. The term nonparametric or distribution-free is used for problems of testing hypotheses, and the term robust is mainly used for problems of point estimation (→ 396 Statistic; 399 Statistical Estimation; 400 Statistical Hypothesis Testing).

Although the idea of the sign test appears in the work of J. Arbuthnot (1710), the theoretical foundation for nonparametric tests was first given in the proposals for the permutation test by R. A. Fisher (1935), the rank test by F. Wilcoxon (1945), and the test based on U-statistics by H. B. Mann and D. R. Whitney (1947). In the years that followed two important ideas appeared: the concept of asymptotic relative efficiency by E. J. G. Pitman (1948) and the development of the theory of U-statistics by W. Hoeffding (1948). H. Chernoff and I. R. Savage (1958) showed, in studying the asymptotic distribution of a class of rank statistics, that the asymptotic efficiencies of nonparametric tests are incredibly high. These findings accelerated the studies of nonparametric tests; recent progress is summarized in the books by J. Hájek and Z. Šidák [1], M. L. Puri and P. K. Sen [2], R. H. Randles and D. A. Wolfe [3], and P. J. Huber [6].

On the other hand, G. E. P. Box (1953) first coined the term robustness in his sensitivity studies, in which he investigated how the standard statistical procedures obtained under certain assumptions are influenced when such assumptions are violated. Two papers by J. W. Tukey (1960, 1962) provided the initial foundation for robust estimation. J. L. Hodges and E. L. Lehmann (1963) noticed that estimators of location could be derived from nonparametric tests and that these estimators sometimes have much higher efficiency than the sample mean. A similar study for scale was made by S. Kakeshita and T. Yanagawa (1967). Huber (1964) proposed an estimator of location by generalizing the method of least squares. Along with the idea of the influence curve introduced by F. R. Hampel (1974), the estimator proposed by Huber has become a core of subsequent studies of robust estimation. K. Takeuchi (1971) proposed an adaptive estimate that is asymptotically fully efficient for a wide class of underlying distribution functions. The developments of the theory of robust estimation are reviewed by Huber [4-6] and R. V. Hogg [7]. Various proposed estimators are compared in the book by D. F. Andrews et al. [8].

B. The One-Sample Problem

Let $F(x)$ be a †distribution function of a †random variable $X$, $(X_1, \ldots, X_n)$ be an independent †random sample of size $n$ from $F(x)$, and $(x_1, \ldots, x_n)$ be an observed sample value. The $100p$ percentile of $F$ is denoted by $\xi_p$, i.e., $F(\xi_p) = p$. For testing the †hypothesis $H$: $\xi_p \leq \xi^0$ against the †alternative hypothesis $A$: $\xi_p > \xi^0$, the following procedure is proposed. Let $i(x_1, \ldots, x_n)$ be the number of $x_i$ that are greater than $\xi^0$. A test procedure by the following †test function $\varphi$ is †uniformly most powerful in some neighborhood of $\xi_p = \xi^0$ for the double exponential distribution, where $\varphi(x_1, \ldots, x_n)$ is defined by the equations

$$\varphi(x_1, \ldots, x_n) = \begin{cases} 1 & \text{when } i(x_1, \ldots, x_n) > c, \\ a & \text{when } i(x_1, \ldots, x_n) = c, \\ 0 & \text{when } i(x_1, \ldots, x_n) < c \end{cases}$$

($0 \leq a \leq 1$, $0 \leq c \leq n$). This procedure is called the sign test.

Suppose that $F(x)$ is symmetric about $x = \xi_{1/2}$. Let $R_i^+$ be the rank of $|X_i - \xi^0|$ among $|X_1 - \xi^0|, \ldots, |X_n - \xi^0|$, and let $\Psi(t) = 1, 0$ according as $t > 0$, $< 0$. Set

$$S_n(X_1, \ldots, X_n) = \sum_{i=1}^{n} a_n(R_i^+) \Psi(X_i - \xi^0)$$

for some weights $a_n(1), \ldots, a_n(n)$. The following procedure $\varphi$, called the signed rank test, is also used for testing the hypothesis $H$: $\xi_{1/2} \leq \xi^0$ against the alternative hypothesis $A$: $\xi_{1/2} > \xi^0$:

$$\varphi(x_1, \ldots, x_n) = \begin{cases} 1 & \text{when } S_n(x_1, \ldots, x_n) > c, \\ a & \text{when } S_n(x_1, \ldots, x_n) = c, \\ 0 & \text{when } S_n(x_1, \ldots, x_n) < c. \end{cases}$$

The procedure with $a_n(i) = i$ is frequently used and is called the Wilcoxon signed rank test, which is the uniformly most powerful rank test in a neighborhood of $\xi_{1/2} = \xi^0$ for $F(x) = 1/(1 + e^{-x})$, the logistic distribution.

C. The Two-Sample Problem

Let $F$ and $G$ be continuous distribution functions of random variables $X$ and $Y$, respectively, $(X_1, \ldots, X_m)$ and $(Y_1, \ldots, Y_n)$ be the corresponding random samples, and $(x_1, \ldots, x_m)$ and $(y_1, \ldots, y_n)$ be the respective sample values. Consider the problem of testing the hypothesis $H$: $F(x) = G(x)$ against the alternative hypothesis $A_1$: $F(x) \not\equiv G(x)$ or $A_2$: $F(x) \geq G(x)$ for all $x$ and $F(x) \not\equiv G(x)$. When the alternative hypothesis $A_2$ is true, we say that the random variable $Y$ is stochastically larger than $X$ and write $F > G$. A frequently used example of such an alternative hypothesis $A_2$ is $G(x) = F(x - \theta)$, $\theta$

$> 0$. Let $\mathcal{H}$ be the family of all strictly increasing continuous functions. Then the hypothesis $H$ and the alternative hypothesis $A_2$ are invariant under the group of transformations of the form $x_i' = h(x_i)$, $y_j' = h(y_j)$ ($i = 1, \ldots, m$; $j = 1, \ldots, n$; $h \in \mathcal{H}$). The †maximal invariant statistic in this case is the rank $(R_1, \ldots, R_m)$ of $(X_1, \ldots, X_m)$ or the rank $(S_1, \ldots, S_n)$ of $(Y_1, \ldots, Y_n)$ when the combined sample $(X_1, \ldots, X_m; Y_1, \ldots, Y_n)$ is ordered in an ascending order.

If a test function $\varphi(x_1, \ldots, x_m; y_1, \ldots, y_n)$ satisfies $P_\varphi(F, F) \leq \alpha$ and $P_\varphi(F, G) \geq \alpha$ for any $F > G$, then $\varphi$ is considered a desirable test, where

$$P_\varphi(F, G) = \int \varphi(x_1, \ldots, x_m; y_1, \ldots, y_n) \prod_i dF(x_i) \prod_j dG(y_j).$$

Lehmann's Theorem. If $\varphi$ satisfies the conditions stating that $y_j^* \geq y_j$ ($j = 1, \ldots, n$) yield $\varphi(x_1, \ldots, x_m; y_1^*, \ldots, y_n^*) \geq \varphi(x_1, \ldots, x_m; y_1, \ldots, y_n)$, then $P_\varphi(F, G) \geq P_\varphi(F, F)$. If in addition $\varphi$ is a †similar test, then $\varphi$ is unbiased (→ 400 Statistical Hypothesis Testing C).

The Wilcoxon test (or the Mann-Whitney U-test) is described by a test function $\varphi = 1$ when $U > c$ and $\varphi = 0$ when $U < c$, where $U$ is a †U-statistic defined by

$$U = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \psi(x_i, y_j)$$

with $\psi(x, y) = 1$ when $x < y$ and $\psi(x, y) = 0$ when $x > y$. This test is similar and unbiased.

Testing the hypothesis $H$: $F = G$ against the alternative hypothesis $A$: $F \neq G$, $G(x) = F(x/\sigma)$, $\sigma > 1$, is another two-sample problem, for which the following test was proposed by T. Tamura. The test function is given by $\varphi = 1$ for $U > c$ and $\varphi = 0$ for $U < c$, where $U = \binom{m}{2}^{-1} \binom{n}{2}^{-1} \sum_{i < i'} \sum_{j < j'} \psi(x_i, x_{i'}; y_j, y_{j'})$ with $\psi(u, u'; v, v') = 1$ when $v < u < v'$, $v < u' < v'$ or $v' < u < v$, $v' < u' < v$ and $\psi(u, u'; v, v') = 0$ otherwise.

The following statistic $T_N$ is used frequently in nonparametric problems. Let $x_1, \ldots, x_m$; $y_1, \ldots, y_n$ be arranged in order of magnitude. Set $z_k = 1$ or $0$ when the $k$th value ($k = 1, 2, \ldots, n + m$) in the arrangement is an $x_i$ or $y_j$, respectively. For a given set of $N = n + m$ reals $\{e_k\}$, $T_N$ is defined by $T_N = m^{-1} \sum_{k=1}^{N} e_k z_k$. Set $H_N(x) = \lambda_N F_m(x) + (1 - \lambda_N) G_n(x)$, where $F_m(x)$ and $G_n(x)$ are the †empirical distribution functions based on $(x_1, \ldots, x_m)$ and $(y_1, \ldots, y_n)$, respectively, and $0 < \lambda_0 \leq \lambda_N = m/N \leq 1 - \lambda_0 < 1$. Then $T_N$ is represented by the integral

$$\int J_N(H_N(x)) \, dF_m(x)$$

with $e_k = J_N(k/N)$. Chernoff and Savage [9] proved that under some regularity conditions the asymptotic distribution of $T_N$ is normal and that the asymptotic mean $\mu$ and the variance $\sigma^2$ of $T_N$ are given by

$$\mu = \int J(H(x)) \, dF(x),$$

$$N\sigma^2 = 2(1 - \lambda) \Bigl\{ \iint_{x < y} G(x)(1 - G(y)) J'(H(x)) J'(H(y)) \, dF(x) \, dF(y) + \frac{1 - \lambda}{\lambda} \iint_{x < y} F(x)(1 - F(y)) J'(H(x)) J'(H(y)) \, dG(x) \, dG(y) \Bigr\},$$

where $J(H) = \lim_{N \to \infty} J_N(H)$, $H(x) = \lambda F(x) + (1 - \lambda) G(x)$, and $\lambda = \lim \lambda_N$. When $e_k = k/N$, the statistic $T_N$ is equivalent to the †U-statistic in the Wilcoxon test. When $e_k$ is the mean $E(Z_k)$ of the $k$th order statistic $Z_k$ in an independent sample of size $N$ from $N(0, 1)$, then the test by $T_N$ is called the Fisher-Yates-Terry normal score test. When $J$ is the inverse function of the distribution function of $N(0, 1)$, then the test is called the van der Waerden test.

D. The k-Sample Problem

Let $(X_{ij}, j = 1, \ldots, n_i)$ be a random sample of size $n_i$ from the population with a distribution function $F_i(x)$ for each $i = 1, \ldots, k$. The $k$-sample problem is concerned with testing the hypothesis $H$: $F_1(x) = \cdots = F_k(x)$ against an alternative hypothesis $A_1$: not all the $F_i(x)$ are equal, $A_2$: $F_i(x) = F(x - \theta_i)$ with $\theta_i \not\equiv \theta$, or $A_3$: $F_i(x) = F(x/\sigma_i)$ with $\sigma_i \not\equiv \sigma$. Several tests have been proposed for this problem, using quadratic forms of the vector-valued U-statistic $U = (U^1, \ldots, U^k)$ whose coordinates $U^i$ are defined by means of a function

$$\psi^i(x_{11}, \ldots, x_{1m_1}; \ldots; x_{k1}, \ldots, x_{km_k}), \quad i = 1, \ldots, k.$$

When $N = \sum n_i \to \infty$ with $n_i = \rho_i N$, $0 < \rho_i < 1$, and $\sum \rho_i = 1$, then

$$V = \bigl(\sqrt{N}(U^1 - E(U^1)), \ldots, \sqrt{N}(U^k - E(U^k))\bigr)$$

has asymptotically a †multivariate normal distribution $N(0, \Sigma)$. Let $B$ be the projection matrix corresponding to the eigenspace for the zero eigenvalues of the matrix $\Sigma$, and let $A$ be a matrix such that $AB = 0$, $\Sigma A = I - B$. The statistic $VAV'$ has asymptotically a †noncentral chi-square distribution with degrees of freedom = rank $\Sigma$. Several kinds of test represented by a critical region of the form $VAV' > c$ are proposed, among which the Kruskal-Wallis test is a particular one having

$$\psi^i(x_1; \ldots; x_k) = \sum_{j=1}^{k} \delta(x_j, x_i), \quad i = 1, \ldots, k,$$

as basic functions, where 6(x, y) = 1 when x < y F,(x). Set


and 6(x, y) = 0 otherwise.
dn=swIF&)--&)I,
E. Asymptotic Relative Effkiency of Tests 4 = sup(F,(x) - E&4),

If there is more than one test procedure for a


given testing problem, then one may wish to
compare these procedures. Let {cp,} and { &} F,(x) - F,(x)
be two sequences of level tl tests, where cp,, and s,= sup
(I < F&G F,(x) .
$” are test functions based on a sample of
size n. The tpower functions of (P” and $” are Then
denoted by p(0 1cp,) and p(0) $J, respectively.
Let 0 be a real parameter and {0,} be such that
&+0, as i+co. Consider a hypothesis 0=0,
=,=g, ( -l)ke-Zk2rZ,
and a sequence (0,) of alternative hypotheses.
If, for any increasing sequences {n,} and {nl}
Ii-Ii P,(D,<z/&)=K(z)= 1 -e-“‘,
of positive integers satisfying tl < lim p(e,) cp,,) =
limB(eiI~~,,:)<l,limn~lni(=e({cp,},{~~,,)),say)

exists and is independent of u and lim fi(@,I cp,,),


then e is called Pitman’s asymptotic relative
efficiency of { cp,} against {IL,}. Suppose further
that the tests {cp,} and {&} are based on sta-
tistics T. = t.(X) and T,,* = t:(X), respectively,
in the following manner:

1 when t,(x)>c,
((Zk+l)*n*/E)((l-o)/o)r*
x(2k+l)-‘e-
q,(x)= a when tn(x)=c,

0 when t,(x)<c, e-‘*‘2dt.


1
1 when t:(x) >c*,
The statistics d,, D., s,, S, are frequently used
Icl,(x)= b when t,*(x) =c*, to test the hypothesis F(x) = F,(x). (This prob-
lem is called testing goodness of fit.)
0 when t,*(x)<c*,
1 In a two-sample problem, let F,(x) and
whereX=(X, ,..., X,)andx=(x, ,..., x,). Put G,(x) be two empirical distribution functions
0, = 0 and 0, = k/& (k = constant) for simplic- based on samples of sizes m and n from F(x)
ity. If T, and T”* are asymptotically normal, and G(x), respectively. Set
then under some conditions e is given by the
formula
d,,,=supIF,(x)-G,(x)l,
On,. = sup(l;,(x) - G,(x)).
If the hypothesis F = G is true, we have
As an example, consider a two-sample prob- lim P,(d,,, <z/,/$ = L(z),
lem on a tlocation parameter. If the popula-
tion distribution is normal and the Wilcoxon limP,(D,,,<z/JN)=K(z),
test is used to test the hypothesis of equality of
provided that m and n tend to co so that N=
means, then its asymptotic relative efficiency
mn/(m + n)+ co and m/n is constant. Taking
against Student’s test is 3/n. For the same
account of these facts, d,,, and D,,,” are used
problem, the asymptotic relative efficiencies of
to test the hypothesis F = G. The tests using
the Fisher-Yates-Terry normal score test and
the statistics d,, D., d,,,, D,,,, are called
the van der Waerden test against Student’s test
Kolmogorov-Smirnov tests.
are both unity. For the hypothesis of equality
of means in the k-sample problem, the asymp-
totic relative efficiency of the Kruskal-Wallis G. Interval Estimation
test against the F-test is 3/n, provided that
the sample is distributed normally. Let(X,,..., X,) be an independent random
sample from the population with a distri-
F. Kolmogorov-Smirnov Tests bution function F(x - 0), where 0 is an un-
known location parameter, and (x,, . . . , x,)
Let F,(x) be the empirical distribution function be its observed value. Suppose that F(x) is
based on a random sample of size n from continuous and symmetric about the origin.
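Before the interval construction, the Kolmogorov-Smirnov quantities of Section F can be made concrete. The following is a minimal Python sketch, not part of the original article: it computes $d_n$ and $D_n$ for a sample against a hypothesized continuous distribution function and evaluates the limiting functions $L(z)$ and $K(z)$. The names `ks_one_sample`, `limit_L`, and `limit_K`, and the truncation of the series at a fixed number of terms, are illustrative assumptions.

```python
import math
from typing import Callable, Sequence, Tuple


def ks_one_sample(xs: Sequence[float], F0: Callable[[float], float]) -> Tuple[float, float]:
    """Return (d_n, D_n) for the sample xs against the hypothesized cdf F0.

    d_n = sup |F_n - F0| and D_n = sup (F_n - F0), where F_n is the
    empirical distribution function. F0 is assumed to be continuous.
    """
    ordered = sorted(xs)
    n = len(ordered)
    d_plus = 0.0   # sup (F_n - F0), attained just at an order statistic
    d_minus = 0.0  # sup (F0 - F_n), attained just below an order statistic
    for i, x in enumerate(ordered, start=1):
        f = F0(x)
        d_plus = max(d_plus, i / n - f)
        d_minus = max(d_minus, f - (i - 1) / n)
    return max(d_plus, d_minus), d_plus


def limit_L(z: float, terms: int = 100) -> float:
    """L(z) = sum over all integers k of (-1)^k exp(-2 k^2 z^2), truncated."""
    if z <= 0:
        return 0.0
    tail = sum((-1) ** k * math.exp(-2.0 * k * k * z * z) for k in range(1, terms + 1))
    return 1.0 + 2.0 * tail


def limit_K(z: float) -> float:
    """K(z) = 1 - exp(-2 z^2) for z > 0."""
    return 1.0 - math.exp(-2.0 * z * z) if z > 0 else 0.0


if __name__ == "__main__":
    sample = [0.1, 0.4, 0.7]
    d_n, D_n = ks_one_sample(sample, lambda x: x)  # F0 = uniform cdf on [0, 1]
    n = len(sample)
    # Approximate p-value of the two-sided test: 1 - L(sqrt(n) * d_n).
    print(d_n, D_n, 1.0 - limit_L(math.sqrt(n) * d_n))
```

For a sample of size $n$, the hypothesis $F = F_0$ would be rejected at an approximate level $\alpha$ when $L(\sqrt{n}\, d_n) > 1 - \alpha$; with very small $n$, as in the toy sample above, the limiting distribution is only a rough guide.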
Using the statistic $S_n = S(x_1, \ldots, x_n)$ for the one-sample nonparametric test for testing the hypothesis $H$: $\theta = 0$, a confidence interval of $\theta$ is constructed as follows. For an appropriately given $\gamma$ ($0 < \gamma < 1$), select constants $d_1$ and $d_2$ in the range of $S(x_1, \ldots, x_n)$ that satisfy

$$P_0\{d_1 < S(X_1, \ldots, X_n) < d_2\} = 1 - \gamma,$$

where $P_0$ means the probability under the hypothesis $H$: $\theta = 0$. If there exist statistics $L_S(x_1, \ldots, x_n)$ and $U_S(x_1, \ldots, x_n)$ such that $L_S(x_1, \ldots, x_n) \le \theta < U_S(x_1, \ldots, x_n)$ if and only if $d_1 < S(x_1 - \theta, \ldots, x_n - \theta) < d_2$ for all $\theta$, then the confidence interval of $\theta$ with $100(1 - \gamma)\%$ confidence coefficient is given by $(L_S(x_1, \ldots, x_n), U_S(x_1, \ldots, x_n))$. This interval is distribution-free, i.e., it holds that

$$P\{L_S(X_1, \ldots, X_n) \le \theta < U_S(X_1, \ldots, X_n)\} = 1 - \gamma$$

for all $F$.

When $S_n$ is the statistic for the Wilcoxon signed rank test, $L_S$ and $U_S$ are given by $L_S = W_{(M+1-d_2)}$ and $U_S = W_{(M-d_1)}$, where $M = n(n+1)/2$ and $W_{(1)} \le \cdots \le W_{(M)}$ are ordered values for the $M$ averages $(x_i + x_j)/2$ ($i \le j = 1, 2, \ldots, n$).

H. Point Estimation

Let $(X_1, \ldots, X_n)$ be an independent random sample from the population with a distribution function $F(x - \theta)$, where $\theta$ is an unknown location parameter, and let $(x_1, \ldots, x_n)$ be its observed value.

There are four methods of constructing robust estimators of $\theta$. Let $X_{(1)} \le \cdots \le X_{(n)}$ be ordered values of $X_1, \ldots, X_n$. The first method is to use $T_n = a_1 X_{(1)} + \cdots + a_n X_{(n)}$ for some given constants $a_1, \ldots, a_n$ such that $\sum_{i=1}^n a_i = 1$. $T_n$ is called the L-estimator. An example is

$$T_n(\alpha) = \left(p X_{([n\alpha]+1)} + X_{([n\alpha]+2)} + \cdots + p X_{(n-[n\alpha])}\right) / n(1 - 2\alpha),$$

where $p = 1 + [n\alpha] - n\alpha$. This estimator is called the $\alpha$-trimmed mean. Let $J$ be a real-valued function such that $\int_0^1 J(t)\,dt = 1$, and set $a_k = \int_{(k-1)/n}^{k/n} J(t)\,dt$; then as $n \to \infty$, $T_n$ converges to $T(F) = \int_0^1 J(t) F^{-1}(t)\,dt$ in probability. Suppose that $F$ is a distribution function having an absolutely continuous density function $f$. Denote the derivative of $f$ by $f'$, and let $I(F)$ be the Fisher information on $\theta$. Set $\psi(t) = -f'(t)/f(t)$ and $J(t) = \psi'(F^{-1}(t))/I(F)$. Then Chernoff, J. L. Gastwirth, and M. V. Johns [10] proved that under some regularity conditions $T_n$ is an asymptotically efficient estimator of $\theta$ for $F$.

Let $\rho$ be a real-valued (usually convex) function of a real parameter with derivative $\psi = \rho'$. The second method is to estimate $\theta$ by $T_n$ by minimizing $\sum_{i=1}^n \rho(x_i - T_n)$ or by satisfying

$$\sum_{i=1}^n \psi(x_i - T_n) = 0.$$

$T_n$ is called the M-estimator. When $\rho(t) = t^2$, it agrees with the least squares estimator. Let $\Phi(x)$ be the distribution function of the standard normal distribution, $H(x)$ be an arbitrary continuous distribution function which is symmetric about the origin, $\mathcal{F}$ be a class of distribution functions of the form $F(x) = (1 - \varepsilon)\Phi(x) + \varepsilon H(x)$ for a given $\varepsilon$ ($0 < \varepsilon < 1$), and $V(\rho, F)$ be the asymptotic variance of $T_n$. Huber [11] proved that $\rho$ minimizing $\sup_{F \in \mathcal{F}} V(\rho, F)$ is given by $\rho_0(t) = t^2/2$, $K|t| - K^2/2$ defined for $|t| < K$, $|t| \ge K$, respectively, for some constant $K$. Under quite general conditions, the M-estimator converges as $n \to \infty$ to $T(F)$ in probability, which is defined by $\int \psi(x - T(F))\,dF(x) = 0$. If $\psi(t) = -f'(t)/f(t)$ is chosen for $\psi(x)$, then $T_n$ is the maximum likelihood estimator of $\theta$ for $F$ and is asymptotically efficient under some regularity conditions. Generally, the M-estimator defined above is not scale invariant. A scale invariant version of the M-estimator is obtained by replacing the defining equation by

$$\sum_{i=1}^n \psi\left(\frac{x_i - T_n}{S_n}\right) = 0, \qquad (1)$$

where $S_n$ is any robust estimate of scale, e.g., the median of $\{|x_i - M|/0.6745\}_{i=1,2,\ldots,n}$, where $M$ is the sample median, or by solving the simultaneous equations (1) and

$$\sum_{i=1}^n \chi\left(\frac{x_i - T_n}{S_n}\right) = 0$$

with respect to $T_n$ and $S_n$. In the context of the maximum likelihood estimation, $\chi$ is chosen to be $\chi(t) = t\psi(t) - 1$ (→ 399 Statistical Estimation P).

The third method employs nonparametric tests for testing the hypothesis $H$: $\theta = 0$ against the alternative hypothesis $A$: $\theta > 0$. Let $J$ be a real-valued and nondecreasing function such that $\int_0^1 J(t)\,dt = 0$ and $R_i^+(\theta)$ be the rank of $|X_i - \theta|$ among $|X_1 - \theta|, \ldots, |X_n - \theta|$. Set $\Psi(t) = 1$, $0$ according as $t > 0$, $t < 0$, and $S(X_1 - \theta, \ldots, X_n - \theta) = \sum_{i=1}^n J((R_i^+(\theta) + n)/(2n + 1))\,\Psi(X_i - \theta)$. Let

$$\theta^* = \sup\{\theta;\ S(X_1 - \theta, \ldots, X_n - \theta) > \mu\},$$

$$\theta^{**} = \inf\{\theta;\ S(X_1 - \theta, \ldots, X_n - \theta) < \mu\},$$

where $\mu$ is the expected value of $S(X_1, \ldots, X_n)$ under the hypothesis $H$: $\theta = 0$. Then an estimator of $\theta$ is defined by $T_n = (\theta^* + \theta^{**})/2$. Hodges and Lehmann [12] first proposed this technique, and this estimator is called the R-estimator. When $F(x)$ is symmetric about the origin and $J(t) = t - 1/2$, $S$ tends to be the statistic for the Wilcoxon signed rank test, and the
R-estimator reduces to the median of $n(n+1)/2$ averages $(X_i + X_j)/2$ ($1 \le i \le j \le n$). As $n \to \infty$, the R-estimator converges in probability to $T(F)$, defined by

$$\int J\left(\frac{F(x) + 1 - F(2T(F) - x)}{2}\right) dF(x) = 0.$$

For symmetric $F$, the R-estimator defined by the statistic $S$ with $J(t) = -f'(F^{-1}(t))/f(F^{-1}(t))$ is asymptotically efficient under some regularity conditions.

Although the above three methods provide robust estimators, which are seldom affected by outlying observations or contamination by gross errors, their behavior still depends on $F$. The last method of constructing robust estimators consists of estimating $\theta$ adaptively by utilizing information on the shape of $F$. Among these, a striking one is the asymptotically fully efficient estimator for a wide class of $F$ proposed by Takeuchi [13]. The estimator is constructed by using subsamples of size $K$ ($K < n$) drawn from the original sample, estimating the elements of the covariance matrix of the order statistics by U-statistics, and selecting the best weights of the L-estimator. L. A. Jaeckel [14] made an $\alpha$-trimmed mean adaptive estimator by selecting an $\alpha$ that minimizes the estimated asymptotic variance.

I. The Influence Curve

Let $T(F)$ be a functional of a distribution function $F$, and let an estimator $T_n$ of $\theta$ calculated from an empirical distribution function $F_n$ converge to $T(F)$ in probability as $n \to \infty$. A real-valued function $IC(x; F, T)$ defined by

$$IC(x; F, T) = \lim_{\varepsilon \to 0} \frac{T((1 - \varepsilon)F + \varepsilon\delta_x) - T(F)}{\varepsilon} \quad \text{for all } x$$

is called the influence curve, where $\delta_x$ is the distribution function of a point mass 1 at $x$. The curve was first introduced by Hampel [15] to study the stability aspect of estimators against a small change of $F$. As an example, when $F$ is symmetric about the origin, the influence curve for the $\alpha$-trimmed mean $IC(x; F, T)$ is given by

$$IC(x; F, T) = \begin{cases} F^{-1}(\alpha)/(1 - 2\alpha) & \text{when } x < F^{-1}(\alpha), \\ x/(1 - 2\alpha) & \text{when } F^{-1}(\alpha) \le x \le F^{-1}(1 - \alpha), \\ F^{-1}(1 - \alpha)/(1 - 2\alpha) & \text{when } x > F^{-1}(1 - \alpha). \end{cases}$$

By substituting the empirical distribution function $F_n$ for $F$ in $T(F)$, we can represent the robust estimators discussed in Section H, at least approximately, by $T_n = T(F_n)$. Under some conditions, it can be proved that as $n \to \infty$,

$$n^{1/2}\left(T(F_n) - T(F) - \frac{1}{n}\sum_{i=1}^n IC(X_i; F, T)\right) \to 0$$

in probability. Thus it follows that $n^{1/2}(T_n - T(F))$ is asymptotically normally distributed with asymptotic variance $\int (IC(x; F, T))^2\,dF(x)$.

J. The Regression Problem

Consider the linear regression problem (→ 403 Statistical Models D)

$$X_i = \sum_{j=1}^p \theta_j a_{ij} + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$

where the $X_i$ are observable variables, the $\theta_j$ are regression coefficients to be estimated, the $a_{ij}$ are given constants, and $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are identically and independently distributed random errors whose distribution function is given by $F(x)$. The idea of the M-estimator is a direct generalization of the method of least squares; namely, to adopt $(\hat\theta_1, \ldots, \hat\theta_p)$ as an estimator for $(\theta_1, \ldots, \theta_p)$ that minimizes $\sum_{i=1}^n \rho(X_i - \sum_j \theta_j a_{ij})$ for some function $\rho$ such as the one described above.

R-estimators of the regression coefficients are obtained by minimizing $\sum_{i=1}^n a_n(R_i)\Delta_i$, where $\Delta_i = X_i - \sum_j \theta_j a_{ij}$, $R_i$ is the rank of $\Delta_i$ among $\Delta_1, \ldots, \Delta_n$, and $a_n(\cdot)$ is some monotone function satisfying $\sum_{i=1}^n a_n(i) = 0$. It has been proved that minimizing $\sum_{i=1}^n a_n(R_i)\Delta_i$ is asymptotically equivalent to minimizing a related rank criterion, the properties of which were first studied by J. Jurečková [16].

K. Dependence

Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be random samples from a population with a bivariate distribution function, $R_i$ be the rank of $X_i$ among $X_1, \ldots, X_n$ when they are rearranged in an ascending order, and $S_j$ be the rank of $Y_j$ among $Y_1, \ldots, Y_n$ defined similarly as $R_i$. Various quantities are devised to measure the degree of dependence between $X$ and $Y$.

(1) Spearman's Rank Correlation. Set $d_i = R_i - S_i$. Then $r_S = 1 - 6\sum_i d_i^2/(n^3 - n)$ is called Spearman's rank correlation. If there is no dependence between $X$ and $Y$, i.e., if the $X_i$ and $Y_i$ are independent random variables, then $E(r_S) = 0$ and $V(r_S) = (n - 1)^{-1}$.

(2) Kendall's Rank Correlation. Take pairs $(R_i, S_i)$ and $(R_j, S_j)$. If $(R_j - R_i)(S_j - S_i) > 0$, set
$\varphi_{ij} = 1$; otherwise, $\varphi_{ij} = -1$. The statistic $r_K = \sum \varphi_{ij} \big/ \binom{n}{2}$ is called Kendall's rank correlation, where $\sum$ runs over all possible pairs chosen from $(R_1, S_1), \ldots, (R_n, S_n)$. If there is no dependence between $X$ and $Y$, then $E(r_K) = 0$ and $V(r_K) = 2(2n + 5)/(9n(n - 1))$.

(3) Rankit Correlation. $R_i$ and $S_i$, $i = 1, \ldots, n$, are replaced by the corresponding normal scores, i.e., the means of order statistics in an independent sample of size $n$ from $N(0, 1)$; then the usual sample correlation coefficient is calculated from these scores. This correlation coefficient $r_R$ is called rankit correlation; and if there is no dependence between $X$ and $Y$, then $E(z) = 0$, $V(z) = (n - 3)^{-1}$, asymptotically, where $z = \frac{1}{2}\log\frac{1 + r_R}{1 - r_R}$ (Fisher's z-transformation).

These statistics are used to test the hypothesis of independence.

References

[1] J. Hájek and Z. Šidák, Theory of rank tests, Academic Press, 1967.
[2] M. L. Puri and P. K. Sen, Nonparametric methods in multivariate analysis, Wiley, 1971.
[3] R. H. Randles and D. A. Wolfe, Introduction to the theory of nonparametric statistics, Wiley, 1979.
[4] P. J. Huber, Robust statistics: A review, Ann. Math. Statist., 43 (1972), 1041-1067.
[5] P. J. Huber, Robust regression: Asymptotics, conjectures and Monte Carlo, Ann. Statist., 1 (1973), 799-821.
[6] P. J. Huber, Robust statistics, Wiley, 1981.
[7] R. V. Hogg, Adaptive robust procedures: A partial review and some suggestions for future applications and theory, J. Amer. Statist. Assoc., 69 (1974), 909-923.
[8] D. F. Andrews, P. J. Bickel, F. R. Hampel, P. J. Huber, W. H. Rogers, and J. W. Tukey, Robust estimates of location: Survey and advances, Princeton Univ. Press, 1972.
[9] H. Chernoff and I. R. Savage, Asymptotic normality and efficiency of certain nonparametric test statistics, Ann. Math. Statist., 29 (1958), 972-994.
[10] H. Chernoff, J. L. Gastwirth, and M. V. Johns, Asymptotic distribution of linear combinations of functions of order statistics with application to estimation, Ann. Math. Statist., 38 (1967), 52-72.
[11] P. J. Huber, Robust estimation of a location parameter, Ann. Math. Statist., 35 (1964), 73-101.
[12] J. L. Hodges and E. L. Lehmann, Estimates of location based on rank tests, Ann. Math. Statist., 34 (1963), 598-611.
[13] K. Takeuchi, A uniformly asymptotically efficient estimator of a location parameter, J. Amer. Statist. Assoc., 66 (1971), 292-301.
[14] L. A. Jaeckel, Some flexible estimates of location, Ann. Math. Statist., 42 (1971), 1540-1552.
[15] F. R. Hampel, The influence curve and its role in robust estimation, J. Amer. Statist. Assoc., 69 (1974), 383-393.
[16] J. Jurečková, Nonparametric estimate of regression coefficients, Ann. Math. Statist., 42 (1971), 1328-1338.

372 (XXI.3)
Roman and Medieval Mathematics

The Romans were interested in mathematics for everyday use; their arithmetic consisted of computation (by means of the abacus), weights and measures, and money. For their monetary system, they developed a computational method using duodecimal fractions. Julius Caesar (102?-44 B.C.), known for his calendar reform in 46 B.C., also undertook to measure his territory, which aroused a demand for land surveying techniques. Books on practical geometry which provided this knowledge were called gromatics (a "groma" was a land surveying instrument). Toward the end of the Western Roman Empire (476) Greek mathematics was studied; during this period Boethius (c. 480-524) wrote his two books on arithmetic and geometry. The former was a summarized translation of Nicomachus' book, and the latter included propositions from the first three books of Euclid's Elements (without proof) and practical geometry.

Music, astronomy, geometry, and arithmetic, which constituted the "mathemata" of Plato's Academy (closed in 529), were treated as the "quadrivium" (the four major subjects) in the Encyclopedia of Martianus Capella, Cassiodorus, Isidorus Hispalensis, and others. After the establishment of the Roman Church in the 5th century, the quadrivium was to be studied for the glory of God. Books on mathematics from this period laid emphasis on the computation of an ecclesiastical calendar and mystical interpretations of integers, as seen in books by Bede Venerabilis, Alcuin, and Maurus from the 7th through 9th centuries.

Arabian science was imported first through Spain (under Moorish influence beginning in 711, the year of the fall of the Visigoths) and then through the Crusades (1096-1270). Computation with figures, originating in India, replaced the abacus in the 12th century, when

Arabian books on arithmetic and algebra,
along with Greek books on geometry and
astronomy (such as books by Euclid and
Ptolemy), were translated into Latin. Italian
merchants, whose occupation necessitated
computation, rapidly adopted the new system.
Representative books of this period are Liber
abaci (1202) and Practica geometrica (1220) by
Leonardo da Pisa (also known as Fibonacci,
c. 1170- 1250). The former includes the four
arithmetic operations, showing Indian in-
fluence, commercial arithmetic, and algebra.
The new methods were not limited to mer-
chants. The French bishop Nicole Oresme
(c. 1323-1382), who influenced Leonardo da
Vinci (1452-1519), introduced fractional expo-
nents and conceived the graphic representa-
tion of temperature, a precursor of coordi-
nates and functions.
From the 11th century, universities developed
from theological seminaries, first in
Italian cities such as Bologna and Palermo,
and later in Paris, Oxford, and Cambridge.
Mathematics was taught in these universities,
although no remarkable creative contributions
were made. However, theologians such as
Albertus Magnus (c. 1193-1280) and Thomas
Aquinas (1225?-1274) discussed infinity in a
way that went beyond Greek thought and thus
helped to lay a basis for the modern philo-
sophy of mathematics.

References

[1] M. Cantor, Vorlesungen über Geschichte der Mathematik, Teubner, I, 1880; II, 1892.
[2] M. Clagett, Greek science in antiquity, Abelard-Schuman, 1955.
[3] M. Clagett, Archimedes in the Middle Ages I, II, Univ. of Wisconsin Press, 1964.
[4] M. Clagett, The science of mechanics in the Middle Ages, Univ. of Wisconsin Press, 1959.
[5] G. Sarton, Introduction to the history of science I, From Homer to Omar Khayyam, Carnegie Institution, 1927.
373 (XVIII.13)
Sample Survey

A. General Remarks

The sample survey is a means of getting statistical information about a certain aggregate from the observation of some but not all of it. The aggregate concerned is usually called the finite population, and the observed part is called the sample. Introducing a random mechanism, J. Neyman established a method of ascertaining objectively the reliability of such information. This method is mainly applied to demographic statistical surveys and opinion polls, but it is also applicable to random samples of physical materials. We briefly sketch the mathematical structure of this method without going into detail about technical problems that arise in the practical survey situation.

Suppose that the population consists of $N$ units, where $N$ is called its size. Each unit has some characteristic $\alpha$, which is an element of some set $\Omega$. The set of all characteristics of the units in the population is designated by $\theta = \{\alpha_1, \ldots, \alpha_N\}$, which we regard as a parameter. The set of all possible $\theta$ is denoted by $\Theta \subset \Omega^N$. Suppose that one unit is chosen and observed according to some procedure, the index number of the unit in the population is $J$, and the observed characteristic is $X$. It is assumed that the observation is without error, hence $X = \alpha_J$.

Denote the whole sample by $(J, X) = (J_1, \ldots, J_n, X_1, \ldots, X_n)$, where $n$ is the sample size and $X_i = \alpha_{J_i}$. The sample size $n$ may be a random variable, and among $J_1, \ldots, J_n$, duplication may be allowed. The probabilistic scheme for $J$ is called the sampling procedure, and if it satisfies the condition

(c) $\Pr\{J_i = j\}$ is independent of $\alpha_j$ and of $X_{i+1}, \ldots, X_n$ (but may depend on $X_1, \ldots, X_{i-1}$),

it is called a random sampling procedure. Moreover, if the joint distribution of $J$ is independent of $\theta$, it is called regular. Specifically, if $n$ is constant and $\Pr\{J\}$ is symmetric in $J$, it is called uniform.

The two main mathematical problems of sample surveys are to determine a random sampling procedure and to provide methods whereby statistical inferences can be made concerning $\theta$ (→ 401 Statistical Inference).

B. The Problem of Inference

Condition (c) is assumed. The probability of $(J, X)$ is given by

$$\Pr\{(J, X) = (j_1, \ldots, j_n, X_1, \ldots, X_n)\} = \Pr\{J_1 = j_1\}\chi_\theta(X_1)\Pr\{J_2 = j_2 \mid J_1, X_1\}\chi_\theta(X_2) \cdots \Pr\{J_n = j_n \mid J_1, X_1, \ldots, J_{n-1}, X_{n-1}\}\chi_\theta(X_n),$$

where $\chi_\theta(X_i)$ is defined as 1 if $X_i = \alpha_{J_i}$ and 0 otherwise. This formula can be shortened to the form

$$\Pr\{(J, X)\} = p(J, X)\chi_\theta(X, J), \qquad (1)$$

where

$$\chi_\theta(X, J) = \begin{cases} 1 & \text{if } X_i = \alpha_{J_i},\ i = 1, \ldots, n, \\ 0 & \text{otherwise.} \end{cases}$$

Expression (1) is the fundamental model for the sample survey problem. Note that $p(J, X)$ is independent of the parameter $\theta$. Therefore if we let $I = (I_1, \ldots, I_m)$ ($I_1 < \cdots < I_m$) be the set of numbers in $(J_1, \ldots, J_n)$ after deleting duplications, and let $Y = (Y_1, \ldots, Y_m)$ be the corresponding $X$ values, then the joint distribution of $I$ and $Y$ can also be expressed as

$$\Pr\{(I, Y)\} = p(I, Y)\chi_\theta(Y, I),$$

where

$$\chi_\theta(Y, I) = \begin{cases} 1 & \text{if } Y_j = \alpha_{I_j},\ j = 1, \ldots, m, \\ 0 & \text{otherwise.} \end{cases}$$

Since $\chi_\theta(Y, I) = \chi_\theta(X, J)$ for all $\theta$, the conditional probability distribution of $(J, X)$ for given $(I, Y)$ is independent of $\theta$, and hence $(I, Y)$ is a sufficient statistic. According to the general theory of sufficient statistics, we can restrict ourselves to the class of procedures depending only on $(I, Y)$.

C. Estimation

Suppose that $g(\theta) = g(\alpha_1, \ldots, \alpha_N)$ is a real parameter whose unbiased estimators are under discussion.

Theorem. There exists an unbiased estimator of $g(\theta)$ if and only if there exists a decomposition

$$g(\theta) = \sum_\nu g_\nu(\alpha_{j_1(\nu)}, \ldots, \alpha_{j_m(\nu)}),$$

$$\Pr\{I = (j_1(\nu), \ldots, j_m(\nu))\} > 0, \quad \nu = 1, 2, \ldots. \qquad (2)$$

If the sampling procedure is regular, the second condition can be replaced by $\Pr\{I \ni (j_1(\nu), \ldots, j_m(\nu))\} > 0$, $\nu = 1, 2, \ldots$. Hence, for example, if $\alpha_i$ is real and the sampling procedure is regular, $\bar\alpha = \sum \alpha_i/N$ is estimable if and only if $\Pr\{I \ni i\} > 0$ for all $i$, and $\sigma_\alpha^2 = \sum(\alpha_i - \bar\alpha)^2/(N - 1) = \sum\sum_{i<j}(\alpha_i - \alpha_j)^2/N(N - 1)$ is estimable if and only if $\Pr\{I \ni i, j\} > 0$ for all $i$ and $j$. Also, $\prod_{i=1}^N \alpha_i$ is not estimable unless $\Pr\{I = (1, \ldots, N)\} > 0$. The decomposition (2) is not unique, and corresponding to different decompositions, different unbiased estimators are derived. Also, for the case of regular sampling
procedures, for any $\theta = \theta_0$ it is always possible to construct an unbiased estimator $\hat g(\theta)$ such that $\Pr\{\hat g(\theta) = g(\theta_0) \mid \theta = \theta_0\} = 1$ if $g(\theta)$ is estimable. Hence the variance of the locally best unbiased estimator is always 0, and the uniformly minimum variance unbiased estimators exist only in the trivial case.

If some kind of symmetry exists among the population units as well as the sampling procedure and the parameter, it would be natural to require the same kind of symmetry for the estimators. Let $G$ be a group of permutations among $N$ numbers. Assume that for any $\theta \in \Theta$ and $\gamma \in G$, we have $\gamma\theta \in \Theta$ and $g(\gamma\theta) = g(\theta)$. If $\Pr\{\gamma J\} = \Pr\{J\}$ for all $\gamma$, then the sampling procedure is said to be invariant with respect to $G$. An estimator is also called invariant if its value does not change under any permutation $\gamma \in G$ of the numbers of sample units. Thus if $G$ is the set of all permutations (i.e., the symmetric group), then the invariant estimator is a function of $Y$ (or $X$) only. Moreover, if the dimension $m$ of $Y$ is constant, $Y$ is complete (under some mild conditions); hence there exists a unique minimum variance invariant unbiased estimator.

When there is some additional information, it can be represented by auxiliary variables $\beta_1, \ldots, \beta_N$, which are known and assumed to have some relation to $\alpha_1, \ldots, \alpha_N$. Assume that the $\alpha_i$ are real numbers and that the parameter to be estimated is $g(\theta) = \bar\alpha = (\sum \alpha_i)/N$. If we can assume that $\alpha_i$ and $\beta_i$ are approximately proportional, we can estimate the unknown population mean $\bar\alpha$ by $\bar\alpha^* = (\bar X/\bar Z) \times \bar\beta$, where $\bar X$ is the sample mean of the $\alpha$'s and $\bar Z$ the sample mean of the $\beta$'s. Although $\bar\alpha^*$ is not unbiased, we may expect that it has small error if the relation between two variables is close. $\bar\alpha^*$ is usually called the ratio estimator.

In practical research, as an estimator of the population total $A = \sum \alpha_i$, we usually use $\hat A = \sum X_i/P_i$, where $P_i = \Pr\{J \ni i\}$. Its variance is $V(\hat A) = \sum\sum_{i<j}(P_i P_j - P_{ij})(\alpha_i/P_i - \alpha_j/P_j)^2$ and is estimated by $v(\hat A) = \sum\sum_{i<j}\{(P_i P_j - P_{ij})/P_{ij}\}(X_i/P_i - X_j/P_j)^2$, where $P_{ij} = \Pr\{J \ni (i, j)\}$. When $N$ is unknown, it can be estimated by the same procedure as $A$ (say $\hat N$), and the population mean $\bar\alpha$ can be estimated by $\hat{\bar\alpha} = \hat A/\hat N$, which is called a ratio estimator. $\hat{\bar\alpha}$ is biased except when $N$ is known.

D. Asymptotic Confidence Intervals

It is usually impossible to obtain any meaningful confidence interval based on exact small-sample theory. But when the $\alpha_i$ are real and the sampling procedure is uniform and without replacement, the sample mean $\bar X$ is asymptotically normal with mean $\bar\alpha$ and variance

$$\left(\frac{1}{n} - \frac{1}{N}\right)\sigma_\alpha^2$$

as $N$ and $n \to \infty$, and $\limsup n/N < 1$, provided that a suitable regularity condition on the $\alpha_i$ is satisfied. Also, the sample variance converges to $\sigma_\alpha^2$ as $n \to \infty$. From these results we can construct asymptotic confidence intervals for $\bar\alpha$.

E. The Problem of Sampling Procedures

In determining the sampling procedures, both the technical aspects of sampling and the accuracy of the estimators should be considered. The most commonly used methods are multistage sampling and stratified sampling, or some combination of the two. For example, the population is partitioned into several clusters. First we select some of them according to a probability scheme and then choose units from the selected clusters. This procedure is called two-stage sampling. The probabilities for the selection of clusters may be uniform or proportional to the size of the clusters. Stratified sampling is the method of dividing the population into several subpopulations, called strata, and selecting the sample units within each stratum. If the size of the $i$th stratum is $N_i$, the size of the sample chosen from this stratum is $n_i$, and the probability is uniform within each stratum, then the most common estimator for the population mean $\bar\alpha$ is given by

$$\hat{\bar\alpha} = \sum_i \frac{N_i}{N}\bar X_i,$$

where $\bar X_i$ is the mean of the sample values in the $i$th stratum. The variance of $\hat{\bar\alpha}$ is given by

$$V(\hat{\bar\alpha}) = \sum_i \left(\frac{N_i}{N}\right)^2\left(\frac{1}{n_i} - \frac{1}{N_i}\right)\sigma_i^2,$$

where $\sigma_i^2$ is the population variance within the $i$th stratum.

If the cost of drawing one sample unit in the $i$th stratum is equal to $c_i$, then for fixed cost, the variance of the estimator is minimized when

$$n_i \propto N_i\sigma_i/\sqrt{c_i},$$

which is called the condition of optimum allocation.

F. Replicated Sampling Plan

W. E. Deming proposed an effective method in practical sample surveys, called a replicated sampling plan, following J. W. Tukey's hint. It enables us to easily evaluate variances of esti-
mates for any estimator and any sampling procedure. Let the sample be composed of $k$ subordinate samples which are selected by the same random sampling procedure from the same population, and let $\hat\theta_i(J_i, X_i)$ be the estimate from the $i$th subordinate sample by the same estimator and $\hat\theta$ be the estimate from the whole sample by the same estimator. Then, provided $\hat\theta = \sum \hat\theta_i/k$ approximately, the variance of $\hat\theta$ can be estimated by $v(\hat\theta) = \sum(\hat\theta_i - \hat\theta)^2/k(k - 1)$. If the sample is selected by the simple random sampling procedure and is of large scale, $\hat\theta_i$ and $\hat\theta$ are approximately normal variates, and $v(\hat\theta)$ is evaluated by using the sample range of the $\hat\theta_i$. In large-scale sample surveys, even when the random sampling procedure is not simple, the theory related to the normal distribution can be applied to the $\hat\theta_i$ and $\hat\theta$. It has been shown that $v(\hat\theta)$ evaluated by the above method includes not only the sampling error but also the random part of the nonsampling errors.

G. Conceptual Problems

Although it has been established that the sample survey method is useful in large-scale social or economic surveys, there are difficult conceptual problems about the foundations of the method (especially when auxiliary information exists) that are still far from being settled.

References

[1] W. G. Cochran, Sampling techniques, Wiley, third edition, 1977.
[2] W. E. Deming, On simplifications of sampling design through replication with equal probabilities and without stages, J. Amer. Statist. Assoc., 51 (1956), 24-53.
[3] R. J. Jessen, Statistical survey techniques, Wiley, 1978.
[4] J. Neyman, Contribution to the theory of sampling human populations, J. Amer. Statist. Assoc., 33 (1938), 101-116.
[5] P. V. Sukhatme and B. V. Sukhatme, Sampling theory of surveys with applications, Asia Publishing House, second edition, 1970.
[6] K. Takeuchi, Some remarks on general theory for unbiased estimation of a finite population, Japan. J. Math., 35 (1966), 73-84.

374 (XVIII.4)
Sampling Distributions

A. General Remarks

To perform statistical inference, it is necessary to find the probability distribution of a statistic involved in the situation (→ 396 Statistic, 401 Statistical Inference). In general, the probability distribution of a statistic is called the sampling distribution. A set $\{X_1, X_2, \ldots, X_n\}$ of random variables that are independently and identically distributed according to a distribution $F$ is called a random sample from $F$. A common sampling distribution described in this article is that of the statistic $Y = f(X_1, \ldots, X_n)$, where the set $\{X_1, \ldots, X_n\}$ is a random sample from a normal distribution. Examples of such a statistic $Y$ of dimension 1 are the sample mean, sample variance, linear or quadratic forms of $\{X_1, \ldots, X_n\}$, their ratios, and order statistic, while examples of $Y$ of higher dimensions are the sample mean vector and the sample covariance matrix and its eigenvalues. The normal distribution with mean $\mu$ and variance $\sigma^2$ is denoted by $N(\mu, \sigma^2)$, while the $p$-dimensional ($p$-variate) normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$ is denoted by $N(\mu, \Sigma)$ (→ Appendix A, Table 22).

B. Samples from Univariate Normal Distributions

If random variables $X_1, \ldots, X_n$ are distributed independently according to $N(\mu_1, \sigma_1^2), \ldots, N(\mu_n, \sigma_n^2)$, then a linear form $\sum_i a_i X_i$ has the distribution $N(\sum_i a_i\mu_i, \sum_i a_i^2\sigma_i^2)$. In particular, if $\{X_1, \ldots, X_n\}$ is a random sample from $N(\mu, \sigma^2)$, then the sample mean $\bar X = \sum_i X_i/n$ has the distribution $N(\mu, \sigma^2/n)$.

Let $\{X_1, \ldots, X_n\}$ be a random sample from the distribution $N(0, 1)$. The sampling distribution of the statistic $Y = \sum_i X_i^2$ is called the chi-square distribution with $n$ degrees of freedom and is denoted by $\chi^2(n)$. It has the probability density

$$f_n(y) = \frac{1}{2^{n/2}\Gamma(n/2)}\, y^{n/2-1} e^{-y/2}$$

for $0 < y < \infty$, $f_n(y) = 0$ elsewhere, where $\Gamma$ is the gamma function. The distribution of $Y = \sum_{i=1}^n (X_i + \mu_i)^2$ depends only on $n$ and $\lambda = \sum_i \mu_i^2$, and is called the noncentral chi-square distribution with $n$ degrees of freedom and noncentrality $\lambda$ and denoted by $\chi^2(n, \lambda)$. It has the probability density

$$f_{n,\lambda}(y) = \sum_{k=0}^{\infty} e^{-\lambda/2}\frac{(\lambda/2)^k}{k!} f_{n+2k}(y) = e^{-\lambda/2}\,{}_0F_1\!\left(\frac{n}{2}; \frac{\lambda y}{4}\right) f_n(y)$$

for $0 < y < \infty$, where $f_{n+2k}$ and $f_n$ are the probability densities of chi-square distributions and ${}_0F_1$ is an extended hypergeometric func-
1389 374 c
Sampling Distributions

tion. Noncentral chi-square distributions densities fm,n,l and f,,, of these distributions
have the following treproducing property: If are given by
Yi , . . , yk are distributed independently ac-
zw) -1
cording to ~‘(n,,l,), . . ..~‘(n.,i,), then xix fm&) = (m’n)m’2
has the distribution x2( xi ni, &1,). Also, we B(m/2, n/2) (1 + mz/n)(m+n)‘2 ’
have Cocbran’s theorem (Proc. Cambridge
Philos. Sot., 30 (1934)): If Xi, . , X, are dis- f,.,,i(z)=k~oe-~/z(~~~B(~~~~~~~,2)
tributed independently according to N(p,, l),
Z(m,2)+k-l
.‘., N(p”, 1) and if for quadratic forms Q,=
C. C &‘)X.X. for m = 1, . . . , k the matrices x (1 + mz/n)((m+W*)+k

Al =‘(ib) ci6 trank r, satisfy the condition


A, + . . + A, = I (unit matrix), then a necessary =e -w* 1 F 1 y;;;g$ L.“(Z)
and sufficient condition for Q r , . . . , Qk to have ( >
independent noncentral chi-square distributions with $r_1, \ldots, r_k$ degrees of freedom, respectively, is that $r_1 + \cdots + r_k = n$. In particular, when $\mu_i = 0$ for all $i$, they have (central) chi-square distributions, and the theorem implies their reproducing property.

Let $X$ and $Y$ be independent random variables having distributions $N(\delta, 1)$ and $\chi^2(n)$, respectively. Then the sampling distribution of $T = X/\sqrt{Y/n}$ is called the noncentral t-distribution with $n$ degrees of freedom and noncentrality $\delta$ and is denoted by $t(n, \delta)$. Its probability density is given by

$$f_{n,\delta}(t) = \sum_{k=0}^{\infty} e^{-\delta^2/2}\,\frac{\delta^k}{k!}\,\frac{\Gamma((n+k+1)/2)}{\sqrt{n\pi}\,\Gamma(n/2)}\left(\frac{2}{n}\right)^{k/2} t^k \left(1+\frac{t^2}{n}\right)^{-(n+k+1)/2}$$

for $-\infty < t < \infty$. In particular, when $\delta = 0$, the distribution is called the t-distribution with $n$ degrees of freedom and is denoted by $t(n)$. Its probability density is simplified to

$$f_n(t) = \frac{\Gamma((n+1)/2)}{\sqrt{n\pi}\,\Gamma(n/2)}\left(1+\frac{t^2}{n}\right)^{-(n+1)/2}$$

for $-\infty < t < \infty$.

Let $\{X_1, \ldots, X_n\}$ be a random sample from $N(\mu, \sigma^2)$. Exact sampling distributions of the sample variance $S^2 = \sum_i (X_i - \bar X)^2/(n-1)$ and of the t-statistic $T = \sqrt{n}(\bar X - \mu_0)/\sqrt{S^2}$, where $\mu = \mu_0$ is a given number, were essentially obtained by Student [5]: $(n-1)S^2/\sigma^2$ and $T$ are distributed according to $\chi^2(n-1)$ and $t(n-1)$, respectively. His proof was made rigorous by R. A. Fisher (Metron, 5 (1925)), who proved in particular that $\bar X$ and $S^2$ are independent. If $\mu \neq \mu_0$, then $T$ follows the distribution $t(n-1, \sqrt{n}(\mu - \mu_0)/\sigma)$.

Let $X$ and $Y$ be distributed independently according to $\chi^2(m, \lambda)$ and $\chi^2(n)$, respectively. The distribution of $Z = (X/m)/(Y/n)$ is called the noncentral F-distribution with $m$ and $n$ degrees of freedom and noncentrality $\lambda$. In the special case when $\lambda = 0$ it is called the F-distribution with $m$ and $n$ degrees of freedom and is denoted by $F(m, n)$. The probability density of the noncentral F-distribution is given by

$$\frac{e^{-\lambda/2}\,(m/n)^{m/2}\,z^{m/2-1}}{B(m/2, n/2)\,(1+mz/n)^{(m+n)/2}}\;{}_1F_1\!\left(\frac{m+n}{2};\,\frac{m}{2};\,\frac{\lambda m z}{2(n+mz)}\right)$$

for $0 < z < \infty$, where $B$ and ${}_1F_1$ are the beta function and the confluent hypergeometric function (→ 167 Functions of Confluent Type), respectively.

Let $X$ be a random variable having the distribution $F(m, n)$. The distribution of $Z = \frac{1}{2}\log X$ is called the z-distribution with $m$ and $n$ degrees of freedom and is denoted by $z(m, n)$. Its probability density is given by

$$\frac{2\,(m/n)^{m/2}\,e^{mz}}{B(m/2, n/2)\,(1 + m e^{2z}/n)^{(m+n)/2}}$$

for $-\infty < z < \infty$. If $S_1^2 = \sum_i (X_i - \bar X)^2/(m-1)$ and $S_2^2 = \sum_j (Y_j - \bar Y)^2/(n-1)$ are sample variances based on independent samples of sizes $m$ and $n$ taken from $N(\mu, \sigma^2)$ and $N(\nu, \tau^2)$, respectively, then the statistic $z = \frac{1}{2}\log(S_1^2/S_2^2)$, which was introduced by Fisher (Proc. Int. Math. Congress, 1924), is distributed according to $z(m-1, n-1)$ under the hypothesis $\sigma^2 = \tau^2$. Fisher [6] tabulated percent points of $z(m, n)$.

C. Samples from Multivariate Normal Distributions

Let $X$ be a p-dimensional random vector, namely, a vector having real random variables as its components. $X$ has the p-variate normal distribution $N(\mu, \Sigma)$ if and only if for any real vector $a = (a_1, \ldots, a_p)'$, the random variable $a'X$ has the normal distribution $N(a'\mu, a'\Sigma a)$. If $X_1, \ldots, X_n$ are independent and have p-variate normal distributions $N(\mu_1, \Sigma_1), \ldots, N(\mu_n, \Sigma_n)$, respectively, and if $A_1, \ldots, A_n$ are $m \times p$ real matrices, then the random vector $A_1X_1 + \cdots + A_nX_n$ has the m-variate normal distribution $N(\mu, \Sigma)$, where $\mu = \sum_j A_j\mu_j$ and $\Sigma = \sum_j A_j\Sigma_jA_j'$.

Suppose that $\{X_1, \ldots, X_n\}$ is a random sample from the p-variate normal distribution $N(0, \Sigma)$, and let $X = (X_1, \ldots, X_n)$ be a $p \times n$ matrix. Then the probability distribution of $W = XX'$ is called the Wishart distribution with scale matrix $\Sigma$ and $n$ degrees of freedom and is denoted by $W_p(\Sigma, n)$ or simply $W(\Sigma, n)$.
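The definition $W = XX'$ can be illustrated by simulation. The following is a minimal sketch, not part of the original article: it draws $2 \times 2$ Wishart matrices as sums of outer products of $N(0, \Sigma)$ vectors and averages them. It relies on the standard fact $E(W) = n\Sigma$, which is an added assumption not stated in the text; the function names are hypothetical.

```python
import math
import random


def cholesky_2x2(sigma):
    """Lower-triangular factor L (as l11, l21, l22) with L L' = sigma, for p = 2."""
    (a, b), (_, d) = sigma
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    return l11, l21, l22


def sample_wishart_2x2(sigma, n, rng):
    """Draw W = X X', where the n columns of X are i.i.d. N(0, sigma)."""
    l11, l21, l22 = cholesky_2x2(sigma)
    w11 = w12 = w22 = 0.0
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = l11 * z1                 # first component of one N(0, sigma) vector
        x2 = l21 * z1 + l22 * z2      # second component
        w11 += x1 * x1
        w12 += x1 * x2
        w22 += x2 * x2
    return w11, w12, w22


def mean_wishart_2x2(sigma, n, reps, seed=0):
    """Monte Carlo average of W over `reps` draws; should be close to n * sigma."""
    rng = random.Random(seed)
    s11 = s12 = s22 = 0.0
    for _ in range(reps):
        w11, w12, w22 = sample_wishart_2x2(sigma, n, rng)
        s11 += w11
        s12 += w12
        s22 += w22
    return s11 / reps, s12 / reps, s22 / reps
```

With $\Sigma$ having entries 2, 0.5, 0.5, 1 and $n = 5$, the three averages should settle near 10, 2.5, and 5 as the number of replications grows.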

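If $n > p - 1$, the joint probability density function of the $p(p+1)/2$ arguments of $W = (W_{ij})$ is

$$f_{p,n}(W) = \left(\Gamma_p(n/2)\,2^{pn/2}\,|\Sigma|^{n/2}\right)^{-1} \operatorname{etr}(-\Sigma^{-1}W/2)\,|W|^{(n-p-1)/2}$$

for $W > 0$, where $W > 0$ means that $W$ is positive definite, $\operatorname{etr}(A) = \exp(\operatorname{tr} A)$, and $\Gamma_p$, a multidimensional gamma function, is defined as

$$\Gamma_p(a) = \pi^{p(p-1)/4}\prod_{i=1}^{p}\Gamma\!\left(a - \tfrac{1}{2}(i-1)\right)$$

for $a > (p-1)/2$. When $n \le p - 1$ the distribution is singular and has no probability density.

Suppose that $X_1, \ldots, X_n$ are independent and obey normal distributions $N(\mu_1, \Sigma), \ldots, N(\mu_n, \Sigma)$, and let $X = (X_1, \ldots, X_n)$, $M = (\mu_1, \ldots, \mu_n)$. Then the distribution of $W = XX'$ is called the p-dimensional noncentral Wishart distribution with scale matrix $\Sigma$, $n$ degrees of freedom, and noncentrality matrix $\Omega = \Sigma^{-1}MM'$ and is denoted by $W(\Sigma, n, \Omega)$. If $n > p - 1$, the probability density function is

$$f_{p,n}(W)\,\operatorname{etr}(-\Omega/2)\;{}_0F_1\!\left(n/2;\ \Omega\Sigma^{-1}W/4\right)$$

for $W > 0$. ${}_0F_1$ is a hypergeometric function with matrix argument, which is defined by ${}_\alpha F_\beta(a_1, \ldots, a_\alpha; b_1, \ldots, b_\beta; S) = {}_\alpha F_\beta^{(p)}(a_1, \ldots, a_\alpha; b_1, \ldots, b_\beta; S, I)$, where

$${}_\alpha F_\beta^{(p)}(a_1, \ldots, a_\alpha; b_1, \ldots, b_\beta; S, T) = \sum_{k=0}^{\infty}\sum_{\kappa}\frac{(a_1)_\kappa\cdots(a_\alpha)_\kappa}{(b_1)_\kappa\cdots(b_\beta)_\kappa}\,\frac{C_\kappa(S)\,C_\kappa(T)}{C_\kappa(I)\,k!},$$

$\kappa = (k_1, \ldots, k_p)$ is an ordered set of integers such that $k_1 + \cdots + k_p = k$ and $k_1 \ge \cdots \ge k_p \ge 0$, and where $C_\kappa(S)$ is a zonal polynomial (→ [8]) of a symmetric matrix $S$. The multivariate hypergeometric coefficient $(a)_\kappa$ is given by

$$(a)_\kappa = \prod_{i=1}^{p}\left(a - \tfrac{1}{2}(i-1)\right)_{k_i}, \qquad (a)_k = a(a+1)\cdots(a+k-1).$$

The noncentral Wishart distribution is singular when $n \le p - 1$. Similarly to the noncentral chi-square distribution, the noncentral Wishart distribution has the reproducing property with respect to both the number of degrees of freedom and the noncentrality matrix. Also, Cochran's theorem can be extended to the multivariate case: Let $X_1, \ldots, X_n$ be p-variate random vectors independently distributed according to $N(\mu_1, \Sigma), \ldots, N(\mu_n, \Sigma)$, respectively, and let $A_m = (a_{ij}^{(m)})$, $m = 1, \ldots, k$, be $n \times n$ real matrices of rank $r_m$ and such that $A_1 + A_2 + \cdots + A_k = I$ (unit matrix). A necessary and sufficient condition for random matrices $Q_m = \sum_{i,j} a_{ij}^{(m)} X_iX_j'$, $m = 1, \ldots, k$, to be independently distributed according to noncentral Wishart distributions with $r_1, \ldots, r_k$ degrees of freedom, respectively, is that $r_1 + \cdots + r_k = n$. If, in particular, $\mu_1 = \cdots = \mu_n = 0$ and if $r_m \ge p$, then $Q_m$ is distributed according to $W(\Sigma, r_m)$.

If $W$ has the distribution $W(\Sigma, n)$ with $n > p - 1$, then the eigenvalues $\lambda_1, \ldots, \lambda_p$ ($\lambda_1 > \cdots > \lambda_p > 0$) of $W$ have the joint probability density function

$$C_{p,n}\,|\Sigma|^{-n/2}\;{}_0F_0^{(p)}\!\left(-\tfrac{1}{2}\Sigma^{-1}, \Lambda\right)|\Lambda|^{(n-p-1)/2}\prod_{i<j}(\lambda_i - \lambda_j),$$

where $\Lambda$ is a diagonal matrix with diagonal elements $\lambda_1, \ldots, \lambda_p$ and $C_{p,n} = \pi^{p^2/2}\left(2^{pn/2}\,\Gamma_p(p/2)\,\Gamma_p(n/2)\right)^{-1}$. If $\Sigma = I$, then the joint probability density function becomes

$$C_{p,n}\exp\!\left(-\tfrac{1}{2}\sum_{i=1}^{p}\lambda_i\right)\prod_{i=1}^{p}\lambda_i^{(n-p-1)/2}\prod_{i<j}(\lambda_i - \lambda_j).$$

Suppose that $S_1$ and $S_2$ have independent Wishart distributions $W(\Sigma, n_1)$ and $W(\Sigma, n_2)$, respectively. The random matrix $B = (S_1 + S_2)^{-1/2}S_1(S_1 + S_2)^{-1/2}$ is called the beta matrix, and its distribution is denoted by $B(n_1/2, n_2/2)$. Its probability density function is

$$\frac{\Gamma_p((n_1+n_2)/2)}{\Gamma_p(n_1/2)\,\Gamma_p(n_2/2)}\,|B|^{(n_1-p-1)/2}\,|I - B|^{(n_2-p-1)/2}$$

for $0 < B < I$.

Suppose that $S$ has the distribution $W(\Sigma, n)$ and $B$ has $B(n_1/2, n_2/2)$; then, for any nonsingular symmetric matrix $\Omega$,

$$P\{S < \Omega\} = \frac{\Gamma_p((p+1)/2)}{\Gamma_p((n+p+1)/2)}\left|\tfrac{1}{2}\Sigma^{-1}\Omega\right|^{n/2}{}_1F_1\!\left(\frac{n}{2};\ \frac{n+p+1}{2};\ -\tfrac{1}{2}\Sigma^{-1}\Omega\right)$$

and

$$P\{B < \Omega\} = \frac{\Gamma_p((n_1+n_2)/2)\,\Gamma_p((p+1)/2)}{\Gamma_p((n_1+p+1)/2)\,\Gamma_p(n_2/2)}\,|\Omega|^{n_1/2}\;{}_2F_1\!\left(\frac{n_1}{2},\ -\frac{n_2}{2}+\frac{p+1}{2};\ \frac{n_1+p+1}{2};\ \Omega\right).$$

If $\{X_1, \ldots, X_n\}$ is a random sample from $N(\mu, \Sigma)$, then the sample mean $\bar X = \sum_{i=1}^{n}X_i/n$ and the sample covariance matrix $S = \sum_{i=1}^{n}(X_i - \bar X)(X_i - \bar X)'/(n-1)$ are distributed independently according to the respective distributions $N(\mu, \Sigma/n)$ and $W(\Sigma/(n-1), n-1)$. If $n > p$, $T^2 = n(\bar X - \mu_0)'S^{-1}(\bar X - \mu_0)$ is called the noncentral Hotelling $T^2$ statistic with $n - 1$ degrees of freedom and noncentrality $\lambda = n(\mu - \mu_0)'\Sigma^{-1}(\mu - \mu_0)$. $(n-p)T^2/p(n-1)$ has a noncentral F-distribution with $p$ and $n - p$ degrees of freedom and noncentrality $\lambda$.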
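The relation between Hotelling's $T^2$ and the F-distribution can be checked numerically in the central case $\mu = \mu_0$. The sketch below is an illustration under stated assumptions, not the article's own method: it takes $p = 2$, uses the explicit inverse of a $2 \times 2$ matrix, and compares the Monte Carlo mean of $(n-p)T^2/p(n-1)$ with the mean $(n-p)/(n-p-2)$ of the central $F(p, n-p)$ distribution, a standard fact not stated in the text. The function names are hypothetical.

```python
import random


def hotelling_t2_2d(sample, mu0):
    """T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0) for a list of 2-dimensional points."""
    n = len(sample)
    xb = sum(x for x, _ in sample) / n
    yb = sum(y for _, y in sample) / n
    # entries of the sample covariance matrix S (divisor n - 1)
    sxx = sum((x - xb) ** 2 for x, _ in sample) / (n - 1)
    syy = sum((y - yb) ** 2 for _, y in sample) / (n - 1)
    sxy = sum((x - xb) * (y - yb) for x, y in sample) / (n - 1)
    det = sxx * syy - sxy * sxy
    dx, dy = xb - mu0[0], yb - mu0[1]
    # quadratic form d' S^{-1} d written out with the 2x2 inverse
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return n * quad


def mean_scaled_t2(n, reps, seed=0):
    """Monte Carlo mean of (n - p) T^2 / (p (n - 1)) for N(0, I) samples, p = 2."""
    rng = random.Random(seed)
    p = 2
    total = 0.0
    for _ in range(reps):
        sample = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
        total += (n - p) * hotelling_t2_2d(sample, (0.0, 0.0)) / (p * (n - 1))
    return total / reps
```

For $n = 10$ the scaled statistic should average close to $8/6$, the mean of $F(2, 8)$.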

Let $X = (X_1, \ldots, X_p)'$ and $Y = (Y_1, \ldots, Y_q)'$, $p \le q$, denote two random vectors, $\Sigma_{11}$ and $\Sigma_{22}$ their respective covariance matrices, and $\Sigma_{12}$ the $p \times q$ matrix of covariances between the components of $X$ and $Y$. Each of the nonnegative roots $\rho_1, \ldots, \rho_p$ of the equation $|\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} - \rho^2\Sigma_{11}| = 0$ is called the canonical correlation coefficient. Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a random sample from the (p + q)-variate normal distribution $N(0, \Sigma)$, and let $S_{11} = \sum_{\alpha=1}^{n}X_\alpha X_\alpha'/n$, $S_{12} = S_{21}' = \sum_{\alpha=1}^{n}X_\alpha Y_\alpha'/n$, $S_{22} = \sum_{\alpha=1}^{n}Y_\alpha Y_\alpha'/n$, and $S = (S_{ij})_{i,j=1,2}$. The sample canonical correlation coefficients are the nonnegative roots $r_1, \ldots, r_p$ of $|S_{12}S_{22}^{-1}S_{21} - r^2S_{11}| = 0$ and for $n > p + q$ the probability density function of $r_1^2, \ldots, r_p^2$ is [...], where $C_{p,q,n}$ is [...] and $R^2$ and $P^2$ are diagonal matrices with elements $r_i^2$ and $\rho_i^2$, respectively. If, in particular, $p = 1$, then $\rho_1$ and $r_1$ are, respectively, the population and the sample multiple correlation coefficients, and $(n-q)r_1^2/q(1-r_1^2)$ follows the distribution $F(q, n-q)$ whenever $\rho_1 = 0$.

Let $\{(X_1, Y_1), \ldots, (X_n, Y_n)\}$ be a random sample from the 2-dimensional normal distribution with correlation coefficient $\rho$. Then the sample correlation coefficient

$$R = \frac{\sum_i (X_i - \bar X)(Y_i - \bar Y)}{\left(\sum_i (X_i - \bar X)^2\,\sum_i (Y_i - \bar Y)^2\right)^{1/2}}$$

has probability density

$$f_n(r; \rho) = \frac{2^{n-3}}{\pi\,(n-3)!}\,(1-\rho^2)^{(n-1)/2}\,(1-r^2)^{(n-4)/2}\sum_{k=0}^{\infty}\Gamma^2\!\left(\frac{n+k-1}{2}\right)\frac{(2\rho r)^k}{k!}$$

for $-1 < r < 1$. For the special case $\rho = 0$, the probability density becomes

$$f_n(r) = \frac{1}{\sqrt{\pi}}\,\frac{\Gamma((n-1)/2)}{\Gamma((n-2)/2)}\,(1-r^2)^{(n-4)/2},$$

which implies that $T = \sqrt{n-2}\,R/\sqrt{1-R^2}$ has the t-distribution with $n - 2$ degrees of freedom.

Given a random sample from a p-variate normal distribution, the probability density of the sample partial correlation coefficient $R_{12\cdot 3\ldots p}$ between the first and the second components with the remaining components fixed is given by $f_{n-p+2}(r; \rho_{12\cdot 3\ldots p})$, where $f$ is the density of $R$ mentioned in the previous paragraph and $\rho_{12\cdot 3\ldots p}$ is the population partial correlation.

D. Large-Sample Theory

So far we have dealt with random samples $\{X_1, \ldots, X_n\}$ composed of finitely many random variables (or vectors). The theory dealing with such finite cases is called small-sample theory, which is not always suitable for numerical applications. In comparison with this, in large-sample theory, where the sample size is assumed to be sufficiently large, an approximation of the sampling distribution can often be obtained easily by means of the central limit theorem.

If for three sequences $X_n$, $\mu_n$, and $\sigma_n$, $n = 1, 2, \ldots$, of random variables, real numbers, and positive numbers, respectively, the sequence $(X_n - \mu_n)/\sigma_n$ converges in distribution to $N(0, 1)$ as $n \to \infty$, then the sequence $X_n$ is said to be asymptotically distributed according to $N(\mu_n, \sigma_n^2)$. The definition can be extended to higher dimensions. We write $X_n = o_p(r_n)$ for a sequence $r_n$ of positive numbers if and only if $X_n/r_n$ converges in probability to zero as $n \to \infty$. The following theorem is useful: If $X_n = a + o_p(r_n)$, where $a$ is a constant and $r_n = o(1)$, and if a real-valued function $f(x)$ is of class $C^s$ in a neighborhood of $x = a$, then

$$f(X_n) = \sum_{k=0}^{s}\frac{f^{(k)}(a)}{k!}(X_n - a)^k + o_p(r_n^s).$$

If $X_n$ is asymptotically distributed according to $N(\mu, \sigma^2/n)$ and $f(x)$ is differentiable at $x = \mu$ with the derivative $f'(\mu) \neq 0$, then $f(X_n)$ is asymptotically distributed according to $N(f(\mu), (f'(\mu))^2\sigma^2/n)$. In higher-dimensional cases, if $X_n$ is asymptotically distributed according to $N(\mu, \Sigma/n)$ and $f(x)$ is continuously differentiable in a neighborhood of $x = \mu$ with nonzero vector $c = (\partial f/\partial x_1, \ldots, \partial f/\partial x_p)_{x=\mu}$, then $f(X_n)$ is asymptotically distributed according to $N(f(\mu), c\Sigma c'/n)$.

Let $\{X_1, \ldots, X_n\}$ be a random sample from a univariate distribution with finite moments $\nu_i = E(X^i)$ for $i = 1, \ldots, 2k$, and let $a_i = \sum_\alpha X_\alpha^i/n$ be its ith sample moment. Then the random vector $(a_1, \ldots, a_k)$ asymptotically follows the k-variate normal distribution as $n \to \infty$ with mean vector $(\nu_1, \ldots, \nu_k)$ and covariance matrix $n^{-1}(\sigma_{ij})$, where $\sigma_{ij} = \nu_{i+j} - \nu_i\nu_j$. Let $M_i = \sum_\alpha (X_\alpha - \bar X)^i/n$ and $\mu_i = E(X - \nu_1)^i$ for $i = 2, \ldots, k$ be the sample central moment of order $i$ and population central moment of order $i$, respectively.
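The covariance formula $\sigma_{ij} = \nu_{i+j} - \nu_i\nu_j$ for sample moments can be checked by simulation. The following hedged sketch, not taken from the article, uses the unit exponential distribution as an assumed example: its moments are $\nu_i = i!$, so $n\,\mathrm{Cov}(a_1, a_2)$ should be near $\nu_3 - \nu_1\nu_2 = 6 - 2 = 4$. The function name is hypothetical.

```python
import random


def scaled_moment_covariance(n, reps, seed=0):
    """Monte Carlo estimate of n * Cov(a_1, a_2) for Exp(1) samples of size n.

    a_1 and a_2 are the first and second sample moments of each sample.
    """
    rng = random.Random(seed)
    a1s, a2s = [], []
    for _ in range(reps):
        xs = [rng.expovariate(1.0) for _ in range(n)]
        a1s.append(sum(xs) / n)
        a2s.append(sum(x * x for x in xs) / n)
    m1 = sum(a1s) / reps
    m2 = sum(a2s) / reps
    # unbiased sample covariance of (a_1, a_2) across replications
    cov = sum((u - m1) * (v - m2) for u, v in zip(a1s, a2s)) / (reps - 1)
    return n * cov
```

The estimate fluctuates with the seed and the number of replications, so it is compared with the theoretical value only up to Monte Carlo error.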

Then the random vector $(\bar X, M_2, \ldots, M_k)$ obeys the k-variate normal distribution asymptotically as $n \to \infty$ with mean vector $(\nu_1, \mu_2, \ldots, \mu_k)$ and covariance matrix $n^{-1}(\sigma_{ij})$, where $\sigma_{11} = \mu_2$, $\sigma_{1i} = \mu_{i+1} - i\mu_2\mu_{i-1}$, $\sigma_{ij} = \mu_{i+j} - i\mu_{i-1}\mu_{j+1} - j\mu_{i+1}\mu_{j-1} - \mu_i\mu_j + ij\mu_2\mu_{i-1}\mu_{j-1}$ for $i, j \ge 2$.

A random variable $\chi_n^2$ that has a chi-square distribution with $n$ degrees of freedom obeys the distribution $N(n, 2n)$ asymptotically as $n \to \infty$. Also, $\sqrt{2\chi_n^2} - \sqrt{2n-1}$ obeys $N(0, 1)$ asymptotically. The latter distribution approximates $\chi_n^2$ indirectly better than $N(n, 2n)$ approximates $\chi_n^2$ directly. The t-distribution with $n$ degrees of freedom obeys $N(0, 1)$ asymptotically as $n \to \infty$. If $X_n$ obeys an F-distribution with $m$ and $n$ degrees of freedom, then $mX_n$ obeys asymptotically the distribution $\chi^2(m)$ as $n \to \infty$. If $X_n$ obeys a binomial distribution $\mathrm{Bin}(n, p)$, then $X_n$ obeys asymptotically the distribution $N(np, np(1-p))$, and $\operatorname{Arcsin}\sqrt{X_n/n}$ obeys asymptotically $N(\operatorname{Arcsin}\sqrt{p}, 1/4n)$ as $n \to \infty$. This transformation is called the arcsin (or angular) transformation. If $(X_1, X_2, \ldots, X_k)$ obeys the multinomial distribution $\mathrm{Mu}(n; p_1, p_2, \ldots, p_k)$, then it is asymptotically distributed according to the normal distribution $N(\mu_n, \Sigma_n)$, where $\mu_n = (np_1, \ldots, np_k)$, $\Sigma_n = (\sigma_{ij}^{(n)})$, $\sigma_{ii}^{(n)} = np_i(1-p_i)$, and $\sigma_{ij}^{(n)} = -np_ip_j$ ($i \neq j$), and the random variable $\sum_{j=1}^{k+1}(X_j - np_j)^2/np_j$, where $X_{k+1} = n - (X_1 + \cdots + X_k)$ and $p_{k+1} = 1 - (p_1 + \cdots + p_k)$, obeys asymptotically the distribution $\chi^2(k)$ [11].

If $X_n$ has the Poisson distribution with mean $\lambda_n$, where $\lambda_n \to \infty$ as $n \to \infty$, then $X_n$ and $\sqrt{X_n}$ obey the respective distributions $N(\lambda_n, \lambda_n)$ and $N(\sqrt{\lambda_n}, 1/4)$ asymptotically. If $R$ is the sample correlation coefficient based on a random sample of size $n$ from a 2-dimensional (bivariate) normal distribution with population correlation coefficient $\rho$, then $R$ is asymptotically distributed according to $N(\rho, (1-\rho^2)^2/n)$ as $n \to \infty$, and therefore $z = \frac{1}{2}\log((1+R)/(1-R))$ obeys asymptotically the distribution $N(\frac{1}{2}\log((1+\rho)/(1-\rho)), 1/n)$. This transformation is called Fisher's z-transformation. The distribution [...] gives a better approximation.

E. Empirical Distribution Function

Let $\{X_1, \ldots, X_n\}$ be a random sample from a distribution $F$. The random function

$$F_n(x) = \frac{1}{n}\,\{\text{number of } X\text{'s that are} \le x\}$$

is called the empirical distribution function. For any collection of fixed x's ($-\infty = x_0 < x_1 < \cdots < x_k < \infty$), the random vector $(nF_n(x_1), n(F_n(x_2) - F_n(x_1)), \ldots, n(F_n(x_k) - F_n(x_{k-1})))$ obeys the multinomial distribution $\mathrm{Mu}(n; p_1, \ldots, p_k)$, where $p_j = F(x_j) - F(x_{j-1})$, $j = 1, \ldots, k$, provided that the p's are positive. In particular, the vector is asymptotically distributed according to the k-variate normal. The result is substantially strengthened as follows: the Glivenko-Cantelli theorem states that $\sup_x |F_n(x) - F(x)|$ converges to zero with probability 1 as $n$ tends to infinity. If $F(x)$ is continuous, then the random function $\sqrt{n}(F_n(t) - F(t))$ converges in distribution to a Gaussian process $X(t)$ such that $E(X(t)) = 0$ and $E(X(s)X(t)) = F(s)(1 - F(t))$ for $s \le t$. A Gaussian process $X(t)$, $0 \le t \le 1$, with this moment condition is called a Brownian bridge if $F(t) = t$, for $0 \le t \le 1$. If $F(x)$ is continuous, then the distributions of the random variables $C_n = \sqrt{n}\sup_x (F_n(x) - F(x))$ and $D_n = \sqrt{n}\sup_x |F_n(x) - F(x)|$ do not depend on $F$. Asymptotically, they have identical distributions with $\sup_t B(t)$ and $\sup_t |B(t)|$, respectively, where $B(t)$ is a Brownian bridge. We have

$$P(\sup_t B(t) < x) = 1 - e^{-2x^2},$$

$$P(\sup_t |B(t)| < x) = 1 + 2\sum_{k=1}^{\infty}(-1)^k e^{-2k^2x^2}, \qquad x > 0.$$

Let $\{X_1, \ldots, X_m\}$ and $\{Y_1, \ldots, Y_n\}$ be random samples from continuous distributions $F$ and $G$, respectively, and let $F_m(x)$ and $G_n(x)$ be their empirical distribution functions. Under the hypothesis $H_0: F = G$, the distribution of the Kolmogorov-Smirnov test statistic

$$D_{m,n} = \sup_x |F_m(x) - G_n(x)|$$

does not depend on $F$ (or $G$), and asymptotically, as $m \to \infty$ and $m/n \to \lambda \le 1$, the random function $\sqrt{m}(F_m(t) - G_n(t))$ converges in distribution to a Gaussian process $X(t)$ such that $E(X(t)) = 0$ and $E(X(s)X(t)) = (1 + \lambda)F(s)(1 - F(t))$, $s \le t$.

F. Edgeworth and Cornish-Fisher Expansions

Let $\{X_1, X_2, \ldots, X_n\}$ be a sample from a distribution with mean $\mu$ and variance $\sigma^2$. The random variable $(X_1 + X_2 + \cdots + X_n - n\mu)/\sqrt{n}\sigma$ is called the normalized sum of the sample. The distribution function $F_n(x)$ of the normalized sum of a sample from an absolutely continuous distribution $F$ with higher-order moments admits the Edgeworth expansion [15]: [...], where $\Phi$ and $\varphi$ are the cumulative distribution and the probability density functions, respectively,

of $N(0, 1)$, and $\delta$ is a quantity bounded by a constant depending on $F$ and $\nu$. $R_k(x)$ is the polynomial given by [...], where $H_k(x)$ is the Hermite polynomial of degree $k$, $\gamma_k$ is the cumulant (semi-invariant) of order $k$ of the distribution of $(X_i - \mu)/\sigma$, the summation extends over all nonnegative m's such that $m_1 + 2m_2 + \cdots + km_k = k$, and $l = m_1 + m_2 + \cdots + m_k$. In particular, we have $R_1(x) = -\gamma_3(x^2 - 1)/6$ and $R_2(x) = -\gamma_4(x^3 - 3x)/24 - \gamma_3^2(x^5 - 10x^3 + 15x)/72$. For a lattice distribution $F$ concentrated on $0, \pm 1, \pm 2, \ldots$ but not on $0, \pm\rho, \pm 2\rho, \ldots$ for any $\rho > 1$, the following expansion is valid for $x = 0, \pm 1, \pm 2, \ldots$: [...], where $z = (x - n\mu + 1/2)/\sqrt{n}\sigma$ and the Q's are suitable polynomials; $Q_1(z) = R_1(z)$ and $Q_2(z) = R_2(z) + z/24\sigma^2$.

The Edgeworth expansion makes it possible to derive asymptotic formulas for the relation between those $u$ and $v$ such that $F_n(u) = \Phi(v)$. If $F$ is an absolutely continuous distribution with moments of order $\nu$ ($\ge 3$), then we have the Cornish-Fisher expansions [16]: [...] and [...], where the A's and B's are polynomials derived from the R's of the Edgeworth expansion; $A_1(u) = -\gamma_3(u^2 - 1)/6$, $A_2(u) = -\gamma_4(u^3 - 3u)/24 + \gamma_3^2(4u^3 - 7u)/36$, $B_1(v) = \gamma_3(v^2 - 1)/6$, $B_2(v) = \gamma_4(v^3 - 3v)/24 - \gamma_3^2(2v^3 - 5v)/36$.

The expansions imply, in particular, that the random variable $u + \sum_{k=1}^{\nu-2}A_k(u)n^{-k/2}$ with $u = (X_1 + X_2 + \cdots + X_n - n\mu)/\sqrt{n}\sigma$ is asymptotically distributed according to $N(0, 1)$ and that the $100\alpha\%$ point $u_\alpha$ of $F_n$ is approximated by $v_\alpha + \sum_{k=1}^{\nu-2}B_k(v_\alpha)n^{-k/2}$, where $v_\alpha$ is the $100\alpha\%$ point of $N(0, 1)$. These approximations can be improved further in some cases by a suitable transformation of the sum $X_1 + X_2 + \cdots + X_n$. Thus, for example, if $X$ is distributed according to $\chi^2(n)$, then the Cornish-Fisher expansions with $\nu = 3$ are

$$v = u - \frac{\sqrt{2}}{3\sqrt{n}}(u^2 - 1) + O\!\left(\frac{1}{n}\right)$$

and

$$u_\alpha = v_\alpha + \frac{\sqrt{2}}{3\sqrt{n}}(v_\alpha^2 - 1) + O\!\left(\frac{1}{n}\right),$$

where $u = (X - n)/\sqrt{2n}$. However, the distribution of the random variable

$$\left(\left(\frac{X}{n}\right)^{1/3} - 1 + \frac{2}{9n}\right)\Big/\sqrt{\frac{2}{9n}}$$

is much better approximated by $N(0, 1)$, and

$$n\left(1 - \frac{2}{9n} + \sqrt{\frac{2}{9n}}\,v_\alpha\right)^3$$

gives a more accurate approximation to the $100\alpha\%$ point of the distribution $\chi^2(n)$. These are called the Wilson-Hilferty approximations (Proc. Nat. Acad. Sci. US, 17 (1931)).

The Edgeworth expansion was shown to be valid in more general situations by R. N. Bhattacharya and J. K. Ghosh [17]. In particular, they obtained the following: Let $\{X_1, X_2, \ldots, X_n\}$ be a random sample from a p-variate distribution with a nonzero absolutely continuous component w.r.t. Lebesgue measure on $R^p$. Let $f_0$ ($\equiv 1$), $f_1, \ldots, f_k$ be linearly independent, real-valued, and continuously differentiable functions. For $i = 1, \ldots, n$, put $Z_i = (f_1(X_i), f_2(X_i), \ldots, f_k(X_i))$, and assume that the distribution of $Z_1$ has moments up to the order $\nu$ ($\ge 3$). Let $H$ be a real-valued function on $R^k$ such that the $\nu$th order derivatives are continuous in a neighborhood of $\mu = E(Z_1)$. Let $V = (v_{ij})$, $i, j = 1, \ldots, k$, be the covariance matrix of the random vector $Z_1$, and put $\sigma^2 = \sum v_{ij}l_il_j$, where $l_i = \partial H(z)/\partial z_i|_{z=\mu}$, and $z = (z_1, \ldots, z_k)$. Then [...], where $\bar Z = \sum_i Z_i/n$, the supremum is taken over all Borel measurable sets $B$, $\varphi_\sigma(x)$ is the probability density function of the normal distribution $N(0, \sigma^2)$, and the P's are polynomials whose coefficients are independent of $n$.

G. Order Statistics

Let $\{X_1, \ldots, X_n\}$ be a random sample from a univariate distribution with continuous probability density $f(x)$ and distribution function $F(x)$, and let $X_{(1)} \le \cdots \le X_{(n)}$ be order statistics. The joint probability density of $Y_1 = X_{(\alpha)}$, $Y_2 =$

$X_{(\beta)}, \ldots, Y_{p-1} = X_{(\varepsilon)}$, and $Y_p = X_{(\eta)}$ is given by

$$\frac{n!}{(\alpha-1)!\,(\beta-\alpha-1)!\cdots(\eta-\varepsilon-1)!\,(n-\eta)!}\,(F(y_1))^{\alpha-1}(F(y_2) - F(y_1))^{\beta-\alpha-1}\cdots(F(y_p) - F(y_{p-1}))^{\eta-\varepsilon-1}(1 - F(y_p))^{n-\eta}\,f(y_1)f(y_2)\cdots f(y_p)$$

for $-\infty < y_1 < \cdots < y_p < \infty$, where $\alpha < \beta < \cdots < \varepsilon < \eta$. If for given constants $0 < \lambda_1 < \cdots < \lambda_p < 1$, each of the subscripts $\alpha, \ldots, \eta$ tends to infinity as $n \to \infty$ under the conditions $r_i = n\lambda_i + o(\sqrt{n})$, where $\alpha = r_1, \ldots, \eta = r_p$, then the random vector $(Y_1, \ldots, Y_p)$ asymptotically obeys the p-dimensional normal distribution with mean vector $(\xi_1, \ldots, \xi_p)$ and covariance matrix $n^{-1}(\sigma_{ij})$, where $\sigma_{ij} = \sigma_{ji} = \lambda_i(1 - \lambda_j)/f(\xi_i)f(\xi_j)$ for $i \le j$ and $\xi_i$ is the $\lambda_i$-quantile of the population, defined by $\lambda_i = F(\xi_i)$.

Suppose that there exist two sequences $a_n$ and $b_n$ of real and positive numbers, respectively, such that as $n \to \infty$ the sequence $(X_{(n)} - a_n)/b_n$ converges in distribution to a nondegenerate distribution $G$. The underlying distribution $F$ is said to belong to the domain of attraction of the limiting distribution $G$ (DA($G$), for short). Except for the change of location and scale, only the following three distributions have nonempty domains of attraction:

$$G_1(x) = \begin{cases} 0, & x \le 0, \\ e^{-x^{-\gamma}}, & x > 0; \end{cases} \qquad G_2(x) = \begin{cases} e^{-(-x)^{\gamma}}, & x < 0, \\ 1, & x \ge 0; \end{cases} \qquad G_3(x) = e^{-e^{-x}},$$

with $\gamma > 0$. Writing $\bar F(x) = 1 - F(x)$, $\alpha(F) = \inf\{x \mid F(x) > 0\}$ and $\omega(F) = \sup\{x \mid F(x) < 1\}$, we have the following theorem (B. V. Gnedenko, Ann. Math., 44 (1943)):

$F \in \mathrm{DA}(G_1)$ iff $\omega(F) = \infty$ and there exists an $a$ such that $\lim_{u\to\infty}\bar F(a + ux)/\bar F(a + u) = x^{-\gamma}$ for all $x$;

$F \in \mathrm{DA}(G_2)$ iff $a = \omega(F) < \infty$ and $\lim_{u\to 0}\bar F(a - ux)/\bar F(a - u) = x^{\gamma}$ for all $x > 0$;

$F \in \mathrm{DA}(G_3)$ iff there exists a positive function $R(t)$ such that $\lim_{t\to\omega(F)}\bar F(t + xR(t))/\bar F(t) = e^{-x}$ for all $x$.

If $F(x)$ is twice differentiable, $f(x) = F'(x)$ is positive for sufficiently large $x$, and $\lim_{x\to\omega(F)}\frac{d}{dx}\{(1 - F(x))/f(x)\} = 0$, then $F \in \mathrm{DA}(G_3)$. Noticing the relation $X_{(1)} = -\max\{-X_1, -X_2, \ldots, -X_n\}$, we can also derive the possible limiting distributions for the sequence $(X_{(1)} - a_n)/b_n$ and their domains of attraction. The statistics $R_n = X_{(n)} - X_{(1)}$ and $M_n = (X_{(1)} + X_{(n)})/2$ are called the range and the midrange of the sample, respectively. If, for some $a_n$, $a_n'$, and $b_n$, both sequences $(X_{(n)} - a_n)/b_n$ and $(X_{(1)} - a_n')/b_n$ converge to nondegenerate distributions $G$ and $H$, respectively, then they are asymptotically independent, and we have

$$\lim \Pr\{(R_n - a_n + a_n')/b_n < x\} = \int_{-\infty}^{\infty}(1 - H(y - x))\,dG(y),$$

$$\lim \Pr\{(2M_n - a_n - a_n')/b_n < x\} = \int_{-\infty}^{\infty}H(x - y)\,dG(y).$$

H. Characterization of the Distribution by Means of a Property of the Sampling Distribution

A distribution or a family of distributions can be characterized by a property of the sampling distribution of a suitable statistic. Let $\{X_1, X_2, \ldots, X_n\}$ be a random sample from a nondegenerate distribution $F$, and let $X_{(1)} \le \cdots \le X_{(n)}$ be the order statistics. The sample mean $\bar X$ is independent of the sample variance $S^2 = \sum_i (X_i - \bar X)^2/(n-1)$ iff $F$ is normal $N(\mu, \sigma^2)$ (Kawata and Sakamoto, J. Math. Soc. Japan, 1 (1949)). Let $a_{ij}$, $i, j = 1, \ldots, n$, be real numbers such that $\sum a_{ij} = 0$ and $\sum a_{ii} \neq 0$. If $F$ has a finite second moment, then the condition $E(\sum a_{ij}X_iX_j \mid \bar X) = \text{const.}$ implies that $F$ is normal. Two linear statistics $L_1 = a_1X_1 + \cdots + a_nX_n$ and $L_2 = b_1X_1 + \cdots + b_nX_n$ are independent only if $F$ is normal, provided that $a_jb_j \neq 0$ for some $j$. In fact, the X's need not be identically distributed: If $L_1$ and $L_2$ are independent and $a_jb_j \neq 0$, then the distribution of $X_j$ is normal (Skitovich-Darmois theorem). Yu. V. Linnik [23] gave a necessary and sufficient condition for the normality of $F$ to be equivalent to the identity of the distributions of $L_1$ and $L_2$. The condition is stated in terms of the zeros of the entire function $\sigma(z) = |a_1|^z + \cdots + |a_n|^z - |b_1|^z - \cdots - |b_n|^z$. The result contains as a special case the following characterization theorem for the normal distribution: If $\sum a_j^2 = 1$ and $L_1$ has a distribution identical to that of $X_1$, then $F$ is normal $N(\mu, \sigma^2)$ with $\mu(\sum a_j - 1) = 0$. R. Shimizu gave a complete description of the characteristic function of the distribution for which $L_1$ has the same distribution as $X_1$. In particular, it was proved that if $\log|a_1|/\log|a_2|$ is an irrational number, $\alpha$ is the positive number given by $\sum |a_j|^\alpha = 1$, and if $L_1$ has a distribution identical to that of $X_1$, then $F$ is the stable distribution with characteristic

exponent $\alpha$. The result was extended in [24] to the cases where the a's are random variables independent of the X's. If $E(X_1) = 0$ and $E(\bar X \mid X_1 - \bar X, X_2 - \bar X, \ldots, X_n - \bar X) = 0$, then $F$ is normal [26]. Let $\mu_1, \mu_2, \ldots, \mu_n$ be a set of real numbers. If the sampling distribution of the statistic $\sum_i (X_i + \mu_i)^2$ depends on the $\mu$'s only through $\sum_i \mu_i^2$, then $F$ is normal. If $X_i$ is positive and has finite mean, then the condition $E(\bar X \mid X_2/X_1, X_3/X_1, \ldots, X_n/X_1) = \text{const.}$ implies that $F$ is the gamma distribution. If the distribution $F$ is not concentrated on a lattice $0, \rho, 2\rho, \ldots$, then $E(X_{(k+1)} - X_{(k)} \mid X_{(k)}) = \text{const.}$ for some $k$ implies that $F$ is the exponential distribution: $F(x) = 1 - e^{-\lambda x}$, $x > 0$. $X_{(k+1)} - X_{(k)}$ has the identical distribution for some $k$ with $\min\{X_1, X_2, \ldots, X_{n-k}\}$ iff $F$ is exponential. If $a_1, a_2, \ldots, a_n$ are positive numbers such that $a_1 + a_2 + \cdots + a_n = 1$ and such that $\log a_1/\log a_2$ is irrational, then the sampling distribution of $\min\{X_1/a_1, X_2/a_2, \ldots, X_n/a_n\}$ is the same as that of $X_1$ iff $F$ is exponential. $X_{(1)}$ is independent of $X_{(1)} - \bar X$ iff $F$ is exponential.

Suppose that the distribution $F$ has a bounded density function and that the integral $\int e^{tx}\,dF(x)$ is finite on a neighborhood of $t = 0$. Let $\{X_1, X_2, \ldots, X_n\}$ be a random sample from a distribution of the family $\mathscr{P} = \{F((x - \mu)/\sigma) \mid -\infty < \mu < \infty, \sigma > 0\}$. For $n \ge 9$, the sampling distribution of the statistics [...] uniquely determines the family $\mathscr{P}$. If $F$ is symmetric, then for $n \ge 6$ the distribution of [...] uniquely determines $\mathscr{P}$ [25].

I. U-Statistics

Let $\{X_1, \ldots, X_n\}$ be a random sample from a certain distribution, and let $\varphi(x_1, \ldots, x_m)$ be a real-valued function that is symmetric with respect to the arguments $x_1, \ldots, x_m$. The statistic

$$U = \binom{n}{m}^{-1}\sum \varphi(X_{\alpha_1}, \ldots, X_{\alpha_m}),$$

where the summation extends over all combinations $(\alpha_1, \ldots, \alpha_m)$ taken from $(1, 2, \ldots, n)$, is called a U-statistic. Let $\theta = E(\varphi(X_1, \ldots, X_m))$, and assume that $E(\varphi(X_1, \ldots, X_m)^2)$ is finite. Then the mean and variance of $U$ are given by

$$E(U) = \theta, \qquad \operatorname{Var}(U) = \binom{n}{m}^{-1}\sum_{i=1}^{m}\binom{m}{i}\binom{n-m}{m-i}\zeta_i,$$

where $\zeta_i$ is the covariance between $\varphi(X_1, \ldots, X_m)$ and $\varphi(X_1, \ldots, X_i, X_{i+1}', \ldots, X_m')$, with $X_{i+1}', \ldots, X_m'$ an additional independent random sample of size $m - i$ from the same distribution. If $\zeta_1 \neq 0$, then $U$ obeys the distribution $N(\theta, m^2\zeta_1/n)$ asymptotically as $n \to \infty$ (W. Hoeffding, Ann. Math. Statist., 19 (1948)).

These results can be generalized to the case of several populations and samples. Let $X_{11}, \ldots, X_{1n_1}; \ldots; X_{c1}, \ldots, X_{cn_c}$ be $c$ independent random samples, each drawn from one of $c$ populations. Assume that a real-valued function $\varphi(x_{11}, \ldots, x_{1m_1}; \ldots; x_{c1}, \ldots, x_{cm_c})$ is symmetric with respect to the arguments $x_{i1}, \ldots, x_{im_i}$ for each $i = 1, \ldots, c$. Then

$$U = \prod_{i=1}^{c}\binom{n_i}{m_i}^{-1}\sum \varphi(X_{1\alpha(11)}, \ldots, X_{1\alpha(1m_1)}; \ldots; X_{c\alpha(c1)}, \ldots, X_{c\alpha(cm_c)}),$$

where the summation extends over all combinations $(\alpha(i1), \ldots, \alpha(im_i))$ of $m_i$ numbers taken from $(1, 2, \ldots, n_i)$ for each $i = 1, \ldots, c$, is called a U-statistic. The mean and variance of $U$ can be obtained as before, while $U$ is asymptotically normally distributed as the sample sizes $n_1, \ldots, n_c$ tend to infinity in fixed proportion. Furthermore, if there are given several U-statistics, their joint distribution is asymptotically normal.

J. Distributions Having Monotone Likelihood Ratio, and Pólya-Type Distributions

Let $(\mathscr{X}, \mathfrak{B})$ be a sample space and $\mathscr{P} = \{p_\theta(x) \mid \theta \in \Omega\}$ be a family of probability densities with respect to a fixed $\sigma$-finite measure. The function $p_\theta(x)$ regarded as a function of $\theta$ with a fixed observed value of $x$ is called the likelihood function, and its value at a particular point $\theta$ is called the likelihood of that point. The family $\mathscr{P}$ with $\Omega \subset R$ is said to have monotone likelihood ratio with respect to a real-valued function $T(x)$ if and only if for any $\theta < \theta'$ such that $\theta$ and $\theta'$ belong to $\Omega$ the ratio $p_{\theta'}(x)/p_\theta(x)$ is a nondecreasing function of $T(x)$. Under the assumption that $\mathscr{X} \subset R$ and $\partial^2\log p_\theta(x)/\partial\theta\,\partial x$ exists, a necessary and sufficient condition for $\mathscr{P}$ to have monotone likelihood ratio with respect to $T(x) \equiv x$ is that $\partial^2\log p_\theta(x)/\partial\theta\,\partial x \ge 0$ for any $x$ and $\theta$. If $\{X_1, \ldots, X_n\}$ is a random sample from a distribution that has a monotone likelihood ratio and if a real-valued function $\psi(x_1, \ldots, x_n)$ is nondecreasing in each of its arguments, then the expectation $E_\theta\psi(X_1, \ldots, X_n)$ is a nondecreasing function of $\theta$.
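The monotone likelihood ratio property can be probed numerically on a grid. The sketch below is an illustration only, with hypothetical function names and two assumed example families not discussed in the article: the normal location family $N(\theta, 1)$, whose ratio $p_{\theta'}(x)/p_\theta(x)$ is increasing in $x$ for $\theta < \theta'$, and the Cauchy location family, whose ratio is not monotone. A grid check can refute monotonicity but cannot prove it.

```python
import math


def normal_pdf(x, theta):
    """Density of N(theta, 1) at x."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2.0 * math.pi)


def cauchy_pdf(x, theta):
    """Density of the standard Cauchy distribution shifted by theta."""
    return 1.0 / (math.pi * (1.0 + (x - theta) ** 2))


def ratio_is_nondecreasing(pdf, theta, theta_prime, grid):
    """Check on an increasing grid whether pdf(x, theta') / pdf(x, theta) never decreases."""
    ratios = [pdf(x, theta_prime) / pdf(x, theta) for x in grid]
    return all(b >= a - 1e-12 for a, b in zip(ratios, ratios[1:]))


# increasing grid of x values on [-5, 5]
GRID = [i / 10.0 for i in range(-50, 51)]
```

Calling `ratio_is_nondecreasing(normal_pdf, 0.0, 1.0, GRID)` and the same with `cauchy_pdf` contrasts a family that has the property with one that lacks it.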

The family $\mathscr{P}$ is said to be of Pólya type n if and only if for any $m = 1, 2, \ldots, n$ and any real numbers $x_1 < \cdots < x_m$ and $\theta_1 < \cdots < \theta_m$, the determinant of the matrix $(p_{\theta_i}(x_j))$, $i, j = 1, \ldots, m$, is nonnegative, and $\mathscr{P}$ is said to be strictly of Pólya type n if the determinant is positive. Being of Pólya type 2 is equivalent to having a monotone likelihood ratio. If $\mathscr{P}$ is (strictly) of Pólya type n for any positive integer $n$, then it is said to be (strictly) of Pólya type. An exponential family of distributions with probability density $p_\theta(x) = \exp(\theta x + a(\theta) + s(x))$ for $x \in \mathscr{X} \subset R$ and $\theta \in \Omega \subset R$ is strictly of Pólya type. Each of the noncentral chi-square distribution, noncentral t-distribution, and noncentral F-distribution is strictly of Pólya type with respect to the noncentrality parameter.

References

[1] H. Cramér, Mathematical methods of statistics, Princeton Univ. Press, 1946.
[2] S. S. Wilks, Mathematical statistics, Wiley, 1962.
[3] M. G. Kendall and A. Stuart, The advanced theory of statistics I, Griffin, fifth edition, 1952.
[4] N. L. Johnson and S. Kotz, Distributions in statistics, 4 vols., Wiley, 1969-1972.
[5] Student (W. G. Gosset), The probable error of a mean, Biometrika, 6 (1908), 1-25.
[6] R. A. Fisher, Statistical methods for research workers, Oliver & Boyd, 1925.
[7] T. W. Anderson, An introduction to multivariate statistical analysis, Wiley, 1958.
[8] A. T. James, Distributions of matrix variates and latent roots derived from normal samples, Ann. Math. Statist., 35 (1964), 475-501.
[9] A. G. Constantine, Some non-central distribution problems in multivariate analysis, Ann. Math. Statist., 34 (1963), 1270-1285.
[10] H. Chernoff, Large-sample theory: Parametric case, Ann. Math. Statist., 27 (1956), 1-22.
[11] K. Pearson, On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling, Philos. Mag., (5) 50 (1900). Reproduced in K. Pearson, Early statistical papers, Cambridge Univ. Press, 1948.
[12] K. R. Parthasarathy, Probability measures on metric spaces, Academic Press, 1967.
[13] P. Billingsley, Convergence of probability measures, Wiley, 1968.
[14] R. H. Randles and D. A. Wolfe, Introduction to the theory of nonparametric statistics, Wiley, 1979.
[15] F. Y. Edgeworth, The law of errors, Trans. Cambridge Philos. Soc., 20 (1904).
[16] E. A. Cornish and R. A. Fisher, Moments and cumulants in the specification of distributions, Rev. Int. Statist. Inst., 5 (1937). Reproduced in R. A. Fisher, Contributions to mathematical statistics, Wiley, 1950.
[17] R. N. Bhattacharya and J. K. Ghosh, On the validity of the formal Edgeworth expansion, Ann. Statist., 6 (1978), 434-451.
[18] R. N. Bhattacharya and R. Ranga Rao, Normal approximation and asymptotic expansions, Wiley, 1976.
[19] E. J. Gumbel, Statistics of extremes, Columbia Univ. Press, 1958.
[20] A. E. Sarhan and B. G. Greenberg (eds.), Contributions to order statistics, Wiley, 1962.
[21] H. A. David, Order statistics, Wiley, second edition, 1981.
[22] J. Galambos, The asymptotic theory of extreme order statistics, Wiley, 1978.
[23] Yu. V. Linnik, Linear forms and statistical criteria, Selected Transl. Math. Statist. Prob., 3 (1962), 1-90. (Original in Russian, 1953.)
[24] R. Shimizu and L. Davies, General characterization theorems for the Weibull and the stable distributions, Sankhyā, ser. A, 43 (1981), 282-310.
[25] Yu. V. Prokhorov, On a characterization of a class of probability distributions by distributions of some statistics, Theory of Prob. Appl., 10 (1965), 438-445. (Original in Russian, 1965.)
[26] A. M. Kagan, Yu. V. Linnik, and C. R. Rao, Characterization problems in mathematical statistics, Wiley, 1973.
[27] J. Galambos and S. Kotz, Characterizations of probability distributions, Springer, 1978.
[28] G. P. Patil, S. Kotz, and J. K. Ord (eds.), Statistical distributions in scientific work III, Characterizations and applications, Reidel, 1975.
[29] S. Karlin, Decision theory for Pólya type distributions, Case of two actions I, Proc. 3rd Berkeley Symp. Math. Stat. Prob. I, Univ. of California Press, 1956, 115-128.

375 (XX.26)

Scattering Theory

A. General Remarks

The path of a moving (incident) particle is distorted when it interacts with another (target) particle, such as an atom or a molecule. Phenomena of this sort are generally called

scattering. Scattering is called elastic when the internal properties of the incident particle and the target remain unchanged after the collision, and inelastic when the internal properties change, other particles are emitted, or the two particles form a bound state.

The extent of scattering depends on the sizes of the incident and target particles. The scattering cross section is defined as the probability that the incident beam will be scattered per unit time (normalized to one particle per unit time crossing unit area perpendicular to the direction of incidence). In classical mechanics the scattering cross section is equal to the cross section of the target perpendicular to the incoming beam, hence the term "cross section." The probability of scattering into a unit solid angle in a particular direction is called the differential cross section. The probability that the incoming particle is absorbed by the target, called the absorption cross section, is intimately connected with the scattering cross section. Analyses of scattering give information on the structure and interactions of atoms, molecules, and elementary particles. One can also study the scattering of acoustic and electromagnetic waves by inhomogeneous media and obstacles, by considering notions similar to the above.

Scattering theory may be dated back to Lord Rayleigh. Since the advent of quantum mechanics in the mid-1920s, scattering problems, mainly for central (spherically symmetric) potentials, have been investigated strenuously by physicists. It may be said, however, that a scattering theory having mathematically rigorous foundations began around the 1950s, when the pioneering work of K. Friedrichs (Comm. Pure Appl. Math., 1 (1948)), A. Ya. Povzner (Mat. Sb., 32 (1953)), T. Kato (J. Math. Soc. Japan, 9 (1957)), J. M. Cook (J. Math. Phys., 36 (1957)), and J. M. Jauch (Helv. Phys. Acta, 31 (1958)), among others, appeared, and scattering theory has now grown into a branch of mathematical physics.

General references for mathematical scattering theory are, e.g., [1-5].

[...] criteria for "free" and "noninteracting.") $H_0$ is assumed to be absolutely continuous, which is the case in most practical situations. Then the outgoing and incoming wave operators $W_\pm = W_\pm(H, H_0)$ are defined, if they exist, by

$$W_\pm = \operatorname*{s-lim}_{t\to\pm\infty} e^{itH}e^{-itH_0} \qquad (\text{s-lim} = \text{strong limit}).$$

This means that given any free motion $e^{-itH_0}u$ there is an initial ($t = 0$) state $u_\pm$ ($= W_\pm u$) such that $e^{-itH}u_\pm$ and $e^{-itH_0}u$ are asymptotically equal at $t = \pm\infty$. $W_\pm$ are isometric, intertwine the two dynamics: $e^{-itH}W_\pm = W_\pm e^{-itH_0}$, and map $\mathscr{H}$ (which is nothing but the absolutely continuous subspace $\mathscr{H}_{ac}(H_0)$ for $H_0$) onto a closed subspace of $\mathscr{H}_{ac}(H)$. The scattering operator $S$ is defined as $S = W_+^*W_-$ ($A^*$ is the Hilbert-space adjoint of $A$). $S$ commutes with $H_0$, and maps states in the remote past into states in the distant future. One of the most important problems in scattering theory is to prove the unitarity of $S$, or equivalently, $\operatorname{Ran} W_+ = \operatorname{Ran} W_-$ (Ran = range = image). $W_\pm$ is called complete if $\operatorname{Ran} W_\pm = \mathscr{H}_{ac}(H)$. The completeness of $W_\pm$ implies that $S$ is unitary.

As a typical example we consider the 1-body problem. Note that the 2-body problem reduces to the 1-body problem by separating out the center-of-mass motion, which is free. Then $H_0 = -\Delta$ (the negative Laplacian in $R^3$), $H = -\Delta + V$, the operator $V$ being multiplication by a real-valued function $V(x)$, called the potential, and $\mathscr{H} = L_2(R^3)$. If $V$ is short range, i.e., if, roughly speaking, $V(x) = O(|x|^{-1-\varepsilon})$ ($\varepsilon > 0$) at $\infty$, the wave operators are known to exist and to be complete (S. Agmon, Ann. Scuola Norm. Sup. Pisa, (4) 2 (1975)). If the potential $V(x)$ is long range, i.e., if, roughly speaking, $V(x) = O(|x|^{-\varepsilon})$ ($\varepsilon > 0$) at $\infty$, then the foregoing definition of the wave operators has to be modified. For the Coulomb potential $V(x) = c/|x|$, for instance, one can adopt the following definition of modified wave operators:

$$W_D^\pm = \operatorname*{s-lim}_{t\to\pm\infty} e^{itH}\exp\!\left(-itH_0 - i(c/2)H_0^{-1/2}\log|t|\right).$$

It can be shown that the $W_D^\pm$ exist (which implies that the ordinary wave operators do
not exist) and are complete: Ran @* = Za,(H)
(J. D. Dollard, in [6]). The same result obtains
B. Wave and Scattering Operators for more general long-range potentials (H.
Kitada, J. Math. Sot. Japan, 30 (1978); T.
In tquantum mechanics the dynamics of Ikebe and H. Isozaki, Integral Equations and
an interacting system is given by a tone- Operator Theory, 5 (1982)).
parameter group of unitary operators emitH, If the wave operators exist and are com-
where t denotes the time and H, called the plete, they give tunitary equivalence between
Hamiltonian of the system, is a tself-adjoint the tabsolutely continuous parts of Ho and H
operator acting in a tHilbert space X. Ele- (- T. Kato [7]; 331 Perturbation of Linear
ments of 2 represent (pure) states of the sys- Operators).
tem. Let H,, be the “free” Hamiltonian of the In the foregoing discussion it was tacitly
corresponding “noninteracting” system. (There assumed that in dealing with scattering phe-
are at present no generally accepted definite nomena we adhere to states in JZ’~,(H,) and

$\mathcal{H}_{ac}(H)$. A more physically intuitive definition in the case of potential scattering ($H_0 = -\Delta$, $H = H_0 + V$) of scattering states $\Sigma_\pm(H)$ is the following: $u \in \Sigma_\pm$ if and only if for any $r > 0$,

$$\lim_{t\to\pm\infty}\|F_r e^{-itH}u\| = 0 \quad (\|\ \| = L_2\text{-norm}),$$

where $F_r$ is the (projection) operator of multiplication by the characteristic function of $\{x \in \mathbf{R}^3 \mid |x| < r\}$. In general no inclusion relations between $\Sigma_\pm(H)$ and $\mathcal{H}_{ac}(H)$ are known. But for a wide class of potentials it is known that $\mathcal{H}_{ac}(H)$ as well as the †continuous subspace of $H$ coincides with $\Sigma_\pm(H)$ (in this case there is no †singular continuous spectrum) (W. O. Amrein and V. Georgescu, Helv. Phys. Acta, 46 (1973); C. Wilcox, J. Functional Anal., 12 (1973); Amrein, in [8]).

A purely abstract result in scattering theory may be noted. Let $H_0$ and $H$ be self-adjoint operators in an abstract Hilbert space $\mathcal{H}$ such that $V = H - H_0$ is a †trace-class (†nuclear) operator. Then the generalized wave operators

$$W_\pm(H, H_0) = \operatorname*{s-lim}_{t\to\pm\infty}\exp(itH)\exp(-itH_0)P_{ac}(H_0)$$

exist, where $P_{ac}(H_0)$ is the †projection onto $\mathcal{H}_{ac}(H_0)$. Since this statement is symmetric in $H_0$ and $H$, the "inverse" generalized wave operators $W_\pm(H_0, H)$ also exist, from which one can conclude that $W_\pm(H, H_0)$ are complete. Moreover, the invariance principle holds, which means roughly the following: If $\phi$ is a strictly increasing function on $\mathbf{R}$, then $W_\pm(\phi(H), \phi(H_0))$ exists and is equal to $W_\pm(H, H_0)$. This result can be applied to potential scattering when $V(x) \in L_1(\mathbf{R}^3) \cap L_2(\mathbf{R}^3)$ (Kato [7]; D. B. Pearson, J. Functional Anal., 28 (1978)).

C. Stationary (Time-Independent) Approach

We again consider the 1-body problem as in Section B. $V(x)$ is assumed to verify certain appropriate decay conditions at $\infty$ as the case may be. Consider the †resolvents $R_0(z) = (H_0 - z)^{-1}$ and $R(z) = (H - z)^{-1}$ for $z \in \mathbf{C} - \mathbf{R}$, which are well-defined bounded integral operators on $\mathcal{H} = L_2(\mathbf{R}^3)$. Here we note the following: $[0, \infty)$ is the †continuous spectrum of $H_0$ and $H$, $(-\infty, 0)$ is contained in the †resolvent set of $H_0$, and $H$ has possibly †discrete †eigenvalues in $[-a, 0)$ with $(-\infty, -a)$ contained in the resolvent set, where $a$ is a positive number. When $z$ approaches a positive real value, $R_0(z)$ and $R(z)$ do not have limits as †bounded operators on $\mathcal{H}$. But if we regard them as operators from $L_{2,\gamma}$ to $L_{2,-\gamma}$ ($L_{2,\alpha} = \{u \mid (1 + |x|)^{\alpha}u(x) \in L_2(\mathbf{R}^3)\}$), $\gamma > 1/2$, they can be shown to have boundary values $R_0(\lambda \pm i0)$ and $R(\lambda \pm i0)$, $\lambda > 0$ (limiting absorption principle; → Agmon, loc. cit., for short-range potentials; for long-range potentials → R. Lavine, J. Functional Anal., 12 (1973); T. Ikebe and Y. Saito, J. Math. Kyoto Univ., 12 (1972); and Saito, Publ. Res. Inst. Math. Sci., 9 (1974)). With these boundary values we can define "stationary" wave operators whose range is easily proved to coincide with $\mathcal{H}_{ac}(H)$, and show their equality to the time-dependent wave operators discussed in Section B, thus obtaining the completeness of the latter.

$H_0$ is known to have generalized (improper) eigenfunctions $\varphi_0(x, \xi) = e^{ix\cdot\xi}$ with generalized (improper) eigenvalues $|\xi|^2$. The associated eigenfunction expansion is nothing but the Fourier integral expansion. An eigenfunction expansion, similar to the Fourier expansion, that diagonalizes $H$ can be obtained by using generalized eigenfunctions

$$\varphi_\pm(x, \xi) = \varphi_0(x, \xi) - \bigl(R(|\xi|^2 \pm i0)V\varphi_0(\cdot, \xi)\bigr)(x),$$

which are the solutions to the Lippmann-Schwinger equation

$$\varphi_\pm(x, \xi) = e^{ix\cdot\xi} - \frac{1}{4\pi}\int_{\mathbf{R}^3}\frac{e^{\pm i|\xi||x-y|}}{|x-y|}V(y)\varphi_\pm(y, \xi)\,dy.$$

A rough statement of this is the following: Let $\hat{u}_\pm(\xi) = (2\pi)^{-3/2}\int\overline{\varphi_\pm(x, \xi)}\,u(x)\,dx$. Then $\|\hat{u}_\pm\| = \|u\|$, $(Hu)^{\wedge}_\pm(\xi) = |\xi|^2\hat{u}_\pm(\xi)$, and $u(x) = (2\pi)^{-3/2}\int\varphi_\pm(x, \xi)\hat{u}_\pm(\xi)\,d\xi$ (→ e.g. [4, XI.6] for a more precise statement).

In view of the fact that $S$ commutes with $H_0$, we can show that $S$ admits the following representation: Let $\hat{u}(\xi) = (2\pi)^{-3/2}\int\overline{\varphi_0(x, \xi)}\,u(x)\,dx$ be the †Fourier transform of $u$. Then

$$(Su)^{\wedge}(\xi) = \hat{u}(\xi) - \pi i|\xi|\int_{S^2}T(\xi, |\xi|\omega')\,\hat{u}(|\xi|\omega')\,d\omega',$$

where

$$T(\xi, \xi') = (2\pi)^{-3}\int\overline{\varphi_0(x, \xi)}\,V(x)\,\varphi_+(x, \xi')\,dx$$

is the kernel of the so-called T-operator, which is a †compact operator on $L_2(S^2)$ under suitable conditions on $V(x)$. $T(\xi, \xi')$ is related to the experimentally measurable total cross section (for incident momentum $\xi$) $\sigma(\xi)$:

$$\sigma(\xi) = \int_{S^2}|f(|\xi|; \omega, \omega')|^2\,d\omega' \quad (\omega = \xi/|\xi|),$$

$$f(\lambda; \omega, \omega') = -2\pi^2\,T(\lambda\omega, \lambda\omega') \quad (\lambda > 0).$$

The quantity $f(\lambda; \omega, \omega')$ is called the scattering amplitude and appears in the asymptotic

expansion of $\varphi_+(x, \xi)$ as

$$\varphi_+(x, \xi) \simeq e^{ix\cdot\xi} + \frac{e^{i|\xi||x|}}{|x|}\,f(|\xi|; \hat{x}, \hat{\xi}) \quad (|x| \to \infty),$$

where $\hat{a} = a/|a|$ for $a \in \mathbf{R}^3$.

An abstract version of the limiting absorption principle and eigenfunction expansion is known as the Kato-Kuroda theory, for which the reader is referred to Kato and S. T. Kuroda in [6] and in Functional analysis and related fields, Springer, 1970, and to Kuroda (J. Math. Soc. Japan, 25 (1973)).

D. Time-Dependent Approach

Consider the same situation as in Section C. Since scattering is a time-dependent phenomenon, it seems natural to develop scattering theory in a time-dependent fashion. Indeed, there is an approach to the completeness of wave operators that does not resort to any eigenfunction expansion results, but instead follows the temporal development of the wave packet $e^{-itH}u$. The completeness of $W_\pm$ will be established if one can show that any $u \in \mathcal{H}_{ac}(H)$ orthogonal to $\operatorname{Ran} W_\pm$ is 0. A crucial step to prove this is to find a clever decomposition of a wave packet into an outgoing and an incoming one, or to find projections $P_\pm$ such that $P_+ + P_- = I$ and $P_\pm e^{-itH_0}$ goes to 0 as $t \to \mp\infty$. Some compactness arguments are also needed. To construct such a decomposition or projections one looks at the scalar product $x\cdot\xi$, $x$ and $\xi$ being the position and momentum (operators), respectively. The main idea is that if this is positive, the particle will be outgoing (to infinity), and if negative, incoming (from infinity). But since we work in the framework of quantum mechanics, this classical-mechanical intuition should be properly modified.

Besides the completeness of wave operators it can also be shown through the above approach that the singular continuous spectrum of $H$ is absent. For details → [4, XI.17] and V. Enss, Comm. Math. Phys., 61 (1978).

E. Partial Wave Expansion

In this section we assume that the potential $V(x)$ is central, i.e., $V(x)$ is a function of $|x|$ alone. Then the scattering operator $S$ turns out to commute not only with $H_0$ but also with the †angular momentum operator $L = x \times i^{-1}\nabla$ (vector product). The eigenvalues of $L^2 = L\cdot L$ (scalar product) are $l(l+1)$ ($l = 0, 1, 2, \ldots$), and those of $L_3$, the third component of $L$, are $m = -l, -(l-1), \ldots, l-1, l$ if $L^2$ has eigenvalue $l(l+1)$, while simultaneous eigenfunctions are given by suitably normalized †spherical harmonics $Y_{lm}(\omega)$, $\omega \in S^2$. Let $E_l$ be the projection onto the subspace spanned by functions of the form $\sum_{m=-l}^{l}u_m(r)Y_{lm}(\omega)$ ($r = |x| > 0$, $\omega = x/r$). $\mathcal{H}$ becomes an orthogonal sum of $\mathcal{H}_l = E_l\mathcal{H}$. The aforementioned commutation property claims that $E_l S = S E_l = S_l$, and the operator $S_l$ reduces to multiplication by a scalar function $e^{2i\delta_l(\lambda)}$ in $\mathcal{H}_l$ (→ Section C). $\delta_l(\lambda)$ is called the phase shift. Defining the partial wave scattering amplitude $f_l(\lambda) = (2i\lambda)^{-1}(e^{2i\delta_l(\lambda)} - 1)$, one obtains the partial wave expansion of the scattering amplitude:

$$f(\lambda; \omega, \omega') = \sum_{l=0}^{\infty}(2l+1)f_l(\lambda)P_l(\cos\theta),$$

where $\theta$ is the angle between $\omega$ and $\omega'$, and $P_l$ is a †Legendre polynomial. The total cross section is $\sigma(\lambda) = 4\pi\sum_{l=0}^{\infty}(2l+1)|f_l(\lambda)|^2 = 4\pi\lambda^{-2}\sum_{l=0}^{\infty}(2l+1)\sin^2\delta_l(\lambda)$ [1-5, 9].

F. Many-Body Problem (Multichannel Scattering)

We consider only the 3-body case, which is complicated enough compared with the 2- (essentially 1-) body case. The complications are both kinematical and dynamical. The configuration of a 3-body system is given by a point in $\mathbf{R}^9$. Once we choose the center-of-mass coordinates, there is no kinematically natural way to choose the remaining six coordinates. In the 2-body case a freely moving particle in the remote past will be freely moving in the distant future. But in the 3-body case there come into play various other dynamical processes, such as capture, breakup, rearrangement, and excitation.

The 3-body Hamiltonian is a self-adjoint operator in $L_2(\mathbf{R}^9)$ of the form

$$\widetilde{H} = -\sum_{i=1}^{3}\frac{1}{2m_i}\Delta_i + \sum_{i<j}V_{ij}(x_i - x_j),$$

where $\Delta_i$ is the 3-dimensional Laplacian associated with particle $i$, $m_i$ is the mass of particle $i$, and each $V_{ij}(x)$ ($= V_{ji}(x) = V_{ij}(-x)$) is a real-valued function decaying at $\infty$ in $\mathbf{R}^3$ (not in $\mathbf{R}^9$). If we remove the center-of-mass motion, $\widetilde{H}$ can be written in the form

$$\widetilde{H} = H_0 \otimes I + I \otimes H \quad (\otimes = \text{†tensor product}),$$

where $H_0$ is the center-of-mass Hamiltonian in $L_2(\mathbf{R}^3)$ representing the uniform free motion of the center of mass, and $H$ is the Hamiltonian of relative motion in $L_2(\mathbf{R}^6)$. One should note as mentioned above that there is no unique natural way of choosing coordinates in $\mathbf{R}^6$ and representing $H$, but there are many equivalent representations. Suppose, for instance, that particles 1 and 2 and particles 1 and 3 form †bound states (12) and (13) and that there are

no bound states between 2 and 3. We partition the whole system into clusters: (1)(2)(3), (12)(3), and (13)(2) (( ) represents a cluster and figures in ( ) are the particles forming the cluster). A channel is a partition into clusters together with a specified bound-state eigenfunction. Take, for instance, channel (12)(3), and suppose $\psi \in L_2(\mathbf{R}^3)$ is the eigenfunction in question. If we take $x = x_2 - x_1$ and $y = x_3 - (m_1 + m_2)^{-1}(m_1x_1 + m_2x_2)$, then

$$H = -\frac{1}{2m}\Delta_y - \frac{1}{2n}\Delta_x + V_{12}(x) + V_{23}\Bigl(\frac{m_1}{m_1+m_2}x - y\Bigr) + V_{31}\Bigl(\frac{m_2}{m_1+m_2}x + y\Bigr),$$

where $m^{-1} = m_3^{-1} + (m_1 + m_2)^{-1}$, $n^{-1} = m_1^{-1} + m_2^{-1}$. Let us neglect the interactions between (12) and (3), i.e., set $V_{23} = V_{31} = 0$ to define the cluster decomposition Hamiltonian

$$H_{(12)(3)} = -\frac{1}{2m}\Delta_y - \frac{1}{2n}\Delta_x + V_{12}(x).$$

Let $\mathcal{H}_{(12)(3)} = L_2(\mathbf{R}^3)$ (called the channel Hilbert space consisting of functions of $y$), and define a mapping $\tau: \mathcal{H}_{(12)(3)} \to \mathcal{H} = L_2(\mathbf{R}^6)$ (functions of $x$ and $y$) by $(\tau f)(x, y) = \psi(x)f(y)$. The channel wave operators $W_{(12)(3)\pm}$ are defined by

$$W_{(12)(3)\pm} = \operatorname*{s-lim}_{t\to\pm\infty} e^{itH}e^{-itH_{(12)(3)}}\tau$$

as isometries from $\mathcal{H}_{(12)(3)}$ into $\mathcal{H}$. Their ranges are not expected to coincide as in the 1-body case. $W_{(1)(2)(3)\pm}$ and $W_{(13)(2)\pm}$ are similarly defined. Note that $\mathcal{H}_{(1)(2)(3)} = \mathcal{H}$.

If $\alpha$ and $\beta$ are distinct channels, we have $\operatorname{Ran} W_{\alpha+} \perp \operatorname{Ran} W_{\beta+}$ and $\operatorname{Ran} W_{\alpha-} \perp \operatorname{Ran} W_{\beta-}$, but no such relations exist between $\operatorname{Ran} W_{\alpha+}$ and $\operatorname{Ran} W_{\beta-}$ or $\operatorname{Ran} W_{\alpha-}$ and $\operatorname{Ran} W_{\beta+}$. Define, for channels $\alpha$ and $\beta$, $S_{\alpha\beta}: \mathcal{H}_\beta \to \mathcal{H}_\alpha$ by $S_{\alpha\beta} = (W_{\alpha+})^* W_{\beta-}$. Now the scattering operator $S$ for the 3-body system is defined as the †direct sum of $S_{\alpha\beta}$ acting in the Hilbert space $\sum_\alpha \oplus\,\mathcal{H}_\alpha$: $S = \sum_{\alpha,\beta}\oplus\,S_{\alpha\beta}$. Naturally the question arises: Is $S$ unitary? The first affirmative answer was made by L. D. Faddeev (Israel Program for Scientific Translations, 1965 (in English; original in Russian, 1963)), and later the work of J. Ginibre and M. Moulin (Ann. Inst. H. Poincaré, A21 (1974)) and L. Thomas (Ann. Phys., 90 (1975)) came out. The method of these authors is stationary. There have also been some attempts using time-dependent methods.

G. Inverse Problem

The inverse problem in potential scattering may be formulated at least mathematically as follows: Given the scattering operator or scattering amplitude, determine the potential(s) giving rise to the operator or amplitude. We consider here only the 2-body case (as to the many-body case, almost nothing is known). The central-potential case can be reduced to 1-dimensional problems on $(0, \infty)$. In the 1-dimensional case the celebrated Gel'fand-Levitan theory (→ 112 Differential Operators O) has long been known and has been successfully applied even to nonlinear problems such as the †Korteweg-de Vries equation. In the 3-dimensional case, however, the problem becomes difficult; so far there has not been any satisfactory theory comparable to that for the 1-dimensional case. The potential $V(x)$ is a function $\mathbf{R}^3 \to \mathbf{R}$. The scattering amplitude $f(\lambda; \omega, \omega')$ is a function $f: \mathbf{R} \times S^2 \times S^2 \to \mathbf{C}$. Let $M$ be the mapping that takes $V$ into $f$. The inverse problem deals with $M^{-1}$. Several questions may be posed (in order of increasing difficulty): (1) Is $M$ one-to-one? (2) When it is known that $M$ is one-to-one and $f$ is in the image of $M$, how does one (re-)construct the $V$ that yields $f$? (3) What conditions characterize the image of $M$? Question (1) has been rather satisfactorily answered insofar as short-range potentials are concerned. Concerning questions (2) and (3), attempts have been and are being made to generalize the Gel'fand-Levitan theory, but it may be said that we are still at the beginning stage. References are [2, 9, 10] and R. G. Newton (J. Math. Phys., 20 (1980); 21 (1981); 22 (1982)).

H. Scattering for the Wave Equation

Consider the †wave equation $u_{tt} - \Delta u = 0$ in $\mathbf{R}^3$. The solution $u(t) = u(t, x)$ is uniquely determined by the initial data $\{f, g\} = \{u(0), u_t(0)\}$, and $U_0(t)\{f, g\} = \{u(t), u_t(t)\}$ defines the solution operator $U_0(t)$. The set of data $\{f, g\}$ with finite energy: $\int(|\nabla f|^2 + |g|^2)\,dx < \infty$ forms a Hilbert space $\mathcal{H}_0$. $U_0(t)$ is a unitary group on $\mathcal{H}_0$. A similar description is possible for solutions of the wave equation in an exterior domain $\Omega$ outside an obstacle with zero boundary condition. Denote the resulting Hilbert space and solution operator by $\mathcal{H}$ and $U(t)$, respectively. Let $J: \mathcal{H}_0 \to \mathcal{H}$ be the identification operator defined by $(J\{f, g\})(x) = \{f, g\}(x)$, $x \in \Omega$. The wave operators are defined by $W_\pm = \operatorname{s-lim}_{t\to\pm\infty} U(-t)JU_0(t)$. The existence of $W_\pm$ is shown rather easily by using †Huygens's principle. As in Section B, we define the scattering operator $S = W_+^* W_-$ and say that $W_\pm$ is complete if $\operatorname{Ran} W_\pm = \mathcal{H}_{ac}(H)$, where $H$ is the self-adjoint †infinitesimal generator of $U(t)$: $U(t) = e^{-itH}$. The completeness of $W_\pm$ and the unitarity of $S$ are proved on the basis of the abstract translation representation

theorem: Let $U(t)$ be a unitary group on a Hilbert space $\mathcal{H}$. Suppose there exist subspaces $D_+$ and $D_-$, called outgoing and incoming subspaces, such that $U(t)D_\pm \subset D_\pm$ for $\pm t \geq 0$, $\bigcap_{t\in\mathbf{R}}U(t)D_\pm = \{0\}$, and $\bigcup_{t\in\mathbf{R}}U(t)D_\pm$ is dense in $\mathcal{H}$. Then we have two unitary operators $\mathcal{R}_\pm: \mathcal{H} \to L_2(\mathbf{R}; N)$, where $N$ is an auxiliary Hilbert space, such that $\mathcal{R}_\pm U(t)\mathcal{R}_\pm^{-1}$ is right translation by $t$, and $D_+$ ($D_-$) is mapped onto $L_2(0, \infty; N)$ ($L_2(-\infty, 0; N)$) by $\mathcal{R}_+$ ($\mathcal{R}_-$).

Turning to the concrete situation, one can study the detailed properties of $S$. The uniqueness theorem in the inverse problem is also obtained, to the effect that $S$ determines the obstacle uniquely. The foregoing treatment of scattering is known as the Lax-Phillips theory (P. D. Lax and R. S. Phillips, in [6, 8, 11]).

References

[1] W. O. Amrein, J. M. Jauch, and K. B. Sinha, Scattering theory in quantum mechanics, Benjamin, 1977.
[2] R. G. Newton, Scattering theory of waves and particles, McGraw-Hill, 1966.
[3] E. Prugovečki, Quantum mechanics in Hilbert space, Academic Press, 1971.
[4] M. Reed and B. Simon, Methods of modern mathematical physics III, Scattering theory, Academic Press, 1979.
[5] J. R. Taylor, Scattering theory, Wiley, 1972.
[6] Rocky Mountain J. Math., 1 (1) (1971) (special issue on scattering theory).
[7] T. Kato, Perturbation theory for linear operators, Springer, 1976.
[8] J. A. Lavita and J.-P. Marchand (eds.), Scattering theory in mathematical physics, Proc. NATO Adv. Study Inst., Denver, Colo., 1973, Reidel, 1974.
[9] V. De Alfaro and T. Regge, Potential scattering, North-Holland, 1965.
[10] K. Chadan and P. C. Sabatier, Inverse problems in quantum scattering theory, Springer, 1977.
[11] P. D. Lax and R. S. Phillips, Scattering theory, Academic Press, 1967.

376 (XIX.15)
Scheduling and Production Planning

Production planning emerges in many situations. Models of economic planning can be classified as (1) fiscal policy-oriented, (2) final demand-oriented, (3) structure-oriented, (4) expenditure-oriented, and (5) industrialization-oriented. Types (2), (3), and (5) belong to production planning in a broad sense, as emphasis is placed on production in these models.

A typical production planning theory of primary importance is activity analysis, which has made remarkable progress since its initiation by T. C. Koopmans [1]. Its principal theoretical content consists of †linear programming. Most applications of linear programming are more or less concerned with production activities. Because of the additivity and divisibility of production as well as the limitation of production intensities, problems of production planning can be formulated as problems of linear programming. The methods of †linear algebra are used to obtain an optimal production plan and are very important in modern economic analysis, because these methods not only provide practical algorithms but also clarify the role of price, especially in †dual linear programming problems.

The originators of general equilibrium theory (→ 128 Econometrics) failed to give an analytical demonstration of the existence of solutions of certain systems of equations of economic relevance. The existence of a determinate equilibrium was established first by A. Wald for a system of equations of the Walras-Cassel type. On the other hand, J. von Neumann [3] proved the existence of nonnegative solutions $\alpha$, $\beta$, $x_i$, $y_j$ for a system of inequalities

$$\alpha\sum_{i=1}^{m}a_{ij}x_i \leq \sum_{i=1}^{m}b_{ij}x_i, \quad j = 1, 2, \ldots, n,$$

$$\beta\sum_{j=1}^{n}a_{ij}y_j \geq \sum_{j=1}^{n}b_{ij}y_j, \quad i = 1, 2, \ldots, m,$$

by reducing the problem to the proof of the existence of a †saddle point (→ 292 Nonlinear Programming A) of the function

$$\Phi(X, Y) = \sum_{i=1}^{m}\sum_{j=1}^{n}b_{ij}x_iy_j \Big/ \sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}x_iy_j$$

by means of Brouwer's fixed-point theorem. In this result an equilibrium is defined in the broad sense that demand for goods does not exceed their supply, rather than requiring exact balance.

A second important kind of production planning is related to both †inventory control (→ 227 Inventory Control) and sales planning. An example is the minimization of

$$\int_0^T\bigl(z(t) + \beta\max(dz/dt, 0)\bigr)\,dt$$

subject to the condition $z(t) \geq r(t)$, where $z(t)$ and $r(t)$ are the output and demand, respectively, at time $t$. Stabilization of employment

and production can also be classified as a production planning problem of this kind, in which inventory holding is considered as a means for lessening the change of employment level. This is related to the problem of smoothing production by inventory control. Dynamic programming (→ 127 Dynamic Programming) is very useful in dealing with problems of smoothing production.

Production planning as a production management tool is often embodied in scheduling. Consider a project that consists of $n$ indivisible tasks (jobs or activities) $J_i$, $i = 1, 2, \ldots, n$, each requiring $p_i$ units of time for processing, where $p_i$ is given either deterministically or probabilistically. A precedence constraint (generally a †partial ordering) partially specifying the order in which these tasks are to be processed is also imposed by technical considerations: One attempts to find a schedule (i.e., a specification of the time to process the tasks $J_i$) consistent with the given precedence constraint.

Well-known techniques developed for this purpose are PERT (program evaluation and review technique) [4] and CPM (critical path method) [5], in which the precedence constraint is represented by an acyclic †directed graph, called an arrow diagram, a project network, or a PERT network, such that each task $J_i$ corresponds to an arc of length equal to $p_i$. An arrow diagram is illustrated in Fig. 1. The longest path from the start node to the end node in the network is called the critical path, and gives the minimum time necessary to complete the project. Following computation of the critical path by means of dynamic programming, computations are also made for the earliest (latest) start time, the earliest (latest) finish time, and the floats (i.e., the allowances for such start and finish times) of each task to be satisfied in order to complete the project within the indicated minimum time. These are used to review and control the progress of the project. In PERT the processing time of each task is probabilistically treated on the basis of three estimates: most likely, optimistic, and pessimistic. From these data other parameters, such as the probability of completing the project before the specified due date, are computed. In CPM, on the other hand, a minimum cost schedule to attain the given due date is obtained by utilizing †network flow algorithms (J. E. Kelley [6], D. R. Fulkerson [7]), in which the processing time of a task is determined by linear interpolation between the normal time (achieved with low cost) and the crash time (high cost).

Fig. 1
A, B, ..., M denote tasks, while the associated integers are their processing times. Bold arrows indicate the critical path. The start node is on the left, the end node is on the right.

PERT and CPM are used in various areas of application, e.g., civil engineering and the construction industry, shipbuilding, production of automobiles, machines, and electric apparatus, and management of research and development programs. PERT was originally developed by the US Navy to monitor and control the development of the Polaris fleet ballistic missile program, while CPM was developed by Remington Rand and the Du Pont Corporation, both in the late 1950s. Computers have been essential from the beginning, to handle the large amount of associated data. A number of application program packages, each with some additional features, are currently available, e.g., PERT/TIME, PERT/COST, CPM, and RAMPS.

The machine sequencing (scheduling) problem arises when the resources, instruments, workers, and so forth, required to process a task are abstractly formulated as machines and if the restriction on the number of available machines is taken into consideration (i.e., the conflict between tasks competing for the same machine at the same time must be resolved). Usually one machine is assigned to each task. Such a machine is either (a) uniquely determined for each task or (b) chosen from a given set of machines; in the latter case, there might be (i) parallel machines with the same capability or (ii) machines with different capabilities. The precedence constraints are also ramified into independent (i.e., no constraint), in-tree, out-tree, series-parallel, and general partial ordering constraints. Each task has a ready time (release time) $r_i$ such that $J_i$ cannot be processed before it, and a due time $d_i$. One is asked to find a schedule satisfying the above machine constraints, precedence constraints, and ready time constraints, while considering an optimality criterion that is a function of the completion time $C_i$ of $J_i$ ($i = 1, 2, \ldots, n$). Typical criteria for minimization are: (1) maximum completion time (makespan) $C_{\max} = \max_i C_i$; (2) flowtime (total completion time) $F = \sum C_i$; weighted flowtime $\sum w_iC_i$, where $w_i > 0$ are weights representing the relative importance of $J_i$; (3) maximum lateness $L_{\max} = \max_i L_i$, $L_i = C_i - d_i$; (4) total tardiness $T = \sum T_i$, $T_i = \max(0, L_i)$, and weighted total

tardiness $\sum w_iT_i$; (5) number of tardy tasks $U = \sum U_i$, $U_i = 1$ (if $C_i > d_i$), 0 (otherwise), and weighted number of tardy tasks $\sum w_iU_i$.

Numerous problems can be defined by combining the above conditions. Typical ones might be: the job-shop scheduling problem, in which $n$ tasks are scheduled on $m$ machines of type (a), and where the maximum completion time is minimized; the flow-shop scheduling problem, which is the same as the job-shop scheduling problem except that $n = n'm$ tasks are divided into $n'$ groups of $m$ tasks processed on machine 1, machine 2, ..., machine $m$, respectively, in this order; the multiprocessor scheduling problem, in which the maximum completion time of $n$ independent tasks on $m$ parallel machines is minimized; the one-machine sequencing problem, assuming only one machine (with various types of precedence constraints and optimality criteria), and others [8, 9].

These machine sequencing problems are examples of the combinatorial optimization problem (→ 281 Network Flow Problems E), as the processing time $p_i$ is usually considered to be a given constant. Their computational complexity (→ 71 Complexity of Computations) has been extensively studied with an emphasis on the classification between those problems solvable in polynomial time and those that are †NP-complete, as summarized in [10]. Table 1 lists representative results for one-machine sequencing problems with $r_i = 0$ ($i = 1, 2, \ldots, n$). The improvement of the algorithm efficiency is pursued for both polynomially solvable problems and NP-complete problems. †Branch and bound (→ 215 Integer Programming D) is a common approach used to solve NP-complete problems such as the job-shop and flow-shop scheduling problems [8, 9]. Many approximation algorithms to obtain good suboptimal schedules in reasonable computation time are also known, and their worst-case and average accuracies have been analyzed [10], as these are important in practical applications.

In more realistic scheduling situations, other factors, such as the set-up cost, balancing of production lines, frequent modifications and updatings of project data, capacity of factories, manpower planning including the possibility of overtime and part-time employment, should be taken into account. Both deterministic and probabilistic models have been proposed for these cases. Mathematical tools used to compute adequate schedules include †mathematical programming, †queuing theory, and †simulation techniques.

References

[1] T. C. Koopmans, Activity analysis of production and allocation, Wiley, 1951.
[2] O. Morgenstern, Economic activity analysis, Wiley, 1954.
[3] J. von Neumann, Über ein ökonomisches Gleichungssystem und eine Verallgemeinerung des Brouwerschen Fixpunktsatzes, Ergebnisse eines Mathematischen Kolloquiums, 8 (1937), 73-83; English translation, A model of general economic equilibrium, Collected Works, Pergamon, 1963, vol. 6, 29-37.
[4] D. G. Malcolm, J. H. Roseboom, C. E. Clark, and W. Fazar, Application of a technique for research and development program evaluation, Operations Res., 7 (1959), 646-669.

Table 1. One-Machine Sequencing Problems with $r_i = 0$

Optimality      Precedence          Other
Criterion       Constraint          Constraints                              Complexity
--------------------------------------------------------------------------------------------
$C_{\max}$      Partial order       None                                     $O(n^2)$
$\sum C_i$      Partial order       None                                     NP-complete
$\sum w_iC_i$   Series-parallel     None                                     $O(n\log n)$
                Partial order       $p_i = 1$ ($i = 1, 2, \ldots, n$)        NP-complete
$L_{\max}$      Partial order       None                                     $O(n^2)$
$\sum T_i$      Independent         None                                     Not known (a)
                Partial order       None                                     NP-complete
$\sum w_iT_i$   Independent         None                                     NP-complete
$\sum U_i$      Independent         None                                     $O(n\log n)$
                In-tree, out-tree   $p_i = 1$ ($i = 1, 2, \ldots, n$)        NP-complete
$\sum w_iU_i$   Independent         $p_i < p_j \Rightarrow w_i \geq w_j$     $O(n\log n)$
                Independent         None                                     NP-complete (b)

a. An algorithm with $O(n^5p_{\max})$ running time is known, where $p_{\max} = \max_i p_i$.
b. An algorithm with $O(n\sum p_i)$ running time is known.
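
As an illustration of the $O(n\log n)$ entry for $\sum U_i$ with independent tasks in Table 1, the following is a minimal sketch of one classical procedure with that bound, usually attributed to Moore and Hodgson. The source does not name the algorithm, so this choice, the function names, and the four-task data set are assumptions made for illustration only.

```python
import heapq


def moore_hodgson(tasks):
    """Minimize the number of tardy tasks on one machine (independent tasks).

    tasks: list of (name, p, d) = (label, processing time, due time);
    all ready times are taken to be 0, as in Table 1.
    Returns (on_time, tardy): on_time tasks are processed first, in
    due-time order; the tardy tasks may follow in any order.
    """
    by_due_time = sorted(tasks, key=lambda task: task[2])
    kept = []      # max-heap on processing time, stored as (-p, position, name)
    tardy = []
    elapsed = 0    # total processing time of the tasks currently kept
    for position, (name, p, d) in enumerate(by_due_time):
        heapq.heappush(kept, (-p, position, name))
        elapsed += p
        if elapsed > d:
            # The newest task would finish late: drop the longest kept task.
            neg_p, _, dropped = heapq.heappop(kept)
            elapsed += neg_p
            tardy.append(dropped)
    on_time = [name for _, _, name in sorted(kept, key=lambda entry: entry[1])]
    return on_time, tardy


def count_tardy(sequence, tasks):
    """Evaluate U = sum of U_i for a given processing order of task names."""
    data = {name: (p, d) for name, p, d in tasks}
    clock, number_tardy = 0, 0
    for name in sequence:
        p, d = data[name]
        clock += p                      # completion time C_i of this task
        number_tardy += 1 if clock > d else 0
    return number_tardy


# Hypothetical data: (name, processing time p_i, due time d_i).
example = [("J1", 4, 5), ("J2", 3, 6), ("J3", 2, 7), ("J4", 5, 9)]
on_time, tardy = moore_hodgson(example)
print(on_time, tardy, count_tardy(on_time + tardy, example))
```

The sort and the heap operations each cost $O(n\log n)$ in total, which matches the bound quoted in the table. The helper `count_tardy` simply evaluates criterion (5) for a given sequence, so a result can be checked independently of the heuristic that produced it.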
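
Returning to the project networks used in PERT and CPM, the computation of the critical path, the earliest and latest event times, and the floats can be sketched as a forward and a backward pass over the arrow diagram in topological order. This is only a sketch under stated assumptions: nodes are integers, every task is an arc `(task, tail, head, duration)`, there is a single start node and a single end node, and the five-task network below is hypothetical (the data of Fig. 1 is not recoverable here).

```python
from collections import defaultdict, deque


def critical_path(arcs, start, end):
    """Return (project_time, floats, critical_tasks) for an arrow diagram.

    arcs: list of (task, tail, head, duration); the graph must be acyclic.
    """
    nodes = {start, end}
    successors = defaultdict(list)
    indegree = defaultdict(int)
    for task, u, v, p in arcs:
        nodes.update((u, v))
        successors[u].append((v, p))
        indegree[v] += 1

    # Topological order (Kahn's algorithm).
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("the arrow diagram must be acyclic")

    # Forward pass: earliest time at which each node (event) can occur.
    earliest = {n: 0 for n in nodes}
    for u in order:
        for v, p in successors[u]:
            earliest[v] = max(earliest[v], earliest[u] + p)
    project_time = earliest[end]        # length of the longest path

    # Backward pass: latest time each node may occur without delaying the end.
    latest = {n: project_time for n in nodes}
    for u in reversed(order):
        for v, p in successors[u]:
            latest[u] = min(latest[u], latest[v] - p)

    # Total float of a task: slack between its latest finish and earliest start.
    floats = {task: latest[v] - earliest[u] - p for task, u, v, p in arcs}
    critical = [task for task, u, v, p in arcs if floats[task] == 0]
    return project_time, floats, critical


# Hypothetical network: node 1 is the start node, node 4 the end node.
arcs = [("A", 1, 2, 3), ("B", 1, 3, 2), ("C", 2, 4, 4),
        ("D", 3, 4, 6), ("E", 2, 3, 1)]
print(critical_path(arcs, 1, 4))
```

Tasks with zero float form the critical path; the earliest start of a task is the earliest time of its tail node, and its latest finish is the latest time of its head node.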

[S] J. E. Kelley, Jr., and M. R. Walker, Critical satisfying


path planning and scheduling, Proc. Eastern
a+(f)q’fi 0 1.. of.
Joint Computer Conference, 1959, 160- 173.
[6] J. E. Kelley, Jr., Critical-path planning and =(n+ l)“ZE~+“f@f, @ . Of..
scheduling: Mathematical basis, Operations
Res., 9 (1961), 296-320. For f 6 K, f denotes the element of the dual
[7] D. R. Fulkerson, A network flow computation for project cost curves, Management Sci., 7 (1961), 167-178.
[8] R. W. Conway, W. L. Maxwell, and L. W. Miller, Theory of scheduling, Addison-Wesley, 1967.
[9] K. R. Baker, Introduction to sequencing and scheduling, Wiley, 1974.
[10] R. L. Graham, E. L. Lawler, J. K. Lenstra, and A. H. G. Rinnooy Kan, Optimization and approximation in deterministic sequencing and scheduling: A survey, Ann. Discrete Math., 5 (1979), 287-326.

377 (XX.27)
Second Quantization

A. Fock Space

For a complex Hilbert space $K$ with $\dim K \ge 1$, $K^{\otimes n}$ denotes the $n$-fold tensor product of $K$ with itself (where the vectors $f_1 \otimes \cdots \otimes f_n$ with $f_j \in K$ are total). Let $E_\pm^{(n)}$ be the projection operators on totally symmetric and antisymmetric parts of $K^{\otimes n}$:

$$E_\pm^{(n)} f_1 \otimes \cdots \otimes f_n = (n!)^{-1} \sum_P s_\pm(P)\, f_{P(1)} \otimes \cdots \otimes f_{P(n)},$$

where the sum is over all permutations $P$, $s_+(P) = 1$ and $s_-(P)$ is the signature of $P$ ($+1$ for even permutations and $-1$ for odd permutations). The following orthogonal direct sum is called a Fock space (symmetric for $E_+$ and antisymmetric for $E_-$):

$$F_\pm(K) = \bigoplus_{n=0}^{\infty} E_\pm^{(n)} K^{\otimes n}.$$

Here the term for $n = 0$ is the 1-dimensional space $\mathbf{C}$, and a vector $\Omega$ represented by $1 \in \mathbf{C}$ is called the vacuum vector in $F_\pm(K)$. The subspace $F_\pm^{(n)}(K) = E_\pm^{(n)} K^{\otimes n}$ is called the $n$-particle subspace. The operator $N = \bigoplus_{n=0}^{\infty} n$, which takes the value $n$ on $F_\pm^{(n)}(K)$, is called the number operator.

On the algebraic sum

$$D_\pm = \sum_{n=0}^{\infty} F_\pm^{(n)}(K)$$

the creation operator $a^\dagger(f)$ for $f \in K$ is defined as the unique linear operator with domain $D_\pm$

$K^*$ satisfying $\bar f(g) = (g, f)$ for $g \in K$ (the inner product is linear in the first entry). The annihilation operator $a(f)$ is defined by

$$a(f) E_\pm^{(n)} f_1 \otimes \cdots \otimes f_n = n^{-1/2} \sum_{j=1}^{n} \varepsilon^{j-1} (f_j, f)\, E_\pm^{(n-1)} \bigotimes_{k \ne j} f_k,$$

where the tensor product of $f_k$, $k \ne j$, appears in the $j$th term and $\varepsilon = \pm 1$ depending on which of $\pm$ is taken in $F_\pm(K)$. For $n = 0$, $a(f)\Omega$ is defined to be 0. The adjoint of $a^\dagger(f)$ coincides with $a(f)$ on $D_\pm$.

The creation and annihilation operators map $D_\pm$ into itself and satisfy the following commutation relations on $D_\pm$:

$$[a^\dagger(f_1), a^\dagger(f_2)]_\mp = [a(f_1), a(f_2)]_\mp = 0,$$
$$[a(f_1), a^\dagger(f_2)]_\mp = (f_2, f_1),$$

where $[A, B]_\mp = AB \mp BA$ and $\mp$ is used depending on the choice of $\pm$ in $F_\pm(K)$. These relations are often called canonical commutation relations for $[\ ,\ ]_-$ (CCRs) and canonical anticommutation relations for $[\ ,\ ]_+$ (CARs).

On $F_-(K)$, $a^\dagger(f)$ and $a(f)$ are bounded with $\|a^\dagger(f)\| = \|a(f)\| = \|f\|$. On $F_+(K)$, both $a^\dagger(f)$ and $a(f)$ are not bounded, though $a^\dagger(f) N^{-1/2}$ and $a(f) N^{-1/2}$ are bounded.

On $F_+(K)$, $2^{-1/2}(a^\dagger(f) + a(f))$ is essentially self-adjoint. Let $\psi(f)$ be its closure. The operator $W(f) = e^{i\psi(f)}$ is unitary and satisfies the identity

$$W(f) W(g) = \exp\bigl(i\,\mathrm{Im}(f, g)/2\bigr)\, W(f + g).$$

Let $K_r$ be a real subspace of $K$ such that the inner product $(f, g)$ in $K$ is real for any $f$ and $g$ in $K_r$ and $K = K_r + iK_r$. ($K_r$ is then a real Hilbert space.) The unitary operators $U(f) = W(f)$ and $V(f) = W(if)$ for $f \in K_r$ satisfy the following Weyl form of the CCRs:

$$U(f_1) U(f_2) = U(f_1 + f_2),$$
$$V(f_1) V(f_2) = V(f_1 + f_2),$$
$$U(f_1) V(f_2) = V(f_2) U(f_1) \exp(-i(f_1, f_2)).$$

The infinitesimal generators of the continuous one-parameter groups of unitaries $U(tf)$ and $V(tg)$ ($t \in \mathbf{R}$) are denoted by $\varphi(f)$ and $\pi(g)$ and satisfy the following CCRs:

$$[\varphi(f_1), \varphi(f_2)]\Psi = [\pi(g_1), \pi(g_2)]\Psi = 0,$$
$$[\varphi(f), \pi(g)]\Psi = i(f, g)\Psi,$$

where $[A, B] = AB - BA$ and $\Psi \in D_+$.
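The canonical anticommutation relations of Section A can be checked by hand in the smallest nontrivial case. The following is a minimal sketch, assuming $K = \mathbf{C}^2$ with an orthonormal basis $e_1, e_2$, so that the antisymmetric Fock space is 4-dimensional. The matrices are written in the occupation-number basis with a Jordan-Wigner sign factor; this representation and all helper names (`matmul`, `kron`, `car_hold`, and so on) are illustrative assumptions, not notation from the article.

```python
# Illustrative sketch: the CARs on the antisymmetric Fock space over K = C^2,
# written in the occupation-number basis. All names are hypothetical helpers.

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):
    """Kronecker (tensor) product of two matrices."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in A for row_b in B]

def transpose(A):
    return [list(col) for col in zip(*A)]

def anticommutator(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(AB, BA)]

I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]        # parity factor that supplies the fermionic sign
LOWER = [[0, 1], [0, 0]]     # sends the occupied state of one mode to the empty one

# Annihilation operators a(e_1), a(e_2) for an orthonormal basis e_1, e_2 of K.
a = [kron(LOWER, I2), kron(Z, LOWER)]
a_dag = [transpose(m) for m in a]   # entries are real, so adjoint = transpose

IDENTITY = kron(I2, I2)
ZERO = [[0] * 4 for _ in range(4)]

def car_hold():
    """Check [a(e_i), a(e_j)]_+ = 0 and [a(e_i), a^+(e_j)]_+ = (e_j, e_i)."""
    for i in range(2):
        for j in range(2):
            if anticommutator(a[i], a[j]) != ZERO:
                return False
            expected = IDENTITY if i == j else ZERO
            if anticommutator(a[i], a_dag[j]) != expected:
                return False
    return True

print(car_hold())
```

Because every entry is an integer, the comparisons are exact. The relation for two creation operators follows from the first check by taking adjoints.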

If $Q$ is a linear operator on $K$, $\Gamma(Q)$ denotes the linear operator on $F_\pm(K)$ defined as the closure of $\bigoplus_{n=0}^{\infty} Q^{\otimes n}$ restricted to $D_\pm$. It is bounded if $\|Q\| \le 1$. $\Gamma(Q_1)\Gamma(Q_2) = \Gamma(Q_1 Q_2)$ on $D_\pm$. If $H$ is a self-adjoint operator on $K$, then $\Gamma(e^{itH}) = \exp(it\, d\Gamma(H))$ defines a self-adjoint operator $d\Gamma(H)$ on $F_\pm(K)$, usually called a bilinear Hamiltonian and denoted $(a^\dagger, Ha)$. More explicitly,

$$d\Gamma(H) E_\pm^{(n)} f_1 \otimes \cdots \otimes f_n = \sum_{j=1}^{n} E_\pm^{(n)} f_1 \otimes \cdots \otimes H f_j \otimes \cdots \otimes f_n.$$

If $U$ is a unitary operator on $K$, then

$$\Gamma(U) W(f) \Gamma(U)^{-1} = W(Uf).$$

B. Second Quantization

A single (scalar) particle in quantum mechanics is described by a wave function $\Psi(x)$, $x \in \mathbf{R}^3$, considered as a unit vector in a Hilbert space $K = L^2(\mathbf{R}^3)$. The system consisting of $n$ such identical particles is described by a totally symmetric function $\Psi(x_1, \ldots, x_n)$, $x_j \in \mathbf{R}^3$, considered as a unit vector in the totally symmetric part $E_+^{(n)} K^{\otimes n}$ of the $n$-fold tensor product of the one-particle Hilbert space $K$, where the restriction to totally symmetric wave functions is referred to as Bose statistics. In a nonrelativistic system, the Hamiltonian operator on a 1-particle space is

$$T = -\hbar^2 (2m)^{-1} \Delta_x,$$

called the kinetic energy ($\Delta_x$ denotes the Laplacian); on an $n$-particle space it is typically given by

$$H_n = -\hbar^2 (2m)^{-1} \sum_{j=1}^{n} \Delta_{x_j} + \sum_{i<j} V(x_i - x_j),$$

where $V$ is a 2-body potential.

The totality of multiparticle spaces $E_+^{(n)} K^{\otimes n}$ can be described in terms of the Fock space $F_+(K)$, the vacuum vector $\Omega$ (no-particle state), and the annihilation and creation operators, denoted by

$$a(f) = \int \Psi(x) \overline{f(x)}\, d^3x, \qquad a^\dagger(f) = \int \Psi^\dagger(x) f(x)\, d^3x.$$

Since the CCRs (for operator-valued distributions)

$$[\Psi(x), \Psi(y)]_- = [\Psi^\dagger(x), \Psi^\dagger(y)]_- = 0,$$
$$[\Psi(x), \Psi^\dagger(y)]_- = \delta^3(x - y)$$

are a continuous generalization of CCRs for canonical variables in quantum mechanics and since $\Psi(x)$ comes from the wave function by way of quantization, the above formalism is called second quantization. The Hamiltonians $H_n$ for all $n$ can now be combined into the expression

$$H = \int \Psi^\dagger(x) T \Psi(x)\, d^3x + \frac{1}{2} \iint \Psi^\dagger(x) \Psi^\dagger(y) V(x - y) \Psi(x) \Psi(y)\, d^3x\, d^3y,$$

where the first term is $d\Gamma(T)$.

For particles such as electrons the system of $n$ identical particles is described by a totally antisymmetric wave function, the total antisymmetry being referred to as Fermi statistics. Then the antisymmetric Fock space $F_-(K)$ can be used in exactly the same manner as $F_+(K)$ in the preceding case.

The method of second quantization was introduced by P. Dirac [2] for the case of bosons and extended by P. Jordan and E. Wigner [3] to fermions. Electromagnetic waves, when quantized in this way, represent a system of photons, and the quantization of electron waves leads to the particle picture of the electron. The method of second quantization is intimately connected with the notion of fields, as shown below for free fields, and is the basis of the perturbation approach in field theory (→ 150 Field Theory).

C. Free Fields

Let $\sigma_j$ ($j = 1, 2, 3$) be Pauli spin matrices and $\sigma_0 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. Let $\tilde p = \sum_{\mu=0}^{3} \sigma_\mu p^\mu$ for a 4-vector $p = (p^0, \mathbf{p})$ with $\mathbf{p} = (p^1, p^2, p^3)$ and $p^0 = (m^2 + \mathbf{p}^2)^{1/2}$. Let $u_j(a, A)$ be the irreducible unitary representation $[m_+, j]$ of $\tilde{\mathscr{P}}_+^{\uparrow}$ on a Hilbert space $K_j = L^2(\mathbf{R}^3, \mathbf{C}^{2j+1}, (2p^0)^{-1} (m^{-1}\tilde p)^{\otimes 2j}\, d^3p)$ (→ 258 Lorentz Group C (3)).

Consider first the symmetric Fock space $F_+(K_0)$. For any complex-valued rapidly decreasing $C^\infty$-function $f$ ($f \in \mathscr{S}(\mathbf{R}^4)$), let

$$\tilde f(\mathbf{p}) = (2\pi)^{-3/2} \int e^{ip \cdot x} f(x)\, d^4x \quad (p = (p^0, \mathbf{p}),\ p^0 = (\mathbf{p}^2 + m^2)^{1/2}),$$
$$A(f) = a^\dagger(\tilde f) + a(\tilde{\bar f}) \quad \Bigl(= \int A(x) f(x)\, d^4x\Bigr),$$

where $p \cdot x = p^0 x^0 - \sum_{j=1}^{3} p^j x^j$ and the bar denotes the complex conjugate. Then $A(x)$ as an operator-valued distribution satisfies the Wightman axioms and is called the free scalar field of mass $m$. It satisfies the Klein-Gordon equation

$$(\Box_x + m^2) A(x) \Psi = 0 \quad \Bigl(\Box_x = (\partial/\partial x^0)^2 - \sum_{j=1}^{3} (\partial/\partial x^j)^2\Bigr),$$

it has the 4-dimensional scalar commutator as before, and let

and it has the two-point function

i(A(x)A(y)!2,R)=A,+(x-y).

Here Y is any vector in D,; for example, R is


the vacuum vector, and the invariant distri- where g2 = (p -i) = -is. Then $,(x) as an
butions A,,, and A,’ are defined by operator-valued distribution satisfies the
+Wightman axiom and is called the free Dirac
A,‘(x)=i(2~)-~ e-‘p’“(2p0)-‘d3p, field of mass m. In the present formulation,
s {I,!J~, IJ,} is a contravariant spinor of rank (1,O)
and { ti3, $,} is a covariant spinor of rank
A,,,(x)=(~TT~~ (sinp.x)(p’)-‘d3p. (0, I). This field satisfies the Dirac equation
s
Cy’(3/8x”)+im $(x)=0,
If we define U(a, A(A)) = T(u,(a, A)), then ( P >
as well as the relations
If gEY(R3) and hgY’(R3), J(p) and h”(p) c+kd> &?(Y)l+y = 0,
obtained by substituting g(x)6(x”) and
-!-r(x)S’(x’) into fin the defining equation cw4 &J(Y)*1 +y
of ,I above are in K,. We then define
=

rr(h)=u+(ii)+u(R)
~ ( =jn(x)h(x)d3x)’ ~ =j(Zi.l’(a/a,~)-im)~“}~~A’(x--y),

The operator-valued distributions q(x) and Here YE D,, A,,, and A,’ are as described
X(X) are the canonical field and its conjugate above, and the y’s are Dirac matrices in the
field at time 0 for the free scalar field and following form (somewhat different from but
satisfy the following canonical commutation equivalent to the usual form; - 351 Quantum
relations: Mechanics):
CV(X)>cp(Y)l -‘r,=C~(x),~(Y)lLy=O0,
[q(x), n(y)] -Y = iP(x - y)Y.

If we set T(t)- U(te,, 1) for e,=(l,O,O,O) and the 0’s are +Pauli spin matrices.
and (~(Qg)(x)=r(x~)g(x) for KEY and
g E sP(R3), then for YE D,
m D. Coherent Vectors and Exponential Hilbert
.4aQgY= WMd W)*Wt)& Space
s -cc
cc In the symmetric Fock space F+(K), a vector
-A(cr’Qg)Y= T(t)n(g)T(t)*Ycc(t)dt, of the form
s -cc
or, equivalently,

.4(x)= T(x”)cp(x)T(xo)*,
for f~ K is called a coherent vector. The set of
expf is linearly independent (in the algebraic
sense) and total. The inner product is given by
If .4(x) is a classical field, then q(x) and z(x)
are the value of A(x) and its time derivative at
x0 = 0, and they serve as initial data for the
Conversely, we can define F+(K) abstractly by
Klein-Gordon equation
introducing this inner product into the formal
(0,+m2)A(x)=0. linear combinations of expA f~ K, and by
completion. In this sense, 9+(K) is also de-
Consider next the antisymmetric Fock space
noted as exp K and is called an exponential
9-(K,,, @ Ki,*). For ,f+ EY’(R~,C~) (C?-valued
Hilbert space [S]. Then
rapidly decreasing C”-fUnCtiOnS), Write f=

(.f+ J-1, .f+ =(.fi J2X .f- =(f3Ar define .fk expx@ Kj = @ exp Kj,

where $\exp \oplus f_j$ is identified with $\otimes \exp f_j$. If the number of indices is infinite, the right-hand side is the incomplete infinite tensor product containing the product of the vacuum vector $\Omega$.

If $K = \int_\Xi^{\oplus} K_\lambda\, d\mu(\lambda)$ and the measure $\mu$ is nonatomic, then for any measurable set $S$ in $\Xi$, there corresponds a decomposition $\exp K = (\exp K(S)) \otimes (\exp K(S^c))$, where $K(S) = \int_S^{\oplus} K_\lambda\, d\mu(\lambda)$ and $S^c$ is the complement of $S$ in $\Xi$, and an associated von Neumann algebra $R(S) = B(K(S)) \otimes 1$, where $B(K(S))$ is the set of all bounded linear operators on $K(S)$. The system $\{R(S)\}$ forms a complete Boolean lattice of type I factors on $\exp K$. Coherent vectors are characterized by the property of being a product vector for $\{R(S)\}$ in the sense that for any $S$, $A \in R(S)$ and $A' \in R(S^c)$, the vector $\Psi = \exp f$ satisfies

$$(AA'\Psi, \Psi)(\Psi, \Psi) = (A\Psi, \Psi)(A'\Psi, \Psi).$$

In this sense, we can interpret $\exp K$ as a continuous tensor product of $\exp K_\lambda$ and also, if $\Psi = \int^{\oplus} \Psi_\lambda\, d\mu(\lambda)$, $\exp \Psi$ as a continuous tensor product of $\exp \Psi_\lambda$ [5].

References

[1] V. Fock, Konfigurationsraum und zweite Quantelung, Z. Phys., 75 (1932), 622-647.
[2] P. A. M. Dirac, The quantum theory of the emission and absorption of radiation, Proc. Roy. Soc. London, 114 (1927), 243-265.
[3] P. Jordan and E. P. Wigner, Über das Paulische Äquivalenzverbot, Z. Phys., 47 (1928), 631-651.
[4] F. A. Berezin, The method of second quantization, Academic Press, 1966. (Original in Russian, 1965.)
[5] H. Araki and E. J. Woods, Complete Boolean algebras of type I factors, Publ. Res. Inst. Math. Sci., A2 (1966), 157-242, 451-452.
[6] J. Avery, Creation and annihilation operators, McGraw-Hill, 1976.

378 (XII.14)
Semigroups of Operators and Evolution Equations

A. Introduction

The analytical theory of semigroups was inaugurated around 1948 in order to define exponential functions in infinite-dimensional function spaces. Then it was generalized to the theory of evolution equations as ordinary differential equations in infinite-dimensional linear spaces.

B. The Hille-Yosida Theorem

Let $X$ be a locally convex topological linear space, and denote by $L(X)$ the totality of continuous linear operators defined on $X$ with values in $X$. A family $\{T_t \mid t \ge 0\}$ of operators $T_t \in L(X)$ is called a (one-parameter) semigroup of class $(C_0)$ or a strongly continuous semigroup if it satisfies the following two conditions: (i) $T_t T_s = T_{t+s}$ (the semigroup property), $T_0 = I$ (the identity operator); and (ii) $\lim_{t \to t_0} T_t x = T_{t_0} x$ ($\forall x \in X$, $\forall t_0 \ge 0$). When $X$ is a Banach space, (ii) is implied by w-$\lim_{t \downarrow 0} T_t x = x$ ($\forall x \in X$), as proved by N. Dunford in 1938. In this case there exist constants $M > 0$ and $\beta \ge 0$ such that (iii') $\|T_t\| \le M e^{\beta t}$ ($\forall t \ge 0$). Hence, considering $e^{-\beta t} T_t$ in place of $T_t$, we can assume the equicontinuity: (iii) For any continuous seminorm $p$ on $X$, there exists a continuous seminorm $q$ on $X$ such that $p(T_t x) \le q(x)$ ($\forall x \in X$, $\forall t \ge 0$). Such semigroups are called equicontinuous semigroups of class $(C_0)$ (abbreviated e.c.s.g. $(C_0)$).

Example 1. $X = L_p(0, \infty)$ with $\infty > p \ge 1$.

$$(T_t x)(s) = x(t + s).$$

Example 2. $X = L_p(-\infty, \infty)$ with $\infty > p \ge 1$.

$$(T_t x)(s) = (2\pi t)^{-1/2} \int_{-\infty}^{\infty} \exp\Bigl(-\frac{(s-u)^2}{2t}\Bigr) x(u)\, du, \quad t > 0,$$
$$(T_t x)(s) = x(s), \quad t = 0.$$

Example 3. $X = BC(-\infty, \infty)$.

$$(T_t x)(s) = e^{-\lambda t} \sum_{k=0}^{\infty} \frac{(\lambda t)^k}{k!}\, x(s - k\mu), \quad t \ge 0.$$

Here $\lambda$ and $\mu$ are positive constants. (For these examples, we have $\|T_t\| \le 1$; hence (iii) is satisfied.) For $L_p$ and $BC$ → 168 Function Spaces.

We assume in the remainder of the article that $X$ is sequentially complete, that is, if a sequence $\{x_n\}$ of $X$ satisfies $\lim_{n,m \to \infty} p(x_n - x_m) = 0$ for every continuous seminorm $p$ on $X$, then there exists a unique $x \in X$ such that $\lim_{n \to \infty} p(x - x_n) = 0$.

The infinitesimal generator $A$ of an e.c.s.g. $(C_0)$ $\{T_t \mid t \ge 0\}$ is defined by

$$Ax = \lim_{t \downarrow 0} t^{-1}(T_t - I)x. \qquad (1)$$

(This is also called the generator of $T_t$.) Then we have the following results.
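As a numerical illustration of the definitions in Section B, the semigroup of Example 3 can be evaluated directly. The following is a minimal sketch, assuming the hypothetical values $\lambda = 1.5$ and $\mu = 0.7$ and the test function $x = \cos$ (none of these come from the article), with the series cut off after a fixed number of terms. It measures how far $T_t T_r x$ is from $T_{t+r} x$ at one point, and how far the difference quotient $(T_h x - x)/h$ is from $\lambda(x(s - \mu) - x(s))$, the form the limit in the definition of the infinitesimal generator takes for this example.

```python
import math

LAM, MU = 1.5, 0.7   # hypothetical values for the positive constants lambda, mu

def poisson_shift(t, x, terms=60):
    """Return T_t x for Example 3, with the series cut off after `terms` terms."""
    def shifted(s):
        weight = math.exp(-LAM * t)       # e^{-lambda t} (lambda t)^k / k! at k = 0
        total = 0.0
        for k in range(terms):
            total += weight * x(s - k * MU)
            weight *= LAM * t / (k + 1)   # move on to the weight for k + 1
        return total
    return shifted

def semigroup_defect(t, r, x, s):
    """|T_t(T_r x)(s) - T_{t+r} x(s)|, which should vanish up to rounding."""
    return abs(poisson_shift(t, poisson_shift(r, x))(s)
               - poisson_shift(t + r, x)(s))

def generator_defect(h, x, s):
    """|(T_h x - x)(s)/h - lambda (x(s - mu) - x(s))| for a small step h."""
    quotient = (poisson_shift(h, x)(s) - x(s)) / h
    return abs(quotient - LAM * (x(s - MU) - x(s)))

print(semigroup_defect(0.4, 0.9, math.cos, 0.3))
print(generator_defect(1e-6, math.cos, 0.3))
```

Both printed numbers should be small: the first is limited only by rounding and truncation, the second shrinks in proportion to the step $h$.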

(I) Differentiability theorem. For every D. Holomorphic Semigroups


complex number 1 with Re 1> 0, the resolvent
(A1 -A)-’ E L(X) exists and For an e.c.s.g. (Co) { 7; 1t > 0}, the following
three conditions are equivalent (K. Yosida,
(Al-A)-lx= “e-“‘rl;xdt (Vx E Xl, (2) 1963; the equivalence between (ii) and (iii) for
50 Banach spaces was proved earlier by E. Hille,
where the integration is Riemannian. Hence 1948): (i) When t > 0,
the domain D(A) of A is dense in X and coin- 7;‘x=I,iny(T,+,-7;)x
cides with the range R((,U - A))‘), and A is a
closed linear operator such that the family exists for all x E X and { (Ct 7;‘)” 1n = 1,2,
((A(A-A)-‘)“Ii>o, n=O,1,2 ,... } (3) and 0 < t $1) is equicontinuous for a certain
constant C > 0. (ii) When t > 0, 7; admits a
is equicontinuous. convergent expansion TL given locally by T,x
(II) Representation theorem. Let J” = = Zz,(n - t)" 7;(“)x/rr!. The extension exists
(I-n-IA) ml and consider the approximations for largil <arc tan(Ce-‘), and the family of
to 7;: operators {e-i T,} is equicontinuous in ,? for
largil <arctan(2-kCem1) with some positive
?;‘“‘x=P f (m!)-‘(ntJ”)mx, constant k. (iii) For the infinitesimal generator
II=0
A of 7;, there exists a positive constant C, such
that {(C, /l(il- A)-‘)“} is equicontinuous in
Then n = 1,2, and in 3, with Re(i) 2 1 + E, E> 0.
An e.c.s.g. (Co) { 7;} satisfying the above condi-
TXL = lim T(“)x = lim pcn)x
n-co f n-cc f (5) tions is called a holomorphic semigroup.
For example, introduce
uniformly on every compact set oft.
a+im
(III) Converse theorem. Let a linear opera- f,,,@)=(27q ezA-trn dz,
tor A with both dense domain D(A) and range s a-ice
R(A) in X satisfy the condition (n1- A) ml E
120, t>o, cr>o, O<a<l,
L(X) for n = 1,2, . . Then a necessary and
sufftcient condition for A to be the inlinitesi- =o, /.<O, (6)
ma1 generator of an e.c.s.g. (Co) is that the
family of operators where the branch of za is taken so that Rez”>
0 for Re z > 0. Following S. Bochner (1949),
{(I-n~‘A))ml.=1,2 ,... ;m=O,l,... } (3’) we define
be equicontinuous. Since such a semigroup
{ 7; 1t 2 0) is uniquely determined by A, we can i;,,x=i;x= “f;,,(s)T,xds, t > 0,
s0
write 7; = exp(tA).
These three theorems together are called the =x, t = 0, (7)
Hille-Yosida theorem or the Hille-Yosida-
Feller-Phillips-Miyadera theorem. from a given e.c.s.g. (Co) { 7; I t > 0). Then
{z,, I t >O} is a holomorphic semigroup (Yo-
Examples of Infinitesimal Generators. A = sida, T. Kato, and A. V. Balakrishnan, 1960).
d/ds for example 1 above, A = 2 m’d2/ds2 for Its infinitesimal generator A^, can be consid-
example 2, and (Ax)(s)=~,(x(s-p)--(s)) for ered as the fractional power ( -A)” of -A,
example 3. multiplied by -1.
Fractional powers (-A)“, aE C, of operators
have also been defined for operators A satisfy-
ing the weaker condition than (3’) that {A(1 -
C. Groups
A)-’ 11>0} is equicontinuous (Balakrish-
nan, H. Komatsu). If A is such an operator,
An operator A in a Hilbert space X gener- -m generates a holomorphic semigroup
ates a group { 7; 1 --co < t < co} of tunitary and the unique uniformly bounded solution of
operators of class (CO) satisfying 7; T, = T,,, the “elliptic” equation
for --co < t, s < co if and only if A is equal
x;‘= -Ax,, t > 0, lim x,=x0 (8)
to iH for some +self-adjoint operator H (M. r-0

H. Stone’s theorem, 1932). In a locally con-


is the solution of
vex space, a necessary and sufficient condi-
tion for a given e.c.s.g. (Co) {T, 1t 2 0} to be xi= --J-A xtr t>O, 1-ro
limx,=x,,
extendedto an equicontinuous group of class
(Co) {IT; 1 --co <t < co} is that the family (3’) and therefore x, = exp( - t-)x, (Balakrish-
be equicontinuous also for n = f 1, +2, . . nan). Equation (8) has also been discussed by

M. Sova and H. O. Fattorini from a different point of view.

E. Convergence of Semigroups

Let a sequence $\{\exp(tA_n) \mid n = 1, 2, \ldots\}$ of e.c.s.g. $(C_0)$ be equicontinuous as a family of operators $\in L(X)$. Then a necessary and sufficient condition for there to exist an e.c.s.g. $(C_0)$ $\exp(tA)$ such that $\lim_{n \to \infty} (\exp(tA_n))x = (\exp(tA))x$ uniformly on every compact interval of $t$ is that $\lim_{n \to \infty} (\lambda_0 I - A_n)^{-1} x = J_{\lambda_0} x$ exist (for some $\lambda_0$ with $\mathrm{Re}\, \lambda_0 > 0$ and for all $x \in X$) and be such that $R(J_{\lambda_0})$ is dense in $X$ (H. F. Trotter, Kato).

F. Miscellaneous Semigroups

(i) Distribution semigroups. The semigroup of translations $(T_t x)(s) = x(t + s)$ in $X = L_\infty(-\infty, \infty)$ is not continuous and hence not measurable in $t$. However, $T_t x$ is an $X$-valued distribution. For semigroups $\{T_t\}$ such that $T_t x$ is an $X$-valued distribution for $x \in X$, an analog of the Hille-Yosida theorem is known (J.-L. Lions, 1960). It has been generalized to ultradistribution semigroups by J. Chazarain and to hyperfunction semigroups by S. Ouchi.

(ii) Dual semigroups. The above semigroup $\{T_t\}$ of translations in $L_\infty(-\infty, \infty)$ is obtained as $T_t = S_t^*$ from the e.c.s.g. $(C_0)$ $\{S_t\}$ defined by $(S_t x)(s) = x(s - t)$ in $L_1(-\infty, \infty)$. Let $B = -d/ds$ be the infinitesimal generator of $\{S_t\}$. The restriction of $\{S_t^*\}$ to the space of uniformly continuous functions, which is the closure of the domain $D(B^*)$ of the dual $B^*$ in $L_\infty(-\infty, \infty)$, is an e.c.s.g. $(C_0)$. This fact holds for the semigroup $\{S_t^*\}$ of an e.c.s.g. $(C_0)$ $\{S_t\}$ in a Banach space $X$ in general (R. Phillips, 1955) and also in a locally convex space.

(iii) Locally equicontinuous semigroups. The infinitesimal generator $A$ of the semigroup of translations $(T_t x)(s) = x(t + s)$ in $X = C(-\infty, \infty)$ is $d/ds$. $A$ has no resolvent since all complex numbers are eigenvalues of $A$. $\{T_t\}$ is not equicontinuous but locally equicontinuous, i.e., $\{T_t \mid 0 \le t \le t_1\}$ is equicontinuous for any $t_1 > 0$. For locally equicontinuous $(C_0)$ semigroups an analog of the Hille-Yosida theorem is obtained by using the notion of generalized resolvents (T. Kōmura, 1968; Ouchi, 1973).

(iv) Differentiable semigroups. The notion of the holomorphy of semigroups in Section D is weakened to the differentiability. A characterization of a semigroup $\{T_t\}$ such that $T_t x$ is infinitely differentiable in $t > 0$ is given by using the resolvent of the infinitesimal generator (A. Pazy, 1968).

(v) Nonlinear semigroups. For a $(C_0)$ semigroup $\{T_t\}$ of contractions (i.e., $\|T_t x - T_t y\| \le \|x - y\|$ for $x, y \in X$) in a Hilbert space $X$, an analog of the Hille-Yosida theorem is known (Y. Kōmura, 1969). This result has been partially extended to Banach spaces (→ 286 Nonlinear Functional Analysis X).

G. The Evolution Equation

Let $T_t = \exp(tA)$ be an e.c.s.g. $(C_0)$. Then for $x \in D(A)$,

$$T_t' x = A T_t x\ (= T_t A x). \qquad (9)$$

Considered in suitable function spaces, the equation of heat conduction ($A = \Delta$ = the Laplacian), the Schrödinger equation ($A = \sqrt{-1}\,(\Delta - V(x))$), and the wave equation given in matrix form

$$\frac{\partial}{\partial t} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 & I \\ \Delta & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$

are all of the form (9). For a linear operator $A_t$ in $X$ depending on $t$, the ordinary differential equation in $X$

$$x_t' = A_t x_t + f(t), \quad t \ge 0, \qquad (10)$$

is called the evolution equation. A family of operators $\{V(r, s) \mid r \ge s \ge 0\}$ in $X$ which gives general solutions to the homogeneous evolution equation

$$x_t' = A_t x_t \qquad (11)$$

(i.e., for any $s \ge 0$, $a \in D(A_s)$, $x_t = V(t, s)a$ is a solution to (11) for $x_s = a$) is called the evolution operator associated with the generators $\{A_t\}$. An evolution operator $\{V(r, s)\}$ satisfies (i) $V(r, r) = I$, (ii) $V(r, s)V(s, t) = V(r, t)$. The solution to (10) is formally expressed by

$$x_t = V(t, 0)x_0 + \int_0^t V(t, s) f(s)\, ds. \qquad (12)$$

H. Integration of the Evolution Equation

For equation (11) we have the following result (Kato, 1953; Yosida, 1966). Assume the following four conditions: (i) $D(A_t)$ is independent of $t$ and dense in the Banach space $X$, and for all $\alpha > 0$, $(I - \alpha A_t)^{-1} \in L(X)$ with the estimate $\|(I - \alpha A_t)^{-1}\| \le 1$; (ii) $B_{t,s} = (I - A_t)(I - A_s)^{-1}$ is uniformly bounded in the norm for $0 \le s, t \le 1$; (iii) $\sum_{j=0}^{k-1} \|B_{t_{j+1}, t_0} - B_{t_j, t_0}\| \le N$, where $N$ is independent of the partition ($0 = t_0 < t_1 < \cdots < t_k = 1$); (iv) $B_{t,s}$ is weakly differentiable with respect to $t$ such that the differentiated operator $\partial B_{t,s}/\partial t$ is strongly continuous in $t$. Under these assumptions, we can prove that for $x_0 \in D(A_0)$, the limit $V(t, 0)x_0 = \text{s-}\lim_{k \to \infty} V_k(t, 0)x_0$,

with (Lions, 1961). In order to obtain the unique-


ness or the differentiability of weak solutions,
~(I.s)=(I-(I-~)A(~))-l we need some additional conditions.
(ii) Some properties of strong solutions. Let
x(I-~*(~))-’ X be a Banach space. Let every semigroup
{T,“)} generated by A, be holomorphic in a
complex sector 1arg 1”I < 0,O > 0, independent
x x(I-;*(yy oft. Suppose one of the following conditions
holds: (1) For some x, 0 < c(< 1, D(&) is in-
dependent of t and for 1 -x < p < 1,
x(I-(Ep+(!!q
l/(A;-A:)AJ <C’lt-sl”, t,sE[:O, 1]
(0 <s < t < /), exists and gives the unique solu- (P. E. Sobolevskii, 1958-1961; Kato, 1961);
tion of (11). If f(t) is continuously differenti- (2) A,’ is differentiable in t,
able, the right-hand side of (12) exists and
gives a unique solution to the inhomogeneous //dA,‘/dt-dA,‘/dsll <C’lt-sl”
equation (10). for some C’>O, O</J<l, and

I. The Evolution Equation of Parabolic Type

Equation (lo), for which every A, is the in- for every I:Iargil>n/2-Ofor some N, O<
finitesimal generator of a holomorphic semi- c(< 1 (Kato and H. Tanabe, 1962). Then a
group, is said to be of parabolic type by ana- differentiable evolution operator { V(t, s)} as-
logy to parabolic partial differential equations. sociated with (11) exists.
Under weaker conditions, especially without The most interesting property of evolution
the condition that D(A,) is independent oft, equations of parabolic type is the analyticity
the existence of solutions of an equation of this of solutions. If A, is holomorphic in t in a cer-
type is obtained. Moreover, differentiability or tain sense, then the solutions are holomorphic
analyticity of solutions follow from some (Tanabe, 1967; first noted by Komatsu, 1961).
natural assumptions. Furthermore, a characterization of evolution
(i) Existence of weak solutions. Let X be a operators { V(t, s)} holomorphic in some com-
Hilbert space. For t E [0,1], let r/; be a subspace plex neighborhood of [0,1] (called holomorphic
and at the same time a Hilbert space with evolution operators) is obtained by using the
respect to a norm 111.lllf stronger than //. /(. resolvent of A, (Kato and Tanabe, 1967; K.
Since the form (.4,x, y) is tsesquilinear (linear Masuda, 1972; - [S]).
in x and antilinear in y), we get a sesquilinear
functional a(t, , .) on 1/; x y such that
J. Application to Semilinear Evolution
4LX, Y)’ -V,X,Y), X,Y~D(A,), Equations

if D(A,) is dense in 1/; with respect to 111‘IlIt and The evolution equation with a nonlinear addi-
if
tive term f’(t, x,): xi = A,x, +f(t, x,) can be writ-
ten as an inhomogeneous integral equation x,
= V(t, 0)x, + & V(t, s)f(s, x,)ds in the Banach
v, a(t, -, .) should be measurable in a certain space X, by means of the evolution operator
sense. A solution x, of the equation (10) in
{ V’(t, s)} introduced in Section G. The exis-
[0, I] satisfies tence, differentiability (Kato, H. Fujita, and
I Sobolevskii, 1963-1966), and analyticity
a(t,x,,u,)dt- f(x,:u;)dt (Masuda, 1967) of solutions of the Navier-
s0 s0
Stokes equation has been obtained by reduc-
ing it to an integral equation of this type.
= b-(tb,)dt+(Xo:uo) (13)
s0 Concerning quasilinear equations in which
for any differentiable X-valued function u, such A, depends on xc, the existence, differentia-
bility, and analyticity of their solutions have
that u,EI/;, ~,=O,~h~~uJ~dt<rn, and~~llu~l12dt
< cx3.A solution x, of (13) is called a weak been discussed.
solution of equation (lo), though it does not
necessarily satisfy (10). If the relation References
~(~,x,x)+~.llxl12~alllxll11, xe V,: [ 1] E. Hille and R. S. Phillips. Functional
holds for some i, a > 0, a weak solution of (10) analysis and semi-groups, Amer. Math. Sot.
in the sense of (13) exists for a given x0 E X Colloq. Pub]., 1957.

[2] N. Dunford and J. T. Schwartz, Linear operators I, Interscience, 1958.
[3] J. L. Lions, Equations différentielles opérationnelles et problèmes aux limites, Springer, 1961.
[4] K. Yosida, Functional analysis, Springer, 1980.
[5] S. G. Krein, Linear differential equations in a Banach space, Amer. Math. Soc., 1971. (Original in Russian, 1967.)
[6] P. L. Butzer and H. Berens, Semigroups of operators and applications, Springer, 1967.
[7] R. W. Carroll, Abstract methods in partial differential equations, Harper & Row, 1969.
[8] H. Tanabe, Equations of evolution, Pitman, 1979. (Original in Japanese, 1975.)

379 (X.18)
Series

A. Convergence and Divergence of Infinite Series

Let $\{a_n\}$ ($n = 1, 2, 3, \ldots$) be a sequence of real or complex numbers. Then the formal infinite sum $a_1 + a_2 + \cdots$ is called an infinite series (or series) and is denoted by $\sum_{n=1}^{\infty} a_n$ or $\sum a_n$. The number $a_n$ is the $n$th term of the series $\sum a_n$, and $s_n = a_1 + a_2 + \cdots + a_n$ is the $n$th partial sum of $\sum a_n$. Also, for a finite sequence $a_1, a_2, \ldots, a_n$, the sum $a_1 + a_2 + \cdots + a_n$ is called a series. To distinguish these two series, the latter is called a finite series. In this article, series means an infinite series. If the sequence of partial sums $\{s_n\}$ converges to $s$, we say that the series $\sum a_n$ converges or is convergent to the sum $s$ and write $\sum_{n=1}^{\infty} a_n = s$ or $\sum a_n = s$. If the sequence $\{s_n\}$ is not convergent, we say that the series diverges or is divergent. In particular, if $\{s_n\}$ is divergent to $+\infty$ ($-\infty$) or oscillating, we say that the series is properly divergent to $+\infty$ ($-\infty$) or oscillating, respectively.

The notation $\sum a_n$ is customarily used for both the sum $s$ of the convergent series and the formal series, which may or may not be convergent. When the series is convergent, the sum is sometimes called the Cauchy sum in contrast to the "summations" of series, which are not necessarily convergent (→ Sections K ff.).

Applying the Cauchy criterion for the convergence of a sequence, we see that a necessary and sufficient condition for $\sum a_n$ to be convergent is that for any given $\varepsilon > 0$, we can take $N$ sufficiently large so that

$$|s_m - s_n| = |a_{n+1} + \cdots + a_m| < \varepsilon$$

for all $m$, $n$ such that $m > n > N$. Hence if $\sum a_n$ converges, then $a_n \to 0$ as $n \to \infty$, but the converse is not always true.

Elementary properties of the convergence of series are: (1) If $\sum a_n$ and $\sum b_n$ converge to $a$, $b$, respectively, then $\sum (a_n + b_n)$ converges to $a + b$. (2) If $\sum a_n$ converges to $a$, then $\sum c a_n$ converges to $ca$ for any constant $c$. (3) Suppose that $\{b_m\}$ is a subsequence of $\{a_n\}$ obtained by deleting a finite number of terms $a_n$ from $\{a_n\}$. Then $\sum b_m$ is convergent if and only if $\sum a_n$ is convergent. (4) When a series $\sum a_n$ converges to $a$ and $\{b_m\}$ is a sequence such that $b_1 = a_1 + a_2 + \cdots + a_{r_1}$, $b_2 = a_{r_1+1} + a_{r_1+2} + \cdots + a_{r_2}$, $b_3 = a_{r_2+1} + \cdots + a_{r_3}$, ..., then $\sum b_m$ also converges to $a$. The converse, however, is not always true. For example, $1 - 1 + 1 - 1 + \cdots$ is oscillating, but $(1 - 1) + (1 - 1) + \cdots = 0$.

B. Series of Positive Terms

Suppose that $\sum a_n$ is a series of positive (or nonnegative) terms. Since its partial sums $s_n$ form a monotonically increasing sequence, the series is convergent if and only if $\{s_n\}$ is bounded. For example, the series $\sum_{n=1}^{\infty} n^{-p}$ ($p > 0$) converges if $p > 1$ because $s_n < 2^{p-1}/(2^{p-1} - 1)$, whereas it diverges if $p \le 1$ because $s_{2^{m+1}} \ge 1 + (m + 1)/2$. The geometric series $\sum_{n=1}^{\infty} a^{n-1}$ ($a > 0$) converges for $a < 1$ because $s_n = (1 - a^n)/(1 - a)$, and diverges for $a \ge 1$ because $s_n \ge n$.

Some criteria for the convergence of a series $\sum a_n$ of nonnegative terms are: (1) If $\{a_n\}$ is monotone decreasing, then the series $\sum a_n$ and $\sum 2^n a_{2^n}$ have the same convergence behavior (Cauchy's condensation test). (2) Suppose that $f(x)$ is a positive monotone decreasing function defined for $x \ge 1$ such that $f(n) = a_n$ ($n = 1, 2, \ldots$). Then the series $\sum a_n$ and the integral $\int_1^{\infty} f(x)\, dx$ have the same convergence behavior (Cauchy's integral test), for example, $\sum n^{-p}$ ($p > 0$) and $\int_1^{\infty} x^{-p}\, dx$. (3) If for some positive constant $k$ we have $a_n \le k b_n$ except for a finite number of $n$, then the convergence of $\sum b_n$ implies the convergence of $\sum a_n$. If $k b_n \le a_n$ and $\sum b_n$ diverges, then $\sum a_n$ also diverges (comparison test). (4) Let $a_n > 0$ and $b_n > 0$. If $a_{n+1}/a_n \le b_{n+1}/b_n$, except for a finite number of values of $n$, and $\sum b_n$ converges, then $\sum a_n$ also converges; if $a_{n+1}/a_n \ge b_{n+1}/b_n$ and $\sum b_n$ diverges, then $\sum a_n$ also diverges (→ Appendix A, Table 10).

C. Absolute Convergence and Conditional Convergence

A series $\sum a_n$ (with real or complex terms $a_n$) is called absolutely convergent if the series $\sum |a_n|$ is convergent. If a convergent series is not

absolutely convergent, then it is called con- is monotone and converges to zero, then the
ditionally convergent. An absolutely convergent power series C b,,z” of a complex variable z is
series is convergent. A real series C a, whose convergent on the unit circle lzl = 1 except at
terms have alternating signs is called an alter- most for z = 1; the case z = -1 gives Leibniz’s
nating series. An alternating series C a, is test for alternating series (- Section C).
convergent if the absolute values of terms IanI
form a monotone decreasing sequence which
converges to zero (Leibniz’s test). An abso- E. Double Series
lutely convergent series remains absolutely
convergent under every rearrangement of A sequence with two indices, i.e., a mapping
terms and retains its sum under the rearrange- from the Cartesian product N x N of two
ment (Diricblet’s theorem). If a series with real copies of the set of natural numbers N to a
terms is conditionally convergent, then it is subset of the real or complex numbers, is
possible to rearrange its terms so that the called a double sequence and is denoted by
rearranged series converges to any given num- {a,,} or {a,,,}. If there exists a number 1 such
ber, diverges to +cc (or -co), or is oscillating that for any positive E there is a natural num-
(Riemann’s theorem). A convergent series ber N(E) satisfying 1umn - II <E for all m > N(E)
whose convergence behavior is unaffected by and n > N(E), then we say that the sequence
rearrangement and whose sum remains un- {a,,} has a limit 1 and write lim,,, ,n-tm umn = 1.
changed is called unconditionally convergent This limit should not be confused with re-
(or commutatively convergent). A real or com- peated limits such as lim,,, tm,,:, a,,,,,). If
plex series is unconditionally convergent if and lim m+m%n = a n uniformly in! and lim n-a 4
only if it is absolutely convergent. The notion = I, then lim m-m,n+m amn = 1. For a given
of infinite series can be extended to any com- double sequence {a,,}, the formal series
plete tnormed linear space, and absolute con- z&=1 %n is called a double series and is some-
vergence can be defined by replacing the ab- times denoted by C a,,,“. In contrast with
solute values of the terms by the norm of the double series, the ordinary series discussed
terms. However, in general, unconditional previously is called a simple series.
convergence is not always equivalent to absolute convergence.

D. Abel's Partial Summation

Let $\{a_0, a_1, a_2, \dots\}$ and $\{b_0, b_1, b_2, \dots\}$ be arbitrary sequences, and put $A_n = a_0 + a_1 + \dots + a_n$ for $n \ge 0$. Then the following formula of Abel's partial summation holds:

$$\sum_{\nu=n+1}^{n+k} a_\nu b_\nu = \sum_{\nu=n+1}^{n+k} A_\nu (b_\nu - b_{\nu+1}) - A_n b_{n+1} + A_{n+k} b_{n+k+1}$$

for any $n \ge 0$ and any $k \ge 1$; this formula also holds for $n = -1$ if we put $A_{-1} = 0$.

Abel's partial summation enables us to deduce a number of tests of convergence for series of the form $\sum a_n b_n$. In particular, the following criteria are easy to apply:
(1) $\sum a_n b_n$ is convergent if $\sum a_n$ is convergent and if the sequence $\{b_n\}$ is monotone and bounded (Abel's test).
(2) $\sum a_n b_n$ is convergent if the sequence $\{s_n\}$ of partial sums of $\sum a_n$ is bounded and if $\{b_n\}$ is monotone and converges to zero (Dirichlet's test).
(3) $\sum a_n b_n$ is convergent if $\sum (b_n - b_{n+1})$ is absolutely convergent and if $\sum a_n$ is (at least conditionally) convergent (test of du Bois-Reymond and Dedekind).
For example, criterion (2) implies that if $\{b_n\}$

Given a double series $\sum a_{mn}$, when the double sequence of partial sums $s_{mn} = \sum_{k=1}^{m} \sum_{l=1}^{n} a_{kl}$ is convergent to $s$, then $\sum a_{mn}$ is said to be convergent to the sum $s$. On the other hand, if $s_{mn}$ is not convergent, $\sum a_{mn}$ is said to be divergent. If $\sum_{n=1}^{\infty} a_{mn}$ converges to $b_m$ for each $m$, then $\sum_{m=1}^{\infty} b_m = \sum_{m=1}^{\infty} \left( \sum_{n=1}^{\infty} a_{mn} \right)$ is called the repeated (or iterated) series by rows. If $\sum_{m=1}^{\infty} a_{mn}$ converges to $c_n$ for each $n$, then $\sum_{n=1}^{\infty} c_n = \sum_{n=1}^{\infty} \left( \sum_{m=1}^{\infty} a_{mn} \right)$ is called the repeated (or iterated) series by columns. Even if two repeated series by rows and columns are convergent, the two sums are not always identical, and $\sum a_{mn}$ is not always convergent. However, if the double series $\sum a_{mn}$ is convergent and $\sum_n a_{mn}$ is convergent for each $m$, then the repeated series by rows is convergent to the same sum. A similar statement is valid for the repeated series by columns.

Suppose that we are given a double series $\sum a_{mn}$ of nonnegative terms. If any one of $\sum_{m,n} a_{mn}$, $\sum_m \sum_n a_{mn}$, and $\sum_n \sum_m a_{mn}$ is convergent, the other two converge to the same sum. If the diagonal partial sum $s_{nn} = \sum_{k=1}^{n} \sum_{l=1}^{n} a_{kl}$ converges to $a$, then the double series $\sum a_{mn}$ also converges to $a$.

If $\sum |a_{mn}|$ converges, the double series $\sum a_{mn}$ is called absolutely convergent, whereas if $\sum a_{mn}$ converges but not absolutely, then $\sum a_{mn}$ is called conditionally convergent. If $\sum a_{mn}$ is absolutely convergent, then any series obtained from $\sum a_{mn}$ by arranging the terms in an arbitrary order is convergent to the same sum.
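The formula of Abel's partial summation in Section D can be checked numerically. The following is a minimal sketch in exact rational arithmetic; the function name `abel_sides` and the sample sequences $a_\nu = (-1)^\nu$, $b_\nu = 1/(\nu+1)$ are illustrative choices, not part of the text.

```python
from fractions import Fraction

def abel_sides(a, b, n, k):
    """Return (lhs, rhs) of Abel's partial summation formula.

    a and b are lists indexed from 0; b needs entries up to index n + k + 1.
    """
    # Partial sums A_v = a_0 + ... + a_v, with the convention A_{-1} = 0.
    A = {-1: Fraction(0)}
    for v, term in enumerate(a):
        A[v] = A[v - 1] + term
    indices = range(n + 1, n + k + 1)
    lhs = sum(a[v] * b[v] for v in indices)
    rhs = (sum(A[v] * (b[v] - b[v + 1]) for v in indices)
           - A[n] * b[n + 1] + A[n + k] * b[n + k + 1])
    return lhs, rhs

# Sample data (assumed for illustration): a_v = (-1)^v, b_v = 1/(v + 1).
a = [Fraction((-1) ** v) for v in range(12)]
b = [Fraction(1, v + 1) for v in range(12)]

lhs, rhs = abel_sides(a, b, n=2, k=5)
lhs_edge, rhs_edge = abel_sides(a, b, n=-1, k=4)  # the case n = -1, A_{-1} = 0
```

Both pairs agree exactly, including the boundary case $n = -1$. With this choice of $\{a_\nu\}$ the partial sums $A_\nu$ stay bounded and $\{b_\nu\}$ decreases to zero, which is the situation of criterion (2).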
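The remark that the two repeated series of a double series may both converge to different sums can be illustrated by a standard example, which is an assumption of this sketch and not taken from the text: $a_{mn} = 1$ if $n = m$, $a_{mn} = -1$ if $n = m + 1$, and $a_{mn} = 0$ otherwise. Every row and every column has at most two nonzero terms, so the inner sums below are exact.

```python
def a(m, n):
    """Hypothetical double sequence: +1 on the diagonal, -1 just to its right."""
    if n == m:
        return 1
    if n == m + 1:
        return -1
    return 0

def row_sum(m):
    # Row m has nonzero entries only at n = m and n = m + 1.
    return sum(a(m, n) for n in range(1, m + 2))

def col_sum(n):
    # Column n has nonzero entries only at m = n - 1 and m = n.
    return sum(a(m, n) for m in range(1, n + 1))

def partial_sum(m, n):
    """Rectangular partial sum s_mn of the double series."""
    return sum(a(k, l) for k in range(1, m + 1) for l in range(1, n + 1))

rows_total = sum(row_sum(m) for m in range(1, 51))  # repeated series by rows
cols_total = sum(col_sum(n) for n in range(1, 51))  # repeated series by columns
```

Here every row sums to 0, while the first column sums to 1 and all other columns to 0, so the repeated series by rows has sum 0 and the repeated series by columns has sum 1. The rectangular partial sums $s_{mn}$ equal 0 for $n > m$ and 1 for $n \le m$, so the double series itself is divergent.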

F. Multiplication of Series

The series $\sum_{n=1}^{\infty} c_n$, where $c_n = a_1 b_n + a_2 b_{n-1} + \dots + a_n b_1$, is called the Cauchy product of two series $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$. (1) Let $\sum a_n$ and $\sum b_n$ be two convergent series and $A$, $B$ be the sums of these series. If their Cauchy product $\sum c_n$ is also convergent, then it has the sum $C = AB$ (Abel's theorem). (2) If at least one of the two convergent series $\sum a_n$ and $\sum b_n$ with the sums $A$, $B$, respectively, is absolutely convergent, then their Cauchy product $\sum c_n$ is also convergent and has the sum $C = AB$ (Mertens's theorem). (3) If $\sum a_n$ and $\sum b_n$ are absolutely convergent, then their Cauchy product $\sum c_n$ is absolutely convergent (Cauchy's theorem). (4) Let $\sum a_n$ and $\sum b_n$ be two convergent series with the sums $A$, $B$, respectively. If $\{n a_n\}$ and $\{n b_n\}$ are bounded from below, then $\sum c_n$ is convergent and has the sum $C = AB$ (Hardy's theorem).

H. Termwise Differentiation of Infinite Series with Function Terms

Uniform convergence of an infinite series $\sum f_n(x)$ is defined by uniform convergence of the sequence of the partial sums $\sum_{k=1}^{n} f_k(x)$ (→ 435 Uniform Convergence). If the infinite series $\sum f_n(x)$ defined on an interval $I$ of the real line is convergent at least at one point of $I$ and $\sum f_n'(x)$ is convergent uniformly in $I$ when the derivatives $f_n'(x)$ exist, then $\sum f_n(x)$ is convergent to $f(x)$ uniformly in $I$, and $\sum f_n(x)$ is termwise differentiable, that is, $f'(x) = \sum f_n'(x)$. If the $\varphi_n(z)$ are holomorphic in a complex domain $D$ and $\sum \varphi_n(z)$ converges to $\varphi(z)$ uniformly on every compact subset of $D$, then $\sum \varphi_n'(z)$ also converges to $\varphi'(z)$ uniformly on every compact subset of $D$ (Weierstrass's theorem of double series). (For termwise integration → 216 Integral Calculus.)
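The Cauchy product of Section F can be computed directly from its definition. The sketch below uses the 1-based indexing of the text; the helper name `cauchy_product` and both sample series are illustrative assumptions. The first sample, $a_n = b_n = 2^{-n}$, is absolutely convergent with $A = B = 1$. The second, $a_n = b_n = (-1)^{n-1}/\sqrt{n}$, is only conditionally convergent.

```python
from fractions import Fraction
from math import sqrt

def cauchy_product(a, b):
    """Terms c_1, ..., c_N of the Cauchy product of two series given as lists.

    a[0] and b[0] hold a_1 and b_1, so that
    c_n = a_1 b_n + a_2 b_{n-1} + ... + a_n b_1.
    """
    n_terms = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(n_terms)]

N = 40

# Absolutely convergent sample: a_n = b_n = 2^{-n}, so A = B = 1.
geo = [Fraction(1, 2 ** n) for n in range(1, N + 1)]
c_geo = cauchy_product(geo, geo)
partial = sum(c_geo)  # N-th partial sum of the product series

# Conditionally convergent sample: a_n = b_n = (-1)^{n-1} / sqrt(n).
alt = [(-1) ** (n - 1) / sqrt(n) for n in range(1, N + 1)]
c_alt = cauchy_product(alt, alt)
```

For the first sample $c_n = n/2^{n+1}$, and the partial sums $1 - (N+2)/2^{N+1}$ tend to $AB = 1$, in agreement with theorems (2) and (3). For the second sample every term satisfies $|c_n| \ge 1$, so the product series cannot converge. This shows that the absolute convergence hypothesis in theorem (2) cannot simply be dropped.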

G. Infinite Product

Let $\{a_n\}$ be a given sequence with terms $a_n \ne 0$ ($n = 1, 2, \dots$). The formal infinite product $a_1 \cdot a_2 \cdot a_3 \cdots$ is denoted by $\prod_{n=1}^{\infty} a_n$. We call $p_n = a_1 \cdot a_2 \cdots a_n$ its $n$th partial product. If the sequence $\{p_n\}$ is convergent to a nonzero limit $p$, then this infinite product is said to converge to $p$, and $p$ is called the value of the infinite product. We write $\prod a_n = p$. If $\{p_n\}$ is not convergent or is convergent to 0, then the infinite product is called divergent. Sometimes we consider the infinite product $\prod a_n$ with $a_n = 0$ for a finite number of $n$'s; and then by convergence or divergence of $\prod a_n$ we mean that of the infinite product $\prod a_n'$, where the sequence $\{a_n'\}$ is obtained by deleting zero terms from $\{a_n\}$. Usually we do not treat an infinite product with $a_n = 0$ for an infinite number of $n$'s.

A necessary and sufficient condition for $\prod_{n=1}^{\infty} a_n$ to be convergent is that for any positive $\varepsilon$ there is a number $N$ such that $|p_n/p_m - 1| < \varepsilon$ for all $n, m > N$. If $\prod a_n$ converges, then $a_n \to 1$, but the converse is not always true.

It is often convenient to write an infinite product as $\prod (1 + a_n)$. Then $\prod (1 + a_n)$ and $\sum \log(1 + a_n)$ have the same convergence behavior, where the imaginary part $i\theta$ of the logarithm is assumed to satisfy $0 \le |\theta| \le \pi$. If $a_n \ge 0$, convergence of $\prod (1 + a_n)$ implies convergence of $\sum a_n$, and vice versa.

If $\prod (1 + |a_n|)$ converges, then $\prod (1 + a_n)$ is said to be absolutely convergent. An absolutely convergent infinite product is also convergent, and the value of the infinite product is unchanged by the alteration of the order of terms.

I. Numerical Evaluation of Series

In some special cases, we can express the $n$th partial sum $s_n$ of a series $\sum a_n$ as a well-known function of $n$. Specifically, if $\sum a_n$ is an arithmetic progression $\sum_{k=1}^{n} (a + (k-1)d)$ or a geometric progression $\sum_{k=1}^{n} a q^{k-1}$, we have

$$s_n = \frac{n}{2}\bigl(2a + (n-1)d\bigr), \qquad s_n = \frac{a(q^n - 1)}{q - 1},$$

respectively. If $|q| < 1$, then $\sum_{n=0}^{\infty} a q^n$ converges to $a/(1 - q)$. If $B_{r+1}(x)$ is the $(r+1)$st †Bernoulli polynomial, then $s_n = 1^r + 2^r + \dots + n^r = [B_{r+1}(x)]_1^{n+1}/(r + 1)$. This sum was studied by J. Bernoulli, who gave formulas up to $r = 10$ in his Ars conjectandi.

In the series $\sum u_n$, if we can find another sequence $\{v_n\}$ such that $u_n = v_n - v_{n-1}$, then $s_n = u_1 + u_2 + \dots + u_n = v_n - v_0$. For example, if $u_n = n(n+1)(n+2)$, then $v_n = n(n+1)(n+2)(n+3)/4$ and $s_n = v_n$ because $v_0 = 0$ (→ 104 Difference Equations). Series with trigonometric function terms are calculated analogously.

There are cases where the sum $\sum a_n$ itself can be expressed in a satisfactory form although we cannot find an appropriately simple expression for each partial sum $s_n$. For example, $\zeta(r) = \sum_{m=1}^{\infty} 1/m^r$ can be represented by †Bernoulli numbers if $r$ is even. In particular, $\zeta(2) = \pi^2/6$, $\zeta(4) = \pi^4/90$ (→ Appendix A, Table 10).

If an infinite series converges rapidly, we can get a good approximation by taking a suitable partial sum. On the other hand, if the series converges less rapidly, an effective means for evaluating series is afforded by transformation of series. If the $k$th †difference is exactly zero, then

Since the absolute value of finite differences often decreases rapidly, it is sometimes convenient to consider the series whose terms are the differences of the original series. One finite difference method is Euler's transformation of infinite series. In particular, the formula

[…]

is useful for numerical calculation of sums of slowly converging alternating series. In numerical calculation of the series, we usually start calculating the numerical values of the first few terms; we then apply such transformations as Euler's to the remainder, and calculate the partial sums of the transformed series.

When we calculate the sum of an infinite series approximately, we must estimate the error, i.e., the remainder that must be added to yield the sum of the series itself. We can estimate the maximum error by derivatives or differences of higher orders. We also have the transformations of Markov and Kummer. In the former, every term of the series is represented by convergent series, and in the latter, the given series is reduced by subtracting another convergent series, which has a known sum and similar terms [1].

J. Infinite Series and Integrals

In numerical calculation of functions, we sometimes use the Euler-Maclaurin formula [5]:

$$\cdots \int \cdots f(z)\,dz \;\cdots\; + R_m, \qquad R_m = -\,\omega \cdots \int_0^1 \cdots\, f^{(m)}(x + \omega z)\,dz,$$

where […]

The speed of the convergence for this formula is greater than that for Taylor's expansion when $\omega$ is large, since the terms of the formula are †Bernoulli polynomials $B_n(t)$ in $0 < t < 1$. We also have Boole's formula, with †Euler polynomials as its terms [4]. The formulas discussed in this section are also used to calculate approximately the partial sums of infinite series.

Another method of evaluating sums of infinite series analytically entails transforming infinite series to definite integrals using the †residue theorem. If an analytic function $f(z)$ is holomorphic except at poles $a_n$ ($n = 1, 2, \dots, k$) in a domain bounded by the closed curve $C$ and containing the points $z = m$ ($m = 1, 2, \dots, N$), then

$$\sum_{m=1}^{N} f(m) = \frac{1}{2\pi i} \oint_C \pi (\cot \pi z) f(z)\,dz - \sum_{n=1}^{k} \operatorname{Res}[\pi (\cot \pi z) f(z)]_{z=a_n}.$$

When the left-hand side of this equation is replaced by

$$\sum_{m=1}^{N} (-1)^m f(m),$$

we replace $\cot \pi z$ by $\operatorname{cosec} \pi z$. $\operatorname{Res}[F(z)]_{z=a}$ is the †residue of $F(z)$ at $z = a$. The line integral along $C$ is often calculated easily by choosing a suitable deformation of $C$. Sometimes it can be shown immediately that the integral along $C$ is zero, or its asymptotic value can be evaluated by the †method of steepest descent.

K. History of the Study of Divergent Series

Mathematicians in the 18th century did not concern themselves with the question of whether †series were †convergent or †divergent. This indiscriminateness led to various contradictions. In 1821 an exact definition of the notion of convergence of series (Section A) was given by A. L. Cauchy; since then, mathematicians have mostly concerned themselves with convergent series. However, since divergent series appeared in many problems in analysis, the study of such series could not be neglected, and it became desirable to give a suitable definition of their sum. Although some results were given by L. Euler, N. Abel, and others, it was during the latter part of the 19th century that methods of summation of divergent series were studied systematically. This study constituted a new branch of mathematics.

In the following sections, some important methods of summation of divergent series are mentioned. Cesàro's method (→ Section M) was the forerunner of the theory whose general foundation is now the theory of linear transformations.

L. Linear Transformations

For a sequence $\{s_n\}$ ($n = 0, 1, 2, \dots$) of real or complex numbers, assume that $\sigma_n = \sum_{k=0}^{\infty} a_{nk} s_k$ converges for $n = 0, 1, 2, \dots$, where $(a_{ik})$ is a given matrix ($i, k = 0, 1, 2, \dots$). The mapping $T: \{s_n\} \to \{\sigma_n\}$ is called a linear transformation, and $\{\sigma_n\}$ is called the transform of $\{s_n\}$ under $T$. If the matrix satisfies $a_{ik} = 0$ ($k > i$), then $T$ is defined for any sequence $\{s_n\}$ and $T$ is said to

be triangular. If the transform $\{\sigma_n\}$ under $T$ is defined and convergent whenever $\{s_n\}$ converges, then $T$ is called a semiregular transformation. If in addition $\{\sigma_n\}$ has the same limit as $\{s_n\}$, then $T$ is called a regular transformation. If for any bounded sequence $\{s_n\}$ the transform $\{\sigma_n\}$ is defined and convergent, then $T$ is called a normal transformation. If $T$ is triangular and the transform $\{\sigma_n\}$ of $\{s_n\}$ under $T$ is divergent to $\infty$ whenever $s_n \to \infty$, then $T$ is called a totally regular transformation.

Let $T$ be a regular transformation. If for at least one divergent sequence $\{s_n\}$ the transform $\{\sigma_n\}$ of $\{s_n\}$ under $T$ converges, then $T$ is called a method of summation. The limit $s$ of $\{\sigma_n\}$ is called the sum of $\{s_n\}$ under the method $T$ of summation, and $\{s_n\}$ is said to be $T$-summable to $s$. For a given method of summation $T_1$, let $D(T_1)$ be the set of sequences that are $T_1$-summable. If $D(T_1) = D(T_2)$, then the methods $T_1$ and $T_2$ are called equivalent. If $D(T_1) \subset D(T_2)$, then we say that $T_1$ is weaker than $T_2$ and $T_2$ is stronger than $T_1$. If $D(T_1) \not\subset D(T_2)$ and $D(T_1) \not\supset D(T_2)$, then $T_1$ and $T_2$ are called mutually noncomparable. The following theorems on linear transformations of sequences are important:

(1) Kojima-Schur theorem. In order that $T$ be semiregular it is necessary and sufficient that (i) $\lim_{n\to\infty} a_{nk}$ exist for each $k$; (ii) $t_n = \sum_{k=0}^{\infty} |a_{nk}|$ exist and $\{t_n\}$ be bounded; and (iii) $\lim_{n\to\infty} \sum_{k=0}^{\infty} a_{nk}$ exist. In that case, we have

[…]

and

[…]

In particular, in order that $\{\sigma_n\}$ converge whenever $s_n \to 0$ it is necessary and sufficient that conditions (i) and (ii) be satisfied. In that case,

$$\lim_{n\to\infty} \sigma_n = \lim_{n\to\infty} \sum_{k=0}^{\infty} a_{nk} s_k = \sum_{k=0}^{\infty} \Bigl( \lim_{n\to\infty} a_{nk} \Bigr) s_k$$

(I. Schur, J. Reine Angew. Math., 151 (1921); T. Kojima, Tôhoku Math. J., 12 (1917)).

(2) Toeplitz's theorem. In order that $T$ be regular it is necessary and sufficient that (i') $\lim_{n\to\infty} a_{nk} = 0$ for each $k$; (ii); and (iii') $\lim_{n\to\infty} \sum_{k=0}^{\infty} a_{nk} = 1$ (O. Toeplitz, Prace Mat.-Fiz., 22 (1914)). In particular, (ii) and (i'') $\lim_{n\to\infty} \sum_{k=K}^{\infty} a_{nk} = 1$ for each $K$ imply that $T$ is regular (Perron's theorem).

(3) Schur's theorem. In order that $T$ be normal it is necessary and sufficient that (i); (ii); (iii); and (iv) for any $\varepsilon > 0$ there exist a $K > 0$ such that $\sum_{k=K+1}^{\infty} |a_{nk}| < \varepsilon$ for each $n$.

(4) In order that the regular triangular transformation $T$ be totally regular, it is necessary and sufficient that $a_{nk} \ge 0$ except for a finite number of $k$.

M. Cesàro's Method of Summation

We write

$$(1-x)^{-\alpha-1} = \sum_{n=0}^{\infty} A_n^{\alpha} x^n, \qquad A_n^{\alpha} = \binom{n+\alpha}{n} \sim \frac{n^{\alpha}}{\Gamma(\alpha+1)},$$

$$(1-x)^{-\alpha-1} \sum_{n=0}^{\infty} u_n x^n = (1-x)^{-\alpha} \sum_{n=0}^{\infty} s_n x^n = \sum_{n=0}^{\infty} s_n^{\alpha} x^n,$$

where $s_n = \sum_{i=0}^{n} u_i$. Thus the series $\sum u_i$ is associated with the sequence $\{s_i\}$. If $\sigma_n^{(\alpha)} = s_n^{\alpha}/A_n^{\alpha}$ converges to $s$ as $n \to \infty$, then we say that $\sum u_n$ is summable by Cesàro's method of order $\alpha$ (or simply (C, $\alpha$)-summable) to $s$ and write $\sum_{n=0}^{\infty} u_n = s$ (C, $\alpha$). This method of summation is called Cesàro's method of summation of order $\alpha$ (or simply (C, $\alpha$)-summation).

It is natural to consider (C, $\alpha$)-summation for $\alpha > -1$. We say that $\sum u_n$ is (C, $-1$)-summable if $\sum u_n$ converges and $n u_n = o(1)$. Here $\sum_{n=0}^{\infty} u_n = s$ (C, 0) means $\lim_{n\to\infty} s_n = s$, and $\sum_{n=0}^{\infty} u_n = s$ (C, 1) means $s = \lim_{n\to\infty} (s_0 + s_1 + \dots + s_{n-1})/n$. Generally, we have the following results:

(1) $A_n^{\alpha}$ is increasing if $\alpha > 0$ and decreasing if $0 > \alpha > -1$. $A_0^{\alpha} = 1$ and $A_n^{\alpha} > 0$ if $\alpha > -1$.
(2) $s_n^{\alpha+\beta+1} = \sum_{k=0}^{n} A_{n-k}^{\beta} s_k^{\alpha}$, $A_n^{\alpha} - A_{n-1}^{\alpha} = A_n^{\alpha-1}$, $s_n^{\alpha} - s_{n-1}^{\alpha} = s_n^{\alpha-1}$.
(3) (C, $\alpha$) ($\alpha > 0$) is regular, and $D(C, \alpha) \supset D(C, \beta)$ if $\alpha > \beta > -1$.
(4) If $\sum u_n = s$ (C, $\alpha$), then $u_n = o(n^{\alpha})$. Moreover, if $\sum u_n' = s'$ (C, $\alpha$), then $\sum (u_n + u_n') = s + s'$ (C, $\alpha$) and $\sum \lambda u_n = \lambda s$ (C, $\alpha$) for any number $\lambda$.
(5) If $\sum u_n = s$ (C, $\alpha$) and $\sum u_n' = s'$ (C, $\beta$), then their †Cauchy product $\sum v_n = ss'$ (C, $\alpha + \beta + 1$) (Chapman's theorem). Moreover, if […] $= O(1)$, then $\sum v_n = ss'$ (C, $\beta$) (T. Kojima). If $\alpha', \beta' > -1$, […], then $\sum v_n = ss'$ (C, $\alpha' + \beta' + 2$) (G. Doetsch).
(6) For any integer $\alpha > 0$, in order that $\sum u_n = s$ (C, $\alpha$) it is necessary and sufficient that there exist $\{v_n\}$ such that $u_n = (n+1)(v_n - v_{n+1})$ and $\sum v_n = s$ (C, $\alpha - 1$) (G. H. Hardy, Proc. London Math. Soc., (2) 8 (1910)). This condi-

tion is equivalent to (i)-(iii) together: (i) the Borel’s exponential method to the sum s, and
series C&s,“-*/(k + 1). (k + a) converges to we write C u, = s (B). The transformation thus
the limit b,; (ii) b,, = o(1) as n+ co; and (iii) determined is denoted by B and is called
(s,“-‘/A;-‘)+(n+a)T(a)b,,+,+s as n+co. Borel’s method of summation. If
(7) If z u, = s (C, a) (a > 0), one of the follow- cc
ing five conditions is sufficient for the conver- u(x)e-“dx = s,
gence of C u, (this is a kind of TTauberian s0
theorem): (i) nu, = o( 1); (ii) t, = Et=1 vu, = o(n); then C u, is said to be summable by Borel’s
(iii) CnpIu.Ip+l <co(p~O);(iv)nu,>-K(Kis integral method (or d-summable) to s, and we
independent of n); (v) lim inf(s, - s,) > 0 as m > write I; u, = s (23). Then we have: (1) B is regu-
~-CD, m/n+ 1 (R. Schmidt’s condition). lar and D(C,a)cD(B) (a> -I), while D(C,a)
(8) Ifa’>a> -1 and Xu.=s(C,a’), then Zu,, and D(d) are noncomparable. (2) If the radius
=s (C,a+c) for any E>O. of convergence of CEO u,x” is > 1 and C u, =
For a given series 2 u,, we write H,” for s,, s(B), then Cu,=s (A). (3) If C;=k+l u,=s (B)
Hi for the arithmetic mean of { Hf}, and H,’ (resp. (%J)), then Xu,=s+u,+u, + . . +u,JB)
for the arithmetic mean of {Hi}. Similarly, we (resp. (d)), but the converse is not always true.
can define {Hl} for any integer p. If H,P+s (4) Cu,=s (B) implies Iu,I1’“=o(n). (5) If Cu,=
as n+ co, then C u, is said to be summahle s(B) and Cub=s’ (B), then C(u.+ub)=s+s’ (B),
by Hiilder’s method of order p (or (H, p)- and C iu, = is (B) for any constant 1. The
summable) to s, and we write C u, = s(H, p). same is true for summation (d). (6) If C u, = s
For any integer p b 0, (H, p)-summability is (B) and if one of the following two conditions
equivalent to (C, p)-summability (Knopp- is satisfied, then C u, =s: (i) & u, = o( I);
Schnee theorem). (ii) lim inf(s, -s,) > 0 as m > n+ cc and (m -
n)$-0. (7) if Cu,=s (B) and si1: =o(n”-“*),
then Cu,=s (C,a) (a>O). (8) If Cu,=s (A) and
N. Abel’s Method of Summation u(t)> -MM~-‘expt, then Xu,=s (B). (9) If the
sequences {nk} and {nkS} satisfy nk+l > nkS,
If the radius of convergence of the power series n,~/n,>1+~(k=1,2,...;.z>O),u,=:O(nk<v<
z:n”Ou,r” is > 1 and C$=Ou,r”+s as r+l, then nkz), and Cu,=s (B), then s,~-s as k+co.
z IL, is said to be summable by Abel’s method
(or A-summable)
The transformation
and the transformation
to s, and we write C u, = s (A).
matrix is denoted by A,
is called Abel’s method
=d”u(4
si I
0
-xdx
IfCu,=s(b)and

Fe

of summation. converges for all i = 0, 1,2, , then C u, is


(l)IfCu,=s(A),thenlimsup,,,Iu,llin<l. said to be absolute Bore1 summable (or 1% )-
(2) If C u, = s (A) and C u; = s’ (A), then C(u, + summahle). Concerning this we have: (1) If
uk)s + s’ (A) and C lu, = Is (A) for any constant C I u, 1converges, then Z u, is 123I-summable,
i. Moreover, Czk+, u, = s - u0 - u1 - . . . - uk but even if C u, converges, C u, is not always
(A). (3) If C u, = s (A) and C u; = s’ (A), then 123I-summable. If Z u, is 1% I-summable, then
the Cauchy product is C u, = ss’ (A). (4) If C u,
C u, is 2%summable to s. In this case, we
= s (A) and one of the following five conditions write Cu,=s (ISI). (2) C.“=ou,=s (ISI) implies
is satisfied, then C u, = s: (i) nu, = o( 1); (ii) t, = C~,+,Un=S-(Uo+Ui+...+U~)(I~l).(3)1f
Et=1 vu, = o(n); (iii) nu, = 0( 1); (iv) nu, > -M; C u, = s (B), x uk = s’ (B), and if at least one of
(v) liminf(s,-s,)>O as m>n+co and m/n-+1. them is 123I-summable, then their Cauchy
These theorems are tTauberian, in the original product is Cu,=ss’ (IS]).
form proved by Tauber (Mom&. Math., 8
(1897)). (5) The matrix A is regular, and D(C, a)
cD(A)foranya>-l.(6)Ifzuu,=s(A)and P. Euler’s Method of Summation
s, > 0, then C u, = s (C, 1). Moreover, if @) =
O(l), then Cu,=s(C,a+E) for a> -1 and In the series x u,, if
&>O.

0. Borel’s Method of Summation


+
If for a given series C u,,
(s, = C;=. uk) converges to s as ke+ co, then C u,
u(x)=“go% is said to be summable by Euler’s method, and
we write C u, = s (E). The transformation thus
is convergent for all x, and u(x)/ex+s as x obtained is called Euler’s method of sum-
+ cx), then C u, is said to be summable by mation. A necessary and sufficient condition

for 2 u, = s (E) is that C v, = s, where u, = as h+O, then C u, is called (R,)-summable or


2-(n+l)C;=o n uk. This summation (R,)-summable to s, respectively. The summa-
method
0k tion method (R,) is not regular, while (R2) is
is regular. We have C u, = s if C u, = s (E) and regular. If C u, is (R,)-summable, then it is
if one of the following two conditions is satis- also (R,)-summable, but (R, 2) and (R2) are
lied: (i) & u, = 0( 1); (ii) lim inf(s, - s,) 2 0 for noncomparable.
Other methods of summation were devel-
m>n+cc, (m-n)/&-*l. Cesaro and Euler
oped by G. H. Hardy and J. E. Littlewood, E.
summations are noncomparable. As an ex-
Le Roy, C. J. de La Vallee Poussin, and others
tension of Euler’s method, the Euler method
of summation of pth order is also defined
(e.g.,- C61).
For related topics - 121 Dirichlet Series,
(e.g.,- C61). 159 Fourier Series, 339 Power Series.

Q. Niirhmd’s Method of Summation


References

For a positive sequence {p,}, let P. = xt=,,py+


[l] K. Knopp, Theorie und Anwendung der
cc as n+cc and p,/P,-+O as n+oo. If
unendlichen Reihen, Springer, 1921; fifth edi-
tion, 1964; English translation from the second
edition, Theory and application of infinite
series, Blackie, 1928.
converges to s as n --* co, then C u, is said to be [2] T. J. PA. Bromwich, An introduction to
summable by Niirlund’s method of type (p,},
the theory of infinite series, Macmillan, second
and we write x u, = s (N, {p”}). The transfor- edition, 1926.
mation thus obtained is also regular and is For numerical computation of series,
called Nurlund’s method of summation. If [3] L. B. W. Jolley, Summation of series,
~u,=s(C,l)andO<p,<p,< . . . . thenCu,=
Chapman & Hall, 1925.
s (N, {p,}). Cesaro’s method is actually a [4] N. E. Norlund, Vorlesungen iiber Dif-
special case of this method. ferenzenrechnung, Springer, 1924.
For the Euler-Maclaurin formula,
R. M. Riesz’s Method of Summation [S] N. Bourbaki, Elements de mathematique,
Fonctions dune variable rtele, ch. 6, Actualites
Let (1,) be a sequence with increasing terms Sci. Ind., 1132a, Hermann, second edition,
and tending to +cc as n+co. If 1961.
For the summation of divergent series and
Tauberian theorems,
[6] G. H. Hardy, Divergent series, Clarendon
Press, 1949.
converges to s as r+co, then Cu, is said to be
[7] R. G. Cooke, Infinite matrices and se-
summable by Riesz’s method of order k and
quence spaces, Macmillan, 1950.
type d,, and we write C u, = s (R, A,, k). The
[S] K. Zeller, Theorie der Limitierungsverfah-
transformation thus obtained is regular and is
ren, Erg. Math., Springer, 1958; second edition,
called Riesz’s method of summation of the kth
1970.
order. In particular, if L, = n, then D(R, A,, k) =
[9] K. Chandrasekharan and S. Minakshisun-
NC, 4.
daram, Typical means, Oxford Univ. Press,
1952.
S. Riemann’s Method of Summation [lo] H. R. Pitt, Tauberian theorems, Oxford
Univ. Press. 1958.
If

g1U”($qk, u,=o,
converges for h>O and tends to s as h+O, then 380 (X.15)
C u, is said to be (R, k)-summable to s. When Set Functions
k = 1, this method, often called Lebesgue’s
method of summation, is not regular. When k =
2, it is ordinarily called Riemann’s method of A. General Remarks
summation and is regular. Corresponding to
A tfunction whose domain is a tfamily of sets
these cases, if
is called a set function. Usually we consider
set functions that take real values or fco.
For example, if f(x) is a real-valued function

defined on a set $X$, and if we assign to each subset $A$ of $X$ values such as $\sup_A f$, $\inf_A f$, or $\sup_A f - \inf_A f$, then we obtain a corresponding set function. In particular, a set function whose domain is the family of left open intervals in $\mathbf{R}^n$ is called an interval function. To distinguish between set functions and ordinary functions defined at each point of a set, we call the latter point functions. For example, if $f(x)$ is an †integrable (point) function with $\mathbf{R}$ ($= \mathbf{R}^1$) as its domain and we put $F(I) = \int_I f(x)\,dx$ for $I = (a, b]$, then we obtain an interval function $F$ on $\mathbf{R}$.

B. Finitely Additive Set Functions

Let $\Phi(E)$ be a real-valued set function defined on a †finitely additive class $\mathfrak{B}$ in a space $X$. If $\Phi$ satisfies the finite additivity condition:

$$E_1, E_2 \in \mathfrak{B},\; E_1 \cap E_2 = \emptyset \quad\text{imply}\quad \Phi(E_1 \cup E_2) = \Phi(E_1) + \Phi(E_2),$$

then $\Phi(E)$ is called a finitely additive set function on $\mathfrak{B}$. For each $E \in \mathfrak{B}$ we denote $\sup\{\Phi(A) \mid A \subset E, A \in \mathfrak{B}\}$ ($\inf\{\Phi(A) \mid A \subset E, A \in \mathfrak{B}\}$) by $\overline{V}(\Phi; E)$ ($\underline{V}(\Phi; E)$), the upper (lower) variation of $\Phi$. Since $\Phi(\emptyset) = 0$, we have $\underline{V}(\Phi; E) \le 0 \le \overline{V}(\Phi; E)$. $V(\Phi; E) = \overline{V}(\Phi; E) + |\underline{V}(\Phi; E)|$ is called the total variation of $\Phi$ on $E$. When we deal with a fixed $\Phi$, instead of $\overline{V}(\Phi; E)$, $\underline{V}(\Phi; E)$, $V(\Phi; E)$ we write simply $\overline{V}(E)$, $\underline{V}(E)$, $V(E)$. If $V(\Phi; E)$ is bounded, then $\Phi$ is said to be of bounded variation. If $\Phi(E) \ge 0$ ($\le 0$) for every $E \in \mathfrak{B}$, i.e., $E \subset E'$ implies $\Phi(E) \le \Phi(E')$ ($\Phi(E) \ge \Phi(E')$), then $\Phi$ is said to be monotone increasing (decreasing). Every finitely additive set function of bounded variation can be represented as the difference of two monotone increasing finitely additive set functions.

Let $I_0$ be a fixed interval in $\mathbf{R}^n$ and $F(I)$ be an interval function defined for left open intervals $I \subset I_0$, where $\emptyset$ is considered as a degenerate left open interval. If, for any two left open intervals $I_1$, $I_2$ such that $I_1 \cup I_2$ is an interval and $I_1 \cap I_2 = \emptyset$, we have $F(I_1 \cup I_2) = F(I_1) + F(I_2)$, then we call $F(I)$ an additive interval function in $I_0$. Specifically, if $f(x)$ is a real-valued bounded function on $\mathbf{R}$ and $D$ is an interval function determined by $D(I) = f(b) - f(a)$, where $I = (a, b]$ (i.e., $D(I)$ is the increment of $f(x)$), then $D$ is an additive interval function on $\mathbf{R}$, called the increment function of $f$. For a given $f$ the increment function is determined uniquely. Conversely, for a given $D$, a function $f$ such that $D$ is its increment function is determined uniquely up to an additive constant. In this sense an additive interval function in $\mathbf{R}$ may be identified with the corresponding point function on $\mathbf{R}$.

Let $\mathfrak{R}(I_0)$ be the finitely additive class of all finite unions $R$ of left open intervals in $I_0$. Then any additive interval function $F(I)$ can be extended to a finitely additive set function $F(R)$ defined on $\mathfrak{R}(I_0)$. For the rest of this article, it is understood that an additive interval function means this extended set function. If for any $\varepsilon > 0$ there exists a $\delta > 0$ such that $|I| < \delta$ implies $|F(I)| < \varepsilon$ (where $|I|$ is the volume of the interval $I$), then we say that $F$ is continuous.

C. Completely Additive Set Functions

Let $\Phi(E)$ be a real-valued set function defined on a †completely additive class $\mathfrak{B}$ in a space $X$. If $\Phi$ satisfies the complete additivity condition:

$$E_j \in \mathfrak{B}\ (j = 1, 2, \dots),\; E_j \cap E_k = \emptyset\ (j \ne k) \quad\text{imply}\quad \Phi\Bigl( \bigcup_{j=1}^{\infty} E_j \Bigr) = \sum_{j=1}^{\infty} \Phi(E_j),$$

then $\Phi(E)$ is called a completely additive set function (or simply additive set function) on $\mathfrak{B}$. In this case the corresponding upper variation $\overline{V}(E)$, lower variation $\underline{V}(E)$, and total variation $V(E)$ are all completely additive set functions, and for every $E \in \mathfrak{B}$ we have $\Phi(E) = \overline{V}(E) + \underline{V}(E)$ (Jordan decomposition). Furthermore, $V(E) = \sup \sum_{j=1}^{\infty} |\Phi(E_j)|$, where the supremum is taken over all decompositions of $E$ such that $E = \bigcup_{j=1}^{\infty} E_j$ ($E_j \in \mathfrak{B}$, $E_j \cap E_k = \emptyset$, $j \ne k$). The completely additive nonnegative set functions are the same as the finite measures. Hence the Jordan decomposition implies that every completely additive set function is represented as the difference of two finite measures. A completely additive set function is also called a signed measure.

Any continuous additive interval function of bounded variation can be extended to a completely additive set function. The notion of additive interval function of bounded variation is a generalization of that of function of bounded variation (→ 166 Functions of Bounded Variation).

Let $\Phi$ be a completely additive set function and $\mu$ a finite or $\sigma$-finite measure, both defined on $\mathfrak{B}$. If $\mu(E) = 0$ implies $\Phi(E) = 0$, then $\Phi$ is said to be absolutely continuous with respect to $\mu$ or $\mu$-absolutely continuous. Then $\Phi$ is $\mu$-absolutely continuous if and only if for any $\varepsilon > 0$, there exists a $\delta > 0$ such that $\mu(E) < \delta$ implies $|\Phi(E)| < \varepsilon$. If for given $\Phi$ and $\mu$ there exists an $E_0 \in \mathfrak{B}$ such that $\mu(E_0) = 0$ and $\Phi(E) = \Phi(E \cap E_0)$ for every $E \in \mathfrak{B}$, then $\Phi$ is said to be singular with respect to $\mu$ or $\mu$-singular. In a †$\sigma$-finite measure space $(X, \mathfrak{B}, \mu)$, every completely additive set function $\Phi(E)$ defined on $\mathfrak{B}$ can be represented uniquely as the sum of a $\mu$-absolutely continuous set function and a

$\mu$-singular set function (Lebesgue decomposition theorem). Also, $\Phi(E)$ is $\mu$-absolutely continuous if and only if $\Phi(E)$ can be represented as the indefinite integral $\int_E f(x)\,d\mu$ of a function $f$ that is integrable on $X$ with respect to $\mu$ (Radon-Nikodym theorem). This function $f(x)$ is called the Radon-Nikodym derivative, $d\Phi/d\mu$, of $\Phi$ with respect to $\mu$ (→ 270 Measure Theory L (iii)).

D. Differentiation of Set Functions

Let $m$ be the Lebesgue measure in $\mathbf{R}^n$ and $E$ a Lebesgue measurable set. We denote $\sup(m(E)/m(Q))$ for all cubes $Q$ such that $E \subset Q$ by $r(E)$ and call it the parameter of regularity of $E$. If for a sequence of sets $\{E_n\}$ there exists an $\alpha$ such that $r(E_n) > \alpha > 0$, then $\{E_n\}$ is called a regular sequence. If all the $E_n$ contain a point $P$ and the †diameter of $E_n$ tends to 0 as $n \to \infty$, then we say that $\{E_n\}$ converges to the point $P$.

Let $\Phi$ be a set function in $\mathbf{R}^n$. For a regular sequence $\{E_n\}$ of closed sets converging to a point $P$, we put $l = \limsup(\Phi(E_n)/m(E_n))$ and define the general upper derivative of $\Phi$ at $P$ to be the least upper bound of $l$ for all such sequences $\{E_n\}$, denoted by $\overline{D}\Phi(P)$. Similarly, the general lower derivative $\underline{D}\Phi(P)$ of $\Phi$ at $P$ is defined to be the greatest lower bound of $\liminf(\Phi(E_n)/m(E_n))$ for all regular sequences $\{E_n\}$ of closed sets converging to $P$. The ordinary upper (lower) derivative, denoted by $\overline{\Phi}'(P)$ ($\underline{\Phi}'(P)$), is defined in the same way by taking regular sequences of closed intervals instead of closed sets. $\overline{D}\Phi$, $\underline{D}\Phi$, $\overline{\Phi}'$, $\underline{\Phi}'$ are point functions derived from $\Phi$. Clearly, $\underline{D}\Phi(P) \le \underline{\Phi}'(P) \le \overline{\Phi}'(P) \le \overline{D}\Phi(P)$. If $\overline{D}\Phi(P) = \underline{D}\Phi(P)$, then we write it simply as $D\Phi(P)$. If $D\Phi(P)$ is finite, then we call it the general derivative of $\Phi$ at $P$ and say that $\Phi$ is derivable in the general sense at $P$. If $\overline{\Phi}'(P) = \underline{\Phi}'(P)$, then we write it as $\Phi'(P)$. If $\Phi'(P)$ is finite, then we call it the ordinary derivative of $\Phi$ at $P$ and say that $\Phi$ is derivable in the ordinary sense. We have the following theorems: (1) A completely additive set function is derivable in the general sense †almost everywhere (Lebesgue). The Radon-Nikodym derivative of a set function absolutely continuous with respect to the Lebesgue measure is equal almost everywhere to the generalized derivative of the set function. (2) An additive interval function of bounded variation is derivable in the ordinary sense almost everywhere (Lebesgue). (3) An additive interval function $\Phi$ is derivable in the ordinary sense at almost all points such that $\overline{\Phi}'(P) < +\infty$ or $\underline{\Phi}'(P) > -\infty$.

For the proof of these theorems, Vitali's covering theorem is essential: Let $A$ be a given set and $\mathfrak{F}$ a family of measurable sets in Euclidean space. If for each $x \in A$ there is a regular sequence of sets belonging to $\mathfrak{F}$ that converges to $x$, then there exists a finite or countable set of disjoint $E_j \in \mathfrak{F}$ such that $m(A \setminus \bigcup_{j \ge 1} E_j) = 0$.

References

[1] H. Lebesgue, Leçons sur l'intégration et la recherche des fonctions primitives, Gauthier-Villars, 1904; second edition, 1928.
[2] S. Saks, Theory of the integral, Stechert, 1937 (Dover, 1964).
[3] W. Rudin, Real and complex analysis, McGraw-Hill, second edition, 1974.
[4] H. L. Royden, Real analysis, Macmillan, second edition, 1963.

381 (II.1)
Sets

A. Definitions and Symbols

G. Cantor defined a set as a collection of objects of our intuition or thought, within a certain realm, taken as a whole. Each object in the collection is called an element (or member) of the set. The notation $a \in A$ ($A \ni a$) means that $a$ is an element of the set $A$. In this case we say that $a$ is a member of $A$ or $a$ belongs to $A$. The negation of $a \in A$ ($A \ni a$) is written $a \notin A$ or $a \,\bar{\in}\, A$ ($A \not\ni a$ or $A \,\bar{\ni}\, a$). The set having no element, namely the set $A$ such that $a \notin A$ for every object $a$, is called the empty set (or null set) and is usually denoted by $\emptyset$. Two sets $A$ and $B$ are identical, i.e., $A = B$, if every element of $A$ belongs to $B$, and vice versa. The set containing $a, b, c, \dots$ as its elements is said to consist of $a, b, c, \dots$ and is denoted by $\{a, b, c, \dots\}$. The symbol $\{x \mid C(x)\}$ (or $\{x; C(x)\}$, sometimes $E_x[C(x)]$) denotes the set of all objects that have the property $C(x)$. Thus $\{a\}$ is the set whose only element is $a$, and $\{a, b\}$ is the set with two elements $a$ and $b$, provided that $a \ne b$. A set is called a finite set or an infinite set according as the number of its elements is finite or infinite.

A set $A$ is a subset of a set $B$ if each element of $A$ is an element of $B$. In this case we also say that $A$ is contained in $B$ or that $B$ contains $A$, and we write $A \subset B$ and $B \supset A$. The negation of $A \subset B$ ($B \supset A$) is $A \not\subset B$ ($B \not\supset A$). For every set $A$, $\emptyset \subset A$. $A \subset B$ and $B \subset C$ imply $A \subset C$. If $A \subset B$ and $B \subset A$, then $A = B$. $A$ is a proper subset of $B$ (in symbols: $A \subsetneq B$, $B \supsetneq A$) if $A \subset B$ and $A \ne B$. Some authors use $\subseteq$ ($\supseteq$) for $\subset$ ($\supset$), and $\subset$ ($\supset$) for $\subsetneq$ ($\supsetneq$).

B. Algebra of Sets

The union (join or sum) of sets $A$ and $B$, written $A \cup B$, is the set of all elements which belong either to $A$ or to $B$ or to both. The intersection (meet or product) of sets $A$ and $B$, written $A \cap B$, is the set of all elements which belong to both $A$ and $B$. In other words, $x \in A \cup B$ if and only if $x \in A$ or $x \in B$ or both, and $x \in A \cap B$ if and only if $x \in A$ and $x \in B$. Given sets $A$, $B$, and $C$, $A \cup B = B \cup A$, $A \cap B = B \cap A$ (commutative law); $(A \cup B) \cup C = A \cup (B \cup C)$, $(A \cap B) \cap C = A \cap (B \cap C)$ (associative law); $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$, $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$ (distributive law); $A \cup (A \cap B) = A$, $A \cap (A \cup B) = A$ (absorption law).
Two sets $A$ and $B$ are disjoint if $A \cap B = \emptyset$. In this case the set $C = A \cup B$ is said to be the disjoint union (or sum) of $A$ and $B$, and is written sometimes as $C = A + B$. The set of elements of $A$ which are not members of $B$ is denoted by $A - B$, and is called the difference of $A$ and $B$ (or relative complement of $B$ in $A$). If $A \supset B$, $A - B$ is called the complement (or complementary set) of $B$ with respect to $A$.
We often consider a theory in which we restrict our attention to elements and subsets of a certain fixed set $\Omega$, and call it the universal set of the theory. In geometric terms, $\Omega$ is also called the space or the abstract space, elements of $\Omega$ are called points, and subsets of $\Omega$ point sets. If $A$ is a subset of $\Omega$, $\Omega - A$ is simply called the complement of $A$ and is denoted $A^c$. For $A \subset \Omega$ and $B \subset \Omega$, $A \supset B$ and $A^c \subset B^c$ are equivalent. Furthermore, we have $A \cup A^c = \Omega$, $A \cap A^c = \emptyset$, $A^{cc} = A$; and $(A \cap B)^c = A^c \cup B^c$, $(A \cup B)^c = A^c \cap B^c$ (de Morgan's law).
The power set of a set $X$, written $\mathfrak{P}(X)$, is the set of all subsets of $X$. A set whose elements are sets is often called a family of sets.
The pair consisting of objects $a$ and $b$ is denoted by $(a, b)$. Two pairs $(a, b)$, $(c, d)$ are defined to be equal if and only if $a = c$ and $b = d$. A pair $(a, b)$ is called an ordered pair, while the set $\{a, b\}$ is sometimes called an unordered pair. Generally the $n$-tuple $(a, b, c, \ldots, d)$ of $n$ given objects $a, b, c, \ldots, d$ is defined to be $((\ldots((a, b), c), \ldots), d)$, so that $(a, b, c, \ldots, d) = (a', b', c', \ldots, d')$ if and only if $a = a'$, $b = b'$, $\ldots$, $d = d'$. The Cartesian product (or direct product) of sets $A$ and $B$, written $A \times B$, is the set of all pairs $(a, b)$ such that $a \in A$ and $b \in B$. $A \times B = \emptyset$ if and only if either $A = \emptyset$ or $B = \emptyset$; $A \times B \subset C \times D$ if and only if $A \subset C$ and $B \subset D$, provided that neither $A$ nor $B$ is empty. Furthermore,

$(A \times B) \cup (A' \times B) = (A \cup A') \times B$,

$(A \times B) \cap (C \times D) = (A \cap C) \times (B \cap D)$.

The subset $\{(a, a) \mid a \in A\}$ of $A \times A$ is called the diagonal of $A \times A$ and is denoted by $\Delta_A$.
Generally the Cartesian product (or direct product) of sets $A, B, \ldots, D$, written $A \times B \times \cdots \times D$, is defined as the set $\{(a, b, \ldots, d) \mid a \in A, b \in B, \ldots, d \in D\}$.

C. Mappings

If there exists a rule which assigns to each element of a set $A$ an element of a set $B$, this rule is said to define a mapping (or simply map), function, or transformation from $A$ into $B$. The term transformation is sometimes restricted to the case where $A = B$. Usually letters $f$, $g$, $\varphi$, $\psi$ stand for mappings. The expression $f: A \to B$ ($A \xrightarrow{f} B$) means that $f$ is a function which maps $A$ into $B$. If $f: A \to B$ and $a \in A$, then $f(a)$ denotes the element of $B$ which is assigned to $a$ by $f$. We call $f(a)$ the image of $a$ under $f$. The notation $f: a \mapsto b$ (or $f: a \to b$) is often used to mean $f(a) = b$ (but not in the present volumes). The domain of a mapping $f: A \to B$ is the set $A$, and its range (or codomain), written $f(A)$, is the subset $\{f(a) \mid a \in A\}$ of $B$. Two functions $f$ and $g$ are equal ($f = g$) if their domains coincide and $f(a) = g(a)$ for each $a$ in the common domain.
For a mapping $f: A \to B$ and a set $C \in \mathfrak{P}(A)$, $f(C)$ is defined to be the set $\{f(x) \mid x \in C\}$. This definition induces the mapping from $\mathfrak{P}(A)$ to $\mathfrak{P}(B)$ which is usually also denoted by $f$. If $A_i \in \mathfrak{P}(A)$ ($i = 1, 2$), then $f(A_1 \cup A_2) = f(A_1) \cup f(A_2)$ and $f(A_1 \cap A_2) \subset f(A_1) \cap f(A_2)$. The inverse image of $D \in \mathfrak{P}(B)$, denoted by $f^{-1}(D)$, is defined to be the set $\{x \mid x \in A, f(x) \in D\}$; thus the mapping $f^{-1}: \mathfrak{P}(B) \to \mathfrak{P}(A)$ is defined. If $B_i \in \mathfrak{P}(B)$ ($i = 1, 2$), then $f^{-1}(B_1 \cup B_2) = f^{-1}(B_1) \cup f^{-1}(B_2)$; $f^{-1}(B_1 \cap B_2) = f^{-1}(B_1) \cap f^{-1}(B_2)$; $f^{-1}(B - B_1) = A - f^{-1}(B_1)$. Furthermore, $A_1 \subset f^{-1} \circ f(A_1)$ and $f \circ f^{-1}(B_1) \subset B_1$.
A mapping $g$ is an extension of a mapping $f$ to a set $A'$ if $A'$ is the domain of $g$ and contains the domain $A$ of $f$ and if $g(a) = f(a)$ for each $a$ in $A$. In this case $f$ is called a contraction (or restriction) of $g$ to $A$ or simply a partial mapping of $g$, and is denoted by $g \mid A$. A mapping $f$ is the constant mapping (or constant function) with the value $b_0$ if $f(a) = b_0$ for every $a$ in the domain of $f$. The identity mapping (or identity function) on $A$, often denoted by $1_A$, is the mapping with the domain $A$ such that $f(a) = a$ for every $a$ in $A$. Given two mappings $f: A \to B$ and $g: B \to C$, the mapping from $A$ to $C$ which assigns $g(f(a))$ to each $a \in A$ is called the composite of $f$ and $g$ and is denoted by $g \circ f$. If $f: A \to B$, $g: B \to C$, and $h: C \to D$, then $(h \circ g) \circ f = h \circ (g \circ f)$ (associative law for composition of mappings).
A mapping $f: A \to B$ is from $A$ onto $B$ if $f(A) = B$. In this case $f$ is also called a surjection (or a surjective mapping). A mapping $f: A \to B$

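is one-to-one (1-1, or injective) if $a \neq a'$ implies $f(a) \neq f(a')$ for every pair of elements $a$ and $a'$ in $A$, that is, if for each $b$ in the range of $f$, there exists only one element $a$ of $A$ such that $f(a) = b$. Such an $f$ is also called an injection. In particular, given a subset $B$ of a set $A$, the injection $f: B \to A$ defined by the condition $f(b) = b$ for each $b \in B$ is called the inclusion mapping (inclusion or canonical injection). A necessary and sufficient condition for $f: A \to B$ to be a surjection is that $g \circ f = h \circ f$ imply $g = h$ for every pair of mappings $g: B \to C$ and $h: B \to C$. For $f: A \to B$ to be an injection it is necessary and sufficient that $f \circ g = f \circ h$ imply $g = h$ for every pair of mappings $g: C \to A$ and $h: C \to A$. A mapping which is both a surjection and an injection is called a bijection (or bijective mapping). If $f: A \to B$ is a bijection, then the mapping from $B$ to $A$ which assigns to each element $b$ of $B$ the unique element $a$ of $A$ such that $f(a) = b$ is called the inverse mapping (inverse function or simply inverse) of $f$ and is denoted by $f^{-1}$. We have $f \circ f^{-1} = 1_B$ and $f^{-1} \circ f = 1_A$ for every bijection $f: A \to B$.
If the domain $A$ of a mapping $f: A \to B$ is the Cartesian product of $A_1$ and $A_2$, $f(a) = b$ (where $a = (a_1, a_2)$) is written as $f(a_1, a_2) = b$. Given $A = A_1 \times A_2$, $B = B_1 \times B_2$, and $f_i: A_i \to B_i$ ($i = 1, 2$), the mapping $f: A \to B$ defined by the condition $f(a_1, a_2) = (f_1(a_1), f_2(a_2))$ is called the Cartesian product (or direct product) of the mappings $f_1$ and $f_2$, and is denoted by $f_1 \times f_2$.
For a mapping $f: A \to B$, the subset $G = \{(a, f(a)) \mid a \in A\}$ of $A \times B$ is called the graph of $f$. The basic properties of the graph $G$ of $f$ are: (1) For every $a \in A$ there exists a $b \in B$ such that $(a, b) \in G$. (2) $(a, b) \in G$ and $(a, b') \in G$ imply $b = b'$. Conversely, a subset $G$ of $A \times B$ with these two properties determines a mapping $f: A \to B$ such that $(a, b) \in G$ if and only if $f(a) = b$. All notions concerning a mapping $f: A \to B$ can be transferred by means of its graph to those concerning a subset of a Cartesian product $A \times B$.
Given sets $A$ and $B$, we denote by $B^A$ the set of mappings from $A$ to $B$. If a mapping is identified with its graph, $B^A$ is considered to be a subset of $\mathfrak{P}(A \times B)$. For $X \in \mathfrak{P}(A)$, the mapping $c_X: A \to \{0, 1\}$ such that $c_X(x) = 1$ if $x \in X$ and $c_X(x) = 0$ if $x \notin X$ is called the characteristic function (or representing function) of $X$. By assigning to each $X \in \mathfrak{P}(A)$ its characteristic function $c_X \in \{0, 1\}^A$, we obtain a one-to-one correspondence between $\mathfrak{P}(A)$ and $\{0, 1\}^A$; hence $\mathfrak{P}(A)$ is sometimes denoted by $2^A$.

D. Families of Sets

A mapping from a set $\Lambda$ to a set $A$ is also called a family of elements of $A$ indexed by $\Lambda$. $\Lambda$ is its index set (or indexing set). In this case, the image $f(\lambda)$ of $\lambda \in \Lambda$ is denoted by $a_\lambda$, and the mapping itself is denoted by $\{a_\lambda\}_{\lambda \in \Lambda}$ ($\{a_\lambda\}$ ($\lambda \in \Lambda$), or simply $\{a_\lambda\}$). In particular, if the set $A$ is the power set of a set, the family $\{a_\lambda\}_{\lambda \in \Lambda}$ is called a family of sets indexed by $\Lambda$, or simply a family of sets. (Moreover, if $\Lambda$ is chosen to be a subset of the power set $\mathfrak{P}(X)$ of a set $X$ and $f$ to be the identity mapping on $\Lambda$, then the family of sets resulting from $f$ can be identified with the set of subsets $\Lambda$ itself.)
The union $\bigcup_{\lambda \in \Lambda} A_\lambda$ of a family of sets $\{A_\lambda\}_{\lambda \in \Lambda}$ is the set of all elements $a$ such that $a \in A_\lambda$ for at least one $\lambda$ in $\Lambda$. Their intersection $\bigcap_{\lambda \in \Lambda} A_\lambda$ is the set of all elements $a$ such that $a \in A_\lambda$ for all $\lambda$ in $\Lambda$. A family of sets $\{A_\lambda\}_{\lambda \in \Lambda}$ is mutually disjoint if $\lambda \neq \mu$ implies $A_\lambda \cap A_\mu = \emptyset$. In this case $A = \bigcup_{\lambda \in \Lambda} A_\lambda$ is called the disjoint union (or direct sum) of the sets of the family, and $\{A_\lambda\}_{\lambda \in \Lambda}$ is called a partition (or decomposition) of $A$. For families of sets, the following hold: $\bigcup_{\lambda, \mu} A_{\lambda\mu} = \bigcup_\lambda (\bigcup_\mu A_{\lambda\mu})$, $\bigcap_{\lambda, \mu} A_{\lambda\mu} = \bigcap_\lambda (\bigcap_\mu A_{\lambda\mu})$ ($\lambda \in \Lambda$, $\mu \in M$) (associative law); $(\bigcup_{\lambda \in \Lambda} A_\lambda) \cap (\bigcup_{\mu \in M} B_\mu) = \bigcup_{(\lambda, \mu) \in \Lambda \times M} (A_\lambda \cap B_\mu)$, $(\bigcap_{\lambda \in \Lambda} A_\lambda) \cup (\bigcap_{\mu \in M} B_\mu) = \bigcap_{(\lambda, \mu) \in \Lambda \times M} (A_\lambda \cup B_\mu)$ (distributive law); $(\bigcup_{\lambda \in \Lambda} A_\lambda)^c = \bigcap_{\lambda \in \Lambda} A_\lambda^c$, $(\bigcap_{\lambda \in \Lambda} A_\lambda)^c = \bigcup_{\lambda \in \Lambda} A_\lambda^c$ (de Morgan's law).
A family $\{A_\lambda\}_{\lambda \in \Lambda}$ of sets is a covering of a set $A$, or covers $A$, if $A \subset \bigcup_{\lambda \in \Lambda} A_\lambda$.
Given $f: X \to Y$, $\{A_\lambda\}_{\lambda \in \Lambda}$ and $\{B_\lambda\}_{\lambda \in \Lambda}$ (where $A_\lambda \subset X$ and $B_\lambda \subset Y$), then $f(\bigcup_{\lambda \in \Lambda} A_\lambda) = \bigcup_{\lambda \in \Lambda} f(A_\lambda)$, $f(\bigcap_{\lambda \in \Lambda} A_\lambda) \subset \bigcap_{\lambda \in \Lambda} f(A_\lambda)$; and $f^{-1}(\bigcup_{\lambda \in \Lambda} B_\lambda) = \bigcup_{\lambda \in \Lambda} f^{-1}(B_\lambda)$, $f^{-1}(\bigcap_{\lambda \in \Lambda} B_\lambda) = \bigcap_{\lambda \in \Lambda} f^{-1}(B_\lambda)$.

E. Direct Sum and Direct Product of Families of Sets

Given a family $\{A_\lambda\}_{\lambda \in \Lambda}$ of sets indexed by $\Lambda$, a set $S$, and a family of injections $\{i_\lambda: A_\lambda \to S\}_{\lambda \in \Lambda}$, then the pair $(S, \{i_\lambda\}_{\lambda \in \Lambda})$ is called the direct sum of $\{A_\lambda\}_{\lambda \in \Lambda}$ if $\{i_\lambda(A_\lambda)\}_{\lambda \in \Lambda}$ is a partition of $S$. In this case, $S$ is written $\sum_{\lambda \in \Lambda} A_\lambda$ (or $\sum_\lambda A_\lambda$ or $\coprod_\lambda A_\lambda$). Each $A_\lambda$ is called a direct summand of $S$, and each $i_\lambda$ is called a canonical injection.
The Cartesian product (or direct product) $\prod_{\lambda \in \Lambda} A_\lambda$ ($\prod_\lambda A_\lambda$) of $\{A_\lambda\}_{\lambda \in \Lambda}$ (where $A_\lambda \subset X$) is the set of all mappings $f$ from $\Lambda$ to $X$ such that $f(\lambda) \in A_\lambda$ for every $\lambda \in \Lambda$. The sets $A_\lambda$ are the direct factors of $\prod_{\lambda \in \Lambda} A_\lambda$. Each element $f$ of $\prod_{\lambda \in \Lambda} A_\lambda$ is denoted by $\{x_\lambda\}_{\lambda \in \Lambda}$ or $(\ldots, x_\lambda, \ldots)$ (where $x_\lambda = f(\lambda)$). The element $x_\lambda$ is the $\lambda$th component (or coordinate) of $f$. The mapping $\mathrm{pr}_\lambda: \prod_{\lambda \in \Lambda} A_\lambda \to A_\lambda$ which assigns $x_\lambda$ to each $\{x_\lambda\}_{\lambda \in \Lambda} \in \prod_{\lambda \in \Lambda} A_\lambda$ is called the projection of $\prod_\lambda A_\lambda$ onto its $\lambda$th component. If $\Lambda = \{1, 2\}$, $\prod_{\lambda \in \Lambda} A_\lambda$ can be identified with $A_1 \times A_2$.

F. Set Theory

It was G. †Cantor who introduced the concept of the set as an object of mathematical study.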

Cantor stated: "A set is a collection of definite, well-distinguished objects of our intuition or thought. These objects are called the elements of the set" (G. Cantor, Math. Ann., 46 (1895)). Cantor introduced the notions of †cardinal number and †ordinal number and developed what is now known as set theory. He proved that the cardinal number of the set of transcendental numbers is greater than that of algebraic numbers, and that all Euclidean spaces have the same cardinal number regardless of their dimension. He stated the †continuum hypothesis and also conjectured the †well-ordering theorem (G. Cantor, Math. Ann., 21 (1883)), which was proved by E. Zermelo [2]. In this proof Zermelo stated the †axiom of choice explicitly for the first time, and used it in an essential way.
Meanwhile it was pointed out that Cantor's naive set concept leads to various logical †paradoxes (→ 319 Paradoxes). Since the set concept plays a fundamental role in every branch of mathematics, the discovery of the paradoxes had a serious impact upon mathematics, and led to a systematic investigation of the †foundations of mathematics. In the course of attempts to avoid paradoxes, set theory was reconstructed as †axiomatic set theory (→ 33 Axiomatic Set Theory), in which Cantor's theory of cardinal numbers and ordinal numbers was restored. Also, the theory of the algebra of sets, which forms a basis for various branches of mathematics, was reconstructed. Axiomatic set theory is considered to be free from paradoxes.

G. Classes

A set in the naive sense is a collection $\{x \mid C(x)\}$ of all $x$ which satisfy a certain condition $C(x)$. The only principle for generating sets in naive set theory is the axiom of comprehension, which asserts the existence of the set $\{x \mid C(x)\}$ for any condition $C(x)$. However, this principle leads to paradoxes if the notion of an arbitrary set is considered to be well defined; for example, the †Russell paradox is caused by the set $\{x \mid x \notin x\}$. This situation necessitates some restrictions on the axiom of comprehension. The simplest way to overcome the paradoxes is to adopt Zermelo's axiom of subsets: Given a set $M$ and a condition $C(x)$, there exists a set $\{x \mid x \in M, C(x)\}$. But this axiom cannot produce any sets other than subsets of sets whose existence is preassumed. Hence further generating principles of sets had to be introduced. The following axioms are usually chosen as generating principles.
Axiom of pairing: For any two objects (possibly sets) $a$ and $b$, there exists a set $\{a, b\}$.
Axiom of power set: Given a set $A$, its power set $\mathfrak{P}(A)$ exists.
Axiom of union: For any family of sets the union exists.
Axiom of substitution (or replacement): For any set $A$ and any mapping $f$ from $A$, there exists a set of all images $f(x)$ with $x \in A$.
In ordinary theories of mathematics the set of natural numbers, the set of real numbers, etc., are assumed to exist, in addition to sets generated by the axioms in this section. In pure set theory the axiom of infinity is needed to secure the existence of infinite sets.
The concept of the "set" $\{x \mid x \notin x\}$ does not automatically lead to Russell's paradox. The trouble arises when this "set" is regarded as a member of a collection represented by $x$. This leads to a narrower concept of sets. Consider a fixed collection $V$ consisting of sets in the naive sense and closed under the set-theoretic operations mentioned in the axioms. Call a member of $V$ a set in the narrow sense. Then set theory becomes free from the known paradoxes if the qualification for being a set is restricted in this narrow sense. When sets in the narrow sense are called simply sets, sets in the naive sense are called classes. The object $\{x \mid x \notin x\}$ (where $x$ ranges over sets in the narrow sense) is a class which is not a set. Those classes which are not sets are called proper classes; for example, the class $V$ of all sets and the class of all ordinal numbers are both proper classes. For classes, unrestricted use of the comprehension axiom again leads to paradoxes, but other set-theoretic operations are justifiably applicable to classes.
The notion of classes was first introduced in connection with the construction of an axiomatic set theory. The term class was used originally to denote certain subclasses of the class $V$ of all sets. In these volumes the term set is mostly used to mean a set in the naive sense, and most of the notions defined for sets are applicable to classes.

References

[1] G. Cantor, Gesammelte Abhandlungen, Springer, 1932.
[2] E. Zermelo, Beweis, dass jede Menge wohlgeordnet werden kann, Math. Ann., 59 (1904), 514-516.
[3] A. Schoenflies, Entwicklung der Mengenlehre und ihrer Anwendungen, Teubner, 1913.
[4] F. Hausdorff, Grundzüge der Mengenlehre, Veit, 1914 (Chelsea, 1949).
[5] A. Fraenkel, Einleitung in die Mengenlehre, Springer, third edition, 1928.
[6] F. Hausdorff, Mengenlehre, Teubner, 1927; English translation, Set theory, Chelsea, 1962.

[7] A. Fraenkel, Abstract set theory, North-Holland, 1953.
[8] J. L. Kelley, General topology, Van Nostrand, 1955, appendix.
[9] N. Bourbaki, Eléments de mathématique, I. Théorie des ensembles, ch. 2, Actualités Sci. Ind., 1212c, Hermann, second edition, 1960; English translation, Theory of sets, Addison-Wesley, 1968.
[10] P. R. Halmos, Naive set theory, Van Nostrand, 1960.

382 (IX.23)
Shape Theory

A. General Remarks

In 1968 K. Borsuk introduced the notion of shape as a modification of the notion of †homotopy type. His idea was to take into account the global properties of topological spaces and neglect the local ones. It is a classification of spaces that is coarser than the homotopy type but that coincides with it on †ANR-spaces.
Let $\mathcal{P}$ be the category whose objects are all †polyhedra and whose morphisms are homotopy classes of continuous mappings between them. For spaces $X$ and $Y$, denote the set of all †homotopy classes of continuous mappings of $X$ to $Y$ by $[X, Y]$ and the homotopy class of a mapping $f$ by $[f]$. For a space $X$, let $\Pi_X$ be the functor from $\mathcal{P}$ to the †category of sets and functions that assigns to a polyhedron $P$ the set $\Pi_X(P) = [X, P]$. A morphism $\varphi: P \to Q$ of $\mathcal{P}$ induces the function $\varphi_{\#}: [X, P] \to [X, Q]$ defined by $\varphi_{\#}([f]) = \varphi \circ [f]$ for $[f]: X \to P$. A †natural transformation from $\Pi_Y$ to $\Pi_X$ is a shape morphism from $X$ to $Y$. A continuous mapping $f: X \to Y$ defines the shape morphism $f^{\#}$ of $X$ to $Y$ as follows: For $[g]: Y \to P$ in $\Pi_Y(P)$, the composition $[g \circ f]$ is an element of $\Pi_X(P)$. The correspondence $[g] \to [g \circ f]$ defines a natural transformation from $\Pi_Y$ to $\Pi_X$ and hence determines the shape morphism $f^{\#}: X \to Y$. The identity mapping $1_X$ on $X$ defines the identity shape morphism $1_X^{\#}$ on $X$. Given spaces $X$ and $Y$, $X$ is shape dominated by $Y$ if there are shape morphisms $\xi: Y \to X$ and $\eta: X \to Y$ such that $\xi\eta = 1_X^{\#}$, and we write $\mathrm{Sh}(X) \le \mathrm{Sh}(Y)$. If, in addition, $\eta\xi = 1_Y^{\#}$, then $X$ and $Y$ are of the same shape, and we write $\mathrm{Sh}(X) = \mathrm{Sh}(Y)$. A shape category $\mathcal{S}$ is the category whose objects are all topological spaces and whose morphisms are shape morphisms between them. If we replace topological spaces by pointed ones, the pointed shape category is obtained. In what follows, for simplicity, we assume that all spaces are metrizable and the mappings are continuous. General references are Borsuk [1], J. Dydak and J. Segal [3], R. H. Fox [4], S. Mardešić [7].

B. Chapman's Complement Theorem

Let $X$ be a compactum. A closed set $A$ of $X$ is a Z-set in $X$ if for any $\varepsilon > 0$ there is a mapping $f: X \to X - A$ such that $d(x, f(x)) < \varepsilon$ for $x \in X$, where $d$ is a metric on $X$. The Hilbert cube $Q$ is the countable product $\prod_{i=1}^{\infty} I_i$, where $I_i$ is the closed interval $[0, 1]$. The subset $s = \prod_{i=1}^{\infty} I_i^{\circ}$ ($I_i^{\circ} = (0, 1)$) is called the pseudointerior of $Q$. The following facts are known: (1) If a compact metric space $X$ is contained in $s$ or $Q - s$, then $X$ is a Z-set in $Q$. (2) For any continuous mapping $f$ of a compact metric space $X$ into $Q$, there exists an embedding $g$ of $X$ into $Q$ such that $g$ is arbitrarily close to $f$ and the image $g(X)$ in $Q$ is a Z-set. The complement theorem (T. A. Chapman [2]) states: Let $X$ and $Y$ be Z-sets in $Q$. Then $\mathrm{Sh}(X) = \mathrm{Sh}(Y)$ iff $Q - X$ and $Q - Y$ are homeomorphic.

C. FAR, FANR, Movability, and Shape Group: Shape Invariants

A closed set $A$ of a compactum $X$ is a fundamental retract of $X$ if there is a shape morphism $r: X \to A$ such that $r \circ i^{\#} = 1_A^{\#}$, where $i$ is the inclusion of $A$ into $X$. A compactum $X$ is a fundamental absolute retract (FAR) (resp. fundamental absolute neighborhood retract (FANR)) if for any compactum $Y$ containing $X$, $X$ is a fundamental retract of $Y$ (resp. of some closed neighborhood of $X$ in $Y$). A compactum $X$ is movable if for any embedding $X \subset Q$ and for any neighborhood $U$ of $X$ in $Q$ there is a neighborhood $V$ of $X$ satisfying the following condition: For any mapping $f$ of a compactum $Y$ to $V$ and for any neighborhood $W$ of $X$, there is a homotopy $H: Y \times I \to U$ such that $H(y, 0) = f(y)$ and $H(y, 1) \in W$ for $y \in Y$. In this definition, if $Y$ is replaced by a compactum with dimension $\le k$, then $X$ is said to be $k$-movable. Pointed FAR, FANR, movability, and $k$-movability are defined similarly in the pointed shape category. The following facts are known (Borsuk [1], Dydak and Segal [3], J. Keesling [5], J. Krasinkiewicz [8]). A compactum $X$ is a FAR if and only if $X$ is a pointed FAR iff $\mathrm{Sh}(X) = \mathrm{Sh}(\text{point})$, i.e., $X$ has the same shape as a one-point space. A pointed compactum $(X, x)$ is a pointed FANR iff $\mathrm{Sh}(X, x) \le \mathrm{Sh}(K, k)$ for some pointed polyhedron $(K, k)$. An FANR is movable. A compact connected Abelian topological group is movable if and only if it is locally connected.

A continuous image of a pointed 1-movable compactum is pointed 1-movable. It is unknown whether (i) an FANR is a pointed FANR and (ii) movability means pointed movability. For a pointed compactum $(X, x)$, let $\{(K_i, k_i) \mid i = 1, 2, \ldots\}$ be a countable †inverse system consisting of pointed finite polyhedra whose limit is $(X, x)$. The limit group $\varprojlim \pi_k(K_i, k_i)$ is the $k$th shape group of $(X, x)$, where $\pi_k(K, k)$ is the $k$th homotopy group of $(K, k)$. It is known that the shape groups for movable compacta behave like homotopy groups for ANR. A property $P$ of spaces is a shape invariant if whenever $X$ has $P$ and $\mathrm{Sh}(X) = \mathrm{Sh}(Y)$, then $Y$ has $P$. FAR, FANR, movability, $k$-movability, and shape groups are shape invariants.

D. CE Mappings

A mapping $f$ of a space $X$ onto a space $Y$ is a cell-like (CE) mapping if it is proper and $\mathrm{Sh}(f^{-1}(y)) = \mathrm{Sh}(\text{point})$ for each point $y$ of $Y$. It is known (R. B. Sher [9], Y. Kodama [6]) that if there is a CE mapping of $X$ to $Y$ with finite dimension, then $\mathrm{Sh}(X) = \mathrm{Sh}(Y)$. Here the finite-dimensionality of $Y$ is essential. A Q-manifold is a space, each point of which has a closed neighborhood homeomorphic to $Q$. The following are known (Chapman [2], J. E. West [10]): (1) If $f$ is a CE mapping of a Q-manifold $M$ to an ANR $X$, then the mapping $g: M \times Q \to X \times Q$ defined by $g(m, x) = (f(m), x)$ for $(m, x) \in M \times Q$ is approximated by homeomorphisms. As a consequence, if $X$ is a locally compact ANR, then $X \times Q$ is a Q-manifold. (2) Every compact ANR is a CE image of a compact Q-manifold. (3) Every compact ANR has the same homotopy type as that of a compact polyhedron. The following problem raised by R. H. Bing is open: Is a CE image of a finite-dimensional compactum finite-dimensional?

References

[1] K. Borsuk, Theory of shape, Monograf. mat. 59, Polish Scientific Publishers, 1975.
[2] T. A. Chapman, Lectures on Hilbert cube manifolds, CBMS 28 regional conf. ser. in math., Amer. Math. Soc., 1976.
[3] J. Dydak and J. Segal, Shape theory, Lecture notes in math. 688, Springer, 1978.
[4] R. H. Fox, On shape, Fund. Math., 74 (1972), 47-71.
[5] J. Keesling, Shape theory and compact connected Abelian topological groups, Trans. Amer. Math. Soc., 194 (1974), 349-358.
[6] Y. Kodama, Decomposition spaces and shape in the sense of Fox, Fund. Math., 97 (1977), 199-208.
[7] S. Mardešić, Shapes for topological spaces, General Topology and Appl., 3 (1973), 265-282.
[8] J. Krasinkiewicz, Continuous images of continua and 1-movability, Fund. Math., 98 (1978), 141-164.
[9] R. B. Sher, Realizing cell-like maps in Euclidean spaces, General Topology and Appl., 2 (1972), 75-89.
[10] J. E. West, Mapping Hilbert cube manifolds to ANR's: A solution of a conjecture of Borsuk, Ann. Math., (2) 106 (1977), 1-18.

383 (II.26)
Sheaves

A. Presheaves

Let $X$ be a †topological space. Suppose that the following conditions are satisfied: (i) There exists an (additive) Abelian group $\mathcal{F}(U)$ for each open set $U$ of $X$, and $\mathcal{F}(\emptyset) = \{0\}$; and (ii) there exists a homomorphism $r_{UV}: \mathcal{F}(V) \to \mathcal{F}(U)$ for each pair $U \subset V$, such that $r_{UU} = 1$ (identity) and $r_{UW} = r_{UV} \circ r_{VW}$ for $U \subset V \subset W$. We call $\mathcal{F}$, consisting of a family $\{\mathcal{F}(U)\}$ of Abelian groups and a family of mappings $\{r_{UV}\}$, a presheaf (of Abelian groups) on $X$. If $a \in \mathcal{F}(V)$ and $U \subset V$, we write $r_{UV}(a) = a \mid U$ and call it the restriction of $a$ to $U$. A homomorphism $\varphi$ between two presheaves $\mathcal{F}$ and $\mathcal{G}$ on $X$ is a family $\{\varphi(U)\}$ of group homomorphisms $\varphi(U): \mathcal{F}(U) \to \mathcal{G}(U)$ satisfying $r_{UV} \circ \varphi(V) = \varphi(U) \circ r_{UV}$ whenever $U \subset V$. The presheaves on $X$ and their homomorphisms form a †category.

B. Axioms for Sheaves

A presheaf $\mathcal{F}$ is called a sheaf (of Abelian groups) if it satisfies the following condition: If $U$ is open in $X$ and $(U_i)_{i \in I}$ is an †open covering of $U$, and if for each $i \in I$ an element $s_i$ of $\mathcal{F}(U_i)$ is given such that $s_i \mid U_i \cap U_j = s_j \mid U_i \cap U_j$ for all $i$ and $j$, then there exists a unique $s \in \mathcal{F}(U)$ such that $s \mid U_i = s_i$ for all $i$. By definition, a homomorphism between two sheaves is a homomorphism of the presheaves. The sheaves on $X$ also form a category.
Let $\mathcal{F}$ be a presheaf, $x$ a point of $X$, and $\mathfrak{U}_x$ the †directed set of open neighborhoods of $x$, with the order opposite to that of inclusion. Then $\{\mathcal{F}(U) \mid U \in \mathfrak{U}_x\}$ is an inductive system of groups. The †inductive limit $\varinjlim \mathcal{F}(U)$ of groups $\{\mathcal{F}(U)\}$ is denoted by $\mathcal{F}_x$ and called the stalk of $\mathcal{F}$ over $x$. The image of $s \in \mathcal{F}(U)$ in $\mathcal{F}_x$ is called the germ of $s$ at $x$ and is written $s_x$.
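The gluing condition that separates a sheaf from a presheaf can be illustrated on a very small example. The following is a minimal sketch under stated assumptions, not a construction from the article: the space is a finite set with the discrete topology (so every subset is open), a section over $U$ is modelled as a Python `dict` from the points of $U$ to integer values, and the names `restrict` and `glue` are hypothetical helpers.

```python
def restrict(section, U):
    """Restriction map r_{UV}: keep only the values of a section at the points of U."""
    if not set(U) <= set(section):
        raise ValueError("U is not contained in the domain of the section")
    return {x: section[x] for x in U}


def glue(local_sections):
    """Glue sections s_i given on sets U_i.

    Returns the unique section on the union of the U_i whose restriction to each
    U_i is s_i, or None when two of the s_i disagree on an overlap U_i ∩ U_j.
    """
    glued = {}
    for s in local_sections:
        for x, value in s.items():
            if x in glued and glued[x] != value:
                return None  # compatibility on an overlap fails
            glued[x] = value
    return glued


# Two sections on U_1 = {1, 2} and U_2 = {2, 3} that agree on the overlap {2}.
s1 = {1: 10, 2: 20}
s2 = {2: 20, 3: 30}
s = glue([s1, s2])
```

In this model the glued section is unique because a function is determined by its values at points; presheaves for which existence or uniqueness fails are exactly those that are not sheaves.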

A homomorphism $\varphi: \mathcal{F} \to \mathcal{G}$ of presheaves induces a homomorphism $\varphi_x: \mathcal{F}_x \to \mathcal{G}_x$ of stalks.

C. Sheaf Spaces

We introduce a topology on the †direct sum $\mathcal{F}' = \coprod_{x \in X} \mathcal{F}_x$ in the following way: For each open set $U$ of $X$ and each $s \in \mathcal{F}(U)$, consider the set $M_{U,s} = \{s_x \mid x \in U\}$ of the germs defined by $s$ at the points of $U$, and take the set of all such $M_{U,s}$ as a †base of open sets of the topology. If $p: \mathcal{F}' \to X$ is the mapping that maps the points of $\mathcal{F}_x$ to $x$, then $p$ is continuous, and each $p^{-1}(x)$ ($= \mathcal{F}_x$) has the structure of an Abelian group. Moreover, the following conditions are satisfied: (i) $p$ is a †local homeomorphism, and (ii) the group operations on $p^{-1}(x)$ are continuous in the sense that $(a, b) \to a + b$ is a continuous mapping from the †fiber product $\mathcal{F}' \times_X \mathcal{F}'$ (i.e., the subspace $\{(a, b) \mid p(a) = p(b)\}$ of the product space $\mathcal{F}' \times \mathcal{F}'$) to $\mathcal{F}'$ and $a \to -a$ is a continuous mapping from $\mathcal{F}'$ to itself. In general, a topological space $\mathcal{F}'$ with a structure satisfying these conditions is called a sheaf space over $X$.
When $\mathcal{F}'$ is a sheaf space, a continuous mapping $s$ from a subspace $A$ of $X$ to $\mathcal{F}'$ such that $p \circ s = 1_A$ is called a section of $\mathcal{F}'$ over $A$. The set of sections over $A$, denoted by $\Gamma(A, \mathcal{F}')$, is an Abelian group in the obvious way. If we associate $\Gamma(U, \mathcal{F}')$ with each open set $U$ and define $r_{UV}$ by the restriction of sections ($r_{UV}(s) = s \mid U$), then we get a sheaf $\mathcal{F}''$ on $X$. If we start from a presheaf $\mathcal{F}$ and get $\mathcal{F}''$ via $\mathcal{F}'$, the correspondence $\mathcal{F} \to \mathcal{F}''$ is a †covariant functor from the category of presheaves to the category of sheaves, and $\mathcal{F}''$ is called the sheaf associated with the presheaf $\mathcal{F}$. If $\mathcal{F}$ is a sheaf, we can prove $\mathcal{F}'' \cong \mathcal{F}$. Conversely, if we start from a sheaf space $\mathcal{F}'$ and construct the sheaf $\mathcal{F}''$ and then the sheaf space $\mathcal{F}'''$, then $\mathcal{F}'''$ is canonically isomorphic to $\mathcal{F}'$. Since we can identify a sheaf and the corresponding sheaf space, both are usually denoted by the same letter. In particular, when $\mathcal{F}$ is a sheaf, $\mathcal{F}(U)$ is usually written $\Gamma(U, \mathcal{F})$.
Given a section $s \in \Gamma(X, \mathcal{F})$ of a sheaf, the points $x \in X$ for which $s_x \neq 0$ in $\mathcal{F}_x$ form a closed set (the sheaf space $\mathcal{F}$ is not necessarily Hausdorff even if $X$ is so). This set is called the support of $s$ and is denoted by $\operatorname{supp} s$.
In the theory of this and the previous two sections, we can replace Abelian groups by groups, rings, etc. Then $\mathcal{F}(U)$ is a group or ring accordingly, and $\mathcal{F}(\emptyset)$ the group consisting of the identity element or the ring consisting of the zero element, respectively. We thus obtain the theories of sheaves of groups, sheaves of rings, etc. In general, a presheaf $\mathcal{F}$ on $X$ with values in a category $\mathcal{C}$ is a †contravariant functor from the category of open sets of $X$ to $\mathcal{C}$, and a homomorphism between presheaves $\mathcal{F}$ and $\mathcal{G}$ on $X$ is a †natural transformation between the functors $\mathcal{F}$ and $\mathcal{G}$.
The presheaves (sheaves) of Abelian groups on a space $X$ form an †Abelian category, denoted by $\mathcal{P}_X$ ($\mathcal{C}_X$). For a homomorphism $f: \mathcal{F} \to \mathcal{G}$ of presheaves, the image, coimage, kernel, and cokernel of $f$ in $\mathcal{P}_X$ are given by

$(\operatorname{Im} f)(U) = \operatorname{Im} f(U)$, $(\operatorname{Coim} f)(U) = \operatorname{Coim} f(U)$,

$(\operatorname{Ker} f)(U) = \operatorname{Ker} f(U)$, $(\operatorname{Coker} f)(U) = \operatorname{Coker} f(U)$.

When $\mathcal{F}$ and $\mathcal{G}$ are sheaves, the kernel of $f$ in $\mathcal{C}_X$ coincides with the kernel in $\mathcal{P}_X$, while the image and cokernel of $f$ in $\mathcal{C}_X$ are the associated sheaves of the image and the cokernel in $\mathcal{P}_X$, respectively. Thus, $f: \mathcal{F} \to \mathcal{G}$ induces $f_x: \mathcal{F}_x \to \mathcal{G}_x$ at each $x \in X$, $(\operatorname{Ker} f)_x = \operatorname{Ker} f_x$, $(\operatorname{Im} f)_x = \operatorname{Im} f_x$, $(\operatorname{Coker} f)_x = \operatorname{Coker} f_x$, and a sequence of sheaves $0 \to \mathcal{F} \to \mathcal{G} \to \mathcal{H} \to 0$ is exact if and only if $0 \to \mathcal{F}_x \to \mathcal{G}_x \to \mathcal{H}_x \to 0$ is exact at each $x \in X$.

D. Examples

(1) Let $G$ be an Abelian group (or some other †algebraic system) with †discrete topology. The Cartesian product $X \times G$ gives rise to a sheaf on $X$, called a constant sheaf (or trivial sheaf).
(2) Let $X$ be a topological space and $Y$ be a topological Abelian group (e.g., the real or complex numbers). We obtain a sheaf $\mathcal{F}$ on $X$ by putting $\mathcal{F}(U)$ = the set of all continuous mappings $U \to Y$ and $r_{UV}$ = the natural restriction. The stalk over $x \in X$ is the set of germs at $x$ of continuous functions into $Y$. This sheaf is called the sheaf of germs of continuous functions with values in $Y$.
(3) When $X$ is an †analytic manifold and $Y$ is a commutative †Lie group, we define the sheaf of germs of analytic mappings with values in $Y$ in the same way. If $Y$ is the complex number field $\mathbf{C}$, this sheaf is the sheaf $\mathcal{O}$ of germs of analytic (or holomorphic) functions. A †connected component of the sheaf space $\mathcal{O}$ can be identified with the †analytic function determined by the function element corresponding to a point on that component. The sheaf of germs of functions of class $C^r$ on a $C^s$-manifold ($r \le s$) is similarly defined.
(4) Given a †vector bundle $B$ over a topological space $X$, we define a sheaf on $X$ by $\mathcal{F}(U) = \Gamma(U)$ (= the module of sections of $B$ over $U$) and $r_{UV}$ = the natural restriction. Here the stalk over $x \in X$ consists of the germs at $x$ of sections of $B$, and the sheaf is called the sheaf of germs of sections of the vector bundle $B$. We have a similar definition for the sheaf of germs of differentiable (analytic) sections when $X$ is a †differentiable (complex) manifold. The case where $B$ is a †tensor bundle (e.g., the †cotangent bundle $T^*(X)$) is important. The sheaf $\mathfrak{A}^r(X)$ of germs of $C^\infty$-sections of the $r$-fold †exterior power of $T^*(X)$ is called the sheaf of germs of differential forms of degree $r$ ($0 \le r \le \dim X$).

E. Sheaf Cohomology

The category $\mathcal{C}_X$ of sheaves of Abelian groups on $X$ has sufficiently many †injective objects. A sheaf $\mathcal{F}$ with the property that $r_{UX}: \Gamma(X, \mathcal{F}) \to \Gamma(U, \mathcal{F})$ is surjective for any open set $U$ is said to be flabby (or scattered). An injective sheaf is flabby.
Fix a nonempty family $\Phi$ of closed subsets of $X$ satisfying the following two conditions: (i) $A, B \in \Phi \Rightarrow A \cup B \in \Phi$; (ii) any closed set contained in an element of $\Phi$ belongs to $\Phi$. Putting $\Gamma_\Phi(\mathcal{F}) = \{s \mid s \in \Gamma(X, \mathcal{F}), \operatorname{supp} s \in \Phi\}$ for each $\mathcal{F} \in \mathcal{C}_X$, we obtain a †left-exact †covariant functor $\Gamma_\Phi$ from $\mathcal{C}_X$ to the category (Ab) of Abelian groups. Therefore, by the general theory of homological algebra, we can define the †right derived functors $R^q\Gamma_\Phi: \mathcal{C}_X \to (\mathrm{Ab})$ ($q = 0, 1, 2, \ldots$). We put $R^q\Gamma_\Phi(\mathcal{F}) = H^q_\Phi(X, \mathcal{F})$ and call the $H^q_\Phi(X, \mathcal{F})$ ($q = 0, 1, \ldots$) the cohomology groups with coefficient sheaf $\mathcal{F}$ and family of supports $\Phi$ (→ 200 Homological Algebra I). When $\Phi$ is the family of all closed subsets of $X$, we write $H^q(X, \mathcal{F})$ instead of $H^q_\Phi(X, \mathcal{F})$.
Thus the cohomology group $H^q_\Phi(X, \mathcal{F})$ is the $q$th cohomology of the complex $\Gamma_\Phi(\mathcal{L}^0) \xrightarrow{d^0} \Gamma_\Phi(\mathcal{L}^1) \xrightarrow{d^1} \Gamma_\Phi(\mathcal{L}^2) \to \cdots$ induced by an †injective resolution $0 \to \mathcal{F} \to \mathcal{L}^0 \to \mathcal{L}^1 \to \cdots$ of the sheaf $\mathcal{F}$: $H^q_\Phi(X, \mathcal{F}) = \operatorname{Ker} d^q / \operatorname{Im} d^{q-1}$ ($q = 0, 1, \ldots$; $d^{-1} = 0$).
$H^0_\Phi(X, \mathcal{F}) = \Gamma_\Phi(\mathcal{F})$, and from an exact sequence of sheaves $0 \to \mathcal{F} \to \mathcal{G} \to \mathcal{H} \to 0$ we get an exact sequence $0 \to H^0_\Phi(X, \mathcal{F}) \to H^0_\Phi(X, \mathcal{G}) \to H^0_\Phi(X, \mathcal{H}) \to H^1_\Phi(X, \mathcal{F}) \to H^1_\Phi(X, \mathcal{G}) \to H^1_\Phi(X, \mathcal{H}) \to H^2_\Phi(X, \mathcal{F}) \to \cdots$.
Similarly, the cohomology groups $H^q_\Phi(X, \mathcal{F})$ can also be calculated with an exact sequence $0 \to \mathcal{F} \to \mathcal{L}^0 \to \mathcal{L}^1 \to \cdots$, where each $\mathcal{L}^i$ is assumed to be $\Gamma_\Phi$-acyclic (i.e., $H^q_\Phi(X, \mathcal{L}^i) = 0$ for $q > 0$) instead of injective. The flabby sheaves, for instance, are $\Gamma_\Phi$-acyclic, so we can compute $H^q_\Phi(X, \mathcal{F})$ by a flabby resolution of $\mathcal{F}$ (R. Godement). For example, let $X$ be an $n$-dimensional †paracompact $C^\infty$-manifold and $\mathcal{F} = \mathbf{R}$. Then $0 \to \mathbf{R} \to \mathfrak{A}^0(X) \xrightarrow{d^0} \mathfrak{A}^1(X) \xrightarrow{d^1} \cdots$ is exact, where $\mathfrak{A}^q(X)$ is the †sheaf of germs of $C^\infty$-differential forms of degree $q$, $d^q$ is †exterior differentiation (Poincaré's theorem), and we have $H^p(X, \mathfrak{A}^q) = 0$ for $p > 0$. Therefore $H^q(X, \mathbf{R})$ is the $q$th cohomology of the complex $0 \to D^0(X) \xrightarrow{d} D^1(X) \xrightarrow{d} \cdots$, where $D^i(X) = \Gamma(X, \mathfrak{A}^i(X))$ = the group of $C^\infty$-differential forms of degree $i$ on $X$. This proves the de Rham theorem, which says that the †de Rham cohomology group is isomorphic to the †(singular) cohomology group of $X$ with real coefficients (→ 105 Differentiable Manifolds R). For a sheaf $\mathcal{F}$ of noncommutative groups, we can define the first cohomology $H^1(X, \mathcal{F})$ [2].

F. The Čech Cohomology Group

Let $\mathfrak{U} = \{U_i\}$ be an open covering of $X$, and write $U_i \cap U_j = U_{ij}$, etc. Put

$C^p(\mathfrak{U}, \mathcal{F}) = \prod_{i_0, \ldots, i_p} \Gamma(U_{i_0 \ldots i_p}, \mathcal{F})$, $p = 0, 1, 2, \ldots$.

An element of $C^p(\mathcal{F})$ is called a cochain of degree $p$. Define $d: C^p(\mathcal{F}) \to C^{p+1}(\mathcal{F})$ by

$(df)_{i_0 \ldots i_{p+1}} = \sum_{\nu=0}^{p+1} (-1)^\nu \left( f_{i_0 \ldots \hat{i}_\nu \ldots i_{p+1}} \mid U_{i_0 \ldots i_{p+1}} \right)$,

and denote the $q$th cohomology of the complex $(C^p(\mathcal{F}), d)$ thus obtained by $H^q(\mathfrak{U}, \mathcal{F})$. When an open covering $\mathfrak{V}$ is a refinement of $\mathfrak{U}$, there is a canonical homomorphism $H^q(\mathfrak{U}, \mathcal{F}) \to H^q(\mathfrak{V}, \mathcal{F})$. So we can take the inductive limit of the groups $H^q(\mathfrak{U}, \mathcal{F})$ with respect to the refinement of open coverings. This limit group is denoted by $\check{H}^q(X, \mathcal{F})$ and is called the Čech cohomology group with coefficient sheaf $\mathcal{F}$. It coincides with $H^q(X, \mathcal{F})$ for $q \le 1$, and if $X$ is paracompact, for all $q$.

G. Relation to Continuous Mappings

Let $X$ and $Y$ be topological spaces and $f: X \to Y$ be a continuous mapping. If $\mathcal{G}$ is a sheaf on $Y$, the fiber product $X \times_Y \mathcal{G}$ (where $\mathcal{G}$ is viewed as a sheaf space over $Y$) is a sheaf on $X$. It is denoted by $f^*(\mathcal{G})$ or $f^{-1}(\mathcal{G})$ and is called the inverse image of $\mathcal{G}$. The correspondence $\mathcal{G} \to f^*(\mathcal{G})$ is an exact functor from $\mathcal{C}_Y$ to $\mathcal{C}_X$. Next, let $\mathcal{F}$ be a sheaf on $X$. Associating $\Gamma(f^{-1}(U), \mathcal{F})$ with each open set $U$ of $Y$, we obtain a sheaf on $Y$, which we denote by $f_*(\mathcal{F})$ and call the direct image of $\mathcal{F}$. The correspondence $f_*$ is a left-exact functor $\mathcal{C}_X \to \mathcal{C}_Y$, and we can consider its right derived functors $R^q f_*$. The sheaf $R^q f_*(\mathcal{F})$ is the sheaf associated with the presheaf that associates $H^q(f^{-1}(U), \mathcal{F})$ with each open set $U$.
A homomorphism $\psi$ from $\mathcal{G}$ to $f_*(\mathcal{F})$ is also called an $f$-homomorphism from $\mathcal{G}$ to $\mathcal{F}$. To give such a $\psi$ is equivalent to giving a family of homomorphisms of the stalks $\psi_x: \mathcal{G}_{f(x)} \to \mathcal{F}_x$ ($x \in X$) satisfying the continuity condition: For any open set $U$ of $Y$ and any section $s \in \Gamma(U, \mathcal{G})$ over $U$, the mapping $\varphi$ from $f^{-1}(U)$ to $\mathcal{F}$ defined by $\varphi(x) = \psi_x(s(f(x)))$ is continuous.
The functors $f^*$ and $f_*$ are related by

$\operatorname{Hom}(f^*(\mathcal G), \mathcal F) \cong \operatorname{Hom}(\mathcal G, f_*(\mathcal F))$. The Leray spectral sequence

$E_2^{p,q} = H^p(Y, R^q f_*(\mathcal F)) \Rightarrow H^n(X, \mathcal F)$

exists and connects the cohomologies of X and of Y.

H. Ringed Spaces

Let X be a topological space and $\mathcal O$ be a sheaf on X of commutative rings with unity element such that $\mathcal O_x \neq \{0\}$ for any $x \in X$. Then the pair $(X, \mathcal O)$ is called a ringed space, and $\mathcal O$ is called its structure sheaf. A morphism $(X, \mathcal O) \to (X', \mathcal O')$ is by definition a pair $(f, \theta)$ consisting of a continuous mapping $f: X \to X'$ and an f-homomorphism $\theta: \mathcal O' \to \mathcal O$. When each $\mathcal O_x$ is a local ring, $(X, \mathcal O)$ is called a local ringed space. A morphism of local ringed spaces is defined to be a pair $(f, \theta): (X, \mathcal O) \to (X', \mathcal O')$ as before, satisfying the additional condition that $\theta$ is local (i.e., $\theta_x: \mathcal O'_{f(x)} \to \mathcal O_x$ maps the maximal ideal into the maximal ideal for each $x \in X$). These concepts are important in algebraic geometry and the theory of functions of several complex variables.

I. Direct Products and Tensor Products

Let $\mathcal F_\lambda$ ($\lambda \in \Lambda$) be sheaves of Abelian groups on a topological space X. The sheaf $\mathcal F$ on X defined by $\mathcal F(U) = \prod_\lambda \mathcal F_\lambda(U)$ and $r_{UV} = \prod_\lambda r^\lambda_{UV}$ is denoted by $\mathcal F = \prod_\lambda \mathcal F_\lambda$ and called the direct product of sheaves $\{\mathcal F_\lambda\}$. For each $x \in X$ there is a natural mapping $\mathcal F_x \to \prod_\lambda (\mathcal F_\lambda)_x$, which is in general neither injective nor surjective. When $\Lambda$ is a finite set, $\prod_\lambda \mathcal F_\lambda$ is also written $\mathcal F = \mathcal F_1 + \cdots + \mathcal F_n$ and is called the direct sum of the sheaves. The inductive limit $\mathcal F = \operatorname{ind\,lim} \mathcal F_\lambda$ of an inductive system of sheaves on X also exists, and $\mathcal F_x = \operatorname{ind\,lim} (\mathcal F_\lambda)_x$.

Let $(X, \mathcal O)$ be a ringed space. A sheaf of Abelian groups $\mathcal F$ on X is called a sheaf of $\mathcal O$-modules (or simply an $\mathcal O$-module) if $\mathcal F(U)$ is an $\mathcal O(U)$-module for each U and $r_{UV}: \mathcal F(V) \to \mathcal F(U)$ is a module homomorphism compatible with $\mathcal O(V) \to \mathcal O(U)$ for each $U \subset V$. Then $\mathcal F_x$ is an $\mathcal O_x$-module for each $x \in X$. For a fixed $(X, \mathcal O)$, the $\mathcal O$-modules form an Abelian category. When $\mathcal F$ and $\mathcal G$ are $\mathcal O$-modules, the tensor product $\mathcal H = \mathcal F \otimes_{\mathcal O} \mathcal G$ of $\mathcal F$ and $\mathcal G$ as sheaves over $\mathcal O$ is defined as follows: Define a presheaf by $U \to \mathcal F(U) \otimes_{\mathcal O(U)} \mathcal G(U)$ and $r^{\mathcal H}_{UV} = r^{\mathcal F}_{UV} \otimes r^{\mathcal G}_{UV}$, and let $\mathcal H$ be the associated sheaf of this presheaf. Then we have $\mathcal H_x = \mathcal F_x \otimes_{\mathcal O_x} \mathcal G_x$.

The notion of coherent sheaves is important in the theory of $\mathcal O$-modules (→ 16 Algebraic Varieties E).

J. History

About 1945, J. Leray established the theory of sheaf coefficient cohomology groups (in a form slightly different from that in Sections E and F) and the theory of spectral sequences to study the relation between the local properties of a continuous mapping and the global cohomologies. In the theory of functions of several complex variables, K. Oka conceived the idea of "ideals of indefinite domain." These two ideas were unified by H. Cartan into the present form of sheaf theory. As a link between local properties and global properties, sheaf theory has been applied in many branches of mathematics (→ 16 Algebraic Varieties; 21 Analytic Functions of Several Complex Variables; 23 Analytic Spaces; 72 Complex Manifolds).

References

[1] F. Hirzebruch, Neue topologische Methoden in der algebraischen Geometrie, Erg. Math., Springer, second edition, 1962; English translation, Topological methods in algebraic geometry, Springer, third enlarged edition, 1966.
[2] A. Grothendieck, Sur quelques points d'algèbre homologique, Tôhoku Math. J., (2) 9 (1957), 119-221.
[3] R. Godement, Topologie algébrique et théorie des faisceaux, Actualités Sci. Ind., 1252, Hermann, 1958.
[4] R. G. Swan, The theory of sheaves, Univ. of Chicago Press, 1964.
[5] G. E. Bredon, Sheaf theory, McGraw-Hill, 1967.
[6] J. Leray, L'anneau spectral et l'anneau filtré d'homologie d'un espace localement compact et d'une application continue, J. Math. Pures Appl., 29 (1950), 1-139.
[7] H. Cartan, Sém. de topologie algébrique, Paris, 1948-1949; 1950-1951.
[8] A. Borel, Sém. de topologie algébrique de l'École polytechnique fédérale, 1951. (Cohomologie des espaces localement compacts d'après J. Leray, Lecture notes in math. 2, Springer, 1964.)

384 (VII.20)
Siegel Domains

A. Siegel Domains

Let D be a bounded domain in $\mathbf C^n$ and $G_h(D)$ the full holomorphic automorphism group of

D, which is a Lie transformation group with respect to the compact-open topology. If $G_h(D)$ acts transitively on D, then D is called a homogeneous bounded domain. The study of homogeneous bounded domains was initiated systematically by E. Cartan in 1936, while the notion of Siegel domains, which was introduced by I. I. Pyatetskii-Shapiro, has made remarkable contributions to the study of homogeneous bounded domains.

Let V be a convex domain in an n-dimensional real vector space R. V is called a regular cone if for every $x \in V$ and $\lambda > 0$, $\lambda x \in V$ and if V contains no entirely straight lines. Let W be a complex vector space. A mapping $F: W \times W \to R_C$ (the complexification of R) is a V-Hermitian form if the following conditions are satisfied: (Fi) $F(u, v)$ is C-linear in u, (Fii) $\overline{F(u, v)} = F(v, u)$, where the bar denotes the conjugation with respect to R, (Fiii) $F(u, u) \in \bar V$ (the closure of V), and (Fiv) $F(u, u) = 0$ implies $u = 0$. Given a regular cone $V \subset R$ and a V-Hermitian form F on W, one can define a Siegel domain $D(V, F)$ (of the second kind) by putting $D(V, F) = \{(x + iy, u) \in R_C \times W \mid y - F(u, u) \in V\}$, which is holomorphically equivalent to a bounded domain in $R_C \times W$. When $W = (0)$, $D(V, F)$ is reduced to $D(V) = \{x + iy \in R_C \mid y \in V\}$, which is called a Siegel domain of the first kind. A mapping $L: W \times W \to R_C$ is a nondegenerate semi-Hermitian form if L can be written as $L = L_1 + L_2$, where $L_1$ and $L_2$ are $R_C$-valued functions satisfying the conditions: (Li) $L_1$ satisfies (Fi) and (Fii); (Lii) $L_2$ is a symmetric C-bilinear form; (Liii) $L(u, v) = 0$ for all $u \in W$ implies $v = 0$. Let B be a bounded domain in a complex vector space X and $L_p$ ($p \in B$) an $R_C$-valued nondegenerate semi-Hermitian form on W depending differentiably on $p \in B$. Consider a domain $D(V, L, B)$ in $R_C \times W \times X$ defined by putting $D(V, L, B) = \{(x + iy, u, p) \in R_C \times W \times X \mid y - \operatorname{Re} L_p(u, u) \in V,\ p \in B\}$. The domain $D(V, L, B)$ is called a Siegel domain of the third kind over B if it is holomorphically equivalent to a bounded domain. $D(V, L, B)$ is a fiber space over B.

By the affine automorphism group $G_a$ of a Siegel domain $D(V, F) \subset R_C \times W$ we mean the group consisting of all elements in the complex affine transformation group of $R_C \times W$ leaving $D(V, F)$ stable. The full holomorphic automorphism group $G_h$ of $D(V, F)$ contains $G_a$ as a closed subgroup. If $G_h$ acts transitively on $D(V, F)$, then $D(V, F)$ is said to be homogeneous. A homogeneous Siegel domain is necessarily affinely homogeneous, i.e., $G_a$ acts transitively on $D(V, F)$ [2]. The Bergman metric of $D(V, F)$, which is a $G_h$-invariant Kähler metric, is complete [3], and so $D(V, F)$ is a domain of holomorphy.

Examples. $H(n, \mathbf R)$ denotes the vector space of all real symmetric matrices of degree n, and $H^+(n, \mathbf R)$ the regular cone consisting of all positive definite matrices in $H(n, \mathbf R)$.

(i) The Siegel domain of the first kind $D(H^+(n, \mathbf R)) = \{X + iY \mid X \in H(n, \mathbf R),\ Y \in H^+(n, \mathbf R)\}$ is called the Siegel upper half-plane, which is holomorphically equivalent to the classical symmetric domain of type III.

(ii) Let $u, v \in \mathbf C$ and $F(u, v)$ be the $2 \times 2$ diagonal matrix $\operatorname{diag}(u\bar v, 0)$, which is an $H^+(2, \mathbf R)$-Hermitian form on $\mathbf C$. The resulting Siegel domain is

$D(H^+(2, \mathbf R), F) = \left\{ (z_1, z_2, z_3, u) \in \mathbf C^4 \;\middle|\; \begin{pmatrix} \operatorname{Im} z_1 - |u|^2 & \operatorname{Im} z_3 \\ \operatorname{Im} z_3 & \operatorname{Im} z_2 \end{pmatrix} \in H^+(2, \mathbf R) \right\}.$

(iii) Let $B = \{t \in \mathbf C \mid |t| < 1\}$, and let $u, v \in \mathbf C$. Put $L_t(u, v) = (1 - |t|^2)^{-1}(u\bar v + \bar t u v)$. Then $L_t(u, v)$ is a nondegenerate semi-Hermitian form, and we have $D(H^+(2, \mathbf R), L, B) = \{(z, u, t) \in \mathbf C^3 \mid \operatorname{Im} z - (1 - |t|^2)^{-1} \operatorname{Re}(|u|^2 + \bar t u^2) > 0,\ |t| < 1\}$, which is a Siegel domain of the third kind and is holomorphically equivalent to the Siegel upper half-plane of dimension 3.

The domains in (i) and (ii) are both affinely homogeneous; the latter was originally found by Pyatetskii-Shapiro in 1959 [1] and provides the least-dimensional example of nonsymmetric homogeneous bounded domains, which answered affirmatively Cartan's conjecture (1936): Are there nonsymmetric homogeneous bounded domains in $\mathbf C^n$ ($n \geq 4$)?

B. Infinitesimal Automorphisms of Siegel Domains

For a Siegel domain $D(V, F) \subset R_C \times W$, the Lie algebra $\mathfrak g_h$ of $G_h$ can be identified with the Lie algebra of infinitesimal automorphisms, i.e., all complete holomorphic vector fields on $D(V, F)$. Let $G(V)$ be the group consisting of all the linear automorphisms of R leaving V stable. Let us fix a base in R, and let $(z_1, z_2, \ldots, z_n)$ be the complex linear coordinate system in $R_C$ corresponding to it. Choose a complex linear coordinate system $(u_1, u_2, \ldots, u_m)$ in W. We write $F(u, v)$ as $F(u, v) = (F_1(u, v), \ldots, F_n(u, v))$. Consider the following two vector fields in the Lie algebra $\mathfrak g_a$ of $G_a$:

and put $\mathfrak g_a^\lambda = \{X \in \mathfrak g_a \mid [E, X] = \lambda X\}$, $\lambda \in \mathbf Z$. Then $\mathfrak g_a$ can be written as a graded Lie algebra in the following way: $\mathfrak g_a = \mathfrak g_a^{-2} + \mathfrak g_a^{-1} + \mathfrak g_a^0$. Here

we have

$c = (c_\alpha)$, $c_\alpha \in \mathbf C$,

and $\mathfrak g_a^0$ consists of all vector fields $X_{(A,B)}$ of the form

where the matrices $A = (a_{kj})$, $B = (b_{\alpha\beta})$ satisfy the conditions: $\exp tA \in G(V)$, $t \in \mathbf R$, $AF(u, u) = F(Bu, u) + F(u, Bu)$. For $X_{(A,B)} \in \mathfrak g_a^0$ we define $\operatorname{tr} X_{(A,B)}$ to be the sum of the trace of A and that of B. Let $\mathfrak g^\lambda$ be the $\lambda$-eigenspace of $\operatorname{ad} E$ in $\mathfrak g_h$ ($\lambda \in \mathbf Z$). Then $\mathfrak g_h$ can be written also in the form of a graded Lie algebra: $\mathfrak g_h = \mathfrak g^{-2} + \mathfrak g^{-1} + \mathfrak g^0 + \mathfrak g^1 + \mathfrak g^2$, and $\mathfrak g^\lambda = \mathfrak g_a^\lambda$ is valid for $\lambda = -2, -1, 0$. Furthermore $\mathfrak g_h$ can be nicely determined by $\mathfrak g_a$ in the following manner. $p_{\mu,\nu}$ denotes a polynomial on $R_C \times W$ homogeneous of degree $\mu$ in $z_1, \ldots, z_n$ and homogeneous of degree $\nu$ in $u_1, \ldots, u_m$. Let $\mathcal Q^1$ (resp. $\mathcal Q^2$) be the set of all polynomial vector fields of the form

Then we have $\mathfrak g^1 = \{X \in \mathcal Q^1 \mid [X, \mathfrak g^{-1}] \subset \mathfrak g^0\}$, and $\mathfrak g^2 = \{X \in \mathcal Q^2 \mid [X, \mathfrak g^{-2}] \subset \mathfrak g^0,\ [X, \mathfrak g^{-1}] \subset \mathfrak g^1,\ \operatorname{Im} \operatorname{Tr}[X, Y] = 0 \text{ for } Y \in \mathfrak g^{-2}\}$. Another description of $\mathfrak g^1$ and $\mathfrak g^2$ has been given in terms of Jordan triple systems [4]. The explicit descriptions of $\mathfrak g^1$ and $\mathfrak g^2$ have been given for most homogeneous Siegel domains $D(V, F)$ over irreducible self-dual cones V (→ Section D, T. Tsuji, Nagoya Math. J., 55 (1974)). $\mathfrak g_h = \mathfrak g_a$ is valid for the Siegel domains which are irreducible quasisymmetric but not symmetric (→ Section D). Main references for this section are [2-6].

C. j-Algebras and Homogeneous Bounded Domains

The notion of j-algebra was introduced by Pyatetskii-Shapiro [1], which reduces the study of homogeneous bounded domains to purely algebraic problems. Let $\mathfrak g$ be a Lie algebra over $\mathbf R$, $\mathfrak k$ a subalgebra of $\mathfrak g$, $(j)$ a collection of linear endomorphisms of $\mathfrak g$, and $\omega$ a linear form on $\mathfrak g$. Then the quadruple $\{\mathfrak g, \mathfrak k, (j), \omega\}$ (sometimes abbreviated $\mathfrak g$) is called a j-algebra if the following conditions are satisfied: (i) $j\mathfrak k \subset \mathfrak k$ for $j \in (j)$ and $j \equiv j' \pmod{\mathfrak k}$ for $j, j' \in (j)$; (ii) $j^2 \equiv -\mathrm{id} \pmod{\mathfrak k}$; (iii) $j[k, x] \equiv [k, jx] \pmod{\mathfrak k}$ for $k \in \mathfrak k$, $x \in \mathfrak g$; (iv) $[jx, jy] \equiv j[jx, y] + j[x, jy] + [x, y] \pmod{\mathfrak k}$ for $x, y \in \mathfrak g$; (v) $\omega([k, x]) = 0$ for $k \in \mathfrak k$; (vi) $\omega([jx, jy]) = \omega([x, y])$; (vii) $\omega([jx, x]) > 0$ for $x \notin \mathfrak k$. Let $\mathfrak g'$ be a subalgebra of $\mathfrak g$ such that $j\mathfrak g' \subset \mathfrak g' + \mathfrak k$. Then, putting $\mathfrak k' = \mathfrak g' \cap \mathfrak k$, one can naturally induce a j-algebra structure on the pair $\{\mathfrak g', \mathfrak k'\}$. The j-algebra thus obtained is called a j-subalgebra of $\{\mathfrak g, \mathfrak k, (j), \omega\}$. A j-algebra $\{\mathfrak g, \mathfrak k, (j), \omega\}$ is called proper (resp. effective), if, for any j-subalgebra $\{\mathfrak g', \mathfrak k'\}$ with $\mathfrak g'$ compact semisimple, $\mathfrak g'$ is contained in $\mathfrak k$ (resp. if $\{\mathfrak g, \mathfrak k\}$ is an effective pair).

Now let D be a homogeneous bounded domain in $\mathbf C^n$, G a connected Lie subgroup of $G_h(D)$ acting transitively on D, and K the isotropy subgroup at a point in D. The Lie algebras of G and K are denoted by $\mathfrak g$ and $\mathfrak k$, respectively. Then the pair $\{\mathfrak g, \mathfrak k\}$ becomes an effective proper j-algebra. Conversely, to every effective proper j-algebra there corresponds a homogeneous bounded domain. The identity component of $G_h(D)$ is isomorphic to the identity component of a real algebraic group via the adjoint representation. Let $\{\mathfrak g, \mathfrak k, (j), \omega\}$ be a j-algebra. Suppose that $\mathfrak g$ satisfies the following conditions: (i) $\mathfrak g = \mathfrak g^{-2} + \mathfrak g^{-1} + \mathfrak g^0$ as a graded Lie algebra; (ii) $\mathfrak g^0 = \mathfrak k + j\mathfrak g^{-2}$; (iii) there exists a $j \in (j)$ such that $j\mathfrak g^{-1} = \mathfrak g^{-1}$; and (iv) there exists an $r \in \mathfrak g^{-2}$ such that $[jx, r] = x$ for $x \in \mathfrak g^{-2}$. Such a decomposition is called a Siegel decomposition of $\mathfrak g$. To an effective j-algebra admitting a Siegel decomposition there corresponds a unique Siegel domain up to affine equivalence. Vinberg, Gindikin, and Pyatetskii-Shapiro (Appendix in [1] or Trans. Moscow Math. Soc., 12 (1963)) proved that the Lie algebra $\mathfrak g_h(D)$ of $G_h(D)$ contains a j-subalgebra admitting a Siegel decomposition and corresponding to the same domain D, and obtained the realization theorem: Every homogeneous bounded domain D is holomorphically equivalent to a Siegel domain. In consequence, D is diffeomorphic to a Euclidean space, and the isotropy subgroup $K_h(D)$ is a maximal compact subgroup of $G_h(D)$. We have the decomposition $G_h(D) = K_h(D) \cdot T$ (semidirect), where T is an R-splittable solvable subgroup of $G_h(D)$ acting simply transitively on D. T is uniquely determined up to conjugacy in $G_h^0(D)$ (= the identity component of $G_h(D)$), and is called the Iwasawa group of D. The j-algebra structure of the Lie algebra $\mathfrak t$ of the Iwasawa group T is characterized by the following properties: (i) for every $t \in \mathfrak t$, the eigenvalues of $\operatorname{ad} t$ are all real; (ii) there exists a complex structure j such that $[jx, jy] =$

$j[jx, y] + j[x, jy] + [x, y]$ for $x, y \in \mathfrak t$; and (iii) there exists a linear form $\omega$ on $\mathfrak t$ such that $\omega([jx, jy]) = \omega([x, y])$ and that $\omega([jx, x]) > 0$ for $x \neq 0$. A Lie algebra satisfying (i)-(iii) is called a normal j-algebra. There exists a one-to-one correspondence between the set of holomorphic equivalence classes of homogeneous bounded domains and the set of j-isomorphism classes of normal j-algebras; by a j-isomorphism here we mean an isomorphism which commutes with j. Let $\{\mathfrak g, j, \omega\}$ be a normal j-algebra and define an inner product $(\ ,\ )$ on $\mathfrak g$ by $(x, y) = \omega([jx, y])$. The orthogonal complement $\mathfrak h$ with respect to $(\ ,\ )$ of the commutator subalgebra $\mathfrak g_1$ of $\mathfrak g$ is an Abelian subalgebra of $\mathfrak g$, and the adjoint representation of $\mathfrak h$ on $\mathfrak g_1$ is fully reducible. One has $\mathfrak g = \sum_\alpha \mathfrak g_\alpha$, $\mathfrak h = \mathfrak g_0$, and $\mathfrak g_1 = \sum_{\alpha \neq 0} \mathfrak g_\alpha$, where $\mathfrak g_\alpha = \{x \in \mathfrak g \mid [h, x] = \alpha(h)x,\ h \in \mathfrak h\}$. The linear form $\alpha$ on $\mathfrak h$ is called a root of $\mathfrak g$. There exist l roots $\alpha_1, \ldots, \alpha_l$ ($l = \dim \mathfrak h$) such that $\mathfrak h$ can be written in the form $\mathfrak h = j\mathfrak g_{\alpha_1} + \cdots + j\mathfrak g_{\alpha_l}$, l being the rank of $\mathfrak g$. Then, after a suitable change of the numbering of the $\alpha_i$'s, any root $\alpha$ will be seen to be of the form $(\alpha_i + \alpha_k)/2$, $(\alpha_i - \alpha_k)/2$ or $\alpha_i/2$, where $1 \leq i < k \leq l$. A normal j-algebra admits a unique Siegel decomposition which can be constructed by using root spaces.

D. Equivariant Holomorphic Embedding

We retain the notation of Section B. Let $D(V, F) \subset R_C \times W$ be a Siegel domain and $\mathfrak g_C$ be the complexification of the Lie algebra $\mathfrak g_h$. $\mathfrak g^{-1}$ has the complex structure defined by the endomorphism $\operatorname{ad} I$. Let $\mathfrak g^{-1}_\pm$ be the $\pm i$-eigenspaces in the complexification $\mathfrak g^{-1}_C$ of $\mathfrak g^{-1}$ under $\operatorname{ad} I$. Let us consider the complex subalgebras $\mathfrak b = \mathfrak g^{-1}_- + \mathfrak g^0_C + \mathfrak g^1_C + \mathfrak g^2_C$ and $\mathfrak n = \mathfrak g^{-2}_C + \mathfrak g^{-1}_+$ of $\mathfrak g_C$, where the subscripts C denote the complexification of the respective space. Let $G_C$ be the connected complex Lie group generated by $\mathfrak g_C$ and containing $G_h^0$ (= the identity component of $G_h$) as a subgroup. The Lie algebra of the normalizer B of $\mathfrak b$ in $G_C$ coincides with $\mathfrak b$. Identifying $R_C \times W$ with $\mathfrak n$ as a complex vector space, and denoting by $\pi$ the natural projection of $G_C$ onto the complex coset space $G_C/B$, the composite mapping $\tau = \pi \circ \exp$ is a holomorphic embedding of $\mathfrak n$ into $G_C/B$, which induces a holomorphic $G_h^0$-equivariant embedding of $D(V, F)$ into $G_C/B$ as an open submanifold. This embedding is called the Tanaka embedding. By the (Shilov) boundary S of $D(V, F)$ we mean the real submanifold $S = \{(x + iy, u) \in R_C \times W \mid y = F(u, u)\}$ of $R_C \times W$, which is a subset of the boundary of $D(V, F)$. S has the natural CR-structure induced from the complex structure of $R_C \times W$. Every element of $\mathfrak g_h$ can be extended to a unique holomorphic vector field on $R_C \times W$ which is tangent to S, and its restriction to S is an infinitesimal CR-automorphism on S, i.e., a complete vector field generating a 1-parameter group of CR-equivalences of S onto itself. Conversely, every infinitesimal CR-automorphism on S can be extended uniquely to a holomorphic vector field on $R_C \times W$, and an element of $\mathfrak g_h$ is characterized as an infinitesimal CR-automorphism on S whose extension leaves the Bergman kernel form of $D(V, F)$ invariant [6]. Let $N = \dim_C \mathfrak g_C$, $m = \dim_C \mathfrak b$, and let $k = \binom{N}{m} - 1$. Then $G_C/B$, and consequently $D(V, F)$, is embedded holomorphically into the complex Grassmann manifold of m-dimensional subspaces in $\mathfrak g_C$ and so into the complex projective space $P^k(\mathbf C)$. Any element of $G_h^0$ is induced from a projective transformation and hence is a birational transformation on $D(V, F)$.

Let D be a homogeneous bounded domain in $\mathbf C^n$, $\mathfrak g_h$ the Lie algebra of $G_h(D)$, and $\mathfrak k_h$ the isotropy subalgebra of $\mathfrak g_h$; and let $\mathfrak g_C$ be the complexification of $\mathfrak g_h$. $\mathfrak g_h$ is a j-algebra. Let us define the complex subalgebra $\mathfrak g_-$ of $\mathfrak g_C$ by putting $\mathfrak g_- = \{x + ijx \mid x \in \mathfrak g_h,\ j \in (j)\}$. Then we have $\mathfrak g_C = \mathfrak g_h + \mathfrak g_-$, $\mathfrak g_h \cap \mathfrak g_- = \mathfrak k_h$. Let $G_C$ be the connected Lie group generated by $\mathfrak g_C$ and containing $G_h^0(D)$ as a subgroup. Let $G_-$ be the connected (closed) Lie subgroup of $G_C$ generated by $\mathfrak g_-$. Then D can be holomorphically embedded in $G_C/G_-$ as the open $G_h^0(D)$-orbit of the origin of $G_C/G_-$ [8]. This embedding is called the generalized Borel embedding. $G_C/G_-$ is compact if and only if D is symmetric, and in this case $G_C/G_-$ coincides with the compact dual [9].

Let $\{\mathfrak t_0, j, \omega\}$ be a normal j-algebra of rank l corresponding to a homogeneous bounded domain $D_0$, and define the Hermitian inner product h by $h(x, y) = \omega([jx, y]) + i\omega([x, y])$ for $x, y \in \mathfrak t_0$. $\mathfrak t_0$ has $l - 1$ (normal) nontrivial j-ideals (i.e., j-invariant ideals) up to j-isomorphisms. Take a j-ideal $\mathfrak t_1$ of $\mathfrak t_0$. Then we have $\mathfrak t_0 = \mathfrak t_1 + \mathfrak t_2$, $\mathfrak t_2$ being a (normal) j-subalgebra of $\mathfrak t_0$ defined as the orthogonal complement of $\mathfrak t_1$ in $\mathfrak t_0$ with respect to h. The geometric version of this is that $D_0$ is represented as a holomorphic fiber space over the homogeneous bounded domain $D_2$ corresponding to $\mathfrak t_2$, with fibers holomorphically equivalent to the homogeneous bounded domain $D_1$ corresponding to $\mathfrak t_1$. For this fibering there exists a universal fiber space $\hat D$ over the product $\hat D_2$ of certain classical symmetric domains, with the same fibers, which plays the same role as that of a universal fiber bundle in topology. Here, $\hat D$ is again a homogeneous bounded domain. The fiber space $D_0 \to D_2$ is induced from the fiber space $\hat D \to \hat D_2$ by the classifying mapping $\lambda$ of $D_2$ to $\hat D_2$. Let $\beta$ be the generalized Borel embedding of $\hat D$ into $G_C/G_-$. Then there exists a

complex Abelian subalgebra $\mathfrak m$ of $\mathfrak g_C$ satisfying $\mathfrak g_C = \mathfrak g_- + \mathfrak m$ (semidirect), and one can construct a biholomorphic mapping f of $\beta(\hat D)$ onto a certain Siegel domain of the third kind over $\hat D_2$ in the vector space $\mathfrak m$ [8]. The fiber space $D_0 \to D_2$ coincides with the one induced from the aforementioned Siegel domain of the third kind by the composite mapping of $\lambda$ and $f \circ \beta$. Every realization of a homogeneous bounded domain as a Siegel domain of the third kind is obtained by this method.

E. Classification of Homogeneous Bounded Domains

The main concern is to classify all homogeneous bounded domains in $\mathbf C^n$ up to holomorphic equivalence. Since the realization as Siegel domains has been set up, the second step is to get the uniqueness theorem: The holomorphic equivalence of two homogeneous Siegel domains implies that they are linearly equivalent, that is, there exists a (complex) linear isomorphism between the ambient vector spaces which carries the one domain to the other. The uniqueness theorem was first stated in 1963 (Appendix in [1]), rigorously proved in 1967 [10], and in 1970 the homogeneity assumption was removed [2]. A homogeneous Siegel domain is called irreducible if it is not holomorphically equivalent to a product of two homogeneous Siegel domains. Every homogeneous Siegel domain is linearly equivalent to a product of irreducible homogeneous Siegel domains [3, 10]. A homogeneous Siegel domain $D(V, F)$ is irreducible if and only if the regular cone V is irreducible, i.e., if V cannot be written as a direct sum of two regular cones. So the problem is to classify irreducible homogeneous Siegel domains up to linear equivalence. This reduces to classifying two kinds of nonassociative algebras with bigradation, called T-algebras and S-algebras [11]. Nonsymmetric homogeneous Siegel domains appear in dimension 4. The numbers of such domains are finite up to dimension 6, but in every dimension $\geq 7$ there is at least one continuous family of nonsymmetric irreducible homogeneous Siegel domains, which are not mutually holomorphically equivalent.

There is a remarkable class of homogeneous Siegel domains, called quasisymmetric [12], which contains the class of symmetric bounded domains. A regular cone $V \subset R$ is called self-dual if there exists an inner product $(\ ,\ )$ on R such that $V = \{x \in R \mid (x, y) > 0 \text{ for } y \in \bar V - (0)\}$, $\bar V$ denoting the closure of V. V is called homogeneous if the group $G(V)$ is transitive on V. Suppose that V is homogeneous self-dual. Then the group $G(V)$ is self-adjoint with respect to the inner product $(\ ,\ )$ on R. Let $\mathfrak g(V)$ be the Lie algebra of $G(V)$. Then the totality $\mathfrak k(V)$ of skew-symmetric operators in $\mathfrak g(V)$ with respect to $(\ ,\ )$ is a maximal compact subalgebra of $\mathfrak g(V)$ and is the isotropy subalgebra of $\mathfrak g(V)$ at a point $e \in V$. Consider the associated Cartan decomposition $\mathfrak g(V) = \mathfrak k(V) + \mathfrak p(V)$. For each $x \in R$ there exists a unique element $T(x) \in \mathfrak p(V)$ such that $T(x)e = x$. Let F be a V-Hermitian form on a complex vector space W. Define a Hermitian inner product $(\ ,\ )$ on W by $(u, v) = (e, F(u, v))$ for $u, v \in W$, and let $H(W)$ be the set of Hermitian operators on W with respect to $(\ ,\ )$. A (homogeneous) Siegel domain $D(V, F) \subset R_C \times W$ is called quasisymmetric if V is homogeneous self-dual and if for each $x \in R$ there exists $R(x) \in H(W)$ such that $F(R(x)u, v) + F(u, R(x)v) = T(x)F(u, v)$ for $u, v \in W$. The normal j-algebra $\mathfrak t$ of an irreducible quasisymmetric Siegel domain is characterized by the following conditions: $\dim \mathfrak g_{(\alpha_i + \alpha_k)/2} = a$ ($1 \leq i < k \leq l$); and $\dim \mathfrak g_{\alpha_i/2} = b$ ($1 \leq i \leq l$), where a, b are some constants and l is the rank of $\mathfrak t$ (D'Atri and de Miatello). Quasisymmetric Siegel domains have been completely classified (M. Takeuchi, Nagoya Math. J., 59 (1975), also [12]).

F. Generalized Siegel Domains and Further Results

Let $\Omega$ be a domain in $\mathbf C^n \times \mathbf C^m$ ($n, m \geq 0$) which is holomorphically equivalent to a bounded domain and contains a point of the form $(z, 0)$, $z \in \mathbf C^n$. $\Omega$ is called a generalized Siegel domain with exponent c ($c \in \mathbf R$), if $\Omega$ is invariant under holomorphic transformations of $\mathbf C^n \times \mathbf C^m$ of the types

$(z, u) \mapsto (z + a, u)$ for all $a \in \mathbf R^n$,
$(z, u) \mapsto (z, e^{i\theta}u)$ for all $\theta \in \mathbf R$,
$(z, u) \mapsto (e^t z, e^{ct}u)$ for all $t \in \mathbf R$.

Let D be a bounded domain in $\mathbf C^n$, and $\Gamma$ a subgroup of $G_h(D)$. $\Gamma$ is said to sweep D if there exists a compact set $K \subset D$ such that $\Gamma K = D$. $\Gamma$ is said to divide D if $\Gamma$, provided with discrete topology, acts properly on D and sweeps D. D is called sweepable (resp. divisible) if there exists a subgroup $\Gamma$ of $G_h(D)$ which sweeps (resp. divides) D. A divisible generalized Siegel domain is symmetric. A sweepable generalized Siegel domain with exponent $c \neq 0$ (resp. $c = 0$) is a Siegel domain (resp. a product of a Siegel domain of the first kind and of a homogeneous bounded circular domain) ([13]; also A. Kodama, J. Math. Soc. Japan, 33 (1981)). Some results have been obtained concerning geometry of bounded domains, homogeneous bounded domains, and Siegel domains in

complex Banach spaces [14], and also concerning the unitary representations of the generalized Heisenberg group on the square-integrable cohomology spaces of $\bar\partial_b$-complexes on the Shilov boundary of a Siegel domain [15].

References

[1] I. I. Pyatetskii-Shapiro, Automorphic functions and the geometry of classical domains, Gordon & Breach, 1969.
[2] W. Kaup, Y. Matsushima, and T. Ochiai, On the automorphisms and equivalences of generalized Siegel domains, Amer. J. Math., 92 (1970), 267-290.
[3] K. Nakajima, Some studies on Siegel domains, J. Math. Soc. Japan, 27 (1975), 54-75.
[4] J. Dorfmeister, Homogene Siegel-Gebiete, Habilitation, Univ. Münster, 1978.
[5] S. Murakami, On automorphisms of Siegel domains, Lecture notes in math. 286, Springer, 1972.
[6] N. Tanaka, On infinitesimal automorphisms of Siegel domains, J. Math. Soc. Japan, 22 (1970), 180-212.
[7] S. G. Gindikin, E. B. Vinberg, and I. I. Pyatetskii-Shapiro, Homogeneous Kähler manifolds, Geometry of Homogeneous Bounded Domains, C.I.M.E. 1967; Cremonese, 1968, 3-87.
[8] S. Kaneyuki, Homogeneous bounded domains and Siegel domains, Lecture notes in math. 241, Springer, 1971.
[9] K. Nakajima, On Tanaka's embeddings of Siegel domains, J. Math. Kyoto Univ., 14 (1974), 533-548.
[10] S. Kaneyuki, On the automorphism groups of homogeneous bounded domains, J. Fac. Sci. Univ. Tokyo, 14 (1967), 89-130.
[11] M. Takeuchi, Homogeneous Siegel domains, Publ. study group geometry 7, Kyoto, 1973.
[12] I. Satake, Algebraic structure of symmetric domains, Iwanami and Princeton Univ. Press, 1980.
[13] J. Vey, Sur la division des domaines de Siegel, Ann. Sci. École Norm. Sup., 3 (1970), 479-506.
[14] J. P. Vigué, Le groupe des automorphismes analytiques d'un domaine borné d'un espace de Banach complexe, Ann. Sci. École Norm. Sup., 9 (1976), 203-282.
[15] H. Rossi and M. Vergne, Group representations on Hilbert spaces defined in terms of $\bar\partial_b$-cohomology on the Shilov boundary of a Siegel domain, Pacific J. Math., 65 (1976), 193-207.

385 (XVI.6)
Simulation

A. General Remarks

Simulation, in its widest sense, is a method of utilizing models to study the nature of certain phenomena. This method is employed when experimentation with the actual phenomena in question is difficult because of high cost in time or money. Also, it is sometimes almost impossible to carry out observations when the behavior of the objects can be influenced by their surroundings.

We can classify simulation techniques into the following four types, although simulations in practical use are usually a mixture of them.

The first type is model experimentation, which includes model basins and wind tunnels in hydrodynamics and pilot plants in the chemical industry. In advance of construction in a real situation, we perform experiments on a small scale and verify or modify those theories upon which the construction is based.

The second type is analog simulation or experimental analysis. We investigate the properties of real objects by experiments on alternative phenomena satisfying the same differential equations as those known for or assumed to be satisfied by the real objects. For example, we use an equivalent electric network to study dynamic vibration, and dynamic systems to study heat conduction problems. When theoretical analysis of the actual phenomenon is difficult, we look for other phenomena with similar properties and study them in order to construct mathematical models for them. This type of simulation has come into practical use mainly in engineering problems, but recently it has been utilized for the study of economic phenomena, nervous systems, the circulating system of an artificial heart, etc. Analog simulation was in the past often performed by means of analog computers. Nowadays, analog simulation is more frequently performed by digital computers than by analog ones. And with the progress of electronics it has become easier to make special-purpose simulators.

The third method of simulation, simulation in the narrow sense, has become more important as digital computers have been developed. In general this method is applied to problems that are more complicated and of larger scale than problems treated by analog simulation. When the mathematical expressions of the phenomenon and the algorithms of its dynamic structure are known, it is easy to simulate it by means of a computer program. In

particular, when these techniques are used to study systems such as sets of machines, equipment at factories, or management organization, we call them system simulations. Major fields where system simulation techniques have been used are traffic control on highways or at airports, arrangement or operation of machines at factories, balancing problems in chemical processes, production scheduling in connection with demands and stocks, overall management problems, and design of information systems. The method has also been applied in designing plants and highways and in the study of social or biological phenomena. Also, when we investigate instruction systems of computers that are yet to be completed or develop programming systems for such computers, existing computers can be used to simulate the new ones. Random numbers play an important role where the simulation must include random fluctuations (→ 354 Random Numbers). In such instances, the method is often called the Monte Carlo method (→ Section C).

The fourth method of simulation deals with systems containing human beings. Among them are war games for training in military-operation planning, business games for training in business enterprises, and simulators for training pilots and operators of atomic power plants. The contribution of human decision to simulation processes is characteristic of these cases. For example, the participants in a business game are divided into several enterprise groups. Each group discusses and decides how to invest in plants, equipment, research, and advertising and how to schedule production for each quarter. On the basis of the decisions, a computer outputs the records of sales, stocks, and cash for each quarter, according to hidden rules. From the results, each group decides on the next steps. In this way, the groups compete for development. This type of simulation is important not only for training but also for investigating the mechanism of human decision. Slightly different from this type of simulation, the "perceptron" and EPAM (Elementary Perceiver and Memorizer) are related to artificial intelligence and have been used extensively in the cognitive sciences and in research into the structure and function of the human brain.

The third type of simulation has attracted attention in particular and has been used both in theoretical problems, such as the explication of various phenomena, and in practical problems, such as design or optimum operation of systems or prediction of their behavior. For school education and training of technicians, this is put to use together with simulations of the fourth type. But their use has also induced heated discussions and controversies on the validity of results.

B. Programming Languages for Simulation

We usually describe models by using general-purpose languages such as FORTRAN, or list processing languages such as LISP, to simulate situations on computers. For system simulation, a number of programming languages have been developed and put to practical use. They can be divided roughly into two categories: those for which systems change continuously and those describing discrete changes. CSMP (Continuous System Modeling Program), CSSL (Continuous System Simulation Language), and DDS (Digital Dynamics Simulator) belong to the former, and hence all involve integration mechanisms; but each has a different way of describing a model. DYNAMO, which has been implemented for J. W. Forrester's Industrial Dynamics and World Dynamics, is used extensively. To control simulation time, one may use GPSS (General Purpose Simulation System), SIMULA (Simulation Language), or SIMSCRIPT (Simulation Scriptor), each employing a different method to describe state transitions.

C. The Monte Carlo Method

The Monte Carlo method was introduced by J. von Neumann and S. M. Ulam around 1945. They defined this as a method of solving deterministic mathematical problems using random numbers. L. de Buffon's needle experiment, in which the approximate value of $\pi$ is obtained by dropping needles at random many times, is a classical example of this method. Another example is the problem of evaluating a definite integral $I = \int_a^b f(x)\,dx$ ($B > f(x) > A \geq 0$). First we generate many pairs of (uniform) random numbers $(x, y)$, where $a < x < b$ and $A < y < B$. The proportion (p) of pairs satisfying $y < f(x)$ gives an estimate of the integral, i.e., $I \approx (b - a)\{A + p(B - A)\}$, which reduces to $p(B - A)(b - a)$ when $A = 0$. The techniques for inverting matrices, solving boundary value problems of partial differential equations, and so on, are also examples of the Monte Carlo method in this sense. However, direct numerical calculation seems to be more useful in dealing with this sort of problem. At present, Monte Carlo methods are usually used when it is difficult to construct (or solve) mathematical equations describing the phenomena in question, for example, when the phenomena involve stochastic processes such as random

walks. Some methods have been devised so as theory, where m is the mass of the particle),
to get precise results efficiently. and the S-matrix S is described in terms of the
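
The hit-or-miss estimate of a definite integral described in this section can be sketched in a few lines. The following Python fragment is a minimal illustration and is not taken from the source: the function name `hit_or_miss`, the sample size, and the fixed seed are assumptions made for the example. Points are drawn uniformly from the box a < x < b, A < y < B, and p is the fraction that falls below the curve. Because the box starts at height A, the strip beneath it contributes A(b - a), which is zero when A = 0.

```python
import random


def hit_or_miss(f, a, b, A, B, n=200_000, seed=1):
    """Estimate the integral of f over [a, b], assuming A <= f(x) <= B there."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(n):
        x = rng.uniform(a, b)          # uniform abscissa in the interval
        y = rng.uniform(A, B)          # uniform ordinate in the bounding box
        if y < f(x):                   # the pair lies under the graph of f
            hits += 1
    p = hits / n                       # proportion of pairs with y < f(x)
    # Box area times p, plus the strip of height A under the box.
    return p * (B - A) * (b - a) + A * (b - a)
```

For instance, `hit_or_miss(lambda x: x * x, 0.0, 1.0, 0.0, 1.0)` should return a value close to 1/3. The statistical error decreases only like the inverse square root of the number of pairs, which is one reason direct numerical quadrature is usually preferable for low-dimensional integrals of this kind.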
References

[1] B. P. Zeigler, Theory of modeling and simulation, Wiley, 1976.
[2] R. S. Lehman, Computer simulation and modeling, Wiley, 1977.
[3] R. E. Shannon, Systems simulations: The art and science, Prentice-Hall, 1975.
[4] J. M. Hammersley and D. C. Handscomb, Monte Carlo methods, Wiley, 1964.
[5] A. Newell and H. A. Simon, Human problem solving, Prentice-Hall, 1972.
[6] M. Minsky and S. Papert, Perceptrons, MIT Press, 1969; second printing with corrections by the authors, June 1972.
[7] J. W. Forrester, World dynamics, Wright-Allen Press, 1972.

386 (XX.29)
S-Matrices

A. Basic Notion

It is often useful to focus attention on the relation between a physical system’s input and output, without worrying about intermediate processes (the black box), which may be insufficiently understood or too complicated to analyze. For the scattering of particles, this leads to the notion of an S-matrix that directly relates the state of incoming particles (before scattering processes take place) to that of outgoing (scattered) particles.

In typical cases, the incoming and outgoing particles are described as mutually noninteracting (these are called free particles). This implies particle motions along straight lines at constant speeds (asymptotic to the actual motion at infinite past for incoming particles and at infinite future for outgoing particles) in classical mechanics, and wave functions (or vectors in a Hilbert space) obeying the Schrödinger equation with a free Hamiltonian $H_0$ in quantum mechanics and more or less the same in quantum field theory.

A wave function for $n$ particles is an $L_2$-function of their momenta $p_1, \dots, p_n$ (each $p_j$ being a 3-dimensional vector) with respect to an appropriate measure (normally the Lebesgue measure $d\mu(p) = \prod d^3p_j$ in quantum mechanics and the Lorentz-invariant measure $d\mu(p) = \prod \{(m^2 + p_j^2)^{-1/2}\, d^3p_j\}$ in quantum field

S-matrix elements $(p_1, \dots, p_n|S|p'_1, \dots, p'_{n'})$ (which is a distribution) as

$$(\Phi, S\Psi) = \int \overline{\Phi(p)}\,(p|S|p')\,\Psi(p')\,d\mu(p)\,d\mu(p'),$$

where $\Phi$ and $\Psi$ are wave functions for $n$ and $n'$ particles, $p = (p_1, \dots, p_n)$, and similarly for $p'$. $(p|S|p')$ gives quantities measured in scattering experiments, as will be explained in Section B (2).

In quantum mechanics, the free and actual (interacting) motion of particles is described in the same Hilbert space with free and interacting Hamiltonians $H_0$ and $H$. A vector $\Phi$ in interacting motion behaves like a vector $\varphi$ in free motion at infinite past if

$$\|(\exp[-iHt])\Phi - (\exp[-iH_0t])\varphi\| \to 0 \quad \text{as } t \to -\infty$$

and hence

$$\Phi = W_-(H; H_0)\varphi, \qquad W_-(H; H_0) = \lim_{t \to -\infty} e^{iHt}e^{-iH_0t}.$$

Such a $\Phi$ is often written as

$$\Phi = \Phi^{\mathrm{in}}(\varphi) = \int \Phi^{\mathrm{in}}(p)\,\varphi(p)\,d\mu(p),$$

and is called an in-state. A definition for an out-state $\Phi^{\mathrm{out}}$ is obtained by changing $t \to -\infty$ to $t \to +\infty$ and $W_-$ to $W_+$. The S-matrix element is defined (as a distribution) by

$$(p|S|p') = (\Phi^{\mathrm{out}}(p), \Phi^{\mathrm{in}}(p')).$$

The existence and properties of $W_\pm$ (called †wave operators) are central subjects in scattering theory (→ 375 Scattering Theory).

In quantum field theory, the asymptotic description is given in terms of vectors in †Fock space, and the in- and out-states are constructed in terms of †asymptotic fields (→ 150 Field Theory).

The foregoing description actually applies only to a system of one-component particles of the same kind. More generally, additional (discrete) variables, say $\alpha$, are needed to distinguish different kinds of particles and different spin components of each kind of particle, and the $p$’s appearing in the above formulas should be replaced by $(p, \alpha)$’s, along with related changes in the measure.

Even in a quantum mechanics of many identical particles a bound state, if it exists, is to be treated as another particle (different from the original one) and should be distinguished by $\alpha$’s in the asymptotic description.

If the interaction is of long range (e.g.,

Coulomb interaction), the classical path of a particle does not have an asymptote in general, and correspondingly the wave operators $W_\pm$ do not exist for the usual free Hamiltonian. Still, an asymptotic description of scattering is possible in some cases.

In the presence of massless particles, such as photons in quantum field theory, another difficulty, called the infrared problem, can arise in the asymptotic description of scattering because the scattered particle may be accompanied by an infinite number of massless particles (with very small energy). In such a situation, a representation of a free massless field not equivalent to the standard Fock representation is believed to be a possible candidate for the asymptotic description of scattering.

B. Basic Properties

(1) Invariance. Let $\mathcal{H}_0$ be the Hilbert space for the asymptotic description of scattering, such as the space of $L_2$-functions $\varphi(p_1, \dots, p_n)$ relative to the measure $d\mu(p)$. The S-matrix is an operator on $\mathcal{H}_0$ whose matrix element is as described above. (The corresponding operator $S$ in the Hilbert space $\mathcal{H}$ describing the interacting states is defined by $S\Phi^{\mathrm{out}}(\varphi) = \Phi^{\mathrm{in}}(\varphi)$ and is sometimes called an S-operator.)

$S$ is said to be invariant under a group $G$ of transformations of the $p$’s (and possibly the $\alpha$’s) if $(U(g)\varphi)(p) = \varphi(g^{-1}p)$ defines a continuous unitary representation $U(g)$ and if $U(g)S = SU(g)$ for all $g \in G$. First, $S$ is usually invariant under time translation. In the quantum mechanics of a particle scattered by a rotationally invariant potential, $S$ is invariant under the group of rotations of 3-dimensional vectors $p$; in the quantum mechanics of many particles (mutually interacting through central potentials), $S$ is invariant under the 3-dimensional Euclidean group of transformations $p \to Rp + a$ with rotation $R$; in relativistic field theory, $S$ is assumed to be invariant under the †inhomogeneous Lorentz group of transformations $p \to \Lambda p + a$ with $p = (p^0, \mathbf{p})$ and homogeneous Lorentz transformation $\Lambda$.

In all these examples, $G$ is of type I, i.e., there is a direct integral decomposition

$$\mathcal{H}_0 = \int^{\oplus} \mathcal{H}(k) \otimes \mathcal{K}(k)\,d\nu(k), \qquad U(g) = \int^{\oplus} U_k(g) \otimes 1_{\mathcal{K}(k)}\,d\nu(k),$$

into irreducible representations $U_k$ on $\mathcal{H}(k)$, which are mutually inequivalent, where $\mathcal{K}(k)$ is some Hilbert space for each $k$. The S-matrix is invariant under $G$ if and only if

$$S = \int^{\oplus} 1_{\mathcal{H}(k)} \otimes S(k)\,d\nu(k).$$

When a (scalar) particle is scattered by a central potential, irreducible representations are labeled by the energy $E(p) = p^2/(2m)$ (for time translation) and the angular momentum $l$ (for rotations) with $\dim \mathcal{K}(E(p), l) = 1$. Therefore each $S(k) = S_l(|p|)$ is a number. For any given energy, $l = 0, 1, 2, \dots$ are referred to as the S-wave, P-wave, D-wave, ... or generally as partial waves.

For relativistic scalar particles, irreducible representations (with positive energy and nonzero real mass) are labeled by the center-of-mass energy squared $s$ ($= (\sum (m^2 + \mathbf{p}_j^2)^{1/2})^2 - (\sum \mathbf{p}_j)^2$) and the total angular momentum $l$. If $s$ is below the threshold $(3m)^2$ of 3-particle scattering, $\dim \mathcal{K}(k) = 1$ and $S(k) = S_l(s)$ is a number.

(2) Unitarity. $S\varphi$ is supposed to represent the $t = +\infty$ asymptotic (free) behavior of the state that initially (i.e., at $t = -\infty$) behaves like a free state $\varphi$. If there is no loss of probability in the description of scattering, the S-matrix is isometric. If all asymptotic configurations are realized as a result of scattering, the S-matrix must also be unitary. The mappings $\Phi^{\mathrm{in}}$ and $\Phi^{\mathrm{out}}$, taking $\varphi \in \mathcal{H}_0$ to $\Phi^{\mathrm{in}}(\varphi) \in \mathcal{H}$ and $\Phi^{\mathrm{out}}(\varphi) \in \mathcal{H}$ respectively ($W_-$ and $W_+$ in quantum mechanics), are proved to be isometric under a general assumption. The unitarity is then proved in potential scattering (under some conditions on the potential) by showing that the two wave operators $W_\pm$ have the same range. In fact, a somewhat stronger result, namely that this range is the same as the absolutely continuous spectral subspace for the interacting Hamiltonian $H$, is usually proved and is called completeness (of scattering states in the absolutely continuous spectral subspace of $H$). If the mappings $\Phi^{\mathrm{out}}$ and $\Phi^{\mathrm{in}}$ (or $W_\pm$) are isometric and have the same range, then $\Phi^{\mathrm{in}}(\varphi) = \Phi^{\mathrm{out}}(S\varphi)$, which shows that the state behaving like $\varphi$ at $t = -\infty$ behaves like $S\varphi$ at $t = +\infty$.

In the simple multiplicity cases such as $S_l(|p|)$ and $S_l(s)$ above (which correspond to the physical situation of purely elastic scattering without any production or change of particles), these numbers must be of the form $e^{2i\delta_l}$ due to unitarity, where the real number $\delta_l$ is called the phase shift. In terms of the phase shift, the differential cross section $d\sigma/d\Omega$, which is the average number of particles scattered per unit time per unit solid angle around the direction forming an angle $\theta$ with the incident uniform parallel beam of unit intensity (one particle per unit time per unit area) when

viewed in the center-of-mass system, and the total elastic cross section $\sigma_{\mathrm{el}} = \int (d\sigma/d\Omega)\,d\Omega$ are given by the following formulas, called the partial wave expansion:

$$\frac{d\sigma}{d\Omega} = |f(s, \theta)|^2, \qquad f(s, \theta) = k^{-1}\sum_{l=0}^{\infty} (2l + 1) f_l P_l(\cos\theta), \qquad f_l = \sin\delta_l\, e^{i\delta_l},$$

$$\sigma_{\mathrm{el}} = 4\pi k^{-2}\sum_{l=0}^{\infty} (2l + 1)|f_l|^2,$$

where $d\Omega = \sin\theta\, d\theta\, d\varphi$ (invariant measure on a 2-dimensional sphere $S^2$) and $k$ is the wave number of the particle in the center-of-mass system ($k = \hbar^{-1}|\mathbf{p}| = \hbar^{-1}[(s/4) - m^2]^{1/2}$). The function $f$ is called the scattering amplitude. The forward scattering amplitude $f(s, 0)$ is related to the total cross section $\sigma_{\mathrm{tot}}$ by the optical theorem:

$$\sigma_{\mathrm{tot}} = 4\pi k^{-1}\,\mathrm{Im}\, f(s, 0),$$

which follows from the unitarity of the S-matrix. Here the total cross section is expressed as the area of the transverse cross section of a classical (impenetrable) scatterer that would scatter the same amount of particles. In a purely elastic region, $\sigma_{\mathrm{tot}} = \sigma_{\mathrm{el}}$, and the optical theorem is the same as the assertion $\mathrm{Im}\, f_l = |f_l|^2$.

Even for values of $s$ above the threshold of inelastic scattering, the restriction of $S_l$ to the subspace of two particles is again described by numbers $e^{2i\delta_l}$, where $\delta_l$ is now complex, and the same formulas for the differential and total elastic cross sections hold, except that unitarity of $S_l$ now implies $\mathrm{Im}\, f_l \ge |f_l|^2$.

(3) TCP Symmetry. In the framework of either †axiomatic quantum field theory or the †theory of local observables, the TCP theorem (or PCT theorem) shows [1] that to every particle there corresponds another particle with the same mass and spin, which is called the antiparticle and can be the original particle itself, such that any particle is the antiparticle of its antiparticle, and the following relation, called TCP invariance (or PCT invariance [2]) holds for S-matrix elements:

$$(p, \alpha|S|p', \alpha') = \eta(\alpha)\eta(\alpha')\, i^{F(\alpha) - F(\alpha')}\,(p', \theta_0\alpha'|S|p, \theta_0\alpha).$$

Here $F(\alpha)$ and $F(\alpha')$ are the number of particles with half odd integer spin amongst the incoming and outgoing particles (i.e., in $\alpha$ and $\alpha'$), $\eta(\alpha)$ and $\eta(\alpha')$ are the product of $\pm 1$ assigned to each particle in $\alpha$ and $\alpha'$, respectively, with the assignment to a particle and its antiparticle the same for bosons and opposite for fermions, $(p, \alpha)$ is an abbreviation for an ordered $n$-tuple of energy-momentum 4-vectors $p_i$ and other indices $\alpha_i$ (for spin components and particle species), $i = 1, \dots, n$, and similarly for $(p', \alpha')$, and $\theta_0$ is the mapping from particles to their antiparticles (without changing spin indices). We may view the mapping $\varphi(p, \alpha) \to \eta(\alpha)\, i^{F(\alpha)}\,\overline{\varphi(p, \theta_0\alpha)}$ as an antiunitary operator $\Theta_0$ on $\mathcal{H}_0$ satisfying $\Theta_0 S \Theta_0^{-1} = S^*$ so that

$$(\Phi, S\Psi) = (\Theta_0\Psi, S\Theta_0\Phi).$$

This is related to the †TCP operator $\Theta$ in quantum field theory by $\Theta\Psi^{\mathrm{out}}(\varphi) = \Psi^{\mathrm{in}}(\Theta_0\varphi)$ and $\Theta\Psi^{\mathrm{in}}(\varphi) = \Psi^{\mathrm{out}}(\Theta_0\varphi)$. The name TCP comes from the combination of T for time reversal (incoming ⇄ outgoing, $v = p/m \to -v$), C for charge conjugation (particle ⇄ antiparticle), and P for parity, which is a quantum number for space inversion ($\mathbf{p} \to -\mathbf{p}$). TCP symmetry was suggested to G. Lüders (Dansk. Mat. Fys. Medd., 1954) by B. Zumino in the form that P-invariance implies TC symmetry. W. Pauli (Niels Bohr and the development of physics, Pergamon 1955, 30) realized that TCP is a symmetry. R. Jost (Helv. Phys. Acta, 1957) gave its proof in the framework of axiomatic quantum field theory.

(4) Crossing Symmetry. In the framework of either axiomatic quantum field theory or the theory of local observables, it has been shown by J. Bros, H. Epstein, and V. Glaser (Comm. Math. Phys., 1 (1965); also [3]) that there exist analytic functions $H(k, \alpha)$ of complex variables $k = (k_1, \dots, k_4)$ in a certain domain $D$ such that each $k_j$ is a complex 4-vector, the variables are on the mass-shell manifold defined by $k_j^2 = m_j^2$ ($k_j^2$ is in the Minkowski metric) and $\sum k_j = 0$, and the boundary value of $H(k, \alpha)$ as $k_j$ approaches $\varepsilon_j p_j$ from $\mathrm{Im}\, s > 0$ ($s$ being the square of the sum of two $k$’s for incoming particles) in $D$ is the scattering amplitude for the following processes involving the particles $A_j$ and their antiparticles $\bar{A}_j$ ($j = 1, \dots, 4$) with 4-momenta $p_j$, $\varepsilon_j = 1$ for $A_j$ and $-1$ for $\bar{A}_j$ (some of the $A$’s and $\bar{A}$’s may coincide):

(i) $A_1 + A_2 \to \bar{A}_3 + \bar{A}_4$,
(i') $A_3 + A_4 \to \bar{A}_1 + \bar{A}_2$,
(ii) $A_1 + A_3 \to \bar{A}_2 + \bar{A}_4$,
(ii') $A_2 + A_4 \to \bar{A}_1 + \bar{A}_3$,
(iii) $A_1 + A_4 \to \bar{A}_2 + \bar{A}_3$,
(iii') $A_2 + A_3 \to \bar{A}_1 + \bar{A}_4$.

Any pair of relations taken from (i)-(iii) constitutes a crossing symmetry.

(5) High Energy Theorems. M. Froissart (Phys. Rev., 123 (1961)) obtained from the Mandelstam representation the following bound for the forward scattering amplitude (called the

Froissart bound) at large $s$:

$$|F(s, 0)| < (\mathrm{const})\, s(\log s)^2, \qquad F(s, t) = \sqrt{s}\, f(s, \theta), \qquad t = 2k^2(\cos\theta - 1).$$

A. Martin has obtained such a bound in the axiomatic framework. As a consequence we have the Froissart-Martin bound:

$$\sigma_{\mathrm{tot}}(s) < \frac{4\pi}{R - \varepsilon}\,[\log(s/s_0)]^2,$$

where $\varepsilon > 0$ is arbitrary, $R$ can be taken to be $4m_\pi^2$ ($m_\pi$ is the mass of a pion) for many cases, such as $\pi\pi$, $\pi K$, $\pi N$, and $\pi\Lambda$ scattering, and $s_0$ is an unknown constant.

Many other upper and lower bounds for scattering amplitudes have been obtained under other assumptions [4, 5].

I. Ya. Pomeranchuk (Sov. Phys. JETP, 7 (1958)) suggested the asymptotic coincidence of total cross sections at high energy for scattering of $AB$ and $A\bar{B}$, where $\bar{B}$ is the antiparticle of $B$. This Pomeranchuk theorem has been shown to hold by Martin (Nuovo Cimento, 39 (1965)) by using the analyticity derived in the axiomatic framework under the following sufficient condition: The existence of $\lim_{s \to \infty}[\sigma(AB) - \sigma(A\bar{B})]$ and $\lim_{s \to \infty}(s \log s)^{-1}F(s, 0) = 0$.

C. S-Matrix Approach

(1) History. All the information needed to understand the experimental elementary-particle scattering data seems to be expressible by S-matrix elements. It was therefore natural to try to develop a foundation (and practical methods of calculation) for the theory of elementary particles on the basis of some general properties of the S-matrix, especially when other approaches, such as quantum field theory, faced difficulties. W. Heisenberg (Z. Phys., 120 (1943)) first pointed out the possibility of such an approach soon after the introduction of the S-matrix by J. A. Wheeler (Phys. Rev., 52 (1937)). Unfortunately, in the early 1940s not much dynamical content could be given to such an S-matrix-theoretic approach because only unitarity and some invariance properties, such as †Lorentz invariance, were used to characterize the S-matrix. In the late 1950s, through the study of the analyticity of the S-matrix in connection with dispersion relations in quantum field theory, it became evident that causality is another important determinant of S-matrix structure. In practice, causality in position space is used in the form of the analyticity of the S-matrix elements in the energy-momentum space (variables dual to space-time positions in the Fourier transform). Subsequently it was realized that analyticity combined with unitarity gives surprisingly strong control over the structure of the S-matrix (G. F. Chew [6]; H. P. Stapp, Phys. Rev., 125 (1962); J. C. Polkinghorne, Nuovo Cimento, 23 and 25 (1962); J. Gunson, J. Math. Phys., 6 (1965); D. I. Olive, Phys. Rev., 135B (1964); Chew [7]; R. J. Eden et al. [8]). The study of the S-matrix based on its general properties, such as invariance, unitarity, and analyticity, is called S-matrix theory. In the present form, it is adapted only to massive particles with short-range interactions, and its applicability is believed to be limited to strongly interacting systems.

(2) Landau-Nakanishi Variety. C. Chandler and Stapp (J. Math. Phys., 10 (1969)) and D. Iagolnitzer and Stapp (Comm. Math. Phys., 14 (1969)) clarified the analytic structure of the S-matrix in terms of Landau equations (→ 146 Feynman Integrals) based on the important physical idea of macroscopic causality, which gives much more precise information in the physical region than a superficial application of †locality (also called microcausality) in axiomatic quantum field theory, though it is possible that a detailed study starting from microcausality and incorporating †asymptotic completeness (the so-called nonlinear program in axiomatic quantum field theory) might eventually entail the macroscopic causality condition (e.g., J. Bros, in [9]; Iagolnitzer [10, ch. IV]; also K. Symanzik, J. Math. Phys., 1 (1960)).

(3) Microlocal Analysis. An important fact is that the normal analytic structure of the S-matrix discussed by Iagolnitzer and Stapp essentially coincides with the notion of analyticity in microlocal analysis, i.e., with microanalyticity (→ 274 Microlocal Analysis; F. Pham and Iagolnitzer, Lecture notes in math. 449, Springer, 1975; M. Sato, Lecture notes in phys. 39, Springer, 1975). In a word, the †Landau equations have acquired a new interpretation in the description of the microanalytic structure of the S-matrix. In the new developments, the Landau equations define a variety in the cotangent bundle (over the mass-shell manifold in momentum space), and the †singularity spectrum of S-matrix elements is assumed to be confined to $\bigcup_G \mathcal{L}^+(G)$ (except for the so-called $\mathcal{M}_0$-points), where $G$ ranges over all possible Feynman graphs and $\mathcal{L}^+(G)$ denotes the positive-$\alpha$ Landau-Nakanishi variety associated with $G$ (→ 146 Feynman Integrals). The union $\bigcup_G \mathcal{L}^+(G)$ is known to be locally finite and hence makes sense (Stapp, J. Math. Phys., 8 (1967)). The old interpretation of Landau equations, as defining a variety in energy-momentum space, corresponds now to considering the variety $L^+(G)$

obtained by projecting $\mathcal{L}^+(G)$ from the cotangent bundle to the base manifold (i.e., the mass shell manifold). The new interpretation of Landau equations led Sato (Lecture notes in phys. 39, Springer, 1975) to make a further intriguing conjecture that the S-matrix would satisfy a special †overdetermined system (a †holonomic system) of †(micro-)differential equations whose †characteristic variety is given by the complexification of Landau-Nakanishi varieties. This conjecture is closely related to the monodromy-theoretic approach by T. Regge (Publ. Res. Inst. Math. Sci., 12 suppl. (1977) and the references cited therein) and his co-workers.

(4) Discontinuity Formula. It has turned out that the above approach is closely related to the so-called discontinuity formula obtained by combining the unitarity and the analyticity of the S-matrix. Actually T. Kawai and Stapp (Publ. Res. Inst. Math. Sci., 12 suppl. (1977)) have shown that Sato’s conjecture can be verified at several physically important points on the basis of the discontinuity formula. The discontinuity formula was first found by R. E. Cutkosky (J. Math. Phys., 1 (1960)) for Feynman integrals. It describes the ramification property of the integral around its singularities (→ 146 Feynman Integrals). An analogous formula has been shown to be valid also for the S-matrix, and it demonstrates how strict are the constraints derived from unitarity and analyticity (Eden et al. [8, ch. 4]; M. J. D. Bloxham et al., J. Math. Phys., 10 (1969); J. Coster and Stapp, J. Math. Phys., 10 (1969); also Stapp in [11] and Iagolnitzer [10]). Note, however, that the derivation of the hitherto-known discontinuity uses either some ad hoc assumptions or some heuristic reasoning which is not rigorous or sometimes is even erroneous from the mathematical viewpoint. Efforts to give a rigorous proof are still being made, and these present several mathematically interesting problems (e.g., Iagolnitzer in [9] and M. Kashiwara and Kawai in [9]).

(5) Regge Poles. The results stated so far concerning the analyticity of the S-matrix have been primarily derived in the low-energy region. It is commonly hoped that these results can be related to its high-energy behavior through the inner consistency of S-matrix theory, even though it is still unclear to what extent such a relationship can be developed. Such a hope was advocated by Chew, who had been inspired by the results of Regge (Nuovo Cimento, 14 (1959); 18 (1960)) for potential scattering. After being adapted to the relativistic case, Regge’s idea took the following form: Consider the scattering of two incoming and two outgoing scalar particles with equal mass $m > 0$. Let $f_l(s)$ be the partial scattering amplitude defined earlier. Regge introduced the idea of extending the function $f_l(s)$ to an analytic function $f(l, s)$ ($l \in \mathbf{C}$) and of applying the Sommerfeld-Watson transformation in order to replace the partial wave expansion by the integral

$$\frac{\sqrt{s}}{k}\,\frac{i}{2}\int_C (2l + 1) f(l, s) P_l(-\cos\theta)\,\frac{dl}{\sin \pi l} = F(s, t)$$

for a certain contour $C$ in the complex $l$-plane which encircles $\{0, 1, 2, \dots\}$. If $f(l, s)$ is meromorphic in $\mathrm{Re}\, l > -1/2$ and if it tends to zero sufficiently rapidly at infinity, then one can change the contour $C$ so that, with the help of Cauchy’s integral formula, $F \sim \mathrm{constant} \cdot t^{\alpha(s)}$, $\alpha(s) = \max \mathrm{Re}\, l(s)$, where the maximum is taken over all the poles of $f(l, s)$. Thus the poles of the extended function $f(l, s)$ determine the asymptotic behavior of $F$ as $t \to \infty$ (Regge behavior) under the assumption that $f(l, s)$ can be chosen to satisfy suitable analyticity and growth order conditions. These poles are called Regge poles. Even though meromorphy conditions are found to be satisfied for scattering by a (Yukawa) potential, they do not seem to be satisfied for the full S-matrix in the relativistic case. More general cases than those discussed here, i.e., the cases where more variables are considered, are also being studied but without full success at the moment. For details and references → [7, 12, 13].

(6) Veneziano Model. In connection with Regge-pole theory, we note an interesting observation by G. Veneziano (Nuovo Cimento, 57A (1968)) to the effect that $\Gamma(1 - \alpha(s))\Gamma(1 - \alpha(t))/\Gamma(1 - \alpha(s) - \alpha(t))$, with $\alpha(s)$ being linear in $s$, satisfies a crossing symmetry (in $s$ and $t$) and shows an exact Regge-pole behavior. Although the many results that have been obtained give rise to a hope of constructing a realistic model of the S-matrix starting from the aforementioned function, no one has yet succeeded [14]. A more promising approach is the topological expansion procedure in which the first term of the expansion apparently shares with the potential-scattering functions the property of having only poles in the complex $l$-plane, along with several other physically important properties of Veneziano’s function.

References

[1] H. Epstein, CPT invariance of the S-matrix in a theory of local observables, J. Math. Phys., 8 (1967), 750-767.

[2] R. F. Streater and A. S. Wightman, PCT, spin and statistics, and all that, Benjamin, 1964.
[3] H. Epstein, in Axiomatic field theory, M. Chretien and S. Deser (eds.), Gordon & Breach, 1966.
[4] A. Martin and F. Cheung, Analyticity property and bounds of the scattering amplitudes, Gordon & Breach, 1970.
[5] A. Martin, Scattering theory: Unitarity, analyticity and crossing, Lecture notes in phys. 3, Springer, 1969.
[6] G. F. Chew, S-matrix theory of strong interaction physics, Benjamin, 1962.
[7] G. F. Chew, The analytic S-matrix: A basis for nuclear democracy, Benjamin, 1966.
[8] R. J. Eden, P. V. Landshoff, D. I. Olive, and J. C. Polkinghorne, The analytic S-matrix, Cambridge Univ. Press, 1966.
[9] D. Iagolnitzer (ed.), Proc. Les Houches, Lecture notes in phys. 126, Springer, 1980.
[10] D. Iagolnitzer, The S-Matrix, North-Holland, 1978.
[11] R. Balian and D. Iagolnitzer (eds.), Structural analysis of collision amplitudes, North-Holland, 1976.
[12] S. Frautschi, Regge poles and S-matrix theory, Benjamin, 1963.
[13] P. D. B. Collins, An introduction to Regge theory and high energy physics, Cambridge Univ. Press, 1977.
[14] M. Jacob, Dual theory, North-Holland, 1974.

387 (XX.34)
Solitons

A. General Remarks

Solitons are nonlinear waves that preserve their shape under interaction. Mathematically, the theory of solitons continues to develop as a theory of completely integrable mechanical systems. Typical examples are the Korteweg-de Vries equation (→ Section B), the Toda lattice (→ 287 Nonlinear Lattice Dynamics), and the Sine-Gordon equation

$$u_{tt} - u_{xx} = \sin u,$$

studied classically in connection with transformations of surfaces of constant negative curvature.

B. The KdV Equation

In the late 19th century J. Boussinesq and then D. Korteweg and G. de Vries obtained equations describing water waves having traveling-wave solutions. The equation

$$u_t - 6uu_x + u_{xxx} = 0, \qquad u = u(x, t), \tag{1}$$

derived by de Vries is called the KdV equation for short. Putting $u(x, t) = s(x - ct - \delta)$, we find that $s$ is an †elliptic function, and we obtain

$$s(x) = -\frac{c}{2}\,\mathrm{sech}^2\!\left(\frac{\sqrt{c}}{2}\,x\right)$$

as its degenerate form. This solution is called a solitary wave.

Around 1965, M. Kruskal and N. Zabusky solved the KdV equation numerically, taking several separated solitary waves as the initial data. They found that the waves interact in a complicated way but that eventually the initial solitary waves reappear. Noting the particle-like character of the waves, they called each of these waves a soliton. Subsequently, the KdV equation was found to have an infinite number of constants of motion.

C. Gardner, J. Greene, Kruskal, and R. Miura associated the 1-dimensional †Schrödinger operator $-d^2/dx^2 + u(x, t)$ to each solution $u(x, t)$ of the KdV equation and showed that its †eigenvalues are preserved in time. Moreover, they applied inverse scattering theory (→ Section D) and obtained explicit formulas for the solutions.

C. Lax Representation

Let

$$L = -D^2 + u(x), \qquad D = \partial/\partial x.$$

For

$$M = -4D^3 + 6uD + 3u_x,$$

the commutator $[M, L] = ML - LM$ is the operator of multiplication by the function $6uu_x - u_{xxx}$. So $u = u(x, t)$ is a solution of the KdV equation if and only if

$$L_t = [M, L]. \tag{2}$$

Equation (2) is also the condition that all $L = L(t)$ are †unitarily equivalent to each other and the †spectrum of $L$ is preserved through time.

Most equations having soliton solutions can be represented in the form of (2) for a suitable pair of $L$ and $M$. This representation is called the Lax representation. One sometimes says that an isospectral deformation of $L$ is given by (2).

On the other hand, isomonodromic deformations (→ 253 Linear Ordinary Differential Equations (Global Theory)) have been studied extensively by M. Sato and his co-workers, and relations to soliton theory have been

discovered (Sato, T. Miwa, and M. Jimbo, Publ. The coefficient a(<) can be continued analyti-
Res. Inst. Math. Sci., 14 (1978), 15 (1979)). cally to the upper half-plane, where it has only
In the present case, the requirement that the a finite number of zeros, all of which are sim-
commutator [M, L] be a multiplication by a ple and lie on the imaginary axis. Denote
function determines an essentially unique them by iqj (j = 1, . , n). The +point spectrum
(2n + l)th-order ordinary differential operator of the operator L consists of the numbers
A4 = A,, the differential operator part of the - $, and the associated eigenfunctions are
+fractional power L”+l/*. [A,, L] is a poly- f*(~, iv,), which are real-valued. Put
nomial in u and its derivatives, denoted by
cj=(S.f+(X,i~J)2dX)-',
K,[u]. The equation u,=K,[u] is called the
nth KdV equation. The transformation taking and call t(<)=a(<)-’ and r(t)=b(<)/a(<) the
the initial data u(x) to the solution u,(x, t) of transmission coefficient and the reflection
the nth KdV equation is denoted by r,(t). coefficient, respectively. The triplet r(r), vi, c,
Then T,(t) T,(s) = T,(s) T,(t), i.e., the flows de- (.j= 1, , n) is called the scattering data. It is
fined by these higher-order KdV equations connected with the kernel K = K + by the
commute. This property and the existence of Gel’fand-Levitan-Marchenko equation
infinite number of invariant integrals are con- m
sequences of the complete integrability of the K(x,t)+F(x+t)+ F(x+t+s)K(x,s)d,s
higher-order KdV equations considered as s0
infinite-dimensional Hamiltonian systems. (t>O),
The KdV equations can be studied group-
theoretically as Hamiltonian systems on a F(.x)=n^’ PA r(<)e2”“dx+2f ciem2qJ”.
certain coadjoint orbit in the ‘dual space of s -w j=1

the iLie algebra of a certain class of +pseudo-


The potential is given by
differential operators (M. Adler, Inventiones
Mnth., 50 (1979); also - B. Kostant, Advances U(X)’ -(2K/dx)(x,O).
in Math., 34 (i 979) for the analogous facts for
When the reflection coefficient r(g) vanishes
the Toda lattice).
identically, the potential is called a reflection-
The ordinary differential equation K,[u] = 0
less potential. The kernel K then becomes a
is called a stationary KdV equation. By the
tdegenerate kernel and the potential is ex-
commutativity of the flows r,(t), each KdV
pressed by
flow leaves invariant the space of solutions of
K,[u] =O. The flows restricted to this space
form a completely integrable Hamiltonian u(x)= -2$logu(z),
system with finitely many degrees of free-
dom (S. P. Novikov, Functionul And. Appl., 8 where D(x) is the determinant of the n x n
(1974)). matrix whose j, k entry is S,, + cjexp { - (qj +
K,[u] =0 is also the condition that there ‘?!Jx}l(Vj + V!J
exist an ordinary differential operator M The authors of [ 1) showed that if u(x, t) is a
which commutes with L. J. Burchnal and T. solution of the KdV equation and if u(x, t)-0
Chaundy (Proc. London Math. Sot., (2) 21 (x-r km), then the time development of the
(1922)) studied this problem and showed that scattering data of the potential u(.u, t) is as
such L and M are connected by the relation follows: n and vi do not depend on t, and
M2 = P(L), where P is a certain polynomial of
cj(t) = cjes+, r(<, t) = r(<)e8ir'r,
degree 2nt 1 and that the potential u is ex-
pressed by the +theta function associated with The solution associated with the reflectionless
the thyperelliptic curve w2 = P(z). potentials are obtained by replacing c,, by c)(t)
in the formula for D(x). These are soliton
solutions of the KdV equation.
R. Hirota developed a method of treating
D. Inverse Scattering Method
functions like D(x) directly for most of the
equations in the soliton theory (Lecture notes
Let u(x) be a potential such that u(x)*0 as
in math. 515, Springer, 1976).
x--t Fm. The equation &f=[i2f’(Im<>0) has
A certain geometric method that enables
solutions ,f+(x, 0 that can be represented as
one to obtain solutions of the Sine-Gordon
f&q<)=cA”“(l +J;K+(X,t)P!‘dr). equation from a known solution has been
studied in the transformation theory of sur-
Putting i = 4 + iv and noting that f+(x, 0 and
faces of constant negative curvature (G. Dar-
.f+(x, -t) are independent solutions of Lf=
boux, Leqons sur la thiorie gh&de des sur-
[*,L one can express ,f- as
,&aces, Chelsea, 1972, vol. 3, ch. 12), and its
$f_-(x, \xi) = a(\xi) f_+(x, -\xi) + b(\xi) f_+(x, \xi).$
generalizations are called Bäcklund transformations. Soliton solutions of the KdV equation can also be constructed by this method. Relations to the inverse scattering method, differential systems, and transformation groups have also been studied (Lecture notes in math. 515, Springer, 1976).

E. Periodic Problem

Let the potential u(x) be of period 1, and consider †Hill's equation (→ 268 Mathieu Functions E) $Lf = \lambda f$. The real ($\lambda$-) axis is divided into intervals of unstable solutions (u-intervals) alternating with intervals of stable solutions. One of the u-intervals is of the form $(-\infty, \lambda_0]$ and the others are finite, possibly degenerating to points. The special potentials that have only a finite number of nondegenerate u-intervals are the periodic analog of the reflectionless potentials, and are called the finite-gap (or -band) potentials.

Consider the eigenvalue problem $Lf = \lambda f$, $f(\tau) = f(\tau + 1) = 0$ for a fixed real parameter $\tau$. Exactly one of its eigenvalues belongs to each finite u-interval. These eigenvalues move with $\tau$, but those in degenerate intervals cannot move.

Let u be a finite-gap potential, $I_j = [\lambda_{2j-1}, \lambda_{2j}]$ ($j = 1, \dots, g$) be the nondegenerate (finite) u-intervals, and $\mu_j(\tau)$ be the associated eigenvalues in $I_j$. The potential is recovered by the formula

$u(x) = \sum_{j=0}^{2g} \lambda_j - 2 \sum_{j=1}^{g} \mu_j(x).$

Put $P(\lambda) = \prod_{j=0}^{2g} (\lambda - \lambda_j)$, realize the †Riemann surface S of the hyperelliptic curve $w^2 = P(z)$ as a two-sheeted cover of the †Riemann sphere, and consider the $\mu_j(\tau)$ as points on S. Let $\omega_j$ ($j = 1, \dots, g$) be a basis for the space of the †differentials of the first kind on S. Fix a point $p_0$ in S and put

$w_j(\tau) = \sum_{k=1}^{g} \int_{p_0}^{\mu_k(\tau)} \omega_j \quad (j = 1, \dots, g)$

(→ 3 Abelian Varieties L). Then the locus $(w_1(\tau), \dots, w_g(\tau))$ ($-\infty < \tau < \infty$) on the †Jacobian variety turns out to be a straight line, the direction $u_1$ being determined by the †periods of certain †differentials of the second kind on S. Employing the solution of †Jacobi's inverse problem, we can write the potential in terms of the †Riemann theta function as

$u(x) = -2 \frac{d^2}{dx^2} \log \theta(x u_1 + c) + C,$ (3)

where c is a certain constant vector and C is a constant.

Suppose now that u(x, t) is a solution of the KdV equation which is a finite-gap potential for each t. Then g and the $\lambda_j$ are preserved in time. The determination of the direction $u_2$ on the Jacobian variety is similar to that of $u_1$, and the solutions of the KdV equation are obtained by replacing the vector c by $t u_2 + c'$ in (3). The case g = 1 is the elliptic traveling wave solution (→ Section B). Most of these results have been extended to the general periodic problem (H. P. McKean and E. Trubowitz, Comm. Pure Appl. Math., 29 (1976)).

F. Two-Dimensional KdV Equation

Let S be a compact †Riemann surface of †genus g, and let $p_\infty$ be a fixed point on S. Put $F(\kappa) = a_n \kappa^n + \dots + a_0$ and $G(\kappa) = b_m \kappa^m + \dots + b_0$. A function $\psi(x, y, t, p)$ of $p \in S$ and of x, y, t is uniquely determined by the following conditions: (a) it is meromorphic on $S - \{p_\infty\}$, and its pole †divisor is a general divisor of degree g and does not depend on x, y, t; (b) for a †local parameter z at $p_\infty$ ($z(p_\infty) = 0$) and $\kappa = z^{-1}$, $\psi_0 = \psi \exp(-\kappa x - F(\kappa) y - G(\kappa) t)$ is holomorphic near $p_\infty$, and $\psi_0(p_\infty) = 1$.

Moreover, there is a differential operator $L = a_n D^n + a_{n-1} D^{n-1} + \sum_{j=0}^{n-2} u_j(x, y, t) D^j$ such that $\psi_y = L\psi$. Expanding $\psi_0$ at $p_\infty$ as $1 + \sum_{j=1}^{\infty} \xi_j(x, y, t) z^j$, one can express the coefficients $u_j$ by $\xi_1, \xi_2, \dots$. Analogously, there is an $M = b_m D^m + b_{m-1} D^{m-1} + \sum_{j=0}^{m-2} v_j D^j$ such that $\psi_t = M\psi$. The operators L and M satisfy the relation

$L_t - M_y = [L, M],$ (4)

which is a generalization of the Lax representation. The coefficients of L and M satisfy a certain system of nonlinear differential equations (V. E. Zakharov and A. B. Shabat, Functional Anal. Appl., 8 (1974); I. M. Krichever, Functional Anal. Appl., 11 (1977)).

Example. Let $F(\kappa) = \kappa^2$ and $G(\kappa) = \kappa^3 + c\kappa$. Then one finds

$L = D^2 + u, \quad M = D^3 + (3u/2 + c) D + v,$

where $u = -2\xi_1'$ and $v = 3\xi_1 \xi_1' - 3\xi_1'' - 3\xi_2'$. Eliminating v from (4), one has

$3u_{yy} + (-4u_t + 4c u_x + u_{xxx} + 6u u_x)_x = 0,$

the so-called two-dimensional KdV equation (Kadomtsev-Petviashvili equation). If u(x, y, t) does not depend on y, the equation reduces to the KdV equation, and if u does not depend on t, to the Boussinesq equation.

The condition for reduction to these special cases can be described in terms of the meromorphic functions admitted by S. Suppose that there is a meromorphic function $E_F(p)$ holomorphic for $p \neq p_\infty$ and of †principal part $F(\kappa)$ at $p = p_\infty$. Then $\psi$ is written as $\varphi(x, t, p) \exp\{E_F(p) y\}$, and the coeffi-
cients of L and M do not depend on y. $\varphi$ satisfies $L\varphi = E_F \varphi$, and (4) reduces to the Lax representation.

If such an $E_F$ exists, S is hyperelliptic, and $p_\infty$ is one of its †branch points over the Riemann sphere. Thus the result of the previous section is recovered.

G. Solvable Models in Field Theory

The Sine-Gordon equation has been studied extensively as a solvable model in †field theory. It is a special case of a field in two space-time dimensions with values in a †symmetric space; this can also be treated by a variant of the inverse scattering method (V. E. Zakharov and A. V. Mikhailov, Sov. Phys. JETP, 47 (1978)). Much work has been done on the semiclassical †quantization of equations encountered in soliton theory. Recently, a method of exact quantization (called quantum inverse scattering) was developed (see, for example, L. A. Takhtadzhyan and L. D. Faddeev, Russian Math. Surveys, 34 (5) (1979)).

References

[1] C. S. Gardner, J. M. Greene, M. D. Kruskal, and R. M. Miura, Method for solving the Korteweg-de Vries equation, Phys. Rev. Lett., 19 (1967), 1095-1097.
[2] P. D. Lax, Integrals of nonlinear equations of evolution and solitary waves, Comm. Pure Appl. Math., 21 (1968), 467-490.
[3] B. A. Dubrovin, V. B. Matveev, and S. P. Novikov, Nonlinear equations of Korteweg-de Vries type, finite-band operators and Abelian varieties, Russian Math. Surveys, 31 (1) (1976), 59-146. (Original in Russian, 1976.)
[4] H. P. McKean, Integrable systems and algebraic curves, Lecture notes in math. 755, Springer, 1979, 83-200.
[5] Yu. I. Manin, Algebraic aspects of nonlinear differential equations, J. Sov. Math., 11 (1) (1979), 1-122. (Original in Russian, 1978.)

388 (XIII.32)
Special Functional Equations

A. General Remarks

The term special functional equations usually means functional equations that do not involve limit operations. Such functional equations appear in various fields, but there is no systematic method for solving them. Usually they are solved by reduction to functional equations of some standard type. In this article functions are all real-valued functions of real variables unless otherwise specified.

B. Additive Functional Equations and Related Equations

Suppose that we are given an equation

$f(x + y) = f(x) + f(y).$ (1)

Clearly, f(x) = cx (c a constant) is a solution. If f(x) is continuous, (1) has no other solution (Cauchy). The same conclusion holds under any one of the following weaker conditions: (i) f(x) is continuous at a point; (ii) f(x) is bounded in a neighborhood of a point; (iii) f(x) is †measurable in a neighborhood of a point. However, it was shown by G. Hamel and H. Lebesgue by means of †transfinite induction that equation (1) has infinitely many nonmeasurable solutions. On the other hand, it was proved by A. Ostrowski [2] that if a solution f(x) of equation (1) does not take any value between two distinct numbers for x on a set of positive measure, then f(x) is continuous. This result can be extended to the case where x is a point $(x_1, \dots, x_n)$ of an n-dimensional Euclidean space. In this case, any continuous solution is of the form

$f(x) = \sum_{j=1}^{n} c_j x_j$ ($c_j$ constant).

Next, we consider the equation

$f\left(\frac{x + y}{2}\right) = \frac{f(x) + f(y)}{2}.$ (2)

Any solution of equation (1) satisfies this equation. When x is a point $(x_1, \dots, x_n)$ of an n-dimensional Euclidean space, any continuous solution of (2) is of the form $f(x) = \sum_{j=1}^{n} c_j x_j + c_0$ ($c_j$ constant). If a solution f(x) of (2) defined on a †convex set K does not take any value between two distinct numbers for x on a set of positive measure, then f(x) is continuous (M. Hukuhara).

Consider the equation

$g(x + y) = g(x) g(y).$ (3)

If a solution g(x) vanishes at some point $\xi$, then $g(x) \equiv 0$. Excluding this trivial case, we assume that g(x) never vanishes. Then, putting y = x, we see that g(x) > 0 for all x. The substitution $f(x) = \log g(x)$ then reduces equation (3) to equation (1). Thus we see that any continuous solution of (3) is of the form $g(x) = \exp(cx)$.

Next, we consider the equation

$g(uv) = g(u) g(v).$ (4)
If a solution g(u) vanishes at some $\xi \neq 0$, then $g(u) \equiv 0$. Excluding this case, we assume that $g(u) \neq 0$ for $u \neq 0$. For u, v > 0, by the substitution $x = \log u$, $y = \log v$, we have an equation of the form (3). On the other hand, putting $v = -1$, we have $g(-u) = g(-1) g(u)$. Since $g^2(-1) = g(1) = 1$, we see that any continuous solution of (4) is of the form $|u|^c$ or $(\operatorname{sgn} u)|u|^c$ according as $g(-1) = 1$ or $-1$.

C. The General Addition Theorem and Related Functional Equations

The general addition theorem is: If the equation

$f(x + y) = F(f(x), f(y))$ (5)

has a continuous nonconstant solution f(x) on $-\infty < x < +\infty$, then f(x) is strictly monotone, and F(u, v) is strictly monotone increasing and continuous with respect to u and v for $\alpha < u, v < \beta$ and satisfies $\alpha < F(u, v) < \beta$ for $\alpha < u, v < \beta$. There is also a constant c satisfying F(c, c) = c, and the identity F(F(u, v), w) = F(u, F(v, w)) holds for any u, v, w in the interval $(\alpha, \beta)$. Conversely, if F(u, v) is such a function, then (5) has a continuous nonconstant solution on $-\infty < x < +\infty$. Let f(x) be such a solution. Then any other continuous solution is given by f(cx).

When F(u, v) is continuously differentiable, a continuous solution f(x) of (5) can be obtained as a solution of the differential equation

$f'(x) = F_v(f(x), a)\, c, \quad c = f'(0),$

satisfying the initial condition f(0) = a.

Consider the equation

$F(f(x + y), f(x), f(y)) = 0.$

Suppose that F(u, v, w) is a polynomial in u, v, and w. If this equation has a †meromorphic solution f(x), then f(x) must be a rational function, a rational function of exp cx, or an elliptic function [1, p. 64].

Next, consider the equation

$f(x + y) + f(x - y) = 2(f(x) + f(y)).$ (6)

Any solution continuous on $-\infty < x < \infty$ is of the form $f(x) = cx^2$. When x is a point $(x_1, \dots, x_n)$ of an n-dimensional Euclidean space, any continuous solution of (6) is given by a quadratic form $f(x) = \sum_{i,j=1}^{n} c_{ij} x_i x_j$.

Consider the equation

$f(x + y) + f(x - y) = 2 f(x) f(y).$ (7)

Any solution continuous on $-\infty < x < \infty$ is of the form $f(x) = \cosh cx = (e^{cx} + e^{-cx})/2$ or $f(x) = \cos cx$. If f(x) is allowed to take complex values, then any continuous solution can be written in the form $f(x) = (e^{bx} + e^{-bx})/2$ in terms of a complex number b. A special case is cos x, since b may take purely imaginary values.

D. Schröder's Functional Equation

Schröder's functional equation is

$f(\theta(x)) = c f(x),$ (8)

where $\theta(x)$ is a given function and c is a constant. A general solution of (8) can be written as $f(x) = f_1(x) \varphi(x)$, where $f_1(x) \neq 0$ is a particular solution of (8) and $\varphi(x)$ is a general solution of the equation $\varphi(\theta(x)) = \varphi(x)$. Suppose that there is a point a such that $\theta(a) = a$, and $\theta(x)$ and f(x) are both differentiable in a neighborhood of x = a. Then we have $f'(a) = 0$ or $\theta'(a) = c$. Consider the case where $\theta'(a) = c$, and suppose that $\theta(x)$ is twice differentiable at x = a. When $|c| = |\theta'(a)| < 1$, define $\theta_n(x)$ (n = 0, 1, 2, ...) by $\theta_0(x) = x$ and $\theta_n(x) = \theta(\theta_{n-1}(x))$. Then the sequence $\{(\theta_n(x) - a) c^{-n}\}$ (n = 0, 1, 2, ...) converges uniformly in a neighborhood of x = a, and its limit function f(x) is a solution of (8). When $|c| = |\theta'(a)| > 1$, put $\theta(x) = u$. Then we have the equation $f(\theta^{-1}(u)) = c^{-1} f(u)$, and the problem reduces to the previous case.

The results obtained for equation (8) can be extended to the following system of equations:

$f_j(\theta_1(x), \theta_2(x), \dots, \theta_n(x)) = \lambda_j f_j(x) + \delta_j f_{j-1}(x) + g_j(x), \quad j = 1, 2, \dots, n,$ (9)

where the $\theta_j(x)$ are given functions holomorphic in a neighborhood of x = 0, $\theta_j(0) = 0$, the coefficients of f in the right-hand side of (9) are numbers such that the matrix $A = (a_{jk})$ ($a_{jk} = 0$ except for $a_{jj} = \lambda_j$, $a_{j,j-1} = \delta_j$) is the †Jordan canonical form of the matrix formed by $\{(\partial\theta_j/\partial x_k)_{x=0}\}$, and the $g_j(x)$ are polynomials consisting of terms of the form constant $\times\, x_1^{m_1} x_2^{m_2} \cdots x_n^{m_n}$ with exponents $m_1, m_2, \dots, m_n$ for which $\lambda_j = \lambda_1^{m_1} \cdots \lambda_n^{m_n}$. If $0 < |\lambda_j| < 1$ (j = 1, 2, ..., n), then we can always choose the coefficients of the polynomials $g_j(x)$ so that equation (9) has a solution $\{f_j(x)\}$ holomorphic in a neighborhood of x = 0. The same conclusion can be obtained for $|\lambda_j| > 1$ (j = 1, ..., n).

Consider Abel's functional equation

$f(\theta(x)) = f(x) + 1.$ (10)

If we put $\exp f(x) = \varphi(x)$, then we have Schröder's functional equation

$\varphi(\theta(x)) = e\,\varphi(x).$

Consider the equation

$f(x + 1) = A(x) f(x).$ (11)

If we put $\varphi(x) = \log f(x)$, then we have a †linear difference equation of the form

$\varphi(x + 1) - \varphi(x) = \log A(x).$
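The limit construction for Schröder's equation (8) in the case $|c| = |\theta'(a)| < 1$ can be tried numerically. The sketch below is illustrative only: the function name, the choice $\theta(x) = x/(2 + x)$ (fixed point a = 0, $\theta'(0) = 1/2$), and the iteration count are assumptions made for the example. For this particular $\theta$ one checks by hand that $f(x) = x/(1 + x)$ satisfies $f(\theta(x)) = f(x)/2$, so the iterated quotient can be compared with a closed form.

```python
def schroeder_limit(theta, a, c, x, n=40):
    """Approximate f(x) = lim (theta_n(x) - a) / c**n, where theta_0(x) = x
    and theta_n = theta(theta_{n-1}); assumes theta(a) = a, |c| = |theta'(a)| < 1."""
    y = x
    for _ in range(n):
        y = theta(y)
    return (y - a) / c**n

# Illustrative choice (an assumption, not from the text).
theta = lambda x: x / (2.0 + x)
a, c = 0.0, 0.5

x0 = 0.3
f_x0 = schroeder_limit(theta, a, c, x0)
f_theta_x0 = schroeder_limit(theta, a, c, theta(x0))
```

Here `f_x0` should agree with `x0 / (1 + x0)` up to an error of order $2^{-n}$, and `f_theta_x0` with `c * f_x0`, which is equation (8) evaluated at one point.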
References

[1] E. C. Picard, Leçons sur quelques équations fonctionnelles, Gauthier-Villars, 1928.
[2] A. Ostrowski, Mathematische Miszellen XIV, Über die Funktionalgleichung der Exponentialfunktion und verwandte Funktionalgleichungen, Jber. Deutsch. Math. Verein., 38 (1929), 54-62.
[3] F. Hausdorff, Mengenlehre, de Gruyter, second edition, 1927, 175; English translation, Set theory, Chelsea, 1962.
[4] J. Aczél, Lectures on functional equations and their applications, Academic Press, 1966. (Original in German, 1961.)

389 (XIV.1)
Special Functions

A. Special Functions

The term special functions usually refers to the classes of functions listed in (1)-(4) (other terms, such as higher transcendental functions, are sometimes used). (1) The †gamma function and related functions (→ 174 Gamma Function); (2) †Fresnel's integral, the †error function, the †logarithmic integral, and other functions that can be expressed as indefinite integrals of elementary functions (→ 167 Functions of Confluent Type D); (3) †elliptic functions (→ 134 Elliptic Functions); (4) solutions of †linear ordinary differential equations of the second order derived by the method of separation of variables in certain partial differential equations, e.g., †Laplace's equation, in various †curvilinear coordinates. Recently, new types of special functions, such as †Painlevé's, have been introduced as the solutions of special differential equations.

In this article we discuss class (4); for the other classes, see the articles quoted. Class (4) is further divided into the following three types, according to the character of the †singular points of the differential equations of which the functions are solutions. Equations with a smaller number of singular points than those indicated in (1)-(3) below can be integrated in terms of elementary functions.

(1) Special functions of hypergeometric type are solutions of differential equations with three †regular singular points on the Riemann sphere. Examples are the †hypergeometric function and the †Legendre function. Any function of this type reduces to a hypergeometric function through a simple transformation (→ 206 Hypergeometric Functions; 393 Spherical Functions).

(2) Special functions of confluent type are solutions of differential equations that are derived from †hypergeometric differential equations by the confluence of two regular singular points, that is, by making one of the regular singular points tend to the other one so that the resulting singularity is an †irregular singular point of class 1 (→ 167 Functions of Confluent Type). Any function of this type can be expressed by means of †Whittaker functions, of which many important special functions, such as †Bessel functions, are special cases (→ 39 Bessel Functions). Also, one can reduce to this type the †parabolic cylindrical functions, that is, the solutions of differential equations with only one singular point which is at infinity and is irregular of class 2.

(3) Special functions of ellipsoidal type are solutions of differential equations with four or five regular singular points, some of which may be confluent to become irregular singular points. Examples are †Lamé functions, †Mathieu functions, and †spheroidal wave functions (→ 133 Ellipsoidal Harmonics; 268 Mathieu Functions). In contrast to types (1) and (2), functions of type (3) are difficult to characterize by means of †difference-differential equations and have not been fully explored. Sometimes the term special function in the strict sense is not applied to them. To specify the special functions of types (1) and (2) the term classical special functions has been proposed.

B. Unified Theories of Special Functions

Though many special functions were introduced separately to solve practical problems, several unified theories have been proposed. The classification in Section A based on differential equations may be regarded also as a kind of unified theory. Other examples are:

(1) Expression by †Barnes's extended hypergeometric function or its extension to the case of several variables by means of a definite integral of the form

$\int (\zeta - a_1)^{b_1} (\zeta - a_2)^{b_2} \cdots (\zeta - a_m)^{b_m} (\zeta - z)^{c}\, d\zeta$

(→ 206 Hypergeometric Functions).

(2) A unified theory [14] that includes the gamma function and is based upon Truesdell's difference-differential equation

$\partial F(z, \alpha)/\partial z = F(z, \alpha + 1).$

(3) Unification from the standpoint of expansions in terms of †zonal spherical functions of a differential operator (the Laplacian) invariant under a transitive group of motions on a †symmetric Riemannian manifold (→ 437
Unitary Representations). With this approach a great variety of formulas can be derived in a unified way [3, 4].

References

[1] J. Dieudonné, Special functions and linear representations of Lie groups, Expository lectures from the CBMS regional conference held at East Carolina Univ., March 1979, Amer. Math. Soc., 1980.
[2] B. C. Carlson, Special functions of applied mathematics, Academic Press, 1977.
[3] J. D. Talman, Special functions: A group-theoretic approach, Benjamin, 1968.
[4] N. Ya. Vilenkin, Special functions and the theory of group representations, Amer. Math. Soc. Transl. of Math. Monographs 22, 1968. (Original in Russian, 1965.)
[5] W. Miller, Jr., Lie theory and special functions, Academic Press, 1968.
[6] Y. L. Luke, The special functions and their approximations I, II, Academic Press, 1969.
[7] S. Moriguti, K. Udagawa, and S. Hitotumatu, Mathematical formulas, Special functions (in Japanese), Iwanami, 1960.
[8] E. T. Whittaker and G. N. Watson, A course of modern analysis, Cambridge Univ. Press, 1902; fourth edition, 1958.
[9] E. Jahnke and F. Emde, Funktionentafeln mit Formeln und Kurven, Teubner, 1933; bilingual edition, Tables of functions with formulas and curves, Dover, 1945.
[10] W. Magnus and F. Oberhettinger, Formulas and theorems for the special functions of mathematical physics, Springer, third enlarged edition, 1966.
[11] R. Courant and D. Hilbert, Methods of mathematical physics I, II, Interscience, 1953, 1962.
[12] A. Erdélyi (ed.), Higher transcendental functions I-III, McGraw-Hill, 1953-1955.
[13] J. Meixner, Spezielle Funktionen der mathematischen Physik, Handbuch der Physik II, Springer, second edition, 1933.
[14] C. A. Truesdell, An essay toward a unified theory of special functions based upon the functional equation $\partial F(z, \alpha)/\partial z = F(z, \alpha + 1)$, Ann. Math. Studies, Princeton Univ. Press, 1948.
[15] I. N. Sneddon, Special functions of mathematical physics and chemistry, Oliver & Boyd, 1956.
[16] F. W. Schäfke, Einführung in die Theorie der speziellen Funktionen der mathematischen Physik, Springer, 1963.
[17] G. Szegő, Orthogonal polynomials, Amer. Math. Soc. Colloq. Publ., revised edition, 1959.
[18] E. D. Rainville, Special functions, Macmillan, 1960; Chelsea, second edition, 1971.

390 (XII.12)
Spectral Analysis of Operators

A. General Remarks

Throughout this article, X stands for a †complex linear space and A for a †linear operator in X. Except when X is finite-dimensional, A need not be defined over all X. A linear operator A in X is by definition a linear mapping whose †domain D(A) and †range R(A) are linear subspaces of X. A complex number $\lambda$ is said to be an eigenvalue (proper value or characteristic value) of A if there exists an $x \in D(A)$ such that $Ax = \lambda x$, $x \neq 0$. Any such x is called an eigenvector (eigenelement, proper vector, characteristic vector) associated with $\lambda$. When X is a †function space, the word eigenfunction is also used. For an eigenvalue $\lambda$ of A, the subspace $M(\lambda)$ of X given by

$M(\lambda) = M(\lambda; A) = \{x \mid Ax = \lambda x\},$

i.e., the subspace consisting of 0 and all eigenvectors associated with $\lambda$, is called the eigenspace associated with $\lambda$, and the number $m(\lambda) = \dim M(\lambda)$ is called the geometric multiplicity of $\lambda$. The eigenvalue $\lambda$ is said to be (geometrically) simple or degenerate according as $m(\lambda) = 1$ or $m(\lambda) \geq 2$. The problem of seeking eigenvalues and eigenvectors is referred to as the eigenvalue problem.

When X is a †topological linear space, the notion of eigenvalues leads to a more general object called the spectrum of A. Let $\lambda$ be a complex number and put $A_\lambda = \lambda I - A$, where I is the †identity operator in X. Furthermore, put $R_\lambda = (A_\lambda)^{-1} = (\lambda I - A)^{-1}$, if the inverse exists. Then the resolvent set $\rho(A)$ of A is defined to be the set of all $\lambda$ such that $R_\lambda$ exists, has domain †dense in X, and is continuous. The spectrum $\sigma(A)$ of A is, by definition, the complement of $\rho(A)$ in the complex plane, and it is divided into three mutually disjoint sets: the point spectrum $\sigma_P(A)$, the continuous spectrum $\sigma_C(A)$, and the residual spectrum $\sigma_R(A)$. These are defined as follows: $\sigma_P(A) = \{\lambda \mid R_\lambda$ does not exist$\} = \{\lambda \mid \lambda$ is an eigenvalue of A$\}$; $\sigma_C(A) = \{\lambda \mid R_\lambda$ exists and has domain dense in X, but is not continuous$\}$; $\sigma_R(A) = \{\lambda \mid R_\lambda$ exists, but its domain is not dense in X$\}$.

Let X be a †Banach space and B(X) the set of all †bounded linear operators with domain X. If A is a †closed operator in X, then $\lambda \in \rho(A)$ if and only if $R_\lambda \in B(X)$. Moreover, $\sigma(A)$ is a closed set. In particular, if $A \in B(X)$, then $\sigma(A)$ is a nonempty compact set. In this case the spectral radius $r_\sigma(A)$ is defined as $r_\sigma(A) = \sup_{\lambda \in \sigma(A)} |\lambda|$. Then $r_\sigma(A) \leq \|A^n\|^{1/n}$, n = 1, 2, ..., and $\|A^n\|^{1/n} \to r_\sigma(A)$, $n \to \infty$.
In many problems of analysis crucial roles have been played by methods involving the spectrum and other related concepts. This branch of analysis is called spectral analysis. For an infinite-dimensional X the theory is well developed when X is a †Hilbert space and A is †self-adjoint or †normal.

B. Eigenvalue Problems for Matrices

Throughout this section let X be an N-dimensional complex linear space ($N < \infty$) and A a linear operator in X. (We assume that A is defined over all X.) With respect to a fixed basis $(\psi_1, \dots, \psi_N)$ of X, the operator A is represented by an N x N matrix, also denoted by A. Then the eigenvalues of A coincide with the roots of the †characteristic equation $\det(\lambda I - A) = 0$. There are no points of the spectrum other than eigenvalues, that is, $\sigma(A) = \sigma_P(A)$. Let $\lambda \in \sigma(A)$. The multiplicity $\tilde{m}(\lambda)$ of $\lambda$ as a root of the characteristic equation is called the (algebraic) multiplicity of the eigenvalue $\lambda$. The sum of $\tilde{m}(\lambda)$ over all the eigenvalues of A is equal to N. The eigenvalue $\lambda$ is said to be (algebraically) simple or degenerate according as $\tilde{m}(\lambda) = 1$ or $\tilde{m}(\lambda) \geq 2$. Let $\nu = 1, 2, \dots$, and $N_\nu(\lambda) = \{x \mid (\lambda I - A)^\nu x = 0\}$. Then $\{N_\nu(\lambda)\}$ forms a nondecreasing chain of subspaces $M(\lambda) = N_1(\lambda) \subset N_2(\lambda) \subset \cdots$, which ceases to increase after a finite number of steps. When $\nu \geq \tilde{m}(\lambda)$, the space $N_\nu(\lambda)$ is equal to a fixed subspace $\tilde{M}(\lambda)$, sometimes called the root subspace (or generalized eigenspace or principal subspace) of A associated with $\lambda$. A vector in the root subspace is called a root vector (or a generalized eigenvector). Then $\dim \tilde{M}(\lambda) = \tilde{m}(\lambda)$ and hence $m(\lambda) \leq \tilde{m}(\lambda)$. When A is a †normal matrix, $M(\lambda) = \tilde{M}(\lambda)$ and $m(\lambda) = \tilde{m}(\lambda)$.

If two matrices A and B are †similar, i.e., if there exists an invertible matrix P such that $B = P^{-1} A P$, then A and B have the same eigenvalues with the same algebraic (and geometric) multiplicities. The same conclusion holds for A and A', where A' is the †transpose of A. For the adjoint matrix $A^*$ we have $\sigma(A^*) = \overline{\sigma(A)} \equiv \{\bar{\lambda} \mid \lambda \in \sigma(A)\}$. For an arbitrary polynomial f the relation $\sigma(f(A)) = f(\sigma(A)) \equiv \{f(\lambda) \mid \lambda \in \sigma(A)\}$ holds (Frobenius's theorem). These relations can be extended to operators in a Banach space. In particular, $\sigma(f(A)) = f(\sigma(A))$ if A is a bounded operator and f is a function holomorphic in a neighborhood of $\sigma(A)$ (for the †spectral mapping theorem → 251 Linear Operators).

In the next four paragraphs, in which the spectral properties of †normal or †Hermitian matrices are discussed, we introduce into X the Euclidean †inner product ( , ), regarding X as a space of N-tuples of scalars. Let A be an N x N normal matrix. Then the eigenspaces associated with different eigenvalues of A are mutually orthogonal. Moreover, the eigenspaces of A as a whole span the entire space X. One can therefore choose a †basis of X formed by a †complete orthonormal set of eigenvectors of A. Specifically, there exists a basis $\{\varphi_j \mid j = 1, \dots, N\}$ of X such that $A\varphi_j = \mu_j \varphi_j$ and $(\varphi_i, \varphi_j) = \delta_{ij}$, where $\delta_{ij}$ is the †Kronecker delta. Moreover, $\mu_1, \dots, \mu_N$ exhaust all the eigenvalues of A. In terms of the basis $\{\varphi_j\}$, an arbitrary $x \in X$ can be expanded as

$x = \sum_{j=1}^{N} (x, \varphi_j) \varphi_j = \sum_{\lambda \in \sigma(A)} P_\lambda x,$ (1)

where $P_\lambda$ is the orthogonal †projection on the eigenspace associated with the eigenvalue $\lambda$.

Of particular importance among normal matrices are Hermitian matrices and †unitary matrices. The eigenvalues of a Hermitian matrix are real, and those of a unitary matrix have the absolute value 1.

Solving the eigenvalue problem of a normal matrix A leads immediately to the diagonalization of A. For instance, let U be the N x N matrix whose jth column is equal to $\varphi_j$. Here the basis $\{\varphi_j\}$ is as before, and each $\varphi_j$ is regarded as a column vector. Then U is unitary, and the transform $U^* A U$ of A by U is the diagonal matrix whose diagonal entries are the $\mu_j$. The problem of transforming a †Hermitian form to its canonical form can also be solved by means of U. In fact, a Hermitian form Q = Q(x) on X is expressed as Q(x) = (Ax, x) with a Hermitian matrix A. For this A, construct U as before. Then by the transformation x = Uy of the coordinates of X, the form Q is converted to its canonical form $Q = \mu_1 |y_1|^2 + \cdots + \mu_N |y_N|^2$. When X is a real linear space and A is a real symmetric matrix, U is an orthogonal matrix. By means of the orthogonal transformation x = Uy, the surface of the second order Q(x) = 1 in $\mathbf{R}^N$ is converted to the form $\mu_1 y_1^2 + \cdots + \mu_N y_N^2 = 1$. The orthogonal transformation x = Uy is called the transformation to principal axes of the surface Q(x) = 1.

When A is not normal, it can be transformed into †Jordan's canonical form by a basis $\varphi_j$ taken from the root subspaces $\tilde{M}(\lambda)$. However, the $\varphi_j$ need not be orthonormal even when A is diagonalizable.

C. Spectral Analysis in Hilbert Spaces

Throughout the rest of this article except for the last section, X is assumed to be a Hilbert space with inner product ( , ). Furthermore, the most complete discussions will be confined to †normal or †self-adjoint operators. A funda-
mental theorem in spectral analysis for such operators is the spectral theorem, which asserts that a representation such as (1) holds in a generalized form. When the operator is †compact, we have only to replace the sum by an infinite sum (→ 68 Compact and Nuclear Operators). In the general case we need a kind of integral. This is discussed in detail in Sections D and E. The general theory of spectral analysis for nonnormal operators, however, is rather involved even in Hilbert spaces, but two important developments can be noted. One is the theory of Volterra operators, and the other is the theory of essentially normal operators. The former is discussed in Section H and the latter and its related results in Sections I and J.

D. Spectral Measure

Let $\mathcal{B}$ be a †completely additive class of subsets of a set $\Omega$, that is, $(\Omega, \mathcal{B})$ is a †measurable space. An operator-valued set function $E = E(\cdot)$ defined on $\mathcal{B}$ is said to be a (self-adjoint) spectral measure if (i) E(M), $M \in \mathcal{B}$, is an †orthogonal projection in X; (ii) $E(\Omega) = I$; and (iii) E is †countably additive, that is,

$E\left(\bigcup_{n=1}^{\infty} M_n\right) = \sum_{n=1}^{\infty} E(M_n)$

(†strong convergence) for a disjoint sequence $\{M_n\}$ of subsets in $\mathcal{B}$. A spectral measure E satisfies $E(M \cap N) = E(M) E(N) = E(N) E(M)$, $M, N \in \mathcal{B}$. Spectral measures which are frequently used in spectral analysis are those defined on the family $\mathcal{B}_r$ ($\mathcal{B}_c$) of all †Borel sets in the field of real (complex) numbers R (C). A spectral measure on $\mathcal{B}_r$ ($\mathcal{B}_c$) is sometimes referred to as a real (complex) spectral measure. For such a spectral measure E the support (or the spectrum) of E, denoted by $\Lambda(E)$, is defined to be the complement of the largest open set G for which E(G) = 0. A complex spectral measure such that $\Lambda(E) \subset \mathbf{R}$ can be identified with a real spectral measure.

Let E be a spectral measure on $\mathcal{B}_r$, and put

$E_\lambda = E((-\infty, \lambda]), \quad -\infty < \lambda < \infty.$ (2)

Then $E_\lambda$ satisfies the relations

$\operatorname{s-lim}_{\lambda \to -\infty} E_\lambda = 0, \quad \operatorname{s-lim}_{\lambda \to +\infty} E_\lambda = I,$ (3)

where s-lim stands for strong convergence. A family $\{E_\lambda\}_{\lambda \in \mathbf{R}}$ of orthogonal projections satisfying the relation (3) is called a resolution of the identity. Relation (2) gives a one-to-one correspondence between the resolutions of the identity and the spectral measures on $\mathcal{B}_r$.

Let E be a spectral measure on $\mathcal{B}_r$, and let $x, y \in X$. Then the set function $M \to (E(M)x, x) = \|E(M)x\|^2$ is a bounded regular †measure in the ordinary sense, and the set function $M \to (E(M)x, y)$ is a complex-valued regular †completely additive set function. For every complex Borel †measurable function f on R, the operator S(f) in X is defined by the relations

$D(S(f)) = \left\{x \,\Big|\, \int_{-\infty}^{\infty} |f(\lambda)|^2 (E(d\lambda)x, x) < +\infty\right\},$

$(S(f)x, y) = \int_{-\infty}^{\infty} f(\lambda) (E(d\lambda)x, y), \quad x \in D(S(f)),\ y \in X.$ (4)

S(f) is a densely defined closed operator and is denoted by $S(f) = \int_{-\infty}^{\infty} f(\lambda) E(d\lambda)$. The correspondence $f \mapsto S(f)$ satisfies formulas of the so-called operational calculus (→ 251 Linear Operators). In particular, $S(\bar{f}) = S(f)^*$, and hence S(f) is self-adjoint if f is real-valued. If f is bounded on the support of E, then S(f) is everywhere defined in X and is bounded. S(f) is sometimes called the spectral integral of f with respect to E. The operator S(f) can be defined in a similar way for a spectral measure on $\mathcal{B}_c$ (and for a more general spectral measure).

E. Spectral Theorems

For every self-adjoint operator H in a Hilbert space X, there exists a unique real spectral measure E such that

$H = \int_{-\infty}^{\infty} \lambda E(d\lambda).$ (5)

In other words, H and E correspond to each other by the relations

$D(H) = \left\{x \,\Big|\, \int_{-\infty}^{\infty} \lambda^2 (E(d\lambda)x, x) < +\infty\right\},$

$(Hx, y) = \int_{-\infty}^{\infty} \lambda (E(d\lambda)x, y), \quad x \in D(H),\ y \in X.$

This is the spectral theorem for self-adjoint operators. The support of E is equal to the spectrum $\sigma(H)$, so that we can write

$H = \int_{\sigma(H)} \lambda E(d\lambda) = \int_{-\infty}^{\infty} \lambda \chi_{\sigma(H)}(\lambda) E(d\lambda),$ (6)

where $\chi_M$ stands for the †characteristic function of M. Formulas (5) and (6) are called the spectral resolution (or spectral representation) of the self-adjoint operator H. We call E the spectral measure for H, and the $\{E_\lambda\}$ corresponding to E by formula (2) (or sometimes E itself) the resolution of the identity for H. Let $\lambda$ be a real number. Then $\lambda \in \sigma_P(H)$ if and only if $E(\{\lambda\}) \neq 0$. Also, $\lambda \in \sigma_C(H)$ if and only if $E(\{\lambda\}) = 0$ and $E(V) \neq 0$ for any neighborhood
$V$ of $\lambda$. The spectral measure $E$ can be represented in terms of the resolvent $R(\alpha; H) = (\alpha I - H)^{-1}$ of $H$ by the formula

$$E((a, b)) = \lim_{\delta \downarrow 0} \lim_{\varepsilon \downarrow 0} \frac{1}{2\pi i} \int_{a+\delta}^{b-\delta} \{ R(\mu - \varepsilon i; H) - R(\mu + \varepsilon i; H) \}\, d\mu$$

(strong convergence).

For every normal operator $A$ in $X$, there exists a unique spectral measure $E$ on the family $\mathfrak{B}_c$ of all complex Borel sets such that

$$A = \int_{\mathbf{C}} z\, E(dz).$$

This is called the complex spectral resolution (or complex spectral representation) of the normal operator $A$. The support $\Lambda(E)$ is equal to $\sigma(A)$. There are characterizations of point and continuous spectra similar to the case of self-adjoint operators. Normal operators have no residual spectra. For a unitary operator $U$, the support of the associated spectral measure is contained in the unit circle $\Gamma$, so that $U$ can be represented as

$$U = \int_{\Gamma} e^{i\theta}\, F(d\theta) \tag{7}$$

with a spectral measure $F$ defined on $\Gamma$. Formula (7) is the spectral resolution of the unitary operator $U$.

For a self-adjoint operator $H = \int_{-\infty}^{\infty} \lambda\, E(d\lambda)$ the following two types of classification of $\sigma(H)$ are often useful.

(i) The essential spectrum $\sigma_e(H)$ is by definition the set $\sigma(H)$ minus all the isolated eigenvalues of $H$ with finite multiplicity. When $H$ is bounded this definition of the essential spectrum coincides with that to be given in Section I for a general $A \in B(X)$. $\sigma(H) \setminus \sigma_e(H)$ is called the discrete spectrum of $H$.

(ii) The set $X_{ac}(H)$ (resp. $X_s(H)$) (called the space of absolute continuity (resp. singularity) with respect to $H$) of all $u \in X$ such that the measure $(E(d\lambda)u, u)$ is absolutely continuous (resp. singular) with respect to the †Lebesgue measure is a closed subspace of $X$ that reduces $H$. The restriction of $H$ to $X_{ac}(H)$ (resp. $X_s(H)$) is called the absolutely continuous (resp. singular) part of $H$, and its spectrum, denoted by $\sigma_{ac}(H)$ (resp. $\sigma_s(H)$), is called the absolutely continuous (resp. singular) spectrum of $H$. Note that $\sigma_{ac}(H)$ and $\sigma_s(H)$ may not be disjoint.

F. Functions of a Self-Adjoint Operator

Let $H = \int_{-\infty}^{\infty} \lambda\, E(d\lambda)$ be a self-adjoint operator in $X$. For a complex-valued Borel measurable function $f$ on $\mathbf{R}$, we define $f(H)$ to be the operator $S(f)$ determined by (4) in reference to the resolution of the identity $E$ associated with $H$:

$$f(H) = \int_{-\infty}^{\infty} f(\lambda)\, E(d\lambda).$$

For an arbitrary $u \in X$, let $L_2(u)$ be the $L_2$-space over the measure $\mu_u = \mu_u(\cdot) = (E(\cdot)u, u)$. In other words, $f \in L_2(u)$ if and only if $u \in D(f(H))$. The correspondence $f \to f(H)u$ gives an isometric isomorphism between $L_2(u)$ and the subspace $M(u) = \{ f(H)u \mid f \in L_2(u) \}$ of $X$. (In particular, $M(u)$ is closed.) $H$ is reduced by $M(u)$, and the part of $H$ in $M(u)$ corresponds to the multiplication $f(\lambda) \to \lambda f(\lambda)$ in $L_2(u)$.

For a given self-adjoint operator $H$ there exists a (not necessarily countable) family $\{u_\theta\}_{\theta \in \Theta}$ of elements $u_\theta$ of $X$ such that

$$X = \sum_{\theta \in \Theta} \oplus\, M(u_\theta), \tag{8}$$

where $\sum \oplus$ stands for the †direct sum of mutually orthogonal closed subspaces. Consequently, $X$ is represented by the direct sum $\sum_{\theta \in \Theta} \oplus\, L_2(u_\theta)$ of $L_2$-spaces. If $x \in D(H)$ is represented by $\{f_\theta\}_{\theta \in \Theta}$ in this representation, $Hx$ is represented by $\{\lambda f_\theta(\lambda)\}_{\theta \in \Theta}$.

G. Unitary Equivalence and Spectral Multiplicity

In this section $X$ is assumed to be a †separable Hilbert space. Then (8) can be made more precise. Namely, for a self-adjoint operator $H$, we can find a countable family $\{u_n\}_{n=1}^{\infty}$ of elements of $X$ such that

$$X = \sum_{n=1}^{\infty} \oplus\, M(u_n) \cong \sum_{n=1}^{\infty} \oplus\, L_2(u_n), \tag{9}$$

$$\mu_{u_{n+1}} \text{ is †absolutely continuous with respect to } \mu_{u_n}, \quad n = 1, 2, \ldots. \tag{10}$$

Furthermore, if $\{u_n'\}$ is another family satisfying (9) and (10), then $\mu_{u_n}$ and $\mu_{u_n'}$ are absolutely continuous with respect to each other (Hellinger-Hahn theorem). $\mu_{u_1}$ is said to be the maximum spectral measure.

Two operators $H_1$ and $H_2$ are said to be unitarily equivalent if there exists a unitary operator $U$ such that $H_2 = U^* H_1 U$. A criterion for unitary equivalence of self-adjoint operators can be given in terms of the spectral representation given previously. Namely, let $\{u_n^{(i)}\}$, $i = 1, 2$, be a sequence satisfying (9) and (10) with respect to $H_i$. Then $H_1$ and $H_2$ are unitarily equivalent if and only if $\mu_{u_n^{(1)}}$ and $\mu_{u_n^{(2)}}$ are absolutely continuous with respect to each other for all $n = 1, 2, \ldots$.

A self-adjoint operator $H$ is said to have a simple spectrum if there exists a $u \in X$ such that $M(u) = X$. Such a $u \in X$ is called a generating element of $X$ with respect to $H$.
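The measure $\mu_u$, the isometry $f \to f(H)u$, and the notion of a generating element can be illustrated in finite dimensions. The following is only a sketch under simplifying assumptions that are not part of the text: $H$ is taken to be a real diagonal matrix, so that $E(M)$ is the coordinate projection onto the eigenvectors whose eigenvalue lies in $M$, the functions $f$ are real-valued, and all function names are hypothetical. In this setting $\mu_u$ is a sum of point masses $|u_i|^2$ at the eigenvalues, and $M(u) = X$ exactly when $\mu_u$ has as many atoms as $X$ has dimensions.

```python
from collections import defaultdict
from fractions import Fraction


def spectral_measure(eigenvalues, u):
    """mu_u for H = diag(eigenvalues): mass |u_i|^2 at each eigenvalue."""
    mu = defaultdict(Fraction)
    for lam, ui in zip(eigenvalues, u):
        mu[lam] += Fraction(ui) ** 2
    # Keep only the atoms that carry positive mass.
    return {lam: mass for lam, mass in mu.items() if mass > 0}


def apply_function(f, eigenvalues, u):
    """f(H)u for diagonal H: multiply each coordinate by f(lambda_i)."""
    return [f(lam) * Fraction(ui) for lam, ui in zip(eigenvalues, u)]


def l2_norm_squared(f, mu):
    """Integral of f^2 with respect to the atomic measure mu (f real)."""
    return sum(f(lam) ** 2 * mass for lam, mass in mu.items())


def is_generating(eigenvalues, u):
    """M(u) = X iff L_2(u) has the same dimension as X."""
    return len(spectral_measure(eigenvalues, u)) == len(eigenvalues)


if __name__ == "__main__":
    eigenvalues, u = [1, 2, 3], [1, 1, 2]
    mu = spectral_measure(eigenvalues, u)
    square = lambda lam: lam * lam
    v = apply_function(square, eigenvalues, u)
    # Isometry f -> f(H)u: the two printed numbers agree.
    print(sum(c * c for c in v), l2_norm_squared(square, mu))
    # Distinct eigenvalues and no vanishing component give a generating element.
    print(is_generating(eigenvalues, u), is_generating([1, 1, 2], [1, 1, 1]))
```

A repeated eigenvalue or a vanishing component of $u$ reduces the number of atoms of $\mu_u$, which is why such a $u$ fails to be a generating element in this toy model.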
Self-adjoint operators with simple spectra are closely related to Jacobi matrices. Let $H$ be such an operator with a generating element $u \in X$. Take a complete orthonormal set $\{G_n\}_{n=1}^{\infty}$ in $L_2(u)$ such that $G_n = G_n(\lambda)$ is a polynomial of degree $n - 1$ and $\lambda G_n(\lambda) \in L_2(u)$. Then $\{g_n\}_{n=1}^{\infty}$, $g_n = G_n(H)u$, is a complete orthonormal set in $X$. The matrix representation $\{a_{mn}\}$, $a_{mn} = (Hg_n, g_m)$, of $H$ with respect to the basis $\{g_n\}$ has the following properties: (i) $a_{mn} = 0$ if $|m - n| \ge 2$; (ii) $a_{n,n+1} = a_{n+1,n} \ne 0$; (iii) $a_{nn}$ is real. Any infinite matrix $\{a_{mn}\}$ satisfying (i), (ii), and (iii) is called a Jacobi matrix. A Jacobi matrix determines a †symmetric operator whose †deficiency index is either (0, 0) or (1, 1). Any self-adjoint extension has a simple spectrum. (For more details about Jacobi matrices and their applications → [8].)

H. Triangular Representation of Volterra Operators

A linear operator $A$ in a Hilbert space $X$ is called a †Volterra operator if it is †compact and †quasinilpotent (i.e., 0 is the only spectrum). The name is justified because under very general assumptions such an operator is unitarily equivalent to the integral operator of Volterra type in the vector-valued $L_2$ space on $[0, 1]$. Let $\mathfrak{P}$ be a maximal †totally ordered family of orthogonal projections in $X$ such that $PX$ is an †invariant subspace of a Volterra operator $A$ for every $P \in \mathfrak{P}$. Such a family $\mathfrak{P}$ always exists and is called a maximal eigenchain of $A$. Then generalizing the triangular representation of nilpotent matrices, we have the integral representation

$$A = 2i \int_{\mathfrak{P}} P A_I\, dP,$$

where $A_I = (A - A^*)/(2i)$ is the imaginary part of $A$ and the integral is the limit in norm of approximating sums of the form $\sum Q_i A_I (P_i - P_{i-1})$ for finite partitions $0 = P_0 < P_1 < \cdots < P_n = I$ of $\mathfrak{P}$ in which $Q_i$ is an arbitrary projection in $\mathfrak{P}$ such that $P_{i-1} \le Q_i \le P_i$ (M. S. Brodskii). Conversely, let $\mathfrak{P}$ be a totally ordered family of orthogonal projections that contains 0 and the identity. If the integral $A = \int_{\mathfrak{P}} P B\, dP$ converges in norm for a compact linear operator $B$, then $A$ is a Volterra operator and $\mathfrak{P}$ is an eigenchain of $A$. If, moreover, $B$ is self-adjoint, we have $B = A_I$ (I. C. Gokhberg and M. G. Krein; Brodskii). Furthermore, assume for simplicity that $\mathfrak{P}$ is continuous in the sense that for every $P_1 < P_2$ in $\mathfrak{P}$ there exists an element $P$ in $\mathfrak{P}$ such that $P_1 < P < P_2$. If $B$ is a †Hilbert-Schmidt operator, then the integral $A = \int_{\mathfrak{P}} P B\, dP$ converges in the †Hilbert-Schmidt norm and the mapping $B \mapsto A$ is an orthogonal projection to the set of all Volterra operators of Hilbert-Schmidt class possessing $\mathfrak{P}$ as an eigenchain (Gokhberg and Krein).

Volterra operators with the imaginary part of the †trace class are especially important for applications. In this case we have the following fundamental theorem on the density of the spectrum of the real part $A_R$ of the Volterra operator $A$: Let $n_+(r; A_R)$ and $n_-(r; A_R)$ be the numbers of eigenvalues of $A_R$ in the intervals $[1/r, \infty)$ and $(-\infty, -1/r]$, respectively. Then

$$\lim_{r \to \infty} \frac{n_+(r; A_R)}{r} = \lim_{r \to \infty} \frac{n_-(r; A_R)}{r} = \frac{h}{\pi}.$$

The number $h$ is given by

$$h = \inf \sum_i \| \Delta P_i\, A_I\, \Delta P_i \|_1, \quad \Delta P_i = P_i - P_{i-1},$$

where the norm is the †trace norm and the infimum is taken over all finite partitions $P_i$ of a maximal eigenchain $\mathfrak{P}$ for $A$. We refer for further details and applications to the books by Gokhberg and Krein [9, 10].

I. Fredholm Operators and Essential Spectra of Operators

Throughout Sections I and J we assume that $X$ is a separable infinite-dimensional complex Hilbert space, and we consider only bounded linear operators in $X$. The set $B^{(c)}(X)$ of all †compact linear operators in $X$ is a †maximal two-sided ideal of the †C*-algebra $B(X)$ of all bounded linear operators. The simple quotient C*-algebra $A(X) = B(X)/B^{(c)}(X)$ is called the Calkin algebra. We denote the quotient mapping by $\pi: B(X) \to A(X)$. Then an operator $A \in B(X)$ is a †Fredholm operator if and only if its image $\pi(A)$ is an invertible element of $A(X)$. Let $F(X)$ be the set of all Fredholm operators in $X$, and let $F_n(X)$, $n \in \mathbf{Z}$, be its subset of all operators of †index $n$. $F_n(X)$ is a connected set in $B(X)$, and in particular, $F_0(X)$ is the inverse image of the connected component of the identity in the multiplicative topological group $\pi(F(X))$ of all invertible elements in $A(X)$. The index gives the group isomorphisms $F(X)/F_0(X) \cong \pi(F(X))/\pi(F_0(X)) \cong \mathbf{Z}$. More generally, we have for any compact topological space $Y$ the group isomorphisms $[Y, F(X)] \cong [Y, \pi(F(X))] \cong K(Y)$ of the groups of †homotopy classes of continuous mappings and the K-group in the †K-theory (M. F. Atiyah [12]).

If $N$ is a †normal operator, its essential spectrum $\sigma_e(N)$ is defined to be the set of all $\lambda \in \sigma(N)$ that is not an isolated eigenvalue of finite multiplicity. Let $H_1$ and $H_2$ be self-adjoint operators. Then $\sigma_e(H_1) = \sigma_e(H_2)$ if and
only if $U^* H_1 U = H_2 + K$ for a unitary operator $U$ and a compact operator $K$ (Weyl-von Neumann theorem). I. D. Berg and W. Sikonia (1971) extended this result to normal operators $N_1$ and $N_2$. Moreover, for any compact subset $Y$ of $\mathbf{C}$ there exists a normal operator $N$ such that $\sigma_e(N) = Y$. Hence it follows that the essential spectrum $\sigma_e(N)$ of a normal operator $N$ coincides with the †spectrum $\sigma(\pi(N))$ of the image in $A(X)$. Thus we define the essential spectrum $\sigma_e(A)$ of an arbitrary operator $A$ to be the spectrum $\sigma(\pi(A))$ in $A(X)$. An operator $A \in B(X)$ is said to be essentially normal (resp. essentially self-adjoint, essentially unitary) if $\pi(A)$ is normal (resp. self-adjoint, unitary) in $A(X)$. (Note that this definition of essentially self-adjoint operators is completely different from that in 251 Linear Operators E.) Since an essentially self-adjoint operator is the sum of a self-adjoint operator and a compact operator, the Weyl-von Neumann theorem classifies essentially self-adjoint operators up to †unitary equivalence modulo $B^{(c)}(X)$. An operator $A$ is essentially normal if and only if the commutator $[A, A^*]$ is compact, but it need not be the sum of a normal operator and a compact operator. For example, let $V$ be the unilateral shift operator that maps the orthonormal basis $e_i$ of $X$ into $e_{i+1}$ for every $i = 1, 2, \ldots$. Then $V$ is essentially unitary, but it cannot be written as the sum of a normal operator and a compact operator. The essential spectrum $\sigma_e(V)$ is the unit circle, whereas the spectrum $\sigma(V)$ is the unit disk and $\operatorname{ind}(V - \lambda) = -1$ for $|\lambda| < 1$.

J. The Brown-Douglas-Fillmore (BDF) Theory

The following is the main theorem for essentially normal operators, due to L. G. Brown, R. G. Douglas, and P. A. Fillmore [14]. Let $A_1$ and $A_2$ be essentially normal operators. There are a unitary operator $U$ and a compact operator $K$ such that $U^* A_1 U = A_2 + K$ if and only if $\sigma_e(A_1) = \sigma_e(A_2)$ and $\operatorname{ind}(A_1 - \lambda) = \operatorname{ind}(A_2 - \lambda)$ for every $\lambda$ in the complement of the essential spectrum. An essentially normal operator $A$ is the sum of a normal operator and a compact operator if and only if $\operatorname{ind}(A - \lambda) = 0$ for every $\lambda$ in the complement of $\sigma_e(A)$.

To prove this and many other facts, they developed the theory of extension of $B^{(c)}(X)$ by the C*-algebra $C(Y)$ of continuous complex-valued functions on a compact metrizable space $Y$ [14-16]. This revealed deep relations between the theory of operator algebras on Hilbert spaces (→ 36 Banach Algebras, 308 Operator Algebras) and algebraic topology (in particular, K-theory; → 237 K-Theory). Extension theory also gives a natural setting for the †index theory of elliptic differential operators due to Atiyah and I. M. Singer [13, 16].

An extension of $B^{(c)}(X)$ by $C(Y)$ is a †short exact sequence

$$0 \to B^{(c)}(X) \xrightarrow{\ i\ } E \xrightarrow{\ \varphi\ } C(Y) \to 0 \tag{11}$$

of a C*-subalgebra $E$ of $B(X)$ and †*-homomorphisms, i.e., $E$ is a C*-subalgebra of $B(X)$ containing the identity 1 and including $B^{(c)}(X)$ as a C*-subalgebra, and $\varphi$ is a *-homomorphism onto $C(Y)$ whose kernel is equal to $B^{(c)}(X)$. Or equivalently, an extension is a unital (identity preserving) *-monomorphism $\tau: C(Y) \to A(X)$ defined by $\tau = \pi \circ \varphi^{-1}$. (For general extension of C*-algebras → 36 Banach Algebras.) Two extensions $(E_1, \varphi_1)$ and $(E_2, \varphi_2)$ (or $\tau_1$ and $\tau_2$) are said to be equivalent if there exists a *-isomorphism $\psi: E_1 \to E_2$ such that $\varphi_2 \circ \psi = \varphi_1$ (or equivalently there exists a unitary operator $U$ such that $\pi(U^*) \tau_1(f) \pi(U) = \tau_2(f)$ for every $f \in C(Y)$). We denote by $\operatorname{Ext}(Y)$ the set of all equivalence classes of extensions of $B^{(c)}(X)$ by $C(Y)$.

Let $A$ be an essentially normal operator in $X$ with the essential spectrum $\sigma_e(A) = Y$. Then the C*-algebra $E_A$ generated by $B^{(c)}(X)$, $A$ and the identity 1, and the *-homomorphism $\varphi_A$ of $E_A$ onto $C(Y)$ which sends $A$ to the function $\chi(z) = z$ define an extension $(E_A, \varphi_A)$. It is easy to see that two essentially normal operators $A_1$ and $A_2$ are unitarily equivalent modulo $B^{(c)}(X)$ if and only if $\sigma_e(A_1) = \sigma_e(A_2)$ and the extensions $(E_{A_1}, \varphi_{A_1})$ and $(E_{A_2}, \varphi_{A_2})$ are equivalent. Conversely, if $Y$ is a compact subset of $\mathbf{C}$ and (11) is an extension, then $(E, \varphi)$ is equivalent to $(E_A, \varphi_A)$, where $A$ is an essentially normal operator in $E$ such that $\varphi(A) = \chi$.

Extensions of $B^{(c)}(X)$ by $C(Y)$ appear also in different parts of analysis. Let $X$ be the Hilbert space $L_2(M)$ on a compact differentiable manifold $M$ relative to a fixed smooth measure and let $E$ be the C*-subalgebra of $B(X)$ generated by all zeroth-order †pseudodifferential operators together with $B^{(c)}(X)$. Then $E$ and the †symbol mapping $\varphi: E \to C(S^*(M))$ define an extension of $B^{(c)}(X)$ by $C(S^*(M))$, where $S^*(M)$ is the †cosphere bundle of $M$. This extension is closely related to the †Atiyah-Singer index theorem. Let $\Omega$ be a †strongly pseudoconvex domain in $\mathbf{C}^n$. Then the C*-algebra generated by †Toeplitz operators with continuous symbol gives rise to an extension of $B^{(c)}(H_2(\partial\Omega))$ by $C(\partial\Omega)$.

Let $\tau_1$ and $\tau_2$ be *-monomorphisms from $C(Y)$ into $A(X)$ and $a_1 = [\tau_1]$ and $a_2 = [\tau_2]$ be corresponding elements in $\operatorname{Ext}(Y)$. Then the sum $a_1 + a_2 \in \operatorname{Ext}(Y)$ is defined to be the equivalence class of $\tau: C(Y) \to A(X)$ which sends $f$ to $\tau_1(f) \oplus \tau_2(f) \in A(X) \oplus A(X) \subset A(X \oplus X) \cong A(X)$. It does not depend on the
choice of $\tau_1$, $\tau_2$ and the unitary $X \oplus X \cong X$. An extension $\tau: C(Y) \to A(X)$ is said to be trivial if there exists a unital *-monomorphism $\sigma: C(Y) \to B(X)$ such that $\tau = \pi \circ \sigma$. For each compact metrizable space $Y$ there exists a unique equivalence class of trivial extensions in $\operatorname{Ext}(Y)$. $\operatorname{Ext}(Y)$ is an Abelian group in which the class of trivial extensions is the identity element. The extension $(E_A, \varphi_A)$ for an essentially normal operator $A$ is trivial if and only if $A = N + K$ with $N$ normal and $K$ compact. Hence follows the Weyl-von Neumann-Berg-Sikonia theorem. The BDF theorem for essentially normal operators is proved by the pairing $\operatorname{Ext}(Y) \times K^1(Y) \to \mathbf{Z}$ defined by the index, where $K^1(Y) = \tilde{K}(SY^+) = \lim_{n \to \infty} [Y, GL(n, \mathbf{C})]$ (→ 237 K-Theory; [13]). The induced homomorphism $\gamma_\infty: \operatorname{Ext}(Y) \to \operatorname{Hom}(K^1(Y), \mathbf{Z})$ is always surjective, and it is an isomorphism for $Y \subset \mathbf{R}^3$ or $Y = S^n$ but not for $Y \subset \mathbf{R}^4$.

Ext is a †covariant functor from the category of compact metrizable spaces to the category of Abelian groups. It is †homotopy invariant. Define for $n = 0, 1, \ldots$ the group $\operatorname{Ext}_{1-n}(Y)$ by $\operatorname{Ext}(S^n Y)$, where $S^n Y$ is the $n$-fold †suspension. Then we have the periodicity $\operatorname{Ext}_{n+2}(Y) \cong \operatorname{Ext}_n(Y)$. Moreover, for each pair of compact metrizable spaces $Y \supset Z$ we have the long exact sequence

$$\cdots \to \operatorname{Ext}_n(Z) \to \operatorname{Ext}_n(Y) \to \operatorname{Ext}_n(Y/Z) \xrightarrow{\ \partial\ } \operatorname{Ext}_{n-1}(Z) \to \cdots,$$

where $Y/Z$ is the space obtained from $Y$ by collapsing $Z$ to a point and $\partial$ is $q_* r_*^{-1}: \operatorname{Ext}_n(Y/Z) \to \operatorname{Ext}_n(SZ)$ defined by $q: Y \cup CZ \to (Y \cup CZ)/Y = SZ$ and $r: Y \cup CZ \to (Y \cup CZ)/CZ = Y/Z$, $CZ$ being the †cone over $Z$. $\operatorname{Ext}_*$ satisfies the †Eilenberg-Steenrod axioms for homology theory except for the dimension axiom, which is replaced by $\operatorname{Ext}_n(S^0) = \operatorname{Ext}(S^{1-n}) = \mathbf{Z}$ for $n$ even and $= 0$ for $n$ odd.

The von Neumann algebra $B(X)$ is classified as a †factor of type $\mathrm{I}_\infty$. In the case of a factor $M$ of type $\mathrm{II}_\infty$ another index theory has been developed by M. Breuer [17] and others replacing $B^{(c)}(X)$ by the closed ideal of $M$ generated by finite projections and using the †semifinite trace on $M$ for the dimensions of kernels and cokernels of operators in $M$.

K. Spectral Analysis in Banach Spaces

Spectral analysis becomes rather involved for general operators in a Banach space as well as for nonnormal operators in Hilbert space. For a †compact operator $A$, the nature of the spectrum $\sigma(A)$ and the structure of $A$ in the root subspace associated with a nonzero eigenvalue are well known (→ 68 Compact and Nuclear Operators). However, a full spectral analysis may not be possible without further assumptions.

For a †closed operator with nonempty resolvent set $\rho(A)$, an †operational calculus can be developed by means of a function-theoretic method based on the fact that the resolvent $R_\lambda = (\lambda I - A)^{-1}$ is a $B(X)$-valued holomorphic function of $\lambda$ in $\rho(A)$. In particular, the †spectral mapping theorem holds (→ 251 Linear Operators G).

A general class of operators having associated spectral resolution was introduced by N. Dunford. Let $X$ be a Banach space. An operator $E \in B(X)$ is called a projection if $E^2 = E$. As before we can define a (projection-valued countably additive) spectral measure on $\mathfrak{B}_c$. An operator $A \in B(X)$ is said to be a spectral operator if there exists a spectral measure $E$ on $\mathfrak{B}_c$ satisfying the following properties: (i) $E(M)A = AE(M)$, $M \in \mathfrak{B}_c$; (ii) $\sigma(A|_{E(M)X}) \subset \overline{M}$, $M \in \mathfrak{B}_c$, where $A|_Y$ is the restriction of $A$ to $Y$ and $\overline{M}$ is the closure of $M$; (iii) there exists a $k > 0$ such that $\|E(M)\| \le k$ for all $M \in \mathfrak{B}_c$. $E$ is unique. A spectral operator $A$ is expressed as $A = S + N$, where $S = \int_{\mathbf{C}} z\, E(dz)$ and $N$ is †quasinilpotent. $A$ is said to be a scalar operator if $N = 0$. Unbounded spectral operators are defined similarly, with (i') $E(M)A \subset AE(M)$ in place of (i). However, for unbounded spectral operators $A$ we no longer have the decomposition $A = S + N$. (For more details about spectral operators → [3]. For other topics related to the material discussed in this section → 68 Compact and Nuclear Operators; 251 Linear Operators; and 287 Numerical Computation of Eigenvalues.)

References

[1] N. I. Akhiezer and I. M. Glazman, Theory of linear operators in Hilbert space I, II, Ungar, 1961, 1963. (Original in Russian, 1950.)
[2] F. Riesz and B. Sz.-Nagy, Functional analysis, Ungar, 1955. (Original in French, 1952.)
[3] N. Dunford and J. T. Schwartz, Linear operators, I, II, III, Interscience, 1958, 1963, 1971.
[4] K. Yosida, Functional analysis, Springer, 1965.
[5] T. Kato, Perturbation theory for linear operators, Springer, 1966.
[6] M. Reed and B. Simon, Methods of modern mathematical physics, Academic Press, I, 1972; II, 1975; III, 1979; IV, 1978.
[7] P. R. Halmos, Introduction to Hilbert space and the theory of spectral multiplicity, Chelsea, second edition, 1957.
[8] M. H. Stone, Linear transformations in Hilbert space and their applications to analysis, Amer. Math. Soc. Colloq. Publ., 1932.
[9] I. C. Gokhberg (Gohberg) and M. G. Krein, Introduction to the theory of linear non-self-adjoint operators, Amer. Math. Soc., 1969. (Original in Russian, 1965.)
[10] I. C. Gokhberg (Gohberg) and M. G. Krein, Theory and applications of Volterra operators in Hilbert space, Amer. Math. Soc., 1970. (Original in Russian, 1967.)
[11] R. G. Douglas, Banach algebra techniques in operator theory, Academic Press, 1972.
[12] M. F. Atiyah, K-theory, Benjamin, 1967.
[13] M. F. Atiyah, Global theory of elliptic operators, Proc. Intern. Conf. on Functional Analysis and Related Topics, Univ. Tokyo Press, 1970, 21-30.
[14] L. G. Brown, R. G. Douglas, and P. A. Fillmore, Unitary equivalence modulo the compact operators and extensions of C*-algebras, Lecture notes in math. 345, Springer, 1973, 58-128.
[15] L. G. Brown, R. G. Douglas, and P. A. Fillmore, Extensions of C*-algebras and K-homology, Ann. Math., 105 (1977), 265-324.
[16] R. G. Douglas, C*-algebra extensions and K-homology, Ann. math. studies 95, Princeton Univ. Press, 1980.
[17] M. Breuer, Fredholm theories on von Neumann algebras, Math. Ann., I, 178 (1968), 243-254; II, 180 (1969), 313-325.

391 (VII.21)
Spectral Geometry

A. General Remarks

Let $E^3$ be a Euclidean 3-space with a standard coordinate system $(x_1, x_2, x_3)$ and $E^2$ be the $x_1 x_2$-plane. Consider a domain $D$ in $E^2$ as a vibrating membrane with the fixed boundary $\partial D$. Then the height $x_3 = F(x_1, x_2, t)$ obeys the differential equation of hyperbolic type, $\partial^2 F(x_1, x_2, t)/\partial t^2 = c^2 \Delta F(x_1, x_2, t)$, where $\Delta = \partial^2/\partial x_1^2 + \partial^2/\partial x_2^2$ denotes the †Laplacian in $E^2$ and $c$ is a constant (we put $c = 1$ in the following). For solutions of the form $F(x_1, x_2, t) = U(x_1, x_2) V(t)$, $U$ is a solution of the †Dirichlet problem in $D \subset E^2$; $\Delta U + \lambda U = 0$, where $\lambda$ is a positive constant called an †eigenvalue of $\Delta$. Solutions of the form $F(x_1, x_2, t) = U(x_1, x_2) \sin \sqrt{\lambda}\, t$ represent the pure tones that the membrane produces as normal modes. That is, the shape of $D$ is related to the possible sounds or vibrations (i.e., to the eigenvalues of $\Delta$) through the Dirichlet problem. The set of eigenvalues of $\Delta$ is called the spectrum of $D$ and is denoted by $\operatorname{Spec}(D)$. There arises the question of how much information $\operatorname{Spec}(D)$ can impart about the geometric properties (i.e., the shape, extent, and connectedness) of $D$. Generally, spectral geometry is the study of the relations between the spectrum of domains $D$ of †Riemannian manifolds or compact Riemannian manifolds $(M, g)$ and the geometric properties of $D$ or $(M, g)$.

B. Spectra

Let $\mathcal{D}^p(M)$ denote the space of smooth †$p$-forms on a compact $m$-dimensional $C^\infty$-Riemannian manifold $(M, g)$. Then eigenvalues of the †Laplacian (Laplace-Beltrami operator) $\Delta$ acting on $\mathcal{D}^p(M)$ are discretely distributed in $[0, \infty)$, and each multiplicity is finite (→ 68 Compact and Nuclear Operators, 323 Partial Differential Equations of Elliptic Type). The spectrum for $p$-forms $\operatorname{Spec}^p(M, g)$ is $\{\lambda_{p,1} \le \lambda_{p,2} \le \cdots\}$, where each eigenvalue is repeated as many times as its multiplicity indicates. If 0 is an eigenvalue, its multiplicity is equal to the $p$th †Betti number of $M$.

In the following, mainly the case $p = 0$ is explained. $\operatorname{Spec}^0(M, g)$ is denoted by $\operatorname{Spec}(M, g)$. 0 is always in $\operatorname{Spec}(M, g)$ and its multiplicity is 1. So we put $\lambda_0 = 0$, and $\lambda_1$ is the first nonzero eigenvalue. A geometric meaning of $\Delta f$ at $x$ for a function $f$ is as follows: If $\{\gamma_i\}_{i=1}^{m}$ are $m$ geodesics mutually orthogonal at $x$ and parametrized by arc length, then $(\Delta f)(x) = \sum_i (f \circ \gamma_i)''(0)$.

Let $\{\varphi_i\}_{i=0}^{\infty}$ be an orthonormal basis of $\mathcal{D}^0(M)$ consisting of eigenfunctions: $\Delta \varphi_i + \lambda_i \varphi_i = 0$, $(\varphi_i, \varphi_j) = \int_M \varphi_i \varphi_j = \delta_{ij}$. Then the †fundamental solution $E(x, y, t)$ of the heat equation $\Delta - \partial/\partial t = 0$ is given by $E(x, y, t) = \sum_i e^{-\lambda_i t} \varphi_i(x) \otimes \varphi_i(y)$ as a function on $M \times M \times (0, \infty)$. We put $Z(t) = \int_M E(x, x, t) = \sum_i e^{-\lambda_i t}$. $Z(t)$ and $\operatorname{Spec}(M, g)$ are equivalent. The Minakshisundaram-Pleijel asymptotic expansion of $Z(t)$,

$$Z(t) \sim (1/4\pi t)^{m/2} (a_0 + a_1 t + a_2 t^2 + \cdots), \quad t \downarrow 0,$$

is the bridge connecting $\operatorname{Spec}(M, g)$ and geometric properties of $(M, g)$, because $a_0, a_1, \ldots$ can be expressed as the integrals of functions over $M$ defined by $g = (g_{ij})$, the components $R_{ijkl}$ of the †Riemannian curvature tensor, and their derivatives of finite order [1]. $a_0$ is the volume of $(M, g)$ and $a_1 = (1/6) \int_M S$, where $S$ is the †scalar curvature. $a_2$ was calculated by H. P. McKean, I. M. Singer, and M. Berger, and $a_3$ by T. Sakai.

Let $D$ be a bounded domain in $E^2$ or, more generally, a bounded domain in a Riemannian manifold, and assume that the boundary $\partial D$ is piecewise smooth. For smooth functions which
take the value 0 on $\partial D$, eigenvalues of the Laplacian $\Delta$ are discretely distributed in $(0, \infty)$, and each multiplicity is finite. We denote the spectrum of $D$ by $\operatorname{Spec}(D) = \{\lambda_1 < \lambda_2 \le \cdots\}$. The multiplicity of $\lambda_1$ is 1, and an eigenfunction $f$ corresponding to $\lambda_1$ takes the same sign in $D$. The behavior of $Z(t) = \sum_i e^{-\lambda_i t}$ for $D$ is different from that for $(M, g)$ since $Z(t)$ reflects the geometric situation of $\partial D$ in this case.

$(M, g)$ and $(N, h)$, or $D_1$ and $D_2$ are called isospectral if they have the same spectra.

Examples for which the spectrum is explicitly calculable are as follows: spheres $(S^m, g_0 = \text{canonical})$, real projective spaces $(\mathbf{R}P^m, g_0)$, complex projective spaces $(\mathbf{C}P^n, J_0, g_0)$, $(S^{2n+1}, g_s = \text{suitably deformed from } g_0)$, flat tori, (and for domains $D$) unit disks, rectangles, equilateral triangles, etc.

C. Congruence and Characterization

Let $D_1$ and $D_2$ be bounded domains in $E^2$. An open question is whether isospectral $D_1$, $D_2$ are congruent. Concerning this, there is M. Kac's paper with the famous title "Can one hear the shape of a drum?" Let $D$ be a bounded domain in $E^2$ with smooth boundary $\partial D$. If $D$ has $r$ holes, then

$$Z(t) \sim \frac{A(D)}{4\pi t} - \frac{L(\partial D)}{8\sqrt{\pi t}} + \frac{1 - r}{6}, \quad t \downarrow 0,$$

holds (A. Pleijel, Kac, P. McKean, I. M. Singer; → [7]), where $A(D)$ denotes the area of $D$ and $L(\partial D)$ denotes the length of $\partial D$. This theorem implies that the area, the length of $\partial D$, and the number of holes are determined by $\operatorname{Spec}(D)$. In particular, if $D_1$ is a unit disk, $\operatorname{Spec}(D_1) = \operatorname{Spec}(D_2)$ implies that $D_1$ and $D_2$ are congruent. There are some other results on $Z(t)$ for domains $D$ of surfaces in $E^m$ or for domains $D$ in Riemannian manifolds $(M, g)$.

Two isometric $(M, g)$, $(N, h)$ are isospectral. Concerning the question of whether isospectral $(M, g)$, $(N, h)$ are isometric, there are some counterexamples. The first is the case of two flat tori $T^{16}$, given by J. Milnor. Examples with nonflat metrics were given by N. Ejiri using warped products and by M. F. Vignéras for surfaces of constant negative curvature. In those examples, $M$ and $N$ are homeomorphic. A. Ikeda showed that there are †lens spaces that are isospectral but not homotopy equivalent.

Examples of affirmative cases are as follows: $\operatorname{Spec}(M, g) = \operatorname{Spec}(S^m, g_0)$, $m \le 6$, implies that $(M, g)$ is isometric to $(S^m, g_0)$ (M. Berger, S. Tanno). The result is the same for $(\mathbf{R}P^m, g_0)$, $m \le 6$. For $n \le 6$, $(\mathbf{C}P^n, J_0, g_0)$ is characterized by a spectrum among †Kählerian manifolds $(M, J, g)$.

The number of nonisometric flat tori (or more generally, compact flat Riemannian manifolds) with the same spectrum is finite (M. Kneser, T. Sunada).

If one considers spectra for two types of forms, then the situation turns out to be simpler. For example, if $\operatorname{Spec}^p(M, g) = \operatorname{Spec}^p(N, h)$ for $p = 0, 1$, then $(M, g)$ is of constant curvature $K$ if and only if $(N, h)$ is also of constant curvature $K'$ and $K' = K$ (V. K. Patodi [9]).

D. The First Eigenvalue

The first eigenvalue $\lambda_1$ for $(M, g)$ or for a domain $D$ in $(M, g)$ reflects the geometric situation of $(M, g)$ or $D$. A lower bound of $\lambda_1$ given by J. Cheeger is $\lambda_1 \ge h(M)^2/4$, where $h(M)$ is the isoperimetric constant, defined by

$$h(M) = \inf \{ \operatorname{vol}(S) / \min[\operatorname{vol}(M_1), \operatorname{vol}(M_2)] \},$$

where the inf is taken over all smoothly embedded hypersurfaces $S$ dividing $M$ into two open submanifolds $M_1$, $M_2$, $\partial M_1 = \partial M_2 = S$, and vol means the volume. 1/4 is the best possible estimate.

Let $\rho$ denote the maximum radius of a disk included in a simply connected $D \subset E^2$. Then $\lambda_1 \ge 1/(4\rho^2)$ (W. K. Hayman, R. Osserman).

If $(M, g)$ has †Ricci curvature $\ge k > 0$, then $\lambda_1 \ge mk/(m - 1)$ holds, and the equality holds if and only if $(M, g)$ is isometric to $(S^m, (1/k) g_0)$ (A. Lichnerowicz, M. Obata). $\lambda_1$ can also be estimated from $k$ and the diameter $d(M)$ of $M$ (P. Li, S. T. Yau).

To obtain upper bounds of $\lambda_1$ the minimum principle of $\lambda_1$ is useful. We state it only for $(M, g)$:

$$\lambda_1 = \inf \frac{\int_M (df, df)}{\int_M f^2},$$

where inf is taken over all piecewise smooth functions $f$ satisfying $\int_M f = 0$, $f \ne 0$, and $(\ ,\ )$ denotes the local inner product with respect to $g$. An upper bound of $\lambda_1$ for $(M, g)$ of nonnegative curvature is $\lambda_1 \le c(m)/d(M)^2$, where $c(m)$ is some constant depending on $m$ (Cheeger).

For $(M, g)$ or $D$ a submanifold of another Riemannian manifold, there exist some estimates of $\lambda_1$ in terms of second fundamental forms, etc.

$\lambda_1 \ge \pi j^2 / A(D)$ holds for $D \subset E^2$, and the equality holds if and only if $\partial D$ is a circle (C. Faber, E. Krahn), where $j$ denotes the first zero of the †Bessel function $J_0$. This estimate is generalized in many directions; for example, for $D \subset (M^2, g)$ in terms of the integral of the Gauss curvature, etc. It is very useful to note that the estimate of $\lambda_1$ for $D$ is deeply related to the isoperimetric inequality (→ 228 Isoperimetric Problems).
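The Faber-Krahn and Hayman-Osserman lower bounds for the first Dirichlet eigenvalue of a plane domain can be checked numerically on rectangles, one of the explicitly calculable cases. The following is a minimal sketch, not taken from the text: it assumes the standard closed form $\lambda_1 = \pi^2 (1/a^2 + 1/b^2)$ for an $a \times b$ rectangle, computes the first zero $j$ of $J_0$ from the power series by bisection, and uses hypothetical function names throughout.

```python
import math


def bessel_j0(x, terms=40):
    """Power series J_0(x) = sum_k (-1)^k (x^2/4)^k / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -(x * x / 4.0) / ((k + 1) ** 2)
    return total


def first_zero_j0(lo=2.0, hi=3.0, tol=1e-13):
    """Bisection for the first positive zero j of J_0 (J_0(2) > 0 > J_0(3))."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bessel_j0(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rectangle_lambda1(a, b):
    """First Dirichlet eigenvalue of an a-by-b rectangle (assumed closed form)."""
    return math.pi ** 2 * (1.0 / a ** 2 + 1.0 / b ** 2)


def faber_krahn_bound(area):
    """Lower bound pi * j^2 / A(D), attained by the disk of area A(D)."""
    return math.pi * first_zero_j0() ** 2 / area


def inradius_bound(rho):
    """Hayman-Osserman lower bound 1 / (4 rho^2) for simply connected D."""
    return 1.0 / (4.0 * rho ** 2)


if __name__ == "__main__":
    for a, b in [(1.0, 1.0), (1.0, 2.0), (0.5, 3.0)]:
        lam = rectangle_lambda1(a, b)
        # The inradius of a rectangle is half of its shorter side.
        print(a, b, lam, faber_krahn_bound(a * b), inradius_bound(min(a, b) / 2.0))
```

In each printed row the eigenvalue exceeds both bounds, and the Faber-Krahn bound is closest for the square, the rectangle nearest in shape to a disk of the same area.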
E. Hersch Type Theorem

With respect to the first eigenvalue $\lambda_1(g)$ and the volume $\operatorname{Vol}(M, g)$ of $(M, g)$, $\lambda_1(g) \cdot \operatorname{Vol}(M, g)^{2/m}$ is invariant under a change of metric $g \to cg$ ($c$ is a constant). Hersch's problem is stated as follows: Is there a constant $k(M)$ depending on $M$ so that for any Riemannian metric $g$ on $M$, $\lambda_1(g) \cdot \operatorname{Vol}(M, g)^{2/m} \le k(M)$? J. Hersch proved this for a 2-sphere $M = S^2$ with $k(S^2) = 8\pi$, and in this case the equality holds if and only if $g$ is proportional to the canonical metric $g_0$.

The Hersch type theorem holds for an oriented Riemann surface $M$ of genus $q$ with $k(M) = 8\pi(q + 1)$ (P. C. Yang, S. T. Yau). There is no such constant $k(S^m)$ for an $m$-sphere $S^m$, $m \ge 3$ (H. Urakawa, H. Muto, S. Tanno).

F. The Multiplicity of $\lambda_i$

By a theorem of K. Uhlenbeck each eigenvalue for a generic Riemannian manifold $(M, g)$ is simple. However, for $(S^m, g_0)$ the first eigenvalue $\lambda_1$ is $m$ and its multiplicity is $m + 1$. Furthermore, for some $g_s$ deformed from $g_0$, $\lambda_1(g_s)$ of $(S^{2n+1}, g_s)$ has multiplicity $n^2 + 4n + 2$, which is larger than $m + 1$ $(= 2n + 2)$.

The multiplicity $m(\lambda_i)$ of the $i$th eigenvalue $\lambda_i$ for a Riemann surface of genus $q$ satisfies $m(\lambda_i) \le 4q + 2i + 1$ (S. Y. Cheng, G. Besson).

G. kth Eigenvalue

The minimum principle for $\lambda_k$ of $\operatorname{Spec}(M, g)$ is stated as follows: Let $f_i$ be an eigenfunction corresponding to $\lambda_i$, $0 \le i \le k - 1$. Define $H_{k-1}$ to be the set of piecewise smooth functions $f \ne 0$ orthogonal to each $f_i$, i.e., $\int_M f f_i = 0$. Then

$$\lambda_k = \inf \frac{\int_M (df, df)}{\int_M f^2},$$

where inf is taken over $f \in H_{k-1}$. We have the minimax principle for $\lambda_k$ of the first and second type. We state the second type only:

$$\lambda_k = \inf_{L_{k+1}} \sup_{f \in L_{k+1},\, f \ne 0} \frac{\int_M (df, df)}{\int_M f^2},$$

where $L_k$ denotes a $k$-dimensional linear subspace of $\mathcal{D}^0(M)$. From this, for 1-parameter metrics $g_u$ $(a < u < b)$ on $M$, the continuity of $\lambda_k(g_u)$ with respect to $u$ follows.

H. Courant-Cheng Nodal Domain Theorem

Let $f$ be an eigenfunction on $(M, g)$ or $D$. The set of all zero points of $f$ is called the nodal set of $f$ (or the nodal curve of $f$ if $m = 2$). Each connected component of the complement of the nodal set in $(M, g)$ or $D$ is called a nodal domain of $f$. The nodal set of $f$ is a smooth submanifold of $(M, g)$ or $D$ except for a set of lower dimension. The number of nodal domains of an eigenfunction corresponding to the $i$th eigenvalue is $\le i + 1$ for $(M, g)$ and $\le i$ for $D$ (Courant-Cheng nodal domain theorem).

I. Estimate of $N(\lambda)$

$N(\lambda)$ is defined as the number of eigenvalues of $(M, g)$ or $D$ which are less than or equal to $\lambda$. For $D \subset E^2$, H. A. Lorentz conjectured that the behavior of $N(\lambda)$ for $\lambda \to \infty$ does not depend on the shape of $D$ but only on the area $A(D)$ of $D$, i.e., $\lim_{\lambda \to \infty} N(\lambda)/\lambda = A(D)/4\pi$. This was proved by H. Weyl. Generally, for $D$ or $(M, g)$, the behavior of $N(\lambda)$ for $\lambda \to \infty$ is $\operatorname{vol}(D) \lambda^{m/2} / ((4\pi)^{m/2} \Gamma(m/2 + 1))$, and this is related to the first term $\operatorname{vol}(D)/(4\pi t)^{m/2}$ of the asymptotic expansion of $Z(t)$ by †Tauberian and †Abelian theorems.

J. $\operatorname{Spec}(M, g)$ and Geodesics

Let $T^m = E^m/\Gamma$ be a flat torus, $\Gamma$ being the lattice for $T^m$. Let $\Gamma^*$ be the lattice dual to $\Gamma$. Then Poisson's formula,

$$\sum_{y \in \Gamma^*} \exp(-4\pi^2 |y|^2 t) = \frac{\operatorname{vol}(T^m)}{(4\pi t)^{m/2}} \sum_{x \in \Gamma} \exp\Big(-\frac{|x|^2}{4t}\Big),$$

gives a clear relation between $\operatorname{Spec}(T^m) = \{4\pi^2 |y|^2,\ y \in \Gamma^*\}$ and the set $\{|x|,\ x \in \Gamma\}$ of lengths of closed geodesics on $T^m$.

If $(M, g)$ satisfies some conditions, then $\operatorname{Spec}(M, g)$ determines the set of lengths of periodic geodesics (Y. Colin de Verdière), and the spectrum characterizes those Riemannian manifolds whose geodesics are all periodic (J. J. Duistermaat, V. W. Guillemin).

K. $\operatorname{Spec}^p(M, g)$ and the Euler-Poincaré Characteristic

Let $(M, g)$ be oriented and even dimensional. Let $E^p(x, y, t)$ be the †fundamental solution of the †heat equation for $p$-forms. Corresponding to $Z(t)$ for $\operatorname{Spec}(M, g)$, we get $Z^p(t) = \int_M \operatorname{tr} E^p = \sum_i e^{-\lambda_{p,i} t}$. Then

$$\sum_{p=0}^{m} (-1)^p Z^p(t) = \sum_{p=0}^{m} (-1)^p \int_M \operatorname{tr} E^p = \chi(M),$$

where $\chi(M)$ denotes the †Euler-Poincaré characteristic of $M$ (McKean, Singer). On the other hand, the †Gauss-Bonnet theorem is $\chi(M) = \int_M C$, where $C$ is a function on $M$ expressed as a homogeneous polynomial of components of the Riemannian curvature tensor. Then
Patodi proved

L. $\eta$-Function

Let $(X, g)$ be a compact oriented $4k$-dimensional Riemannian manifold with boundary $\partial X = Y$ and assume that some neighborhood of $Y$ in $(X, g)$ is isometric to a Riemannian product $Y \times [0, \varepsilon)$. Define an operator $B$ acting on forms of even degree on $Y$ by

$$B\omega = (-1)^{k+p+1} (*d - d*)\omega, \quad \omega \in \mathcal{D}^{2p}(Y),$$

where $*$ denotes the †Hodge star operator and $d$ denotes exterior differentiation on $Y$. Then $B^2 = \Delta$ holds. Using the spectrum $\{\mu\}$ of $B$, we define the $\eta$-function by

$$\eta(s) = \sum_{\mu \ne 0} (\operatorname{sign} \mu)\, |\mu|^{-s}.$$

$\eta(s)$ is a spectral invariant, and

$$\operatorname{sgn}(X) = \int_X L_k(p_1, \ldots, p_k) - \eta(0)$$

holds (Atiyah, Patodi, and Singer [2]), where $\operatorname{sgn}(X)$ is the †signature of the quadratic form defined by the †cup product on the image of $H^{2k}(X, Y)$ in $H^{2k}(X)$, $L_k$ is the $k$th †Hirzebruch L-polynomial, and $p_1, \ldots, p_k$ are the Pontryagin forms of $(X, g)$.

M. Analytic Torsion

Let $\chi$ be a representation of the fundamental group $\pi_1(M)$ of $(M, g)$ by the orthogonal group and $E_\chi$ be the associated vector bundle. Let $\Delta_\chi$ be the Laplacian acting on $E_\chi$-valued $p$-forms on $M$ and $\{\lambda_{p,i}\}$ be its spectrum. Then

to an unbounded self-adjoint operator for $L_2(N)$. Generally $\Delta$ has a continuous spectrum. Some conditions for $(N, h)$ to have pure point spectrum were given in terms of curvature (Donnelly, P. Li).

If $D$ is a minimal submanifold of another Riemannian manifold, estimates of $\lambda_1$ are related to the stability of $D$ (→ 275 Minimal Submanifolds).

On the nonexistence of the 1-parameter isospectral deformation $(M, g_0) \to (M, g_t)$, there are results for (i) $m = 2$ and $g_0$ of negative curvature, $m \ge 3$ and negatively pinched $g_0$ (V. Guillemin, D. Kazhdan); (ii) flat metrics $g_0$ (R. Kuwabara); and (iii) $g_0$ of constant positive curvature (Tanno).

As for spectral geometry for complex Laplacian on Hermitian manifolds, there are results by P. Gilkey, Tsukada, and others.

References

[1] M. F. Atiyah, R. Bott, and V. K. Patodi, On the heat equation and the index theorem, Inventiones Math., 19 (1973), 279-330.
[2] M. F. Atiyah, V. K. Patodi, and I. M. Singer, Spectral asymmetry and Riemannian geometry I, II, III, Proc. Cambridge Philos. Soc., 77 (1975), 43-69, 405-432; 79 (1976), 71-99.
[3] M. Berger, P. Gauduchon and E. Mazet, Le spectre d'une variété riemannienne, Lecture notes in math. 194, Springer, 1971.
[4] J. Cheeger, A lower bound for the smallest eigenvalue of the Laplacian, Problems in Analysis: Symposium in Honor of Salomon Bochner, R. C. Gunning (ed.), Princeton Univ. Press, 1970, 175-199.
[5] P. B. Gilkey, The index theorem and the
heat equation, Publish or Perish, 1974.
logT(M,x)= f (-l)‘plog; C(n;,i)m.T [6] M. Kac, Can one hear the shape of a drum ?
I”0 (i )I .s=ll Amer. Math. Monthly, 73 (April 1966), l-23.
is independent of the choice of g. T(M, x) is [7] H. P. Mckean and I. M. Singer, Curvature
called the analytic torsion of M. T(M, x) is and eigenvalues of the Laplacian, J. Differen-
equal to the tR-torsion z(M, x) (W. Miiller, tial Geometry, 1 (1967), 43-69.
Cheeger). [S] S. Minakshisundarum and A. Pleijel, Some
properties of the eigenfunctions of the Laplace
operator on Riemannian manifolds, Canad. J.
N. Concluding Remarks Math., 1 (1949), 242-256.
[9] V. K. Patodi, Curvature and the funda-
An tisometry $ of (M, g) commutes with the
mental solution of the heat operator, J. Indian
Laplacian and induces a linear transformation
Math. Sot., 34 (1970), 269-285.
$p of each eigenspace V,. Using the asymp-
totic expansion of C tr($P)e-“‘, the Atiyah-
Singer tG-signature theorem has been proved
(H. Donnelly, Patodi). 392 (XX.5)
The Atiyah-Singer +index theorem has been
Spherical Astronomy
proved by using Gilkey’s theory and the heat
equation (Atiyah, R. Bott, Patodi).
Let (N, h) be a complete Riemannian mani- Spherical astronomy is concerned with the
fold of negative curvature. Then A is extended apparent positions of celestial bodies and their

motions on a celestial sphere with center at an observer on the Earth,
while †celestial mechanics is concerned with computing heliocentric true
positions of planets and comets and geocentric true positions of
satellites. The purpose of spherical astronomy is to find all possible
causes of displacement of the apparent positions of celestial bodies on
the celestial sphere from their geocentric positions and to study their
effects. Atmospheric refraction, geocentric parallax, aberration, annual
parallax, precession, nutation, and proper motion are examples of these
causes.

When light from a celestial body passes through the Earth's atmosphere,
it is refracted since air densities at different heights are different.
This phenomenon is called atmospheric refraction. The effect of
refraction on the apparent direction of the celestial body is a minimum
when the body is at its culmination, and vanishes when this coincides
with the observer's zenith, while the maximum refraction of 34′.5 occurs
when the body is at the horizon.

Topocentric positions differ appreciably from geocentric positions for
the Moon and planets, since their geocentric distances are not large
compared with the radius of the Earth. The difference is largest when the
observer is on the equator and the celestial body is at the horizon, and
this largest value is called the geocentric parallax. The geocentric
parallax of the Moon is between 53′.9 and 60′.2; those of the Sun,
Mercury, Venus, Mars, Jupiter, and Saturn are, respectively,
8″.64–8″.94, 6″–16″.5, 5″–32″, 3″.5–23″.5, 1″.4–2″.1, and 0″.8–1″.1.
For fixed stars geocentric parallaxes can be regarded as zero since the
stars are far from the Earth.

The Earth moves in an orbit around the Sun with a period of one year
(365.2564 days) and rotates around the polar axis, which is inclined at
66°.5 to the orbital plane (the ecliptic), with a period of one day
(23 hours, 56 minutes, 4.091 seconds). Therefore the observer on the
Earth moves with a speed depending on the latitude (0.465 km/sec on the
equator) due to the rotation and moves with an average speed of
29.785 km/sec on the ecliptic. Due to these motions of the observer,
apparent directions of celestial bodies are displaced from their
geometric directions. Displacement due to the rotation is called diurnal
aberration, and that due to the orbital motion, annual aberration. The
effect of diurnal aberration is between 0″ and 0″.32 and varies with a
period of one day, while that of the annual aberration is between 0″ and
20″.496 and varies with a period of one year. Moreover, to compute the
positions of celestial bodies, the travel time of light to the observer
should be taken into account.

Annual parallax for a fixed star is half the difference of its apparent
directions as measured from the two ends of the diameter of the Earth's
orbit that is perpendicular to the direction of the star. The effect of
the annual parallax varies with a period of one year. However, except for
nearby stars, it is not necessary to take this effect into account when
computing apparent positions.

The pole of the Earth on the celestial sphere moves on a circle around
the pole of the ecliptic due to the gravitational attraction of the Moon,
Sun, and planets, and therefore the equinox moves clockwise on the
ecliptic. Because the resultant of the attractive force of the Moon, Sun,
and planets changes periodically, the motion of the equinox is not
uniform. Therefore the motion is expressed as the sum of secular motion,
called precession, and periodic motion, called nutation. Since the
positions of fixed stars on the celestial sphere are measured with
respect to the equator and the equinox, their right ascensions and
declinations are continuously changing because of precession and
nutation.

Since the stars are not fixed in space but themselves have proper
motions, their positions on the celestial sphere are continuously
changing.

Spherical astronomy also includes predictions of solar and lunar
eclipses, the theory of †orbit determination to compute apparent
positions of celestial bodies in the solar system by use of orbital
elements, and the computation of ephemerides for the Sun, Moon, planets,
and fixed stars. Practical astronomy, which develops theories and methods
of observation by use of meridian circles, transit instruments, zenith
telescopes, sextants, theodolites, telescopes with equatorial mountings,
and astronomical clocks, and navigational astronomy, which deals with
methods for determining the positions of ships and aircraft, are closely
connected to spherical astronomy.

It should be noted that recently radar has been used to measure distances
to the Moon and planets accurately, a contribution to determining the
size of the solar system with precision.


References

[1] W. Chauvenet, A manual of spherical and practical astronomy I, II,
Dover, 1960.
[2] S. Newcomb, A compendium of spherical astronomy, Dover, 1960.
[3] W. M. Smart, Textbook on spherical astronomy, Cambridge Univ. Press,
fifth edition, 1965.

[4] E. W. Woolard and G. M. Clemence, Spherical astronomy, Academic
Press, 1966.


393 (XIV.6)
Spherical Functions

A. Spherical Functions

The term spherical functions in modern terminology means a certain family
of functions on †symmetric Riemannian spaces obtained as simultaneous
†eigenfunctions of certain integral operations (→ 437 Unitary
Representations). In this article, however, we explain only the classical
theory of Laplace's spherical functions with respect to the rotation
group in 3-dimensional space.

Solutions of †Laplace's equation $\Delta V = 0$ that are homogeneous
polynomials of degree $n$ with respect to the orthogonal coordinates
$x, y, z$ are called solid harmonics of degree $n$. If $n$ is a positive
integer, there are $2n + 1$ linearly independent solid harmonics of
degree $n$. In †polar coordinates $(r, \theta, \varphi)$ they are of the
form $r^n Y_n(\theta, \varphi)$, where $Y_n(\theta, \varphi)$, the
surface harmonic of degree $n$, satisfies the differential equation

$$\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial Y_n}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2 Y_n}{\partial\varphi^2} + n(n+1)Y_n = 0.$$

Here, if we apply †separation of variables to $\theta$ and $\varphi$ and
put $z = \cos\theta$, then the component in $\varphi$ is represented by
trigonometric functions, and the other component in $\theta$ reduces to a
solution of Legendre's associated differential equation

$$(1 - z^2)\frac{d^2 w}{dz^2} - 2z\frac{dw}{dz} + \left(n(n+1) - \frac{m^2}{1 - z^2}\right) w = 0. \tag{1}$$


B. Legendre Functions

With $m = 0$ in (1) and $n$ replaced by an arbitrary complex number
$\nu$, the equation is reduced to Legendre's differential equation

$$(1 - z^2)\frac{d^2 w}{dz^2} - 2z\frac{dw}{dz} + \nu(\nu+1) w = 0, \tag{2}$$

whose fundamental solutions are represented by

$$P_\nu(z) = \frac{1}{2\pi i}\int^{(1+,\,z+)} \frac{(\zeta^2 - 1)^\nu}{2^\nu(\zeta - z)^{\nu+1}}\,d\zeta, \tag{3}$$

$$Q_\nu(z) = \frac{1}{4i\sin\nu\pi}\int^{(1-,\,-1+)} \frac{(\zeta^2 - 1)^\nu}{2^\nu(z - \zeta)^{\nu+1}}\,d\zeta, \tag{4}$$

where the contour of integration in (3) is a closed curve with positive
direction on the $\zeta$-plane, avoiding the half-line $(-\infty, -1)$,
and admitting 1 and $z$ as inner points of the domain it bounds, whereas
the contour in (4) is a closed ∞-shaped curve encircling the point 1 once
in the negative direction and the point $-1$ once in the positive
direction. The functions $P_\nu(z)$ and $Q_\nu(z)$ are called Legendre
functions of the first and second kind, respectively. The integral
representation (3) is Schläfli's integral representation. If
$\operatorname{Re}(\nu + 1) > 0$, we can deform the contour of
integration and obtain

$$Q_\nu(z) = \frac{1}{2^{\nu+1}}\int_{-1}^{1} \frac{(1 - \zeta^2)^\nu}{(z - \zeta)^{\nu+1}}\,d\zeta. \tag{5}$$

If $\nu$ is an integer, it is convenient to use the representation (5).

From (3)–(5), we can obtain the recurrence formulas for Legendre
functions of distinct degrees. The recurrence formulas for $P_\nu(z)$ and
$Q_\nu(z)$ have exactly the same form (→ Appendix A, Table 18.II).
Expanding the integrand in (3) and (4) with respect to $z - 1$ and
$\zeta/z$, the following identities are obtained:

$$P_\nu(z) = F\left(\nu + 1,\, -\nu,\, 1,\, \frac{1 - z}{2}\right), \qquad |1 - z| < 2,$$

$$Q_\nu(z) = \frac{\sqrt{\pi}\,\Gamma(\nu + 1)}{(2z)^{\nu+1}\,\Gamma(\nu + 3/2)}\, F\left(\frac{\nu+1}{2},\, \frac{\nu+2}{2},\, \nu + \frac{3}{2},\, \frac{1}{z^2}\right),$$

where $F(\alpha, \beta, \gamma, z)$ is the †hypergeometric function.
These expansions are the solutions in series of Legendre's differential
equation in the neighborhood of the †regular singular points $z = 1$ and
$\infty$, respectively (→ Appendix A, Table 18.II).

If $\nu$ is a positive integer $n$, since $\zeta = 1$ is not a †branch
point in (3), $P_n(z)$ is represented by Rodrigues's formula

$$P_n(z) = \frac{1}{2\pi i}\int^{(z+)} \frac{(\zeta^2 - 1)^n}{2^n(\zeta - z)^{n+1}}\,d\zeta = \frac{1}{2^n n!}\frac{d^n}{dz^n}(z^2 - 1)^n.$$

In this case, $P_n(z)$ is a polynomial of degree $n$ such that

$$P_n(z) = \sum_{r=0}^{[n/2]} (-1)^r \frac{(2n - 2r)!}{2^n\, r!\,(n - r)!\,(n - 2r)!}\, z^{n-2r}, \qquad P_0(z) = 1,$$

which is called the Legendre polynomial (Legendre, 1784). The
†generating function for the

Legendre polynomials is (1 - 2pcos fI + p2)-1/2, obtained by Ferrers and that with the minus
whose expansion with respect to p is of the sign by Heine and Hobson.
form Es0 P,,(z)p”, z =cos 8. Here the tgenerat-
ing function (1 - 2p cos fI + p2) -‘I2 is the inverse
D. Surface Harmonics
of the distance between two points (p, e) and
(1,0) in polar coordinates. Hence P,,(z) is also
From the considerations so far for the sur-
called the Legendre coefficient. If z is real, { ((2n
face harmonics K(0, cp), 2n + 1 independent
+ 1)/2)“2P,(z)},“0 constitutes an orthonormal
solutions
system on [ -1, l] (- 317 Orthogonal Func-
tions). The n zeros of P.(z) are all real, simple, P”(COS 4, P,“(cos @sin mcp,
and lie in (-l,l). For sufficiently large n, we
qP,“(cos 0)cos mcp, 1 <m<n,
have
are obtained. Since P,(cos e) vanishes on II
PJCOS e)
latitudes of the unit sphere, and P,“(cos 0).
cos mcp and P,m(cos @sin mrp vanish on n -m
latitudes and m longitudes of the unit sphere,
respectively, the former functions are called
Qn@s 0) zonal harmonics and the latter, tesseral har-
monics. The general form of surface harmonics
x of order n is given by a linear combination
of zonal and tesseral harmonics:

Y,(R cp)= A,,oP,(cos Q)


n
+ 1 (A,,,cosmcp+B,,,sinmcp)P,“(cosfI). (7)
C. Associated Legendre Functions WI=1
Expressing two surface harmonics Ybl) and
For any positive integer m, the functions
Yj’) in linear combinations such as (7), the
P;(z) =( 1 - z2p2 d”P”(Z)/dZ”, following orthogonality relations hold:

Q;(z)=(l -z’)“‘2&“Qy(~)/d~m
YL’)(B, cp)Yj’)(Q, cp)sinBdBdq
are called the associated Legendre functions of
the first and second kind, respectively. This 47l
=k,,- ,4p#’
definition, due to N. M. Ferrers, is convenient 2nSl ’ n’”
(
for the case -1 <z < 1. For arbitrary com-
plex z in a domain G obtained by deleting the
segment [ -1, l] from the complex plane, the
following definition, due to H. E. Heine and
Since the family of all zonal and tesseral har-
E. W. Hobson, is used:
monics constitutes a komplete orthogonal
P=“(z) = (22 - 1)“‘2 d”P”(Z)/dZ”, system, it is possible to expand a function
f(e, cp) on the sphere into an orthogonal series:
Q;(Z) = (z’ - l)m’z d”Q,(z)/dz”.

The associated Legendre functions satisfy the


associated Legendre differential equation (1).
In particular, for v = n (a positive integer) and
z=x (real),

{(2n+l)(n-m)!/2(n+m)!}“2P,m(Z),
+B,,,sinmcp)P,“(cos@ 1
n=O, 1, . ..m=constant.

constitute an orthonormal system on [ -1, 11. To obtain surface harmonics, the following
method is effective. Let v be a direction pro-
The addition theorem for the Legendre func-
tions is portional to the direction cosines 1, m, n. Then
a function

=P”(z,)P”(z,)+2 ,f (n-m)!PF(Z,)
m=,(n+m)!
is a solution of Laplace’sequation. Physically,
x P,“(z2)cosmcp,
this corresponds to a tpotential of double pole
where the equality with the plus sign was with moment a and direction v. A more gen-

era1 multipole potential are


e -vni
P?qz)
y P = _
2’ ‘471sinvn

also satisfies Laplace’s equation. If we put V=


wx, Y, zb -‘“-l, U, is a spherical function of
order n (Maxwell’s theorem). Various spherical
functions correspond to particular directions
vi. For example, if every vi is equal to z, we
have zonal harmonics; and if n-m of the vi are where the contour of integration for the latter
equal to z and m of the vi are symmetric on the integral is a curve encircling the point -1 once
xy-plane, we obtain tesseral harmonics. Let y in the positive direction and the point 1 once
be an angle between two segments connecting in the negative direction. Then the associated
the origin to the points (r, 0, cp) and (r’, 0’, cp’) in Legendre functions for an arbitrary number p
polar coordinates. Then cos y = cos 0~0s 0’ + are defined as follows:
sin Osin @cos(q - cp’), and if we choose the
line connecting the origin to a point (r’. H’, cp’)
P,“(z) =
as the axis defining P,,, we have
r”+l X' a 4" a
ww9=(--l)“~ rz+r’ay Q:(z)= ;;r;f;l:‘(zz - l)““Q$!!$‘(z).
(
zr a n i If v-p is a positive integer Pckfin is called
+Tz -' the Gegenbauer polynomial, a;sg denoted by
>(> r
C,-,(z). The C,-,(z) are obtained as coefli-
These are called biaxial spherical harmonics, cients of the expansion of the generating func-
which can also be represented (by the addition tion (1 - 2hz + z’)-(~~+~)~* (- Appendix A,
theorem) by means of spherical harmonics Table 20.1).
with respect to each axis. For spherical functions of several variables
there is an investigation by P. Appell and J.
Kempe de Feriet [23 (- 206 Hypergeometric
E. Extension of the Legendre Functions Functions D).

We extend the associated functions with posi-


tive integer m to any number m. First, if m is a References
negative integer, we put
[1] E. W. Hobson, The theory of spherical and
im
P”-“(z)=(l -ZZ))m’* =K,, d[,-, ... ellipsoidal harmonics, Cambridge Univ. Press,
s1 s1 1931 (Chelsea, 1955).
Is 12 [2] P. Appell and J. Kempe de Feriet, Fonc-
X di, PAi,M, a tions hypergeomttriques et hyperspheriques,
s1 s1 polynomes d’Hermite, Gauthier-Villars, 1926.
<m [3] C. Miiller, Spherical harmonics, Lecture
Q,“(z)=(l -z’))~” *dim d[,-, ...
sm sm notes in math. 7, Springer, 1967.
[4] T. MacRobert, Spherical harmonics, Sned-
r3 i2
X QXM, > den, third revised edition, 1967.
4,
s a2 s cc Also - references to 206 Hypergeometric
a definition due to Ferrers. Then Functions, 389 Special Functions.

P”-“‘(z)=(-l)“r(v-m+l)P~(z),
T(v+m+l)

394 (XIII.1 3)
Stability
When we use the definition due to Heine and
Hobson, the factor (-1)” in these formulas is A. General Remarks
excluded. Two fundamental solutions, called
hypergeometric functions of the hyperspherical Stability was originally a concept concerned
differential equation with stationary physical states. When a state
is affected by a small disturbance, this state
(1 -z2)d2w/dz2-2(/*+1)zdw/dz
is said to be stable if the disturbance subse-
+(v-p)(v+p+ l)w=O, quently remains small, and unstable if the dis-

turbance gradually increases. For instance, consider a rod placed in the
Earth's gravitational field with one end fixed at a point around which
the rod can rotate freely. When the rod is placed vertically, this state
is stationary. It is stable if the rod is hanging down from the fixed
end, and unstable if it is standing on the fixed end. In physical systems
only the stable state is practically realizable, so this distinction is
important.

The concept of stability is used not only in relation to physical states
but also in many other fields of science. We shall restrict ourselves to
stability of solutions of differential equations. There, the term
stability is used in the sense that a small change in the initial values
results in a small change in the solution. As long as the solution is
considered within a finite interval of the independent variable, this
stability is naturally guaranteed by the continuity of the solution with
respect to its initial values (→ 316 Ordinary Differential Equations
(Initial Value Problems)). The problem arises when an independent
variable moves over an unbounded interval.

Let $(x_1, \ldots, x_n) = x$, $(x_1(t), \ldots, x_n(t)) = x(t)$,
$(x_1'(t), \ldots, x_n'(t)) = x'(t)$ (the symbol $'$ means
differentiation by $t$), and $|x| = \sum_{j=1}^{n} |x_j|$. Consider the
differential equation

$$x' = f(t, x), \tag{1}$$

for which the existence and uniqueness of the solution of the initial
value problem is assumed for $|t| < \infty$, $|x| < \infty$. Let
$x = \varphi(t)$ be a solution of (1). If for any $\varepsilon > 0$ and
$t_0$, a $\delta > 0$ can be chosen so that
$|x(t_0) - \varphi(t_0)| < \delta$ implies
$|x(t) - \varphi(t)| < \varepsilon$ for $t_0 \le t < \infty$
($-\infty < t \le t_0$), where $x(t)$ is any solution of (1), then
$x = \varphi(t)$ is said to be (Lyapunov) stable in the positive
(negative) direction. If it is stable both in the positive and negative
directions, it is said to be stable in both directions. In the remainder
of this article we will consider stability in the positive direction
only. Corresponding assertions for stability in the negative direction
can be obtained by reversing the sign of $t$.


B. Classification

We denote by $x = x(t, t_0, x_0)$ a solution of (1) such that $x = x_0$
at $t = t_0$.

Suppose a solution $x = \varphi(t)$ is stable. If for any $t_0$ there
exists a $\zeta > 0$ such that $|x(t, t_0, x_0) - \varphi(t)| \to 0$ as
$t \to \infty$ for any $x(t, t_0, x_0)$ with
$|x_0 - \varphi(t_0)| < \zeta$, $\varphi(t)$ is said to be
asymptotically stable.

If a constant $\delta$ in the definition of stability can be chosen
independently of $t_0$, $\varphi(t)$ is said to be uniformly stable. When
equation (1) is †autonomous, stability implies uniform stability.

If (1) $\varphi(t)$ is uniformly stable and (2) for any $t_0$ and
$\eta > 0$, there exist a $\zeta > 0$ independent of $t_0$ and a $T > 0$
independent of $t_0$, such that $|x_0 - \varphi(t_0)| < \zeta$ and
$t > t_0 + T$ imply $|x(t, t_0, x_0) - \varphi(t)| < \eta$, then
$\varphi(t)$ is said to be uniformly asymptotically stable.

Suppose that there exists a positive number $\lambda$ with the following
property: For any $t_0$ and $\varepsilon > 0$ one can take a
$\delta(\varepsilon) > 0$ such that
$|x_0 - \varphi(t_0)| < \delta(\varepsilon)$ implies
$|x(t, t_0, x_0) - \varphi(t)| < \varepsilon e^{-\lambda(t - t_0)}$ for
$t > t_0$. Then $\varphi(t)$ is said to be exponentially stable.
Exponential stability implies uniform asymptotic stability.


C. Criteria

To deal with the stability of $x = \varphi(t)$, we need consider only the
case $\varphi(t) = 0$, since the transformation $x = y + \varphi(t)$
reduces equation (1) to

$$y' = f(t, y + \varphi(t)) - f(t, \varphi(t)) = F(t, y), \qquad F(t, 0) = 0, \tag{2}$$

and thus $x = \varphi(t)$ is transformed into $y = 0$. If $F$ is
continuously differentiable with respect to $y$, (2) can be written in
the form

$$y' = F_y(t, 0)\,y + g(t, y), \qquad g(t, y) = o(|y|).$$

The linear part of this equation,

$$y' = F_y(t, 0)\,y,$$

is called the variational equation for (1). So, in this section, we can
state several criteria for stability of the null solution $y = 0$ of the
equation

$$y' = P(t)\,y + g(t, y), \qquad |g(t, y)| = o(|y|). \tag{3}$$

(I) If (3) is linear (i.e., $g(t, y) \equiv 0$), then $y = 0$ is stable
if and only if every solution of (3) is bounded as $t \to \infty$.

(II) If (3) is linear, uniform asymptotic stability implies exponential
stability.

Let $f(t, y)$ be a function defined for $|y| < \rho$, $t > \alpha$. If
there exists a continuous function $w(y)$ such that $w(0) = 0$,
$w(y) > 0$ ($y \ne 0$), $f(t, y) \ge w(y)$ ($|y| < \rho$,
$t > \alpha$), then $f(t, y)$ is said to be positive definite. If
$-f(t, y)$ is positive definite, then $f(t, y)$ is said to be negative
definite.

(III) The existence of a Lyapunov function $V(t, y)$ with the following
properties implies the stability of $y = 0$: (i) $V(t, y)$ is positive
definite and differentiable, (ii) $V(t, 0) = 0$, (iii)
$\dot{V}(t, y) = V_t + V_y \cdot (P(t)\,y + g(t, y)) \le 0$.

The existence of $V(t, y)$ with the following properties implies the
uniform asymptotic stability of $y = 0$: (i) same as (i) above, (ii)
there exists a continuous function $u(y)$ such that

$u(0) = 0$, $u(y) > 0$ ($y \ne 0$), $V(t, y) \le u(y)$, (iii)
$\dot{V}(t, y)$ is negative definite.

Hereafter we shall assume that $|g(t, y)| = o(|y|)$ as $y \to 0$
uniformly with respect to $t$.

(IV) If $P(t)$ is a constant matrix all of whose †eigenvalues have
negative real parts, then $y = 0$ is asymptotically stable [3,4].

(V) Let $P(t)$ be continuous and periodic with period $T$ and $Z$ be a
†fundamental system of solutions of the variational equation

$$z' = P(t)\,z. \tag{4}$$

Then there exists a constant matrix $C$ such that $Z(t + T) = Z(t)\,C$.
Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of $C$. Then the
numbers $\mu_k = (\log\lambda_k)/T$ ($k = 1, \ldots, n$) are called the
characteristic exponents of (4). Obviously they are determined up to
integral multiples of $2\pi i/T$. If the real parts of the characteristic
exponents are all negative, then $y = 0$ is asymptotically stable [3,4].

(VI) If $f(t, x)$ in (1) is periodic in $t$ with period $T$ and (1)
admits a periodic solution $x = \varphi(t)$ with period $T$, then (1) can
be reduced to (3) by putting $x = y + \varphi(t)$, and $P(t)$, $g(t, y)$
are both periodic in $t$ with period $T$. Thus criterion (V) can be
applied as a stability criterion for the periodic solution of (1). There
are many other criteria for various particular forms of the equation
(→ 290 Nonlinear Oscillation). For the †autonomous case where $f(t, x)$
is of the form $p(x)$ or $p(x) + q(t)$ with $q(t + T) = q(t)$, many
results have been found.

(VII) If the solution $z = 0$ of

$$z' = P(t)\,z$$

is uniformly asymptotically stable, then the solution $y = 0$ of (3) is
also uniformly asymptotically stable [4].


D. Conditional Stability

Let $\varphi(t)$ be a solution and $\mathfrak{F}$ a family of solutions
of (1). If for any $\varepsilon > 0$, a $\delta > 0$ can be determined so
that $|x(t_0) - \varphi(t_0)| < \delta$ implies
$|x(t) - \varphi(t)| < \varepsilon$ for $t_0 \le t < \infty$ for any
solution $x(t)$ in $\mathfrak{F}$, then $\varphi(t)$ is said to be stable
with respect to the family $\mathfrak{F}$. If a family $\mathfrak{F}$ can
be found so that a solution is stable with respect to $\mathfrak{F}$, the
solution is said to be conditionally stable. For instance, in equation
(3), if $P(t)$ is a constant matrix some of whose eigenvalues have
negative real parts, $g(t, y)$ is differentiable with respect to $y$, and
$g_y(t, y) = o(1)$ uniformly in $t$ as $y \to 0$, then $y = 0$ is
conditionally stable.

We now mention a weaker kind of stability called orbital stability. Let
$\varphi(t)$ be a solution and $\varepsilon$ any positive number. If
there can be found a positive number $\delta$ such that for any solution
$x(t)$ with $|x(t_1) - \varphi(t_0)| < \delta$ for some $t_1$ and $t_0$,
$\bigcup_{t_1 \le t < \infty} x(t)$ belongs to the
$\varepsilon$-neighborhood of
$\bigcup_{t_0 \le t < \infty} \varphi(t)$, then $\varphi(t)$ is said to
have orbital stability.

When $f(t, x)$ in equation (1) is independent of $t$, (1) is often called
a †dynamical system. In the theory of dynamical systems, not merely the
stability of a solution itself but also the stability of a closed
invariant set is of importance (→ 126 Dynamical Systems).

It is also of importance to investigate the change in solution caused by
a small change in the right-hand member of the equation. Suppose, for
instance, that the right-hand member of the equation depends continuously
on a parameter $\varepsilon$. Then the question arises as to how the
solution changes if $\varepsilon$ changes. The theory of such problems is
called †perturbation theory. Suppose that the equation

$$x' = f(t, x, \varepsilon)$$

admits a periodic solution $\varphi(t)$ for $\varepsilon = 0$. Then
$\varphi(t)$ is said to be stable under perturbation if for
$\varepsilon \ne 0$ the same equation admits a periodic solution lying
near $\varphi(t)$. In †celestial mechanics and †nonlinear oscillation
theory this concept plays an important role.


References

[1] R. E. Bellman, Stability theory of differential equations,
McGraw-Hill, 1953.
[2] L. Cesari, Asymptotic behavior and stability problems in ordinary
differential equations, Springer, third edition, 1971.
[3] E. A. Coddington and N. Levinson, Theory of ordinary differential
equations, McGraw-Hill, 1955.
[4] S. Lefschetz, Differential equations, geometric theory, Interscience,
1957.
[5] A. Lyapunov (Ljapunov), Problème général de la stabilité du
mouvement, Ann. Math. Studies, Princeton Univ. Press, 1947.
[6] V. V. Nemytskii (Nemyckii) and V. V. Stepanov, Qualitative theory of
differential equations, Princeton Univ. Press, 1960. (Original in
Russian, 1947.)
[7] O. Perron, Über eine Matrixtransformation, Math. Z., 32 (1930),
465–473.
[8] O. Perron, Die Stabilitätsfrage bei Differentialgleichungen,
Math. Z., 32 (1930), 703–728.
[9] A. Halanay, Differential equations, stability, oscillations, time
lags, Academic Press, 1966.
[10] T. Yoshizawa, Stability theory by Ljapunov's second method,
Publ. Math. Soc. Japan, 1966.

395 (XVII.12)
Stationary Processes

A. Definitions

Stationary process is a general name given to all †stochastic processes
(→ 407 Stochastic Processes) that have the property of being stationary
(to be defined in the next paragraph) under a shift of a time parameter
$t$ that extends over $T$, which is either the set of all real numbers
$\mathbf{R}$ (a continuous parameter) or the set of all integers
$\mathbf{Z}$ (a discrete parameter).

Let $(\Omega, \mathfrak{B}, P)$ be a †probability space and
$\{X_t(\omega)\}$ ($t \in T$, $\omega \in \Omega$) a complex-valued
†stochastic process. If for every $n$, every
$t, t_1, \ldots, t_n \in T$, and every †Borel subset $E_n$ of complex
$n$-dimensional space $\mathbf{C}^n$, the equality

$$P((X_{t_1+t}, \ldots, X_{t_n+t}) \in E_n) = P((X_{t_1}, \ldots, X_{t_n}) \in E_n) \tag{1}$$

holds, then $\{X_t\}$ is called a strongly (or strictly) stationary
process; while if $E(|X_t|^2)$ is finite for every $t$, and if the
†moments up to the second order are stationary, i.e., if

$$E(X_{t+h}) = E(X_t), \qquad E(X_{t+h}\overline{X_{s+h}}) = E(X_t\overline{X_s}), \tag{2}$$

then $\{X_t\}$ is called a weakly stationary process or a stationary
process in the wider sense. The "stationary" in the latter sense
obviously includes the former if $E(|X_t|^2) < \infty$. Condition (2) is
equivalent to

$$E(X_t) = m \ \text{(a constant independent of } t\text{)}, \qquad E((X_t - m)\overline{(X_s - m)}) = \rho(t - s) \ \text{(a function of } t - s\text{)}. \tag{3}$$

We call $m$ and $\rho(t)$ the mean and the covariance function of
$\{X_t\}$.

In the continuous parameter case, we assume †continuity in probability,

$$\lim_{h \to 0} P(|X_{t+h} - X_t| > \varepsilon) = 0, \qquad \varepsilon > 0,$$

for a strongly stationary process, and continuity in mean square,

$$\lim_{h \to 0} E(|X_{t+h} - X_t|^2) = 0,$$

for a weakly stationary process. The latter assumption is equivalent to
continuity of the covariance function $\rho(t)$.


B. Spectral Decomposition of Weakly Stationary Processes

The covariance function $\rho(t)$ is obviously †positive definite and
continuous. Therefore, by †Bochner's theorem, we have the †spectral
decomposition of $\rho(t)$:

$$\rho(t) = \int_{T'} e^{i\lambda t}\,F(d\lambda), \tag{4}$$

where $T'$ is either $\mathbf{R}$ (when $T = \mathbf{R}$) or
$[-\pi, \pi]$ (when $T = \mathbf{Z}$) and $F$ is a bounded measure on
$T'$. The decomposition (4) is called the Khinchin decomposition of
$\rho(t)$, and $F(d\lambda)$ is called the spectral measure. If the
process $\{X_t\}$ is real-valued, then the spectral measure
$F(d\lambda)$ is symmetric with respect to the origin.

To obtain the spectral decomposition of a weakly stationary process
$X_t$ itself, we introduce the †Hilbert space $L_2(\Omega)$ (where
$\Omega = \Omega(\mathfrak{B}, P)$ is the basic probability space on
which each $X_t$ is regarded as a †square integrable function). Let
$\mathfrak{M}(X)$ be the subspace of $L_2(\Omega)$ spanned by the $X_t$
($t \in T$) and the constant function 1. Since $\{X_t\}$ is weakly
stationary, we can define a one-parameter group of †unitary operators
$U_t$ ($t \in T$) determined by $U_t X_s = X_{s+t}$ and $U_t 1 = 1$. By
†Stone's theorem we have the spectral decomposition of $U_t$:

$$U_t = \int_{T'} e^{i\lambda t}\,E(d\lambda). \tag{5}$$

Setting $M(\Lambda) = E(\Lambda)(X_0 - m)$, we obtain the spectral
decomposition of $X_t$:

$$X_t = U_t X_0 = m + \int_{T'} e^{i\lambda t}\,M(d\lambda). \tag{6}$$

We also have

$$(M(\Lambda_1), M(\Lambda_2)) = F(\Lambda_1 \cap \Lambda_2). \tag{7}$$

The study of weakly stationary processes is based on the decomposition
(6). For example, the weak law of large numbers for $\{X_t\}$,

$$\operatorname*{l.i.m.}_{B - A \to \infty} \frac{1}{B - A}\int_A^B X_t\,dt = m + M(\{0\}), \tag{8}$$

is an immediate consequence of (6). In the discrete parameter case a
similar result is obtained by replacing the integral sign in expression
(8) by the summation sign. In particular, if $F$ is continuous at the
origin, we have $M(\{0\}) = 0$, and only the constant $m$ remains in the
right-hand side of (8) [1,2].
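
The weak law of large numbers (8) can be illustrated in the discrete parameter case with a small simulation. The following is a minimal sketch, assuming NumPy; the harmonic process used here (a constant random amplitude plus one random sinusoid) and all names and parameter values (`harmonic_process`, `m`, `lam`, `n`) are assumptions chosen for illustration, not taken from the article. For uncorrelated, zero-mean amplitudes with equal variance on the cosine and sine terms, such a process is weakly stationary with spectral measure concentrated at 0 and ±λ, and the amplitude `a0` plays the role of M({0}).

```python
import numpy as np

def harmonic_process(m, a0, a1, b1, lam, n):
    """One sample path X_t = m + a0 + a1*cos(lam*t) + b1*sin(lam*t), t = 0, ..., n-1.

    a0 is the random mass carried by the frequency 0, i.e. M({0}) in (6);
    a1 and b1 are the amplitudes at the nonzero frequency lam.
    """
    t = np.arange(n)
    return m + a0 + a1 * np.cos(lam * t) + b1 * np.sin(lam * t)

def time_average(x):
    """Discrete analogue of the left-hand side of (8): (1/n) * sum of X_t."""
    return float(np.mean(x))

rng = np.random.default_rng(0)
m, lam, n = 2.0, 0.7, 200_000
a0, a1, b1 = rng.normal(size=3)

# Spectral measure with an atom at the origin: the average tends to m + M({0}).
avg_with_atom = time_average(harmonic_process(m, a0, a1, b1, lam, n))

# No mass at the origin (a0 = 0): only the constant m remains.
avg_no_atom = time_average(harmonic_process(m, 0.0, a1, b1, lam, n))

print(avg_with_atom - (m + a0), avg_no_atom - m)  # both differences should be small
```

The sinusoidal terms average out because their partial sums stay bounded while the number of terms grows, which is the discrete counterpart of the statement that only the mass at λ = 0 survives in (8).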
A †Gaussian process is strongly stationary if and only if it is weakly
stationary; and so it is simply called a †stationary Gaussian process.
Such processes constitute a typical class of stationary processes
(→ 176 Gaussian Processes C).


C. Weakly Stationary Random Distributions

Just as we introduce †distributions as generalizations of ordinary
functions, we define

weakly stationary random distributions as generalizations of weakly stationary processes. Let $\mathcal{D}$ be the space of all functions of class $C^\infty$ on $T = \mathbf{R}$ with compact support. We introduce the same topology on $\mathcal{D}$ as in the theory of distributions. If a random variable $X_\varphi \in L_2(\Omega)$ is defined for every $\varphi \in \mathcal{D}$ and the mapping $\varphi \to X_\varphi$ is continuous in the $L_2$-sense and linear, then the family $\{X_\varphi\}$ of random variables is called a random distribution in the wider sense (→ 407 Stochastic Processes). Furthermore, if

$$(X_{\tau_h\varphi}, 1) = (X_\varphi, 1), \qquad (X_{\tau_h\varphi}, X_{\tau_h\psi}) = (X_\varphi, X_\psi) \tag{9}$$

for every $h \in \mathbf{R}^1$, where $(\ ,\ )$ stands for the inner product in $L_2(\Omega)$ and $\tau_h\varphi(t) = \varphi(t-h)$, then $\{X_\varphi\}$ is said to be a weakly stationary random distribution. With a weakly stationary process we can associate a weakly stationary random distribution by the relation

$$X_\varphi = \int_T X_t\,\varphi(t)\,dt. \tag{10}$$

This correspondence is one-to-one, and therefore we can identify $\{X_t\}$ with $\{X_\varphi\}$ as we identify an ordinary function with a distribution. From equations (9) it follows that there exist a constant $m$ and a distribution $\rho$ such that $E(X_\varphi) = m\int\varphi(t)\,dt$ and $E\bigl((X_\varphi - E(X_\varphi))\overline{(X_\psi - E(X_\psi))}\bigr) = \rho(\varphi * \tilde\psi)$, where $*$ denotes convolution, and $\tilde\psi(t) = \overline{\psi(-t)}$. We call $m$ and $\rho$ the mean value and covariance distribution of $\{X_\varphi\}$, respectively. By the generalized Bochner theorem $\rho$ can be expressed in the form

$$\rho(\varphi) = \int \hat\varphi(\lambda)\,F(d\lambda), \qquad \hat\varphi(\lambda) = \int e^{i\lambda t}\varphi(t)\,dt,$$

where $F(d\lambda)$ is a slowly increasing measure, i.e.,

$$\int (1+\lambda^2)^{-k}\,F(d\lambda) < \infty \tag{11}$$

for some positive integer $k$. $F(d\lambda)$ is called the spectral measure. This expression is the generalization of the Khinchin decomposition.
The spectral decomposition corresponding to (6) and the †law of large numbers for $X_\varphi$ can be discussed in a manner similar to that for weakly stationary processes (K. Itô [3]).

D. Prediction Theory

Let $\{X_t\}$ be a weakly stationary process. Suppose that its values $X_s$ ($s \le t$) up to time $t$ are observed. Prediction theory deals with the problem of forecasting the future value $X_{t+\tau}$ ($\tau > 0$) from the known values $X_s$ ($s \le t$). If the domain of the admissible predictors is limited to linear functions of $X_s$ ($s \le t$), the theory is called linear prediction theory. We can assume without loss of generality that the mean value $m$ of $X_t$ is zero and that the spectral measure $F(d\lambda)$ of $\{X_t\}$ is not a zero measure.
Let $\mathcal{M}_t(X)$ be the subspace of $L_2(\Omega)$ spanned by the $X_s$ ($s \le t$); then $\mathcal{M}(X) = \bigvee_t \mathcal{M}_t(X)$. A linear predictor for $X_{t+\tau}$ from $X_s$ ($s \le t$) is an element $Y$ of $\mathcal{M}_t(X)$. If a linear predictor minimizes the prediction error $\sigma^2(\tau) = E(|X_{t+\tau} - Y|^2)$ in $\mathcal{M}_t(X)$, it is called an optimum linear predictor, which turns out to be the (†orthogonal) projection of $X_{t+\tau}$ on $\mathcal{M}_t(X)$ and which is denoted by $\hat X_{t,\tau}$. Since $\{X_t\}$ is stationary, the error $\sigma^2(\tau)$ does not depend on $t$ for such a predictor. Corresponding to the spectral decomposition (6) of $X_t$, the optimum linear predictor is expressed in the form

$$\hat X_{t,\tau} = \int_{T'} e^{i\lambda t}\,\hat\varphi_\tau(\lambda)\,M(d\lambda), \tag{12}$$

where $\hat\varphi_\tau(\cdot)$ is square integrable with respect to the spectral measure $F(d\lambda)$.
The subspace $\mathcal{M}_t(X)$ is nondecreasing in $t$. If $\mathcal{M}_t(X)$ is independent of $t$, i.e., $\mathcal{M}_t(X) = \mathcal{M}(X)$ for every $t$, then $\{X_t\}$ is said to be deterministic. In this case we have $\hat X_{t,\tau} = X_{t+\tau}$ for every $t$ and $\tau > 0$, since $X_{t+\tau} \in \mathcal{M}_t(X)$. This means that the linear predictor enables us to determine the unknown quantities without error, and therefore such a process is of no probabilistic interest. On the other hand, if $\bigcap_t \mathcal{M}_t(X) = \{0\}$, then $\{X_t\}$ is said to be purely nondeterministic. A general $\{X_t\}$ is expressed as the sum of the deterministic part $\{X_t^d\}$ and the purely nondeterministic part $\{X_t^n\}$ (Wold decomposition). Furthermore, we have $\mathcal{M}(X^d) = \bigcap_t \mathcal{M}_t(X)$, and $\mathcal{M}_t(X) = \mathcal{M}_t(X^d) + \mathcal{M}_t(X^n)$ (direct sum). Thus $\{X_t^d\}$ and $\{X_t^n\}$ can be dealt with separately.
A weakly stationary process $\{X_t\}$ is purely nondeterministic if and only if the spectral measure $F(d\lambda)$ is absolutely continuous with respect to the Lebesgue measure, and the density $f(\lambda)$ is positive almost everywhere and satisfies

$$\int_{-\pi}^{\pi} \log f(\lambda)\,d\lambda > -\infty \quad \text{(discrete parameter case)},$$

$$\int_{-\infty}^{\infty} \frac{\log f(\lambda)}{1+\lambda^2}\,d\lambda > -\infty \quad \text{(continuous parameter case)}.$$

By using $f(\lambda)$ the optimum linear predictor can be obtained.
First, we explain the discrete parameter case. There exists a function $\gamma(z) = \sum_{s=0}^{\infty} a_s z^s$ in the †Hardy class $H^2$ relative to the unit disk
such that its boundary value satisfies the process {X,}:


f
x,= a,-.&,>
Then we can find a sequence of mutually or- s - x,
thogonal random variables {t,) (t E Z) such which enables us to obtain the optimum pre-
that {X,} admits a backward moving average dictor and the prediction error in a manner
representation similar to the discrete parameter case [4]. In
f particular, if the optimal y(z) is of the form i)(z)
&= c %,i’s. (13) = c/P(iz), where c is a constant and P(z) is a
s=--u,
polynomial of degree p, then X, is p- 1 times
There are many pairs {u,} and {t,} which give differentiable and P(d/dt)X, = (d/dt)& up to a
the representation (13), but if y(z) is maximal multiplicative constant; therefore zr,., is ob-
(optimal), namely, if y(z) is expressed as tained explicitly. To obtain the optimum linear
-Sfz predictor for YE.&(X), we first establish the
‘:(z)=J%exp & 1 logf(i)f+dl. ) expression
( s I -Z
>
(14)
then the representation (13) is canonical in the
sense that J&(X) = J&(<) for every t. Hence the
and then take c,“=-,f(s)<, or s’!% f(s)di;, for
optimum predictor r?,,, for X,,, is given by
the optimum linear predictor.
The results stated above can be generalized
to multivariate (n-dimensional) stationary
processes [6,7] and to the case where the
where parameter space T is multidimensional.
N. Wiener observed the individual tsample
process X(t, 0~) and discussed a method of
finding the optimum predictor for X(t + z, tu)
The prediction error G’(T) of this predictor is
given by by using a linear functional

r-l s
d(1)= 1 1a,12. X(t-s,w)dK(s)
S=O s0
Example. Let the covariance function of a (K is of tbounded variation) of the values X,
weakly stationary process {X,} be emal” (a > 0). (s d t) [S]. The spectral measure played an
Then we have important role in his observation. Calculations
in this case are analogous to those mentioned
above.
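The discrete-parameter example with covariance $e^{-a|t|}$ can be checked numerically. The sketch below is illustrative only and rests on stated assumptions: $\beta = e^{-a}$, the innovations $\xi_s$ are taken to be orthonormal, and the coefficients $a_s$ are read off as the Taylor coefficients of the maximal function $\gamma(z) = \sqrt{1-\beta^2}\,(1-\beta z)^{-1}$ quoted in the text. Under these assumptions the prediction error $\sigma^2(\tau) = \sum_{s=0}^{\tau-1}|a_s|^2$ should equal $1-\beta^{2\tau}$, and $\beta^{\tau}X_t$ should leave a residual orthogonal to every $X_s$ ($s \le t$). All function names are hypothetical.

```python
import math


def ma_coefficients(beta, n):
    """First n Taylor coefficients a_s of gamma(z) = sqrt(1 - beta^2) / (1 - beta*z)."""
    scale = math.sqrt(1.0 - beta * beta)
    return [scale * beta ** s for s in range(n)]


def covariance(lag, beta):
    """Covariance exp(-a*|lag|), written as beta**|lag| with beta = exp(-a)."""
    return beta ** abs(lag)


def ma_covariance(lag, beta, n_terms=400):
    """Covariance implied by X_t = sum_s a_{t-s} xi_s with orthonormal xi_s (truncated sum)."""
    lag = abs(lag)
    a = ma_coefficients(beta, n_terms + lag)
    return sum(a[s] * a[s + lag] for s in range(n_terms))


def prediction_error(beta, tau):
    """sigma^2(tau) = sum_{s=0}^{tau-1} |a_s|^2 for the canonical representation."""
    return sum(c * c for c in ma_coefficients(beta, tau))


def residual_covariance(beta, tau, s):
    """E[(X_{t+tau} - beta^tau X_t) X_{t-s}] for s >= 0; zero means orthogonality."""
    return covariance(tau + s, beta) - beta ** tau * covariance(s, beta)


if __name__ == "__main__":
    beta = math.exp(-0.5)  # a = 0.5 is an arbitrary illustrative choice
    for tau in (1, 2, 5):
        print(tau, prediction_error(beta, tau), 1.0 - beta ** (2 * tau))
```

The orthogonality check is the projection characterization of the optimum linear predictor applied to this particular covariance; it does not depend on the moving average representation.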
For a weakly stationary random distribu-
tion {X(q)} (QEZ), the prediction theory is
The maximal y(z) is expressed as m( 1- reduced to that for ordinary stationary pro-
fiz)-‘, and cesses. Assume that the spectral measure F(di,)
of {X(q)} satisfies (11). Set e(t)=exp(t) (t<O),
= 0 (t > 0), and let e,(t) be the k-times convo-
lution of e(t) with itself. Set Y(p) =: X(e, * (p).
Then {Y(v)} is equivalent to a weakly station-
ary process. It is obvious that J&‘~(X) = J&( Y)
for every t, where .&Z’,(X) is the linear subspace
We now come to the continuous parameter spanned by {X(v)Isupport ofcpc(-m,t]}.
case. By replacing the holomorphic function This consideration allows us to reduce the
y(z) on the unit disk with the one on the half- prediction problem for {X(v)} to that for the
plane, we see that almost all results obtained stationary process corresponding to {Y(q)}.
in the discrete parameter case hold similarly in Nonlinear prediction theory is formulated
this case. The maximal y(z) is expressed as as follows. Let ‘23, be the smallest o-algebra
with respect to which every X, (s d t) is mea-
lsiz di
logf(i,)---- surable and H,(X) be the subspace of&(R)
z-i 1+12
consisting of all ‘%,-measurable elements. The
Using the +Fourier transform a, of the bound- problem is to forecast X,,, (z > 0) by using
ary function of y(z) and a process { &} with an element of H,(X). The optimum predictor is
orthogonal increments, we have the canonical obviously equal to E(X,+,I 8,). For a station-
backward moving average representation for ary Gaussian process it has been proved that
the optimum predictor found in H,(X) belongs closely (relative to the A(X)-norm) as possible.
to A’*(X). Therefore the optimum nonlinear The best approximation is the projection of
predictor coincides with the optimum linear S,,, on A&(X), but its expression in terms of
predictor. However, except for stationary the spectral measure becomes extremely com-
Gaussian processes, no systematic approach plicated [S]. This problem is usually discussed
for nonlinear prediction theory has been es- under the assumption that S, and N, are ortho-
tablished so far. (For a typical case that arises gonal. Let us further assume that their spectral
from a stationary Gaussian process - 176 measures are absolutely continuous. The den-
Gaussian Processes H.) sity functions are denoted by fs(A) and j&),
respectively. If {X, 1TV T} is observed, then the
best (linear) approximation S, of S, is given by
E. Interpolation and Filtering cc
S*= e’“‘q,(i)M(dl),
s -m
Interpolation and filtering of stationary pro-
cesses have many similarities with prediction where cp&) =L&MLd4 +f&U). The mean
theory, both in the formulation of the prob- square error E( 1S, - $1’) of this filtering is
lems and in their method of solution.
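For the filtering problem with orthogonal signal and noise and all of $\{X_t\}$ observed, the gain $f_S/(f_S+f_N)$ can be checked against other gains numerically. The sketch below is a hedged illustration: the two spectral densities are hypothetical choices, and the error integrand $|1-\varphi|^2 f_S + |\varphi|^2 f_N$ is the standard expression obtained by writing $S_t - \hat S_t$ in spectral form under the orthogonality assumption. With that integrand, the gain $f_S/(f_S+f_N)$ minimizes the error pointwise, and the minimum integrates to $\int f_S f_N/(f_S+f_N)\,d\lambda$.

```python
import math


def f_signal(lam):
    """Hypothetical signal spectral density (Cauchy-shaped, total mass 1)."""
    return 1.0 / (math.pi * (1.0 + lam * lam))


def f_noise(lam):
    """Hypothetical noise spectral density (broader and weaker than the signal)."""
    return 0.8 / (math.pi * (16.0 + lam * lam))


def wiener_gain(lam):
    """Gain f_S / (f_S + f_N) applied to the spectral component of X at frequency lam."""
    return f_signal(lam) / (f_signal(lam) + f_noise(lam))


def error_density(phi, lam):
    """Integrand of E|S_t - S_hat_t|^2 for a real gain phi, assuming S and N orthogonal."""
    return (1.0 - phi) ** 2 * f_signal(lam) + phi ** 2 * f_noise(lam)


def integrate(g, lo=-200.0, hi=200.0, n=40000):
    """Composite trapezoidal rule on [lo, hi]; the tails beyond are neglected."""
    h = (hi - lo) / n
    total = 0.5 * (g(lo) + g(hi))
    for k in range(1, n):
        total += g(lo + k * h)
    return total * h


def filtering_mse(gain):
    """Mean square error of the filter with the given frequency-domain gain."""
    return integrate(lambda lam: error_density(gain(lam), lam))


def closed_form_mse():
    """Integral of f_S f_N / (f_S + f_N), the error of the optimal gain."""
    return integrate(
        lambda lam: f_signal(lam) * f_noise(lam) / (f_signal(lam) + f_noise(lam))
    )
```

Comparing `filtering_mse(wiener_gain)` with, say, `filtering_mse(lambda lam: 1.0)` (no filtering at all) shows how much of the noise power the optimal gain removes for these particular densities.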
Let $\{X_t\}$ be a weakly stationary process, all of whose values $\{X_t \mid t \notin T_1\}$, $T_1$ some interval, are known with the exception of those at $t \in T_1$. The problem of linear interpolation of the unknown value $X_t$ ($t \in T_1$) is to find the best approximation of this random variable by the limit of linear combinations of the known values. The following example illustrates the problem in the discrete parameter case.
Example. Let $T_1 = \{t_0\}$ and $f(\lambda)\,d\lambda$ be the spectral measure of $\{X_t\}$. The interpolation of $X_{t_0}$ has an error if and only if

$$\int_{-\pi}^{\pi} \frac{1}{f(\lambda)}\,d\lambda < \infty.$$

Expressing $X_t$ in the form (6) with $m = 0$, the best (linear) interpolation $\hat X_{t_0}$ of $X_{t_0}$ is given by

$$\hat X_{t_0} = \int_{-\pi}^{\pi} e^{i\lambda t_0}\left(1 - \frac{2\pi}{f(\lambda)}\left(\int_{-\pi}^{\pi}\frac{d\mu}{f(\mu)}\right)^{-1}\right) M(d\lambda),$$

and the error of the interpolation is expressed by

$$E(|X_{t_0} - \hat X_{t_0}|^2) = 4\pi^2\left(\int_{-\pi}^{\pi}\frac{d\lambda}{f(\lambda)}\right)^{-1}.$$

The problem of interpolation for multivariate stationary processes has also been discussed [7].
The filtering problem originated in communication theory as a technique to extract the relevant component from a received signal with noise [8, 13]. Suppose that a complex-valued stationary process $\{X_t\}$ with continuous parameter is expressed in the form

$$X_t = \int_{-\infty}^{\infty} e^{i\lambda t}\,M(d\lambda) = S_t + N_t,$$

where $(S_t, N_t)$ is a (2-dimensional) weakly stationary process with mean vector 0. Here, $S_t$ and $N_t$ indicate the signal and noise, respectively. The filtering problem is to find the element of $\mathcal{M}_t(X)$ that approximates $S_{t+\tau}$ as

F. Strongly Stationary Processes and Flows

Let $\{X_t(\omega)\}$ ($t \in T$, $\omega \in \Omega(\mathfrak{B}, P)$) be a strongly stationary process. To study it we take the coordinate representation of $\{X_t\}$ as follows. Let $\Omega$ be the complex vector space $\mathbf{C}^T$, $\mathfrak{B}$ the σ-algebra generated by the Borel †cylinder sets, $X_t(\omega)$ the $t$th coordinate of the function $\omega \in \mathbf{C}^T$, and $P$ the probability distribution of the process $\{X_t\}$ defined on $(\Omega, \mathfrak{B})$. Define the shift transformation $S_t$ of $\Omega$ onto itself by $(S_t\omega)(s) = \omega(s+t)$. Then $\{S_t\}$ forms a group of †measure-preserving transformations on $\Omega(\mathfrak{B}, P)$ (→ 136 Ergodic Theory) since $\{X_t\}$ is strongly stationary. Thus we are given a (measure-preserving) †flow $\{S_t\}$ ($t \in T$) on $\Omega(\mathfrak{B}, P)$. Conversely, if $\{S_t\}$ ($t \in T$) is a (measure-preserving) flow on a probability space $\Omega(\mathfrak{B}, P)$, then $\{X_t\}$, given by $X_t(\omega) = f(S_t\omega)$, is a strongly stationary process, provided that $f$ is measurable. Many properties of a strongly stationary process are closely related to those of the corresponding flow. For example, the †strong law of large numbers for a strongly stationary process follows from †Birkhoff's individual ergodic theorem for flows. †Ergodicity, several kinds of †mixing properties, and the spectral properties of a strongly stationary process are defined in accordance with the respective notions for the corresponding flow. Now we give some examples of flows corresponding to strongly stationary processes.
(1) If $X_t$ ($t \in \mathbf{Z}$) are mutually independent and have the same probability distribution, then the process $\{X_t\}$ ($t \in \mathbf{Z}$) is strongly stationary and $\bigcap_t \mathfrak{B}_t$ is trivial (for the definition of $\mathfrak{B}_t$ → D). Hence the corresponding flow is a †Kolmogorov flow.
(2) Similarly to (1) the flow corresponding H. Strongly Stationary Random Distributions
to iGaussian white noise (- 176 Gaussian
Process) is also a Kolmogorov flow. Let % be the space of all C”-functions with
(3) The mixing properties of the flow corre- compact support and 3’ be the space of tdistri-
sponding to a stationary Gaussian process is butions. If X,+,(w) is defined for w~R(23, P) and
determined by the smoothness of its spectral q E 9, and if for almost all w, X,(w) belongs to
measure F(di). The flow is ergodic if and only 9’ as a linear functional of cp, then {X,} is
if F is continuous (i.e., F has no point mass). In called a random distribution. Suppose that the
this case, the flow is also tweakly mixing. For joint distribution of the random variables
the flow to be tstrongly mixing, it is necessary x Th’Pl’ x4(p2’ . ../ XThVn(r,,(~(t)=p(t-hh)) is inde-
and sufficient that the covariance function p(t) pendent of h. Then we call {X,(w)} a strongly
of the process tend to zero as 1t I+ io. In this (or strictly) stationary random distribution. If
case the flow is +mixing of all orders (- 136 we identify random distributions that have the
Ergodic Theory) [ 14- 161. same probability law, then {X,} is determined
by the characteristic functional

C(q) = E(eiX.).
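The role of point masses in the spectral measure of a stationary Gaussian process can be seen in a very small example. The sketch below is an illustration under stated assumptions, not a construction taken from the text: it uses the real harmonic process $X_t = A\cos\lambda_0 t + B\sin\lambda_0 t$ ($t \in \mathbf{Z}$) with $A$, $B$ independent standard Gaussian variables, whose covariance is $\cos\lambda_0 t$ and whose spectral measure is concentrated at $\pm\lambda_0$. For fixed amplitudes the time average of $X_t^2$ equals $(A^2+B^2)/2$, which changes from one sample path to another and so cannot agree with the ensemble value $E(X_t^2) = 1$ for all paths. The names `harmonic_path`, `time_average_power`, `LAM0`, and `N` are hypothetical.

```python
import math


def harmonic_path(a, b, lam0, n):
    """Sample path X_t = a*cos(lam0*t) + b*sin(lam0*t), t = 0..n-1, for fixed amplitudes."""
    return [a * math.cos(lam0 * t) + b * math.sin(lam0 * t) for t in range(n)]


def time_average_power(path):
    """Time average of X_t^2 along one sample path."""
    return sum(x * x for x in path) / len(path)


LAM0 = 2.0 * math.pi / 8.0  # one full period every 8 time steps
N = 8 * 1000                # a whole number of periods

if __name__ == "__main__":
    # Two different realizations of the amplitudes (A, B) give two different limits.
    for a, b in ((1.0, 0.0), (2.0, 1.0)):
        print((a, b), time_average_power(harmonic_path(a, b, LAM0, N)), (a * a + b * b) / 2)
```

Averaging over a whole number of periods makes the time average equal to $(A^2+B^2)/2$ up to rounding, so the dependence on the realization is visible without any random sampling.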
G. Analytic Properties of Sample Functions of Stationary Processes

In the continuous parameter case, we always assume that †continuity in probability holds for strongly stationary processes and †mean square continuity holds for weakly stationary processes. Hence the processes discussed here are all continuous in probability, and without loss of generality we can assume that the stationary processes are †separable and †measurable (→ 407 Stochastic Processes).
Let $\{X_t\}$ be a weakly stationary process. Assume that the moments up to order $2n$ of the spectral measure $F(d\lambda)$ are all finite. Then almost all †sample functions of $\{X_t\}$ are $n-1$ times continuously differentiable, and almost all sample functions of $\{X_t^{(n-1)}\}$ are absolutely continuous. Define the spectral distribution function $F(\lambda) = F((-\infty, \lambda])$ for the spectral measure $F(d\lambda)$ of $\{X_t\}$. If $F$ satisfies the condition (16) for a nonnegative integer $r$, then almost all sample functions of $\{X_t\}$ have continuous $r$th derivatives. In particular, if the condition (16) is satisfied for $r = 0$, then almost all sample functions are continuous. Conditions for Hölder continuity of almost all sample functions of a weakly stationary process have also been obtained [17, 18]. (For sample functions of stationary Gaussian processes → 176 Gaussian Processes F.)
For a strongly stationary process $\{X_t\}$ with $E(X_t) = 0$ and finite $E(|X_t|^2)$, the sample covariance function

$$R(t) = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} X_{s+t}(\omega)\,\overline{X_s(\omega)}\,ds$$

is determined with probability 1. We can therefore apply the theory of †generalized harmonic analysis, due to Wiener [9]. (Further results on sample function properties are found in [19].)

For $\{X_\varphi\}$ to be strictly stationary it is necessary and sufficient that the equality $C(\tau_h\varphi) = C(\varphi)$ hold. The simplest example of a strictly stationary random distribution is the Gaussian white noise (→ 176 Gaussian Processes, 341 Probability Measures) [20].

I. Generalizations of Stationary Processes

The concept of stationary processes is generalized in many directions.
(1) Let $T$ be a set different from $\mathbf{R}$ or $\mathbf{Z}$, and suppose that there is given a group $G$ of transformations that map $T$ onto itself. If a family $\{X_t\}$ of random variables with parameter $t \in T$ has the property that for every choice of random variables $X_{t_1}, X_{t_2}, \ldots, X_{t_n}$, the joint distribution of $(X_{gt_1}, \ldots, X_{gt_n})$ is always independent of $g \in G$, then $\{X_t\}$ ($t \in T$) is said to be a strictly $G$-stationary system of random variables. Similarly, a weakly $G$-stationary system of random variables can be defined [21, 22].
(2) Let $T$ be a Riemannian space, and let $G$ be the group of all isometric transformations on $T$ or one of its subgroups. Suppose that a †tensor field $X_t(\omega)$ of constant rank is associated with any $\omega \in \Omega(\mathfrak{B}, P)$ at every point $t$. Then $X(\omega) = \{X_t(\omega) \mid t \in T\}$ is called a random tensor field over the Riemannian space $T$. Any $g \in G$ induces an isometric transformation of the tangent vector space at $t$ to that at $gt$. Hence $g$ maps a tensor field $X(\omega)$ to another tensor field $gX(\omega)$ for every $\omega$. If $X(\omega)$ and $gX(\omega)$ have the same probability law, then $X(\omega)$ is said to be strictly $G$-stationary. $X(\omega)$ is defined to be weakly $G$-stationary in a similar way [21, 22].
(3) In the same way as we extended stochastic processes to random distributions, we can generalize random tensor fields to random currents and discuss stationary random currents [21].
(4) Stochastic process with stationary increments of order $n$. Assume that $\{X_t\}$ ($t \in \mathbf{R}$) is not necessarily a stationary process but that the $n$th-order increment of $X_t$ is stationary. Then by taking the $n$th derivative $D^nX_t$ in the sense of random distributions, we obtain a stationary random distribution. From the properties of $D^nX_t$ we can investigate the original process itself. Brownian motion is an example of a stochastic process with stationary increments of order 1.
(5) Weakly stationary processes of degree $k$. A weakly stationary process is a process whose moments up to order 2 are stationary. Generalizing this, we can define a weakly stationary process of degree $k$ by requiring the moments up to order $k$ to be stationary. We can obtain more detailed properties of such processes than those of weakly stationary processes [23].

References

[1] J. L. Doob, Stochastic processes, Wiley, 1953.
[2] U. Grenander and M. Rosenblatt, Statistical analysis of stationary time series, Wiley, 1957.
[3] K. Itô, Stationary random distributions, Mem. Coll. Sci. Univ. Kyôto, (A) 28 (1954), 209-223.
[4] K. Karhunen, Über die Struktur stationärer zufälliger Funktionen, Ark. Mat., 1 (1950), 141-160.
[5] H. Cramér, On the linear prediction problem for certain stochastic processes, Ark. Mat., 4 (1963), 45-53.
[6] N. Wiener and P. Masani, The prediction theory of multivariate stochastic processes I, II, Acta Math., 98 (1957), 111-150; 99 (1958), 93-137.
[7] Yu. A. Rozanov, Stationary random processes, Holden-Day, 1967. (Original in Russian, 1963.)
[8] N. Wiener, Extrapolation, interpolation, and smoothing of stationary time series, MIT Press, 1949.
[9] N. Wiener, Generalized harmonic analysis, Acta Math., 55 (1930), 117-258.
[10] N. Wiener, The homogeneous chaos, Amer. J. Math., 60 (1938), 897-936.
[11] N. Wiener, Nonlinear problems in random theory, MIT Press, 1958.
[12] P. Masani and N. Wiener, Non-linear prediction, in Probability and Statistics: The H. Cramér Volume, Wiley, 1959, 190-212.
[13] A. Blanc-Lapierre and R. Fortet, Théorie des fonctions aléatoires, Masson, 1953.
[14] G. Maruyama, The harmonic analysis of stationary stochastic processes, Mem. Fac. Sci. Kyushu Univ., (A) 4 (1949), 45-106.
[15] S. Kakutani, Spectral analysis of stationary Gaussian processes, Proc. 4th Berkeley Symp. Math. Stat. Prob. II, Univ. of California Press (1961), 239-247.
[16] H. Totoki, The mixing property of Gaussian flows, Mem. Fac. Sci. Kyushu Univ., (A) 18 (1964), 136-139.
[17] I. Kubo, On a necessary condition for the sample path continuity of weakly stationary processes, Nagoya Math. J., 38 (1970), 103-111.
[18] T. Kawata and I. Kubo, Sample properties of weakly stationary processes, Nagoya Math. J., 39 (1970), 7-21.
[19] H. Cramér and M. R. Leadbetter, Stationary and related processes, Wiley, 1967.
[20] I. M. Gel'fand and N. Ya. Vilenkin, Generalized functions IV, Academic Press, 1964. (Original in Russian, 1961.)
[21] K. Itô, Isotropic random current, Proc. 3rd Berkeley Symp. Math. Stat. Prob. II, Univ. of California Press (1956), 125-132.
[22] A. M. Yaglom, Second-order homogeneous random fields, Proc. 4th Berkeley Symp. Math. Stat. Prob. II, Univ. of California Press (1961), 593-622.
[23] A. N. Shiryaev, Some problems in the spectral theory of higher-order moments I, Theory Prob. Appl., 5 (1960), 265-284. (Original in Russian, 1960.)

396 (XVIII.3)
Statistic

A. General Remarks

A statistic is a function of a value (i.e., a sample value) observed in the process of statistical inference (→ 401 Statistical Inference). A statistic is used for two purposes: (a) to characterize the set of observed values or sample values, and (b) to summarize the information contained in the sample about the unknown parameters of the population from which it is assumed to have been drawn.

B. Samples and Statistics

The basic concepts in statistical inference are †population and †sample. Let $(\Omega, \mathfrak{B}, P)$ be a †probability space, where $P$ is a †probability measure on $\mathfrak{B}$. A †random variable $X$ defines a 1-dimensional probability distribution $\Phi(A) = P\{\omega \mid X(\omega) \in A\}$, where $A$ is a 1-dimensional †Borel set, which gives rise to a 1-dimensional probability space $(\mathbf{R}, \mathfrak{B}^1, \Phi)$. Here $\mathfrak{B}^1$ is the
family of all l-dimensional +Borel sets. Let certain measurable space (%, &) and a family
x*,x,,..., X,, be tindependent random vari- 9 = {P, 10~ O} of probability measures on d.
ables with identical l-dimensional distribu- A statistic, in general, is a random variable
tions. The +n-dimensional random variable expressed as Y=f(X) by a measurable func-
X =(X, , X,, . , X,) is called a random sample tion f defined on a sample space (F, Se) taking
of size n from the population (Ω, 𝔅, P). In values in another measurable space (𝒴, 𝒞).
particular, when each of X,, . . . , X, takes only When (“Y, q) is (R, 3”) or (R”, a”), Y=f(X)
two values (usually 0 and l), the sample is is accordingly called a l-dimensional or n-
called a Bernoulli sample or a sequence of dimensional statistic.
Bernoulli trials. Generally, if Qn is the tn-
.dimensional probability distribution deter-
mined by X (i.e., the direct product of n copies C. Population and Sample Characteristics in
of the l-dimensional probability distribu- the l-Dimensional Case
tion Q), then the n-dimensional probability
space (R”, &J!“, @,.), where !F is the family In a l-dimensional probability space (R,
of n-dimensional Bore1 sets, is called an n- @, PO) the following quantities, called popu-
dimensional sample space. A point belonging lation characteristics, are used to characterize
to the set of actually observed values of the the population distribution PO: Letting F(z) =
sample X, which is a random variable by P3(( -co, z]) be the tdistribution function of
definition, is called a sample value and is de- P,,, we use the population mean p = jz dF(z);
noted by x. Thus the sample value can be the population variance 0’ = s(z - p)’ dF(z); the
expressed as x=X(w) (weQ) and regarded as population standard deviation cr( > 0); the popu-
a point in the sample space (sample point). The lation moment of order k p; = s zk dF(z) (p’, = p);
basic underlying structure which determines the kurtosis p4/04; the coefficient of excess
the probability distribution is the set Q, which (p4/g4) - 3; the skewness p3/03; the cc-quantile
we can view as describing the physical struc- or 100QO-point m satisfying F(m - 0) < CY<
ture of the observed phenomena, but statistical F(m + 0); the median, which is the SO%-point;
procedures are always carried out through the the first and third quartiles, which are the
observations of samples, and R itself is often 25x-point and 75x-point, respectively; the
disregarded. The l-dimensional probability range, which is the third quartile minus the
distribution Q, (the n-dimensional probability first quartile; and the mode, which is the value
distribution a’, determined by X) is called the or values of i for which dF(z)/dz attains its
population distribution in the l-dimensional (n- maximum.
dimensional) sample space, since it is induced Sometimes the kurtosis and others are
from the probability measure on (Q &9). called population kurtosis, etc. Here the word
A statistic Y is a random variable expressed “population” is used when it is desirable to
as Y=f(X), where f is a tmeasurable func- distinguish population characteristics from the
tion from the sample space (R”, S?“, @J into a sample characteristics defined in 341 Proba-
measurable space (R, gi). The value of the bility Measures.
statistic Y corresponding to a sample value x Letx=(x,,...,x,)beapointofann-
of the sample X is denoted by y =f(x). dimensional sample space (a sample value).
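The population characteristics listed in Section C can be computed exactly for a small discrete population. The sketch below is illustrative and uses a fair die as a hypothetical population distribution. Two readings are assumptions: $\mu_k$ (without a prime) is taken to be the central moment of order $k$, and the quantile condition is read as $F(m-0) \le \alpha \le F(m+0)$. All function names are hypothetical.

```python
from fractions import Fraction
import math


def raw_moment(values, probs, k):
    """Population moment of order k: sum of z^k with weights dF(z)."""
    return sum(p * Fraction(v) ** k for v, p in zip(values, probs))


def central_moment(values, probs, k):
    """Moment of order k about the population mean."""
    mu = raw_moment(values, probs, 1)
    return sum(p * (Fraction(v) - mu) ** k for v, p in zip(values, probs))


def quantile(values, probs, alpha):
    """Smallest m with F(m) >= alpha; it satisfies F(m - 0) <= alpha <= F(m + 0)."""
    cumulative = Fraction(0)
    for v, p in sorted(zip(values, probs)):
        cumulative += p
        if cumulative >= alpha:
            return v
    return max(values)


def characteristics(values, probs):
    """Mean, variance, standard deviation, skewness, kurtosis, excess, and median."""
    mean = raw_moment(values, probs, 1)
    variance = central_moment(values, probs, 2)
    sd = math.sqrt(float(variance))
    kurtosis = central_moment(values, probs, 4) / variance ** 2
    return {
        "mean": mean,
        "variance": variance,
        "standard deviation": sd,
        "skewness": float(central_moment(values, probs, 3)) / sd ** 3,
        "kurtosis": kurtosis,
        "coefficient of excess": kurtosis - 3,
        "median": quantile(values, probs, Fraction(1, 2)),
    }


DIE_VALUES = [1, 2, 3, 4, 5, 6]
DIE_PROBS = [Fraction(1, 6)] * 6

if __name__ == "__main__":
    for name, value in characteristics(DIE_VALUES, DIE_PROBS).items():
        print(name, value)
```

Exact rational arithmetic is used so that the cumulative probabilities in `quantile` are compared without rounding. When the quantile is not unique (as for the median of a fair die, where every point of [3, 4] qualifies), the function returns the smallest admissible value.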
When we deal with a statistical problem we often have no exact knowledge of the population distribution $\Phi$ ($\Phi_n$) except that it belongs to a family $\mathcal{P} = \{P_\theta \mid \theta \in \Theta\}$ of probability measures on $\mathfrak{B}^1$ ($\mathfrak{B}^n$). We call $\theta$ the parameter of the probability distribution and $\Theta$ the parameter space. The typical cases described in this section can be extended as follows: (1) The distribution $\Phi$ may be an $r$-dimensional probability distribution. In this case a sample of size $n$ induces an $nr$-dimensional sample space. (2) Random variables $X_1, \ldots, X_n$, being mutually independent, may not have identical distributions. (3) Random variables $X_1, \ldots, X_n$ may not be mutually independent. In both cases (2) and (3) the sample space is of the form $(\mathbf{R}^n, \mathfrak{B}^n, \Phi_n)$, but $n$ may not be the sample size itself, nor $\Phi_n$ be the direct product of $n$ copies of identical 1-dimensional components. (4) The most general sample space is expressed as a

Corresponding to each 1-dimensional Borel set $A$, the number of components of $x$ that belong to $A$ is called the frequency of $A$ in the sample value $x = (x_1, \ldots, x_n)$, and (frequency)/$n$ is called the relative frequency of $A$. If we take $A = (-\infty, z]$ and regard its relative frequency $F_n(z)$ as a function of $z$, it becomes a †distribution function for every $x \in \mathbf{R}^n$, called the empirical distribution function based on $x$. Various characteristics can be defined from the empirical distribution function in exactly the same way as population characteristics are derived from a population distribution function. These are called sample characteristic values and can be expressed as functions of $x_1, \ldots, x_n$.
Assuming that $x = (x_1, \ldots, x_n)$ is a sample value of a sample $X = (X_1, \ldots, X_n)$, the statistic obtained by substituting $X$ for $x$ in the function denoting a sample characteristic value is
called a sample characteristic and given the same name as the corresponding population characteristic, except that the word "population" is replaced by the word "sample." A sample characteristic is a function of random variables. Hence it is also a random variable, and the problem of deriving its probability distribution from the assumed population distribution is called that of sampling distribution (→ 374 Sampling Distributions). Thus we define the sample mean $\bar X = \sum_{i=1}^n X_i/n$; the sample variance $\sum_{i=1}^n (X_i - \bar X)^2/n$ (sometimes $\sum_{i=1}^n (X_i - \bar X)^2/(n-1)$ is taken as the sample variance); the sample standard deviation, which is the positive square root of the sample variance; the sample mode, which is the value taken by the largest number of $X_i$; and the sample moment of order $k$, $\sum_{i=1}^n (X_i - \bar X)^k/n$.
Among other statistics of frequent use are the order statistic, i.e., the set of values of $X_1, \ldots, X_n$ arranged in order of magnitude and usually denoted by $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$. Various other statistics are defined in terms of order statistics: the sample median $X_{\mathrm{med}} = X_{((n+1)/2)}$ for odd $n$ and $= (X_{(n/2)} + X_{((n/2)+1)})/2$ for even $n$, the sample range $R = \max X_i - \min X_i = X_{(n)} - X_{(1)}$, and so on. The empirical distribution function $F_n(z)$ or its standardized form $S_n(z) = \sqrt{n}\{F_n(z) - F(z)\}$ can also be considered to be a function of the order statistics, and hence is a statistic taking values in the space of functions of a real variable. So is the empirical characteristic function

$$\varphi_n(t) = \int \exp(itz)\,dF_n(z) = \sum_j \exp(itX_{(j)})/n.$$

In a sequence of Bernoulli trials, a set of successive components with an identical value is called a run. For example, (01100010) has a run of 0 of the length 3 and a run of 1 of the length 2.
Among the statistics listed in the previous paragraphs, the order statistic is an $n$-dimensional statistic and all others are 1-dimensional.

D. Other Cases

Let $(\mathbf{R}^2, \mathfrak{B}^2, P_\theta)$ be a 2-dimensional probability space with a 2-dimensional population distribution $P_\theta$, and let $(X_1, \ldots, X_n)$ ($X_i = (U_i, V_i)$) be a random sample of size $n$ from $P_\theta$. In this case also, the population characteristics for the †marginal distributions of $U_i$ and $V_i$ and sample characteristics for $(U_1, \ldots, U_n)$ and $(V_1, \ldots, V_n)$ are defined as in Section C.
As an index for association between $U_i$ and $V_i$ the population covariance $\iint (u - \mu_{(1)})(v - \mu_{(2)})\,dF(u, v)$ of $U_i$ and $V_i$ and the population correlation coefficient, which is equal to (population covariance)/$\sigma_{(1)}\sigma_{(2)}$, are defined. Here $F(u, v)$ is the joint distribution function of $U_i$ and $V_i$, $\mu_{(1)}$ and $\mu_{(2)}$ are the respective population means of $U_i$ and $V_i$, and $\sigma_{(1)}$ and $\sigma_{(2)}$ are the respective standard deviations. As corresponding sample characteristics, we have the sample covariance $\sum_{i=1}^n (U_i - \bar U)(V_i - \bar V)/n$ of $(U_1, \ldots, U_n)$ and $(V_1, \ldots, V_n)$ and the sample correlation coefficient

$$\frac{\sum_{i=1}^n (U_i - \bar U)(V_i - \bar V)}{\sqrt{\sum_{i=1}^n (U_i - \bar U)^2\,\sum_{i=1}^n (V_i - \bar V)^2}}.$$

Similarly, statistics of the samples from a population of $k$-dimensional distribution ($k \ge 3$) can be defined (→ 280 Multivariate Analysis). More generally, in statistical inference we encounter samples where observed values may not be mutually independent or identically distributed, but have more complicated probability structures. Statistics as functions of such samples are also considered.

E. General Properties of Statistics

The general theory of statistics has been studied in a measure-theoretic framework. $(\mathcal{X}, \mathcal{A}, \mathcal{P})$ is called a statistical structure, where $(\mathcal{X}, \mathcal{A})$ is a measurable space and $\mathcal{P}$ is a family of probability measures on $(\mathcal{X}, \mathcal{A})$. A σ-subfield $\mathcal{B}$ of $\mathcal{A}$ (hereafter abbreviated σ-field) is called sufficient for $\mathcal{P}$ if for any $A \in \mathcal{A}$ there exists a $\mathcal{B}$-measurable conditional probability of $A$ independent of $P_\theta \in \mathcal{P}$, that is, a $\mathcal{B}$-measurable function $\varphi_A(x)$ satisfying

$$P_\theta(A \cap B) = \int_B \varphi_A(x)\,dP_\theta(x)$$

for all $P_\theta \in \mathcal{P}$ and $B \in \mathcal{B}$.
For any two σ-fields $\mathcal{B}_1$ and $\mathcal{B}_2$, the notation $\mathcal{B}_1 \subset \mathcal{B}_2\ [\mathcal{P}]$ means that to each set $A_1$ in $\mathcal{B}_1$ there corresponds an $A_2$ in $\mathcal{B}_2$ satisfying $P_\theta((A_1 - A_2) \cup (A_2 - A_1)) = 0$ for all $P_\theta \in \mathcal{P}$. When the reverse relation $\mathcal{B}_2 \subset \mathcal{B}_1\ [\mathcal{P}]$ also holds, we write $\mathcal{B}_1 = \mathcal{B}_2\ [\mathcal{P}]$.
For a statistic $t$ which is a measurable function from $(\mathcal{X}, \mathcal{A})$ to $(\mathcal{Y}, \mathcal{C})$, $\mathcal{B}(t) = \{t^{-1}(C) \mid C \in \mathcal{C}\}$ is a σ-field and is called the σ-field induced by $t$. If $\mathcal{B}(t)$ is sufficient for $\mathcal{P}$, $t$ is said to be sufficient for $\mathcal{P}$. Since sufficiency of a statistic means that of a σ-field, we consider only sufficiency of a σ-field.
$\mathcal{B}$ is called necessary if for any sufficient $\mathcal{B}_0$ we have $\mathcal{B} \subset \mathcal{B}_0\ [\mathcal{P}]$. A necessary and sufficient σ-field is called a minimal sufficient σ-field. A
necessary and sufficient statistic is also called for all AEF-, and p(A-S)=O for all Ae,F
a minimal sufficient statistic. Such a statistic implies p(E - S) = 0. A a-finite measure is
does not always exist; .%32containing a sufh- localizable. A measure p is said to have the
cient 28i is not always sufficient. finite subset property if for any A satisfying 0 <
D is said to be complete if for every 8- p(A), there exists a B c A satisfying 0 <p(B) <
measurable integrable function cp, Jx Q(X)&‘,(X) co. A statistical structure (S, &, 9) is said to
=Oforall P,EYimplies PJ{xlcp(x)#O})= be weakly dominated if 9 is dominated by a
0 for all P,EY. 93 is said to be boundedly com- localizable measure p with the finite subset
plete if for every bounded &?-measurable cp, property and a density dP,/dp exists for all
~.r(p(x)dPo(x)=O for all Po~9 implies P,({xl Polyp. In this case a minimal sufficient a-field
q(x)#O})=O for all P,eY. When 3(t) is exists, and a pairwise sufficient o-field is suffi-
(boundedly) complete, t is called (boundedly) cient. For example, let .d = 2” and B be the
complete. If .%, c .GY2and 9& is (boundedly) set of all discrete probability measures on .d.
complete, 93, is also (boundedly) complete. If 9 is weakly dominated by the counting mea-
.%, is (boundedly) complete and sufficient and sure p which is localizable on J&‘.
9& is minimal sufficient, we have 9, = 9J2 [Yj. The order statistic is sufficient if 9 is domi-
nated, Xi, , X, are mutually independent
and identically distributed random variables
F. Dominated Statistical Structure with Z = R”, and each Ps 6.9 is invariant under
every permutation of the components of the
When all P,,E~ are absolutely continuous with points x=(x, , , x,) in 3. Moreover, the
respect to a o-finite measure i on .d, then order statistic is complete if 9 is large enough.
(Z, :d, 9) is said to be a dominated statistical For example, we have the following theorem:
structure and .oP is said to be a dominated The order statistic is complete if every P,” (the
family of probability distributions. In this case, component of P. on a0 c @, f3E 0) is ab-
Ps has the density ,&(x) = dP,/di with respect to solutely continuous with respect to the Lebes-
i by the Radon-Nikodym theorem. If SZ?is gue measure I on R and { Pj 10~ 0) contains
separable, 9 is a separable metric space with all Pi for which g(z) = dPi/dl is constant on
respect to the metric p(Pol, PB,)=supBt d IPo,(B) some finite disjoint intervals in .SYO c R. A
-Pe2(B)I. There exists a countable subset Y= similar result holds for discrete distributions.
{P,,,Po2,...) of 9 such that P,(N)=0 for all We call 0 a selection parameter when ,/i(x)
P,EY implies P,(N)=0 for all P,ES. If we put = c(f?)~,~(x)h(x), where h(x) is a positive a-
i.,=CiciPol, c,>O, Cici= 1, &, dominates 9, measurable function, xER(x) is the indicator
and if .?8 is sufficient for Y we can choose a 1?A- function of a set &E&Y, and c(0) is a constant
measurable version of dP,/di,,. Conversely, if depending on 0. Here 0 determines the carrier
there exists a a-finite measure i such that we E, of f&x) but does not essentially affect the
can choose a g-measurable version of dP,/di functional form of f&x). A necessary and suffi-
for all PotzY, then 98 is sufficient. cient statistic is given by t*(x)= n {Es1 E,ZIX,
If 9 is dominated by a a-finite 2, .“A is suffi- P,E.~‘}. Here the class of sets of the form given
cient if and only if there exist a d-measurable in the right-hand side of this expression is
g,, and an .d-measurable h independent of 0 takenasY,andweset%‘={ClCc:Y,t*~‘(C)E
satisfying 28). We call t*(x) the selection statistic. Two
examples follow.
dpo
-=g& a.e. (&,A) for all PoE9. (I) TUniform distributions. Let 0 = {(a, /J) I
di
--x <a<B< a}, T,,=R, Pj be the uniform
This is called Neyman’s factorization theorem. distribution on (a, /J’), and X=(X,. , X,) be a
With a dominated statistical structure, there random sample of size n having P,p as its popu-
exists a minimal sufficient c-field, and a o-field lation distribution. Then E, = {x 1(x,< mini xi <
containing a sufficient o-held is also sufficient. max,x,<fi} and fk(x)=(B-a)~“~l,,(x). If we
We say that a o-field g is pairwise sufficient put t(x)=(minixirmaxixi), Y=R’, and %=
for 9 if it is sufficient for every pair {PO,, P,,} the set of all Bore1 sets of R2, it follows that
of measures in .Y. A necessary and sufficient r(x) = t*(x) [Y], where t*(x) is the selection
condition for 9 to be sufficient for a domi- statistic. Hence t(x) itself is necessary and
nated set B is that 3 be pairwise sufficient for sufficient.
.“p. (II) tExponentia1 distributions. Put 0 =
Recently, a more general statistical struc- (-co,co),TO=R,and
ture has been studied. Put =,(p)= {A 1A~sd,
p(A) < a}. A measure p on d is said to be a
localizable measure if there exists ess-sup 9( p)
for any subfamily B c &Jp), that is, if there where r is a known constant, and let X =
exists a set E E .d such that p(A - E) = 0 holds (X,, , X,) be a random sample of size n
having a population distribution with density function $g_\theta(z)$. Then

$$E_\theta = \{x \mid \theta < \min_i x_i\},$$

$$f_\theta(x) = \alpha^n e^{n\alpha\theta}\,\chi_{E_\theta}(x)\exp\Bigl(-\alpha\sum_{i=1}^{n} x_i\Bigr).$$

If we put $t(x) = \min_i x_i$ and let $t^*(x)$ be the selection statistic, it follows that $t(x) = t^*(x)$ $[\mathcal{P}]$, and $t(x)$ is necessary and sufficient. If $\Theta = \{(\alpha, \theta) \mid 0 < \alpha < \infty,\ -\infty < \theta < \infty\}$, then $t(x) = (\min_i x_i, \sum_i x_i)$ is a necessary and sufficient statistic.

G. Exponential Families of Distributions

A dominated $\mathcal{P}$ is called an exponential family of distributions if and only if $f_\theta(x) = dP_\theta/d\lambda$ can be expressed in the form

$$f_\theta(x) = \exp\Bigl\{\sum_{j=1}^{k} s_j(x)\alpha_j(\theta) + \alpha_0(\theta) + s_0(x)\Bigr\}, \quad x \in \mathcal{X},\ \theta \in \Theta, \tag{1}$$

where the $s_j(x)$ ($j = 0, 1, \dots, k$) are real-valued $\mathcal{A}$-measurable functions and the $\alpha_j(\theta)$ ($j = 0, 1, \dots, k$) are constants depending on $\theta$. If there exists a sufficient statistic for $\mathcal{P}$ that is not equivalent to but is in a certain sense simpler than the sample itself or the order statistics, then it can be shown under some regularity conditions that $\mathcal{P}$ must be an exponential family. The following theorem provides an instance of the hypotheses that guarantee such a conclusion: Let $X$ be a sample from a 1-dimensional probability space $(\mathcal{X}_0, \mathcal{A}_0, P_\theta^0)$ with $P_\theta^0$ the population distribution, where $\mathcal{X}_0$ is a finite or infinite interval in $\mathbf{R}$ and $\mathcal{A}_0$ is the class of all Borel sets. Let $l$ denote the Lebesgue measure. Assume that $\{P_\theta^0\}$ is dominated by $l$ and $g_\theta(z) = dP_\theta^0/dl$ is greater than a positive constant and continuously differentiable in $z$ on $\mathcal{X}_0$. Assume further that there exists a sufficient statistic $t(x)$ with the property that for each open subset $B$ of $\mathcal{X}$ ($\subset \mathbf{R}^n$) and $\lambda$-null set $N$ there are two points $x \ne x'$ in $B - N$ such that $t(x) \ne t(x')$. Then $\mathcal{P}$ is an exponential family, and the $k$ given in (1) is less than $n$. Similar results are known also for cases where $X_1, \dots, X_n$ are not identically distributed.

It is evident from the construction of a necessary and sufficient statistic that the statistic $t(x) = (s_1(x), \dots, s_k(x))$, where the $s_j(x)$ are those appearing in (1), is sufficient for an exponential family and necessary if $\alpha_1(\theta), \dots, \alpha_k(\theta)$ are linearly independent. If $\{(\alpha_1(\theta), \dots, \alpha_k(\theta)) \mid \theta \in \Theta\}$ contains a $k$-dimensional interval, $t(x)$ is complete. The distribution of $t(x)$ is of exponential type. When $X_1, \dots, X_n$ are mutually independent with a common distribution of exponential type, the distribution of $X = (X_1, \dots, X_n)$ is of exponential type, and vice versa. The family (1) of distributions is a special form of †Pólya-type distributions, and various distributions given in (III)-(VII) below are written in this form. In the following examples, for a sample of size $n$ from the specified distribution, $f_\theta(x)$ is the density with respect to Lebesgue measure in (III), (IV), and (V) and to counting measure in (VI) and (VII) (→ Appendix A, Table 22).

(III) †Normal distributions $N(\mu, \sigma^2)$, $\mathcal{X}_0 = (-\infty, \infty)$.

$$f_\theta(x) = \exp\Bigl\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n} x_i^2 + \frac{\mu}{\sigma^2}\sum_{i=1}^{n} x_i - \frac{n\mu^2}{2\sigma^2} - n\log\bigl(\sqrt{2\pi}\,\sigma\bigr)\Bigr\}.$$

(IV) †$\Gamma$-distributions $\Gamma(p, \sigma)$, $\mathcal{X}_0 = (0, \infty)$.

$$f_\theta(x) = \exp\Bigl\{(p - 1)\sum_{i=1}^{n}\log x_i - \frac{1}{\sigma}\sum_{i=1}^{n} x_i - np\log\sigma - n\log\Gamma(p)\Bigr\}.$$

(V) †Exponential distributions $e(\mu, \sigma)$, $\mathcal{X}_0 = (\mu, \infty)$.

$$f_\theta(x) = \exp\Bigl\{-\frac{1}{\sigma}\sum_{i=1}^{n} x_i - n\log\sigma + \frac{n\mu}{\sigma}\Bigr\}.$$

(VI) †Binomial distributions $\mathrm{Bin}(N, p)$, $\mathcal{X}_0 = \{0, 1, \dots, N\}$.

$$f_\theta(x) = \exp\Bigl\{\sum_{i=1}^{n} x_i \log\frac{p}{1 - p} + nN\log(1 - p) + \sum_{i=1}^{n}\log\binom{N}{x_i}\Bigr\}.$$

(VII) †Poisson distributions $P(\lambda)$, $\mathcal{X}_0 = \{0, 1, 2, 3, \dots\}$.

$$f_\theta(x) = \exp\Bigl\{\sum_{i=1}^{n} x_i\log\lambda - \sum_{i=1}^{n}\log(x_i!) - n\lambda\Bigr\}.$$

H. Ancillary Statistics

A statistic $t(x)$ is called an ancillary statistic when for every element $A$ in $\mathcal{B}(t)$, $P_\theta(A)$ is independent of $\theta$, or in other words, when the distribution of $t(x)$ is independent of $\theta$. A sufficient condition for a statistic to be ancillary is that it is independent of some sufficient statistic. Conversely, an ancillary statistic is independent of all boundedly complete sufficient statistics.

I. Invariant Statistics

Suppose that we are given groups of one-to-one measurable transformations $G$ and $\bar{G}$ on
$\mathcal{X}$ and $\Theta$, respectively. Suppose also that we are given a †homomorphism $g \to \bar{g}$ from $G$ to $\bar{G}$ satisfying $P_\theta(g^{-1}B) = P_{\bar{g}\theta}(B)$. In this case $\mathcal{P} = \{P_\theta \mid \theta \in \Theta\}$ is called G-invariant. If $\bar{G}$ is transitive, there exists a fixed element $\theta_0$ of $\Theta$ that is sent to an arbitrary $\theta$ by an element $\bar{g}$ of $\bar{G}$. In this case, $\theta$ is called a transformation parameter. In particular, if $\Theta = \mathbf{R}$, $X$ is a random sample from a population distribution, and $P_\theta(B) = P_{\theta_0}(B - \theta)$, where $B - \theta = \{x \mid (x_1 + \theta, \dots, x_n + \theta) \in B\}$, then $\theta$ is called a location parameter. When $\Theta = (0, \infty)$ and $P_\theta(B) = P_{\theta_0}(B/\theta)$, where $B/\theta = \{x \mid (\theta x_1, \dots, \theta x_n) \in B\}$, $\theta$ is called a scale parameter. Now assume that $\theta$ is a combination of these two kinds of parameters such that $\theta = (\alpha, \beta)$ ($-\infty < \alpha < \infty$, $0 < \beta < \infty$) and $P_\theta(B) = P_{\theta_0}((B - \alpha)/\beta)$, where $\theta_0 = (0, 1)$. Then if $\mathcal{P}$ is an exponential family, (1) of Section G can be written as

[...]

where the $k_j$ ($j = 0, 1, \dots, m$) are constants.

We call $t(x)$ an invariant statistic with respect to a general transformation group $G$ when $t(gx) = t(x)$ for all $g \in G$ and $x \in \mathcal{X}$. An invariant statistic is said to be maximal invariant with respect to $G$ if, for $t(x) = t(x')$, there exists a $g \in G$ such that $x = gx'$. If $t_0$ is maximal invariant with respect to $G$, a statistic $t$ is invariant under $G$ if and only if $t_0(x) = t_0(x')$ implies $t(x) = t(x')$.

When $\mathcal{P}$ is G-invariant, a set $A$ ($\in \mathcal{A}$) is called G-invariant if $gA = A$ for all $g \in G$. We denote by $\mathcal{A}_0$ the set of all G-invariant sets in $\mathcal{A}$. $\mathcal{A}_0$ is clearly a σ-field. A set $A$ ($\in \mathcal{A}$) is called almost G-invariant if $gA = A$ $(\mathcal{A}, \mathcal{P})$ for all $g \in G$. We denote by $\mathcal{A}^*$ the σ-field consisting of all almost G-invariant sets. If $\mathcal{P}$ is G-invariant, $\mathcal{B}$ is sufficient for $\mathcal{P}$, $g\mathcal{B} = \mathcal{B}$ for all $g \in G$, and moreover, $\mathcal{B}_0 = \mathcal{B}^*$ $(\mathcal{A}, \mathcal{P})$, then $\mathcal{B}_0$ is a sufficient σ-subfield of $\mathcal{A}_0$, where $\mathcal{B}_0 = \mathcal{B} \cap \mathcal{A}_0$ and $\mathcal{B}^* = \mathcal{B} \cap \mathcal{A}^*$.

J. Various Definitions of Sufficiency

There are many different definitions of sufficiency, and the relations among them have been investigated. A σ-field $\mathcal{B}$ is called decision-theoretically sufficient or D-sufficient if for a given $\mathcal{A}$-measurable decision function $\delta$ there exists a $\mathcal{B}$-measurable decision function $\delta'$ such that

$$\int_{\mathcal{X}} \delta(x, E)\,dP_\theta(x) = \int_{\mathcal{X}} \delta'(x, E)\,dP_\theta(x)$$

for all $E \in \mathcal{D}$, $P_\theta \in \mathcal{P}$, where a decision space $(D, \mathcal{D})$ is quite arbitrary. $\mathcal{B}$ is called test sufficient if for any given $\mathcal{A}$-measurable test function $\varphi$, there exists a $\mathcal{B}$-measurable test function $\varphi'$ satisfying $E_\theta(\varphi) = E_\theta(\varphi')$ for all $P_\theta \in \mathcal{P}$. Let $(\Theta, \mathcal{C})$ be a measurable space of parameter $\theta$ and $\Xi$ be the set of all probability measures on $\mathcal{C}$. Moreover we assume that $P_\theta(B)$ is $\mathcal{C}$-measurable as a function of $\theta$ for any fixed $B \in \mathcal{A}$. For any $\xi \in \Xi$, we define $\lambda_\xi$ by

$$\lambda_\xi(A \times C) = \int_C P_\theta(A)\,d\xi(\theta), \quad A \in \mathcal{A},\ C \in \mathcal{C}.$$

We denote by $\bar{\lambda}_\xi$ the extension of $\lambda_\xi$ to $\mathcal{A} \times \mathcal{C}$. $\mathcal{B}$ is said to be Bayes sufficient if

$$E_{\bar{\lambda}_\xi}(I_{\mathcal{X} \times C} \mid \mathcal{B} \times \Theta) = E_{\bar{\lambda}_\xi}(I_{\mathcal{X} \times C} \mid \mathcal{A} \times \Theta)$$

for all $\xi \in \Xi$, $C \in \mathcal{C}$, that is, the a posteriori distribution on $\Theta$ given $\mathcal{B}$ coincides with that given $\mathcal{A}$ for any a priori $\xi$. When $\mathcal{P}$ is dominated, these definitions coincide with the classical definition of sufficiency. Generally, a D-sufficient σ-field contains at least one sufficient σ-field. A σ-field containing a sufficient σ-field is Bayes sufficient. Hence Bayes sufficiency follows from D-sufficiency and from classical sufficiency. If a D-sufficient σ-field is separable it is sufficient.

The notion of prediction sufficiency or adequacy was defined by Skibinsky [10]. Let $(X, Y)$ be a pair of random variables defined over the probability space $(\mathcal{X} \times \mathcal{Y}, \mathcal{A} \times \mathcal{B}, P_\theta)$. We suppose that $X$ is the sample to be observed, and $Y$ is (are) the value(s) of future observation(s) about which we are to make prediction(s) based on $X$. $X$ and $Y$ have joint probability distribution with an unknown parameter. A statistic $T = T(X)$ or a subfield $\mathcal{C}$ of $\mathcal{A}$ is said to be prediction sufficient or adequate if (a) given $T$, $X$ and $Y$ are conditionally independent (or given $\mathcal{C}$, $\mathcal{A}$ and $\mathcal{B}$ are conditionally independent or Markov) and (b) $T$ is sufficient for $X$ ($\mathcal{C}$ is sufficient for $\mathcal{A}$). It was proved that in any form of prediction on $Y$, we may restrict ourselves to the class of procedures that are functions of $T$ (or are $\mathcal{C}$-measurable).

References

[1] D. A. S. Fraser, Nonparametric methods in statistics, Wiley, 1957.
[2] E. L. Lehmann, Testing statistical hypotheses, Wiley, 1959.
[3] S. S. Wilks, Mathematical statistics, Wiley, 1962.
[4] R. R. Bahadur, Sufficiency and statistical decision functions, Ann. Math. Statist., 25 (1954), 423-462.
[5] T. S. Pitcher, A more general property than domination for sets of probability measures, Pacific J. Math., 15 (1965), 597-611.
[6] L. Brown, Sufficient statistics in the case of independent random variables, Ann. Math. Statist., 35 (1964), 1456-1474.
[7] E. W. Barankin and A. P. Maitra, Generalization of the Fisher-Darmois-Koopman-Pitman theorem on sufficient statistics, Sankhyā, ser. A, 25 (1963), 217-244.
[8] W. J. Hall, R. A. Wijsman, and J. K. Ghosh, The relationship between sufficiency and invariance with applications in sequential analysis, Ann. Math. Statist., 36 (1965), 575-614.
[9] K. K. Roy and R. V. Ramamoorthi, Relationship between Bayes, classical and decision theoretic sufficiency, Sankhyā, ser. A, 41 (1979), 48-58.
[10] M. Skibinsky, Adequate subfields and sufficiency, Ann. Math. Statist., 38 (1967), 155-161.

397 (XVIII.1)
Statistical Data Analysis

A. Statistical Data

Statistical data analysis comprises a collection of mathematical methods whereby we can deal with numerical data obtained through observations, measurements, surveys, or experiments on the "objective" world. The purpose of statistical analysis is to extract from the numerical data the information pertinent to the subject under consideration. The nature and the properties of the subject and also the purpose of the analysis may vary greatly. The subject may be physical, biological, chemical, sociological, psychological, economic, etc. in nature, and the purpose of the analysis can be purely scientific, as well as technological, medical, or managerial. Because of the great diversity of statistical data, the methods of statistical data analysis and the manner of application should differ greatly from situation to situation; we cannot expect a single unified system of methods to be applicable to all cases. Nevertheless, we have several formal methods of statistical analysis that are more or less mutually related and have been successfully applied to most, if not all, statistical data.

Statistical data can be classified into several types according to a few criteria: according to the property of each observation or measurement, they can be either quantitative or qualitative; according to whether only one observation is made on each object under investigation or many observations on the same object, they can be either univariate or multivariate; and according to whether the observations are made at one time or consecutively in the course of time, they may be either cross sectional or time series. Each different type of statistical data requires a different type of procedure (→ 280 Multivariate Analysis, 421 Time Series Analysis).

B. Frequency Distributions and Histograms

Statistical data have the simplest structure when they consist of a collection of observations made on an aggregate of objects supposedly of the same kind. Such an aggregate is usually called a population, and the number of its members (its size) is denoted by $N$. When the data are qualitative or categorical, each member of the population is classified into several types according to some criteria, and the data consist of the numbers of the members of the population classified into each of the categories. Such numbers are usually called frequencies, and the set of frequencies is called the frequency distribution.

When the data are quantitative and univariate, one quantitative attribute of each member of the population is observed, and the results are given as a set of $N$ real numbers $(x_1, x_2, \dots, x_N)$. When $N$ is large, as is usually expected, it is necessary to summarize these results in some manner. One common method is to tabulate the frequency distribution: We define a certain number of intervals $(a_{i-1}, a_i]$, $i = 1, \dots, K$, $a_0 < a_1 < \dots < a_K$, $a_0 \le \min x_i$, $\max x_i < a_K$; and we count the numbers of those $x$'s falling within each of the intervals and tabulate those numbers or frequencies $f_i$, $i = 1, 2, \dots, K$. Frequency distribution is often represented in the form of a histogram, where the endpoints of the intervals are marked on the horizontal axis, and above each interval a rectangle of area proportional to the frequency for the interval is drawn. It is usually recommended that the widths of the intervals in the frequency distribution be equal, especially when it is to be represented by a histogram. It is, however, often impossible or impractical to do so, and sometimes a logarithmic or other functional scale is used in the abscissa of the histogram; then it is desirable that the intervals of the transformed values are approximately of equal lengths. The number $K$ of the intervals should also be of an appropriate magnitude, neither too large nor too small; $K$ is often constrained by the size $N$ of the population, the shape of the distribution, or other factors. Usually, $K$ is chosen to be between 6 and 20.

From the frequency distribution, we obtain the cumulative distribution by associating with each endpoint $a_i$ of the intervals the number $F_i$ of $x$'s not greater than $a_i$, namely, $F_i = \sum_{j \le i} f_j$. The curve obtained by connecting $K + 1$ points
of coordinates $(a_i, F_i)$, $i = 0, 1, \dots, K$, by linear segments is called the cumulative distribution curve (or polygon).

C. Characteristics of the Distribution

In order to summarize univariate quantitative data, various values are calculated from the values $x_1, \dots, x_N$. Such values are called statistics (singular, †statistic) and are used to characterize the distribution of the values. Various types of statistics characterize different aspects of the distribution:

(a) Representative value or measure of location: a value which is supposed to give the "representative," "typical," or "most common" value in the population. By far the most commonly used measure is the mean $\bar{x} = \sum_i x_i/N$. $\bar{x}$ is sometimes called the arithmetic mean, and some other "means" are also calculated: especially when all the values are positive, the geometrical mean $\bar{x}_G = (\prod_i x_i)^{1/N}$ or harmonic mean $\bar{x}_H = (\sum_i (1/x_i)/N)^{-1}$ may be calculated; more generally, for some monotone function $f(x)$ we can calculate the $f$-mean by $\bar{x}_f = f^{-1}(\sum_i f(x_i)/N)$, of which the geometric and the harmonic means are special cases. Another measure of location is the median, which is the value in the population located exactly in the middle of the ordering of the magnitudes; more precisely, if $x_{(1)} \le x_{(2)} \le \dots \le x_{(N)}$ are the values in the population arranged according to their magnitudes, the median $x_{\mathrm{med}} = x_{((N+1)/2)}$ for odd $N$, and $= \frac{1}{2}(x_{(N/2)} + x_{((N/2)+1)})$ for even $N$. The mode is also sometimes used; this is defined as the value (usually the center of the interval) corresponding to the highest frequency.

(b) The measure of variability or dispersion shows how widely the values in the population vary. The most common measure is the standard deviation, which is defined by $s_x = \sqrt{\sum_i (x_i - \bar{x})^2/N}$, and its square is called the variance $V_x = s_x^2$. A similar measure is the mean absolute deviation $D_x = \sum_i |x_i - \bar{x}|/N$. Another type of measure of dispersion is the range $R_x = \max x_i - \min x_i$, and the interquartile range $Q_x = x_{(3N/4)} - x_{(N/4)}$ and more generally the interquantile range $x_{((1-\alpha)N)} - x_{(\alpha N)}$ for some $\alpha$. The ratio of the standard deviation to the mean is called the coefficient of variation (C.V. for short) and is used as a measure of relative variability when all the values in the population are positive.

(c) Characteristics often used to characterize the "shape" of distributions are the moments (around the origin) $m_k = N^{-1}\sum_i x_i^k$ and the central moments (moments around the mean) $M_k = N^{-1}\sum_i (x_i - \bar{x})^k$ for a positive integer $k$; for a specific $k$ these are called the $k$th moments. Central $k$th moments with odd $k$ are equal to zero when the distribution is symmetric, i.e., when the histogram is symmetrically shaped. Hence the third moment $M_3$ or its ratio to the third power of the standard deviation, $M_3/s_x^3$, is used as a measure of the asymmetry of a distribution; this is called the skewness. The fourth moment is large if there are some values which are far off from others and small when all values are concentrated; hence it tends to be large when the histogram has a rather sharp peak in the center and has a long tail in either direction or both, and tends to be small when the histogram is flat in the center and drops off sharply at both ends. Accordingly, the ratio $M_4/V_x^2$ is used as a measure of long-tailedness of the distribution; this ratio minus 3 is called the kurtosis.

D. Theoretical Frequency Distribution

When the observed values can be any real number (sometimes in an interval), the size of the population $N$ is increased indefinitely, and the widths of the intervals are decreased to 0, the histogram is expected to approach a smooth curve. In the limit when $N$ is infinite, we can assume that the distribution is represented by a mathematically well-behaved function $f(x)$ and that the ratio of the number of those values in the population within the interval $(a, b)$ to the size of the population approaches $\int_a^b f(x)\,dx$. Such a function $f(x)$ is called the frequency function or density function. Various types of functions have been proposed and used as "theoretical" frequency functions to approximate the actually observed frequency distributions. The most important is the normal density function

$$\varphi(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Bigl\{-\frac{(x - \mu)^2}{2\sigma^2}\Bigr\}.$$

The following density functions most commonly appear in applications: the gamma density, $f(x) = x^{p-1}\exp(-x/a)/(a^p\Gamma(p))$ for $x > 0$ and $= 0$ for $x \le 0$; the beta density, $f(x) = x^{p-1}(1 - x)^{q-1}/B(p, q)$ for $0 < x < 1$ and $= 0$ otherwise.

We can conceive of a population of infinite size with some density function; the term theoretical distribution is used to mean such a population with its density function, and more specifically the †normal distribution, etc. Such a population and associated density is often called a continuous distribution. For a theoretical distribution, the mean, variance, and moments are naturally defined by

$$\mu = \int x f(x)\,dx, \qquad \sigma^2 = \int (x - \mu)^2 f(x)\,dx,$$

$$\mu_k = \int (x - \mu)^k f(x)\,dx, \qquad \mu'_k = \int x^k f(x)\,dx.$$
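The sample characteristics of Section C can be computed directly from their definitions. The following is a minimal sketch, not a reference implementation: the function name `describe` and the sample data are hypothetical, the divisor is $N$ (population form, as in the definitions of $M_k$), and the data are assumed to be non-constant so that the standard deviation is nonzero.

```python
import math


def describe(xs):
    """Summary statistics with divisor N (population form), following Section C.

    Hypothetical helper for illustration; assumes xs is non-empty and not constant.
    """
    n = len(xs)
    xbar = sum(xs) / n

    def central(k):
        # Central moment M_k = (1/N) * sum_i (x_i - xbar)^k
        return sum((x - xbar) ** k for x in xs) / n

    variance = central(2)
    sd = math.sqrt(variance)

    # Median: middle order statistic for odd N, mean of the two middle ones for even N
    ordered = sorted(xs)
    mid = n // 2
    med = ordered[mid] if n % 2 else 0.5 * (ordered[mid - 1] + ordered[mid])

    return {
        "mean": xbar,
        "median": med,
        "variance": variance,
        "sd": sd,
        "skewness": central(3) / sd ** 3,           # M_3 / s_x^3
        "kurtosis": central(4) / variance ** 2 - 3,  # M_4 / V_x^2 - 3
    }


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values only
summary = describe(data)
print(summary)
```

For these illustrative values the mean is 5, the standard deviation is 2, and the positive skewness reflects the longer right tail. Note that many statistical packages default to the divisor $N - 1$ for the variance, so their results can differ slightly from the definitions used here.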
It should, however, be noted that the mean, the variance, or the moments may not exist for particular distributions.

K. Pearson introduced a system of density functions defined as solutions of the differential equation

$$\frac{d\ln f(x)}{dx} = \frac{A + Bx}{C + Dx + Ex^2},$$

where $A, B, \dots, E$ are constants. A distribution thus obtained is called a Pearson distribution. The normal, gamma, and beta distributions together with some other commonly used distributions, such as the $t$- and $F$-distributions, are Pearson distributions.

E. Measure of Concentration

When all the observed values are nonnegative in nature, we may sometimes require some measure of inequality or concentration of the distribution. For such a purpose we order the observed values according to their magnitudes and obtain $x_{(1)} \le x_{(2)} \le \dots \le x_{(N)}$; we define $S_i = \sum_{j \le i} x_{(j)}$ for $i = 1, \dots, N$, and $S_0 = 0$, plot the $N + 1$ points $(i/N, S_i/S_N)$, $i = 0, 1, \dots, N$, and connect them by line segments. The graph thus obtained is called the Lorenz curve or the curve of concentration, and it connects the origin and the point $(1, 1)$. It lies below the 45° line, and if all the values are nearly equal the curve comes close to the 45° line, but if values are widely unequal, the curve comes close to the horizontal axis and suddenly jumps to the point $(1, 1)$. The area between the curve and the 45° line is called the area of concentration, and it is equal to one-fourth of the mean difference $\delta$ divided by the mean, where $\delta$ is defined by

$$\delta = \frac{1}{N^2}\sum_i\sum_j |x_i - x_j|.$$

$G = \delta/2\bar{x}$ is called the Gini coefficient of concentration and is used as a measure of concentration or inequality of distribution. Other measures, including the coefficient of variation, are also used to represent the concentration.

F. Discrete Distributions

There are cases where the observed values are taken only from the nonnegative integers, e.g., the number of individual animals of a specific species in an area, of accidents during a specified time, etc. In such cases, when we increase the number of observations, the distribution does not approach one with a continuous density function but rather one with a certain theoretical discrete distribution. Among theoretical discrete distributions, the most commonly used are the binomial distribution: $f_j = \binom{n}{j} p^j (1 - p)^{n-j}$ for $0 \le j \le n$, and the Poisson distribution: $f_j = e^{-\lambda}\lambda^j/j!$ for $j = 0, 1, \dots$. The hypergeometric distribution: $f_j = \binom{M}{j}\binom{N-M}{n-j}\big/\binom{N}{n}$ for $\max(0, M + n - N) \le j \le \min(n, M)$, and the negative binomial distribution: $f_j = \binom{j+r-1}{r-1} p^r (1 - p)^j$, $j = 0, 1, \dots$, are also often used.

For discrete distributions we can define moments by $\mu = \sum_j j f_j$ and $\mu_k = \sum_j (j - \mu)^k f_j$, and $\mu'_k = \sum_j j^k f_j$, $k = 2, 3, \dots$.

G. Generating Functions and Cumulants

For a theoretical distribution with the density function $f(x)$, the moment generating function $M(\theta)$ is defined by $M(\theta) = \int e^{\theta x} f(x)\,dx$. When $M(\theta)$ is well defined in an open interval including the origin, the distribution has all $k$th moments, and it can be expanded as

$$M(\theta) = 1 + \mu'_1\theta + \frac{1}{2!}\mu'_2\theta^2 + \dots + \frac{1}{k!}\mu'_k\theta^k + \cdots,$$

from which the term "moment generating function" is derived. When $\theta$ is replaced by $it$ with real $t$, we have the characteristic function $\varphi(t) = M(it)$, which can be expanded as

$$\varphi(t) = 1 + \mu'_1(it) + \frac{1}{2!}\mu'_2(it)^2 + \dots + \frac{1}{k!}\mu'_k(it)^k + o(|t|^k)$$

if the distribution has moments up to the $k$th. The function $K(\theta) = \ln M(\theta)$ is called the cumulant generating function, and the coefficients $\kappa_j$ in the expansion

$$K(\theta) = \kappa_1\theta + \frac{1}{2!}\kappa_2\theta^2 + \dots + \frac{1}{j!}\kappa_j\theta^j + \cdots$$

are called the cumulants. The $k$th cumulant $\kappa_k$ is expressed as a polynomial of the moments of order not exceeding $k$; thus

$$\kappa_1 = \mu'_1, \quad \kappa_2 = \mu'_2 - \mu_1'^2, \quad \kappa_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu_1'^3,$$

etc.

For the normal distribution,

$$M(\theta) = \exp\Bigl(\mu\theta + \frac{\sigma^2\theta^2}{2}\Bigr);$$

hence

$$K(\theta) = \mu\theta + \frac{\sigma^2\theta^2}{2},$$

and the $k$th cumulant for $k \ge 3$ is equal to 0. Cumulants are used as measures indicating whether the distribution is close to or different from the normal.

For discrete distributions, the moment generating function is defined as $M(\theta) = \sum_j e^{\theta j} f_j$, but the probability generating function $P(t) = \sum_j f_j t^j = M(\ln t)$, the factorial moment generating function $Q(\theta) = P(\theta + 1)$, and the factorial cumulant generating function $R(t) = \ln Q(t)$ are also of use. The coefficient of $t^k/k!$ in the Maclaurin expansion of $Q(t)$ is expressed as $\mu_{[k]}$ and called the $k$th factorial moment; it is equal to $\mu_{[k]} = \sum_j j(j - 1)\cdots(j - k + 1) f_j$. That in the expansion of $R(t)$ is the factorial cumulant. The factorial cumulant $\kappa_{[k]}$ is expressed by the same polynomial in $\mu_{[k]}$ as $\kappa_k$ is expressed in $\mu'_k$. For the Poisson distribution, $M(\theta) = \exp\lambda(e^\theta - 1)$; hence $R(t) = \lambda t$, and it follows that the factorial cumulants $\kappa_{[k]}$ for $k \ge 2$ are all equal to zero if and only if the distribution is Poisson.

H. Bivariate Distribution

When two quantitative observations are obtained for each member of a population of size $N$, the results are given as $N$ pairs of real numbers $(x_i, y_i)$, $i = 1, 2, \dots, N$. Such data are called bivariate data and the distribution, bivariate distribution. Those data can be illustrated as $N$ points in a plane with coordinates $(x_i, y_i)$, and such an illustration is called a scatter diagram. In order to characterize a bivariate distribution, we often use bivariate moments

$$M_{k,l} = \frac{1}{N}\sum_i (x_i - \bar{x})^k (y_i - \bar{y})^l,$$

where $\bar{x}$ and $\bar{y}$ are the means of $x$ and $y$, respectively; especially, the (1,1) moment $M_{1,1}$ is called the covariance and is denoted as $\mathrm{Cov}(x, y)$. The most often used measure of the strength of the relation between $x$ and $y$ values is the correlation coefficient $r_{x,y} = \mathrm{Cov}(x, y)/s_x s_y$, where $s_x$ and $s_y$ are the standard deviations of $x$ and $y$. It is easily shown that $-1 \le r_{x,y} \le 1$, and when there exists a nearly linear relationship between $x$ and $y$ values, $r_{x,y}$ is close to either $+1$ or $-1$ according to whether the $x$ and $y$ values change in the same direction or in opposite directions. When there is no clear relationship between $x$ and $y$, the correlation coefficient is close to zero, but it may not be a good measure of the relationship when $x$ and $y$ values are related nonlinearly.

A linear function $y = a + bx$ is called the linear regression function of $y$ on $x$, for which the sum of the square distances $\sum_i (y_i - a - bx_i)^2$ is minimized. For the linear regression function the coefficients $a$ and $b$ are determined by $b = \mathrm{Cov}(x, y)/s_x^2$ and $a = \bar{y} - b\bar{x}$, and $b$ is called the regression coefficient. We have that $\sum_i (y_i - a - bx_i)^2/\sum_i (y_i - \bar{y})^2 = 1 - r_{x,y}^2$, i.e., that the square of the correlation coefficient is equal to 1 minus the ratio of the variance of the residual $y_i - a - bx_i$ to that of $y$; hence it is sometimes called the coefficient of determination. Similarly, the linear regression function of $x$ on $y$ is defined by $x = c + dy$, where $d = \mathrm{Cov}(x, y)/s_y^2$ and $c = \bar{x} - d\bar{y}$. We have $bd = r_{x,y}^2$, and $|1/d| = |b/r_{x,y}^2| \ge |b|$.

We can tabulate the bivariate frequency distribution by splitting the range of $x$ values into $K$ intervals $(a_{i-1}, a_i]$, $i = 1, \dots, K$, and the range of $y$ values into $L$ intervals $(b_{j-1}, b_j]$, $j = 1, \dots, L$, and counting the number $f_{ij}$ of cases for which $a_{i-1} < x \le a_i$ and $b_{j-1} < y \le b_j$. In contrast to the bivariate frequency distribution, the distributions of $x$ and $y$ values are called the marginal distributions.

I. Bivariate Density Function

As we did for the univariate distribution, we can consider the limiting shape of the bivariate frequency distribution when the size $N$ of the population tends to infinity and define a continuous bivariate distribution with density function $f(x, y)$, with which the ratio of those members in the population with values $(x, y)$ in a set $S$ in a plane is given by $\iint_S f(x, y)\,dx\,dy$. The bivariate density function is also called the joint density, and then the density functions of $x$ and $y$ are called the marginal density functions and are given by $f_1(x) = \int f(x, y)\,dy$ and $f_2(y) = \int f(x, y)\,dx$. The joint moments of a continuous bivariate distribution are defined by $\mu_{k,l} = \iint (x - \mu_1)^k (y - \mu_2)^l f(x, y)\,dx\,dy$, where $\mu_1$ and $\mu_2$ are the means of $x$ and $y$, respectively. The joint moment generating function is defined by $M(t_1, t_2) = \iint e^{t_1 x + t_2 y} f(x, y)\,dx\,dy$, and the cumulant generating function by $K(t_1, t_2) = \log M(t_1, t_2)$, from which the joint cumulants

$$\kappa_{k,l} = \frac{\partial^{k+l}}{\partial t_1^k\,\partial t_2^l} K(t_1, t_2)\Big|_{t_1 = 0,\ t_2 = 0}$$

are derived.

The conditional density of $y$ given $x$ is defined by $f(y \mid x) = f(x, y)/f_1(x)$, and the distribution with this density function is the conditional distribution of $y$ given $x$; this latter can be interpreted as the distribution of $y$ of those members in the population with $x$ values in the interval $(x, x + dx]$, where $dx$ is small. The conditional density and the conditional distribution of $x$ given $y$ are similarly defined. The mean and the moments of the conditional distribution are called the conditional mean and the conditional moments. The conditional mean of $y$ given $x$, considered as a function of $x$, is called the regression function of $y$ on $x$.

By far the most important theoretical bivariate density is the bivariate normal density,
which is given by regression function of xk on x r , . . , xkml, when


the coefficients a r, . . , ukml are so determined
f(x,Y)=(2~~1~2J1-p2)-’ that the sum Q=Ci(xik-a,-ua,xi, -...-
1 (x-PlY (Y-d ak-1~ik-1)2 is minimized. They are determined
xexp
{ 2(1-$) [ (r:+ 0: from the equation

~2p(x-Pl)(Y--I”2)
Cila,+Ci2a2+...+Cik-lUk~l=Cikr (*)
a1 02 II1 i=l,...,k-1,
for which the mean of x is pI and that of y is witha,=x,-a,~,-...--a,-,~,-,, where C,
p2, the variances of x and y are 0: and ~22, are the covariances. a,, . , ak-r thus deter-
respectively, and the covariance of x and y is mined are called the regression coefficients of
equal to per 02. The bivariate normal distri- xk on x,, . . . . xk-r, and such a procedure is
bution has several remarkable properties: All called the method of least squares. The equa-
the k, 1joint cumulants are equal to zero for tion (*) is called the normal equation. If we
k + 12 3; the marginal distributions of x and write~ik=a,+a,xi,+...+ak~Ixik~,,wehave
y are normal; the regression functions of y on Q=~(x~~-~~~)~=~(x~~-X~)~-~C(~~~-X~)~=
x and x on y are both linear; the conditional C(Xik-Xk)2 X(1-R,,, ,.._,k-l*), where &I1 ,.._,k-1
distribution of y given x (and x given y) is is the multiple correlation coefficient of xk and
normal and the conditional variance is con- xl, . . . , xk-r, which is also equal to the correla-
stant; and the contours f(x, y) = c for different tion coefficient of xk and gk. The square of the
values of c are equicentric ellipsoids. multiple correlation coefficient is also called
the coefficient of determination. The quantities
xik -tik are called the residuals. Let ti k-l and
J. Higher-Dimensional Data fik be the values of regression functions of xkml
and xk, respectively, on x1, . , xke2, and let
When the data are of more than two dimen- yik-l =xikm, -iik-, and yik=xik-iik be the
sions, i.e., more than two observations are residuals; then the correlation coefftcient of y,-,
made on each of the objects, we designate the and y, is equal to the partial correlation co-
data by Nk-tuples of real numbers (xii, xi2, efficient of xk-r and xk given x1, , xkm2.
. ..) Xik), i= 1, . . . . N (k 2 3). Then we can calcu- We have the following relation between the
late the moments of each of the variates and multiple and the partial correlation coefftcients:
the joint moments, which are defined by
l-R2 kll,...,k-I
M k,,k, . . . . . kk
=(I -Rk2,, ,_.., k-2)(1 -rk2_l,k(L ,..., k-2).

=~~(xil-XI)kI(Xi2-22)*2
...(Xik-Xk)kk. Multiple and partial correlation coefftcients
are also expressed in terms of the correlation
Also, we can arrange the variances and covar- coefficients of the variates. For example, it can
iances in a symmetric matrix of order k, and be shown that
we call it the (variance-) covariance matrix. A
covariance matrix is easily shown to be non-
negative definite. The determinant of the covar- and that
iance matrix is called the generalized variance.
$$r_{23\cdot 1} = \frac{r_{23} - r_{12} r_{13}}{\sqrt{(1 - r_{12}^2)(1 - r_{13}^2)}}.$$
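The expression for the partial correlation coefficient $r_{23\cdot 1}$ in terms of the simple correlation coefficients can be checked numerically against its definition as the correlation of residuals. The following is a minimal sketch, assuming three short made-up data columns `x1`, `x2`, `x3`; the helper names `corr`, `residuals`, and `partial_corr_23_1` are hypothetical and chosen only for this illustration.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def corr(u, v):
    """Ordinary (product-moment) correlation coefficient of two columns."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

def residuals(y, x):
    """Residuals of the least-squares regression line of y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

def partial_corr_23_1(x1, x2, x3):
    """r_{23.1} computed from the three simple correlation coefficients."""
    r12, r13, r23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)
    return (r23 - r12 * r13) / sqrt((1 - r12 ** 2) * (1 - r13 ** 2))

# Made-up data, used only to exercise the two routes to r_{23.1}.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 2.9, 4.2, 4.8, 6.3, 6.9]
x3 = [0.5, 1.9, 2.2, 4.1, 4.4, 6.2]

via_formula = partial_corr_23_1(x1, x2, x3)
via_residuals = corr(residuals(x2, x1), residuals(x3, x1))
```

Up to rounding error, both routes give the same value: the correlation of the residuals of $x_2$ and $x_3$ after regression on $x_1$ equals $r_{23\cdot 1}$.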
The matrix with the (i,j) element equal to the
correlation coefftcient of the ith and the jth For higher dimensions, we can also define
variates rii (rii is set equal to 1) is called the the (joint) density function f(xr ,x2, , xk) and
correlation matrix and is denoted by R. R is the (joint) moment generating function
also nonnegative definite. If we denote the M(t,,tz>...,tk)

(i, j) cofactor of R by R,, the quantity defined


by =S...Sexp(t,slfr?x2+...+r,xk)

R,~,,.,., (i). .._.kc&f


xf(x,,x, ,..., x,)dx, . ..dX..
is called the multiple correlation coefficient of
the ith variate and all other variates; and The most important multivariate joint den-
sity is that of the multivariate normal distri-
rijl1 . . (1) ,...,(j) ,.._, k= -RijIJRLiRjj bution, which is expressed by
is called the partial correlation coefficient of
the ith and the jth variates given all other
variates. The meaning of these coefficients will
be elucidated below. A linear function a0 + - Pi) txj - Pj) ,

u,~,+a,x,+...+a,~,x,~, iscalledthelinear 1
397 K 1478
Statistical Data Analysis

where $\Sigma$ is the covariance matrix with elements $\sigma_{ij}$, $\Sigma^{-1} = (\sigma^{ij})$, and $\mu_i$ is the mean of the $i$th variate. For the multivariate normal distribution the moment generating function is given by

$$M(t_1, \dots, t_k) = \exp\Bigl(\sum_i \mu_i t_i + \tfrac{1}{2} \sum_i \sum_j \sigma_{ij} t_i t_j\Bigr).$$

K. Contingency Tables

When several qualitative observations are made on $N$ objects, each object is classified according to the combination of the categories, and the data are summarized by the numbers $N(i_1, i_2, \dots, i_k)$ of the objects that fall in the $i_1$th category according to the first observation, the $i_2$th category in the second observation, etc. A table that shows the results of such observations is called a ($k$-way) contingency table. If there are $m_1$ categories in the first criterion, $m_2$ categories in the second, etc., the contingency table is also called an $m_1$ by $m_2$ by ... by $m_k$ table. The numbers $\tilde{N}(j, i_j)$ of the objects which are classified into the $i_j$th category according to the $j$th criterion are called marginal frequencies. If we have

$$N(i_1, i_2, \dots, i_k)/N = \tilde{N}(1, i_1)\,\tilde{N}(2, i_2) \cdots \tilde{N}(k, i_k)/N^k$$

for all $i_1, i_2, \dots, i_k$, then the $k$ observations or criteria are independent.

The simplest contingency table is a 2 by 2 table, where several measures for the relation of two observations or criteria have been proposed, among which the most commonly used are the measure of association defined by

$$Q = \frac{N(1,1)N(2,2) - N(1,2)N(2,1)}{N(1,1)N(2,2) + N(1,2)N(2,1)}$$

and the odds ratio

$$\alpha = \frac{N(1,1)N(2,2)}{N(1,2)N(2,1)}$$

and also

$$V = \frac{N(1,1)N(2,2) - N(1,2)N(2,1)}{\sqrt{\tilde{N}(1,1)\,\tilde{N}(1,2)\,\tilde{N}(2,1)\,\tilde{N}(2,2)}},$$

where $\tilde{N}(i,j)$ are marginal frequencies. The two observations are independent if and only if $Q = 0$, or equivalently $\alpha = 1$ or $V = 0$. $V$ is equal to the correlation coefficient of the variables $x_1$ and $x_2$, for which $x_i = 0$ if the object is classified into the first category according to the $i$th criterion and $x_i = 1$ if it is classified into the second category. In a two-way $m_1$ by $m_2$ table a measure of association is defined by

$$X^2 = \sum_i \sum_j \frac{\bigl(N(i,j) - \tilde{N}(1,i)\,\tilde{N}(2,j)/N\bigr)^2}{\tilde{N}(1,i)\,\tilde{N}(2,j)/N};$$

it can be shown that $X^2/N = 0$ if and only if the two criteria are independent, and that $0 \le X^2/N \le \min(m_1 - 1, m_2 - 1)$. When we take $x_i = 1$ if the object is classified into the $i$th category according to the first criterion and $x_i = 0$ otherwise and take $y_j = 1$ if it belongs to the $j$th category in the second criterion and $y_j = 0$ otherwise, it is shown that the sum of squares of the multiple correlation coefficients of $x_i$ and $y_1, \dots, y_{m_2 - 1}$ is equal to $X^2/N$.

L. Decomposition of the Variance

When one observation is qualitative while another is quantitative, the objects are classified into several categories according to the first observation, while for each object the value of the second observation is also given. Let $x_{ij}$ be the observed value of the $j$th object in the group of the $i$th category; then for each $i$ we can obtain frequency distributions of $x_{ij}$, and compare these distributions. Let $N_i$ be the number of objects in the $i$th category, and $\bar{x}_i = \sum_j x_{ij}/N_i$ be the mean in the $i$th category. Then the weighted variance of the $\bar{x}_i$ defined by $V_B = \sum_i N_i(\bar{x}_i - \bar{x})^2/N$, where $\bar{x}$ is the mean of all the observations, i.e., $\bar{x} = \sum_i N_i \bar{x}_i/N$, is called the between-group variance, and the weighted mean of the variances of each of the groups defined by $V_W = \sum_i \sum_j (x_{ij} - \bar{x}_i)^2/N$ is called the within-group variance. It can be shown that $V_B + V_W = V = \sum_i \sum_j (x_{ij} - \bar{x})^2/N$, i.e., the variance of all the observations is decomposed as a sum of the between-group and the within-group variances. The ratio $V_B/V$ is called the correlation ratio, which is equal to the square of the multiple correlation coefficient of $x$ and $y_1, \dots, y_m$, where $y_i = 1$ if the object is in the $i$th category and $y_i = 0$ otherwise.

M. Ordinal Data

When the observation is not quantitative but there exists a natural ordering among the categories into which the objects are classified, the observation is said to be in an ordinal scale, or simply ordinal.

When two ordinal observations are made on the same set of $N$ objects, we can define several measures of association between the two ordinal scales. For each pair of objects we define a variable $c_{ij}$, $i, j = 1, \dots, N$, $i \ne j$, as $c_{ij} = 1$ if the $i$th object is classified as "better" (or "superior") than the $j$th object according to both of the measurements, $c_{ij} = -1$ if the orderings are different in the two scales, or $c_{ij} = 0$ if they are in the same category according to either or both of the scales. A measure of association is then given by $S = \sum_i \sum_j c_{ij}/N(N - 1)$,

which takes a value between -I and +l but of the time series data. Various methods of
usually cannot attain kl. Other ways of nor- seasonal adjustment have been proposed and
malizing the sum C C cii have been proposed. applied, but none is definitive. The other case
Another method of calculating the associ- is where the cyclical changes are produced
ation is the scoring method, i.e., giving a set of from the observed process itself; here, the
ordered real numbers to the categories of each length of the period and the pattern of the
of the scales and calculating the correlation cyclical change must be estimated.
coefficient between the scores. The simplest Now assume that the data do not contain
scores are 0, 1, , m - 1 when there are m any trend, or that the trend has been effec-
categories, but other methods of scoring are tively eliminated. First we calculate the cor-
also used. Scores that give the largest pos- relation coefficient between x(t) and x(t +.s) by
sible correlation are called canonical scores,
which are obtained as the characteristic vec-
tors of the matrices NN’ and N’N, where N is
the matrix of the contingency table.
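The pairwise measure $S$ of Section M can be sketched in a few lines. This is an illustration under stated assumptions, not a definitive implementation: categories are coded by integers (a larger code meaning "better"), a pair ordered the same way on both scales is scored $+1$ regardless of which object is listed first, and the function names `sign` and `association_S` are hypothetical.

```python
def sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)

def association_S(a, b):
    """S = sum over i != j of c_ij, divided by N(N - 1).

    a[i] and b[i] are the ordinal category codes of object i on the two scales.
    c_ij is +1 if both scales order the pair (i, j) the same way,
    -1 if they order it oppositely, and 0 if the pair is tied on either scale.
    """
    n = len(a)
    total = 0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += sign(a[i] - a[j]) * sign(b[i] - b[j])
    return total / (n * (n - 1))

same_order = association_S([1, 2, 3, 4], [1, 2, 3, 4])
reversed_order = association_S([1, 2, 3, 4], [4, 3, 2, 1])
with_ties = association_S([1, 1, 2, 2], [1, 1, 2, 2])
```

With ties present, even two identical classifications give a value below 1, which illustrates why $S$ usually cannot attain $\pm 1$.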
where
T-s
-u:=1 x(t)/(T-s), z= ,=$, XOMT-4,
N. Time Series Data 1=1
or more simply by
Time series data can be recorded in a con-
tinuous time scale, but usually measurements r(s)=TC(x(t)-X)(x(t+s)-5x)/
are made at discrete times, which are most
commonly equally spaced. Hence we here ((T-.+J~m),
denote them as x(t), t= 1,2, , T. First we
where X= C x(t)/T. r(s) is called the serial
consider the quantitative univariate case. The correlation coefficient or the autocorrelation
intertemporal change of x(t) is often decom-
coefficient of lags. When there exists a clear
posed into three parts: and definite cyclical change of period s in the
x(t)=m(t)+c(t)+e(t), data, r(s) is close to 1. The diagram in which
the serial correlation coefficient r(s) is plotted
where m(t) is called the trend, and represents against s is called the correlogram. In order to
the secular, systematic change of x; c(t) is
see the cyclical properties of the data more
called the cycle, and represents the recurrent clearly, we calculate the power spectral density
pattern of the change; and e(t) is called the
error or random fluctuation, and represents w(l)= 1+2Cr(s)cosis.
s
the irregular changes. Such a decomposition
cannot be defined rigorously without assuming The graph of w(i) is called the power spectrum,
some probabilistic or stochastic model for x(t), or simply the spectrum. w(i) represents the
but it is intuitively clear and practically useful square of the width of the sine curve of fre-
in many applications. quency 3.12~ or of period 271/j& contained in the
There are two ways to estimate the trend. data. The spectral density is closely related to
One is to calculate the moving average a(t) = the intensity, defined by
(x(t-k)+x(t-k+l)+...+x(t)+...+x(t+
k))/(2k + 1) and use it as an estimate of the [(A)=+( (Tx(t)cosir)l + (Tx(Qsin;1>3,
trend of x(t); here k should be chosen to sub-
stantially eliminate the cyclic and random which is proportional to the square of the
parts. More generally we can use the weighted multiple correlation coefficient of x(t) and the
moving average defined as x(t) = xi”= -k w(j)x functions cos it and sin It and is large if the
(t+j), where w(-j)= w(j) and Cw(j)= 1. The data contains a sine curve of frequency I/271. It
second method is to assume some functional can be shown that I(I) is approximately equal
form, usually a polynomial in t, for the trend: to w(R)V(x)/n. The spectral density thus ob-
m(t) = u0 + a, t + + LIPtk, and to determine the tained usually oscillates irregularly and far
coefficient by least squares, i.e., to calculate the from smoothly; hence smoothing by use of a
values of uO, u,, . . , uk which minimize J$(x(t) “spectral window” is often applied (- 421
-u,-u,t--...-uktk)2. Times Series Analysis).
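The moving average, the serial correlation coefficient $r(s)$ (in its simpler form), and the power spectral density $w(\lambda)$ described in this section can be sketched as follows. This is a minimal illustration with assumptions: the function names are hypothetical, the sum defining $w(\lambda)$ is truncated at a chosen maximum lag, and the two test series are artificial.

```python
from math import cos, pi

def moving_average(x, k):
    """Unweighted moving average over 2k+1 points, for interior time points only."""
    return [sum(x[t - k:t + k + 1]) / (2 * k + 1) for t in range(k, len(x) - k)]

def serial_correlation(x, s):
    """r(s) = T * sum_{t=1}^{T-s} (x(t)-xbar)(x(t+s)-xbar) / ((T-s) * sum_t (x(t)-xbar)^2)."""
    T = len(x)
    xbar = sum(x) / T
    num = sum((x[t] - xbar) * (x[t + s] - xbar) for t in range(T - s))
    den = sum((v - xbar) ** 2 for v in x)
    return T * num / ((T - s) * den)

def spectral_density(x, lam, max_lag):
    """w(lambda) = 1 + 2 * sum_{s=1}^{max_lag} r(s) * cos(lambda * s) (truncated sum)."""
    return 1 + 2 * sum(serial_correlation(x, s) * cos(lam * s)
                       for s in range(1, max_lag + 1))

# Artificial trend-free series with an exact cycle of period 4.
cyclic = [0, 1, 0, -1] * 10
r = [serial_correlation(cyclic, s) for s in range(1, 5)]   # r(1), ..., r(4)

# A purely linear series is reproduced by the moving average at interior points.
linear = [2 * t for t in range(10)]
smoothed = moving_average(linear, 1)
```

For the period-4 series, $r(4)$ equals 1 and $r(2)$ equals $-1$, and the truncated $w(\lambda)$ is larger at $\lambda = \pi/2$ (period $2\pi/\lambda = 4$) than at $\lambda = \pi/4$.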
There are two cases of cyclical changes. One When several observations are made in time
is the case when there is a clearly defined rele- series data, we speak of multivariate time
vant external time period, such as the seasons series. Let x,(t) be the ith observation in the
of the year or the days of the week. In such tth period. The correlation coefficient between
cases the effects of such external periodical x,(t) and x,(t + s) is called the serial cross-
cycles must be eliminated, and the process correlation coefficient and is denoted by r,j(s)
which does that is called seasonal adjustment (s=O, kl, f2,. .). Analogously to the univari-

ate case we define

$$w_{ij}(\lambda) = \sum_s r_{ij}(s) \cos \lambda s + i \sum_s r_{ij}(s) \sin \lambda s = p_{ij}(\lambda) + i\,q_{ij}(\lambda),$$

and we call $p_{ij}(\lambda)$ the cospectral density between the $i$th and the $j$th variables, and $q_{ij}(\lambda)$ the quadrature spectral density. $\sqrt{p_{ij}^2 + q_{ij}^2}$ is called the amplitude, and

$$\frac{p_{ij}^2 + q_{ij}^2}{w_i w_j},$$

where $w_i$ and $w_j$ are the spectral densities of the $i$th and the $j$th variables, is called the coherence.

O. Events in Time Scale

Some data give us the time points at which a specific event occurs. Let $T_i$ be the time when the event occurs for the $i$th time. Then usually the most important information we want to obtain is about the time intervals $d_i = T_{i+1} - T_i$. If there is a periodicity in the occurrences of the event, the $d_i$ will be approximately equal. On the other hand, if the event tends to occur repeatedly after its first appearance, some $d_i$ will be small while others will be large. When there is no periodicity, no tendency to repetition, and no increasing or decreasing trend in the occurrences, we can suppose that the event occurs simply by chance, and this is good reason to suppose that the density function of the distribution of the intervals is exponential, i.e., it can be expressed by $f(d) = (\exp(-d/a))/a$ for $d > 0$. Also in such a case the number of occurrences in a fixed time interval is distributed according to the Poisson distribution. Such a sequence of occurrences of an event is called a †Poisson process. More generally, let $f(d)$ be the density function of the time intervals; then

$$h(d) = \frac{f(d)}{1 - \int_0^d f(c)\,dc}$$

is called the hazard rate or hazard function. The hazard function is constant if and only if the process is Poisson.

P. Probabilistic Models

In many applications of statistical data analysis, the data exhibit variabilities and fluctuations that are due to fortuitous or hazardous causes or chance effects and that obscure the information contained in the data. In such cases we assume that the chance variabilities and fluctuations are random variables distributed according to some probability distribution, and the information we require is represented by a set of unknown constants that characterize the probability distribution of the data as unknown parameters. Suppose, to take the simplest case, that we have repeated observations of results of some experiment under a fixed condition and that we obtain the values $x_1, x_2, \dots, x_n$. Since the experimental condition is fixed, the variations among the $x_i$ values can be considered to be due to chance causes, such as variations in materials, uncontrolled small fluctuations in experimental conditions or instruments, and various other variations usually called the errors. Whatever the true causes of the variations, we can consider them to be random, and we can regard the values $x_1, x_2, \dots, x_n$ as the results of random experiments or the realizations of random variables $X_1, X_2, \dots, X_n$ independently and identically distributed according to some probability distribution. Or we may think of a hypothetical infinite population of the results of supposedly infinite replications of the experiment under the same fixed condition, and regard the actual observations as $n$ values chosen from this population at random. We may also consider that in this hypothetical infinite population, the frequency distribution is represented by a density function $f$, which in turn determines the probability distribution of each observation. We may be interested in the "average" values of the result of the experiment as well as the magnitude of the variability; then those values are represented by the mean and the variance of the population distribution. If the form of the population distribution is assumed to be completely specified except for the mean $\mu$ and the variance $\sigma^2$, the density function $f$ is determined up to these two parameters, and is expressed as $f(x; \mu, \sigma)$. The joint density for $n$ repeated observations is $\prod_i f(x_i; \mu, \sigma)$. The set of assumptions that determines the probability distributions of the observations in terms of the unknown parameters is called the probabilistic model, and its determination is called the problem of specification.

Once the probabilistic model is given, the purpose of statistical data analysis can be formulated as making judgments on the values of the parameters, which may sometimes go wrong but can be relied on with some margin of probability of error that can be mathematically rigorously ascertained. The formal procedure of making such judgments is called statistical inference, and its mathematical theory has been well established over the last hundred years (→ 401 Statistical Inference).

In most cases of statistical inference, the joint density function of the data plays an important role, and when it is regarded as a function of the unknown parameters for given
1481 398 A
Statistical Decision Functions

values of observations it is called the †likelihood function.

Q. Exploratory Procedures

In many applications of statistical data analysis, we are not quite sure of the validity of the probabilistic model assumed, or we admit that the models are, at best, approximations to reality and hence cannot be exactly correct; the approximation may not be precise enough for the conclusions drawn from the assumptions to be practically reliable. Therefore we have to check whether the model assumed is at least approximately valid for the data, and if not, we have to look for a better model that reflects more accurately the structure of the actual data. Thus in many practical applications of data analysis, we have to scrutinize the structure of the data and try various models before settling on a model and drawing final conclusions (which are still susceptible to further revisions when more data are obtained). Methods used in such a process are called exploratory procedures, which depend partly on the formal procedure of testing hypotheses and partly on intuitive reasoning sometimes combined with graphical presentations of the data, and also on scientific and empirical understanding of the subject matter.

Suppose in the simplest case that $n$ observations $X_1, \dots, X_n$ are assumed to be independently and identically distributed. Under the condition that all those values are observed under the same well-controlled situation, this assumption is reasonable. But in reality some of the observations may be subject to some unexpected effect due to either a fortuitous outside cause or some "gross error" in the measurement procedure, the process of reporting, etc., and may show much greater variation than others. Such observations can be detected by certain outlier tests or simply by looking at the data carefully, and if it is established that some observations are definitely outside of the possible random variability or are subject to some hazardous external effect, those data could be omitted from consideration. Further, the assumed probability density $f(x, \theta)$ may not well approximate the distribution of the actual data even after the "outliers" are omitted. Some test for goodness of fit should be applied, and if the hypothesis is rejected, we have to modify our model. Also, if we are provided with several candidates for the model to be adopted, we have to apply some procedure of model selection (→ 400 Statistical Hypothesis Testing, 403 Statistical Models). It could also happen that the supposedly uniform conditions of observation, measurement, or experimentation did not actually prevail but that there has been some heterogeneity among the observations. Bimodality or multimodality of the histogram, i.e., existence of two or more peaks in the histogram, usually strongly suggests such heterogeneity. In such cases, grouping or stratification of the observations is required to make the conditions of observation within each group nearly uniform.

References

[1] R. A. Fisher, Statistical methods for research workers, Oliver & Boyd, first edition, 1925.
[2] M. G. Kendall and A. Stuart, The advanced theory of statistics, 3 vols., Griffin, 1958-1966.
[3] J. W. Tukey, Exploratory data analysis, Addison-Wesley, 1977.

398 (XVIII.6)
Statistical Decision Functions

A. General Remarks

The theory of statistical decision functions was established by A. Wald as a mathematically unified theory of statistics (→ 401 Statistical Inference). In this theory, the problems of mathematical statistics, for example, statistical hypothesis testing and †statistical estimation, are formulated in a unified way [1].

A †measurable space $(\mathcal{X}, \mathfrak{B})$ with a fixed †probability measure is called a sample space, and an element $x \in \mathcal{X}$ is called a sample point. Suppose that we are given a family $\mathcal{P} = \{P_\theta \mid \theta \in \Omega\}$ of probability measures on $(\mathcal{X}, \mathfrak{B})$, where $\Omega$ is called the parameter space, and a †random variable $X$ takes values in $\mathcal{X}$ according to a true probability distribution $P$ assumed to belong to $\mathcal{P}$. This article deals with the problems involved in making a decision about the parameter $\theta$, called determining the true value of the parameter, such that $P = P_\theta$. To describe the procedure for such a decision based on the observation of the behavior of $X$, we need a triple $(\mathcal{A}, \mathfrak{C}, D)$ consisting of a set $\mathcal{A}$, a †$\sigma$-algebra $\mathfrak{C}$ of subsets of $\mathcal{A}$, and a set $D$ of mappings $\delta$ from $\mathcal{X}$ into the set of probability measures on $(\mathcal{A}, \mathfrak{C})$, $\delta : x \to \delta(\cdot \mid x)$, such that for a fixed $C \in \mathfrak{C}$, the function $\delta(C \mid x)$ is $\mathfrak{B}$-measurable. We call $\mathcal{A}$ an action space or decision space, $\delta$ a statistical decision function (or simply a decision function) or statistical decision procedure, and $D$ a space of decision functions. In actual decision procedures, $\delta(C \mid x)$ is the probability that an action belonging to $C$ is taken, based on the observation of sample point $x$. We further consider a nonnegative function $w : \Omega \times \mathcal{A} \to \mathbf{R}$, called a loss function, such that for a fixed $\theta$, $w(\theta, a)$ is $\mathfrak{C}$-measurable. By averaging the loss, we obtain the risk function:

$$r(\theta, \delta) = \int_{\mathcal{X}} \int_{\mathcal{A}} w(\theta, a)\,\delta(da \mid x)\,P_\theta(dx).$$

Two decision functions $\delta$ and $\delta'$ are identified if $\delta(C \mid x) = \delta'(C \mid x)$ for almost every $x$ with respect to $P_\theta$, for all $\theta \in \Omega$ and all $C \in \mathfrak{C}$. When to each $x \in \mathcal{X}$ there corresponds a unique action $a_x$ such that $\delta(\{a_x\} \mid x) = 1$, the decision function $\delta$ is said to be nonrandomized; otherwise, randomized. The system $(\mathcal{X}, \mathfrak{B}, \mathcal{P}, \Omega, \mathcal{A}, \mathfrak{C}, w, D)$ is called a statistical decision problem.

From the point of view of this theory, †point estimation, †interval estimation, and statistical hypothesis testing (→ 400 Statistical Hypothesis Testing) are described as follows.

(1) In point estimation we assume that the action space $\mathcal{A}$ is a subset of $\mathbf{R}$ and that we are given functions $\varphi : \mathcal{X} \to \mathcal{A}$ and $l : \mathcal{P} \to \mathbf{R}$. The problem is to estimate the value of $l(P)$ by using the real value $\varphi(x)$ at an observed sample point $x \in \mathcal{X}$. As the loss function, we often set $w(\theta, a) = C(\theta)(a - l(P_\theta))^2$, where $C(\theta)$ is a function of $\theta$, and call it a quadratic loss function (→ 399 Statistical Estimation).

(2) In interval estimation we assume that each action is represented as an interval in $\mathbf{R}$. Each interval $[u, v]$ can be represented by a point $(u, v)$ of the half-space $\mathbf{R}^{2\prime} = \{z = (z_1, z_2) \mid z_1 \le z_2\}$, which may be taken as the action space. A weighted sum $\alpha w_1(\theta, z) + \beta w_2(\theta, z)$ ($\alpha, \beta \ge 0$) of two functions $w_1$ and $w_2$ often supplies the loss function (→ 399 Statistical Estimation).

(3) In testing a hypothesis $H : \theta \in \omega_0$ versus an †alternative $A : \theta \in \omega_1$ ($\omega_0 \cap \omega_1 = \emptyset$, $\omega_0 \cup \omega_1 = \Omega$), the action space can be expressed by the set consisting of two points $a_1$, $a_2$, where $a_1$ denotes the decision to reject $H$ and $a_2$ the decision to accept $H$. The loss function is defined as follows:

$$w(\theta, a_1) = \begin{cases} 1 & (\theta \in \omega_0), \\ 0 & (\theta \in \omega_1), \end{cases} \qquad w(\theta, a_2) = \begin{cases} 0 & (\theta \in \omega_0), \\ 1 & (\theta \in \omega_1). \end{cases}$$

This is called a simple loss function. Whatever testing procedure $\delta$ is adopted, the probabilities of the †errors of the first or second kind coincide with the values of $r(\theta, \delta)$ for $\theta \in \omega_0$ or $\theta \in \omega_1$, respectively (→ 400 Statistical Hypothesis Testing).

When $\Omega$ is the union of mutually disjoint nonempty sets $\omega_1, \omega_2, \dots, \omega_n$, $\mathcal{A} = \{a_1, a_2, \dots, a_n\}$, and $w(\theta, a_j) = c_{ij}$ ($\theta \in \omega_i$) with $c_{ij} > 0$ ($i \ne j$), $c_{ii} = 0$, the decision problem is called an $n$-decision problem.

B. Optimality of Statistical Decision Functions

Consider the problem of choosing the best decision function $\delta$. When $r(\theta, \delta_1) \le r(\theta, \delta_2)$ for all $\theta$ and there exists at least one $\theta_0$ such that $r(\theta_0, \delta_1) < r(\theta_0, \delta_2)$, the decision function $\delta_1$ is said to be uniformly better than $\delta_2$. If there exists a decision function $\delta_0$ in $D$ that is uniformly better than any other $\delta$ in $D$, it is the best decision function. However, such a function $\delta_0$ does not always exist. A decision function $\hat\delta$ in $D$ is said to be admissible if there exists no other decision function $\delta$ in $D$ that is uniformly better than $\hat\delta$. In other words, $\hat\delta$ is admissible if and only if the validity of the inequality $r(\theta, \delta) \le r(\theta, \hat\delta)$ for some $\delta \in D$ and all $\theta \in \Omega$ implies $r(\theta, \delta) = r(\theta, \hat\delta)$ for all $\theta \in \Omega$. When there is no information about $P$ except that it is a member of $\mathcal{P}$, we follow the minimax principle and choose a function $\delta^*$ for which we have $\inf_{\delta \in D} \sup_{\theta \in \Omega} r(\theta, \delta) = \sup_{\theta \in \Omega} r(\theta, \delta^*)$. This decision function $\delta^*$ is called a minimax decision function or minimax solution.

Let $\mathfrak{F}$ be a $\sigma$-algebra of subsets of $\Omega$, and suppose that $r(\theta, \delta)$ is $\mathfrak{F}$-measurable for any fixed $\delta$. If, furthermore, we are given a probability measure $\xi$, called an a priori distribution, on $(\Omega, \mathfrak{F})$, we choose a $\hat\delta$ that satisfies

$$\inf_{\delta \in D} \int_\Omega r(\theta, \delta)\,d\xi(\theta) = \int_\Omega r(\theta, \hat\delta)\,d\xi(\theta).$$

Such a $\hat\delta$ and the integral of $r(\theta, \hat\delta)$ are called a Bayes solution and the Bayes risk relative to $\xi$, respectively. Let $F$ be a family of a priori distributions on $(\Omega, \mathfrak{F})$. If $\hat\delta$ satisfies

$$\inf_{\xi \in F} \Bigl( \int_\Omega r(\theta, \hat\delta)\,d\xi(\theta) - \inf_{\delta \in D} \int_\Omega r(\theta, \delta)\,d\xi(\theta) \Bigr) = 0,$$

$\hat\delta$ is called a Bayes solution in the wider sense relative to $F$. If $\mathcal{P}$ is †dominated by $\lambda$ with a $\mathfrak{B} \times \mathfrak{F}$-measurable $f(x, \theta) = dP_\theta/d\lambda$, $w(\theta, a)$ is $\mathfrak{F} \times \mathfrak{C}$-measurable, and

$$A_x = \Bigl\{ a^* \in \mathcal{A} \Bigm| \int_\Omega w(\theta, a^*) f(x, \theta)\,d\xi(\theta) = \inf_{a \in \mathcal{A}} \int_\Omega w(\theta, a) f(x, \theta)\,d\xi(\theta) \Bigr\}$$

is nonempty and $\mathfrak{C}$-measurable for $\lambda$-almost

every $x$, then a Bayes solution $\delta$ with respect to $\xi$ satisfies $\delta(A_x \mid x) = 1$ for $\lambda$-almost every $x$. For a sample point $x$ with $\int_\Omega f(x, \theta)\,d\xi(\theta) \ne 0$, a probability measure $\eta(\cdot \mid x, \xi)$ on $(\Omega, \mathfrak{F})$ defined by

$$\eta(B \mid x, \xi) = \frac{\int_B f(x, \theta)\,d\xi(\theta)}{\int_\Omega f(x, \theta)\,d\xi(\theta)}$$

is called an a posteriori distribution. To get a Bayes solution it is enough to minimize the value of $\int_\Omega w(\theta, a)\,d\eta(\theta \mid x, \xi)$ for every observed $x$.

If $\xi$ in the definition of $A_x$ defined above is a $\sigma$-finite measure on $(\Omega, \mathfrak{F})$, a decision function $\delta$ satisfying $\delta(A_x \mid x) = 1$ for $\lambda$-almost every $x$ is called a generalized Bayes solution with respect to $\xi$.

Let $D'$ be a subset of the space $D$. If for any $\delta \in D - D'$ there exists a $\delta' \in D'$ that is uniformly better than $\delta$, then $D'$ is called a complete class. If for any $\delta \in D$ there exists a $\delta' \in D'$ that is either uniformly better than $\delta$ or has the same risk function as $\delta$, then $D'$ is called an essentially complete class. If $D'$ is complete and any proper subset of $D'$ is not complete, then $D'$ is called a minimal complete class. If a minimal complete class exists, it is unique and coincides with the set of all admissible decision functions.

C. n-Decision Problems

In an $n$-decision problem where $\mathcal{A} = \{a_1, \dots, a_n\}$, we set $\delta_i(x) = \delta(a_i \mid x)$ for a decision function $\delta$, where $\delta_i(x)$ is $\mathfrak{B}$-measurable and satisfies $\delta_i(x) \ge 0$, $\delta_1(x) + \dots + \delta_n(x) = 1$. We consider the set $\mathcal{D}$ of vector-valued functions $\Delta(x) = (\delta_1(x), \dots, \delta_n(x))$ whose components $\delta_i(x)$ satisfy the conditions just given. Such a vector-valued function $\Delta(x)$ can be identified with $\delta(x)$; we write $\delta(x)$ instead of $\Delta(x)$ also. If in addition the parameter space $\Omega$ is a finite set $\{\theta_1, \theta_2, \dots, \theta_k\}$, we can consider a mapping $\psi : \mathcal{D} \to \mathbf{R}^k$ defined by $\psi(\delta) = (r(\theta_1, \delta), \dots, r(\theta_k, \delta))$, and then $S = \psi(\mathcal{D}) = \{\psi(\delta) \mid \delta \in \mathcal{D}\}$ is convex and closed in $\mathbf{R}^k$. If $\delta$ is nonrandomized, then for each $x$, one and only one of the $\delta_i(x)$ is 1, and all others are 0. Hence, in this case, $\mathcal{X}$ is the disjoint union of $\mathfrak{B}$-measurable subsets $B_i$ ($i = 1, \dots, n$) such that $\delta(a_i \mid x) = 1$ if and only if $x \in B_i$. A probability measure $m$ on $(\mathcal{X}, \mathfrak{B})$ is said to be atomless if for any set $A \in \mathfrak{B}$ with $m(A) > 0$ and any $b$ with $0 < b < m(A)$, there exists a subset $B \in \mathfrak{B}$ of $A$ such that $m(B) = b$. If $\Omega$ is finite and every member of $\mathcal{P}$ is atomless, the image $\psi(\mathcal{D}')$ of the set $\mathcal{D}'$ of all nonrandomized decision functions coincides with $\psi(\mathcal{D})$. This shows that $\mathcal{D}'$ is an essentially complete class in $\mathcal{D}$. However, when some members of $\mathcal{P}$ are not atomless, $\psi(\mathcal{D}')$ is not always equal to $\psi(\mathcal{D})$, but $\psi(\mathcal{D}')$ is a closed subset of $\mathbf{R}^k$. In particular, when $n = 2$ for given probability measures $P_1, \dots, P_k$ over $(\mathcal{X}, \mathfrak{B})$, the set $S$ of all points $(P_1(B), \dots, P_k(B))$ ($B \in \mathfrak{B}$) in $\mathbf{R}^k$ is a closed set. If in addition $P_1, \dots, P_k$ are all atomless, $S$ is convex. These results are known as the Lyapunov theorem.

A two-decision problem with $\omega_1 = \{1\}$, $\omega_2 = \{2\}$, and $c_{ij} = 1$ ($i \ne j$), $= 0$ ($i = j$) is called a dichotomy. We discuss this problem in some detail in order to explain the concept of optimality of decision functions. For a dichotomy, $S = \psi(\mathcal{D})$ is a set in $\mathbf{R}^2$ that (i) is convex, (ii) is closed, (iii) is symmetric about $(1/2, 1/2)$, (iv) contains the points $(0, 1)$, $(1, 0)$, and (v) is a subset of the square $[0, 1] \times [0, 1]$ (Fig. 1). The set of decision functions $\delta$ that is mapped under $\psi$ onto the curve ACDB in Fig. 1 constitutes the minimal complete class, and a decision function $\delta$ mapped onto the point D is a minimax solution. Let $\xi$ be an a priori distribution such that

$$\xi(1) = \alpha, \quad \xi(2) = \beta \quad (\alpha + \beta = 1,\ \alpha \ge 0,\ \beta \ge 0).$$

Fig. 1 (the set $S$ with a supporting line $\alpha x + \beta y = \text{constant}$)

Then a Bayes solution relative to $\xi$ is mapped under $\psi$ onto a supporting point C with direction ratio $-\alpha/\beta$ and is obtained as the characteristic function of the set $E = \{x \mid \alpha f(x) < \beta g(x)\}$, where $f$ and $g$ are †Radon-Nikodym derivatives $dP_1/d\lambda$ and $dP_2/d\lambda$ with respect to $\lambda = P_1 + P_2$, that is, $f$, $g$ are measurable functions on $\mathcal{X}$ such that for any measurable set $E$, we have

$$P_1(E) = \int_E f\,d\lambda, \qquad P_2(E) = \int_E g\,d\lambda.$$

This fact implies that the most powerful test constructed in the Neyman-Pearson fundamental lemma (→ 400 Statistical Hypothesis Testing B) is precisely a Bayes solution.

D. Complete Class Theorems

Suppose that $\mathcal{P}$ is †dominated by a $\sigma$-finite measure $\lambda$, and let $f(x, \theta)$ be the Radon-Nikodym derivative of $P_\theta$ with respect to $\lambda$.
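The optimality notions of Sections B and C (uniformly better, admissible, minimax, Bayes solution) can be checked by brute force on a small dichotomy. The sketch below is only an illustration under stated assumptions: the three-point sample space and the probabilities in `P` are hypothetical numbers, the loss is the 0-1 loss of the dichotomy, and only the eight nonrandomized decision functions are enumerated.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical dichotomy: parameter values 1 and 2, sample points x = 0, 1, 2.
P = {1: [F(6, 10), F(3, 10), F(1, 10)],   # P_1({x})
     2: [F(1, 10), F(3, 10), F(6, 10)]}   # P_2({x})

def risk(rule):
    """Risk point (r(1, d), r(2, d)) under 0-1 loss; rule[x] is the action taken at x."""
    return tuple(sum(P[theta][x] for x in range(3) if rule[x] != theta)
                 for theta in (1, 2))

rules = list(product((1, 2), repeat=3))        # all nonrandomized decision functions
risks = {rule: risk(rule) for rule in rules}

def uniformly_better(r1, r2):
    """r1 <= r2 in every coordinate, with strict inequality in at least one."""
    return (all(a <= b for a, b in zip(r1, r2))
            and any(a < b for a, b in zip(r1, r2)))

# Admissible within the enumerated class: no other rule is uniformly better.
admissible = [d for d in rules
              if not any(uniformly_better(risks[e], risks[d]) for e in rules)]

# Minimax value within the enumerated (nonrandomized) class.
minimax_value = min(max(risks[d]) for d in rules)

# Bayes risk for the prior xi(1) = alpha, xi(2) = beta.
alpha, beta = F(1, 2), F(1, 2)
bayes_risk = {d: alpha * risks[d][0] + beta * risks[d][1] for d in rules}

# Rule described in Section C: take action 2 exactly on E = {x : alpha f(x) < beta g(x)}.
lr_rule = tuple(2 if alpha * P[1][x] < beta * P[2][x] else 1 for x in range(3))
```

In this example the rule built from the set $E$ attains the smallest Bayes risk among the enumerated rules. The minimax value found here is restricted to nonrandomized rules; a randomized rule that mixes the two best nonrandomized rules with equal probability has the risk point $(1/4, 1/4)$, which is why randomization matters when the measures are not atomless.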

Consider a subspace L of L, (:X, i.) containing holds on %, and (iii) for any pair tl,, &(ER),
(f(x, 0)I Uefi}, and the following equivalence there exists a I, such that no element a in
relation between bounded 2%measurable func- .d makes two of the w(Qi, a) (i= 1,2,3) attain
tions on 1’: ‘p, and qz are defined to be equiv- their minimum simultaneously, then essential
alent if and only if l ‘pi f& = 1 q2j% holds completeness of sT in 9 implfes that T is
for every f~ L. We further assume that the sufftcient [3] (- 399 Statistical Estimation).
dual space of L is the linear space %I1of the
equivalence classes of bounded measurable
functions just defined. Let C,(&) be the set of E. Invariance
all continuous functions on .d with compact
support, rt o !x the integral of 2~ C,(,c3) with Suppose that there exist one-to-one transfor-
respect to a probability measure TI over ~2, mation groups G, G, and G of X, Q, and -01,
and 9 o 6 the integral respectively, onto themselves (transformations
belonging to G, G, and G are tmeasurable with
respect to 23, 8, and K, respectively) and that
there exist homomorphisms g-3, g-Q of G to
for 9 EL, fi E D. As a base for neighborhoods of G and e, respectively, such that for BE% we
the space of decision functions around &ED, have Po(gm’B)=Pge(B) and w(gO,ga)=w(O,a).
weconsider I’(&,:% ,,..., a,,g ,,..., y,,s)= Then a decision problem (%,23, B, R, &, K:,
~6~(g,06,0a,-g,060a,(<&, i=l, . . ..n}. w, D) is said to be invariant under (G, G, G). In
where cli E C,(&), gi E L, E> 0. a decision problem invariant under (G, G, G), a
Let F be the set of all a priori distributions [ decision function satisfying 6(#C 1gx) = 6(C 1x)
on (Q, 3) each of which assigns the total mass is called an invariant decision function.
I to a finite subset of !& B the set of all Bayes Suppose that the transformation group G is
solutions relative to some <(EF), and W the set locally compact and is the union of a count-
of all Bayes solutions in the wide sense relative able family {Kn} of compact subsets. Let I be
to F. Suppose that .d is a tlocally compact, the o-algebra of Bore1 subsets of G such that
separable metric space and w(O,a) is lower the mapping (g, x)-(g,.gx) is measurable in
semicontinuous with respect to a for every the sense that the inverse image of any set in
fixed 0. Then if D is compact and convex, I- x 23 is also a set in I x %3. For such a I the
the intersection of Wand the closure B of i? +orbit G(x) of G through x is r-measurable.
constitutes an essentially complete class [2]. We assume that following conditions: (1) 9 is
Moreover, if Q is compact with respect to the dominated; (2) G operates teffectively on 3; (3)
metric d(0,,0,)=su~,,,(P,~(B)-P,~(B)( and C operates ttransitively on Q, (4) for any com-
{ w(. , a) 1a E &} is a uniformly bounded and pact subset J of G,
tequicontinuous family on R, then the class of
Bayes solutions relative to some a priori distri-
bution is a complete class [ 11. These propo-
sitions are called complete class theorems (for where ,u’ is a right-invariant Haar measure
(- 225 Invariant Measures C) on G and
complete classes in specified problems and
admissibility of individual procedures - 399 K;J-‘={gh-‘Ig~K,,h~J};(S)there
Statistical Estimation; 400 Statistical Hypoth- is a tconditional probability distribution
esis Testing; and Appendix A, Table 23). P,(.:z), given zig, on X; (6) the integral
Two measurable spaces (S, -vi) and (R, W) are jww(@> iWPo(dg x : z 1 a tt ams its minimum
said to be isomorphic if there exists a corre- value b(& z) for any 0 and z E G(x); and (7)
spondence p of S and R (E E Y implies p(E) E .X
and, conversely, p(E) EB implies E E Y) where w(O,@)Po(dgx:z)-b(B,z)
p is one-to-one onto. Let 3 be the set of all
decision functions associated with a sample uniformly in a, where K,x = { gx 1YE K,}. Then
space (:I,‘%) and an action space (,d, K), T be a the best invariant decision function exists
+statistic on (5, %) taking values in a measur- which is also minimax in 9 [4] (- 400 Statis-
able space (?U, E), and g* be the set of all tical Hypothesis Testing). It is shown in [9]
decision functions having sample space (“y, @) that some invariant minimax decision func-
and action space (rA, K). The set of all 6 E 9 for tions are not admissible.
which there exists a fi* E%* satisfying 6(C 1x)
=6*(CI T(x)) (CEK) is denoted by gr. If(i)
T is a +sufIicient statistic and if (ii) (.z!, K) is iso- F. Sequential Decision Problems
morphic to the measurable space Rk associated
with the u-algebra of its Bore1 subsets, then t%), In the general framework of statistical deci-
is essentially complete in 9. Conversely, if(i) .‘P sion theory, not only the decisions to be taken
is a dominated family, (ii) f(x, 0) > 0 always but the number of samples to be observed

may be determined based on the previous and&(x, ,..., x,)=d* ifJw(O,d*)dlr,(x, ,...,
observations. x,;<,J=inf,,~w(B,d)d~,(x, ,..., x.;&), where
A simple formulation is illustrated for the x,(x,, , x,; &) is the posterior distribution
sequential decision problem given below. given X,=x1, . . . . X,=x, under the prior 5,.
Suppose that X, , X,, , X, are independently For the dichotomy problem with the simple
and identically distributed random variables loss function discussed above, the Bayes deci-
with probability measure PH. We assume that sion rule has the form
the X’s are to be observed one by one, and
s,(x,, . ,x,)= 1 if nO(x,, . . . . x,)<n, or >7cr,,
at the ith stage, when we have observed X,,
“’ >Xi, we are to decide whether to continue = 0 otherwise,
sampling and observe Xi+l or to stop obser-
and
vation and choose an action or decision in 3,
utilizing all the observations thus far obtained. d”n(X,,..., x,,)=dl if rr,(x,, . . . . x,)<n,,
Then a decision rule is defined by a sequence
=d, ifn,,(x ,,..., x,) ax,,
of pairs (S,,, s,), n = 0, 1, , where 6, is a map-
ping from the space X” to 9 (for the sake of where nO(xl, . . . . x,) is the posterior prob-
simplicity we here exclude randomized deci- ability that O=Q,, given X, =x,, Xn=x,,. This
sions), and s,=s,(x~, . ,x,) is a measurable amounts to the following rule: Continue sam-
function from X” to the interval [0, 11. s, gives pling as long as
the probability of stopping the sampling when
the first II of the x’s are observed, and 6, de-
tines the decision taken when the observations
and decide on d, as soon as 2(x,, . . . , x,) > y2
are stopped. We include s0 and S, to denote
andond, assoonasI(x,,...,x,)dy,,which
the probability of taking a decision without
is actually equivalent to a tsequential proba-
making any observation and the decision to be
bility ratio test (- 400 Statistical Hypothesis
taken then. We call 6, the terminal decision
Testing).
rule and s, the stopping rule. Then for such a
decision rule 3 the total expected loss or the

r(O,cY)=
fsn~(l-sj(x,
II=0,...)
Xj))
risk is given by

j=O
G. Information in Statistical

The part &=(X, b,P, 0) of a decision problem


Experiments

A = (%, 23,8, s1, d, 6, w, D) is called a statistical


experiment In this section, we consider n-
decision problems, i.e., those wherein fi con-
sists of a finite number { 1,2,. . . , n} of states,
where c,(x,j x1, , x”~,) is the cost of observa- and we denote the set S = {r( 1,6), r(2,6), ,
tionofX,=x,whenX,=x,,...,X,-,=x,-,. r(n, S) 16 EGO} by L(A). Let 8, and & be two
The rule 5 is called a sequential decision rule experiments having a common parameter
or a sequential decision function, and the whole
space R, and let A1 and A2 be two decision
setup a sequential decision problem. In most of problems composed of Q, and gZ, respectively,
the sequential decision problems the cost of
and a common (&, c;, w). We say that the
observation is assumed to be equal to a con- experiment &, is more informative than the
stant c per observation, and then the Bayes experiment &* if L(A,)zJL(A,) for any action
risk r*(t) for the prior distribution &, satisfies
space (&,a) and any loss function w [6];
the relation
that is, whatever the actions proposed and the
loss incurred, the experiment 6, can offer a
r*(&)=min i;f w(O,d)d&,
IS decision procedure at least as efficient as the
experiment &Z. Thus the set L(A) with A =
r*(nH(x,,Te))dPs(x,)dTe+c (8, .d, a, w, D) represents some feature of infor-
ss mation that d can provide about the states Q.
where x,(x,, &J denotes the posterior distri- However, comparison of L(A) is not easy to
bution when X, =x, is observed under the carry out. S. Kullback and R. A. Leibler de-
prior distribution &; and the Bayes decision fined the concept of information for the case of
rule satisfies the condition a dichotomy A [S]. If the probability distri-
bution induced by a random variable X has a
sn(xl, ,x,)= 1 if r*(7rB(xlr ,x,; &J)
Radon-Nikodym derivative fi(x) (>O) or f*(x)
(> 0) with respect to 1, we define
=inf ~(0, d)dx,(x,, , x,; to),
s
= 0 otherwise,

$$I(f_1, f_2) = \int f_1(x) \log\frac{f_1(x)}{f_2(x)}\, d\lambda,$$

calling this the Kullback-Leibler information number (or K-L information number). This number is uniquely determined by the set $S = L(\Delta)$ in Fig. 1, and the larger $S$ becomes, the larger $I(f_1, f_2)$ becomes. If $\alpha$, $\beta$ ($\alpha + \beta = 1$) are a priori probabilities and $\pi(1 \mid x)$ and $\pi(2 \mid x)$ are a posteriori probabilities of 1 and 2, respectively, we have, from the Bayes theorem,

$$\log\frac{f_1(x)}{f_2(x)} = \log\frac{\pi(1 \mid x)}{\pi(2 \mid x)} - \log\frac{\alpha}{\beta}.$$

Here the right-hand side stands for the change in the probability of the occurrence of the state after an observation of $x$, and the †expectation of the left-hand side under $f_1$ is $I(f_1, f_2)$. The K-L information number has the following properties: (i) $I(X: 1, 2) \ge 0$, and $I(X: 1, 2) = 0 \Leftrightarrow f_1 = f_2$; (ii) †independence of $X$ and $Y$ implies $I(X: 1, 2) + I(Y: 1, 2) = I(X, Y: 1, 2)$; (iii) for a statistic $T = t(x)$, $I(X: 1, 2) \ge I(T: 1, 2)$, where the equality holds if and only if $T$ is sufficient for $\mathscr{P} = \{P_1, P_2\}$, in which $dP_1/d\lambda = f_1$ and $dP_2/d\lambda = f_2$; and, as a result of (ii), (iv) if $X_1, \dots, X_n$ are distributed independently with the same distribution, $I(X_1, \dots, X_n: 1, 2) = nI(X_1: 1, 2)$.

Suppose next that $\Omega$ is the real line and that the Radon-Nikodym derivative $g(t: \theta)$ of the distribution of a statistic $T$ with respect to a measure $\lambda$ has the following properties: (i) the set of all $t$ at which $g(t: \theta) > 0$ is independent of $\theta$; (ii) $g(t: \theta)$ is continuously twice differentiable; and (iii) the order of differentiation with respect to $\theta$ and integration with respect to $t$ can be interchanged. Then $I(T: \theta, \theta + d\theta) = I(T: \theta)\, d\theta^2$ for an infinitesimal displacement $d\theta$ of $\theta$. Here $I(T: \theta)$ coincides with the Fisher information (→ 399 Statistical Estimation).

Suppose that we are given a sequence $(X_1, X_2, \dots)$ of independent random variables whose distributions have as their density either $(f_1, f_2, \dots)$ or $(g_1, g_2, \dots)$, that is, $f_i$ and $g_i$ are candidates for the density of distribution of $X_i$ ($i = 1, 2, \dots$). A method to determine which of the sequences $(f_i)$, $(g_i)$ actually corresponds to $(X_i)$ is given by the Kakutani theorem. Let $F$ be the distribution of $(X_1, X_2, \dots)$ when each $X_i$ is distributed according to $f_i$, and let $G$ be that of $(X_1, X_2, \dots)$ when each $X_i$ is distributed according to $g_i$. To see how $X_1, X_2, \dots$ are actually distributed, we assume that the loss incurred by an incorrect decision is 1 and the loss incurred by a correct decision is 0. Denote such a decision problem (dichotomy) by $\Delta$. Then we generally have $L(\Delta) \subset I$, where $I = \{(x, y) \mid 0 \le x,\ y \le 1\}$. A necessary and sufficient condition for $L(\Delta) = I$ is that $\prod_{n=1}^{\infty} \rho(f_n, g_n) = 0$, where

$$\rho(f_n, g_n) = \int \sqrt{f_n g_n}\, d\lambda.$$

Consequently, if the $X_n$ ($n = 1, 2, \dots$) have the same distribution, we have $L(\Delta) = I$, and the correct decision can be made with no error based on infinitely many independent observations of $X_1, X_2, \dots$.

H. Relation to Game Theory

The theory of statistical decision functions is closely related to game theory. From the game-theoretic viewpoint, a statistical decision problem is considered to be a zero-sum two-person game played by the statistician against nature. A strategy of nature is the true distribution $P$ of the variable $X$ or the true value of $\theta$, and a strategy of the statistician is a decision function $\delta$. In this setup, the risk function $r(\theta, \delta)$ can be regarded as a †payoff function paid by the statistician to nature. An a priori distribution $\xi$ is a †mixed strategy of nature. A randomized decision function is a mixed strategy of the statistician. A minimax decision function corresponds to a minimax strategy of the statistician. A minimax strategy of nature is called a least favorable a priori distribution. If a decision problem is †strictly determined as a game, a minimax solution is a Bayes solution in the wide sense.

If $\delta_0$ is a Bayes solution with respect to $\xi_0$ and $r(\theta, \delta_0) \le R(\xi_0, \delta_0)$ for all $\theta$, the decision problem is strictly determined, and $\delta_0$ is minimax and $\xi_0$ is a least favorable a priori distribution, where $R(\xi_0, \delta_0) = \int_\Omega r(\theta, \delta_0)\, d\xi_0(\theta)$. If $\delta_0$ is a Bayes solution in the wide sense and $r(\theta, \delta_0)$ is constant as a function of $\theta$, the decision problem is strictly determined, and $\delta_0$ is minimax. If $\delta_0$ is admissible and $r(\theta, \delta_0) \equiv c < \infty$, $\delta_0$ is minimax. If $\Omega$ is a finite set and $\inf_\delta \sup_\theta r(\theta, \delta) < \infty$, the decision problem is strictly determined and there exists a least favorable a priori distribution. We have few general results about generalized Bayes solutions (→ 173 Game Theory; [6]).

References

[1] A. Wald, Statistical decision functions, Wiley, 1950 (Chelsea, 1971).
[2] L. LeCam, An extension of Wald's theory of statistical decision functions, Ann. Math. Statist., 26 (1955), 69-81.
[3] R. R. Bahadur, Sufficiency and statistical decision functions, Ann. Math. Statist., 25 (1954), 423-462.

[4] H. Kudo, On minimax invariant estimates of the transformation parameter, Nat. Sci. Rep. Ochanomizu Univ., 6 (1955), 31-73.
[5] S. Kullback, Information theory and statistics, Wiley, 1959.
[6] D. H. Blackwell and M. A. Girshick, Theory of games and statistical decisions, Wiley, 1954.
[7] L. Weiss, Statistical decision theory, McGraw-Hill, 1961.
[8] H. Raiffa and R. Schlaifer, Applied statistical decision theory, Harvard Univ. Graduate School of Business Administration, 1961.
[9] W. James and C. Stein, Estimation with quadratic loss, Proc. 4th Berkeley Symp. Math. Stat. Prob. I, Univ. of California Press (1961), 361-379.
[10] J. O. Berger, Statistical decision theory, foundations, concepts, and methods, Springer, 1980.
[11] R. H. Farrell, Weak limits of sequences of Bayes procedures in estimation theory, Appendix, Decision theory, Proc. 5th Berkeley Symp. Math. Stat. Prob. I, Univ. of California Press, 1967, 83-111.

399 (XVIII.7)
Statistical Estimation

A. General Remarks

Statistical estimation is one of the most important methods of statistical inference (→ 401 Statistical Inference). Its purpose is to estimate the values of †parameters (or their functions) involved in a distribution of a statistical †population by using observations on the population (→ 396 Statistic). Let $\mathscr{P} = \{P_\theta \mid \theta \in \Theta\}$ be a family of †probability distributions, indexed by a parameter $\theta$ and defined over a †measurable space (i.e., sample space) $(\mathscr{X}, \mathfrak{B})$. Let $X$ be a †random variable taking values in $\mathscr{X}$ and distributed according to a probability distribution $P$ that is a member of $\mathscr{P}$. Statistical estimation is a method of estimating the †true value of the parameter $\theta$ (i.e., the parameter $\theta$ such that $P = P_\theta$) or the (true) value $g(\theta)$ of a given parametric function $g$ (i.e., a function defined over $\Theta$) or both, at $\theta$, based on the observed value $x$ of the random variable $X$. The function $g$ maps the parameter values into $\mathbf{R}$, $\mathbf{R}^k$, or some function space. Statistical estimation methods are classified into two types: point estimation, which deals with individual values of $g(\theta)$, and interval (or region) estimation, by means of which regions that may contain the value $g(\theta)$ are considered. We can also include in statistical estimation the problem of predicting the tolerance region in which the value of a yet unobserved random variable may come out.

B. Point Estimation

In the method of point estimation for a given parametric function $g$, we choose a measurable mapping $\varphi$ from the sample space $(\mathscr{X}, \mathfrak{B})$ into a measurable space $(\mathscr{A}, \mathfrak{C})$ and state that "the value of $g(\theta)$ is $\varphi(x)$" for an observed value $x$, where $\mathscr{A}$ is a set containing the range of $g$ and $\mathfrak{C}$ is a †completely additive class of subsets in $\mathscr{A}$. The mapping $\varphi$, or the random variable $\varphi(X)$ taking values in the space $\mathscr{A}$, is called an estimator of $g(\theta)$, while the value $\varphi(x)$ determined by the observed value $x$ is called an estimate of $g(\theta)$. This estimate is sometimes termed a nonrandomized estimate in contrast to the following generalized notion of estimator. A mapping from $\mathscr{X}$ to a set of probability distributions defined over $(\mathscr{A}, \mathfrak{C})$ is called a randomized estimator, which reduces to a nonrandomized estimator when each image distribution degenerates to a single point. We assume that $\mathscr{A} = \mathbf{R}$ and $\mathfrak{C}$ = the class of all †Borel sets in $\mathbf{R}$, unless stated otherwise. We denote the †expectation and †variance with respect to $P_\theta$ by $E_\theta$ and $V_\theta$, respectively.

C. Unbiasedness

An estimator $\varphi(X)$ of $g(\theta)$ may not be exactly equal to $g(\theta)$ for any $\theta \in \Theta$ except for trivial cases, but could instead be stochastically distributed around it. An estimator $\varphi(X)$ is said to have unbiasedness if it is stochastically balanced around $g(\theta)$ in some sense, such as mean, median, or mode. A statistic $\varphi(X)$ is called a (mean) unbiased estimator of $g(\theta)$ if

$$E_\theta(\varphi(X)) = g(\theta)$$

for any $\theta \in \Theta$. A parametric function $g$ is said to be estimable if it has an unbiased estimator. For example, the sample mean is unbiased for the population mean: $E_\theta(\bar{X}) = E_\theta(X)$ for any $\theta \in \Theta$. Unbiasedness usually refers to mean unbiasedness, and we assume this unless stated otherwise. The function

$$b(\theta) = E_\theta(\varphi(X)) - g(\theta)$$

is called the bias of the estimator $\varphi(X)$. If we restrict ourselves to unbiased estimators $\varphi(X)$ only, it is best to choose, if possible, a $\varphi(X)$ whose variance $V_\theta(\varphi(X))$ is minimum uniformly for every $\theta \in \Theta$.

Theorem (Rao-Blackwell). If $T = t(X)$ is a †sufficient statistic, then for any unbiased

estimator $\varphi(X)$ of $g$ the †conditional expectation $\psi(t) = E(\varphi(X) \mid T = t)$ yields another unbiased estimator $\varphi^*(X) = \psi(t(X))$ of $g$, which satisfies $V_\theta(\varphi^*) \le V_\theta(\varphi)$ for all $\theta \in \Theta$, with the equality holding if and only if $\varphi(x) = \varphi^*(x)$ (a.e. $\mathscr{P}$). The notation a.e. $\mathscr{P}$ means that the statement concerned holds with probability 1 with respect to $P_\theta$ for each $\theta \in \Theta$. An estimator $\varphi$ of $g(\theta)$ is called a uniformly minimum variance (or UMV) unbiased estimator if $\varphi$ is unbiased for $g(\theta)$ and has a minimum variance uniformly in $\theta$ among the class of unbiased estimators for $g(\theta)$.

Theorem (Lehmann-Scheffé). If $T$ is a sufficient and †complete statistic, then for any estimable parametric function $g(\theta)$, there exists a unique UMV unbiased estimator of $g(\theta)$ that is a function of $T$. For example, suppose that $X = (X_1, X_2, \dots, X_n)$ is a random sample from a population with exponential type distribution $P_\theta$ with density $p_\theta(x) = \beta(\theta) u(x) \exp\bigl(\sum_{i=1}^{k} \alpha_i(\theta) t_i(x)\bigr)$ with respect to Lebesgue measure and that the set $\{(\alpha_1(\theta), \dots, \alpha_k(\theta)) \mid \theta \in \Theta\}$ contains some open set of $\mathbf{R}^k$. In this case, $T = (t_1(X), \dots, t_k(X))$ is a sufficient and complete statistic, and hence every real-valued measurable function $\psi(T)$ is the unique UMV unbiased estimator of the parametric function $E_\theta(\psi(T))$.

If for any $\theta \in \Theta$ the †median of the distribution of an estimator $\varphi(X)$ equals a real parametric function $g(\theta)$ when $X$ is distributed as $P_\theta$, i.e., if

$$P_\theta\{\varphi(X) \le g(\theta)\} \ge \tfrac{1}{2} \quad\text{and}\quad P_\theta\{\varphi(X) \ge g(\theta)\} \ge \tfrac{1}{2},$$

then $\varphi(X)$ is called a median unbiased estimator. For example, a sample median (suitably defined for the case of an even number of samples) is median unbiased for the population median. If $\varphi(X)$ is a median unbiased estimator of $g$, then for any real-valued monotone function $h$, an estimator $h(\varphi(X))$ for $h(g(\theta))$ is median unbiased, that is, median unbiasedness is preserved under monotone transformations, which is not the case with mean unbiasedness.

Restricting our consideration to the class of all median unbiased estimators, we can use the function $\alpha(u, \theta, \varphi)$ as an indicator of the behavior pattern of an estimator $\varphi$. The estimator $\varphi$ that minimizes $\alpha(u, \theta, \varphi)$ for all values of $u$ and $\theta$ ($u \ne \theta$) is said to be uniformly best. This property is also preserved under monotone transformations.

Theorem (Birnbaum). If a family of distributions $\{P_\theta \mid \theta \in \Theta \subset \mathbf{R}\}$ has a monotone †likelihood ratio with respect to a statistic $t(x)$ and the distribution function $F(t, \theta)$ of $T = t(X)$ is continuous both in $t$ for any $\theta$ and in $\theta$ for any $t$, then there exists a uniformly best median unbiased estimator of $\theta$. Actually, if $\theta = \hat\theta(t)$ is a solution of $F(t, \theta) = 1/2$, then $\varphi(X) = \hat\theta(t(X))$ is such an estimator.

If for any $\theta \in \Theta$ the †mode of the density function or the probability mass function of an estimator $\varphi(X)$ is equal to $g(\theta)$, then $\varphi(X)$ is called a modal unbiased estimator of $g(\theta)$.

D. Lower Bound of the Variance of an Unbiased Estimator

When there does not exist a sufficient and complete statistic, we can still seek to minimize the variance of the (mean) unbiased estimator at every fixed point $\theta = \theta_0$. In the remainder of this section, $\mathscr{P}$ is assumed to be dominated by a measure $\mu$, and $p_\theta(x)$ denotes the density function of $P_\theta$ with respect to $\mu$ and $\pi_\theta(x) = p_\theta(x)/p_{\theta_0}(x)$. The following theorem guarantees the existence of the locally best unbiased estimator.

Theorem (Barankin). Let $\mathscr{M}$ be the set of all unbiased estimators of a parametric function $g(\theta)$ with finite variance at $\theta = \theta_0$. Assume that $\mathscr{M}$ is not empty and $E_{\theta_0}((\pi_\theta(X))^2) < \infty$. Then there exists an estimator $\varphi_0$ in $\mathscr{M}$ that minimizes the variance at $\theta_0$ within $\mathscr{M}$. Actually, $\{\varphi_0\} = \mathscr{M} \cap \mathscr{L}_0$, where $\mathscr{L}_0$ is the linear space generated by $\{\pi_\theta(x) \mid \theta \in \Theta\}$. The minimum variance is given as follows:

$$V_{\theta_0}(\varphi_0(X)) = \sup_{\mathrm{I}} \frac{\bigl(\sum_{i=1}^{n} a_i h(\theta_i)\bigr)^2}{E_{\theta_0}\bigl(\bigl(\sum_{i=1}^{n} a_i \pi_{\theta_i}(X)\bigr)^2\bigr)} = \sup_{\mathrm{II}} \sum_{i=1}^{n} \sum_{j=1}^{n} h(\theta_i) h(\theta_j) \lambda^{ij},$$

where $h(\theta) = g(\theta) - g(\theta_0)$ and $\lambda^{ij}$ is the $(i, j)$-component of the inverse of the $n \times n$ matrix $(\lambda_{ij})$ with $\lambda_{ij} = E_{\theta_0}(\pi_{\theta_i}(X) \pi_{\theta_j}(X))$ and where the supremum $\sup_{\mathrm{I}}$ is taken over all positive integers $n$, $\theta_1, \dots, \theta_n \in \Theta$ and $a_1, \dots, a_n \in \mathbf{R}$, and becomes $\sup_{\mathrm{II}}$ when the supremum is taken over $n$ and the $\theta_i$. This theorem leads to the following three theorems with respect to the lower bound of the variance of an unbiased estimator. The first is immediate and the last two are obtained by replacing some order differences of $\pi_\theta$ with the corresponding differentials.

Theorem. For any unbiased estimator $\varphi(X)$ of $g$ and $\theta_0 \in \Theta$, we have

$$V_{\theta_0}(\varphi(X)) \ge \sup_{\theta} \frac{(g(\theta) - g(\theta_0))^2}{E_{\theta_0}\bigl((\pi_\theta(X) - \pi_{\theta_0}(X))^2\bigr)}$$

(Chapman-Robbins-Kiefer inequality).
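As a numerical illustration of the Chapman-Robbins-Kiefer inequality, the following minimal sketch assumes a model not discussed in the text: $n$ independent $N(\theta, 1)$ observations with $g(\theta) = \theta$. In that model $E_{\theta_0}(\pi_\theta(X)^2) = \exp(n(\theta - \theta_0)^2)$, so the ratio inside the supremum depends only on the displacement $\theta - \theta_0$. The function names and the grid of displacements are hypothetical choices made for illustration.

```python
import math


def chapman_robbins_ratio(delta, n):
    """(g(theta) - g(theta0))^2 / E_theta0[(pi_theta(X) - 1)^2], delta = theta - theta0.

    Assumes n i.i.d. N(theta, 1) observations and g(theta) = theta, for which
    E_theta0[pi_theta(X)^2] = exp(n * delta^2) and pi_theta0(X) = 1.
    """
    return delta ** 2 / math.expm1(n * delta ** 2)


def chapman_robbins_bound(n, deltas):
    """Largest value of the ratio over a finite grid of displacements."""
    return max(chapman_robbins_ratio(d, n) for d in deltas)


n = 5
deltas = [10.0 ** (-k) for k in range(1, 6)] + [0.5, 1.0, 2.0]
bound = chapman_robbins_bound(n, deltas)
cramer_rao = 1.0 / n  # (g'(theta0))^2 / (n I(theta0)) with I(theta0) = 1
print(bound, cramer_rao)
```

Under these assumptions the ratio increases as the displacement shrinks, so the supremum is approached in the limit $\theta \to \theta_0$ and agrees with the variance $1/n$ of the sample mean.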

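The Rao-Blackwell theorem of Section C can be checked exactly on a small example. The sketch below assumes a Bernoulli($\theta$) sample of size $n$ (an assumption made only for illustration), takes the crude unbiased estimator $\varphi(X) = X_1$ of $\theta$, and compares it with $\varphi^*(X) = E(X_1 \mid T) = T/n$, where $T = \sum X_i$ is sufficient. Moments are computed by enumerating all $2^n$ outcomes; all function names are hypothetical.

```python
from itertools import product


def mean_and_variance(estimator, n, theta):
    """Exact mean and variance of estimator(x) when x is a Bernoulli(theta) sample of size n."""
    mean = 0.0
    second_moment = 0.0
    for x in product((0, 1), repeat=n):
        successes = sum(x)
        prob = theta ** successes * (1 - theta) ** (n - successes)
        value = estimator(x)
        mean += prob * value
        second_moment += prob * value * value
    return mean, second_moment - mean ** 2


def crude(x):
    """phi(X) = X_1, unbiased for theta but ignoring the rest of the sample."""
    return float(x[0])


def rao_blackwellized(x):
    """phi*(X) = E(X_1 | T) = T / n, with T the number of successes."""
    return sum(x) / len(x)


n, theta = 4, 0.3
mean_crude, var_crude = mean_and_variance(crude, n, theta)
mean_rb, var_rb = mean_and_variance(rao_blackwellized, n, theta)
print(mean_crude, var_crude, mean_rb, var_rb)
```

Both estimators are unbiased, while conditioning on $T$ reduces the variance from $\theta(1 - \theta)$ to $\theta(1 - \theta)/n$ in this example.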
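Theorem. Suppose $\Theta \subset \mathbf{R}$. For any unbiased estimator $\varphi(X)$ of $g$ and under certain regularity conditions, we have

$$V_{\theta_0}(\varphi(X)) \ge \frac{(g'(\theta_0))^2}{E_{\theta_0}\bigl((\partial \log p_\theta(X)/\partial\theta \mid_{\theta = \theta_0})^2\bigr)}$$

(Cramér-Rao inequality), where the equality holds only for the exponential type distribution $p_\theta(x) = \beta(\theta) u(x) \exp(\alpha(\theta) \varphi(x))$. An example of such regularity conditions is (i)-(iii): (i) $E_{\theta_0}((\pi_\theta(X))^2) < \infty$ for all $\theta \in \Theta$; (ii) $p_\theta(x)$ has a partial derivative $p'_{\theta_0}(x)$ at $\theta = \theta_0$ (a.e. $P_{\theta_0}$); and (iii)

$$\lim_{\Delta\theta \to 0} E_{\theta_0}\left[\left(\frac{p_{\theta_0 + \Delta\theta}(X) - p_{\theta_0}(X)}{p_{\theta_0}(X)\,\Delta\theta} - \frac{p'_{\theta_0}(X)}{p_{\theta_0}(X)}\right)^2\right] = 0.$$

Corollary. Let $X = (X_1, \dots, X_n)$ be a random sample from a distribution with density $f(x, \theta)$ and let $I(\theta) = E_\theta((\partial \log f(X_1, \theta)/\partial\theta)^2)$. Since $E_\theta((\partial \log p_\theta(X)/\partial\theta)^2) = nI(\theta)$, the Cramér-Rao inequality implies

$$V_{\theta_0}(\varphi(X)) \ge \frac{(g'(\theta_0))^2}{n I(\theta_0)}.$$

The number $I(\theta_0)$ is called the Fisher information of the distribution $f(x, \theta)$. When the equality holds for an unbiased estimator $\varphi(X)$, $\varphi(X)$ is called an efficient estimator of $g(\theta)$. In general, the efficiency of an unbiased estimator $\varphi$ at $\theta = \theta_0$ is defined by

$$\mathrm{Eff}(\varphi) = \frac{(g'(\theta_0))^2}{n I(\theta_0) V_{\theta_0}(\varphi)}.$$

Theorem. For any unbiased estimator $\varphi(X)$ of $g(\theta)$ and under certain regularity conditions, we have

$$V_{\theta_0}(\varphi(X)) \ge \sum_{i=1}^{k} \sum_{j=1}^{k} g^{(i)}(\theta_0)\, g^{(j)}(\theta_0)\, K^{ij}$$

(Bhattacharyya inequality), where $g^{(i)}(\theta_0) = d^i g(\theta)/d\theta^i \mid_{\theta = \theta_0}$, and $K^{ij}$ is the $(i, j)$-component of the inverse of the matrix $(K_{ij})$ with

$$K_{ij} = E_{\theta_0}\left(\frac{p^{(i)}_{\theta_0}(X)}{p_{\theta_0}(X)} \cdot \frac{p^{(j)}_{\theta_0}(X)}{p_{\theta_0}(X)}\right), \quad i, j = 1, \dots, k.$$

An example of such regularity conditions is (i)-(iii): (i) $E_{\theta_0}((\pi_\theta(X))^2) < \infty$; (ii) $p_\theta(x)$ is $k$-times differentiable with respect to $\theta$ at $\theta = \theta_0$; (iii) the $i$th partial derivative $p^{(i)}_{\theta_0}(x)$, $i \le k$, satisfies

$$\lim_{\Delta\theta \to 0} E_{\theta_0}\left[\left(\frac{\Delta^i p_\theta(X) \mid_{\theta = \theta_0}}{p_{\theta_0}(X)\,(\Delta\theta)^i} - \frac{p^{(i)}_{\theta_0}(X)}{p_{\theta_0}(X)}\right)^2\right] = 0,$$

where $\Delta p_\theta(x) \mid_{\theta = \theta_0} = p_{\theta_0 + \Delta\theta}(x) - p_{\theta_0}(x)$ and $\Delta^i p_\theta(x) = \Delta(\Delta^{i-1} p_\theta(x))$ for $i \ge 2$. For $k = 1$ the Bhattacharyya lower bound is the same as the Cramér-Rao lower bound. In general, the former gives a sharper lower bound than the latter.

If the parameter is multidimensional, $\theta = (\theta_1, \dots, \theta_k)$, then for any unbiased estimator $\varphi(X)$ of $g(\theta)$ and under similar conditions to those for the 1-dimensional case, we have

$$V_{\theta_0}(\varphi(X)) \ge \sum_{i=1}^{k} \sum_{j=1}^{k} g'_i(\theta_0)\, g'_j(\theta_0)\, J^{ij},$$

where $g'_i(\theta_0) = \partial g(\theta)/\partial\theta_i \mid_{\theta = \theta_0}$ and $J^{ij}$ is the $(i, j)$-component of the inverse of the matrix $(J_{ij})$ with

$$J_{ij} = E_{\theta_0}\bigl(\partial \log p_\theta(X)/\partial\theta_i \mid_{\theta = \theta_0} \cdot\, \partial \log p_\theta(X)/\partial\theta_j \mid_{\theta = \theta_0}\bigr).$$

If $\theta^*(X) = (\theta_1^*(X), \dots, \theta_k^*(X))'$ is an unbiased estimator of $\theta = (\theta_1, \dots, \theta_k)'$ (i.e., $E_\theta(\theta_i^*(X)) = \theta_i$ for $i = 1, 2, \dots, k$), then the †covariance matrix $V_{\theta_0}(\theta^*(X))$ of $\theta^*(X)$ at $\theta_0$ satisfies $V_{\theta_0}(\theta^*(X)) \ge J^{-1}$, that is, the difference $V_{\theta_0}(\theta^*(X)) - J^{-1}$ is a nonnegative definite matrix. If $X = (X_1, \dots, X_n)$ is a random sample from a distribution having density $f(x_1, \theta)$, then by setting

$$I_{ij} = E_{\theta_0}\bigl(\partial \log f(X_1, \theta)/\partial\theta_i \mid_{\theta = \theta_0} \times\, \partial \log f(X_1, \theta)/\partial\theta_j \mid_{\theta = \theta_0}\bigr)$$

we have $J_{ij} = n I_{ij}$. The matrix $I = (I_{ij})$ is called the Fisher information matrix of the distribution $f(x_1, \theta)$.

E. Decision-Theoretic Formulation (→ 398 Statistical Decision Functions)

Let $W(\theta, a)$ ($\ge 0$) be the loss incurred from an estimate (or action) $a$ of the parameter when the true value of the parameter is $\theta$. The risk function of an estimator $\varphi(X)$ of the parametric function $g(\theta)$ is then defined as

$$r(\theta, \varphi) = E_\theta(W(\theta, \varphi(X))).$$

Statistical decision theory deals with the problem of minimizing, in an appropriate manner, the risk function by a suitable choice of $\varphi$. The notions of complete class, Bayes estimator, admissibility, minimax estimator, and invariant estimator, explained here and in Sections F-I, are the most important of the theory. The unbiased estimator explained in Section C may also be considered an important concept of the theory.

A class $C$ of estimators is said to be essentially complete if for any estimator $\varphi$ there exists an estimator $\varphi_0$ in $C$ such that

$$r(\theta, \varphi_0) \le r(\theta, \varphi)$$

for any $\theta \in \Theta$. The following two theorems hold, provided that the action space $\mathscr{A}$ is $\mathbf{R}$ and the loss function $W(\theta, a)$ is convex with respect to $a \in \mathscr{A}$ for any $\theta \in \Theta$.

Theorem (Hodges-Lehmann). If $W(\theta, a) \to \infty$ as $|a| \to \infty$, then the class of all nonrandomized estimators is essentially complete.

Theorem. If $T = t(X)$ is a sufficient statistic, then the class of all functions of $T$ is essentially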

complete. Actually, given any estimator $\varphi(X)$ of $g(\theta)$, the conditional expectation $\psi(t) = E(\varphi(X) \mid T = t)$ yields an estimator $\varphi_0(X) = \psi(t(X))$ satisfying $r(\theta, \varphi_0) \le r(\theta, \varphi)$ for any $\theta$, where the equality holds when and only when $\varphi = \varphi_0$, provided that $W$ is convex in the strict sense.

A loss function of the form

$$W(\theta, a) = \lambda(\theta)(a - g(\theta))^2, \quad \lambda(\theta) > 0,$$

is called a quadratic loss function. If, in particular, $\lambda(\theta) \equiv 1$, then

$$r(\theta, \varphi) = E_\theta((\varphi(X) - g(\theta))^2)$$

is called the mean square error of the estimator $\varphi(X)$ of $g(\theta)$. This error coincides with the variance when the estimator is unbiased.

F. Bayes Estimators

Let $\xi$ be an a priori distribution over the parameter space $\Theta$ associated with a certain †$\sigma$-algebra $\mathfrak{F}$, and assume that $r(\theta, \varphi)$ is $\mathfrak{F}$-measurable for every $\varphi$. Denote by $E_\xi$ the average operator relative to $\xi$. The infimum of the average risk $r(\xi, \varphi) = E_\xi(r(\theta, \varphi)) = E_\xi E_\theta(W(\theta, \varphi(X)))$ for $\varphi$ running over its range is called the †Bayes risk relative to $\xi$, while an estimator $\varphi(X)$ of $g(\theta)$ at which the average risk $r(\xi, \varphi)$ attains the infimum is called a Bayes estimator relative to $\xi$. A Bayes estimator is obtained as follows: Assume that $\mathscr{P}$ is dominated by a measure $\mu$ with $\mathfrak{B} \times \mathfrak{F}$-measurable density $p_\theta(x)$, the loss function $W(\theta, a)$ is $\mathfrak{F} \times \mathfrak{C}$-measurable, and

$$\int_\Theta p_\theta(x)\, d\xi(\theta) < \infty.$$

For each observed value of $x$, the Bayes estimator $\varphi(x)$ takes the value $a$ that minimizes

$$r(a \mid x) = E_\xi(W(\theta, a) \mid x) = \int_\Theta W(\theta, a)\, p(\theta \mid x)\, d\xi(\theta),$$

where $p(\theta \mid x)$ is the probability density, with given $x$, of $\theta$. We call $r(a \mid x)$ the posterior risk.

Theorem (Girshick-Savage). Suppose that the loss function is quadratic. For any $x$ the value of the posterior risk is either $\infty$ (for every value of $a$) or finite (for all or only one value of $a$). If the Bayes risk relative to $\xi$ is finite, then a Bayes estimator $\varphi^*(X)$ relative to $\xi$ is determined uniquely as follows: $\varphi^*(x) = a_0$ if $r(a \mid x) < \infty$ for only one value $a_0$, whereas $\varphi^*(x) = E_\xi(g(\theta)\lambda(\theta) \mid x)/E_\xi(\lambda(\theta) \mid x)$ if $r(a \mid x) < \infty$ for every $a$. If $E_\xi(\lambda(\theta)) < \infty$, then a Bayes estimator is either biased or has average risk zero.

G. Admissibility of Estimators

An estimator $\varphi_0(X)$ of a parametric function $g(\theta)$ is said to be admissible if and only if for any estimator $\varphi(X)$ of $g(\theta)$ the inequality $r(\theta, \varphi) \le r(\theta, \varphi_0)$ for all $\theta \in \Theta$ implies that

$$r(\theta, \varphi) = r(\theta, \varphi_0)$$

for all $\theta \in \Theta$. If an estimator is the unique Bayes estimator relative to some a priori distribution, then it is admissible. For example, let $X_1, \dots, X_n$ be a random sample from $N(\theta, 1)$, and let $W(\theta, a) = (a - \theta)^2$. Then $\varphi(X) = (c + n\sigma^2 \bar{X})/(1 + n\sigma^2)$ is the unique Bayes estimator relative to the prior distribution $N(c, \sigma^2)$ of $\theta$, where $\bar{X}$ is the sample mean. Hence any estimator of the form $\varphi_{a,b}(X) = a\bar{X} + b$ is admissible when $0 < a < 1$ and $-\infty < b < \infty$.

In the rest of this section and the next section, we restrict ourselves to quadratic loss functions. If a statistic of the form $c\varphi(X)$ with real $c$ is an admissible estimator of $g(\theta)$, then

Theorem (Karlin). Let $X$ be a random variable having a 1-dimensional exponential type distribution $dP_\theta(x) = \beta(\theta) e^{\theta x}\, d\mu(x)$ with a parameter space $\Theta = I(\underline{\theta}, \bar{\theta}) = \{\theta \mid \int e^{\theta x}\, d\mu(x) < \infty\}$, a closed or open interval, and let $g(\theta) = E_\theta(X) = -\beta'(\theta)/\beta(\theta)$ be a parametric function to be estimated. Then the estimator $\varphi_\lambda(X) = X/(\lambda + 1)$ for real $\lambda$ is admissible provided that $\int_c^b (\beta(\theta))^{-\lambda}\, d\theta \to \infty$ as $b \to \bar{\theta}$ and $\int_a^c (\beta(\theta))^{-\lambda}\, d\theta \to \infty$ as $a \to \underline{\theta}$ for any $c \in (\underline{\theta}, \bar{\theta})$.

Corollary. When $\Theta = (-\infty, \infty)$ the estimator $\varphi_0(X) = X$ is admissible.

Corollary. Let $\Theta = (-\infty, \infty)$, and assume that both intervals $(-\infty, 0]$ and $[0, \infty)$ have positive measure with respect to $\mu$. Then every estimator of the form $\varphi(X) = aX$ with $0 < a \le 1$ is admissible. This theorem can be applied to a random sample $X_1, \dots, X_n$ drawn from an exponential type distribution because the sufficient statistic $\bar{X} = \sum X_i/n$ has an exponential type distribution and $E_\theta(\bar{X}) = E_\theta(X_1)$.

Theorem (Karlin). Let $X$ be a random variable having a distribution $dP_\theta(x) = q(\theta) r(x)\, dx$ for $0 \le x \le \theta$ and $= 0$ otherwise, with the parameter space $\Theta = (0, \infty)$, where $\int_0^\theta r(x)\, dx < +\infty$ and $\int_0^\infty r(x)\, dx = \infty$. Among the estimators of the type $\varphi_c(X) = c(q(X))^{-\alpha}$ for $g(\theta) = (q(\theta))^{-\alpha}$ with $\alpha > 0$, only one estimator with the value $c = (2\alpha + 1)/(\alpha + 1)$ is admissible. This theorem is applicable also when the size of the random sample is larger than 1.

Theorem (Stein). Let $X_1, \dots, X_n$ be a random sample from a univariate distribution $dP_\theta(x) = f(x - \theta)\, dx$ with a location parameter $\theta$. Define

$$A_k = A_k(x_1, \dots, x_n) = \int_{-\infty}^{\infty} \theta^k \Bigl(\prod_{i=1}^{n} f(x_i - \theta)\Bigr) d\theta$$

for k=O, 1, 2. If 35 (1964)) proved that Huber’s minimax robust


location estimator minimizes the maximum
asymptotic variance over some family of sym-
metric distributions in a neighborhood of the
normal distribution (- 371 Robust and Non-
parametric Methods).

then the Pitman estimator (p&X,, , X,) =


,4,(X,, . , X,$&,(X,, . , X,) of the parameter I. Invariant Estimator
0 is admissible.
For simplicity, assume that the range A of a
Inadmissibility of the Usual Estimator for parametric function g(0) coincides with the
Three or More Location Parameters. Let X = parameter space, and consider the point esti-
(X,, , X,y be a k-variate normal random mation of g(B) = 0 (0 is not necessarily real).
variable with mean 0 = (0, , , 0,)’ and covar- Suppose that there exist two groups G = (7)
iance matrix I, the identity. Then the Pitman and G = {t} of one-to-one measurable trans-
estimator of 0 is 6=X. However, Stein showed formations of !X and 0, respectively, onto
that X is inadmissible. It is strictly dominated themselves such that (i) there exists a homeo-
by the estimator @*(X)=(1 -(k-2)/(X(‘)X, morphic mapping r-t? from G to G; (ii) if X
where 1.1 denotes the Euclidean norm IX)‘= has a distribution Pe, then ZX has the distri-
&X/. That is,ifk>3, E/0*(X)-01’< bution Pi@; and (iii) W@O,za) = W(0, a) for
EIX-Uj2 for any 0. any 0, a, and z. An estimator cp is said to be
An estimator such as O*(X) is called Stein’s invariant if it satisfies q(zx)=Tcp(x) for any
shrinkage estimator (James and Stein, Proc. 4th 7 E G, a.e. 8. An estimator cp is called a best
Berkeley Symp., 1 (1960)). invariant estimator if the risk function r(N, cp’),
where cp’ is an invariant estimator, takes its
minimum value when cp’= cp.
H. Minimax Estimation
If the group G is ttransitive on 0, then the
risk function of any invariant estimator is
An estimator (p*(X) is said to be minimax if
independent of the value of 0, and hence any
and only if
admissible invariant estimator is best invar-
supr(O, q*)=infsupr(O, cp). iant. For example, in the point estimation of a
n ‘p n location parameter with a quadratic loss func-
If an estimator q* is admissible and the risk tion, the Pitman estimator is best invariant.
r(0, q*) is constant with respect to 0, then ‘p* is Theorem. If 0 is a compact topological
minimax. space, c is a group of homeomorphisms of 0
Theorem (Hodges-Lehmann). A Bayes esti- onto itself, and G 1s homeomorphic to 0
mator ‘p* relative to an a priori distribution under gE&&,EO with a fixed &,E@, then
t is minimax if 5 assigns the whole proba- any Bayes estimator relative to the +right-
bility to a subset w c 0, r(0, q*) is constant invariant Haar measure over G is best in-
(say, c) for OEW, and r(Q, cp*)<c for 0~0. variant. This result can be generalized to a
Let X have a binomial distribution B(n, O), 0 < locally compact 0 (- 398 Statistical Decision
0< 1; I) is unknown. If the prior distribution Functions).
of 0 is a beta distribution b(&/2, m), then
the Bayes estimator is T*(X) = (X + &/2)/(n +
&), which has constant risk ER( T*(X) - Q2 = J. Sequential Estimation
(2( 1 + $))-’ for all 0,O < 0 < 1. Thus, ac-
cording to this theorem, T*(X) is minimax. It Estimation methods based on sequential sam-
is interesting to compare the mean square pling are not as popular as tsequential tests,
error of T*(X) to that of minimum variance because their efficiency is not very large com-
unbiased estimator o^= X/n, O( 1 -0)/n. pared to that of nonsequential estimation. A
Theorem (Wald). If there exists a sequence generalization of the Cram&r-Rao inequality to
of prior distributions { <,} such that any sequential unbiased estimator q(X) of a
parametric function g(0) is the Wolfowitz
inequality,

for any 0~0, then (p* is minimax. For exam-


ple, the last theorem led to the proof of the for every 0 E 0, under regularity conditions
fact that the Pitman estimator of a location similar to those for the fixed-size sample prob-
parameter is minimax 141. In the discussion of lem, where N is the sample size and I(0) the
robust estimation, Huber (Ann. Math. Stutist., Fisher information.
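The Wolfowitz inequality can be illustrated with a sampling scheme that is not taken from the text: Bernoulli($\theta$) trials observed until the first success, so that the sample size $N$ is geometric and $\varphi = N$ is an unbiased estimator of $g(\theta) = 1/\theta$. The sketch below, which assumes the regularity conditions hold for this scheme, computes $E_\theta(N)$ and $V_\theta(N)$ from a truncated series and compares the variance with the bound $(g'(\theta))^2/(I(\theta) E_\theta(N))$, using $I(\theta) = 1/(\theta(1 - \theta))$ for a single Bernoulli trial. Function names and the truncation point are hypothetical.

```python
def geometric_moments(theta, n_max=5000):
    """Mean and variance of N with P(N = n) = (1 - theta)^(n - 1) * theta, truncated at n_max."""
    mean = 0.0
    second_moment = 0.0
    prob = theta  # P(N = 1)
    for n in range(1, n_max + 1):
        mean += n * prob
        second_moment += n * n * prob
        prob *= 1 - theta
    return mean, second_moment - mean ** 2


def wolfowitz_bound(theta, expected_n):
    """(g'(theta))^2 / (I(theta) * E(N)) for g(theta) = 1 / theta and Bernoulli trials."""
    g_prime = -1.0 / theta ** 2
    fisher_information = 1.0 / (theta * (1 - theta))
    return g_prime ** 2 / (fisher_information * expected_n)


theta = 0.3
expected_n, variance_n = geometric_moments(theta)
bound = wolfowitz_bound(theta, expected_n)
print(expected_n, variance_n, bound)
```

In this example the variance $(1 - \theta)/\theta^2$ of $N$ coincides with the lower bound, so the inequality holds with equality.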

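The comparison suggested in Section H between the minimax estimator $T^*(X) = (X + \sqrt{n}/2)/(n + \sqrt{n})$ and the unbiased estimator $X/n$ for a binomial $B(n, \theta)$ observation can be carried out exactly. The following minimal sketch sums the squared error over the binomial probabilities; the function names and the choice $n = 16$ are hypothetical.

```python
import math


def mean_square_error(estimator, n, theta):
    """Exact E_theta[(estimator(X, n) - theta)^2] for X distributed as B(n, theta)."""
    return sum(
        math.comb(n, x) * theta ** x * (1 - theta) ** (n - x)
        * (estimator(x, n) - theta) ** 2
        for x in range(n + 1)
    )


def minimax_estimator(x, n):
    """T*(X) = (X + sqrt(n)/2) / (n + sqrt(n))."""
    root = math.sqrt(n)
    return (x + root / 2) / (n + root)


def unbiased_estimator(x, n):
    """Minimum variance unbiased estimator X / n."""
    return x / n


n = 16
constant_risk = (2 * (1 + math.sqrt(n))) ** -2
for theta in (0.05, 0.25, 0.5, 0.9):
    print(theta,
          mean_square_error(minimax_estimator, n, theta),
          mean_square_error(unbiased_estimator, n, theta))
```

For $n = 16$ the risk of $T^*$ equals $0.01$ at every $\theta$, whereas $\theta(1 - \theta)/n$ is larger near $\theta = 1/2$ and smaller near the endpoints, so neither estimator dominates the other.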
K. Asymptotic Theory

In practical problems of statistical inference the sample size $n$ is often large enough to give sharp estimates of the parameters involved; then the sample distributions of estimators can be approximated closely by their asymptotic distributions, which are of a simpler nature. Assume that $X=(X_1, X_2, \ldots)$ is a sequence of independent and identically distributed (i.i.d.) random variables with the common distribution $P_\theta$, $\theta\in\Theta$. For each $n$, let $\varphi_n=\varphi_n(X_1,\ldots,X_n)$ be an estimator of $g(\theta)$, which is a function from $\Theta$ to $\mathcal{A}$ ($\subset R^p$). Thus $\varphi_n$ is a measurable mapping from $(\mathcal{X}^n,\mathcal{B}^n)$ to $(\mathcal{A},\mathcal{C})$. Let us denote the distribution of $\varphi_n$ by $\mathcal{L}(\varphi_n)$, $\mathcal{L}(\varphi_n\mid\theta)$, or $\mathcal{L}(\varphi_n\mid P_\theta)$, the last two emphasizing that the underlying probability distribution is $P_\theta$. For example, if the mean vector $E_\theta(\varphi_n)=m_n(\theta)$ and the covariance matrix $v_n(\theta)=V_\theta(\varphi_n)$ exist for every $n$, and if $\mathcal{L}[v_n(\theta)^{-1/2}(\varphi_n(X)-m_n(\theta))\mid\theta]\to N_p(0,I)$ as $n\to\infty$, then $\mathcal{L}(\varphi_n\mid\theta)$ is approximated by a $p$-variate normal distribution $N_p(m_n(\theta),v_n(\theta))$ (→ 341 Probability Measures D). $\varphi_n$ is said to be asymptotically (mean) unbiased for $g(\theta)$ if $m_n(\theta)\to g(\theta)$ for any $\theta\in\Theta$ as $n\to\infty$. But we often calculate the asymptotic distribution without obtaining the exact mean and covariance matrix of the estimator $\varphi_n$ for each $n$. In the asymptotic theory it may be reasonable to regard the sequence of estimators $\{\varphi_n\}$ rather than each estimator $\varphi_n$ as an "estimator," but we do not bother with the difference between these definitions of an estimator.

Consistency. $\{\varphi_n\}$ is called a consistent estimator of $g(\theta)$ if $\varphi_n$ converges to $g(\theta)$ in probability as $n\to\infty$:

$$\lim_{n\to\infty}P_\theta\{|\varphi_n-g(\theta)|>\varepsilon\}=0 \quad\text{for any } \varepsilon>0 \text{ and every } \theta\in\Theta.$$

If the convergence is almost sure, it is called a.s. consistent. For example, if $\varphi_n$ is asymptotically unbiased with the covariance matrix $v_n(\theta)$ such that $|v_n(\theta)|\to 0$ as $n\to\infty$, then $\varphi_n$ is a consistent estimator of $g(\theta)$. A sufficient condition for existence of a consistent estimator is given by the following result.

Theorem (LeCam). Let $\mathcal{X}$ be a Euclidean space and $\mathcal{B}$ the $\sigma$-algebra of all Borel sets in $\mathcal{X}$. If the parameter space $\Theta$ is a locally compact subset of $R^k$, $P_\theta\ne P_{\theta'}$ for any $\theta\ne\theta'$ (identifiability condition), and $P_{\theta_n}\to P_\theta$ whenever $\theta_n\to\theta$, then there exists a consistent estimator of $\theta$.

$\varphi_n=g(T_n)$ is a consistent estimator of $g(\theta)$ if $\{T_n\}$ is a consistent estimator of $\theta$ and $g(\theta)$ is a continuous function of $\theta$.

Asymptotic Normality. The class of estimators is restricted to what are called consistent estimators. An estimator $\{\varphi_n\}$ is said to be asymptotically normally distributed if the asymptotic distribution of $n^{1/2}(\varphi_n-g(\theta))$ is normal:

$$\mathcal{L}[n^{1/2}(\varphi_n-g(\theta))\mid\theta]\to N_p(\mu(\theta),v(\theta)) \quad\text{as } n\to\infty.$$

$\mu(\theta)$ and $v(\theta)$ are called the asymptotic bias and asymptotic covariance matrix, respectively. They are not always equal to the limits, if any, of the mean and covariance matrix of $n^{1/2}(\varphi_n-g(\theta))$. $\{\varphi_n\}$ is usually called a consistent and asymptotically normal (CAN) estimator if the asymptotic distribution of $n^{1/2}(\varphi_n-g(\theta))$ is normal with the asymptotic bias zero. Then the distribution $\mathcal{L}(\varphi_n\mid\theta)$ is approximated by $N_p(g(\theta),v(\theta)/n)$. For example, the moment method estimator is a CAN estimator.

Theorem. Let $\{\varphi_n\}$ be a CAN estimator of $g(\theta)\in R^1$ with asymptotic variance $v(\theta)$. Then

$$\liminf_{n\to\infty}E_\theta\{n|\varphi_n-g(\theta)|^2\}\ge v(\theta) \quad\text{for every } \theta\in\Theta.$$

Theorem. Suppose that $g(\theta)$ is a continuously differentiable function from $\Theta$ ($\subset R^k$) to $R^p$ ($p\le k$). Let $G(\theta)=(\partial g_i(\theta)/\partial\theta_j)$, the Jacobian matrix. If $\{T_n\}$ is a CAN estimator of $\theta$ with asymptotic covariance matrix $v(\theta)$, then $\{g(T_n)\}$ is a CAN estimator of $g(\theta)$:

$$\mathcal{L}[n^{1/2}(g(T_n)-g(\theta))\mid\theta]\to N_p(0,G(\theta)v(\theta)G(\theta)') \quad\text{as } n\to\infty.$$

An estimator $\{T_n\}$ is said to be a best asymptotically normal (BAN) estimator of $\theta$ if $\{T_n\}$ is a CAN estimator of $\theta$ with asymptotic variance $I^{-1}(\theta)$, where $I(\theta)$ is the Fisher information matrix on $\theta$ in a single observation. We can see that the maximum likelihood (ML) estimator (→ Section M) is a BAN estimator.

Functional on Distribution Functions. Let $\varphi(F)$ be a functional on distribution functions to $R^1$. Let us consider the class of estimators that are defined by $\varphi_n=\varphi(\hat F_n)$, where $\hat F_n$ is the empirical distribution function of $n$ samples $X_1,\ldots,X_n$. An estimator $\{\varphi_n\}$ with $\varphi_n=\varphi(\hat F_n)$ for each $n$ is said to be Fisher consistent for $g(\theta)$ if $\varphi(F_\theta)=g(\theta)$ for every $\theta\in\Theta$ when $F_\theta$ is the true distribution function. $\{\varphi_n\}$ is also a.s. consistent for $g(\theta)$ if $\{\varphi_n\}$ is a Fisher consistent estimator of $g(\theta)$ and if $\varphi$ is a continuous functional. Furthermore, if $\varphi$ is differentiable, we can see that $\{\varphi_n\}$ is also a CAN estimator by using the fact that $n^{1/2}(\hat F_n(F_\theta^{-1}(t))-t)$, $0<t<1$, converges weakly to the Brownian bridge.

Let $S$ be a set of distribution functions. $S$ is said to be a star-shaped set of $F$ if $H\in S$ implies $F^{(t)}=(1-t)F+tH\in S$ for any $t\in[0,1]$.

Theorem (Von Mises). Assume that
(F1) there exists a star-shaped set $S_\theta$ at $F_\theta$ such that $\lim_n P_\theta\{\hat F_n\in S_\theta\}=1$;
(F2) for any $t\in[0,1]$ and $H\in S_\theta$, there exist derivatives $(d^i/dt^i)\varphi[(1-t)F_\theta+tH]$, $i=1,2$;
(F3) there exists $\varphi_\theta^{(1)}(y)$ from $R^1$ to $R^1$ such that

$$\frac{d}{dt}\varphi[(1-t)F_\theta+tH]\Big|_{t=0}=\int_{-\infty}^{\infty}\varphi_\theta^{(1)}(y)\,d(H(y)-F_\theta(y)) \quad\text{for all } H\in S_\theta; \text{ and}$$

for any $\theta$ and $\varepsilon>0$, where $F_\theta^{(t)}=(1-t)F_\theta+t\hat F_n$.
Then if $\varphi_n$ with $\varphi_n=\varphi(\hat F_n)$ is Fisher consistent for $g(\theta)$, we have

$$\mathcal{L}[n^{1/2}(\varphi_n-g(\theta))\mid\theta]\to N(0,v(\theta)) \quad\text{as } n\to\infty,$$

where

$$v(\theta)=\int_{-\infty}^{\infty}\varphi_\theta^{(1)}(y)^2\,dF_\theta(y)-\Big\{\int_{-\infty}^{\infty}\varphi_\theta^{(1)}(y)\,dF_\theta(y)\Big\}^2.$$

$\{c_n\}$-Consistency. For a sequence of positive numbers $c_n$ tending to infinity as $n\to\infty$, an estimator $T_n$ is called consistent for $\theta\in\Theta$ with order $c_n$ (or $\{c_n\}$-consistent for short) if for every $\varepsilon>0$ and every $\theta\in\Theta$ there exist a sufficiently small positive number $\delta$ and a sufficiently large number $K$ satisfying

$$\limsup_{n\to\infty}\ \sup_{\tau:|\tau-\theta|<\delta}P_\tau\{c_n|T_n-\tau|\ge K\}<\varepsilon.$$

Let $\{c_n\}$ be a maximal order of consistency. This notion was introduced by Takeuchi and Akahira. They studied consistent estimators of location parameters with various orders. Let $\mathcal{X}=\Theta=R^1$. Suppose that for every $\theta\in\Theta$, $P_\theta$ has a density function $f(x-\theta)$ with respect to the Lebesgue measure.

Theorem. Assume that
(OC1) $f(x)>0$ if $a<x<b$ and $f(x)=0$ if $x\le a$ or $x\ge b$;
(OC2) there exist positive numbers $0<\alpha\le\beta<\infty$ and $0<A',B'<\infty$ such that

$$\lim_{x\to a+0}(x-a)^{1-\alpha}f(x)=A',\qquad \lim_{x\to b-0}(b-x)^{1-\beta}f(x)=B';$$

(OC3) $f(x)$ is twice continuously differentiable in the interval $(a,b)$ and there exist positive numbers $0<A'',B''<\infty$ such that

$$\lim_{x\to a+0}(x-a)^{2-\alpha}|f'(x)|=A'',\qquad \lim_{x\to b-0}(b-x)^{2-\beta}|f'(x)|=B''.$$

Assume further that $f''(x)$ is bounded if $\alpha\ge 2$. Then for each $\alpha$ there exists a consistent estimator with the order given in Table 1.

Table 1

$\alpha$            order $c_n$             $\{c_n\}$-consistent estimator
$0<\alpha<2$        $n^{1/\alpha}$          $\{\min X_i+\max X_i-(a+b)\}/2$
$\alpha=2$          $(n\log n)^{1/2}$       ML estimator
$\alpha>2$          $n^{1/2}$               ML estimator

L. Moment Method

The moment method is also utilized to obtain estimators. Suppose that $\mathcal{X}\subset R^1$ and $\Theta\subset R^k$. Denote the population distribution function by $F_\theta$ and the empirical distribution function of $n$ samples $X_1,\ldots,X_n$ by $\hat F_n$. The following system of simultaneous equations is derived from letting the $j$th population moment

$$\mu_j(\theta)=E_\theta(X^j)=\int x^j F_\theta(dx)$$

be equal to the $j$th sample moment

$$m_{nj}=n^{-1}\sum_{i=1}^{n}X_i^j=\int x^j\hat F_n(dx).$$

For example, for $j=1,\ldots,k$,

$$\mu(\theta)=(\mu_1(\theta),\ldots,\mu_k(\theta))'=(m_{n1},\ldots,m_{nk})'=m_n.$$

A moment method estimator is determined as a solution $\theta=\hat\theta_n(X)\in\Theta$ of these $k$ simultaneous equations.

Theorem. Assume that the function $\mu(\theta)$ from $\Theta$ to $R^k$ is continuously differentiable and that the Jacobian matrix $M(\theta)=\partial\mu(\theta)/\partial\theta=(\partial\mu_i(\theta)/\partial\theta_j)$, $i,j=1,\ldots,k$, is nonsingular in a neighborhood of the true parameter. Then the moment method estimator exists and is a CAN estimator:

$$\mathcal{L}[n^{1/2}(\hat\theta_n-\theta)\mid\theta]\to N_k(0,M(\theta)^{-1}v(\theta)(M(\theta)')^{-1}) \quad\text{as } n\to\infty,$$

where $v(\theta)=(\mathrm{cov}_\theta(X^i,X^j))$, $i,j=1,\ldots,k$. In general, a moment method estimator is not a BAN estimator. However, in view of its simple form, a moment method estimator is important and often utilized as a first-step estimator in order to determine the maximum likelihood estimator by the iteration method.

M. Maximum Likelihood Method

Suppose that a distribution $P_\theta$ has the density function $f(x,\theta)$, $\theta\in\Theta\subset R^k$, with respect to a $\sigma$-finite measure $\mu$, and let $x_1,\ldots,x_n$ be observed values of random samples $X_1,\ldots,X_n$ from the population $f(x,\theta)$. Then the function $L_n$ of $\theta$ defined by

$$L_n(\theta;x_1,\ldots,x_n)=\prod_{i=1}^{n}f(x_i,\theta)$$

is called the likelihood function. If $\theta=\hat\theta_n(x_1,\ldots,x_n)$ maximizes the value of $L_n(\theta)$ for fixed $x_1,\ldots,x_n$ and if it is a measurable mapping from $(\mathcal{X}^n,\mathcal{B}^n)$ to $(\Theta,\mathcal{C})$ with $\mathcal{C}$ a $\sigma$-algebra of subsets of $\Theta$, then $\hat\theta_n(X)=\hat\theta_n(X_1,\ldots,X_n)$ is called the maximum likelihood (ML) estimator of $\theta$. This method of finding estimators is called the maximum likelihood method. If the parameter is transformed into a new parameter $\eta=h(\theta)$ by means of a known one-to-

one bimeasurable transformation h and if


(AN3) I$,( I$logj”(x.D))i)< co for OEO”;
there exists a unique ML estimator 8,, of 0,
then Q,,(X)= h(&(X)) is a unique ML estimator (AN4) the Fisher information matrix I(H) =
of q. In other words, the ML estimator is in-
variant for every one-to-one transformation. i, [ (Al”gf(x, 0)) ($logf.(x, O):J] exists and
Many statisticians have investigated and is positive definite for 0~0’; and
improved known adequate regularity con- (AN5) there exists an H(x) such that
ditions under which the ML estimator exists
and is a BAN estimator. $logf(x,II) <H(x) and E,(H(X))<C, a
Theorem (Wald). Assume that
constant for 0~0’.
(Cl) 0 is a closed subset of Rk with nonempty
Then the maximum likelihood estimator is
interior 0”;
a BAN estimator: Z’[n”‘(& - 0) I 0-t
(C2) for any XE.~, f‘(x, 0) is continuous with
N(O,I(O)-‘) as n’m for 8~0’. Note that
respect to II and lim,,,,,,f(x, 0) = 0 if 0 is not
under assumption (ANl) the likelihood func-
bounded;
tion attains the maximum in 0’ with proba-
(C3) if 0, #O,, then PSI # PSI and s ]S(x, 0,) -
bility tending to 1 as n+ co if the true value
fk 0,) &b) > 0;
of the parameter exists in 0’. Hence the ML
(C4) &,(llogfW, O,)l)< a; and
estimator is determined as a root 0 =8,(x,, ,
(C5) &o(log ‘fW, 0, ~1) < ~0 and 4,0(hzicp(X, 4)
x,) of the likelihood equation with the same
<co,where,f(x,O,p)=sup{f(x,0’);(0’-H(~p)
probability as above:
and cp(x, r) = sup{ f(x, 0); IO]> Y}. (The last
two functions are measurable according to
=o.
assumption (C2).) j=l,...,k
Then if a sequence of measurable functions,
We also call n-‘8logL,,(H)/c?O the likelihood
{0,(x r , , x.))., satisfies
estimating function. The essential fact used in
L,(Q;x,,...,x,) the proof of the above theorem is the asymp-
lim inf 2 C> 0 (a.s. P,J,
n-x L(~o;x,,...,x,) totic equivalence of the ML estimator and
the likelihood estimating function:
then as n--, co, 0,(X,, . . . , X,) converges a.s. to
the true value 0, of the parameter. Hence if A,(O)-I(O)~I”~(&@~~ in PO as n+co,
the ML estimator exists, it is as. consistent.
where A,(0)=n~1’2a10gL,(0)/8Q. Note the fact
Pfanzagl (Metrika, 14 (1969)) and Fu and
that
Gleser (Ann. Inst. Statist. Math., 27 (1975))
gave rigorous proofs for the existence of the Y[A,(Q)IH]~N,(O,I(f!I)) as n-cc
ML estimator.
holds according to the central limit theorem
Theorem. Under assumptions (Cl) and (C2),
(- 250 Limit Theorems in Probability Theory
there exists a maximum likelihood estimator
B (1)).
8” for any positive integer n. That is, & =
&(x,, ,x,) is a measurable function from
(.V. !B”) to (O,%) and satisfies L(i),; x,, , x,J Contiguity. We now turn to the situation
= sups L(0; x, ) , XJ. where we need asymptotic distributions under
In the remainder of this section we suppose the alternative distribution Po+n-l:2h with 0+
that assumptions (Cl)-(C5) are satisfied. We n-“‘hE@ in estimation as we do in testing
use the notation hypotheses (- 400 Statistical Hypothesis
Testing). The notion of contiguity, due to
?f
1= a k-column vector, LeCam (1960) is basic for the asymptotic
(‘0 methods of estimation theory. We consider
sequences {P,) and {PA} of probability mea-
g=(&,(lx,Q), a k x k matrix, sures on (!~‘,‘B“) with the +Radon-Nikodym
derivatives p, and pi with respect to a a-finite
a3 log f measure, such as P, + Pi. Denote by x,, =
r=(‘;;;;;i”)> h,i,j=l,...,k. A[P;; P,] a generalized log-likelihood ratio
that is defined by logp’,/p, on the set { p,pL
Theorem (Cram&r). Assume that > 0) and is an arbitrary measurable function
(AN 1) for a.s. [p] x, f(x, 0) is three-times con- on the set { p,pl, = 0). Let {B,} be any sequence
tinuously differentiable with respect to each of {%‘}-measurable sets, and let { T,} be any
component of 0=(0,, . . ..@J’E@~. sequence of {23”}-measurable functions. A

a2.Kx,
0)
-
(AN2) for OE@‘, mf(x> 0) sequence of distributions {Z[T’,J PJ} is said to
aodp=O

s
be relatively compact if every subsequence
in’} c {n) contains a further subsequence
and -dp=O;
,r au2 (m) c {n’} along which it converges to a prob-

ability distribution. In the Euclidean space relative compactness is equivalent to tightness: that is, for every $\varepsilon>0$ there is a $b(\varepsilon)$ such that $P_n\{|T_n|>b(\varepsilon)\}<\varepsilon$ for every $n$.

Theorem. The following statements are all equivalent.
(1) For any $\{T_n\}$, $T_n\to 0$ in $P_n$ if and only if $T_n\to 0$ in $P_n'$.
(2) For any $\{T_n\}$, $\{\mathcal{L}[T_n\mid P_n]\}$ is relatively compact if and only if $\{\mathcal{L}[T_n\mid P_n']\}$ is relatively compact.
(3) For any $\{B_n\}$, $P_n\{B_n\}\to 0$ if and only if $P_n'\{B_n\}\to 0$.
(4) Whatever the choice of $\Lambda_n$, $\{\mathcal{L}[\Lambda_n\mid P_n]\}$ and $\{\mathcal{L}[\Lambda_n\mid P_n']\}$ are relatively compact.
(5) Whatever the choice of $\Lambda_n$, $\{\mathcal{L}[\Lambda_n\mid P_n]\}$ is relatively compact. Furthermore, if $\{m\}\subset\{n\}$ is a subsequence of $\{n\}$ such that $\mathcal{L}[\Lambda_m\mid P_m]$ converges to $\mathcal{L}[\Lambda]$, then $E\{e^{\Lambda}\}=1$.

Two sequences $\{P_n\}$ and $\{P_n'\}$ satisfying requirements (1)-(5) of the above theorem are said to be contiguous.

Theorem. Suppose that $\{P_n\}$ and $\{P_n'\}$ are contiguous. Let $\{m\}\subset\{n\}$ be a subsequence such that $\mathcal{L}[\Lambda_m,T_m\mid P_m]$ converges to a limit $\mathcal{L}[\Lambda,T]$. Then $\mathcal{L}[\Lambda_m,T_m\mid P_m']$ converges to $e^{\Lambda}\mathcal{L}[\Lambda,T]$, where $\nu=e^{\Lambda}\mathcal{L}[\Lambda,T]$ is given by $\nu(A)=\int_A e^{\Lambda}\,d\mathcal{L}[\Lambda,T]$.

Now, set $P_n=P_\theta^n$ and $P_n'=P^n_{\theta+n^{-1/2}h}$ for each $n$. Under suitable regularity conditions, such as assumptions (C1)-(C5) and (AN1)-(AN5), it is easy to see that $\{P_\theta^n\}$ and $\{P^n_{\theta+n^{-1/2}h}\}$ are contiguous. At the same time, we can see that the asymptotic linearity of $\Lambda_n(\theta+n^{-1/2}h;\theta)=\Lambda[P^n_{\theta+n^{-1/2}h};P_\theta^n]$ holds (say) in the vicinity of the true parameter as follows:

$$\Lambda_n(\theta+n^{-1/2}h;\theta)-h'\Delta_n(\theta)+\tfrac12 h'I(\theta)h\to 0 \quad\text{in } P_\theta.$$

The asymptotic equivalence of the ML estimator and $\Delta_n(\theta)$, and the asymptotic linearity of the log-likelihood function and $\Delta_n(\theta)$, leads to the regularity (→ Section N) of the ML estimator.

Theorem. Under suitable regularity conditions as above, the ML estimator is regular:

$$\mathcal{L}[n^{1/2}(\hat\theta_n-\theta)-h\mid\theta+n^{-1/2}h]\to N_k(0,I(\theta)^{-1}),$$

for any $h\in R^k$ with $\theta+n^{-1/2}h\in\Theta$.

N. Asymptotic Efficiency

In Section D we discussed lower bounds of variances of unbiased estimators for finite sample size and defined the efficiency of an unbiased estimator with variance $v_n(\theta)$ by $(v_n(\theta)nI(\theta))^{-1}$. In this section we first discuss the asymptotic efficiency of a CAN estimator in the same vein as in the case of finite sample size. Second, we see a specific approach to the large-sample theory of estimation. Throughout this section we assume (C1)-(C5) and (AN1)-(AN5) stated in Section M.

BAN Estimators. We suppose that the parameter space $\Theta$ is a subset of $R^1$ in this paragraph. We restrict our attention to the class of CAN estimators $\{T_n\}$ of the real-valued parameter $\theta$ for which $\mathcal{L}[n^{1/2}(T_n-\theta)\mid\theta]\to N(0,v(\theta))$ as $n\to\infty$. Fisher's conjecture concerning the lower bound to asymptotic variance $v(\theta)$ of any CAN estimator is

$$v(\theta)\ge I(\theta)^{-1},$$

where $I(\theta)$ is the Fisher information on $\theta$ in a single observation. The asymptotic efficiency of a CAN estimator with asymptotic variance $v(\theta)$ is defined by $(v(\theta)I(\theta))^{-1}$. A CAN estimator with asymptotic variance $I(\theta)^{-1}$ is called a BAN estimator or an asymptotically efficient estimator. Note that under suitable regularity conditions there always exists an asymptotically efficient estimator, for example an ML estimator, although for a sample of finite size there exists an efficient estimator if and only if the family of density functions is of the exponential type.

Unfortunately, Fisher's conjecture is not true without further conditions on the competing estimators. A counterexample was provided by Hodges and reported by LeCam (1953). Let $\{T_n\}$ be any CAN estimator with the asymptotic variance $v(\theta)$. Consider the estimator

$$T_n'=\begin{cases}aT_n & \text{if } |T_n|<n^{-1/4},\\ T_n & \text{if } |T_n|\ge n^{-1/4},\end{cases}$$

where $0<a<1$ is a constant. $\{T_n'\}$ is also a CAN estimator with asymptotic variance $v'(\theta)$ such that

$$v'(\theta)=\begin{cases}a^2v(\theta) & \text{if } \theta=0,\\ v(\theta) & \text{otherwise.}\end{cases}$$

Let $\{T_n\}$ be a BAN estimator; then $T_n'$ is an estimator with asymptotic variance less than $I(\theta)^{-1}$ and is called a superefficient estimator.

Theorem (LeCam). The set of points $\theta$ for which the inequality due to Fisher fails is of Lebesgue measure zero.

A condition due to Bahadur leads to the continuity of asymptotic variance which implies the validity of the above inequality.

Theorem (Bahadur). Suppose that $\{T_n\}$ is a CAN estimator with asymptotic variance $v(\theta)$ satisfying the condition

$$\liminf_{n\to\infty}P_{\theta_0+n^{-1/2}}\{T_n<\theta_0+n^{-1/2}\}\le\tfrac12$$

or

Then the following inequality, due to Fisher, holds at $\theta=\theta_0$:

$$v(\theta_0)\ge I(\theta_0)^{-1}.$$

Regular Estimators. Wolfowitz and Kaufman considered an operationally more justifiable restriction on competing estimators, called the uniformity property, stating that, for an estimator $\{T_n\}$ of $\theta$, $\mathcal{L}[n^{1/2}(T_n-\theta)\mid\theta](y)$ converges to a limit $L_\theta(y)$ uniformly in $(y,\theta)\in R^k\times C$, where $C$ is any compact subset of the interior $\Theta^\circ$ of $\Theta\subset R^k$. The ML estimator $\{\hat\theta_n\}$ also has this uniformity property under suitable regularity conditions, such as (C1)-(C5) and (AN1)-(AN5) to which some uniformity properties are added [29]. We note that asymptotic variance is not a good measurement of asymptotic efficiency unless an estimator is a CAN estimator, and that asymptotic concentration is in general a more pertinent measurement.

Theorem. For an estimator $\{T_n\}$ with the uniformity property above, it holds that the limit $L_\theta(y)$ is a probability distribution function and continuous for either one of the variables $y$ or $\theta$ if the other is fixed and furthermore that the probability measure $L_\theta$ is absolutely continuous with respect to the Lebesgue measure on $R^k$.

Theorem. The asymptotic concentration of the ML estimator $\{\hat\theta_n\}$ about $\theta$ is not less than that of any estimator $\{T_n\}$ with the uniformity property: For any convex and symmetric set $S\subset R^k$ about the origin,

$$\lim_{n\to\infty}P_\theta\{n^{1/2}(\hat\theta_n-\theta)\in S\}\ge\lim_{n\to\infty}P_\theta\{n^{1/2}(T_n-\theta)\in S\}.$$

Schmetterer (Research papers in statistics, F. N. David (ed.), 1966) provided the notion of the continuous convergence of distributions of estimators of $\theta$ which is weaker than uniform convergence. An estimator $\{T_n\}$ is said to be regular if

$$\mathcal{L}[n^{1/2}(T_n-\theta)-h\mid\theta+n^{-1/2}h]\to L_\theta \quad\text{as } n\to\infty,$$

where $L_\theta$ is a probability distribution independent of $h$ with $\theta+n^{-1/2}h\in\Theta$. It was shown that the ML estimator $\{\hat\theta_n\}$ is regular under ordinary regularity conditions. Hájek obtained the following characterization of the asymptotic distribution of any regular estimator and, independently, Inagaki (Ann. Inst. Statist. Math., 22 (1970); 25 (1973)) obtained a similar result.

Theorem. The asymptotic distribution $L_\theta$ of any regular estimator $\{T_n\}$ is represented as the convolution of a normal distribution $N_\theta$ and some residual distribution $G_\theta$:

$$L_\theta=N_\theta*G_\theta,$$

where $N_\theta=N_k(0,I(\theta)^{-1})$, the asymptotic distribution of the ML estimator $\{\hat\theta_n\}$ in the ordinary regular case. It follows from the characterization $L_\theta=N_\theta*G_\theta$ that the first two theorems in this section hold also for regular estimators. (→ LeCam, Proc. 6th Berkeley Symp., 1972, and Roussas and Soms, Ann. Inst. Statist. Math., 25 (1973).)

O. Higher-Order Asymptotic Efficiency

In Section N it was shown that the ML estimator is a BAN estimator. In general, however, there exist many BAN estimators. For example, consider the case of a multinomial distribution where probabilities of events are parametrized. That is, let $X=(n_1,\ldots,n_m)'$, $n=n_1+\cdots+n_m$, be distributed as a multinomial distribution $M(n;\pi_1,\ldots,\pi_m)$, $\pi_1+\cdots+\pi_m=1$, and let $\Pi_\Theta$ be a subset of $m$-vectors, $\Pi_\Theta=\{\pi(\theta)=(\pi_1(\theta),\ldots,\pi_m(\theta))';\ \theta\in\Theta\}$, $\Theta\subset R^1$. Define $\hat p=(\hat p_1,\ldots,\hat p_m)'=(n_1/n,\ldots,n_m/n)'$. Then we consider (i) the ML estimator, (ii) the minimum chi-square estimator, (iii) the minimum modified chi-square estimator, (iv) the minimum Haldane discrepancy ($D_k$) estimator, (v) the minimum Hellinger distance (HD) estimator, and (vi) the minimum Kullback-Leibler (K-L) information estimator. These are defined as the values of the parameter $\theta$ that minimize the following quantities, respectively:
(i) $\mathrm{ML}=-\log L_n=-n\sum_{i=1}^{m}\hat p_i\log\pi_i(\theta)$;
(ii) $\chi^2=\sum_{i=1}^{m}(n\hat p_i-n\pi_i(\theta))^2/(n\pi_i(\theta))$;
(iii) $\mathrm{mod}\,\chi^2=\sum_{i=1}^{m}(n\hat p_i-n\pi_i(\theta))^2/(n\hat p_i)$;
(iv) $D_k=\sum_{i=1}^{m}\pi_i(\theta)^{k+1}/\hat p_i^{\,k}$;
(v) $\mathrm{HD}=\cos^{-1}\sum_{i=1}^{m}(\hat p_i\pi_i(\theta))^{1/2}$; and
(vi) $\mathrm{K\text{-}L}=\sum_{i=1}^{m}\pi_i(\theta)\log(\pi_i(\theta)/\hat p_i)$.
Rao [32] showed that under suitable regularity conditions these estimators are Fisher consistent and BAN estimators.

Fisher-Rao Approach to Second-Order Asymptotic Efficiency. For $\theta\in\Theta\subset R^1$, let $p_\theta$ be the density for $n$ i.i.d. observations $x=(x_1,\ldots,x_n)$, and let $q_\theta$ be the density of the estimator $T_n$. The Fisher information contained in $X$ and in $T_n$ are defined by $nI(\theta)=E(d\log p_\theta/d\theta)^2$ and $nI_{T_n}(\theta)=E(d\log q_\theta/d\theta)^2$, respectively. Rao defined the first-order (asymptotic) efficient estimator $T_n$ satisfying one of the following conditions: (1) $n^{-1/2}|d\log p_\theta/d\theta-d\log q_\theta/d\theta|\to 0$ in probability; (2) $I(\theta)-I_{T_n}(\theta)\to 0$ as $n\to\infty$; (3) the asymptotic correlation between $n^{1/2}(T_n-\theta)$ and $n^{-1/2}\,d\log p_\theta/d\theta$ is unity; (4) $|n^{-1/2}\,d\log p_\theta/d\theta-\alpha-\beta n^{1/2}(T_n-\theta)|\to 0$ in probability. We note that the larger the condition number $(j)$, the stronger the condition. A first-order efficient estimator is a BAN estimator. Fisher proposed

$$E_2'=\lim_{n\to\infty}(nI(\theta)-nI_{T_n}(\theta))=\lim_{n\to\infty}V_\theta(d\log p_\theta/d\theta-d\log q_\theta/d\theta)$$

as a measure of second-order (asymptotic) k = 1, 2, . , a +jc,}-consistent estimator { T,J is


efficiency to distinguish different BAN es- said to be the kth-order asymptotically median
timators. Fisher stated without any sort of unbiased (AMU) estimator if for any 0 E 0
proof that the maximum likelihood estimator (CR’) there exists a positive number 6 such
minimizes E;, i.e., maximizes second-order that
efficiency. Rao proposed
lim sup c,“-’ PC{T,<7}-i =0
E,=m:n I/,(dlogp,,/dO-an”‘-Bn(T,-I)) n-x lr-HI<6
or

as a measure of second-order efficiency for lim sup c,“-’ lP~(7.2r):~=O.


n-r lr-Ol<d
first-order efficient estimators ( T,J satisfying
condition (4). He showed that the estimators This notion, which is an extension of the con-
mentioned above for multinomial distribution dition due to Bahadur for the asymptotic
are first-order efficient estimators satisfying efficiency, was introduced by Takeuchi and
condition (4) and furthermore calculated Akahira. For a kth-order AMU estimator
second-order efficiencies of these estima- {7;,), G,(~,O)+~,‘G,(~,O)+...+C,~+~G~-~(~,O)
tors measured in terms of E,: (i) E,(ML)= is called the kth-order asymptotic distribution
l;‘(O)I(O); (ii) E2(~‘)=A(0)+E2(ML); (iii) of c,(T,-0) if
E,(modX2)=4A(0)+ E,(ML); (iv) El(Q)=
(k+ 1)2A(0)+ E,(ML); (v) E,(HD)=A(0)/4
+ E,(ML); (vi) E,(K-L)=A(O)+E,(ML) with -c,‘G,(t,O)-...-c,“+‘G,+,(t,O)I=O.

Pfanzagl and Takeuchi and Akahira obtained


the concrete form of the second- or third-order
asymptotic distribution of the ML estimator.
A kth-order AMU estimator is said to be
&h-order asymptotic efficient if the kth-order
asymptotic distribution of it attains uniformly
the bound for the kth-order asymptotic distri-
butions of the kth-order AMU estimators.
Takeuchi and Akahira showed that under
suitable regularity conditions, r, is second-
order asymptotic efficient if

Rao [33] gave another definition of second- + ~P~~VWW(~)~‘~)

order efficiency based on the expansion of the x t2fp(t)+O(n-“2),


variance of T, after correcting for bias: V,( T,)
where O(t) is the standard normal distribution
=(nI(fl))-’ + $(0)ne2 + o(n-‘). The quantity
function and q(t) is its density function, and
G(O) is considered to be a measure of second-
further that the modified ML estimator
order efficiency. The results of Fisher and Rao
were confined to multinomial distributions.
Efron (Ann. Statist., 3 (1975)) and Ghosh and
Subramanyan (Su)tkhy& sec. A, 36 (1974)) for the ML estimator & is second-order asymp-
extended the results to the so-called curved totic efficient. Pfanzagl (Ann. Statist., 1 (1973))
exponential family of distributions. Efron gave obtained a similar result. The formulation due
a geometric interpretation to the effect that to Takeuchi and Akahira is more extensive
second-order efficiency is related to the curva- since it can be applied to the so-called non-
ture of a statistical problem corresponding regular cases.
to y(O) above, and S. Amari recently extended
this differential-geometric approach to the
discussion of higher-order efficiency of esti- P. Estimating Equations
mators. Rao suggested that E; is equal to E,.
Ghosh and Subramanyan gave a sufficient We often determine an estimator as a solution
condition for the equality to hold, whereas 0 = T,(x,, ,x,) of an equation Yn(x,, ,x,; 0)
Efron provided a counterexample to show that = 0; for example, the ML estimator as a solu-
E; #E, in general. tion of the likelihood equation. In such case,
such an equation is called an estimating equa-
Pfanzagl and Takeuchi-Akahira Approaches to tion and Y”(O)= Y,,(X,, . . . , X,; Q), a random
Higher-Order Asymptotic Efficiency. For each function, is called an estimating function [3].
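As a minimal illustration of determining an estimator as a root of an estimating equation, the sketch below solves a likelihood equation by bisection. It is a sketch under stated assumptions, not a general method: the estimating function is assumed to be decreasing in the parameter with a sign change on the bracketing interval, the exponential density $\theta e^{-\theta x}$ is chosen only as an example, and the names `solve_estimating_equation` and `exponential_score` are hypothetical.

```python
from typing import Callable, Sequence


def solve_estimating_equation(
    psi_n: Callable[[float], float],
    lo: float,
    hi: float,
    tol: float = 1e-10,
    max_iter: int = 200,
) -> float:
    """Find a root of psi_n on [lo, hi] by bisection.

    Assumes psi_n(lo) > 0 > psi_n(hi), as for a decreasing estimating function.
    """
    if not (psi_n(lo) > 0 > psi_n(hi)):
        raise ValueError("psi_n must change sign from positive to negative on [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if psi_n(mid) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def exponential_score(xs: Sequence[float]) -> Callable[[float], float]:
    """Likelihood estimating function n^{-1} d/dtheta log L_n(theta)
    for the illustrative density theta * exp(-theta * x), x > 0."""
    mean_x = sum(xs) / len(xs)
    return lambda theta: 1.0 / theta - mean_x


xs = [0.8, 1.9, 0.4, 2.7, 1.2]  # made-up sample, for illustration only
theta_hat = solve_estimating_equation(exponential_score(xs), 1e-6, 100.0)
print(theta_hat)  # close to len(xs) / sum(xs), the closed-form ML estimate
```

For this particular model the root is available in closed form, which makes it easy to check the solver; in models without a closed form the same bracketing idea applies whenever the estimating function is monotone.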

Call $T_n$ an estimator based on an estimating function $\Psi_n(\theta)$. The following result is a modification of a theorem due to Hodges and Lehmann (Ann. Math. Statist., 34 (1963)).

Theorem. Let $\Theta$ be an interval of $R^1$. Suppose that a real-valued estimating function $\Psi_n(\theta)$ satisfies the following three conditions:
(M1) $\Psi_n(\theta)$ is a nonincreasing function of the real parameter $\theta$;
(M2) for any real number $h$, $n^{1/2}\Psi_n(\theta_0+n^{-1/2}h)-n^{1/2}\Psi_n(\theta_0)+\gamma h\to 0$ in probability, where $\gamma$ is a positive constant; and
(M3) $\mathcal{L}[n^{1/2}\Psi_n(\theta_0)](y)\to\Phi(y/\sigma)$, where $\Phi$ is a continuous distribution function with zero mean and unit variance.
Define an estimator based on $\Psi_n$ by $T_n=(\theta_n^*+\theta_n^{**})/2$, where $\theta_n^*=\sup\{\theta\mid\Psi_n(\theta)>0\}$ and $\theta_n^{**}=\inf\{\theta\mid\Psi_n(\theta)<0\}$. Then we have $\mathcal{L}[n^{1/2}(T_n-\theta_0)](y)\to\Phi(\gamma y/\sigma)$ as $n\to\infty$.

Huber considered a formulation that guarantees the asymptotic normality of an M-estimator. An M-estimator $T_n$ is defined by a minimum problem of the form $\sum_{i=1}^{n}\rho(X_i,T_n)=\min_\theta\sum_{i=1}^{n}\rho(X_i,\theta)$ or by an implicit equation $\sum_{i=1}^{n}\psi(X_i,T_n)=0$, where $\rho$ is an arbitrary function and $\psi(x,\theta)=\partial\rho(x,\theta)/\partial\theta$. Note that $\rho(x,\theta)=-\log f(x,\theta)$ gives the ordinary ML estimator. Let $\Theta$ be a closed subset of $R^k$, let $(\mathcal{X},\mathcal{B},P)$ be a probability space, and let $\psi(x,\theta)$ be some function on $\mathcal{X}\times\Theta$ with values in $R^k$. Assume that $X_1,X_2,\ldots$ are independent random variables with values in $\mathcal{X}$ having the common probability distribution $P$ that need not be a member of the parametric family. Consider an estimating function

$$\Psi_n(\theta)=n^{-1}\sum_{i=1}^{n}\psi(X_i,\theta).$$

Assume that
(N1) for each fixed $\theta\in\Theta$, $\psi(x,\theta)$ is $\mathcal{B}$-measurable and $\psi(x,\theta)$ is separable;
(N2) the expected value $\lambda(\theta)=E\{\psi(X,\theta)\}$ exists for all $\theta\in\Theta$, and has a unique zero at $\theta=\theta_0\in\Theta^\circ$;
(N3) there exists a continuous function that is bounded away from zero, $b(\theta)\ge b_0>0$, such that $E(\sup_\theta\{|\psi(X,\theta)|/b(\theta)\})<\infty$, $\liminf_{|\theta|\to\infty}\{|\lambda(\theta)|/b(\theta)\}\ge 1$, and $E(\limsup_{|\theta|\to\infty}\{|\psi(X,\theta)-\lambda(\theta)|/b(\theta)\})<1$;
(N4) for $u(x,\theta,d)=\sup\{|\psi(x,\tau)-\psi(x,\theta)|;\ |\tau-\theta|\le d\}$,
(i) as $d\to 0$, $E(u(X,\theta,d))\to 0$,
(ii) there exist positive numbers $d_0$, $b_1$, and $b_2$ such that $E(u(X,\theta,d))\le b_1d$ and $E(u(X,\theta,d)^2)\le b_2d$ for $\theta$ and $d$ satisfying $0\le|\theta-\theta_0|+d\le d_0$;
(N5) in some neighborhood of $\theta_0$, $\lambda(\theta)$ is continuously differentiable and the Jacobian matrix at $\theta=\theta_0$, $\Lambda=(\partial\lambda_i(\theta_0)/\partial\theta_j)$, is nonsingular; and
(N6) the covariance matrix $C=E\{\psi(X,\theta_0)\psi(X,\theta_0)'\}$ exists and is positive definite.

Theorem. Suppose that an estimator $\{T_n\}$ satisfies $\Psi_n(T_n)\to 0$ a.s. (or in probability) as $n\to\infty$. Then, under (N1)-(N4) (i), $T_n\to\theta_0$ a.s. (or in probability) as $n\to\infty$.

Lemma. Under (N1)-(N5),

$$\sup_{|\tau-\theta_0|\le d_0}\frac{|\Psi_n(\tau)-\Psi_n(\theta_0)-\lambda(\tau)|}{n^{-1/2}+|\lambda(\tau)|}\to 0$$

in probability as $n\to\infty$.

Suppose that $\{\mathcal{L}[n^{1/2}\Psi_n(T_n)]\}$ is tight. It implies that $\Psi_n(T_n)\to 0$ in probability, and hence from the above theorem $T_n\to\theta_0$ in probability. Thus letting $\tau=T_n$ we have

$$\frac{|\Psi_n(T_n)-\Psi_n(\theta_0)-\lambda(T_n)|}{n^{-1/2}+|\lambda(T_n)|}\to 0$$

in probability. That is, for any $\varepsilon>0$ there exists an $n_0$ such that for $n>n_0$, $P\{n^{1/2}|\lambda(T_n)|<\varepsilon+|n^{1/2}\Psi_n(T_n)-n^{1/2}\Psi_n(\theta_0)|\}>1-\varepsilon$. This and the tightness of $\{\mathcal{L}[n^{1/2}\Psi_n(T_n)]\}$ and $\{\mathcal{L}[n^{1/2}\Psi_n(\theta_0)]\}$ lead to the tightness of $\{\mathcal{L}[n^{1/2}(T_n-\theta_0)]\}$, and the converse is also true. At the same time we have $n^{1/2}\Psi_n(T_n)-n^{1/2}\Psi_n(\theta_0)-\Lambda n^{1/2}(T_n-\theta_0)\to 0$ in probability. The following theorem is a straightforward consequence.

Theorem. Suppose that an estimator $\hat\theta_n$ satisfies $n^{1/2}\Psi_n(\hat\theta_n)\to 0$ as $n\to\infty$. Then, under (N1)-(N6), $\mathcal{L}[n^{1/2}(\hat\theta_n-\theta_0)\mid P]\to N_k(0,\Lambda^{-1}C(\Lambda')^{-1})$ as $n\to\infty$.

Q. Interval Estimation

Interval estimation or region estimation is a method of statistical inference utilized to estimate the true value $g(\theta)$ of the given parametric function by stating that $g(\theta)$ belongs to a subset $S(x)$ of $\mathcal{A}$, based on the observed value $x$ of the random variable $X$. If

$$P_\theta\{g(\theta)\in S(X)\}\ge 1-\alpha \quad\text{for any } \theta\in\Theta$$

for a constant $\alpha$ ($0<\alpha<1$), then the random region $S(X)$ is called a confidence region of $g(\theta)$ of confidence level $1-\alpha$, and the infimum of the left-hand side with respect to $\theta\in\Theta$ is called the confidence coefficient. In particular, if $\Theta\subset R$ and a confidence region is an interval, as is often the case, then the region is called a confidence interval, and the two boundaries of the interval are called confidence limits. If a particular subset $S(X)$ among the set of confidence regions of $g(\theta)$ of confidence level $1-\alpha$ minimizes $P_\theta\{g(\theta')\in S(X)\}$ for all pairs $\theta$ and $\theta'$ ($\ne\theta$), $S(X)$ is said to be uniformly most powerful. If a confidence region $S(X)$ of $g(\theta)$ of confidence level $1-\alpha$ satisfies $P_\theta\{g(\theta')\in S(X)\}\le 1-\alpha$ for all pairs $\theta$ and $\theta'$ ($\ne\theta$), then it is said to be unbiased. The notion of invariance of a confidence region can be defined similarly, and the definition for $S(X)$ being uniformly most powerful unbiased (UMPU) or uniformly most

powerful invariant (UMPI) can be formulated [4] S. Zacks, The theory of statistical infer-
in an obvious manner. ence, Wiley, 1971.
For each t&e@ let A(&) be an tacceptance [S] A. Birnbaum, On the foundation of statis-
region of a ttest of level CIof the thypothesis tical inference, J. Amer. Statist. Assoc., 57
Q=Q,. Foreachx~.%letS(x)={B]x~A(B), (1962), 269-326.
8~ O}. Then S(X) is a confidence region of B [6] E. L. Lehmann and H. Scheffe, Complete-
of confidence level 1 - 0~.If A(fi,) is an accep- ness, similar regions and unbiased estimation
tance region of a UMP test of the hypothesis I, II, Sankhya, sec. A, 10 (1950), 305-340; 15
0 = &, for each Q,,, then S(X) is a UMP con- (1955), 219-236.
fidence region of 0 of confidence level 1 - c(. [7] H. Morimoto, N. Ikeda, and Y. Washio,
Furthermore, corresponding to an acceptance Unbiased estimates based on sufficient statis-
region A(&,) of a UMP unbiased (invariant) tics, Bull. Math. Statist., 6 (1956).
test, a UMP unbiased (invariant) confidence [S] E. W. Barankin, Locally best unbiased
region can be constructed in a similar manner. estimates, Ann. Math. Statist., 20 (1949), 4777
501.
R. Tolerance Regions

Let $X$ and $Y$ be distributed according to probability distributions $P_\theta^X$ and $P_\theta^Y$ labeled by a common $\theta \in \Omega$ over measurable spaces $(\mathcal{X}, \mathfrak{B})$ and $(\mathcal{Y}, \mathfrak{C})$, respectively, and consider the problem of predicting a future value of the random variable $Y$ using the observed value $x$ of the random variable $X$. If a mapping $S(x)$ sending a point $x$ to a set belonging to $\mathfrak{C}$ is used to predict that the value of $Y$ will lie in the set $S(x)$, then the random region $S(X)$ is called a tolerance region of $Y$. In particular, if a tolerance region of a real random variable is an interval, it is called a tolerance interval and its boundaries tolerance limits.

For simplicity, suppose that $X$ and $Y$ are independent. There are several kinds of tolerance regions. First, if $P_\theta^X\{P_\theta^Y\{Y \in S(X)\} \geq \beta\} \geq \gamma$ for any $\theta \in \Omega$, then $S(X)$ is called a tolerance region of $Y$ of content $\beta$ and level $\gamma$. Second, if $E_\theta^X(P_\theta^Y\{Y \in S(X)\}) \geq \beta$ for any $\theta \in \Omega$, then $S(X)$ is called a tolerance region of $Y$ of mean content $\beta$. Suppose that the random variable $X = (X_1, \dots, X_n)'$ is a random sample and that both $X_i$ and $Y$ obey the same distribution. If further the set $\{P_\theta \mid \theta \in \Omega\}$ forms the totality of 1-dimensional continuous distributions and the distribution of $P_\theta^Y\{Y \in S(X)\}$ does not depend on the choice of $\theta$, then $S(X)$ is called a distribution-free tolerance region. For example, if $X_{(1)} < X_{(2)} < \dots < X_{(n)}$ is an order statistic, then the interval $[X_{(r)}, X_{(s)}]$ (for $r < s$) is a distribution-free tolerance interval for a random variable $Y$, independent of $X_1, \dots, X_n$, which has the same distribution as $X_i$.

References

[1] H. Cramér, Mathematical methods of statistics, Princeton Univ. Press, 1946.
[2] C. R. Rao, Linear statistical inference and its applications, Wiley, second edition, 1973.
[3] S. S. Wilks, Mathematical statistics, Wiley, 1962.
[9] A. Bhattacharya, On some analogues to the amount of information and their uses in statistical estimation, Sankhyā, sec. A, 8 (1946), 1-14.
[10] K. Ishii, Inequalities of the types of Chebyshev and Cramér-Rao and mathematical programming, Ann. Inst. Statist. Math., 16 (1964), 277-293.
[11] C. R. Rao, Information and the accuracy attainable in the estimation of statistical parameters, Bull. Calcutta Math. Soc., 37 (1945), 81-91.
[12] M. A. Girshick and L. J. Savage, Bayes and minimax estimates for quadratic loss function, Proc. 2nd Berkeley Symp. Math. Prob., 1 (1951), 53-74.
[13] S. Karlin, Admissibility for estimation with quadratic loss, Ann. Math. Statist., 29 (1958), 406-436.
[14] E. J. Pitman, The estimation of the location and scale parameters of a continuous population of any given form, Biometrika, 30 (1939), 391-421.
[15] C. Stein, Inadmissibility of the usual estimator for the mean of a multivariate normal distribution, Proc. 3rd Berkeley Symp. Math. Prob., 1 (1956), 197-206.
[16] C. Stein, The admissibility of the Pitman's estimator for a single location parameter, Ann. Math. Statist., 30 (1959), 970-979.
[17] J. L. Hodges and E. L. Lehmann, Some problems in minimax estimation, Ann. Math. Statist., 21 (1950), 182-197.
[18] H. Kudo, On minimax invariant estimates of the transformation parameter, Nat. Sci. Rep. Ochanomizu Univ., 6 (1955), 31-73.
[19] J. Wolfowitz, The efficiency of sequential estimates and Wald's equation for sequential processes, Ann. Math. Statist., 18 (1947), 215-230.
[20] R. A. Fisher, On the mathematical foundations of theoretical statistics, Philos. Trans. Roy. Soc. London, 222 (1922), 309-368.
[21] R. A. Fisher, Theory of statistical estimation, Proc. Cambridge Philos. Soc., 22 (1925), 700-725.
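As a numerical companion to the distribution-free tolerance interval $[X_{(r)}, X_{(s)}]$ of Section R, the following minimal sketch computes its mean content and the level attained for a given content $\beta$. It relies on a standard fact that the text does not state and that is assumed here: for a continuous distribution the coverage $P\{Y \in [X_{(r)}, X_{(s)}]\}$ follows a Beta$(s-r,\, n-s+r+1)$ distribution, so its tail probability can be written as a binomial sum. The function names are hypothetical.

```python
from math import comb


def mean_content(n: int, r: int, s: int) -> float:
    """Expected coverage of [X_(r), X_(s)] for n i.i.d. continuous observations.

    Assumes the coverage is Beta(s - r, n - s + r + 1), whose mean is (s - r)/(n + 1).
    """
    return (s - r) / (n + 1)


def level_for_content(n: int, r: int, s: int, beta: float) -> float:
    """Probability that the coverage of [X_(r), X_(s)] is at least beta.

    Uses the identity P{Beta(a, n - a + 1) >= beta} = P{Binomial(n, beta) <= a - 1}
    with a = s - r (an assumption of this sketch, not taken from the text).
    """
    a = s - r
    return sum(comb(n, j) * beta**j * (1 - beta) ** (n - j) for j in range(a))


if __name__ == "__main__":
    n = 100
    # Interval spanned by the sample minimum and maximum.
    print("mean content:", mean_content(n, 1, n))
    print("level for content 0.95:", level_for_content(n, 1, n, 0.95))
```

Neither quantity depends on the underlying continuous distribution, which is the sense in which the interval is distribution-free.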
400 A 1500
Statistical Hypothesis Testing

[22] R. von Mises, On the asymptotic distributions of differentiable statistical functions, Ann. Math. Statist., 18 (1947), 309-348.
[23] L. LeCam, On some asymptotic properties of maximum likelihood estimates and related Bayes estimates, Univ. California Publ. Statist., 1 (1953), 277-330.
[24] L. LeCam, Locally asymptotically normal families of distributions, Univ. California Publ. Statist., 3 (1960), 37-98.
[25] G. G. Roussas, Contiguity of probability measures, Cambridge Univ. Press, 1972.
[26] A. Wald, Note on the consistency of the maximum likelihood estimate, Ann. Math. Statist., 20 (1949), 595-601.
[27] R. R. Bahadur, On Fisher's bound for asymptotic variances, Ann. Math. Statist., 35 (1964), 1545-1552.
[28] J. Hájek, A characterization of limiting distributions of regular estimates, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 14 (1970), 323-330.
[29] S. Kaufman, Asymptotic efficiency of the maximum likelihood estimator, Ann. Inst. Statist. Math., 18 (1966), 155-178.
[30] J. Wolfowitz, Asymptotic efficiency of the maximum likelihood estimator, Theory Prob. Appl., 10 (1965), 247-260.
[31] M. Akahira and K. Takeuchi, Asymptotic efficiency of statistical estimators: Concepts and higher order asymptotic efficiency, Lecture notes in statist. 7, Springer, 1981.
[32] C. R. Rao, Asymptotic efficiency and limiting information, Proc. 4th Berkeley Symp. Math. Prob., 1 (1961), 531-546.
[33] C. R. Rao, Criteria of estimation in large samples, Sankhyā, sec. A, 25 (1963), 189-206.
[34] P. J. Huber, The behavior of maximum likelihood estimates under nonstandard conditions, Proc. 5th Berkeley Symp. Math. Statist. Prob., 1 (1967), 221-233.

400 (XVIII.8)
Statistical Hypothesis Testing

A. General Remarks

A statistical hypothesis is a proposition about the probability distribution of a sample $X$. If it is known that the distribution of $X$ belongs to a family of distributions $\mathcal{P} = \{P_\theta \mid \theta \in \Omega\}$ with a parameter space $\Omega$, the hypothesis can be stated as follows: The value of the parameter $\theta$ belongs to $\omega_H$, where $\omega_H$ is a nonempty subset of the parameter space $\Omega$. This hypothesis is also written simply as $H: \theta \in \omega_H$. When $\omega_H$ consists of one point, it is called a simple hypothesis, otherwise a composite hypothesis.

Let $\mathcal{X}$ be a sample space associated with a $\sigma$-algebra $\mathfrak{B}$ of subsets of $\mathcal{X}$. To test a hypothesis $H$ is to decide whether $H$ is false, based on the observation of a sample $X$ ($\in \mathcal{X}$). The assertion that $H$ is not false does not necessarily imply the validity of $H$. Such an assertion is called the acceptance of $H$, while the opposite assertion, that $H$ is false, is called the rejection of $H$. In this framework for the testing problem, $H$ is often called a null hypothesis (→ 401 Statistical Inference).

Consider a testing procedure in which $H$ is rejected with probability $\varphi(x)$ ($0 \leq \varphi(x) \leq 1$) and accepted with probability $1 - \varphi(x)$, when $x \in \mathcal{X}$ is observed. This testing procedure is characterized by the function $\varphi$ on $\mathcal{X}$ with range in $[0, 1]$. Here $\varphi(x)$ is taken as $\mathfrak{B}$-measurable on $\mathcal{X}$, and is called a test function or test. If $\varphi(x)$ is the indicator function $\chi_B(x)$ of a set $B$ ($\in \mathfrak{B}$), then the test rejects $H$ when $x$ belongs to $B$ and accepts $H$ otherwise. The set $B$ is called a critical region, and its complementary set an acceptance region. A test is called a nonrandomized test if it is the indicator function of a set. Other tests are called randomized tests.

Suppose that the distribution of the sample $X$ is a probability measure $P_\theta$ on $(\mathcal{X}, \mathfrak{B})$. The probability of rejecting $H$ when $\theta$ is the true value of the parameter is calculated from

$$E_\theta(\varphi) = \int_{\mathcal{X}} \varphi(x)\, dP_\theta(x).$$

Let $\alpha$ be a given constant in $(0, 1)$. If a test $\varphi(x)$ satisfies $E_\theta(\varphi) \leq \alpha$ for all $\theta \in \omega_H$, or, in other words, if the probability of rejecting $H$ when $H$ is true is not greater than $\alpha$, $\alpha$ is called the level of $\varphi$ and such a test is called a level $\alpha$ test. We denote the set of all level $\alpha$ tests by $\Phi(\alpha)$, and $\sup_{\theta \in \omega_H} E_\theta(\varphi)$ is called the size of $\varphi$.

To judge the merit of tests, we introduce a different hypothesis $A$: The true value of $\theta$ belongs to $\omega_A$ ($\subset \Omega - \omega_H$). This is called an alternative hypothesis, or, for simplicity, an alternative. The errors of a test are divided into two kinds: errors owing to the rejection of the hypothesis $H$ when it is true, and errors owing to the acceptance of $H$ when it is false. The former are called errors of the first kind, and the latter, errors of the second kind. The probability $E_\theta(\varphi)$ of rejecting $H$ when $\theta \in \omega_A$, that is, the probability of the correct decision being made for $\theta \in \omega_A$, is called the power of a test or the power function. The probability of committing an error of the second kind is $1 - E_\theta(\varphi)$ for $\theta \in \omega_A$. A testing problem is indicated by the notation $(\mathcal{X}, \mathfrak{B}, \mathcal{P}, \omega_H, \omega_A)$. A test $\varphi$ in a class $\Phi(\alpha)$ of tests is said to be uniformly most powerful in $\Phi(\alpha)$ (or UMP in $\Phi(\alpha)$) if for any

$\psi \in \Phi(\alpha)$, $E_\theta(\varphi) \geq E_\theta(\psi)$ for all $\theta \in \omega_A$. When $\omega_A$ consists of a single point, it is said to be most powerful.

B. The Neyman-Pearson Fundamental Lemma

Let $\mu$ be a $\sigma$-finite measure over $(\mathcal{X}, \mathfrak{B})$ and $f_1, \dots, f_{n+1}$ be $\mu$-integrable real-valued functions. If $c_1, \dots, c_n$ are constants such that the set $\Phi(c_1, \dots, c_n)$ of test functions $\varphi$ satisfying

$$\int \varphi f_i \, d\mu \leq c_i, \quad i = 1, 2, \dots, n,$$

is not empty, then there exists at least one test $\varphi_0$ in $\Phi(c_1, \dots, c_n)$ that maximizes $\int \varphi f_{n+1} \, d\mu$ among all $\varphi$ in $\Phi(c_1, \dots, c_n)$. A test $\hat\varphi$ is one of these tests if it satisfies the following two conditions:

(1) For appropriate constants $k_1, \dots, k_n \geq 0$,

$$\hat\varphi(x) = \begin{cases} 1 & \text{when } f_{n+1}(x) > \sum_{i=1}^{n} k_i f_i(x), \\ 0 & \text{when } f_{n+1}(x) < \sum_{i=1}^{n} k_i f_i(x) \end{cases}$$

almost everywhere with respect to $\mu$, and

(2) the equation

$$\int \hat\varphi f_i \, d\mu = c_i, \quad i = 1, 2, \dots, n,$$

holds.

If $(c_1, \dots, c_n)$ is an interior point of the subset

$$\left\{ \left( \int \varphi f_1 \, d\mu, \dots, \int \varphi f_n \, d\mu \right) \;\middle|\; \varphi \text{ a test function} \right\}$$

of the $n$-space $\mathbf{R}^n$ and $\hat\varphi$ satisfies (2) and maximizes $\int \varphi f_{n+1} \, d\mu$ among all $\varphi$ in $\Phi(c_1, \dots, c_n)$, then $\hat\varphi$ satisfies (1). These statements are called the Neyman-Pearson lemma.

As an illustration, suppose that $\Omega = \{1, 2, \dots, n, n+1\}$ and that $\{P_\theta \mid \theta \in \Omega\}$ is dominated by a $\sigma$-finite measure $\mu$. Let $f_i(x)$ be the density of $P_i$ with respect to $\mu$. When $\omega_H$ is a finite set $\{1, \dots, n\}$ and $\omega_A$ consists of a single point $n + 1$, then $\hat\varphi$ satisfying (1) and (2) with $c_1 = \dots = c_n = \alpha$ is a uniformly most powerful level $\alpha$ test.

If $\mathfrak{B}$ is generated by a countable number of sets, then there exists a most powerful level $\alpha$ test for any hypothesis against a simple alternative $A: \theta = \bar\theta$. A method of obtaining such a most powerful test is given in the Lehmann-Stein theorem: Denote by $f_\theta$ the density function of $P_\theta$ with respect to a $\sigma$-finite measure $\mu$, and define

$$h_\lambda(x) = \int_{\omega_H} f_\theta(x) \, d\lambda(\theta)$$

for a probability measure $\lambda$ on $\omega_H$. Consider testing the simple hypothesis $H_\lambda$: The density of distribution of the sample is $h_\lambda$, against the alternative $A: \theta = \bar\theta$, and let $\varphi_\lambda$ be a most powerful level $\alpha$ test for this problem ($H_\lambda : A$). If $\sup_{\theta \in \omega_H} E_\theta(\varphi_\lambda) \leq \alpha$, $\varphi_\lambda$ is a most powerful level $\alpha$ test for testing $H: \theta \in \omega_H$ against $A: \theta = \bar\theta$. The measure $\lambda$ satisfies $E_{\bar\theta}(\varphi_{\lambda'}) \geq E_{\bar\theta}(\varphi_\lambda)$ for any probability measure $\lambda'$ on $\omega_H$ and is called a least favorable distribution.

When the alternative hypothesis $\omega_A$ consists of more than one point, a uniformly most powerful test does not generally exist. However, if $\Omega = \mathbf{R}$, $\omega_H = (-\infty, \theta_0]$, $\omega_A = (\theta_0, \infty)$, and $f_\theta(x)$ is a density function with monotone likelihood ratio with respect to a statistic $T(x)$, then a UMP level $\alpha$ test $\varphi(x)$ exists and is defined by $\varphi(x) = 1$ if $T(x) > c$; $=$ a constant if $T(x) = c$; and $= 0$ if $T(x) < c$. For a one-parameter exponential family of distributions, there exists a UMP level $\alpha$ test for testing $H: \omega_H = (-\infty, \theta_1] \cup [\theta_2, \infty)$ against $A: \omega_A = (\theta_1, \theta_2)$ ($\theta_1 < \theta_2$). However, a UMP test does not exist for the problem obtained by interchanging the positions of $\omega_H$ and $\omega_A$.

Since hypothesis-testing problems admitting UMP tests are rather rare, alternative ways for judging the merit of tests are needed, and two have been devised. The first is to restrict the class $\Phi$ of tests and to find a UMP test in this restricted class. The second is to introduce an alternative criterion of optimality and to select a test accordingly. The first is discussed in detail in Sections C, D, and E, and the second in Section F.

C. Unbiased Tests

The unbiasedness criterion is based on the idea that the probability of rejecting the hypothesis $H$ when it is true (the probability of an error of the first kind) should preferably be no larger than that of rejecting $H$ when it is false (the power). If a level $\alpha$ test $\varphi$ satisfies $E_\theta(\varphi) \geq \alpha$ for $\theta \in \omega_A$, then $\varphi$ is called an unbiased level $\alpha$ test. Let $\Phi_u(\alpha)$ be the set of all unbiased level $\alpha$ tests. A UMP test in $\Phi_u(\alpha)$ is called a uniformly most powerful (or UMP) unbiased level $\alpha$ test.

If $P_\theta$ is of the exponential family whose parameter space $\Omega$ is a finite or an infinite open interval of $\mathbf{R}^k$, then there exists a UMP unbiased level $\alpha$ test for the following two problems: (1) $H: \theta_1 \leq a$, $A: \theta_1 > a$, where $\theta_1$ is the first coordinate of $\theta = (\theta_1, \dots, \theta_k)$; and (2) $H: \theta_1 = a$, $A: \theta_1 \neq a$. For example, when the sample is normally distributed with unknown mean $\mu$ and unknown variance $\sigma^2$, the Student test (defined in Section G) for a hypothesis $H: \mu = \mu_0$ against an alternative $A: \mu \neq \mu_0$ is a UMP unbiased test.
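The notions of level, power function, and unbiasedness introduced in Sections A to C can be made concrete in the simplest monotone likelihood ratio setting. The sketch below assumes a normal sample with known standard deviation (an assumption made only for this illustration; Section C treats the unknown-variance case) and evaluates the one-sided test that rejects $H: \mu \leq \mu_0$ when the sample mean exceeds a constant. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def critical_value(mu0: float, sigma: float, n: int, alpha: float) -> float:
    """Constant c such that P{sample mean > c} = alpha when the true mean is mu0."""
    return mu0 + STANDARD_NORMAL.inv_cdf(1 - alpha) * sigma / sqrt(n)


def power(mu: float, mu0: float, sigma: float, n: int, alpha: float) -> float:
    """Power function E_mu(phi) of the test phi = 1{sample mean > c}."""
    c = critical_value(mu0, sigma, n, alpha)
    # The sample mean of n observations is N(mu, sigma^2 / n).
    return 1 - NormalDist(mu, sigma / sqrt(n)).cdf(c)


if __name__ == "__main__":
    mu0, sigma, n, alpha = 0.0, 1.0, 25, 0.05
    for mu in (-0.2, 0.0, 0.2, 0.5):
        print(f"mu = {mu:+.1f}  power = {power(mu, mu0, sigma, n, alpha):.4f}")
```

The power equals $\alpha$ at $\mu = \mu_0$, stays below $\alpha$ on the hypothesis, and exceeds $\alpha$ on the alternative, so the test has level $\alpha$ and is unbiased in the sense of Section C.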

D. Similar Tests and Neyman Structure

If $E_\theta(\varphi)$ is constant for all $\theta \in \omega$ ($\subset \Omega$), $\varphi$ is called a similar test with respect to $\omega$. If $E_\theta(\varphi)$ is a continuous function of $\theta$, an unbiased test $\varphi$ is similar with respect to the common boundary $\bar\omega$ of $\omega_H$ and $\omega_A$, provided that $\Omega$ is a topological space and the density function is continuous in $\theta$. Therefore, in this case, unbiased tests are found in the class of all tests similar with respect to $\bar\omega$. Let a statistic $y = T(x)$ be sufficient for $\mathcal{P}_\omega = \{P_\theta \mid \theta \in \omega\}$. A test $\varphi$ is said to have Neyman structure with respect to $T$ if the conditional expectation $E(\varphi \mid T(x) = y)$ of $\varphi$ equals a constant $[\mathcal{P}_\omega^T]$. (For the notation $[\mathcal{P}_\omega^T]$ → 396 Statistic.) A test $\varphi$ having Neyman structure with respect to the statistic $T(x)$ is similar with respect to $\omega$. A test $\varphi$ similar with respect to $\omega$ has Neyman structure with respect to $T(x)$ if and only if the family $\mathcal{P}_\omega^T = \{Q_\theta = P_\theta T^{-1} \mid \theta \in \omega\}$ of marginal distributions of $T$ is boundedly complete.

E. Invariant Tests

Consider groups $G$ and $\bar{G}$ of one-to-one transformations on $\mathcal{X}$ and $\Omega$, respectively. Suppose that each element of $G$ is a measurable transformation of $\mathcal{X}$ onto itself (i.e., $gB \in \mathfrak{B}$ for any $B \in \mathfrak{B}$) and that a homomorphism $g \to \bar{g}$ of $G$ into $\bar{G}$ is defined so that $P_\theta(g^{-1}B) = P_{\bar{g}\theta}(B)$. The hypothesis $H: \theta \in \omega_H$ and the alternative $A: \theta \in \omega_A$ are said to be invariant under $G$ if $\bar{g}\omega_H = \omega_H$ and $\bar{g}\omega_A = \omega_A$ for all $g \in G$, and in this case the testing problem $(\mathcal{X}, \mathfrak{B}, \mathcal{P}, \omega_H, \omega_A)$ is said to be invariant under $G$. A test is called an invariant test if $\varphi(gx) = \varphi(x)$ for every $g \in G$, and $E_{\bar{g}\theta}(\varphi) = E_\theta(\varphi)$ holds for any invariant $\varphi$. Accordingly, if $\bar{G}$ is transitive on $\omega_H$, an invariant test is similar with respect to $\omega_H$. If the sample space $\mathcal{X}$ is a subset of $\mathbf{R}^n$ that is invariant under the translation $(x_1, \dots, x_n) \to (x_1 + a, \dots, x_n + a)$ with a real $a$ and if there exists a $\theta' = \bar{a}\theta \in \Omega$ such that $P_{\theta'}(B) = P_\theta(\{(x_1 - a, \dots, x_n - a) \mid (x_1, \dots, x_n) \in B\})$ for any $\theta \in \Omega$, then $\bar{a}$ is a transformation on $\Omega$. In this case the real number $a$ is called a location parameter. Furthermore, if the sample space $\mathcal{X}$ is a subset invariant under the similarity transformation $(x_1, \dots, x_n) \to (ax_1, \dots, ax_n)$ ($a > 0$) and if there exists a $\theta' = \bar{a}\theta \in \Omega$ such that $P_{\theta'}(B) = P_\theta(\{(x_1/a, \dots, x_n/a) \mid (x_1, \dots, x_n) \in B\})$ for any $\theta \in \Omega$, then the real number $a$ is called a scale parameter. The invariance principle states that a test for a testing problem invariant under $G$ should preferably be invariant under $G$. A test $\varphi(x)$ is called an almost invariant test if $\varphi(gx) = \varphi(x)$ $[\mathcal{P}]$ for all $g \in G$.

Suppose that the testing problem of a hypothesis under consideration is invariant under a transformation group $G$ on the sample space $\mathcal{X}$. Denote the set of all invariant level $\alpha$ tests by $\Phi_I(\alpha)$. A test that is uniformly most powerful in $\Phi_I(\alpha)$ is called a uniformly most powerful (in short, UMP) invariant level $\alpha$ test. If there exists a unique UMP unbiased level $\alpha$ test $\varphi^*$, then a UMP invariant level $\alpha$ test (if it exists) coincides with $\varphi^*$ [9]. When $T(x)$ is maximal invariant under $G$, a necessary and sufficient condition for $\varphi(x)$ to be invariant is that $\varphi$ be a function of $T(x)$.

For example, suppose that the sample $X = (X_1, \dots, X_n)$ is taken from $N(\mu, \sigma^2)$ with unknown $\mu$ and $\sigma^2$. In this situation, $Y = (\bar{X}, S)$ is a sufficient statistic, where $\bar{X} = \sum_i X_i / n$ and $S = \sqrt{\sum_i (X_i - \bar{X})^2}$. Let $G$ be the group of transformations $(\bar{x}, s) \to (c\bar{x}, cs)$ ($c > 0$) on the range $\mathcal{Y}$ of $Y$ and $\bar{G}$ be the group of transformations $(\mu, \sigma^2) \to (c\mu, c^2\sigma^2)$ ($c > 0$) on the parameter space. Both the hypotheses $H_1: \mu/\sigma^2 \leq 0$ and $H_2: \mu/\sigma^2 = 0$ are invariant under $\bar{G}$. Since $t = \sqrt{n}\,\bar{x}/(s/\sqrt{n-1})$ is maximal invariant, any invariant level $\alpha$ test is in the class of functions of $t$. The Student test, defined in Section G, is UMP invariant under $G$.

F. Minimax Tests and Most Stringent Tests

Minimax tests and most stringent tests are sometimes used as alternatives to UMP tests. Suppose that $\mathcal{P} = \{P_\theta \mid \theta \in \Omega\}$ is a dominated family and $\mathfrak{B}$ is generated by a countable number of sets. A level $\alpha$ test $\varphi^*$ is called a minimax level $\alpha$ test if for any level $\alpha$ test $\varphi$,

$$\inf_{\theta \in \omega_A} E_\theta(\varphi^*) \geq \inf_{\theta \in \omega_A} E_\theta(\varphi).$$

Such a test exists for any $\alpha \in (0, 1)$. If a group $G$ of measurable transformations on $\mathcal{X}$ leaves a testing problem invariant, then an intimate relation exists between the minimax property and invariance. Concerning this relation, we have the following theorem: For each $\alpha \in (0, 1)$ there is an almost invariant level $\alpha$ test that is minimax if there exists a $\sigma$-field $\mathfrak{A}$ of subsets of $G$ and a sequence $\{\nu_n\}$ of probability measures on $(G, \mathfrak{A})$ such that (i) $B \in \mathfrak{B}$ implies $\{(x, g) \mid gx \in B\} \in \mathfrak{B} \times \mathfrak{A}$; (ii) $A \in \mathfrak{A}$, $g \in G$ implies $Ag \in \mathfrak{A}$; and (iii) $\lim_{n \to \infty} |\nu_n(Ag) - \nu_n(A)| = 0$ for any $A \in \mathfrak{A}$ and $g \in G$. Fundamental in the invariant testing problem is the Hunt-Stein lemma: Under the condition just stated, for any $\varphi$ there exists an almost invariant test $\psi$ such that

$$\inf_{\bar{g} \in \bar{G}} E_{\bar{g}\theta}(\varphi) \leq E_\theta(\psi) \leq \sup_{\bar{g} \in \bar{G}} E_{\bar{g}\theta}(\varphi).$$

The following six types of transformation groups satisfy the condition of the theorem: (1) the group of translations on $\mathbf{R}^n$, (2) the group of similarity transformations on $\mathbf{R}^n$, (3) the

group of transformations $g = (a, b): (x_1, \dots, x_n) \in \mathbf{R}^n \to (ax_1 + b, \dots, ax_n + b) \in \mathbf{R}^n$ ($0 < a < \infty$, $-\infty < b < \infty$), (4) finite groups, (5) the group of orthogonal transformations on $\mathbf{R}^n$, and (6) the direct product of a finite number of the groups mentioned in (1)-(5).

We call $\beta_\alpha^*(\theta) = \sup_{\varphi \in \Phi(\alpha)} E_\theta(\varphi)$ an envelope power function, and $\varphi^*$ ($\in \Phi(\alpha)$) is called a most stringent level $\alpha$ test if

$$\sup_{\theta \in \omega_A} \left(\beta_\alpha^*(\theta) - E_\theta(\varphi^*)\right) \leq \sup_{\theta \in \omega_A} \left(\beta_\alpha^*(\theta) - E_\theta(\varphi)\right)$$

for any $\varphi \in \Phi(\alpha)$. There exists a most stringent level $\alpha$ test for each $\alpha \in (0, 1)$. If a testing problem is invariant under a transformation group $G$ on $\mathcal{X}$ and $G$ satisfies the condition in the Hunt-Stein lemma, then a uniformly most powerful invariant level $\alpha$ test is most stringent among the level $\alpha$ tests (→ 398 Statistical Decision Functions).

Admissibility of a test and completeness of a class of tests are defined with respect to the probability of an error of the second kind (→ 398 Statistical Decision Functions). The uniformly most powerful level $\alpha$ test and the uniformly most powerful unbiased level $\alpha$ test are admissible.

G. Useful Tests Concerning Normal Distributions (→ Appendix A, Table 23)

In this section, we treat the rejection regions $S$ that are commonly used in testing problems related to normal distributions. Let $\alpha$ be the level of $S$, and let $c(\alpha)$ and $d(\alpha)$ be positive numbers determined by $\alpha$. In (1)-(5) below, the sample consists of $n$ mutually independent random variables $X_1, \dots, X_n$ each of which is assumed to be normally distributed with mean $\mu$ and variance $\sigma^2$. For any sample point $x = (x_1, \dots, x_n)$, denote $\sum_{i=1}^n x_i / n$ by $\bar{x}$ and $\sum_{i=1}^n (x_i - \bar{x})^2$ by $s^2$.
(1) To test the hypothesis $\mu \leq \mu_0$ against an alternative $\mu > \mu_0$, we can use as a critical region $S = \{x \mid t(x) > c(\alpha)\}$, where the test statistic $t(x)$ is given by $\sqrt{n}(\bar{x} - \mu_0)/\sqrt{s^2/(n-1)}$.
(2) To test the hypothesis $\mu = \mu_0$ against an alternative $\mu \neq \mu_0$, we can use $S = \{x \mid |t(x)| \geq c(\alpha)\}$ with the same test statistic $t(x)$ as in (1). These tests based on the statistic $t(x)$ are generally called Student tests or t-tests.
(3) To test the hypothesis $\sigma^2 = \sigma_0^2$ against the alternative $\sigma^2 > \sigma_0^2$ with $\sigma_0^2 > 0$, we can use $S = \{x \mid \chi^2(x) > c(\alpha)\}$, where $\chi^2 = s^2/\sigma_0^2$.
(4) To test the hypothesis $\sigma^2 \geq \sigma_0^2$ against the alternative $\sigma^2 < \sigma_0^2$, we can use $S = \{x \mid \chi^2(x) < c(\alpha)\}$, where $\chi^2$ is the same as in (3).
(5) To test the hypothesis $\sigma^2 = \sigma_0^2$ against $\sigma^2 \neq \sigma_0^2$, we can use $S = \{x \mid \chi^2(x) < c(\alpha) \text{ or } > d(\alpha)\}$, where $\chi^2$ is the same as in (3). Each of these tests based on the statistic $\chi^2$ is called a chi-square test.

Among these tests, (3) is UMP and (1) is UMP when $\alpha > 1/2$. All tests (1)-(5) are UMP unbiased, and (3)-(5) are UMP invariant under the translations $(x_1, \dots, x_n) \to (x_1 + a, \dots, x_n + a)$ ($-\infty < a < \infty$). Since (1) and (2) are also UMP invariant under the transformations $(x_1, \dots, x_n) \to (ax_1, \dots, ax_n)$ ($0 < a < \infty$), they are most stringent tests.

Suppose that $X_1, \dots, X_m$ are independently distributed according to $N(\mu_1, \sigma_1^2)$ and that $Y_1, \dots, Y_n$ are independently distributed according to $N(\mu_2, \sigma_2^2)$, where $\mu_1$, $\mu_2$, $\sigma_1$, and $\sigma_2$ are assumed unknown unless otherwise stated. Here we give the important tests for $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$. Let $x = (x_1, \dots, x_m)$ and $y = (y_1, \dots, y_n)$ be sample points in $\mathbf{R}^m$ and $\mathbf{R}^n$, respectively, and denote $\sum_{i=1}^m x_i/m$, $\sum_{j=1}^n y_j/n$, $\sum_{i=1}^m (x_i - \bar{x})^2$, and $\sum_{j=1}^n (y_j - \bar{y})^2$ by $\bar{x}$, $\bar{y}$, $s_1^2$, and $s_2^2$, respectively.
(6) Assume that $\sigma_1$ and $\sigma_2$ are known, and consider a hypothesis $\mu_1 = \mu_2$. When an alternative $\mu_1 > \mu_2$ ($\mu_1 \neq \mu_2$) is taken, we can use as a critical region $S = \{(x, y) \mid T(x, y) \geq c(\alpha)\}$ ($S = \{(x, y) \mid |T(x, y)| \geq c(\alpha)\}$), where $T(x, y) = (\bar{x} - \bar{y})/\sqrt{\sigma_1^2/m + \sigma_2^2/n}$. These tests are both UMP unbiased and invariant under the translations $(x_1, \dots, x_m, y_1, \dots, y_n) \to (x_1 + a, \dots, x_m + a, y_1 + a, \dots, y_n + a)$.
(7) Assume that $\sigma_1 = \sigma_2$, and consider a hypothesis $\mu_1 = \mu_2$. When $\mu_1 \neq \mu_2$ ($\mu_1 > \mu_2$) is the alternative, $S = \{(x, y) \mid |T(x, y)| \geq c(\alpha)\}$ ($S = \{(x, y) \mid T(x, y) > c(\alpha)\}$) can be used as a critical region, where $T(x, y) = (\bar{x} - \bar{y})\sqrt{m + n - 2}\,/\left(\sqrt{1/m + 1/n}\,\sqrt{s_1^2 + s_2^2}\right)$. Both tests are UMP unbiased and are invariant under $(x_1, \dots, x_m, y_1, \dots, y_n) \to (ax_1 + b, \dots, ax_m + b, ay_1 + b, \dots, ay_n + b)$ ($-\infty < b < \infty$, $0 < a < \infty$).
(8) Testing the hypothesis $H: \mu_1 = \mu_2$ is called the Behrens-Fisher problem. Note that nothing is assumed about the relation of the variances $\sigma_1^2$ and $\sigma_2^2$ of the two samples $X$ and $Y$, in contrast to (7). It is difficult to construct a statistic whose distribution is independent of $\sigma_1^2$ and $\sigma_2^2$ when $H$ is true. Compare this with (1)-(7), where the proposed statistics have this property and are used to construct similar critical regions $S$. The critical region

[...]

with an appropriately chosen J is similar to such a region $S$. This test is called Welch's test.
(9) For a hypothesis $\sigma_1 = \sigma_2$, we can use as a critical region $S = \{(x, y) \mid F(x, y) < c(\alpha)\}$, $S = \{(x, y) \mid F(x, y) > c(\alpha)\}$, or $S = \{(x, y) \mid F(x, y) \geq d(\alpha) \text{ or } \leq c(\alpha)\}$, when $\sigma_1 < \sigma_2$, $\sigma_2 < \sigma_1$, or $\sigma_1 \neq \sigma_2$, respectively, is taken as alternative, where $F(x, y) = (n - 1)s_1^2/((m - 1)s_2^2)$. All these tests are UMP unbiased and are invariant under the transformations $(x_1, \dots, x_m, y_1, \dots, y_n) \to (ax_1 + b, \dots, ax_m + b, ay_1 + b, \dots, ay_n + b)$ ($-\infty <$

$b < \infty$, $0 < a < \infty$). A test based on $F(x, y)$ is called an F-test.

H. Linear Hypotheses

Let $X_1, \dots, X_n$ be independent and distributed according to $N(\mu_i, \sigma^2)$ ($i = 1, 2, \dots, n$), where $\mu_1, \mu_2, \dots, \mu_s$ ($s < n$), $\sigma$ are assumed to be unknown and $\mu_i = 0$ ($s < i \leq n$). The hypothesis is $H: \mu_1 = \mu_2 = \dots = \mu_r = 0$ ($r \leq s$), and the alternative hypothesis is that at least one $\mu_i$, $1 \leq i \leq r$, does not vanish. The critical region

$$S = \left\{ (x_1, \dots, x_n) \;\middle|\; F(x_1, \dots, x_r; x_{s+1}, \dots, x_n) = \sum_{i=1}^{r} x_i^2 \Big/ \sum_{i=s+1}^{n} x_i^2 > c(\alpha) \right\}$$

is a UMP unbiased test for this problem, and $S$ is invariant under the group $g_1$ of translations

$$(x_1, \dots, x_r, x_{r+1}, \dots, x_s, x_{s+1}, \dots, x_n) \to (x_1, \dots, x_r, x_{r+1} + a_{r+1}, \dots, x_s + a_s, x_{s+1}, \dots, x_n),$$

the group $g_2$ of similarity transformations $(x_1, \dots, x_n) \to (cx_1, \dots, cx_n)$, the group $g_3 = O(r)$ of orthogonal transformations in $\mathbf{R}^r = \{(x_1, \dots, x_r)\}$, the group $g_4 = O(n - s)$ of orthogonal transformations in $\mathbf{R}^{n-s} = \{(x_{s+1}, \dots, x_n)\}$, and finite products of elements of the groups $g_1$, $g_2$, $g_3$, and $g_4$. This test is also a kind of F-test. More generally, let us denote $X = (X_1, X_2, \dots, X_n)'$ and assume that it is expressed as

$$X = A\xi + W, \quad W = (W_1, W_2, \dots, W_n)', \tag{1}$$

where $\xi = (\xi_1, \xi_2, \dots, \xi_s)'$, $s \leq n$, is a vector of unknown parameters and $A$ a matrix of known constants, and $W_1, W_2, \dots, W_n$ are distributed independently according to the normal distribution with mean 0 and variance $\sigma^2$. Then a general linear hypothesis is a hypothesis stating that the vector $\xi$ lies within a linear subspace $M$ of $\mathbf{R}^s$. The set of points $A\xi$ with $\xi$ satisfying a linear hypothesis $H$ is the linear subspace $L(B)$ of $L(A)$ spanned by the column vectors of an $n \times k_1$ matrix $B$. Assume, for example, that the dimension of $L(B)$ (= the rank of $B$) is $s - r$. Let $C$ be an $n \times k_2$ matrix whose column vectors span the orthocomplement $L^\perp(B)$ of the space $L(B)$ with respect to the space $L(A)$. Then the model (1) can be written as

$$X = B\eta + C\zeta + W, \quad E(W) = 0, \tag{2}$$

with a $k_1$-vector $\eta$ and a $k_2$-vector $\zeta$, and hence the hypothesis $H$ is represented by $\zeta = 0$. We denote by $Y = P_A X$ the projection of $X$ onto the space $L(A)$ and by $Z$ the projection $P_B X$ of $X$ onto the space $L(B)$. The quantity $Q_H = X'(P_A - P_B)X$ equals the square of the length of the vector $Y - Z$ and represents the sum of squares of residuals for the hypothesis $H$. The error mean square $\hat\sigma^2 = X'(I - P_A)X/(n - s) = Q_e/(n - s)$ and also $\hat\sigma_H^2 = Q_H/r$ under $H$ are unbiased estimators of $\sigma^2$.

Assume that $B'C = 0$. Let $U$ be an orthogonal transformation in $\mathbf{R}^n$ such that the first, ..., $r$th, $(s+1)$st, ..., $n$th rows of $UB$ and the $(s+1)$st, ..., $n$th rows of $UC$ are all equal to zero vectors. Using the notation $\tilde{X} = UX$, $\tilde\eta = UB\eta$, $\tilde\zeta = UC\zeta$, and $\tilde{W} = UW$, we obtain the canonical form $\tilde{X} = \tilde\eta + \tilde\zeta + \tilde{W}$ of the model (2). $\tilde{W}$ is also a vector of independently and identically distributed normal random variables. The hypothesis $H$ is expressed as $\tilde\zeta = 0$. In this model, we have $E(\tilde{X}_i) = 0$ for $i = s + 1, \dots, n$, and moreover, $E(\tilde{X}_i) = 0$ for $i = 1, \dots, r, s + 1, \dots, n$ if and only if $H$ is true.

The least squares estimator $Y$ of $A\xi$ is the maximum likelihood estimator, and $X - Y$, $Y - Z$, $Z$ are distributed independently according to the $n$-variate normal distributions $N(0, \sigma^2(I - P_A))$, $N((P_A - P_B)C\zeta, \sigma^2(P_A - P_B))$, and $N(B\eta + P_B C\zeta, \sigma^2 P_B)$, respectively. Hence $Q_e/\sigma^2$, $Q_H/\sigma^2$, and $X'P_B X/\sigma^2$ are distributed independently according to the noncentral $\chi^2$-distributions with $n - s$, $r$, and $s - r$ degrees of freedom and noncentrality parameters 0, $\zeta'C'(P_A - P_B)C\zeta/\sigma^2$, and $(B\eta + P_B C\zeta)'(B\eta + P_B C\zeta)/\sigma^2$, respectively. The likelihood ratio test of the hypothesis $H$ has a critical region $\hat\sigma_H^2/\hat\sigma^2 > c$, is a uniformly most powerful invariant test with respect to the group of linear transformations leaving the hypothesis $H$ invariant, and is the most stringent test. This test is also uniformly most powerful among the tests whose power function has a single variable $\zeta'C'(P_A - P_B)C\zeta/\sigma^2$. Furthermore, for $s - r = 1$, this test is a uniformly most powerful unbiased test. In the decomposition $X'X = X'(P_A - P_B)X + X'P_B X + X'(I - P_A)X = Q_H + Q_B + Q_e$, the terms $Q_H$ and $Q_e$ are called the sum of squares due to the hypothesis and due to the error, respectively. Such a process of decomposition is called the analysis of variance and its result is summarized in the analysis of variance table (Table 1).

I. The Likelihood Ratio Test

The likelihood ratio test is comparatively easy to construct. Let $L(x_1, \dots, x_n; \theta)$ be the likelihood function. Then

$$\lambda(x_1, \dots, x_n) = \frac{\sup_{\theta \in \omega_H} L(x_1, \dots, x_n; \theta)}{\sup_{\theta \in \omega_H \cup \omega_A} L(x_1, \dots, x_n; \theta)}$$

is called the likelihood ratio, and the test corresponding to the critical region $S = \{(x_1, \dots, x_n) \mid \lambda(x_1, \dots, x_n) < c_\alpha\}$ is called the likelihood ratio test, where $c_\alpha$ is a positive constant determined by the level $\alpha$. Let $\hat\theta_H(x_1, \dots, x_n)$ and $\hat\theta_{H \cup A}(x_1, \dots, x_n)$ be the maximum likelihood estimators for $\theta$ in $\omega_H$ and in $\omega_H \cup \omega_A$, respectively; that is, $L(x; \hat\theta_H(x)) = \sup_{\theta \in \omega_H} L(x; \theta)$ and $L(x; \hat\theta_{H \cup A}(x)) = \sup_{\theta \in \omega_H \cup \omega_A} L(x; \theta)$. Then

Table 1

Factor   Sum of Squares           Degrees of Freedom   Mean Square                     Ratio of Variances
H        $Q_H = X'(P_A - P_B)X$   $r$                  $\hat\sigma_H^2 = Q_H/r$        $\hat\sigma_H^2/\hat\sigma^2$
B        $Q_B = X'P_B X$          $s - r$              $Q_B/(s - r)$
Error    $Q_e = X'(I - P_A)X$     $n - s$              $\hat\sigma^2 = Q_e/(n - s)$
Total    $X'X$                    $n$
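The entries of Table 1 are easiest to follow in the canonical form of Section H, where the three sums of squares are sums of squared coordinates over the index blocks $1, \dots, r$; $r+1, \dots, s$; and $s+1, \dots, n$. The sketch below assumes the observation vector is already given in canonical coordinates (so no projection matrices are needed); the function name and the sample numbers are hypothetical.

```python
def anova_canonical(x, r, s):
    """Analysis of variance table for a canonical-form observation vector x.

    Coordinates 1..r carry the hypothesis, r+1..s the nuisance part B,
    and s+1..n the error (an assumption of this sketch).
    """
    n = len(x)
    if not 0 < r <= s < n:
        raise ValueError("need 0 < r <= s < n")
    q_h = sum(v * v for v in x[:r])    # sum of squares due to the hypothesis
    q_b = sum(v * v for v in x[r:s])   # sum of squares for factor B
    q_e = sum(v * v for v in x[s:])    # sum of squares due to the error
    mean_sq_h = q_h / r
    mean_sq_e = q_e / (n - s)
    return {
        "Q_H": q_h,
        "Q_B": q_b,
        "Q_e": q_e,
        "total": q_h + q_b + q_e,
        "df": (r, s - r, n - s),
        "F": mean_sq_h / mean_sq_e,    # ratio of variances in Table 1
    }


if __name__ == "__main__":
    table = anova_canonical([3, 4, 1, 2, 2, 1, 2], r=2, s=4)
    for key, value in table.items():
        print(key, value)
```

The three sums of squares add up to the total $X'X$, which is the decomposition that the table summarizes; the hypothesis is rejected when the ratio of variances exceeds a constant determined by the level.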

$$\lambda(x) = \frac{L(x; \hat\theta_H)}{L(x; \hat\theta_{H \cup A})}.$$

The F-test for a linear hypothesis is a likelihood ratio test, and other examples of the likelihood ratio test are shown in Appendix A, Table 23. However, the likelihood ratio test does not necessarily have the desirable properties stated in the preceding sections.

J. Complete Classes

The set of critical regions of the type $\{x \mid T(x) > c\}$ for the problems $H: \theta \leq \theta_0$ and $A: \theta > \theta_0$, where the distribution family $\mathcal{P}$ of the statistic $T$ is of Pólya type 2 in the strict sense, and the set of regions of the type $\{x \mid c < T(x) < d\}$ for the problem $H: \theta_1 \leq \theta \leq \theta_2$ and $A: \theta < \theta_1$ or $\theta_2 < \theta$, where the distribution family $\mathcal{P}$ of the statistic $T$ is of Pólya type 3 in the strict sense, are examples of minimal complete classes [6]. It has been proved under a mild condition that the set of all tests with convex critical regions is essentially complete when the underlying distributions are of exponential type and the null hypothesis is simple.

Let $P_{\theta_i} \in \mathcal{P}$ ($i = 0, 1$) and let $\mathfrak{B}_0$ be a $\sigma$-subalgebra of $\mathfrak{B}$. $\mathfrak{B}_0$ is sufficient for $\mathfrak{B}$ w.r.t. $\{P_{\theta_0}, P_{\theta_1}\}$ if and only if the class of all $\mathfrak{B}_0$-measurable test functions is essentially complete, i.e., iff for every critical region $B \in \mathfrak{B}$ there exists a $\mathfrak{B}_0$-measurable test function $\varphi_0$ such that $E_{\theta_0}(\varphi_0) \leq E_{\theta_0}(\chi_B)$ and $E_{\theta_1}(\varphi_0) \geq E_{\theta_1}(\chi_B)$. Assume that $\mathcal{P} = \{P_\theta \mid \theta \in \Omega\}$ is dominated; if for every $\mathfrak{B}$-measurable test function $\varphi$ there exists a $\mathfrak{B}_0$-measurable test function $\psi$ such that $E_\theta(\psi) = E_\theta(\varphi)$ for all $\theta \in \Omega$, then $\mathfrak{B}_0$ is sufficient for $\mathfrak{B}$ w.r.t. $\mathcal{P}$ [9, 10] (→ 398 Statistical Decision Functions D, 399 Statistical Estimation E).

K. Asymptotic Theory

Let $(\mathcal{X}_\nu, \mathfrak{B}_\nu, P_{\nu\theta})$ ($\nu = 1, 2, \dots, n$) be a sequence of probability spaces, where the parameter space $\Omega$ is common to all $\nu$. Let $(\mathcal{X}^{(n)}, \mathfrak{B}^{(n)}, P_\theta^{(n)})$ be the direct product probability space of $(\mathcal{X}_\nu, \mathfrak{B}_\nu, P_{\nu\theta})$ for $\nu = 1, 2, \dots, n$. For each sample space $(\mathcal{X}^{(n)}, \mathfrak{B}^{(n)}, P_\theta^{(n)})$, denote a test function for $H: \theta \in \omega_H$ ($\subset \Omega$) and $A: \theta \in \omega_A$ ($\subset \Omega - \omega_H$) by $\varphi_n(x_1, \dots, x_n)$. A sequence $\{\varphi_n\}$ ($n = 1, 2, \dots$) is often called a test. For example, a likelihood ratio test is frequently understood as a sequence of tests $S_n = \{(x_1, \dots, x_n) \mid \lambda_n(x_1, \dots, x_n) \leq \lambda_n\}$, where $\{\lambda_n\}$ is a sequence of constants and $\lambda_n(x_1, \dots, x_n)$ is the likelihood ratio defined by $(\mathcal{X}^{(n)}, \mathfrak{B}^{(n)}, P_\theta^{(n)})$ and $\omega_H$. If a test $\{\varphi_n\}$ satisfies $E_\theta(\varphi_n) \to 0$ ($\theta \in \omega_H$) and $E_\theta(\varphi_n) \to 1$ ($\theta \in \omega_A$) as $n \to \infty$, $\{\varphi_n\}$ is said to be a consistent test. If these convergences are uniform with respect to $\theta$, $\{\varphi_n\}$ is said to be a uniformly consistent test. When a uniformly consistent test exists, $\omega_H$ and $\omega_A$ are said to be finitely distinguishable. Suppose that the observed values are identically distributed (that is, $(\mathcal{X}_\nu, \mathfrak{B}_\nu, P_{\nu\theta})$ is a copy of a probability space $(\mathcal{X}, \mathfrak{B}, P_\theta)$) and $\omega_H$ and $\omega_A$ are both compact with respect to the metric $\delta(\theta, \theta') = \sup_{B \in \mathfrak{B}} |P_\theta(B) - P_{\theta'}(B)|$. In this case, $\omega_H$ and $\omega_A$ are finitely distinguishable if $E_\theta(\varphi)$ is a continuous function of $\theta$ for any $\varphi$ [4]. Kakutani's theorem (→ 398 Statistical Decision Functions) is regarded as a proposition concerning distinguishability when the null hypothesis and the alternative are both simple.

The following result about the limit distribution of a likelihood ratio is due to H. Chernoff [3]: Let $\mathcal{X}^{(n)}$ be an $n$-space and $\Omega$ be an open subset of $\mathbf{R}^k$ containing the origin 0. Suppose that the observed random variables are independent and distributed according to a density $f(x, \theta)$; that is, the likelihood function $L(x; \theta)$ is $\prod_{i=1}^n f(x_i, \theta)$. Moreover, assume the following regularity conditions:
(1) $\log f(x, \theta)$ is three-times differentiable with respect to $\theta$ at every point of the closure of some neighborhood $N$ of $\theta = 0$.
(2) There exist an integrable function $F$ and a measurable function $H$ such that (i) $|\partial f/\partial\theta_i| < F(x)$ for every $\theta \in N$; (ii) $|\partial^2 f/\partial\theta_i \partial\theta_j| < F(x)$ for every $\theta \in N$; (iii) $|\partial^3 \log f/\partial\theta_i \partial\theta_j \partial\theta_m| < H(x)$; and (iv) $\sup_\theta E_\theta(H(x)) < \infty$.
(3) For every $i, j = 1, 2, \dots, k$, we have $J_{ij}^\theta = E_\theta[(\partial \log f/\partial\theta_i)(\partial \log f/\partial\theta_j)] < \infty$, and the matrix $J_\theta = (J_{ij}^\theta)$ is positive definite for all $\theta \in N$.

Let $P(x_1, \dots, x_n; \omega) = \sup_{\theta \in \omega} L(x_1, \dots, x_n; \theta)$ for a subset $\omega$ of $\Omega$. Consider testing a hypothesis $\theta \in \omega_H$ against an alternative $\theta \in \omega_A$, where 0 is an accumulation point of $\omega_H$. If $\lambda^*(x_1, \dots, x_n) = P(x_1, \dots, x_n; \omega_H)/P(x_1, \dots, x_n; \omega_A)$, $\lambda^*$ plays essentially the same role as the likelihood ratio $\lambda$ and hence can be used in its place. We call a subset $C$ of $\Omega$ a cone if $\theta \in C$ implies $a\theta \in C$ for

any $a > 0$. A subset $\omega$ of $\Omega$ is said to be approximated by $C$ if $\omega$ satisfies $\inf_{x \in C} \|x - y\| = o(\|y\|)$ for all $y \in \omega$ and $\inf_{y \in \omega} \|x - y\| = o(\|x\|)$ for all $x \in C$ around the origin, where $\|x\|^2 = \sum_i x_i^2$. Suppose that $\omega_H$ and $\omega_A$ are approximated by two cones $C_H$ and $C_A$, respectively. Then, setting $z_n = \sqrt{n}\, J_0^{-1} A(x)$ with

$$A(x) = \left[ \frac{1}{n} \sum_{\alpha=1}^{n} \frac{\partial \log f(x_\alpha, \theta)}{\partial \theta} \right]_{\theta = 0},$$

the limit distribution of $-2 \log \lambda^*$, when $\theta = 0$, coincides with that of $\inf_{\theta \in C_H} (z_n - \theta)' J_0 (z_n - \theta) - \inf_{\theta \in C_A} (z_n - \theta)' J_0 (z_n - \theta)$. In particular, when $\Omega = R^k$ and $\omega_H = \{(\theta_1^0, \ldots, \theta_s^0, \theta_{s+1}, \ldots, \theta_k) \mid -\infty < \theta_i < \infty,\ i = s+1, \ldots, k\}$ and some regularity conditions are assumed, the limit distribution of $-2 \log \lambda^*$ is the $\chi^2$-distribution with $s$ degrees of freedom if the hypothesis is true.

The asymptotic behavior of the chi-square test of goodness of fit is also very important. Suppose that $(X_1, \ldots, X_k)$ has a multinomial distribution $n!\,(x_1! \cdots x_k!)^{-1} p_1^{x_1} \cdots p_k^{x_k}$ ($\sum_{i=1}^k x_i = n$, $x_i \ge 0$), and consider testing $H: p_1 = p_1^0, \ldots, p_k = p_k^0$. The chi-square test of goodness of fit has a critical region of the type $\{x \mid \chi^2(x_1, \ldots, x_k) > c\}$, where $\chi^2(x_1, \ldots, x_k)$ is $\sum_{i=1}^k (x_i - n p_i^0)^2 / (n p_i^0)$, i.e., the weighted sum of the squares of the differences between the value $p_i^0$ of $p_i$ and the maximum likelihood estimator $x_i/n$ of $p_i$. The limit distribution of $\chi^2(x_1, \ldots, x_k)$ when $p_i = p_i^0$, $i = 1, 2, \ldots, k$, is the chi-square distribution with $k - 1$ degrees of freedom.

Suppose moreover that $k$ functions $p_i(\theta)$ ($i = 1, \ldots, k$; $s < k$) of $\theta$ ($\in R^s$) are given and that the hypothesis to be tested is that the sample has been drawn from a population having a distribution determined by $H: p_i = p_i(\theta)$ ($i = 1, \ldots, k$; $s < k$) for some value of $\theta$. In this case the chi-square test of goodness of fit could be applied after replacing the parameter $\theta$ in $p_i = p_i(\theta)$ ($i = 1, 2, \ldots, k$) by the solutions $\hat{\theta}_n(x_1, \ldots, x_k)$ of the system of equations of the modified minimum chi-square method,

$$\sum_{i=1}^{k} \frac{x_i - n p_i}{p_i} \frac{\partial p_i}{\partial \theta_j} = 0 \quad (j = 1, \ldots, s).$$

Suppose that (1) $p_i(\theta) > c^2 > 0$ ($i = 1, \ldots, k$) and $\sum_{i=1}^k p_i(\theta) = 1$; (2) $p_i(\theta)$ is twice continuously differentiable with respect to the coordinates of $\theta$; and (3) the rank of the matrix $(\partial p_i / \partial \theta_j)$ is $s$. Then the system of equations above has a unique solution $\theta = \hat{\theta}_n(x_1, \ldots, x_k)$, and $\hat{\theta}_n$ converges in probability to $\theta_0$ when $\theta = \theta_0$. The asymptotic distribution of $\chi^2(x) = \sum_{i=1}^k (x_i - n p_i(\hat{\theta}_n))^2 / (n p_i(\hat{\theta}_n))$ is the chi-square distribution with $k - s - 1$ degrees of freedom [5]. For the test of goodness of fit, the empirical distribution function may also be used (→ 371 Robust and Nonparametric Methods).

A test of independence by contingency tables is one application of the chi-square test of goodness of fit. We suppose that $n$ individuals are classified according to two categories $A$ and $B$, where $A$ has $r$ ranks $A_1, A_2, \ldots, A_r$ and $B$ has $s$ ranks $B_1, B_2, \ldots, B_s$. Let $p_{i\cdot}$, $p_{\cdot j}$, $p_{ij}$ be the probabilities that the observed value of an individual belongs to $A_i$, $B_j$, $A_i \cap B_j$, respectively. Let $x_{i\cdot}$, $x_{\cdot j}$, $x_{ij}$ be the numbers of individuals belonging to $A_i$, $B_j$, and $A_i \cap B_j$, respectively. Table 2 is called a contingency table. To test the null hypothesis $H$ that the divisions of $A$ and $B$ into their ranks are independent, that is, $H: p_{ij} = p_{i\cdot} p_{\cdot j}$, the statistic $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{s} (x_{ij} - x_{i\cdot} x_{\cdot j}/n)^2 / (x_{i\cdot} x_{\cdot j}/n)$ is applied. When $H$ is true, $\chi^2$ is asymptotically distributed according to the chi-square distribution with $(r - 1)(s - 1)$ degrees of freedom as $n \to \infty$.

Likelihood ratio tests and chi-square tests of goodness of fit are consistent tests under conditions stated in their respective descriptions. In general, there are many consistent tests for a problem. Therefore it is necessary to consider another criterion that has to be satisfied by the best test among consistent tests. Pitman's asymptotic relative efficiency is such a criterion. Other notions of efficiency have also been introduced.

A completely specified form of distribution is rather exceptional in applications. More often we encounter cases where the distribution of the sample belongs to a large domain. Various tests independent of the functional form of distribution have been proposed, and the asymptotic theory plays an important role in those cases (→ 371 Robust and Nonparametric Methods).

The following concept of asymptotic efficiency is due to R. R. Bahadur [11]: Let $\{T_n\}$ be a sequence of real-valued statistics defined on $\mathscr{X}^{(n)}$. $\{T_n\}$ is said to be a standard sequence (for testing $H$) if the following three conditions are satisfied.

(I) There exists a continuous probability distribution function $F$ such that for each $\theta \in \omega_H$, $\lim_{n \to \infty} P_\theta^{(n)}\{T_n < t\} = F(t)$ for every $t \in R^1$.

Table 2. Contingency Table

         B_1    B_2    ...   B_s    Total
A_1      x_11   x_12   ...   x_1s   x_1.
A_2      x_21   x_22   ...   x_2s   x_2.
...      ...    ...    ...   ...    ...
A_r      x_r1   x_r2   ...   x_rs   x_r.
Total    x_.1   x_.2   ...   x_.s   n
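For a contingency table laid out as in Table 2, the statistic of the test of independence can be computed directly from the cell counts. The following is only a minimal sketch in Python: the function name `chi_square_independence` and the sample counts are hypothetical and chosen for illustration, and the sketch stops at the statistic and its degrees of freedom without looking up a critical value.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for an r x s table of counts x_ij.

    Hypothetical helper for illustration; every expected count is assumed
    to be positive.
    """
    r = len(table)
    s = len(table[0])
    row_totals = [sum(row) for row in table]                             # x_i.
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(s)]  # x_.j
    n = sum(row_totals)
    statistic = 0.0
    for i in range(r):
        for j in range(s):
            expected = row_totals[i] * col_totals[j] / n   # x_i. x_.j / n
            statistic += (table[i][j] - expected) ** 2 / expected
    return statistic, (r - 1) * (s - 1)


# Made-up counts, not taken from any data set.
counts = [[10, 20], [20, 10]]
stat, dof = chi_square_independence(counts)
print(stat, dof)
```

With these made-up counts every expected cell count is 15, so the statistic is about 6.67 on 1 degree of freedom; under H it would be compared with an upper percentage point of the chi-square distribution with (r - 1)(s - 1) degrees of freedom.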

(II) There exists a constant $a$, $0 < a < \infty$, such that $\log\{1 - F(t)\} = -(a t^2/2)\{1 + o(1)\}$ as $t \to \infty$.

(III) There exists a function $b(\theta)$ on $\Omega - \omega_H$, with $0 < b(\theta) < \infty$ such that for each $\theta \in \Omega - \omega_H$,

$$\lim_{n \to \infty} P_\theta^{(n)}\{ |(T_n/n^{1/2}) - b(\theta)| > t \} = 0$$

for every $t > 0$.

Suppose that $\{T_n\}$ is a standard sequence. Then $T_n$ has the asymptotic distribution $F$ if $H$ is satisfied, but otherwise $T_n \to \infty$ in probability. Consequently, large values of $T_n$ are significant when $T_n$ is regarded as a test statistic for $H$. Accordingly, for any given $x \in \mathscr{X}^{(n)}$, $1 - F(T_n(x))$ is called the critical level in terms of $T_n$, and is regarded as a random variable defined on $\mathscr{X}^{(n)}$ [1]. It is convenient to describe the behavior of this random variable as $n \to \infty$ in terms of $K_n$, where $K_n(x) = -2 \log[1 - F(T_n(x))]$. Then for each $\theta \in \omega_H$, $K_n$ is asymptotically distributed as a chi-square variable $\chi_2^2$ with 2 degrees of freedom and for $\theta \in \Omega - \omega_H$, $K_n/n \to a b^2(\theta)$ in probability as $n \to \infty$. The asymptotic slope of the test based on $\{T_n\}$ (or simply the slope of $\{T_n\}$) is defined to be $c(\theta) = a b^2(\theta)$. Note that the statistic $K_n^{1/2}$ is equivalent to $T_n$ in the following technical sense: (i) $\{K_n^{1/2}\}$ is a standard sequence; (ii) for each $\theta \in \Omega$, the slope of $\{K_n^{1/2}\}$ equals that of $\{T_n\}$; and (iii) for any given $n$ and $x$, the critical level in terms of $K_n^{1/2}$ equals the critical level in terms of $T_n$. Since the critical level of $K_n^{1/2}$ is found by substituting $K_n^{1/2}$ into the function representing the upper tail of a fixed distribution independent of $F$, $\{K_n^{1/2}\}$ is a normalized version of $\{T_n\}$. Suppose that $\{T_n^{(1)}\}$ and $\{T_n^{(2)}\}$ are two standard sequences defined on $\mathscr{X}^{(n)}$, and let $F^{(i)}(x)$, $a_i$, and $b_i(\theta)$ be the functions and constants prescribed by conditions (I)-(III) for $i = 1, 2$. Consider an arbitrary but fixed $\theta$ in $\Omega - \omega_H$, and suppose that $x$ is distributed according to $P_\theta$. The asymptotic efficiency of $\{T_n^{(1)}\}$ relative to $\{T_n^{(2)}\}$ is defined to be $\varphi_{12}(\theta) = c_1(\theta)/c_2(\theta)$, where $c_i(\theta) = a_i b_i^2(\theta)$ is the slope of $\{T_n^{(i)}\}$, $i = 1, 2$. The asymptotic efficiency is called Bahadur efficiency.

Several comparisons of standard sequences are given in [11]. The relationship between Bahadur efficiency and Pitman efficiency for hypothesis-testing problems has also been studied. Under suitable conditions the two efficiencies coincide.

L. Sequential Tests

Let $X_1, X_2, \ldots$ be a given sequence of random variables. To test a hypothesis concerning the distributions of these variables (sample sizes are not predetermined), we observe first $X_1$, then $X_2$, etc. At each stage a decision is made on the basis of the previously obtained data whether the observation should be stopped and a judgment made on the acceptability of the hypothesis. Such a test is called a sequential test. Let $X_1, X_2, \ldots$ be independent and identically distributed by $f_\theta(x)$. For testing a simple hypothesis $H: \theta = 0$ against a simple alternative $A: \theta = 1$, we have the sequential probability ratio test: Let $G_n(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f_1(x_i) / \prod_{i=1}^{n} f_0(x_i)$, and preassign two constants $a_0 < a_1$. After the observations of $X_1, \ldots, X_n$ are performed, the next random variable $X_{n+1}$ is observed if $a_0 < G_n(x_1, \ldots, x_n) < a_1$. Otherwise the experiment is stopped, and we accept $H$ when $G_n \le a_0$ or accept $A$ when $a_1 \le G_n$. The constants $a_0$ and $a_1$ are determined by the desired probabilities $\alpha_1$ and $\alpha_2$ of errors of the first and second kind, respectively. It is known that, among the class of sequential tests in which the probability of error of the first (second) kind is not greater than $\alpha_1$ ($\alpha_2$), the sequential probability ratio test minimizes the expected number of observations when either $H$ or $A$ is true (→ 398 Statistical Decision Functions; 404 Statistical Quality Control).

References

[1] E. L. Lehmann, Testing statistical hypotheses, Wiley, 1959.
[2] A. Wald, Sequential analysis, Wiley, 1947.
[3] H. Chernoff, On the distribution of the likelihood ratio, Ann. Math. Statist., 25 (1954), 573-578.
[4] W. Hoeffding and J. Wolfowitz, Distinguishability of sets of distributions, Ann. Math. Statist., 29 (1958), 700-718.
[5] M. G. Kendall and A. Stuart, The advanced theory of statistics II, Griffin, 1961.
[6] S. Karlin, Decision theory for Pólya type distributions, Case of two actions I, Proc. 3rd Berkeley Symp. Math. Stat. Prob. I, Univ. of California Press (1956), 115-128.
[7] A. Birnbaum, Characterizations of complete classes of tests of some multiparametric hypotheses, with applications to likelihood ratio tests, Ann. Math. Statist., 26 (1955), 21-36.
[8] H. Cramér, Mathematical methods of statistics, Princeton Univ. Press, 1946.
[9] K. Takeuchi and M. Akahira, Characterizations of prediction sufficiency (adequacy) in terms of risk functions, Ann. Statist., 3 (1975), 1018-1024.
[10] J. Pfanzagl, A characterization of sufficiency by power functions, Metrika, 21 (1974), 197-199.

[11] R. R. Bahadur, Stochastic comparison of tests, Ann. Math. Statist., 31 (1960), 276-295.

401 (XVIII.2)
Statistical Inference

A. The Statistical Model

Broadly and loosely speaking, the term “statistical inference” may imply any procedure for drawing conclusions from statistical data. But now it is usually understood more rigorously to mean those procedures based upon a probabilistic model of the data to obtain conclusions concerning the unknown parameters of the population that represents the probabilistic model by viewing the observed data as a random sample extracted from the population.

As the simplest example, suppose that, for some system, we have a number of observations from repeated measurements or experiments under a supposedly uniform condition. If we can assume that there are no systematic trends or tendencies involved, we can suppose that the variations among repeated observations are due to random causes and assume that the observed values $X_1, X_2, \ldots$ are independently and identically distributed random variables. Our purpose in making observations is to draw some information from the data, that is, to make some judgment on an unknown system quantity $\theta$, which together with some other quantity (quantities) $\eta$ characterizing the measurement or the experiment, determines the distribution of the $X_i$. We assume that the distribution has a density function $f(x; \theta, \eta)$. This amounts to assuming that the observed values $X_1, X_2, \ldots$ are a sample randomly drawn from a hypothetical population of the results of the measurements or experiments supposedly continued indefinitely. Then the problem of statistical inference is one of making some judgment based on the random sample. The set of hypotheses postulating the distribution of the observed values is called the probabilistic model of the observations, and the problem of determining a model in a specific situation is called that of specification.

B. Bayesian and Non-Bayesian Approaches

There are two different ways to make inferences on the population parameters: the Bayesian approach and the non-Bayesian approach. In the Bayesian approach it is assumed that we have some probability density $\pi(\theta, \eta)$ for the parameters $\theta$ and $\eta$. Then, given the observations $X_1 = x_1, X_2 = x_2, \ldots$, the conditional probability density for $\theta$ and $\eta$ is given by

$$p(\theta, \eta) = \frac{\pi(\theta, \eta) \prod_i f(x_i; \theta, \eta)}{\iint \pi(\theta, \eta) \prod_i f(x_i; \theta, \eta)\, d\theta\, d\eta}.$$

$\pi$ is called the prior density and $p$ the posterior density for the parameters. Then all the information obtained from the sample is considered to be contained in the posterior distribution with the density $p(\theta, \eta)$, and conclusions on the parameters can be drawn from it.

The prior density $\pi(\theta, \eta)$ does not necessarily represent a frequency function of a population of which the parameters are a random sample, but in most cases treated by the Bayesian approach it is considered to be a summary of the statistician's judgment over relative possibilities of the different values of the parameters based on all the information obtained before the observations are made. Bayesians claim that it is always possible to determine such a prior distribution in a coherent way, specifying the subjective probability, representing a person's judgment under uncertainty, as opposed to the objective probability, representing the relative frequencies in a population. L. J. Savage [7] succeeded in developing a formal mathematical theory of the subjective probability from a set of postulates about the consistent behavior of a person under uncertainty.

The non-Bayesian statisticians, however, do not accept the Bayesians' viewpoint and insist that statistical inference should be free from any subjective judgment and be based solely on the objective properties of the sample derived from the assumed model. The theory developed by R. A. Fisher, J. Neyman, E. S. Pearson, and others is based on the non-Bayesian approach.

C. Problems of Non-Bayesian Inference

The most commonly used forms of statistical inference are point estimation, used when we want to get a value as the estimate for the parameter; interval estimation, when we want to get an interval that contains the true value of the parameter with a probability not smaller than the preassigned level; and hypothesis testing, when it is required to determine whether or not some hypothesis about the parameter values is wrong (→ 399 Statistical Estimation, 400 Statistical Hypothesis Testing).

In any type of statistical inference, the problem can be abstractly formulated by determining a procedure that defines a rule, based on the sample observed, for choosing an element from the set of possible conclusions.
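As a concrete illustration of such rules, the three forms of inference named above can each be written as a function that maps an observed sample to a conclusion. The sketch below is not a recommended procedure: it assumes a large sample whose mean is approximately normally distributed, a nominal level of 95 percent, and hypothetical function names (`point_estimate`, `interval_estimate`, `hypothesis_test`) introduced only for this example.

```python
import math
from statistics import mean, stdev

Z_975 = 1.959964  # upper 2.5% point of N(0, 1); assumed nominal level of 95%


def point_estimate(sample):
    """Point estimation: the conclusion is a single value for the mean."""
    return mean(sample)


def interval_estimate(sample, z=Z_975):
    """Interval estimation: the conclusion is an interval for the mean."""
    half_width = z * stdev(sample) / math.sqrt(len(sample))
    centre = mean(sample)
    return (centre - half_width, centre + half_width)


def hypothesis_test(sample, mu0, z=Z_975):
    """Hypothesis testing: the conclusion is 'accept' or 'reject' for H: mean = mu0."""
    t = math.sqrt(len(sample)) * (mean(sample) - mu0) / stdev(sample)
    return "reject" if abs(t) > z else "accept"


# Made-up observations, used only to show the three rules side by side.
observations = [1.0, 2.0, 3.0, 4.0, 5.0]
print(point_estimate(observations))
print(interval_estimate(observations))
print(hypothesis_test(observations, mu0=3.0))
```

Each function is a rule in the sense just described: the set of possible conclusions is the real line for the first, the set of intervals for the second, and the two-element set {accept, reject} for the third.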

Such a procedure is evaluated by the probabilistic properties derived under different values of the parameter from the distribution of the sample, and it is usually required to satisfy some type of validity criteria (such as unbiasedness of an estimator, size of a test, etc.), and among those satisfying them, one which is considered to be best according to some optimality criterion (such as minimum variance or most-powerfulness) is looked for. But in the sense of objective probability, the probabilistic property of a procedure is relevant only for the frequencies in repeated trials when the same procedure is applied to a sequence of samples obtained from the population and has no direct implications for the conclusion obtained by applying the procedure to a specific sample we have in hand. For this reason Neyman argued that in statistical inference there is really no such thing as inductive inference but only inductive behavior. Fisher disagreed strongly with this argument and emphasized that statistical analysis is induction and that its purpose is to allow us to draw the proper conclusions from a particular sample and that the probabilistic properties of the procedure should and could have relevance for a particular conclusion obtained from a specific sample, provided that all the information contained in the sample is used. The arguments between Fisher and Neyman led to a heated controversy between their followers that is still not completely settled. Fisher's arguments lead to the principle of sufficiency and the principle of conditionality. The principle of sufficiency dictates that all inferences should be based on a sufficient statistic if there is one, and the principle of conditionality requires that any inference should be based on the conditional distribution given the ancillary statistic, i.e., a statistic whose distribution is independent of the parameter, if there is such a statistic. These two principles are accepted by many statisticians who do not follow all of Fisher's arguments, though the principle of conditionality sometimes leads to difficulties due to nonuniqueness of the ancillary.

D. Specification Problem

It is often difficult and sometimes impossible to have an exactly correct model for the data, and we must be satisfied with a model that gives a sufficiently close approximation and is mathematically tractable as well. It may also happen, however, that a model first specified may be far from reality and could lead to erroneous conclusions if relied on blindly. Here, the problem of model selection arises (→ 403 Statistical Models), i.e., choosing the best of various possible models.

We may also seek procedures that are little affected by the departure of the distribution of the data from the assumed or some other model that satisfies the condition of validity without any assumption about the exact shape of the distributions (→ 371 Robust and Nonparametric Methods). Generally, the problem of determining the model or specification should not be dealt with by mathematical methods alone, and it should be considered by taking into account the properties and nature of the subject under consideration and also the process of measurement or experimentation.

E. History

The first appearance of statistical inference as a method of grasping numerical characteristics of a collective was seen in the study by J. Graunt (1662) of the number of people who died in London. W. Petty applied Graunt's method further to the comparison of communities in his Political arithmetic (1690). J. P. Süssmilch, a member of the Graunt school, perceived the regularity in mass observations and stressed the statistical importance of this regularity. The development of the theory of probability inevitably affected the theory of statistical inference. The method of T. Bayes was the first procedure of statistical inference in the current meaning of this expression. We now have a theorem bearing his name (the Bayes theorem), which is stated in current language as follows: If we know the probability $P_C(E)$ that a cause $C$ produces an effect $E$ and if the prior (or a priori) probability $P(C)$ of the existence of the cause $C$ is also known, then the posterior (or a posteriori) probability of $C$, given an effect $E$, is equal to

$$P_E(C) = \frac{P(C) P_C(E)}{\sum_C P(C) P_C(E)}$$

(→ 342 Probability Theory F). This theorem, easily extendable to the continuous case, suggests the following inference procedure: If we are informed that an effect $E$ has taken place, then we calculate the probabilities $P_E(C)$ for every cause $C$, compare them, and infer that the $C^*$ with $P_E(C^*) = \max_C P_E(C)$ is the most probable cause of $E$.

Both P. S. Laplace and C. F. Gauss discussed the theory of estimation of parameters (→ 399 Statistical Estimation) as an application of the Bayes theorem. In his research, Laplace considered a monotone function $W(|t - \theta|)$, and $W = |t - \theta|$ in particular, of the distance $|t - \theta|$ between a parameter value $\theta$ and its

estimate $t$ as a measure of significance of the error of the estimate $t$. Gauss, following Laplace, used this weight function $W(|t - \theta|)$ of error, and going beyond Laplace, realized that it would be mathematically fruitful to put $W(|t - \theta|) = (t - \theta)^2$. Such considerations led him to the study of the least squares method, in which the terminology and notation he devised are still in use. He also developed the theory of errors and recognized the importance of the normal distribution and found that the least squares estimate is equal to the most probable value if the errors are normally distributed.

F. Galton, a biologist, revealed the usefulness of statistical methods in biological research and explored what we call regression analysis (→ 403 Statistical Models) by introducing the concepts of regression line and correlation coefficient. His research on regression analysis originated from the study of the correlation between characteristics of parents and children, but he failed to realize the difference between population characteristics and sample characteristics.

Following Galton, K. Pearson developed the theory of regression and correlation, with which he succeeded in establishing the basis of biometrics (→ 40 Biometrics). He arrived at the concept of population in statistics: A statistical population is a collective consisting of observable individuals, while a sample is a set of individuals drawn out of the population and containing something telling us about characteristics of the population. Thus statistical research is regarded as investigation that focuses not on a sample as such but on a population from which the sample has been drawn. This consideration raised the problem of the goodness-of-fit test (→ 400 Statistical Hypothesis Testing), that is, the problem of knowing whether a sample is likely to have been drawn from a population whose distribution was determined by theoretical considerations. K. Pearson characterized some population distributions occurring in practice by a differential equation, and classified them into several types. Using this classification, he discussed goodness-of-fit tests and developed the $\chi^2$-distribution (chi-square distribution) in relation to the problem of testing hypotheses.

Statisticians in the time of K. Pearson thought of a population as a collective having infinitely many individuals (i.e., an infinite population), which led to the idea that the larger the size of a sample (i.e., the number of individuals in the sample), the more precisely could the sample give information about the population. They carried out inferences, including the testing of hypotheses (→ 400 Statistical Hypothesis Testing), by approximate methods, which later came to be termed large sample theory. Suppose, for instance, that $\{X_1, \ldots, X_n\}$ is an independent sample of size $n$ from a normal population $N(\mu, \sigma^2)$. The random variable $Z = \sqrt{n}(\bar{X} - \mu_0)/\sigma$ with $\bar{X} = \sum_i X_i/n$ is distributed according to $N(0, 1)$ when $\mu = \mu_0$. Therefore, if the size $n$ is sufficiently large ($n \to \infty$), we estimate $\sigma$ by $\hat{\sigma} = \{\sum_i (X_i - \bar{X})^2/(n - 1)\}^{1/2}$ and deal with the random variable $T = \sqrt{n}(\bar{X} - \mu_0)/\hat{\sigma}$ obtained by inserting $\hat{\sigma}$ in place of $\sigma$ in the expression for $Z$, as if $T$ itself were distributed according to $N(0, 1)$.

F. Development in the 20th Century

W. S. Gosset, writing under the pen name “Student,” reported in 1908 the discovery of the exact distribution of $T$ and thereby opened the new epoch of exact sampling theory (→ 374 Sampling Distributions). This work of Student made it possible to perform statistical inference by means of small samples and consequently changed statistical research from the study of collectives to that of uncertain phenomena; in other words, the concept of population was once again related to a probability space with a probability distribution (i.e., a population distribution) containing unknown parameters. Thus it began to be emphasized that a sample has to be drawn at random (i.e., a random sample) from the population if we are to make an inference about a parameter based on the sample.

Fisher presented a complete derivation, using the multiple integration method, of the $t$-distribution (the sampling distribution of $T$). In addition, Fisher introduced the concepts of null hypothesis and significance test, which were the starting points for later progress in the theory of hypothesis testing. He also added the concepts of consistency, efficiency, and sufficiency to the list of possible properties of estimators, and he studied the connection between the information contained in a sample and the accuracy of an estimator, which led to the idea of amount of information. Fisher also proposed the maximum likelihood estimator, which is formally equivalent to the most probable value, but he renamed it and gave it a foundation completely independent of any prior information and showed that it leads to an at least asymptotically efficient estimator.

Fisher made efforts to obtain a distribution of the parameter directly from the sample observation, hence independently of the concept of prior probability. He sought in this way to be released from the weakness of the Bayes method. For this purpose he introduced

the concept of fiducial distribution, which was the subject of bitter controversy in the period that followed. As an example of a fiducial distribution, we consider here the Behrens-Fisher problem: Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be samples drawn independently from the populations $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively, where the parameters $\mu_1$, $\mu_2$, $\sigma_1$, and $\sigma_2$ are all unknown. The problem raised is to test the hypothesis $\mu_1 = \mu_2$ or to estimate $\delta = \mu_1 - \mu_2$ by an interval. To solve this problem, we put

$$\bar{X} = \sum_i X_i/m, \qquad \bar{Y} = \sum_j Y_j/n,$$

$$S_1^2 = \sum_i (X_i - \bar{X})^2/(m - 1), \qquad S_2^2 = \sum_j (Y_j - \bar{Y})^2/(n - 1),$$

and learn that $T_1 = \sqrt{m}(\bar{X} - \mu_1)/S_1$ and $T_2 = \sqrt{n}(\bar{Y} - \mu_2)/S_2$ are mutually independent and distributed according to the $t$-distribution with degrees of freedom $m - 1$ and $n - 1$, respectively. From this fact Fisher reasoned as follows: Given observed values $\bar{x}$, $\bar{y}$, $s_1$, $s_2$ of the variables $\bar{X}$, $\bar{Y}$, $S_1$, $S_2$, the distributions of the parameters $\mu_1$ and $\mu_2$ are induced from the distributions of $T_1$ and $T_2$ by means of transformations

$$\mu_1 = \bar{x} - T_1 s_1/\sqrt{m}, \qquad \mu_2 = \bar{y} - T_2 s_2/\sqrt{n}.$$

Consequently the distribution of $\delta = \bar{x} - \bar{y} - (T_1 s_1/\sqrt{m} - T_2 s_2/\sqrt{n})$ is obtained. These distributions are called the fiducial distributions of the parameters $\mu_1$, $\mu_2$, and $\delta$. The interval $|\delta - (\bar{x} - \bar{y})| < c$ of $\delta$ deduced from the fiducial distribution of $\delta$ is called a fiducial interval of $\delta$.

Neyman and E. S. Pearson developed a mathematical theory of testing hypotheses, in which they deliberately defined a family of population distributions admissible for formal treatment and considered alternative hypotheses within the family. They proposed to relate a test to its power function, on the basis of which the test would be judged. Their ideas brought mathematical clarity to the theory of inference. Furthermore, concerning interval estimation, Neyman devised an alternative to the fiducial interval, the confidence interval, which has full mathematical justification. Unfortunately it was later found that the confidence interval, fiducial interval, and the Bayes posterior interval based on the posterior distribution often gave distinctly different results to the same problem, which became a source of controversy among different schools of thought.

Since the publication of A. Wald's theory of statistical decision functions (→ 398 Statistical Decision Functions) in 1939, there has been a steady increase in its importance. In this theory the totality $\mathscr{D}$ of available statistical procedures, which is considered implicitly in the Neyman-Pearson theory, is put forth explicitly as a set and defined as the space of decision functions. Wald also defined the risk function of a statistical decision procedure and used it as a basis for judging procedures. In addition, he employed the concept of prior probability and the Bayes procedure for the purpose of proving the complete class theorem. Wald's idea of bringing the concept of prior probability back into statistical theory carried a great deal of weight, and much literature has now been accumulated on this subject. Prior probability as a technique in statistics was abandoned after Fisher's introduction of the maximum likelihood method independent of prior probability and Neyman's assertion that a probability distribution on the parameter space made no sense. In addition, Wald linked statistical inference to games (→ 173 Game Theory) and introduced the minimax principle into statistics. The decision-theoretic setup also enabled him to develop a theory of sequential analysis by comparing the cost of sampling with the risk of erroneous decisions (→ 400 Statistical Hypothesis Testing).

After the publication of Savage's book in 1954, there was a revival of the Bayesian approach, i.e., one based on the concept of subjective probability, and now the group of those statisticians who accept the Bayesian approach are called Bayesians or neo-Bayesians.

G. Applications

Methods of statistical inference are applied in many fields where statistical data are used for scientific, engineering, medical, or managerial purposes. Methods of producing data that are appropriate for statistical inference have also been developed. R. A. Fisher developed the method of statistical design of experiments (→ 102 Design of Experiments) that, when it is impossible or impractical to eliminate experimental errors or variabilities completely, provides procedures for obtaining such data. These data, though subject to random errors, are susceptible to rigorous statistical inference. For this purpose Fisher introduced the principles of randomization, local control, and replication in the design of experiments. W. A. Shewhart defined the state of statistical control in mass-production processes, where the variabilities of the products can be considered to be due to chance causes alone and hence are statistically analyzable.

Applying the idea of statistical inference to this situation, Shewhart established the method of statistical quality control (→ 404 Statistical Quality Control). Neyman introduced the method of random sampling into statistical surveys and developed the theory of estimation and allocation based on the theory of statistical inference (→ 373 Sample Survey).

In many applied fields there exist systems of statistical methods which have been developed specifically for the respective fields, and although all of them are based essentially on the same general principles of statistical inference, each has its own special techniques and procedures. Specific names have been invented, such as biometrics (→ 40 Biometrics), econometrics (→ 128 Econometrics), psychometrics (→ 346 Psychometrics), technometrics, sociometrics, etc.

References

[1] R. A. Fisher, Statistical methods for research workers, Oliver & Boyd, 1925; eleventh edition, 1950.
[2] R. A. Fisher, The design of experiments, Oliver & Boyd, 1935; fourth edition, 1947.
[3] R. A. Fisher, Contributions to mathematical statistics, Wiley, 1950.
[4] R. A. Fisher, Statistical methods and scientific inference, Oliver & Boyd, 1956.
[5] J. Neyman and E. S. Pearson, Contributions to the theory of testing statistical hypotheses I, II, Statist. Res. Mem. London Univ., 1 (1936), 1-37; 2 (1938), 25-57.
[6] A. Wald, Statistical decision functions, Wiley, 1950 (Chelsea, 1971).
[7] L. J. Savage, The foundation of statistical inference, Methuen, 1962.
[8] C. R. Rao, Linear statistical inference and its applications, Wiley, 1965.
[9] D. A. Fraser, The structure of inference, Wiley, 1968.
[10] D. V. Lindley, Introduction to probability and statistics from a Bayesian viewpoint I, II, Cambridge Univ. Press, 1965.

402 (XX.19)
Statistical Mechanics

A. General Remarks

One cubic centimeter of water contains about $3 \times 10^{22}$ water molecules. A macroscopic system of matter thus consists of an enormous number of particles incessantly moving in accordance with the laws of dynamics (→ 271 Mechanics; 351 Quantum Mechanics). Dynamical description of such microscopic motion in full detail is impossible and even meaningless. A physical process in thermodynamics or hydrodynamics is described in terms of a relatively small number of macroscopic variables, such as temperature, pressure, and a velocity field. Such a process shows a remarkable simplicity which is a statistical result of the molecular chaos. This is the reason why statistical mechanics is needed as a theoretical model to unify microscopic dynamics and probability theory. Thus statistical mechanics aims at deriving physical laws in the macroscopic world from the atomistic structures of the microscopic world on the basis of microscopic dynamical laws and probabilistic laws. Its function is twofold. First, statistical mechanics should give microscopic proofs of the macroscopic laws of physics, such as those of thermodynamics or the laws of macroscopic electromagnetism. Second, it should also provide us with detailed knowledge of physical properties of a given material system once its microscopic structure is known. In this sense, statistical mechanics is an essential basis of the modern science of materials.

Strictly speaking, the dynamics of the microscopic world obeys quantum mechanics. However, even before the birth of quantum mechanics, statistical mechanics had progressed on the basis of classical mechanics. This stage of statistical mechanics is often called classical statistical mechanics, in contrast to quantum statistical mechanics based on quantum mechanics. Statistical mechanics has a fully developed formalism to apply to physical systems in thermal equilibrium. This is sometimes called statistical thermodynamics or equilibrium statistical mechanics. Until the 1950s the term “statistical mechanics” had often been used in this narrow sense. In a wider sense it is concerned with systems in more general states, for instance, in nonequilibrium states. In the modern literature, a general statistical mechanical theory of nonequilibrium systems is often referred to as the statistical mechanics of irreversible processes.

B. History

The early stage of statistical mechanics can be traced back to the kinetic theory of gases, which started in the 18th century. In dilute gases, gas molecules fly freely through the whole volume of the vessel and collide only from time to time. In thermal equilibrium, the average energy of each molecule is determined by the temperature of the gas; namely

$$m\overline{v_x^2}/2 = m\overline{v_y^2}/2 = m\overline{v_z^2}/2 = kT/2,$$

where $(v_x, v_y, v_z)$ is the velocity, an overbar means the average, $m$ is the mass of a molecule, $T$ is the absolute temperature, and $k$ is the Boltzmann constant ($= 1.38 \times 10^{-16}$ erg·deg$^{-1}$). The velocity of each molecule is only probabilistic, and a †distribution function $f(v_x, v_y, v_z)$ is defined as the †probability density that the velocity of a given molecule is found to be in the neighborhood of $(v_x, v_y, v_z)$. In a dilute gas, this is given by

$$f(v_x, v_y, v_z) = C\exp\left[-\frac{m}{2}(v_x^2 + v_y^2 + v_z^2)/kT\right], \tag{1}$$

the Maxwell-Boltzmann distribution law.

L. Boltzmann viewed the velocity distribution function as changing in time as a result of molecular collisions and gave an equation of the form

$$\frac{\partial f}{\partial t} = A[f] + \Gamma[f], \tag{2}$$

where $A[f]$ is the change of the distribution function $f$ by acceleration due to the presence of external forces and $\Gamma[f]$ is the change caused by molecular collisions. $\Gamma[f]$ is an integral which is nonlinear in $f$. This type of equation is called a Boltzmann equation [1, 2]. Boltzmann introduced the H-function by the definition

$$H = \iiint f\log f\,dv_x\,dv_y\,dv_z \tag{3}$$

and proved on the basis of equation (2) that $dH/dt \le 0$. This theorem is known as the H-theorem [1-4]. The equilibrium distribution (1) is therefore obtained from equation (2) as the solution that makes $H$ a minimum. In fact the H-function is related to the entropy $S$ by

$$S = -kH. \tag{4}$$

Boltzmann further showed (1877) that the distribution function of a system in thermal equilibrium can be obtained on more general grounds without relying on a kinetic equation of the type (2) and that the statistical mechanics of systems in equilibrium can thus be constructed on a basis much more general than that given by a kinetic theory. It was W. Gibbs, however, who clearly established (1902) the complete framework of statistical thermodynamics, although he had to confine himself to classical statistical mechanics [5].

C. The Ergodic Hypothesis

For a given dynamical system with $n$ †degrees of freedom, the phase space is defined as a $2n$-dimensional space with †generalized coordinates $q_1, \ldots, q_n$ and †generalized momenta $p_1, \ldots, p_n$. Dynamical states of the system constitute a set of points in this space. At a given time, the state of the system is represented by a point $P$ in the phase space, and hence the motion of the system is represented by the motion of $P$. If the system is conservative, its energy function is constant. Let $\mathscr{H}$ be the †Hamiltonian function. Then the motion of $P$ is confined to an energy surface defined by the condition $\mathscr{H} = E = \text{constant}$. Measure on an energy surface is defined as the limit of the volume element lying between two neighboring energy surfaces corresponding to the energies $E$ and $E + dE$. The motions of $P$ form a †topological group that makes this measure invariant (†Liouville's theorem).

A dynamical quantity $A(p, q)$ of the system changes its value as the phase point $P$ moves on the energy surface. The time average $\bar{A}$ of $A$ is identified with the value of $A$ observed in the equilibrium state of the system, namely, the average of $A$ with respect to the invariant measure. Boltzmann justified this assumption by the following reasoning. If the energy surface has a finite measure and the trajectory of $P$ does not make a closed curve on the energy surface, it can be assumed that the trajectory will move around practically everywhere on the surface. Mathematically formulated, the only measurable subset of the surface that has a nonzero measure and is invariant under the motion is the whole surface. This assumption is the ergodic hypothesis [6-9]. The long-time average of $A$ will then equal the average of $A$ over the entire energy surface with weight function equal to the measure previously introduced. The latter average is called the phase average and is denoted by $\langle A\rangle$. Boltzmann thus asserted that

$$\bar{A} = \langle A\rangle. \tag{5}$$

Efforts of mathematicians to study the ergodic hypothesis created an important branch of mathematics called ergodic theory (→ 136 Ergodic Theory).

D. Ensembles in Classical Statistical Mechanics

Once we admit the ergodic hypothesis, or more specifically the assumption (5), the calculation of the observed value of a physical quantity $A$ for a system in equilibrium is reduced to finding the phase average of $A$ on an energy surface. The task of statistical mechanics of systems in equilibrium is thus reduced essentially to calculating phase averages and establishing relationships between them [10-13].
nates q i , , q, and tgenerahzed momenta For a set (called an ensemble in this case) of

identical systems with the same energy, we consider the phase average for the †probability space with the measure mentioned in Section C on the energy surface corresponding to the given energy value. Gibbs called a probability space of this kind a microcanonical ensemble. An average in this probability space is defined by

$$\langle A\rangle = \int A\,\frac{dS}{|\operatorname{grad}\mathscr{H}|}\Big/\int \frac{dS}{|\operatorname{grad}\mathscr{H}|}, \tag{6}$$

where $\mathscr{H}$ is the Hamiltonian function, $\operatorname{grad}\mathscr{H}$ is its gradient in the $2n$-dimensional phase space, and the integration is carried over the energy surface with $dS$ as surface element.

When the observed system is in mechanical contact with a heat reservoir, the composite system consisting of the system and the heat reservoir is regarded as an isolated system with constant energy. Then an ensemble of the composite systems is treated as a microcanonical ensemble. It is more convenient and more physical, however, to consider the heat reservoir simply as providing an environment characterized by its temperature $T$, and to concentrate only on the system in which we are interested. Then the system is no longer isolated and exchanges energy with its environment. Since the energy of the system is no longer constant, the system will be found in any part of the phase space with a certain probability. Finding the probability distribution for an ensemble of such systems is a problem of asymptotic evaluation, which is solved on the basis of the ergodic hypothesis and the fact that a heat reservoir has an extremely large number of degrees of freedom. This asymptotic evaluation is traditionally done with the help of †Stirling's formula or by using the Fowler-Darwin method [10], but it is essentially based on the †central limit theorem [11].

The probability space of this kind of ensemble of systems in contact with heat reservoirs was called a canonical ensemble by Gibbs [5]. If $d\Gamma$ is a volume element of the phase space of the system, the probability of finding a system arbitrarily chosen from the ensemble in a volume element $d\Gamma$ is given by

$$\Pr(d\Gamma) = C\exp(-\mathscr{H}/kT)\,d\Gamma. \tag{7}$$

Accordingly, the average of a dynamical quantity $A$ is given by

$$\langle A\rangle = \int A e^{-\mathscr{H}/kT}\,d\Gamma\Big/\int e^{-\mathscr{H}/kT}\,d\Gamma. \tag{8}$$

For example, the average energy is

$$E = \int \mathscr{H} e^{-\mathscr{H}/kT}\,d\Gamma\Big/\int e^{-\mathscr{H}/kT}\,d\Gamma. \tag{9}$$

By a traditional convention we introduce the parameter

$$\beta = 1/kT$$

and write (9) as

$$E = -\partial \log Z(\beta)/\partial\beta,$$

where $Z(\beta)$ is called the partition function or the sum over states and is given for a system composed of $N$ identical particles by

$$Z(\beta) = \int e^{-\beta\mathscr{H}}\,d\Gamma/N!. \tag{10}$$

If an exchange of particles with the environment takes place in addition to an exchange of energy, the probability of finding a system with particle number $N$ in the volume element $d\Gamma$ is given by

$$\Pr(d\Gamma, N) = C\exp(-\beta\mathscr{H}_N + \beta\mu N)\,d\Gamma/N!, \tag{11}$$

where $\mu$ is a real parameter called the chemical potential; this characterizes the environment with regard to the exchange of particles. This ensemble is called the grand canonical ensemble. The average of a dynamical quantity $A$ is then given by

$$\langle A\rangle = \Xi^{-1}\sum_N \int A_N e^{-\beta\mathscr{H}_N + \beta\mu N}\,d\Gamma/N!, \tag{12}$$

where the dependence of $A$ and $\mathscr{H}$ on $N$ is now explicitly written, and where

$$\Xi = \sum_N \int e^{-\beta\mathscr{H}_N + \beta\mu N}\,d\Gamma/N!$$

is called the grand partition function.

E. Ensembles in Quantum Statistical Mechanics

The quantum counterpart of the classical ergodic hypothesis is that to each of the quantum states with energy lying in a given interval $\Delta E$ an equal probability weight should be assigned [10]. A microcanonical ensemble is then defined by this principle of equal weight, which yields in turn

$$\langle A\rangle = \sum_l A_l\Big/\sum_l 1 \tag{13}$$

instead of (6). Here the index $l$ refers to the quantum states lying in the interval $\Delta E$, and $A_l$ is the quantum-mechanical expectation of a dynamical variable $A$ in the quantum state $l$. A canonical ensemble is now defined by assigning

$$e^{-\beta E_j}\Big/\sum_j e^{-\beta E_j} \tag{14}$$

to the $j$th quantum state as the probability that the system will be found in that state. The

expectation value of $A$ must be given by

$$\langle A\rangle = \sum_j A_j e^{-\beta E_j}\Big/\sum_j e^{-\beta E_j} = \operatorname{tr} A e^{-\beta H}/\operatorname{tr} e^{-\beta H}, \tag{15}$$

where $H$ is the Hamiltonian. The partition function is defined by

$$Z = \sum_j e^{-\beta E_j} = \operatorname{tr} e^{-\beta H}, \tag{16}$$

corresponding to (10).

For a system consisting of identical particles, quantum mechanics requires a particular symmetry of its †wave function; namely, the wave function must be even or odd with respect to permutation of any two particles according as the particles are bosons or fermions. This symmetry requirement is peculiar to quantum mechanics. Thus, even for an ideal gas consisting of noninteracting particles, quantum statistics leads to results characteristically different from those of classical statistical mechanics. This difference becomes more significant when the particle mass is smaller, the density is larger, and the temperature is lower. Quantum effects of this kind are seen in metallic electrons, in liquid helium, in an assembly of photons or phonons, and in high-density stars. The statistical laws obeyed by bosons are called Bose statistics, and those obeyed by fermions, Fermi statistics.

The expectation value of $A$ in the grand canonical ensemble is given in quantum statistics by

$$\langle A\rangle = \Xi(\beta, \mu)^{-1}\operatorname{tr}(A e^{-\beta H + \beta\mu N}), \tag{17}$$

where $H$ is the †second-quantized Hamiltonian, $N$ is the number operator, the trace tr is taken on the (nonrelativistic) †Fock space (symmetric or antisymmetric according to Bose or Fermi statistics), and $\Xi(\beta, \mu)$ is the grand partition function given by

$$\Xi(\beta, \mu) = \operatorname{tr} e^{-\beta H + \beta\mu N}. \tag{18}$$

F. Many-Body Problems in Statistical Mechanics

Since statistical mechanics is primarily concerned with systems with large numbers of particles, problems in statistical mechanics are essentially many-body problems. In practice, however, there are some cases where extreme idealization is possible, as in ideal gases, where the interaction between gas molecules is ignored. In some cases we can proceed by successive approximation, taking the particle interactions as perturbations. Such perturbational treatments are, however, entirely useless for some problems, such as phase transitions, of which an example is the condensation of gases into liquid states, where the interaction of particles plays a critical role. Such problems are clearly many-body problems. There are a number of important and interesting problems in this category, for example, transitions between ferromagnetic and paramagnetic states and those between the superconducting and normal states of metals. Transition from a high-temperature phase to a low-temperature phase is generally regarded as a consequence of the appearance of a certain type of order in thermal motion. This kind of phase change is called an order-disorder transition [14-16].

G. Thermodynamic Limit and Characterization of Equilibrium States

Although an actual system is finitely extended, the enormous sizes of the usual macroscopic systems in comparison to the sizes of their constituent particles justify the idealization to infinitely extended systems. At the same time, there are several mathematical advantages in considering infinitely extended systems, such as the absence of walls (replaced by the boundary condition at infinity, should it be relevant), appearance of phase transitions as mathematical discontinuities rather than mathematically smooth though quantitatively sudden changes, and mathematically clear-cut occurrence of broken symmetries.

Equilibrium states of infinitely extended systems are usually obtained by taking the limit of the equilibrium states of systems in a finite volume $V$ as both $V$ and the number of particles $N$ tend to $\infty$ with the density $\rho = N/V$ fixed; this is called the thermodynamic limit.

It is sometimes possible to formulate the dynamics of infinitely extended systems directly and to characterize their equilibrium states, which more or less coincide with the thermodynamic limit of equilibrium states of finitely extended systems [17-21]. The simplest and most fully investigated case of lattice spin systems is explained below in detail [17]. Since classical systems can be viewed as special cases of quantum systems, we start with the latter. To be definite, we take a $\nu$-dimensional cubic lattice $\mathbf{Z}^\nu$ with a lattice site $n = (n_1, \ldots, n_\nu)$ specified by its integer coordinates $n_j$. (In the lattice case, the thermodynamic limit is simply the limit as $V \to \infty$.)

The C*-algebra $\mathfrak{A}$ of observables is generated by the subalgebra $\mathfrak{A}_n$ at each lattice site $n$, which is assumed to be the algebra of all $d \times d$ matrices (for example, linear combinations of †Pauli spin matrices $\sigma^{(n)} = (\sigma_x^{(n)}, \sigma_y^{(n)}, \sigma_z^{(n)})$ and the identity for $d = 2$) and to commute with operators at other lattice sites. The group of lattice translations $n \to n + a$ is represented by

automorphisms yu of 2t, satisfying ~~21, = 2l,+, AemPH(““‘), where T,, 0 IJ?is the product of
(ya#) = o(“+~‘)). F or any subset A of Z”, 21(A) the unique tracial state z, on ‘U(A) and a state
denotes the C*-subalgebra of 2I generated by $ on 2l(A’) (the boundary condition), H(A)=
21,, rlEc\. U(A)+ W(A) and W(A)=C,{@(I) InA#Qi,
A model is specified by giving a potential @ In A” # 0) (the surface energy). The follow-
which assigns to each finite nonempty subset ing conditions on a state cp of 21 are mutually
I of Z” an operator @(I)=@(I)* E 21(I). The equivalent under the condition that & is a
Hamiltonian for a finite subset A of Z” is given generator (which holds under any one of the
by U(A)=C ,,,@(I). In order to control long- conditions described above) and is satisfied
range interactions, various assumptions are by any limit state of the above VI,, as A /*Z”
introduced. Examples are finiteness of either of (i.e., a state in &, {cpA I A’ c A, $}, with the
the following: bar denoting weak closure).
1. KMS condition: cp(Accip(B))= q(BA) for
ll@ll =sup 1 N(I) -I ll@,(I)ll, (19)
n li” any A, BE’LI such that a,(B) is an entire func-
tion of t. (4” is called a b-KMS state.)
(20)
2. Local thermodynamic stability: For any
finite subset A of Z” and for any state t/j having
Here N(I) is the number of points in I.
the same restriction to ‘U(A) as the state cp
Let 21:’ be a maximal Abelian *-subalgebra
under consideration, F,,Jcp) d FA,6(11/) (the
of ‘2l,, (such as (c, +c,@‘)) satisfying r,2l;‘=
minimality of the free energy multiplied by fi),
2[::, and 21” be the Abelian C*-subalgebra
of 21 generated by ‘LIZ’, ME Z’. If @(I) is in 2V’ where ~A,ab) = BdW)) - &(cp), .sA(cp)=
lim{S,,.(rp)--S,,. ,,((p)) as A’7 Z’(the open
for all I, we call the potential Q Ahelian or
classical. There exists a conditional expecta- system enWd, s,(cp) = -&log p,(v)) (the
closed system entropy) and the density matrix
tion 7~” which is a positive mapping of norm
1 from 21 onto 21” satisfying n”(ABC)= p,,(cp)~Sl(A)+ is defined by cp(A)=r,(p,(cp)A)
for all A E 21(A).
II@(B)C for A and C in 21” and x”‘( 1) = 1. If
3. Gibbs condition: For every finite subset A
a state cp on 21 satisfies q(A)= 47(7?(A)), we
of Z”, the perturbed state ~fl~‘~’ (not neces-
call the state cp classical. Classical states are in
sarily normalized) is the product qz x $ of
one-to-one correspondence with the restriction
the Gibbs state q:(A)= tr(e~P”cA)A)/tre-“u(A’
on 2V’, which can be viewed as a probability
on AE~I(A) and some (unknown) state $ on
measure on the spectrum (also called configu-
21(Ac), where the representative vector @ for p
ration space) of the C*-algebra ‘II” of obser-
for the GNS representation no, is assumed to
vables for classical spin lattice systems. This
be separating for 71$X)“, and then q?“(“)(A)=
correspondence makes it possible to view
(0, ~&A)fi) for
classical spin lattice systems as quantum spin
lattice systems with Abelian interactions.
For a given potential @, the time evolution
of the infinitely extended system is described
by the one-parameter group x,, PER, of *- x ( W(A))A~~l-‘n.. A;-%JW(A))a),
automorphisms of 2l defined as the following (22)
limit: where AQ is the tmodular operator for @ and
the series converges.
For a classical potential, this condition re-
The limit exists if @(I) = 0 for N(I) > IV duces to the conditions that q is classical and
and Ill@lll< a, or if for some i>O that the restriction of 40 to 2V’ as a measure on
C,~^“(suP,C,{ll~(~)ll Jl% NU)=n})< the configuration space { 1 d)“’ satisfies the
w, or if v = 1 (1 -dimensional lattice) and following DLR equation due to R. L. Do-
W,X)ZIZI, mcx,~)f brushin, 0. E. Lanford, and D. Ruelle: The
~~~~c,iii~uii Irw
0) < W. An alternative way is first to de- conditional probability for <(A)E { I d}A
fine S,(A)=C,i[@(I),A] for AEU~%(A) knowing ME (1 d}““” is proportional
(A is a finite subset of Z’), which exists if to exp( -bH(A)), where H(A) = U(.I) + W(A) is
~~~@~~~< w, and to prove that the closure 6, a function of t(U(A) depending only on <(A)).
of &, is a generator of a one-parameter sub- 4. Roepstorff-Araki-Sewell inequality: For
group X, ( = exp r&J. In the above cases, &, is a any AE UA21(A), @(A*&,(A)) is real and
generator.
- ih(A*&(A)) 3 S(dA *A), dAA ‘I), (23)
A general canonical ensemble for a system
in a finite subset A of the lattice Z’, with where S(u,u)=ulog(u/v) if u>O, vr0, S(O,o)=
some boundary condition in the outside A’= 0 for a30 and S(u,O)= +cx3 for u,O.
ZV\A, is given by v~(A)=(T,, @ $)(r-OH(A) x 5. Roepstorff-Fannes-Verbeure inequality:

For $\alpha$-entire $A \in \mathfrak{A}$,

(24)

where $F(u, v) = (u - v)/\log(u/v)$ for $u > 0$, $v > 0$, $u \neq v$, $F(u, u) = u$ for $u > 0$, and $F(u, 0) = F(0, 0) = 0$.

If the interaction is translationally invariant (i.e., $\gamma_a\Phi(I) = \Phi(I + a)$ for all $a \in \mathbf{Z}^\nu$ and $I$) and if we restrict our attention to translationally invariant states (i.e., $\varphi(\gamma_a(A)) = \varphi(A)$ for all $A \in \mathfrak{A}$ and $a \in \mathbf{Z}^\nu$), then the following conditions are also equivalent to the above.

6. Variational principle: $\beta e(\varphi) - s(\varphi) \le \beta e(\psi) - s(\psi)$ for all translationally invariant $\psi$ (the minimality of the mean free energy), where $e(\varphi) = \lim N(\Lambda)^{-1}\varphi(U(\Lambda)) = \lim N(\Lambda)^{-1}\varphi(H(\Lambda))$ (the mean energy), $s(\varphi) = \lim N(\Lambda)^{-1}S_\Lambda(\varphi)$ (the mean entropy), the infimum value $\beta e(\varphi) - s(\varphi)$ is $-P(\beta\Phi)$ with $P(\beta\Phi) = \lim N(\Lambda)^{-1}\log\tau_\Lambda(e^{-\beta U(\Lambda)})$ (the pressure), and the limits exist if $\Lambda \nearrow \mathbf{Z}^\nu$ is taken in the following van Hove sense: For any given cube $C$ of lattice points, the minimal number $n_\Lambda^+(C)$ of translations of $C$ that cover $\Lambda$ and the maximal number $n_\Lambda^-(C)$ of mutually disjoint translations of $C$ in $\Lambda$ satisfy $n_\Lambda^+(C)/n_\Lambda^-(C) \to 1$ as $\Lambda \nearrow \mathbf{Z}^\nu$.

7. Tangent to the pressure function: $P(\Phi)$ is a continuous convex function on the Banach space of translationally invariant $\Phi$ with $\|\Phi\| < \infty$. A continuous linear functional $\alpha$ on this Banach space is a tangent to $P$ at $\Phi$ if $P(\Phi + \Psi) \ge P(\Phi) + \alpha(\Psi)$ for all $\Psi$. For a translationally invariant state $\psi$, we define $\alpha_\psi(\Psi) = \psi(\sum_{I \ni 0} N(I)^{-1}\Psi(I))$. The condition is that $-\alpha_\varphi$ is a tangent to $P$ at $\beta\Phi$. (Conversely, any tangent $\alpha$ to $P$ at $\beta\Phi$ arises in this manner.)

The set $K_\beta$ of all (normalized) $\beta$-KMS states is nonempty, compact, and convex. A $\beta$-KMS state $\varphi$ is an extremal point of $K_\beta$ if and only if it is factorial (i.e., the associated von Neumann algebra $\pi_\varphi(\mathfrak{A})''$ has a trivial center). It then has the clustering property $\lim_{a \to \infty}\{\varphi(A\gamma_a(B)) - \varphi(A)\varphi(\gamma_a(B))\} = 0$ and is interpreted as a pure phase. Any $\beta$-KMS state has a unique integral decomposition into extremal $\beta$-KMS states.

For any $\Phi$, $K_\beta$ is a one-point set for sufficiently small $|\beta|$. For a 1-dimensional system ($\nu = 1$), $K_\beta$ consists of only one point (uniqueness of equilibrium states usually interpreted as indication of no phase transition) if the surface energy $W([-N, N])$ is uniformly bounded (H. Araki, Comm. Math. Phys., 44 (1975); A. Kishimoto, Comm. Math. Phys., 47 (1976)). For the two-body interaction $\Phi(\{m, n\}) = -J|m - n|^{-r}\sigma_z^{(m)}\sigma_z^{(n)}$, this condition is satisfied if $r > 2$, while $\|\Phi\| < \infty$ and $\alpha_t$ is defined if $r > 1$. There is more than one KMS state (with spontaneous magnetization) for $2 \ge r > 1$ and large $\beta J > 0$, and hence a phase transition exists (F. J. Dyson, Comm. Math. Phys., 12 (1969); J. Fröhlich and T. Spencer, Comm. Math. Phys., 83 (1982)). If a 1-dimensional interaction has a finite range (i.e., $\Phi(I) = 0$ if the diameter of $I$ exceeds some number $r_0$) or if it is classical and $\sum_{I \ni 0} N(I)^{-1}(\operatorname{diam} I + 1)\|\Phi(I)\| < \infty$ for $d = 2$, then $\varphi(A)$ for $\varphi \in K_\beta$ and $A \in \mathfrak{A}(\Lambda)$ for a finite $\Lambda$ is real analytic in $\beta$ and any other analytic parameter in the potential (Araki, Comm. Math. Phys., 14 (1969); [22]; M. Cassandro and E. Olivieri, Comm. Math. Phys., 80 (1981)).

For a 2-dimensional Ising model with the nearest-neighbor ferromagnetic interaction [23], $K_\beta$ consists of only one point for $0 < \beta \le \beta_c$, while $K_\beta$ for $\beta > \beta_c$ has exactly two extremal points corresponding to positive and negative magnetizations (M. Aizenman, Comm. Math. Phys., 73 (1980); Y. Higuchi, Colloquia Math. Soc. János Bolyai, 27 (1979)). In this case, all KMS states are translationally invariant, while there exist (infinitely many) translationally noninvariant KMS states for sufficiently large $\beta$ if $\nu = 3$ (Dobrushin, Theory Prob. Appl., 17 (1972); H. van Beijeren, Comm. Math. Phys., 40 (1975)).

The accumulation points of $\beta$-KMS states as $\beta \to +\infty$ (or $-\infty$) provide examples of ground (or ceiling) states defined by any one of the following mutually equivalent conditions 1+, 2+ (or 1-, 2-) (O. Bratteli, A. Kishimoto, and D. W. Robinson, Comm. Math. Phys., 64 (1978)):

1+ (1-). Positivity (negativity) of energy: For any $A \in \bigcup_\Lambda \mathfrak{A}(\Lambda)$, $-i\varphi(A^*\delta_\Phi(A))$ is real and positive (negative).

2+ (2-). Local minimality (maximality) of energy: For any finite subset $\Lambda$ of $\mathbf{Z}^\nu$ and for any state $\psi$ with the same restriction to $\mathfrak{A}(\Lambda^c)$ as the state $\varphi$ under consideration, $\varphi(H(\Lambda)) \le \psi(H(\Lambda))$ ($\varphi(H(\Lambda)) \ge \psi(H(\Lambda))$).

For translationally invariant potentials and states, the following condition is also equivalent to the above:

3+ (3-). Global minimality (maximality) of energy: $e(\varphi) \le e(\psi)$ ($e(\varphi) \ge e(\psi)$) for all translationally invariant states $\psi$.

The totality of KMS, ground, and ceiling states can be characterized by the following formulation of the impossibility of perpetual motion: Let $P_t = P_t^* \in \mathfrak{A}$ be a norm-differentiable function of the time $t \in \mathbf{R}$ with a compact support, representing (external) time-dependent perturbations. Then there exists a unique perturbed time evolution $\alpha_t^P$ as a one-parameter family of *-automorphisms of $\mathfrak{A}$ satisfying $(d/dt)\alpha_t^P(A) = \alpha_t^P(\delta_\Phi(A) + i[P_t, A])$ for all $A \in \mathfrak{A}$ in the domain of $\delta_\Phi$. A state $\varphi$ changes with time $t$ as $\varphi_t(A) = \varphi(\alpha_t^P(A))$ under the perturbed dynamics $\alpha_t^P$, and the total

energy given to the system (mechanical work performed by the external forces) is given by $L^P(\varphi) = \int_{-\infty}^{\infty}\varphi_t(dP_t/dt)\,dt$. For KMS states at any $\beta$, as well as ground and ceiling states, $L^P(\varphi) \ge 0$ for any $P_t$. If $\varphi$ is a factor state, the converse holds, i.e., $L^P(\varphi) \ge 0$ for all $P_t$ implies that $\varphi$ is either a KMS, ground, or ceiling state. The condition $L^P(\varphi) \ge 0$ for all $P_t$ is equivalent to $-i\varphi(U^*\delta_\Phi(U)) \ge 0$ for all unitary $U$ in the domain of $\delta_\Phi$ and in the identity component of the group of all unitaries of $\mathfrak{A}$. A state $\varphi$ satisfying this condition is called passive, and a state $\varphi$ whose $n$-fold product with itself as a state on $\mathfrak{A}^{\otimes n}$ is passive relative to $\alpha_t^{\otimes n}$ for all $n$ is called completely passive. The last property holds if and only if $\varphi$ is a KMS, ground, or ceiling state (W. Pusz and S. L. Woronowicz, Comm. Math. Phys., 58 (1978)).

The totality of KMS, ground, and ceiling states can be characterized by a certain stability under perturbations ($P_t$ considered above) under some additional condition on $\alpha_t$ (R. Haag, D. Kastler, and E. B. Trych-Pohlmeyer, Comm. Math. Phys., 38 (1974); O. Bratteli, A. Kishimoto, and D. W. Robinson, Comm. Math. Phys., 61 (1978)).

When a lattice spin system is interpreted as a lattice gas, an operator $N_n \in \mathfrak{A}_n^c$ (such as $(\sigma_z^{(n)} + 1)/2$) is interpreted as the particle number at the lattice site $n$, and $N(\Lambda) = \sum_{n \in \Lambda} N_n$ is the particle number in $\Lambda$. It defines a representation of a unit circle $\mathbf{T}$ by automorphisms $\tau_\theta$ of $\mathfrak{A}$ defined as $\tau_\theta(A) = \lim e^{iN(\Lambda)\theta}Ae^{-iN(\Lambda)\theta}$ ($\Lambda \nearrow \mathbf{Z}^\nu$), called gauge transformations (of the first kind). The grand canonical ensemble can be formulated as a $\beta$-KMS state with respect to $\alpha_t\tau_{\mu t}$ (instead of $\alpha_t$), where the real constant $\mu$ is called the chemical potential. It can be interpreted as an equilibrium state when the gauge-invariant elements $\{A \in \mathfrak{A} \mid \tau_\theta(A) = A\}$ instead of $\mathfrak{A}$ are taken to be the algebra of observables, or as a state stable under those perturbations that do not change the particle number.

H. The Boltzmann Equation

Statistical mechanics of irreversible processes originated from the kinetic theory of gases. Long ago, Maxwell and Boltzmann tried to calculate viscosity and other physical quantities characterizing gaseous flow in nonequilibrium. The †Boltzmann equation is generally a nonlinear †integrodifferential equation. On the basis of this equation mathematical theories were developed by D. Enskog, S. Chapman, and D. Hilbert [2].

Free electrons in a metal can be regarded as forming an electron gas, in which electron scattering by lattice vibrations or by impurities is more important than electron-electron scattering. Following the example of gas theories, H. A. Lorentz set forth a simple theory of irreversible processes of metallic electrons. His theory was, however, not quite correct, since metallic electrons are highly quantum-mechanical and classical theories cannot be applied to them. Quantum-mechanical theories of metal electrons were developed by A. Sommerfeld and F. Bloch.

I. Master Equations

The Boltzmann equation gives the velocity distribution function of a single particle in the system. This line of approach can be extended in two directions. The first is the so-called master equation. For example, consider a gaseous system consisting of $N$ particles, and ask for the probability distribution of all the momenta, namely, the distribution function $f_N(p_1, \ldots, p_N; t)$, where $p_1, \ldots, p_N$ are the momenta of the $N$ particles. The equations of motion are deterministic with respect to the complete set of dynamical variables $(x_1, p_1, \ldots, x_N, p_N)$. The equation for $f_N(p_1, \ldots, p_N; t)$ may not be deterministic, but it may be stochastic because we are concerned only with the variables $p_1, \ldots, p_N$, with all information about the space coordinates $x_1, \ldots, x_N$ disregarded. This situation is essentially the same in both classical and quantum statistical mechanics. If the duration of the observation process is limited to a finite length of time and the precision of the observation to a certain degree of crudeness, the time evolution of the momentum distribution function $f_N$ can be regarded as a †Markov process. In general, an equation describing a Markov process of a certain distribution function is called a master equation. Typically it takes the following form for a suitable choice of variables $x$:

$$\frac{\partial}{\partial t}f(x, t) = \int dx'\,\bigl(W(x', x)f(x', t) - W(x, x')f(x, t)\bigr), \tag{25}$$

where $W(x, x')$ is the transition probability from $x$ to $x'$. By expanding the first integrand into a power series in $x - x'$, with $x'$ fixed, and by retaining the first few terms, we obtain the Fokker-Planck equation:

$$\frac{\partial}{\partial t}f(x, t) = -\frac{\partial}{\partial x}\bigl(a_1(x)f(x, t)\bigr) + \frac{1}{2}\frac{\partial^2}{\partial x^2}\bigl(a_2(x)f(x, t)\bigr), \tag{26}$$

$$a_n(x) = \int W(x, x + r)\,r^n\,dr. \tag{27}$$
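The gain-loss structure of the master equation (25) can be sketched for a finite set of states, where the integral over $x'$ becomes a sum. The Python example below is a minimal illustration under assumptions made here: the three-state system, the rate values in `W`, the explicit Euler time stepping, and all function names are hypothetical and are not taken from the text. `W[i][j]` plays the role of $W(x, x')$, the transition rate from state `i` to state `j`.

```python
def master_equation_step(f, W, dt):
    """One explicit Euler step of df_i/dt = sum_j (W[j][i] f_j - W[i][j] f_i)."""
    n = len(f)
    new_f = []
    for i in range(n):
        # Gain: probability flowing into state i from every other state j.
        gain = sum(W[j][i] * f[j] for j in range(n) if j != i)
        # Loss: probability flowing out of state i into every other state j.
        loss = sum(W[i][j] for j in range(n) if j != i) * f[i]
        new_f.append(f[i] + dt * (gain - loss))
    return new_f


def relax(f0, W, dt=0.01, n_steps=5000):
    """Integrate the master equation from the initial distribution f0."""
    f = list(f0)
    for _ in range(n_steps):
        f = master_equation_step(f, W, dt)
    return f


# Hypothetical transition rates for a three-state system (illustration only).
W = [
    [0.0, 1.0, 0.5],
    [2.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
]

if __name__ == "__main__":
    f_final = relax([1.0, 0.0, 0.0], W)
    print("distribution after relaxation:", f_final)
    print("total probability:", sum(f_final))
```

Because every gain term of one state is a loss term of another, the total probability is conserved at each step, and for these rates the distribution settles to a stationary one in which gain and loss balance for each state.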

J. The Hierarchy of Particle Distribution Functions

Another way of extending the Boltzmann equation is to consider a set of distribution functions of one particle, of two particles, and generally of $n$ ($< N$) particles selected from the whole system of $N$ particles. For example, a two-particle distribution function is the function $f_2(x_1, v_1, x_2, v_2, t)$ for positions and velocities of two particles at time $t$. The complete dynamics of the entire system of particles can be projected to the time evolution of this hierarchy of distribution functions. The equation of motion for $f_1$ then contains the function $f_2$ if the interaction of particles is pairwise, the equation for $f_2$ contains $f_3$, and so on. Thus the equations of motion for the set of distribution functions make a chain of equations. The whole chain is equivalent to the deterministic equations of motion for the dynamics of $N$ particles. However, if the particle number $N$ is made indefinitely large, with the time scale of observation always finite, the chain of equations for the distribution functions asymptotically approaches a stochastic process if certain conditions are satisfied. Approximate methods of solving the hierarchy equations in classical cases have been developed by J. Yvon, J. G. Kirkwood, M. Born, H. S. Green, and others.

In quantum statistics, similar hierarchy equations can be considered. A typical example is the so-called Green's function method [27].

K. Irreversible Processes and Stochastic Processes

The statistical mechanics of physical processes evolving in time is a hybrid of dynamics and the mathematical theory of stochastic processes. A typical example is the theory of Brownian motion. A colloidal particle floating in a liquid moves incessantly and irregularly because of thermal agitation from surrounding liquid molecules. For simplicity, an example of 1-dimensional Brownian motion is considered here. Phenomenologically we assume that a colloid particle follows an equation of motion of the form

$$m\dot{u} = -m\gamma u + f(t), \tag{28}$$

called the Langevin equation, where $m$ is the mass of the colloid particle and $u$ is the velocity. The first term on the right-hand side is the friction force due to viscous resistance, and the second term represents a random force acting on the particle from surrounding molecules. If (28) describes the Brownian motion in thermal equilibrium, the friction constant $m\gamma$ and the random force cannot be independent, but are related by a theorem asserting that

$$m\gamma = \frac{1}{kT}\int_0^\infty \langle f(t_1)f(t_1 + t)\rangle\,dt. \tag{29}$$

In an electric conductor, the thermal motion of charge carriers necessarily induces irregularities of charge distribution, and so an electromotive force that varies in time in a random manner is created. This random electromotive force is similar to the random motion of a Brownian particle and is called the thermal noise. For such a thermal noise there exists a relation similar to (29) between the resistance and the random electromotive force. This relation is known as the Nyquist theorem. These theorems are contained in a more general theorem called the fluctuation-dissipation theorem.

When an external force is applied to a system in thermal equilibrium, some kind of irreversible flow, an electric current, for example, is induced in the system. The relationship between the flow and the external force is generally represented by an admittance. If the external force is periodic, the admittance is a function of frequency $\omega$ and is given by

(30)

where $\phi(t)$ is equal to the correlation function of the flow that appears spontaneously as the fluctuation in thermal equilibrium when no external force is applied. This general expression for an admittance, often called the Kubo formula, gives a unified viewpoint from which responses of physical systems to weak external disturbances can be treated without recourse to the traditional kinetic approach. The static limit ($\omega \to 0$) of the admittance is the transport coefficient. The reversibility of dynamics leads to relations among transport coefficients, called Onsager's reciprocity relations in the thermodynamics of irreversible processes.

When external disturbances are so large that the system deviates considerably from thermal equilibrium, the responses may show characteristic nonlinearities. Such nonlinear phenomena are important from both experimental and theoretical points of view, and constitute a central subject of modern research (→ 433 Turbulence and Chaos).

References

[1] L. Boltzmann, Lectures on gas theory, Univ. of California Press, 1964. (Original in German, 1912.)
403 A 1520
Statistical Models

[2] S. Chapman and T. G. Cowling, The mathematical theory of non-uniform gases, Cambridge Univ. Press, third edition, 1970.
[3] R. C. Tolman, The principles of statistical mechanics, Oxford Univ. Press, 1938.
[4] D. ter Haar, Elements of statistical mechanics, Holt, Rinehart and Winston, 1961.
[5] J. W. Gibbs, Elementary principles in statistical mechanics, Yale Univ. Press, 1902 (Dover, 1960).
[6] P. and T. Ehrenfest, The conceptual foundations of the statistical approach in mechanics, Cornell Univ. Press, 1959. (Original in German, 1911.)
[7] I. E. Farquhar, Ergodic theory in statistical mechanics, Interscience, 1964.
[8] V. I. Arnol'd and A. Avez, Ergodic problems of classical mechanics, Benjamin, 1968.
[9] R. Jancel, Foundations of classical and statistical mechanics, Pergamon, 1963.
[10] R. H. Fowler, Statistical mechanics, Cambridge Univ. Press, second edition, 1936.
[11] A. Ya. Khinchin, Mathematical foundation of statistical mechanics, Dover, 1949. (Original in Russian, 1943.)
[12] L. D. Landau and E. M. Lifshits, Statistical physics, Pergamon, 1969. (Original in Russian, 1964.)
[13] R. Kubo, Statistical mechanics, North-Holland, 1965.
[14] H. S. Green and C. A. Hurst, Order-disorder phenomena, Interscience, 1964.
[15] H. E. Stanley, Introduction to critical phenomena, Oxford Univ. Press, 1971.
[16] C. Domb and M. S. Green, Phase transitions and critical phenomena, Academic Press, 1972.
[17] O. Bratteli and D. W. Robinson, Operator algebras and quantum statistical mechanics II, Springer, 1981.
[18] D. Ruelle, Statistical mechanics: Rigorous results, Benjamin, 1969.
[19] D. Ruelle, Thermodynamic formalism, Addison-Wesley, 1978.
[20] D. W. Robinson, The thermodynamic pressure in quantum statistical mechanics, Springer, 1971.
[21] R. B. Israel, Convexity in the theory of lattice gas, Princeton Univ. Press, 1979.
[22] D. H. Mayer, The Ruelle-Araki transfer operator in classical statistical mechanics, Springer, 1980.
[23] B. M. McCoy and T. T. Wu, The two-dimensional Ising model, Harvard Univ. Press, 1973.
[24] N. S. Krylov, Works on the foundation of statistical physics, Princeton Univ. Press, 1979.
[25] K. Huang, Statistical mechanics, Wiley, 1963.
[26] F. Reif, Fundamentals of statistical and thermal physics, McGraw-Hill, 1965.
[27] A. A. Abrikosov, L. P. Gor'kov, and I. E. Dzyaloshinskii, Methods of quantum field theory in statistical physics, Prentice-Hall, 1963. (Original in Russian, 1962.)
[28] E. H. Lieb and D. C. Mattis, Mathematical physics in one dimension, Academic Press, 1966.

403 (XVIII.5)
Statistical Models

A. General Remarks

A statistical model is defined by specifying the structure of the probability distributions of the relevant quantities. When a statistical model is used for the analysis of a set of data, its role is to measure the characteristics of a certain configuration of the data points. R. A. Fisher [1] advanced a systematic procedure for the application of statistical models. The process of statistical inference contemplated by Fisher may be characterized by the following three phases: (1) specification of the model, (2) estimation of the unknown parameters, and (3) testing the goodness of fit. The last phase is followed by the first when the result of the testing is negative. Thus the statistical inference contemplated by Fisher is realized through the process of introduction and selection of statistical models.

We always assume that the true distribution of an observation exists in each particular application of statistical inference, even though it may not be precisely known to us. Our partial knowledge of the generating mechanism of the observation suggests various possible constraints on the form of the true distribution. The basic problem of statistical inference is then to generate an approximation to the true distribution by using the available observational data and a model defined by a set of probability distributions satisfying the constraints.

B. The Criterion of Fit

The use of statistical models can best be explained by adopting the predictive point of view, which defines the objective of statistical inference as the determination of the predictive distribution, the probability distribution of a future observation defined as a function of the information available at present. The performance of a statistical inference procedure is then evaluated in terms of the expected discrepancy of the predictive distribution from the true distribution of the future observation.
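As a rough illustration of "expected discrepancy," the following sketch estimates by simulation how far a plug-in predictive distribution lies from the true one. Everything specific here is an assumption made for the example rather than part of the article: the true distribution is taken to be normal, the predictive distribution is the normal density with maximum likelihood estimates plugged in, the discrepancy is measured by the Kullback-Leibler divergence (the negative of the entropy $B(f;g)$ that the article defines next), and the names `kl_normal` and `expected_discrepancy` are hypothetical.

```python
import math
import random


def kl_normal(mu_f, sd_f, mu_g, sd_g):
    """Kullback-Leibler divergence KL(f || g) between two univariate normals."""
    return (math.log(sd_g / sd_f)
            + (sd_f ** 2 + (mu_f - mu_g) ** 2) / (2.0 * sd_g ** 2)
            - 0.5)


def expected_discrepancy(n, mu=0.0, sd=1.0, reps=2000, seed=0):
    """Monte Carlo estimate of E[KL(true || plug-in predictive)] for sample size n.

    Assumption: the predictive distribution is N(m, s^2), with m and s the
    maximum likelihood estimates computed from n observations.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(mu, sd) for _ in range(n)]
        m = sum(xs) / n
        s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
        total += kl_normal(mu, sd, m, s)
    return total / reps


if __name__ == "__main__":
    # The expected discrepancy should shrink as more data become available.
    for n in (5, 20, 80):
        print(n, round(expected_discrepancy(n), 4))
```

Under these assumptions the average discrepancy decreases with the sample size, which is the sense in which a procedure using more information performs better from the predictive point of view.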
The probabilistic interpretation of thermodynamic entropy developed by L. Boltzmann [2] provides a natural measure of the discrepancy between two probability distributions. The entropy of a distribution specified by the density $f(y)$ with respect to the distribution specified by $g(y)$ is defined by

$$B(f; g) = -\int f(y) \log \frac{f(y)}{g(y)}\, dy,$$

where, as in what follows, the integral is taken with respect to some appropriate measure $dy$. This definition of entropy is a faithful reproduction of the original probabilistic interpretation of the thermodynamic entropy by Boltzmann and allows the interpretation that the entropy $B(f; g)$ is proportional to the logarithm of the probability of getting a statistical distribution of observations closely approximating $f(y)$ by taking a large number of independent observations from the distribution $g(y)$. (For a detailed discussion → [3].)

Obviously we have

$$B(f; g) = \int f(y) \log g(y)\, dy - \int f(y) \log f(y)\, dy.$$

The second term on the right-hand side is a constant depending only on $f(y)$. The first term is the expected log likelihood of the distribution $g(y)$ with respect to the true distribution $f(y)$. Thus a distribution with a larger value of the expected log likelihood provides a better approximation to the true distribution. Even when $f(y)$ is unknown, $\log g(y)$ provides an unbiased estimate of the expected log likelihood. This fact constitutes the basis of the objectivity of the likelihood as a criterion for judging the goodness of a distribution as an approximation to the true distribution.

C. Parametrization of Probability Distributions

When we construct a statistical model it is a common practice to represent the uncertain aspect of the true distribution by a family of probability distributions with unknown parameters. This type of family is called a parametric family; the model is called a parametric model. The parameters in a parametric model are the keys to the realization of the information extraction from data by statistical methods. Accordingly, the introduction of mathematically manageable parametric models forms the basis for the advance of statistical methods.

(1) Pearson's System of Distributions. A wide family of distributions can be generated by assuming a rational-function representation of the sensitivity of the density function $f(y)$ given by

$$\frac{d \log f(y)}{dy} = \frac{a_0 + a_1 y + \cdots + a_p y^p}{b_0 + b_1 y + \cdots + b_q y^q}.$$

Pearson's system of distributions is defined by putting $p = 1$ and $q = 2$ and by assuming various constraints on the parameters $a_i$ and $b_j$ and the support of $f(y)$ [4].

E. Wong [5] discussed the construction of continuous-time stationary Markov processes with the distributions of Pearson's system as their stationary distributions. This allows a structural interpretation of the parameters of a distribution of the system.

(2) Maximum Entropy Principle and the Exponential Family. To develop a formal theory of statistical mechanics E. T. Jaynes [6] introduced the concept of the maximum entropy estimate of a probability distribution. This concept leads to a natural introduction of the exponential family. Following Kullback [7], we start with a distribution $g(y)$ and try to find $f(y)$ with prescribed expectations of statistics $T_1(y), \ldots, T_k(y)$ and with maximum entropy $B(f; g)$. Such a distribution $f(y)$ is given by the relation

$$f(y) \propto \exp\{\tau_1 T_1(y) + \cdots + \tau_k T_k(y)\}\, g(y),$$

where it is assumed that the right-hand side is integrable. By varying the parameters $\tau_1, \ldots, \tau_k$ over the allowable range we get the exponential family of distributions. I. J. Good [8] considered the Jaynes procedure as a principle for the generation of statistical hypotheses and called it the maximum entropy principle.

(3) Parametric Models of Normal Distributions. Of particular interest within the exponential family is the family of normal distributions. This is obtained by assuming the knowledge of the first- and second-order moments of a distribution and applying the maximum entropy principle [9]. Obviously, the parametrization of a normal distribution is concerned only with the mean vector and the variance-covariance matrix.

Let $X = (X_1, \ldots, X_n)'$ be an $n$-dimensional normal random variable with mean $EX = (m_1, \ldots, m_n)'$ and variance-covariance matrix $\Sigma = (\sigma_{ij})$, where $\sigma_{ij} = E(X_i - m_i)(X_j - m_j)$. ($E$ denotes expectation and $'$ denotes the transpose.) A nonrestrictive family of $n$-dimensional normal distributions is characterized by $n + n(n+1)/2$ parameters, $m_i$ ($i = 1, \ldots, n$), and $\sigma_{ij}$ ($i = 1, \ldots, n$; $j = 1, \ldots, n$). The prior information on the generating mechanism of $X$ introduces constraints on these parameters and reduces the number of free parameters.

Reduction of the dimensionality of the
parameter vector $m = (m_1, \ldots, m_n)'$ is realized by assuming that $m$ is an element of a $k$-dimensional subspace of $R^n$ spanned by the vectors $a_1 = (a_{11}, \ldots, a_{n1})', \ldots, a_k = (a_{1k}, \ldots, a_{nk})'$, i.e., by assuming the relation

$$m = Ac,$$

where $A = (a_1, \ldots, a_k)$ is an $n \times k$ matrix and $c = (c_1, \ldots, c_k)'$ is a $k$-dimensional vector with $k < n$. This parametrization is obtained when for each $X_i$ the observation $(a_{i1}, \ldots, a_{ik})$ is made on a set of $k$ factors and the analysis of the linear effect of these factors on the mean of $X_i$ is required. We have the representation $X = Ac + W$, where $W$ is an $n$-dimensional normal random vector with $EW = 0$ and variance-covariance matrix $\Sigma$ (→ Section D).

To complete the model we have to specify the variance-covariance matrix $\Sigma$. One of the simplest possible specifications is obtained by assuming that the $X_i$ are mutually independent and of the same variance $\sigma^2$. This reduces $\Sigma$ to $\sigma^2 I$, where $I$ denotes an $n \times n$ identity matrix. With this assumption the number of necessary parameters to represent the variance-covariance matrix reduces from $n(n+1)/2$ to 1. The model obtained with the assumptions $\Sigma = \sigma^2 I$ and $m = Ac$ is called the general linear model (or regression model) with normal error, or simply the normal linear model. The model is called a regression model also when the $a_i$ are random variables (→ e.g., [10, 11]).

A typical example of nontrivial parametrization of the covariance structure of $X$ is obtained by assuming the representation

$$X = m + AF + W,$$

where $F = (f_1, \ldots, f_q)'$ denotes the vector of random effects and $W$ the vector of measurement errors. It is assumed that $F$ and $W$ are mutually independent and normally distributed with $EF = 0$ and $EW = 0$. Also the components of $W$ are assumed to be mutually independent. The variance-covariance matrix $\Sigma$ of $X$ is then given by

$$\Sigma = A \Phi A' + \Delta,$$

where $\Phi = EFF'$ and $\Delta = EWW'$, which is diagonal.

When $A$ is a design matrix, the parametrization provides a components-of-variance model (or random-effects model) of which the main use is the measurement of the variance-covariance matrix $\Phi$ of the random effects $f_1, \ldots, f_q$ rather than the measurement of $F$ itself. If we consider $F$ to be representing the effects of some latent factors for which $A$ is not uniquely specified, the above representation of $\Sigma$ gives merely a formal, or noncausal, parametrization of $\Sigma$. In this case the model is called the factor analysis model and the dimension $q$ of $F$ is called the number of factors. By keeping the number of factors sufficiently smaller than the dimension of $X$, we get a parametrization of $\Sigma$ with a smaller number of free parameters than the unconstrained model. Starting with $q = 1$ and successively increasing the number of factors, we can get a hierarchy of models with successively increasing numbers of parameters. (→ [12] for a very general modeling of the variance-covariance matrix.)

(4) Parametrization of Discrete Distribution. Consider the situation where the observation produces one of the events represented by $r = 0, 1, 2, \ldots, k$ with probability $p(r)$, where $k$ may be infinite. Represent by $X = (X_1, \ldots, X_n)$ the result of $n$ independent observations. The probability $p(X)$ of getting such a result is given by the relation

$$\log p(X) = \sum_{r=0}^{k} \theta_r n_r(X),$$

where $\theta_r = \log p(r)$ and $n_r(X)$ denotes the number of $X_i$'s which are equal to $r$. (The term not depending on the $\theta_r$'s is omitted in the above and subsequent formulas, since it is immaterial for problems of inference.) Thus a nonrestrictive model is obtained by assuming only the relations $\theta_r < 0$ and $\sum_{r=0}^{k} e^{\theta_r} = 1$. Obviously the model defines an exponential family and various useful parametrizations are realized by introducing some constraints on the parameters $\theta_r$.

When the events $r$ are arranged in a 2-dimensional array $(i, j)$ ($i = 1, \ldots, m$; $j = 1, \ldots, n$) we have

$$\log p(X) = \sum_{i=1}^{m} \sum_{j=1}^{n} \theta_{ij} n_{ij}(X).$$

One simple parametrization is given by

$$\theta_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij},$$

where it is assumed that $\sum_{i=1}^{m} \alpha_i = \sum_{j=1}^{n} \beta_j = \sum_{i} \gamma_{ij} = \sum_{j} \gamma_{ij} = 0$. Obviously this is a parametrization of $\theta_{ij}$ as a linear function of the parameters $\alpha_i$, $\beta_j$, and $\gamma_{ij}$, and the model thus obtained is called the log linear model. The model shows a formal similarity to the analysis-of-variance model (→ Section D). By introducing successively more restrictive assumptions on the parameters, we can get a hierarchy of models for the analysis of a two-way contingency table. Extension to cases when more than two factors are involved is obvious (→ e.g., [13]).

Here we consider that $X_i$ is a dichotomous variable, i.e., $k = 1$, and that the probability of $X_i = 1$ may depend on $i$, i.e., we have
$$\log p(X) = \sum_{i=1}^{n} \{\theta_{0i}(1 - X_i) + \theta_{1i} X_i\},$$

where $\theta_{0i} = \log \mathrm{Prob}(X_i = 0)$ and $\theta_{1i} = \log \mathrm{Prob}(X_i = 1)$. We assume that a vector of observations $a_i = (a_{i1}, \ldots, a_{ik})'$ is available simultaneously with $X_i$ and that we are interested in analyzing the relation between $a_i$ and the probability distribution of $X_i$. The analysis is realized by exploring the functional relation between $\tau_i = \theta_{1i} - \theta_{0i}$ and $a_i$. We can assume a linear relation

$$\tau = Ac,$$

where $\tau = (\tau_1, \ldots, \tau_n)'$, $A = (a_1, \ldots, a_n)'$, and $c = (c_1, \ldots, c_k)'$. The parameter $\tau_i = \log\{p(X_i = 1)/(1 - p(X_i = 1))\}$ is the log odds ratio or logit of the event $X_i = 1$, and the model is called the linear logistic model [14]. A hierarchy of models can be generated by assuming successively more restrictive linear relations among the components of $c$.

D. General Linear Models

Another class of models often used in practical applications is composed of general linear models or linear regression models, where the observed value is considered to be the sum of the effects of some fixed causes and the error. Let $X = (X_1, \ldots, X_n)'$ be an $n$-dimensional random variable, and denote the expectation of $X$ by $E(X) = (\mu_1, \ldots, \mu_n)'$. If $E(X)$ is of the form $A\xi$ with an unknown parameter $\xi = (\xi_1, \ldots, \xi_k)'$, and a given $n \times k$ matrix $A$, then we can express $X$ as

$$X = A\xi + W, \qquad E(W) = (0, \ldots, 0)', \qquad (1)$$

with the error term $W = (W_1, \ldots, W_n)'$. We frequently assume a set of conditions on the distribution of $X$; for example, (i) $X_1, \ldots, X_n$ are mutually independent, (ii) $X_1, \ldots, X_n$ have a common unknown variance $\sigma^2$, (iii) $(X_1, \ldots, X_n)$ is distributed according to an $n$-dimensional normal distribution. The equations (1) together with conditions on the distribution are called a linear model.

Among the methods of statistical analysis of linear models are regression analysis, analysis of variance, and analysis of covariance as explained below, but these are not clearly distinguished from each other. (I) In design-of-experiment analysis, i.e., analysis of variance, the matrix $A$ and the vector $\xi$ in (1) are called a design matrix and an effect, respectively. In this case entries of $A$ are assumed to be either 1 or 0. (II) In regression analysis, we are first given a linear form $x = \sum_{j=1}^{k} a_j \xi_j$ of a vector $a = (a_1, \ldots, a_k)'$ with coefficient vector $\xi = (\xi_1, \ldots, \xi_k)'$. Let $X_1, \ldots, X_n$ be the observed values of $x$ at $n$ points $a_1 = (a_{11}, \ldots, a_{1k})', \ldots, a_n = (a_{n1}, \ldots, a_{nk})'$, respectively, where $n > k$. If the observations are unbiased, that is, if $E(X_i) = \sum_{j=1}^{k} a_{ij} \xi_j$, $i = 1, \ldots, n$, then the model (1) is obtained with $A = (a_{ij})$. Usually one of the components of the vector $a$ is taken as unity. In this framework the form $x = \sum_j a_j \xi_j$ is called the linear regression function or regression hyperplane, and for $k = 2$, the graph of the linear function $x = a_1 \xi_1 + \xi_2$ and its coefficient $\xi_1$ are called the regression line and regression coefficient, respectively. The components of the vector $a$ are called fixed variates or explanatory variables. Frequently we encounter the case where the vector $a$, and consequently the matrix $A$, are random variables. When this is the case, a discussion like that above can be carried out for given $A$ by regarding $A\xi$ in (1) as the conditional expectation of the vector $X$.

E. The Method of Least Squares

Consider the subspace $L(A)$ of the sample space $R^n$ spanned by the column vectors of $A$. Then the dimension $s$ of $L(A)$ equals the rank of $A$, and $L(A)$ and its orthocomplement $L^{\perp}(A)$ are called the estimation space and the error space, respectively. The orthogonal projection $y$ of a point $x$ to the space $L(A)$ is expressed as $y = P_A x$ with a real projection matrix $P_A$. The variable $Y = P_A X$ is called the least squares estimator of $E(X)$, and the routine of getting such an estimator $Y$, called the method of least squares, minimizes the squared error $(X - A\xi)'(X - A\xi)$ for a given $X$. This method consists of two operations: solving the normal equation $A'A\xi = A'X$ with respect to $\xi$, and setting $Y = A\hat{\xi}$, where $\hat{\xi}$ is a solution of the equation. For $s = k$, we obtain $Y = A(A'A)^{-1}A'X$ directly. Even when $s < k$, where the solution of the normal equation is not unique, $Y$ is uniquely determined. The quantity $Q = X'(I - P_A)X$, where $I$ is the unit $n \times n$ matrix, is the squared distance of the point $X$ from the space $L(A)$ and is called the error sum of squares with $n - s$ degrees of freedom.

A linear function $\beta'\xi$ of the parameter $\xi$ with coefficient vector $\beta = (\beta_1, \ldots, \beta_k)'$ is called a linearly estimable parameter (or estimable parameter) if there is a linear unbiased estimator, that is, an unbiased estimator of the form $b'X$, of $\beta'\xi$. In order that $\beta'\xi$ be estimable it is necessary and sufficient that $\beta'$ be a linear combination $u'A$ of the row vectors of the matrix $A$. A linear unbiased estimator that has minimum variance among all linear unbiased estimators uniformly in $\xi$ is called the best linear unbiased estimator (b.l.u.e.). If the conditions (i) and (ii) of Section D are satisfied, then for any given $n$-vector $u$ the b.l.u.e. of a parameter $\gamma = u'A\xi$ is given by $\gamma^* = u'Y$ with $Y = P_A X$, and its variance equals $(u'P_A u)\sigma^2$, while the expectation of the quantity $Q$ is
given by $(n - s)\sigma^2$. This proposition is known as the Gauss-Markov theorem. Hence the b.l.u.e. $\gamma^* = u'Y$ is frequently cited as the least squares estimator of $\gamma$. The quantity $\hat{\sigma}^2 = Q/(n - s)$ is an unbiased estimator of $\sigma^2$ and is called the mean square error. If in addition the condition (iii) of Section D is assumed, then $\gamma^*$ and $\hat{\sigma}^2$ are the uniformly minimum variance unbiased estimators of $\gamma$ and $\sigma^2$, respectively.

When the error term $W$ in (1) has covariance matrix $\Sigma = \sigma^2 \Sigma_0$ with an unknown real parameter $\sigma^2$ and a known matrix $\Sigma_0$, the value of the parameter $\xi$ minimizing $Q = (X - A\xi)'\Sigma_0^{-1}(X - A\xi)$ is called the generalized least squares estimator of $\xi$ if it exists. This estimator has properties similar to those of the least squares estimator.

F. Model Selection and the Method of Maximum Likelihood

When a parametric family of distributions $\{f(\cdot \mid \theta); \theta \in \Theta\}$ is given and an observation $x$ is made, $\log f(x \mid \theta)$ provides an unbiased estimate of the expected log likelihood of the distribution $f(\cdot \mid \theta)$ with respect to the true distribution of the observation. The value of $\theta$ which maximizes this estimate is the maximum likelihood estimate of the parameter and is denoted by $\theta(x)$ (→ 399 Statistical Estimation). In a practical application we often have to consider a multiple model, defined by a set of component models $\{f_i(\cdot \mid \theta_i); \theta_i \in \Theta_i\}$ ($i = 1, \ldots, k$). The problem of model selection is concerned with the selection of a component model from a multiple model. The difference between the difficulties of handling a simple model, defined by a one-component model, and a multiple model is quite significant. For a simple model $\{f(\cdot \mid \theta); \theta \in \Theta\}$, each member of the family is a probability distribution. In the case of a multiple model, its member is a model which is simply a collection of distributions and does not uniquely specify a probabilistic structure for the generation of an observation. Thus the likelihood of each component model with respect to the observation $x$ cannot be defined and the direct extension of the method of maximum likelihood to the problem of model selection is impossible. This constitutes a serious difficulty for the handling of multiple models. Apparently, Fisher used the procedure of testing to solve this difficulty.

(1) Analysis of log Likelihood Ratios. The procedure of model selection by testing, which is applicable to a wide class of models, is the method of analysis of log likelihood ratios [15]. Consider the situation where a model is to be determined by using a hierarchy of models $\{f(\cdot \mid \theta_i); \theta_i \in \Theta_i\}$ ($i = 1, \ldots, k$) such that $\Theta_1 \subset \Theta_2 \subset \cdots \subset \Theta_k$. The comparison of models is then realized through the comparison of the maximum likelihoods $f(x \mid \theta_i(x))$, where $\theta_i(x)$ denotes the maximum likelihood estimate of $\theta_i$ based on the data $x$. For $\Theta_i \subset \Theta_j$ the log likelihood ratio is defined by

$$\lambda(\Theta_i/\Theta_j; x) = -2 \log\{f(x \mid \theta_i(x))/f(x \mid \theta_j(x))\}.$$

The analysis of log likelihood ratios is realized by the decomposition

$$\lambda(\Theta_1/\Theta_k; x) = \lambda(\Theta_1/\Theta_2; x) + \lambda(\Theta_2/\Theta_3; x) + \cdots + \lambda(\Theta_{k-1}/\Theta_k; x).$$

The log likelihood ratios $\lambda(\Theta_{k-1}/\Theta_k; x), \lambda(\Theta_{k-2}/\Theta_{k-1}; x), \ldots, \lambda(\Theta_1/\Theta_2; x)$ are successively tested by referring to chi-square distributions with the degrees of freedom $d(k) - d(k-1), d(k-1) - d(k-2), \ldots, d(2) - d(1)$, respectively, where $d(i)$ denotes the dimension of the manifold $\Theta_i$. The assumption of the chi-square distributions is only asymptotically valid under the usual regularity conditions (→ 400 Statistical Hypothesis Testing). The model defined with $\Theta_i$ for which $\lambda(\Theta_{i-1}/\Theta_i; x)$ first becomes significant is selected and $f(y \mid \theta_i(x))$ is accepted as the predictive distribution. The problem of how to choose the levels of significance to make the test procedure a procedure for model selection remains open.

(2) Model selection by AIC. One way out of the difficulty of model selection is to assume a prior distribution over $\Theta_i$ for each model $\{f_i(\cdot \mid \theta_i); \theta_i \in \Theta_i\}$. This leads to Bayesian modeling, which is discussed in Section G. Another possibility is to replace each component model $\{f_i(\cdot \mid \theta_i); \theta_i \in \Theta_i\}$ by a distribution $f_i(\cdot \mid \theta_i(x))$ specified by the maximum-likelihood estimate $\theta_i(x)$. The problem here is how to define the likelihood of each distribution $f_i(\cdot \mid \theta_i(x))$. An information criterion AIC was introduced by H. Akaike [16] for this purpose; it is defined by

$$\mathrm{AIC} = (-2) \log_e(\text{maximum likelihood}) + 2\,(\text{number of estimated parameters}).$$

We may consider $-0.5\,\mathrm{AIC}$ to be the log "likelihood" of $f(\cdot \mid \theta(x))$ which is corrected for its bias as an estimate of $E_x E_y \log f(y \mid \theta(x))$, where $E_x$ denotes the expectation with respect to the true distribution of $x$, and where it is assumed that $x$ and $y$ are independent and identically distributed. The maximum "likelihood" estimate of the model is then defined by the model with minimum AIC. This realizes a procedure of model selection that avoids the ambiguity of the testing procedure. It is applicable, at least formally, even to the case of a nonhierarchical set of models.
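The minimum AIC procedure can be sketched for a concrete hierarchy. The example below is only an illustration under stated assumptions: the component models are normal linear models whose mean is a polynomial of degree 0, 1, 2, ... in a scalar explanatory variable, the coefficients are obtained by solving the normal equation $A'A\xi = A'X$, the variance is estimated by maximum likelihood as $Q/n$, and the count of estimated parameters includes that variance. The function names (`solve`, `aic_polynomial`, `select_degree`) and the simulated data are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))) / m[i][i]
    return z


def aic_polynomial(ts, xs, degree):
    """AIC of a normal linear model whose mean is a polynomial of the given degree."""
    n, k = len(xs), degree + 1
    A = [[t ** j for j in range(k)] for t in ts]  # design matrix, n x k
    AtA = [[sum(A[i][p] * A[i][q] for i in range(n)) for q in range(k)]
           for p in range(k)]
    AtX = [sum(A[i][p] * xs[i] for i in range(n)) for p in range(k)]
    xi = solve(AtA, AtX)  # least squares solution of the normal equation
    Q = sum((xs[i] - sum(A[i][j] * xi[j] for j in range(k))) ** 2
            for i in range(n))  # error sum of squares
    sigma2 = Q / n  # maximum likelihood estimate of the error variance
    max_loglik = -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)
    return -2.0 * max_loglik + 2.0 * (k + 1)  # k coefficients plus the variance


def select_degree(ts, xs, max_degree=5):
    """Return the degree with minimum AIC together with all AIC values."""
    aics = {d: aic_polynomial(ts, xs, d) for d in range(max_degree + 1)}
    return min(aics, key=aics.get), aics


if __name__ == "__main__":
    rng = random.Random(1)
    ts = [-1.0 + 2.0 * i / 59 for i in range(60)]
    xs = [1.0 + 2.0 * t - 3.0 * t * t + rng.gauss(0.0, 0.3) for t in ts]
    best, aics = select_degree(ts, xs)
    print("selected degree:", best)
    for d in sorted(aics):
        print(d, round(aics[d], 2))
```

Because the comparison uses only the AIC values, the same selection rule would apply unchanged to a set of models that is not nested, which is the formal advantage over successive testing noted above.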
G. Bayesian Models

Consider the situation where an observation $x$ is made and it is desired to produce an estimate $p(y \mid x)$ of the true distribution of a future observation $y$. $p(y \mid x)$ is called a predictive distribution. Assume that $x$ and $y$ are sampled from one of the distributions within the family $\{g(y \mid \theta) f(x \mid \theta); \theta \in \Theta\}$, i.e., $x$ and $y$ are stochastically independent but share common structural information represented by $\theta$. As a design criterion of $p(y \mid x)$ we assume a probability distribution $\pi(\theta)$ of $\theta$. The model $\{f(\cdot \mid \theta); \theta \in \Theta\}$ with $\pi(\theta)$ is called a Bayesian model and $\pi(\theta)$ is called the prior distribution. From the relation

$$E_{x,y} \log p(y \mid x) = E_x E_{y \mid x} \log p(y \mid x),$$

where $E_{y \mid x}$ denotes the expectation of $y$ conditional on $x$ and $E_x$ the expectation with respect to the marginal distribution of $x$, it can be seen that the optimal choice of $p(y \mid x)$ which maximizes the expected log likelihood is given by the conditional distribution

$$p(y \mid x) = \int g(y \mid \theta)\, p(\theta \mid x)\, d\theta,$$

where $p(\theta \mid x)$ is the posterior distribution of $\theta$ defined by

$$p(\theta \mid x) = \frac{f(x \mid \theta)\, \pi(\theta)}{\int f(x \mid \theta)\, \pi(\theta)\, d\theta}.$$

When a prior distribution $\pi(\theta)$ is specified, the parametric family of distributions $\{f(\cdot \mid \theta); \theta \in \Theta\}$ is converted into a stochastic structure which specifies a probability distribution of the observations. The likelihood of the structure, or the Bayesian model, with respect to an observation $x$ is defined by

$$\int f(x \mid \theta)\, \pi(\theta)\, d\theta.$$

When there is uncertainty about the choice of the prior distribution we can consider a set of possible prior distributions and apply the method of maximum likelihood. Such a procedure is called the method of type II maximum likelihood by I. J. Good [17]. For a multiple model $\{f_i(\cdot \mid \theta_i); \theta_i \in \Theta_i\}$ ($i = 1, \ldots, k$), if prior distributions $\pi_i(\theta_i)$ are defined, a model selection procedure is realized by selecting the Bayesian model with maximum likelihood.

Bayesian modeling has often been considered as not quite suitable for scientific applications unless the prior distribution is objectively defined. However, even the construction of an ordinary statistical model is always heavily dependent on our subjective judgment. Once the objective nature of the likelihood of a Bayesian model is recognized, the selection or determination of a Bayesian model can proceed completely analogously to Fisher's scheme of statistical inference (→ e.g., [18]).

The basic underlying idea of both the minimum AIC procedure and Bayesian modeling is the balancing of the complexity of the model against the amount of information available from the data. This unifying view of the construction of statistical models is obtained by the introduction of entropy as the criterion for judging the goodness of fit of a statistical model (→ [19] for more details).

References

[1] R. A. Fisher, Statistical methods for research workers, Oliver & Boyd, 1925; Hafner, fourteenth edition, 1970.
[2] L. Boltzmann, Über die Beziehung zwischen dem zweiten Hauptsatze der mechanischen Wärmetheorie und der Wahrscheinlichkeitsrechnung respektive den Sätzen über das Wärmegleichgewicht, Wiener Berichte, 76 (1877), 373-435.
[3] I. N. Sanov, On the probability of large deviations of random variables, IMS and AMS Selected Transl. Math. Statist. Prob., 1 (1961), 213-244. (Original in Russian, 1957.)
[4] K. Pearson, Contributions to the mathematical theory of evolution II, Skew variation in homogeneous material, Philos. Trans. Roy. Soc. London, ser. A, 186 (1895), 343-414. Also included in Karl Pearson's early statistical papers, Cambridge Univ. Press, 1948, 41-112.
[5] E. Wong, The construction of a class of stationary Markoff processes, Proc. Amer. Math. Soc. Symp. Appl. Math., 16 (1963), 264-276. Also included in A. H. Haddad (ed.), Nonlinear systems, Dowden, Hutchinson & Ross, 1975, 33-45.
[6] E. T. Jaynes, Information theory and statistical mechanics, Phys. Rev., 106 (1957), 620-630.
[7] S. Kullback, Information theory and statistics, Wiley, 1959 (Dover, 1967).
[8] I. J. Good, Maximum entropy for hypothesis formulation, especially for multidimensional contingency tables, Ann. Math. Statist., 34 (1963), 911-934.
[9] C. E. Shannon and W. Weaver, The mathematical theory of communication, Univ. of Illinois Press, 1949.
[10] S. R. Searle, Linear models, Wiley, 1971.
[11] F. A. Graybill, Theory and application of the linear model, Duxbury Press, 1976.
[12] K. G. Jöreskog, A general method for analysis of covariance structures, Biometrika, 57 (1970), 239-251.
[13] Y. M. M. Bishop, S. E. Fienberg, and P. W. Holland, Discrete multivariate analysis: Theory and practice, MIT Press, 1975.
404 A 1526
Statistical Quality Control

[14] D. R. Cox, The analysis of binary data, Chapman & Hall, 1970.
[15] I. J. Good, Comments on the paper by Professor Anscombe, J. Roy. Statist. Soc., ser. B, 29 (1967), 39-42.
[16] H. Akaike, A new look at the statistical model identification, IEEE Trans. Automatic Control, AC-19 (1974), 716-723.
[17] I. J. Good, The estimation of probabilities, MIT Press, 1965.
[18] G. E. P. Box, Sampling and Bayes' inference in scientific modeling and robustness, J. Roy. Statist. Soc., ser. A, 143 (1980), 383-430.
[19] H. Akaike, A new look at the Bayes procedure, Biometrika, 65 (1978), 53-59.

404 (XVIII.14)
Statistical Quality Control

A. General Remarks

According to the Japanese Industrial Standard (JIS) Z 8101, "Quality Control (QC) is a system comprising all the methods used in manufacturing products or providing services economically that meet the quality requirements of consumers." To emphasize that modern quality control makes use of statistical methods, it is sometimes referred to as Statistical Quality Control (SQC). In order to implement effective QC, statistical concepts and methods must be applied and the "Plan-Do-Check-Action" (PDCA) cycle must be followed in research and development, design, procurement, production, sales, and so on. These QC activities are executed on a company-wide basis from the top management to the production workers. This type of QC is called Company-Wide Quality Control (CWQC) or Total Quality Control (TQC).

The quality Q is an abstract notion of the conformity of a product or service to consumers' requirements; it also refers to the total of the characteristics of a product or service as perceived by consumers. The quality characteristics may include both measurable physical and/or chemical features, such as strength and purity, or features such as color or texture as appreciated by individuals. These latter characteristics could be called "consumer qualities." Furthermore, the concept "quality" has also been used to describe the social impact of a product or service. This might be called "social quality." Examples of social-quality issues are pollution by solid waste or drainage in the production stage; degradation, maintainability, and safety of a product in daily use; and pollution following disposal. For a product or service to conform to this sort of quality it has become necessary to conduct QC activities not only during production but also at early stages of design and development of new products.

The measured characteristics of quality vary from one product to another because of natural variability in the material and production process involved, ability of individual workers, errors in different sorts of measurement, etc. If the variations among the measured values from a process can be attributed to "chance causes" and their distribution expressed by a probability or a probability density function, the process is said to be in a "state of statistical control" according to W. A. Shewhart or in a "stable state" by JIS. In this case the value of a characteristic is deemed to be the realization of a random variable X. Sometimes the variations are attributed to "assignable causes," which must be identified and eliminated.

B. Control Charts

The control chart provides a means of evaluating whether a process is in a stable state. The control chart is made by plotting points illustrating a statistic of the quality characteristics or manufacturing conditions for an ordered series of samples or subgroups. A sheet of the control chart is provided with a middle line between a pair of lines depicting the upper control limit (UCL) and the lower control limit (LCL). The stable state is assumed to be exhibited by points within the control limits. Points falling outside the control limits suggest some assignable causes, which should be eliminated through corrective measures.

The idea underlying control charts as developed by Shewhart is to apply the statistical principle of significance to the control of production processes. Other types of control charts have also been developed, for example, acceptance control charts and adaptive control charts. These have been successfully applied to many quality control problems.

The foundation of Shewhart's control chart is the division of observations into what are called "rational subgroups." A rational subgroup is the one within which variability is due only to chance causes. Between different subgroups, however, variations due to assignable causes might be detected. In most production processes the rational subgroup comprises the data collected over a short period of time during which essentially the same con-
1527 404 c
Statistical Quality Control

dition in material, tool setting, environmental observed at a ratio of 97.7%, which is the value
factors, etc., prevails. of the risk /j’ for each plotted point, under the
The limits on the control charts are placed, normal distribution of the plotted statistic.
according to Shewhart, at a 30 distance from One reason why the control chart has been a
the middle line, where 0 is the population practical tool in many applications is this lack
standard deviation (or standard error) of the of sensitivity to a relatively small shift of the
statistic; 3a expresses limits of variability level. If greater sensitivity is required, 20 limits,
within the subgroup. Assuming that the popu- “warning limits” are used. This results in a
lation distribution of an observed character- greater risk c( of erroneously finding a process
istic is “normal,” the range between the limits out of control.
should include 99.7% of the points plotted so Other decision criteria based on aspects of
long as the process is “in control” at the mid- run theory are also used. Charts using ac-
dle value. Accordingly, 0.3% of the plotted cumulated data from several rational sub-
points from the “in control” process fall out- groups for each plotted value are sometimes
side the limits, and thus give an erroneous recommended: Moving average and moving
“out of control” signal. range charts and the cusum (“cumulative
To determine that the process is in con- sum”) chart are examples. The statistical
trol for a normally distributed characteristic theory for these charts is more complicated
N(p, c*), we have to investigate the variability than that for the simple charts discussed here.
between the means p and the standard devi-
ations 0 of different distributions of X for
different subgroups. Thus the state of control C. Sampling Inspection
of a process is determined with control charts
for Sampling inspection determines whether a lot
should be accepted or rejected by drawing a
x = i X&l and s=/F sample from it, observing a quality character-
i=l
istic of the sample, and comparing the ob-
which are the appropriate statistics corre- served value to a prescribed acceptance crite-
sponding to p and c. Despite the theoretical rion. Sampling may be conducted in several
drawback of the statistical range R = maxi Xi - stages. Definite criteria are required to decide
min,X, against s, use of the range is often pre- at each stage whether to accept or reject the
ferred in QC work because of its simplicity lot or continue sampling on the basis of sam-
in computation. Hence the % R charts are ple values observed so far. There also must
obtained from the previously collected k ra- be some rules to determine the size of the next
tional subgroups each of size it as follows: sample if it is to be taken. These criteria and
rules together are called the sampling inspec-
UCL=X=+A,R, UCL=D,R,
tion plan. The number of samples eventually
LCL=X-A$. LCL=D& drawn and observed and their sizes are gener-
ally random variables. In single sampling in-
where 2 and l? are the averages of the k values spection the final decision is always reached
of X and R, respectively, and after one stage of sampling is completed. Dou-
ble sampling inspection makes the final deci-
4=1-3:; sion after at most two stages of sampling are
2
completed. Multiple sampling inspection makes
E[R] =&a, VCR] = E[(R - E[R])‘] the final decision after at most N stages of
sampling are completed (N < co). Inspection
=d 3 262.
without a predetermined limit on the number
For n < 7, LCL for R cannot be given because of sampling stages, sequential sampling inspec-
D3 becomes negative. tion, is usually constructed so that the proba-
The other commonly used control charts are bility of the indefinite continuation of sam-
the p chart (proportion of nonconformity: pling is 0.
binomial distribution) and the c chart (number Once a sampling inspection plan is deter-
of defects: Poisson distribution). For those mined, the probability for accepting a lot with
charts the above theory of normal distribution given composition can be calculated. This
is also used to approximate the binomial and probability as a function of lot composition is
Poisson values. called the operating characteristic of the plan.
It is generally sufficient to use the agreed-on In most cases, the quality of a lot is expressed
decision criterion (3~ limits) and to recognize a by a real parameter fl (e.g., fraction defective,
relatively small risk (tl= 0.003) for practical i.e., percentage of defective products, or the
purposes. It should be noted, however, that a average of some quality characteristic), and we
shift of the process mean p by la would not be use only inspection plans whose operating
characteristics are expressed as a function of $\theta$. The graph of this function is called an OC-curve. We impose certain desirable conditions on the OC-curve and design plans to satisfy them. Tables for this purpose, sampling inspection tables, are prepared for practical use. The condition most frequently employed is expressed in the following form in terms of four constants $\theta_0$, $\theta_1$, $\alpha$, $\beta$: The probability of rejection is required to be at most $\alpha$ when $\theta\le\theta_0$ (or $\theta\ge\theta_0$), and the probability of acceptance at most $\beta$ when $\theta\ge\theta_1$ (or $\theta\le\theta_1$). Here $\alpha$ is called the producer’s risk, and $\beta$ the consumer’s risk. If the rejection of a lot is identified with the rejection of a statistical hypothesis $\theta\le\theta_0$, the OC-curve is actually the power curve of the test upside down (i.e., the graph of 1 minus the †power function), and the producer’s and consumer’s risks are precisely the †errors of the first and second kind. The choice of a plan is actually the choice of a test under certain conditions on its power curve. Commonly used plans are mostly based on well-established tests, some of which have certain optimum properties. A few examples, (1)-(4), are given below. Here sampling inspection by attribute is an inspection plan that uses a statistic with a discrete distribution, whereas sampling inspection by variables uses a statistic with a continuous distribution (→ 400 Statistical Hypothesis Testing).

(1) Single sampling inspection by attribute concerning the fraction of defective items in a lot: Let the defective fraction be denoted by $p$ and identified with $\theta$ in the preceding paragraph. Assign two values of $p$, say $p_0$ and $p_1$ ($0<p_0<p_1<1$), the producer’s risk $\alpha$, and the consumer’s risk $\beta$. Together they give conditions to be fulfilled by the OC-curve. Draw $n$ items from a lot at random, and suppose that they contain $Z$ defective items. The decision is then made after observing $Z$, whose distribution is †hypergeometric and approximately †binomial when the size of the lot is large enough. There exists a plan that minimizes $n$ among all plans satisfying the imposed conditions under either of the two assumptions about the distribution of $Z$. It rejects the lot when $Z$ is greater than a fixed number determined by $p_0$, $p_1$, $\alpha$, and $\beta$. This plan is based on the †UMP test of the hypothesis $p\le p_0$ against the alternative $p\ge p_1$.

(2) Single sampling inspection by variables concerning the population mean $\mu$ in the case where the population distribution is $N(\mu,\sigma^2)$ with known $\sigma^2$: Draw a sample $(X_1,\dots,X_n)$ of size $n$ from a lot. Assume that the $X_i$ are independently distributed with the same distribution $N(\mu,\sigma^2)$. Suppose that smaller values of the quality characteristic stand for a more desirable quality. If two values of $\mu$, say $\mu_0$ and $\mu_1$, $\alpha$, and $\beta$ are assigned, a plan can be established to minimize $n$. It rejects the lot when the sample mean $\bar X=\sum_{i=1}^{n}X_i/n$ exceeds a fixed number determined by $\mu_0$, $\mu_1$, $\alpha$, and $\beta$. This too is based on the UMP test of $\mu\le\mu_0$ against $\mu\ge\mu_1$.

(3) Cases where the samples are drawn in more than one stage: As in (2), assign two values of $\theta$, $\alpha$, and $\beta$; there is still liberty to choose $n_1, n_2, \dots$, which are the sizes of the samples drawn at each stage. Hence there are many possible plans fulfilling the imposed conditions. Among them a plan is sought to minimize the expectation of $n=n_1+n_2+\cdots$ (called the average sample number). For example, plans based on the sequential probability ratio tests are in common use (→ 400 Statistical Hypothesis Testing).

(4) Among other special plans, sampling inspection with screening and sampling inspection with adjustment are worthy of mention. In the first plan, all the units in the rejected lots are inspected and defective units replaced by nondefective ones. In this case, fixing $p_1$ (the lot tolerance percent defective) and $\beta$, or the average fraction defective after the inspection (average outgoing quality level), we attempt to minimize the expected amount of inspection, that is, the expected number of inspected units including those in the rejected lots. In the second plan, acceptance criteria are tightened or loosened according to the quality of the lots just inspected.

References

[1] J. M. Juran, Quality control handbook, McGraw-Hill, 1974.
[2] Japanese Standard Association, Terminology, JIS Z 8101, 1963.
[3] Japanese Standard Association, Sampling inspection, JIS Z 9001-9006, 1957.
[4] Japanese Standard Association, Control charts, JIS Z 9021-9023, 1963.

405 (XVII.16)
Stochastic Control and Stochastic Filtering

A. General Description of Stochastic Control

Stochastic control is an optimization method for systems subject to random disturbance. Let $\Gamma$ be a compact convex subset of $R^k$, called a control region. Let $W_t$ be an $m$-dimensional Brownian motion and $\sigma_t(W)=\sigma(W_s; s\le t)$ (say $\mathcal F_t$) be the least †σ-field for which $W_s$, $s\le t$, are
measurable. An $\mathcal F_t$-progressively measurable $\Gamma$-valued process is called an admissible control. For an admissible control $U$, the system evolves according to an $n$-dimensional controlled stochastic differential equation (CSDE)
$$dX_t=\alpha(X_t,U_t)\,dW_t+\gamma(X_t,U_t)\,dt,$$
where a symmetric $n\times n$ matrix $\alpha(x,u)$ and $n$-vector $\gamma(x,u)$ are continuous in $R^n\times\Gamma$ and Lipschitz continuous in $x\in R^n$. Hence the CSDE has a unique solution, called the response for $U_t$. The problem is to maximize (or minimize) the performance $J$:
$$J(\tau,x,\varphi,U)=E_x\Big[\int_0^{\tau}\exp\Big(-\int_0^{t}c(X_s,U_s)\,ds\Big)f(X_t,U_t)\,dt+\exp\Big(-\int_0^{\tau}c(X_s,U_s)\,ds\Big)\varphi(X_\tau)\Big],$$
where $X_t$ is the response for $U_t$ with $X_0=x$ and $\tau$ is a constant time or a †hitting time associated with a target set. We put $V(\tau,x,\varphi)=\sup_{U:\,\text{adm. control}}J(\tau,x,\varphi,U)$, the value function as a function of $x$. If the supremum value is attained at an admissible control $\hat U_t$, then $\hat U_t$ is called an optimal control.

B. Bellman Principle

In order to get $V(t+s,x,\varphi)$, R. Bellman applied the following two-stage optimization. After using any $U$ up to time $t$, a controller changes $U$ to an optimal one. Then at time $t+s$ the performance $J(t,x,V(s,\cdot,\varphi),U)$ is obtained. Taking the supremum with respect to $U$, one gets $V(t+s,x,\varphi)$. This is called the Bellman principle. Let $C$ be the †Banach lattice of the totality of bounded and uniformly continuous functions on $R^n$. Suppose that $\alpha$, $\gamma$, $f$, and $c$ ($\ge 0$) are bounded and smooth; then for constant time $t$, the value function $V(t,x,\varphi)$ belongs to $C$ whenever $\varphi\in C$. Moreover, the family of operators $V(t)$ defined by $V(t)\varphi(x)=V(t,x,\varphi)$ becomes a †monotone contraction semigroup on $C$. The semigroup property $V(t+s,x,\varphi)=V(t,x,V(s,\cdot,\varphi))$ is nothing but the Bellman principle. The †generator $G$ is expressed by
$$G\varphi=\sup_{u\in\Gamma}\{L^u\varphi-c(x,u)\varphi+f(x,u)\}$$
for a smooth function $\varphi$, where $L^u$ is the generator of †diffusion of the response for constant control $u$ ($\in\Gamma$), namely,
$$L^u=\frac12\sum_{i,j=1}^{n}(\alpha^2)_{ij}(x,u)\frac{\partial^2}{\partial x_i\,\partial x_j}+\sum_{i=1}^{n}\gamma_i(x,u)\frac{\partial}{\partial x_i}.$$
Furthermore, assume that $\alpha$ is uniformly positive definite and $\varphi$ is smooth. Then $V(t)\varphi(x)$ belongs to †$W^{1,2}_{p,\mathrm{loc}}$ for any $p$ and is the unique solution of the Bellman equation (= dynamic programming equation)
$$\frac{\partial W}{\partial t}=\sup_{u\in\Gamma}\{L^uW-c(x,u)W+f(x,u)\}\quad\text{a.e. on }(0,\infty)\times R^n,\qquad W(0,x)=\varphi(x)\ \text{on }R^n.$$
In addition if $\inf_{x,u}c(x,u)>0$, then $W=\lim_{t\to\infty}V(t)\varphi$ exists and is the unique solution of the Bellman equation $\sup_{u\in\Gamma}\{L^uW-c(x,u)W+f(x,u)\}=0$ a.e. on $R^n$. When $\tau$ is the hitting time, the value function is related to the Bellman equation with a boundary condition.

C. Feedback Control

In practical problems we specify the kind of information on which the decision of the controller can be based at each time. We frequently assume that the data obtained up to that time is available. The following situations are possible: (1) The controller knows the complete state of the system. This is called the case of complete observation. (2) The controller has partial knowledge of the state of system. This is called the case of partial observation. A feedback control (= policy) is a function of available information, namely, a $\Gamma$-valued progressively measurable function defined on $[0,\infty)\times C^j[0,\infty)$, where $j$ is the dimension of data and $C^j[0,\infty)$ is a metric space of totality of $j$-vector valued continuous functions on $[0,\infty)$. A policy $U$ is called a Markovian policy if $U(t,\xi)$ is a Borel function of $t$ and the $t$th coordinate of $\xi$. When a policy $U$ is applied, the system is governed by the †SDE
$$dX_t=\alpha(X_t,U(t,Y))\,dW_t+\gamma(X_t,U(t,Y))\,dt$$
with data process $Y$. When the SDE has a †weak solution, $U$ is called admissible. For example, when $X=Y$, any Markovian policy is admissible if $\alpha$ is uniformly positive definite. Let $X_t$ be a weak solution for $U$. Then its performance $J(\tau,x,\varphi,U)\le V(\tau,x,\varphi)$.

(1) The case of complete observation. When $\alpha$ is uniformly positive definite, an optimal Markovian policy can be constructed in the following way. Since $\Gamma$ is compact, there exists a Borel function $\hat U$ on $[0,\infty)\times R^n$ which gives the supremum, namely,
$$\sup_{u\in\Gamma}\{L^uV(t)\varphi(x)-c(x,u)V(t)\varphi(x)+f(x,u)\}=L^{\hat U(t,x)}V(t)\varphi(x)-c(x,\hat U(t,x))V(t)\varphi(x)+f(x,\hat U(t,x)).$$
This relation implies that $V(t)\varphi(x)=J(t,x,\varphi,\hat U(t,X_t))$ for any weak solution $X_t$. Hence
0 is an optimal Markovian policy. Especially D. Stochastic Maximum Principle


when a, c, and j are independent of u and
y(x, u) = R(x). u, where R(x) is an n x k matrix, A stochastic version of +Pontryagin’s maxi-
an optimal Markovian policy 0 is obtained mum principle gives a necessary condition for
by a measurable selection of maximum points optimality. This means that the in:stantaneous
of (grad. V(t)q, R(x). u). Since the supremum value of optimal control maximizes the sto-
of a linear form is attained at the boundary’ chastic analog of Pontryagin’s Hamiltonian.
al-, one can suppose that u(t, x) belongs to Suppose that the system evolves according to
ar. This is called hang-hang control. an n-dimensional CSDE
(2) The case of partial observation. One
dX,=cc(X,)dw+y(X,, UJdt.
useful method is the separation principle. This
means that the control problem can be split The problem is to seek conditions on admis-
into two parts. The first is the mean square sible control U, such that E,[jtf(X,, UJdt] is
estimate for the system using a tfiltering. The maximized, where T is a constant lime. Assume
second is a stochastic optimal control with that a, y, and f are bounded and smooth.
complete observation. But generally speaking, Define a Hamiltonian H on R” x I‘ x R” by
the problem of deciding under what conditions H(x, u, Y) = y(x, u) Y + f (x, u). Let DCbe op-
the separation principle is valid is difficult. In timal and r?, its response starting at x. Then
the case of the following linear regulator the under some conditions there exist .i > 0 and z-
separation principle holds. progressively measurable qt,k = (qt,k,, qt,k,, ,
Suppose that the system process X, and the qt,k,) (k = 1, . , n) and Y, = (Y,, r Y,,,) which
observation process y obey the following satisfy the SDE
SDEs:

dX,=A(t)dM/;+(B(t)X,+h(t, cl@, Y))dt,

dx=d@+H(t)X,dt, +qt,kd&, k=l,.../ n,

where A(t), B(t), and H(t) are nonrandom and H(z?,, ~~,Yt)=max,,,H($,u,Y,) a.e.
matrix-valued functions and mc is a j-
dimensional Brownian motion independent of
E. Optimal Stopping and Impulse Control
W,. The problem is to search for a feedback
control which gives the maximum value. Sup- Suppose that X, is an n-dimensional diffusion
pose that a feedback control U(t, 5) is Lip-
whose generator A is an elliptic differential
schitz continuous in 5 E C’[O, m). Put
operator. Let z be a tstopping time. The op-
Qh 4 timal stopping problem is to seek a stop-
ping time F so that E,[g(X,] is maximized,
T
1 sup E,,
U,Lip
Ut, X,, u(t, VW + ‘W,)

where (X,, 7) is the unique solution for U with


1
> where g is nonnegative and continuous.
called optimal. The value function V(x) =
sup,E,[g(X,)] is characterized
? is

as the least
the initial condition X,=x, Y, = 0. By the Lip- texcessive majorant of g. Moreover, under
schitz condition of U, a, = rr( Y,; s < t) is inde- some conditions V belongs to the domain of A
pendent of U, and the tconditional expectation and is the unique solution of the +free bound-
ary problem; V>g, AV<O, and (V-g). AV=O.
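The description of the value function as the least excessive majorant of g has a simple discrete analog that can be computed directly. The sketch below is only an illustration under assumptions that are not in the text: the diffusion is replaced by a symmetric random walk on a finite grid absorbed at both ends, so "excessive" means v(i) >= (v(i-1) + v(i+1))/2, and the value is found by iterating V <- max(g, PV) starting from V = g. The function name and the reward vector are hypothetical.

```python
def optimal_stopping_value(g, tol=1e-12, max_iter=100_000):
    """Iterate V <- max(g, P V) for a symmetric walk absorbed at both ends.

    g is a list of nonnegative rewards on the grid 0..len(g)-1.
    Starting from V = g, the iterates increase to the least
    excessive (here: concave) majorant of g.
    """
    v = list(g)
    for _ in range(max_iter):
        new = v[:]
        for i in range(1, len(v) - 1):
            # Either stop now (g[i]) or continue one step (average of neighbours).
            new[i] = max(g[i], 0.5 * (v[i - 1] + v[i + 1]))
        if max(abs(a - b) for a, b in zip(new, v)) < tol:
            return new
        v = new
    return v


if __name__ == "__main__":
    g = [0.0, 0.0, 4.0, 0.0, 0.0]  # hypothetical reward
    v = optimal_stopping_value(g)
    # Discrete version of the stopping set {x | V(x) = g(x)}.
    stop_set = [i for i in range(len(g)) if abs(v[i] - g[i]) < 1e-9]
    print(v, stop_set)
```

On the grid, the three conditions V >= g, AV <= 0, and (V - g)AV = 0 hold with A replaced by P - I, and stopping is optimal exactly on the states where V = g.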
8, = E(X,/crJ is governed by the following
SDE, by way of the +Kalman-Bucy filter: Therefore, in the Hilbert space framework, the
value function is related to the variational
d&=I’(t)H’(t)dw*+(B(t)g,+h(t, C?J)dt inequality. An optimal stopping ttme is pro-
vided by the hitting time for the set {x 1 V(x) =
with some o,-progressible measurable control
ot and an n-dimensional Brownian motion g(x)).
Impulse control is a variant of the optimal
w* adapted to 0,. Moreover, P(t) is the terror
stopping problem. At some moment (= stop-
matrix satisfying the +Riccati equation, and
ping time) a controller shifts the current state
H’ is the transpose of H. Let g(t, x) be the
to some other state. But not all shifts are al-
probability density of the normal distribution
lowed: State x can be shifted to a state of x +
N(0, P(t)), and put L”(t,z?,u)=JL(t,x,u) g(t,x
[0, m)“. Let 7k, k = 1,2, be a sequence of in-
-2)dx and q(i)=jY(x)g(T,x-2)dx; then
creasing stopping times and & be a [0, co)‘-
the problem turns into
valued crJX)-measurable random variable.
7
Q(s, x) = sup E,,
c [S.s
L(t, 8,, C?Jdt +‘?(&)
1 The sequence U = {71, 5,) z2, t2,. } is called an
impulse control. U transfers the process X, to

Recalling the SDE for X,, we can use the Bell-


man equation for choosing an optimal one.
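As a small numerical companion to the error matrix P(t) in the linear regulator, the following sketch integrates a scalar Riccati equation by Euler steps and compares the result with its stationary value. It assumes the usual Kalman-Bucy form for a scalar system dX = a dW + (bX + ...) dt observed through dY = dW~ + hX dt, namely dP/dt = 2bP - h^2 P^2 + a^2. The function names, coefficients, and step size are illustrative choices, not taken from the text.

```python
import math


def riccati_scalar(a, b, h, p0, t_end, dt=1e-3):
    """Euler integration of dP/dt = 2*b*P - h**2 * P**2 + a**2 (scalar case)."""
    p = p0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        p += dt * (2.0 * b * p - (h * p) ** 2 + a * a)
    return p


def riccati_steady_state(a, b, h):
    """Positive root of 2*b*P - h**2 * P**2 + a**2 = 0."""
    return (b + math.sqrt(b * b + a * a * h * h)) / (h * h)


if __name__ == "__main__":
    a, b, h = 1.0, -0.5, 2.0  # hypothetical coefficients
    p_inf = riccati_steady_state(a, b, h)
    p_T = riccati_scalar(a, b, h, p0=1.0, t_end=10.0)
    print(p_T, p_inf)
```

Because P(t) does not depend on the observations or on the control, it can be computed in advance, which is what makes the separation of estimation and control workable in this case.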
and the problem is to seek $U$ so as to maximize $E_x\big[\int_0^{\infty}e^{-\lambda t}f(X_t^U)\,dt-\sum_{k=1}^{\infty}e^{-\lambda\tau_k}K(\xi_k)\big]$, where $\lambda$ ($>0$) is constant and the function $K$ ($\ge 0$) stands for the cost of shifting. The value function is related to a quasivariational inequality.

F. General Description of Stochastic Filtering

The problem of estimating the original signal from data disturbed by noises is called a stochastic filtering problem. Let $X_t$, $t\in[0,T]$, be a continuous stochastic process with values in $R^n$, called a signal (or system) process. It is transformed (or coded) to $h(t,X_t)$, where $h(t,x)$ is an $m$-vector-valued continuous function. Suppose that it is disturbed by a noise $\dot W_t$ and we observe $\dot Y_t=h(t,X_t)+\dot W_t$. Usually $\dot W_t$ is assumed to be a †white noise independent of $X_t$. Since the white noise is a generalized function, the integral of $\dot Y_t$, i.e.,
$$Y_t=\int_0^t h(s,X_s)\,ds+W_t,$$
is called an observation process, where $W_t$ is an $m$-dimensional Brownian motion independent of $X_t$. It is assumed for convenience that $|X_t|^2$ and $\int_0^T|h(s,X_s)|^2\,ds$ are integrable.

Assume that $X_t$ is a 1-dimensional signal process. The least square estimation of $X_t$ by nonlinear functions of observed data $Y_s$, $s\le t$, is called a nonlinear filter of $X_t$ and is denoted by $\hat X_t$. Let $\mathcal F_t$ or $\sigma(Y_s; s\le t)$ be the least †σ-field for which $Y_s$, $s\le t$, are measurable. Then the filter $\hat X_t$ is equal to an $\mathcal F_t$-measurable random variable such that $E|X_t-\hat X_t|^2\le E|X_t-Z|^2$ holds for any $\mathcal F_t$-measurable $L^2$ random variable $Z$. Hence it coincides with the †conditional expectation $E[X_t\mid\mathcal F_t]$. Now let $H_t$ be the closed linear space spanned by $Y_s$, $s\le t$. The least square estimation of $X_t$ by elements of $H_t$, i.e., the orthogonal projection of $X_t$ onto $H_t$, is called the linear filter of $X_t$ and is denoted by $\tilde X_t$. Obviously, the mean square error of a nonlinear filter is less than or equal to that of a linear filter, but a linear filter is calculated more easily. If $(X_t,Y_t)$ is a Gaussian process, both filters coincide.

When $X_t$ is an $n$-dimensional process $(X_t^1,\dots,X_t^n)$, the $n$-vector process $\hat X_t=(\hat X_t^1,\dots,\hat X_t^n)$ (or $\tilde X_t=(\tilde X_t^1,\dots,\tilde X_t^n)$) is called the nonlinear (or linear) filter of $X_t$.

G. Kalman-Bucy Filter

Suppose that the signal process $X_t$ is governed by a †linear stochastic differential equation (LSDE)
$$dX_t=A(t)X_t\,dt+B(t)\,d\widetilde W_t,$$
where $A(s)$ (or $B(s)$) is an $n\times n$ (or $n\times r$) matrix-valued continuous function, $\widetilde W_t$ is an $r$-dimensional Brownian motion independent of the noise $W_t$, and the initial data $X_0$ is a Gaussian random variable independent of $\widetilde W_t$ and $W_t$. Suppose further that $h(t,x)$ is linear, i.e., $h(t,x)=H(t)x$, where $H(t)$ is an $m\times n$-matrix-valued function. Then the joint process $(X_t,Y_t)$ is Gaussian. Hence the nonlinear filter $\hat X_t$ coincides with the linear filter and satisfies
$$\hat X_t=E[X_0]+\int_0^t\big(A(s)-P(s)H(s)'H(s)\big)\hat X_s\,ds+\int_0^tP(s)H(s)'\,dY_s,$$
where $H(s)'$ is the transpose of $H(s)$, and $P(t)=(P_{ij}(t))$ is the error matrix defined by $P_{ij}(t)=E(X_t^i-\hat X_t^i)(X_t^j-\hat X_t^j)$. It satisfies the matrix Riccati equation
$$\frac{dP(t)}{dt}=A(t)P(t)+P(t)A(t)'-P(t)H(t)'H(t)P(t)+B(t)B(t)',\qquad P(0)=\text{covariance of }X_0.$$
Let $\Phi(t,s)$ be the †fundamental solution of the linear differential equation $dx/dt=(A(t)-P(t)H(t)'H(t))x$. Then the solution $\hat X_t$ is represented by
$$\hat X_t=\Phi(t,0)E[X_0]+\int_0^t\Phi(t,s)P(s)H(s)'\,dY_s.$$
This algorithm is called the Kalman-Bucy filter [11]. Analogous results for discrete-time models have been obtained by Kalman.

H. Nonlinear Filter

In the study of nonlinear filters, the †conditional distribution $\pi_t(dx)=P(X_t\in dx\mid\mathcal F_t)$ is considered besides $\hat X_t$. Suppose that $X_t$ is governed by the SDE
$$X_t=X_0+\int_0^t a(s,X_s)\,ds+\int_0^t b(s,X_s)\,d\widetilde W_s,$$
where $a(s,x)$ (or $b(s,x)$) is an $n$-vector ($n\times r$-matrix) valued Lipschitz continuous function. Then $\pi_t(f)=\int f(x)\,\pi_t(dx)$ satisfies the SDE
$$\pi_t(f)=E[f(X_0)]+\int_0^t\pi_s(L_sf)\,ds+\sum_{i=1}^{m}\int_0^t\big(\pi_s(h_s^if)-\pi_s(h_s^i)\pi_s(f)\big)\big(dY_s^i-\pi_s(h_s^i)\,ds\big),$$
where h,(x) = h(s, x) and ties the LSDE

w
axiaxj’
The density p,(x) (if x exists) satisfies
rt

Under additional conditions on a(s, x) and


b(s, x), n,(dx) has a density function ret(x), and
it satisfies

If h(t, x) is a smooth function, then Qx, y) is


continuous in y, so that am or p,(f) is a
continuous functional of the observed data
(x; s < t). Thus the filter rr, is a +ro bust statistic.
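The ratio structure of the Bayes formula lends itself to a Monte Carlo approximation: sample signal paths from the law of X, weight each path by the Radon-Nikodym density evaluated along the observed data, and take the weighted average of f at the terminal point. The sketch below does this for a scalar model on a time grid. Several ingredients are assumptions rather than statements from the text: the density is taken in the usual Girsanov form exp(integral of h dy minus one half the integral of h^2 ds), discretized with left-point sums; the signal dX = -X dt + dB and the choice h(x) = x are hypothetical; and all function names are invented for the illustration.

```python
import math
import random


def log_weight(path, dy, h, dt):
    """Discretized log-density: sum h(x_k) dy_k - 0.5 * sum h(x_k)^2 dt.

    path has one more point than dy; only left endpoints are paired with dy.
    """
    return sum(h(x) * d - 0.5 * h(x) ** 2 * dt for x, d in zip(path, dy))


def bayes_estimate(paths, dy, h, f, dt):
    """Weighted average of f at the terminal point of each sampled path."""
    logs = [log_weight(p, dy, h, dt) for p in paths]
    m = max(logs)  # subtract the maximum to avoid overflow in exp
    ws = [math.exp(l - m) for l in logs]
    return sum(w * f(p[-1]) for w, p in zip(ws, paths)) / sum(ws)


def simulate_signal(rng, n_steps, dt, x0=0.0):
    """Euler path of the hypothetical signal dX = -X dt + dB."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] - xs[-1] * dt + rng.gauss(0.0, math.sqrt(dt)))
    return xs


if __name__ == "__main__":
    rng = random.Random(0)
    dt, n = 0.01, 100
    h = lambda x: x
    truth = simulate_signal(rng, n, dt)
    # Observation increments dY = h(X) dt + dW along the true path.
    dy = [h(truth[k]) * dt + rng.gauss(0.0, math.sqrt(dt)) for k in range(n)]
    particles = [simulate_signal(rng, n, dt) for _ in range(2000)]
    print(bayes_estimate(particles, dy, h, lambda x: x, dt), truth[-1])
```

Because each weight depends continuously on the increments dy, a small perturbation of the observed data changes the estimate only slightly, which is the robustness property mentioned in the text in a discretized form.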
Remarks. (i) the signal and noise are not
where L* is the formal adjoint of L.
independent if the signal is controlled based
The process 1, = x - jk rc,(h,) ds is a Brown-
on the observed data. In these cas’es, correc-
ian motion such that ~(1,; s < t) c e holds
tion terms are sometimes needed for the SDE
for any t. If ~(1,; s < t)= <e holds for all t, It is
of the nonlinear filter. (ii) If the +sa.mple paths
called the innovation of q. The innovation
of the signal process are not continuous, a
property is not valid in general. A sufficient
similar SDE for a nonlinear filter IIS valid with
condition is that (X,, x) is a Gaussian process
L being replaced by some integrodifferential
or h(t, x) be a bounded function. However, in
operator. If it is a +Markov chain with finite
any case, +&-adapted martingales are always
state, L is the generator of the chain. (iii) Sev-
represented as tstochastic integrals of the form
eral results are known for the case where
XT=1 ~~f,‘(w)dl~, where the fsi are Fs-adapted
the noise K is not a Brownian motion but a
processes.
+Poisson process.
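The statement that the innovation process is a Brownian motion can be checked numerically in the linear Gaussian case, where the conditional mean is given by the Kalman-Bucy filter. The sketch below simulates a scalar signal and observation with Euler steps, runs the filter alongside, and collects the innovation increments dY - h X^ dt; their quadratic variation over [0, T] should be close to T. The scalar model dX = aX dt + b dB, dY = hX dt + dW, the coefficient values, and the function name are assumptions made for this illustration.

```python
import math
import random


def innovation_increments(a, b, h, p0, n_steps, dt, seed=0):
    """Simulate signal, observation, and scalar Kalman-Bucy filter.

    Returns the list of innovation increments dI = dY - h * x_hat * dt.
    """
    rng = random.Random(seed)
    sd = math.sqrt(dt)
    x = rng.gauss(0.0, math.sqrt(p0))  # X_0 ~ N(0, p0)
    x_hat, p = 0.0, p0                 # filter starts at the prior mean and variance
    incs = []
    for _ in range(n_steps):
        dy = h * x * dt + rng.gauss(0.0, sd)            # observation increment
        di = dy - h * x_hat * dt                        # innovation increment
        incs.append(di)
        x_hat += a * x_hat * dt + p * h * di            # filter update
        p += (2.0 * a * p - (h * p) ** 2 + b * b) * dt  # Riccati step
        x += a * x * dt + b * rng.gauss(0.0, sd)        # signal step
    return incs


if __name__ == "__main__":
    n, dt = 20000, 0.001
    incs = innovation_increments(-1.0, 1.0, 1.0, 1.0, n, dt)
    qv = sum(d * d for d in incs)
    print(qv, n * dt)  # quadratic variation versus elapsed time
```

This checks only the quadratic variation, which is a necessary feature of Brownian motion; it does not by itself verify the martingale property or the innovation property in the sense of equality of the σ-fields.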

I. Bayes Formula
References

Let C (or D) be the space of all continuous


mappings x (or y) from [0, T] into R” (or R”) [l] W. F. Fleming and R. W. Rishel, Deter-
equipped with the uniform topology. x, (y,) is ministic and stochastic optimal control,
the value of x (y) at time t. Let g*(C) be the Springer, 1975.
least c-field of C for which x,, s < t, are mea- [2] R. Bellman, Dynamic programming,
surable. &J*(D) is defined similarly. We denote Princeton Univ. Press, 1957.
by %> @IV>%x, Y the +laws of processes X,, I& [S] N. V. Krylov, Controlled diffusion pro-
and (X,, K), respectively. These are defined on cesses, Springer, 1980. (Original in Russian,
&(C), &Q(D), and gr(C)@&(D), respectively. 1977.)
Then @xx,r is equivalent (mutually tabsolutely [4] M. Nisio, Stochastic control theory, Indian
continuous) to the product measure Ox 0 $+, Statist. Inst. Lect. Notes, Macmillan, 1981.
on each g,(C) @ t#(D). The +Radon-Nikodym [S] H. J. Kushner, On the stochastic maxi-
density c(, of @xx,r with respect to @x 0 Qw is mum principle, fixed time of control, J. Math.
written as Anal. Appl., 11 (1965) 78-92.
[6] A. N. Shiryaev, Optimal stopping rules,
Springer, 1980. (Original in Russian, 1976.)
[7] A. Bensoussan and J. L. Lions, Sur la
thtorie du controle optimal. I, Temps d’arret;
II, Controle impulsionnel, Hermann, 1977.
[S] R. S. Bucy and R. E. Kalman, New results
where hi and y’ are the corresponding com- in linear filtering and prediction theory, J.
ponents of vectors and dyf denotes the +Ito Basic Eng. ASME, (D) 83 (1961), 05-108.
integral. [9] R. S. Bucy and D. D. Joseph, Filtering for
The conditional distribution n,(dx) is com- stochastic processes with applications to
puted by the Bayes formula: guidance, Interscience, 1968.
[lo] M. Fujisaki, G. Kallianpur, and H.
Kunita, Stochastic differential equations for
the nonlinear filtering problem, Osaka J.
where Y=( E;; 0 <t < 7). Moreover, p,(f) satis- Math., 9 (1972) 19-40.
[11] R. S. Liptzer and A. N. Shiryaev, Statistics of stochastic processes. I, General theory; II, Applications, Springer, 1977, 1978. (Original in Russian, 1974.)
[12] G. Kallianpur, Stochastic filtering theory, Springer, 1980.

406 (XVII.14)
Stochastic Differential Equations

A. Introduction

Stochastic differential equations were rigorously formulated by K. Itô [7] in 1942 to construct diffusion processes corresponding to Kolmogorov’s differential equations. For this purpose he introduced the notion of stochastic integrals, and thus a differential-integral calculus for sample paths of stochastic processes was established. This theory, often called Itô’s stochastic analysis or stochastic calculus, has brought an epoch-making method to the theory of stochastic processes. It provides us with a fundamental tool for describing and analyzing diffusion processes that we can apply effectively to limit theorems and to the probabilistic study of problems in analysis. It also plays an important role in the statistical theory of stochastic processes, such as †stochastic control or †stochastic filtering. Stochastic differential equations on manifolds provide a probabilistic method for differential geometry, sometimes called stochastic differential geometry. Recently, many interesting examples of infinite-dimensional stochastic differential equations have been introduced to describe probabilistic models in physics, biology, etc. A unified theory of stochastic calculus has been developed in the framework of Doob’s martingale theory and this, combined with Stroock and Varadhan’s idea of martingale problems, provides an important method in the theory of stochastic processes (→ 262 Martingales).

B. Stochastic Integrals

As is well known, almost all sample paths of a Wiener process are continuous but nowhere differentiable (→ 45 Brownian Motion), and hence integrals with respect to these functions cannot be defined as the usual Stieltjes integrals. But these integrals can be defined by making use of the stochastic nature of Brownian motion. Wiener defined them (the Wiener integrals) for nonrandom integrands, but Itô defined them for a large class of random integrands. Itô’s integrals have been extended in the martingale framework by H. Kunita and S. Watanabe, and by others [14,19], as shown below.

Let $(\Omega,\mathcal B,P)$ be a probability space, and let $F=\{\mathcal F_t\}_{t\ge 0}$ be an increasing family of σ-subfields of $\mathcal B$. Usually we assume that $\{\mathcal F_t\}$ is right continuous, i.e., $\mathcal F_{t+0}:=\bigcap_{\varepsilon>0}\mathcal F_{t+\varepsilon}=\mathcal F_t$ for every $t\ge 0$. Denote by $\mathcal M=\mathcal M(F)$ the totality of all continuous square-integrable martingales $X=(X_t)$ relative to $\{\mathcal F_t\}$; to be precise, $X$ is an $\{\mathcal F_t\}$-martingale such that, with probability 1, $X_0=0$, $t\to X_t$ is continuous and $E(X_t^2)<\infty$ for every $t\ge 0$. We introduce the metric $\|X-Y\|=\sum_{n=1}^{\infty}2^{-n}\min(1,\|X_n-Y_n\|_2)$ on $\mathcal M$, where $\|\cdot\|_2$ stands for the $L_2(\Omega,P)$-norm. We always identify two stochastic processes $X=(X_t)$ and $Y=(Y_t)$ if sample functions $t\to X_t$ and $t\to Y_t$ coincide with probability 1. Then, by virtue of Doob’s inequality $\|\max_{0\le s\le t}|X_s-Y_s|\|_2\le 2\|X_t-Y_t\|_2$ (→ 262 Martingales), $\mathcal M$ becomes a complete metric vector space.

Next, by an integrable increasing process we mean a process $A=(A_t)$ with the following properties: (i) $A$ is adapted to $\{\mathcal F_t\}$, i.e., $A_t$ is $\mathcal F_t$-measurable for every $t\ge 0$; (ii) with probability 1, $A_0=0$, $t\to A_t$ is continuous and nondecreasing; (iii) $A_t$ ($\ge 0$) is integrable for every $t\ge 0$, i.e., $E(A_t)<\infty$. We denote by $\mathcal A=\mathcal A(F)$ the totality of integrable increasing processes. We call a process $V=(V_t)$ an integrable process of bounded variation if $V$ is expressed as $V_t=A_t^1-A_t^2$ with $A^1,A^2\in\mathcal A$. The totality of integrable processes of bounded variation is denoted by $\mathcal V=\mathcal V(F)$. It follows from the †Doob-Meyer decomposition theorem that, for every $M,N\in\mathcal M$, there exists a unique $V\in\mathcal V$ such that $M_tN_t-V_t$ is an $\{\mathcal F_t\}$-martingale. We denote this $V$ as $\langle M,N\rangle$. In particular, $\langle M,M\rangle\in\mathcal A$, and it is denoted simply by $\langle M\rangle$. $\langle M,N\rangle$ is called the quadratic variation process because $\sum_{i=1}^{n}(M_{t_i}-M_{t_{i-1}})(N_{t_i}-N_{t_{i-1}})\to\langle M,N\rangle_t$ in probability as $|\Delta|\to 0$, where $\Delta: t_0=0<t_1<\cdots<t_n=t$ is a partition and $|\Delta|=\max_{1\le i\le n}|t_i-t_{i-1}|$. Brownian motion is the most important example of continuous square-integrable martingales, and this is characterized in our framework as follows. Suppose that a $d$-dimensional continuous $\{\mathcal F_t\}$-adapted process $X=(X_t)$ satisfies $M_t^i=X_t^i-X_0^i\in\mathcal M$ and $\langle M^i,M^j\rangle_t=\delta^{ij}t$, $i,j=1,2,\dots,d$. Then $X$ is a $d$-dimensional Brownian motion such that $X_u-X_v$ and $\mathcal F_t$ are independent for every $u>v\ge t$. Such a Brownian motion is called an $\{\mathcal F_t\}$-Brownian motion, and a system of martingales $M^i\in\mathcal M$ having this property is often called a system of $\{\mathcal F_t\}$-Wiener martingales.

Now, we fix $M\in\mathcal M$. We denote by $\mathcal L_2(M)$ the totality of real, $\{\mathcal F_t\}$-adapted, and measurable processes $\Phi=(\Phi(t))$ such that $\|\Phi\|_{t,M}^2=$
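$E\left[\int_0^t \Phi^2(s)\,d\langle M\rangle_s\right] < \infty$ for every $t > 0$. Two $\Phi_1, \Phi_2 \in \mathcal{L}_2(M)$ are identified if $\|\Phi_1 - \Phi_2\|_{t,M} = 0$ for all $t \ge 0$. Since $\|\cdot\|_{t,M}$ is an $L_2$-norm on $[0,t] \times \Omega$ with respect to the measure $\mu_M(ds, d\omega) = d\langle M\rangle_s(\omega)P(d\omega)$, it is easy to see that $\mathcal{L}_2(M)$ is a complete metric vector space with the metric $\|\Phi - \Phi'\|_M = \sum_{n=1}^{\infty} 2^{-n}\min(1, \|\Phi - \Phi'\|_{n,M})$, $\Phi, \Phi' \in \mathcal{L}_2(M)$. If $\Phi = (\Phi(t))$ is given, for a partition $0 = t_0 < t_1 < \cdots < t_n \to \infty$ and $\mathcal{F}_{t_i}$-measurable bounded functions $f_i$, $i = 0, 1, \ldots$, by

$$\Phi(t) = \begin{cases} f_0, & t = 0,\\ f_i, & t_i < t \le t_{i+1}, \quad i = 0, 1, 2, \ldots, \end{cases}$$

then $\Phi \in \mathcal{L}_2(M)$ and the totality $\mathcal{L}_0$ of such processes is dense in $\mathcal{L}_2(M)$. If $\Phi \in \mathcal{L}_0$, define $I^M(\Phi) = (I^M(\Phi)(t))_{t \ge 0}$ by

$$I^M(\Phi)(t) = \sum_{i=0}^{n-1} f_i\,(M_{t_{i+1}} - M_{t_i}) + f_n\,(M_t - M_{t_n}), \qquad t_n \le t \le t_{n+1}.$$

Then $I^M(\Phi) \in \mathcal{M}$, and it holds also that $\langle I^M(\Phi), I^N(\Psi)\rangle_t = \int_0^t \Phi(s)\Psi(s)\,d\langle M, N\rangle_s$ for $M, N \in \mathcal{M}$ and $\Phi, \Psi \in \mathcal{L}_0$. In particular, $\|I^M(\Phi)(t)\|_2^2 = E[\langle I^M(\Phi)\rangle_t] = \|\Phi\|_{t,M}^2$, and hence $\|I^M(\Phi)\| = \|\Phi\|_M$. This implies that $\Phi \in \mathcal{L}_0 \subset \mathcal{L}_2(M) \to I^M(\Phi) \in \mathcal{M}$ is an isometric linear mapping, and hence it can be extended to $\mathcal{L}_2(M)$ uniquely, preserving the isometric property. $I^M(\Phi) \in \mathcal{M}$ is called the stochastic integral of $\Phi \in \mathcal{L}_2(M)$ by $M \in \mathcal{M}$. $I^M(\Phi)(t)$ is often denoted by $\int_0^t \Phi(s)\,dM_s$, and the random variable $I^M(\Phi)(t)$ obtained by fixing $t$ is also called a stochastic integral.

The definition of stochastic integrals can be extended further by the following localization method. For an $\{\mathcal{F}_t\}$-†progressively measurable process $X = (X_t)$ and $\{\mathcal{F}_t\}$-stopping time $\sigma$, the stopped process $X^\sigma = (X^\sigma_t)$ is defined by $X^\sigma_t = X_{t \wedge \sigma}$ ($t \wedge \sigma = \min(t, \sigma)$). It follows from the †optional sampling theorem of Doob that $X^\sigma \in \mathcal{M}$ if $X \in \mathcal{M}$. For $\Phi = (\Phi(t)) \in \mathcal{L}_2(M)$ and an $\{\mathcal{F}_t\}$-stopping time $\sigma$, $\Phi_\sigma = (\Phi_\sigma(t))$ defined by $\Phi_\sigma(t) = I_{\{t \le \sigma\}}\Phi(t)$ also belongs to $\mathcal{L}_2(M)$ and it holds that $I^M(\Phi_\sigma) = [I^M(\Phi)]^\sigma$. Keeping these facts in mind, we give the following definition. Let $\mathcal{M}^{\mathrm{loc}} = \{M = (M_t) \mid$ there exists a sequence of $\{\mathcal{F}_t\}$-stopping times $\sigma_n$ such that $\sigma_n < \infty$, $\sigma_n \uparrow \infty$ as $n \to \infty$, a.s. and $M^{\sigma_n} \in \mathcal{M}$ for every $n = 1, 2, \ldots\}$. $\mathcal{A}^{\mathrm{loc}}$ and $\mathcal{V}^{\mathrm{loc}}$ are defined in a similar way. For $M, N \in \mathcal{M}^{\mathrm{loc}}$, $\langle M, N\rangle \in \mathcal{V}^{\mathrm{loc}}$ is defined to be the unique process in $\mathcal{V}^{\mathrm{loc}}$ such that $\langle M^{\sigma_n}, N^{\sigma_n}\rangle = \langle M, N\rangle^{\sigma_n}$ for a sequence of stopping times $\{\sigma_n\}$ as above, which can be chosen common to $M$ and $N$. $\langle M, M\rangle$ is denoted by $\langle M\rangle$ as before. We fix $M \in \mathcal{M}^{\mathrm{loc}}$ and set $\mathcal{L}_2^{\mathrm{loc}}(M) = \{\Phi = (\Phi(t)) \mid$ a real, $\{\mathcal{F}_t\}$-adapted and measurable process such that, with probability one, $\int_0^t \Phi(s)^2\,d\langle M\rangle_s < \infty$ for every $t \ge 0\}$.

For $\Phi \in \mathcal{L}_2^{\mathrm{loc}}(M)$, we can choose a sequence of $\{\mathcal{F}_t\}$-stopping times $\{\sigma_n\}$ such that $\sigma_n < \infty$, $\sigma_n \uparrow \infty$ a.s. and $M^{\sigma_n} \in \mathcal{M}$, $\Phi_{\sigma_n} \in \mathcal{L}_2(M^{\sigma_n})$, $n = 1, 2, \ldots$. For example, set $\sigma_n = \min[n, \inf\{t \mid \langle M\rangle_t + \int_0^t \Phi(s)^2\,d\langle M\rangle_s \ge n\}]$. Then there exists an $I^M(\Phi) \in \mathcal{M}^{\mathrm{loc}}$ such that $I^M(\Phi)^{\sigma_n} = I^{M^{\sigma_n}}(\Phi_{\sigma_n})$ for $n = 1, 2, \ldots$, which is unique and independent of a particular choice of $\{\sigma_n\}$. $I^M(\Phi)$ is called the stochastic integral of $\Phi \in \mathcal{L}_2^{\mathrm{loc}}(M)$ by $M \in \mathcal{M}^{\mathrm{loc}}$. $I^M(\Phi)(t)$ is often denoted by $\int_0^t \Phi(s)\,dM_s$, and the random variable $I^M(\Phi)(t)$ obtained by fixing $t$ is also called a stochastic integral.

Some of the basic properties of stochastic integrals are: (i) If $M \in \mathcal{M}^{\mathrm{loc}}$, $\Phi \in \mathcal{L}_2^{\mathrm{loc}}(M)$, and $\Psi \in \mathcal{L}_2^{\mathrm{loc}}(I^M(\Phi))$, then $\Phi\Psi \in \mathcal{L}_2^{\mathrm{loc}}(M)$ and $I^M(\Phi\Psi) = I^{I^M(\Phi)}(\Psi)$. (ii) If $M \in \mathcal{M}^{\mathrm{loc}}$ and $\Phi, \Psi \in \mathcal{L}_2^{\mathrm{loc}}(M)$, then for every $\alpha, \beta \in \mathbf{R}$ we have $\alpha\Phi + \beta\Psi = (\alpha\Phi(t) + \beta\Psi(t)) \in \mathcal{L}_2^{\mathrm{loc}}(M)$ and $I^M(\alpha\Phi + \beta\Psi) = \alpha I^M(\Phi) + \beta I^M(\Psi)$. Also if $M, N \in \mathcal{M}^{\mathrm{loc}}$ and $\Phi \in \mathcal{L}_2^{\mathrm{loc}}(M) \cap \mathcal{L}_2^{\mathrm{loc}}(N)$, then for $\alpha, \beta \in \mathbf{R}$ we have $\Phi \in \mathcal{L}_2^{\mathrm{loc}}(\alpha M + \beta N)$ and $I^{\alpha M + \beta N}(\Phi) = \alpha I^M(\Phi) + \beta I^N(\Phi)$. (iii) If $M, N \in \mathcal{M}^{\mathrm{loc}}$, $\Phi \in \mathcal{L}_2^{\mathrm{loc}}(M)$, and $\Psi \in \mathcal{L}_2^{\mathrm{loc}}(N)$, then $\Phi\Psi \in \mathcal{L}_1^{\mathrm{loc}}(\langle M, N\rangle)$ and $\langle I^M(\Phi), I^N(\Psi)\rangle_t = \int_0^t \Phi_s\Psi_s\,d\langle M, N\rangle_s$. Here, $\mathcal{L}_p^{\mathrm{loc}}(V)$ for $V \in \mathcal{V}^{\mathrm{loc}}$ is defined as follows: With probability 1, $s \to V_s$ is of bounded variation on every finite interval $[0, t]$, the total variation of which is denoted by $|V|_t$. Then $|V| \in \mathcal{A}^{\mathrm{loc}}$, and we define $\mathcal{L}_p^{\mathrm{loc}}(V)$ ($p \ge 1$) to be the totality of real, $\{\mathcal{F}_t\}$-adapted, and measurable processes $\Phi = (\Phi(t))$ such that, with probability 1, $\int_0^t |\Phi(s)|^p\,d|V|_s < \infty$ for every $t > 0$. In particular, $\mathcal{L}_2^{\mathrm{loc}}(M) = \mathcal{L}_2^{\mathrm{loc}}(\langle M\rangle)$. (iv) If $M \in \mathcal{M}^{\mathrm{loc}}$, $\Phi \in \mathcal{L}_2^{\mathrm{loc}}(M)$, and $\sigma$ is an $\{\mathcal{F}_t\}$-stopping time, then $I^{M^\sigma}(\Phi_\sigma) = I^{M^\sigma}(\Phi) = I^M(\Phi_\sigma) = [I^M(\Phi)]^\sigma$. (v) If $\Phi(t) = I_{\{t > \sigma\}} \cdot f$, where $\sigma$ is an $\{\mathcal{F}_t\}$-stopping time and $f$ is a bounded $\mathcal{F}_\sigma$-measurable random variable, then $\Phi \in \mathcal{L}_2^{\mathrm{loc}}(M)$ for every $M \in \mathcal{M}^{\mathrm{loc}}$ and $I^M(\Phi)(t) = f(M_t - M_{t \wedge \sigma})$. (vi) The definition of stochastic integrals is independent of the increasing family of $\sigma$-subfields in the following sense: If $\{\mathcal{G}_t\}$ is another family such that $M$ belongs to the class $\mathcal{M}^{\mathrm{loc}}$ for both $\{\mathcal{F}_t\}$ and $\{\mathcal{G}_t\}$ and $\Phi$ belongs to the class $\mathcal{L}_2^{\mathrm{loc}}(M)$ for both $\{\mathcal{F}_t\}$ and $\{\mathcal{G}_t\}$, then $I^M(\Phi)$ is the same whether it is defined with respect to $\{\mathcal{F}_t\}$ or $\{\mathcal{G}_t\}$.

In particular, $N = I^M(\Phi)$, $M \in \mathcal{M}^{\mathrm{loc}}$, $\Phi \in \mathcal{L}_2^{\mathrm{loc}}(M)$, satisfies $\langle N, L\rangle_t = \int_0^t \Phi(s)\,d\langle M, L\rangle_s$ for all $L \in \mathcal{M}^{\mathrm{loc}}$. Conversely, $N \in \mathcal{M}^{\mathrm{loc}}$ having this property is unique, and hence it coincides with $I^M(\Phi)$. $I^M(\Phi) \in \mathcal{M}$ if and only if $E\left[\int_0^t \Phi(s)^2\,d\langle M\rangle_s\right] < \infty$ for every $t > 0$.

The above definition of stochastic integrals can be extended with a slight technical modification to the case when $M_t$ is not necessarily continuous [19]. Among such general stochastic integrals, a particularly important role is played by stochastic integrals describing point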

processes, including Poisson point processes as an important special case. These stochastic integrals are important in the study of discontinuous processes including †Lévy processes; even in the study of continuous processes, such as diffusion, they provide an important tool for the treatment of excursions [4,6,8].

Let $(\Omega, \mathcal{F}, P)$ and $\{\mathcal{F}_t\}$ be as above. By a continuous semimartingale with respect to $\{\mathcal{F}_t\}$, or simply a semimartingale when there is no danger of confusion, we mean a process $X = (X(t))$ of the following form: $X(t) = X(0) + M(t) + V(t)$, where $X(0)$ is an $\mathcal{F}_0$-measurable random variable, $M = (M(t)) \in \mathcal{M}^{\mathrm{loc}}$ and $V = (V(t)) \in \mathcal{V}^{\mathrm{loc}}$. $M$ and $V$ are uniquely determined from $X$, and this decomposition is called the semimartingale decomposition of $X$. $M$ is called the martingale part and $V$ the drift part of the semimartingale $X$. A semimartingale $X$ is often called an Itô process if $M(t) = \int_0^t \Phi(s)\,dB(s)$ and $V(t) = \int_0^t \Psi(s)\,ds$, where $B \in \mathcal{M}$ is an $\{\mathcal{F}_t\}$-Brownian motion, $\Phi \in \mathcal{L}_2^{\mathrm{loc}}$, and $\Psi \in \mathcal{L}_1^{\mathrm{loc}}$. (When $V_t = t$, $\mathcal{L}_p^{\mathrm{loc}}(V)$ is denoted simply by $\mathcal{L}_p^{\mathrm{loc}}$.) A $d$-dimensional process whose components are semimartingales is called a $d$-dimensional semimartingale. The following formula, originally due to Itô and extended by Kunita and Watanabe, is of fundamental importance in stochastic calculus.

Itô's formula. Let $X(t) = (X^1(t), \ldots, X^d(t))$ be a $d$-dimensional semimartingale and $X^i(t) = X^i(0) + M^i(t) + V^i(t)$ be the semimartingale decomposition of components. Let $F(x) = F(x^1, \ldots, x^d)$ be a $C^2$-function defined on $\mathbf{R}^d$. Then $F(X(t))$ is also a semimartingale, and we have

$$F(X(t)) = F(X(0)) + \sum_{i=1}^{d} \int_0^t D_iF(X(s))\,dM^i(s) + \sum_{i=1}^{d} \int_0^t D_iF(X(s))\,dV^i(s) + \frac{1}{2}\sum_{i,j=1}^{d} \int_0^t D_iD_jF(X(s))\,d\langle M^i, M^j\rangle(s)$$

(where $D_i = \partial/\partial x^i$).

In other words, if $Y(t) = Y(0) + M(t) + V(t)$ is the semimartingale decomposition of $Y(t) = F(X(t))$, then $M(t) = \sum_{i=1}^{d} \int_0^t D_iF(X(s))\,dM^i(s)$ and $V(t) = \sum_{i=1}^{d} \int_0^t D_iF(X(s))\,dV^i(s) + \frac{1}{2}\sum_{i,j=1}^{d} \int_0^t D_iD_jF(X(s))\,d\langle M^i, M^j\rangle(s)$.

We now discuss other important transformations on semimartingales.

Time change. Let $A \in \mathcal{A}^{\mathrm{loc}}$, and assume further that with probability 1, $t \to A_t$ is strictly increasing and $\lim_{t \uparrow \infty} A_t = \infty$. Let $u \to C_u$ be the inverse function of $t \to A_t$, i.e., $C_u = \min\{t \mid A_t \ge u\}$. Then for every $u > 0$, $C_u$ is an $\{\mathcal{F}_t\}$-stopping time. Set $\tilde{\mathcal{F}}_t = \mathcal{F}_{C_t}$, $t > 0$. For an $\{\mathcal{F}_t\}$-†progressively measurable process $X = (X(t))$, we define $X^A = (X^A(t))$ by $X^A(t) = X(C_t)$ and call it the time change of $X$ determined by $A$. Then $X^A$ is progressively measurable with respect to $\{\tilde{\mathcal{F}}_t\}$. If $X$: $X(t) = X(0) + M(t) + V(t)$ is a semimartingale with respect to $\{\mathcal{F}_t\}$, then $X^A$ is a semimartingale with respect to $\{\tilde{\mathcal{F}}_t\}$, and its semimartingale decomposition is given by $X^A(t) = X(0) + M^A(t) + V^A(t)$. The mappings $M \to M^A$ and $V \to V^A$ are bijections between $\mathcal{M}^{\mathrm{loc}}$ and $\tilde{\mathcal{M}}^{\mathrm{loc}}$ and between $\mathcal{V}^{\mathrm{loc}}$ and $\tilde{\mathcal{V}}^{\mathrm{loc}}$, respectively, where $\tilde{\mathcal{M}}^{\mathrm{loc}}$ and $\tilde{\mathcal{V}}^{\mathrm{loc}}$ are defined relative to $\{\tilde{\mathcal{F}}_t\}$. Furthermore, $\langle M^A, N^A\rangle = \langle M, N\rangle^A$ for every $M, N \in \mathcal{M}^{\mathrm{loc}}$. Noting that $\Phi \in \mathcal{L}_2^{\mathrm{loc}}(M)$ can always be chosen $\{\mathcal{F}_t\}$-progressively measurable (in fact $\{\mathcal{F}_t\}$-†predictable), the mapping $\Phi \to \Phi^A$ defines a bijection between $\mathcal{L}_2^{\mathrm{loc}}(M)$ and $\tilde{\mathcal{L}}_2^{\mathrm{loc}}(M^A)$, and we have $I^{M^A}(\Phi^A) = [I^M(\Phi)]^A$.

Transformation of drift (Girsanov transformation). For $m \in \mathcal{M}^{\mathrm{loc}}$, set $D_m(t) = \exp[m_t - \frac{1}{2}\langle m\rangle_t]$. Then $D_m - 1 \in \mathcal{M}^{\mathrm{loc}}$, and if $m$ satisfies a certain integrability condition (for example, $E(\exp[\frac{1}{2}\langle m\rangle_t]) < \infty$ for every $t \ge 0$, in particular, $\langle m\rangle_t \le ct$ for all $t$ for some constant $c > 0$), then $D_m$ is a martingale, i.e., $E(D_m(t)) = 1$ for all $t \ge 0$. If $E(D_m(t)) = 1$ for all $t$, then there exists a probability $\tilde{P}$ on $(\Omega, \mathcal{F})$ (if $(\Omega, \mathcal{F})$ is a nice measurable space and $\mathcal{F} = \bigvee_{t \ge 0} \mathcal{F}_t$, which we can assume without loss of generality) such that $\tilde{P}(A) = E(D_m(t); A)$ for all $A \in \mathcal{F}_t$, $t \ge 0$. Let $X$ be a semimartingale with the decomposition $X(t) = X(0) + M(t) + V(t)$. On the probability space $(\Omega, \mathcal{F}, \tilde{P})$ with the same family $\{\mathcal{F}_t\}$, $X$ is still a semimartingale but its semimartingale decomposition is given by $X(t) = X(0) + \tilde{M}(t) + \tilde{V}(t)$, where $\tilde{M}(t) = M(t) - \langle M, m\rangle(t)$ and $\tilde{V}(t) = V(t) + \langle M, m\rangle(t)$. Furthermore, it holds that $\langle \tilde{M}, \tilde{N}\rangle = \langle M, N\rangle$, $M, N \in \mathcal{M}^{\mathrm{loc}}$. This result is known as Girsanov's theorem. The transformation of probability spaces given above is called a transformation of drift or a Girsanov transformation since it produces a change as shown above of the drift part in the semimartingale decomposition.

In the discussion above, the increasing family $\{\mathcal{F}_t\}$ was fixed. It is also important to study how the semimartingale character changes under a changing increasing family [12].

C. Stochastic Differentials

In this section, we introduce stochastic differentials of semimartingales and rewrite the results in the previous section in more convenient form. Let $(\Omega, \mathcal{F}, P)$ and $\{\mathcal{F}_t\}$ be as above and $\mathcal{M}$, $\mathcal{A}$, $\mathcal{V}$, $\mathcal{M}^{\mathrm{loc}}$, $\mathcal{A}^{\mathrm{loc}}$, $\mathcal{V}^{\mathrm{loc}}$ be defined as in Section B. By $\mathcal{Q}$ we denote the totality of continuous semimartingales relative to $\{\mathcal{F}_t\}$.
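As a numerical illustration of Itô's formula from Section B, consider the special case $F(x) = x^2$ and $X = B$ a 1-dimensional Brownian motion, so that $V = 0$ and $\langle B\rangle_t = t$; the formula then reads $B(t)^2 = 2\int_0^t B(s)\,dB(s) + t$. The following is a minimal sketch, not taken from the article: the function name `ito_check`, the step count, and the seed are illustrative assumptions, and the stochastic integral is approximated by a sum with left-endpoint evaluation.

```python
import math
import random


def ito_check(n_steps, horizon, seed=0):
    """Simulate one Brownian path on [0, horizon] and return
    (B(T), Ito sum, quadratic variation sum) for that path."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    b = 0.0          # current value B(t_i), with B(0) = 0
    ito_sum = 0.0    # sum of B(t_i) * (B(t_{i+1}) - B(t_i)): left endpoints
    quad_var = 0.0   # sum of squared increments, approximates <B>_T = T
    for _ in range(n_steps):
        db = rng.gauss(0.0, math.sqrt(dt))  # increment with variance dt
        ito_sum += b * db
        quad_var += db * db
        b += db
    return b, ito_sum, quad_var


if __name__ == "__main__":
    b_T, ito_sum, quad_var = ito_check(n_steps=100_000, horizon=1.0)
    # Discrete identity: B(T)^2 = 2 * (Ito sum) + (sum of squared increments)
    print("B(T)^2               :", b_T * b_T)
    print("2 * Ito sum + QV sum :", 2.0 * ito_sum + quad_var)
    print("QV sum (close to T)  :", quad_var)
```

The discrete identity holds exactly up to rounding, because $(b + \Delta b)^2 - b^2 = 2b\,\Delta b + (\Delta b)^2$ telescopes; the stochastic content is that the sum of squared increments approaches $t$ rather than $0$, which is the source of the second-order term in Itô's formula.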

For $X \in \mathcal{Q}$, let $X(t) = X(0) + M_X(t) + V_X(t)$ be the semimartingale decomposition. We write formally $X(t) - X(0) = \int_0^t dX(s)$ and call $dX$ (denoted also by $dX_t$ or $dX(t)$) the stochastic differential of $X$. To be precise, $dX$ can be considered as a random interval function $dX(I) = X(t) - X(s)$, $I = (s, t]$, or the equivalence class containing $X$ under the equivalence relation $X \sim Y$ on $\mathcal{Q}$ defined by $X \sim Y$ if and only if $X(t) - X(0) = Y(t) - Y(0)$, $t \ge 0$. For $X, Y \in \mathcal{Q}$ and $\alpha, \beta \in \mathbf{R}$, $\alpha\,dX + \beta\,dY$ is defined by $d(\alpha X + \beta Y)$ and $dX \cdot dY$ by $d\langle M_X, M_Y\rangle$. Let $d\mathcal{Q}$ be the totality of stochastic differentials of elements in $\mathcal{Q}$ and $d\mathcal{M}$ and $d\mathcal{V}$ be that of elements in $\mathcal{M}^{\mathrm{loc}}$ and $\mathcal{V}^{\mathrm{loc}}$, respectively. $d\mathcal{Q}$ is a commutative algebra under the operations just introduced. Note that $dX \cdot dY \in d\mathcal{V}$ and that $dX \cdot dY = 0$ if either of $dX$ and $dY$ is in $d\mathcal{V}$. In particular, $dX \cdot dY \cdot dZ = 0$ for every $dX$, $dY$, and $dZ$. Let $\mathcal{B}$ be the totality of $\{\mathcal{F}_t\}$-progressively measurable processes $\Phi = (\Phi(t))$ such that, with probability 1, $\sup_{0 \le s \le t} |\Phi(s)| < \infty$ for every $t > 0$. Noting that $\mathcal{B} \subset \mathcal{L}_2^{\mathrm{loc}}(M)$ for any $M \in \mathcal{M}^{\mathrm{loc}}$, we define $\Phi \cdot dX \in d\mathcal{Q}$ for $\Phi \in \mathcal{B}$ and $X \in \mathcal{Q}$ to be the stochastic differential of the semimartingale $\int_0^t \Phi(s)\,dM_X(s) + \int_0^t \Phi(s)\,dV_X(s)$. $\Phi \cdot dX$ is uniquely determined by $\Phi$ and $dX$. Itô's formula is stated, in this context, as follows: For $X = (X^1, \ldots, X^d)$, $X^i \in \mathcal{Q}$, and $F: \mathbf{R}^d \to \mathbf{R}$, which is of class $C^2$, $F(X) \in \mathcal{Q}$, and

$$dF(X) = \sum_{i=1}^{d} D_iF(X) \cdot dX^i + \frac{1}{2}\sum_{i,j=1}^{d} D_iD_jF(X) \cdot dX^i \cdot dX^j.$$

We now define another important operation on the space $d\mathcal{Q}$. Noting that $\mathcal{Q} \subset \mathcal{B}$, we define $X \circ dY$ for $X, Y \in \mathcal{Q}$ by

$$X \circ dY = X \cdot dY + \frac{1}{2}\,dX \cdot dY.$$

This is uniquely determined from $X$ and $dY$ and is called the symmetric multiplication of $X$ and $dY$. It is also called a stochastic differential of the Stratonovich type or Itô's circle operation since the notation was introduced by Itô [9]. $\int_0^t X \circ dY$ is called the stochastic integral of the Stratonovich type, whereas $\int_0^t X \cdot dY$ is that of the Itô type. Under this operation, Itô's formula is rewritten as follows: For $X = (X^1, \ldots, X^d)$, $X^i \in \mathcal{Q}$, and $F: \mathbf{R}^d \to \mathbf{R}$, which is of class $C^3$, $F(X)$, $D_iF(X) \in \mathcal{Q}$, and

$$dF(X) = \sum_{i=1}^{d} D_iF(X) \circ dX^i.$$

This chain rule for stochastic differentials takes the same form as in the ordinary calculus. For this reason symmetric multiplication plays an important role in transferring notions used in ordinary calculus into stochastic calculus and in defining intrinsic (i.e., coordinate-free) notions probabilistically. In particular, it is fundamental to the study of stochastic differential equations on manifolds (→ Section G).

D. Stochastic Differential Equations

Here, we give a general formulation of stochastic differential equations in which the infinitesimal change of the system may depend on the past history of the system; however, equations of Markovian type, in which the infinitesimal change of the system depends only on the present state of the system, are considered in most cases. Let $W^d$ be the space of $d$-dimensional continuous paths: $W^d = C([0, \infty) \to \mathbf{R}^d)$ := the totality of all continuous functions $w: [0, \infty) \to \mathbf{R}^d$, endowed with the topology of the uniform convergence on finite intervals, and $\mathcal{B}(W^d)$ be the topological $\sigma$-field. For each $t \ge 0$, define $\rho_t: W^d \to W^d$ by $(\rho_t w)(s) = w(t \wedge s)$, and let $\mathcal{B}_t(W^d) = \rho_t^{-1}(\mathcal{B}(W^d))$, $t \ge 0$. Let $\mathcal{A}^{d,r}$ be the totality of functions $\alpha(t, w) = (\alpha_j^i(t, w))$: $[0, \infty) \times W^d \to \mathbf{R}^d \otimes \mathbf{R}^r$ (:= the totality of $d \times r$ real matrices) such that each component $\alpha_j^i(t, w)$ ($i = 1, 2, \ldots, d$; $j = 1, 2, \ldots, r$) is $\mathcal{B}([0, \infty)) \times \mathcal{B}(W^d)$-measurable and $\mathcal{B}_t(W^d)$-measurable for each fixed $t \ge 0$. In general, $\alpha_j^i(t, w)$ is called nonanticipative if it satisfies the second property above. An important case of $\alpha \in \mathcal{A}^{d,r}$ is when it is given as $\alpha(t, w) = \sigma(t, w(t))$ by a Borel function $\sigma: [0, \infty) \times \mathbf{R}^d \to \mathbf{R}^d \otimes \mathbf{R}^r$. In this case, $\alpha$ is called independent of the past history or of Markovian type. For a given $\alpha \in \mathcal{A}^{d,r}$ and $\beta \in \mathcal{A}^{d,1}$, we consider the following stochastic differential equation:

$$(1) \quad dX^i(t) = \sum_{j=1}^{r} \alpha_j^i(t, X)\,dB^j(t) + \beta^i(t, X)\,dt, \qquad i = 1, 2, \ldots, d,$$

also denoted simply as

$$dX(t) = \alpha(t, X)\,dB(t) + \beta(t, X)\,dt.$$

Here $X(t) = (X^1(t), \ldots, X^d(t))$ is a $d$-dimensional continuous process. $B(t) = (B^1(t), \ldots, B^r(t))$ is an $r$-dimensional Brownian motion with $B(0) = 0$. A precise formulation of equation (1) is as follows. $X = (X(t))$ is called a solution of equation (1) if it satisfies the following conditions: (i) $X$ is a $d$-dimensional, continuous, and $\{\mathcal{F}_t\}$-adapted process defined on a probability space $(\Omega, \mathcal{F}, P)$ with an increasing family $\{\mathcal{F}_t\}$, i.e., $X: \Omega \to W^d$ which is $\mathcal{F}_t/\mathcal{B}_t(W^d)$-measurable for every $t \ge 0$; (ii) $\alpha_j^i(t, X) \in \mathcal{L}_2^{\mathrm{loc}}$, $\beta^i(t, X) \in \mathcal{L}_1^{\mathrm{loc}}$, $i = 1, \ldots, d$, $j = 1, \ldots, r$ (→ Section B for the definition of $\mathcal{L}_p^{\mathrm{loc}}$); (iii) there exists an $r$-

dimensional $\{\mathcal{F}_t\}$-Brownian motion $B(t)$ with $B(0) = 0$ such that the equality

$$X^i(t) - X^i(0) = \sum_{j=1}^{r} \int_0^t \alpha_j^i(s, X)\,dB^j(s) + \int_0^t \beta^i(s, X)\,ds, \qquad i = 1, 2, \ldots, d,$$

holds with probability 1.

Thus a solution $X$ is always accompanied by a Brownian motion $B$. To emphasize this, we often call $X$ a solution with the Brownian motion $B$ or call the pair $(X, B)$ itself a solution of (1). In the above definition, a solution is given with reference to an increasing family $\{\mathcal{F}_t\}$. The essential point is that the $\sigma$-fields $\sigma(B(u) - B(v);\ u \ge v \ge t)$ and $\sigma(X(s), B(s);\ 0 \le s \le t)$ are independent for every $t$: If $X$ satisfies the conditions of solutions stated above, then the specified independence is obvious, and conversely, if this independence is satisfied, then by setting $\mathcal{F}_t = \bigcap_{\varepsilon > 0} \sigma(X(s), B(s);\ 0 \le s \le t + \varepsilon)$, the conditions of solutions stated above are satisfied. But it is usually convenient to introduce some increasing family $\{\mathcal{F}_t\}$ into the definition of solutions as above. When $\alpha$ and $\beta$ are of the Markovian type, $\alpha(t, w) = \sigma(t, w(t))$, $\beta(t, w) = b(t, w(t))$, the corresponding equation

$$(2) \quad dX(t) = \sigma(t, X(t))\,dB(t) + b(t, X(t))\,dt$$

is called a stochastic differential equation of Markovian type. Furthermore, if $\sigma(t, x)$ and $b(t, x)$ are independent of $t$, i.e., $\sigma(t, x) = \sigma(x)$ and $b(t, x) = b(x)$, the equation

$$(3) \quad dX(t) = \sigma(X(t))\,dB(t) + b(X(t))\,dt$$

is called a stochastic differential equation of time homogeneous (or time-independent) Markovian type.

Next, we define the notions of the uniqueness of solutions. There are two kinds of uniqueness: uniqueness in the sense of law (in distribution) and pathwise uniqueness. When we consider the stochastic differential equations as a means to determine the laws of continuous stochastic processes, uniqueness in the sense of law is sufficient. If, on the other hand, we regard the stochastic differential equation as a means to define the sample paths of solutions as a functional of the accompanying Brownian motion, i.e., if we regard the equation as a machine that produces a solution as an output when we input a Brownian motion, the notion of pathwise uniqueness is more natural and more important. As we shall see, this notion is closely related to the notion of strong solutions.

These notions are defined as follows. For a solution $X = (X(t))$ of (1), $X(0)$ is called the initial value, its law on $\mathbf{R}^d$ is called the initial law (distribution), and the law of $X$ on $W^d$ is called the law (distribution) of $X$. We say that the uniqueness in the sense of law of solutions for (1) holds if the law of any solution $X$ is uniquely determined by its initial law, i.e., if whenever $X$ and $X'$ are two solutions whose initial laws coincide, then the laws of $X$ and $X'$ coincide. In this definition, we restrict ourselves to the solutions whose initial values are nonrandom, i.e., the initial laws are $\delta$-distributions at some points in $\mathbf{R}^d$. Next, we say that the pathwise uniqueness of solutions for (1) holds if whenever $X$ and $X'$ are any two solutions defined on the same probability space $(\Omega, \mathcal{F}, P)$ with the same increasing family $\{\mathcal{F}_t\}$ and the same $r$-dimensional $\{\mathcal{F}_t\}$-Brownian motion such that $X(0) = X'(0)$ a.s., then $X(t) = X'(t)$ for all $t \ge 0$ a.s. In this definition also, the solutions can be restricted to those having nonrandom initial values.

We say that equation (1) has a unique strong solution if there exists a function $F(x, w)$: $\mathbf{R}^d \times W_0^r \to W^d$ ($W_0^r = \{w \in W^r \mid w(0) = 0\}$) such that the following are true: (i) For any solution $(X, B)$ of (1), $X = F(X(0), B)$ holds a.s.; (ii) for any $\mathbf{R}^d$-valued random variable $X(0)$ and an $r$-dimensional Brownian motion $B = (B(t))$ with $B(0) = 0$ which are mutually independent, $X = F(X(0), B)$ is a solution of (1) with the Brownian motion $B$ and the initial value $X(0)$. If this is the case, $F(x, w)$ itself is a solution of (1) with the initial value $x$, and with respect to the canonical Brownian motion $B(t, w) = w(t)$ on the $r$-dimensional Wiener space $(W_0^r, \mathcal{F}, P)$, where $\mathcal{F}$ is the completion of $\mathcal{B}(W_0^r)$ with respect to the $r$-dimensional Wiener measure $P$. If equation (1) has a unique strong solution, then it is clear that pathwise uniqueness holds. Conversely, if pathwise uniqueness holds for (1) and if a solution exists for any given initial law, then equation (1) has a unique strong solution [25].

The existence of solutions was discussed by A. V. Skorokhod [20]. If the coefficients $\alpha$ and $\beta$ are bounded and continuous on $[0, \infty) \times W^d$, a solution of (1) exists for any given initial law. This is shown as follows [6]. We first construct approximate solutions by Cauchy's polygonal method and then show that their probability laws are †tight. A limit process in the sense of probability law can be shown to be a solution. The assumption of boundedness above can be weakened, e.g., to the following condition: For every $T > 0$, a constant $K_T > 0$ exists such that

$$(4) \quad \|\alpha(t, w)\| + \|\beta(t, w)\| \le K_T(1 + \|w\|_t), \qquad t \in [0, T],\ w \in W^d.$$

Here $\|w\|_t = \max_{0 \le s \le t} |w(s)|$. In the case of the

Markovian equation (2) it is sufficient to assume that $\sigma(t, x)$ and $b(t, x)$ are continuous and satisfy

$$(5) \quad \|\sigma(t, x)\| + \|b(t, x)\| \le K_T(1 + |x|), \qquad t \in [0, T],\ x \in \mathbf{R}^d.$$

If these conditions are violated, a solution $X(t)$ does not exist globally in general but exists up to a certain time $e$, called the explosion time, such that $\lim_{t \uparrow e} |X(t)| = \infty$ if $e < \infty$. To extend the notion of solutions in such cases, we have to replace the path space $W^d$ by the space $\hat{W}^d$ that consists of all continuous functions $w: [0, \infty) \to \hat{\mathbf{R}}^d$ ($= \mathbf{R}^d \cup \{\Delta\}$ = the one-point compactification) satisfying $w(t) = \Delta$ for every $t \ge e(w)$ ($= \inf\{t \mid w(t) = \Delta\}$).

Now, we list some results on the uniqueness of solutions. First consider the equations of the Markovian type (2), and assume that the coefficients are continuous and satisfy the condition (5). (i) If $\sigma$, $b$ are Lipschitz continuous, i.e., for every $N > 0$ there exists a constant $K_N$ such that $\|\sigma(t, x) - \sigma(t, y)\| + \|b(t, x) - b(t, y)\| \le K_N|x - y|$, $t \in [0, T]$, $x, y \in B_N := \{z \in \mathbf{R}^d \mid |z| \le N\}$, then the pathwise uniqueness of solutions holds for equation (2). Thus the unique strong solution of (2) exists, and this is constructed directly by Picard's successive approximation (Itô [7,8]). (ii) If $d = 1$, $\sigma$ is Hölder continuous with exponent 1/2 and $b$ is Lipschitz continuous, i.e., for every $N > 0$, $K_N$ exists such that

$$|\sigma(t, x) - \sigma(t, y)| \le K_N|x - y|^{1/2}, \qquad |b(t, x) - b(t, y)| \le K_N|x - y|, \qquad t \in [0, T],\ x, y \in B_N,$$

then the pathwise uniqueness of solutions holds for equation (2) (T. Yamada and Watanabe [25]). (iii) If the matrix $a(t, x) = \sigma(t, x)\sigma(t, x)^*$ (i.e., $a^{ij}(t, x) = \sum_{k=1}^{r} \sigma_k^i(t, x)\sigma_k^j(t, x)$) is strictly positive definite, then the uniqueness in the sense of law of the solution for (2) holds (D. W. Stroock and S. R. S. Varadhan [21]). (iv) An example of stochastic differential equations for which the uniqueness in the sense of law holds but the pathwise uniqueness does not hold was given by H. Tanaka as follows: $d = r = 1$, $b(t, x) = 0$ and $\sigma(t, x) = I_{\{x \ge 0\}} - I_{\{x < 0\}}$. Another example in the non-Markovian cases was given by B. S. Tsirel'son (see below).

Next, consider non-Markovian equations of the following form:

$$(6) \quad dX(t) = dB(t) + \beta(t, X)\,dt;$$

i.e., the case $d = r$ and $\alpha(t, w) = I$ (identity matrix). Assume further that $\beta \in \mathcal{A}^{d,1}$ is bounded. Then a solution of (6) exists for any given initial distribution, unique in the sense of law, and it can be constructed by the Girsanov transformation of Section B as follows. On a suitable probability space $(\Omega, \mathcal{F}, P)$ with an increasing family $\{\mathcal{F}_t\}$ such that $\mathcal{F} = \bigvee_{t \ge 0} \mathcal{F}_t$, we set up an $\mathcal{F}_0$-measurable, $d$-dimensional random variable $X(0)$ with a given law and a $d$-dimensional $\{\mathcal{F}_t\}$-Brownian motion $B = (B(t))$ such that $B(0) = 0$. Set $X(t) = X(0) + B(t)$ and $M(t) = \exp\left[\int_0^t \beta(s, X)\,dB(s) - \frac{1}{2}\int_0^t |\beta(s, X)|^2\,ds\right]$. Then $M(t)$ is an $\{\mathcal{F}_t\}$-martingale, and the probability $\tilde{P}$ on $(\Omega, \mathcal{F})$ is determined by $\tilde{P}(A) = E(M_t; A)$, $A \in \mathcal{F}_t$. By Girsanov's theorem, $\tilde{B}(t) = X(t) - X(0) - \int_0^t \beta(s, X)\,ds$ is a $d$-dimensional $\{\mathcal{F}_t\}$-Brownian motion on $(\Omega, \mathcal{F}, \tilde{P})$, and hence $(X, \tilde{B})$ is a solution of (6). Any solution is given in this way and hence the uniqueness in the sense of law holds. But the pathwise uniqueness does not hold in general; an example was given by Tsirel'son [1,6] as follows. Let $\{t_n\}$ be a sequence such that $0 < \cdots < t_n < t_{n-1} < \cdots < t_0 = 1$ and $\lim_{n \to \infty} t_n = 0$. Set

$$\beta(t, w) = \begin{cases} 0, & t \ge t_0 \text{ and } t = 0,\\[4pt] \theta\!\left(\dfrac{w(t_{i+1}) - w(t_{i+2})}{t_{i+1} - t_{i+2}}\right), & t \in [t_{i+1}, t_i), \quad i = 0, 1, 2, \ldots, \end{cases}$$

where $\theta(x) = x - [x]$, $x \in \mathbf{R}$, is the decimal part of $x$.

Time changes (→ Section B) are also used to solve some stochastic differential equations [6].

E. Stochastic Differential Equations and Diffusion Processes

In this section we consider equations of time-independent Markovian type (3) only. The time-dependent case can be reduced to the time-independent case by adding one more component $X^{d+1}(t)$ such that $dX^{d+1}(t) = dt$. Further, we assume that coefficients $\sigma(x) \in \mathbf{R}^d \otimes \mathbf{R}^r$ and $b(x) \in \mathbf{R}^d$ are continuous on $\mathbf{R}^d$ and the uniqueness in the sense of law of solutions holds. Let $P_x$, $x \in \mathbf{R}^d$, be the law on $W^d$, or on $\hat{W}^d$ if there is an explosion, of a solution with the initial law $\delta_x$ (= the unit measure at $x$). Then $\{P_x\}$ possesses the †strong Markov property with respect to $\{\mathcal{F}_t\}$, where $\mathcal{F}_t$ is a suitable completion of $\mathcal{B}_t(W^d)$ or $\mathcal{B}_t(\hat{W}^d)$, and hence $(W^d, \{\mathcal{F}_t\}, P_x)$ or $(\hat{W}^d, \{\mathcal{F}_t\}, P_x)$ is a diffusion process on $\mathbf{R}^d$ (→ 115 Diffusion Processes, 261 Markov Processes).

Let $A$ be the differential operator

$$A = \frac{1}{2}\sum_{i,j=1}^{d} a^{ij}(x)D_iD_j + \sum_{i=1}^{d} b^i(x)D_i \qquad (D_i = \partial/\partial x^i)$$

with the domain $C_K^2(\mathbf{R}^d)$ (= the totality of $C^2$-functions on $\mathbf{R}^d$ with compact supports), where $a^{ij}(x) = \sum_{k=1}^{r} \sigma_k^i(x)\sigma_k^j(x)$. By Itô's formula,

$$(7) \quad f(w(t)) - f(w(0)) - \int_0^t (Af)(w(s))\,ds$$

is an $\{\mathcal{F}_t\}$-martingale for every $f \in C_K^2(\mathbf{R}^d)$ (we set $f(\Delta) = 0$), and this property characterizes the diffusion process. The diffusion is generated by the operator $A$ in this sense. Furthermore, if for some $\lambda > 0$, $(\lambda - A)(C_K^2(\mathbf{R}^d))$ is a dense subset of $C_\infty(\mathbf{R}^d)$ (= the totality of continuous functions $f$ on $\mathbf{R}^d$ such that $\lim_{|x| \to \infty} f(x) = 0$), then the †transition semigroup of the diffusion is a †Feller semigroup on $C_\infty(\mathbf{R}^d)$, and its infinitesimal generator $\bar{A}$ is the closure of $(A, C_K^2(\mathbf{R}^d))$. Hence $u(t, x) = E_x[f(w(t))]$, $f \in C_K^2(\mathbf{R}^d)$, is the unique solution of the evolution equation $du/dt = \bar{A}u$, $u|_{t=0} = f$. Generally, if the coefficients $\sigma$ and $b$ are sufficiently smooth, we can show, by using the stochastic differential equation (3), that $u(t, x)$ is also smooth for a smooth $f$ and satisfies the heat equation $\partial u/\partial t = Au$. Taking the expectations in (7), we have the relation $E_x[f(w(t))] = f(x) + \int_0^t E_x[(Af)(w(s))]\,ds$, which implies that the transition probability $P(t, x, dy)$ of the diffusion satisfies the equation $\partial p/\partial t = A^*p$ in $(t, y)$ in a weak sense, where $A^*$ is the adjoint operator of $A$. If $\partial/\partial t - A^*$ is †hypoelliptic, we can conclude that $P(t, x, dy)$ possesses a smooth density $p(t, x, y)$ by appealing to the theory of partial differential equations. Recently, P. Malliavin showed that a probabilistic method based on the stochastic differential equations can also be applied to this problem effectively [6, 16, 17].

If $c(t, x)$ is continuous and $u(t, x)$ is sufficiently smooth in $(t, x)$ on $[0, \infty) \times \mathbf{R}^d$, then the following fact, more general than (7), holds:

$$(8) \quad u(t, w(t))\exp\left[\int_0^t c(s, w(s))\,ds\right] - u(0, x) - \int_0^t \exp\left[\int_0^s c(u, w(u))\,du\right]\left(\frac{\partial u}{\partial t} + (A + c)u\right)(s, w(s))\,ds$$

is a local martingale (i.e., $\in \mathcal{M}^{\mathrm{loc}}$) with respect to $\{\mathcal{F}_t, P_x\}$. By applying the †optional sampling theorem to (8) for a class of $\{\mathcal{F}_t\}$-stopping times, we can obtain the probabilistic representation in terms of the diffusion of solutions for initial or boundary value problems related to the operator $A$ [3,4].

F. Stochastic Differential Equations with Boundary Conditions

As we saw in the previous section, diffusion processes generated by differential operators can be constructed by stochastic differential equations. A diffusion process on a domain with boundary is generated by a differential operator that describes the behavior of the process inside the domain, and a boundary condition that describes the behavior of the process on the boundary of the domain. For example, consider a reflecting †Brownian motion on the half-line $[0, \infty)$. This is a diffusion process $X = (X_t)$ on $[0, \infty)$ obtained by setting $X_t = |x_t|$ from a 1-dimensional Brownian motion $x_t$. The corresponding differential operator is $A = \frac{1}{2}\,d^2/dx^2$, and the boundary condition is $Lu = du/dx|_{x=0} = 0$, that is, the transition expectation $u(t, x) = E_x[f(X_t)]$ is determined by $\partial u/\partial t = Au$, $Lu = 0$, and $u|_{t=0} = f$. In constructing such diffusion processes with boundary conditions, stochastic differential equations can be used effectively. In the case of reflecting Brownian motion, it was formulated by Skorokhod in the form

$$(9) \quad dX(t) = dB(t) + d\varphi(t).$$

Here $B(t)$ is a 1-dimensional Brownian motion ($B(0) = 0$), $X(t)$ is a continuous process such that $X(t) \ge 0$, and $\varphi(t)$ has the following property with probability 1: $\varphi(0) = 0$, $t \to \varphi(t)$ is continuous and nondecreasing and increases only on such $t$ that $X(t) = 0$, i.e., $\int_0^t I_{\{0\}}(X(s))\,d\varphi(s) = \varphi(t)$. Given a Brownian motion $B(t)$ and a nonnegative random variable $X(0)$ which are mutually independent, $X(t)$ satisfying (9) and with the initial value $X(0)$ is unique and given by $X(t) = X(0) + B(t)$, $t < \sigma_0 = \min\{t \mid X(0) + B(t) = 0\}$ and $X(t) = B(t) - \min_{\sigma_0 \le s \le t} B(s)$, $t \ge \sigma_0$ (P. Lévy, Skorokhod; → [6, 18]).

In the case of multidimensional processes, possible boundary conditions were determined by A. D. Venttsel' [24]. Stochastic differential equations describing these diffusions were formulated by N. Ikeda [5] in the 2-dimensional case and by Watanabe [23] in the general case as follows. Let $D$ be the upper half-space $\mathbf{R}^d_+ = \{x = (x^1, \ldots, x^d) \mid x^d \ge 0\}$, $\partial D = \{x \mid x^d = 0\}$, and $\mathring{D} = \{x \mid x^d > 0\}$. The general case can be reduced, at least locally, to this case. Suppose that the following system of functions is given: $\sigma(x): D \to \mathbf{R}^d \otimes \mathbf{R}^r$, $b(x): D \to \mathbf{R}^d$, $\tau(x): \partial D \to \mathbf{R}^{d-1} \otimes \mathbf{R}^s$, $\beta(x): \partial D \to \mathbf{R}^{d-1}$, and $\rho(x): \partial D \to [0, \infty)$, which are all bounded and continuous. Consider the following stochastic differential equation:

$$(10) \quad \begin{cases} dX^i(t) = \displaystyle\sum_{j=1}^{r} \sigma_j^i(X(t))\,I_{\mathring{D}}(X(t))\,dB^j(t) + b^i(X(t))\,I_{\mathring{D}}(X(t))\,dt + \sum_{m=1}^{s} \tau_m^i(X(t))\,I_{\partial D}(X(t))\,dM^m(t) + \beta^i(X(t))\,d\varphi(t), \quad i = 1, 2, \ldots, d - 1,\\[6pt] dX^d(t) = \displaystyle\sum_{j=1}^{r} \sigma_j^d(X(t))\,I_{\mathring{D}}(X(t))\,dB^j(t) + b^d(X(t))\,I_{\mathring{D}}(X(t))\,dt + d\varphi(t),\\[6pt] I_{\partial D}(X(t))\,dt = \rho(X(t))\,d\varphi(t). \end{cases}$$

By a solution of this equation, we mean a system of continuous semimartingales $\mathfrak{X} = (X(t), B(t), M(t), \varphi(t))$ over a probability space $(\Omega, \mathcal{F}, P)$ with an increasing family $\{\mathcal{F}_t\}$ satisfying the following conditions: (i) $X(t) = (X^1(t), \ldots, X^d(t))$ is $D$-valued, i.e., $X^d(t) \ge 0$; (ii) with probability 1, $\varphi(0) = 0$, $t \to \varphi(t)$ is nondecreasing, and $\int_0^t I_{\partial D}(X(s))\,d\varphi(s) = \varphi(t)$; (iii) $B(t)$ and $M(t)$ are $r$-dimensional and $s$-dimensional systems of elements in $\mathcal{M}^{\mathrm{loc}}$, respectively, such that $\langle B^i, B^j\rangle_t = \delta^{ij}t$, $\langle B^i, M^m\rangle_t = 0$, and $\langle M^m, M^n\rangle_t = \delta^{mn}\varphi(t)$, $i, j = 1, \ldots, r$, $m, n = 1, \ldots, s$; and finally (iv) the stochastic differentials of these semimartingales satisfy (10).

The processes $B(t)$, $M(t)$, and $\varphi(t)$ are subsidiary, and the process $X(t)$ itself is often called a solution. We say that the uniqueness of solution holds if the law of $X = (X(t))$ is uniquely determined from the law of $X(0)$. As before, the existence and the uniqueness of solutions imply that solutions define a diffusion process on $D$, and these are guaranteed if, for example, $\min_{x \in \partial D} a^{dd}(x) > 0$ and $\sigma$, $b$, $\tau$, $\beta$ are Lipschitz continuous [6,23]. Here, we set $a^{ij}(x) = \sum_{k=1}^{r} \sigma_k^i(x)\sigma_k^j(x)$ and $\alpha^{ij}(x) = \sum_{m=1}^{s} \tau_m^i(x)\tau_m^j(x)$. It is a diffusion process generated by the differential operator

$$A = \frac{1}{2}\sum_{i,j=1}^{d} a^{ij}(x)D_iD_j + \sum_{i=1}^{d} b^i(x)D_i,$$

and by the Venttsel' boundary condition,

$$Lu(x) = \frac{1}{2}\sum_{i,j=1}^{d-1} \alpha^{ij}(x)D_iD_ju(x) + \sum_{i=1}^{d-1} \beta^i(x)D_iu(x) + D_du(x) - \rho(x)(Au)(x) = 0 \quad \text{on } \partial D.$$

G. Stochastic Differential Equations on Manifolds

Let $M$ be a connected $\sigma$-compact $C^\infty$-manifold of dimension $d$, and let $W_M = C([0, \infty) \to M)$ be the space of all continuous paths in $M$. If $M$ is not compact, let $\hat{M} = M \cup \{\Delta\}$ be the one-point compactification of $M$ and $\hat{W}_M$ be the space of all continuous paths in $\hat{M}$ with $\Delta$ as a †trap. These path spaces are endowed with the $\sigma$-fields $\mathcal{B}(W_M)$ and $\mathcal{B}(\hat{W}_M)$, respectively, which are generated by Borel cylinder sets. By a continuous process on $M$ we mean a $(W_M, \mathcal{B}(W_M))$-valued random variable, and by a continuous process on $M$ admitting explosions we mean a $(\hat{W}_M, \mathcal{B}(\hat{W}_M))$-valued random variable. In this section the probability space is taken to be the $r$-dimensional Wiener space $(W_0^r, \mathcal{F}, P)$ with the increasing family $\{\mathcal{F}_t\}$, where $\mathcal{F}_t$ is generated by $\mathcal{B}_t(W_0^r)$ and $P$-null sets. Then $w = (w(t))$, $w \in W_0^r$, is an $r$-dimensional $\{\mathcal{F}_t\}$-Brownian motion.

Suppose that we are given a system of $C^\infty$-vector fields $A_0, A_1, \ldots, A_r$ on $M$. We consider the following stochastic differential equation on $M$:

$$(11) \quad dX_t = A_k(X_t) \circ dw^k(t) + A_0(X_t)\,dt.$$

(Here, the usual convention for the omission of the summation sign is used.) A precise meaning of equation (11) is as follows: We say that $X = (X_t)$ satisfies equation (11) if $X$ is an $\{\mathcal{F}_t\}$-adapted continuous process on $M$ admitting explosions such that, for any $C^\infty$-function $f$ on $M$ with compact support (we set $f(\Delta) = 0$), $f(X_t)$ is a continuous semimartingale satisfying

$$(12) \quad df(X_t) = (A_kf)(X_t) \circ dw^k(t) + (A_0f)(X_t)\,dt,$$

where $\circ$ is Itô's circle operation defined in Section C. This is equivalent to saying that $X_t = (X_t^1, \ldots, X_t^d)$, in each local coordinate, is a $d$-dimensional semimartingale such that

$$(13) \quad dX_t^i = \sigma_k^i(X_t) \circ dw^k(t) + b^i(X_t)\,dt = \sigma_k^i(X_t)\,dw^k(t) + \left[\frac{1}{2}\sum_{k=1}^{r} (D_j\sigma_k^i)\sigma_k^j + b^i\right](X_t)\,dt,$$

where $A_k(x) = \sigma_k^i(x)D_i$, $k = 1, 2, \ldots, r$, and $A_0(x) = b^i(x)D_i$. By solving the equation in each local coordinate and then putting these solutions together, we can obtain for each $x \in M$ a unique solution $X_t$ of (11) such that $X_0 = x$. We can also embed the manifold $M$ in a higher-dimensional Euclidean space and solve the stochastic differential equation there. We denote the solution by $X(t, x, w)$. The law $P_x$ on $\hat{W}_M$ of $[t \to X(t, x, w)]$ defines a diffusion process on $M$ which is generated by the differential operator $A = \frac{1}{2}\sum_{k=1}^{r} A_k^2 + A_0$.

Next, if we consider the mapping $x \to X(t, x, w)$, then, except for $w$ belonging to a set of $P$-measure 0, the following is valid: For all $(t, w)$ such that $X(t, x_0, w) \in M$, the mapping $x \to X(t, x, w)$ is a diffeomorphism between a neighborhood of $x_0$ and a neighborhood of $X(t, x_0, w)$. This is based on the following fact for stochastic differential equations on $\mathbf{R}^d$. If in equation (3) the coefficients $\sigma_k^i$ and $b^i$ are $C^\infty$-functions with bounded derivatives of all orders $\alpha$, $|\alpha| \ge 1$, then, denoting by $X(t, x, w)$ the solution such that $X(0) = x$, we have that $x \to X(t, x, w)$ is, with probability 1, a diffeomorphism of $\mathbf{R}^d$ for all $t$ [13].

Example 1: Stochastic moving frame [6,15]. Let $M$ be a Riemannian manifold of dimension $d$, $O(M)$ be the orthonormal frame bundle over $M$, and $L_1, L_2, \ldots, L_d$ be the basic vector fields on $O(M)$, that is,

$$(L_if)(x, e) = \lim_{t \to 0} \frac{1}{t}\left[f(x_t, e_t) - f(x, e)\right], \qquad i = 1, \ldots, d,$$
where $e = (e_1, \ldots, e_d)$ is an orthonormal basis in $T_x(M)$, $x_t = \operatorname{Exp}(t e_i)$, i.e., the geodesic such that $x_0 = x$ and $\dot{x}_0 = e_i$, and $e_t$ is the parallel translate of $e$ along $x_t$. Let $b$ be a vector field on $M$ and $L_b$ be its horizontal lift on $O(M)$, i.e., $L_b$ is a vector field on $O(M)$ determined by the following two properties: (i) $L_b$ is horizontal and (ii) $\pi_*(L_b) = b$, where $\pi: O(M) \to M$ is the projection. Consider the following stochastic differential equation on $O(M)$:

$$dr(t) = L_i(r(t)) \circ dw^i(t) + L_b(r(t))\,dt.$$

Solutions determine a family of (local) diffeomorphisms $r \to r(t, r, w) = (X(t, r, w), e(t, r, w))$ on $O(M)$. The law of $[t \to X(t, r, w)]$ depends only on $x = \pi(r)$, and it defines a diffusion process on $M$ that is generated by the differential operator $\frac{1}{2}\Delta_M + b$ ($\Delta_M$ is the Laplace-Beltrami operator). Using this stochastic moving frame $r(t, r, w)$, we can realize a stochastic parallel translation of tensor fields along the paths of Brownian motion on $M$ (a diffusion generated by $\frac{1}{2}\Delta_M$) that was first introduced by Itô [10], and by using it we can treat heat equations for tensor fields by means of a probabilistic method.

Example 2: Brownian motion on Lie groups. Let $G$ be a Lie group. A stochastic process $\{g(t)\}$ on $G$ is called a right-invariant Brownian motion if it satisfies the following conditions: (i) With probability 1, $g(0) = e$ (the identity), and $t \to g(t)$ is continuous; (ii) for every $t > s$, $g(t)g(s)^{-1}$ and $\sigma(g(u); u \le s)$ are independent; and (iii) for every $t > s$, $g(t)g(s)^{-1}$ and $g(t - s)$ are equally distributed.

Let $A_0, A_1, \ldots, A_r$ be a system of right-invariant vector fields on $G$, and consider the stochastic differential equation

$$dg_t = A_i(g_t) \circ dw^i(t) + A_0(g_t)\,dt. \tag{14}$$

Then a solution of (14) with $g_0 = e$ exists uniquely and globally; we denote this solution by $g^0(t, w)$. It is a right-invariant Brownian motion on $G$, and conversely, every right-invariant Brownian motion can be obtained in this way. The system of diffeomorphisms $g \to g(t, g, w)$ defined by the solutions of (14) is given by $g(t, g, w) = g^0(t, w)g$.

Generally, if $M$ is a compact manifold, the system of diffeomorphisms $g_t: x \to X(t, x, w)$ defined by equation (11) can be considered as a right-invariant Brownian motion on the infinite-dimensional Lie group consisting of all diffeomorphisms of $M$ [2].

References

[1] B. S. Tsirel'son (Cirel'son), An example of stochastic differential equation having no strong solution, Theory Prob. Appl., 20 (1975), 416-418.
[2] K. D. Elworthy, Stochastic dynamical systems and their flows, Stochastic Analysis, A. Friedman and M. Pinsky (eds.), Academic Press, 1978, 79-95.
[3] A. Friedman, Stochastic differential equations and applications I, II, Academic Press, 1975.
[4] I. I. Gikhman and A. V. Skorokhod, Stochastic differential equations, Springer, 1972.
[5] N. Ikeda, On the construction of two-dimensional diffusion processes satisfying Wentzell's boundary conditions and its application to boundary value problems, Mem. Coll. Sci. Univ. Kyoto, (A, Math.) 33 (1961), 367-427.
[6] N. Ikeda and S. Watanabe, Stochastic differential equations and diffusion processes, Kodansha and North-Holland, 1981.
[7] K. Itô, Differential equations determining Markov processes (in Japanese), Zenkoku Shijô Sûgaku Danwakai, 1077 (1942), 1352-1400.
[8] K. Itô, On stochastic differential equations, Mem. Amer. Math. Soc., 4 (1951).
[9] K. Itô, Stochastic differentials, Appl. Math. and Optimization, 1 (1975), 347-381.
[10] K. Itô, The Brownian motion and tensor fields on Riemannian manifold, Proc. Intern. Congr. Math., Stockholm 1962, 536-539.
[11] K. Itô and S. Watanabe, Introduction to stochastic differential equations, Proc. Intern. Symp. SDE, Kyoto, 1976, K. Itô (ed.), Kinokuniya, 1978, i-xxx.
[12] T. Jeulin, Semi-martingales et grossissement d'une filtration, Lecture notes in math. 833, Springer, 1980.
[13] H. Kunita, On the decomposition of solutions of stochastic differential equations, Proc. LMS Symp. Stochastic Integrals, Durham, 1980.
[14] H. Kunita and S. Watanabe, On square integrable martingales, Nagoya Math. J., 30 (1967), 209-245.
[15] P. Malliavin, Géométrie différentielle stochastique, Les Presses de l'Université de Montréal, 1978.
[16] P. Malliavin, Stochastic calculus of variation and hypoelliptic operators, Proc. Intern. Symp. SDE, Kyoto, 1976, K. Itô (ed.), Kinokuniya, 1978, 195-263.
[17] P. Malliavin, $C^k$-hypoellipticity with degeneracy, Stochastic Analysis, A. Friedman and M. Pinsky (eds.), Academic Press, 1978, 199-214, 327-340.
[18] H. P. McKean, Stochastic integrals, Academic Press, 1969.
[19] P. A. Meyer, Un cours sur les intégrales stochastiques, Lecture notes in math. 511, Springer, 1976, 245-400.
[20] A. V. Skorokhod, Studies in the theory of random processes, Addison-Wesley, 1965.
[21] D. W. Stroock and S. R. S. Varadhan, Diffusion processes with continuous coefficients I, II, Comm. Pure Appl. Math., 22 (1969), 345-400, 479-530.
[22] D. W. Stroock and S. R. S. Varadhan, Multidimensional diffusion processes, Springer, 1979.
[23] S. Watanabe, On stochastic differential equations for multi-dimensional diffusion processes with boundary conditions, I, II, J. Math. Kyoto Univ., 11 (1971), 169-180, 545-551.
[24] A. D. Venttsel' (Wentzell), On boundary conditions for multidimensional diffusion processes, Theory Prob. Appl., 4 (1959), 164-177.
[25] T. Yamada and S. Watanabe, On the uniqueness of solutions of stochastic differential equations, J. Math. Kyoto Univ., 13 (1973), 497-512.

407 (XVII.4)
Stochastic Processes

A. Definitions

The theory of stochastic processes was originally involved with forming mathematical models of phenomena whose development in time obeys probabilistic laws. Given a basic probability space $(\Omega, \mathfrak{B}, P)$ and a set $T$ of real numbers, a family $\{X_t\}_{t \in T}$ of real-valued random variables defined on $(\Omega, \mathfrak{B}, P)$ is called a stochastic process (or simply process) over $(\Omega, \mathfrak{B}, P)$, where $t$ is usually called the time parameter of the process. For each finite $t$-set $\{t_1, \ldots, t_n\}$, the joint distribution of $(X_{t_1}, \ldots, X_{t_n})$ is called a finite-dimensional distribution of the process $\{X_t\}_{t \in T}$. Stochastic processes are classified into large groups such as additive processes (or processes with independent increments), Markov processes, Markov chains, diffusion processes, Gaussian processes, stationary processes, martingales, and branching processes, according to the properties of their finite-dimensional distributions. This classification is possible because of the following fact, a consequence of Kolmogorov's extension theorem (→ 341 Probability Measures I): Given a system $\mathcal{P}$ of finite-dimensional distributions satisfying certain consistency conditions, we can construct a suitable probability measure on the space $W = R^T$ of real-valued functions on $T$ so that the stochastic process $\{X_t\}_{t \in T}$, obtained by setting $X_t(w) =$ the value of $w \in W$ at $t$, has $\mathcal{P}$ as its system of finite-dimensional distributions. Now, consider two stochastic processes $\mathcal{X} = \{X_t\}_{t \in T}$ and $\mathcal{Y} = \{Y_t\}_{t \in T}$. $\mathcal{Y}$ is called a modification of $\mathcal{X}$ if they are defined over a common probability space $(\Omega, \mathfrak{B}, P)$ and $P(X_t = Y_t) = 1$ $(t \in T)$. Regardless of whether $\mathcal{X}$ and $\mathcal{Y}$ are defined over a common probability space or over different probability spaces, $\mathcal{X}$ and $\mathcal{Y}$ are said to be equivalent, or each is said to be a version of the other, if their finite-dimensional distributions are the same. According to Kolmogorov's extension theorem, every stochastic process has a version over the space $W = R^T$.

The function $X_{\cdot}(\omega)$ of $t$ obtained by fixing $\omega$ in a stochastic process $\{X_t\}_{t \in T}$ is called the sample function (sample process or path) corresponding to $\omega$. In applying various operations to stochastic processes and studying detailed properties of stochastic processes, such as continuity of sample functions, the notions of measurability and separability play important roles. We assume that $T$ is an interval in the real line, and (if needed) that the probability measure $P$ is complete. Denote by $\mathfrak{T}$ the class of all Borel subsets of $T$. A stochastic process $\{X_t\}_{t \in T}$ is said to be measurable if the function $X_t(\omega)$ of $(t, \omega)$ is $\mathfrak{T} \times \mathfrak{B}$-measurable. Continuity in probability defined in the next paragraph gives a sufficient condition for a stochastic process to have a measurable modification. A stochastic process $\{X_t\}_{t \in T}$ is said to be separable if there exists a countable subset $S$ of $T$ such that

$$P\Bigl(\liminf_{s \to t,\, s \in S} X_s(\omega) \le X_t(\omega) \le \limsup_{s \to t,\, s \in S} X_s(\omega) \ \text{for any } t \in T\Bigr) = 1.$$

It was proved by J. L. Doob that every stochastic process has a separable modification [6].

Various types of continuity are considered for stochastic processes. $\{X_t\}_{t \in T}$ is said to be continuous in probability at $s \in T$ if $P(|X_t - X_s| > \varepsilon) \to 0$ $(t \to s,\ t \in T)$ for each $\varepsilon > 0$; it is said to be continuous in the mean (of order 1) at $s \in T$ if $E(|X_t - X_s|) \to 0$ $(t \to s,\ t \in T)$. Continuity in the mean of order $p$ $(> 1)$ is defined similarly. Continuity in the mean of any order implies continuity in probability. Suppose that $\{X_t\}_{t \in T}$ is separable. Then the sets $\Omega_s = \{\omega \mid$ the sample function $X_{\cdot}(\omega)$ is discontinuous at $s\}$ $(s \in T)$ and $\bigcup_{s \in T} \Omega_s$ are measurable events. If $P(\Omega_s) > 0$, then $s \in T$ is called a fixed point of discontinuity. The condition $P(\bigcup_{s \in T} \Omega_s) = 0$ means that almost all sample functions are continuous. Regularity properties of sample functions of processes, such as continuity or right continuity, have
been studied by many people. The following theorem is due to A. N. Kolmogorov: Let $T = [0, 1]$. If

$$E(|X_t - X_s|^{\gamma}) \le c\,|t - s|^{1 + \varepsilon}$$

for constants $\gamma > 0$, $\varepsilon > 0$, and $c > 0$, then $\{X_t\}_{t \in T}$ has a modification $\{\tilde{X}_t\}_{t \in T}$ for which almost all sample functions are continuous, and

$$P\Bigl(\lim_{h \downarrow 0}\ \sup_{|t - s| < h} \frac{|\tilde{X}_t - \tilde{X}_s|}{h^{\delta}} = 0\Bigr) = 1$$

for any $\delta$ $(0 < \delta < \varepsilon/\gamma)$. Each of the following is a sufficient condition for $\mathcal{X} = \{X_t\}_{t \in T}$ to have a modification for which almost all sample paths are right continuous functions with left limits. (i) $\mathcal{X}$ is an additive process which is continuous in probability (P. Lévy [3,4], K. Itô [8]; → 5 Additive Processes B). (ii) $\mathcal{X}$ is a supermartingale that is continuous in probability (Doob [6]; → 262 Martingales C).

B. Increasing Families of σ-Algebras

In the investigation of stochastic processes (especially Markov processes, martingales, and stochastic differential equations), the notion of increasing families of σ-algebras often plays an important role. Let $(\Omega, \mathfrak{B}, P)$ be a probability space, and let $T = [0, \infty)$. A family $\{\mathfrak{B}_t\}_{t \in T}$ of σ-subalgebras of $\mathfrak{B}$ is called an increasing family of σ-algebras on $(\Omega, \mathfrak{B}, P)$ if $\mathfrak{B}_s \subset \mathfrak{B}_t$ for $s < t$. A process $\{X_t\}_{t \in T}$ is said to be adapted to $\{\mathfrak{B}_t\}$ if $X_t$ is $\mathfrak{B}_t$-measurable for each $t \in T$. $\{X_t\}$ is said to be progressively measurable (or a progressive process) with respect to $\{\mathfrak{B}_t\}$ if for every $t \in T$ the mapping $(s, \omega) \mapsto X_s(\omega)$ of $[0, t] \times \Omega$ into $R$ is measurable with respect to the σ-field $\mathfrak{B}([0, t]) \times \mathfrak{B}_t$. A process $\{X_t\}$ with right continuous paths, adapted to $\{\mathfrak{B}_t\}$, is progressively measurable with respect to $\{\mathfrak{B}_t\}$. The same conclusion holds for a process with left continuous paths. A subset $A$ of $[0, \infty) \times \Omega$ is said to be progressive if the indicator process $a_t(\omega) = 1_A(t, \omega)$ of $A$ is a progressive process. A random time $\tau$ on $\Omega$ with values in $[0, \infty]$ is called a stopping time (or Markov time) if $\{\tau \le t\} \in \mathfrak{B}_t$ for all $t \ge 0$. Constants $(\ge 0)$ are stopping times. If $\sigma$ and $\tau$ are stopping times, then $\min(\sigma, \tau)$ and $\max(\sigma, \tau)$ are also stopping times. The limit of an increasing sequence of stopping times is a stopping time, while the limit of a decreasing sequence of stopping times is a stopping time with respect to $\{\mathfrak{B}_{t+}\}$, where $\mathfrak{B}_{t+} = \bigcap_{s > t} \mathfrak{B}_s$. Let $\mathfrak{B}_\tau$ be the class of $A \in \mathfrak{B}$ such that $A \cap \{\tau \le t\} \in \mathfrak{B}_t$ $(\forall t \ge 0)$; then it is a σ-algebra if $\tau$ is a stopping time. If $\{X_t\}$ is a progressive process and if $\tau$ is a stopping time, then $X_\tau 1_{\{\tau < \infty\}}$ is $\mathfrak{B}_\tau$-measurable. An increasing family $\{\mathfrak{B}_t\}$ of σ-algebras on $(\Omega, \mathfrak{B}, P)$ is said to be complete if the probability space $(\Omega, \mathfrak{B}, P)$ is complete and if all the $P$-negligible sets belong to $\mathfrak{B}_0$. From now on we assume that $\{\mathfrak{B}_t\}$ is complete and right continuous (i.e., $\mathfrak{B}_t = \mathfrak{B}_{t+}$ for all $t \ge 0$). Let $B$ be a subset of $R$ and $\{X_t\}$ be a process. We call $\tau_B = \inf\{t \ge 0 \mid X_t(\omega) \in B\}$ a hitting time for $B$. Measurability of $\tau_B$ is not always guaranteed, that is, $\tau_B$ is not always a stopping time. G. A. Hunt showed that for a wide class of Markov processes hitting times for analytic sets are stopping times. This result is based on a theorem of G. Choquet on capacitability and was generalized by P. A. Meyer as follows: (i) For every progressively measurable process, hitting times for analytic sets are stopping times; and (ii) for every progressive set $A$, $D_A = \inf\{t \ge 0 \mid (t, \omega) \in A\}$ is a stopping time. The following notions on measurability are also important. The predictable σ-algebra on $[0, \infty) \times \Omega$, denoted by $\mathcal{P}$, is defined to be the least σ-algebra on $[0, \infty) \times \Omega$ with respect to which every process $X_t(\omega)$ that is adapted to $\{\mathfrak{B}_t\}$ and has left continuous paths is measurable in $(t, \omega)$. The well-measurable or optional σ-algebra on $[0, \infty) \times \Omega$, denoted by $\mathcal{O}$, is defined to be the least σ-algebra on $[0, \infty) \times \Omega$ with respect to which every process $X_t(\omega)$ that is adapted to $\{\mathfrak{B}_t\}$ and has right continuous paths with left limits is measurable in $(t, \omega)$. A process $\{X_t\}$ defined on $\Omega$ is said to be predictable (resp. well-measurable or optional) if the function $(t, \omega) \mapsto X_t(\omega)$ on $[0, \infty) \times \Omega$ is measurable with respect to the predictable σ-algebra $\mathcal{P}$ (resp. the optional σ-algebra $\mathcal{O}$). For further information regarding the notions given in this section → [10].

Up to this point it was assumed that the space in which a process $\{X_t\}_{t \in T}$ takes values, namely, the state space of $\{X_t\}_{t \in T}$, is a set of real numbers; but in general, topological spaces or merely measurable spaces can be taken for the state spaces of stochastic processes. The general definitions and results already given can be extended to stochastic processes whose state spaces are locally compact Hausdorff spaces satisfying the second countability axiom.

Moreover, the time parameter set $T$ of a process $\{X_t\}_{t \in T}$ need not be a set of real numbers. For example, P. Lévy [12] and H. P. McKean [13] investigated stochastic processes with several-dimensional time; such processes are sometimes called random fields. The case in which $T$ is $\mathcal{D}$, $\mathcal{S}$, or in general a space of functions (which is nuclear) has also been investigated (→ Section C). A probabilistic formulation of equilibrium states given by R. L. Dobrushin [14] initiated recent probabilistic study of statistical mechanics. For further information concerning processes with general
time parameter spaces → 136 Ergodic Theory, 176 Gaussian Processes, 340 Probabilistic Methods in Statistical Mechanics.

C. Random Distributions and Generalized Stochastic Processes

The investigation of random distributions was initiated by I. M. Gel'fand [15] and Itô [17]. Denote by $\mathcal{D}$ the space of functions of $t$ $(-\infty < t < \infty)$ of class $C^\infty$ with compact support, and by $\mathcal{D}'$ the space of distributions. A function $X(\varphi, \omega)$ of $\omega \in \Omega$ and $\varphi \in \mathcal{D}$ is called a random distribution (or generalized stochastic process) if $X(\varphi, \omega)$ is a distribution as a function of $\varphi$ for almost all $\omega$ and is measurable as a function of $\omega$ for each fixed $\varphi$. Denote by $\mathfrak{B}(\mathcal{D}')$ the smallest σ-algebra containing sets of the form $\{x \in \mathcal{D}' \mid x(\varphi) \in E\}$ ($\varphi \in \mathcal{D}$, $E$ is a Borel set). A random distribution is nothing but a $\mathcal{D}'$-valued random variable. For a random distribution $X(\varphi, \omega)$, a probability measure $\Phi_X$ on $\mathfrak{B}(\mathcal{D}')$ is induced by

$$\Phi_X(B) = P\{\omega \mid X(\cdot, \omega) \in B\}, \qquad B \in \mathfrak{B}(\mathcal{D}').$$

The functional

$$C(\varphi) = E\bigl(e^{iX(\varphi)}\bigr) = \int_{\mathcal{D}'} e^{i x(\varphi)}\,d\Phi_X(x)$$

is called the characteristic functional of $X(\varphi, \omega)$ or $\Phi_X$. The functional $C(\varphi)$ is continuous positive definite, and $C(0) = 1$. Conversely, given a functional $C(\varphi)$ with these properties, a theorem of R. A. Minlos (→ 341 Probability Measures J) states that there exists a unique probability measure $\Phi$ on $\mathfrak{B}(\mathcal{D}')$ whose characteristic functional equals $C(\varphi)$. In other words, a random distribution with the characteristic functional $C(\varphi)$ can be constructed over $(\mathcal{D}', \mathfrak{B}(\mathcal{D}'), \Phi)$.

Typical classes of random distributions that have been investigated so far are stationary random distributions and random distributions with independent values at every point. (For stationary ones → 395 Stationary Processes.) A random distribution $X(\varphi, \omega)$ is called a random distribution with independent values at every point if $\varphi_1(t)\varphi_2(t) \equiv 0$ implies the independence of $X(\varphi_1, \omega)$ and $X(\varphi_2, \omega)$, that is, $C(\varphi_1 + \varphi_2) = C(\varphi_1)C(\varphi_2)$. A sufficient condition for the functional of the form

$$C(\varphi) = \exp\Bigl(\int_{-\infty}^{\infty} f(\varphi(t), \varphi'(t), \ldots, \varphi^{(k)}(t))\,dt\Bigr)$$

($f$ is continuous and $f(0, \ldots, 0) = 0$) to be the characteristic functional of a stationary random distribution with independent values at every point is that the function $\exp(s f(x_0, x_1, \ldots, x_k))$ of $(x_0, x_1, \ldots, x_k) \in R^{k+1}$ be positive definite for each $s > 0$ [16]. Under this condition, a concrete representation of $f$ is known [16]. When $k = 0$, a necessary and sufficient condition for $C(\varphi)$ to be the characteristic functional of a stationary random distribution with independent values at every point is that $\exp f(x)$ be the characteristic function of an infinitely divisible distribution. One such random distribution is the so-called Gaussian white noise, namely, the distribution derivative of Brownian motion, whose characteristic functional is

$$\exp\Bigl(-\frac{1}{2}\int_{-\infty}^{\infty} |\varphi(t)|^2\,dt\Bigr).$$

A family $\{X_\varphi\}_{\varphi \in \mathcal{D}}$ of real-valued random variables indexed by $\mathcal{D}$ is called a random distribution in the wide sense if $X_\varphi$ is linear in $\varphi$, namely, $X_{a\varphi + b\psi} = aX_\varphi + bX_\psi$ with probability 1 for fixed $\varphi, \psi \in \mathcal{D}$ and real constants $a$, $b$, and if $X_\varphi \to 0$ in probability whenever $\varphi \to 0$ in the topology of $\mathcal{D}$. (For a typical class of random distributions in the wide sense → 395 Stationary Processes C.) A random distribution in the wide sense has a modification that is a random distribution.

In the definition of random distributions (in the wide sense) one can replace the space $\mathcal{D}$ by the space $\mathcal{S}$ of rapidly decreasing $C^\infty$-functions or in general by some space $\Phi$ of functions that is nuclear. For example, one can define random distributions as $\mathcal{S}'$-valued random variables.

Up to this point random distributions of one variable have been considered. Random distributions of several variables are called generalized random fields and have been investigated by Itô [18], A. M. Yaglom [19], Gel'fand and N. Ya. Vilenkin [16], and others. Moreover, K. Urbanik [20,21] developed a theory of generalized stochastic processes based on J. Mikusiński's theory instead of L. Schwartz's theory of distributions.

D. Random Measure

Let $(S, \mathfrak{S}, m)$ be any σ-finite measure space, and put $\mathfrak{S}_0 = \{A \in \mathfrak{S} \mid m(A) < \infty\}$. By virtue of Kolmogorov's extension theorem, there exists a family $\{W(A)\}_{A \in \mathfrak{S}_0}$ of real random variables indexed by $\mathfrak{S}_0$ such that (i) for any mutually disjoint $A_1, \ldots, A_n \in \mathfrak{S}_0$, $\{W(A_1), \ldots, W(A_n)\}$ is independent; (ii) for any $A \in \mathfrak{S}_0$, $W(A)$ is Gaussian distributed with mean 0 and variance $m(A)$; and (iii) for any $A, B \in \mathfrak{S}_0$, $E(W(A)W(B)) = m(A \cap B)$. Similarly, there exists a family $\{N(A)\}_{A \in \mathfrak{S}_0}$ of real-valued random variables indexed by $\mathfrak{S}_0$ such that (i) for any mutually disjoint $A_1, \ldots, A_n \in \mathfrak{S}_0$, $\{N(A_1), \ldots, N(A_n)\}$ is independent; (ii) for any $A \in \mathfrak{S}_0$, $N(A)$ is Poisson distributed with mean $m(A)$; and (iii) for
any $A, B \in \mathfrak{S}_0$, $E(N(A)N(B)) = m(A \cap B)$. $\{W(A)\}_{A \in \mathfrak{S}_0}$ and $\{N(A)\}_{A \in \mathfrak{S}_0}$ are called a Gaussian random measure and a Poisson random measure associated with the measure space $(S, \mathfrak{S}, m)$, respectively. By using these random measures, the theory of multiple integrals can be developed.

By a point function $p$ on $S$ we mean a mapping $p: D_p \to S$, where the domain $D_p$ is a countable subset of $(0, \infty)$. $p$ defines a counting measure $N_p$ on $(0, \infty) \times S$ such that $N_p((0, t] \times U) = \#\{s \in D_p;\ s \le t,\ p(s) \in U\}$ $(t > 0,\ U \in \mathfrak{S})$. For a point function $p$ and $t > 0$, the shift point function $\theta_t p$ is defined by $D_{\theta_t p} = \{s \in (0, \infty);\ s + t \in D_p\}$ and $(\theta_t p)(s) = p(s + t)$. Let $\Pi_S$ be the totality of point functions on $S$ and $\mathfrak{B}(\Pi_S)$ be the smallest σ-field on $\Pi_S$ with respect to which all $p \to N_p((0, t] \times U)$, $t > 0$, $U \in \mathfrak{S}$, are measurable. A point process on $S$ is a $(\Pi_S, \mathfrak{B}(\Pi_S))$-valued random variable. Then there exists a point process $p$ on $S$ such that (i) for any $t > 0$, $p$ and $\theta_t p$ have the same probability law, and (ii) $N_p$ is a Poisson random measure associated with $((0, \infty) \times S,\ \mathfrak{B}((0, \infty)) \times \mathfrak{S},\ dt \times m(ds))$. The point process $p$ is called the stationary Poisson point process with the characteristic measure $m$.

References

[1] A. N. Kolmogorov, Grundbegriffe der Wahrscheinlichkeitsrechnung, Erg. Math., 1933; English translation, Foundations of the theory of probability, Chelsea, 1950.
[2] N. Wiener, Differential-space, J. Math. Phys., 2 (1923), 131-174.
[3] P. Lévy, Processus stochastiques et mouvement brownien, Gauthier-Villars, 1948.
[4] P. Lévy, Théorie de l'addition des variables aléatoires, Gauthier-Villars, 1937.
[5] M. S. Bartlett, An introduction to stochastic processes, Cambridge Univ. Press, 1955.
[6] J. L. Doob, Stochastic processes, Wiley, 1953.
[7] J. L. Doob, Stochastic processes depending on a continuous parameter, Trans. Amer. Math. Soc., 42 (1937), 107-140.
[8] K. Itô, Stochastic processes, Aarhus Univ. lecture notes 16, 1969.
[9] P. A. Meyer, Une présentation de la théorie des ensembles sousliniens; application aux processus stochastiques, Sém. Théorie du Potentiel, 1962-1963, no. 2, Inst. H. Poincaré, Univ. Paris, 1964.
[10] C. Dellacherie and P. A. Meyer, Probabilités et potentiel, Hermann, 1975, chs. I-IV.
[11] C. Dellacherie, Capacités et processus stochastiques, Springer, 1972.
[12] P. Lévy, Le mouvement brownien fonction d'un ou de plusieurs paramètres, Rend. Sem. Math. Univ. Padova, (5) 22 (1963), 24-101.
[13] H. P. McKean, Jr., Brownian motion with a several-dimensional time, Theory Prob. Appl., 8 (1963), 335-354.
[14] R. L. Dobrushin, The description of the random field by its conditional distributions and its regularity conditions, Theory Prob. Appl., 13-14 (1968), 197-224. (Original in Russian, 1968.)
[15] I. M. Gel'fand, Generalized random processes (in Russian), Dokl. Akad. Nauk SSSR (N.S.), 100 (1955), 853-856.
[16] I. M. Gel'fand and N. Ya. Vilenkin, Generalized functions IV, Academic Press, 1964. (Original in Russian, 1961.)
[17] K. Itô, Stationary random distributions, Mem. Coll. Sci. Univ. Kyoto, (A) 28 (1954), 209-223.
[18] K. Itô, Isotropic random current, Proc. 3rd Berkeley Symp. Math. Stat. Prob. II, Univ. of California Press, 1956, 125-132.
[19] A. M. Yaglom (Jaglom), Some classes of random fields in n-dimensional space related to stationary random processes, Theory Prob. Appl., 2 (1957), 273-320. (Original in Russian, 1957.)
[20] K. Urbanik, Stochastic processes whose sample functions are distributions, Theory Prob. Appl., 1 (1956), 132-134. (Original in Russian, 1956.)
[21] K. Urbanik, Generalized stochastic processes, Studia Math., 16 (1958), 268-334.

408 (XIX.7)
Stochastic Programming

A. General Remarks

Stochastic programming is a method of finding optimal solutions in mathematical programming in its narrow sense (→ 264 Mathematical Programming), when some or all coefficients are stochastic variables with known probability distributions. There are essentially two different types of models in stochastic programming situations: One is chance-constrained programming (CCP), and the other is two-stage stochastic programming (TSSP). The difference between them depends mainly on the informational structure of the sequence of observations and decisions. For simplicity, let us here consider stochastic linear programming, which is the best-known model at present. Let $A_0$, $A$ be $m \times n$-dimensional matrices and $x, c \in R^n$ and $b, b_0 \in R^m$. Suppose further that components of $A$, $b$, $c$ are random variables, while those of $A_0$, $b_0$ are constants. Consider
the following formally defined linear programming problem: $\min_x\{c'x \mid Ax \le b,\ x \in X_0\}$, $X_0 = \{x \mid A_0 x \le b_0,\ x \ge 0\}$. Let $(\Omega, \mathfrak{B}, P)$ be a probability space (→ 342 Probability Theory) such that $\{A(\omega), b(\omega), c(\omega)\}$ is a measurable transformation on $\Omega$ into $R^{m \times n + m + n}$.

B. Chance-Constrained Programming (CCP)

This method is based on the assumption that a decision $x$ has to be made in advance of the realization of the random variables. Suppose that $A_i(\omega)$ is the $i$th row of $A(\omega)$, and $b_i(\omega)$ is the $i$th component of $b(\omega)$. We call $P(\{\omega \mid A_i(\omega)x \le b_i(\omega)\}) \ge \alpha_i$ a chance constraint, where $\alpha_i$ is a prescribed fractional value determined by the decision maker according to his attitude toward the constraint $A_i(\omega)x \le b_i(\omega)$: if he attaches importance to it, he will take $\alpha_i$ as great as possible. Defining feasible sets $X_i(\alpha_i)$ and $X$ by $X_i(\alpha_i) = \{x \mid P(\{\omega \mid A_i(\omega)x \le b_i(\omega)\}) \ge \alpha_i\}$, $X = X_0 \cap \{\bigcap_{i=1}^m X_i(\alpha_i)\}$, we can formulate CCP as follows: $\min_x\{F(x) \mid x \in X\}$, where $F: X \to R$ is the certainty equivalent of the stochastic objective function $c'x$. We have four models of CCP corresponding to different types of $F(x)$: (i) E-model: $F(x) = \bar{c}'x = E_\omega c(\omega)'x$. (ii) V-model: $F(x) = \operatorname{Var}(c(\omega)'x) = x'V_c x$, where $V_c$ is a variance-covariance matrix of $c(\omega)$. (iii) $P_1$-model: $F(x) = f$, $P(\{\omega \mid c(\omega)'x \le f\}) \ge \alpha_0$, $1/2 < \alpha_0 < 1$. (iv) $P_2$-model: $F(x) = P(\{\omega \mid c(\omega)'x \ge \gamma\})$ for a given constant $\gamma$. In particular, if the components of $A(\omega)$, $b(\omega)$, $c(\omega)$ have a multidimensional normal distribution, the certainty equivalent of the $i$th chance constraint is derived in the following form:

$$\bar{A}_i x + \Phi^{-1}(\alpha_i)\,(x'V_i x + 2w_i'x + v_i^2)^{1/2} \le \bar{b}_i,$$

where $\bar{A}_i$, $V_i$, $w_i$, $v_i$, $\bar{b}_i$ are expectation vectors or a variance-covariance matrix of $A_i(\omega)$ and $b_i(\omega)$ and $\Phi(t) = \int_{-\infty}^{t} \exp(-z^2/2)\,dz/\sqrt{2\pi}$. The set $X_i(\alpha_i)$ for this constraint can be shown to be convex for $1/2 < \alpha_i < 1$, by using the convexity of the function $\sqrt{x'Vx}$ for a positive semidefinite matrix $V$. Under the same assumption we can obtain the objective functions $F(x) = \bar{c}'x + \Phi^{-1}(\alpha_0)\sqrt{x'V_c x}$ for the $P_1$-model and $F(x) = (\bar{c}'x - \gamma)/\sqrt{x'V_c x}$ for the $P_2$-model. These four models have been shown to be computable by applying convex programming techniques, including the conjugate gradient method. Further studies on the convexity of a more general chance constraint $P(\{\omega \mid A(\omega)x \le b(\omega)\}) \ge \alpha$, $0 < \alpha < 1$, appear in several articles.

C. Two-Stage Stochastic Programming (TSSP)

This method divides the decision process into two stages. First stage: Before the realization of random variables, one makes a decision $x$, being allowed to compensate for it after the specification of those values. Second stage: One obtains an optimal compensation $y \in R^l$ for the given $x$ and the realized values of the random variables. Assuming that $q \in R^l$ is a random vector in addition to $A$, $b$, $c$, we can formulate TSSP as follows. First stage: $\min_x E_\omega\{c(\omega)'x + Q(x, \omega) \mid x \in X\}$; second stage: $Q(x, \omega) = \min_y\{q(\omega)'y \mid Wy = A(\omega)x - b(\omega),\ y \ge 0\}$, where $X = X_0 \cap K$, $K = \{x \mid Q(x, \omega) < +\infty$ with probability 1$\}$ and $q(\omega)'y$ is a loss function for the deviation $A(\omega)x - b(\omega)$. The $m \times l$ matrix $W$ is called a compensation matrix. Several theorems have been proved: (i) $K$ is a closed convex set; (ii) $Q(x) = E_\omega Q(x, \omega)$ is a convex function on $K$ if the random variables in $A(\omega)$, $b(\omega)$, $q(\omega)$ are square integrable; (iii) if $P$ has a density function, then $Q(x)$ has a continuous gradient on $K$; (iv) when $P$ has a finite discrete probability distribution, a TSSP problem is reduced to a linear programming problem having a dual decomposition structure.

References

[1] A. Charnes and W. W. Cooper, Chance-constrained programming, Management Sci., 6 (1959), 73-79.
[2] G. B. Dantzig and A. Madansky, On the solution of two-stage linear programs under uncertainty, Proc. 4th Berkeley Symp. Math. Statist. Probab. I, Univ. of California Press, 1961.
[3] S. Kataoka, A stochastic programming model, Econometrica, 31 (1963), 181-196.
[4] I. M. Stancu-Minasian and M. J. Wets, A research bibliography in stochastic programming 1955-1975, Operations Res., 24 (1976), 1078-1119.
[5] S. Vajda, Probabilistic programming, Academic Press, 1972.
[6] P. Kall, Stochastic linear programming, Springer, 1976.
[7] V. V. Kolbin, Stochastic programming, Reidel, 1977. (Original in Russian, 1977.)

409 (II.7)
Structures

A. Examples of Structure

Structure is a unified description of mathematical objects such as ordered sets, rings, linear spaces, topological spaces, probability spaces, manifolds, etc., using only the concepts set and relation. The following are examples.
(1) Order. An ordering on a set $A$ is a binary relation in $A$ (→ 358 Relations) with a graph $\alpha$ such that (i) if $a \in A$, then $(a, a) \in \alpha$; (ii) if $(a, b) \in \alpha$ and $(b, a) \in \alpha$, then $a = b$; and (iii) if $(a, b) \in \alpha$ and $(b, c) \in \alpha$, then $(a, c) \in \alpha$, where $\alpha$ is an element of the power set $\mathfrak{P}(A \times A)$. We say that $\alpha$ determines a structure of ordering on the set $A$.

(4) Topology. A topology on a set $A$ is determined by a set $\alpha \in \mathfrak{P}(\mathfrak{P}(A))$, called the system of open sets, satisfying the following conditions: (i) $\emptyset \in \alpha$ and $A \in \alpha$; (ii) if $\beta \subset \alpha$, then $\bigcup \beta \in \alpha$; and (iii) if $\beta \subset \alpha$ is finite, then $\bigcap \beta \in \alpha$. We say that $\alpha$ defines the structure of a topology on the set $A$ (→ 425 Topological Spaces).
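When $A$ is finite, the conditions in the Order and Topology examples can be checked mechanically, because each structure is just a set of pairs or a set of subsets. The Python sketch below is illustrative only: the function names `is_ordering` and `is_topology` are hypothetical, and finiteness of $A$ is an assumption that the text does not make.

```python
from itertools import combinations


def is_ordering(A, alpha):
    """Check conditions (i)-(iii) of the Order example for a graph alpha,
    given as a set of pairs drawn from A x A (A assumed finite)."""
    reflexive = all((a, a) in alpha for a in A)
    antisymmetric = all(a == b for (a, b) in alpha if (b, a) in alpha)
    transitive = all((a, d) in alpha
                     for (a, b) in alpha
                     for (c, d) in alpha if b == c)
    return reflexive and antisymmetric and transitive


def is_topology(A, alpha):
    """Check conditions (i)-(iii) of the Topology example for a family
    alpha of subsets of A (A assumed finite)."""
    opens = {frozenset(u) for u in alpha}
    # (i) the empty set and A itself must be open
    if frozenset() not in opens or frozenset(A) not in opens:
        return False
    # (ii), (iii): for a finite family, closure under pairwise unions and
    # intersections implies closure under all unions and finite intersections.
    return all((u | v) in opens and (u & v) in opens
               for u, v in combinations(opens, 2))


if __name__ == "__main__":
    A = {1, 2, 3, 6}
    divides = {(a, b) for a in A for b in A if b % a == 0}
    print(is_ordering(A, divides))
    print(is_topology({1, 2, 3}, [set(), {1}, {1, 2}, {1, 2, 3}]))
```

Representing the graph as a plain set of tuples mirrors the text's point that an ordering is an element of $\mathfrak{P}(A \times A)$ and a topology an element of $\mathfrak{P}(\mathfrak{P}(A))$; nothing beyond sets and membership is used.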
(2) Law of composition. A law of composition on a set $A$ is a mapping from $A \times A$ (or a subset) to $A$. This mapping is considered as a ternary relation with a graph $\alpha \in \mathfrak{P}(A \times A \times A)$, satisfying the following two conditions (or possibly only (ii)): (i) if $(a, b) \in A \times A$, then $(a, b, c) \in \alpha$ for some $c \in A$; (ii) if $(a, b, c) \in \alpha$ and $(a, b, c') \in \alpha$, then $c = c'$. We say that $\alpha$ determines the structure of a law of composition in $A$. The associative law in the law of composition determined by $\alpha$ is given by: (iii) if $(a, b, x) \in \alpha$, $(b, c, y) \in \alpha$, $(x, c, z) \in \alpha$, and $(a, y, z') \in \alpha$, then $z = z'$. A set with conditions (i), (ii), and (iii) is called a semigroup.

(3) Operation. An operation of a set $A$ on a set $B$ is a mapping from $A \times B$ (or a subset) to $B$. It is considered as a ternary relation with the graph $\gamma \in \mathfrak{P}(A \times B \times B)$, satisfying the following two conditions (or possibly only (ii)): (i) if $(a, b) \in A \times B$, then $(a, b, c) \in \gamma$ for some $c \in B$; (ii) if $(a, b, c) \in \gamma$ and $(a, b, c') \in \gamma$, then $c = c'$. We say that $\gamma$ determines the structure of an operation of $A$ on $B$. Each element $a$ of $A$ is called an operator on $B$; $A$ is called a domain of operators on $B$, and $B$ is called an $A$-set. When $B$ is the main object of consideration, an operation of $A$ on $B$ is sometimes called an external law of composition of $A$ on $B$. The law of composition of $A$ as described in (2) is then called an internal law of composition of $A$. When a domain of operators $A$ on $B$ determined by $\gamma \in \mathfrak{P}(A \times B \times B)$ has an internal law of composition determined by $\alpha \in \mathfrak{P}(A \times A \times A)$, it is usually assumed that the following condition on $\alpha$, $\gamma$ is satisfied: (iii) if $(a, b, x) \in \alpha$, $(b, c, y) \in \gamma$, $(x, c, z) \in \gamma$, and $(a, y, z') \in \gamma$, then $z = z'$. If we denote the law of composition by $(a, b) \to ab$ and the operation by $(a, b) \to a \cdot b$, then condition (iii) may be written: (iii') $(ab) \cdot c = a \cdot (b \cdot c)$ for $a, b \in A$ and $c \in B$. When an $A$-set $B$ with an external law of composition determined by $\gamma \in \mathfrak{P}(A \times B \times B)$ has an internal law of composition determined by $\beta \in \mathfrak{P}(B \times B \times B)$, it is usually assumed that the following condition on $\beta$, $\gamma$ is satisfied: (iv) if $(a, b, x) \in \gamma$, $(a, c, y) \in \gamma$, $(b, c, z) \in \beta$, $(x, y, w) \in \beta$, and $(a, z, w') \in \gamma$, then $w = w'$. According to the notation $ab$ and $a \cdot b$, it is described as: (iv') $(a \cdot b)(a \cdot c) = a \cdot (bc)$ for $a \in A$ and $b, c \in B$.

The mapping $A \times B \to B$ ($B \times A \to B$) is called a left (right) operation of $A$ on $B$. To emphasize leftness or rightness, "left-" or "right-" is attached to corresponding concepts.

B. Mathematical Structures

We now explain the concept of mathematical structure for the case of a linear space (→ 256 Linear Spaces). A linear space has two basic sets, one of which is a set $K$ of elements called scalars and the other, a set $V$ of elements called vectors, two laws of composition in $K$ called addition and multiplication, a law of composition in $V$ called addition, and an operation of $K$ on $V$ called scalar multiplication. The laws of composition and the operation are given by elements of power sets: $\alpha_1, \alpha_2 \in \mathfrak{P}(K \times K \times K)$, $\alpha_3 \in \mathfrak{P}(V \times V \times V)$, and $\alpha_4 \in \mathfrak{P}(K \times V \times V)$; and the basic properties of the linear space, such as $\lambda(a + b) = \lambda a + \lambda b$ ($\lambda \in K$, $a, b \in V$), are described as propositions on $K$, $V$, $\alpha_1, \ldots, \alpha_4$ and denoted by $P(K, V; \alpha_1, \ldots, \alpha_4)$.

Up to now, we have been considering a given linear space. To give a description of a linear space in general, we use the symbols $X_1, X_2, \xi_1, \ldots, \xi_4$ instead of the symbols $K, V, \alpha_1, \ldots, \alpha_4$, replace conditions such as $\alpha_1 \in \mathfrak{P}(K \times K \times K), \ldots$ by $\xi_1 \in \mathfrak{P}(X_1 \times X_1 \times X_1), \ldots$, and consider the set $\Sigma$ of these symbols and formulas:

$$\xi_1 \in \mathfrak{P}(X_1 \times X_1 \times X_1),\ \ldots,\ \xi_4 \in \mathfrak{P}(X_1 \times X_2 \times X_2).$$

The set $\Sigma$ is called the type of linear space. Similarly, we consider the set $\Gamma$ of all $P(X_1, X_2, \xi_1, \ldots, \xi_4)$, corresponding to the basic properties $P(K, V, \alpha_1, \ldots, \alpha_4)$, of the linear space. The set $\Gamma$ is called the axiom system of the linear space.

In general, let $A_1, \ldots, A_m$ be the basic sets ($K$ and $V$ in the preceding example). The basic concepts $\alpha_1, \ldots, \alpha_n$ ($\alpha_1, \ldots, \alpha_4$ in the preceding example) are given as elements of finitely generated sets from $A_1, \ldots, A_m$, i.e., elements of sets obtained by a finite number of applications of the operations of forming the Cartesian product and the power set from $A_1, \ldots, A_m$. Basic properties are given as propositions on $A_1, \ldots, A_m, \alpha_1, \ldots, \alpha_n$. These basic properties and $A_1, \ldots, A_m, \alpha_1, \ldots, \alpha_n$ determine a mathematical system. We consider also a type $\Sigma$ of symbols $X_1, \ldots, X_m$ of basic sets and symbols $\xi_1, \ldots, \xi_n$ of basic concepts, and an
axiom system $\Gamma$ of basic properties. The pair $(\Sigma, \Gamma)$ determines a mathematical structure. When we substitute sets $A_1, \ldots, A_m$ for $X_1, \ldots, X_m$ and $\alpha_1, \ldots, \alpha_n$ for $\xi_1, \ldots, \xi_n$, where $X_1, \ldots, X_m$ and $\xi_1, \ldots, \xi_n$ satisfy the axiom system $\Gamma$, then $(A_1, \ldots, A_m, \alpha_1, \ldots, \alpha_n)$ is called a mathematical system with the mathematical structure $(\Sigma, \Gamma)$, or a model of the structure $(\Sigma, \Gamma)$. Two mathematical systems are called similar or of the same kind if they have the same mathematical structure. Groups, rings, topological spaces, etc., are mathematical structures. Mathematical systems are sometimes called algebraic systems in the wider sense, and when we consider mainly their laws of composition (and operations), we call them algebraic systems. We explain this in further detail in Section C.

C. Algebraic Systems

Algebraic systems are sets with laws of composition and operations satisfying certain axiom systems; the laws of composition and operations and the axiom systems they satisfy determine their type (→ 2 Abelian Groups, 29 Associative Algebras, 42 Boolean Algebra, 67 Commutative Rings, 149 Fields, 151 Finite Groups, 190 Groups, 231 Jordan Algebras, 248 Lie Algebras, 368 Rings). Each algebraic system has its own theory, but general properties and related concepts are dealt with from a common standpoint. From this common ground we often get an insight into concepts from which arose a general theory of mathematical systems. We describe here only algebraic systems, but it is possible to describe similar concepts for mathematical systems. The following is a description based mainly on one law of composition $(a, b) \to ab$; a similar description is possible for the case of two or more laws of composition.

The law of composition $ab$ is sometimes written $a + b$, $a \cdot b$, $[a, b]$, etc. A mapping $f: A \to A'$ of similar algebraic systems A and A' is called a homomorphism provided that $f(ab) = f(a)f(b)$ ($a, b \in A$). A' is said to be homomorphic to A if there is a homomorphism from A onto A'. If f is one-to-one, onto, and its inverse mapping $f^{-1}: A' \to A$ is also a homomorphism, then f is called an isomorphism, A' is said to be isomorphic to A, and the relation is written $A \cong A'$. The composition of homomorphisms is a homomorphism, and the identity mapping is an isomorphism. A homomorphism of A to A itself is called an endomorphism, and if it is also an isomorphism, then it is an automorphism. The set of all endomorphisms of A forms a †semigroup under composition, and the set of automorphisms forms a group under composition. The concept of homomorphism is a fundamental concept appearing in all algebraic systems. A homomorphism is sometimes called a representation.

An element e of an algebraic system A with a law of composition $ab$ is called an identity element if $ae = ea = a$ (for all $a \in A$). If such an element exists, then it is unique. In the case of a ring, two laws of composition, addition and multiplication, are given. In this case the identity element for multiplication (if it exists) is called the identity element (or †unity element) of the ring. In the case of a homomorphism between groups, the identity element is mapped to the identity element, but this does not always hold in general algebraic systems. Since the identity element plays an important role, it is frequently added to the basic concepts. Homomorphism between mathematical systems is generally defined to induce a mapping between basic concepts. A semigroup and a ring with a unity element are called a unitary semigroup (monoid) and a unitary ring, respectively, and homomorphisms between these systems are restricted to mappings that map the identity element to the identity element.

Let A and A' be similar algebraic systems, and let A be a subset of A'. A is called a subsystem of A' if the mapping $f: A \to A'$ defined by $f(a) = a$ ($a \in A$) is a homomorphism. A subsystem of a group (ring) is called a †subgroup (†subring), and similarly for other algebraic systems.

An †equivalence relation R in an algebraic system A is called compatible with the law of composition if $R(a, a')$ and $R(b, b')$ imply $R(ab, a'b')$. Consider the †quotient set $A/R$. Then the law of composition in $A/R$ is uniquely determined so that the mapping $f: A \to A/R$ defined by $a \in f(a)$, i.e., sending each a to its equivalence class, is a homomorphism. The algebraic system thus obtained is called a quotient system. A quotient system of a group (ring) is a group (ring), called a †quotient group (†quotient ring). Other cases, including those where operations are given, are treated similarly (→ 52 Categories and Functors).
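The homomorphism condition $f(ab) = f(a)f(b)$ and the remark that an identity element need not be preserved outside of groups can be checked by brute force on small finite systems. The following is a minimal sketch in Python, not part of the original article: the helper names `is_homomorphism` and `identity_of`, and the example systems (integers mod 6, 4, and 3 under addition, and the two-element set {0, 1} under multiplication), are illustrative assumptions.

```python
from itertools import product


def is_homomorphism(f, dom, op_dom, op_cod):
    """Check f(ab) == f(a)f(b) for every pair (a, b) of elements of dom."""
    return all(
        f(op_dom(a, b)) == op_cod(f(a), f(b))
        for a, b in product(dom, repeat=2)
    )


def identity_of(elems, op):
    """Return e with ae == ea == a for all a in elems, or None if there is none."""
    for e in elems:
        if all(op(a, e) == a and op(e, a) == a for a in elems):
            return e
    return None


# Hypothetical example systems (assumptions, not from the text).
Z6 = range(6)


def add6(a, b):
    return (a + b) % 6


def add4(a, b):
    return (a + b) % 4


def add3(a, b):
    return (a + b) % 3


def reduce_mod3(a):
    return a % 3


def reduce_mod4(a):
    return a % 4


# Reduction mod 3 respects addition because 3 divides 6.
hom_mod3 = is_homomorphism(reduce_mod3, Z6, add6, add3)
# Reduction mod 4 does not: 3 + 3 is 0 in Z6, but 3 + 3 is 2 in Z4.
hom_mod4 = is_homomorphism(reduce_mod4, Z6, add6, add4)

# The monoid ({0, 1}, *) has identity 1. The constant map to 0 satisfies
# f(ab) == f(a)f(b) but sends the identity 1 to 0, which is not the identity.
B = (0, 1)


def mul(a, b):
    return a * b


def const0(a):
    return 0


hom_const = is_homomorphism(const0, B, mul, mul)
identity_kept = const0(identity_of(B, mul)) == identity_of(B, mul)
```

In this sketch `hom_mod3` and `hom_const` hold while `hom_mod4` and `identity_kept` do not, which is why homomorphisms of unitary semigroups are required separately to map the identity element to the identity element.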

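The construction of a quotient system from a compatible equivalence relation can also be sketched for a finite system. In the Python sketch below, which is an illustration under stated assumptions rather than anything given in the article, an equivalence relation R is represented by a hypothetical labeling function `cls`, with R(x, y) meaning `cls(x) == cls(y)`; the names `is_compatible` and `quotient_table` and the example on the integers mod 6 are likewise assumptions.

```python
from itertools import product


def is_compatible(elems, op, cls):
    """Check that R(a, a') and R(b, b') imply R(ab, a'b'),
    where R(x, y) means cls(x) == cls(y)."""
    for a, a2, b, b2 in product(elems, repeat=4):
        related = cls(a) == cls(a2) and cls(b) == cls(b2)
        if related and cls(op(a, b)) != cls(op(a2, b2)):
            return False
    return True


def quotient_table(elems, op, cls):
    """Law of composition on A/R as a dict: (class of a, class of b) -> class of ab.
    The result is well defined only when R is compatible with op."""
    table = {}
    for a, b in product(elems, repeat=2):
        table[(cls(a), cls(b))] = cls(op(a, b))
    return table


# Hypothetical example (an assumption): integers mod 6 under addition.
Z6 = range(6)


def add6(a, b):
    return (a + b) % 6


def mod3_class(a):
    # Classes {0, 3}, {1, 4}, {2, 5}.
    return a % 3


def half_class(a):
    # Classes {0, 1, 2} and {3, 4, 5}.
    return a // 3


compatible_mod3 = is_compatible(Z6, add6, mod3_class)
# Not compatible: 0 ~ 1 and 2 ~ 2, but 0 + 2 = 2 and 1 + 2 = 3 lie in different classes.
compatible_half = is_compatible(Z6, add6, half_class)

# For the compatible relation, the induced law on the three classes is
# addition mod 3, and a -> mod3_class(a) is a homomorphism by construction.
table = quotient_table(Z6, add6, mod3_class)
```

Representing R by a class-labeling function keeps reflexivity, symmetry, and transitivity automatic, so only compatibility needs to be tested.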