
THE AMERICAN MATHEMATICAL MONTHLY

VOLUME 121, NO. 10 DECEMBER 2014

Evolving Evolutoids 871


P. J. Giblin and J. P. Warder
Numerical Semigroups, Cyclotomic Polynomials, 890
and Bernoulli Numbers
Pieter Moree
A Perron-Type Theorem on the Principal Eigenvalue 903
of Nonsymmetric Elliptic Operators
Lei Ni
Absolute Convergence in Ordered Fields 909
Pete L. Clark and Niels J. Diepeveen

NOTES
An Unnoticed Consequence of Szegö’s Distribution 917
Theorem
William F. Trench
Coset Intersection Graphs for Groups 922
Jack Button, Maurice Chiodo, and Mariano Zeron-Medina Laris
Complex Descartes Circle Theorem 927
Sam Northshield
The Length of an Arithmetic Progression Represented 932
by a Binary Quadratic Form
Pallab Kanti Dey and R. Thangadurai
Cosines and Cayley, Triangles and Tetrahedra 937
Marshall Hampton
A Note on the Spectral Theorem in the Finite-Dimensional 942
Real Case
Felipe Acker
PROBLEMS AND SOLUTIONS 946
BOOK REVIEW
Mostly Surfaces 954
By Richard Evan Schwartz
Genevieve S. Walsh

END NOTES 957


MONTHLY REFEREES FOR 2014 960
MATHBITS
963, From the Monthly Over 100 Years Ago. . .
964, One More Proof of the Irrationality of √2

An Official Publication of the Mathematical Association of America



EDITOR
Scott T. Chapman
Sam Houston State University

NOTES EDITOR BOOK REVIEW EDITOR


Sergei Tabachnikov Jeffrey Nunemacher
Pennsylvania State University Ohio Wesleyan University

PROBLEM SECTION EDITORS


Douglas B. West Gerald Edgar Doug Hensley
University of Illinois Ohio State University Texas A&M University

ASSOCIATE EDITORS
William Adkins Jeffrey Lawson
Louisiana State University Western Carolina University
David Aldous C. Dwight Lahr
University of California, Berkeley Dartmouth College
Elizabeth Allman Susan Loepp
University of Alaska, Fairbanks Williams College
Jonathan M. Borwein Irina Mitrea
University of Newcastle Temple University
Jason Boynton Bruce P. Palka
North Dakota State University National Science Foundation
Edward B. Burger Vadim Ponomarenko
Southwestern University San Diego State University
Minerva Cordero-Epperson Catherine A. Roberts
University of Texas, Arlington College of the Holy Cross
Allan Donsig Rachel Roberts
University of Nebraska, Lincoln Washington University, St. Louis
Michael Dorff Ivelisse M. Rubio
Brigham Young University Universidad de Puerto Rico, Rio Piedras
Daniela Ferrero Adriana Salerno
Texas State University Bates College
Luis David Garcia-Puente Edward Scheinerman
Sam Houston State University Johns Hopkins University
Sidney Graham Anne Shepler
Central Michigan University University of North Texas
Tara Holm Frank Sottile
Cornell University Texas A&M University
Roger A. Horn Susan G. Staples
University of Utah Texas Christian University
Lea Jenkins Daniel Ullman
Clemson University George Washington University
Daniel Krashen Daniel Velleman
University of Georgia Amherst College
Ulrich Krause
Universität Bremen

ASSISTANT MANAGING EDITOR MANAGING EDITOR


Bonnie K. Ponce Beverly Joy Ruedi
NOTICE TO AUTHORS

The MONTHLY publishes articles, as well as notes and other features, about mathematics and the profession. Its readers span a broad spectrum of mathematical interests, and include professional mathematicians as well as students of mathematics at all collegiate levels. Authors are invited to submit articles and notes that bring interesting mathematical ideas to a wide audience of MONTHLY readers.

The MONTHLY’s readers expect a high standard of exposition; they expect articles to inform, stimulate, challenge, enlighten, and even entertain. MONTHLY articles are meant to be read, enjoyed, and discussed, rather than just archived. Articles may be expositions of old or new results, historical or biographical essays, speculations or definitive treatments, broad developments, or explorations of a single application. Novelty and generality are far less important than clarity of exposition and broad appeal. Appropriate figures, diagrams, and photographs are encouraged.

Notes are short, sharply focused, and possibly informal. They are often gems that provide a new proof of an old theorem, a novel presentation of a familiar theme, or a lively discussion of a single issue.

Submission of articles, notes, and filler pieces is required via the MONTHLY’s Editorial Manager System. Initial submissions in pdf or LaTeX form can be sent to the Editor Scott Chapman at

www.editorialmanager.com/monthly

The Editorial Manager System will cue the author for all required information concerning the paper. Questions concerning submission of papers can be addressed to the Editor at monthly@shsu.edu. Authors who use LaTeX can find our article/note template at www.maa.org/monthly.html. This template requires the style file maa-monthly.sty, which can also be downloaded from the same webpage. A formatting document for MONTHLY references can be found there too.

Letters to the Editor on any topic are invited. Comments, criticisms, and suggestions for making the MONTHLY more lively, entertaining, and informative can be forwarded to the Editor at monthly@shsu.edu.

The online MONTHLY archive at www.jstor.org is a valuable resource for both authors and readers; it may be searched online in a variety of ways for any specified keyword(s). MAA members whose institutions do not provide JSTOR access may obtain individual access for a modest annual fee; call 800-331-1622 for more information.

See the MONTHLY section of MAA Online for current information such as contents of issues and descriptive summaries of forthcoming articles:

www.maa.org/monthly.html

Proposed problems or solutions should be sent to:

DOUG HENSLEY, MONTHLY Problems
Department of Mathematics
Texas A&M University
3368 TAMU
College Station, TX 77843-3368

In lieu of duplicate hardcopy, authors may submit pdfs to monthlyproblems@math.tamu.edu.

Advertising correspondence should be sent to:

MAA Advertising
1529 Eighteenth St. NW
Washington DC 20036
Phone: (202) 319-8461
E-mail: advertising@maa.org

Further advertising information can be found online at www.maa.org.

Change of address, missing issue inquiries, and other subscription correspondence can be sent to:

maaservice@maa.org

or

The MAA Customer Service Center
P.O. Box 91112
Washington, DC 20090-1112
(800) 331-1622
(301) 617-7800

Recent copies of the MONTHLY are available for purchase through the MAA Service Center at the address above.

Microfilm Editions are available at: University Microfilms International, Serial Bid coordinator, 300 North Zeeb Road, Ann Arbor, MI 48106.

The AMERICAN MATHEMATICAL MONTHLY (ISSN 0002-9890) is published monthly except bimonthly June-July and August-September by the Mathematical Association of America at 1529 Eighteenth Street, NW, Washington, DC 20036 and Lancaster, PA, and copyrighted by the Mathematical Association of America (Incorporated), 2014, including rights to this journal issue as a whole and, except where otherwise noted, rights to each individual contribution. Permission to make copies of individual articles, in paper or electronic form, including posting on personal and class web pages, for educational and scientific use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear the following copyright notice: [Copyright 2014 Mathematical Association of America. All rights reserved.] Abstracting, with credit, is permitted. To copy otherwise, or to republish, requires specific permission of the MAA’s Director of Publications and possibly a fee. Periodicals postage paid at Washington, DC, and additional mailing offices. Postmaster: Send address changes to the American Mathematical Monthly, Membership/Subscription Department, MAA, 1529 Eighteenth Street, NW, Washington, DC 20036-1385.

Evolving Evolutoids
P. J. Giblin and J. P. Warder

Abstract. The envelope of straight lines normal to a plane curve C is its evolute; the envelope
of lines tangent to C is the original curve, together with the entire tangent line at each inflexion
of C. We introduce some standard techniques of singularity theory and use them to explain
how the first of these envelopes turns into the second, as the (constant) angle between the set of
lines forming the envelope and the set of tangents to C changes from π/2 to 0. In particular, we
explain how cusps disappear and what happens at inflexions, where the evolute goes to infinity.
We also study the family of “wavefronts” or “parallels” associated with these envelopes.

1. INTRODUCTION. Let σ be a plane curve, which we shall often assume is closed,
and always assume is free from singularities (such as cusps) and self-intersections. The
family of tangent lines to σ and the family of normal lines to σ each have an envelope.
We make the definitions precise in Section 2, but the general idea is that the envelope
of a family of lines in the plane is a curve tangent to all of them. Unsurprisingly, the
envelope of the tangent lines to σ contains—at least—σ itself. The envelope of normal
lines is called the evolute of σ , and this has cusps corresponding to the curvature
extrema of σ . The evolute also “goes to infinity” corresponding to inflexions—zeros
of curvature—of σ .
It is natural to ask what lies between the envelope of tangents and the envelope
of normals. Let us fix an angle α and consider straight lines L obtained by rotating
each tangent to σ at a point p ∈ σ about p counterclockwise through α, denoting
the envelope of the lines L by τα . See Figure 1. Thus, τ0 is the envelope of tangent
lines and τπ/2 is the envelope of normal lines. For other values of α, τα is a so-called
evolutoid of σ . The geometry of these envelopes has been studied since Réaumur [13]
in 1709. For a modern reference see [10]; this, like most studies of evolutoids, restricts
attention to the case when σ is an oval, that is, a closed curve without inflexions and
hence strictly convex, such as an ellipse. We relax this condition here and consider
curves with inflexions. In [1], the authors study the opposite situation of generalized
involutes of plane curves: σ is an involute of τ if τ is the evolute of σ.

Figure 1. Left: We consider the envelope τα of lines L such that the counterclockwise angle α between the
tangent T to σ and L is constant. Center and right: An ellipse σ(t) = (2 cos t, sin t) and the envelope with
α = π/4 = 0.785..., clearly showing four cusps, and α = 0.5, where the cusps have almost disappeared and
the envelope is more closely approximating the ellipse itself. For clarity, the lines L are drawn only in the
“forward” direction at each point. See also Example 3.2.

http://dx.doi.org/10.4169/amer.math.monthly.121.10.871
MSC: Primary 53A04, Secondary 53A05; 57R45; 58K05

December 2014] EVOLVING EVOLUTOIDS 871

What happens when α moves from 0 to π/2, so that τα evolves from the envelope
of tangent lines to a many-cusped evolute? The object of this article is to explain how
some basic techniques of singularity theory enable us to say exactly what happens, in
the sense of showing that in precisely defined conditions the cusps appear and disap-
pear in a fixed manner and, crucially, explaining the contribution of inflexions. These
are important since—as is well known, though we prove it in Proposition 2.3 below—
the envelope of tangent lines to a plane curve σ contains, besides σ itself, also the
entire tangent line at each inflexion of σ . Figure 2 shows the envelope of normals
(α = π/2) and the envelope of tangents (α = 0) to a closed curve with two inflexions.
Later we investigate what happens for α close to 0; see Section 5 and Figure 9 for an
illustration.

Figure 2. A curve with two inflexions—the curve appears as the kidney-shaped line on the left. Left and
middle: The envelope of normals, which has six cusps and two asymptotes, shown in full on the left. The
center diagram shows the normals themselves; for clarity they are drawn only in the direction in which they
“focus” on the envelope. Right: The envelope of tangents, this time drawn in both directions. Here the envelope
includes the original curve and the whole of the tangent lines at the inflexions.

The key ingredients of singularity theory on which we call are the theory of un-
foldings and discriminants and the theory of functions on discriminants. We cannot
(alas) present all the details of these theories here, but we hope that enough is said to
show how powerful abstract techniques yield highly concrete geometrical results. (For
details of most of the techniques, and other geometrical applications, see [6].)
The article is organized as follows. In Section 2 we firm up the definitions and
give an explicit formula for the envelope τα , recalling some basic facts about plane
parametrized curves. In Section 3 we study the cusps of τα , introducing the first ideas
from singularity theory and the classification of functions. In Section 4 we set the study
of envelopes in the general context of discriminants and functions on discriminants. In
Section 5 we state the necessary results from singularity theory and apply them to the
example of evolutoids, giving the main results as Corollary 5.7 and Theorem 5.9. In a
nutshell, the results say that as α varies the local appearance of the evolutoids changes
in one of only two ways, a “swallowtail transition” as in Figures 5 and 6, or a “beaks”
transition as in Figures 4 and 9.
In Section 6 we study the wavefronts associated to a given value of α; the cusps
on these wavefronts trace out the envelope τα , but in general the wavefronts are not
closed curves. Finally, in Sections 7 and 8 we draw things together and offer some
more details of proofs.

872 © THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 121
2. ENVELOPES. We shall abuse notation mildly by using the same symbol σ to
denote a plane curve and a parametrization. For us, the most interesting examples occur
when σ is a closed curve, such as an ellipse (σ (t) = (a cos t, b sin t) for a > 0, b > 0,
and 0 ≤ t ≤ 2π), and we shall give such examples below. Since the line L is not itself
oriented, the angle α in Figure 1 can be considered modulo π. Parametrizing σ by t
automatically gives it an orientation (increasing t), but if we reverse this orientation,
then α gives the same line L.
We assume throughout that σ(t) = (X(t), Y(t)) is a regular parametrized curve;
that is, X and Y are smooth functions of t and (using ′ for d/dt), X′(t) = Y′(t) =
0 never happens, so that the speed ||σ′(t)|| is always nonzero. We use the standard
notation T, or T(t), for the unit tangent σ′(t)/||σ′(t)|| and N, or N(t), for the unit
normal, obtained by rotating T counterclockwise through π/2. The curvature κ(t) is
given by T′(t) = κ(t)||σ′(t)||N(t), and in terms of X and Y has the rather unattractive
formula (omitting the variable t) κ = (X′Y″ − X″Y′)/(X′² + Y′²)^(3/2).
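
The curvature formula and the frame T, N can be checked numerically. The following is a minimal Python sketch, not taken from the article: the helper names `ellipse` and `frame_and_curvature` are hypothetical, and the ellipse σ(t) = (a cos t, b sin t) is used only as a convenient test curve.

```python
import math

def ellipse(t, a=2.0, b=1.0):
    """Return sigma(t), sigma'(t), sigma''(t) for sigma(t) = (a cos t, b sin t)."""
    s = (a * math.cos(t), b * math.sin(t))
    d1 = (-a * math.sin(t), b * math.cos(t))
    d2 = (-a * math.cos(t), -b * math.sin(t))
    return s, d1, d2

def frame_and_curvature(d1, d2):
    """Unit tangent T, unit normal N (T rotated counterclockwise by pi/2),
    curvature kappa = (X'Y'' - X''Y') / (X'^2 + Y'^2)^(3/2), and the speed."""
    speed = math.hypot(d1[0], d1[1])
    T = (d1[0] / speed, d1[1] / speed)
    N = (-T[1], T[0])
    kappa = (d1[0] * d2[1] - d2[0] * d1[1]) / speed**3
    return T, N, kappa, speed
```

For a = 2, b = 1 this gives κ = a/b² = 2 at t = 0 and κ = b/a² = 1/4 at t = π/2, the two extreme values of the curvature on this ellipse.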
Finally, we make the following assumption.

Assumption 2.1. The curve σ is generic in the precise sense that all inflexions (zeros
of curvature) and vertices (extrema of curvature) are ordinary. This means that, for σ ,
inflexions are simple zeros and vertices are simple extrema: κ = 0, κ′ ≠ 0 at inflexions,
and κ ≠ 0, κ′ = 0, κ″ ≠ 0 at vertices.¹

We are interested in the line L obtained by rotating the tangent a fixed angle α; the
direction of L is therefore T (t) cos α + N (t) sin α. Hence, T (t) sin α − N (t) cos α is
perpendicular to L and a vector equation of the line L is F(x, t) = 0, where

F(x, t) = (x − σ (t)) · (T (t) sin α − N (t) cos α). (1)

Here · is the usual scalar product of vectors and x = (x, y) ∈ R². Thus, we regard α as
fixed; then F(x, t) = 0 represents a family of lines: Each t gives a line, and as t varies
the line moves in the (x, y)-plane.
There is a simple method for finding the envelope τα of these lines. Visually, the
envelope is a curve tangent to all the lines, or the “limit of intersection points of con-
secutive lines”; see Figure 3. In fact, both these latter definitions, when made precise,
give curves which are always subsets of τα , as defined below. This is shown in [6, pp.
107–109].

Figure 3. Envelopes of families of lines are intuitively formed by (left) intersections of “consecutive” lines or
(right) a curve tangent to all the lines. Generally these definitions coincide with Definition 2.2 used here.

Definition 2.2. The envelope τα of the family of lines given by (1), for a fixed α, is
the set of points x = (x, y) in the plane for which there exists t with

\[ F(x, t) = \frac{\partial F}{\partial t}(x, t) = 0. \]

¹ It can be shown that, in a precise sense, “almost all” closed curves are generic in the sense used here. See [6, Chapter 9].

To solve these equations we use the standard Serret–Frenet formulas,

\[ T'(t) = \kappa(t)\, N(t)\, \|\sigma'(t)\|, \qquad N'(t) = -\kappa(t)\, T(t)\, \|\sigma'(t)\|, \tag{2} \]

which relate the derivatives of T and N to T and N themselves, the curvature κ and
the speed ||σ′|| of the curve σ. (See any book on differential geometry, or alternatively
[6, Chapter 2].)
Using the fact that α is constant gives

\[
\frac{\partial F}{\partial t} = -\sin\alpha\,\|\sigma'(t)\| + (x - \sigma(t)) \cdot \bigl( \kappa(t)\sin\alpha\, N(t)\,\|\sigma'(t)\| + \kappa(t)\cos\alpha\, T(t)\,\|\sigma'(t)\| \bigr).
\]

Now any vector is a linear combination of the form λT (t) + μN (t). Applying this to
the vector x − σ(t) and substituting in F = ∂F/∂t = 0, we obtain two equations for λ
and μ:

λ sin α − μ cos α = 0,
λκ(t) cos α + μκ(t) sin α = sin α. (3)

Solving these gives a parametrization of the envelope, for a fixed α:

\[ x(t) = \sigma(t) + \frac{\sin\alpha\cos\alpha}{\kappa(t)}\, T(t) + \frac{\sin^{2}\alpha}{\kappa(t)}\, N(t), \tag{4} \]

provided κ(t) ≠ 0. Thus, if α = ±π/2, the lines F = 0 are the normals to σ and the
envelope is the usual evolute, namely the set of points σ(t) + (1/κ(t))N(t); these points are
also called the centers of curvature of σ. If α = 0 (or π), then the lines F = 0 are the
tangents to σ and the envelope, away from inflexions where κ(t) = 0, is the original
curve σ.
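
As a sanity check on (4), one can verify numerically that the point x(t) satisfies both conditions of Definition 2.2. The sketch below is an illustration only, under the assumption that σ is the ellipse (2 cos t, sin t) of Figure 1; the function names (`frame`, `F`, `envelope_point`, `dF_dt`) are hypothetical, and ∂F/∂t is estimated by a central difference rather than computed from the Serret–Frenet formulas.

```python
import math

def frame(t, a=2.0, b=1.0):
    """sigma(t), unit tangent T, unit normal N and curvature of (a cos t, b sin t)."""
    dx, dy = -a * math.sin(t), b * math.cos(t)
    ddx, ddy = -a * math.cos(t), -b * math.sin(t)
    speed = math.hypot(dx, dy)
    T = (dx / speed, dy / speed)
    N = (-T[1], T[0])
    kappa = (dx * ddy - ddx * dy) / speed**3
    return (a * math.cos(t), b * math.sin(t)), T, N, kappa

def F(x, t, alpha):
    """Equation (1): F(x, t) = (x - sigma(t)) . (T sin(alpha) - N cos(alpha))."""
    sigma, T, N, _ = frame(t)
    nx = T[0] * math.sin(alpha) - N[0] * math.cos(alpha)
    ny = T[1] * math.sin(alpha) - N[1] * math.cos(alpha)
    return (x[0] - sigma[0]) * nx + (x[1] - sigma[1]) * ny

def envelope_point(t, alpha):
    """Equation (4); needs kappa(t) != 0, which always holds on an ellipse."""
    sigma, T, N, kappa = frame(t)
    lam = math.sin(alpha) * math.cos(alpha) / kappa
    mu = math.sin(alpha) ** 2 / kappa
    return (sigma[0] + lam * T[0] + mu * N[0],
            sigma[1] + lam * T[1] + mu * N[1])

def dF_dt(x, t, alpha, h=1e-5):
    """Central-difference estimate of dF/dt with x held fixed."""
    return (F(x, t + h, alpha) - F(x, t - h, alpha)) / (2 * h)
```

For any t and α, `F(envelope_point(t, alpha), t, alpha)` should vanish up to rounding and `dF_dt` should be small; with α = π/2 and t = 0 the envelope point is the center of curvature (1.5, 0).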
Note that when α = 0 and κ(t) = 0, then using (3) directly we have μ = 0, λ arbi-
trary so, as mentioned in the Introduction, we have the following.

Proposition 2.3. The envelope of tangent lines consists of the original curve σ to-
gether with the whole tangent line at inflexion points.

Our aim in what follows is to describe exactly how, as α moves from π/2 to 0, the
evolute of σ turns into the curve σ itself, and in particular to explain what happens at
inflexions, where κ(t) = 0.

Remark 2.4. There are numerous attractive properties of the envelope τα . Here are
two, the first pointed out to us by the referees, and the second proved in [10, Proposi-
tion 5].
(i) The line joining the center of curvature σ (t) + (1/κ(t))N (t) of the curve σ
at the point with parameter t to the envelope point as in (4) has the direction
sin αT (t) − cos α N (t), which is perpendicular to the line L. This has the fol-
lowing interpretation, using “angle in a semicircle”: the envelope point (4) is
the intersection of the line L with the circle tangent to σ at σ (t) and passing

through the center of curvature. When α = π/2, this point is the center of
curvature. When α = 0 the result holds (the envelope point is σ(t) itself), except
of course at inflexions, where κ(t) = 0.
(ii) This relates the evolutoid τα with the center symmetry set (CSS) or Minkowski
set of σ (it is given yet a third name in [10], namely the midenvelope of σ ). This
is the envelope of all lines joining pairs of distinct points of σ at which the tan-
gent lines are parallel. The CSS is the subject of several investigations, both in
the plane, and through generalizations (using tangent planes rather than tangent
lines) in higher dimensions, employing more advanced techniques of singular-
ity theory. See [8, Section 5] for an introduction and [9] for a more technical
discussion. Now, consider the points P, Q of the evolutoid τα corresponding
with two distinct points p, q of σ at which the tangent lines are parallel. Let R
be the point where the line PQ meets the line pq. Then R is the CSS point on
pq. It is not clear to us whether there are sensible generalizations of evolutoids
to higher dimensions, and hence a generalization of this result. Note that the
CSS, unlike the evolutoid, is invariant under nonsingular linear transformations
of the plane, since such a transformation preserves parallel lines.
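
Property (i) of Remark 2.4 lends itself to a quick numerical test. The circle tangent to σ at σ(t) and passing through the center of curvature c(t) has the segment from σ(t) to c(t) as a diameter, so by “angle in a semicircle” the envelope point x lies on it exactly when (x − σ(t)) · (x − c(t)) = 0. The sketch below assumes the ellipse (a cos t, b sin t) as the test curve; the name `semicircle_defect` is hypothetical.

```python
import math

def semicircle_defect(t, alpha, a=2.0, b=1.0):
    """Return (x - sigma) . (x - c) for the envelope point x of equation (4),
    where c = sigma + N / kappa is the center of curvature.
    The value is zero when x lies on the circle with diameter [sigma, c]."""
    dx, dy = -a * math.sin(t), b * math.cos(t)
    ddx, ddy = -a * math.cos(t), -b * math.sin(t)
    speed = math.hypot(dx, dy)
    T = (dx / speed, dy / speed)
    N = (-T[1], T[0])
    kappa = (dx * ddy - ddx * dy) / speed**3
    # Components of x - sigma in the frame (T, N), from equation (4).
    lam = math.sin(alpha) * math.cos(alpha) / kappa
    mu = math.sin(alpha) ** 2 / kappa
    x_minus_sigma = (lam * T[0] + mu * N[0], lam * T[1] + mu * N[1])
    x_minus_c = (x_minus_sigma[0] - N[0] / kappa, x_minus_sigma[1] - N[1] / kappa)
    return x_minus_sigma[0] * x_minus_c[0] + x_minus_sigma[1] * x_minus_c[1]
```

The returned value should vanish, up to rounding, for every t and α.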

3. CUSPS ON THE ENVELOPE. Consider the envelope curve given by (4). When
will this curve not be regular? The condition is that the speed is zero, that is, the deriva-
tive of x with respect to t is the zero vector. Again, using the Serret–Frenet formulae
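(2), this derivative is, assuming κ(t) ≠ 0 and omitting now the variable name t,

\[
\begin{aligned}
x' &= \Bigl( \|\sigma'\| - \frac{\kappa'}{\kappa^{2}} \sin\alpha\cos\alpha - \|\sigma'\| \sin^{2}\alpha \Bigr) T
      + \Bigl( \|\sigma'\| \sin\alpha\cos\alpha - \frac{\kappa'}{\kappa^{2}} \sin^{2}\alpha \Bigr) N \\
   &= \Bigl( \|\sigma'\| \cos\alpha - \frac{\kappa'}{\kappa^{2}} \sin\alpha \Bigr) (\cos\alpha\, T + \sin\alpha\, N).
\end{aligned}
\]

This is zero if and only if κ²||σ′|| cos α − κ′ sin α = 0. Writing s for the arclength
parameter on σ, dκ/dt = (dκ/ds)(ds/dt) = (dκ/ds)||σ′(t)||; hence, we have the following.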

Proposition 3.1. The envelope (4) (still assuming κ ≠ 0) is not regular if and only
if κ² cos α − κ_s sin α = 0, where κ_s is the derivative of curvature with respect to
arclength s on σ. This condition can also be written in terms of the radius of curvature
ρ = 1/κ: ρ_s sin α + cos α = 0.

Note that κ² and κ_s are independent of the direction of orientation of σ, and that
α > 0, say, always means a counterclockwise rotation of the oriented tangent to σ .
This gives the same line, whichever orientation of σ is used. That is why the equation
in the Proposition is unaltered when the orientation of σ is reversed.
For α = ±π/2, that is, for the envelope of normals (the evolute of σ), the Proposition
gives the familiar condition κ′ = 0 (this can be the derivative with respect to any
regular parameter), which says that σ has an extremum of curvature, that is, a vertex.
For α = 0, the envelope of tangents, it says that, away from inflexions, the envelope
has no cusps—but the envelope is σ itself, so this is nothing new.

Example 3.2. In the special case where σ(t) = (r cos t, r sin t), a circle of radius
r > 0, then κ is 1/r, a nonzero constant. The Proposition shows that there are no cusps
at all, the only exception being cos α = 0, when all the lines are radii and pass through
the center of the circle so the envelope degenerates to a point. In fact, for other values
of α we have ||x(t)||² = r² cos²α, so that the envelope is a concentric circle of
radius r|cos α|. The reader may enjoy showing from the Proposition that, for an ellipse
σ(t) = (a cos t, b sin t), where a > b > 0, the value of α at which the four cusps
appear (are “born”) on τα, starting from α = 0, is α0 = arctan(2ab/(3(a² − b²))). For
a = 2, b = 1, as in Figure 1, this comes to about α = 0.418 rad, or 24°.
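
The ellipse computation can be sketched in a few lines of Python. This is an illustration, not the authors' method: it relies on the closed form ρ_s = 3(a² − b²) sin 2t / (2ab) for the ellipse, which is derived here from ρ = (a² sin²t + b² cos²t)^(3/2)/(ab) and ds/dt = (a² sin²t + b² cos²t)^(1/2) and does not appear in the article. The function names are hypothetical.

```python
import math

def birth_angle(a, b):
    """alpha_0 = arctan(2ab / (3(a^2 - b^2))) for a > b > 0, as in Example 3.2."""
    return math.atan(2 * a * b / (3 * (a * a - b * b)))

def rho_s(t, a, b):
    """d(rho)/ds on the ellipse (a cos t, b sin t); closed form assumed above."""
    return 3 * (a * a - b * b) * math.sin(2 * t) / (2 * a * b)

def cusp_parameters(alpha, a, b):
    """Parameters t in [0, 2*pi) with rho_s sin(alpha) + cos(alpha) = 0,
    the non-regularity condition of Proposition 3.1."""
    if math.sin(alpha) == 0.0:
        return []  # alpha = 0: the envelope is the ellipse itself
    target = -2 * a * b * math.cos(alpha) / (3 * (a * a - b * b) * math.sin(alpha))
    if abs(target) > 1.0:
        return []  # sin(2t) = target has no solution
    u = math.asin(target)
    # sin(2t) = target  <=>  2t = u or pi - u, modulo 2*pi
    ts = {round(((v + 2 * math.pi * k) / 2) % (2 * math.pi), 12)
          for v in (u, math.pi - u) for k in (0, 1)}
    return sorted(ts)
```

With a = 2, b = 1, `birth_angle` returns approximately 0.418, and `cusp_parameters` finds four parameter values at α = π/4 and at α = 0.5 but none at α = 0.3, in line with Figure 1.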

When a plane curve such as this envelope curve is not regular, then in general we
expect it to have an “ordinary cusp,” that is, a singular point which is “like” the cusp
at the origin on the curve (t², t³). More formally, a local diffeomorphism of the plane
should take the given curve to this standard cusp.² The condition for an ordinary cusp
is in fact that the second and third derivatives of σ (with respect to any regular
parameter), evaluated at the cusp point, should be independent vectors.³ Note that for (t², t³)
at t = 0 these vectors are (2, 0) and (0, 6), hence certainly independent. Some modest
calculation (see Section 8.1) reveals the following.

Proposition 3.3. Assume as before that κ is nonzero, and also that sin α ≠ 0; that
is, the lines forming the envelope are not the tangent lines to σ. Then the cusp as in
Proposition 3.1 is an ordinary cusp if and only if 2κ_s² − κκ_ss ≠ 0, where the
derivatives, with respect to arclength s, are evaluated at the cusp point. This can also be
written as ρ_ss ≠ 0 where ρ = 1/κ is the radius of curvature of σ.

Example 3.4. When α = π/2, the conditions for an ordinary cusp reduce to κ ≠ 0,
κ′ = 0, κ″ ≠ 0, and by Assumption 2.1 we know that all points of σ where the
first two of these conditions hold also satisfy the third condition. So all cusps of τπ/2
(the evolute of σ) are ordinary cusps. But the Proposition does not guarantee that for
other values of α the cusps on τα are ordinary. Indeed, in Example 3.2, where σ is an
ellipse, the cusps are not ordinary at the moment of “birth,” when α = α0. By suitably
choosing a and b we can make α0 take any value in 0 < α0 < π/2.
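
The failure of the ordinary-cusp condition at the moment of birth can be seen in a short computation. The sketch below again uses closed forms for the ellipse (a cos t, b sin t) that are derived here and are not stated in the article: ρ_s = 3(a² − b²) sin 2t / (2ab) and ρ_ss = 3(a² − b²) cos 2t / (ab (a² sin²t + b² cos²t)^(1/2)). The function names are hypothetical.

```python
import math

def rho_s(t, a, b):
    """First arclength derivative of the radius of curvature (assumed closed form)."""
    return 3 * (a * a - b * b) * math.sin(2 * t) / (2 * a * b)

def rho_ss(t, a, b):
    """Second arclength derivative: d(rho_s)/dt divided by the speed ds/dt."""
    speed = math.sqrt((a * math.sin(t)) ** 2 + (b * math.cos(t)) ** 2)
    return 3 * (a * a - b * b) * math.cos(2 * t) / (a * b * speed)

def cusp_defect(t, alpha, a, b):
    """Left side of the condition rho_s sin(alpha) + cos(alpha) = 0 of Proposition 3.1."""
    return rho_s(t, a, b) * math.sin(alpha) + math.cos(alpha)

a, b = 2.0, 1.0
alpha0 = math.atan(2 * a * b / (3 * (a * a - b * b)))   # birth angle of Example 3.2
t_birth = 3 * math.pi / 4                                # here sin(2t) = -1
t_later = 0.5 * (math.pi + math.asin(4.0 / 9.0))         # a cusp parameter for alpha = pi/4
```

At `t_birth` and `alpha0` the cusp condition holds and `rho_ss` vanishes, so Proposition 3.3 does not give an ordinary cusp there; at `t_later` with α = π/4 the cusp condition holds and `rho_ss` is nonzero, so that cusp is ordinary.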

Our object in this article is not to study a single value of α, but to study what
happens to τα as α varies. For this we put the investigation in a wider context.

4. DISCRIMINANTS. The function F(x, t) in (1), for a fixed α, is an example of a
family of functions of one variable t with two parameters (x, y) = x. We can include
α in the parameters as well as x, y:

ℱ(x, α, t) = (x − σ(t)) · (T(t) sin α − N(t) cos α). (5)

This is a three-parameter family, where we use a script ℱ to emphasize this.


Any family G(X, t) = G(x_1, x_2, ..., x_k, t) with k parameters (we shall always have
k = 2 or 3) has a discriminant D_G, as follows, where we use a subscript as in G_t to
denote partial differentiation. We will use X to denote (x_1, x_2, ..., x_k), and X0 to denote
a fixed value of X, to avoid confusion with x, which denotes (x, y) in our discussion
of envelopes. (For contrasting discussions of discriminants see [6, Chapter 6], [11].)
² A local diffeomorphism here is a smooth map with smooth inverse, from a neighborhood of the cusp point
in the plane to a neighborhood of the origin, taking the one curve to the other. Similar definitions apply to
higher dimensions.
³ Starting with (x, y) = (at² + bt³ + ···, ct² + dt³ + ···) this means that ad − bc ≠ 0. In fact, an
invertible linear transformation x = aX + bY, y = cX + dY transforms the curve into (X, Y) = (t² + ···,
t³ + ···), and a reparametrization easily turns this into (u², u³ + ···) for a new parameter u. The final step to
make a smooth and invertible change of coordinates in the plane turning this into “normal form” (u², u³) takes
something more substantial and it is usual to invoke the Preparation Theorem. This reduction to normal form
is the very stuff of singularity theory; see, for example, [4] or, more technically, [7].

Thus, combining our notation yields

D_G = {X = (x_1, x_2, ..., x_k) : for some t, G = G_t = 0 at (X, t)}. (6)

Example 4.1.
(i) Cusp. Let G(X, t) = G(x_1, x_2, t) = t³ + x_1 t + x_2. For a fixed X, this is a
(reduced) polynomial of degree 3 in t, and D_G consists exactly of those
polynomials with a repeated root. It is the curve X = (−3t², 2t³) parametrized by t
with an ordinary cusp at the origin.
(ii) Cuspidal edge surface. A bit more bizarrely, let G(X, t) = G(x_1, x_2, x_3, t) =
t³ + x_1 t + x_2, where x_3 plays no role on the right side. The discriminant is
called a (standard) cuspidal edge surface. It is the product of an ordinary cusp
with a line, as in Figure 4, left, and is parametrized by x_3 and t, namely

(x_3, t) → (−3t², 2t³, x_3). (7)

The product of the cusp point itself with this line (the x_3-axis here) is called
the line of cusps. This surface has an important property.

Consider a point p of the line of cusps, and the tangent T to the line of
cusps at p. Then the tangent planes at points away from the line of cusps
but with limit p, have a limit that contains T.

For the surface as in (7), a normal is in direction (t, 1, 0) at points away from
the line of cusps, and this has limit (0, 1, 0) as t → 0, so the limiting tangent
plane is x_2 = 0. Of course, in general when we encounter a cuspidal edge
surface it will not be so “straight up and down” (see Figures 7 and 8); it will
be locally diffeomorphic to the standard surface in a neighborhood of a point
p as above, but the property stated will still be true locally.

Figure 4. Left: The level sets x_3 = c of the function h_1(x_1, x_2, x_3) = x_3 in the cuspidal edge surface are
all curves with a cusp. Center and right: The level sets defined by functions h_2(x_1, x_2, x_3) = x_1 + x_3² and
h_3(x_1, x_2, x_3) = x_1 − x_3². Below each figure is drawn a schematic diagram of the transitions undergone by
these level sets as c changes. The transition for h_2 is called a “beaks” or “bec-à-bec,” and that for h_3 a “lips,”
where for c > 0 the level set is empty and for c = 0 it is a single point.

(iii) Swallowtail surface. Let G(X, t) = G(x_1, x_2, x_3, t) = t⁴ + x_1 t² + x_2 t + x_3.
For a fixed X this is a (reduced) polynomial of degree 4 in t. These polynomials
fill a three-dimensional space with coordinates (x_1, x_2, x_3) and the discriminant
of G consists exactly of those polynomials which have a repeated root.
It is a surface known as a (standard) swallowtail⁴ surface and is illustrated in
Figure 5. Solving for x_2 and x_3, the surface is parametrized by x_1 and t:

(x_1, t) → (x_1, −4t³ − 2x_1 t, 3t⁴ + x_1 t²). (8)

Figure 5. Left: A swallowtail surface with the curved lines of cusps and self-intersection curve marked. These
all pass through the origin O. Right: A planar section x_1 = constant c < 0 of a swallowtail surface, and the
“swallowtail transition” which these sections (level sets of the function h_0(x_1, x_2, x_3) = x_1) undergo as the
constant c moves through 0.

The origin is then called the swallowtail point and there are two lines of
cusps through the origin, given by G = G t = G tt = 0 and parametrized by
(−6t 2 , 8t 3 , −3t 4 ), t > 0, and t < 0. These represent polynomials with a triple
root. There is also a curve of self-intersection, parametrized by (−2t 2 , 0, t 4 )
(a half-parabola), representing polynomials with two double roots. The origin
represents the polynomial t 4 with a fourfold root. One important property of
this surface is the following.

The tangents to the lines of cusps, and the tangent to the self-intersection
curve, all have the same limit at the origin (here the limit is the x1 -axis).
Furthermore, the tangent planes to the swallowtail surface at points
away from these curves all have the same limit (here the x1 x2 -plane),
which contains the above limit of tangent lines.

(A surface normal at such a point, with parameters $x_1, t$ as in (8), is $(t^2, t, 1)$,
which has limit $(0, 0, 1)$ as $t \to 0$.) Any surface locally diffeomorphic to the
standard swallowtail, in a neighborhood of the swallowtail point, will also satisfy
the stated property locally.
(iv) Of course there is the example at the heart of this article, given by (1), where
the discriminant $\mathcal{D}_F$ is the envelope $\tau_\alpha$ for a fixed $\alpha$. Using (5) instead,
$x_1, x_2, x_3$ become $x, y, \alpha$, respectively, and the discriminant $\mathcal{D}_{\mathcal{F}}$, in $(x, y, \alpha)$-space, is

$$\mathcal{D}_{\mathcal{F}} = \{(x, y, \alpha) : \text{there exists } t \text{ such that } \mathcal{F}(x, y, \alpha, t) = \mathcal{F}_t(x, y, \alpha, t) = 0\}. \tag{9}$$
⁴ Originally queue d’aronde in the French of the great mathematician René Thom (1923–2002), one of the
founders of singularity theory.

878 © THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 121
This is the union of all the envelopes, for all α: they are spread out in the
α-direction. (Figures 6 and 7 below illustrate this discriminant.)
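The claims made for the standard swallowtail in Example (iii) can be checked with exact integer arithmetic. The following is a minimal sketch, not part of the original article; the function names are illustrative. It confirms that points given by (8) correspond to polynomials with a repeated root at t, that points on the lines of cusps give a triple root, and that points on the self-intersection curve give double roots at both t and −t.

```python
def G(x1, x2, x3, t):
    """Quartic family G(X, t) = t^4 + x1*t^2 + x2*t + x3 of Example (iii)."""
    return t**4 + x1 * t**2 + x2 * t + x3


def G_t(x1, x2, t):
    """First derivative of G with respect to t."""
    return 4 * t**3 + 2 * x1 * t + x2


def G_tt(x1, t):
    """Second derivative of G with respect to t."""
    return 12 * t**2 + 2 * x1


def surface_point(x1, t):
    """Parametrization (8) of the swallowtail surface."""
    return (x1, -4 * t**3 - 2 * x1 * t, 3 * t**4 + x1 * t**2)


def cusp_point(t):
    """Point on the lines of cusps (triple root at t)."""
    return (-6 * t**2, 8 * t**3, -3 * t**4)


def self_intersection_point(t):
    """Point on the self-intersection curve (double roots at t and -t)."""
    return (-2 * t**2, 0, t**4)


def has_repeated_root(X, t):
    """True if G(X, .) and its t-derivative both vanish at t."""
    x1, x2, x3 = X
    return G(x1, x2, x3, t) == 0 and G_t(x1, x2, t) == 0


def has_triple_root(X, t):
    """True if G, G_t and G_tt all vanish at (X, t)."""
    return has_repeated_root(X, t) and G_tt(X[0], t) == 0


if __name__ == "__main__":
    samples = range(-3, 4)
    print(all(has_repeated_root(surface_point(x1, t), t)
              for x1 in samples for t in samples))
    print(all(has_triple_root(cusp_point(t), t) for t in samples))
    print(all(has_repeated_root(self_intersection_point(t), r)
              for t in samples for r in (t, -t)))
```

Integer sample values are used so that the equalities are tested exactly rather than up to rounding.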

Figure 6. See Example 4.2. Two views of the discriminant surface $\mathcal{D}_{\mathcal{F}}$, as in (9), for $\sigma$ an ellipse. The $\alpha$-axis
is vertical in the right-hand figure and $\alpha = 0$ is at the bottom (the envelope is the original ellipse) and $\alpha = \tfrac{1}{2}\pi$
is at the top (the envelope of normals has four cusps). The surface appears to have (curved) cuspidal edges,
with (curved) lines of cusps $C$ marked, and the horizontal sections (the level sets of $\alpha$ close to the marked
swallowtail point, where $\alpha = \alpha_0$, say) appear to undergo a swallowtail transition.

Figure 7. Two views of the cuspidal edge discriminant DF close to an ordinary inflexion of a curve and close
to α = 0, looking from below the plane α = 0 (left), and from above (right). The lines of cusps C are labelled
as are the visible parts of the curve σ in the plane α = 0; also the x-axis, which is the inflexional tangent to σ
at the origin. The plane sections α = constant of DF evolve through α = 0 by a “beaks” transition, as shown
in Figure 9.

Singularity theory includes extensive investigations of discriminants and of functions
on discriminants. To illustrate the latter, consider examples (ii) and (iii) above,
the three functions

$$h_1(X) = x_3, \qquad h_2(X) = x_1 + x_3^2, \qquad h_3(X) = x_1 - x_3^2$$

for the cuspidal edge surface in (ii) and the function $h_0(X) = x_1$ for the swallowtail
surface in (iii). These are illustrated in Figures 4 and 5 by means of their level sets, that
is, the sets of points of the cuspidal edge or swallowtail surface for which $h_i = c$, for
values of $c$ passing through 0. We are interested in the answers to two questions here.

(Q1) How can we recognize, in a given situation, such as that of $\mathcal{D}_{\mathcal{F}}$, that a discriminant
is, up to a local diffeomorphism of $\mathbb{R}^3$, a cuspidal edge or a swallowtail
surface? See Section 5.1.
(Q2) How can we recognize that a given function, e.g., $\alpha : \mathcal{D}_{\mathcal{F}} \to \mathbb{R}$, has level
sets that undergo an evolution (or transition or “perestroika”⁵) in one of the
“standard” ways of Figures 4 and 5? See Section 5.2.
In the case when the discriminant is $\mathcal{D}_{\mathcal{F}}$ and the function is $h(x, y, \alpha) = \alpha$, the
level sets $h = \text{constant}$ are of course the individual envelopes of the family, and we
want to study precisely how these change as $\alpha$ changes.

Example 4.2. Let $\sigma$ be an ellipse. The surface $\mathcal{D}_{\mathcal{F}}$ is illustrated in Figure 6. The
surface appears to have the structure of (curved) cuspidal edge and swallowtail surfaces
and the function $h(x, y, \alpha) = \alpha$ appears to have level sets that undergo a swallowtail
transition for certain values of $\alpha$. How can these observations be verified? Read on!

5. APPLYING RESULTS FROM SINGULARITY THEORY. In this section we
state and then apply the results from singularity theory that allow us to make precise
statements about the way in which the envelopes $\tau_\alpha$ evolve as $\alpha$ changes. Details of the
results in Section 5.1 are in [6], while those in Section 5.2 are found in various places,
such as [2, 5].

5.1. How to recognize a discriminant surface. We consider a discriminant as in (6),
and restrict to the case $k = 3$, so that the general form will have a family of functions
$G(X, t) = G(x_1, x_2, x_3, t)$, and

$$\mathcal{D}_G = \{X : \text{there exists } t \text{ such that } G = G_t = 0 \text{ at } (X, t)\}.$$

Definition 5.1. For $X = X_0$, the function $g(t) = G(X_0, t)$ has
(i) type $A_2$ at $t = t_0$ if $g'(t_0) = g''(t_0) = 0$, $g'''(t_0) \neq 0$,
(ii) type $A_3$ at $t = t_0$ if $g'(t_0) = g''(t_0) = g'''(t_0) = 0$, $g^{(4)}(t_0) \neq 0$.
We also say $g$ has an “$A_2$ or $A_3$ singularity” at $t = t_0$.
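For a polynomial g, the type in Definition 5.1 can be read off by differentiating repeatedly until a derivative fails to vanish at t0. The sketch below is illustrative only and is not taken from the article: polynomials are represented as lists of coefficients in ascending powers of t, and the names are assumptions. A return value of k means g', ..., g^(k) vanish at t0 and g^(k+1) does not, so 2 corresponds to type A2 and 3 to type A3.

```python
def derivative(coeffs):
    """Coefficients (ascending powers of t) of the derivative polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def evaluate(coeffs, t0):
    """Evaluate a polynomial at t0 by Horner's rule."""
    value = 0
    for c in reversed(coeffs):
        value = value * t0 + c
    return value


def singularity_type(coeffs, t0, max_order=10):
    """Return k such that g', ..., g^(k) vanish at t0 but g^(k+1) does not.

    k = 0 means t0 is not a critical point; k = 2 and k = 3 are the types
    A2 and A3 of Definition 5.1. Returns None if every derivative up to
    order max_order vanishes at t0.
    """
    d = derivative(coeffs)
    for k in range(max_order):
        if evaluate(d, t0) != 0:
            return k
        d = derivative(d)
    return None


if __name__ == "__main__":
    print(singularity_type([0, 0, 0, 1], 0))        # g(t) = t^3
    print(singularity_type([0, 0, 0, 0, 1], 0))     # g(t) = t^4
    print(singularity_type([-3, 8, -6, 0, 1], 1))   # t^4 - 6t^2 + 8t - 3 at t = 1
```

With integer or rational coefficients the vanishing tests are exact; for floating-point coefficients a tolerance would be needed instead of comparison with zero.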

Thus, the type measures how many partial derivatives of $G$ with the $x_i$ parameters
held fixed vanish at $t_0$. In the special case of the family $\mathcal{F}(x, y, \alpha, t)$ in (5), defining
the envelopes $\tau_\alpha$, $X$ is replaced by $(x, y, \alpha)$. The following Proposition gives the
conditions for $A_2$ and $A_3$ singularities, expressed in terms of the curvature $\kappa$ and its
derivatives. Part (i) is a routine and not very interesting calculation using the Serret–Frenet
formulas (2). Part (ii) deals with the case of an inflexion and we shall verify
this since it is slightly more surprising. Recall from Proposition 2.3 that the envelope
of tangents to a curve $\sigma$, that is $\tau_\alpha$ with $\alpha = 0$, when $\sigma$ has inflexions, consists of $\sigma$
and the whole tangent line at inflexion points. We shall be interested in the discriminant
$\mathcal{D}_{\mathcal{F}}$ close to a point $(\mathbf{x}_0, 0)$ where $\mathbf{x}_0 = (x_0, y_0)$ is the inflexion point itself. It will
turn out that this discriminant is locally diffeomorphic to a cuspidal edge surface, as in
Figure 7, and we need the result of (ii) to show this.
⁵ Perestroika in Russian means approximately the same as “restructuring” in English, and the Western World
heard a great deal about it in the 1980s and 1990s during the Gorbachev era in the Soviet Union and then the
Russian Federation. Its use in a mathematical context was popularized by the great Russian mathematician
Vladimir Igorevich Arnol’d (1937–2010).

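Proposition 5.2. Let the point $(\mathbf{x}_0, \alpha_0, t_0) = (x_0, y_0, \alpha_0, t_0)$ satisfy $\mathcal{F} = \mathcal{F}_t = 0$, so
that $(\mathbf{x}_0, \alpha_0) \in \mathcal{D}_{\mathcal{F}}$.
(i) Let $s$ denote the arclength function on $\sigma$, as in Section 3. Suppose that $\kappa(t_0) \neq 0$
so that $\mathbf{x}_0$ is given by (4). Then $f(t) = \mathcal{F}(\mathbf{x}_0, \alpha_0, t)$ has, at $t = t_0$,
(a) type $A_2$ provided, at $t_0$,

$$\kappa^2 \cos\alpha - \kappa_s \sin\alpha = 0, \qquad 2\kappa_s^2 - \kappa\kappa_{ss} \neq 0,$$

(b) type $A_3$ provided, at $t_0$,

$$\kappa^2 \cos\alpha - \kappa_s \sin\alpha = 0, \qquad 2\kappa_s^2 - \kappa\kappa_{ss} = 0, \qquad 6\kappa_s^3 - \kappa^2\kappa_{sss} \neq 0.$$

(ii) Suppose that $\kappa(t_0) = 0$, $\kappa'(t_0) \neq 0$, so that $\sigma$ has an ordinary inflexion at $t = t_0$.
Then, setting $\alpha_0 = 0$, the only $\mathbf{x}$ close to $\sigma(t_0)$ for which $\mathcal{F}_{tt}(\mathbf{x}, 0, t_0) = 0$ is
$\mathbf{x} = \sigma(t_0)$ itself and $f(t) = \mathcal{F}(\sigma(t_0), 0, t)$ has type $A_2$ at $t_0$.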

Note that the A2 condition in (i)(a) is the same as that for an ordinary cusp given in
Proposition 3.3.

Proof of (ii). The calculations are marginally simplified by taking the parameter $t$ to
be itself arclength, so that the Serret–Frenet formulas become $T' = \kappa N$ and $N' = -\kappa T$.
(But we shall still write $t$ for the parameter.) Let us take $t_0 = 0$. We know that, locally,
$\mathcal{F} = \mathcal{F}_t = 0$ at $(\mathbf{x}, 0, t)$ implies that either $\mathbf{x} = \sigma(t)$ or $\mathbf{x}$ is on the tangent line at the
inflexion $\sigma(0)$. Using the formula (5), we differentiate twice and put $\alpha = t = 0$; this
gives $\mathcal{F}_{tt}(\mathbf{x}, 0, 0) = (\mathbf{x} - \sigma(0)) \cdot (\kappa'(0) T(0))$, which can only be zero, for $\mathbf{x}$ close to
the inflexion point, when $\mathbf{x}$ coincides with that point. (This actually implies that at
other points of the tangent line at the inflexion, $\mathcal{D}_{\mathcal{F}}$ is a nonsingular surface.) Furthermore,
$\mathcal{F}_{ttt}(\sigma(0), 0, 0) = -2\kappa'(0) \neq 0$, so $\mathcal{F}(\sigma(0), 0, t)$ has exactly an $A_2$ singularity
at $t = 0$.

Now let us return to a general family $G$ as above. We will continue to use $X$ to
denote $(x_1, x_2, x_3)$, and $X_0$ to denote a fixed value of $X$. Suppose that $G = G_t = 0$ at
$(X_0, t_0)$, where the function $g(t) = G(X_0, t)$ has type $A_r$, $r = 2$ or 3, at $t_0$. It is almost
true that $\mathcal{D}_G$, in a small neighborhood of $X_0$, is locally diffeomorphic to the standard
discriminant of Examples 4.1. In fact, there is an additional “genericity” condition to
verify which, in the case $G = \mathcal{F}$, we shall show is automatic from our Assumption 2.1.

Example 5.3. Consider the family $G(X, t) = 2t^3 + x_1 t^2 + x_2$ (independent of $x_3$),
for which $\mathcal{D}_G = \{(x_1, 0, x_3)\} \cup \{(-3t, t^3, x_3)\}$. Then $g(t) = G(0, 0, 0, t) = 2t^3$ has,
at $t_0 = 0$, an $A_2$ singularity. But $\mathcal{D}_G$ is not a cuspidal edge surface near the origin; it
is the product of the curve $x_1^3 + 27x_2 = 0$, together with its inflexional tangent $x_2 = 0$,
by the $x_3$-axis.
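The description of the discriminant in Example 5.3 can be confirmed directly. The following short sketch (an illustration with assumed function names, not part of the article) checks with integers that both sheets satisfy G = G_t = 0 and that the curved sheet lies on the cubic curve stated above.

```python
def G(x1, x2, t):
    """Family of Example 5.3: G = 2t^3 + x1*t^2 + x2 (independent of x3)."""
    return 2 * t**3 + x1 * t**2 + x2


def G_t(x1, t):
    """Derivative of G with respect to t."""
    return 6 * t**2 + 2 * x1 * t


def on_discriminant(x1, x2, t):
    """True if G and G_t both vanish at (x1, x2, t)."""
    return G(x1, x2, t) == 0 and G_t(x1, t) == 0


if __name__ == "__main__":
    samples = range(-4, 5)
    # Flat sheet x2 = 0: the repeated root is always at t = 0.
    print(all(on_discriminant(x1, 0, 0) for x1 in samples))
    # Curved sheet (x1, x2) = (-3t, t^3): repeated root at t.
    print(all(on_discriminant(-3 * t, t**3, t) for t in samples))
    # The curved sheet satisfies x1^3 + 27*x2 = 0.
    print(all((-3 * t) ** 3 + 27 * t**3 == 0 for t in samples))
```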

The additional condition that is needed, besides the $A_2$ or $A_3$ singularity, is as follows.
It essentially says that the parameters $x_i$ perturb the singularity in a “sufficiently
general” way; the technical term is that they “unfold” the singularity in a (uni)versal
manner.

Definition 5.4. Suppose that $G = G_t = 0$ at $(X_0, t_0)$ and $g(t) = G(X_0, t)$ has an $A_r$
singularity at $t_0$. Consider the partial derivatives $G_{x_1}, G_{x_2}, G_{x_3}$ with respect to the
parameters $x_i$, evaluated at $X_0$, and in particular their Taylor polynomials $T_i$ up to degree
$r - 1$, expanded about $t_0$ (so these have $r$ terms). The family $G(X, t)$ is called a versal
unfolding of $g$ at $t_0$ if the $T_i$ span a vector space of dimension $r$. Thus, if the coefficients
in the $T_i$ are placed as the columns of an $r \times 3$ matrix, the rank is $r$. Clearly this
is possible only for $r \leq 3$.

Example 5.5.
(i) For the $G$, and $X_0 = (0, 0, 0)$, $t_0 = 0$ in Example 5.3, where $r = 2$, the derivatives
with respect to $x_i$ are $t^2$, 1, 0, respectively (independently of $X_0$, in fact).
The Taylor polynomials up to degree 1, expanded about 0, are 0, 1, 0, respectively,
so they span a space of dimension 1 only and the criterion of Definition 5.4 does
not hold. It is also clear that there is
“something missing” from this family for, whatever the values of $x_1, x_2, x_3$, the
function $g(t) = G(x_1, x_2, x_3, t)$ will always have a singularity at $t = 0$, in fact
of type $A_1$ unless $x_1 = 0$. We cannot turn the graph of the function $g(t) = t^3$
into a graph without turning points by adjusting the $x_i$. On the other hand, for
Example 4.1(ii) the condition for a critical point is $3t^2 + x_1 = 0$ and there are
no critical points if $x_1 > 0$.
(ii) For the $G$ of Examples 4.1(i), (ii), (iii), the criterion is easily shown to hold.
For example, with 4.1(iii), where $r = 3$, the $T_i$ are $t^2$, $t$, 1, respectively.
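The criterion of Definition 5.4 is a rank condition and is easy to test mechanically. The sketch below is a hedged illustration: it assumes t0 = 0, takes each partial derivative of G with respect to x_i as a list of coefficients in ascending powers of t, truncates to r terms, and computes the rank exactly with rational arithmetic. The helper names are hypothetical. The cuspidal edge family used in the last example, t^3 + x1*t + x2, is inferred from the critical-point condition quoted in Example 5.5(i) and is an assumption here.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) by exact Gauss-Jordan elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                factor = m[i][col] / m[rk][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


def is_versal(partials, r):
    """Check Definition 5.4 at t0 = 0.

    partials: coefficient lists (ascending powers of t) of the partial
    derivatives of G with respect to x1, x2, x3, evaluated at X0.
    r: the index of the A_r singularity of g at t0 = 0.
    """
    jets = [(list(p) + [0] * r)[:r] for p in partials]  # Taylor polynomials T_i
    return rank(jets) == r


if __name__ == "__main__":
    # Example 5.3: partial derivatives t^2, 1, 0 with r = 2.
    print(is_versal([[0, 0, 1], [1], [0]], 2))
    # Swallowtail family of Example 4.1(iii): t^2, t, 1 with r = 3.
    print(is_versal([[0, 0, 1], [0, 1], [1]], 3))
    # Assumed cuspidal edge family t^3 + x1*t + x2: t, 1, 0 with r = 2.
    print(is_versal([[0, 1], [1], [0]], 2))
```

The jets are stored as rows rather than columns; this does not affect the rank.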

The key theorem then is as follows; for details, see [6].

Theorem 5.6. Suppose that G satisfies the criterion of Definition 5.4. Then, in a neigh-
borhood of X0 ∈ DG , the discriminant is locally diffeomorphic to a standard cuspidal
edge surface when r = 2 and a standard swallowtail surface when r = 3 (as in Ex-
amples 4.1(ii), (iii)).

The most important example for us is of course DF , and there we must work a little
harder to verify the criterion of Definition 5.4 when G = F . We give some details in
Section 8.2 and the result is then as follows.

Corollary 5.7. The family F , as in (5), satisfies the conditions of the above the-
orem. Thus, when f (t) = F (x0 , t) has an Ar singularity at t0 , r = 2 or 3, in the
cases covered by Proposition 5.2, the discriminant DF , which is the union of the en-
velopes τα spread out in the α direction, is always locally diffeomorphic to a standard
cuspidal edge (r = 2) or a standard swallowtail surface (r = 3) in a neighborhood
of x0 .

So Figures 6 and 7 are not deceiving us.

5.2. How to recognize level sets of a function. There are two cases to consider,
namely level sets of a function on a cuspidal edge surface, and on a swallowtail surface
in 3-space $\mathbb{R}^3$. Fortunately, in both cases the conditions to realize the transitions of
Figures 4 and 5 are intuitively very reasonable. We shall only state the answers here;
there is more discussion of functions on discriminants in [2, 5]. For a smooth function
$h : \mathbb{R}^3 \to \mathbb{R}$, defined in a neighborhood of $X_0 \in \mathcal{D}_G$, there will be a kernel plane $K$
through $X_0$. This is the plane tangent to the level set of $h$ through $X_0$ and has equation
$(X - X_0) \cdot (h_{x_1}, h_{x_2}, h_{x_3}) = 0$, where the partial derivatives are evaluated at $X_0$. We
refer back to Examples 4.1(ii), (iii) for properties of tangent planes to a cuspidal edge
and swallowtail surface.

Proposition 5.8.
(i) For a cuspidal edge surface, with X0 on the line of cusps, the level sets of h on
DG will all be cusped curves, as in Figure 4, left, provided the plane K does
not contain the tangent to the line of cusps through X0 . (“K is transverse to the
line of cusps.”)
On the other hand, the level sets undergo a “beaks” or “lips” transition, as
in Figure 4, center and right, provided $K$ does contain this tangent but does not
coincide with the limiting tangent plane to the cuspidal edge surface at points
approaching $X_0$. (“$K$ is transverse to this limiting tangent plane.”)
(ii) For a swallowtail surface, with X0 at the swallowtail point, the level sets on DG
undergo a swallowtail transition, as in Figure 5, with two cusps merging and
disappearing, provided K does not contain the limiting tangent to the lines of
cusps on DG at X0 . (“K is transverse to this limiting tangent line.”)

See Figure 8 for examples where the conditions of (i) hold and also where they do
not hold. Similarly, level sets of a function on a swallowtail surface failing to satisfy
the condition of (ii) will not resemble a swallowtail transition.
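The test in Proposition 5.8(i) only involves the gradient of h at X0, the tangent direction of the line of cusps, and the normal to the limiting tangent plane. The sketch below applies it to the functions used in Figures 4 and 8(a). It assumes the standard cuspidal edge has its line of cusps along the x3-axis and limiting tangent plane equal to the x1x3-plane, as the caption of Figure 8 indicates; the function names and return strings are illustrative only.

```python
def dot(u, v):
    """Euclidean inner product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def classify_on_cuspidal_edge(grad_h, cusp_tangent, limiting_normal):
    """Apply the two tests of Proposition 5.8(i) at a point X0.

    grad_h: gradient of h at X0 (normal to the kernel plane K).
    cusp_tangent: tangent vector to the line of cusps at X0.
    limiting_normal: normal to the limiting tangent plane at X0.
    """
    if all(g == 0 for g in grad_h):
        raise ValueError("h must have nonzero gradient at X0")
    if dot(grad_h, cusp_tangent) != 0:
        return "cusps"            # K is transverse to the line of cusps
    if any(c != 0 for c in cross(grad_h, limiting_normal)):
        return "beaks or lips"    # K contains the tangent but is not the limiting plane
    return "not covered"          # K coincides with the limiting tangent plane


# Assumed data for the standard cuspidal edge at the origin.
CUSP_TANGENT = (0, 0, 1)      # line of cusps along the x3-axis
LIMITING_NORMAL = (0, 1, 0)   # limiting tangent plane is the x1x3-plane

if __name__ == "__main__":
    # Gradients at the origin of h1 = x3, h2 = x1 + x3^2, h3 = x1 - x3^2, h = x2 + x3^2.
    for name, grad in [("h1", (0, 0, 1)), ("h2", (1, 0, 0)),
                       ("h3", (1, 0, 0)), ("Figure 8(a)", (0, 1, 0))]:
        print(name, classify_on_cuspidal_edge(grad, CUSP_TANGENT, LIMITING_NORMAL))
```

The test does not distinguish “beaks” from “lips”; that depends on second-order information, which is why h2 and h3 receive the same answer.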

Figure 8. (a) An example of a function $h(X) = x_2 + x_3^2$ on the cuspidal edge illustrated, which does not satisfy
the conditions of Proposition 5.8(i), since the level set for $c = 0$, appearing as the lighter colored surface on the
left, has tangent plane at the origin equal to the $x_1x_3$-plane, which coincides with the limiting tangent planes
to the cuspidal edge. The level sets where the surface $x_2 + x_3^2 = c$ meets the cuspidal edge evolve in the way
illustrated. Clearly this is not anything like the transitions of Figure 4. In (b) and (c), the function on the curved
cuspidal edge M is assumed to be height in the direction of the vertical arrow. For (b), the level set at any level
is a horizontal plane, which is transverse to the line of cusps C and all the horizontal sections are cusps. For
(c), the level set (horizontal plane) through P is tangent to C but does not coincide with the (vertical) limiting
tangent planes to M, hence producing a “beaks” transition as in Proposition 5.8(i).

Remarkably, the conditions of Proposition 5.8 are always satisfied for the discriminant
$\mathcal{D}_{\mathcal{F}}$, provided only that the original curve $\sigma$ satisfies Assumption 2.1. We
sketch the proof of this in Section 8.3 below. In particular, the transition on the enve-
lope τα through α = 0, at an inflexion point of σ , is a “beaks” transition in which two
cusps collide, momentarily giving the envelope the entire tangent line at the inflex-
ion point, and then separate into two smooth branches. This is illustrated in Figure 9
for a closed curve with two inflexions, in fact the curve σ (t) = ((cos t + 2 cos(2t)
+10) cos t, (12 sin t − 8) sin t). (The transition cannot be of “lips” type as in Figure 4,
right, since the envelope cannot become empty.) Thus we have the following, which
completely describes the local changes in the envelopes τα for a curve σ satisfying the
genericity Assumption 2.1.

Theorem 5.9. The evolutoids $\tau_\alpha$ evolve locally according to a stable cusp (Figures 4,
left, 6 and 8(b)) at $A_2$ points where the curvature $\kappa$ is nonzero; according to a swallowtail
transition (Figure 6) at $A_3$ points where $\kappa \neq 0$; and according to a “beaks”
transition (Figures 4, center, and 8(c)) at points where $\kappa = 0$ and $\alpha = 0$. At all other
points the envelope $\tau_\alpha$ is a smooth curve.

For example, in Figure 1, the envelope is undergoing swallowtail transitions that are
identical, up to a local diffeomorphism, with those of the standard swallowtail sections
in Figure 5.

Figure 9. Center: A curve σ with two inflexions, and (left) the envelope τα for α small and negative; (right)
for α small and positive. Two cusps approach and merge close to each inflexion, leaving smooth branches, in
a “beaks” transition as in Figure 4, center. The envelope for α = 0 is the original curve and the tangents at
the inflexion points, which are drawn dashed in the center figure (see Proposition 2.3 and compare Figure 2,
right). The figure also shows two swallowtail configurations on each envelope, which collapse in a swallowtail
transition as α → 0. The envelope τα goes “to infinity” at points corresponding to the inflexions themselves,
since the denominator in (4) vanishes.

6. WAVEFRONTS. For the envelope of normals $\tau_{\pi/2}$ to a curve $\sigma$, there is associated
a family of wavefronts, also called parallels or offsets, which look like radiation
emanating from $\sigma$ into the surrounding space. See Figure 10, left. The cusps on the
wavefronts trace out the envelope, which in this case is just the four-cusped evolute of
$\sigma$. The wavefronts have the parametrization $\sigma(t) + wN(t)$, where $w$ takes a constant
value along each wavefront. Thus $\sigma$ is displaced a constant distance $w$ along the normals
to $\sigma$. The general prescription for wavefronts from a family of lines (or curves)
$F(\mathbf{x}, t)$ is as follows. We “integrate” $F$, that is, we look for a family $G$ satisfying
$G_t = F$. Then the wavefronts are given by $F = 0$ and $G = \text{constant}$.

Figure 10. Left: The “ordinary” wavefronts of an ellipse $\sigma$, which are obtained by displacing $\sigma$ a constant
distance along its normals. The cusps on the wavefronts all lie on the envelope of normals, the 4-cusped curve,
which is also drawn. Right: Nonclosed wavefronts, given by (10), corresponding to the envelope of lines shown
in Figure 1, center, for which $\alpha = \tfrac{1}{4}\pi$. Note that as the wavefronts (in either example) approach a cusp on the
envelope, two cusps on the wavefronts collapse together; this is, in fact, another example of a swallowtail
transition.

In the case of the family of normals, which is given by $F(\mathbf{x}, t) = (\mathbf{x} - \sigma(t)) \cdot T(t)$,
it is easy to write down a suitable $G$. Let us take $t$ to be the arclength parameter;
then we can take $G(\mathbf{x}, t) = -\tfrac{1}{2}\|\mathbf{x} - \sigma(t)\|^2 = -\tfrac{1}{2}(\mathbf{x} - \sigma(t)) \cdot (\mathbf{x} - \sigma(t))$, since by
(2) $\sigma'(t) = T(t)$, so that $G_t = F$. The solutions of $F = 0$, $G = w_1 = \text{constant} < 0$ are
the points of the normal ($F = 0$) for which the distance to the curve is $\pm\sqrt{-2w_1}$, that
is, the ordinary parallels (offsets) of $\sigma$.
For the family $F$ in (1), with $\alpha$ fixed, we have to work a little harder, but writing
$F = F_1 \sin\alpha - F_2 \cos\alpha$, we can use the above solution for $G_1$ with $(G_1)_t = F_1$ and,
for $F_2$ we need

$$G_2 = \int (\mathbf{x} - (X(t), Y(t))) \cdot (-Y'(t), X'(t))\, dt,$$

where $\sigma(t) = (X(t), Y(t))$. Note that the integral is independent of the parametrization
of $\sigma$ and in fact it represents, up to an added constant, twice the area swept out by
a line from $\mathbf{x}$ to $\sigma(t)$, as $t$ travels from some arbitrary starting value $t_0$. When $t$ is
arclength, then $G_2 = \int (\mathbf{x} - \sigma(t)) \cdot N(t)\, dt$. In the special case of an ellipse $\sigma(t) =
(a\cos t, b\sin t)$, we can write down the integral explicitly:

$$G_2(t) = ay\cos t - bx\sin t + abt \quad (+\ \text{constant}).$$

Note that this contains t on its own, so that unless cos α = 0, in which case G 2 is not
needed, the solution is not periodic. The wavefronts will not be closed curves, even
though the ellipse σ is closed.
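The explicit antiderivative for the ellipse can be checked numerically against the integrand. The following sketch is an illustration with assumed function names: it compares a central-difference derivative of G2 with (x − σ(t)) · (−Y'(t), X'(t)), and also confirms that G2 increases by 2πab over one period, which is the source of the non-periodicity noted above.

```python
import math


def G2(x, y, t, a, b):
    """Explicit antiderivative for the ellipse (a cos t, b sin t), constant omitted."""
    return a * y * math.cos(t) - b * x * math.sin(t) + a * b * t


def integrand(x, y, t, a, b):
    """(x - sigma(t)) . (-Y'(t), X'(t)) for sigma(t) = (a cos t, b sin t)."""
    X, Y = a * math.cos(t), b * math.sin(t)
    dX, dY = -a * math.sin(t), b * math.cos(t)
    return (x - X) * (-dY) + (y - Y) * dX


def central_difference(f, t, h=1e-5):
    """Second-order finite-difference approximation to f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)


if __name__ == "__main__":
    a, b, x, y = 2.0, 1.0, 0.7, -0.4   # arbitrary sample values
    for t in (0.0, 0.5, 1.3, 2.9):
        numeric = central_difference(lambda s: G2(x, y, s, a, b), t)
        print(abs(numeric - integrand(x, y, t, a, b)) < 1e-6)
    # G2 is not periodic: it gains 2*pi*a*b over one circuit of the ellipse.
    print(abs(G2(x, y, 2 * math.pi, a, b) - G2(x, y, 0.0, a, b) - 2 * math.pi * a * b) < 1e-9)
```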
Returning to $F$ for a fixed $\alpha$ and any unit speed curve $\sigma$, and using $G = G_1 \sin\alpha -
G_2 \cos\alpha$, we have a prescription for finding the wavefronts. It is not hard to check
that the following formula satisfies $F(\sigma_w(t), t) = 0$, $G(\sigma_w, t) = -\tfrac{1}{2}w^2 \sin\alpha$, which is
constant for a given $w$ and $\alpha$,

$$\sigma_w(t) = \sigma(t) + (w - t\cos\alpha)(T(t)\cos\alpha + N(t)\sin\alpha). \tag{10}$$

This gives an explicit formula for the family of wavefronts corresponding to a given
σ and α. Fixing w and letting t vary, we get an individual wavefront, parametrized
by t, corresponding to that w. It can be checked that the singular points (cusps) on
the wavefronts are given by the additional condition G tt = 0, that is Ft = 0, which
says that the cusps lie on the envelope τα of lines given by F = Ft = 0. We say that
the cusps of the wavefronts sweep out the envelope τα . The exception is sin α = 0, for
which τα is the envelope of tangent lines to σ . In that case, there are no cusps to sweep
out anything.
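Formula (10) can be explored numerically on the simplest unit speed curve. The sketch below is an illustration under stated assumptions, not the authors' computation: σ is taken to be the unit circle traversed counterclockwise (so κ = 1), the family is F = (x − σ) · (T sin α − N cos α), and the singular parameter t* = (w − sin α)/cos α is obtained by differentiating (10) for this particular curve. It checks that wavefront points lie on the corresponding line, that the wavefront has zero velocity at t*, and that F_t vanishes there, so the cusp lies on the envelope.

```python
import math


def frame(t):
    """Unit circle at arclength t: point, unit tangent T, unit normal N."""
    sigma = (math.cos(t), math.sin(t))
    T = (-math.sin(t), math.cos(t))
    N = (-math.cos(t), -math.sin(t))   # T rotated by +90 degrees
    return sigma, T, N


def F(x, t, alpha):
    """Line family: (x - sigma(t)) . (T sin(alpha) - N cos(alpha))."""
    sigma, T, N = frame(t)
    nx = T[0] * math.sin(alpha) - N[0] * math.cos(alpha)
    ny = T[1] * math.sin(alpha) - N[1] * math.cos(alpha)
    return (x[0] - sigma[0]) * nx + (x[1] - sigma[1]) * ny


def wavefront(t, w, alpha):
    """Formula (10): sigma + (w - t cos(alpha)) (T cos(alpha) + N sin(alpha))."""
    sigma, T, N = frame(t)
    m = w - t * math.cos(alpha)
    dx = T[0] * math.cos(alpha) + N[0] * math.sin(alpha)
    dy = T[1] * math.cos(alpha) + N[1] * math.sin(alpha)
    return (sigma[0] + m * dx, sigma[1] + m * dy)


def wavefront_speed(t, w, alpha, h=1e-5):
    """Norm of a central-difference approximation to the wavefront velocity."""
    p, q = wavefront(t + h, w, alpha), wavefront(t - h, w, alpha)
    return math.hypot(p[0] - q[0], p[1] - q[1]) / (2 * h)


def F_t(x, t, alpha, h=1e-5):
    """Central-difference approximation to the t-derivative of F, with x fixed."""
    return (F(x, t + h, alpha) - F(x, t - h, alpha)) / (2 * h)


if __name__ == "__main__":
    alpha, w = 0.7, 0.3
    print(all(abs(F(wavefront(t, w, alpha), t, alpha)) < 1e-12 for t in (0.0, 1.0, 2.5)))
    t_star = (w - math.sin(alpha)) / math.cos(alpha)   # unit circle only
    print(wavefront_speed(t_star, w, alpha) < 1e-6)
    print(abs(F_t(wavefront(t_star, w, alpha), t_star, alpha)) < 1e-6)
```

For α = π/2 the same function returns σ(t) + wN(t), which for the unit circle is the concentric circle of radius |1 − w|.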
If $\alpha = \tfrac{1}{2}\pi$ (the envelope of normals to $\sigma$), then $\sigma_w(t) = \sigma(t) + wN(t)$, which is the
usual parallel or offset of $\sigma$, obtained by moving down the normals a distance $w$. Note
that $w$ and $-w$ give different parallels, but the same value of $G$. Fixing the value of $G$
gives two parallels, corresponding to values of $w$ of opposite sign. If we use $\alpha = -\tfrac{1}{2}\pi$
instead, then $w$ and $-w$ are interchanged.
For a general $\alpha \neq \pm\tfrac{1}{2}\pi$, the factor $w - t\cos\alpha$ in (10) tells us how far along the line
given by t we must go to reach the wavefront point. For a closed curve σ , parametrized
by 0 ≤ t < T , we can add any multiple of T on to t and obtain the same point of σ
and the same line of the family. Thus, the wavefront meets the line corresponding to
the value t infinitely often.
An example, using the ellipse, is shown in Figure 10, right.

7. CONCLUSION. We have shown how to investigate a family of envelopes,
parametrized by an angle α, using some results from singularity theory. The family τα
of envelopes interpolates between the plane curve $\sigma$ ($\alpha = 0$) and its evolute ($\alpha = \tfrac{1}{2}\pi$),
with some additional complications connected with inflexions of σ . We can apply
some general results about discriminants—cusp, cuspidal edge, swallowtail—together
with real-valued functions α on discriminants and their associated level sets, that is,
the sets on which α is constant. It is the identification of envelopes with discriminants
that is at the heart of the applications given in this article. We have also investigated
the corresponding wavefronts, whose singular points sweep out the envelopes τα , and
found that, unlike the parallels associated with the evolute of a curve, these wavefronts
are in general not closed curves. The same techniques—reduction to normal form, use
of standard models, discriminants and functions on discriminants—give a great deal of
information about the differential geometry of surfaces and higher-dimensional man-
ifolds, and there are many applications to areas of science such as control theory and
dynamical systems. Some examples are given in the classic text [3], and applications
to shape analysis are in [12].

8. SOME PROOFS AND ADDITIONAL NOTES

8.1. Proof of Proposition 3.3. Here is an indication of how to prove Proposition 3.3
without getting tangled in too much algebra. We will assume for this that $\|\sigma'(t)\| = 1$
for all $t$, which says that $\sigma$ is unit speed, or that $t = s$ is the arclength parameter.
This is no loss of generality (see any book on differential geometry of curves, or
alternatively [6, pp. 27–28]) and saves a little writing. In fact, write $\lambda = -\cos\alpha +
(\kappa' \sin\alpha/\kappa^2)$, so that the condition for a cusp is $\lambda = 0$. Then the envelope in (4) has
$\mathbf{x}' = -\lambda\cos\alpha\, T - \lambda\sin\alpha\, N$. Differentiating this twice, to give $\mathbf{x}''$ and $\mathbf{x}'''$, and putting
$\lambda = 0$ (after differentiating!) quickly gives the condition for these vectors to be linearly
dependent as $2\kappa\lambda'^2 = 0$, that is, $\lambda' = 0$, and this gives the required formula, provided
$\sin\alpha \neq 0$.
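The differentiation indicated in Section 8.1 can be written out as follows. This is a worked sketch under the same unit speed assumption; the abbreviations $d$ and $d^{\perp}$ are introduced here for convenience and do not appear in the original.

```latex
\begin{align*}
d &:= T\cos\alpha + N\sin\alpha, \qquad
d^{\perp} := N\cos\alpha - T\sin\alpha, \qquad
d' = \kappa\, d^{\perp} \quad\text{(by } T' = \kappa N,\ N' = -\kappa T\text{)},\\
\mathbf{x}'   &= -\lambda\, d,\\
\mathbf{x}''  &= -\lambda'\, d - \lambda\kappa\, d^{\perp},\\
\mathbf{x}''' &= -\lambda''\, d - 2\lambda'\kappa\, d^{\perp} - \lambda\,(\kappa\, d^{\perp})'.
\end{align*}
Putting $\lambda = 0$ after differentiating, and using $\det(d, d^{\perp}) = 1$:
\begin{align*}
\mathbf{x}'' = -\lambda'\, d, \qquad
\mathbf{x}''' = -\lambda''\, d - 2\lambda'\kappa\, d^{\perp}, \qquad
\det(\mathbf{x}'', \mathbf{x}''') = 2\kappa\lambda'^{2}.
\end{align*}
Finally, differentiating $\lambda = -\cos\alpha + \kappa'\sin\alpha/\kappa^{2}$ with $\alpha$ fixed gives
\begin{align*}
\lambda' = \sin\alpha\;\frac{\kappa\kappa'' - 2\kappa'^{2}}{\kappa^{3}},
\end{align*}
so, for $\kappa \neq 0$, the vectors $\mathbf{x}''$ and $\mathbf{x}'''$ are independent exactly when
$\sin\alpha \neq 0$ and $2\kappa'^{2} - \kappa\kappa'' \neq 0$.
```

The last line agrees with the second condition in Proposition 5.2(i)(a), where $\kappa_s$ plays the role of $\kappa'$.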

8.2. Sketch of a proof of Corollary 5.7. Let us write $s$ for $\sin\alpha$ and $c$ for $\cos\alpha$,
$\sigma(t) = (X(t), Y(t))$ and assume that $\sigma$ is unit speed, that is, $\|\sigma'(t)\| = 1$ for all $t$, and
that $\kappa \neq 0$ (see below for the case $\kappa = 0$). Recall (5) that

$$\mathcal{F}(\mathbf{x}, \alpha, t) = (\mathbf{x} - \sigma) \cdot (Ts - Nc) = (x - X)(X's + Y'c) + (y - Y)(Y's - X'c).$$

To show that $\mathcal{F}$ is “versal,” we need to consider the Taylor series in $t$, including constant
term, up to degree 2, of the derivatives $\mathcal{F}_x, \mathcal{F}_y, \mathcal{F}_\alpha$, and check that they are independent
when evaluated at an $A_3$ (swallowtail) point, that is, one at which the conditions
of Proposition 5.2(i)(b) hold. So we need to take these three derivatives and
differentiate them with respect to $t$ twice to get the necessary terms of the Taylor series.
We are, of course, allowed to assume (4). For example,

$$\mathcal{F}_\alpha = (\mathbf{x} - \sigma) \cdot (Tc + Ns), \quad\text{so}\quad \mathcal{F}_{\alpha t} = -T \cdot (Tc + Ns) + (\mathbf{x} - \sigma) \cdot (\kappa Nc - \kappa Ts),$$

but using (4), this boils down to simply $-\cos\alpha$. Differentiating again and substituting
from (4) shows $\mathcal{F}_{\alpha tt} = 0$. Let us write $u = (s, c)$; then, for example, $\mathcal{F}_x = T \cdot u$ and
$\mathcal{F}_y = -N \cdot u$. Altogether we find that the 2-jets are the columns of the matrix $J_1$ below

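(where the binomial coefficient $\tfrac{1}{2}$ has been omitted):

$$J_1 = \begin{pmatrix} \mathcal{F}_x & \mathcal{F}_y & \mathcal{F}_\alpha \\ \mathcal{F}_{xt} & \mathcal{F}_{yt} & \mathcal{F}_{\alpha t} \\ \mathcal{F}_{xtt} & \mathcal{F}_{ytt} & \mathcal{F}_{\alpha tt} \end{pmatrix} = \begin{pmatrix} T \cdot u & -N \cdot u & s/\kappa \\ \kappa N \cdot u & \kappa T \cdot u & -c \\ (-\kappa^2 T + \kappa' N) \cdot u & (\kappa^2 N + \kappa' T) \cdot u & 0 \end{pmatrix}.$$

The determinant of this matrix is $\kappa^2 \sin\alpha + \kappa' \cos\alpha$. If this is zero, then, using
Proposition 5.2(i) (where $\kappa_s$ is our $\kappa'$), we have $\kappa^4 + \kappa'^2 = 0$ so that $\kappa = 0$, contrary to
assumption. This proves the required independence. See [6, Chapter 6] for full details
of this method.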
In the case of an $A_2$ singularity where $\kappa \neq 0$, the versal unfolding condition
is simply that the top left 2 × 2 minor of $J_1$ is nonsingular, and this amounts
to $\kappa \neq 0$.
Finally, when considering an (ordinary) inflexion, so that $\kappa = 0$, and working at
$\alpha = 0$, it is easy to check that $\mathcal{F}$ has exactly an $A_2$ singularity. Let us take the unit
tangent at the inflexion to be $(1, 0)$ so that the unit normal is $(0, 1)$. The vector $u$ is
$(0, 1)$ here, and the Taylor expansions to degree 1 of $\mathcal{F}_x, \mathcal{F}_y, \mathcal{F}_\alpha$ come to 0, $-1$, and
$-t$, respectively. These span polynomials of degree 1 in $t$.
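The determinant of J1 quoted in Section 8.2 can be checked numerically. The sketch below is illustrative only: it builds J1 from sample values of κ, κ', α and an arbitrary direction θ for the unit tangent T (with N taken as T rotated by +90 degrees, an assumption matching N = (−Y', X')), and compares the 3 × 3 determinant with κ² sin α + κ' cos α.

```python
import math


def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def J1(kappa, kappa_prime, alpha, theta):
    """Matrix J1 of Section 8.2 for a unit tangent T at angle theta."""
    s, c = math.sin(alpha), math.cos(alpha)
    T = (math.cos(theta), math.sin(theta))
    N = (-T[1], T[0])                 # T rotated by +90 degrees
    Tu = T[0] * s + T[1] * c          # T . u with u = (s, c)
    Nu = N[0] * s + N[1] * c          # N . u
    return [
        [Tu, -Nu, s / kappa],
        [kappa * Nu, kappa * Tu, -c],
        [-kappa**2 * Tu + kappa_prime * Nu, kappa**2 * Nu + kappa_prime * Tu, 0.0],
    ]


def expected_determinant(kappa, kappa_prime, alpha):
    """Closed form stated in the text: kappa^2 sin(alpha) + kappa' cos(alpha)."""
    return kappa**2 * math.sin(alpha) + kappa_prime * math.cos(alpha)


if __name__ == "__main__":
    for kappa, kp, alpha, theta in [(1.5, -0.4, 0.6, 0.2), (0.8, 2.0, 1.9, -1.1)]:
        diff = det3(J1(kappa, kp, alpha, theta)) - expected_determinant(kappa, kp, alpha)
        print(abs(diff) < 1e-9)
```

The result does not depend on θ, reflecting the fact that only (T · u)² + (N · u)² = 1 enters the determinant.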

8.3. Verifying the conditions needed to apply Proposition 5.8 to the discriminant
$\mathcal{D}_{\mathcal{F}}$. First, consider the case where the curve $\sigma$ does not have an inflexion at $\sigma(t_0)$, but
$f(t) = \mathcal{F}(\mathbf{x}_0, \alpha_0, t)$ has an $A_2$ or $A_3$ singularity at $t = t_0$. In the $A_2$ case, by Corollary 5.7,
$\mathcal{D}_{\mathcal{F}}$ is locally diffeomorphic to a cuspidal edge surface close to $(\mathbf{x}_0, \alpha_0)$. This
cuspidal edge is given by $\mathcal{F} = \mathcal{F}_t = \mathcal{F}_{tt} = 0$, that is, three equations in the four variables
$x, y, \alpha, t$, and the solutions are then projected to $(x, y, \alpha)$-space, where $\mathcal{D}_{\mathcal{F}}$ lies.
A standard technique for calculating tangent vectors (the implicit function theorem,
which is covered in books of advanced calculus, or see [6, Chapter 4]) says that we look
for nonzero kernel vectors of the 3 × 4 matrix $J_2$ of partial derivatives of $\mathcal{F}, \mathcal{F}_t, \mathcal{F}_{tt}$
with respect to the four variables $(x, y, \alpha, t)$, evaluated at $(\mathbf{x}_0, \alpha_0, t_0)$. Note that the
first three columns of $J_2$ are the same as the three columns of $J_1$ in the previous section,
while the fourth column is $(0, 0, \mathcal{F}_{ttt} \neq 0)$ at an $A_2$ point and $(0, 0, 0)$ at an
$A_3$ point. Thus for an $A_2$ point, we can always find a kernel vector, $(\dot{x}, \dot{y}, \dot{\alpha}, \dot{t})$ say,
whose first three components are not all zero, using the first two rows of $J_2$, and then
determine $\dot{t}$ using the third row of $J_2$, since $\mathcal{F}_{ttt} \neq 0$. Then $(\dot{x}, \dot{y}, \dot{\alpha})$ is a nonzero tangent
vector to the line of cusps $C$ on $\mathcal{D}_{\mathcal{F}}$ in $(x, y, \alpha)$-space. However, this cannot be
done with $\dot{\alpha} = 0$ in view of the nonsingularity of the top left 2 × 2 submatrix of $J_2$ (or
$J_1$). So a tangent vector to $C$ will never be horizontal and changing $\alpha$ to nearby values
gives a stable cusp as in Figure 8(b) and not (c).
There is clearly a problem with this argument at an $A_3$ point $(\mathbf{x}_0, \alpha_0)$, where $\mathcal{F}_{ttt} =
0$, since in view of the nonsingularity of $J_1$, all kernel vectors of $J_2$ have the form
$(0, 0, 0, \dot{t})$. This simply says that, in $(x, y, \alpha)$-space, the curve $C$ on $\mathcal{D}_{\mathcal{F}}$ is singular at
$(\mathbf{x}_0, \alpha_0)$; this is not surprising, since $\mathcal{D}_{\mathcal{F}}$ is a swallowtail surface and the space curve $C$ itself
has a cusp at $(\mathbf{x}_0, \alpha_0)$. However, the above argument still applies, by taking, say, a unit
tangent vector $(\dot{x}, \dot{y}, \dot{\alpha})$ and moving toward $(\mathbf{x}_0, \alpha_0)$ along $C$. The last component cannot
tend to 0 without the other two tending to 0 as well, which is a contradiction. In the
present case, we can be more explicit: A tangent vector (of length $\sqrt{1 + \kappa^2} > 1$) to $C$,
obtained from the first two rows of $J_2$, is $((\sin(2\alpha), \cos(2\alpha)) \cdot N,\ (\sin(2\alpha), \cos(2\alpha)) \cdot
T,\ \kappa)$. It is plain that this cannot have a limit in which the third component is 0. This
is certainly visible in Figure 6, where the limiting tangent to the lines of cusps is far
from horizontal.

Finally, consider the case when $\sigma$ has an inflexion at $t_0 = 0$, say, and $\mathbf{x}_0 =
\sigma(t_0)$, $\alpha_0 = 0$. Then $\mathcal{D}_{\mathcal{F}}$ is locally diffeomorphic to a cuspidal edge surface by
Corollary 5.7. In this case, it is easier to do a calculation in local coordinates, for
instance, taking $\sigma(t) = (t, at^3 + bt^4 + ct^5 + \cdots)$ where $a \neq 0$, since there is an ordinary
inflexion at the origin, and expanding everything about $(x, y, \alpha, t) = (0, 0, 0, 0)$.
Then an explicit calculation shows that $\mathcal{D}_{\mathcal{F}}$ is locally parametrized by $(x, t)$ and the
line of cusps by $t$:

$$\mathcal{D}_{\mathcal{F}} : (x, t) \mapsto (x,\ 6ax^2 t - 9axt^2 + 4at^3 + \cdots,\ 6axt - 6at^2 + \cdots);$$


 
8b(24b2 − 35ac) 4
C : t → 2t + · · · , −40ct + · · · ,
4
t + ··· .
a2

It is clear that the limiting tangent to C is in the direction (1, 0, 0), which is in the plane
α = 0. Since we know that DF is a cuspidal edge surface, we can find the limiting
tangent plane to DF at points away from C by taking any path on DF that avoids
$C$ (apart from at $(\mathbf{x}, \alpha) = (0, 0, 0)$), such as the path given by $t = 0$. The normal to
$\mathcal{D}_{\mathcal{F}}$ then comes to $(0, -6ax + \cdots, 6ax^2 + \cdots)$, which has limiting direction $(0, 1, 0)$ so that the
limiting tangent plane is the plane y = 0 and hence does not coincide with the plane
α = 0, as required.

ACKNOWLEDGMENTS. The second author is grateful for financial support from the Engineering and
Physical Sciences Research Council in the UK during his Ph.D. programme; the present article derives from
part of Chapter 3 of [14]. Both authors thank Joel Haddley for suggesting that they consider wavefronts corre-
sponding to the envelopes studied, Joel and Victor Goryunov for insisting that the inflexion case is also a beaks
transition, and the referees who encouraged us to improve our exposition and gave us earlier references to the
literature on evolutoids.

REFERENCES

1. T. M. Apostol, M. A. Mnatsakanian, Tanvolutes: Generalized involutes, Amer. Math. Monthly 117 (2010)
701–713, http://dx.doi.org/10.4169/000298910x515767.
2. V. I. Arnol’d, Wave front evolution and equivariant Morse lemma, Comm. Pure Appl. Math. 29 (1976)
557–582, http://dx.doi.org/10.1002/cpa.3160290603.
3. V. I. Arnol’d, Catastrophe Theory. Third edition. Springer-Verlag, Berlin, 1992,
http://dx.doi.org/10.1007/978-3-642-58124-3.
4. Th. Bröcker, L. Lander, Differentiable Germs and Catastrophes. London Math. Soc. Lecture Notes. Vol.
17. Cambridge Univ. Press, Cambridge, 1975, http://dx.doi.org/10.1017/cbo9781107325418.
5. J. W. Bruce, Functions on discriminants, J. London Math. Soc. 30 (1984) 551–567,
http://dx.doi.org/10.1112/jlms/s2-30.3.551.
6. J. W. Bruce, P. J. Giblin, Curves and Singularities. Cambridge Univ. Press, Cambridge, 1992.
7. J. W. Bruce, N. P. Kirk, A. du Plessis, Complete transversals and the classification of singularities, Non-
linearity 10 (1997) 253–275, http://dx.doi.org/10.1088/0951-7715/10/1/017.
8. P. Giblin, Affinely invariant symmetry sets, Caustics ’06, Banach Center Publications, Warsaw, Vol. 82
(2008) 71–84, http://dx.doi.org/10.4064/bc82-0-5.
9. P. J. Giblin, V. M. Zakalyukin, Singularities of centre symmetry sets, Proc. London Math. Soc. 90 (2005)
132–166, http://dx.doi.org/10.1112/s0024611504014923.
10. M. Hamman, A note on ovals and their evolutoids, Beiträge Algebra Geom. 50 (2009) 433–441.
11. G. Katz, How tangents solve algebraic equations, or a remarkable geometry of discriminant varieties,
Expo. Math. 21 (2003) 219–261, http://dx.doi.org/10.1016/s0723-0869(03)80002-6.
12. Medial Representations. Edited by S. M. Pizer and K. Siddiqi. Springer, New York, 2008,
http://dx.doi.org/10.1007/978-1-4020-8658-8.
13. R.-A. F. de Réaumur, Methode générale pour déterminer le point d’intersection de deux lignes droites
infiment proches, qui rencontrent une courbe quelconques vers le même côté sous des angles égaux
moindres, ou plus grandes qu’un droit, et pour connoître la nature de la courbe décrite par une infinité de
tels points d’intersection, Histoire de l’Académie Royale des Sciences (1709) 149–162, http://www.
academie-sciences.fr/activite/archive/ressource.htm.

14. J. P. Warder, Symmetries of Curves and Surfaces. Ph.D. thesis, University of Liverpool, 2009, http://
www.liv.ac.uk/~pjgiblin/papers/pw-thesis.pdf.

PETER J. GIBLIN is Professor of mathematics (Emeritus) at the University of Liverpool, where he was
Head of the Department of Mathematical Sciences 2000–2004. He has been a Visiting Professor at UNC
Chapel Hill, the University of Massachusetts at Amherst, and Brown University. His research interests are in
singularity theory and its applications to differential geometry and computer vision. He also enjoys working
with High School students on challenging projects. Now that he is retired, he has more time, equal enthusiasm,
but less energy for doing mathematics himself, thereby preserving the status quo.
Department of Mathematical Sciences, The University of Liverpool, Liverpool L69 7ZL, UK
pjgiblin@liv.ac.uk

J. PAUL WARDER is a career mathematician of some 25 years’ experience, having been engaged in varied
mathematical modelling and applied statistics roles in the nuclear, finance, and public service sectors. After
completing his undergraduate studies at the University of Liverpool in 1988, he returned to the mathematics
department in 2004 to complete successive M.Sc. and Ph.D. studies under the supervision of Peter Giblin.
paul.warder@gmail.com



Numerical Semigroups, Cyclotomic
Polynomials, and Bernoulli Numbers
Pieter Moree

Abstract. We give two proofs of a folklore result relating numerical semigroups of embedding
dimension two and binary cyclotomic polynomials and explore some consequences. In partic-
ular, we give a more conceptual reproof of a result of Hong et al. (2012) on gaps between the
exponents of nonzero monomials in a binary cyclotomic polynomial.
The intent of this paper is to better unify the various results within the cyclotomic polyno-
mial and numerical semigroup communities.

1. INTRODUCTION. Let $a_1, \ldots, a_m$ be positive integers, and let $S$ be the set of all
nonnegative integer linear combinations of $a_1, \ldots, a_m$, that is,

$$S = S(a_1, \ldots, a_m) = \{x_1 a_1 + \cdots + x_m a_m \mid x_i \in \mathbb{Z}_{\ge 0}\}.$$

Then $S$ is a semigroup (that is, it is closed under addition). The semigroup $S$ is said
to be numerical if its complement $\mathbb{Z}_{\ge 0} \setminus S$ is finite. It is not difficult to prove that
$S(a_1, \ldots, a_m)$ is numerical if and only if $a_1, \ldots, a_m$ are relatively prime (see, e.g.,
[15, p. 2]). If $S$ is numerical, then $F(S) = \max(\mathbb{Z}_{\ge 0} \setminus S)$ is the Frobenius number of $S$.
Alternatively, by setting $d(k, a_1, \ldots, a_m)$ equal to the number of nonnegative integer
representations of $k$ by $a_1, \ldots, a_m$, one can characterize $F(S)$ as the largest $k$ such
that $d(k, a_1, \ldots, a_m) = 0$. The value $d(k, a_1, \ldots, a_m)$ is called the denumerant of $k$.
That $F(S(4, 6, 9, 20)) = 11$ is well known to fans of Chicken McNuggets, as 11 is the
largest number of McNuggets that cannot be exactly purchased; hence, the notion
of the Frobenius number is less abstract than it might appear at first glance. A set of
generators of a numerical semigroup is a minimal system of generators if none of its
proper subsets generates the numerical semigroup. It is known that every numerical
semigroup $S$ has a unique minimal system of generators and also that this minimal
system of generators is finite (see, e.g., [18, Theorem 2.7]). The cardinality of the minimal
system of generators is called the embedding dimension of the numerical semigroup
$S$ and is denoted by $e(S)$. The smallest member of the minimal system of generators
is called the multiplicity of the numerical semigroup $S$ and is denoted by $m(S)$. The
Hilbert series of the numerical semigroup $S$ is the formal power series

$$H_S(x) = \sum_{s \in S} x^s \in \mathbb{Z}[[x]].$$

It is practical to multiply this by $1 - x$, as we then obtain a polynomial, called the
semigroup polynomial:

$$P_S(x) = (1 - x) H_S(x) = x^{F(S)+1} + (1 - x) \sum_{\substack{0 \le s \le F(S) \\ s \in S}} x^s = 1 + (x - 1) \sum_{s \notin S} x^s. \tag{1}$$
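
The definitions above can be tried out numerically. The following Python sketch computes the gaps of a numerical semigroup and the coefficients of $P_S(x)$ from the last expression in (1). The function names `semigroup_gaps` and `semigroup_polynomial` are illustrative choices, not notation from the literature, and the stopping rule assumes the generators are relatively prime.

```python
from functools import reduce
from math import gcd

def semigroup_gaps(gens):
    """Sorted list of the nonnegative integers not in S(gens)."""
    if reduce(gcd, gens) != 1:
        raise ValueError("generators must be relatively prime")
    m = min(gens)
    in_s = [True]            # in_s[n] records whether n lies in S; 0 always does
    gaps, run, n = [], 1, 0
    # Once m consecutive integers lie in S, every larger integer does too.
    while run < m:
        n += 1
        member = any(g <= n and in_s[n - g] for g in gens)
        in_s.append(member)
        if member:
            run += 1
        else:
            run = 0
            gaps.append(n)
    return gaps

def semigroup_polynomial(gens):
    """Coefficients of P_S(x) = 1 + (x - 1) * sum_{s not in S} x^s, lowest degree first."""
    gaps = semigroup_gaps(gens)
    coeffs = [0] * (max(gaps, default=-1) + 2)
    coeffs[0] = 1
    for s in gaps:
        coeffs[s + 1] += 1   # the term x * x^s
        coeffs[s] -= 1       # the term -x^s
    return coeffs

mcnuggets = [4, 6, 9, 20]
print(max(semigroup_gaps(mcnuggets)))            # Frobenius number
print(len(semigroup_polynomial(mcnuggets)) - 1)  # degree of P_S
```

For the McNuggets semigroup the largest gap is 11 and the polynomial has degree 12, one more than the Frobenius number.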

http://dx.doi.org/10.4169/amer.math.monthly.121.10.890
MSC: Primary 20M14, Secondary 11C08; 11B68

From PS one immediately reads off the Frobenius number:

deg(PS (x)) = F(S) + 1. (2)

The $n$th cyclotomic polynomial $\Phi_n(x)$ is defined by

$$\Phi_n(x) = \prod_{\substack{1 \le j \le n \\ (j,n)=1}} (x - \zeta_n^j) = \sum_{k=0}^{\varphi(n)} a_n(k) x^k,$$

with $\zeta_n$ a primitive $n$th root of unity (one can take $\zeta_n = e^{2\pi i/n}$). It has degree $\varphi(n)$,
with $\varphi$ Euler’s totient function. The polynomial $\Phi_n(x)$ is irreducible over the rationals
(see, e.g., Weintraub [22]) and has integer coefficients. The polynomial $x^n - 1$ factors
as

$$x^n - 1 = \prod_{d \mid n} \Phi_d(x) \tag{3}$$

over the rationals. By Möbius inversion it follows from (3) that

$$\Phi_n(x) = \prod_{d \mid n} (x^d - 1)^{\mu(n/d)}, \tag{4}$$

where $\mu(n)$ denotes the Möbius function. From (4) one deduces that if $p \mid n$ is a prime,
then

$$\Phi_{pn}(x) = \Phi_n(x^p). \tag{5}$$

A good source for further properties of cyclotomic polynomials is Thangadurai [19].
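
The factorization (3) already gives a practical way to compute $\Phi_n(x)$: divide $x^n - 1$ by $\Phi_d(x)$ for every proper divisor $d$ of $n$. The Python sketch below does this with exact integer arithmetic and then checks identity (5) on small cases. The helper names (`poly_divide`, `cyclotomic`, `substitute_power`) are hypothetical, and polynomials are stored as coefficient lists with the constant term first.

```python
from functools import lru_cache

def poly_divide(num, den):
    """Exact quotient num / den for integer coefficient lists; den must be monic."""
    num = list(num)
    quot = [0] * (len(num) - len(den) + 1)
    for k in range(len(quot) - 1, -1, -1):
        quot[k] = num[k + len(den) - 1]
        for i, d in enumerate(den):
            num[k + i] -= quot[k] * d
    if any(num):
        raise ValueError("division was not exact")
    return quot

@lru_cache(maxsize=None)
def cyclotomic(n):
    """Coefficients of Phi_n(x) as a tuple, using x^n - 1 = prod_{d | n} Phi_d(x)."""
    poly = [-1] + [0] * (n - 1) + [1]        # x^n - 1
    for d in range(1, n):
        if n % d == 0:
            poly = poly_divide(poly, cyclotomic(d))
    return tuple(poly)

def substitute_power(poly, p):
    """Coefficients of f(x^p) given those of f(x)."""
    out = [0] * ((len(poly) - 1) * p + 1)
    out[::p] = poly
    return tuple(out)

print(cyclotomic(6))                                            # x^2 - x + 1
print(cyclotomic(12) == substitute_power(cyclotomic(6), 2))     # identity (5) with p = 2, n = 6
```

Repeated division is slower than applying (4) directly, but it avoids rational functions and keeps every intermediate result a polynomial with integer coefficients.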


One purpose of this paper is to popularize the following folklore result and to point out
some of its consequences.

Theorem 1. Let $p, q > 1$ be coprime integers. Then

$$P_{S(p,q)}(x) = (1 - x) \sum_{s \in S(p,q)} x^s = \frac{(x^{pq} - 1)(x - 1)}{(x^p - 1)(x^q - 1)}.$$

In case $p$ and $q$ are distinct primes, it follows from (4) and Theorem 1 that

$$P_{S(p,q)}(x) = \Phi_{pq}(x). \tag{6}$$
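
Theorem 1 is easy to test by machine. The sketch below, in Python, builds the coefficients of $(1 - x)\sum_{s \in S(p,q)} x^s$ from a brute-force membership test and checks the cross-multiplied form $P_{S(p,q)}(x)(x^p - 1)(x^q - 1) = (x^{pq} - 1)(x - 1)$, which avoids polynomial division. All function names are illustrative, and the truncation at degree $pq$ relies on the standard fact that every integer $\ge pq$ lies in $S(p, q)$.

```python
def in_semigroup(n, p, q):
    """True when n = a*p + b*q for some integers a, b >= 0."""
    return n >= 0 and any((n - a * p) % q == 0 for a in range(n // p + 1))

def poly_mul(a, b):
    """Product of two coefficient lists (constant term first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def x_power_minus_one(n):
    """Coefficient list of x^n - 1."""
    return [-1] + [0] * (n - 1) + [1]

def semigroup_polynomial(p, q):
    """Coefficients of (1 - x) * sum_{s in S(p,q)} x^s, with trailing zeros removed."""
    coeffs = [int(in_semigroup(k, p, q)) - int(in_semigroup(k - 1, p, q))
              for k in range(p * q + 1)]
    while coeffs[-1] == 0:
        coeffs.pop()
    return coeffs

def theorem1_holds(p, q):
    """Check P_S(x) * (x^p - 1)(x^q - 1) == (x^pq - 1)(x - 1) coefficientwise."""
    lhs = poly_mul(semigroup_polynomial(p, q),
                   poly_mul(x_power_minus_one(p), x_power_minus_one(q)))
    rhs = poly_mul(x_power_minus_one(p * q), x_power_minus_one(1))
    return lhs == rhs

print(all(theorem1_holds(p, q) for p, q in [(3, 5), (4, 7), (5, 7), (6, 35)]))
```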

Already Carlitz [5] in 1966 implicitly mentioned this result without proof.
The Bernoulli numbers $B_n$ can be defined by

$$\frac{z}{e^z - 1} = \sum_{n=0}^{\infty} B_n \frac{z^n}{n!}, \qquad |z| < 2\pi. \tag{7}$$

One easily sees that $B_0 = 1$, $B_1 = -1/2$, $B_2 = 1/6$, $B_3 = 0$, $B_4 = -1/30$ and $B_n = 0$
for all odd $n \ge 3$. The most basic recurrence relation is, for $n \ge 1$,

$$\sum_{j=0}^{n} \binom{n+1}{j} B_j = 0. \tag{8}$$
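
Solving (8) for $B_n$ gives $B_n = -\frac{1}{n+1} \sum_{j=0}^{n-1} \binom{n+1}{j} B_j$, which can be run directly. The following Python sketch uses exact rational arithmetic; the function name `bernoulli_upto` is an illustrative choice.

```python
from fractions import Fraction
from math import comb

def bernoulli_upto(n_max):
    """Return [B_0, ..., B_{n_max}] as exact fractions (convention B_1 = -1/2)."""
    B = [Fraction(1)]
    for n in range(1, n_max + 1):
        # Recurrence (8) solved for B_n.
        B.append(-sum(comb(n + 1, j) * B[j] for j in range(n)) / (n + 1))
    return B

print(bernoulli_upto(4))
```

The first five values reproduce the list $1, -1/2, 1/6, 0, -1/30$ quoted above.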

December 2014] SEMIGROUPS, POLYNOMIALS, AND BERNOULLI NUMBERS 891


The Bernoulli numbers first arose in the study of power sums $S_j(n) := \sum_{k=0}^{n-1} k^j$.
Indeed, one has (see Rademacher [14])

$$S_j(n) = \frac{1}{j+1} \sum_{i=0}^{j} \binom{j+1}{i} B_i n^{j+1-i}. \tag{9}$$

In Section 5, we consider an infinite family of recurrences for $B_m$ of which the following
is typical:

$$B_m = \frac{m}{4^m - 1} \left(1 + 2^{m-1} + 3^{m-1} + 5^{m-1} + 6^{m-1} + 9^{m-1} + 10^{m-1} + 13^{m-1} + 17^{m-1}\right)
+ \frac{7^m}{4(1 - 4^m)} \sum_{r=0}^{m-1} \binom{m}{r} \left(\frac{4}{7}\right)^{r} \left(1 + 2^{m-r} + 3^{m-r}\right) B_r.$$

The natural numbers 1, 2, 3, 5, 6, 9, 10, 13, and 17 are precisely those that are not in
the numerical semigroup S(4, 7).
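
The recurrence attached to $S(4, 7)$ can be checked against the basic recurrence (8). In the Python sketch below, `bernoulli_upto` generates reference values from (8) and `recurrence_rhs` evaluates the right-hand side of the $S(4,7)$ recurrence using only $B_0, \ldots, B_{m-1}$. Both names are hypothetical helpers introduced for this illustration.

```python
from fractions import Fraction
from math import comb

GAPS_4_7 = [1, 2, 3, 5, 6, 9, 10, 13, 17]    # the integers not in S(4, 7)

def bernoulli_upto(n_max):
    """[B_0, ..., B_{n_max}] from recurrence (8), with B_1 = -1/2."""
    B = [Fraction(1)]
    for n in range(1, n_max + 1):
        B.append(-sum(comb(n + 1, j) * B[j] for j in range(n)) / (n + 1))
    return B

def recurrence_rhs(m, B):
    """Right-hand side of the S(4, 7) recurrence for B_m (needs B_0, ..., B_{m-1})."""
    gap_term = Fraction(m, 4**m - 1) * sum(g ** (m - 1) for g in GAPS_4_7)
    tail = sum(comb(m, r) * Fraction(4, 7) ** r
               * (1 + 2 ** (m - r) + 3 ** (m - r)) * B[r]
               for r in range(m))
    return gap_term + Fraction(7**m, 4 * (1 - 4**m)) * tail

B = bernoulli_upto(12)
print(all(recurrence_rhs(m, B) == B[m] for m in range(1, 13)))
```

For $m = 1$ the two terms are $3$ and $-7/2$, giving $B_1 = -1/2$ as expected.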
Let $f = c_1 x^{e_1} + \cdots + c_s x^{e_s}$, where the coefficients $c_i$ are nonzero and
$e_1 < e_2 < \cdots < e_s$. Then the maximum gap of $f$, written as $g(f)$, is defined by

$$g(f) = \max_{1 \le i < s} (e_{i+1} - e_i), \qquad g(f) = 0 \text{ when } s = 1.$$

Hong et al. [9] studied $g(\Phi_n)$ (inspired by a cryptographic application [10]). They
reduced the study of these gaps to the case where $n$ is square-free and odd and established
the following result for the simplest nontrivial case.

Theorem 2. [9]. If $p$ and $q$ are arbitrary primes with $2 < p < q$, then

$$g(\Phi_{pq}) = p - 1.$$

In Section 6 a conceptual proof of Theorem 2 using numerical semigroups is given.

2. INCLUSION-EXCLUSION POLYNOMIALS. It will turn out to be convenient
to work with a generalization of the cyclotomic polynomials, introduced by Bachman
[1]. Let $\rho = \{r_1, r_2, \ldots, r_s\}$ be a set of natural numbers satisfying $r_i > 1$ and
$(r_i, r_j) = 1$ for $i \neq j$, and put

$$n_0 = \prod_i r_i, \quad n_i = \frac{n_0}{r_i}, \quad n_{ij} = \frac{n_0}{r_i r_j} \ [i \neq j], \ \ldots.$$

For each such $\rho$ we define a function $Q_\rho$ by

$$Q_\rho(x) = \frac{(x^{n_0} - 1) \cdot \prod_{i<j} (x^{n_{ij}} - 1) \cdots}{\prod_i (x^{n_i} - 1) \cdot \prod_{i<j<k} (x^{n_{ijk}} - 1) \cdots}. \tag{10}$$

For example, if $\rho = \{p, q\}$, then

$$Q_{\{p,q\}}(x) = \frac{(x^{pq} - 1)(x - 1)}{(x^p - 1)(x^q - 1)}. \tag{11}$$


It can be shown that $Q_\rho(x)$ defines a polynomial of degree $d := \prod_i (r_i - 1)$. We define
its coefficients $a_\rho(k)$ by $Q_\rho(x) = \sum_{k \ge 0} a_\rho(k) x^k$. Furthermore, $Q_\rho(x)$ is
self-reciprocal; that is, $a_\rho(k) = a_\rho(d - k)$ or, what amounts to the same thing,

$$Q_\rho(x) = x^d Q_\rho\!\left(\frac{1}{x}\right). \tag{12}$$

If all elements of $\rho$ are prime, then comparison of (10) with (4) shows that

$$Q_\rho(x) = \Phi_{r_1 r_2 \cdots r_s}(x). \tag{13}$$

If $n$ is an arbitrary integer and $\gamma(n) = p_1 \cdots p_s$ its square-free kernel, then by (5) and
(13) we have $Q_{\{p_1, \ldots, p_s\}}(x^{n/\gamma(n)}) = \Phi_n(x)$ and hence inclusion-exclusion polynomials
generalize cyclotomic polynomials. They can be expressed as products of cyclotomic
polynomials.

Theorem 3. [1]. Given $\rho = \{r_1, \ldots, r_s\}$ and

$$D_\rho = \Big\{ d : d \,\Big|\, \prod_i r_i \text{ and } (d, r_i) > 1 \text{ for all } i \Big\},$$

then $Q_\rho(x) = \prod_{d \in D_\rho} \Phi_d(x)$.

For example, $Q_{\{4,7\}} = \Phi_{28} \Phi_{14}$.

Binary inclusion-exclusion polynomials: A close-up. Lam and Leung [11] discuss
binary cyclotomic polynomials $\Phi_{pq}$ in detail, with $p$ and $q$ primes (their results were
anticipated by Lenstra [12]). Now, let $p, q > 1$ be positive coprime integers. All arguments
in their paper easily generalize to this setting (instead of taking $\xi$ to be a
primitive $pq$th root of unity as they do, one has to take $\zeta$ a $pq$th root of unity satisfying
$\zeta^p \neq 1$ and $\zeta^q \neq 1$). One finds that

$$Q_{\{p,q\}}(x) = \sum_{i=0}^{\rho-1} x^{ip} \sum_{j=0}^{\sigma-1} x^{jq} - x^{-pq} \sum_{i=\rho}^{q-1} x^{ip} \sum_{j=\sigma}^{p-1} x^{jq}, \tag{14}$$

where $\rho$ and $\sigma$ are the (unique) nonnegative integers for which $1 + pq = \rho p + \sigma q$.
On noting that upon expanding the products in identity (14), the resulting monomials
are all different, we arrive at the following result.

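Lemma 1. Let $p, q > 1$ be coprime integers. Let $\rho$ and $\sigma$ be the (unique) nonnegative
integers for which $1 + pq = \rho p + \sigma q$. Let $0 \le m < pq$. Then either $m = \alpha p + \beta q$ or
$m = \alpha p + \beta q - pq$ with $0 \le \alpha \le q - 1$ the unique integer such that $\alpha p \equiv m \pmod{q}$
and $0 \le \beta \le p - 1$ the unique integer such that $\beta q \equiv m \pmod{p}$. The inclusion-exclusion
coefficient $a_{\{p,q\}}(m)$ equals

$$\begin{cases}
1 & \text{if } m = \alpha p + \beta q \text{ with } 0 \le \alpha \le \rho - 1,\ 0 \le \beta \le \sigma - 1; \\
-1 & \text{if } m = \alpha p + \beta q - pq \text{ with } \rho \le \alpha \le q - 1,\ \sigma \le \beta \le p - 1; \\
0 & \text{otherwise.}
\end{cases}$$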



Corollary 1. The number of positive coefficients in Q { p,q} (x) equals ρσ and the
number of negative ones equals ρσ − 1. The number of nonzero coefficients equals
2ρσ − 1.

This corollary (in case p and q are distinct primes) is due to Carlitz [5].
Lemma 1 can be nicely illustrated with an LLL-diagram (for Lenstra, Lam and
Leung). Here is one such diagram for p = 5 and q = 7:
28 33 3 8 13 18 23
21 26 31 1 6 11 16
14 19 24 29 34 4 9
7 12 17 22 27 32 2
0 5 10 15 20 25 30.
We start with 0 in the lower left and add p for every move to the right and q for every
move upward. Reduce modulo pq. Every integer 0, . . . , pq − 1 is obtained precisely
once in this way (by the Chinese remainder theorem).
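
The construction just described is short enough to script. The Python sketch below rebuilds the LLL-diagram for any coprime $p, q > 1$ and also finds the integers $\rho$ and $\sigma$ of Lemma 1 by a direct search. The names `lll_diagram` and `rho_sigma` are illustrative only.

```python
def lll_diagram(p, q):
    """Rows of the LLL-diagram, top row first.

    The entry in column alpha (0..q-1) and row beta (0..p-1, counted from the
    bottom) is (alpha*p + beta*q) mod pq.
    """
    return [[(alpha * p + beta * q) % (p * q) for alpha in range(q)]
            for beta in range(p - 1, -1, -1)]

def rho_sigma(p, q):
    """Nonnegative integers (rho, sigma) with 1 + p*q == rho*p + sigma*q."""
    for rho in range(1, q):
        if (1 + p * q - rho * p) % q == 0:
            return rho, (1 + p * q - rho * p) // q
    raise ValueError("p and q must be coprime and greater than 1")

for row in lll_diagram(5, 7):
    print(" ".join(f"{entry:2d}" for entry in row))
print(rho_sigma(5, 7))
```

For $p = 5$ and $q = 7$ this reproduces the five rows shown above and gives $\rho = \sigma = 3$, so the lower-left corner is the $3 \times 3$ block containing 0, 5, 10, 7, 12, 17, 14, 19, and 24.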
Lemma 1 can be reformulated in the following way.

Lemma 2. Let p, q > 1 be coprime integers. The numbers in the lower left corner of
the LLL-diagram are the exponents of the terms in Q { p,q} with coefficient 1. The num-
bers in the upper right corner are the exponents of the terms in Q { p,q} with coefficient
−1. All other coefficients equal 0.

3. TWO PROOFS OF THE MAIN (FOLKLORE) RESULT. In terms of inclusion-exclusion
polynomials, we can reformulate Theorem 1 as follows.

Theorem 4. If $p, q > 1$ are coprime integers, then $P_{S(p,q)}(x) = Q_{\{p,q\}}(x)$.

Our first proof will make use of “what is probably the most versatile tool in numer-
ical semigroup theory” [18, p. 8], namely Apéry sets.

First proof of Theorem 4. The Apéry set of $S$ with respect to a nonzero $m \in S$ is defined as

$$\mathrm{Ap}(S; m) = \{s \in S : s - m \notin S\}.$$

Note that

$$S = \mathrm{Ap}(S; m) + m\mathbb{Z}_{\ge 0}$$

and that $\mathrm{Ap}(S; m)$ consists of a complete set of residues modulo $m$. Thus, we have

$$H_S(x) = \sum_{w \in \mathrm{Ap}(S;m)} x^w \sum_{i=0}^{\infty} x^{mi} = \frac{1}{1 - x^m} \sum_{w \in \mathrm{Ap}(S;m)} x^w. \tag{15}$$

Note that if $S = S(a_1, \ldots, a_n)$, then $\mathrm{Ap}(S; a_1) \subseteq S(a_2, \ldots, a_n)$. It follows that
$\mathrm{Ap}(S(p, q); p)$ consists of multiples of $q$. The latter set equals the minimal set of multiples
of $q$ representing every congruence class modulo $p$ and hence $\mathrm{Ap}(S(p, q); p) =
\{0, q, \ldots, (p - 1)q\}$ (see [16, Proposition 1] or [18, Example 8.22]). Hence,

$$H_{S(p,q)}(x) = \frac{1 + x^q + \cdots + x^{(p-1)q}}{1 - x^p} = \frac{1 - x^{pq}}{(1 - x^q)(1 - x^p)}.$$

Using this identity and (11) easily completes the proof.
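
The Apéry set used in this proof can be computed by brute force, since it consists of the least element of $S$ in each residue class modulo $m$. The Python sketch below does exactly that; the function name `apery_set` is an illustrative choice, and the loop assumes the generators are relatively prime and that $m$ is a nonzero element of $S$.

```python
def apery_set(gens, m):
    """Ap(S; m) for S = S(gens): the least element of S in each residue class mod m."""
    least = {}               # residue class -> smallest element of S in that class
    in_s = []                # in_s[n] records whether n lies in S
    n = 0
    while len(least) < m:
        member = n == 0 or any(g <= n and in_s[n - g] for g in gens)
        in_s.append(member)
        if member and n % m not in least:
            least[n % m] = n
        n += 1
    return sorted(least.values())

print(apery_set([4, 7], 4))   # multiples of 7, as in the proof
print(apery_set([5, 7], 5))
```

With $p = 4$ and $q = 7$ the result is $\{0, 7, 14, 21\} = \{0, q, \ldots, (p-1)q\}$, in agreement with the description above.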

Remark. This proof is an adaptation of the arguments given in [16]. Indeed, once
we know the Apéry set of a numerical semigroup S, by using [16, (4)], we obtain an
expression for HS (x) and consequently for PS (x). Theorem 4 is a particular case of
[16, Proposition 2], with { p, q} = {a, a + d} and k = 1.

Our second proof uses the denumerant (see [15, Chapter 4] for a survey) and the starting
point is the observation that

$$\frac{1}{(1 - x^p)(1 - x^q)} = \sum_{j \ge 0} r(j) x^j, \tag{16}$$

where $r(j)$ denotes the cardinality of the set $\{(a, b) : a \ge 0,\ b \ge 0,\ ap + bq = j\}$. In
the terminology of the introduction, we have $r(j) = d(j; p, q)$. Concerning $r(j)$ we
make the following observation.

Lemma 3. Suppose that $k \ge 0$. Then $r(k + pq) = r(k) + 1$.

Proof. Put $\alpha \equiv k p^{-1} \pmod{q}$, $0 \le \alpha < q$, and $\beta \equiv k q^{-1} \pmod{p}$, $0 \le \beta < p$, and
$k_0 = \alpha p + \beta q$. Note that $k_0 < 2pq$. We have $k \equiv k_0 \pmod{pq}$. Now if $k \notin S$, then $k < k_0$
and $k + pq = k_0 \in S$ (since $k_0 < 2pq$). It follows that if $r(k) = 0$, then $r(k + pq) = 1$.
If $k \in S$, then $k = k_0 + tpq$ for some $t \ge 0$ and we have $r(k) = 1 + t$, where we
use that

$$k = (\alpha + tq)p + \beta q = (\alpha + (t - 1)q)p + (\beta + p)q = \cdots = \alpha p + (\beta + tp)q.$$

We see that $r(k + pq) = 1 + t + 1 = r(k) + 1$.

Remark. It is not difficult to derive an explicit formula for $r(n)$ (see, e.g., [2, Section
1.3] or [13, pp. 213–214]). Let $p^{-1}$, $q^{-1}$ denote inverses of $p$ modulo $q$, respectively,
$q$ modulo $p$. Then we have

$$r(n) = \frac{n}{pq} - \left\{\frac{p^{-1} n}{q}\right\} - \left\{\frac{q^{-1} n}{p}\right\} + 1,$$

where $\{x\}$ denotes the fractional-part function. Note that Lemma 3 is a corollary of this
formula.
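
Both the explicit formula and Lemma 3 can be compared with a direct count of representations. The following Python sketch does so with exact fractions. The names `r_bruteforce` and `r_formula` are hypothetical, and `pow(p, -1, q)` for modular inverses requires Python 3.8 or later.

```python
from fractions import Fraction

def r_bruteforce(n, p, q):
    """Number of pairs (a, b) with a, b >= 0 and a*p + b*q == n."""
    return sum(1 for a in range(n // p + 1) if (n - a * p) % q == 0)

def r_formula(n, p, q):
    """r(n) = n/(pq) - {p^{-1} n / q} - {q^{-1} n / p} + 1 for coprime p, q > 1."""
    p_inv = pow(p, -1, q)                  # inverse of p modulo q
    q_inv = pow(q, -1, p)                  # inverse of q modulo p
    frac_q = Fraction((p_inv * n) % q, q)  # fractional part of p^{-1} n / q
    frac_p = Fraction((q_inv * n) % p, p)  # fractional part of q^{-1} n / p
    return Fraction(n, p * q) - frac_q - frac_p + 1

p, q = 4, 7
print(all(r_formula(n, p, q) == r_bruteforce(n, p, q) for n in range(300)))
print(all(r_bruteforce(k + p * q, p, q) == r_bruteforce(k, p, q) + 1 for k in range(300)))
```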

Second proof of Theorem 4. From Lemma 3 we infer that

$$(1 - x^{pq}) \sum_{j \ge 0} r(j) x^j = \sum_{j=0}^{pq-1} r(j) x^j + \sum_{j \ge pq} (r(j) - r(j - pq)) x^j
= \sum_{j=0}^{pq-1} r(j) x^j + \sum_{j \ge pq} x^j = \sum_{j \in S(p,q)} x^j,$$

where we used that $r(j) \le 1$ for $j < pq$ and $r(j) \ge 1$ for $j \ge pq$. Using this identity
and (16) easily completes the proof.



4. SYMMETRIC NUMERICAL SEMIGROUPS. A numerical semigroup S is
said to be symmetric if

S ∪ (F(S) − S) = Z,

where F(S) − S = {F(S) − s|s ∈ S}. Symmetric semigroups occur in the study of
monomial curves that are complete intersections, Gorenstein rings, and the classification
of plane algebraic curves; see, e.g., [15, p. 142]. For example, Herzog and Kunz
showed that a one-dimensional, analytically irreducible Noetherian local ring is a
Gorenstein ring if and only if its associated value semigroup is symmetric.
We will now show that the self-reciprocity of Q { p,q} (x) implies that S( p, q) is sym-
metric (a well-known result; see, e.g., [18, Corollary 4.7]).

Theorem 5. Let $S$ be a numerical semigroup. Then $S$ is symmetric if and only if $P_S(x)$
is self-reciprocal.

Proof. If $s \in S \cap (F(S) - S)$, then $s = F(S) - s_1$ for some $s_1 \in S$. This implies that
$F(S) \in S$, a contradiction. Thus, $S$ and $F(S) - S$ are disjoint sets. Since every integer
$n \ge F(S) + 1$ is in $S$ and every integer $n \le -1$ is in $F(S) - S$, the assertion is
equivalent to showing that

$$\sum_{\substack{0 \le j \le F(S) \\ j \in S}} x^j + \sum_{\substack{0 \le j \le F(S) \\ j \in S}} x^{F(S) - j} = 1 + x + \cdots + x^{F(S)} \tag{17}$$

holds if and only if $P_S(x)$ is self-reciprocal. On noting by (1) that

$$x^{F(S)+1} P_S\!\left(\frac{1}{x}\right) - P_S(x) = 1 - x^{F(S)+1} + (x - 1) \Bigg( \sum_{\substack{0 \le j \le F(S) \\ j \in S}} x^j + \sum_{\substack{0 \le j \le F(S) \\ j \in S}} x^{F(S) - j} \Bigg),$$

we see that $x^{F(S)+1} P_S(1/x) = P_S(x)$ if and only if (17) holds. Clearly, (17) holds if
and only if $S$ is symmetric.

Using the latter result and Theorem 4 we infer the following classical fact.

Theorem 6. A numerical semigroup of embedding dimension 2 is symmetric.

Theorem 4 together with Theorem 3 shows that if e(S) = 2, then PS (x) can be written
as a product of cyclotomic polynomials. This leads to the following problem.

Problem 1. Characterize the numerical semigroups S for which PS (x) can be written
as a product of cyclotomic polynomials.

Since $P_S(1) = 1 \neq 0$ by (1), the product cannot involve $\Phi_1(x) = x - 1$ and so it is
self-reciprocal. Therefore, by Theorem 5 such an $S$ must be symmetric. Ciolan et al. [6]
make some progress toward solving this problem and show, e.g., that $P_S(x)$ can be
written as a product of cyclotomic polynomials also if $e(S) = 3$ and $S$ is symmetric.

5. GAP DISTRIBUTION. The nonnegative integers not in S are called the gaps of
S. For example, the gaps in S(4, 7) are 1, 2, 3, 5, 6, 9, 10, 13, and 17. The number of
gaps of S is called the genus of S and denoted by N (S). The set of gaps is denoted by
G(S). The following well-known result holds; see [15, Lemma 7.2.3] or [18, Corollary
4.7].

Theorem 7. We have 2N (S) ≥ F(S) + 1 with equality if and only if S is symmetric.

Proof. The proof of Theorem 5 shows that

2#{0 ≤ j ≤ F(S) : j ∈ S} ≤ F(S) + 1,

with equality if and only if S is symmetric. Now use that

#{0 ≤ j ≤ F(S) : j ∈ S} = F(S) + 1 − N (S).

From (2) and Theorem 1 we infer the following well-known result due to Sylvester:

$$F(S(p, q)) = pq - p - q. \tag{18}$$

From Theorem 6, Theorem 7, and (18), we obtain another well-known result of
Sylvester:

$$N(S(p, q)) = (p - 1)(q - 1)/2. \tag{19}$$

For four different proofs of (18) and more background, see [15, pp. 31–34]; the shortest
proof of (18) and (19) the author knows of is in the book by Wilf [23, p. 88].
Additional information on the gaps is given by the so-called Sylvester sum

$$\sigma_k(p, q) := \sum_{s \notin S(p,q)} s^k.$$

By (19) we have $\sigma_0(p, q) = (p - 1)(q - 1)/2$. By (1) and Theorem 4, we infer that

$$\sum_{j \notin S(p,q)} x^j = \frac{1 - Q_{\{p,q\}}(x)}{1 - x}. \tag{20}$$

It is not difficult to derive a formula for $\sigma_k(p, q)$ for arbitrary $k$. On substituting $x = e^z$
and recalling the Taylor series expansion $e^z = \sum_{k \ge 0} z^k/k!$, we obtain from (20) and
(11) the identity

$$\sum_{k=0}^{\infty} \sigma_k(p, q) \frac{z^k}{k!} = \frac{e^{pqz} - 1}{(e^{pz} - 1)(e^{qz} - 1)} - \frac{1}{e^z - 1}. \tag{21}$$

We obtain from (21), on multiplying by $z$ and using the Taylor series expansion (7),
that

$$\sum_{m=1}^{\infty} m \sigma_{m-1}(p, q) \frac{z^m}{m!} = \sum_{i=0}^{\infty} B_i p^i \frac{z^i}{i!} \sum_{j=0}^{\infty} B_j q^j \frac{z^j}{j!} \sum_{k=0}^{\infty} \frac{(pqz)^k}{(k+1)!} - \sum_{m=0}^{\infty} B_m \frac{z^m}{m!}.$$

Equating coefficients of $z^m$ then leads to the following result.



Theorem 8. [17]. For $m \ge 1$ we have

$$m \sigma_{m-1}(p, q) = \frac{1}{m+1} \sum_{i=0}^{m} \sum_{j=0}^{m-i} \binom{m+1}{i,\, j,\, m+1-i-j} B_i B_j p^{m-j} q^{m-i} - B_m.$$

Using this formula we find, e.g., that

$$\sigma_1(p, q) = \frac{1}{12} (p - 1)(q - 1)(2pq - p - q - 1)$$

(this result is due to Brown and Shiue [3]) and

$$\sigma_2(p, q) = \frac{1}{12} (p - 1)(q - 1) pq (pq - p - q).$$
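
Theorem 8 and the two closed forms can be confirmed numerically by summing powers of the gaps directly. The Python sketch below is one way to do this with exact arithmetic. The helper names (`bernoulli_upto`, `gaps`, `theorem8_rhs`) are illustrative, and the Bernoulli numbers follow the convention $B_1 = -1/2$ of (7).

```python
from fractions import Fraction
from math import comb, factorial

def bernoulli_upto(n_max):
    """[B_0, ..., B_{n_max}] from recurrence (8)."""
    B = [Fraction(1)]
    for n in range(1, n_max + 1):
        B.append(-sum(comb(n + 1, j) * B[j] for j in range(n)) / (n + 1))
    return B

def gaps(p, q):
    """Positive integers with no representation a*p + b*q, a, b >= 0 (p, q coprime)."""
    return [n for n in range(1, p * q)
            if not any((n - a * p) % q == 0 for a in range(n // p + 1))]

def theorem8_rhs(m, p, q, B):
    """Right-hand side of Theorem 8."""
    total = Fraction(0)
    for i in range(m + 1):
        for j in range(m + 1 - i):
            multinomial = factorial(m + 1) // (
                factorial(i) * factorial(j) * factorial(m + 1 - i - j))
            total += multinomial * B[i] * B[j] * p ** (m - j) * q ** (m - i)
    return total / (m + 1) - B[m]

B = bernoulli_upto(6)
p, q = 4, 7
sigma = lambda k: sum(s ** k for s in gaps(p, q))     # Sylvester sum sigma_k(p, q)
print(all(m * sigma(m - 1) == theorem8_rhs(m, p, q, B) for m in range(1, 7)))
print(sigma(1), sigma(2))
```

For $S(4, 7)$ the sums of the gaps and of their squares are 66 and 714, matching the closed forms for $\sigma_1$ and $\sigma_2$.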
The proof we have given here of Theorem 8 is due to Rødseth [17], with the difference
that we gave a different proof of the identity (21).
By using the formula (9) for power sums, we obtain from Theorem 8 the identity

$$m \sigma_{m-1}(p, q) = \sum_{r=0}^{m} \binom{m}{r} p^{m-r-1} B_{m-r} q^r S_r(p) - B_m,$$

giving rise to the following recursion formula for $B_m$:

$$B_m = \frac{m}{p^m - 1} \sigma_{m-1}(p, q) + \frac{q^m}{p(1 - p^m)} \sum_{r=0}^{m-1} \binom{m}{r} \left(\frac{p}{q}\right)^{r} B_r S_{m-r}(p).$$

On taking $p = 4$ and $q = 7$, we obtain the recursion for $B_m$ stated in the introduction.


Tuenter [20] established the following characterization of the gaps in $S = S(p, q)$. For
every finite function $f$,

$$\sum_{n \notin S} (f(n + p) - f(n)) = \sum_{n=1}^{p-1} (f(nq) - f(n)),$$

where $p$ and $q$ are interchangeable. He shows that by choosing $f$ appropriately one
can recover all earlier results mentioned in this section and in addition the identity

$$\prod_{n \notin S(p,q)} (n + p) = q^{p-1} \prod_{n \notin S(p,q)} n.$$
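
Tuenter's characterization lends itself to a quick numerical experiment. The Python sketch below evaluates both sides of the sum identity for a sample function and checks the product identity for $S(4, 7)$. The function names are illustrative, the test function $f(n) = n^3$ is an arbitrary choice, and `math.prod` requires Python 3.8 or later.

```python
from math import prod

def gaps(p, q):
    """Positive integers with no representation a*p + b*q, a, b >= 0 (p, q coprime)."""
    return [n for n in range(1, p * q)
            if not any((n - a * p) % q == 0 for a in range(n // p + 1))]

def tuenter_sides(f, p, q):
    """Both sides of the sum identity for the gaps of S(p, q)."""
    lhs = sum(f(n + p) - f(n) for n in gaps(p, q))
    rhs = sum(f(n * q) - f(n) for n in range(1, p))
    return lhs, rhs

cube = lambda n: n ** 3
print(tuenter_sides(cube, 4, 7))        # the two entries agree
print(tuenter_sides(cube, 7, 4))        # p and q interchanged
g = gaps(4, 7)
print(prod(n + 4 for n in g) == 7 ** 3 * prod(g))
```

Taking $f(n) = n$ recovers $p \cdot N(S(p,q)) = (q - 1) p (p - 1)/2$, which is (19) again.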

Wang and Wang [21] obtained results similar to those of Tuenter for the alternate
Sylvester sums $\sum_{s \notin S(p,q)} (-1)^s s^k$.

6. A REPROOF OF THEOREM 2. As mentioned previously, the gaps for $S(4, 7)$
are given by 1, 2, 3, 5, 6, 9, 10, 13, and 17. One could try to break this down in terms
of gap blocks, that is, blocks of consecutive gaps (also known in the literature as
deserts [7, Definition 16]): {1, 2, 3}, {5, 6}, {9, 10}, {13}, and {17}. It is interesting to
compare this with the distribution of the element blocks, that is, finite blocks of consecutive
elements in $S$. For $S(4, 7)$ we get {0}, {4}, {7, 8}, {11, 12}, and {14, 15, 16}.
The longest gap block we denote by $g(G(S))$ and the longest element block by $g(S)$.
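
The block decomposition is simple to automate. In the Python sketch below, `blocks` splits a sorted list into maximal runs of consecutive integers and is applied to the gaps of $S(4, 7)$ and to the elements of $S(4, 7)$ below the Frobenius number. The function names are hypothetical helpers.

```python
def gaps(p, q):
    """Positive integers with no representation a*p + b*q, a, b >= 0 (p, q coprime)."""
    return [n for n in range(1, p * q)
            if not any((n - a * p) % q == 0 for a in range(n // p + 1))]

def blocks(numbers):
    """Split a sorted list of integers into maximal runs of consecutive integers."""
    runs = []
    for n in numbers:
        if runs and n == runs[-1][-1] + 1:
            runs[-1].append(n)
        else:
            runs.append([n])
    return runs

g = gaps(4, 7)
frobenius = max(g)
finite_elements = [n for n in range(frobenius) if n not in g]

gap_blocks = blocks(g)
element_blocks = blocks(finite_elements)
print(gap_blocks)
print(element_blocks)
print(max(map(len, gap_blocks)), max(map(len, element_blocks)))
```

Both longest blocks have length 3, which is one less than the smallest generator 4.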

The following result gives some information on gap blocks and element blocks in
a numerical semigroup of embedding dimension 2. Recall that the smallest positive
integer of S is called the multiplicity and denoted by m(S).

Lemma 4.
1) The longest gap block, g(G(S)), has length m(S) − 1.
2) The longest element block, g(S), has length not exceeding m(S) − 1.
3) If S is symmetric, then g(S) = m(S) − 1.

Proof. 1) Let $S = \{s_0, s_1, s_2, s_3, \ldots\}$ be the elements of $S$ written in ascending order,
i.e., $0 = s_0 < s_1 < s_2 < \cdots$. Since $s_0 = 0$ and $s_1 = m(S)$, we have $g(G(S))
\ge m(S) - 1$. Since all multiples of $m(S)$ are in $S$, it follows that actually $g(G(S))
= m(S) - 1$.
2) If $g(S) \ge m(S)$, it would imply that we can find $k, k + 1, \ldots, k + m(S) - 1$ all in
$S$ such that $k + m(S) \notin S$. This is clearly a contradiction.
3) If $S$ is symmetric, then we clearly have $g(S) = g(G(S)) = m(S) - 1$.

Remark. The second observation was made by my intern Alexandru Ciolan. It allows
one to prove Theorem 10.

Finally, we will generalize a result of Hong et al. [9].

Theorem 9. If p, q > 1 are coprime integers, then g(Q { p,q} (x)) = min{ p, q} − 1.

Proof. Note that g(Q { p,q} (x)) equals the maximum of the longest gap block length and
the longest element block length, and hence, by Lemma 4 equals m(S( p, q)) − 1 =
min{ p, q} − 1.

This result can be easily generalized further.

Theorem 10. We have g(PS (x)) = m(S) − 1.

Proof. Using that PS (x) = (1 − x)HS (x) and Lemma 4 we infer that g(PS (x)) =
max{g(S), g(G(S))} = m(S) − 1.

7. THE LLL-DIAGRAM REVISITED. It is instructive to indicate (here with an
asterisk) the gaps of $S(p, q)$ in the LLL-diagram. They are the entries at the positions
with $\alpha p + \beta q > pq$, that is, the integers

$$\{\alpha p + \beta q - pq : 0 \le \alpha \le q - 1,\ 0 \le \beta \le p - 1,\ \alpha p + \beta q > pq\}.$$

Note that the Frobenius number equals $(q - 1)p + (p - 1)q - pq$ and so appears in
the top right-hand corner of the LLL-diagram. We will demonstrate this (again) for
$p = 5$ and $q = 7$:
28 33 3* 8* 13* 18* 23*
21 26 31 1* 6* 11* 16*
14 19 24 29 34 4* 9*
7 12 17 22 27 32 2*
0 5 10 15 20 25 30.

As a check, we can verify that $N(S(p, q)) = (p - 1)(q - 1)/2$ integers are marked.
On comparing coefficients in the identity

$$(1 - x) \sum_{j \in S(p,q)} x^j = \sum_{j \ge 0} a_{\{p,q\}}(j) x^j$$

we get the following reformulation of Theorem 4 at the coefficient level.

Theorem 11. If $p, q > 1$ are coprime integers, then

$$a_{\{p,q\}}(k) = \begin{cases}
1 & \text{if } k \in S(p, q),\ k - 1 \notin S(p, q); \\
-1 & \text{if } k \notin S(p, q),\ k - 1 \in S(p, q); \\
0 & \text{otherwise.}
\end{cases}$$

Corollary 2. The nonzero coefficients of Q { p,q} alternate between 1 and −1.

The next result gives an example where an existing result on cyclotomic coefficients
yields information about numerical semigroups.

Theorem 12. Let $p$, $q$, $\rho$ and $\sigma$ be as in Lemma 1. If $S = S(p, q)$, then there are
$\rho\sigma - 1$ gap blocks and $\rho\sigma - 1$ element blocks.

Proof. In view of Theorem 11 we have a{ p,q} (k) = 1 if and only if k is at the start of an
element block (including the infinite block [F(S) + 1, ∞) ∩ Z). Moreover, a{ p,q} (k) =
−1 if and only if k is at the end of a gap block. The proof is now completed by invoking
Corollary 1.
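
The coefficient description in Theorem 11, the alternation in Corollary 2, and the counts used in this proof can all be checked on an example. The Python sketch below builds the coefficients from a membership test and compares the number of $+1$ and $-1$ entries with $\rho\sigma$ and $\rho\sigma - 1$. The function names are illustrative, and the degree bound $(p - 1)(q - 1)$ is taken from (2) and (18).

```python
def in_semigroup(n, p, q):
    """True when n = a*p + b*q for some integers a, b >= 0."""
    return n >= 0 and any((n - a * p) % q == 0 for a in range(n // p + 1))

def coefficients(p, q):
    """a_{p,q}(k) for 0 <= k <= (p-1)(q-1), read off from Theorem 11."""
    return [int(in_semigroup(k, p, q)) - int(in_semigroup(k - 1, p, q))
            for k in range((p - 1) * (q - 1) + 1)]

def rho_sigma(p, q):
    """Nonnegative integers (rho, sigma) with 1 + p*q == rho*p + sigma*q."""
    for rho in range(1, q):
        if (1 + p * q - rho * p) % q == 0:
            return rho, (1 + p * q - rho * p) // q
    raise ValueError("p and q must be coprime and greater than 1")

p, q = 4, 7
a = coefficients(p, q)
rho, sigma = rho_sigma(p, q)
nonzero = [c for c in a if c != 0]
print(a.count(1) == rho * sigma, a.count(-1) == rho * sigma - 1)
print(all(x == -y for x, y in zip(nonzero, nonzero[1:])))   # signs alternate
```

For $S(4, 7)$ the coefficient $+1$ occurs at the exponents 0, 4, 7, 11, 14, and 18 (the starts of element blocks, including the infinite one) and $-1$ at 1, 5, 9, 13, and 17 (the starts of gap blocks).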

Using Lemma 2 and Theorem 11 our folklore result can now be reformulated in
terms of the LLL-diagram.

Theorem 13. Let $p, q > 1$ be coprime integers and denote $S(p, q) \cap \{0, \ldots, pq - 1\}$
by $T$. The integers $k \in T$ such that $k - 1 \notin T$ are precisely the integers in the lower-left
corner of the LLL-diagram. The integers $k \notin T$ such that $k - 1 \in T$ are precisely
the integers in the upper-right corner. If $k$ is not in the lower-left or upper-right corner,
then either $k \in T$ and $k - 1 \in T$ or $k \notin T$ and $k - 1 \notin T$.

Denote $S(p, q)$ by $S$. Note that the upper-right integer in the lower-left corner of
the LLL-diagram equals $F(S) + 1$ and that the remaining integers in the lower-left
corner are all $< F(S)$. This observation together with (19) then leads to the following
corollary of Theorem 13.

Corollary 3. If $p, q > 1$ are coprime integers, then

$$\begin{cases}
\#\{0 \le k \le F(S) : k \in S,\ k - 1 \in S\} = (p - 1)(q - 1)/2 - \rho\sigma + 1; \\
\#\{0 \le k \le F(S) : k \in S,\ k - 1 \notin S\} = \rho\sigma - 1; \\
\#\{0 \le k \le F(S) : k \notin S,\ k - 1 \in S\} = \rho\sigma - 1; \\
\#\{0 \le k \le F(S) : k \notin S,\ k - 1 \notin S\} = (p - 1)(q - 1)/2 - \rho\sigma + 1.
\end{cases}$$

The distribution of the quantity $\rho\sigma$ that appears at various places in this article has
been recently studied using deep results from analytic number theory by Bzdega [4]
and Fouvry [8]. In particular, they are interested in counting the number of integers
$m = pq \le x$ with $p, q$ distinct primes such that $\theta(m)$, the number of nonzero coefficients
of $\Phi_m$, satisfies $\theta(m) \le m^{1/2+\gamma}$, with $\gamma > 0$ fixed. (Note that by Corollary 1 we
have $\theta(m) = 2\rho\sigma - 1$.)

ACKNOWLEDGMENT. I would like to thank Matthias Beck, Scott Chapman, Alexandru Ciolan, Pedro A.
García-Sánchez, Nathan Kaplan, Bernd Kellner, Jorge Ramírez Alfonsín, Ali Sinan Sertoz, Paul Tegelaar, and
the three referees for helpful comments. Alexandru Ciolan pointed out to me that $g(S) \le m(S) - 1$, which
allows one to prove Theorem 10.

REFERENCES

1. G. Bachman, On ternary inclusion-exclusion polynomials, Integers 10 (2010) 623–638,
http://dx.doi.org/10.1515/INTEG.2010.048.
2. M. Beck, S. Robins, Computing the Continuous Discretely: Integer-point Enumeration in Polyhedra.
Undergraduate Texts in Mathematics. Springer, New York, 2007,
http://dx.doi.org/10.1007/978-0-387-46112-0.
3. T. C. Brown, P. J.-S. Shiue, A remark related to the Frobenius problem, Fibonacci Quart. 31 (1993)
32–36.
4. B. Bzdega, Sparse binary cyclotomic polynomials, J. Number Theory 132 (2012) 410–413,
http://dx.doi.org/10.1016/j.jnt.2011.08.002.
5. L. Carlitz, The number of terms in the cyclotomic polynomial $F_{pq}(x)$, Amer. Math. Monthly 73 (1966)
979–981, http://dx.doi.org/10.2307/2314500.
6. A. Ciolan, P. A. García-Sánchez, P. Moree, Cyclotomic numerical semigroups,
http://xxx.tau.ac.il/pdf/1409.5614.pdf. Submitted for publication.
7. J. I. Farrán, C. Munuera, Goppa-like bounds for the generalized Feng–Rao distances, International Workshop
on Coding and Cryptography (WCC 2001) (Paris), Discrete Appl. Math. 128 (2003) 145–156,
http://dx.doi.org/10.1016/s0166-218x(02)00441-9.
8. É. Fouvry, On binary cyclotomic polynomials, Algebra Number Theory 7 (2013) 1207–1223,
http://dx.doi.org/10.2140/ant.2013.7.1207.
9. H. Hong, E. Lee, H.-S. Lee, C.-M. Park, Maximum gap in (inverse) cyclotomic polynomial, J. Number
Theory 132 (2012) 2297–2315, http://dx.doi.org/10.1016/j.jnt.2012.04.008.
10. H. Hong, E. Lee, H.-S. Lee, C.-M. Park, Simple and exact formula for minimum loop length in Ate_i
pairing based on Brezing-Weng curves, Des. Codes Cryptogr. 67 (2013) 271–292,
http://dx.doi.org/10.1007/s10623-011-9605-y.
11. T. Y. Lam, K. H. Leung, On the cyclotomic polynomial $\Phi_{pq}(x)$, Amer. Math. Monthly 103 (1996)
562–564, http://dx.doi.org/10.2307/2974668.
12. H. W. Lenstra, Vanishing sums of roots of unity. Proceedings, Bicentennial Congress Wiskundig
Genootschap (Vrije Univ., Amsterdam, 1978), Part II, 249–268, Math. Centre Tracts 101 Math.
Centrum, Amsterdam, 1979.
13. I. Niven, H. S. Zuckerman, H. L. Montgomery, An Introduction to the Theory of Numbers. Fifth edition.
John Wiley & Sons, New York, 1991.
14. H. Rademacher, Topics in Analytic Number Theory. Die Grundlehren der mathematischen
Wissenschaften 169 Springer-Verlag, New York-Heidelberg, 1973,
http://dx.doi.org/10.1007/978-3-642-80615-5.
15. J. L. Ramírez Alfonsín, The Diophantine Frobenius Problem. Oxford Lecture Series in Mathematics and
Its Applications. Vol. 30. Oxford Univ. Press, Oxford, 2005,
http://dx.doi.org/10.1093/acprof:oso/9780198568209.001.0001.
16. J. L. Ramírez Alfonsín, Ø. J. Rødseth, Numerical semigroups: Apéry sets and Hilbert series, Semigroup
Forum 79 (2009) 323–340, http://dx.doi.org/10.1007/s00233-009-9133-5.
17. Ø. J. Rødseth, A note on Brown and Shiue’s paper on a remark related to the Frobenius problem,
Fibonacci Quart. 32 (1994) 407–408.
18. J. C. Rosales, P. A. García-Sánchez, Numerical Semigroups. Developments in Mathematics. Vol. 20.
Springer, New York, 2009, http://dx.doi.org/10.1007/978-1-4419-0160-6_3.
19. R. Thangadurai, On the coefficients of cyclotomic polynomials, in Cyclotomic Fields and Related
Topics. Bhaskaracharya Pratishthana, Pune, 2000. 311–322.
20. H. J. H. Tuenter, The Frobenius problem, sums of powers of integers, and recurrences for the Bernoulli
numbers, J. Number Theory 117 (2006) 376–386, http://dx.doi.org/10.1016/j.jnt.2005.06.015.
21. W. Wang, T. Wang, Alternate Sylvester sums on the Frobenius set, Comput. Math. Appl. 56 (2008)
1328–1334, http://dx.doi.org/10.1016/j.camwa.2008.02.031.
22. S. H. Weintraub, Several proofs of the irreducibility of the cyclotomic polynomials, Amer. Math. Monthly
120 (2013) 537–545, http://dx.doi.org/10.4169/amer.math.monthly.120.06.537.
23. H. S. Wilf, Generatingfunctionology. Academic Press, Boston, MA, 1990.

PIETER MOREE has been, since 2004, Researcher/Scientific Coordinator (half/half) at the Max Planck Institute for
Mathematics. His main scientific interest is in problems that require both algebraic and analytic number the-
ory for their solution, e.g., problems related to the Artin primitive root conjecture. He enjoys playing tennis,
reading, and making new friends.
Max-Planck-Institut für Mathematik, Vivatsgasse 7, D-53111 Bonn, Germany
moree@mpim-bonn.mpg.de

A Perron-type Theorem on the
Principal Eigenvalue of Nonsymmetric
Elliptic Operators
Lei Ni

And I cherish more than anything else the Analogies,
my most trustworthy masters.
They know all the secrets of Nature. —Kepler

Abstract. We provide a proof for a Perron-type theorem on the principal eigenvalue of non-
symmetric elliptic operators based on the strong maximum principle. This proof is modeled
after a variational proof of Perron’s theorem for matrices with positive entries that does not
appeal to Perron–Frobenius theory.

1. INTRODUCTION. Perron’s theorem (cf. [3, Theorem 1, Ch. 13]) asserts that a
square matrix A = (αi j ) with positive entries αi j > 0 must possess a positive eigen-
value with multiplicity one. Moreover, for this positive eigenvalue, there exists an
eigenvector whose entries are all positive. The purpose of this note is to prove an
analogous result for second order elliptic operators, which we will now describe.
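Let $\Omega$ be a smooth bounded domain in $\mathbb{R}^n$ and let

$$L = -\sum_{i,j=1}^{n} a^{ij}(x) \frac{\partial^2}{\partial x_i \partial x_j} + \sum_{k=1}^{n} b^k(x) \frac{\partial}{\partial x_k} + c(x)$$

be an elliptic operator defined on $\Omega$. For simplicity, we assume that $a^{ij}(x)$, $b^k(x)$,
$c(x) \in C^\infty(\overline{\Omega})$. We also assume that $L$ is uniformly elliptic, i.e., there exists a constant
$\theta > 0$ so that for all $x \in \Omega$,

$$\sum_{i,j=1}^{n} a^{ij}(x) \xi_i \xi_j \ge \theta |\xi|^2 \tag{1}$$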

for any ξ ∈ Rn . For the purpose of our discussion on the spectrum of L (with zero
boundary data), we may assume, by adding a constant if necessary, that c(x) ≥ 0.
Since L is not necessarily self-adjoint, its eigenvalues are in general complex numbers.
However, there exists the following analog of Perron’s theorem for positive matrices.

Theorem 1.1.
(i) There exists a real eigenvalue $\lambda_1 > 0$ for the operator $L$ with zero boundary
condition.
(ii) This eigenvalue is of multiplicity one, in the sense that there exists an eigenfunction
$w_1(x) > 0$ in $\Omega$ with $w_1|_{\partial\Omega} = 0$, and if $u$ is any other function not
identically zero satisfying $Lu = \lambda_1 u$ in $\Omega$ and $u|_{\partial\Omega} = 0$, then $u$ must be a
multiple of $w_1$.
(iii) Furthermore, if for some function $v$ not identically zero, $Lv = \lambda v$ and $\lambda \neq \lambda_1$,
then $\mathrm{Re}(\lambda) > \lambda_1$.

http://dx.doi.org/10.4169/amer.math.monthly.121.10.903
MSC: Primary 35P15, Secondary 58J05; 47A75

The eigenvalue λ1 above is called the principal eigenvalue of the operator L. We can find a proof of Theorem 1.1 in [2, Section 6.5], which is now a classic text on
partial differential equations (PDE). The proof in [2] makes use of iterations and the
sophisticated Schaefer’s fixed point theorem. In this note, we begin by giving a simple
proof of Perron’s original theorem for positive matrices. Then using the idea of this
proof, together with some standard results in the basic theory of second-order elliptic
PDE, such as the strong maximum principle, we give a more direct proof of Theo-
rem 1.1.
The main property of Sobolev spaces $H^k(\Omega) = W^{k,2}(\Omega)$ that will be needed here is the Rellich–Kondrachov compactness theorem. Our proof (as well as that of [2]) also assumes some knowledge of the $L^2$-theory of elliptic operators concerning the
solvability and the regularity of weak solutions. We can find these basic results, for
example, in Theorem 1 of Section 5.7, Theorem 6 of Section 6.2, and Theorem 5 of
Section 6.3 in [2].

2. A PROOF OF PERRON’S THEOREM. In this section, we give a proof of Perron’s theorem. In the next section, we will use analogous methods for the proof of
the corresponding PDE result. We first fix the convention that for an (m × n)-matrix
B = (βi j ), B > 0 (respectively, B ≥ 0) means that each entry βi j > 0 (respectively,
βi j ≥ 0), and A ≥ B means that A − B ≥ 0. For a vector x, we apply the same con-
vention by viewing it as an (m × 1)-matrix. Below is a proof of Perron’s theorem for
a positive square (m × m)-matrix A.

Proof. Set $\Lambda = \{\varepsilon > 0 \mid \varepsilon \in \mathbb{R},\ Ax \ge \varepsilon x \text{ for some vector } x \ge 0,\ x \ne 0\}$.

It is easy to see that $\Lambda \ne \emptyset$ and that it is bounded. Let $\lambda_1 = \sup \Lambda$. We now show that there exists a vector $x > 0$ such that $Ax = \lambda_1 x$. First, by the definition of $\lambda_1$, we can pick $\lambda^{(j)} \in \Lambda$ such that the sequence $\{\lambda^{(j)}\}$ converges to $\lambda_1$. By the definition of $\Lambda$ we also have vectors $x^{(j)} \ge 0$ such that $Ax^{(j)} \ge \lambda^{(j)} x^{(j)}$. Without loss of generality, we may choose $x^{(j)}$ with $\|x^{(j)}\| = 1$. After possibly passing to a subsequence, we may also assume that $x^{(j)}$ converges to a vector x. By the way x is obtained, it is clear that $x \ge 0$, $Ax \ge \lambda_1 x$, and $\|x\| = 1$. To show that $Ax = \lambda_1 x$, we use the following simple lemma.

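Lemma 2.1. For any vector $y \ge 0$ with $y \ne 0$, $Ay > 0$. In particular, for any real vector z, there exists a real number $\varepsilon > 0$ such that $Ay > \varepsilon z$.

Proof. The assumption $y \ne 0$ implies that $y_j > 0$ for some $1 \le j \le m$. Hence, for every $1 \le i \le m$,

$$(Ay)_i = \sum_{k=1}^{m} \alpha_{ik}\, y_k \ge \alpha_{ij}\, y_j > 0.$$

For the second statement, we may let, for example,

$$\varepsilon = \frac{\delta}{\max_{1 \le i \le m} (|z_i| + 1)},$$

where $\delta = \min_{1 \le i \le m} (Ay)_i$.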
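
The choice of ε in this proof is easy to check by hand or by machine. The following is a minimal Python sketch, assuming a small positive matrix and two vectors picked purely for illustration; the helper names `mat_vec` and `lemma_epsilon` are hypothetical and do not come from the article.

```python
from fractions import Fraction

def mat_vec(A, y):
    """Return the matrix-vector product Ay for a list-of-rows matrix A."""
    return [sum(a * b for a, b in zip(row, y)) for row in A]

def lemma_epsilon(A, y, z):
    """Return delta / max_i(|z_i| + 1) with delta = min_i (Ay)_i,
    the epsilon suggested in the proof of Lemma 2.1."""
    delta = min(mat_vec(A, y))
    return Fraction(delta) / max(abs(zi) + 1 for zi in z)

# Illustrative data (an assumption, not taken from the article).
A = [[2, 1, 1], [1, 3, 1], [1, 1, 4]]   # every entry is positive
y = [0, 1, 0]                           # y >= 0 and y != 0
z = [5, -7, 2]                          # an arbitrary real vector

eps = lemma_epsilon(A, y, z)
Ay = mat_vec(A, y)
# Entrywise strict inequality Ay > eps * z.
strict = all(ay > eps * zi for ay, zi in zip(Ay, z))
print(eps, strict)
```

Exact rational arithmetic is used so that the strict inequality is tested without rounding.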

To finish the proof of $Ax = \lambda_1 x$, we use reductio ad absurdum. Assume that $Ax \ne \lambda_1 x$; we will derive a contradiction. Let $y = Ax - \lambda_1 x$. Thus $y \ne 0$ by assumption, and since $y = \lim_{j\to\infty} \big(Ax^{(j)} - \lambda^{(j)} x^{(j)}\big)$, we also have $y \ge 0$. By Lemma 2.1, there is an $\varepsilon > 0$ so that $Ay > \varepsilon Ax$. This implies that $Az > (\lambda_1 + \varepsilon) z$ for $z = Ax > 0$, contradicting the fact that $\lambda_1 = \sup \Lambda$. Hence, $Ax = \lambda_1 x$. In addition, Lemma 2.1 implies that $Ax > 0$ and also $x > 0$, i.e., all the entries of the eigenvector x are positive.

904

© THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 121

To finish the proof of Perron’s theorem, we must show that, up to a positive scalar constant, x is the unique eigenvector with eigenvalue $\lambda_1 > 0$. First, we observe that if $x' \ne 0$ is a vector such that $Ax' = \lambda_1 x'$ and $x' \ge 0$, Lemma 2.1 implies $x' > 0$. Now for any vector $y \ne 0$ with $Ay = \lambda_1 y$, we can find a real number c such that $cx - y \ge 0$ and such that at least one entry of $cx - y$ is equal to 0 (i.e., there exists i, with $1 \le i \le m$, so that $cx_i - y_i = 0$). We claim that this implies $cx - y = 0$, so that $y = cx$. Suppose not; then if we let $x' = cx - y$, we would have $x' \ne 0$, $Ax' = \lambda_1 x'$, and $x' \ge 0$, but $x'$ is not positive, contradicting the observation we made at the beginning of this paragraph. This proves that $y = cx$ and $\lambda_1$ is of multiplicity one.
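
The variational description of λ1 as the supremum of those ε with Ax ≥ εx for some nonzero x ≥ 0 can be explored numerically. The sketch below is only an illustration under stated assumptions: the 2 × 2 matrix is an arbitrary example, the names `min_ratio` and `power_iteration` are hypothetical, and power iteration is used merely as a convenient way to approximate the positive eigenvector (the article's proof uses a compactness argument instead).

```python
def mat_vec(A, x):
    """Return the matrix-vector product Ax for a list-of-rows matrix A."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def min_ratio(A, x):
    """Largest eps with Ax >= eps * x, for a vector x >= 0, x != 0.
    Entries with x_i = 0 impose no constraint because (Ax)_i >= 0."""
    Ax = mat_vec(A, x)
    return min(ax / xi for ax, xi in zip(Ax, x) if xi > 0)

def power_iteration(A, steps=200):
    """Approximate the positive eigenvector by repeated multiplication."""
    x = [1.0] * len(A)
    for _ in range(steps):
        x = mat_vec(A, x)
        total = sum(x)                  # normalize so the entries sum to 1
        x = [xi / total for xi in x]
    return x

A = [[1.0, 2.0], [3.0, 4.0]]            # illustrative positive matrix
x = power_iteration(A)
lam1 = min_ratio(A, x)                  # approximates sup of the set in the proof
trial = min_ratio(A, [1.0, 1.0])        # any other admissible x gives a value <= lam1
print(lam1, x, trial)
```

For this matrix the exact eigenvalues are (5 ± √33)/2, so `lam1` should agree with (5 + √33)/2 up to rounding, while the trial vector (1, 1) only certifies the smaller value 3.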

We now make an additional observation. If λ is an eigenvalue (which in general is a complex number) of A with an eigenvector z, then let w = abs(z), the nonnegative vector obtained by taking the norm of each entry of the vector z. It is easy to see that $Aw \ge |\lambda| w$, and equality holds if and only if $w > 0$, $\lambda > 0$. Since $\lambda_1 = \sup \Lambda$, it implies that $\lambda_1 \ge |\lambda|$.
The above proof is basically the same as that of [1], which was attributed to Bohnenblust.

3. THE PDE CASE. We proceed to give a proof of Theorem 1.1 along the same line
of arguments as above. Let L be the uniformly elliptic operator of Section 1. First,
recall the following strong maximum principle (see Corollary 2.8, 2.9 of [4], as well
as Theorem 4 and Lemma of Section 6.4 in [2]).

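Theorem 3.1. Assume that $u \in C^2(\Omega) \cap C(\overline{\Omega})$ satisfies $Lu \ge 0$ and $u|_{\partial\Omega} = 0$. Then $u > 0$ in $\Omega$ unless $u \equiv 0$. If u is not identically 0, then $\frac{\partial u}{\partial \nu} < 0$ on $\partial\Omega$, where ν is the exterior unit normal of $\partial\Omega$.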

As a consequence of the above maximum principle, we conclude that 0 is not an eigenvalue of L. Hence, by [2, Theorem 6 of Ch. 6.2], L has a well-defined inverse $L^{-1}$ on $L^2(\Omega)$ such that it is a bounded operator from $L^2(\Omega)$ to $H^2(\Omega) \cap H_0^1(\Omega)$, where $H_0^1(\Omega) = W_0^{1,2}(\Omega)$ denotes the first $L^2$-Sobolev space with vanishing boundary value. This implies that there exists C such that

$$\|L^{-1}(f)\|_2 \le C \|f\|_{L^2}. \tag{2}$$

Here $\|\cdot\|_k$ denotes the Sobolev norm of $H^k(\Omega)$. Moreover, by [2, Theorem 5 of Ch. 6.3] there exists $C_k \ge 0$ such that $\|L^{-1}(f)\|_{k+2} \le C_k \|f\|_k$ for any $k \ge 0$. Let $k_0$ be an integer so large, but fixed, that $H^{k_0-1}(\Omega) \subset C^2(\overline{\Omega})$, and let X be the Hilbert space $H^{k_0}(\Omega) \cap H_0^1(\Omega)$. Elliptic regularity ensures that $L^{-1}$ maps X into X. In our discussions below, $L^{-1} : X \to X$ is the infinite dimensional analogue of the linear transformation defined by a positive square matrix $A : \mathbb{R}^m \to \mathbb{R}^m$ in the previous section.
For the proof of Theorem 1.1, we need an infinite dimensional analogue of Lemma 2.1.

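Lemma 3.1. For any nonzero $u \in X$, $u \ge 0$, let $w = L^{-1}(u)$. Then $w > 0$ in $\Omega$ and $w|_{\partial\Omega} = 0$. Furthermore, for any $v \in C^2(\overline{\Omega})$ with $v|_{\partial\Omega} = 0$, there exists an $\varepsilon > 0$ such that $w \ge \varepsilon v$.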



Proof. By the maximum principle, Theorem 3.1, we conclude that $w > 0$ in $\Omega$ and $\frac{\partial w}{\partial \nu} < 0$ on $\partial\Omega$. We now prove the last statement. Consider a general boundary point $x \in \partial\Omega$. After a local change of coordinates, we may assume that there is a neighborhood U of x on which is defined a coordinate system $x = (x', x_n)$ with $x' = (x_1, \ldots, x_{n-1})$ such that $U \cap \partial\Omega$ is defined by $x_n = 0$ and $U \cap \Omega$ is defined by $x_n > 0$. Theorem 3.1 implies $\frac{\partial w}{\partial x_n} > 0$. Consequently, by the Taylor expansion in $x_n$, we have

$$w(x', x_n) = w(x', 0) + \frac{\partial w}{\partial x_n}(x', 0)\, x_n + o(x_n)$$

and

$$v(x', x_n) = v(x', 0) + \frac{\partial v}{\partial x_n}(x', 0)\, x_n + o(x_n).$$

Observe that $w(x', 0) = v(x', 0) = 0$ and $\frac{\partial w}{\partial x_n}(x', 0) > 0$. Therefore, a comparison of the two equations above shows that in a neighborhood of $\partial\Omega$, $w \ge \varepsilon v$ for some $\varepsilon > 0$. By the continuity of w, v and the positivity of w in $\Omega$, this implies the claim of the lemma.

Now we present the proof of Theorem 1.1, the infinite dimensional analogue of
Perron’s theorem.

Proof. Let $\Sigma = \{\varepsilon > 0 \mid L^{-1}(f) > \varepsilon f \text{ in } \Omega, \text{ for some } f \in X,\ f > 0 \text{ in } \Omega\}$. In view of (2), it is easy to see that $\Sigma$ is bounded. Lemma 3.1 implies that $\Sigma \ne \emptyset$. Let $\mu_1 = \sup \Sigma$. Pick $\mu^{(j)} \in \Sigma$ with $\mu^{(j)} \to \mu_1$. By the definition of $\Sigma$, there exist $u_j \in X$, with $u_j(x) > 0$ for $x \in \Omega$ and $u_j|_{\partial\Omega} = 0$, such that $L^{-1}(u_j) > \mu^{(j)} u_j$. Without loss of generality, we assume that $\|u_j\|_{L^2} = 1$. Since we can only infer that a subsequence of $\{u_j\}$ is weakly convergent, we shall employ a finite iteration to get a better convergent subsequence $\{z_j\}$ with each term $z_j$ satisfying the same properties as the corresponding $u_j$.
First, for any fixed j and l with $1 \le l \le k_0$, let $z_{j,l} = (L^{-1})^l(u_j)$. We prove inductively using the maximum principle that $z_{j,l} > \mu^{(j)} z_{j,l-1}$ in $\Omega$. Indeed, for $l = 1$, this follows from the choice of $z_{j,1} = L^{-1}(u_j)$ and $z_{j,0} = u_j$. Assume that the claimed inequality holds for l, namely $z_{j,l} > \mu^{(j)} z_{j,l-1}$. Since $z_{j,l+1} - \mu^{(j)} z_{j,l} = L^{-1}(z_{j,l} - \mu^{(j)} z_{j,l-1})$, Theorem 3.1 and the inductive hypothesis imply that $z_{j,l+1} > \mu^{(j)} z_{j,l}$.
On the other hand, the standard elliptic estimate ([2, Theorem 2, Section 6.3]) asserts that there exists a positive constant C depending on L and $\Omega$ such that $\|z_{j,l}\|_{2l} \le C^l \|u_j\|_{L^2}$. Let $z_j = z_{j,k_0}$. In particular, there exists a constant $C'$ independent of j such that $\|z_j\|_{2k_0} \le C'$. By the Rellich–Kondrachov compactness theorem, after passing to a subsequence if necessary, $\lim_{j\to\infty} z_j = w_1$ in $H^{k_0}(\Omega)$-norm, for some $w_1 \in X$. Note that $L^{-1}(z_j) = z_{j,k_0+1} \ge \mu^{(j)} z_{j,k_0} = \mu^{(j)} z_j$. From this, we deduce that $L^{-1}(w_1) \ge \mu_1 w_1$, $w_1|_{\partial\Omega} = 0$, and $w_1 \ge 0$. To see that $w_1 \ne 0$, we observe that $z_j \ge (\mu^{(j)})^{k_0} u_j \ge 0$; hence, $\|z_j\|_{L^2} \ge (\mu^{(j)})^{k_0}$. Now the argument in the linear algebra proof, replacing Lemma 2.1 by Lemma 3.1, implies that $L^{-1} w_1 = \mu_1 w_1$. In particular, $w_1 > 0$ in $\Omega$ and $w_1|_{\partial\Omega} = 0$. Taking $\lambda_1 = \frac{1}{\mu_1}$, this proves the existence of the principal eigenvalue and a corresponding positive eigenfunction.
To show that $\lambda_1$ is of multiplicity one, assume that $Lu = \lambda_1 u$ for a real-valued function u. Equivalently, we have $L^{-1} u = \mu_1 u$ with $\mu_1 = \frac{1}{\lambda_1}$. Let $\Gamma = \{\eta > 0 \mid w_1 - \eta u \ge 0\}$. Clearly $\Gamma$ is nonempty by Lemma 3.1 and is bounded. Then let $\eta_1 = \sup \Gamma$. Lemma 3.1 asserts that $w_1 - \eta_1 u \equiv 0$, since otherwise $L^{-1}(w_1 - \eta_1 u) = \mu_1 (w_1 - \eta_1 u) > \varepsilon u$ for some $\varepsilon > 0$, which contradicts $\eta_1$ being the supremum of $\Gamma$. Thus, u is a multiple of $w_1$. (This particular part of the argument is quite close to that of [2, Section 6.5].)
Finally, we prove part (iii) of Theorem 1.1. Let u (which in general is complex valued) be an eigenfunction with eigenvalue λ. By applying the maximum principle to $|u|^2 / w_1^{2-2\varepsilon}$, for any small $\varepsilon > 0$, it was shown in [2, Section 6.5] (see also [6]) that $\mathrm{Re}(\lambda) \ge \lambda_1$. This result can also be proved via the following slightly different argument, which also shows a sharper result: $\mathrm{Re}(\lambda) > \lambda_1$ for any $\lambda \ne \lambda_1$.
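Let $v = \frac{u}{w_1}$ and $\tilde{L} = L - c(x)$. A direct calculation yields

$$\tilde{L}|v|^2 = 2(\mathrm{Re}(\lambda) - \lambda_1)|v|^2 - 2\sum_{i,j=1}^{n} a^{ij}\,\frac{\partial v}{\partial x_i}\frac{\partial \bar{v}}{\partial x_j} + 2\sum_{i,j=1}^{n} a^{ij}\,\frac{\partial \log w_1}{\partial x_i}\frac{\partial |v|^2}{\partial x_j}.$$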
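
The direct calculation can be organized in three steps. What follows is a sketch under stated assumptions, not the author's written derivation: we write $w = w_1$, $u = vw$, $\tilde{L} = L - c(x)$ and $\partial_i = \partial/\partial x_i$, and we assume that the coefficients are real and that $a^{ij}$ is symmetric (which costs nothing, since only the symmetric part of $a^{ij}$ enters L). The intermediate identity for $\tilde{L} v$ is our own bookkeeping; it uses $Lu = \lambda u$ and $Lw = \lambda_1 w$.

```latex
\begin{align*}
L(vw) &= w\,\tilde{L}v + v\,Lw - 2\sum_{i,j=1}^{n} a^{ij}\,\partial_i v\,\partial_j w
  && \text{(product rule)} \\
\tilde{L}v &= (\lambda - \lambda_1)\,v + 2\sum_{i,j=1}^{n} a^{ij}\,\partial_i v\,\partial_j \log w
  && \text{(divide } \lambda v w = L(vw) \text{ by } w\text{)} \\
\tilde{L}|v|^2 &= \bar{v}\,\tilde{L}v + v\,\tilde{L}\bar{v}
  - 2\sum_{i,j=1}^{n} a^{ij}\,\partial_i v\,\partial_j \bar{v}
  && \text{(product rule for } v\bar{v}\text{)} \\
&= 2(\mathrm{Re}(\lambda) - \lambda_1)|v|^2
  - 2\sum_{i,j=1}^{n} a^{ij}\,\partial_i v\,\partial_j \bar{v}
  + 2\sum_{i,j=1}^{n} a^{ij}\,\partial_i \log w\,\partial_j |v|^2 .
\end{align*}
```

The last line uses $\tilde{L}\bar{v} = \overline{\tilde{L}v}$ together with $\bar{v}\,\partial_i v + v\,\partial_i \bar{v} = \partial_i |v|^2$.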

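As in [7], the regularity of $|v|^2$ (which implies the finiteness of $\tilde{L}|v|^2$) and the singularity of $\frac{\partial \log w_1}{\partial x_i}$ along $\partial\Omega$ imply that $\frac{\partial |v|^2}{\partial N} = 0$ on $\partial\Omega$. Here $\frac{\partial |v|^2}{\partial N}$ is defined as $\sum_{i,j=1}^{n} a^{ij}\,\frac{\partial |v|^2}{\partial x_i}\,\nu_j$ with ν being the exterior normal and $\nu_j = \langle \nu, \frac{\partial}{\partial x_j} \rangle$. Suppose that $\mathrm{Re}(\lambda) \le \lambda_1$. Since v is nonconstant, the strong maximum principle implies that $|v|^2$ can only attain its maximum (on $\overline{\Omega}$) at some boundary point $x_0 \in \partial\Omega$. But Hopf’s lemma (see Theorem 2.5 of [4]) asserts that $\frac{\partial |v|^2}{\partial N} > 0$ at $x_0$, which contradicts the just-established conclusion that $\frac{\partial |v|^2}{\partial N} = 0$ on $\partial\Omega$. The contradiction proves $\mathrm{Re}(\lambda) > \lambda_1$.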

Note that the same argument as the one above proves that if μ is any Neumann eigenvalue of the operator L with nonconstant eigenfunction, then $\mathrm{Re}(\mu) > 0$. In fact, the slightly better result $\mathrm{Re}(\mu) > \min c(x)$ holds.
It seems interesting to estimate the gap $\mathrm{Re}(\lambda) - \lambda_1$ from below in terms of the geometry of the coefficients $a^{ij}(x)$, $b^k(x)$, and $c(x)$, as well as that of $\Omega$. Another interesting question is whether there is a generalization of this result to hypo-elliptic operators. Similarly, we can ask for an effective positive lower estimate for the nontrivial Neumann eigenvalue μ (i.e., the eigenvalue with nonconstant eigenfunctions) for the operator L with $c(x) = 0$. For this last problem, consult [5] for some recent progress for the case that the domain $\Omega$ is convex.

ACKNOWLEDGMENT. The work is partially supported by NSF grant DMS-1105549. We thank Professor
Guershon Harel for the encouragement on writing this article up and Professor Jim Isenberg for editing help.
We particularly thank Professor Peter Li, Professor Nolan Wallach, and Professor Hung-Hsi Wu for their help,
which greatly improved the article’s exposition. Finally, we thank one of the referees who pointed out an
oversight in the first version and suggested various improvements.

REFERENCES

1. R. Bellman, Introduction to Matrix Analysis. Second edition. McGraw-Hill, New York, 1970.
2. L. Evans, Partial Differential Equations. Second edition. American Mathematical Society, Providence, RI,
2010.
3. F. Gantmacher, The Theory of Matrices. Chelsea, New York, 1977.
4. Q. Han, F.-H. Lin, Elliptic Partial Differential Equations. Second edition. American Mathematical Society
and Courant Institute of Mathematical Sciences, Providence, RI, 2011.
5. L. Ni, Estimates on the modulus of expansion for vector fields solving nonlinear equations, J. Math. Pures
Appl. 99 (2013) 1–16, http://dx.doi.org/10.1016/j.matpur.2012.05.009.



6. M. Protter, H. Weinberger, Maximum Principles in Differential Equations. Prentice-Hall, Englewood
Cliffs, NJ, 1967.
7. I.-M. Singer, B. Wong, S.-T. Yau, Stephen S.-T. Yau, An estimate of the gap of the first two eigenvalues in
the Schrödinger operator, Ann. Sc. Norm. Super. Pisa Cl. Sci. 12 (1985) 319–333.

LEI NI received his Ph.D from the University of California, Irvine, in 1998. He was a research assistant pro-
fessor at Purdue University from 1998–2000 and a Szegő assistant professor at Stanford from 2000–2002
before he joined the faculty at the University of California, San Diego, where he is now a professor of mathe-
matics. His research interests include differential geometry and partial differential equations.
Department of Mathematics, University of California at San Diego, La Jolla, CA 92093
lni@math.ucsd.edu



Absolute Convergence in Ordered Fields
Pete L. Clark and Niels J. Diepeveen

Abstract. We explore the distinction between convergence and absolute convergence of se-
ries in both Archimedean and non-Archimedean ordered fields and find that the relationship
between them is closely connected to sequential (Cauchy) completeness.


1. INTRODUCTION. A real series $\sum_{n=1}^{\infty} a_n$ is absolutely convergent if the “absolute series” $\sum_{n=1}^{\infty} |a_n|$ converges. This is a strange and rather sneaky terminology; many calculus students have heard “the series is absolutely convergent” as “absolutely, the series is convergent.” The language strongly (and somewhat subliminally) hints that an absolutely convergent series should converge. Fortunately this holds. Since $\sum_{n=1}^{\infty} |a_n|$ converges, the partial sums form a Cauchy sequence: for all $\varepsilon > 0$, there is $N \in \mathbb{Z}^+$ such that for all $n \ge N$ and $m \ge 0$, we have $\sum_{k=n}^{n+m} |a_k| < \varepsilon$. Thus

$$\left| \sum_{k=n}^{n+m} a_k \right| \le \sum_{k=n}^{n+m} |a_k| < \varepsilon,$$

and $\sum_{n=1}^{\infty} a_n$ converges by the Cauchy criterion.
Above we used the convergence of Cauchy sequences—i.e., that R is sequentially
complete. In the spirit of “real analysis in reverse”—cf. [5] and [3]—it is natural to
ask about convergence versus absolute convergence in an arbitrary ordered field. In
a recent note, Kantrowitz and Schramm ask whether in an ordered field F, every ab-
solutely convergent series is convergent if and only if F is sequentially complete [2,
Question 3]. We will answer this question and also determine the ordered fields in
which every convergent series is absolutely convergent.
Let us begin with a taxonomic refresher on ordered fields. An ordered field F is
Dedekind complete if every nonempty subset S of F that is bounded above admits
a least upper bound. Dedekind complete ordered fields exist [1, §5], and if F1 and F2
are Dedekind complete ordered fields, there is a unique field homomorphism ι : F1 →
F2 , which is moreover an isomorphism of ordered fields [1, Cor. 3.6, Cor. 3.8]. This
essentially unique Dedekind complete ordered field is, of course, denoted by R and
called the field of real numbers.
For $x, y \in F$ we write $x \ll y$ if $n|x| < |y|$ for all $n \in \mathbb{Z}^+$. An ordered field F is non-Archimedean if there is $x \in F$ with $1 \ll x$ (equivalently, if $\mathbb{Z}^+$ is bounded above in F); otherwise F is Archimedean. Notice that for $x, y \in F$ with $x \ne 0$, $x \ll y$ holds if and only if $1 \ll \frac{y}{x}$. Thus F is non-Archimedean if and only if $x \ll y$ for some $x, y \in F$ with $x \ne 0$.

http://dx.doi.org/10.4169/amer.math.monthly.121.10.909
MSC: Primary 40J05

December 2014] ABSOLUTE CONVERGENCE 909


Example. The formal Laurent series field $\mathbb{R}((t))$ (see, e.g., [3, §3]) has a unique ordering extending the ordering on $\mathbb{R}$ in which $\frac{1}{t}$ is greater than every real number. In particular $1 \ll \frac{1}{t}$, so $\mathbb{R}((t))$ is non-Archimedean.
Explicitly, every nonzero $a \in \mathbb{R}((t))$ has the form

$$a = \sum_{m=M}^{\infty} b_m t^m, \qquad b_M \ne 0, \tag{1}$$

and then a has the same sign as $b_M$. Put $v(a) = M$ and also put $v(0) = \infty$. Then a sequence $\{a_n\}$ in $\mathbb{R}((t))$ converges to 0 if and only if $v(a_n)$ converges to ∞.
Let $\{a_n\}$ be a Cauchy sequence in $\mathbb{R}((t))$. Then $\{a_n\}$ is bounded; there is $M \in \mathbb{Z}$ such that for all $n \in \mathbb{Z}^+$ we may write

$$a_n = \sum_{m=M}^{\infty} b_{m,n} t^m.$$

The Cauchy condition also implies that for every $m \ge M$, the real sequence $\{b_{m,n}\}_{n=1}^{\infty}$ is eventually constant, say with value $b_m$, and thus $\{a_n\}$ converges to $\sum_{m=M}^{\infty} b_m t^m$. Thus $\mathbb{R}((t))$ is sequentially complete. Similarly, a series $\sum_{n=1}^{\infty} a_n$ is convergent if and only if $a_n \to 0$, so absolute convergence is equivalent to convergence in $\mathbb{R}((t))$.
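
The ordering in this example is easy to experiment with. The following is a minimal Python sketch, assuming we restrict to Laurent series with finitely many nonzero coefficients and store them as dictionaries mapping exponents to coefficients; the function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def normalize(a):
    """Drop zero coefficients from a dict {exponent: coefficient}."""
    return {m: c for m, c in a.items() if c != 0}

def valuation(a):
    """v(a): least exponent with nonzero coefficient (None stands for v(0) = infinity)."""
    a = normalize(a)
    return min(a) if a else None

def sign(a):
    """Sign of a in the ordering: the sign of the lowest-order coefficient b_M."""
    a = normalize(a)
    if not a:
        return 0
    return 1 if a[min(a)] > 0 else -1

def subtract(a, b):
    """Coefficientwise difference a - b."""
    keys = set(a) | set(b)
    return normalize({m: a.get(m, 0) - b.get(m, 0) for m in keys})

def less(a, b):
    """a < b holds exactly when b - a is positive."""
    return sign(subtract(b, a)) == 1

one_over_t = {-1: Fraction(1)}
# Each tested positive integer n lies below 1/t, illustrating that Z+ is bounded above.
bounded = all(less({0: Fraction(n)}, one_over_t) for n in range(1, 1001))
# t is positive but smaller than the positive real number 10^-6.
tiny = less({1: Fraction(1)}, {0: Fraction(1, 10 ** 6)})
print(bounded, tiny, valuation({2: Fraction(3), 5: Fraction(-1)}))
```

Only finitely many integers are tested, of course; the point is that the comparison depends on the lowest-order term alone, so the same outcome holds for every n.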
An ordered field F is Archimedean if and only if there is a homomorphism
ι : (F, <) → (R, <) [1, Thm. 3.5], i.e., if and only if F is isomorphic to a sub-
field of R with the induced ordering. An ordered field is Dedekind complete if and
only if it is Archimedean and sequentially complete [1, Lemma 3.10, Thm. 3.11]. It
follows that the Archimedean ordered fields that are not (sequentially = Dedekind)
complete are, up to isomorphism, precisely the proper subfields of R.
Now we can state our main result.

Main Theorem. Let F be an ordered field.

a) Suppose F is sequentially complete and Archimedean (so $F \cong \mathbb{R}$). Then:
(i) every absolutely convergent series in F is convergent;
(ii) F admits a convergent series that is not absolutely convergent.
b) Suppose F is sequentially complete and non-Archimedean. Then:
(i) a series $\sum_{n=1}^{\infty} a_n$ is convergent if and only if $a_n \to 0$. In particular,
(ii) a series is absolutely convergent if and only if it is convergent.
c) Suppose F is not sequentially complete. Then:
(i) F admits an absolutely convergent series that is not convergent;
(ii) F admits a convergent series that is not absolutely convergent.

Part c) gives an affirmative answer to Question 3 of [2].

Part a) of the main theorem is familiar from calculus. Part (i) has already been recalled, and for part (ii) we need only exhibit the alternating harmonic series $\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}$. We have included part a) to facilitate comparison with the other cases.
A natural next step is to finish off the case of Archimedean ordered fields by establishing part c) for proper subfields of $\mathbb{R}$. We do so in §2, using arguments that could be presented in an undergraduate honors calculus/real analysis course. (This turns out to be logically superfluous; later we will prove part c) of the main theorem for all ordered fields. But it still seems like an agreeable way to begin.)
For the remainder of the main theorem we need some techniques for constructing sequences in ordered fields. In §3 we examine and classify ordered fields with respect to the existence of sequences of various types. For instance, there are ordered fields in which every convergent sequence is eventually constant. In such a field, a series $\sum_{n=1}^{\infty} a_n$ is convergent if and only if $a_n = 0$ for all sufficiently large n if and only if $\sum_{n=1}^{\infty} |a_n|$ is convergent. Thus, in order for the main theorem to hold in this case, such a field must be sequentially complete. We will show that every convergent sequence is eventually constant if and only if every Cauchy sequence is eventually constant. In fact we give (Theorem 6) four other conditions equivalent to “there is a convergent sequence that is not eventually constant,” including a characterization in terms of the associated topology and a characterization in terms of the cofinality of the underlying ordered set.
We prove parts b) and c) of the main theorem in §4.

2. SUBFIELDS OF R.
Theorem 1. Let $F \subsetneq \mathbb{R}$ be a proper subfield. There is a series $\sum_{n=1}^{\infty} a_n$ with terms in F that is absolutely convergent but not convergent.

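Proof. Since F is a proper subfield of $\mathbb{R}$, we may choose $x \in [-1, 1] \setminus F$. We claim there is a sign sequence $\{s_n\}_{n=1}^{\infty}$ (i.e., $s_n \in \{\pm 1\}$ for all n) such that $x = \sum_{n=1}^{\infty} \frac{s_n}{2^n}$. Indeed, for $N \ge 0$, take $s_{N+1}$ to be 1 if $\sum_{n=1}^{N} \frac{s_n}{2^n} < x$ and −1 if $\sum_{n=1}^{N} \frac{s_n}{2^n} > x$. Then we see inductively that $\left| \sum_{n=1}^{N} \frac{s_n}{2^n} - x \right| \le 2^{-N}$, which implies convergence. In fact, the sign sequence $\{s_n\}$ is uniquely determined; if we take $s_{N+1} = -1$ when $\sum_{n=1}^{N} \frac{s_n}{2^n} < x$ or $s_{N+1} = 1$ when $\sum_{n=1}^{N} \frac{s_n}{2^n} > x$, then we get

$$\left| \sum_{n=1}^{N+1} \frac{s_n}{2^n} - x \right| > 2^{-N-1} = \sum_{n=N+2}^{\infty} 2^{-n} \ge \left| \sum_{n=N+2}^{\infty} \frac{s_n}{2^n} \right|,$$

and we have made an irrevocable mistake! The series $\sum_{n=1}^{\infty} \frac{s_n}{2^n}$ has terms in F and is not convergent in F, but the associated absolute series is $\sum_{n=1}^{\infty} \frac{1}{2^n} = 1 \in F$.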
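
The greedy choice of signs in this proof can be run in exact arithmetic. The sketch below is an illustration only: the target x = 1/3 is a stand-in chosen because no finite sum of terms ±1/2^n can equal it (in the article x lies outside F, which guarantees the same thing), and the function name `sign_expansion` is hypothetical.

```python
from fractions import Fraction

def sign_expansion(x, terms):
    """Greedy signs s_1..s_terms so that the partial sums of s_n / 2**n track x.
    Assumes -1 <= x <= 1 and that no partial sum ever equals x exactly."""
    signs, partial = [], Fraction(0)
    for n in range(1, terms + 1):
        if partial == x:
            raise ValueError("partial sum equals x; the rule in the proof does not apply")
        s = 1 if partial < x else -1      # step toward x
        partial += Fraction(s, 2 ** n)
        signs.append(s)
    return signs, partial

x = Fraction(1, 3)                         # stand-in target, never a dyadic rational
signs, partial = sign_expansion(x, 20)
within_bound = abs(partial - x) <= Fraction(1, 2 ** 20)
print(signs[:6], within_bound)
```

The final check is the inductive bound from the proof, namely that the N-th partial sum is within 2^(-N) of x, here with N = 20.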

The proof of Theorem 1 is reminiscent of that of the Riemann rearrangement theorem, which states that a real series $\sum_{n=1}^{\infty} a_n$ with $a_n \to 0$ and $\sum_{n=1}^{\infty} |a_n| = \infty$ can be rearranged so as to converge to any $L \in \mathbb{R}$. The following result is a variant in which, instead of permuting the terms of a series, we change their signs.

Theorem 2. Let $\{a_n\}_{n=1}^{\infty}$ be a positive real sequence with $a_n \to 0$ and $\sum_{n=1}^{\infty} a_n = \infty$. For $L \in \mathbb{R}$, there is a sign sequence $s_n \in \{\pm 1\}$ such that $\sum_{n=1}^{\infty} s_n a_n = L$.

Proof. We may assume $L \in [0, \infty)$. Let $N_1$ be the least positive integer such that $\sum_{n=1}^{N_1} a_n > L$, and put $s_1 = \cdots = s_{N_1} = 1$. Let $N_2$ be the least integer greater than $N_1$ such that $\sum_{n=1}^{N_1} s_n a_n - \sum_{n=N_1+1}^{N_2} a_n < L$, and put $s_{N_1+1} = \cdots = s_{N_2} = -1$. We continue in this manner, taking just enough terms of constant sign to place the partial sum on the opposite side of L as the previous partial sum. The condition $\sum_{n=1}^{\infty} a_n = \infty$ ensures this is well-defined, and the condition $a_n \to 0$ guarantees that the resulting series $\sum_{n=1}^{\infty} s_n a_n$ converges to L.



The series $\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}$ is defined in any subfield $F \subset \mathbb{R}$ and is not absolutely convergent. However, since $\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = \log 2 \in \mathbb{R} \setminus \mathbb{Q}$, this series need not be convergent in F. Using Theorem 2 we can fix this with a different choice of signs.

Corollary 3. Let $F \subseteq \mathbb{R}$ be a subfield. Then there is a series in F that is convergent but not absolutely convergent.

Proof. Apply Theorem 2 with $a_n = \frac{1}{n}$ for all $n \in \mathbb{Z}^+$, and $L = 1$.
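
The sign selection of Theorem 2, specialized as in Corollary 3 to a_n = 1/n and L = 1, can be simulated directly. This is a rough numerical sketch under stated assumptions: it uses floating-point arithmetic and a finite number of terms, so it illustrates the behavior of the partial sums rather than proving convergence, and the function name `choose_signs` is hypothetical.

```python
def choose_signs(terms, target):
    """Keep one sign until the partial sum strictly crosses the target,
    then switch, as in the proof of Theorem 2."""
    signs, partials = [], []
    partial, sign = 0.0, 1
    for a in terms:
        partial += sign * a
        signs.append(sign)
        partials.append(partial)
        if sign == 1 and partial > target:
            sign = -1                     # overshot: start subtracting
        elif sign == -1 and partial < target:
            sign = 1                      # undershot: start adding
    return signs, partials

N = 5000
terms = [1.0 / n for n in range(1, N + 1)]    # a_n = 1/n
signs, partials = choose_signs(terms, 1.0)    # L = 1
print(signs[:6], abs(partials[-1] - 1.0), sum(terms))
```

After each crossing the distance to L is at most the term that caused the crossing, which is why the printed error is small while the sum of the absolute values keeps growing with N.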

The constructions of this section are intended to complement those of [2]. In particular,
the proof of Theorem 1 answers their Question 1 for b = 2.

3. BUILDING SEQUENCES AND SERIES IN ORDERED FIELDS. We begin by carrying over two results from calculus to the context of ordered fields.

Lemma 4. Let (X, <) be a totally ordered set, and let $\{x_n\}_{n=1}^{\infty}$ be a sequence in X. Then at least one of the following holds:

(i) the sequence $\{x_n\}_{n=1}^{\infty}$ admits a constant subsequence;
(ii) the sequence $\{x_n\}_{n=1}^{\infty}$ admits a strictly increasing subsequence;
(iii) the sequence $\{x_n\}_{n=1}^{\infty}$ admits a strictly decreasing subsequence.

Proof. If the image of the sequence is finite, we may extract a constant subsequence. So assume the image is infinite. By passing to a subsequence we may assume $n \mapsto x_n$ is injective. We say $m \in \mathbb{Z}^+$ is a peak of the sequence if for all $n > m$ we have $x_n < x_m$. If there are infinitely many peaks, the sequence of peaks forms a strictly decreasing subsequence. So suppose there are only finitely many peaks and thus there is $N \in \mathbb{Z}^+$ such that no $n \ge N$ is a peak. Let $n_1 = N$. Since $n_1$ is not a peak, there is $n_2 > n_1$ with $x_{n_2} > x_{n_1}$. Since $n_2$ is not a peak, there is $n_3 > n_2$ with $x_{n_3} > x_{n_2}$. Continuing in this way we build a strictly increasing subsequence.

Lemma 5. Let $\{a_n\}_{n=1}^{\infty}$ be a Cauchy sequence in the ordered field F. If $\{a_n\}_{n=1}^{\infty}$ admits a convergent subsequence, then it converges.

Proof. Let $\{a_{n_k}\}_{k=1}^{\infty}$ be a subsequence converging to $L \in F$. For $\varepsilon > 0$, choose $K \in \mathbb{Z}^+$ such that $|a_m - a_n| < \frac{\varepsilon}{2}$ for all $m, n \ge K$ and $|a_{n_k} - L| < \frac{\varepsilon}{2}$ for all $k \ge K$. Since $n_K \ge K$, we have $|a_n - L| \le |a_n - a_{n_K}| + |a_{n_K} - L| < \varepsilon$ for all $n \ge K$.

Now we introduce an invariant of a linearly ordered set that plays an important role
in the theory of ordered fields. A subset S of a linearly ordered set X is cofinal if for
all x ∈ X there is y ∈ S with x ≤ y. The cofinality of X is the least cardinality of a
cofinal subset. An ordered field is Archimedean if and only if Z+ is a cofinal subset, so
Archimedean ordered fields have countable cofinality. The subset $\{t^{-n}\}_{n=1}^{\infty}$ of $\mathbb{R}((t))$ is countable and cofinal, so $\mathbb{R}((t))$ is non-Archimedean of countable cofinality. For any infinite cardinal κ there is an ordered field of cofinality κ [4, Cor. 2.7].
A Z-sequence in F is a sequence $\{a_n\}_{n=1}^{\infty}$ with $a_n > 0$ for all n and $a_n \to 0$. A ZC-sequence is a Z-sequence $\{a_n\}_{n=1}^{\infty}$ such that $\sum_{n=1}^{\infty} a_n$ converges.



Theorem 6. For an ordered field F, the following are equivalent:

(0) F is first countable (every point admits a countable base of neighborhoods);


(i) F has countable cofinality;
(ii) F admits a convergent sequence that is not eventually constant;
(iii) F admits a ZC-sequence;
(iv) F admits a Z-sequence;
(v) F admits a Cauchy sequence that is not eventually constant.

Proof. (0) ⟹ (i): Let $\{U_n\}_{n=1}^{\infty}$ be a countable neighborhood base at 0. For $n \in \mathbb{Z}^+$, choose $\varepsilon_n > 0$ such that $(-\varepsilon_n, \varepsilon_n) \subset U_n$. Then $\{\frac{1}{\varepsilon_n} \mid n \in \mathbb{Z}^+\}$ is cofinal.
(i) ⟹ (0): Let $\{s_n\}_{n=1}^{\infty}$ be a cofinal sequence of positive elements. Let $\varepsilon_n = \frac{1}{s_n}$ and $U_n = (-\varepsilon_n, \varepsilon_n)$. Then $\{U_n\}_{n=1}^{\infty}$ is a countable base at zero, and thus for all $x \in F$, the collection $\{U_n + x\}_{n=1}^{\infty}$ is a countable base at x. Thus F is first countable.
(i) ⟹ (ii): Let $S = \{s_n\}_{n=1}^{\infty}$ be a cofinal sequence. Put $a_1 = \max(1, s_1)$. Having defined $a_n$, put $a_{n+1} = \max(a_n + 1, s_{n+1})$. Then $\{a_n\}_{n=1}^{\infty}$ is a strictly increasing sequence whose set of terms is a cofinal subset. The sequence $\{a_n^{-1}\}_{n=1}^{\infty}$ converges to 0 and is not eventually constant.
(ii) ⟹ (iii): Let $\{x_n\}_{n=1}^{\infty}$ be a sequence that is convergent and is not eventually constant; its set of terms must then be infinite. By Lemma 4, $\{x_n\}_{n=1}^{\infty}$ has a subsequence that is either strictly increasing or strictly decreasing; by changing the signs of all terms if necessary and adding a constant we get a strictly increasing convergent sequence $0 < S_1 < S_2 < \cdots$. Put $S_0 = 0$, and for $n \in \mathbb{Z}^+$, put $a_n = S_n - S_{n-1}$. Then $\{a_n\}_{n=1}^{\infty}$ is a ZC-sequence.
(iii) ⟹ (iv) ⟹ (v) is immediate.
(v) ⟹ (i): If $\{a_n\}_{n=1}^{\infty}$ is a Cauchy sequence and not eventually constant, then we claim that

$$\left\{ \frac{1}{|a_m - a_n|} \;\middle|\; m, n \in \mathbb{Z}^+,\ a_m \ne a_n \right\}$$

is countable and cofinal. Countability is clear. Let $\alpha > 0$ in F. There is an $N \in \mathbb{Z}^+$ such that for all $m, n \ge N$, we have $|a_m - a_n| \le \frac{1}{\alpha}$. Since the sequence is not eventually constant, there are $m, n \ge N$ with $a_m \ne a_n$, and then $\alpha \le \frac{1}{|a_m - a_n|}$.

Theorem 6 is a key step towards establishing the main theorem. For starters, it disposes of the case of ordered fields of uncountable cofinality; by (i) ⟺ (v), such fields must be sequentially complete. Moreover, by (i) ⟺ (ii) an infinite series $\sum_{n=1}^{\infty} a_n$ converges if and only if $a_n = 0$ for all sufficiently large n if and only if $a_n \to 0$. And in the case of countable cofinality it gives us some useful sequences.
Consider the sequence $\{t^n\}_{n=1}^{\infty}$ in $\mathbb{R}((t))$. It converges to 0, and for all $n \in \mathbb{Z}^+$ we have $t^{n+1} \ll t^n$. One can use the sequence $\{t^n\}_{n=1}^{\infty}$ to “test for convergence”; a sequence $\{a_n\}_{n=1}^{\infty}$ in $\mathbb{R}((t))$ converges to 0 if and only if for each $N \in \mathbb{Z}^+$ we have $|a_n| \le t^N$ for all sufficiently large n.
Here is a useful generalization to arbitrary ordered fields. A test sequence is a Z-sequence $\{\varepsilon_n\}_{n=1}^{\infty}$ such that $\varepsilon_{n+1} \ll \varepsilon_n$ for all $n \in \mathbb{Z}^+$.



Proposition 7. For an ordered field F, the following are equivalent:
(i) F admits a test sequence;
(ii) F is non-Archimedean of countable cofinality.

Proof. (i) ⟹ (ii): The existence of $0 < \varepsilon_2 \ll \varepsilon_1$ shows F is non-Archimedean. The implication (iv) ⟹ (i) of Theorem 6 shows that F has countable cofinality.
(ii) ⟹ (i): Let $S = \{s_n\}_{n=1}^{\infty} \subset F$ be countable and cofinal. Since F is non-Archimedean, there is $x_1 \in S$ with $1 \ll x_1$ and $s_1 \le x_1$. Then $x_1 \ll x_1^2$, and by cofinality there is $x_2 \in S$ such that $x_2 \ge \max(x_1^2, s_2)$. Continuing in this manner we get a sequence $\{x_n\}_{n=1}^{\infty}$. Taking $\varepsilon_n = x_n^{-1}$ gives a test sequence.

4. PROOF OF THE MAIN THEOREM. We will now prove part b) of the main
theorem.
Let F be a sequentially complete non-Archimedean field. First we must show that a series $\sum_{n=1}^{\infty} a_n$ in F is convergent if and only if $a_n \to 0$. That a convergent series has $a_n \to 0$ follows (as usual) from the fact that a convergent sequence is a Cauchy sequence. Suppose $a_n \to 0$. If F has uncountable cofinality, then by the remarks following Theorem 6, a series $\sum_{n=1}^{\infty} a_n$ converges if and only if $a_n = 0$ for all large enough n, hence if and only if $a_n \to 0$. Now suppose F has countable cofinality. By Proposition 7, there is a test sequence $\{\varepsilon_n\}_{n=1}^{\infty}$. For $k \in \mathbb{Z}^+$, choose $N_k$ such that for all $n \ge N_k$, we have $|a_n| \le \varepsilon_{k+1}$. Then for all $n \ge N_k$ and $\ell \ge 0$, we have

$$|a_n + a_{n+1} + \cdots + a_{n+\ell}| \le |a_n| + \cdots + |a_{n+\ell}| \le (\ell + 1)\,\varepsilon_{k+1} < \varepsilon_k.$$

Thus the sequence of partial sums is a Cauchy sequence, and hence convergent because F is sequentially complete. The fact that a series in F is convergent if and only if it is absolutely convergent follows immediately, since $a_n \to 0$ if and only if $|a_n| \to 0$.
Before proving part c) of the main theorem, we want to build one more type of sequence. A ZD-sequence is a Z-sequence $\{a_n\}_{n=1}^{\infty}$ such that $\sum_{n=1}^{\infty} a_n$ diverges.

Lemma 8. For an ordered field F, the following are equivalent:

(i) F admits a ZD-sequence;
(ii) F is Archimedean or is not sequentially complete.

Proof. (i) ⟹ (ii): Let $\{a_n\}_{n=1}^{\infty}$ be a ZD-sequence in F. By part b) of the main theorem, F cannot be non-Archimedean and sequentially complete.
(ii) ⟹ (i): If F is Archimedean, $\{\frac{1}{n}\}_{n=1}^{\infty}$ is a ZD-sequence. Suppose F is non-Archimedean and not sequentially complete, and let $\{a_n\}_{n=1}^{\infty}$ be a divergent Cauchy sequence. By Lemmas 4 and 5, after passing to a subsequence and possibly changing the sign, we get a strictly increasing, divergent Cauchy sequence $\{S_n\}_{n=1}^{\infty}$. Put $S_0 = 0$, and for $n \in \mathbb{Z}^+$, put $a_n = S_n - S_{n-1}$. Then $\{a_n\}_{n=1}^{\infty}$ is a ZD-sequence.
Finally we can prove both assertions of part c) of the main theorem.

Theorem 9. For an ordered field F, the following are equivalent:


(i) F is sequentially complete;
(ii) Every absolutely convergent series in F converges.

Proof. (i) ⟹ (ii): This was proved at the beginning of §1.

¬(i) ⟹ ¬(ii): Let $\{a_n\}_{n=1}^{\infty}$ be a divergent Cauchy sequence in F. By Theorem 6 there is a ZC-sequence $\{c_k\}_{k=1}^{\infty}$. Since $\{a_n\}_{n=1}^{\infty}$ is a Cauchy sequence, there is a strictly increasing sequence of integers $\{n_k\}_{k=1}^{\infty}$ such that for all $n \ge n_k$ we have $|a_n - a_{n_k}| < c_k$. It follows that $|a_{n_{k+1}} - a_{n_k}| < c_k$ for all k. By Lemma 5, $\{a_{n_k}\}_{k=1}^{\infty}$ is divergent, hence so is $\{a_{n_k} - a_{n_1}\}_{k=1}^{\infty}$. For $k \in \mathbb{Z}^+$, put

$$d_{2k-1} = \frac{a_{n_{k+1}} - a_{n_k} + c_k}{2}, \quad\text{and}\quad d_{2k} = \frac{a_{n_{k+1}} - a_{n_k} - c_k}{2}.$$

Then

$$\sum_{i=1}^{k} (d_{2i-1} + d_{2i}) = \sum_{i=1}^{k} (a_{n_{i+1}} - a_{n_i}) = a_{n_{k+1}} - a_{n_1}.$$

This is a divergent subsequence of the sequence of partial sums associated to $\{d_k\}_{k=1}^{\infty}$, and hence $\sum_{k=1}^{\infty} d_k$ diverges. Since $-c_k < a_{n_{k+1}} - a_{n_k} < c_k$, we have

$$|d_{2k-1}| + |d_{2k}| = d_{2k-1} - d_{2k} = c_k.$$

Hence $\sum_{k=1}^{\infty} |d_k| = \sum_{k} c_k$ is convergent, i.e., $\sum_{k=1}^{\infty} d_k$ is absolutely convergent.
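
The construction of the terms d_k can be tried out in the field of rational numbers, which is not sequentially complete. The following Python sketch makes several assumptions for the sake of a concrete example: the Cauchy sequence is given by Newton iterates for √2 (which have no limit in Q), the ZC-sequence is c_k = 2^(-k), and the subsequence is simply the iterates after the first; none of these choices come from the article.

```python
from fractions import Fraction

# Newton iterates for sqrt(2): a Cauchy sequence in Q with no limit in Q.
a = [Fraction(2)]
for _ in range(8):
    a.append((a[-1] + 2 / a[-1]) / 2)

K = 7
c = [Fraction(1, 2 ** k) for k in range(1, K + 1)]   # c_k = 2^(-k)
picked = a[1:K + 2]               # plays the role of a_{n_1}, ..., a_{n_{K+1}}

d = []
for k in range(K):
    gap = picked[k + 1] - picked[k]
    assert -c[k] < gap < c[k]     # the inequality used in the proof
    d.append((gap + c[k]) / 2)    # d_{2k-1}, which is positive
    d.append((gap - c[k]) / 2)    # d_{2k}, which is negative

telescoped = sum(d) == picked[K] - picked[0]   # partial sums follow the subsequence
abs_sum = sum(abs(t) for t in d)               # equals c_1 + ... + c_K
print(telescoped, abs_sum)
```

The even-indexed partial sums of the d_k reproduce the divergent subsequence, while the absolute values add up to a partial sum of the convergent series of the c_k.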

Theorem 10. Let F be an ordered field that is not sequentially complete. Then F
admits a convergent series that is not absolutely convergent.

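Proof. By Lemma 8, F admits a ZD-sequence $\{d_n\}_{n=1}^{\infty}$. For $n \in \mathbb{Z}^+$, put

$$a_{2n-1} = \frac{d_n}{2}, \qquad a_{2n} = \frac{-d_n}{2}.$$

Then for all $n \in \mathbb{Z}^+$ we have

$$0 \le \sum_{k=1}^{n} a_k \le \frac{d_{\lceil n/2 \rceil}}{2},$$

so $\sum_{n=1}^{\infty} a_n$ converges (to 0). But $\sum_{n=1}^{\infty} |a_n| = \sum_{n=1}^{\infty} d_n$ diverges.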
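The construction in the proof of Theorem 10 can be tried out in the field of rational numbers, which is not sequentially complete. The following is only a sketch under stated assumptions: we take a ZD-sequence to mean a sequence of positive terms tending to 0 whose series diverges, and we pick $d_n = 1/n$ as a hypothetical example; the function name `alternating_halves` is ours, not the authors'.

```python
from fractions import Fraction

def d(n):
    """Assumed ZD-sequence in Q: positive, tends to 0, divergent sum."""
    return Fraction(1, n)

def alternating_halves(d, terms):
    """Return a_1..a_terms with a_{2n-1} = d(n)/2 and a_{2n} = -d(n)/2."""
    out = []
    n = 1
    while len(out) < terms:
        half = d(n) / 2
        out.append(half)
        if len(out) < terms:
            out.append(-half)
        n += 1
    return out

a = alternating_halves(d, 200)

# Check 0 <= a_1 + ... + a_n <= d_{ceil(n/2)} / 2 for every n up to 200.
partial = Fraction(0)
bounds_hold = True
for n, term in enumerate(a, start=1):
    partial += term
    upper = d((n + 1) // 2) / 2          # (n + 1) // 2 equals ceil(n / 2)
    bounds_hold = bounds_hold and (0 <= partial <= upper)

# The absolute values pair up to d_1 + ... + d_100, a harmonic partial sum.
abs_sum = sum(abs(t) for t in a)
```

The partial sums are squeezed to 0 by the bound, while `abs_sum` reproduces the partial sums of the divergent series $\sum d_n$.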

ACKNOWLEDGMENTS. We thank Paul Pollack and the referees for several comments that led to im-
provements of the exposition. The authors were partially supported by National Science Foundation grant
DMS-0701771.

REFERENCES

1. J. F. Hall, Completeness of Ordered Fields (2011), http://arxiv.org/pdf/1101.5652v1.pdf.


2. R. Kantrowitz, M. Schramm, Series that converge absolutely but don’t converge, College Math. J. 43
(2012) 331–333, http://dx.doi.org/10.4169/college.math.j.43.4.331.
3. J. Propp, Real analysis in reverse, Amer. Math. Monthly 120 (2013) 392–408, http://dx.doi.org/10.
4169/amer.math.monthly.120.05.392.

December 2014] ABSOLUTE CONVERGENCE 915


4. J. H. Schmerl, Models of Peano arithmetic and a question of Sikorski on ordered fields, Israel J. Math. 50
no. 1–2 (1985) 145–159, http://dx.doi.org/10.1007/bf02761121.
5. H. Teismann, Toward a more complete list of completeness axioms, Amer. Math. Monthly 120 (2013)
99–114, http://dx.doi.org/10.4169/amer.math.monthly.120.02.099.

PETE L. CLARK is an associate professor at the University of Georgia. His primary research interests lie in
arithmetic geometry and number theory.
University of Georgia, Athens, GA 30606
pete@math.uga.edu

NIELS J. DIEPEVEEN is a private citizen of the Netherlands. He studies topology and related topics.
niels@dv1.demon.nl



NOTES
Edited by Sergei Tabachnikov

An Unnoticed Consequence of Szegö’s


Distribution Theorem
William F. Trench


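Abstract. Suppose $f = \sum_{\ell=1}^{k} w_\ell f_\ell$, where $w_1, w_2, \dots, w_k$ are positive constants, $\sum_{\ell=1}^{k} w_\ell = 1$, and $f_1, f_2, \dots, f_k$ are real-valued and Riemann integrable functions on $[-\pi, \pi]$. Let $\lambda_{1n}^{(\ell)} \le \lambda_{2n}^{(\ell)} \le \cdots \le \lambda_{nn}^{(\ell)}$ be the eigenvalues of the Hermitian Toeplitz matrices $T_n^{(\ell)} = [t_{r-s}^{(\ell)}]_{r,s=1}^{n}$, where $t_r^{(\ell)} = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-irx} f_\ell(x)\,dx$, and let $\lambda_{1n} \le \lambda_{2n} \le \cdots \le \lambda_{nn}$ be the eigenvalues of $T_n = \sum_{\ell=1}^{k} w_\ell T_n^{(\ell)}$. We give conditions implying that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{r=1}^{n} \left|\lambda_{rn} - w_1\lambda_{rn}^{(1)} - w_2\lambda_{rn}^{(2)} - \cdots - w_k\lambda_{rn}^{(k)}\right| = 0.$$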

A Toeplitz matrix has the form $T = [t_{r-s}]_{r,s=1}^{n}$. Let $f_1, f_2, \dots, f_k$ be real-valued and Riemann integrable on $[-\pi, \pi]$, and consider the Hermitian Toeplitz matrices

$$T_n^{(\ell)} = [t_{r-s}^{(\ell)}]_{r,s=1}^{n}, \quad\text{where}\quad t_r^{(\ell)} = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-irx} f_\ell(x)\,dx \quad\text{for } r = 0, \pm 1, \pm 2, \dots$$

and $1 \le \ell \le k$. In standard terminology, $f_\ell$ is the symbol of the family $\{T_n^{(\ell)}\}_{n=1}^{\infty}$. Let

$$T_n = w_1 T_n^{(1)} + w_2 T_n^{(2)} + \cdots + w_k T_n^{(k)},$$

where $w_1, w_2, \dots, w_k$ are real numbers. Thus, $\{T_n\}_{n=1}^{\infty}$ is a family of Hermitian Toeplitz matrices and

$$f = w_1 f_1 + w_2 f_2 + \cdots + w_k f_k \tag{1}$$

is its symbol.
Although there is in general no apparent relationship between the eigenvalues

$$\lambda_{1n}^{(\ell)} \le \lambda_{2n}^{(\ell)} \le \cdots \le \lambda_{nn}^{(\ell)} \tag{2}$$

of $T_n^{(\ell)}$, $1 \le \ell \le k$, and the eigenvalues

$$\lambda_{1n} \le \lambda_{2n} \le \cdots \le \lambda_{nn} \tag{3}$$

of $T_n$, we will show that if

$$w_\ell > 0 \ \text{ for } 1 \le \ell \le k, \qquad w_1 + w_2 + \cdots + w_k = 1, \tag{4}$$

and

$$\mu_{rn} = w_1\lambda_{rn}^{(1)} + w_2\lambda_{rn}^{(2)} + \cdots + w_k\lambda_{rn}^{(k)}, \tag{5}$$

http://dx.doi.org/10.4169/amer.math.monthly.121.10.917
MSC: Primary 15A18, Secondary 15A51; 15B05

December 2014] NOTES 917


then

$$\lim_{n\to\infty} \frac{1}{n} \sum_{r=1}^{n} |\lambda_{rn} - \mu_{rn}| = 0 \tag{6}$$

under suitable conditions on $f_1, f_2, \dots, f_k$. Thus, a convex combination of the eigenvalues of $T_n^{(1)}, T_n^{(2)}, \dots, T_n^{(k)}$ approximates the eigenvalues of the same convex combination of $T_n^{(1)}, T_n^{(2)}, \dots, T_n^{(k)}$ sufficiently well that the average absolute error tends to zero as $n \to \infty$.
The motivation for the conditions on f 1 , f 2 , . . . , f k begins with Szegö’s distribution
theorem. Suppose h is real-valued and Riemann integrable on [−π, π],

−∞ < a ≤ h(x) ≤ b < ∞ for − π ≤ x ≤ π,

and the sets

$$L_\epsilon = \{x \in [-\pi, \pi] : a < h(x) < a + \epsilon\}$$

and

$$U_\epsilon = \{x \in [-\pi, \pi] : b - \epsilon < h(x) < b\}$$

have positive Lebesgue measure for all $\epsilon > 0$; thus, a and b are the essential lower and upper bounds of h. We say that $x_0$ is an essential minimum point of h if $N(x_0) \cap [-\pi, \pi] \cap L_\epsilon$ has positive measure for every neighborhood $N(x_0)$ of $x_0$ and every $\epsilon > 0$, or an essential maximum point if $N(x_0) \cap [-\pi, \pi] \cap U_\epsilon$ has positive measure for every $N(x_0)$ and $\epsilon > 0$.
Let

γ1n ≤ γ2n ≤ · · · ≤ γnn

be the eigenvalues of the Hermitian Toeplitz matrix

$$H_n = [h_{r-s}]_{r,s=1}^{n}, \quad\text{where}\quad h_r = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-irx} h(x)\,dx \quad\text{for } r = 0, \pm 1, \pm 2, \dots.$$

From Szegö’s distribution theorem [1, p. 62],

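$$a \le \gamma_{1n} \le \gamma_{2n} \le \cdots \le \gamma_{nn} \le b \quad\text{for } n \ge 1,$$

$$\lim_{n\to\infty} \gamma_{1n} = a, \qquad \lim_{n\to\infty} \gamma_{nn} = b,$$

and

$$\lim_{n\to\infty} \frac{1}{n} \sum_{r=1}^{n} G(\gamma_{rn}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} G(h(x))\,dx \quad\text{for all } G \in C[a, b].$$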

Given Szegö’s result and f as in (1), we assume that $f_1, f_2, \dots, f_k$ have the same essential minimum and maximum a and b, so (1) and (4) imply that $a \le f(x) \le b$ for all x in $[-\pi, \pi]$. The essential minimum and maximum of f are a and b if and only if $f_1, f_2, \dots, f_k$ have at least one essential minimum point and one essential maximum point in common, which we also assume.
Our final assumption stems from the following lemma, first stated in more general
form in [2]. See [3] and [4] for expository discussions of this special case.

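Lemma 1. If

$$-\infty < a \le \alpha_{1n} \le \alpha_{2n} \le \cdots \le \alpha_{nn} \le b < \infty$$

and

$$a \le \beta_{1n} \le \beta_{2n} \le \cdots \le \beta_{nn} \le b \quad\text{for } n \ge 1,$$

then the following statements are equivalent:

$$\lim_{n\to\infty} \frac{1}{n} \sum_{r=1}^{n} \bigl(G(\alpha_{rn}) - G(\beta_{rn})\bigr) = 0 \quad\text{for all } G \in C[a, b];$$

$$\lim_{n\to\infty} \frac{1}{n} \sum_{r=1}^{n} |\alpha_{rn} - \beta_{rn}| = 0;$$

$$\lim_{n\to\infty} \frac{1}{n} \sum_{r=1}^{n} |G(\alpha_{rn}) - G(\beta_{rn})| = 0 \quad\text{for all } G \in C[a, b].$$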

For reasons that will soon become clear, Lemma 1 leads us to assume that

$$(f_\ell(x) - f_\ell(y))(f_m(x) - f_m(y)) \ge 0 \quad\text{for } -\pi \le x, y \le \pi \text{ and } 1 \le \ell, m \le k. \tag{7}$$

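Now we can prove (6). From Szegö’s theorem,

$$\lim_{n\to\infty} \frac{1}{n} \sum_{r=1}^{n} G(\lambda_{rn}^{(\ell)}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} G(f_\ell(x))\,dx \quad\text{for all } G \in C[a, b] \tag{8}$$

and $1 \le \ell \le k$, and

$$\lim_{n\to\infty} \frac{1}{n} \sum_{r=1}^{n} G(\lambda_{rn}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} G(f(x))\,dx \quad\text{for all } G \in C[a, b]. \tag{9}$$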

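Let

$$x_{rn} = \left(\frac{2(r-1)}{n} - 1\right)\pi \quad\text{for } 1 \le r \le n.$$

From the definition of the Riemann integral,

$$\lim_{n\to\infty} \frac{1}{n} \sum_{r=1}^{n} G(f_\ell(x_{rn})) = \frac{1}{2\pi}\int_{-\pi}^{\pi} G(f_\ell(x))\,dx \quad\text{for all } G \in C[a, b], \tag{10}$$

where $1 \le \ell \le k$ and

$$\lim_{n\to\infty} \frac{1}{n} \sum_{r=1}^{n} G(f(x_{rn})) = \frac{1}{2\pi}\int_{-\pi}^{\pi} G(f(x))\,dx \quad\text{for all } G \in C[a, b]. \tag{11}$$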
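As a quick numerical illustration of (10) and (11), the averages over the sample points $x_{rn}$ can be compared with a known integral. This is a sketch with hypothetical test data of our own choosing: $f(x) = \cos x$, so $[a, b] = [-1, 1]$, and $G(t) = t^2$, for which $\frac{1}{2\pi}\int_{-\pi}^{\pi} \cos^2 x\,dx = 1/2$. The helper names are illustrative only.

```python
import math

def sample_points(n):
    """Return x_{rn} = (2(r-1)/n - 1) * pi for r = 1, ..., n."""
    return [(2 * (r - 1) / n - 1) * math.pi for r in range(1, n + 1)]

def riemann_average(G, f, n):
    """Return (1/n) * sum over r of G(f(x_{rn})), the left side of (10)/(11)."""
    return sum(G(f(x)) for x in sample_points(n)) / n

# Assumed example: f = cos, G(t) = t^2; the limiting value is 1/2.
approx = riemann_average(lambda t: t * t, math.cos, 1000)
```

For this smooth periodic integrand the average already agrees with 1/2 to rounding error; for a general Riemann integrable $f$ one should only expect convergence as $n \to \infty$.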

From (7), for each $n \ge 2$ there is a single permutation $\sigma_n$ of $\{1, 2, \dots, n\}$ such that

$$f_\ell(x_{\sigma_n(1),n}) \le f_\ell(x_{\sigma_n(2),n}) \le \cdots \le f_\ell(x_{\sigma_n(n),n}) \quad\text{for } 1 \le \ell \le k. \tag{12}$$

This, (1), and (4) imply that

$$f(x_{\sigma_n(1),n}) \le f(x_{\sigma_n(2),n}) \le \cdots \le f(x_{\sigma_n(n),n}). \tag{13}$$
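The role of assumption (7) in (12) and (13) can be seen on a small example: when the functions are similarly ordered, one permutation sorts all of them and every convex combination. The sketch below uses hypothetical functions of our own choosing, $f_1(t) = t^2$ and $f_2(t) = t^4$, evaluated at the rescaled points $t = x_{rn}/\pi$ so that the arithmetic stays exact; the weights $1/3$ and $2/3$ are likewise only an illustration.

```python
from fractions import Fraction

def scaled_points(n):
    """Return x_{rn}/pi = 2(r-1)/n - 1 for r = 1..n as exact fractions."""
    return [Fraction(2 * (r - 1), n) - 1 for r in range(1, n + 1)]

def sorting_permutation(f, ts):
    """Indices that list the points ts in nondecreasing order of f."""
    return sorted(range(len(ts)), key=lambda i: f(ts[i]))

def is_nondecreasing(values):
    return all(u <= v for u, v in zip(values, values[1:]))

# Assumed similarly ordered pair: (f1(s) - f1(t)) * (f2(s) - f2(t)) >= 0.
f1 = lambda t: t ** 2
f2 = lambda t: t ** 4
w1, w2 = Fraction(1, 3), Fraction(2, 3)
f = lambda t: w1 * f1(t) + w2 * f2(t)

ts = scaled_points(12)
sigma = sorting_permutation(f1, ts)     # permutation chosen from f1 alone

# The same permutation sorts f2 and the convex combination f as well.
sorted_by_sigma = {
    name: is_nondecreasing([g(ts[i]) for i in sigma])
    for name, g in (("f1", f1), ("f2", f2), ("f", f))
}
```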

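Rearranging the terms on the left sides of (10) and (11) yields

$$\lim_{n\to\infty} \frac{1}{n} \sum_{r=1}^{n} G\bigl(f_\ell(x_{\sigma_n(r),n})\bigr) = \frac{1}{2\pi}\int_{-\pi}^{\pi} G(f_\ell(x))\,dx \quad\text{for all } G \in C[a, b], \tag{14}$$

where $1 \le \ell \le k$, and

$$\lim_{n\to\infty} \frac{1}{n} \sum_{r=1}^{n} G\bigl(f(x_{\sigma_n(r),n})\bigr) = \frac{1}{2\pi}\int_{-\pi}^{\pi} G(f(x))\,dx \quad\text{for all } G \in C[a, b]. \tag{15}$$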

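From (8) and (14),

$$\lim_{n\to\infty} \frac{1}{n} \sum_{r=1}^{n} \Bigl(G\bigl(\lambda_{rn}^{(\ell)}\bigr) - G\bigl(f_\ell(x_{\sigma_n(r),n})\bigr)\Bigr) = 0 \quad\text{for all } G \in C[a, b],$$

where $1 \le \ell \le k$, so (2), (12), and Lemma 1 imply that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{r=1}^{n} \bigl|\lambda_{rn}^{(\ell)} - f_\ell(x_{\sigma_n(r),n})\bigr| = 0 \quad\text{for } 1 \le \ell \le k. \tag{16}$$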

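From (9) and (15),

$$\lim_{n\to\infty} \frac{1}{n} \sum_{r=1}^{n} \bigl(G(\lambda_{rn}) - G(f(x_{\sigma_n(r),n}))\bigr) = 0 \quad\text{for all } G \in C[a, b],$$

so (3), (13), and Lemma 1 imply that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{r=1}^{n} |\lambda_{rn} - f(x_{\sigma_n(r),n})| = 0. \tag{17}$$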

From (1), (4), and (5),

$$\mu_{rn} - f(x_{\sigma_n(r),n}) = \sum_{\ell=1}^{k} w_\ell \bigl(\lambda_{rn}^{(\ell)} - f_\ell(x_{\sigma_n(r),n})\bigr),$$

so (16) implies that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{r=1}^{n} \bigl|\mu_{rn} - f(x_{\sigma_n(r),n})\bigr| = 0.$$

This and (17) imply (6), which completes the proof.


Incidentally, (6) and Lemma 1 imply that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{r=1}^{n} |G(\lambda_{rn}) - G(\mu_{rn})| = 0 \quad\text{for all } G \in C[a, b].$$

It is important to note that this paper is about a very special case of Szegö’s distribution theorem, which, in its original formulation, is about families of Hermitian Toeplitz matrices $T_n = [t_{r-s}]_{r,s=1}^{n}$, where $\{t_r\}_{r=-\infty}^{\infty}$ are the Fourier coefficients of a real-valued (not necessarily bounded) Lebesgue integrable function on $[-\pi, \pi]$. This theorem and its numerous extensions comprise part of the core of modern operator theory. Although the arguments in this paper use the properties of the Riemann integral in an essential way, it seems reasonable to hope that more knowledgeable investigators will be motivated to extend our result to a more general setting.

REFERENCES

1. U. Grenander, G. Szegö, Toeplitz Forms and Their Applications, Univ. of California Press, Berkeley and
Los Angeles, 1958.
2. W. F. Trench, Absolute equal distribution of families of finite sets, Linear Algebra Appl. 367 (2003) 131–
146, http://dx.doi.org/10.1016/s0024-3795(02)00597-9.
3. W. F. Trench, Simplification and strengthening of Weyl’s definition of equal distribution of two families of
finite sets, Cubo. A Mathematical Journal 06 no. 3 (2004) 47–54.
4. W. F. Trench, An elementary view of Weyl’s theory of equal distribution, Amer. Math. Monthly 119 no. 10
(2012) 852–861, http://dx.doi.org/10.4169/amer.math.monthly.119.10.852.

Mathematics Department, Trinity University, San Antonio, TX 78212


wtrench@trinity.edu

100 Years ago this Month in The American Mathematical Monthly


Edited by Vadim Ponomarenko
A large volume entitled “List of Prime Numbers from 1 to 10,006,721” by Pro-
fessor D. N. Lehmer, University of California, was recently published by the
Carnegie Institution of Washington. The last four digits of the primes are given
in the body of the table and the remaining digits are at the top and bottom of the
columns. Five thousand primes are listed on each page, and the table gives the
rank of any prime in the series of primes. The same precautions were taken in
printing these tables as were observed in printing the factor tables for the first
ten million numbers, which were prepared in 1909 by the same author under the
auspices of the Carnegie Institution. By the publication of these tables Professor
Lehmer has rendered an important service to mathematics, in view of their great
accuracy and convenient arrangement.
—Excerpted from “Notes and News” 21 (1914), 345.



Coset Intersection Graphs for Groups
Jack Button, Maurice Chiodo, and Mariano Zeron-Medina Laris

Abstract. Let H, K be subgroups of G. We show that the intersection properties of left cosets
of H and right cosets of K exhibit considerable symmetry, allowing us to prove a generaliza-
tion of Hall’s theorem for transversals.

If H and K are subgroups of G, then G can be partitioned as the disjoint union of all
left cosets of H , as well as the disjoint union of all right cosets of K . But how do these
two partitions of G intersect each other?

Definition 1. Let H be a subgroup of G (written H < G). A left transversal for H in


G is a choice of exactly one representative from each left coset of H . Denote such a
collection by {li }i∈I . A right transversal for H in G is defined in an analogous fashion.
A left-right transversal for H is a set S that is simultaneously a left transversal, and a
right transversal, for H in G.

A useful tool for studying the way left and right cosets interact, and obtaining
transversals, is the coset intersection graph that we introduce as follows.

Definition 2. Let H, K < G. We define the coset intersection graph $\Gamma_{H,K}^{G}$ to be a graph with vertex set consisting of all left cosets of H ($\{l_i H\}_{i \in I}$) together with all right cosets of K ($\{K r_j\}_{j \in J}$), where I, J are index sets. If a left coset of H and right coset of K correspond, they are still included twice. Edges (undirected) are included whenever any two of these cosets intersect, and an edge aH − Kb corresponds to the nonempty set $aH \cap Kb$.

Observing that left (respectively, right) cosets do not intersect, we see that $\Gamma_{H,K}^{G}$ is a bipartite graph, split between $\{l_i H\}_{i \in I}$ and $\{K r_j\}_{j \in J}$.
For H a subgroup of a finite group G, the existence of a left-right transversal is well
known, sometimes presented as the following application of Hall’s marriage theorem
[4]. A set of k left cosets of H contains a total of k|H | elements, and these cannot fit
into fewer than k right cosets of H . It follows that every set of left cosets of H meets
at least as many right cosets of H . Hence, by Hall’s theorem there is a matching on the
bipartite graph $\Gamma_{H,H}^{G}$ and thus a left-right transversal by taking one element from each
edge in this matching.
This paper shows that in fact a much stronger result is true; we can completely
describe the way that left and right cosets of H intersect, without any need for Hall’s
theorem but instead by studying and applying the properties of the coset intersection
graph. We begin this now.

Theorem 3. $\Gamma_{H,K}^{G}$ is always a disjoint union of complete bipartite graphs.

Proof. We first show that for a, b, c, d ∈ G, if aH − Kb − cH − Kd is a path in $\Gamma_{H,K}^{G}$, then there is an edge aH − Kd. Note that there exist $h_1, h_2, h_3 \in H$ and $k_1, k_2, k_3 \in K$ such that $a h_1 = k_1 b$, $k_2 b = c h_2$, $c h_3 = k_3 d$. Rearranging gives $c = k_3 d h_3^{-1}$, so $b = k_2^{-1} k_3 d h_3^{-1} h_2$, so $a = k_1 k_2^{-1} k_3 d h_3^{-1} h_2 h_1^{-1}$ and thus $a h_1 h_2^{-1} h_3 = k_1 k_2^{-1} k_3 d$. Hence, aH − Kd as required.

http://dx.doi.org/10.4169/amer.math.monthly.121.10.922
MSC: Primary 20E99

Take any $l_i H$ and some $K r_j$ in the connected component of $l_i H$ in $\Gamma_{H,K}^{G}$; there must be at least one such $K r_j$. We show $l_i H$ and $K r_j$ are connected by an edge. For if not, then there must be at least one finite path of length > 1 connecting them; take a minimal such path γ from $l_i H$ to $K r_j$. Then γ begins with $l_i H - Ka - bH - Kc - \dots$, where $Ka \ne K r_j$. But by the previous remark, $l_i H$ and $Kc$ must be joined by an edge, contradicting the minimality of γ. So $l_i H$ and $K r_j$ are joined by an edge for every $K r_j$ in the connected component of $l_i H$.

When considering subgroups of finite size, the graph $\Gamma_{H,K}^{G}$ exhibits greater symmetry. Denote the complete bipartite graph on (s, t) vertices by $K_{s,t}$.

Theorem 4. Let H, K < G. Suppose that |H| = m and |K| = n. Then the graph $\Gamma_{H,K}^{G}$ is a collection of disjoint, finite, complete bipartite graphs, where each component is of the form $K_{s_i,t_i}$ with $s_i/t_i = n/m$.

Proof. Take a connected component of $\Gamma_{H,K}^{G}$. From Theorem 3, this must be finite as |H| and |K| are finite; a (finite) left coset of H cannot intersect infinitely many (finite) right cosets of K and vice-versa. So this component is isomorphic to $K_{s,t}$, with vertices given by s left cosets of H and t right cosets of K. Thus, in G, the disjoint union of these s left cosets must be set-wise equal to the disjoint union of these t right cosets. So s|H| = t|K|, and hence, s/t = n/m.
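Theorems 3 and 4 are easy to test by brute force on a small group. The following sketch builds the coset intersection graph for a hypothetical choice of our own, $G = S_4$ acting on $\{0, 1, 2, 3\}$, $H$ generated by a 3-cycle and $K$ generated by a transposition, and checks that every component is complete bipartite with $s|H| = t|K|$. All function names are illustrative, and permutations are stored as tuples of images.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'q first, then p' on {0, ..., n-1}."""
    return tuple(p[q[i]] for i in range(len(q)))

def closure(gens):
    """Subgroup generated by gens (in a finite group, products suffice)."""
    identity = tuple(range(len(gens[0])))
    elems, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(g, s)
            if h not in elems:
                elems.add(h)
                frontier.append(h)
    return elems

def coset_graph(G, H, K):
    """Left cosets gH, right cosets Kg, and an edge whenever two of them meet."""
    left = {frozenset(compose(g, h) for h in H) for g in G}
    right = {frozenset(compose(k, g) for k in K) for g in G}
    edges = {(L, R) for L in left for R in right if L & R}
    return left, right, edges

def components(left, edges):
    """Connected components, each returned as (left cosets, right cosets)."""
    comps, unseen = [], set(left)
    while unseen:
        ls, rs = {unseen.pop()}, set()
        changed = True
        while changed:
            changed = False
            for L, R in edges:
                if L in ls and R not in rs:
                    rs.add(R)
                    changed = True
                if R in rs and L not in ls:
                    ls.add(L)
                    changed = True
        unseen -= ls
        comps.append((ls, rs))
    return comps

G = list(permutations(range(4)))
H = closure([(1, 2, 0, 3)])          # assumed example: cyclic of order 3
K = closure([(1, 0, 2, 3)])          # assumed example: cyclic of order 2
left, right, edges = coset_graph(G, H, K)
comps = components(left, edges)

all_complete = all((L, R) in edges for ls, rs in comps for L in ls for R in rs)
ratio_ok = all(len(ls) * len(H) == len(rs) * len(K) for ls, rs in comps)
shapes = sorted((len(ls), len(rs)) for ls, rs in comps)
```

With these choices `shapes` lists the pairs $(s_i, t_i)$, so the ratio $s_i/t_i = n/m$ of Theorem 4 can be read off directly.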

Corollary 5. Let G be a group, and H, K < G. Suppose that |H | = m and |K | = n,


where m ≥ n. Then there exists a set T ⊆ G that is a left transversal for H in G and
that can be extended to a right transversal for K in G. If H = K in G, then T becomes
a left-right transversal for H .

Proof. For each complete bipartite component $K_{s_i,t_i}$ of $\Gamma_{H,K}^{G}$, choose a maximum matching. That is, a matching that uses the maximum number of vertices. Such a matching will necessarily use all $s_i$ vertices from the H-side of $K_{s_i,t_i}$ and $s_i$ vertices from the K-side. Choose an element from each edge in this matching, recalling that an edge aH − Kb corresponds to the nonempty subset $aH \cap Kb$. Call the collection of these chosen elements T. Then T contains precisely one element from each left coset of H, making T a left transversal for H. Also, no two elements of T lie in the same right coset of K, thus T extends to a right transversal for K.

Note that with this we have proved the existence of a left-right transversal for a finite subgroup H of G without the use of Hall’s theorem.
Under the hypothesis of Theorem 4, we see that sets of $s_i$ left cosets of H completely intersect sets of $t_i$ right cosets of K, with $s_i/t_i$ constant over i. With this in mind, another way of visualizing $\Gamma_{H,K}^{G}$ is by the following simultaneous double-partitioning of G: draw left cosets of H as columns and right cosets of K as rows, partitioning G into irregular “chessboards” denoted $C_i$, each with edge ratio n : m. Each chessboard $C_i$ corresponds to the connected component $K_{s_i,t_i}$ of $\Gamma_{H,K}^{G}$, and individual tiles in $C_i$ correspond to the nonempty intersection of a left coset of H and a right coset of K (i.e., edges in $K_{s_i,t_i}$). Corollary 5 would then follow by choosing one element from each tile on the leading diagonals of the $C_i$’s. We give an example of this here.



Example. Take the finite group $G := S_4$ and the subgroup $H := \langle (1\,2\,3) \rangle \cong C_3$. Then G has the following H-coset decompositions:

left cosets: {eH, (1 2)H, (1 4)H, (2 4)H, (3 4)H, (1 2 4)H, (1 3 4)H, (1 4 2)H },
right cosets: {H e, H (1 2), H (1 4), H (2 4), H (3 4), H (1 4 2), H (1 4 3), H (1 2 4)}.

Note that, for convenience, the right transversal for H was chosen by taking the inverse
of the elements in the left transversal for H ; this can always be done.
The coset intersection graph $\Gamma_{H,H}^{G}$, along with its corresponding chessboards, can be seen in Figure 1 below. From these, we can easily read off a left-right transversal for H in G, for example:

{e, (1 2), (1 4), (2 4), (3 4), (1 3)(2 4), (1 2)(3 4), (1 4)(2 3)}.

Figure 1. Coset intersection graph $\Gamma_{C_3,C_3}^{S_4}$ and corresponding chessboards

We now compute the sizes of the complete bipartite components of $\Gamma_{H,K}^{G}$. Recall that if H < G, then the core of H, core(H), is the intersection of all conjugates of H in G: $\mathrm{core}(H) := \bigcap_{g \in G} g^{-1} H g$. It is contained in H, normal in G, and will be of finite index in G whenever H is. We denote the index of H in G by |G : H|.

Proposition 6. Let H, K < G and g ∈ G, with H and K both finite. Then the number of right cosets of K intersecting gH, denoted $M_g$, is given by

$$M_g = \frac{|H|}{|gHg^{-1} \cap K|}.$$

A symmetric result applies for the number of left cosets of H intersecting K g.

Proof. Let N := core(H ∩ K). We show that if $gH \cap Ka \ne \emptyset$ for some a ∈ G, then the number of cosets of N in $gH \cap Ka$ is the same as the number in $gHg^{-1} \cap K$, independent of a, and this number will be finite. So, as $gH \cap Ka \ne \emptyset$, we must have gh = ka for some h ∈ H and k ∈ K. As N is normal, we have that the number of cosets of N in $gH \cap Ka$ is the same as the number in $gHa^{-1} \cap K = gHh^{-1}g^{-1}k \cap K = gHg^{-1}k \cap K$, which is the same as the number in $gHg^{-1} \cap Kk^{-1} = gHg^{-1} \cap K$. Observe that this number will be $|gHg^{-1} \cap K : N|$. This immediately gives

(number of cosets of N in gH) = (number of cosets of N in $gHg^{-1} \cap K$) · $M_g$.

Thus, $M_g = \dfrac{|H : N|}{|gHg^{-1} \cap K : N|}$, and the proposition now follows.
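The count in Proposition 6 can be confirmed exhaustively on a small example. In this sketch (our own hypothetical choice) $G = S_4$ acts on $\{0, 1, 2, 3\}$, $H$ is generated by a 3-cycle, and $K$ is the copy of $S_3$ fixing the point 3, so that $gHg^{-1} \cap K$ is sometimes trivial and sometimes all of $gHg^{-1}$. The helper names are illustrative only.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'q first, then p' on {0, ..., n-1}."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def closure(gens):
    """Subgroup generated by gens (in a finite group, products suffice)."""
    identity = tuple(range(len(gens[0])))
    elems, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(g, s)
            if h not in elems:
                elems.add(h)
                frontier.append(h)
    return elems

G = list(permutations(range(4)))
H = closure([(1, 2, 0, 3)])                  # assumed: cyclic of order 3
K = closure([(1, 2, 0, 3), (1, 0, 2, 3)])    # assumed: S_3 fixing the point 3

def right_cosets_meeting(g):
    """Number of right cosets Kx that intersect the left coset gH."""
    gH = {compose(g, h) for h in H}
    return len({frozenset(compose(k, x) for k in K) for x in gH})

def predicted(g):
    """|H| / |g H g^{-1} ∩ K|, the value given by Proposition 6."""
    conjugate = {compose(compose(g, h), inverse(g)) for h in H}
    return len(H) // len(conjugate & K)

formula_holds = all(right_cosets_meeting(g) == predicted(g) for g in G)
values = {predicted(g) for g in G}
```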

By taking the quotient of G by core(H ∩ K), our previous results can be strengthened. Instead of finite subgroups, we consider finite index subgroups. Recall that the intersection of two finite index subgroups is again finite index.

Theorem 7. Let H, K < G. Suppose that |G : H| = n and |G : K| = m. Then the graph $\Gamma_{H,K}^{G}$ is a collection of disjoint, finite, complete bipartite graphs, where each component is of the form $K_{s_i,t_i}$ with $s_i/t_i = n/m$.

Proof. Take core(H ∩ K), which has finite index in G. Suppose |G : core(H ∩ K)| = l. Consider the finite quotient G/core(H ∩ K). Set $\overline{H}$ and $\overline{K}$ to be H/core(H ∩ K) and K/core(H ∩ K), respectively. Since |G : core(H ∩ K)| = |G : H| · |H : core(H ∩ K)|, we have that $|\overline{H}| = l/n$. Similarly, $|\overline{K}| = l/m$. Now apply Theorem 4 to $\overline{H}, \overline{K} < G/\mathrm{core}(H \cap K)$, observing that $aH \cap Kb \ne \emptyset$ if and only if $\overline{a}\,\overline{H} \cap \overline{K}\,\overline{b} \ne \emptyset$, for any a, b ∈ G.

Corollary 8. Let H, K < G. Suppose that |G : H | = n and |G : K | = m, where


m ≥ n. Then there exists a set T ⊆ G that is a left transversal for H in G and that
can be extended to a right transversal for K in G. If H = K in G, then T becomes a
left-right transversal for H .

Proof. Consider the finite quotient G/core(H ∩ K) and set $\overline{H}$ and $\overline{K}$ to be H/core(H ∩ K) and K/core(H ∩ K), respectively. Apply Corollary 5 to $\overline{H}, \overline{K} < G/\mathrm{core}(H \cap K)$, forming a left transversal $\overline{T}$ of $\overline{H}$ that extends to a right transversal of $\overline{K}$. Now choose one preimage of each element of $\overline{T}$ from G and call this set T, which has all the desired properties.

Proposition 9. Let H, K < G and g ∈ G, with H and K both of finite index. Then the number of right cosets of K intersecting gH, denoted $M_g$, is given by

$$M_g = \frac{|G : gHg^{-1} \cap K|}{|G : H|}.$$

A symmetric result applies for the number of left cosets of H intersecting K g.

Proof. Follow the proof of Proposition 6 again to obtain $M_g = \dfrac{|H : N|}{|gHg^{-1} \cap K : N|}$ (where N := core(H ∩ K)). The proposition now follows immediately.

All of our results can be derived from the work of Ore [7], who makes use of double cosets: partitions of G into sets of the form KgH (where H, K < G). Observe that every right coset of K in the double coset KgH meets every left coset of H in KgH. To see this, note that a right coset of K in KgH has the form Kgh, and a left coset of H in KgH has the form kgH. Thus Kgh and kgH have common element kgh. So by Theorem 3, the (complete bipartite) connected components of $\Gamma_{H,K}^{G}$ correspond exactly to the double cosets of G; the union of the left (equivalently, right) cosets in the vertex set of such a component is exactly one double coset of G. From this point of view, Propositions 6 and 9 reduce to the standard fact about the number of right cosets of K and left cosets of H that are contained in the double coset KgH, originally due to Frobenius [5, Ch.II §16 Theorem 6]. The symmetry exhibited by the coset intersection graph is not immediately obvious from Ore’s use of terminology, and our exposition is more direct.
A Historical Remark (with contributions from Warren Dicks and Jack Schmidt). The results in this paper have a somewhat piecemeal historical origin. A weaker version of Corollary 5, that a subgroup of a finite group always has a left-right transversal, was published in 1910 by Miller [6]. In 1913, Chapman [2] proved the same result; he then learned of Miller’s proof and in 1914 issued a corrigendum [3]. In 1927, Scorza [8] proved Corollary 5 for two separate subgroups H, K but still taking G to be finite (the first time such a proof used double cosets). By the time of Zassenhaus’ text [10] in 1937, Corollary 8 was known for finite index subgroups of infinite groups (the first time such a proof used Hall’s theorem). In 1941 Shü [9] addressed this problem in a way that leaves us somewhat confused. In 1958, Ore [7] expanded significantly on such ideas and gave what is to date the most complete treatment of these, as well as his own historical account.

ACKNOWLEDGMENTS. We would like to thank Imre Leader for suggesting that we prepare this note
for the MONTHLY, the Department of Pure Mathematics and Mathematical Statistics at the University of
Cambridge for hosting us while we carried out this research, and Warren Dicks for showing such a keen
interest in the historical context of this material. The second author was partially supported by the Italian FIRB
“Futuro in Ricerca” project RBFR10DGUA 002 at the University of Milan. The main results in this paper
were first announced by the authors in the extended abstract [1].

REFERENCES

1. J. Button, M. Chiodo, M. Zeron-Medina Laris, Coset intersection graphs, and transversals as generat-
ing sets for finitely generated groups, Edited by J. González-Meneses et al., Extended Abstracts Fall
2012, Trends in Mathematics, Vol. 1, 29–34, Birkhäuser, Basel, 2014, http://dx.doi.org/10.1007/
978-3-319-05488-9_5.
2. H. Chapman, A note on the elementary theory of groups of finite order, Messenger of Math. 42 (1913)
132–134.
3. , On a note on the elementary theory of groups of finite order, Messenger of Math. 43 (1914) 85.
4. P. Hall, On representatives of subsets, J. Lond. Math. Soc. 10 (1935) 26–30.
5. W. Lederman, Introduction to Group Theory. Oliver and Boyd, London, 1973.
6. G. Miller, On a method due to Galois, Quart. J. Math. Oxford Ser. 41 (1910) 382–384.
7. O. Ore, On coset representatives in groups, Proc. Amer. Math. Soc. 9 no. 4 (1958) 665–670.
8. G. Scorza, A proposito di un teorema del Chapman, Boll. Unione Mat. Ital. 6 (1927) 1–6.
9. S. Shü, On the common representative system of residue classes of infinite groups, J. Lond. Math. Soc.
16 (1941) 101–104.
10. H. Zassenhaus, Lehrbuch der Gruppentheorie (German). Teubner, Berlin, 1937.

Selwyn College, University of Cambridge, Grange Road, Cambridge, CB3 9DQ, UK


J.O.Button@dpmms.cam.ac.uk

Mathematics Department, University of Neuchâtel, Rue Emile-Argand 11, Neuchâtel, 2000, Switzerland
maurice.chiodo@unine.ch

31 Mariner’s Way, Cambridge, CB4 1BN, UK


marianozeron@gmail.com

Complex Descartes Circle Theorem
Sam Northshield

Abstract. We present a short proof of the Descartes circle theorem on the “curvature-centers”
of four mutually tangent circles. Key to the proof is associating an octahedral configuration of
spheres to four mutually tangent circles. We also prove an analog for spheres.

It can be traced back to at least Descartes that four mutually tangent circles have cur-
vatures (reciprocals of radii) satisfying the relation

$$(a + b + c + d)^2 = 2(a^2 + b^2 + c^2 + d^2). \tag{1}$$

This MONTHLY has published several papers concerning this fascinating topic: [1, 3, 4, 5, 6]. It was only in 2001 [3] that it was noticed, and proved, that the “curvature-centers” (curvature times center where the center is considered a complex number) satisfy the same relation. We present a short proof of this result (Theorem 1) and an analogous version for spheres (Corollary 1).

For the purposes of this paper, a sphere will always be contained in the half-space
C × [0, ∞) and be tangent to the complex plane. Let S(z, r ) denote the sphere with
radius r tangent to C at z. It is obvious that S(z, r ) and S(w, s) are tangent to each
other if and only if

$$|z - w|^2 = 4rs.$$

It is also immediate that given any three points $z_1, z_2, z_3 \in \mathbb{C}$, there are unique numbers $r_1, r_2, r_3$ such that the spheres $S(z_i, r_i)$ are mutually tangent. In particular, if {i, j, k} = {1, 2, 3} then

$$r_i = \frac{|z_i - z_j| \cdot |z_i - z_k|}{2|z_j - z_k|}. \tag{2}$$
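Formula (2) is easy to check against the tangency condition $|z - w|^2 = 4rs$. The sketch below does so for three hypothetical points of our own choosing; the function name `tangent_radii` is illustrative only.

```python
def tangent_radii(z1, z2, z3):
    """Radii r_1, r_2, r_3 from formula (2) for three distinct complex points."""
    pts = (z1, z2, z3)
    radii = []
    for i in range(3):
        j, k = (i + 1) % 3, (i + 2) % 3
        radii.append(abs(pts[i] - pts[j]) * abs(pts[i] - pts[k])
                     / (2 * abs(pts[j] - pts[k])))
    return radii

# Assumed sample points in the complex plane.
points = (0 + 0j, 4 + 0j, 1 + 3j)
radii = tangent_radii(*points)

# Tangency of S(z_i, r_i) and S(z_j, r_j) means |z_i - z_j|^2 = 4 r_i r_j.
gaps = [abs(abs(points[i] - points[j]) ** 2 - 4 * radii[i] * radii[j])
        for i, j in ((0, 1), (0, 2), (1, 2))]
```

Each entry of `gaps` should vanish up to rounding, since the product $r_i r_j$ from (2) simplifies to $|z_i - z_j|^2/4$.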

We say that two circles are orthogonal if they intersect at right angles; see Figure 1.

Lemma 1. Let $C_1, C_2$ be two orthogonal circles which intersect at $w_1, w_2$ and that have curvatures $c_1, c_2$ and centers $z_1, z_2$, respectively. Let $k_1, k_2$ be the curvatures of any two tangent spheres tangent to $\mathbb{C}$ at $w_1, w_2$ respectively. Then

$$\text{(a)}\quad k_1 k_2 = \frac{4}{|w_1 - w_2|^2} = c_1^2 + c_2^2$$

and

$$\text{(b)}\quad k_1 k_2 w_1 w_2 = \frac{4 w_1 w_2}{|w_1 - w_2|^2} = c_1^2 z_1^2 + c_2^2 z_2^2.$$
http://dx.doi.org/10.4169/amer.math.monthly.121.10.927
MSC: Primary 52C26



Proof. Let $S(w, \rho)$ be any sphere tangent to both $S(w_1, \rho_1)$ and $S(w_2, \rho_2)$, where $\rho_i = 1/k_i$ for i = 1, 2. Then, by (2),

$$k_1 k_2 = \frac{2|w_2 - w|}{|w_2 - w_1||w - w_1|} \cdot \frac{2|w_1 - w|}{|w_2 - w_1||w - w_2|} = \frac{4}{|w_1 - w_2|^2}.$$

The quadrilateral $[z_1, w_1, z_2, w_2]$ in Figure 1 has area represented by both $r_1 r_2$ and by $|z_1 - z_2||w_1 - w_2|/2$. Hence,

$$c_1^2 + c_2^2 = \frac{r_1^2 + r_2^2}{(r_1 r_2)^2} = \frac{|z_1 - z_2|^2}{|z_1 - z_2|^2 |w_1 - w_2|^2/4} = \frac{4}{|w_1 - w_2|^2}$$

and so (a) is shown.

Figure 1. Two orthogonal circles. (Labels in the figure: $w_1$, $w_2$, $z_1$, $z_2$, $z$, $r_1$, $r_2$, $a$.)

Without loss of generality, $z_2 - z_1,\ i(w_2 - w_1) \in \mathbb{R}$. Let z and a be as labeled in Figure 1. In particular, let a denote half the distance between $w_1$ and $w_2$. Then, by part (a),

$$a^2 = |w_1 - w_2|^2/4 = 1/(k_1 k_2) = 1/(c_1^2 + c_2^2). \tag{3}$$

Referring again to Figure 1,

$$z_1 = z - \sqrt{r_1^2 - a^2}, \quad z_2 = z + \sqrt{r_2^2 - a^2}, \quad w_1 = z + ia, \quad w_2 = z - ia, \tag{4}$$

and so, using $c_1 = 1/r_1$ and equation (3),

$$c_1^2 \sqrt{r_1^2 - a^2} = c_1 \sqrt{1 - a^2 c_1^2} = \frac{c_1 c_2}{\sqrt{k_1 k_2}} = c_2^2 \sqrt{r_2^2 - a^2}.$$

Hence, using (4),

$$c_1^2 z_1^2 = c_1^2 z^2 - 2 z c_1^2 \sqrt{r_1^2 - a^2} + 1 - c_1^2 a^2,$$

$$c_2^2 z_2^2 = c_2^2 z^2 + 2 z c_2^2 \sqrt{r_2^2 - a^2} + 1 - c_2^2 a^2$$

and thus

$$c_1^2 z_1^2 + c_2^2 z_2^2 = (c_1^2 + c_2^2) z^2 + 1 = k_1 k_2 (z^2 + a^2) = k_1 k_2 w_1 w_2$$

which shows part (b).
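Both parts of Lemma 1 can be checked numerically in the normalized position used in the proof. The sketch below takes hypothetical radii $r_1 = 3$, $r_2 = 4$ and foot point $z = 2$ (our own choices), and uses two facts from the proof: for orthogonal circles the squared distance between centers is $r_1^2 + r_2^2$, and the area identity gives $a = r_1 r_2/|z_1 - z_2|$.

```python
import math

# Assumed data: radii of two orthogonal circles and the real point z of Figure 1.
r1, r2 = 3.0, 4.0
z = 2.0

d = math.hypot(r1, r2)          # |z_1 - z_2| for orthogonal circles
a = r1 * r2 / d                 # half of |w_1 - w_2|, from the area identity

# Positions as in (4).
z1 = complex(z - math.sqrt(r1 ** 2 - a ** 2), 0.0)
z2 = complex(z + math.sqrt(r2 ** 2 - a ** 2), 0.0)
w1, w2 = complex(z, a), complex(z, -a)

c1, c2 = 1 / r1, 1 / r2
k1k2 = 4 / abs(w1 - w2) ** 2    # product of sphere curvatures, by (2)

part_a_gap = abs(k1k2 - (c1 ** 2 + c2 ** 2))
part_b_gap = abs(k1k2 * w1 * w2 - (c1 ** 2 * z1 ** 2 + c2 ** 2 * z2 ** 2))

# Sanity check: w_1 lies on both circles.
on_circles = abs(abs(w1 - z1) - r1) < 1e-12 and abs(abs(w1 - z2) - r2) < 1e-12
```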

Lemma 2. Given three mutually tangent circles $C_1, C_2, C_3$ with curvatures $c_1, c_2, c_3$ and respective centers $z_1, z_2, z_3$, let c and z be the curvature and center respectively of the circle orthogonal to each of the three given circles. For $i \ne j \in \{1, 2, 3\}$, let $\{z_{ij}\} = C_i \cap C_j$, let $S_{ij}$ denote the unique sphere tangent at $z_{ij}$ such that all three spheres $S_{12}, S_{13}, S_{23}$ are mutually tangent, and let $k_{ij}$ denote the curvature of $S_{ij}$. Then

(a) $k_{ij} = c_i + c_j$, and

(b) $c^2 z^2 = c_1 z_1 c_2 z_2 + c_1 z_1 c_3 z_3 + c_2 z_2 c_3 z_3$.

Proof. By Coxeter [2, p. 15], $c^2 = c_1 c_2 + c_1 c_3 + c_2 c_3$. By Lemma 1,

$$c^2 + c_1^2 = k_{12} k_{13}, \quad c^2 + c_2^2 = k_{12} k_{23}, \quad c^2 + c_3^2 = k_{13} k_{23}.$$

Since $c^2 + c_1^2 = (c_1 + c_2)(c_1 + c_3)$, and similarly for the other two sums, it follows that $k_{ij} = c_i + c_j$ and part (a) is shown.

Given two tangent circles with centers z, w and respective radii r, s, it is easy to see that the tangent point is $(sz + rw)/(r + s)$. Hence,

$$z_{ij} = \frac{c_i z_i + c_j z_j}{c_i + c_j} \tag{5}$$

and so, by part (a),

$$k_{ij} z_{ij} = c_i z_i + c_j z_j.$$

By Lemma 1b,

$$c^2 z^2 + c_3^2 z_3^2 = k_{13} k_{23} z_{13} z_{23} = (c_1 z_1 + c_3 z_3)(c_2 z_2 + c_3 z_3)$$

and therefore

$$c^2 z^2 = c_1 z_1 c_2 z_2 + c_1 z_1 c_3 z_3 + c_2 z_2 c_3 z_3$$

which shows part (b).

Given four mutually tangent circles, there are six points of tangency between them (see Figure 2). At each such point z, assign a sphere tangent to the plane at z with curvature equal to the sum of the curvatures of the two circles that meet at z. By Lemma 2(a), the six spheres have disjoint interiors and any two are tangent to each other if and only if their points of tangency to the plane are on the same circle. We will refer to this collection of spheres as the octahedral arrangement of spheres associated with the four circles since the adjacency graph of these six spheres forms an octahedron.
A key idea for the proof of the main theorem is that the respective octahedral arrangements of spheres associated to $\{C_i\}$ and to $\{C_i'\}$ are the same. We now show the complex Descartes circle theorem.

Figure 2. Four circles and their duals; three circles and their duals are labeled.

Theorem 1. Given four mutually tangent circles with curvatures $c_i$ and respective centers $z_i$,

$$\Bigl(\sum c_i z_i\Bigr)^2 = 2 \sum c_i^2 z_i^2.$$

Proof. Consider a configuration of four mutually tangent circles $C_1, C_2, C_3, C_4$. We define $C_1'$ to be the circle containing the three tangency points of $C_2, C_3, C_4$, we define $C_2', C_3', C_4'$ similarly, and we let $c_i'$ and $z_i'$ denote the curvature and center of $C_i'$. These form what is called the dual configuration, see Figure 2, where $C_1'$, $C_2$, $C_3$, and $C_4$ are labeled.
For $i \ne j \in \{1, 2, 3, 4\}$, let $S_{ij}$ be the sphere tangent to the plane at $C_i \cap C_j$ with curvature $k_{ij} := c_i + c_j$. By Lemma 2(a), the spheres $\{S_{ij} : 1 \le i < j \le 4\}$ form the octahedral arrangement of spheres associated with the circles $C_1, \dots, C_4$. Similarly, let $S_{ij}'$ be the sphere tangent to the plane at $C_i' \cap C_j'$ with curvature $k_{ij}' := c_i' + c_j'$.
If {i, j, m, n} = {1, 2, 3, 4}, then $S_{ij} = S_{mn}'$ and thus $k_{ij} = k_{mn}'$. Note $z_{ij} = z_{mn}'$ and thus, by (5),

$$c_i z_i + c_j z_j = (c_i + c_j) z_{ij} = (c_m' + c_n') z_{mn}' = c_m' z_m' + c_n' z_n'.$$

For convenience, let $w_i = c_i z_i$, $w_i' = c_i' z_i'$, $K_{ij} = k_{ij} z_{ij}$, and $K_{ij}' = k_{ij}' z_{ij}'$. By Lemma 2(b), $w_4' = \sigma\sqrt{w_1 w_2 + w_1 w_3 + w_2 w_3}$ (where σ is either 1 or −1) since $C_4'$ is the incircle of the triangle connecting the centers of $C_1$, $C_2$, and $C_3$. Hence

$$2(w_1 + w_2 + w_3 - w_4) = K_{12} + K_{34} + K_{12} + K_{13} + K_{23} - K_{14} - K_{24} - K_{34}$$
$$= K_{34}' + K_{12}' + K_{34}' + K_{24}' + K_{14}' - K_{23}' - K_{13}' - K_{12}'$$
$$= 4 w_4' = 4\sigma\sqrt{w_1 w_2 + w_1 w_3 + w_2 w_3}.$$

It follows that

$$w_4 = w_1 + w_2 + w_3 - 2\sigma\sqrt{w_1 w_2 + w_1 w_3 + w_2 w_3}$$

and, since σ may be either sign, the result follows from the fact that equation (1) is equivalent to

$$d = a + b + c \pm 2\sqrt{ab + ac + bc}.$$

The complex Descartes circle theorem is a true generalization of the Descartes circle
theorem since replacing z i by (z i + z)/z in the formula and taking the limit as z goes
to infinity gives the old version.
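Theorem 1 and relation (1) can both be checked on a concrete configuration. The sketch below uses a hypothetical example of our own: a unit circle centered at 0 enclosing two circles of radius 1/2 centered at ±1/2 and a circle of radius 1/3 centered at 2i/3. We assume the usual sign convention, not stated in the text above, that the enclosing circle is given negative curvature.

```python
# Each circle is (curvature, center); the enclosing circle has curvature -1
# under the assumed sign convention.
circles = [(-1, 0j), (2, 0.5 + 0j), (2, -0.5 + 0j), (3, 2j / 3)]

# Relation (1) for the curvatures alone.
curv_lhs = sum(c for c, _ in circles) ** 2
curv_rhs = 2 * sum(c * c for c, _ in circles)

# Theorem 1 for the curvature-centers w_i = c_i * z_i.
w = [c * z for c, z in circles]
cc_lhs = sum(w) ** 2
cc_rhs = 2 * sum(x * x for x in w)
gap = abs(cc_lhs - cc_rhs)
```

Here the curvature-centers are 0, 1, −1, and 2i, so both sides of the complex relation equal −4, while both sides of (1) equal 36.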
The spheres in an octahedral arrangement associated with four mutually tan-
gent circles obeys a similar formula (the proof of which follows immediately from
Theorem 1).

Corollary 1. Given four mutually tangent circles $C_1, C_2, C_3, C_4$, and $i \neq j \in \{1, 2, 3, 4\}$,
let $\{z_{ij}\} = C_i \cap C_j$ and let $S_{ij}$ be the sphere tangent to the plane at $z_{ij}$ with
curvature $k_{ij} := c_i + c_j$. Then,

(a) \[ 2\Bigl(\sum_{i<j} k_{ij}\Bigr)^2 = 9 \sum_{i<j} k_{ij}^2 , \]

(b) \[ 2\Bigl(\sum_{i<j} k_{ij} z_{ij}\Bigr)^2 = 9 \sum_{i<j} k_{ij}^2 z_{ij}^2 . \]
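
As a numerical sanity check of Theorem 1 and Corollary 1, the sketch below uses one concrete configuration chosen for illustration (it is not taken from the article): three unit circles centered at the vertices of an equilateral triangle of side 2, together with the small circle nestled between them. It assumes all four circles are externally tangent, so that each tangency point is the curvature-weighted mean of the two centers; the function names are hypothetical.

```python
from math import sqrt

S3 = sqrt(3)
# Illustrative configuration (an assumption of this sketch): three unit circles
# plus the inner circle of radius 2/sqrt(3) - 1, i.e. curvature 3 + 2*sqrt(3).
CENTERS = [0 + 0j, 2 + 0j, 1 + S3 * 1j, 1 + (S3 / 3) * 1j]
CURVATURES = [1.0, 1.0, 1.0, 3 + 2 * S3]


def theorem_gap(c, z):
    """(sum c_i z_i)^2 - 2 sum c_i^2 z_i^2, which Theorem 1 says is zero."""
    w = [ci * zi for ci, zi in zip(c, z)]
    return sum(w) ** 2 - 2 * sum(wi ** 2 for wi in w)


def corollary_gaps(c, z):
    """Left side minus right side for parts (a) and (b) of Corollary 1."""
    pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
    k = [c[i] + c[j] for i, j in pairs]
    # Tangency point of circles i and j: curvature-weighted mean of the centers.
    zt = [(c[i] * z[i] + c[j] * z[j]) / (c[i] + c[j]) for i, j in pairs]
    kz = [kk * zz for kk, zz in zip(k, zt)]
    gap_a = 2 * sum(k) ** 2 - 9 * sum(kk ** 2 for kk in k)
    gap_b = 2 * sum(kz) ** 2 - 9 * sum(v ** 2 for v in kz)
    return gap_a, gap_b


gap_a, gap_b = corollary_gaps(CURVATURES, CENTERS)
print(abs(theorem_gap(CURVATURES, CENTERS)), abs(gap_a), abs(gap_b))
```

All three printed gaps should vanish up to floating-point rounding.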

REFERENCES

1. H. S. M. Coxeter, The problem of Apollonius, Amer. Math. Monthly 75 (1968) 5–15,
http://dx.doi.org/10.2307/2315097.
2. H. S. M. Coxeter, Introduction to Geometry. Second edition, John Wiley and Sons, New York, 1969.
3. J. Lagarias, C. Mallows, A. Wilks, Beyond the Descartes circle theorem, Amer. Math. Monthly 109 (2002)
338–361, http://dx.doi.org/10.2307/2695498.
4. D. Pedoe, On a theorem in geometry, Amer. Math. Monthly 74 (1967) 627–640.
5. P. Sarnak, Integral Apollonian packings, Amer. Math. Monthly 118 (2011) 291–306,
http://dx.doi.org/10.4169/amer.math.monthly.118.04.291.
6. J. B. Wilker, Four proofs of a generalization of the Descartes circle theorem, Amer. Math. Monthly 76
(1969) 278–282, http://dx.doi.org/10.2307/2316373.

Department of Mathematics, SUNY, Plattsburgh, NY 12901


northssw@plattsburgh.edu

The Length of an Arithmetic Progression
Represented by a Binary Quadratic Form
Pallab Kanti Dey and R. Thangadurai

Abstract. In this paper we prove that if $Q(x, y) = ax^2 + bxy + cy^2$ is an integral binary
quadratic form with a nonzero, nonsquare discriminant $d$ and if $Q$ represents an arithmetic
progression $\{kn + \ell : n = 0, 1, \dots, R - 1\}$, where $k$ and $\ell$ are positive integers, then there
are absolute constants $C_1 > 0$ and $L_1 > 0$ such that $R < C_1 (k^2 |d|)^{L_1}$. Moreover, we prove
that every nonzero integral binary quadratic form represents a nontrivial 3-term arithmetic
progression infinitely often.

Let $Q(x, y) = ax^2 + bxy + cy^2$ be an integral binary quadratic form of discriminant
$d = b^2 - 4ac \neq 0$. Recently, A. Alaca, Ş. Alaca, and K. S. Williams [1] proved that
$Q$ represents an arithmetic progression of infinite length if and only if $d$ is a perfect
square, that is, $d = m^2$ for some nonzero integer $m$. Suppose from now on that $d$ is not
a perfect square so that Q cannot represent an arithmetic progression of infinite length.
We address the question “How long can an arithmetic progression represented by Q
be?” Making use of the ideas used by Alaca, Alaca, and Williams in the proof of their
theorem [1], we obtain an upper bound for the length of any arithmetic progression
represented by Q.
We require the following result.

Proposition 1. Let $N \equiv 0 \pmod 4$ be a nonzero integer which is not a perfect square.
Then there exist absolute constants $C > 0$ and $L > 0$ for which there is a prime $p \neq 2$
satisfying
\[ p \le C|N|^{L}, \qquad \left(\frac{N}{p}\right) = -1 . \]

Proof. As $N \equiv 0 \pmod 4$ is not a perfect square there is an integer $a$ satisfying
\[ \left(\frac{N}{a}\right) = -1, \quad \text{where } 1 \le a \le |N| - 1, \]
see for example [2, p. 298]. Clearly $(a, |N|) = 1$ so, by Linnik’s theorem [5], there
are absolute constants $C > 0$ and $L > 0$ such that the least prime $p$ in the arithmetic
progression
\[ \{|N|k + a : k = 0, 1, 2, \dots\} \]
satisfies
\[ p \le C|N|^{L} . \]
Since $p$ belongs to this arithmetic progression, we have $p \equiv a \pmod{|N|}$ and, by [2,
Lemma 2.3, p. 291], we deduce
\[ \left(\frac{N}{p}\right) = \left(\frac{N}{a}\right) = -1 . \]
Finally, as $N \equiv 0 \pmod 4$ and $\left(\frac{N}{p}\right) = -1$, we see that $p \neq 2$.

http://dx.doi.org/10.4169/amer.math.monthly.121.10.932
MSC: Primary 11E25, Secondary 11E12

Remark. By a deep result of Xylouris [8], one has L ≤ 5.2. Xylouris’s work is a
refinement of that of Heath-Brown [4], who showed that L satisfies L ≤ 5.5.

In this short note we prove the following result.

Theorem 1. Let $Q(x, y) = ax^2 + bxy + cy^2$ be an integral binary quadratic form
with discriminant $d = b^2 - 4ac \neq 0$. Suppose that $d$ is not a perfect square and that
$Q$ represents an arithmetic progression $\{kn + \ell : n = 0, 1, \dots, R - 1\}$, where $k$ and $\ell$
are positive integers. Then there are absolute constants $C_1 > 0$ and $L_1 > 0$ such that
$R < C_1 (k^2 |d|)^{L_1}$.

Proof. Set $N = 4k^2 d$ so that $N$ is a nonzero integer with $N \equiv 0 \pmod 4$ which is not
a perfect square. By Proposition 1, there are absolute constants $C > 0$ and $L > 0$
for which there is a prime $p \neq 2$ satisfying
\[ p \le C|N|^{L}, \qquad \left(\frac{N}{p}\right) = -1 . \]
Hence $\left(\frac{4k^2 d}{p}\right) = -1$ and thus
\[ (d, p) = (k, p) = 1, \qquad \left(\frac{d}{p}\right) = -1 . \]
If $p \mid ac$, then
\[ -1 = \left(\frac{b^2 - 4ac}{p}\right) = \left(\frac{b^2}{p}\right) = 0 \text{ or } 1, \]
which is impossible. Hence $(ac, p) = 1$.

As $(k, p) = 1$, there exists an integer $t$ with $1 \le t < p^2$ such that $kt \equiv 1 \pmod{p^2}$.
Define the integer $u$ by $u = (kt - 1)/p^2$ so that $kt = 1 + up^2$. As $kt \ge 1$, we see that
$u \ge 0$. Furthermore, as $up^2 < kt < kp^2$ we have $u < k$. Hence
\[ kt = 1 + up^2, \qquad 1 \le t < p^2, \qquad 0 \le u < k . \]
We now construct an integer $n$ with $1 \le n < C^3 |N|^{3L}$ such that $p \mid (kn + \ell)$ and
$p^2 \nmid (kn + \ell)$.

If $p > \ell$, we choose $n = t(p - \ell)$. Note that $1 \le n < p^3$. Since $p \le C|N|^{L}$, it is
clear that $n < p^3 \le C^3 |N|^{3L}$. Also, we see that
\[ kn + \ell = kt(p - \ell) + \ell = (1 + up^2)(p - \ell) + \ell = p(1 + up^2 - u\ell p), \]
so that $p \mid (kn + \ell)$ and $p^2 \nmid (kn + \ell)$ as required.


If p ≤  and p  , then we choose n = t ( p − 1). Note that 1 ≤ n < p3
≤ C 3 |N |3L . Moreover, we have

kn +  = kt ( p − 1) +  = (1 + up 2 )( p − 1) +  = p(1 + up2 − up)

so that p|(kn + ) and p 2  (kn + ).


If $p \le \ell$ and $p \,\|\, \ell$ (that is, $p \mid \ell$ but $p^2 \nmid \ell$), then we choose $n = tsp$, where the
positive integer $s = \ell/p$ is not divisible by $p$. Clearly $1 \le n < p^3 < C^3 |N|^{3L}$. Here
\[ kn + \ell = ktsp + \ell = (1 + up^2)sp + sp = sp(2 + up^2), \]
so that $p \mid (kn + \ell)$ and (as $p \neq 2$ and $p \nmid s$) $p^2 \nmid (kn + \ell)$.

Finally, if $p \le \ell$ and $p^2 \mid \ell$, we choose $n = tp$. Note that $1 \le n < p^3 \le C^3 |N|^{3L}$.
In this case we have
\[ kn + \ell = ktp + \ell = (1 + up^2)p + \ell = p(1 + up^2 + (\ell/p)), \]
so that $p \mid (kn + \ell)$ and (as $p \mid (\ell/p)$) $p^2 \nmid (kn + \ell)$.

This completes the construction of an integer $n$ satisfying $1 \le n < C^3 |N|^{3L}$ such
that $p \mid (kn + \ell)$ and $p^2 \nmid (kn + \ell)$.

Next we show that the integer $kn + \ell$ is not represented by $Q$. Suppose on the
contrary that the integer $kn + \ell$ is represented by $Q$. Then there exist integers $x$ and $y$
such that $kn + \ell = ax^2 + bxy + cy^2$. Since $p \mid (kn + \ell)$, we have $ax^2 + bxy + cy^2 \equiv 0
\pmod p$. Therefore, since
\[ 4a(ax^2 + bxy + cy^2) = (2ax + by)^2 - (b^2 - 4ac)y^2 , \]
we see that $(2ax + by)^2 \equiv dy^2 \pmod p$. If $p \nmid y$ then $d \equiv ((2ax + by)z)^2 \pmod p$
for some integer $z$ such that $yz \equiv 1 \pmod p$. This contradicts $\left(\frac{d}{p}\right) = -1$. If $p \mid y$
then $p \mid (2ax + by)$ so $p \mid 2ax$. But $p \neq 2$ and $p \nmid a$, hence $p \mid x$. Therefore, $p^2$ divides
$ax^2 + bxy + cy^2 = kn + \ell$, contradicting $p^2 \nmid (kn + \ell)$. This completes the proof that
the integer $kn + \ell$ is not represented by $Q$.

Since all the integers $\ell, k + \ell, 2k + \ell, \dots, (R - 1)k + \ell$ are represented by $Q$,
we must have $n > R - 1$, that is, $R \le n$. But $n < C^3 |N|^{3L}$ so $R < C^3 |4k^2 d|^{3L}
= C_1 (k^2 |d|)^{L_1}$, where $L_1$ and $C_1$ are absolute constants satisfying $L_1 = 3L > 0$ and
$C_1 = C^3 2^{6L} > 0$.
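
The mechanism of the proof can be illustrated with a small search. The sketch below is not the authors' construction of $n$ from $t$ and $u$; it simply finds, by brute force, the least odd prime $p$ with $(N/p) = -1$ for $N = 4k^2 d$ and then the least $n \ge 0$ for which $p$ divides $kn + \ell$ exactly once. By the argument above, that term of the progression cannot be represented by $Q$. The function names are hypothetical, and the search assumes $d$ is not a perfect square (otherwise the loop over primes need not terminate).

```python
def is_prime(m):
    """Trial-division primality test, adequate for small illustrative inputs."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True


def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def witness(a, b, c, k, l):
    """Return (p, n) with (N/p) = -1, p | k*n + l and p^2 not dividing k*n + l.

    Assumes the discriminant b^2 - 4ac is a nonzero nonsquare.
    """
    d = b * b - 4 * a * c
    big_n = 4 * k * k * d
    p = 3
    while not (is_prime(p) and legendre(big_n, p) == -1):
        p += 2
    n = 0
    while (k * n + l) % p != 0 or (k * n + l) % (p * p) == 0:
        n += 1
    return p, n


# Q = x^2 + y^2 and the progression 1, 2, 3, ...: the search returns p = 3, n = 2,
# so the term 3 is flagged as not being a sum of two squares.
print(witness(1, 0, 1, 1, 1))
```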

Remark. There is no loss of generality in assuming that Q represents an arithmetic


progression of positive integers since if Q only represents an arithmetic progression
of negative integers then −Q represents an arithmetic progression of positive integers.

The least length of a nontrivial arithmetic progression is 3. Does every nonzero
integral binary quadratic form represent an arithmetic progression of length 3? Two deep
results of Weber [7] and Green [3] answer this question positively in the particular
case when the integral binary quadratic form is primitive and positive-definite. In 1882,
Weber [7] proved that if Q is a primitive integral binary quadratic form which is
positive-definite, then the set of primes that are represented by Q has positive relative
density. In 2005, Green [3] proved that any subset of the primes having positive relative
density contains a 3-term arithmetic progression. Thus, by putting these two deep results
together, we see that every primitive, positive-definite, integral binary quadratic form
represents a 3-term arithmetic progression. In this paper we shall prove in an elementary
way that any nonzero integral binary quadratic form represents a nontrivial arithmetic
progression of length 3 infinitely often.

Theorem 2. Every nonzero integral binary quadratic form represents a nontrivial


arithmetic progression of length 3 infinitely often.

Proof. Let $Q(x, y) = ax^2 + bxy + cy^2$ be a nonzero, integral binary quadratic form.
Since $Q$ is nonzero, at least one of the integers $a$, $b$, and $c$ is nonzero. We consider the
following cases.
Case 1: $a \neq 0$.
Let $x$ be a positive integer. Then, as
\[ Q(2x^2 - 1, 0) = a(4x^4 - 4x^2 + 1), \]
\begin{align*}
Q(2x^2 + 2x + 1, 0) &= a(4x^4 + 8x^3 + 8x^2 + 4x + 1) \\
 &= a(4x^4 - 4x^2 + 1) + a(8x^3 + 12x^2 + 4x),
\end{align*}
and
\begin{align*}
Q(2x^2 + 4x + 1, 0) &= a(4x^4 + 16x^3 + 20x^2 + 8x + 1) \\
 &= a(4x^4 - 4x^2 + 1) + 2a(8x^3 + 12x^2 + 4x),
\end{align*}
$Q$ represents a nontrivial arithmetic progression of length 3 infinitely often.


Case 2: $a = 0$.
In this case, $Q(x, y) = bxy + cy^2$ and its discriminant is $d = b^2$.
Subcase (i): $b \neq 0$.
Since $d$ is a nonzero perfect square, by the result of Alaca et al. [1], the form
$Q(x, y)$ represents an infinite arithmetic progression in positive integers and hence
it represents a nontrivial arithmetic progression of length 3 infinitely often.
Subcase (ii): $b = 0$.
In this case we have $Q(x, y) = cy^2$, where $c \neq 0$, as $a = b = 0$. Then, taking $x$ to
be any integer and proceeding as in Case 1, we deduce that $Q(x, y)$
represents infinitely many nontrivial arithmetic progressions of length 3.
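
The three values used in Case 1 are easy to check mechanically. The short sketch below (the function name is hypothetical) evaluates $Q$ at the three points $(2x^2 - 1, 0)$, $(2x^2 + 2x + 1, 0)$, $(2x^2 + 4x + 1, 0)$ for a given leading coefficient $a$ and confirms that consecutive differences are equal to $a(8x^3 + 12x^2 + 4x)$.

```python
def three_term_ap(a, x):
    """Values Q(2x^2-1, 0), Q(2x^2+2x+1, 0), Q(2x^2+4x+1, 0) for Q with leading coefficient a."""
    return [
        a * (2 * x * x - 1) ** 2,
        a * (2 * x * x + 2 * x + 1) ** 2,
        a * (2 * x * x + 4 * x + 1) ** 2,
    ]


for x in range(1, 4):
    t0, t1, t2 = three_term_ap(1, x)
    step = 8 * x ** 3 + 12 * x ** 2 + 4 * x
    print(x, (t0, t1, t2), t1 - t0 == step, t2 - t1 == step)
```

For $a = 1$ and $x = 1$ this gives the familiar progression of squares 1, 25, 49.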

Remark. The statement of Theorem 2 is not true in general if we replace 3 by a larger


integer. For example, Q(x, y) = x 2 does not represent an arithmetic progression of
length 4 as there do not exist 4 squares in arithmetic progression (see [6, pp. 21–22]).

ACKNOWLEDGMENTS. We are grateful to the referees for going through the paper very carefully and
modifying it to a much nicer form.

REFERENCES

1. A. Alaca, Ş. Alaca, K. S. Williams, Arithmetic progressions and binary quadratic forms, Amer. Math.
Monthly 115 (2008) 252–254.
2. R. Ayoub, An Introduction to the Analytic Theory of Numbers. Mathematical Surveys, Number 10, Ameri-
can Mathematical Society, Providence, Rhode Island, 1963, http://dx.doi.org/10.1090/surv/010.
3. B. Green, Roth’s theorem in the primes, Ann. Math. 161 (2005) 1609–1636,
http://dx.doi.org/10.4007/annals.2005.161.1609.
4. D. R. Heath-Brown, Zero-free regions for Dirichlet L-functions and the least prime in an
arithmetic progression, Proc. London Math. Soc. (3) 64 (1992) 265–338,
http://dx.doi.org/10.1112/plms/s3-64.2.265.
5. Y. V. Linnik, On the least prime in an arithmetic progression I . The basic theorem, Rec. Math. (Mat.
Sbornik) N. S. 15 no. 57 (1944) 139–178.
6. L. J. Mordell, Diophantine Equations. Pure and Applied Mathematics, Vol. 30, Academic Press, London,
1969.
7. H. Weber, Beweis des Satzes, daß jede eigentlich primitive quadratische Form unendlich viele
Primzahlen darzustellen fähig ist, Math. Ann. 20 (1882) 301–329.
8. T. Xylouris, Über die Linniksche Konstante, Diplomarbeit, Universität Bonn, 2009. (arXiv:0906.2749v1
[math.NT] 15 Jun 2009).

Harish-Chandra Research Institute, Chhatnag Road, Jhunsi, Allahabad 211019, India


pallabdey@hri.res.in

Harish-Chandra Research Institute, Chhatnag Road, Jhunsi, Allahabad 211019, India


thanga@hri.res.in

Cosines and Cayley, Triangles and Tetrahedra
Marshall Hampton

Abstract. This article surveys some of the more aesthetically appealing and useful formulas
relating distances, areas, and angles in triangles and tetrahedra. For example, a somewhat
neglected trigonometric identity involving only the cosines of a triangle is an instance of the
famous Cayley cubic surface. While most of these formulas are well known, some novel
identities also make an appearance.

Heron’s formula and the Cayley–Menger determinant. Many people learn in
school that for a triangle with vertices A, B, and C, the area of the triangle, $\Delta_{ABC}$, can
be computed from Heron’s formula [2, 13],
\[ \Delta_{ABC} = \sqrt{s(s - r_{AB})(s - r_{AC})(s - r_{BC})} \tag{1} \]
in which $r_{ij}$ is the distance between vertex $i$ and vertex $j$, and $s$ is the semiperimeter
\[ s = \frac{r_{AB} + r_{AC} + r_{BC}}{2} . \]
Almost two thousand years later, a form of Heron’s formula was found that
generalizes to simplices of any dimension. This is the Cayley–Menger determinant; for a
triangle:
\begin{align*}
-16\,\Delta_{ABC}^2 &= \det \begin{pmatrix}
0 & 1 & 1 & 1 \\
1 & 0 & r_{AB}^2 & r_{AC}^2 \\
1 & r_{AB}^2 & 0 & r_{BC}^2 \\
1 & r_{AC}^2 & r_{BC}^2 & 0
\end{pmatrix} \\
 &= -(-r_{AB} + r_{AC} + r_{BC})(r_{AB} - r_{AC} + r_{BC})(r_{AB} + r_{AC} - r_{BC})
 (r_{AB} + r_{AC} + r_{BC}).
\end{align*}

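The matrix in the determinant is called the Cayley–Menger matrix. Cayley found
the determinantal form [7]; the polynomial itself was known earlier to Lagrange.
Menger discovered a number of further properties of the matrix and closely related
variants of it [14]. For an $(n - 1)$-dimensional simplex with $n$ vertices $A_1, \dots, A_n$, the
volume formula generalizes to
\[ \Delta_{A_1 \dots A_n}^2 = \frac{(-1)^{n}}{2^{n-1} ((n-1)!)^2} \det \begin{pmatrix}
0 & 1 & 1 & \cdots & 1 \\
1 & 0 & r_{A_1 A_2}^2 & \cdots & r_{A_1 A_n}^2 \\
1 & r_{A_1 A_2}^2 & 0 & \cdots & r_{A_2 A_n}^2 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & r_{A_1 A_n}^2 & r_{A_2 A_n}^2 & \cdots & 0
\end{pmatrix}, \tag{2} \]
where the constant is normalized so that $n = 3$ recovers the triangle formula
$-16\,\Delta_{ABC}^2 = \det(\cdot)$ above.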

http://dx.doi.org/10.4169/amer.math.monthly.121.10.937
MSC: Primary 51K05, Secondary 52-02

The definitive article about the Cayley–Menger matrix is [5], and its properties
are also nicely summarized in [4]. The use of distances as coordinates is masterfully
investigated in [1] and, with particular attention paid to triangles, in [8]. Some uses
and variants of the Cayley–Menger determinant are discussed in [11, 12].
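
A minimal computational sketch of the Cayley–Menger volume formula follows. It assumes the normalization in which $n$ vertices give the factor $(-1)^n / (2^{n-1}((n-1)!)^2)$, which for a triangle reduces to $-16\,\Delta_{ABC}^2 = \det(\cdot)$; the function names are hypothetical, and the determinant is computed by cofactor expansion, which is only sensible for small matrices.

```python
from math import factorial, sqrt


def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        if entry == 0:
            continue
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def volume_squared(sq_dist):
    """Squared volume of a simplex from the n x n matrix of squared vertex distances."""
    n = len(sq_dist)
    bordered = [[0] + [1] * n] + [[1] + list(row) for row in sq_dist]
    return (-1) ** n * det(bordered) / (2 ** (n - 1) * factorial(n - 1) ** 2)


# 3-4-5 right triangle: Heron's formula gives area 6.
triangle = [[0, 9, 16], [9, 0, 25], [16, 25, 0]]
print(sqrt(volume_squared(triangle)))

# Regular tetrahedron with unit edges: volume sqrt(2)/12.
tetra = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
print(sqrt(volume_squared(tetra)), sqrt(2) / 12)
```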

Laws of cosines. For brevity in some of the more complex formulae, we write $c_{ABC}$
for the cosine of the angle ABC.
The classic law of cosines is
\[ r_{AB}^2 - r_{AC}^2 - r_{BC}^2 + 2 r_{AC} r_{BC} c_{ACB} = 0 . \]

One derivation of the law of cosines makes use of the relations of the form
\[ r_{AB} = r_{AC} c_{BAC} + r_{BC} c_{ABC} \tag{3} \]
which can be used for many purposes because of their linearity in the distances
and cosines. These relations generalize to the n-dimensional simplex through the
polyhedral version of the divergence theorem. If we choose the constant vector field equal
to the outward-pointing unit normal $n_i$ to the $i$th facet of a bounded convex polyhedron,
then the divergence of the field is 0. By the divergence theorem, its flux through the
boundary therefore vanishes:
\[ 0 = \sum_j n_i \cdot n_j \, \Delta_j = \Delta_i - \sum_{j \neq i} c_{ij} \Delta_j \tag{4} \]
where $c_{ij}$ is the cosine of the angle between facets $i$ and $j$ (so that $n_i \cdot n_j = -c_{ij}$), and
$\Delta_i$ is the area of the $i$th facet. Multiplying this equation by $\Delta_i$ and subtracting $\Delta_j$
times the corresponding equation for each face $j \neq i$ gives
\[ \Delta_i^2 = \sum_{j \neq i} \Delta_j^2 - 2 \sum_{\substack{j < k \\ j, k \neq i}} c_{jk} \Delta_j \Delta_k . \tag{5} \]

While equation (5) has the same form as the law of cosines for triangles, it seems much
less useful since more than one cosine is involved in each equation. The lower degree
conditions in (4) are usually a better starting point for other identities.
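
Relations (4) and (5) can be checked numerically for a tetrahedron. The sketch below is an illustration under stated assumptions, not part of the article: facet $i$ is taken to be the face opposite vertex $i$, $c_{ij}$ is computed as $-n_i \cdot n_j$ from outward unit normals, the sum in (5) runs over unordered pairs $j < k$, and all function names are hypothetical.

```python
from math import sqrt


def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def facets(verts):
    """(area, outward unit normal) of the facet opposite each vertex of a tetrahedron."""
    out = []
    for i, apex in enumerate(verts):
        p, q, r = [v for j, v in enumerate(verts) if j != i]
        n = cross(sub(q, p), sub(r, p))
        if dot(n, sub(apex, p)) > 0:      # flip so the normal points away from the apex
            n = tuple(-x for x in n)
        length = sqrt(dot(n, n))
        out.append((length / 2, tuple(x / length for x in n)))
    return out


def residuals(verts):
    """Residuals of relations (4) and (5) for every facet; all should be near zero."""
    f = facets(verts)
    idx = range(len(f))
    c = [[-dot(f[i][1], f[j][1]) for j in idx] for i in idx]
    area = [a for a, _ in f]
    res4 = [area[i] - sum(c[i][j] * area[j] for j in idx if j != i) for i in idx]
    res5 = [area[i] ** 2
            - sum(area[j] ** 2 for j in idx if j != i)
            + 2 * sum(c[j][k] * area[j] * area[k]
                      for j in idx for k in idx if j < k and i not in (j, k))
            for i in idx]
    return res4, res5


VERTS = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.3, 0.4, 1.5)]
print(residuals(VERTS))
```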

A neglected trigonometric identity: the Cayley cosine cubic. If we think of the
equations (3) as linear in the distances, then the distances must be in the kernel of the
coefficient matrix, i.e.,
\[ \begin{pmatrix}
-1 & c_{BAC} & c_{ABC} \\
c_{BAC} & -1 & c_{ACB} \\
c_{ABC} & c_{ACB} & -1
\end{pmatrix}
\begin{pmatrix} r_{AB} \\ r_{AC} \\ r_{BC} \end{pmatrix} = 0 . \]

If we choose to normalize the kernel vector by setting $r_{AB} = 1$, then
\[ r_{AC} = \frac{c_{BAC} + c_{ACB} c_{ABC}}{1 - c_{ACB}^2}, \quad \text{and} \quad
 r_{BC} = \frac{c_{ABC} + c_{ACB} c_{BAC}}{1 - c_{ACB}^2} \]

which are complementary to the law of cosines in that they express a single distance in
terms of the cosines. Of course, these formulae break down if the triangle is collinear.

In order to have a nontrivial kernel, the determinant must vanish, which gives us the
beautiful relation:
\[ \det \begin{pmatrix}
-1 & c_{BAC} & c_{ABC} \\
c_{BAC} & -1 & c_{ACB} \\
c_{ABC} & c_{ACB} & -1
\end{pmatrix}
= c_{ACB}^2 + c_{ABC}^2 + c_{BAC}^2 + 2 c_{ACB} c_{ABC} c_{BAC} - 1 = 0 . \tag{6} \]

This trigonometric identity has been known for a long enough time that it is difficult
to determine its first appearance. But it does not seem well known, and it is not usually
cited as an example of the famous Cayley cubic, a surface with the maximal number
(four) of isolated singular points (where both the surface and the gradient of its defining
function vanish). Perhaps we should call it the Cayley cosine cubic.
The four singular points of the Cayley cosine cubic are

(c123 , c132 , c213 ) ∈ {(1, 1, −1), (1, −1, 1), (−1, 1, 1), (−1, −1, −1)}.

The first three of these points correspond to collinear triangles, while the last one is
unrealizable as a triangle. However, it is consistent algebraically with a triangle whose
edge lengths add up to zero (rAB + rAC + rBC = 0). Intriguingly, this is the fourth factor
in the Cayley–Menger determinant as well.
The Cayley cosine cubic contains six lines that intersect the planes ci jk = ±1. The
three lines with ci jk = 1 are the boundary of the portion of the surface corresponding
to triangles with positive distances.
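
The cosine cubic (6) and the normalized kernel formulas for $r_{AC}$ and $r_{BC}$ can be checked on any non-degenerate triangle. The sketch below uses an arbitrarily chosen triangle in the plane; the coordinates and function names are assumptions made for illustration.

```python
from math import sqrt


def length(p, q):
    return sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


def cos_at(p, q, r):
    """Cosine of the angle pqr, i.e. the angle at the middle vertex q."""
    ux, uy = p[0] - q[0], p[1] - q[1]
    vx, vy = r[0] - q[0], r[1] - q[1]
    return (ux * vx + uy * vy) / (length(p, q) * length(r, q))


def cosine_cubic_check(A, B, C):
    """Return the value of the cubic in (6) and the ratios r_AC/r_AB, r_BC/r_AB."""
    c_BAC, c_ABC, c_ACB = cos_at(B, A, C), cos_at(A, B, C), cos_at(A, C, B)
    cubic = c_ACB ** 2 + c_ABC ** 2 + c_BAC ** 2 + 2 * c_ACB * c_ABC * c_BAC - 1
    r_AC = (c_BAC + c_ACB * c_ABC) / (1 - c_ACB ** 2)
    r_BC = (c_ABC + c_ACB * c_BAC) / (1 - c_ACB ** 2)
    return cubic, r_AC, r_BC


A, B, C = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)
cubic, r_AC, r_BC = cosine_cubic_check(A, B, C)
print(cubic, r_AC, length(A, C) / length(A, B), r_BC, length(B, C) / length(A, B))
```

The cubic should evaluate to zero up to rounding, and the two ratios should match the directly measured side-length ratios.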

Figure 1. The Cayley cubic.

In a tetrahedron, these laws of cosines generalize in a variety of ways. There is


a Cayley cosine cubic for each of the four faces, involving the three facial cosines
(cosines between the edges of the tetrahedron).

The Cayley cosine cubic determinant (6) generalizes directly to the dihedral angles
of a tetrahedron [3], where $c_{ij}$ is the cosine of the angle between faces $i$ and $j$:
\[ \det \begin{pmatrix}
-1 & c_{AB} & c_{AC} & c_{AD} \\
c_{AB} & -1 & c_{BC} & c_{BD} \\
c_{AC} & c_{BC} & -1 & c_{CD} \\
c_{AD} & c_{BD} & c_{CD} & -1
\end{pmatrix} = 0 . \]

This formula may relate to a rigidity result [15] on tetrahedra, which states that if
all of the dihedral angles of a tetrahedron are less than or equal to the corresponding
angles of another tetrahedron, then the two tetrahedra must be similar.
The dihedral angles can be computed in terms of facial cosines at a vertex. For each
dihedral angle, there are two choices for the vertex. For instance, the cosine of the
dihedral angle between faces A and B can be expressed as

\[ c_{AB} = \frac{c_{ADB} - c_{ADC} c_{BDC}}{s_{ADC} s_{BDC}}
 = \frac{c_{ACB} - c_{BCD} c_{ACD}}{s_{BCD} s_{ACD}} \]

from which we can eliminate the sines by squaring and substituting the Pythagorean
identity to obtain a relation between the facial cosines:
\begin{align*}
(1 - c_{ACD}^2)(1 - c_{BCD}^2)&(c_{ADB} - c_{ADC} c_{BDC})^2 \\
 &= (1 - c_{ADC}^2)(1 - c_{BDC}^2)(c_{ACB} - c_{BCD} c_{ACD})^2 . \tag{7}
\end{align*}

For each of the six edges of the tetrahedron, we have such an equation. These cannot
be independent since the space of similarity classes of tetrahedra is five-dimensional.
Somewhat similarly, the law of sines can be used once for each triangular face to
obtain the identity

sACB sBDC sCAD sABD = sBAC sCBD sACD sADB ,

which could be converted into a cosine identity by squaring both sides. This is
somewhat unsatisfactory, however, since it involves eight angles.
If a geometric problem can be cast into polynomial form, the computation of a
Gröbner basis (or bases) provides an automated path for eliminating variables and
obtaining new or simpler relations [6] (caveat emptor: in practice, many Gröbner bases
require excessive memory and computational time to compute). It is beyond the scope
of this article to describe Gröbner bases in full. They are analogous to the reduction of
a linear system to echelon form (Gaussian elimination), but for polynomial (nonlinear)
systems. For more background on Gröbner bases, see [9].
By computing a Gröbner basis for the system of equations (4) for a tetrahedron
(using Singular [10]), we found a fairly simple condition on the six angles bordering
one face. The six angles are on three faces but do not include any of the three angles
meeting at the common vertex. For a common vertex A, this condition is

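\begin{align*}
& c_{ACB}^2 c_{ADC}^2 c_{ABD}^2 - c_{ADB}^2 c_{ABC}^2 c_{ACD}^2 + c_{ADB}^2 c_{ABC}^2 - c_{ACB}^2 c_{ADC}^2 \\
& \quad - c_{ACB}^2 c_{ABD}^2 - c_{ADC}^2 c_{ABD}^2 + c_{ADB}^2 c_{ACD}^2 + c_{ABC}^2 c_{ACD}^2 + c_{ACB}^2 \\
& \quad - c_{ADB}^2 - c_{ABC}^2 + c_{ADC}^2 + c_{ABD}^2 - c_{ACD}^2 = 0 .
\end{align*}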

If we use the squares as variables, i.e., let $q_{ijk} = c_{ijk}^2$, then there are nine singular
planes corresponding to collinear configurations, along with the origin. This singular
point at the origin could be interpreted as a projective closure having point A at infinity.
This can be written in a nicer form:
\[ \det \begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & c_{ABC}^2 & c_{ABC}^2 \\
1 & c_{ACD}^2 & 1 & c_{ACD}^2 \\
1 & c_{ADB}^2 & c_{ADB}^2 & 1
\end{pmatrix}
= \det \begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & c_{ACB}^2 & c_{ACB}^2 \\
1 & c_{ADC}^2 & 1 & c_{ADC}^2 \\
1 & c_{ABD}^2 & c_{ABD}^2 & 1
\end{pmatrix} . \tag{8} \]

This determinantal form was found by an ad hoc approach. Is there an elementary


way to derive this identity? We do not know of one, but it seems likely.
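
As a purely numerical check (it says nothing about how the identity might be derived), the sketch below evaluates both the fourteen-term polynomial condition and the difference of the two determinants in (8) for an arbitrarily chosen tetrahedron. Here $c_{XYZ}$ is the cosine of the angle at the middle vertex $Y$; the coordinates and function names are assumptions made for illustration.

```python
from math import sqrt


def cos_at(p, q, r):
    """Cosine of the angle pqr, i.e. the angle at the middle vertex q."""
    u = [a - b for a, b in zip(p, q)]
    v = [a - b for a, b in zip(r, q)]
    inner = sum(a * b for a, b in zip(u, v))
    return inner / sqrt(sum(a * a for a in u) * sum(b * b for b in v))


def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def six_angle_residuals(A, B, C, D):
    """Return (polynomial condition, difference of the two determinants in (8))."""
    def sq(p, q, r):
        return cos_at(p, q, r) ** 2

    acb, adc, abd = sq(A, C, B), sq(A, D, C), sq(A, B, D)
    adb, abc, acd = sq(A, D, B), sq(A, B, C), sq(A, C, D)
    poly = (acb * adc * abd - adb * abc * acd + adb * abc - acb * adc
            - acb * abd - adc * abd + adb * acd + abc * acd + acb
            - adb - abc + adc + abd - acd)

    def block(x, y, z):
        return [[1, 1, 1, 1], [1, 1, x, x], [1, y, 1, y], [1, z, z, 1]]

    det_gap = det(block(abc, acd, adb)) - det(block(acb, adc, abd))
    return poly, det_gap


TETRA = [(0.0, 0.0, 0.0), (1.0, 0.2, 0.0), (0.3, 1.7, 0.1), (0.4, 0.5, 1.3)]
print(six_angle_residuals(*TETRA))
```

Both numbers should vanish up to floating-point rounding.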

ACKNOWLEDGMENT. The author thanks the reviewers of this article as well as Gareth Roberts and
Richard Moeckel for their comments and suggestions, which have improved it in many ways.

REFERENCES

1. A. Albouy, A. Chenciner, Le problème des n corps et les distances mutuelles, Invent. Math. 131 no. 1
(1997) 151–184.
2. H. Alexandrinus, Metrika. ca. 60.
3. D. Audet, Déterminants sphérique et hyperbolique de Cayley-Menger, Bulletin AMQ 51 (2011) 45–52.
4. M. Berger, Geometry I. Translated from the French by M. Cole and S. Levy, Springer, New York, 1987.
5. L. M. Blumenthal, B. E. Gillam, Distribution of points in n-space, Amer. Math. Monthly 50 (1943) 181–
185, http://dx.doi.org/10.2307/2302400.
6. B. Buchberger, F. Winkler, Gröbner Bases and Applications. Lecture note series. Cambridge Univ. Press,
Cambridge, 1998, http://dx.doi.org/10.1017/cbo9780511565847.
7. A. Cayley, On a theorem in the geometry of position, Camb. Math. J. 2 (1841) 267–271,
http://dx.doi.org/10.1017/cbo9780511703676.002.
8. A. Chenciner, The “form” of a triangle, Rend. Mat. Serie VII 27 (2007) 1–16.
9. D.A. Cox, J. Little, D. O’Shea, Ideals, Varieties, and Algorithms: An Introduction to Computational
Algebraic Geometry and Commutative Algebra. Springer, New York, 2007.
10. W. Decker, G.-M. Greuel, G. Pfister, H. Schönemann, SINGULAR 3-1-5—A computer algebra system
for polynomial computations, (2012), http://www.singular.uni-kl.de.
11. A. W. M. Dress, T. F. Havel, Distance geometry and geometric algebra, Found. Phys. 23 (1993), 1357–
1374, http://dx.doi.org/10.1007/bf01883783.
12. T. F. Havel, Some examples of the use of distances as coordinates for euclidean geometry, J. Symbolic
Comput. 11 (1991) 579–593, http://dx.doi.org/10.1016/s0747-7171(08)80120-4.
13. D. A. Klain, An intuitive derivation of Heron’s formula, Amer. Math. Monthly 111 (2004) 709–712,
http://dx.doi.org/10.2307/4145045.
14. K. Menger, Untersuchungen über allgemeine Metrik, Math. Ann. 100 (1928) 75–163.
15. I. Rivin, J. H. Lindsey II, A similarity criterion: 10462, Amer. Math. Monthly 105 no. 7 (1998) 671.

Department of Mathematics and Statistics, University of Minnesota, Duluth, MN, 55812


mhampton@d.umn.edu

A Note on the Spectral Theorem in the
Finite-Dimensional Real Case
Felipe Acker

Abstract. We give a proof of the spectral theorem for self-adjoint operators in the finite-
dimensional real case that involves no complexification of the space, no determinants, and no
Lagrange multipliers. The tools in our proof are the fundamental theorem of algebra and the
intermediate value theorem.

1. THE THEOREM.

Definition. Let V be a real inner product space. We denote the inner product of u and
v by $\langle u, v \rangle$. A linear transformation A : V → V is self-adjoint if
\[ \langle Au, v \rangle = \langle u, Av \rangle \quad \forall\, u, v \in V . \]

Definition. Let V be a real vector space and let A : V → V be a linear transformation.


If v is a nonzero vector, λ is a scalar and

Av = λv,

then λ is an eigenvalue of A and v is an eigenvector of A associated with λ.

Our goal is to prove the following theorem.

Theorem (The spectral theorem). Let V be a finite-dimensional real inner product


space and let the linear transformation A : V → V be self-adjoint. Then there is a set
of eigenvectors of A that form an orthonormal basis of V.

2. ABOUT THE PROOF. The proof is carried out by induction on the dimension of
V. The clue is in the following lemma.

Lemma. Let V be an inner product space and let the linear transformation A : V →
V be self-adjoint. Suppose that the subspace E of V is invariant under A (that means
A(E) ⊂ E). Then the orthogonal complement of E,

\[ E^{\perp} = \{ u \in V \mid \langle u, v \rangle = 0 \ \ \forall v \in E \} , \]

is also invariant under A.

Proof. Let us suppose A(E) ⊂ E and let $u \in E^{\perp}$. Since Av ∈ E, we have, for all
v ∈ E,
\[ \langle Au, v \rangle = \langle u, Av \rangle = 0 . \]
This proves $Au \in E^{\perp}$.

http://dx.doi.org/10.4169/amer.math.monthly.121.10.942
MSC: Primary 15A18

Suppose now the theorem holds in the n-dimensional case. It follows from the
lemma that we can prove the theorem in the (n + 1)-dimensional case if we can prove
the existence of an eigenvector v. This follows since the subspace E spanned by v is
invariant under A and the theorem applies to the n-dimensional space E ⊥ . The proof
of the existence of the eigenvector v can use the complexification of V or Lagrange
multipliers, searching for critical points of
\[ f(u) = \frac{1}{2} \langle Au, u \rangle \]
on the unit sphere of V [1, pp. 106 and 113].
Our alternative strategy requires that we begin with the one- and two-
dimensional cases (dimension 2 requires the intermediate value theorem). Then we
prove that, in the general case, there always exists an invariant subspace of dimension
1 or 2. We will avoid, under all circumstances, the use of determinants and, a fortiori,
the characteristic polynomial of A will not even be mentioned (despite the algebraic
flavor of the final part of the proof).

3. THE ONE- AND TWO-DIMENSIONAL CASES. The one-dimensional case is


trivial; let us go directly to the two-dimensional one.

Theorem (dimension 2). Let V be a two-dimensional real inner product space and
let A : V → V be a self-adjoint linear map. Then there is a set of eigenvectors of A
that form an orthonormal basis of V.

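Proof. Let the vectors $e_1$ and $e_2$ form an orthonormal basis of V. Let $v \mapsto v^{\perp}$ be
the map from V to V defined by $(x_1 e_1 + x_2 e_2)^{\perp} = -x_2 e_1 + x_1 e_2$. Define $\|v\|$ by
$\|v\| = \langle v, v \rangle^{1/2}$. Notice that $\|v^{\perp}\| = \|v\|$ and $\langle v^{\perp}, v \rangle = 0$ for all v in V. We want
to prove the existence of a unit vector v in V such that $\langle Av, v^{\perp} \rangle = 0$. This will imply,
since we are in dimension 2, that both $v$ and $v^{\perp}$ are eigenvectors of A (and they form an
orthonormal basis of V).
Define $c : [0, \pi/2] \to V$ by
\[ c(\theta) = \cos\theta \, e_1 + \sin\theta \, e_2 , \]
and define $\alpha : [0, \pi/2] \to \mathbb{R}$ by
\[ \alpha(\theta) = \langle A c(\theta), c(\theta)^{\perp} \rangle . \]
Then $\alpha$ is continuous and
\[ \alpha\!\left(\frac{\pi}{2}\right) = \langle Ae_2, -e_1 \rangle = -\langle Ae_2, e_1 \rangle = -\langle e_2, Ae_1 \rangle
 = -\langle Ae_1, e_2 \rangle = -\alpha(0) . \]
It follows from the intermediate value theorem that there exists $\theta$ in $[0, \pi/2]$ such that
$\alpha(\theta) = 0$. Taking $v = c(\theta)$ concludes the proof.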
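
The intermediate value argument is constructive enough to run. The sketch below represents a self-adjoint map on a two-dimensional space by a symmetric matrix [[a, b], [b, d]] in an orthonormal basis and locates a zero of $\alpha$ on $[0, \pi/2]$ by bisection; the use of bisection and the function names are choices made for this illustration, not part of the article.

```python
from math import cos, sin, pi, sqrt


def eigen_angle(a, b, d, iters=80):
    """Find theta in [0, pi/2] with alpha(theta) = 0 for the matrix [[a, b], [b, d]]."""
    def alpha(t):
        cx, cy = cos(t), sin(t)                   # c(theta)
        ax, ay = a * cx + b * cy, b * cx + d * cy  # A c(theta)
        return ax * (-cy) + ay * cx               # <A c(theta), c(theta)^perp>

    lo, hi = 0.0, pi / 2
    if alpha(lo) == 0:
        return lo
    for _ in range(iters):
        mid = (lo + hi) / 2
        value = alpha(mid)
        if value == 0:
            return mid
        # Keep the sign change between lo and hi, as in the intermediate value theorem.
        if (alpha(lo) > 0) == (value > 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def eigenpair(a, b, d):
    """Return (eigenvalue, unit eigenvector) obtained from the zero of alpha."""
    t = eigen_angle(a, b, d)
    v = (cos(t), sin(t))
    av = (a * v[0] + b * v[1], b * v[0] + d * v[1])
    lam = av[0] * v[0] + av[1] * v[1]             # <Av, v>
    return lam, v


lam, v = eigenpair(2.0, 1.0, 3.0)
print(lam, v, (5 + sqrt(5)) / 2)
```

The second eigenvector is then $v^{\perp} = (-\sin\theta, \cos\theta)$, exactly as in the proof.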

4. THE GENERAL CASE. Let V be a real finite-dimensional vector space. Given
a linear operator A : V −→ V and a polynomial $p(x) = a_k x^k + \cdots + a_1 x + a_0$ we
define the operator p(A) : V −→ V by
\[ p(A)v = a_k A^k v + \cdots + a_1 Av + a_0 v . \]

Proposition 1. For each linear operator A : V −→ V there exists a monic polynomial


p such that p(A) = 0.

Proof. The zero-dimensional case is trivial. Let n ≥ 1 be the dimension of V. Since
the space of linear transformations from V to V is $n^2$-dimensional, the $n^2 + 1$ operators
$I, A, A^2, \dots, A^{n^2}$ are linearly dependent. This means that there exists a nonzero
polynomial $q(x) = a_k x^k + \cdots + a_1 x + a_0$ (of degree $k \le n^2$) such that q(A) = 0.
Dividing q by $a_k$, we get a monic polynomial p such that p(A) = 0.

Proposition 2. Assume V is nontrivial and let A : V −→ V be linear. Then there


exists a subspace E of V, of dimension 1 or 2, such that A(E) ⊂ E.

Proof. Let I be the set of monic polynomials q such that q(A) = 0. We take p in I of
minimal degree. Since V is nontrivial, the degree of p must be at least one. It follows
from the fundamental theorem of algebra that p factors as a product of polynomials of
degree at most two. So, we can decompose p as a product of two monic polynomials,

p(x) = a(x)g(x),

in which the degree of a equals one or two. We study these two possibilities.
(i) If the degree of a is one, we may suppose that a(A) = A − λI. Since g(A) ≠ 0
(otherwise, p would not have minimal degree), there is a nonzero vector v in the
range of g(A), say v = g(A)w. Consequently, a(A)v = a(A)g(A)w = p(A)w = 0, that
is, (A − λI)v = 0, so v is an eigenvector of A. Let E be the space spanned by v.
(ii) If the degree of a is two, we may suppose that $a(A) = A^2 + bA + cI$. As in
the previous case, the range of g(A) must contain a nonzero vector v. If v is an
eigenvector of A, we are done. If not, then v and Av span a two-dimensional
subspace E. But, since a(A)v = 0, we get A(Av) = −bAv − cv, and this shows
that E is invariant under A.

We can now prove our main theorem.

Theorem. Let V be a finite-dimensional real inner product space and let the linear
transformation A : V → V be self-adjoint. Then there is a set of eigenvectors of A that
form an orthonormal basis of V.

Proof. We prove the theorem by induction on the dimension of V. The one-dimensional


and the two-dimensional cases are done. Let n > 2 be the dimension of V and assume
that the theorem holds for spaces of dimension less than n. By Proposition 2, there
exists a one- or a two-dimensional subspace E of V such that A(E) ⊂ E. The one- or
the two-dimensional cases of the theorem (already proved) apply to E; E ⊥ is covered
by the induction hypothesis.

The real Schur form [2, p. 341] also follows from Proposition 2.

Proposition 3 (The real Schur form). Every real square matrix M can be factored
as
\[ M = Q \Delta Q^{T} , \]
where Q is real orthogonal, $Q^{T}$ is the transpose of Q, and $\Delta$ is a real block upper
triangular matrix in which each diagonal block is 1 × 1 or 2 × 2.

Proof. Since the 1 × 1 and 2 × 2 cases are trivial, we suppose M is n × n, with n ≥ 3.
Let $A : \mathbb{R}^n \to \mathbb{R}^n$ be the linear transformation associated with M,
\[ x \mapsto Mx . \]
We must prove the existence of an orthonormal basis γ of $\mathbb{R}^n$ such that the matrix of A
relative to γ is a block upper triangular matrix in which each diagonal block is 1 × 1 or
2 × 2. We argue inductively on the order n of M. By Proposition 2, there exists a one-
or a two-dimensional subspace E of $\mathbb{R}^n$ such that A(E) ⊂ E. Let α be an orthonormal
basis of E. Define $B : E^{\perp} \to E^{\perp}$ by Bx = PAx, where P is the orthogonal projection
onto $E^{\perp}$. By the induction hypothesis, there exists an orthonormal basis β of $E^{\perp}$
such that the matrix of B relative to β is a block upper triangular matrix in which
each diagonal block is 1 × 1 or 2 × 2. The matrix of A relative to the (ordered) basis
γ = α ∪ β (starting with the elements of α) is a block upper triangular matrix in which
each diagonal block is 1 × 1 or 2 × 2.

REFERENCES

1. P. D. Lax, Linear Algebra and its Applications. Second edition. Wiley-Interscience, Hoboken, NJ, 2007.
2. G. H. Golub, C. F. van Loan, Matrix Computations. Third edition. Johns Hopkins Univ. Press, Baltimore,
MD, 1996.

Departamento de Matemática Aplicada, Instituto de Matemática, Universidade Federal do Rio de Janeiro


acker@labma.ufrj.br

PROBLEMS AND SOLUTIONS
Edited by Gerald A. Edgar, Doug Hensley, Douglas B. West
with the collaboration of Itshak Borosh, Paul Bracken, Ezra A. Brown, Randall
Dougherty, Tamás Erdélyi, Zachary Franco, Christian Friesen, Ira M. Gessel, László
Lipták, Frederick W. Luttmann, Vania Mascioni, Frank B. Miles, Richard Pfiefer,
Dave Renfro, Cecil C. Rousseau, Leonard Smiley, Kenneth Stolarsky, Richard Stong,
Walter Stromquist, Daniel Ullman, Charles Vanden Eynden, Sam Vandervelde, and
Fuzhen Zhang.

Proposed problems and solutions should be sent in duplicate to the MONTHLY


problems address on the back of the title page. Proposed problems should never
be under submission concurrently to more than one journal. Submitted solutions
should arrive before April 30, 2015. Additional information, such as general-
izations and references, is welcome. The problem number and the solver’s name
and address should appear on each solution. An asterisk (*) after the number of
a problem or a part of a problem indicates that no solution is currently available.

PROBLEMS
11803. Proposed by Sam Speed, Germantown, PA. Let $a_1(k, n) = (9^k(24n + 5) - 5)/8$,
$a_2(k, n) = (9^k(24n + 13) - 5)/8$, $a_3(k, n) = (3 \cdot 9^k(24n + 7) - 5)/8$, and
$a_4(k, n) = (3 \cdot 9^k(24n + 23) - 5)/8$. Show that for each nonnegative integer $m$
there is a unique integer triple $(j, k, n)$ with $j \in \{1, 2, 3, 4\}$ and $k, n \ge 0$ such that
$m = a_j(k, n)$.
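The claim in Problem 11803 can be explored numerically before attempting a proof. The following is a minimal sketch, assuming the exponent reading $9^k$ in the four formulas; the function name `representations` is ours, and a finite search like this is evidence, not a proof.

```python
def representations(m):
    """All triples (j, k, n) with m == a_j(k, n), found by brute force.

    a_j(k, n) = m is equivalent to c_j * 9**k * (24*n + r_j) == 8*m + 5,
    where (c_j, r_j) is (1, 5), (1, 13), (3, 7), (3, 23) for j = 1, 2, 3, 4.
    """
    target = 8 * m + 5
    found = []
    for j, (c, r) in enumerate([(1, 5), (1, 13), (3, 7), (3, 23)], start=1):
        k = 0
        # 24*n + r >= r, so larger k cannot work once c * 9**k * r > target.
        while c * 9 ** k * r <= target:
            quotient, remainder = divmod(target, c * 9 ** k)
            if remainder == 0 and (quotient - r) % 24 == 0:
                found.append((j, k, (quotient - r) // 24))
            k += 1
    return found


# Every m in the tested range should have exactly one representation.
unique_up_to_2000 = all(len(representations(m)) == 1 for m in range(2000))
```

For instance, `representations(8)` returns the single triple with j = 4 and k = n = 0, since (3 · 23 − 5)/8 = 8.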
11804. Proposed by George Stoica, University of New Brunswick, Saint John, Canada.
Prove that $10|x^3 + y^3 + z^3 - 1| \le 9|x^5 + y^5 + z^5 - 1|$ for real numbers x, y, and z
with x + y + z = 1. When does equality hold?
11805. Proposed by Gleb Glebov, Simon Fraser University, Burnaby, Canada.
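(a) Show that
\[
\sum_{k=0}^{\infty} \frac{(-1)^k}{(3k+1)^3} + \sum_{k=0}^{\infty} \frac{(-1)^k}{(3k+2)^3} = \frac{5\pi^3\sqrt{3}}{243}
\]
and
\[
\sum_{k=0}^{\infty} \frac{(-1)^k}{(3k+1)^3} - \sum_{k=0}^{\infty} \frac{(-1)^k}{(3k+2)^3} = \frac{13}{18}\,\zeta(3).
\]
(b) Prove that
\[
\zeta(3) = \frac{9}{13}\int_0^1 \frac{(\log x)^2}{x^3+1}\,dx - \frac{18}{13}\sum_{k=0}^{\infty} \frac{(-1)^k}{(3k+2)^3}.
\]
Here, ζ denotes the Riemann zeta function.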
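The two identities in part (a) of Problem 11805 are easy to test in floating point. The sketch below uses plain partial sums; the constant `APERY` (the value of ζ(3) to double precision) and the helper name `alternating_cubic_sum` are ours, and agreement to ten decimal places is a sanity check rather than a proof.

```python
import math

APERY = 1.2020569031595942  # zeta(3), Apery's constant, to double precision


def alternating_cubic_sum(r, terms=20000):
    """Partial sum of sum_{k>=0} (-1)^k / (3k + r)^3."""
    return math.fsum((-1) ** k / (3 * k + r) ** 3 for k in range(terms))


s1 = alternating_cubic_sum(1)
s2 = alternating_cubic_sum(2)

# Absolute gaps between each side of the two identities in part (a).
sum_identity_gap = abs((s1 + s2) - 5 * math.pi ** 3 * math.sqrt(3) / 243)
diff_identity_gap = abs((s1 - s2) - 13 * APERY / 18)
```

Because the series alternate with terms decreasing like 1/(3k)^3, the truncation error after 20000 terms is far below the rounding error of double precision.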


http://dx.doi.org/10.4169/amer.math.monthly.121.10.946

946 © THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 121
11806. Proposed by István Mező, Nanjing University of Information Science and Technology,
Nanjing, China. Prove that
\[
\int_0^{2\pi} \log\Gamma\!\left(\frac{x}{2\pi}\right) e^{\cos x}\sin(x + \sin x)\,dx = (e - 1)(\log(2\pi) + \gamma) + \sum_{n=2}^{\infty} \frac{\log n}{n!}.
\]
Here Γ denotes the gamma function and γ denotes the Euler–Mascheroni constant.
11807. Proposed by Robin Oakapple, Albany, OR. Given a quadrilateral ABCD in-
scribed in a circle K , and a point Z inside K , the rays AZ, BZ, CZ, and DZ meet K
again at points E, F, G, and H , respectively, to yield another quadrilateral also in-
scribed in K . Develop a construction that takes as input A, B, C, and D and returns
a point Z such that this second quadrilateral has (at least) three of its sides of equal
length.
11808. Proposed by D. M. Bătineţu-Giurgiu, “Matei Basarab” National College,
Bucharest, Romania, and Neculai Stanciu, “George Emil Palade” School, Buzău,
Romania. Let Γ be the gamma function. Compute
\[
\lim_{n\to\infty} n^2 \int_{((n+1)!)^{-1/(n+1)}}^{(n!)^{-1/n}} \Gamma(nx)\,dx.
\]

11809. Proposed by Omran Kouba, Higher Institute for Applied Science and Technology,
Damascus, Syria. Let $\langle a_n \rangle$ be a sequence of real numbers.
(a) Suppose that $\langle a_n \rangle$ consists of nonnegative numbers and is nonincreasing, and
$\sum_{n=1}^{\infty} a_n/\sqrt{n}$ converges. Prove that $\sum_{n=1}^{\infty} (-1)^{\lfloor\sqrt{n}\rfloor} a_n$ converges.
(b) Find a nonincreasing sequence $\langle a_n \rangle$ of positive numbers such that
$\lim_{n\to\infty} \sqrt{n}\,a_n = 0$ and $\sum_{n=1}^{\infty} (-1)^{\lfloor\sqrt{n}\rfloor} a_n$ diverges.
11795. Proposed by Mircea Merca, University of Craiova, Craiova, Romania. Let $p$ be
the partition counting function on the set $\mathbb{Z}^+$ of positive integers, and let $g$ be the
function on $\mathbb{N}$ given by $g(n) = \frac{1}{2}\lceil n/2 \rceil \lceil (3n + 1)/2 \rceil$. Let $A(n)$ be the set of nonnegative
integer triples $(i, j, k)$ such that $g(i) + j + k = n$. Prove for $n \ge 1$ that
\[
p(n) = \frac{1}{n} \sum_{(i,j,k)\in A(n)} (-1)^{\lceil i/2 \rceil - 1}\, g(i)\, p(j)\, p(k).
\]
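The identity in Problem 11795 can be checked for small n by direct summation. The sketch below assumes the convention p(0) = 1 (needed because j and k may be 0) and reads the garbled brackets in the definition of g as ceilings, which makes g(1), g(2), g(3), ... the generalized pentagonal numbers 1, 2, 5, 7, 12, ...; the function names are ours.

```python
def partition_numbers(limit):
    """Return [p(0), ..., p(limit)] by a coin-change style dynamic program."""
    p = [1] + [0] * limit
    for part in range(1, limit + 1):
        for total in range(part, limit + 1):
            p[total] += p[total - part]
    return p


def g(n):
    """g(n) = (1/2) * ceil(n/2) * ceil((3n+1)/2), computed in integers."""
    return ((n + 1) // 2) * ((3 * n + 2) // 2) // 2


def n_times_right_side(n, p):
    """Sum over triples (i, j, k) with g(i) + j + k = n of the signed terms."""
    total = 0
    i = 1  # i = 0 contributes nothing because g(0) = 0
    while g(i) <= n:
        sign = (-1) ** ((i + 1) // 2 - 1)
        rest = n - g(i)
        convolution = sum(p[j] * p[rest - j] for j in range(rest + 1))
        total += sign * g(i) * convolution
        i += 1
    return total


p = partition_numbers(40)
identity_holds = all(n_times_right_side(n, p) == n * p[n] for n in range(1, 41))
```

For n = 5 the sum is 20 + 20 − 5 = 35 = 5 · p(5), in agreement with p(5) = 7.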

SOLUTIONS

Large Sum of Sizes Implies Large Size of Sum


11666 [2012, 699–700]. Proposed by Dmitry G. Fon-Der-Flaass (1962–2010), Institute
of Mathematics, Novosibirsk, Russia, and Max A. Alekseyev, University of South
Carolina, Columbia, SC. Let $m$ be a positive integer, and let $A$ and $B$ be nonempty
subsets of $\{0, 1\}^m$. Let $n$ be the greatest integer such that $|A| + |B| > 2^n$. Prove that
$|A + B| \ge 2^n$. (Here, $|X|$ denotes the number of elements in $X$, and $A + B$ denotes
$\{a + b : a \in A, b \in B\}$, where addition of vectors is componentwise modulo 2.)
Solution by Richard Stong, Center for Communications Research, San Diego, CA. We
prove by induction on $n$ that $|A| + |B| > 2^n$ implies $|A + B| \ge 2^n$ (for any $n$). The
case $n = 0$ is trivial, since the sum of sets that are not both empty is nonempty.
Consider $n \ge 1$. If $|B| = 1$, then $|A + B| = |A| > 2^n - 1$, which suffices. By symmetry,
we may therefore assume $|A|, |B| \ge 2$. Choose distinct $w, x \in A$ and distinct $y, z \in B$. Given
nonzero elements $u$ and $v$ in $\{0, 1\}^m$ as an additive group, there is a homomorphism
$\varphi : \{0, 1\}^m \to \{0, 1\}$ such that $\varphi(u) = \varphi(v) = 1$. With $u = x - w$ and $v = z - y$, we
obtain $\varphi(x) \ne \varphi(w)$ and $\varphi(y) \ne \varphi(z)$. For $i \in \{0, 1\}$, let $A_i = \{v \in A : \varphi(v) = i\}$,
and similarly for $B_i$. By construction, the four sets are nonempty, and $A + B$ is the
disjoint union of $(A_0 + B_0) \cup (A_1 + B_1)$ (mapping to 0) and $(A_0 + B_1) \cup (A_1 + B_0)$
(mapping to 1).
Since $|A_0| + |A_1| + |B_0| + |B_1| > 2^n$, at least one of $|A_0| + |B_0|$ and $|A_1| + |B_1|$
exceeds $2^{n-1}$, and similarly at least one of $|A_0| + |B_1|$ and $|A_1| + |B_0|$ exceeds $2^{n-1}$.
By the induction hypothesis, both sets in our decomposition of $A + B$ have size at
least $2^{n-1}$, so $|A + B| \ge 2^n$.
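For small m the statement can be confirmed exhaustively. The sketch below is a brute-force check, not part of the published solution; it encodes vectors of $\{0,1\}^m$ as integers so that componentwise addition modulo 2 becomes XOR, and the names `sumset`, `largest_n`, and `claim_holds` are ours.

```python
from itertools import combinations


def sumset(A, B):
    """A + B in {0,1}^m, with vectors encoded as integers and addition as XOR."""
    return {a ^ b for a in A for b in B}


def largest_n(total):
    """Greatest integer n with total > 2**n (requires total >= 2)."""
    n = 0
    while total > 2 ** (n + 1):
        n += 1
    return n


def claim_holds(m):
    """Check |A + B| >= 2**n for all pairs of nonempty subsets of {0,1}^m."""
    points = range(2 ** m)
    subsets = [c for size in range(1, 2 ** m + 1)
               for c in combinations(points, size)]
    return all(len(sumset(A, B)) >= 2 ** largest_n(len(A) + len(B))
               for A in subsets for B in subsets)
```

The search grows doubly exponentially in m (for m = 3 there are already 255 · 255 pairs), so this is only practical for m up to about 3.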
Editorial comment. Most solvers used induction. Traian Viteam and Robin Chap-
man used the combinatorial nullstellensatz. O.P. Lossers and Pál Peter Dályáy used
Kneser’s theorem.
Also solved by G. Apostolopoulos (Greece), R. Chapman (U. K.), P. P. Dályáy (Hungary), A. Habil (Syria),
Y. J. Ionin, O. P. Lossers (Netherlands), R. Tauraso (Italy), T. Viteam (Chile), and the proposers.

The Gambler’s Ruin in Disguise


11672 [2012, 800]. Proposed by José Luis Palacios, Universidad Simón Bolı́var,
Caracas, Venezuela. A random walk starts at the origin and moves up-right or down-
right with equal probability. What is the expected value of the first time that the
walk is k steps below its then-current all-time high? (Thus, for instance, with the
walk UDDUUUUDDUDD · · · , the walk is three steps below its maximum-so-far on
step 12.)
Solution I by Padraig Condon, Trinity College, Dublin, Ireland. The answer, which we
denote by $p_k$, is $k(k + 1)$. Since $p_1$ is the expected time of the first D, we have $p_1 = 2$.
Let $q_k$ be the expected number of steps to reach $k$ steps below the all-time high given
a starting point $k - 1$ steps below the all-time high. Note that $q_1 = p_1 = 2$. To reach $k$
steps below the all-time high, we must first reach $k - 1$ steps below the all-time high.
Hence, $p_k = p_{k-1} + q_k$ for $k \ge 2$.
From a point $k - 1$ steps below the all-time high, after the next step with equal
probability we are $k$ or $k - 2$ steps below the all-time high. In the first case, we have
arrived, while in the second case we must first return to $k - 1$ steps below the all-time
high. Thus, the expected number of steps in the second case is $q_{k-1} + q_k$. Hence,
$q_k = 1 + \frac{1}{2}(q_{k-1} + q_k)$, which simplifies to $q_k = 2 + q_{k-1}$. With $q_1 = 2$, we obtain
$q_k = 2k$. Thus, $p_k = p_{k-1} + 2k$, and $p_1 = 2$ yields $p_k = \sum_{i=1}^{k} 2i = k(k + 1)$.
Solution II by Richard Stong, San Diego, CA. We prove that the answer is k(k + 1)
by expressing the problem in terms of the stopping time of the classical Gambler’s
Ruin problem in which one gambler starts with $k and the other with $(k + 1),
and each step transfers $1 from one gambler to the other, each direction having equal
probability. Interpret upward and downward moves as wins and losses by the gambler
currently having less money, respectively. (The total amount of money is odd so there
can never be a tie.) At every step, each outcome has probability 1/2.
At each time, the gambler with less money has k − m dollars exactly when we are
m steps below the current all-time high. This is true at the start and is easily checked
to be preserved by each move. The only interesting case is when we move to a new
all-time high. Before the move, the money is split k + 1 to k. The gambler with less
money wins, the path reaches a new high and the money is again split k + 1 to k, but
the two gamblers interchange roles.
In the classical problem starting with $a and $b, it is well known that the expected
number of steps until one gambler is ruined is ab, in this case k(k + 1). We have
shown that this also is the expected number of steps until the path is first k steps below
its all-time high.
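The value k(k + 1) can also be recovered mechanically from the drawdown chain. The sketch below is ours, not taken from either solution: it tracks the state d (the number of steps below the current all-time high), writes the expected remaining time as $E_d = E_0 + c_d$, and solves $E_d = 1 + \frac{1}{2}(E_{\max(d-1,0)} + E_{d+1})$ with $E_k = 0$. A second helper replays the example path from the problem statement.

```python
def first_drawdown_step(path, k):
    """1-based step at which a U/D path is first k below its running maximum."""
    height = highest = 0
    for step, move in enumerate(path, start=1):
        height += 1 if move == "U" else -1
        highest = max(highest, height)
        if highest - height == k:
            return step
    return None  # the path never falls k below its maximum


def expected_drawdown_time(k):
    """Expected first time the walk is k below its all-time high (k >= 1)."""
    # With E_d = E_0 + c_d: c_0 = 0, and E_0 = 1 + (E_0 + E_1)/2 gives c_1 = -2.
    prev, cur = 0, -2
    for _ in range(1, k):
        # E_d = 1 + (E_{d-1} + E_{d+1})/2  =>  c_{d+1} = 2*c_d - c_{d-1} - 2
        prev, cur = cur, 2 * cur - prev - 2
    return -cur  # E_k = 0 forces E_0 = -c_k
```

The recurrence for the offsets is the same second-difference relation that underlies $q_k = 2 + q_{k-1}$ in Solution I.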
Also solved by M. Andreoli, W. Barta, R. Chapman (U. K.), C. Delorme (France), S. J. Herschkorn, G. Lau
(U. K.), O. P. Lossers (Netherlands), H. M. Mahmoud, R. Martin (Germany), I. Pinelis, M. A. Prasad (India),
R. Pratt, R. Tauraso (Italy), M. Wildon (U. K.), and the proposer.

Polynomials with Galois Groups of 2-power Order


11673 [2012, 800]. Proposed by Kent Holing, Statoil, Trondheim, Norway. Let $Q$ and
$g$ be monic polynomials in $\mathbb{Z}[x]$, with $Q$ an irreducible quartic, and let $f = Q \circ g$.
Suppose that $f$ is irreducible over $\mathbb{Q}$ and that the order of the Galois group of $f$ is a
power of 2. Which groups are possible as the Galois group of $Q$? If, moreover, $Q$ has
negative discriminant, determine the Galois group of $Q$.
Solution by Robin Chapman, University of Exeter, Exeter, England, UK. The possible
Galois groups of $Q$ are the eight-element dihedral group $D_4$, the cyclic group $Z_4$, and
the Klein four-group $V_4$. If the discriminant of $Q$ is negative, then the group is $D_4$.
Let $K$ and $L$ be the splitting fields of $Q$ and $f$, respectively, over $\mathbb{Q}$. The zeroes of
$f$ are the solutions β of $g(\beta) = \alpha$ such that α is a root of $Q$. Hence, $K \subseteq L$. Thus, if
the Galois group of $f$ has order a power of 2, then the index $[L : \mathbb{Q}]$ is a power of 2.
Since $[K : \mathbb{Q}]$ is a factor of $[L : \mathbb{Q}]$, it follows that $[K : \mathbb{Q}]$ is also a power of 2. Since
$Q$ is an irreducible quartic, its Galois group is a transitive subgroup of $S_4$, and the only
such subgroups whose order is a power of 2 are $Z_4$, $V_4$, and $D_4$.
Each of these groups can occur as the Galois group of an irreducible quartic $Q$,
such as when $Q$ is $x^4 - 4x^2 + 2$, $x^4 + 1$, or $x^4 - 2$, respectively. In each case, one can
take $g(x) = x$ or, less trivially, $g(x) = x^{2^k}$ for any positive integer $k$.
If $Q$ has negative discriminant, then it has two real and two nonreal zeroes. Complex
conjugation thus induces an element with order 2 in the Galois group that
fixes one of the zeroes of $Q$, that is, a transposition. Of the three possibilities previously
mentioned, only $D_4$ has such an element. This possibility does occur when
$Q$ is $x^4 - 2$.
Also solved by P. P. Dályay (Hungary), J. H. Lindsey II, R. Stong, M. Wildon (U. K.), and the proposer.

Norm of a Linear Functional


11674 [2012, 800]. Proposed by Pál Péter Dályay, Szeged, Hungary. Let $a$ and $b$ be
real numbers with $a < 0 < b$. Let $S$ be the set of continuous functions $f$ from $[0, 1]$
to $[a, b]$ with $\int_0^1 f(x)\,dx = 0$. Let $g$ be a strictly increasing function from $[0, 1]$ to $\mathbb{R}$.
Define $\varphi$ from $S$ to $\mathbb{R}$ by $\varphi(f) = \int_0^1 f(x)g(x)\,dx$.
(a) Find $\sup_{f\in S} \varphi(f)$ in terms of $a$, $b$, and $g$.
(b) Prove that this supremum is not attained.
Solution by Earl R. Barnes, Morgan State University, Baltimore, MD. Let $\xi = \frac{b}{b-a}$.
Since $a < 0 < b$, we have $0 < \xi < 1$. Note that $\int_0^{\xi} a\,dx + \int_{\xi}^{1} b\,dx = 0$. Let $\lambda = g(\xi)$.
For any $f$ satisfying the conditions of the problem, we have $\int_0^1 f(x)\,dx = 0$ and
$a \le f(x) \le b$, so
\begin{align*}
\varphi(f) = \int_0^1 (g(x)-\lambda) f(x)\,dx &= \int_0^{\xi} (g(x)-\lambda) f(x)\,dx + \int_{\xi}^{1} (g(x)-\lambda) f(x)\,dx \\
&\le \int_0^{\xi} (g(x)-\lambda)\,a\,dx + \int_{\xi}^{1} (g(x)-\lambda)\,b\,dx = a\int_0^{\xi} g(x)\,dx + b\int_{\xi}^{1} g(x)\,dx.
\end{align*}

Equality can hold if and only if f (x) = a a.e. on [0, ξ ] and f (x) = b a.e. on [ξ, 1].
This can happen only if f is discontinuous at ξ , so the inequality is strict for all f ∈
S. On the other hand, this upper bound can be approached as closely as we like by
choosing ε small and positive and taking f (x) = a for 0 ≤ x ≤ ξ − ε, f (x) = b for
ξ + ε ≤ x ≤ 1, and f linear on the interval [ξ − ε, ξ + ε].
Also solved by K. F. Andersen (Canada), R. Bagby, P. Bracken, R. Chapman (U. K.), S. J. Herschkorn, B.
Karaivanov, O. Kouba (Syria), J. C. Linders (Netherlands), J. H. Lindsey II, O. P. Lossers (Netherlands), I.
Pinelis, Á. Plaza (Spain), A. Stenger, R. Stong, E. I. Verriest, S. V. Witt, GCHQ Problem Solving Group
(U. K.), TCDmath Problem Group (Ireland), and the proposer.

An Inequality for the Partition Function


11675 [2012, 801]. Proposed by Mircea Merca, Constantin Istrati Technical College,
Campina, Romania. Let p be the Euler partition function, i.e., p(n) is the number of
nondecreasing lists of positive integers that sum to n. Let p(0) = 1, and let p(n) = 0
for n < 0. Prove that for n ≥ 0 with n ≠ 3,

p(n) − 4 p(n − 3) + 4 p(n − 5) − p(n − 8) > 0.

Solution by Roberto Tauraso, Università di Roma “Tor Vergata,” Rome, Italy. Let $P(x)$
be the generating function for integer partitions,
\[
P(x) = \sum_{n=0}^{\infty} p(n)x^n = \prod_{n=1}^{\infty} \frac{1}{1 - x^n}.
\]
We have to prove that for $n \ne 3$, the coefficient of $x^n$ in $(1 - 4x^3 + 4x^5 - x^8)P(x)$ is
positive. Since
\[
1 - 4x^3 + 4x^5 - x^8 = (x^2 - x^3)(1 - x)(1 - x^2) + (1 + x + x^2)(1 - x)(1 - x^2)(1 - x^3),
\]
we must prove positivity of the coefficient of $x^n$ for $n \ne 3$ in
\[
(x^2 - x^3)\prod_{n=3}^{\infty} \frac{1}{1 - x^n} + (1 + x + x^2)\prod_{n=4}^{\infty} \frac{1}{1 - x^n}.
\]
It is clear that the coefficient of $x^n$ in the second term is positive for $n \ne 3$, so it is
sufficient to show that the coefficient of $x^n$ in the first term is nonnegative for $n \ne 3$.
It suffices to show that for $n \ge 4$, among partitions with all parts at least 3, there are at
least as many with sum $n - 2$ as with sum $n - 3$. This follows from the injection that
adds 1 to the largest part.
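The inequality, together with its single exception at n = 3, can be tabulated directly. The sketch below is a numerical check only; the helper names are ours, and p(n) is taken to be 0 for negative n as in the problem statement.

```python
def partition_numbers(limit):
    """Return [p(0), ..., p(limit)] by a coin-change style dynamic program."""
    p = [1] + [0] * limit
    for part in range(1, limit + 1):
        for total in range(part, limit + 1):
            p[total] += p[total - part]
    return p


def merca_expression(p, n):
    """p(n) - 4p(n-3) + 4p(n-5) - p(n-8), with p(negative) = 0."""
    def p_ext(m):
        return p[m] if m >= 0 else 0
    return p_ext(n) - 4 * p_ext(n - 3) + 4 * p_ext(n - 5) - p_ext(n - 8)


p = partition_numbers(300)
values = [merca_expression(p, n) for n in range(301)]
```

The first few values are 1, 1, 2, −1, 1, 3, 3, 3, 5, 5, 8, so the expression is negative only at n = 3 in this range.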
Also solved by G. Apostolopoulos (Greece), D. Beckwith, R. Chapman (U. K.), P. P. Dályay (Hungary), C. De-
lorme (France), I. Gessel, J.-P. Grivaux (France), Y. J. Ionin, O. P. Lossers (Netherlands), M. A. Prasad (India),
R. Stong, R. Tauraso (Italy), M. Wildon (U. K.), GCHQ Problem Solving Group (U. K.), and the proposer.

A Powered Gamma Limit
11676 [2012, 801]. Proposed by D. M. Bătineţu-Giurgiu, “Matei Basarab” National
College, Bucharest, Romania, and Neculai Stanciu, “George Emil Palade” Secondary
School, Buzau, Romania. For real $t$, find
\[
\lim_{x\to\infty} x^{\sin^2 t}\left(\Gamma(x + 2)^{(\cos^2 t)/(x+1)} - \Gamma(x + 1)^{(\cos^2 t)/x}\right).
\]
Here, Γ is the gamma function.


Solution by Santiago de Luxán, Fraunhofer Heinrich-Hertz-Institute, Berlin. We will
prove a generalization: with $f(t) = \cos^2 t\, e^{-\cos^2 t}$, and for $a, b \in \mathbb{R}$,
\[
L(a, b) = \lim_{x\to\infty} x^{\sin^2 t}\left(\Gamma(x + a)^{\cos^2 t/(x+a-1)} - \Gamma(x + b)^{\cos^2 t/(x+b-1)}\right) = (a - b) f(t).
\]
Assume that the limit exists (applying l'Hôpital's rule at the end of the calculation
verifies that it does). Apply Stirling's formula to both $\Gamma(x + a)$ and $\Gamma(x + b)$ to obtain
\begin{align*}
L(a, b) &= \lim_{x\to\infty} x^{\sin^2 t}\left[\left(\frac{x + a - 1}{e}\right)^{\cos^2 t} - \left(\frac{x + b - 1}{e}\right)^{\cos^2 t}\right] \\
&= \lim_{x\to\infty} e^{-\cos^2 t}\, x^{1-\cos^2 t}\left[(x + a - 1)^{\cos^2 t} - (x + b - 1)^{\cos^2 t}\right] \\
&= e^{-\cos^2 t} \lim_{x\to\infty} \frac{\left(\frac{x + a - 1}{x}\right)^{\cos^2 t} - \left(\frac{x + b - 1}{x}\right)^{\cos^2 t}}{1/x}.
\end{align*}
This is an indeterminate limit that can be evaluated using l'Hôpital's rule:
\begin{align*}
L(a, b) &= f(t) \lim_{x\to\infty}\left[(a - 1)\left(1 + \frac{a - 1}{x}\right)^{-\sin^2 t} - (b - 1)\left(1 + \frac{b - 1}{x}\right)^{-\sin^2 t}\right] \\
&= f(t)\big((a - 1) - (b - 1)\big) = (a - b)\cos^2 t\, e^{-\cos^2 t}.
\end{align*}
For the case in the problem as stated, $a = 2$ and $b = 1$ so $L = e^{-\cos^2 t}\cos^2 t$.
Also solved by K. F. Andersen (Canada), R. Boukharfane (Canada), P. Bracken, R. Chapman (U. K.), H. Chen,
P. P. Dályay (Hungary), A. Ercan (Turkey), D. Fleischman, C. Georghiou (Greece), O. Geupel (Germany),
M. L. Glasser, J.-P. Grivaux (France), O. Kouba (Syria), K.-W. Lau (China), J. Li, O. P. Lossers (Netherlands),
H. M. Mahmoud, G. Martin (Canada), R. Nandan, M. Omarjee (France), P. Perfetti (Italy), I. Pinelis, R. Stong,
D. B. Tyler, GCHQ Problem Solving Group (U. K.), and the proposers.

Dedekind η Function Disguised


11677 [2012, 880]. Proposed by Albert Stadler, Herrliberg, Switzerland. Evaluate
\[
\prod_{n=1}^{\infty} \left(1 + 2e^{-n\pi\sqrt{3}} \cosh(n\pi/\sqrt{3})\right).
\]

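Solution by Radouan Boukharfane, Polytechnique Montréal, Montreal, Canada. The
answer is $e^{\pi\sqrt{3}/18}/\sqrt[4]{3}$. We use the Dedekind η function defined for a complex number
$t$ with positive imaginary part by
\[
\eta(t) = e^{\pi i t/12} \prod_{n=1}^{\infty} \left(1 - e^{2\pi i n t}\right).
\]
It is well known that this function satisfies the functional equation $\eta(-1/t) = \sqrt{-it}\,\eta(t)$.
Put $t = i/\sqrt{3}$, and use $-1/t = 3t$ to derive
\begin{align*}
\prod_{n=1}^{\infty} \left(1 + 2e^{-n\pi\sqrt{3}} \cosh(n\pi/\sqrt{3})\right)
&= \prod_{n=1}^{\infty} \left(1 + e^{-n\pi\sqrt{3}}\left(e^{n\pi/\sqrt{3}} + e^{-n\pi/\sqrt{3}}\right)\right) \\
&= \prod_{n=1}^{\infty} \left(1 + e^{2\pi i n t} + e^{4\pi i n t}\right)
= \frac{\prod_{n=1}^{\infty} \left(1 - e^{6\pi i n t}\right)}{\prod_{n=1}^{\infty} \left(1 - e^{2\pi i n t}\right)}
= \frac{e^{-\pi i t/4}\,\eta(3t)}{e^{-\pi i t/12}\,\eta(t)} \\
&= e^{-\pi i t/6}\,\frac{\eta(-1/t)}{\eta(t)} = e^{\pi/(6\sqrt{3})} \sqrt{-it} = \frac{e^{\pi\sqrt{3}/18}}{\sqrt[4]{3}}.
\end{align*}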

Editorial comment. Unfortunately the problem appeared with typos, making the prod-
uct divergent. Solutions showing divergence were also accepted.
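The closed form for the corrected product, as evaluated in the solution above, is easy to confirm numerically because the factors approach 1 geometrically. The sketch below is only a floating-point check; the function name and the cutoff of 40 factors are our choices.

```python
import math


def partial_product(terms=40):
    """Product of 1 + 2 exp(-n pi sqrt(3)) cosh(n pi / sqrt(3)) for n = 1..terms."""
    root3 = math.sqrt(3.0)
    value = 1.0
    for n in range(1, terms + 1):
        value *= 1.0 + 2.0 * math.exp(-n * math.pi * root3) * math.cosh(n * math.pi / root3)
    return value


# Closed form from the solution: exp(pi * sqrt(3) / 18) / 3**(1/4)
closed_form = math.exp(math.pi * math.sqrt(3.0) / 18.0) / 3.0 ** 0.25
gap = abs(partial_product() - closed_form)
```

Each factor equals $1 + q^n + q^{2n}$ with $q = e^{-2\pi/\sqrt{3}} \approx 0.027$, which is why a few dozen factors already reach machine precision.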
Also solved by G. Apostolopoulos (Greece), R. Chapman (U. K.), D. Fleischman, O. Geupel (Germany),
M. Omarjee (France), R. Stong, R. Tauraso (Italy), and the proposer.

The Determinant of the Fibonacci Matrix


11678 [2012, 880]. Proposed by Farrukh Ataev Rakhimjanovich, Westminster International
University in Tashkent, Tashkent, Uzbekistan. Let $F_k$ be the $k$th Fibonacci
number, where $F_0 = 0$ and $F_1 = 1$. For $n \ge 1$, let $A_n$ be an $(n + 1) \times (n + 1)$ matrix
with entries $a_{j,k}$ given by $a_{0,k} = a_{k,0} = F_k$ for $0 \le k \le n$ and by $a_{j,k} = a_{j-1,k} + a_{j,k-1}$
for $j, k \ge 1$. Compute the determinant of $A_n$.
Solution by Yuri Ionin, Central Michigan University, Mount Pleasant, MI. We show
that the determinant is $-2^{n-1}$.
Let us call an $m$-by-$m$ matrix an NW-matrix if each entry not in the first row or
column equals the sum of its northern and western neighbors; furthermore, it is a unit
NW-matrix if all entries in the first column equal 1. Index the rows and columns from
1 to $m$. We claim that every unit NW-matrix has determinant 1. We use induction on
$m$; the claim is immediate for $m = 1$.
For $m \ge 2$, let $X$ be a unit NW-matrix of order $m$. Obtain $Y$ from $X$ by subtracting
each row from the row immediately below it, leaving row 1 unchanged. Column 1 of $Y$
is all 0 except for 1 in the first row. For $i \ge 2$, we have $y_{i,2} = x_{i,2} - x_{i-1,2} = x_{i,1} = 1$.
Also, for $i \ge 3$ and $j \ge 2$,
\[
y_{i,j} = x_{i,j} - x_{i-1,j} = (x_{i-1,j} + x_{i,j-1}) - (x_{i-2,j} + x_{i-1,j-1}) = y_{i-1,j} + y_{i,j-1},
\]
so $Y$ is an NW-matrix. The matrix $Z$ obtained from $Y$ by deleting the first row and
column is a unit NW-matrix; by the induction hypothesis, $\det Z = 1$. Expanding the
determinant of $Y$ along the first column yields $\det(X) = \det(Y) = \det(Z) = 1$.
Clearly $\det A_1 = -1$, so choose $n \ge 2$. Obtain $B$ from $A_n$ by leaving the first two
rows unchanged and subtracting from each subsequent row the two rows immediately
above it; note that $\det B = \det A_n$. For $i \ge 3$ and $j \ge 1$,
\begin{align*}
b_{i,j} - b_{i,j-1} &= (a_{i,j} - a_{i-1,j} - a_{i-2,j}) - (a_{i,j-1} - a_{i-1,j-1} - a_{i-2,j-1}) \\
&= a_{i-1,j} - a_{i-2,j} - a_{i-3,j} = b_{i-1,j}.
\end{align*}
The matrix $B$ takes the form
\[
\begin{bmatrix}
0 & 1 & 1 & \cdots \\
1 & 2 & 3 & \cdots \\
0 & 0 & 2 & \cdots \\
0 & 0 & 2 & \cdots \\
\vdots & \vdots & \vdots & \ddots \\
0 & 0 & 2 & \cdots
\end{bmatrix}.
\]
Expanding $\det B$ along the first two columns yields $\det B = -\det(2W)$, where $W$ is a
unit NW-matrix of order $n - 1$. By the claim, $\det A_n = -2^{n-1} \det(W) = -2^{n-1}$.
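The value $-2^{n-1}$ can be confirmed for small n with exact arithmetic. The sketch below builds $A_n$ from its defining recurrence (with rows and columns indexed from 0) and computes the determinant by Gaussian elimination over the rationals; the function names are ours, and this is a check on the result rather than a rendering of Ionin's argument.

```python
from fractions import Fraction


def fibonacci_matrix(n):
    """The (n+1) x (n+1) matrix A_n: Fibonacci borders, a[j][k] = a[j-1][k] + a[j][k-1]."""
    fib = [0, 1]
    while len(fib) < n + 1:
        fib.append(fib[-1] + fib[-2])
    a = [[0] * (n + 1) for _ in range(n + 1)]
    for k in range(n + 1):
        a[0][k] = a[k][0] = fib[k]
    for j in range(1, n + 1):
        for k in range(1, n + 1):
            a[j][k] = a[j - 1][k] + a[j][k - 1]
    return a


def determinant(matrix):
    """Exact determinant by Gaussian elimination with row swaps."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    size = len(rows)
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if rows[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # a zero column means the matrix is singular
        if pivot != col:
            rows[col], rows[pivot] = rows[pivot], rows[col]
            det = -det  # each row swap flips the sign
        det *= rows[col][col]
        for r in range(col + 1, size):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, size):
                rows[r][c] -= factor * rows[col][c]
    return det
```

For n = 2 the matrix is [[0, 1, 1], [1, 2, 3], [1, 3, 6]], whose determinant is −2.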
Editorial comment. Sergio Falcón and Ángel Plaza observed that the problem and its
solution appear as an example in A. R. Moghaddamfar, S. M. H. Pooya, Generalized
Pascal triangles and Toeplitz matrices, Electron. J. Lin. Alg. 18 (2009), 564–588.
Ionin's matrix $W$, with $i, j$-entry $\binom{i+j}{i}$, is shown to have determinant 1 via four proofs
in A. Edelman and G. Strang, Pascal Matrices, this MONTHLY 111 (2004), 189–197;
an earlier such proof appears in C. A. Rupp, Problem 3468, this MONTHLY 37 (1930),
552 (solution by H.T.R. Aude, 38 (1931), 355).
Also solved by D. Beckwith, R. Chapman (U. K.), P. P. Dályay (Hungary), C. Delorme (France), S. Falcón
& Á. Plaza (Spain), O. Geupel (Germany), J. P. Grivaux (France), E. A. Herman, B. Karaivanov, O. Kouba
(Syria), O. P. Lossers (Netherlands), M. Omarjee (France), R. E. Prather, C. P. Rupert, R. Stong, R. Tauraso
(Italy), J. van Hamme (Belgium), Armstrong Problem Solvers, GCHQ Problem Solving Group (U. K.), and
the proposer.

Lower Bound on a Product


11679 [2012, 000]. Proposed by Tim Keller, Orangeville, CT. Let n be an integer
greater than 2, and let $a_2, \ldots, a_n$ be positive real numbers with product 1. Prove that
\[
\prod_{k=2}^{n} (1 + a_k)^k > \frac{2}{e}\left(\frac{n}{2}\right)^{2n-1}.
\]

Solution by Traian Viteam, Punta Arenas, Chile. For $n = 2$ the inequality reduces to
$4 > 2/e$, which is trivial. For $2 < k \le n$, the AM–GM inequality implies that
\[
(1 + a_k)^k = \left(\frac{1}{k-2} + \cdots + \frac{1}{k-2} + \frac{a_k}{2} + \frac{a_k}{2}\right)^{k} \ge \frac{k^k}{(k-2)^{k-2}} \cdot \frac{a_k^2}{4}.
\]
Multiplying $(1 + a_2)^2 > a_2^2$ together with these inequalities for $k \in \{3, \ldots, n\}$ yields
\[
\prod_{k=2}^{n} (1 + a_k)^k > \frac{(n-1)^{n-1}\, n^n}{2^2\, 4^{n-2}}\, a_2^2 \cdots a_n^2 = 2\left(1 - \frac{1}{n}\right)^{n-1} \left(\frac{n}{2}\right)^{2n-1},
\]
since $\prod_{i=2}^{n} a_i = 1$ is assumed. Using $e^x \ge 1 + x$ with $x = \frac{1}{n-1}$, it follows that
\[
e \ge \left(1 + \frac{1}{n-1}\right)^{n-1} = \left(1 - \frac{1}{n}\right)^{-(n-1)}
\]
and hence $\left(1 - \frac{1}{n}\right)^{n-1} \ge e^{-1}$. Substituting this inequality into the product inequality
above yields the stated result.
Also solved by G. Apostolopoulos (Greece), R. Boukharfane (Canada), E. Eyeson, D. Fleischman, N. Grivaux
(France), S. Kaczkowski, O. Kouba (Syria), O. P. Lossers (Netherlands), R. E. Prather, D. B. Tyler, GCHQ
Problem Solving Group (U. K.), and the proposer.



REVIEWS
Edited by Jeffrey Nunemacher
Mathematics and Computer Science, Ohio Wesleyan University, Delaware, OH 43015

Mostly Surfaces. By Richard Evan Schwartz. American Mathematical Society, Providence,
RI, 2010, xiii+314 pp., ISBN 978-0-8218-5368-9, $47.00.

Reviewed by Genevieve S. Walsh


This excellent book, aimed at undergraduates, delves into a wide variety of beautiful
topics, and is a great place to begin thinking about fun and serious mathematics. My
general advice for those considering this enjoyable mathematical romp is just to go
ahead and jump.
Mostly Surfaces is not what one might expect out of an undergraduate topology
course. For example, it does not contain the definition of a topology. Instead this book
advances quickly, assuming very little mathematical background, into current topics
in geometry and topology. While it might be difficult to fully experience this kind of
depth at the undergraduate level, seeing exciting new topics is important. This is what
Mostly Surfaces does exceptionally well. A course that involves a good 2–3 weeks
of wallowing in the joys of separation properties would make a nice and probably
necessary complement to a course based on this book.
So what fun and exciting topics are contained in this book? And how does it go
from the definition of a group to Veech surfaces?
This book actually gets through a lot of topology very quickly by restricting to met-
ric spaces. Then the author uses the gluing construction to create lots of surfaces. He
justifies the restriction to metric spaces by pointing out that in the context of topolog-
ical spaces, it takes a lot of time to understand what the construction actually means,
which is completely true. However, students who have not had some experience with
metric spaces will probably not be able to go from the definition of a metric space to
proving that a torus is a manifold in five pages. The section with examples of gluing is
great, and more of this kind of thing could be added by the instructor. In fact, probably
choosing and expanding upon a few of the topics would make a reasonable course, and
would allow one flexibility to adapt to the audience.
After dispensing with the classification of surfaces, Mostly Surfaces moves along
quickly. The definition of a group is given, without assuming that the reader has any
familiarity with the concept. The book forges ahead with the definition of fundamental
group. There are also some nods to the more uninitiated students, where loops in $S^1$ are
described as paths a bug might take walking around the circle, ending at the same place
it began. Then if you are standing in the middle of the circle, watching the bug carefully
the whole time (without moving your feet!), the winding number of the path is the
number of times your head has been twisted around, clockwise or counterclockwise.
There is also some meaty material here. The author proves the isomorphism theorem,
that the deck group of a simply connected cover $E : \tilde{X} \to X$ is the same as $\pi_1(X)$,
when $\tilde{X}$ and $X$ are path-connected metric spaces. He also proves the existence of
universal covers for “good” metric spaces, referring the reader to Hatcher for a more
formal treatment. Allen Hatcher’s book [3] is, in my mind, the definitive text for this
material. Anyone (strong undergraduate or graduate student) with plans to use the
fundamental group later in their work should be pointed here.

http://dx.doi.org/10.4169/amer.math.monthly.121.10.954

At this point I should emphasize that this book is very different from other under-
graduate books which concern surfaces. Armstrong [1], for example, is for a much
different kind of course. The two books cover some of the same material, namely the
fundamental group and classification of surfaces. However, Armstrong [1] is much
more traditional, and less concerned about reaching research material. The same goes
for most books (or course notes) on Riemann surfaces. In fact, one of the things that
makes Mostly Surfaces so worthwhile is that the material (in totality) and point of view
are not presented elsewhere at an (even allegedly) undergraduate level.
The text that most reminds me of Mostly Surfaces is a book by Francis Bona-
hon, Low-Dimensional Geometry: From Euclidean Surfaces To Hyperbolic Knots, [2],
which is also published under the AMS Student Mathematical Library. There is some
material overlap with Mostly Surfaces: they both cover hyperbolic, Euclidean and
spherical surfaces, and gluing constructions. More significantly, Bonahon [2] is very
similar in spirit: it starts with surfaces, with a decidedly geometric point of view, and
then uses this to launch into research level material. Bonahon [2] goes in the direc-
tion of hyperbolic 3-manifolds, whereas Mostly Surfaces delves into different struc-
tures on surfaces, such as flat cone surfaces and hyperbolic structures. Both books
have been used by my colleagues in successful topics courses. I can easily imagine
teaching courses at a variety of levels based on either, or both, of these books. For
example, a little bit of Chapter 1 along with Part 1: “Surfaces and Topology” and Part
2: “Surfaces and Geometry” of Mostly Surfaces would make a great (although some-
what unusual) introductory geometry/topology course. One could even just do Part 2
on its own, lingering on the more beautiful parts, and filling in the classification of
surfaces. It could be nice to supplement this with some explicit examples from Chap-
ter 5 of Bonahon [2] and of geometric universal covers in Chapters 6 and 7 of that
same book. For a more advanced audience, Parts 3, 4, and 5 of Mostly Surfaces would
provide an overview/review of complex analysis and an entry into the fields of cone
surfaces, Veech surfaces, and Teichmüller theory. That same audience would appreci-
ate the three-dimensional hyperbolic geometry, Kleinian groups, and the example of
the figure-8 knot complement in [2].
Both Mostly Surfaces and Bonahon [2] have too much material for a beginning
mathematician to digest at one meal. But that itself is part of an introduction to re-
search: there is usually too much to do. And there is a lot of beautiful math in this
book.
The material discussed above is essentially Section 1, “Surfaces and Topology.”
Section 2, “Surfaces and Geometry,” is not really standard in the undergraduate cur-
riculum, but it should be. There are sections on the three geometries in dimension 2:
Euclidean, spherical, and hyperbolic. The section on Euclidean geometry contains par-
ticularly nice and unusual tidbits, like Pick’s theorem. Actually I like all of it: the area
of a hyperbolic surface in terms of Euler characteristic, the Hairy Ball theorem, the
isometries of $S^2$, etc. Although I think some of the other material in this book might
be difficult for beginning undergraduates to digest, Section 2 is accessible, interesting,
important, and often not taught in an undergraduate curriculum. Spherical and hyper-
bolic geometry ideally would be taught widely, not just to develop geometric intuition
but because they are applicable to so many areas of science. For people interested in
learning more deeply about these topics, as well as many others, I would highly advise
the now classic Thurston [5].



Section 3, “Surfaces and Complex Analysis,” is a nice overview of complex anal-
ysis, with a decidedly geometric bent. This section plus other topics from the book
might make a nice companion to Needham [4] or a more traditional complex analysis
course.
Sections 4 and 5, “Flat Cone Surfaces” and “The Totality of Surfaces,” are more
advanced and specialized than the previous part of the book. Cone surfaces them-
selves are possibly easier to visualize than smooth surfaces, and the connection be-
tween translation surfaces and billiard paths brings in dynamics nicely. (See Zorich
[6] for a more comprehensive treatment.) However, computing the Veech group of a
translation surface is probably at a level of abstraction above where an undergraduate
is happy. Similarly, a student who has just seen the definition of a surface in Chapter
2 will (probably) not have developed enough intuition in one semester to cope with
Teichmüller space and the moduli space. I think this material is absolutely great for
beginning graduate students or anyone with a research interest in geometry, topology,
or dynamics. Continued fractions and the Farey graph (Chapter 19) are, of course,
good for everyone.
Section 6, “Dessert,” is exactly that: some topics the author could not resist. The
Banach-Tarski theorem is a classic, and would work well with a course in point-set
topology. There seems to be a theme in this section of cutting things up and putting
them back together a different way, or perhaps not being able to do so. I am particu-
larly pleased that there is some discussion on scissors congruence. A dissection of a
polyhedron is a description of that polyhedron as a finite union of (smaller) polyhe-
dra which have disjoint interiors. Two polyhedrons $P$ and $Q$ are scissors congruent
if they admit dissections $P = P_1 \cup P_2 \cup \cdots \cup P_n$ and $Q = Q_1 \cup Q_2 \cup \cdots \cup Q_n$ such
that each $P_k$ is isometric to $Q_k$. The Dehn dissection theorem says that the cube and
the tetrahedron (of the same volume) are not scissors congruent. Finally, the Cauchy
rigidity theorem is proven, which asserts that one cannot “flex” a three-dimensional
strictly convex Euclidean polyhedron.
I really like Mostly Surfaces and I hope to use it for an undergraduate course in the
very near future. My only criticism is that I think using this book, especially in totality,
with beginning undergraduates is a bit like playing chess with a 5-year-old. There may
be little plastic bishops thrown against the wall. But it’s probably good for them. And
it’s definitely fun.

REFERENCES

1. M. A. Armstrong, Basic Topology. Undergraduate Texts in Mathematics. Springer, New York, 1997,
http://dx.doi.org/10.1007/978-1-4757-1793-8.
2. F. Bonahon, Low-Dimensional Geometry: From Euclidean Surfaces To Hyperbolic Knots. Student Mathe-
matical Library: IAS/Park City Mathematical Subseries. American Mathematical Society, Providence, RI,
2009, http://dx.doi.org/10.5860/choice.47-5697.
3. A. Hatcher, Algebraic Topology. Cambridge Univ. Press, Cambridge, 2001.
4. T. Needham, Visual Complex Analysis. Oxford Univ. Press, Oxford, 1999,
http://dx.doi.org/10.2307/3618747.
5. W. P. Thurston, Three-Dimensional Geometry and Topology. Princeton Univ. Press, Princeton, NJ, 1997.
6. A. Zorich, Flat surfaces, in Frontiers in Number Theory, Physics, and Geometry. Vol. I. Springer, Berlin,
2006, http://dx.doi.org/10.1007/3-540-31347-8_13.

Tufts University, Medford, MA 02144


genevieve.walsh@tufts.edu

EDITOR’S ENDNOTES

The following comes to us from former MONTHLY Editor Dan Velleman.


I recently learned from Donald Knuth that a version of Question 2 in my paper “A
drug induced random walk” (this MONTHLY 121 (2014), 299–317) appeared in the
MONTHLY problems section as problem E3429. It was proposed in 1991 by Donald
Knuth and John McCarthy, and a solution by Walter Stromquist and Tim Hesterberg
was published in the August-September 1992 issue. The solution includes an editorial
comment tracing the problem back to at least 1982.

We received the following from George Andrews at Penn State.


I wish to point out a reference relevant to the paper:
“An asymptotic formula for $\left(1 + \frac{1}{x}\right)^x$ based on the partition function” by C. P. Chen
and J. Choi, this MONTHLY 121 (2014), 338–343.
Their main theorem as a formal series in $\frac{1}{x}$ follows from Nathan Fine’s Theorem 1
on page 37 of his book Basic Hypergeometric Series and Applications, AMS, Providence, 1988.
In Fine’s theorem, take $C_j(k) = \frac{1}{k!}\left(\frac{1}{j+1}\right)^k$. Then

$$\psi_j(q) = e^{q/(j+1)},$$

which implies

$$\prod_{j \ge 1} \psi_j(q^j) = e^{\sum_{j=1}^{\infty} \frac{q^j}{j+1}} = e^{-1 - \frac{\log(1-q)}{q}} = e^{-1}(1-q)^{-1/q}.$$

The rest follows by replacing $q$ by $-\frac{1}{x}$ in Fine’s Theorem 1. If one examines this
Theorem, following Fine’s analysis, one can easily deduce that the Chen-Choi Theorem
is actually an identity for $|x| > 1$ because $|a_j| \le p(j)$.
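The exponent computation in the note above can be checked numerically. The following is a minimal sketch, not part of the original note: the function names and the truncation length of 200 terms are illustrative assumptions. It compares a partial sum of $\sum_{j\ge 1} q^j/(j+1)$ with the closed form $-1 - \log(1-q)/q$, and then substitutes $q = -1/x$.

```python
import math

def partial_sum(q, terms=200):
    """Partial sum of sum_{j>=1} q^j / (j+1); adequate for |q| well below 1."""
    return sum(q**j / (j + 1) for j in range(1, terms + 1))

def closed_form(q):
    """Closed form -1 - log(1-q)/q of the series, for 0 < |q| < 1."""
    return -1.0 - math.log(1.0 - q) / q

def check(x):
    """Return (exp(series at q=-1/x), e^{-1} (1 + 1/x)^x) for comparison."""
    q = -1.0 / x
    lhs = math.exp(partial_sum(q))
    rhs = math.exp(-1.0) * (1.0 + 1.0 / x) ** x
    return lhs, rhs

if __name__ == "__main__":
    for x in (2.0, 5.0, 10.0):
        print(x, check(x))
```

The two values returned by `check` should agree to within floating-point rounding for moderately sized $|x| > 1$; for $|x|$ very close to 1 the fixed truncation would need more terms.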
In regard to the same paper, a grant number was incorrectly listed. The grant
2012-0002957 should have been listed as 2010-0011005. We stand corrected with our
apologies.

Rob Pratt, senior manager at SAS, sends along the following.


A special case (j = n) of the problem addressed in “Finding Your Seat Versus
Tossing a Coin” (this MONTHLY 121 (2014), 545–546) appeared as Problem 735 in
The College Mathematics Journal (September 2002). The solution (“Finding the right
seat”) appeared in September 2003 and contained several references as well as the
generalization that appears in the current article.

http://dx.doi.org/10.4169/amer.math.monthly.121.10.957

December 2014] EDITOR’S ENDNOTES 957

With regard to the same paper, we have the following from Jerrold W. Grossman.
The seat-taking problem on page 545 of the June/July Monthly has been around for
a long time. It appears in Peter Winkler’s Mathematical Puzzles (published in 2003,
page 35) and it was on Car Talk (October 4, 2004). Surely a reference to one or both
of these would have been appropriate. The generalization from just worrying about
the last person to finding the probabilities for everyone is nice, and is not found in the
usual references one finds when searching for this problem on the Web (for example
by using the words airplane seat random).

Jonathan Sondow and Petros Hadjicostas send along the following.


In Chris D. Lynd’s article “Using Difference Equations to Generalize Results
for Periodic Nested Radicals” (this MONTHLY 121 (2014), 45–59), he first recalls
Ramanujan’s 1911 formula

$$3 = \sqrt{1 + 2\sqrt{1 + 3\sqrt{1 + 4\sqrt{1 + \cdots}}}}.$$

Then he says that in 1935 Herschfeld proved that (a) the limit

$$\lim_{n\to\infty} \sqrt{1 + 2\sqrt{1 + 3\sqrt{1 + \cdots (n-1)\sqrt{1 + n\sqrt{1}}}}}$$

exists and equals 3, and (b) for a nonnegative sequence $a_n$, the infinite nested radical

$$\sqrt{a_1 + \sqrt{a_2 + \sqrt{a_3 + \cdots}}}$$

converges if and only if the sequence $(a_n)^{2^{-n}}$ is bounded.
While all that is true, Herschfeld was not the first. Eight years earlier, in a note on
page 348 of “The Collected Papers of Srinivasa Ramanujan” (G.H. Hardy, P.V. Seshu
Aiyar and B.M. Wilson, eds., Cambridge University Press, 1927; reprinted by Chelsea,
1962), T. Vijayaraghavan proved statements (a) and (b).
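
Statement (a) is easy to explore numerically. The sketch below is an illustration only, with hypothetical function names: it evaluates the finite truncations from the inside out, and includes a general helper for radicals of the form in (b). The constant sequence $a_n = 1$ is used as a familiar test case, where the limit is the golden ratio.

```python
import math

def ramanujan_truncation(n):
    """Evaluate sqrt(1 + 2*sqrt(1 + 3*sqrt(... (n-1)*sqrt(1 + n*sqrt(1)) ...)))."""
    value = 1.0  # innermost sqrt(1)
    for k in range(n, 1, -1):  # k = n, n-1, ..., 2
        value = math.sqrt(1.0 + k * value)
    return value

def nested_radical(terms):
    """Evaluate sqrt(a_1 + sqrt(a_2 + ... + sqrt(a_N))) for a finite list of terms."""
    value = 0.0
    for a in reversed(terms):
        value = math.sqrt(a + value)
    return value

if __name__ == "__main__":
    for n in (2, 5, 10, 20, 40):
        print(n, ramanujan_truncation(n))
    print(nested_radical([1.0] * 60), (1 + math.sqrt(5)) / 2)
```

The printed truncations increase toward 3, which is consistent with (a) but of course proves nothing about the limit.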

Michael Levitan at Villanova helps us correct a typo.


In the May 2014 MONTHLY, there is a typo in Daniel Ullman’s review of “Math
on Trial.” On p. 464, in the third paragraph, the first word of the 5th line should read
“certainty” not “certainly.”

Jerry Koliha sends the following comments about his MONTHLY paper “Evaluating
Lebesgue integrals efficiently with the FTC” (this MONTHLY 121 (2014), 361–364).
There is a misprint in my paper, but it occurs four times. In the theorems Lebesgue’s
FTC, Theorem 1, Theorem 2, and Theorem 3, the first occurrence of ∞ in each of these
theorems should be −∞.

Finally, Michael Maltenfort from Northwestern University contributes the following.


On page 487 of “On the Solution of Linear Mean Recurrences,” (this MONTHLY,
121 (2014), 486–498), the authors wish to show $r + \frac{1}{m r^m} \ne 1 + \frac{1}{m}$ for $|r| > 1$. They
claim that $|r| + \frac{1}{m|r|^m} < 1 + \frac{1}{m}$, which is easily seen to be false (e.g., $r = m = 3$); in
fact, the given proof actually shows that this inequality always holds in the opposite
direction.
The authors’ approach, which is to show that $\left|r + \frac{1}{m r^m}\right| \ne 1 + \frac{1}{m}$, cannot work:
when $m \ge 2$ is even, the intermediate value theorem guarantees that $\left|r + \frac{1}{m r^m}\right| = 1 + \frac{1}{m}$
for some $r$ between −1 and −2. Nonetheless, the originally desired result is true,
and the authors prove a more general result in Example 2.2.
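
Both observations in this note can be checked with a few lines of arithmetic. The following sketch is illustrative only (the function names are hypothetical): it tests the claimed inequality at $r = m = 3$, and uses bisection on $[-2, -1]$ to locate a point where $|r + 1/(m r^m)| = 1 + 1/m$ for an even $m$.

```python
def lhs(r, m):
    """|r + 1/(m r^m)| for real r != 0."""
    return abs(r + 1.0 / (m * r**m))

def claimed_bound_holds(r, m):
    """Test the claimed inequality |r| + 1/(m|r|^m) < 1 + 1/m."""
    return abs(r) + 1.0 / (m * abs(r) ** m) < 1.0 + 1.0 / m

def find_root(m, lo=-2.0, hi=-1.0, iters=200):
    """Bisection for lhs(r, m) = 1 + 1/m on [-2, -1], assuming m >= 2 is even.

    For such m the difference g is positive at r = -2 and negative at r = -1,
    so the intermediate value theorem applies.
    """
    def g(r):
        return lhs(r, m) - (1.0 + 1.0 / m)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

if __name__ == "__main__":
    print(claimed_bound_holds(3.0, 3))   # False: the claimed inequality fails here
    r = find_root(2)
    print(r, lhs(r, 2), 1.0 + 1.0 / 2)
```

For $m = 2$ the equation reduces to $2r^3 + 3r^2 + 1 = 0$, whose real root lies near $r \approx -1.68$, inside the interval named in the note.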

Scott T. Chapman, Editor



MONTHLY REFEREES FOR 2014

The MONTHLY expresses its appreciation to the following people for their help in
refereeing during the past year (July 2013 to July 2014). We could not function
successfully without such people and their hard work.

Abrams, Aaron Bivens, Irl D’Andrea, Carlos


Abrams, Daniel Boas, Harold Davis, Diana
Agarwal, Ravi Bolt, Michael DeAngelis, Valerio
Akopyan, Arseniy Böröczky, Károly, Jr. Deckelman, Steven
Alben, Silas Borovik, Alexandre del Campo, Abraham
Alekseyev, Max Borradaile, Glencora DeWitt, Meghan
Alexander, Stephanie Borwein, David Dilcher, Karl
Ali, Sayel Bouzarth, Liz Dow, Alan
Alladi, Krishnaswami Bownik, Marcin Dragović, Vladimir
Allen, G. Donald Bremner, Andrew Draisma, Jan
Allman, Elizabeth Brenton, Lawrence Dresden, Greg
Alonso, David Bressoud, David Drmota, Michael
Álvarez, Juan-Carlos Brooksbank, Peter Du, Bau-Sen
Anderson, Dan Bubboloni, Daniela Duncan, John
Anderson, David Buchanan, J. Robert Edwards, Robert
Andrade, Julio Bullen, Peter Efthimiou, Costas
Andrew, Alan Burns, Keith Eisenberg, Bennett
Anich, Nicholas Butler, Steve Elder, Sam
Anisca, Razvan Cahen, Paul-Jean Evans, Anthony
Arnold, Maxim Cairns, Grant Evans, Ron
Arnoux, Pierre Calkin, Neil Facchini, Alberto
Aschbacher, Michael Callan, David Fechner, Włodzimierz
Ash, J. Marshall Cannata, Roberto Feinsilver, Philip
Askey, Richard Carter, Nathan Filaseta, Michael
Avis, David Castro, Francis Fillion, Nicholas
Babenko, Yuliya Čerin, Zvonko Finch, Steven
Badertscher, Erich Chabert, Jean-Luc Finn, David
Baeth, Nicholas Chamberland, Marc Fintzen, Jessica
Baginski, Frank Chauve, Cedric Folland, Gerald
Baginski, Paul Chen, Chao-Ping Ford, Kevin
Bakker, Lennard Chen, Hongwei Forgács, Tamás
Banks, John Choi, Stephen Frankl, Peter
Baragar, Arthur Clark, Pete Friedman, Greg
Barany, Michael Coll, Vincent Fuchs, Dmitry
Bayer, Margaret Conrad, Keith Fuchs, Elana
Beck, Matthias Cook, Sam Galovich, Jennifer
Behrends, Ehrhard Cook, William Galperin, Gregory
Benjamin, Arthur Coons, Michael Garcı́a, Victor
Bergelson, Vitaly Cox, David Garcia, Stephan
Berger, Lisa Coykendall, James Garcı́a-Sánchez, Pedro
Bergner, Julia Cráciun, Gheorghe Garity, Dennis
Berndt, Bruce Cristea, Mihai Garrity, Thomas
Beukers, Frits Cruz-Uribe, David Gaspar, Jaime
Bhargava, Manjul Curgus, Branko Geroldinger, Alfred
Binding, Paul Daileda, Ryan Gessel, Ira
Bishop, Christopher Dajani, Karma Gibbons, Courtney

http://dx.doi.org/10.4169/amer.math.monthly.121.10.960

Giblin, Peter Kaftal, Victor Look, Daniel
Glasser, Lawrence Kain, Ben Loper, Alan
Glaz, Sarah Kantrowitz, Robert Lorch, John
Goldman, William Kaplan, Samuel Lovett, Shachar
Goodman-Strauss, Chaim Katok, Svetlana Lovett, Stephen
Gora, Pavel Katz, Eric Luca, Florian
Gordon, Cameron Katz, Victor Lucas, Tom
Gorkin, Pamela Kenyon, Richard Lucchini, Andrea
Gornet, Ruth Khavinson, Dmitry Ludwig, Lew
Graham, Ron Khovanova, Tanya Luecke, John
Granville, Andrew Kim, Junhyong Luijten, Erik
Gray, Jeremy Kim, Peter Lutwak, Erwin
Gremaud, Pierre Klain, Dan Maclagan, Diane
Gressman, Philip Kleitman, Daniel Malandro, Martin
Griffiths, Phillip Klingler, Lee Manlove, David
Grinberg, Eric Knill, Oliver Mann, Stephen
Grinstead, Charles Knopfmacher, Arnold Manning, Anthony
Gross, Mark Kobayashi, Mitsuo Mansour, Toufik
Grujic, Zoran Koch, Sarah Margetis, Dionisios
Grundman, Helen Koliha, Jerry Margolius, Barbara
Grynkiewicz, David Kontorovich, Alex Markarian, Roberto
Hajja, Mowaffaq Koralov, Leonid Matousek, Jirı́
Hales, Tom Koshlukov, Plamen Matthews, Gretchen
Halter-Koch, Franz Kossak, Roman Maurer, Stephen
Hampton, Marshall Kowalski, Travis McCleary, John
Harbater, David Kozek, Mark McCuan, John
Hart, Joan Kra, Irwin McLean, K. Robin
Hartshorne, Robin Krantz, Steven Mellor, Blake
Hasenauer, Richard Krivelevich, Michael Melman, Aaron
Hasfura-Buenaga, Roberto Kuczmarski, Fred Mendivil, Franklin
Hasselblatt, Boris Kufner, Alois Mercer, Peter
Hell, Pavol Kutsia, Temur Merino, Dennis
Henk, Martin Lambers, James Miller, Steven
Henle, James Lamoureux, Mike Minda, David
Hinkkanen, Aimo Langer, Joel Moll, Victor
Hodges, Wilfrid Larson, Lee Montgomery, Hugh
Hoffman, Michael Lawlor, Gary Moree, Pieter
Hofmann, Karl Lazebnik, Felix Mortini, Raymond
Holmer, Justin Leader, Imre Mossinghoff, Michael
Holmsen, Andreas Leamer, Micah Muir, Jerry
Howard, Fredric Leckband, Mark Murty, Ram
Howards, Hugh Lehnen, Al Nathanson, Melvyn
Howe, Roger Leise, Tanya Nelsen, Roger
Hrbacek, Karel Lemmermeyer, Franz Nica, Bogdan
Hubbard, John Levesque, Claude Niederhausen, Heinrich
Ipsen, Ilse Levi, Mark Nitecki, Zbigniew
Irving, Rob Levy, Doron Oh, Byeong-Kweon
Isaacs, Martin Lewis, Barry Olberding, Bruce
Isaksen, Dan Lewis, Mark O’Leary, Robbin
Ivanov, Sergei Lewis, Tim Olofsson, Peter
Jarai, Antal Li, Hanfung Ono, Ken
Jerónimo-Castro, Jesús Li, Zhongshan Osborn, Judy-Anne
Johnson, Warren Lindstrom, Tom Osgood, Brad
Jones, Nathan Little, John Ott, Katharine
Jones, Rafe Littlejohn, Lance Pak, Igor



Palsson, Eyvindur Schneider, Rolf Tichy, Robert
Pambuccian, Victor Schumer, Peter Tilley, Burt
Parker, Darren Scroggs, Jeff Tokieda, Tadashi
Parks, Harold Seal, Gavin Townsend, Alex
Pedersen, Henrik Segert, Jan Traves, Will
Pengelley, David Seoane-Sepulveda, Juan Trench, William
Perdomo, Oscar Shao, Shuanglin Treviño, Enrique
Pérez-Chavela, Ernesto Shattuck, Mark Troubetzkoy, Serge
Peterson, Jonathon Sheard, Michael Tsukerman, Emmanuel
Peterson, William Shene, Ching-Kuang Tucker, Alan
Pinasco, Juan Shifrin, Theodore Tucker, Thomas
Pinkus, Allan Shparlinski, Igor Tuenter, Hans
Pinsky, Mark Siegel, Alan Tuncali, Murat
Pippenger, Nicholas Simpson, Carlos Tuyls, Karl
Pollack, Paul Smith, Greg Ulas, Maciej
Pomerance, Carl Smith, Laura Ullrich, David
Prasolov, Victor Smyth, Malcolm Vega, Maria
Proctor, Emily Sofo, Anthony Verde-Star, Luis
Prodinger, Helmut Sondow, Jonathan Vincze, Csaba
Propp, James Spencer, Craig Volkmer, Hans
Pudwell, Lara Spindler, K. Wagon, Stan
Radunskaya, Ami Spirova, Margarita Walsh, Gary
Ramı́rez Alfonsı́n, Jorge Spoehel, Reto Wangberg, Aaron
Reich, Simeon Srivastava, Hari Washington, Lawrence
Reimann, Jan Stahl, Saul Weber, Matthias
Reisner, Shlomo Stanhope, Elizabeth Weidman, Patrick
Respondek, Jerzy Stanton, Dennis Weintraub, Steven
Richter, Christian Steele, Michael Wilkens, George
Richter, Wolf-Dieter Steinsaltz, David Williams, Kenneth
Riehl, Emily Steprans, Juris Wilson, James
Ringe, Don Steuding, Jorn Winkler, Peter
Rittaud, Benoı̂t Stevens, Christine Winkler, Reinhard
Rivin, Igor Stillwell, John Winterhof, Arne
Robbins, Neville Stolarsky, Kenneth Wiseman, James
Roch, Sebastian Stoll, Michael Woo, Peter
Rogers, Keith Stowe, Dennis Yiu, Paul
Rosen, Julian Strang, Gilbert Young, Matt
Ross, Kenneth Straub, Armin Yukna, S.
Rosson, Holly Strauch, Oto Zaffran, Dan
Rouse, Jeremy Stuart, Jeffrey Zalcman, Lawrence
Ruzsa, Imre Styer, Robert Zanello, Fabrizio
Sáenz, Ricardo Su, Francis Zaslavsky, A.
Sándor, József Sullivan, Rosemary Zeilberger, Doron
Saari, Kalle Sullivant, Seth Zeitz, Paul
Sabia, Juan Szilasi, Zoltán Zhan, Xingzhi
Sapir, Mark Szymanski, Waclaw Zhang, Hanzhe
Sather-Wagstaff, Sean Talvila, Erik Ziemer, Bill
Schaeffer, George Tam, Tin-Yau Zieve, Michael
Schinzel, A. Teismann, Holger Zivaljevic, Rade
Schlenk, Felix Thorne, Frank Zorn, Paul
Schmah, Tanya Thornhill, Christopher

From the Monthly Over 100 Years Ago. . .

Halsted on Gauss

“It has been said the heroic age of non-euclidean geometry is passed, since long gone are
the days when Lobachevski flinched into calling his system ‘imaginary geometry,’ and, I might
add, Gauss kept a more cowardly silence because, as he confesses, he ‘fears the outcry of the
Boeotians,’ (proverbial for stupidity).
“. . . the designation ‘Absolute Geometry’ was first used by myself as a rendering for John
Bolyai’s phrase scientiam spatii absolute veram, in which I have always gloried as showing
the magnificent verve of the young Magyar hero, victim of the meanness of Gauss, as was also
his own son who passed his life an exile here in Colorado.”

G. B. Halsted, Duncan M. Y. Sommerville, Amer. Math. Monthly 19 (1912) 1–4.

Reaction 1 . . .

“I am sure that you know by reputation, if not personally, Mr. George Bruce Halsted, who
has given much attention to the study of Non-Euclidean Geometry, a branch of mathematics
in which you have taken great interest. If I am correct in this, you are aware that he has
been a great champion of John Bolyai. For some reason this has developed in him a spirit of
antagonism to my grandfather and led him into a very unjust attack upon him. In the January
number for this year of ‘The American Mathematical Monthly’ he had an article in which he
declared that John Bolyai was a victim of ‘the meanness of Gauss’. To this he added that one
of Gauss’ own sons also was a victim of this ‘meanness’, and that he had spent his life an exile
in the State of Colorado.
“Professor Florian Cajori of Colorado Springs in this State, wrote me after reading Profes-
sor Halsted’s article. He knew that the reference could only be to Eugene Gauss, my father.
. . . I promptly replied to Professor Cajori that my father was in no sense an exile from his
home in Göttingen, and furthermore, that he had never been in the State of Colorado.
“. . . The enclosed letters also show the circumstances under which my father left Göttingen,
and I think they are a complete vindication of my grandfather against the charge made by
Halsted that he had ‘meanly’ treated my father.”

Letter from Robert Gauss to Felix Klein - September 3, 1912, available at
http://en.wikisource.org/wiki/Robert_Gauss_to_Felix_Klein_-_September_3,_1912.

Reaction 2 . . .

“American mathematicians are indebted to Halsted for making the writings of the creators
of non-Euclidean geometry accessible to them in the English language. His commentaries
were always spicy and valuable, even though, as a historian, Halsted was not always able to
maintain the attitude of an impartial judge. At his hands Gauss, for instance, received scant
justice.”

F. Cajori, George Bruce Halsted, Amer. Math. Monthly 29 (1922) 338–340.


Expanded version available at http://arxiv.org/abs/1405.4198.

—Submitted by Jonathan Sondow

http://dx.doi.org/10.4169/amer.math.monthly.121.10.963
MSC: Primary 01A70; Secondary 53-03




One More Proof of the Irrationality of $\sqrt{2}$

A number of proofs of the irrationality of $\sqrt{2}$ are found in mathematical literature.
I have also devised one such proof.

Proof. Clearly, $1^2 < 2 < 2^2$. Since 2 lies between two consecutive integral
squares, $\sqrt{2}$ cannot be an integer. Suppose $\sqrt{2}$ is rational, i.e., $\sqrt{2} = \frac{m}{n}$,
$1 < n < m < 2n$; $m, n \in \mathbb{N}$. Since $m > n$, we may write: $m = n + r$, $n > r$, $r \in \mathbb{N}$.
Hence, $\sqrt{2} = \frac{n+r}{n}$. Let $g = (n, r)$. So, $n = gs$ and $r = gt$, $s > t$; $(s, t) = 1$;
$g, s, t \in \mathbb{N}$. Then, $\sqrt{2} = \frac{s+t}{s}$. In other words, $2s^2 = s^2 + 2st + t^2$. So,

$$s^2 = t(2s + t). \qquad (1)$$

Since $t \mid$ R.H.S. of equation (1) and $(s, t) = 1$, so $t = 1$ if $t$ is to divide the L.H.S.
of (1). Hence, $s^2 = 2s + 1$, i.e., $s^2 - 2s + 1 = 2$; i.e.,

$$(s - 1)^2 = 2 = (\pm\sqrt{2})^2. \qquad (2)$$

Since $s$ is an integer, so is $(s - 1)$. Thus, equation (2) implies that there is an
integer between 1 and 2 that equals $\sqrt{2}$; that is absurd. Hence, our supposition
was wrong and so $\sqrt{2}$ is irrational.

We can generalize the above argument.

General proof. Let $a$ be a square-free positive integer and $[\sqrt{a}] = b$, where $[\,]$
denotes the greatest integer contained in the square root. Then $b^2 < (\sqrt{a})^2 <
(b + 1)^2$. Thus, $a$ lies between two consecutive squares; hence, $\sqrt{a}$ is not an
integer. Suppose that $\sqrt{a}$ is rational, i.e., $\sqrt{a} = \frac{m}{n}$, $1 < n < m < (b + 1)n$;
$m, n \in \mathbb{N}$. Using a division algorithm, we may write: $m = bn + r$, $n > r$, $r \in \mathbb{N}$.
Hence, $\sqrt{a} = \frac{bn+r}{n}$. Let $g = (n, r)$. So, $n = gs$ and $r = gt$, $s > t$, $(s, t) = 1$;
$g, s, t \in \mathbb{N}$. Then $\sqrt{a} = \frac{bs+t}{s}$.
Hence,

$$(a - b^2)s^2 = t(2bs + t). \qquad (3)$$

Since $t \mid$ R.H.S. of equation (3) and $(s, t) = 1$, either $(a - b^2) = tu$, $u \in \mathbb{N}$,
or $t = 1$. In the first case, we would have $s(us - 2b) = t \Rightarrow s \mid t$, which is
impossible for $s > t$. In the second case, viz., $t = 1$, we obtain

$$s\{(a - b^2)s - 2b\} = 1. \qquad (4)$$

Equation (4) $\Rightarrow s \mid 1 \Rightarrow s = 1$, which contradicts our supposition that $s > t = 1$.
Further, it leads to $a = (b + 1)^2$, contradicting our supposition $a < (b + 1)^2$.
Hence, $\sqrt{a}$ is irrational.
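
Equations (1) and (3) lend themselves to a brute-force sanity check. The sketch below is an illustration under stated assumptions, not part of the submitted proof: the function names and search bounds are hypothetical, and a finite search can only fail to find counterexamples. It looks for coprime pairs $s > t \ge 1$ satisfying $(a - b^2)s^2 = t(2bs + t)$ with $b = [\sqrt{a}]$; the case $a = 2$, $b = 1$ is equation (1).

```python
from math import gcd, isqrt

def is_perfect_square(a):
    """True when a is the square of an integer."""
    return isqrt(a) ** 2 == a

def solutions(a, bound):
    """Coprime pairs (s, t), bound >= s > t >= 1, with (a - b^2) s^2 = t (2 b s + t).

    Here b = floor(sqrt(a)); any such pair would give sqrt(a) = (b*s + t) / s.
    """
    b = isqrt(a)
    return [
        (s, t)
        for s in range(2, bound + 1)
        for t in range(1, s)
        if gcd(s, t) == 1 and (a - b * b) * s * s == t * (2 * b * s + t)
    ]

if __name__ == "__main__":
    print(solutions(2, 300))  # equation (1): expected to be empty
    for a in range(2, 30):
        if not is_perfect_square(a):
            print(a, solutions(a, 100))
```

An empty list for every non-square $a$ tried is what the irrationality of $\sqrt{a}$ predicts, since each solution would exhibit $\sqrt{a}$ as a ratio of integers.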

—Submitted by Amrik Singh Nimbran

http://dx.doi.org/10.4169/amer.math.monthly.121.10.964
MSC: Primary 11J72
