You are on page 1of 101

THE AMERICAN MATHEMATICAL

MONTHLY
VOLUME 118, NO. 3 MARCH 2011

Yueh-Gin Gung and Dr. Charles Y. Hu Award for 2011 to 195


Joseph A. Gallian for Distinguished Service to Mathematics
Barbara Faires

Was Cantor Surprised? 198


Fernando Q. Gouvêa

On Legendre’s Work on the Law of Quadratic Reciprocity 210


Steven H. Weintraub

Equimodular Polynomials and the Tritangency Theorems 217


of Euler, Feuerbach, and Guinand
Alexander Ryba and Joseph Stern

The Sharkovsky Theorem: A Natural Direct Proof 229


Keith Burns and Boris Hasselblatt

A First Look at Differential Algebra 245


John H. Hubbard and Benjamin E. Lundell

NOTES

Lines of Best Fit for the Zeros and for the Critical Points 262
of a Polynomial
Grant Keady

Regular Matchstick Graphs 264


Sascha Kurz and Rom Pinchasi

A Recursive Scheme for Improving the Original Rate of 268


Convergence to the Euler–Mascheroni Constant
Edward Chlebus

PROBLEMS AND SOLUTIONS 275

REVIEWS

Logical Labyrinths. By Raymond M. Smullyan 283


Christopher C. Leary

An Official Publication of the Mathematical Association of America


Invitation to Complex
Analysis

Ralph P. Boas
Second Edition revised by Harold P. Boas

An ideal choice for a first course in complex
analysis, this book can be used either as a
classroom text or for independent study.
Written in an informal style by a master
expositor, the book distills more than half a century of experience
with the subject into a lucid, engaging, yet rigorous account. The book
reveals both the power of complex analysis as a tool for applications
and the intrinsic beauty of the subject as a fundamental part of pure
mathematics.

Written at the level of courses commonly taught in American


universities to seniors and beginning graduate students, the book is
suitable for readers acquainted with advanced calculus or introductory
real analysis. The treatment goes beyond the standard material of
power series, Cauchy’s theorem, residues, conformal mapping, and
harmonic functions by including accessible discussions of many
intriguing topics that are uncommon in a book at this level. Readers will
encounter notions ranging from Landau’s notation to overconvergent
series to the Phragmén-Lindelöf theorem. The flexibility afforded by
the supplementary topics and applications makes the book adaptable
either to a short, one-term course or to a comprehensive, full-year
course.

Each topic is discussed in a typical, commonly encountered situation


rather than in the most general, abstract setting. There are no
numbered equations. Numerous exercises interspersed in the text
encourage readers to test their understanding of new concepts and
techniques as they are presented. Detailed solutions of the exercises,
included at the back of the book, both serve as models for students
and facilitate independent study. Supplementary exercises at the ends
of sections, not solved in the book, provide an additional teaching tool.

This second edition of Invitation to Complex Analysis has been


painstakingly revised by the author’s son, himself an award-winning
mathematical expositor.

Catalog Code: ICA Series: Textbooks ISBN: 978-0-88385-764-9


Hardbound, 2010 List: $63.95 Member Price: $50.95
THE AMERICAN MATHEMATICAL

MONTHLY
VOLUME 118, NO. 3 MARCH 2011

EDITOR
Daniel J. Velleman
Amherst College

ASSOCIATE EDITORS
William Adkins Jeffrey Nunemacher
Louisiana State University Ohio Wesleyan University
David Aldous Bruce P. Palka
University of California, Berkeley National Science Foundation
Roger Alperin Joel W. Robbin
San Jose State University University of Wisconsin, Madison
Anne Brown Rachel Roberts
Indiana University South Bend Washington University, St. Louis
Edward B. Burger Judith Roitman
Williams College University of Kansas, Lawrence
Scott Chapman Edward Scheinerman
Sam Houston State University Johns Hopkins University
Ricardo Cortez Abe Shenitzer
Tulane University York University
Joseph W. Dauben Karen E. Smith
City University of New York University of Michigan, Ann Arbor
Beverly Diamond Susan G. Staples
College of Charleston Texas Christian University
Gerald A. Edgar John Stillwell
The Ohio State University University of San Francisco
Gerald B. Folland Dennis Stowe
University of Washington, Seattle Idaho State University, Pocatello
Sidney Graham Francis Edward Su
Central Michigan University Harvey Mudd College
Doug Hensley Serge Tabachnikov
Texas A&M University Pennsylvania State University
Roger A. Horn Daniel Ullman
University of Utah George Washington University
Steven Krantz Gerard Venema
Washington University, St. Louis Calvin College
C. Dwight Lahr Douglas B. West
Dartmouth College University of Illinois, Urbana-Champaign
Bo Li
Purdue University

EDITORIAL ASSISTANT
Nancy R. Board
NOTICE TO AUTHORS Proposed problems or solutions should be sent to:
The MONTHLY publishes articles, as well as notes and DOUG HENSLEY, MONTHLY Problems
other features, about mathematics and the profes- Department of Mathematics
sion. Its readers span a broad spectrum of math- Texas A&M University
ematical interests, and include professional mathe- 3368 TAMU
maticians as well as students of mathematics at all College Station, TX 77843-3368
collegiate levels. Authors are invited to submit arti-
cles and notes that bring interesting mathematical
In lieu of duplicate hardcopy, authors may submit
ideas to a wide audience of MONTHLY readers.
pdfs to monthlyproblems@math.tamu.edu.
The MONTHLY’s readers expect a high standard of ex-
position; they expect articles to inform, stimulate,
challenge, enlighten, and even entertain. MONTHLY Advertising Correspondence:
articles are meant to be read, enjoyed, and dis- MAA Advertising
cussed, rather than just archived. Articles may be 1529 Eighteenth St. NW
expositions of old or new results, historical or bio- Washington DC 20036
graphical essays, speculations or definitive treat-
ments, broad developments, or explorations of a Phone: (866) 821-1221
single application. Novelty and generality are far Fax: (866) 387-1208
less important than clarity of exposition and broad E-mail: advertising@maa.org
appeal. Appropriate figures, diagrams, and photo-
graphs are encouraged. Further advertising information can be found online
at www.maa.org
Notes are short, sharply focused, and possibly infor-
mal. They are often gems that provide a new proof Change of address, missing issue inquiries, and
of an old theorem, a novel presentation of a familiar other subscription correspondence:
theme, or a lively discussion of a single issue. MAA Service Center, maahq@maa.org
Beginning January 1, 2011, submission of articles and All at the address:
notes is required via the MONTHLY’s Editorial Man-
ager System. Initial submissions in pdf or LATEX form The Mathematical Association of America
can be sent to the Editor-Elect Scott Chapman at 1529 Eighteenth Street, N.W.
Washington, DC 20036
http://www.editorialmanager.com/monthly
Recent copies of the MONTHLY are available for pur-
chase through the MAA Service Center.
The Editorial Manager System will cue the author maahq@maa.org, 1-800-331-1622
for all required information concerning the paper.
Questions concerning submission of papers can be Microfilm Editions: University Microfilms Interna-
addressed to the Editor-Elect at monthly@shsu.edu. tional, Serial Bid coordinator, 300 North Zeeb Road,
Authors who use LATEX are urged to use article.sty, Ann Arbor, MI 48106.
or a similar generic style, and its standard environ-
ments with no custom formatting. The style of ci- The AMERICAN MATHEMATICAL MONTHLY (ISSN
tations for journal articles and books should match 0002-9890) is published monthly except bimonthly
that used on MathSciNet (see http://www.ams. June-July and August-September by the Mathe-
org/mathscinet). Follow the link to Electronic Publi- matical Association of America at 1529 Eighteenth
cations Information for authors at http://www.maa. Street, N.W., Washington, DC 20036 and Lancaster,
org/pubs/monthly.html for information about fig- PA, and copyrighted by the Mathematical Associa-
ures and files, as well as general editorial guidelines. tion of America (Incorporated), 2011, including rights
Letters to the Editor on any topic are invited. to this journal issue as a whole and, except where
Comments, criticisms, and suggestions for making otherwise noted, rights to each individual contribu-
the MONTHLY more lively, entertaining, and infor- tion. Permission to make copies of individual arti-
mative can be forwarded to the Editor-Elect at cles, in paper or electronic form, including posting
monthly@shsu.edu. on personal and class web pages, for educational
and scientific use is granted without fee provided
The online MONTHLY archive at www.jstor.org is a that copies are not made or distributed for profit
valuable resource for both authors and readers; it or commercial advantage and that copies bear the
may be searched online in a variety of ways for any following copyright notice: [Copyright the Mathe-
specified keyword(s). MAA members whose institu- matical Association of America 2011. All rights re-
tions do not provide JSTOR access may obtain indi- served.] Abstracting, with credit, is permitted. To
vidual access for a modest annual fee; call 800-331- copy otherwise, or to republish, requires specific
1622. permission of the MAA’s Director of Publications and
See the MONTHLY section of MAA Online for current possibly a fee. Periodicals postage paid at Washing-
information such as contents of issues and descrip- ton, DC, and additional mailing offices. Postmaster:
tive summaries of forthcoming articles: Send address changes to the American Mathemati-
cal Monthly, Membership/Subscription Department,
MAA, 1529 Eighteenth Street, N.W., Washington, DC,
http://www.maa.org/ 20036-1385.
Yueh-Gin Gung and Dr. Charles Y. Hu
Award for 2011 to Joseph A. Gallian for
Distinguished Service to Mathematics
Barbara Faires

Joe Gallian.

The two themes that run through Joseph A. Gallian’s service to mathematics are (a) en-
couraging young mathematicians and helping them to develop successful careers and
(b) communicating mathematics to the widest possible audience. He was one of the
early proponents of undergraduates conducting mathematical research, and his REU
at Duluth, which began in 1977, is widely regarded as the premier REU. The quality
of the work at Joe’s REU is evidenced by the 160 papers by participants that grew out
of their REU work. These papers have appeared in such journals as Crelle’s Journal,
Journal of Algebra, Journal of Combinatorial Theory, Discrete Mathematics, Applied
Discrete Mathematics, Annals of Discrete Mathematics, and Journal of Graph Theory.
The REU, along with Joe’s continuing contact with its participants, makes an important
contribution to developing the next generations of mathematicians. Participants have
included some prominent mathematicians, whose careers this REU helped to shape.
Joe is also an inspiration to a generation of mathematicians who involve students in
high-quality undergraduate research in mathematics. Not only is Joe successful with
his own REU, but he is generous with his time and advice to help others to set up
REUs. In 2002, Joe was recognized by the Council on Undergraduate Research with
their Fellow Award, given to members who have demonstrated sustained excellence in
research with undergraduates.
doi:10.4169/amer.math.monthly.118.03.195

March 2011] YUEH-GIN GUNG AND DR. CHARLES Y. HU AWARD FOR 2011 195
Project NExT is the MAA’s widely acclaimed professional development opportu-
nity for new or recent Ph.D.’s. Joe has been involved with Project NExT since its first
summer in 1994 when he gave the closing address. The address was so extraordi-
narily successful that he has given each subsequent closing address. Later, when Joe
became co-director of Project NExT in 1998, he assumed primary responsibility for
many parts of the program, participated in developing the workshop program, and of-
ten drafted articles for Focus and reports to the Board of Governors. His boundless
energy, enthusiasm, mathematical sophistication, and academic savvy have made him
the perfect person to work with the hundreds of new mathematics faculty who have
become Project NExT Fellows.
Joe’s Project NExT work illustrates that his service to mathematics ranges across
all the levels of work that needs to be done. Not only does Joe participate in long-
range planning and vision discussions, but he also does the small tasks that keep a
program functioning successfully. As with his REU, Joe does not treat Project NExT
as a job for which he has specified, narrowly defined duties, but as a program to which
he generously gives his time and to whose success he is committed.
In his talks, Joe combines thorough preparation, imaginative presentation, and a
showman’s flair with solid mathematical content. An indication of Joe’s success at
communicating mathematics was a standing ovation from the audience at his Pi Mu
Epsilon Frame Lecture. This audience included high school students as well as profes-
sors, and all understood and were excited with Joe’s talk. Joe has given 24 addresses
at national meetings, 65 at MAA Section meetings, and over 200 at colleges and uni-
versities. Joe also communicates mathematics beyond the mathematical community.
Articles about his work have appeared in twenty-five news outlets in the United States
as well as in Europe and India. Four of these were in Science News and one in the New
York Times. In addition to this he has more than a 100 articles in mathematical jour-
nals and other publications including Math Horizons, the Macmillan Encyclopedia of
Chemistry, and the Mathematical Intelligencer. Joe Gallian was named by a Duluth
newspaper as one of the “100 Great Duluthians of the 20th Century.”
Joe has served professional organizations and the mathematical community at
large. Joe has been national coordinator for Mathematics Awareness Month (2003 and
2010); he has served on more than 50 national committees, chairing at least 10 of
them; he was a Council on Undergraduate Research Councilor for 11 years, serving
as chair of the mathematics and computer science division for part of that time; he has
served as associate editor for Mathematics Magazine and the American Mathematical
Monthly; and he has been director or co-director of five conferences. Joe has refereed
for 40 journals and is a reviewer for NSF, the Research Council of Canada, and the
Australian Research Council. Those who work with Joe know that he is always an
active contributor to a project in which he is involved—he is efficient and he moves
the project along.
Joe Gallian’s many awards and honors attest to his passion to serve undergraduates,
professional organizations, and the mathematical community. He has been honored
with teaching awards from the University of Minnesota Duluth, the Carnegie Founda-
tion for the Advancement of Teaching, and the Mathematical Association of America
(Haimo Award). Joe has received the MAA’s Trevor Evans and Carl B. Allendoerfer
Awards and has been an MAA Polya Lecturer. Joe served as second vice president and
then president of the MAA.
Joe completed his undergraduate degree at Slippery Rock University, M.A. at the
University of Kansas, and Ph.D. at Notre Dame. He is a professor of mathematics and
statistics at the University of Minnesota Duluth, where he was recognized in December
2009 with the Chancellor’s Award for Distinguished Research.

196 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



About the Gung-Hu Award. The Yueh-Gin Gung and Dr. Charles Y. Hu Award for
Distinguished Service to Mathematics is the successor to the Association’s Award for
Distinguished Service to Mathematics, first presented in 1962. It is intended to be the
most prestigious award for service offered by the Association. The initial endowment
was contributed by husband and wife Dr. Charles Y. Hu and Yueh-Gin Gung. Dr. Hu
and Yueh-Gin Gung were not mathematicians, but rather a professor of geography at
the University of Maryland and a librarian at the University of Chicago, respectively.
They contributed generously to our discipline because, as they wrote, “We always
have high regard and great respect for the intellectual agility and high quality of mind
of mathematicians and consider mathematics as the most vital field of study in the
technological age we are living in.”

The Essence of Mathematics . . .

“The essence of mathematics is accuracy.”


Walter Bagehot, The Works of Walter Bagehot, Vol. V,
The Travelers Insurance Company, Hartford, CT, 1891, p. 457.

“The essence of mathematics is exact truthfulness.”


Ellery W. Davis, The Condition of Secondary Mathematical Instruction
With Some Hints to Remedies, Mathematical Supplement
of School Science, Vol. 1, No. 1, 1903, p. 8.

“The essence of mathematics lies precisely in its freedom.”


Georg Cantor, Ueber unendliche, lineare Punktmannichfaltigkeiten,
Mathematische Annalen 21 (1883) 564.

March 2011] YUEH-GIN GUNG AND DR. CHARLES Y. HU AWARD FOR 2011 197
Was Cantor Surprised?
Fernando Q. Gouvêa

Abstract. We look at the circumstances and context of Cantor’s famous remark, “I see it, but
I don’t believe it.” We argue that, rather than denoting astonishment at his result, the remark
pointed to Cantor’s worry about the correctness of his proof.

Mathematicians love to tell each other stories. We tell them to our students too, and
they eventually pass them on. One of our favorites, and one that I heard as an under-
graduate, is the story that Cantor was so surprised when he discovered one of his the-
orems that he said “I see it, but I don’t believe it!” The suggestion was that sometimes
we might have a proof, and therefore know that something is true, but nevertheless still
find it hard to believe.
That sentence can be found in Cantor’s extended correspondence with Dedekind
about ideas that he was just beginning to explore. This article argues that what Can-
tor meant to convey was not really surprise, or at least not the kind of surprise that
is usually suggested. Rather, he was expressing a very different, if equally familiar,
emotion. In order to make this clear, we will look at Cantor’s sentence in the context
of the correspondence as a whole.
Exercises in myth-busting are often unsuccessful. As Joel Brouwer says in his poem
“A Library in Alexandria,”

. . . And so history gets written


to prove the legend is ridiculous. But soon the legend
replaces the history because the legend is more interesting.

Our only hope, then, lies in arguing not only that the standard story is false, but also
that the real story is more interesting.

1. THE SURPRISE. The result that supposedly surprised Cantor was the fact that
sets of different dimension could have the same cardinality. Specifically, Cantor
showed (of course, not yet using this language) that there was a bijection between the
interval I = [0, 1] and the n-fold product I n = I × I × · · · × I .
There is no doubt, of course, that this result is “surprising,” i.e., that it is counter-
intuitive. In fact Cantor said so explicitly, pointing out that he had expected something
different. But the story has grown in the telling, and in particular Cantor’s phrase about
seeing but not believing has been read as expressing what we usually mean when we
see something happen and exclaim “Unbelievable!” What we mean is not that we
actually do not believe, but that we find what we know has happened to be hard to
believe because it is so unusual, unexpected, surprising. In other words, the idea is that
Cantor felt that the result was hard to believe even though he had a proof. His phrase
has been read as suggesting that mathematical proof may engender rational certainty
while still not creating intuitive certainty.
The story was then co-opted to demonstrate that mathematicians often discover
things that they did not expect or prove things that they did not actually want to prove.
For example, here is William Byers in How Mathematicians Think:
doi:10.4169/amer.math.monthly.118.03.198

198 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



Cantor himself initially believed that a higher-dimensional figure would have a
larger cardinality than a lower-dimensional one. Even after he had found the ar-
gument that demonstrated that cardinality did not respect dimensions: that one-,
two-, three-, even n-dimensional sets all had the same cardinality, he said, “I see
it, but I don’t believe it.” [2, p. 179]

Did Cantor’s comment suggest that he found it hard to believe his own theorem
even after he had proved it? Byers was by no means the first to say so.
Many mathematicians thinking about the experience of doing mathematics have
found Cantor’s phrase useful. In his preface to the original (1937) publication of the
Cantor-Dedekind correspondence, J. Cavaillès already called attention to the phrase:

. . . these astonishing discoveries—astonishing first of all to the author himself:


“I see it but I don’t believe it at all,”1 he writes in 1877 to Dedekind about one of
them—, these radically new notions . . . [14, p. 3, my translation]

Notice, however, that Cavaillès is still focused on the description of the result as
“surprising” rather than on the issue of Cantor’s psychology. It was probably Jacques
Hadamard who first connected the phrase to the question of how mathematicians think,
and so in particular to what Cantor was thinking. In his famous Essay on the Psychol-
ogy of Invention in the Mathematical Field, first published in 1945 (only eight years
after [14]), Hadamard is arguing about Newton’s ideas:

. . . if, strictly speaking, there could remain a doubt as to Newton’s example,


others are completely beyond doubt. For instance, it is certain that Georg Cantor
could not have foreseen a result of which he himself says “I see it, but I do not
believe it.” [10, pp. 61–62].

Alas, when it comes to history, few things are “certain.”

2. THE MAIN CHARACTERS. Our story plays out in the correspondence between
Richard Dedekind and Georg Cantor during the 1870s. It will be important to know
something about each of them.
Richard Dedekind was born in Brunswick on October 6, 1831, and died in the same
town, now part of Germany, on February 12, 1916. He studied at the University of
Göttingen, where he was a contemporary and friend of Bernhard Riemann and where
he heard Gauss lecture shortly before the old man’s death. After Gauss died, Lejeune
Dirichlet came to Göttingen and became Dedekind’s friend and mentor.
Dedekind was a very creative mathematician, but he was not particularly ambitious.
He taught in Göttingen and in Zurich for a while, but in 1862 he returned to his home
town. There he taught at the local Polytechnikum, a provincial technical university. He
lived with his brother and sister and seemed uninterested in offers to move to more
prestigious institutions. See [1] for more on Dedekind’s life and work.
Our story will begin in 1872. The first version of Dedekind’s ideal theory had ap-
peared as Supplement X to Dirichlet’s Lectures in Number Theory (based on actual
lectures by Dirichlet but entirely written by Dedekind). Also just published was one
of his best known works, “Stetigkeit und Irrationalzahlen” (“Continuity and Irrational
Numbers”; see [7]; an English translation is included in [5]). This was his account of
how to construct the real numbers as “cuts.” He had worked out the idea in 1858, but
published it only 14 years later.
1 Cavaillès misquotes Cantor’s phrase as “je le vois mais je ne le crois point.”

March 2011] WAS CANTOR SURPRISED? 199


Georg Cantor was born in St. Petersburg, Russia, on March 3, 1845. He died in
Halle, Germany, on January 6, 1918. He studied at the University of Berlin, where the
mathematics department, led by Karl Weierstrass, Ernst Eduard Kummer, and Leopold
Kronecker, might well have been the best in the world. His doctoral thesis was on the
number theory of quadratic forms.
In 1869, Cantor moved to the University of Halle and shifted his interests to the
study of the convergence of trigonometric series. Very much under Weierstrass’s in-
fluence, he too introduced a way to construct the real numbers, using what he called
“fundamental series.” (We call them “Cauchy sequences.”) His paper on this construc-
tion also appeared in 1872.
Cantor’s lifelong dream seems to have been to return to Berlin as a professor, but
it never happened. He rose through the ranks in Halle, becoming a full professor in
1879 and staying there until his death. See [13] for a short account of Cantor’s life.
The standard account of Cantor’s mathematical work is [4].
Cantor is best known, of course, for the creation of set theory, and in particular for
his theory of transfinite cardinals and ordinals. When our story begins, this was mostly
still in the future. In fact, the birth of several of these ideas can be observed in the
correspondence with Dedekind. This correspondence was first published in [14]; we
quote it from the English translation by William Ewald in [8, pp. 843–878].

3. “ALLOW ME TO PUT A QUESTION TO YOU.” Dedekind and Cantor met in


Switzerland when they were both on vacation there. Cantor had sent Dedekind a copy
of the paper containing his construction of the real numbers. Dedekind responded, of
course, by sending Cantor a copy of his booklet. And so begins the story.
Cantor was 27 years old and very much a beginner, while Dedekind was 41 and at
the height of his powers; this accounts for the tone of deference in Cantor’s side of
the correspondence. Cantor’s first letter acknowledged receipt of [7] and says that “my
conception [of the real numbers] agrees entirely with yours,” the only difference being
in the actual construction. But on November 29, 1873, Cantor moves on to new ideas:

Allow me to put a question to you. It has a certain theoretical interest for me, but
I cannot answer it myself; perhaps you can, and would be so good as to write me
about it. It is as follows.
Take the totality of all positive whole-numbered individuals n and denote it
by (n). And imagine, say, the totality of all positive real numerical quantities x
and designate it by (x). The question is simply, Can (n) be correlated to (x) in
such a way that to each individual of the one totality there corresponds one and
only one of the other? At first glance one says to oneself no, it is not possible, for
(n) consists of discrete parts while (x) forms a continuum. But nothing is gained
by this objection, and although I incline to the view that (n) and (x) permit no
one-to-one correlation, I cannot find the explanation which I seek; perhaps it is
very easy.

In the next few lines,


 Cantor points out that the question is not as dumb as it looks,
since “the totality qp of all positive rational numbers” can be put in one-to-one cor-
respondence with the integers.
We do not have Dedekind’s side of the correspondence, but his notes indicate that
he responded indicating that (1) he could not answer the question either, (2) he could
show that the set of all algebraic numbers is countable, and (3) that he didn’t think the
question was all that interesting. Cantor responded on December 2:

200 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



I was exceptionally pleased to receive your answer to my last letter. I put my
question to you because I had wondered about it already several years ago, and
was never certain whether the difficulty I found was subjective or whether it was
inherent in the subject. Since you write that you too are unable to answer it, I may
assume the latter.—In addition, I should like to add that I have never seriously
occupied myself with it, because it has no special practical interest for me. And
I entirely agree with you when you say that for this reason it does not deserve
much effort. But it would be good if it could be answered; e.g., if it could be
answered with no, then one would have a new proof of Liouville’s theorem that
there are transcendental numbers.

Cantor first concedes that perhaps it is not that interesting, then immediately points
out an application that was sure to interest Dedekind! In fact, Dedekind’s notes indi-
cate that it worked: “But the opinion I expressed that the first question did not deserve
too much effort was conclusively refuted by Cantor’s proof of the existence of tran-
scendental numbers.” [8, p. 848]
These two letters are fairly typical of the epistolary relationship between the two
men: Cantor is deferential but is continually coming up with new ideas, new questions,
new proofs; Dedekind’s role is to judge the value of the ideas and the correctness of
the proofs. The very next letter, from December 7, 1873, contains Cantor’s first proof
of the uncountability of the real numbers. (It was not the “diagonal” argument; see [4]
or [9] for the details.)

4. “THE SAME TRAIN OF THOUGHT . . . ” Cantor seemed to have a good sense


for what question should come next. On January 5, 1874, he posed the problem of
higher-dimensional sets:

As for the question with which I have recently occupied myself, it occurs to me
that the same train of thought also leads to the following question:
Can a surface (say a square including its boundary) be one-to-one correlated
to a line (say a straight line including its endpoints) so that to every point of the
surface there corresponds a point of the line, and conversely to every point of the
line there corresponds a point of the surface?
It still seems to me at the moment that the answer to this question is very
difficult—although here too one is so impelled to say no that one would like to
hold the proof to be almost superfluous.

Cantor’s letters indicate that he had been asking others about this as well, and that
most considered the question just plain weird, because it was “obvious” that sets of
different dimensions could not be correlated in this way. Dedekind, however, seems to
have ignored this question, and the correspondence went on to other issues. On May
18, 1874, Cantor reminded Dedekind of the question, and seems to have received no
answer.
The next letter in the correspondence is from May, 1877. The correspondence seems
to have been reignited by a misunderstanding of what Dedekind meant by “the essence
of continuity” in [7]. On June 20, 1877, however, Cantor returns to the question of
bijections between sets of different dimensions, and now proposes an answer:

. . . I should like to know whether you consider an inference-procedure that I use


to be arithmetically rigorous.
The problem is to show that surfaces, bodies, indeed even continuous struc-
tures of ρ dimensions can be correlated one-to-one with continuous lines, i.e.,

March 2011] WAS CANTOR SURPRISED? 201


with structures of only one dimension—so that surfaces, bodies, indeed even
continuous structures of ρ dimension have the same power as curves. This idea
seems to conflict with the one that is especially prevalent among the represen-
tatives of modern geometry, who speak of simply infinite, doubly, triply, . . . ,
ρ-fold infinite structures. (Sometimes you even find the idea that the infinity of
points of a surface or a body is obtained as it were by squaring or cubing the
infinity of points of a line.)

Significantly, Cantor’s formulation of the question had changed. Rather than asking
whether there is a bijection, he posed the question of finding a bijection. This is, of
course, because he believed he had found one. By this point, then, Cantor knows the
right answer. It remains to give a proof that will convince others. He goes on to explain
his idea for that proof, working with the ρ-fold product of the unit interval with itself,
but for our purposes we can consider only the case ρ = 2.
The proof Cantor proposed is essentially this: take a point (x, y) in [0, 1] × [0, 1],
and write out the decimal expansions of x and y:

(x, y) = (0.abcde . . . , 0.αβγ δ . . . ).

Some real numbers have more than one decimal expansion. In that case, we always
choose the expansion that ends in an infinite string of 9s. Cantor’s idea is to map (x, y)
to the point z ∈ [0, 1] given by

z = 0.aαbβcγ dδe . . .

Since we can clearly recover x and y from the decimal expansion of z, this gives the
desired correspondence.
Dedekind immediately noticed that there was a problem. On June 22, 1877 (one
cannot fail to be impressed with the speed of the German postal service!), he wrote
back pointing out a slight problem “which you will perhaps solve without difficulty.”
He had noticed that the function Cantor had defined, while clearly one-to-one, was not
onto. (Of course, he did not use those words.) Specifically, he pointed out that such
numbers as

z = 0.120101010101 . . .

did not correspond to any pair (x, y), because the only possible value for x is
0.100000 . . . , which is disallowed by Cantor’s choice of decimal expansion. He
was not sure if this was a big problem, adding “I do not know if my objection goes to
the essence of your idea, but I did not want to hold it back.”
Of course, the problem Dedekind noticed is real. In fact, there are a great many real
numbers not in the image, since we can replace the ones that separate the zeros with
any sequence of digits. The image of Cantor’s map is considerably smaller than the
whole interval.
Cantor’s first response was a postcard sent the following day. (Can one envision him
reading the letter at the post office and immediately dispatching a postcard back?) He
acknowledged the error and suggested a solution:

Alas, you are entirely correct in your objection; but happily it concerns only the
proof, not the content. For I proved somewhat more than I had realized, in that I
bring a system x1 , x2 , . . . , xρ of unrestricted real variables (that are ≥ 0 and ≤ 1)
into one-to-one relationship with a variable y that does not assume all values of

202 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



that interval, but rather all with the exception of certain y 00 . However, it assumes
each of the corresponding values y 0 only once, and that seems to me to be the
essential point. For now I can bring y 0 into a one-to-one relation with another
quantity t that assumes all the values ≥ 0 and ≤ 1.
I am delighted that you have found no other objections. I shall shortly write
to you at greater length about this matter.

This is a remarkable response. It suggests that Cantor was very confident that his
result was true. This confidence was due to the fact that Cantor was already thinking
in terms of what later became known as “cardinality.” Specifically, he expects that the
existence of a one-to-one mapping from one set A to another set B implies that the
size of A is in some sense “less than or equal to” that of B.
Cantor’s proof shows that the points of the square can be put into bijection with a
subset of the interval. Since the interval can clearly be put into bijection with a subset
of the square, this strongly suggests that both sets of points “are the same size,” or,
as Cantor would have said it, “have the same power.” All we need is a proof that the
“powers” are linearly ordered in a way that is compatible with inclusions.
That the cardinals are indeed ordered in this way is known today as the Schroeder-
Bernstein theorem. The postcard shows that Cantor already “knew” that the Schroeder-
Bernstein theorem should be true. In fact, he seems to implicitly promise a proof of
that very theorem. He was not able to find such a proof, however, then or (as far as I
know) ever.
His fuller response, sent two days later on June 25, contained instead a completely
different, and much more complicated, proof of the original theorem.

I sent you a postcard the day before yesterday, in which I acknowledged the
gap you discovered in my proof, and at the same time remarked that I am able
to fill it. But I cannot repress a certain regret that the subject demands more
complicated treatment. However, this probably lies in the nature of the subject,
and I must console myself; perhaps it will later turn out that the missing portion
of that proof can be settled more simply than is at present in my power. But since
I am at the moment concerned above all to persuade you of the correctness of
my theorem . . . I allow myself to present another proof of it, which I found even
earlier than the other.

Notice that what Cantor is trying to do here is to convince Dedekind that his theorem
is true by presenting him a correct proof.2 There is no indication that Cantor had any
doubts about the correctness of the result itself. In fact, as we will see, he says so
himself.
Let’s give a brief account of Cantor’s proof; to avoid circumlocutions, we will ex-
press most of it in modern terms. Cantor began by noting that every real number x
between 0 and 1 can be expressed as a continued fraction

1
x=
1
a+
1
b+
c + ···
2 Cantor claimed he had found this proof before the other. I find this hard to believe. In fact, the proof

looks very much like the result of trying to fix the problem in the first proof by replacing (nonunique) decimal
expansions with (unique) continued fraction expansions.

March 2011] WAS CANTOR SURPRISED? 203


where the partial quotients a, b, c, . . . , etc. are all positive integers. This representation
is infinite if and only if x is irrational, and in that case the representation is unique.
So one can argue just as before, “interleaving” the two continued fractions for x
and y, to establish a bijection between the set of pairs (x, y) such that both x and y
are irrational and the set of irrational points in [0, 1]. The result is a bijection because
the inverse mapping, splitting out two continued fraction expansions from a given one,
will certainly produce two infinite expansions.
That being done, it remains to be shown that the set of irrational numbers between
0 and 1 can be put into bijection with the interval [0, 1]. This is the hard part of the
proof. Cantor proceeded as follows.
First he chose an enumeration of the rationals {rk } and an increasing sequence of
irrationals {ηk } in [0, 1] converging to 1. He then looked at the bijection from [0, 1] to
[0, 1] that is the identity on [0, 1] except for mapping rk 7 → ηk , ηk 7 → rk . This gives a
bijection between irrationals in [0, 1] and [0, 1] minus the sequence {ηk } and reduces
the problem to proving that [0, 1] can be put into bijection with [0, 1] − {ηk }.
At this point Cantor claims that it is now enough to “successively apply” the fol-
lowing theorem:

A number y that can assume all the values of the interval (0 . . . 1) with the soli-
tary exception of the value 0 can be correlated one-to-one with a number x that
takes on all values of the interval (0 . . . 1) without exception.

In other words, he claimed that there was a bijection between the half-open interval
(0, 1] and the closed interval [0, 1], and that “successive application” of this fact would
finish the proof. In the actual application he would need the intervals to be open on the
right, so, as we will see, he chose a bijection that mapped 1 to itself.
Cantor did not say exactly what kind of “successive application” he had in mind,
but what he says in a later letter suggests it was this: we have the interval [0, 1] minus
the sequence of the ηk . We want to “put back in” the ηk , one at a time. So we leave
the interval [0, η1 ) alone, and look at (η1 , η2 ). Applying the lemma, we construct a
bijection between that and [η1 , η2 ). Then we do the same for (η2 , η3 ) and so on. Putting
together these bijections produces the bijection we want.
Finally, it remained to prove the lemma, that is, to construct the bijection from [0, 1]
to (0, 1]. Modern mathematicians would probably do this by choosing a sequence xn
in (0, 1), mapping 0 to x1 and then every xn to xn+1 . This “Hilbert hotel” idea was still
some time in the future, however, even for Cantor. Instead, Cantor chose a bijection
that could be represented visually, and simply drew its graph. He asked Dedekind to
consider “the following peculiar curve,” which we have redrawn in Figure 1 based on
the photograph reproduced in [4, p. 63].
Such a picture requires some explanation, and Cantor provided it. The domain
has been divided by a geometric progression, so b = 1/2, b1 = 3/4, and so on;
a = (0, 1/2), a 0 = (1/2, 3/4), etc. The point C is (1, 1). The points d 0 = (1/2, 1/2),
d 00 = (3/4, 3/4), etc. give the corresponding subdivision of the main diagonal.

The curve consists of infinitely many parallel line segments ab, a 0 b0 , a 00 b00 and
of the point c. The endpoints b, b0 , b00 , . . . are not regarded as belonging to the
curve.

The stipulation that the segments are open at their lower endpoints means that 0 is not
in the image. This proves the lemma, and therefore the proof is finished.

204 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



C
a
a
d b
a
d b
a
d b

a
d b

P
O b b1 b2 b3 b4

Figure 1. Cantor’s function from [0, 1] to (0, 1].

Cantor did not even add that last comment. As soon as he had explained his curve, he
moved on to make extensive comments on the theorem and its implications. He turns
on its head the objection that various mathematicians made to his question, namely
that it was “obvious” from geometric considerations that the number of variables is
invariant:

For several years I have followed with interest the efforts that have been made,
building on Gauss, Riemann, Helmholtz, and others, towards the clarification of
all questions concerning the ultimate foundations of geometry. It struck me that
all the important investigations in this field proceed from an unproven presuppo-
sition which does not appear to me self-evident, but rather to need a justification.
I mean the presupposition that a ρ-fold extended continuous manifold needs ρ
independent real coordinates for the determination of its elements, and that for
a given manifold this number of coordinates can neither be increased nor de-
creased.
This presupposition became my view as well, and I was almost convinced of
its correctness. The only difference between my standpoint and all the others
was that I regarded that presupposition as a theorem which stood in great need
of a proof; and I refined my standpoint into a question that I presented to several
colleagues, in particular at the Gauss Jubilee in Göttingen. The question was the
following:
“Can a continuous structure of ρ dimensions, where ρ > 1, be related one-
to-one with a continuous structure of one dimension so that to each point of the
former there corresponds one and only one point of the latter?”
Most of those to whom I presented this question were extremely puzzled that
I should ask it, for it is quite self-evident that the determination of a point in an
extension of ρ dimensions always needs ρ independent coordinates. But who-
ever penetrated the sense of the question had to acknowledge that a proof was
needed to show why the question should be answered with the “self-evident”
no. As I say, I myself was one of those who held it for the most likely that the

March 2011] WAS CANTOR SURPRISED? 205


question should be answered with a no—until quite recently I arrived by rather
intricate trains of thought at the conviction that the answer to that question is
an unqualified yes.3 Soon thereafter I found the proof which you see before you
today.
So one sees what wonderful power lies in the ordinary real and irrational num-
bers, that one is able to use them to determine uniquely the elements of a ρ-fold
extended continuous manifold with a single coordinate. I will only add at once
that their power goes yet further, in that, as will not escape you, my proof can be
extended without any great increase in difficulty to manifolds with an infinitely
great dimension-number, provided that their infinitely-many dimensions have the
form of a simple infinite sequence.
Now it seems to me that all philosophical or mathematical deductions that use
that erroneous presupposition are inadmissible. Rather the difference that ob-
tains between structures of different dimension-number must be sought in quite
other terms than in the number of independent coordinates—the number that was
hitherto held to be characteristic.

5. “JE LE VOIS . . . ” So now Dedekind had a lot to digest. The interleaving argu-
ment is not problematic in this case, and the existence of a bijection between the ra-
tionals and the increasing sequence ηk had been established in 1872. But there were at
least two sticky points in Cantor’s letter.
First, there is the matter of what kind of “successive application” of the lemma
Cantor had in mind. Whatever it was, it would seem to involve constructing a bijection
by “putting together” an infinite number of functions. One can easily get in trouble.
For example, here is an alternative reading of what Cantor had in mind. Instead
of applying the lemma to the interval (η1 , η2 ), we could apply it to (0, η1 ) to put it
into bijection with (0, η1 ]. So now we have “put η1 back in” and we have a bijection
between [0, 1] − {η1 , η2 , η3 , . . . } and [0, 1] − {η2 , η3 , . . . }.
Now repeat: use the lemma on (0, η2 ) to make a bijection to (0, η2 ]. So we have “put
η2 back in.” If we keep doing that, we presumably get a bijection from (0, 1) minus
the ηk to all of (0, 1).
But do we? What is the image of, say, 13 η1 ? It is not fixed under any of our functions.
To determine its image in [0, 1], we would need to compose infinitely many functions,
and it’s not clear how to do that. If we manage to do it with some kind of limiting
process, then it is no longer clear that the overall function is a bijection.
The interpretation Cantor probably intended (and later stated explicitly) yields a
workable argument because the domains of the functions are disjoint, so it is clear
where to map any given point. But since Cantor did not indicate his argument in this
letter, one can imagine Dedekind hesitating. In any case, at this point in history the
idea of constructing a function out of infinitely many pieces would have been both
new and worrying.
The second sticky point was Cantor’s “application” of his theorem to undermine
the foundations of geometry. This is, of course, the sort of thing one has to be careful
about. And it is clear, from Dedekind’s eventual response to Cantor, that it concerned
him.
Dedekind took longer than usual to respond. Having already given one wrong proof,
Cantor was anxious to hear a “yes” from Dedekind, and so he wrote again on June 29:

3 The original reads “. . . bis ich vor ganz kurzer Zeit durch ziemlich verwickelte Gedankereihen zu der Ue-

berzeugung gelangte, dass jene Frage ohne all Einschränkgung zu bejahen ist.” Note Cantor’s Überzeugung—
conviction, belief, certainty.

206 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



Please excuse my zeal for the subject if I make so many demands upon your
kindness and patience; the communications which I lately sent you are even for
me so unexpected, so new, that I can have no peace of mind until I obtain from
you, honoured friend, a decision about their correctness. So long as you have not
agreed with me, I can only say: je le vois, mais je ne le crois pas. And so I ask
you to send me a postcard and let me know when you expect to have examined
the matter, and whether I can count on an answer to my quite demanding request.

So here is the phrase. The letter is, of course, in German, but the famous “I see it,
but I don’t believe it” is in French.4 Seen in its context, the issue is clearly not that
Cantor was finding it hard to believe his result. He was confident enough about that
to think he had rocked the foundations of the geometry of manifolds. Rather, he felt
a need for confirmation that his proof was correct. It was his argument that he saw
but had trouble believing. This is confirmed by the rest of the letter, in which Cantor
spelled out in detail the most troublesome step, namely, how to “successively apply”
his lemma to construct the final bijection.
So the famous phrase does not really provide an example of a mathematician having
trouble believing a theorem even though he had proved it. Cantor, in fact, seems to have
been confident [überzeugt!] that his theorem was true, as he himself says. He had in
hand at least two arguments for it: the first argument, using the decimal expansion,
required supplementation by a proof of the Schroeder-Bernstein theorem, but Cantor
was quite sure that this would eventually be proved. The second argument was correct,
he thought, but its complicated structure might have allowed something to slip by him.
He knew that his theorem was a radically new and surprising result—it would cer-
tainly surprise others!—and thus it was necessary that the proof be as solid as possible.
The earlier error had given Cantor reason to worry about the correctness of his argu-
ment, leaving Cantor in need of his friend’s confirmation before he would trust the
proof.
Cantor was, in fact, in a position much like that of a student who has proposed
an argument, but who knows that a proof is an argument that convinces his teacher.
Though no longer a student, he knows that a proof is an argument that will convince
others, and that in Dedekind he had the perfect person to find an error if one were
there. So he saw, but until his friend’s confirmation he did not believe.

6. WHAT CAME NEXT. So why did Dedekind take so long to reply? From the
evidence of his next letter, dated July 2, it was not because he had difficulty with the
proof. His concern, rather, was Cantor’s challenge to the foundations of geometry.
The letter opens with a sentence clearly intended to allay Cantor’s fears: “I have
examined your proof once more, and I have discovered no gap in it; I am quite certain
that your interesting theorem is correct, and I congratulate you on it.” But Dedekind
did not accept the consequences Cantor seemed to find:

However, as I already indicated in the postcard, I should like to make a remark


that counts against the conclusions concerning the concept of a manifold of ρ
dimensions that you append in your letter of 25 June to the communication and
the proof of the theorem. Your words make it appear—my interpretation may be
incorrect—as though on the basis of your theorem you wish to cast doubt on the
meaning or the importance of this concept . . .
4 I don’t know whether this is because of the rhyme vois/crois, or because of the well-known phrase “voir,

c’est croire,” or for some other reason. I do not believe the phrase was already proverbial.

March 2011] WAS CANTOR SURPRISED? 207


Against this, I declare (despite your theorem, or rather in consequence of re-
flections that it stimulated) my conviction or my faith (I have not yet had time
even to make an attempt at a proof) that the dimension-number of a continu-
ous manifold remains its first and most important invariant, and I must defend
all previous writers on the subject . . . For all authors have clearly made the
tacit, completely natural presupposition that in a new determination of the points
of a continuous manifold by new coordinates, these coordinates should also (in
general) be continuous functions of the old coordinates . . .
Dedekind pointed out that, in order to establish his correspondence, Cantor had
been “compelled to admit a frightful, dizzying discontinuity in the correspondence,
which dissolves everything to atoms, so that every continuously connected part of one
domain appears in its image as thoroughly decomposed and discontinuous.” He then
set out a new conjecture that spawned a whole research program:
. . . for the time being I believe the following theorem: “If it is possible to establish
a reciprocal, one-to-one, and complete correspondence between the points of a
continuous manifold A of a dimensions and the points of a continuous manifold
B of b dimensions, then this correspondence itself, if a and b are unequal, is
necessarily utterly discontinuous.”
In his next letter, Cantor claimed that this was indeed his point: where Riemann and
others had casually spoken of a space that requires n coordinates as if that number was
known to be invariant, he felt that this invariance required proof. “Far from wishing to
turn my result against the article of faith of the theory of manifolds, I rather wish to
use it to secure its theorems,” he wrote. The required theorem turned out to be true,
indeed, but proving it took much longer than either Cantor or Dedekind could have
guessed: it was finally proved by Brouwer in 1910. The long and convoluted story of
that proof can be found in [3], [11], and [12].
Finally, one should point out that it was only some three months later that Cantor
found what most modern mathematicians consider the “obvious” way to prove that
there is a bijection between the interval minus a countable set and the whole interval.
√ dated October 23, 1877, he took an enumeration φν of the rationals and let
In a letter
ην = 2/2ν . Then he constructed a map from [0, 1] sending ην to η2ν−1 , φν to η2ν , and
every other point h to itself, thus getting a bijection between [0, 1] and the irrational
numbers between 0 and 1.
7. MATHEMATICS AS CONVERSATION. Is the real story more interesting than
the story of Cantor’s surprise? Perhaps it is, since it highlights the social dynamic that
underlies mathematical work. It does not render the theorem any less surprising, but
shifts the focus from the result itself to its proof.
The record of the extended mathematical conversation between Cantor and Dede-
kind reminds us of the importance of such interaction in the development of mathe-
matics. A mathematical proof is, after all, a kind of challenge thrown at an idealized
opponent, a skeptical adversary that is reluctant to be convinced. Often, this adversary
is actually a colleague or collaborator, the first reader and first critic.
A proof is not a proof until some reader, preferably a competent one, says it is. Until
then we may see, but we should not believe.

REFERENCES

1. K.-R. Biermann, Dedekind, in Dictionary of Scientific Biography, C. C. Gillispie, ed., Scribners, New
York, 1970–1981.

208 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



2. W. Byers, How Mathematicians Think: Using Ambiguity, Contradiction, and Paradox to Create Mathe-
matics, Princeton University Press, Princeton, 2007.
3. J. W. Dauben, The invariance of dimension: Problems in the early development of set theory and topology,
Historia Math. 2 (1975) 273–288. doi:10.1016/0315-0860(75)90066-X
4. , Georg Cantor: His Mathematics and Philosophy of the Infinite, Princeton University Press,
Princeton, 1990.
5. R. Dedekind, Essays in the Theory of Numbers (trans. W. W. Beman), Dover, Mineola, NY, 1963.
6. , Gesammelte Mathematische Werke, R. Fricke, E. Noether, and O. Ore, eds., Chelsea, New York,
1969.
7. , Stetigkeit und Irrationalzahlen, 1872, in Gesammelte Mathematische Werke, vol. 3, item L,
R. Fricke, E. Noether, and O. Ore, eds., Chelsea, New York, 1969.
8. W. Ewald, From Kant to Hilbert: A Source Book in the Foundations of Mathematics, Oxford University
Press, Oxford, 1996.
9. R. Gray, Georg Cantor and transcendental numbers, Amer. Math. Monthly 101 (1994) 819–832. doi:
10.2307/2975129
10. J. Hadamard, An Essay on the Psychology of Invention in the Mathematical Field, Princeton University
Press, Princeton, 1945.
11. D. M. Johnson, The problem of the invariance of dimension in the growth of modern topology I, Arch.
Hist. Exact Sci. 20 (1979) 97–188. doi:10.1007/BF00327627
12. , The problem of the invariance of dimension in the growth of modern topology II, Arch. Hist.
Exact Sci. 25 (1981) 85–267. doi:10.1007/BF02116242
13. H. Meschkowski. Cantor, in Dictionary of Scientific Biography, C. C. Gillispie, ed., Scribners, New York,
1970–1981.
14. E. Noether und J. Cavaillès, Briefwechsel Cantor–Dedekind, Hermann, Paris, 1937.

FERNANDO Q. GOUVÊA is Carter Professor of Mathematics at Colby College in Waterville, ME. He is


the author, with William P. Berlinghoff, of Math through the Ages: A Gentle History for Teachers and Others.
This article was born when he was writing the chapter in that book called “Beyond Counting.” So it’s Bill’s
fault.
Department of Mathematics and Statistics, Colby College, Waterville, ME 04901
fqgouvea@colby.edu

March 2011] WAS CANTOR SURPRISED? 209


On Legendre’s Work on the Law of
Quadratic Reciprocity
Steven H. Weintraub

Abstract. Legendre was the first to state the law of quadratic reciprocity in the form in which
we know it and he was able to prove it in some but not all cases, with the first complete proof
being given by Gauss. In this paper we trace the evolution of Legendre’s work on quadratic
reciprocity in his four great works on number theory.

As is well known, Adrien-Marie Legendre (1752–1833) was the first to state the law
of quadratic reciprocity in the form in which we know it (though an equivalent result
had earlier been conjectured by Euler), and he was able to prove it in some but not
all cases, with the first complete proof being given by Gauss [3]. In this paper we
trace the evolution of Legendre’s work on quadratic reciprocity in his four great works
[10, 11, 12, 13] on number theory. These works span a 45 year period in Legendre’s
life, dating from 1785, 1797, 1808, and 1830 respectively.
Before beginning with our analysis here, we call the reader’s attention to several
other relevant works. [15] overlaps with our work here, though it has a somewhat
different focus. [2] is a brief survey, and [14] is an extended treatment of the early
history of reciprocity laws. A highly readable account of the development of number
theory around this era can be found in [9], which has excerpts from original works of
Euler, Legendre, Gauss, and others, translated into English.
In this paper, we will use Legendre’s language to the extent possible. In particular,
we will not use the terms quadratic residue/nonresidue or the notion of congruence in
the body of this article, as these were never used by Legendre.
We begin with Legendre’s 1785 paper [10]. In Article I of that paper he proves a
result due originally to Euler:

Theorem A. Let c be an odd prime and let d be any integer not divisible by c. Then
d c−1 − 1 is divisible by c.
Furthermore, c divides the formula x 2 + dy 2 (i.e., there are integers x and y not
divisible by c with x 2 + dy 2 divisible by c) if and only if (−d)(c−1)/2 leaves a remainder
of 1 when divided by c; otherwise (−d)(c−1)/2 leaves a remainder of −1 when divided
by c. If −c/2 < d < c/2, each possibility occurs for (c − 1)/2 values of d.1

We follow Legendre’s notation throughout this paper and let a and A be distinct
primes of the form 4x + 1 and b and B be distinct primes of the form 4x − 1 (or
4x + 3; Legendre used both but preferred 4x − 1).
In Article IV he rewrites the above conclusions as (−d)(c−1)/2 = 1 or (−d)(c−1)/2 =
−1, with the convention here, as he explicitly states elsewhere, that this is true “omit-
ting multiples of c.” With this convention he then states the following 8 theorems:
doi:10.4169/amer.math.monthly.118.03.210
1 Note that “c divides the formula x 2 + dy 2 ” if and only if −d is a quadratic residue (mod c). To see this,
first suppose that −d is a quadratic residue (mod c), and let x be an integer with x 2 ≡ −d (mod c). Then
x 2 + d12 ≡ 0 (mod c). Conversely, if x 2 + dy 2 ≡ 0 (mod c), let y be an integer with y y ≡ 1 (mod c). Then
x 2 y 2 + d ≡ 0 (mod c), so −d ≡ (x y)2 (mod c).

210 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



Theorem I. If b(a−1)/2 = 1, then a (b−1)/2 = 1.

Theorem II. If a (b−1)/2 = −1, then b(a−1)/2 = −1.

Theorem III. If a (A−1)/2 = 1, then A(a−1)/2 = 1.

Theorem IV. If a (A−1)/2 = −1, then A(a−1)/2 = −1.

Theorem V. If a (b−1)/2 = 1, then b(a−1)/2 = 1.

Theorem VI. If b(a−1)/2 = −1, then a (b−1)/2 = −1.

Theorem VII. If b(B−1)/2 = 1, then B (b−1)/2 = −1.

Theorem VIII. If b(B−1)/2 = −1, then B (b−1)/2 = −1.

Note that Theorems I and II are equivalent (being contrapositives of each other) as
are Theorems III and IV, and Theorems V and VI, leaving five essentially different
cases. Legendre makes this observation in the course of his proofs.
Legendre then proceeds to (attempt to) prove these theorems. As he observes, he is
successful in unconditionally proving Theorems I, II, and VII, but for the remaining
cases his proof is conditional on an auxiliary hypothesis that he cannot prove. We state
this as Hypothesis A below. His key tool in these (partial) proofs is the following result
of his from Article III:

Theorem B. Let r , s, and t be squarefree, pairwise relatively prime positive integers.


Then the equation r x 2 + sy 2 = t z 2 has a nonzero solution in integers if and only if
there are integers λ, µ, and ν such that

r λ2 + s tµ2 − s tν 2 − r
, ,
t r s

are all integers.

Note in particular that in this theorem r , s, and t are not required to be prime.
However, in his (partial) proofs of Theorems I–VIII he uses only the cases where r , s,
and t are primes or are equal to 1.
Now we come to Legendre’s 1797 book [11]. In this work we see two major changes
from his previous work.
The first change is the introduction of what we now call the Legendre symbol mn .

Here n is a prime and m is an arbitrary integer not divisible by n. From Theorem A
he knows that m (n−1)/2 leaves a remainder of ±1 when divided by n, and then he sets
m
n
= 1 or −1 as this remainder is 1 or −1. Legendre occasionally uses this symbol
when n = 1 as well, in which case he sets mn = 1 for every nonzero integer m.

In the opinion of the author, this is more than a notational convenience. In intro-
ducing this notion, Legendre reifies this concept, and makes it into an object of in-
dependent study. This line of thought later led to the Jacobi symbol and the Hilbert
symbol.
The second change is the introduction of the term “reciprocity.” We have in this
work [11, par. (164)], a paragraph entitled:

March 2011] LEGENDRE’S WORK ON QUADRATIC RECIPROCITY 211


Théorême contenant une loi de réciprocité qui existe entre deux nombres pre-
miers quelconques (Theorem containing a law of reciprocity that exists between
two arbitrary prime numbers).

Legendre begins this paragraph by letting m and n be odd primes, and with this
implicit assumption he states the theorem:

For arbitrary prime  4x − 1,


numbers m and n, if they are not both of the form
then mn = mn , and if they are both of the form 4x − 1, then mn = − mn .
  
These two general cases are contained in the formula
n m−1 n−1
m 
= (−1) 2 · 2 .
m n
Again, in the opinion of the author, this general heading reflects not only an ap-
preciation of the importance of this overall result, but a conception of it as a single
result (rather than a collection of eight results), and a conception of it as reflecting a
relationship between distinct odd primes.
Legendre begins his proof by observing that, in case r , s, and t are primes or are
equal to 1, the conditions in Theorem B can be restated in terms of the Legendre
symbol:
     
−r s st rt
= 1, = 1, = 1.
t r s
He then proceeds to develop theproperties
 of the Legendre symbol. From his defi-
nition he immediately derives −N a
= N
a
and −N
b
= − N
b
, as (a − 1)/2 is even,
so (−1)(a−1)/2 = 1, and (b − 1)/2 is odd, so (−1)(b−1)/2 = −1,  and also the multi-
plicativity of the Legendre symbol in general: McN = Mc Nc .2
In order to prove reciprocity he must divide the proof into the same eight cases as
he did in 1785. (We warn the reader who wishes to consult the works of Legendre that
he changes the numbering of these eight cases from 1785 to 1797, leaves it alone from
1797 to 1808, but changes it again from 1808 to 1830. In this paper we use the 1785
numbering throughout.) His proof is essentially (with one exception, noted below) the
same as that in 1785, with some simplification due to the use of the Legendre symbol.
He uses the same auxiliary hypothesis, Hypothesis A, as in 1785, which we now state
(we have delayed stating it until now as it is much easier to state using the Legendre
symbol).

Hypothesis A.
b

= 1 and Ab

(a) For any a and A, there exists b with a
= −1.
A
 A

(b) For any a and b, there exists A with a
= −1 and b
= −1.
a
 a

(c) For any b and B, there exists a with b
= −1 and B
= −1.

The only essential difference between the proofs in 1785 and 1797 is that in 1797
Legendre gives two proofs of cases I and II, the first of which is the same as his 1785
proof.
2 Nowadays it is common to define the Legendre symbol m by m = 1 if m is a quadratic residue (mod n)
 
n n
and mn = −1 if m is a quadratic nonresidue (mod n), but this was not Legendre’s definition. Of course, the


modern definition and Legendre’s definition are equivalent, by Euler’s theorem. Note that the multiplicativity
of the Legendre symbol is immediate from Legendre’s definition, but takes some work to obtain from the
modern definition.

212 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



Now we come to Legendre’s 1808 book [12]. First we see that Legendre has isolated
the multiplicativity of the Legendre symbol as an important property and instead of
proving it inter alia as part of the proof of reciprocity, he proves this property in [12,
par. (135)], the paragraph in which he first introduces the Legendre symbol.
The major difference, however, is that Legendre has changed his proof of reci-
procity. His proof here of cases I and II is the same as his second proof of these cases
in 1797. As for the remaining cases, he has a new approach, based on his earlier results
on the possibility of solving the equations M x 2 − N y 2 = ±1 for M and N primes. Us-
ing these results, he treats cases VII and VIII together, establishing them uncondition-
ally. As for the remaining cases, his proof depends on a different auxiliary hypothesis,
Hypothesis B, than he had previously used. His auxiliary hypothesis here is:

Hypothesis B. For any a, there exists b with ab = −1.




Of course, by this time Gauss had proved the law of quadratic reciprocity in general.
Gauss gave two proofs in [3] and a third in [4]. (The author admits to not being able
to read Latin, and he consulted [3, 4, 5, 6] in the German translation [7].) In [12, par.
(381)], Legendre gives Gauss’s third proof as well.
In Legendre’s two-volume 1830 book [13] we again find his attempt to prove
quadratic reciprocity. Here his proof is the same as in 1808, with the same dependence
on Hypothesis B in some of the cases. He again gives Gauss’s third proof, but he also
gives a proof in [13, par. (679)] that is a modification of a proof due to Jacobi [8].
Jacobi’s proof is in part a simplification of Gauss’s
 sixth proof [6]. Jacobi begins with
what we now call the Gauss sum P = c−1 d
P
√ d=1 c exp(2πid/c). It is easy to show √
that
P 2 = (−1)(c−1)/2 c, so P = ± c if c is a prime of the form 4x + 1 and P = ±i c if
c is a prime of the form 4x − 1.
It is a celebrated theorem of Gauss [5] that the sign is always positive, and in that
work Gauss used this fact to provide his fourth proof of quadratic reciprocity. Jacobi
used the value of P. Legendre modified Jacobi’s proof to use only the value of P 2 so
that the difficult sign question could be bypassed. In fact, the observation that quadratic
reciprocity can be proved using only the value of P 2 and not the value of P goes back
to Gauss in his sixth proof [6]. Legendre describes the proof he gives as the simplest
of all known proofs of quadratic reciprocity.
Legendre realized full well that his own two proofs of quadratic reciprocity were
incomplete. The change from his 1785/1797 proof to his 1808/1830 proof enabled
him to prove an additional case of reciprocity unconditionally. But this new proof also
involved a change of the auxiliary hypothesis on which the proof of the remaining
cases depended. Evidently, since he replaced Hypothesis A by Hypothesis B in his
later works, he regarded that as progress. In 1808 in [12, par. (169)], the first time
Hypothesis B appears, he observes that a, being of the form 4x + 1, is necessarily
of the form 8x + 1 or 8x + 5, and he can verify this hypothesis in the 8x + 5 case,
leaving the 8x + 1 case open. He further observes that that case splits into the two
cases 24x + 1 and 24x + 17, and he can verify this hypothesis in the 24x + 17 case,
choosing b = 3, leaving the 24x + 1 case open. In 1830 in [13, par. (171)] he observes
that in addition he has verified this hypothesis for each of the fifteen primes a of the
form 24x + 1 with a ≤ 1009. He verifies this by considering the remainders when a
is divided by 168 or 264, where he chooses b = 7 or b = 11 respectively.
In fact, Legendre’s earlier Hypothesis A is a consequence of Dirichlet’s theorem
on primes in an arithmetic progression, although, in the author’s opinion, this is cer-
tainly overkill. Ironically, there is no known proof of his Hypothesis B that does not
use quadratic reciprocity. Thus, what seemed to him to be an advance seems to us

March 2011] LEGENDRE’S WORK ON QUADRATIC RECIPROCITY 213


nowadays to be a step backwards. (This has been observed earlier; see for example
[14, 15].)
We now turn to the “supplements” to the law of quadratic reciprocity.
 The first of
these is that −1
c
= 1 if c is a prime of the form 4x + 1 and −1
c
= −1 if c is a prime
of the form 4x + 3. This was known to Fermat and has already been remarked upon
above. The second of these is that 2c = 1 if c is a prime of the form 8x + 1 or 8x + 7
and 2c = −1 if c is a prime of the form 8x + 3 or 8x + 5. Legendre derives this in an

elementary way in 1797 in [11, par. (148)] from the following theorem:

Theorem C.
(a) An odd prime c is of the form y 2 + z 2 if and only if c is of the form 8x + 1 or
8x + 5.
(b) An odd prime c is of the form y 2 + 2z 2 if and only if c is of the form 8x + 1 or
8x + 3.
(c) An odd prime c is of the form y 2 − 2z 2 if and only if c is of the form 8x + 1 or
8x + 7.

Legendre credits the discovery of all parts of this theorem to Fermat, with the first
proofs of parts (a) and (b) due to Euler and the first proof of (c) due to Lagrange.
This derivation of the value of 2c is unchanged in 1808/1830 in [12, 13], although
in those works Legendre investigates the equations M x 2 − N y 2 = ±2, and some but
2

not all cases of c follow more simply from those investigations, as he notes. (The
derivation of the value of 2c was much simplified by Gauss. In [3] he proves this in

an elementary way and in [4] he gives a second proof, using “Gauss’s lemma,” as part
of his third proof of reciprocity. Both of these proofs were well known in the 19th
century, appearing in the famous textbook [1], written by Dirichlet but with revisions
and supplements due to Dedekind, which gave (a slight reformulation of) Gauss’s third
proof of reciprocity as well. But only the last of these
 proofs is well known today, when
it has become the standard proof of the value of 2c .)
We have concentrated here on Legendre’s work, but we would like to make a few
more historical remarks. We have mentioned that Euler stated a conjecture equivalent
to the law of quadratic reciprocity (though it takes a bit of work to see that), but Euler’s
statement seems not to have had any influence on either Legendre or Gauss. Euler
coined the terms quadratic residue and nonresidue, which were not used by Legendre
but were used by Gauss. Legendre coined the term quadratic reciprocity, but this was
never used by Gauss, who always referred to this result as the “fundamental theorem
(in the theory of quadratic residues),” nor did Gauss ever use the Legendre symbol in
any of his works on the subject.
In the introduction to [3] Gauss writes that his work there had been done without
knowledge of prior results in the subject. He also writes there that in the meanwhile,
the “excellent” work [11] of the “highly deserving” Legendre appeared, but that he
did not rewrite [3] to take it into account, only adding a few additional remarks in the
Appendix. He makes some historical remarks in [3, par. (151)] immediately after he
proves the fundamental theorem, in which he comments on the efforts of Euler and
Legendre. In particular, Gauss credits Legendre for having arrived at that theorem in
[10], without having been able to completely prove it (as Legendre himself admitted
there), and then claims (quite fairly, in the opinion of the author) that his own proof is
the first proof.
In the first paragraph of [4] Gauss writes that in number theory it is often easy to
inductively arrive at results whose proofs lie very deep, or even which defy proof.

214 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



In the second paragraph of that work Gauss credits Legendre with being the first to
discover the fundamental theorem, although he could not prove it, but that he him-
self arrived at it independently in 1795 and was able to prove it only after a year’s
effort.
Legendre evidently felt that Gauss never gave him due credit for the law of quadratic
reciprocity, nor for the discovery of the method of least squares, which they both ar-
rived at as well. See [16] for a bitter commentary that Legendre wrote about Gauss.
To quote excerpts from [16], in the translation given there, “. . . the author [Legendre]
spoke of his method of least squares: it is enough to recall that he published it for the
first time, in 1805 . . . However, as a very celebrated geometer [Gauss] has not hesi-
tated to appropriate this method to himself in a work printed in 1809 . . . In addition,
we would have willingly spoken of the episode of 1809 as of a totally new and differ-
ent kind, if we had not found that in 1801 the same geometer made another attempt of
this type, in a way this earlier behavior was even more imperfect . . . ” While we may
feel that there is enough glory in the development of number theory to go around, that
feeling was apparently not shared by all the protagonists.
Finally, one may ask why Legendre continued to include his own incomplete proof
of the law of quadratic reciprocity in [12] and [13]. While this question cannot be
definitively answered without reading Legendre’s mind, the author speculates that
there were two reasons. First, Legendre hoped that a (simple) proof of his auxiliary
hypothesis would be found, thus vindicating his approach, and second, to have omit-
ted his proof would have been to weaken his claim of priority for this theorem, and
that is something he was certainly most unwilling to do.

ACKNOWLEDGMENTS. This work was done while the author was on sabbatical leave at the Mathematics
Institute of the University of Göttingen, which he would like to thank for its hospitality. He notes with particular
pleasure that the copy of [11] he consulted in Göttingen was the copy from Gauss’s personal library. Preparation
of this manuscript for publication was supported by a Faculty Research Grant from Lehigh University.

REFERENCES

1. P. G. L. Dirichlet, Vorlesungen über Zahlentheorie, Chelsea, New York, 1968; reprint of the 4th edition,
Braunschweig, 1893.
2. G. Frei, The reciprocity law from Euler to Eisenstein, in The Intersection of History and Mathematics,
Science Networks Historical Studies, vol. 15, S. Chikara, S. Mitsuo, and J. W. Dauben, eds., Birkhäuser-
Verlag, Basel, 1994, 67–90.
3. C.-F. Gauss, Disquisitiones Arithmeticae, privately printed, Leipzig, 1801.
4. , Theorematis arithmetici demonstratio nova, Commentationes soc. reg. sc. Gottingensis XVI,
1808.
5. , Summatio quarumdum serium singularium, Commentationes soc. reg. sc. Gottingensis recen-
tiores I, 1811.
6. , Theorematis fundamentalis in doctrina de residuis quadraticis demonstrationes et ampliationes
novae, Commentationes soc. reg. sc. Gottingensis recentiores IV, 1818.
7. , Untersuchungen über höhere Arithmetik (trans. H. Maser), American Mathematical Society/
Chelsea, Providence 2006.
8. C. G. J. Jacobi, Letter to Legendre of 5 August 1827, in Collected Works, vol. I, C. W. Borchardt, ed.,
Chelsea, New York, 1969, 390–396; reprint of the original edition, G. Reimer, Berlin, 1881.
9. A. Knoebel, R. Laubenbacher, J. Lodder, and D. Pengelley, Mathematical Masterpieces, Further Chron-
icles by the Explorers, Springer, New York, 2007.
10. A. M. Le Gendre, Recherches d’analyse indéterminée, Histoire de l’Académie royale des sciences avec
les Mémoires de Mathématique et de Physique pour la même Année, 1785, 465–559; also available at
http://gallica.bnf.fr.
11. A. M. Legendre, Essai sur la Théorie des Nombres, Paris, 1797.
12. , Essai sur la Théorie des Nombres, Paris, 1808.

March 2011] LEGENDRE’S WORK ON QUADRATIC RECIPROCITY 215


13. , Théorie des Nombres, Tomes I et II, Paris 1830. Also available in German translation in Zahlen-
theorie von Adrien-Marie Legendre (trans. H. Maser), Michigan Historical Reprint Series, University of
Michigan Library.
14. F. Lemmermeyer, Reciprocity Laws: From Euler to Eisenstein, Springer, Berlin, 2000.
15. H. Pieper, Über Legendres Versuche, das quadratische Reziprozitätsgesetz zu beweisen, Acta Hist.
Leopold. 27 (1997) 223–237.
16. S. M. Stigler, An attack on Gauss, published by Legendre in 1820, Historia Math. 4 (1977) 31–35.
doi:10.1016/0315-0860(77)90032-5

STEVEN H. WEINTRAUB is Professor of Mathematics at Lehigh University. His research spans a range of
areas in algebra, geometry, and topology, although lately he has become interested in the history of mathematics
as well. When he is not thinking about mathematics he can often be found reading mystery novels or flying his
airplane (but not doing both simultaneously).
Department of Mathematics, Lehigh University, Bethlehem, PA 18015
steve.weintraub@lehigh.edu

A Rational Function Without a Rational Antiderivative

We give a simple proof, based on the mean value theorem of calculus, that the
antiderivative of 1/(1 + x 2 ) is not rational.
Assume on the contrary that

P(x) p0 x n + · · · + pn
R(x) = =
Q(x) q0 x m + · · · + qm

is such that
1
R 0 (x) = ,
1 + x2
with p0 , q0 6= 0. Observe that R must be strictly increasing. Next, using the mean
value theorem, we see that, if x > 0, there exists y between x and 2x such that
x
R(2x) − R(x) = R 0 (y)x ≤ .
1 + x2
Hence
p0 q0 (2n − 2m )x n+m + · · ·
R(2x) − R(x) = →0
q02 2m x 2m + · · ·

as x → +∞, and so n ≤ m. Therefore R(x) approaches the same finite limit as


x → −∞ and x → +∞ ( p0 /q0 if n = m, and 0 if n < m), contradicting the
fact that it is strictly increasing.

—Submitted by André Giroux, Université de Montréal

216 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



Equimodular Polynomials and the
Tritangency Theorems of Euler, Feuerbach,
and Guinand
Alexander Ryba and Joseph Stern

Abstract. We strengthen a result of Lehmer, obtaining a new necessary condition for the roots
of a complex polynomial to have equal modulus. From this we derive the famous theorem of
Feuerbach, as well as the less well-known theorems of Euler and Guinand on the tritangent
centers of a triangle. The latter theorems constrain the possible locations of the incenter and
excenters subject to fixed locations for the circumcenter and orthocenter.

1. INTRODUCTION. As early as 1765, Euler knew that if a triangle is varied while


its circumcenter and orthocenter remain fixed, then its incenter will stay within a
bounded region. Euler [3] characterized the region by a pair of simple inequalities,
but did not comment on its shape.
Remarkably, the shape of the region was not known until 1984, when Guinand [5]
proved it to be a punctured disc. Guinand’s method extended nicely to the excenters,
and he was able to show that each excenter ranges over the common exterior of Euler’s
punctured disc and a certain simple closed curve. We refer to the boundary curves of
these regions as the Euler shield and Guinand shield.1
There are now several proofs of Euler’s theorem (see, e.g., [8, 9, 10, 12, 13]). Our
approach is motivated by [10], which derives the theorem from the fact that a certain
cubic polynomial has complex roots of equal modulus. Following Lehmer, we call a
polynomial equimodular if all its roots have the same nonzero modulus. In [6], Lehmer
gives a necessary condition for equimodularity. We strengthen this condition and apply
the result to a special family of cubics, obtaining new proofs of the theorems of Euler
and Guinand. Just as Guinand explained the geometric meaning of the Euler shield, our
proof introduces a synthetic construction of the Guinand shield, shedding light on its
geometric meaning. We are also able to extract a short proof of Feuerbach’s theorem.
The three theorems—Euler, Feuerbach, Guinand—we call “E-F-G”.

2. THE THEOREMS E-F-G. Euler’s 1765 article [3] marks an important milestone
in triangle geometry. In it, he introduces the line now referred to as the Euler line,
as well as his formula for the distance between the circumcenter and the incenter of
a triangle. These points are two of the four classical triangle centers, illustrated in
Figure 1.
The circumcenter O is the center of the circumscribed circle, as well as the inter-
section point of the perpendicular bisectors of the sides.2 The incenter I is the center
of the inscribed circle or incircle, and also the intersection point of the angle bisec-
tors. The remaining classical centers are the orthocenter H , where the altitudes meet,
and the centroid G, where the medians meet. In a nonequilateral triangle, the points
O, G, and H lie on the Euler line in the order O-G-H , with GH = 2 · OG. A fifth
triangle center, apparently unknown to Euler, is the nine-point center N . This is the
doi:10.4169/amer.math.monthly.118.03.217
1 After a suggestion of John Conway.
2 Our notation and terminology are consistent with those of [2].

March 2011] EQUIMODULAR POLYNOMIALS AND TRITANGENCY THEOREMS 217


A

G O
I
N
H

B D A C

Figure 1. The classical triangle centers and the Euler segment OH.

center of the nine-point circle, which passes through the midpoints of the three sides
and the feet of the three altitudes. It turns out that N is also the midpoint of OH, so
that OG : GN : NH = 2 : 1 : 3. The nine-point circle is often attributed to Euler, but
no evidence has been found to support this attribution. According to [7], there are
precedents of the nine-point circle theorem dating as far back as the 1804 work of
Benjamin Bevan, but it is first explicitly described in the 1821 article [1] of Brianchon
and Poncelet. The term “nine-point circle” was coined in 1842 by O. Terquem [11].
Although the 1765 paper is historically important for its introduction of the Euler
line, Euler states that his primary aim is to compute the sides of a triangle in terms of
its central distances, OH, OI, and IH. He does this by showing that for a nonequilateral
triangle, the central distances determine the coefficients of a real cubic whose roots are
exactly the sides a, b, c. This leads to a pair of necessary conditions on OH, OI, IH,
which come from the fact that the cubic must have three positive real roots. Guinand [5]
shows that Euler’s necessary conditions are also sufficient to guarantee the existence
of a triangle with pre-specified central distances. This is now called Euler’s theorem.
In order to formulate Euler’s theorem, we begin by defining two functions

N (O, H ) = 21 (O + H ) and G(O, H ) = 13 (2O + H ),

where O and H are any two points. In the archetypal case when O and H are the
circumcenter and orthocenter of a triangle, N (O, H ) and G(O, H ) coincide with the
nine-point center N and the centroid G. For brevity, we always write N = N (O, H )
and G = G(O, H ).

Definition 1. The Euler shield determined by two points O and H , denoted by  =


(O, H ), is the circle with GH as diameter (Figure 3).

The Euler shield can also be described as the locus of a point X such that OX =
2 · NX. This equation defines a circle of Apollonius whose center lies on line ON.
Since OG : GN : NH = 2 : 1 : 3, the values X = G and X = H satisfy the equation.
As G and H lie on line ON, the circle of Apollonius has GH as a diameter. We may
now state Euler’s theorem.

Theorem E (Euler, 1765). Three points O, H , and I are the circumcenter, orthocen-
ter, and incenter of a triangle if and only if I is inside the Euler shield  and differs
from N .

218 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



Another celebrated theorem of the 1765 paper is the formula3

OI 2 = R(R − 2r ). (1)

Here R is the circumradius, or radius of the circumscribed circle, while r is the in-
radius, or radius of the incircle. Equation (1) often appears in conjunction with Feuer-
bach’s relation

NI = 21 R − r, (2)

which expresses the famous result that the incircle and the nine-point circle are inter-
nally tangent. The term 21 R represents the radius of the nine-point circle.4
Both (1) and (2) have analogues in which the incenter is replaced by one of the
three excenters. These are the centers of the excircles, which touch one side of 4ABC
internally and the other two sides externally (see Figure 2). Each is the intersection of
one internal angle bisector with two external angle bisectors. We denote the excenters
opposite A, B, C by E a , E b , E c . An arbitrary excenter will be denoted by E, with ρ
being the radius of the corresponding excircle. The incircle and excircles are collec-
tively called tritangent circles, and their centers tritangent centers. Theorems about
them are called “tritangency theorems”.

C B
I
N
C
B A

Ea

Figure 2. The excenter E a and A-excircle; Feuerbach’s theorem.

The analogues of formulas (1) and (2) for an excenter are as follows:

OE2 = R(R + 2ρ) (3)


NE = 12 R + ρ. (4)

Together, (2) and (4) comprise Feuerbach’s theorem (illustrated in Figure 2):
3 Euler expresses the squared distances between the classical centers in terms of the area 1 and the elemen-

tary symmetric functions of the sides a, b, c. His expression for OI 2 is converted to the form (1) by substitution
of the well-known formulas 21 = r (a + b + c) and 4R1 = abc.
4 Two circles are internally (externally) tangent if and only if the distance between their centers is equal to

the difference (sum) of their radii. The nine-point circle has radius 21 R because it is the circumscribed circle of
the midpoint triangle.

March 2011] EQUIMODULAR POLYNOMIALS AND TRITANGENCY THEOREMS 219


Theorem F (Feuerbach, 1822). In a nonequilateral triangle, the nine-point circle is
internally tangent to the incircle, and externally tangent to the excircles.

Guinand sought constraints on the location of an excenter, just as Euler had for
the incenter. The region of possible excenters turns out to be the common exterior of
 and a rational algebraic curve 0, which we call the Guinand shield (see Figure 3).
Guinand’s definition of this curve is stated in algebraic rather than geometric language.
As we will see, 0 turns out to have a simple geometric characterization. Like that of
the Euler shield, it depends only on the points O and H .

Definition 2. The Guinand shield determined by two points O and H , denoted by


0 = 0(O, H ), is defined as follows. If X is a point on the Euler shield , let ` X be the
line HX when X 6= H , and the tangent line of  at H when X = H . For each X ∈ ,
there are two points L ∈ ` X such that L X = 2 · O X . Each such L has a reflection P
across O. Then 0 is the joint locus of the two points P as X varies over .

The construction involved in this definition is fully illustrated in Figure 5; Figure 3


shows only the resulting locus. We now state Guinand’s theorem.

Theorem G (Guinand, 1984). Three points O, H , and E are the circumcenter, the
orthocenter, and one excenter of a triangle if and only if E lies outside both  and 0.

OG N H

Figure 3. The Euler shield  and Guinand shield 0.

3. INVERSIVE STABILITY. Euler and Guinand proved their theorems by gener-


ating the desired triangle from the roots of real cubic equations. The roots of Euler’s
cubic are the sides a, b, c, while those of Guinand’s cubic are cos A, cos B, cos C.
Continuing the approach of [10], we generate the vertices A, B, C—regarded as points
of C—from a complex cubic, by squaring its three roots. The circumcenter of 4ABC
lies at 0 precisely when the associated cubic is equimodular. This observation leads
us to study the notion of equimodularity in its own right, and in this section we will
develop a useful necessary condition for it.
A polynomial f will be called inversively stable if for some r > 0, the set of com-
plex roots of f is invariant under inversion in the circle |z| = r . Every equimodular
polynomial is inversively stable with respect to the circle on which its roots lie. The
following lemma completely characterizes inversive stability, and therefore provides a
necessary condition for equimodularity.

220 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



Lemma 1. The monic polynomial

f (z) = z n + cn−1 z n−1 + · · · + c1 z + c0

is inversively stable with respect to a circle of radius r > 0 if and only if

r 2k ck = c0 c̄n−k , k = 0, 1, . . . , n. (5)

The only possible radius for which this may occur is r = |c0 |1/n .

In formula (5), the bar denotes complex conjugation. Of course, cn = 1.

Proof. Inversion in the circle of radius r is the map z 7 → r 2 /z̄. This transforms f (z)
into f (r 2 /z̄), which has the same set of roots as

zn c̄1r 2 n−1 c̄2r 4 n−2 r 2n


 2
r
g(z) = · f = zn + z + z + ··· + .
c̄0 z̄ c̄0 c̄0 c̄0
Since f and g are monic and have the same degree, the inversive stability of f is equiv-
alent to the condition f = g. Equating coefficients and taking conjugates produces (5).
When k = n, (5) becomes r 2n = c0 c̄0 = |c0 |2 , or r = |c0 |1/n .

Taking moduli in (5) gives the following weaker condition of Lehmer [6]:

r k |ck | = r n−k |cn−k |, k = 0, 1, . . . , n.

Now let each monic polynomial be identified with its vector of coefficients. This
puts the natural topology of Cn on the set of monic polynomials of degree n. When-
ever we apply topological terms to polynomial spaces, it is this topology that is to be
understood. We call a polynomial simple if it has no repeated roots. The following
lemma is a direct consequence of the implicit function theorem, and presents a key
topological property of simple polynomials.

Lemma 2 (Root Continuity Lemma). Let f 0 be a simple polynomial of degree n.


There is a neighborhood U of f 0 , and a set of n continuous functions ρ1 , . . . , ρn from
U into C, such that the roots of any f ∈ U are exactly ρ1 ( f ), . . . , ρn ( f ).

Proof. For each root z 0 of f 0 , the implicit function theorem gives a neighborhood W of
f 0 and a continuous function ρ : W → C such that ρ( f 0 ) = z 0 and f (ρ( f )) = 0 for
all f ∈ W . The n distinct roots of f 0 give rise to n such “root functions” ρ1 , . . . , ρn ,
each continuous near f 0 . Let V be a neighborhood of f 0 in which all the ρ j are con-
tinuous. Write δ( f ) = min j6=k |ρ j ( f ) − ρk ( f )|. Note that δ is continuous on V , and
δ( f 0 ) > 0. Thus δ > 0 in some neighborhood U of f 0 . For each f ∈ U , the num-
bers ρ1 ( f ), . . . , ρn ( f ) furnish n distinct solutions of f (z) = 0. As deg f = n, these
solutions exhaust the roots of f .

We prove next that the “equimodular region” in a space of inversively stable simple
polynomials is both open and closed, and is therefore a union of connected compo-
nents. This will later allow us to identify the Euler and Guinand shields with the
boundaries of certain equimodular regions.

Theorem 1. In any space of simple, inversively stable polynomials, the set of equimod-
ular elements and the set of non-equimodular elements are each open.

March 2011] EQUIMODULAR POLYNOMIALS AND TRITANGENCY THEOREMS 221


Proof. Let S be a set of simple, inversively stable polynomials with the relative topol-
ogy inherited from polynomial space. Let M be its equimodular subset, and take
f 0 ∈ M. The preceding lemma gives a neighborhood U of f 0 in which the roots ρk ( f )
vary continuously with f . The roots of f 0 are distinct and equimodular, so their ar-
guments must be distinct. Since these arguments are locally continuous functions of
the roots themselves, there is a possibly smaller neighborhood V of f 0 in which the
arguments of the roots remain distinct. An inversively stable polynomial whose roots
have distinct arguments is obviously equimodular. Hence V ⊂ M, and M is open in
S. Similarly, each f 0 ∈ S − M has at least two roots of unequal modulus, and by the
preceding lemma, so does any f sufficiently close to f 0 . Thus S − M is open in S.

4. FEUERBACH’S THEOREM. In our approach to the E-F-G theorems, points of


the plane are represented by complex coordinates. This is suggested by the following
theorem of [10], which relates the classical triangle centers to the algebra of C. It
assumes a coordinate system whose origin lies at the circumcenter of the given triangle.
This puts O = 0, and therefore H = 2N = 3G.

Theorem 2. Suppose 4ABC has circumcenter 0. Exactly two of the eight sets of
square roots α, β, γ of A, B, C make 4αβγ acute-angled. For either set,

H = α2 + β 2 + γ 2 (6)
I = −(βγ + γ α + αβ) (7)
E a = −βγ + γ α + αβ, E b = βγ − γ α + αβ, E c = βγ + γ α − αβ. (8)

Sketch of proof. Since G = 31 (A + B + C), we have H = 3G = A + B + C, i.e., (6).


Let X, Y, Z be the midpoints of arcs BC, CA, AB of the circumscribed circle, as in
Figure 4. Simple arc-measure computations show that I is the orthocenter of 4XYZ,
so that (6) gives I = X + Y + Z . In addition, it can be shown that X is the midpoint of
IEa , so E a = X − Y − Z . Writing A = α 2 , B = β 2 , and C = γ 2 , clearly X = ±βγ ,
Y = ±γ α, and Z = ±αβ. In [10] it is shown that when 4αβγ is acute-angled, all the
± signs become −, yielding (7) and (8).

Y
A = α2

Z
0
I

B = β2 C = γ2

Ea

Figure 4. Constructions used in the proof of Theorem 2.

222 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



Equations (6)–(8) yield (α + β + γ )2 = H − 2I and (−α + β + γ )2 = H − 2E,
where E = E a is the excenter opposite A. Thus we may select values of the relevant
square roots for which
√ √
α + β + γ = − H − 2I , −α + β + γ = − H − 2E.

Euler’s relations (1) and (3) have several relatively straightforward proofs (see, e.g.,
[2]). Combining them with our condition (5) yields a short new proof of Feuerbach’s
theorem.

Proof of Theorem F. Since O = 0, the vertices A, B, C are equimodular. By Theorem


2, the points A, B, C have a set of equimodular square roots α, β, γ such that

α + β + γ = − H − 2I and βγ + γ α + αβ = −I.

Let Q = −αβγ , so |Q|2 = |A · B · C| = R 3 . Note that α, β, γ are the roots of



f I (z) = z 3 + H − 2I z 2 − I z + Q. (9)

Since f I is equimodular, the necessary condition (5) applies in the form



R 2 H − 2I = −Q I¯.

Taking moduli and squaring both sides gives R 4 |H − 2I | = R 3 |I |2 . Substituting H =


2N and |I |2 = R(R − 2r ) (formula (1)), we find that |N − I | = 12 R − r . This says
that the distance between the centers of the incircle and the nine-point circle equals
the difference between their radii. Thus the two circles are internally tangent.
As for E = E a , observe that −α, β, γ are the equimodular roots of

f E (z) = z 3 + H − 2E z 2 − E z − Q. (10)

In this case, condition (5) and formula (3) yield |N − E| = 12 R + ρ, implying the
external tangency of the nine-point circle and the A-excircle.

5. THE THEOREMS OF EULER AND GUINAND. As in Section 4, we use com-


plex coordinates with O = 0. Fix a point H 6 = 0. Recall that the points O and H
determine N and G, as well as the Euler and Guinand shields  and 0. We now in-
vestigate conditions under which a point P is eligible to be a tritangent center of some
triangle with circumcenter O and orthocenter H . Such triangles will be called admis-
sible in what follows. As we will see, the desired conditions turn out to involve 
and 0.
Neither O nor N is eligible to be a tritangent center. An admissible triangle is clearly
nonequilateral, so Feuerbach’s theorem implies that N differs from each tritangent
center. This means that the right-hand side of (2) is positive, and likewise for the right-
hand side of (1). Thus O differs from I , and (3) shows immediately that O also differs
from E a , E b , and E c .
The following lemma constructs a pair of inversively stable cubics indexed by a
point P. Either of the cubics associated with a given point will later determine whether
the point is a tritangent center for some admissible triangle.

March 2011] EQUIMODULAR POLYNOMIALS AND TRITANGENCY THEOREMS 223


Lemma 3. Given a point P ∈ / {0, N }, each of the two square roots of H − 2P deter-
mines a unique inversively stable polynomial of the form

z3 + H − 2P z 2 − Pz + Q (Q ∈ C). (11)

The two polynomials so determined have mutually negative roots.

Proof. By Lemma 1, a polynomial of the form (11) is inversively stable if and only if

|Q|2/3 (−P) = Q H − 2P. Taking moduli allows us to solve for |Q|, and substitut-
ing the result back into the original equation gives

P2 √

P H − 2P. (12)

Q = −
(H − 2P)2

By hypothesis, H − 2P 6 = 0. Associate the signs ± arbitrarily with the two square


roots of H − 2P. It follows from (12) that the two corresponding polynomials f P+ and
f P− are uniquely determined by P, and also that f P− (−z) = − f P+ (z). Thus the roots of
f P+ and f P− are mutual negatives.

It turns out that there is no need to distinguish between f P+ and f P− , and we write f P
for either of them. The reason is that f P+ and f P− have the same equimodularity status,
and throughout this paper their roots only appear in homogeneous combinations of
degree 2, which are invariant under z 7 → −z.
We call a polynomial square-simple if the squares of its roots are distinct.

Theorem 3. The point P is a tritangent center of some admissible triangle if and only
if f P is equimodular and square-simple.

Proof. Suppose P is a tritangent center for an admissible triangle 4ABC, and con-
sider α, β, γ , the equimodular square roots of A, B, C supplied by Theorem 2. For-
mulas (9) and (10) show that the roots of f P are ±α, ±β, ±γ (e.g., −α, β, γ when
P = E a ). Thus f P is equimodular, and since A, B, C are clearly distinct, f P is also
square-simple.
Conversely, suppose f P has equimodular roots a, b, c with distinct squares A, B, C.
Since
√ A, B, C are distinct points of a circle, they form2 a triangle. Now a + b + c =
− H − 2P and bc + ca + ab = −P, so that H = a + b2 + c2 = A + B + C. By
(6), 4ABC is admissible. Theorem 2 supplies a possibly different set of square roots
α, β, γ of A, B, C satisfying (6)–(8). However, since a = ±α, b = ±β, and c = ±γ ,
the point P = −(bc + ca + ab) can only be one of

−(βγ + γ α + αβ), −βγ + γ α + αβ, βγ − γ α + αβ, βγ + γ α − αβ.

These are precisely the tritangent centers of 4ABC by (7) and (8).

Naturally we may ask whether the triangle of Theorem 3 is unique. Suppose P is a


tritangent center for each of 4A1 B1 C1 and 4A2 B2 C2 . The preceding proof shows that
A1 , B1 , C1 are the squares of the roots of f P , and similarly for A2 , B2 , C2 . Hence the
two triangles coincide. It follows that no point can be an incenter for one admissible
triangle and an excenter for another. The two triangles would coincide, and no point
can be both inside and outside the same triangle. We have thus proved:

224 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



Corollary 4. The regions of possible incenter locations and possible excenter loca-
tions are disjoint.

There is also a third region, of points ineligible to be a tritangent center. This follows
from Theorem 3, once it is noted that there are nonequimodular cubics f P . Consider
for instance P = −2G (conveniently chosen inside 0). Choosing λ ∈ C such that
7λ2 = G, formulas (11) and (12) give

f −2G (z) = z 3 + 7λz 2 + 14λ2 z + 8λ3 = (z + λ)(z + 2λ)(z + 4λ).

Because f −2G is nonequimodular, −2G is not eligible to be a tritangent center.


Theorem 3 suggests that we try to determine those points P for which f P is
equimodular but not square-simple. There are two ways for square-simplicity to fail:
a pair of equal roots, or a pair of mutually negative roots. The two cases will give rise
to two curves, which turn out to be the Euler and Guinand shields.

Theorem 5. The polynomial f P is equimodular and has a pair of mutually negative


roots if and only if P lies on the Euler shield .

Proof. Note that f P is equimodular and has a pair of mutually negative roots if and
only if f P (z) = (z − z 0 )(z + z 0 )(z − ωz 0 ), where z 0 , ω ∈ C satisfy z 0 6 = 0 and |ω| =
1. Equating coefficients and writing H = 2N , this is equivalent to

2N − 2P = ω2 z 02 , P = z 02 . (13)

Since z 0 and ω are arbitrary subject to z 0 6 = 0 and |ω| = 1, our two equations are
equivalent to the single condition 2|N − P| = |P|. This is exactly the Apollonius
condition 2 · NP = OP, which characterizes .

Theorem 6. The polynomial f P is equimodular and has two equal roots if and only if
P lies on the Guinand shield 0.

Proof. Observe that f P is equimodular and has two equal roots if and only if f P (z) =
(z − z 0 )2 (z − ωz 0 ), where z 0 , ω ∈ C have z 0 6 = 0 and |ω| = 1. Equating coefficients
and putting L = −P, this amounts to

H + 2L = z 02 (2 + ω)2 , L = z 02 (1 + 2ω). (14)

Suppose (14) is true, and define X ∈ C by X = z 02 . It follows at once that

H = (2 + ω2 )X. (15)

Thus 2(N − X ) = H − 2X = ω2 X , and therefore 2 · NX = OX, that is, X ∈ .


Similarly, from L − X = 2ωX we infer that LX = 2 · OX. Next, notice that

H−X (1 + ω2 )X ω + ω̄
= = = Re(ω).
L−X 2ωX 2
If X 6= H , then H − X and L − X are related by a real nonzero scale factor, so L
must lie on line HX, which is denoted by ` X . If X = H , we have ω = ±i. Thus
L − H = L − X = 2ωX = ±2i H , and LH ⊥ OH. This means that L lies on the
tangent line of  at H , denoted by ` H . Therefore P ∈ 0.

March 2011] EQUIMODULAR POLYNOMIALS AND TRITANGENCY THEOREMS 225


Conversely, suppose P ∈ 0. Then some X ∈  has L ∈ ` X and LX = 2 · OX. Since
|X | = 2|N − X |, it makes sense to define z 0 , ω ∈ C by setting z 02 = X and ω2 =
2(N − X )/ X , as these imply z 0 6 = 0 and |ω| = 1. Equations (14) may be recovered by
reorganizing the computations of the preceding paragraph.

Incidentally, if we eliminate X between (15) and the second equation of (14), and
if we put ω = eit , the following simple parametrization of 0 results:

1 + 2eit
 
P=− H, 0 ≤ t ≤ 2π.
2 + e2it

The parametrization confirms that 0 is a continuous closed curve.5


We now establish a minor technical point, that  and 0 divide the plane into three
connected regions. Of course, Figure 3 makes this completely apparent.

Lemma 4. C − 0 −  has exactly three connected components.

Proof. Consider the reflection of 0 through O, which we call 3 (Figure 5). For each
X ∈ , there are two points of ` X at a distance of 2 · O X from X . These are L 1 , L 2 ∈
3, labeled so that L 1 and H are on the same side of X whenever X 6 = H . (When
X = H , the L i are interchangeable.)

L = L2

P = P1

O G N H

L = L1

P = P2

Figure 5. Construction of 3 and 0.

As X varies over , the distances HX and OX are respectively maximized and


minimized when X = G. Since HG = 2 · OG, we have 2 · OX ≥ HX, with equality
for X = G only. Thus H is strictly between L 1 and L 2 for X 6 = G, and L 1 = H for
X = G. This means that every ray from H meets 3 in at most one other point. In
particular, H is the only candidate for a self-intersection point of 3. However, as X
makes a circuit of , L 2 never passes through H , while L 1 does so only for X = G.
Thus in fact 3 is a Jordan curve, and so is 0.
Observe next that 0 ∩  = {G}. For suppose P ∈ 0 ∩ . By Theorems 5 and 6, f P
must have the form (z − z 0 )2 (z + z 0 ). Equating coefficients gives H − 2P = z 02 = P,
so that 3P = H , and P = H/3 = G.
5 It also shows that 0 is a rational curve, since eit is a rational function of u = tan(t/2).

226 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



Since  − {G} is connected and disjoint from 0, it is either entirely inside 0 or
entirely outside. In fact it must be outside 0, since H is. Thus Ext(0) −  has two
components, while C − 0 −  has three.

Finally, we will use a continuity argument to show that the three connected re-
gions formed by the Euler and Guinand shields coincide with the regions of incenters,
excenters,
√ and non-tritangent points.6 The argument requires a continuous choice of
H − 2P, and to achieve this we will cut the plane. For each point P, there are two
possibilities for f P , one for each square root of H − 2P. The branch point of these
square roots is N . If we cut the plane
√ along a continuous curve from N to ∞, then in
the cut plane there is a choice of H − 2P which varies continuously with P. Formu-
las (11) and (12) supply a corresponding continuous choice of f P . It will be convenient
to cut along the ray κ = NH, because this ray avoids 0. Notice that we can modify κ
slightly to obtain a cut κ ∗ that avoids both 0 and any given point P0 ∈ κ − {N }. We
simply take a disc about P0 that avoids 0, and deform κ inside this disc so as to avoid
P0 .

Theorem 7. f P is equimodular and simple if and only if P ∈ Ext(0) − {N }.

Proof. The set Cκ = (C − κ) − 0 − {0} has two components, one inside 0 and one
outside. Choose the map f : P 7 → f P to be continuous on Cκ . Let M be the equimod-
ular subset of f (Cκ ). By Theorem 1, both M and f (Cκ ) − M are open. The continuity
of f implies that both U = f −1 (M) and its complement V = Cκ − U are open. Thus
U and V must be the components of Cκ .7 Since −2G lies between −H and G, inside
0, in fact V = Int(0) − {0} and U = Ext(0) − κ.
We have now accounted for all points P ∈ / κ. To account for a given point P0 ∈
κ − {N }, the preceding argument may be repeated in the cut plane C − κ ∗ , where κ ∗
is a suitably deformed cut avoiding 0 and P0 .

We now know that the region inside 0 contains the non-tritangent points. To com-
plete our proof of the theorems of Euler and Guinand, we need only identify the two
regions outside 0.

Proof of Theorems E and G. The set Dκ = Ext(0) −  − κ has two components, one
inside  and one outside. (Cutting along the ray NH does not disconnect the interior
of , as may be seen in Figure 3.) Again, choose f : P 7 → f P so as to be continuous
on Dκ . With each P ∈ Dκ we may associate a triangle 4ABC having one tritangent
center at P (Theorems 3, 5, 6, 7). Recall that A, B, C are the squares of the roots of f P .
Since P 7→ f P is continuous, the root continuity lemma implies that P 7 → (A, B, C)
is continuous. Moreover, P is the incenter of 4ABC if and only if P is inside 4ABC.
Consider the partition Dκ = I ∪ E, where I and E are the subsets of possible incen-
ters and possible excenters (which are disjoint by Corollary 4). We claim that I and E
are open. Take a point P0 ∈ I. Being an incenter, P0 is inside its triangle 4A0 B0 C0 .
By the continuity of P 7 → (A, B, C), there is a neighborhood U of P0 such that each
P ∈ U lies inside its triangle 4ABC. Hence each P ∈ U is a possible incenter, and
U ⊂ I. This shows that I is open. A similar argument shows that E is open.
6 In[5], Guinand refers to the region of non-tritangent points as the acentric lacuna.
7A fixed admissible triangle will have some excenter E ∈ / κ, showing that U is nonempty. When κ is later
replaced by the deformed cut κ ∗ , the deformation may be taken sufficiently small that E ∈
/ κ ∗ , so that U is still
nonempty. In either of these cases, −2G ∈ V .

March 2011] EQUIMODULAR POLYNOMIALS AND TRITANGENCY THEOREMS 227


Exactly as in the proof of Theorem 7, I and E must be the components of Dκ . Euler’s
formulas (1) and (3) imply that the incenter and excenters of any admissible triangle
satisfy |E|2 > R 2 > |I |2 . Thus if I contains points of arbitrarily large modulus, so
must E. However, Dκ has only one unbounded component. So I must be bounded, and
we have I = Int() − κ. Accounting for points of κ − {N } with deformed cuts κ ∗ , we
conclude that Int() − {N } is the incenter region and Ext(0) ∩ Ext() is the excenter
region.

REFERENCES

1. C. J. Brianchon and J.-V. Poncelet, Géométrie des courbes. Recherches sur la détermination d’une hyper-
bole équilatère, au moyen de quatres conditions données, Annales de Gergonne 11 (1820–1821) 205–220;
also available at http://www.numdam.org/item?id=AMPA_1820-1821__11__205_0.
2. H. S. M. Coxeter and S. L. Greitzer, Geometry Revisited, Mathematical Association of America, Wash-
ington, DC, 1967.
3. L. Euler, Solutio facilis problematum quorundam geometricorum difficillimorum, Comm. Acad. Sci.
Petropol. 11 (1765) 103–123; also in Opera Omnia, A. Speiser, ed., ser. I, vol. 26, no. 325, 139–157;
available at Euler Archive, http://math.dartmouth.edu/~euler/.
4. K. W. Feuerbach, Eigenschaften einiger merkwürdigen Punkte des geradlinigen Dreiecks und mehrerer
durch sie bestimmten Linien und Figuren: Eine analytisch-trigonometrische Abhandlung, Riegel und
Wiesner, Nürnberg, 1822.
5. A. P. Guinand, Euler lines, tritangent centers, and their triangles, Amer. Math. Monthly 91 (1984) 290–
300. doi:10.2307/2322671
6. D. H. Lehmer, The complete root-squaring method, J. Soc. Indust. Appl. Math. 11 (1963) 705–717. doi:
10.1137/0111053
7. J. S. MacKay, History of the nine point circle, Proc. Edinb. Math. Soc. 11 (1892) 19–61. doi:10.1017/
S0013091500031163
8. B. Scimemi, Paper-folding and Euler’s theorem revisited, Forum Geom. 2 (2002) 93–104.
9. G. C. Smith, Statics and the moduli space of triangles, Forum Geom. 5 (2005) 181–190.
10. J. Stern, Euler’s triangle determination problem, Forum Geom. 7 (2007) 1–9; also available at http:
//forumgeom.fau.edu/FG2007volume7.
11. O. Terquem, Considérations sur le triangle rectiligne, Nouv. Ann. de Math. 1 (1842) 196–200; also avail-
able at http://www.numdam.org/item?id=NAM_1842_1_1__196_1.
12. A. Várilly, Location of incenters and Fermat points in variable triangles, Math. Mag. 74 (2001) 123–129.
doi:10.2307/2690626
13. P. Yiu, Conic solution of Euler’s triangle determination problem, J. Geom. Graph. 12 (2008) 75–80.

ALEX RYBA received his B.A. and Ph.D. from the University of Cambridge. His main area of interest is
finite group theory. After teaching at the University of Illinois at Chicago, the University of Michigan, and
Marquette University, he joined the faculty of Queens College CUNY in 1998. One of his first students at
Queens College was Joe Stern.
Department of Computer Science, Queens College, Flushing NY 11367
ryba@sylow.cs.qc.edu

JOE STERN is a Ph.D. student at Columbia University. After receiving the 1999 British Marshall Scholar-
ship, he joined the faculty of Stuyvesant High School, where he taught until 2009. He is a two-time recipient
of the MAA’s Edyth May Sliffe Award for Distinguished High School Teaching. Two of his last students at
Stuyvesant were Alex Ryba’s sons, Andrew and Nicholas.
Department of Mathematics, Columbia University, New York NY 10027
jstern@math.columbia.edu

228 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



The Sharkovsky Theorem:
A Natural Direct Proof
Keith Burns and Boris Hasselblatt

Abstract. We give a natural and direct proof of a famous result by Sharkovsky that gives a
complete description of possible sets of periods for interval maps. The new ingredient is the
use of Štefan sequences.

1. INTRODUCTION. In this note f is a continuous function from an interval into


itself. The interval need not be closed or bounded, although this is usually assumed
in the literature. The point of view of dynamical systems is to study iterations of f :
if f n denotes the n-fold composition of f with itself, then for a given point x one
investigates the sequence x, f (x), f 2 (x), f 3 (x), and so on. This sequence is called
the f-orbit of x, or just the orbit of x for short.
It is particularly interesting when this sequence repeats. In this case we say that x
is a periodic point, and we refer to the number of distinct points in the orbit or cycle
O := { f n (x) | n = 0, 1, . . . } as the period of x.1 Equivalently, the period of x is the
smallest positive integer m such that f m (x) = x.
A fixed point is a periodic point of period 1, that is, a point x such that f (x) = x.
A periodic point with period m is a fixed point of f m (and of f 2m , f 3m , . . . ). Thus, if
f n (x) = x, then the period of x is a factor of n.
If f has a periodic point of period m, then m is called a period for (or of ) f . Given
a continuous map of an interval one may ask what periods it can have. The genius of
Alexander Sharkovsky lay in realizing that there is a structure to the set of periods.

1.1. The Sharkovsky Theorem. The Sharkovsky Theorem involves the following
ordering of the set N of positive integers, which is now known as the Sharkovsky
ordering:
3 F 5 F 7 F · · · F 2 · 3 F 2 · 5 F 2 · 7 F · · · F 22 · 3 F 22 · 5 F 22 · 7 F · · ·
· · · F 23 F 22 F 2 F 1.
This is a total ordering; we write l F r or r G l whenever l is to the left of r .
It is crucial that the Sharkovsky ordering has the following doubling property:
l F r if and only if 2l F 2r. (1)
This is because the odd numbers greater than 1 appear at the left end of the list, the
number 1 appears at the right end, and the rest of N is included by successively dou-
bling these end pieces, and inserting these doubled strings inward:

odds, 2 · odds, 22 · odds, 23 · odds, . . . , 23 · 1, 22 · 1, 2 · 1, 1.

Sharkovsky showed that this ordering describes which numbers can be periods for a
continuous map of an interval.
doi:10.4169/amer.math.monthly.118.03.229
1 Dynamicistsusually refer to m as the least period.

March 2011] THE SHARKOVSKY THEOREM: A NATURAL DIRECT PROOF 229


Theorem 1.1 (Sharkovsky Forcing Theorem [14, 16]). If m is a period for f and
m F l, then l is also a period for f .

This shows that the set of periods of a continuous interval map is a tail of the Shar-
kovsky order. A tail is a set T ⊂ N such that s F t for all s ∈
/ T and all t ∈ T . There are
three types of tails: {m} ∪ {l ∈ N | l G m} for some m ∈ N, the set {. . . , 16, 8, 4, 2, 1}
of all powers of 2, and ∅.
The following complementary result is sometimes called the converse to the Shar-
kovsky Theorem, but is proved in Sharkovsky’s original papers.

Theorem 1.2 (Sharkovsky Realization Theorem [14, 16]). Every tail of the Shar-
kovsky order is the set of periods for some continuous map of an interval into itself.

The Sharkovsky Theorem is the union of Theorem 1.1 and Theorem 1.2: a subset
of N is the set of periods for a continuous map of an interval to itself if and only if the
set is a tail of the Sharkovsky order.
All proofs of the Sharkovsky Theorem that we know are elementary, no matter how
ingenious; the Intermediate-Value Theorem is the deepest ingredient. There is varia-
tion in the clarity of the proof strategy and its implementation. Our aim is to present,
with all details, a direct proof of the Forcing Theorem that is conceptually simple and
involves no artificial case distinctions. Indeed, its directness provides additional infor-
mation (Section 8). We also reproduce a proof of the Realization Theorem in Section
7 at the end of this note.
The standard proof of the Sharkovsky Forcing Theorem studies orbits of odd period
with the property that their period comes earlier in the Sharkovsky sequence than any
other period for that map. It shows that such an orbit is of a special type, known as
a Štefan cycle,2 and then that such a cycle forces the presence of periodic orbits with
Sharkovsky-lesser periods. The second stage of the proof considers various cases in
which the period that comes earliest in the Sharkovsky order is even. Finally, this
approach requires special treatment of the case in which the set of periods consists of
all powers of 2.
We extract the essence of the first stage of the standard proof to produce an argument
that does not need Štefan cycles, and we replace the second stage of the standard proof
by a simple and natural induction. Our main idea is to select a salient sequence of orbit
points and to prove that this sequence “spirals out” in essentially the same way as the
Štefan cycles considered in the standard proof.

1.2. History. A capsule history of the Sharkovsky Theorem is in [11], and [1] pro-
vides much context. The first result in this direction was obtained by Coppel [5] in the
1950s: every point converges to a fixed point under iteration of a continuous map of
a closed interval if the map has no periodic points of period 2; it is an easy corollary
that a continuous map must have 2 as a period if it has any periodic points that are not
fixed. This amounts to 2 being the penultimate number in the Sharkovsky ordering.
Sharkovsky obtained the results described above and reproved Coppel’s theorem in
a series of papers published in the 1960s [14, 16]. He also worked on other aspects
of one-dimensional dynamics (see, for instance, [13, 15, 17]). Sharkovsky appears to
have been unaware of Coppel’s paper. His work did not become known outside eastern
Europe until the second half of the 1970s. In 1975 this M ONTHLY published a famous
paper, “Period three implies chaos” [10] by Li and Yorke, which included the result that
the presence of a periodic point of period 3 implies the presence of periodic points of
2 “Š” is pronounced “Sh.”

230 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



all other periods.3 This amounts to 3 being the initial number in the Sharkovsky order.
Some time later Yorke attended a conference in East Berlin, and during a river cruise
a Ukrainian participant approached him. Although they had no language in common,
Sharkovsky (for it was he) managed to convey, with translation by Lasota and Mira,
that unbeknownst to Li and Yorke (and perhaps all of western mathematics) he had
proved his results about periodic points of interval mappings well before [10], even
though he did not at the time say what that result was.
Besides introducing the idea of chaos to a wide audience, Li and Yorke’s paper
was to lead to global recognition of Sharkovsky’s work. Within a few years of [10]
new proofs of the Sharkovsky Forcing Theorem appeared, one due to Štefan [18],
and a later one, which is now viewed as the “standard” proof, due to Block, Guck-
enheimer, Misiurewicz, and Young [3], Burkart [4], Ho and Morris [9], and Straffin
[19]. Nitecki’s paper [12] provides a lovely survey from that time. Alsedà, Llibre, and
Misiurewicz improved this standard proof [1] and also gave a beautiful proof of the
Realization Theorem, which we reproduce in Section 7.
The result has also been popular with contributors to the M ONTHLY. We mention
here a short proof of one step in the standard proof [2] and several papers by Du
[6, 8, 7]. Reading the papers by Du inspired the work that resulted in this article.

1.3. Related Work. There is a wealth of literature related to periodic points for one-
dimensional dynamical systems. [1] is a good source of pertinent information. There is
a characterization of the exact structure of a periodic orbit whose period comes earliest
in the Sharkovsky order for a specific map. There is also work on generalizations to
other permutation patterns (how particular types of periodic points force the presence
of others, and how intertwined periodic orbits do so), to different one-dimensional
spaces (that look like the letter “Y,” the letter “X,” or a star “∗”), and to multivalued
maps.

2. INTERVALS, COVERING RELATIONS, AND CYCLES.


f
Definition 2.1. We say that an interval I covers an interval J and write I −
→ J if
J ⊂ f (I ). We usually omit f and simply write I → J instead.

2.1. Coverings Produce Cycles. The Intermediate-Value Theorem allows us to trans-


late knowledge of how intervals are moved around into information about the presence
of periodic points. This is the content of the next three lemmas.
f
Lemma 2.2. If [a1 , a2 ] −
→ [a1 , a2 ], then f has a fixed point in [a1 , a2 ].

Proof. If b1 , b2 ∈ [a1 , a2 ] with f (bi ) = ai , then f (b1 ) − b1 ≤ 0 ≤ f (b2 ) − b2 . By the


Intermediate-Value Theorem, f (x) − x = 0 for some x between b1 and b2 .

Lemma 2.3 (Itinerary Lemma). If J0 , . . . , Jn−1 are closed bounded intervals and
f f f
J0 −
→ ··· − → Jn−1 − → J0 (this is called a loop or n-loop of intervals) then there is
a point x that follows the loop, that is, f i (x) ∈ Ji for 0 ≤ i < n and f n (x) = x.

Proof. We write I  J if f (I ) = J . If I → J , there is an interval K ⊂ I such that


K  J because the intersection of the graph of f with the rectangle I × J contains a
3 It should not be forgotten that Li and Yorke’s work contains more than a special case of Sharkovsky’s:

“chaos” is not just “points of all periods.”

March 2011] THE SHARKOVSKY THEOREM: A NATURAL DIRECT PROOF 231


minimal arc that joins the top and bottom sides of the rectangle. We can choose K to
be the projection to I of such an arc.

K I

Figure 1. Finding K  J .

Thus there is a closed bounded interval K n−1 ⊂ Jn−1 such that K n−1  J0 . Then
Jn−2 → K n−1 , and so there is K n−2 ⊂ Jn−2 such that K n−2  K n−1 . Inductively, there
are closed bounded intervals K i ⊂ Ji , 0 ≤ i < n, such that

K 0  K 1  · · ·  K n−1  J0 .

Any x ∈ K 0 satisfies f i (x) ∈ K i ⊂ Ji for 0 ≤ i < n and f n (x) ∈ J0 . Since K 0 ⊂


J0 = f n (K 0 ), Lemma 2.2 implies that f n has a fixed point in K 0 .

We wish to ensure that the period of the point x found in Lemma 2.3 is n and not a
proper divisor of n, such as for the 2-loop [−1, 0]  [0, 1] of f (x) = −2x, which is
followed only by the fixed point 0.

Definition 2.4. We say that a loop J0 → · · · → Jn−1 → J0 of intervals is elementary


if every point that follows it has period n.4

With this notion, the conclusion of Lemma 2.3 gives us:

Proposition 2.5. For an elementary loop J0 → · · · → Jn−1 → J0 there is a periodic


point with period n that follows the loop.

This makes it interesting to give convenient criteria for being elementary. The sim-
plest is that any loop of length 1 is elementary (since the period of a point that follows
such a loop must be a factor of 1). A criterion with wider utility is:

Lemma 2.6. A loop J0 → · · · → Jn−1 → J0 of intervals is elementary if it is not


followed by either endpoint S of J0 and the interior Int(J0 ) of J0 is disjoint from each of
n−1
J1 , . . . , Jn−1 , i.e., Int(J0 ) ∩ i=1 Ji = ∅.

Proof. If x follows the loop, then x ∈ Int(J0 ) because x ∈ J0 and it is not an endpoint.
If 0 < i < n then f i (x) ∈
/ Int(J0 ) because it is in Ji , and so x 6 = f i (x). Thus x has
period n.

2.2. Cycles Produce Coverings. A closed bounded interval whose endpoints belong
to a cycle O of f is called an O-interval.
4 This is a different use of the word “elementary” from the one in [1].

232 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



In the rest of the paper the above ideas will be applied to O-intervals. We will use
only information that can be obtained from the action of f on O and therefore applies
to all continuous maps f for which O is a cycle.
In particular, all of the covering relations I → J of O-intervals considered in the
rest of the paper are O-forced. By this we mean that J lies in the O-interval whose
endpoints are the leftmost and rightmost points of f (I ∩ O). By our standing assump-
tion that f is continuous and the Intermediate-Value Theorem, this implies I → J . We
say that a loop of O-intervals is O-forced if every arrow in it arises from an O-forced
covering relation.
Because in the remainder of the paper these are the only covering relations we will
f
→” and “→” will henceforth denote O-forced covering relations.
use, the symbols “−

3. EXAMPLES. The first example is the most celebrated special case of the Shar-
kovsky Theorem: that period 3 implies all periods. The second and third examples
apply the same method to longer cycles and illustrate how our choice of O-intervals
differs from that made in the standard proof. The last example illustrates our induction
argument, which is built on the doubling structure of the Sharkovsky order.

3.1. Period 3 Implies All Periods. A 3-cycle comes in two versions that are mirror
images of one another. In Figure 2, the dashed arrows indicate that x1 = f (x0 ), x2 =
f (x1 ), and x0 = f (x2 ). In both pictures, I1 is the O-interval with endpoints x0 and x1 ,
and I0 is the O-interval with endpoints x0 and x2 . The endpoints of I1 are mapped to
the very left and right points of the cycle, so we have the O-forced covering relations
I1 → I1 and I1 → I0 . The endpoints of I0 are mapped to those of I1 , and so I0 → I1
is O-forced. We summarize these covering relations by writing I1  I0 .

I0 I1 I1 I0

x2 x0 x1 x1 x0 x2
Figure 2. 3-cycles.

Since I1 → I1 , Lemma 2.2 implies that I1 contains a fixed point of f .


The endpoints of I1 cannot follow the cycle I1 → I0 → I1 because they are periodic
points with period 3, whereas a point that follows this cycle must have period 1 or 2.
By Lemma 2.6, f has a point with period 2.
No point of O, and hence no endpoint of I0 , has three consecutive iterates in the
interval I1 . Hence by Lemma 2.6 the loop

l − 1 copies of I1
z }| {
I0 → I1 → I1 → · · · → I1 → I0

is elementary if l > 3. Thus, f has a periodic point of period l for each l > 3.
This shows a special case of the Sharkovsky Theorem: the presence of a period-3
point causes every positive integer to be a period.

3.2. A 7-cycle. Consider a 7-cycle O and O-intervals as in Figure 3. Again, we write


xi = f i (x0 ) and I1 = [x0 , x1 ] and so on, as indicated. With this choice of intervals we
get the following O-forced covering relations:

March 2011] THE SHARKOVSKY THEOREM: A NATURAL DIRECT PROOF 233


I0 I5
I4
I3
I2
I1

x6 x4 x2 x0 x1 x3 x5

Figure 3. A 7-cycle.

(1) I1 → I1 and I0 → I1 ,
(2) I1 → I2 → I3 → I4 → I5 → I0 , and
(3) I0 → I5 , I3 , I1 .
This information can be summarized in a graph as follows:

I1 I2
I0 I3 (2)
I5 I4

From this graph we read off the following loops.


(4) I1 → I1 ,
(5) I0 → I5 → I0 ,
(6) I0 → I3 → I4 → I5 → I0 ,
(7) I0 → I1 → I2 → I3 → I4 → I5 → I0 ,
(8) I0 → I1 → I1 → · · · → I1 → I2 → I3 → I4 → I5 → I0 with 3 or more
copies of I1 .
I1 → I1 is elementary because it has length 1, and the remaining loops are elementary
by Lemma 2.6 because Int(I0 ) ∩ I j = ∅ if 1 ≤ j ≤ 5 and the loops cannot be followed
by an endpoint of I0 for reasons familiar from the previous example. The lengths of
these loops are 1, 2, 4, 6, and anything larger than 7, which proves that this cycle forces
every period l G 7.
The standard proof uses a different choice of O-intervals to study this example: the
interval Ii for each i with 2 ≤ i ≤ 5 is replaced by the interval between xi and xi−2 .
With this alternative choice one still obtains the covering relations (1)–(3), but our
choice of O-intervals adapts better to other situations such as that in the next example.

3.3. A 9-cycle. Figure 4 shows a 9-cycle O for which we chose O-intervals I0 , . . . , I5


such that Int(I0 ) ∩ I j = ∅ if 1 ≤ j ≤ 5 and the covering relations in the graph (2)
above are satisfied. The arguments in Subsection 3.2 apply word-for-word to show
that there are elementary loops, and hence periodic orbits, of length 1, 2, 4, 6, and
anything larger than 7.
The endpoints x0 , . . . , x6 of the intervals in Figure 4 spiral outwards from the “cen-
ter” c := (x0 + x1 )/2 like the corresponding points in Figure 3, but now they do not
constitute the entire cycle O and we do not have f (xi ) = xi+1 for every i.

234 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



I0 I5
I4
I3
I2
I1

x6 x4 x2 x0 x1 x3 x5

Figure 4. A 9-cycle.

The sequence x0 , . . . , x6 is chosen using the algorithm explained in Section 5. The


main idea in this algorithm is that one does not always choose xi+1 = f (xi ), but moves
inwards towards the center c if this will make f (xi+1 ) lie further from c. Figure 5
illustrates this with the graph of a simple function f that exhibits the cycle O.
Starting from a point (xi , f (xi )) on the graph of O one moves horizontally to the
diagonal, then vertically to the point ( f (xi ), f 2 (xi )) on the graph. Then, if possible,
one skips to a point on the graph of O that is closer to c in the horizontal direction and
further from c in the vertical direction; this point will be (xi+1 , f (xi+1 )). Such skips
happen in steps 2 and 3 of this example.
The process terminates when the sequence has spiralled out past a point (x6 here)
whose image under f is on the same side of c as the point itself.

x5 +

x3 +

x1 +
c +
x0 +

x2 +

x4 +

x6 +

+ + + + + + + + + +
x6 x4 x2 x0 c x1 x3 x5

Figure 5. The spiral of the points xi .

In the next section we abstract the properties of the endpoints of the intervals
I0 , I1 , . . . that are essential to the above argument.

3.4. A 6-cycle. Consider the 6-cycle in Figure 6. The salient feature here is that the
3 points in the left half are mapped to the 3 points in the right half and vice versa.
Therefore, the 3 points in the right half form a cycle • • • for the second iterate

March 2011] THE SHARKOVSKY THEOREM: A NATURAL DIRECT PROOF 235


I1 I1
I0 I0

x0 x1

Figure 6. A 6-cycle.

f2 f2
f 2 . As in Subsection 3.1 we have the covering relations I1 −→ I1 , I1 −→ I0 , and
f2
I0 −→ I1 for the intervals I0 and I1 shown in Figure 6. We can conclude as before that
f 2 has elementary loops of all lengths.
For f itself we choose two additional intervals I00 and I10 by taking I j0 to be the
shortest O-interval that contains f (I j ∩ O).
We now illustrate a recursive method we will use later: we show how to associate
with an elementary k-loop for f 2 an elementary 2k-loop for f itself. In the present
example this then tells us that every even number is a period.
f2
Consider an elementary k-loop for f 2 made using the covering relations I1 −→ I1 ,
f2 f2 f2 f f
→ I10 −
I1 −→ I0 , and I0 −→ I1 . Replace each occurrence of “I1 −→” by “I1 − →” and
f2 f f
each occurrence of “I0 −→” by “I0 − → I00 −→” and note that this produces a 2k-loop
for f that is not a k-loop traversed twice (which would cause difficulty with being
elementary). We show that it is elementary using the definition of elementary. Suppose
a point p follows the 2k-loop under f . We need to show that it has period 2k for f .
Observe that p follows the original elementary k-loop under f 2 and hence has period
k for f 2 . On the other hand, the iterates of p under f are alternately to the left and
the right of the middle interval (x0 , x1 ) since the 2k-loop for f alternates between
primed and unprimed intervals. Therefore, the orbit of p consists of 2k distinct points;
there are k even iterates on the right and k odd iterates on the left. This means that the
period of p for f is 2k. Since k was arbitrary, we infer that this 6-cycle forces all even
periods (as well as period 1 due to the interval [x0 , x1 ] in the center, which covers itself
under f ).
In the next 3 sections we prove the Sharkovsky Forcing Theorem 1.1. We first show
that the existence of a special sequence in an m-cycle O produces all desired cycles.
Next we construct such a sequence under a mild assumption on O. Finally we reduce
the general case to this latter one.

4. ŠTEFAN SEQUENCES PRODUCE CYCLES. Let m ≥ 2 and O an m-cycle of


a continuous map f on an interval.

Definition 4.1. Let p be the rightmost of those points in O for which f ( p) > p, and
q the point of O to the immediate right of p.
We define the center of O by c := ( p + q)/2. For x ∈ O we denote by Ox ⊂ O the
set of points of O in the closed interval bounded by x and c. That is, Ox = O ∩ [x, p]
when x ≤ p, and Ox = O ∩ [q, x] when x ≥ q.
We say that a point x ∈ O switches sides if c is between x and f (x).

From the examples in Section 3 we extract the following desirable properties of a


sequence of points of O.

236 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



Definition 4.2. A sequence x0 , . . . , xn of points in O is called a Štefan sequence if
(Š1) {x0 , x1 } = { p, q}.
(Š2) x0 , . . . , xn are on alternating sides of the center c and the sequences (x2 j ) and
(x2 j+1 ) are both strictly monotone (necessarily moving away from c).
(Š3) If 1 ≤ j ≤ n − 1, then x j switches sides and x j+1 ∈ O f (x j ) .
(Š4) xn does not switch sides.

Remark 4.3. The condition x j+1 ∈ O f (x j ) in (Š3) means that c < x j+1 ≤ f (x j ) if
x j < c and c > x j+1 ≥ f (x j ) if x j > c.
(Š2) implies that x0 , . . . , xn are pairwise distinct. Hence n + 1 ≤ m and so n < m.
Figure 2 and Figure 3 show Štefan sequences that happen to consist of the entire cycle;
we have n + 1 = m in these cases. Figure 4 provides an illustration in which a Štefan
sequence is a proper subset of the cycle and n + 1 < m.
(Š1) and (Š4) together imply that n ≥ 2 and hence m ≥ 3. Note that for m = 1 the
Sharkovsky Forcing Theorem is vacuously true and for m = 2 it is an application of
Lemma 2.2.

Proposition 4.4. Suppose that the m-cycle O has a Štefan sequence. If l G m, then
f has an O-forced elementary l-loop of O-intervals and hence a periodic point with
least period l.

Given a Štefan sequence x0 , . . . , xn we define the desired O-intervals I0 , . . . , In−1


as follows. For 1 ≤ j < n, we take I j to be the shortest interval that contains x0 , x1 ,
and x j , while I0 is defined to be the O-interval with endpoints xn and xn−2 . It follows
from (Š2) that Int(I0 ) ∩ I j = ∅ if 1 ≤ j < n.

Proposition 4.5. With I j chosen as above, we have the following O-forced covering
relations.
(1) I1 → I1 and I0 → I1 .
(2) I1 → I2 → · · · → In−1 → I0 .
(3) I0 → In−1 , In−3 , . . . .
They can be summarized in a graph as follows:

I1
I0 In−5

In−1 In−4
In−2 In−3

Proof. (1) We will, in fact, prove that I j → I1 for j = 0, . . . , n − 1. This amounts to


showing that f (I j ) contains x0 and x1 .
By (Š2) and (Š3) (or by (Š1) if j = 1) the endpoints of I j for j = 1, . . . , n − 1
are on opposite sides of c and both switch sides. The endpoints of I0 are on the same
side of c, but one switches sides and the other does not, by (Š4). In either case f (I j )

March 2011] THE SHARKOVSKY THEOREM: A NATURAL DIRECT PROOF 237


contains points of O on both sides of c and must therefore contain x0 and x1 by (Š1)
and the definition of c.
(2) It suffices to show for 1 ≤ j ≤ n − 1 that f (I j ) contains x0 , x1 , and x j+1 . We
have already seen that x0 and x1 are in f (I j ). Since f (x j ) is also in the interval f (I j ),
this implies that O f (x j ) ⊂ f (I j ). It follows from this and (Š3) that x j+1 ∈ f (I j ) as
well.
(3) It suffices to show that f (I0 ) contains x0 , x1 , and all of the points xn−1 , xn−3 , . . .
of O that are on the opposite side of c from xn . We have already seen that f (I0 )
contains x0 and x1 . But xn−2 is in I0 and it follows from (Š3) that f (xn−2 ) is at least as
far from c as xn−1 , which is further from c than xn−3 , xn−5 , . . . , by (Š2). Consequently
the points xn−1 , xn−3 , . . . lie in f (I0 ).

From the graph in Proposition 4.5 we read off the following loops:
(L1) I1 → I1 ;
(L2) I0 → In−(l−1) → In−(l−2) → · · · → In−2 → In−1 → I0 for even l ≤ n;
(L3) I0 → I1 → I1 → · · · → I1 → I2 → · · · → In−1 → I0 with r ≥ 1 repeti-
tions of I1 (and hence of length l = n − 1 + r ).

Proof of Proposition 4.4. If l G m then there are 3 cases.


If l = 1 we use that the loop (L1) has length 1 and is hence elementary.
If l ≤ n is even, (L2) provides a loop of length l.
If n ≤ l 6= m, then (L3) with r = l − n + 1 provides a loop of this length.
The fact that Int(I0 ) ∩ I j = ∅ if 1 ≤ j < n combined with Lemma 2.6 will tell us
that these loops are elementary once we show that they cannot be followed by a point
of O. This holds for the loops in (L2) because they have length l ≤ n < m (Remark
4.3) and for the loops in (L3) because either they have length l < m, or else we have
l > m and hence r = l − n + 1 ≥ m + 1 − n + 1 ≥ 3 repetitions of I1 .

5. CONSTRUCTING A ŠTEFAN SEQUENCE. The Sharkovsky Forcing Theo-


rem would be immediate from Proposition 4.4 if every cycle had a Štefan sequence.
However, the cycle in Figure 6 has no Štefan sequence because every point switches
sides. We now show that this is the only obstacle to finding a Štefan sequence.

Proposition 5.1. A cycle with more than one point contains a Štefan sequence unless
every point switches sides.

Proof. Let O be a cycle with m ≥ 2 points.


First we identify a set S ⊂ O, which contains the points of O that are candidates to
be nonfinal terms in a Štefan sequence. Let M be the maximal O-interval containing
[ p, q] such that all points of O that are in M switch sides; O ∩ M is thus the set
of all x ∈ O such that every point of Ox switches sides. The set S consists of those
x ∈ O ∩ M such that f maps x further from c than any other point in Ox . Equivalently,
x ∈ O ∩ M is in S if O f (w) ⊂ O f (x) for all w ∈ Ox . Note that p, q ∈ S.

... ... | ... ...


x c f (x)
Figure 7. x ∈ S.

238 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



We now define a map σ : S → O, which will take an element of a Štefan sequence
to its successor in the sequence. We always choose σ (x) ∈ O f (x) ; since x ∈ M this
ensures that x and σ (x) are on opposite sides of c.
/ M, we can take σ (x) to be any point of O f (x) that does not switch
(i) If f (x) ∈
/ S.
sides. In this case σ (x) ∈
(ii) If f (x) ∈ M, then σ (x) is the point of O f (x) that maps furthest from c, i.e.,

f (O f (x) ) ⊂ O f (σ (x)) .

By construction (see Figure 8, for example) we have σ (x) ∈ S in this case.

... ... | ... ...


f (x) σ (x) c f (σ ( x))
Figure 8. The successor map σ in case (ii).

We noted that x and σ (x) are on opposite sides of c, so σ 2 (x), if defined, is again on
the same side as x. It is crucial for obtaining the outward spiraling in (Š2) that σ 2 (x)
is further from c than x, i.e., that σ 2 (x) ∈
/ Ox .

Lemma 5.2. If there is an x ∈ S such that σ 2 (x) ∈ Ox , then all points of O switch
sides.

Proof. In order to have σ 2 (x) defined and in Ox , we must have x, y := σ (x), and
z := σ (y) = σ 2 (x) all in S. Moreover σ (x) and σ (y) are both obtained using case (ii)
in the definition of σ . Hence

f (O f (x) ) ⊂ O f (σ (x)) = O f (y)

and

f (O f (y) ) ⊂ O f (σ (y)) = O f (z) .

Since z = σ 2 (x) ∈ Ox and x ∈ S, we have

O f (z) ⊂ O f (x) .

Combining the above inclusions shows that O f (x) ∪ O f (y) is mapped into itself by
f . Since f is a cyclic permutation of O, the only nonempty f -invariant subset of O is
O itself. Thus O = O f (x) ∪ O f (y) . But all points of this set switch sides because x and
y are in S.

To conclude the proof of Proposition 5.1 we now suppose that there is a point of O
that does not switch sides and show that this implies the existence of a Štefan sequence.
The contrapositive of Lemma 5.2 implies that we cannot have both σ ( p) = q and
σ (q) = p. Therefore we can choose {x0 , x1 } = { p, q} in such a way that x2 := σ (x1 ) 6 =
x0 and then continue to choose xi+1 = σ (xi ) while xi ∈ S. We now verify that this
produces a Štefan sequence.
Our choice of {x0 , x1 } = { p, q} gives (Š1).

March 2011] THE SHARKOVSKY THEOREM: A NATURAL DIRECT PROOF 239


To check (Š2) note that successive terms lie on alternating sides of c because x and
σ (x) are on opposite sides of c. To check that the sequence spirals outward note first
that our choice of x0 and x1 ensures that x2 ∈ / Ox0 . Thereafter, Lemma 5.2 shows that
xi+2 = σ 2 (xi ) ∈
/ Oxi , i.e., xi+2 lies further from c than xi .
This implies in particular that the terms of the sequence are pairwise distinct. Since
they lie in the finite set O, the sequence terminates. We label the last term xn and note
that it necessarily arises from (i) in the definition of σ . Hence xn does not switch sides,
which implies (Š4).
To check (Š3) we note first that for j < n we have x j ∈ S ⊂ M, and x j therefore
switches sides. Finally, x j+1 = σ (x j ) ∈ O f (x j ) by the definition of σ .

Proposition 5.1 and Proposition 4.4 give the following main case of the Sharkovsky
Theorem.

Proposition 5.3. If an m-cycle O with m ≥ 2 contains a point that does not switch
sides, then for each l G m there is an elementary, O-forced l-loop of O-intervals, and
hence an l-cycle.

6. PROOF OF THE SHARKOVSKY FORCING THEOREM. To prove the Shar-


kovsky Forcing Theorem it remains to reduce the case of a cycle in which all points
switch sides to the main case of Proposition 5.3. We use the fact that the left and right
halves of such a cycle are cycles for f 2 of half the length.

Proposition 6.1. An m-cycle O has an O-forced elementary l-loop of O-intervals for


each l G m.

By Proposition 2.5, this implies the Sharkovsky Forcing Theorem 1.1.

Proof. We proceed by induction on m. Proposition 6.1 is vacuously true for m = 1


since there is no l G 1.
Suppose now that Proposition 6.1 is known for all cycles of length less than m. Let
O be an m-cycle. If there is a point that does not switch sides, then the conclusion of
Proposition 6.1 follows by Proposition 5.3.
Otherwise, all points switch sides. Write L := min O and R := max O. Then O L
(see Definition 4.1) contains the points in O to the left of c, O R contains those to the
right of c, and f swaps these sets: f  O L is a bijection from O L to O R and f  O R is
a bijection from O R to O L , so O L and O R have the same number of points, and m is
even.
Since m is even, it follows from the doubling property (1) that l G m if and only if
l = 1 or l = 2k with k G m/2. Therefore we need to show that f has an elementary
1-loop as well as an elementary O-forced 2k-loop of O-intervals for each k G m/2.
As the elementary 1-loop we can take the middle O-interval [ p, q], since p =
max O L and q = min O R .
To find the required 2k-loops, we use the inductive assumption and the fact that
O L and O R are cycles of length m/2 for the second iterate f 2 . Proposition 6.1 can be
applied to either of these cycles. Using O R , we find that f 2 has an elementary O R -
forced k-loop of O R -intervals for each k G m/2. The induction will be complete once
we show that these give rise to elementary 2k-loops for f itself.
To that end, consider an elementary k-loop

f2 f2 f2 f2 f2
I0 −→ I1 −→ I2 −→ · · · −→ Ik−1 −→ I0 (3)

240 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



of O R -intervals for f 2 . For later convenience we set Ik := I0 . Let Ii0 be the shortest
closed interval that contains f (Ii ∩ O) ⊂ O L . These are O-intervals and by construc-
f
tion we have the O-forced covering relation Ii − → Ii0 for each i, 0 ≤ i < k. The re-
mainder of the proof consists of showing that this produces an O-forced 2k-loop
f f f f f f f
→ I00 −
I0 − → I10 −
→ I1 − → Ik−1 −
→ ··· − 0
→ Ik−1 −
→ I0 (4)

for f that is elementary.


To see that this is an O-forced loop we show that we also have the covering rela-
f f2
tions Ii0 −
→ Ii+1 and that they are O-forced. Since Ii −→ Ii+1 and this covering is O R -
forced, there are points ai , bi ∈ Ii ∩ O R such that the closed interval between f 2 (ai )
and f 2 (bi ) contains Ii+1 . But then ai0 := f (ai ) and bi0 := f (bi ) are in Ii0 ∩ O and the
closed interval between f (ai0 ) = f 2 (ai ) and f (bi0 ) = f 2 (bi ) contains Ii+1 , as required.
It remains to show that the loop in (4) is elementary. Consider a periodic point x
for f that follows the loop (4). It is a periodic point for f 2 that follows the elementary
loop (3) and hence has period k with respect to f 2 . Therefore k points of its f -orbit
(the even iterates) lie in O R . Since the intervals in the loop (4) are alternately to the
right and the left of the center, so are the iterates of x under f . Therefore another k
points (the odd iterates) lie in O L , and the orbit has length 2k. Hence x has period 2k
with respect to f , and the loop (4) is elementary.

7. THE SHARKOVSKY REALIZATION THEOREM. An elegant proof of the


Sharkovsky Realization Theorem 1.2 is given in [1]. It reveals one Sharkovsky tail at
a time as one increases h in the family of truncated tent maps

Th : [0, 1] → [0, 1], x 7 → min(h, 1 − 2|x − 1/2|) for 0 ≤ h ≤ 1.

This family has several key properties.

h
Th

Figure 9. Truncated tent maps.

(a) T1 has only one periodic point (the fixed point 0) while the tent map T1 has
a 3-cycle {2/7, 4/7, 6/7} and hence has all natural numbers as periods by the
Sharkovsky Forcing Theorem 1.1.
(b) Any cycle O ⊂ [0, h) of Th is a cycle for T1 , and any cycle O ⊂ [0, h] of T1 is
a cycle for Th .
What makes the proof so elegant is that h plays three roles: as a parameter, as the maxi-
mum value of Th , and as a point of an orbit. The key idea is to let h(m) := min{max O |
O is an m-cycle of T1 } for m ∈ N. (We can write “min” instead of “inf” because T1 has
a finite number of periodic points for each period.5 ) From this and (b) we obtain:
5 Inspection of the graph of T1n shows that it has exactly 2n fixed points.

March 2011] THE SHARKOVSKY THEOREM: A NATURAL DIRECT PROOF 241


(c) Th has an l-cycle O ⊂ [0, h) if and only if h(l) < h.
(d) The orbit of h(m) is an m-cycle for Th(m) , and all other cycles for Th(m) lie in
[0, h(m)).
From (d) and the Sharkovsky Forcing Theorem 1.1 we see that if l G m, then Th(m) has
an l-cycle that lies in [0, h(m)); it follows from (c) that h(l) < h(m). By symmetry,
(e) h(l) < h(m) if and only if l G m.
We see from (c), (d), and (e) that for any m ∈ N the set of periods of Th(m) is the tail of
the Sharkovsky order consisting of m and all l G m.
The set of all powers of 2 is the only other tail of the Sharkovsky order (besides
∅, which is the set of periods of the translation x 7 → x + 1 on R). We have h(2∞ ) :=
supk h(2k ) > h(2k ) by (e) for all k ∈ N, so Th(2∞ ) has 2k -cycles for all k by (c). Suppose
Th(2∞ ) has an m-cycle with m not a power of 2. By Theorem 1.1 Th(2∞ ) also has a 2m-
cycle. Since the m-cycle and the 2m-cycle are disjoint, at least one of them is contained
in [0, h(2∞ )) and hence in [0, h(2k )) for some k ∈ N, contrary to (c) and (e).

8. CONCLUSION. It may be of interest to note that the proof given here provides
more information than the statement of the Sharkovsky Forcing Theorem 1.1. When in
the proof of Proposition 4.4 we treated the loops in (L3) on page 238 we only needed
to know that n ≤ l 6 = m. Therefore Proposition 4.4 can be amplified to the following:

Proposition 8.1. If an m-cycle O contains a Štefan sequence x0 , . . . , xn , then O forces


periods l = 1 (from (L1)), l ≥ n (from (L3)), and even l ≤ n (from (L2)).

This includes periods that precede m in the Sharkovsky order.


An extreme instance is given by a cycle in which the point q chosen at the beginning
of Proposition 5.1 is max O and f (q) = min O, i.e., a cycle of the form • · · · • •. The
3 points shown here constitute a Štefan sequence with n = 2, which forces period 3
and hence all periods.
Another way in which additional information can be extracted by keeping track of
patterns arises in connection with cycles whose length is 2k for some k. If such a cycle
O contains a point that does not switch sides, then there is a Štefan sequence with
n < m − 1, and Proposition 8.1 shows that O forces all periods l ≥ n, in particular for
some odd such l, and hence there are periods that are not powers of 2. Morover, if all
points of O do switch sides, the reduction in the proof of Proposition 6.1 yields a cycle
of length 2k−1 for f 2 to which one can apply the previous reasoning: it either forces
a period that is not a power of 2 or all its points switch sides. In the latter case one
can again reduce a step. If this keeps happening until one has reduced to period 2 for
k−1
f 2 , then we say that O is simple, and we have observed that if a continuous map has
only powers of 2 as periods, then all cycles must be simple.
In other words, if there is a cycle of length 2k for any k > 1 that is not simple, then
it forces a period that is not a power of 2.
These observations illustrate that our method can make use of more information
than just the period of the cycle from which one starts; this differs from the standard
proof, which begins by discarding the initial orbit. Like our proof, refinements of Shar-
kovsky’s Theorem systematically take into account “patterns” instead of just periods.
The definition of a Štefan sequence implies that if n = m − 1, there will be only
one point of O, namely xm−1 , that does not switch sides. The point xm−1 must be either
the leftmost or rightmost point of O and the sequence x0 , x1 , . . . must spiral outwards

242 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



clockwise or counterclockwise as shown:

x m−1 ... x4 x2 x0 x1 x3 ... x m−2 or x m−2 ... x3 x1 x0 x2 x4 ... x m−1 .

Furthermore we must have f (xi ) = xi+1 for 0 ≤ i < m − 1. These orbits are called
Štefan cycles. They are central to the standard proof of the Sharkovsky Theorem. Our
proof is more direct because we do not need these cycles, but they inspired our defini-
tion of Štefan sequences.

ACKNOWLEDGMENTS. We thank the Instituto Superior Técnico of the Universidade Técnica de Lisboa
and the Mittag-Leffler Institute of the Royal Swedish Academy of Sciences for their hospitality and support.
We are also grateful to Wah Kwan Ku for pointing out an error in a draft of this paper, Aaron Brown and
Stephen Sherman for expository suggestions, and the referee and Dan Velleman for extensive corrections and
improvements. Keith Burns was partially supported by NSF grant DMS-0701140.

REFERENCES

1. L. Alsedà, J. Llibre, and M. Misiurewicz, Combinatorial Dynamics and Entropy in Dimension One, 2nd
ed., Advanced Series in Nonlinear Dynamics, vol. 5, World Scientific, River Edge, NJ, 2000.
2. R. Barton and K. Burns, A simple special case of Sharkovskii’s theorem, Amer. Math. Monthly 107 (2000)
932–933. doi:10.2307/2695586
3. L. Block, J. Guckenheimer, M. Misiurewicz, and L.-S. Young, Periodic points and topological entropy
of one-dimensional maps, in Global Theory of Dynamical Systems, Z. Nitecki and C. Robinson, eds.,
Lecture Notes in Mathematics, vol. 819, Springer Verlag, Berlin, 1980, 18–34.
4. U. Burkart, Interval mapping graphs and periodic points of continuous functions, J. Combin. Theory Ser.
B 32 (1982) 57–68. doi:10.1016/0095-8956(82)90076-4
5. W. A. Coppel, The solution of equations by iteration, Proc. Cambridge Philos. Soc. 51 (1955) 41–43.
doi:10.1017/S030500410002990X
6. B.-S. Du, A simple proof of Sharkovsky’s theorem, Amer. Math. Monthly 111 (2004) 595–599. doi:
10.2307/4145161
7. , A simple proof of Sharkovsky’s theorem revisited, Amer. Math. Monthly 114 (2007) 152–155.
8. , A collection of simple proofs of Sharkovsky’s theorem (2007), available at http://arxiv.
org/abs/math/0703592.
9. C. W. Ho and C. Morris, A graph-theoretic proof of Sharkovsky’s theorem on the periodic points of
continuous functions, Pacific J. Math. 96 (1981) 361–370.
10. T.-Y. Li and J. A. Yorke, Period three implies chaos, Amer. Math. Monthly 82 (1975) 985–992. doi:
10.2307/2318254
11. M. Misiurewicz, Remarks on Sharkovsky’s Theorem, Amer. Math. Monthly 104 (1997) 846–847. doi:
10.2307/2975290
12. Z. Nitecki, Topological dynamics on the interval, in Ergodic Theory and Dynamical Systems II–College
Park, MD 1979–1980, Progr. Math., vol. 21, Birkhäuser, Boston, 1982, 1–73.
13. A. N. Sharkovskiı̆, The reducibility of a continuous function of a real variable and the structure of the
stationary points of the corresponding iteration process, Dokl. Akad. Nauk RSR 139 (1961) 1067–1070.
14. , Coexistence of cycles of a continuous map of the line into itself, Ukrain. Mat. Zh. 16 (1964)
61–71; trans. J. Tolosa, Proceedings of “Thiry Years after Sharkovskiı̆’s Theorem: New Perspectives”
(Murcia, Spain 1994), Internat. J. Bifur. Chaos Appl. Sci. Engrg. 5 (1995) 1263–1273.
15. , Fixed points and the center of a continuous mapping of the line into itself, Dopovidi Akad. Nauk
Ukr. RSR 1964 (1964) 865–868.
16. , On cycles and structure of a continuous mapping, Ukrain. Mat. Zh. 17 (1965) 104–111. doi:
10.1007/BF02527365
17. , The set of convergence of one-dimensional iterations, Dopovidi Akad. Nauk Ukr. RSR 1966
(1966) 866–870.
18. P. Štefan, A theorem of Šarkovskii on the existence of periodic orbits of continuous endomorphisms of
the real line, Comm. Math. Phys. 54 (1977) 237–248. doi:10.1007/BF01614086
19. P. D. Straffin Jr., Periodic points of continuous functions, Math. Mag. 51 (1978) 99–105. doi:10.2307/
2690145

March 2011] THE SHARKOVSKY THEOREM: A NATURAL DIRECT PROOF 243


KEITH BURNS is a Professor of Mathematics at Northwestern University. He obtained his Ph.D. from War-
wick University (UK) and works on geometrical and dynamical problems connected with geodesics in mani-
folds with nonpositive curvature. When not doing mathematics he likes to run marathons.
Department of Mathematics, Northwestern University, Evanston, IL 60208
burns@math.northwestern.edu

BORIS HASSELBLATT is Professor and Chair of Mathematics at Tufts University. He obtained a Vordiplom
in Physics from the Technische Universität Berlin in 1981, an M.A. in mathematics from the University of
Maryland in College Park in 1984, and a Ph.D. in mathematics from the California Institute of Technology in
1989. He studies mainly hyperbolic dynamical systems, often of types that are geometrically motivated. When
he is not doing mathematics he enjoys singing. He dedicates this article to all those who do not dedicate their
articles to themselves.
Department of Mathematics, Tufts University, Medford, MA 02155
Boris.Hasselblatt@tufts.edu

Polanyi on Mathematics

“All these difficulties are but consequences of our refusal to see that mathemat-
ics cannot be defined without acknowledging its most obvious feature: namely,
that it is interesting. Nowhere is intellectual beauty so deeply felt and fastidi-
ously appreciated in its various grades and qualities as in mathematics, and only
the informal appreciation of mathematical value can distinguish what is math-
ematics from a welter of formally similar, yet altogether trivial statements and
operations.”

Michael Polanyi, Personal Knowledge: Towards a Post-Critical Philosophy,


University of Chicago Press, Chicago, 1958, p. 200

244 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



A First Look at Differential Algebra
John H. Hubbard and Benjamin E. Lundell

Abstract. This article is an introduction to the common algebraic methods used to study both
solutions to polynomial equations and solutions to differential equations: Galois theory and
differential Galois theory. We develop both theories simultaneously by studying the solutions
to the polynomial equation x 5 − 4x 2 − 2 = 0 and the solutions to the differential equation
u0 = t − u2.

1. INTRODUCTION. The object of this paper is to prove that the differential equa-
tion

u0 = t − u2 (1)

has no solutions which can be written using elementary functions, or antiderivatives


of elementary functions, or exponentials of such antiderivatives, or antiderivatives of
those, etc. We should note that equation (1) can be solved using power series, integrals
which depend on a parameter, or Bessel functions of order 1/3. However, as we will
see, none of these methods of solution are “algebraic” in nature.
We aim to give a precise definition of “algebraic” by developing the theory of differ-
ential algebra, which is largely the work of Ritt. Other contributors include Liouville,
Picard, Vessiot, Kolchin, and Rosenlicht. The part of differential Galois theory which
we will require is remarkably analogous to the part of Galois theory which leads to a
proof of Abel’s celebrated result that a general polynomial equation of degree five or
higher cannot be solved by radicals. In an effort to derive these two areas in parallel,
we will also explain why the polynomial equation

x 5 − 4x 2 − 2 = 0 (2)

has no solutions which can be written as radicals of solutions to lower degree polyno-
mial equations.
The paper is written with a reader in mind who at some point studied Galois theory:
either very recently and is therefore not an expert, or long ago and has since forgotten
many of the finer points. The examples are chosen to illuminate the theorems: it is
scarcely possible to give all the details of every proof in this article. For further reading,
we recommend [4], [5], [7], [8], [9], or [10]. For the basics of Galois theory, differential
equations, or algebraic groups, see [1], [2], or [3], respectively.

2. SPLITTING FIELDS. Our first step will be to determine where solutions to equa-
tions (1) and (2) lie. Recall that a field is a set in which an addition, subtraction, multi-
plication, and division are defined, and that these operations satisfy the rules which one
expects from elementary arithmetic. Three standard examples are the rational numbers
Q, the real numbers R, and the complex numbers C. All fields in this paper will have
characteristic 0.
For the case of polynomials, we now have all of the background we need.
doi:10.4169/amer.math.monthly.118.03.245

March 2011] A FIRST LOOK AT DIFFERENTIAL ALGEBRA 245


Definition 2.1. Given a polynomial f with coefficients in a field F ⊂ C, the splitting
field of f over F, denoted E f , is the smallest subfield of C which contains F and all
of the roots of f .

The reason we can be sure that all solutions to a polynomial f lie in some subfield
of C is the fundamental theorem of algebra, which says that any degree-n polynomial
with coefficients in C has n (not necessarily distinct) roots in C.

Example 2.2. Consider the polynomial f (x) = x 2 − 2. Then the field


√ √
E f = Q( 2) = {a + b 2 : a, b ∈ Q}

is the splitting field of f over Q. Why?


First, this is a field. One can obviously add, subtract, and multiply numbers of this
form and obtain another number of this form. Division is possible since

1 a−b 2
√ = 2 ,
a+b 2 a − 2b2

and a 2 − 2b2 = 0 implies that a = b = 0 (since √ 2 is irrational).

Second, it does contain both roots of f : ± 2 ∈ Q( 2). Third, it is obviously the
smallest field which contains these roots.

Remark. The splitting field need not be thought of as a subfield of C. We need only
to fix an algebraic closure of Q and we could work in there (as any algebraic closure
contains the roots any polynomial, by definition). We find it easier to think of subfields
of C, but that is just a crutch.
In fact, the results on the Galois theory of polynomials that we collect in this section
and the next are true in a much more general setting: if F is any characteristic-zero
field, we will have the same results for a polynomial f with coefficients in F, provided
that we fix an algebraic closure of F to begin with. This level of generality is not
appropriate in these introductory sections, but will be necessary in Proposition 4.6.

To deal with differential equations rather than polynomial equations, one must con-
sider fields with a bit more structure.

Definition 2.3. A differential field, here called a D-field, is a field F, together with a
derivation δ : F → F which satisfies the rules

δ(u + v) = δ(u) + δ(v) and δ(uv) = uδ(v) + vδ(u).

A linear differential operator of degree k on (F, δ) is a map L : F → F given by a


formula

L(u) = δ k (u) + αk−1 δ k−1 (u) + · · · + α0 u

where δ k (u) = δ(δ(. . . (u) . . . )) and αi ∈ F.

The first example of a D-field is C(t), the set of all rational functions in one variable
with complex coefficients, with the usual addition and multiplication, and with the
derivation given by the ordinary derivative. The standard rules for differentiating say

246 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



that the derivative of a rational function is again a rational function. Another example is
the field M(U ) of meromorphic functions on an open subset U ⊂ C, that is, quotients
u/v of analytic functions with v not identically zero. For our purposes, the field C(t)
will be the smallest field of interest (analogous to Q for polynomials), and M(U ) the
largest (like C). The reason we can use M(U ) as our “big field” is the existence and
uniqueness theorem for differential equations: if U is a simply connected subset of C
and α1 , . . . , αk are analytic on U , then the differential equation

L(u) = u (k) + α1 u (k−1) + · · · + αk u = 0

has a unique solution in M(U ) for any t0 in U with any given initial conditions
u(t0 ), u 0 (t0 ), . . . , u (k−1) (t0 ).

Definition 2.4. If (F, δ) is a differential field, then the constants of F are the elements
u ∈ F such that δ(u) = 0.

In this paper, the field of constants will always be C. We now have the background
needed to define where the solutions to a differential equation lie.

Definition 2.5. Let L(u) = δ n (u) + αn−1 δ n−1 (u) + · · · + α0 u be a differential oper-
ator where the αi ∈ F ⊂ M(U ) are all analytic in some simply connected open set
U ⊂ C. Then the differential splitting field of L over F in M(U ), denoted E LU , is the
smallest D-subfield of M(U ) containing F and all solutions of L(u) = 0 on U .

We will suppress U from the notation and write E L for E LU when no confusion can
arise.

Example 2.6. One of the simplest differential operators is L(u) = u 0 − u. In this case
the only coefficients are the numbers ±1, which are certainly analytic on all of C, so
we may consider the splitting field over C(t) to be the smallest D-subfield of M(C)
containing the rational functions and the solutions of the differential equation u 0 =
u, that is, all functions of the form Cet . It should be clear that this subfield, E L , is
precisely the space of functions of the form

p0 (t) + p1 (t)et + · · · + pm (t)emt


q0 (t) + q1 (t)et + · · · + qn (t)ent

where the pi and q j are polynomials with coefficients in C, with the denominator
not identically zero. Indeed, this is a differential field (clearly closed under addition,
multiplication, division, and differentiation), and more or less obviously the smallest
such field containing the constants and et . One√should think of this splitting field as a
close analog of the numbers of the form a + b 2.

3. GALOIS GROUPS. We now investigate the structure of splitting fields. In par-


ticular, we try to understand what happens to solutions of a polynomial or differential
equation under a field automorphism. We begin with the case of polynomials.

Definition 3.1. Let F ⊂ C be a field and suppose K /F is any field extension. The
Galois group, denoted Gal(K /F), is the group of all field automorphisms of K which
leave F fixed, where the group law is given by composition of automorphisms. If f is
a polynomial with coefficients in F, then we call Gal(E f /F) the Galois group of f .

March 2011] A FIRST LOOK AT DIFFERENTIAL ALGEBRA 247


Fix a polynomial f with coefficients in F, and let σ ∈ Gal(E f /F). Since σ fixes F
and respects field operations (both by definition), we see that f (σ (a)) = σ ( f (a)) for
any a ∈ E f . In particular, if a is a root of f , i.e., f (a) = 0, we see that

0 = f (a) = σ ( f (a)) = f (σ (a)).

We conclude that elements of the Galois group of f permute the roots of f . Conse-
quently, if we denote the set of roots of f by R f , then there is a group homomorphism

Gal(E f /F) → Perm(R f )

which can easily be seen to be an injection, so Gal(E f /F) is naturally isomorphic to a


subgroup of the finite group of permutations of the roots. From now on, we will always
identify the Galois group of a polynomial with a subgroup of this permutation group.
In fact, we can say more. Write f = g · h, where g is an irreducible, nonconstant
polynomial with coefficients in F, and suppose g(a) = 0 for a ∈ C. First, we note that
we necessarily have f (a) = g(a)h(a) = 0, so that a ∈ E f . Let σ ∈ Gal(E f /F). By
the same reasoning as before

0 = g(a) = σ (g(a)) = g(σ (a)),

so that not only does Gal(E f /F) permute the roots of f , it permutes the roots of the
irreducible factors of f . Because of this, we can focus solely on the case where f
itself is irreducible for the remainder of the paper.
2
Example 3.2. If f (x) = x√ −2√as in Example 2.2, then the group Gal(E f /Q) is the
group of permutations of { 2, − 2}.

Example 3.3. Let f (x) = x 5 − 1. Then the set of roots is the set with the five elements
ωk = e2kπi/5 , with k = 0, . . . , 4. Clearly the Galois group is not the whole group of
permutations; no automorphism can map 1 to anything else. This is a particular case
of the following general statement: the Galois group acts transitively on the roots of a
polynomial f (i.e., given any two roots a1 , a2 of f there exists σ ∈ Gal(E f /Q) such
that σ (a1 ) = a2 ) if and only if f is irreducible.
In our case, x 5 − 1 = (x 4 + x 3 + x 2 + x + 1)(x − 1), and the roots of the two fac-
tors cannot get mixed up. How about the other roots? It is not quite obvious,1 but ω1 can
be mapped to any other root ωk , k = 1, 2, 3, 4 by an automorphism σ ∈ Gal(E f /Q).
Knowing σ (ω1 ) completely determines σ , since

ωk = ω1k , so σ (ωk ) = σ (ω1 )k .

Once you see that, it is not hard to see that the Galois group is the multiplicative group
of Z/5Z, which is a cyclic group of order 4. The same argument shows that the Galois
group of the polynomial x p − 1 is the multiplicative group of the field Z/ pZ for any
prime p.

Definition 3.4. An extension K /F of a field F is called a Galois extension if the set


of elements of K which are fixed by all the elements of Gal(K /F) is precisely F; only
then is the Galois group really useful.2
1 This is the content of the theorem by Gauss that cyclotomic polynomials are irreducible.
2 When L/K is not a Galois extension, the right thing to consider is the set of embeddings K ⊂ C which
are the identity on F. We will not pursue this here.

248 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



The following theorem gives us all finite Galois extensions of a field F of charac-
teristic zero.

Theorem 3.5. If f is an irreducible polynomial with coefficients in a field F ⊂ C,


then the field extension E f /F is a Galois extension. In fact, every finite Galois exten-
sion of F arises as the splitting field of some irreducible polynomial with coefficients
in F. Finally, in this setting, |Gal(E f /F)| = [E f : F].

Proof. See, for instance, Section 14.1 of [1].

In Examples 3.2 and 3.3 we saw examples of Galois extensions. We now look at an
extension which is not a Galois extension.

Example 3.6. Consider the field F of real numbers of the form

a + b21/3 + c41/3

for any rational numbers a, b, c. In this case, Gal(F/Q) = {1} since any element of
the group must send 21/3 to a cubic root of 2, and there are no other such roots in F.
Thus, F/Q is not a Galois extension.

Theorem 3.7 (The Fundamental Theorem of Galois Theory). Let K /F be a finite


Galois extension. If M is a field and F ⊂ M ⊂ K , then Gal(K /M) is a subgroup
of Gal(K /F). Furthermore, this defines an inclusion-reversing bijection between the
subfields of K containing F and the subgroups of Gal(K /F). Under this bijection,
the normal subgroups correspond to the subfields M such that M/F is also a Galois
extension, and, in this case, Gal(M/F) ' Gal(K /F)/ Gal(K /M).

Proof. See Section 14.2 of [1].

Example 3.8. Consider the polynomial f (x) = x 3 − 2. This has three roots, and the
Galois group Gal(E f /Q) is the full group of permutations of these roots3 and so has
order six. The splitting field, E f , contains the ratios of the roots, which are cubic roots
of unity. If we set E g to be the splitting field of g(x) = x 2 + x + 1, then Gal(E f /E g )
is cyclic of order three. Therefore, Gal(E f /E g ) E Gal(E f /Q) since index-two sub-
groups are always normal. According to Theorem 3.7, then, E g /Q is a Galois exten-
sion with Gal(E g /Q) ' Gal(E f /Q)/ Gal(E f /E g ).
More specifically, set ω = e2πi/3 . Then Gal(E f /Q) is generated by complex con-
jugation and the unique field automorphism σ which maps 21/3 7 → ω · 21/3 , whereas
Gal(E f /E g ) is generated by just σ , since complex conjugation is not the identity on
E g . This last statement shows that Gal(E g /Q) is isomorphic to the quotient group,
which is generated by the coset containing complex conjugation, and so is of order
two.
Finally, we consider the order-two subgroup of Gal(E f /Q) generated by complex
conjugation. The field corresponding to this subgroup under the bijection in Theorem
3.7 is precisely the field considered in Example 3.6. There, we saw that this field was
not a Galois extension of Q. This corresponds to the fact that no subgroup of two
elements is normal in Gal(E f /Q).
3 In particular, the real cubic root of 2 cannot be algebraically distinguished from the other roots.

March 2011] A FIRST LOOK AT DIFFERENTIAL ALGEBRA 249


4. DIFFERENTIAL GALOIS GROUPS. In this section, we will develop an anal-
ogous Galois theory for differential field extensions. Throughout this section, we will
assume that all differential fields contain C(t) and lie inside some field M(U ) for
U ⊂ C a simply connect open set.

Definition 4.1. Let (K , ) be a differential extension of the differential field (F, δ).
The differential Galois Group, DGal(K /F), is the group of field automorphisms
σ : K → K which restrict to the identity on F and satisfy σ ((u)) = (σ (u)) for all
u ∈ K , and the group law is given by composition of automorphisms. If L is a linear
differential operator with coefficients in F, then we call DGal(E L /F) the differential
Galois group of L.

Just as in the case of polynomials, if we fix a linear differential operator L of degree


k with coefficients in F, then the elements of DGal(E L /F) will map one solution of
L(u) = 0 to another. The proof is identical to that in the polynomial case. The space
VL of solutions of L(u) = 0 in M(U ) has dimension k as a vector space over C
(since we must specify k initial conditions to be guaranteed a unique solution). As
we just mentioned, elements of the D-Galois group permute the solutions of L and
are F-linear (and hence necessarily C-linear). We conclude that the D-Galois group
DGal(E L /K ) is naturally isomorphic to a subgroup of GL(VL ). Thus, if you choose a
basis of solutions of L(u) = 0, you can think of DGal(E L /F) as a group of invertible
complex k × k matrices. As with polynomials, we will always make this identification.

Example 4.2. Let L be the operator given by L(u) = u 0 − u; in Example 2.6, the
splitting field of L was determined. Any automorphism of E L must send one solution
of u 0 = u to another, so in particular, it must send et to Cet for some nonzero complex
number C. Moreover, C completely determines the D-Galois automorphism. Conse-
quently, the D-Galois group is C∗ = GL1 (C), the multiplicative group of the complex
numbers.

Theorem 4.3. The differential Galois group DGal(E L /K ) of a linear differential op-
erator L is an algebraic subgroup of GL(VL ); that is, it is a subset defined by finitely
many polynomial equations.

Proof. See Section 17 of [4].

Let us see what this says for a few examples. The additive group C has lots of sub-
groups, isomorphic to Z, Z ⊕ Z, R, etc., but none of them are algebraic. For instance,
Z is defined by the equation sin π z = 0, but f (z) = sin(π z) is not a polynomial (nor
is the function f (z) = z − z̄, which vanishes exactly on R). The group C∗ also has
lots of subgroups, but only those consisting of the nth roots of unity for some n are
algebraic (obviously defined by the single equation z n − 1 = 0).
The next theorem is a strong result on algebraic groups, and will be necessary for
the proof of Theorem 8.1.

Theorem 4.4. Let V be a finite-dimensional vector space over C, and G ⊂ GL(V ) be


an algebraic subgroup. Then the connected component G 0 ⊂ G of G containing the
identity is a normal subgroup, which is also algebraic, and the quotient group G/G 0
is finite.

Proof. See Chapter II, Section 7.3 of [3].

250 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



Our next example shows that a D-Galois group can perfectly well be finite.

Example 4.5. Let U ⊂ C be the open unit disk and consider the linear differential
operator

t
L(u) = u 0 + u,
1 − t2

whose coefficients are analytic on U . Let w be an analytic branch of 1 − t 2 on U ,
for instance the one which is positive on (−1, 1). Then L(Cw) = 0 for any C ∈ C,
√ splitting field E L ⊂ M(U ) of L over C(t) is the set of functions of the form
and the
u + v 1 − t 2 with u, v ∈ C(t).
Since L is a first-order operator, we see that DGal(E L /C(t)) is a subgroup of
GL1 (C) = C∗ . Let σ ∈ DGal(E L /C(t)), so that σ (w) = Cw for some C ∈ C. Then,
σ (w)2 = C 2 w 2 = C 2 (1 − t 2 ). But also, σ (w)2 = σ (w 2 ) = σ (1 − t 2 ) = 1 − t 2 , since
σ fixes 1 − t 2 ∈ C(t). Therefore, C 2 = 1. Thus, DGal(E L /C(t)) is the group√of two
elements:
√ the identity automorphism and the automorphism which exchanges 1 − t
2

and − 1 − t . 2

This illustrates the following fact: even if a linear differential operator is irreducible,
in the sense that it is not the composition of two linear differential operators of lower
degree, the differential Galois group of the splitting field may well not act transitively
on the nonzero solutions, which may have an “individuality” of their own.

The situation in this example is completely general.

Proposition 4.6. Let L be a linear differential operator with coefficients in some D-


field F ⊂ M(U ), where U ⊂ C is a simply connected open set. If E L /F is a Galois
extension and DGal(E L /F) is finite, then all the elements of E L are algebraic over F.

Before we prove this proposition, we should explain what we mean by “E L /F is a


Galois extension.” By this, we mean that E L is the splitting field of some irreducible
polynomial with coefficients in the D-field F (contained in some algebraic closure of
F). In the setting of Example 4.5, we had E L = E f where f (x) = x 2 − (1 − t 2 ) ∈
C(t)[x]. As we mentioned in the remark following Example 2.2, the results on the
Galois theory of polynomials we developed in Sections 2 and 3 carry over to this more
general setting.

Proof of Proposition 4.6 Choose v ∈ E L , and consider the polynomial


Y
f := (x − σ (v)).
σ ∈DGal(E L /F)

The coefficients of this polynomial are fixed under DGal(E L /F), hence in F by The-
orem 4.8 below. Thus, f is a polynomial with coefficients in F which has v as a root,
so that v is algebraic over F, as desired.

The previous proposition connects splitting fields of linear differential operators to


splitting fields of polynomials. One might ask, then, are Galois D-extensions the right
extensions to study in the D-field setting? The answer is no. However, we would still
like an analogue of Galois extensions for differential fields.

March 2011] A FIRST LOOK AT DIFFERENTIAL ALGEBRA 251


Definition 4.7. An extension K /F of differential fields is called a Picard-Vessiot ex-
tension if the set of elements of K which are fixed by all the elements of DGal(K /F)
is precisely F.
Picard-Vessiot extensions play the role in differential Galois theory that Galois ex-
tensions play in polynomial Galois theory. In support of this, we offer the following
analogues of Theorems 3.5 and 3.7 for differential fields.
Theorem 4.8. If L is a linear differential operator with coefficients in a differential
field F ⊂ M(U ), then the field extension E L /F is a Picard-Vessiot extension. More-
over, every Picard-Vessiot extension of F arises as the splitting field of some linear
differential operator with coefficients in F.
Proof. See the section in [8] on Picard-Vessiot extensions.
Theorem 4.9 (The Fundamental Theorem of Differential Galois Theory). Let
K /F be a Picard-Vessoit extension. If M is a D-field and F ⊂ M ⊂ K , then
DGal(K /M) is an algebraic subgroup of DGal(K /F). Furthermore, this defines
an inclusion-reversing bijection between the D-subfields of K containing F and the
algebraic subgroups of DGal(K /F). Under this bijection, the normal subgroups cor-
respond to the D-subfields M such that M/F is also a Picard-Vessoit extension, and,
in this case, DGal(M/F) ' DGal(K /F)/ DGal(K /M).
Proof. See Sections 15–19 of [4].
5. THE DISCRIMINANT AND THE WRONSKIAN. The resemblance between
Galois theory and D-Galois theory is quite striking, but the correspondence between
the discriminant of a polynomial and the Wronskian of a linear differential operator is
positively uncanny.
Definition 5.1. If f is an irreducible polynomial with coefficients in some field F,
and E f is its splitting field, containing roots x1 , . . . , xd , then the discriminant of f is4
Y
1( f ) = ± (xi − x j ),
i6 = j

where the sign is + if and only if the number of factors is divisible by four.
A priori, this looks like an element of E f , but it is clearly fixed by Gal(E f /F),
Q an element of F by Theorem 3.5. In E f , the discriminant 1( f ) is
and is therefore
the square of i< j (xi − x j ) (that is what the sign was for), but it is not necessarily
a square in√F. If it is not a square, there is an intermediate field between F and E f ,
namely F( 1( f )). It is fairly easy to understand the relation between the various
Galois groups.

Proposition 5.2. We have Gal(E f /F( 1( f ))) = Gal(E f /F) ∩ Alt(R f ), where
Alt(R f ) ⊂ Perm(R f ) is the subgroup of even permutations. In particular, the discrimi-
nant is a square in F precisely when Gal(E f /F) consists entirely of even permutations
of the roots of f .
Proof. An even permutation σ can be written as a product of an evenQ number of trans-
positions, and hence it will not change the sign of (and hence fixes) i< j (xi − x j ).
4 For the sake of comparison with the Wronskian, it might be better to define the discriminant in terms of

the resultant of f , but we avoid that here because it is longer and more technical.

252 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



Example 5.3. Consider the polynomial f (x) = x 3 − 2; one can compute that
1(x 3 − 2) = −108. In Example √ 3.8, we saw that Gal(E f /Q) = S3 . Proposition
5.2 then shows that Gal(E f /Q( 1( f ))) = A3 . Of course, we had√ seen this already

in Example 3.8: −108 = −1 · 22 · 33 so that Q( 1( f )) = Q( −3) = E g , where
g(x) = x 2 + x + 1 is the irreducible polynomial giving the cubic root of unity.

Let U be a simply connected open subset of C and F a differential subfield of


M(U ). The Wronskian of a differential operator L with coefficients in F is best un-
derstood by making the differential equation L(u) = 0 into a system of first-order
equations as follows.
If L(u) = u (n) + αn−1 u (n−1) + · · · + α0 u, then we can represent the equation
L(u) = 0 as the matrix equation A L u = u0 , where A L , u, and u0 are the n × n,
n × 1, and n × 1 matrices
0 u u
     0 
 ..  u  0
u 00 
. In−1  , u :=  .  , and u0 :=  .  .
 
A L := 
 0   ..   .. 
−α0 −α1 . . . −αn−1 u (n−1) u (n)

If u 1 , . . . , u n are n linearly independent solutions to L(u) = 0, we define a new


n × n-matrix W L whose ith column is ui for i = 1, . . . , n. Then W L satisfies the n × n-
matrix differential equation W 0 = A L W .

Definition 5.4. The Wronskian of the differential operator L is the function Wr L =


det(W L ) of the complex variable t. As the matrix W L depends on the choice of the
basis of solutions to L(u) = 0, the Wronskian Wr L is only defined up to a nonzero
complex constant.

Unfortunately, it would seem from the definition that one would have to know a
basis for the space of solutions to L(u) = 0 to compute the Wronskian. That, in gen-
eral, could be very difficult to come by. However, the following proposition gives us a
method of computing the Wronskian just by considering the matrix A L .

Proposition 5.5. The Wronskian Wr L satisfies the differential equation

u 0 = Tr(A L )u.

Hence, we can write


Z t 
Wr L (t) = Wr L (t0 ) exp Tr(A L (s))ds ,
t0

where t0 is any point in U and we integrate along any path from t0 to t in U . In


particular, the Wronskian is always contained in a differential extension obtained by
adjoining the exponential of an antiderivative.

Proof. To see the first fact, we use the Jacobi identity for invertible n × n matrices:

(det W )0 = Tr(W 0 · Adj W ),

where Adj W is the unique n × n invertible matrix such that

W · Adj W = (Adj W ) · W = det(W )In .

March 2011] A FIRST LOOK AT DIFFERENTIAL ALGEBRA 253


The first part of the proposition now follows immediately since

Wr0L = Tr(W L0 · Adj W L )


= Tr(A L W L Adj W L ), since W L0 = A L W L ,
= Tr(A L · det(W L )In ), since W L · Adj W L = det(W L )In ,
= Tr(A L · Wr L In )
= Tr(A L ) Wr L .

Suppose now that the Wronskian is 0 at t0 ∈ U . Then there is some linear combination
w of u 1 , . . . , u n such that w(t0 ) = w 0 (t0 ) = · · · = w (n−1) (t0 ) = 0. However, L(w) =
0, so by the uniqueness theorem for solutions to L(u) = 0, we must have the w is the
constant function 0. This violates the linear independence of the solutions u 1 , . . . , u n ,
so we may conclude that Wr L is nonvanishing on U . Thus, we may divide to get

Wr0L
= Tr A L
Wr L
as functions on the simply connected set U . Thus, if we pick any t0 ∈ U , the integral
Z t Z t
Wr0L (s)
Tr A L (s) ds = ds
t0 t0 Wr L (s)

= log Wr L (t) − log Wr L (t0 )

is independent of the path chosen from t0 to t and the second part of the proposition
follows.

Again, if the Wronskian is not in the original D-field, this gives an intermediate D-
field extension F ⊂ F(Wr L ) ⊂ E L , and it is not too difficult to understand the effect
on the D-Galois groups.

Proposition 5.6. After identifying DGal(E L /F) and GL(VL ), we have

DGal(E L /F(Wr L )) = DGal(E L /F) ∩ SL(VL ),

where SL(VL ) ⊂ GL(VL ) is the subgroup of automorphisms of determinant one. In


particular, if the Wronskian is in F, then the D-Galois group is contained in SL(VL ).

Proof. Let σ ∈ DGal(E L /F). Then, as discussed above, σ permutes the solutions of
L(u) = 0. As W L is determined by these solutions, we see that σ acts on W L by acting
on its entries. We will denote this action by σ (W L ). Since σ is a field homomorphism,
and as det W L is a sum of products of elements of F, we can conclude σ (Wr L ) =
σ (det W L ) = det(σ (W L )).
By identifying DGal(E L /F) and GL(VL ), we can find a matrix S ∈ GL(VL ) such
that σ (W L ) = W L S.5 Therefore,

σ (Wr L ) = det(σ W L ) = det(W L S) = det(W L ) det(S) = det(S) Wr L .

In particular, σ fixes Wr L if and only if det S = 1, as claimed.


5 This is not obvious, but it can be checked by writing down matrices with respect to the basis {u , . . . , u }.
1 n
Also, notice that this means that the matrix corresponding to σ acts by right multiplication.

254 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



We illustrate these ideas with an example which is crucial to the proof of Theorem
8.1.
Example 5.7. Consider the Airy differential operator L A (u) = u 00 − tu, and let v be a
solution to L(u) = 0. Then
0 1
 
AL A = ,
t 0
and, if L A (v) = 0, then
0 1 v
  
AL A v =
t 0 v0
v
 0
=
tv
v
 0
= , since L A (v) = 0,
v 00
= v0 .
We can now use Proposition R
5.5 to
 compute the Wronskian Wr L A . Since Tr A L A = 0,
we have that Wr L A = C exp 0 = C, for some nonzero complex number C. Since
C ∈ C(t), we see that DGal(E L A ) is a subgroup of SL2 (C) by Proposition 5.6; deter-
mining precisely which subgroup is the content of Theorem 8.1.
6. RADICAL EXTENSIONS AND SOLVABLE GALOIS GROUPS. Recall a
major result in your high school algebra class: the quadratic formula. Presuming that
you worked mainly in characteristic zero back then, this simple formula allowed you to
find the roots of any quadratic polynomial you were given. Not surprisingly, a square
root appeared in the solution. This basic setting provides all the intuition needed to
continue.
Definition 6.1. Suppose that you can find all the roots of an irreducible polynomial f
with coefficients in a field F by the following procedure:
(i) Choose some element a1 ∈ F and consider the splitting field F1 of x d1 − a1 .
(ii) Choose an an element a2 ∈ F1 , and set F2 to be the splitting field of x d2 − a2 .
(iii) Continue in this way until you have a field Fi which has all the roots of f .
Then, we say that f is solvable by radicals.
More complicated (and probably not covered in high school) is the formula for
cubics: first, to solve the equation x 3 + ax 2 + bx + c = 0, substitute y = x − a3 to
convert the problem to
a2 2a 3 ab
y 3 + py + q = 0, where p = b − and q = − + c.
3 27 3
Then we find
q 1/3
4 p3

−q ± q2 + 27 p
y=  − 1/3 .
2
 r
p3
−q± q 2 + 427
3 2

March 2011] A FIRST LOOK AT DIFFERENTIAL ALGEBRA 255


Note that this is a typical radical extension: first we adjoin a square root, q 2 + 4 p 3 /27,
p

then a cube root of an element of the field generated by the first extension.

Definition 6.2. A group G is said to be solvable if there is a chain of subgroups,

{1} = G n ⊂ G n−1 ⊂ · · · ⊂ G 0 ⊂ G −1 = G,

such that each G j is a normal subgroup of G j−1 and the quotient groups G j−1 /G j are
all abelian.

Standard examples of solvable finite groups are the symmetric groups S3 and S4 ,
the latter via the chain

{1} ⊂ V ⊂ A4 ⊂ S4 ,

where V is the Klein four-group and A4 is the alternating group. The groups Sn and
An are not solvable, however, for n ≥ 5.
The next proposition shows that the similarity in naming is no coincidence.

Proposition 6.3. An irreducible polynomial f with coefficients in F is solvable by


radicals if and only if Gal(E f /F) is a solvable group.

7. LIOUVILLIAN EXTENSIONS AND SOLVABLE DIFFERENTIAL GA-


LOIS GROUPS. Of course, we can consider radical extensions of a differential
field, but they are not the right analog of radical extensions in the context of differ-
ential fields. There, the “simple” extensions of a D-field F are those obtained by
considering the antiderivative V of an element v ∈ F, and considering the smallest
D-field containing F and either V or e V . We will also consider all finite algebraic
extensions of F as elementary.

Definition 7.1. A Liouvillian extension K of a D-field F is one such that there is a


sequence

F = F0 ⊂ F1 ⊂ F2 ⊂ · · · ⊂ Fn = K

such that each field Fi+1 is either finite algebraic over Fi , or generated by an antideriva-
tive or exponential of an antiderivative of an element of Fi .

Notice that if you are thinking of all these fields as subfields of M(U ) for an ap-
propriate U , then it may be necessary to restrict to some U1 ⊂ U : if v has a pole in U
with nonzero residue, then there will not be an antiderivative of v defined on all of U ,
but there will be one on any simply connected subset of U which avoids the poles of v.
Consider the extension Fi /Fi−1 ; there are three possibilities for DGal(Fi /Fi−1 ) de-
pending on how the extension is generated:
(i) If Fi /Fi−1 is generated by a finite algebraic element, then DGal(Fi /Fi−1 ) is
finite.
(ii) If Fi /Fi−1 is generated by an antiderivative of an element α ∈ Fi−1 , then we
can think of Fi as the splitting field of the linear operator L(u) = u 0 − α. In
particular, the solution we use to generate Fi is defined only up to addition of a
constant in C. Since DGal(Fi /Fi−1 ) must permute the solutions of L(u) = 0,
we see that DGal(Fi /Fi−1 ) ' C (since C has no algebraic subgroups).

256 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



(iii) If Fi /Fi−1 is generated by the exponential of an antiderivative of an element
α ∈ Fi−1 , then we can think of Fi as the splitting field of the linear operator
L(u) = u 0 − αu. In particular, the solution we use to generate Fi is defined
only up to multiplication by a constant in C∗ , and so DGal(Fi /Fi−1 ) ' C∗ or
DGal(Fi /Fi−1 ) is cyclic of order n (corresponding to the algebraic subgroup
of the nth roots of unity) and Fi /Fi−1 is an algebraic extension by Proposition
4.6.

Example 7.2. Let F ⊂ M(U ), where U ⊂ C is some simply connected open set.
Parts (ii) and (iii) above show that if L is a first-order linear operator defined over
F, then E L /F is a Liouvillian extension. What can we say about second-order linear
operators? The answer requires us to think more about first-order operators.
Suppose L(u) = u 0 + αu where α ∈ F. Let β ∈ E L and let v be a solution to the
nonhomogeneous equation L(u) = β. Solving for v first requires us to R multiply both
sides of the equation v 0 + αv = β by the integrating factor w := exp α . Note that

w is contained in a Liouvillian extension of F.
This multiplication transforms our equation into (vw)0 = wβ. Integrating and di-
viding gives

1
Z
v= wβ;
w
in particular, v is contained in a Liouvillian extension of F since w and β are.
We now have all we need to consider a second-order linear operator L(u) = u 00 +
α1 u 0 + α0 u with α0 , α1 ∈ F. Suppose v satisfies L(v) = 0 and let K ⊂ E L be the
smallest D-subfield which contains v and F. It need not be the case the K /F is a
Liouvillian extension. However, we can show that E L /K is a Liouvillian extension.
Let w be another, linearly independent, solution to L(w) = 0; then, up to a scalar
multiple, we have

v w
 
Wr L = det = vw0 − wv 0 .
v 0 w0

This shows that w satisfies the first-order nonhomogeneous differential equation

v0 W rL
w0 − w= ,
v v
where W L /v is contained in a Liouvillian extension of K . From what we saw above,
this shows that E L /K is a Liouvillian extension. It is worth noting that if K /F is a
Liouvillian extension, then E L /F will be a Liouvillian extension as well.

Proposition 7.3. Any elementary function is contained in some Liouvillian extension


of C(t).

Proof. The difficulty


√ is that the
 definition of a D-field never mentioned compositions,
like esin t or log 1 − t 2 + 1 . But such compositions are contained in Liouvillian ex-
tensions. Indeed, any composition will be of the form eu , log u, or sin u. Trigonometric
functions are dealt with using Euler’s formulas

eit + e−it eit − e−it


cos t = and sin t = .
2 2i

March 2011] A FIRST LOOK AT DIFFERENTIAL ALGEBRA 257


The exponentials were explicitly included in the definition of Liouvillian extensions,
and the logarithm is the antiderivative of u 0 /u.

The following proposition says that Liouvillian extensions are the analogs of radical
extensions.

Proposition 7.4. Let L be a linear differential operator with coefficients in a D-field


F, and let G = DGal(E L /F). The D-splitting field E L is (contained in) a Liouvillian
extension if and only if G 0 , the connected component of the identity in G, is solvable.

Proof. See Sections 25, 26, and 27 of [4].

8. SOLUTIONS OF EQUATIONS (1) AND (2). We now restate our goals in the
new language we have developed over the previous sections:
(i) The polynomial f (x) = x 5 − 4x − 2 is not solvable by radicals.
(ii) No solution of the differential equation u 0 = t − u 2 is contained in a Liouvillian
extension.
We begin by showing (i). Recall that a polynomial is solvable by radicals only if
the Galois group of its splitting field is a solvable group; that is, it suffices to show that
Gal(E f /Q) is not solvable.
Since f has degree five, there are five roots defined over C. The Galois group
Gal(E f /Q) permutes these roots and is thus (naturally isomorphic to) a subgroup of
S5 . If a ∈ C is such that f (a) = 0, then, since f is irreducible, the field Q(a) is an ex-
tension of Q of degree five. Since Q(a) ⊂ E f , the degree of E f /Q must be divisible by
five. By Theorem 3.5, the order of Gal(E f /Q) is divisible by five as well. By Cauchy’s
theorem, there is therefore an element of order five (or 5-cycle) in Gal(E f /Q).
Note that f (−2) < 0 < f (−1) and f (0) < 0 < f (2), so that f has real roots
a, b, c satisfying

−2 < a < −1 < b < 0 < c < 2.

By considering derivatives of f , one sees that these are the only real roots. Thus, there
are two complex conjugate roots of f .
Now consider the action of complex conjugation on the field E f . It is certainly an
automorphism, and it fixes Q ⊂ R. It is therefore an element of Gal(E f /Q) ⊂ S5 .
We just concluded that three of the roots of f are real, and thus fixed by complex
conjugation, and that the two remaining roots are swapped. We can therefore conclude
that Gal(E f /Q) contains a 2-cycle as well.
That is all we need, however. It is a fact from elementary group theory that a 2-
cycle and a 5-cycle generate all of S5 . We conclude that Gal(E f /Q) = S5 , and f is
not solvable by radicals.
We now proceed to our second goal. Up to this point, we have considered only
linear differential operators and their solutions. In light of this, our results so far would
seem to have little bearing on the solutions to the nonlinear equation u 0 = t − u 2 .
However, there is still hope: recall the Airy differential operator

L A (u) = u 00 − tu

from Example 5.7. It is certainly linear, and, since the function t has no poles, the
splitting field E L A is a subfield of the meromorphic functions on C, M(C).

258 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



Let v ∈ E L A be any function such that L A (v) = 0, and let w = v 0 /v be the loga-
rithmic derivative of v (this is necessarily in the field E L A ). Then differentiation shows
that

v · v 00 − [v 0 ]2
w0 =
v2
t · v 2 − [v 0 ]2
= , since v 00 − tv = 0 by assumption,
v2
= t − w2 ;

that is, w is a solution to u 0 = t − u 2 ! Moreover, we can reverse


R this
 process to see that
if w is any function such that w 0 = t − w2 , then v = exp w satisfies L A (v) = 0.
Thus, we have a way of relating solutions of u 0 = t − u 2 to solutions of the linear
differential equation L A (u) = 0.

Theorem 8.1. We have that G = DGal(E L A /C(t)) = SL2 (C).

Proof. In Example 5.7, we found that G ⊂ SL2 (C). To prove equality, let G 0 be the
connected component of the identity. Then G/G 0 is finite by Theorem 4.4.
Since SL2 (C) is 3-dimensional, there are very few proper connected subgroups. In
particular, they are all conjugate to one of the following four:
1 0
 
(i) ,
0 1

a 0
  
(ii) :a∈C , ∗
0 1/a

1 b
  
(iiii) : b ∈ C , or
0 1

a b
  
(iv) : a ∈ C ,b ∈ C .

0 1/a
Note that for each of these groups, (1 0)t is a common eigenvector. Thus, if G 0
were a proper subgroup of SL2 (C), then all elements of G 0 would have a common
eigenvector v ∈ E L A which satisfies L A (v) = 0.
Since differentiation commutes with the action of G on E L A , we then see that w =
v 0 /v is left fixed by G 0 . Thus, the Picard-Vessiot extension generated by w is a D-
subfield, M say, of E L A and G 0 ⊂ DGal(E L /M) (because G 0 fixes w). In particular,
we can apply the Fundamental Theorem of Differential Galois Theory (Theorem 4.9)
so that DGal(M/C(t)) ' G/ DGal(E L /M) is a quotient of G by a group containing
G 0 ; hence, DGal(M/C(t)) is finite, so that w is an algebraic function by Proposition
4.6 (and so has finitely many poles).
However, as we saw above, w0 = t − w 2 since w is the logarithmic derivative of the
solution v of L A (u) = 0. For any number t0 < −1 − π/2, the solution w with w(t0 ) =
0 has domain of definition (a, b) with t0 − π/2 < a < b < t0 + π/2, since the solution
is above tan(t + t0 ) for t > t0 and beneath tan(t + t0 ) for t < t0 (see Figure 1). Thus,
w has at least as many poles as tan(t), which has infinitely many poles. Hence w is not
algebraic and our guess that G 0 6 = SL2 (C) is false.

March 2011] A FIRST LOOK AT DIFFERENTIAL ALGEBRA 259


y
7

x
–7 7

–7
Figure 1. Solutions to w0 (t) = t − w 2 .

Corollary 8.2. No nonzero solution of the Airy equation belongs to a Liouvillian ex-
tension of C(t).

Proof. By Proposition 7.4, if one nonzero solution (and hence all solutions by Ex-
ample 7.2) of the Airy equation belonged to a Liouvillian extension, then G 0 =
SL2 (C) would be solvable. Since solvability is preserved by quotients (this is an
exercise using the definition and the isomorphism theorems for groups), we would
also have that PSL2 (C) = SL2 (C)/{± Id} would be solvable. By Theorem 8.4 of [6],
PSL2 (C) is simple. Thus, PSL2 (C) is solvable only if it is abelian. One can easily
check that this is not the case.

We have (finally!) reached our second goal:

Corollary 8.3. The differential equation u 0 = t − u 2 has no solutions which belong to


a Liouvillian extension of C(t).

Proof. Suppose v is such a solution. Then exp v(t) dt is contained in a Liouvil-


R 
lian extension of C(t) and satisfies the Airy equation. This contradicts the previous
corollary.

REFERENCES

1. D. Dummit and R. Foote, Abstract Algebra, 3rd ed., John Wiley, Hoboken, NJ, 2004.
2. J. Hubbard and B. West, Differential Equations: A Dynamical Systems Approach, Texts in Applied Math-
ematics, vol. 5, Springer-Verlag, New York, 1995; corrected reprint of the 1991 edition.
3. J. Humphreys, Linear Algebraic Groups, Graduate Texts in Mathematics, vol. 21, Springer-Verlag, New
York, 1975.
4. E. Kolchin, Algebraic matric groups and the Picard-Vessiot theory of homogeneous linear ordinary dif-
ferential equations, Ann. of Math. (2) 49 (1948) 1–42. doi:10.2307/1969111
5. , Differential Algebra and Algebraic Groups, Pure and Applied Mathematics, vol. 54, Academic
Press, New York, 1973.
6. S. Lang, Algebra, 3rd ed., Graduate Texts in Mathematics, vol. 211, Springer-Verlag, New York, 2002.
7. A. Magid, Lectures on Differential Galois Theory, University Lecture Series, vol. 7, American Mathe-
matical Society, Providence, RI, 1994.
8. , Differential Galois theory, Notices Amer. Math. Soc. 46 (1999) 1041–1049.

260 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



9. J. Ritt, Differential Algebra, American Mathematical Society Colloquium Publications, vol. XXXIII,
American Mathematical Society, New York, 1950.
10. M. van der Put and M. Singer, Galois Theory of Linear Differential Equations, Grundlehren der Math-
ematischen Wissenschaften (Fundamental Principles of Mathematical Sciences), vol. 328, Springer-
Verlag, Berlin, 2003.

JOHN H. HUBBARD received his undergraduate degree from Harvard University and his doctorate from the
Université de Paris-Sud. He is currently a professor at Cornell University and the Université de Provence. His
research interests lie in differential equations and complex dynamics.
Department of Mathematics, Cornell University, Ithaca, NY 14850
jhh8@cornell.edu

BENJAMIN E. LUNDELL received his undergraduate degree from the University of Illinois at Urbana-
Champaign and a Certificate of Advanced Study in Mathematics from Cambridge University. He is currently
a doctoral candidate at Cornell University studying number theory and arithmetic geometry.
Department of Mathematics, Cornell University, Ithaca, NY, 14850
blundell@math.cornell.edu

March 2011] A FIRST LOOK AT DIFFERENTIAL ALGEBRA 261


NOTES
Edited by Ed Scheinerman

Lines of Best Fit for the Zeros and for the


Critical Points of a Polynomial
Grant Keady

Abstract. Combining results presented in two papers in this M ONTHLY yields the following
elementary result. Any line of best fit for the zeros of a polynomial is a line of best fit for its
critical points.

This note gives a generalization of results on cubic polynomials presented in [1]. Our
notation will follow that paper. A line of best fit for a set of points in the plane is
defined, as in [1, p. 682], to be a line that minimizes the sum of squares of the per-
pendicular distances from the points to the line. (Sometimes, elsewhere, such a line is
called a “least-squares perpendicular-offsets” line.)
Let {z j | 1 ≤ j ≤ n} be a set of n ≥ 2 complex numbers. Define
n n
1X
(z j − z A )2 .
X
zA = zj and Z= (1)
n j=1 j=1

The generalization of [1, Theorem 2.4] is as follows.

Theorem 1. Let {z j | 1 ≤ j ≤ n} be a set of n ≥ 3 complex numbers. Let z A and Z


be defined as in equation (1). Let p(z) be the monic polynomial of degree n
n−1 n
p(z) = z n + ajz j =
X Y
(z − z j ),
j=0 j=1

and denote the roots of its derivative by z k0 , so that p 0 (z) = n n−1 0


k=1 (z − z k ). The lines
Q
of best fit for the zeros z j of p coincide with the lines of best fit for the zeros z k0 of p 0 ,
and these lines pass through z A .

The zeros z k0 of p 0 are also called the critical points of p.


When n = 3 and Z = 0 the points z j form the vertices of an equilateral triangle.
When n = 3 and Z 6 = 0, Theorem 1 becomes [1, Theorem 2.4], attributed there to
Coolidge. (Closely related theorems are due to Grace, Bôcher, and Siebeck.)

Theorem 2 (Coolidge, n = 3). Suppose the complex numbers z 1 , z 2 , z 3 form the ver-
tices of a triangle which is nonequilateral. If p(z) = 3j=1 (z − z j ), then the unique
Q
line of best fit for the three numbers is the line through the roots of p 0 (z).

The first ingredient in the proof of Theorem 1 is the following.


doi:10.4169/amer.math.monthly.118.03.262

262 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



Theorem 3 ([1, Theorem 2.3]). Suppose z j , 1 ≤ j ≤ n, are complex numbers, and
z A and Z are as in equation (1) above.
(a) If Z = 0, then every line through z A is a line of best fit for the points z 1 , . . . , z n ,
and these are the only lines of best fit.

(b) If Z 6= 0, then the line through z A that is parallel to the vector from 0 to Z is
the unique line of best fit for z 1 , . . . , z n .

The next ingredient inPthe proof is taken from [2]. (See also Newton’s identities.)
First note that an−1 = − nj=1 z j = −nz A . Now suppose, as in [2], that z A = 0: there is
no loss of generality in this translation of the points of the complex plane. Then an−1 =
0. (It also follows that the coefficient of z n−2 in p 0 (z) is also zero, so (n − 1)z A0 =
Pn−1 0 Pn
k=1 z k = 0.) Squaring the equation j=1 z j = 0, we find that

n
z 2j = −2
X X
Z= z j z k = −2an−2 .
j=1 1≤ j<k≤n

This leads to a simple relationship between Z and Z 0 :

Theorem 4 ([2, equation (1.9)]). Denote the zeros of the polynomials p and p 0 as
above, and suppose that their centroids are located at the origin. Then
n−1 n
0
X
0 2 (n − 2) X 2 (n − 2)
Z = (z k ) = zj = Z. (2)
k=1
n j=1
n

Proof. Using z A = 0,
n−2
p0 X ja j
= z n−1 + z j−1 ,
n j=1
n

and it follows that


2(n − 2) (n − 2)
Z0 = − an−2 = Z,
n n
as stated.

Finally, the proof of Theorem 1 proceeds as follows. There is no loss of generality


in translating the points of the complex plane so that z A = 0. All lines of best fit, those
for the zeros and those for the critical
√ points, necessarily pass through√ the origin, and,
by Theorem 3, are in directions Z associated with √ the zeros
√ and Z 0 associated

with the critical points. However, by Theorem 4, Z = (n − 2)/n Z , and this
0

completes the proof.


Theorem 1 can be applied repeatedly. The lines of best fit for the zeros of any
derivative p (k) , 1 ≤ k ≤ (n − 2), coincide with the lines of best fit for the zeros of the
original polynomial. If the quadratic p (n−2) obtained by (n − 2) differentiations of the
original polynomial has distinct roots, the line through these roots is the line of best fit
for the original polynomial. The line(s) of best fit for the zeros of p is (are) completely
determined by the coefficients an−1 and an−2 .

March 2011] NOTES 263


While it seems highly likely that our Theorem 1 is a rediscovery, to date a search
of the reasonably-accessible English-language publications has not found any prior
publication. The proof by assembling results from this M ONTHLY is original. There is
a longer account at the author’s website, and this will be updated with related results
and further references—especially, if found, those for any prior publication.

REFERENCES

1. D. Minda and S. Phelps, Triangles, ellipses, and cubic polynomials, Amer. Math. Monthly 115 (2008)
679–689.
2. I. J. Schoenberg, A conjectured analogue of Rolle’s theorem for polynomials with real or complex coeffi-
cients, Amer. Math. Monthly 93 (1986) 8–13. doi:10.2307/2322536

School of Mathematics and Statistics, University of Western Australia, Nedlands 6009, Western Australia,
and Department of Mathematics and Statistics, Curtin University, Bentley 6102, Western Australia
Grant.Keady@graduate.uwa.edu.au

Regular Matchstick Graphs


Sascha Kurz and Rom Pinchasi

Abstract. A matchstick graph is a plane geometric graph in which every edge has length 1 and
no two edges cross each other. It was conjectured that no 5-regular matchstick graph exists. In
this paper we prove this conjecture.

1. INTRODUCTION. A matchstick graph is a plane geometric graph in which ev-


ery edge is a line segment of length 1 and no two edges cross. (See for example the
Harborth graph in Figure 1.)

Figure 1. The Harborth graph.

We call a matchstick graph r -regular if every vertex has degree r . At an Oberwol-


fach meeting for discrete geometry in 1981 Heiko Harborth asked for the minimum
doi:10.4169/amer.math.monthly.118.03.264

264 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



While it seems highly likely that our Theorem 1 is a rediscovery, to date a search
of the reasonably-accessible English-language publications has not found any prior
publication. The proof by assembling results from this M ONTHLY is original. There is
a longer account at the author’s website, and this will be updated with related results
and further references—especially, if found, those for any prior publication.

REFERENCES

1. D. Minda and S. Phelps, Triangles, ellipses, and cubic polynomials, Amer. Math. Monthly 115 (2008)
679–689.
2. I. J. Schoenberg, A conjectured analogue of Rolle’s theorem for polynomials with real or complex coeffi-
cients, Amer. Math. Monthly 93 (1986) 8–13. doi:10.2307/2322536

School of Mathematics and Statistics, University of Western Australia, Nedlands 6009, Western Australia,
and Department of Mathematics and Statistics, Curtin University, Bentley 6102, Western Australia
Grant.Keady@graduate.uwa.edu.au

Regular Matchstick Graphs


Sascha Kurz and Rom Pinchasi

Abstract. A matchstick graph is a plane geometric graph in which every edge has length 1 and
no two edges cross each other. It was conjectured that no 5-regular matchstick graph exists. In
this paper we prove this conjecture.

1. INTRODUCTION. A matchstick graph is a plane geometric graph in which ev-


ery edge is a line segment of length 1 and no two edges cross. (See for example the
Harborth graph in Figure 1.)

Figure 1. The Harborth graph.

We call a matchstick graph r -regular if every vertex has degree r . At an Oberwol-


fach meeting for discrete geometry in 1981 Heiko Harborth asked for the minimum
doi:10.4169/amer.math.monthly.118.03.264

264 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



vertex number m(r ) of an r -regular matchstick graph; see also [2]. Obviously we have
m(0) = 1, m(1) = 2, and m(2) = 3, corresponding to a single vertex, a single edge,
and a triangle, respectively.
The determination of m(3) = 8 is an entertaining amusement. For degree r = 4 the
exact determination of m(4) is unsettled so far. The smallest known example is the
so-called Harborth graph, see, e.g., [3, p. 171], yielding m(4) ≤ 52. While it is not
too hard to show m(4) ≥ 20 by hand, a customized exhaustive computer search [4] is
needed to obtain the lower bound m(4) ≥ 34. The hunt goes on to prove or disprove
the optimality of the Harborth graph.
By the Eulerian polyhedron formula, every finite planar graph contains a vertex
of degree at most five, so that we have m(r ) = ∞ for r ≥ 6. The question whether
m(5) is finite has a long history. As early as in 1982 Aart Blokhuis proposed a proof
of the nonexistence of a 5-regular matchstick graph. It circulated for a while until a
gap was discovered. In 2009 the old memorandum reappeared and turned out to be
basically correct although containing several misprints; see [1] for a slightly edited
version. During the years there was a strong belief, based on unpublished proofs, that
no matchstick graph of degree 5 exists. In this paper we give a short proof of this
conjecture.

2. 5-REGULAR MATCHSTICK GRAPHS.

Theorem 1. Finite 5-regular matchstick graphs do not exist.

Proof. Suppose to the contrary that there is such a graph M which we consider also
as a planar map, that is, a crossing-free embedding of a planar graph in the plane. This
drawing consists of vertices, edges, and faces. Without loss of generality we assume
that this graph is connected and denote by V the number of its vertices, by E the
number of its edges, and by F the number of faces in the planar map M. By Euler’s
formula we have V − E + F = 2. For every k ≥ 3 we denote by f k the number of
faces in M with precisely P k edges.
We observe that 2E = k f k = 5V and F = f k . Therefore
P

5 X X
−6 = −3V + E + 2E − 3F = −3V + V + k fk − 3 fk
2
1 X
=− V + (k − 3) f k . (1)
2
The idea of the proof is to assign a charge to every vertex and every face in M so
that the total charge is negative. Then we will redistribute the charges according to a
simple local rule and reach a contradiction by showing that the charge of each vertex
and each face becomes nonnegative.
We begin by giving a charge of − 21 to each vertex and by giving a charge of k − 3
to each face in M with precisely k edges. By (1) the total charge of all the vertices and
faces is negative.
We redistribute the charge in the following very simple way. Consider a face T of
M and a vertex x of T . Let α denote the measure of the internal angle of T at x. If
α > π3 we take a charge of min 12 , 2π3 α − 12 from T and move it to x (see Figure 2).

We now show that after the redistribution of charges every vertex and every face has
a nonnegative charge. Consider a vertex x. Let ` denote the number of internal angles
at x that are greater than π3 . As the degree of x is equal to 5 we must have ` > 0. If
due to one of these ` angles we transfered a charge of 21 to x, then the charge at x is

March 2011] NOTES 265


1
min , 3
2 2π
α− 1
2

x
T α

Figure 2. The distribution of a charge from a face T to its vertex x.

nonnegative. Otherwise, denote by α1 , . . . , α` the measures of these ` internal angles


at x that are greater than π3 . For i = 1, . . . , ` we transfer a charge of 2π3 αi − 12 to x due
to αi . Hence the total charge transfered to x is 2π3
P`
αi − 2` . The angle count at x
P` P` i=1
gives 2π ≤ i=1 αi + (5 − `) 3 . This implies i=1 αi ≥ 2π − (5 − `) π3 = π3 (` + 1).
π

We conclude that the total charge transfered to x is at least 2π3 π3 (` + 1) − 2` = 21 . This


leaves x with a nonnegative charge.
Consider now a face T in M with k ≥ 3 edges. Assume first that T is a bounded
face. The initial charge of T is k − 3 ≥ 0. Therefore, if the charge at T becomes
negative this implies that one of the internal angles of T is greater than π3 . If k = 3,
that is, T is a triangle, then it must be an equilateral triangle, hence having all internal
angles equal to π3 . Consequently, the charge of T is unaffected and remains k − 3 = 0.
If k = 4, then T is a rhombus. If only two internal angles of T are greater than π3 ,
then at most a total charge of 1 was deducted from the initial charge of T , leaving its
charge nonnegative. If all internal angles of T are greater than π3 , then the total charge
deducted from T is at most 2π3 · 2π − 42 = 1, leaving the charge at T nonnegative.
If k = 5, then we split into two subcases. If at most four internal angles of T are
greater than π3 , then there is a deduction of at most 2 from the initial charge of T ,
leaving the charge at T nonnegative. If all five internal angles of T are greater than π3 ,
then as the sum of the internal angles of T is equal to 3π, the total charge deducted
from T amounts to 2π3 · 3π − 52 = 2, again leaving the charge at T nonnegative.
Finally, if k ≥ 6, then the charge deducted from T is at most k2 leaving T with a
charge of at least k − 3 − k2 ≥ 0.
It is left to consider the unbounded face S of M. If the number of edges of S is at
least 6, we are done as in the case of a bounded face. The cases where the unbounded
face consists of at most 5 edges can easily be excluded. Another way to settle this
issue is to observe that if S consists of at most 5 edges, then the total charge deducted
from S is at most 52 , leaving the charge of S at least − 52 (and in fact at least − 23 ). We
still obtain a contradiction as the sum of all charges should be equal to −6, while only
the unbounded face may remain with a negative charge, and its charge is not smaller
than − 25 .

266 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



3. CONCLUDING REMARKS. It is interesting to note that Theorem 1 is not true
if we consider it on the sphere. A matchstick graph drawn on a sphere is a drawing of
the vertices as points on the sphere and edges as great arcs connecting corresponding
points, with the property that the lengths of all connecting arcs are equal and no two
arcs cross. The example of the icosahedron shows that 5-regular matchstick graphs
may exist on a sphere (see Figure 3). In this example the charges of all triangles are
negative after the redistribution, since the angle sum of a spherical triangle is strictly
larger than π .

Figure 3. Icosahedron on a sphere.

REFERENCES

1. A. Blokhuis, Regular finite planar maps with equal edges (2009), available at http://www.wm.
uni-bayreuth.de/fileadmin/Sascha/aartb.pdf.
2. H. Harborth, Match sticks in the plane, in The Lighter Side of Mathematics, R. K. Guy and R. E. Woodrow,
eds., MAA Spectrum, Washington, DC, 1994, 281–288.
3. N. Hartsfield and G. Ringel, Pearls in Graph Theory. A Comprehensive Introduction, Academic Press,
Boston, 1990.
4. S. Kurz, Fast recognition of planar non-unit distance graphs—searching the minimum 4-regular planar
unit distance graph (preprint).

Department of Mathematics, University of Bayreuth, Universitätsstr. 30, 95447 Bayreuth, Germany


sascha.kurz@uni-bayreuth.de
Department of Mathematics, Technion–Israel Institute of Technology, Haifa 32000, Israel
room@math.technion.ac.il

March 2011] NOTES 267


A Recursive Scheme for Improving the
Original Rate of Convergence to the
Euler–Mascheroni Constant
Edward Chlebus

Abstract. We have used Euler–Maclaurin summation to develop a recursive scheme for mod-
ifying the original approximation for the Euler–Mascheroni constant γ . Convergence to γ re-
sulting from successively employing the proposed scheme has been significantly accelerated
while the form of the approximation originally introduced by Euler is still preserved.

1. INTRODUCTION. For n ∈ N let’s denote by γn(E) the difference between a par-


tial sum of the harmonic series
n
X 1 1 1 1
sn = =1+ + + ··· + (1)
k=1
k 2 3 n

and the natural logarithm of n + 1, i.e.,

γn(E) = sn − ln(n + 1). (2)

The Euler constant (also known as the Euler–Mascheroni constant) γ was introduced
by Euler [6] in 1734 (the paper [6], originally presented to the St. Petersburg Academy
of Sciences in 1734, was first published in 1740) as the limit of this difference (see
also [7]):

γ = lim γn(E) = lim [sn − ln(n + 1)] = 0.5772156649 . . . . (3)


n→∞ n→∞

The Euler–Mascheroni constant is usually defined not by (3) but by a slightly mod-
ified formula (see, e.g., [1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 13, 14, 17])
0
γ = lim γn(E ) = lim (sn − ln n) (4)
n→∞ n→∞

obtained from (3) by replacing γn(E) with the approximation


0
γn(E ) = sn − ln n. (5)

 easily observe, (4) is equivalent to (3) because ln(n + 1) = ln n +


As one can
ln 1 + n1 and limn→∞ ln 1 + n1 = 0.
0
The lower and upper bounds of γn(E ) − γ given by Young [17]

1 0 1
< γn(E ) − γ < (6)
2(n + 1) 2n
doi:10.4169/amer.math.monthly.118.03.268

268 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



0
imply very slow convergence of (4). Replacing γn(E ) in equation (4) with DeTemple’s
approximation [5]

1
 
γn(D) = sn − ln n + (7)
2
significantly speeds up the rate of convergence to γ since
1 1
< γn(D) − γ < . (8)
24(n + 1) 2 24n 2
0
Accuracy estimates of the approximations γn(E ) and γn(D) are also given in [1, 4, 8,
11, 13, 16]. Although some of them slightly differ from the aforementioned bounds
determined by Young and DeTemple, all the authors agree  that the errors contributed
by (5) and (7) are equal to 2n1 + O n12 and 24n1 2 + O n13 as indicated by (6) and (8),
respectively.
Negoi [10] has adopted DeTemple’s approach and further accelerated convergence
to γ by introducing

1 1
 
γn = sn − ln n + +
(N )
(9)
2 24n
0
in place of γn(E ) in (4) and proving that

1 1
− < γn(N ) − γ < − . (10)
48n 3 48(n + 1)3
The following key questions naturally arise at this point:
• What is the origin of the approximations γn(D) and γn(N ) that have such a profound
effect on the rate of convergence despite seemingly negligible modifications of (5)?
• How can we further speed up the rate of convergence to γ ?
Our objective is to address these problems.

2. A RECURSIVE SCHEME FOR ACCELERATING CONVERGENCE OF (4).


Using Euler–Maclaurin summation we get the formula for the error of the approxima-
0
tion γn(E ) [2, 7, 9]

1 X B2k
sn − ln n − γ = − . (11)
2n k=1 2kn 2k

The Bernoulli numbers B2k determine the terms of this expansion:


1 1 1 1 1 1
sn − ln n − γ = − + − + − + ··· .
2n 12n 2 120n 4 252n 6 240n 8 132n 10
(12)
1
By employing the Newton–Mercator series for 2n


1 (−1)k+1 1 1 1
  X
ln 1 + = = − 2+ − ··· (13)
2n k=1
k(2n)k 2n 8n 24n 3

March 2011] NOTES 269


and subtracting (13) from (12), we can eliminate the term 2n1 from the right-hand side
of (12):
1 1 1 23 1
  
sn − ln n 1 + −γ = − + −
2n 24n 2 24n 3 960n 4 160n 5
11 1 143 1
 
− − + +O . (14)
8064n 6 896n 7 30720n 8 n9
The first two terms on the left-hand side of (14) constitute a new approximation of the
Euler–Mascheroni constant:
1
  
γn = sn − ln n 1 +
(1)
. (15)
2n

Its accuracy γn(1) − γ = 24n1 2 + O n13 , explicitly determined by (14), is higher than

0
that of γn(E ) given by (6) and can be easily further increased by applying the Newton–
Mercator series again and eliminating the term 24n1 2 from (14) in the same way as 2n1
has been eliminated from (12). This process can be continued until the desired kth
approximation is obtained.
The kth approximation contributes an error
(k) −2k
γn(k) − γ = a2k n (k)
+ a2k+1 n −(2k+1) + a2k+2
(k)
n −(2k+2) + · · · (16)
(k) (k) (k)
depending on n and constant coefficients a2k , a2k+1 , a2k+2 , . . . . Note that −2k, not
−(k + 1), is the highest exponent of n for k ≥ 2 in (16). This choice seems to be
natural as there are no odd exponents of n (except for the first term) in the original
infinite expansion (11–12). It means in practice that for k ≥ 2 each subsequent approx-
imation γn(k) eliminates not one but two succesive terms of the series still left on the
right-hand side of the formula for γn(k−1) − γ .
Here are the three subsequent approximations directly following γn(1) (15):

1 1 1
    
γn = sn − ln n 1 +
(2)
1+ 1− (17)
2n 24n 2 24n 3
1 1
   
γn = sn − ln n 1 +
(3)
1+
2n 24n 2
1 143 1
   
× 1− 1 + 1 − (18)
24n 3 5760n 4 160n 5
1 1 1 143
     
γn(4) = sn − ln n 1 + 1+ 1 − 1 +
2n 24n 2 24n 3 5760n 4
1 151 1
   
× 1− 1 − 1 − (19)
160n 5 290304n 6 896n 7
and the respective errors they contribute:
143 1 151 1
γn(2) − γ = − − −
5760n 4 160n 5 290304n 6 896n 7
30893 1
 
+ +O (20)
6635520n 8 n9

270 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



151 1 109793 1
 
γn(3) − γ = − − + + O (21)
290304n 6 896n 7 22118400n 8 n9
109793 1
 
γn(4) − γ = + O . (22)
22118400n 8 n9

The proposed recursive scheme can be generalized as follows.

Theorem. For k ≥ 1 and l ≥ 2k let al(k) be the rational numbers defined as

(−1)l − 2l Bl
al(1) = (23)
l2l

for l ≥ 2 and

l  ml
X (−1) m +1 am(k)
al(k+1) = al(k) − l
(24)
m∈{2k,2k+1} m
l≡0 (mod m)

for k ≥ 1 and l ≥ 2(k + 1), where Bl are the Bernoulli numbers with B3 = B5 = B7 =
· · · = 0.
Then we have


1 al(1)
   X
γn(1) − γ = sn − ln n 1 + −γ = (25)
2n l=2
nl

and

k−1
"  ! (l)
!#
(l)
1 a a
Y
2l 2l+1
γn(k) − γ = sn − ln n 1 + 1 + 2l 1 + 2l+1
2n l=1 n n

X a (k)l
−γ = (26)
l=2k
nl

for k ≥ 2.

Corollary. For fixed k ≥ 2 and n → ∞,

k−1
"  ! (l)
!#
(l)
1 a a
Y
2l+1
γn(k) − γ = sn − ln n 1 + 1 + 2l2l 1 + 2l+1 −γ
2n l=1 n n
(k)
a2k 1
 
= 2k + O . (27)
n n 2k+1

The numbers a2(1) , a3(1) , a4(2) , a5(2) , a6(3) , a7(3) , . . . , a2k


(k)
can be determined by recursively

March 2011] NOTES 271


computing successive rows of the following triangular matrix

a2(1) a3(1) ··· (1)


a2k
a4(2) a5(2) ··· (2)
a2k
a6(3) a7(3) · · · (3)
a2k
..
.
(k)
a2k

consisting of k 2 entries.

3. RESULTS AND CONCLUSIONS. The scheme we have developed for improv-


ing the convergence rate to the Euler–Mascheroni constant works well. It is clearly
seen from (14), (20–22), and (27) that the order of the proposed approximation γn(k)
is O n12k , i.e., for increasing n the accuracy of every subsequent approximation is
higher than that of the preceding one.
We have empirically validated these theoretical results. Figure 1 shows absolute
values of the relative approximation error

γn(k) − γ
δn = × 100% (28)
γ

for all the derived approximations (15) and (17–19). Performance of the original ap-
proximations (2) and (5), and Negoi’s (9), is also illustrated for comparison.

100

0.01

0.0001

1e–006
| δ n | [%]

1e–008
γn(E ) (5)
1e–010 γn(E) (2)
γn(1) (15)
1e–012 γ (N) (9)
n
γn(2) (17)
1e–014 γ (3) (18)
n
γn(4) (19)
1e–016
5 10 15 20 25 30 35
n
Figure 1. Comparison of the rates of convergence to γ for various approximations.

272 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



The convergence improvement provided by our algorithm is clear. All the resulting
approximations (15) and (17–19) work better than the original ones (2) and (5) whose
absolute errors are nearly identical. The Negoi approximation γn(N ) successfully com-
0
petes with γn(E) , γn(E ) , and γn(1) but is outperformed by our remaining approximations
γn , γn , and γn .
(2) (3) (4)

The proposed scheme not only works well but also nicely explains the origin of
DeTemple’s and Negoi’s improvements of the original formula for γ . Note that De-
Temple’s approximation γn(D) is equivalent to γn(1) whereas Negoi’s approximation γn(N )
is a simplified version of γn(2) which can be expressed as
1 1 1 1 1 1
 
γn = sn − ln n + +
(2)
− − − − . (29)
2 24n 48n 2 48n 3 576n 4 1152n 5
Negoi has considered in his formula (9) only the first three terms of the logarithm
given in (29). The orders of the original approximation (6) and DeTemple’s approxi-
mation (8) are also consistent with those given by Euler–Maclaurin summation, (12)
and (14), respectively.
In this paper we have analyzed only the first four approximations resulting from
employing the proposed algorithm. The rate of convergence to γ can be improved
even further if one keeps applying our scheme (27) until the desired approximation
accuracy is obtained.
29,844,489,545 decimal digits of the Euler–Mascheroni constant γ are currently
known. They were computed for n = 233 in March 2009 by Yee and Chan [15] us-
ing the algorithm of Brent and McMillan [3] which is based on the modified Bessel
functions and avoids the need for computation of the Bernoulli numbers required for
Euler–Maclaurin summation. The error of this algorithm is O(e−8n ) [15].
Our approach, which still employs Euler–Maclaurin summation, is less computa-
tionally efficient than the Sweeney [12] or Brent–McMillan [3] methods. The nice
feature of the approximations γn(k) , however, is that they preserve the form γn(k) =
sn − ln [ f (n)] introduced by Euler (2) with only the original function f (n) = n + 1
being modified in order to improve the rate of convergence to γ .
ACKNOWLEDGMENTS. The author wishes to thank the anonymous reviewer for helpful comments.

REFERENCES

1. H. Alzer, Inequalities for the gamma and polygamma functions, Abh. Math. Sem. Univ. Hamburg 68
(1998) 363–372. doi:10.1007/BF02942573
2. T. M. Apostol, An elementary view of Euler’s summation formula, Amer. Math. Monthly 106 (1999)
409–418. doi:10.2307/2589145
3. R. P. Brent and E. M. McMillan, Some new algorithms for high-precision computation of Euler’s con-
stant, Math. Comp. 34 (1980) 305–312.
4. C. -P. Chen, Inequalities for the Euler–Mascheroni constant, Appl. Math. Lett. 23 (2010) 161–164. doi:
10.1016/j.aml.2009.09.005
5. D. W. DeTemple, A quicker convergence to Euler’s constant, Amer. Math. Monthly 100 (1993) 468–470.
doi:10.2307/2324300
6. L. Euler, De progressionibus harmonicis observationes, Commentarii academiae scientiarum imperialis
Petropolitanae 7 (1740) 150–161; also available at http://www.math.dartmouth.edu/~euler.
7. J. Havil, Gamma: Exploring Euler’s Constant, Princeton University Press, Princeton, 2003.
8. E. A. Karatsuba, On the computation of the Euler constant γ , Numer. Algorithms 24 (2000) 83–97. doi:
10.1023/A:1019137125281
9. D. E. Knuth, Euler’s constant to 1271 places, Math. Comp. 16 (1962) 275–281.
10. T. Negoi, A faster convergence to Euler’s constant, Math. Gaz. 83 (1999) 487–489. doi:10.2307/
3620963
11. A. Sintamarian, A generalization of Euler’s constant, Numer. Algorithms 46 (2007) 141–151. doi:10.
1007/s11075-007-9132-0

March 2011] NOTES 273


12. D. W. Sweeney, On the computation of Euler’s constant, Math. Comp. 17 (1963) 170–178.
13. S. R. Tims and J. A. Tyrrell, Approximate evaluation of Euler’s constant, Math. Gaz. 55 (1971) 65–67.
doi:10.2307/3613323
14. E. W. Weisstein, Euler–Mascheroni Constant—From MathWorld, A Wolfram Web Resource, http:
//mathworld.wolfram.com/Euler-MascheroniConstant.html.
15. A. J. Yee, Large computations (2010), available at http://www.numberworld.org/nagisa_runs/
computations.html.
16. L. Yingying, On Euler’s constant—calculating sums by integrals, Amer. Math. Monthly 109 (2002) 845–
850. doi:10.2307/3072374
17. R. M. Young, Euler’s constant, Math. Gaz. 75 (1991) 187–190. doi:10.2307/3620251

Department of Computer Science, Illinois Institute of Technology, 10 West 31st St., Chicago, IL 60616
chlebus@iit.edu

A New Construction of a Hyperbola

We give a method of locating points on the hyperbola H with equation x 2 /a 2 −


y 2 /b2 = 1 using ruler and compass, given the lengths a and b. To construct the
right branch of H (the left one is analogous), first draw the vertex V = (a, 0) and
the points P = (a, b) and Q = (a, −b). Draw the circle centered at the origin
O passing through P and Q, √ which crosses the x-axis at the foci F = (e, 0) and
F 0 = (−e, 0), where e = a 2 + b2 . Choose any point C on the segment PQ,
except for P and Q. Draw the circle k with center C that passes through V . The
x-axis is then one of the two tangents from F to k. Construct the second tangent t
from F to k, the tangential point being T (T can be found as the intersection of k
with the circle through V with center F). Similarly, draw the tangent u (different
from the x-axis) from the left focus F 0 to k, with tangential point U . Finally,
consider the intersection X of u and t. The trace of X when C moves on the line
segment PQ is the right branch of H .

X
U
u P t
k C
O T
F0 V F

To check the correctness of the construction, we use the fact that the points X of
the right branch of H are characterized by the property d(F 0 , X ) − d(F, X ) =
2a, where d denotes distance. According to our construction we have d(F 0 , U ) =
d(F 0 , V ), d(X, U ) = d(X, T ), and d(F, V ) = d(F, T ). Thus we get
d(F 0 , X ) − d(F, X ) = d(F 0 , U ) + d(X, U ) − (d(F, T ) + d(X, T ))
= d(F 0 , V ) − d(F, V ) = (e + a) − (e − a) = 2a.

—Submitted by Joachim Jäger, Saarbrücken, Germany

274 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



PROBLEMS AND SOLUTIONS
Edited by Gerald A. Edgar, Doug Hensley, Douglas B. West
with the collaboration of Mike Bennett, Itshak Borosh, Paul Bracken, Ezra A. Brown,
Randall Dougherty, Tamás Erdélyi, Zachary Franco, Christian Friesen, Ira M. Ges-
sel, László Lipták, Frederick W. Luttmann, Vania Mascioni, Frank B. Miles, Bog-
dan Petrenko, Richard Pfiefer, Cecil C. Rousseau, Leonard Smiley, Kenneth Stolarsky,
Richard Stong, Walter Stromquist, Daniel Ullman, Charles Vanden Eynden, Sam Van-
dervelde, and Fuzhen Zhang.

Proposed problems and solutions should be sent in duplicate to the MONTHLY


problems address on the back of the title page. Submitted solutions should arrive
at that address before July 31, 2011. Additional information, such as general-
izations and references, is welcome. The problem number and the solver’s name
and address should appear on each solution. An asterisk (*) after the number of
a problem or a part of a problem indicates that no solution is currently available.

PROBLEMS
11558. Proposed by Andrew McFarland, Płock, Poland. Given four concentric circles,
find a necessary and sufficient condition that there be a rectangle with one corner on
each circle.
11559. Proposed by Michel Bataille, Rouen, France. For positive p and x ∈ (0, 1),
define the sequence hxn i by x0 = 1, x1 = x, and, for n ≥ 1,
pxn−1 xn + (1 − p)xn2
xn+1 = .
(1 + p)xn−1 − pxn
Find positive real numbers α, β such that limn→∞ n α xn = β.
11560. Proposed by Gregory Galperin, Eastern Illinois University, Charleston, IL, and
Yury Ionin, Central Michigan University, Mount Pleasant, MI.
(a) The diagonals of a convex pentagon P0 P1 P2 P3 P4 divide it into 11 regions, of which
10 are triangular. Of these 10, five have two vertices on the diagonal P0 P2 . Prove that if
each of these has rational area, then the other five triangles, and the original pentagon,
all have rational areas.
(b) Let P0 , P1 , . . . , Pn−1 , n ≥ 5 be points in the plane. Suppose no three are collinear,
and, interpreting indices on Pk as periodic modulo n, suppose that for all k, Pk−1 Pk+1
is not parallel to Pk Pk+2 . Let Q k be the intersection of Pk−1 Pk+1 with Pk Pk+2 . Let αk
be the area of triangle Pk Q k Pk+1 , and let βk be the area of triangle Pk+1 Q k Q k+1 . For
0 ≤ j ≤ 2n − 1, let
(
α j/2 , if j is even;
γj =
β( j−1)/2 , if j is odd.

Interpreting indices on γ j as periodic modulo 2n, find the least m such that if m con-
secutive γ j are rational, then all are rational.
doi:10.4169/amer.math.monthly.118.03.275

March 2011] PROBLEMS AND SOLUTIONS 275


11561. Proposed by Cezar Lupu (student), University of Bucharest, Bucharest, Roma-
nia. Let f 1 , . . . , f n be continuous real valued functions on [0, 1], none identically zero,
R1
such that 0 f i (x) f j (x) d x = 0 if i 6 = j. Prove that
n Z 1 n Z 1
!2
Y Y
f k (x) d x ≥ n
2 n
f k (x) d x ,
k=1 0 k=1 0

n Z n Z
!2
X 1 X 1
f k2 (x) d x ≥ f k (x) d x , and
k=1 0 k=1 0
R1
f k2 (x) d x
n
X
2 ≥ n .
0 2
R
1
k=1
0 k
f (x) d x

11562. Proposed by Pál Péter Dályay, Szeged, Hungary. For positive a, b, c, and z,
let 9a,b,c (z) = 0((za + b + c)/(z + 2)), where 0 denotes the gamma function. Show
that 9a,b,c (z)9b,c,a (z)9c,a,b (z) is increasing in z for z ≥ 1.
11563. Proposed by Vlad Matei (student), University of Bucharest, Bucharest, Roma-
nia. For each integer k ≥ 2, find all nonconstant f in Z[x] such that for every prime
p, f ( p) has no nontrivial kth-power divisor.

SOLUTIONS

Explaining a Polynomial
11403 [2008, 949]. Proposed by Yaming Yu, University of California–Irvine, Irvine,
CA. Let n be an integer greater than 1, and let f n be the polynomial given by
n   i−1
X n Y
f n (x) = (−x)n−i (x + j).
i=0
i j=0

Find the degree of f n .


Solution by Nicolás Caro, IMPA, Rio de Janeiro, Brazil, and independently by Cos-
min Pohoata, Tudor Vianu National College, Bucharest, Romania. The degree of f n
is bn/2c. This follows immediately from the stronger statement that the coefficient of
x r in f n (x) is the number of derangements of [n] with r cycles, since each cycle must
have at least two elements. Here [n] = {1, . . . , n}, and a derangement is a permutation
with no fixed points.
Let c(n, k) be the number of permutations of [n] with k cycles (the unsigned Stirling
number of the first kind). The well-known generating function for these numbers is
given by nk=1 c(n, k)x k = n−1 j=0 (x + j) (provable in many ways, including induction
P Q
on n). Thus
n   i n   n−`
n n
(−x)`
X X X X
f n (x) = (−x) n−i k
c(i, k)x = c(n − `, k)x k
i=0
i k=1 `=0
` k=0
n   n n X r  
n ` n
(−1)` c(n −`, r −`)x r .
X X X
= (−1) r
c(n −`, r −`)x =
`=0
` r =` r =0 `=0
`

276 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



The coefficient of x r in this expression is precisely the inclusion-exclusion formula
to count permutations with r orbits that have no fixed points. The universe is the set
of permutations with r orbits, and the ith of the n sets to be avoided is the set of
permutations in which element i is a fixed point.
Editorial comment. Let d(n, r ) be the number of derangements of [n] with r cycles.
The fact that the coefficient of x r in f n (x) is d(n, r ) can also be proved by induction
on n using Pascal’s formula and the recurrence d(n, r ) = (n − 1)[d(n − 2, r − 1) +
d(n − 1, r )].
O. P. Lossers (and others) gave a short proof of the degree statement by observ-
ing that f n (x) is the nth derivative (with respect to t) of the product e−t x (1 − t)−x ,
evaluated at t = 0.
This polynomial appears explicitly in An Introduction to Combinatorial Analysis,
chapter 4 section 4, pp. 72–74 by Riordan (Wiley, 1958). It is also mentioned in Ad-
vanced Combinatorics, chapter VI, section 6.7, p. 256, by Comtet (Reidel, 1974) with
some references to previous non-combinatorial appearances in articles by Tricomi and
Carlitz.
Also solved by T. Amdeberhan & S. B. Ekhad, R. Bagby, D. Beckwith, R. Chapman (U.K.), P. Corn,
P. P. Dályay (Hungary), O. Geupel (Germany), D. Grinberg, J. Grivaux (France), S. J. Herschkorn, E. Hys-
nelaj (Australia) & E. Bojaxhiu (Albania), D. E. Knuth, O. Kouba (Syria), J. H. Lindsey II, O. P. Lossers
(Netherlands), Á. Plaza & S. Falcón (Spain), M. A. Prasad (India), R. Pratt, O. G. Ruehr, B. Schmuland
(Canada), A. Stadler (Switzerland), J. H. Steelman, R. Stong, R. Tauraso (Italy), M. Tetiva (Romania), BSI
Problems Group (Germany), GCHQ Problem Solving Group (U.K.), Microsoft Research Problems Group,
NSA Problems Group, and the proposer.

Mean Inequalities
11434 [2009, 463]. Proposed by Slavko Simic, Mathematical Institute SANU, Bel-
grade, Serbia. Fix n ∈ N with n ≥ 2. Let x1 , . . . , xn be distinct real numbers, and
let p1 , . . . , pn be positive numbers summing to 1. Let
Pn 3
Pn 3
k=1 pk x k − k=1 pk x k
S = P
n 2
Pn 2  .
3 p
k=1 k kx − p x
k=1 k k

Show that min{x1 , . . . , xn } ≤ S ≤ max{x1 , . . . , xn }.


Solution by Jim Simons, Cheltenham, U.K. Consider a probability distribution on the
real line that takes value x j with probability p j for 1 ≤ j ≤ n. Write µi0 for the ith
moment about 0 and µi for the ith moment about the mean µ01 . Now

µ03 − µ01 3 µ3 + 3µ01 µ2 µ3


S= = = µ01 + .
3(µ02 − µ01 2 ) 3µ2 3µ2
From this we obtain inequalities stronger than those proposed:
1 2 1 2
min{x1 , . . . , xn } + µ01 ≤ S ≤ max{x1 , . . . , xn } + µ01 .
3 3 3 3

Also solved by R. Bagby, R. Chapman (U.K.), M. P. Cohen, W. J. Cowieson, P. P. Dályay (Hungary), H. Deh-
ghan (Iran), P. J. Fitzsimmons, D. Grinberg, E. A. Herman, T. Konstantopoulos (U.K.), O. Kouba (Syria), J. H.
Lindsey II, O. P. Lossers (Netherlands), J. Posch, K. Schilling, B. Schmuland (Canada), R. Stong, M. Tetiva
(Romania), B. Tomper, BSI Problems Group (Germany), GCHQ Problem Solving Group (U.K.), Microsoft
Research Problems Group, and the proposer.

March 2011] PROBLEMS AND SOLUTIONS 277


A Circumradius Equation
11443 [2009, 548]. Proposed by Eugen Ionascu, Columbus State University, Colum-
bus, GA. Consider a triangle ABC with circumcenter O and circumradius R. Denote
the distances from O to the sides AB, BC, CA, respectively, by x, y, z. Show that if
ABC is acute then R 3 − (x 2 + y 2 + z 2 )R = 2x yz, and (x 2 + y 2 + z 2 )R − R 3 = 2x yz
otherwise.
Solution by Philip Benjamin, Berkeley College,Woodland Park, NJ. We first prove the
identity

1 − cos2 A + cos2 B + cos2 C = 2 cos A cos B cos C.



(∗)

Indeed, C = π − (A + B), so cos C = − cos(A + B) = sin A sin B − cos A cos B.


Isolating sin A sin B and squaring yields cos2 A cos2 B + 2 cos A cos B cos C +
cos2 C = sin2 A sin2 B = 1 − cos2 A − cos2 B + cos2 A cos2 B. This simplifies to
(∗).
Let the side lengths a, b, and c be opposite angles A, B, and C, respectively, so
a/ sin A = b/ sin B = c/ sin C = 2R. The perpendicular from O to side AB bisects
AB, so we have a right triangle with side lengths x, c/2, and R. Since c = 2R sin C,
we conclude that x = R|cos C|. Similarly y = R|cos A| and z = R|cos B|. If 4ABC is
acute, then the three cosines are positive, so multiplying (∗) by R 3 produces the desired
result. Otherwise, say angle C is right or obtuse. Now x = −R cos C and the other two
cosines are positive. Again, multiplying (∗) by R 3 produces the desired result.
Editorial comment. A similar problem was proposed in Crux Mathematicorum with
Mathematical Mayhem, December, 2008, Problem 3395.
Also solved by A. Alt, H. Bailey, M. Bataille (France), D. Beckwith, R. Chapman (U.K.), L. Csete (Hungary),
C. Curtis, P. P. Dályay (Hungary), P. De (India), D. Fleischman, V. V. Garcı́a (Spain), M. Garner & J. Zacharias,
O. Geupel (Germany), M. Goldenberg & M. Kaplan, M. R. Gopal, D. Gove, J.-P. Grivaux (France), L. Herot,
J. G. Heuver (Canada), E. Hysnelaj (Australia) & E. Bojaxhiu (Germany), Y. K. Jeon (Korea), G. A. Kandall,
Y. H. Kim (Korea), L. R. King, B. Klotzsche, T. Konstantopoulos (U.K.), O. Kouba (Syria), K.-W. Lau (China),
J. C. Linders (Netherlands), J. H. Lindsey II, O. P. Lossers (Netherlands), J. McHugh, J. Minkus, J. H. Nieto
(Venezuela), P. Nüesch (Switzerland), J. Oelschlager, M. Omarjee (France), J. Posch, C. R. & S. Selvaraj, R. A.
Simon (Chile), J. Simons (U.K.), R. Stong, M. Tetiva (Romania), B. Tomper, Z. Vörös (Hungary), M. Vowe
(Switzerland), H. Widmer (Switzerland), S. Xiao (Canada), Con Amore Problem Group (Denmark), GCHQ
Problem Solving Group (U.K.), NSA Problems Group, and the proposer.

Extrema
11449 [2009, 647]. Proposed by Michel Bataille, Rouen, France. (corrected) Find the
maximum and minimum values of
(a 3 + b3 + c3 )2
(b2 + c2 )(c2 + a 2 )(a 2 + b2 )
given that a + b ≥ c > 0, b + c ≥ a > 0, and c + a ≥ b > 0.
Solution by Jim Simons, Cheltenham, U.K. Call this big expression X . Since X is
homogeneous, we may assume a 2 + b2 + c2 = 1. The feasible region then consists
of a triangular patch on the positive octant of the unit sphere, excluding the ver-
tices (where one of a, b, c is zero), but including the interiors of the sides (where
two of a, b, c are equal). Using spherical polar coordinates, we may set (a, b, c) =

278 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



(cos α, sin α cos θ, sin α sin θ), where, since a, b, c are positive, θ is uniquely deter-
mined and 0 < θ < π/2. Now

2
cos3 α + sin3 α(cos3 θ + sin3 θ )
X=
sin2 α cos2 α + sin2 α sin2 θ cos2 α + sin2 α cos2 θ
 

2
cos3 α + sin3 α(cos3 θ + sin3 θ )
= .
sin2 α cos4 α + cos2 α sin2 α + sin4 α sin2 θ cos2 θ

If f (θ ) = cos3 θ + sin3 θ , then f 0 (θ) = 3 cos θ sin θ (sin θ − cos θ ). In the feasible re-
gion for θ , this is positive for θ > π/4 and negative for θ < π/4. Thus f , and with
it, the numerator of X for fixed α, is less at θ = π/4 than at any other feasible θ .
Similarly, if g(θ) = sin2 θ cos2 θ , g, and with it, the denominator of X for fixed α, is
increasing in θ for θ < π/4 and decreasing in θ for θ > π/4. Thus X is, for fixed α,
smallest at θ = π/4, and greatest at an edge of the feasible region. By symmetry, the
minimum value of X is 9/8, attained when a = b = c.
From the foregoing, the maximum value of X on the closure of the feasible re-
gion occurs at a point where, with respect to any translation into spherical coordi-
nates, θ is extremal. The only such points are the corners of the region. At (a, b, c) =
(2−1/2 , 2−1/2 , 0), X = 2. However, this maximum is not attained because these corners
are not in the feasible region.

Also solved by R. Agnew, A. Alt, R. Bagby, D. Beckwith, H. Caerols & R. Pellicer (Chile), R. Chapman
(U.K.), H. Chen, C. Curtis, Y. Dumont (France), D. Fleischman, J.-P. Grivaux (France), E. A. Herman, F.
Holland (Ireland), T. Konstantopoulos (U.K.), O. Kouba (Syria), A. Lenskold, J. H. Lindsey II, P. Perfetti
(Italy), N. C. Singer, R. Stong, T. Tam, R. Tauraso (Italy), M. Tetiva (Romania), E. I. Verriest, Z. Vörös
(Hungary), S. Wagon, G. D. White, GCHQ Problem Solving Group (U.K.), Microsoft Research Problems
Group, and the proposer.

Max Min Coordinate Difference

11450 [2009, 647]. Proposed by Cosmin Pohoata (student), National College “Tudor
Vianu,” Bucharest, Romania. Let A be the unit ball in Rn . Find
 
min ai − a j .

max
a∈A 1≤i< j≤n

Solution by Omran Kouba, Higher Institute for Applied Sciences and Technology,
Damascus, Syria. Let Mn denote the desired maximum. p It is implicit in the statement
of the problem that n ≥ 2. We show that Mn = 12/(n(n 2 − 1)).
Let (a1 , . . . , an ) be an element of A at which the maximum is achieved, and
let Mn = min{|ai − a j | : 1 ≤ i < j ≤ n}. There is a permutation σ of the set
{1, 2, . . . , n} such that aσ (1) ≤ aσ (2) ≤ · · · ≤ aσ (n) . Write for simplicity b j = aσ ( j) .
For j > i, we then have

j
X
b j − bi = (bk − bk−1 ) ≥ ( j − i)Mn .
k=i+1

March 2011] PROBLEMS AND SOLUTIONS 279


From this we conclude that |b j − bi | ≥ | j − i| Mn for 1 ≤ i, j ≤ n. Therefore
X X X
Mn2 ( j − i)2 ≤ (b j − bi )2 = (a j − ai )2
1≤i, j≤n 1≤i, j≤n 1≤i, j≤n
X
= (a 2j + ai2 − 2ai a j )
1≤i, j≤n

n n
!2
X X
≤ 2n ak2 − 2 ak ≤ 2n,
k=1 k=1

ak2 ≤ 1 when (a1 , . . . , an ) ∈ A. On the other hand,


Pn
since k=1

n n
!2
X X X n 2 (n 2 − 1)
( j − i) = 2n
2 2
k −2 k = .
1≤i, j≤n k=1 k=1
6
p
It follows that Mn2 ≤ 12/(n(n 2 − 1)), so Mn ≤ 12/(n(n 2 − 1)).
Conversely, if we consider (a1(0) , a2(0) , . . . , an(0 ) defined by
s  
(0) 12 n+1
ak = k− , k = 1, 2, . . . , n,
n(n 2 − 1) 2

then (a1(0) , . . . , an(0) ) ∈ A and


s
12
min ai(0) − a (0) .

=
j
1≤i< j≤n n(n 2 − 1)
p
Thus Mn ≥ 12/(n(n 2 − 1)).
Editorial comment. Marian Tetiva (Romania) notes that a stronger form of this prob-
lem appeared as Problem E2032, this M ONTHLY 76 (1969) 691–692, proposed by
D. S. Mitrinović. See also Problem 3.9.9 in Mitrinović, Analytic Inequalities (Springer-
Verlag, 1970).
Also solved by A. Alt, R. F. de Andrade, M. R. Avidon, R. Bagby, D. Beckwith, J. Cade, R. Chapman (U.K.),
L. Comerford, W. J. Cowieson, P. P. Dályay (Hungary), A. Fielbaum (Chile), D. Fleischman, O. Geupel (Ger-
many), J.-P. Grivaux (France), E. A. Herman, A. Ilić (Serbia), T. Konstantopoulos (U.K.), J. Kuplinsky, J. H.
Lindsey II, O. P. Lossers (Netherlands), M. D. Meyerson, D. Ray, K. Schilling, B. Schmuland (Canada), J.
Simons (U.K.), R. Stong, M. Tetiva (Romania), E. I. Verriest, GCHQ Problem Solving Group (U.K.), NSA
Problems Group, and the proposer.

A Cauchy–Schwarz Puzzle
11458 [2009, 747]. Proposed by Cezar Lupu (student), University of Bucharest,
Bucharest, Romania, and Vicenţiu Rădulescu, Institute of Mathematics “Simon Stoi-
low” of the Romanian Academy, Bucharest, Romania. Let a1 , . . . , an be nonnegative
and let r be a positive integer. Show that
 2
X i r j r ai a j X n X i r j r k r ai a j ak
  ≤ m r −1 am .
1≤i, j≤n
i + j −1 m=1 1≤i, j,k≤n
i + j +k−2

280 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



Solution by Francisco Vial,
Pn student, Pontificia Universidad Católica de Chile, Santi-
ago, Chile. Let f (x) := i=1 i r ai x i−1 , so
Z 1 n
X
f (x) d x = m r −1 am ,
0 m=1
 
Z 1 Z 1 X X i r j r ai a j
f 2 (x) d x =  i r j r ai a j x i+ j−2  d x = , and
0 0 1≤i, j≤n 1≤i, j≤n
i + j −1
 
1 1
i r j r k r ai a j ak
Z Z X X
f 3 (x) d x =  i r j r k r ai a j ak x i+ j+k−3  d x = .
0 0 1≤i, j,k≤n 1≤i, j,k≤n
i + j +k−2

The stated inequality is equivalent to


Z 1 2 Z 1  Z 1 
f 2 (x) d x ≤ f (x) d x f 3 (x) d x ,
0 0 0

which follows by applying the Cauchy–Schwarz inequality to f (x)1/2 and f (x)3/2 .


Remarks. Because a1 , . . . , an are nonnegative, f (x) is nonnegative and continuous
on [0, 1], so f (x)1/2 and f (x)3/2 are real and well defined. The parameter r need not
be an integer.
Also solved by M. R. Avidon, R. Chapman (U.K.), P. P. Dályay (Hungary), D. Grinberg, O. Kouba (Syria), O. P.
Lossers (Netherlands), J. Simons (U.K.), R. Stong, GCHQ Problem Solving Group (U.K.), and the proposers.

An Orthocenter Inequality
11461 [2009, 844]. Proposed by Panagiote Ligouras, Leonardo da Vinci High School,
Noci, Italy. Let a, b, and c be the lengths of the sides opposite vertices A, B, and C of
an acute triangle. Let H be the orthocenter. Let da be the distance from H to side BC,
and similarly for db and dc . Show that
  1/4
1 2 3 1 1 1
≥ √ +√ +√ .
da + db + dc 3 abc bc ca ab

Solution by Michael Vowe, Fachhochschule Nordwestschweiz, Muttenz, Switzerland.


Let R be the circumradius, r the inradius, F the area, and s the semiperimeter. From
da = 2R cos B cos C, db = 2R cos C cos A, dc = 2R cos A cos B, we obtain
  r
da + db + dc = 2R cos A cos B + cos B cos C + cos C cos A ≤ 2r 1 +
R
(see 6.10, p. 181, in D. Mitrinovic et al., Recent Advances in Geometric Inequalities,
Dordrecht, 1989). From Jensen’s inequality for concave functions (here, the square
root), we have
s   r
1 1 1 1 1 1 1 6s
√ +√ +√ ≤3· + + = .
ab bc ca 3 ab bc ca abc

March 2011] PROBLEMS AND SOLUTIONS 281


From abc = 4R F = 4Rr s and s 2 ≥ 27r 2 (6.1, p. 180, ibid.) we get
  1/4 √ !1/4
2 3 1 1 1 2 3 6s
√ +√ +√ ≤
3 abc bc ca ab 3 (abc)3/2
√ !1/4
2 3 6 1 23/8 1 1
≤ · √ = · 3/8 · 5/8 .
3 (4Rr ) 3/2
3 3r 3 R r

Thus it suffices to prove


1 23/8 1 1
 r  ≥ · 3/8 · 5/8 .
2r 1 + 3 R r
R
Writing x = r/R, this means we must prove x 3/8 (1 + x) ≤ 3/(2 · 23/8 ) for 0 < x ≤
1/2. The function f (x) = x 3/8 (1 + x) is increasing on [0, 1/2], though, and we are
done.
Equality holds only if x = 1/2, or equivalently, R = 2r , which makes the triangle
equilateral.
Also solved by P. P. Dályay (Hungary), O. Faynshteyn (Germany), K.-W. Lau (China), C. R. Pranesachar (In-
dia), R. Stong, GCHQ Problem Solving Group (U.K.), Microsoft Research Problems Group, and the proposer.

An Erroneous Claim
11465 [2009, 845]. Proposed by Pantelimon George Popescu, Polytechnic University
of Bucharest, Bucharest, Romania, and José Luis Dı́az-Barrero, Polytechnic University
of Catalonia, Barcelona, Spain. Consider three simple closed curves in the plane, of
lengths p1 , p2 , and p3 , enclosing areas A1 , A2 , and A3 , respectively. Show that if
p3 = p1 + p2 and A3 = A1 + A2 , then 8π A3 ≤ p32 .
Solution by the Texas State University Problem Solvers Group, San Marcos, TX. The
problem as stated is false. Consider the following counterexample. Let the first curve
be a square of side 1, so p1 = 4 and A1 = 1. Let the second curve be a square √ with
p2 = 40 and√A2 = 100. Let the third curve be a rectangle with sides 11 + 2 5 and
101/(11 + 2 5 ) so that p3 = 44 and A3 = 101. These three curves fulfill the require-
ments of the problem, and yet 8π A3 > p32 .
Let us incorporate the additional requirement that p12 + p22 = 2 p1 p2 . Then the re-
quired inequality can be proved as follows. The isoperimetric inequality applied to any
of the curves is
 p 2
i
Ai ≤ π ,

and thus 4π Ai ≤ pi2 . Therefore
4π A3 = 4π A1 + 4π A2 ≤ p12 + p22 .
With the newly-added condition we get
8π A3 = 8π A1 + 8π A2 ≤ 2 p12 + 2 p22 = p12 + p22 + 2 p1 p2 = ( p1 + p2 )2 = p32 .

Also solved by G. Apostolopoulos (Greece), R. Bagby, B. Burdick, R. Chapman (U.K.), W. J. Cowieson, P. P.


Dályay (Hungary), J.-P. Grivaux (France), K. Hanes, J. H. Lindsey II, M. D. Meyerson, J. Minkus, J. Simons
(U.K.), R. Stong, and the Microsoft Research Problems Group.

282 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



REVIEWS
Edited by Jeffrey Nunemacher
Mathematics and Computer Science, Ohio Wesleyan University, Delaware, OH 43015

Logical Labyrinths. By Raymond M. Smullyan. A K Peters, Ltd., Wellesley, MA, 2009, vii +
327 pp., ISBN 978-1-56881-443-8, $49.00.

Reviewed by Christopher C. Leary


It is the pervading law of all things organic and inorganic, of all things physical and meta-
physical, of all things human and all things superhuman, of all true manifestations of the head,
of the heart, of the soul, that the life is recognizable in its expression, that form ever follows
function. This is the law. [1]
—Louis Sullivan

Since I entered this world late in the Eisenhower administration, some of my earliest
introductions to the field of logic were through Raymond Smullyan’s work. I truly ad-
mire the authors among us who have the talent to open a field to others in an interesting
and non-threatening manner, and Smullyan has that talent to spare. His logical puzzle
books, from What is the Name of this Book? to Forever Undecided and beyond have
introduced thousands to the delights of formal logic, and his variations on knights and
knaves puzzles have been canonical material in logic classes for the past three decades,
and will be for decades to come.
Logical Labyrinths thus begins in familiar territory with the author guiding us into
propositional logic through informal reasoning and logic conundrums. This volume,
however, has a more ambitious goal than the author’s more popular works. Smullyan
says in the preface that it is “. . . a bridge from all my previous recreational puzzle books
to all my technical writing in the fascinating field of symbolic logic.” So this book aims
to be more of a textbook, helping the reader move from an intuitive understanding of
logical arguments to a mastery of specialized results that is on a par with that of first-
year graduate students. This is a daunting goal for a book of 320 pages, no matter
who the author. Although there is much to praise about this work, and indeed parts
of it are brilliant, I am left at the end with a feeling of disappointment. The overly
ambitious reach of the book seems to have left it without a core audience to whom the
author can pitch his prose. Smullyan treats the advanced topics at the conclusion of his
journey in much the same way that he treats the introductory material. Unfortunately,
Smullyan’s characteristic style, which works so well at the beginning of the book,
does not seem to fit as well at the end. In my opinion this makes Logical Labyrinths
somewhat unsatisfactory.
Think about your ideal mathematics class. Maybe for you it is a first proofs course,
or an analysis course, or the high school geometry class that your chair has not let you
teach for the last fifteen years. Whatever the subject is, now think about the students in
that class. Whether they are advanced students or raw beginners, most likely you have
thought of these students as bright, inquisitive, somewhat self-motivated, and eager to
join you as you explore the subject matter at hand. You are looking forward to sharing
your love of your field with them, exploring the main ideas as well as showing them
doi:10.4169/amer.math.monthly.118.03.283

March 2011] REVIEWS 283


(or having them discover) some of your favorite little gems that are, perhaps, off the
beaten path. You smile with anticipation as you look forward to the end of the class,
knowing that you will have inspired them, that they will take with them at least the
major points of the class, and most of all that they will remember their enthusiasm as,
together, you explored this wonderful area of mathematics. This ideal student in this
ideal class is the perfect audience for any author, but perhaps more so for Smullyan
and his distinctive method of introducing logic.
Much is demanded from the reader of this text. For without a doubt, Smullyan is
not so much interested in laying out the foundations of logic as in having the reader
discover the foundations of logic. His initial puzzles about liars and truth-tellers, in-
sane liars and spies are set out with a dual purpose: certainly they set the stage for a
somewhat more formal development of propositional logic in later chapters, but also
they are designed to engage the reader, to entice her into the subject, and to get her to
think, and think hard, about the questions posed by the puzzle. And the ideal student
of the last paragraph will do just that, taking each question as a challenge and putting
thought and time into each puzzle in order to build her skills and insight as she reads.
Of course, that’s not what I did. The kids needed to get to school, there were gro-
ceries to buy, I had to get to work, and the car needed to go to the shop. And let’s not
fool ourselves—I have better time management skills and fewer demands on my time
than most undergraduates, many of whom have to do all of that as well as find time to
write papers and do the reading for three or four other classes.
So what did I do? After I read one of Dr. Smullyan’s puzzles, I thought about it
for maybe five seconds and then I skipped ahead a couple of pages to the solution,
all nicely laid out for me in black and white, thereby undermining the developmental
plan that the author has crafted in his text. Dana Scott, in his endorsement on the back
cover of Logical Labyrinths, says it nicely: “Suitable for self-study, the book will be
most valuable if the readers agree not to peek at the solutions before learning to think
for themselves!” Unfortunately, if I were considering this book as a text, it would be
hard for me to believe that the students, even the students in my ideal class, would be
so restrained as to not peek.
We all know that reading mathematics is an active process. All authors whose goals
include engaging the reader in active reading have the problem of how to encourage
their readers to work through the material while, at the same time, providing the in-
formation in the text that they wish to convey. After posing a problem that the reader
should ponder, the author faces this challenge: find the best way to provide a solution
for the reader who either thinks that she has an answer and would like to know if it
is correct, or has tried and is stymied, or is just too lazy to put in the effort but has to
get this chapter finished before 9 a.m. tomorrow, and the author has to do all of this
without undermining the active reading that is desired.
George Exner, in his invaluable little volume An Accompaniment to Higher Mathe-
matics [1], uses the symbol

284 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



to encourage the readers of his book to stop, pick up a pencil and some paper, and
write something down, explicitly encouraging the active engagement of the reader. At
the end of the text he provides hints, but no solutions, to his problems, exercises, and
explorations. Jim Henle’s An Outline of Set Theory [2] follows a similar plan, laying
out what is essentially an inquiry-based learning script in the first third of the book,
hints for the exercises in the middle third, and solutions in the final third of the book.
When I taught set theory using this text, I distributed large paper clips and told the
students to paper clip the last part of the book shut until the end of the term. This was
a reasonably successful experiment, at least to the extent that several of the students
assured me that they had not, in fact, looked at the solutions during the course of the
semester. In some sense, the author would like there to be some small cost involved in
getting the answer, a cost that will encourage the reader to put forth the effort needed
to receive maximal benefit from the problem that is posed.
Perhaps today’s technology can help us with the problem of encouraging active
reading while still providing the information that is needed. For example, authors and
publishers could provide a web-based answer key. I can envision a system where users
would register and references to answers would cost some nominal amount of real
money that would be donated to a charity of the user’s choice, like the Red Cross or
the MAA. Charge people 20c per answer with a limit of $10.00 per book, and the bar
would be set high enough that readers would be encouraged to think about questions
before running to get the answers, and we could do some good in the world, as well.
Returning to the book at hand, my personal feeling is that Smullyan’s cost for find-
ing the solutions to his puzzles is too low. At least it is too low to prevent me from
cheating myself. I don’t know why flipping forward three pages makes it more likely
that I will peek than looking at the back of the book, but I have the strong feeling that
it does, and this makes me feel that the puzzle-book style that Professor Smullyan has
used so effectively in his popular books is not as effective in a potential textbook.
The list of topics covered in Logical Labyrinths is impressively, almost dauntingly,
long. Starting from scratch, Smullyan brings us through propositional and first-order
logic, proof tableaux, enough set theory to get to König’s infinity lemma, complete-
ness, compactness, interpolation, and Beth’s definability theorem. Certainly a student
who grasps all of these arguments and understands the import of these theorems will
have gained a solid understanding of first-order logic, although arguably logic exam-
ined more from a philosopher’s point of view than from a mathematician’s point of
view. I must say that I found the selection of advanced topics to be somewhat idiosyn-
cratic. Incompleteness is only lightly covered in the last chapter (more on this later),
and very little is said about the interplay between logic and computer science. These
seem to be opportunities that the author chose to ignore, and I feel that the book ends
up being slightly unbalanced because of these choices.
There are parts of the book that just sing. I was very taken with Professor Smullyan’s
treatment of cardinality and Cantor’s theorem, and the technical explanation of the
tableaux method and the subsequent proofs of its completeness and correctness are
compelling. Unfortunately, the end of the book was, for me, problematic. Being fore-
warned that the book was about generalization, I was not surprised to have the final
sections be about a grand unification theorem for the major points of the book, but at
the end of the day I wasn’t convinced that I cared.
Every time I read a book I expect that the author had a reason for taking the trouble
to write it. One of the things that I like an author to do is to have a point of view, to
explain that point of view, and to argue that his or her point of view is correct. Smullyan
states that his goal in this book is to bring beginners to an understanding of research in
symbolic logic. That is a fine goal, and he does an admirable job of achieving it. But

March 2011] REVIEWS 285


I do not feel that he ever told me why the reader should be interested in the advanced
results that he presents. In many logic texts (including the one that I wrote) there seems
to be a natural stopping point after the incompleteness theorems of Gödel, where the
implications of the theorems are easy to explain and defend. Smullyan detours around
this point, however, and at the end of Logical Labyrinths I find myself wondering why I
should care about the last few chapters. Where were the examples and the motivation?
Chapter 23 proves Craig’s interpolation lemma, promising us important applications
in the next two chapters. Chapter 24 uses the interpolation lemma to prove Robinson’s
consistency theorem and Chapter 25 proves Beth’s definability theorem, but never are
we given a clear idea about the reason that these two results are important. Perhaps
their utility is supposed to be obvious, but I wanted more explanation from the text. At
some point the author needs to inspire us, not just challenge us and quiz us.
The back cover of Logical Labyrinths contains the usual mix of a publisher’s syn-
opsis and ringing endorsements from respected reviewers. The late Martin Gardner
correctly tells us, “It’s a volume that only Ray could have written . . . .” The puzzles,
the jokes, and the tone of the book are vintage Smullyan, both in the introductory
chapters and in the more advanced material. Unfortunately, this style does not seem
to work as well late in the book as it does early in the book. The immediate attrac-
tion of the logic puzzles and paradoxes combined with the engaging style and wit of
the author make the early chapters of Logical Labyrinths compelling reading, but that
charm is lost by the end of the book. Although the form is consistent throughout, I
believe that for most readers the function of the later chapters of this text needs to be
different than the function of the earlier chapters. By losing sight of Sullivan’s dictum,
Professor Smullyan lets us down.

REFERENCES

1. G. R. Exner, An Accompaniment to Higher Mathematics, Springer-Verlag, New York, 1996.


2. J. M. Henle, An Outline of Set Theory, Springer-Verlag, New York, 1986.
3. L. H. Sullivan, The tall office building artistically considered, Lippincott’s Magazine series 57 (1896)
403–409.

SUNY Geneseo, Geneseo NY 14454


leary@geneseo.edu

286 c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118



Methods For
Euclidean Geometry
Owen Byer,
Felix Lazebnik,
and Deirdre L. Smeltzer

Series: Classroom Resource Materials

Methods for Euclidean Geometry explores


one of the oldest and most beautiful of
mathematical subjects. The book begins
with a thorough presentation of classical
solution methods for plane geometry prob-
lems, but its distinguishing feature is the
subsequent collection of methods which
have appeared since 1600.

For example, the coordinate method, which


is a central part of the book, has been part
of mathematics for four centuries. Howev-
er, it has rarely served as a tool that students
consider using when faced with geometry
problems. The same holds true regarding
the use of trigonometry, vectors, complex numbers, and transformations. The
book presents each of these as self-contained topics, providing examples of their
applications to geometry problems. Both strengths and weaknesses of various
methods, as well as the ranges of their effective applications, are discussed.

Importance is placed on the problems and their solutions. The book contains
numerous problems of varying difficulty; over a third of its contents are devoted
to problem statements, hints, and complete solutions. The book can be used as
a textbook for geometry courses; as a source book for geometry and other math-
ematics courses; for capstone, problem-solving, and enrichment courses; and for
independent study courses.
480 pp., 2010, ISBN 978-0-88385-763-2, Hardbound
List: $69.95 MAA Member: $55.95 Catalog Code: MEG

To order visit us online at www.maa.org or call us at


1-800-331-1622.
New from the MAA
A Historian Looks Back
The Calculus as Algebra and
Selected Writings
Judith V. Grabiner

Judith Grabiner, the author of A Historian Looks


Back, has long been interested in investigating what
mathematicians actually do, and how mathematics
actually has developed. She addresses the results of her
investigations not principally to other historians, but to mathematicians and teachers
of mathematics. This book brings together much of what she has had to say to this
audience.

The centerpiece of the book is The Calculus as Algebra: J.-L. Lagrange, 1736-
1813. The book describes the achievements, setbacks, and influence of Lagrange’s
pioneering attempt to reduce the calculus to algebra. Nine additional articles round
out the book describing the history of the derivative; the origin of delta-epsilon proofs;
Descartes and problem solving; the contrast between the calculus of Newton and
Maclaurin, and that of Lagrange; Maclaurin’s way of doing mathematics and science
and his surprisingly important influence; some widely held “myths” about the history
of mathematics; Lagrange’s attempt to prove Euclid’s parallel postulate; and the central
role that mathematics has played throughout the history of western civilization.

The development of mathematics cannot be programmed or predicted. Still, seeing how


ideas have been formed over time and what the difficulties were can help teachers find
new ways to explain mathematics. Appreciating its cultural background can humanize
mathematics for students. And famous mathematicians’ struggles and successes should
interest—and perhaps inspire—researchers. Readers will see not only what the
mathematical past was like, but also how important parts of the mathematical present
came to be.

To order visit us online at www.maa.org or call us at 1-800-331-1622.

282 pp., Hardbound, 2010 Catalog Code: CAGH ISBN 978-0-88385-572-0


List: $62.95 MAA Member: $49.95
New from the MAA
Functions, Data and Models
An Applied Approach to College
Algebra
Sheldon P. Gordon & Florence S.
Gordon

Series: MAA Textbooks

This is a college algebra-level textbook


written to provide the kind of mathemati- 504 pp., Hardbound 2010
ISBN 978-0-88385-767-0
cal knowledge and experiences that stu- List: $69.95
dents will need for courses in other fields, MAA Member: $55.95
such as biology, chemistry, business, fi- Catalog Code: COS
nance, economics, and other areas that
are heavily dependent on data either from
laboratory experiments or from other
studies. The focus is on the fundamental mathematical concepts
and the realistic problem-solving via mathematical modeling rather
than the development of algebraic skills that might be needed in
calculus.

Functions, Data, and Models presents college algebra in a way that


differs from almost all college algebra books available today. Rather
than going over material covered in high school courses the Gor-
dons teach something new. Students are given an introduction to
data analysis and mathematical modeling presented at a level that
students with limited algebraic skills can understand. The book con-
tains a rich set of exercises, many of which use real data. Also in-
cluded are thought experiments or what if questions that are meant
to stretch the student’s mathematical thinking.

To order visit us online at www.maa.org or call us at 1-800-331-1622.


MATHEMATICAL ASSOCIATION OF AMERICA 1529 Eighteenth St., NW • Washington, DC 20036

New title by the MAA

Counterexamples in Calculus
Sergiy Klymchuk
As a robust repertoire of examples is essential for students
to learn the practice of mathematics, so a mental library of
counterexamples is critical for students to grasp the logic of
mathematics. Counterexamples are tools that reveal incor-
rect beliefs. Without such tools, learners’ natural misconcep-
tions gradually harden into convictions that seriously impede
further learning. This slim volume brings the power of coun-
terexamples to bear on one of the largest and most important
courses in the mathematics curriculum.
—Professor Lynn Arthur Steen, St. Olaf College, Minnesota,
USA, Co-author of Counterexamples in Topology

Counterexamples in Calculus serves as a supplementary resource to en-


hance the learning experience in single variable calculus courses. This
book features carefully constructed incorrect mathematical statements
that require students to create counterexamples to disprove them. Methods
of producing these incorrect statements vary. At times the converse of a
well-known theorem is presented. In other instances crucial conditions are
RPLWWHGRUDOWHUHGRULQFRUUHFWGH¿QLWLRQVDUHHPSOR\HG,QFRUUHFWVWDWH-
ments are grouped topically with sections devoted to: Functions, Limits,
Continuity, Differential Calculus and Integral Calculus.

7KLV ERRN DLPV WR ¿OO D JDS LQ WKH OLWHUDWXUHDQG SURYLGH D UHVRXUFH IRU
using counterexamples as a pedagogical tool in the study of introductory
calculus. In that light it may well be useful for
‡ KLJKVFKRROWHDFKHUVDQGXQLYHUVLW\IDFXOW\DVDWHDFKLQJUHVRXUFH
‡ KLJKVFKRRODQGFROOHJHVWXGHQWVDVDOHDUQLQJUHVRXUFH
‡ DSURIHVVLRQDOGHYHORSPHQWUHVRXUFHIRUFDOFXOXVLQVWUXFWRUV
Catalog Code: CXC
101pp., Paperbound, 2010
ISBN: 978-0-88385-756-6
List: $45.95
MAA Member: $35.95

Order your copy today!


1.800.331.1622 www.maa.org

You might also like