
THE ALGEBRAIC FOUNDATIONS OF MATHEMATICS

This book is in the ADDISON-WESLEY SERIES IN INTRODUCTORY COLLEGE MATHEMATICS

Consulting Editors

RICHARD S. PIETERS

GAIL S. YOUNG

THE ALGEBRAIC FOUNDATIONS OF MATHEMATICS


ROSS A. BEAUMONT
and

RICHARD S. PIERCE
Department of Mathematics University of Washington

ADDISON-WESLEY PUBLISHING COMPANY, INC.

READING, MASSACHUSETTS
PALO ALTO
LONDON

Copyright © 1963 ADDISON-WESLEY PUBLISHING COMPANY, INC. Printed in the United States of America
ALL RIGHTS RESERVED. THIS BOOK, OR PARTS THEREOF, MAY NOT BE REPRODUCED IN ANY FORM WITHOUT WRITTEN PERMISSION OF THE PUBLISHERS.

Library of Congress Catalog Card No. 623-8895

PREFACE

This book is an offspring of two beliefs which the authors have held for many years: it is worthwhile for the average person to understand what mathematics is all about; it is impossible to learn much about mathematics without doing mathematics. The first of these convictions seems to be accepted by most educated people. The second opinion is less widely held. Mathematicians teaching in liberal arts colleges and universities are often under pressure from their colleagues in the humanities and social sciences to offer short courses which will painlessly explain mathematics to students with varying backgrounds who are seeking a broad, liberal education. The extent to which such courses do not exist is a credit to the good sense of professional mathematicians. Mathematics is a big and difficult subject. It embraces a rigid method of reasoning, a concise form of expression, and a variety of new concepts and viewpoints which are quite different from those encountered in everyday life. There is no such thing as "descriptive" mathematics. In order to find answers to the questions "What is mathematics?" and "What do mathematicians do?", it is necessary to learn something of the logic, the language, and the philosophy of mathematics. This cannot be done by listening to a few entertaining lectures, but only by active contact with the content of real mathematics. It is the authors' hope that this book will provide the means for this necessary contact. For most people, the road from marketplace arithmetic to the border of real mathematics is long and steep. It usually takes several years to make this journey. Fortunately, because of the improving curriculum in high schools, many students are completing the elementary mathematics included in algebra, geometry, and trigonometry before entering college, so that as college freshmen they can begin to appreciate the attractions of sophisticated mathematical ideas.
Many of these students have even been exposed to the new programs for school mathematics which introduce modern mathematical ideas and methods. Too often, such students are shunted into a college algebra or elementary calculus course, where the main emphasis is on mathematical formalism and manipulation. Any enthusiasm for creative thinking which a student may carry into college will quickly be blunted by such a course. It is often claimed that the manipulative skills acquired in elementary algebra and calculus are what a student needs for the application of mathematics to science and engineering, and indeed to the practical problems of life. Although not altogether wrong, this argument overlooks the obvious fact that in almost any situation, the ability to use mathematical technique and reasoning is more valuable than the ability to manipulate and calculate accurately.


Elementary college algebra and calculus courses usually cultivate manipulation at the expense of logical reasoning, and they give the student almost no idea of what mathematics is really like. It is often painfully evident to an instructor in, say, a senior level course in abstract algebra that the average mathematics major in his class has a very distorted idea of the nature of mathematics. The object of this book is to present in a form suitable for student consumption a small but important part of real mathematics. It is concerned with topics related to the principal number systems of mathematics. The book treats those topics of algebra which are basic for advanced studies in mathematics and of fundamental importance for all working mathematicians. This is the reason that we have entitled our book "The Algebraic Foundations of Mathematics." In accord with the philosophy that students should be taught mathematics by exposing them to the mathematics of professional mathematicians, the book should be useful not only to students majoring in mathematics, but also to adequately prepared students of any speciality. Since mathematics is a logical science, it is appropriate that any book on real mathematics should emphasize mathematical proofs. The student who masters the technique and acquires the habit of mathematical proof is well on his way toward understanding the nature of mathematics. Such a mastery is hard to achieve, but it is within the reach of a large percentage of the college population. This book is not intended to be an easy one. It is not meant for the college freshman with minimum preparation from high school. An apt student with three years of high school mathematics should be able to study most parts of the book with profit, but his progress may not be rapid.
Appropriate places for the use of this book include: a freshman course to replace the standard precalculus college algebra for students who will progress to a rigorous treatment of calculus, a terminal course for liberal arts students with a good background in mathematics, an elementary honors course for mathematics majors, a course to follow a traditional calculus course to develop maturity, and a refresher course for high school mathematics teachers. The book is written in such a way that the law of diminishing returns will not set in too quickly. That is, enough difficult material is included in most sections and chapters so that even the best students will be challenged. The student of more modest ability should keep this in mind in order to combat discouragement. Some sections digress from the main theme of the book. These are designated by a "star." For the most part, starred sections can be omitted without loss of continuity, although it may be necessary to refer to them for definitions. It should be emphasized that the starred sections are not the most difficult parts of the book. On the contrary, much of the material


in these sections is very elementary. A star has been attached to just those sections which are not sufficiently important to be considered indispensable, but which are still too interesting to omit. The complete book can be covered in a two semester or three quarter course meeting three hours per week. The following table suggests how the book can be used for shorter courses.

Course: College algebra
Time required: 1 semester, 3 hours; 1 quarter, 5 hours
Chapters:
    1 (Omit 1-3, 1-5, and starred sections)
    2 (Omit starred sections)
    4 (Omit 4-1)
    5 (Omit starred sections)
    6 (Omit 6-1, 6-4, 6-5)
    7 (Omit 7-1, 7-2, 7-3, 7-6, and starred sections)
    8
    9 (Omit starred section)
    10 (Omit 10-4)

Course: Development of the classical number systems
Time required: 1 semester, 3 hours; 1 quarter, 5 hours
Chapters:
    1 through 8 (Omit starred sections)

Course: Theory of equations
Time required: 1 semester, 2 hours; 1 quarter, 3 hours
Chapters:
    4 (Omit 4-1, 4-3, 4-5, 4-6)
    5 (Omit starred sections)
    6 (Omit 6-1, 6-4, 6-5)
    8
    9 (Omit starred section)
    10 (Omit 10-3, 10-4)

Course: Elementary theory of numbers
Time required: 1 semester, 2 hours; 1 quarter, 3 hours
Chapters:
    1 (Omit starred sections)
    2 (Omit starred sections)
    4
    5

Above all, this book represents an effort to show college students some of the real beauty of mathematics. The appreciation of mathematical beauty is not like the enjoyment of literature, music, and other art forms. It requires serious effort and hard study. It is much more difficult for a mathematician to explain his triumphs and masterpieces than for any other kind of artist or scientist. Consequently, most mathematicians do not try to interpret their work to the general public, but only communicate with


colleagues having similar interests. For this reason, a mathematician is often considered to be a rather aloof person who lives partly in this world and partly in some other mysterious realm. This is in fact a fairly accurate conception. However, the door to the world of mathematics is never locked, and anyone who will make the effort can enjoy the beauties of an intellectual domain which comes closer to aesthetic perfection than any other science. Acknowledgements. Writing a textbook is not a routine chore. Without the help of many people, we might never have finished this one. We are particularly indebted to Professors C. W. Curtis, R. A. Dean, and H. S. Zuckerman, who read most of the manuscript of this book, and gave us many valuable suggestions. Our publisher, Addison-Wesley, has watched over our work from beginning to end with remarkable patience and benevolence. The swift and expert typing of Mary Pierce is sincerely appreciated. Finally, we are grateful to many friends for sincere encouragement during the last two years, and especially to our wives, who have lived with us through these trying times.

Seattle, Washington January 1963

CONTENTS

1-1    Sets
1-2    The cardinal number of a set
1-3    The construction of sets from given sets
1-4    The algebra of sets
1-5    Further algebra of sets. General rules of operation
*1-6   Measures on sets
*1-7   Properties and examples of measures

2-1    Proof by induction
2-2    The binomial theorem
2-3    Generalizations of the induction principle
*2-4   The technique of induction
2-5    Inductive properties of the natural numbers
*2-6   Inductive definitions

3-1    The definition of numbers
3-2    Operations with the natural numbers
3-3    The ordering of the natural numbers

4-1    Construction of the integers
4-2    Rings
4-3    Generalized sums and products
4-4    Integral domains
4-5    The ordering of the integers
4-6    Properties of order

5-1    The division algorithm
5-2    Greatest common divisor
5-3    The fundamental theorem of arithmetic
*5-4   More about primes
*5-5   Applications of the fundamental theorem of arithmetic
5-6    Congruences
5-7    Linear congruences
*5-8   The theorems of Fermat and Euler

6-1    Basic properties of the rational numbers
6-2    Fields
6-3    The characteristic of integral domains and fields
6-4    Equivalence relations
6-5    The construction of Q

7-1    Development of the real numbers
7-2    The coordinate line
7-3    Dedekind cuts
7-4    Construction of the real numbers
7-5    The completeness of the real numbers
7-6    Properties of complete ordered fields
*7-7   Infinite sequences
*7-8   Infinite series
*7-9   Decimal representation
*7-10  Applications of decimal representations

8-1    The construction of the complex numbers
8-2    Complex conjugates and the absolute value in C
8-3    The geometrical representation of complex numbers
8-4    Polar representation

9-1    Algebraic equations
9-2    Polynomials
9-3    The division algorithm for polynomials
9-4    Greatest common divisor in F[x]
9-5    The unique factorization theorem for polynomials
9-6    Derivatives
9-7    The roots of a polynomial
9-8    The fundamental theorem of algebra
*9-9   The solution of third- and fourth-degree equations
9-10   Graphs of real polynomials
9-11   Sturm's theorem
9-12   Polynomials with rational coefficients

10-1   Polynomials in several indeterminates
10-2   Systems of linear equations
10-3   The algebra of matrices
10-4   The inverse of a square matrix

INTRODUCTION

As we explained in the preface, the purpose of this book is to exhibit a small, but significant and representative, part of the world of mathematics. The selection of a principal subject for this project poses difficulties similar to those which a blind man faces when he tries to discover the shape of an elephant by means of his "sense of feel." Only a few aspects of the subject are within reach, and it is necessary to exercise care to be sure the part examined is truly representative. We might select some important unifying concept of modern mathematics, such as the notion of a group, and explore the ramifications of this idea. Alternatively, an older and perhaps familiar topic can be examined in depth. It is this last more conservative program which will be followed. We will study the principal number systems of mathematics and some of the theories related to them. An attempt will be made to answer the question "what are numbers?" in a way which meets the standards of logical precision demanded in modern mathematics. This program has certain dangers. Familiarity with ordinary numbers hides subtle difficulties which must be overcome before it is even possible to give an exact definition of them. Checking the details in the construction of the various number systems is often tedious, especially for a student who does not see the point of this effort. On the other hand, the end products of this work, the real and complex number systems, are objects of great usefulness and importance in mathematics. Moreover, the development of these systems offers an opportunity to exhibit a wide variety of mathematical techniques and ideas, so that the student is exposed to a representative cross section of mathematics. It is customary in technical books to tell the reader what he will need to know in order to understand the text. A typical description of such requirements in mathematical textbooks runs as follows: "This book has no particular prerequisites.
However, the reader will need a certain amount of mathematical maturity." Usually such a statement means that the book is written for graduate students and seasoned mathematicians. Our prerequisites for understanding this book are more modest. The reader should have successfully completed two years of high-school algebra and a year of geometry. The geometry, although not an absolute prerequisite, will be very helpful. For certain topics in the chapters on the complex numbers and the theory of equations, a knowledge of the rudiments of trigonometry is assumed. We do not expect that the reader will have much "mathematical maturity." Indeed, one of the main purposes of this book is to put the reader in touch with mature mathematics. Some of the obstacles which a beginning student of mathematics faces seem more formidable than they really are. With a little encouragement almost any intelligent person can become a better mathematician than he would imagine possible. The purpose of the remainder of this introduction is to provide some encouraging words on a variety of subjects. It is hoped that our discussion will smooth the reader's way throughout the book. We suggest that this material be read quickly, then referred to later as it is needed.

The number systems. There are five principal number systems in mathematics: the natural numbers: 1, 2, 3, 4, etc.; the integers: 0, 1, -1, 2, -2, 3, -3, etc.; the rational numbers: 0, 1, -1, 1/2, -1/2, 2/3, -2/3, etc.; the real numbers: 0, 1, √2, -√2, π, -π, etc.; and the complex numbers: 0, 1, i, 1 + i, etc. With the possible exception of the complex numbers, each of these systems should be familiar. Indeed, the study of these number systems is the principal subject of arithmetic courses in elementary school and of algebra courses in high school. Of course, the names of these systems may not be familiar. For example, the integers are sometimes called whole numbers, and the rational numbers are often referred to as fractions. In this book the number systems will be considered at two levels. On the one hand, we will assume at least a superficial knowledge of numbers, and use them in examples from the first chapter on. On the other hand, Chapters 3, 4, 6, 7, and 8 each present a critical study of one of these systems. The reader has two alternatives. He can either skim the material in these chapters, relying on the knowledge of numbers which he already possesses, or he can study these chapters in detail.
The latter road is longer and more tedious, but it leads to a very solid foundation for advanced courses in mathematical analysis.

Variables. If a single event can be called the beginning of modern mathematics, then it may possibly be the introduction of variables as a systematic notational device. This innovation, due largely to the French mathematician François Viète (1540-1603), occurred about 1590. Without variables, mathematics would not have progressed very far beyond what we now think of as its "beginnings." By using variables it is possible to express complicated properties of numbers in a very simple way. Basic laws of operation, such as

    x + y = y + x    and    x + (y + z) = (x + y) + z,

can be stated without using the variables x, y, and z, but the resulting statements lack the clarity of these algebraic identities. For example,

the statement that "the product of a number by the sum of two other numbers is equal to the sum of the product of the first number by the second with the product of the first number by the third" is more simply and clearly expressed by the identity

    x(y + z) = xy + xz.

More complicated laws would be almost impossible to state without using variables. The reader who doubts this should try to express in words the relatively simple identity

The variables which are encountered in high-school algebra courses usually range over systems of numbers; that is, it is intended that these variables stand for real numbers, rational numbers, or perhaps only for integers. However, variable symbols are often useful in other contexts. For example, the symbols l and m in the statement "if l and m are two different nonparallel lines, then l and m have exactly one point in common" are variables, representing arbitrary lines in a plane. In this book variables will be used to denote many kinds of objects. However, in all cases, a variable is a symbol which represents an unspecified member of some definite collection of objects, such as numbers, points, or lines. The given collection is called the range of the variable, and a particular object in the range is called a value of the variable. The notations used for variables in mathematical literature often puzzle students. In the simplest cases, the letters of the alphabet are used as variable symbols. However, some mathematical statements involve a very large number of variables, and in some cases, even infinitely many. To accommodate the need for many variables, letter symbols with subscripts are usually employed, for example, x_1, x_2, x_7, y_3, z_15, a_2, b_7, etc. Sometimes double subscripts are more convenient than single ones. Thus, we find expressions such as x_{1,1}, x_{2,7}, etc. Variable symbols are often used to denote a subscript on a variable letter. For instance x_i, y_j, a_k, z_{i,j}, etc. In these cases, the variable subscript is usually assumed to stand for a natural number, or possibly an integer.

Mathematical language. One of the difficulties in learning mathematics is the language barrier. Not only must the student master many new concepts and the names of these concepts, but he must also learn numerous abbreviations and symbols for common words.
Except for the use of abbreviations, the grammar of mathematics is the same as that of the language in which it is written.


A sentence in mathematical writing is any expression which is a meaningful assertion, either true or false. According to this definition, such formulas as 2.'2 - 2, 1 = 2, 1 + 1 < 3 - 2, and 0 = 0

must be counted as sentences. Sentences may contain variables. For example, the statement "There is a real number x such that xlOO25x7 500 = 0" is an assertion which is either true or false, 57~5~ although it is not obvious which is the case.* There are other expressions of importance in mathematics which cannot be called sentences. These are formulas, such as

    x + y = 1,    x^2 + 2x + 1 = 0,    and    x > 2,

and expressions which have the form of sentences, except that variables occur in place of the subject or object; for example, "x is an integer" and "2 divides n." Expressions such as these are called sentential functions. They have the property that substituting numerical values (or whatever objects the variables represent) for the variables converts them into sentences. For instance, by suitable substitutions for x and y, the sentential function x + y = 1 is transformed into the sentences

It makes no sense to ask whether or not a formula such as x + y = 1 is true. For some values of x and y it is true; for others it is not. On the other hand, the formula x + y = y + x has the property that every substitution of numbers for x and y leads to a true sentence. Such a sentential function is usually called an identity. Sentential functions which are not formulas may also have the property of being true for all values of the variables occurring in them. For example, the statement "either x < y, or y ≤ x" has this property of universal validity, provided that it is understood that x and y are variables which range over real numbers. A sentential function which is true for all values of the variables in it is said to be identically true or identically valid (the adjective "identically" is sometimes omitted).

Implications. Many beginning students of mathematics have trouble understanding the idea of logical implication. As many as one-half of all statements in a mathematical proof may be implications, that is, of the form "p implies q," where p and q are sentences or sentential functions.
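The notions of sentence, sentential function, and identity introduced above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names and the finite sample of integers are hypothetical choices, and checking a finite sample can refute an identity but cannot prove one.

```python
from itertools import product


def sum_is_one(x, y):
    """Sentential function 'x + y = 1'; it becomes a sentence once x and y are given."""
    return x + y == 1


def addition_commutes(x, y):
    """Sentential function 'x + y = y + x'."""
    return x + y == y + x


def either_less_or_not(x, y):
    """Sentential function 'either x < y, or y <= x'."""
    return x < y or y <= x


def holds_on_sample(fn, values):
    """True if fn yields a true sentence for every pair drawn from the sample."""
    return all(fn(x, y) for x, y in product(values, repeat=2))


# Hypothetical finite stand-in for the range of the variables.
sample = range(-5, 6)

print(sum_is_one(0, 1))  # substitution gives a true sentence
print(sum_is_one(2, 2))  # substitution gives a false sentence
print(holds_on_sample(sum_is_one, sample))          # not true for all values
print(holds_on_sample(addition_commutes, sample))   # true on the whole sample
print(holds_on_sample(either_less_or_not, sample))  # true on the whole sample
```

The first two calls show substitution turning a sentential function into a sentence; the last three show that only some sentential functions survive every substitution in the sample.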

* It is true.


TABLE 1

Form                                  Example
p implies q                           x positive implies that x is nonnegative
q is implied by p                     x nonnegative is implied by x being positive
if p, then q                          if x is positive, then x is nonnegative
q if p                                x is nonnegative if x is positive
p only if q                           x is positive only if x is nonnegative
only if q is p                        only if x is nonnegative is x positive
p is a sufficient condition for q     x positive is a sufficient condition for x to be nonnegative
for q it is sufficient that p         for x to be nonnegative, it is sufficient that x be positive
q is a necessary condition for p      x nonnegative is a necessary condition for x to be positive
for p it is necessary that q          for x to be positive, it is necessary that x be nonnegative

For this reason, it is important to be able to recognize an implication, and to understand what it means. The variety of ways in which mathematicians say "p implies q" is often bewildering to students. The expressions "x = 1 implies x is an integer"; "if x = 1, then x is an integer"; "x = 1 only if x is an integer"; "x = 1 is a sufficient condition for x to be an integer"; and "for x to equal 1, it is necessary that x be an integer" all have the same meaning. Such statements as these occur repeatedly in any book or paper on mathematics. For the reader's convenience, we list in Table 1 some of the forms in which "p implies q" may be written, together with examples of these locutions. If p and q are both sentences, then the implication "p implies q" is a sentence; if either p, or q, or both p and q are sentential functions, then "p implies q" is a sentential function. In case "p implies q" is a sentence, then its truth is completely determined by the truth or falsity of p and q. Specifically, this implication is true either if p is false or q is true. It is false only if p is true and q is false. For example, "3 = 3 implies 1 < 3" is true, "3 = 2 implies 1 < 2" is true, "3 = 1 implies 1 < 1" is true, but "3 = 3 implies 1 < 1" is false. It may seem strange to consider a sentence "p implies q" to be true even though there is no apparent connection between p and q. The idea which the statement "p implies q" usually conveys is that the validity of the


TABLE 2

Form                                               Example
p is equivalent to q                               x = y is equivalent to x + 1 = y + 1
p if and only if q                                 x = y if and only if x + 1 = y + 1
p is a necessary and sufficient condition for q    x = y is a necessary and sufficient condition for x + 1 = y + 1
p implies q, and conversely                        x = y implies x + 1 = y + 1, and conversely

sentence q is somehow a consequence of the truth of p. It is hard to see how the truth of such an implication as "3 = 1 implies 1 < 1" fits this conception. Our convention concerning the truth of an implication becomes more understandable when we consider how a sentence of the form "p implies q" may be obtained from a sentential function by substitution of numerical values for the variables. For example, the implication "y + 2 = x implies y < x" is a sentential function which everyone would agree is identically valid. That is, it is true for all values of x and y. However, by substituting 1 for x and 1 for y, we obtain the sentence "3 = 1 implies 1 < 1," whose truth was previously admitted only with reluctance.

Converse and equivalence. From any statements p and q it is possible to form two different implications, "p implies q" and "q implies p." Each of these implications is called the converse of the other. An implication does not ordinarily have the same meaning as its converse. For example, the converse of the statement "if n > 0, then n^2 > 0" is the implication "if n^2 > 0, then n > 0." These assertions obviously have different meanings. In fact, the first statement is identically true, whereas the second statement is not true for all n; for example, (-1)^2 = 1 > 0 and -1 < 0. If the implication "p implies q" and its converse "q implies p" are both true, then the statements p and q are said to be equivalent. In practice, the notion of equivalence of p and q is most frequently applied when p and q are sentential functions. For example, if x and y are variables which range over numbers, then the formulas x = y and x + 1 = y + 1 are equivalent, since "x = y implies x + 1 = y + 1" and "x + 1 = y + 1 implies x = y" are identically valid. There are various ways of saying that two statements p and q are equivalent. Most of these forms are derived from the terminology for implications. Several examples are given in Table 2.
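The truth convention for implications, and the difference between an implication and its converse, can be checked mechanically. The following Python sketch is a hedged illustration: the helper `implies` and the finite sample of integers are assumptions introduced here, and a finite check only samples the range of the variables.

```python
def implies(p, q):
    """Truth value of 'p implies q': false only when p is true and q is false."""
    return (not p) or q


# The four sentences used as examples in the text.
print(implies(3 == 3, 1 < 3))  # true hypothesis, true conclusion
print(implies(3 == 2, 1 < 2))  # false hypothesis, true conclusion
print(implies(3 == 1, 1 < 1))  # false hypothesis, false conclusion
print(implies(3 == 3, 1 < 1))  # true hypothesis, false conclusion

# Hypothetical finite sample standing in for the range of the variables.
sample = range(-10, 11)

# 'y + 2 = x implies y < x' holds for every substitution in the sample.
forward_ok = all(implies(y + 2 == x, y < x) for x in sample for y in sample)

# 'if n > 0, then n^2 > 0' holds on the sample, but its converse does not.
statement_ok = all(implies(n > 0, n * n > 0) for n in sample)
converse_counterexamples = [n for n in sample if not implies(n * n > 0, n > 0)]

print(forward_ok, statement_ok)
print(converse_counterexamples)  # the negative integers in the sample
```

Every negative integer in the sample is a counterexample to the converse, in agreement with the value n = -1 used in the text.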


TABLE 3

p        q        not p    not q    p implies q    not q implies not p
true     true     false    false    true           true
true     false    false    true     false          false
false    true     true     false    true           true
false    false    true     true     true           true

Contrapositive and inverse. In addition to the implication "p implies q" and its converse "q implies p," two other implications can be formed using p and q. These are "not q implies not p" and "not p implies not q." The implication "not q implies not p" is called the contrapositive of "p implies q," while "not p implies not q" is called the inverse of "p implies q." For example, the contrapositive of the statement "if x = 1, then x is an integer" is the implication "if x is not an integer, then x is not equal to 1." It is easy to see that the contrapositive of "p implies q" is true under exactly the same circumstances that this implication is itself true. The most convincing way to demonstrate this fact is to make a table listing all of the possible combinations of truth values of any two sentences p and q, together with the corresponding truth or falsity of "p implies q" and its contrapositive (Table 3). The entries in the fifth column of Table 3 are determined by the combinations of true and false in the first two columns, while the entries of the last column are determined from the combinations which occur in the third and fourth columns. Of course, the entries of the third column are just the opposite of those in the first column, and a similar relation exists between the fourth and second columns. The fact that an implication is logically the same as its contrapositive is often very useful in mathematical proofs. Sometimes, rather than proving a statement of the form "p implies q," it is easier to prove the contrapositive "not q implies not p." This is logically acceptable. Also, if we wish to prove that p and q are equivalent, that is, "p implies q" and "q implies p" are valid, it is permissible to establish that "p implies q" and "not p implies not q." This is because "not p implies not q" is the contrapositive of "q implies p." However, beware; it is not correct to claim that if p implies q and not q implies not p, then p is equivalent to q.
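The comparison carried out in Table 3 can be reproduced by enumerating the four combinations of truth values. This Python sketch is an illustration only; the helper `implies` and the variable names are hypothetical and not part of the text.

```python
from itertools import product


def implies(p, q):
    """Truth value of 'p implies q': false only when p is true and q is false."""
    return (not p) or q


# One row per combination of truth values:
# (p, q, p implies q, contrapositive, converse, inverse)
rows = []
for p, q in product([True, False], repeat=2):
    rows.append((
        p,
        q,
        implies(p, q),          # p implies q
        implies(not q, not p),  # contrapositive
        implies(q, p),          # converse
        implies(not p, not q),  # inverse
    ))

for row in rows:
    print(row)

# An implication always agrees with its contrapositive.
contrapositive_agrees = all(r[2] == r[3] for r in rows)
# The inverse is the contrapositive of the converse, so those two agree as well.
inverse_matches_converse = all(r[4] == r[5] for r in rows)
# The implication and its inverse do not agree in every row.
inverse_agrees = all(r[2] == r[5] for r in rows)

print(contrapositive_agrees, inverse_matches_converse, inverse_agrees)
```

The row with p true and q false is the one in which the implication is false while its inverse is true, which is why proving the inverse says nothing about the original implication.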
Definitions. Simple mathematical proofs often consist of nothing more than showing that the conditions of some definition are satisfied. Nevertheless, beginning students frequently find such arguments difficult to understand. Consider, for example, the problem of showing that 222 is


From this we can infer such formulas as

The logical structure of a mathematical proof may have one of two forms. The direct proof starts from certain axioms or definitions, and proceeds by application of logical rules to the required conclusion. The second method, the so-called indirect proof, is perhaps less familiar, even though it is often used unconsciously in everyday thinking. The indirect proof begins by assuming that the statement to be proved is false. Then, using this assumption, together with the appropriate axioms and definitions, a contradiction of some kind is obtained by means of a logical argument. From this contradiction it is inferred that the statement originally assumed to be false must actually be true. For example, let us show by an indirect proof that there is no largest natural number. This proof uses three general properties of numbers, which, for our purposes, can be considered as axioms:
(a) if n is a natural number, then n + 1 is a natural number;
(b) n < n + 1;
(c) if n < m, then n ≥ m is impossible.
Our indirect proof begins with the assumption that the statement to be proved is false, that is, we assume that there is a largest natural number. Let this number be denoted by n. To say that n is the largest natural number means two things:
(i) n is a natural number;
(ii) if m is a natural number, then n ≥ m.
Applying the rule of detachment to (a) and (i) gives
(iii) n + 1 is a natural number.
Substituting n + 1 for m in (ii), we obtain
(iv) if n + 1 is a natural number, then n ≥ n + 1.
The rule of detachment can now be applied to (iii) and (iv) to conclude that
(v) n ≥ n + 1.
However, substituting n + 1 for m in (c) gives
(vi) if n < n + 1, then n ≥ n + 1 is impossible.
This, together with (b) and the rule of detachment, yields
(vii) n ≥ n + 1 is impossible.
The statements (v) and (vii) provide the contradiction which completes this typical indirect proof.
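For readers who want to see the same indirect argument checked by a machine, here is a minimal sketch in Lean 4. It rests on assumptions that are not part of the text: Lean's built-in type `Nat` (which starts at 0 rather than 1) stands in for the natural numbers, the theorem name is hypothetical, and the core lemma `Nat.not_succ_le_self` plays the combined role of properties (b) and (c).

```lean
-- There is no largest natural number: assume one exists and derive a contradiction.
theorem no_largest_nat : ¬ ∃ n : Nat, ∀ m : Nat, n ≥ m := by
  -- Assume n is a largest natural number; h is statement (ii).
  intro ⟨n, h⟩
  -- Substitute n + 1 for m in (ii); this is statement (v).
  have hv : n ≥ n + 1 := h (n + 1)
  -- n + 1 ≤ n is impossible; this is statement (vii), giving the contradiction.
  exact Nat.not_succ_le_self n hv
```

Statements (i) and (iii) do not appear explicitly because the type `Nat` already guarantees that n and n + 1 are natural numbers.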
In spite of the elementary character of the logic used by mathematicians, it is a matter of experience that understanding proofs is the most difficult aspect of mathematics. Most people, mathematicians included, must work hard to follow a difficult proof. The statements follow each other relentlessly. Each step requires logical justification, which may not be easy to find. The result of this labor is only the beginning. After the step-by-step correctness of the argument has been checked, it is necessary to go on and find the mathematical ideas behind the proof. Truly, real mathematics is not easy.

CHAPTER 1

SET THEORY
1-1 Sets. The notion of a set enters into all branches of modern mathematics. Algebra, analysis, and geometry borrow freely from elementary set theory and its terminology. Indeed, all of mathematics can be founded on the theory of sets. As is to be expected, an idea with such a wide range of application is quite simple, and any intelligent person can learn enough about set theory for most useful applications of the subject. The central idea of set theory is that of dealing with a collection of objects as an individual thing. Mathematics is not alone in using this idea, and many occurrences of it are found in everyday experience. Thus, for example, one speaks of the Smith family, meaning the collection of people consisting of John Smith, his wife Mary, and their son William. Also, if we referred to Mrs. Smith's wardrobe, we would be treating as a single thing the collection of individual pieces of clothing belonging to Mrs. Smith. The mathematical use of this device of lumping things together into a single entity differs from common usage only in the frequency and systematic manner of its application.

DEFINITION 1-1.1. A set is an entity consisting of a collection of objects. *


* This statement cannot be considered as a mathematical definition of the term "set." In mathematics, a definition is supposed to completely identify the object being defined. Here we have only supplied the synonym "collection" for the less familiar term "set." The problem of finding a satisfactory mathematical definition is far more difficult than it might seem. The uncritical use of sets can lead to contradictions which are avoided only by imposing restrictions on the naive concept of a set. Finding a definition of "set" which is free from contradictions and which satisfies all mathematical needs has for 75 years been a central problem of the logical foundations of mathematics. Fortunately, these difficult aspects of set theory can be ignored in almost all mathematical applications of the theory.

Two sets are considered to be the same if they contain exactly the same objects. When this is the case, we say that the sets are equal. The objects belonging to a set are called the elements of the set. A set is usually determined by some property which the elements of the set have in common. In the example given above, the property of being a piece of clothing belonging to Mrs. Smith defines the set which we call Mrs. Smith's wardrobe. It should be emphasized that in thinking of a collection of objects as a set, no account is taken of the arrangement of the objects or any relations between them. Thus, for example, a deck of 52 cards, considered as a set, remains the same whether it is in its original package or is shuffled and distributed into four bridge hands.
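The remark about the deck of cards can be tried out on a computer. The following is only an illustrative sketch: the card labels and the variable names are assumptions made for the example, and Python's built-in `set` type is used as a stand-in for the informal notion of a set.

```python
import random

# Hypothetical labels for the 52 cards; any distinct labels would do.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]

# The deck in its "original package" order.
deck_in_package = [(rank, suit) for suit in suits for rank in ranks]

# Shuffle a copy and deal it into four bridge hands of 13 cards each.
shuffled = deck_in_package[:]
random.shuffle(shuffled)
hands = [shuffled[i::4] for i in range(4)]

# Considered as sets, the packaged deck and the dealt deck are equal:
# arrangement plays no role in deciding which set is meant.
deck_as_set = set(deck_in_package)
dealt_as_set = set().union(*hands)
same_set = deck_as_set == dealt_as_set
```

Whatever order the shuffle produces, `same_set` is true, since equality of sets depends only on which elements are present.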

EXAMPLE 1. The set consisting of the numbers 0 and 1.
EXAMPLE 2. The set of numbers which are roots of the equation $x^2 - x = 0$.
EXAMPLE 3. The set of numbers which are roots of the equation $x^3 - x^2 = 0$.
EXAMPLE 4. The set of numbers $a$ on the real line (Fig. 1-1) which satisfy $-1 \leq a \leq 1$.
EXAMPLE 5. The set of all numbers $x/2$, where $x$ is a real number which satisfies $-2 \leq x \leq 2$.
EXAMPLE 6. The set of all points at a distance less than one from a point $p$ in some plane.
EXAMPLE 7. The set of all points inside a circle of radius one with center at the point $p$ in the plane of the preceding example.
EXAMPLE 8. The set of all circles with center at the point $p$ in the plane of Example 6.
EXAMPLE 9. The set consisting of the single number 0.
EXAMPLE 10. The set which contains no objects whatsoever.

According to our definition of equality of sets, we see that the sets of Examples 1, 2, and 3 are the same. Although 0 occurs as a so-called double root of the equation $x^3 - x^2 = 0$ in Example 3, only its presence or absence matters when speaking of the set of roots. The sets of Examples 4 and 5 are the same, as are those of Examples 6 and 7. If we consider a circle to be the same thing as the set of all of its points, then the elements of the set of Example 8 are themselves sets. Sets of sets will be studied more thoroughly in Section 1-5. The set described in Example 9 contains a single element. Such sets are quite common. It is conventional to regard such a set as an entity which is different from the element which is its only member. Even in ordinary conversation this distinction is often made. If Robert Brown is a bachelor with no known relations, then we would say that the Brown family consists of one member, but we would not say that Mr. Brown consists of one member. The reader may feel that the set of Example 10 does not satisfy the description given in Definition 1-1.1. However, it is customary in mathematics to interpret the term "collection" in such a way that this notion includes the collection of no objects. Actually, the set containing no elements arises quite naturally in many situations. For instance, in considering the sets of real numbers which are roots of algebraic equations, it would be awkward to make a special case for equations like $x^2 + 1 = 0$, which has no real roots. Because of its importance, the set containing no elements has a special name, the empty set, and it is represented by a special symbol, $\Phi$. When it is necessary to call attention to the fact that a set $A$ is not the empty set, then we will say that $A$ is nonempty.

One reason that set theory is used in so many branches of mathematics is the versatility of its notation. As anyone who has studied elementary algebra might expect, the letters of the alphabet are used to represent sets. In this book, sets will be represented by capital letters, and the elements of sets will usually be represented by small letters. The statement that an object $a$ is an element of a set $A$ is symbolized by

$$a \in A.$$

We read the expression $a \in A$ as "a is in A," or sometimes, "a in A." To give a specific example, let $A$ be the set of roots of the equation $x^2 - x = 0$ (Example 2). Then

$$0 \in A, \qquad 1 \in A.$$

We often wish to express the fact that an object is not contained in a certain set. If $a$ is not an element of the set $A$, we write

$$a \notin A,$$

and read this expression "a is not in A." Thus if $A$ again is the set of Example 2, we would have $2 \notin A$, $3 \notin A$, $4 \notin A$, etc. It was mentioned earlier that a set is often defined by some property possessed by its elements. There is a very useful notational device in set theory which gives a standard method of symbolizing the set of all objects having a certain property. For instance, the sets of Examples 2 and 4 are respectively written

$$\{x \mid x^2 - x = 0\} \qquad \text{and} \qquad \{a \mid -1 \leq a \leq 1\}.$$
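The membership relation and the description of a set by a property of its elements both have close analogues in programming languages. The sketch below uses Python set comprehensions. Since a program cannot search all real numbers, it assumes that the candidates are the integers from -10 to 10; this loses nothing for these two equations, whose only roots are 0 and 1. The names `candidates`, `A`, and `cubic_roots` are chosen for this illustration only.

```python
# Assumed finite pool of candidates standing in for "all numbers".
candidates = range(-10, 11)

# The set of Example 2: numbers x with x^2 - x = 0.
A = {x for x in candidates if x**2 - x == 0}

# The set of Example 3: numbers x with x^3 - x^2 = 0.
cubic_roots = {x for x in candidates if x**3 - x**2 == 0}

# Membership and non-membership.
zero_is_in_A = 0 in A
two_is_not_in_A = 2 not in A

# The double root 0 of the cubic is counted once, so both sets
# coincide with the set of Example 1.
examples_agree = A == cubic_roots == {0, 1}
```

The comprehension `{x for x in candidates if ...}` has the same two parts as the notation in the text: a variable element symbol, followed by a condition which that element must satisfy.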

The symbolic form $\{* \mid *\}$ is sometimes called the set builder. In using it, we replace the first asterisk by a variable element symbol ($x$ and $a$ in the examples), and the second asterisk is replaced by a meaningful condition which the object represented by the variable must satisfy to be an element of the set ($x^2 - x = 0$ and $-1 \leq a \leq 1$ in the examples). Thus, the set builder occurs in the form $\{x \mid \text{condition on } x\}$ (or with some variable other than $x$), and this expression represents the set of all elements which satisfy the stated condition. Often, the totality of possible objects for which the variable stands is evident from the condition required of the variable. For example, if the real roots of algebraic equations are under discussion, then in $\{x \mid x^2 - x = 0\}$, it is clear that $x$ stands for a real number, and that the set consists of the real numbers which satisfy $x^2 - x = 0$. In $\{a \mid -1 \leq a \leq 1\}$, it may not be clear what kind of numbers are allowed as values of the variable. If it is necessary to be more explicit, we would write

$$\{a \in R \mid -1 \leq a \leq 1\},$$

where $R$ is the set of all real numbers. Similarly, in Example 6, the set builder notation would be

$$\{q \in P \mid d(p, q) < 1\},$$

where $P$ is the set of all points in some plane and $d(p, q)$ is the distance between points $p$ and $q$ in the plane. Here, the variable $q$ can take as its value any point in the plane $P$, and the set in question consists of those points in $P$ which satisfy $d(p, q) < 1$. Other forms of the set builder notation for this example are

$$\{q \mid q \in P,\ d(p, q) < 1\} \qquad \text{and} \qquad \{q \mid q \in P \text{ and } d(p, q) < 1\}.$$

It is often convenient to use general symbols or expressions in place of the variable element in the set builder notation. For example, $\{x^2 \mid x \in R\}$, where $R$ is the set of all real numbers, is the set of all squares of real numbers; $\{x/y \mid x \in N, y \in N\}$, where $N$ is the set of all natural numbers, is the set of all positive fractions. A variation of the set builder notation can be used to denote sets which contain only a few elements. This consists of listing all of the elements of the set between braces. For example, the sets of Examples 1 and 9 would be written $\{0, 1\}$ and $\{0\}$, respectively. It is sometimes convenient to repeat the same element one or more times in the notation for the set. Thus, in Example 3, we might first write the set of roots of $x^3 - x^2 = x \cdot x \cdot (x - 1) = 0$ as $\{0, 0, 1\}$, since 0 is a double root. Of course, by the definition of equality, $\{0, 0, 1\} = \{0, 1\}$. The notation $\{0, 0, 1\}$ conveys no more information about $x^3 - x^2 = 0$ than $\{0, 1\}$ does. Similarly, $\{a, b, a, a, b\}$ represents the same set as $\{a, b\}$.

There is another good reason for allowing repetition of one or more elements in the notation for a set. Consider the example $\{a, b\}$. We can think of this as the set whose members are the letters $a$ and $b$. In mathematical applications, however, it is often convenient to regard $\{a, b\}$ as the set containing variable quantities $a$ and $b$. As such, it would become a specific set if particular values were substituted for $a$ and $b$. For example, if we allow natural numbers to be substituted for $a$ and $b$, then each choice of values for $a$ and $b$ determines a set whose members are these selected natural numbers. In this example, $a$ and $b$ may take on the same value. For instance, if $a = 1$, $b = 1$, then $\{a, b\} = \{1, 1\} = \{1\}$. If we did not allow repetition of the elements in designating sets, the collection of sets $\{a, b\}$ determined by substituting values for $a$ and $b$ would be considerably more difficult to describe. This difficulty would be increased in more complicated examples.

Sets containing many elements can often be represented by listing some of the elements between braces and using a sequence of dots to indicate omitted elements. For example, it is clear that
$$\{1, 2, 3, \ldots, 2165\}$$

represents the set of all natural numbers from 1 to 2165. Some infinite sets can also be represented in this way. For example,

$$\{1, 2, 3, \ldots\}$$

denotes the set of all natural numbers.

DEFINITION 1-1.2. A set $A$ is called a subset of the set $B$ (or $A$ is included in $B$) if every element of $A$ is an element of $B$.

It is customary to express the fact that $A$ is a subset of $B$ by writing $A \subseteq B$ or $B \supseteq A$. Any set $A$ is a subset of itself, $A \subseteq A$, according to this definition. If $A \subseteq B$, but $A \neq B$ (that is, $A$ is not the same as the set $B$), then $A$ is called a proper subset of $B$ and in this case we write $A \subset B$ or $B \supset A$. If $A$ is not a subset of $B$, we write $A \nsubseteq B$ or $B \nsupseteq A$.

EXAMPLE 11. The set of all even integers, $2Z = \{0, \pm 2, \pm 4, \ldots\}$, is a proper subset of the set $Z$ of all integers.
EXAMPLE 12. The set of all points at distance less than one from a point $p$ in a plane $P$ is a proper subset of the set of points of $P$ at distance less than or equal to one from $p$.
EXAMPLE 13. $\{0, 1\} \subset \{0, 1, 2, 4\}$.
EXAMPLE 14. $\{a \mid 0 < a \leq 1\} \subset \{a \mid 0 \leq a \leq 1\}$.
EXAMPLE 15. $\Phi \subseteq A$ for every set $A$.
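The condition of Definition 1-1.2 translates almost word for word into a short program. The sketch below is offered as an illustration under the assumption that the sets involved are finite; the function names `is_subset` and `is_proper_subset` are invented for the example.

```python
def is_subset(a, b):
    """Return True if every element of a is an element of b."""
    return all(x in b for x in a)


def is_proper_subset(a, b):
    """Return True if a is a subset of b and a is not equal to b."""
    return is_subset(a, b) and a != b


small = {0, 1}
large = {0, 1, 2, 4}
empty = set()

# Example 13: {0, 1} is a proper subset of {0, 1, 2, 4}.
example_13 = is_proper_subset(small, large)

# Example 15: the empty set is a subset of every set. The test
# all(...) over no elements is true, because there is no element
# of the empty set which fails to be in the other set.
example_15 = is_subset(empty, small) and is_subset(empty, empty)

# Every set is a subset of itself, but never a proper subset of itself.
reflexive = is_subset(small, small) and not is_proper_subset(small, small)
```

Python's own comparison operators `<=` and `<` on sets express the same two relations, so `small <= large` and `small < large` give the same answers as the functions above.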

The reader should carefully check to see that in each of these examples the condition of Definition 1-1.2 is satisfied. The fact that $\Phi$ is a subset of every set may seem strange. However, it is certainly true according to our definitions: every element of $\Phi$ is an element of $A$, or in other words, no element of $\Phi$ can be found which is not in $A$. Since $\Phi$ has no elements, this condition is certainly satisfied. The inclusion relation has three properties which, although direct consequences of our definition, are quite important. The first of these has already been noted.

THEOREM 1-1.3. For any sets $A$, $B$, and $C$,
(a) $A \subseteq A$,
(b) if $A \subseteq B$ and $B \subseteq A$, then $A = B$,
(c) if $A \subseteq B$ and $B \subseteq C$, then $A \subseteq C$.

Proof. We will prove property (b) in detail, leaving the proof of (c) to the reader. If $A \subseteq B$ and $B \subseteq A$, then every element of $A$ is an element of $B$ and every element of $B$ is an element of $A$. That is, $A$ and $B$ contain exactly the same elements. Thus, by Definition 1-1.1, $A = B$.

Certain sets occur so frequently in mathematical work that it is convenient to use particular symbols to designate them throughout any mathematical paper or book. An example is the practice of denoting the empty set by the symbol $\Phi$. In this book, the number systems of mathematics, considered as sets, will occur repeatedly. We therefore adopt the following conventions:

$N$ designates the set of all natural numbers: $\{1, 2, 3, \ldots\}$;
$Z$ designates the set of all integers: $\{\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots\}$;
$Q$ designates the set of all rational numbers: $\{a/b \mid a \in Z, b \in N\}$;
$R$ designates the set of all real numbers.

This notation, though not universal, would be recognized by most modern mathematicians. Throughout this book, the letters $N$, $Z$, $Q$, and $R$ will not be used to denote any set other than the corresponding ones listed above. In mathematical literature, a considerable amount of variation in notation can be found. The terminology and symbolism introduced in this section will be used in the remainder of this book, but it is by no means universal. For the reader's convenience, we list some common alternative terminology.

Set: class, ensemble, aggregate, collection.
Element of a set: member of a set, point of a set.
Empty set: void set, vacuous set, null set, zero set.
$\Phi$: $\emptyset$, $\Lambda$.
$a \in A$: $A \ni a$.
$a \notin A$: $a \in' A$, $a \mathrel{\bar{\in}} A$, $A \not\ni a$.
$\{* \mid *\}$: $\{* : *\}$, $[* \mid *]$, $[* : *]$.

1. Using the set builder form, write expressions for the following sets.
(a) The set of all even integers.
(b) The set of all integers which are divisible by five.
(c) The set of all integers which leave a remainder of one when divided by five.
(d) The set of all rational numbers greater than five.
(e) The set of all points in space which are inside a sphere with center at the point $p$ and radius $r$.
(f) The set of solutions of the equation $x^3 - 2x^2 - x + 2 = 0$.

2. Tell in words what sets are represented by the following expressions.
(a) $\{x \in N \mid x > 10\}$
(b) $\{x \in Q \mid x - 3 \in N\}$
(c) $\{5, 6, 7, \ldots\}$
(d) $\{a, b, \ldots, y, z\}$
(e) $\{x \mid x = y^2 + z^2,\ y \in R,\ z \in R,\ x^2 - y^2 = (x - y)(x + y)\}$

3. Describe the following sets by listing their elements.
(a) $\{x \mid x^2 = 1\}$
(b) $\{x \mid x^2 - 2x = 0\}$
(c) $\{x \mid x^2 - 2x + 1 = 0\}$

4. List the following collections of sets.
(a) The sets $\{a, b, c\}$, where $a$, $b$, and $c$ are natural numbers less than or equal to 3.
(b) The sets $\{a^2 + a + 1\}$, where $a$ is a natural number less than or equal to 5.
(c) The three element sets $\{a, b, c\}$, where $a$, $b$, and $c$ are integers between $-2$ and 4.

5. State all inclusion relations which exist between the following sets: $N$, $Z$, $Q$, $R$, the set of all even integers, $\{n \mid n = m^2, m \in Z\}$, $\{x \mid x = y^2, y \in Q\}$, $\{x \mid x = y^2 - n, y \in R, n \in Z\}$.

6. Prove Theorem 1-1.3(c).

7. Prove that if $A \subseteq B$, $B \subset C$, then $A \subset C$, and if $A \subset B$ and $B \subseteq C$, then $A \subset C$.


1-2 The cardinal number of a set. The simplest and most important classification of sets is given by the distinction between finite and infinite sets. Returning to the examples considered in Section 1-1, the set of Examples 1, 2, and 3, and the sets of Examples 9 and 10 are finite, while the sets of Examples 4 through 8 are infinite. Note that the empty set $\Phi$ is considered to be finite. It is not altogether easy to explain the difference between a finite and an infinite set, although almost everyone with some experience learns to distinguish finite from infinite sets. Roughly speaking, a finite set is either the empty set, or a set in which we can designate a first element, a second element, a third element, and so on, until at some stage we reach an $n$th element and find that there are no more left. Of course, the number $n$ of elements in the set may be one, two, three, four, . . . , a million, or any natural number whatsoever. A set is said to be infinite if it is not finite, that is, if its elements cannot be counted. Examples of infinite sets are the set $N$ of all natural numbers, the set $Z$ of all integers, the set $Q$ of all rational numbers, the set $R$ of all real numbers, etc. These examples show that some of the most important sets encountered in mathematics are infinite.

If $A$ is a finite set, it is meaningful to speak of the number of elements of $A$. This number is called the cardinal number (or cardinality) of the set $A$. We will use the notation $|A|$ to designate the cardinal number of $A$. As an example,

$$|\{0, 1, 4\}| = 3.$$

It was remarked above that the empty set $\Phi$ is regarded as being finite. Since $\Phi$ has no elements, it is natural to say that the cardinality of $\Phi$ is zero. Thus, in symbols, $|\Phi| = 0$. There are many synonyms for the expression "cardinal number of A." Besides the term "cardinality of A," which we have already mentioned, one finds such expressions as "power of A" and "potency of A." The notation $|A|$ for the cardinal number of the set $A$ is not universal either. The symbolism $\overline{\overline{A}}$ is perhaps even more common (but difficult to print and type), and such expressions as card $A$ or $N(A)$ can also be found.

These descriptions of finite and infinite sets, and of the cardinal number of a finite set, are too vague to be called mathematical definitions. Moreover, we have not said anything about the cardinal numbers of sets which are not finite. The first man to systematically study the cardinal number concept for arbitrary sets (both finite and infinite) was Georg Cantor (1845-1918). His researches have had a profound influence on all aspects of modern mathematics. In the remainder of this section, we will examine one of Cantor's most important ideas, and see in particular how it enables us to explain the concept of a finite set in more exact terms.


DEFINITION 1-2.1. A pairing between the elements of two sets A and B such that each element of A is matched with exactly one element of B, and each element of B is matched with exactly one element of A , is called a one-to-one correspondence between A and B.
The reader should study the following examples to be sure that he fully understands the meaning of the fundamental concept defined in Definition 1-2.1.

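EXAMPLE 1. Let $A = \{1, 2, 3\}$, and $B = \{a, b, c\}$. Then there are six possible one-to-one correspondences between $A$ and $B$:

$$\begin{array}{cccccc}
1 \leftrightarrow a & 1 \leftrightarrow a & 1 \leftrightarrow b & 1 \leftrightarrow b & 1 \leftrightarrow c & 1 \leftrightarrow c \\
2 \leftrightarrow b & 2 \leftrightarrow c & 2 \leftrightarrow a & 2 \leftrightarrow c & 2 \leftrightarrow a & 2 \leftrightarrow b \\
3 \leftrightarrow c & 3 \leftrightarrow b & 3 \leftrightarrow c & 3 \leftrightarrow a & 3 \leftrightarrow b & 3 \leftrightarrow a
\end{array}$$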
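The six correspondences of Example 1 can also be produced mechanically. The following sketch assumes that a pairing is stored as a Python dictionary sending each element of $A$ to its partner in $B$; the helper name `is_one_to_one_correspondence` is made up for this illustration.

```python
from itertools import permutations

A = [1, 2, 3]
B = ["a", "b", "c"]


def is_one_to_one_correspondence(pairing, first, second):
    """Check that each element of `first` is matched with exactly one
    element of `second`, and each element of `second` is matched with
    exactly one element of `first`."""
    every_element_of_first_is_used = set(pairing.keys()) == set(first)
    every_element_of_second_is_hit = set(pairing.values()) == set(second)
    no_partner_is_shared = len(set(pairing.values())) == len(pairing)
    return (every_element_of_first_is_used
            and every_element_of_second_is_hit
            and no_partner_is_shared)


# Each rearrangement of B, matched position by position with A,
# gives one pairing between the two sets.
correspondences = [dict(zip(A, image)) for image in permutations(B)]

number_found = len(correspondences)
all_valid = all(is_one_to_one_correspondence(p, A, B) for p in correspondences)
```

Here `number_found` is 6, in agreement with the example, because there are $3 \cdot 2 \cdot 1$ ways to choose the partners of 1, 2, and 3 in turn.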

EXAMPLE 2. It is impossible to obtain a one-to-one correspondence between the set $A = \{1, 2, 3\}$ and the set $B = \{1, 2\}$. No matter how we try to pair off the elements of $A$ and $B$, we find that more than one element of $A$ must correspond to a single element of $B$. If $1 \leftrightarrow 1$ and $2 \leftrightarrow 2$, then 3 must correspond to 1 or 2 in $B$. In the correspondence $1 \leftrightarrow 1$, $2 \leftrightarrow 2$, $3 \leftrightarrow 1$, the element $1 \in B$ is paired with both $1 \in A$ and $3 \in A$, so that the correspondence is not one-to-one. A similar situation occurs in all possible correspondences between $A$ and $B$. The more general fact that there is no one-to-one correspondence between $\{1, 2, 3, \ldots, m\}$ and $\{1, 2, 3, \ldots, n\}$ if $m < n$ is also true. This can be proved using the properties of the natural numbers, which will be discussed in the next two chapters.

EXAMPLE 3. There is a one-to-one correspondence between the set $Z = \{\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots\}$ of all integers and the set $N = \{1, 2, 3, \ldots\}$ of all natural numbers. The elements of $Z$ and $N$ can be paired off as follows:

Note that in order to construct a one-to-one correspondence between $Z$ and $N$, not all of the numbers of $N$ can be paired with themselves. Otherwise, we would use up all of $N$ and have nothing left to associate with $0, -1, -2, \ldots$.
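One familiar way to pair off $Z$ with $N$ is to list the integers in the order $0, 1, -1, 2, -2, \ldots$ and to number them $1, 2, 3, 4, 5, \ldots$ as they appear. This particular scheme is an assumption made for illustration; many other pairings serve equally well. The sketch below, with the invented function names `nat_to_int` and `int_to_nat`, checks on a finite range that the two rules undo each other.

```python
def nat_to_int(k):
    """Integer paired with the natural number k (k = 1, 2, 3, ...).

    Even k is sent to k/2; odd k is sent to -(k - 1)/2,
    so 1, 2, 3, 4, 5 go to 0, 1, -1, 2, -2.
    """
    if k % 2 == 0:
        return k // 2
    return -(k // 2)


def int_to_nat(z):
    """Natural number paired with the integer z (the reverse rule)."""
    if z > 0:
        return 2 * z
    return 1 - 2 * z


# The first few pairs of the correspondence.
first_seven = [nat_to_int(k) for k in range(1, 8)]

# On a finite window, going there and back returns the starting value,
# which is what a one-to-one correspondence requires.
round_trip_from_n = all(int_to_nat(nat_to_int(k)) == k for k in range(1, 1001))
round_trip_from_z = all(nat_to_int(int_to_nat(z)) == z for z in range(-500, 501))
```

Only the natural number 2 is paired with itself in this scheme ($2 \leftrightarrow 1$ is not, but $4 \leftrightarrow 2$ shows how the positive integers are spread over the even numbers), which illustrates the remark that the numbers of $N$ cannot all be paired with themselves.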

Using Definition 1-2.1, we can clarify the notion of a finite set.

DEFINITION 1-2.2. Let $A$ be a set and let $n$ be a natural number. Then the cardinal number of $A$ is $n$ if there is a one-to-one correspondence between $A$ and the set $\{1, 2, 3, \ldots, n\}$, consisting of the first $n$ natural numbers. A set $A$ is finite if $A = \Phi$, or there is a natural number $n$ such that the cardinal number of $A$ is $n$. Otherwise $A$ is called infinite.

This definition is no more than a careful restatement of the informal descriptions of finite and infinite sets, and their cardinal numbers, which were given at the beginning of this section. The usual practice of writing a finite set in the form

$$\{a_1, a_2, a_3, \ldots, a_n\} \qquad \text{(without repetition)}$$

exhibits the one-to-one correspondence between $A$ and $\{1, 2, 3, \ldots, n\}$, namely,

$$a_1 \leftrightarrow 1, \quad a_2 \leftrightarrow 2, \quad a_3 \leftrightarrow 3, \quad \ldots, \quad a_n \leftrightarrow n.$$

Cantor observed that it is possible to say when two sets $A$ and $B$ have the same number of elements, without referring to the exact number of elements in $A$ and $B$. This idea is illustrated in the following example. Suppose that in a certain mathematics class, every chair in the room is occupied and no students are standing. Then without counting the number of students and the number of chairs, it can be asserted that the number of students in the class is the same as the number of chairs in the room. The reason is obvious; there is a one-to-one correspondence between the set of all students in the class and the set of all chairs in the room.

DEFINITION 1-2.3. Two sets $A$ and $B$ are said to have the same cardinal number, or the same cardinality, or to be equivalent if there exists a one-to-one correspondence between $A$ and $B$.

By Example 1, the two sets $\{1, 2, 3\}$ and $\{a, b, c\}$ have the same cardinality. By Example 3, so do the sets $N$ and $Z$. However, according to Example 2, the sets $\{1, 2, 3\}$ and $\{1, 2\}$ do not have the same cardinal number. In accordance with Definition 1-2.3, the existence of any one-to-one correspondence between $A$ and $B$ is enough to guarantee that $A$ and $B$ have the same cardinal number. As in Example 1, there may be many one-to-one correspondences between $A$ and $B$.

EXAMPLE 4. Every set $A$ is equivalent to itself, since $a \leftrightarrow a$ for $a \in A$ is obviously a one-to-one correspondence of $A$ with itself. If $A$ contains more than one element, then there are other ways of defining a one-to-one correspondence of $A$ with itself. For example, let $A = \{1, 2\}$. Then there are two one-to-one correspondences of $A$ with itself: $1 \leftrightarrow 1$, $2 \leftrightarrow 2$ and $1 \leftrightarrow 2$, $2 \leftrightarrow 1$. Any one-to-one correspondence of a set with itself is called a permutation of the set.

If $A = \{a_1, a_2, \ldots, a_n\}$ and $B = \{b_1, b_2, \ldots, b_n\}$ are finite sets which both have the cardinal number $n$, then there is a one-to-one correspondence between $A$ and $B$:

$$a_1 \leftrightarrow b_1, \quad a_2 \leftrightarrow b_2, \quad \ldots, \quad a_n \leftrightarrow b_n,$$

so that $A$ and $B$ are equivalent in the sense of Definition 1-2.3. That is, if $A$ and $B$ are finite sets, then $A$ and $B$ are equivalent if $|A| = |B|$. The important fact to observe about Definition 1-2.3 is that it applies to infinite as well as finite sets. One of Cantor's most remarkable discoveries was that infinite sets can have different magnitudes, that is, in some sense, certain infinite sets are "bigger than" others. To appreciate this fact requires some work. Example 3 has already illustrated the fact that infinite sets which seem to have different magnitudes may in fact have the same cardinality. An even more striking example of this phenomenon is the following one.

EXAMPLE 5. The set $N$ of natural numbers has the same cardinality as the set

$$F = \{m/n \mid m \in N, n \in N\}$$

of all positive rational numbers. This can be seen with the aid of a diagram as in Fig. 1-2. By following the indicated path, each fraction will eventually be passed. If we number the fractions in the order that they are encountered, skipping fractions such as $2/2$ and $3/3$, which are equal to numbers which have been previously passed, we get the desired one-to-one correspondence between $N$ and $F$:
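The numbering procedure of Example 5 can be imitated by a short program. The sketch below makes an assumption about the path: it visits the fractions $m/n$ in order of increasing $m + n$, and for equal sums in order of increasing $m$. The path drawn in Fig. 1-2 may differ, but any path which eventually passes every fraction leads to a correspondence in the same way. The generator name `positive_rationals` is invented for the illustration.

```python
from fractions import Fraction
from itertools import count, islice


def positive_rationals():
    """Yield every positive rational number exactly once.

    Fractions m/n are visited diagonal by diagonal (m + n = 2, 3, 4, ...).
    A fraction equal to a number already passed, such as 2/2 = 1/1,
    is skipped.
    """
    already_passed = set()
    for total in count(2):
        for m in range(1, total):
            n = total - m
            value = Fraction(m, n)  # Fraction reduces 2/4 to 1/2, etc.
            if value not in already_passed:
                already_passed.add(value)
                yield value


# The rationals paired with the natural numbers 1, 2, ..., 10.
first_ten = list(islice(positive_rationals(), 10))

# No value is repeated among the first 200 numbers produced.
first_two_hundred = list(islice(positive_rationals(), 200))
no_repeats = len(set(first_two_hundred)) == 200
```

With this path the natural numbers $1, 2, 3, \ldots$ are paired with $1, 1/2, 2, 1/3, 3, 1/4, 2/3, 3/2, 4, 1/5, \ldots$ in turn.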

Cantor showed that many important collections of numbers have the same cardinality as the set $N$ of all natural numbers. For example, this is true for the set $A$ of all real algebraic numbers, that is, real numbers $r$ which are solutions of an equation of the form

$$n_k x^k + n_{k-1} x^{k-1} + \cdots + n_1 x + n_0 = 0,$$

where $n_0, n_1, \ldots, n_{k-1}, n_k$ are any integers. The set $A$ includes all rational numbers, since $m/n$ is a root of the equation $nx - m = 0$; $A$ also includes numbers like

$$\sqrt{2},\ \sqrt{3},\ \sqrt[4]{3},\ \ldots.$$

Judging from these examples, one might guess that all infinite sets have the same cardinality. But this is not the case. Cantor proved that it is impossible to give a pairing between $N$ and the set $R$ of all real numbers. Later, we will be able to present Cantor's proof that these sets do not have the same cardinal numbers. The fact that the set $A$ of real algebraic numbers has the same cardinality as $N$ and the result that $R$ and $N$ do not have the same cardinal numbers together imply that $R \neq A$. That is, there are real numbers which are not solutions of any equation

$$n_k x^k + n_{k-1} x^{k-1} + \cdots + n_1 x + n_0 = 0$$

with $n_0, n_1, \ldots, n_{k-1}, n_k$ integers. This interesting fact is by no means evident. It is fairly hard to exhibit such a real number, but Cantor's results immediately imply that they do exist.

Although Cantor's work on the theory of sets was highly successful in many ways, it raised numerous new and difficult problems. One of these ranks among the three most famous unsolved problems in mathematics (the other two: the Fermat conjecture, which we will describe in Chapter 5, and the Riemann hypothesis, which is too technical to explain in this book). Cantor posed the problem of whether or not there is some set $S$ of real numbers whose cardinality is different from both the cardinality of $N$ and the cardinality of $R$. The conjecture that no such set $S$ exists is known as the continuum hypothesis. It was first suggested in 1878, and to date, it has been neither proved nor disproved.

An infinite set is called denumerable if it has the same cardinality as the set $N$ of all natural numbers. If $S$ is denumerable, then it is possible to pair off the elements of $S$ with the numbers $1, 2, 3, \ldots$. Thus, the elements can be labeled $a_1, a_2, a_3, \ldots$, where $a_n$ is the symbol which stands for the element corresponding to the number $n$. Hence, if $S$ is denumerable, then $S$ can be written $\{a_1, a_2, a_3, \ldots\}$, with the elements of $S$ listed in the form of a sequence. The converse statement is also true. That is, a set which can be designated $\{a_1, a_2, a_3, \ldots\}$ is denumerable (or possibly finite, since distinct symbols might represent the same element of $S$). As we have shown in this section, the set of all integers and the set of all positive rational numbers are examples of denumerable sets.

We conclude this section by listing for future reference the following important properties of the equivalence of sets.

(1-2.4). Let $A$, $B$, and $C$ be arbitrary sets. Then
(a) $A$ is equivalent to $A$;
(b) if $A$ is equivalent to $B$, then $B$ is equivalent to $A$; and
(c) if $A$ is equivalent to $B$ and $B$ is equivalent to $C$, then $A$ is equivalent to $C$.

It has already been noted in Example 4 that (a) is satisfied. Property (b) follows from the fact that the definition of a one-to-one correspondence is symmetric. That is, if $A$ and $B$ are interchanged in Definition 1-2.1, the definition says the same thing as before. Thus, a one-to-one correspondence between $A$ and $B$ is a one-to-one correspondence between $B$ and $A$. The proof of (c) is left as an exercise for the reader (see Problem 8).

1. State which of the following sets are finite.
(a) $\{(x, y, z) \mid x \in \{0, 1, 2\},\ y \in \{3, 4\},\ z \in \{0, 2, 4\}\}$
(b) $\{x \mid x \in Z,\ x < 5\}$
(c) $\{x \mid x \in N,\ x^2 - 3 = 0\}$
(d) $\{x \mid x \in Q,\ 0 < x < 1\}$

2. What is the cardinal number of the following finite sets?
(a) $\{n \mid n \in N,\ n < 1000\}$
(b) $\{n \mid n \in Z,\ n^2 \leq 86\}$
(c) $\{n^2 \mid n \in Z,\ n^2 \leq 36\}$
(d) $\{n \mid n \in N,\ n^3 \leq 27\}$
(e) $\{n^3 \mid n \in N,\ n^3 < 27\}$

3. Let $A = \{1, 2, 3, 4\}$ and $B = \{a, b, c, d\}$. List all one-to-one correspondences between $A$ and $B$.

4. Using the method by which we proved that the positive rational numbers have the same cardinality as the set $N$ of natural numbers, indicate how to prove that $N$ has the same cardinal number as the set of all pairs $(m, n)$ of natural numbers. List the pairs which correspond to all numbers up to 21.

5. Prove that the set of all rational numbers $Q$ has the same cardinality as the set of all natural numbers $N$.

6. Let $A$ be the set of all positive real numbers $x$, and let $B$ be the set of all real numbers $y$ satisfying $0 < y < 1$. Show that the pairing $x \leftrightarrow y$, where $y = 1/(1 + x)$, is a one-to-one correspondence between $A$ and $B$.

7. Let $A$ be a denumerable set, and let $B$ be a finite set. Show that the set $S = \{(x, y) \mid x \in A,\ y \in B\}$ is denumerable.

8. Suppose that sets $A$ and $B$ have the same cardinality, and that sets $B$ and $C$ have the same cardinality. Show that $A$ and $C$ have the same cardinality.

1-3 The construction of sets from given sets. In this section, we will discuss two important methods of constructing sets from given sets. The first process combines two sets $X$ and $Y$ to obtain a set called the product of $X$ and $Y$. The second construction leads from a single set $X$ to another set called the power set of $X$. There are several other methods of building sets from given sets, but they will not be considered in this book.

The definition of the product of two sets is based on the concept of an ordered pair of elements. Suppose that $a$ and $b$ denote any objects whatsoever. If the elements $a$ and $b$ are grouped together in a definite order $(a, b)$, where $a$ is the first element and $b$ is the second element, then the resulting object $(a, b)$ is called an ordered pair of elements. Two ordered pairs are the same if and only if they have the same first element and the same second element. Thus we arrive at the following definition.*

* There is a simple way to define an ordered pair in the framework of set theory, namely, for objects $a$ and $b$ let $(a, b) = \{\{a, b\}, a\}$. An ordered pair is then a definite object, and it is possible to prove that $(a, b) = (c, d)$ if and only if $a = c$ and $b = d$. However, we will use the informal description given in the text and regard this property of ordered pairs as a definition.

DEFINITION 1-3.1. $(a, b) = (c, d)$ if and only if $a = c$ and $b = d$.

EXAMPLE 1. Let $A = \{1, 2, 3\}$. Then the following distinct ordered pairs of elements of $A$ can be formed: $(1, 1)$, $(1, 2)$, $(1, 3)$, $(2, 1)$, $(2, 2)$, $(2, 3)$, $(3, 1)$, $(3, 2)$, $(3, 3)$. Note that $(a, b) = (b, a)$ only if $a = b$. By Definition 1-3.1, this is true in general.

EXAMPLE 2. A man has two pairs of shoes, one brown pair and one black pair. If he dresses in the dark, what are the possible combinations of shoes which he can put on? Let $X$ = {left brown shoe, left black shoe} and $Y$ = {right brown shoe, right black shoe}. Then the set of all possible combinations which the man might wear is the set of all ordered pairs with the first element taken from the set $X$ and the second element taken from the set $Y$, that is, the set of all pairs (left brown shoe, right brown shoe), (left brown shoe, right black shoe), (left black shoe, right brown shoe), (left black shoe, right black shoe).

EXAMPLE 3. The set of all ordered pairs of natural numbers is the set

$$S = \{(n, m) \mid n \in N,\ m \in N\}.$$

DEFINITION 1-3.2. Let $X$ and $Y$ be sets. Then the product of the sets $X$ and $Y$ is the set of all ordered pairs $(x, y)$, where $x \in X$ and $y \in Y$. The product of $X$ and $Y$ is denoted by $X \times Y$. Thus, in symbols,

$$X \times Y = \{(x, y) \mid x \in X,\ y \in Y\}.$$

EXAMPLE 4. The ordered pairs listed in Examples 1, 2, and 3 are exactly the elements of the products A × A, X × Y, and N × N, respectively.

EXAMPLE 5. Let U = {1, 2, 3}, V = {1, 3}, and W = {2, 3}. Then U × V = {(1, 1), (1, 3), (2, 1), (2, 3), (3, 1), (3, 3)}, and V × U = {(1, 1), (1, 2), (1, 3), (3, 1), (3, 2), (3, 3)}. It follows that U × V ≠ V × U. Thus, in forming the product of two sets, the order in which the sets are taken is significant. We also have

\[
\begin{aligned}
(U \times V) \times W = \{ & ((1,1),2),\ ((1,3),2),\ ((2,1),2),\ ((2,3),2),\ ((3,1),2),\ ((3,3),2), \\
 & ((1,1),3),\ ((1,3),3),\ ((2,1),3),\ ((2,3),3),\ ((3,1),3),\ ((3,3),3) \}, \\
U \times (V \times W) = \{ & (1,(1,2)),\ (1,(1,3)),\ (1,(3,2)),\ (1,(3,3)),\ (2,(1,2)),\ (2,(1,3)), \\
 & (2,(3,2)),\ (2,(3,3)),\ (3,(1,2)),\ (3,(1,3)),\ (3,(3,2)),\ (3,(3,3)) \}.
\end{aligned}
\]

Note that the elements of (U × V) × W are different from all of the elements of U × (V × W). In fact, the elements of (U × V) × W are ordered pairs whose first element is an ordered pair of numbers, and whose second element is a number. In U × (V × W) it is just the other way around: the elements are ordered pairs in which the first element is a number and the second element is an ordered pair of numbers. The reader must be careful to make a distinction between ((2, 1), 3) and (2, (1, 3)), for example.
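The products of Example 5 can be checked mechanically. The following is only a sketch, assuming Python's built-in sets and tuples as stand-ins for sets and ordered pairs; the helper name `cartesian` is an illustrative choice and does not come from the text.

```python
from itertools import product

U = {1, 2, 3}
V = {1, 3}
W = {2, 3}

def cartesian(X, Y):
    """Return the product of X and Y as a set of ordered pairs (tuples)."""
    return set(product(X, Y))

UV = cartesian(U, V)
VU = cartesian(V, U)

# Order matters: the two products contain different pairs ...
print(UV == VU)            # False
# ... but they have the same number of elements.
print(len(UV), len(VU))    # 6 6

# Nesting matters too: ((u, v), w) is not the same object as (u, (v, w)).
left = cartesian(UV, W)                 # elements look like ((2, 1), 3)
right = cartesian(U, cartesian(V, W))   # elements look like (2, (1, 3))
print(left.isdisjoint(right))           # True
print(len(left), len(right))            # 12 12
```

Representing pairs as tuples makes the distinction between ((2, 1), 3) and (2, (1, 3)) visible directly, since the two tuples compare as unequal.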


EXAMPLE 6. If X is any set, then X × ∅ = ∅ × X = ∅. Indeed, since the empty set contains no element, there cannot be any ordered pair whose first or second element belongs to the empty set.

Even though U × V ≠ V × U and (U × V) × W ≠ U × (V × W) in Example 5, it is true that U × V is equivalent to V × U and (U × V) × W is equivalent to U × (V × W), as we see by counting the elements in each of these sets. It is easy to prove that these results hold in general.

THEOREM 1-3.3. Let X, Y, and Z be sets. Then (a) X × Y is equivalent to Y × X, and (b) (X × Y) × Z is equivalent to X × (Y × Z).

Proof. We will prove (a) and leave the proof of (b) as an exercise for the reader. According to Definition 1-2.3, we must show that there is a one-to-one correspondence between X × Y and Y × X. Every element of X × Y is an ordered pair (x, y), with x ∈ X and y ∈ Y; every element of Y × X is an ordered pair (z, w), with z ∈ Y and w ∈ X. If (x, y) ∈ X × Y, then (y, x) ∈ Y × X, so that (x, y) can be matched with (y, x). The pairing (x, y) ↔ (y, x) is the desired one-to-one correspondence between X × Y and Y × X.
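For finite sets, the pairing used in the proof of part (a) can be tested directly. This sketch assumes two small sample sets chosen only for illustration; the function name `swap` is likewise hypothetical.

```python
from itertools import product

def swap(pair):
    """Send the ordered pair (x, y) to (y, x)."""
    x, y = pair
    return (y, x)

X = {"a", "b", "c"}   # hypothetical sample sets
Y = {0, 1}

XY = set(product(X, Y))
YX = set(product(Y, X))

image = {swap(p) for p in XY}
# Every element of the second product is matched with something.
print(image == YX)                           # True
# No two pairs are matched with the same pair.
print(len(image) == len(XY))                 # True
# Swapping twice returns the original pair, so the matching works both ways.
print(all(swap(swap(p)) == p for p in XY))   # True
```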

The definition of the product of two sets can be generalized to a finite collection of sets X_1, X_2, ..., X_n. The product of these sets, denoted by X_1 × X_2 × ... × X_n, is the set of all ordered strings of elements (x_1, x_2, ..., x_n), where x_i ∈ X_i for i = 1, 2, ..., n. For example, if n = 3, then

\[ X_1 \times X_2 \times X_3 = \{(x_1, x_2, x_3) \mid x_1 \in X_1,\ x_2 \in X_2,\ x_3 \in X_3\}. \]

EXAMPLE 7. Let U, V, and W be the sets defined in Example 5, that is, U = {1, 2, 3}, V = {1, 3}, and W = {2, 3}. Then

\[
\begin{aligned}
U \times V \times W = \{ & (1,1,2),\ (1,1,3),\ (1,3,2),\ (1,3,3),\ (2,1,2),\ (2,1,3), \\
 & (2,3,2),\ (2,3,3),\ (3,1,2),\ (3,1,3),\ (3,3,2),\ (3,3,3) \}.
\end{aligned}
\]

It is possible to generalize Theorem 1-3.3 to products of finite collections of sets (see Problems 6, 7, and 8). We turn now to a second method of obtaining a new set from a given set X.

DEFINITION 1-3.4. Let X be any set. The set of all subsets of X is called the power set of X, and is denoted by P(X).


Thus, the elements of P(X) are precisely the subsets of X. In particular, ∅ ∈ P(X) and X ∈ P(X).

EXAMPLE 8. If X = ∅, then P(X) = {∅}.

EXAMPLE 9. If X = {a}, then P(X) = {∅, {a}}.

EXAMPLE 10. If X = {a, b}, then P(X) = {∅, {a}, {b}, {a, b}}.

If X is an infinite set, then it has infinitely many distinct subsets. That is, P(X) is infinite if X is infinite. In fact, if x ∈ X, then {x} ∈ P(X), so that P(X) contains at least as many elements as X.

Suppose that X is a finite set. Let X_1 be a set which is obtained by adjoining to X a new element a which is not in X. That is, the elements of X_1 are all of the elements of X, together with the new element a. Then every subset of X_1 either does not contain a and is therefore a subset A of X, or else it contains a and is therefore obtained from a subset A of X by adjoining the element a to A. Thus, every subset A of X gives rise to two distinct subsets of X_1, the set A itself and the set A_1 obtained by adjoining a to A. Note that all of the sets so constructed are different. That is, A ≠ A_1, and if A ≠ B, then A ≠ B_1, A_1 ≠ B, and A_1 ≠ B_1. Therefore, there are just twice as many subsets of X_1 as there are subsets of X. That is, |P(X_1)| = 2|P(X)|. Starting with the empty set ∅ (for which |P(∅)| = 1), it is possible to add elements one by one, doubling the cardinality of the resulting power set each time an element is added, until a set X containing n elements is obtained. Our reasoning shows that the power set of X will contain 2^n elements.

THEOREM 1-3.5. If |X| = n, then |P(X)| = 2^n.
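The doubling argument translates almost word for word into a short program. The sketch below assumes that the elements supplied are distinct and hashable, and uses `frozenset` so that subsets can themselves be collected into a set; the name `power_set` is an illustrative choice.

```python
def power_set(elements):
    """Build P(X) by adjoining the (distinct) elements of X one at a time."""
    subsets = [frozenset()]            # the power set of the empty set has one member
    for a in elements:
        # Every old subset A yields two subsets: A itself and A with a adjoined.
        subsets = subsets + [A | {a} for A in subsets]
    return subsets

P = power_set(["a", "b"])
print(len(P))    # 4

# The list doubles at each step, so a set with n elements has 2**n subsets.
for n in range(6):
    assert len(set(power_set(range(n)))) == 2 ** n
```

Converting the result to a set inside the loop confirms that the subsets produced are all different, which is the point of the remark that A ≠ A_1, A ≠ B_1, and so on.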

There is another way to prove this theorem which is worth examining, since it gives additional information about the number of subsets of a finite set. Let X consist of the distinct elements a_1, a_2, ..., a_n. With each a_k in X, associate a symbol x_k, and consider the formal product

\[ (1 + x_1)(1 + x_2) \cdots (1 + x_n). \]

If this expression is multiplied out, the result is a sum of distinct products of the x_k's (except the first term, which is 1). There is exactly one such product for every subset {a_{i_1}, a_{i_2}, ..., a_{i_j}} of {a_1, a_2, ..., a_n}, namely x_{i_1} x_{i_2} ... x_{i_j}. The empty set corresponds to 1. For instance, if X = {a_1, a_2, a_3}, then

\[ (1 + x_1)(1 + x_2)(1 + x_3) = 1 + x_1 + x_2 + x_3 + x_1 x_2 + x_1 x_3 + x_2 x_3 + x_1 x_2 x_3. \]

Now replace each x_k by the symbol t. Then the product becomes (1 + t)(1 + t) ... (1 + t), while in its expansion, all products corresponding to sets containing the same number j of elements become t^j. In the example X = {a_1, a_2, a_3}, we obtain (1 + t)^3 = 1 + t + t + t + t^2 + t^2 + t^2 + t^3 = 1 + 3t + 3t^2 + t^3. As in this example, all of the terms t^j can be collected into a single expression of the form N_{j,n} t^j, where N_{j,n} is precisely the number of subsets of X which have cardinality j. Therefore

\[ (1 + t)^n = N_{0,n} + N_{1,n} t + N_{2,n} t^2 + \cdots + N_{n,n} t^n. \tag{1-1} \]

We can specialize even more by letting t have the value 1. Then the identity (1-1) becomes

\[ 2^n = N_{0,n} + N_{1,n} + N_{2,n} + \cdots + N_{n,n}. \tag{1-2} \]

The sum on the right-hand side of this identity represents the number of subsets containing no elements of X (the empty set), plus the number of subsets containing one element of X, plus the number of subsets containing two elements of X, and so on, until we reach N_{n,n}, the number of subsets containing n elements of X. Clearly, this sum is just the total number of subsets of X, since every subset contains some number of elements of X between zero and n. Thus, we have arrived at the same conclusion as before: there are exactly 2^n subsets of a set X with n elements.

By using the binomial theorem of algebra (see Section 2-2) to expand (1 + t)^n, it is possible to squeeze more information from identity (1-1). We get

\[ (1 + t)^n = 1 + nt + \frac{n(n-1)}{2} t^2 + \cdots + \frac{n(n-1)\cdots(n-j+1)}{j!} t^j + \cdots + t^n. \tag{1-3} \]

The coefficient of t^j is*

\[ \frac{n(n-1)\cdots(n-j+1)}{j!} = \frac{n!}{j!\,(n-j)!}. \]

* An exclamation mark (!) following a natural number n denotes the number obtained by multiplying together all the numbers from 1 to n. For example, 1! = 1, 2! = 1 · 2 = 2, 3! = 1 · 2 · 3 = 6, 4! = 1 · 2 · 3 · 4 = 24. It is also customary to define 0! to be 1. With this convention, the formulas for the binomial coefficients are correct in the cases j = 0 and j = n.


Comparing the identities (1-1) and (1-3) we see that

\[ N_{0,n} + N_{1,n} t + N_{2,n} t^2 + \cdots + N_{n,n} t^n = 1 + nt + \frac{n(n-1)}{2} t^2 + \cdots + t^n \]

for all numbers t. This leads us to expect that the coefficients of the same powers of t on each side of the equation must be equal. That is, N_{0,n} = 1, N_{1,n} = n, N_{2,n} = n(n - 1)/2, ..., N_{n,n} = 1, and in general

\[ N_{j,n} = \frac{n!}{j!\,(n-j)!}. \tag{1-4} \]

Later we will be able to prove that if

\[ a_0 + a_1 t + \cdots + a_n t^n = b_0 + b_1 t + \cdots + b_n t^n \]

for all values of t, then a_0 = b_0, a_1 = b_1, ..., a_n = b_n. This will justify (1-4). Thus, our somewhat longer proof of Theorem 1-3.5 yields the interesting fact that in a set X containing n distinct elements, there are n!/(j!(n - j)!) different subsets containing exactly j elements. For example, in a set containing ten elements, there are 10!/(4! 6!) = 210 subsets of cardinality 4.
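The count of subsets of each cardinality can be compared with the factorial formula by direct enumeration. This is a sketch only, assuming the standard-library `itertools.combinations` to list the j-element subsets; the function name `count_subsets_by_size` is hypothetical.

```python
from itertools import combinations
from math import factorial

def count_subsets_by_size(n):
    """Return [N_0, N_1, ..., N_n], counting the j-element subsets one by one."""
    X = range(n)
    return [sum(1 for _ in combinations(X, j)) for j in range(n + 1)]

n = 10
N = count_subsets_by_size(n)
formula = [factorial(n) // (factorial(j) * factorial(n - j)) for j in range(n + 1)]

print(N == formula)      # True
print(N[4])              # 210
print(sum(N) == 2 ** n)  # True
```

The last line is the identity obtained by setting t = 1: the counts for all cardinalities add up to the total number of subsets.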

1. List the elements of the sets A × B, B × A, (A × B) × C, and A × (B × C), where A = {x, y, z, w}, B = {1, 2}, and C = {a}.
2. Prove Theorem 1-3.3(b).
3. Let U = {1, 2}. Prove that U × N is equivalent to N, where N is the set of all natural numbers.
4. Prove that if U is a finite nonempty set and V is a denumerable set, then U × V is equivalent to V.
5. Prove that if U and V are denumerable sets, then the following sets are equivalent: U, V, U × V, V × U.
6. State the generalization of Theorem 1-3.3 for a finite collection of sets X_1, X_2, ..., X_n.
7. Prove that U × V × W is equivalent to (U × V) × W, where U, V, and W are arbitrary sets.
8. Prove that the following sets are equivalent: U × V × W, U × W × V, V × U × W, V × W × U, W × U × V, W × V × U.
9. For any set X, define X^n = X × X × ... × X (n factors), where n is a natural number. Show that if X = {1, 2}, and Y = {1, 2, ..., n}, then |X^n| = |P(Y)|.
10. List the elements of P(X), where X = {1, 2, 3, 4}.
11. Let X be a set with 7 objects. (a) How many subsets of X of cardinality at most three are there? (b) How many subsets of X of cardinality at least three are there?
12. (a) Let t = -1 in equation (1-1) and interpret the meaning of the resulting identity. (b) What is the number of subsets of even cardinality of a set containing n elements?
13. Show that if the sets A and B have the same cardinality, then so do P(A) and P(B).
14. Cantor proved that if X is an infinite set, then X and P(X) do not have the same cardinal number, that is, it is impossible to give a one-to-one correspondence between the elements of X and the elements of P(X). Prove this fact. [Hint: Suppose that a ↔ A is such a correspondence. Let B = {a | a ∈ X, a ↔ A and a ∉ A}. Show that if b ↔ B, then both b ∈ B and b ∉ B.]

1-4 The algebra of sets. The ordinary number systems satisfy several important laws of operation, such as a + b = b + a, a · (b · c) = (a · b) · c, and a · (b + c) = a · b + a · c. There are also natural operations of combining sets which satisfy rules analogous to these identities. Modern algebra is largely concerned with systems which satisfy various laws of operation, so it is natural that the algebra of sets should be a part of this subject. Our objective in this section is to study the principal operating rules for sets. The first two basic operations of set theory are analogous to addition and multiplication of numbers. They are binary operations, that is, they are performed on a pair of sets to obtain a new set.

DEFINITION 1-4.1. Let A and B be sets. Then

\[ A \cup B = \{x \mid x \in A \text{ or } x \in B\}, \qquad A \cap B = \{x \mid x \in A \text{ and } x \in B\}. \]

The set A ∪ B is called the union (or join or set sum) of A and B. The set A ∩ B is called the intersection (or meet or set product) of A and B. As we pointed out in the Introduction, the word "or" in mathematics is interpreted in the inclusive sense, so that the statement "x ∈ A or x ∈ B" includes the case where x is in both A and B. Thus, the union of A and B contains those elements which are in A, or in B, or in both A and B. The intersection of A and B contains those elements which are in both A and B.


These sets can be illustrated by means of simple pictures called Venn diagrams. The elements of the sets are represented by the points inside a closed curve in the plane. It should be emphasized that these diagrams are only symbolic, and that the elements of the sets which they represent are not necessarily points in the plane, but can be any objects whatsoever. In Fig. 1-3, the total shaded area is A ∪ B and the doubly shaded area is A ∩ B.

[FIGURE 1-3. Venn diagram of two overlapping sets A and B; the overlap is labeled A ∩ B.]

EXAMPLE 1. {1, 3, 4, 5, 7} ∪ {2, 3, 6} = {1, 2, 3, 4, 5, 6, 7}.

EXAMPLE 2. {1, 3, 4, 5, 7} ∩ {2, 3, 6} = {3}.

EXAMPLE 3. {1, 3, 4, 5, 7} ∩ {2, 6} = ∅.

EXAMPLE 4. {a | 0 < a < 1} ∪ {0, 1} = {a | 0 ≤ a ≤ 1}.

EXAMPLE 5. {a | 0 < a < 1} ∩ {a | 1/2 < a < 2} = {a | 1/2 < a < 1}.

EXAMPLE 6. If A is the set of all points of a line through the point p and if B is the set of all points of a second (different) line through p, then A ∩ B = {p}.

In most mathematical applications of set theory, all of the sets under consideration will be subsets of some particular set X. This set, called the universal set, may be different for different problems, but it will usually be fixed throughout any discussion. For the purposes of developing the algebra of sets, we will fix a universal set X once and for all. All of the sets under consideration are assumed to be subsets of X. The third basic operation of set theory is analogous to forming the negative of a number. It is a unary operation, that is, it is performed on a single set to obtain a new set.

DEFINITION 1-4.2. Let A be a subset of the set X. Then

\[ A^c = \{x \mid x \in X,\ x \notin A\}. \]

The set A^c is called the complement of A in X (or simply the complement of A if it is understood that A is being considered as a subset of the universal set X).


Thus, A^c consists of those elements of X which are not elements of A. There are many different notations in mathematical literature for the complement of a set A. Some which the reader may encounter are A', Ā, C(A), and c(A). In Fig. 1-4, the shaded area represents A^c.

EXAMPLE 7. Let X = {1, 2, 3, 4, 5}. Then {1, 3}^c = {2, 4, 5}.

EXAMPLE 8. Let X = {1, 2, 3}. Then {1, 3}^c = {2}.

EXAMPLE 9. Let X be the set of all real numbers. Then {a | a < 0}^c = {a | a ≥ 0}.

THEOREM 1-4.3. Let A, B, and C be subsets of X. Then the following identities are satisfied:
(a) A ∪ B = B ∪ A, A ∩ B = B ∩ A;
(b) A ∪ (B ∪ C) = (A ∪ B) ∪ C, A ∩ (B ∩ C) = (A ∩ B) ∩ C;
(c) A ∪ A = A, A ∩ A = A;
(d) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C), A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C);
(e) A ∪ A^c = X, A ∩ A^c = ∅;
(f) A ∪ X = X, A ∩ ∅ = ∅;
(g) A ∪ ∅ = A, A ∩ X = A.

Proof. All of these identities are simple consequences of the definition of union, intersection, and complement. We will illustrate this assertion by giving the detailed proofs of (b) and (d). The remaining identities are left for the reader to check.

Suppose that x ∈ A ∪ (B ∪ C). Then according to Definition 1-4.1, either x ∈ A or x ∈ B ∪ C. Suppose that x ∈ A. Then by Definition 1-4.1 again, x ∈ A ∪ B. Again, by 1-4.1, x ∈ (A ∪ B) ∪ C. On the other hand, if x ∈ B ∪ C, then either x ∈ B or x ∈ C. If x ∈ B, then x ∈ A ∪ B, and consequently x ∈ (A ∪ B) ∪ C. If x ∈ C, then we conclude immediately that x ∈ (A ∪ B) ∪ C. Thus, in every case, if x ∈ A ∪ (B ∪ C), then x ∈ (A ∪ B) ∪ C. By Definition 1-1.2, this means that A ∪ (B ∪ C) ⊆ (A ∪ B) ∪ C. A similar argument shows that (A ∪ B) ∪ C ⊆ A ∪ (B ∪ C). Hence, A ∪ (B ∪ C) = (A ∪ B) ∪ C. This proves the first half of (b).

If x ∈ A ∩ (B ∩ C), then by Definition 1-4.1, x ∈ A and x ∈ B ∩ C. Thus, x ∈ A, x ∈ B, and x ∈ C. Consequently, x ∈ A ∩ B and x ∈ C. Therefore x ∈ (A ∩ B) ∩ C. Hence, A ∩ (B ∩ C) ⊆ (A ∩ B) ∩ C. Similarly, (A ∩ B) ∩ C ⊆ A ∩ (B ∩ C). This shows that A ∩ (B ∩ C) = (A ∩ B) ∩ C.

To prove the first equality of (d), suppose that x ∈ A ∩ (B ∪ C). Then x ∈ A and x ∈ B ∪ C. Hence, either x ∈ A and x ∈ B or x ∈ A and x ∈ C. That is, either x ∈ A ∩ B or x ∈ A ∩ C. Consequently, x ∈ (A ∩ B) ∪ (A ∩ C). We have shown that A ∩ (B ∪ C) ⊆ (A ∩ B) ∪ (A ∩ C). On the other hand, suppose that x ∈ (A ∩ B) ∪ (A ∩ C). Then either x ∈ A ∩ B or x ∈ A ∩ C. If x ∈ A ∩ B, then x ∈ A and x ∈ B, so that x ∈ A and x ∈ B ∪ C. Therefore x ∈ A ∩ (B ∪ C). Similarly, if x ∈ A ∩ C, then x ∈ A ∩ (B ∪ C). Hence, in any case, x ∈ A ∩ (B ∪ C). We have shown that

\[ (A \cap B) \cup (A \cap C) \subseteq A \cap (B \cup C). \]

This inclusion relation, combined with the one obtained above, yields

\[ A \cap (B \cup C) = (A \cap B) \cup (A \cap C). \]

Let us illustrate by a Venn diagram the identity (d) which we have just proved. The heavily outlined region in Fig. 1-5 represents either side of the identity. The reader should illustrate the other identities of Theorem 1-4.3 by Venn diagrams.

The identities (a) through (g) in Theorem 1-4.3 are the basic rules of operation in the algebra of sets. By algebraic manipulations alone, it is possible to derive from these numerous other laws of operation.
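The identities of Theorem 1-4.3 can also be checked exhaustively for a small universal set. The sketch below is not a proof; it assumes a three-element universal set chosen only for illustration, and the helper names `subsets`, `comp`, and `theorem_1_4_3_holds` are hypothetical. In Python, `|` and `&` denote union and intersection of sets.

```python
from itertools import combinations, product

X = frozenset({1, 2, 3})           # a small, hypothetical universal set
EMPTY = frozenset()

def subsets(S):
    """All subsets of S, as frozensets."""
    items = list(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def comp(A):
    """Complement of A in the universal set X."""
    return X - A

def theorem_1_4_3_holds(A, B, C):
    """Check identities (a) through (g) for one choice of A, B, C."""
    return all([
        A | B == B | A and A & B == B & A,                          # (a)
        A | (B | C) == (A | B) | C and A & (B & C) == (A & B) & C,  # (b)
        A | A == A and A & A == A,                                  # (c)
        A & (B | C) == (A & B) | (A & C)
            and A | (B & C) == (A | B) & (A | C),                   # (d)
        A | comp(A) == X and A & comp(A) == EMPTY,                  # (e)
        A | X == X and A & EMPTY == EMPTY,                          # (f)
        A | EMPTY == A and A & X == A,                              # (g)
    ])

P = subsets(X)
print(all(theorem_1_4_3_holds(A, B, C) for A, B, C in product(P, repeat=3)))  # True
```

A check of this kind covers only the 512 triples of subsets of one particular X; the element-by-element argument in the proof is what establishes the identities for all sets.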


EXAMPLE 10. Let A, B, and C be subsets of a universal set X.

(a) (A ∪ B) ∩ C = C ∩ (A ∪ B) = (C ∩ A) ∪ (C ∩ B) = (A ∩ C) ∪ (B ∩ C), and similarly, (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C).

(b) A ∪ (B ∩ A) = (A ∪ B) ∩ (A ∪ A) = (A ∪ B) ∩ A = A ∩ (B ∪ A) = (A ∩ (B ∪ A)) ∪ ∅ = (A ∩ (B ∪ A)) ∪ (A ∩ A^c) = A ∩ ((B ∪ A) ∪ A^c) = A ∩ (B ∪ (A ∪ A^c)) = A ∩ (B ∪ X) = A ∩ X = A, that is, A ∪ (B ∩ A) = A and A ∩ (B ∪ A) = A.

(c) If A ∩ B = A ∩ C and A ∪ B = A ∪ C, then B = C. Indeed, B = B ∪ (A ∩ B) = B ∪ (A ∩ C) = (B ∪ A) ∩ (B ∪ C) = (A ∪ B) ∩ (B ∪ C) = (A ∪ C) ∩ (B ∪ C) = (A ∩ B) ∪ C = (A ∩ C) ∪ C = C ∪ (A ∩ C) = C.

Identities such as those of Example 10 can of course always be obtained directly from the definitions of the set operations, as we did for the proof of Theorem 1-4.3. However, identities which involve several sets can usually be derived more easily by algebraic manipulations.

THEOREM 1-4.4. Let A, B, and C be sets.
(a) A ⊆ A ∪ B, B ⊆ A ∪ B; A ⊇ A ∩ B, B ⊇ A ∩ B.
(b) If A ⊆ C and B ⊆ C, then A ∪ B ⊆ C; if A ⊇ C and B ⊇ C, then A ∩ B ⊇ C.
(c) If A ⊆ B, then A ∪ C ⊆ B ∪ C and A ∩ C ⊆ B ∩ C.
(d) A ⊆ B if and only if A ∩ B = A; A ⊇ B if and only if A ∪ B = A.

The proofs of the various statements in Theorem 1-4.4 are again simple applications of the definitions. For example, let us prove the first part of (d). If A ⊆ B, then x ∈ A implies x ∈ B, and hence x ∈ A ∩ B. Thus A ⊆ A ∩ B. If x ∈ A ∩ B, then in particular, x ∈ A, so that A ∩ B ⊆ A. Therefore, A = A ∩ B. Conversely, if A = A ∩ B, then every element of A is in A ∩ B and, in particular, in B. Therefore A ⊆ B.

THEOREM 1-4.5. Let A and B be subsets of the set X.
(a) If A ⊆ B, then A^c ⊇ B^c.
(b) (A ∪ B)^c = A^c ∩ B^c; (A ∩ B)^c = A^c ∪ B^c.
(c) (A^c)^c = A.
(d) ∅^c = X; X^c = ∅.


The statements (a), (c), and (d) of Theorem 1-4.5 should be clear. Let us examine (b). To say that x ∈ (A ∪ B)^c is the same as saying x ∉ A ∪ B, which in turn amounts to x ∉ A and x ∉ B. That is, x ∈ A^c and x ∈ B^c, which means x ∈ A^c ∩ B^c. Thus (A ∪ B)^c and A^c ∩ B^c contain exactly the same elements, so they are equal. The proof that (A ∩ B)^c = A^c ∪ B^c is similar. We illustrate the identity (A ∩ B)^c = A^c ∪ B^c by a Venn diagram. In Fig. 1-6, the region outside of the doubly shaded region represents each of the sets (A ∩ B)^c and A^c ∪ B^c.
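A concrete instance of Theorem 1-4.5 may help fix the ideas. The sets below are assumptions made only for this sketch (a ten-element universal set with two sample subsets), and `comp` is a hypothetical helper for the complement in X.

```python
X = set(range(1, 11))              # hypothetical universal set {1, ..., 10}
A = {n for n in X if n % 2 == 0}   # even numbers in X
B = {n for n in X if n < 5}

def comp(S):
    """Complement of S relative to the universal set X."""
    return X - S

print(comp(A | B) == comp(A) & comp(B))   # True
print(comp(A & B) == comp(A) | comp(B))   # True
print(sorted(comp(A | B)))                # [5, 7, 9]
# Part (a): A n B is a subset of A, so the complements are included the other way.
print(comp(A) <= comp(A & B))             # True
print(comp(comp(A)) == A)                 # True
```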

1. If the universal set is the collection N of all natural numbers, determine A ∪ B, A ∩ B, and A^c in the following cases.
(a) A = {n | n is even}, B = {n | n < 10}
(b) A = {n | n^2 > 2n - 1}, B = {n | n^2 = 2n + 3}
(c) A = {n | (n + 1)/2 ∈ N}, B = {n | n/2 ∈ N}
2. Prove Theorem 1-4.3(a), (c), (e), (f), and (g).
3. Justify each step of the computations in Example 10, using the results of Theorem 1-4.3 where they are needed.
4. Prove the following identities by algebraic manipulations, using the results of Theorem 1-4.3 and Example 10.
(a) A ∪ (A^c ∩ B) = A ∪ B, A ∩ (A^c ∪ B) = A ∩ B
(b) A ∪ (B ∩ (A ∪ C)) = A ∪ (B ∩ C)
(c) ((A ∩ B) ∪ (B ∩ C)) ∪ (C ∩ A) = ((A ∪ B) ∩ (B ∪ C)) ∩ (C ∪ A)
5. Illustrate Theorem 1-4.4(c) by a Venn diagram.
6. Show that if A ⊆ C, then A ∪ (B ∩ C) = (A ∪ B) ∩ C.
7. Prove Theorem 1-4.5(a), (c), and (d).
8. Using Theorem 1-4.3(d), (e), (g) and Theorem 1-4.4(d), show that if A ∪ B = X, then A^c ⊆ B. Also, show that if A ∩ B = ∅, then B ⊆ A^c. Thus, show that B = A^c if and only if A ∪ B = X and A ∩ B = ∅.
9. Use the result of Exercise 8 to give a new proof of Theorem 1-4.5(b).


10. Make Venn diagrams to illustrate the following identities.
(a) A ∩ (B ∪ (C ∪ D)) = ((A ∩ B) ∪ (A ∩ C)) ∪ (A ∩ D)
(b) (A ∪ (B ∩ C))^c = A^c ∩ (B^c ∪ C^c)
11. If A and B are any sets, then the difference between A and B is defined to be A - B = {a ∈ A | a ∉ B}. In particular, if A and B are subsets of some universal set X, then A - B = A ∩ B^c. Show that the following are true.
(a) (A - B) - C = A - (B ∪ C)
(b) A - (B - A) = A
(c) A - (A - B) = A ∩ B
12. Define A/B = A^c ∩ B^c. Prove that the following are true.
(a) A/A = A^c
(b) (A/A)/(B/B) = A ∩ B
(c) (A/B)/(A/B) = A ∪ B
The binary operation (*/*) is called the Sheffer stroke operation.
13. Translate the identities of Theorem 1-4.3(d) and Theorem 1-4.4(b) into rules involving only the Sheffer stroke operation.
14. Define A ⊕ B = (A ∩ B^c) ∪ (A^c ∩ B). Prove the following.
(a) A ⊕ A = ∅, A ⊕ ∅ = A
(b) (A ⊕ B) ⊕ C = A ⊕ (B ⊕ C)
(c) A ∩ (B ⊕ C) = (A ∩ B) ⊕ (A ∩ C)

1-5 Further algebra of sets. General rules of operation. It is possible to extend many of the identities in the previous section to theorems concerning operations on any number of sets.

DEFINITION 1-5.1. Let S be a set whose elements are sets. Then

\[ \cup(S) = \{x \mid x \in A \text{ for some } A \in S\}, \qquad \cap(S) = \{x \mid x \in A \text{ for all } A \in S\}. \]

As in the case of two sets, ∪(S) is called the union of the sets of S and ∩(S) is called the intersection of the sets in S. Thus, ∪(S) contains those elements which are in any one or more of the sets in S, and ∩(S) contains those elements which are in every set in S. For these definitions, S need not be a finite collection of sets (see Example 3 below). In Fig. 1-7, S = {A, B, C, D}; ∪(S) is the total shaded area and ∩(S) is the most heavily shaded area, inside the heavy outline.

EXAMPLE 1. Let S = {{1, 2}, {1, 3, 5}, {2, 5, 6}}. Then ∪(S) = {1, 2, 3, 5, 6} and ∩(S) = ∅.

EXAMPLE 2. If S = {A, B}, then ∪(S) = A ∪ B and ∩(S) = A ∩ B.


EXAMPLE 3. Let C be a circle in some plane P. Let S consist of all sets A which satisfy the following specifications: the elements of A are all points of P lying on the side containing C of some tangent line to C. (See Fig. 1-8.) Then ∪(S) is the set of all points in P, while ∩(S) consists of all points inside C.

EXAMPLE 4. Let S be the empty set of subsets of the universal set X. Then ∪(S) = ∅, and ∩(S) = X.
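Definition 1-5.1, including the empty collection of Example 4, can be sketched for finite collections as follows. The function names `big_union` and `big_intersection` are illustrative choices; passing the universal set X explicitly to the intersection is an assumption of this sketch, made so that the empty collection gives X.

```python
def big_union(S):
    """Union of the collection S: everything lying in at least one member of S."""
    result = set()
    for A in S:
        result |= A
    return result

def big_intersection(S, X):
    """Intersection of S within the universal set X: everything in X lying in every member of S."""
    result = set(X)
    for A in S:
        result &= A
    return result

X = {1, 2, 3, 4, 5, 6}
S = [{1, 2}, {1, 3, 5}, {2, 5, 6}]    # the collection of Example 1

print(sorted(big_union(S)))            # [1, 2, 3, 5, 6]
print(big_intersection(S, X))          # set()
# The empty collection: no element lies in "some" member, every element lies in "all" members.
print(big_union([]))                   # set()
print(big_intersection([], X) == X)    # True
```

Starting the intersection from X rather than from the first member of S is what makes the empty case come out as in Example 4.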

The reader should carefully check these examples to be sure that they satisfy the condition of Definition 1-5.1. Example 4 may perhaps be surprising, but it is nevertheless correct according to the definition. Moreover, the intersection or union of the empty set of sets is often encountered. It would be a nuisance if these operations were undefined in this case.

If S is a collection of sets, and if its member sets can be labeled by the elements of another set I, then we write S = {C_i | i ∈ I}. For example, let I = N = {1, 2, 3, ...}, and let S be the collection of sets

\[ \{1\},\ \{1, 2\},\ \{1, 2, 3\},\ \ldots \]

Then the set {1, 2, ..., i} can be denoted by C_i, and we have S = {C_i | i ∈ N}. When this notation is used, the set I is called an index set. If S = {C_i | i ∈ I}, it is customary to write ∪_{i∈I} C_i for ∪(S) and ∩_{i∈I} C_i for ∩(S).

The identities of Theorem 1-4.3(b) are called the associative laws for the operations of set union and set intersection. These are special cases of a general associativity principle.

THEOREM 1-5.2. If the sets of the collection {A_1, A_2, ..., A_n} are united in any way, two at a time, using each of the sets at least once, the resulting set is equal to

\[ \cup(\{A_1, A_2, \ldots, A_n\}). \]

If the sets of this collection are intersected in any way, two at a time, using each set at least once, the resulting set is equal to

\[ \cap(\{A_1, A_2, \ldots, A_n\}). \]

The phrases "united in any way, two at a time" and "intersected in any way, two at a time" are somewhat vague. However, the meaning becomes clear if we look at examples. The possible ways to unite the sets {A_1, A_2, A_3}, using each set once, are: A_1 ∪ (A_2 ∪ A_3), A_1 ∪ (A_3 ∪ A_2), (A_1 ∪ A_2) ∪ A_3, (A_1 ∪ A_3) ∪ A_2, A_2 ∪ (A_3 ∪ A_1), A_2 ∪ (A_1 ∪ A_3), (A_2 ∪ A_3) ∪ A_1, (A_2 ∪ A_1) ∪ A_3, A_3 ∪ (A_1 ∪ A_2), A_3 ∪ (A_2 ∪ A_1), (A_3 ∪ A_1) ∪ A_2, (A_3 ∪ A_2) ∪ A_1.

Four sets can be united, two at a time using each set once, in 120 ways: (A_1 ∪ (A_2 ∪ A_3)) ∪ A_4, ((A_1 ∪ A_2) ∪ A_3) ∪ A_4, A_1 ∪ ((A_2 ∪ A_3) ∪ A_4), (A_1 ∪ A_2) ∪ (A_3 ∪ A_4), A_1 ∪ (A_2 ∪ (A_3 ∪ A_4)), together with the combinations obtained by interchanging A_1, A_2, A_3, and A_4 in various ways.

It is a simple chore to prove any individual instance of Theorem 1-5.2, for example, to show that (A_1 ∪ (A_2 ∪ A_3)) ∪ A_4 = ∪({A_1, A_2, A_3, A_4}). This is perhaps the best way for the reader to convince himself that the theorem is true. Of course, this method is not a "proof" and will not satisfy the mathematician who demands a general method which will cover all possible cases at once. A mathematically correct proof of Theorem 1-5.2 is out of reach until we have discussed inductive proofs. These will be considered in Chapter 2, and a proof of the theorem will be given there. As a consequence of Theorem 1-5.2, we may adopt the notation

\[ A_1 \cup A_2 \cup \cdots \cup A_n = \cup(\{A_1, A_2, \ldots, A_n\}), \qquad A_1 \cap A_2 \cap \cdots \cap A_n = \cap(\{A_1, A_2, \ldots, A_n\}), \]

since the expression A_1 ∪ A_2 ∪ ... ∪ A_n is just what is obtained from unions of the type considered by omitting parentheses. The theorem says that the arrangement of parentheses is of no consequence anyway. It is possible to give a proof now of a theorem which is closely related to Theorem 1-5.2.

THEOREM 1-5.3. Let S and T be two sets whose elements are sets. Then
(a) ∪(S ∪ T) = ∪(S) ∪ ∪(T),
(b) ∩(S ∪ T) = ∩(S) ∩ ∩(T).


EXAMPLE 5. Let S = {{1, 2, 3}, {2, 4}}, and T = {{2, 3}, {2, 4}}. Then S ∪ T = {{1, 2, 3}, {2, 4}, {2, 3}}, ∪(S ∪ T) = {1, 2, 3, 4}, ∩(S ∪ T) = {2}, ∪(S) = {1, 2, 3, 4}, ∪(T) = {2, 3, 4}, ∩(S) = {2}, ∩(T) = {2}, ∪(S) ∪ ∪(T) = {1, 2, 3, 4}, ∩(S) ∩ ∩(T) = {2}.

The proof of Theorem 1-5.3 amounts to a careful examination of Definitions 1-4.1 and 1-5.1. Suppose x ∈ ∪(S ∪ T). Then x ∈ A for some A ∈ S ∪ T. But if A ∈ S ∪ T, then either A ∈ S or A ∈ T. If A ∈ S, then x ∈ ∪(S). If A ∈ T, then x ∈ ∪(T). In either case, x ∈ ∪(S) ∪ ∪(T). This shows that ∪(S ∪ T) ⊆ ∪(S) ∪ ∪(T). On the other hand, if x ∈ ∪(S), then x ∈ A for some A ∈ S. If A ∈ S, then certainly A ∈ S ∪ T. Thus, x ∈ ∪(S ∪ T). Therefore ∪(S) ⊆ ∪(S ∪ T). Similarly, ∪(T) ⊆ ∪(S ∪ T). Thus ∪(S) ∪ ∪(T) ⊆ ∪(S ∪ T), by Theorem 1-4.4(b). If the opposite inclusions are combined, it follows that ∪(S ∪ T) = ∪(S) ∪ ∪(T). Part (b) of Theorem 1-5.3 is proved similarly.

The identities (d) of Theorem 1-4.3 are the distributive laws for the set operations of union and intersection. These laws can also be generalized.

THEOREM 1-5.4. Let {B_i | i ∈ I} be a set of sets, and let A be any set. Then

\[ A \cap \Bigl(\bigcup_{i \in I} B_i\Bigr) = \bigcup_{i \in I} (A \cap B_i), \qquad A \cup \Bigl(\bigcap_{i \in I} B_i\Bigr) = \bigcap_{i \in I} (A \cup B_i). \]

Proof. Suppose that x ∈ A ∩ (∪_{i∈I} B_i). Then by Definition 1-4.1, x ∈ A and x ∈ ∪_{i∈I} B_i. To say that x ∈ ∪_{i∈I} B_i means that x ∈ B_i for some i ∈ I by Definition 1-5.1. For this particular i, x ∈ A ∩ B_i. Therefore, x ∈ ∪_{i∈I} (A ∩ B_i). On the other hand, if x ∈ ∪_{i∈I} (A ∩ B_i), then x ∈ A ∩ B_i for some i ∈ I. By Definition 1-5.1, it follows that x ∈ A and x ∈ ∪_{i∈I} B_i. Consequently, x ∈ A ∩ (∪_{i∈I} B_i). This argument shows that the elements of A ∩ (∪_{i∈I} B_i) are exactly the same as the elements of ∪_{i∈I} (A ∩ B_i). The other statement is proved in a similar way.

EXAMPLE 6. In Theorem 1-5.4, let I = N = {1, 2, 3, ...}. Define B_i = {1, 2, ..., i} for i ∈ I, and let A = {2n | n ∈ N} = {2, 4, 6, ...}. Then ∪_{i∈I} B_i = N, and A ∩ (∪_{i∈I} B_i) = A ∩ N = A. We have A ∩ B_i = {2n | n ∈ N, 2n ≤ i}, and ∪_{i∈I} (A ∩ B_i) = {2n | n ∈ N} = A. Further, ∩_{i∈I} B_i = {1}, so that A ∪ (∩_{i∈I} B_i) = A ∪ {1} = {1, 2, 4, 6, ...}. Finally,

\[
A \cup B_i =
\begin{cases}
\{1, 2, \ldots, i, i + 2, i + 4, \ldots\} & \text{if } i \text{ is even}, \\
\{1, 2, \ldots, i, i + 1, i + 3, \ldots\} & \text{if } i \text{ is odd}.
\end{cases}
\]

Therefore, ∩_{i∈I} (A ∪ B_i) = {1, 2, 4, 6, ...}.
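Example 6 involves infinite sets, which cannot be listed in full, but a truncated version can be checked by machine. The sketch below assumes a cutoff M (an arbitrary choice made for illustration) and works entirely inside {1, ..., M}; the helper names are hypothetical.

```python
M = 20                                     # hypothetical cutoff: work inside {1, ..., M}
I = range(1, M + 1)                        # index set
A = {n for n in I if n % 2 == 0}           # even numbers up to M
B = {i: set(range(1, i + 1)) for i in I}   # B[i] is the set {1, ..., i}

def union_all(sets):
    """Union of a finite family of sets."""
    out = set()
    for s in sets:
        out |= s
    return out

def intersect_all(sets):
    """Intersection of a nonempty finite family of sets."""
    sets = list(sets)
    out = set(sets[0])
    for s in sets[1:]:
        out &= s
    return out

lhs1 = A & union_all(B[i] for i in I)
rhs1 = union_all(A & B[i] for i in I)
lhs2 = A | intersect_all(B[i] for i in I)
rhs2 = intersect_all(A | B[i] for i in I)

print(lhs1 == rhs1 == A)          # True
print(lhs2 == rhs2 == A | {1})    # True
```

Both sides of each distributive law agree, and they agree with the truncated forms of the sets A and {1, 2, 4, 6, ...} found in the example.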

In the ordinary arithmetic of numbers, it is possible to start with a single nonzero number, say 2, and to build from it infinitely many other numbers by addition, subtraction, multiplication, and division. One of the surprising facts about the arithmetic of sets is that only a finite number of different sets can be constructed from a finite number of sets using the operations of union, intersection, and complementation. For example, starting with a set A (contained in the universal set X), we obtain the sets A^c, A ∩ A = A, A ∪ A = A. Thus the first step of the construction yields one new set A^c. At the second step, we get A ∩ A^c = ∅, A ∪ A^c = X, as well as A and A^c again. The next step produces no new sets, nor does any step thereafter.

A little calculation will show that the only possible sets which can be constructed from two sets A and B in X are ∅, A, B, A^c, B^c, A ∩ B, A^c ∩ B, A ∩ B^c, A^c ∩ B^c, A ∪ B, A^c ∪ B, A ∪ B^c, A^c ∪ B^c, (A ∩ B^c) ∪ (A^c ∩ B), (A ∩ B) ∪ (A^c ∩ B^c), X. In this list, the four sets A ∩ B, A ∩ B^c, A^c ∩ B, and A^c ∩ B^c are particularly interesting. An examination of the Venn diagram in Fig. 1-9 indicates why these sets are important. We see that except for ∅, each set in our list is the union of one, two, three, or all of these fundamental sets. For example,

\[ A \cup B = (A \cap B) \cup (A \cap B^c) \cup (A^c \cap B). \]

This is an example of a general theorem which is usually called the disjunctive normal form theorem.

THEOREM 1-5.5. Let A_1, A_2, ..., A_n be subsets of the set X. Then every set which can be formed from these sets by union, intersection, or complementation is either ∅ or has a representation as a union of certain of the sets

\[ M_{i_1 i_2 \cdots i_n} = A_1^{i_1} \cap A_2^{i_2} \cap \cdots \cap A_n^{i_n}, \]

where i_1, i_2, ..., i_n are 0 or 1 and A_j^{i_j} is A_j if i_j = 0 and A_j^c if i_j = 1. For example, if n = 3,

\[
\begin{aligned}
M_{000} &= A_1 \cap A_2 \cap A_3, & M_{001} &= A_1 \cap A_2 \cap A_3^c, \\
M_{010} &= A_1 \cap A_2^c \cap A_3, & M_{011} &= A_1 \cap A_2^c \cap A_3^c, \\
M_{100} &= A_1^c \cap A_2 \cap A_3, & M_{101} &= A_1^c \cap A_2 \cap A_3^c, \\
M_{110} &= A_1^c \cap A_2^c \cap A_3, & M_{111} &= A_1^c \cap A_2^c \cap A_3^c.
\end{aligned}
\]

By the theorem, every possible nonempty combination of A_1, A_2, and A_3 can be obtained as a union of one, two, three, four, five, six, seven, or all of these M_{ijk}.

The proof of Theorem 1-5.5, like the proof of Theorem 1-5.2, can be carried out only by mathematical induction. Since this result will not be needed in later parts of this book, a formal proof will not be given.
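Although no formal proof is given, the case of two sets can be explored by brute force. The sketch below assumes a four-element universal set and two particular subsets A and B, chosen so that all four fundamental sets are nonempty; the names `closure` and `unions_of` are hypothetical. It repeatedly applies union, intersection, and complementation until nothing new appears.

```python
from itertools import combinations

X = frozenset(range(4))                     # hypothetical universal set
A = frozenset({0, 1})
B = frozenset({1, 2})

def closure(generators):
    """All sets obtainable from the generators by union, intersection, complement."""
    found = set(generators)
    while True:
        new = set(found)
        new |= {X - S for S in found}
        new |= {S | T for S in found for T in found}
        new |= {S & T for S in found for T in found}
        if new == found:
            return found
        found = new

# The four fundamental sets: A n B, A n B^c, A^c n B, A^c n B^c.
atoms = [A & B, A - B, B - A, X - (A | B)]

def unions_of(atoms):
    """Unions of every nonempty selection of the fundamental sets."""
    out = set()
    for r in range(1, len(atoms) + 1):
        for chosen in combinations(atoms, r):
            out.add(frozenset().union(*chosen))
    return out

generated = closure([A, B])
print(len(generated))                                    # 16
print(generated - {frozenset()} == unions_of(atoms))     # True
```

For this choice of A and B the construction stops at 16 sets, matching the list given for two sets, and every nonempty one is a union of fundamental sets.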

1. Suppose that S = {A}, where A is a set. What is ∪(S)? What is ∩(S)?
2. Check Theorem 1-5.2 for the following particular combinations of sets: (A_1 ∪ A_2) ∪ (A_3 ∪ A_4), A_1 ∪ (A_2 ∪ (A_3 ∪ A_4)), (A_1 ∪ (A_2 ∪ A_3)) ∪ A_4.
3. Prove Theorem 1-5.3(b).
4. Let {A_i | i ∈ I} be a set of subsets of X. Show that the following are true.
(a) (∪_{i∈I} A_i)^c = ∩_{i∈I} A_i^c
(b) (∩_{i∈I} A_i)^c = ∪_{i∈I} A_i^c
5. What is the largest number of different sets which can be constructed from three subsets A, B, C of a universal set X, using the operations of union, intersection, and complementation? If ∅ ⊂ A ⊂ B ⊂ C ⊂ X, how many different sets can be constructed?

*1-6 Measures on sets. One important application of set theory is its use in mathematical statistics. The foundation of statistics is the theory of probability, and in its mathematical form, probability is the study of certain kinds of measures on sets. In this section and the next the concept of measure of a set will be introduced, and some of its simplest properties will be examined.


Severa1 ways of "measuring" sets are already known to the reader. For example, the measure of a line segment (which may be considered as a set of poiiits) is usually taken to be the length of the segment. A good measure of a finite set A is 1 Al, the number of elements in A. But there are situations where different measures of line segments and finite sets are more useful. For example, a railroad map usually indicates the route between major cities by a sequence of line segments connecting intermediate points as shown in Fig. 1-10. Here the length of each line segment is of little interest. The important measure of these segments is the actual rail line distance between the cities corresponding to the points which the segments connect. Another useful measure for these line segments might be the annual cost of upkeep of that section of the rail line which is represented by them. Note that this measure has a natural extension to those subsets of the map which are unions of two or more segments. For example, if I I represents the part of the rail line between Milwaukee and Chicago and if I 2 represents the part between Detroit and Buffalo, then the cost of upkeep of the part of the rail line represented by I l U I 2 would be the cost for I l plus the cost for 1 2 . We will now consider a measure for finite sets which links our discussion to the application of set theory to probability. Suppose that a pair of dice, labeled A and B, are rolled. Both A and B will come to rest with a number of dots from 1 to 6 on the "up" face. The result of the roll can therefore be represented by an ordered pair (m, n) of natural numbers, where m gives the number of dots on the "up" face of A and n gives the number of dots on the "up" face of B. Thus, m and n can be any natural numbers from 1 to 6. I f the dice are "honest," then it is reasonable to suppose that for any roll of the dice, the 36 different pairs (m, n) are equally likely to occur. 
Now it is customary to define the "point" which is made on any roll of the dice to be the total number of spots on the two "up" faces. Thus, if the outcome of the roll is represented by $(m, n)$, then the point made on the roll is $m + n$. Therefore, the possible points which can be made on a roll of the dice are the numbers from 2 to 12, that is, the set of possible points is $\{2, 3, \ldots, 12\}$. We now assign a measure to the subsets of $\{2, 3, \ldots, 12\}$. If $S$ is such a subset, assign as the measure of $S$ the probability that on a roll of the dice the point made will be a member of $S$. The probability of making a certain point is the ratio of the number of different ways that the point can be made, to 36, the number of possible results of a roll. For example, the probability of making the point 2 is $\frac{1}{36}$, since 2 can be made in only one way, by the roll $(1, 1)$. Thus our measure would assign to the subset $\{2\}$ the number $\frac{1}{36}$. Suppose now that the subset $S$ is the set $\{7\}$. The outcome of the roll will be in $\{7\}$ only if the point made is 7. Since 7 can be made in six possible ways: $(1, 6)$, $(2, 5)$, $(3, 4)$, $(4, 3)$, $(5, 2)$, $(6, 1)$, the probability of making 7 is $\frac{6}{36} = \frac{1}{6}$, and the measure of $\{7\}$ is $\frac{1}{6}$. As another example, take $S = \{7, 11\}$. The point made on a roll will be in this set if it is a 7 or 11. We have seen that there are six ways of making 7. There are two ways of making 11: $(5, 6)$ and $(6, 5)$. Thus, the measure of the set $\{7, 11\}$ is $\frac{8}{36} = \frac{2}{9}$. It is clear now that this "probability measure" can be determined for each of the $2^{11} = 2048$ different subsets of possible points.

Let us now look for some common properties of the measures described in the above examples and try to arrive at a suitable mathematical notion of measure. One property is immediately evident. In each case, there is a rule for assigning a certain number to various subsets of a given set. In the example of a railroad map, two different measures were suggested for line segments making up the map. The second of these measures, the cost of upkeep, was actually defined for unions of segments of the map. In both cases, however, the measures are defined only for very special subsets of the whole map. In general, measures need not be defined on all subsets of a given set, but only on some collection of subsets. However, unless these collections satisfy certain "closure" conditions, the measures on them will not be very useful.
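The probability measure of the dice example can be checked mechanically. The following Python sketch is only an illustration; the names `OUTCOMES`, `point_probability`, and `measure` are our own and do not come from the text. It lists the 36 equally likely rolls and adds up the probabilities of the points in a given subset.

```python
from fractions import Fraction
from itertools import product

# The 36 equally likely outcomes (m, n) of rolling dice A and B.
OUTCOMES = list(product(range(1, 7), repeat=2))

def point_probability(point):
    """Probability that the two "up" faces total `point`."""
    ways = sum(1 for m, n in OUTCOMES if m + n == point)
    return Fraction(ways, len(OUTCOMES))

def measure(subset):
    """Probability that the point made lies in `subset` of {2, ..., 12}."""
    return sum((point_probability(p) for p in set(subset)), Fraction(0))

print(measure({2}))      # 1/36
print(measure({7}))      # 1/6
print(measure({7, 11}))  # 2/9
print(2 ** 11)           # 2048, the number of subsets of {2, ..., 12}
```

Exact fractions are used so that the results can be compared directly with the values worked out by hand.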

DEFINITION 1-6.1. Let $X$ be a set. A nonempty collection $S$ of subsets of $X$ is called a ring of subsets of $X$ (or just a ring of sets) if it satisfies the following two conditions.
(a) If $A \in S$ and $B \in S$, then $A \cup B \in S$.
(b) If $A \in S$ and $B \in S$, then $A \cap B^c \in S$.

EXAMPLE 1. If $X$ is any set, then the collection of all subsets of $X$ is a ring of subsets of $X$.

EXAMPLE 2. Let $X$ be an infinite set. Then the collection of all finite subsets of $X$ is a ring of subsets of $X$. Moreover, $X$ is not in this ring.

EXAMPLE 3. Let $S$ be the set of all subsets of $R$ which are finite unions of sets of the type $I = \{x \mid a < x \le b,\ a \in R,\ b \in R\}$. Such sets are called half-open intervals. That is, each set of $S$ has the form $I_1 \cup I_2 \cup \cdots \cup I_n$, where $I_1 = \{x \mid a_1 < x \le b_1\}$, $I_2 = \{x \mid a_2 < x \le b_2\}$, $\ldots$, $I_n = \{x \mid a_n < x \le b_n\}$, for some real numbers $a_1, a_2, \ldots, a_n, b_1, b_2, \ldots, b_n$. Then $S$ is a ring of sets.


The expression "ring of sets" is standard mathematical terminology. It is derived from abstract algebra. The "closure" conditions to which we alluded above are the properties (a) and (b) in Definition 1-6.1. There are other important closure conditions which are satisfied by rings of sets.

THEOREM 1-6.2. Let $S$ be a ring of subsets of $X$. Then
(a) $\Phi \in S$;
(b) If $A \in S$ and $B \in S$, then $A \cap B \in S$;
(c) If $A_1, A_2, \ldots, A_n \in S$, then $A_1 \cup A_2 \cup \cdots \cup A_n \in S$ and $A_1 \cap A_2 \cap \cdots \cap A_n \in S$.

Proof. One of the requirements in Definition 1-6.1 is that $S$ be nonempty. Thus, there is some subset $A$ of $X$ which belongs to $S$. Consequently, by Definition 1-6.1(b), $A \cap A^c = \Phi$ is in $S$. Suppose that $A \in S$ and $B \in S$. Then by Definition 1-6.1(b), $A \cap B^c \in S$. Now use Definition 1-6.1(b) again, with $A \cap B^c$ taking the place of $B$. We obtain $A \cap (A \cap B^c)^c \in S$. However, by Theorems 1-4.5 and 1-4.3, $A \cap (A \cap B^c)^c = A \cap (A^c \cup B) = (A \cap A^c) \cup (A \cap B) = \Phi \cup (A \cap B) = A \cap B$. Thus, $A \cap B \in S$. Finally, if $A_1, A_2, \ldots, A_n$ belong to $S$, then using Definition 1-6.1(a) repeatedly gives $A_1 \cup A_2 \in S$, $A_1 \cup A_2 \cup A_3 = (A_1 \cup A_2) \cup A_3 \in S$, $\ldots$, $A_1 \cup A_2 \cup \cdots \cup A_n \in S$. Similarly, by using repeatedly 1-6.2(b), which we have just proved, we find that $A_1 \cap A_2 \cap \cdots \cap A_n \in S$.
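For a small finite set, Definition 1-6.1 and Theorem 1-6.2 can be tested by brute force. The sketch below is an illustration only, and the helper names `powerset` and `is_ring` are our own. It examines every collection of subsets of $\{1, 2, 3\}$, keeps those which are rings, and confirms that each ring contains the empty set and is closed under intersection. Note that $A \cap B^c$ is the same as the set difference written `a - b` in Python.

```python
from itertools import combinations

def powerset(xs):
    """All subsets of xs, each as a frozenset."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

def is_ring(collection):
    """Definition 1-6.1: nonempty, closed under A ∪ B and A ∩ B^c."""
    coll = set(collection)
    if not coll:
        return False
    return all((a | b) in coll and (a - b) in coll
               for a in coll for b in coll)

subsets = powerset({1, 2, 3})                      # the 8 subsets of X
rings = [c for c in powerset(subsets) if is_ring(c)]

# Theorem 1-6.2(a) and (b), checked for every ring found.
for ring in rings:
    assert frozenset() in ring
    assert all((a & b) in ring for a in ring for b in ring)

print(len(rings))                        # 15
print(sorted({len(r) for r in rings}))   # [1, 2, 4, 8]
```

A search of this kind is no substitute for the proof, since it covers only one three-element set, but it shows the closure conditions at work.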
There is one more important property that our examples have in common. In the upkeep cost measure on the segments of the railroad map, we noted that if $I_1$ and $I_2$ are distinct segments, then the measure of $I_1 \cup I_2$ is the measure of $I_1$ plus the measure of $I_2$. This is still clearly true if $I_1$ and $I_2$ are replaced by unions of segments, provided that these unions have no segment in common. This additivity property is shared by the probability measure example. Here, the measure was defined for all subsets of the set of points $\{2, 3, \ldots, 12\}$. If $A$ and $B$ are subsets such that no number of $\{2, 3, \ldots, 12\}$ is in both $A$ and $B$ ($A \cap B = \Phi$), then the measure of $A \cup B$ is the sum of the measures of $A$ and $B$. For example, the measure of $\{7\}$ is $\frac{1}{6}$, the measure of $\{11\}$ is $\frac{1}{18}$, and the measure of $\{7\} \cup \{11\} = \{7, 11\}$ is $\frac{1}{6} + \frac{1}{18} = \frac{2}{9}$. This simple property is the essence of the mathematical notion of measure.

Two sets $A$ and $B$ are said to be disjoint if they have no elements in common, that is, $A \cap B = \Phi$. A collection of sets is called pairwise disjoint if each pair of different sets in the collection is disjoint. Note that the term "pairwise disjoint" refers to the collection of sets as a whole and not to the individual sets in the collection.


EXAMPLE 4. The sets $\{7, 11\}$ and $\{2, 12\}$ are disjoint.

EXAMPLE 5. The collection of line segments in the railroad map example is pairwise disjoint, provided we agree that each line segment includes its left-hand endpoint, but not its right-hand endpoint.

EXAMPLE 6. Let $A_1, A_2, A_3, \ldots$ be the sets of real numbers $x$ defined by $A_1 = \{x \mid 1 < x \le 2\}$, $A_2 = \{x \mid 2 < x \le 3\}$, $A_3 = \{x \mid 3 < x \le 4\}$, etc. Then the collection $A_1, A_2, A_3, \ldots$ is pairwise disjoint.

DEFINITION 1-6.3. Let $X$ be a set, and let $S$ be a ring of subsets of $X$. A measure on the collection $S$ is a rule which assigns to each set $A$ in the collection $S$ some real number $m(A)$, subject to the condition that if $A$ and $B$ are disjoint sets in $S$, then

$$m(A \cup B) = m(A) + m(B).$$

We will be concerned principally with measures defined on the set of all subsets of a finite set. For this discussion the following example is important.

EXAMPLE 7. Let $X$ be a set containing $n$ distinct elements $x_1, x_2, \ldots, x_n$. Let $m_1, m_2, \ldots, m_n$ be a sequence of $n$ real numbers. For a nonempty subset $A$ of $X$, define $m(A)$ to be the sum of all those $m_i$ for which $x_i \in A$. If $A = \Phi$, let $m(A) = 0$. For instance, if $n = 3$,

$$\begin{aligned}
m(\Phi) &= 0, & m(\{x_1\}) &= m_1, & m(\{x_2\}) &= m_2, & m(\{x_3\}) &= m_3, \\
m(\{x_1, x_2\}) &= m_1 + m_2, & m(\{x_1, x_3\}) &= m_1 + m_3, & m(\{x_2, x_3\}) &= m_2 + m_3, & m(\{x_1, x_2, x_3\}) &= m_1 + m_2 + m_3.
\end{aligned}$$

It is left to the reader to show that the condition of Definition 1-6.3 is satisfied, so that a measure is defined on the collection $P(X)$ of all subsets of $X$. Particular cases are worth noting.
(1) If $m_1 = m_2 = \cdots = m_n = 1$, then $m(A) = |A|$, the cardinal number of $A$.
(2) If $m_1 = 1$, $m_2 = \cdots = m_n = 0$, then $m(A) = 1$ if $x_1 \in A$ and $m(A) = 0$ if $x_1 \notin A$. Thus we can say that $m$ measures whether or not $x_1$ is in $A$.
(3) Let $x_1 = 1$, $x_2 = 2$, $\ldots$, $x_n = n$. Let $m_1 = -1$, $m_2 = 1$, $m_3 = -1$, $\ldots$, $m_n = (-1)^n$. Then $m(A)$ is just the number of even numbers in $A$ minus the number of odd numbers in $A$.
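The construction of Example 7 is easy to try out. In the following Python sketch the names `make_measure`, `subsets`, and `is_additive` are our own, and the choice $X = \{1, 2, \ldots, 6\}$ is an arbitrary assumption made for illustration. The sketch builds the three particular cases and checks the additivity condition of Definition 1-6.3 on every pair of disjoint subsets.

```python
from itertools import combinations

def make_measure(weights):
    """Return m with m(A) = sum of weights[x] for x in A (Example 7)."""
    def m(A):
        return sum(weights[x] for x in A)   # the empty sum is 0
    return m

def subsets(X):
    """All subsets of X, each as a frozenset."""
    X = list(X)
    return [frozenset(c) for r in range(len(X) + 1)
            for c in combinations(X, r)]

def is_additive(m, X):
    """Check m(A ∪ B) = m(A) + m(B) for all disjoint A, B in P(X)."""
    return all(m(A | B) == m(A) + m(B)
               for A in subsets(X) for B in subsets(X) if not (A & B))

X = range(1, 7)
counting = make_measure({x: 1 for x in X})             # case (1): |A|
indicator = make_measure({x: int(x == 1) for x in X})  # case (2)
parity = make_measure({x: (-1) ** x for x in X})       # case (3)

print(counting({2, 3, 5}))   # 3
print(indicator({1, 4}))     # 1
print(parity({1, 2, 4}))     # 1  (two even numbers minus one odd number)
print(all(is_additive(m, X) for m in (counting, indicator, parity)))  # True
```

Integer weights are used so that the equality test in `is_additive` is exact.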

It is not surprising that there are so many interesting special cases of Example 7, since actually every measure on the collection $P(X)$ of all subsets of a finite set $X$ is of this form. This will become clear after we observe that the additive property of measures has a simple generalization.

THEOREM 1-6.4. Let $m$ be a measure defined on a ring $S$ of subsets of a set $X$. If $\{A_1, A_2, \ldots, A_n\}$ is a collection of sets in $S$ and this collection is pairwise disjoint, then

$$m(A_1 \cup A_2 \cup \cdots \cup A_n) = m(A_1) + m(A_2) + \cdots + m(A_n).$$

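If $n = 2$, this theorem is the same as the additivity condition for a measure required in Definition 1-6.3, namely $m(A_1 \cup A_2) = m(A_1) + m(A_2)$ if $A_1 \cap A_2 = \Phi$. Consider the case $n = 3$. The assertion is

$$m(A_1 \cup A_2 \cup A_3) = m(A_1) + m(A_2) + m(A_3).$$

Since the collection $\{A_1, A_2, A_3\}$ is pairwise disjoint, we know in particular that $A_2 \cap A_3 = \Phi$. Since $m$ is a measure, by Definition 1-6.3, $m(A_2 \cup A_3) = m(A_2) + m(A_3)$. Thus, we have

$$m(A_1) + m(A_2 \cup A_3) = m(A_1) + m(A_2) + m(A_3).$$

Now if $A_1$ and $A_2 \cup A_3$ are disjoint, we can apply Definition 1-6.3 again to the left side of the last equality to obtain the desired result:

$$m(A_1 \cup A_2 \cup A_3) = m(A_1 \cup (A_2 \cup A_3)) = m(A_1) + m(A_2 \cup A_3) = m(A_1) + m(A_2) + m(A_3).$$

By the distributive law for the set operations (Theorem 1-4.3), we obtain

$$A_1 \cap (A_2 \cup A_3) = (A_1 \cap A_2) \cup (A_1 \cap A_3) = \Phi \cup \Phi = \Phi,$$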

so that $A_1$ and $A_2 \cup A_3$ are indeed disjoint. We used here the fact that $A_1 \cap A_2 = \Phi$ and $A_1 \cap A_3 = \Phi$, which is justified by the assumption that $\{A_1, A_2, A_3\}$ is pairwise disjoint.

By repeated application of the argument used in the case $n = 3$, it is possible to see that Theorem 1-6.4 is true for any $n$. A formal proof of this theorem will not be given here, because such a proof is based on the principle of mathematical induction. The reader should begin to be aware that mathematics leans heavily on this important method of proof which will be discussed in the next chapter.

Accepting Theorem 1-6.4, we are ready to examine the assertion that every measure defined on $P(X)$ for a finite set $X$ is of the type given in Example 7. For simplicity, suppose that $X = \{x_1, x_2, x_3, x_4\}$, where $x_1, x_2, x_3, x_4$ are distinct. Suppose that $m$ is a measure defined on $P(X)$. Then $m_1 = m(\{x_1\})$, $m_2 = m(\{x_2\})$, $m_3 = m(\{x_3\})$, $m_4 = m(\{x_4\})$ are certain real numbers. It is evident that if $A$ is any nonempty subset of $X$, we can write $A = \bigcup_{x_i \in A} \{x_i\}$. For example, $\{x_1, x_2, x_3\} = \{x_1\} \cup \{x_2\} \cup \{x_3\}$. Moreover, if $i \ne j$, then $\{x_i\} \cap \{x_j\} = \Phi$. Thus, the collection of all distinct one element sets $\{x_i\}$, with $x_i \in A$, is pairwise disjoint. Hence, by Theorem 1-6.4, $m(A)$ is the sum of all $m_i = m(\{x_i\})$ for which $x_i \in A$. For example, if $A = \{x_1, x_2, x_3\}$, then

$$m(A) = m(\{x_1\}) + m(\{x_2\}) + m(\{x_3\}) = m_1 + m_2 + m_3.$$

This argument shows that any measure $m$ on $P(X)$ is a measure of the type described in Example 7, that is, Example 7 is the most general possibility for a measure on the set of all subsets of a finite set. Indeed, starting with a measure $m$ on $P(X)$ for which nothing is assumed except that it satisfies the conditions of Definition 1-6.3, we have shown that there are numbers $m_i$ corresponding to the distinct elements $x_i \in X$ such that for any nonempty subset $A$ of $X$, the measure $m(A)$ is precisely the sum of those $m_i$ for which $x_i$ is in $A$. But this is just the measure of Example 7, except possibly for $A = \Phi$. However, in the next section we will show that $m(\Phi) = 0$ for every measure.

1. In the dice rolling example, find the measure of the following sets: $\{2, 3, 4\}$, $\{10, 11, 12\}$, $\{2, 3, 4, 6, 8, 10, 11, 12\}$.

2. Find all collections $S$ of subsets of $\{1, 2, 3\}$ which are rings of sets. (There are 15 such collections.)

3. Show that the number of sets in a ring of subsets of a finite set is always a power of 2. [Hint: Let $S$ be such a ring. Let $A_1, A_2, \ldots, A_n$ be all those nonempty sets in $S$ which do not contain a smaller nonempty set of $S$. Show that $I \leftrightarrow \bigcup_{i \in I} A_i$ defines a one-to-one correspondence between the subsets $I$ of $\{1, 2, \ldots, n\}$ and $S$.]

4. Show that $m$, defined in Example 7, is a measure, that is, $m$ satisfies the condition of Definition 1-6.3.

5. Prove Theorem 1-6.4 in the case $n = 4$.

6. Give the details of the proof that if $m$ is a measure defined on a finite set $\{x_1, x_2, \ldots, x_n\}$, then there are real numbers $m_1, m_2, \ldots, m_n$ such that for a nonempty subset $A$, $m(A)$ is the sum of all $m_i$ for which $x_i \in A$.


7. In a certain game, three pennies are tossed at the same time and points are scored, depending on the outcome of the toss, as follows:

    3 heads            = 20 points,    3 tails            = 15 points,
    2 heads and a tail = 10 points,    2 tails and a head =  5 points.

Define a probability measure $m$ on the collection of subsets of the possible points $\{20, 15, 10, 5\}$, as was done for the dice rolling example in the text. Find $m(\{20, 15, 10, 5\})$, $m(\{20, 10\})$, and $m(\{5\})$. What is the probability that at least two heads will appear in the outcome of a toss?

8. In a certain card game, two cards are dealt from a standard deck of 52 cards. Aces count 4 points, kings 3 points, queens 2 points, jacks 1 point, and all other cards 0 points. In a given deal, the possible points range from 0 points (neither card is an ace or a face card) to 8 points (a pair of aces). For a subset $A$ of the set $\{0, 1, 2, \ldots, 8\}$ of possible points, define $m(A)$ to be the probability that on a given deal the number of points scored is an element of $A$. Find $m(\{0\})$, $m(\{8\})$, $m(\{5, 6, 7, 8\})$, and $m(\{1\})$.

*1-7 Properties and examples of measures. In this section we derive some useful properties of measures.

THEOREM 1-7.1. Let $m$ be a measure on a ring $S$ of subsets of a set $X$.
(a) $m(\Phi) = 0$.
(b) If $A \in S$ and $A^c \in S$, then $m(A^c) = m(X) - m(A)$.

Proof. (a) Since $\Phi \cap \Phi = \Phi$, the empty set is disjoint from itself (and it is the only set having this property). Thus $m(\Phi) = m(\Phi \cup \Phi) = m(\Phi) + m(\Phi)$. Subtracting the number $m(\Phi)$ from both sides of this equality gives $0 = m(\Phi)$.
(b) By Theorem 1-4.3(e), $A \cap A^c = \Phi$ and $A \cup A^c = X$. Thus, $A$ and $A^c$ are disjoint, so that $m(A) + m(A^c) = m(A \cup A^c) = m(X)$. Again, by subtraction, $m(A^c) = m(X) - m(A)$.

THEOREM 1-7.2. Let $m$ be a measure on a ring $S$ of subsets of a set $X$. Let $A$, $B$ be in $S$. Then

$$m(A \cup B) = m(A) + m(B) - m(A \cap B).$$

Proof. An examination of the Venn diagram in Fig. 1-11 shows that

$$\begin{aligned}
A \cup B &= (A \cap B^c) \cup (A^c \cap B) \cup (A \cap B), \\
A &= (A \cap B^c) \cup (A \cap B), \\
B &= (A^c \cap B) \cup (A \cap B),
\end{aligned}$$

and that the collection of subsets $\{A \cap B^c,\ A^c \cap B,\ A \cap B\}$ is pairwise disjoint. Consequently, by Theorem 1-6.4,

$$\begin{aligned}
m(A \cup B) &= m(A \cap B^c) + m(A^c \cap B) + m(A \cap B), \\
m(A) &= m(A \cap B^c) + m(A \cap B), \\
m(B) &= m(A^c \cap B) + m(A \cap B).
\end{aligned}$$

Subtracting the second and third equalities from the first one gives

$$m(A \cup B) - m(A) - m(B) = -m(A \cap B),$$

which when rearranged is the desired identity.

We conclude this chapter by giving some practical examples of measures on finite sets.
EXAMPLE 1. In a certain class, 40% of the students are blonds, the rest are brunettes, 12% are left-handed, and 5% are both blond and left-handed. Find the percentage of students who are right-handed brunettes. Let $A$ be the subset of blonds in the class and $B$ be the subset of left-handed students. Then $A \cap B$ is the subset of left-handed blonds and $A \cup B$ is the subset of students who are either blond or left-handed. Moreover $(A \cup B)^c$ is the subset of students who are neither blond nor left-handed, that is, the subset of right-handed brunettes. Recalling that cardinality is a measure on the set of all subsets of a finite set, Theorem 1-7.2 gives

$$|A \cup B| = |A| + |B| - |A \cap B|.$$

Now if there are $n$ students in the class, and $C$ is any subset of students, then $|C|/n$ gives the fraction of the class which is in $C$ and $100|C|/n$ gives the percentage of the class which is in $C$. Thus, we have

$$\frac{100|A \cup B|}{n} = \frac{100|A|}{n} + \frac{100|B|}{n} - \frac{100|A \cap B|}{n} = 40 + 12 - 5 = 47.$$

Therefore, since 47% of the class is in $A \cup B$, 53% of the class is in $(A \cup B)^c$, that is, are right-handed brunettes.
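The arithmetic of Example 1 can be confirmed on a concrete model. The Python sketch below assumes, purely for illustration, a class of 100 students numbered 0 to 99 in which the blonds and the left-handed students are chosen to match the stated percentages; the particular numbering is our own and is not part of the example.

```python
def union_measure(m_a, m_b, m_a_and_b):
    """Theorem 1-7.2: m(A ∪ B) = m(A) + m(B) - m(A ∩ B)."""
    return m_a + m_b - m_a_and_b

# Percentages taken from the example.
blond_or_left = union_measure(40, 12, 5)
print(blond_or_left, 100 - blond_or_left)    # 47 53

# A hypothetical class of 100 students realizing those percentages.
students = set(range(100))
blonds = set(range(40))                  # 40 blonds
left_handed = set(range(35, 47))         # 12 left-handed, 5 of them blond
print(len(blonds & left_handed))                  # 5
print(len(blonds | left_handed))                  # 47
print(len(students - (blonds | left_handed)))     # 53
```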


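EXAMPLE 2. A certain type of spring balance is constructed so that it measures only weights between one and two pounds. If we have three steaks each of which is known to weigh between $\frac{1}{2}$ and 1 pound, how can we use the spring balance to determine their weights exactly? Let the steaks be denoted by $x_1, x_2, x_3$. For a subset $A$ of $X = \{x_1, x_2, x_3\}$, let $m(A)$ be the total weight of the steaks in the set $A$. Clearly $m$ is a measure on the subsets of $X$. Let $A_1 = \{x_2, x_3\}$, $A_2 = \{x_1, x_3\}$, $A_3 = \{x_1, x_2\}$. Because of our rough knowledge of the weights of $x_1$, $x_2$, and $x_3$, we are certain that $m(A_1)$, $m(A_2)$, and $m(A_3)$ can be accurately determined by the spring balance, since their weights are between 1 and 2 pounds. Now $A_1 \cup A_2 = A_2 \cup A_3 = A_3 \cup A_1 = X$, and $A_1 \cap A_2 = \{x_3\}$, $A_2 \cap A_3 = \{x_1\}$, $A_3 \cap A_1 = \{x_2\}$.

Thus, by Theorem 1-7.2,

$$\begin{aligned}
m(X) &= m(A_1) + m(A_2) - m(\{x_3\}), \\
m(X) &= m(A_2) + m(A_3) - m(\{x_1\}), \\
m(X) &= m(A_3) + m(A_1) - m(\{x_2\}).
\end{aligned}$$

Adding these equalities gives

$$3m(X) = 2\bigl(m(A_1) + m(A_2) + m(A_3)\bigr) - \bigl(m(\{x_1\}) + m(\{x_2\}) + m(\{x_3\})\bigr) = 2\bigl(m(A_1) + m(A_2) + m(A_3)\bigr) - m(X).$$

Hence,

$$m(X) = \tfrac{1}{2}\bigl(m(A_1) + m(A_2) + m(A_3)\bigr).$$

Therefore,

$$\begin{aligned}
m(\{x_1\}) &= \tfrac{1}{2}\bigl(m(A_2) + m(A_3) - m(A_1)\bigr), \\
m(\{x_2\}) &= \tfrac{1}{2}\bigl(m(A_3) + m(A_1) - m(A_2)\bigr), \\
m(\{x_3\}) &= \tfrac{1}{2}\bigl(m(A_1) + m(A_2) - m(A_3)\bigr).
\end{aligned}$$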
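It is possible to extend the result of Theorem 1-7.2 to an identity which involves more than two sets. For example, suppose that $A$, $B$, and $C$ are subsets of $X$ and that $m$ is a measure defined on a ring of subsets of $X$. Then using Theorem 1-7.2 repeatedly,

$$\begin{aligned}
m(A \cup B \cup C) &= m(A) + m(B \cup C) - m(A \cap (B \cup C)) \\
&= m(A) + m(B) + m(C) - m(B \cap C) - m((A \cap B) \cup (A \cap C)) \\
&= m(A) + m(B) + m(C) - m(B \cap C) - [m(A \cap B) + m(A \cap C) - m(A \cap B \cap C)] \\
&= m(A) + m(B) + m(C) - [m(A \cap B) + m(B \cap C) + m(C \cap A)] + m(A \cap B \cap C).
\end{aligned}$$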
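The three-set identity can be tested exhaustively on a small example. In the Python sketch below, the weights assigned to the elements of $\{1, 2, 3, 4\}$ are arbitrary values of our own choosing (including a negative one, since a measure in the sense of Definition 1-6.3 may take any real values), and the helper names are hypothetical.

```python
from itertools import combinations

# A measure of the type of Example 7 on the subsets of {1, 2, 3, 4}.
weights = {1: 2, 2: -3, 3: 5, 4: 7}

def m(A):
    return sum(weights[x] for x in A)

def subsets(X):
    X = list(X)
    return [frozenset(c) for r in range(len(X) + 1)
            for c in combinations(X, r)]

def three_set_identity_holds(A, B, C):
    """Compare m(A ∪ B ∪ C) with the expanded right-hand side."""
    lhs = m(A | B | C)
    rhs = (m(A) + m(B) + m(C)
           - (m(A & B) + m(B & C) + m(C & A))
           + m(A & B & C))
    return lhs == rhs

S = subsets(weights)
print(all(three_set_identity_holds(A, B, C)
          for A in S for B in S for C in S))   # True
```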


EXAMPLE 3. The classification of blood type is made on the presence or absence of three distinct antigens in the blood. These antigens are denoted by A, B, and Rh. The possible blood types are eight in number:

    Type                 Blood contains
    O, Rh negative       no antigens
    O, Rh positive       Rh
    A, Rh negative       A
    A, Rh positive       A and Rh
    B, Rh negative       B
    B, Rh positive       B and Rh
    AB, Rh negative      A and B
    AB, Rh positive      all antigens

Suppose that in a group of ten people: 4 have antigen A, 5 have antigen B, 6 have antigen Rh, 2 have antigens A and B, 3 have antigens A and Rh, 3 have antigens B and Rh, and 2 have all antigens. Determine the number of people in the group having type O, Rh positive blood. Let $T_A$, $T_B$, $T_{Rh}$ denote the sets of people having the respective antigens A, B, and Rh. The number of people with type O (Rh positive or negative) is

$$10 - |T_A \cup T_B| = 10 - (|T_A| + |T_B| - |T_A \cap T_B|) = 10 - (4 + 5 - 2) = 3.$$

The number of people with type O, Rh negative is

$$10 - |T_A \cup T_B \cup T_{Rh}| = 10 - [(4 + 5 + 6) - (2 + 3 + 3) + 2] = 10 - 9 = 1.$$

The number of people with type O, Rh positive is therefore $3 - 1 = 2$. By similar considerations, it is possible to determine the number of people with each of the eight possible blood types.
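The counts for type O in Example 3 follow from the two- and three-set identities applied to the cardinality measure. The Python sketch below is an illustration only; the function name `type_o_counts` and its parameter names are our own.

```python
def type_o_counts(n, a, b, rh, ab, a_rh, b_rh, all3):
    """Counts of type O blood from antigen data (cardinality measure).

    n: group size; a, b, rh: people with each antigen;
    ab, a_rh, b_rh: people with each pair; all3: people with all three.
    """
    in_a_or_b = a + b - ab                                   # |TA ∪ TB|
    in_any = (a + b + rh) - (ab + a_rh + b_rh) + all3        # |TA ∪ TB ∪ TRh|
    type_o = n - in_a_or_b
    o_negative = n - in_any
    return {"O": type_o,
            "O, Rh negative": o_negative,
            "O, Rh positive": type_o - o_negative}

print(type_o_counts(10, 4, 5, 6, 2, 3, 3, 2))
# {'O': 3, 'O, Rh negative': 1, 'O, Rh positive': 2}
```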

1. Use the identities of Section 1-3 to show that

$$\begin{aligned}
A \cup B &= (A \cap B^c) \cup (A^c \cap B) \cup (A \cap B), \\
A &= (A \cap B^c) \cup (A \cap B), \\
B &= (A^c \cap B) \cup (A \cap B).
\end{aligned}$$


2. Determine $m(A \cup B \cup C \cup D)$ in terms of $m(A)$, $m(B)$, $m(C)$, $m(D)$, $m(A \cap B)$, $m(A \cap C)$, $m(A \cap D)$, $m(B \cap C)$, $m(B \cap D)$, $m(C \cap D)$, $m(A \cap B \cap C)$, $m(A \cap B \cap D)$, $m(A \cap C \cap D)$, $m(B \cap C \cap D)$, and $m(A \cap B \cap C \cap D)$.

3. Show that the empty set $\Phi$ is the only set disjoint from itself.

4. Suppose that a certain spring balance measures only weights between $1\frac{1}{2}$ and 3 pounds. If four steaks are known to weigh between $\frac{1}{2}$ and 1 pound, show how the spring balance can be used to determine their weights exactly.

5. In Example 3, find the number of people with blood types A, Rh negative, A, Rh positive, AB, Rh negative, and AB, Rh positive.

6. Three numbers 1, 2, 3 are written in random order. Assume that each possible ordering is equally likely. What is the probability that at least one of the numbers will occupy its proper place, that is, 1 occurs first, or 2 occurs second, or 3 occurs third?

7. In a certain sample of the population, it is found that lung cancer occurs in 15 cases per 100,000 people. It is estimated that 80% of those with lung cancer smoke and that 65% of those without lung cancer smoke. (These are fictitious estimates.) Determine the approximate ratio of smokers with lung cancer to smokers without lung cancer.

CHAPTER 2

MATHEMATICAL INDUCTION
2-1 Proof by induction. The essence of mathematics is the construction of logically correct proofs for general theorems. A beginning student is apt to look upon a mathematical proof as a sort of magical incantation which somehow gives truth to a theorem. Nothing could be further from the intention of the person who devises the proof. A proof is worthless if it is not convincing, at least to an intelligent person who makes the effort to understand it.

Generally speaking, there are two steps leading to the understanding of a mathematical proof. The first step is the mechanical checking of the proof to see that each statement follows as a logical consequence of statements which precede it. If the argument survives this test, and if the final statement is the assertion which was to be proved, then it must be admitted that the proof is valid. But to really understand the proof it is necessary to take the second, more difficult, step. One must look at the overall pattern of the argument and discover the basic idea behind it. The ideal is to see the proof through the mind of the person who originated it. Of course, this may require a high degree of mathematical talent, to say nothing of hard work, but the reward in self-satisfaction is substantial, every bit as great, perhaps, as the reward which a musician obtains from mastering a difficult piano or violin sonata.

Fortunately there are a few general methods of constructing mathematical proofs which are both elementary and powerful. Our objective in this chapter is to explore in detail one of the most important of these methods, the so-called proof by mathematical induction.

Mathematical induction must be distinguished from logical induction. Roughly speaking, logical induction is the process of discovering general laws by noting some common feature in a number of special cases. As an example, if the sequence of numbers 1, 4, 9, 16, 25, . . . is written down, most people who have had some experience with arithmetic will infer by logical induction that the next term in this sequence will be 36. They recognize that 1, 4, 9, 16, and 25 are, respectively, the squares of 1, 2, 3, 4, and 5, so that the natural choice for the next term is $6^2 = 36$. Logical induction, although it is important for the process of mathematical discovery, is of no use in mathematical proofs. On the other hand, mathematical induction is primarily a technique of proof.

If we examine, say, Theorems 1-5.2, 1-5.5, and 1-6.4, we see that they are statements which involve an arbitrary natural number $n$. These statements become specific assertions only when particular numbers are substituted for $n$. For small values of $n$, the statements are quite easy to prove. The difficulty lies in finding a proof which takes care of all values of $n$. This is a situation in which mathematical induction can often be used. We present some examples of mathematical statements which involve an arbitrary natural number $n$.

EXAMPLE 1. $1 + 2 + 3 + \cdots + n = \frac{1}{2}n(n + 1)$.

EXAMPLE 2. If $a \ge -1$, then $(1 + a)^n \ge 1 + na$.

EXAMPLE 3. (Theorem 1-6.4). Let $m$ be a measure defined on a ring $S$ of subsets of $X$. Let $\{A_1, A_2, \ldots, A_n\}$ be a pairwise disjoint collection of sets in $S$. Then $m(A_1 \cup A_2 \cup \cdots \cup A_n) = m(A_1) + m(A_2) + \cdots + m(A_n)$.

In order to illustrate the mechanism of mathematical induction, consider Example 1. As is often the case, our notation is not well adapted for small values of $n$. For $n = 1$, 2, 3, and 4, the assertions should read

$$\begin{aligned}
1 &= \tfrac{1}{2} \cdot 1(1 + 1), \\
1 + 2 &= \tfrac{1}{2} \cdot 2(2 + 1) = 3, \\
1 + 2 + 3 &= \tfrac{1}{2} \cdot 3(3 + 1) = 6, \\
1 + 2 + 3 + 4 &= \tfrac{1}{2} \cdot 4(4 + 1) = 10.
\end{aligned}$$

For $n$ larger than 4, the notation expresses the asserted identity clearly enough. Thus if allowance is made for the inadequate notation in the cases $n = 1$, 2, 3, and 4, the statement of Example 1, $1 + 2 + 3 + \cdots + n = \frac{1}{2}n(n + 1)$, can be considered as a compact method of writing an infinite sequence of formulas:

$$\begin{aligned}
1 &= \tfrac{1}{2} \cdot 1(1 + 1), \\
1 + 2 &= \tfrac{1}{2} \cdot 2(2 + 1), \\
1 + 2 + 3 &= \tfrac{1}{2} \cdot 3(3 + 1), \\
1 + 2 + 3 + 4 &= \tfrac{1}{2} \cdot 4(4 + 1), \\
&\ \ \vdots
\end{aligned}$$

A person who is not familiar with the identity which we are considering may be somewhat surprised that the formula works for the values $n = 1$, 2, 3, and 4. But he may be justifiably skeptical that this fact makes the assertion true in general. Let us try $n = 5$. Then $1 + 2 + 3 + 4 + 5 = (1 + 2 + 3 + 4) + 5 = \frac{1}{2} \cdot 4(4 + 1) + 5 = 10 + 5 = 15$. Here we have been able to simplify our calculation somewhat by using the formula which we have already checked for the case $n = 4$. It turns out that this simplification is the real key to the general proof of the formula.


For $\frac{1}{2} \cdot 4(4 + 1) + 5$ can be expressed as

$$15 = \tfrac{1}{2} \cdot 4(4 + 1) + 5 = 5 \cdot (\tfrac{1}{2} \cdot 4 + 1) = \tfrac{1}{2} \cdot 5(4 + 2) = \tfrac{1}{2} \cdot 5(5 + 1),$$

the required result. A similar calculation can now be made for $n = 6$, and, using the same simplification, we find that the formula is also correct in this case. In fact, the process of passing from one formula to the next can be formalized if we are willing to use a variable symbol $n$ instead of a specific number. Thus, suppose that we have already shown that $1 + 2 + 3 + \cdots + n = \frac{1}{2}n(n + 1)$. Then

$$\begin{aligned}
1 + 2 + 3 + \cdots + n + (n + 1) &= (1 + 2 + 3 + \cdots + n) + (n + 1) \\
&= \tfrac{1}{2}n(n + 1) + (n + 1) \\
&= (n + 1)(\tfrac{1}{2}n + 1) \\
&= \tfrac{1}{2}(n + 1)(n + 2) \\
&= \tfrac{1}{2}(n + 1)[(n + 1) + 1].
\end{aligned}$$

The first, third, fourth, and fifth equality signs in the above identity are justified on the basis of the rules of algebraic operation. The remaining equality, the second, is justified by the assumption that formula $n$ of the sequence, $1 + 2 + 3 + \cdots + n = \frac{1}{2}n(n + 1)$, is valid. Note that $1 + 2 + 3 + \cdots + n + (n + 1)$ and $1 + 2 + 3 + \cdots + (n + 1)$ are both abbreviations for the sum of the first $n + 1$ natural numbers, so that they are equal. Thus, the equality of the first and last terms of the above expression is the $(n + 1)$st identity in the sequence of formulas which we are trying to prove. In other words, the calculation shows that if some identity of the sequence is valid, then so is the following one. In particular, since the fifth identity is correct (as well as the first, second, third, and fourth), so is the sixth. Consequently, so is the seventh, the eighth, and so on. Since any formula of the sequence will eventually be reached in this way, we conclude that the identity of Example 1 is valid for all $n$.

The proof we have given for the identity of Example 1 is a proof by mathematical induction. Although this method of attack is usually suggested for mathematical statements which involve an arbitrary natural number $n$, there are often other types of proof available. For example, we could also prove the formula of Example 1 as follows. Let $S$ be the sum of the first $n$ natural numbers. Then

$$\begin{aligned}
S &= 1 + 2 + \cdots + (n - 1) + n, \\
S &= n + (n - 1) + \cdots + 2 + 1, \\
2S &= (n + 1) + (n + 1) + \cdots + (n + 1) + (n + 1).
\end{aligned}$$

Therefore, $n(n + 1) = 2S$ and $S = \frac{1}{2}n(n + 1)$.
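Both arguments for the identity of Example 1 can be mirrored numerically. The Python sketch below (function names are our own) checks the first formula, the passage from formula $n$ to formula $n + 1$, and the pairing used in the second proof. Such a check covers only finitely many values of $n$ and is therefore an illustration, not a proof.

```python
def direct_sum(n):
    """1 + 2 + ... + n, added term by term."""
    return sum(range(1, n + 1))

def formula(n):
    """The closed form n(n + 1)/2, which is always an integer."""
    return n * (n + 1) // 2

# Basis: the first formula of the sequence.
base_ok = direct_sum(1) == formula(1)

# Induction step: adding n + 1 to formula n gives formula n + 1.
step_ok = all(formula(n) + (n + 1) == formula(n + 1) for n in range(1, 1000))

# Second proof: pairing k with n + 1 - k gives n terms, each equal to n + 1.
pairing_ok = all(
    sum(k + (n + 1 - k) for k in range(1, n + 1)) == 2 * direct_sum(n) == n * (n + 1)
    for n in range(1, 200)
)

print(base_ok, step_ok, pairing_ok)   # True True True
print(direct_sum(5), formula(5))      # 15 15
```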


Let us next consider Example 2: If $a \ge -1$, then $(1 + a)^n \ge 1 + na$. This example is somewhat different from the first one, since it has the form of a mathematical theorem, rather than a mathematical identity. As in the case of Example 1, the statement of Example 2 can be considered to be an abbreviation for an infinite sequence of statements:

If $a \ge -1$, then $(1 + a) \ge 1 + a$.
If $a \ge -1$, then $(1 + a)^2 \ge 1 + 2a$.
If $a \ge -1$, then $(1 + a)^3 \ge 1 + 3a$.
. . .

In these statements, $a$ represents an arbitrary real number, and we require $a \ge -1$ in each statement. Clearly, the first of these statements is true. Let us try to proceed as in Example 1, using the $n$th statement of the sequence to prove the following statement. Assume then that $(1 + a)^n \ge 1 + na$. Since $a \ge -1$, we have $1 + a \ge 0$. Therefore, multiplying each side of the assumed inequality by $1 + a$ preserves the direction of the inequality and gives

$$(1 + a)^{n+1} \ge (1 + na)(1 + a) = 1 + (n + 1)a + na^2.$$

Since $a^2 \ge 0$ for every real number $a$, it follows that

$$1 + (n + 1)a + na^2 \ge 1 + (n + 1)a.$$

Combining these two inequalities gives the desired result,

$$(1 + a)^{n+1} \ge 1 + (n + 1)a.$$
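The inequality of Example 2 and the role of the hypothesis $a \ge -1$ can be explored with exact rational arithmetic. In the Python sketch below, the function name `inequality_holds` and the sample values of $a$ are our own choices. The last line shows that the inequality can fail when $a < -1$, which is why the sign of $1 + a$ matters in the induction step.

```python
from fractions import Fraction

def inequality_holds(a, n):
    """Test (1 + a)^n >= 1 + n*a exactly."""
    return (1 + a) ** n >= 1 + n * a

# Sample values of a from -1 to 3 in steps of 1/4.
samples = [Fraction(k, 4) for k in range(-4, 13)]

all_ok = all(inequality_holds(a, n) for a in samples for n in range(1, 30))

# Algebra of the induction step: (1 + na)(1 + a) = 1 + (n + 1)a + n*a^2.
step_ok = all((1 + n * a) * (1 + a) == 1 + (n + 1) * a + n * a * a
              for a in samples for n in range(1, 30))

print(all_ok, step_ok)                      # True True
print(inequality_holds(Fraction(-4), 3))    # False: (-3)^3 = -27 < -11
```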

In this example, the argument needed to pass from statement $n$ to statement $n + 1$ is somewhat more complicated than the corresponding proof in Example 1. Nevertheless, it achieves the same end: from the truth of the first statement ($n = 1$), the truth of the second statement ($n = 2$) follows; from the truth of the second statement, the truth of the third statement ($n = 3$) follows; and so on. Eventually every statement of the sequence is proved.

Let us review the methods which we have used to prove the statements given in Examples 1 and 2. It should be evident that both proofs follow the same outline. That outline, stated in general terms, is the principle of induction.

The statements in Examples 1 and 2 involve an arbitrary natural number $n$. Thus, in both cases, we are presented with the problem of proving all of the statements in an infinite sequence $P_1, P_2, P_3, \ldots$ of mathematical assertions. The procedure which we followed to prove these statements in the examples consisted of two steps. First, we observed that the first statement $P_1$ of the sequence is true. Then we showed that for any $n$, it is possible to construct a proof of the statement $P_{n+1}$, based on the assumption that $P_n$ is true. This deduction of $P_{n+1}$ from $P_n$ took the form of an ordinary mathematical argument (using logic and known mathematical facts). The number $n$ occurred throughout the proof as a variable. For example, we could have substituted a number like 23 for each occurrence of $n$ in the proof to obtain a deduction of $P_{24}$ from $P_{23}$.

From these two steps in both Examples 1 and 2, it was concluded that all the statements were true. These conclusions were special cases of what is called the principle of mathematical induction.

(2-1.1). Principle of mathematical induction. Let $P_1, P_2, P_3, \ldots$ be a sequence of statements. Suppose that
(a) $P_1$ is true, and
(b) for any $n$, if $P_n$ is true, then $P_{n+1}$ is true.
Then all of the statements $P_1, P_2, P_3, \ldots$ are true.

By assumption (a), $P_1$ is true. By assumption (b) in the case $n = 1$, if $P_1$ is true, then $P_2$ is true. Thus $P_2$ is true. By (b) in the case $n = 2$, if $P_2$ is true, then $P_3$ is true. Thus $P_3$ is true. We can continue indefinitely in this way. Since any statement of the sequence will ultimately be reached, it follows that every one of the statements is true.

To apply the principle of induction in making a mathematical proof, it is necessary to establish that conditions (a) and (b) are satisfied. The proof of (a) is usually called the basis of the induction while the proof of (b) is called the induction step. In carrying out the proof of the induction step, it may be assumed throughout the argument that the statement $P_n$ is true. This is called the induction hypothesis.
It should be noted however that the validity of the induction step does not necessarily depend on the truth of $P_n$. For example, if $P_n$ is the assertion "$n^2 + n$ is odd," then $P_{n+1}$ is the statement "$(n + 1)^2 + (n + 1)$ is odd." Since $(n + 1)^2 + (n + 1) = n^2 + n + 2(n + 1)$, it follows that if $P_n$ is true, then so is $P_{n+1}$ (because the sum of an odd number and an even number is odd). That is, condition (b) is satisfied. However, $P_n$ is actually false for every $n \in N$, since $n^2 + n = n(n + 1)$, and at least one of the natural numbers $n$ or $n + 1$ is even.

Another aspect of the proof of the induction step which should be emphasized is that $n$ must represent an arbitrary natural number (that is, a variable) throughout the argument. This is essential because the fact that $P_n$ implies $P_{n+1}$ is applied successively with $n = 1, 2, 3, \ldots$.
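The assertion "$n^2 + n$ is odd" shows that the induction step alone proves nothing. The Python sketch below (the name `P` and the range of values tested are our own choices) confirms, for the values tried, that $P_n$ implies $P_{n+1}$ while the basis $P_1$ fails, so that the principle of induction does not apply.

```python
def P(n):
    """The assertion 'n^2 + n is odd'."""
    return (n * n + n) % 2 == 1

# Condition (b): whenever P(n) is true, so is P(n + 1).
# The implication holds for every n tested, but only vacuously.
step_ok = all((not P(n)) or P(n + 1) for n in range(1, 1000))

# The two expressions differ by the even number 2(n + 1).
difference_even = all(
    (((n + 1) ** 2 + (n + 1)) - (n * n + n)) == 2 * (n + 1)
    for n in range(1, 1000)
)

# Condition (a) fails, and in fact P(n) is false for every n tested.
basis_ok = P(1)
ever_true = any(P(n) for n in range(1, 1000))

print(step_ok, difference_even, basis_ok, ever_true)   # True True False False
```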


EXAMPLE 4. We prove Theorem 1-6.4 (Example 3). The proof is a typical application of mathematical induction to establish a mathematical theorem. The statement to be proved for the basis step is the following: If $\{A_1\}$ is a pairwise disjoint collection of sets, then $m(A_1) = m(A_1)$. This is obviously true. To prove the induction step, we make the induction hypothesis $P_n$: If $\{A_1, A_2, \ldots, A_n\}$ is a pairwise disjoint collection of sets in $S$, then $m(A_1 \cup A_2 \cup \cdots \cup A_n) = m(A_1) + m(A_2) + \cdots + m(A_n)$. The statement which has to be proved is $P_{n+1}$: If $\{A_1, A_2, \ldots, A_n, A_{n+1}\}$ is a pairwise disjoint collection of sets in $S$, then $m(A_1 \cup A_2 \cup \cdots \cup A_n \cup A_{n+1}) = m(A_1) + m(A_2) + \cdots + m(A_n) + m(A_{n+1})$. Note that by the definition of pairwise disjointness, if $\{A_1, A_2, \ldots, A_n, A_{n+1}\}$ is a pairwise disjoint collection, then so is $\{A_1, A_2, \ldots, A_n\}$. Thus the induction hypothesis can be applied to obtain

$$m(A_1 \cup A_2 \cup \cdots \cup A_n) = m(A_1) + m(A_2) + \cdots + m(A_n).$$

Consequently,

$$m(A_1) + m(A_2) + \cdots + m(A_n) + m(A_{n+1}) = m(A_1 \cup A_2 \cup \cdots \cup A_n) + m(A_{n+1}).$$

We would like to conclude that $m(A_1 \cup A_2 \cup \cdots \cup A_n) + m(A_{n+1}) = m(A_1 \cup A_2 \cup \cdots \cup A_n \cup A_{n+1})$. This conclusion is justified by Definition 1-6.3, provided that $A_1 \cup A_2 \cup \cdots \cup A_n$ and $A_{n+1}$ are disjoint. However by Theorem 1-5.4,

$$\begin{aligned}
A_{n+1} \cap (A_1 \cup A_2 \cup \cdots \cup A_n) &= (A_{n+1} \cap A_1) \cup (A_{n+1} \cap A_2) \cup \cdots \cup (A_{n+1} \cap A_n) \\
&= \Phi \cup \Phi \cup \cdots \cup \Phi = \Phi,
\end{aligned}$$

since $\{A_1, A_2, \ldots, A_n, A_{n+1}\}$ is a pairwise disjoint collection. Thus, we have shown that the truth of $P_{n+1}$ follows from the truth of $P_n$. By the principle of induction, this proves Theorem 1-6.4.

1. Use mathematical induction to prove the following identities.
(a) $1 + 3 + 5 + \cdots + (2n - 1) = n^2$
(b) $1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{1}{6}n(n+1)(2n+1)$
(c) $1^2 + 3^2 + 5^2 + \cdots + (2n+1)^2 = \frac{1}{3}(n+1)(2n+1)(2n+3)$
(d) $1^6 - 2^6 + 3^6 - \cdots + (-1)^{n-1}n^6 = $ [right-hand side lost]


2. Use mathematical induction to prove the following identities.

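3. Use mathematical induction to prove the following identities.
(a) $(1 \cdot 2 \cdot 3)^2 - (2 \cdot 3 \cdot 4)^2 + (3 \cdot 4 \cdot 5)^2 - \cdots + (-1)^{n-1}[n(n+1)(n+2)]^2 = $ [right-hand side lost]
(b) $(1 \cdot 2 \cdot 3 \cdots r) + [2 \cdot 3 \cdot 4 \cdots (r+1)] + [3 \cdot 4 \cdot 5 \cdots (r+2)] + \cdots + [n(n+1)(n+2) \cdots (n+r-1)] = $ [right-hand side lost]

4. Use mathematical induction to prove the following identities.
(a) $1 + 2 + 2^2 + \cdots + 2^{n-1} = 2^n - 1$
(b) $1 + 2(\frac{1}{2}) + 3(\frac{1}{2})^2 + \cdots + n(\frac{1}{2})^{n-1} = 4 - (n+2)(\frac{1}{2})^{n-1}$
(c) $3 + 3^3 + 3^5 + \cdots + 3^{2n-1} = \frac{3}{8}(9^n - 1)$

5. Use mathematical induction to prove the following identities.
(a) $\left(1 - \frac{1}{2}\right)\left(1 - \frac{1}{3}\right) \cdots \left(1 - \frac{1}{n+1}\right) = \frac{1}{n+1}$

6. Let $t$ be any real number different from 1. Use mathematical induction to prove the following identities.
(a) $1 + t + t^2 + \cdots + t^{n-1} = \dfrac{t^n - 1}{t - 1}$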


7. Use mathematical induction to prove the following inequalities.
(a) $n < 2^n$
(b) $2^{n+3} < (n+3)!$
(c) $n!\,r! < (n+r)!$, where $r \in N$.

8. Prove that for all natural numbers $m$ and $n$, $m(m+1) \cdots (m+n-1)$ is a multiple of $n$.

9. Prove by mathematical induction that if $0 \le a_1 \le 1$, $0 \le a_2 \le 1$, $\ldots$, $0 \le a_n \le 1$, then $(1 - a_1)(1 - a_2) \cdots (1 - a_n) \ge 1 - a_1 - a_2 - \cdots - a_n$.

10. Prove by mathematical induction that if $a_1, a_2, \ldots, a_{2^n}$ are positive real numbers, then

$$a_1 a_2 \cdots a_{2^n} \le \left(\frac{a_1 + a_2 + \cdots + a_{2^n}}{2^n}\right)^{2^n},$$

and the inequality is strict unless $a_1 = a_2 = \cdots = a_{2^n}$. [Hint: First show that

$$a_1 a_2 = \left(\frac{a_1 + a_2}{2}\right)^2 - \left(\frac{a_1 - a_2}{2}\right)^2.]$$

11. Give a proof by mathematical induction of the case of Theorem 1-5.4 in which $I = \{1, 2, \ldots, n\}$.

2-2 The binomial theorem. In this section, we will use mathematical induction to prove the binomial theorem. The binomial theorem and its generalization, the multinomial theorem, are important results not only in elementary algebra, but also in number theory, probability theory, and combinatorial analysis. An application of the binomial theorem has already been given in Section 1-3. Moreover, the proof of this theorem is a good exercise in the use of mathematical induction. The formulas

$$(x+y)^2 = x^2 + 2xy + y^2,$$
$$(x+y)^3 = x^3 + 3x^2y + 3xy^2 + y^3,$$
$$(x+y)^4 = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4$$

are familiar from elementary algebra. They suggest the problem of finding the general expanded version of the power $(x+y)^n$. An examination of the cases $n = 2, 3, 4$ suggests that the general formula should be of the following form:

$$(x+y)^n = x^n + N_{1,n}x^{n-1}y + N_{2,n}x^{n-2}y^2 + \cdots + N_{n-1,n}xy^{n-1} + y^n, \tag{2-1}$$

where the coefficient $N_{i,n}$ of $x^{n-i}y^i$ is some natural number which depends on $i$ and $n$. For example, if $n = 4$, then $N_{1,4} = 4$, $N_{2,4} = 6$, and $N_{3,4} = 4$. Mathematical induction now provides a means of verifying


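this guess. Let $P_n$ be the statement that equation (2-1) is valid with $N_{i,n}$ certain positive integers (which will be determined presently). In particular, $P_1$ is just the statement $x + y = x + y$, which is certainly true. Then, making the induction hypothesis that $P_n$ is true, an algebraic calculation gives

$$(x+y)^{n+1} = (x+y)(x+y)^n = (x+y)(x^n + N_{1,n}x^{n-1}y + \cdots + N_{n-1,n}xy^{n-1} + y^n)$$
$$= x^{n+1} + (1 + N_{1,n})x^n y + (N_{1,n} + N_{2,n})x^{n-1}y^2 + \cdots + (N_{n-1,n} + 1)xy^n + y^{n+1}.$$

This identity establishes the validity of $P_{n+1}$. It is only necessary to note that the coefficients for the identity $P_{n+1}$ will be

$$N_{1,n+1} = 1 + N_{1,n}, \quad N_{2,n+1} = N_{1,n} + N_{2,n}, \quad \ldots, \quad N_{n,n+1} = N_{n-1,n} + 1.$$

Since the $N_{i,n}$ are natural numbers by the induction hypothesis, so are the coefficients $N_{i,n+1}$. Moreover, these constants, which are usually called binomial coefficients, satisfy a simple relationship which makes it possible to obtain their value. For convenience, define

$$N_{0,n} = N_{n,n} = 1 \quad \text{for all } n. \tag{2-2}$$

Then

$$N_{i,n+1} = N_{i-1,n} + N_{i,n} \quad \text{for } 1 \le i \le n. \tag{2-3}$$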

Consider the diagram (known as the Pascal triangle):

                    1   1
                  1   2   1
                1   3   3   1
              1   4   6   4   1
            1   5  10  10   5   1
          1   6  15  20  15   6   1
        1   7  21  35  35  21   7   1
                    . . .


The rule of formation should be clear. The edges of the triangle are composed of ones. The position of the numbers in successive rows is staggered, so that every number not on an edge of the triangle has two numbers above it, one of them to the right and the other to the left. Moreover, each such number is the sum of the two numbers above it. If we write down a similar triangle with the binomial coefficients:

                $N_{0,1}$   $N_{1,1}$
            $N_{0,2}$   $N_{1,2}$   $N_{2,2}$
        $N_{0,3}$   $N_{1,3}$   $N_{2,3}$   $N_{3,3}$
                    . . .

we see that equations (2-2) and (2-3) express exactly the same rules of formation that were used to construct the Pascal triangle, and these rules clearly determine uniquely the numbers which appear in the triangle. Hence, the numbers in the $n$th row of the Pascal triangle are precisely the binomial coefficients for the expansion of $(x+y)^n$.
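The two rules of formation can be turned directly into a short computation. The Python sketch below is illustrative only; the function name `pascal_rows` and the choice to start with the row for $n = 1$ are our assumptions. It builds each row from the previous one using nothing but (2-2) for the edges and (2-3) for the interior entries.

```python
def pascal_rows(n_max):
    """Return {n: [N_{0,n}, ..., N_{n,n}]} for n = 1, ..., n_max."""
    rows = {1: [1, 1]}  # (x + y)^1 = x + y
    for n in range(1, n_max):
        prev = rows[n]
        # (2-3): N_{i,n+1} = N_{i-1,n} + N_{i,n} for 1 <= i <= n
        interior = [prev[i - 1] + prev[i] for i in range(1, n + 1)]
        # (2-2): the edge entries N_{0,n+1} and N_{n+1,n+1} are both 1
        rows[n + 1] = [1] + interior + [1]
    return rows


for n, row in pascal_rows(7).items():
    print(n, row)
```

Because every entry is forced by these two rules, any two tables built this way must agree, which is the uniqueness remark made in the text.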

A striking characteristic of the Pascal triangle is its symmetry about the vertical line through its center. This symmetry is expressed in terms of the binomial coefficients by the formula

$$N_{i,n} = N_{n-i,n}. \tag{2-4}$$

The proof that equation (2-4) is actually valid is another simple exercise in mathematical induction. The details are left to the reader.

Another less obvious relationship between binomial coefficients can be discovered from the Pascal triangle by tracing down a diagonal from left to right. For example, on the third diagonal we get the sequence 1, 3, 6, 10, 15, 21, . . . . The rule of formation here is not immediately evident. However, consider the successive quotients: $\frac{3}{1}$, $\frac{6}{3} = 2 = \frac{4}{2}$, $\frac{10}{6} = \frac{5}{3}$, $\frac{15}{10} = \frac{3}{2} = \frac{6}{4}$, $\frac{21}{15} = \frac{7}{5}$, . . . . Similarly, down the next diagonal, 1, 4, 10, 20, 35, . . . , the quotients are $\frac{4}{1}$, $\frac{10}{4} = \frac{5}{2}$, $\frac{20}{10} = 2 = \frac{6}{3}$, $\frac{35}{20} = \frac{7}{4}$, . . . . These observations suggest another identity:

This can in fact be proved by induction, using equations (2-2) and (2-3). We will not carry out this proof. Instead, let us see how equation (2-5) can be used to determine a numerical expression for the binomial coefficients $N_{i,n}$, $0 < i < n$. By successive cancellation, we obtain

$$N_{i,n} = \frac{n(n-1) \cdots (n-i+1)(n-i)(n-i-1) \cdots 2 \cdot 1}{[(n-i)(n-i-1) \cdots 2 \cdot 1][i(i-1) \cdots 2 \cdot 1]} = \frac{n!}{(n-i)!\,i!}.$$

Recalling the convention that $0! = 1$, we see that the expression $n!/(n-i)!\,i!$ represents $N_{i,n}$ even for $i = 0$ and $i = n$. For by (2-2),

$$N_{0,n} = 1 = \frac{n!}{(n-0)!\,0!}, \qquad N_{n,n} = 1 = \frac{n!}{(n-n)!\,n!}.$$

The discussion of this section can now be summarized as a theorem.

THEOREM 2-2.1. If $n$ is a natural number, then

$$(x+y)^n = N_{0,n}x^n + N_{1,n}x^{n-1}y + N_{2,n}x^{n-2}y^2 + \cdots + N_{n-1,n}xy^{n-1} + N_{n,n}y^n,$$

where the binomial coefficients $N_{i,n}$ are natural numbers which are given by the formula

$$N_{i,n} = \frac{n!}{(n-i)!\,i!}.$$

Except for the inductive proof of (2-5), all of the facts in Theorem 2-2.1 have been established. Instead of proving (2-5), we will show directly (by induction on $n$) that $N_{i,n} = n!/(n-i)!\,i!$ if $0 \le i \le n$. We have already observed that for all $n$, $N_{0,n} = n!/(n-0)!\,0!$, and $N_{n,n} = n!/(n-n)!\,n!$. This provides the basis of the induction. For if $n = 1$, then $0 \le i \le n$ implies that either $i = 0$ or $i = 1$. For the induction step, assume that $N_{i,n} = n!/(n-i)!\,i!$ for each $i$ between 0 and $n$ (including $i = 0$ and $i = n$). Then by (2-3),

$$N_{i,n+1} = N_{i-1,n} + N_{i,n} = \frac{n!}{(n-i+1)!\,(i-1)!} + \frac{n!}{(n-i)!\,i!} = \frac{n!\,[i + (n-i+1)]}{(n-i+1)!\,i!} = \frac{(n+1)!}{(n+1-i)!\,i!},$$

provided that $1 \le i \le n$. Since the cases $i = 0$ and $i = n+1$ are taken care of by the first remark of this paragraph, the proof of the induction step is complete. By the principle of mathematical induction, Theorem 2-2.1 is established.

The notation $N_{i,n}$ for the binomial coefficients seems appropriate for the interpretation of these numbers given in Section 1-3. However, a more common designation of $N_{i,n}$ is $\binom{n}{i}$. That is,

$$\binom{n}{i} = \frac{n!}{(n-i)!\,i!}.$$

Henceforth, we will use $\binom{n}{i}$ rather than $N_{i,n}$ to denote the binomial coefficients. There are numerous useful identities involving binomial coefficients. We give one sample of such a relation.

THEOREM 2-2.2. Let $m$ and $n$ be natural numbers, and let $k$ be an integer satisfying $0 \le k \le m$, $0 \le k \le n$. Then

$$\binom{m}{0}\binom{n}{k} + \binom{m}{1}\binom{n}{k-1} + \binom{m}{2}\binom{n}{k-2} + \cdots + \binom{m}{k}\binom{n}{0} = \binom{m+n}{k}.$$

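It is possible to prove this formula by induction on $m + n$, using (2-2) and (2-3). However, there is a simpler proof, based on Theorem 2-2.1, which makes it clear why such an identity holds. We observe that $\binom{m+n}{k}$ is the coefficient of $x^{m+n-k}y^k$ in the expansion of $(x+y)^{m+n}$. However,

$$(x+y)^{m+n} = (x+y)^m (x+y)^n = \left[\binom{m}{0}x^m + \binom{m}{1}x^{m-1}y + \cdots + \binom{m}{m-1}xy^{m-1} + \binom{m}{m}y^m\right]\left[\binom{n}{0}x^n + \binom{n}{1}x^{n-1}y + \cdots + \binom{n}{n-1}xy^{n-1} + \binom{n}{n}y^n\right].$$

If these expressions are multiplied together and the terms with the same powers of $x$ and $y$ are collected, it is clear that the coefficient of $x^{m+n-k}y^k$ is

$$\binom{m}{0}\binom{n}{k} + \binom{m}{1}\binom{n}{k-1} + \cdots + \binom{m}{k}\binom{n}{0}.$$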


The general idea of the proof of Theorem 2-2.2 was used in Section 1-3 to obtain an expression for the number of subsets of cardinality k in a set with n elements. The method consists of obtaining two different expressions for the coefficients of a polynomial in one or more variables, and then equating the corresponding coefficients. The justification for this procedure must wait until the nature of polynomials has been examined more carefully (see Section 9-2). As in the case of Theorem 2-2.2, there are many instances in which an inductive proof can be replaced by this process of "equating coefficients."
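The process of "equating coefficients" can be imitated mechanically. The following Python sketch is an illustration under our own conventions, not the book's: a polynomial in one variable $t$ is stored as its list of coefficients (which corresponds to setting $x = 1$, $y = t$), and the helper names `poly_mul` and `binomial_poly` are hypothetical. It multiplies the expansions of $(1+t)^m$ and $(1+t)^n$ and compares the result, coefficient by coefficient, with the expansion of $(1+t)^{m+n}$.

```python
from math import comb


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = power of t)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj  # collect terms with the same power of t
    return out


def binomial_poly(n):
    """Coefficient list of (1 + t)^n according to Theorem 2-2.1."""
    return [comb(n, i) for i in range(n + 1)]


def identity_holds(m, n):
    """Compare the two expressions for the coefficients of (1 + t)^(m + n)."""
    return poly_mul(binomial_poly(m), binomial_poly(n)) == binomial_poly(m + n)


print(all(identity_holds(m, n) for m in range(1, 9) for n in range(1, 9)))
```

The entry of index $k$ in the product is exactly the sum on the left-hand side of Theorem 2-2.2, so the comparison is a finite check of that theorem, not a substitute for the justification promised in Section 9-2.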

1. Write the binomial formula for the case $n = 15$. (Determine the numerical value of all of the coefficients.)

2. Calculate $n!/i!\,(n-i)!$ for $n = 7$, $0 < i < 7$, and compare your results with the values of the binomial coefficients obtained from the Pascal triangle.

3. Prove (2-4) by induction on $n$, using (2-2) and (2-3).

4. Prove (2-5) by induction on $n$, using (2-2) and (2-3).

5. Show that the binomial formula implies

$$(1+t)^n = \binom{n}{0} + \binom{n}{1}t + \binom{n}{2}t^2 + \cdots + \binom{n}{n}t^n.$$

Show conversely that this identity implies the binomial formula. [Hint: let $t = y/x$.]

6. (For students familiar with differential calculus.) Prove (2-5) by differentiating both sides of the identity

then expanding the left-hand side and comparing coefficients of equal powers of $t$.

7. Prove Theorem 2-2.2 by induction on $m + n$.

8. Using Theorem 2-2.2 and (2-4), show that

$$\binom{n}{0}^2 + \binom{n}{1}^2 + \binom{n}{2}^2 + \cdots + \binom{n}{n}^2 = \binom{2n}{n}.$$

9. Prove by induction on $n$ that if $m$ is a natural number satisfying $m \le n$, then

$$\binom{m}{m} + \binom{m+1}{m} + \binom{m+2}{m} + \cdots + \binom{n}{m} = \binom{n+1}{m+1}.$$

10. Prove by induction on $r$ that


2-3 Generalizations of the induction principle. In this section, we will consider some variations of the principle of mathematical induction. It is often difficult to use (2-1.1) directly in a mathematical proof, even though the problem under consideration seems to be accessible to induction. In many such cases, a slight modification of the induction principle (2-1.1) will lead to success. Our first observation amounts to only a change of the notation in (2-1.1).

(2-3.1). Let $r$ be an integer. Suppose that $P_r, P_{r+1}, P_{r+2}, \ldots$ is a sequence of statements such that (a) $P_r$ is true, and (b) for any $n \ge r$, if $P_n$ is true, then $P_{n+1}$ is true. Then all of the statements $P_r, P_{r+1}, P_{r+2}, \ldots$ are true.

In many inductive problems, the direct application of (2-1.1) requires an unnatural change in notation. It is better to use (2-3.1) in such cases.

EXAMPLE 1. If $n \ge 4$, $2^n < n!$. In this case $r = 4$. The assertion $P_4$ is correct: $2^4 = 16 < 24 = 4!$. It is easy to show that if $2^n < n!$, then $2^{n+1} < (n+1)!$. Note that the statements $P_1$ ($2^1 < 1!$), $P_2$ ($2^2 < 2!$), and $P_3$ ($2^3 < 3!$) are false.
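The role of the starting index $r = 4$ in Example 1 can be seen by simply evaluating the statements. This Python sketch is only a finite illustration (the name `p` for the statement is our own); it lists the values of $n$ up to 30 for which $2^n < n!$ fails.

```python
from math import factorial


def p(n):
    """P_n: the statement 2^n < n!."""
    return 2 ** n < factorial(n)


# Collect the indices in 1..30 at which the statement is false.
false_cases = [n for n in range(1, 31) if not p(n)]
print(false_cases)
```

The check shows why the sequence of statements must begin at $P_4$; the proof for all $n \ge 4$ still requires the induction step.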


Suppose that $P_1, P_2, P_3, \ldots$ is a sequence of statements, and that $P_1$ and $P_2$ are true. To prove $P_3$ by the ordinary induction process, we would show that $P_2$ implies $P_3$. However, all that we want is a proof that $P_3$ is true, and it may be the case that $P_3$ is a consequence of $P_1$, or of a combination of $P_1$ and $P_2$. More generally, if it has been shown that $P_1, P_2, \ldots, P_n$ are all true, and if it is possible to prove that the truth of $P_{n+1}$ is a consequence of some, or possibly all, of the statements $P_1, P_2, \ldots, P_n$, then we can assert that $P_1, P_2, \ldots, P_n, P_{n+1}$ are all true, and we are ready to go on to the next statement in the sequence. If this can be done for every $n$, then it is possible to proceed along the sequence of statements, proving them one at a time. Eventually, any particular statement will be shown to be true. We can formulate this process as a revised principle of induction which, at the same time, takes advantage of the more general notation introduced in (2-3.1).

(2-3.2). Let $r$ be an integer. Suppose that $P_r, P_{r+1}, P_{r+2}, \ldots$ is a sequence of statements such that (a) $P_r$ is true, and (b) for any $n \ge r$, if $P_r, P_{r+1}, \ldots, P_n$ are all true, then $P_{n+1}$ is true. Then all of the statements $P_r, P_{r+1}, P_{r+2}, \ldots$ are true.


A proof which is based on (2-3.2) is called a course of values induction. As in the case of ordinary induction, the proof of the first statement $P_r$ of the sequence is called the basis of the induction, and the proof that the truth of $P_r, P_{r+1}, \ldots, P_n$ implies that $P_{n+1}$ is true is called the induction step. For a course of values induction, the induction hypothesis is the assumption that $P_r, P_{r+1}, \ldots, P_n$ are true. The conditions (a) and (b) of (2-3.2) can be combined into a single condition.

(2-3.3). Let $r$ be an integer. Suppose that $P_r, P_{r+1}, P_{r+2}, \ldots$ is a sequence of statements such that for any $n \ge r$, if $P_m$ is true for all $m$ satisfying $r \le m < n$, then $P_n$ is true. Then all of the statements $P_r, P_{r+1}, P_{r+2}, \ldots$ are true.

The condition in (2-3.3) may seem ambiguous in the case where $n = r$, since it is impossible to have a natural number $m$ satisfying $r \le m < r$. But this simply means that the statement "$P_m$ is true for all $m$ satisfying $r \le m < r$" is automatically satisfied (or, in mathematical terminology, "vacuously satisfied"). Thus, the condition, for the case $n = r$, is just the requirement that $P_r$ is true, which is condition (a) of (2-3.2). The induction hypothesis for the form of the induction principle given by (2-3.3) is the assumption that all of the statements $P_m$ with $r \le m < n$ are true. This is different from the induction hypothesis in (2-3.2), where it is assumed that $P_m$ holds for $r \le m \le n$, that is, for $r \le m < n + 1$. Our discussion above shows that this shift from $n + 1$ to $n$ is necessary in order that condition (a) of (2-3.2) can be included in the condition of (2-3.3). Of course, condition (b) of (2-3.2) is also included in (2-3.3), which is assumed to hold for all $n$ (and therefore it holds if $n$ is replaced by $n + 1$).

Course of values induction is frequently used in the study of natural numbers. In order to give an example, we introduce the important concept of a prime number. A prime number (or simply a prime) is a natural number greater than 1, which is not divisible by any natural number other than itself and 1. For example, 2, 3, 5, 7, 11, and 13 are primes, while 4, 6, 8, 9, 10, 12, 14, and 15 are not primes.

EXAMPLE 2. We will prove using a course of values induction that every natural number greater than 1 is divisible by a prime. Since $n = 2$ is the first natural number greater than one, we take $r = 2$ in (2-3.3). The sequence of statements to be proved is:

$P_2$: 2 is divisible by a prime,
$P_3$: 3 is divisible by a prime,
. . .
$P_n$: $n$ is divisible by a prime,
. . .

Suppose that $n \ge 2$ and that $P_m$ is true for all $m$ satisfying $2 \le m < n$. This is the induction hypothesis. If $n$ is a prime, then $P_n$ is true, since $n$ is divisible by itself. Note that this observation covers the case $n = 2$, the basis of the induction. If $n$ is not a prime, then $n$ has a divisor $s$ which is different from 1 and $n$. Thus $n = s \cdot t$, where $2 \le s < n$. By the induction hypothesis, $P_s$ is true. Therefore, $s$ is divisible by some prime $p$, that is, $s = p \cdot k$ for some natural number $k$. But then $n = (p \cdot k) \cdot t = p \cdot (k \cdot t)$, which shows that $n$ is divisible by the prime $p$. By (2-3.3), $P_n$ is true for every natural number $n$ greater than 1.

It is not hard to show that the number $s$ in this proof must be less than $n - 1$. Thus it would be impossible to carry out the proof using ordinary induction, since $P_n$ is not a consequence of $P_{n-1}$, but rather $P_n$ follows from $P_s$, where $s < n - 1$.
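The structure of the argument in Example 2 corresponds closely to a recursive procedure. The Python sketch below is an illustration only; the function name `prime_divisor` is hypothetical, and the trial-division search for a divisor $s$ is our choice, not part of the proof. Each recursive call plays the role of applying the induction hypothesis $P_s$ for some $s < n$.

```python
def prime_divisor(n):
    """Return a prime that divides n, for any natural number n greater than 1."""
    if n < 2:
        raise ValueError("n must be a natural number greater than 1")
    for s in range(2, n):
        if n % s == 0:
            # n = s * t with 2 <= s < n: appeal to the statement P_s.
            # A prime dividing s also divides n.
            return prime_divisor(s)
    # No divisor other than 1 and n: n is itself a prime and divides itself.
    return n


print(prime_divisor(91), prime_divisor(97), prime_divisor(360))
```

The recursion terminates because every call is made on a strictly smaller number that is still at least 2, which is the computational counterpart of the course of values induction.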

As another example of a course of values induction we will give the promised proof of Theorem 1-5.2. Actually, we will prove only half of this theorem, since the other half has a similar proof.

Proof of Theorem 1-5.2. The statement to be proved is this: if sets $A_1, A_2, \ldots, A_n$ are united in any way, two at a time, using each set at least once, then the resulting set is equal to $\cup(\{A_1, A_2, \ldots, A_n\})$.

There is a surprise in this proof. The induction variable is not $n$, as one might expect. Consider the particular set $(A_1 \cup A_2) \cup (A_3 \cup A_4)$. For this case

$$(A_1 \cup A_2) \cup (A_3 \cup A_4) = \cup(S_1) \cup \cup(S_2) = \cup(S),$$

where $S_1 = \{A_1, A_2\}$, $S_2 = \{A_3, A_4\}$, and $S = S_1 \cup S_2 = \{A_1, A_2, A_3, A_4\}$. The first equality of this calculation depends on the observation of Example 2 of Section 1-5, that $A \cup B = \cup(\{A, B\})$; the second equality follows from Theorem 1-5.3. Each of the equality signs in the calculation can be considered as a course of values induction step. But the induction variable is the number of occurrences of $\cup$ in the expression $(A_1 \cup A_2) \cup (A_3 \cup A_4)$, not the number of sets involved in the expression. We can rephrase the statement to be proved so that it has the proper form for induction with the induction variable in clear view.

$P_n$: Let $S$ be a finite collection of sets. Let $E$ be a set obtained by forming $n$ binary unions, starting with the sets in $S$ and using each of these sets at least once. Then $E = \cup(S)$.

It is convenient to begin the sequence of induction statements with $P_0$, that is, we take $r = 0$ in (2-3.2).

$P_0$: Let $S$ be a finite collection of sets. Let $E$ be a set obtained by forming 0 binary unions (that is, no binary unions), starting with the sets in $S$ and using each of these sets at least once. Then $E = \cup(S)$.

If we are to form no binary unions and use each set in $S$ at least once in the process, then $S$ can contain only a single set $A$. Thus, for $r = 0$, $S = \{A\}$ and $E = A$. Since $E = A = \cup(\{A\}) = \cup(S)$, the basis of induction is satisfied.

Assume that $P_m$ is true for $0 \le m \le n$. Let $E$ be a set constructed in accordance with the statement $P_{n+1}$, that is, $E$ is obtained by forming $n + 1$ binary unions, starting with the sets of a finite collection $S$ and using each of the sets of $S$ at least once. The final step in the construction of $E$ is the formation of a union $E_1 \cup E_2$, where $E_1$ and $E_2$ are constructed from the sets in $S$ by forming binary unions. Thus $E = E_1 \cup E_2$, where $E_1$ is constructed with $m_1$ binary unions, using all sets in a subcollection $S_1$ of $S$, and $E_2$ is constructed with $m_2$ binary unions, using all sets of a subcollection $S_2$ of $S$. Since every set in $S$ occurs in the expression $E$, any set of $S$ must be used either in the construction of $E_1$, or in the construction of $E_2$. Therefore $S = S_1 \cup S_2$.

(Of course, there can be sets which are in both $S_1$ and $S_2$, and in fact, it might even happen that $S_1 = S_2 = S$. This is why induction on the number of sets in $S$ will not work.) The total number of occurrences of $\cup$ in $E_1 \cup E_2$ is $m_1 + 1 + m_2$. But since $E$ has $n + 1$ occurrences of $\cup$, it follows that $n + 1 = m_1 + m_2 + 1$. Thus, $m_1 \le n$ and $m_2 \le n$. But these are the conditions which we need in order to apply the induction hypothesis to $E_1$ and $E_2$. That is, by $P_{m_1}$ and $P_{m_2}$, it follows that

$$E_1 = \cup(S_1) \quad \text{and} \quad E_2 = \cup(S_2).$$

Thus, by Theorem 1-5.3,

$$E = E_1 \cup E_2 = \cup(S_1) \cup \cup(S_2) = \cup(S_1 \cup S_2) = \cup(S).$$


This completes the proof of the induction step. Therefore, by (2-3.2), the proof of the first half of Theorem 1-5.2 is complete.

1. Complete the proof that $2^n < n!$, for $n \ge 4$.

2. Prove in detail that the condition in (2-3.3) is equivalent to conditions (a) and (b) in (2-3.2).

3. Let $P_n$ be as in the proof of Theorem 1-5.2. Give the details of the proof of statements $P_1$ and $P_2$.

4. Prove the second half of Theorem 1-5.2.

5. It will be shown in Chapter 3 that if a prime number $p$ divides a product $r \cdot s$ of two natural numbers, then either $p$ divides $r$, or $p$ divides $s$. Using this fact, give an inductive proof of the following theorem. Let $t(n)$ be the number of (distinct) primes which divide the natural number $n$. Then $2^{t(n)} \le n$. If $n$ is odd, then $3^{t(n)} \le n$.

*2-4 The technique of induction. A real appreciation of the power of mathematical induction can be obtained only by studying some of the problems to which it applies. Some of these applications have been given in the examples of Sections 2-1, 2-2, and 2-3. In this section we will examine a few more samples of induction.

EXAMPLE 1. Consider the following list of identities:

$$1^3 = 1 = 1^2,$$
$$1^3 + 2^3 = 9 = 3^2,$$
$$1^3 + 2^3 + 3^3 = 36 = 6^2,$$
$$1^3 + 2^3 + 3^3 + 4^3 = 100 = 10^2,$$
$$1^3 + 2^3 + 3^3 + 4^3 + 5^3 = 225 = 15^2.$$

These suggest a general theorem: the sum of the first $n$ consecutive cubes is the square of a natural number. This statement is similar to Example 1 and Problems 1(a) and (b) of Section 2-1. Thus, it appears to be a likely candidate for induction. However, in the example and problems of Section 2-1, the induction step of the proof is carried out by simple algebraic manipulation. For this suggested theorem, there seems to be no such algebraic process available. The trouble here is that our statement is not precise enough. In contrast to the example and problems of Section 2-1, the statement to be proved does not say that the sum of the first $n$ cubes is given by a particular expression in terms of $n$, but only that it is a square of some number. Let us examine the cases given above with the hope of finding a more exact statement of the theorem.

The sequence of numbers 1, 3, 6, 10, 15 was encountered in the discussion of Example 1 of Section 2-1. They were the sums $1$, $1 + 2$, $1 + 2 + 3$, $1 + 2 + 3 + 4$, and $1 + 2 + 3 + 4 + 5$. We proved that these sums can be expressed in the form $\frac{1}{2}n(n+1)$. This observation suggests that a more complete statement than the original one is true, namely,

$$1^3 + 2^3 + \cdots + n^3 = \left[\tfrac{1}{2}n(n+1)\right]^2.$$

This identity can be proved by a straightforward application of the principle of mathematical induction. We leave this task as an exercise for the reader.
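Before attempting the induction, the sharpened statement can be tested on many cases. The Python sketch below is merely such a check (the function names are illustrative): it compares the sum of the first $n$ cubes with the square of $\frac{1}{2}n(n+1)$.

```python
def sum_of_cubes(n):
    """1^3 + 2^3 + ... + n^3, computed term by term."""
    return sum(k ** 3 for k in range(1, n + 1))


def triangular(n):
    """(1/2) n (n + 1), the sum 1 + 2 + ... + n."""
    return n * (n + 1) // 2


print([sum_of_cubes(n) for n in range(1, 6)])
print(all(sum_of_cubes(n) == triangular(n) ** 2 for n in range(1, 201)))
```

Agreement on the first 200 cases makes the conjecture plausible; the proof for every $n$ is the induction left to the reader.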

Example 1 illustrates a surprising phenomenon in the technique of using mathematical induction. Proofs by induction often fail because the theorem to be proved has not been stated in a strong enough form. When the appropriate statement of the result is discovered, mathematical induction may work quite well. The reason that this happens is not hard to see. When the statement of a theorem is strengthened, we of course have more to prove. However, we also have more to work with, because the induction hypothesis is also strengthened. The problem is to strike the right balance between hypothesis and conclusion, so that the induction step can be taken. Induction often works better if we make the problem more general. Moreover, the inductive method can sometimes be used to discover theorems. Our next example illustrates these facts.
EXAMPLE 2. Consider a square array of points with 10 points on each side (see Fig. 2-1). We define a path through the array of points to be a broken line segment starting at the lower left-hand dot, proceeding from dot to dot, moving either to the right or upward, and finally ending at the upper right-hand dot. One such path is shown in Fig. 2-1. The problem is to find the number of possible paths through the array. A more concrete formulation of this problem is to consider a person in the center of a large town, say at the corner of First Street and First Avenue. In how many ways can he drive to the corner of Tenth Street and Tenth Avenue, traveling by a route just 18 blocks in length? A little experimentation will convince the reader that the number of such paths is too large to count easily. One possible method of finding the desired number is to work up to it inductively. If the square has two dots on each side, as in Fig. 2-2, there are only two paths. We may hope to work up through squares with 3, 4, 5, . . . , 10 dots on each side. In fact, if $s$ is any natural number, it should be possible to determine the number of paths through a square array with $s$ dots along each side.

An even more general problem can be considered. How many paths are there through a rectangular array of points with $r$ dots horizontally and $s$ dots vertically? It may seem optimistic to try to solve this general problem when the particular case of a $10 \times 10$ square array is apparently not easy. However, here is a situation in which the general problem is more accessible to induction than the specific one. Let $P_{r,s}$ denote the number of paths through an $r$ by $s$ rectangular array, where either $r > 1$ or $s > 1$. If $r = 1$ or $s = 1$, then the dots are in line (vertical or horizontal) and there is clearly only one path along the line of dots. In other words,

$$P_{1,s} = P_{r,1} = 1, \quad \text{for all } r > 1 \text{ and } s > 1. \tag{2-6}$$

If both $r$ and $s$ are larger than 1, then there are two possible starts for a path, either to the dot $A$ immediately right of the lower left-hand dot, or to the dot $B$ just above the initial one (see Fig. 2-3). Suppose that the first move is to the right. We then have to follow a path from $A$ to the upper right-hand corner. The number of such paths is just the number of paths through an $r - 1$ by $s$ array, that is, $P_{r-1,s}$ in our notation. Similarly, if the first move is to point $B$, then there are $P_{r,s-1}$ ways to continue. Thus, since every path passing through $A$ is different from every path passing through $B$,

$$P_{r,s} = P_{r-1,s} + P_{r,s-1}, \tag{2-7}$$

for $r > 1$ and $s > 1$. The relations (2-6) and (2-7) are similar to the identities (2-2) and (2-3) which determine the binomial coefficients. This can be seen more easily by changing our notation. We restate (2-6) and (2-7) as

$$P_{1,n+1} = P_{n+1,1} = 1 \quad (n \ge 1), \tag{2-8}$$
Now define
Nk,l

= Pk+l,l-k+l
=

(O

5 75

1).

(2- 1o)

Letting 7

0, 1

n, and also 7

n, 1 = n in (2-lo), we obtain

which is (2-2). Similarly, (2-10) yields

Then using (2-9), we have


Ni,n+i = Pi+i,n-i+2

= Pi,n-i+2

Pi+i,n-i+i

= Ni-i,n

Ni,,

(1

5i5

n),

which is (2-3). It was mentioned in Section 2-2 that the only solutions of (2-2) and (2-3) are the binomial coefficients. Therefore, N k , l = (i). NOWlet 7 = r - 1 , l = r + S - 2. Then

I n the particular case of a square with of paths is

dots on a side, r

= S

and the number

By letting S = 10, we obtain the solution of the problem which was originally proposed; there are

2-53

IXDUCTIVE PROPERTIES O F T H E NATURrlL NUMBERS

75

paths through the 10 by 10 array of dots. Thus, a person driving from First Avenue and First Street to Tenth Avenue and Tenth Street and back in our mythical city could do so every day of the year for more than 66 years without ever twice using exactly the same route in either direction.

In this example, ure have used induction as a method of proof somewhat indirectly, namely, to show that P,,, = (ri-S-2 , 1 ). This induction was actually carried out in Section 2-1. However, t,hemet,hod of setting up the problem (that is, obtaining a relation between P,,,, P,-l,s, and P,,,-l) is clearly based on the principle of induction. Xote that in order to apply this technique, it is necessary to generalize the original problem of finding the number of paths in a 10 by 10 array to the corresponding problem for a rectangular array of arbitrary size.
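The relations for the number of paths lend themselves to direct computation. The following Python sketch is an illustration, not part of the original argument; the function name `count_paths` and the table layout are our own. It fills a table using only the boundary values $P_{1,s} = P_{r,1} = 1$ and the relation $P_{r,s} = P_{r-1,s} + P_{r,s-1}$, and then compares the result with the binomial coefficient $\binom{r+s-2}{r-1}$.

```python
from math import comb


def count_paths(r, s):
    """Number of paths through an r by s array of dots, moving right or up."""
    # Row and column 0 are unused padding so that indices match the text.
    # Every entry starts at 1, which already gives P[1][j] = P[i][1] = 1.
    P = [[1] * (s + 1) for _ in range(r + 1)]
    for i in range(2, r + 1):
        for j in range(2, s + 1):
            # The first move goes either right or up.
            P[i][j] = P[i - 1][j] + P[i][j - 1]
    return P[r][s]


print(count_paths(2, 2), count_paths(10, 10), comb(18, 9))
```

Note that the table for the 10 by 10 square cannot be filled without first computing entries for smaller rectangles, which is the sense in which the general problem is the natural one for induction.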

1. Carry out the proof of the identity

$$1^3 + 2^3 + \cdots + n^3 = \left[\tfrac{1}{2}n(n+1)\right]^2.$$

2. It is well known that the sum of the interior angles of a regular $n$-sided polygon is $(n - 2) \cdot 180$ degrees. Give a proof of this fact by first generalizing it to a suitable class of (not necessarily regular) polygons and then using induction. [Hint: Divide a regular polygon into two polygons with a smaller number of sides by drawing a line between two nonadjacent vertices. Then see what induction hypothesis is needed to carry through the induction step. You may use the fact that the sum of the interior angles of a triangle is 180 degrees.]

3. Consider a triangular array of dots obtained from the $s$ by $s$ square array of Example 2 by deleting all dots above the diagonal line from the lower left-hand corner to the upper right-hand corner. Figure 2-4 illustrates the case $s = 5$. Define paths from the lower left-hand dot to the upper right-hand dot as before. What is the number of paths through the triangular array with 10 dots on the horizontal and vertical sides?

FIGURE 2-4

2-5 Inductive properties of the natural numbers. There is a close relation between the principle of induction and the order properties of the natural numbers. Our purpose in this section is to describe this relationship. We have spoken several times of a sequence $P_1, P_2, P_3, \ldots$ of statements, and later we will discuss arbitrary sequences of rational and real numbers. So far we have not given a complete definition of a sequence. We have assumed that this notion has an intuitive meaning. When denoting a sequence by $x_1, x_2, x_3, \ldots$, we are taking advantage of the obvious fact that the objects of any sequence can be labeled by the natural numbers. This observation can be used to define the concept of a sequence in any formal development of mathematics based on set theory and the axioms of the natural numbers. That is, a sequence is a correspondence between the natural numbers and the objects of a set

$$X = \{x_1, x_2, x_3, \ldots\},$$

where 1 corresponds to $x_1$, 2 corresponds to $x_2$, and so on. The objects $x_1, x_2, x_3, \ldots$ need not be distinct. For example, if 1 corresponds to 0, 2 corresponds to 1, 3 corresponds to 0, 4 corresponds to 1, etc., we obtain the sequence which is usually written 0, 1, 0, 1, . . . . The elements of the sequence are the members of the set $X$. This definition is precise, and it agrees with our intuitive notion of a sequence. Moreover, if sequences are defined in this way, then their properties, and in particular the principle of induction, can be derived from properties of the natural numbers.

Let us see what property of the natural numbers it is that yields the principle of mathematical induction. A careful examination of the discussion in Section 2-1 shows that the principle of induction depends on the fact that if one proceeds along a sequence of objects, from one to the next, then eventually any given element of the sequence will be reached. Applying this observation to the sequence 1, 2, 3, . . . of natural numbers in their usual order, we get the following statement.

(2-5.1). Principle of induction for the natural numbers. Let $S$ be a set of natural numbers such that (a) $1 \in S$, and (b) if a natural number $n$ is in $S$, then the next number $n + 1$ is in $S$. Then $S$ contains every natural number.

In the formal development of mathematics, (2-5.1) is usually taken as an axiom. The principle of induction is deduced from it easily. Let $P_1, P_2, P_3, \ldots$ be a sequence of statements indexed by the natural numbers. Let $S$ be the set of all natural numbers $n$ for which the corresponding statement $P_n$ is true. Suppose that the sequence of statements satisfies the two conditions (a) and (b) of (2-1.1). Then by (2-1.1a), $P_1$ is true, so that by the definition of $S$, $1 \in S$. Thus $S$ satisfies (2-5.1a). If $n \in S$, then by the definition of $S$, $P_n$ is true, and therefore by (2-1.1b), $P_{n+1}$ is true. Hence, $n + 1 \in S$. Thus $S$ satisfies (2-5.1b). Consequently, according to (2-5.1), $S$ contains every natural number. This means that every one of the statements $P_1, P_2, P_3, \ldots, P_n, \ldots$ is true.


There is another important property related to the ordering of the natural numbers. Let $A$ be a nonempty finite set of natural numbers. Then the elements of $A$ can be listed in some way:

$$A = \{n_1, n_2, \ldots, n_k\}.$$

By successively examining the numbers in this list, it is possible to pick out a smallest one, that is, a number $n_i$ which satisfies

$$n_i \le n_1, \quad n_i \le n_2, \quad \ldots, \quad n_i \le n_k.$$

Thus, $A$ contains a smallest number. This same conclusion is reasonable even if $A$ is infinite. Suppose that $A$ is any nonempty set of natural numbers. Since $A$ is not empty, it is possible to select some element $a \in A$. Let $A_0 = \{n \mid n \in A, n \le a\}$. Then $A_0$ consists of some, but not necessarily all, of the natural numbers $1, 2, 3, \ldots, a$. Therefore, $A_0$ contains only a finite number of elements, and it is not empty since $a \in A_0$. Thus $A_0$ has a smallest element. Call this smallest element $m$. By the definition of $A_0$, $m \in A$ and $m \le a$. If $n \in A$, then either $n \in A_0$, or $n > a$. In the first case, $m \le n$ because $m$ is the smallest element of $A_0$. In the second case, we have $n > a \ge m$. Thus, $m$ is the least element of $A$. In practice, it may be difficult to determine which numbers belong to $A_0$, and if the cardinality of $A_0$ is large (say $|A_0| = 10^{10^{10}}$), the process of selecting the smallest number $m$ might take several lifetimes. However, $A$ does have a least element, whether we can find it easily or not. This fact, which has important mathematical applications, can be stated as follows.

(2-5.2). Well-ordering principle. Let $A$ be a nonempty set of natural numbers. Then $A$ contains a smallest number $m$ (that is, $m \in A$ and $m \le n$ for all $n \in A$).

It is obvious that the conclusion of the well-ordering principle is not true if $A$ is the empty set. There are two reasons for pointing this out. First, a common blunder in applying the principle is committed by failing to prove that the set to which it is applied is not empty. Second, the well-ordering principle is often used to show indirectly that some set $A$ of natural numbers is empty. One assumes that $A$ is not empty, so that the well-ordering principle can be used to infer that $A$ contains a smallest number. Then, from the existence of this smallest number in $A$, some contradiction follows. Therefore, $A$ must be empty. This method of proof can often be used instead of a course of values induction.
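The argument that reduces an arbitrary nonempty set $A$ to the finite set $A_0$ is constructive once some element $a$ of $A$ is known. The Python sketch below is an illustration under our own assumptions: the set $A$ is represented by a hypothetical membership test `contains`, and `witness` plays the role of the chosen element $a$. The search examines only the numbers $1, 2, \ldots, a$, which is exactly the range in which $A_0$ lies.

```python
def least_element(contains, witness):
    """Return the smallest natural number n with contains(n) true.

    `contains` is a membership test for a set A of natural numbers, and
    `witness` is a known element a of A, so that A is not empty.
    """
    if witness < 1 or not contains(witness):
        raise ValueError("witness must be a natural number belonging to A")
    # A_0 = {n in A : n <= a}; its least element is the least element of A.
    for n in range(1, witness + 1):
        if contains(n):
            return n


# Example: A = multiples of 7 that exceed 20, with the known element a = 70.
print(least_element(lambda n: n % 7 == 0 and n > 20, 70))
```

The loop always returns, since the witness itself passes the membership test; the required check on the witness reflects the warning in the text that the set must first be shown to be nonempty.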


EXAMPLE 1. Using the well-ordering principle, we give a new proof of the fact that every natural number $n$ greater than 1 is divisible by a prime number (originally proved in Example 2, Section 2-3, using course of values induction). Let $A$ be the set of all natural numbers $n$ which are greater than one and not divisible by a prime. If $A$ is empty, then every $n > 1$ is divisible by a prime, and this is what we wish to show. Hence, suppose that $A$ is not empty. Then by the well-ordering principle, there is a smallest number $m \in A$. Since $m$ belongs to $A$, it is not 1 and it is not divisible by a prime. In particular, $m$ itself is not a prime. Therefore, $m$ is divisible by some natural number $k$ which is different from 1 and $m$. In particular, $k < m$, and $k \notin A$, since $m$ is the smallest number in $A$. By the definition of $A$, this means that either $k = 1$, or $k$ is divisible by a prime. However, $k \ne 1$, so that $k$ must be divisible by a prime $p$. Since $p$ divides $k$ and $k$ divides $m$, it follows that $p$ divides $m$. But this contradicts the fact that $m \in A$, since no number in $A$ is divisible by a prime. Thus, the original assumption that $A$ is not empty must be incorrect. That is, $A = \Phi$, which means that every natural number greater than one is divisible by a prime.
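The conclusion of Example 1 can be checked by machine over a finite range. The sketch below is not a proof, and its helper names (`is_prime`, `has_prime_divisor`, `counterexamples`, `smallest_divisor_above_one`) are hypothetical. It confirms that the set $A$ has no members up to a chosen bound, and that the least divisor greater than 1 of any $n > 1$ is prime, which is the same "smallest element" idea in a constructive form.

```python
def is_prime(n: int) -> bool:
    """Trial division: n is prime if n > 1 and no d with 1 < d < n divides n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def has_prime_divisor(n: int) -> bool:
    """True if some prime p divides n."""
    return any(n % p == 0 and is_prime(p) for p in range(2, n + 1))


def counterexamples(bound: int) -> list:
    """Members of A (numbers > 1 with no prime divisor) that are <= bound."""
    return [n for n in range(2, bound + 1) if not has_prime_divisor(n)]


def smallest_divisor_above_one(n: int) -> int:
    """Least k > 1 dividing n; by the argument of Example 1 it must be prime."""
    if n < 2:
        raise ValueError("n must be greater than 1")
    k = 2
    while n % k != 0:
        k += 1
    return k
```

For instance, `smallest_divisor_above_one(91)` returns 7, a prime divisor of 91.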

1. Show by examples that the well-ordering principle is not true for subsets of $Z$, $Q$, or $R$. [Hint: Give examples of nonempty sets which do not contain a smallest element.]

2. Show that the well-ordering principle is satisfied for subsets of the following sets.
(a) $\{n \mid n \in Z,\ n \ge k\}$, where $k$ is any integer
(b) $\{1 - 1/n \mid n \in N\}$

3. Show by a method similar to the derivation of (2-1.1) from (2-5.1) that the generalized induction principle (2-3.3) can be deduced from the well-ordering principle (2-5.2).

4. Show that if $m$ and $n$ are any two natural numbers, then at least one of the following is true.
(a) $m = n$
(b) there is a natural number $k$ such that $m = n + k$
(c) there is a natural number $l$ such that $n = m + l$
[Hint: Let $S$ be the set of all natural numbers $m$ such that for all $n$ either (a), (b), or (c) is satisfied. Note that 1 is in $S$. Show that condition (b) of (2-5.1) is also satisfied.]


5. Show that (2-5.1) is a consequence of (2-5.2).

6. Give an inductive proof of the sequence of statements $P_1, P_2, P_3, \ldots$, where $P_n$ is the assertion: if $A$ is a set of natural numbers and if $n \in A$, then $A$ contains a smallest element.

7. Show by the well-ordering principle that every nonempty finite set of natural numbers has a largest element.

*2-6 Inductive definitions. As the reader probably realizes, definitions are an important part of mathematics. Often an inductive process is used to formulate a mathematical definition. Definitions of this sort are called inductive, or recursive.

EXAMPLE 1. Let $x$ be a real number. Then the $n$th power of $x$ is defined for all $n$ as the product of $x$ with itself $n$ times. However, a more precise definition of $x^n$ is formulated inductively by means of two requirements:

(a) $x^1 = x$;
(b) $x^{n+1} = x^n \cdot x$.
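An inductive definition of this kind translates almost word for word into a recursive function. The following minimal sketch assumes the two requirements $x^1 = x$ and $x^{n+1} = x^n \cdot x$; the function name `power` is a hypothetical choice.

```python
def power(x: float, n: int) -> float:
    """Compute x**n for a natural number n from the two inductive requirements."""
    if n < 1:
        raise ValueError("n must be a natural number (n >= 1)")
    if n == 1:
        return x                 # first requirement: x^1 = x
    return power(x, n - 1) * x   # second requirement: x^(n+1) = x^n * x
```

For example, `power(2, 10)` evaluates to 1024 by ten applications of the two rules.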

EXAMPLE 2. Many important sequences are defined recursively. We cite as an example the Fibonacci sequence:

$u_1 = 1,\ u_2 = 1,\ u_3 = 2,\ u_4 = 3,\ u_5 = 5,\ u_6 = 8,\ u_7 = 13,\ \ldots.$

This sequence is defined by the inductive conditions

$u_1 = u_2 = 1, \qquad u_{n+1} = u_n + u_{n-1} \quad (\text{for } n \ge 2).$
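The inductive conditions for the Fibonacci sequence can be turned directly into a short program. This is a sketch only; the function name `fibonacci` and its parameter `count` are hypothetical.

```python
def fibonacci(count: int) -> list:
    """Return the terms u_1, ..., u_count of the Fibonacci sequence."""
    terms = [1, 1]  # the conditions u_1 = u_2 = 1
    while len(terms) < count:
        # the condition u_(n+1) = u_n + u_(n-1)
        terms.append(terms[-1] + terms[-2])
    return terms[:count]
```

Calling `fibonacci(7)` reproduces the seven terms listed in the example.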

EXAMPLE 3. Often informal definitions are given for objects which should properly be defined by induction. For example, the sum $S_n = 1 + 2 + \cdots + (n - 1) + n$ of the first $n$ natural numbers was introduced informally in Section 2-1. The inductive definition of this sum is given by the conditions $S_1 = 1$, $S_{n+1} = S_n + (n + 1)$. A proof by mathematical induction would establish the identity $S_n = \frac{1}{2}n(n + 1)$ in the same way as before.
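As a numerical sanity check (not a substitute for the induction proof), the inductive definition of $S_n$ can be evaluated and compared with the closed form $\frac{1}{2}n(n+1)$. The function name `partial_sum` is a hypothetical choice.

```python
def partial_sum(n: int) -> int:
    """S_n defined inductively by S_1 = 1 and S_(n+1) = S_n + (n + 1)."""
    if n < 1:
        raise ValueError("n must be a natural number (n >= 1)")
    if n == 1:
        return 1
    return partial_sum(n - 1) + n


# Compare with the closed form n(n + 1)/2 for the first 200 values of n.
agrees = all(partial_sum(n) == n * (n + 1) // 2 for n in range(1, 201))
```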

An inductive definition consists of two parts:

(1) conditions $C_1, C_2, \ldots, C_k$, such that $C_1$ determines a unique object $O_1$, $C_2$ determines a unique object $O_2$, $\ldots$, $C_k$ determines a unique object $O_k$;
(2) a condition $K$, which for any natural number $n \ge k$ determines a unique object $O_{n+1}$ in terms of $O_1, O_2, \ldots, O_n$.

In Example 1, $O_n$ is the number $x^n$ which is obtained by taking the $n$th power of $x$. In this example, $k = 1$, and the condition $C_1$ is the


equality $x^1 = x$. The condition $K$ is the equality $x^{n+1} = (x^n) \cdot x$. In Example 2, $O_n$ is the $n$th term of the Fibonacci sequence. Here, $k = 2$, $C_1$ is the condition $u_1 = 1$, $C_2$ is the condition $u_2 = 1$, and $K$ is the condition $u_{n+1} = u_n + u_{n-1}$, $n \ge 2$.

It is important to show that inductive definitions give uniquely determined objects $O_n$ for every natural number $n$. That is, we would like to know that there exists a sequence $O_1, O_2, \ldots, O_n, \ldots$ of objects such that $O_n$ satisfies $C_n$ for every $n \le k$, and $O_n$ satisfies $K$ for $n > k$, and that if $O'_1, O'_2, \ldots, O'_n, \ldots$ is any sequence of objects such that $O'_n$ satisfies $C_n$ for $n \le k$, and $O'_n$ satisfies $K$ for $n > k$, then $O'_1 = O_1$, $O'_2 = O_2$, $\ldots$, $O'_n = O_n$, $\ldots$. This can be proved using the induction principle for the natural numbers (2-5.1).

THEOREM 2-6.1. Suppose that $C_1, C_2, \ldots, C_k$, and $K$ are conditions having the properties stated in (1) and (2) above. Then there is a unique sequence $O_1, O_2, \ldots, O_n, \ldots$ of objects such that $O_n$ satisfies $C_n$ for $n \le k$, and $O_n$ satisfies $K$ for $n > k$.

Proof. Let $S$ be the set of all natural numbers $m$ such that there are unique objects $O_1, O_2, \ldots, O_m$ with the properties that for $n \le m$ and $n \le k$, $O_n$ satisfies $C_n$, and for $k < n \le m$, $O_n$ satisfies $K$ (provided $m > k$). Then $1, 2, \ldots,$ and $k$ are in $S$, since by (1), conditions $C_1, C_2, \ldots, C_k$, respectively, determine unique objects $O_1, O_2, \ldots, O_k$. Suppose that some $m \ge k$ belongs to $S$. Then by (2), there is a unique object $O_{m+1}$ satisfying $K$. Therefore there are unique objects $O_1, O_2, \ldots, O_m, O_{m+1}$, such that $O_n$ ($n \le m + 1$) satisfies $C_n$ for $n \le k$, and $O_n$ satisfies $K$ for $n > k$. Thus $m + 1 \in S$. Hence, $S$ satisfies the conditions of (2-5.1), and therefore every natural number is in $S$. This means that there is a unique sequence $O_1, O_2, \ldots, O_n, \ldots$ of objects such that $O_n$ satisfies $C_n$ for $n \le k$, and $O_n$ satisfies $K$ for $n > k$.
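The two-part scheme behind Theorem 2-6.1 can be mirrored by a small generic routine. In this sketch, `base` holds the objects $O_1, \ldots, O_k$ fixed by $C_1, \ldots, C_k$, and `step` stands for the condition $K$, producing $O_{n+1}$ from $O_1, \ldots, O_n$. All names here (`define_inductively`, `base`, `step`, `count`) are hypothetical and not part of the text.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def define_inductively(
    base: Sequence[T], step: Callable[[List[T]], T], count: int
) -> List[T]:
    """Return O_1, ..., O_count for the inductive definition (base, step)."""
    if not base:
        raise ValueError("at least one starting condition C_1 is required")
    objects = list(base[:count])
    while len(objects) < count:
        # K determines O_(n+1) uniquely from the objects found so far.
        objects.append(step(objects))
    return objects


# Example 1 with x = 3: k = 1, C_1 is x^1 = 3, K multiplies the last power by 3.
powers_of_3 = define_inductively([3], lambda prev: prev[-1] * 3, 5)

# Example 2: k = 2, C_1 and C_2 give u_1 = u_2 = 1, K adds the last two terms.
fib = define_inductively([1, 1], lambda prev: prev[-1] + prev[-2], 8)
```

Because `step` is an ordinary function of the earlier objects, each new object is determined without ambiguity, which is the computational counterpart of the uniqueness asserted in the theorem.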

As one might expect, if objects $O_n$ are defined inductively, then mathematical induction is an important tool for establishing the properties of these objects. We illustrate this fact by obtaining an estimate of the size of the Fibonacci numbers.

Let $a$ be a positive real number satisfying $1 + a = a^2$. Then $a$ is a solution of the equation $x^2 - x - 1 = 0$. By the formula for the roots of a quadratic equation, the solutions of this equation are $\frac{1}{2}(1 + \sqrt{5})$ and $\frac{1}{2}(1 - \sqrt{5})$. Of the solutions, only the first is positive. Thus, $a = \frac{1}{2}(1 + \sqrt{5})$. In particular, $a > 1$. Hence, $u_1 = 1 < a$ and $u_2 = 1 < a < a^2$. Make the induction hypothesis that $u_m < a^m$ for all $m \le n$, where $n \ge 2$. Then

$u_{n+1} = u_n + u_{n-1} < a^n + a^{n-1} = a^{n-1}(a + 1) = a^{n-1} \cdot a^2 = a^{n+1}.$

Thus by the principle of induction we conclude that $u_n < a^n$ for all $n$. Note that although we do not know an explicit formula for the number $u_n$, we can nevertheless find some of its properties. This is typical of objects which are defined inductively.
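The estimate $u_n < a^n$ can be spot-checked numerically. The sketch below uses floating-point arithmetic, so it is evidence rather than proof, and the names `fibonacci_terms` and `bound_holds` are hypothetical.

```python
import math

# The positive solution of x^2 - x - 1 = 0.
a = (1 + math.sqrt(5)) / 2


def fibonacci_terms(count: int) -> list:
    """Return u_1, ..., u_count using u_1 = u_2 = 1, u_(n+1) = u_n + u_(n-1)."""
    terms = [1, 1]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]


# Check u_n < a^n for n = 1, ..., 60.
bound_holds = all(
    u < a ** n for n, u in enumerate(fibonacci_terms(60), start=1)
)
```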

1. Using the inductive definition of Example 1, prove that

and if $0 < x < y$, then $0 < x^n < y^n$.

2. Give an inductive definition for the sums in Problems 1(a), 2(a), 3(a), and 4(a) of Section 2-1.

3. Give an inductive definition of $n!$.

4. Give an inductive definition of $nx$ (the operation of adding $x$ to itself $n$ times), and prove that this is the same as the operation of multiplying $x$ by $n$.

5. List the first 50 terms of the Fibonacci sequence.

6. Show that no two consecutive terms of the Fibonacci sequence are divisible by the same natural number greater than one.

7. In the Fibonacci sequence $u_1, u_2, \ldots, u_n, \ldots$, show that $u_n$ is even if $n$ is a multiple of 3 and that $u_n$ is odd otherwise.

8. In the Fibonacci sequence $u_1, u_2, \ldots, u_n, \ldots$, show that for all $n$, $a^n < u_{n+2}$, where $a = \frac{1}{2}(1 + \sqrt{5})$.

9. Let $a = \frac{1}{2}(1 + \sqrt{5})$ and $b = \frac{1}{2}(1 - \sqrt{5})$. Prove that if $u_n$ is the $n$th term of the Fibonacci sequence, then $u_n = (1/\sqrt{5})(a^n - b^n)$.
10. Let the sequence $v_1, v_2, v_3, \ldots$ be defined inductively by

$v_1 = 1, \qquad v_2 = 3, \qquad v_{n+1} = v_n + v_{n-1} \quad (\text{for } n \ge 2).$

List the first 25 terms of this sequence. Show by induction that for any natural numbers $m$ and $n$, $2u_{m+n} = u_m v_n + u_n v_m$, where $u_1, u_2, \ldots, u_n, \ldots$ is the Fibonacci sequence.

CHAPTER 3

THE NATURAL NUMBERS


3-1 The definition of numbers. If you were to ask a person with an average education to define mathematics, his answer might be "the science of numbers." Although this definition does not convey much information, it would be hard to find a more concise description of mathematics. Almost all of mathematics is concerned directly or indirectly with ordinary numbers. For this reason, it is important to examine carefully the concept of number, a notion which up to now has been taken for granted.

What are numbers? This is a question of concern to both mathematicians and philosophers. Many answers have been given, but none can be considered to be final. However, 19th century mathematical research has shown that the common number systems (the integers, rational numbers, real numbers, and complex numbers) can all be constructed from the natural numbers. The question "What are numbers?" is therefore replaced by the apparently simpler problem "What are natural numbers?"

In this chapter, we will discuss the natural numbers. In later chapters, the integers, rational numbers, real numbers, and complex numbers will be studied in turn. We will concentrate on the basic properties of each number system, and discuss the extent to which these properties determine the system. The intuitive idea of the various kinds of numbers which students develop in school will be critically examined. Finally, we will indicate the processes by which the integers are formally constructed from the natural numbers, the rational numbers are obtained from the integers, the real numbers are defined from the rationals, and the complex numbers are constructed from the real numbers. It is important for advanced mathematics students to go through the constructions of these number systems in detail, but students at an elementary level will find the process long and tedious. It is better for beginners to concentrate on understanding the definitions and why they are made. The text which follows aims to present a guide toward such understanding. A complete development of the number systems will be given in the form of problems, and interested students are invited to work out the details for themselves.

The concept of a natural number was developed over a period of many centuries as a tool for counting the objects in sets. That is, the natural numbers were introduced as labels to designate the property shared by sets of equal cardinality (in the sense introduced by Cantor, see Section 1-2). Thus, when two traders of ancient Egypt agreed to exchange


three camels for seven wives, it was essential that each understood how many items he would have to give up and how many he would receive. This understanding could be achieved, for example, by means of "counters," that is, collections of stones from which sets of small cardinality might be formed in the palm of the hand.

From very early times, numbers have been treated as concrete objects, rather than the names of properties of sets. In fact, this fictitious viewpoint was essential for the creation and development of mathematics. It was not until the 19th century that mathematicians started wondering how to justify the existence of numbers.

One of the earliest attempts to define the natural numbers as objects was made by the German mathematician Gottlob Frege (1848-1925) in a book on the foundations of arithmetic, published in 1893. Basing his work on Cantor's set theory, Frege defined the cardinal number of a set $A$ to be the class of all sets which are equivalent to $A$ by Cantor's definition of equivalence (see Definition 1-2.3). According to Frege's definition, the cardinal number of $\{0, 1\}$ is the class of all sets $\{a_1, a_2\}$, where $a_1$ and $a_2$ are distinct objects. Similarly, the cardinal number of $\{\mathrm{I}, \mathrm{II}, \mathrm{III}\}$ is the class of all sets $\{a_1, a_2, a_3\}$, where $a_1$, $a_2$, and $a_3$ are distinct elements. The natural numbers are defined to be the cardinal numbers of finite sets. Thus, the number "2" is a set, namely, the set of all pairs of distinct objects.

Frege's definition of the natural numbers is therefore based on two concepts: the notion of a finite set, and the definition of the cardinal number of an arbitrary set. In Section 1-2, it was taken for granted that the natural numbers existed, and that their properties were well known. Thus, it made sense in Definition 1-2.2 to define a set to be finite if it was equivalent to $\{1, 2, \ldots, n\}$ for some natural number $n$. This definition cannot be used if the natural numbers are defined as in Frege's program, using the concept of a finite set. This difficulty can be avoided by defining a set $A$ to be finite if and only if there is no one-to-one correspondence between $A$ and a proper subset* of $A$.

It turned out that there was a more serious flaw in the second notion which enters into Frege's definition of the natural numbers. Unless handled with great care, the concept of the class of all sets which are equivalent to a given set $A$ leads to perplexing logical contradictions. Exactly how much care is needed to avoid these contradictions is somewhat uncertain even now. Thus, the definition of the cardinal number of an arbitrary set which Frege used is unacceptable, and therefore so is his construction of the natural numbers.

* It is possible to give a convincing argument that this definition of a finite set agrees with the intuitive idea of such a set. For example, see The Foundations of Mathematics by R. L. Wilder, pp. 62-71. Wiley (1952), New York.


A more satisfactory definition of the natural numbers was given by John von Neumann (1903-1957) in 1923. Von Neumann observed that any standard sequence of sets, containing what we would intuitively recognize as 1, 2, 3, 4, $\ldots$ elements, respectively, can be adopted as the sequence of natural numbers. He showed that a convenient choice for this standard sequence is

$\{\Phi\},\quad \{\Phi, \{\Phi\}\},\quad \{\Phi, \{\Phi\}, \{\Phi, \{\Phi\}\}\},\quad \ldots.$

These particular sets* are now usually called finite ordinal numbers. The elements of each such ordinal number $n$ are the empty set, and all ordinal numbers which precede $n$. Usually, the number 0 (zero) is included among the ordinal numbers. If this is done, then according to von Neumann's definition, 0 would have to be the empty set, since $\Phi$ is the only set which contains no elements. With this convention, the definition of finite ordinal numbers is very natural: $n$ is the set of all ordinal numbers which precede it. In this book, we will consider 0 to be an ordinal number, but not a natural number. This convention simplifies certain statements concerning the arithmetic of $N$, for example, the cancellation law of multiplication.

In order to develop some of von Neumann's theory of the natural numbers, we must give an exact definition of these objects. The method by which the natural numbers can be generated is clear:

$1 = \{\Phi\}$, and

$n + 1 = \{\Phi, 1, 2, \ldots, n - 1\} \cup \{n\} = n \cup \{n\}.$

Starting with $1 = \{\Phi\}$, we obtain successively

$2 = 1 \cup \{1\} = \{\Phi\} \cup \{1\} = \{\Phi, 1\} = \{\Phi, \{\Phi\}\},$

$3 = 2 \cup \{2\} = \{\Phi, 1\} \cup \{2\} = \{\Phi, 1, 2\} = \{\Phi, \{\Phi\}, \{\Phi, \{\Phi\}\}\},$

and so on. Every natural number is ultimately obtained by this process.
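The generating process $n + 1 = n \cup \{n\}$ can be imitated with immutable sets. The sketch below follows the convention used here that $1 = \{\Phi\}$; the names `EMPTY`, `ONE`, `successor`, and `natural` are hypothetical, and Python's `frozenset` is used only as a stand-in for abstract sets.

```python
EMPTY = frozenset()          # the empty set
ONE = frozenset({EMPTY})     # 1 = {empty set}


def successor(n: frozenset) -> frozenset:
    """Form n U {n}, the set that represents the next natural number."""
    return n | frozenset({n})


def natural(k: int) -> frozenset:
    """Build the set representing the natural number k (k >= 1)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n = ONE
    for _ in range(k - 1):
        n = successor(n)
    return n


two = natural(2)     # {empty, {empty}}
three = natural(3)   # {empty, 1, 2}
```

In this model the set built for $k$ has exactly $k$ elements, and each earlier number is both an element and a proper subset of every later one, which matches the informal description above.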

* The reader should remember the convention discussed in Section 1-1 that an object $a$ is to be distinguished from the set $\{a\}$ whose only element is $a$. Thus, $1 = \{\Phi\} \ne \Phi$, and $2 = \{\Phi, \{\Phi\}\} \ne \{\Phi, \Phi\} = \{\Phi\} = 1$, etc.


DEFINITION 3-1.1. (a) $\{\Phi\}$ is a natural number. (b) If $n$ is a set which is a natural number, then $n \cup \{n\}$ is a natural number. (c) The natural numbers are just those sets which are obtained by repeated application of the rules (a) and (b).

Henceforth, the term "natural number" will refer to the sets defined in Definition 3-1.1. Of course, the familiar symbols 1, 2, 3, $\ldots$ will be used to denote the respective sets

$\{\Phi\},\quad \{\Phi, \{\Phi\}\},\quad \{\Phi, \{\Phi\}, \{\Phi, \{\Phi\}\}\},\quad \ldots.$

We will use lower-case letters to represent natural numbers, even though this violates the custom of denoting sets by capital letters. As usual, the symbol $N$ will stand for the set of all natural numbers.

Our definition of the natural numbers may at first seem strange to the reader. However, the use of this definition requires very little readjustment in our way of thinking about natural numbers. The theorems and definitions which have been given in Chapters 1 and 2 are all sensible and correct when the term "natural number" is interpreted according to Definition 3-1.1.

As an example, let us consider the principle of induction (2-5.1). It is convenient to introduce the following notation. If $n$ is a natural number, define $S(n) = n \cup \{n\}$. Intuitively, $S(n)$ denotes the "successor" of $n$, that is, $n + 1$. With this notation, the induction principle can be stated in the following form.

(3-1.2). If $A$ is a set of natural numbers such that (a) $1 \in A$, and (b) if $n \in A$, then $S(n) \in A$, then $A$ contains every natural number.

This principle is virtually a restatement of Definition 3-1.1(c). Indeed, the conditions (a) and (b) in (3-1.2) state that $\{\Phi\}$ belongs to $A$, and if $n$ is in $A$, then $n \cup \{n\}$ belongs to $A$. In particular, any set which is obtained by repeated application of the rules (a) and (b) in Definition 3-1.1 must also belong to $A$. Therefore, according to Definition 3-1.1(c), every natural number is in $A$.

The fact that each natural number $n$ is a set which contains what we instinctively think of as $n$ elements often makes it possible to simplify the statements of definitions and theorems. For example, Definition 1-2.2 (of a set $X$ having cardinality $n$) is intuitively equivalent to the statement that there is a one-to-one correspondence between $X$ and $n$.


(3-1.3). If $n$ is any natural number and $X$ is a set, then $|X| = n$ if (and only if) $X$ is equivalent to $n$.

Because it is intuitively sound and better suited than Definition 1-2.2 for the development of the theory of the natural numbers from Definition 3-1.1, we will use (3-1.3) as our definition of a set having cardinality $n$. As in Definition 1-2.2, a set $X$ will be called finite if either $X = \Phi$, or there is a natural number $n$ such that $X$ is equivalent to $n$.

Although it would be contrary to our intuition, it is not inconceivable that a set $X$ might be equivalent to two different natural numbers $m$ and $n$. This would imply that there is a one-to-one correspondence between $m$ and $n$. Fortunately, it is possible to show that if a one-to-one correspondence between natural numbers $m$ and $n$ exists, then $m = n$ (see Problem 19). Consequently, it makes sense to say that the cardinal number of the set $X$ is $n$ if $X$ is equivalent to $n$, and to define $|X|$ to be this unique number. In other words, (3-1.3) is meaningful.

As a set, each natural number $n$ is equivalent to itself. Therefore, according to (3-1.3), $|n| = n$, for every natural number $n$. If $X$ is any finite, nonempty set, then $X$ is equivalent to $|X|$. In fact, for $X$ to be nonempty and finite means by definition that there is a natural number $n$ such that $X$ is equivalent to $n$. Then by (3-1.3), $|X| = n$. Hence, $X$ is equivalent to $|X|$. This observation leads to the following useful fact. If $X$ and $Y$ are two finite nonempty sets, then $X$ is equivalent to $Y$ if and only if $|X| = |Y|$. To prove this statement, suppose first that $X$ and $Y$ are equivalent. Since $Y$ is equivalent to $|Y|$, it follows from (1-2.4c) that $X$ is equivalent to $|Y|$. Therefore, taking $n$ in (3-1.3) to be the natural number $|Y|$, we obtain $|X| = |Y|$. To prove the converse statement, suppose that $|X| = |Y|$. Since $X$ is equivalent to $|X|$, and $Y$ is equivalent to $|Y|$ (and $|X| = |Y|$), it follows from (1-2.4c) that $X$ is equivalent to $Y$.

Many results follow from Definition 3-1.1 and (3-1.2). In particular, we cite the following.

(3-1.4). 1 is a natural number.

(3-1.5). If $n$ is a natural number, then $S(n)$ is a natural number.

(3-1.6). There is no natural number $n$ such that $1 = S(n)$.

(3-1.7). If $m$ and $n$ are natural numbers such that $S(m) = S(n)$, then $m = n$.

The statements (3-1.4) and (3-1.5) are reformulations of Definition 3-1.1 (a) and (b), respectively, using the notation 1 instead of $\{\Phi\}$ and $S(n)$ instead of $n \cup \{n\}$. The proofs of (3-1.6) and (3-1.7) are easy, and we leave them as exercises for the reader (see Problems 4 and 6).


The statements (3-1.2), (3-1.4), (3-1.5), (3-1.6), and (3-1.7) are called Peano's axioms, because it was shown in 1889 by the Italian mathematician Giuseppe Peano (1858-1932) that the whole theory of the natural numbers can be developed from these statements. In Peano's development of arithmetic, the natural numbers constitute a set $N$ of undefined objects, with a distinguished element 1. It is assumed that an operation is defined on $N$ which corresponds intuitively to the process of passing from a natural number $n$ to its successor $S(n)$. Finally, it is assumed that Peano's axioms are satisfied. From these few axioms, it is possible to define addition and multiplication in $N$, and show that these operations have their familiar properties. This axiomatic development of the natural numbers is carried out in several textbooks, in which a construction of the number systems is given. However, we will not use Peano's definition of the operations in $N$. When the natural numbers are defined as in Definition 3-1.1, the addition and multiplication operations have useful meanings in terms of the operations of set theory.

1. According to Frege's definition, what is the cardinal number of the empty set?

2. Write in full the sets which are 5 and 6 in von Neumann's definition of the natural numbers.

3. Prove that if $n \in N$, then $\Phi \in n$; thus, no natural number is the empty set. [Hint: Let $A = \{n \in N \mid \Phi \in n\}$. Use (3-1.2) to show that $A = N$.]

4. Show that if $n \in N$, then $n \notin 1$, so that $1 \ne S(n)$.

5. Prove that if $m \in n$, then $m \subseteq n$. [Hint: Let $A = \{n \in N \mid \text{either } m \notin n, \text{ or } m \subseteq n\}$. Use (3-1.2) to show that $A = N$.]

6. Prove (3-1.7).

The following problems lead to some of the most important properties of the natural numbers. Several of the assertions made in this section and a number of the unproved statements in the next two sections occur among these problems. The reader with limited mathematical background will probably have difficulty proving some of the statements in Problems 7 through 19, even though hints are supplied in many cases. Such students are advised to read the problems, and try to see what they mean (keeping in mind that the set $n$ is $\{\Phi, 1, 2, \ldots, n - 1\}$), without attempting to do all of them. However, Problems 7 through 19 should be worked in the order in which they appear, because many of them depend on the preceding ones.

7. Prove that $n \notin n$ for all natural numbers $n$. [Hint: Let $A = \{n \in N \mid n \notin n\}$. Use (3-1.2) to prove that $A = N$.]

8. Prove that if $m \in n$, then $m \subset n$.

9. Prove that for every natural number $n$, either $n = 1$, or $1 \in n$.


10. Prove that for all natural numbers $n$, either $n = 1$, or $n = S(k)$ for some $k \in N$.

11. Show that if $m \in n$, then either $S(m) = n$, or else $S(m) \in n$. [Hint: Let $A = \{n \in N \mid \text{for all } m \in n, \text{ either } S(m) = n, \text{ or else } S(m) \in n\}$. Prove that $A = N$.]

12. Show that for all $m$ and $n$, either $m \in n$, $m = n$, or $n \in m$. [Hint: Let $A = \{n \in N \mid \text{for all } m, \text{ either } m \in n,\ m = n, \text{ or } n \in m\}$. Prove that $A = N$.]

13. Prove that for all $m$ and $n$, either $m \subset n$, $m = n$, or $n \subset m$.

14. Prove that $m \subset n$ if and only if $m \in n$.

15. Prove that $m \subset n$ if and only if $S(m) \subset S(n)$.

16. Show that if $n$ is a natural number, and $X$ is a proper nonempty subset of $n$, then there exists $m \in N$ such that $m \subset n$ and $|X| = m$. [Hint: Let $A$ be the set of all $n$ for which the statement is true. Show that $1 \in A$. Suppose that $n \in A$. To prove that $S(n) \in A$, suppose that $\Phi \subset X \subset S(n) = n \cup \{n\}$. Let $Y = X \cap n$. Show that the statement of the problem is true for $S(n)$ and $X$ in each of the following cases: $Y = \Phi$; $Y = n$; $\Phi \subset Y \subset n$ and $n \notin X$; $\Phi \subset Y \subset n$ and $n \in X$.]

17. Prove that if $X$ is a subset of a finite set $Y$, then $X$ is finite, and moreover, if $X \subset Y$, then $|X| \subset |Y|$.

18. (a) Show that if $X$ is a finite set and $z$ is any element, then $X \cup \{z\}$ is finite.
(b) Prove that if $X$ and $Y$ are finite sets, then $X \cup Y$ is finite. [Hint: Let $A = \{m \in N \mid \text{if } X \text{ is a finite set and } |Y| = m, \text{ then } X \cup Y \text{ is finite}\}$. Use (3-1.2) to prove that $A = N$.]

19. (a) Prove that if $n \in N$ and $1 \subset n$, then there is no one-to-one correspondence between 1 and $n$.
(b) Show that if $m \in N$ and $n \in N$ are such that there is a one-to-one correspondence between $S(m)$ and $S(n)$, then there is a one-to-one correspondence between $m$ and $n$. [Hint: Let $r \leftrightarrow s$ be a one-to-one correspondence between $S(m)$ and $S(n)$. If $m \in S(m)$ corresponds to $n \in S(n)$, then $r \leftrightarrow s$ is a one-to-one correspondence between $m$ and $n$, also. If $m$ does not correspond to $n$, show that the given correspondence can be modified to obtain a one-to-one correspondence between $m$ and $n$.]
(c) Prove that if $m$ and $n$ are natural numbers such that $m$ and $n$ are equivalent, then $m = n$. [Hint: Let $A = \{m \in N \mid \text{for all } n \in N, \text{ if } m \subset n, \text{ then } m \text{ and } n \text{ are not equivalent}\}$. Use Problem 19(a) and (b), and (3-1.2) to show that $A = N$. The statement (c) then follows from the result of Problem 13.]

3-2 Operations with the natural numbers. Once people began to think of the natural numbers as concrete objects, it was found that these objects could be combined in useful ways. Thus, if a set $A$ contains 2 elements and a set $B$, which is disjoint from $A$, contains 3 elements, then the union $A \cup B$ invariably contains 5 elements. The process of forming the union


$A \cup B$ of disjoint sets gives rise to the abstract operation which we call addition: $2 + 3 = 5$. The definition of addition can be stated very simply.

DEFINITION 3-2.1. Let $m$ and $n$ be natural numbers. Let $X$ and $Y$ be sets such that (a) $|X| = m$, $|Y| = n$, and (b) $X \cap Y = \Phi$. Then the sum of $m$ and $n$ is the natural number $|X \cup Y|$.

The sum of $m$ and $n$ is denoted by $m + n$. The process which associates with each pair $m$ and $n$ of natural numbers their sum $m + n$ is the binary operation called addition. The fact that every pair of natural numbers has a unique sum is expressed by saying that the natural numbers are closed under addition.

In order to see that 3-2.1 is a valid definition, we must show that it provides a rule by which any two natural numbers can be combined to produce a unique third natural number. This is accomplished by proving the following statements.

(1) For any natural numbers $m$ and $n$, there exist sets $X$ and $Y$ which satisfy (a) and (b).
(2) If $X$ and $Y$ are sets satisfying (a) and (b), then $X \cup Y$ is finite, that is, there is a natural number $k$ such that $|X \cup Y| = k$.
(3) The natural number $|X \cup Y|$ is the same for all pairs $X$, $Y$ of sets satisfying (a) and (b).

It is clear that (1) and (2) guarantee that each pair of natural numbers has a sum, and (3) insures the fact that this sum is unique.

To prove (1), we must define two sets $X$ and $Y$ which satisfy (a) and (b). There are many ways in which this can be done. We can, for example, use the following construction. Let $X$ be the product set $m \times \{1\}$, and let $Y = n \times \{2\}$. Then

$j \leftrightarrow (j, 1),\ j \in m, \qquad \text{and} \qquad k \leftrightarrow (k, 2),\ k \in n,$

are one-to-one correspondences between $m$ and $m \times \{1\}$ and $n$ and $n \times \{2\}$, respectively. Therefore, $X$ is equivalent to $m$ and $Y$ is equivalent to $n$. Thus, by (3-1.3), (a) is satisfied. Suppose that (b) is not satisfied, that is, $X \cap Y \ne \Phi$. Then there would be some $j \in m$ and $k \in n$ such that $(j, 1) = (k, 2)$. By Definition 1-3.1, this implies in particular that $1 = 2$, that is, $\{\Phi\} = \{\Phi, \{\Phi\}\}$. Since this is clearly false, it follows that $X \cap Y = \Phi$.
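The construction used to prove (1) can be imitated directly: tag the elements of one set with 1 and those of the other with 2, so that the two copies cannot overlap, and then count the union. In this sketch `range(m)` is merely a convenient stand-in for some set with $m$ elements (an assumption made for illustration, not the set $m$ of Definition 3-1.1), and the names `tagged_copy` and `add` are hypothetical.

```python
def tagged_copy(members, tag):
    """Return the product set members x {tag} as a set of ordered pairs."""
    return {(x, tag) for x in members}


def add(m: int, n: int) -> int:
    """Sum of m and n as the cardinality of a union of disjoint sets."""
    x = tagged_copy(range(m), 1)   # a set with m elements
    y = tagged_copy(range(n), 2)   # a set with n elements
    # The tags differ, so no pair can belong to both sets.
    assert x.isdisjoint(y)
    return len(x | y)
```

The tagging step matters: without it, `range(2)` and `range(3)` would share elements and their union would have only 3 members rather than 5.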


The proof of (2) is based on (3-1.2), and it will not be given (see Problem 18, Section 3-1). Statement (3) is a consequence of the following result from set theory.

(3-2.2). If $X$, $X'$, $Y$, and $Y'$ are sets such that (a) $X$ is equivalent to $X'$ and $Y$ is equivalent to $Y'$, and (b) $X \cap Y = \Phi$ and $X' \cap Y' = \Phi$, then $X \cup Y$ is equivalent to $X' \cup Y'$.

Proof. Since $X$ and $X'$ are equivalent, there is a one-to-one correspondence $x \leftrightarrow x'$ between $X$ and $X'$. Similarly, there is a one-to-one correspondence $y \leftrightarrow y'$ between $Y$ and $Y'$. The required one-to-one correspondence between $X \cup Y$ and $X' \cup Y'$ is obtained by combining the given correspondences $x \leftrightarrow x'$ and $y \leftrightarrow y'$. Since $X \cap Y = \Phi$ and $X' \cap Y' = \Phi$, the resulting combination is a one-to-one correspondence. Figure 3-1 below illustrates this proof. As an exercise, the reader can use (3-2.2) to prove (3).

The following fact is an immediate consequence of Definition 3-2.1.

(3-2.3). If $X$ and $Y$ are disjoint, finite, nonempty sets, then $|X \cup Y| = |X| + |Y|$.

The basic rules of addition are the following familiar statements.

(3-2.4). Properties of Addition. Let $k$, $m$, and $n$ be natural numbers. Then
(a) $k + (m + n) = (k + m) + n$;
(b) $m + n = n + m$;
(c) if $k + n = m + n$, then $k = m$.

Statement (a) is the associative law of addition, (b) is the commutative law of addition, and (c) is the cancellation law of addition. The properties (a) and (b) are easily seen to follow from the corresponding associative and commutative laws for set unions. For example, if $X$


and $Y$ are disjoint sets such that $|X| = m$ and $|Y| = n$, then by Definition 3-2.1, $m + n = |X \cup Y|$ and $n + m = |Y \cup X|$. Since $X \cup Y = Y \cup X$, it follows that $m + n = n + m$. The proof of (c) will be given in the next section.

Before turning our attention to the operation of multiplication, we will prove that the "successor" of each natural number is obtained by adding 1 to the number:

(3-2.5). $S(n) = n + 1$ for every natural number $n$.

Proof. Let $X = n \times \{1\}$, and let $Y = \{\Phi\} \times \{2\} = \{(\Phi, 2)\}$. Then, as in the proof of statement (1) following Definition 3-2.1,

$k \leftrightarrow (k, 1),\ k \in n, \qquad \text{and} \qquad \Phi \leftrightarrow (\Phi, 2)$

are one-to-one correspondences between $n$ and $X$ and $1 = \{\Phi\}$ and $Y$, respectively. Therefore, $|X| = n$ and $|Y| = 1$. Moreover, $X \cap Y = \Phi$ as before. Hence, by Definition 3-2.1, $|X \cup Y| = n + 1$. On the other hand, the correspondence

$k \leftrightarrow (k, 1) \text{ for } k \in n, \qquad n \leftrightarrow (\Phi, 2)$

is one-to-one between $n \cup \{n\} = S(n)$ and $X \cup Y$. Thus, $X \cup Y$ is equivalent to the natural number $S(n)$, that is, $|X \cup Y| = S(n)$. Therefore, $S(n) = |X \cup Y| = n + 1$.

Like addition, multiplication of natural numbers is a binary operation under which the set of natural numbers is closed. That is, if we are given natural numbers $m$ and $n$, the operation of multiplication yields a unique natural number which is called the product of $m$ and $n$, and is usually denoted by* $mn$ or $m \cdot n$. In this book, both the notations $mn$ and $m \cdot n$ will be used.

The multiplication of natural numbers is often defined to be the process of repeated addition. For a given natural number $n$, we obtain the product $2 \cdot n$ by adding $n + n$, the product $3 \cdot n$ by adding $(n + n) + n$ [or $n + (n + n)$, since by (3-2.4a), addition is associative]. In general, the product $m \cdot n$ is found by adding $n + n + \cdots + n$, where there are $m$ terms in the sum. The familiar terminology "$m$ times $n$" is a literal expression of the idea that multiplication is repeated addition. This definition is meaningful for specific products such as $2 \cdot n$ and $3 \cdot n$. However, in the definition of $m \cdot n$ for an arbitrary natural number $m$, the statement

"there are $m$ terms in the sum $n + n + \cdots + n$" is vague. The difficulty in this definition can be corrected by defining multiplication by the conditions

(a) $1 \cdot n = n$,
(b) $S(m) \cdot n = m \cdot n + n$,

and using (3-1.2) to prove that $m \cdot n$ is a uniquely defined natural number for each pair of natural numbers $m$ and $n$.

An alternative definition of multiplication, based on the concept of the product of two sets, is better suited to our discussion in which the operations of the natural numbers are related to operations with sets. The following example shows that such a definition is reasonable.

* The symbol $m \times n$ is also frequently used to denote the product of $m$ and $n$. However, in our notation, $m \times n$ stands for the set product of $m$ and $n$ considered as sets.

EXAMPLE 1. The seats in theaters are usually labeled by letters corresponding to the rows, and by numbers corresponding to the positions of the seats in each row. Suppose that a theater has 26 rows, each of which contains 50 seats. Then each seat is labeled by an ordered pair consisting of a letter of the alphabet and a number from 1 to 50. Consequently, there is a one-to-one correspondence between the seats in the theater and the set A × L, where A = {a, b, c, . . . , z}, and L = {1, 2, 3, . . . , 50}. Therefore the number of seats in the theater is exactly the cardinal number |A × L| of the product set A × L. On the other hand, we know very well that the number of seats can be determined by multiplying the number of rows by the number of seats in each row. That is, |A × L| = |A| · |L|.
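The counting argument of Example 1 can be tried out mechanically. The following Python sketch is an illustration only and not part of the formal development: it builds the product set A × L explicitly, counts its elements, and compares the count with the value given by the conditions (a) 1 · n = n and (b) S(m) · n = m · n + n. The function names are hypothetical, and Python's built-in integers are assumed to stand in for the natural numbers.

```python
from itertools import product
from string import ascii_lowercase


def multiply_by_successor(m, n):
    """Product determined by (a) 1 * n = n and (b) S(m) * n = m * n + n."""
    if m < 1 or n < 1:
        raise ValueError("natural numbers start at 1 in this sketch")
    result = n                      # condition (a): 1 * n = n
    for _ in range(m - 1):          # condition (b), applied m - 1 times
        result = result + n
    return result


def product_cardinality(xs, ys):
    """Number of ordered pairs in the set product of xs and ys."""
    return len(set(product(xs, ys)))


rows = set(ascii_lowercase)         # A = {a, b, ..., z}
seats = set(range(1, 51))           # L = {1, 2, ..., 50}
seat_count = product_cardinality(rows, seats)
```

Here `seat_count` and `multiply_by_successor(26, 50)` are both 1300, in agreement with |A × L| = |A| · |L|.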

DEFINITION 3-2.6. Let m and n be natural numbers and X and Y be sets such that |X| = m and |Y| = n. Then the product of m and n is defined to be the natural number |X × Y|. The process which associates with each pair m and n of natural numbers their product m · n is called multiplication.

As in the case of Definition 3-2.1, this definition needs some justification. In Definition 3-2.6, the existence of sets X and Y which satisfy |X| = m and |Y| = n is clear. Indeed, the sets m and n themselves satisfy |m| = m and |n| = n. Two facts must be proved.

(1) If X and Y are sets such that |X| = m and |Y| = n, then X × Y is finite, that is, there is a natural number k such that |X × Y| = k.
(2) The natural number |X × Y| is the same for all pairs X, Y of sets satisfying |X| = m and |Y| = n.

The result (1) is obtained by using (3-1.2), and as before the details will not be given (see Problem 6 in this section). Statement (2) is a consequence of the following result concerning set products.


(3-2.7). If X, X', Y, and Y' are sets such that X is equivalent to X' and Y is equivalent to Y', then X × Y is equivalent to X' × Y'.

Proof. Since X is equivalent to X', there is a one-to-one correspondence x ↔ x' between X and X'. Similarly, there is a one-to-one correspondence y ↔ y' between Y and Y'. The required one-to-one correspondence between X × Y and X' × Y' is

(x, y) ↔ (x', y').

The following important theorem follows easily from Definition 3-2.6.

THEOREM 3-2.8. If X and Y are finite nonempty sets, then

|X × Y| = |X| · |Y|.

Proof. Since X and Y are finite nonempty sets, there exist natural numbers m and n such that X is equivalent to m, and Y is equivalent to n. By Definition 3-1.3, this means that |X| = m and |Y| = n. Therefore,

|X × Y| = m · n = |X| · |Y|

by Definition 3-2.6.

Multiplication has the following properties.

(3-2.9). Properties of multiplication. Let k, m, and n be any natural numbers. Then
(a) k · (m · n) = (k · m) · n;
(b) m · n = n · m;
(c) if k · n = m · n, then k = m;
(d) 1 · n = n;
(e) k · (m + n) = k · m + k · n.

Parts (a), (b), and (c) of (3-2.9) correspond exactly to the three properties of addition given in (3-2.4). Thus, we say that multiplication is associative, commutative, and satisfies the cancellation law. The fact that the number 1 is an identity element for multiplication is stated in (d). The two basic operations are connected by the equality (e), which is called the distributive law of multiplication with respect to addition. Note that (b) and (d) imply n · 1 = n, and (b) and (e) imply the "right-hand" distributive law (m + n) · k = m · k + n · k.

The identities (a), (b), (d), and (e) are proved using the definitions of multiplication and addition, together with some simple results of set theory; the property (c) will be proved in the next section. Let X, Y, and W be sets such that |X| = k, |Y| = m, and |W| = n. Then by Theorem 3-2.8,

k · (m · n) = |X| · |Y × W| = |X × (Y × W)|.

Similarly,

(k · m) · n = |X × Y| · |W| = |(X × Y) × W|.

Since X × (Y × W) is equivalent to (X × Y) × W by Theorem 1-3.3(b), and these are finite nonempty sets, it follows that |X × (Y × W)| = |(X × Y) × W|. Therefore, k · (m · n) = (k · m) · n. The identity (3-2.9b) is a direct consequence of Theorem 1-3.3(a):

m · n = |Y × W| = |W × Y| = n · m.

To prove (3-2.9d), use the facts that |{∅}| = 1, and that (∅, w) ↔ w, for w ∈ W, is a one-to-one correspondence between {∅} × W and W. Therefore, 1 · n = |{∅} × W| = |W| = n. The proof of (3-2.9e) is based on the following result from set theory, which we leave for the reader to prove.

(3-2.10). Let X, Y, and W be any sets. Then

X × (Y ∪ W) = (X × Y) ∪ (X × W).

For the application of (3-2.10) to the proof of (3-2.9e), let X, Y, and W be any sets such that |X| = k, |Y| = m, |W| = n, and Y ∩ W = ∅. Then it is clear that (X × Y) ∩ (X × W) = ∅. Hence, by (3-2.3), Theorem 3-2.8, and (3-2.10),

k · m + k · n = |X × Y| + |X × W| = |(X × Y) ∪ (X × W)| = |X × (Y ∪ W)| = |X| · |Y ∪ W| = k · (m + n).
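The set identity (3-2.10) and the counting step that turns it into the distributive law can be checked on a small instance. The Python sketch below is only an illustration under assumed sample sets (the particular X, Y, and W are hypothetical choices with Y and W disjoint); it is not a proof of (3-2.10).

```python
from itertools import product


def set_product(xs, ys):
    """The set product: all ordered pairs (x, y) with x in xs and y in ys."""
    return set(product(xs, ys))


X = {"p", "q", "r"}          # |X| = k = 3
Y = {1, 2}                   # |Y| = m = 2
W = {10, 20, 30, 40}         # |W| = n = 4, chosen disjoint from Y

left = set_product(X, Y | W)
right = set_product(X, Y) | set_product(X, W)

identity_holds = left == right
pieces_disjoint = set_product(X, Y).isdisjoint(set_product(X, W))
# k * m + k * n on one side, k * (m + n) on the other
counts = (len(set_product(X, Y)) + len(set_product(X, W)), len(X) * len(Y | W))
```

Because the two pieces X × Y and X × W are disjoint, their sizes add, which is exactly where (3-2.3) enters the argument.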

1. Use (3-2.2) to prove the statement (3) following Definition 3-2.1.

2. Assuming that addition corresponds to set union and that multiplication corresponds to set intersection, which of the laws (3-2.4) and (3-2.9) have analogues for sets, and which of the identities of Theorem 1-4.3 have analogues in the arithmetic of natural numbers?

3. Write the proof of (3-2.4a) in full.

4. If m and n are natural numbers, what is |m ∪ n| and |m ∩ n|?

5. Prove (3-2.10).

6. Prove by induction (3-1.2) on the cardinal number of Y that if X and Y are finite sets, then X × Y is finite. [Hint: Use (3-2.10) and Problem 18, Section 3-1.]


3-3 The ordering of the natural numbers. Probably the most important property of the natural numbers is their ordering. Just the act of counting presents the natural numbers in a definite sequence: one, two, three, four, and so on. It is this ordering of the numbers that most children learn long before they can add or multiply. In the development of the natural numbers based on Definition 3-1.1, the ordering is defined very simply.

DEFINITION 3-3.1. If m and n are natural numbers, then m is said to be less than n if m ⊂ n.

As is customary, we will write m < n or n > m if m is less than n. According to Definition 3-3.1 and the result of Problem 14, Section 3-1, the three conditions

m < n, m ⊂ n, m ∈ n

are equivalent. In Section 2-5, we discussed one important aspect of the ordering of the natural numbers, namely the well-ordering principle (2-5.2). Here we point out some of the more elementary properties of order.

(3-3.2). Properties of order. Let k, m, and n be any natural numbers. Then
(a) either m < n, m = n, or n < m, and it is impossible for more than one of these relations to be satisfied by a given pair m and n of natural numbers;
(b) if k < m and m < n, then k < n;
(c) if k < m, then k + n < m + n;
(d) if k < m, then k · n < m · n.

By the definition of <, the property (3-3.2a) is equivalent to the statement that exactly one of the relations m ⊂ n, m = n, n ⊂ m is satisfied. It is clear from the definition of set theoretical inclusion that at most one of these relations holds. The fact that at least one of the relations is satisfied was stated as Problem 13, Section 3-1, and we will not prove this result. The assertion (3-3.2b) is the same as the statement that if k ⊂ m and m ⊂ n, then k ⊂ n, and this is clearly true. In order to prove part (c), choose sets X, Y, and W such that

|X| = k, |Y| = m, |W| = n, (3-1)

and

X ⊂ Y and Y ∩ W = ∅. (3-2)

This can be done in the same way that sets were constructed for the proof of statement (1) following Definition 3-2.1. Then by Definition 3-2.1, k + n = |X ∪ W|, and m + n = |Y ∪ W|. (Note that Y ∩ W = ∅ implies X ∩ W = ∅.) It follows from (3-2) that X ∪ W ⊂ Y ∪ W. Using the result of Problem 17, Section 3-1, X ∪ W ⊂ Y ∪ W implies |X ∪ W| ⊂ |Y ∪ W|. Therefore, using Definition 3-3.1, we have k + n = |X ∪ W| < |Y ∪ W| = m + n. The proof of (3-3.2d) is similar, and we leave it as an exercise for the reader.

The cancellation laws of addition and multiplication, (3-2.4c) and (3-2.9c), can be deduced from (3-3.2). It is necessary to show that if k, m, and n are natural numbers, then

(1) k + n = m + n implies k = m, and
(2) k · n = m · n implies k = m.

We prove the contrapositives of these implications (see the Introduction). Suppose that k ≠ m. Then by (3-3.2a), either k < m, or m < k. Suppose that k < m. By (3-3.2c) and (d), k + n < m + n and k · n < m · n. If m < k, the proof is similar. Therefore, k + n ≠ m + n and k · n ≠ m · n.

There is a useful relation between the ordering and addition of natural numbers.

THEOREM 3-3.3. Let m and n be natural numbers. Then m < n if and only if there is a natural number k such that n = k + m.

That is, if n = k + m, then m < n, and conversely, if m < n, it is possible to find a number k such that n = k + m. Intuitively, (3-3.3) is evident. The natural number k is obtained by counting the numbers in the sequence

m + 1, m + 2, . . . , n.

A formal proof of Theorem 3-3.3 could be given by means of induction on m. To carry this out, it is necessary to use the results given in several of the problems in Section 3-1. We will leave this task to the reader who is interested in working out the details of the theory (see Problem 5). It is easy to see that the number k which occurs in Theorem 3-3.3 is unique. Indeed, if n = k + m and n = k' + m, then k + m = k' + m, so that by the cancellation law (3-2.4c), k = k'.

DEFINITION 3-3.4. Let m and n be natural numbers such that m < n. The unique number k satisfying n = k + m is called the difference of n and m.

The usual notation to designate the difference of n and m is n - m. The operation which associates with the pair m and n of natural numbers (satisfying m < n) their difference n - m is called subtraction. If m is not less than n, then it is impossible to form the difference n - m and still remain within the system of natural numbers. That is, the natural numbers are not closed with respect to subtraction. One of the reasons for enlarging the system of natural numbers to the integers is to make subtraction of any two numbers possible. Subtraction satisfies the following identities.

(3-3.5). Let j, k, m, and n be natural numbers such that j < k and m < n. Then
(a) (k - j) + (n - m) = (k + n) - (j + m);
(b) (k - j) · (n - m) = (k · n + j · m) - (j · n + k · m);
(c) (k + n) - (k + m) = n - m;
(d) k · (n - m) = (k · n) - (k · m).

The proofs of all of these identities are based on the same observation. Suppose that x, y, and z are natural numbers. Then y < x, and z is equal to the difference x - y if and only if z + y = x. In fact, if z + y = x, then y < x by Theorem 3-3.3, and z = x - y by Definition 3-3.4. Conversely, if y < x and z = x - y, then z + y = x by Definition 3-3.4.

We now use this fact to prove (3-3.5b). In this case, x = k · n + j · m, y = j · n + k · m, and z = (k - j) · (n - m). Note that since j < k and m < n, the differences k - j and n - m are natural numbers, so that z = (k - j) · (n - m) is a natural number. We must prove that x = z + y, that is,

k · n + j · m = (k - j) · (n - m) + (j · n + k · m).

Since n = (n - m) + m, it follows that j · n = j · [(n - m) + m] = j · (n - m) + j · m. Consequently, by (3-2.4) and (3-2.9),

(k - j) · (n - m) + (j · n + k · m)
= (k - j) · (n - m) + [(j · (n - m) + j · m) + k · m]
= [(k - j) · (n - m) + j · (n - m)] + (k · m + j · m)
= [((k - j) + j) · (n - m)] + (k · m + j · m)
= k · (n - m) + (k · m + j · m)
= [k · (n - m) + k · m] + j · m
= k · [(n - m) + m] + j · m
= k · n + j · m.

We leave the proof of the remaining parts of (3-3.5) as an exercise for the reader.
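Before attempting the remaining proofs, it can be reassuring to test the identities (3-3.5) on small numbers. The Python sketch below is a numerical spot check only, under the assumption that Python's positive integers model the natural numbers; the helper names are hypothetical. The function `difference` refuses to form n - m unless m < n, mirroring Definition 3-3.4, and finds the difference as the unique k with n = k + m.

```python
def difference(n, m):
    """Return n - m in the natural numbers; defined only when m < n."""
    if not (1 <= m < n):
        raise ValueError("n - m is not a natural number unless m < n")
    # Search for the unique k with n = k + m (Theorem 3-3.3).
    for k in range(1, n):
        if k + m == n:
            return k


def check_identities(limit):
    """Check (3-3.5a)-(3-3.5d) for all j < k and m < n with k, n <= limit."""
    for k in range(2, limit + 1):
        for j in range(1, k):
            for n in range(2, limit + 1):
                for m in range(1, n):
                    kj, nm = difference(k, j), difference(n, m)
                    a = kj + nm == difference(k + n, j + m)
                    b = kj * nm == difference(k * n + j * m, j * n + k * m)
                    c = difference(k + n, k + m) == nm
                    d = k * nm == difference(k * n, k * m)
                    if not (a and b and c and d):
                        return False
    return True


all_hold = check_identities(8)
```

A check of this kind cannot replace the proofs, but a failure would point to a misreading of one of the identities.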


1. Use the result of Problem 17, Section 3-1, to show that if Y is a finite set, and X ⊂ Y, then |X| < |Y|.

2. Prove (3-3.2d).

3. Show that if k, m, and n are natural numbers, then (a) k + n < m + n implies k < m, and (b) k · n < m · n implies k < m.

4. Use (and cite) the necessary results from the problems of Section 3-1 to prove that the following conditions on a natural number n are equivalent. However, do not use Theorem 3-3.3.
(a) n ≠ 1.
(b) n > 1.
(c) There is a k ∈ N such that n = S(k).
(d) There is a k ∈ N such that n = k + 1.
(e) There is a k ∈ N and a j ∈ N such that n = k + j.
(f) There is a k ∈ N such that n > k.

5. Using ordinary mathematical induction, together with the results of Problem 4, prove the following two statements (for all m).
(a) For all natural numbers n, if m < n, then there is a natural number k such that n = k + m.
(b) For all natural numbers n, if there is a natural number k such that n = k + m, then m < n.

6. Show that if j < k and m < n (where j, k, m, and n are natural numbers), then j + m < k + n and j · m < k · n.

7. Prove (3-3.5a, c, d).

8. Show that if k - n = m - n, then k = m.

9. Let k, m, and n be natural numbers. Prove the following.
(a) If m < n, then (n + k) - m = (n - m) + k.
(b) If n < m < k, then n + (k - m) = k - (m - n).
(c) If n < k < m < n + k, then n - (m - k) = (n + k) - m = k - (m - n).
(d) If n + k < m, then (m - k) - n = (m - n) - k = m - (n + k).

CHAPTER 4
THE INTEGERS
4-1 Construction of the integers. The average American student first encounters the integers in elementary algebra, where he learns that the integers consist of the natural numbers, zero, and the negative numbers (the negatives of the natural numbers). He learns by rote the rules for adding and multiplying these numbers and some identities which addition and multiplication satisfy. If given the opportunity to use this knowledge, he may remember these rules, but the chances are good that he will never ask why the integers are defined as they are, or why they satisfy their familiar rules of operation. Our purpose in this section is to explore these questions.

As far as historical research has determined, the negative numbers and zero were introduced by the Hindu mathematicians of India in the sixth or seventh century A.D. The increasing importance of commerce in India at that time stimulated this invention. The natural numbers could be used to measure fixed quantities of money or merchandise, but business transactions involved changes of these quantities, i.e., increases or decreases. Instead of dealing with receipt and payment as different kinds of exchanges, it was found that both transactions could be treated at once if the amount of money or goods received was denoted by an ordinary natural number, and the amount paid out was represented by a negative number. This idea is useful mainly because the effect of consecutive transactions can be obtained by the operation which we know as addition of integers. For example, if the receipt of five coins is followed by the payment of ten coins, the net result is the same as the payment of five coins. In symbolic form, this equivalence is expressed by the formula

5 + (-10) = -5.

The interpretation of consecutive exchanges as a single exchange also requires the consideration of transactions which involve no change of money or goods. These are of course represented by the number zero. For instance, a receipt of 5 coins followed by payment of 5 coins has the same effect as "breaking even":

5 + (-5) = 0.

The integers and their operations are of course very familiar in our modern society. The application of the integers to represent exchanges of money is also commonplace. However, before the sixth century, negative numbers were unknown, and zero was used only as a symbol to distinguish between numbers such as 102 and 12. The invention of these new numbers and the definition of addition and multiplication of the integers to satisfy the needs of commerce must be considered to be among the greatest advances of civilization.

Informally, the set Z of all integers consists of (1) all natural numbers, (2) an object called zero and denoted by 0, which is different from all natural numbers, and (3) for each natural number n, an object denoted by -n, which is different from all natural numbers and zero, and such that if m and n are two different natural numbers, then -m and -n are different objects. These objects are called the negative numbers.

It is not very important what the objects called "integers" really are. In fact, there are several ways to construct the system of integers from the natural numbers, and the different constructions lead to different answers to the question "What are the specific objects called integers?" Of course, all of these constructions lead to systems which are essentially* the same. When the natural numbers are defined to be the finite ordinal numbers in von Neumann's sense (Definition 3-1.1), then a convenient choice for zero is the empty set ∅, and the negative numbers can be defined as

{1}, {2}, {3}, . . . .

DEFINITION 4-1.1. The set Z of all integers is

Z = N ∪ {∅} ∪ {{n} | n ∈ N}.

Thus, the integers are the following sets:
(a) all natural numbers: 1, 2, 3, . . . ;
(b) zero: ∅; and
(c) the negative numbers: {1}, {2}, {3}, . . . .

As usual the symbol 0 will denote zero (= ∅) and -1, -2, -3, . . . , -n, . . . will stand for {1}, {2}, {3}, . . . , {n}, . . . respectively. It is easy to see that the set Z of Definition 4-1.1 satisfies the conditions (1), (2), and (3) of the informal description of Z given above. In particular, all of the objects in the list

. . . , {3}, {2}, {1}, ∅, 1, 2, 3, . . .

are different.

* The systems obtained by the various constructions are isomorphic (from the Greek word meaning "of the same form"). The mathematical meaning of the term isomorphic will be explained in Section 4-2.

The operations of addition, multiplication, and negation are responsible for the usefulness and importance of the integers. Addition and multiplication are extensions of the corresponding operations in the system of natural numbers, but negation has no counterpart in N. The definition of negation is suggested when we think of the set of integers in the usual order,

. . . , -3, -2, -1, 0, 1, 2, 3, . . . ,

consisting of the natural numbers, and a mirror-image copy of these numbers (the negative numbers), linked together by the number zero. Negation is the process of passing from an integer a to its mirror-image.

DEFINITION 4-1.2. Negation. Let m be any natural number. Then
(a) -m = {m},
(b) -{m} = m,
(c) -0 = 0.

Once the meaning of Definition 4-1.2 is understood, the notation -m for the negative numbers can be used without fear of trouble, since in fact -m stands for {m}, whether we think of - as the negation operation symbol, or simply as the usual sign to denote negative numbers.* If parts (a) and (b) of Definition 4-1.2 are combined, we obtain the familiar rule of double negation: -(-m) = m.

The addition operation in the integers is surprisingly complicated.

DEFINITION 4-1.3. Addition. Let m and n be any natural numbers. Then
(a) m + n is defined as in N;
(b) (-m) + (-n) = -(m + n);
(c) m + (-n) = (-n) + m =
    m - n if n < m,
    0 if m = n,
    -(n - m) if m < n;
(d) m + 0 = 0 + m = m;
(e) (-m) + 0 = 0 + (-m) = -m;
(f) 0 + 0 = 0.

* The minus sign is also used to denote the binary operation of subtraction, as, for example, in expressions like 3 - 1. When used in this way, the symbol "-" always occurs between two number symbols; when "-" denotes the operation of negation, it is never preceded by a number symbol.
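The case analysis of Definitions 4-1.2 and 4-1.3 can be followed step by step in a short program. The Python sketch below is one possible illustration, not the book's construction: an integer is modeled by a hypothetical tagged pair ("nat", m), ("neg", m), or ("zero", 0) rather than by the sets m, {m}, and ∅, and subtraction of natural numbers is used only where Definition 3-3.4 allows it. The conversion helpers `from_int` and `to_int` are assumptions added only so that the result can be compared with Python's built-in integers.

```python
ZERO = ("zero", 0)


def nat(m):
    """The natural number m (m >= 1) viewed as an integer."""
    assert m >= 1
    return ("nat", m)


def neg_of_nat(m):
    """The negative number -m, which stands for the set {m}."""
    assert m >= 1
    return ("neg", m)


def negate(a):
    """Definition 4-1.2: negation."""
    kind, m = a
    if kind == "nat":
        return neg_of_nat(m)      # (a) -m = {m}
    if kind == "neg":
        return nat(m)             # (b) -{m} = m
    return ZERO                   # (c) -0 = 0


def add(a, b):
    """Definition 4-1.3: addition, case by case."""
    (kind_a, m), (kind_b, n) = a, b
    if kind_a == "zero":
        return b                  # (d), (e), (f)
    if kind_b == "zero":
        return a                  # (d), (e)
    if kind_a == "nat" and kind_b == "nat":
        return nat(m + n)         # (a) addition in N
    if kind_a == "neg" and kind_b == "neg":
        return neg_of_nat(m + n)  # (b)
    if kind_a == "neg":           # (c) is symmetric: put the natural number first
        m, n = n, m
    if n < m:
        return nat(m - n)         # m + (-n) = m - n
    if m == n:
        return ZERO               # m + (-m) = 0
    return neg_of_nat(n - m)      # m + (-n) = -(n - m)


def from_int(i):
    """Hypothetical helper: convert a Python int to the tagged form."""
    return nat(i) if i > 0 else ZERO if i == 0 else neg_of_nat(-i)


def to_int(a):
    """Hypothetical helper: convert the tagged form back to a Python int."""
    kind, m = a
    return m if kind == "nat" else -m if kind == "neg" else 0


agrees = all(
    to_int(add(from_int(i), from_int(j))) == i + j
    for i in range(-6, 7) for j in range(-6, 7)
)
```

For instance, `add(nat(5), neg_of_nat(10))` falls under the third alternative of rule (c) and returns `neg_of_nat(5)`, the receipt of five coins followed by the payment of ten.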

This definition of addition is open to criticism on the grounds that it is cumbersome and difficult to use in proving the important properties of addition. To avoid such a complicated definition of addition, and an almost equally unwieldy definition of multiplication, mathematicians have devised another way of constructing Z from N. This construction employs three important new mathematical concepts: equivalence relation, equivalence class, and partition of a set. The introduction and study of these notions represents a considerable digression from our program of constructing the fundamental number system Z (see Section 6-4). We have therefore chosen to adopt Definition 4-1.3 as the definition of addition in Z. The rules of this definition provide an effective method of performing addition in Z, and they can be used to prove the main properties of addition.

(4-1.4). Properties of addition. Let a, b, and c be integers. Then
(a) a + b = b + a;
(b) a + (b + c) = (a + b) + c;
(c) a + 0 = a;
(d) a + (-a) = 0.

Actually, (4-1.4a, c, d) can be obtained very easily from Definitions 4-1.3, 4-1.2, and the properties of addition for the natural numbers (3-2.4). It is the proof of the associative law (4-1.4b) which requires the checking of a discouragingly large number of different cases.

EXAMPLE 1. We will prove the commutative law (4-1.4a). There are nine possible cases to examine. Let m and n be any natural numbers.
(1) m + n = n + m, by (3-2.4);
(2) m + (-n) = (-n) + m, by Definition 4-1.3(c);
(3) m + 0 = 0 + m, by Definition 4-1.3(d);
(4) (-m) + n = n + (-m), by Definition 4-1.3(c);
(5) (-m) + (-n) = -(m + n) = -(n + m) = (-n) + (-m), by Definition 4-1.3(b) and (3-2.4);
(6) (-m) + 0 = 0 + (-m), by Definition 4-1.3(e);
(7) 0 + n = n + 0, by Definition 4-1.3(d);
(8) 0 + (-n) = (-n) + 0, by Definition 4-1.3(e);
(9) 0 + 0 = 0 + 0.

Of course, cases (2) and (4), cases (3) and (7), and cases (6) and (8) are really the same, so that one case of each of these pairs could be omitted.

EXAMPLE 2. There are 27 main cases to consider in the proof of (4-1.4b). These are obtained by letting a, b, and c take all combinations of natural numbers, negative numbers, or zeros. However, because of Definition 4-1.3(c), it is necessary to break some of these main cases into subcases. For example, suppose that k, m, and n are natural numbers, and that we wish to prove

(-k) + (m + n) = [(-k) + m] + n.

This identity has a different meaning in each of five subcases.
(1) If k < m, then it is necessary to show (m + n) - k = (m - k) + n.
(2) If k = m, then it is necessary to show (m + n) - m = 0 + n.
(3) If m < k < m + n, then it is necessary to show (m + n) - k = n - (k - m).
(4) If k = m + n, then it is necessary to show 0 = [-((m + n) - m)] + n.
(5) If k > m + n, then it is necessary to show -[k - (m + n)] = -[(k - m) - n].

The desired identities in each of these cases can be proved using Definition 4-1.3 and the results of Problem 9, Section 3-3.

By considering the interpretation of the integers as measures of variation (increase or decrease), it is possible to find a rational basis for Definition 4-1.3 and for the properties of addition (4-1.4). Let us examine a specific example. Suppose that the integer a represents the change in the number of gallons of water in a certain reservoir during a given day. For instance, if the amount of water in the reservoir increased by 15,000 gallons during the day, then a = 15,000, whereas if it decreases by 15,000 gallons, then a = -15,000. Let b represent the change in the number of gallons of water in the reservoir during the next day. Then a + b represents the change of volume of water in the reservoir (measured in number of gallons of water) during the two-day period. If both a and b represent increases, then a and b are natural numbers, and our physical interpretation of addition agrees for natural numbers with the usual definition of addition in N. These facts are so familiar that they would probably be accepted without question. However, for the sake of our discussion, we could use this physical description as the definition of the addition of integers. Quite possibly the rules of addition were originally obtained in this way.

In terms of this example, the properties of addition (4-1.4) can be interpreted as statements of commonplace observations. The commutative law a + b = b + a means that a change of amount a in one day, followed by a change of amount b the next day, produces the same result in the two-day period as a change of amount b on the first day, followed by a change of amount a on the second day. The associative law a + (b + c) = (a + b) + c is even more evident, since the two sides of this identity simply represent two different ways of looking at the result of the changes on three consecutive days. The identities a + 0 = a and a + (-a) = 0 have similar interpretations.

If m is a natural number, and a is an integer, then we can informally define the product m · a to be the integer which is obtained by adding a to itself m times. For example, 1 · a = a, 2 · a = a + a, 3 · a = (a + a) + a. This definition is both natural and useful. If a is a natural number, then m · a is the same as the product of m and a, obtained from Definition 3-2.6. If a = 0, then m · a = 0, as the reader can show by induction on m. If a = -n, then m · a = -(m · n). In fact,

2 · (-n) = (-n) + (-n) = -(n + n) = -(2 · n),
3 · (-n) = 2 · (-n) + (-n) = [-(2 · n)] + (-n) = -(2 · n + n) = -(3 · n), etc.

It is convenient to extend the definition of products so that any two integers can be multiplied together. This means that we must define 0 · c and (-m) · c for an arbitrary integer c and natural number m. If the multiplication of integers is to satisfy the distributive law

(a + b) · c = a · c + b · c,

then there is only one way in which these products can be defined. In fact, using this distributive law and (4-1.4), we obtain

0 · c = 0 · c + 0
= 0 · c + [(0 · c) + (-(0 · c))]
= (0 · c + 0 · c) + [-(0 · c)]
= (0 + 0) · c + [-(0 · c)]
= 0 · c + [-(0 · c)]
= 0.

Also,

(-m) · c = (-m) · c + 0
= (-m) · c + [(m · c) + (-(m · c))]
= [(-m) · c + m · c] + [-(m · c)]
= [(-m) + m] · c + [-(m · c)]
= 0 · c + [-(m · c)]
= 0 + [-(m · c)]
= -(m · c).

Thus, we are led to the well-known rules for multiplying integers. These will be adopted as the formal definition of multiplication.

DEFINITION 4-1.5. Multiplication. Let m and n be any natural numbers. Then
(a) m · n is defined as in N;
(b) (-m) · (-n) = m · n;
(c) (-m) · n = n · (-m) = -(m · n);
(d) m · 0 = 0 · m = 0;
(e) (-m) · 0 = 0 · (-m) = 0;
(f) 0 · 0 = 0.

From Definitions 4-1.5 and 4-1.3, and the properties of addition and multiplication of the natural numbers (3-2.4) and (3-2.9), it is possible to deduce the familiar properties of multiplication of the integers. The proofs are elementary, but tedious because they require the examination of numerous cases.

(4-1.6). Properties of multiplication. Let a, b, and c be integers. Then
(a) a · b = b · a;
(b) (a · b) · c = a · (b · c);
(c) if a · c = b · c, then either a = b or c = 0;
(d) a · 1 = a;
(e) a · (b + c) = (a · b) + (a · c).

With the exception of the cancellation law (c), the properties of multiplication of integers (4-1.6) are identical with the properties of multiplication of the natural numbers given in (3-2.9). Since a · 0 = b · 0 = 0 for all integers a and b, the statement (3-2.9c) is not true for the integers. However, (4-1.6c) shows that 0 is the only integer which cannot be cancelled.
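The rules of Definition 4-1.5 and the weakened cancellation law (4-1.6c) can be tried on a finite range of integers. In the Python sketch below, which is an illustration under stated assumptions and not a proof, Python's built-in integers stand in for Z, and `abs` is used to recover the natural number m from -m; the function name `multiply` is hypothetical.

```python
def multiply(a, b):
    """Definition 4-1.5, with Python ints standing in for the integers."""
    if a == 0 or b == 0:
        return 0                      # rules (d), (e), (f)
    m, n = abs(a), abs(b)             # the natural numbers underlying a and b
    if (a > 0) == (b > 0):
        return m * n                  # rules (a) and (b): like signs
    return -(m * n)                   # rule (c): unlike signs


values = range(-5, 6)

# (4-1.6c): whenever a * c = b * c, either a = b or c = 0.
cancellation_holds = all(
    a == b or c == 0
    for a in values for b in values for c in values
    if multiply(a, c) == multiply(b, c)
)

# Zero itself cannot be cancelled: 2 * 0 = 3 * 0 although 2 and 3 differ.
zero_cannot_be_cancelled = multiply(2, 0) == multiply(3, 0)
```

The second check is the counterexample behind the remark that (3-2.9c) fails in Z exactly when the common factor is 0.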

1. Show that all of the sets in the list

. . . , {3}, {2}, {1}, ∅, 1, 2, 3, . . .

are different.

2. Using Definition 4-1.3, prove that a + 0 = a and a + (-a) = 0 for any integer a.

3. Let the integers be considered to measure amounts of change. Give an interpretation of the operation of negation and interpret the identity a + (-a) = 0.

4. Show by induction on m that m · 0 = 0 (where m is a natural number, and m · 0 is the result of adding 0 to itself m times).

5. Using Definition 4-1.5 and the properties of multiplication of the natural numbers (3-2.9), prove the laws a · b = b · a, a · (b · c) = (a · b) · c, and a · 1 = a.

6. Using Definitions 4-1.2 and 4-1.5, prove that for any integers a and b, (-a) · b = a · (-b) = -(a · b), and (-a) · (-b) = a · b.

7. Using Definitions 4-1.3 and 4-1.5, the properties of addition and multiplication of the natural numbers (3-2.4) and (3-2.9), (3-3.5), and (4-1.4a, c, d), prove the distributive law a · (b + c) = (a · b) + (a · c) for the integers. [Hints: (a) Prove the law in the cases where at least one of a, b, or c is zero. (First show from Definition 4-1.5 that d · 0 = 0 · d = 0 for all d ∈ Z.) (b) Enumerate the eight possible cases in which none of a, b, and c is zero. (c) Consider the cases

separately, and state the meaning of these identities in each of the three subcases m < n, m = n, m > n. (d) Prove these cases (see 3-3.5). (e) Prove all other cases, either directly or by reducing them to the cases treated in (d).]

8. Using Definition 4-1.5, prove that if c and d are integers, and c ≠ 0, d ≠ 0, then c · d ≠ 0. Using this fact together with the distributive law (4-1.6e) and the result of Problem 6, prove (4-1.6c).

9. Prove the associative law of addition a + (b + c) = (a + b) + c. [Hints: (a) Use (4-1.4c) to prove the law in all cases in which at least one of a, b, or c is zero. (b) Enumerate the eight possible cases in which none of a, b, or c is zero. (c) Prove that (-1) · a = -a, and use this fact together with the distributive law to reduce these eight cases to the three cases: (i) k + (m + n) = (k + m) + n, (ii) (-k) + (m + n) = [(-k) + m] + n, and (iii) k + [(-m) + n] = [k + (-m)] + n. (d) Complete the proof outlined in Example 2 of case (ii). (e) Give the proof of case (iii).]

4-2 Rings. Starting with the properties of addition and multiplication given in (4-1.4) and (4-1.6), it is possible to develop many useful facts about the integers. However, as the reader may have noticed, the system of rational numbers and the real number system also satisfy the identities listed in (4-1.4) and (4-1.6). Therefore, if we prove a theorem about the integers using only the facts contained in (4-1.4) and (4-1.6), the same theorem should be true for the rational numbers and real numbers.

There is one trouble with this useful observation. In order to carry a theorem which has been stated and proved for the integers over to the real or rational numbers, it is necessary to examine the proof of the theorem to be sure that it uses only properties which can be deduced from (4-1.4) and (4-1.6). Mathematicians have solved this problem by a simple but powerful idea. They have introduced a new term, "integral domain," to describe all systems on which are defined two operations (called addition and multiplication) satisfying all of the laws given in (4-1.4) and (4-1.6). Then, if a theorem can be deduced from the properties listed in (4-1.4) and (4-1.6), it is a theorem about integral domains, meaning that it is true for every system which satisfies (4-1.4) and (4-1.6). In particular, it is true for the integers, rational numbers, and real numbers.

In this section, we deal with systems which are defined by a set of axioms, without concern for the nature of the particular systems under consideration. Such a viewpoint is called abstract. There are several advantages (other than the possible economy of being able to treat many systems at once) to be gained by an abstract approach to mathematics. One of them is that in working with an axiomatically defined object, rather than a specific one, there is an economy of ideas and concepts. All of the superfluous notions and facts are thrown away, and our concentration is focused on the essential features of the object which we are studying. The abstract axiomatic approach to problems and theories has become a dominant feature of modern mathematics. Moreover, this viewpoint is gaining importance in physical and social sciences. Anyone who wants to know what is current in science, particularly mathematics, must become acquainted with abstraction.

Instead of considering integral domains immediately, we introduce a more general concept.

DEFISITIOS 4-2.1. A ring is a set A oii which are defined two biiiary operations x y and x y (called nddition and multiplication), and a unary opcration -x (called negation), such that A contains among its elements a particular one O (called the xero* of A ) , and the following idcntities hold for al1 .r, y, and x in :

(a) (b) (c) (d) (e) (f) (g)

x + y = y +x; x (y 2 ) = (.r y) 2; x o = x; x (-2) = 0; x (y x) = (x y) x; 2) = (.c . y) -.t(x 2); x (y ( x + y ) . x = (x.2) (y-2).

+ +
+ +

+ +

I t must be strongly cmphasized that in the definition of a ring A, nothing is assumed about the natiirc of the elemcnts of A . Any collcction of objects for which operations x y, .r y, and -z are defined satisfying (a) through (g) is eligible to be called a ring. 'Liloreovcr, although the opcrations of a ring are called "addition, " "multiplicatioil, " and "negation," they need not a t a11 resemble the familiar operations of addition, multiplication, and negation of numbers. Thc only reyuirement is that therc are definitions which, for every x and y in A, determine the elements represented by x y, x y, and -x. I t should also be remembered that a ring is determined not by its elements alone, but by the elements, togethcr with the operations of addition, multiplication, and negation. There are important examples of differcnt rings having the same set of elements.

* The use of the symbol 0 to represent the zero in every ring is a long-standing mathematical tradition. There are instances in which this convention might cause confusion, but they are rare.

108

TIIE INTEGERS

[CAAP.

EXAMPLE 1. The number systems Z (the integers), Q (the rational numbers), and R (the real numbers), with their familiar operations, are all rings. In fact, as we have already observed, these systems are integral domains, that is, they satisfy the laws given in (4-1.4) and (4-1.6). It is easy to see that (4-1.6a) and (4-1.6e) together imply Definition 4-2.1(g), so that any integral domain is a ring.

EXAMPLE 2. The set {2a | a ∈ Z} of all even integers is a ring, again with the familiar operations.

EXAMPLE 3. Let A = {a, 0}. Define

(a) a + a = 0, a + 0 = a, 0 + a = a, 0 + 0 = 0;
(b) a · a = a, a · 0 = 0, 0 · a = 0, 0 · 0 = 0;
(c) -0 = 0, -a = a.

Then A is a ring. The symbols a and 0 may be interpreted in this example as representing the properties of an integer being odd or even. However, such an interpretation has no bearing on the question of A being a ring.
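Because the ring of Example 3 has only two elements, the identities of Definition 4-2.1 can be checked by running through every triple of elements. The following is a minimal sketch of such a check; the strings "a" and "0" and the function names are illustrative choices, not notation from the text.

```python
from itertools import product

# Example 3: A = {a, 0}. The strings "a" and "0" stand for the two symbols.
A = ["a", "0"]
ZERO = "0"

def add(x, y):
    # a + a = 0, 0 + 0 = 0, and a + 0 = 0 + a = a
    return "0" if x == y else "a"

def mul(x, y):
    # a . a = a; every product involving 0 is 0
    return "a" if x == "a" and y == "a" else "0"

def neg(x):
    # -0 = 0 and -a = a
    return x

def is_ring(elems, add, mul, neg, zero):
    """Brute-force check of identities (a)-(g) of Definition 4-2.1."""
    for x, y, z in product(elems, repeat=3):
        ok = (
            add(x, y) == add(y, x)                                # (a)
            and add(x, add(y, z)) == add(add(x, y), z)            # (b)
            and add(x, zero) == x                                 # (c)
            and add(x, neg(x)) == zero                            # (d)
            and mul(x, mul(y, z)) == mul(mul(x, y), z)            # (e)
            and mul(x, add(y, z)) == add(mul(x, y), mul(x, z))    # (f)
            and mul(add(x, y), z) == add(mul(x, z), mul(y, z))    # (g)
        )
        if not ok:
            return False
    return True

print(is_ring(A, add, mul, neg, ZERO))
```

Such a check is possible only because A is finite; for an infinite ring such as Z the identities must be proved, not enumerated.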

EXAMPLE 4. Let A = P(S), the set of all subsets of a set S. For X ∈ A and Y ∈ A (that is, X and Y are any subsets of S), define

X + Y = (X ∩ Y^c) ∪ (X^c ∩ Y);
X · Y = X ∩ Y;
-X = X.

Then with these operations, A is a ring. More generally, if A is any collection of subsets of S such that if X and Y are in A, then X ∪ Y and X ∩ Y^c are in A, then A is a ring with respect to these operations. These are the collections which we called rings of sets in Section 1-6. The fact that these collections form a ring justifies the terminology "ring of sets."
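For a small set S, the claim of Example 4 can also be tested mechanically. The sketch below assumes S = {1, 2, 3} (an arbitrary choice) and represents subsets as Python frozensets; it checks the seven identities of Definition 4-2.1 with the empty set playing the role of zero.

```python
from itertools import combinations, product

S = frozenset({1, 2, 3})          # an arbitrary small set, chosen for illustration
EMPTY = frozenset()

def power_set(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def add(X, Y):
    # X + Y = (X ∩ Y^c) ∪ (X^c ∩ Y), with complements taken in S
    return (X & (S - Y)) | ((S - X) & Y)

def mul(X, Y):
    # X . Y = X ∩ Y
    return X & Y

def neg(X):
    # -X = X
    return X

def satisfies_ring_identities(elems, zero):
    """Check identities (a)-(g) of Definition 4-2.1 for every triple in elems."""
    return all(
        add(X, Y) == add(Y, X)
        and add(X, add(Y, Z)) == add(add(X, Y), Z)
        and add(X, zero) == X
        and add(X, neg(X)) == zero
        and mul(X, mul(Y, Z)) == mul(mul(X, Y), Z)
        and mul(X, add(Y, Z)) == add(mul(X, Y), mul(X, Z))
        and mul(add(X, Y), Z) == add(mul(X, Z), mul(Y, Z))
        for X, Y, Z in product(elems, repeat=3)
    )

P = power_set(S)
print(len(P), satisfies_ring_identities(P, EMPTY))
```

The check covers one particular S only; the general statement still requires the set-theoretic argument asked for in the problems.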

The reader should verify that Examples 3 and 4 are rings as we claim (see Problem 14, Section 1-4). It should be noted that the commutative law of multiplication, x · y = y · x, is not postulated for rings. If a ring satisfies the identity x · y = y · x, then it is called commutative. There are important examples of rings which are not commutative, one of which will be given in Chapter 10. Because the commutative law for multiplication is omitted from the postulates for a ring, it is necessary to state both distributive laws, x · (y + z) = x · y + x · z and (x + y) · z = x · z + y · z. If the ring is commutative, then either of these laws can be deduced from the other. For example, (x + y) · z = z · (x + y) = z · x + z · y = x · z + y · z.
THEOREM 4-2.2. Let A be a ring, and let x ∈ A. Then 0 + x = x and (-x) + x = 0.

THEOREM 4-2.3. Let A be a ring and let x, y, and z be any elements of A.
(a) If x + z = y + z, then x = y.
(b) If z + x = z + y, then x = y.
We leave the proof of Theorem 4-2.2 as an exercise for the reader. To prove Theorem 4-2.3(a), suppose that x + z = y + z. Then, by Definition 4-2.1(b), (c), and (d),

x = x + 0 = x + [z + (-z)] = (x + z) + (-z) = (y + z) + (-z) = y + [z + (-z)] = y + 0 = y.

The implication (b) follows from (a) and the commutative law, 4-2.1(a).
THEOREM 4-2.4. Let A be a ring, and let x and y be elements of A. Then
(a) -(-x) = x;
(b) (-x) + (-y) = -(x + y);
(c) x · 0 = 0 · x = 0;
(d) (-x) · y = x · (-y) = -(x · y);
(e) (-x) · (-y) = x · y.

It is not hard to prove the identities of Theorem 4-2.4 for the integers by the direct use of Definitions 4-1.2, 4-1.3, and 4-1.5. It is significant, however, that the proof for general rings is simpler and more elegant. To prove (a), note that by Definition 4-2.1(a) and (d),

(-x) + x = x + (-x) = 0 = (-x) + [-(-x)].

Thus, by Theorem 4-2.3, x = -(-x).

The proof of (b) also uses the cancellation law, Theorem 4-2.3. By Definition 4-2.1(a), (b), (c), (d), and Theorem 4-2.2, we obtain

(x + y) + [(-x) + (-y)] = (y + x) + [(-x) + (-y)] = y + [x + ((-x) + (-y))] = y + [(x + (-x)) + (-y)] = y + [0 + (-y)] = y + (-y) = 0 = (x + y) + [-(x + y)].

Thus, (-x) + (-y) = -(x + y).

By Definition 4-2.1(c), 0 + 0 = 0. Therefore x · 0 + 0 = x · 0 = x · (0 + 0) = (x · 0) + (x · 0) by Definition 4-2.1(f) and (c). Using Theorem 4-2.3, we obtain x · 0 = 0. Similarly, 0 · x = 0. This proves (c).

To prove (d), use the distributive laws, Definition 4-2.1(f) and (g), together with the result just proved, and the cancellation law, Theorem 4-2.3. For instance, x · y + (-x) · y = [x + (-x)] · y = 0 · y = 0 = (x · y) + [-(x · y)] by Definition 4-2.1(g) and (d), and part (c). Therefore, (-x) · y = -(x · y) by Theorem 4-2.3.

The final statement (e) of Theorem 4-2.4 is obtained by two applications of part (d), together with (a): (-x) · (-y) = -[x · (-y)] = -[-(x · y)] = x · y.

From a mathematician's viewpoint, the advantage of the integers over the natural numbers is that the integers are closed under subtraction. That is, if a and b are any integers, then it is always possible to find an integer x which is a solution of

b + x = a.

As a matter of fact, subtraction is always possible in any ring.


THEOREM 4-2.5. Let A be a ring. Let x ∈ A and y ∈ A. Then there is one and only one element z ∈ A satisfying y + z = x, namely z = x + (-y).

Proof. Let z = x + (-y). Then y + z = y + [x + (-y)] = y + [(-y) + x] = [y + (-y)] + x = 0 + x = x + 0 = x, by the first four laws of Definition 4-2.1. Therefore, z = x + (-y) is a solution of y + z = x. On the other hand, if z is a solution of y + z = x, then x + (-y) = (y + z) + (-y) = (z + y) + (-y) = z + [y + (-y)] = z + 0 = z. Hence, z = x + (-y) is the unique solution of y + z = x.

It is customary to use the subtraction notation x - y to denote the solution of the equation y + z = x in an arbitrary ring. That is,

x - y = x + (-y).

The postulates for rings can be given using the binary operation of subtraction instead of the unary operation of negation. [Definition 4-2.1(c) and (d) are replaced by the single identity y + (x - y) = x.] However, Definition 4-2.1 is more familiar and convenient.

As we have pointed out in Example 1, the systems Z and Q of integers and rational numbers are rings. Also, Z is a subset of Q. However, more can be said: the operations of addition, negation, and multiplication in Q of the elements belonging to Z agree with the usual operations for the integers. In the study of rings, this situation occurs frequently enough to justify the introduction of a special term to describe it.

DEFINITION 4-2.6. Let A be a ring. A nonempty subset B of A is called a subring of A if for every x and y in B (not necessarily different) the sum x + y, the negative -x, and the product x · y all belong to B.

Since x ∈ B, y ∈ B implies that x ∈ A and y ∈ A, the sum x + y, negative -x, and product x · y always exist as elements of A. The condition that B be a subring is merely the added assumption that these elements are in B and not just A. If B is a subring of A, then the operations of A, applied to B, can be considered as operations on B. The set B with the operations which it inherits from A forms a ring because the identities of Definition 4-2.1 are automatically satisfied in B, so that the term subring is justified. In speaking of a subring B of A, it is customary to think of B as a ring with the operations on B agreeing with the operations of A.

EXAMPLE 5. Let B = {a/2^n | a ∈ Z, n = 0, 1, 2, ...}. Then B is a subring of Q.

EXAMPLE 6. Let C = {a/2 | a ∈ Z}. Then C ⊆ Q, but C is not a subring of Q, since, for example, 1/4 = (1/2) · (1/2) is a product of elements of C, but 1/4 ∉ C.

EXAMPLE 7. As we have noted, Z is a subring of Q. However, N is not a subring of Z, nor of Q, because if n ∈ N, then -n ∉ N.
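The closure conditions of Definition 4-2.6 can be explored numerically for the sets B and C of Examples 5 and 6. The sketch below uses Python's exact `Fraction` type; the membership tests `in_B` and `in_C` and the helper `closure_failures` are hypothetical names introduced for this illustration. A finite sample can expose a failure of closure, as it does for C, but it cannot prove closure for B.

```python
from fractions import Fraction
from itertools import product

def in_B(q):
    """q is in B = {a/2^n} exactly when its reduced denominator is a power of 2."""
    d = Fraction(q).denominator
    return d & (d - 1) == 0          # true for 1, 2, 4, 8, ...

def in_C(q):
    """q is in C = {a/2} exactly when 2q is an integer."""
    return (2 * Fraction(q)).denominator == 1

def closure_failures(sample, member):
    """Sums, products, and negatives of sample elements that leave the set."""
    bad = []
    for x, y in product(sample, repeat=2):
        for name, value in (("sum", x + y), ("product", x * y), ("negative", -x)):
            if not member(value):
                bad.append((name, x, y, value))
    return bad

sample_B = [Fraction(a, 2 ** n) for a in range(-3, 4) for n in range(3)]
sample_C = [Fraction(a, 2) for a in range(-3, 4)]

print(closure_failures(sample_B, in_B))                        # no failures in this sample
print(Fraction(1, 2) * Fraction(1, 2), in_C(Fraction(1, 4)))   # the product 1/4 is not in C
```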

The concept of isomorphism, which was mentioned in Section 3-3 for general mathematical systems, is very important in the theory of abstract rings.

DEFINITION 4-2.7. Let A and B be rings. Then A is isomorphic to B if there is a one-to-one correspondence x ↔ x' between the elements of A and B, such that sums, products, and negatives are preserved by the correspondence. That is, if x ↔ x' and y ↔ y', then

x + y ↔ x' + y', x · y ↔ x' · y', and -x ↔ -(x').

If a ring A is isomorphic to a ring B, then any property of A which can be expressed in terms of the operations of addition, multiplication, and negation is also a property of B, and vice versa. For instance, suppose that A is a commutative ring which is isomorphic to the ring B. Let x' and y' be any elements of B. Then there exist elements x and y in A such that x ↔ x' and y ↔ y' by the given isomorphism. Moreover, x · y ↔ x' · y' and y · x ↔ y' · x'. Since x · y = y · x in A, and the correspondence is one-to-one, we have x' · y' = y' · x' in B. Thus, B is a commutative ring, since x' and y' were arbitrary elements of B. The meaning of Definition 4-2.7 is that isomorphic rings are indistinguishable in every way which has to do with the fact that they are rings, even though A and B may be different as sets.

EXAMPLE 8. Let M = {(a, a) | a ∈ Z}. Define addition, multiplication, and negation* in M by the rules

(a, a) ⊕ (b, b) = (a + b, a + b),
(a, a) ⊙ (b, b) = (a · b, a · b),
⊖(a, a) = (-a, -a),

where +, ·, and - denote the ordinary operations in Z. It can easily be verified that the operations ⊕, ⊙, and ⊖ in M satisfy the conditions of Definition 4-2.1, so that M is a ring. The correspondence (a, a) ↔ a is a one-to-one correspondence between the elements of M and Z, which is an isomorphism. For example,

(a, a) ↔ a and (b, b) ↔ b,

and

(a, a) ⊕ (b, b) = (a + b, a + b) ↔ a + b.

The reader can check that multiplication and negation are also preserved by this correspondence.

* The symbols used to denote addition, multiplication, and negation in a ring are usually +, ·, and -. When discussing two different rings at the same time, it may be confusing to denote the corresponding operations by the same symbols (although this was done in Definition 4-2.7). In this case, we will sometimes use ⊕, ⊙, and ⊖ to represent the operations in one of the rings, and the usual symbols to denote the operations in the other ring.
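The correspondence of Example 8 can be tried out on a sample of integers. In the sketch below, pairs are Python tuples, and the names `m_add`, `m_mul`, `m_neg`, `to_M`, and `to_Z` are illustrative stand-ins for the operations of M and for the two directions of the correspondence (a, a) ↔ a. Passing the check on a sample is only a plausibility test, not a proof.

```python
from itertools import product

# Operations of M = {(a, a) | a in Z} from Example 8.
def m_add(p, q):
    return (p[0] + q[0], p[1] + q[1])

def m_mul(p, q):
    return (p[0] * q[0], p[1] * q[1])

def m_neg(p):
    return (-p[0], -p[1])

def to_M(a):
    """The element of M corresponding to the integer a."""
    return (a, a)

def to_Z(p):
    """The integer corresponding to the element (a, a) of M."""
    return p[0]

def preserves_operations(sample):
    """Check that sums, products, and negatives correspond for all pairs in sample."""
    for a, b in product(sample, repeat=2):
        p, q = to_M(a), to_M(b)
        if m_add(p, q) != to_M(a + b):
            return False
        if m_mul(p, q) != to_M(a * b):
            return False
        if m_neg(p) != to_M(-a):
            return False
    return True

print(preserves_operations(range(-5, 6)))
```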

EXAMPLE 9. The rings Z and R are not isomorphic. In fact, we will show in Section 7-3 that R is not denumerable. Since Z is denumerable (see Section 1-2, Example 3), there cannot be any one-to-one correspondence between Z and R. For the same reason, Q and R are not isomorphic.

EXAMPLE 10. The rings Z and Q are not isomorphic. Of course there are one-to-one correspondences between Z and Q (see Section 1-2, Example 5, for instance), but none of these are isomorphisms. To prove this statement, suppose that there is an isomorphism between Z and Q. Let r be the rational number corresponding to the integer 1: 1 ↔ r. Now let a be the integer corresponding to the rational number r/2: a ↔ r/2. Since any isomorphism preserves sums, the correspondence a ↔ r/2 added to itself gives 2a = a + a ↔ r/2 + r/2 = r. Thus 1 ↔ r and 2a ↔ r, so that 1 = 2a. However, no integer a satisfies 2a = 1. This contradiction shows that Z cannot be isomorphic to Q.

EXAMPLE 11. Let S be a set with one element. Let B = P(S), with the operations defined as in Example 4. Then the ring B is isomorphic to the ring A described in Example 3. The correspondence is given by S ↔ a and ∅ ↔ 0.

EXAMPLE 12. Let S and T be two finite sets with the same number of elements. Then the rings A = P(S) and B = P(T), with the operations defined in Example 4, are isomorphic.

1. Let A = {(m, n) | m ∈ Z, n ∈ Z}. Define

(m_1, n_1) + (m_2, n_2) = (m_1 + m_2, n_1 + n_2),
(m_1, n_1) · (m_2, n_2) = (m_1 m_2, n_1 n_2),
-(m, n) = (-m, -n).

Show that with these operations, A is a ring. What is the zero element of this ring? Show that in A, it is possible to find two nonzero elements whose product is zero.

2. Let B = {(m, n) | m ∈ Z, n ∈ Z}. Define

(m_1, n_1) + (m_2, n_2) = (m_1 + m_2, n_1 + n_2),
(m_1, n_1) · (m_2, n_2) = (m_1 m_2 - n_1 n_2, m_1 n_2 + m_2 n_1),
-(m, n) = (-m, -n).

Show that with these operations, B is a ring.

3. Let A = {x, y} (a set consisting of two distinct symbols). Define

Find all possible ways in which negation and multiplication can be defined on A in order that it will be a ring.

4. Verify that Examples 3 and 4 are rings.

5. Show that there is one and essentially only one ring which contains one element.

6. Show that if x + y = x in a ring, then y = 0.

7. Prove the following identities in any ring.
(a) (x + y) - z = (x - z) + y = x + (y - z) = x - (z - y)
(b) x - (y + z) = (x - y) - z
(c) x · (y - z) = x · y - x · z

8. Show that if B is a subring of a ring A, then 0 ∈ B.

9. Show that a nonempty subset B of a ring A is a subring of A if and only if x ∈ B and y ∈ B implies that x - y ∈ B and x · y ∈ B.

10. Prove the assertion in Example 5.

11. Show that if B is a subring of A, and if C is a subring of B, then C is a subring of A.

12. Verify that M, in Example 8, is a ring. Complete the proof that the given correspondence is an isomorphism between M and Z.

13. Prove the statements made in Examples 11 and 12.

14. Show that the ring Z of all integers is not isomorphic to the ring of all even integers.

15. Prove that if A, B, and C are rings such that A is isomorphic to B, and B is isomorphic to C, then A is isomorphic to C.

16. Let A be a nonempty set on which are defined three binary operations x + y, x · y, and x - y, satisfying the parts (a), (b), (e), (f), (g) of Definition 4-2.1, and the law y + (x - y) = x. Do not assume the existence of a zero or negation in A.

(a) Prove the following for elements u, v, w, x, and y of A. [Hint: Use part (3) in the proofs of (4), (5), (6), and (7).]
(1) If y = (u - v) + (v - u), then u + y = u and v + y = v.
(2) If u + w = v + w for some element w, then u + x = v + x for all x.
(3) If u + w = v + w, then u = v.
(4) u + v = w if and only if v = w - u.
(5) (x + y) - y = x
(6) y + (x - x) = y
(7) x - x = y - y

(b) Show that by suitably defining the element 0, and the operation of negation, A becomes a ring in the sense of Definition 4-2.1.

17. Show that if x - y is suitably defined in a ring A, then the law y + (x - y) = x is satisfied in A. This result combined with Problem 16(b) shows that the postulate set consisting of (a), (b), (e), (f), (g) of Definition 4-2.1, and the condition y + (x - y) = x, is equivalent to Definition 4-2.1 for a ring.

4-3 Generalized sums and products. The purpose of this section is to investigate the consequences and generalizations of the associative, commutative, and distributive laws of addition and multiplication in rings:

x + (y + z) = (x + y) + z, (4-2)
x + y = y + x, (4-3)
x · (y · z) = (x · y) · z, (4-4)
x · y = y · x, (4-5)
x · (y + z) = x · y + x · z. (4-6)

These laws are satisfied in the systems of natural numbers, integers, rational numbers, real numbers, complex numbers, and indeed in any commutative ring. Thus our results have a wide range of applicability. Moreover, the study of this section will give the reader a chance to become better acquainted with the abstract approach to mathematics.

The fact that multiplication and addition are binary operations means that numbers are always added and multiplied two at a time. However, everyone is familiar with expressions like 2 + 5 + 1. Because ordinary addition satisfies the associative law, (4-2), it does not matter whether we add 2 and 5 and then add 1, or add 2 to the sum of 5 and 1. Consequently, the expression 2 + 5 + 1 makes sense, even though it does not indicate which two of the numbers are to be added first. In general, the expression

a_1 + a_2 + ... + a_n

indicates the result of adding the n numbers, a_1, a_2, ..., a_n in any way, two at a time. The fact that the result does not depend on how the terms of the sum are associated can be proved by induction, using (4-2), in a way which is similar to the proof of Theorem 1-5.2 given in Section 2-3. This result is called the general associative law.

There is a convenient notation which is used to indicate the sum of several numbers. It is the expression Σ_{i=1}^{n} a_i, standing for a_1 + a_2 + ... + a_n. We read Σ_{i=1}^{n} a_i as "the sum of the a_i from i = 1 to i = n." A sum with a single term is provided for by adopting the convention that Σ_{i=1}^{1} a_i = a_1. The symbol Σ is called the summation sign. The letter i in the expression Σ_{i=1}^{n} a_i is called the index of summation. Another choice for the index of summation does not change the sum. Thus,

Σ_{i=1}^{n} a_i, Σ_{j=1}^{n} a_j, and Σ_{k=1}^{n} a_k

are the same, since they are all abbreviations of a_1 + a_2 + ... + a_n.

Variations of the summation notation such as Σ_{i=0}^{n} a_i and Σ_{i=k}^{n} a_i are self-explanatory.

EXAMPLE 4. Σ_{i=1}^{n} i a_i = a_1 + 2a_2 + ... + n a_n.

EXAMPLE 5. Σ_{i=1}^{n} 2 = 2 + 2 + ... + 2 (n terms) = 2n.

It is easy to show that the commutative law of addition, (4-3), can be extended to sums with any number of terms. Using the notation for sums which we have just introduced, the "general commutative law" can be stated as follows.

(4-3.1) Let a_1, a_2, ..., a_n be any numbers. Let i_1, i_2, ..., i_n be any rearrangement of the indices 1, 2, ..., n. Then

Σ_{j=1}^{n} a_{i_j} = Σ_{i=1}^{n} a_i.

For example, if i_1 = 4, i_2 = 2, i_3 = 1, i_4 = 3, then Σ_{j=1}^{4} a_{i_j} = a_4 + a_2 + a_1 + a_3 = a_1 + a_2 + a_3 + a_4 = Σ_{i=1}^{4} a_i.

Proof. The proof of (4-3.1) is by induction on n. If n = 1, there is nothing to prove. Note that one and only one of the numbers a_{i_1}, a_{i_2}, ..., a_{i_n} is a_n. Suppose that a_{i_k} is a_n. Then by the general associative law and (4-3)

Σ_{j=1}^{n} a_{i_j} = (a_{i_1} + a_{i_2} + ... + a_{i_{k-1}} + a_{i_{k+1}} + ... + a_{i_n}) + a_n.

Since n does not occur in the list i_1, i_2, ..., i_{k-1}, i_{k+1}, ..., i_n, this finite sequence is simply a rearrangement of 1, 2, ..., n - 1. Therefore, the induction hypothesis yields

a_{i_1} + a_{i_2} + ... + a_{i_{k-1}} + a_{i_{k+1}} + ... + a_{i_n} = Σ_{i=1}^{n-1} a_i.

Consequently,

Σ_{j=1}^{n} a_{i_j} = (Σ_{i=1}^{n-1} a_i) + a_n = Σ_{i=1}^{n} a_i.

An important special case of the general commutative law occurs when we consider sums of sums:

Σ_{i=1}^{n} (Σ_{j=1}^{m} a_{i,j}).

In this expression, we have a doubly indexed set of numbers a_{i,j}, where i ranges from 1 to n and j ranges from 1 to m. We first sum over j (for each i) and then add the resulting sums. For example,

Σ_{i=1}^{2} (Σ_{j=1}^{3} a_{i,j}) = (a_{1,1} + a_{1,2} + a_{1,3}) + (a_{2,1} + a_{2,2} + a_{2,3}).

According to the general commutative law, (4-3.1), we can write

Σ_{i=1}^{n} (Σ_{j=1}^{m} a_{i,j}) = Σ_{j=1}^{m} (Σ_{i=1}^{n} a_{i,j}).

This equality expresses the fact that if the numbers a_{i,j} are written in a rectangular array

a_{1,1} a_{1,2} ... a_{1,m}
a_{2,1} a_{2,2} ... a_{2,m}
...
a_{n,1} a_{n,2} ... a_{n,m}

and if the terms in each row are added, and then these sums added, the result will be the same as if the terms in each column are added and then the totals of the columns are added.
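The interchange of the order of summation can be seen on a small array. The following sketch, with a 2-by-3 array of integers chosen only for illustration, adds the same entries first by rows and then by columns.

```python
def sum_by_rows(a):
    """Add the terms in each row, then add the row totals."""
    return sum(sum(row) for row in a)

def sum_by_columns(a):
    """Add the terms in each column, then add the column totals."""
    n, m = len(a), len(a[0])
    return sum(sum(a[i][j] for i in range(n)) for j in range(m))

# A 2-by-3 array a[i][j], chosen only for illustration.
a = [[1, 2, 3],
     [4, 5, 6]]

print(sum_by_rows(a), sum_by_columns(a))   # both totals are 21
```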

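EXAMPLE 6. Let a_{i,j} = C(i, j), the binomial coefficient, if 0 ≤ j ≤ i ≤ n, and a_{i,j} = 0 if 0 ≤ i < j ≤ n. Then by formula (1-2) which gives the sum of the binomial coefficients,

Σ_{i=0}^{n} (Σ_{j=0}^{n} a_{i,j}) = Σ_{i=0}^{n} (Σ_{j=0}^{i} C(i, j)) = Σ_{i=0}^{n} 2^i = 2^(n+1) - 1.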
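The array of Example 6 can be tabulated for a small n to compare the row totals with the column totals. This sketch assumes n = 5 and uses `math.comb` for the binomial coefficients; the function name `binomial_array` is a hypothetical helper, not notation from the text.

```python
from math import comb

def binomial_array(n):
    """a[i][j] = C(i, j) for 0 <= j <= i <= n, and 0 for i < j."""
    return [[comb(i, j) if j <= i else 0 for j in range(n + 1)]
            for i in range(n + 1)]

n = 5
a = binomial_array(n)

# Row i adds up to 2^i, the sum of the binomial coefficients C(i, 0), ..., C(i, i).
row_totals = [sum(a[i]) for i in range(n + 1)]
column_totals = [sum(a[i][j] for i in range(n + 1)) for j in range(n + 1)]

print(row_totals)
print(sum(row_totals), sum(column_totals), 2 ** (n + 1) - 1)
```

Adding by rows or by columns gives the same grand total, as the general commutative law requires.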

The associative and commutative laws of multiplication, (4-4) and (4-5), can be generalized to show that the product of n numbers a_1, a_2, ..., a_n does not depend on the grouping or order of the numbers in the product. Thus, for example,

(a_1 · a_2) · (a_3 · a_4) = a_4 · [(a_1 · a_3) · a_2] = [(a_3 · a_1) · a_2] · a_4.

This justifies the use of the expression

a_1 · a_2 · ... · a_n

to denote the product of the n numbers a_1, a_2, ..., a_n. There is also a useful notation for products which is similar to the Σ notation for sums. This consists of using the expression

Π_{i=1}^{n} a_i

to stand for the product a_1 · a_2 · ... · a_n. The expression Π_{i=1}^{n} a_i is read "the product of the a_i from i = 1 to i = n." The symbol Π is called the product sign.

EXAMPLE 9. (Π_{i=1}^{n} a_i) · (Π_{i=n+1}^{n+k} a_i) = Π_{i=1}^{n+k} a_i.

There is a useful generalization of the distributive law, (4-6). Let a_1, a_2, ..., a_n, and b_1, b_2, ..., b_m be any numbers. Then

(Σ_{i=1}^{n} a_i) · (Σ_{j=1}^{m} b_j) = Σ_{i=1}^{n} (Σ_{j=1}^{m} a_i · b_j).

We leave the proof of this identity as a problem for the reader.

As we observed in Section 4-1, it is possible to define the product na for a natural number n and an integer a in terms of the addition of integers. This remark can be generalized to arbitrary rings.

DEFINITION 4-3.2. Let x be an element of the ring A. Define

nx = x + x + ... + x (n summands)

if n is a natural number,

0x = 0,*

(-n)x = n(-x) = (-x) + (-x) + ... + (-x) (n summands)

if n is a natural number.

* The symbol 0 is used here with two different meanings. The 0 on the left is the integer 0 (that is, ∅), while on the right, 0 stands for the zero of the ring.

On the basis of Definition 4-3.2, we can speak of ax where a ∈ Z and x is an element of any ring. The generalized commutative, associative, and distributive laws of addition can be specialized to obtain the following result.

THEOREM 4-3.3. Let x and y be any elements of the ring A. Let a and b be arbitrary integers. Then
(a) ax + bx = (a + b)x;
(b) a(bx) = (ab)x;
(c) a(x + y) = ax + ay;
(d) a(x · y) = (ax) · y = x · (ay);
(e) 1x = x;
(f) a0 = 0;
(g) a(-x) = (-a)x.

We leave the proof of this result as an exercise for the reader.
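Definition 4-3.2 builds the multiple ax from nothing but the addition and negation of the ring, and this can be imitated directly. The sketch below is one possible rendering: `int_multiple` is a hypothetical helper, and the ring used for the test is the set of pairs of integers with componentwise operations (the ring of Problem 1, Section 4-2). The loop starts from the zero of the ring, which gives the same result as adding n copies of x because 0 + x = x. The function `theorem_4_3_3_holds` then tests the identities of Theorem 4-3.3 on a finite sample, which illustrates the theorem but does not prove it.

```python
from itertools import product

def int_multiple(n, x, add, neg, zero):
    """The multiple n x of Definition 4-3.2, using only the ring's own operations."""
    if n < 0:
        return int_multiple(-n, neg(x), add, neg, zero)   # (-n)x = n(-x)
    total = zero                                          # 0x = 0
    for _ in range(n):
        total = add(total, x)                             # one summand per step
    return total

# Pairs of integers with componentwise operations.
def add(p, q):
    return (p[0] + q[0], p[1] + q[1])

def mul(p, q):
    return (p[0] * q[0], p[1] * q[1])

def neg(p):
    return (-p[0], -p[1])

ZERO = (0, 0)

def times(n, x):
    return int_multiple(n, x, add, neg, ZERO)

def theorem_4_3_3_holds(ints, elems):
    """Test parts (a)-(g) of Theorem 4-3.3 on the given integers and ring elements."""
    for a, b in product(ints, repeat=2):
        for x, y in product(elems, repeat=2):
            if add(times(a, x), times(b, x)) != times(a + b, x):          # (a)
                return False
            if times(a, times(b, x)) != times(a * b, x):                  # (b)
                return False
            if times(a, add(x, y)) != add(times(a, x), times(a, y)):      # (c)
                return False
            if not (times(a, mul(x, y)) == mul(times(a, x), y)
                    == mul(x, times(a, y))):                              # (d)
                return False
            if times(1, x) != x or times(a, ZERO) != ZERO:                # (e), (f)
                return False
            if times(a, neg(x)) != times(-a, x):                          # (g)
                return False
    return True

elements = [(0, 0), (1, 0), (2, -1), (-3, 4)]
print(times(3, (2, -1)), times(-2, (2, -1)))
print(theorem_4_3_3_holds(range(-3, 4), elements))
```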

1. Using (4-2), prove that

2. Determine the following sums and products. (a) (b)

2
i=l

(j2

+ j + 11, Ij[ (j2 + j +


(see Section 2-1)

1)

2i

(c) c o i 2

3. Write the following sums and products in terms of the Σ and Π notation.

(a) 1^6 - 2^6 + 3^6 - ... + (-1)^(n-1) n^6

(d) (k + 1)(k + 2)


4. Find the sum in Problem 3(b) by noting that

5. Let a_i = a for i = 1, 2, ..., n. Show that
(a) Σ_{i=1}^{n} a_i = na,
(b) Π_{i=1}^{n} a_i = a^n.

6. Prove the following by mathematical induction.

ax E
bj
=

abj

7. Evaluate the following double sums.

true in general? Justify your answer.

true in general? Justify your answer. 10. Show that

11. Defineai,i = (i)t'forO 5 j < i < n a n d ai,i By suitably interpreting the identity

OforO

< i <j < n

. show that (i) ( j $ ' ) 12. Prove Theorem 4-3.3.

+ + (7)

(7 ):.

4-4 Integral domains. As we observed in Section 4-2, the integers satisfy three identities which do not occur in the definition of a ring. This section is concerned with some very simple consequences of these special properties.

DEFINITION 4-4.1. Let A be a ring. An element e ∈ A is called an identity for A, or an identity element of A, if

e · x = x · e = x

for all x ∈ A.

THEOREM 4-4.2. A ring can have at most one identity element.

Because of this theorem, we may speak of the identity element of A. It is clear from (4-1.6)(d) and (a) that the ring Z of all integers has 1 as its identity element. It is common practice to denote the identity element of any ring by the symbol 1. This might cause confusion, but usually it does not.

The proof of Theorem 4-4.2 is easy. Suppose that e and e' are both identity elements of A. That is, e · x = x · e = x and e' · y = y · e' = y for all x and y in A. In particular, if we let x = e' and y = e, we get e · e' = e · x = x = e' and e · e' = y · e' = y = e. Thus, e = e · e' = e'.

If a ring A contains an identity element 1, then by Theorem 4-2.4, (-1) · x = -(1 · x) = -x. Similarly, x · (-1) = -x. That is, the operation of negation is the same as multiplication (on either side) by the element -1. This fact is very familiar for the integers, rational, and real numbers, so that the reader may not be surprised to find that it is a property of rings in general. We will use the identity

(-1) · x = x · (-1) = -x

frequently, and without further discussion. If a ring contains at least one element x different from 0, then 0 cannot be an identity element, since 0 · x = 0 ≠ x. Thus the only ring in which zero is an identity element is the ring containing only the element 0.

DEFINITION 4-4.3. An integral domain is a commutative ring A with an identity element different from zero, such that the following cancellation law is satisfied in A: if x, y, and z are elements of A such that x · z = y · z, then either x = y, or z = 0.

As we have observed, the rings of integers, rational numbers, and real numbers are all integral domains. There is a way to determine whether a commutative ring satisfies the cancellation law. This test is most conveniently stated using a new notion.


DEFINITION 4-4.4. Let A be a commutative ring. An element z ∈ A is called a divisor of zero if for some x ≠ 0 in A, x · z = 0. If z is a divisor of zero and z ≠ 0, then z is called a proper divisor of zero.

It is evident from Theorem 4-2.4(c) that if A contains at least one element different from zero, then 0 is a divisor of zero. However, an integral domain has no proper divisors of zero.

THEOREM 4-4.5. Let A be a commutative ring with an identity element different from zero. Then A is an integral domain if and only if A contains no proper divisors of zero.

Proof. Suppose that A is an integral domain. We wish to prove that A has no proper divisors of zero. Suppose that z is a divisor of zero. Then there is an x ≠ 0 in A such that x · z = 0. By Theorem 4-2.4(c), 0 · z = 0. Thus, x · z = 0 · z. By the cancellation law, either x = 0, or z = 0. However, by assumption, x ≠ 0. Thus z = 0. In other words, 0 is the only divisor of zero in A. That is, A contains no proper divisors of zero.

The proof of the converse depends on the distributive law. Suppose that A has no proper divisors of zero. Since A is commutative and has an identity element not equal to 0, it is only necessary to show that A satisfies the cancellation law. Assume that x, y, and z are elements of A satisfying x · z = y · z. Then

[x + (-y)] · z = x · z + (-y) · z = x · z + [-(y · z)] = y · z + [-(y · z)] = 0

by Definition 4-2.1(g) and Theorem 4-2.4. Thus, by Definition 4-4.4, either x + (-y) = 0, or else z is a divisor of zero. Since we are assuming that A contains no divisors of zero except 0 (that is, no proper divisors of zero), it follows that either x + (-y) = 0, or z = 0. If x + (-y) = 0, then x = y. Thus, either x = y, or z = 0. This shows that A satisfies the cancellation law.
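A search over a finite window of elements shows what a proper divisor of zero looks like in practice. The sketch below compares the ring of pairs of integers with componentwise multiplication (the ring of Problem 1, Section 4-2) with a window of ordinary integers; the function name `proper_zero_divisors` and the window sizes are illustrative assumptions. The search finds only those divisors of zero whose partner x also lies in the window.

```python
from itertools import product

def proper_zero_divisors(elems, mul, zero):
    """Elements z != 0 of elems with x . z = 0 for some x != 0 in elems."""
    nonzero = [e for e in elems if e != zero]
    return [z for z in nonzero if any(mul(x, z) == zero for x in nonzero)]

# Pairs of integers with componentwise multiplication, restricted to a small window.
pairs = list(product(range(-2, 3), repeat=2))

def pair_mul(p, q):
    return (p[0] * q[0], p[1] * q[1])

# A finite window of the integers, searched for comparison.
ints = list(range(-5, 6))

print(pair_mul((1, 0), (0, 1)))                               # (0, 0)
print((1, 0) in proper_zero_divisors(pairs, pair_mul, (0, 0)))
print(proper_zero_divisors(ints, lambda x, y: x * y, 0))      # none found
```

By Theorem 4-4.5, the first ring therefore fails the cancellation law, while the empty result for the window of integers is consistent with Z being an integral domain.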

Some of the most interesting problems in the study of the integers are concerned with divisibility. In fact, a large part of the theory of numbers is devoted to the divisibility properties of the integers and the natural numbers. Most of the next chapter deals with this topic. In the integers, x divides y if there is an integer z such that y = x · z. This definition makes sense in an arbitrary ring A, but the notion "x divides y" is not very useful unless A is an integral domain.

DEFINITION 4-4.6. Let A be an integral domain. Let x and y be elements of A. Then x divides y in A (or y is divisible by x in A, or x is a factor of y in A), if there is a z ∈ A such that y = x · z.


It is important to note that the notion of divisibility depends not only on the elements x and y, but also on the integral domain under consideration. For example, 2 divides 3 in Q, but not in Z. Usually, however, discussions involving divisibility are restricted to elements of a fixed integral domain. In this case, the terminology "x divides y" and "y is divisible by x," and the notation

x | y if x divides y,
x ∤ y if x does not divide y

can be used without danger of confusion. For example, in the next chapter, the notion of "divisibility" and the symbolism a | b will always refer to divisibility in the ring Z of all integers. The reader can easily verify the following facts.

THEOREM 4-4.7. Let A be any integral domain. Then for elements x, y, z, u, v of A,
(a) if x | y and y | z, then x | z;
(b) if x | y and x | z, then x | (y + z);
(c) if x | y, then x · z | y · z;
(d) x | x · y;
(e) if x | y and x | z, then x | (u · y + v · z);
(f) x | 0, 1 | x, -1 | x;
(g) if 0 | x, then x = 0.
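The remark that divisibility depends on the integral domain under consideration can be made concrete by testing the same pair of numbers in Z and in Q. The sketch below is an illustration under the assumption that integers are Python `int` values and rationals are `Fraction` values; the function names are hypothetical.

```python
from fractions import Fraction

def divides_in_Z(x, y):
    """x | y in Z: there is an integer z with y = x * z."""
    if x == 0:
        return y == 0              # 0 | y forces y = 0, as in Theorem 4-4.7(g)
    return y % x == 0

def divides_in_Q(x, y):
    """x | y in Q: there is a rational z with y = x * z."""
    x, y = Fraction(x), Fraction(y)
    if x == 0:
        return y == 0
    return True                    # z = y / x is a rational number

print(divides_in_Z(2, 3), divides_in_Q(2, 3))    # False True
print(Fraction(3) / Fraction(2))                 # the element z = 3/2 of Q with 3 = 2 * z
```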

If x | y and y | x, then by definition, elements z and w exist in A such that x = z · y, y = w · x. Hence, 1 · x = x = z · (w · x) = (z · w) · x. If x ≠ 0, this implies by Definition 4-4.3 that z · w = 1. In the ring of integers it is easy to see from Definition 4-1.5 and (3-3.2) that the condition z · w = 1 can be satisfied only if z = w = 1, or if z = w = -1. Therefore, either x = y, or x = -y. The same conclusion is obtained if x = 0, since y = w · x = w · 0 = 0. Thus, we obtain the following result for integers.

THEOREM 4-4.8. Let x and y be integers. Suppose that x divides y and y divides x in Z. Then either x = y, or x = -y.

Of course, if x = y, or if x = -y, then x | y and y | x in any integral domain. If y = x · z and y = x · w in an integral domain A, then x · z = x · w. Hence, if x ≠ 0, then z = w. This observation justifies the following definition.


DEFINITION 4-4.9. Let A be an integral domain. Let x and y be elements of A such that x divides y in A, and x ≠ 0. Then the unique element z such that y = x · z is called the quotient of y and x, and it is denoted by y/x.

1. Show that the ring of Example 2, Section 4-2, does not have an identity. Show that the rings of Examples 3 and 4 do have identities.

2. State which of the following rings are not integral domains and give your reasons.
(a) The ring of all rational numbers.
(b) The ring of Example 2, Section 4-2.
(c) The ring of Example 3, Section 4-2.
(d) The ring of Example 4, Section 4-2.
(e) The ring of Problem 1, Section 4-2.

3. Give an example of an integral domain which contains exactly two elements.

5. If x and y are rational numbers, what are the conditions for x to divide y in Q?

6. Using Definition 4-1.5 and (3-3.2), show that in the ring Z, the condition z · w = 1 can be satisfied only if z = w = 1 or z = w = -1.

7. Let A = {(x, y) | x ∈ Z, y ∈ Z}. Define

Show that A is an integral domain. Show that (x_1, y_1) divides (x_2, y_2) in A if and only if x_1^2 + y_1^2 divides both x_1 x_2 + y_1 y_2 and x_1 y_2 - x_2 y_1 in Z.

8. Let A be an integral domain containing elements x, y, and z. Prove the following facts.
(a) If z | x and z | y, then x/z + y/z = (x + y)/z.
(b) If z | x, then y · (x/z) = (y · x)/z.
(c) If y | z and x | (z/y), then (x · y) | z, and z/(x · y) = (z/y)/x.

9. Show that if B is a subring of an integral domain A, and if B contains the identity element of A, then B is an integral domain.

10. Let A be a ring with the identity element e. Show that B = {ae | a ∈ Z} is a subring of A. (See Definition 4-3.2 and Theorem 4-3.3.)

11. Let A be an integral domain. Show that if B is a ring which is isomorphic to A, then B is also an integral domain.

4-5 The ordering of the integers. Since the ordering of the natural numbers is of such great importance in mathematics, it is not surprising that we would want to define a similar order relation on the integers. The way in which this is done is familiar. The ordering in Z is given by

... < -3 < -2 < -1 < 0 < 1 < 2 < 3 < ... (4-8)

But why order Z in this way? Why not define the order relation in some other way? The answer is that the ordering (4-8) is the only really useful one. If an ordering is to be defined on Z, it should agree on N with the usual ordering, and it should satisfy as many of the basic conditions listed in (3-3.2) as possible. In particular, such an ordering should at least have the property that addition of the integer c to each of the integers a and b does not change the order relation between them. That is,

if a < b, then a + c < b + c. (4-9)

The fact is that the only ordering of Z which agrees on N with the usual order relation, and which satisfies (4-9), is the familiar one given by (4-8). This assertion is not hard to prove. Suppose that < is such an order relation defined on Z. If m and n are natural numbers and m < n, then (4-9) implies

-n = m + [(-m) + (-n)] < n + [(-m) + (-n)] = -m.

Also, if m ∈ N, then 1 < m + 1, so that

0 = 1 + (-1) < (m + 1) + (-1) = m.

Consequently,

-m = 0 + (-m) < m + (-m) = 0.

Thus, our order relation agrees with (4-8). It is possible to describe the ordering of Z given by (4-8) in a convenient way.

DEFINITION 4-5.1. Let a and b be integers. Then a is less than b (or b is greater than a) if b - a ∈ N. In this case, we write a < b (or b > a).


In other words, a < b if and only if there is a natural number m such that b=a m. It follows in particular that Definition 4-5.1 agrees with the usual ordering of N. (See Definition 3-3.4.)

THEOREM 4-5.2. Let a, b, and c be any integers. Then (a) either a < b, a = b, or b < a, and it is impossible for more than one of tzheserelations to be satisfied by a given pair a, b of integers; (b) if a < b, and b < c, then a < c; c <b c; (c) if a < b, then a (d) if a < b and c > O , then a c < b c.

These statements are easily derived from the properties of the integers and the natural numbers. For instante, if c is any integer, then by Definitions 4-1.1 and 4-1.2 exactly one of the following holds true: c E N, c = 0, -c E N. Applying this remark to c = b - a, it follows that either b - EN, b - a = 0 , o r -(b - a) = a - EN. Thus, by Definition 4-5.1, either a < b, a = b, or b < a. The condition c > O is obviously essential for Theorem 4-5.2(d) to be true. We point this out because neglecting to check this condition is a frequent source of error in algebraic manipulations involving inequalities. It is possible to derive most of the useful properties of the ordering of the ring Z from Theorem 4-5.2 (a), (b), (e), and (d). As we will show later, the rational numbers and the real numbers also have orderings which satisfy these laws. Thus, any theorems which can be proved using only the properties of rings and the laws given in Theorem 4-5.2 will also be true for Q and for R. By introducing the abstract notion of an ordered integral domain, we can cover al1 of these cases a t once. DEFISITIOX4-5.3. Let A be an integral domain. Suppose that an-order relation (written x < y or y > x) is defined on A satisfying the conditions of Theorem 4-5.2. That is, if x, y, and z are elements of A, then (a) either x < y, x = y, or y < x, and it is impossible for more than one of these relations to be satisfied by a given pair x, y of elements of A ; (b) if x < y and y < z, then x < x; (c) if x < y, t h e n x z < y z; (d) if x < y, and x > O , then x z < y 2 . Then, A is called an ordered integral domain.

By Theorem 4-5.2, Z is an ordered integral domain. As we remarked above, the number systems Q and R are also ordered integral domains, so that all definitions and theorems concerning ordered integral domains apply to each of the rings Z, Q, and R.


THEOREM 4-5.4. Let A be an ordered integral domain. Let x, y, z, and w be any elements of A. Then
(a) if x < y and z < w, then x + z < y + w;
(b) if x > 0 and y > 0, then x + y > 0;
(c) if x > 0 and y > 0, then x·y > 0.

Proof. If x < y, then x + z < y + z by Definition 4-5.3(c). Also, if z < w, then y + z < y + w by Definition 4-5.3(c) and the commutativity of addition. Thus, by Definition 4-5.3(b), if x < y and z < w, then x + z < y + w. This proves (a). The statement (b) is a special case of (a), and the statement (c) is a special case of Definition 4-5.3(d), using Theorem 4-2.4(e).

THEOREM 4-5.5. Let A be an ordered integral domain. Let x, y, and z be any elements of A. Then
(a) if x < y, then -y < -x;
(b) if x < y and z < 0, then y·z < x·z;
(c) if x < 0 and y < 0, then x·y > 0;
(d) if x ≠ 0, then x^2 > 0;
(e) 1 > 0.

Proof. (a) By Definition 4-5.3(c), -y = x + [(-x) + (-y)] < y + [(-x) + (-y)] = -x.
(b) If z < 0, then 0 < -z by (a). Thus, by Theorem 4-2.4 and Definition 4-5.3(d), -(x·z) = x·(-z) < y·(-z) = -(y·z). Hence, y·z = -[-(y·z)] < -[-(x·z)] = x·z.
(c) This statement is obtained from (b) by taking y to be 0 and z to be y.
(d) If x ≠ 0, then either x > 0, or x < 0. If x > 0, then x^2 > 0 by Theorem 4-5.4(c). If x < 0, then x^2 > 0 by (c).
(e) By Definition 4-4.3, 1 ≠ 0. Thus, by (d), 1 = 1^2 > 0.

It follows from Theorem 4-5.5(e) that not every integral domain can have defined on it an order relation satisfying the conditions of Definition 4-5.3, since there exist integral domains A which satisfy the condition 1 + 1 = 0 (see Example 3, Section 4-2). If such an A could be made into an ordered integral domain by an order relation <, then 0 < 1. Consequently 1 = 0 + 1 < 1 + 1 = 0, so that both 0 < 1 and 1 < 0 would hold. This contradicts Definition 4-5.3(a).

In any ordered integral domain A, the elements of the set

    P = {x ∈ A | x > 0}

are called the positive elements, and the elements of the set

    {x ∈ A | x < 0}

are called the negative elements.


If x ∈ P and y ∈ P, then by Theorem 4-5.4, x + y ∈ P and x·y ∈ P. Moreover, for any x ∈ A, either x > 0, x = 0, or x < 0. That is, any element of A is either positive, zero, or negative. This remark explains why an element x of an ordered integral domain is called nonnegative if either x > 0, or x = 0. By Definition 4-5.1, an integer a is positive if and only if a = a - 0 is in N. For this reason, the natural numbers, when regarded as elements of Z, are often called positive integers.

1. Prove Theorem 4-5.2(a), (c), and (d).

2. Let P be the set of all positive elements of an ordered integral domain A. Let M be the set of negative elements of A. Then A = P ∪ {0} ∪ M, where the sets P, {0}, and M are pairwise disjoint. Justify this statement.

3. Let A be an ordered integral domain with P as its set of positive elements. Show that x < y in A if and only if y - x ∈ P.

4. Let A be an integral domain. Let P be a subset of A which satisfies the following conditions:
(a) if x ∈ P and y ∈ P, then x + y ∈ P and x·y ∈ P;
(b) if x ∈ A and x ≠ 0, then either x ∈ P, or -x ∈ P;
(c) 0 ∉ P.
Define x < y if y - x ∈ P. Prove that < is an order relation which makes A an ordered integral domain.

5. The analogue for multiplication of Theorem 4-5.4(a) is: if x < y and z < w, then x·z < y·w. Give an example to show that this is false. Find conditions on x, y, z, and w that will guarantee that x·z < y·w.

6. Show that it is impossible to define an ordering of the integral domain given in Problem 2, Section 4-2, so that it becomes an ordered integral domain. [Hint: Find an element x ≠ 0 such that x^2 < 0.]

7. Let A be an ordered integral domain. Suppose that u ∈ A, and m is a natural number. Prove the following statements.
(a) If m is odd, then the equation x^m = u has either one solution x in A, or else no solution in A. Give examples for the case A = Z to show that both of these possibilities can occur.
(b) If m is even, and u > 0, then the equation x^m = u has either two solutions in A, or else no solution in A. Give examples for the case A = Z to show that both of these possibilities can occur.
(c) If m is even, and u = 0, then x = 0 is the only solution of x^m = u.
(d) If m is even, and u < 0, then x^m = u has no solution in A.


4-6 Properties of order. Some of the most important concepts encountered in the calculus are defined in terms of the order relation of the real numbers. Many of these notions can be studied profitably in the abstract setting of ordered integral domains.

DEFINITION 4-6.1. Let A be any ordered integral domain. Then x ≤ y (or y ≥ x) means that either x < y, or x = y. The relations x < y and x ≤ y are called inequalities.

THEOREM 4-6.2. Let A be any ordered integral domain. Let x, y, z, and w be arbitrary elements of A.
(a) If x < y and y ≤ z, or if x ≤ y and y < z, then x < z.
(b) If x ≤ y and y ≤ z, then x ≤ z.
(c) If x ≤ y and y ≤ x, then x = y.
(d) Either x ≤ y, or y ≤ x.
(e) Exactly one of the relations x ≤ y, y < x is satisfied.
(f) If x < y and z ≤ w, or if x ≤ y and z < w, then x + z < y + w.
(g) If x ≤ y and z ≤ w, then x + z ≤ y + w.
(h) If x ≤ y and z ≥ 0, then x·z ≤ y·z.
(i) If x ≤ y and z ≤ 0, then y·z ≤ x·z.
(j) For all x, x^2 ≥ 0.

Proof. The proof of this theorem is routine. For example, if x < y and y ≤ z, then either x < y and y < z, or x < y and y = z. In the first case, x < z by Definition 4-5.3(b). In the second case, x < z, since x < y = z. This proves the first part of (a). To prove the first part of (f), suppose that x < y and z ≤ w. If z = w, then x + z < y + w, since x + z < y + z by Definition 4-5.3(c). If z < w, then x + z < y + w by Theorem 4-5.4(a). We leave the remaining statements for the reader to prove.

It is common practice to write sequences of inequalities and equalities. For example,

    x < y = z ≤ w < u

is an abbreviation for x < y, y = z, z ≤ w, and w < u. Usually all of the inequalities in such a sequence are directed in the same way, from small to large, or from large to small. It then follows from Theorem 4-6.2 that the inequalities obtained by omitting part of the sequence are valid. In the above example, we get x < z, x < w, x < u, y ≤ w, y < u, and z < u.

Frequently sets are defined by means of inequalities. By using the laws given in Theorem 4-6.2, it is often possible to simplify the descriptions of these sets.


EXAMPLE 1. Determine {x ∈ R | x^2 + x ≤ 2}. The condition x^2 + x ≤ 2 is equivalent to (x + 2)(x - 1) = x^2 + x - 2 ≤ 2 - 2 = 0. The product (x + 2)(x - 1) is positive if x + 2 and x - 1 have the same sign (either positive or negative). Therefore, the product (x + 2)(x - 1) will be ≤ 0 if and only if x + 2 and x - 1 have opposite signs, or one of these factors is zero. Since x - 1 < x + 2, it follows that (x + 2)(x - 1) ≤ 0 is equivalent to x - 1 ≤ 0 ≤ x + 2. Consequently, x^2 + x ≤ 2 if and only if x ≤ 1 and -2 ≤ x. Therefore,

    {x ∈ R | x^2 + x ≤ 2} = {x ∈ R | -2 ≤ x ≤ 1}.
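The simplification in Example 1 can be spot-checked numerically. The following is only a sketch under stated assumptions: it samples rational points (the function names `in_set` and `in_interval` are hypothetical, introduced for illustration) and compares the defining condition with the simplified interval. Agreement on samples is evidence, not a proof.

```python
from fractions import Fraction

def in_set(x):
    """Membership test taken directly from the defining condition x^2 + x <= 2."""
    return x * x + x <= 2

def in_interval(x):
    """Membership test for the simplified description -2 <= x <= 1."""
    return -2 <= x <= 1

# Sample the rational points k/8 for k = -40, ..., 40 using exact arithmetic.
samples = [Fraction(k, 8) for k in range(-40, 41)]
agree = all(in_set(x) == in_interval(x) for x in samples)
print(agree)
```

Exact rational arithmetic is used so that the endpoints -2 and 1 are tested without rounding error.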

DEFINITION 4-6.3. Let A be any ordered integral domain. Let S be a nonempty subset of A. An element x in A is called the smallest (or least, or minimum) element of S if x ∈ S and x ≤ y for all y ∈ S. An element x in A is called the largest (or greatest, or maximum) element of S if x ∈ S and y ≤ x for all y ∈ S. The smallest element of S (if it exists) is denoted by min S and the largest element of S (if it exists) is denoted by max S.

EXAMPLE 2. Min {x ∈ R | x ≥ 0} = 0, max {x ∈ R | x ≥ 0} does not exist, min {x ∈ R | x > 0} does not exist, min {x ∈ Z | x > 0} = 1, max {x ∈ Z | x > 0} does not exist.

Generally, a nonempty set need not have either a smallest or a largest element. For example, it is easy to see that if S = A, then neither min S nor max S exists. However, if S is finite, the situation is different.

THEOREM 4-6.4. Let S be a nonempty finite subset of an ordered integral domain. Then S has a smallest element and a largest element.

Proof. The proof is by induction on the number of elements in S. Let

    S = {x_1, x_2, ..., x_n}.

If n = 1, then x_1 is both the largest and the smallest element of S. Suppose that n > 1 and that every set containing fewer than n elements has a largest element and a smallest element. Let

    T = {x_1, x_2, ..., x_(n-1)}.

By assumption, T has a smallest element x_i. Then by Theorem 4-6.2, either x_i ≤ x_n, in which case x_i is the smallest element of S, or x_n < x_i, in which case x_n is the smallest element of S. Similarly, S contains a largest element. This completes the proof of the induction step and proves the theorem.


THEOREM 4-6.5. Let A be an ordered integral domain. Let x_1, x_2, ..., x_n, and y be any elements of A.
(a) min {x_1, x_2, ..., x_n} + y = min {x_1 + y, x_2 + y, ..., x_n + y},
    max {x_1, x_2, ..., x_n} + y = max {x_1 + y, x_2 + y, ..., x_n + y}.
(b) If y ≥ 0, then
    y·min {x_1, x_2, ..., x_n} = min {y·x_1, y·x_2, ..., y·x_n}, and
    y·max {x_1, x_2, ..., x_n} = max {y·x_1, y·x_2, ..., y·x_n}.
(c) If y ≤ 0, then
    y·min {x_1, x_2, ..., x_n} = max {y·x_1, y·x_2, ..., y·x_n}, and
    y·max {x_1, x_2, ..., x_n} = min {y·x_1, y·x_2, ..., y·x_n}.

Proof. Let x_i be the smallest of the elements x_1, x_2, ..., x_n. Then min {x_1, x_2, ..., x_n} = x_i, and x_i ≤ x_1, x_i ≤ x_2, ..., x_i ≤ x_n. For any y, x_i + y ≤ x_1 + y, x_i + y ≤ x_2 + y, ..., x_i + y ≤ x_n + y. Thus, since x_i + y occurs among the numbers x_1 + y, x_2 + y, ..., x_n + y, it follows that

    min {x_1 + y, x_2 + y, ..., x_n + y} = x_i + y = min {x_1, x_2, ..., x_n} + y.

This proves the first part of (a). If y ≥ 0, then by Theorem 4-6.2(h),

    y·x_i ≤ y·x_1, y·x_i ≤ y·x_2, ..., y·x_i ≤ y·x_n.

Thus, as before,

    min {y·x_1, y·x_2, ..., y·x_n} = y·x_i = y·min {x_1, x_2, ..., x_n}.

This proves the first part of (b). If y ≤ 0, then by Theorem 4-6.2(i),

    y·x_1 ≤ y·x_i, y·x_2 ≤ y·x_i, ..., y·x_n ≤ y·x_i.

Hence,

    max {y·x_1, y·x_2, ..., y·x_n} = y·x_i = y·min {x_1, x_2, ..., x_n}.

This proves the first part of (c). The second statement of each part of the theorem is proved in a way which is similar to the proof of the first statement.
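As an informal illustration of Theorem 4-6.5, the three parts can be tested on lists of integers. This is a minimal sketch for the special case A = Z; the function name `check_min_max_laws` is hypothetical, and a passing check on samples does not replace the proof.

```python
def check_min_max_laws(xs, y):
    """Return True if parts (a), (b), (c) of Theorem 4-6.5 hold for integers xs and y."""
    shifted = [x + y for x in xs]
    scaled = [y * x for x in xs]
    # Part (a): adding y commutes with min and max.
    ok_a = min(xs) + y == min(shifted) and max(xs) + y == max(shifted)
    if y >= 0:
        # Part (b): a nonnegative factor preserves min and max.
        ok_bc = y * min(xs) == min(scaled) and y * max(xs) == max(scaled)
    else:
        # Part (c): a negative factor exchanges min and max.
        ok_bc = y * min(xs) == max(scaled) and y * max(xs) == min(scaled)
    return ok_a and ok_bc

print(check_min_max_laws([3, -1, 4, -5], 2))
print(check_min_max_laws([3, -1, 4, -5], -3))
```

The case y = 0 satisfies both (b) and (c), since every product is then zero.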


DEFINITION 4-6.6. Let A be an ordered integral domain. Suppose that x ∈ A. The absolute value of x, denoted* by |x|, is defined to be |x| = max {x, -x}. In other words, |x| = x if x ≥ 0 and |x| = -x if x < 0. Thus, in the integers for example,

THEOREM 4-6.7. Let A be an ordered integral domain, and let x and y be arbitrary elements of A.
(a) |x| ≥ 0; moreover, |x| = 0 only if x = 0.
(b) If x ≥ 0, then |y| ≤ x if and only if -x ≤ y ≤ x.
(c) |x + y| ≤ |x| + |y|.
(d) |x·y| = |x|·|y|.
(e) x + |x| ≥ 0.

Proof. (a) If x ≠ 0, then either x > 0, or -x > 0. Hence, |x| = max {x, -x} > 0.
(b) If |y| ≤ x, then max {y, -y} ≤ x. Thus, y ≤ x and -y ≤ x. By Theorem 4-6.2(i), -y ≤ x implies -x ≤ y. Thus, -x ≤ y ≤ x. Conversely, if -x ≤ y and y ≤ x, then -y ≤ x and y ≤ x. Therefore, |y| = max {y, -y} ≤ x.
(c) Since |x| ≤ |x| and |y| ≤ |y|, it follows from (b) that -|x| ≤ x ≤ |x| and -|y| ≤ y ≤ |y|. Therefore, by Theorem 4-6.2(g), -(|x| + |y|) ≤ x + y ≤ |x| + |y|. Consequently, by (b), |x + y| ≤ |x| + |y|.
(d) If x ≥ 0 and y ≥ 0, then x·y ≥ 0, so that |x·y| = x·y = |x|·|y|. If x ≥ 0, y ≤ 0, then x·y ≤ 0. In this case, |x·y| = -(x·y) = x·(-y) = |x|·|y|. The case x ≤ 0, y ≥ 0 is similar. Finally, if x ≤ 0, y ≤ 0, then x·y ≥ 0, and therefore |x·y| = x·y = (-x)·(-y) = |x|·|y|.
(e) By Definition 4-6.6 and Theorem 4-6.5(a),

    |x| + x = max {x, -x} + x = max {x + x, -x + x} = max {2x, 0} ≥ 0.
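The properties in Theorem 4-6.7 can be illustrated for A = Z with a short check. The sketch below defines the absolute value exactly as in Definition 4-6.6, as max {x, -x}; the names `absolute` and `check_abs_laws` are hypothetical and chosen only for this illustration.

```python
def absolute(x):
    """Absolute value as in Definition 4-6.6: the larger of x and -x."""
    return max(x, -x)

def check_abs_laws(x, y):
    """Return True if parts (a), (c), (d), (e) of Theorem 4-6.7 hold for integers x, y."""
    part_a = absolute(x) >= 0 and ((absolute(x) == 0) == (x == 0))
    part_c = absolute(x + y) <= absolute(x) + absolute(y)
    part_d = absolute(x * y) == absolute(x) * absolute(y)
    part_e = x + absolute(x) >= 0
    return part_a and part_c and part_d and part_e

# Check every pair of integers in a small range.
all_ok = all(check_abs_laws(x, y) for x in range(-6, 7) for y in range(-6, 7))
print(all_ok)
```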

Problems involving algebraic manipulation of absolute values occur often in analysis. Sometimes Theorem 4-6.7 can be used to solve them.

* It might be expected that the use of vertical bars to denote both absolute value and the cardinality of a set would be confusing. However, both notations are standard and the double meaning causes no confusion.


EXAMPLE 3. Determine {x ∈ R | |x^2 - 4x| ≤ 1}. By Theorem 4-6.7(b), |x^2 - 4x| ≤ 1 is equivalent to -1 ≤ x^2 - 4x ≤ 1. These inequalities hold if and only if 3 = -1 + 4 ≤ x^2 - 4x + 4 ≤ 1 + 4 = 5. Thus, x belongs to the set {x ∈ R | |x^2 - 4x| ≤ 1} if and only if

    3 ≤ (x - 2)^2 ≤ 5.

If √3 and √5 denote the (positive) square roots of 3 and 5, then this inequality can be written in the form

    (√3)^2 ≤ (x - 2)^2 ≤ (√5)^2.

It follows that |x^2 - 4x| ≤ 1 if and only if either

    √3 ≤ x - 2 ≤ √5   or   -√5 ≤ x - 2 ≤ -√3

(see Problem 3 below). Hence,

    {x ∈ R | |x^2 - 4x| ≤ 1} = {x ∈ R | 2 + √3 ≤ x ≤ 2 + √5} ∪ {x ∈ R | 2 - √5 ≤ x ≤ 2 - √3}.
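The description of the set in Example 3 as a union of two intervals around x = 2 can be spot-checked at rational sample points. This is a sketch under stated assumptions: the function names are hypothetical, the endpoints 2 ± √3 and 2 ± √5 are approximated in floating point, and the sample grid k/16 is chosen so that no sample lies close enough to an endpoint for rounding to matter.

```python
from fractions import Fraction
from math import sqrt

def in_set(x):
    """Defining condition |x^2 - 4x| <= 1, evaluated exactly on a rational x."""
    return abs(x * x - 4 * x) <= 1

def in_intervals(x):
    """Simplified description: sqrt(3) <= |x - 2| <= sqrt(5), written as two intervals."""
    t = float(x)
    lo, hi = sqrt(3), sqrt(5)
    return (2 + lo <= t <= 2 + hi) or (2 - hi <= t <= 2 - lo)

# Rational samples k/16 between -2 and 6.
samples = [Fraction(k, 16) for k in range(-32, 97)]
agree = all(in_set(x) == in_intervals(x) for x in samples)
print(agree)
```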

1. Complete the proof of Theorem 4-6.2.

2. Let A be an ordered integral domain. Prove the following properties of A.
(a) If 0 ≤ x < y and 0 ≤ z < w, then x·z < y·w.
(b) If 0 ≤ x ≤ y and 0 ≤ z ≤ w, then x·z ≤ y·w.
(c) If 0 ≤ x < y, then x^n < y^n for all n ∈ N.
(d) If 0 ≤ x ≤ y, then x^n ≤ y^n for all n ∈ N.
(e) If x < y, then z - y < z - x.
(f) If x ≤ y, then z - y ≤ z - x.
(g) If x ≤ y, then -y ≤ -x.

3. (a) Prove that if u, v, and x are elements of an ordered integral domain such that 0 ≤ u ≤ v, then u^2 ≤ x^2 ≤ v^2 if and only if either

    u ≤ x ≤ v   or   -v ≤ x ≤ -u.

Show that if 0 ≤ u, then u^2 < x^2 < v^2 if and only if either

    u < x < v   or   -v < x < -u.

(b) Generalize this result from squares to arbitrary exponents.

4. Let A be an ordered integral domain. Prove the following properties of the elements of A.
(a) 2xy ≤ x^2 + y^2.
(b) If x ≠ y, then 2xy < x^2 + y^2.
(c) (xy + zw)^2 ≤ (x^2 + z^2)(y^2 + w^2). [Hint: Show that (x^2 + z^2)(y^2 + w^2) - (xy + zw)^2 ≥ 0.]


5. Determine the following sets of real numbers.
(a) {x ∈ R | 4x - 12 > 0}
(b) {x ∈ R | 9 - 3x < 0}
(c) {x ∈ R | 2x + 1 < 4x - 7}
(d) {x E R((-5)(3 - 2) L (i) (x 4- 4)) (e) { x E R19x2 < 25) (f) {x E R(7x2 > 63) (g) { x E Rl(x - l ) ( x + 2) < 0) (h) {x E R ( ( x - 1)(x - 3) > 0) (i) { X E R ( ( x - l ) ( x + 2) < 0) (j) {x E RIx2 x > 6) (k) {x E R J X ~ x - G < 6) (1) {x E R l ( x + l ) ( x - 2)(x - 3) < 0) (m) { x E R ( ( x - 3 ) ( x + 1)(x - 2) > 0) (n) { X E RIx3 - x + 3 > 3)

6. Prove the second half of each statement in Theorem 4-6.5.

7. Show that if A is an ordered integral domain, then max A and min A do not exist. What does this imply about finite integral domains?

8. Let A be an ordered integral domain. Suppose S and T are finite nonempty subsets of A. Show that
(a) max S ∪ T = max {max S, max T}.
(b) min S ∪ T = min {min S, min T}.

9. Let A be an ordered integral domain. Suppose that x, y, and z are arbitrary elements of A. Prove the following facts.
(a) |-x| = |x|
(b) |x^2| = x^2
(c) |x + y + z| ≤ |x| + |y| + |z|
(d) |x| - |y| ≤ |x - y| ≤ |x| + |y|
(e) If z > 0, then |x - y| < z if and only if y - z < x < y + z.
(f) If x divides y in A, then |x| divides |y| in A, and |y/x| = |y|/|x|.

10. Determine the following sets of real numbers.
(a) {x ∈ R | |2x + 1| < 3}
(b) {x ∈ R | |1 - x| < 2}
(c) {x ∈ R | 2|x - 1| < 3}
(d) {x ∈ R | |2x + 4| ≤ 2}
(e) {x ∈ R | |x - 1| > 0}
(f) {x ∈ R | |x - 3| > 4}
(g) {x ∈ R | |x^2 - 2x| < 1}
(h) {x ∈ R | |4x - x^2| ≥ 4}
(i) {x ∈ R | |x^2 - 2x| > 3}

CHAPTER 5

ELEMENTARY NUMBER THEORY

5-1 The division algorithm. In this chapter we will develop some useful and interesting results concerning the natural numbers and the integers. Our study can be viewed as a brief introduction to the theory of numbers, one of the oldest and most respected branches of pure mathematics. We begin with a discussion of the familiar process of long division and some of its consequences. If a and b are integers, and if a ≠ 0, then it is possible to "divide a into b" obtaining a quotient q and a remainder r. The exact statement of this fact is called the division algorithm. It is a basic result of considerable importance in number theory.

THEOREM 5-1.1. Division algorithm. Let a and b be integers, with a ≠ 0. Then there exist unique integers q and r such that

    b = aq + r,   where 0 ≤ r < |a|.
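Theorem 5-1.1 can be illustrated with a small computation. The sketch below is not the construction used in the proof; it relies on Python's built-in `divmod`, whose remainder takes the sign of the divisor, and then adjusts the result so that 0 ≤ r < |a| as the theorem requires. The function name `divide` is hypothetical.

```python
def divide(b, a):
    """Return (q, r) with b == a*q + r and 0 <= r < |a|, for integers a != 0."""
    if a == 0:
        raise ValueError("a must be nonzero")
    q, r = divmod(b, a)      # Python's remainder has the same sign as a
    if r < 0:                # only possible when a is negative
        q, r = q + 1, r - a  # shift r into the range 0 <= r < |a|
    return q, r

print(divide(31, 21))
print(divide(-36, 121))
print(divide(-36, -121))
```

Note that the remainder is nonnegative even when a or b is negative, which differs from the convention of some programming languages.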

Proof. There are two things to prove. We must show that integers q and r exist which satisfy

    b = aq + r,   0 ≤ r < |a|,

and that there is just one pair of integers q and r which satisfy these conditions. First we present the existence proof. Consider the set S of all nonnegative integers of the form b - ax, x ∈ Z. We wish to apply the well-ordering principle (2-5.2) to the set S. It is first necessary to prove that S is nonempty. Since a ≠ 0, either a ≥ 1 or a ≤ -1. If a ≥ 1, then a|b| ≥ |b|, and

    b - a(-|b|) = b + a|b| ≥ b + |b| ≥ 0.

Thus, if x = -|b|, then b - ax ∈ S. If a ≤ -1, then -a|b| ≥ |b|, and

    b - a|b| ≥ b + |b| ≥ 0.

In this case, b - ax ∈ S, when x = |b|. By the well-ordering principle, S contains a smallest number r. Let q be the value of x such that b - ax = r.

That is, b - aq = r, r ≥ 0. We will prove that r < |a|. Suppose that a ≥ 1. Then

    b - a(q + 1) = b - aq - a < b - aq = r.

Since r is the smallest number in S, it follows that b - a(q + 1) is not a member of S. Since b - a(q + 1) is of the form b - ax, and since S consists of all nonnegative integers of this form, the only possible reason for b - a(q + 1) not to be in S is that b - a(q + 1) is negative. Hence,

    r - |a| = r - a = b - a(q + 1) < 0.

If a ≤ -1, then

    b - a(q - 1) = b - aq + a < b - aq = r.

Thus, by the argument we have just given, b - a(q - 1) < 0. Therefore,

    r - |a| = r + a = b - a(q - 1) < 0.

In both of the possible cases a ≥ 1, a ≤ -1, we obtain r < |a|.

To prove uniqueness, suppose that

    b = aq + r,   where 0 ≤ r < |a|, and
    b = aq' + r',   where 0 ≤ r' < |a|.

Then aq + r = aq' + r'. Thus, a(q - q') = r' - r. Taking absolute values, we obtain |a|·|q - q'| = |r' - r|. By adding the inequalities -|a| < -r ≤ 0 and 0 ≤ r' < |a|, it follows that -|a| < r' - r < |a|. Therefore, by Theorem 4-6.7, |r' - r| < |a|. Hence, |a|·|q - q'| < |a|, so that |q - q'| < 1. Since q and q' are integers, so is |q - q'|. Moreover 0 ≤ |q - q'| < 1. Hence, |q - q'| = 0, and therefore q = q'. Consequently, r = r'.

In the expression b = aq + r, q is called the quotient and r the remainder in the division of b by a. If r = 0, then a divides b in Z, and in this case q = b/a.

The division algorithm can be generalized to obtain an important theorem on the representation of natural numbers. The proof of this generalization uses the following simple fact.

(5-1.2). If a > 1 and b > 0 in Theorem 5-1.1, then b > q ≥ 0.


Proof. Assume that q < 0. Then -q > 0, and therefore -q ≥ 1. Hence, a(-q) ≥ a > r. Adding aq to both sides of this inequality, we obtain 0 > aq + r = b > 0, which is a contradiction. Consequently q ≥ 0. If q = 0, then obviously b > q ≥ 0. If q > 0, then since a > 1, we have aq > q. Therefore b = aq + r ≥ aq > q > 0. This proves (5-1.2).

THEOREM 5-1.3. Let a be a natural number greater than 1. Then every natural number n can be uniquely represented in the form

    n = r_k a^k + r_(k-1) a^(k-1) + ··· + r_1 a + r_0,

where k is some nonnegative integer and r_0, r_1, ..., r_k are nonnegative integers less than a.

Proof. The proof is by course of values induction on n (see 2-3.3). Assume that every natural number m < n can be represented uniquely in the form

    r_k a^k + r_(k-1) a^(k-1) + ··· + r_1 a + r_0,

where r_0, r_1, ..., r_k are nonnegative integers less than a. By the division algorithm, there are unique integers q and r such that

    n = aq + r,   0 ≤ r < a.

By (5-1.2), n > q ≥ 0. If q = 0, then n = r is the required unique representation (with k = 0, r_0 = r). Of course this is the case when n = 1. Assume now that q > 0, that is, n ≥ a. Then q is a natural number less than n, so that the induction hypothesis applies to q. Thus, there is a unique representation

    q = r_k a^k + r_(k-1) a^(k-1) + ··· + r_1 a + r_0.

Using this representation of q, we obtain

    n = aq + r = r_k a^(k+1) + r_(k-1) a^k + ··· + r_1 a^2 + r_0 a + r.

With a change of notation, this is an expression for n in the required form. To prove that this representation of n is unique, suppose that

    n = s_j a^j + s_(j-1) a^(j-1) + ··· + s_1 a + s_0,

where j is a nonnegative integer, and s_0, s_1, ..., s_j are nonnegative integers less than a. Because n ≥ a, it follows that j ≥ 1, and

    n = a(s_j a^(j-1) + s_(j-1) a^(j-2) + ··· + s_1) + s_0.

Since 0 ≤ s_0 < a, the uniqueness of q and r implies that s_0 = r and

    s_j a^(j-1) + s_(j-1) a^(j-2) + ··· + s_1 = q.

Finally, it follows from the uniqueness of the expression for q that j - 1 = k and s_1 = r_0, s_2 = r_1, ..., s_j = r_k.

In the particular case a = 10, this theorem is merely a formal statement of the well-known fact that every natural number can be represented in decimal notation. In fact, when we write an expression such as

we are using a standard abbreviation for the number

The fact that every natural number admits a unique representation of this form is usually taken for granted. By Theorem 5-1.3, such an assumption is justified. In fact, we have proved that it is not necessary to use powers of 10 for such a representation. Any natural number a > 1 will do just as well. In an expression

    r_k a^k + r_(k-1) a^(k-1) + ··· + r_1 a + r_0,

the number a is called the base, or radix, of the representation. As in the case of the decimal system, it is convenient to abbreviate this expression by writing the coefficients in order: r_k r_(k-1) ··· r_1 r_0.

For this notation to be unambiguous, it is necessary to have individual symbols representing each of the numbers 0, 1, ..., a - 1. If a ≤ 10, then the customary digits can be used. For example, every number is expressible with the base 5, using the coefficients 0, 1, 2, 3, and 4. Thus, 1411301 represents 1·5^6 + 4·5^5 + 1·5^4 + 1·5^3 + 3·5^2 + 0·5 + 1.
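The proof of Theorem 5-1.3 suggests a procedure for finding the digits of n to the base a: divide by a repeatedly and collect the remainders. The following is a minimal sketch of that idea, not code from the text; the names `to_digits` and `from_digits` are hypothetical, and digits are returned as a list of integers, most significant first.

```python
def to_digits(n, a):
    """Digits r_k, ..., r_1, r_0 of the natural number n to the base a > 1."""
    if a <= 1 or n < 1:
        raise ValueError("need a > 1 and n >= 1")
    digits = []
    while n > 0:
        n, r = divmod(n, a)   # n = a*q + r with 0 <= r < a; continue with q
        digits.append(r)      # remainders appear least significant first
    return digits[::-1]

def from_digits(digits, a):
    """Evaluate r_k a^k + ... + r_1 a + r_0 from the digit list [r_k, ..., r_0]."""
    value = 0
    for r in digits:
        value = value * a + r
    return value

# The base 5 numeral 1411301 from the text, converted to decimal and back.
n = from_digits([1, 4, 1, 1, 3, 0, 1], 5)
print(n)
print(to_digits(n, 5))
```

Returning a list of integers rather than a string avoids having to choose symbols for digits larger than 9.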


If a > 10, then new symbols must be introduced for the numbers which are written in the decimal notation as 10, 11, ..., a - 1. (Clearly, the use of 10, 11, etc., would be confusing.) A frequently used base is 12. The scheme for representing numbers to the base 12 is called the duodecimal system. The letters A and B are often employed to denote 10 and 11, respectively, in the duodecimal system. For instance, 7A1B0 represents

    7·12^4 + 10·12^3 + 1·12^2 + 11·12 + 0.

In representing numbers to bases other than 10, we must be careful that the base being used is clearly understood. If there is a possibility of confusion, the base is usually indicated as a subscript. Thus,

The reader will find that with a little practice he can do elementary arithmetic with numbers expressed to bases other than 10. The methods used are the usual ones.
EXAMPLE 1. Let 413204, 223001 be numbers written to the base 5. Then

The magnitudes of numbers expressed to any base can be compared in the same familiar way that decimal numbers are compared. For example,

    (111)_3 < (222)_3,   (132)_5 < (141)_5,   (11)_2 < (100)_2,   and   (A99)_12 < (B00)_12.


The rule used to compare two numbers is a simple one, but the general statement of it is somewhat involved.

THEOREM 5-1.4. Let a be a natural number greater than 1. Suppose that n and m are natural numbers represented to the base a. Then n < m if and only if either n has fewer digits than m or n and m have the same number of digits, and at the first place from the left where the digits of n and m differ, the digit in n is less than the corresponding digit in m.

The proof is left as an exercise for the reader (see Problem 9 below).

In the binary system of enumeration, each number is represented to the base 2. Thus, a binary number is written as a series of zeros and ones. For example,

Many large-scale digital computers operate with numbers in binary form. There are two reasons for this. First, the operations of addition and multiplication are particularly simple in the binary system. Second, most of the basic components (switches, relays, diodes, rectifiers) of digital computers are bistable devices, that is, they are always in one of two states, which conveniently correspond to the digits 0 and 1 in the representation of a binary number.

The binary system of enumeration has other important uses in mathematics.

EXAMPLE 2. For the case a = 2, Theorem 5-1.3 can be stated as follows: every natural number n can be uniquely represented in the form

    n = 2^(k_1) + 2^(k_2) + ··· + 2^(k_r),

where k_1 > k_2 > ··· > k_r ≥ 0. Thus,

    {k_1, k_2, ..., k_r} → 2^(k_1) + 2^(k_2) + ··· + 2^(k_r)

is a one-to-one correspondence between the set of all finite subsets of {0, 1, 2, ...} and N. Since the set of all finite subsets of any denumerable set has the same cardinality as the set of all finite subsets of {0, 1, 2, ...} (see Problem 14, Section 1-3), we obtain a useful theorem. The set of all finite subsets of a denumerable set is denumerable.
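The correspondence in Example 2 can be made concrete. The sketch below, with hypothetical function names, sends a finite nonempty set of nonnegative integers to the sum of the corresponding powers of 2, and recovers the set from a natural number by reading off its binary digits. It assumes the sets are nonempty, since the empty sum would give 0.

```python
def subset_to_number(ks):
    """Map a finite set {k_1, ..., k_r} of nonnegative integers to 2^k_1 + ... + 2^k_r."""
    return sum(2 ** k for k in set(ks))

def number_to_subset(n):
    """Recover the set of exponents from the binary representation of n >= 1."""
    ks, k = set(), 0
    while n > 0:
        n, bit = divmod(n, 2)   # bit is the binary digit of weight 2^k
        if bit == 1:
            ks.add(k)
        k += 1
    return ks

print(subset_to_number({0, 2, 3}))
print(number_to_subset(13))
```

The two functions are inverse to each other, which reflects the uniqueness part of Theorem 5-1.3 for a = 2.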

EXAMPLE 3. The binary number system can be used to obtain a winning strategy for Nim, an ancient game, which originated in China. Nim is played by two contestants, using three piles of counters. The contestants alternately pick up any number of counters from one of the three piles. On each play they must take at least one counter, and the counters chosen can come only from a single pile. The winner is the player who takes the last counter. From a mathematical standpoint, the game is completely described by specifying the numbers l, m, and n of counters in each of the piles at the beginning of each play. Thus, a sample game might be represented by

We say that a triple (l, m, n) of nonnegative integers describes a "position" of the game. Let (l, m, n) be any position. Write l, m, and n in binary form:

    l = e_k 2^k + e_(k-1) 2^(k-1) + ··· + e_1 2 + e_0,
    m = f_k 2^k + f_(k-1) 2^(k-1) + ··· + f_1 2 + f_0,
    n = g_k 2^k + g_(k-1) 2^(k-1) + ··· + g_1 2 + g_0.

We call this position "unfavorable" if each of the sums

    e_i + f_i + g_i   (i = 0, 1, ..., k)

is even. Otherwise, the position is "favorable." In particular (0, 0, 0) is unfavorable. It is easy to see that if (l, m, n) is unfavorable, then any position (l', m, n) with l' < l, or (l, m', n) with m' < m, or (l, m, n') with n' < n is favorable. Suppose that the counters are removed from the first pile. Then l' < l, so that when l' is written in binary form

    l' = e'_k 2^k + e'_(k-1) 2^(k-1) + ··· + e'_1 2 + e'_0,

at least one of the binary digits e'_i is different from the corresponding e_i in the representation of l. Since e_i and e'_i are either 0 or 1, we have e'_i = e_i + 1 if e_i = 0, and e'_i = e_i - 1 if e_i = 1. Therefore,

    e'_i + f_i + g_i = e_i + f_i + g_i ± 1.

The fact that (l, m, n) is an unfavorable position means that e_i + f_i + g_i is even. Consequently e'_i + f_i + g_i is odd, and the position (l', m, n) is favorable. In particular, an unfavorable position can never lead to (0, 0, 0); hence no player can finish the game from an unfavorable position. Thus, a good strategy is to remove enough counters from some pile so that the opponent is left in an unfavorable position. Of course, if (l, m, n) is unfavorable, this is impossible. But if (l, m, n) is favorable, then by reducing one of l, m, or n, an unfavorable position results. This can be done in the following way. Suppose that j is the largest number such that e_j + f_j + g_j is odd, that is, e_k + f_k + g_k, e_(k-1) + f_(k-1) + g_(k-1), ..., e_(j+1) + f_(j+1) + g_(j+1) are even, but e_j + f_j + g_j is odd. By the definition of a favorable position, such a j exists. Then either e_j = 1, f_j = 1, or g_j = 1, since otherwise e_j + f_j + g_j = 0. For the sake of definiteness, suppose that e_j = 1. Define for each i

    e'_i = e_i       if e_i + f_i + g_i is even,
    e'_i = 1 - e_i   if e_i + f_i + g_i is odd.

In particular, e'_i = e_i for i > j and e'_j = 0 < e_j. Let

    l' = e'_k 2^k + e'_(k-1) 2^(k-1) + ··· + e'_1 2 + e'_0.

Then l' < l by Theorem 5-1.4 and for all i,

    e'_i + f_i + g_i = e_i + f_i + g_i               if e_i + f_i + g_i is even,
    e'_i + f_i + g_i = (e_i + f_i + g_i) + 1 - 2e_i   if e_i + f_i + g_i is odd.

In both cases the sum e'_i + f_i + g_i is even. Consequently, (l', m, n) is unfavorable. To illustrate this procedure, suppose that

Then l = 0·2^5 + 1·2^4 + 1·2^3 + 0·2^2 + 1·2 + 1,

Thus, (l, m, n) is favorable. Using the procedure which we outlined above, we can obtain an unfavorable position (l', m, n), where

    l' = 0·2^5 + 1·2^4 + 0·2^3 + 1·2^2 + 0·2 + 1 = 21.

That is, the appropriate strategy is to remove 6 counters from the pile containing 27. The other player will then be faced with an unfavorable position, so that whatever he does leads to a new favorable position. In this way, the player who finds himself with a favorable position can always keep his opponent in an unfavorable position and win the game. In particular, if the initial position is favorable, the player with the first move can always win, provided that he knows the method which we have described. If he does not know this strategy, there is a good chance that one of his moves will lead to a favorable position for his opponent, since usually there are more favorable than unfavorable positions.
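The strategy described in Example 3 can be sketched in code. The column sums e_i + f_i + g_i are all even exactly when the bitwise exclusive-or of l, m, and n is zero, so the sketch below uses the `^` operator in place of explicit digit tables; this is a restatement of the method, not the notation of the text. The function names are hypothetical, and the piles m = 17, n = 4 in the usage example are assumed values chosen for illustration (the text's own values for m and n are not reproduced here).

```python
def is_unfavorable(l, m, n):
    """True when every binary column sum e_i + f_i + g_i is even."""
    return (l ^ m ^ n) == 0

def winning_move(l, m, n):
    """Return (pile_index, new_size) giving an unfavorable position, or None."""
    piles = [l, m, n]
    x = l ^ m ^ n            # bits of x mark the columns whose sum is odd
    if x == 0:
        return None          # already unfavorable: every move helps the opponent
    for i, p in enumerate(piles):
        # p ^ x flips the digits of p in the odd columns; the result is smaller
        # than p exactly when p has a 1 in the highest odd column j.
        if p ^ x < p:
            return i, p ^ x

print(winning_move(27, 17, 4))
```

With these assumed piles the move reduces the pile of 27 to 21, that is, it removes 6 counters, in agreement with the illustration in the text.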


1. Using long division, find the quotient and remainder in the division of b by a, where a and b are as follows.
(a) a = 212, b = 3111
(b) a = -2164, b = 6411037
(c) a = 2164, b = -6411037
(d) a = 121, b = -36
(e) a = -121, b = -36
(f) a = 21, b = 31

2. Write the following decimal numbers in the base 5 system of enumeration: 2, 21, 3116, 711096, 1O1O. 3. Write the following decimal numbers in the base 12 system of enumeration: 4, 16, 3102, 999111. 4. Find the decimal expression for the following numbers:

5. Convert the following numbers in the base 2 to the base 8:

6. Carry out the following addition without converting to the base 10. (a) (12145)s (51015)s (b) (111010111)2 4- (10101100)2 (c) (1AlB21)12 (ABAB11A)i2 (d) (140314)5 (2134114)5

7. Carry out the following multiplication without converting to the base 10. (a) (12145)s (51015)s (b) (111010111)2 (10101100)2 (e) (1AlB21)12 (ABAB11A)i2 (d) (140314)5 (2134114)s
8. Let n be a natural number. Suppose that

    2^k ≤ n < 2^(k+1).

(a) Show that in the sequence of numbers 1, 2, 3, ..., n, no number other than 2^k is divisible by 2^k (in Z).
(b) Use the result of part (a) to prove

    1 + 1/2 + 1/3 + ··· + 1/n = a/(2^k·b),

where a ∈ N, b ∈ N, and a and b are odd.

9. Let a be a natural number greater than 1.
(a) Show that if n = r_k a^k + r_(k-1) a^(k-1) + ··· + r_0, where 0 ≤ r_i < a, then n < a^(k+1). [Hint: First show that a^(k+1) - 1 = (a - 1)a^k + (a - 1)a^(k-1) + ··· + (a - 1).]
(b) Use the result of (a) to prove Theorem 5-1.4.


5-2 Greatest common divisor. If a and b are any two integers, then an integer c is called a common divisor of a and b if c|a and c|b. Several simple facts follow immediately from this definition. Since 1 is a divisor of every integer, 1 is a common divisor of any two integers a and b. Thus, the set of common divisors of two integers is nonempty. Every integer divides 0. Hence if b = 0, then the common divisors of a and b are just the divisors of a. In particular, if a = b = 0, every integer is a common divisor of a and b. In this case the set of common divisors of a and b is infinite. However, in every other case the set of common divisors of a and b is finite. Indeed, if a ≠ 0 and c|a, then a = w·c for some nonzero integer w. Consequently, by Theorem 4-6.7, |a| = |w|·|c| ≥ |c|. Therefore, if a ≠ 0 and if c is a common divisor of a and b, then -|a| ≤ c ≤ |a|. Obviously, there are only finitely many integers c satisfying -|a| ≤ c ≤ |a|. Similarly, if b ≠ 0, then -|b| ≤ c ≤ |b|, and there are only finitely many integers c satisfying -|b| ≤ c ≤ |b|. Therefore, if either a or b (or both) is different from zero, then the set of common divisors of a and b is finite and nonempty. Thus, by Theorem 4-6.4, this set contains a largest integer. Since 1 is in the set of common divisors, this largest integer is positive, that is, it is a natural number.

DEFINITION 5-2.1. Let a and b be integers which are not both zero. The greatest common divisor of a and b is the largest integer in the set of all common divisors of a and b. The greatest common divisor of a and b is denoted by (a, b). The expression "greatest common divisor" is often abbreviated g.c.d.

EXAMPLE 1. The common divisors of 12 and -30 are ±1, ±2, ±3, ±6. Therefore, the g.c.d. of 12 and -30 is 6.

Note that all of the common divisors of 12 and -30 divide the greatest common divisor. We will show that this is no coincidence, but rather is a fundamental property of the g.c.d.

THEOREM 5-2.2. Let a and b be integers which are not both zero.
(a) There exist integers u and v such that (a, b) = ua + vb.
(b) Every common divisor of a and b divides (a, b).


Proof. If c is a common divisor of a and b, then c divides any number of the form sa + tb, where s and t are integers. Thus, statement (b) follows from the property (a). Let S be the set of all positive integers (natural numbers) which are of the form sa + tb, with s and t integers. Since a and b are not both zero, at least one of the integers

    a = 1·a + 0·b,   -a = (-1)·a + 0·b,   b = 0·a + 1·b,   -b = 0·a + (-1)·b

is positive and therefore belongs to S. By the well-ordering principle, S contains a smallest number d. By the definition of S, there are integers u and v such that d = u·a + v·b. As we noted above, every common divisor of a and b divides d, so that in particular (a, b)|d. Thus, (a, b) ≤ d. The proof will be finished if we show that d ≤ (a, b). By the division algorithm,

    a = qd + r,

where 0 ≤ r < d, r ∈ Z, q ∈ Z. Therefore,

    r = a - qd = a - q(ua + vb) = (1 - qu)a + (-qv)b.

If r were positive, then r ∈ S, since r is of the form sa + tb. But r < d and d is the smallest number in S. Therefore, r cannot be positive, that is, r = 0. Thus, a = q·d, so that d divides a. Similarly, d divides b. Therefore, d is a common divisor of a and b. Since (a, b) is the greatest common divisor of a and b, d ≤ (a, b). The two inequalities (a, b) ≤ d and d ≤ (a, b) imply that (a, b) = d = ua + vb.
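The proof of Theorem 5-2.2 shows that integers u and v exist but does not say how to find them. One standard way to compute them is the extended Euclidean algorithm, which is a different technique from the well-ordering argument used above. The sketch below is offered under that assumption; the function name `bezout` is hypothetical.

```python
def bezout(a, b):
    """Return (d, u, v) with d = (a, b) > 0 and d == u*a + v*b; a, b not both zero."""
    if a == 0 and b == 0:
        raise ValueError("a and b must not both be zero")
    # Invariant: r0 == u0*a + v0*b and r1 == u1*a + v1*b.
    r0, u0, v0 = a, 1, 0
    r1, u1, v1 = b, 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1   # |r1| strictly decreases, so the loop ends
        u0, u1 = u1, u0 - q * u1
        v0, v1 = v1, v0 - q * v1
    if r0 < 0:                     # normalize the sign so that d is positive
        r0, u0, v0 = -r0, -u0, -v0
    return r0, u0, v0

d, u, v = bezout(12, -30)
print(d, u * 12 + v * (-30))
```

The coefficients u and v are not unique, so only d and the identity d = ua + vb should be relied upon.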

Suppose that a and b are integers which are not both zero and d is an integer which satisfies the following conditions:

$d \mid a$ and $d \mid b$. (5-1)

If c is an integer such that $c \mid a$ and $c \mid b$, then $c \mid d$. (5-2)

By (5-1), d is a common divisor of a and b. Therefore, by Theorem 5-2.2(b), $d \mid (a, b)$. Since $(a, b) \mid a$ and $(a, b) \mid b$, it follows from (5-2) that $(a, b) \mid d$. Thus, $d = \pm(a, b)$. In other words, the greatest common divisor of a and b is characterized up to its sign by the above conditions. In fact, these conditions together with the requirement that d be positive can be taken as the definition of the g.c.d. in Z. The importance of the conditions (5-1) and (5-2) lies in the fact that they make sense in an arbitrary integral domain,


whereas Definition 5-2.1 depends not only on the ordering of Z, but also on the very special fact that a nonzero integer has only a finite number of divisors. Accordingly, if A is an integral domain and if a and b are elements of A which are not both zero, then an element $d \in A$ is called a greatest common divisor of a and b (in A) if d satisfies (5-1) and (5-2) [where in (5-2), c is an element of A]. Of course, in some integral domains, not every pair of elements has a greatest common divisor (see Problem 14 below). Also, greatest common divisors in integral domains need not be unique. For example, in Q, if a and b are not both zero, then every nonzero rational number satisfies (5-1) and (5-2). We will use this generalized notion of a greatest common divisor in our discussion of polynomials in Chapter 9.

We will now derive some of the most useful properties of greatest common divisors. The first of these are simple consequences of Definition 5-2.1.

(5-2.3). Let a and b be integers which are not both zero. Then
(a) $(a, b) \geq 1$;
(b) $(a, b) = (b, a)$;
(c) $(a, b) = (-a, b) = (a, -b) = (-a, -b) = (|a|, |b|)$;
(d) $(a, b) = |a|$ if and only if $a \mid b$;
(e) $(a, 0) = |a|$ (provided that $a \neq 0$).

Proof. The statement (c) becomes evident if we note that the set of common divisors of a and b is identical with the sets of common divisors of -a and b, of a and -b, and of -a and -b. To prove (d), suppose first that $(a, b) = |a|$. Then in particular $|a|$ is a divisor of b. Therefore, $a \mid b$. Conversely, if $a \mid b$, then any divisor of a is also a divisor of b. Thus, the common divisors of a and b are exactly the divisors of a. Note that $a \neq 0$, since $a = 0$ and $a \mid b$ implies $b = 0$, and we have assumed that a and b are not both zero. By the discussion preceding Definition 5-2.1, every divisor c of a satisfies $c \leq |a|$. Since $|a|$ divides a, it follows that $|a|$ is the largest divisor of a, and therefore $|a|$ is the g.c.d. of a and b. The remaining statements of (5-2.3) are easy to prove.

THEOREM 5-2.4. Let a and b be integers which are not both zero. Suppose that c is a nonzero integer. Then $(ca, cb) = |c|(a, b)$.

Proof. Since (a, b) is a common divisor of a and b, and since $|c|$ divides c, it follows that $|c|(a, b)$ is a common divisor of ca and cb. Hence, by Theorem 5-2.2(b), $|c|(a, b)$ divides $(ca, cb)$. On the other hand, by Theorem 5-2.2(a), there exist integers u and v such that $(a, b) = ua + vb$. Consequently,

$$|c|(a, b) = |c|ua + |c|vb = u'(ca) + v'(cb),$$

where $u' = u$, $v' = v$ if $c > 0$, and $u' = -u$, $v' = -v$ if $c < 0$. Since $(ca, cb)$ is a common divisor of ca and cb, it follows that $(ca, cb)$ divides $|c|(a, b)$. Since both $(ca, cb)$ and $|c|(a, b)$ are positive, and each divides the other, $(ca, cb) = |c|(a, b)$.

An immediate consequence of this theorem is the following.

THEOREM 5-2.5. Let a and b be integers which are not both zero. Suppose that c is a common divisor of a and b. Then

$$\left(\frac{a}{c}, \frac{b}{c}\right) = \frac{(a, b)}{|c|}.$$

Note that since a and b are not both zero, and c divides both a and b, c cannot be zero. By Theorem 5-2.4, $(a, b) = (c \cdot a/c,\ c \cdot b/c) = |c|(a/c, b/c)$.

Any pair of integers a and b has -1 and 1 as common divisors. If these are the only common divisors, then a and b are said to be relatively prime. In other words, a and b are relatively prime if $(a, b) = 1$. For example, 2 and 5 are relatively prime, 9 and 16 are relatively prime, -27 and 35 are relatively prime, but 24 and 63 are not relatively prime since they have 3 as a common divisor. We now obtain a result which is needed in the next section for the proof of the fundamental theorem of arithmetic.

THEOREM 5-2.6. Suppose that a and b are relatively prime and a divides $b \cdot c$. Then a divides c.

Proof. Since $(a, b) = 1$, by Theorem 5-2.2(a) there are integers u and v such that $1 = ua + vb$.

Multiplying this equation by c, we obtain

$$c = (uc)a + v(bc).$$

Since a divides bc and a divides a, it follows that a divides $(uc)a + v(bc) = c$.

As we pointed out at the beginning of this section, any common divisor c of two nonzero integers a and b satisfies $c \leq \min\{|a|, |b|\}$. Thus, the problem of finding the g.c.d. of a and b can be solved by examining all of the natural numbers which are not larger than $\min\{|a|, |b|\}$ to find the largest one which divides both a and b. However, unless a and b are small, this procedure is impractical. There is a very efficient process for determining the g.c.d. of two integers, using the division algorithm. This method was apparently discovered by Euclid, and is called the Euclidean algorithm. If either a or b is zero, then (a, b) is obtained from (5-2.3e). Moreover, since $(a, b) = (|a|, |b|)$ by (5-2.3c), it is only necessary to consider positive integers, that is, natural numbers. Let a and b be natural numbers. By the division algorithm,

$$a = bq_1 + r_1, \quad 0 \leq r_1 < b.$$

If $r_1 = 0$, then b divides a, $(a, b) = b$, and the process ends. If $r_1 \neq 0$, divide b by $r_1$, obtaining

$$b = r_1 q_2 + r_2, \quad 0 \leq r_2 < r_1.$$

If $r_2 = 0$, the process ends. Otherwise, divide $r_1$ by $r_2$, and the division algorithm yields

$$r_1 = r_2 q_3 + r_3, \quad 0 \leq r_3 < r_2.$$

This process can be continued as long as a nonzero remainder is obtained. Since each new remainder is a nonnegative integer which is smaller than the preceding one, the sequence $r_1, r_2, r_3, \ldots$ must terminate with some $r_{n+1} = 0$. Thus, we have the following equations:

$$\begin{aligned}
a &= bq_1 + r_1, & 0 < r_1 < b, \\
b &= r_1 q_2 + r_2, & 0 < r_2 < r_1, \\
r_1 &= r_2 q_3 + r_3, & 0 < r_3 < r_2, \\
&\;\;\vdots \\
r_{n-2} &= r_{n-1} q_n + r_n, & 0 < r_n < r_{n-1}, \\
r_{n-1} &= r_n q_{n+1}.
\end{aligned} \qquad (5\text{-}3)$$

We will show that $r_n$, the last nonzero remainder, is the g.c.d. of a and b. By the equation $r_{n-1} = r_n q_{n+1}$, we see that $r_n \mid r_{n-1}$. Since $r_{n-2} = r_{n-1} q_n + r_n$, it follows that $r_n \mid r_{n-2}$. Continuing up the sequence of equations (5-3) we find that $r_n$ divides each of the preceding remainders. Then since $b = r_1 q_2 + r_2$, it follows that $r_n \mid b$, and since $a = bq_1 + r_1$, it follows that $r_n \mid a$. Therefore, $r_n$ is a common divisor of a and b. Suppose that c is any common divisor of a and b. Then $c \mid r_1$, since $r_1 = a - bq_1$. Consequently, $c \mid r_2$, since $r_2 = b - r_1 q_2$. Continuing down the sequence of equations (5-3), we find that $c \mid r_3$, $c \mid r_4$, ..., $c \mid r_n$. In particular, since $r_n \neq 0$, $c \leq |c| \leq |r_n| = r_n$. Thus, $r_n$ is the g.c.d. of a and b.

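EXAMPLE 2. Let a = 24756, b = 6108. We obtain the following equations:

$$\begin{aligned}
24756 &= 6108 \cdot 4 + 324, \\
6108 &= 324 \cdot 18 + 276, \\
324 &= 276 \cdot 1 + 48, \\
276 &= 48 \cdot 5 + 36, \\
48 &= 36 \cdot 1 + 12, \\
36 &= 12 \cdot 3 + 0.
\end{aligned}$$

Therefore, (24756, 6108) = 12.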
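The repeated divisions of the Euclidean algorithm translate almost line for line into code. The following Python sketch is an illustration added here, not part of the original text; the names `euclid_steps` and `gcd` are assumptions. Each recorded row corresponds to one equation of (5-3).

```python
def euclid_steps(a: int, b: int) -> list[tuple[int, int, int, int]]:
    """Rows (dividend, divisor, quotient, remainder), one per division.

    Assumes a >= 0 and b > 0. The divisor of the last row is the
    last nonzero remainder, that is, the g.c.d.
    """
    steps = []
    while b != 0:
        q, r = divmod(a, b)        # a = b*q + r with 0 <= r < b
        steps.append((a, b, q, r))
        a, b = b, r
    return steps


def gcd(a: int, b: int) -> int:
    """(a, b) for integers a, b which are not both zero."""
    a, b = abs(a), abs(b)          # (a, b) = (|a|, |b|) by (5-2.3c)
    if a == 0 and b == 0:
        raise ValueError("(0, 0) is not defined")
    if b == 0:
        return a                   # (a, 0) = |a| by (5-2.3e)
    return euclid_steps(a, b)[-1][1]


for row in euclid_steps(24756, 6108):
    print("%d = %d * %d + %d" % row)
```

Because each remainder is strictly smaller than the one before, the loop always terminates.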

It is possible to use equations (5-3) obtained in applying the Euclidean algorithm to determine not only the greatest common divisor d of any pair a and b of natural numbers, but also integers u and v such that $d = ua + vb$.

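The existence of such numbers was proved in Theorem 5-2.2(a), but that proof does not give a convenient method for finding the values of u and v. We illustrate the use of equations (5-3) to find u and v with the example a = 24756, b = 6108. Write the equation $48 = 36 \cdot 1 + 12$ in the form $12 = 48 - 36 \cdot 1$. Now reorder the preceding equation of Example 2 to obtain $36 = 276 - 48 \cdot 5$. Substitute and collect:

$$12 = 48 - (276 - 48 \cdot 5) \cdot 1 = 48 \cdot 6 - 276.$$

Continue this process, using each of the equations obtained in the above example (except the last one, $36 = 12 \cdot 3 + 0$):

$$\begin{aligned}
12 &= (324 - 276 \cdot 1) \cdot 6 - 276 = 324 \cdot 6 - 276 \cdot 7 \\
   &= 324 \cdot 6 - (6108 - 324 \cdot 18) \cdot 7 = 324 \cdot 132 - 6108 \cdot 7 \\
   &= (24756 - 6108 \cdot 4) \cdot 132 - 6108 \cdot 7 = 24756 \cdot 132 - 6108 \cdot 535,
\end{aligned}$$

so that u = 132 and v = -535.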
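The same coefficients can be obtained mechanically. The sketch below is an added illustration; the name `extended_gcd` is an assumption. Note that it carries the coefficients forward through the divisions rather than substituting backward as in the text; the two procedures express the same remainders in terms of a and b, and for this example they produce the same u and v.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, u, v) with d = (a, b) = u*a + v*b, for natural numbers a, b."""
    # Invariants maintained throughout:
    #   r0 = u0*a + v0*b   and   r1 = u1*a + v1*b
    r0, u0, v0 = a, 1, 0
    r1, u1, v1 = b, 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        u0, u1 = u1, u0 - q * u1
        v0, v1 = v1, v0 - q * v1
    return r0, u0, v0


d, u, v = extended_gcd(24756, 6108)
print(d, u, v)
print(u * 24756 + v * 6108 == d)
```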

In general terms,

$$r_n = r_{n-2} - r_{n-1} q_n = r_{n-2} - (r_{n-3} - r_{n-2} q_{n-1}) q_n = (1 + q_{n-1} q_n) r_{n-2} - q_n r_{n-3}.$$

Continuing up the set of equations (5-3) in this way, we can eventually express $r_n$ in terms of a and b.

It is possible to extend the definition of the greatest common divisor to collections of several integers. Thus, if $\{a_1, a_2, \ldots, a_n\}$ is any nonempty set of integers, we say that c is a common divisor of the integers in this set if $c \mid a_1$, $c \mid a_2$, ..., and $c \mid a_n$. If not all of $a_1, a_2, \ldots, a_n$ are zero, then this collection has only a finite number of common divisors, and therefore there is a natural number d which is the greatest common divisor of $a_1, a_2, \ldots, a_n$. As before, the g.c.d. of $a_1, a_2, \ldots, a_n$ is denoted by $(a_1, a_2, \ldots, a_n)$. Note that if $n = 1$, then $(a_1) = |a_1|$. Further, if any $a_i = 0$, then $(a_1, a_2, \ldots, a_n) = (a_1, a_2, \ldots, a_{i-1}, a_{i+1}, \ldots, a_n)$, so that we may restrict our attention to sets of nonzero integers.

THEOREM 5-2.7. Let $a_1, a_2, \ldots, a_n$ be nonzero integers, where $n \geq 1$.
(a) There exist integers $u_1, u_2, \ldots, u_n$ such that $(a_1, a_2, \ldots, a_n) = u_1 a_1 + u_2 a_2 + \cdots + u_n a_n$.
(b) Every common divisor of $a_1, a_2, \ldots, a_n$ divides $(a_1, a_2, \ldots, a_n)$.

The proof of this theorem is similar to the proof of Theorem 5-2.2, and we leave it as an exercise. A useful consequence of Theorem 5-2.7(b) is the following theorem.

THEOREM 5-2.8. Let $a_1, a_2, \ldots, a_n$ be nonzero integers, where $n \geq 2$. Then $(a_1, a_2, \ldots, a_n) = (a_1, (a_2, \ldots, a_n))$.

Proof. If c is a common divisor of $a_1, a_2, \ldots, a_n$, then by Theorem 5-2.7(b), $c \mid a_1$ and $c \mid (a_2, \ldots, a_n)$. Thus, by Theorem 5-2.2(b), $c \mid (a_1, (a_2, \ldots, a_n))$. In particular, $(a_1, a_2, \ldots, a_n) \mid (a_1, (a_2, \ldots, a_n))$. Conversely, if c is a common divisor of $a_1$ and $(a_2, \ldots, a_n)$, then $c \mid a_1$ and $c \mid (a_2, \ldots, a_n)$. Therefore, c is a common divisor of $a_1, a_2, \ldots, a_n$, so that by Theorem 5-2.7(b), $c \mid (a_1, a_2, \ldots, a_n)$. In particular,

$$(a_1, (a_2, \ldots, a_n)) \mid (a_1, a_2, \ldots, a_n).$$

Since $(a_1, a_2, \ldots, a_n)$ and $(a_1, (a_2, \ldots, a_n))$ are natural numbers, each of which divides the other, they are equal.

By using the Euclidean algorithm, together with Theorem 5-2.8, it is possible to determine the g.c.d. of any nonempty finite set of natural numbers. Moreover, Theorem 5-2.8 can also be used with induction to extend results on the greatest common divisor of two integers to theorems about any nonempty set of integers.

Let $a_1, a_2, \ldots, a_n$ be any nonzero integers. An integer c is called a common multiple of $a_1, a_2, \ldots, a_n$ if $a_1 \mid c$, $a_2 \mid c$, ..., and $a_n \mid c$. Evidently, $a_1 a_2 \cdots a_n$ and $-a_1 a_2 \cdots a_n$ are both common multiples of $a_1, a_2, \ldots, a_n$. At least one of these is positive. By the well-ordering property of the natural numbers, there exists a smallest positive integer c which is a common multiple of $a_1, a_2, \ldots, a_n$. This unique positive integer is called the least common multiple (or l.c.m.) of $a_1, a_2, \ldots, a_n$. The usual notation for the l.c.m. of $a_1, a_2, \ldots, a_n$ is $[a_1, a_2, \ldots, a_n]$. There is a close relationship between the g.c.d. and the l.c.m. In fact it is possible to prove that for any two nonzero integers a and b,

$$[a, b](a, b) = |a|\,|b|$$

(see Problem 12 below).
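Theorem 5-2.8 and the relation between the g.c.d. and the l.c.m. suggest a short computational recipe. The following Python sketch is an added illustration; `gcd_many` and `lcm` are hypothetical names, and the two-argument g.c.d. is taken from Python's standard `math.gcd` instead of being recomputed.

```python
from functools import reduce
from math import gcd


def gcd_many(*nums: int) -> int:
    """(a1, ..., an), folding the two-argument g.c.d. as in Theorem 5-2.8.

    Starting the fold from 0 makes the one-element case return |a1|,
    since gcd(0, a) = |a|.
    """
    return reduce(gcd, nums, 0)


def lcm(a: int, b: int) -> int:
    """[a, b] for nonzero a, b, from [a, b](a, b) = |a||b|."""
    return abs(a * b) // gcd(a, b)


print(gcd_many(12, -30, 18))
print(lcm(12, -30))
```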

1. In the following cases, find the g.c.d. (a, b), and express it in the form $ua + vb$.
(a) a = -121, b = 33 (b) a = 543, b = -241 (c) a = 78696, b = 19332
2. Show that in the expression $(a, b) = ua + vb$, the integers u and v are not unique.
3. Find the following g.c.d.'s.
(a) (144, 90, -1512) (b) (1932, 476, -952, 504, -9261)
4. Show that the integers a and b are relatively prime if and only if there exist integers u and v such that $ua + vb = 1$.
5. Show that any two consecutive integers are relatively prime.
6. Prove that any two successive terms of the Fibonacci sequence 1, 1, 2, 3, 5, 8, ... are relatively prime (see Section 2-6).
7. Prove Theorem 5-2.7.


8. Show that if $a_1, a_2, \ldots, a_n$ are nonzero integers ($n \geq 1$), and if $c \neq 0$, then $(ca_1, ca_2, \ldots, ca_n) = |c|(a_1, a_2, \ldots, a_n)$.
9. Show that if a and b are integers which are not both zero, then a/(a, b) and b/(a, b) are relatively prime.
10. Let a, b, and c be nonzero integers. Prove the following result concerning least common multiples: $[ca, cb] = |c|[a, b]$.
11. Let a and b be nonzero integers which are relatively prime. Use Theorem 5-2.6 to show that $[a, b] = |a|\,|b|$.
12. Using the results of Problems 9, 10, and 11, show that for any two nonzero integers a and b, $[a, b](a, b) = |a|\,|b|$.
13. In equations (5-3), show that
(a) $r_n \geq 1$, $r_{n-1} \geq r_n + 1$, $r_{n-2} \geq r_{n-1} + r_n$, $r_{n-3} \geq r_{n-2} + r_{n-1}$, ..., $b \geq r_1 + r_2$.
(b) Using this result, show that if $u_1, u_2, \ldots$ denote the terms 1, 1, 2, 3, 5, 8, ... of the Fibonacci sequence, then $b \geq u_{n+2}$.
(c) Show that if p is the number of digits in b, then the number $n + 1$ of steps in Euclid's algorithm is less than or equal to 5p. [Hint: Use (b) and Problem 8, Section 2-6.]

14. Let $A = \{m + n\sqrt{10} \mid m, n \in Z\}$. Prove the following.
(a) A is an integral domain with the usual addition and multiplication of real numbers.
(b) If $a + b\sqrt{10}$ divides $c + d\sqrt{10}$ in A, then $a^2 - 10b^2$ divides $c^2 - 10d^2$ in Z.
(c) 2 and $4 + \sqrt{10}$ are both common divisors of 6 and $8 + 2\sqrt{10}$ in A.
(d) If 2 divides $a + b\sqrt{10}$ in A, then $2 \mid a$ and $2 \mid b$ in Z.
(e) If $4 + \sqrt{10}$ divides $2c + 2d\sqrt{10}$ in A, then $3 \mid c^2 - 10d^2$ in Z.
(f) If $2c + 2d\sqrt{10}$ divides 6 in A, then $c^2 - 10d^2 \mid 9$ in Z.
(g) If $2c + 2d\sqrt{10}$ divides $8 + 2\sqrt{10}$ in A, then $c^2 - 10d^2 \mid 6$ in Z.
(h) Prove that there is no element $a + b\sqrt{10}$ in A which is a common divisor of 6 and $8 + 2\sqrt{10}$, and is divisible by both 2 and $4 + \sqrt{10}$ in A. [Hint: If such an $a + b\sqrt{10}$ exists, then by (d), (e), (f), and (g), we have $a + b\sqrt{10} = 2c + 2d\sqrt{10}$, where $c^2 - 10d^2 = \pm 3$. Now use the easily verified fact that the square of a natural number written in decimal form never ends with 3 or 7.]

5-3 The fundamental theorem of arithmetic. As the two preceding sections indicate, many questions considered in the study of the natural numbers are concerned with divisibility properties. That is, when will a number a divide a number b, if a and b are somehow related? As an example, we might be interested in conditions under which the natural number n divides a given binomial coefficient. One of the principal elementary tools in the study of divisibility problems is a theorem, called the fundamental theorem of arithmetic, which says that every natural number greater than 1 can be written in an essentially unique way as a product of prime numbers. The primes, which we discussed briefly in Section 2-3, can therefore be considered as the basic building blocks of all natural numbers.

DEFINITION 5-3.1. A natural number p is called a prime (or prime number) if $p \neq 1$, and p is not divisible by any natural number other than 1 or p. A natural number n > 1 which is not a prime is called composite.

For example, 2, 3, 5, 7, 11, 13, 17, and 19 are all of the primes less than 20, and 4, 6, 8, 9, 10, 12, 14, 15, 16, and 18 are all of the composite numbers less than 20. The number 1 is distinguished: it is neither prime nor composite. Following an old tradition, we will usually designate primes by the small Latin letters p and q (sometimes with subscripts). If p is a prime, and if a is any natural number, then the greatest common divisor (p, a) divides p, so that either (p, a) = p, or (p, a) = 1. If (p, a) = p, then p divides a (since it is the g.c.d. of p and a). Thus, either p divides a, or else p and a are relatively prime.

There are two parts of the fundamental theorem of arithmetic. The more elementary part states that every natural number greater than 1 can be written in some way as a product of primes. That is, if n > 1, then

$$n = p_1 p_2 \cdots p_k,$$

where $p_1, p_2, \ldots, p_k$ are primes (not necessarily different). Of course, it may happen that $k = 1$, so that the product has only one factor. This result is easily proved by course of values induction on n [see (2-3.3)]. It is only necessary to show that if every natural number m, which satisfies $1 < m < n$, can be written as a product of primes, then n can be written as a product of primes. If n is itself a prime, there is nothing to prove. (This remark takes care of the basis for the induction when n = 2.) Otherwise, n is composite, and therefore $n = a \cdot b$, where neither a nor b is 1 or n. That is, $1 < a < n$ and $1 < b < n$. By the induction hypothesis,

$$a = \prod_{i=1}^{k} p_i \quad \text{and} \quad b = \prod_{j=1}^{l} q_j,$$

where the $p_i$ and $q_j$ are primes. Therefore,

$$n = a \cdot b = p_1 p_2 \cdots p_k \cdot q_1 q_2 \cdots q_l$$

is a product of primes.

The second part of the fundamental theorem of arithmetic is sometimes called the unique factorization theorem. It states that the expression of a natural number as a product of primes is unique, except for the order of the factors. This fact is also proved by induction. This time the induction is on the number k of prime factors in the expression of n as a product of primes. To begin with, however, we need a preliminary fact.

(5-3.2). If a prime p divides a product $a_1 a_2 \cdots a_k$, then p divides at least one of the factors $a_i$.

Proof. If $k = 1$, then the hypothesis is the same as the conclusion, so that there is nothing to prove. We may therefore make the induction hypothesis that if p divides a product of $k - 1$ ($k > 1$) natural numbers, then it divides at least one of the factors of this product. By assumption, p divides $a_1 a_2 \cdots a_k = (a_1 a_2 \cdots a_{k-1}) \cdot a_k$. As we remarked above, either $p \mid a_k$ or $(p, a_k) = 1$. If $(p, a_k) = 1$, then by Theorem 5-2.6, p divides $a_1 a_2 \cdots a_{k-1}$. In this case, the induction hypothesis yields $p \mid a_i$ for some i with $1 \leq i \leq k - 1$. This completes the proof of the induction step, and proves (5-3.2).

We can now complete the proof that factorization of a natural number n into a product of primes is unique. Suppose that

$$n = p_1 p_2 \cdots p_k = q_1 q_2 \cdots q_l,$$

where $p_1, p_2, \ldots, p_k$ and $q_1, q_2, \ldots, q_l$ are primes. If $k = 1$, then $n = p_1$ is a prime. Moreover, $q_1 \mid p_1$ and $q_1 \neq 1$. Thus, by Definition 5-3.1, $q_1 = p_1$. If l were greater than 1, then $n = p_1 = q_1 \cdot (q_2 \cdots q_l)$. Since $q_2, \ldots, q_l$ are primes, $q_2 \cdots q_l \neq 1$, and $n = p_1$ is composite. This is a contradiction. Therefore, $l = 1$ and $n = p_1 = q_1$, which is the desired conclusion. This proves the basis step in the induction on k. We may therefore assume that $k > 1$. Our induction hypothesis is that if a natural number can be expressed as a product of less than k primes, then this expression is unique up to the order of the factors. That is, the number has no other representation, regardless of the number of factors. From the equality

$$p_1 p_2 \cdots p_k = q_1 q_2 \cdots q_l,$$

it follows that $p_k$ divides $q_1 q_2 \cdots q_l$. By (5-3.2), $p_k$ divides some $q_i$. Since $q_i$ is a prime and $p_k \neq 1$, it follows that $p_k = q_i$. By canceling $p_k$, we obtain

$$p_1 p_2 \cdots p_{k-1} = q_1 \cdots q_{i-1} q_{i+1} \cdots q_l.$$

Now $p_1 p_2 \cdots p_{k-1}$ is a product of $k - 1$ primes, and by the induction hypothesis, the factors $p_1, p_2, \ldots, p_{k-1}$ are equal to the factors $q_1, q_2, \ldots, q_{i-1}, q_{i+1}, \ldots, q_l$ in some order. This completes the proof of the main result of this section, which can be stated as follows.

THEOREM 5-3.3. Fundamental theorem of arithmetic. Every natural number n > 1 can be written as a product of primes, and except for the order of the factors, the expression of n in this form is unique.

Of course, the primes which appear in the factorization of a natural number may be repeated. For example, $360 = 2 \cdot 2 \cdot 2 \cdot 3 \cdot 3 \cdot 5 = 2^3 \cdot 3^2 \cdot 5$. In writing a number as a product of primes, it is convenient to group together the repeated primes so that the number is expressed as a product of powers of distinct primes. Thus, each natural number greater than 1 has a unique expression

$$n = p_1^{e_1} p_2^{e_2} \cdots p_g^{e_g},$$

where $p_1, p_2, \ldots, p_g$ are distinct prime numbers and the exponents $e_i$ are natural numbers. It is easy to see from Theorem 5-3.3 that the natural numbers which are divisors of $n = p_1^{e_1} p_2^{e_2} \cdots p_g^{e_g}$ are the numbers $p_1^{f_1} p_2^{f_2} \cdots p_g^{f_g}$, where $0 \leq f_i \leq e_i$. For instance, the divisors of $360 = 2^3 \cdot 3^2 \cdot 5$ are

1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 18, 20, 24, 30, 36, 40, 45, 60, 72, 90, 120, 180, 360.
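Both the factorization and the resulting list of divisors are easy to compute for small numbers. The sketch below is an added illustration, not the authors' method; `factorize` and `divisors` are hypothetical names, and the factorization is found by plain trial division. The divisors are then generated by running each exponent $f_i$ from 0 to $e_i$.

```python
from itertools import product


def factorize(n: int) -> dict[int, int]:
    """Prime factorization of n >= 1 as {prime: exponent}, by trial division."""
    factors: dict[int, int] = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:                      # whatever is left is itself a prime
        factors[n] = factors.get(n, 0) + 1
    return factors


def divisors(n: int) -> list[int]:
    """All natural-number divisors of n, one per exponent sequence f1, ..., fg."""
    primes = factorize(n)
    result = []
    for exps in product(*(range(e + 1) for e in primes.values())):
        d = 1
        for p, f in zip(primes, exps):
            d *= p ** f
        result.append(d)
    return sorted(result)


print(factorize(360))
print(divisors(360))
```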


Associated with any natural number n > 1 are two quantities of some interest. These are

$\tau(n)$ = number of divisors of n,
$\sigma(n)$ = sum of all divisors of n.

If we know the factorization of n into a product of powers of primes, it is possible to determine these quantities easily. Suppose that

$$n = p_1^{e_1} p_2^{e_2} \cdots p_g^{e_g},$$

where $p_1, p_2, \ldots, p_g$ are distinct primes and $e_1, e_2, \ldots, e_g$ are natural numbers. If d is a divisor of n, then

$$d = p_1^{f_1} p_2^{f_2} \cdots p_g^{f_g},$$

where $0 \leq f_i \leq e_i$. By Theorem 5-3.3, different choices of the sequence $f_1, f_2, \ldots, f_g$ give rise to different divisors. Thus, $\tau(n)$ is the number of different sequences $f_1, f_2, \ldots, f_g$ with $0 \leq f_i \leq e_i$. It is easy to show (by induction on g, for example) that the number of such sequences is

$$(e_1 + 1)(e_2 + 1) \cdots (e_g + 1).$$

Thus, we obtain the following result.

(5-3.4). If $n = p_1^{e_1} p_2^{e_2} \cdots p_g^{e_g}$, where $p_1, p_2, \ldots, p_g$ are distinct primes, then

$$\tau(n) = (e_1 + 1)(e_2 + 1) \cdots (e_g + 1).$$

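In order to evaluate $\sigma(n)$, consider the product

$$(1 + p_1 + p_1^2 + \cdots + p_1^{e_1})(1 + p_2 + p_2^2 + \cdots + p_2^{e_2}) \cdots (1 + p_g + p_g^2 + \cdots + p_g^{e_g}).$$

This product can be expanded as the sum of all products $p_1^{f_1} p_2^{f_2} \cdots p_g^{f_g}$, where $p_i^{f_i}$ is chosen from a summand of $1 + p_i + p_i^2 + \cdots + p_i^{e_i}$. Thus, $0 \leq f_i \leq e_i$. Therefore, the expansion of this product is just the sum of all divisors of n, that is, $\sigma(n)$. Finally, since

$$1 + p_i + p_i^2 + \cdots + p_i^{e_i} = \frac{p_i^{e_i + 1} - 1}{p_i - 1}$$

[see Problem 6(a), Section 2-1], we have proved

(5-3.5). If $n = p_1^{e_1} p_2^{e_2} \cdots p_g^{e_g}$, where $p_1, p_2, \ldots, p_g$ are distinct primes, then

$$\sigma(n) = \frac{p_1^{e_1 + 1} - 1}{p_1 - 1} \cdot \frac{p_2^{e_2 + 1} - 1}{p_2 - 1} \cdots \frac{p_g^{e_g + 1} - 1}{p_g - 1}.$$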

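EXAMPLE 1. Let $n = 360 = 2^3 \cdot 3^2 \cdot 5$. Then

$$\tau(360) = (3 + 1)(2 + 1)(1 + 1) = 24,$$

$$\sigma(360) = \frac{2^4 - 1}{2 - 1} \cdot \frac{3^3 - 1}{3 - 1} \cdot \frac{5^2 - 1}{5 - 1} = 15 \cdot 13 \cdot 6 = 1170.$$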
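Formulas (5-3.4) and (5-3.5) can be checked numerically against a direct count. The following Python sketch is an added illustration under the assumption that trial division is an acceptable way to factor small n; the names `factorize`, `tau`, and `sigma` are hypothetical.

```python
def factorize(n: int) -> dict[int, int]:
    """Prime factorization of n >= 1 as {prime: exponent}, by trial division."""
    factors: dict[int, int] = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def tau(n: int) -> int:
    """Number of divisors of n, by (5-3.4)."""
    result = 1
    for e in factorize(n).values():
        result *= e + 1
    return result


def sigma(n: int) -> int:
    """Sum of the divisors of n, by (5-3.5)."""
    result = 1
    for p, e in factorize(n).items():
        result *= (p ** (e + 1) - 1) // (p - 1)   # exact: geometric sum
    return result


print(tau(360), sigma(360))
```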

Another useful application of Theorem 5-3.3 is in finding the greatest common divisor and the least common multiple of a set of natural numbers. Let $\{a_1, a_2, \ldots, a_n\}$ be a set of natural numbers. Then each number $a_i$ can be expressed as a product of powers of the same set of primes $p_1, p_2, \ldots, p_k$, if zero exponents are used. For example, consider {360, 105, 1078}. Here,

$$360 = 2^3 \cdot 3^2 \cdot 5^1 \cdot 7^0 \cdot 11^0, \quad 105 = 2^0 \cdot 3^1 \cdot 5^1 \cdot 7^1 \cdot 11^0, \quad 1078 = 2^1 \cdot 3^0 \cdot 5^0 \cdot 7^2 \cdot 11^1.$$

Naturally, any prime raised to the power zero is 1.

(5-3.6). Let $\{a_1, a_2, \ldots, a_n\}$ be a set of natural numbers, where

$$a_i = p_1^{e_{1i}} p_2^{e_{2i}} \cdots p_k^{e_{ki}}, \quad e_{ji} \geq 0,$$

for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, k$. Then
(a) $(a_1, a_2, \ldots, a_n) = p_1^{f_1} p_2^{f_2} \cdots p_k^{f_k}$, where $f_j = \min\{e_{j1}, e_{j2}, \ldots, e_{jn}\}$;
(b) $[a_1, a_2, \ldots, a_n] = p_1^{g_1} p_2^{g_2} \cdots p_k^{g_k}$, where $g_j = \max\{e_{j1}, e_{j2}, \ldots, e_{jn}\}$.

Proof. To prove (a), we note first that since $f_j \leq e_{ji}$ for all i and j, then $p_1^{f_1} p_2^{f_2} \cdots p_k^{f_k}$ is a divisor of each $a_i$. On the other hand, a number which is a divisor of each $a_i$ must be of the form $p_1^{h_1} p_2^{h_2} \cdots p_k^{h_k}$, where $h_j \leq e_{ji}$ for all i and j. Then $h_j \leq \min\{e_{j1}, e_{j2}, \ldots, e_{jn}\} = f_j$. Thus, $p_1^{h_1} p_2^{h_2} \cdots p_k^{h_k}$ divides $p_1^{f_1} p_2^{f_2} \cdots p_k^{f_k}$, so that $p_1^{f_1} p_2^{f_2} \cdots p_k^{f_k} = (a_1, a_2, \ldots, a_n)$. The proof of part (b) is similar and is left to the reader.

EXAMPLE 2. Find the g.c.d. and l.c.m. of the set of numbers {360, 105, 1078}. We have

$$360 = 2^3 \cdot 3^2 \cdot 5^1 \cdot 7^0 \cdot 11^0, \quad 105 = 2^0 \cdot 3^1 \cdot 5^1 \cdot 7^1 \cdot 11^0, \quad 1078 = 2^1 \cdot 3^0 \cdot 5^0 \cdot 7^2 \cdot 11^1,$$

and

$$\min\{3, 0, 1\} = 0, \quad \min\{2, 1, 0\} = 0, \quad \min\{1, 1, 0\} = 0, \quad \min\{0, 1, 2\} = 0, \quad \min\{0, 0, 1\} = 0.$$

Hence, $(360, 105, 1078) = 2^0 \cdot 3^0 \cdot 5^0 \cdot 7^0 \cdot 11^0 = 1$. We find

$$\max\{3, 0, 1\} = 3, \quad \max\{2, 1, 0\} = 2, \quad \max\{1, 1, 0\} = 1, \quad \max\{0, 1, 2\} = 2, \quad \max\{0, 0, 1\} = 1.$$

Therefore, $[360, 105, 1078] = 2^3 \cdot 3^2 \cdot 5^1 \cdot 7^2 \cdot 11^1 = 194040$.
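The rule in (5-3.6) amounts to taking, prime by prime, the smallest and the largest exponent. A minimal Python sketch of this (an added illustration; the names `factorize` and `gcd_lcm_by_factoring` are assumptions, and trial division is used for the factorizations):

```python
def factorize(n: int) -> dict[int, int]:
    """Prime factorization of n >= 1 as {prime: exponent}, by trial division."""
    factors: dict[int, int] = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def gcd_lcm_by_factoring(nums: list[int]) -> tuple[int, int]:
    """Return (g.c.d., l.c.m.) of natural numbers using (5-3.6)."""
    facs = [factorize(a) for a in nums]
    primes = sorted(set().union(*facs))       # the common set p1, ..., pk
    g = l = 1
    for p in primes:
        exps = [f.get(p, 0) for f in facs]    # zero exponent if p is absent
        g *= p ** min(exps)
        l *= p ** max(exps)
    return g, l


print(gcd_lcm_by_factoring([360, 105, 1078]))
```

For large numbers factoring is far more expensive than the Euclidean algorithm, so this approach is mainly of theoretical interest.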

1. Express the following numbers as products of powers of distinct primes.
(a) 100 (b) 1300 (c) 1960 (d) 109 (e) 713
2. Find the set of all divisors of each of the numbers in Problem 1.
3. Using (5-3.4) and (5-3.5), find $\tau(n)$ and $\sigma(n)$ for each of the numbers n in Problem 1. Check your results on $\sigma(n)$ by computing the sum of the divisors of n directly.
4. Use (5-3.6) to find the g.c.d. and l.c.m. of the following sets of integers.
(a) {20, -15, 22, -10} (b) {27, -18, 21, 45} (c) {168, 842, 252} (d) {253, 690, 1127}
5. Use the fundamental theorem of arithmetic to determine the square roots of the following numbers with three decimal place accuracy.
(a) 392 (b) 5780 (c) 122694
[Note: $\sqrt{2} = 1.41421\ldots$, $\sqrt{3} = 1.73205\ldots$, $\sqrt{5} = 2.23607\ldots$.]
6. Prove in detail that if $d \mid n$ and $n = p_1^{e_1} p_2^{e_2} \cdots p_g^{e_g}$, then $d = p_1^{f_1} p_2^{f_2} \cdots p_g^{f_g}$, where $0 \leq f_i \leq e_i$.
7. Show that if the natural numbers a and b are relatively prime, then $\tau(ab) = \tau(a)\tau(b)$ and $\sigma(ab) = \sigma(a)\sigma(b)$.
8. For any natural number n, let $\sigma_k(n)$ be the sum of the kth powers of the divisors of n: $\sigma_k(n) = \sum_{d \mid n} d^k$, where the sum is over all natural numbers d which divide n.
(a) Show that $\sigma_0(n) = \tau(n)$.
(b) Show that if $n = p_1^{e_1} p_2^{e_2} \cdots p_g^{e_g}$, where $p_1, p_2, \ldots, p_g$ are distinct primes, then for $k \neq 0$,
$$\sigma_k(n) = \frac{p_1^{k(e_1 + 1)} - 1}{p_1^k - 1} \cdot \frac{p_2^{k(e_2 + 1)} - 1}{p_2^k - 1} \cdots \frac{p_g^{k(e_g + 1)} - 1}{p_g^k - 1}.$$
(c) Show that $\sigma_{-k}(n) = n^{-k}\sigma_k(n)$ both from the definition of $\sigma_k(n)$ and from the formula obtained in part (b).
9. Use the fundamental theorem of arithmetic to give a new proof of Theorem 5-2.4.
10. Use the fundamental theorem of arithmetic to show that for any natural numbers a and b, $(a, b)[a, b] = ab$.

*5-4 More about primes. The fundamental theorem of arithmetic shows that the primes are of great importance in number theory. In this section we will consider some apparently simple questions about the set of all primes. Most of these questions will be left unanswered. Indeed, some of the simplest looking problems of the theory of prime numbers can be counted among the outstanding unsolved problems of mathematics. Probably the first question which one would ask about prime numbers is: How can I tell when a natural number is a prime? It is always possible to test by long division whether or not a natural number a is divisible by a natural number b in the range $1 < b < a$. If a is not divisible by any such b, then a must be a prime. However, if a is large, then the amount of computation required to determine in this way whether a is a prime may be considerable. The labor can be reduced by using a simple property of composite numbers.

(5-4.1). Every composite number a is divisible by a prime $p \leq \sqrt{a}$.

Proof. Since a is composite, $a = b \cdot c$ where $b > 1$ and $c > 1$. Suppose that $b \leq c$. (One of the factors of a is less than or equal to the other one, and we can denote that factor by b.) Assume that $b > \sqrt{a}$. Then $c \geq b > \sqrt{a}$, so that $c > \sqrt{a}$. Therefore, $a = b \cdot c > \sqrt{a} \cdot \sqrt{a} = a$, which is impossible. Since the assumption that $b > \sqrt{a}$ led to a contradiction, we can conclude that $b \leq \sqrt{a}$. By Theorem 5-3.3 or Example 2, Section 2-3, b is divisible by a prime p. Hence $a = b \cdot c$ is divisible by p where $p \leq b \leq \sqrt{a}$.

To test whether a natural number a is a prime, it suffices by (5-4.1) to divide a by all primes which are not larger than $\sqrt{a}$. If each division has a nonzero remainder, then a is a prime. For example, consider a = 787. Since $28^2 = 784$ and $29^2 = 841$, $28 < \sqrt{787} < 29$. The primes which are not larger than $\sqrt{787}$ are 2, 3, 5, 7, 11, 13, 17, 19, and 23. By trial we find that 787 is not divisible by any of these primes. Therefore, by (5-4.1), 787 is a prime.

The first tables of prime numbers were compiled by a simple process based on (5-4.1). This method "sifts" the composite numbers from the sequence of all natural numbers which are less than or equal to some fixed natural number. The process is credited to the Greek mathematician Eratosthenes (276-194 B.C.). Suppose that we wish to find all primes $\leq 100$. By (5-4.1), every composite number $\leq 100$ is divisible by a prime $p \leq \sqrt{100} = 10$. Therefore, the primes $\leq 100$ are those numbers greater than 1 which are not proper multiples of 2, 3, 5, and 7. Thus, if we let the multiples of 2, 3, 5, and 7 fall through a sieve which contains the first hundred natural numbers, the primes will be left. This process, the sieve of Eratosthenes, is illustrated in Fig. 5-1.
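Both the primality test based on (5-4.1) and the sieve can be sketched in a few lines of Python. This is an added illustration; the names `is_prime` and `sieve` are assumptions. For simplicity `is_prime` divides by every integer from 2 up to $\sqrt{a}$ rather than by primes only, which does some redundant work but gives the same answer.

```python
from math import isqrt


def is_prime(a: int) -> bool:
    """Trial division by every d with 2 <= d <= sqrt(a), justified by (5-4.1)."""
    if a < 2:
        return False
    return all(a % d != 0 for d in range(2, isqrt(a) + 1))


def sieve(limit: int) -> list[int]:
    """Sieve of Eratosthenes: all primes <= limit (assumes limit >= 2)."""
    keep = [True] * (limit + 1)
    keep[0] = keep[1] = False
    for p in range(2, isqrt(limit) + 1):
        if keep[p]:
            # Let the proper multiples of p fall through the sieve.
            for multiple in range(2 * p, limit + 1, p):
                keep[multiple] = False
    return [n for n, is_kept in enumerate(keep) if is_kept]


print(is_prime(787))
print(sieve(100))
```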

Knowing the primes $\leq 100 = \sqrt{10{,}000}$, we can use this method for finding the primes $\leq 10{,}000$, and so forth. There are various refinements to the sieve of Eratosthenes which cut down the labor involved in compiling tables of primes. Moreover at the present time, this computation can be done by automatic computing machines. A complete list of all the primes among the first 11,000,000 natural numbers has been obtained by the sieve method. Note that in our table of primes, the primes thin out as the numbers get larger. There are 15 primes $\leq 50$ and 10 primes between 50 and 100. A natural question to ask is whether the primes stop somewhere in the sequence of natural numbers, that is, is there a largest prime? Euclid* answered this question in the negative.

* Most people think of Euclid as a geometer. Actually, Euclid's contributions to the subject of geometry seem to be slight. The familiar geometrical portions of Euclid's Elements are mainly compilations of the work of other geometers. However, Euclid's contributions to number theory were of the highest significance. Theorem 5-4.2 is rightly considered to be one of the fine gems of mathematical science.

THEOREM 5-4.2. There are infinitely many primes.

Proof. Euclid's proof of this fact is a proof by contradiction. Suppose that the number of primes is finite. Then all of the primes can be written down in order of increasing size,

$$2, 3, 5, 7, 11, \ldots, p,$$

where p is the largest prime. The natural number

$$n = (2 \cdot 3 \cdot 5 \cdot 7 \cdot 11 \cdots p) + 1,$$

which is the product of all of the primes plus 1, is not divisible by any prime in our list. This is true, since the remainder on dividing n by any one of these primes is $1 \neq 0$. For example, if we divide n by 5, we obtain

$$n = 5 \cdot (2 \cdot 3 \cdot 7 \cdot 11 \cdots p) + 1.$$

But by Theorem 5-3.3 (or by Example 2, Section 2-3), every natural number greater than 1 is divisible by a prime. Therefore, n is divisible by a prime which is not in our list of all of the primes. This is a contradiction. Hence our assumption that there is only a finite number of primes is false, and Theorem 5-4.2 is true.

Although Euclid showed over 2200 years ago that the set of primes is infinite, a closely related problem remains unsettled. If p and $p + 2$ are both primes, then they are said to form a prime pair. For example, 5 and 7, 11 and 13, 17 and 19, 29 and 31 are prime pairs. The largest known prime pair seems to be 1,000,000,009,649 and 1,000,000,009,651. There is strong evidence to support the conjecture that the number of prime pairs is infinite. However, no proof has been found for this statement.

Very little regularity is found in the occurrence of primes in the sequence of natural numbers. On the one hand, the difference between consecutive primes can probably be as small as two infinitely often. On the other hand, there are arbitrarily large gaps between consecutive primes. For if n is any natural number, then the numbers

$$n! + 2, \quad n! + 3, \quad \ldots, \quad n! + n$$

are all composite. In fact $n! + 2 = (1 \cdot 2 \cdot 3 \cdots n) + 2$ is divisible by 2, $n! + 3$ is divisible by 3, and so forth. Therefore, if p is the largest prime less than $n! + 2$, and q is the next largest prime, then $q - p \geq n$.

The irregular occurrence of the primes makes it seem unlikely that there is any simple expression for the number of primes less than the natural number n. However, studies of tables of primes indicate that the number of primes less than n is approximately equal to $n/\log n$. (Here $\log n$ represents the natural logarithm of n; hence $\log n = c \log_{10} n$, where $c = 2.302585\ldots$ and $\log_{10} n$ is the usual logarithm to the base 10.) The fact that the ratio of the number of primes less than n to the quantity $n/\log n$ approaches 1 as n gets large is one of the most important results in the theory of prime numbers. It is known as the prime number theorem. This theorem was conjectured by several mathematicians in the late eighteenth century, but over a hundred years of mathematical development was required before it could be proved.

Even though the set of all primes is irregularly distributed among the natural numbers, it might be hoped that a subset of the primes could be


obtained by some simple formula. Pierre de Fermat (1601-1665), the founder of modern number theory and one of the great mathematicians of all time, observed that for n = 0, 1, 2, 3, and 4, the quantity

$$F_n = 2^{2^n} + 1$$

is a prime, and he conjectured that this might be the case for all n. However, it was found in 1732 that $F_5$ is composite:

$$F_5 = 2^{32} + 1 = 4294967297 = 641 \cdot 6700417.$$
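Fermat's observation and the 1732 factorization are easy to confirm by machine. A small Python sketch follows (added here for illustration; `fermat` and `is_prime` are hypothetical names, and primality is checked by trial division, which is adequate only because $F_4 = 65537$ is small).

```python
from math import isqrt


def fermat(n: int) -> int:
    """The Fermat number F_n = 2**(2**n) + 1."""
    return 2 ** (2 ** n) + 1


def is_prime(a: int) -> bool:
    """Trial division up to sqrt(a)."""
    return a >= 2 and all(a % d != 0 for d in range(2, isqrt(a) + 1))


print([fermat(n) for n in range(5)])
print(all(is_prime(fermat(n)) for n in range(5)))
print(fermat(5), fermat(5) % 641)
```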

None of the Fermat numbers $F_n$ has been found to be a prime for $n > 4$, and in fact $F_n$ has been shown to be composite for 28 values of n. Nevertheless, the Fermat numbers have numerous interesting properties. A related class of numbers from which one might hope to obtain primes is given by the formula

$$M_p = 2^p - 1, \quad p \text{ a prime}.$$

These are called Mersenne numbers after a rather undistinguished French mathematician Marin Mersenne (1588-1648), who asserted in 1644 that $M_p$ is a prime for p = 2, 3, 5, 7, 13, 17, 19, 31, 67, 127, and 257, and is composite for all other p less than 257. It has since been shown that Mersenne's statement was incorrect: $M_{67}$ and $M_{257}$ are not primes, and $M_{61}$, $M_{89}$, and $M_{107}$ are not composite. There is an efficient method for testing whether certain Mersenne numbers are primes. This test has been used (with the aid of an automatic digital computer) to find the largest known prime $M_{2281}$, a number with 687 digits. It is not known, however, whether there are infinitely many Mersenne primes. There are two apparent ways to generalize the Mersenne numbers: replace 2 by an arbitrary natural number a > 1, and drop the restriction that the exponent p be a prime. Neither of these generalizations leads to new primes, however.
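The text does not name the efficient test it has in mind. A standard one for Mersenne numbers is the Lucas-Lehmer test, and the Python sketch below assumes that test; the function names are illustrative. For an odd prime p, $M_p$ is prime exactly when the sequence $s_0 = 4$, $s_{i+1} = s_i^2 - 2$ (taken modulo $M_p$) satisfies $s_{p-2} = 0$.

```python
from math import isqrt


def is_prime(n: int) -> bool:
    """Trial division; used only for the small exponents p."""
    return n >= 2 and all(n % d != 0 for d in range(2, isqrt(n) + 1))


def lucas_lehmer(p: int) -> bool:
    """True if M_p = 2**p - 1 is prime. Assumes p itself is a prime."""
    if p == 2:
        return True                # M_2 = 3
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


mersenne_exponents = [p for p in range(2, 258)
                      if is_prime(p) and lucas_lehmer(p)]
print(mersenne_exponents)
print(lucas_lehmer(2281), len(str(2 ** 2281 - 1)))
```

Comparing the printed exponents with Mersenne's list shows which of his claims fail.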

THEOREM 5-4.3. Let $a$ and $n$ be natural numbers greater than 1. Suppose that $k = a^n - 1$ is a prime. Then $a = 2$ and $n$ is a prime.

Proof. If $a > 2$, then $k = a^n - 1 = (a - 1)(a^{n-1} + a^{n-2} + \cdots + 1)$ has a proper divisor $a - 1$ which is larger than 1. This contradicts the assumption that $k$ is a prime. Hence, $a = 2$. If $n = r \cdot s$, where $r > 1$ and $s > 1$, then $k = 2^{rs} - 1 = b^s - 1$, where $b = 2^r > 2$. As we have just seen, this implies that $k$ is composite. Thus, $n$ is a prime.

The Mersenne primes are closely connected with a class of numbers which greatly interested the ancient Greeks: the so-called perfect numbers. A perfect number is a natural number which is equal to the sum of its proper divisors, that is, the sum of all of the divisors except the number itself. For example, $6 = 1 + 2 + 3$ and $496 = 2^4 \cdot 31 = 1 + 2 + 4 + 8 + 16 + 31 + 62 + 124 + 248$ are perfect numbers. Since $\sigma(n)$ is the sum of all of the divisors of $n$, including $n$, $n$ is a perfect number if and only if $\sigma(n) = 2n$. From Euclid's time a rule has been known for determining all even perfect numbers.

THEOREM 5-4.4. A number of the form

$$n = 2^{p-1}(2^p - 1),$$

where $p$ and $2^p - 1$ are primes, is a perfect number. Conversely, if $n$ is an even perfect number, then $n$ is of this form.

Proof. If $n = 2^{p-1}(2^p - 1)$, where $p$ and $2^p - 1$ are primes, then by (5-3.5),

$$\sigma(n) = \frac{2^p - 1}{2 - 1} \cdot \frac{(2^p - 1)^2 - 1}{(2^p - 1) - 1} = (2^p - 1) \cdot 2^p = 2n.$$

Hence, $n$ is a perfect number. Conversely, suppose that $\sigma(n) = 2n$, where $n$ is even. Since $n$ is even, $n = 2^l k$, where $l > 0$ and $k$ is odd. Therefore, $\sigma(n) = \sigma(2^l)\,\sigma(k)$ (see Problem 7, Section 5-3), and

$$2^{l+1} k = 2n = \sigma(n) = \sigma(2^l)\,\sigma(k) = (2^{l+1} - 1)\,\sigma(k).$$

Since 2 is the only prime dividing $2^{l+1}$, and $2^{l+1} - 1$ is odd, it follows that $2^{l+1}$ and $2^{l+1} - 1$ are relatively prime. In view of the equality

$$2^{l+1} k = (2^{l+1} - 1)\,\sigma(k),$$

Theorem 5-2.5 implies that $2^{l+1} - 1$ divides $k$. Thus,

$$k = (2^{l+1} - 1) \cdot m \qquad \text{and} \qquad \sigma(k) = 2^{l+1} \cdot m.$$

Now $m$ and $k$ are divisors of $k$ and

$$m + k = m + (2^{l+1} - 1)\,m = 2^{l+1} m = \sigma(k).$$

Thus, the sum of all of the divisors of $k$ is the sum of the two divisors $m$ and $k$. Hence, $k$ has only two divisors. This implies that $m = 1$ and $k$ is a prime. Moreover, $k = 2^{l+1} - 1$. Thus,

$$n = 2^l (2^{l+1} - 1),$$

where $k = 2^{l+1} - 1$ is a prime. By Theorem 5-4.3, $l + 1$ must also be a prime $p$. Therefore, $n = 2^{p-1}(2^p - 1)$, where $p$ and $2^p - 1$ are primes.

Whether there are any odd perfect numbers is another unsolved problem in number theory. Results have been obtained which show that if an odd perfect number exists, it must be larger than 2,200,000,000,000.
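Theorem 5-4.4 is easy to test numerically. The following Python sketch is an illustration only; the function names are hypothetical and $\sigma$ is computed by brute force. It builds the numbers $2^{p-1}(2^p - 1)$ for small primes $p$ with $2^p - 1$ prime and compares them with the perfect numbers found by direct search.

```python
def sigma(n):
    """Sum of all divisors of n, including n, found by pairing d with n // d."""
    total, d = 0, 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d * d != n:
                total += n // d
        d += 1
    return total


def is_perfect(n):
    """n is perfect if and only if sigma(n) = 2n."""
    return sigma(n) == 2 * n


def is_prime(k):
    """A natural number k > 1 is a prime when its only divisors are 1 and k."""
    return k > 1 and sigma(k) == k + 1


# Numbers of the form 2^(p-1) (2^p - 1) with p and 2^p - 1 both primes.
euclid_numbers = [2 ** (p - 1) * (2 ** p - 1)
                  for p in range(2, 14)
                  if is_prime(p) and is_prime(2 ** p - 1)]

# Perfect numbers below 10,000 found by direct search.
searched = [n for n in range(1, 10000) if is_perfect(n)]

print(euclid_numbers)
print(searched)
```

The exponent $p = 11$ is skipped automatically, since $2^{11} - 1 = 2047 = 23 \cdot 89$ is composite; correspondingly $2^{10} \cdot 2047 = 2{,}096{,}128$ is not perfect.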

1. Determine which of the following natural numbers are primes and justify your answer. (a) 503 (b) 943 (c) 1511 (d) $2^{13} - 1$ (e) 899
2. Use the sieve of Eratosthenes to compile a table of primes less than 300.
3. Prove that the only prime triple (that is, three consecutive primes of the form $p$, $p + 2$, $p + 4$) is 3, 5, 7.
4. Show that 28 is a perfect number.
5. Show that 33,550,336 is a perfect number.
6. Show that 2,096,128 is not a perfect number.
7. (a) Prove that if $m < n$, then $F_m$ divides $F_n - 2$ (where $F_m$ and $F_n$ are the Fermat numbers $2^{2^m} + 1$ and $2^{2^n} + 1$). (b) Show that if $m \neq n$, then $F_m$ and $F_n$ are relatively prime. (c) Use the result of (b) to give a new proof of Euclid's Theorem 5-4.2.
8. Show that if $n$ is a natural number and if $k = 2^n + 1$ is a prime, then $n$ is a power of 2.
9. (a) Prove that the product of natural numbers which are all of the form $3x + 1$ is a number which is again of this form. (b) Use this remark to prove that there are infinitely many primes of the form $3x + 2$, that is, that the infinite sequence 5, 8, 11, 14, 17, . . . of natural numbers contains infinitely many primes. [Hint: Proceed as in the proof of Theorem 5-4.2. Suppose that there is only a finite number of primes of the form $3x + 2$. List them all: 5, 11, 17, . . . , $p$, where $p$ is the largest such prime. Show that the number $n = 3(5 \cdot 11 \cdot 17 \cdots p) + 2$ is divisible only by primes of the form $3x + 1$, but $n$ itself is not of this form, contrary to (a).]

*5-5 Applications of the fundamental theorem of arithmetic. The importance of the fundamental theorem of arithmetic can hardly be overestimated. This fact can be appreciated after some applications of the theorem are examined. In this section, we will present four different applications. Numerous others will appear later in the book.

Gödel numbering. Any scheme which associates a natural number with each sentence, or sequence of sentences, in some language in such a way that different expressions are associated with different numbers is called a Gödel numbering of the language (after the mathematician Kurt Gödel, who used such a numbering to prove important results in mathematical logic). One of the ways in which this can be done depends on the fundamental theorem of arithmetic and the fact that there are infinitely many prime numbers (Theorem 5-4.2). Nearly all expressions in the English language can be written using 38 symbols and a space marker. The symbols are the 26 letters of the alphabet, the period, question mark, exclamation point, comma, colon, semicolon, hyphen, apostrophe, two parentheses, and two quotation marks. Associate the numbers 1 to 38 with these symbols in the given order, that is,

$$\begin{array}{cccccccccc}
1 & 2 & 3 & \cdots & 26 & 27 & 28 & \cdots & 37 & 38 \\
\updownarrow & \updownarrow & \updownarrow & & \updownarrow & \updownarrow & \updownarrow & & \updownarrow & \updownarrow \\
\text{A} & \text{B} & \text{C} & \cdots & \text{Z} & . & ? & \cdots & \text{``} & \text{''}
\end{array}$$

Associate zero with the space symbol. Let $p_1, p_2, p_3, p_4, \ldots$ denote the sequence of all primes in the order of increasing size. That is, $p_1 = 2$, $p_2 = 3$, $p_3 = 5$, $p_4 = 7$, . . . . It is now possible to define a Gödel numbering of the English language by associating with the expression

$$s_1 s_2 s_3 \cdots s_n$$

(where the $s_i$ are either one of the 38 symbols listed above, or else a space marker) the number

$$p_1^{e_1} p_2^{e_2} p_3^{e_3} \cdots p_n^{e_n},$$

where $e_i$ is the integer from 0 to 38 which corresponds to $s_i$. For example, the number associated with the expression GEORGE WASHINGTON is

$$2^{7} \cdot 3^{5} \cdot 5^{15} \cdot 7^{18} \cdot 11^{7} \cdot 13^{5} \cdot 17^{0} \cdot 19^{23} \cdot 23^{1} \cdot 29^{19} \cdot 31^{8} \cdot 37^{9} \cdot 41^{14} \cdot 43^{7} \cdot 47^{20} \cdot 53^{15} \cdot 59^{14}.$$

Since there are infinitely many primes, this scheme associates numbers with expressions of any length. Of course, the numbers involved may be very large. The number associated with the expression GEORGE WASHINGTON has nearly 250 decimal digits. Nevertheless, even the text of a book such as the King James version of the Bible (written in capitals, and with numbers written out) has a uniquely associated number. It is theoretically possible to determine any expression from the knowledge of its corresponding number. For example, the number

$$1{,}519{,}219{,}683{,}181{,}760{,}000$$

factors to

$$2^{9} \cdot 3^{0} \cdot 5^{4} \cdot 7^{15},$$

and therefore

I DO

is the expression from which it is obtained. Different expressions must correspond to different numbers, since by the fundamental theorem of arithmetic, two different products of primes cannot be equal to the same natural number. Thus, our scheme satisfies the requirements for a Gödel numbering of the English language. The scheme for constructing a Gödel numbering which we have presented is not very practical, since the numbers involved are usually very large. However, when applied to formal mathematical languages, the method has important theoretical consequences for logicians and philosophers.

A cardinal number problem revisited. In Section 1-2, we showed that the set $F$ of all fractions $a/b$ (where $a$ and $b$ are natural numbers) has the same cardinality as $N$, the set of natural numbers. We will now use the fundamental theorem of arithmetic to give another proof of this fact, that is, to establish a one-to-one correspondence between $N$ and $F$. First, define a one-to-one correspondence between the set of all nonnegative integers and the set of all integers in such a way that 0 corresponds to 0. Any such

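correspondence will do, but to be specific, let

$$\begin{array}{cccccccc}
0 & 1 & 2 & 3 & 4 & 5 & 6 & \cdots \\
\updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \\
0 & -1 & 1 & -2 & 2 & -3 & 3 & \cdots
\end{array} \tag{5-4}$$

For each nonnegative integer $e$, let $\bar{e}$ stand for the mate of $e$ under this correspondence. For example,

$$\bar{0} = 0, \qquad \bar{1} = -1, \qquad \bar{2} = 1, \qquad \bar{7} = -4.$$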
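As a computational aside, the Gödel numbering defined earlier in this section can be tried out directly. The Python sketch below is an illustration under stated assumptions: the symbol table follows the order given in the text, the two quotation marks are taken to be the curly quotes U+201C and U+201D (our choice), and the function names are hypothetical.

```python
import string
from itertools import count

# Space marker -> 0, A-Z -> 1..26, then . ? ! , : ; - ' ( ) -> 27..36,
# and the opening and closing quotation marks -> 37, 38 (assumed characters).
SYMBOLS = " " + string.ascii_uppercase + ".?!,:;-'()" + "\u201c\u201d"


def primes():
    """Yield 2, 3, 5, 7, ... by trial division against the primes found so far."""
    found = []
    for n in count(2):
        if all(n % p != 0 for p in found if p * p <= n):
            found.append(n)
            yield n


def godel_number(expression):
    """Return p_1^e_1 * p_2^e_2 * ... * p_n^e_n for the symbols of expression."""
    number = 1
    for symbol, p in zip(expression, primes()):
        number *= p ** SYMBOLS.index(symbol)
    return number


def decode(number):
    """Recover an expression by reading off the exponents of 2, 3, 5, ...

    Trailing spaces cannot be recovered, since they contribute exponent 0.
    """
    symbols = []
    for p in primes():
        if number == 1:
            break
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        symbols.append(SYMBOLS[exponent])
    return "".join(symbols)


print(godel_number("I DO"))
print(decode(godel_number("GEORGE WASHINGTON")))
```

Decoding works only because factorization into primes is unique, which is exactly the point made in the text; the trial-division loop is practical for short expressions only.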

THEOREM 5-5.1. With each natural number

$$a = p_1^{e_1} p_2^{e_2} \cdots p_g^{e_g},$$

where $p_1, p_2, \ldots, p_g$ are distinct primes and $e_1, e_2, \ldots, e_g$ are nonnegative integers, associate the fraction

$$r = p_1^{\bar{e}_1} p_2^{\bar{e}_2} \cdots p_g^{\bar{e}_g}.$$

Then the association $a \leftrightarrow r$ is a one-to-one correspondence between $N$ and $F$.

Before discussing the proof of this theorem, let us compare the correspondence between $N$ and $F$ given by Theorem 5-5.1 with the correspondence defined in Section 1-2. We see immediately that they are different. For example, with the definition given in Theorem 5-5.1,

$$1 \leftrightarrow 1, \quad 2 \leftrightarrow \tfrac{1}{2}, \quad 3 \leftrightarrow \tfrac{1}{3}, \quad 4 \leftrightarrow 2, \quad 5 \leftrightarrow \tfrac{1}{5}, \quad 6 \leftrightarrow \tfrac{1}{6}, \quad \ldots,$$

which bears no resemblance to the correspondence given in Section 1-2. The correspondence of Section 1-2 was defined in rather vague terms. We gave no rule stating what fraction would correspond to a specific natural number $n$. Instead, it was pointed out how one could, with sufficient patience, find the fraction corresponding to any particular $n$. For large values of $n$, the method would not be practical. For example, to find the fraction corresponding to 90,000,000 would be a long, tedious job. On the other hand, the correspondence given by Theorem 5-5.1 is much more explicit. To apply the rule, the only requirement is that we be able to factor the natural number $a$ into its prime factors. For example, the number $90{,}000{,}000 = 2^7 \cdot 3^2 \cdot 5^7$ corresponds to $2^{-4} \cdot 3 \cdot 5^{-4} = 3/10{,}000$. For a mathematician, the correspondence defined in Theorem 5-5.1 is much more satisfying than the vague directions laid down in Section 1-2. Nevertheless, he would admit, perhaps reluctantly, that the discussion in Section 1-2 proves just as effectively that the set $F$ is denumerable. The proof of Theorem 5-5.1 is based on a generalization of the fundamental theorem of arithmetic.
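The explicit rule of Theorem 5-5.1 can be written out in a few lines. The Python sketch below is an illustration only. It assumes the pairing $0, 1, 2, 3, 4, \ldots \leftrightarrow 0, -1, 1, -2, 2, \ldots$ for $e \leftrightarrow \bar{e}$, which agrees with the example $90{,}000{,}000 \leftrightarrow 3/10{,}000$; the function names are hypothetical.

```python
from fractions import Fraction


def bar(e):
    """Mate of the nonnegative integer e: 0,1,2,3,4,... -> 0,-1,1,-2,2,..."""
    return e // 2 if e % 2 == 0 else -((e + 1) // 2)


def unbar(x):
    """Inverse of bar: 0,-1,1,-2,2,... -> 0,1,2,3,4,..."""
    return 2 * x if x >= 0 else -2 * x - 1


def factor(a):
    """Return {prime: exponent} for the natural number a, by trial division."""
    exponents = {}
    p = 2
    while p * p <= a:
        while a % p == 0:
            exponents[p] = exponents.get(p, 0) + 1
            a //= p
        p += 1
    if a > 1:
        exponents[a] = exponents.get(a, 0) + 1
    return exponents


def fraction_for(a):
    """The fraction r associated with the natural number a."""
    r = Fraction(1)
    for p, e in factor(a).items():
        r *= Fraction(p) ** bar(e)
    return r


def natural_for(r):
    """The natural number a associated with the positive fraction r."""
    a = 1
    for p, x in factor(r.numerator).items():
        a *= p ** unbar(x)       # primes of the numerator have positive exponents
    for p, x in factor(r.denominator).items():
        a *= p ** unbar(-x)      # primes of the denominator have negative exponents
    return a


print(fraction_for(90_000_000))
print(natural_for(Fraction(3, 10_000)))
```

Primes missing from a factorization have exponent 0, and $\bar{0} = 0$, so they can be ignored in both directions.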

THEOREM 5-5.2. The positive rational numbers $r$ can be expressed in the form

$$r = p_1^{x_1} p_2^{x_2} \cdots p_g^{x_g},$$

where $p_1, p_2, \ldots, p_g$ are distinct primes and $x_1, x_2, \ldots, x_g$ are integers. Moreover, this representation is unique, except for the order of the factors and the occurrence of primes with exponent zero.

This theorem is an almost immediate consequence of Theorem 5-3.3, and the fact that every positive rational number $r$ has a unique representation $a/b$ in "lowest terms," that is, with $a$ and $b$ natural numbers which are relatively prime. We leave to the reader the chore of supplying a detailed proof.

Theorem 5-5.1 can now be easily proved by reinterpreting Theorems 5-3.3 and 5-5.2. For this purpose, let $p_1, p_2, p_3, p_4, \ldots$ denote the sequence of all primes in increasing order. Thus, $p_1 = 2$, $p_2 = 3$, $p_3 = 5$, $p_4 = 7$, . . . . Then by Theorem 5-3.3, each natural number $a$ can be written

$$a = p_1^{e_1} p_2^{e_2} \cdots p_g^{e_g},$$

where now $e_1, e_2, \ldots, e_g$ are nonnegative integers, and $g$ is some sufficiently large number. The number of factors in the expression is not uniquely determined because we can always multiply by primes to the zero power. Thus, $10 = 2^1 \cdot 3^0 \cdot 5^1 \cdot 7^0 \cdot 11^0 \cdot 13^0 \cdots$. However, the number $a$ determines, and is determined by, the sequence of exponents $e_1, e_2, \ldots, e_g$. By adjoining an infinite number of zeros, we do achieve complete uniqueness. In other words, there is a one-to-one correspondence between the natural numbers and the infinite sequences of nonnegative integers

$$(e_1, e_2, \ldots, e_g, e_{g+1}, \ldots)$$

which are zero from some point on (that is, for sufficiently large $g$, $e_{g+1} = 0$, $e_{g+2} = 0$, . . .). The uniqueness statement in the fundamental theorem of arithmetic tells us that the correspondence

$$a = p_1^{e_1} p_2^{e_2} \cdots p_g^{e_g} \;\leftrightarrow\; (e_1, e_2, \ldots, e_g, 0, 0, \ldots)$$

is one-to-one.

In exactly the same way, there exists, by Theorem 5-5.2, a one-to-one correspondence between the set $F$ of all positive rational numbers and the set of all sequences

$$(x_1, x_2, \ldots, x_g, x_{g+1}, \ldots)$$

of integers with the property that $x_{g+1} = 0$, $x_{g+2} = 0$, . . . from some point on. The correspondence is

$$r = p_1^{x_1} p_2^{x_2} \cdots p_g^{x_g} \;\leftrightarrow\; (x_1, x_2, \ldots, x_g, 0, 0, \ldots).$$

We now have two sets of infinite sequences: the set $J$ of all

$$(e_1, e_2, \ldots, e_g, \ldots),$$

where the $e_i$ are nonnegative integers which are zero from some point on, and the set $K$ of all $(x_1, x_2, \ldots, x_g, \ldots)$, where the $x_i$ are integers which are zero from some point on. The one-to-one correspondence $e \leftrightarrow \bar{e}$ given by (5-4) clearly determines a one-to-one correspondence between $J$ and $K$:

$$(e_1, e_2, \ldots, e_g, \ldots) \;\leftrightarrow\; (\bar{e}_1, \bar{e}_2, \ldots, \bar{e}_g, \ldots).$$

If all of these one-to-one correspondences are combined, we obtain

$$p_1^{e_1} p_2^{e_2} \cdots p_g^{e_g} \;\leftrightarrow\; (e_1, e_2, \ldots, e_g, 0, \ldots) \;\leftrightarrow\; (\bar{e}_1, \bar{e}_2, \ldots, \bar{e}_g, 0, \ldots) \;\leftrightarrow\; p_1^{\bar{e}_1} p_2^{\bar{e}_2} \cdots p_g^{\bar{e}_g},$$

which is the correspondence described in Theorem 5-5.1.

A Diophantine problem. It is well known that there are right triangles whose sides have integral length. The best known example is the 3, 4, 5 right triangle with legs of length 3 and 4, and hypotenuse of length 5. Somewhat less well known is the right triangle with sides of length 5, 12, and 13. Since the length $c$ of the hypotenuse of a right triangle is related to the lengths $a$ and $b$ of the sides by the Pythagorean formula

$$a^2 + b^2 = c^2, \tag{5-5}$$

the problem of finding all right triangles with sides of integral length is equivalent to finding all natural numbers $a$, $b$, and $c$ which satisfy (5-5). An equation such as (5-5) involving powers of unknown quantities with integral coefficients is called a Diophantine equation (after the ancient Greek mathematician Diophantus). For example, $a + b = 2$, $a^2 - 5b^2 = 1$, $a^2 + ab + b^2 = 5c^2$, and $a^4 + b^4 = c^2$ are all Diophantine equations. The problem of finding all integral solutions of a Diophantine equation, or a system of Diophantine equations, is called a Diophantine problem.

Using the fundamental theorem of arithmetic, it is possible to obtain the complete solution of (5-5). First note that if $r$, $s$, and $t$ are natural numbers, with $r > s$, and if we let

$$a = (r^2 - s^2)t, \qquad b = 2rst, \qquad c = (r^2 + s^2)t, \tag{5-6}$$

then

$$a^2 + b^2 = (r^4 - 2r^2s^2 + s^4)t^2 + 4r^2s^2t^2 = (r^4 + 2r^2s^2 + s^4)t^2 = c^2.$$

Therefore, (5-6) gives a large family of solutions of (5-5). We will show that every solution of (5-5) with $a$, $b$, and $c$ natural numbers is of the form (5-6) (or a similar form with $a$ and $b$ interchanged) for suitable natural numbers $r$, $s$, and $t$. The proof is based on the following useful consequence of the fundamental theorem of arithmetic.

THEOREM 5-5.3. Suppose that $a$ and $b$ are natural numbers which are relatively prime, and $ab = c^n$ for some natural number $c$. Then $a = a_1^n$, $b = b_1^n$, where $a_1$ and $b_1$ are natural numbers.

Proof. Let

$$a = p_1^{e_1} \cdots p_g^{e_g}, \qquad b = q_1^{f_1} \cdots q_h^{f_h},$$

where $p_1, \ldots, p_g$ are distinct primes, $q_1, \ldots, q_h$ are distinct primes, and the exponents $e_1, \ldots, e_g$ and $f_1, \ldots, f_h$ are all positive. Then the $p_i$ must be different from all $q_j$, since otherwise $a$ and $b$ would have a common prime factor, contrary to the assumption that they are relatively prime. Let

$$c = r_1^{m_1} \cdots r_k^{m_k},$$

where $r_1, \ldots, r_k$ are distinct primes and $m_1, \ldots, m_k$ are positive exponents. Then the condition $ab = c^n$ can be written

$$p_1^{e_1} \cdots p_g^{e_g} \, q_1^{f_1} \cdots q_h^{f_h} = r_1^{nm_1} \cdots r_k^{nm_k}.$$

By Theorem 5-3.3, it follows that the primes $r_1, \ldots, r_k$ must be $p_1, \ldots, p_g, q_1, \ldots, q_h$ in some order, and that $e_1, \ldots, e_g, f_1, \ldots, f_h$ are the corresponding exponents $nm_1, \ldots, nm_k$. Thus, each $e_i$ and $f_j$ is divisible by $n$, that is, $e_1/n, \ldots, e_g/n, f_1/n, \ldots, f_h/n$ are all natural numbers. Let

$$a_1 = p_1^{e_1/n} \cdots p_g^{e_g/n}, \qquad b_1 = q_1^{f_1/n} \cdots q_h^{f_h/n}.$$

Then $a = a_1^n$, $b = b_1^n$.

We now return to the problem of finding all natural number solutions of (5-5). Suppose that $a$, $b$, and $c$ are natural numbers which satisfy (5-5).

Let $t$ be the greatest common divisor of $a$, $b$, and $c$. Then $a/t$, $b/t$, and $c/t$ are natural numbers with no prime factor in common which satisfy

$$(a/t)^2 + (b/t)^2 = (c/t)^2.$$

We will show that $a/t$, $b/t$, and $c/t$ have the form $r^2 - s^2$, $2rs$, and $r^2 + s^2$, respectively, or else $2rs$, $r^2 - s^2$, and $r^2 + s^2$, respectively. Let $x = a/t$, $y = b/t$, and $z = c/t$. Then

$$x^2 + y^2 = z^2, \tag{5-7}$$

where no prime divides any two of these natural numbers, that is, each pair of the numbers $x$, $y$, $z$ are relatively prime. For example, if $p \mid x$ and $p \mid z$, then $p$ divides $z^2 - x^2 = y^2$. Thus, by (5-3.2), $p \mid y$. But this is a contradiction, since $x$, $y$, and $z$ have no prime factor in common. Since $x$ and $y$ are relatively prime, they cannot both be even. Suppose they are both odd. Then we could write $x = 1 + 2m$, $y = 1 + 2n$, with $m$ and $n$ nonnegative integers. Consequently

$$x^2 = 1 + 4m(m + 1), \qquad y^2 = 1 + 4n(n + 1),$$

and

$$z^2 = x^2 + y^2 = 2 + 4[m(m + 1) + n(n + 1)].$$

This implies that $z$ is even, say $z = 2l$. Then $z^2 = 4l^2$, so that

$$2l^2 = 1 + 2[m(m + 1) + n(n + 1)].$$

This is clearly impossible, since the left side is even and the right side is odd. Therefore, one of $x$ or $y$ is even, while the other is odd. Suppose that $x$ is odd and $y$ is even. Then $z$ is odd, so that $z - x$ and $z + x$ are even. That is, $\tfrac{1}{2}(z - x)$ and $\tfrac{1}{2}(z + x)$ are integers. Moreover, they are relatively prime, since if a prime $p$ divides $\tfrac{1}{2}(z - x)$ and $\tfrac{1}{2}(z + x)$, then $p$ divides $\tfrac{1}{2}(z + x) + \tfrac{1}{2}(z - x) = z$ and $\tfrac{1}{2}(z + x) - \tfrac{1}{2}(z - x) = x$. But this is impossible, since $x$ and $z$ are relatively prime. By (5-7),

$$\left(\tfrac{1}{2}y\right)^2 = \tfrac{1}{2}(z + x) \cdot \tfrac{1}{2}(z - x),$$

where $\tfrac{1}{2}y$ is a natural number, since $y$ is even. By Theorem 5-5.3, $\tfrac{1}{2}(z + x)$ and $\tfrac{1}{2}(z - x)$ are squares, that is, there exist natural numbers $r$ and $s$ such that

$$\tfrac{1}{2}(z + x) = r^2, \qquad \tfrac{1}{2}(z - x) = s^2.$$

Consequently, $z = r^2 + s^2$, $x = r^2 - s^2$, and $y = 2rs$.

(5-5.5). If $a$, $b$, and $c$ are natural numbers which have no common prime factor and satisfy $a^2 + b^2 = c^2$, then (a) $c$ is odd; (b) either $a$ is even and $b$ is odd, or vice versa; (c) if $a$ is even,

$$a = 2rs, \qquad b = r^2 - s^2, \qquad c = r^2 + s^2,$$

where $r$ and $s$ are relatively prime natural numbers and $r > s$.
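The description of the natural number solutions of $a^2 + b^2 = c^2$ can be compared with a direct search. The Python sketch below is an illustration only, with hypothetical function names: it lists all solutions with $c \leq 60$ by brute force, generates the triples $(r^2 - s^2)t$, $2rst$, $(r^2 + s^2)t$ together with the versions having $a$ and $b$ interchanged, and checks parts (a) and (b) of (5-5.5) on the solutions with no common prime factor.

```python
from math import gcd, isqrt


def solutions_by_search(limit):
    """All (a, b, c) of natural numbers with a^2 + b^2 = c^2 and c <= limit."""
    found = set()
    for a in range(1, limit):
        for b in range(1, limit):
            c = isqrt(a * a + b * b)
            if c <= limit and c * c == a * a + b * b:
                found.add((a, b, c))
    return found


def solutions_by_formula(limit):
    """Triples ((r^2 - s^2)t, 2rst, (r^2 + s^2)t) with r > s, and with a, b swapped."""
    found = set()
    r = 2
    while r * r + 1 <= limit:          # smallest hypotenuse for this r is r^2 + 1
        for s in range(1, r):
            t = 1
            while (r * r + s * s) * t <= limit:
                a, b, c = (r * r - s * s) * t, 2 * r * s * t, (r * r + s * s) * t
                found.add((a, b, c))
                found.add((b, a, c))
                t += 1
        r += 1
    return found


searched = solutions_by_search(60)
generated = solutions_by_formula(60)
primitive = [(a, b, c) for (a, b, c) in searched if gcd(gcd(a, b), c) == 1]

print(searched == generated)
print(sorted(primitive)[:4])
```

A search of this kind proves nothing, of course; it only shows the two descriptions agreeing on a finite range.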

Now suppose that the equation $x^4 + y^4 = z^2$ can be satisfied by some natural numbers. Then the set of all natural numbers $z$ for which there exist natural numbers $x$ and $y$, such that $x^4 + y^4 = z^2$, is not empty. Consequently, by the well-ordering principle, this set contains a smallest number $c$. Let $a$ and $b$ be corresponding natural numbers such that $a^4 + b^4 = c^2$. We will obtain a contradiction by showing that there is a natural number $t$ smaller than $c$ such that $t^2 = x^4 + y^4$ for some natural numbers $x$ and $y$. This will show that our original assumption that a solution exists is false.

If $a$ and $b$ had a common prime factor $p$, then $p^4 \mid c^2$. Thus, by the fundamental theorem of arithmetic, $p^2 \mid c$, and therefore $(a/p)^4 + (b/p)^4 = (c/p^2)^2$. Since $c/p^2 < c$, this contradicts the assumption that $c$ is the smallest of the natural numbers $z$ for which $z^2 = x^4 + y^4$ has a solution. Consequently, $a$ and $b$ are relatively prime. This implies that $a^2$, $b^2$, and $c$ have no common prime factor, so that (5-5.5) applies to the equation $(a^2)^2 + (b^2)^2 = c^2$. We obtain that either $a^2$ or $b^2$ is even, and assuming that $a^2$ is even, we have

$$a^2 = 2rs, \qquad b^2 = r^2 - s^2, \qquad c = r^2 + s^2,$$

where $r$ and $s$ are relatively prime natural numbers and $r > s$. Since $r$ and $s$ are relatively prime, it follows that $s$, $b$, and $r$ in the equation $s^2 + b^2 = r^2$ have no prime factor in common. Thus, by (5-5.5a), applied to the equation $s^2 + b^2 = r^2$, $r$ is odd. However, $a^2$ is even, so that $a$ is even. Thus, 4 divides $a^2 = 2rs$, and consequently $2 \mid rs$. Since $r$ is odd, $s$ must be even, and we have

$$(a/2)^2 = r(s/2).$$

Since $(r, s) = 1$, it follows that $r$ and $s/2$ are relatively prime. Consequently, by Theorem 5-5.3, the equation $(a/2)^2 = r(s/2)$ implies that $r$ and $s/2$ are squares:

$$r = t^2, \qquad s/2 = u^2.$$

Now apply (5-5.5) to the equation $s^2 + b^2 = r^2$ again. Since $s$ is even, we can write

$$s = 2vw, \qquad b = v^2 - w^2, \qquad r = v^2 + w^2,$$

where $v$ and $w$ are relatively prime natural numbers. Hence,

$$vw = s/2 = u^2.$$

By Theorem 5-5.3, it follows that

$$v = x^2, \qquad w = y^2$$

for some natural numbers $x$ and $y$. Combining these equalities with the equations $r = t^2$ and $r = v^2 + w^2$, we obtain

$$t^2 = x^4 + y^4.$$

Moreover, $t \leq t^2 = r \leq r^2 < r^2 + s^2 = c$. Thus, we have arrived at the promised contradiction, and proved the following result.

THEOREM 5-5.6. There are no natural numbers $a$, $b$, and $c$ which satisfy $a^4 + b^4 = c^2$. In particular, the equation $x^4 + y^4 = z^4$ has no solution in natural numbers.

The reader should reexamine the proof of this theorem, noting the following aspects of it.

(1) The main step of the proof is to show that the existence of one triple $(x_1, y_1, z_1)$ of natural numbers satisfying $x_1^4 + y_1^4 = z_1^2$ leads to another triple $(x_2, y_2, z_2)$ satisfying $x_2^4 + y_2^4 = z_2^2$ with $z_2 < z_1$. Repeating the argument would lead to a sequence of triples $(x_n, y_n, z_n)$, $n = 1, 2, 3, \ldots$, with $x_n^4 + y_n^4 = z_n^2$ and $z_1 > z_2 > z_3 > \cdots$. This sequence of inequalities is impossible by the well-ordering principle, and therefore proves that the existence of the original triple $(x_1, y_1, z_1)$ is impossible. (Actually, it was convenient for our argument to use the well-ordering principle at the beginning of the proof.) This technique of proof is common in number theory. It is called the "method of infinite descent." The reader may recall that this method was used to establish the Euclidean algorithm in Section 5-2.

(2) The main step of the proof is carried out by two applications of (5-5.5) and two applications of Theorem 5-5.3. Remembering this observation and the general method of proof, the reader should be able to reconstruct the argument without the help of the book.

(3) The method of proof which we used would not suffice to show directly that the equation $x^4 + y^4 = z^4$ has no solutions in $N$. The generalization to $x^4 + y^4 = z^2$ is essential to the success of our proof. This is another instance of the situation discussed in Section 2-4, where induction fails in the proof of a certain theorem, but is successful in proving a stronger result. In the case of Theorem 5-5.6, induction occurs as an application of the well-ordering principle.

(4) It is an immediate consequence of Theorem 5-5.6 that no equation of the form $x^{4l} + y^{4m} = z^{2n}$ has a solution $x = a$, $y = b$, $z = c$, with $a \in N$, $b \in N$, $c \in N$. Indeed, if such a solution exists, then $a^l$, $b^m$, $c^n$ is a solution of $x^4 + y^4 = z^2$. In particular, if 4 divides $n$, then the Fermat equation $x^n + y^n = z^n$ has no solution in $N$.

1. Using the Gödel numbering of the English language which was defined in this section, find the Gödel numbers (in factored form) of the following expressions. (a) ALGEBRA (b) U.S.A. (c) DON'T GIVE UP THE SHIP!
2. Give the proof of Theorem 5-5.2.
3. Let $a$ and $b$ be any natural numbers. Show that integers $r$ and $s$ exist such that $rs = (a, b)$, $(a/r, b/s) = 1$.
4. Let $a$ be a natural number. Let $s^2$ be the largest square dividing $a$. Show that if $d^2$ is a square dividing $a$, then $d \mid s$.
5. Suppose that $(a, b) = 1$, $(c, d) = 1$, and $ab = cd$. Show that integers $r$, $s$, $t$, and $u$ exist, each pair of which are relatively prime, such that $a = rs$, $b = tu$, $c = rt$, $d = su$.
6. Show that if $p$ is a prime, and if $1 \leq i < p$, then $p$ divides the binomial coefficient $\binom{p}{i}$.

7. Show that every solution in natural numbers of the Diophantine equation

$$a^2 + 2b^2 = c^2$$

is given by

$$a = \pm(r^2 - 2s^2)t, \qquad b = 2rst, \qquad c = (r^2 + 2s^2)t$$

[or $a = \pm\tfrac{1}{2}(r^2 - 2s^2)t$, $b = rst$, $c = \tfrac{1}{2}(r^2 + 2s^2)t$], where $r$, $s$, and $t$ are natural numbers.

5-6 Congruences. Many interesting problems and numerous theoretical questions in number theory are concerned with properties of the remainder obtained by dividing an integer by a fixed natural number $m$. For example, if the first of July falls on Sunday, then what will be the day of the week on which the first of September falls? Since July and August each have 31 days, the answer is that the first of September falls $r$ days after Sunday, where $r$ is the remainder obtained on dividing $31 + 31$ by 7, namely, $r = 6$, and the day is Saturday. Another example is the following problem: a certain chemical reaction requires 100 hours; if it is desirable to complete the reaction at 8:00 A.M., at what time of day should it be started? The answer is $r$ hours before 8:00 A.M., where $r$ is the remainder obtained on dividing 100 by 24, namely, $r = 4$, so that the starting time is 4:00 A.M. A property of remainders which was needed for the solution of Problem 9, Section 5-4, is the fact that if the natural number $a$ leaves the remainder 1 on division by 3, then the same is true for every power of $a$. The study of many such problems involving remainders is simplified by the systematic use of a concept which was introduced by the great German mathematician Carl Friedrich Gauss (1777-1855).
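The three remainder problems just described amount to one-line computations. The Python sketch below is an illustration only, with variable names of our own choosing; it uses the remainder operator `%` for the calendar and clock questions and the built-in three-argument `pow` for remainders of powers.

```python
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]

# First of September, given that the first of July is a Sunday.
r_days = (31 + 31) % 7
first_of_september = DAYS[(DAYS.index("Sunday") + r_days) % 7]

# Starting hour (on a 24-hour clock) of a 100-hour reaction that ends at 8:00 A.M.
r_hours = 100 % 24
start_hour = (8 - r_hours) % 24

# If a leaves the remainder 1 on division by 3, so does every power of a.
powers_leave_one = all(pow(a, n, 3) == 1
                       for a in range(1, 100, 3)   # a = 1, 4, 7, ..., 97
                       for n in range(1, 10))

print(r_days, first_of_september)
print(r_hours, start_hour)
print(powers_leave_one)
```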

DEFINITION 5-6.1. Let $m$ be a natural number. An integer $a$ is congruent modulo $m$ to an integer $b$ if $a - b$ is divisible by $m$ in the ring of integers. It is customary to write

$$a \equiv b \pmod{m}$$

to indicate that $a$ is congruent to $b$ modulo $m$. The relation $a \equiv b \pmod{m}$ is called a congruence, and $m$ is called the modulus of the congruence.

By the definition of congruence, every pair $a$, $b$ of integers are congruent modulo 1. Thus, congruence with the modulus 1 is not very interesting. Congruence modulo 2 has a familiar meaning: $a \equiv b \pmod{2}$ if and only if $a$ and $b$ have the same parity; that is, either $a$ and $b$ are both even, or they are both odd. The connection between the remainders on division by $m$ and congruence with the modulus $m$ is seen from the following fact.

THEOREM 5-6.2. Let $m$ be a natural number. Then each integer is congruent modulo $m$ to one and only one of the numbers $0, 1, 2, \ldots, m - 1$.

This theorem is an immediate consequence of the division algorithm. If $a$ is any integer and $m$ is a natural number, then there are unique integers $q$ and $r$, with $0 \leq r < m$, such that $a = qm + r$. Thus, there is a unique number $r$ among the numbers $0, 1, 2, \ldots, m - 1$ such that $a - r$ is divisible by $m$. By Definition 5-6.1, this means that $a$ is congruent modulo $m$ to one and only one of the numbers $0, 1, 2, \ldots, m - 1$. It is clear that an integer $a$ is divisible by a natural number $m$ if and only if $a \equiv 0 \pmod{m}$. Moreover, $a \equiv b \pmod{m}$ is equivalent to the

statement that $a - b \equiv 0 \pmod{m}$. Thus, the notion of congruence is apparently only a variation of the concept of divisibility. It is therefore surprising that this notion is so useful. The usefulness is partly explained by the fact that congruence has many of the familiar properties of ordinary equality, so that manipulations with congruences are similar to the computations of elementary algebra.

THEOREM 5-6.3. Let $m$ be a natural number and let $a$, $b$, $c$, and $d$ be integers. Then
(a) $a \equiv a \pmod{m}$;
(b) if $a \equiv b \pmod{m}$, then $b \equiv a \pmod{m}$;
(c) if $a \equiv b \pmod{m}$ and $b \equiv c \pmod{m}$, then $a \equiv c \pmod{m}$;
(d) if $a \equiv b \pmod{m}$ and $c \equiv d \pmod{m}$, then $a + c \equiv b + d \pmod{m}$ and $a - c \equiv b - d \pmod{m}$;
(e) if $a \equiv b \pmod{m}$ and $c \equiv d \pmod{m}$, then $ac \equiv bd \pmod{m}$;
(f) if $a \equiv b \pmod{m}$, then $ca \equiv cb \pmod{m}$;
(g) if $a \equiv b \pmod{m}$, then $a^n \equiv b^n \pmod{m}$ for any natural number $n$.

The properties (a), (b), and (c) follow easily from Definition 5-6.1. To prove (d), suppose that $a \equiv b \pmod{m}$ and $c \equiv d \pmod{m}$. Then by Definition 5-6.1, $a - b = km$ and $c - d = lm$ for some integers $k$ and $l$. Thus,

$$(a + c) - (b + d) = (a - b) + (c - d) = (k + l)m$$

and

$$(a - c) - (b - d) = (a - b) - (c - d) = (k - l)m.$$

Therefore, the differences $(a + c) - (b + d)$ and $(a - c) - (b - d)$ are both divisible by $m$. By Definition 5-6.1, $a + c \equiv b + d \pmod{m}$ and $a - c \equiv b - d \pmod{m}$. The statement (e) is proved similarly. Using the same notation as in the proof of (d), we have

$$\begin{aligned}
ac - bd &= (a - b)(c - d) + ad + bc - 2bd \\
&= (a - b)(c - d) + d(a - b) + b(c - d) \\
&= (km)(lm) + d(km) + b(lm) = (klm + dk + bl)m.
\end{aligned}$$

Property (f) is an immediate consequence of (e) and the fact that $c \equiv c \pmod{m}$ by (a). Property (g) is obtained by successively applying (e) to the congruence $a \equiv b \pmod{m}$. Using the given congruence twice, (e) implies $a^2 \equiv b^2 \pmod{m}$. Using $a \equiv b \pmod{m}$ and $a^2 \equiv b^2 \pmod{m}$, (e) gives $a^3 \equiv b^3 \pmod{m}$, and so forth. Of course, this argument can be formalized by induction.

The "transitive law," Theorem 5-6.3(c), and the "reflexive law," Theorem 5-6.3(a), justify the use of sequences of equalities and congruences. For example,

$$a \equiv b = c \equiv d = e \pmod{m}$$

is a convenient abbreviation for $a \equiv b \pmod{m}$, $b = c$, $c \equiv d \pmod{m}$, and $d = e$. By (a) and (c), the congruences obtained by omitting one or more quantities from this sequence are valid: $a \equiv c \pmod{m}$, $a \equiv d \pmod{m}$, $a \equiv e \pmod{m}$, $b \equiv d \pmod{m}$, $b \equiv e \pmod{m}$, and $c \equiv e \pmod{m}$.

It is a consequence of Theorem 5-6.3 that in a congruence with modulus $m$ which involves sums and differences of products, any integer in the congruence can be replaced by any other integer to which it is congruent modulo $m$. For example, if $ab^3 - 2abc \equiv 5d^2 \pmod{m}$, and if $b \equiv e \pmod{m}$, then $ae^3 - 2aec \equiv 5d^2 \pmod{m}$. In fact by (g), $b^3 \equiv e^3 \pmod{m}$. Using (f), we obtain $ab^3 \equiv ae^3 \pmod{m}$. Similarly, $2abc \equiv 2aec \pmod{m}$. By (d), $ab^3 - 2abc \equiv ae^3 - 2aec \pmod{m}$. Finally, employing (c), we find $ae^3 - 2aec \equiv 5d^2 \pmod{m}$. Even the simple properties of congruence given in Theorem 5-6.3 have useful applications.
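The rules of Theorem 5-6.3 can be spot-checked by machine before they are used. The Python sketch below is an illustration only; the function names and the small ranges searched are our own choices. It builds pairs with $a \equiv b$ and $c \equiv d \pmod{m}$ as $b = a - km$, $d = c - lm$, tests (d), (e), and (g) with $n = 3$, and also tests the substitution of $e$ for $b$ in the expression $ab^3 - 2abc$.

```python
def congruent(a, b, m):
    """a is congruent to b modulo m, that is, m divides a - b (Definition 5-6.1)."""
    return (a - b) % m == 0


def rules_hold(max_modulus=6, span=6):
    """Check (d), (e), (g) of Theorem 5-6.3 on a small range of integers."""
    for m in range(1, max_modulus + 1):
        for a in range(-span, span + 1):
            for c in range(-span, span + 1):
                for k in range(-2, 3):
                    for l in range(-2, 3):
                        b, d = a - k * m, c - l * m   # a ≡ b and c ≡ d (mod m)
                        ok = (congruent(a + c, b + d, m)        # (d)
                              and congruent(a - c, b - d, m)    # (d)
                              and congruent(a * c, b * d, m)    # (e)
                              and congruent(a ** 3, b ** 3, m)) # (g), n = 3
                        if not ok:
                            return False
    return True


def substitution_holds(m=7, span=5):
    """Replacing b by e ≡ b (mod m) in a*b^3 - 2*a*b*c leaves it unchanged mod m."""
    for a in range(-span, span + 1):
        for b in range(-span, span + 1):
            for c in range(-span, span + 1):
                e = b + 3 * m                           # e ≡ b (mod m)
                left = a * b ** 3 - 2 * a * b * c
                right = a * e ** 3 - 2 * a * e * c
                if not congruent(left, right, m):
                    return False
    return True


print(rules_hold(), substitution_holds())
```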

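EXAMPLE 1. Let us find the remainder obtained on dividing the sum

$$1^{10} + 2^{10} + 3^{10} + \cdots + 100^{10}$$

by 7. By Theorem 5-6.3,

$$1^{10} + 2^{10} + \cdots + 100^{10} \equiv (1^{10} + 2^{10} + \cdots + 6^{10} + 0^{10}) + \cdots + (1^{10} + 2^{10} + \cdots + 6^{10} + 0^{10}) + 1^{10} + 2^{10} \pmod{7},$$

where this sum contains 14 occurrences of the block $1^{10} + 2^{10} + \cdots + 6^{10} + 0^{10}$ (since $14 \cdot 7 = 98$). Thus, since $14 \equiv 0 \pmod{7}$ and $2^3 \equiv 1 \pmod{7}$,

$$1^{10} + 2^{10} + \cdots + 100^{10} \equiv 14(1^{10} + 2^{10} + \cdots + 6^{10}) + 1^{10} + 2^{10} \equiv 1 + (2^3)^3 \cdot 2 \equiv 1 + 2 = 3 \pmod{7}.$$

Consequently, the remainder obtained on dividing the sum $1^{10} + 2^{10} + 3^{10} + \cdots + 100^{10}$ by 7 is 3.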
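The computation in Example 1 can be confirmed in two ways. The Python sketch below is an illustration only, with variable names of our own choosing: it evaluates the sum directly, and then repeats the reduction used in the example, replacing each term by its remainder modulo 7 and grouping the terms into 14 blocks.

```python
# Direct evaluation of 1^10 + 2^10 + ... + 100^10, reduced modulo 7 at the end.
direct = sum(k ** 10 for k in range(1, 101)) % 7

# One block 1^10 + 2^10 + ... + 6^10 + 0^10, each power reduced modulo 7.
block = sum(pow(k, 10, 7) for k in range(0, 7)) % 7

# 14 blocks cover the terms up to 98^10; 99 ≡ 1 and 100 ≡ 2 (mod 7) remain.
reduced = (14 * block + pow(1, 10, 7) + pow(2, 10, 7)) % 7

print(direct, block, reduced)
```

Both routes give the same remainder; the second never handles a number larger than a few hundred.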


EXAMPLE 2. A well-known property of natural numbers written in decimal notation is that such a number is divisible by 9 if and only if the sum of its digits is divisible by 9. The basis for this useful fact is the observation that if

$$n = r_k 10^k + r_{k-1} 10^{k-1} + \cdots + r_1 10 + r_0,$$

where $0 \leq r_i < 10$, then since $10 \equiv 1 \pmod{9}$,

$$n \equiv r_k + r_{k-1} + \cdots + r_1 + r_0 \pmod{9}.$$

That is, any natural number is congruent modulo 9 to the sum of its digits. In particular, $n$ is divisible by 9 if and only if the sum of the digits of $n$ is divisible by 9. Note that the process of adding digits can be repeated to obtain the remainder on division of a number by 9. For instance,

$$35171 \equiv 3 + 5 + 1 + 7 + 1 = 17 \equiv 1 + 7 = 8 \pmod{9}.$$

One of the simplest and most familiar methods of checking the addition of a column of numbers is based on this observation. The process is called "casting out nines." It consists of summing the digits of each number in the column, adding these sums, and comparing the result with the number which is obtained by summing the digits of the number which is supposed to be the sum of the given numbers. If the two numbers being compared are not congruent modulo 9, then there is an error. For example:

$$\begin{array}{rl}
2165 & 2 + 1 + 6 + 5 = 14 \\
3082 & 3 + 0 + 8 + 2 = 13 \\
7165 & 7 + 1 + 6 + 5 = 19 \\
11011 & 1 + 1 + 0 + 1 + 1 = 4 \\
35171 & 3 + 5 + 1 + 7 + 1 = 17 \\
1022 & 1 + 0 + 2 + 2 = 5 \\
\hline
59616 & 14 + 13 + 19 + 4 + 17 + 5 = 72 \equiv 7 + 2 \equiv 0 \pmod{9} \\
 & 5 + 9 + 6 + 1 + 6 = 27 \equiv 2 + 7 \equiv 0 \pmod{9}
\end{array}$$

Of course, this check is not infallible, but it is easy to apply. It is left to the reader to show that this method can also be used to check multiplication.
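The check described in Example 2 is short enough to write out in full. The Python sketch below is an illustration only, with hypothetical function names; it applies "casting out nines" to the column of numbers above, and also shows one way the check can fail to notice an error.

```python
def digit_sum(n):
    """Sum of the decimal digits of the natural number n."""
    return sum(int(ch) for ch in str(n))


def passes_nines_check(addends, claimed_sum):
    """Compare the digit sums of the addends with that of the claimed sum, modulo 9."""
    left = sum(digit_sum(x) for x in addends)
    return (left - digit_sum(claimed_sum)) % 9 == 0


column = [2165, 3082, 7165, 11011, 35171, 1022]

print(passes_nines_check(column, 59616))   # the correct sum
print(passes_nines_check(column, 59617))   # an error of 1 is detected
print(passes_nines_check(column, 59661))   # two digits interchanged: not detected
```

Interchanging two digits changes a number by a multiple of 9, which is why the last error slips through.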

The following theorem gives some of the most useful relations between congruences with different moduli.

THEOREM 5-6.4.
(a) If $a \equiv b \pmod{m}$, then $la \equiv lb \pmod{lm}$.
(b) If $a \equiv b \pmod{m}$ and $l \mid m$, then $a \equiv b \pmod{l}$.
(c) If $a \equiv b \pmod{m}$ and $d$ is a common divisor of $a$, $b$, and $m$, then $a/d \equiv b/d \pmod{m/d}$.
(d) If $ca \equiv cb \pmod{m}$, then $a \equiv b \pmod{m/(c, m)}$.
(e) If $a \equiv b \pmod{m_1}$ and $c \equiv d \pmod{m_2}$, then $a + c \equiv b + d \pmod{(m_1, m_2)}$, $a - c \equiv b - d \pmod{(m_1, m_2)}$, and $ac \equiv bd \pmod{(m_1, m_2)}$.
(f) If $a \equiv b \pmod{m_1}$, $a \equiv b \pmod{m_2}$, . . . , $a \equiv b \pmod{m_k}$, then $a \equiv b \pmod{n}$, where $n$ is the least common multiple of $m_1, m_2, \ldots, m_k$.

The reader will find that (a), (b), and (c) are straightforward consequences of Definition 5-6.1. Property (d) is the cancellation law for congruences. To prove (d), we first note that since $(c, m)$ is a common divisor of $ca$, $cb$, and $m$, by (c) we have $ca/(c, m) \equiv cb/(c, m) \pmod{m/(c, m)}$. This means that $m/(c, m)$ divides $[c/(c, m)](a - b)$. But $m/(c, m)$ and $c/(c, m)$ are relatively prime. Therefore, by Theorem 5-2.6, $m/(c, m)$ divides $a - b$. That is, $a \equiv b \pmod{m/(c, m)}$, proving (d). In order to prove (e), we observe that $a \equiv b \pmod{(m_1, m_2)}$ and $c \equiv d \pmod{(m_1, m_2)}$ by (b), since $(m_1, m_2) \mid m_1$ and $(m_1, m_2) \mid m_2$. The conclusion follows from Theorem 5-6.3 (d) and (e). By Definition 5-6.1, the hypothesis of (f) is equivalent to the statement that $m_1 \mid (a - b)$, $m_2 \mid (a - b)$, . . . , and $m_k \mid (a - b)$. Therefore, by (5-3.6b), the least common multiple $n$ of $m_1, m_2, \ldots, m_k$ divides $a - b$. In other words, $a \equiv b \pmod{n}$.
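The cancellation law, Theorem 5-6.4(d), is the property most easily misapplied, so a small experiment may be helpful. The Python sketch below is an illustration only, with hypothetical function names. It shows that a common factor cannot in general be cancelled without changing the modulus, and checks on a small range that cancelling $c$ is valid once $m$ is replaced by $m/(c, m)$.

```python
from math import gcd


def congruent(a, b, m):
    """a is congruent to b modulo m, that is, m divides a - b."""
    return (a - b) % m == 0


def reduced_modulus(c, m):
    """The modulus m/(c, m) that appears in Theorem 5-6.4(d)."""
    return m // gcd(c, m)


def cancellation_holds(limit=12):
    """Check: whenever ca ≡ cb (mod m), also a ≡ b (mod m/(c, m))."""
    for m in range(1, limit + 1):
        for c in range(1, limit + 1):
            for a in range(-limit, limit + 1):
                for b in range(-limit, limit + 1):
                    if congruent(c * a, c * b, m):
                        if not congruent(a, b, reduced_modulus(c, m)):
                            return False
    return True


# 2*1 ≡ 2*4 (mod 6), but 1 and 4 are congruent only modulo 6/(2, 6) = 3.
print(congruent(2 * 1, 2 * 4, 6), congruent(1, 4, 6), congruent(1, 4, 3))
print(cancellation_holds())
```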

1. Show that every integer is congruent modulo 7 to exactly one of the following numbers: 291, 7, 54, 31, 36, 20, 765.
2. Find the remainders on dividing $3^{60}$ by 7, 15, and 31. [Hint: Write $60 = 2^5 + 2^4 + 2^3 + 2^2$, so that $3^{60} = 3^{32} \cdot 3^{16} \cdot 3^{8} \cdot 3^{4}$.]
3. Prove (a), (b), and (c) of Theorem 5-6.3.
4. Prove that if $a \equiv b \pmod{m}$, then $(a, m) = (b, m)$.
5. Show that the method of "casting out nines" can be used to check multiplication of natural numbers.
6. Use the fact that $10 \equiv -1 \pmod{11}$ and $10^2 \equiv 1 \pmod{11}$ to discover a rule for divisibility of a natural number (written in decimal notation) by 11.
7. Discover a method of "casting out sixes" as a check for addition and multiplication for natural numbers written in the base 7 notation.
8. Prove (a), (b), and (c) of Theorem 5-6.4.
9. Find the remainder obtained for the following divisions. (a) $1^5 + 2^5 + \cdots + 1080^5$ divided by 14. (b) $1! + 2! + 3! + 4! + \cdots + (10^{10})!$ divided by 24. (c) (1) (i) : (E) divided by 7.


5-7 Linear congruences. The linear equation $ax = b$, where $a$ and $b$ are integers, has a solution which is an integer if and only if $a$ divides $b$. In fact, this statement is just the definition of divisibility. This section is concerned with the analogous problem of solving the linear congruence
$$ax \equiv b \pmod{m}.$$
Linear congruences occur in a variety of practical problems.

EXAMPLE 1. A synodic month (the period of time between two consecutive appearances of a full moon) is approximately $29\tfrac{1}{2}$ days. If a full moon occurs at a certain time on Monday evening, how many synodic months later will the full moon occur at approximately the same time on Wednesday evening? If we measure time in terms of half days, the synodic month is 59 half days in length and a week is 14 half days long. After $x$ synodic months beyond the occurrence of the full moon, $59x$ half days have elapsed, and a full moon occurs again. If we divide $59x$ by 14, obtaining a remainder $r$, then this full moon occurs $r$ half days after Monday evening. Since Wednesday is 4 half days after Monday, the $x$ which solves our problem is the smallest positive integral solution of the congruence
$$59x \equiv 4 \pmod{14}.$$
Since $59 \equiv 3 \pmod{14}$, this congruence is equivalent to
$$3x \equiv 4 \pmod{14}.$$
By trial, we find that $x = 6$ is the smallest positive solution of this congruence.
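The search "by trial" in Example 1 can be sketched in a few lines of Python. This is only an illustration; the function name `smallest_positive_solution` is a hypothetical helper, not notation from the text. It tries $x = 1, 2, \ldots, m$, which suffices because any solution is congruent modulo $m$ to one of these numbers.

```python
def smallest_positive_solution(a, b, m):
    """Return the least x in 1..m with a*x congruent to b modulo m, or None."""
    for x in range(1, m + 1):
        if (a * x - b) % m == 0:
            return x
    return None


# Example 1: 59 half days per synodic month, 14 half days per week,
# and Wednesday evening is 4 half days after Monday evening.
months = smallest_positive_solution(59, 4, 14)
print(months)
```

Replacing 59 by 3 gives the same answer, in agreement with the reduction $59 \equiv 3 \pmod{14}$.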

To understand more clearly the nature of the solutions of linear congruences, let us examine a particular example. Consider the congruence
$$3x \equiv 2 \pmod{5}.$$
Substituting $x = 1, 2, 3, \ldots, 20$, we find that among these numbers only $x = 4$, $x = 9$, $x = 14$, and $x = 19$ satisfy the congruence. Note that these numbers are all congruent modulo 5. This suggests that all integers $x$ which are congruent to 4 modulo 5, and only these numbers, are solutions of $3x \equiv 2 \pmod{5}$. By checking more values of $x$ we could gather additional evidence for our guess. However, this is unnecessary, since the conjecture is easy to prove. First, if $x \equiv 4 \pmod{5}$, then by Theorem 5-6.3, $3x \equiv 3 \cdot 4 = 12 \equiv 2 \pmod{5}$. Thus every such $x$ is a solution. Next we must show that these are the only solutions. If $x$ is any solution, then
$$3x \equiv 2 \equiv 3 \cdot 4 \pmod{5}.$$
Since 3 is relatively prime to 5, it follows from Theorem 5-6.4(d) that $x \equiv 4 \pmod{5}$. The result that $x$ satisfies $3x \equiv 2 \pmod{5}$ if and only if $x \equiv 4 \pmod{5}$ provides a complete solution of the linear congruence $3x \equiv 2 \pmod{5}$.

In order to describe the solutions of linear congruences in general, we introduce a new concept. By Theorem 5-6.2, every integer is congruent modulo $m$ to one and only one of the numbers $0, 1, 2, \ldots, m - 1$. Thus the set $Z$ of all integers is divided into disjoint subsets $X_0, X_1, X_2, \ldots, X_{m-1}$, where
$$X_r = \{x \in Z \mid x \equiv r \pmod{m}\}.$$

The sets $X_0, X_1, X_2, \ldots, X_{m-1}$ are called congruence classes modulo $m$ (or residue classes modulo $m$). For example, if $m = 2$, $X_0$ is the set of all even integers and $X_1$ is the set of all odd integers. If $m = 4$, $X_0 = \{4k \mid k \in Z\}$, $X_1 = \{4k + 1 \mid k \in Z\}$, $X_2 = \{4k + 2 \mid k \in Z\}$, and $X_3 = \{4k + 3 \mid k \in Z\}$. If the integers $x$ and $y$ are in the same congruence class $X_r$ modulo $m$, then $x \equiv r \pmod{m}$ and $y \equiv r \pmod{m}$. Therefore, by Theorem 5-6.3(b) and (c), $x \equiv y \pmod{m}$. Conversely, if $x \equiv y \pmod{m}$ and $y \in X_r$, then $x \equiv y \equiv r \pmod{m}$, so that $x \in X_r$. Thus, two integers $x$ and $y$ are in the same congruence class modulo $m$ if and only if $x \equiv y \pmod{m}$.

As in the example discussed above, if $x$ is a solution of the congruence $ax \equiv b \pmod{m}$, and if $y \equiv x \pmod{m}$, then by Theorem 5-6.3, $ay \equiv ax \equiv b \pmod{m}$. Thus, if $x$ is a solution of $ax \equiv b \pmod{m}$, then every member of the congruence class which contains $x$ is also a solution of the congruence. In the example $3x \equiv 2 \pmod{5}$, every element of the congruence class $X_4$ is a solution of the congruence. In fact, the solutions of $3x \equiv 2 \pmod{5}$ are exactly the integers which belong to $X_4$. However, it may happen that a linear congruence modulo $m$ has solutions belonging to more than one congruence class modulo $m$. For example,

$$2x \equiv 6 \pmod{12}$$
has the solutions 3 and 9, and therefore every element in either of the two congruence classes $X_3$ and $X_9$ is a solution of this congruence. On the other hand, some linear congruences have no solutions. For instance,
$$2x \equiv 1 \pmod{6}$$
cannot be satisfied by any integer $x$, since $2x - 1$ is always odd and therefore not divisible by 6.

These remarks suggest that a linear congruence $ax \equiv b \pmod{m}$ is effectively solved if we obtain a representative set of solutions
$$\{r_1, r_2, \ldots, r_k\},$$
where $0 \le r_i \le m - 1$ and $r_i \ne r_j$ for $i \ne j$ (which implies that the $r_i$ belong to different congruence classes), such that every solution of $ax \equiv b \pmod{m}$ is a member of the congruence class of some $r_i$. If $ax \equiv b \pmod{m}$ has such a representative set of solutions $\{r_1, r_2, \ldots, r_k\}$, then this congruence is said to have exactly $k$ incongruent solutions modulo $m$. In particular, if $k = 1$, that is, all solutions belong to the same congruence class, then we say that the congruence has a unique solution modulo $m$. This is the case for the congruence $3x \equiv 2 \pmod{5}$.

If $m$ is not very large, it is possible to obtain the representative set of solutions by testing each of the numbers $0, 1, 2, \ldots, m - 1$ to see which of them satisfy $ax \equiv b \pmod{m}$. However, this procedure is impractical for large values of $m$. Fortunately, it is possible to prove general theorems which give a complete solution for any linear congruence.

THEOREM 5-7.1. If $(a, m) = 1$, the congruence $ax \equiv b \pmod{m}$ has a unique solution modulo $m$.
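The exhaustive test described just before Theorem 5-7.1 (try each of $0, 1, \ldots, m - 1$) is easy to write down for small moduli. The following Python sketch is illustrative only; `representative_set` is a hypothetical name chosen here.

```python
def representative_set(a, b, m):
    """List every r with 0 <= r <= m - 1 and a*r congruent to b modulo m."""
    return [r for r in range(m) if (a * r - b) % m == 0]


# The three congruences discussed in the text.
print(representative_set(3, 2, 5))    # one congruence class
print(representative_set(2, 6, 12))   # two congruence classes
print(representative_set(2, 1, 6))    # no solutions
```

The running time grows in proportion to $m$, which is why the theorems that follow are needed for large moduli.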

Proof. By Theorem 5-2.2(a), there exist integers $u$ and $v$ such that $ua + vm = 1$. Multiplying by $b$, we obtain
$$bua + bvm = b, \quad \text{or} \quad a(bu) - b = (-bv)m.$$
By Definition 5-6.1, $a(bu) \equiv b \pmod{m}$, so that $x = bu$ is a solution of the given congruence. Suppose that $r$ is any solution of $ax \equiv b \pmod{m}$. Then $ar \equiv b \equiv a(bu) \pmod{m}$. Since $(a, m) = 1$, we can use the cancellation law for congruences, Theorem 5-6.4(d), to cancel $a$ and obtain
$$r \equiv bu \pmod{m}.$$
Thus, any solution of the given congruence is congruent modulo $m$ to the solution $x = bu$, so that $ax \equiv b \pmod{m}$ has a unique solution modulo $m$. Note that $u$ can be found by using the Euclidean algorithm, explained in Section 5-2.
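The proof is constructive, and a short Python sketch may make the steps concrete. It assumes the usual extended form of the Euclidean algorithm (the text refers to Section 5-2 for the algorithm itself); the names `extended_gcd` and `solve_unique` are hypothetical.

```python
def extended_gcd(a, m):
    """Return (g, u, v) with g = (a, m) and u*a + v*m = g, for a >= 0, m > 0."""
    old_r, r = a, m
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    return old_r, old_u, old_v


def solve_unique(a, b, m):
    """Least nonnegative solution of a*x = b (mod m), assuming (a, m) = 1."""
    g, u, _ = extended_gcd(a % m, m)
    if g != 1:
        raise ValueError("this sketch requires (a, m) = 1")
    return (b * u) % m    # x = bu, reduced modulo m


print(solve_unique(3, 2, 5))
print(solve_unique(59, 4, 14))
```

Reducing $bu$ modulo $m$ picks out the representative of the unique solution class that lies between 0 and $m - 1$.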

Now consider the general linear congruence.

THEOREM 5-7.2. The congruence $ax \equiv b \pmod{m}$ has a solution if and only if $(a, m)$ divides $b$. If $(a, m)$ divides $b$, the congruence has exactly $(a, m)$ incongruent solutions modulo $m$.

Proof. If the congruence $ax \equiv b \pmod{m}$ has a solution $r$, then $ar - b = lm$, or $ar - lm = b$, for some integer $l$. Since $(a, m)$ is a common divisor of $a$ and $m$, $(a, m)$ divides $b$. Conversely, if $(a, m)$ divides $b$, then we can consider the congruence
$$\frac{a}{(a, m)}\, x \equiv \frac{b}{(a, m)} \pmod{\frac{m}{(a, m)}}. \tag{5-8}$$
Since $a/(a, m)$ and $m/(a, m)$ are relatively prime, (5-8) has a solution $s$ by Theorem 5-7.1. Then $as \equiv b \pmod{m}$ by Theorem 5-6.4(a), so that $s$ is a solution of $ax \equiv b \pmod{m}$. In fact, any solution of (5-8) is a solution of the given congruence.

If the condition $(a, m) \mid b$ is satisfied, then the congruence (5-8) has a solution $s$ satisfying $0 \le s < m/(a, m)$. Define for $j = 0, 1, \ldots, (a, m) - 1$,
$$s_j = s + j\,\frac{m}{(a, m)}. \tag{5-9}$$
Then $s_j \equiv s \pmod{m/(a, m)}$, so that $s_j$ is a solution of (5-8), and therefore of $ax \equiv b \pmod{m}$. Moreover, since
$$0 \le s_0 < s_1 < \cdots < s_{(a, m) - 1} = s + [(a, m) - 1]\frac{m}{(a, m)} < m,$$
it follows that if $0 \le i < j \le (a, m) - 1$, then $s_i$ is not congruent to $s_j$ modulo $m$. We will show that
$$\{s_0, s_1, \ldots, s_{(a, m) - 1}\} \tag{5-10}$$
is a representative set of solutions of the congruence $ax \equiv b \pmod{m}$. That is, every integer $t$ satisfying $at \equiv b \pmod{m}$ is congruent modulo $m$ to $s_r$ for some $r$. By Theorem 5-6.4(c), $at \equiv b \pmod{m}$ implies that $t$ is a solution of (5-8). Therefore, $t \equiv s \pmod{m/(a, m)}$ by Theorem 5-7.1. That is,
$$t = s + l\,\frac{m}{(a, m)}$$
for some integer $l$. By the division algorithm, $l = q(a, m) + r$, where $0 \le r \le (a, m) - 1$. Thus,
$$t = s + \frac{rm}{(a, m)} + qm = s_r + qm \equiv s_r \pmod{m}.$$


EXAMPLE 2. Solve the congruence $15x \equiv 20 \pmod{35}$. Since $(15, 35) = 5$, and $5 \mid 20$, the congruence has 5 solutions which are incongruent modulo 35. These are obtained by first solving $3x \equiv 4 \pmod{7}$. We find the solution $x = 6$. Then a representative set of solutions of $15x \equiv 20 \pmod{35}$ is obtained from (5-9) and (5-10). These are
$$6, \quad 13, \quad 20, \quad 27, \quad 34.$$
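The procedure of Theorem 5-7.2 can be sketched as follows: divide through by $(a, m)$, solve the reduced congruence, and add multiples of $m/(a, m)$. The function name `solve_linear_congruence` is hypothetical, and the sketch assumes Python 3.8 or later, where the built-in `pow(x, -1, n)` returns an inverse of `x` modulo `n`.

```python
from math import gcd


def solve_linear_congruence(a, b, m):
    """Representative set of solutions of a*x = b (mod m); empty if none exist."""
    d = gcd(a, m)
    if b % d != 0:
        return []                          # (a, m) does not divide b
    a1, b1, m1 = a // d, b // d, m // d    # the reduced congruence
    s = (b1 * pow(a1, -1, m1)) % m1        # its solution with 0 <= s < m1
    return [s + j * m1 for j in range(d)]  # s_j = s + j * m / (a, m)


print(solve_linear_congruence(15, 20, 35))
```

For Example 2 the reduced congruence is $3x \equiv 4 \pmod{7}$, and the list contains $(15, 35) = 5$ entries, one from each solution class modulo 35.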

For $(a, m) = 1$, the congruence $ax \equiv b \pmod{m}$ can be solved as in Theorem 5-7.1 using the Euclidean algorithm. This is probably the best method if the numbers $a$ and $b$ are large. If these numbers are small, the congruence can often be solved more easily by trial, or by using the properties of congruences given in Theorem 5-6.3.

EXAMPLE 3. Solve $3x \equiv 4 \pmod{7}$. If $x$ is a solution, then $6x \equiv 8 \pmod{7}$, so that $-x \equiv 1 \pmod{7}$, and $x \equiv -1 \equiv 6 \pmod{7}$. Suppose that we wish to solve $5x \equiv 9 \pmod{13}$. If $x$ is a solution, then $18x \equiv 9 \pmod{13}$, and therefore $2x \equiv 1 \pmod{13}$; consequently, $14x \equiv 7 \pmod{13}$, and $x \equiv 7 \pmod{13}$. As a final example, if $x$ is a solution of $5x \equiv 11 \pmod{17}$, then $5x \equiv 45 \pmod{17}$; hence $x \equiv 9 \pmod{17}$.

There is an important application of Theorem 5-7.1 to the construction of sets of orthogonal Latin squares. A Latin square of side $m$ is an arrangement of $m$ distinct symbols in $m^2$ subsquares of a square, in such a way that every row and every column contains each symbol exactly once. It is immaterial what symbols are used, but it is convenient to let them be the number symbols $0, 1, 2, \ldots, m - 1$. As an example,

is a Latin square of side 3. Two Latin squares of the same size are called orthogonal if, when one is superposed on the other, every ordered pair of symbols occurs exactly once in the resulting square. For instance


are orthogonal Latin squares, since when one is superposed on the other we have

For centuries, amateur and professional mathematicians have found Latin squares interesting. In recent years the study of pairs of orthogonal Latin squares has taken a serious turn, because of the discovery that such pairs have important applications in algebra, geometry, and applied statistics.

Let $p$ be a prime. For any integer $a$, let $r(a)$ be the remainder on dividing $a$ by $p$. That is, $0 \le r(a) < p$, and $a \equiv r(a) \pmod{p}$. For $0 < k < p$, define a Latin square (which we designate by $L_k$) as in Table 5-1. In other words, the number in the $i$th row and $j$th column of $L_k$ is
$$r((j - 1) + (i - 1)k).$$
To show that $L_k$ is a Latin square, it is necessary to prove that if $0 \le b < p$, then $b$ occurs in every row of $L_k$ and in every column of $L_k$. Consider the $i$th row. Then $b - (i - 1)k \equiv c \pmod{p}$ for some $c$ satisfying $0 \le c \le p - 1$. Let $j = c + 1$. Then
$$(j - 1) + (i - 1)k \equiv b \pmod{p}.$$
Thus,
$$r((j - 1) + (i - 1)k) = b.$$
Therefore, $b$ occurs as the $j$th entry of the $i$th row. Now examine the $j$th column. Since $0 < k < p$ and $p$ is a prime, it follows that $(k, p) = 1$.

Hence by Theorem 5-7.1, there is an integer $d$ such that
$$kd \equiv b - (j - 1) \pmod{p}.$$
We can select $d$ so that $0 \le d \le p - 1$. Define $i = d + 1$. Then
$$(j - 1) + (i - 1)k \equiv b \pmod{p}.$$
Consequently, $r((j - 1) + (i - 1)k) = b$, so that $b$ is the $i$th entry of the $j$th column in $L_k$. Thus, we have shown that each row and column contains each of the numbers $0, 1, \ldots, p - 1$ at least once. Since there are only $p$ entries in each row and column, it follows that the rows and columns cannot contain these symbols more than once. Therefore, $L_k$ is a Latin square.

We wish to show now that if $0 < k < k' < p$, then the squares $L_k$ and $L_{k'}$ are orthogonal. For this, we have to prove that if $0 \le a \le p - 1$ and $0 \le b \le p - 1$, there are natural numbers $i$ and $j$ such that $1 \le i \le p$, $1 \le j \le p$, and $r((j - 1) + (i - 1)k) = a$, $r((j - 1) + (i - 1)k') = b$. This is clearly equivalent to the problem of solving the congruences
$$(j - 1) + (i - 1)k \equiv a \pmod{p}, \qquad (j - 1) + (i - 1)k' \equiv b \pmod{p}$$

for $i$ and $j$. Subtracting these congruences, we obtain the condition
$$(i - 1)(k' - k) \equiv b - a \pmod{p},$$
which can be written in the form
$$(k' - k)i \equiv (b - a) + (k' - k) \pmod{p}.$$
Since $0 < k' - k < p$ and $p$ is a prime, it follows from Theorem 5-7.1 that this congruence has a solution $i$ such that $1 \le i \le p$. Choose $j$ so that $1 \le j \le p$ and
$$j - 1 \equiv a - (i - 1)k \pmod{p}.$$
Then by construction, $j - 1$ and $i - 1$ satisfy the congruence $(j - 1) + (i - 1)k \equiv a \pmod{p}$. However, these values of $j - 1$ and $i - 1$ also satisfy the congruence $(j - 1) + (i - 1)k' \equiv b \pmod{p}$. In fact,
$$j - 1 + (i - 1)k' \equiv a - (i - 1)k + (i - 1)k' = a + (i - 1)(k' - k) \equiv b \pmod{p}.$$
This proves that $L_k$ and $L_{k'}$ are orthogonal. Note that we have constructed a set of $p - 1$ Latin squares, each pair of which is orthogonal.
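The construction can be checked by machine for a small prime. The following Python sketch builds $L_k$ from the rule that the entry in row $i$ and column $j$ is the remainder of $(j - 1) + (i - 1)k$ on division by $p$ (rows and columns are indexed from 0 in the code, which absorbs the two "$-1$" terms). The function names are hypothetical.

```python
def latin_square(k, p):
    """Square whose (i, j) entry, counting from 0, is (j + i*k) mod p."""
    return [[(j + i * k) % p for j in range(p)] for i in range(p)]


def is_latin(square):
    """True if every row and every column contains each of 0..n-1 exactly once."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok


def are_orthogonal(sq1, sq2):
    """True if superposing the squares yields every ordered pair exactly once."""
    n = len(sq1)
    pairs = {(sq1[i][j], sq2[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n


p = 5
squares = [latin_square(k, p) for k in range(1, p)]
print(all(is_latin(sq) for sq in squares))
print(all(are_orthogonal(squares[x], squares[y])
          for x in range(len(squares)) for y in range(x + 1, len(squares))))
```

The hypothesis that $p$ is a prime matters: with the composite modulus 4 and $k = 2$ the same rule produces a square whose columns repeat symbols, because $(2, 4) \ne 1$.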

Many problems in number theory require the simultaneous solution of systems of congruences. We will prove a famous and important theorem about such congruences. This result was known to Chinese mathematicians as early as 250 A.D., and for this reason it is usually called the Chinese remainder theorem.

THEOREM 5-7.3. Let $m_1, m_2, \ldots, m_k$ be natural numbers such that $(m_i, m_j) = 1$ if $i \ne j$. Then if $b_1, b_2, \ldots, b_k$ are any integers, there exists an integer $x$ such that
$$x \equiv b_1 \pmod{m_1}, \quad x \equiv b_2 \pmod{m_2}, \quad \ldots, \quad x \equiv b_k \pmod{m_k}.$$
Moreover, $x$ is unique modulo $m_1 m_2 \cdots m_k$.

Proof. Let $n_i = m_1 m_2 \cdots m_{i-1} m_{i+1} \cdots m_k$. Then $(n_i, m_i) = 1$ since $(m_i, m_j) = 1$ if $i \ne j$. Consequently, by Theorem 5-7.1, there is an integer $t_i$ such that $n_i t_i \equiv b_i \pmod{m_i}$. Let
$$x = n_1 t_1 + n_2 t_2 + \cdots + n_k t_k.$$
Then $x \equiv n_i t_i \equiv b_i \pmod{m_i}$, since if $j \ne i$, then $m_i \mid n_j$, and consequently $n_j t_j \equiv 0 \pmod{m_i}$. Thus, $x$ is a simultaneous solution of the given system of congruences. If $y$ also satisfies $y \equiv b_1 \pmod{m_1}$, $y \equiv b_2 \pmod{m_2}$, $\ldots$, $y \equiv b_k \pmod{m_k}$, then $x \equiv y \pmod{m_1}$, $x \equiv y \pmod{m_2}$, $\ldots$, $x \equiv y \pmod{m_k}$. Therefore by Theorem 5-6.4(f), $x \equiv y \pmod{m}$, where $m$ is the least common multiple of $m_1, m_2, \ldots, m_k$. But since no two of these integers have a common prime factor, their least common multiple is $m_1 m_2 \cdots m_k$.
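The proof gives an explicit recipe: form each $n_i$, solve $n_i t_i \equiv b_i \pmod{m_i}$, and add up the products $n_i t_i$. A minimal Python sketch of that recipe follows. The name `chinese_remainder` is hypothetical, the moduli are assumed to be pairwise relatively prime, and `math.prod` and `pow(x, -1, n)` require Python 3.8 or later. The sample system is an illustration chosen here, not one taken from the text.

```python
from math import prod


def chinese_remainder(residues, moduli):
    """Least nonnegative x with x = b_i (mod m_i) for pairwise coprime m_i."""
    big_m = prod(moduli)
    x = 0
    for b_i, m_i in zip(residues, moduli):
        n_i = big_m // m_i                        # product of the other moduli
        t_i = (b_i * pow(n_i, -1, m_i)) % m_i     # n_i * t_i = b_i (mod m_i)
        x += n_i * t_i
    return x % big_m


# x = 2 (mod 3), x = 3 (mod 5), x = 2 (mod 7)
x = chinese_remainder([2, 3, 2], [3, 5, 7])
print(x, x % 3, x % 5, x % 7)
```

The final reduction modulo $m_1 m_2 \cdots m_k$ selects the representative of the unique solution class guaranteed by the theorem.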

1. Give the representative set of solutions for each of the following linear congruences.
(a) $362x \equiv 236 \pmod{24}$
(b) $55x \equiv 5 \pmod{31}$
(c) $84x \equiv 96 \pmod{7}$
(d) $36x \equiv 6 \pmod{21}$
(e) $270x \equiv 30 \pmod{150}$

2. Let $p$ be a prime. Show that if $p \nmid a$, then the congruence $ax \equiv b \pmod{p}$ has a unique solution modulo $p$.

3. Find the solutions of the following systems of congruences.
(a) $x \equiv 5 \pmod{6}$, $x \equiv 7 \pmod{11}$
(b) $x \equiv 1 \pmod{2}$, $x \equiv 0 \pmod{3}$, $x \equiv 2 \pmod{5}$
(c) $x \equiv 21 \pmod{29}$, $x \equiv 5 \pmod{30}$, $x \equiv 24 \pmod{31}$

5-81

THE THEOREMS OF FERMAT AKD EULER

189

4. Let $a$, $b$, and $c$ be integers and let $m$ be a natural number. Suppose that $(c, m) = 1$.
(a) Prove that the congruence $ax \equiv b \pmod{m}$ is equivalent to the congruence $cax \equiv cb \pmod{m}$, that is, every solution of $ax \equiv b \pmod{m}$ is a solution of $cax \equiv cb \pmod{m}$, and conversely.
(b) Suppose, in addition, that $(a, m) = 1$. Prove that the congruence $ax \equiv b \pmod{m}$ is equivalent to the congruence $x \equiv b' \pmod{m}$ for some integer $b'$.

5. Let $m_1, m_2, \ldots, m_k$ be natural numbers such that $(m_i, m_j) = 1$ if $i \ne j$. Let $a_1, a_2, \ldots, a_k$ and $b_1, b_2, \ldots, b_k$ be integers. Prove that the system of congruences
$$a_1 x \equiv b_1 \pmod{m_1}, \quad a_2 x \equiv b_2 \pmod{m_2}, \quad \ldots, \quad a_k x \equiv b_k \pmod{m_k}$$
has a solution if and only if
$$(a_1, m_1) \mid b_1, \quad (a_2, m_2) \mid b_2, \quad \ldots, \quad (a_k, m_k) \mid b_k.$$
[Hint: Reduce the system of congruences to the form treated in Theorem 5-7.3 by using Problem 4(b), together with an argument similar to that given in the proof of Theorem 5-7.2.]

6. Determine which of the following systems of congruences have a solution, and when solutions exist, find at least one. (a) 5x = 1 (mod 7), 22s 2 (mod 6) 1 (mod 125) (b) 8x = 14 (mod 24), 4x 1000 (mod 91) (c) 3 8 r ~ 3 (mod 12), 50x 75 (mod 125), x (d) 233x 10 (mod 12), 73x 1 (mod 219), 12x 4 (mod 8)

7. A band of 17 thieves stole a large sack of dollar bills. They tried to divide the bills evenly, but had three bills left over. Two of the thieves began to argue about the extra money, so one of them shot the other. The money was redistributed, but this time there were ten bills remaining. Again an argument developed, and one more thief was shot. When the money was redistributed, there was none left over. What was the least possible amount of money which could have been stolen originally?

8. Construct a set of 4 Latin squares with 5 rows and 5 columns, such that each pair of the set is orthogonal.

*5-8 The theorems of Fermat and Euler. One of the oldest and most famous theorems in number theory was discovered by Fermat and communicated to a friend in 1640. The first published proof of Fermat's theorem, due to the Swiss mathematician Leonhard Euler (1707-1783), appeared almost a century later. Subsequently, a more general theorem was found by Euler. In this section we will discuss these classical results and some of their applications.


Fermat's theorem concerns congruences with prime moduli. These are important because many problems concerning congruences with composite modulus can be reduced to questions about congruences with prime modulus. We begin by noting a simple property of the binomial coefficients.

(5-8.1). If $p$ is a prime and if $i$ is an integer such that $0 < i < p$, then
$$\binom{p}{i} = \frac{p!}{i!\,(p - i)!} \equiv 0 \pmod{p}.$$
We leave the proof of this fact as an exercise for the reader (see Problem 6, Section 5-5). If $p$ is a prime, and if $a$ and $b$ are any integers, then by the binomial theorem
$$(a + b)^p = a^p + \binom{p}{1} a^{p-1} b + \cdots + \binom{p}{p-1} a b^{p-1} + b^p \equiv a^p + b^p \pmod{p},$$
since by (5-8.1), $\binom{p}{i} \equiv 0 \pmod{p}$ if $1 \le i \le p - 1$. Using mathematical induction, this observation can be generalized as follows.
(5-8.2). If $p$ is a prime, and if $a_1, a_2, \ldots, a_n$ are any integers, then
$$(a_1 + a_2 + \cdots + a_n)^p \equiv a_1^p + a_2^p + \cdots + a_n^p \pmod{p}.$$
Proof. If $n = 1$, then the assertion is that $a_1^p \equiv a_1^p \pmod{p}$, which is clearly valid. Assuming that the result holds for $n$, it follows from the remarks above that
$$[(a_1 + a_2 + \cdots + a_n) + a_{n+1}]^p \equiv (a_1 + a_2 + \cdots + a_n)^p + a_{n+1}^p \equiv a_1^p + a_2^p + \cdots + a_n^p + a_{n+1}^p \pmod{p}.$$
This proves the induction step.

If we let $a_1 = a_2 = \cdots = a_n = 1$ in (5-8.2), we obtain $n^p \equiv n \pmod{p}$. Also, if $p$ is odd, and $a_1 = a_2 = \cdots = a_n = -1$, then (5-8.2) specializes to $(-n)^p \equiv -n \pmod{p}$. This is also true if $p = 2$, since $-n \equiv n \pmod{2}$. Obviously, $0^p \equiv 0 \pmod{p}$. Therefore, we have proved the following theorem.

THEOREM 5-8.3. If $p$ is a prime, and if $a$ is any integer, then $a^p \equiv a \pmod{p}$.
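Both (5-8.1) and Theorem 5-8.3 are easy to spot-check numerically. The sketch below is only a sanity check on a few small primes, not a proof; the function names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def binomials_vanish(p):
    """True if every binomial coefficient C(p, i), 0 < i < p, is divisible by p."""
    return all(comb(p, i) % p == 0 for i in range(1, p))


def fermat_holds(p, a):
    """True if a**p and a leave the same remainder on division by p."""
    return pow(a, p, p) == a % p


primes = [2, 3, 5, 7, 11, 13]
print(all(binomials_vanish(p) for p in primes))
print(all(fermat_holds(p, a) for p in primes for a in range(-20, 21)))
```

Primality is essential in both statements. For instance, the middle binomial coefficient of 4 is 6, which is not divisible by 4, and $2^4 = 16$ is not congruent to 2 modulo 4.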

Although this theorem is obtained as a special case of (5-8.2), it is evident that (5-8.2) can be deduced easily from the theorem. The "little theorem of Fermat" is a slight variation of Theorem 5-8.3.

THEOREM 5-8.4. If $p$ is a prime, and if $a$ is any integer which is not divisible by $p$, then $a^{p-1} \equiv 1 \pmod{p}$.

Proof. By Theorem 5-8.3, $p$ divides $a(a^{p-1} - 1)$. Since $p$ does not divide $a$, it must divide $a^{p-1} - 1$, by (5-3.2). Hence, $a^{p-1} \equiv 1 \pmod{p}$.

The method by which we have proved Theorem 5-8.4 is similar to Euler's first proof of this theorem. Some years later, Euler found a different way to prove Fermat's theorem. Using the ideas of this second proof, he was able to establish the more general result known as Euler's theorem. We will prove Euler's theorem by a method which was discovered about 50 years later. This proof is important because it introduces a technique which has many applications in number theory. The following definition is needed.

DEFINITION 5-8.5. Let $m$ be a natural number. The totient of $m$ is the number of nonnegative integers less than $m$ which are relatively prime to $m$. The totient of $m$ is usually denoted by $\varphi(m)$.

In other words, $\varphi(m)$ is the number of integers $k$ such that $0 \le k < m$ and $(k, m) = 1$. For example, $\varphi(1) = 1$, $\varphi(2) = 1$, $\varphi(3) = 2$, $\varphi(4) = 2$, $\varphi(5) = 4$, and $\varphi(6) = 2$. If $p$ is a prime, then the numbers $1, 2, \ldots, p - 1$ are all prime to $p$, so that $\varphi(p) = p - 1$.

THEOREM 5-8.6. Euler's theorem. If $m$ is a natural number and $a$ is an integer which is relatively prime to $m$, then
$$a^{\varphi(m)} \equiv 1 \pmod{m}.$$

Proof. To simplify notation, let $t$ denote $\varphi(m)$. According to the definition of $\varphi(m)$, there are exactly $t$ different natural numbers in the set $\{0, 1, 2, \ldots, m - 1\}$ which are relatively prime to $m$. Let these be designated as $k_1, k_2, \ldots, k_t$. Consider the set of integers
$$ak_1, \quad ak_2, \quad \ldots, \quad ak_t.$$
By the division algorithm, we have for $i = 1, 2, \ldots, t$,
$$ak_i = q_i m + r_i,$$
where $q_i$ is an integer and $0 \le r_i < m$. Thus, $ak_i \equiv r_i \pmod{m}$. The main step of the proof consists of showing that the list of numbers $r_1, r_2, \ldots, r_t$ is just a rearrangement of the sequence $k_1, k_2, \ldots, k_t$. We do this indirectly. First note that each $r_i$ is relatively prime to $m$. In fact, if a prime $p$ divides both $m$ and $r_i$, then $p$ also divides $q_i m + r_i = ak_i$. Thus, either $p \mid a$ or $p \mid k_i$. However, since $p \mid m$ and both $a$ and $k_i$ are relatively prime to $m$, $p$ cannot divide either $a$ or $k_i$. Therefore, $(r_i, m) = 1$. Since $k_1, k_2, \ldots, k_t$ is the list of all integers $k$ such that $0 \le k < m$ and $(k, m) = 1$, and since each $r_i$ satisfies these conditions, it follows that each $r_i$ must be equal to some $k_j$. If we can show that the numbers $r_1, r_2, \ldots, r_t$ are all different, then our proof that $r_1, r_2, \ldots, r_t$ is a rearrangement of $k_1, k_2, \ldots, k_t$ will be complete. Suppose that $r_i = r_j$ for some $i \ne j$. Then subtracting the equations $ak_i = q_i m + r_i$ and $ak_j = q_j m + r_j$ gives
$$a(k_i - k_j) = (q_i - q_j)m.$$
Therefore, $m \mid a(k_i - k_j)$. Since $a$ is prime to $m$, it follows from Theorem 5-2.6 that $m \mid (k_i - k_j)$. However, $i \ne j$ implies $k_i \ne k_j$, and $0 \le k_i, k_j < m$ yields $-m < k_i - k_j < m$. Hence, $0 < |k_i - k_j| < m$. Therefore, $m$ cannot divide $k_i - k_j$. This contradiction proves that the numbers $r_1, r_2, \ldots, r_t$ are all different, and that the list $r_1, r_2, \ldots, r_t$ is the same (in possibly different order) as $k_1, k_2, \ldots, k_t$. In particular, by the commutative law of multiplication,
$$r_1 r_2 \cdots r_t = k_1 k_2 \cdots k_t.$$
Consequently, since $ak_i \equiv r_i \pmod{m}$,
$$k_1 k_2 \cdots k_t = r_1 r_2 \cdots r_t \equiv (ak_1)(ak_2) \cdots (ak_t) = a^t (k_1 k_2 \cdots k_t) \pmod{m}.$$
It only remains to observe that the product $k_1 k_2 \cdots k_t$ can be cancelled from each side of this congruence. In fact, no prime factor of $m$ divides any of the integers $k_1, k_2, \ldots, k_t$, since these numbers are relatively prime to $m$. Thus, $m$ has no prime factor in common with the product $k_1 k_2 \cdots k_t$. That is, $(k_1 k_2 \cdots k_t, m) = 1$, and the cancellation is permissible by (5-6.4d). Thus, $1 \equiv a^t = a^{\varphi(m)} \pmod{m}$.

This proof can be illustrated by carrying it out in a particular numerical case. Let $m = 14$. Then the integers in the range from 0 to 13 which are relatively prime to 14 are 1, 3, 5, 9, 11, and 13. These can be taken as the numbers $k_1, k_2, \ldots, k_t$ in the proof of Theorem 5-8.6. In this example, $\varphi(14) = 6$. Let $a = -5$. Then the numbers $ak_i$ are
$$-5, \quad -15, \quad -25, \quad -45, \quad -55, \quad -65.$$
The division algorithm gives
$$-5 = (-1)14 + 9, \quad -15 = (-2)14 + 13, \quad -25 = (-2)14 + 3,$$
$$-45 = (-4)14 + 11, \quad -55 = (-4)14 + 1, \quad -65 = (-5)14 + 5.$$
Therefore in our special case, the numbers $r_1, r_2, \ldots, r_t$ occurring in the proof of Theorem 5-8.6 are 9, 13, 3, 11, 1, and 5. This agrees with the general result that $r_1, r_2, \ldots, r_t$ is a rearrangement of $k_1, k_2, \ldots, k_t$. To conclude the illustration, note that
$$1 \cdot 3 \cdot 5 \cdot 9 \cdot 11 \cdot 13 = 9 \cdot 13 \cdot 3 \cdot 11 \cdot 1 \cdot 5 \equiv [(-5)1][(-5)3][(-5)5][(-5)9][(-5)11][(-5)13] = (-5)^6 (1 \cdot 3 \cdot 5 \cdot 9 \cdot 11 \cdot 13) \pmod{14}.$$
Since $1 \cdot 3 \cdot 5 \cdot 9 \cdot 11 \cdot 13$ is relatively prime to 14, it follows that
$$(-5)^6 \equiv 1 \pmod{14}.$$
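The rearrangement that drives the proof of Euler's theorem can be reproduced mechanically for the case $m = 14$, $a = -5$. In the Python sketch below, `euler_rearrangement` is a hypothetical name; the operator `%` returns the nonnegative remainder of the division algorithm when the modulus is positive, even for negative numbers.

```python
from math import gcd


def euler_rearrangement(a, m):
    """Return the lists k_1..k_t and r_1..r_t from the proof of Euler's theorem."""
    ks = [k for k in range(m) if gcd(k, m) == 1]
    rs = [(a * k) % m for k in ks]     # remainder of a*k_i on division by m
    return ks, rs


ks, rs = euler_rearrangement(-5, 14)
print(ks)
print(rs)
print(sorted(rs) == ks, pow(-5, len(ks), 14))
```

The two lists contain the same numbers in different orders, and the final power is the residue asserted by the theorem for this case.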

If the natural number $m$ in Euler's theorem is a prime $p$, then $\varphi(m) = \varphi(p) = p - 1$, and the theorem asserts that if $(a, p) = 1$, then $a^{p-1} \equiv 1 \pmod{p}$. This is exactly the statement of Fermat's theorem.

In order to use Euler's theorem, it is necessary to know the value of the totient $\varphi(m)$. For small values of $m$, $\varphi(m)$ can be obtained by counting the numbers from 0 to $m - 1$ which are relatively prime to $m$. However, if $m$ is large, this procedure is impractical. Fortunately there is a convenient formula for $\varphi(m)$.

THEOREM 5-8.7. If $m$ is a natural number different from 1, and if $m = p_1^{e_1} p_2^{e_2} \cdots p_r^{e_r}$, where $p_1, p_2, \ldots, p_r$ are distinct primes, and the exponents $e_1, e_2, \ldots, e_r$ are positive, then
$$\varphi(m) = p_1^{e_1 - 1}(p_1 - 1)\, p_2^{e_2 - 1}(p_2 - 1) \cdots p_r^{e_r - 1}(p_r - 1).$$
We will not give a proof of this theorem (however, see Problems 14, 15, and 16 below).

Euler's theorem has numerous applications. For example, it provides another method of solving linear congruences of the type discussed in Theorem 5-7.1:
$$ax \equiv b \pmod{m}, \quad \text{where} \quad (a, m) = 1.$$
Indeed, if we let $x = a^{\varphi(m) - 1} b$, then
$$ax = a \cdot a^{\varphi(m) - 1} \cdot b = a^{\varphi(m)} \cdot b \equiv 1 \cdot b = b \pmod{m}.$$

EXAMPLE 1. Consider the congruence
$$15x \equiv 6 \pmod{22}.$$
We have $\varphi(22) = \varphi(2 \cdot 11) = 1 \cdot 10 = 10$. Then $6 \cdot 15^9$ is a solution of the congruence. Since $15^2 = 225 \equiv 5 \pmod{22}$, $15^4 \equiv 5^2 \equiv 3 \pmod{22}$, and $15^8 \equiv 3^2 = 9 \pmod{22}$, we have $6 \cdot 15^9 \equiv 6 \cdot 15 \cdot 9 = 90 \cdot 9 \equiv 2 \cdot 9 = 18 \pmod{22}$. Therefore, $x = 18$ is the smallest nonnegative solution of the congruence. This method of solving linear congruences is very often not the easiest. If $15x \equiv 6 \pmod{22}$, then $5x \equiv 2 \equiv 90 \pmod{22}$, and $x \equiv 18 \pmod{22}$.
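The method of Example 1 can be sketched in Python. The sketch assumes the standard product formula for the totient, namely that a prime power $p^e$ dividing $m$ exactly contributes the factor $p^{e-1}(p - 1)$, and it finds the prime factors by trial division. The names `totient` and `solve_by_euler` are hypothetical.

```python
def totient(m):
    """phi(m), computed from the prime factorization of m by trial division."""
    result, n, p = 1, m, 2
    while p * p <= n:
        if n % p == 0:
            e = 0
            while n % p == 0:          # strip out the full power of p
                n //= p
                e += 1
            result *= p ** (e - 1) * (p - 1)
        p += 1
    if n > 1:                          # a single prime factor remains
        result *= n - 1
    return result


def solve_by_euler(a, b, m):
    """Least nonnegative x with a*x = b (mod m), assuming (a, m) = 1."""
    return (pow(a, totient(m) - 1, m) * b) % m


print(totient(22), solve_by_euler(15, 6, 22))
```

The three-argument `pow` reduces modulo $m$ after every multiplication, which plays the same role as the repeated squaring of 15 carried out by hand in the example.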

Another application of Euler's theorem is the reduction of large powers of a number modulo m.

EXAMPLE 2. Suppose that we wish to find the least nonnegative integer to which $5^{221}$ is congruent modulo 18. Since $\varphi(18) = 6$ and $(5, 18) = 1$, Euler's theorem yields $5^6 \equiv 1 \pmod{18}$. Since $221 = 36 \cdot 6 + 5$, we obtain
$$5^{221} = (5^6)^{36} \cdot 5^5 \equiv 5^5 \pmod{18}.$$
Finally, $5^2 \equiv 7 \pmod{18}$, $5^4 \equiv 49 \equiv 13 \pmod{18}$, and $5^5 \equiv 5 \cdot 13 = 65 \equiv 11 \pmod{18}$. Thus, 11 is the least nonnegative integer to which $5^{221}$ is congruent modulo 18.

EXAMPLE 3. What are the last two decimal digits of the number $3^{119}$? This is equivalent to the problem of finding the least nonnegative integer to which $3^{119}$ is congruent modulo 100. Since $(3, 100) = 1$ and $\varphi(100) = \varphi(2^2 \cdot 5^2) = 40$, we have $3^{40} \equiv 1 \pmod{100}$. Thus, $3 \cdot 3^{119} = (3^{40})^3 \equiv 1 \pmod{100}$. Consequently, by Theorem 5-7.1, $3^{119} \equiv r \pmod{100}$, where $r$ is any solution of $3x \equiv 1 \pmod{100}$. Since $100 = 33 \cdot 3 + 1$, we obtain $1 \equiv 3(-33) \pmod{100}$. Hence, $3^{119} \equiv -33 \equiv 67 \pmod{100}$, so that the last two decimal digits of $3^{119}$ are 67.
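The reduction used in Examples 2 and 3 amounts to replacing the exponent by its remainder on division by $\varphi(m)$. A small Python sketch, in which `reduce_power` is a hypothetical name and the value of $\varphi(m)$ is supplied by the caller, is the following.

```python
def reduce_power(a, n, m, phi_m):
    """Least nonnegative residue of a**n modulo m, assuming (a, m) = 1
    and that phi_m is the totient of m."""
    return pow(a, n % phi_m, m)    # a**phi_m = 1 (mod m) by Euler's theorem


print(reduce_power(5, 221, 18, 6))      # Example 2
print(reduce_power(3, 119, 100, 40))    # Example 3
```

Python's built-in `pow(a, n, m)` gives the same residues directly, which provides an independent check on the hand computations.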

DEFINITION 5-8.8. Let $m$ be a natural number and let $a$ be an integer such that $(a, m) = 1$. The order of $a$ modulo $m$ (or the exponent to which $a$ belongs modulo $m$) is the smallest natural number $d$ such that $a^d \equiv 1 \pmod{m}$.

By Theorem 5-8.6, the set of all natural numbers $n$ such that $a^n \equiv 1 \pmod{m}$ is nonempty, since in fact $\varphi(m)$ belongs to this set. Therefore, $d$ is well defined and $d \le \varphi(m)$. It is clear from Definition 5-8.8 and Theorem 5-6.3 that if $a \equiv b \pmod{m}$, then $a$ and $b$ have the same order modulo $m$.

THEOREM 5-8.9. Let $d$ be the order of the integer $a$ modulo $m$. If $a^n \equiv 1 \pmod{m}$, then $d \mid n$. In particular, $d \mid \varphi(m)$.

Proof. By the division algorithm, $n = qd + r$, where $q$ is a nonnegative integer and $0 \le r < d$. Consequently,
$$a^r \equiv a^r (a^d)^q = a^{qd + r} = a^n \equiv 1 \pmod{m}.$$
Since $d$ is the smallest positive exponent such that $a^d \equiv 1 \pmod{m}$, it follows that $r = 0$. That is, $d \mid n$. The last statement of the theorem is a consequence of Theorem 5-8.6.

It is possible to find the order modulo $m$ of an integer $a$ by trial, provided that $m$ is small. For example, if $m$ is 5, then $\varphi(m) = 4$, and the numbers which are relatively prime to $m$ are 1, 2, 3, and 4. If $d$ is the order of $a$ modulo 5, then $d \mid 4$, by Theorem 5-8.9. Hence, $d = 1$, 2, or 4. Clearly, 1 belongs to the exponent 1. Also, $2^2 = 4 \equiv -1 \pmod{5}$, $3^2 = 9 \equiv -1 \pmod{5}$, $4^2 = 16 \equiv 1 \pmod{5}$. Thus, 4 belongs to the exponent 2, while 2 and 3 belong to the exponent 4 modulo 5. The problem of finding the order of $a$ modulo $m$ can be difficult if $a$ and $m$ are large.

By Theorem 5-8.9, the largest possible order of an integer modulo $m$ is $\varphi(m)$. If the integer $a$ is relatively prime to $m$ and $a$ belongs to the exponent $\varphi(m)$ modulo $m$, then $a$ is called a primitive root modulo $m$. For example, 2 and 3 are primitive roots modulo 5. It is possible to prove that if $m$ is a prime, then there are exactly $\varphi(\varphi(m))$ primitive roots modulo $m$ among the natural numbers $1, 2, \ldots, m - 1$. However, if $m$ is composite, there may not be any primitive roots for this modulus. For example, every odd integer belongs to one of the exponents 1, 2, or 4 modulo 16, and $\varphi(16) = 8$.
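Finding orders "by trial" is straightforward to mechanize for small moduli. In the Python sketch below the names `order` and `primitive_roots` are hypothetical, and the totient is obtained by direct counting as in Definition 5-8.5.

```python
from math import gcd


def order(a, m):
    """Smallest natural number d with a**d = 1 (mod m), assuming (a, m) = 1."""
    if gcd(a, m) != 1:
        raise ValueError("the order is defined only when (a, m) = 1")
    d, power = 1, a % m
    while power != 1 % m:
        power = (power * a) % m
        d += 1
    return d


def primitive_roots(m):
    """All a with 1 <= a < m, (a, m) = 1, whose order modulo m equals phi(m)."""
    units = [k for k in range(1, m) if gcd(k, m) == 1]
    phi_m = sum(1 for k in range(m) if gcd(k, m) == 1)
    return [a for a in units if order(a, m) == phi_m]


print([order(a, 5) for a in (1, 2, 3, 4)])
print(primitive_roots(5), primitive_roots(16))
```

The loop in `order` terminates because Euler's theorem guarantees that some power of $a$ is congruent to 1 when $(a, m) = 1$.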

An amusing application of Theorem 5-8.9 concerns the perfect shuffling of cards. Consider a deck of $2m$ cards. Let the cards of the deck be numbered from top to bottom: $1, 2, 3, \ldots, m, m + 1, \ldots, 2m$. The deck is split into two equal piles, the first pile consisting of the cards $1, 2, 3, \ldots, m$ in order, and the second pile consisting of the cards $m + 1, m + 2, \ldots, 2m$ in order. A perfect shuffle results if the cards are shuffled together from the bottom up, alternating a card from each pile, and beginning with the first pile. After a perfect shuffle, the arrangement of the cards will be changed from $1, 2, 3, \ldots, m, m + 1, \ldots, 2m$ to $m + 1, 1, m + 2, 2, \ldots, 2m, m$. A card numbered $1, 2, \ldots, m$ which was in position $i$ before the perfect shuffle is in position $2i$ after the shuffle. A card numbered $m + 1, m + 2, \ldots, 2m$ which was in position $i$ before the shuffle is in position $2i - (2m + 1)$ afterwards. Note that for $1 \le i \le m$,
$$0 < 2i < 2m + 1,$$
while for $m + 1 \le i \le 2m$,
$$0 < 2i - (2m + 1) < 2m + 1.$$
Thus, in every case the $i$th card goes into position $r_1$, where $r_1$ is the remainder obtained on dividing $2i$ by $2m + 1$. Hence,
$$r_1 \equiv 2i \pmod{2m + 1}.$$
A second perfect shuffle will send a card which is now in position $r_1$ into position $r_2$, where
$$r_2 \equiv 2r_1 \equiv 2 \cdot 2i = 2^2 i \pmod{2m + 1}.$$
In general, after $n$ perfect shuffles, the $i$th card will be in position $r_n$, where
$$r_n \equiv 2^n i \pmod{2m + 1}.$$

The question now arises: what is the least number of perfect shuffles required to return a deck of $2m$ cards to its original order? The answer is plainly the smallest positive integer $n$ satisfying
$$2^n i \equiv i \pmod{2m + 1}$$
for $i = 1, 2, 3, \ldots, m, m + 1, \ldots, 2m$. Since this congruence must hold in particular for $i = 1$, it is necessary that
$$2^n \equiv 1 \pmod{2m + 1}.$$
On the other hand, if this latter congruence is satisfied, then
$$2^n i \equiv i \pmod{2m + 1}$$
for every $i$, by Theorem 5-6.3(f). Thus, the positive integer which is the answer to the problem is the order of 2 modulo $2m + 1$. [Note that Definition 5-8.8 applies since $(2, 2m + 1) = 1$.]

Suppose now that we are considering an ordinary deck of 52 cards. Then $m = 26$, and $2m + 1 = 53$. By Theorem 5-8.9, the order of 2 modulo 53 is a divisor of $\varphi(53) = 52$. That is, the order of 2 is one of the numbers 1, 2, 4, 13, 26, or 52. Clearly $2^1$, $2^2$, and $2^4$ are not congruent to 1 modulo 53. Also, $2^6 = 64 \equiv 11 \pmod{53}$, $2^{12} \equiv 121 \equiv 15 \pmod{53}$, and therefore $2^{13} \equiv 30 \pmod{53}$. Finally, $2^{26} = (2^{13})^2 \equiv 30^2 = 900 = 53 \cdot 17 - 1 \equiv -1 \pmod{53}$. Thus, the order of 2 modulo 53 is 52. Consequently, it follows from our general result that 52 perfect shuffles are required to return an ordinary card deck to its original order.
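The claim that the number of perfect shuffles equals the order of 2 modulo $2m + 1$ can be tested by simulating the shuffle directly. The Python sketch below represents a deck as a list read from top to bottom; the function names are hypothetical.

```python
def perfect_shuffle(deck):
    """One perfect shuffle: 1, ..., 2m becomes m+1, 1, m+2, 2, ..., 2m, m."""
    m = len(deck) // 2
    first, second = deck[:m], deck[m:]
    shuffled = []
    for x, y in zip(second, first):    # card from second pile, then first pile
        shuffled.extend([x, y])
    return shuffled


def shuffles_to_restore(num_cards):
    """Number of perfect shuffles needed to return the deck to its original order."""
    original = list(range(1, num_cards + 1))
    deck, count = perfect_shuffle(original), 1
    while deck != original:
        deck = perfect_shuffle(deck)
        count += 1
    return count


def order_of_two(modulus):
    """Order of 2 modulo an odd modulus greater than 1."""
    d, power = 1, 2 % modulus
    while power != 1:
        power = (2 * power) % modulus
        d += 1
    return d


print(perfect_shuffle([1, 2, 3, 4, 5, 6]))
print(shuffles_to_restore(52), order_of_two(53))
```

The simulation and the number-theoretic computation agree for the 52-card deck, and the same comparison can be made for any even deck size.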


l. Solve the following linear congruences, using Euler's theorem.

(a) 682 13 (mod 19) (c) 2 6 9 1 x - l l ( m ~ d 9 )


99999.

(b) 502 -21 (mod 33) (d) 11x-25(mod12)

2. Find the last three decimal digits of the following numbers: 32m00,71610,

3. Find the last two decimal digits of gg9.

4. Prove that for every natural number $a$, $a$ and $a^5$ have the same final decimal digit.

5. Show that if $p$ is an odd prime which does not divide the integer $a$, then either $a^{(p-1)/2} \equiv 1 \pmod{p}$, or $a^{(p-1)/2} \equiv -1 \pmod{p}$.

6. Find $\varphi(m)$ for $m = 1$, 2, 7, 8, 9, 10, 12, and 16 by listing the numbers $k$ which satisfy $0 \le k < m$ and $(k, m) = 1$.

7. Illustrate the proof of Euler's theorem, Theorem 5-8.6, for the particular case $m = 15$, $a = 4$.

8. (a) Find the order modulo 16 of the numbers 1, 3, 5, 7, 9, 11, 13, and 15. (b) Find the order modulo 15 of the numbers 1, 2, 4, 7, 8, 11, 13, and 14.

9. Let $p$ be a prime, let $e$ be a positive exponent, and let $a$ and $b$ be integers such that $a \equiv b \pmod{p^e}$. Prove that $a^p \equiv b^p \pmod{p^{e+1}}$. [Hint: Write $a = b + cp^e$, $c \in Z$, and expand $a^p = (b + cp^e)^p$ by the binomial theorem.]

10. Show that if $a$ is an odd number, then
$$a^{2^n} \equiv 1 \pmod{2^{n+2}}$$
for every natural number $n$. What does this imply about the existence of primitive roots modulo powers of 2?

11. Using the result of Problem 9, deduce the following case of Euler's theorem from Fermat's theorem. Let $p$ be a prime, let $e$ be a positive exponent, and let $a$ be an integer which is not divisible by $p$. Then
$$a^{p^{e-1}(p - 1)} \equiv 1 \pmod{p^e}.$$

12. How many perfect shuffles are required to return a deck of 46 cards to their original order? How many for a deck of 22 cards?

13. Suppose that $p$ is a prime and that $a$ is a primitive root modulo $p$.
(a) Show that
$$1^n + 2^n + \cdots + (p - 1)^n \equiv 1^n + a^n + a^{2n} + \cdots + a^{(p-2)n} \pmod{p}.$$
(b) Use (a) to prove that if $(p - 1) \nmid n$, then
$$1^n + 2^n + \cdots + (p - 1)^n \equiv 0 \pmod{p}.$$
(c) Show that if $(p - 1) \mid n$, then
$$1^n + 2^n + \cdots + (p - 1)^n \equiv -1 \pmod{p}.$$
(d) Determine $1^n + 2^n + \cdots + 52^n \pmod{53}$ for all $n$.

The following three problems lead to a proof of Theorem 5-8.7. Accordingly, this theorem should not be used to prove any of the statements in these problems.

14. Let $p$ be a prime and let $e$ be a positive exponent. Prove that
$$\varphi(p^e) = p^{e-1}(p - 1).$$
[Hint: First show that the number of integers $k$ such that $0 \le k < p^e$ and $p \mid k$ is $p^{e-1}$.]

15. Let $m$ and $n$ be natural numbers which are relatively prime. Let $r_1, r_2, \ldots, r_{\varphi(m)}$ be all of the different integers $r$ such that $0 \le r < m$, and $(r, m) = 1$, and let $s_1, s_2, \ldots, s_{\varphi(n)}$ be all of the different integers $s$ such that $0 \le s < n$, and $(s, n) = 1$.
(a) Show that for any integers $a$ and $b$, if $c = ma + nb$, then $(c, mn) = 1$ if and only if $(a, n) = 1$ and $(b, m) = 1$.
(b) Show that if $ms_i + nr_j \equiv ms_k + nr_l \pmod{mn}$, then $i = k$ and $j = l$.
(c) Show that if $c$ is an integer such that $(c, mn) = 1$, then for some $i$ and $j$, $c \equiv ms_i + nr_j \pmod{mn}$.
(d) Prove that $\varphi(mn) = \varphi(m) \cdot \varphi(n)$. [Hint: Using (a), (b), and (c), show that for each integer $k$ such that $0 \le k < mn$ and $(k, mn) = 1$, there is exactly one pair $(i, j)$ of indices such that
$$k \equiv ms_i + nr_j \pmod{mn},$$
and every pair $(i, j)$ occurs this way.]

16. Using the results of Problems 14 and 15, prove Theorem 5-8.7.

CHAPTER 6

THE RATIONAL NUMBERS


6-1 Basic properties of the rational numbers. The positive rational fractions were used by some of the earliest civilizations on earth long before negative numbers were introduced. There are records which indicate that the Babylonians employed symbols for the fractions 1/2, 1/3, and 2/3 as long ago as 2400 B.C. Before 1650 B.C., the Egyptians devised a curious system of representing certain fractions as sums of reciprocals of distinct natural numbers. For example,

By the time Euclid wrote his Elements (about 300 B.C.), rational arithmetic had been developed to almost the same form that we know today. Of course, the negative rational numbers came later.

It is not surprising that the invention of fractions occurred early in the history of our culture. The use of numbers for measuring length must have developed naturally from their use as counting devices. If a trader wished to buy a certain amount of cloth, he had to have some way of giving the seller a description of how much cloth was wanted. A convenient measure was the number of arm lengths (approximately, the number of yards). With the need for more accuracy came the necessity of measuring fractional parts of unit lengths. A tailor who could make 3 robes from 10 yards of cloth would not want to buy 4 yards to make a single robe. He would need some way to express the length, $3\frac{1}{3}$ yards. Such needs must have led to the early invention of "rulers" and "yard sticks." Through the years, devices for measuring distance have progressed from the crude "measuring sticks" to the finest microscopic gauges.

Let us review the facts about rational numbers which are usually discussed in elementary algebra courses. As we have seen, the main reason for enlarging the ring of integers to the rational numbers is to make division by natural numbers possible. Thus, ideally the system $Q$ of rational numbers should satisfy the following conditions.

(6-1.1). Properties of Q.
(a) $Q$ is an ordered integral domain.
(b) $Q$ contains $Z$ as a subring. That is, $Z \subseteq Q$, and the ordering and the operations of addition, multiplication, and negation in $Z$ agree with the ordering and operations in $Q$.


(c) If $a \in Z$ and $n \in N$, then the equation $nr = a$ has a solution $r \in Q$, that is, the quotient $a/n$ exists in $Q$ (see Definition 4-4.9).
(d) Every element $r$ of $Q$ is a quotient $a/n$ for some $a \in Z$ and $n \in N$, that is, $r$ satisfies $nr = a$ for some $n \in N$, $a \in Z$.

The property (d) implies that $Q$ is the "smallest" system which satisfies the requirements (a), (b), and (c). This is stated more exactly in the following theorem.

THEOREM 6-1.2. Let $A$ be an ordered integral domain containing the ring $Z$ of all integers as a subring. Suppose that for each $a \in Z$ and $n \in N$, the quotient $a/n$ exists in $A$. That is, $A$ satisfies (a), (b), and (c) of (6-1.1). Let $B$ be the set of all elements $r \in A$ which satisfy $nr = a$ for some $n \in N$ and $a \in Z$. Then $B$ is a subring of $A$, and $B$ satisfies (6-1.1a, b, c, d). Moreover, any ring which satisfies all of the conditions of (6-1.1) is isomorphic to $B$.

Since we will not use this theorem, its proof will be omitted. The conditions (c) and (d) of (6-1.1) lead to the familiar method of representing all rational numbers. In fact, every rational number is a quotient $a/n$, $a \in Z$, $n \in N$, and conversely, for every $a \in Z$, $n \in N$, there is a rational number which is the quotient $a/n$. It should be remembered that $a/n$ is no more than a way of designating a particular rational number. We think of $a/n$ as the expression representing the solution of the equation $nx = a$; that is, $a/n$ represents the number obtained from the division of $a$ by $n$, just as $a \cdot n$ represents the number obtained by multiplying $a$ and $n$. Of course, each rational number has infinitely many representations in this form. For example
$$\frac{1}{2}, \quad \frac{2}{4}, \quad \frac{3}{6}$$

all denote the same rational number. The conditions (6-1.1) are the specifications which must be met by the system of rational numbers, and as we noted in Theorem 6-1.2, any two rings which satisfy these conditions are isomorphic. However, it is not immediately obvious that there is any system at all which satisfies (6-1.1). In Section 6-5, starting with $N$ and $Z$, we will construct an ordered integral domain $Q$ which does satisfy (6-1.1). By using (6-1.1a, b), it is possible to discover the rules of operation for rational numbers, represented as quotients.


(6-1.3). Rules of operation for rational numbers. Let $a$ and $b$ be integers, and let $m$ and $n$ be natural numbers. Then
(a) $a/m = b/n$ if and only if $na = mb$;
(b) $a/m < b/n$ if and only if $na < mb$;
(c) $(a/m) + (b/n) = (na + mb)/mn$;
(d) $-(a/m) = (-a)/m$;
(e) $(a/m) \cdot (b/n) = (ab)/(mn)$.

Proof. Let $r$ denote the rational number $a/m$, and let $s$ represent $b/n$. That is, $r$ and $s$ are the unique rational numbers satisfying
$$mr = a, \qquad ns = b.$$
We first prove (a). Note that
$$mnr = na, \qquad mns = mb.$$
If $r = s$, then $mnr = mns$. Hence $na = mb$. Conversely, if $na = mb$, then $mnr = mns$. Since $m$ and $n$ are natural numbers, $mn \neq 0$, and by the cancellation law in an integral domain, $r = s$.

The proof of (b) is similar to the proof of (a). If $r < s$, then $na = mnr < mns = mb$, since $mn$ is positive in $Z$, and therefore also positive in $Q$. Conversely, if $na < mb$, then $mnr < mns$. This implies that $r < s$. For otherwise, $s \le r$ and hence $mns \le mnr$, by Theorem 4-6.2.

The proofs of (c), (d), and (e) are very simple. Note that by Definition 4-2.1 and Theorem 4-2.4
$$(mn)(r + s) = mnr + mns = na + mb,$$
$$m(-r) = -(mr) = -a,$$
$$(mn)(rs) = (mr)(ns) = ab.$$
Thus, by Definition 4-4.9,
$$(a/m) + (b/n) = r + s = (na + mb)/mn,$$
$$-(a/m) = -r = (-a)/m,$$
$$(a/m) \cdot (b/n) = rs = (ab)/(mn).$$
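The rules (6-1.3) can be tried out on pairs of integers. The sketch below is illustrative only: the function names are hypothetical, and Python's `fractions.Fraction` is used merely as an independent reference against which the five rules are compared.

```python
from fractions import Fraction


def equal(a, m, b, n):
    """Rule (a): a/m = b/n if and only if na = mb."""
    return n * a == m * b


def less(a, m, b, n):
    """Rule (b): a/m < b/n if and only if na < mb."""
    return n * a < m * b


def add(a, m, b, n):
    """Rule (c): numerator and denominator of a/m + b/n."""
    return (n * a + m * b, m * n)


def neg(a, m):
    """Rule (d): numerator and denominator of -(a/m)."""
    return (-a, m)


def mul(a, m, b, n):
    """Rule (e): numerator and denominator of (a/m)(b/n)."""
    return (a * b, m * n)


# Compare every rule with Fraction arithmetic on a small grid of values.
for a in range(-4, 5):
    for b in range(-4, 5):
        for m in range(1, 5):
            for n in range(1, 5):
                r, s = Fraction(a, m), Fraction(b, n)
                assert equal(a, m, b, n) == (r == s)
                assert less(a, m, b, n) == (r < s)
                assert Fraction(*add(a, m, b, n)) == r + s
                assert Fraction(*neg(a, m)) == -r
                assert Fraction(*mul(a, m, b, n)) == r * s
```

Note that the denominators produced by the rules are products of natural numbers, so they are again natural numbers, as (6-1.1c) requires.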

Although the system of rational numbers is constructed in order to divide integers by natural numbers, it happens that this system enjoys the strongest possible divisibility property: if $r$ and $s$ are rational numbers, and if $r \neq 0$, then $r$ divides $s$ in $Q$. This fact is easily proved using the properties of $Q$ given above. By (6-1.1), we can write $r = a/m$, $s = b/n$, where $a$ and $b$ are integers and $m$ and $n$ are natural numbers. Then it is well known that $s/r = mb/na$ is the required quotient. However, if $a$ is negative, then $na$ is not a natural number, and it does not follow from (6-1.1c) that $mb/na$ is in $Q$. This defect is easily corrected by noting that $s/r$ can just as well be represented by $mab/na^2$, where $a^2$ is a positive integer, and consequently $na^2 \in N$. This is the idea involved in the proof of the important divisibility property, which we now formulate more carefully.

THEOREM 6-1.4. If $r$ and $s$ are rational numbers, and if $r \neq 0$, then $r$ divides $s$ in $Q$; that is, there is a rational number $t$ such that $r \cdot t = s$.

Proof. By (6-1.1), we can write
$$r = a/m, \qquad s = b/n,$$
where $a$ and $b$ are suitably chosen integers, and $m$ and $n$ are natural numbers. Thus, the rational numbers $r$ and $s$ satisfy
$$mr = a, \qquad ns = b.$$
Since $r \neq 0$, it follows that $a \neq 0$. Thus, by Theorem 4-5.5, $a^2$ is a positive integer. Therefore $na^2 \in N$. Evidently, $mab \in Z$. By (6-1.1c), there is a rational number $t$ such that $na^2 t = mab$. Therefore,
$$(mna^2)rt = (mr)(na^2 t) = a(mab) = (ma^2)b = (ma^2)(ns) = (mna^2)s.$$
Since $Q$ is an integral domain, and $mna^2 \in N$ (hence, $mna^2 \neq 0$), it follows that $rt = s$.

It should be noted that in several places in the above proof, we have used the fact that multiplication in $Q$ of elements belonging to $Z$ agrees with the usual multiplication in $Z$, that is, $Z$ is a subring of $Q$.
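The device used in the proof of Theorem 6-1.4, writing $s/r$ as $mab/na^2$ so that the denominator is a natural number, can be illustrated as follows. This is a minimal sketch; the function name `quotient` is hypothetical, and `fractions.Fraction` serves only as a reference for the check.

```python
from fractions import Fraction


def quotient(a, m, b, n):
    """Represent s/r, where r = a/m is nonzero and s = b/n, as the pair
    (m*a*b, n*a*a). The denominator n*a*a is a natural number even when
    a is negative, which is the point of the construction."""
    if a == 0:
        raise ZeroDivisionError("r = a/m must be nonzero")
    return (m * a * b, n * a * a)


# r = -3/4 and s = 5/7: the naive form mb/na would have a negative denominator.
numerator, denominator = quotient(-3, 4, 5, 7)
assert denominator > 0
assert Fraction(numerator, denominator) == Fraction(5, 7) / Fraction(-3, 4)
```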

1. Prove the cancellation law for quotients: if m and n are natural numbers and a is an integer, then ma/mn = a/n in Q.

2. Prove that each positive rational number can be represented uniquely in the form m/n, where m and n are relatively prime natural numbers.

3. Prove that if $a \in Z$, then $a = an/n$ in $Q$ for every $n \in N$.

4. The Farey series $F_k$ of order $k$ is the ascending sequence of rational fractions $m/n$, where $0 \le m \le n \le k$ and $(m, n) = 1$. For instance, $F_4$ is 0/1, 1/4, 1/3, 1/2, 2/3, 3/4, 1/1. Write the Farey series $F_5$ and Fs.

5. Prove that $a/m - b/n = (na - mb)/mn$ (where $a, b \in Z$, $m, n \in N$).


6. Prove that if $r$ and $s$ are rational numbers, then
$$(-r)/s = r/(-s) = -(r/s) \quad \text{and} \quad (-r)/(-s) = r/s.$$

7. Point out the places where the condition (6-1.1b) was used in the proof of Theorem 6-1.4.

8. Prove Theorem 6-1.2.

6-2 Fields. The theory of ordered integral domains developed in Chapter 4 can be applied to the ring $Q$ of rational numbers. However, $Q$ also satisfies the divisibility property, given in Theorem 6-1.4, which does not hold in all integral domains (for example, it is not satisfied in $Z$). It is to be expected that certain properties of the rational numbers are consequences of Theorem 6-1.4 (together with the other properties of integral domains). If this is so, then these properties will hold as well for any ordered integral domain which satisfies this divisibility condition. There are examples of such integral domains. For us, the most interesting one other than $Q$ itself is the ring $R$ of all real numbers. Thus, as in Section 4-2, it appears that by introducing a new abstract concept, we will be able to prove theorems in a general setting which will apply to a number of important special cases.

DEFINITION 6-2.1. A field is a commutative ring $F$ such that
(a) $F$ contains at least one element different from 0;
(b) if $x \in F$, $y \in F$, and $x \neq 0$, then there is an element $z \in F$ such that $x \cdot z = y$.

EXAMPLE 1. The rings $Q$ and $R$ are fields, as we have mentioned. So is the system $C$ of all complex numbers. The ring of integers is not a field.

EXAMPLE 2. Let $F = \{0, a\}$, where $0 + 0 = 0$, $0 + a = a + 0 = a$, $a + a = 0$; $0 \cdot 0 = a \cdot 0 = 0 \cdot a = 0$, $a \cdot a = a$; $-0 = 0$, $-a = a$. Then $F$ is a field.

EXAMPLE 3. Let $F = \{0, 1, u, v\}$. Define addition and multiplication by the tables

     + | 0  1  u  v          . | 0  1  u  v
    ---+-----------         ---+-----------
     0 | 0  1  u  v          0 | 0  0  0  0
     1 | 1  0  v  u          1 | 0  1  u  v
     u | u  v  0  1          u | 0  u  v  1
     v | v  u  1  0          v | 0  v  1  u

Define $-x = x$ for $x$ = 0, 1, $u$, and $v$. Then $F$ is a field, as the reader can easily verify.
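The verification suggested in Example 3 is a finite check, so it can be carried out by brute force. The sketch below encodes the standard four-element field with elements 0, 1, u, v as strings; the encoding and the function names are assumptions made for illustration.

```python
from itertools import product

ELEMENTS = "01uv"
# Row x of each table lists x + y (or x * y) for y = 0, 1, u, v in that order.
ADD = {"0": "01uv", "1": "10vu", "u": "uv01", "v": "vu10"}
MUL = {"0": "0000", "1": "01uv", "u": "0uv1", "v": "0v1u"}


def add(x, y):
    return ADD[x][ELEMENTS.index(y)]


def mul(x, y):
    return MUL[x][ELEMENTS.index(y)]


def is_field():
    """Check the commutative ring laws and Definition 6-2.1(b) exhaustively."""
    for x, y, z in product(ELEMENTS, repeat=3):
        if add(add(x, y), z) != add(x, add(y, z)):
            return False  # addition is not associative
        if mul(mul(x, y), z) != mul(x, mul(y, z)):
            return False  # multiplication is not associative
        if mul(x, add(y, z)) != add(mul(x, y), mul(x, z)):
            return False  # distributive law fails
    for x, y in product(ELEMENTS, repeat=2):
        if add(x, y) != add(y, x) or mul(x, y) != mul(y, x):
            return False  # commutativity fails
    for x in ELEMENTS:
        if add(x, "0") != x or add(x, x) != "0":
            return False  # 0 is not a zero element, or -x = x fails
    # Definition 6-2.1(b): for x != 0 and any y, some z satisfies x * z = y.
    for x in ELEMENTS[1:]:
        for y in ELEMENTS:
            if not any(mul(x, z) == y for z in ELEMENTS):
                return False
    return True


print(is_field())
```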


The definition of a field does not explicitly require the existence of an identity element. However, it is not hard to show that every field does contain an identity. In fact a much stronger statement can be made.

THEOREM 6-2.2. Every field is an integral domain.

Proof. Suppose that $F$ is a field. We first prove that $F$ contains a nonzero identity element. Let $x$ be any nonzero element of $F$. There is such an $x$ by Definition 6-2.1(a). By Definition 6-2.1(b), there is an $e \in F$ such that $x \cdot e = x$. Since $x \neq 0$, it is evident that $e$ cannot be 0. To prove that $e$ is an identity element, it is only necessary to show that $y \cdot e = y$ for all $y \in F$. Note that since $F$ is commutative, $e \cdot y = y \cdot e$ for all $y$. If $y$ is any element of $F$, there exists $z \in F$ such that $x \cdot z = y$. Consequently,
$$y = x \cdot z = (x \cdot e) \cdot z = (e \cdot x) \cdot z = e \cdot (x \cdot z) = e \cdot y = y \cdot e.$$
Thus, $e$ is a nonzero identity in $F$. To complete the proof of the theorem, it will now be enough (by Theorem 4-4.5) to prove that $F$ has no proper divisors of zero. Suppose that $y$ is a divisor of zero in $F$. That is, $x \cdot y = 0$ for some nonzero element $x$. By Definition 6-2.1(b), there is an element $w \in F$ such that $x \cdot w = e$, the identity of $F$. Consequently,
$$y = y \cdot e = y \cdot (x \cdot w) = (y \cdot x) \cdot w = (x \cdot y) \cdot w = 0 \cdot w = 0.$$
Thus, if $y$ is a divisor of zero, then $y = 0$, that is, $F$ contains no proper divisors of zero.

As we mentioned in Chapter 4, the identity element in any ring is usually denoted by 1. Of course, this custom is observed for fields in particular. Since every field $F$ is an integral domain, there is, for each $x \neq 0$ and $y$ in $F$, only one $z \in F$ satisfying the equation $x \cdot z = y$. Thus, as in any integral domain, we can write $z$ as the quotient $y/x$. The property which distinguishes fields from arbitrary integral domains is the fact that in a field the quotient $y/x$ always exists when $x \neq 0$.

DEFINITION 6-2.3. Let $F$ be a field. Let $x$ be a nonzero element of $F$. The quotient $1/x$ is called the inverse of $x$ in $F$, and is denoted by $x^{-1}$.

Thus, $x^{-1}$ is the unique element satisfying $x \cdot x^{-1} = 1$. It should be emphasized that $x^{-1}$ is not defined for $x = 0$. Moreover, if $x \neq 0$, then $x^{-1} \neq 0$.

THEOREM 6-2.4. Let $x, y, x_1, x_2, \ldots, x_n$ ($n \ge 1$) be nonzero elements of a field $F$. Then
(a) $(x \cdot y)^{-1} = x^{-1} \cdot y^{-1}$;
(b) $(x_1 \cdot x_2 \cdot \ldots \cdot x_n)^{-1} = x_1^{-1} \cdot x_2^{-1} \cdot \ldots \cdot x_n^{-1}$;
(c) $(x^{-1})^{-1} = x$;
(d) $1^{-1} = 1$; $(-1)^{-1} = -1$.


Proof. The proofs of all of these statements [except (b), which can be obtained by induction from (a)] are based on the above observation about the uniqueness of the inverse, namely, if $s$ and $t$ are elements of $F$ such that $s \cdot t = 1$, then $t = s^{-1}$. For example, to prove (a), let $s = x \cdot y$, $t = x^{-1} \cdot y^{-1}$. Then $s \cdot t = (x \cdot y) \cdot (x^{-1} \cdot y^{-1}) = (x \cdot x^{-1}) \cdot (y \cdot y^{-1}) = 1 \cdot 1 = 1$. Thus, $x^{-1} \cdot y^{-1} = t = s^{-1} = (x \cdot y)^{-1}$. The proof of (c) is similar. By definition of the inverse, $x \cdot x^{-1} = 1$. Thus, $x^{-1} \cdot x = 1$. Therefore, $x$ is the inverse of $x^{-1}$. That is, $x = (x^{-1})^{-1}$. The proofs of (b) and (d) are left for the reader to complete.

Quotients can be expressed in terms of the inverse operation: if $x \neq 0$, then
$$\frac{y}{x} = x^{-1} \cdot y = y \cdot x^{-1}. \qquad (6\text{-}1)$$
Indeed, $x \cdot (x^{-1} \cdot y) = (x \cdot x^{-1}) \cdot y = 1 \cdot y = y$. This observation is useful, because inverses are easier to manipulate than quotients.

DEFINITION 6-2.5. Let $x$ be a nonzero element of the field $F$. Define $x^n$ for natural numbers $n$ by the inductive conditions
$$x^1 = x, \qquad x^{n+1} = x^n \cdot x.$$
Define $x^0 = 1$, and
$$x^{(-n)} = (x^{-1})^n.$$
For a natural number $n$, $x^n$ is the product
$$\underbrace{x \cdot x \cdot \ldots \cdot x}_{n \text{ factors}}.$$
We have briefly discussed powers of real numbers in Section 2-6. The new concept which Definition 6-2.5 introduces is the idea of zero and negative exponents. That is, if $x$ is a nonzero element of a field, then the object $x^a$ is defined for every integer $a$.

(6-2.6). Rules of exponents. Let $x$ and $y$ be nonzero elements of a field $F$. Let $a$ and $b$ be arbitrary integers. Then
(a) $x^a \cdot x^b = x^{a+b}$;
(b) $(x^a)^b = x^{a \cdot b}$;
(c) $(x \cdot y)^a = x^a \cdot y^a$;
(d) $(x^{-1})^a = x^{(-a)}$;
(e) $1^a = 1$.

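The identities (d) and (e) are easy consequences of Definition 6-2.5. Also, if $a$ and $b$ are natural numbers, then the identities (a), (b), and (c) can be proved by induction on $a$ (see Problem 1, Section 2-6). To extend these results to arbitrary integers involves a somewhat tedious checking of cases. As an illustration, let us consider the identity (a) for the case in which $a$ is a positive integer and $b$ is a negative integer. Then $a = n$ and $b = -m$, where $n$ and $m$ are natural numbers. If $n > m$, then $n = (n - m) + m$. Hence,
$$x^a \cdot x^b = x^{(n-m)+m} \cdot (x^{-1})^m = x^{n-m} \cdot x^m \cdot (x^{-1})^m = x^{n-m} \cdot (x \cdot x^{-1})^m = x^{n-m} \cdot 1^m = x^{n-m} = x^{a+b},$$
assuming that all of the identities above have been verified in the cases where $a$ and $b$ are natural numbers. If $n = m$, the proof is simpler:
$$x^a \cdot x^b = x^n \cdot (x^{-1})^n = (x \cdot x^{-1})^n = 1^n = 1 = x^0 = x^{a+b}.$$
If $n < m$, then $m = (m - n) + n$ and
$$x^a \cdot x^b = x^n \cdot (x^{-1})^{(m-n)+n} = x^n \cdot (x^{-1})^n \cdot (x^{-1})^{m-n} = (x^{-1})^{m-n} = x^{(-(m-n))} = x^{a+b}.$$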

An ordered integral domain which satisfies Definition 6-2.1 (b) is naturally called an ordered field. The most important examples of ordered fields are the systems of rational numbers and real numbers.
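Definition 6-2.5 and the rules of exponents (6-2.6) can be tested in the ordered field of rational numbers. The following is a minimal sketch, assuming `fractions.Fraction` as a model of $Q$; the function name `power` is hypothetical.

```python
from fractions import Fraction


def power(x, a):
    """x**a for nonzero x and any integer a, following Definition 6-2.5:
    x^0 = 1, x^(n+1) = x^n * x, and x^(-n) = (x^(-1))^n."""
    if x == 0:
        raise ZeroDivisionError("x must be nonzero")
    if a == 0:
        return Fraction(1)
    if a < 0:
        return power(1 / Fraction(x), -a)
    return power(x, a - 1) * x


# Check rules (a) through (e) of (6-2.6) on a few rational numbers.
values = [Fraction(2, 3), Fraction(-5, 4), Fraction(7)]
for x in values:
    for y in values:
        for a in range(-4, 5):
            for b in range(-4, 5):
                assert power(x, a) * power(x, b) == power(x, a + b)  # (a)
                assert power(power(x, a), b) == power(x, a * b)      # (b)
            assert power(x * y, a) == power(x, a) * power(y, a)      # (c)
            assert power(1 / x, a) == power(x, -a)                   # (d)
            assert power(Fraction(1), a) == 1                        # (e)
```

The check covers the mixed-sign cases for rule (a) that the text works through by hand.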

THEOREM 6-2.7. Let $F$ be an ordered field. Let $x$ and $y$ be elements of $F$, and let $a$ and $b$ be integers.
(a) If $x > 0$, then $x^{-1} > 0$; if $x < 0$, then $x^{-1} < 0$.
(b) If $0 < x < y$, or $x < y < 0$, then $x^{-1} > y^{-1}$.
(c) If $x > 0$, then $x^a > 0$; if $x < 0$ and $a$ is odd, then $x^a < 0$; if $x < 0$ and $a$ is even, then $x^a > 0$.
(d) If $0 < x < y$, and $a > 0$, then $x^a < y^a$; if $0 < x < y$, and $a < 0$, then $x^a > y^a$.
(e) If $0 < x < 1$ and $a < b$, then $x^b < x^a$; if $1 < x$, and $a < b$, then $x^a < x^b$.

Proof. If $x > 0$, then $x^{-1} \neq 0$. Hence, either $x^{-1} > 0$, or $x^{-1} < 0$. If $x^{-1} < 0$, then $1 = x \cdot x^{-1} < 0$, which is false by Theorem 4-5.5. Thus, $x^{-1} > 0$. Similarly, if $x < 0$, then $x^{-1}$ must also be $< 0$. Suppose that $0 < x < y$. Then $x^{-1}$ and $y^{-1}$ are positive. Assume that $x^{-1} \le y^{-1}$. Then $1 = x \cdot x^{-1} < y \cdot x^{-1} \le y \cdot y^{-1} = 1$, which is a contradiction. Therefore, $y^{-1} < x^{-1}$. If $x < y < 0$, then $0 < -y < -x$. Hence, $(-x)^{-1} < (-y)^{-1}$. Therefore, $-(-y)^{-1} < -(-x)^{-1}$. However,
$$-(-y)^{-1} = (-1) \cdot (-y)^{-1} = (-1)^{-1} \cdot (-y)^{-1} = [(-1) \cdot (-y)]^{-1} = y^{-1},$$
and similarly $-(-x)^{-1} = x^{-1}$. The remaining statements are left for the reader to prove.

EXAMPLE 4. Often problems of simplifying the description of sets of real or rational numbers involve inverses. The rules given in Theorem 6-2.7 are useful in solving such problems. Consider the set $\{x \in R \mid x + 2x^{-1} > 3\}$. If $x + 2x^{-1} > 3$, then $x \le 0$ is impossible. Hence, $x + 2x^{-1} > 3$ if and only if $x > 0$ and $x(x + 2x^{-1}) > 3x$, that is, $x^2 + 2 > 3x > 0$. This inequality holds if and only if $(x - 2)(x - 1) > 0$ and $x > 0$. Since $x > 0$, the product $(x - 2)(x - 1)$ is positive if and only if $0 < x < 1$ or $x > 2$. Hence,
$$\{x \in R \mid x + 2x^{-1} > 3\} = \{x \in R \mid 0 < x < 1\} \cup \{x \in R \mid x > 2\}.$$
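The conclusion of Example 4, that $x + 2x^{-1} > 3$ holds exactly when $0 < x < 1$ or $x > 2$, can be spot-checked on rational sample points. This is only a sanity check on a finite sample, not a proof; the function names are hypothetical.

```python
from fractions import Fraction


def in_original_set(x):
    """Membership in {x : x + 2/x > 3}; x = 0 is excluded since 1/x is undefined."""
    return x != 0 and x + 2 / x > 3


def in_simplified_set(x):
    """Membership in the simplified description: 0 < x < 1 or x > 2."""
    return 0 < x < 1 or x > 2


# Sample the interval [-5, 5] in steps of 1/8, including the boundary points 1 and 2.
samples = [Fraction(k, 8) for k in range(-40, 41)]
assert all(in_original_set(x) == in_simplified_set(x) for x in samples)
```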

l . Simplify the following expressions.

(a) ( x - l . y-l)-l . ( x - l ) - l (b) ( x y x - l ) - l . ( x - l . y-1 . x - l ) ( 4 [ ( x - l ) y]-l [ x ' (Y-91 2. If A is a subring of a field F, and if 1 E A, does it follow that A is a field? An integral domain? Support your answer by examples or proofs.

3. Show that the systems in Examples 2 and 3 are fields.


4. Complete the proof of Theorem 6-2.4.

5. Prove by induction on n that if x, x i , 22, and a, al, a2, . . . , a, are integers, then

. . . , x, are elements of a field F,

6. Simplify the following expressions. (b)

fi

x2)


7. Simplify the descriptions of the following sets. (a) (x E Rlx-l > 4) (b) {x E Rlx (c) {x E Rlx x-l 23.. 8. Let F be an ordered field. Suppose that x E F, y E F, and x # O . Show that (a) Ix-lI = I X I - ~ , (b) Ix/ Y I = lxl/lyl. 9. Let x, y, x, and w be elements of an ordered field. Suppose that x # O and w # O. Prove the following. (a) If x and w have the same sign, then x/x < y/w if and only if xw < yx. (b) If x and w have opposite sign, then x/x < y/w if and only if x w > yx. 10. Prove Theorem 6-2.7(c), (d), and (e). 11. Prove Theorem 6-2.6 in detail. 12. Show that if A is a commutative ring with an identity element e such that for every x # O in A, there is an element y E A such that x y = e, then A is a field. 13. Prove that if F is a field, and if A is a ring which is isomorphic to F (see Definition 4-2.7)) then A is a field.

>

<

6-3 The characteristic of integral domains and fields. The rules of exponents (6-2.6) resemble the identities listed in Theorem 4-3.3 for repeated sums. Indeed, the operation of forming powers of $x$ is the multiplicative analogue of the additive operation $ax$ defined in Definition 4-3.2. Recall that if $a = n$ is a natural number and $x$ is an element of a ring, then
$$ax = \underbrace{x + x + \cdots + x}_{n \text{ summands}};$$
if $a = 0$, then $ax = 0$; if $a = -n$ is a negative integer, then
$$ax = \underbrace{(-x) + (-x) + \cdots + (-x)}_{n \text{ summands}}.$$
There is an important classification of integral domains and fields which is based on the operation $ax$.

THEOREM 6-3.1. Let $A$ be an integral domain. Then exactly one of the following statements is true.
(a) If $x \neq 0$ in $A$, then $nx \neq 0$ for all $n \in N$.
(b) There is a unique prime $p$ such that $px = 0$ for all $x \in A$, and if $nx = 0$ for $x \neq 0$, then $p$ divides $n$ in $Z$.


Proof. Suppose that $nx = 0$ for some $n \in N$ and $x \neq 0$. By the well-ordering principle for $N$, there is a smallest natural number $p$ which has this property. That is, if $m < p$, $m \in N$, then $my \neq 0$ for all nonzero $y \in A$, but there is an $x \neq 0$ in $A$ such that $px = 0$. Let $e$ be the identity* of $A$. By Theorem 4-3.3, $(pe) \cdot x = p(e \cdot x) = px = 0$. Thus, since $A$ is an integral domain and $x \neq 0$, we obtain $pe = 0$. Therefore, for any $y \in A$, $py = p(e \cdot y) = (pe) \cdot y = 0$.

We next show that $p$ is a prime. Suppose otherwise. Then there are natural numbers $r$ and $s$ such that $1 < r < p$, $1 < s < p$, and $p = r \cdot s$. Therefore, $(re) \cdot (se) = (rs)e^2 = pe = 0$. Since $A$ is an integral domain, this implies that either $re = 0$ or $se = 0$. However, by its choice, $p$ was the smallest natural number such that $pe = 0$. Thus, $p$ must be a prime. It is clear that the prime $p$ is unique.

Finally, suppose that $nx = 0$ for some natural number $n$ and $x \neq 0$ in $A$. If $p \nmid n$, then $(n, p) = 1$. Thus, integers $a$ and $b$ exist satisfying $an + bp = 1$. Therefore, by Theorem 4-3.3,
$$x = 1x = (an + bp)x = (an)x + (bp)x = a(nx) + b(px) = a0 + b0 = 0.$$
This contradiction shows that $p \mid n$.

To review our proof, we have shown that if statement (a) is not true, then statement (b) is true. Therefore, either (a) is true or else (b) is true. Obviously both statements cannot be true for the same integral domain $A$.

DEFINITION 6-3.2. Let $A$ be an integral domain. The characteristic of $A$ is defined to be zero if $A$ satisfies Theorem 6-3.1(a), and it is defined to be the prime $p$ if $A$ satisfies Theorem 6-3.1(b). We say that $A$ is of prime characteristic if its characteristic is some prime $p$.

The characteristic of the rings of integers, the rational numbers, the real numbers, and the complex numbers is zero. In fact a more general result can be proved.

THEOREM 6-3.3. The characteristic of every ordered integral domain $A$ is zero.

Proof. Suppose that $x > 0$ in $A$. Then
$$2x = x + x > 0 + x = x, \qquad 3x = 2x + x > x + x = 2x,$$
and so on. In particular $nx \neq 0$ for all $n \in N$. If $x < 0$, then $-x > 0$, and $-(nx) = n(-x) \neq 0$ for all $n \in N$. Therefore, $nx \neq 0$ for all $n \in N$ and all nonzero $x \in A$. Consequently, Theorem 6-3.1(a) is satisfied, and the characteristic of $A$ is zero.

* In order to avoid confusing the identity of A with the natural number 1, we do not follow the custom of denoting the identity of A by 1 in this proof.


The fields in Examples 2 and 3 of Section 6-2 have characteristic 2. It is possible to construct fields of arbitrary prime characteristic.

EXAMPLE 1. Let $p$ be a prime. Define $Z_p = \{0, 1, 2, \ldots, p - 1\}$. Instead of the usual addition, negation, and multiplication of integers, define operations $\oplus$, $\ominus$, and $\odot$ by
$$a \oplus b = d,$$
where $d$ is the unique integer such that $0 \le d < p$, and $d \equiv a + b \pmod p$;
$$\ominus a = e,$$
where $e$ is the unique integer such that $0 \le e < p$, and $e \equiv -a \pmod p$;
$$a \odot b = f,$$
where $f$ is the unique integer such that $0 \le f < p$, and $f \equiv a \cdot b \pmod p$.

The fact that $Z_p$ is a ring with respect to these operations is a consequence of the elementary properties of integers and the relation of congruence. For example, to prove that $(a \oplus b) \oplus c = a \oplus (b \oplus c)$, let $a + b \equiv d \pmod p$, where $0 \le d < p$. Then $(a \oplus b) \oplus c = e$, where $e$ is determined by the conditions $0 \le e < p$ and $d + c \equiv e \pmod p$. Thus, $(a + b) + c \equiv e \pmod p$. In exactly the same way, $a \oplus (b \oplus c) = f$, where $f$ is the integer determined by the conditions $0 \le f < p$ and $a + (b + c) \equiv f \pmod p$. Since $(a + b) + c = a + (b + c)$, it follows that $e \equiv f \pmod p$. However, $e$ and $f$ satisfy $0 \le e < p$ and $0 \le f < p$. Thus, $e = f$, and consequently $a \oplus (b \oplus c) = (a \oplus b) \oplus c$. The other ring postulates are proved similarly.

Note that $Z_p$ is commutative and has 1 as an identity. These facts do not depend on $p$ being a prime. However, the assumption that $p$ is prime is needed to show that $Z_p$ is a field. Suppose that $a \neq 0$ in $Z_p$. Then $0 < a < p$, so that $p$ does not divide $a$. Thus, since $p$ is prime, $(a, p) = 1$. Consequently if $b \in Z_p$, there exists (by Theorem 5-7.2) an integer $m$ such that $am \equiv b \pmod p$. Let $c$ be the unique integer such that $0 \le c < p$ and $m \equiv c \pmod p$. Then $c \in Z_p$, and $a \cdot c \equiv a \cdot m \equiv b \pmod p$. Since $0 \le b < p$, it follows that $a \odot c = b$. Hence, by Definition 6-2.1, $Z_p$ is a field.

Our final observation concerning $Z_p$ is that its characteristic is $p$. In fact
$$\underbrace{1 \oplus 1 \oplus \cdots \oplus 1}_{p \text{ summands}}$$
is the integer $d$ such that $0 \le d < p$ and
$$d \equiv \underbrace{1 + 1 + \cdots + 1}_{p \text{ summands}} = p \pmod p.$$
Thus, $p1 = 0$ in $Z_p$.
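The construction of $Z_p$ translates directly into code. The sketch below is an illustration under stated assumptions: the names `make_zp`, `has_all_quotients`, and `p_times_one` are hypothetical, and Python's `%` operator (which returns a value in the range 0 to $p - 1$ for a positive modulus) stands in for "the unique integer $d$ with $0 \le d < p$ congruent to the given number."

```python
def make_zp(p):
    """Return the three operations of Z_p as functions on {0, 1, ..., p - 1}."""
    def add(a, b):
        return (a + b) % p

    def neg(a):
        return (-a) % p

    def mul(a, b):
        return (a * b) % p

    return add, neg, mul


def has_all_quotients(p):
    """Definition 6-2.1(b): for every a != 0 and every b, some c has a * c = b."""
    _, _, mul = make_zp(p)
    return all(
        any(mul(a, c) == b for c in range(p))
        for a in range(1, p)
        for b in range(p)
    )


def p_times_one(p):
    """Add the identity 1 to itself p times using the addition of Z_p."""
    add, _, _ = make_zp(p)
    total = 0
    for _ in range(p):
        total = add(total, 1)
    return total


print(has_all_quotients(7))  # 7 is prime
print(has_all_quotients(6))  # 6 is not prime: 2 * c = 1 has no solution
print(p_times_one(7))        # p1 computed in Z_7
```

The same operations define a commutative ring for any modulus greater than 1; primality enters only in the quotient check, as the text observes.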


The fields $Z_p$, defined in the above example, are almost as important in mathematics as the fields $Q$ and $R$. They connect the theory of numbers with the powerful methods of abstract algebra. In Chapter 9, we will see an example of how a rather simple theorem about abstract fields can be translated into an important result of number theory when it is specialized to a statement about $Z_p$. For the sake of future reference, we collect some useful facts about $Z_p$.

THEOREM 6-3.4.
(a) The elements of $Z_p$ are the integers $0, 1, 2, \ldots, p - 1$.
(b) $Z_p$ is a field with respect to the operations $\oplus$, $\ominus$, $\odot$.
(c) If $a$ and $b$ are elements of $Z_p$, then
$$a + b \equiv a \oplus b, \qquad -a \equiv \ominus a, \qquad a \cdot b \equiv a \odot b \pmod p.$$

1. Verify Theorem 6-3.4(c).

2. Complete the proof that $Z_p$ is a commutative ring.

3. (a) Show that the field $Z_2$ is isomorphic to the field described in Example 2, Section 6-2. (b) Show that the field given in Example 3, Section 6-2, is not isomorphic to any of the fields $Z_p$.

4. Show that a field which has only a finite number of elements cannot have zero characteristic.

5. Let $A$ be an integral domain of prime characteristic $p$. Show that for any $x$ and $y$ in $A$, $(x + y)^p = x^p + y^p$.

6. Let A be an integral domain with identity element e. Let B be the subring of A consisting of al1 elements ae with a E Z (see Problem 10, Section 4-6). (a) Show that if the characteristic of A is zero, then the correspondence

is an isomorphism between Z and B. (b) Show that if the characteristic of A is the prime p, then the correspondence

is an isomorphism between Zp and B.


6-4 Equivalence relations. In Section 6-1, the system of rational numbers was described informally. Some basic properties of this system were listed in (6-1.1), and the consequences of these properties were discussed. We now face the problem of constructing the rational numbers. Our goal is to define a set of objects, operations of addition, negation, and multiplication, and an ordering of the objects, such that the conditions of (6-1.1) are satisfied. What the objects called rational numbers really are does not matter very much. They stand for different things in different applications. The important thing is to satisfy the conditions of (6-1.1). It is a fact (and not very difficult to prove) that any two systems which satisfy these conditions are isomorphic in the sense of Section 3-3. This remark explains why the problem of constructing the rational numbers is equivalent to the construction of a system which satisfies these conditions.

How should a set of objects which satisfies the conditions of (6-1.1) be defined? By (6-1.1d), each rational number can be represented by an expression $a/m$, where $a$ represents an integer, $m$ represents a natural number, and the solidus bar / is a punctuation mark which separates $a$ and $m$. This describes the symbol for a rational number in a purely formal way, and points out the fact that a rational number is really determined by the pair of numbers $a$ and $m$, written in a definite order. Therefore, the symbol $(a, m)$, which denotes an ordered pair of numbers, can be used to represent the rational number $a/m$. The set of all ordered pairs $(a, m)$ with $a \in Z$ and $m \in N$ is a definite collection of objects. Our discussion suggests that this set of ordered pairs might be a likely candidate for the set $Q$ of rational numbers. However, there is a difficulty with this choice for $Q$. It follows from (6-1.3) that different expressions $a/m$ and $b/n$ can represent the same rational number. For example, 1/2 = 2/4 = 3/6. In fact, by (6-1.3a), $a/m = b/n$ if $na = mb$. Thus, in the collection of ordered pairs $(a, m)$, we must agree to somehow identify the pairs $(a, m)$ and $(b, n)$ when $na = mb$. This identification procedure is based on the important concept of an equivalence relation.

Although our immediate interest is the construction of the system of rational numbers, the methods which will be discussed in the present section are applicable to very general situations. The term "relation" is familiar to almost everyone. Mathematicians use many particular relations such as inequalities of numbers, inclusions of sets, congruence of integers, and similarity of geometric figures. In addition to these specific examples, the general notion of a relation on a set is of great importance in mathematics.

DEFINITION 6-4.1. Let $S$ be any set. A relation on $S$ is a set $T$ of ordered pairs $(x, y)$ of elements of $S$.

At first, this definition seems to be far from the usual meaning of the term "relation." It would be less strange if we said that the relation "corresponds to" the set $T$ of all ordered pairs $(x, y)$ of elements of $S$ which stand in the given relation. For example, the relation $<$ on the set of all integers corresponds to the set of all ordered pairs $(a, b)$ which satisfy $a < b$ (or more explicitly, $b - a \in N$). The trouble is that there are relations on sets of objects which correspond to the same set of ordered pairs, but would be considered different by familiar standards. Consider a set of three brothers, Jack, Jerry, and Jim. Suppose that Jack is 8 years old and 5 feet tall, Jerry is 5 years old and 4 feet tall, while Jim is 3 years old and 3 feet tall. Then the relations "is older than" and "is taller than" applied to this set both correspond to the following set of ordered pairs: {(Jack, Jerry), (Jack, Jim), (Jerry, Jim)}. From a mathematician's standpoint, these two relations are the same even though the concepts "older" and "taller" would lead to different relations on another collection of people.

Simplification is a characteristic of mathematics. Considering two relations to be identical if the corresponding sets of ordered pairs are the same is a typical example of simplifying a familiar concept so that it can be used with mathematical precision. This is the justification for Definition 6-4.1.

Although a relation is defined to be a set $T$ of ordered pairs, it is often convenient to express the fact that a certain pair $(x, y)$ belongs to $T$ by writing $x < y$, $x = y$, $x \sim y$, $x \equiv y$, or $x \approx y$. That is, a symbol such as $\sim$ is associated with the relation $T$, and if $(x, y) \in T$ then we write
$$x \sim y$$
and speak of the relation $\sim$ on $S$ (meaning, of course, the relation $T$). The particular relations of congruence and inequality which we have considered in previous sections have all been defined and expressed in this way. The most important and useful relations in mathematics satisfy certain special conditions. The equivalence relations which we will discuss in this section are defined to be relations which satisfy three particular conditions.

DEFINITION 6-4.2. Let $S$ be a set. Let $T$ be a relation on $S$. Then $T$ is an equivalence relation* if
(a) $(x, x) \in T$ for all $x \in S$;
(b) if $(x, y) \in T$, then $(y, x) \in T$;
(c) if $(x, y) \in T$ and $(y, z) \in T$, then $(x, z) \in T$.

* This notion should not be confused with the concept of the "equivalence of two sets" which was introduced in Section 1-2. The relation of equivalence of sets defined in Definition 1-2.3 is a particular equivalence relation on the class of al1 sets.


If we write $x \sim y$ to stand for $(x, y) \in T$, then the conditions (a), (b), and (c) take a more familiar form:
(a') $x \sim x$ for all $x \in S$;
(b') if $x \sim y$, then $y \sim x$;
(c') if $x \sim y$ and $y \sim z$, then $x \sim z$.

The condition (a) or its equivalent (a') is called the "reflexive law." The properties (b) and (b') are called the "law of symmetry," while (c) and (c') are called the "transitive law." Because of the transitive law, it makes sense to write a sequence of equivalences
$$x \sim y \sim z \sim w$$
as we have done in the case of inequalities or congruences. Also, by the reflexivity, such a sequence can include equalities
$$x \sim y = z \sim w.$$
The convenience of writing sequences of equivalences and equalities is one reason why the notation $a \sim b$ is preferred to $(a, b) \in T$.

EXAMPLE 1. Let $S$ be any set. Let $T = \{(x, x) \mid x \in S\}$. Then $T$ is an equivalence relation on $S$. This equivalence relation is ordinary equality, since $(x, y) \in T$ if and only if $x = y$.

EXAMPLE 2. Let $m$ be a natural number. Let $T = \{(a, b) \mid a \in Z,\ b \in Z,\ a \equiv b \pmod{m}\}$. Then $T$ is the equivalence relation on the set $Z$ of all integers (see Theorem 5-6.3), which was called "congruence modulo $m$" in Chapter 5.

EXAMPLE 3. Let $S$ be the set of all ordered pairs $(m, n)$ of natural numbers. Let $T$ be the collection of all ordered pairs of ordered pairs $((m, n), (k, l))$ satisfying $m + l = n + k$. Then $T$ is an equivalence relation on $S$.

EXAMPLE 4. Let $S$ be the set of all ordered pairs $(a, m)$ where $a \in Z$ and $m \in N$. Define $T$ to be the set of all ordered pairs of ordered pairs

$$((a, m), (b, n))$$

such that $na = mb$. Then $T$ is an equivalence relation on $S$. To prove the transitive law for example, suppose that

$$((a, m), (b, n)) \in T \quad \text{and} \quad ((b, n), (c, k)) \in T.$$

Then $na = mb$ and $kb = nc$. Therefore, $nka = mkb = mnc = nmc$. Hence, by cancellation, $ka = mc$. Thus, by definition,

$$((a, m), (c, k)) \in T.$$


The last example is the equivalence relation which will lead to the construction of the system $Q$ of rational numbers. For convenience, we will use the symbol $\doteq$ to denote this relation. That is, write

$$(a, m) \doteq (b, n) \quad \text{if} \quad na = mb.$$

Example 3 is similar to Example 4. It is possible to use this equivalence relation to obtain a new construction of the integers from the natural numbers. The process is analogous to the construction of $Q$ using the equivalence relation of Example 4, which will be given in the next section.

DEFINITION 6-4.3. Let $S$ be a set and let $\sim$ be an equivalence relation on $S$. For $x \in S$, define

$$[x] = \{y \in S \mid x \sim y\}.$$

The set $[x]$ is called the equivalence class of the element $x$ with respect to the equivalence relation $\sim$. It should be remembered that the definition of $[x]$ depends on the equivalence relation $\sim$, although this fact is not indicated by the notation.

THEOREM 6-4.4. Let $\sim$ be an equivalence relation on a set $S$. Then
(a) $x \in [x]$;
(b) $[x] = [y]$ if and only if $x \sim y$;
(c) if $y \in [x]$, then $[x] = [y]$;
(d) for any $x \in S$ and $y \in S$, either $[x] = [y]$ or $[x] \cap [y] = \emptyset$;
(e) $S = \cup(\{[x] \mid x \in S\})$.

Proof. By Definition 6-4.2(a'), $x \sim x$. Thus by Definition 6-4.3, $x \in [x]$. To prove (b), suppose first that $[x] = [y]$. Then $y \in [y] = [x]$. Thus, by Definition 6-4.3, $x \sim y$. Conversely, suppose that $x \sim y$. If $z \in [y]$, then $y \sim z$. Therefore, by Definition 6-4.2(c'), $x \sim z$. Hence, $z \in [x]$. This shows that $x \sim y$ implies $[y] \subseteq [x]$. Similarly, $y \sim x$ implies $[x] \subseteq [y]$. Consequently, since $y \sim x$ follows from $x \sim y$ by Definition 6-4.2(b'), we obtain $x \sim y$ implies $[x] = [y]$. This proves (b). If $y \in [x]$, then by Definition 6-4.3, $x \sim y$. Therefore, $[x] = [y]$ by what has just been shown. This proves (c). In order to obtain (d), assume that $[x] \cap [y] \neq \emptyset$. Then there is a $z \in S$ such that $z \in [x]$ and $z \in [y]$. By the property (c) which we have just established, $[x] = [z] = [y]$. Finally (e) is evident because by Definition 6-4.3, $[x] \subseteq S$ for all $x$, so that $\cup(\{[x] \mid x \in S\}) \subseteq S$. On the other hand, by (a), if $y \in S$, then $y \in [y] \subseteq \cup(\{[x] \mid x \in S\})$. Hence, every element of $S$ belongs to the union $\cup(\{[x] \mid x \in S\})$.


EXAMPLE 5. Let $S$ be the set $Z$ of all integers and let $\sim$ be the relation of congruence modulo $m$. Then for any $a \in Z$, it is easy to see that $[a] = \{a + b \cdot m \mid b = 0, \pm 1, \pm 2, \ldots\}$. Therefore there are exactly $m$ distinct (and disjoint) equivalence classes; namely, $[0], [1], [2], \ldots, [m - 1]$. These are the sets which were denoted by $X_0, X_1, X_2, \ldots, X_{m-1}$ in Section 5-7.

EXAMPLE 6. Let $S$ be the set of all ordered pairs of natural numbers, and let $T$ be the equivalence relation on $S$ which was defined in Example 3. If $(m, n) \in S$, then $[(m, n)] = \{(k, l) \mid k - l = m - n\}$. Consequently, there is a one-to-one correspondence between the equivalence classes for this equivalence relation and the set of all integers; this correspondence is given by $[(m, n)] \leftrightarrow m - n$.

EXAMPLE 7. Let $S$ be the set $F$ of all ordered pairs $(a, m)$ with $a \in Z$, $m \in N$. The equivalence classes of $F$ with respect to the equivalence relation $\doteq$ defined in Example 4 are $[(a, m)] = \{(b, n) \mid na = mb\}$. By (6-1.3a), $na = mb$ if and only if $a/m = b/n$. Thus, $[(a, m)]$ consists of all pairs $(b, n)$ such that $b/n = a/m$. Therefore, there is a one-to-one correspondence between these equivalence classes and the rational numbers as we know them informally. This remark is the key to the formal construction of $Q$: the rational numbers are defined to be the equivalence classes of elements of $F$ with respect to $\doteq$.
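The statements of Theorem 6-4.4 can be observed experimentally for congruence modulo $m$ as in Example 5. The sketch below is an illustration under stated assumptions: since a program cannot hold all of $Z$, we work with the finite window $\{-8, \ldots, 8\}$ and with $m = 4$, both chosen by us, and the helper names are hypothetical.

```python
def equivalence_class(S, related, x):
    """The class [x] = {y in S | x ~ y} of Definition 6-4.3."""
    return frozenset(y for y in S if related(x, y))


def all_classes(S, related):
    """The set of all distinct equivalence classes of S."""
    return {equivalence_class(S, related, x) for x in S}


m = 4                       # illustrative modulus
S = set(range(-8, 9))       # a finite window of Z, assumed for the demo


def congruent(a, b):
    return (a - b) % m == 0


classes = all_classes(S, congruent)

# (d): two distinct classes never overlap.
pairwise_disjoint = all(
    A == B or not (A & B) for A in classes for B in classes
)
# (e): the union of the classes is the whole set S.
covers_S = set().union(*classes) == S

print(len(classes), pairwise_disjoint, covers_S)  # 4 True True
```

The count of classes agrees with the statement in Example 5 that there are exactly $m$ distinct classes.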

1. Find all relations on the set $\{0, 1\}$.

2. Find all equivalence relations on the set $\{0, 1, 2\}$.

3. Give examples of a relation on the set $\{0, 1, 2\}$ which satisfies the following conditions of Definition 6-4.2.
(i) none of (a), (b), and (c)
(ii) (a), but neither (b) nor (c)
(iii) (b), but neither (a) nor (c)
(iv) (c), but neither (a) nor (b)
(v) (a) and (b), but not (c)
(vi) (a) and (c), but not (b)
(vii) (b) and (c), but not (a)

4. Show that the relations of Examples 3 and 4 are equivalence relations.


5. Let $S$ be the set of all straight lines in the Euclidean plane. Let $P_0$ be a fixed point in the plane. Let $l_0$ be a fixed line in the plane. State which of the conditions (a), (b), or (c) of Definition 6-4.2 are satisfied by the following relations.
(i) $l \sim m$ if $l$ is parallel or equal to $m$
(ii) $l \sim m$ if $l$ is not parallel to $m$
(iii) $l \sim m$ if $l$ is perpendicular to $m$
(iv) $l \sim m$ if $l$ is perpendicular to $m$, or if $l$ is parallel or equal to $m$
(v) $l \sim m$ if $l$ and $m$ both pass through $P_0$
(vi) $l \sim m$ if $l$ and $m$ intersect in a point on the line $l_0$
(vii) $l \sim m$ if $l \neq m$

6. Verify directly that the equivalence classes of integers modulo m given in Example 5 satisfy the statements of Theorem 6-4.4.

7. Let $S$ be the set of all natural numbers. Define $T = \{(m, n) \mid n \text{ divides } m^k \text{ for some } k, \text{ and } m \text{ divides } n^j \text{ for some } j\}$. Show that $T$ is an equivalence relation on $S$. What are the equivalence classes $[1]$, $[2]$, and $[6]$ with respect to the equivalence relation $T$?

8. A partition $P$ of a set $S$ is a set of nonempty subsets of $S$ such that (i) if $A \in P$, $B \in P$, and $A \neq B$, then $A \cap B = \emptyset$; (ii) $\cup(P) = S$.

(a) Show that if $\sim$ is an equivalence relation on $S$, then the set of all equivalence classes in $S$ (with respect to the relation $\sim$) is a partition of $S$.
(b) Suppose that $P$ is a partition of $S$. Define $x \sim y$ if $x \in A$ and $y \in A$, where $A$ is some set of $P$. Show that $\sim$ is an equivalence relation on $S$.

6-5 The construction of Q. Let $F$ be the set of all ordered pairs $(a, m)$, with $a \in Z$, $m \in N$.

DEFINITION 6-5.1. Let $Q$ be the set of all equivalence classes $[(a, m)]$ of $F$ with respect to the equivalence relation $\doteq$ defined by $(a, m) \doteq (b, n)$ if $na = mb$.

The elements of $Q$ are called rational numbers. It is necessary now to introduce operations of addition, negation, and multiplication on the set $Q$ defined above. The discussion of Example 7, Section 6-4, suggests that $[(a, m)]$ should be interpreted as the quotient $a/m$. On the basis of this interpretation, the laws given in (6-1.3) motivate the definitions of the operations and the ordering in $Q$. For example, since $a/m + b/n = (na + mb)/mn$, it is natural to define

$$[(a, m)] + [(b, n)] = [(na + mb, mn)]. \tag{6-2}$$


A little thought is required to see that this definition makes sense. Consider a particular example:

According to (6-2),

However, we also have, for example,

so that by (6-2),

Therefore, in order to justify (6-2), it is necessary to show that $[(7, 6)] = [(42, 36)]$. By Theorem 6-4.4, this condition is equivalent to

$$(7, 6) \doteq (42, 36),$$

which is easily verified from the definition of $\doteq$, since $36 \cdot 7 = 252 = 6 \cdot 42$. Of course, in order to justify (6-2) generally, we must work with expressions which involve arbitrary numbers. However, before proceeding with this calculation, let us formulate the exact definitions of the operations and the ordering in $Q$.

DEFINITION 6-5.2. Let $r$ and $s$ be elements of $Q$. That is, $r$ and $s$ are equivalence classes of ordered pairs, with respect to the relation $\doteq$. Arbitrarily, select $(a, m) \in r$ and $(b, n) \in s$. Define
(a) $r + s = [(na + mb, mn)]$,
(b) $-r = [(-a, m)]$,
(c) $r \cdot s = [(ab, mn)]$,
(d) $r < s$ if $na < mb$.

Frequently, as in this case, mathematical definitions are made to depend on an arbitrary choice of one or more things. Whenever this happens, it is necessary to show that the object being defined does not really depend on these initial choices. If this can be proved, then the object is said to be well defined. The fact which must be proved in order to justify Definition 6-5.2 is that the equivalence classes $[(na + mb, mn)]$, $[(-a, m)]$, $[(ab, mn)]$ and the condition $na < mb$ are the same for all choices of $(a, m) \in r$ and $(b, n) \in s$.
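Before turning to the general argument, it may help to see what "well defined" means on particular pairs. The following sketch applies the formulas of Definition 6-5.2 to several representatives of two classes and checks that the outcomes agree up to equivalence. The representatives (of the classes of $1/2$ and $2/3$) and all function names are our own illustrative choices; such a check is only evidence for a few cases and is no substitute for the proof.

```python
from itertools import product


def equivalent(p, q):
    """(a, m) and (b, n) are related when n*a == m*b."""
    (a, m), (b, n) = p, q
    return n * a == m * b


def add(p, q):
    """Representative of the sum, following Definition 6-5.2(a)."""
    (a, m), (b, n) = p, q
    return (n * a + m * b, m * n)


def multiply(p, q):
    """Representative of the product, following Definition 6-5.2(c)."""
    (a, m), (b, n) = p, q
    return (a * b, m * n)


def less(p, q):
    """The ordering of Definition 6-5.2(d)."""
    (a, m), (b, n) = p, q
    return n * a < m * b


# Illustrative representatives (our choice) of two equivalence classes.
reps_r = [(1, 2), (3, 6), (5, 10)]
reps_s = [(2, 3), (4, 6), (10, 15)]

sums = [add(p, q) for p, q in product(reps_r, reps_s)]
products = [multiply(p, q) for p, q in product(reps_r, reps_s)]

well_defined_sum = all(equivalent(sums[0], t) for t in sums)
well_defined_product = all(equivalent(products[0], t) for t in products)
consistent_order = len({less(p, q) for p, q in product(reps_r, reps_s)}) == 1

print(add((1, 2), (2, 3)), add((3, 6), (4, 6)))  # (7, 6) (42, 36)
print(well_defined_sum, well_defined_product, consistent_order)  # True True True
```

With these representatives the two sums are the pairs $(7, 6)$ and $(42, 36)$, which are different pairs belonging to the same equivalence class.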

Suppose that $(a, m) \in r$, $(a', m') \in r$, $(b, n) \in s$, and $(b', n') \in s$. Then by Theorem 6-4.4,

$$(a, m) \doteq (a', m') \quad \text{and} \quad (b, n) \doteq (b', n'),$$

that is,

$$m'a = ma' \quad \text{and} \quad n'b = nb'.$$

What has to be shown is that

$$[(na + mb, mn)] = [(n'a' + m'b', m'n')],$$
$$[(-a, m)] = [(-a', m')],$$
$$[(ab, mn)] = [(a'b', m'n')],$$

and

$$na < mb \quad \text{if and only if} \quad n'a' < m'b'.$$

By Theorem 6-4.4(b) and the definition of $\doteq$, these conditions are equivalent to $m'n'(na + mb) = mn(n'a' + m'b')$, $m'(-a) = m(-a')$, $(m'n')(ab) = (mn)(a'b')$, and $na < mb$ if and only if $n'a' < m'b'$.

These results can easily be obtained from the relations $m'a = ma'$ and $n'b = nb'$. For example,

$$m'n'(na + mb) = (n'n)(m'a) + (m'm)(n'b) = (n'n)(ma') + (m'm)(nb') = mn(n'a' + m'b').$$

Also, if $na < mb$, then $m'n'na < m'n'mb$, since $m'$ and $n'$ are natural numbers. Thus, $n'n \cdot ma' < m'm \cdot nb'$, or $(mn)(n'a') < (mn)(m'b')$. Therefore, $n'a' < m'b'$. In the same way, $n'a' < m'b'$ implies $na < mb$. The remaining identities are left for the reader to check.

THEOREM 6-5.3. The set $Q$ defined in Definition 6-5.1, together with the operations and the ordering given by Definition 6-5.2, is an ordered integral domain.

Proof. The proof of this result is entirely straightforward, as the following sample indicates. Let $r$, $s$, and $t$ be elements of $Q$, that is, equivalence classes of ordered pairs. We will prove the distributive law $(r + s) \cdot t = r \cdot t + s \cdot t$. Let $(a, m) \in r$, $(b, n) \in s$, and $(c, k) \in t$. Then $(r + s) \cdot t = [(na + mb, mn)] \cdot [(c, k)]$. Note that $(na + mb, mn)$ belongs to its equivalence class $[(na + mb, mn)]$, so we may choose it to form the product $[(na + mb, mn)] \cdot [(c, k)]$. By Definition 6-5.2(c),

$$(r + s) \cdot t = [((na + mb)c, mnk)] = [(nac + mbc, mnk)].$$

On the other hand,

$$r \cdot t + s \cdot t = [(ac, mk)] + [(bc, nk)] = [(nkac + mkbc, mknk)].$$

Finally, we observe that $(nac + mbc, mnk) \doteq (nkac + mkbc, mknk)$. Indeed, $mknk(nac + mbc) = mnk(nkac + mkbc)$. Thus, by Theorem 6-4.4, $(r + s) \cdot t = r \cdot t + s \cdot t$.

The construction of $Q$ given in this section encounters the following problem: the set $Q$ defined in Definition 6-5.1 does not contain the set $Z$ of all integers. However, $Q$ does contain the subset

$$Z' = \{[(a, 1)] \mid a \in Z\},$$

which has all the properties of $Z$. Indeed, by Definition 6-5.2,

$$[(a, 1)] + [(b, 1)] = [(a + b, 1)], \quad -[(a, 1)] = [(-a, 1)], \quad [(a, 1)] \cdot [(b, 1)] = [(ab, 1)], \tag{6-3}$$

and $[(a, 1)] < [(b, 1)]$ if and only if $a < b$.

Thus, $Z'$ is a subring of $Q$. Moreover, the correspondence

$$a \leftrightarrow [(a, 1)]$$

is an isomorphism between $Z$ and $Z'$. In fact, if $a \neq b$, then $1 \cdot a \neq 1 \cdot b$, so that $(a, 1)$ is not equivalent to $(b, 1)$ under the relation $\doteq$. Hence, by Theorem 6-4.4, $[(a, 1)] \neq [(b, 1)]$. This proves that the correspondence is one-to-one. The other conditions required for an isomorphism are easily obtained from (6-3).


In order to satisfy condition (b) of (6-1.1), we will identify each equivalence class $[(a, 1)]$ with the corresponding integer $a$. That is, $[(a, 1)]$ will be considered as a new label for $a$. The process of identifying a ring $A$ (or a more general mathematical system) with a subring $B$ of another ring is used frequently in mathematical constructions. This is always possible if $A$ is isomorphic to $B$, and if we are only concerned with properties which are consequences of the operations in $A$. Indeed, from this viewpoint, there is no real difference between $A$ and $B$. The identification of $Z$ with $Z'$ carries with it the identification of $N$ with $N' = \{[(m, 1)] \mid m \in N\}$. Thus, we obtain $N \subset Z \subset Q$. It remains to show that (6-1.1c, d) are satisfied.

THEOREM 6-5.4. For any $a \in Z$ and $m \in N$,

$$[(m, 1)] \cdot [(a, m)] = [(a, 1)].$$

Proof. Since $(m, 1) \in [(m, 1)]$ and $(a, m) \in [(a, m)]$, it follows from Definition 6-5.2 that

$$[(m, 1)] \cdot [(a, m)] = [(ma, m)].$$

By the definition of the relation $\doteq$, $(ma, m) \doteq (a, 1)$. Thus, by Theorem 6-4.4(b), $[(ma, m)] = [(a, 1)]$.

From this theorem and the identification of the integer $a$ with the equivalence class $[(a, 1)]$, we obtain the result that $Q$ satisfies (6-1.1c, d).

THEOREM 6-5.5. If $a \in Z$ and $m \in N$, then $[(m, 1)]$ divides $[(a, 1)]$ in $Q$. Moreover, every element of $Q$ is of the form $[(a, 1)]/[(m, 1)]$ for some $a \in Z$ and $m \in N$.
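The identification of $Z$ with $Z'$ and the statement of Theorem 6-5.4 can be tried out on a small model of $Q$. The class below is a sketch under our own assumptions: the names `Rational` and `embed` are hypothetical, and each equivalence class is stored through one reduced representative, which is a programming convenience and not part of the construction in the text. Equality and order are tested by the cross-multiplication rules of Definitions 6-5.1 and 6-5.2.

```python
from math import gcd


class Rational:
    """The equivalence class [(a, m)] with a in Z and m in N."""

    def __init__(self, a, m):
        if m <= 0:
            raise ValueError("the second component must be a natural number")
        g = gcd(a, m)                     # store a reduced representative
        self.a, self.m = a // g, m // g

    def __eq__(self, other):              # (a, m) related to (b, n) if na = mb
        return other.m * self.a == self.m * other.a

    def __add__(self, other):             # Definition 6-5.2(a)
        return Rational(other.m * self.a + self.m * other.a, self.m * other.m)

    def __neg__(self):                    # Definition 6-5.2(b)
        return Rational(-self.a, self.m)

    def __mul__(self, other):             # Definition 6-5.2(c)
        return Rational(self.a * other.a, self.m * other.m)

    def __lt__(self, other):              # Definition 6-5.2(d)
        return other.m * self.a < self.m * other.a

    def __repr__(self):
        return f"[({self.a}, {self.m})]"


def embed(a):
    """The correspondence a <-> [(a, 1)] between Z and Z'."""
    return Rational(a, 1)


# The correspondence respects the operations and the order, as in (6-3).
print(embed(2) + embed(3) == embed(5), embed(2) * embed(3) == embed(6))  # True True
# Theorem 6-5.4 for one illustrative choice of a and m.
print(embed(4) * Rational(-3, 4) == embed(-3))  # True
```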

1. Complete the proof that the operations of Definition 6-5.2(b) and (c) are well defined.

2. Complete the proof of Theorem 6-5.3.

3. Let $A$ be a ring. Suppose that $\sim$ is an equivalence relation on $A$ such that if $x \sim x'$ and $y \sim y'$, then $x + y \sim x' + y'$, $-x \sim -x'$, and $x \cdot y \sim x' \cdot y'$. Show that the set of equivalence classes of $A$ forms a ring with the definition $[x] + [y] = [x + y]$, $-[x] = [-x]$, and $[x] \cdot [y] = [x \cdot y]$.

4. Apply the construction outlined in Problem 3 to the ring $Z$ with the equivalence relation of congruence modulo $m$. How many elements are there in the resulting ring? Show that if $m$ is a prime, then the ring obtained by this construction is a field which is isomorphic to $Z_m$.

5. Let $S$ be the set of all ordered pairs $(m, n)$ of natural numbers with the equivalence relation defined in Example 3, Section 6-4:

$$(k, l) \sim (m, n) \quad \text{if} \quad k + n = l + m.$$

Define

$$[(k, l)] + [(m, n)] = [(k + m, l + n)],$$
$$-[(m, n)] = [(n, m)],$$
$$[(k, l)] \cdot [(m, n)] = [(km + ln, kn + lm)],$$
$$[(k, l)] < [(m, n)] \quad \text{if} \quad k + n < m + l.$$

(a) Prove that these operations are well defined on the set $Z'$ of equivalence classes of $S$, and that with these operations, $Z'$ is an ordered integral domain.
(b) Show that the correspondence $[(m, n)] \leftrightarrow m - n$ is an isomorphism between $Z'$ and the ring $Z$ of all integers.

CHAPTER 7

THE REAL NUMBERS


7-1 Development of the real numbers. It is correct to say that the real number system is the foundation on which modern mathematics is built. If all of the mathematical theories which depend on real numbers were to be wiped out, then mathematics, and more generally all physical science, would be set back 500 years. This is perhaps surprising, since rigorous construction of the real numbers is a relatively recent development in the history of mathematics. Before the work of Cantor and particularly the German mathematician, Richard Dedekind (1831-1916), during the last quarter of the 19th century, the question, "What is a real number?" was largely ignored.* Real numbers were used, of course, and elaborate theories were constructed with them, but the concept of a real number was vague. Fortunately, the intuitive idea of the real number system was substantial enough so that early mathematicians were seldom led to false results by the inexactness of the definition of numbers. The house of mathematics was built, so to speak, on the forms for its foundation. The forms were filled with concrete only after part of the house was completed.

The history of real numbers begins with the discovery, sometimes attributed to the Greek mathematician Pythagoras,† that there is no rational number whose square is 2. In current mathematical terminology, this fact is expressed by saying that $\sqrt{2}$ is irrational. Our interpretation of Pythagoras' discovery is that the rational number system must be enlarged if we wish to take square roots, cube roots, etc. However, for the Greeks, the theorem of Pythagoras was important because of its geometrical consequences. It implied to them, for example, that the ratio of the diagonal of a square to its side is not a rational number. This observation was a great blow to the idealistic number mystics of the Pythagorean cult. The traditional and presumably the original proof that $\sqrt{2}$ is irrational runs as follows.

If 2 is the square of a rational number, then we can write $2 = (a/m)^2$, where $a$ and $m$ are natural numbers with no common factor. Therefore, $a^2 = 2m^2$. Thus, $a^2$ is even. Since the square of an odd number is always odd, it follows that $a$ must be even. That is, $a = 2b$, where $b \in N$. Substituting, we obtain $4b^2 = a^2 = 2m^2$. Hence, $m^2 = 2b^2$.

* However, the theory of proportions developed by the Greek mathematician Eudoxus (408-355 B.C.) can be considered as a geometrical analogue of Dedekind's development of the real numbers.
† There is evidence that this discovery was not made by Pythagoras himself, but rather by one of his followers.


Therefore, $m^2$ is even. Consequently, $m$ is even. Thus, the number $a$ has the factor 2 in common with $m$. This contradicts the fact that $a$ and $m$ were selected to be relatively prime. Hence our assumption that 2 is the square of a rational number must be false.

By using the more sophisticated facts about the integers which were established in Chapter 5, we can prove a general theorem, from which Pythagoras' result is obtained as a special instance.

THEOREM 7-1.1. Let $m$ be a natural number, and let $a$ be an integer. If there is a rational number $r$ such that

$$r^m = a,$$

then $r$ is necessarily an integer.

This theorem does not state that there is, or is not, a solution of the equation $x^m = a$. Such results depend on $m$ and $a$. For example, if $a = 4$ and $m = 2$, then $x^m = a$ has two rational solutions $x = 2$ and $x = -2$; on the other hand we have just seen that if $a = 2$ and $m = 2$, then $x^m = a$ has no solution which is a rational number. What Theorem 7-1.1 says is that in searching for a rational number $r$ such that $r^m = a$, we can restrict our attention to integers.

To prove the theorem, suppose that $r$ is a rational number such that $r^m = a$. Then $r$ can be represented as a quotient $b/n$, where $b \in Z$, $n \in N$, and $b$ is relatively prime to $n$. Consequently,

$$b^m = a\,n^m.$$

If $n \neq 1$, then there is a prime $p$ such that $p \mid n$. Consequently, $p \mid b^m$ and therefore, by (5-3.2), $p \mid b$. However, this is impossible, since $n$ and $b$ are relatively prime. Thus, $n = 1$ and $r = b \in Z$.
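Theorem 7-1.1 reduces the search for rational solutions of $x^m = a$ to a search among integers, and the latter is a finite task. The sketch below illustrates this reduction; the function names are hypothetical, and the use of a binary search is our own choice of method, not something taken from the text.

```python
def integer_root(a, m):
    """Return an integer b with b**m == a, or None if there is none.

    Assumes a is an integer and m is a natural number (m >= 1).
    """
    if a < 0:
        if m % 2 == 0:
            return None                  # an even power is never negative
        b = integer_root(-a, m)
        return None if b is None else -b
    lo, hi = 0, max(1, a)                # a nonnegative root lies in [0, max(1, a)]
    while lo <= hi:
        mid = (lo + hi) // 2
        power = mid ** m
        if power == a:
            return mid
        if power < a:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


def has_rational_root(a, m):
    """By Theorem 7-1.1, a rational m-th root of a must be an integer."""
    return integer_root(a, m) is not None


print(has_rational_root(4, 2))   # True
print(has_rational_root(2, 2))   # False
print(integer_root(-8, 3))       # -2
```

The first two lines of output correspond to the two cases $a = 4$, $m = 2$ and $a = 2$, $m = 2$ discussed after the statement of the theorem.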

Let us now see how Theorem 7-1.1 implies that $\sqrt{2}$ is irrational. Suppose that $\sqrt{2} = r \in Q$. Then $r^2 = 2$. Thus, by Theorem 7-1.1 (using the case $m = 2$, $a = 2$), it follows that $r$ is an integer. However, this is impossible, since 2 is a prime. Therefore, 2 cannot be the square of a rational number.

The discovery of the irrationality of $\sqrt{2}$ did not lead Greek mathematicians to the introduction of real numbers, although it did inspire the development of Eudoxus' theory of incommensurable line segments (which is the geometrical analogue of Dedekind's theory of real numbers, created 2200 years later). The Greeks of Euclid's era carefully separated arithmetic from geometry, and as a result, concepts such as length and


area had only a geometrical meaning for them. The use of numbers in practical computations was spurned by the Greek ruling class as being contrary to "Platonic idealism." Nevertheless, fractions were well known and commonly used in ancient Greece. In fact, there was one outstanding exception to the purism of the Greek geometers. That was Archimedes of Syracuse (287-212 B.C.), who is considered to be the greatest of all mathematicians before Isaac Newton (1642-1727). Archimedes used numbers extensively in his studies of volumes and areas, and his idea of successive approximation is close to the modern conception of real numbers.

The intuitive notion of a real number as a specific object appeared late in the Renaissance. To be sure, rational approximations of particular numbers such as roots of integers and $\pi$ were found and used by the Babylonians even before the period when Greek mathematics flourished. However, the idea of the system of all real numbers developed only after the introduction around 1600 of the familiar "decimal point" notation for decimal fractions.

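DEFINITION 7-1.2. A decimal fraction is a rational number $r$ which is either of the form

$$a_m 10^m + a_{m-1} 10^{m-1} + \cdots + a_1 10 + a_0 + b_1 10^{-1} + b_2 10^{-2} + \cdots + b_n 10^{-n},$$

where the $a$'s and $b$'s are integers between 0 and 9 inclusive, or else $r$ is the negative of such a number. If $r = a_m 10^m + a_{m-1} 10^{m-1} + \cdots + a_1 10 + a_0 + b_1 10^{-1} + b_2 10^{-2} + \cdots + b_n 10^{-n}$, then $r$ is represented by the expression

$$a_m a_{m-1} \ldots a_1 a_0 \,.\, b_1 b_2 \ldots b_n.$$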

The number of digits following the decimal point is called the number of decimal places in the representation of the decimal fraction $r$. Using Theorem 5-1.3, we can easily see that a rational number $r$ is a decimal fraction if and only if $r$ can be written in the form $r = a/10^k$, with $a \in Z$ and $k$ some nonnegative integer. Thus, not every rational number is a decimal fraction. However, every rational number can be approximated by a decimal fraction. In fact, one of our main goals after defining the real numbers is to show that every real number can be approximated by a decimal fraction.
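The criterion $r = a/10^k$ is easy to test by machine. Since $10^k = 2^k 5^k$, a rational number in lowest terms has this form exactly when its denominator has no prime factors other than 2 and 5; the sketch below relies on that consequence. The function names are our own, and Python's `fractions.Fraction` is used only to reduce quotients to lowest terms.

```python
from fractions import Fraction


def is_decimal_fraction(r):
    """True if r = a / 10**k for some integer a and nonnegative integer k."""
    m = Fraction(r).denominator          # denominator in lowest terms
    for p in (2, 5):
        while m % p == 0:
            m //= p
    return m == 1


def decimal_places(r):
    """Smallest k with r * 10**k an integer, or None if r is not a decimal fraction."""
    r = Fraction(r)
    if not is_decimal_fraction(r):
        return None
    k = 0
    while (r * 10 ** k).denominator != 1:
        k += 1
    return k


print(is_decimal_fraction(Fraction(3, 8)), decimal_places(Fraction(3, 8)))  # True 3
print(is_decimal_fraction(Fraction(1, 3)), decimal_places(Fraction(1, 3)))  # False None
```

Here $3/8 = 0.375$ has three decimal places, while $1/3$ cannot be written with any power of 10 as denominator.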


EXAMPLE 1. The rational number $1/3$ is not a decimal fraction. For $1/3 = a/10^k$ implies $10^k = 2^k 5^k = 3a$, which is impossible by the fundamental theorem of arithmetic. However,

$$0.3 = \tfrac{3}{10} < \tfrac{1}{3} < \tfrac{4}{10} = 0.4,$$
$$0.33 = \tfrac{33}{100} < \tfrac{1}{3} < \tfrac{34}{100} = 0.34,$$
$$0.333 = \tfrac{333}{1000} < \tfrac{1}{3} < \tfrac{334}{1000} = 0.334,$$

etc. Thus, $1/3$ differs from the $n$-place decimal fraction $0.33 \ldots 3$ by less than $10^{-n}$.

EXAMPLE 2. It is possible to approximate $\sqrt{2}$ by trial and error. Note that $1^2 = 1 < 2$ and $2^2 = 4 > 2$. It therefore seems plausible that $\sqrt{2}$ lies between a pair of decimal fractions $1.b$ and $1.(b + 1)$, where $0 \le b < b + 1 \le 9$. By trial, we obtain

$$(1.4)^2 = 1.96 < 2 < 2.25 = (1.5)^2.$$

Thus, apparently $\sqrt{2}$ lies between 1.4 and 1.5. If this procedure is repeated, additional decimal places are obtained:

$$1.41 < \sqrt{2} < 1.42, \quad 1.414 < \sqrt{2} < 1.415, \quad 1.4142 < \sqrt{2} < 1.4143, \quad 1.41421 < \sqrt{2} < 1.41422.$$

Therefore, a perfectly straightforward search procedure yields a decimal fraction whose square is as close to 2 as desired. Our calculation shows that the square of 1.41421 differs from 2 by about $10^{-5}$. By continuing the computation, we could improve this estimate as much as we wish.
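The trial and error search of Example 2 can be carried out with exact integer arithmetic. The following sketch is one possible way to organize it: to find the next digit, it scales everything by a power of 10 and tries the digits $0, 1, \ldots, 9$ in turn. The function name and the scaling device are our own assumptions, not a procedure prescribed by the text.

```python
from fractions import Fraction


def sqrt_approximations(t, places):
    """Decimal fractions r_1, ..., r_places, where r_n has n decimal places
    and r_n**2 <= t < (r_n + 10**-n)**2.  Assumes t is a natural number."""
    x = 0
    while (x + 1) ** 2 <= t:             # integer part of the square root
        x += 1
    approximations = []
    for n in range(1, places + 1):
        x *= 10                          # x now stands for x / 10**n
        scaled_t = t * 10 ** (2 * n)
        b = 0                            # trial and error on the next digit
        while b < 9 and (x + b + 1) ** 2 <= scaled_t:
            b += 1
        x += b
        approximations.append(Fraction(x, 10 ** n))
    return approximations


approx = sqrt_approximations(2, 5)
print([float(r) for r in approx])        # [1.4, 1.41, 1.414, 1.4142, 1.41421]
print(float(2 - approx[-1] ** 2))        # 1.00759e-05
```

The last line shows that the square of 1.41421 falls short of 2 by 0.0000100759, in agreement with the estimate in the example.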

The above examples suggest that infinite decimal sequences might be introduced to represent real numbers. Imagine an endless sequence of successive decimal fractions

$$a_m \ldots a_1 a_0 . b_1, \quad a_m \ldots a_1 a_0 . b_1 b_2, \quad a_m \ldots a_1 a_0 . b_1 b_2 b_3, \quad \ldots$$

approximating a real number $u$. As in the above examples, it should be possible to obtain the $n$-place decimal approximation from the $(n - 1)$-place decimal approximation by adding a single decimal digit. Then, the


result of all the approximations can be conveniently represented by an infinite sequence of decimal digits:

$$a_m a_{m-1} \ldots a_1 a_0 \,.\, b_1 b_2 b_3 \ldots$$

From this infinite expression, the $n$-place decimal approximation is obtained by using only the first $(m + 1) + n$ symbols, that is, $a_m a_{m-1} \ldots a_1 a_0 . b_1 b_2 b_3 \ldots b_n$. The expression $a_m a_{m-1} \ldots a_1 a_0 . b_1 b_2 b_3 \ldots$ will be called the infinite decimal sequence representing $u$. For example, $1/3$ is represented by the infinite decimal sequence

$$0.33333\ldots$$

In practice it is not possible to specify the complete infinite decimal sequence representing a real number $u$, except for very special values of $u$. The decimal fractions are themselves represented by decimal sequences which end with an infinite string of zeros:

$$a_m a_{m-1} \ldots a_1 a_0 \,.\, b_1 b_2 \ldots b_n 000\ldots$$

We will prove later that rational numbers are characterized by the fact that their infinite decimal representations are ultimately periodic, that is, from some point on, the representations consist of the repetition of blocks of decimal digits. For example,

$1/3$ is represented by $0.333333\ldots$,
$1/11$ is represented by $0.090909\ldots$,
$5/12$ is represented by $0.416666\ldots$.
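The periodic behavior in these examples can be produced by ordinary long division: each digit is determined by the current remainder, and since only finitely many remainders are possible, some remainder must eventually recur. The sketch below records where each remainder first appears. The function name and the form of the returned value are our own choices, and the code assumes $a \ge 0$ and $m \ge 1$.

```python
def decimal_expansion(a, m):
    """Long division of a by m (a >= 0, m >= 1).

    Returns (integer_part, initial_digits, repeating_block), so that a/m is
    written integer_part . initial_digits followed by repeating_block forever.
    """
    integer_part, r = divmod(a, m)
    digits = []
    first_seen = {}                      # remainder -> position of the digit it produces
    while r not in first_seen:
        first_seen[r] = len(digits)
        r *= 10
        digits.append(r // m)
        r %= m
    start = first_seen[r]
    return integer_part, digits[:start], digits[start:]


print(decimal_expansion(1, 3))    # (0, [], [3])
print(decimal_expansion(1, 11))   # (0, [], [0, 9])
print(decimal_expansion(5, 12))   # (0, [4, 1], [6])
print(decimal_expansion(1, 4))    # (0, [2, 5], [0])
```

For the decimal fraction $1/4$ the repeating block is the single digit 0, which matches the remark that decimal fractions are represented by sequences ending in an infinite string of zeros.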

If infinite decimal sequences are used to represent real numbers, there is a tendency to identify the numbers with the sequences which represent them. A similar temptation was encountered in our discussion of the rational numbers, where we were inclined to identify rational numbers with the quotients $a/m$. In fact, the definition of the rational numbers in Section 6-5 was motivated by the idea that rational numbers are represented by quotients $a/m$, with $a \in Z$, $m \in N$. Similarly, it is possible to define real numbers in terms of infinite decimal sequences, but this construction involves formidable technical difficulties. It will therefore be avoided. Instead, we will base the construction of $R$ on the intuitive idea that there is a one-to-one correspondence between the set of all real numbers and the set of all points on a line. This geometrical motivation and the resulting construction of $R$, following the ideas of Dedekind, will be outlined in the next two sections. Infinite decimal sequences do, however, provide a useful way of representing real numbers, and this subject will be discussed in Section 7-9.


l. Indicate which of the following numbers are decimal fractions and give 6 25 their decimal representation: +, &, 12g.

2. Find the 2-, 3-, and 4-place decima,l approximation of


1 -6'

2/5.

3. Find the infinite decimal sequence representing the numbers

22,$, and
4. Prove that a rational number $r$ is a decimal fraction if and only if $r = a/10^k$ for some $a \in Z$ and nonnegative integer $k$.

5. Let $a = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$, where $p_1, p_2, \ldots, p_k$ are distinct primes. Prove that the equation

$$x^m - a = 0$$

has a rational number solution $x = r$ if and only if $m$ divides each of the exponents $e_1, e_2, \ldots,$ and $e_k$.

6. Let n be a natural number. Suppose that u and v are rational numbers which are related by the equation

Show that

By repeated use of this observation, find a rational number u such that Find a rational number u such that u2 - 10 < lob8. u2 - 5 <
7. Show that if $t > 2$, then there is a rational number $r$ such that $t > r^2 > 2$.

7-2 The coordinate line. The motivation for Dedekind's construction of the real number system is geometrical. As we saw in Section 6-1, the use of numbers to measure distances led to the introduction of rational numbers. Rulers were constructed by subdividing a convenient unit of length into fractional parts. A number of these subdivided units could be laid out on a single "yard stick" or "tape measure." The mathematical idea behind all these measuring implements is the notion of a one-to-one correspondence between the rational numbers and certain points of a line. The early Greek geometers were the first to realize that not all the points of the line were "used up" in this correspondence, that is, there are points of the line which do not correspond to any rational number. From this they drew the conclusion that numbers are inadequate for describing geometrical notions such as length and area. The modern viewpoint is quite different. The fact that the rational numbers do not fill up the whole line is accepted as evidence that the rational number system should be enlarged. Moreover, the desire to have a one-to-one correspondence


between the set of all numbers and the set of all points on a line is the principle which guides the construction of the real numbers.

If two points $P_0$ and $P_1$ are given on a line $l$, then there is a natural way to set up a one-to-one correspondence between the rational numbers and points of $l$ so that 0 corresponds to $P_0$ and 1 corresponds to $P_1$. The construction of this correspondence parallels the steps of the construction of the rational number system given in Chapters 3, 4, and 6. Once the real numbers have been defined, this correspondence can be completed so that the real numbers are associated in a one-to-one way with all the points of $l$. The correspondence between numbers and points is called a coordinate system on $l$.* In this section, the one-to-one correspondence between $Q$ and points of a line will be described. We will then see in the next section how this correspondence leads to Dedekind's definition of the real number system. The geometrical ideas are introduced only to guide our intuition toward the appropriate definitions. Accordingly, no attempt will be made to give rigorous proofs of geometrical statements.

As is customary, assume that the distinguished points $P_0$ and $P_1$ on $l$ are situated so that $P_1$ is on the right side of $P_0$ (see Fig. 7-1). The segment of $l$ from $P_0$ to $P_1$ is called the basic unit interval. The length of this interval is the unit of length for the coordinate system on $l$. The points $P_0$ and $P_1$ are called the origin and unit point, respectively, of the coordinate system.

The first step in the construction of the coordinate line consists of associating the natural numbers with points of $l$. Let $P_2$ be the point to the right of $P_1$ whose distance from $P_1$ is the same as the distance from $P_1$ to $P_0$. Such a point can be constructed mechanically, using a pair of compasses, by drawing a semicircle centered at $P_1$, starting at the point $P_0$ (see Fig. 7-2). The other point of intersection of this semicircle with $l$ is

* The term "coordinate system" may also refer to a one-to-one correspondence between the points of a plane and the set of all pairs of real numbers (see Section 8-3), or points in space and the set of all triples of real numbers. A line $l$ together with a coordinate system on $l$ is often called a coordinate line.


$P_2$. Let $P_3$ be the point to the right of $P_2$ whose distance from $P_2$ is also equal to the unit length. This can be constructed in the same way that $P_2$ was obtained, since the distance from $P_2$ to $P_1$ is also equal to the unit length. Continue this process, obtaining points $P_4, P_5, P_6, \ldots$ so that the segments $P_0P_1, P_1P_2, P_2P_3, P_3P_4, P_4P_5, \ldots$ are all congruent. (That is, they have the same length and the ordering of their endpoints is the same: $P_1$ is right of $P_0$, $P_2$ is right of $P_1$, $P_3$ is right of $P_2$, etc.) Now set up the correspondence $n \leftrightarrow P_n$ between the natural numbers and the points constructed in this way (see Fig. 7-3). It is apparent that this correspondence is one-to-one. However, in order to prove this fact, it would be necessary to use some simple geometrical properties of straight lines (specifically, the fact that straight lines do not close back on themselves, as do the great circles on spheres, for example).

The next step in setting up a one-to-one correspondence between the rational numbers and points of $l$ consists of defining a sequence $P_{-1}, P_{-2}, P_{-3}, \ldots$ of points on $l$, moving from $P_0$ to the left in such a way that all of the segments $P_{-1}P_0, P_{-2}P_{-1}, P_{-3}P_{-2}, \ldots$ are congruent to the basic unit interval $P_0P_1$. These points can be obtained using a pair of compasses in the same way that the points $P_2, P_3, P_4, \ldots$ were constructed. Now define the correspondence $a \leftrightarrow P_a$ between the integers and the set of points $\ldots, P_{-3}, P_{-2}, P_{-1}, P_0, P_1, P_2, P_3, \ldots$ (see Fig. 7-4).

Let $r$ be any rational number. Then $r$ can be represented as a quotient of an integer by a natural number, $r = a/m$. Thus, in order to obtain a correspondence between $Q$ and points of $l$, it suffices to define points $P_{a,m}$ on $l$ for each $a \in Z$ and $m \in N$, so that $P_{a,m} = P_{b,n}$ if and only if $a/m = b/n$.


If this condition is satisfied, then the correspondence

$$a/m \leftrightarrow P_{a,m}$$

between $Q$ and the points $P_{a,m}$ of $l$ is well defined (since $a/m = b/n$ implies $P_{a,m} = P_{b,n}$) and one-to-one (since $a/m \neq b/n$ implies $P_{a,m} \neq P_{b,n}$).

To construct the points $P_{a,m}$, choose $P_{1,m}$ to be the first point to the right of $P_0$ in a subdivision of the basic unit interval into $m$ equal parts. That is, $P_{1,m}$ is the point on the right-hand side of $P_0$ such that the distance from $P_0$ to $P_1$ is $m$ times the distance from $P_0$ to $P_{1,m}$. For example, $P_{1,1} = P_1$, and $P_{1,2}$ is the point which bisects the segment $P_0P_1$. To obtain the points $P_{a,m}$ for arbitrary $a \in Z$, we repeat the process used to obtain the points $P_a$ associated with the integers, except that $P_0P_{1,m}$ is used as the basic unit interval instead of $P_0P_1$. That is, the points $P_{2,m}, P_{3,m}, P_{4,m}, \ldots$ are constructed to the right of $P_{1,m}$ and $P_{-1,m}, P_{-2,m}, P_{-3,m}, \ldots$ are constructed to the left of $P_0$, so that the intervals

$$\ldots,\ P_{-3,m}P_{-2,m},\ P_{-2,m}P_{-1,m},\ P_{-1,m}P_0 \quad\text{and}\quad P_{1,m}P_{2,m},\ P_{2,m}P_{3,m},\ \ldots$$

are all congruent to the interval $P_0P_{1,m}$. For example, if $m = 2$, the points shown in Fig. 7-5 are obtained. It is clear from the definition of the points $P_{a,m}$ that for any natural number $k$, $P_{a,m} = P_{ka,km}$. Then if $a/m = b/n$, that is, $na = mb$, it follows that

$$P_{a,m} = P_{na,nm} = P_{mb,nm} = P_{b,n}.$$

If $a/m \neq b/n$, then either $na < mb$, or $mb < na$. In case $na < mb$, it is evident that $P_{a,m} = P_{na,nm}$ lies to the left of $P_{b,n} = P_{mb,nm}$. Similarly, if $mb < na$, then $P_{b,n}$ lies to the left of $P_{a,m}$. In either case $P_{a,m} \neq P_{b,n}$.

If $r$ is any rational number, we can define $P_r$ to be the point $P_{a,m}$, where $r = a/m$ is any representation of $r$ as a quotient. Using this more convenient notation, we can express the correspondence between $Q$ and the points of $l$ in the form $r \leftrightarrow P_r$.
Our discussion in the preceding paragraph shows that this correspondence has the following basic property.

(7-2.1). The point $P_r$ lies to the left of the point $P_s$ if and only if $r < s$.
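The arithmetic behind (7-2.1) can be checked mechanically by modeling the point $P_{a,m}$ as the pair of integers $(a, m)$. The following is only a sketch under that modeling assumption; the function names `same_point` and `left_of` are illustrative and do not come from the text. Both tests compare the cross products $na$ and $mb$, as in the construction of the points $P_{a,m}$.

```python
from fractions import Fraction


def same_point(a: int, m: int, b: int, n: int) -> bool:
    """P_{a,m} = P_{b,n} exactly when na = mb (m, n natural numbers)."""
    return n * a == m * b


def left_of(a: int, m: int, b: int, n: int) -> bool:
    """P_{a,m} lies to the left of P_{b,n} exactly when na < mb."""
    return n * a < m * b


# Cross-check both tests against exact rational arithmetic on a small grid.
for m in range(1, 6):
    for n in range(1, 6):
        for a in range(-6, 7):
            for b in range(-6, 7):
                r, s = Fraction(a, m), Fraction(b, n)
                assert same_point(a, m, b, n) == (r == s)
                assert left_of(a, m, b, n) == (r < s)
```

The grid check is of course no proof; it only confirms that the cross-product comparison agrees with the ordering of the quotients $a/m$ and $b/n$ on the sampled values.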

Since $P_{1,1} = P_1$, the unit interval which is used for the construction of the points $P_{a,1}$ is the same as the original basic unit interval $P_0P_1$. This means that the points $P_{a,1}$ are the same as the points $P_a$ which were associated with the integers. Thus, the correspondence $a/1 \leftrightarrow P_{a,1}$ agrees with the previously defined association $a \leftrightarrow P_a$.

It is not possible to picture all of the points $P_{a/m}$ on a line segment, since as $m$ gets large, these points become increasingly dense along every part of the line (see Fig. 7-6). In fact it is possible to prove the following important result.

(7-2.2). If $S$ and $T$ are two different points of $l$, then there is a point $P_{a/m}$ between $S$ and $T$.

We will not give a proof of (7-2.2) since that would require a careful formulation of geometrical principles. However, it is worthwhile to give an informal argument in support of this statement. Suppose for definiteness that $S$ lies to the left of $T$. Then the basic interval can be covered by a finite number of translates* of $ST$. That is, there are points $T_1, T_2, T_3, \ldots, T_m$ on $l$ such that $T_m$ lies to the right of $P_1$ and the intervals $P_0T_1$, $T_1T_2$, $T_2T_3, \ldots, T_{m-1}T_m$ are all congruent to $ST$ (see Fig. 7-7). Thus, we can suppose that $m$ is the number of translates of $ST$ needed to cover $P_0P_1$ in this way. Then if $P_0P_1$ is subdivided into $m$ equal subintervals, each of these will be shorter than the intervals $P_0T_1$, $T_1T_2$, $T_2T_3, \ldots, T_{m-1}T_m$. In particular, $P_0P_{1/m}$ is properly contained in $P_0T_1$.

There is a unique integer $a$ such that $S$ lies in the interval $P_{(a-1)/m}P_{a/m}$ with $P_{a/m}$ to the right of $S$ (see Fig. 7-8). Then $P_{a/m}$ lies on the left side of $T$, since otherwise the interval $P_{(a-1)/m}P_{a/m}$ would contain the interval $ST$. However, $P_{(a-1)/m}P_{a/m} \supseteq ST$ is impossible, since $P_{(a-1)/m}P_{a/m}$ is congruent to $P_0P_{1/m}$, $ST$ is congruent to $P_0T_1$, and $P_0P_{1/m} \subset P_0T_1$. Therefore, the point $P_{a/m}$ lies strictly between $S$ and $T$.

* This fundamental assumption about line segments is usually called Archimedes' principle.

The density property (7-2.2) obscures the fact that the points $P_r$ do not fill the whole line. The sets $\{P_{a/m} \mid a \in Z\}$ form a "mesh" on $l$ which becomes arbitrarily fine as $m$ increases. It is conceivable therefore that $\bigcup_{m \in N} \{P_{a/m} \mid a \in Z\}$ is the set of all points of $l$. The fact that the points $P_r$ do not exhaust $l$ is a consequence of Pythagoras' theorem that $\sqrt{2}$ is irrational. This discovery, which greatly influenced the development of Greek mathematics, was probably made by means of a geometrical example such as the following one.

EXAMPLE 1. Draw a line $l$ through diagonally opposite corners of a square. Set up a coordinate system on $l$, using one corner of the square as the origin $P_0$, and choosing $P_1$ on the segment of $l$ inside the square, so that the unit of length for the coordinate system is the same as the length of the sides of the square (that is, the distance from $P_0$ to $P_1$ is the same as the length of the side of the square). Let $T$ be the point on $l$ which is the corner of the square opposite to $P_0$ (see Fig. 7-9). Finally, let $S$ be one of the two corners of the square which is not on $l$. Then $P_0ST$ is a right triangle. Thus, by the Pythagorean triangle theorem

$$\overline{P_0T}^{\,2} = \overline{P_0S}^{\,2} + \overline{ST}^{\,2} = 2\,\overline{P_0P_1}^{\,2},$$

where $\overline{P_0T}$, $\overline{P_0S}$, $\overline{ST}$, and $\overline{P_0P_1}$ represent the lengths* of the line segments $P_0T$, $P_0S$, $ST$, and $P_0P_1$. If $T = P_r$ for some rational number $r$, then the distance from $P_0$ to $T$ is $|r|$ times the length of the basic unit interval $P_0P_1$. That is, $\overline{P_0T} = |r|\,\overline{P_0P_1}$. Hence,

$$|r|^2\,\overline{P_0P_1}^{\,2} = \overline{P_0T}^{\,2} = 2\,\overline{P_0P_1}^{\,2},$$

so that $r^2 = 2$. As we saw in Section 7-1, 2 cannot be the square of a rational number. Therefore, $T$ must be different from all of the points $P_r$.

* The early Greek mathematicians always interpreted lengths and areas as different kinds of geometrical magnitudes (not numbers), so that the Pythagorean triangle theorem for them was a relation between the areas of three squares. However, they did assign a meaning to fractional multiples of lengths and areas, and they showed that if the length of the side of one square is $r$ times the length of the side of another, then the area of the first square is $r^2$ times the area of the second. Thus, the proof given in this example would have made sense to the Greek geometers.
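The informal argument for (7-2.2) can be traced numerically in the special case where $S$ and $T$ have rational coordinates $s < t$. The sketch below is an illustration under that assumption only (general points of $l$ need not have rational coordinates, as Example 1 shows), and the function name `rational_between` is hypothetical. It chooses $m$ by Archimedes' principle so that $1/m$ is shorter than the segment $ST$, and then takes the first mesh point $a/m$ to the right of $s$.

```python
from fractions import Fraction
import math


def rational_between(s: Fraction, t: Fraction) -> Fraction:
    """Return a/m with s < a/m < t, mirroring the argument for (7-2.2)."""
    if not s < t:
        raise ValueError("expected s < t")
    # Archimedes' principle: choose m with 1/m < t - s.
    m = math.floor(1 / (t - s)) + 1
    # a is the unique integer with (a - 1)/m <= s < a/m.
    a = math.floor(s * m) + 1
    return Fraction(a, m)


print(rational_between(Fraction(1, 3), Fraction(2, 5)))  # prints 3/8
```

In the printed case $t - s = 1/15$, so $m = 16$ and $a = 6$, which gives the point $6/16 = 3/8$.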

1. Using a ruler, draw a figure which extends the subdivision in Fig. 7-6 to include all points $a/5$ and $a/6$ between $-3$ and $3$.

2. For which of the following values of $k$ is it possible to construct a rectangle with sides of integral length whose diagonal has length $\sqrt{k}$:

3. Show how to subdivide a line segment into 7 equal subsegments, using only a ruler and a pair of compasses.

4. Show that the correspondence $r \leftrightarrow P_r$ satisfies (7-2.1).

5. Define the length of a line segment $P_rP_s$ on the line $l$ to be

Describe the correspondence $r \leftrightarrow P_r$ in terms of this notion of distance.

7-3 Dedekind cuts. We turn now to the problem of constructing the real numbers so that they will correspond to all of the points on the line $l$. Our purpose in this section is to show how this requirement leads to the definition of real numbers. If $T$ is a point on $l$, define

$$X_T = \{r \mid P_r \text{ lies to the right of } T\}. \tag{7-1}$$

That is, $X_T$ is the set of rational numbers $r$ corresponding to points $P_r$ on $l$ which lie to the right of $T$ (see Fig. 7-10). In the example shown in Fig. 7-10, $r \in X_T$ and $s \notin X_T$. The properties (7-2.1) and (7-2.2) of the correspondence $r \leftrightarrow P_r$, together with the definition of the sets $X_T$ given in (7-1), lead to a number of important facts.

(7-3.1). Let $T$ be any point on $l$. Then
(a) $\emptyset \subset X_T \subset Q$;
(b) if $r$ and $s$ are rational numbers such that $r < s$ and $r \in X_T$, then $s \in X_T$;
(c) $X_T$ has no smallest element;
(d) if $T = P_r$, then $X_T = \{s \in Q \mid r < s\}$;
(e) if the point $S$ on $l$ lies to the left of $T$, then $X_S \supset X_T$;
(f) the correspondence $T \leftrightarrow X_T$ is one-to-one.

We will prove (a), (b), and (c) and leave it for the reader to prove the remaining statements on the basis of (7-2.1) and (7-2.2). Let $S$ and $R$ be points on $l$ such that $S$ lies to the left of $T$ and $R$ lies to the right of $T$. By (7-2.2), there are rational numbers $s$ and $r$ such that $P_s$ is between $S$ and $T$ and $P_r$ is between $T$ and $R$. Then $P_s$ lies to the left of $T$ and $P_r$ lies to the right of $T$. By (7-1), $s \notin X_T$ and $r \in X_T$. Therefore $X_T$ is a nonempty proper subset of $Q$, that is, $\emptyset \subset X_T \subset Q$.

To prove (b), we note that by (7-2.1) if $r < s$, then $P_s$ lies to the right of $P_r$. Since $r \in X_T$, $P_r$ lies to the right of $T$. Hence $P_s$ lies to the right of $T$. Consequently, $s \in X_T$.

The proof of (c) uses both (7-2.1) and (7-2.2). Suppose that $r \in X_T$, that is, $P_r$ lies to the right of $T$. By (7-2.2) there is a rational number $s$ such that the point $P_s$ is between $T$ and $P_r$. Therefore, $P_s$ lies to the right of $T$, and to the left of $P_r$. Hence, $s \in X_T$ by (7-1), and $s < r$ by (7-2.1). This argument proves that for any element $r$ of $X_T$, there is always a smaller element $s \in X_T$. Thus $X_T$ cannot have a smallest element.

The fact stated in (7-3.1f) that $T \leftrightarrow X_T$ is a one-to-one correspondence between the points of $l$ and the sets $X_T$ of rational numbers suggests that these sets might be the appropriate objects to call real numbers, since our stated objective is to define the real numbers so that they will correspond to all the points on $l$. However, the sets $X_T$ are defined using rather vague geometrical ideas. The definition of real numbers should be based on the established properties of the rational number system. We would therefore like to find properties of the sets $X_T$ which characterize these sets in an exact way. The properties (a), (b), and (c) of (7-3.1) satisfy this requirement.

DEFINITION 7-3.2. A Dedekind cut* is a set $X$ of rational numbers satisfying
(a) $\emptyset \subset X \subset Q$;
(b) if $r < s$ and $r \in X$, then $s \in X$; and
(c) $X$ has no smallest element.

It is now our contention that the sets of rational numbers thus defined can be identified with the sets $X_T$. By (7-3.1), every set $X_T$ is a set of rational numbers $X$ which satisfies the conditions of Definition 7-3.2. On the other hand, if $X$ is a Dedekind cut, then there is a point $T$ on the line $l$ such that $X = X_T$. This is not a statement which we can prove, but rather it is a geometrical assumption about the set of points on a line. However, this assumption can be made plausible.

Let $X$ be a Dedekind cut. Note that there is a point on $l$ which lies to the left of every point $P_r$ corresponding to a rational number $r$ which belongs to $X$. Indeed, if this is not the case, then $X$ contains every rational number, contrary to Definition 7-3.2(a). To see this, assume that there is no point of $l$ which lies to the left of every point $P_r$ with $r \in X$. This means that for every point $S$ on $l$, there is some point $P_r$ with $r \in X$ such that either $P_r = S$ or $P_r$ lies to the left of $S$. Let $s$ be an arbitrary element of $Q$, and choose $S$ on $l$ to be a point to the left of $P_s$. Then by assumption there is an $r \in X$ such that $P_r$ lies to the left of $P_s$. By (7-2.1), $r < s$. Since $r \in X$, it follows from Definition 7-3.2(b) that $s \in X$. Thus, we have shown that $X$ contains every rational number $s$, contradicting Definition 7-3.2(a), as predicted.

Now imagine that a movable point indicator is placed on $l$ at a point which lies to the left of every point $P_r$ with $r \in X$. Let the indicator be moved to the right, as far as it will go without passing through one of the points $P_r$ corresponding to a rational number $r$ which belongs to $X$. Since $X$ is not empty by Definition 7-3.2(a), the indicator cannot be moved indefinitely. Therefore, it must stop at some point $T$, blocked by the condition that if it is moved any farther to the right, then it will pass through a point $P_r$ with $r \in X$.

We assert that $X = X_T$. In the first place, suppose that $s \in X$. We wish to show that $s \in X_T$, that is, the point $P_s$ lies to the right of $T$. Since $X$ has no smallest element by Definition 7-3.2(c), there is an $r \in X$ such that $r < s$. Therefore, $P_r$ lies to the left of $P_s$, by (7-2.1). Moreover, $P_r$ cannot lie to the left of $T$, since otherwise the indicator would have passed through $P_r$ in moving to the position $T$. Thus, either $P_r = T$ or $P_r$ lies to the right of $T$. Since $P_s$ lies to the right of $P_r$, it follows that in either case, $P_s$ is to the right of $T$. Consequently, $X \subseteq X_T$. Now suppose that $s \in X_T$. Then $P_s$ lies to the right of $T$. Since the indicator cannot be moved any closer to $P_s$ than $T$ without passing through some point $P_r$ corresponding to a rational number $r$ in $X$, there must be an $r \in X$ such that $P_r$ lies to the left of $P_s$. Hence $r < s$ by (7-2.1). Therefore, by Definition 7-3.2(b), it follows that $s \in X$. This shows that $X_T \subseteq X$. Therefore, $X = X_T$.

Of course, this argument is not a proof in the mathematical sense. However, it does show, intuitively at least, that every Dedekind cut is of the form $X_T$ for some point $T$ on the line $l$. This means that $T \leftrightarrow X_T$ is a one-to-one correspondence between the set of all points on $l$ and the set of all Dedekind cuts. It therefore seems reasonable to formally define the set of all real numbers to be the set of all Dedekind cuts. That is, by definition, real numbers are Dedekind cuts. Then we can say that there is a one-to-one correspondence between the set of all points on $l$ and the set of all real numbers. The correspondence $T \leftrightarrow X_T$ is called the coordinate system on the line $l$ (or the coordinatization of $l$) with the basic unit interval $P_0P_1$.

* The sets of rational numbers satisfying (a), (b), and (c) of Definition 7-3.2 are more properly called upper Dedekind cuts, but we will simply refer to them as "cuts." A lower Dedekind cut is defined to be a set $X$ of rational numbers such that $\emptyset \subset X \subset Q$, $r > s$ and $r \in X$ implies $s \in X$, and $X$ has no largest element. The real numbers can be defined using lower Dedekind cuts, but it turns out that the definition of multiplication is less natural in terms of lower cuts.
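Conditions (a) and (b) of Definition 7-3.2 can be explored on a finite sample of rational numbers when a candidate set is given by a membership test. The sketch below is only an experiment under that assumption: the names `sample_rationals` and `passes_a_and_b` are hypothetical, condition (c) cannot be decided from finitely many values, and a genuine cut whose boundary lies outside the sampled range will be reported as failing (a).

```python
from fractions import Fraction


def sample_rationals(bound: int = 4, max_den: int = 6) -> list:
    """All fractions a/m with 1 <= m <= max_den and |a/m| <= bound, sorted."""
    return sorted({Fraction(a, m)
                   for m in range(1, max_den + 1)
                   for a in range(-bound * m, bound * m + 1)})


def passes_a_and_b(member, sample) -> bool:
    """Finite-sample check of conditions (a) and (b) of Definition 7-3.2."""
    inside = [r for r in sample if member(r)]
    outside = [r for r in sample if not member(r)]
    if not inside or not outside:
        return False  # (a) fails on this sample
    # (b): the set is closed upward, so every sampled non-member
    # must lie below every sampled member.
    return max(outside) < min(inside)


sample = sample_rationals()
print(passes_a_and_b(lambda r: r > Fraction(1, 2), sample))   # X(1/2)
print(passes_a_and_b(lambda r: r > 0 and r * r > 2, sample))  # positive r with r^2 > 2
print(passes_a_and_b(lambda r: r * r < 2, sample))            # not closed upward
```

The first two candidates pass the sampled test and the third fails it, because the third set contains 0 but not the larger number 2.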

1. Which of the following sets are Dedekind cuts?
(a) $\{r \in Q \mid r^2 > 3\}$
(b) $\{r \in Q \mid 1/r < -1\}$
(c) $\{r \in Q \mid r^2 > 1\}$
(d) $\{r \in Q \mid 1/r^3 \geq 0\}$
(e) $\{r \in Q \mid 0 < (1 + r)^{-1} < 1\}$

2. Illustrate the proofs of (7-3.1a, b, c) by means of diagrams.

3. Prove (7-3.1d, e).

4. Can the definition of a Dedekind cut be extended to any ordered integral domain? If not, state why. If so, give a reformulation of Definition 7-3.2.

7-4 Construction of the real numbers. The motivation given in the last two sections has prepared the way for the formal definition of the real numbers and their operations. For convenience, we repeat the definition given informally at the end of the last section.

DEFINITION 7-4.1. The set $R$ of all real numbers is the set of all Dedekind cuts, that is, the totality of the sets $X$ of rational numbers which satisfy conditions (a), (b), and (c) of Definition 7-3.2.

Thus the real numbers are defined strictly in terms of the set $Q$ of rational numbers and the order relation $<$ in $Q$. The reader should be aware of the fact that our point of view is now changed. The real numbers are the Dedekind cuts. The operations in $R$ must be defined and their properties derived solely on the basis of Definition 7-4.1 and known properties of the rational numbers. Before considering the operations in $R$, it is necessary to establish some fundamental properties of Dedekind cuts.

(7-4.2). If $X$ and $Y$ are Dedekind cuts, then exactly one of the relations

$$X \subset Y, \quad X = Y, \quad \text{or} \quad X \supset Y$$

is satisfied.

Proof. Suppose that neither $X \subset Y$ nor $X = Y$ is satisfied. Then there is an element $r \in X$ such that $r \notin Y$. If $s \in Y$, then $s \leq r$ is impossible, because otherwise $r \in Y$ by Definition 7-3.2(b). Therefore, $r < s$. Since $r \in X$, it follows from Definition 7-3.2(b) again that $s \in X$. Hence, we have shown that $s \in Y$ implies $s \in X$, that is, $Y \subseteq X$. By assumption $Y \neq X$. Therefore, $X \supset Y$. We have shown that if neither of the two relations $X \subset Y$, $X = Y$ is satisfied, then the third relation $X \supset Y$ must hold. This proves that at least one of the three relations is satisfied. It is obvious from the definition of set inclusion that at most one of the relations $X \subset Y$, $X = Y$, $X \supset Y$ can be satisfied.
The reader should remember that for an arbitrary pair of sets $X$ and $Y$, none of the relations $X \subset Y$, $X = Y$, or $X \supset Y$ necessarily holds. Therefore (7-4.2) expresses an important special property of Dedekind cuts.

Corresponding to each rational number $r$ there is a Dedekind cut given by the definition

$$X(r) = \{t \in Q \mid t > r\}. \tag{7-2}$$

The correspondence

$$r \leftrightarrow X(r)$$

is one-to-one, and it has the following properties.

(7-4.3). Let $r$ and $s$ be rational numbers.
(a) $X(r) \supset X(s)$ if and only if $r < s$.
(b) $X(r + s) = \{t + u \mid t \in X(r),\ u \in X(s)\}$.
(c) If $r \geq 0$ and $s \geq 0$, then $X(r \cdot s) = \{t \cdot u \mid t \in X(r),\ u \in X(s)\}$.

Proof. The first of these statements follows easily from (7-2). We will prove (b) and leave the proof of (c) as an exercise. If $t \in X(r)$ and $u \in X(s)$, then $t > r$ and $u > s$, by (7-2). Therefore, $t + u > r + s$, that is, $t + u \in X(r + s)$. This shows that

$$\{t + u \mid t \in X(r),\ u \in X(s)\} \subseteq X(r + s).$$

Suppose that $v \in X(r + s)$. Then $v > r + s$, by (7-2). There is a rational number $w$ satisfying $v > w > r + s$; for example, $w = \tfrac{1}{2}(r + s + v)$ has this property. Then $w - r > s$ and $(v - w) + r > r$. Thus, if $t = (v - w) + r$ and $u = w - r$, it follows that $t \in X(r)$, $u \in X(s)$, and $t + u = (v - w) + r + (w - r) = v$. Therefore,

$$v \in \{t + u \mid t \in X(r),\ u \in X(s)\}.$$

Since $v$ was any rational number in $X(r + s)$, it follows that $X(r + s) \subseteq \{t + u \mid t \in X(r),\ u \in X(s)\}$. This proves (b).
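The decomposition used in the proof of (7-4.3b) is completely explicit, so it can be carried out with exact rational arithmetic. The following sketch assumes the inputs are `Fraction` values; the function name `split_sum` is illustrative and not part of the text.

```python
from fractions import Fraction


def split_sum(r: Fraction, s: Fraction, v: Fraction):
    """Given v > r + s, return (t, u) with t > r, u > s and t + u == v."""
    if not v > r + s:
        raise ValueError("expected v > r + s")
    w = (r + s + v) / 2   # midpoint, so r + s < w < v
    t = (v - w) + r       # t > r because v > w
    u = w - r             # u > s because w > r + s
    return t, u


r, s, v = Fraction(1, 3), Fraction(-2, 5), Fraction(1, 100)
t, u = split_sum(r, s, v)
assert t > r and u > s and t + u == v
```

Any rational number strictly between $r + s$ and $v$ would serve in place of the midpoint $w$; the midpoint is simply the choice made in the proof.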

It should be noted that (7-4.3c) is not true if either of the assumptions $r \geq 0$ and $s \geq 0$ is omitted. In fact, if $r < 0$, and $s$ is any rational number, then $\{u \cdot v \mid u \in X(r),\ v \in X(s)\} = Q$ (see Problem 3 below).

The one-to-one correspondence $r \leftrightarrow X(r)$ which is established by (7-2) between the set $Q$ of rational numbers and a set of Dedekind cuts serves to identify the rational numbers with a subset of $R$. Of course, $Q$ itself is not a subset of $R$, and in order to be able to think of the rational number system as a part of the system of real numbers, it is necessary to "identify" each rational number $r$ with the corresponding cut $X(r)$. A similar identification process was used when we enlarged the system of integers to the field of rational numbers (see Section 6-5). In effect, the rational numbers are redefined to be the set of all Dedekind cuts $X(r)$. It is important to show that the operations and ordering which will be defined in $R$ agree for cuts of the form $X(r)$ with the usual operations and ordering in $Q$. Specifically, it will be necessary to prove

$$X(r) + X(s) = X(r + s),$$
$$-X(r) = X(-r),$$
$$X(r) \cdot X(s) = X(r \cdot s),$$

and

$$X(r) < X(s) \quad \text{if and only if} \quad r < s.$$

These facts will be established as each operation is defined in $R$. It is convenient to discuss the ordering of $R$ before defining the operations of addition, negation, and multiplication.

DEFINITION 7-4.4. Order in $R$. Let $X \in R$ and $Y \in R$. Define

$$X < Y \quad (\text{or } Y > X) \quad \text{if } X \supset Y.$$

In this case, $X$ is said to be less than $Y$.

It may seem odd that the ordering of $R$ is the reverse of the inclusion relation. This reversal is necessary, however, to make the ordering in $R$ agree with the usual ordering in $Q$. In fact, by (7-4.3a), $X(r) \supset X(s)$ is equivalent to $r < s$. Hence, $X(r) < X(s)$ if and only if $r < s$ according to Definition 7-4.4.

THEOREM 7-4.5. The ordering of $R$ has the properties:
(a) for any $X$ and $Y$ in $R$, exactly one of the relations $X < Y$, $X = Y$, or $Y < X$ is satisfied;
(b) if $X < Y$ and $Y < W$, then $X < W$.

The statement (a) is a reformulation of (7-4.2), using Definition 7-4.4, and statement (b) is a consequence of the transitivity of inclusion.

DEFINITION 7-4.6. Addition in $R$. Let $X \in R$, $Y \in R$. Define

$$X + Y = \{r + s \mid r \in X,\ s \in Y\}.$$

Then $X + Y$ is called the sum of $X$ and $Y$.

It is necessary to show that $X + Y$ is a Dedekind cut. Obviously, $\emptyset \subset X + Y \subseteq Q$. Since $X \subset Q$ and $Y \subset Q$, there are rational numbers $u$ and $v$ such that $u < r$ for all $r \in X$ and $v < s$ for all $s \in Y$. Consequently, $u + v < r + s$ for all $r \in X$ and $s \in Y$. Therefore, $u + v \notin X + Y$. This proves that $X + Y \subset Q$. Next, suppose that $r$, $s$, and $t$ are rational numbers and that $r + s < t$, where $r \in X$ and $s \in Y$. Then $r < t - s$. Thus, $t - s \in X$. Consequently, $t = (t - s) + s \in X + Y$. This shows that $X + Y$ satisfies Definition 7-3.2(b). Finally, to show that $X + Y$ has no smallest element, suppose that $t \in X + Y$. Then by definition, $t = r + s$ for some $r \in X$ and $s \in Y$. Since $r$ is not the smallest element of $X$, there exists $r' \in X$ such that $r' < r$. Then $t = r + s > r' + s \in X + Y$. Hence, $t$ is not the smallest element of $X + Y$, and since $t$ was any number in $X + Y$, it follows that this set has no smallest element. Therefore, $X + Y$ satisfies all of the conditions required to be a Dedekind cut, so that $X + Y \in R$.

By Definition 7-4.6 and the equality (7-4.3b),

$$X(r) + X(s) = \{t + u \mid t \in X(r),\ u \in X(s)\} = X(r + s).$$

Therefore, addition of the elements of $R$ which we have identified with rational numbers agrees with the usual addition in $Q$.

Defining negation in $R$ is a bit tricky. The negative, $-X$, of a Dedekind cut $X \in R$ must be a cut such that the sum of $X$ and $-X$ is the zero element of $R$. If the rational numbers [which we are identifying with the cuts of the form $X(r)$, $r \in Q$] are to be a subring of $R$, then the zero of $R$ must be $X(0)$. By Definition 7-4.6, this means that

$$X + (-X) = \{r + s \mid r \in X,\ s \in -X\} = X(0).$$

In particular, if $r \in X$, $s \in -X$, then $r + s > 0$, so that $s > -r$. Consequently, if $s \in -X$, then $s > -r$ for all $r \in X$; that is,

$$-X \subseteq \{s \in Q \mid s > -r \text{ for all } r \in X\}.$$

As a first guess, one might suppose that the set

$$T_X = \{s \in Q \mid s > -r \text{ for all } r \in X\}$$

is the cut $-X$ for which we are looking. Indeed, it is easy to see that $T_X$ satisfies the first two conditions in Definition 7-3.2 of a Dedekind cut, namely, $\emptyset \subset T_X \subset Q$, and if $s < t$ and $s \in T_X$, then $t \in T_X$ (see Fig. 7-11). What makes $T_X$ most attractive as a candidate for the role of $-X$ is the fact (which we will not prove) that $\{r + s \mid r \in X,\ s \in T_X\} = X(0)$. That is, if $T_X$ is a Dedekind cut, then according to Definition 7-4.6, $X + T_X = X(0)$. Unfortunately, the set $T_X$ is not always a cut. In fact, if $X = X(r)$, where $r \in Q$, then

$$\begin{aligned}
T_X &= \{s \in Q \mid s > -t \text{ for all } t \in X(r)\} \\
    &= \{s \in Q \mid s > -t \text{ for all } t \in Q \text{ such that } t > r\} \\
    &= \{s \in Q \mid s > -t \text{ for all } t \in Q \text{ such that } -t < -r\} \\
    &= \{s \in Q \mid s \geq -r\},
\end{aligned}$$

and the set $T_X$ has a smallest element $-r$. Therefore, $T_X$ is not a Dedekind cut, by Definition 7-3.2(c). It appears, however, that the negative of $X$ should be $T_X$ if $T_X$ has no least element, and it should be the set obtained from $T_X$ by deleting the smallest element, in case $T_X$ has a least element. A convenient way to formulate this definition is to say that $-X$ is the set of all elements in $T_X$ which exceed some other element of $T_X$. This description of $-X$ is equivalent to the following one.

DEFINITION 7-4.7. Negation in $R$. Let $X \in R$. The negative of $X$ is defined to be

$$-X = \{r \in Q \mid r > t \text{ for some } t \in Q \text{ such that } t > -s \text{ for all } s \in X\}.$$

We must prove that $-X$ is a Dedekind cut. Obviously, $-X \subseteq Q$. Moreover, if $r \in X$, then by Definition 7-4.7, $-r \notin -X$. Hence, $-X \subset Q$. Suppose that $t \notin X$. Then $t < s$ for all $s \in X$, by Definition 7-3.2(b). Consequently, $-t > -s$ for all $s \in X$. Thus, if $r > -t$, then $r \in -X$. Therefore, $\emptyset \subset -X$. It is obvious from Definition 7-4.7 that if $r < s$ and $r \in -X$, then $s \in -X$. Finally, if $r \in -X$, then by Definition 7-4.7, $r > t$ for some $t$ such that $t > -s$ for all $s \in X$. Let $r'$ be a rational number satisfying $r > r' > t$ (see Problem 1). Then $r' \in -X$. Consequently, $r$ is not the smallest element of $-X$. Since $r$ was an arbitrary element of $-X$, it follows that $-X$ has no smallest element. Therefore, $-X$ is a Dedekind cut.

We note that negation in $R$ agrees on cuts of the form $X(r)$ with negation in $Q$. In fact, by Definition 7-4.7,

$$-X(r) = \{u \in Q \mid u > t \text{ for some } t \in Q \text{ such that } t > -s \text{ for all } s \in X(r)\}.$$

Since $X(r) = \{s \in Q \mid s > r\}$, we have

$$-X(r) = \{u \in Q \mid u > t \text{ for some } t \in Q \text{ such that } t > -s \text{ for all } s > r\}.$$

If $t > -s$ for every rational number $s$ which is greater than $r$, then $-t < s$ for all $s > r$. This implies that $-t \leq r$; that is, $t \geq -r$. Conversely, if $t \geq -r$, then $-t \leq r$, and therefore $-t < s$ for all $s > r$. We have shown that $t > -s$ for all $s > r$ if and only if $t \geq -r$. Consequently,

$$\begin{aligned}
-X(r) &= \{u \in Q \mid u > t \text{ for some } t \in Q \text{ such that } t \geq -r\} \\
      &= \{u \in Q \mid u > -r\} = X(-r).
\end{aligned}$$

Before we define multiplication in $R$, it is necessary to prove a simple fact about negation.

(7-4.8). If $X \in R$ and $X < X(0)$, then $X(0) < -X$.

Proof. By Definition 7-4.4, $X < X(0)$ implies $X \supset X(0)$. Let $s$ be an element of $X$ which is not in $X(0)$. Then $s \leq 0$. Since $X$ has no smallest element, there is a rational number $t \in X$ such that $t < s \leq 0$. Then $-t > 0$. Hence, $-t \in X(0)$. If $r \in -X$, then $r > -t$, so that $r \in X(0)$. This shows that $-X \subseteq X(0)$. Since $-t \in X(0)$ and $-t \notin -X$ (by Definition 7-4.7), it follows that $-X \subset X(0)$. Therefore $X(0) < -X$.

As one might expect, $X(0)$ is the zero element of $R$. Therefore, $X \in R$ is called positive if $X(0) < X$, negative if $X < X(0)$, and nonnegative if $X(0) < X$ or $X(0) = X$ (that is, $X(0) \leq X$). By (7-4.8), if $X$ is negative, then $-X$ is positive.

DEFINITION 7-4.9. Multiplication in $R$. Let $X \in R$ and $Y \in R$. Define
(a) $X \cdot Y = \{r \cdot s \mid r \in X,\ s \in Y\}$ if $X$ and $Y$ are nonnegative,
(b) $X \cdot Y = -[(-X) \cdot Y]$ if $X$ is negative and $Y$ is nonnegative,
(c) $X \cdot Y = -[X \cdot (-Y)]$ if $X$ is nonnegative and $Y$ is negative, and
(d) $X \cdot Y = (-X) \cdot (-Y)$ if $X$ and $Y$ are negative.

To justify this definition,* three remarks are needed. First, if $X$ and $Y$ are nonnegative, then $\{r \cdot s \mid r \in X,\ s \in Y\}$ is a cut. The proof of this fact is similar to the argument which we gave to show that $\{r + s \mid r \in X,\ s \in Y\}$ is a Dedekind cut, and we leave it for the reader. [Note that since $X \subseteq X(0)$ and $Y \subseteq X(0)$, it follows that $\emptyset \subset X \cdot Y \subseteq X(0) \subset Q$.] Second, if $X$ is negative and $Y$ is nonnegative, then $-X$ is positive, so that the expression $-[(-X) \cdot Y]$ makes sense because products of nonnegative cuts have been defined. A similar remark applies to the cases (c) and (d). Our final remark is that if $r$ and $s$ are rational numbers, then

$$X(r) \cdot X(s) = X(r \cdot s).$$

* It is unfortunate that to define the product of two Dedekind cuts, four cases must be considered separately. We can easily see, however, that if either $X$ or $Y$ is negative, then $\{r \cdot s \mid r \in X,\ s \in Y\} = Q$ (see Problem 3). Thus it would not do to use Definition 7-4.9(a) without some restriction on $X$ and $Y$. This problem can be avoided by arranging the construction of $R$ in a different order. Instead of proceeding from the natural numbers, to the integers, to the rational numbers, to the real numbers, as we have done in Chapters 3, 4, 6, and 7, the system $R$ could have been obtained by the construction:

natural numbers $\to$ positive fractions $\to$ positive reals $\to$ positive and negative reals.

This route from $N$ to $R$ is somewhat more convenient, but less interesting, because it does not give us an opportunity to study the important rings $Z$ and $Q$ along the way.

If $r$ and $s$ are nonnegative rational numbers, this identity follows from (7-4.3c). Then using the identity $X(-r) = -X(r)$, together with Definition 7-4.9(b), (c), and (d), the desired result is easily obtained for all combinations of the signs of $r$ and $s$. For example, if $r$ is negative and $s$ is nonnegative, then

$$X(r) \cdot X(s) = -[(-X(r)) \cdot X(s)] = -[X(-r) \cdot X(s)] = -[X((-r) \cdot s)] = X(-[(-r) \cdot s]) = X(r \cdot s).$$

Until now, the only examples of Dedekind cuts which we have seen are the sets $X(r)$ corresponding to rational numbers. In Section 7-10, it will be shown that the sets $Q$ and $R$ do not have the same cardinal number. Consequently, there must be a vast set of cuts which are not of the form $X(r)$ for any $r \in Q$. It seems worthwhile to give here a specific example of such a cut.

EXAMPLE 1. Let $X = \{r \in Q \mid r > 0 \text{ and } r^2 > 2\}$. Obviously, $\emptyset \subset X \subset Q$, and if $r \in X$, $r < s$, then $s \in X$. To prove that $X$ has no smallest element, suppose that $r \in X$. Let

$$s = \frac{r}{2} + \frac{1}{r}.$$

Then $r - s = r/2 - 1/r = (r^2 - 2)/2r > 0$. Hence, $r > s > 0$. Also, $s^2 - 2 = (r^2 - 2)^2/4r^2 > 0$, so that $s^2 > 2$. Therefore, $s \in X$. This proves that $X$ is a Dedekind cut. Obviously $X$ is nonnegative. Thus,

$$X^2 = \{r \cdot s \mid r \in X,\ s \in X\}.$$

If $r \in X$ and $s \in X$, then $(r \cdot s)^2 = r^2 \cdot s^2 > 2 \cdot 2 = 4$. Therefore, $r \cdot s > 2$. That is, $X^2 \subseteq X(2)$. On the other hand, if $t > 2$, it is possible to find a positive rational number $r$ such that $t > r^2 > 2$ (see, for example, Problem 7, Section 7-1). Hence, $r \in X$. Since $t/r > r$, also $t/r \in X$, so that $t = r \cdot (t/r) \in X^2$. This shows that $X^2 \supseteq X(2)$. Therefore, $X^2 = X(2)$. In other words, $X$ is the real number $\sqrt{2}$. In particular, $X$ cannot be of the form $X(t)$ for any rational number $t$, since otherwise $X(t)^2 = X(2)$ would imply $t^2 = 2$.
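The step from $r$ to $s = r/2 + 1/r$ in Example 1 can be followed with exact fractions. The sketch below is only a numerical illustration; the names `in_X` and `smaller_element` are hypothetical.

```python
from fractions import Fraction


def in_X(r: Fraction) -> bool:
    """Membership test for the cut X = {r in Q : r > 0 and r^2 > 2}."""
    return r > 0 and r * r > 2


def smaller_element(r: Fraction) -> Fraction:
    """Given r in X, return s = r/2 + 1/r, a smaller element of X."""
    if not in_X(r):
        raise ValueError("r must belong to X")
    return r / 2 + 1 / r


r = Fraction(3, 2)
s = smaller_element(r)
print(s)  # prints 17/12
# The identity s^2 - 2 = (r^2 - 2)^2 / (4 r^2) from the example:
assert s * s - 2 == (r * r - 2) ** 2 / (4 * r * r)
assert in_X(s) and s < r
```

Repeating the step produces a strictly decreasing sequence of elements of $X$, which is another way of seeing that $X$ has no smallest element.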

THEOREM 7-4.10. The system $R$ of all real numbers, given by Definition 7-4.1, with the operations of addition, negation, multiplication, and order defined by Definitions 7-4.6, 7-4.7, 7-4.9, and 7-4.4, and with $X(0)$ and $X(1)$ as zero and identity element, is an ordered field.

The reader should refresh his memory by listing all the identities which have to be checked in the proof of this theorem. Some of these are trivial. For example,

$$X + Y = \{r + s \mid r \in X,\ s \in Y\} = \{s + r \mid s \in Y,\ r \in X\} = Y + X$$

establishes the commutative law of addition. The identities which involve multiplication (particularly the distributive law) are troublesome, because their proofs require the consideration of numerous cases. There are two rules whose proofs involve a new idea.

$X + (-X) = X(0)$.

If $X \in R$, $W \in R$, and $X \neq X(0)$, then there is a $Y \in R$ such that $X \cdot Y = W$.

The proofs of both these results use the following property of Dedekind cuts.

(7-4.11). If $X$ is a Dedekind cut, and if $r$ is a rational number greater than zero, then there is a rational number $s$ such that $s \notin X$ and $s + r \in X$.

Proof. Since $X \subset Q$, there is some $s \in Q$ with $s \notin X$. Suppose that (7-4.11) is false. Then for any $s$ not in $X$, it follows that $s + r \notin X$. Starting with such an $s$, we obtain $s + r \notin X$, $s + 2r = (s + r) + r \notin X$, $s + 3r = (s + 2r) + r \notin X$, etc. By induction, $s + nr \notin X$ for all natural numbers $n$. However, this is impossible. In fact, since $r > 0$, it is possible to choose $n$ large enough so that $s + nr$ exceeds any given rational number. In particular, choosing $t \in X$, we can find $n$ so that $s + nr > t$. Then by Definition 7-3.2(b), $s + nr \in X$.
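The proof of (7-4.11) is in effect a search: starting from a rational number outside the cut, keep adding $r$ until the next step would land inside. The sketch below carries out that search for a cut given by a membership test. The name `straddle_point` and the requirement that the caller supply a starting value outside $X$ are assumptions made for illustration. For a genuine cut the loop stops by the argument in the proof; for a set that is not a cut it may not terminate.

```python
from fractions import Fraction


def straddle_point(member, r: Fraction, start: Fraction) -> Fraction:
    """Return s with s not in X and s + r in X.

    member : membership test of a Dedekind cut X
    r      : a positive rational number
    start  : any rational number that is not in X
    """
    if r <= 0:
        raise ValueError("r must be positive")
    if member(start):
        raise ValueError("start must lie outside the cut")
    s = start
    # Invariant: s is not in X. Step by r until s + r enters X.
    while not member(s + r):
        s += r
    return s


def sqrt2_cut(q: Fraction) -> bool:
    """The cut of Example 1."""
    return q > 0 and q * q > 2


s = straddle_point(sqrt2_cut, Fraction(1, 10), Fraction(0))
print(s)  # prints 7/5
```

Here $7/5 \notin X$ because $(7/5)^2 = 49/25 < 2$, while $7/5 + 1/10 = 3/2 \in X$ because $(3/2)^2 = 9/4 > 2$.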

Using (7-4.11)) we show that X (-X) = X(0). Let r E X and s E -X. Then by Definition 7-4.7 there is a rational number t such that s > t, and t > -u for al1 u E X. In particular, s >. -r, so that r (-X) = (r slr E X, s E -X} c X(0). On s >O . Therefore, X the other hand, suppose that r E X(O), that is, r > O . By (7-4.11)) it is possible to find S E Q such that s 4 X, and s (r/2) E X. Hence, s < t for al1 t E X. Consequently, (-S) r/2 > -S and -S > -t for al1 r/2 E -X. I t follows that t E X. Therefore, (-S)

(-X) 2 Since r was an arbitrary element of X(O), we have proved X X(0). Therefore, X (-X) = X(0). We conclude this Section by showing that if X > X(0) and W 2 X(O), then there is a Dedekind cut Y such that X Y = W. Define

{r/slr E W, O

< s < t for al1 t

E X}.

There must be rational numbers s satisfying O < s < t for al1 t E X, because X is a Dedekind cut which is properly contained in X(0) = (r E Q I ~ > O>. Hence, @ c Y c X ( 0 ) cQ. If r E W and O < s < t

7-41

COPI;STRUCTIOS OF THE REAL XUMBERS

247

for al1 t E X, and if r / s < u, then r < su. Hence, su E W and u = su/s E Y . Finally, Y has no smallest element, because if r / s E Y, with r E W and O < S < t for al1 t E X, then there exists r' E W such that r' < r. Consequently r r / s < r / s and r r / s E Y. This proves that Y is a nonnegative Dedekind cut . By Definition 7-4.9,

I f u E X and O < s < t for al1 t E X, then in particular s < u. Moreover, since r E W, and W >_ X(O), i t follows that r > O. Therefore, u ( r / s ) > r. Hence, X Y W. To reverse this inclusion, suppose that r E W. Since W has no smallest element, there is an r' E W with r' < r. Then ( r - rr)/r' > O. Select s E Q so that O < s < t for al1 t E X. Then s(r - rr)/r' > O. Hence, by (7-4.11)) there exists S' E Q such that S' 4 X and S' [s(r - rr)/r'] E X. We can suppose that S' >_ S , since otherwise S' could be replaced by s. Since S' 4 X, it follows that O < S' < t for al1 t E X. Thus rr/s' E Y by the definition of Y. Therefore,

Consequently, r E X Y. Since r was any element of W, we have proved that X . Y 2 W. Therefore, X Y = W .

1. (a) Show that if $u$ and $v$ are rational numbers with $u < v$, then $w = \tfrac{1}{2}(u + v)$ is a rational number satisfying $u < w < v$. (b) Use part (a) to prove that $X(r)$ is a Dedekind cut for every rational number $r$.

2. Prove (7-4.3c).

3. Show that if $r < 0$ and $s$ is any rational number, then

$$\{u \cdot v \mid u \in X(r),\ v \in X(s)\} = Q.$$

4. Suppose that $X > X(0)$. Define a cut $Y$ such that $Y^2 = X$.

5. Show that $X < X(r)$ if and only if $r \in X$.

6. Prove that if $X$ is a Dedekind cut, then the set $T_X = \{s \in Q \mid s > -r \text{ for all } r \in X\}$ satisfies Definition 7-3.2(a) and (b), and $\{r + s \mid r \in X,\ s \in T_X\} = X(0)$.

7. Draw a diagram to illustrate the proof that $X + (-X) \supseteq X(0)$.

8. Show from the definition of multiplication that $X \cdot X(0) = X(0)$ for all $X \in R$.

9. Show from Definitions 7-4.4 and 7-4.7 that if $X < Y$, then

10. Prove the following laws in $R$.
(a) $X + (Y + W) = (X + Y) + W$
(b) $X + X(0) = X$
(c) $X \cdot Y = Y \cdot X$
(d) $X \cdot (Y \cdot W) = (X \cdot Y) \cdot W$
(e) $X < Y$ implies $X + W < Y + W$
(f) $X < Y$ and $W > X(0)$ implies $X \cdot W < Y \cdot W$

11. Show that $-(X + Y) = (-X) + (-Y)$.

12. Prove the distributive law $W \cdot (X + Y) = (W \cdot X) + (W \cdot Y)$ in the following cases.
(a) $X$, $Y$ and $W$ are nonnegative
(b) $W \geq X(0)$, $X < X(0)$, $X + Y \geq X(0)$ [Hint: Consider $W \cdot (X + Y) + W \cdot (-X)$.]
(c) $W \geq X(0)$, $X < X(0)$, $Y \geq X(0)$, $X + Y < X(0)$
(d) $W \geq X(0)$, $X < X(0)$, $Y < X(0)$
(e) $W < X(0)$, $X$, $Y$ arbitrary

13. Prove directly that $X \cdot X(1) = X$ for all $X \in R$.

14. Show that if $X \neq X(0)$ and $W$ is any element of $R$, then there exists $Y$ such that $X \cdot Y = W$. It is necessary to consider the three cases: $X > X(0)$, $W < X(0)$; $X < X(0)$, $W \geq X(0)$; $X < X(0)$, $W < X(0)$. These can be reduced to the case $X > X(0)$, $W \geq X(0)$, which has already been considered.

7-5 The completeness of the real numbers. Theorem 7-4.10 shows that the system $R$ of all real numbers is an ordered field. The same is true of the rational numbers, but we have seen that the field of real numbers is more versatile than the field $Q$. For example, such equations as $x^2 = 2$, $x^2 = 3$, $x^2 = 5$, etc., can be solved in $R$, but not in $Q$. Is it possible to find a property of $R$ which distinguishes it from arbitrary ordered fields? In this section we will show that such a fundamental property exists.

DEFINITION 7-5.1. Let $A$ be an ordered integral domain.* Let $S$ be a subset of $A$.
(a) An element $x \in A$ is called an upper bound of $S$ in $A$ if $x \geq y$ for all $y \in S$.
(b) An element $x \in A$ is called a lower bound of $S$ in $A$ if $x \leq y$ for all $y \in S$.

* To state this definition, or Definition 7-5.2, it is not necessary that $A$ be an integral domain. The only requirement is that $A$ be a partially ordered set. That is, there is a relation $\leq$ defined on $A$ which satisfies (i) $x \leq x$ for all $x \in A$; (ii) if $x \leq y$ and $y \leq x$, then $x = y$; (iii) if $x \leq y$ and $y \leq z$, then $x \leq z$.


EXAMPLE 1. Let $A = Q$. Let $S = \{r \in Q \mid r^2 < 2\}$. Then $t \in Q$ is an upper bound of $S$ if $t > 0$ and $t^2 > 2$ (that is, considered as an element of $R$, $t > \sqrt{2}$). An element $u \in Q$ is a lower bound of $S$ if $u < 0$ and $u^2 > 2$ (that is, $u < -\sqrt{2}$).

EXAMPLE 2. Let $A = Z$. Let $S = \{a \in Z \mid a^2 < 2\}$. Then $S = \{-1, 0, 1\}$. Consequently, the upper bounds of $S$ in $Z$ are all integers $b \geq 1$. The lower bounds of $S$ are the integers $b \leq -1$.

EXAMPLE 3. If $A$ is an ordered integral domain, and if $S$ is a subset of $A$ which has a greatest element $x$, then the upper bounds of $S$ in $A$ are all elements $y \in A$ such that $y \geq x$. Similarly, if $z$ is the least element of $S$, then the lower bounds of $S$ are all of the elements $u \in A$ such that $u \leq z$.

EXAMPLE 4. Let $A = Q$, $S = Z$. Then $S$ has no upper bound, and no lower bound in $A$.
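The bound conditions of Definition 7-5.1 can be checked mechanically on small cases. The following is a minimal sketch for Example 2; the helper names `is_upper_bound` and `is_lower_bound` and the finite search window are assumptions made for illustration only and are not part of the text.

```python
def is_upper_bound(x, s):
    """Definition 7-5.1(a): x >= y for every y in s."""
    return all(x >= y for y in s)


def is_lower_bound(x, s):
    """Definition 7-5.1(b): x <= y for every y in s."""
    return all(x <= y for y in s)


# Example 2, restricted to a finite window of integers (an assumption made
# only so that the search terminates).
window = range(-10, 11)
S = [a for a in window if a * a < 2]
upper_bounds = [b for b in window if is_upper_bound(b, S)]
lower_bounds = [b for b in window if is_lower_bound(b, S)]

print(S)                  # the elements of S found in the window
print(min(upper_bounds))  # smallest upper bound in the window
print(max(lower_bounds))  # largest lower bound in the window
```

Within the window, the smallest upper bound found is 1 and the largest lower bound is -1, in agreement with Example 2.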

DEFINITION 7-5.2. Let $A$ be an ordered integral domain. Suppose that $S$ is a subset of $A$. An element $x \in A$ is called the least upper bound of $S$ in $A$ if $x$ is the smallest element in the set of all upper bounds of $S$. An element $y \in A$ is called the greatest lower bound of $S$ in $A$ if $y$ is the largest element in the set of all lower bounds of $S$.

It is sometimes convenient to have a more formal statement of this definition. Referring to Definition 4-6.3, we see that $x$ is the least upper bound of $S$ if and only if
(a) $x \geq y$ for all $y \in S$,
(b) if $z \geq y$ for all $y \in S$, then $z \geq x$.
Similarly, $x$ is the greatest lower bound of $S$ if and only if
(a') $x \leq y$ for all $y \in S$,
(b') if $z \leq y$ for all $y \in S$, then $z \leq x$.

Since the largest element in a set and the smallest element of a set are unique, if they exist at all, we are justified in speaking of the least upper bound and the greatest lower bound. Of course, the least upper bound and the greatest lower bound of a set may not exist at all. The expressions l.u.b. $S$ and g.l.b. $S$ are frequently used as abbreviations for the least upper bound of $S$ and the greatest lower bound of $S$, respectively. Often the Latin terms "supremum" and "infimum" are used instead of "least upper bound" and "greatest lower bound." In this case, the abbreviations $\sup S$ and $\inf S$ are used. Thus, l.u.b. $S = \sup S$, and g.l.b. $S = \inf S$.
EXAMPLE 5. Let $A = Q$ and let $S = \{r \in Q \mid r^2 < 2\}$. Then $S$ has no least upper bound and no greatest lower bound in $Q$, because the set

$$\{t \in Q \mid t > 0,\ t^2 > 2\}$$

of all upper bounds of $S$ has no smallest element, and the set

$$\{u \in Q \mid u < 0,\ u^2 > 2\}$$

of all lower bounds of $S$ has no largest element. (See the Example in Section 7-4.)
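One way to see concretely that the set of rational upper bounds in Example 5 has no smallest element is to exhibit, for each such bound $t$, a strictly smaller one. The sketch below uses the averaging step $t \mapsto \frac{1}{2}(t + 2/t)$; this particular choice, and the function name `smaller_upper_bound`, are assumptions introduced here for illustration and are not taken from the text. If $t > 0$ and $t^2 > 2$, then $2/t < t$, so the average is less than $t$, and its square exceeds 2 by $\frac{1}{4}(t - 2/t)^2 > 0$.

```python
from fractions import Fraction


def smaller_upper_bound(t):
    """Given a rational t > 0 with t*t > 2, return a rational t2 with
    0 < t2 < t and t2*t2 > 2 (so t2 is again an upper bound of S)."""
    if not (t > 0 and t * t > 2):
        raise ValueError("t must be a positive rational with t^2 > 2")
    return (t + 2 / t) / 2


t = Fraction(3, 2)
for _ in range(4):
    t_next = smaller_upper_bound(t)
    # Each step yields a strictly smaller rational upper bound of S.
    assert 0 < t_next < t and t_next * t_next > 2
    t = t_next
print(t)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that every inequality is checked in $Q$ itself rather than in floating point.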

EXAMPLE 6. Let $A = R$ and let $S = \{X \in R \mid X^2 < 2\}$. Then l.u.b. $S = \sqrt{2}$ in $R$ and g.l.b. $S = -\sqrt{2}$ in $R$. (See Example 1, Section 7-4.)

EXAMPLE 7. Let $A = R$ and let $T = \{X \in R \mid X^2 \leq 2\}$. Then l.u.b. $T = \sqrt{2}$ and g.l.b. $T = -\sqrt{2}$. Note that in this example, the least upper bound and the greatest lower bound of $T$ actually belong to $T$, whereas in Example 6, this was not the case.

THEOREM 7-5.3. Let $F$ be an ordered field. Let $S$ and $T$ be nonempty subsets of $F$ such that g.l.b. $S$ and g.l.b. $T$ exist.
(a) If $U = \{x + y \mid x \in S,\ y \in T\}$, then g.l.b. $U$ = g.l.b. $S$ + g.l.b. $T$.
(b) If $V = \{x \cdot y \mid x \in S,\ y \in T\}$, and if all the elements of $S$ and $T$ are nonnegative, then g.l.b. $V$ = (g.l.b. $S$) $\cdot$ (g.l.b. $T$).
(c) If $W = \{-x \mid x \in S\}$, then l.u.b. $W$ = $-$(g.l.b. $S$).

Proof. We will prove (a) and (c), leaving (b) as a test for the reader. By definition of the greatest lower bound, it follows that g.l.b. $S \leq x$ for all $x \in S$ and g.l.b. $T \leq y$ for all $y \in T$. Hence, g.l.b. $S$ + g.l.b. $T \leq x + y$ for each $x \in S$ and $y \in T$. That is, g.l.b. $S$ + g.l.b. $T$ is a lower bound of $\{x + y \mid x \in S,\ y \in T\} = U$. We wish to show that this sum is the greatest lower bound of $U$. That is, if $z \leq x + y$ for all $x \in S$ and $y \in T$, then $z \leq$ g.l.b. $S$ + g.l.b. $T$. Let $x$ be an arbitrary element of $S$. Then $z \leq x + y$ for all $y \in T$, so that $z - x$ is a lower bound of $T$. Therefore, $z - x \leq$ g.l.b. $T$. Transposing, we obtain $z -$ g.l.b. $T \leq x$. Since $x$ can be any element of $S$, it follows that $z -$ g.l.b. $T$ is a lower bound of $S$. Thus, $z -$ g.l.b. $T \leq$ g.l.b. $S$. This gives the desired result: $z \leq$ g.l.b. $S$ + g.l.b. $T$. Therefore, (a) is proved.

To prove (c), note that by Definition 7-5.1, $w$ is an upper bound of $W$ if and only if $w \geq -x$ for all $x \in S$. The condition $w \geq -x$ is evidently equivalent to $-w \leq x$. Thus, $w$ is an upper bound of $W$ if and only if $-w$ is a lower bound of $S$. Since $S$ has a greatest lower bound, the condition for $-w$ to be a lower bound of $S$ is the same as $-w \leq$ g.l.b. $S$, or equivalently $w \geq -$(g.l.b. $S$). This sequence of equivalent statements shows that $-$(g.l.b. $S$) is an upper bound of $W$ and every other upper bound of $W$ is larger. Therefore, $-$(g.l.b. $S$) is the least upper bound of $W$.
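For finite nonempty sets the greatest lower bound is simply the minimum and the least upper bound is the maximum, so the three laws of Theorem 7-5.3 can be spot-checked directly. The sketch below does this for two small sets of rationals; the particular sets and variable names are assumptions chosen for illustration. It also shows that law (b) can fail when negative elements are allowed, which is why the nonnegativity hypothesis is needed.

```python
from fractions import Fraction
from itertools import product

# Two finite, nonempty sets of nonnegative rationals (illustrative choices).
S = [Fraction(1, 2), Fraction(3), Fraction(5, 4)]
T = [Fraction(2, 3), Fraction(7, 2)]

U = [x + y for x, y in product(S, T)]   # set of sums
V = [x * y for x, y in product(S, T)]   # set of products
W = [-x for x in S]                     # set of negatives

sum_law = min(U) == min(S) + min(T)        # Theorem 7-5.3(a)
product_law = min(V) == min(S) * min(T)    # Theorem 7-5.3(b)
negation_law = max(W) == -min(S)           # Theorem 7-5.3(c)

# Without nonnegativity, law (b) can fail:
S_neg = [Fraction(-1), Fraction(2)]
T_neg = [Fraction(-3), Fraction(1)]
V_neg = [x * y for x, y in product(S_neg, T_neg)]
product_law_fails = min(V_neg) != min(S_neg) * min(T_neg)

print(sum_law, product_law, negation_law, product_law_fails)
```

A finite check of this kind illustrates the statement but of course does not replace the proof, which must also cover infinite sets.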


A useful case of Theorem 7-5.3 occurs when the set $T$ consists of a single element $y$. The laws (a) and (b) then become
(a') g.l.b. $\{x + y \mid x \in S\}$ = (g.l.b. $S$) $+\ y$,
(b') g.l.b. $\{x \cdot y \mid x \in S\}$ = (g.l.b. $S$) $\cdot\ y$, provided that $y$ and all of the elements of $S$ are nonnegative.

The reader should be able to formulate and prove an analogue of Theorem 7-5.3 for least upper bounds.

If $A$ is any ordered integral domain and $S$ is the empty set, then every element of $A$ satisfies the condition for being an upper bound and a lower bound of $S$. This fact may seem strange, but a careful reading of Definition 7-5.1 shows that it is true. For instance, the condition $x \geq y$ for all $y \in \emptyset$ is satisfied vacuously, because there is no $y$ in $\emptyset$. It follows that the empty set has no least upper bound, and no greatest lower bound, since an ordered integral domain has no greatest element and no least element (see Problem 4 below). Also, if the set $S$ has no upper bound, then it cannot have a least upper bound. If $S$ has no lower bound, then it cannot have a greatest lower bound.

There are two important examples of ordered integral domains in which every nonempty set which has an upper bound also has a least upper bound, and every nonempty set which has a lower bound also has a greatest lower bound. These are the rings $Z$ and $R$.

DEFINITION 7-5.4. An ordered integral domain $A$ is called complete if it satisfies:
(a) if $S$ is a nonempty set in $A$ which has an upper bound in $A$, then l.u.b. $S$ exists;
(b) if $S$ is a nonempty set in $A$ which has a lower bound in $A$, then g.l.b. $S$ exists.

We leave it as a problem for the reader to show that $Z$ is complete.

THEOREM 7-5.5. $R$ is a complete ordered field.

Proof. Let $S$ be a nonempty set of Dedekind cuts which has a lower bound. That is, there exists a cut $X$ such that $X \leq Y$ for all $Y \in S$. By definition of the ordering in $R$, this means that $Y \subseteq X$ for all $Y \in S$. Define

$$W = \bigcup(\{Y \mid Y \in S\}).$$

We will show that $W$ is a Dedekind cut. Since $S$ is not empty, there is some $Y \in S$. Therefore, $\emptyset \subset Y \subseteq W$. Since every $Y$ in $S$ is contained in $X$, it follows that $W \subseteq X \subset Q$. Therefore, $W$ satisfies condition (a) of the definition of a Dedekind cut. Suppose that $r \in W$ and $r < s$. Then there is some $Y \in S$ such that $r \in Y$. Since $Y$ is a cut, $r \in Y$ and $r < s$ implies $s \in Y$. Hence, $s \in Y \subseteq W$. Finally, $W$ has no smallest element. For if $r \in W$, then $r \in Y$ for some $Y \in S$. Since $Y$ has no smallest element, there is a rational number $r'$ such that $r' < r$ and $r' \in Y \subseteq W$. That is, for every number in $W$, there is a smaller number in $W$. Consequently, $W$ has no smallest element, as claimed. We have shown that $W$ satisfies all the conditions of a Dedekind cut. Therefore, $W \in R$.

We next prove that $W$ is the greatest lower bound of $S$. By definition of $W$, if $Y \in S$, then $Y \subseteq W$. Therefore, $W \leq Y$ for all $Y \in S$, so that $W$ is a lower bound of $S$. Suppose that $U$ is any lower bound of $S$. That is, $U \leq Y$ for all $Y \in S$. Thus, $Y \subseteq U$ for all $Y \in S$. Hence,

$$\bigcup(\{Y \mid Y \in S\}) \subseteq U.$$

Therefore, $U \supseteq W$, that is, $U \leq W$. This shows that any lower bound of $S$ in $R$ is less than or equal to $W$, so that $W$ is the greatest lower bound of $S$.

Our proof up to this point shows that if $S$ is a nonempty subset of $R$ which has a lower bound, then g.l.b. $S$ exists. To complete the proof, it is necessary to prove that if $T$ is a nonempty subset of $R$ which has an upper bound, then l.u.b. $T$ exists. Let $S = \{-X \mid X \in T\}$. If $U$ is an upper bound of $T$, then $-U$ is a lower bound of $S$. Hence, g.l.b. $S$ exists. Noting that $T = \{-Y \mid Y \in S\}$, it follows from Theorem 7-5.3(c) that l.u.b. $T$ exists and is equal to $-$(g.l.b. $S$). This completes the proof of Theorem 7-5.5.
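The central step of the proof, taking the union of a family of cuts to obtain their greatest lower bound, can be illustrated on cuts of the form $X(r) = \{s \in Q \mid s > r\}$. In the sketch below a cut is represented by its membership test; this representation and the names `rational_cut` and `union_of_cuts` are assumptions made for illustration. Only finite families of rational cuts are handled, so the sketch shows the mechanism of the proof without its full generality.

```python
from fractions import Fraction


def rational_cut(r):
    """Membership test for X(r) = {s in Q | s > r}."""
    return lambda s: s > r


def union_of_cuts(cuts):
    """Membership test for the union W of a finite family of cuts."""
    cuts = list(cuts)
    return lambda s: any(cut(s) for cut in cuts)


# The family X(1), X(1/2), X(1/3), X(1/4), X(1/5).
family = [rational_cut(Fraction(1, n)) for n in range(1, 6)]
W = union_of_cuts(family)

# W should agree with X(1/5), the cut of the smallest rational in the family.
print(W(Fraction(1, 4)))     # 1/4 > 1/5, so 1/4 lies in W
print(W(Fraction(1, 5)))     # 1/5 is not greater than 1/5
```

Since $X(r') \leq X(r)$ exactly when $r' \leq r$, the union $X(1/5)$ is the least cut of the family, as the proof predicts for the greatest lower bound.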

1. Which of the following sets have upper bounds in $Q$? Which ones have lower bounds in $Q$?
(a) $\{|a - b| \mid a \in Z,\ b \in Z\}$
(b) $\{r^n \mid n \in N\}$, where $r \in Q$, $0 < r < 1$
(c) $\{r^n \mid n \in N\}$, where $r \in Q$, $r > 1$
(d) $\{-n \mid n \in N\}$
(e) $\{a \cdot b/(a^2 + b^2) \mid a \in N,\ b \in N\}$

2. Determine the least upper bounds and greatest lower bounds in $Q$ (whenever they exist) of the sets given in Problem 1.

3. Show that any nonempty finite subset of an ordered integral domain has a least upper bound and a greatest lower bound.

4. Show that if $A$ is an ordered integral domain, and if $S = A$, then $S$ has no upper bound in $A$ and no lower bound in $A$.

5. Show that if $S$ is a nonempty set in an ordered integral domain, then every upper bound of $S$ is greater than or equal to every lower bound of $S$. Can the equality ever hold? If so, when? Show that for any set $S$ such that g.l.b. $S$ and l.u.b. $S$ exist, the inequality g.l.b. $S \leq$ l.u.b. $S$ is satisfied.


6. Give examples of nonempty subsets $S$ of $Q$ which have upper and lower bounds, satisfying (a) g.l.b. $S$ exists, but l.u.b. $S$ does not exist, (b) g.l.b. $S$ does not exist, but l.u.b. $S$ exists.

7. State the analogue of Theorem 7-5.3 for least upper bounds.

8. Use the well-ordering principle to show that $Z$ is a complete ordered integral domain.

9. Let $S$ be a subset of $Z$ such that l.u.b. $S$ exists. Prove that l.u.b. $S \in S$.

7-6 Properties of complete ordered fields. It is difficult to overestimate the importance of the completeness property of the real numbers. Almost all of the fundamental theorems of analysis make use of completeness. In fact, one naturally wonders if it would be possible to construct mathematical theories such as calculus, using an arbitrary complete ordered field rather than the particular field $R$. The answer is that this would be possible, but because of the following theorem the results of this theory would not be any more general than the usual theorems concerning the real numbers.

THEOREM 7-6.1. Let $F$ be a complete ordered field. Then there is an isomorphism between $F$ and $R$ which preserves the ordering. That is, there is a one-to-one correspondence between $F$ and $R$ such that if $x$ and $y$ in $F$ correspond respectively to $X$ and $Y$ in $R$, then

$$x + y \text{ corresponds to } X + Y, \qquad x \cdot y \text{ corresponds to } X \cdot Y,$$

and

$$x < y \quad \text{if and only if} \quad X < Y.$$

Theorems 7-5.5 and 7-6.1 are the two most important results concerning the system of real numbers. Taken together, these theorems tell us that there is one, and, except for differences in the description of the elements and operations, only one complete ordered field. Theorem 7-6.1 also shows that any property which can be proved for the real numbers is a consequence of the ordered field properties and completeness. The complicated description of $R$ by means of Dedekind cuts can now be discarded.* It was needed only to prove the existence of a complete ordered field. We will not prove Theorem 7-6.1 in spite of the importance of this result. Instead, the use of completeness will be illustrated by proving two important elementary theorems about $R$. To emphasize the fact that only the ordered field properties and completeness are used in the proofs, we will state these theorems for complete ordered fields. Then by Theorem 7-5.5, they are true for $R$ in particular.

* However, there are a few results concerning $R$ which are proved most easily by using the properties of Dedekind cuts. We will find such an example in Section 7-9.

THEOREM 7-6.2. Let $F$ be a complete ordered field.
(a) Suppose that $x > 1$ in $F$. Then for any $y \in F$, there is a natural number $n$ such that $x^n > y$.
(b) Suppose that $0 \leq x < 1$ in $F$. Then for any $y > 0$ in $F$, there is a natural number $n$ such that $x^n < y$.

Proof. If statement (a) is false, then the set $S = \{x, x^2, x^3, \ldots\}$ has an upper bound $y$ in $F$. Hence, by completeness there is a least upper bound $w$ of $S$ in $F$. Then $w \geq x^n$ for all $n \in N$. Since $x > 0$, it follows that $x^{-1} > 0$. Hence, $w \cdot x^{-1} \geq x^{n-1}$ for all $n \in N$. Thus, $w \cdot x^{-1}$ is also an upper bound of $S$. Since $w$ is the least upper bound, this implies that $w \cdot x^{-1} \geq w$. Consequently, $x^{-1} \geq 1$ because $w > 0$. Thus, $x \leq 1$. This inequality contradicts the original assumption that $x > 1$.

To prove (b), we first dispose of a trivial case. If $x = 0$, then $y > 0 = x^1$, so that (b) holds with $n = 1$. If $x > 0$, then since $x < 1$, it follows that $1 < x^{-1}$. By (a), there is a natural number $n$ such that $(x^{-1})^n > y^{-1}$. Consequently, $x^n < y$.
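Theorem 7-6.2 asserts only that a suitable exponent $n$ exists; for rational inputs the smallest such $n$ can be found by direct search. The following is a minimal sketch under that assumption. The function names `power_exceeding` and `power_below` are hypothetical, and the loops terminate precisely because the theorem guarantees that some exponent works.

```python
from fractions import Fraction


def power_exceeding(x, y):
    """Smallest natural number n with x**n > y, assuming x > 1
    (the situation of Theorem 7-6.2(a))."""
    if x <= 1:
        raise ValueError("x must be greater than 1")
    n, p = 1, x
    while p <= y:
        n += 1
        p *= x
    return n


def power_below(x, y):
    """Smallest natural number n with x**n < y, assuming 0 <= x < 1 and
    y > 0 (the situation of Theorem 7-6.2(b))."""
    if not (0 <= x < 1 and y > 0):
        raise ValueError("need 0 <= x < 1 and y > 0")
    n, p = 1, x
    while p >= y:
        n += 1
        p *= x
    return n


n_a = power_exceeding(Fraction(11, 10), 2)
n_b = power_below(Fraction(1, 2), Fraction(1, 100))
print(n_a, n_b)
```

With $x = 11/10$ and $y = 2$ the search stops at $n = 8$, since $(11/10)^7 < 2 < (11/10)^8$; with $x = 1/2$ and $y = 1/100$ it stops at $n = 7$, since $2^6 < 100 < 2^7$.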

THEOREM 7-6.3. Let $F$ be a complete ordered field. Let $z \in F$ be positive. Suppose that $m$ is any natural number. Then there is one and only one positive $x \in F$ such that $x^m = z$.

Proof. This theorem is trivially true for $m = 1$, so that it can be assumed that $m > 1$. However, with minor changes of notation, the argument which follows is valid in the case $m = 1$, also. The proof is divided into three parts.

(1) We will use the completeness of $F$ to show that there is an element $x \in F$ satisfying the following conditions:
(a) $x > 0$;
(b) if $0 \leq y < x$, then $y^m < z$;
(c) if $x < w$, then $w^m \geq z$.

To obtain such an $x$, define

$$S = \{y \in F \mid y \geq 0,\ y^m < z\}.$$

Then $S$ contains some positive element of $F$. For example, if $y = \min\{1, z/2\}$, then $0 < y \leq 1$ and $y < z$, so that

$$y^m = y \cdot y^{m-1} \leq y \cdot 1^{m-1} = y < z.$$

Hence, $y \in S$. Moreover, $S$ has an upper bound. In fact, if $w \geq \max\{1, z\}$, then $w \geq 1$ and $w \geq z$, so that

$$w^m = w \cdot w^{m-1} \geq w \cdot 1^{m-1} = w \geq z.$$

Thus, $w \geq y$ for all $y \in S$. For if this is not the case, then $w < y$ for some $y \in S$. It would then follow that $w^m < y^m < z$, which is contrary to $w^m \geq z$. Therefore, in particular, $\max\{1, z\}$ is an upper bound of $S$. By the completeness of $F$, the set $S$ has a least upper bound in $F$. Let

$$x = \text{l.u.b. } S.$$

We will show that $x$ satisfies (a), (b), and (c). Since $x \geq y$ for all $y \in S$ and some $y \in S$ satisfies $y > 0$, it follows that $x > 0$. To prove (b), suppose that $0 \leq y < x$. Then $y$ is not an upper bound of $S$, since $x$ is the least upper bound of $S$. Therefore, $y < y_1$ for some $y_1 \in S$. Consequently, by the definition of $S$, $y^m < y_1^m < z$. To prove (c), suppose that $x < w$. Then $w \notin S$, because $x$ is an upper bound of $S$. Therefore, $w^m$ is not less than $z$, that is, $w^m \geq z$.

(2) We now show that if $x$ satisfies (a), (b), and (c), then both of the inequalities $x^m > z$ and $x^m < z$ lead to contradictions. Therefore, $x^m = z$. Suppose first that $x^m > z$. Let

$$y = \max\{0,\ x - (x^m - z) \cdot (m \cdot x^{m-1})^{-1}\}.$$

Then
(d) $0 \leq y < x$,
(e) $(x - y) \cdot (m \cdot x^{m-1}) \leq x^m - z$.
Therefore,

$$x^m - y^m = (x - y)(x^{m-1} + x^{m-2}y + \cdots + y^{m-1}) \leq (x - y) \cdot (m \cdot x^{m-1}) \leq x^m - z.$$

Consequently, $z \leq y^m$. However, by (d) and (b), $y^m < z$. Thus, $x^m > z$ leads to a contradiction. Next, assume that $x^m < z$. Define

$$w = \min\{2x,\ x + (z - x^m) \cdot (2^m x^{m-1})^{-1}\}.$$

Then
(f) $x < w \leq 2x$,
(g) $(w - x)(2^m x^{m-1}) \leq z - x^m$.
Therefore, using the identity

$$w^m - x^m = (w - x)(w^{m-1} + w^{m-2}x + \cdots + x^{m-1})$$

(see Problem 6, Section 2-1), we obtain

$$w^m - x^m \leq (w - x)(2^{m-1} + 2^{m-2} + \cdots + 1)x^{m-1} < (w - x)(2^m x^{m-1}) \leq z - x^m.$$

Hence, $w^m < z$. However, by (f) and (c), $z \leq w^m$. Therefore, the assumption $x^m < z$ also leads to a contradiction. The only remaining possibility is $x^m = z$.

(3) We complete the proof of Theorem 7-6.3 by showing that there is only one positive $x \in F$ such that $x^m = z$. Suppose that $x$ and $y$ are in $F$, $0 < x$, $0 < y$, $x^m = z$, and $y^m = z$. Then $x^m = y^m$, so that

$$0 = x^m - y^m = (x - y)(x^{m-1} + x^{m-2}y + \cdots + y^{m-1}).$$

Since $0 < x$ and $0 < y$, it follows that

$$w = x^{m-1} + x^{m-2}y + \cdots + y^{m-1} > 0.$$

We have $0 = (x - y)w$ and $w \neq 0$. Therefore, $x - y = 0$, that is, $x = y$. This completes the proof.

This proof is a typical sample of the reasoning methods which are used in analysis. To a beginning student, such proofs look very mysterious and complicated. Often the problem is that the details obscure the simple idea on which the argument is based. In order to understand such a proof, it is necessary to strip away the details and find the underlying idea. The above proof provides a good example. As the quantity $y$ increases, starting at zero, the value $y^m$ also increases continuously, that is, it does not jump. Since $y^m$ will ultimately exceed $z$, there must be some first value $x$ of $y$ for which $x^m \geq z$. This value is obtained in (1) by taking the least upper bound of $\{y \in F \mid y \geq 0,\ y^m < z\}$. Then the fact that the increase of $y^m$ is continuous implies that $y^m$ cannot have "jumped over" $z$ at $x$. Therefore, $x^m = z$. This is what was established in part (2) of the proof.

The unique positive $x \in F$ satisfying $x^m = z$ is called the $m$th root of $z$ in $F$. This quantity is usually denoted by

$$\sqrt[m]{z}.$$

(If $m = 2$, the expression is customarily abbreviated to $\sqrt{z}$.) It is convenient to define $\sqrt[m]{0} = 0$. If $z < 0$, then the expression $\sqrt[m]{z}$ is not defined.* By Theorems 7-5.5 and 7-6.3, we have proved that every positive real number has a unique positive $m$th root for all natural numbers $m$.
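The idea behind part (1) of the proof, locating the $m$th root as the least upper bound of $S = \{y \mid y \geq 0,\ y^m < z\}$, can be imitated numerically by bisection over the rationals. The sketch below is an illustration under stated assumptions rather than the method of the text: the function name `root_bounds`, the use of bisection, and the default number of steps are all introduced here. It keeps a lower endpoint that belongs to $S$ and an upper endpoint that is an upper bound of $S$, starting from $0$ and $\max\{1, z\}$ as in the proof.

```python
from fractions import Fraction


def root_bounds(z, m, steps=40):
    """Return rationals (lo, hi) with lo**m < z <= hi**m, where
    hi - lo = max(1, z) / 2**steps. Requires z > 0 and m >= 1."""
    if z <= 0 or m < 1:
        raise ValueError("need z > 0 and m >= 1")
    z = Fraction(z)
    lo = Fraction(0)             # 0 belongs to S because 0**m = 0 < z
    hi = max(Fraction(1), z)     # an upper bound of S, as in the proof
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid ** m < z:
            lo = mid             # mid belongs to S
        else:
            hi = mid             # mid**m >= z, so mid is an upper bound of S
    return lo, hi


lo, hi = root_bounds(2, 2)
print(float(lo), float(hi))      # two rationals enclosing the square root of 2
```

The interval never closes to a single rational when the root is irrational; that the least upper bound nevertheless exists is exactly what completeness supplies.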

1. Let $e$ be the identity element of a complete ordered field $F$. Show that if $y \in F$, then there exists a natural number $n$ such that $y < ne$.

2. Let $F$ be a complete ordered field. Let $m$ be an odd natural number. Let $z \in F$ (either positive or negative). Prove that there is one and only one $x \in F$ such that $x^m = z$.

3. Let $F$ be a complete ordered field. Suppose that $x \geq 0$ in $F$. Define for $a \in Z$, $m \in N$,

$$x^{a/m} = (x^{1/m})^a.$$

Show that if $a/m = b/n$, then $x^{a/m} = x^{b/n}$. Thus, $x^r$ is well defined for every rational number $r$. Prove the following rules of exponents.
(a) $x^r \cdot x^s = x^{r+s}$ for $x \geq 0$ in $F$, $r \in Q$, and $s \in Q$.
(b) $(x^r)^s = x^{r \cdot s}$ for $x \geq 0$ in $F$, $r \in Q$, and $s \in Q$.
(c) $(x \cdot y)^r = x^r \cdot y^r$ for $x \geq 0$ and $y \geq 0$ in $F$, and $r \in Q$.

4. Write the proof of Theorem 7-6.3 in the particular case $m = 2$.

The following problems lead to a proof of Theorem 7-6.1. They should be done in order. In all of these problems, $e$ denotes the identity element of a complete ordered field $F$.

5. Suppose that $y \in F$, $z \in F$ are such that $z - y > e$. Show by the well-ordering principle (using the result of Problem 1) that there is an integer $a$ such that $y < ae < z$.

6. Suppose that $y \in F$, $z \in F$ are such that $y < z$. Show that there is an integer $a$ and a natural number $m$ such that

$$y < (ae)/(me) < z.$$

[Hint: Choose $m$ so that $me > (z - y)^{-1}$, and apply Problem 5 to $(me)y$ and $(me)z$.]

7. Show that for $a \in Z$, $b \in Z$, $m \in N$, and $n \in N$,

$$(ae)/(me) < (be)/(ne)$$

if and only if $a/m < b/n$ in $Q$.

* For odd values of $m$, it would make sense to let $\sqrt[m]{z} = -(\sqrt[m]{-z})$. However, for $m$ even and $z < 0$, the expression is meaningless in an ordered field, because of Theorem 4-5.5.


8. For $x \in F$, define $X(x) = \{a/m \in Q \mid a \in Z,\ m \in N,\ x < (ae)/(me)\}$. Show that $X(x)$ is a Dedekind cut.

9. With the notation of Problem 8, prove that if $x < y$ in $F$, then

$$X(x) < X(y).$$

10. Show that if $X$ is a Dedekind cut, then there exists $x \in F$ such that $X = X(x)$, where $X(x)$ is defined as in Problem 8. [Hint: Let $x$ be the greatest lower bound in $F$ of the set $\{(ae)/(me) \mid a \in Z,\ m \in N,\ a/m \in X\}$.]

11. With the notation of Problem 8, prove that

$$X(x) + X(y) = X(x + y).$$

12. (a) Show that $X(0) = \{a/m \in Q \mid a \in Z,\ m \in N,\ 0 < (ae)/(me) \text{ in } F\}$ is the zero of $R$.
(b) Use this fact, together with the properties of addition in $R$, to show that $X(-x) = -X(x)$.

13. (a) Prove that $X(0) \cdot X(x) = X(0 \cdot x)$ for all $x \in F$.
(b) Prove that $X(x) \cdot X(y) = X(x \cdot y)$ for $x > 0$ and $y > 0$ in $F$.
(c) Prove that $X(x) \cdot X(y) = X(x \cdot y)$ for all $x$ and $y$ in $F$.

14. Show that $x \to X(x)$ is an isomorphism between $F$ and $R$ which preserves order (see Theorem 7-6.1).

*7-7 Infinite sequences. In order to bring our discussion of the real number system back to its starting point, we must show that the real numbers (considered as Dedekind cuts) can be represented by means of the infinite decimal sequences discussed in Section 7-1. This will be done in Section 7-9. The theoretical foundation of decimal representations will be laid in this section and the following one. Since the real numbers often occur as elements of sets, it is confusing to use capital letter set symbols to denote these objects. We will therefore change our notation, beginning in this section, and denote real numbers by small Latin letters $u$, $v$, $w$, etc. The ring of rational numbers will always be considered to be a subring of $R$, and this convention leads to the inclusions

$$N \subset Z \subset Q \subset R.$$

It is clear from the discussion of Theorem 7-6.1 that we can ignore the way in which the real numbers are constructed without losing any essential information about them. The vital fact to remember is that $R$ is a complete ordered field.

DEFINITION 7-7.1. Let $u_1, u_2, u_3, \ldots$ be an infinite sequence of real numbers. This sequence is said to converge to a real number $v$ if, for any real numbers $w_1$ and $w_2$ satisfying $w_1 < v < w_2$, there is a natural number $k$ (depending on how close $w_1$ and $w_2$ are to $v$) such that if $n \geq k$, then $w_1 < u_n < w_2$, that is,

$$w_1 < u_k < w_2, \quad w_1 < u_{k+1} < w_2, \quad w_1 < u_{k+2} < w_2,$$

and so forth.

This definition is so important in mathematics that it deserves some discussion. The meaning which we wish to convey by saying that $u_1, u_2, u_3, \ldots$ converges to $v$ is that the numbers $u_n$ get close to $v$ as we move to the right along the sequence. It is natural to ask, how close to $v$? The answer is "arbitrarily close to $v$" by going out "sufficiently far." The expressions in quotation marks are vague, but they can be made exact. The phrase "the $u$'s are arbitrarily close to $v$" must be replaced by an expression such as "the $u$'s lie in an arbitrarily small interval around $v$," and the phrase "sufficiently far out along the sequence" should be changed to "from some point on in the sequence." Combining these replacements gives a better informal definition of convergence of a sequence to $v$: no matter how small an interval is prescribed around $v$, all the numbers of the sequence from a certain point on lie in this interval. The reader can now see that Definition 7-7.1 is only a formal restatement (using mathematical symbolism) of this informal definition.

It appears offhand that for any sequence $u_1, u_2, u_3, \ldots$ there might be three possibilities:
(a) $u_1, u_2, u_3, \ldots$ does not converge to any real number;
(b) $u_1, u_2, u_3, \ldots$ converges to exactly one real number;
(c) $u_1, u_2, u_3, \ldots$ converges to two or more real numbers.
We will show that this last possibility is inconsistent with the definition of convergence.

THEOREM 7-7.2. It is impossible for an infinite sequence to converge to two different real numbers.

Proof. Suppose that the sequence $u_1, u_2, u_3, \ldots$ converged to numbers $v_1$ and $v_2$ with $v_1 < v_2$. Let $w_1$, $w_2$, and $w_3$ be any numbers satisfying $w_1 < v_1 < w_2 < v_2 < w_3$. Then by Definition 7-7.1, there are natural numbers $k_1$ and $k_2$ such that if $n \geq k_1$, then $w_1 < u_n < w_2$, and if $m \geq k_2$, then $w_2 < u_m < w_3$. However, if $n$ is larger than both $k_1$ and $k_2$, these conditions imply $u_n < w_2 < u_n$, which is impossible. Thus, the sequence $u_1, u_2, u_3, \ldots$ cannot converge to two different numbers.


Because of this theorem, we are justified in saying that $v$ is the limit of the sequence $u_1, u_2, u_3, \ldots$ if this sequence converges to $v$. In this case it is customary to write $v = \lim_{n \to \infty} u_n$.

EXAMPLE 1. Let $u_1, u_2, u_3, \ldots$ be the sequence $1, 2, 3, \ldots$. Then this sequence does not converge to any real number $v$, since for any $v$, there is some $m$ such that $v + 1 < m < m + 1 < \cdots$. In particular, it is not possible to find a natural number $k$ such that $v - 1 < u_n < v + 1$ for all $n \geq k$. This example shows that the possibility (a) listed above can occur.

EXAMPLE 2. Let $u_1, u_2, u_3, \ldots$ be the sequence $1, 0, 1, 0, 1, 0, \ldots$. Then this sequence does not converge to any real number $v$. In fact, no matter what the number $v$ might be, it is impossible to have $v - \frac{1}{2} < 0 < v + \frac{1}{2}$ and $v - \frac{1}{2} < 1 < v + \frac{1}{2}$, which is what would be required if we took $w_1 = v - \frac{1}{2}$ and $w_2 = v + \frac{1}{2}$ in Definition 7-7.1.

EXAMPLE 3. Let $u_1, u_2, u_3, \ldots$ be the sequence $1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \frac{1}{5}, \ldots$. Then $\lim_{n \to \infty} u_n = 0$. For suppose that $w_1 < 0 < w_2$. Choose $k$ to be the smallest natural number which is greater than $(w_2)^{-1}$. If $n \geq k$, then $n > (w_2)^{-1}$. Therefore, $w_1 < 0 < u_n = 1/n < w_2$. This example shows that the possibility (b) listed above can also occur.
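The choice of $k$ in Example 3 is explicit enough to compute. The sketch below returns the smallest natural number greater than $(w_2)^{-1}$ and spot-checks the inequality $w_1 < 1/n < w_2$ for a range of $n \geq k$; the function name `index_for_reciprocals` and the finite range used in the check are assumptions made for illustration.

```python
import math
from fractions import Fraction


def index_for_reciprocals(w1, w2):
    """For w1 < 0 < w2, return k, the smallest natural number greater than
    1/w2, so that w1 < 1/n < w2 for every n >= k (Example 3)."""
    if not (w1 < 0 < w2):
        raise ValueError("need w1 < 0 < w2")
    # floor(q) + 1 is the smallest integer strictly greater than q.
    return math.floor(1 / Fraction(w2)) + 1


w1, w2 = Fraction(-1), Fraction(1, 100)
k = index_for_reciprocals(w1, w2)
print(k)

# Spot-check Definition 7-7.1 on finitely many terms beyond k.
print(all(w1 < Fraction(1, n) < w2 for n in range(k, k + 1000)))
```

A finite spot-check cannot verify the condition for all $n \geq k$; that step is carried by the inequality $n > (w_2)^{-1}$ in the text.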

EXAMPLE 4. Let $u_1, u_2, u_3, \ldots$ be the infinite sequence of real numbers $u, u, u, \ldots$, all of which are the same. Then it is evident, and even easy to prove, that $\lim_{n \to \infty} u_n = u$.

EXAMPLE 5. Let $u_1, u_2, u_3, \ldots$ be the infinite sequence $t, t^2, t^3, \ldots$, where $t$ is some real number. Suppose that $|t| < 1$. Then $|t| > |t|^2 > |t|^3 > \cdots$. It is a familiar fact that $|t|^n$ "gets close to zero" as $n$ gets large. On a more rigorous level, it follows from Theorem 7-6.2(b) that if $w > 0$, then there is a natural number $k$ such that $w > |t|^k$. Therefore, $w > |t|^n$, or equivalently $-w < t^n < w$, for all $n \geq k$. If $w_1 < 0 < w_2$, let $w = \min\{-w_1, w_2\}$. Then

$$w_1 \leq -w < t^n < w \leq w_2$$

for all $n \geq k$. Thus, $t, t^2, t^3, \ldots$ converges to $0$ when $|t| < 1$. If $|t| > 1$, then for any $v$ there is a natural number $n$ such that $|t|^n > |v|$, by Theorem 7-6.2(a). It follows easily that the sequence $t, t^2, t^3, \ldots$ does not converge if $|t| > 1$. If $|t| = 1$, then either $t = 1$ and the sequence is $1, 1, 1, \ldots$, or $t = -1$ and the sequence is $-1, 1, -1, \ldots$. In the first case the sequence converges to $1$, and in the second case it does not converge.

EXAMPLE 6. Let $u_1, u_2, u_3, \ldots$ be the sequence $0, \frac{1}{2}, -\frac{1}{3}, 0, \frac{1}{5}, -\frac{1}{6}, 0, \ldots$. That is, $u_n = 0$ if $n \equiv 1 \pmod 3$, $u_n = 1/n$ if $n \equiv 2 \pmod 3$, and $u_n = -1/n$ if $n \equiv 0 \pmod 3$. Then $\lim_{n \to \infty} u_n = 0$. We leave the proof of this fact as a problem for the reader.

EXAMPLE 7. Let $u_1, u_2, u_3, \ldots$ be the sequence $0.3, 0.33, 0.333, 0.3333, \ldots$. Then $\lim_{n \to \infty} u_n = \frac{1}{3}$.

There are some useful properties of the limits of sequences which will be needed in the succeeding sections.

THEOREM 7-7.3. Let $u_1, u_2, u_3, \ldots$ and $v_1, v_2, v_3, \ldots$ be infinite sequences of real numbers which have limits. Let $w$ be any real number. Then
(a) $\lim_{n \to \infty}(u_n + v_n) = \lim_{n \to \infty} u_n + \lim_{n \to \infty} v_n$, and
(b) $\lim_{n \to \infty}(w \cdot u_n) = w \cdot \lim_{n \to \infty} u_n$.

The meaning of (a) is that the sequence $u_1 + v_1, u_2 + v_2, u_3 + v_3, \ldots$ has a limit which is the sum of the limits of the sequences $u_1, u_2, u_3, \ldots$ and $v_1, v_2, v_3, \ldots$. Equality (b) means that the sequence $w u_1, w u_2, w u_3, \ldots$ has a limit which is $w$ times the limit of the sequence $u_1, u_2, u_3, \ldots$.

Suppose that $\lim_{n \to \infty} u_n = u$, and that $\lim_{n \to \infty} v_n = v$. To prove (a), we must show that if $w_1 < u + v < w_2$, then there is a natural number $k$ such that for $n \geq k$, $w_1 < u_n + v_n < w_2$. From the inequality $w_1 < u + v < w_2$, it follows that $w_1 - u < v < w_2 - u$. Choose $w'$ and $w''$ so that $w_1 - u < w' < v < w'' < w_2 - u$. Then $w_1 - w' < u < w_2 - w''$. This, together with the inequality $w' < v < w''$, allows us to use the hypotheses $\lim_{n \to \infty} u_n = u$ and $\lim_{n \to \infty} v_n = v$. Indeed, by Definition 7-7.1, there must be natural numbers $k_1$ and $k_2$ such that if $n \geq k_1$, then $w_1 - w' < u_n < w_2 - w''$, and if $n \geq k_2$, then $w' < v_n < w''$. Let $k = \max\{k_1, k_2\}$. Then if $n \geq k$, it follows that $n \geq k_1$ and $n \geq k_2$. Therefore, $n \geq k$ implies $w_1 - w' < u_n < w_2 - w''$ and $w' < v_n < w''$. Adding these inequalities gives $w_1 < u_n + v_n < w_2$ for all $n \geq k$. This proves (a).

The proof of (b) must be separated into three cases: $w = 0$, $w > 0$, and $w < 0$. If $w = 0$, then the statement to be proved is that the limit of the sequence $0, 0, 0, \ldots$ is $0$. This is clear. Suppose that $w > 0$. Let $\lim_{n \to \infty} u_n = u$, as before. We wish to show that $\lim_{n \to \infty} w \cdot u_n = w \cdot u$. That is, if $w_1 < w \cdot u < w_2$, then there is a natural number $k$ such that $w_1 < w \cdot u_n < w_2$ for all $n \geq k$. The inequality $w_1 < w \cdot u < w_2$ and the fact that $w > 0$ yield $w^{-1} \cdot w_1 < u < w^{-1} \cdot w_2$. Since $\lim_{n \to \infty} u_n = u$, there is a $k$ such that $w^{-1} \cdot w_1 < u_n < w^{-1} \cdot w_2$ for all $n \geq k$. Consequently, $w_1 < w \cdot u_n < w_2$ for all $n \geq k$. The proof for $w < 0$ differs from the proof in the case $w > 0$ only in that the inequality $w_1 < w \cdot u < w_2$ is equivalent to $w^{-1} \cdot w_2 < u < w^{-1} \cdot w_1$, rather than $w^{-1} \cdot w_1 < u < w^{-1} \cdot w_2$.

As a particular case of Theorem 7-7.3(b), we obtain (for $w = -1$)

$$\lim_{n \to \infty}(-u_n) = -\lim_{n \to \infty} u_n.$$

Using this observation and Theorem 7-7.3(a), we have

$$\lim_{n \to \infty}(u_n - v_n) = \lim_{n \to \infty} u_n - \lim_{n \to \infty} v_n.$$

Finally, it should be mentioned that

$$\lim_{n \to \infty}(u_n \cdot v_n) = (\lim_{n \to \infty} u_n) \cdot (\lim_{n \to \infty} v_n).$$

This formula is more general than Theorem 7-7.3(b), but we will not prove it. (See Problem 9, however.)

There are two problems associated with every sequence. Does the sequence converge to some number? If so, to what real number does it converge? It appears from Definition 7-7.1 that in order to give a "yes" answer to the first of these questions, it would be necessary to have the answer to the second. It turns out that this is not always the case. Many methods have been devised which, for particular types of sequences, yield a criterion for convergence. One of the simplest is the following.

THEOREM 7-7.4. Let $u_1, u_2, u_3, \ldots$ be an increasing sequence of real numbers, that is, $u_1 \leq u_2 \leq u_3 \leq \cdots$. Then this sequence converges if and only if it has an upper bound (in other words, there is a real number $w$ such that $u_n \leq w$ for all $n$).

Proof. First suppose that $u_1, u_2, u_3, \ldots$ converges to $v$. Choose any real numbers $w_1$ and $w_2$ satisfying $w_1 < v < w_2$. Then by Definition 7-7.1, there is a natural number $k$ such that if $n \geq k$, then $w_1 < u_n < w_2$. In particular, $u_1 \leq u_2 \leq \cdots \leq u_k \leq u_{k+1} \leq \cdots \leq u_n < w_2$ for all $n \geq k$. That is, $w_2$ is an upper bound of $\{u_n \mid n \in N\}$.

Conversely, assume that $\{u_n \mid n \in N\}$ has an upper bound. Then by the completeness of $R$, this set also has a least upper bound $v$. We will prove that $\lim_{n \to \infty} u_n = v$. Suppose that $w_1 < v < w_2$. Then $u_n \leq v < w_2$ for all $n$, since $v$ is an upper bound of $\{u_n \mid n \in N\}$. Moreover, because $v$ is the least upper bound of $\{u_n \mid n \in N\}$, and $w_1 < v$, it follows that $w_1$ cannot be an upper bound of the set of $u_n$'s. Hence, there is some natural number $k$ such that $w_1 < u_k$. Then $w_1 < u_k \leq u_{k+1} \leq u_{k+2} \leq \cdots$, so that $w_1 < u_n < w_2$ for all $n \geq k$. Hence, by Definition 7-7.1,

$$\lim_{n \to \infty} u_n = v.$$

It is possible to prove a theorem similar to Theorem 7-7.4 for decreasing sequences of real numbers. A decreasing sequence converges if and only if it has a lower bound. These results do not give any information about sequences which are neither increasing nor decreasing. Such sequences can be bounded but not converge, as Example 2 shows.
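The second half of the proof of Theorem 7-7.4 can be traced on a concrete increasing sequence. The sketch below uses $u_n = 1 - 1/2^n$, whose least upper bound is $1$; this sequence, the function names, and the search limit are assumptions chosen for illustration. Given $w_1 < 1$, it searches for the first index $k$ with $w_1 < u_k$, which exists because $w_1$ is not an upper bound of the sequence.

```python
from fractions import Fraction


def u(n):
    """The n-th term of the increasing sequence u_n = 1 - 1/2**n."""
    return 1 - Fraction(1, 2 ** n)


def first_index_above(w1, limit=10_000):
    """Return the first k with u(k) > w1, searching k = 1, ..., limit.
    Such a k exists whenever w1 < 1, the least upper bound of the u_n."""
    for k in range(1, limit + 1):
        if u(k) > w1:
            return k
    raise ValueError("no index found within the search limit")


w1 = Fraction(999, 1000)
k = first_index_above(w1)
print(k)

# From k onward the terms stay between w1 and the least upper bound 1,
# because the sequence is increasing and bounded above by 1.
print(all(w1 < u(n) <= 1 for n in range(k, k + 50)))
```

Here $k = 10$, since $1/2^{10} < 1/1000 < 1/2^{9}$.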

7-81

INFINITE

SERIES

1. Which of the following sequences converge?
(a) $1, 2, 4, 8, \ldots, 2^{n-1}, \ldots$
(b) $-1, -\tfrac{1}{2}, -\tfrac{1}{3}, -\tfrac{1}{4}, \ldots, -1/n, \ldots$

(c) 1, -+, +, -*, . . . , (-l)n-l/n, . . . (d) 1, -+, 1, -6, 1, -$, . . . (e) 1 , 1 - + , 1 + + , 1 - ~ 4 7 l + L5 7 1 - L 6 7 l + l 77

2. Show that if $u$ is any real number, then the sequence $u, u, u, \ldots$ converges to $u$.

3. Prove the statement made in Example 6.

4. Show that if $u_1, u_2, u_3, \ldots$ is any infinite sequence of real numbers, and if $v_1, v_2, v_3, \ldots$ is the sequence obtained from $u_1, u_2, u_3, \ldots$ by omitting the first $m$ terms, then $v_1, v_2, v_3, \ldots$ converges to $w$ if $u_1, u_2, u_3, \ldots$ converges to $w$, and it does not converge if $u_1, u_2, u_3, \ldots$ does not converge.

5. Show that if $u_1, u_2, u_3, \ldots$ converges to $u$, then $|u_1|, |u_2|, |u_3|, \ldots$ converges to $|u|$.

6. Let $u_1, u_2, u_3, \ldots$ be an infinite sequence of real numbers. Suppose that for each real number $u$, there is some $n$ such that $|u_n| > |u|$. Prove that the sequence does not converge.

7. Use Theorem 7-7.3(b) (with $w = -1$) to prove the analogue of Theorem 7-7.4 for decreasing sequences.

8. Show that any convergent sequence, considered as a set, has an upper bound and a lower bound.

9. Prove that $\lim_{n\to\infty} (u_n \cdot v_n) = (\lim_{n\to\infty} u_n) \cdot (\lim_{n\to\infty} v_n)$ in the following cases.
(a) $\lim_{n\to\infty} u_n = 0$
(b) $\lim_{n\to\infty} u_n > 0$

*7-8 Infinite series. A particularly important class of sequences is obtained from the formal expressions called infinite series. To motivate the concept of an infinite series, let us return to the decimal fractions which were discussed in Section 7-1. The usual notation

$$a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n$$

is an abbreviation for the expression

$$a_m \cdot 10^m + a_{m-1} \cdot 10^{m-1} + \cdots + a_0 + b_1 \cdot 10^{-1} + b_2 \cdot 10^{-2} + \cdots + b_n \cdot 10^{-n}.$$

For example,

and

THE REAL NUMBERS

[CHAP.

This observation tempts us to use a similar interpretation for infinite decimal sequences. We would like to write

$$a_m a_{m-1} \cdots a_0 . b_1 b_2 b_3 \cdots = a_m \cdot 10^m + \cdots + a_0 + b_1 \cdot 10^{-1} + b_2 \cdot 10^{-2} + b_3 \cdot 10^{-3} + \cdots.$$

However, the sum of infinitely many numbers is not defined. By using the definition of convergence of sequences, it is sometimes possible to assign a meaning to infinite sums. In particular, this definition covers all of the sums which are associated with infinite decimal sequences.

DEFINITION 7-8.1. An infinite series is an expression

$$v_1 + v_2 + v_3 + \cdots, \quad \text{or} \quad \sum_{k=1}^{\infty} v_k,$$

where $v_1, v_2, v_3, \ldots$ is a given sequence of numbers. The elements of this sequence are called the terms of the series.

DEFINITION 7-8.2. Let $\sum_{k=1}^{\infty} v_k$ be an infinite series. The number

$$u_n = v_1 + v_2 + \cdots + v_n = \sum_{k=1}^{n} v_k$$

is called the $n$th partial sum of this series. The series is said to converge to $u$, or to have the sum $u$, or simply to be convergent, if the sequence $u_1, u_2, u_3, \ldots$ converges to $u$. In this case, we write

$$\sum_{k=1}^{\infty} v_k = u.$$

If the sequence $u_1, u_2, u_3, \ldots$ does not converge, then the series is called divergent.

A convenient way to abbreviate the definition of the sum of an infinite series is by the formula

$$\sum_{k=1}^{\infty} v_k = \lim_{n\to\infty} \sum_{k=1}^{n} v_k.$$


EXAMPLE 1. Let $v_1 = 1, v_2 = 1, v_3 = 1, \ldots$. Then the $n$th partial sum of $\sum_{k=1}^{\infty} v_k$ is $1 + 1 + \cdots + 1 = n$. Since the sequence $1, 2, 3, \ldots$ of partial sums is not convergent (Example 1, Section 7-7), it follows from Definition 7-8.2 that this series is divergent.

EXAMPLE 2. Let $v_1 = 1, v_2 = -1, v_3 = 1, \ldots, v_k = (-1)^{k+1}, \ldots$. Then $v_1 = 1$, $v_1 + v_2 = 1 - 1 = 0$, $v_1 + v_2 + v_3 = 1 - 1 + 1 = 1$, $v_1 + v_2 + v_3 + v_4 = 1 - 1 + 1 - 1 = 0, \ldots$. That is, the sequence of partial sums of the series $\sum_{k=1}^{\infty} v_k$ is $1, 0, 1, 0, \ldots$. Since this sequence does not converge (Example 2, Section 7-7), it follows that $\sum_{k=1}^{\infty} (-1)^{k+1}$ is divergent.

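EXAMPLE 3. Let $v_1 = 1/(1 \cdot 2)$, $v_2 = 1/(2 \cdot 3)$, $v_3 = 1/(3 \cdot 4), \ldots$. Then

$$\sum_{k=1}^{\infty} v_k = \sum_{k=1}^{\infty} \frac{1}{k(k+1)}.$$

It is easy to evaluate the $n$th partial sum of this particular series:

$$u_n = \sum_{k=1}^{n} \frac{1}{k(k+1)} = \sum_{k=1}^{n} \left( \frac{1}{k} - \frac{1}{k+1} \right) = 1 - \frac{1}{n+1}.$$

By Example 3, Section 7-7 and Theorem 7-7.3,

$$\lim_{n\to\infty} u_n = \lim_{n\to\infty} \left( 1 - \frac{1}{n+1} \right) = 1.$$

Thus, the series $\sum_{k=1}^{\infty} 1/k(k+1)$ converges to 1.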
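The closed form for the partial sums in Example 3 can be checked with exact rational arithmetic. The following is a minimal sketch, assuming Python's standard `fractions` module; the name `partial_sum` is a hypothetical helper introduced here, not notation from the text.

```python
from fractions import Fraction

def partial_sum(n):
    """Exact n-th partial sum of the series with terms 1/(k(k+1))."""
    return sum(Fraction(1, k * (k + 1)) for k in range(1, n + 1))

for n in (1, 2, 10, 100):
    u_n = partial_sum(n)
    # Compare with the closed form 1 - 1/(n+1) obtained by telescoping.
    print(n, u_n, u_n == 1 - Fraction(1, n + 1))
```

A finite check of this kind illustrates the formula but does not replace the proof; the convergence to 1 still rests on Theorem 7-7.3.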

EXAMPLE 4. Let $v_1 = 1, v_2 = t, v_3 = t^2, \ldots, v_k = t^{k-1}, \ldots$. That is, $\sum_{k=1}^{\infty} v_k = \sum_{k=1}^{\infty} t^{k-1}$. It is easy to prove by induction* [see Problem 6(a), Section 2-1] that the $n$th partial sum of this series is

$$u_n = 1 + t + t^2 + \cdots + t^{n-1} = \frac{1 - t^n}{1 - t},$$

provided that $t \ne 1$. It follows from Theorem 7-7.3 and Example 5, Section 7-7, that

$$\lim_{n\to\infty} \frac{1 - t^n}{1 - t} = \frac{1}{1 - t}$$

if $|t| < 1$, and this limit does not exist if $|t| > 1$. Hence, by Definition 7-8.2, the series $\sum_{k=1}^{\infty} t^{k-1}$ converges to $1/(1 - t)$ if $|t| < 1$, and it diverges if $|t| > 1$. If $|t| = 1$, then $t = 1$ or $t = -1$. In these cases, the series $\sum_{k=1}^{\infty} t^{k-1}$ is the same as the ones discussed in Examples 1 and 2, both of which diverge. This example, and Example 5, Section 7-7, upon which it is based, should be studied carefully. Both results have important applications in the theory of infinite sequences and series.

* Another proof is obtained from the identity

$$(1 - t)(1 + t + t^2 + \cdots + t^{n-1}) = 1 + t + t^2 + \cdots + t^{n-1} - (t + t^2 + \cdots + t^{n-1} + t^n) = 1 - t^n.$$
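The behavior of the geometric series in Example 4 can be explored numerically. The sketch below is illustrative only: it assumes Python's standard `fractions` module, and `geometric_partial_sum` is a hypothetical helper name. It compares the partial sums with the closed form $(1 - t^n)/(1 - t)$ and with the limit $1/(1 - t)$ for one value of $t$ with $|t| < 1$.

```python
from fractions import Fraction

def geometric_partial_sum(t, n):
    """n-th partial sum 1 + t + t**2 + ... + t**(n-1), computed term by term."""
    return sum(t ** (k - 1) for k in range(1, n + 1))

t = Fraction(1, 3)
limit = 1 / (1 - t)  # the sum 1/(1 - t), valid because |t| < 1
for n in (1, 5, 10, 20):
    u_n = geometric_partial_sum(t, n)
    closed_form = (1 - t ** n) / (1 - t)
    # The gap limit - u_n equals t**n / (1 - t), which shrinks as n grows.
    print(n, u_n == closed_form, float(limit - u_n))
```

For $t = -1$ the same function returns the alternating partial sums $1, 0, 1, 0, \ldots$ of Example 2.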

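Many of the results concerning infinite sequences lead to theorems about infinite series. A typical example is the following.

THEOREM 7-8.3. Let $\sum_{k=1}^{\infty} v_k$ and $\sum_{k=1}^{\infty} w_k$ be infinite series which converge. Let $w$ be any real number. Then

$$\text{(a)} \quad \sum_{k=1}^{\infty} (v_k + w_k) = \sum_{k=1}^{\infty} v_k + \sum_{k=1}^{\infty} w_k, \qquad \text{(b)} \quad \sum_{k=1}^{\infty} (w \cdot v_k) = w \cdot \sum_{k=1}^{\infty} v_k.$$

That is, under the assumptions that the series $\sum_{k=1}^{\infty} v_k$ and $\sum_{k=1}^{\infty} w_k$ converge, the series $\sum_{k=1}^{\infty} (v_k + w_k)$ and $\sum_{k=1}^{\infty} (w \cdot v_k)$ converge to the corresponding expressions on the right-hand side of (a) and (b).

To prove (a), note that by the generalized commutative law

$$\sum_{k=1}^{n} (v_k + w_k) = \sum_{k=1}^{n} v_k + \sum_{k=1}^{n} w_k.$$

Thus by Definition 7-8.2 and Theorem 7-7.3,

$$\sum_{k=1}^{\infty} (v_k + w_k) = \lim_{n\to\infty} \sum_{k=1}^{n} (v_k + w_k) = \lim_{n\to\infty} \sum_{k=1}^{n} v_k + \lim_{n\to\infty} \sum_{k=1}^{n} w_k = \sum_{k=1}^{\infty} v_k + \sum_{k=1}^{\infty} w_k.$$

The proof of (b) is similar. By the generalized distributive law (4-7),

$$\sum_{k=1}^{n} w \cdot v_k = w \cdot \left( \sum_{k=1}^{n} v_k \right).$$

Thus,

$$\sum_{k=1}^{\infty} w \cdot v_k = \lim_{n\to\infty} \sum_{k=1}^{n} w \cdot v_k = w \cdot \lim_{n\to\infty} \sum_{k=1}^{n} v_k = w \cdot \sum_{k=1}^{\infty} v_k.$$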

The generalized commutative and distributive laws used in the proof of Theorem 7-8.3 are concerned only with finite sums. This theorem is in a sense a generalization of these laws to infinite series.

If $\sum_{k=1}^{\infty} v_k$ and $\sum_{k=1}^{\infty} w_k$ both diverge, one might expect that $\sum_{k=1}^{\infty} (v_k + w_k)$ also diverges. However, the series

$$\sum_{k=1}^{\infty} \left( 1 + \frac{1}{k(k+1)} \right) \quad \text{and} \quad \sum_{k=1}^{\infty} (-1)$$

both diverge, while

$$\sum_{k=1}^{\infty} \left[ \left( 1 + \frac{1}{k(k+1)} \right) + (-1) \right] = \sum_{k=1}^{\infty} \frac{1}{k(k+1)}$$

converges.

The infinite series discussed in Examples 3 and 4, above, are unusual, because the sum of these series can be determined. Generally, it is very difficult to find the sum of a series. Often we need only to know whether or not a given series converges. For this problem, numerous tests have been devised. Most of these tests apply only to series with nonnegative terms. They are based on the following consequence of Theorem 7-7.4.

THEOREM 7-8.4. Let $\sum_{k=1}^{\infty} v_k$ be an infinite series such that $v_k \ge 0$ for all $k$. Then this series is convergent if and only if the set of its partial sums has an upper bound, that is, there is a real number $w$ such that $\sum_{k=1}^{n} v_k = u_n \le w$ for all $n$.

Proof. By Definition 7-8.2, the series $\sum_{k=1}^{\infty} v_k$ is convergent if and only if its sequence of partial sums $u_1, u_2, u_3, \ldots$ is convergent. Since $v_k \ge 0$ for all $k$, it follows that $u_1 \le u_1 + v_2 = u_2 \le u_2 + v_3 = u_3 \le \cdots$. Hence, the partial sums of $\sum_{k=1}^{\infty} v_k$ form an increasing sequence. By Theorem 7-7.4, such a sequence converges if and only if it is bounded. This proves the theorem.

As it stands, Theorem 7-8.4 is not a very useful criterion for deciding whether or not a series converges. However, there are numerous tricks for determining whether the partial sums of particular infinite series are bounded or not. Such tests are studied at length in calculus courses. We will implicitly use one well-known test (the "comparison test") to prove a result which makes it possible to assign a real number to every infinite decimal sequence.

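THEOREM 7-8.5. The infinite series

$$a_m \cdot 10^m + a_{m-1} \cdot 10^{m-1} + \cdots + a_0 + b_1 \cdot 10^{-1} + b_2 \cdot 10^{-2} + \cdots + b_n \cdot 10^{-n} + \cdots,$$

where $a_m, \ldots, a_0, b_1, b_2, \ldots, b_n, \ldots$ are integers between 0 and 9 (inclusive), converges to a real number.

Proof. The $(m + 1 + n)$th partial sum of this series is

$$a_m \cdot 10^m + \cdots + a_0 + b_1 \cdot 10^{-1} + \cdots + b_n \cdot 10^{-n}.$$

Since each $a_i$ and $b_j$ is $\le 9$, this partial sum is at most equal to

$$9 \cdot 10^m + \cdots + 9 + 9 \cdot 10^{-1} + \cdots + 9 \cdot 10^{-n} = 10^{m+1} - 10^{-n} < 10^{m+1}.$$

Thus, $10^{m+1}$ is an upper bound of the set of all partial sums of

$$a_m \cdot 10^m + \cdots + a_0 + b_1 \cdot 10^{-1} + b_2 \cdot 10^{-2} + \cdots,$$

so that by Theorem 7-8.4, this series converges to a real number.

By Theorem 7-8.5, we see that an infinite decimal sequence can be considered as an abbreviation for a convergent infinite series:

$$a_m a_{m-1} \cdots a_0 . b_1 b_2 b_3 \cdots = a_m \cdot 10^m + \cdots + a_0 + b_1 \cdot 10^{-1} + b_2 \cdot 10^{-2} + b_3 \cdot 10^{-3} + \cdots.$$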


Thus, associated with every infinite decimal sequence is a real number (the sum of the corresponding series). This definition of the real number associated with an infinite decimal sequence agrees with the intuitive idea of the decimal representation of real numbers which we discussed in Section 7-1. Indeed, the decimal representation of a real number u was described there as a sequence of progressively more accurate approximations of u by decimal fractions. This sequence of decimal fractions is exactly the sequence of partial sums of the series associated with the infinite decimal sequence representing u, so that if the intuitive idea of "progressively more accurate approximations of u" agrees with the exact notion of convergence, then the series associated with the infinite decimal representation of u must converge to u. In the next section we will completely justify this viewpoint.
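The boundedness argument behind Theorem 7-8.5 can be illustrated on a finite stretch of digits. This is a sketch under stated assumptions: the digit lists are arbitrary sample data, and `decimal_partial_sums` is a hypothetical helper name. It lists the partial sums of the decimal series exactly and checks that they increase and stay below $10^{m+1}$.

```python
from fractions import Fraction

def decimal_partial_sums(int_digits, frac_digits):
    """Partial sums of the decimal series.

    int_digits  = [a_m, ..., a_0]   (digits left of the decimal point)
    frac_digits = [b_1, ..., b_n]   (first n digits right of the point)
    """
    m = len(int_digits) - 1
    terms = [Fraction(a) * 10 ** (m - i) for i, a in enumerate(int_digits)]
    terms += [Fraction(b, 10 ** j) for j, b in enumerate(frac_digits, start=1)]
    sums, total = [], Fraction(0)
    for term in terms:
        total += term
        sums.append(total)
    return sums

# Sample digits (an assumption for illustration): 99.999, the worst case for m = 1.
sums = decimal_partial_sums([9, 9], [9, 9, 9])
bound = 10 ** 2  # 10**(m+1) with m = 1
print(all(s < bound for s in sums), sums == sorted(sums))
```

Choosing all digits equal to 9 shows that the bound $10^{m+1}$ cannot be replaced by a smaller power of 10 in general.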

1. In Example 4, it was stated that

$$\lim_{n\to\infty} \frac{1 - t^n}{1 - t}$$

does not exist if $|t| > 1$. Prove this in detail, using Theorem 7-7.3.


uk

2 . What is the numerical value of the 10th term of the series


(a)
uk =

if

k+2 (b) ux 2k1'

= -9

2k k!

(c)

U* =

k loAk.

3. Prove that if $\sum_{k=1}^{\infty} u_k$ is an infinite series such that $0 = u_{n+1} = u_{n+2} = u_{n+3} = \cdots$, that is, all terms after the $n$th one are zero, then $\sum_{k=1}^{\infty} u_k$ converges to $\sum_{k=1}^{n} u_k$.

4. Show that if an infinite series $\sum_{k=1}^{\infty} u_k$ converges, then $\lim_{k\to\infty} u_k = 0$.

5. Prove that the converse of the theorem of Problem 4 is false by showing that (a) $\lim_{k\to\infty} 1/(\sqrt{k+1} + \sqrt{k}) = 0$, and (b) the series

$$\sum_{k=1}^{\infty} \frac{1}{\sqrt{k+1} + \sqrt{k}}$$

diverges. [Hint: First show that $1/(\sqrt{k+1} + \sqrt{k}) = \sqrt{k+1} - \sqrt{k}$.]

6. Prove the comparison test: If $\sum_{k=1}^{\infty} u_k$ is an infinite series with $u_k \ge 0$ for all $k$, such that (a) $u_n \le v_n$ for $n = 1, 2, \ldots$, and (b) $\sum_{k=1}^{\infty} v_k$ converges, then $\sum_{k=1}^{\infty} u_k$ converges.

7. Use the comparison test to show that the following infinite series are convergent.


*7-9 Decimal representation. At the end of the last section, it was shown that every infinite decimal sequence can be considered as the representation of a real number, namely,

$$a_m a_{m-1} \cdots a_0 . b_1 b_2 b_3 \cdots = a_m \cdot 10^m + \cdots + a_0 + b_1 \cdot 10^{-1} + b_2 \cdot 10^{-2} + b_3 \cdot 10^{-3} + \cdots.$$

This observation provokes the two main questions which will be answered in this section. Can every real number be represented in this way by a decimal sequence? Can certain real numbers be represented by more than one decimal sequence, and if so, which ones, and in how many ways?

Throughout this section, both finite and infinite decimal sequences will be considered as abbreviations of their corresponding decimal sums; that is,

$$a_m \cdots a_0 . b_1 b_2 \cdots b_n = a_m \cdot 10^m + \cdots + a_0 + b_1 \cdot 10^{-1} + \cdots + b_n \cdot 10^{-n},$$

$$a_m \cdots a_0 . b_1 b_2 \cdots b_n \cdots = a_m \cdot 10^m + \cdots + a_0 + b_1 \cdot 10^{-1} + \cdots + b_n \cdot 10^{-n} + \cdots.$$

A decimal fraction which has $n$ decimal places (see Definition 7-1.2) is called an $n$-place decimal fraction or an $n$-place decimal sequence. These are the decimal sequences with $n$ digits following the decimal point, that is, $a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n$. A nonnegative rational number $r$ is an $n$-place decimal fraction if and only if $10^n r$ is an integer (see Problem 4, Section 7-1). It is convenient to summarize some familiar properties of decimal sequences which will be used in this section.

THEOREM 7-9.1.
(a) If $r$ is an $n$-place decimal fraction, then $r + 10^{-n}$ is an $n$-place decimal fraction.
(b) If $r$ and $s$ are $n$-place decimal fractions, and $r < s$, then $r + 10^{-n} \le s$.
(c) If $a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n = c_m c_{m-1} \cdots c_0 . d_1 d_2 \cdots d_n$, then $a_m = c_m, a_{m-1} = c_{m-1}, \ldots, a_0 = c_0, b_1 = d_1, b_2 = d_2, \ldots, b_n = d_n$.
(d) $a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \le a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n b_{n+1} \cdots b_{n+k} < (a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n) + 10^{-n}$.

Proof. By the remark preceding this theorem, if $r$ is an $n$-place decimal fraction, then $10^n r$ is an integer. Therefore, $10^n r + 1 = 10^n (r + 10^{-n})$ is also an integer. Consequently $r + 10^{-n}$ is an $n$-place decimal fraction. This proves (a).

The proof of (b) is based on the same idea. Since $r < s$, $10^n r$ is an integer less than $10^n s$. Thus, $10^n r + 1 \le 10^n s$. Hence, $r + 10^{-n} \le s$.

To prove (c), note that if $a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n = c_m c_{m-1} \cdots c_0 . d_1 d_2 \cdots d_n$ is multiplied by $10^n$, then we obtain

$$a_m a_{m-1} \cdots a_0 b_1 b_2 \cdots b_n = c_m c_{m-1} \cdots c_0 d_1 d_2 \cdots d_n.$$

Thus, by the uniqueness of the decimal representation of a natural number (Theorem 5-1.3), $a_m = c_m, a_{m-1} = c_{m-1}, \ldots, a_0 = c_0, b_1 = d_1, b_2 = d_2, \ldots, b_n = d_n$.

Finally, the proof of (d) is a simple calculation:

$$\begin{aligned}
a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n &= a_m \cdot 10^m + \cdots + a_0 + b_1 \cdot 10^{-1} + \cdots + b_n \cdot 10^{-n} \\
&\le a_m \cdot 10^m + \cdots + a_0 + b_1 \cdot 10^{-1} + \cdots + b_n \cdot 10^{-n} + b_{n+1} \cdot 10^{-(n+1)} + \cdots + b_{n+k} \cdot 10^{-(n+k)} \\
&= a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n b_{n+1} \cdots b_{n+k} \\
&\le (a_m \cdot 10^m + \cdots + a_0 + b_1 \cdot 10^{-1} + \cdots + b_n \cdot 10^{-n}) + (9 \cdot 10^{-(n+1)} + \cdots + 9 \cdot 10^{-(n+k)}) \\
&= (a_m \cdot 10^m + \cdots + a_0 + b_1 \cdot 10^{-1} + \cdots + b_n \cdot 10^{-n}) + (10^{-n} - 10^{-(n+k)}) \\
&< (a_m \cdot 10^m + \cdots + a_0 + b_1 \cdot 10^{-1} + \cdots + b_n \cdot 10^{-n}) + 10^{-n} \\
&= (a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n) + 10^{-n}.
\end{aligned}$$

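THEOREM 7-9.2. For infinite decimal sequences:

(a) $10^n \cdot (a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n b_{n+1} b_{n+2} \cdots) = a_m a_{m-1} \cdots a_0 b_1 b_2 \cdots b_n . b_{n+1} b_{n+2} \cdots$.

(b) If $0 \le c_i \le a_i \le 9$ for $i = m, m-1, \ldots, 0$, and $0 \le d_j \le b_j \le 9$ for $j = 1, 2, 3, \ldots$, and if $e_i = a_i - c_i$, $f_j = b_j - d_j$, then

$$(a_m \cdots a_0 . b_1 b_2 \cdots) - (c_m \cdots c_0 . d_1 d_2 \cdots) = e_m \cdots e_0 . f_1 f_2 \cdots.$$

(c) $a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n 000 \cdots = a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n$.

(d) $a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \le a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n b_{n+1} \cdots \le (a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n) + 10^{-n}$.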

The identities (a), (b), and (c) are easily proved, using the interpretation of decimal sequences as sums, together with Theorem 7-8.3. The proof of Theorem 7-9.2(d) is based on the corresponding result for finite decimal sequences, Theorem 7-9.1(d), and the definition of an infinite decimal sequence as the sum of an infinite series. Suppose that

$$a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n b_{n+1} \cdots = u$$

and

$$a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n = r.$$

We have to prove that $r \le u \le r + 10^{-n}$. Suppose that $u < r$. Then by Theorem 7-9.1(d),

$$u < r \le a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n b_{n+1} \cdots b_{n+k} \quad \text{for all } k \in N.$$

However, this is impossible, since by the definition of $u$, we have

$$u = \lim_{k\to\infty} \, (a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n b_{n+1} \cdots b_{n+k}),$$

and in particular, if $u < r$, there is a $k \in N$ such that

$$a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n b_{n+1} \cdots b_{n+k} < r.$$

Consequently, $r \le u$. In the same way, we see that $u \le r + 10^{-n}$.

Note that the second strict inequality of Theorem 7-9.1(d) has been weakened to $\le$ in Theorem 7-9.2(d). In Theorem 7-9.4 it will be shown that this weakening is essential.

In the proof of the fundamental theorem of decimal representation of real numbers we will use an important property of the real number system which has not yet been discussed.

(7-9.3). Let $x$ and $y$ be real numbers such that $x < y$. Then there is a rational number $r$ such that $x < r \le y$.

This result has been proved for complete ordered fields in Problems 5 and 6 of Section 7-6. However, a simpler proof can be given for the real number system if we go back to the construction of real numbers by Dedekind cuts. By Definition 7-4.4, the inequality $x < y$ means that $x$, considered as a set of rational numbers, properly contains $y$. Hence, there is a rational number $r$ such that $r$ belongs to $x$, but not to $y$. Then the Dedekind cut corresponding to $r$ contains $y$ and is contained properly in $x$. (See Problem 3, Section 7-4.) Thus, if we identify $r$ with the cut to which it corresponds and use Definition 7-4.4 again, we obtain $x < r \le y$.

THEOREM 7-9.4. Fundamental theorem of decimal representation. Let $u$ be a positive real number. Then $u$ is represented by some infinite decimal sequence $a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \cdots$. That is, $u$ is the limit of the infinite series corresponding to this decimal sequence.
Proof. The proof of this theorem consists of several steps. First we show that for any $n$, there is a unique $n$-place decimal fraction $r_n$ such that

$$r_n \le u < r_n + 10^{-n}.$$

The rational number $r_n$ is called the $n$-place decimal approximation of $u$. The next step of the proof is to show that there is an infinite decimal sequence $a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \cdots$ such that for each natural number $n$, the $n$-place decimal approximation $r_n$ of $u$ is exactly $a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n$. The proof is completed by showing that the infinite series corresponding to $a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \cdots$ converges to $u$.

(1) By (7-9.3), there is a rational number $r$ such that

$$u - 10^{-n} < r \le u.$$

Let $10^n r = a/m$, where $a \in Z$ and $m \in N$. By the division algorithm, we can write $a = mb + d$, where $b$ and $d$ are integers, and $0 \le d < m$. Then $10^n r = b + (d/m)$. In particular,

$$b \le 10^n r < b + 1.$$

There are two possible cases: either $b > 10^n u - 1$, or $b \le 10^n u - 1$. Suppose first that $b > 10^n u - 1$. Define $r_n = 10^{-n} b$. Then

$$r_n = 10^{-n} b \le r \le u < 10^{-n} (b + 1) = r_n + 10^{-n}.$$

Consequently, $r_n \le u < r_n + 10^{-n}$. Moreover, $r_n$ is an $n$-place decimal fraction, since $10^n r_n = b$ is an integer, and $b + 1 > 10^n u \ge 0$, so that $b \ge 0$. In the second case, where $b \le 10^n u - 1$, define $r_n = 10^{-n} (b + 1)$. Then $u - 10^{-n} < r < r_n \le u$, so that $r_n \le u < r_n + 10^{-n}$. As in the first case, $r_n$ is an $n$-place decimal fraction, since $10^n r_n = b + 1 \in Z$, and $-1 < 10^n u - 1 < 10^n r < b + 1$ implies $b + 1 \ge 0$.

It is easy to see that there is at most one $n$-place decimal fraction $r_n$ such that $r_n \le u < r_n + 10^{-n}$. Indeed, suppose that $s_n$ is an $n$-place decimal fraction such that $s_n \le u < s_n + 10^{-n}$. If $r_n < s_n$, then $r_n + 10^{-n} \le s_n$, by Theorem 7-9.1(b). However, this yields the contradiction $s_n \le u < r_n + 10^{-n} \le s_n$. For a similar reason, the inequality $s_n < r_n$ is impossible. Therefore, $r_n = s_n$. This completes the proof of the first step.

(2) Suppose that $r_n = a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n$ is the $n$-place decimal approximation of $u$, and that $r_{n+1} = c_m c_{m-1} \cdots c_0 . d_1 d_2 \cdots d_n d_{n+1}$ is the $(n+1)$-place decimal approximation of $u$. (We may assume that the number of digits to the left of the decimal point is the same for $r_n$ and $r_{n+1}$ by adjoining zeros, if necessary.) Let $s_n = c_m c_{m-1} \cdots c_0 . d_1 d_2 \cdots d_n$ be the $n$-place decimal fraction obtained from $r_{n+1}$ by deleting the last digit. Since $r_{n+1} = s_n + d_{n+1} \cdot 10^{-(n+1)}$ and $0 \le d_{n+1} \le 9$, it follows that $s_n \le r_{n+1} \le s_n + 9 \cdot 10^{-(n+1)}$. Moreover, $r_{n+1} \le u < r_{n+1} + 10^{-(n+1)}$, because $r_{n+1}$ is the $(n+1)$-place decimal approximation of $u$. Hence,

$$s_n \le r_{n+1} \le u < r_{n+1} + 10^{-(n+1)} \le s_n + 9 \cdot 10^{-(n+1)} + 10^{-(n+1)} = s_n + 10^{-n}.$$

Therefore, $s_n$ is the $n$-place decimal approximation of $u$. By the uniqueness of such $n$-place approximations, which was proved in (1), it follows that $s_n = r_n$. Hence by Theorem 7-9.1(c), $c_m = a_m, c_{m-1} = a_{m-1}, \ldots, c_0 = a_0, d_1 = b_1, d_2 = b_2, \ldots, d_n = b_n$.

We have proved that $r_{n+1} = a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n d_{n+1}$, that is, the $(n+1)$-place decimal approximation of $u$ is obtained from the $n$-place approximation by adding a single decimal digit. Thus, the sequence $r_1, r_2, r_3, \ldots$ of decimal approximations of $u$ gives rise to the infinite decimal sequence $a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \cdots$ such that for each $n$,

$$r_n = a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n.$$

This completes the second step of the proof.

(3) The $(m + 1 + n)$th partial sum of the infinite series

$$a_m \cdot 10^m + \cdots + a_0 + b_1 \cdot 10^{-1} + b_2 \cdot 10^{-2} + \cdots$$

is

$$a_m \cdot 10^m + \cdots + a_0 + b_1 \cdot 10^{-1} + b_2 \cdot 10^{-2} + \cdots + b_n \cdot 10^{-n} = a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n = r_n.$$

Therefore, to complete the proof, we have only to show that if $w_1 < u < w_2$, then there is a natural number $k$ such that $w_1 < r_n < w_2$ for all $n \ge k$. By Definitions 7-8.1 and 7-8.2, this will imply that the infinite series corresponding to the infinite decimal sequence

$$a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \cdots$$

converges to $u$. By Theorem 7-fj.2(t)), there is a natural number $k$ such that $10^{-k} = (10^{-1})^k < u - w_1$. Then $w_1 < u - 10^{-k}$. Therefore, if $n \ge k$,

$$w_1 < u - 10^{-k} \le u - 10^{-n} < r_n \le u < w_2.$$

This completes the proof of Theorem 7-9.4.

We now consider the second question which was mentioned at the beginning of this section: which real numbers can be represented by two or more infinite decimal sequences? It is easy to show that there are numbers which have different representations.

EXAMPLE 1. We will show that $0.999\ldots = 1 = 1.000\ldots$. By Theorem 7-9.2(a), $10 \cdot (0.999\ldots) = 9.999\ldots = 9 + (0.999\ldots)$. Hence, $9 \cdot (0.999\ldots) = 10 \cdot (0.999\ldots) - (0.999\ldots) = 9$. Dividing by 9 gives the desired conclusion.
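The identity of Example 1 can be viewed through the partial sums of the series for $0.999\ldots$. The sketch below is an illustration only, assuming Python's standard `fractions` module; `nines` is a hypothetical helper name. It confirms, for several values of $n$, that the $n$-place decimal fraction $0.99\ldots9$ falls short of 1 by exactly $10^{-n}$, a gap which tends to 0.

```python
from fractions import Fraction

def nines(n):
    """The n-place decimal fraction 0.99...9 with n nines, as an exact rational."""
    return sum(Fraction(9, 10 ** j) for j in range(1, n + 1))

for n in (1, 2, 5, 10):
    gap = 1 - nines(n)
    # The gap between 1 and the n-th partial sum is exactly 10**(-n).
    print(n, nines(n), gap == Fraction(1, 10 ** n))
```

No finite partial sum equals 1; the equality $0.999\ldots = 1$ is a statement about the limit of these partial sums in the sense of Definition 7-8.2.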

This example can easily be generalized.

THEOREM 7-9.5. Let $a_m, \ldots, a_0, b_1, b_2, \ldots, b_n$ be any decimal digits. Then

$$a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n 999 \ldots = (a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n) + 10^{-n}.$$

Proof. By Theorem 7-9.2 and Example 1,

$$\begin{aligned}
10^n \cdot (a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n 999 \ldots) &= a_m a_{m-1} \cdots a_0 b_1 b_2 \cdots b_n . 999 \ldots \\
&= (a_m a_{m-1} \cdots a_0 b_1 b_2 \cdots b_n) + (0.999 \ldots) \\
&= (a_m a_{m-1} \cdots a_0 b_1 b_2 \cdots b_n) + 1 \\
&= 10^n [(a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n) + 10^{-n}].
\end{aligned}$$

Dividing by $10^n$ gives the theorem.

This theorem shows that every $n$-place decimal fraction can be represented by two different infinite decimal sequences, one of which has all zeros after the $n$th digit to the right of the decimal point, and the other one having all nines after the $n$th digit to the right of the decimal point. We will prove that this is the only case in which a real number has more than one decimal representation.

THEOREM 7-9.6. Suppose that the real number $u$ is represented by two different decimal sequences,

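$$a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n b_{n+1} \cdots = u = c_m c_{m-1} \cdots c_0 . d_1 d_2 \cdots d_n d_{n+1} \cdots,$$

where for some $n$,

$$a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n < c_m c_{m-1} \cdots c_0 . d_1 d_2 \cdots d_n.$$

Then $b_{n+1} = 9, b_{n+2} = 9, b_{n+3} = 9, \ldots$, and $d_{n+1} = 0, d_{n+2} = 0, d_{n+3} = 0, \ldots$.

Proof. For $k \ge 1$, let

$$r_k = a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_k, \qquad s_k = c_m c_{m-1} \cdots c_0 . d_1 d_2 \cdots d_k.$$

By Theorem 7-9.2(d),

$$r_k \le u \le r_k + 10^{-k}, \qquad s_k \le u \le s_k + 10^{-k}.$$

If $r_k < s_k$, then by Theorem 7-9.1(b),

$$r_k + 10^{-k} \le s_k.$$

Hence, $r_k < s_k$ implies

$$u = s_k = r_k + 10^{-k}.$$

By Theorem 7-9.1(d), if $r_k < s_k$ it also follows that

$$r_{k+1} < r_k + 10^{-k} = s_k \le s_{k+1},$$

so that $r_k < s_k$ also yields $r_{k+1} < s_{k+1}$. Since by assumption $r_n < s_n$, an induction argument proves that for all $k \ge n$,

$$r_k < s_k, \quad \text{and} \quad u = s_k = r_k + 10^{-k}.$$

Thus, if $k \ge n$,

$$s_k = u = s_n \quad \text{and} \quad r_k = s_n - 10^{-k}.$$

By the uniqueness theorem for finite decimal sequences [Theorem 7-9.1(c)], applied to $s_k = s_n$ with both sides written as $k$-place decimal fractions,

$$d_{n+1} = d_{n+2} = \cdots = d_k = 0.$$

Let $e_m \cdots e_0 . f_1 \cdots f_n$ denote the $n$-place decimal fraction $s_n - 10^{-n}$, which is nonnegative because $r_n + 10^{-n} \le s_n$. Performing the decimal subtraction $s_n - 10^{-k}$ in the usual way gives

$$a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_k = r_k = e_m e_{m-1} \cdots e_0 . f_1 f_2 \cdots f_n 99 \cdots 9,$$

with $k - n$ nines. Consequently, by Theorem 7-9.1(c) again, $a_m = e_m, a_{m-1} = e_{m-1}, \ldots, a_0 = e_0, b_1 = f_1, b_2 = f_2, \ldots, b_n = f_n, b_{n+1} = 9, b_{n+2} = 9, \ldots, b_k = 9$. (Note that if $d_n > 0$, then $b_n = d_n - 1, b_{n-1} = d_{n-1}, \ldots, b_1 = d_1, a_0 = c_0, \ldots, a_{m-1} = c_{m-1}, a_m = c_m$.) Since $k$ can be any natural number greater than or equal to $n$, this completes the proof of Theorem 7-9.6.

We can summarize the results of Theorems 7-9.3, 7-9.5, and 7-9.6 as follows: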

THEOREM 7-9.7. If $u$ is a positive real number which is not a decimal fraction, then $u$ can be represented in exactly one way as an infinite decimal sequence. If $u$ is a decimal fraction, then $u$ can be represented in exactly two ways as an infinite decimal fraction. One of these representations ends with a sequence of nines and the other ends with a sequence of zeros.
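The $n$-place decimal approximation $r_n$ used in the proof of Theorem 7-9.4 can be computed directly when $u$ is rational. The following is a minimal sketch, not the construction of the proof itself: it assumes that $r_n$ may be obtained as $\lfloor 10^n u \rfloor / 10^n$, which satisfies $r_n \le u < r_n + 10^{-n}$ and therefore agrees with $r_n$ by the uniqueness shown in step (1). The name `decimal_approximation` is hypothetical.

```python
from fractions import Fraction
from math import floor

def decimal_approximation(u, n):
    """The n-place decimal fraction r_n with r_n <= u < r_n + 10**(-n)."""
    return Fraction(floor(u * 10 ** n), 10 ** n)

u = Fraction(2, 7)  # sample value, chosen only for illustration
for n in range(1, 7):
    r_n = decimal_approximation(u, n)
    # Each r_n satisfies the defining inequality of the n-place approximation.
    print(n, float(r_n), r_n <= u < r_n + Fraction(1, 10 ** n))
```

Successive values of $r_n$ differ only by one appended digit, which is the content of step (2) of the proof.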

1. Give all possible infinite decimal representations of the following numbers. (a) 1.01 (b) (c) 5 (d) &

2. Carry out the proof of Theorem 7-9.5 for the particular case of the number $0.4999\ldots$.

3. In which of the following proofs is the completeness property of $R$ used either directly or indirectly: the proof of Theorem 7-8.5; the proof of Theorem 7-9.4; the proof of Theorem 7-9.5; the proof of Theorem 7-9.6?

4. Prove Theorem 7-9.2(a), (b), and (c).

5. Prove the following refinement of (7-9.3): let $x$ and $y$ be real numbers such that $x < y$; then there is a rational number $r$ such that $x < r < y$.


6. An infinite binary sequence is an expression

$$a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \cdots,$$

where $a_m, a_{m-1}, \ldots, a_0, b_1, b_2, \ldots, b_n, \ldots$ are binary digits 0 or 1. Such a binary sequence represents the number

$$a_m \cdot 2^m + a_{m-1} \cdot 2^{m-1} + \cdots + a_0 + b_1 \cdot 2^{-1} + b_2 \cdot 2^{-2} + \cdots + b_n \cdot 2^{-n} + \cdots.$$

(a) State the analogue for infinite binary sequences of each of Theorems 7-8.5, 7-9.4, 7-9.5 and 7-9.6. (b) Find the binary sequence representing

h.

7. If $x$ is any real number, define the greatest integer function $[x]$ as follows:

$$[x] = \max \{ n \in N \mid n \le x \}.$$

If $x$ has the decimal representation $a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \cdots$, what is $[x]$?

*7-10 Applications of decimal representations. The possibility of representing real numbers by infinite decimal sequences is of considerable practical importance. However, the theorems on decimal representation also have theoretical applications of some importance. One of these will be presented in this section. This application leads naturally to a discussion of the decimal representation of rational numbers.

Cantor's theorem. One of the most interesting applications of the decimal representation theorem for real numbers is Cantor's proof that the set of all real numbers is not denumerable. That is, there is no one-to-one correspondence between the set $R$ of all real numbers and the set $N$ of natural numbers. The proof is by contradiction: we assume that such a correspondence exists and show that this assumption leads to a contradiction. Suppose that $n \leftrightarrow u_n$

is a one-to-one correspondence between $R$ and $N$. Let

$$M = \{ n \in N \mid n \leftrightarrow u_n, \text{ where } 0 \le u_n < 1 \}.$$

Let the elements of $M$ be labeled $n_1, n_2, n_3, \ldots$, with $n_1 < n_2 < n_3 < \cdots$. Note that $M$ must be infinite, since the set of all real numbers between 0 and 1 is infinite. Therefore, we obtain the one-to-one correspondence

$$k \leftrightarrow u_{n_k}$$

between $N$ and the set of all real numbers $u$ such that $0 \le u < 1$. Expressing each $u_{n_k}$ as an infinite decimal sequence, we obtain the table

$$\begin{aligned}
1 &\leftrightarrow u_{n_1} = 0 . b_{1,1} b_{1,2} b_{1,3} \cdots \\
2 &\leftrightarrow u_{n_2} = 0 . b_{2,1} b_{2,2} b_{2,3} \cdots \\
3 &\leftrightarrow u_{n_3} = 0 . b_{3,1} b_{3,2} b_{3,3} \cdots \\
&\ \ \vdots
\end{aligned}$$

The contradiction which we are seeking is obtained by constructing a decimal sequence $0 . c_1 c_2 \cdots c_k c_{k+1} \cdots$ corresponding to a number $v$, different from 1, which cannot occur in the list

$$u_{n_1}, u_{n_2}, u_{n_3}, \ldots$$

(contradicting the assumption that this list contains all real numbers $u$ satisfying $0 \le u < 1$). To obtain $v$, let $c_1$ be any decimal digit which is different from $b_{1,1}$, 0, and 9; let $c_2$ be any decimal digit which is different from $b_{2,2}$, 0, and 9; $\ldots$; let $c_k$ be any decimal digit which is different from $b_{k,k}$, 0, and 9; and so forth. Note that there are at least six possible choices for each of the numbers $c_1, c_2, c_3, \ldots$. Define

$$v = 0 . c_1 c_2 c_3 \cdots c_k \cdots.$$

Then $v$ is a positive real number, less than 1, which does not end with a sequence of zeros or nines. Hence, the decimal representation of $v$ is unique, by Theorem 7-9.7. Moreover, for every $k$, $v \ne u_{n_k}$. In fact, by the way that $v$ was constructed, the decimal representation of $v$ is different from the decimal representation of $u_{n_k}$. Since $v$ has only one decimal representation, this implies that $v \ne u_{n_k}$. This completes the proof.
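The choice of the digits $c_1, c_2, c_3, \ldots$ in the argument above can be carried out mechanically on any finite portion of the table. The sketch below is illustrative only: the sample rows are arbitrary digits chosen for the example, `diagonal_digits` is a hypothetical helper name, and picking the smallest admissible digit is one convention among many that the proof allows.

```python
def diagonal_digits(rows):
    """Given rows[k] = digits of the k-th listed number, return c_1, ..., c_K
    with each c_k different from the diagonal digit rows[k][k], from 0, and from 9."""
    result = []
    for k, row in enumerate(rows):
        forbidden = {row[k], 0, 9}
        # Digits 1..8 always leave at least seven admissible candidates.
        c = next(d for d in range(1, 9) if d not in forbidden)
        result.append(c)
    return result

# A finite sample of a hypothetical list (an assumption for illustration).
rows = [
    [1, 4, 1, 5],
    [7, 1, 8, 2],
    [3, 3, 3, 3],
    [0, 0, 0, 0],
]
c = diagonal_digits(rows)
# The constructed digits differ from every row in at least the diagonal position.
print(c, all(c[k] != rows[k][k] for k in range(len(rows))))
```

Excluding 0 and 9 is what makes the argument airtight: by Theorem 7-9.7 the resulting number has only one decimal representation, so differing in one digit means differing as a real number.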

THEOREM 7-10.1. Cantor's theorem. The set of all real numbers is not denumerable.

Cantor's theorem shows conclusively that it is not possible in any way to set up a one-to-one correspondence between the points of a line and the rational numbers. The example given in Section 7-2 showed that the natural coordinate correspondence between $Q$ and the points of a line $l$ does not exhaust all points of $l$. The fact that it is not possible to establish any correspondence between $Q$ and the points of $l$ is a much stronger result. To prove that no such correspondence is possible, observe that by the discussion of Section 7-3, there is a one-to-one correspondence between the real numbers and the points of $l$. Therefore, the existence of a one-to-one correspondence between the points of $l$ and $Q$ would lead to a one-to-one correspondence between $R$ and $Q$. However, since $Q$ is denumerable (Problem 5, Section 1-2), this contradicts Cantor's theorem.

Perhaps even more important than the result of Cantor's theorem is the method used for its proof. The crucial step in this proof is the observation that if an infinite list of sequences is given, arranged in the form of a rectangular array,

$$\begin{array}{ccccc}
a_{1,1}, & a_{1,2}, & a_{1,3}, & a_{1,4}, & \ldots \\
a_{2,1}, & a_{2,2}, & a_{2,3}, & a_{2,4}, & \ldots \\
a_{3,1}, & a_{3,2}, & a_{3,3}, & a_{3,4}, & \ldots \\
\vdots & \vdots & \vdots & \vdots &
\end{array}$$

then any sequence $c_1, c_2, c_3, c_4, \ldots$ which differs at each entry from the "diagonal" sequence $a_{1,1}, a_{2,2}, a_{3,3}, a_{4,4}, \ldots$ must differ from every one of the sequences $a_{i,1}, a_{i,2}, a_{i,3}, a_{i,4}, \ldots$, which are the rows of the rectangular array. This type of argument is usually called the diagonal method. It occurs in the proofs of some of the most important theorems of modern mathematics.

The representation of rational numbers. In Section 7-2 the term "irrational" was introduced to describe those real numbers which do not belong to $Q$. Cantor's theorem shows in a striking way that the irrational numbers are much more abundant than the rational numbers. It is natural to ask if there is some way to recognize the decimal representation of a rational number. We will now prove that the infinite decimal sequences which represent positive rational numbers are exactly the ones which are ultimately periodic.

DEFINITION 7-10.2. A decimal sequence is called ultimately periodic if it is of the form

$$a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n b_{n+1} \cdots b_{n+k} b_{n+1} \cdots b_{n+k} b_{n+1} \cdots b_{n+k} \cdots.$$

That is, from a certain point on, the decimal sequence consists of repetitions of a finite sequence of decimal digits.

It is convenient to abbreviate ultimately periodic decimal sequences by writing

$$a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \overline{b_{n+1} \cdots b_{n+k}}.$$

The line over the block of digits indicates that this finite sequence is repeated indefinitely in the decimal sequence.

EXAMPLE 1. The expression $33.\overline{3}$ stands for $33.3333\ldots = 100/3$. This number could just as well be abbreviated $33.3\overline{3}$, or even $33.\overline{33}$, etc.

EXAMPLE 2. The expression $121.4\overline{27}$ stands for $121.4272727\ldots$. This could also be written $121.42\overline{72}$, or $121.427\overline{27}$. The possibility of expressing the number $u$ represented by $121.4\overline{27}$ as $121.427\overline{27}$ leads to a method of determining $u$ as a rational fraction. In fact

$$1000 \cdot u = 1000 \cdot (121.427\overline{27}) = 121427.\overline{27} = 121427 + 0.\overline{27}$$

and

$$10 \cdot u = 10 \cdot (121.4\overline{27}) = 1214.\overline{27} = 1214 + 0.\overline{27}.$$

Subtracting these equations, the terms $0.\overline{27}$ on the right-hand sides cancel each other, leaving

$$990 \cdot u = 121427 - 1214 = 120213, \quad \text{so that} \quad u = \frac{120213}{990} = \frac{13357}{110}.$$

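The argument used in Example 2 can easily be generalized to show that every ultimately periodic decimal sequence represents a rational number.

THEOREM 7-10.3. Let $u = a_m a_{m-1} \cdots a_0 . b_1 b_2 \cdots b_n \overline{b_{n+1} \cdots b_{n+k}}$. Then $u$ is the rational number

$$\frac{(a_m a_{m-1} \cdots a_0 b_1 b_2 \cdots b_n b_{n+1} \cdots b_{n+k}) - (a_m a_{m-1} \cdots a_0 b_1 b_2 \cdots b_n)}{10^{n+k} - 10^n}.$$

In fact,

$$\begin{aligned}
(10^{n+k} - 10^n) \cdot u &= [(a_m \cdots a_0 b_1 b_2 \cdots b_n b_{n+1} \cdots b_{n+k}) + 0.\overline{b_{n+1} \cdots b_{n+k}}] - [(a_m \cdots a_0 b_1 b_2 \cdots b_n) + 0.\overline{b_{n+1} \cdots b_{n+k}}] \\
&= [(a_m \cdots a_0 b_1 b_2 \cdots b_n b_{n+1} \cdots b_{n+k}) - (a_m \cdots a_0 b_1 b_2 \cdots b_n)] + 0.000 \ldots,
\end{aligned}$$

by Theorem 7-9.2(b). Dividing gives the desired result.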
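The formula of Theorem 7-10.3 translates directly into a short computation. The following is a sketch under stated assumptions: the digits are passed as strings, `periodic_to_fraction` is a hypothetical helper name, and Python's standard `fractions` module reduces the result to lowest terms.

```python
from fractions import Fraction

def periodic_to_fraction(integer_part, prefix, block):
    """Rational value of integer_part . prefix (block repeated forever).

    integer_part: digits a_m ... a_0 as a string, e.g. "121"
    prefix:       digits b_1 ... b_n as a string (may be empty)
    block:        repeating digits b_{n+1} ... b_{n+k} as a nonempty string
    """
    n, k = len(prefix), len(block)
    with_block = int(integer_part + prefix + block)   # a_m ... a_0 b_1 ... b_{n+k}
    without_block = int(integer_part + prefix)        # a_m ... a_0 b_1 ... b_n
    return Fraction(with_block - without_block, 10 ** (n + k) - 10 ** n)

# The two examples worked in the text: 121.4(27) and 33.(3).
print(periodic_to_fraction("121", "4", "27"))  # 13357/110
print(periodic_to_fraction("33", "", "3"))     # 100/3
```

The denominator $10^{n+k} - 10^n$ consists of $k$ nines followed by $n$ zeros, which is why 990 appears in Example 2.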


A theorem such as this always suggests a converse. In this case we are led to ask if every decimal representation of a nonnegative rational number is ultimately periodic. There is evidence to support this conjecture. For example, by Theorem 7-9.7 the two decimal representations of any decimal fraction are ultimately periodic since they end in a sequence of zeros or nines. Consider also the following example.

EXAMPLE 3. It is possible to obtain the decimal expansion of rational numbers by long division. For instance, dividing 4 by 7 produces the successive quotient digits 5, 7, 1, 4, 2, 8, with remainders 5, 1, 3, 2, 6, 4.

Since the remainder 4 is the same as the number which we began dividing, it is clear that continuation of the process will give the block 571428 repeatedly. Therefore, it seems certain that

$$\frac{4}{7} = 0.\overline{571428}.$$

(The only reason for not trusting this conclusion is that we have not shown that the continued use of long division does really lead to the decimal representation of a fraction. However, the validity of the result can be checked directly from Theorem 7-10.3.)
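The long division of Example 3 can be sketched as a short program that stops as soon as a remainder repeats. This is an illustration, not a proof that long division yields the decimal representation; `long_division` is a hypothetical helper name, and the inputs are assumed to be natural numbers.

```python
def long_division(c, d):
    """Long division of c by d (natural numbers).

    Returns (integer part, non-repeating digits, repeating block) of c/d.
    """
    q0, r = divmod(c, d)
    digits, seen = [], {}
    while r not in seen:
        seen[r] = len(digits)          # position at which remainder r first occurs
        digit, r = divmod(10 * r, d)   # one step of long division
        digits.append(digit)
    start = seen[r]                    # the block repeats from this position on
    return q0, digits[:start], digits[start:]

print(long_division(4, 7))   # (0, [], [5, 7, 1, 4, 2, 8])
print(long_division(1, 4))   # (0, [2, 5], [0])
```

Because only the remainders $0, 1, \ldots, d - 1$ can occur, the loop runs at most $d$ times, which is the counting principle used in the proof of Theorem 7-10.4.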

THEOREM 7-10.4. Every decimal represeiitatioii of a positive ratioiial number is ultimately peiiodic. The idea underlying the proof of this theorem is the same as the priiiciple operating in Example 3. The process of dividing the numerator of the fraction by the denominator must somewhere yield two remainders which are equal. When the same remainder occurs st second time, the decimal begiiis to repeat. This idea is somewhat disguised iii the following proof.

7- 1O]

APPLIC~~TIOSS OF DECIMAL HEPRESEXTATIOSS

283

Let $u$ be the positive rational number $c/d$, where $c$ and $d$ are natural numbers. If $u$ is a decimal fraction, then its two decimal representations are ultimately periodic, by Theorem 7-9.7. Therefore suppose that $u$ is not a decimal fraction, so that its decimal representation

$$u = a_m a_{m-1} \ldots a_0 . b_1 b_2 b_3 \ldots$$

is unique. Using the division algorithm, we can write

$$c = q_0 d + r_0, \quad 10c = q_1 d + r_1, \quad 10^2 c = q_2 d + r_2, \quad \ldots,$$

where $q_0, q_1, q_2, \ldots, r_0, r_1, r_2, \ldots$ are nonnegative integers, and $0 \le r_i < d$ for $i = 0, 1, 2, \ldots$. There are at most $d$ different values which $r_i$ can take (actually, at most $d - 1$, since the assumption that $u$ is not a decimal fraction implies that $r_i \ne 0$). Therefore, in the list of numbers

$$r_0, r_1, r_2, \ldots, r_{d-1}$$

there must be two which are equal. Suppose that

$$r_n = r_{n+k} = r, \quad n \ge 0, \quad k > 0.$$

Then

$$10^n u = q_n + r/d, \quad 10^{n+k} u = q_{n+k} + r/d.$$

Note that $r/d$, $q_n + r/d$, and $q_{n+k} + r/d$ are not decimal fractions, since otherwise $c/d$ would be a decimal fraction. This fact is needed so that we can use the uniqueness of the decimal representations which was proved in Theorem 7-9.6. Because $q_n$ and $q_{n+k}$ are integers and $0 \le r/d < 1$, it follows that

$$0.b_{n+1} \ldots b_{n+k} b_{n+k+1} \ldots b_{n+2k} b_{n+2k+1} \ldots = r/d = 0.b_{n+k+1} \ldots b_{n+2k} b_{n+2k+1} \ldots b_{n+3k} \ldots.$$

Therefore, $b_{n+j} = b_{n+k+j}$ for $j = 1, 2, 3, \ldots$. Consequently,

$$u = a_m a_{m-1} \ldots a_0 . b_1 b_2 \ldots b_n \overline{b_{n+1} \ldots b_{n+k}},$$

that is, the decimal representation of $u$ is ultimately periodic.
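The repeated-remainder idea behind Theorem 7-10.4 can be tried out by machine. The following is a minimal sketch, not part of the original text: the function name `decimal_expansion` is hypothetical, and the code uses ordinary long division (multiply the previous remainder by 10), which produces the same remainders as dividing $10^i c$ by $d$.

```python
def decimal_expansion(c, d):
    """Long division of c by d (c >= 0, d > 0).

    Returns (integer_part, preperiod_digits, period_digits).
    The period is empty when c/d is a decimal fraction.
    """
    q, r = divmod(c, d)
    seen = {}      # remainder -> position of the digit it produces
    digits = []
    while r != 0 and r not in seen:
        seen[r] = len(digits)
        digit, r = divmod(10 * r, d)
        digits.append(digit)
    if r == 0:
        # The division terminated: c/d is a decimal fraction.
        return q, digits, []
    start = seen[r]  # the first time this remainder occurred
    return q, digits[:start], digits[start:]


print(decimal_expansion(4, 7))   # (0, [], [5, 7, 1, 4, 2, 8])
print(decimal_expansion(1, 6))   # (0, [1], [6])
```

Since a nonzero remainder can take at most $d - 1$ values, the loop always stops after fewer than $d$ steps, which is the counting argument used in the proof.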

1. Find the rational numbers which are represented by the following ultimately periodic decimal sequences. (a) 21.01 (b) 4.0010012 (c) 0.00111.
2. Find the decimal sequences which represent the following rational numbers. (a) 2/7 (b) 201/999 (c) 18/17
3. Let u be the real number 0.1010010001000010... whose decimal representation consists of a sequence of ones separated by blocks of zeros, with the length of each block equal to the number of ones which precede it. Show that u is irrational.
4. Show that the number

k=1 is irrational.
5. Let A be a set containing at least two elements. Use the diagonal method to prove that the set S of all sequences $a_1, a_2, a_3, \ldots$ of elements of A is not denumerable. Use this result to show that the set of all subsets of the set N is not denumerable. [Hint: Let A = {0, 1} and establish a one-to-one correspondence between the set P(N) of all subsets of N and the set S of all sequences $a_1, a_2, a_3, \ldots$ of zeros and ones. For instance, let $M \leftrightarrow a_1, a_2, a_3, \ldots$, where for each $i$, $a_i = 0$ if $i \notin M$ and $a_i = 1$ if $i \in M$.]

6. Let

[For example, $(121.2121\ldots,\ 003.3333\ldots) \leftrightarrow 102013.23132313\ldots$.] (a) Show that this definition establishes a one-to-one correspondence between the set of all pairs (u, v) of real numbers and a subset T of R, provided the following convention is accepted: each real number is represented by an infinite decimal sequence (which may begin with a finite number of zeros), possibly ending with a sequence of all zeros, but not with a sequence of nines. (b) Show that T is not all of R.


7. Use Theorem 7-10.4 and the proof of Theorem 7-10.3 to show that any natural number k which is not divisible by 2 or 5 will divide some number of the sequence 9, 99, 999, 9999, . . . .
8. Show that if m is a natural number which is relatively prime to 10, then the decimal expansion of 1/m is of the form

where d is the order of 10 modulo m (see Definition 5-8.8).

CHAPTER 8

THE COMPLEX NUMBERS


8-1 The construction of the complex numbers. One of the properties of the system of real numbers which was derived in the preceding chapter concerned the solution of the equation $x^m = u$, where $m$ is a natural number and $u$ is a real number: If $m$ is odd, then $x^m = u$ has exactly one real solution, and if $m$ is even and $u$ is positive, there are exactly two solutions which are real numbers. However, if $m$ is even and $u$ is negative, then the equation $x^m = u$ has no real solution (see Theorem 7-6.3, and Problem 7, Section 4-5). In particular, the equation $x^2 = -1$ has no solution in the field of real numbers. The desirability of solving such equations poses a problem which should now be familiar to the reader: invent a new number system which contains R and which includes numbers which satisfy the equations under consideration. More precisely, we wish to construct a number system C which satisfies the following conditions:

(i) C is a field containing R as a subring, and
(ii) C contains a number* $i$ which satisfies $i^2 = -1$. (8-1)

It is also reasonable to require that C be minimal among the systems satisfying these conditions. That is, there should be no proper subring of C which also satisfies (8-1). For otherwise, we could attain our objectives more economically with the subring than with C.

The construction which gives the desired field turns out to be remarkably easy. The result is the complex number system, which not only contains a solution of $x^2 = -1$, but also solutions of the most general algebraic equations.

Complex numbers were introduced in about 1560 by the Italian mathematician Rafael Bombelli (1530-1572?). Bombelli was a teacher at the University of Bologna, an important center of mathematics during the Renaissance. Until about 1800, complex numbers were viewed as mysterious objects, devoid of any real meaning.† At the end of the eighteenth century, several mathematicians independently gave logically correct definitions and useful geometrical interpretations of these numbers.

* The use of the symbol $i$ to represent $\sqrt{-1}$ in C is standard mathematical notation. This element is usually called the imaginary unit.
† A vestige of the early mysticism surrounding complex numbers is the common use of the term "imaginary" to distinguish them from "real" numbers.

In order to see how the complex numbers and their operations should be defined, we suppose that there is a field F which satisfies the conditions (8-1). Then F contains R and the number $i$, and therefore it will contain all expressions of the form $u + i \cdot v$, where $u$ and $v$ are real numbers. Also, since the usual rules of arithmetic are available in a field, it is easy to derive expressions for the sum, negative, and product of such numbers:

$$\begin{aligned}
(x + i \cdot y) + (u + i \cdot v) &= (x + u) + i \cdot (y + v), \\
-(x + i \cdot y) &= (-x) + i \cdot (-y), \\
(x + i \cdot y) \cdot (u + i \cdot v) &= xu + i \cdot (yu + xv) + i^2 \cdot yv \\
&= (xu - yv) + i \cdot (yu + xv).
\end{aligned} \tag{8-2}$$

These identities show that the collection of all the elements which can be written in the form $u + i \cdot v$ is a subring of F. Moreover, it is not hard to show that this subring also satisfies the conditions (8-1). In particular, if F is a field C with all of the desired properties, then the assumption that C is minimal implies that C coincides with the subring of all elements of the form $u + i \cdot v$. Therefore, it must be possible to write every element of C in the form $u + i \cdot v$, where $u$ and $v$ are real numbers.

It is apparent that $u + i \cdot v$ is determined by the two real numbers $u$ and $v$. Moreover, if $x + i \cdot y = u + i \cdot v$, then $x = u$ and $y = v$. Indeed, if $y \ne v$, then $i = (x - u)/(v - y)$. However, $i^2 = -1$, and since $(x - u)/(v - y)$ is a real number, it follows that $[(x - u)/(v - y)]^2 \ge 0$, which is a contradiction. Therefore, $y = v$, and consequently $x = u$. We can summarize this discussion by saying that if a number system C with the desired properties exists at all, then there is a one-to-one correspondence,

$$(u, v) \leftrightarrow u + i \cdot v,$$

between the set $R \times R$ of all ordered pairs of real numbers and C. This observation suggests that a way to construct the complex numbers is to define suitable operations on the set $R \times R$. The identities of (8-2) show how the operations of addition, negation, and multiplication must be defined for the ordered pairs.

There is another important fact which is a consequence of the above discussion. Any two rings which satisfy all of the requirements desired for C are isomorphic. That is, if C exists at all, then C is unique.

DEFINITION 8-1.1. The set C of all complex numbers consists of all ordered pairs $(u, v)$ of real numbers. If $(x, y) \in C$ and $(u, v) \in C$, then
(a) $(x, y) + (u, v) = (x + u,\ y + v)$;
(b) $-(x, y) = (-x, -y)$; and
(c) $(x, y) \cdot (u, v) = (xu - yv,\ yu + xv)$.
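The three operations of Definition 8-1.1 translate directly into code on ordered pairs. The sketch below is only an illustration: the function names `add`, `neg`, and `mul` are hypothetical, and Python tuples of numbers stand in for pairs of real numbers.

```python
def add(p, q):
    """(x, y) + (u, v) = (x + u, y + v)."""
    (x, y), (u, v) = p, q
    return (x + u, y + v)


def neg(p):
    """-(x, y) = (-x, -y)."""
    x, y = p
    return (-x, -y)


def mul(p, q):
    """(x, y) . (u, v) = (xu - yv, yu + xv)."""
    (x, y), (u, v) = p, q
    return (x * u - y * v, y * u + x * v)


i = (0, 1)
print(mul(i, i))            # (-1, 0), that is, -(1, 0)
print(mul((2, 1), (1, 2)))  # (0, 5)
```

On the right-hand sides the signs `+`, `-`, and `*` are the known operations on real numbers, while `add`, `neg`, and `mul` are the operations being defined on pairs.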


The ordered pairs of real numbers are definite objects which can be interpreted as complex numbers without any logical contradiction. However, the set of all ordered pairs of real numbers often occurs in mathematics with other interpretations. The intended meaning of $(u, v)$ should be specified whenever such pairs are used. In the case of complex numbers, this will usually be unnecessary, because once we show that the system C defined above satisfies the requirements listed in (8-1), it will be possible to return to the convenient notation $u + i \cdot v$.

The reader should be aware of the double use of the signs $+$, $-$, and $\cdot$ in Definition 8-1.1. On the left-hand sides of the identities (a), (b), and (c), they represent the operations which are being defined for ordered pairs, while on the right-hand sides of these equalities, they indicate the known operations in the field R of real numbers.

There is no problem about the operations in Definition 8-1.1 being well defined, as there was in the case of the rational numbers and the real numbers. Definition 8-1.1 involves no arbitrary choice, such as was made in defining the operations on the equivalence classes which are the elements of Q. Also, the expressions on the right-hand sides of (a), (b), and (c) obviously belong to the set C of all ordered pairs of real numbers, so that the problem of closure, which was troublesome in defining addition, negation, and multiplication of real numbers, does not arise.

It must now be shown that the complex numbers as defined above satisfy the description given in (8-1). This result is the content of the two following theorems.

THEOREM 8-1.2. The set C of all complex numbers with the operations defined in Definition 8-1.1 is a field with $(0, 0)$ as the zero element and $(1, 0)$ as the identity element. The complex number $(0, 1)$ is a solution of the equation

$$x^2 = -(1, 0).$$

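Proof. The proof that C is a commutative ring with $(0, 0)$ as the zero element and $(1, 0)$ as the identity element consists of checking the identities of Definition 4-2.1 in a straightforward way. For example, we will prove the associative law of multiplication:

$$\begin{aligned}
(u_1, v_1) \cdot ((u_2, v_2) \cdot (u_3, v_3)) &= (u_1, v_1) \cdot (u_2 u_3 - v_2 v_3,\ v_2 u_3 + u_2 v_3) \\
&= (u_1 u_2 u_3 - u_1 v_2 v_3 - v_1 v_2 u_3 - v_1 u_2 v_3,\ v_1 u_2 u_3 - v_1 v_2 v_3 + u_1 v_2 u_3 + u_1 u_2 v_3), \\
((u_1, v_1) \cdot (u_2, v_2)) \cdot (u_3, v_3) &= (u_1 u_2 - v_1 v_2,\ v_1 u_2 + u_1 v_2) \cdot (u_3, v_3) \\
&= (u_1 u_2 u_3 - v_1 v_2 u_3 - v_1 u_2 v_3 - u_1 v_2 v_3,\ v_1 u_2 u_3 + u_1 v_2 u_3 + u_1 u_2 v_3 - v_1 v_2 v_3).
\end{aligned}$$

Hence, $(u_1, v_1) \cdot ((u_2, v_2) \cdot (u_3, v_3)) = ((u_1, v_1) \cdot (u_2, v_2)) \cdot (u_3, v_3)$.

To prove that C is a field, it is necessary to show that if $(u, v) \ne (0, 0)$ in C and $(w, z) \in C$, then there exists $(x, y) \in C$ such that

$$(u, v) \cdot (x, y) = (w, z). \tag{8-3}$$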

If both sides of this equality are multiplied on the left by $(u, -v)$, then by the associative law just proved, we obtain

$$(u^2 + v^2, 0) \cdot (x, y) = (u, -v) \cdot (w, z) = (uw + vz,\ (-v)w + uz).$$

The real number $u^2 + v^2$ is not zero, since otherwise

$$0 = u^2 + v^2 \ge u^2 \ge 0 \quad \text{and} \quad 0 = u^2 + v^2 \ge v^2 \ge 0,$$

so that $u = v = 0$. This contradicts the assumption that $(u, v) \ne (0, 0)$. Thus, $(u^2 + v^2)^{-1}$ exists in R, and

$$\begin{aligned}
(x, y) = (1, 0) \cdot (x, y) &= (((u^2 + v^2)^{-1}, 0) \cdot (u^2 + v^2, 0)) \cdot (x, y) \\
&= ((u^2 + v^2)^{-1}, 0) \cdot (u, -v) \cdot (w, z) \\
&= ((u^2 + v^2)^{-1} \cdot (uw + vz),\ (u^2 + v^2)^{-1} \cdot (uz - vw)).
\end{aligned}$$

As frequently happens in elementary algebra, the steps which lead to the solution of (8-3) can be reversed to prove that the expression obtained for $(x, y)$ really is a solution:

$$\begin{aligned}
(u, v) \cdot ((u^2 + v^2)^{-1} \cdot (uw + vz),\ (u^2 + v^2)^{-1} \cdot (uz - vw)) &= (u, v) \cdot (u, -v) \cdot ((u^2 + v^2)^{-1}, 0) \cdot (w, z) \\
&= (u^2 + v^2, 0) \cdot ((u^2 + v^2)^{-1}, 0) \cdot (w, z) \\
&= (1, 0) \cdot (w, z) = (w, z).
\end{aligned}$$

By Definition 8-1.1 (c) and (b), $(0, 1)^2 = (-1, 0) = -(1, 0)$. This observation completes the proof of Theorem 8-1.2.
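The solution of (8-3) found in the proof can be checked with exact rational arithmetic. This is a minimal sketch under stated assumptions: `mul` and `solve` are hypothetical names, and rational numbers (Python's `fractions.Fraction`) stand in for real numbers so that no rounding occurs.

```python
from fractions import Fraction


def mul(p, q):
    """Product of pairs as in Definition 8-1.1(c)."""
    (x, y), (u, v) = p, q
    return (x * u - y * v, y * u + x * v)


def solve(uv, wz):
    """Return (x, y) with (u, v) . (x, y) = (w, z), assuming (u, v) != (0, 0)."""
    (u, v), (w, z) = uv, wz
    n = u * u + v * v              # u^2 + v^2, nonzero unless u = v = 0
    if n == 0:
        raise ZeroDivisionError("(u, v) must not be (0, 0)")
    inv = Fraction(1) / n          # (u^2 + v^2)^(-1)
    return (inv * (u * w + v * z), inv * (u * z - v * w))


x_y = solve((1, 1), (3, 2))
print(x_y)                 # (Fraction(5, 2), Fraction(-1, 2))
print(mul((1, 1), x_y))    # (Fraction(3, 1), Fraction(2, 1))
```

Multiplying the result back by $(u, v)$ reproduces $(w, z)$, which is the reversed computation at the end of the proof.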

The definition of a complex number as an ordered pair $(u, v)$ of real numbers was suggested by the correspondence $(u, v) \leftrightarrow u + i \cdot v$, where $u$ and $v$ are real and $i^2 = -1$. In particular, a real number $u = u + i \cdot 0$ should correspond to the pair $(u, 0)$.

THEOREM 8-1.3. The correspondence $u \leftrightarrow (u, 0)$ is an isomorphism between R and the subring $R' = \{(u, 0) \mid u \in R\}$ of C. Each element $(u, v)$ of C can be expressed in the form $(u, 0) + (0, 1) \cdot (v, 0)$.

This theorem, whose proof we leave for the reader, is the justification for identifying each real number $u$ with the corresponding element $(u, 0)$ of $R'$. If this identification is made, then R becomes a subring of C. Thus, C satisfies condition (i) of (8-1). In particular, we have now attained the following chain of inclusions relating the classical number systems of mathematics:

$$N \subset Z \subset Q \subset R \subset C.$$

For simplicity, each element of R in C will be denoted by a single symbol such as $0$, $1$, $2$, $u$, and $v$, rather than by the corresponding pair $(0, 0)$, $(1, 0)$, $(2, 0)$, $(u, 0)$, and $(v, 0)$. Note that with this notation, 0 and 1 represent the zero and identity of C, as they should. Moreover, by Theorem 8-1.2, $(0, 1)^2 = -1$, which leads to an exact definition of the symbol $i$ as an abbreviation for $(0, 1)$. Therefore,

$$i^2 = -1,$$

and C satisfies condition (ii) of (8-1). By virtue of the notation just introduced, the expression $u + i \cdot v$ takes on a definite meaning as a complex number. In fact,

$$u + i \cdot v = (u, 0) + (0, 1) \cdot (v, 0) = (u, v).$$

We see from this equality that every complex number can be represented uniquely in the form $u + i \cdot v$, with $u$ and $v$ real numbers. From this fact it follows easily that no proper subring of C satisfies (8-1); that is, C is minimal.

1. Express the following complex numbers in the form $u + iv$.
(a) $(-1, 1)$ (b) $(0, 1) \cdot (2, -1)$ (c) $(2, 1) \cdot (1, 2)$ (d) $(3, 2)/(1, 1)$ (e) $(u, v)^{-1}$, where $(u, v) \ne (0, 0)$
2. Complete the proof of Theorem 8-1.2.
3. Prove Theorem 8-1.3.
4. Determine the value of the sum … for all values of n.
5. Show that the following sets are subrings of C.
(a) $\{(r, s) \mid r \in Q,\ s \in Q\}$ (b) $\{(a, b) \mid a \in Z,\ b \in Z\}$

Is either of these subrings a field? Is either of them isomorphic to N, Z, Q, or R?


8-2 Complex conjugates and the absolute value in C. It is not possible to define an ordering of the complex numbers such that they will form an ordered field. For if C could be made into an ordered field, then $i^2 = -1$ would have to be both positive and negative, by Theorem 4-5.3. However, the field C has some important special properties which are not present in every field, or even in ordered fields. In this section the consequences of some of these properties will be examined.

Throughout this section and the remainder of the book, we will represent complex numbers either by single letters, such as $z$ and $w$, or else by the notation $x + iy$ and $u + iv$, where $x$, $y$, $u$, and $v$ denote real numbers. The discussion at the end of Section 8-1 justifies this convention. Since $x + iy = u + iv$ implies that $x = u$ and $y = v$, we see that the real numbers $x$ and $y$ appearing in the representation $z = x + iy$ are uniquely determined by the complex number $z$. The real number $x$ is called the real part of $z$, and $y$ is called the imaginary part of $z$. It is convenient to write $x = \mathcal{R}(z)$ and $y = \mathcal{I}(z)$ in this case. That is,

$$z = \mathcal{R}(z) + i\,\mathcal{I}(z). \tag{8-4}$$

It is obvious from the definition of addition and negation in C that

$$\begin{aligned}
\mathcal{R}(z + w) &= \mathcal{R}(z) + \mathcal{R}(w), & \mathcal{R}(-z) &= -\mathcal{R}(z), \\
\mathcal{I}(z + w) &= \mathcal{I}(z) + \mathcal{I}(w), & \mathcal{I}(-z) &= -\mathcal{I}(z).
\end{aligned} \tag{8-5}$$

DEFINITION 8-2.1. Let $z = x + iy$ be a complex number. The complex conjugate of $z$ is the number

$$\bar{z} = x + i(-y).$$

We will often simplify the phrase "complex conjugate of $z$" to "conjugate of $z$," although the latter expression has a broader meaning in other phases of algebra. Also, it is customary to write $x - iy$ instead of $x + i(-y)$.

THEOREM 8-2.2. Let $z$ and $w$ be complex numbers. Then
(a) $\overline{z + w} = \bar{z} + \bar{w}$;
(b) $\overline{(-z)} = -\bar{z}$;
(c) $\overline{z \cdot w} = \bar{z} \cdot \bar{w}$;
(d) if $w \ne 0$, then $\overline{z/w} = \bar{z}/\bar{w}$;
(e) if $w = \bar{z}$, then $\bar{w} = z$, that is, $\bar{\bar{z}} = z$;
(f) $z + \bar{z} = 2\mathcal{R}(z)$, $z - \bar{z} = 2i\,\mathcal{I}(z)$.

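We will omit the proof of (a), (b), (e), and (f). The proof of (c) is obtained by direct computation. Let $z = x + iy$, $w = u + iv$. Then $\bar{z} = x + i(-y)$, and $\bar{w} = u + i(-v)$. Hence,

$$\overline{z \cdot w} = \overline{[(xu - yv) + i(yu + xv)]} = (xu - yv) + i[-(yu + xv)],$$

and

$$\bar{z} \cdot \bar{w} = [xu - (-y)(-v)] + i[(-y)u + x(-v)] = (xu - yv) + i[-(yu + xv)] = \overline{z \cdot w}.$$

To prove (d), note that by what we have just shown

$$\overline{(z/w)} \cdot \bar{w} = \overline{(z/w) \cdot w} = \bar{z}.$$

Hence, $\overline{z/w} = \bar{z}/\bar{w}$.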

If $z = x + iy$, then

$$z \cdot \bar{z} = (x + iy) \cdot (x - iy) = x^2 + y^2.$$

Therefore $z \cdot \bar{z}$ is a nonnegative real number, and it has a square root in R.

DEFINITION 8-2.3. Let $z = x + iy$ be a complex number. The absolute value or modulus of $z$ is the nonnegative real number

$$|z| = \sqrt{z \cdot \bar{z}} = \sqrt{x^2 + y^2}.$$

If $z$ is a real number, say $z = x + i0$, then $|z| = \sqrt{x^2}$. If $x \ge 0$, then $\sqrt{x^2} = x$. If $x < 0$, then $\sqrt{x^2} = -x$. Therefore, the definition of the absolute value of $z$ given above is consistent with Definition 4-6.6 for the absolute value of elements of an ordered integral domain (in particular, the absolute value in R).

THEOREM 8-2.4. Let $z$ and $w$ be complex numbers. Then
(a) $|z| \ge 0$; if $|z| = 0$, then $z = 0$;
(b) $z \cdot \bar{z} = |z|^2$;
(c) $|\bar{z}| = |z|$;
(d) $|-z| = |z|$;
(e) $|z \cdot w| = |z| \cdot |w|$;
(f) if $w \ne 0$, then $|z/w| = |z|/|w|$;
(g) $|\mathcal{R}(z)| \le |z|$, $|\mathcal{I}(z)| \le |z|$;
(h) $|z + w| \le |z| + |w|$.

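Proof. Let $z = x + iy$. By Definition 8-2.3, $|z| \ge 0$. If $|z| = 0$, then $0 = x^2 + y^2 \ge x^2 \ge 0$; hence, $x = 0$; similarly, $y = 0$. To prove (b), observe that

$$z \cdot \bar{z} = (x + iy) \cdot [x + i(-y)] = x^2 + y^2 = |z|^2.$$

The equality (c) is obtained from (b) and Theorem 8-2.2(e) by taking the square root of both sides of the identity $|\bar{z}|^2 = \bar{z} \cdot \bar{\bar{z}} = \bar{z} \cdot z = z \cdot \bar{z} = |z|^2$. Using Definition 8-2.3,

$$|-z| = \sqrt{(-x)^2 + (-y)^2} = \sqrt{x^2 + y^2} = |z|.$$

The identity (e) is obtained from (b) and Theorem 8-2.2(c) by taking the square root of both sides of the equality

$$|z \cdot w|^2 = (z \cdot w) \cdot \overline{(z \cdot w)} = (z \cdot \bar{z}) \cdot (w \cdot \bar{w}) = |z|^2 \cdot |w|^2.$$

Using this result, we have $|z/w| \cdot |w| = |(z/w) \cdot w| = |z|$. If $w \ne 0$, then $|w| \ne 0$ by (a), so that this identity can be divided by $|w|$ to obtain (f). If $z = x + iy$, then by definition,

$$|\mathcal{R}(z)| = |x| = \sqrt{x^2} \le \sqrt{x^2 + y^2} = |z|.$$

The second statement of (g) is proved in a similar way. Finally, to obtain the triangle inequality (h), note that by Theorem 8-2.2,

$$\begin{aligned}
|z + w|^2 = (z + w) \cdot (\bar{z} + \bar{w}) &= z\bar{z} + (z\bar{w} + \overline{z\bar{w}}) + w\bar{w} \\
&= |z|^2 + 2\mathcal{R}(z\bar{w}) + |w|^2 \\
&\le |z|^2 + 2|z\bar{w}| + |w|^2 = |z|^2 + 2|z|\,|w| + |w|^2 = (|z| + |w|)^2.
\end{aligned}$$

Taking the square root of the first and last term of this inequality yields (h).

The theorem we have just proved contains the most important elementary properties of the absolute value. The reader should become thoroughly familiar with these facts.

The result of Theorem 8-2.4(b) can be used to calculate quotients in C. The general idea, which was used implicitly in the proof of Theorem 8-1.2, is that

$$\frac{z}{w} = \frac{z \cdot \bar{w}}{w \cdot \bar{w}} = \frac{z \cdot \bar{w}}{|w|^2}.$$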
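The use of Theorem 8-2.4(b) for quotients, multiplying numerator and denominator by the conjugate of the denominator so that the denominator becomes the real number $|w|^2$, can be sketched as follows. Python's built-in `complex` type (floating point) stands in for C here, and the function name `quotient` is hypothetical.

```python
def quotient(z, w):
    """Compute z / w as (z * conj(w)) / |w|^2, assuming w != 0."""
    z, w = complex(z), complex(w)
    norm_sq = w.real ** 2 + w.imag ** 2   # w * conj(w) = |w|^2, a real number
    if norm_sq == 0:
        raise ZeroDivisionError("w must be nonzero")
    return (z * w.conjugate()) / norm_sq


print(quotient(3 + 2j, 1 + 1j))   # (2.5-0.5j)
```

The only division performed is by a real number, which is the point of the method.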

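We will use the results of Theorems 8-2.2 and 8-2.4 and the fact that every nonnegative real number has a square root to prove that every complex number $w$ has a square root $z$ in C. In fact, an explicit expression for $z$ in terms of $w$ can be obtained.

Let $w \in C$. First assume that there is a complex number $z$ which satisfies $z^2 = w$. We will solve this equation for $z$ in terms of $w$. By Theorem 8-2.4, $z^2 = w$ implies $|w| = |z^2| = |z|^2 = z \cdot \bar{z}$. Therefore,

$$2\mathcal{R}(z) \cdot z = (z + \bar{z}) \cdot z = z^2 + z \cdot \bar{z} = w + |w|. \tag{8-6}$$

Moreover, by Theorem 8-2.2,

$$[2\mathcal{R}(z)]^2 = (z + \bar{z})^2 = z^2 + 2z\bar{z} + \bar{z}^2 = w + 2|w| + \bar{w} = 2[|w| + \mathcal{R}(w)].$$

Thus, $2[|w| + \mathcal{R}(w)]$ is a nonnegative real number. Taking square roots, we obtain:

$$2\mathcal{R}(z) = \pm\sqrt{[2\mathcal{R}(z)]^2} = \pm\sqrt{2[|w| + \mathcal{R}(w)]}. \tag{8-7}$$

There are now two cases to consider. If $|w| + \mathcal{R}(w) \ne 0$, then by (8-7), $2\mathcal{R}(z) \ne 0$. Then (8-6) can be written in the form

$$z = \frac{w + |w|}{2\mathcal{R}(z)}.$$

Using (8-7) again, we find that $z$ has the two possible values,

$$z = \pm\frac{|w| + w}{\sqrt{2[|w| + \mathcal{R}(w)]}}. \tag{8-8}$$

If $|w| + \mathcal{R}(w) = 0$, then

$$\mathcal{I}(w)^2 = |w|^2 - \mathcal{R}(w)^2 = [|w| - \mathcal{R}(w)] \cdot [|w| + \mathcal{R}(w)] = 0.$$

Hence $\mathcal{I}(w) = 0$, and $w = \mathcal{R}(w) = -|w|$. That is, $w = -u$, where $u$ is a nonnegative real number. By (8-7), $\mathcal{R}(z) = 0$. Hence, $z = i\,\mathcal{I}(z)$, and

$$-|w| = w = z^2 = [i\,\mathcal{I}(z)]^2 = -\mathcal{I}(z)^2.$$

Therefore, $\mathcal{I}(z) = \pm\sqrt{|w|}$. Consequently, in this case $z$ has the two possible values,

$$z = \pm i\sqrt{|w|}. \tag{8-9}$$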

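Our discussion shows that if there is a complex number $z$ satisfying $z^2 = w$, then $z$ must have the form (8-8) if $|w| + \mathcal{R}(w) \ne 0$, and $z$ is given by (8-9) if $|w| + \mathcal{R}(w) = 0$. It remains to show conversely that if $z$ is given by (8-8) in case $|w| + \mathcal{R}(w) \ne 0$, and by (8-9) when $|w| + \mathcal{R}(w) = 0$, then $z^2 = w$. This is done by an easy computation. Suppose first that $|w| + \mathcal{R}(w) \ne 0$. Then

$$\left(\pm\frac{|w| + w}{\sqrt{2[|w| + \mathcal{R}(w)]}}\right)^2 = \frac{|w|^2 + 2|w|w + w^2}{2[|w| + \mathcal{R}(w)]} = \frac{w \cdot (\bar{w} + 2|w| + w)}{2[|w| + \mathcal{R}(w)]} = \frac{w \cdot 2[|w| + \mathcal{R}(w)]}{2[|w| + \mathcal{R}(w)]} = w.$$

If $|w| + \mathcal{R}(w) = 0$, then $(\pm i\sqrt{|w|})^2 = -|w| = w$.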

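THEOREM 8-2.5. If $w$ is any nonzero complex number, then there are exactly two complex numbers $z$ such that $z^2 = w$. If $|w| + \mathcal{R}(w) \ne 0$, then these numbers are given by

$$z = \frac{|w| + w}{\sqrt{2[|w| + \mathcal{R}(w)]}} \quad \text{and} \quad z = -\frac{|w| + w}{\sqrt{2[|w| + \mathcal{R}(w)]}}.$$

If $|w| + \mathcal{R}(w) = 0$, the solutions of $z^2 = w$ are

$$z = i\sqrt{|w|} \quad \text{and} \quad z = -i\sqrt{|w|}.$$

For any complex number $w$, it is convenient to let the symbol $\sqrt{w}$ stand for

$$\frac{|w| + w}{\sqrt{2[|w| + \mathcal{R}(w)]}}$$

if $|w| + \mathcal{R}(w) \ne 0$, and for $i\sqrt{|w|}$ if $|w| + \mathcal{R}(w) = 0$. Then we can say that the two square roots of $w$ are $\sqrt{w}$ and $-\sqrt{w}$.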

EXAMPLE 4. Let $w = 3 + i4$. Then $|w| = 5$, $\mathcal{R}(w) = 3$, and $2[|w| + \mathcal{R}(w)] = 16$. Hence

$$\sqrt{w} = \frac{5 + 3 + i4}{\sqrt{16}} = 2 + i.$$
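The explicit expression for $\sqrt{w}$ can be evaluated numerically. The following is a minimal sketch, assuming Python's floating-point `complex` type as a stand-in for C; the name `book_sqrt` is hypothetical.

```python
import math


def book_sqrt(w):
    """Square root of w by the two-case formula of Theorem 8-2.5."""
    w = complex(w)
    s = abs(w) + w.real                    # |w| + R(w), never negative
    if s != 0:
        return (abs(w) + w) / math.sqrt(2 * s)
    # Here w = -|w| is a nonpositive real number.
    # Note: with floats, s can be tiny but nonzero when w is very close
    # to the negative real axis, and the first branch then loses accuracy.
    return 1j * math.sqrt(abs(w))


print(book_sqrt(3 + 4j))   # (2+1j), as in Example 4
print(book_sqrt(-4))       # 2j
```

In both cases the value returned has nonnegative real part, and its negative is the other square root.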

In the case of square roots of complex numbers, just as for square roots of real numbers, we must be careful not to assume that $\sqrt{w^2}$ is always equal to $w$. It is in fact easy to see that

$$\begin{aligned}
\sqrt{w^2} &= w \ \text{ if } \mathcal{R}(w) > 0, & \sqrt{w^2} &= -w \ \text{ if } \mathcal{R}(w) < 0, \\
\sqrt{w^2} &= w \ \text{ if } \mathcal{R}(w) = 0 \text{ and } \mathcal{I}(w) > 0, & \sqrt{w^2} &= -w \ \text{ if } \mathcal{R}(w) = 0 \text{ and } \mathcal{I}(w) < 0.
\end{aligned} \tag{8-10}$$

In any case, $\sqrt{w^2} = \pm w$. More generally, we obtain the following result.

(8-2.6). Let $z$ and $w$ be complex numbers. Then
(a) $\sqrt{z \cdot w} = \pm(\sqrt{z} \cdot \sqrt{w})$;
(b) if $w \ne 0$, then $\sqrt{z/w} = \pm(\sqrt{z}/\sqrt{w})$.

We leave the proof of these identities for the reader.

The theorem that every complex number has a square root in C can be used to show that any quadratic equation

$$ax^2 + bx + c = 0, \tag{8-11}$$

where $a$, $b$, and $c$ are complex numbers and $a \ne 0$, has a solution $x$ in C. Suppose that $x$ is a complex number which satisfies (8-11). Rewrite (8-11) in the form

$$a\left(x^2 + \frac{b}{a}x + \frac{b^2}{4a^2}\right) - \frac{b^2}{4a} + c = 0.$$

That is, the term $b^2/4a$ is added and subtracted on the left-hand side of (8-11), so that the expression in parentheses becomes a perfect square. This is the familiar method of completing the square. It leads to the equality

$$\left(x + \frac{b}{2a}\right)^2 = \frac{b^2 - 4ac}{4a^2}.$$

Therefore,

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \tag{8-12}$$

Conversely, it can be checked by direct substitution that the two numbers given in (8-12) are solutions of (8-11). That is, the following result holds.

THEOREM 8-2.7. Let $a$, $b$, and $c$ be complex numbers and $a \ne 0$. Then the solutions of the equation

$$ax^2 + bx + c = 0$$

are given by the formula

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$

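EXAMPLE 5. Find the solutions of the equation

$$(1 + i)x^2 + (1 + i2)x - 2 = 0.$$

Apply the formula of Theorem 8-2.7 with $a = 1 + i$, $b = 1 + i2$, and $c = -2$. Here $b^2 - 4ac = (-3 + i4) + (8 + i8) = 5 + i12$, and $\sqrt{5 + i12} = 3 + i2$. Hence

$$x = \frac{-(1 + i2) \pm (3 + i2)}{2(1 + i)},$$

so that the solutions are

$$x = \frac{2}{2(1 + i)} = \tfrac{1}{2}(1 - i) \quad \text{and} \quad x = \frac{-4(1 + i)}{2(1 + i)} = -2.$$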
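The formula of Theorem 8-2.7 can be checked on Example 5 by machine. This is a sketch under stated assumptions: the name `solve_quadratic` is hypothetical, and the standard library function `cmath.sqrt` is used as a stand-in for the square root defined after Theorem 8-2.5 (both return the root with nonnegative real part).

```python
import cmath


def solve_quadratic(a, b, c):
    """Return the two solutions of a*x**2 + b*x + c = 0 for complex a != 0."""
    if a == 0:
        raise ValueError("a must be nonzero")
    root = cmath.sqrt(b * b - 4 * a * c)   # one square root of the discriminant
    return ((-b + root) / (2 * a), (-b - root) / (2 * a))


# Example 5: (1 + i)x^2 + (1 + 2i)x - 2 = 0
x1, x2 = solve_quadratic(1 + 1j, 1 + 2j, -2)
print(x1, x2)   # approximately (0.5-0.5j) and (-2+0j)
```

Substituting either value back into the left-hand side of the equation gives a result that is zero up to rounding error.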

l. Simplify the following quotients.

2. Find the square roots: (a) 4 7 i(24) (b) 4 2 (c)

4 -

(d)

d m

(e)

fl

3. Find the solutions of the following equations. 1 = O (b) (3 i)x2 (a) x2 2ix 10x - (9 -j- i3) = O (c) -5x2+ 2/Zx - 1 = o 4. Prove Theorem 8-2.2(a), (b), (e), and (f).

+ +


5. Show that if $z$ and $w$ are complex numbers, then the following are true.
(a) $|z - w| \le |z| + |w|$ (b) $|z - w| \ge |z| - |w|$ (c) $|z + w|^2 + |z - w|^2 = 2(|z|^2 + |w|^2)$
6. Show that for any complex number $w$, $\mathcal{R}(\sqrt{w}) \ge 0$.
7. Prove (8-10).
8. Prove (8-2.6).
9. Show that if $\mathcal{R}(w) > 0$ and $\mathcal{R}(z) > 0$, then $\sqrt{z \cdot w} = \sqrt{z} \cdot \sqrt{w}$.

10. Show that the numbers given by (8-12) are solutions of (8-11).
11. Prove that if $w = u + iv$, and $v \ne 0$, then

12. Let $w = a + ib$, where $a$ and $b$ are integers. Prove that $|w|$ is an integer if and only if $w = tz^2$ or $w = itz^2$, where $z = r + is$ with $r$, $s$, and $t$ integers. [Hint: See Theorem 5-5.4.]
13. Solve for $z$ in terms of $w$ in the equation $z^4 = w$.

8-3 The geometrical representation of complex numbers. We mentioned in Section 8-1 that ordered pairs $(u, v)$ of real numbers have several interpretations in mathematics. One of the most familiar applications of these pairs occurs in analytic geometry. In fact, analytic geometry is based on the "coordinatization" of the plane, that is, a one-to-one correspondence between the set of all points P of the plane, and the set of all pairs $(x, y)$ of real numbers. This correspondence provides an important way of representing complex numbers by points in the plane.

For the reader who is not familiar with analytic geometry, we will discuss briefly the process of defining coordinate systems in a plane. The construction begins with the choice of any two perpendicular lines. It is convenient to take one of these to be horizontal. This line is called the x-axis, and is denoted by X. The other line must then be vertical. It is called the y-axis, and is denoted by Y. Let O be the point of intersection of X and Y. The point O is called the origin of the coordinate system in the plane. Let I be a point on X which lies to the right of O. Using OI as the basic unit interval, define a coordinate system on X by the construction described in Section 7-3. Let J be a point on Y, above O, such that the distance $\overline{OJ}$ is equal to the distance $\overline{OI}$. That is, the segments OI and OJ are congruent. Establish a coordinate system on Y using OJ as the basic unit interval.

Let P be any point in the plane. Construct the line $l$ through P and parallel to X (hence perpendicular to Y). Also draw the line $m$ passing through P and parallel to Y (hence perpendicular to X). Then $l$ meets Y at some point S, and $m$ meets X at some point T. Let $x$ be the real number corresponding to T in the coordinate system on X. Let $y$ be the real number corresponding to S in the coordinate system on Y. We associate with P the number pair $(x, y)$ (see Fig. 8-1).

Different points evidently correspond in this way to different number pairs, and every pair of real numbers is associated with some point. In fact, the point corresponding to $(x, y)$ can be found as the intersection of the vertical line through the point associated with $x$ on X and the horizontal line through the point corresponding to $y$ on Y. Thus, $P \leftrightarrow (x, y)$ is a one-to-one correspondence between the set of all points of the plane and the set of all pairs of real numbers. A plane, together with a correspondence between points and number pairs defined in this way, is called a coordinate plane. The numbers $x$ and $y$ in the pair $(x, y)$ corresponding to the point P are called the cartesian* coordinates of P. Sometimes, to be more specific, $x$ is called the X-coordinate or abscissa of P, and $y$ is called the Y-coordinate or ordinate of P. The points on the x-axis are exactly the points whose coordinates are of the form $(x, 0)$. The points on the y-axis have coordinates $(0, y)$. In particular, $O \leftrightarrow (0, 0)$, $I \leftrightarrow (1, 0)$, and $J \leftrightarrow (0, 1)$.

(8-3.1). Let S and T be points with cartesian coordinates $(x_S, y_S)$ and $(x_T, y_T)$, respectively. Then the distance between S and T is

$$\overline{ST} = \sqrt{(x_S - x_T)^2 + (y_S - y_T)^2}.$$

* The term "cartesian" is used in honor of the French mathematician and philosopher René Descartes (1596-1650), who was the founder of analytic geometry.

Proof. The proof of this statement is based on the Pythagorean triangle theorem. Let $l_S$ and $l_T$ be horizontal lines through S and T, respectively, and let $m_S$ and $m_T$ be vertical lines through these same points. Let P be the point of intersection of the perpendicular lines $m_S$ and $l_T$. Then PST is a right triangle with ST as its hypotenuse. Figure 8-2 illustrates a typical situation. If ST is horizontal or vertical, the triangle PST is degenerate, and this case requires special treatment. The lines $m_S$ and $m_T$ intersect the x-axis at points corresponding to the real numbers $x_S$ and $x_T$, and these two points, together with T and P, determine a rectangle. Thus, the distance between P and T is the same as the distance between the points on the x-axis corresponding to $x_S$ and $x_T$: $\overline{TP} = |x_S - x_T|$. Similarly, $\overline{PS} = |y_S - y_T|$. Hence,

$$\overline{ST}^2 = \overline{TP}^2 + \overline{PS}^2 = |x_S - x_T|^2 + |y_S - y_T|^2 = (x_S - x_T)^2 + (y_S - y_T)^2.$$

Taking the square root completes the proof.

We now turn to the representation of complex numbers as points in a coordinate plane. This representation is obtained simply by using the definition of complex numbers as pairs of real numbers, and associating each complex number $x + iy = (x, y)$ with the point in the coordinate plane whose coordinates are $x$ and $y$. It is then possible to use the complex numbers as labels for the corresponding points in the plane (see Fig. 8-3), just as the real numbers are used to represent the points on a line. The term complex plane is often used to describe a coordinate plane whose points are labeled by complex numbers.

If complex numbers are interpreted in this way, then the operations with them have interesting geometrical meanings. For example, if $z = x + iy$, then $\mathcal{R}(z) = x$ is the abscissa of $z$ and $\mathcal{I}(z) = y$ is the ordinate of $z$. Thus, in particular, the real numbers represent points on the x-axis. The absolute value $|z| = \sqrt{x^2 + y^2}$ is the distance from the origin O to $z$. More generally, if $z$ and $w$ are complex numbers, then $|z - w|$ is the distance between the point $z$ and the point $w$. To see this, let $z = x + iy$ and $w = u + iv$. Then $|z - w| = |(x - u) + i(y - v)| = \sqrt{(x - u)^2 + (y - v)^2}$ which, by (8-3.1), is the distance between the point with coordinates $(x, y)$ and the point with coordinates $(u, v)$. Often it is possible to give concise descriptions of sets of points in the plane, using complex numbers.

EXAMPLE 1. $\{z \mid \mathcal{I}(z) > 0\}$ is the set of all points in the upper half plane; in other words, the set of all points which lie above the x-axis.
EXAMPLE 2. $\{z \mid |z| < 1\}$ is the set of all points which have distance less than one from the origin O, that is, the set of points which lie inside a circle of radius one with center at O.
EXAMPLE 3. $\{z \mid |z - i| = 1\}$ is the set of all points on the circle with center at $i$, and radius equal to one.
EXAMPLE 4. $\{z \mid \mathcal{I}(z) = m\mathcal{R}(z)\}$, where $m$ is a real number, is the set of all points on a line $l$ through the origin, with slope equal to $m$ (see Fig. 8-4).
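The sets in Examples 1 through 4 can be expressed as membership tests on complex numbers. The following is a rough sketch: all function names are hypothetical, Python's `complex` type stands in for points of the complex plane, and the exact equalities of Examples 3 and 4 are replaced by a small tolerance because floating-point numbers are used.

```python
import math

TOL = 1e-12  # tolerance replacing exact equality (an assumption of this sketch)


def distance(z, w):
    """Distance between the points z and w, namely |z - w|."""
    return abs(z - w)


def in_upper_half_plane(z):     # Example 1
    return z.imag > 0


def in_open_unit_disk(z):       # Example 2
    return abs(z) < 1


def on_unit_circle_about_i(z):  # Example 3
    return abs(abs(z - 1j) - 1) < TOL


def on_line_with_slope(z, m):   # Example 4
    return abs(z.imag - m * z.real) < TOL


z, w = complex(1, 2), complex(4, 6)
print(distance(z, w))           # 5.0
print(math.hypot(1 - 4, 2 - 6)) # 5.0, the distance formula (8-3.1)
```

The agreement of `distance` with `math.hypot` is the statement that $|z - w|$ equals the distance given by (8-3.1).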

The addition of complex numbers has an interesting geometrical meaning. Let $z$ and $w$ be complex numbers representing points in the complex plane. Let O be the origin in the plane. If $z$, $w$, and O lie on a line $l$ which is not the y-axis, then by Example 4, $\mathcal{I}(z) = m\mathcal{R}(z)$ and $\mathcal{I}(w) = m\mathcal{R}(w)$ for some real number $m$. Consequently, $\mathcal{I}(z + w) = \mathcal{I}(z) + \mathcal{I}(w) = m[\mathcal{R}(z) + \mathcal{R}(w)] = m\mathcal{R}(z + w)$. Therefore, $z + w$ corresponds to a point on $l$. [If $z$ and $w$ lie on the y-axis, then $\mathcal{R}(z) = \mathcal{R}(w) = 0$ implies $\mathcal{R}(z + w) = 0$, and $z + w$ is on the y-axis.] If the origin O does not separate $z$ from $w$ on $l$, then $z + w$ is at a distance $|z| + |w|$ from O on the same side as $z$ and $w$. If O is between $z$ and $w$, then $z + w$ is at a distance $\big||z| - |w|\big|$ from O, on the same side as $z$ if $|z| > |w|$, and on the same side as $w$ if $|w| > |z|$ (see Fig. 8-5). If $z$, $w$, and O do not lie on the same line, then the point corresponding to $z + w$ can be determined by the parallelogram rule.

(8-3.2). Parallelogram rule. Let $z$ and $w$ be complex numbers representing points S and T, such that O, S, and T do not lie on a line. If P is the point corresponding to $z + w$, then OSPT is a parallelogram (see Fig. 8-6).

It should be remarked in connection with (8-3.2) that OSPT is the order in which the vertices are encountered in moving around the sides of the parallelogram. That is, ST and OP are diagonals, not sides of the figure. The proof of (8-3.2) is an exercise in elementary geometry, which we will leave for the interested reader.

1. Draw coordinate axes in a plane, and plot the points with the following coordinates: (2, 1), (-1, 2), (-1, -l), (-$, +). 2. Find the distance between the following pairs of points in the complex plane. (a) 4, i9 (b) i, 2 - i (c) -1 i, 1 - i (d) 9 + i(15),4 - i9

3. Describe the following sets in geometrical terms.


(4 (b) (4 (d) (e) (f) (g) g(2) 5 1) 1) (21 Iz - 1 1 > 1) (21 1z21 = 4) (21 1z2 - 2x+ 1 > 0) ( z j ~ ( 2 ) = r t l , S ( x ) = f1) (zlg(2) = 2@(4,g(2) 2 01
(21-1

{zl9(ix)

4. Show that $\{z \mid |z - w| = \sqrt{2}\,|z|\}$ is the circle with center at $-w$, with radius $\sqrt{2}\,|w|$. [Hint: Use the identity obtained in Problem 5(c), Section 8-2: $|z + w|^2 + |z - w|^2 = 2(|z|^2 + |w|^2)$.]
5. What is the geometrical interpretation of the law $|z + w| \le |z| + |w|$? Use this interpretation to decide when the equality $|z + w| = |z| + |w|$ holds.
6. What is the geometrical meaning of the identity given in Problem 5(c), Section 8-2?
7. Describe the method of finding the point corresponding to $z + w$, if you are given the points corresponding to $z$ and $w$.
8. What is the geometrical interpretation of $\bar{z}$, $-z$, and $z - w$ (in terms of the points represented by $z$ and $w$)?
9. Show that if $z \ne 0$, where $z$ is a complex number, and if $t$ is any real number, then the points O, $z$, and $tz$ are all in a line. Show that O, $z$, and $w$ lie on a line if and only if either $z = 0$, or $w$ is a real multiple of $z$.
10. Prove (8-3.2).

8-4 Polar representation. T o iiitcrpret multiplication, \ve introduce a 11t.n- n-ay of 1-eprcsciitirig complcx iiumbcrs. 'i'his reprcsciitatioil is bascd oii thc polar coordinatc systcm* used iii analytic geometry, and for this rcasoii it is crtlled the polar reprcscntation of complcs iiiimbers. Lct O be tkic origin of s cartesirtn coordiilate system iii tlie plaiie. As iri the 1;ist scctioii, tlic points of the plaiie \vil1 bc lat~eledby complex iy be a noiizero complex iiumbcr. 'i'he line scgnumbcrs. I,et x = .r merit from O to the poiiit x has lciigth 171 = 2 / F + T I,et . 8 dcilote thc :lnglct whicbh this lirie scgmeiit makcs n-itli tlic riglit lialf of the .r-axis (SCC Z:ig. 8-7). -1s is (bi~stomiiry, \ve will meusiirc aiiglcs couiitcrcloc~k\vise bctu-ccii O :~iid 300 dcgrccs. 'i'hc compoiiciits .L. :iiid of x cnii be espresscd in tcrms of 12; alid 8 by the cquatioiis 2: X

+-

.r

{xi cos e,

;xl siii 8.

FIGURE 8-7

* 'i?ic dcfiriition of polar coordinates niakcs use of t,he geonictrical concept of "anglc" arid tlic trigononictric functions "sinc" and "cosine." Iii ordcr t o obtaiii rcsults sucli as 'i'licorcril 8-4.3, bcloiv, ~vithoutresortirig to tlie usc of "gconietric:illy cvident" f:icts, \\-e \\-ould liavc to dcfiiic :iriglcs, sirics, aiid cosines rnorc carcfully, and work h:irtl to show tliat thcsc, iiotions liave tlie 1,roljerties n-liiclli are o1)vious to our geoii~etricalintuitioii. Tlo~\-cver, no attcriipt will be iiiadc to carry out sucli a 1)rograni here. t Tlie syrnbol 8 is tlic siilall Grcck ltttcr tlicta. -Ingles in ri~atlien~atics are oftcn rc1)rcscntctl by l o \ ~ e r case Greek lettcrs, sucli as 8, 4 (phi), and x (chi).


Hence, we can write

$$z = |z| (\cos\theta + i \sin\theta).$$

This expression is called the polar representation of $z$. The angle $\theta$ is called the argument, or amplitude, of the complex number $z$. This angle will be denoted by $\operatorname{Arg} z$. It should be remembered that the argument of the complex number 0 is not defined. Since we are measuring angles in degrees* between 0 and 360, the argument of every nonzero complex number satisfies

$$0 \le \operatorname{Arg} z < 360.$$
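The passage between the components $x$, $y$ and the pair $|z|$, $\operatorname{Arg} z$ can be tried out numerically. The following Python fragment is only an illustrative sketch and is not part of the original text: the names `polar` and `from_polar` are hypothetical, and floating-point arithmetic is assumed, so the results are approximations.

```python
import math

def polar(z):
    """Return (|z|, Arg z), with Arg z in degrees and 0 <= Arg z < 360."""
    if z == 0:
        raise ValueError("the argument of 0 is not defined")
    r = abs(z)
    # atan2 gives an angle in (-180, 180]; reduce it to the range [0, 360).
    theta = math.degrees(math.atan2(z.imag, z.real)) % 360.0
    if theta >= 360.0:  # guard against rounding for tiny negative angles
        theta = 0.0
    return r, theta

def from_polar(r, theta_deg):
    """Return the complex number r (cos theta + i sin theta), theta in degrees."""
    t = math.radians(theta_deg)
    return complex(r * math.cos(t), r * math.sin(t))
```

For instance, `polar(from_polar(3, 410))` is approximately `(3, 50)`, since angles that differ by a multiple of 360 describe the same point.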

It is evident that if $z$ is a positive real number, then $\operatorname{Arg} z = 0$, and if $z$ is a negative real number, then $\operatorname{Arg} z = 180$. Although the argument of a complex number $z$ is always between 0 and 360, it should be noted that if $z = |z| (\cos\theta + i \sin\theta)$, where $\theta = \operatorname{Arg} z$, then we also have

$$z = |z| \, [\cos(\theta + n \cdot 360) + i \sin(\theta + n \cdot 360)]$$

for $n = 0, \pm 1, \pm 2, \ldots$. Thus, for example,

$$z = 3 (\cos 410 + i \sin 410)$$

is a complex number with $\operatorname{Arg} z \neq 410$. In fact, $\operatorname{Arg} z = 410 - 360 = 50$.

Let $z$ and $w$ be two nonzero complex numbers, with $\operatorname{Arg} z = \theta$, and $\operatorname{Arg} w = \phi$. Then

$$z = |z| (\cos\theta + i \sin\theta), \qquad w = |w| (\cos\phi + i \sin\phi).$$

Hence,

$$z \cdot w = |z| \, |w| \, [(\cos\theta \cos\phi - \sin\theta \sin\phi) + i (\sin\theta \cos\phi + \cos\theta \sin\phi)].$$

This expression can be simplified by using the sum formulas of trigonometry:

$$\cos(\theta + \phi) = \cos\theta \cos\phi - \sin\theta \sin\phi, \qquad \sin(\theta + \phi) = \sin\theta \cos\phi + \cos\theta \sin\phi.$$

We obtain

$$z \cdot w = |z| \, |w| \, [\cos(\theta + \phi) + i \sin(\theta + \phi)]. \tag{8-14}$$

* In most higher mathematics, angles are measured in radians rather than degrees. However, for the applications in this book, radians have no advantage over degrees. In computational work, it is more convenient to use degrees rather than radians, since most trigonometric tables list angles in degrees.


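Comparing this formula with the equation which expresses the trigonometric representation of $z \cdot w$, we obtain the rule for determining the argument of the product of two numbers.

THEOREM 8-4.1. Let $z$ and $w$ be nonzero complex numbers. Then

$$\operatorname{Arg}(z \cdot w) = \operatorname{Arg} z + \operatorname{Arg} w, \quad \text{if } \operatorname{Arg} z + \operatorname{Arg} w < 360,$$

and

$$\operatorname{Arg}(z \cdot w) = \operatorname{Arg} z + \operatorname{Arg} w - 360, \quad \text{if } \operatorname{Arg} z + \operatorname{Arg} w \ge 360.$$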
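The rule of Theorem 8-4.1 can be compared with a direct numerical multiplication. The sketch below is an illustration only; the helper names `arg_deg` and `arg_of_product` and the sample values (moduli 2 and 3, arguments 200 and 250) are assumptions chosen for the example, and the comparison is approximate because of floating-point rounding.

```python
import cmath
import math

def arg_deg(z):
    """Arg z in degrees, reduced to 0 <= Arg z < 360 (z must be nonzero)."""
    a = math.degrees(cmath.phase(z)) % 360.0
    return 0.0 if a >= 360.0 else a

def arg_of_product(arg_z, arg_w):
    """Add the arguments; subtract 360 when the sum reaches 360."""
    total = arg_z + arg_w
    return total if total < 360 else total - 360

# Sample numbers: |z| = 2, Arg z = 200 and |w| = 3, Arg w = 250.
z = cmath.rect(2.0, math.radians(200))
w = cmath.rect(3.0, math.radians(250))

predicted = arg_of_product(200, 250)   # 200 + 250 - 360
measured = arg_deg(z * w)              # argument of the computed product
```

Here `predicted` and `measured` agree up to rounding, and `abs(z * w)` is close to $2 \cdot 3 = 6$, in accordance with (8-14).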

FIGURE 8-8

This theorem provides the desired geometrical interpretation of multiplication in $C$. The point $z \cdot w$ is the point on the half line which makes an angle of $(\operatorname{Arg} z + \operatorname{Arg} w)$ with the positive real axis, and which is at a distance $|z| \, |w|$ from $O$ (see Fig. 8-8). A particularly important case of (8-14) occurs when $w = z$. This formula then becomes $z^2 = |z|^2 (\cos 2\theta + i \sin 2\theta)$. We can easily generalize this result by induction:

$$z^n = |z|^n (\cos n\theta + i \sin n\theta), \quad \text{where } \theta = \operatorname{Arg} z. \tag{8-15}$$

In particular, if $|z| = 1$, this identity gives Demoivre's theorem.

THEOREM 8-4.2. $(\cos\theta + i \sin\theta)^n = \cos n\theta + i \sin n\theta$.

The theorem of Demoivre has numerous applications. One which students often appreciate is its use as a device for deriving trigonometric formulas for multiple angles.

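EXAMPLE 1. We use Theorem 8-4.2 to determine $\cos 4\theta$ and $\sin 4\theta$. Taking $n = 4$ in this formula, and using the binomial theorem, we obtain:

$$\begin{aligned}
\cos 4\theta + i \sin 4\theta &= (\cos\theta + i \sin\theta)^4 \\
&= \cos^4\theta + 4\cos^3\theta \,(i \sin\theta) + 6\cos^2\theta \,(i \sin\theta)^2 + 4\cos\theta \,(i \sin\theta)^3 + (i \sin\theta)^4 \\
&= (\cos^4\theta - 6\cos^2\theta \sin^2\theta + \sin^4\theta) + i\,(4\cos^3\theta \sin\theta - 4\cos\theta \sin^3\theta).
\end{aligned}$$

Thus,

$$\cos 4\theta = \cos^4\theta - 6\cos^2\theta \sin^2\theta + \sin^4\theta, \qquad \sin 4\theta = 4\cos^3\theta \sin\theta - 4\cos\theta \sin^3\theta.$$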
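The two multiple-angle formulas of this example can be checked numerically. The following Python sketch is offered only as an illustration under the assumption of floating-point arithmetic; the function names `multiple_angle_4` and `demoivre_power` are hypothetical.

```python
import math

def multiple_angle_4(theta_deg):
    """Evaluate the right-hand sides of the formulas for cos 4θ and sin 4θ."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    cos4 = c**4 - 6 * c**2 * s**2 + s**4
    sin4 = 4 * c**3 * s - 4 * c * s**3
    return cos4, sin4

def demoivre_power(theta_deg, n):
    """Compute (cos θ + i sin θ)^n by repeated multiplication."""
    t = math.radians(theta_deg)
    z = complex(math.cos(t), math.sin(t))
    result = complex(1, 0)
    for _ in range(n):
        result *= z
    return result
```

For any angle, `multiple_angle_4(theta)` should agree with `(math.cos(4t), math.sin(4t))`, where `t` is the angle in radians, and with the real and imaginary parts of `demoivre_power(theta, 4)`, up to rounding.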

FIGURE 8-9 (points labeled $1 + i$ and $z_0 = \sqrt[6]{2}\,(\cos 15 + i \sin 15)$)

A more significant application of this theorem occurs in the proof of the following result.

THEOREM 8-4.3. Let $w$ be any nonzero complex number. Let $n$ be a natural number. Then there are exactly $n$ distinct complex numbers $z$ satisfying $z^n = w$. If $w = |w| (\cos\theta + i \sin\theta)$, with $\theta = \operatorname{Arg} w$, then these numbers are given by $z = z_0$, $z = z_1$, $\ldots$, $z = z_{n-1}$, where

$$z_j = \sqrt[n]{|w|} \left[ \cos\left(\frac{\theta + j \cdot 360}{n}\right) + i \sin\left(\frac{\theta + j \cdot 360}{n}\right) \right]. \tag{8-16}$$

This theorem is illustrated in Fig. 8-9, with $w = 1 + i$ and $n = 3$. The polar representation of $1 + i$ is

$$1 + i = \sqrt{2}\, (\cos 45 + i \sin 45).$$
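Formula (8-16) translates directly into a short computation. The Python sketch below is illustrative only: the function name `nth_roots` is hypothetical, the angles are handled in degrees as in the text, and the results are floating-point approximations.

```python
import math

def nth_roots(w, n):
    """Return the n numbers z_0, ..., z_{n-1} of formula (8-16) for a nonzero w."""
    if w == 0:
        raise ValueError("w must be nonzero")
    modulus = abs(w) ** (1.0 / n)                        # the real n-th root of |w|
    theta = math.degrees(math.atan2(w.imag, w.real)) % 360.0
    roots = []
    for j in range(n):
        angle = math.radians((theta + j * 360.0) / n)
        roots.append(complex(modulus * math.cos(angle),
                             modulus * math.sin(angle)))
    return roots

# The illustration from the text: w = 1 + i and n = 3.
cube_roots = nth_roots(1 + 1j, 3)
```

With $w = 1 + i$ and $n = 3$, the first root has modulus $\sqrt[6]{2}$ and argument $45/3 = 15$, and the other two are obtained by adding 120 and 240 to this argument.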

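In order to prove Theorem 8-4.3, first note that by Theorem 8-4.2,

$$\begin{aligned}
z_j^n &= \left( \sqrt[n]{|w|} \left[ \cos\left(\frac{\theta + j \cdot 360}{n}\right) + i \sin\left(\frac{\theta + j \cdot 360}{n}\right) \right] \right)^n \\
&= |w| \, [\cos(\theta + j \cdot 360) + i \sin(\theta + j \cdot 360)] \\
&= |w| \, (\cos\theta + i \sin\theta) = w.
\end{aligned}$$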

Thus, for any nonnegative integer $j$, the formula (8-16) gives a solution of the equation $z^n = w$. What we must show is that for $j = 0, 1, \ldots, n - 1$, the numbers given by (8-16) are all different, and that for any $z$ such that $z^n = w$, there is some $j = 0, 1, \ldots, n - 1$, such that $z = z_j$. First observe that if $0 \le j < n$, then

$$0 \le \frac{\theta + j \cdot 360}{n} < 360.$$

Therefore,

$$\operatorname{Arg} z_j = \frac{\theta + j \cdot 360}{n}.$$

(Note however that $\operatorname{Arg} z_j \neq (\theta + j \cdot 360)/n$ if $j \ge n$. For example, $\operatorname{Arg} z_n = \theta/n$.) Suppose that

$$0 \le j_1 < j_2 < n.$$

Then by what has just been observed,

$$\operatorname{Arg} z_{j_1} = \frac{\theta + j_1 \cdot 360}{n} \neq \frac{\theta + j_2 \cdot 360}{n} = \operatorname{Arg} z_{j_2}.$$

Therefore, $z_{j_1} \neq z_{j_2}$. This proves that the numbers $z_0, z_1, \ldots, z_{n-1}$ are all different.

Assume next that $z$ is a complex number which satisfies $z^n = w$. We want to prove that $z$ is one of the numbers $z_0, z_1, \ldots, z_{n-1}$, that is, $|z| = \sqrt[n]{|w|}$ and

$$\operatorname{Arg} z = \frac{\theta + j \cdot 360}{n}$$

for some $j = 0, 1, \ldots, n - 1$. It follows from Theorem 8-2.4(e) (using mathematical induction) that $z^n = w$ implies $|z|^n = |w|$. Therefore, $|z| = \sqrt[n]{|w|}$. Let $z = |z| (\cos\phi + i \sin\phi)$ be the polar representation of $z$, with $\phi = \operatorname{Arg} z$. Then by Theorem 8-4.2,

$$|z|^n (\cos n\phi + i \sin n\phi) = z^n = w = |w| (\cos\theta + i \sin\theta).$$

Therefore, since $|z|^n = |w|$, this equality implies $\cos n\phi = \cos\theta$, and $\sin n\phi = \sin\theta$. Using the fact that two angles which have the same sine and cosine differ by an integral multiple of 360, we obtain $n\phi - \theta = j \cdot 360$, where $j \in Z$. Therefore,

$$\phi = \frac{\theta + j \cdot 360}{n}.$$

Since $0 \le \phi = \operatorname{Arg} z < 360$, $0 \le \theta = \operatorname{Arg} w < 360$, and $j = (n\phi - \theta)/360$, it follows that

$$-1 < j < n.$$

However, $j$ is an integer, so that these strict inequalities imply

$$0 \le j \le n - 1.$$


1. Find the arguments of the following complex numbers.
(a) 5 (b) $-i$ (c) $-1 + i$ (d) 1 + i 3 (e) 3 - 24

2. Use Demoivre's theorem to obtain expressions for $\cos 5\theta$ and $\sin 5\theta$ in terms of $\cos\theta$ and $\sin\theta$.

3. Give the inductive proof of (8-15) in detail.

4. Show that Demoivre's theorem is valid for negative exponents; that is,

$$(\cos\theta + i \sin\theta)^{-n} = \cos(-n\theta) + i \sin(-n\theta)$$

for all $n \in N$.

5. Use Theorems 8-4.3 and 8-2.5 to obtain expressions for $\cos(\theta/2)$ and $\sin(\theta/2)$ in terms of $\cos\theta$ and $\sin\theta$.

6. Using a table of sines and cosines, find all the solutions $z$ of the following equations (giving the real and imaginary parts with four decimal accuracy).
(a) $z^3 = 1 + i$ (b) $z^5 = 1$ (c) $z^4 = i$ (d) $z^6 = -1$ (e) $z^3 = -2 - i2$

7. Let $n$ be a natural number. Define

$$\zeta = \cos\frac{360}{n} + i \sin\frac{360}{n}.$$

(a) Show that $z = 1$, $z = \zeta$, $z = \zeta^2$, $\ldots$, $z = \zeta^{n-1}$ are all of the different solutions of the equation $z^n = 1$.
(b) Prove that if $z = u$ is one solution of $z^n = w$, then all the other solutions are of the form $u\zeta^j$, where $j = 1, 2, \ldots, n - 1$.

8. Let $\zeta$ be defined as in Problem 7. Show that for $k \in N$,

$$1 + \zeta^k + \zeta^{2k} + \cdots + \zeta^{(n-1)k} = 0 \quad \text{if } n \text{ does not divide } k,$$

$$1 + \zeta^k + \zeta^{2k} + \cdots + \zeta^{(n-1)k} = n \quad \text{if } n \text{ divides } k.$$

THE THEORY OF ALGEBRAIC EQUATIONS


9-1 Algebraic equations. The problem of solving algebraic equations has interested the mathematicians and engineers of all ages, as far in the past as the civilizations of Babylon and Egypt. There is evidence that the Egyptians solved certain complicated quadratic equations as early as 2000 B.C. The first steps toward the development of a theory of equations were taken by Diophantus of Alexandria in about the third century A.D. Important advances in the subject were made by Hindu and Arabian scholars from about 800 to 1100 A.D. The modern era of algebra (and indeed of all mathematics) came with the Renaissance in the sixteenth century. The Italian mathematicians Scipio Ferro (1465-1526), Niccolo Fontana, nicknamed Tartaglia (1500-1557), Girolamo Cardan (1501-1576), Bombelli, and the French mathematicians Vieta and Descartes, among others, lifted the theory of algebraic equations to about the level which is presently taught to high-school algebra students in the United States. The development of algebra after Descartes followed more sophisticated and abstract lines. Our purpose in this chapter is to examine critically the familiar results of algebra, and to lead the reader a short way into the realm of the modern theory of equations.

Elementary algebra courses are concerned largely with linear equations, such as $x + 1 = 2$, $3x + 5 = 3$, 3x-+=o,

and quadratic equations, such as

The solution of equations of higher degree is also considered, but usually only in the cases where the polynomials involved can be factored; for example,

These examples are all special cases of the general nth degree equation

$$a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = 0.$$

In this expression $x$ is the "unknown" and the $a_i$ are real or complex numbers. The problem is to find values of $x$ which, if they are substituted into the "polynomial"

$$a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n,$$

make this quantity zero. For first- and second-degree equations, convenient formulas exist which give the solutions explicitly. The solution of the linear equation

$$a_0 + a_1 x = 0 \qquad (a_1 \neq 0)$$

is given by the formula

$$x = -\frac{a_0}{a_1}.$$

The quadratic equation

$$a_0 + a_1 x + a_2 x^2 = 0 \qquad (a_2 \neq 0)$$

has two solutions (which may be the same):

$$x = \frac{-a_1 + \sqrt{a_1^2 - 4 a_0 a_2}}{2 a_2} \qquad \text{and} \qquad x = \frac{-a_1 - \sqrt{a_1^2 - 4 a_0 a_2}}{2 a_2}.$$

There is no reason to suppose that for equations of degree higher than two it will always be possible to find solutions among the complex numbers. We saw that in order to solve equations such as $x^2 - 2 = 0$ and $x^2 - 5 = 0$ it was necessary to extend the rational number system to the real number field; to solve $x^2 + 1 = 0$ it was necessary to go beyond the reals to the complex numbers. Is there any reason then to suppose that it is possible to solve such equations as

$$x^3 + 3x + 1 = 0 \qquad \text{and} \qquad x^5 + (1 + i)x^3 + 2x^2 - ix - 1 = 0$$

within the complex number field? Moreover, even if the general nth degree equation has solutions in the complex field $C$, can we expect to obtain explicit expressions for these solutions such as we have for the solutions of linear and quadratic equations? These two questions will be discussed more completely in Sections 9-8 and 9-9.

In the theory of numbers, the study of algebraic congruences is almost as important as the study of algebraic equations. The problem of solving linear congruences,

$$ax + b \equiv 0 \pmod{m},$$

was discussed in Section 5-7. Congruences of higher degree, such as

$$x^4 + 3x^2 + 5x + 1 \equiv 0 \pmod{8} \qquad \text{and} \qquad 4x^{213} + x + 37 \equiv 0 \pmod{89},$$

are more difficult to solve. Of course, integral solutions $x$ of any congruence can be found by trial and error if they exist. However, this method is practical only for congruences of small modulus.

EXAMPLE 1. Let us solve

$$x^4 + 3x^2 + 5x + 1 \equiv 0 \pmod{8}.$$

We wish to determine the set

$$\{x \in Z \mid x^4 + 3x^2 + 5x + 1 \equiv 0 \pmod{8}\}.$$

First observe that $x \equiv y \pmod{8}$ implies that $x^4 + 3x^2 + 5x + 1 \equiv y^4 + 3y^2 + 5y + 1 \pmod{8}$. Therefore, if $x$ is a solution of $x^4 + 3x^2 + 5x + 1 \equiv 0 \pmod{8}$, then so is $y$, and vice versa. This means that to solve the congruence $x^4 + 3x^2 + 5x + 1 \equiv 0 \pmod{8}$, it is only necessary to find the numbers in the set $\{0, 1, 2, \ldots, 7\}$ which are solutions. Every solution is congruent to a solution which is in this set, and conversely, every number which is congruent to a solution $x$ satisfying $0 \le x < 8$ is itself a solution.

Before starting the work of substituting each of these numbers into the given congruence, let us note that the theorems of Chapter 5 can be used to simplify our work. If $x$ is even, then

$$x^4 + 3x^2 + 5x + 1 \equiv 1 \pmod{2}.$$

Therefore, by Theorem 5-6.4(b), no even value of $x$ can be a solution of $x^4 + 3x^2 + 5x + 1 \equiv 0 \pmod{8}$. By Euler's Theorem, 5-8.6,

$$x^4 = x^{\varphi(8)} \equiv 1 \pmod{8},$$

provided that $x$ is relatively prime to 8. Thus, for odd values of $x$,

$$x^4 + 3x^2 + 5x + 1 \equiv 3x^2 + 5x + 2 \pmod{8}.$$

Now let us check the values $x = 1$, 3, 5, and 7:

$$\begin{aligned}
x^4 + 3x^2 + 5x + 1 &\equiv 3 \cdot 1 + 5 \cdot 1 + 2 \equiv 2 \pmod{8} && \text{for } x = 1, \\
x^4 + 3x^2 + 5x + 1 &\equiv 3 \cdot 3^2 + 5 \cdot 3 + 2 \equiv 4 \pmod{8} && \text{for } x = 3, \\
x^4 + 3x^2 + 5x + 1 &\equiv 3 \cdot 5^2 + 5 \cdot 5 + 2 \equiv 6 \pmod{8} && \text{for } x = 5, \\
x^4 + 3x^2 + 5x + 1 &\equiv 3 \cdot 7^2 + 5 \cdot 7 + 2 \equiv 0 \pmod{8} && \text{for } x = 7.
\end{aligned}$$

Therefore, the required set of solutions is

$$\{x \in Z \mid x \equiv 7 \pmod{8}\}.$$
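The trial-and-error method mentioned above is easy to mechanize for a small modulus. The following Python sketch is an illustration, not a method taken from the text: the function name `solve_congruence` and the convention that `coeffs[i]` holds the coefficient of $x^i$ are assumptions made for the example.

```python
def solve_congruence(coeffs, m):
    """Return the residues x in {0, 1, ..., m-1} with p(x) ≡ 0 (mod m).

    coeffs[i] is the coefficient of x**i in the polynomial p(x).
    """
    def value_mod_m(x):
        # Horner's rule, reducing modulo m at every step.
        total = 0
        for c in reversed(coeffs):
            total = (total * x + c) % m
        return total

    return [x for x in range(m) if value_mod_m(x) == 0]

# x^4 + 3x^2 + 5x + 1, listed by ascending powers of x.
residues = solve_congruence([1, 5, 3, 0, 1], 8)
```

Since congruent integers give congruent values of the polynomial, every integer solution is congruent modulo $m$ to one of the residues returned. For the congruence of this example the list `residues` contains the single residue 7.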


The problem of solving congruences with a prime modulus $p$ is equivalent to finding the solution of equations

$$a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = 0,$$

where $a_0, a_1, \ldots$, and $a_n$ are elements of the field $Z_p$ which was discussed in Example 1, Section 6-3. This fact is important because it enables us to apply the theory of algebraic equations to obtain theorems about congruences. Such an application will be given in Section 9-7.

1. Find the real values of $x$ which are solutions of the following equations.
(a) $x^2 - 4x - 2 = 0$
(b) $3x^3 - 1 = 0$
(c) $x^2 + x + 1 = 0$
(d) $x^4 - 2x^2 + 1 = 0$
(e) $x^{75} - 1 = 0$

2. Find the real and complex values of $x$ which are solutions of the following equations.
(a) $x^4 + 1 = 0$
(b) $x^3 + x^2 + x + 1 = 0$
(c) $x^{10} + 5x^5 + 4 = 0$
(d) $x^{2n} + 2x^n - 1 = 0$
(e) $x^6 - 3x^4 + 3x^2 - 1 = 0$

3. Let $a$, $b$, and $c$ be real numbers, with $c \neq 0$. Show that the equation

$$a + bx + cx^2 = 0$$

(a) has two (different) real solutions if $b^2 > 4ac$, (b) has one real solution if $b^2 = 4ac$, and (c) has two complex conjugate solutions if $b^2 < 4ac$.

4. Find al1 integers x which are solutions of the following congruences. (a) x2 - 2x 1 O (mod 2 ) (b) x46 -l- 7x32 8x17 5x16 2x9 4x3 32 1 = O (mod 3 ) (c) x6 - 1 = O (mod 7) (d) x1 63x4 x = O (mod 9) (e) 2x25 57x 1 = O (mod 30)

+ + + +

+ + + +

9-2 Polynomials. The theory of equations is based on the algebra of polynomials. From elementary algebra, the reader is familiar with the procedures for adding, subtracting, multiplying, and factoring polynomials. All of the operational rules which were given in Section 4-2 as the postulates for a ring are used in manipulating polynomials. In order to justify the use of these rules, it is necessary to examine the concept of a polynomial more critically than is customary in elementary algebra courses. This is particularly imperative for our purposes, because we want to develop the theory so that it can be applied to equations in the fields $Z_p$, as well as the fields of complex and real numbers.

Our plan in this section is to first review the intuitive definitions concerning polynomials and their operations. Then we will examine these notions more critically, and see how they can be put on a sound basis. The reader who is not interested in the formal development of polynomials may omit the last part of this section.

Let $D$ be an integral domain. A polynomial in $x$ with coefficients in $D$ is (tentatively) defined to be a formal expression

$$a_0 x^0 + a_1 x^1 + a_2 x^2 + \cdots + a_n x^n, \tag{9-1}$$

where $a_0, a_1, a_2, \ldots$, and $a_n$ are elements in $D$. For the present the symbols $x^0, x^1, \ldots, x^n$ and the plus signs in (9-1) are to be thought of as nothing more than punctuation marks which separate $a_0, a_1, \ldots$, and $a_n$. The notation $a_0 x^0 + a_1 x^1 + \cdots + a_n x^n$ is adopted because this expression will ultimately be interpreted as a sum of products. For $0 \le i \le n$, the expressions $a_i x^i$ are called terms of the polynomial. The elements $a_0, a_1, a_2, \ldots, a_n$ of $D$ are called the coefficients of $x^0, x^1, x^2, \ldots$, and $x^n$, respectively, in this polynomial.

DEFINITION 9-2.1. Two polynomials in $x$ with coefficients in $D$ are equal if they have exactly the same terms, except for terms with zero coefficients. That is,

$$a_0 x^0 + a_1 x^1 + \cdots + a_n x^n = b_0 x^0 + b_1 x^1 + \cdots + b_m x^m$$

if $a_0 = b_0$, $a_1 = b_1$, $a_2 = b_2$, $\ldots$, and $a_j = 0$ for $m < j \le n$ if $m < n$, or $b_j = 0$ for $n < j \le m$ if $n < m$. For example,

In writing a polynomial, it is customary (a) to omit terms with coefficient 0, (b) to write $a_0$ instead of $a_0 x^0$, (c) to write $x$ instead of $x^1$, (d) to write $x^j$ instead of $1x^j$ for $j > 0$, and (e) to write $-a_j x^j$ instead of $(-a_j)x^j$. For instance, instead of

we would write

+ 22 + x3

5x4.

We will later see that these conventions are entirely justified. It is also a common practice to use expressions such as $a(x)$, $b(x)$, $c(x)$, $f(x)$, $g(x)$, and $p(x)$ to represent polynomials.

It follows from Definition 9-2.1 that any two polynomials can be written with the same number of terms. For example, suppose that $a_0 x^0 + a_1 x^1 + \cdots + a_m x^m$ and $b_0 x^0 + b_1 x^1 + \cdots + b_n x^n$ are polynomials with $m < n$. Then we can write

$$a_0 x^0 + a_1 x^1 + \cdots + a_m x^m = a_0 x^0 + a_1 x^1 + \cdots + a_m x^m + 0x^{m+1} + \cdots + 0x^n.$$

This observation shows that the following definition of addition of two polynomials is completely general.

DEFINITION 9-2.2. Addition of polynomials. Let

$$a(x) = a_0 x^0 + a_1 x^1 + a_2 x^2 + \cdots + a_n x^n$$

and

$$b(x) = b_0 x^0 + b_1 x^1 + b_2 x^2 + \cdots + b_n x^n$$

be polynomials in $x$ with coefficients in $D$. The sum of $a(x)$ and $b(x)$ is the polynomial

$$a(x) + b(x) = (a_0 + b_0)x^0 + (a_1 + b_1)x^1 + (a_2 + b_2)x^2 + \cdots + (a_n + b_n)x^n.$$

As an example, let $a(x) = 2 + x - x^2$ and $b(x) = 3 + x^2 + 2x^3$. Then $a(x) + b(x) = 5 + x + 2x^3$. Indeed, we can consider $2 + x - x^2$ as an abbreviation for $2x^0 + 1x^1 + (-1)x^2 + 0x^3$, and $3 + x^2 + 2x^3$ as an abbreviation of $3x^0 + 0x^1 + 1x^2 + 2x^3$; therefore, by Definition 9-2.2,

$$a(x) + b(x) = (2 + 3)x^0 + (1 + 0)x^1 + (-1 + 1)x^2 + (0 + 2)x^3 = 5x^0 + 1x^1 + 0x^2 + 2x^3 = 5 + x + 2x^3.$$
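Definition 9-2.2 amounts to adding coefficients position by position after padding the shorter polynomial with zero terms. The Python sketch below illustrates this under the assumption that a polynomial is stored as a list whose entry `i` is the coefficient of $x^i$; the function name `poly_add` is hypothetical, and trailing zero coefficients are dropped in the spirit of Definition 9-2.1.

```python
from itertools import zip_longest

def poly_add(a, b):
    """Add two polynomials given as ascending coefficient lists."""
    # zip_longest pads the shorter list with zero coefficients.
    total = [x + y for x, y in zip_longest(a, b, fillvalue=0)]
    # Terms with zero coefficient at the top may be omitted.
    while total and total[-1] == 0:
        total.pop()
    return total

# a(x) = 2 + x - x^2 and b(x) = 3 + x^2 + 2x^3, as in the example above.
a = [2, 1, -1]
b = [3, 0, 1, 2]
s = poly_add(a, b)
```

The list `s` is `[5, 1, 0, 2]`, which stands for $5 + x + 2x^3$. In this representation the zero polynomial is the empty list.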

In elementary algebra courses, the process of multiplying two polynomials is usually carried out in several steps. First, all combinations of two terms, one from each polynomial, are multiplied. Then the rule of exponents is applied to the powers of $x$, and finally the coefficients of equal powers of $x$ are collected. The whole procedure can be carried out, using the familiar arrangement for multiplication:

It is not convenient to use this description of the process of multiplying polynomials as the definition of multiplication, but the end product of the method can be described in general terms and provides a satisfactory definition.

DEFINITION 9-2.3. Multiplication of polynomials. Let

$$a(x) = a_0 x^0 + a_1 x^1 + \cdots + a_m x^m, \qquad b(x) = b_0 x^0 + b_1 x^1 + \cdots + b_n x^n$$

be polynomials in $x$ with coefficients in $D$. The product of $a(x)$ and $b(x)$ is the polynomial

$$a(x) \cdot b(x) = (a_0 b_0)x^0 + (a_0 b_1 + a_1 b_0)x^1 + (a_0 b_2 + a_1 b_1 + a_2 b_0)x^2 + \cdots + (a_m b_n)x^{m+n}.$$

The coefficient of $x^i$ in the product $a(x) \cdot b(x)$ is the element

$$a_0 b_i + a_1 b_{i-1} + \cdots + a_{i-1} b_1 + a_i b_0,$$

where $a_j = 0$ if $j > m$, and $b_k = 0$ if $k > n$.
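The coefficient rule of Definition 9-2.3 can be carried out mechanically. The following Python sketch is illustrative only; it assumes that a polynomial is stored as a list whose entry `i` is the coefficient of $x^i$, and the function name `poly_mul` is hypothetical.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as ascending coefficient lists."""
    if not a or not b:
        return []          # the empty list stands for the zero polynomial
    product = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            # a_j x^j times b_k x^k contributes a_j b_k to the coefficient of x^(j+k).
            product[j + k] += aj * bk
    return product

# (2 + x - x^2)(3 + x^2 + 2x^3)
p = poly_mul([2, 1, -1], [3, 0, 1, 2])
```

Each entry `product[i]` collects the sum $a_0 b_i + a_1 b_{i-1} + \cdots + a_i b_0$. For the sample factors the result is `[6, 3, -1, 5, 1, -2]`, that is, $6 + 3x - x^2 + 5x^3 + x^4 - 2x^5$.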

DEFINITION 9-2.4. Negation of polynomials. Let

$$a(x) = a_0 x^0 + a_1 x^1 + a_2 x^2 + \cdots + a_n x^n$$

be a polynomial in $x$ with coefficients in $D$. The negative of $a(x)$ is the polynomial

$$-a(x) = (-a_0)x^0 + (-a_1)x^1 + (-a_2)x^2 + \cdots + (-a_n)x^n.$$

In this definition, the coefficients $-a_0, -a_1, -a_2, \ldots, -a_n$ are the negatives of the elements $a_0, a_1, a_2, \ldots, a_n$ in the integral domain $D$.


THEOREM 9-2.5. Let $D[x]$ be the set of all polynomials in $x$ with coefficients in an integral domain $D$. Define equality, addition, multiplication, and negation in $D[x]$ by Definitions 9-2.1, 9-2.2, 9-2.3, and 9-2.4, respectively. Then $D[x]$ is an integral domain.

Proof. In order to prove that $D[x]$ is a commutative ring with $0x^0$ as its zero and $1x^0$ as its identity, it is necessary to verify such identities as the associative, commutative, and distributive laws. This is a rather tedious job which we will leave to the reader. It should be remarked that the proofs of these laws use the fact that addition, multiplication, and negation in $D$ satisfy the postulates for a commutative ring. For example, if $a(x) = a_0 x^0 + a_1 x^1 + \cdots + a_m x^m$ and $b(x) = b_0 x^0 + b_1 x^1 + \cdots + b_n x^n$ are in $D[x]$, then by Definition 9-2.3, the coefficient of $x^i$ in the product $a(x) \cdot b(x)$ is

$$a_0 b_i + a_1 b_{i-1} + \cdots + a_{i-1} b_1 + a_i b_0.$$

Again using Definition 9-2.3, the coefficient of $x^i$ in the product $b(x) \cdot a(x)$ is

$$b_0 a_i + b_1 a_{i-1} + \cdots + b_{i-1} a_1 + b_i a_0.$$

Since $D$ is a commutative ring, it follows that these two expressions are equal for every $i$. Therefore, by Definition 9-2.1, $a(x) \cdot b(x) = b(x) \cdot a(x)$. That is, multiplication is commutative in $D[x]$.

In order to prove that $D[x]$ is an integral domain, it suffices by Theorem 4-4.5 to show that if $a(x)$ and $b(x)$ are nonzero polynomials, then $a(x) \cdot b(x)$ is not the zero polynomial $0x^0$. Since $a(x)$ and $b(x)$ are not zero, it is possible to write

$$a(x) = a_0 x^0 + a_1 x^1 + \cdots + a_m x^m, \qquad b(x) = b_0 x^0 + b_1 x^1 + \cdots + b_n x^n,$$

where $a_m \neq 0$ and $b_n \neq 0$ in $D$. By Definition 9-2.3, the coefficient of $x^{m+n}$ in $a(x) \cdot b(x)$ is $a_m b_n$. Since $D$ is an integral domain, $a_m b_n \neq 0$, by Theorem 4-4.5. Therefore, $a(x) \cdot b(x) \neq 0x^0$. This proves the theorem.

It is time to justify the notation $a_0 x^0 + a_1 x^1 + a_2 x^2 + \cdots + a_n x^n$ for polynomials. It is clear from Definitions 9-2.2 and 9-2.3 that

$$a_0 x^0 + a_1 x^1 + \cdots + a_n x^n = (a_0 x^0) + (a_1 x^0)(0x^0 + 1x^1) + (a_2 x^0)(0x^0 + 1x^1)^2 + \cdots + (a_n x^0)(0x^0 + 1x^1)^n,$$

where the right-hand side of this equality is no longer a formal expression, but is an actual sum of products of polynomials. This observation suggests that we should use the symbol $x$ to denote the polynomial $0x^0 + 1x^1$, and each element $a \in D$ should be identified with the polynomial $ax^0$. This last identification can be easily justified. Indeed, by Definitions 9-2.2, 9-2.3, and 9-2.4,

$$ax^0 + bx^0 = (a + b)x^0, \qquad (ax^0) \cdot (bx^0) = (ab)x^0, \qquad -(ax^0) = (-a)x^0,$$

so that the correspondence $a \leftrightarrow ax^0$ is an isomorphism. Making these identifications, it follows that $a(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$ is actually a sum of the products

$$a_i \cdot \underbrace{x \cdot x \cdots x}_{i \text{ factors}}$$

in the integral domain $D[x]$. A number of useful consequences follow from this observation. For example, we can rearrange the terms in a polynomial in any way which might be convenient. In particular, the polynomial $a_0 + a_1 x + \cdots + a_{n-1} x^{n-1} + a_n x^n$ can be written in "descending powers" of $x$, that is, in the form $a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$.

It is customary to denote the integral domain of all polynomials in $x$ with coefficients in $D$, as we have done in Theorem 9-2.5, by $D[x]$. The identification of $x$ with $0x^0 + 1x^1$ and of each $a \in D$ with $ax^0$ will always be made. The polynomial $x$ is often called an indeterminate, and $D[x]$ is referred to as the domain of polynomials in the indeterminate $x$ with coefficients in $D$. The elements of $D$, when regarded as elements of $D[x]$, are called constant polynomials. The term $a_0$ in $a(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$ is called the constant term of $a(x)$. The zero and identity of $D$ are also the zero and identity of $D[x]$.

Let us now examine our definition of polynomials more critically. There are two weak points in the construction of $D[x]$ which we have given. First, the idea of a "formal expression" is vague. Second, Definition 9-2.1, for "equality" of polynomials, needs to be clarified. It is possible to give a definition of polynomials and their operations which avoids both of these problems. However, some discussion is needed to see that this definition is reasonable.

The definition of equality given above implies that a polynomial can be expressed using any number of terms with zero coefficients. For instance,

In fact, there is no harm in thinking of a polynomial as an infinite "sum,"

$$a_0 + a_1 x + a_2 x^2 + \cdots = \sum_{n=0}^{\infty} a_n x^n,$$

in which the coefficients are zero from some point on. This viewpoint has the advantage that there is no ambiguity about the number of terms in a polynomial. Two polynomials $\sum_{n=0}^{\infty} a_n x^n$ and $\sum_{n=0}^{\infty} b_n x^n$ are equal when $a_n = b_n$ for every $n$.

The problem of giving an exact meaning to the formal expressions

$$\sum_{n=0}^{\infty} a_n x^n$$

can be avoided. It is evident that a polynomial is completely determined by the sequence $(a_0, a_1, a_2, \ldots, a_n, \ldots)$ of its coefficients. Thus, if we want concrete mathematical objects for our polynomials, we can take them to be the sequences of elements in $D$ which up to now have been thought of as the coefficients of the "powers" of $x$. For the reasons explained above, it is advantageous to let all of these sequences be infinite, but of course zero from some point on. These remarks motivate the following construction of an integral domain whose elements are definite objects, and which has the same algebraic properties as the ring of all polynomials in $x$ with coefficients in $D$.

Let $A$ be the set of all infinite sequences

$$(a_0, a_1, a_2, a_3, \ldots)$$

of elements from the integral domain $D$, such that $a_k = 0$ for all except finitely many values of $k$. That is, the sequences which belong to $A$ are those which are of the form

$$(a_0, a_1, a_2, \ldots, a_n, 0, 0, \ldots).$$

Two such sequences are equal if they contain exactly the same elements of $D$ in the same order. The operations of addition, multiplication, and negation in $A$ are defined by the rules

$$(a_0, a_1, a_2, a_3, \ldots) + (b_0, b_1, b_2, b_3, \ldots) = (a_0 + b_0, a_1 + b_1, a_2 + b_2, a_3 + b_3, \ldots),$$

$$(a_0, a_1, a_2, a_3, \ldots) \cdot (b_0, b_1, b_2, b_3, \ldots) = (a_0 b_0, \; a_0 b_1 + a_1 b_0, \; a_0 b_2 + a_1 b_1 + a_2 b_0, \; a_0 b_3 + a_1 b_2 + a_2 b_1 + a_3 b_0, \ldots), \tag{9-2}$$

and

$$-(a_0, a_1, a_2, a_3, \ldots) = (-a_0, -a_1, -a_2, -a_3, \ldots).$$

The sums $a_i + b_i$, the sums of products $a_0 b_i + a_1 b_{i-1} + \cdots + a_{i-1} b_1 + a_i b_0$, and the negatives $-a_i$ are formed in the integral domain $D$. It is easy to see that the set $A$ is closed under the three operations of (9-2), that is, if the sequences $(a_0, a_1, a_2, a_3, \ldots)$ and $(b_0, b_1, b_2, b_3, \ldots)$ have only a finite number of nonzero elements, then the same is true of the sum, product, and negatives of these sequences. For example, if $a_j = 0$ for all $j > m$ and $b_k = 0$ for all $k > n$, then for $l > m + n$,

$$a_0 b_l + a_1 b_{l-1} + \cdots + a_{l-1} b_1 + a_l b_0 = 0,$$

since if $j + k = l > m + n$, then either $j > m$, or $k > n$. Consequently, all terms are zero after the $(m + n + 1)$st in the sequence

$$(a_0 b_0, \; a_0 b_1 + a_1 b_0, \; a_0 b_2 + a_1 b_1 + a_2 b_0, \; a_0 b_3 + a_1 b_2 + a_2 b_1 + a_3 b_0, \ldots),$$

which is the product of $(a_0, a_1, a_2, a_3, \ldots)$ and $(b_0, b_1, b_2, b_3, \ldots)$.

A few straightforward calculations show that $A$ is an integral domain whose zero is $(0, 0, 0, 0, \ldots)$, and whose identity is $(1, 0, 0, 0, \ldots)$. Moreover, the correspondence

$$a \leftrightarrow (a, 0, 0, 0, \ldots)$$

is an isomorphism between $D$ and the subring of $A$ consisting of all sequences of the form $(a, 0, 0, 0, \ldots)$. As usual, we identify $D$ with this subring, and write $a$ instead of $(a, 0, 0, 0, \ldots)$. It follows from (9-2) that the element $x = (0, 1, 0, 0, \ldots)$ satisfies

$$x^2 = (0, 0, 1, 0, \ldots), \qquad x^3 = (0, 0, 0, 1, \ldots), \qquad \ldots,$$

$$\begin{aligned}
ax &= (a, 0, 0, 0, \ldots) \cdot (0, 1, 0, 0, \ldots) = (0, a, 0, 0, \ldots), \\
ax^2 &= (a, 0, 0, 0, \ldots) \cdot (0, 0, 1, 0, \ldots) = (0, 0, a, 0, \ldots), \\
ax^3 &= (a, 0, 0, 0, \ldots) \cdot (0, 0, 0, 1, \ldots) = (0, 0, 0, a, \ldots),
\end{aligned}$$

etc. Consequently,

$$(a_0, a_1, a_2, \ldots, a_n, 0, 0, \ldots) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n.$$

In other words, the elements of $A$ can be expressed in the same way as the polynomials which we have been thinking of as "formal expressions." It is easy to see that the correspondence between polynomials and the elements of $A$ is a ring isomorphism between $D[x]$ and $A$.


1. Use Definition 9-2.2 to find the sums of the following pairs of polynomials with coefficients in $Z$.
(a) OxO 7 x 1 (-3)x2 1x3, 5x0 6x1 (-3)x2 (b) 1 7x4 - x7, $3 5x5 (c) ixO oxl ox2 . ox24 ixO oxl ox2 . . 0 ~ 2 1 ~ ~ 2 ~ .

+ + + + + + + + + +

+ +

+ +

+ +

2. Use Definition 9-2.3 to find the products of the pairs of polynomials listed in Problem 1.

3. Write in full the expression for the product

Show that this is the expression which is obtained by multiplying all combinations of terms from each factor and collecting the coefficients of equal powers of $x$.

4. Prove that addition is commutative and associative in $D[x]$. Show that $a(x) + 0x^0 = a(x)$ for $a(x) \in D[x]$. Prove that $a(x) + [-a(x)] = 0x^0$ for $a(x) \in D[x]$.

5. Prove that multiplication is distributive with respect to addition in $D[x]$. Show that $1x^0$ is the identity of $D[x]$.

6. Prove the following properties of multiplication in $D[x]$.
(a) $(a_0 x^0 + a_1 x^1 + a_2 x^2 + \cdots + a_n x^n) \cdot (cx^i) = a_0 c x^i + a_1 c x^{i+1} + a_2 c x^{i+2} + \cdots + a_n c x^{i+n}$.
(b) $ax^i \cdot [b(x) \cdot (cx^i)] = [(ax^i) \cdot b(x)] \cdot (cx^i)$ for $b(x) \in D[x]$ and $a \in D$, $c \in D$.
(c) Use the distributive law and (b) to prove that $a(x) \cdot [b(x) \cdot (cx^i)] = [a(x) \cdot b(x)] \cdot (cx^i)$ for $a(x), b(x) \in D[x]$ and $c \in D$.
(d) Use the distributive law and (c) to prove that multiplication is associative in $D[x]$.

7. Show by direct computation that the multiplication defined in (9-2) is associative.

8. Prove in detail that the ring $A$ of all sequences $(a_0, a_1, a_2, a_3, \ldots)$ of elements in $D$ with $a_n = 0$ for all but a finite number of $n$, and with the operations defined by (9-2), is isomorphic to $D[x]$.

9-3 The division algorithm for polynomials. In this and the following two sections, we will investigate the arithmetic of polynomial rings. It will be seen that the theory of the rings $F[x]$ of all polynomials in $x$ with coefficients in a field $F$ is remarkably similar to the theory of the ring $Z$ of all integers. The reader is advised to compare the results in Sections 9-3, 9-4, and 9-5 with the theorems about the integers which were proved in Sections 5-1, 5-2, and 5-3.


DEFINITION 9-3.1. Let $a(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$ be a polynomial with coefficients in an integral domain $D$. Suppose that $a(x)$ is not the zero polynomial. The degree of $a(x)$ is the largest $m \ge 0$ such that $a_m \neq 0$. The coefficient $a_m$ is called the leading coefficient of $a(x)$.

The degree of any nonzero polynomial is a nonnegative integer. For example, the degree of $3 + (-1)x + 3x^2 - 4x^3$ is three; the degree of 2 is zero; the degree of $3x + 0x^2$ is one.

The polynomials of degree zero are exactly the nonzero constant polynomials; the polynomials of degree one are the polynomials of the form $a + bx$ with $b \neq 0$, etc. No degree is assigned to the zero polynomial. It is convenient to denote the degree of a nonzero polynomial $a(x)$ by

$$\operatorname{Deg}[a(x)].$$

For instance,

$$\operatorname{Deg}[2 + 3x - 4x^3 + 0x^4] = 3, \quad \operatorname{Deg}[x^{10}] = 10, \quad \operatorname{Deg}[3] = 0, \quad \operatorname{Deg}[25x^2] = 2.$$
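The definition of degree can be stated as a short procedure. The Python sketch below is an illustration under the assumption that a polynomial is stored as a list whose entry `i` is the coefficient of $x^i$; the function name `deg` and the use of `None` for the zero polynomial, to which no degree is assigned, are choices made for this example.

```python
def deg(coeffs):
    """Return the largest m with coeffs[m] != 0, or None for the zero polynomial."""
    for m in range(len(coeffs) - 1, -1, -1):
        if coeffs[m] != 0:
            return m
    return None  # no degree is assigned to the zero polynomial

# Sample polynomials, listed by ascending powers of x.
d1 = deg([2, 3, 0, -4, 0])   # 2 + 3x - 4x^3 + 0x^4
d2 = deg([0] * 10 + [1])     # x^10
d3 = deg([3])                # the constant polynomial 3
d4 = deg([0, 0, 25])         # 25x^2
```

Trailing zero coefficients do not affect the result, which matches the convention that terms with coefficient zero may be added or omitted freely.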

It is obvious from (9-1) that if $\operatorname{Deg}[a(x)] = n$, then it is possible to write

$$a(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0,$$

where $a_n \neq 0$. Of course, the converse statement is also true: if

$$a(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0, \quad \text{with } a_n \neq 0, \text{ then } \operatorname{Deg}[a(x)] = n.$$

These two observations are often useful.

THEOREM 9-3.2. Let $a(x)$ and $b(x)$ be nonzero polynomials in $D[x]$, where $D$ is any integral domain. Then
(a) $\operatorname{Deg}[a(x) \cdot b(x)] = \operatorname{Deg}[a(x)] + \operatorname{Deg}[b(x)]$;
(b) if $a(x) + b(x) \neq 0$, then $\operatorname{Deg}[a(x) + b(x)] \le \max\{\operatorname{Deg}[a(x)], \operatorname{Deg}[b(x)]\}$;
(c) if $\operatorname{Deg}[a(x)] \neq \operatorname{Deg}[b(x)]$, then $\operatorname{Deg}[a(x) + b(x)] = \max\{\operatorname{Deg}[a(x)], \operatorname{Deg}[b(x)]\}$.

Proof. Let $a(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0$ and $b(x) = b_m x^m + b_{m-1} x^{m-1} + \cdots + b_0$, where $a_n \neq 0$ and $b_m \neq 0$. Therefore,

$$\operatorname{Deg}[a(x)] = n \quad \text{and} \quad \operatorname{Deg}[b(x)] = m.$$

Then

$$a(x) \cdot b(x) = a_n b_m x^{n+m} + (a_n b_{m-1} + a_{n-1} b_m) x^{n+m-1} + \cdots + a_0 b_0.$$

Since $D$ is an integral domain, $a_n b_m \neq 0$. Therefore, $\operatorname{Deg}[a(x) \cdot b(x)] = m + n = \operatorname{Deg}[a(x)] + \operatorname{Deg}[b(x)]$. To prove (b) and (c), suppose first that $n > m$. Then

$$b(x) = 0x^n + \cdots + 0x^{m+1} + b_m x^m + \cdots + b_0.$$

Therefore,

$$a(x) + b(x) = a_n x^n + \cdots + a_{m+1} x^{m+1} + (a_m + b_m) x^m + \cdots + (a_0 + b_0).$$

Thus, $\operatorname{Deg}[a(x) + b(x)] = n = \max\{\operatorname{Deg}[a(x)], \operatorname{Deg}[b(x)]\}$. By a similar argument, if $n < m$, then

$$\operatorname{Deg}[a(x) + b(x)] = m = \max\{\operatorname{Deg}[a(x)], \operatorname{Deg}[b(x)]\}.$$

This proves (c), and also (b) except in the case that $n = m$. When $n = m$,

$$a(x) + b(x) = (a_n + b_n) x^n + (a_{n-1} + b_{n-1}) x^{n-1} + \cdots + (a_0 + b_0).$$

If $a(x) + b(x) \neq 0$, then $a_k + b_k \neq 0$ for some $k$ not exceeding $n$. The degree of $a(x) + b(x)$ is the largest such $k$. Clearly,

$$k \le n = \max\{\operatorname{Deg}[a(x)], \operatorname{Deg}[b(x)]\}.$$

Except for certain special results concerning $Z[x]$, we will restrict our discussion to the integral domains $F[x]$, where $F$ is a field. Since every field is an integral domain, the definitions and results which have already been given in Sections 9-2 and 9-3 apply to $F[x]$. The fields which particularly concern us are $C$, $R$, $Q$, and $Z_p$.

The degree of a polynomial is used in the study of the arithmetic of polynomials in much the same way as the absolute value is used in the study of $Z$. The principle of mathematical induction is applied to the study of $Z$ by means of the absolute value of an integer. Similarly, it is through the degree of a polynomial that induction can be used in $F[x]$. The division algorithm for polynomials is our first example of a theorem about polynomials which is proved by induction on degrees.

THEOREM 9-3.3. The division algorithm in $F[x]$. Let $a(x)$ and $b(x)$ be polynomials in $F[x]$, where $F$ is a field. Suppose that $b(x) \neq 0$. Then there exist unique polynomials $q(x)$ and $r(x)$ in $F[x]$ such that

$$a(x) = q(x) \cdot b(x) + r(x),$$

and either $r(x) = 0$, or else $\operatorname{Deg}[r(x)] < \operatorname{Deg}[b(x)]$.


This fundamental result is a statement of the process of long division for polynomials. The reader is probably familiar with the mechanics of this process, without having thought about its formal statement and proof.

EXAMPLE 1. Let $a(x) = x^3 + 2x + 3$ and $b(x) = 2x^2 - 3x + 1$. We will think of these polynomials as elements of $Q[x]$, even though they have integral coefficients. To find the polynomials $q(x)$ and $r(x)$ whose existence is guaranteed by Theorem 9-3.3, the familiar long division process will be used:

Therefore,

The validity of this identity can be checked by direct computation.
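The long division process can also be written as a short program over the rational numbers. The Python sketch below is an illustration and not the authors' procedure: the function name `poly_divmod` and the list representation (entry `i` is the coefficient of $x^i$) are assumptions, and the sample inputs $x^3 + 2x + 3$ and $2x^2 - 3x + 1$ are chosen for the illustration.

```python
from fractions import Fraction

def poly_divmod(a, b):
    """Divide a(x) by b(x) over the rationals; return (q, r) as ascending lists.

    The empty list stands for the zero polynomial.
    """
    b = [Fraction(c) for c in b]
    while b and b[-1] == 0:
        b.pop()
    if not b:
        raise ZeroDivisionError("b(x) must not be the zero polynomial")
    r = [Fraction(c) for c in a]
    while r and r[-1] == 0:
        r.pop()
    q = [Fraction(0)] * max(len(r) - len(b) + 1, 0)
    while len(r) >= len(b):
        shift = len(r) - len(b)
        coeff = r[-1] / b[-1]            # quotient of the leading coefficients
        q[shift] = coeff
        for k, bk in enumerate(b):       # subtract coeff * x^shift * b(x)
            r[shift + k] -= coeff * bk
        while r and r[-1] == 0:          # the leading term has cancelled
            r.pop()
    return q, r

# Sample inputs: a(x) = x^3 + 2x + 3 and b(x) = 2x^2 - 3x + 1.
q, r = poly_divmod([3, 2, 0, 1], [1, -3, 2])
```

Each pass of the loop removes the leading term of the current remainder, so its degree drops until it is smaller than the degree of $b(x)$. For the sample inputs the result is $q(x) = \tfrac{1}{2}x + \tfrac{3}{4}$ and $r(x) = \tfrac{15}{4}x + \tfrac{9}{4}$.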

The proof of the division algorithm is based on an induction in the form of the well-ordering principle. It is convenient to prove a result which plays the role of the induction step in the proof of Theorem 9-3.3.

(9-3.4). Suppose that $b(x)$ and $c(x)$ are nonzero polynomials in $F[x]$, such that $\operatorname{Deg}[b(x)] \le \operatorname{Deg}[c(x)]$. Then there is a polynomial $f(x)$ such that either $c(x) = f(x) \cdot b(x)$, or else

$$\operatorname{Deg}[c(x) - f(x) \cdot b(x)] < \operatorname{Deg}[c(x)].$$

Proof. Let $b(x) = b_n x^n + b_{n-1} x^{n-1} + \cdots + b_0$ and $c(x) = c_m x^m + c_{m-1} x^{m-1} + \cdots + c_0$, where $b_n \neq 0$, $c_m \neq 0$, and $n \le m$. Define $f(x) = (c_m b_n^{-1}) x^{m-n}$. Then

$$c(x) - f(x) \cdot b(x) = (c_m x^m + c_{m-1} x^{m-1} + \cdots) - (c_m x^m + c_m b_n^{-1} b_{n-1} x^{m-1} + \cdots) = (c_{m-1} - c_m b_n^{-1} b_{n-1}) x^{m-1} + \cdots.$$

Thus, if $c(x) - f(x) \cdot b(x) \neq 0$, then the degree of this polynomial is less than $m$, the degree of $c(x)$. This is exactly what had to be shown for the proof of (9-3.4).


We will now prove Theorem 9-3.3. We first prove the existence of polynomials q(x) and r(x) with the required properties. If there is a polynomial q(x) such that a(x) = q(x)b(x), then r(x) can be taken to be 0. Therefore, suppose that a(x) ≠ g(x)b(x) for all g(x) ∈ F[x]. Then Deg [a(x) - g(x)b(x)] is defined for all polynomials g(x) ∈ F[x]. Consequently,

$$\{\mathrm{Deg}\,[a(x) - g(x)\,b(x)] \mid g(x) \in F[x]\}$$

is a nonempty set of nonnegative integers. By the well-ordering principle, this set contains a smallest integer k. That is, there is a polynomial q(x) ∈ F[x] such that Deg [a(x) - q(x)b(x)] = k ≤ Deg [a(x) - g(x)b(x)] for all g(x) ∈ F[x]. If k ≥ Deg [b(x)], then by (9-3.4), there exists f(x) ∈ F[x] such that either

$$a(x) - q(x)\,b(x) = f(x)\,b(x),$$

or else

$$\mathrm{Deg}\,[a(x) - q(x)\,b(x) - f(x)\,b(x)] < \mathrm{Deg}\,[a(x) - q(x)\,b(x)].$$

In the first case, a(x) = [q(x) + f(x)]b(x), which is contrary to the assumption that a(x) ≠ g(x)b(x) for all g(x) ∈ F[x]. In the second case, Deg [a(x) - [q(x) + f(x)]b(x)] < k, which contradicts k ≤ Deg [a(x) - g(x)b(x)] for all g(x) ∈ F[x]. The only alternative to these contradictions is that k is less than the degree of b(x). Thus, if we call r(x) = a(x) - q(x)b(x), it follows that a(x) = q(x)b(x) + r(x), and Deg [r(x)] < Deg [b(x)].

It remains to show that the polynomials q(x) and r(x) satisfying the conditions of Theorem 9-3.3 are unique. Suppose that

$$a(x) = q_1(x)\,b(x) + r_1(x), \qquad a(x) = q_2(x)\,b(x) + r_2(x),$$

where r_1(x) and r_2(x) are either zero, or else they have degree less than Deg [b(x)]. By subtracting these expressions, we obtain

$$[q_1(x) - q_2(x)]\,b(x) = r_2(x) - r_1(x).$$

Assume that r_2(x) - r_1(x) ≠ 0. Then q_1(x) - q_2(x) ≠ 0. Thus, by Theorem 9-3.2,

$$\mathrm{Deg}\,[b(x)] > \mathrm{Deg}\,[r_2(x) - r_1(x)] = \mathrm{Deg}\,[q_1(x) - q_2(x)] + \mathrm{Deg}\,[b(x)].$$

This is impossible, because Deg [q_1(x) - q_2(x)] ≥ 0. Therefore, r_2(x) - r_1(x) = 0, and since [q_1(x) - q_2(x)]b(x) = r_2(x) - r_1(x) = 0 and b(x) ≠ 0, we also have q_1(x) - q_2(x) = 0. This completes the proof of the uniqueness of q(x) and r(x).

The polynomials q(x) and r(x) in the expression a(x) = q(x)b(x) + r(x) given by the division algorithm are called, respectively, the quotient and remainder on dividing a(x) by b(x). The division algorithm for polynomials can be generalized in a way which is analogous to the way that Theorem 5-1.3 generalizes the division algorithm for integers. We limit ourselves to stating a special case of this generalization.

THEOREM 9-3.5. Let c ∈ F, where F is a field. Then every nonzero polynomial f(x) can be uniquely represented in the form

$$f(x) = a_n(x-c)^n + a_{n-1}(x-c)^{n-1} + \cdots + a_1(x-c) + a_0,$$

where n = Deg [f(x)] and a_n, a_{n-1}, ..., a_0 are elements of F.

This theorem can be proved from Theorem 9-3.3 by induction on Deg [f(x)] in the same way that Theorem 5-1.3 is obtained from Theorem 5-1.1. We omit the proof.
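The representation in Theorem 9-3.5 can be computed by dividing repeatedly by x - c: each remainder is the next coefficient a_0, a_1, ..., and each quotient is divided again. The Python sketch below illustrates this under stated assumptions. The function name `expand_about` and the list representation (lowest degree first) are choices made here, and the field is taken to be Q.

```python
from fractions import Fraction


def expand_about(f, c):
    """Return [a_0, a_1, ..., a_n] with f(x) = sum of a_i * (x - c)^i.

    f is a list of coefficients, lowest degree first.
    """
    work = [Fraction(v) for v in f]
    c = Fraction(c)
    coeffs = []
    while work:
        # Divide work(x) by (x - c), building the quotient from the top down.
        quotient = [Fraction(0)] * (len(work) - 1)
        carry = Fraction(0)
        for i in range(len(work) - 1, 0, -1):
            carry = work[i] + carry * c
            quotient[i - 1] = carry
        remainder = work[0] + carry * c  # equals work evaluated at c
        coeffs.append(remainder)
        work = quotient
    return coeffs


# f(x) = 2x^2 + 3x + 5 in powers of x - 2.
print(expand_about([5, 3, 2], 2))
```

For a quadratic ax^2 + bx + c and the point d, the coefficients returned are ad^2 + bd + c, 2ad + b, and a.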

1. Use the division algorithm to find the quotient and remainder on dividing a(x) by b(x) for the following pairs of polynomials.
(a) a(x) = 2x^3 - 3x^2 + x - 1 and b(x) = x^2 + 2 are in Q[x]
(b) a(x) = x^2 + 2 and b(x) = 2x^3 - 3x^2 + x - 1 are in Q[x]
(c) a(x) = x^7 + 4x and b(x) = x - 1 are in Q[x]
(d) a(x) = x^2 + √2 x - 1 and b(x) = x - (√6 - √2)/2 are in R[x]
(e) a(x) = x^3 + ix^2 + x + i and b(x) = x^2 + i are in C[x]
(f) a(x) = 3x^4 + 8x^2 + 2 and b(x) = 12x^2 + x + 3 are in Z_13[x]

2. Let f(x) = ax^2 + bx + c, where a, b, and c are elements of the field F. Let d ∈ F. Let a_2, a_1, and a_0 be the coefficients of the powers of x - d in the representation

$$f(x) = a_2(x-d)^2 + a_1(x-d) + a_0.$$

Find expressions which give a_2, a_1, and a_0 in terms of a, b, c, and d. Show directly that Theorem 9-3.5 is true in this case.

3. Let f(x) = x^9 + 1. Express f(x) as a sum of powers of x - 1.

4. State a general analogue of Theorem 5-1.3 for F[x], where F is a field.

5. Prove Theorem 9-3.5.


6. Show that Theorem 9-3.3 can be generalized as follows: let a(x) and b(x) be polynomials in D[x], where D is an integral domain. Suppose that b(x) ≠ 0, and the leading coefficient (see Definition 9-3.1) of b(x) is 1. Then there exist unique polynomials q(x) and r(x) in D[x] such that

$$a(x) = q(x)\,b(x) + r(x),$$

and either r(x) = 0, or else Deg [r(x)] < Deg [b(x)].

9-4 Greatest common divisor in F[x]. The divisibility of elements in an integral domain was discussed briefly in Section 4-4. The concepts and notation introduced in that section apply to the ring F[x] of polynomials with coefficients in a field F. For convenience let us recall that according to Definition 4-4.6, a polynomial b(x) divides a(x) in F[x] if there is a polynomial c(x) ∈ F[x] such that a(x) = b(x)c(x). It is also customary to say in this case that b(x) is a factor of a(x). Thus, b(x) divides a(x) in F[x] if and only if the remainder on dividing a(x) by b(x) is the zero polynomial. The statement that b(x) divides a(x) is abbreviated by writing

$$b(x) \mid a(x).$$

The relation b(x)|a(x) in F[x] has certain useful properties which depend on the particular nature of the integral domain F[x].

(9-4.1). (a) If b(x)|a(x) in F[x], then d·b(x)|a(x) and b(x)|f(x)·a(x), where d is any nonzero element of F, and f(x) is any polynomial of F[x].
(b) A nonzero constant polynomial divides every polynomial in F[x].
(c) If b(x)|a(x) and a(x) ≠ 0, then the degree of b(x) is less than or equal to the degree of a(x).
(d) If b(x)|a(x), and a(x) and b(x) have the same degree, then each polynomial is a nonzero constant multiple of the other.
(e) If c(x)|a(x) and c(x)|b(x), then c(x)|[f(x)a(x) + g(x)b(x)] for every f(x) and g(x) in F[x].

Proof. To prove (a), we note that a(x) = b(x)c(x) for some c(x) ∈ F[x]. Then a(x) = [d·b(x)][d^{-1}·c(x)] and f(x)·a(x) = b(x)[f(x)c(x)]. Therefore, d·b(x)|a(x) and b(x)|f(x)a(x). Statement (b) follows from (a) and the fact that the identity element 1 ∈ F[x] divides every polynomial in F[x] [see Theorem 4-4.7(f)]. The properties (c) and (d) follow from Theorem 9-3.2(a). If b(x)|a(x), then by definition, there is a polynomial c(x) such that a(x) = b(x)c(x). Since a(x) ≠ 0, it follows that b(x) ≠ 0 and c(x) ≠ 0. Therefore, by Theorem 9-3.2(a),

$$\mathrm{Deg}\,[a(x)] = \mathrm{Deg}\,[b(x)] + \mathrm{Deg}\,[c(x)].$$

Since the degrees are nonnegative integers, it follows that Deg [b(x)] ≤ Deg [a(x)]. Moreover, if Deg [b(x)] = Deg [a(x)], then Deg [c(x)] = 0, so that c(x) is a nonzero constant. This proves (c) and (d). Finally, the property (e) is no more than a restatement of Theorem 4-4.7(e).

EXAMPLE l . I n Q [ x ] , (+

+ 2x + *x2)1(3 + 2x -f zx2 $- 2x3 + ;x4), 3 + 2~ + gx2+ 2 ~ + 3 ix4 (4+ 25 + &x2)(1 +x~).


=

since

and

Definition 5-2.1 of the greatest common divisor of two integers was based on the ordering of the integers. Such a definition does not make sense in rings such as F[x] which are not ordered. However, the conditions (5-1) and (5-2) for the greatest common divisor make sense in any integral domain, and as we observed in Section 5-2, these conditions can be used to define the greatest common divisor of two elements (not both zero) in any integral domain. For convenience, let us restate this definition for the integral domain F[x].

DEFINITION 9-4.2. Let a(x) and b(x) be polynomials in F[x] which are not both zero. Then d(x) ∈ F[x] is a greatest common divisor (g.c.d.) of a(x) and b(x) in F[x] if
(a) d(x)|a(x) and d(x)|b(x);
(b) if c(x) ∈ F[x] satisfies c(x)|a(x) and c(x)|b(x), then c(x)|d(x).

It follows from (9-4.1a, b) that if d(x) is a greatest common divisor of a(x) and b(x), then so is c·d(x), where c is any nonzero element of F. Thus, a g.c.d. of a(x) and b(x) is not unique. Moreover, it is by no means obvious that two polynomials a(x) and b(x) necessarily have any greatest common divisor.


DEFINITION 9-4.3. A nonzero polynomial f(x) is called monic if the leading coefficient of f(x) is 1. That is, f(x) has the form

$$f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0.$$

If g(x) = b_n x^n + b_{n-1} x^{n-1} + ... + b_0 is a nonzero polynomial in F[x], with b_n ≠ 0, then

$$b_n^{-1}\,g(x) = x^n + (b_n^{-1}b_{n-1})\,x^{n-1} + \cdots + b_n^{-1}b_0$$

is a monic polynomial. Thus, for any nonzero polynomial g(x), there is a unique monic polynomial f(x) which is a multiple of g(x) by a nonzero element of F. It is customary to call f(x) the monic polynomial associated with g(x) [or simply the monic associate of g(x)].

THEOREM 9-4.4. Let a(x) and b(x) be polynomials in F[x] which are not both zero. Then there exists a unique monic polynomial d(x) ∈ F[x] which is a greatest common divisor of a(x) and b(x). Moreover,

$$d(x) = g(x)\,a(x) + h(x)\,b(x)$$

for some g(x) and h(x) in F[x].

Proof. Suppose that a(x) = 0. Then b(x) ≠ 0, and it is easy to see that the monic polynomial associated with b(x) is a greatest common divisor of a(x) and b(x). Similarly, if b(x) = 0, then the monic associate of a(x) is a g.c.d. of a(x) and b(x). In both of these cases, this monic g.c.d. can be expressed in the form g(x)a(x) + h(x)b(x), in fact, with g(x) and h(x) constant polynomials. Assume therefore that a(x) ≠ 0 and b(x) ≠ 0. We will prove the statement of the theorem (except the uniqueness) by course of values induction on

$$\min\,\{\mathrm{Deg}\,[a(x)],\ \mathrm{Deg}\,[b(x)]\}.$$

If min {Deg [a(x)], Deg [b(x)]} = 0, then either a(x) or b(x) is a nonzero constant polynomial, and the only common divisors of a(x) and b(x) are the nonzero constant polynomials. Hence, 1 is a monic g.c.d. of a(x) and b(x) in this case. Moreover, if a(x) = a ∈ F, then 1 = a^{-1}·a + 0·b(x), and if b(x) = b ∈ F, then 1 = 0·a(x) + b^{-1}·b.

Assume inductively that if s(x) and t(x) are polynomials such that min {Deg [s(x)], Deg [t(x)]} < n, then s(x) and t(x) have a monic g.c.d. d(x), which can be expressed in the form d(x) = e(x)s(x) + f(x)t(x) for some e(x) and f(x) in F[x]. Suppose that n = Deg [b(x)] ≤ Deg [a(x)]. The proof is similar if n = Deg [a(x)] < Deg [b(x)]. By the division algorithm,

$$a(x) = q(x)\,b(x) + r(x),$$

where either r(x) = 0, or else r(x) ≠ 0 and Deg [r(x)] < Deg [b(x)]. If r(x) = 0, then b(x)|a(x), and it follows easily from Definition 9-4.2 that the monic polynomial associated with b(x) is a greatest common divisor of a(x) and b(x). The monic associate of b(x) has the form g(x)a(x) + h(x)b(x), where g(x) = 0 and h(x) is a nonzero constant polynomial. Consider the case in which r(x) ≠ 0. Then

$$\min\,\{\mathrm{Deg}\,[r(x)],\ \mathrm{Deg}\,[b(x)]\} = \mathrm{Deg}\,[r(x)] < \mathrm{Deg}\,[b(x)] = \min\,\{\mathrm{Deg}\,[a(x)],\ \mathrm{Deg}\,[b(x)]\} = n.$$

Thus, by the induction hypothesis, r(x) and b(x) have a monic greatest common divisor d(x), which can be written in the form

$$d(x) = e(x)\,r(x) + f(x)\,b(x).$$

Since d(x)|b(x) and d(x)|r(x), it follows that d(x)|[q(x)b(x) + r(x)], by (9-4.1e). That is, d(x)|a(x). Suppose that c(x) ∈ F[x] is such that c(x)|a(x) and c(x)|b(x). Then c(x) divides a(x) - q(x)b(x) = r(x). Therefore, since d(x) is a g.c.d. of b(x) and r(x), Definition 9-4.2(b) requires that c(x)|d(x). We have shown that d(x) satisfies both of the conditions of Definition 9-4.2 for a g.c.d. of a(x) and b(x). Therefore, d(x) is a g.c.d. of a(x) and b(x). Moreover,

$$d(x) = e(x)\,r(x) + f(x)\,b(x) = e(x)\,(a(x) - q(x)\,b(x)) + f(x)\,b(x) = g(x)\,a(x) + h(x)\,b(x),$$

where g(x) = e(x) and h(x) = f(x) - e(x)q(x).

To prove that d(x) is the unique monic g.c.d. of a(x) and b(x), assume that d_1(x) is also a monic polynomial which satisfies Definition 9-4.2. Since d(x) satisfies part (a) and d_1(x) satisfies part (b), it follows that d(x)|d_1(x). Similarly, since d_1(x) satisfies (a) and d(x) satisfies (b), it follows that d_1(x)|d(x). Therefore, Deg [d_1(x)] = Deg [d(x)], by (9-4.1c). By (9-4.1d), d_1(x) is a constant multiple of d(x), say d_1(x) = k·d(x), where k ≠ 0 is in F. Since both d_1(x) and d(x) have leading coefficient 1, it follows that k = 1. Hence, d_1(x) = d(x). This completes the proof of the theorem.

The result of Theorem 9-4.4 allows us to speak of the monic g.c.d. of two polynomials a(x) and b(x) in F[x] which are not both zero. It is convenient to denote this unique monic g.c.d. by the expression

$$(a(x),\ b(x)).$$

This is similar to the notation introduced in Definition 5-2.1 for the g.c.d. of two integers.


It is possible to prove Theorem 9-4.4 by a method which resembles the proof of the analogous Theorem 5-2.2 (see Problem 5). However, the proof which we have given provides a practical method of finding the g.c.d. of two polynomials a(x) and b(x). If b(x) = 0, then (a(x), b(x)) is the monic associate of a(x). If a(x) ≠ 0, b(x) ≠ 0, and Deg b(x) ≤ Deg a(x), then (a(x), b(x)) = (b(x), r(x)), where r(x) is the remainder obtained from the division of a(x) by b(x). Consequently, by repeated application of the division algorithm, it is possible to find the g.c.d. of a(x) and b(x).

EXAMPLE 2. Let a(x) = x^5 + 3x^4 + 5x^3 + 4x^2 + 4x + 1 and b(x) = x^5 + 2x^4 + 3x^3 + 2x^2 + 2x. Then by repeated use of the division algorithm:

$$\begin{aligned}
x^5 + 3x^4 + 5x^3 + 4x^2 + 4x + 1 &= 1\cdot(x^5 + 2x^4 + 3x^3 + 2x^2 + 2x) + (x^4 + 2x^3 + 2x^2 + 2x + 1),\\
x^5 + 2x^4 + 3x^3 + 2x^2 + 2x &= x\,(x^4 + 2x^3 + 2x^2 + 2x + 1) + (x^3 + x),\\
x^4 + 2x^3 + 2x^2 + 2x + 1 &= (x + 2)(x^3 + x) + (x^2 + 1),\\
x^3 + x &= x\,(x^2 + 1) + 0.
\end{aligned}$$

Therefore, (a(x), b(x)) = x^2 + 1.

The method of obtaining the g.c.d. of two polynomials as in Example 2 by repeated use of the division algorithm is called the Euclidean algorithm, because it is similar to the process of obtaining the g.c.d. of two integers which has been passed down to us in the works of Euclid. In general terms, the process consists of forming the successive equations

$$\begin{aligned}
a(x) &= q_1(x)\,b(x) + r_1(x),\\
b(x) &= q_2(x)\,r_1(x) + r_2(x),\\
r_1(x) &= q_3(x)\,r_2(x) + r_3(x),\\
&\ \ \vdots\\
r_{n-2}(x) &= q_n(x)\,r_{n-1}(x) + r_n(x),\\
r_{n-1}(x) &= q_{n+1}(x)\,r_n(x),
\end{aligned}$$

where r_1(x), r_2(x), ..., r_n(x) are not zero, and Deg [b(x)] > Deg [r_1(x)] > Deg [r_2(x)] > ... > Deg [r_n(x)]. It follows from the proof of Theorem 9-4.4 that the monic polynomial associated with r_n(x) is the monic g.c.d. of a(x) and b(x). Indeed,

$$(a(x), b(x)) = (b(x), r_1(x)) = (r_1(x), r_2(x)) = \cdots = (r_{n-1}(x), r_n(x)) = (r_n(x), 0),$$

and (r_n(x), 0) is the monic polynomial associated with r_n(x).

Two polynomials a(x) and b(x) in F[x] are relatively prime if the monic g.c.d. of a(x) and b(x) is 1. The proof of the following result is identical with the proof of Theorem 5-2.6.

THEOREM 9-4.5. If a(x) and b(x) are relatively prime polynomials in F[x], and if a(x)|b(x)c(x), where c(x) ∈ F[x], then a(x)|c(x).
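The Euclidean algorithm, together with the representation d(x) = g(x)a(x) + h(x)b(x) of Theorem 9-4.4, can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' presentation: the helper names (`trim`, `add`, `mul`, `scale`, `divmod_poly`, `monic_gcd`) and the coefficient-list representation (lowest degree first, empty list for zero) are choices made here, and the field is Q with exact `Fraction` arithmetic. The sample input is the pair of polynomials from Example 2.

```python
from fractions import Fraction


def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(a, b):
    n = max(len(a), len(b))
    a = list(a) + [Fraction(0)] * (n - len(a))
    b = list(b) + [Fraction(0)] * (n - len(b))
    return trim(x + y for x, y in zip(a, b))


def mul(a, b):
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)


def scale(k, a):
    return trim(k * x for x in a)


def divmod_poly(a, b):
    """Quotient and remainder of a by a nonzero b (division algorithm)."""
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    r = list(a)
    while len(r) >= len(b):
        shift = len(r) - len(b)
        coeff = r[-1] / b[-1]
        q[shift] = coeff
        for i, bi in enumerate(b):
            r[i + shift] -= coeff * bi
        r = trim(r)
    return trim(q), r


def monic_gcd(a, b):
    """Return (d, g, h): d is the monic g.c.d. and d = g*a + h*b."""
    a = trim(Fraction(c) for c in a)
    b = trim(Fraction(c) for c in b)
    if not a and not b:
        raise ValueError("a(x) and b(x) must not both be zero")
    # Invariants: r0 = g0*a + h0*b and r1 = g1*a + h1*b.
    r0, g0, h0 = a, [Fraction(1)], []
    r1, g1, h1 = b, [], [Fraction(1)]
    while r1:
        q, r = divmod_poly(r0, r1)
        neg_q = scale(Fraction(-1), q)
        r0, g0, h0, r1, g1, h1 = (
            r1, g1, h1,
            r, add(g0, mul(neg_q, g1)), add(h0, mul(neg_q, h1)),
        )
    k = 1 / r0[-1]  # pass to the monic associate of the last nonzero remainder
    return scale(k, r0), scale(k, g0), scale(k, h0)


a = [1, 4, 4, 5, 3, 1]  # x^5 + 3x^4 + 5x^3 + 4x^2 + 4x + 1
b = [0, 2, 2, 3, 2, 1]  # x^5 + 2x^4 + 3x^3 + 2x^2 + 2x
d, g, h = monic_gcd(a, b)
print(d)  # coefficients of the monic g.c.d., lowest degree first
```

Carrying g and h through each division step mirrors the induction in the proof of Theorem 9-4.4, where the representation of d(x) in terms of r(x) and b(x) is rewritten in terms of a(x) and b(x).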

1. Find the monic (a) x4 - x3 ( b ) x4 - 3x3 - x5 (c) (d) x2 - 2, x2

+ 3x2 - 22 + 2, x3 + x2 + 2x + 2 + 4x2 - 122, x3 4x2 + 42 - 3 + 2 ~ ~ - 3 ~ ~+ - 2 ~ + 2, - x3


-

g.c.d. of the following pairs of polynomials.

- (dS

+ d 3 )+ ~d 6

x2

X~

x2 + l

2. Prove that a(x) and b(x) are relatively prime if and only if there exist polynomials f(x) and g(x) in F[x] such that f(x)a(x) + g(x)b(x) = 1.

3. Let a(x) and b(x) be polynomials in D[x], where D is an integral domain. The polynomial a(x) is called an associate of b(x) if a(x)|b(x) and b(x)|a(x). If a(x) is an associate of b(x), we write a(x) ~ b(x).
(a) Prove that ~ is an equivalence relation on D[x].
(b) Let D = Z. Prove that a(x) ~ b(x) if and only if b(x) = a(x) or b(x) = -a(x).
(c) Let D = F, a field. Prove that a(x) ~ b(x) if and only if b(x) = k·a(x), where k is a nonzero element of F.
(d) Suppose that d_1(x) and d_2(x) are greatest common divisors of a(x) and b(x). Prove that d_1(x) ~ d_2(x). (The definition of a greatest common divisor of two polynomials in D[x], where D is an integral domain, is obtained from Definition 9-4.2 by replacing the field F by D.)

4. Find an example which shows that the polynomials f(x) and g(x) in the expression d(x) = f(x)a(x) + g(x)b(x) for the monic g.c.d. of a(x) and b(x) in Theorem 9-4.4 are not unique.

5. Let a(x) and b(x) be polynomials in F[x] which are not both zero. Let

$$S = \{f(x)\,a(x) + g(x)\,b(x) \mid f(x) \in F[x],\ g(x) \in F[x]\}.$$

(a) Show that S contains at least one nonzero, monic polynomial.
(b) Without using Theorem 9-4.4, prove that the monic polynomial of smallest degree in S is a g.c.d. of a(x) and b(x).


6. Show that the only greatest common divisors of 2 and x in Z[x] are 1 and -1. [See Problem 3(d) for the definition of greatest common divisor in Z[x].] Prove that it is impossible to find f(x) ∈ Z[x] and g(x) ∈ Z[x] satisfying 1 = f(x)·2 + g(x)·x. This shows that Theorem 9-4.4 is false in D[x], where D is an integral domain.

7. Let a(x) and b(x) be polynomials, not both zero, with coefficients in the rational field Q. Prove that the monic g.c.d. of a(x) and b(x) in R[x] has rational coefficients. Is the same conclusion true if R[x] is replaced by C[x]? [Hint: Prove that the monic g.c.d. of a(x) and b(x) in Q[x] is also a g.c.d. of a(x) and b(x) in R[x], then use the uniqueness statement in Theorem 9-4.4.]

8. Let a(x), b(x), and c(x) be polynomials in F[x], with a(x) ≠ 0, b(x) ≠ 0, and a(x) monic. Prove that (a(x)b(x), a(x)c(x)) = a(x)(b(x), c(x)).

9. Let {a_1(x), a_2(x), ..., a_n(x)} (n ≥ 2) be a set of polynomials in F[x] with a_1(x) ≠ 0. A greatest common divisor of {a_1(x), a_2(x), ..., a_n(x)} in F[x] is a polynomial d(x) ∈ F[x] such that (i) d(x)|a_i(x) for i = 1, 2, ..., n, and (ii) if c(x)|a_i(x) for i = 1, 2, ..., n, then c(x)|d(x).
(a) Prove that (...((a_1(x), a_2(x)), a_3(x)), ..., a_n(x)) is a g.c.d. of {a_1(x), a_2(x), ..., a_n(x)} in F[x].
(b) State and prove a theorem similar to Theorem 9-4.4 for sets of n ≥ 2 polynomials.

10. Find the monic g.c.d. of the following sets of polynomials.
(a) x^4 - x^3 + 3x^2 - 2x + 2, x^3 + x^2 + 2x + 2, x + √2 i
(b) x^4 - 1, x^3 + x^2 + x + 1, x^2 - 1
(c) x^5 + 13x^4 + 63x^3 + 148x^2 + 208x + 192, 4x^4 + 52x^3 + 189x^2 + 296x + 208, 20x^3 + 156x^2 + 378x + 296

11. A least common multiple (l.c.m.) of two nonzero polynomials a(x) and b(x) in F[x] is a polynomial m(x) ∈ F[x] which satisfies (i) a(x)|m(x) and b(x)|m(x), and (ii) if l(x) is any polynomial in F[x] such that a(x)|l(x) and b(x)|l(x), then m(x)|l(x). Prove that if a(x) and b(x) are nonzero polynomials in F[x], then a(x)b(x)/(a(x), b(x)) is a l.c.m. of a(x) and b(x).

12. Find a least common multiple for the following pairs of polynomials.
(a) x^5 + 3x^4 + 5x^3 + 4x^2 + 4x + 1, x^5 + 2x^4 + 3x^3 + 2x^2 + 2x
(b) x^4 - x^3 + 3x^2 - 2x + 2, x^3 + x^2 + 2x + 2
(c) ~3 - 2 ~ + 1, x ~ + 1

9-5 The unique factorization theorem for polynomials. The fundamental theorem of arithmetic, which was proved in Section 5-3, states that every natural number can be written uniquely as a product of prime numbers. The purpose of this section is to prove a similar theorem about the arithmetic in an integral domain F[x], where F is a field.

Our first task is to define the analogue in F[x] of a prime number.

DEFINITION 9-5.1. Let p(x) be a polynomial of positive degree in F[x]. Then p(x) is irreducible in F[x] if p(x) is not divisible by any polynomial in F[x] except constant polynomials and constant multiples of p(x). Otherwise, p(x) is called reducible in F[x].

This definition requires some discussion. By (9-4.1a, b), any polynomial a(x) in F[x] is divisible by every nonzero constant polynomial and by every nonzero constant multiple of a(x). Thus, the irreducible polynomials in F[x] are exactly those which have no divisors other than these "trivial" ones. This parallels closely the definition of a prime number (Definition 5-3.1).

Suppose that a(x) is a polynomial of positive degree which is reducible in F[x]. Then by Definition 9-5.1, a(x) = f(x)g(x), where f(x) is not a constant and f(x) is not a constant multiple of a(x). It follows that Deg [f(x)] < Deg [a(x)]. For if Deg [f(x)] = Deg [a(x)], then by Theorem 9-3.2(a), Deg [g(x)] = 0. Therefore, g(x) is a nonzero constant, which implies that f(x) is a constant multiple of a(x). This contradiction proves that Deg [f(x)] < Deg [a(x)]. Therefore, a reducible polynomial a(x) in F[x] has a factor f(x) such that 0 < Deg [f(x)] < Deg [a(x)]. Conversely, it is easy to show that if a(x) ∈ F[x] has a factor f(x) ∈ F[x] which satisfies 0 < Deg [f(x)] < Deg [a(x)], then a(x) is reducible in F[x].

Since Definition 9-5.1 applies only to polynomials of positive degree, the constant polynomials are neither reducible nor irreducible. These polynomials play a special role in F[x] similar to that of the integers 1 and -1 in the arithmetic of Z. It is very important to observe that irreducibility is defined relative to a particular field F. That is, a polynomial which is irreducible in F[x] may be reducible in K[x] for some field K containing F.

EXAMPLE 1. The polynomial x^2 - 3 is irreducible in Q[x]. Suppose that x^2 - 3 is reducible. Then

$$x^2 - 3 = (ax + b)(cx + d) = acx^2 + (ad + bc)x + bd,$$

where a, b, c, and d are rational numbers. This implies that ac = 1, ad + bc = 0, and bd = -3. In particular, a ≠ 0 and b ≠ 0. Thus, c = 1/a, d = -3/b, and substituting in ad + bc = 0, we obtain

$$-\frac{3a}{b} + \frac{b}{a} = 0.$$

Therefore, -3a^2 + b^2 = 0 and (b/a)^2 = 3. However, √3 is not a rational number. Thus, x^2 - 3 is irreducible in Q[x]. On the other hand, x^2 - 3 = (x - √3)(x + √3), so that x^2 - 3 is reducible in R[x].

EXAMPLE 2. Any polynomial of degree one, ax + b, a ≠ 0, with coefficients in a field F, is irreducible in F[x]. In fact, ax + b = f(x)g(x) with 0 < Deg [f(x)] < 1 is obviously impossible. Moreover, if K is any field containing F as a subring, then ax + b is also irreducible in K[x].

The principal result of this section is that every polynomial of positive degree in F[x] can be expressed as a product of an element in F and one or more monic irreducible polynomials in F[x]. Moreover, this factorization is unique, except possibly for the order of the factors. This is the unique factorization theorem in F[x], which is the analogue of the fundamental theorem of arithmetic. The following preliminary results are needed for the proof of this important theorem.

(9-5.2). If p(x) is irreducible in F[x] and f(x) ∈ F[x], then either p(x)|f(x) in F[x], or p(x) and f(x) are relatively prime.

Proof. Let d(x) = (p(x), f(x)). Then d(x)|p(x), by Definition 9-4.2. Since p(x) is irreducible, it follows that either d(x) is a constant or d(x) is a nonzero constant multiple of p(x). If d(x) is a constant, then d(x) = 1 (because d(x) is monic), so that p(x) and f(x) are relatively prime. If d(x) is a nonzero constant multiple of p(x), then p(x) = k·d(x) for some nonzero k ∈ F. Since d(x)|f(x), it follows that p(x)|f(x), by (9-4.1a).

(9-5.3). If p(x) is irreducible in F[x], and p(x) divides the product a_1(x)a_2(x)···a_n(x) of polynomials in F[x], then p(x) divides at least one of the polynomials a_i(x).

The proof is the same as the proof of (5-3.2).

THEOREM 9-5.4. Unique factorization theorem in F[x]. Every polynomial a(x) ∈ F[x] of positive degree can be written as a product of a nonzero element of F and monic irreducible polynomials in F[x]. Except for the order of the factors, the expression of a(x) in this form is unique.

Proof. Both parts of this theorem are proved by course of values induction on n = Deg [a(x)]. The proof that a(x) can be factored into a product of a nonzero element of F and monic irreducible polynomials in F[x] is similar to the corresponding part of the fundamental theorem of arithmetic. Suppose that Deg [a(x)] = 1. Then a(x) = bx + c, where b ∈ F, c ∈ F, and b ≠ 0. By Example 2, x + (b^{-1}·c) is a monic irreducible polynomial in F[x], and

$$a(x) = b\,[x + (b^{-1}c)].$$

Suppose that a(x) has degree n > 1, and assume that every polynomial of degree m, with 1 ≤ m < n, can be expressed in the form

$$c\,p_1(x)\,p_2(x)\cdots p_r(x),$$

where c ≠ 0 is in F and the p_i(x) are monic irreducible polynomials in F[x]. If

$$a(x) = a_n x^n + a_{n-1}x^{n-1} + \cdots + a_0$$

is irreducible, then

$$a(x) = a_n\,[x^n + (a_n^{-1}a_{n-1})\,x^{n-1} + \cdots + (a_n^{-1}a_0)]$$

is the desired expression for a(x), since x^n + (a_n^{-1}a_{n-1})x^{n-1} + ... + (a_n^{-1}a_0) is monic and irreducible, by (9-4.1). If a(x) is not irreducible, then a(x) = b(x)c(x), where b(x) and c(x) are polynomials in F[x] satisfying 1 ≤ Deg [b(x)] < Deg [a(x)] and 1 ≤ Deg [c(x)] < Deg [a(x)]. Therefore, by the induction hypothesis,

$$b(x) = c_1\,p_1(x)\,p_2(x)\cdots p_r(x) \quad\text{and}\quad c(x) = c_2\,q_1(x)\,q_2(x)\cdots q_s(x),$$

where c_1 and c_2 are nonzero elements of F, and the p_i(x) and q_j(x) are monic irreducible polynomials. Thus,

$$a(x) = b(x)\,c(x) = (c_1 c_2)\,p_1(x)\,p_2(x)\cdots p_r(x)\,q_1(x)\,q_2(x)\cdots q_s(x),$$

which is the required form.

To prove that the factorization of a polynomial a(x) is unique, we can use induction either on the degree of a(x), or on the number of monic irreducible polynomials which occur in some decomposition of a(x) into a product of irreducible polynomials. This last method corresponds to the proof of the uniqueness given in Theorem 5-3.3. However, for the proof of Theorem 9-5.4, it is slightly easier to use induction on the degree of a(x). Suppose first that a(x) has degree one and that

$$a(x) = a_1(x + b_1) = a_2(x + b_2).$$

Then a_1 = a_2 ≠ 0, and a_1 b_1 = a_2 b_2. Multiplying the last equation by a_1^{-1} = a_2^{-1}, we obtain b_1 = b_2. Therefore, any two factorizations of a(x) are identical. Now, suppose that a(x) has degree n > 1, and assume that the unique factorization theorem is true for all polynomials of degree less than n. Let

$$a(x) = c_1\,p_1(x)\,p_2(x)\cdots p_r(x) = c_2\,q_1(x)\,q_2(x)\cdots q_s(x)$$

be any two factorizations of a(x) into products of an element of F and one or more monic irreducible polynomials. Since the p_i(x) and q_j(x) are monic polynomials, the leading coefficient of a(x) is both c_1 and c_2. That is, c_1 = c_2. Thus,

$$p_1(x)\,p_2(x)\cdots p_r(x) = q_1(x)\,q_2(x)\cdots q_s(x),$$

so that p_1(x) divides q_1(x)q_2(x)···q_s(x). Since p_1(x) is irreducible, it follows from (9-5.3) that p_1(x) divides one of the polynomials q_j(x). However, q_j(x) is irreducible, and p_1(x) is not a constant, so that p_1(x) must be a constant multiple of q_j(x). Since p_1(x) and q_j(x) are both monic polynomials, it follows that p_1(x) = q_j(x). If r = 1, then a(x) is irreducible, so that s = j = 1. In this case, the factorizations a(x) = c_1 p_1(x) = c_2 q_1(x) are identical. Otherwise, p_1(x) can be cancelled from the above expression to obtain

$$p_2(x)\cdots p_r(x) = q_1(x)\cdots q_{j-1}(x)\,q_{j+1}(x)\cdots q_s(x).$$

Since n = Deg [p_1(x)] + Deg [p_2(x)···p_r(x)] and Deg [p_1(x)] ≥ 1, it follows that Deg [p_2(x)···p_r(x)] < n. By the induction hypothesis, the polynomials p_2(x), ..., p_r(x) are just the polynomials q_1(x), q_2(x), ..., q_{j-1}(x), q_{j+1}(x), ..., q_s(x) in some order. Therefore, the two factorizations of a(x) are the same, except possibly for the order of the factors.

The process of expressing a polynomial as a product of an element of F and a product of monic polynomials which are irreducible in F[x] is the familiar "complete factorization" which is studied in elementary algebra. It would be convenient to have a systematic method which would give a complete factorization of any polynomial a(x) in any integral domain F[x]. Simply to have a way of deciding whether or not a given polynomial in F[x] is irreducible in F[x] would be helpful. Unfortunately, such methods exist only for particular fields F. For example, if the field F is a finite field of the form Z_p (where p is a prime), then there are only finitely many polynomials of a given degree. By examining all products of two polynomials of degree less than that of a(x), it is possible to decide whether or not a(x) is irreducible.
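The finite search described for Z_p can be sketched in Python. This is an illustration under stated assumptions rather than a method given in the text: the names `monic_polys`, `mul_mod`, and `monic_irreducibles` are hypothetical, polynomials are tuples of coefficients (lowest degree first), and p is assumed to be a prime. The sketch forms every product of two monic polynomials of positive degree and strikes the products out, so that the polynomials left over are the monic irreducible ones.

```python
from itertools import product


def monic_polys(p, degree):
    """All monic polynomials of the given degree over Z_p,
    as coefficient tuples with the lowest degree first."""
    return [tuple(low) + (1,) for low in product(range(p), repeat=degree)]


def mul_mod(a, b, p):
    """Product of two polynomials with coefficients reduced modulo p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return tuple(out)


def monic_irreducibles(p, max_degree):
    """Strike out every product of two monic polynomials of positive degree
    whose degrees add up to at most max_degree; return what remains."""
    polys = {d: monic_polys(p, d) for d in range(1, max_degree + 1)}
    reducible = set()
    for d1 in range(1, max_degree):
        for d2 in range(d1, max_degree - d1 + 1):
            for f in polys[d1]:
                for g in polys[d2]:
                    reducible.add(mul_mod(f, g, p))
    return {d: [f for f in polys[d] if f not in reducible]
            for d in range(1, max_degree + 1)}


# Monic irreducible polynomials of degree one and two over Z_3.
for degree, found in monic_irreducibles(3, 2).items():
    print(degree, found)
```

The number of products grows quickly with p and with the degree, so a search of this kind is practical only for small cases.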


EXAMPLE 3. By a method which is similar to the "sieve of Eratosthenes" (see Section 5-4), the monic irreducible polynomials of any degree in the rings Z_p[x] can be determined. Actually, the method is practical only for small p, and for polynomials of low degree. We will consider the case p = 3. The following list includes every monic polynomial in Z_3[x] of degree less than or equal to two, together with its factorization when it is reducible:

    x
    x + 1
    x + 2
    x^2            = x·x
    x^2 + 1
    x^2 + 2        = (x + 1)·(x + 2)
    x^2 + x        = x·(x + 1)
    x^2 + x + 1    = (x + 2)·(x + 2)
    x^2 + x + 2
    x^2 + 2x       = x·(x + 2)
    x^2 + 2x + 1   = (x + 1)·(x + 1)
    x^2 + 2x + 2

It follows that the monic, irreducible polynomials of degree one and two in Z_3[x] are

    x,  x + 1,  x + 2,  x^2 + 1,  x^2 + x + 2,  x^2 + 2x + 2.

1. Determine which of the following polynomials are irreducible in Q[x]. (a) x3 - 2 (b) x3 1 (d) x4 - x2 - 1 (e) x2 2x2 22 1 (e) x4 zx 4 (f) X~ zx3 I

+ +

2. Which of the polynomials listed in Problem 1 are irreducible in R[x]?

3. Let f(x) = ax^2 + bx + c be a polynomial with rational coefficients, where a ≠ 0.
(a) Prove that f(x) is irreducible in Q[x] if and only if b^2 - 4ac is not the square of a rational number.
(b) Prove that f(x) is irreducible in R[x] if and only if b^2 - 4ac < 0.
(c) Prove that f(x) is reducible in C[x] for all values of a, b, and c.

4. Express the polynomials listed in Problem 1 as a product of monic irreducible polynomials in Q[x], R[x], and C[x].

5. Use the method of Example 3 to find all irreducible monic polynomials of the third degree in Z_3[x].
6. Find the complete factorization of al1 1)olynomials of degree four in Z ~ [ X ] .

7. Prove that if p(x) is irreducible in F[x], and c ≠ 0 in F, then c·p(x) is irreducible in F[x].


8. Let a(x) = c·p_1(x)^{n_1} p_2(x)^{n_2} ··· p_r(x)^{n_r} be a polynomial in F[x], where the p_i(x) are monic polynomials which are irreducible in F[x], p_i(x) ≠ p_j(x) for i ≠ j, and the exponents n_i are natural numbers. Prove that b(x) ∈ F[x] divides a(x) if and only if b(x) = d·p_1(x)^{m_1} p_2(x)^{m_2} ··· p_r(x)^{m_r}, where 0 ≤ m_i ≤ n_i for i = 1, 2, ..., r.

9. Any two nonzero polynomials a(x) and b(x) in F[x] can be expressed in the forms

$$a(x) = c\,p_1(x)^{n_1}p_2(x)^{n_2}\cdots p_r(x)^{n_r}, \qquad b(x) = d\,p_1(x)^{m_1}p_2(x)^{m_2}\cdots p_r(x)^{m_r},$$

where the p_i(x) are monic irreducible polynomials, p_i(x) ≠ p_j(x) if i ≠ j, and the exponents n_i and m_i are nonnegative integers.
(a) Prove that the monic g.c.d. of a(x) and b(x) is

$$p_1(x)^{t_1}p_2(x)^{t_2}\cdots p_r(x)^{t_r},$$

where t_i = min {m_i, n_i} for i = 1, 2, ..., r.
(b) Prove that a least common multiple of a(x) and b(x) is

$$p_1(x)^{s_1}p_2(x)^{s_2}\cdots p_r(x)^{s_r},$$

where s_i = max {m_i, n_i} for i = 1, 2, ..., r (see Problem 11, Section 9-4).

9-6 Derivatives. Up to now, our discussion of the rings of polynomials with coefficients in a field has run parallel to the development of the fundamental t,heorem of arithmetic. In this section we introduce an idea which has no analoguc in thc arithmetic of integers. This is the conccpt of the derivative of a polynomial. This concept is one of the basic notions of calculus. The derivative of a polynomial plays an important role in the theory of equations, and for its application in this subject,, it can be defincd in a purely algebraic way.

is a nonconstant polynomial in F [ x ] ,then the polynomial


(n.
an)xn-l

+ ((n

1 ) an-1)xn-2

+ . . + ( 2 .a i ) + ~ 1

al

is called the derivative of a ( x ) . The derivative of a constant polynomial is zero. I t is customary to denote the derivativc of a ( x ) by a f ( x ) , the dcrivative of b ( x ) by b f ( x ) ,etc. The expressions n a,, ( n - 1 ) a,-1, . . . , 2 a2, and 1 a l for the coefficients of a r ( x ) dcnot,e the elements of F which are obtained by re-

9-61

DERIVATIVES

peated addition. That is,


n aummands

-m

n-1 summanda

Xote that if the characteristic of the field F is a prime p with p 2 n, then p a , = O in F. Also, if 2p 2 n, then ( 2 p ) a2, = 0, etc.

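EXAMPLE 1. If $a(x) = x^4 + \sqrt{2}\,x^3 - x + \sqrt{3} \in R[x]$, then the derivative $a'(x)$ of $a(x)$ is $4x^3 + 3\sqrt{2}\,x^2 - 1$. If $a(x) = x^5 + 2x^3 + 3x + 1 \in Z_5[x]$, that is, the coefficients belong to the field of integers modulo 5, then

$$a'(x) = x^2 + 3,$$

since $5 \cdot 1 = 0$ and $3 \cdot 2 = 1$ in $Z_5$.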
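The computation in Example 1 over $Z_5$ can be reproduced mechanically. The following is a minimal sketch, not part of the original text: the function name `derivative_mod_p` and the convention of listing coefficients from lowest degree to highest are assumptions made for illustration.

```python
def derivative_mod_p(coeffs, p):
    """Formal derivative of a polynomial over Z_p.

    coeffs[i] is the coefficient of x**i (lowest degree first);
    this storage convention is an assumption of the sketch.
    """
    # The coefficient of x**(i-1) in a'(x) is i * a_i, reduced modulo p.
    deriv = [(i * a) % p for i, a in enumerate(coeffs)][1:]
    # Drop zero leading coefficients so the list reflects the true degree.
    while deriv and deriv[-1] == 0:
        deriv.pop()
    return deriv


# a(x) = x^5 + 2x^3 + 3x + 1 in Z_5[x], lowest degree first
a = [1, 3, 0, 2, 0, 1]
print(derivative_mod_p(a, 5))   # coefficients of 3 + x^2
```

The term $5x^4$ disappears because $5 = 0$ in $Z_5$, which is exactly the phenomenon described in the note on fields of prime characteristic.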

If $a'(x)$ is the derivative of $a(x)$, then the derivative of $a'(x)$ is called the second derivative of $a(x)$, and is denoted by $a''(x)$. For any natural number $n$, the result of taking $n$ successive derivatives of a polynomial $a(x)$ is called the $n$th derivative of $a(x)$. The $n$th derivative of $a(x)$ can be denoted by

$$a^{\prime\prime\cdots\prime}(x)$$

with $n$ primes. However, this notation is unusual if $n > 3$. For large $n$ it is customary to write $a^{(n)}(x)$ for the $n$th derivative of $a(x)$, and we will follow this practice if $n > 2$.

THEOREM 9-6.2. Let $b(x)$ and $c(x)$ be polynomials in $F[x]$.
(a) If $a(x) = b(x) + c(x)$, then $a'(x) = b'(x) + c'(x)$.
(b) If $a(x) = b(x) \cdot c(x)$, then $a'(x) = b(x) \cdot c'(x) + b'(x) \cdot c(x)$.
(c) If $a(x) = b(x)^n$, where $n \geq 1$, then $a'(x) = n \cdot b(x)^{n-1} \cdot b'(x)$.

Proof. First consider (a). Let

$$b(x) = b_n x^n + b_{n-1}x^{n-1} + \cdots + b_1 x + b_0, \qquad c(x) = c_n x^n + c_{n-1}x^{n-1} + \cdots + c_1 x + c_0.$$

(As we observed in Section 9-2, there is no loss of generality in assuming that $b(x)$ and $c(x)$ are written with the same number of terms.) Then

$$a(x) = (b_n + c_n)x^n + (b_{n-1} + c_{n-1})x^{n-1} + \cdots + (b_1 + c_1)x + (b_0 + c_0).$$

Hence, by Definition 9-6.1,

$$\begin{aligned}
a'(x) &= n \cdot (b_n + c_n)x^{n-1} + (n-1)\cdot(b_{n-1} + c_{n-1})x^{n-2} + \cdots + 1 \cdot (b_1 + c_1) \\
&= [(n \cdot b_n)x^{n-1} + ((n-1)\cdot b_{n-1})x^{n-2} + \cdots + 1 \cdot b_1] \\
&\quad + [(n \cdot c_n)x^{n-1} + ((n-1)\cdot c_{n-1})x^{n-2} + \cdots + 1 \cdot c_1] \\
&= b'(x) + c'(x).
\end{aligned}$$

This proves (a).

We prove (b) first in the case that $b(x) = ex^m$ and $c(x) = fx^n$, where $e, f \in F$, $m \geq 0$, and $n \geq 0$. Then $a(x) = b(x) \cdot c(x) = (ef)x^{m+n}$. By definition, $a'(x) = [(m+n)\cdot ef]x^{m+n-1}$, and

$$b'(x)\cdot c(x) + b(x)\cdot c'(x) = [(m \cdot e)x^{m-1}](fx^n) + (ex^m)[(n \cdot f)x^{n-1}] = [(m+n)\cdot ef]x^{m+n-1},$$

so that $a'(x) = b'(x)\cdot c(x) + b(x)\cdot c'(x)$. Next observe that if the identity (b) is correct for $a_1(x) = b_1(x)\cdot c(x)$ and $a_2(x) = b_2(x)\cdot c(x)$, then it is true for $a(x) = b(x)\cdot c(x)$, where $b(x) = b_1(x) + b_2(x)$. Indeed, $a(x) = [b_1(x) + b_2(x)]\cdot c(x) = b_1(x)\cdot c(x) + b_2(x)\cdot c(x) = a_1(x) + a_2(x)$, so that by (a),

$$\begin{aligned}
a'(x) &= a_1'(x) + a_2'(x) \\
&= [b_1(x)\cdot c'(x) + b_1'(x)\cdot c(x)] + [b_2(x)\cdot c'(x) + b_2'(x)\cdot c(x)] \\
&= b(x)\cdot c'(x) + b'(x)\cdot c(x).
\end{aligned}$$

Similarly, if the identity (b) holds for $b(x)\cdot c_1(x)$ and $b(x)\cdot c_2(x)$, then it holds for $b(x)\cdot[c_1(x) + c_2(x)]$. The proof of the identity (b) can now be completed by induction. It is convenient to use two steps. First we prove by induction on $m$ that (b) is valid when

$$b(x) = b_m x^m + b_{m-1}x^{m-1} + \cdots + b_1 x + b_0, \qquad c(x) = fx^n.$$

The general case

$$b(x) = b_m x^m + \cdots + b_1 x + b_0, \qquad c(x) = c_n x^n + \cdots + c_1 x + c_0$$

is then obtained by induction on $n$. The reader can renew his skill in the use of mathematical induction by filling in the details of this argument.

In order to prove (c), we use induction on $n$. If $n = 1$, the statement is that if $a(x) = b(x)$, then $a'(x) = 1 \cdot b(x)^0 \cdot b'(x)$. Since $b(x)^0$ is 1 (the usual convention for the exponent zero), this identity is correct. Assume that $n > 1$, and that the derivative of $b(x)^{n-1}$ is $(n-1)\cdot b(x)^{n-2}\cdot b'(x)$. Write $b(x)^{n-1} = c(x)$. Then if $a(x) = b(x)^n = b(x)\cdot c(x)$, it follows from (b) and the induction hypothesis that

$$\begin{aligned}
a'(x) &= b(x)\cdot c'(x) + b'(x)\cdot c(x) \\
&= b(x)\cdot (n-1)\cdot b(x)^{n-2}\cdot b'(x) + b'(x)\cdot b(x)^{n-1} \\
&= (n-1)\cdot b(x)^{n-1}\cdot b'(x) + b(x)^{n-1}\cdot b'(x) \\
&= n \cdot b(x)^{n-1}\cdot b'(x).
\end{aligned}$$

Therefore, the induction is complete, and Theorem 9-6.2 is completely proved.

The reader should examine the proof of (b) very carefully, since the method is common in mathematical arguments. Our proof consists of three steps. First, it is shown that the identity is true for the simplest polynomials, that is, the monomials. Next we prove that the set of polynomials satisfying the identity is closed under addition. Finally, since every polynomial is a sum of monomials, it follows that the identity is true for all polynomials. This last step is of course a form of mathematical induction. It is possible to prove (b) by straightforward calculation, but the notation becomes unwieldy.
EXAMPLE 2. Let $a(x) = (x - c)^n$. Then

$$\begin{aligned}
a'(x) &= n \cdot (x - c)^{n-1}, \\
a''(x) &= n(n-1)\cdot(x - c)^{n-2}, \\
&\;\;\vdots \\
a^{(n-1)}(x) &= n \cdot (n-1) \cdots 2 \cdot (x - c), \\
a^{(n)}(x) &= n! \cdot 1.
\end{aligned}$$

The derivative is useful for studying the multiple factors of a polynomial. For this application, we need the following formula, which is a combination of Theorem 9-6.2(b) and (c).

THEOREM 9-6.3. Let $a(x) = c\,p_1(x)^{n_1} \cdots p_k(x)^{n_k}$, $k \geq 1$, where $c \in F$, $p_1(x), \ldots, p_k(x)$ are polynomials in $F[x]$ which are not constant, and $n_1, \ldots, n_k$ are natural numbers. Then

$$a'(x) = \sum_{j=1}^{k} n_j \cdot \frac{a(x)}{p_j(x)} \cdot p_j'(x).$$

This result is easily obtained from Theorem 9-6.2 by induction on $k$. We leave the details for the reader to supply.

THEOREM 9-6.4. Let $a(x) = c\,p_1(x)^{n_1} \cdots p_k(x)^{n_k}$, $k \geq 1$, where
(a) $c \in F$,
(b) $p_1(x), \ldots, p_k(x)$ are distinct monic, irreducible polynomials in $F[x]$,
(c) $n_1 \geq 1, \ldots, n_k \geq 1$, and
(d) $F$ has characteristic zero.
Then the monic greatest common divisor of $a(x)$ and $a'(x)$ is

$$p_1(x)^{n_1 - 1} \cdots p_k(x)^{n_k - 1}.$$

Proof. It is immediate that $p_1(x)^{n_1-1} \cdots p_k(x)^{n_k-1}$ divides $a(x) = c\,p_1(x)^{n_1} \cdots p_k(x)^{n_k}$. We next observe that $p_1(x)^{n_1-1} \cdots p_k(x)^{n_k-1}$ divides $a(x)/p_j(x)$ for $j = 1, \ldots, k$. Therefore,

$$p_1(x)^{n_1-1} \cdots p_k(x)^{n_k-1}$$

divides

$$\sum_{j=1}^{k} n_j \cdot \frac{a(x)}{p_j(x)} \cdot p_j'(x) = a'(x),$$

where we have used the formula for $a'(x)$ given by Theorem 9-6.3. Thus $p_1(x)^{n_1-1} \cdots p_k(x)^{n_k-1}$ is a common divisor of $a(x)$ and $a'(x)$. To complete the proof, we must show that every common divisor of $a(x)$ and $a'(x)$ divides $p_1(x)^{n_1-1} \cdots p_k(x)^{n_k-1}$.

Let $f(x)$ be a common divisor of $a(x)$ and $a'(x)$. Then since $f(x) \mid a(x)$, it follows that

$$f(x) = d\,p_1(x)^{m_1} \cdots p_k(x)^{m_k},$$

where $d \in F$ and $m_1 \leq n_1, \ldots, m_k \leq n_k$. It is now sufficient to show that $m_1 \neq n_1, \ldots, m_k \neq n_k$, for in this case $m_1 \leq n_1 - 1, \ldots, m_k \leq n_k - 1$, so that $f(x)$ divides $p_1(x)^{n_1-1} \cdots p_k(x)^{n_k-1}$. Assume that $m_1 = n_1$. Then $f(x) \mid a'(x)$ implies that $p_1(x)^{n_1} \mid a'(x)$. Moreover,

$$p_1(x)^{n_1} \;\Big|\; \frac{a(x)}{p_j(x)} \quad \text{for } j = 2, \ldots, k.$$

Therefore, $p_1(x)^{n_1}$ divides

$$a'(x) - \sum_{j=2}^{k} n_j \cdot \frac{a(x)}{p_j(x)} \cdot p_j'(x) = n_1 \cdot \frac{a(x)}{p_1(x)} \cdot p_1'(x) = c\,p_1(x)^{n_1-1} p_2(x)^{n_2} \cdots p_k(x)^{n_k} \cdot n_1 p_1'(x).$$

Hence, by the unique factorization theorem, and the fact that $p_1(x), \ldots, p_k(x)$ are distinct monic irreducible polynomials, it follows that $p_1(x) \mid n_1 p_1'(x)$. We now observe that $n_1 p_1'(x) \neq 0$. In fact, the leading coefficient of $n_1 p_1'(x)$ is $n_1\,\mathrm{Deg}\,[p_1(x)]$ times the identity element of $F$, which is not zero because of assumption (d). Therefore, $\mathrm{Deg}\,[n_1 p_1'(x)] = \mathrm{Deg}\,[p_1(x)] - 1$. By (9-4.1c), this contradicts $p_1(x) \mid n_1 p_1'(x)$. This contradiction was obtained by assuming that $m_1 = n_1$. Therefore, $m_1 \neq n_1$, and similarly $m_2 \neq n_2, \ldots, m_k \neq n_k$. As we remarked above, these inequalities imply the theorem.

A special case of Theorem 9-6.4 is worth emphasizing.

THEOREM 9-6.5. If $p(x)$ is an irreducible polynomial in $F[x]$, where $F$ is a field of characteristic zero, then

$$(p(x), p'(x)) = 1.$$

Theorem 9-6.4 is useful for factoring certain polynomials, because the derivative, $a'(x)$, and the greatest common divisor, $(a(x), a'(x))$, can both be effectively calculated in $F[x]$.

EXAMPLE 3. We wish to factor

$$a(x) = x^9 + 4x^8 - 16x^6 - 16x^5 + 2x^4 + 13x^3 + 30x^2 + 28x + 8$$

completely in $Q[x]$. The derivative of $a(x)$ is

$$a'(x) = 9x^8 + 32x^7 - 96x^5 - 80x^4 + 8x^3 + 39x^2 + 60x + 28.$$

Denote $d(x) = (a(x), a'(x))$. The Euclidean algorithm yields

$$d(x) = x^4 + 3x^3 - x^2 - 8x - 4.$$

Then $d'(x) = 4x^3 + 9x^2 - 2x - 8$, and

$$(d(x), d'(x)) = x + 2.$$

By Theorem 9-6.4, we know that $(x + 2)^2 \mid d(x)$. Carrying out the division,

$$d(x) = (x + 2)^2 (x^2 - x - 1).$$

The polynomial $x^2 - x - 1$ is irreducible in $Q[x]$ (see Problem 3, Section 9-5). Again by Theorem 9-6.4, $(x + 2)^3 (x^2 - x - 1)^2$ divides $a(x)$. Dividing, we obtain

$$a(x) = (x + 2)^3 (x^2 - x - 1)^2 (x^2 + 1).$$

This is the complete factorization of $a(x)$ in $Q[x]$.
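The two greatest common divisors used in Example 3 can be checked by carrying out the Euclidean algorithm with exact rational arithmetic. The sketch below is illustrative only: the helper names (`trim`, `derivative`, `poly_mod`, `monic_gcd`) and the lowest-degree-first coefficient lists are assumptions of this sketch, not notation from the text.

```python
from fractions import Fraction


def trim(poly):
    """Remove zero leading coefficients (highest degree is stored last)."""
    while poly and poly[-1] == 0:
        poly.pop()
    return poly


def derivative(poly):
    """Formal derivative; poly[i] is the coefficient of x**i."""
    return trim([i * c for i, c in enumerate(poly)][1:])


def poly_mod(a, b):
    """Remainder of a(x) on division by a nonzero b(x) in Q[x]."""
    r = trim([Fraction(c) for c in a])
    while len(r) >= len(b):
        factor = r[-1] / b[-1]          # cancels the leading term of r
        shift = len(r) - len(b)
        for i, c in enumerate(b):
            r[shift + i] -= factor * c
        trim(r)
    return r


def monic_gcd(a, b):
    """Monic greatest common divisor via the Euclidean algorithm."""
    a = trim([Fraction(c) for c in a])
    b = trim([Fraction(c) for c in b])
    while b:
        a, b = b, poly_mod(a, b)
    return [c / a[-1] for c in a]


# a(x) from Example 3, lowest degree first
a = [8, 28, 30, 13, 2, -16, -16, 0, 4, 1]
d = monic_gcd(a, derivative(a))
print([int(c) for c in d])                          # d(x) = (a(x), a'(x))
print([int(c) for c in monic_gcd(d, derivative(d))])  # (d(x), d'(x))
```

Exact fractions are used instead of floating-point numbers so that the test "remainder equals zero" in the Euclidean algorithm is reliable.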

1. Find tlie derivatives of thc folloving polynomials. 3x4 Ex2 - x 6, in Q[x] (a) 5x5 (b) x4 2/'2 x2 1, in R[x] (c) x " ix3 (2 3i)x2 4 3 x i, in C[x] (d) xn - 1, in Q[x] (e) xn - 1, in Zp[x] (f) xp+l 1, in Zp[x]

+ +

+ + + +

+ +

2. Find the successive derivatives af'(x), aC3)(x), u ( ~ ) ( x )., . . ior thc polynomials given in Problem 1. In cach case, find tlie smallcst natural number nz such t h a t a(")(x) = 0. 3. Prove t h a t for any nonzcro polynomial a(x) E F[x] there is a natural numbcr m 5 Deg [a(x)] such t h a t a(")(x) = O for al1 n > m. Prove t h a t if thc z = Dcg [a(x)]. Wliat can ??z be if the characcharacteristic of P is zcro, thcn n teristic of F is a prime p? 4. Complete thc dctails of the proof of Theorcn~ 9-6.2(b). 5. l'rove Thcorem 9-6.3. 6. Tlse the metliod of Examplc 3 to factor thc follo~vingpolynomials con-iyletcly in the indicated I+'[x]. (a) x5 4x4 7 ~ ' 8~ x 9 35 2, in Q[x] (b) x G + 6 x + 11x4 12x3 19x2+ 6 x + 9, inQ[x] (c) x3 ix2 x i, in C[x] (d) x4 - 15x2 - 28x - 12, in R[x] (e) x3 (22/ 43)x2 (2 2 4 6 ) ~ 2 4 3 , in R[x] (f) x4 x3 x 1, in Q[x]

+ + + + + + + + + + + + + + + + + + + + +

7. Cse Theorem 9-6.5 to show t h a t the following pol~nomials are not irrcducible in Q[x]. (a) x4 2x3 3x2 2x 1 (b) 4x3 16x2+ 21x+ 9 (c) x 6 + x4 - x2 - 1
8. Show that Theorem 9-6.4 is correct if the assumption that the characteristic of $F$ is zero is replaced by the condition that the characteristic of $F$ is a prime which is larger than $\mathrm{Deg}\,[a(x)]$. Give an example which shows that Theorem 9-6.4 may fail if the assumption (d) is omitted entirely.


9. A nonzcro polynomial a ( x ) in F[x]is said to have a muEtiple3factor if there cxists a polynomial b(x) E F [ x ] ,of positive degree, such that b ( ~ ) ~ ( a ( Prove x). t h a t if F is a field of characteristic zero, then a polynomial a ( x ) E F[x] has a multiplc factor if and only if a ( x ) and a'($) are not relatively prime. 10. Use the result of Problem 9 to prove that the following polynomials have no multiple factors in Q[x]. (a) x4 x3 x2 x 1 22 - 1 ( b ) x3 (e) xn - 1 (d) " x 3x2 2x - 4
Find the condition on a and b in ordcr that the given polynomial have a multiple factor.

+ + + + + + 1 l . Let x3 + ax + b be a polynomial with rational coefficients.

9-7 The roots of a polynomial. We now return to our study of the solutions of algebraic equations. The work of the last five sections makes it possible to discuss this subject more critically than we did in Section 9-1.

DEFINITION 9-7.1. Let $D$ be an integral domain, and let $A$ be a commutative ring which contains $D$ as a subring. If

$$a(x) = a_0 + a_1 x + \cdots + a_n x^n \in D[x]$$

and $u \in A$, then the element $a_0 + a_1 u + \cdots + a_n u^n \in A$ is called the value of $a(x)$ for $x = u$, and is denoted by $a(u)$. The element $a(u)$ is said to be obtained by substituting $u$ for $x$ in $a(x)$.

Since the representation $a(x) = a_0 + a_1 x + \cdots + a_n x^n$ is unique (by Definition 9-2.1), it follows that $a(u)$ is uniquely defined by Definition 9-7.1.
EXAMPLE 1. The polynomial $a(x) = x^3 + 2x + 1$ has coefficients in $Z$. Suppose that $A = Z[x]$. Then if $u = x - 1$,

$$a(u) = a(x - 1) = (x - 1)^3 + 2(x - 1) + 1 = x^3 - 3x^2 + 5x - 2.$$

If $A = Q$ and $u = \tfrac{1}{2}$, then $a(u) = a(\tfrac{1}{2}) = (\tfrac{1}{2})^3 + 2(\tfrac{1}{2}) + 1 = \tfrac{17}{8}$.

The substitution process has some elementary properties which are useful.

(9-7.2). Let $D$ be an integral domain which is a subring of a commutative ring $A$. Let $f(x)$, $a(x)$, and $b(x)$ be in $D[x]$. Suppose that $u \in A$.
(a) If $f(x) = a(x) + b(x)$, then $f(u) = a(u) + b(u)$.
(b) If $f(x) = a(x) \cdot b(x)$, then $f(u) = a(u) \cdot b(u)$.
(c) If $f(x)$ is a constant $d$ in $D$, then $f(u) = d$.
(d) If $f(x) = a(b(x))$, that is, $f(x)$ is the polynomial obtained by substituting $b(x)$ for $x$ in $a(x)$, then $f(u) = a(b(u))$.

Let us prove (b). Suppose that $a(x) = \sum_{i=0}^{m} a_i x^i$ and $b(x) = \sum_{j=0}^{n} b_j x^j$. Then

$$f(x) = a(x) \cdot b(x) = \sum_{k=0}^{m+n} \Big( \sum_{i+j=k} a_i b_j \Big) x^k.$$

Therefore,

$$f(u) = \sum_{k=0}^{m+n} \Big( \sum_{i+j=k} a_i b_j \Big) u^k = \Big( \sum_{i=0}^{m} a_i u^i \Big) \Big( \sum_{j=0}^{n} b_j u^j \Big) = a(u) \cdot b(u).$$

The property (d) is obtained from (a), (b), and (c) by induction on the degree of $a(x)$.

DEFINITION 9-7.3. Let $D$ be an integral domain, and let $A$ be a commutative ring containing $D$ as a subring. Let $a(x) \in D[x]$. An element $c$ in $A$ is called a root of $a(x)$ [or a zero of $a(x)$] in $A$ if $a(c) = 0$.

The problem of finding the roots of the polynomial $a(x)$ in $A$ is exactly the same as the problem of solving the equation $a(x) = 0$ in $A$.

We now restrict our attention to polynomials with coefficients in a field $F$. The results of Sections 9-3, 9-4, 9-5, and 9-6 (for example, the division algorithm, the properties of greatest common divisors, and the unique factorization theorem) can be used to obtain important information about the roots in $F$ of polynomials in $F[x]$. Since many of the theorems proved in these sections do not apply to polynomials with coefficients in an integral domain, this restriction is essential.

THEOREM 9-7.4. Remainder theorem. Let $F$ be a field. If $a(x) \in F[x]$ and $c \in F$, then $a(c)$ is the remainder obtained on dividing $a(x)$ by $x - c$. That is, there is a unique polynomial $q(x) \in F[x]$ such that

$$a(x) = q(x)(x - c) + a(c).$$

Proof. By the division algorithm,

$$a(x) = q(x)(x - c) + r(x),$$

where either $r(x) = 0$, or the degree of $r(x)$ is less than the degree of $x - c$. Since $\mathrm{Deg}\,[x - c] = 1$, it follows in either case that $r(x)$ is a constant $d \in F$. By (9-7.2), we obtain

$$a(c) = q(c)(c - c) + d = q(c) \cdot 0 + d = d.$$

THEOREM 9-7.5. Factor theorem. An element $c$ in the field $F$ is a root of the polynomial $a(x) \in F[x]$ if and only if $x - c$ is a factor of $a(x)$ in $F[x]$.

Proof. By Theorem 9-7.4, the remainder obtained on dividing $a(x)$ by $x - c$ is $a(c)$. Therefore, $x - c$ divides $a(x)$ in $F[x]$ if and only if $a(c) = 0$.

The factor theorem is often useful when one is trying to factor a polynomial.
EXAMPLE 2. By inspection, the polynomial $x^3 + x^2 + x + 1$ has $-1$ as a root. Thus, by Theorem 9-7.5, $x - (-1) = x + 1$ is a factor of $x^3 + x^2 + x + 1$. Dividing, we find

$$x^3 + x^2 + x + 1 = (x^2 + 1)(x + 1).$$

The polynomial $x^2 + 1$ is irreducible in $Q[x]$ and $R[x]$, because otherwise it would have a real root, by Theorem 9-7.5. Thus, $(x^2 + 1)(x + 1)$ is the complete factorization of $x^3 + x^2 + x + 1$ in $Q[x]$ and $R[x]$. However, in $C[x]$, $x^2 + 1 = (x + i)(x - i)$. Thus,

$$x^3 + x^2 + x + 1 = (x + i)(x - i)(x + 1)$$

is a complete factorization in $C[x]$.
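The remainder theorem and the factor theorem can be tried out with synthetic division, which divides by $x - c$ and produces $a(c)$ as the final value. The following is a minimal sketch under stated assumptions: the function name `divide_by_linear` and the highest-degree-first coefficient order are choices made here for illustration.

```python
def divide_by_linear(coeffs, c):
    """Divide a(x) by x - c using synthetic division (Horner's scheme).

    coeffs lists the coefficients from highest degree to lowest.
    Returns (quotient coefficients, remainder); the remainder is a(c).
    """
    values = []
    acc = 0
    for a in coeffs:
        acc = acc * c + a      # one step of Horner's scheme
        values.append(acc)
    return values[:-1], values[-1]


# x^3 + x^2 + x + 1 divided by x - (-1), as in Example 2
q, r = divide_by_linear([1, 1, 1, 1], -1)
print(q, r)   # quotient x^2 + 1, remainder 0
```

A zero remainder confirms that $x + 1$ is a factor, and the quotient list `[1, 0, 1]` is the polynomial $x^2 + 1$ found in Example 2.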

Using the factor theorem and the unique factorization theorem, we can now prove one of the most useful general theorems about the roots of polynomials.

THEOREM 9-7.6. Let $F$ be a field, and let $a(x) \in F[x]$ be a nonzero polynomial of degree $n$. Then $a(x)$ has at most $n$ distinct roots in $F$. If $c_1, c_2, \ldots, c_k$ are all of the different roots of $a(x)$ in $F$, then

$$a(x) = (x - c_1)^{m_1} (x - c_2)^{m_2} \cdots (x - c_k)^{m_k}\, b(x),$$

where $m_1, m_2, \ldots,$ and $m_k$ are natural numbers, and $b(x)$ is a nonzero polynomial in $F[x]$ which has no roots in $F$.

Proof. If $c \in F$ is a root of $a(x)$, then $x - c$ is a monic irreducible factor of $a(x)$ in $F[x]$, by Theorem 9-7.5. By the unique factorization theorem and Theorem 9-3.2(a), $a(x)$ has at most $n$ different monic irreducible factors in $F[x]$. Therefore, $a(x)$ has at most $n$ distinct roots in $F$. Let these be $c_1, c_2, \ldots, c_k$. Then $x - c_1, x - c_2, \ldots, x - c_k$ must occur among the irreducible factors in the complete factorization of $a(x)$ in $F[x]$. Thus we can write

$$a(x) = (x - c_1)^{m_1} (x - c_2)^{m_2} \cdots (x - c_k)^{m_k}\, b(x),$$

where $m_1 \geq 1, m_2 \geq 1, \ldots, m_k \geq 1$, and $b(x)$ is a product of irreducible polynomials which are different from $x - c_1, x - c_2, \ldots,$ and $x - c_k$. If $b(x)$ had a root $c$ in $F$, then $a(c) = 0$, so that $c$ would be one of $c_1, c_2, \ldots,$ or $c_k$. By the factor theorem, this would imply that $x - c_j \mid b(x)$ for some $j$. This would contradict the fact that $b(x)$ is the product of all the irreducible factors of $a(x)$ which are different from $x - c_1, x - c_2, \ldots,$ and $x - c_k$. Therefore, $b(x)$ has no root in $F$.
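A particularly useful case of Theorem 9-7.6 is the following.

THEOREM 9-7.7. If a monic polynomial $a(x) \in F[x]$ of degree $n$ has $n$ distinct roots $c_1, c_2, \ldots, c_n$ in $F$, then

$$a(x) = (x - c_1)(x - c_2) \cdots (x - c_n).$$

Proof. By Theorem 9-7.6, it is possible to write

$$a(x) = (x - c_1)^{m_1} (x - c_2)^{m_2} \cdots (x - c_n)^{m_n}\, b(x),$$

where $m_1, m_2, \ldots, m_n$ are greater than zero. Taking the degrees on both sides, we obtain from Theorem 9-3.2(a),

$$\begin{aligned}
n = \mathrm{Deg}\,[a(x)] &= \mathrm{Deg}\,[(x - c_1)^{m_1}] + \mathrm{Deg}\,[(x - c_2)^{m_2}] + \cdots + \mathrm{Deg}\,[(x - c_n)^{m_n}] + \mathrm{Deg}\,[b(x)] \\
&= m_1 + m_2 + \cdots + m_n + \mathrm{Deg}\,[b(x)].
\end{aligned}$$

Since $m_1, m_2, \ldots, m_n$ are natural numbers, this equality implies that $m_1 = m_2 = \cdots = m_n = 1$, and $\mathrm{Deg}\,[b(x)] = 0$. Hence, $b(x)$ is a nonzero constant, and since $a(x)$ is monic this constant must be 1. Thus,

$$a(x) = (x - c_1)(x - c_2) \cdots (x - c_n).$$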

A root $c \in F$ of the polynomial $a(x) \in F[x]$ is said to have multiplicity $m$, or to be an $m$-fold root of $a(x)$, if $(x - c)^m$ divides $a(x)$, but $(x - c)^{m+1}$ does not divide $a(x)$ in $F[x]$. Thus, $c$ is a root of multiplicity $m$ of $a(x)$ if

$$a(x) = (x - c)^m\, b(x),$$

where $b(x) \in F[x]$ is a polynomial such that $b(c) \neq 0$. Roots of multiplicity one are usually called simple roots. Roots of multiplicity two or more are called multiple roots. If the field $F$ has characteristic zero, then it follows from Theorem 9-6.4 that a root of multiplicity $m > 1$ of $a(x)$ is a root of multiplicity $m - 1$ of $a'(x)$, and a simple root of $a(x)$ is not a root of $a'(x)$.
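EXAMPLE 3. Let us find the roots in $C$ of $x^7 + 2x^6 + 3x^5 + 2x^4 + x^3$, along with the multiplicities of each root. We have

$$x^7 + 2x^6 + 3x^5 + 2x^4 + x^3 = x^3 (x^4 + 2x^3 + 3x^2 + 2x + 1) = x^3 (x^2 + x + 1)^2.$$

The roots of $x^2 + x + 1$ can be obtained using Theorem 8-2.7. They are

$$-\tfrac{1}{2} + i(\sqrt{3}/2) \quad \text{and} \quad -\tfrac{1}{2} - i(\sqrt{3}/2).$$

Therefore,

$$x^7 + 2x^6 + 3x^5 + 2x^4 + x^3 = x^3 \big[x - \big(-\tfrac{1}{2} + i(\sqrt{3}/2)\big)\big]^2 \big[x - \big(-\tfrac{1}{2} - i(\sqrt{3}/2)\big)\big]^2,$$

and the desired roots are $0$, $-\tfrac{1}{2} + i(\sqrt{3}/2)$, and $-\tfrac{1}{2} - i(\sqrt{3}/2)$, with multiplicities 3, 2, and 2, respectively.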
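The definition of multiplicity suggests a direct computation: divide by $x - c$ repeatedly until a nonzero remainder appears. The sketch below assumes integer (or otherwise exact) coefficients and an exact root $c$; the name `multiplicity` and the highest-degree-first coefficient order are illustrative assumptions. It is not suitable as written for the irrational complex roots of Example 3, since floating-point remainders would not be exactly zero.

```python
def multiplicity(coeffs, c):
    """Multiplicity of c as a root of a nonzero a(x); 0 if c is not a root.

    coeffs lists the coefficients from highest degree to lowest.
    Arithmetic is assumed to be exact (for example, integers).
    """
    m = 0
    while len(coeffs) > 1:
        # Synthetic division of the current polynomial by x - c.
        values, acc = [], 0
        for a in coeffs:
            acc = acc * c + a
            values.append(acc)
        if values[-1] != 0:      # nonzero remainder: x - c no longer divides
            break
        coeffs = values[:-1]     # continue with the quotient
        m += 1
    return m


# x^7 + 2x^6 + 3x^5 + 2x^4 + x^3 from Example 3
a = [1, 2, 3, 2, 1, 0, 0, 0]
print(multiplicity(a, 0))   # multiplicity of the root 0
```

For the root $0$ the loop strips one factor of $x$ per pass and stops when the constant term of the quotient is nonzero.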

Theorem 9-7.6 has a useful application in the theory of numbers.

THEOREM 9-7.8. Let $p$ be a prime number. Suppose that

$$a(x) = a_0 + a_1 x + \cdots + a_n x^n$$

is a polynomial with integral coefficients, such that $a_n \not\equiv 0 \pmod p$. Then there are at most $n$ integers $d$ which are incongruent modulo $p$, and satisfy $a(d) \equiv 0 \pmod p$.

If $a(x) = a_0 + a_1 x + \cdots + a_n x^n$ and $b(x) = b_0 + b_1 x + \cdots + b_n x^n$ are polynomials with integral coefficients, it is customary to write $a(x) \equiv b(x) \pmod p$ if

$$a_0 \equiv b_0 \pmod p, \quad a_1 \equiv b_1 \pmod p, \quad \ldots, \quad \text{and} \quad a_n \equiv b_n \pmod p.$$

It is clear from Theorem 5-6.3, which gives the properties of congruence, that $a(x) \equiv b(x) \pmod p$ implies $a(d) \equiv b(d) \pmod p$ for any integer $d$, and $d \equiv e \pmod p$ implies $a(d) \equiv a(e) \pmod p$ for any $a(x) \in Z[x]$. Because of these two observations, the study of the congruence modulo $p$ of polynomials in $Z[x]$ is equivalent to the study of polynomials with coefficients in the field $Z_p$. Theorem 9-7.8 is simply a reinterpretation of Theorem 9-7.6 from this new viewpoint. So that the reader can get a better understanding of the method of translating theorems about the field $Z_p$ into statements about congruences modulo $p$, we will give the proof of Theorem 9-7.8 in full detail.

Suppose that $d_1, d_2, \ldots, d_k$ are integers such that

$$d_i \not\equiv d_j \pmod p \quad \text{for } i \neq j,$$

and $a(d_i) \equiv 0 \pmod p$ for all $i$. We must show that $k \leq n$. Let $b_0, b_1, \ldots, b_n$ be the remainders obtained on dividing $a_0, a_1, \ldots, a_n$, respectively, by $p$. Let $e_1, e_2, \ldots, e_k$ be the remainders on dividing $d_1, d_2, \ldots, d_k$ by $p$. That is,

$$a_j \equiv b_j \pmod p, \quad 0 \leq b_j < p, \qquad \text{and} \qquad d_i \equiv e_i \pmod p, \quad 0 \leq e_i < p,$$

for $0 \leq j \leq n$ and $1 \leq i \leq k$. Let

$$b(x) = b_0 + b_1 x + \cdots + b_n x^n.$$

Then $a(x) \equiv b(x) \pmod p$, and $b(e_i) \equiv 0 \pmod p$. The integers $b_0, b_1, \ldots, b_n$, and $e_1, e_2, \ldots, e_k$ can be regarded as elements of the field $Z_p$. Thus, $b(x)$ can be considered as a polynomial with coefficients in $Z_p$. Since $a_n \not\equiv 0 \pmod p$, the leading coefficient $b_n$ of $b(x)$ is not zero. Therefore, $\mathrm{Deg}\,[b(x)] = n$.

Note that the addition and multiplication operations of $Z_p$ are different from the operations in $Z$, so that the result of substituting $e_i$ into $b(x)$ when $e_i$ is thought of as an element of $Z_p$, and $b(x)$ is considered as belonging to $Z_p[x]$, will be different from the result obtained when $e_i$ is taken as an integer, and $b(x)$ as a polynomial with integral coefficients. In $Z_p$ we have

$$b(e_i) = b_0 \oplus (b_1 \odot e_i) \oplus (b_2 \odot e_i^2) \oplus \cdots \oplus (b_n \odot e_i^n),$$

where $\oplus$ and $\odot$ denote the sum and product formed in $Z_p$, whereas in $Z$,

$$b(e_i) = b_0 + b_1 e_i + b_2 e_i^2 + \cdots + b_n e_i^n.$$

However, by Theorem 6-3.4,

$$b_0 \oplus (b_1 \odot e_i) \oplus \cdots \oplus (b_n \odot e_i^n) \equiv b_0 + b_1 e_i + b_2 e_i^2 + \cdots + b_n e_i^n \equiv 0 \pmod p.$$

Thus, in $Z_p$, $b(e_i) = 0$. That is, $e_i$ is a root of $b(x)$ in $Z_p$. Since

$$d_i \not\equiv d_j \pmod p \quad \text{for } i \neq j,$$

it follows that $e_1, e_2, \ldots, e_k$ are distinct elements of $Z_p$. Therefore,

$$k \leq \mathrm{Deg}\,[b(x)] = n,$$

by Theorem 9-7.6.

EXAMPLE 4. We illustrate the proof of Theorem 9-7.8 by an example. Let $a(x) = x^3 - x^2 + x + 9$. Then

$$a(x) \equiv b(x) = x^3 + 4x^2 + x + 4 \pmod 5.$$

Considered as a polynomial with integral coefficients,

$$b(1) = 10, \quad b(2) = 30, \quad b(3) = 70.$$

However, if $b(x)$ is thought of as an element of $Z_5[x]$,

$$b(1) = 0, \quad b(2) = 0, \quad b(3) = 0.$$

Thus, $b(x)$ has roots 1, 2, and 3 in $Z_5$. Since $Z_5$ is a field and $\mathrm{Deg}\,[b(x)] = 3$, the polynomial $b(x)$ cannot have more than three roots. Returning to the original polynomial $a(x)$, we see that

$$a(1) \equiv 0 \pmod 5, \quad a(2) \equiv 0 \pmod 5, \quad a(3) \equiv 0 \pmod 5,$$

and if $d$ is any integer such that $a(d) \equiv 0 \pmod 5$, then either $d \equiv 1 \pmod 5$, $d \equiv 2 \pmod 5$, or else $d \equiv 3 \pmod 5$.
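Since only the residues $0, 1, \ldots, p - 1$ need to be tested, the solutions of a polynomial congruence modulo a prime can be found by direct search. The following sketch (the function name `roots_mod_p` and the highest-degree-first coefficient order are assumptions made for illustration) reproduces the roots found in Example 4.

```python
def roots_mod_p(coeffs, p):
    """Residues d in {0, ..., p-1} with a(d) congruent to 0 modulo p.

    coeffs lists the integer coefficients from highest degree to lowest.
    """
    roots = []
    for d in range(p):
        value = 0
        for a in coeffs:
            value = (value * d + a) % p   # Horner's scheme, reduced mod p
        if value == 0:
            roots.append(d)
    return roots


# a(x) = x^3 - x^2 + x + 9 from Example 4, with p = 5
print(roots_mod_p([1, -1, 1, 9], 5))
```

Applying the same function to the reduced polynomial $b(x) = x^3 + 4x^2 + x + 4$ gives the same list, which illustrates that congruent polynomials have the same roots modulo $p$.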

Although at first glance it seems somewhat trivial, Theorem 9-7.8 is a powerful tool in number theory. To support this statement, we digress from our study of the theory of equations and use Theorem 9-7.8 to prove the fact, mentioned in Section 5-8, that if $p$ is a prime, then there are $\varphi(\varphi(p)) = \varphi(p - 1)$ primitive roots modulo $p$ among the numbers $1, 2, \ldots, p - 1$. The reader who is not familiar with the material in Sections 1-6 and 5-8 can pass on to the next section.

Recall that if $a$ is an integer prime to $p$, then the order of $a$ modulo $p$ is the smallest natural number $d$ such that $a^d \equiv 1 \pmod p$. By Theorem 5-8.9, the order $d$ of $a$ modulo $p$ is a divisor of $p - 1$, and $a$ is called a primitive root modulo $p$ if its order is $p - 1$. The desired result is a special case of the following theorem.

THEOREM 9-7.9. Let $p$ be a prime. Suppose that $d \mid p - 1$. Then among the numbers $1, 2, \ldots, p - 1$, there are exactly $\varphi(d)$ integers which have order $d$ modulo $p$.

The proof is carried out in three stages. Only the first step uses Theorem 9-7.8.

(1) Among the integers of the set $S = \{1, 2, \ldots, p - 1\}$ there are exactly $d$ which satisfy $x^d - 1 \equiv 0 \pmod p$.

Proof. Since $d \mid p - 1$, we have

$$x^{p-1} - 1 = (x^d - 1)\,e(x),$$

where $e(x) = 1 + x^d + x^{2d} + \cdots + x^{kd}$, with $k = [(p - 1)/d] - 1$. By Fermat's theorem,

$$x^{p-1} - 1 \equiv 0 \pmod p$$

has $p - 1$ solutions in $S$. By Theorem 9-7.8,

$$e(x) \equiv 0 \pmod p$$

can have at most $kd = p - 1 - d$ solutions in $S$. Therefore,

$$x^d - 1 \equiv 0 \pmod p$$

must have at least $d$ solutions in $S$. On the other hand, by Theorem 9-7.8, there can be at most $d$ solutions of $x^d - 1 \equiv 0 \pmod p$ in the set $S$.

(2) To obtain Theorem 9-7.9 from the result (1), we will use induction on $d$. To carry out this induction, an important identity is needed:

$$\sum_{e \mid d} \varphi(e) = d,$$

that is, the sum of $\varphi(e)$ over all natural numbers $e$ which divide $d$ (including 1 and $d$) is exactly equal to $d$.

Proof. Let $T = \{1, 2, \ldots, d\}$. For each divisor $e$ of $d$, define $T_e = \{k \in T \mid (d, k) = e\}$. Then each number $k \in T$ belongs to exactly one of the sets $T_e$ with $e \mid d$, that is, $T$ is the union of the pairwise disjoint collection $\{T_e \mid e \text{ divides } d\}$. Hence, by Theorem 1-6.4,

$$d = |T| = \sum_{e \mid d} |T_e|.$$

In order to determine $|T_e|$, the number of elements in $T_e$, note that $k$ belongs to $T_e$ if and only if $(d, k) = e$, and that $(d, k) = e$ is equivalent to $e \mid k$ and $(d/e, k/e) = 1$. Hence, there is a one-to-one correspondence between $T_e$ and the set $\{m \in Z \mid 1 \leq m \leq d/e,\ (d/e, m) = 1\}$, given by

$$k \to k/e.$$

Therefore, $|T_e| = |\{m \in Z \mid 1 \leq m \leq d/e,\ (d/e, m) = 1\}| = \varphi(d/e)$, by the definition of the totient, Definition 5-8.5. Consequently,

$$d = \sum_{e \mid d} \varphi(d/e).$$

As $e$ ranges over the divisors of $d$, so does $d/e$, in reverse order. Hence,

$$\sum_{e \mid d} \varphi(e) = \sum_{e \mid d} \varphi(d/e) = d.$$
(3) We can now prove Theorem 9-7.9. There is exactly one natural number $a$ in the set $S = \{1, 2, \ldots, p - 1\}$ which has order 1 modulo $p$, namely, $a = 1$. Hence, the theorem is true for $d = 1$. We can therefore make the induction hypothesis that if $e \mid p - 1$ and $e < d$, then there are exactly $\varphi(e)$ integers in $S$ which have order $e$ modulo $p$. For each divisor $e$ of $d$, define $S_e = \{a \in S \mid a \text{ has order } e \text{ modulo } p\}$. It is obvious that the collection $\{S_e \mid e \text{ divides } d\}$ is pairwise disjoint. By Theorem 5-8.9,

$$\bigcup \{S_e \mid e \text{ divides } d\} = \{a \in S \mid a^d - 1 \equiv 0 \pmod p\}.$$

Hence, by (1),

$$d = \sum_{e \mid d} |S_e| = |S_d| + \sum_{e \mid d,\ e < d} |S_e|.$$

By the induction hypothesis, $|S_e| = \varphi(e)$ if $e \mid d$ and $e < d$. Therefore, using (2), we have

$$d = |S_d| + \sum_{e \mid d,\ e < d} \varphi(e) = |S_d| + (d - \varphi(d)).$$

Consequently,

$$|S_d| = \varphi(d).$$

This completes the induction, and proves the theorem.
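Theorem 9-7.9 and the identity used in stage (2) can be checked by brute force for a small prime. The following is only an illustrative sketch: the function names and the choice $p = 13$ are assumptions made here, and `order_mod_p` presupposes that $a$ is prime to $p$ (otherwise its loop would not terminate).

```python
from math import gcd


def totient(n):
    """Euler's totient: how many m in 1..n are relatively prime to n."""
    return sum(1 for m in range(1, n + 1) if gcd(n, m) == 1)


def order_mod_p(a, p):
    """Smallest natural number d with a**d congruent to 1 modulo p.

    Assumes a is prime to p, so that such a d exists.
    """
    d, power = 1, a % p
    while power != 1:
        power = (power * a) % p
        d += 1
    return d


def order_counts(p):
    """Map each order d to the number of a in {1, ..., p-1} of that order."""
    counts = {}
    for a in range(1, p):
        d = order_mod_p(a, p)
        counts[d] = counts.get(d, 0) + 1
    return counts


p = 13
counts = order_counts(p)
for d in sorted(counts):
    # Each line shows d, the number of elements of order d, and the totient of d.
    print(d, counts[d], totient(d))
```

The orders that occur are exactly the divisors of $p - 1$, and the count for the divisor $p - 1$ itself is the number of primitive roots modulo $p$.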

1. FVithout actual division, find the remainder when (a) x3 22 - 4 is divided by x - 1 ; (b) x25 14x1 24 is divided by x 1 ; (c) x5 12x4 13x2 x 27 is divided by x 3.

+ +

2. Completely factor the following polynomials in C[x]. (a) x2 ix 2 (b) xs - 1 x2 1 (c) 2 4 (d) X~ - 2 ~ 3 5 ~2 zx 24 (e) x3 - 2 (f) x3 - 5x2 - 9 x + 12

+ + + +

3. Let $a(x) = x^n - 1$. Find $a(u)$ when $u$ has the following values.
(a) $u = -1$ (b) $u = i$ (c) $u = x + 1$ (d) $u = x^n - 1$

4. Find all monic fifth-degree polynomials $f(x)$ in $C[x]$ such that
(a) $f(x)$ has $i$ as a root of multiplicity four;
(b) $f(x)$ has 0, 1, 2, and 3 as simple roots;
(c) $f(x)$ has 1 and $i$ as roots of multiplicity two;
(d) $f(x)$ has $i$ and $-i$ as simple roots and $-1$ as a root of multiplicity two.

5. Show that the sum of the multiplicities of the roots of a polynomial $a(x) \in F[x]$ is less than or equal to the degree of $a(x)$.

6. Show that if $a(x)$ and $b(x)$ are polynomials of degree less than $n$ in $F[x]$, and if $a(d_i) = b(d_i)$ for $i = 1, 2, \ldots, n$, where $d_1, d_2, \ldots, d_n$ are distinct elements of $F$, then $a(x) = b(x)$ in $F[x]$.

7. Let $a(x) = x^p - x + 1 \in Z_p[x]$. Prove that $a(d) = 1$ for all $d \in Z_p$.

8. Prove Taylor's theorem: If $f(x)$ is a polynomial of degree $n$ in $F[x]$, where $F$ has characteristic zero, and if $c$ is any element of $F$, then

$$f(x) = f(c) + f'(c)(x - c) + \frac{f''(c)}{2!}(x - c)^2 + \cdots + \frac{f^{(n)}(c)}{n!}(x - c)^n.$$

[Hint: Use Theorem 9-3.5 and Example 2, Section 9-6.]

9. State which of the postulates of ring theory are used in the proof of (9-7.2b). Prove (9-7.2c).

10. Prove that the derivative of $a(b(x))$ is $a'(b(x)) \cdot b'(x)$.

11. Show that if $f(x) = ax^2 + bx + c \in F[x]$, $a \neq 0$, then either (a) $f(x)$ has two distinct roots in $F$, (b) $f(x)$ has one root of multiplicity two in $F$, or (c) $f(x)$ is irreducible in $F[x]$.

12. Show that in $Z_p[x]$ there are exactly $\tfrac{1}{2}p(p - 1)$ polynomials of the form $x^2 + ax + b$ which are irreducible. [Hint: Show that there are $\tfrac{1}{2}p(p + 1)$ polynomials of this form which are reducible.]

13. Let $a(x)$ be a monic polynomial of degree $n$ in $Z[x]$. Suppose that $p$ is a prime. Assume that $d_1, d_2, \ldots, d_n$ are integers such that $d_i \not\equiv d_j \pmod p$ if $i \neq j$, and $a(d_i) \equiv 0 \pmod p$ for all $i$. Prove that

$$a(x) \equiv (x - d_1)(x - d_2) \cdots (x - d_n) \pmod p.$$

14. Use Problem 13 and Fermat's theorem to show that if $p$ is a prime, then

$$x^{p-1} - 1 \equiv (x - 1)(x - 2) \cdots [x - (p - 1)] \pmod p.$$

From this identity, deduce Wilson's theorem: $(p - 1)! \equiv -1 \pmod p$.

9-8 The fundamental theorem of algebra. We come now to what is probably the most important result in the theory of equations.

THEOREM 9-8.1. The fundamental theorem of algebra. If $f(x) \in C[x]$ is a nonzero polynomial with $\mathrm{Deg}\,[f(x)] \geq 1$, then $f(x)$ has at least one root in $C$.

This theorem was surmised as early as the sixteenth century. Several incorrect proofs of it were published before a satisfactory proof was found by Gauss in 1797. Gauss ultimately gave five different proofs of the fundamental theorem of algebra, each of which introduced new ideas and methods which have greatly influenced the development of mathematics. Of course, many other proofs of this theorem have been discovered since Gauss's time. Unfortunately, all of the known paths from elementary mathematical principles to Theorem 9-8.1 are quite long. We will not try to give a proof in this section. The reader who is interested in seeing a complete and correct proof can study Appendix 3 of this book, after he has read the remainder of this chapter.

It is possible for us to give a geometrical argument which shows that the fundamental theorem of algebra is plausible. Let $f(x) = a_n x^n + a_{n-1}x^{n-1} + \cdots + a_0 \in C[x]$, where $a_n \neq 0$ and $n \geq 1$. Since every root of

$$\frac{1}{a_n} f(x) = x^n + \frac{a_{n-1}}{a_n} x^{n-1} + \cdots + \frac{a_0}{a_n}$$

is also a root of $f(x)$, we can assume that $a_n = 1$. If $a_0 = 0$, then $x = 0$ is a root of $f(x)$. Therefore, assume that $a_0 \neq 0$, that is,

$$f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0, \qquad a_0 \neq 0.$$

If a complex number $z$ is substituted for $x$ in $f(x)$, then we obtain a complex number $f(z)$. We interpret the numbers $z$ and $f(z)$ as points in the complex plane. As $z$ ranges over a circle of radius $r$ with center at the origin $O$ of the complex plane, the corresponding point $f(z)$ describes a closed curve $C_r$. Figure 9-1 shows several of these curves, including $C_{1/4}$, $C_1$, and $C_{3/2}$, for the polynomial $f(x) = x^2 - x + i$. If $r = 0$, then $C_r$ is not a curve, but instead it is the point $a_0$, and for small positive values of $r$, $C_r$ lies very close to this point. In particular, for sufficiently small values of $r$, $C_r$ does not enclose the origin of the complex plane, because $a_0 \neq 0$.

If $r$ is very large, the curve $C_r$ is approximated by the curve $C_r'$ corresponding to the polynomial $x^n$, since for values of $z$ which have large absolute value, the term $z^n$ in $f(z)$ dominates the sum $a_{n-1}z^{n-1} + \cdots + a_1 z + a_0$ of the remaining terms. If $z = r(\cos\theta + i\sin\theta)$, then $z^n = r^n(\cos n\theta + i\sin n\theta)$ (see Section 8-4). Thus, $C_r'$ is a circle of radius $r^n$ which is traversed $n$ times as $z$ circles the origin once. From this observation, it follows that for large $r$, $C_r$ is a curve which


encircles the origin of the complex plane $n$ times and lies relatively close to the circle with center $O$ and radius $r^n$.

As $r$ increases from small to large values, $C_r$ is deformed from a curve which does not enclose the origin into one which encircles the origin $n$ times. The reader should try to visualize this deformation process in Fig. 9-1. It is geometrically evident that at some stage of this deformation process, the corresponding curve must pass through $O$. That is, there exists an $r > 0$ such that $C_r$ passes through $O$. By definition of $C_r$, this means that for some complex number $z$ with $|z| = r$, the value of $f(x)$ for $x = z$ is 0. Thus, $z$ is the desired root of $f(x)$.

It is possible to make this intuitive argument into a valid proof of the fundamental theorem of algebra by giving exact definitions of the geometrical concept of a curve, of the deformation of one curve into another, and of the idea of a curve enclosing a point. In addition, it is necessary to establish some properties of these notions which seem obvious, but turn out to be very difficult to prove. To carry out this program would require a fairly deep penetration into the field of geometry which mathematicians call topology. Since our main interest in this book is algebra, we will not pursue this topic.

We now examine some of the consequences of the fundamental theorem of algebra.

THEOREM 9-8.2. The irreducible polynomials in $C[x]$ are exactly the polynomials of degree one. Hence, every polynomial $a(x) \in C[x]$ of positive degree can be written in the form

$$a(x) = b(x - c_1)(x - c_2) \cdots (x - c_n),$$

where $b$ is a nonzero complex number, $c_1, c_2, \ldots, c_n$ are all of the roots of $a(x)$ in $C$ (possibly with repetitions), and $n = \mathrm{Deg}\,[a(x)]$. This factorization of $a(x)$ is unique up to the order of the factors.

Proof. Suppose that $p(x)$ is an irreducible polynomial in $C[x]$. By Definition 9-5.1, $p(x) \neq 0$ and $\mathrm{Deg}\,[p(x)] > 0$. Therefore, by the fundamental theorem, $p(x)$ has a root $c \in C$. By the factor theorem, $x - c$ divides $p(x)$ in $C[x]$. Thus, $x - c = b \cdot p(x)$ for some $b \neq 0$ in $C$ (by Definition 9-5.1), so that $p(x) = b^{-1}(x - c)$ has degree one. Since polynomials of degree one are always irreducible (see Example 2, Section 9-5), this proves the first statement of Theorem 9-8.2. The second statement is a consequence of the unique factorization theorem, taking into account what we have just shown.

The reader should bear in mind that since $Z \subseteq Q \subseteq R \subseteq C$, polynomials with coefficients in $Z$, $Q$, or $R$ are polynomials in $C[x]$, and therefore they have roots in $C$. This observation leads to the characterization of the


irreducible polynomials in R[x]. First we need an important property of the complex roots of real polynomials.

THEOREM 9-8.3. Let $f(x) \in R[x] \subseteq C[x]$. If $c = a + ib$ is a complex number which is a root of f(x), then the complex conjugate $\bar{c} = a - ib$ of c is also a root of f(x).

Of course, it may happen that c itself is real, in which case $\bar{c} = c$. In this case, the theorem is trivial. To prove this theorem, let

$$f(x) = a_0 + a_1 x + \cdots + a_n x^n,$$

where $a_0, a_1, \ldots, a_n$ are real numbers. Then

$$a_0 + a_1 c + \cdots + a_n c^n = 0.$$

Taking the complex conjugate of the left-hand side of this equation, we obtain from Theorem 8-2.2

$$\bar{a}_0 + \bar{a}_1 \bar{c} + \cdots + \bar{a}_n \bar{c}^{\,n} = \bar{0} = 0.$$

Since $a_0, a_1, \ldots, a_n$ are real, it follows that $\bar{a}_0 = a_0$, $\bar{a}_1 = a_1$, ..., and $\bar{a}_n = a_n$. Therefore,

$$f(\bar{c}) = a_0 + a_1 \bar{c} + \cdots + a_n \bar{c}^{\,n} = 0.$$

Thus, $\bar{c}$ is a root of f(x).

THEOREM 9-8.4. The irreducible polynomials in R[x] are exactly the polynomials of degree one and the polynomials

$$ax^2 + bx + c$$

with a, b, and c real, and $b^2 - 4ac < 0$. Hence, every polynomial $a(x) \in R[x]$ of positive degree can be written in the form

$$a(x) = b(x - c_1)(x - c_2) \cdots (x - c_r)\, d_1(x)\, d_2(x) \cdots d_s(x),$$

where b is a nonzero real number, $c_1, c_2, \ldots, c_r$ are all of the roots of a(x) in R (possibly with repetitions), and $d_1(x), d_2(x), \ldots, d_s(x)$ are quadratic polynomials in R[x] which have no real roots.

Proof. Suppose that p(x) is an irreducible polynomial in R[x]. Then $p(x) \neq 0$ and Deg [p(x)] > 0. By Theorem 9-8.1, there is a complex number z such that p(z) = 0. If z is real, then x - z divides p(x) in R[x],


so that p(x) has degree one, as in the proof of Theorem 9-8.2. Therefore, suppose that z is not real. By Theorem 9-8.3, $\bar{z}$ is a root of p(x), and $\bar{z} \neq z$. Let

$$d(x) = (x - z)(x - \bar{z}) = x^2 - (z + \bar{z})x + z\bar{z}.$$

By Theorem 8-2.2(f), $z + \bar{z} = 2\mathcal{R}(z)$ is real, and by Theorem 8-2.4(b), $z\bar{z} = |z|^2$ is real. Therefore, $d(x) \in R[x]$. By the division algorithm, we can write

$$p(x) = q(x) \cdot d(x) + r(x),$$

where q(x) and r(x) are in R[x], and either r(x) = 0, or else Deg [r(x)] < Deg [d(x)] = 2. Since

$$r(z) = p(z) - q(z) \cdot d(z) = 0 - q(z) \cdot 0 = 0$$

and

$$r(\bar{z}) = p(\bar{z}) - q(\bar{z}) \cdot d(\bar{z}) = 0 - q(\bar{z}) \cdot 0 = 0,$$

it follows that r(x) must be the zero polynomial. Indeed, otherwise the number of roots of r(x) would exceed Deg [r(x)], which is impossible by Theorem 9-7.6. Thus, d(x) divides p(x) in R[x]. Since p(x) is irreducible,

$$p(x) = a \cdot d(x) = ax^2 + bx + c,$$

where a is some nonzero real number, and $b = -a(z + \bar{z})$, $c = a z\bar{z}$ are also real. Moreover, by Theorem 8-2.2(f),

$$b^2 - 4ac = a^2\left[(z + \bar{z})^2 - 4z\bar{z}\right] = a^2 (z - \bar{z})^2 = -4a^2\, \mathcal{I}(z)^2 < 0,$$

since $a \neq 0$ and $\mathcal{I}(z) \neq 0$ (because z is not real). This shows that every irreducible polynomial in R[x] is either linear or of the form $ax^2 + bx + c$ with $b^2 - 4ac < 0$. Conversely, all such polynomials are irreducible in R[x] (see Problem 3, Section 9-5). The last statement of Theorem 9-8.4 is a specialization of the unique factorization theorem to the ring of polynomials with real coefficients.

EXAMPLE 1. The knowledge of a single root of a polynomial often simplifies the task of finding the remaining roots. For instance, if we are given that $1 + i$ is a root of $x^4 - 4x^3 + 5x^2 - 2x - 2$, then it follows from Theorem 9-8.3 that $1 - i$ is also a root. Dividing $x^4 - 4x^3 + 5x^2 - 2x - 2$ by

$$[x - (1 + i)][x - (1 - i)] = x^2 - 2x + 2$$

gives the quotient $x^2 - 2x - 1$. By Theorem 8-2.7, the roots of $x^2 - 2x - 1$ are

$$\tfrac{1}{2}\left(2 + \sqrt{4 + 4}\right) = 1 + \sqrt{2} \quad\text{and}\quad \tfrac{1}{2}\left(2 - \sqrt{4 + 4}\right) = 1 - \sqrt{2}.$$

Therefore, all of the roots of $x^4 - 4x^3 + 5x^2 - 2x - 2$ are

$$1 + i, \quad 1 - i, \quad 1 + \sqrt{2}, \quad\text{and}\quad 1 - \sqrt{2}.$$
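The division step in Example 1 can be checked mechanically. The following is a minimal sketch, not part of the original text: the helper names `poly_divmod` and `horner` are hypothetical, and polynomials are assumed to be stored as coefficient lists with the highest power first.

```python
def poly_divmod(num, den):
    """Long division of polynomials (coefficient lists, highest power first)."""
    num = list(num)
    quotient = []
    while len(num) >= len(den):
        factor = num[0] / den[0]
        quotient.append(factor)
        # Subtract factor * den, aligned at the current leading term.
        for i, d in enumerate(den):
            num[i] -= factor * d
        num.pop(0)  # the leading coefficient is now zero
    return quotient, num


def horner(coeffs, x):
    """Evaluate the polynomial at x by Horner's rule."""
    value = 0
    for a in coeffs:
        value = value * x + a
    return value


f = [1, -4, 5, -2, -2]   # x^4 - 4x^3 + 5x^2 - 2x - 2
d = [1, -2, 2]           # (x - (1+i))(x - (1-i)) = x^2 - 2x + 2
quotient, remainder = poly_divmod(f, d)
print(quotient, remainder)
print(horner(f, 1 + 1j), horner(f, 1 - 1j))
```

The quotient list corresponds to $x^2 - 2x - 1$ and the remainder is zero, in agreement with the example.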

EXAMPLE 2. Sometimes it is necessary to determine a polynomial from the knowledge of its roots. If the polynomial belongs to C[x] and the leading coefficient, and all of the complex roots, together with their multiplicities, are given, then Theorem 9-8.2 solves this problem. For example, the monic polynomial which has i as a double root, $1 + i$ as a simple root, and 1 as a simple root is

$$(x - i)^2[x - (1 + i)](x - 1) = x^4 - (2 + 3i)x^3 - (2 - 5i)x^2 + (4 - i)x - (1 + i).$$

Very often in such problems, the information about the roots is incomplete, so that it is necessary to use other data. For example, suppose that we wish to find every real, cubic polynomial a(x) with leading coefficient 1 and constant term 1, which has i as one of its roots. Since a(x) is to have real coefficients and i is a root, it follows from Theorem 9-8.3 that $-i$ is also a root. Let z be the remaining root. Then

$$(x - i)(x + i)(x - z) = x^3 + bx^2 + cx + 1.$$

Multiplying out the left-hand side of this equality gives

$$x^3 - zx^2 + x - z = x^3 + bx^2 + cx + 1.$$

Therefore, $b = -z$, $c = 1$, $1 = -z$. Thus, the only polynomial with the required property is $x^3 + x^2 + x + 1$. Of course it can also be seen that $z = -1$ by observing that the product of the roots of a cubic polynomial is equal to the negative of the constant term divided by the leading coefficient.
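Expanding a product of linear factors, as in Example 2, is easy to get wrong by hand. A small sketch of the expansion follows; the function name `poly_from_roots` is an illustrative assumption, and coefficients are listed from the highest power down.

```python
def poly_from_roots(roots, leading=1):
    """Coefficients (highest power first) of leading * prod (x - r)."""
    coeffs = [leading]
    for r in roots:
        # Multiply the current polynomial by (x - r).
        new = coeffs + [0]
        for i, a in enumerate(coeffs):
            new[i + 1] -= r * a
        coeffs = new
    return coeffs


# i as a double root, 1 + i and 1 as simple roots.
quartic = poly_from_roots([1j, 1j, 1 + 1j, 1])
print(quartic)

# Real cubic with roots i, -i and the remaining root z = -1.
cubic = poly_from_roots([1j, -1j, -1])
print(cubic)
```

The first list reproduces the coefficients $1, -(2+3i), -(2-5i), 4-i, -(1+i)$, and the second gives $x^3 + x^2 + x + 1$.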

l . 1-sing IFig. 9-1, ostiiiintc roiiglily tlic :ibsoliitc vnlucs of tlie roots of tlic l)olyiion~ial x:~ r i.

+ +

2 . Firid al1 of tlic roots of thc folio\\-ing l)ol!-iioniials, riiaking usc of tlic givcii tlit:i. (a) x:j GX' - 24.r - 160, oric root of \vliicli is 2 - 2 4 3 i . (b) rX-1 (1 - 2 i ) s 2 - ( 1 4- %).t. - 1, u-hicli h:is a doul~lc root. 4 1 . : ' - 4.r -14 , \vliicli 1i:is 1 -1- i 3 s :i cloublc root. (c) X" - 3s"


3. Find the monic polynomial a(x) in C[x] from the givcn data. 4i, and 1 - 4i, and no othcrs. (a) a(x) has simple roots 1, 2, i, 1 (b) a(x) has i a s a root of multil~licitythree and Dcg [a(x)] = 3. (c) a(x) is real, of the fourth dcgree, and has 1 - i and i among its roots. bx c, and 2 i is a (d) a(x) is a real cubic polynomial of thc form x3 root of a(x).

+ +

4. Let $r_1$, $r_2$, and $r_3$ be the roots of the cubic polynomial

$$x^3 + ax^2 + bx + c.$$

Express a, b, and c in terms of $r_1$, $r_2$, and $r_3$. Obtain similar results for monic polynomials of degree four.

5. Using Theorem 9-8.4, prove that every real polynomial of odd degree has at least one real root. [Remark. In Section 9-10, we will give a proof of this fact which does not make indirect use of the fundamental theorem of algebra.]

6. Let f(x) be a monic polynomial in R[x] such that f(x) has no roots in R. Prove that f(a) > 0 for all real numbers a.

7. Let $f(x) = ax^{2n} + bx^n + c$ be a polynomial in C[x], where $a \neq 0$ and $n \geq 1$.
(a) Show how Theorems 8-2.7 and 8-4.3 can be used to find all of the roots of f(x).
(b) Find the roots of $x^6 - 2ix^3 + (-1 - i)$.

8. Prove that if $f(x) \in R[x]$ has the complex root c with multiplicity m, then f(x) also has $\bar{c}$ as a root of multiplicity m.

*9-9 The solution of third- and fourth-degree equations. The fundamental theorem of algebra is what mathematicians call an existence theorem. It asserts that certain numbers always exist, but it gives no method for finding them. The Italian mathematicians of the Renaissance period were mainly concerned with methods by which they could actually determine roots of particular equations. It was a remarkable achievement that they† discovered formulas which explicitly exhibited the solutions of third- and fourth-degree equations.

The expressions which give the roots of the general cubic equation can easily be derived by formal manipulation. Suppose that x = z is a solution of

$$x^3 + bx^2 + cx + d = 0, \tag{9-3}$$

† Scipio Ferro discovered a solution of $x^3 + ax = b$, where a and b are positive real numbers. This was rediscovered and generalized somewhat by Tartaglia, who showed his work to Cardan under a pledge of secrecy. Cardan published the result of Ferro and Tartaglia, together with some discoveries of his own, but he neglected to mention that the solution of the cubic equation was not his own work.


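where b, c, and d are arbitrary complex numbers. Let $w = z + b/3$. Then

$$w^3 + \left(c - \frac{b^2}{3}\right)w + \left(d - \frac{bc}{3} + \frac{2b^3}{27}\right) = 0.$$

Conversely, if w satisfies this equation, then it is easy to see that $z = w - b/3$ is a solution of $x^3 + bx^2 + cx + d = 0$. Therefore, we can restrict our attention to reduced cubic equations, that is, equations of the form

$$x^3 + px + q = 0, \tag{9-4}$$

where the coefficients p and q are related to the coefficients of the general cubic equation (9-3) by

$$p = c - \frac{b^2}{3}, \qquad q = d - \frac{bc}{3} + \frac{2b^3}{27}.$$

If p = 0 in (9-4), then the reduced cubic equation has the special form

$$x^3 + q = 0.$$

In this case, the three roots of the equation are the three complex cube roots of $-q$, which can be found by Theorem 8-4.3. Thus, we may assume that $p \neq 0$ in (9-4).

Suppose that w is a solution of (9-4). Let u satisfy $u^2 - wu - p/3 = 0$. Then, since $p \neq 0$, it follows that $u \neq 0$. Therefore, $w = u - p/3u$. Substituting in (9-4), we have

$$(u - p/3u)^3 + p(u - p/3u) + q = 0,$$

that is,

$$u^3 - (p/3u)^3 + q = 0.$$

Consequently,

$$(u^3)^2 + q(u^3) - (p/3)^3 = 0.$$

It follows from Theorem 8-2.7 that u satisfies

$$u^3 = -q/2 \pm \sqrt{(q/2)^2 + (p/3)^3}.$$

Therefore, u is a solution of one of the two equations

$$u^3 = -q/2 + \sqrt{(q/2)^2 + (p/3)^3}, \tag{9-6}$$

$$u^3 = -q/2 - \sqrt{(q/2)^2 + (p/3)^3}. \tag{9-7}$$

Suppose that u satisfies the equation (9-6). By Theorem 8-4.3, this equation has three solutions. If u is any one solution, then the other two are $\zeta u$ and $\zeta^2 u$, where

$$\zeta = \cos 120^\circ + i \sin 120^\circ = -\tfrac{1}{2} + \tfrac{1}{2}\sqrt{3}\, i, \qquad \zeta^2 = \cos 240^\circ + i \sin 240^\circ = -\tfrac{1}{2} - \tfrac{1}{2}\sqrt{3}\, i = 1/\zeta$$

(see Problem 7, Section 8-4). The next step is to check directly that

$$w_1 = u - p/3u, \qquad w_2 = \zeta u - p/3\zeta u, \qquad w_3 = \zeta^2 u - p/3\zeta^2 u$$

are actually roots of the reduced cubic equation $x^3 + px + q = 0$. We first note that

$$(-p/3u)^3 = \frac{-(p/3)^3}{u^3} = \frac{-(p/3)^3}{-q/2 + \sqrt{(q/2)^2 + (p/3)^3}} = \frac{-(p/3)^3\left[-q/2 - \sqrt{(q/2)^2 + (p/3)^3}\right]}{(q/2)^2 - \left[(q/2)^2 + (p/3)^3\right]}.$$

Therefore,

$$(-p/3u)^3 = -q/2 - \sqrt{(q/2)^2 + (p/3)^3}. \tag{9-8}$$

Substituting $w_1 = u - p/3u$ in $x^3 + px + q$, we have

$$(u - p/3u)^3 + p(u - p/3u) + q = u^3 + (-p/3u)^3 + q = 0,$$

by (9-6) and (9-8). Similarly, since $\zeta^3 = 1$,

$$(\zeta u - p/3\zeta u)^3 + p(\zeta u - p/3\zeta u) + q = u^3 + (-p/3u)^3 + q = 0.$$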


In the same way, $\zeta^2 u - p/3\zeta^2 u$ satisfies (9-4). Therefore, $w_1$, $w_2$, and $w_3$ are roots of the reduced cubic. These roots were obtained by assuming that u satisfies (9-6). However, (9-8) shows that if u is a solution of (9-6), then $v = -p/3u$ is a solution of (9-7). Therefore, the three solutions of (9-7) are $v$, $\zeta v$, and $\zeta^2 v$. These lead to the roots

$$v - p/3v = w_1, \qquad \zeta v - p/3\zeta v = w_3, \qquad \zeta^2 v - p/3\zeta^2 v = w_2$$

of the reduced cubic. Thus, (9-7) does not lead to new solutions of (9-4). We summarize our results in the following theorem.

THEOREM 9-9.1. Let $p \neq 0$ and q be complex numbers. Then the solutions of the reduced cubic equation

$$x^3 + px + q = 0$$

are given by the expressions

$$u + v, \qquad \zeta u + \zeta^2 v, \qquad \zeta^2 u + \zeta v,$$

where $\zeta = -\tfrac{1}{2} + \tfrac{1}{2}\sqrt{3}\, i$, u is any one of the three solutions of the equation

$$u^3 = -q/2 + \sqrt{(q/2)^2 + (p/3)^3},$$

and v is a solution of

$$v^3 = -q/2 - \sqrt{(q/2)^2 + (p/3)^3}$$

such that $uv = -p/3$. Of course, $v = -p/3u$.

The expressions in this theorem for the solutions of the reduced cubic equation are known as Cardan's formulas.
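Cardan's formulas translate directly into a short numerical routine. The sketch below is only an illustration under stated assumptions: the function name `reduced_cubic_roots` is hypothetical, floating-point complex arithmetic replaces exact radicals, and the principal cube root is used for u (any of the three cube roots would serve, since v is then fixed by $uv = -p/3$).

```python
import cmath


def reduced_cubic_roots(p, q):
    """Roots of x^3 + p*x + q = 0 for p != 0, via u^3 = -q/2 + sqrt((q/2)^2 + (p/3)^3)."""
    zeta = complex(-0.5, 3 ** 0.5 / 2)          # primitive cube root of unity
    radical = cmath.sqrt((q / 2) ** 2 + (p / 3) ** 3)
    u = (-q / 2 + radical) ** (1 / 3)            # principal complex cube root
    v = -p / (3 * u)                             # enforces u*v = -p/3
    return [u + v, zeta * u + zeta ** 2 * v, zeta ** 2 * u + zeta * v]


# The reduced equation y^3 - 3y + 4 = 0 (p = -3, q = 4).
roots = reduced_cubic_roots(-3, 4)
for y in roots:
    print(y, abs(y ** 3 - 3 * y + 4))
```

Each printed residual should be zero up to rounding error; since $p \neq 0$ forces $u \neq 0$, the division defining v is always legitimate.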


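EXAMPLE 1. Let us solve $x^3 + 3x^2 + 2 = 0$. The corresponding reduced equation is obtained by letting $x = y - 1$: $y^3 - 3y + 4 = 0$. Thus, $p = -3$, $q = 4$, and

$$-q/2 \pm \sqrt{(q/2)^2 + (p/3)^3} = -2 \pm \sqrt{3}.$$

Taking $\sqrt[3]{-2 + \sqrt{3}}$ and $\sqrt[3]{-2 - \sqrt{3}}$ to be the real cube roots of $-2 + \sqrt{3}$ and $-2 - \sqrt{3}$, respectively, we have

$$\sqrt[3]{-2 + \sqrt{3}} \cdot \sqrt[3]{-2 - \sqrt{3}} = \sqrt[3]{4 - 3} = 1 = -p/3.$$

Hence, the solutions of the reduced equation are

$$\sqrt[3]{-2 + \sqrt{3}} + \sqrt[3]{-2 - \sqrt{3}}, \qquad \zeta\sqrt[3]{-2 + \sqrt{3}} + \zeta^2\sqrt[3]{-2 - \sqrt{3}}, \qquad\text{and}\qquad \zeta^2\sqrt[3]{-2 + \sqrt{3}} + \zeta\sqrt[3]{-2 - \sqrt{3}},$$

and the solutions of $x^3 + 3x^2 + 2 = 0$ are obtained by subtracting 1 from each of these numbers.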
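The solution of the general quartic equation can be obtained from the solution of a cubic equation by an ingenious trick discovered by Ferrari (1522-1565), a student of Cardan. As in the case of cubic equations, it is convenient to reduce the general quartic equation,

$$x^4 + bx^3 + cx^2 + dx + e = 0, \tag{9-9}$$

to the special form

$$y^4 + ry^2 + sy + t = 0, \tag{9-10}$$

by substituting $x = y - b/4$. If y is a solution of (9-10) and u is any complex number, then

$$(y^2 + u)^2 = y^4 + 2uy^2 + u^2 = (2u - r)y^2 - sy + (u^2 - t),$$

since $y^4 = -ry^2 - sy - t$. Let us try to choose u so that

$$(2u - r)y^2 - sy + (u^2 - t) = (my + n)^2$$

for suitable complex numbers m and n. This equation will certainly hold no matter what y may be, provided $m^2 = 2u - r$, $n^2 = u^2 - t$, and $2mn = -s$. These requirements impose the condition $(-s)^2 = (2mn)^2 = 4m^2n^2 = 4(2u - r)(u^2 - t)$. In other words, u must satisfy the resolvent cubic equation

$$8u^3 - 4ru^2 - 8tu + (4rt - s^2) = 0. \tag{9-11}$$

If this condition is fulfilled, then

$$(y^2 + u)^2 = (my + n)^2,$$

where

$$m = \sqrt{2u - r}, \qquad n = -s/(2\sqrt{2u - r}). \tag{9-12}$$

Therefore, $y^2 + u = \pm(my + n)$, and y is a root of one of the equations

$$y^2 - my + (u - n) = 0, \qquad y^2 + my + (u + n) = 0. \tag{9-13}$$

The four roots of the two equations (9-13) are the roots of the reduced quartic equation (9-10), as is easily shown by reversing our steps in the derivation of (9-13). Since a solution of (9-11) can be obtained, using Theorem 9-9.1, it follows that (9-10), and hence (9-9), can be solved explicitly.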
lo (9-13) are the roots of the reduced The four roots of the t ~ ~ equations quartic equation (9-lo), as is easily shown by reversing our steps in the derivation of (9-13). Since a solution of (9-11) can be obtained, using Theorem 9-9.1, it follo~vsthat (9-10)) and hence (9-9)) can be solved explici tly.
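EXAMPLE 2. Consider the quartic equation

$$x^4 - 4x^3 + 12x^2 - 12x + 5 = 0.$$

To reduce this equation, let $x = y + 1$. We obtain

$$y^4 + 6y^2 + 4y + 2 = 0.$$

Thus, r = 6, s = 4, and t = 2, so that the resolvent cubic equation is

$$8u^3 - 24u^2 - 16u + 32 = 0.$$

Clearly, u = 1 is a solution of this resolvent cubic, and we obtain

$$m = \sqrt{2 - 6} = 2i, \qquad n = -4/(4i) = i.$$

Thus, y is obtained as a solution of

$$y^2 - 2iy + (1 - i) = 0 \quad\text{or}\quad y^2 + 2iy + (1 + i) = 0.$$

The quadratic formulas, Theorem 8-2.7, give

$$y = i \pm \sqrt{-2 + i} \quad\text{or}\quad y = -i \pm \sqrt{-2 - i}.$$

The square roots $\sqrt{-2 + i}$ and $\sqrt{-2 - i}$ can be computed from (8-9):

$$\sqrt{-2 + i} = \sqrt{\tfrac{1}{2}(\sqrt{5} - 2)} + i\sqrt{\tfrac{1}{2}(\sqrt{5} + 2)}, \qquad \sqrt{-2 - i} = \sqrt{\tfrac{1}{2}(\sqrt{5} - 2)} - i\sqrt{\tfrac{1}{2}(\sqrt{5} + 2)}.$$

Combining these results, we obtain all of the solutions of the original equation $x^4 - 4x^3 + 12x^2 - 12x + 5 = 0$:

$$x = 1 + i \pm \sqrt{-2 + i}, \qquad x = 1 - i \pm \sqrt{-2 - i}.$$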

Success in solving the cubic and quartic equations led mathematicians from the time of Bombelli to seek similar results for the general fifth-degree equation $x^5 + bx^4 + cx^3 + dx^2 + ex + f = 0$. However, all efforts failed. The reason for this failure was finally discovered in 1824 by the young Norwegian genius, N. H. Abel (1802-1829), who proved that the general fifth-degree equation cannot be solved by means of radicals. That is, there are no expressions (involving only the operations of addition, multiplication, subtraction, division, and the operation of taking square roots, cube roots, fourth roots, etc.) which explicitly exhibit the roots of an arbitrary monic fifth-degree polynomial in terms of the coefficients of the polynomial. Even deeper insight into the solutions of polynomial equations resulted from the investigations of Abel's French contemporary, Évariste Galois* (1811-1832). Galois' theory not only showed why it is impossible to solve the general fifth-degree equation by radicals, but also revealed why the third- and fourth-degree equations can be solved. Even today, Galois' work stands, practically unchanged, as one of the most beautiful theories of modern mathematics.

* Galois was perhaps the greatest of all mathematical prodigies. Of him it can truly be said that he was neither appreciated nor understood during his lifetime. His mathematical work was not published until 14 years after his death, and was not absorbed into the body of mathematical knowledge for another 25 years. Yet the ideas in this work revolutionized algebra. Galois was killed in a duel at the age of 20.

l . Solve the follo\ving equations.

(a) (b) (c) (d) (e) (f) (g)

x3 - 9x - 12 = O x3 - 1 8 ~ 30 = O z" - 6x2 - 6x - 2 = O x3 - 3ix (1 - i ) = O x4 - 4x2 8x - 4 = O 2 " 4x3 - 5x2 -t 12x 6 x4 - x2 - 2 i x + 6 = O

+ +

2. (a) Prove in detail that any solution y of one of the equations (9-13) is a solution of (9-10), provided m and n are given by (9-12) and u is any solution of (9-11). (b) Write on a large piece of paper an expression which gives a solution y of (9-10) in terms of r, s, and t.

3. Let f(x) be a monic cubic polynomial with roots $r_1$, $r_2$, and $r_3$. The discriminant D of f(x) is defined to be

$$D = (r_1 - r_2)^2 (r_1 - r_3)^2 (r_2 - r_3)^2.$$

Use Theorem 9-9.1 to prove that the discriminant of $x^3 + px + q$ is

$$-4p^3 - 27q^2.$$

[Hint: By definition, $\zeta^3 = 1$, and by Problem 8, Section 8-4, $1 + \zeta + \zeta^2 = 0$.]

4. Let $f(x) = x^3 + bx^2 + cx + d$. Find the discriminant of f(x).

5. Prove that a cubic polynomial f(x) in R[x] has three distinct real roots if the discriminant D of f(x) is positive, real roots, one of which is a multiple root, if D = 0, and a single real root and two (nonreal) complex conjugate roots if D < 0.
6. Cse tlic rcsults of I'roblciiis 3 and 5 to detcriiliiie tlie riumbcr of real roots of the follon-iiig polyrioniials. (h) z3 - ql-0 .x -- 1 (c) 2~~ - % . { 1 (a) xLk 2x - 1

7. E'incl tlic roots of z" - 2.c 1 by obscrvirig tliat 1 is a root. Firicl the exl~rcssion, giveri by 'i3icorciii 9-9.1, for cacli of tlicsc roots.
8. Let a(x) = z:' -1- p r -1- q, jvlicrc p ancl rl nrc real arid (p/3)3 (so tliat p < 0). I'rovc tliat tlic tlircc roots of a(x) :ire

+ (q/2)2 < O

9- 1o]
where

GRAPHS O F REAL P O L Y S O M I h L S

4 is an angle such that

[Iiint: Let -q/2 i~'-[(p/3)~ ( ~ / 2 ) ~ = 1 r ( ~ 04 s i sin 4). Show that r = d-p"/7 and cos 4 = (-q/2)/V'-p3/27. Substitute into Theorem 9-9.1, and use Theorem 8-4.3.1 9. Use the result of Problem 8 to find the roots of the follo~ving polynomials. (a) x3 - 22 1 9 (c) x3 - 3x2 - 32 - 4 (b) x3 - 92

9-10 Graphs of real polynomials. An important part of the theory of equations in R[x] is concerned with finding the real roots of polynomials. For a given polynomial $a(x) \in R[x]$, the problem is to determine the number of real roots of a(x) and obtain decimal approximations of each real root. In this section and the following one, we will discuss some of the basic methods for solving these problems.

Let $a(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$ be a polynomial with real coefficients. Associated with each real number c is the value a(c) of a(x) at x = c. Of course, a(c) is also a real number. The set of all ordered pairs of real numbers

$$\{\langle c, a(c)\rangle \mid c \in R\}$$

is called the graph of a(x). Since each ordered pair of real numbers can be represented by a point in a coordinate plane, the graph of a(x) can be represented by a set of points in the plane. It is customary to also refer to this set of points as the graph of a(x). Experience shows that the graph of a real polynomial a(x) is a smooth unbroken curve. For example, if a(x) is a constant polynomial, then the graph of a(x) is a horizontal line. If Deg [a(x)] = 1, then the graph of a(x) is a straight line which is neither horizontal nor vertical [see Fig. 9-2(a)]. If Deg [a(x)] = 2, then the graph of a(x) is a parabola [see Fig. 9-2(b)].


From the graph of a real polynomial a(x), it is possible to obtain a great deal of information about a(x). For example, the real roots of a(x) are the numbers c such that a(c) = 0, that is, they are the points at which the graph of a(x) either touches or crosses the X-axis of the coordinate plane. Thus the graph of a(x) tells us (at least roughly) where the real roots of a(x) are located.

EXAMPLE 1. Let us sketch the graph of $a(x) = x^3 - 3x^2 - 2x + 6$. It is convenient to make a table of values of a(c) corresponding to various choices of c. We plot the points determined by the pairs (c, a(c)) from this table in a coordinate plane, and sketch an unbroken curve which passes through these points (see Fig. 9-3). It is seen from this graph that a(x) has three real roots at approximately -1.5, 1.5, and 3. Actually 3 is an exact root of a(x), since a(3) = 27 - 27 - 6 + 6 = 0, and factoring out x - 3 gives

$$a(x) = (x - 3)(x^2 - 2) = (x - 3)(x - \sqrt{2})(x + \sqrt{2}).$$

Hence, for $a(x) = x^3 - 3x^2 - 2x + 6$ it is not necessary to plot the graph in order to find the real zeros. However, for polynomials of higher degree, graphical methods may be the most effective way of approximating the roots.
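A table of values such as the one used in Example 1 is quick to generate. The sketch below is illustrative: the choice of the integers from -2 to 4 as sample points is an assumption, not necessarily the set of points used in the original table, and `evaluate` is a hypothetical helper name.

```python
def evaluate(coeffs, x):
    """Value of the polynomial (coefficients highest power first) at x."""
    value = 0
    for a in coeffs:
        value = value * x + a
    return value


a = [1, -3, -2, 6]   # x^3 - 3x^2 - 2x + 6
table = [(c, evaluate(a, c)) for c in range(-2, 5)]
for c, value in table:
    print(f"{c:>3} {value:>4}")
```

The sign of a(c) changes between -2 and -1 and between 1 and 2, and a(3) = 0, which matches the three real roots $-\sqrt{2}$, $\sqrt{2}$, and 3.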


The fact that the graph of a polynomial is an unbroken curve suggests the following important result.

THEOREM 9-10.1. Let f(x) be a polynomial in R[x]. Suppose that a and b are real numbers such that a < b, and f(a) and f(b) have opposite signs. Then f(x) has at least one real root c with a < c < b.

This result is intuitively obvious. In fact, by assumption, the points (a, f(a)) and (b, f(b)) are on opposite sides of the X-axis in the coordinate plane. Since the graph of f(x) is an unbroken curve which passes through these two points, this graph must, at one or more points between a and b, cross the X-axis (see Fig. 9-4). That is, there is a real number c with a < c < b, such that f(c) = 0.

Of course, the above remarks do not constitute a proof of the theorem. The completeness property of the real numbers will be used to locate a root c of f(x) in the interval from a to b. The argument is a slight modification of the proof of Theorem 7-6.3. The proof of Theorem 9-10.1 will not make use of the fundamental theorem of algebra. This remark is important, because our proof of the fundamental theorem given in the appendix is based on Theorem 9-10.1.

Before giving the proof, it is convenient to establish a simple property of real polynomials.

(9-10.2). Let $g(x) \in R[x]$. Then there is a positive real number m which depends only on g(x) such that $-m \leq g(h) \leq m$ for all $h \in R$ satisfying $|h| \leq 1$.

Proof. Let $g(x) = b_0 + b_1 x + \cdots + b_n x^n$. Then if $|h| \leq 1$, it follows from Theorem 4-6.7 that

$$|g(h)| \leq |b_0| + |b_1||h| + \cdots + |b_n||h|^n \leq |b_0| + |b_1| + \cdots + |b_n|.$$

Thus, we can let $m = |b_0| + |b_1| + \cdots + |b_n|$ if g(x) is not the zero polynomial, and m = 1 if g(x) = 0.

Proof of Theorem 9-10.1. Since f(a) and f(b) have opposite signs, it follows that either f(a) > 0 > f(b), or f(a) < 0 < f(b). We will prove the theorem for the case f(a) > 0 and f(b) < 0. The proof in the other case is similar. Let

$$S = \{t \in R \mid a \leq t \leq b \text{ and } f(t) > 0\}.$$

That is, S is the set of all real numbers between a and b for which the value of f(x) is positive. The set S is not empty since $a \in S$. Moreover, b is an upper bound for S. Since R is a complete ordered field, the set S has a least upper bound (see Definition 7-5.4). Let c = l.u.b. S. Then $a \leq c$, because $a \in S$ and c is an upper bound of S, and $c \leq b$, since b is an upper bound of S and c is the least upper bound of S.

The definitions of S and c imply two facts which we will use:
(1) if $c < t \leq b$, then $f(t) \leq 0$;
(2) if h > 0, then there is a real number t such that $c - h < t \leq c$ and f(t) > 0.

Indeed, if $c < t \leq b$, then $t \notin S$, since c is an upper bound for S. However, $a \leq c < t \leq b$ and f(t) > 0 implies that $t \in S$, by the definition of S. Therefore, f(t) > 0 is impossible. That is, $f(t) \leq 0$. Moreover, h > 0 means that c - h is not an upper bound of S, so that c - h < t for some $t \in S$. Furthermore, $t \in S$ implies $t \leq c$ and f(t) > 0.

The proof will be completed by showing that both of the inequalities f(c) > 0 and f(c) < 0 lead to contradictions. Indeed, it then follows that f(c) = 0, so that $c \neq a$ and $c \neq b$. Thus, a < c < b.

Consider the polynomial f(x + c) - f(c), where f(x + c) is obtained from f(x) by substituting x + c for x in f(x). Since f(0 + c) - f(c) = 0, it follows that 0 is a root of this polynomial. Consequently, by the factor theorem, we have
(3) $f(x + c) - f(c) = x \cdot g(x)$,
where g(x) is some polynomial in R[x]. Let m be a positive real number such that
(4) if $h \in R$ and $|h| \leq 1$, then $-m \leq g(h) \leq m$.
Such a number exists, by (9-10.2).

Suppose that f(c) > 0. Then c < b, since f(b) < 0. Define

$$h = \tfrac{1}{2} \min\{2,\; b - c,\; f(c)/m\}.$$

This definition is so contrived that h satisfies
(5) h > 0, (6) $h \leq 1$, (7) h + c < b, (8) hm < f(c).
By (3), (4), (5), (6), and (8), we obtain

$$f(h + c) = f(c) + h \cdot g(h) \geq f(c) - hm > 0.$$

However, it follows from (5) and (7) that c < h + c < b, so that this inequality is in contradiction with (1). Therefore, f(c) > 0 is impossible.

Suppose that f(c) < 0. Define

$$h = \min\{1,\; -f(c)/m\}.$$

This choice of h leads to the inequalities
(9) h > 0, (10) $h \leq 1$, (11) $hm \leq -f(c)$.
By (2), there is a real number t such that $c - h < t \leq c$ and f(t) > 0. Consequently, $-h < t - c \leq 0 < h$, so that $|t - c| < h \leq 1$. Therefore, by (3), (4), and (11) (substituting t - c for x),

$$f(t) = f(c) + (t - c) \cdot g(t - c) \leq f(c) + hm \leq 0.$$

This contradiction shows that f(c) < 0 is impossible, so that the proof of Theorem 9-10.1 is complete.

EXAMPLE 2. Another proof that each positive real number d has a real nth root (Theorem 7-6.3) can be obtained very easily from Theorem 9-10.1. Consider the polynomial $f(x) = x^n - d$. Since $n \geq 1$, $(d + 1)^n \geq d + 1$. Hence, $f(d + 1) = (d + 1)^n - d \geq d + 1 - d = 1 > 0$, and because f(0) = -d < 0, it follows from Theorem 9-10.1 that f(x) has a positive root. That is, d has a positive nth root.

EXAMPLE 3. Theorem 9-10.1 can be used to locate the real roots of the polynomial $f(x) = x^3 - 12x^2 - 13x + 6$. We make a table of values for f(x); among its entries are f(-2) = -24, f(-1) = 6, f(0) = 6, f(1) = -18, f(12) = -150, and f(13) = 6.

By Theorem 9-10.1, f(x) has three real roots $t_1$, $t_2$, and $t_3$ such that

$$-2 < t_1 < -1, \qquad 0 < t_2 < 1, \qquad\text{and}\qquad 12 < t_3 < 13.$$

Since f(x) can have at most three roots, $t_1$, $t_2$, and $t_3$ are all of the roots of f(x).
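The method of Example 3 amounts to scanning consecutive integers for a change of sign. A minimal sketch follows; the helper names and the scanning range from -3 to 14 are assumptions chosen for illustration.

```python
def evaluate(coeffs, x):
    """Value of the polynomial (coefficients highest power first) at x."""
    value = 0
    for a in coeffs:
        value = value * x + a
    return value


def sign_change_intervals(coeffs, lo, hi):
    """Unit intervals (c, c + 1) with lo <= c < hi on which f changes sign."""
    intervals = []
    for c in range(lo, hi):
        if evaluate(coeffs, c) * evaluate(coeffs, c + 1) < 0:
            intervals.append((c, c + 1))
    return intervals


f = [1, -12, -13, 6]   # x^3 - 12x^2 - 13x + 6
intervals = sign_change_intervals(f, -3, 14)
print(intervals)
```

By Theorem 9-10.1 each interval found contains at least one real root. A scan of this kind can miss roots that occur in pairs between consecutive sample points, since then no sign change is visible at the sample points.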


To make the most effective use of the method used in Example 3 to locate the real roots of a polynomial $f(x) \in R[x]$, it is desirable to have an upper and a lower bound for the real roots of f(x). Otherwise, we will usually not know how large or small to take t in calculating f(t) for a table of values of f(x).

THEOREM 9-10.3. Let $f(x) = x^n + a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \cdots + a_0$ be a polynomial in R[x]. Define

$$M = \max\{-a_{n-1}, -a_{n-2}, \ldots, -a_0, 0\}$$

and

$$m = \max\{a_{n-1}, -a_{n-2}, \ldots, (-1)^{n-1}a_0, 0\}.$$

Then f(t) > 0 for all t > M + 1, and $(-1)^n f(t) > 0$ for all t < -(m + 1). In particular, if f(x) has a real root c, then $-(m + 1) \leq c \leq M + 1$.

Proof. By the definition of M, we have $M \geq -a_j$, and hence $-M \leq a_j$, for j = 0, 1, ..., n - 1. Thus, if $t > M + 1 \geq 1$, then

$$f(t) = t^n + a_{n-1}t^{n-1} + \cdots + a_0 \geq t^n - M(t^{n-1} + \cdots + t + 1) = t^n - M\,\frac{t^n - 1}{t - 1} > t^n - (t^n - 1) = 1 > 0$$

[see Problem 6(a), Section 2-1]. To prove that t < -(m + 1) implies that $(-1)^n f(t) > 0$, simply apply the result which has just been proved to the polynomial

$$(-1)^n f(-x) = x^n - a_{n-1}x^{n-1} + a_{n-2}x^{n-2} - \cdots + (-1)^n a_0.$$

We leave the details for the reader to work out.

It should be emphasized that the bounds -(m + 1) and M + 1 obtained in Theorem 9-10.3 for the real roots of a polynomial are not in general the best possible. For instance, the theorem gives the bounds -1 and 6 for the real roots of $x^2 - 5x + 9$, although this polynomial actually has no real root.
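The bounds of Theorem 9-10.3 are simple to compute from the coefficients. In the sketch below the function name `root_bounds` and the input convention (the coefficients $a_{n-1}, \ldots, a_0$ of a monic polynomial, listed from the highest power down) are assumptions made for the illustration.

```python
def root_bounds(lower_coeffs):
    """Return (-(m + 1), M + 1) for x^n + a_{n-1} x^{n-1} + ... + a_0.

    lower_coeffs is [a_{n-1}, a_{n-2}, ..., a_0]; the leading 1 is omitted.
    """
    M = max([-a for a in lower_coeffs] + [0])
    # Alternating signs: a_{n-1}, -a_{n-2}, a_{n-3}, ..., (-1)^(n-1) a_0.
    m = max([(-1) ** k * a for k, a in enumerate(lower_coeffs)] + [0])
    return -(m + 1), M + 1


print(root_bounds([-5, 9]))         # x^2 - 5x + 9
print(root_bounds([-12, -13, 6]))   # x^3 - 12x^2 - 13x + 6
```

For $x^2 - 5x + 9$ this gives the bounds -1 and 6 mentioned above. For $x^3 - 12x^2 - 13x + 6$ it gives -14 and 14, so a table of values running over the integers from -14 to 14 is certain to bracket every real root.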

EXAMPLE 4. Let us obtain upper and 101%-er bounds for the real roots of the
polynomial f(~= ) 2~~ - 3~~

- 4.

Since f(x) is not a monic polynomial, Theorem 9-10.3 does not apply directly to give bounds for the roots of f(x). However, the roots of f(x) are evidently the


same as those of the monic polynomial

+f(x)
We have max and max

x4 - $ x 3 +
-0,

+x

2.
=

{-(-2), (-2,-0,

-3,

-(-21,

0)
= 2.

3, -(-21,

0)

Therefore, if c is a real root of f(x), then -3

5c5

3 by Theorem 9-10.3.

An important consequence of Theorems 9-10.1 and 9-10.3 is the following result.

THEOREM 9-10.4. If f(x) is a nonzero polynomial in R[x] such that Deg [f(x)] is odd, then f(x) has at least one real root.

Proof. Let $f(x) = a_0 + a_1 x + \cdots + a_{n-1}x^{n-1} + a_n x^n$, where $a_0, a_1, \ldots, a_n$ are real numbers, $a_n \neq 0$, and n is odd. Define $g(x) = a_n^{-1} f(x)$. Then

$$g(x) = x^n + b_{n-1}x^{n-1} + \cdots + b_1 x + b_0,$$

where $b_{n-1} = a_{n-1}/a_n$, ..., $b_1 = a_1/a_n$, and $b_0 = a_0/a_n$. Since every root of g(x) is also a root of f(x), it is sufficient to show that g(x) has at least one real root. Let u and v be real numbers such that

$$u > \max\{-b_{n-1}, -b_{n-2}, \ldots, -b_1, -b_0, 0\} + 1,$$

and

$$v < -\left[\max\{b_{n-1}, -b_{n-2}, \ldots, (-1)^{n-2}b_1, (-1)^{n-1}b_0, 0\} + 1\right].$$

Then by Theorem 9-10.3,

$$g(u) > 0 \quad\text{and}\quad (-1)^n g(v) > 0.$$

Since n is odd, $(-1)^n = -1$, so that g(v) < 0. Therefore, by Theorem 9-10.1, g(x) has a real root between v and u.

The above proof does not depend on the fundamental theorem of algebra. A proof of Theorem 9-10.4 can be based on the fundamental theorem of algebra (see Problem 5, Section 9-8), but then it would not be logically correct to turn around and use Theorem 9-10.4 in the proof of the fundamental theorem, as we will do in Appendix 3.


1. By plotting points a t 4 unit intervals from -3 to 3, sketch the graphs of the following polynomials. (a) x2 - 2 x + 1 (b) - 2 ~ 3 X - 3 (c) x4 x3 x2 x I (d) x3 - 2x2 - 3

+ + + + +
-

2. By roots. (a) (b) (c)

graphing the following polynomials, estimate the location of their real x4 x4 x3 2x2 - 8x - 3 28x2 24x 12 - 4x 1
-

+ +

3. Find upper and lower bounds for the real roots of the following polynomials. (a) x7 - x6 - x5 x4 - x3 x -1 (b) x12 - 23x2 722 - 1 (c) 4x5 - 2x - I (d) 99xg9 x7 1

+ +

4. Use the method of Example 3 to find the largest integer the real roots c of the following polynomials. 5 (a) x3 - 7x (b) x4 - 4x2 x 1 (c) x5 - 7x3 3x2 5x - 1

c for al1 of

+ + + + +

5. Prove that a monic polynomial in R[x] which has even degree must have at least two real roots if the constant term is negative.

6. Prove the last part of Theorem 9-10.3 in detail.

7. Let a l , a2, . . . , a, be real numbers with a l bl, b2, . . . , b, be positive real numbers. Define
g(x) and
=

< a2 <

<

a,.

Let bo,

(x - ai)(x

a 2 ) . . - ( x - a,)

Prove that f(x) has n different real roots.

8. Let f (x) be a polynomial of positive degree in R[x]. Prove that if f' (x) has no real root, then f(x) has exactly one real root. [Hint: Use Theorem 9-10.4 to show that f(x) has a t least one real root; use Theorem 9-6.4 to show that f(x) has no multiple real roots; prove that if

where f (x) has no root between a and b, then f' (a) and f '(b) have opposite signs; from these facts, deduce the assertion of Problem 8.1

9-11 Sturm's theorem. Theorem 9-10.1 guarantees the existence of at least one real root between c and d if the values of the polynomial $f(x) \in R[x]$ at c and d have opposite signs. There may be more than one. For example, if

$$f(x) = 64x^3 - 88x^2 + 34x - 3,$$

then f(0) = -3 and f(1) = 7. The roots of f(x) are 1/8, 1/2, and 3/4. In sketching a graph of f(x) from a table of values, it would be easy to overlook two of these roots. From data such as f(0) = -3 and f(1) = 7 alone, we would probably sketch the graph pictured in Fig. 9-5. The actual graph of f(x), with the three zeros indicated, is shown in Fig. 9-6.

Sturm's theorem* makes it possible to determine the number of real roots of a polynomial between any two numbers. Applying this theorem to the polynomial $f(x) = 64x^3 - 88x^2 + 34x - 3$, we would be able to see that f(x) has three real roots between 0 and 1, and thereby avoid the error of sketching the graph of f(x) as in Fig. 9-5.

Let f(x) be a polynomial of positive degree. We will describe a process which assigns to every real number t a nonnegative integer N(t), such that the value of N(t) is diminished by 1 whenever t passes a root of f(x). Then for any real numbers c < d such that $f(c) \neq 0$ and $f(d) \neq 0$, the integer N(c) - N(d) is the number of real roots of f(x) between c and d.

* Named for its discoverer, Jacques Charles Francois Sturm (1803-1855).


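The first step in defining N(t) is to alter slightly the Euclidean algorithm (see Section 9-4). By the division algorithm,

$$f(x) = q_1(x) f'(x) + r_1(x).$$

Let $s_1(x) = -r_1(x)$, and divide $f'(x)$ by $s_1(x)$:

$$f'(x) = q_2(x) s_1(x) + r_2(x).$$

Let $s_2(x) = -r_2(x)$, and divide $s_1(x)$ by $s_2(x)$:

$$s_1(x) = q_3(x) s_2(x) + r_3(x).$$

If this process is continued, we obtain the following sequence of equations:

$$\begin{aligned}
f(x) &= q_1(x) f'(x) - s_1(x),\\
f'(x) &= q_2(x) s_1(x) - s_2(x),\\
s_1(x) &= q_3(x) s_2(x) - s_3(x),\\
&\;\;\vdots\\
s_{k-2}(x) &= q_k(x) s_{k-1}(x) - s_k(x),\\
s_{k-1}(x) &= q_{k+1}(x) s_k(x),
\end{aligned}$$

where $s_k(x)$ is the last nonzero remainder. Except possibly for sign, the remainders $s_1(x), s_2(x), \ldots, s_k(x)$ obtained in this way are the same as the remainders obtained in applying the Euclidean algorithm to find a greatest common divisor of f(x) and $f'(x)$. Therefore, the last nonzero remainder $s_k(x)$ is a g.c.d. of f(x) and $f'(x)$. The sequence of polynomials

$$f(x),\; f'(x),\; s_1(x),\; s_2(x),\; \ldots,\; s_k(x) \tag{9-14}$$

is called the Sturm sequence of f(x).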

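EXAMPLE 1. Let $f(x) = 64x^3 - 88x^2 + 34x - 3$. Then $f'(x) = 192x^2 - 176x + 34$. By dividing, we obtain

$$f(x) = \left(\tfrac{1}{3}x - \tfrac{11}{72}\right) f'(x) - \left(\tfrac{38}{9}x - \tfrac{79}{36}\right),$$

$$f'(x) = \left(\tfrac{864}{19}x - \tfrac{6516}{361}\right)\left(\tfrac{38}{9}x - \tfrac{79}{36}\right) - \tfrac{2025}{361}.$$

Therefore, the Sturm sequence of f(x) is

$$64x^3 - 88x^2 + 34x - 3, \qquad 192x^2 - 176x + 34, \qquad \tfrac{38}{9}x - \tfrac{79}{36}, \qquad \tfrac{2025}{361}.$$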

For each real number t, the values at x = t of the polynomials given in (9-14) form a finite sequence of real numbers:

$$f(t),\; f'(t),\; s_1(t),\; s_2(t),\; \ldots,\; s_k(t). \tag{9-15}$$

A variation in sign occurs in the sequence (9-15) whenever one of the numbers is positive, and the next nonzero number in the sequence is negative, or vice versa. For instance, in the sequence 3, 0, -1, -2, 0, 0, 1, variations in sign occur at 3 and -2. Let N(t) be the total number of variations in sign for the sequence (9-15). The number N(t) can be computed by discarding the numbers in the sequence (9-15) which are 0, and counting the number of variations in sign for the new sequence which consists of positive and negative real numbers.

EXAMPLE 2. Let f(x) be the polynomial $64x^3 - 88x^2 + 34x - 3$, whose Sturm sequence was obtained in Example 1. The values of the polynomials in the Sturm sequence of f(x) corresponding to x = 0 and x = 1 are, respectively,

$$-3,\; 34,\; -\tfrac{79}{36},\; \tfrac{2025}{361} \qquad\text{and}\qquad 7,\; 50,\; \tfrac{73}{36},\; \tfrac{2025}{361}.$$

Consequently, for the polynomial $64x^3 - 88x^2 + 34x - 3$,

$$N(0) = 3 \quad\text{and}\quad N(1) = 0.$$

THEOREM 9-11.1. Sturm's theorem. Let f(x) be a polynomial in R[x] whose Sturm sequence is given by (9-14). Let c and d be real numbers such that c < d and f ( c ) # O and f(d) # O . For each real number t, let N ( t ) be the number of variations in sign in the sequence (9-15). Then the number of distinct roots of f ( x ) between c and d is equal to N(c) - N(d). The proof of Sturm's theorem is elementary, but rather long. For this reason, we will not prove Theorem 9-11.1 in this section. The interested reader can find a proof of Sturm's theorem in Appendix 1.
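The construction of the Sturm sequence and the count N(t) can be carried out exactly with rational arithmetic. The following sketch is an illustration only: the function names are hypothetical, coefficients are listed from the highest power down, and the input polynomial is assumed to have positive degree and rational coefficients.

```python
from fractions import Fraction


def poly_rem(num, den):
    """Remainder of num divided by den (coefficient lists, highest power first)."""
    num = list(num)
    while len(num) >= len(den):
        factor = num[0] / den[0]
        for i, d in enumerate(den):
            num[i] -= factor * d
        num.pop(0)
    while num and num[0] == 0:   # strip leading zeros
        num.pop(0)
    return num


def derivative(coeffs):
    n = len(coeffs) - 1
    return [a * (n - i) for i, a in enumerate(coeffs[:-1])]


def sturm_sequence(coeffs):
    """f, f', s_1, ..., s_k, where each s_i is the negative of a remainder."""
    seq = [[Fraction(a) for a in coeffs]]
    seq.append(derivative(seq[0]))
    while True:
        rem = poly_rem(seq[-2], seq[-1])
        if not rem:
            return seq
        seq.append([-a for a in rem])


def evaluate(coeffs, t):
    value = Fraction(0)
    for a in coeffs:
        value = value * t + a
    return value


def variations(seq, t):
    """N(t): sign variations among the nonzero values of the sequence at t."""
    values = [v for v in (evaluate(p, t) for p in seq) if v != 0]
    return sum(1 for a, b in zip(values, values[1:]) if a * b < 0)


seq = sturm_sequence([64, -88, 34, -3])
print(variations(seq, 0) - variations(seq, 1))
```

The printed difference N(0) - N(1) is the number of distinct roots of $64x^3 - 88x^2 + 34x - 3$ between 0 and 1 according to Sturm's theorem.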


EXAMPLE 3. Returning to the polynomial $f(x) = 64x^3 - 88x^2 + 34x - 3$, we note that by Sturm's theorem and the result of Example 2, f(x) must have three roots between 0 and 1. This is in agreement with the observation made at the beginning of this section that 1/8, 1/2, and 3/4 are roots of f(x). There can be no others, because the degree of f(x) is three.

It is to be emphasized that Sturm's theorem gives a way of finding the number of distinct real roots of a polynomial. This theorem does not give any information about the multiplicity of these roots. However, if the last term $s_k(x)$ in the Sturm sequence of a polynomial f(x) is not a constant, then f(x) may have multiple real roots, which can be located by applying Sturm's theorem to $s_k(x)$, since this polynomial is a g.c.d. of f(x) and $f'(x)$.
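EXAMPLE 4. Let $f(x) = 4x^3 - 3x + 1$. Then $f'(x) = 12x^2 - 3$, and

$$f(x) = \tfrac{1}{3}x \cdot f'(x) - (2x - 1), \qquad f'(x) = (6x + 3)(2x - 1).$$

Therefore, the Sturm sequence of $f(x)$ is

$$4x^3 - 3x + 1, \quad 12x^2 - 3, \quad 2x - 1.$$

For the values $x = -2$, $x = 0$, $x = 2$, the Sturm sequence of $f(x)$ becomes

$$-25,\ 45,\ -5; \qquad 1,\ -3,\ -1; \qquad 27,\ 45,\ 3.$$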

Therefore, $N(-2) = 2$, $N(0) = 1$, and $N(2) = 0$. It follows from Sturm's theorem that $f(x)$ has one root between $-2$ and 0, and one root between 0 and 2. It is easy to see (by Theorem 9-10.3, for example) that $f(x)$ has no root smaller than $-2$, and none larger than 2. Thus, $f(x)$ has only two distinct real roots. Clearly, one of these must be a double root, since the complex roots of a real polynomial occur in pairs, by Theorem 9-8.3. If we note that $2x - 1$ is a greatest common divisor of $f(x)$ and $f'(x)$, then it becomes clear from Theorem 9-5.4 that $\tfrac{1}{2}$ is a double root of $f(x)$. By inspection, the other real root is $-1$.

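EXAMPLE 5. Let $f(x) = x^4 + 4x^3 + x^2 - 6x + 2$. Then $f'(x) = 4x^3 + 12x^2 + 2x - 6$, and

$$f(x) = (\tfrac{1}{4}x + \tfrac{1}{4})f'(x) - (\tfrac{5}{2}x^2 + 5x - \tfrac{7}{2}),$$
$$f'(x) = (\tfrac{8}{5}x + \tfrac{8}{5})(\tfrac{5}{2}x^2 + 5x - \tfrac{7}{2}) - (\tfrac{2}{5}x + \tfrac{2}{5}),$$
$$\tfrac{5}{2}x^2 + 5x - \tfrac{7}{2} = (\tfrac{25}{4}x + \tfrac{25}{4})(\tfrac{2}{5}x + \tfrac{2}{5}) - 6.$$

Therefore, the Sturm sequence of $f(x)$ is

$$x^4 + 4x^3 + x^2 - 6x + 2, \quad 4x^3 + 12x^2 + 2x - 6, \quad \tfrac{5}{2}x^2 + 5x - \tfrac{7}{2}, \quad \tfrac{2}{5}x + \tfrac{2}{5}, \quad 6.$$

By Theorem 9-10.3, every real root of $f(x)$ is between $-7$ and 7. Computing the values of the Sturm sequence for each integral value beginning at $x = -7$, we find that $N(-7) = 4$, $N(-6) = 4$, $N(-5) = 4$, $N(-4) = 4$, $N(-3) = 4$, $N(-2) = 2$, $N(-1) = 2$, $N(0) = 2$, and $N(1) = 0$. This shows that all four roots of $f(x)$ are real, and there are two roots between $-3$ and $-2$ and two roots between 0 and 1. Since $f(-3) > 0$, $f(-2) > 0$, $f(0) > 0$, and $f(1) > 0$, the existence of these real roots would not be detected by Theorem 9-10.1 if we calculated $f(x)$ only for integer values of $x$. The calculation of $N(-\tfrac{5}{2}) = 3$ and $N(\tfrac{1}{2}) = 1$ locates the roots of $f(x)$ in the intervals $-3 < x < -\tfrac{5}{2}$, $-\tfrac{5}{2} < x < -2$, $0 < x < \tfrac{1}{2}$, and $\tfrac{1}{2} < x < 1$.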

Having isolated each real root of $f(x)$, we can use Theorem 9-10.1* to obtain the $n$-place decimal approximation of these roots. For example, since $f(0) = 2$, $f(0.1) = 1.4141$, $f(0.2) = 0.8736$, $f(0.3) = 0.4061$, $f(0.4) = 0.0416$, and $f(0.5) = -0.1875$, it follows from Theorem 9-10.1 that the root of $f(x)$ in the interval $0 < x < \tfrac{1}{2}$ is between 0.4 and 0.5. Repeating this process, we obtain $f(0.41) = 0.0120$ and $f(0.42) = -0.0161$ (with four decimal accuracy). Thus, the 2-place decimal approximation of this root is 0.41. Continuing in this way, we can locate the root between successive thousandths, ten thousandths, etc. There are various schemes for systematizing and shortening the calculations involved in finding decimal approximations of the real roots of a polynomial in $R[x]$. The interested reader can find these methods discussed in standard college algebra and theory of equations textbooks.
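The tabulation by tenths, hundredths, and so on can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `decimal_approximation` is hypothetical, the interval from `lo` to `hi` is assumed to isolate a single simple root, and `lo` is assumed to be a decimal fraction with fewer digits than the first step size. Exact fractions are used so that each sign test is reliable.

```python
from fractions import Fraction

def f(x):
    # The polynomial of Example 5.
    return x**4 + 4 * x**3 + x**2 - 6 * x + 2

def decimal_approximation(f, lo, hi, places):
    """Walk from lo in steps of 1/10, then 1/100, ..., stopping at each sign change."""
    lo, hi = Fraction(lo), Fraction(hi)
    for k in range(1, places + 1):
        step = Fraction(1, 10**k)
        x = lo
        # Advance while f keeps the same sign on [x, x + step].
        while x + step <= hi and f(x) * f(x + step) > 0:
            x += step
        lo, hi = x, min(x + step, hi)
    return lo

print(decimal_approximation(f, 0, Fraction(1, 2), 2))  # prints 41/100
```

The same call with `places=3` continues the search through the thousandths.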

1. Give the Sturm sequence of each of the following l)olynomials. (a) x3 x2 x 1 ~ 6 (b) x4 - 3x2 - 1 0 (c) xS - 5x - 2

+ + +

* If the multiplicity of the isolated root is even, then Theorem 9-10.1 will not help in locating the root. For the polynomial which we are considering, it is obvious that all of the roots are simple, because the sum of the multiplicities of all the roots is four, and there are four distinct roots.


2. Use Sturm's theorem to locate (between consecutive integers) all the real roots of the polynomials in Problem 1.

3. Let $f(x) = ax^2 + bx + c$, where $a$, $b$, and $c$ are real numbers, with $a \neq 0$. Find the Sturm sequence of $f(x)$. Use Sturm's theorem to show that $f(x)$ has real roots if and only if $b^2 \geq 4ac$.
4. Let $p$ and $q$ be real numbers, with $p \neq 0$. Show that the Sturm sequence of the polynomial $x^3 + px + q$ is

$$x^3 + px + q, \quad 3x^2 + p, \quad -\tfrac{2p}{3}x - q, \quad -\frac{4p^3 + 27q^2}{4p^2},$$

provided $27q^2 + 4p^3 \neq 0$. Use Sturm's theorem to show that $x^3 + px + q$ has one real root if $27q^2 + 4p^3 > 0$ and three real roots if $27q^2 + 4p^3 < 0$. [Hint: Consider the cases $p > 0$, $p < 0$ separately.]

5. Show that if $s_k(t) = 0$ in the sequence (9-15), then every term in the sequence is zero.

6. Find the 3-place decimal approximations of all the roots of the polynomial of Example 5.

9-12 Polynomials with rational coefficients. The fundamental theorem of algebra leads to a complete solution of the problem "what are the irreducible polynomials in $C[x]$ and in $R[x]$?" (See Theorems 9-8.2 and 9-8.4.) Determining the irreducible polynomials in $Q[x]$ is much more difficult. There are ways of testing whether or not a polynomial in $Q[x]$ is irreducible. However, all of these methods are rather complicated, and they do not lead to very interesting general results. For this reason, we will only consider a part of the general problem of determining the complete factorization of polynomials in $Q[x]$, namely, the determination of the linear factors. By the factor theorem, a polynomial $x - r$ with $r \in Q$ is a factor of $a(x)$ in $Q[x]$ if and only if $r$ is a root of $a(x)$. Suppose that

$$a(x) = \frac{u_0}{v_0} + \frac{u_1}{v_1}x + \frac{u_2}{v_2}x^2 + \cdots + \frac{u_n}{v_n}x^n,$$

where the numbers $u_i$ and $v_i \neq 0$ are integers. Let $v$ be a common multiple of the denominators $v_0, v_1, v_2, \ldots, v_n$, for example, $v = v_0 \cdot v_1 \cdot \ldots \cdot v_n$, or $v = [v_0, v_1, v_2, \ldots, v_n]$. Then the polynomial $b(x) = v \cdot a(x)$ has integral coefficients. Moreover, $b(r) = v \cdot a(r) = 0$ if and only if $a(r) = 0$. Thus, the problem of finding the monic linear factors of a polynomial in $Q[x]$ can be reduced to the problem of finding the rational roots of a polynomial in $Z[x]$. The following theorem shows that the rational roots of a polynomial in $Z[x]$ can be found by trial.
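The passage from $a(x)$ to $b(x) = v \cdot a(x)$ can be illustrated with a short Python sketch. The function name `clear_denominators` and the sample polynomial $\tfrac{1}{2} + \tfrac{2}{3}x + x^2$ are assumptions chosen for illustration; they do not come from the text. Here $v$ is taken to be the least common multiple of the denominators.

```python
from fractions import Fraction
from math import gcd

def clear_denominators(coeffs):
    """Return (v, b) where v is the l.c.m. of the denominators and b = v * a has integer coefficients."""
    v = 1
    for c in coeffs:
        v = v * c.denominator // gcd(v, c.denominator)
    return v, [int(c * v) for c in coeffs]

# Hypothetical example: a(x) = 1/2 + (2/3)x + x^2, coefficients listed from the constant term up.
a = [Fraction(1, 2), Fraction(2, 3), Fraction(1)]
print(clear_denominators(a))  # prints (6, [3, 4, 6])
```

Any other common multiple of the denominators would serve equally well, since multiplying by a nonzero constant does not change the roots.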


THEOREM 9-12.1. Let $a(x) = a_0 + a_1x + \cdots + a_{n-1}x^{n-1} + a_nx^n$ be a polynomial with integral coefficients. Suppose that $a_0 \neq 0$, $a_n \neq 0$, and $n \geq 1$. If $b$ and $c$ are relatively prime integers such that $b/c$ is a root of $a(x)$, then $b$ divides $a_0$ and $c$ divides $a_n$.

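Proof. If $b/c$ is a root of $a(x)$, then

$$a_0 + a_1\frac{b}{c} + \cdots + a_{n-1}\left(\frac{b}{c}\right)^{n-1} + a_n\left(\frac{b}{c}\right)^n = 0.$$

Multiplying this equation by $c^n$, we obtain

$$a_0c^n + a_1bc^{n-1} + \cdots + a_{n-1}b^{n-1}c + a_nb^n = 0.$$

Therefore,

$$-(a_0c^{n-1} + a_1bc^{n-2} + \cdots + a_{n-1}b^{n-1}) \cdot c = a_nb^n,$$

and

$$a_0c^n = b \cdot [-(a_1c^{n-1} + \cdots + a_{n-1}b^{n-2}c + a_nb^{n-1})].$$

These equalities imply that $c$ divides $a_nb^n$ and $b$ divides $a_0c^n$. Since $b$ and $c$ have no common prime factor by hypothesis, it follows that $(c, b^n) = 1$ and $(b, c^n) = 1$. Thus, by Theorem 5-2.6, $c$ divides $a_n$ and $b$ divides $a_0$.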
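Theorem 9-12.1 reduces the search for rational roots to a finite list of trials, which the following Python sketch carries out. It is an illustration only: the names `divisors` and `rational_roots` are hypothetical, and the coefficient list is assumed to consist of integers $a_0, a_1, \ldots, a_n$ with $a_0 \neq 0$ and $a_n \neq 0$.

```python
from fractions import Fraction

def divisors(n):
    """Positive divisors of the nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(coeffs):
    """Rational roots of a_0 + a_1 x + ... + a_n x^n, found by testing every b/c with b | a_0 and c | a_n."""
    candidates = {Fraction(sign * b, c)
                  for b in divisors(coeffs[0])
                  for c in divisors(coeffs[-1])
                  for sign in (1, -1)}

    def value(t):
        result = Fraction(0)
        for a in reversed(coeffs):  # Horner's rule
            result = result * t + a
        return result

    return sorted(t for t in candidates if value(t) == 0)

# 64x^3 - 88x^2 + 34x - 3, the polynomial studied with Sturm's theorem in Section 9-11.
print(rational_roots([-3, 34, -88, 64]))
# prints [Fraction(1, 8), Fraction(1, 2), Fraction(3, 4)]
```

The candidate set may contain the same fraction several times over in unreduced form; collecting the candidates in a set removes such repetitions.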

EXAMPLE 1. We will use Theorem 9-12.1 to show that 0 is the only rational root of the polynomial $a(x) = x^7 - 3x^6 + 2x^3 + x^2$. Clearly, 0 is a root of $a(x)$, and $a(x) = x^2(x^5 - 3x^4 + 2x + 1)$. If $r \neq 0$ is a rational root of $a(x)$, then $r$ is a root of $x^5 - 3x^4 + 2x + 1$. We can write $r = b/c$, where $b$ and $c$ are relatively prime integers. By Theorem 9-12.1, $b$ divides the constant term of $x^5 - 3x^4 + 2x + 1$, and $c$ divides the leading coefficient of this polynomial. That is, $b$ and $c$ both divide 1. Hence, $b$ and $c$ are either 1 or $-1$, so that $r = \pm 1$ also. However, $a(1) = 1^7 - 3 \cdot 1^6 + 2 \cdot 1^3 + 1^2 = 1$ and $a(-1) = (-1)^7 - 3(-1)^6 + 2(-1)^3 + (-1)^2 = -5$. Therefore, 0 is the only rational root of $a(x)$.

EXAMPLE 2. Let
a(x)
= x4

+ yx3 + $$2

- 2 3'

The roots of a(x) are the same as the roots of

If r = b/c is a rational root of 6a(x), where b and c are relatively prime integers, then by Theorem 9-12.1, b divides 4 and c divides 6. Therefore, the possibilities for r are *l, h 2 , rt4, A+) A+) *g, *S, A&. Testing each of these numbers, we find that a(-+)
=

O, a($)

O , and

-3

and


are the only rational roots of a ( x ) . The division algorithm gives the factorization a(x) = (X + ) ( x - + ) ( x 2 2x 2))

+ +

and i t is easy to see that x2

+ 2x + 2 is irreducible in Q[x].

EXAMPLE 3. Theorem 9-12.1 can be used in combination with some of the previous results in this chapter to obtain considerable information about the complete set of roots in C of a polynomial in Q [ x ] . Let

Since a ( x ) E Q [ x ] ,it f o l l o ~ ~ that s a ( x ) E R [ x ] . By Theorem 9-10.3, a real root c of a ( x ) satisfies -o < c < 1 3 3 .
3 -

The complete set of roots in C of a ( x ) is the same as the set of roots of

By Theorem 9-12.1, the possible rational roots of b ( x ) are f1, f7 , f a ,and f3. Since -7 < and 7 > it follo\vs that 7 and -7 cannot be roots of b ( x ) [for otherwise a ( x ) would have a root in Q C R which is not between the bounds for the real roots of a ( x ) ] . Testing the numbers f1, f5, f3 in b ( x ) , we find that b(-1) = 0 , b ( 3 ) = 0 , and that -1 and 3 are the only rational roots of b ( x ) . The division algorithm yields

-y

y,

in Q[x]. Further roots of b.(x) in C are roots of

From Theorem 9-12.1, the only possible rational roots of $c(x)$ are 1 and $-1$. Of course, 1 cannot be a root of $c(x)$, since it is not a root of $b(x)$. By substituting, we find that $c(-1) = 0$. Division gives

$$c(x) = (x + 1)(x^3 - x^2 - 1),$$

so that

$$b(x) = 3(x + 1)^2(x - 3)(x^3 - x^2 - 1)$$

in $Q[x]$. Let $d(x) = x^3 - x^2 - 1$. By Theorem 9-12.1, the only possible rational roots of $d(x)$ are 1 and $-1$. Since $d(1) \neq 0$ and $d(-1) \neq 0$, it follows that $x^3 - x^2 - 1$ is irreducible in $Q[x]$ (see Problem 3). Thus $a(x) = (x + 1)^2(x - 3)(x^3 - x^2 - 1)$ is the complete factorization of $a(x)$ into irreducibles in $Q[x]$. Further roots of $b(x)$ in $C$ are roots of $d(x)$. Regarding $d(x)$ as a polynomial in $R[x]$, we use Theorem 9-10.3 again, and find that every real root $c$ of $d(x)$ satisfies $-2 < c < 2$. The Sturm sequence for $d(x)$ is

$$x^3 - x^2 - 1, \quad 3x^2 - 2x, \quad \tfrac{2}{9}x + 1, \quad -\tfrac{279}{4},$$

and $N(-2) = 2$, $N(2) = 1$. Therefore, by Sturm's theorem, $d(x)$ has exactly one real root. This root is located between 1 and 2 since $d(1) = -1$ and $d(2) = 3$. The other roots of $d(x)$ are a pair of conjugate complex numbers (Theorem 9-8.3).

In summary, we have obtained the following information about the roots in $C$ of the polynomial $a(x)$: $-1$ is a double root; 3 is a simple root; there is a simple real root between 1 and 2 which is not rational; there is a pair of conjugate complex roots. Of course, the real and complex roots of $x^3 - x^2 - 1$ can be found in terms of square roots and cube roots, using the methods of Section 9-9.

The roots of polynomials in $Q[x]$ have many interesting properties. In the remainder of this section, we will examine some of the simplest ideas which are used in the study of the roots of rational polynomials. Our discussion will scratch the surface of an extensive branch of mathematics known as algebraic number theory.

DEFINITION 9-12.2. A complex number $u$ is called an algebraic number if $u$ is a root of some nonzero polynomial with rational coefficients. Complex numbers which are not algebraic are called transcendental.

Every rational number $r$ is an algebraic number, because $r$ is a root of $x - r$. Any number of the form $u = \sqrt[m]{r}$, where $r$ is rational, is algebraic, because $u$ is a root of $x^m - r$. The complex unit $i = \sqrt{-1}$ is an algebraic number. More generally, any number of the form $r + is$, $r \in Q$, $s \in Q$, is an algebraic number, because $r + is$ is a root of $x^2 - 2rx + (r^2 + s^2)$. Later we will show that the sum and product of any two algebraic numbers is an algebraic number, so that sums and products of numbers of these forms are also algebraic. We observed in Section 1-2 that the set of all algebraic numbers is denumerable (see the discussion following Example 5). Since the set $C$ of complex numbers is not denumerable, there must be many complex numbers which are not algebraic. That is, transcendental numbers certainly exist. However, it is not very easy to produce specific examples of transcendental numbers, and it is quite difficult to prove that particular numbers such as $\pi$ and $2^{\sqrt{2}}$ are transcendental.

According to Definition 9-12.2, a number $u$ is algebraic if it is a root of any nonzero polynomial in $Q[x]$. Of course, if $u$ is algebraic, then $u$ is a root of infinitely many polynomials with coefficients in $Q$. The following theorem tells us exactly what this set of polynomials can be.

THEOREM 9-12.3. Let $u$ be an algebraic number. Then there is a unique monic polynomial $p(x)$ of least degree having $u$ as a root. This polynomial $p(x)$ is irreducible, and it has the following property: if $a(x) \in Q[x]$, and $u$ is a root of $a(x)$, then $p(x)$ divides $a(x)$ in $Q[x]$.

The unique polynomial $p(x)$ described in this theorem is called the minimal polynomial of $u$. The degree of $u$ is defined to be the degree of the minimal polynomial of $u$. Thus, the rational numbers are exactly the algebraic numbers of degree one, and the numbers $r + \sqrt{s}$, where $r, s \in Q$ and $s$ is not a square in $Q$, are of degree two.

To prove Theorem 9-12.3, let $J = \{a(x) \in Q[x] \mid a(u) = 0\}$. That is, $J$ is the set of all polynomials in $Q[x]$ which have $u$ as a root. The assumption that $u$ is an algebraic number means that $J$ is a subset of $Q[x]$ which contains at least one nonzero polynomial. Therefore,

$$S = \{\mathrm{Deg}[a(x)] \mid a(x) \in J,\ a(x) \neq 0\}$$

is a nonempty subset of the set $N$ of all natural numbers. (Note that no nonzero constant polynomial belongs to $J$.) Consequently, by the well-ordering principle, $S$ contains a smallest number. That is, there is a nonzero polynomial $f(x) \in J$ such that $\mathrm{Deg}[f(x)] \leq \mathrm{Deg}[a(x)]$ for all nonzero $a(x) \in J$. Let $f(x) = a_0 + a_1x + \cdots + a_{n-1}x^{n-1} + a_nx^n$, where $a_n \neq 0$. Define

$$p(x) = a_n^{-1}f(x) = a_n^{-1}a_0 + a_n^{-1}a_1x + \cdots + a_n^{-1}a_{n-1}x^{n-1} + x^n.$$

Then $p(x)$ is a monic polynomial such that $p(u) = 0$ and $\mathrm{Deg}[p(x)] \leq \mathrm{Deg}[a(x)]$ for all nonzero $a(x) \in J$. We will show: (i) $p(x)$ is irreducible, and (ii) if $a(x) \in J$, then $p(x)$ divides $a(x)$ in $Q[x]$. It will then follow easily that $p(x)$ is unique.

Suppose that $p(x)$ is reducible. Then $p(x) = b(x) \cdot c(x)$, where $b(x)$ and $c(x)$ are nonzero polynomials in $Q[x]$ which have degrees less than $\mathrm{Deg}[p(x)]$. Since $b(u) \cdot c(u) = p(u) = 0$, it follows that either $b(u) = 0$, or $c(u) = 0$. Hence, by definition of $J$, either $b(x) \in J$, or $c(x) \in J$. This is impossible however, because $\mathrm{Deg}[p(x)] \leq \mathrm{Deg}[a(x)]$ for all nonzero $a(x) \in J$. Therefore, $p(x)$ is irreducible.

In order to prove (ii), let $a(x) \in J$. By the division algorithm, it is possible to write

$$a(x) = q(x) \cdot p(x) + r(x),$$

where $q(x) \in Q[x]$, $r(x) \in Q[x]$, and either $r(x) = 0$, or else $\mathrm{Deg}[r(x)] < \mathrm{Deg}[p(x)]$. Suppose that $r(x) \neq 0$. Then $\mathrm{Deg}[r(x)] < \mathrm{Deg}[p(x)]$.

Moreover, $r(u) = a(u) - q(u) \cdot p(u) = 0 - q(u) \cdot 0 = 0$, because $a(x) \in J$ and $p(x) \in J$. Thus, $r(x) \in J$. However, this is impossible, since $r(x) \in J$ implies that $\mathrm{Deg}[p(x)] \leq \mathrm{Deg}[r(x)]$. Consequently, $r(x) \neq 0$ is impossible. Therefore, $r(x) = 0$ and $a(x) = q(x) \cdot p(x)$. That is, $p(x)$ divides $a(x)$, which proves (ii).

It remains to show that $p(x)$ is unique. By choice, $p(x)$ is one monic polynomial of minimal degree in $J$. Suppose that $a(x)$ is another one. Then $\mathrm{Deg}[a(x)] = \mathrm{Deg}[p(x)]$. By what we have just proved, $p(x) \mid a(x)$. Therefore, $a(x)$ is a nonzero, constant multiple of $p(x)$ (see 9-4.1d). Since $a(x)$ and $p(x)$ are both monic, the constant must be one. That is, $a(x) = p(x)$. This establishes the uniqueness of $p(x)$.

EXAMPLE 4. Let $u = \sqrt{2}$. Then the minimal polynomial of $u$ is $x^2 - 2$, since $u$ is a root of this polynomial, but not of any polynomial of lower degree in $Q[x]$. Thus $\sqrt{2}$ is an algebraic number of degree two. The polynomials in $Q[x]$ which have $\sqrt{2}$ as a root are exactly those polynomials which are divisible by $x^2 - 2$. In particular, if $\sqrt{2}$ is a root of the rational polynomial $a(x)$, then $-\sqrt{2}$ is also a root of $a(x)$.

We wish to prove that the set of all algebraic numbers is a subring of the ring $C$ of all complex numbers. A preliminary result is needed, which is important in its own right.

THEOREM 9-12.4. Let $u$ be an algebraic number of degree $n$. Define

$$Q[u] = \{r_0 + r_1u + \cdots + r_{n-1}u^{n-1} \mid r_0, r_1, \ldots, r_{n-1} \in Q\}.$$

Then $Q[u]$ is closed under addition, multiplication, and negation, and the inverse of every nonzero element of $Q[u]$ is in $Q[u]$. Thus, $Q[u]$ is a field which is a subring* of $C$.

Proof. Let $U = \{a(u) \mid a(x) \in Q[x]\}$. Then it follows from (9-7.2) that $U$ is a subring of $C$. We will first prove that $Q[u] = U$. It is clear that $Q[u] \subseteq U$. Indeed $Q[u]$ is just the set of all complex numbers $r(u)$, where $r(x) \in Q[x]$ is such that either $r(x) = 0$, or else $\mathrm{Deg}[r(x)] < n$. On the other hand, suppose that $w \in U$. Then $w = a(u)$ for some $a(x) \in Q[x]$. Let $p(x)$ be the minimal polynomial of $u$. Then the degree of $p(x)$ is,

* In general, if $D$ is a subring of a ring $A$ and $a \in A$, then $D[a]$ denotes the smallest subring of $A$ containing $D$ and $a$. This notation seems to conflict with the use of $D[x]$ to denote the ring of polynomials with coefficients in $D$, but there is no contradiction because $D[x]$ is the smallest subring of $D[x]$ which contains $D$ and $x$. Throughout the rest of this section, the symbols $u$ and $v$ will always stand for algebraic numbers, and $x$ will denote an indeterminate as usual.


by definition, the degree $n$ of $u$. By the division algorithm, we can write $a(x) = q(x) \cdot p(x) + r(x)$, where $r(x) \in Q[x]$, and either $r(x) = 0$, or else $\mathrm{Deg}[r(x)] < \mathrm{Deg}[p(x)] = n$. Thus,

$$r(x) = r_0 + r_1x + \cdots + r_{n-1}x^{n-1},$$

where $r_0, r_1, \ldots,$ and $r_{n-1}$ are rational numbers. Moreover,

$$w = a(u) = q(u) \cdot p(u) + r(u) = r(u) = r_0 + r_1u + \cdots + r_{n-1}u^{n-1}.$$

Consequently $w \in Q[u]$. Since $w$ was any element of $U$, we have proved that $U \subseteq Q[u]$. Thus, $Q[u] = U$.

The only thing left to show is that every nonzero element of $Q[u]$ has an inverse in $Q[u]$. Let $w = r_0 + r_1u + \cdots + r_{n-1}u^{n-1}$ be an element of $Q[u]$ which is not zero. Then in particular, the polynomial $r(x) = r_0 + r_1x + \cdots + r_{n-1}x^{n-1}$ is not zero. Moreover, $\mathrm{Deg}[r(x)] \leq n - 1 < \mathrm{Deg}[p(x)]$. Hence, $p(x)$ does not divide $r(x)$. Since $p(x)$ is irreducible by Theorem 9-12.3, it follows that $p(x)$ and $r(x)$ are relatively prime [see (9-5.2)]. Therefore, by Theorem 9-4.4, polynomials $g(x)$ and $h(x)$ exist in $Q[x]$, such that

$$g(x) \cdot r(x) + h(x) \cdot p(x) = 1.$$

Substituting $x = u$ in this identity, we obtain

$$g(u) \cdot r(u) = g(u) \cdot r(u) + h(u) \cdot p(u) = 1.$$

Therefore, $w^{-1} = r(u)^{-1} = g(u) \in U = Q[u]$. This completes the proof.

THEOREM 9-12.5. If $u$ and $v$ are algebraic numbers, then $u + v$, $u \cdot v$, and $-u$ are algebraic numbers. If $u$ is a nonzero algebraic number, then $u^{-1}$ is an algebraic number.

In order to prove this theorem, it is necessary to use a result which will be established in Section 10-2 (see Theorem 10-2.9). The special case of Theorem 10-2.9 which we will use here can be stated as follows.

(9-12.6). Let $\{r_{i,j} \mid 1 \leq i \leq g,\ 0 \leq j \leq g\}$ be a set of rational numbers. Then there exist rational numbers $s_0, s_1, s_2, \ldots, s_g$, not all of which are zero, such that

$$r_{i,0}s_0 + r_{i,1}s_1 + \cdots + r_{i,g}s_g = 0 \quad \text{for } i = 1, 2, \ldots, g.$$

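Proof of Theorem 9-12.5. Suppose that the degree of $u$ is $m$ and the degree of $v$ is $n$. We will prove that $u + v$ is a root of a nonzero polynomial in $Q[x]$ which has degree at most $mn$. Therefore $u + v$ is algebraic of degree $\leq mn$. By Theorem 9-12.4, for any natural numbers $i$ and $j$, there exist rational numbers $a_{i,0}, a_{i,1}, a_{i,2}, \ldots, a_{i,m-1}$ and $b_{j,0}, b_{j,1}, b_{j,2}, \ldots, b_{j,n-1}$ such that

$$u^i = \sum_{k=0}^{m-1} a_{i,k}u^k, \qquad v^j = \sum_{l=0}^{n-1} b_{j,l}v^l.$$

Hence, by the binomial theorem, we have for $h = 1, 2, 3, \ldots, mn$,

$$(u + v)^h = \sum_{i=0}^{h} \binom{h}{i} u^i v^{h-i} = \sum_{k=0}^{m-1} \sum_{l=0}^{n-1} r_{k,l,h}\, u^k v^l, \quad \text{where } r_{k,l,h} = \sum_{i=0}^{h} \binom{h}{i} a_{i,k}\, b_{h-i,l}.$$

Since all of the binomial coefficients $\binom{h}{i}$ are natural numbers, it follows that each of the numbers $r_{k,l,h}$ is rational. It is also convenient to define $r_{k,l,0} = 1$ if $k = l = 0$ and $r_{k,l,0} = 0$ if $k > 0$ or $l > 0$, so that

$$(u + v)^0 = 1 = \sum_{k=0}^{m-1} \sum_{l=0}^{n-1} r_{k,l,0}\, u^k v^l.$$

By (9-12.6) (taking $g = mn$ and replacing the indices $i = 1, 2, \ldots, g$ by the $m \cdot n$ pairs $(k, l)$, $0 \leq k \leq m - 1$, $0 \leq l \leq n - 1$ in some order), there exist rational numbers $s_0, s_1, \ldots, s_{mn}$, not all of which are zero, such that

$$\sum_{h=0}^{mn} r_{k,l,h}\, s_h = 0$$

for all pairs $(k, l)$ with $0 \leq k \leq m - 1$ and $0 \leq l \leq n - 1$. Consequently,

$$\sum_{h=0}^{mn} s_h (u + v)^h = \sum_{h=0}^{mn} s_h \sum_{k=0}^{m-1} \sum_{l=0}^{n-1} r_{k,l,h}\, u^k v^l = \sum_{k=0}^{m-1} \sum_{l=0}^{n-1} \Bigl( \sum_{h=0}^{mn} r_{k,l,h}\, s_h \Bigr) u^k v^l = 0.$$

That is, $u + v$ is a root of the nonzero polynomial

$$s_0 + s_1x + s_2x^2 + \cdots + s_{mn}x^{mn}.$$

Therefore, $u + v$ is an algebraic number. A similar proof shows that $u \cdot v$ is a root of a nonzero polynomial of degree at most $mn$. Thus, $u \cdot v$ is algebraic. In particular $-u = (-1) \cdot u$ is algebraic. Finally, suppose that $u \neq 0$, and let the minimal polynomial of $u$ be

$$p(x) = c_0 + c_1x + \cdots + c_{m-1}x^{m-1} + x^m.$$

Then $c_0 \neq 0$, because $p(x)$ is irreducible, so that

$$1 = (-c_0^{-1}c_1)u + (-c_0^{-1}c_2)u^2 + \cdots + (-c_0^{-1}c_{m-1})u^{m-1} + (-c_0^{-1})u^m.$$

Therefore,

$$u^{-1} = (-c_0^{-1}c_1) + (-c_0^{-1}c_2)u + \cdots + (-c_0^{-1}c_{m-1})u^{m-2} + (-c_0^{-1})u^{m-1}.$$

Since the sums and products of algebraic numbers are algebraic, and since $u$ and each of the rational numbers $-c_0^{-1}c_1, -c_0^{-1}c_2, \ldots, -c_0^{-1}c_{m-1}, -c_0^{-1}$ is algebraic, it follows that $u^{-1}$ is algebraic. This completes the proof of Theorem 9-12.5.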
EXAMPLE 5. It is instructive to carry out the proof of Theorem 9-12.5 in a special case. Let $u = 1 + \sqrt{2}$ and $v = \sqrt{3}$. Then the minimal polynomials of $u$ and $v$ are $x^2 - 2x - 1$ and $x^2 - 3$, respectively. We have

$$u^2 = 1 + 2u, \qquad v^2 = 3,$$

so that

$$(u + v)^0 = 1,$$
$$(u + v)^1 = u + v,$$
$$(u + v)^2 = 4 + 2u + 2uv,$$
$$(u + v)^3 = 2 + 14u + 6v + 6uv,$$
$$(u + v)^4 = 32 + 48u + 8v + 32uv.$$

We wish to find rational numbers $s_0$, $s_1$, $s_2$, $s_3$, and $s_4$, not all zero, satisfying

$$s_0 + 4s_2 + 2s_3 + 32s_4 = 0,$$
$$s_1 + 2s_2 + 14s_3 + 48s_4 = 0,$$
$$s_1 + 6s_3 + 8s_4 = 0,$$
$$2s_2 + 6s_3 + 32s_4 = 0.$$

A method for solving such systems of equations will be developed in Section 10-2. However, it is easy to verify that

$$s_0 = -8, \quad s_1 = 16, \quad s_2 = -4, \quad s_3 = -4, \quad s_4 = 1$$

is a solution. Consequently,

$$(u + v)^4 - 4(u + v)^3 - 4(u + v)^2 + 16(u + v) - 8 = 0.$$

Therefore, $u + v = 1 + \sqrt{2} + \sqrt{3}$ is a root of $x^4 - 4x^3 - 4x^2 + 16x - 8$.

The proof that $uv$ is an algebraic number is somewhat simpler in this special case. Note that

$$(uv)^2 = u^2v^2 = (2u + 1) \cdot 3 = 3 + 6u,$$
$$(uv)^4 = u^4v^4 = (12u + 5) \cdot 9 = 45 + 108u.$$

Thus,

$$(uv)^4 - 18(uv)^2 + 9 = 0.$$

Consequently, $uv = \sqrt{3} + \sqrt{2}\sqrt{3}$ is a root of $x^4 - 18x^2 + 9$.

It can be shown that the polynomials $x^4 - 4x^3 - 4x^2 + 16x - 8$ and $x^4 - 18x^2 + 9$ are irreducible in $Q[x]$, so that if $u = 1 + \sqrt{2}$ and $v = \sqrt{3}$, then the degree of $u + v$ and $u \cdot v$ is exactly 4, the product of the degrees of $u$ and $v$. It may happen however that the degree of $u + v$ or of $u \cdot v$ is less than the product of the degrees of $u$ and $v$. For example, if $u = \sqrt{2}$ and $v = \sqrt[4]{2}$, then the degree of $u$ is 2, the degree of $v$ is 4, and the degrees of $u + v$ and $u \cdot v$ are both 4: $u + v$ is a root of $x^4 - 4x^2 - 8x + 2$, and $u \cdot v$ is a root of $x^4 - 8$.
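The two quartics found for $1 + \sqrt{2} + \sqrt{3}$ and $(1 + \sqrt{2})\sqrt{3}$ can be verified by exact arithmetic. The Python sketch below is an illustration only; its representation is an assumption made here, not something taken from the text: a number $a + b\sqrt{2} + c\sqrt{3} + d\sqrt{6}$ with integers $a, b, c, d$ is stored as the tuple `(a, b, c, d)`, and the function names are hypothetical.

```python
def mul(p, q):
    """Product of two numbers a + b*sqrt(2) + c*sqrt(3) + d*sqrt(6), stored as 4-tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 + 2 * b1 * b2 + 3 * c1 * c2 + 6 * d1 * d2,      # rational part
            a1 * b2 + b1 * a2 + 3 * (c1 * d2 + d1 * c2),            # sqrt(3)*sqrt(6) = 3*sqrt(2)
            a1 * c2 + c1 * a2 + 2 * (b1 * d2 + d1 * b2),            # sqrt(2)*sqrt(6) = 2*sqrt(3)
            a1 * d2 + d1 * a2 + b1 * c2 + c1 * b2)                  # sqrt(2)*sqrt(3) = sqrt(6)

def evaluate(coeffs, w):
    """Value at w of the polynomial with integer coefficients listed from the constant term up."""
    result = (0, 0, 0, 0)
    for c in reversed(coeffs):  # Horner's rule
        result = mul(result, w)
        result = (result[0] + c,) + result[1:]
    return result

u_plus_v = (1, 1, 1, 0)   # 1 + sqrt(2) + sqrt(3)
u_times_v = (0, 0, 1, 1)  # (1 + sqrt(2)) * sqrt(3) = sqrt(3) + sqrt(6)

print(evaluate([-8, 16, -4, -4, 1], u_plus_v))   # prints (0, 0, 0, 0)
print(evaluate([9, 0, -18, 0, 1], u_times_v))    # prints (0, 0, 0, 0)
```

Because only integers are involved, a result of `(0, 0, 0, 0)` is an exact confirmation rather than a numerical approximation.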

It is convenient to reformulate our main results on algebraic numbers.

THEOREM 9-12.7. The set $A$ of all algebraic numbers is a field which is a subring of $C$. If $u$ is any algebraic number, then the field $Q[u]$ is a subring of $A$.

Proof. By Theorem 9-12.5, the set $A$ of all algebraic numbers is a field with respect to the operations of addition, multiplication, and negation in $C$. That is, $A$ is a subring of $C$. If $v$ is any element of $Q[u]$, where $u$ is an algebraic number, then by the definition of $Q[u]$, $v$ is a sum of products of algebraic numbers. Thus, by Theorem 9-12.5, $v \in A$. Therefore, $Q[u] \subseteq A$.

1. Find al1 of the rational roots of the following polynomials. (a) 2x3 - 7x2 10x - 6 (b) x3 - ,X 3x - 2 (C) x3 - S X 2 - $x + -1. 16 (d) x3 - 48x 64 (e) x4 - 52 - 1 (f) 2~~ - ~5 - 2 ~ 4 ~3 2x2 32 - 2

+ +

+ + +

2. Prove that if r is a rational root of a monic polynomial with integral coefficients, then r is an integer.

3. Prove that a polynomial of degree 2 or 3 in Q[x]is irreducible in Q[x]if i t has no rational root. Use this result to show that the following polynomials are irreducible in Q[x]. (b) x2 +X - 1 (a> x2 II: 1 (c) x3 37x2 2 1 1 ~ - 1 ( d ) x3 - 25x - 5

+ ++

4. Give the complete factorization in Q[x]of the following polynomials. (a) x4 - 1 (b) 2 ~ 4 x3 2 ~ 2 - 1 (e) x4 x2 1

+ + + +

5. For the following polynomials in Q[x]determine al1 rational roots, and the number and approximate location of al1 real roots. (a) x4 -3x3 f$x2 &x - & ( b ) x5 4x4 7x3 7x2 4x 1 (c) x 7 + Z J i x 6 + $ x 5 - Yx4- 3 5 3 - 3x2+ GX - 4 xX 3

+ + + + + + + +


+ d3, -+

6. Find the minimal polynomial of the following algebraic numbers: - i d 3 / 2 , 3YS.

+,

7. Suppose that $r$ and $s$ are rational numbers and $s$ is not the square of a rational number. Prove (a) the minimal polynomial of $r + \sqrt{s}$ is $x^2 - 2rx + (r^2 - s)$; (b) if $a(x) \in Q[x]$ is such that $r + \sqrt{s}$ is a root of $a(x)$, then $r - \sqrt{s}$ is also a root of $a(x)$.

8. Carry out the proof that if u and v are algebraic numbers, then u an algebraic number in the following special case. (a) u = d Z , v = d 5 (b) u = 43, v = +S.

+ v is

9. Give the details of the proof that if $u$ and $v$ are algebraic numbers of degree $m$ and $n$, respectively, then $uv$ is algebraic of degree at most $mn$.


10. Let p(x) and q(x) be distinct monic irreducible polynomials in Q[x]. Prove that there is no complex number which is a root of both p(x) and q(x).
11. Show that if $u$ is an algebraic number of degree $n$, then $-u$ is of degree $n$.

12. Let $p(x)$ be irreducible in $Q[x]$. Suppose that $u$ and $v$ are two roots of $p(x)$. Prove that the fields $Q[u]$ and $Q[v]$ are isomorphic.

CHAPTER 10

SYSTEMS OF EQUATIONS AND MATRICES


10-1 Polynomials in several indeterminates. In Section 9-2 we showed that beginning with any integral domain $D$, a domain $D[x]$ of polynomials with coefficients in $D$ could be constructed. In particular, $D$ itself can be taken to be a domain of polynomials. In fact, this process can be repeated any number of times to obtain polynomials in several indeterminates. In order to avoid confusion, it is of course necessary to use different symbols to designate the various indeterminates. The symbols $x$, $y$, and $z$ are usually used in discussing polynomials in one, two, or three indeterminates; in discussions involving larger numbers of indeterminates, $x_1, x_2, x_3, \ldots$ are more convenient.

DEFINITION 10-1.1. Let $D$ be an integral domain. The domain of polynomials in the distinct indeterminates $x_1, x_2, \ldots, x_r$ with coefficients in $D$ is defined by induction on $r$. For $r = 1$, $D[x_1]$ is the integral domain of polynomials in $x_1$ with coefficients in $D$, defined as in Section 9-2. If $r > 1$ and $D[x_1, x_2, \ldots, x_{r-1}]$ has been defined, let

$$D[x_1, x_2, \ldots, x_r] = (D[x_1, x_2, \ldots, x_{r-1}])[x_r]$$

be the integral domain of polynomials in $x_r$ with coefficients in $D[x_1, x_2, \ldots, x_{r-1}]$. The elements of $D[x_1, x_2, \ldots, x_r]$ are called polynomials in $x_1, x_2, \ldots, x_r$ with coefficients in $D$.

According to Definitions 10-1.1 and 9-2.1, each element of $D[x_1, x_2, \ldots, x_r]$ can be expressed uniquely in the form

$$f_0 + f_1x_r + f_2x_r^2 + \cdots + f_nx_r^n, \tag{10-1}$$

where $f_i \in D[x_1, x_2, \ldots, x_{r-1}]$. If $r = 2$, then each $f_i$ is a polynomial in $x_1$, which can be expressed in the form $\sum_{j=0}^{m_i} a_{i,j}x_1^j$ with $a_{i,j} \in D$. Choose $m$ to be the largest of the integers $m_0, m_1, \ldots, m_n$ and define $a_{i,j} = 0$ if $m_i < j \leq m$. Then the polynomial (10-1) (in the case $r = 2$) is

$$\sum_{i=0}^{n} \Bigl( \sum_{j=0}^{m} a_{i,j}x_1^j \Bigr) x_2^i = \sum_{i=0}^{n} \sum_{j=0}^{m} a_{i,j}x_1^jx_2^i.$$

Moreover, this expression is unique. That is, if

$$\sum_{i=0}^{n} \sum_{j=0}^{m} a_{i,j}x_1^jx_2^i = \sum_{i=0}^{n} \sum_{j=0}^{m} b_{i,j}x_1^jx_2^i,$$

where all $a_{i,j}$ and $b_{i,j}$ are in $D$, then $a_{i,j} = b_{i,j}$ for all $i$ and $j$. In fact, define

$$f_i = \sum_{j=0}^{m} a_{i,j}x_1^j \quad \text{and} \quad g_i = \sum_{j=0}^{m} b_{i,j}x_1^j$$

for all $i$. Then $\sum_{i=0}^{n} f_ix_2^i = \sum_{i=0}^{n} g_ix_2^i$. By uniqueness of the representation (10-1), it follows that

$$\sum_{j=0}^{m} a_{i,j}x_1^j = f_i = g_i = \sum_{j=0}^{m} b_{i,j}x_1^j$$

for all $i$. Therefore, by Definition 9-2.1, $a_{i,j} = b_{i,j}$ for all $i$ and $j$.

In general, it can be shown by induction on $r$ that each polynomial in $D[x_1, x_2, \ldots, x_r]$ can be expressed uniquely as a multiple sum

$$\sum_{i_1=0}^{n_1} \sum_{i_2=0}^{n_2} \cdots \sum_{i_r=0}^{n_r} a_{i_1,i_2,\ldots,i_r}\, x_1^{i_1}x_2^{i_2} \cdots x_r^{i_r}, \tag{10-2}$$

where for each string $i_1, i_2, \ldots, i_r$ of integers satisfying $0 \leq i_1 \leq n_1$, $0 \leq i_2 \leq n_2$, $\ldots$, $0 \leq i_r \leq n_r$, $a_{i_1,i_2,\ldots,i_r}$ is an element of $D$. The existence of a representation of the form (10-2) is the reason why the elements of $D[x_1, x_2, \ldots, x_r]$ are called polynomials in $x_1, x_2, \ldots, x_r$ with coefficients in $D$. Because it is cumbersome, the expression (10-2) is frequently shortened to

$$\sum_i a_i\, x_1^{i_1}x_2^{i_2} \cdots x_r^{i_r},$$

where $i$ stands for the ordered string $(i_1, i_2, \ldots, i_r)$, and the sum is over a finite number of such strings. It is sometimes convenient to denote polynomials in $r$ indeterminates by expressions such as $a(x_1, x_2, \ldots, x_r)$, $f(x_1, x_2, \ldots, x_r)$, etc.

The statement that the representation $\sum_i a_i\, x_1^{i_1}x_2^{i_2} \cdots x_r^{i_r}$ of a polynomial in $D[x_1, x_2, \ldots, x_r]$ is unique means that

$$\sum_i a_i\, x_1^{i_1}x_2^{i_2} \cdots x_r^{i_r} = \sum_i b_i\, x_1^{i_1}x_2^{i_2} \cdots x_r^{i_r}$$

only if $a_i = b_i$ for all $i = (i_1, i_2, \ldots, i_r)$. This fact is very important. Many definitions concerning polynomials in several indeterminates are stated in terms of the representation of polynomials in the form

$$\sum_i a_i\, x_1^{i_1}x_2^{i_2} \cdots x_r^{i_r}.$$

The concepts introduced in this way are well defined because of the uniqueness of the representation (a fact which is often not mentioned).

Those polynomials in $D[x_1, x_2, \ldots, x_r]$ which contain only the indeterminates $x_{j_1}, x_{j_2}, \ldots, x_{j_s}$, where $j_1, j_2, \ldots, j_s$ are distinct elements of the set $\{1, 2, \ldots, r\}$, form a subring of $D[x_1, x_2, \ldots, x_r]$. This subring is isomorphic to the ring of all polynomials in any $s$ indeterminates with coefficients in $D$. It is natural to denote this subring of $D[x_1, x_2, \ldots, x_r]$ by $D[x_{j_1}, x_{j_2}, \ldots, x_{j_s}]$. For example, a polynomial $\sum_i \sum_j \sum_k a_{i,j,k}\, x^ky^iz^j$ such that $a_{i,j,k} = 0$ for all $k > 0$ can be expressed as

$$\sum_i \sum_j \sum_k a_{i,j,k}\, x^ky^iz^j = \sum_i \sum_j (a_{i,j,0}\, x^0)\, y^iz^j = \sum_i \sum_j b_{i,j}\, y^iz^j,$$

where $a_{i,j,0}\, x^0 = b_{i,j} \in D$. The set of all such polynomials is the subring of $D[x, y, z]$, which we denote by $D[y, z]$. In this way, the rings of polynomials in the various subsets of $\{x_1, x_2, \ldots, x_r\}$ are identified with subrings of $D[x_1, x_2, \ldots, x_r]$.

If $a(x_1, x_2, \ldots, x_r)$ is a polynomial in $D[x_1, x_2, \ldots, x_r]$, then it is clear from the representation (10-2) that for each natural number $j \leq r$, we can think of $a(x_1, x_2, \ldots, x_r)$ as a polynomial in $x_j$ with coefficients in $D[x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_r]$. Thus, no distinction is made between $D[x_1, x_2, \ldots, x_r]$ and $D[x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_r, x_j]$. In general, if $i_1, i_2, \ldots, i_r$ is any permutation of $1, 2, \ldots, r$, then $D[x_{i_1}, x_{i_2}, \ldots, x_{i_r}]$ is regarded as the same domain of polynomials as $D[x_1, x_2, \ldots, x_r]$. For example, the polynomial

$$x^4 + 5y + 2x^2yz - 3z + 3x^3z^2$$

is expressed as

$$(x^4 + 5y) + (2x^2y - 3)z + (3x^3)z^2$$

when considered as a polynomial in $D[x, y][z] = D[x, y, z]$. On the other hand, the same polynomial can be written in the form

$$(5y - 3z) + (2yz)x^2 + (3z^2)x^3 + x^4,$$

which is a polynomial in $D[y, z][x] = D[y, z, x]$.

The notion of the degree of a polynomial can be generalized in several ways to polynomials in several indeterminates. When $a(x_1, x_2, \ldots, x_r) \in D[x_1, x_2, \ldots, x_r]$ is regarded as a polynomial in $x_j$ with coefficients in $D[x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_r]$, we can use Definition 9-3.1 to define the $x_j$-degree of $a(x_1, x_2, \ldots, x_r)$. That is, if

$$a(x_1, x_2, \ldots, x_r) = \sum_{i=0}^{n} f_i(x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_r)\, x_j^i,$$

where $f_n(x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_r) \neq 0$, then the $x_j$-degree of $a(x_1, x_2, \ldots, x_r)$ is $n$. For example, $\tfrac{1}{2}x^2y + 2xy^3 + 1 = 1 + (2y^3)x + (\tfrac{1}{2}y)x^2 = 1 + (\tfrac{1}{2}x^2)y + (2x)y^3$, so that

$$\mathrm{Deg}_x[\tfrac{1}{2}x^2y + 2xy^3 + 1] = 2, \qquad \mathrm{Deg}_y[\tfrac{1}{2}x^2y + 2xy^3 + 1] = 3.$$

Of course, the properties of the degree of a polynomial listed in Theorem 9-3.2 are satisfied by $\mathrm{Deg}_{x_j}$ for each $x_j$. It is also possible to define the total degree of

$$\sum_i a_i\, x_1^{i_1}x_2^{i_2} \cdots x_r^{i_r}$$

to be the largest of the sums $i_1 + i_2 + \cdots + i_r$ for which $a_{i_1,i_2,\ldots,i_r}$ is not zero. For example, the total degree of $\tfrac{1}{2}x^2y + 2xy^3 + 1$ is four. It is easy to prove the analogue of Theorem 9-3.2 for the total degree.

(10-1.2). Let $a(x_1, x_2, \ldots, x_r)$ and $b(x_1, x_2, \ldots, x_r)$ be nonzero polynomials of total degrees $m$ and $n$ respectively. Then
(a) $a(x_1, x_2, \ldots, x_r) \cdot b(x_1, x_2, \ldots, x_r)$ has total degree $m + n$;
(b) $a(x_1, x_2, \ldots, x_r) + b(x_1, x_2, \ldots, x_r)$ is either zero, or has total degree $\leq \max\{m, n\}$;
(c) if $m \neq n$, then the total degree of $a(x_1, x_2, \ldots, x_r) + b(x_1, x_2, \ldots, x_r)$ is equal to $\max\{m, n\}$.
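The $x_j$-degree and the total degree are easy to compute once a polynomial is stored by its exponent strings. The Python sketch below is an illustration under assumptions: a polynomial is a dictionary mapping each exponent string $(i_1, \ldots, i_r)$ to its coefficient, the function names are hypothetical, and the sample polynomial $c \cdot x^2y + 2xy^3 + 1$ is taken with $c = \tfrac{1}{2}$, a value assumed here for illustration (the degrees do not depend on it as long as $c \neq 0$).

```python
from fractions import Fraction

def total_degree(poly):
    """Largest i_1 + ... + i_r over the terms with nonzero coefficient."""
    return max(sum(exps) for exps, c in poly.items() if c != 0)

def degree_in(poly, j):
    """Degree in the indeterminate with index j (counting from 0)."""
    return max(exps[j] for exps, c in poly.items() if c != 0)

def multiply(p, q):
    """Product of two polynomials in the same indeterminates."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(i + j for i, j in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

# (1/2) x^2 y + 2 x y^3 + 1 in the indeterminates (x, y); the coefficient 1/2 is an assumed value.
a = {(2, 1): Fraction(1, 2), (1, 3): 2, (0, 0): 1}
b = {(1, 0): 1, (0, 1): 1}  # x + y, total degree 1

print(degree_in(a, 0), degree_in(a, 1), total_degree(a))  # prints 2 3 4
print(total_degree(multiply(a, b)))                       # prints 5
```

The last line is an instance of (10-1.2a): the total degree of the product is $4 + 1$.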

We leave the proof of these facts for the reader to supply.

The arithmetical properties of the rings $F[x]$ with $F$ a field cannot be generalized to polynomial domains $F[x_1, x_2, \ldots, x_r]$ with $r > 1$. The most important results in Sections 9-3 and 9-4 are false in $F[x_1, x_2, \ldots, x_r]$ when $r > 1$. Surprisingly enough, the unique factorization theorem is true in $F[x_1, x_2, \ldots, x_r]$, although it is proved in a different way than Theorem 9-5.4. We will not enter into a discussion of these matters, but will only note the following example.

EXAMPLE 1. The polynomials $x$ and $y$ in $Q[x, y]$ clearly have only nonzero rational numbers as common divisors. Hence, 1 is a greatest common divisor of $x$ and $y$ (in the sense explained in Section 5-2). It is not hard to see, however, that there are no polynomials $f(x, y)$ and $g(x, y)$ in $Q[x, y]$ such that

$$f(x, y) \cdot x + g(x, y) \cdot y = 1.$$

Therefore, the analogue of Theorem 9-4.4 fails in $Q[x, y]$.

The definition of substitution given in 9-7.1 can be extended to polynomials in severa1 indeterminates.

DEFINITION 10-1.3. Let $D$ be an integral domain, and let $A$ be a commutative ring which contains $D$ as a subring. Suppose that

$$a(x_1, x_2, \ldots, x_r) = \sum_i a_i\, x_1^{i_1}x_2^{i_2} \cdots x_r^{i_r}$$

is in $D[x_1, x_2, \ldots, x_r]$. Let $(u_1, u_2, \ldots, u_r)$ be an ordered string of elements of $A$. Then the element

$$\sum_i a_i\, u_1^{i_1}u_2^{i_2} \cdots u_r^{i_r}$$

in $A$ is called the value of $a(x_1, x_2, \ldots, x_r)$ for $x_1 = u_1$, $x_2 = u_2$, $\ldots$, and $x_r = u_r$, and this value is denoted by $a(u_1, u_2, \ldots, u_r)$. The element $a(u_1, u_2, \ldots, u_r)$ is said to be obtained by substituting $u_1, u_2, \ldots, u_r$ for $x_1, x_2, \ldots, x_r$ in $a(x_1, x_2, \ldots, x_r)$.
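Substitution in the sense of Definition 10-1.3 can be sketched in Python as follows. This is an illustration only: the polynomial is assumed to be stored as a dictionary from exponent strings to coefficients, the function name `substitute` is hypothetical, and Python's built-in integers and complex numbers stand in for the ring $A$. The point $(3, 4, 5)$ in the second call is a sample chosen here, not one from the text.

```python
def substitute(poly, values):
    """Value of the polynomial when the k-th indeterminate is replaced by values[k]."""
    total = 0
    for exps, coeff in poly.items():
        term = coeff
        for u, e in zip(values, exps):
            term = term * u**e
        total = total + term
    return total

# a(x, y, z) = x^2 + y^2 - z^2
a = {(2, 0, 0): 1, (0, 2, 0): 1, (0, 0, 2): -1}

print(substitute(a, (1, 1j, -1)) == -1)  # prints True, since 1 + i^2 - 1 = -1
print(substitute(a, (3, 4, 5)))          # prints 0
```

Values involving square roots would call for an exact representation of those numbers; floating-point arithmetic would give only approximate results.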

EXAMPLE 2. Let D = R, a(x, y, z) = x2 y2 - z2. If A = C, the value = -1. If of a($, y, z) a t (1, i, -1) is a(1, i, -1) = l2 (i)2 A = R, the value of a(x, y, z) a t ( d 2 , 4 2 , 2) is a ( d 2 , d 2 , 2) = ( d 2 ) 2 ( d S ) 2 - (2)2 = O. Let A = R[x, y]. Then the value of a(x, y, z) a t


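The property of substitution given in (9-7.2) can be generalized.

(10-1.4). Let $D$ be an integral domain which is a subring of the commutative ring $A$. Let $f(x_1, \ldots, x_r)$, $a(x_1, \ldots, x_r)$, and $b(x_1, \ldots, x_r)$ be in $D[x_1, x_2, \ldots, x_r]$. Suppose that $u_1, u_2, \ldots, u_r$ are in $A$.

(a) If $f(x_1, \ldots, x_r) = a(x_1, \ldots, x_r) + b(x_1, \ldots, x_r)$, then

$$f(u_1, \ldots, u_r) = a(u_1, \ldots, u_r) + b(u_1, \ldots, u_r).$$

(b) If $f(x_1, \ldots, x_r) = a(x_1, \ldots, x_r) \cdot b(x_1, \ldots, x_r)$, then

$$f(u_1, \ldots, u_r) = a(u_1, \ldots, u_r) \cdot b(u_1, \ldots, u_r).$$

(c) If $f(x_1, x_2, \ldots, x_r)$ does not contain $x_j$, then

$$f(u_1, \ldots, u_{j-1}, u_j, u_{j+1}, \ldots, u_r) = f(u_1, \ldots, u_{j-1}, v, u_{j+1}, \ldots, u_r)$$

for all $v \in A$.

(d) Let $g(x_1, x_2, \ldots, x_s) \in D[x_1, x_2, \ldots, x_s]$, $a_i(x_1, x_2, \ldots, x_r) \in D[x_1, x_2, \ldots, x_r]$ for $i = 1, 2, \ldots, s$, and let $u_1, u_2, \ldots, u_r \in A$. If

$$h(x_1, \ldots, x_r) = g(a_1(x_1, \ldots, x_r), a_2(x_1, \ldots, x_r), \ldots, a_s(x_1, \ldots, x_r)),$$

then

$$h(u_1, \ldots, u_r) = g(a_1(u_1, \ldots, u_r), a_2(u_1, \ldots, u_r), \ldots, a_s(u_1, \ldots, u_r)).$$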

The statements (a), (b), and (c) are easily proved by means of the generalized commutative, associative, and distributive laws of operation in a ring. (See Section 9-7 for the proof of (b) in the case $r = 1$.) The statement (d) can be obtained from (a), (b), and (c) by induction on $s$ (see Problem 14 below). Part (d) includes (a) and (b) as the special cases in which $g(x_1, x_2) = x_1 + x_2$ and $g(x_1, x_2) = x_1 x_2$. Another important consequence of (d) is the fact that the result of substituting for the indeterminates in a polynomial does not depend on the way in which the polynomial is expressed. For example,


in $Z[x_1, x_2, x_3, x_4]$. If we let

then

It follows from (10-1.4d) that

for any $u_1, u_2, u_3, u_4$ in a commutative ring containing $Z$ as a subring. Of course, this fact could be shown directly.

DEFINITION 10-1.5. Let $a(x_1, x_2, \ldots, x_r) \in D[x_1, x_2, \ldots, x_r]$, where $D$ is an integral domain. Let $A$ be a commutative ring containing $D$. If $u_1, u_2, \ldots, u_r$ are in $A$, then the ordered string $(u_1, u_2, \ldots, u_r)$ is called a zero of $a(x_1, x_2, \ldots, x_r)$ [or a solution of $a(x_1, x_2, \ldots, x_r) = 0$] in the ring $A$ if $a(u_1, u_2, \ldots, u_r) = 0$. More generally, if

$$a_1(x_1, \ldots, x_r),\ a_2(x_1, \ldots, x_r),\ \ldots,\ a_s(x_1, \ldots, x_r)$$

are polynomials in $D[x_1, x_2, \ldots, x_r]$, then $(u_1, u_2, \ldots, u_r)$ is called a solution of the system of equations

$$a_1(x_1, \ldots, x_r) = 0,\quad a_2(x_1, \ldots, x_r) = 0,\quad \ldots,\quad a_s(x_1, \ldots, x_r) = 0$$

in $A$ if $a_i(u_1, u_2, \ldots, u_r) = 0$ for $i = 1, 2, \ldots, s$.


EXAMPLE 3. Let $a(x, y) \in R[x, y]$. The zeros $(u, v)$ of $a(x, y)$ in $R$ can be considered as the coordinates of points in the cartesian plane. The set of all such points constitutes what is called an algebraic curve (possibly degenerate, that is, the empty set, or a finite number of points). For example, if $a(x, y) = x^2 + y^2 - 1$, the set of all points $(u, v)$ which are zeros of $a(x, y)$ is the same as the set of all points which are at a distance one from the origin. Hence, the solutions in $R$ of $a(x, y) = 0$, when plotted as points in the cartesian plane, form a circle of radius one with center at the origin.

EXAMPLE 4. Let $a(x, y, z) \in R[x, y, z]$. The zeros $(u, v, w)$ of $a(x, y, z)$ in $R$ can be considered as the coordinates of points in three-dimensional cartesian space (by a process which is similar to the representation of number pairs by points in the plane). The set of all zeros in $R$ of a polynomial $a(x, y, z) \in R[x, y, z]$ constitutes what is called an algebraic surface (possibly degenerate, that is, the empty set, or a finite set of points and algebraic curves). For example, let $a(x, y, z) = x^2 + y^2 - z^2$. It is possible to show that the set of all zeros of $a(x, y, z)$ in $R$ lies on two cones with their vertices meeting at the origin and with their axes extending along the $z$-axis in space (see Fig. 10-1). The zero

$$(x^2 - y^2,\ 2xy,\ x^2 + y^2)$$

of $a(x, y, z)$ in $R[x, y]$ is called a parametrization of the upper half of this surface. The points on the upper cone are exactly those solutions $(w_1, w_2, w_3)$ in $R$ of $a(x, y, z) = 0$ with $w_3 \ge 0$. If any real numbers $u$ and $v$ are substituted for $x$ and $y$, respectively, in $(x^2 - y^2, 2xy, x^2 + y^2)$, we obtain a zero $(u^2 - v^2, 2uv, u^2 + v^2)$ in $R$ of $a(x, y, z)$ with $u^2 + v^2 \ge 0$, and therefore a point on the upper cone. The reader can show conversely that any zero $(w_1, w_2, w_3)$ in $R$ of $a(x, y, z)$ with $w_3 \ge 0$ is of the form $w_1 = u^2 - v^2$, $w_2 = 2uv$, $w_3 = u^2 + v^2$ for suitable real numbers $u$ and $v$.
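The parametrization of the upper cone can be spot-checked with exact integer arithmetic. The sketch below is only an illustration; the function names `a` and `cone_point` are our own, and checking a grid of integer values of $u$ and $v$ is evidence for the identity $(u^2 - v^2)^2 + (2uv)^2 = (u^2 + v^2)^2$, not a proof of it.

```python
def a(x, y, z):
    """The polynomial a(x, y, z) = x^2 + y^2 - z^2."""
    return x * x + y * y - z * z

def cone_point(u, v):
    """The point (u^2 - v^2, 2uv, u^2 + v^2) obtained by substituting u, v."""
    return (u * u - v * v, 2 * u * v, u * u + v * v)

grid = [(u, v) for u in range(-5, 6) for v in range(-5, 6)]

# Every substituted point is a zero of a(x, y, z) ...
all_zeros = all(a(*cone_point(u, v)) == 0 for u, v in grid)
# ... and has nonnegative third coordinate, so it lies on the upper cone.
all_upper = all(cone_point(u, v)[2] >= 0 for u, v in grid)
```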

EXAMPLE 5. Let $a_1(x, y, z) = x^2 + y^2 - z^2$, $a_2(x, y, z) = x^2 + y^2 - 1$ be in $R[x, y, z]$. The zeros in $R$ of the system $a_1(x, y, z) = 0$, $a_2(x, y, z) = 0$ consist of all $(u, v, \pm 1)$ with $u^2 + v^2 = 1$. Thus, in three-dimensional cartesian coordinates, the set of all these zeros forms two circles of radius one in space (see Fig. 10-2).


The branch of mathematics which is concerned with the zeros of systems of polynomials in several indeterminates is known as algebraic geometry. In recent years, the geometric aspects of algebraic geometry have become subordinate to the algebraic features of the theory.

Each of the rings $D[x_1, x_2, \ldots, x_r]$ contains an important class of special polynomials, the symmetric polynomials. Ordinarily, a polynomial is changed into a different polynomial when its indeterminates are permuted. For example, if $a(x, y, z) = x + y^2 + z^3$, then $a(z, x, y) = z + x^2 + y^3$, $a(y, z, x) = y + z^2 + x^3$, etc. However, certain polynomials are left unchanged by all permutations of their indeterminates. For instance, let $a(x, y) = x^2 + xy + y^2$. The only permutations of $(x, y)$ are

$$\begin{pmatrix} x & y \\ x & y \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} x & y \\ y & x \end{pmatrix}.$$

Obviously, the first of these permutations does not change $a(x, y)$. The second permutation changes $a(x, y)$ into $a(y, x)$. However,

$$a(y, x) = y^2 + yx + x^2 = x^2 + xy + y^2 = a(x, y)$$

by the commutative and associative laws.

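DEFINITION 10-1.6. A polynomial $a(x_1, x_2, \ldots, x_r)$ in $D[x_1, x_2, \ldots, x_r]$ is called symmetric if it has the property that for any permutation

$$\begin{pmatrix} 1 & 2 & \cdots & r \\ j_1 & j_2 & \cdots & j_r \end{pmatrix}$$

of the set $\{1, 2, 3, \ldots, r\}$,

$$a(x_{j_1}, x_{j_2}, \ldots, x_{j_r}) = a(x_1, x_2, \ldots, x_r).$$

That is, $a(x_1, x_2, \ldots, x_r)$ is symmetric if every rearrangement of the indeterminates in $a(x_1, x_2, \ldots, x_r)$ leaves this polynomial unchanged.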

It is not necessary to check every permutation of $(1, 2, \ldots, r)$ to determine whether a polynomial $a(x_1, x_2, \ldots, x_r)$ is symmetric.

(10-1.7). Let $a(x_1, x_2, \ldots, x_r) \in D[x_1, x_2, \ldots, x_r]$. Then $a(x_1, x_2, \ldots, x_r)$ is symmetric if and only if for every pair $i, j$ of natural numbers with $1 \le i < j \le r$,

$$a(x_1, \ldots, x_{i-1}, x_j, x_{i+1}, \ldots, x_{j-1}, x_i, x_{j+1}, \ldots, x_r) = a(x_1, x_2, \ldots, x_r).$$


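That is, interchanging $x_i$ and $x_j$ has no effect on $a(x_1, x_2, \ldots, x_r)$.

Proof. Suppose that $a(x_1, x_2, \ldots, x_r)$ is symmetric. Then since

$$\begin{pmatrix} 1 & \cdots & i-1 & i & i+1 & \cdots & j-1 & j & j+1 & \cdots & r \\ 1 & \cdots & i-1 & j & i+1 & \cdots & j-1 & i & j+1 & \cdots & r \end{pmatrix}$$

is a permutation of $(1, 2, \ldots, r)$, it follows from Definition 10-1.6 that

$$a(x_1, \ldots, x_{i-1}, x_j, x_{i+1}, \ldots, x_{j-1}, x_i, x_{j+1}, \ldots, x_r) = a(x_1, x_2, \ldots, x_r).$$

The proof of the converse will be clearer if we first examine a special case. Let $r = 4$ and suppose that interchanging any two indeterminates has no effect on the polynomial $a(x_1, x_2, x_3, x_4)$. Consider the permutation

$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 4 & 2 & 1 \end{pmatrix}.$$

By assumption,

$$a(x_1, x_2, x_3, x_4) = a(x_3, x_2, x_1, x_4),$$

since $a(x_3, x_2, x_1, x_4)$ is obtained from $a(x_1, x_2, x_3, x_4)$ by interchanging $x_1$ and $x_3$. For the same reason, we have $a(x_1, x_2, x_3, x_4) = a(x_1, x_4, x_3, x_2)$ and $a(x_1, x_2, x_3, x_4) = a(x_1, x_2, x_4, x_3)$. In the identity

$$a(x_1, x_2, x_3, x_4) = a(x_1, x_4, x_3, x_2),$$

substitute $u_1, u_2, u_3$, and $u_4$ for $x_1, x_2, x_3$, and $x_4$, where $u_1 = x_3$, $u_2 = x_2$, $u_3 = x_1$, and $u_4 = x_4$. It then follows from (10-1.4d) that

$$a(x_3, x_2, x_1, x_4) = a(x_3, x_4, x_1, x_2).$$

Similarly, in the identity $a(x_1, x_2, x_3, x_4) = a(x_1, x_2, x_4, x_3)$ substitute $u_1, u_2, u_3$, and $u_4$ for $x_1, x_2, x_3$, and $x_4$, where $u_1 = x_3$, $u_2 = x_4$, $u_3 = x_1$, and $u_4 = x_2$. We obtain

$$a(x_3, x_4, x_1, x_2) = a(x_3, x_4, x_2, x_1).$$

Combining the sequence of identities

$$a(x_1, x_2, x_3, x_4) = a(x_3, x_2, x_1, x_4) = a(x_3, x_4, x_1, x_2) = a(x_3, x_4, x_2, x_1)$$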


gives the required result that $a(x_1, x_2, x_3, x_4)$ is left unchanged by the permutation

$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 4 & 2 & 1 \end{pmatrix}.$$

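The proof of the general case follows the same idea, but uses more elaborate notation. First note that if $k_1, k_2, \ldots, k_r$ is any rearrangement of $1, 2, \ldots, r$, then for any pair $i, j$ with $1 \le i < j \le r$

$$a(x_{k_1}, \ldots, x_{k_{i-1}}, x_{k_j}, x_{k_{i+1}}, \ldots, x_{k_{j-1}}, x_{k_i}, x_{k_{j+1}}, \ldots, x_{k_r}) = a(x_{k_1}, x_{k_2}, \ldots, x_{k_r}). \tag{10-3}$$

In fact, by assumption, $a(x_1, x_2, \ldots, x_r)$ satisfies

$$a(x_1, \ldots, x_{i-1}, x_j, x_{i+1}, \ldots, x_{j-1}, x_i, x_{j+1}, \ldots, x_r) = a(x_1, x_2, \ldots, x_r).$$

Substituting $u_1, u_2, \ldots, u_r$ for $x_1, x_2, \ldots, x_r$, where $u_1 = x_{k_1}$, $u_2 = x_{k_2}$, $\ldots$, $u_r = x_{k_r}$, gives the required identity (10-3). The identity (10-3) means that in $a(x_{k_1}, x_{k_2}, \ldots, x_{k_r})$, any two of the indeterminates $x_{k_1}, x_{k_2}, \ldots, x_{k_r}$ can be interchanged without changing the polynomial. Moreover, for any permutation

$$\begin{pmatrix} 1 & 2 & \cdots & r \\ j_1 & j_2 & \cdots & j_r \end{pmatrix}$$

it is possible to obtain $a(x_{j_1}, x_{j_2}, \ldots, x_{j_r})$ from $a(x_1, x_2, \ldots, x_r)$ by a finite sequence of such interchanges. Indeed, starting with

$$a(x_1, x_2, \ldots, x_r),$$

we can put $x_{j_1}$ in the first position by substituting $x_{j_1}$ for $x_1$ and $x_1$ for $x_{j_1}$. If $j_1 = 1$, this operation involves no change at all. If $j_1 \ne 1$, then the substitution simply interchanges $x_1$ and $x_{j_1}$ in $a(x_1, x_2, \ldots, x_r)$. In this case, it follows from (10-3) that

$$a(x_1, \ldots, x_{j_1}, \ldots, x_r) = a(x_{j_1}, \ldots, x_1, \ldots, x_r).$$

By a similar substitution, it is possible to get $x_{j_2}$ into the second position. Since $j_2 \ne j_1$ (by the definition of a permutation), the interchange which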


puts $x_{j_2}$ into the second place will not affect $x_{j_1}$. Continuing this process, we have

(making allowance for the inexactness of our notation). Each polynomial in the column on the right side is obtained from the polynomial above it by interchanging two indeterminates or by no change at all. Hence, by the identity (10-3), each polynomial is equal to the one which precedes it. This proves (10-1.7).
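The content of (10-1.7) is that the $r(r-1)/2$ interchanges suffice in place of all $r!$ permutations. The following sketch compares the two tests on small examples. It assumes our own dictionary representation (exponent tuple to coefficient, with no zero coefficients stored) and 0-based indices; the function names are hypothetical.

```python
from itertools import combinations, permutations

def substitute(poly, perm):
    """Substitute x_{perm[k]} for x_k (0-based) in every monomial of poly."""
    out = {}
    for exponents, coeff in poly.items():
        new_exponents = [0] * len(exponents)
        for k, e in enumerate(exponents):
            new_exponents[perm[k]] = e  # exponent of x_k moves to x_{perm[k]}
        out[tuple(new_exponents)] = coeff
    return out

def symmetric_by_interchanges(poly, r):
    """Test of (10-1.7): only interchanges of two indeterminates are tried."""
    for i, j in combinations(range(r), 2):
        perm = list(range(r))
        perm[i], perm[j] = j, i
        if substitute(poly, perm) != poly:
            return False
    return True

def symmetric_by_definition(poly, r):
    """Test of Definition 10-1.6: every permutation is tried."""
    return all(substitute(poly, list(p)) == poly for p in permutations(range(r)))

q = {(2, 0): 1, (1, 1): 1, (0, 2): 1}            # x^2 + xy + y^2
c = {(1, 0, 0): 1, (0, 2, 0): 1, (0, 0, 3): 1}   # x + y^2 + z^3

q_results = (symmetric_by_interchanges(q, 2), symmetric_by_definition(q, 2))
c_results = (symmetric_by_interchanges(c, 3), symmetric_by_definition(c, 3))
```

Both tests agree on each example, as (10-1.7) asserts they must.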

THEOREM 10-1.8. The sum, product, and negative of symmetric polynomials are symmetric. Hence, the set of all symmetric polynomials in $D[x_1, x_2, \ldots, x_r]$ is a subring of $D[x_1, x_2, \ldots, x_r]$.

Proof. Let

$$\begin{pmatrix} 1 & 2 & \cdots & r \\ j_1 & j_2 & \cdots & j_r \end{pmatrix}$$

be a permutation of $(1, 2, \ldots, r)$. If $a(x_1, x_2, \ldots, x_r) \in D[x_1, x_2, \ldots, x_r]$, then the polynomial $a(x_{j_1}, x_{j_2}, \ldots, x_{j_r})$ is obtained from $a(x_1, x_2, \ldots, x_r)$ by substituting $x_{j_1}$ for $x_1$, $x_{j_2}$ for $x_2$, $\ldots$, and $x_{j_r}$ for $x_r$. In particular, if $a(x_1, x_2, \ldots, x_r)$ and $b(x_1, x_2, \ldots, x_r)$ are symmetric, and

$$f(x_1, x_2, \ldots, x_r) = a(x_1, x_2, \ldots, x_r) + b(x_1, x_2, \ldots, x_r),$$

then by (10-1.4a),

$$f(x_{j_1}, \ldots, x_{j_r}) = a(x_{j_1}, \ldots, x_{j_r}) + b(x_{j_1}, \ldots, x_{j_r}) = a(x_1, \ldots, x_r) + b(x_1, \ldots, x_r) = f(x_1, \ldots, x_r).$$

It follows that $f(x_1, x_2, \ldots, x_r)$ is symmetric. The fact that the product and negative of symmetric polynomials are symmetric follows in a similar way from (10-1.4).
There is a particularly important class of symmetric polynomials, which can be conveniently defined as follows.


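DEFINITION 10-1.9. The elementary symmetric polynomials in $D[x_1, x_2, \ldots, x_r]$ are the polynomials $s_1^{(r)}(x_1, x_2, \ldots, x_r)$, $s_2^{(r)}(x_1, x_2, \ldots, x_r)$, $\ldots$, $s_r^{(r)}(x_1, x_2, \ldots, x_r)$ defined by the following identity in $D[x_1, x_2, \ldots, x_r, x_{r+1}]$:

$$(x_{r+1} - x_1)(x_{r+1} - x_2) \cdots (x_{r+1} - x_r) = x_{r+1}^{r} - s_1^{(r)} x_{r+1}^{r-1} + s_2^{(r)} x_{r+1}^{r-2} - \cdots + (-1)^r s_r^{(r)}.$$

For example, if $r = 2$,

$$(x_3 - x_1)(x_3 - x_2) = x_3^2 - (x_1 + x_2)x_3 + x_1 x_2,$$

so that

$$s_1^{(2)}(x_1, x_2) = x_1 + x_2, \qquad s_2^{(2)}(x_1, x_2) = x_1 x_2.$$

If $r = 3$,

$$(x_4 - x_1)(x_4 - x_2)(x_4 - x_3) = x_4^3 - (x_1 + x_2 + x_3)x_4^2 + (x_1 x_2 + x_2 x_3 + x_3 x_1)x_4 - x_1 x_2 x_3,$$

so that

$$s_1^{(3)}(x_1, x_2, x_3) = x_1 + x_2 + x_3, \quad s_2^{(3)}(x_1, x_2, x_3) = x_1 x_2 + x_2 x_3 + x_3 x_1, \quad s_3^{(3)}(x_1, x_2, x_3) = x_1 x_2 x_3.$$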
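The defining identity can be illustrated numerically. In the sketch below (our own illustration, with hypothetical function names) the indeterminates $x_1, \ldots, x_r$ are replaced by integers, the product $(t - x_1) \cdots (t - x_r)$ is expanded in a single indeterminate $t$ standing for $x_{r+1}$, and the coefficients are compared with the values of the elementary symmetric polynomials computed as sums of products of $k$ distinct $x_i$. That last description is a standard fact assumed here, not a statement taken from the text.

```python
from itertools import combinations
from math import prod

def elementary_symmetric(values, k):
    """Value of s_k: the sum of all products of k distinct entries of values."""
    return sum(prod(c) for c in combinations(values, k))

def expand_product(values):
    """Coefficients of (t - v_1)...(t - v_r), highest power of t first."""
    coeffs = [1]
    for v in values:
        # Multiply the current polynomial by (t - v).
        coeffs = [c - v * prev for c, prev in zip(coeffs + [0], [0] + coeffs)]
    return coeffs

values = [2, 3, 5]
expanded = expand_product(values)
from_definition = [(-1) ** k * elementary_symmetric(values, k)
                   for k in range(len(values) + 1)]
```

For the values 2, 3, 5 both lists are the coefficients of $t^3 - 10t^2 + 31t - 30$.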


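The fact that the polynomials $s_i^{(r)}(x_1, x_2, \ldots, x_r)$ are symmetric in $D[x_1, x_2, \ldots, x_r]$ is an easy consequence of their definition. If

$$\begin{pmatrix} 1 & 2 & \cdots & r \\ j_1 & j_2 & \cdots & j_r \end{pmatrix}$$

is a permutation of $(1, 2, \ldots, r)$, then by (10-1.4d),

$$(x_{r+1} - x_{j_1})(x_{r+1} - x_{j_2}) \cdots (x_{r+1} - x_{j_r}) = x_{r+1}^{r} - s_1^{(r)}(x_{j_1}, \ldots, x_{j_r})\, x_{r+1}^{r-1} + \cdots + (-1)^r s_r^{(r)}(x_{j_1}, \ldots, x_{j_r}).$$

The product on the left is a rearrangement of the factors of $(x_{r+1} - x_1)(x_{r+1} - x_2) \cdots (x_{r+1} - x_r)$. Thus, applying Definition 9-2.1 in $(D[x_1, x_2, \ldots, x_r])[x_{r+1}]$, we obtain

$$s_i^{(r)}(x_{j_1}, x_{j_2}, \ldots, x_{j_r}) = s_i^{(r)}(x_1, x_2, \ldots, x_r), \qquad i = 1, 2, \ldots, r.$$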

This result can easily be generalized.

(10-1.10). Let $f(x_1, x_2, \ldots, x_r) \in D[x_1, x_2, \ldots, x_r]$. Then

$$f(s_1^{(r)}(x_1, \ldots, x_r), s_2^{(r)}(x_1, \ldots, x_r), \ldots, s_r^{(r)}(x_1, \ldots, x_r))$$

is symmetric.

This observation is an immediate consequence of the symmetry of the elementary symmetric polynomials and (10-1.4d). We leave the proof to the reader. The converse of (10-1.10) is a deeper and more important result.

THEOREM 10-1.11. Fundamental theorem of symmetric polynomials. Let $a(x_1, x_2, \ldots, x_r)$ be a symmetric polynomial in $D[x_1, x_2, \ldots, x_r]$, where $D$ is any integral domain. Then there is a polynomial $f(x_1, x_2, \ldots, x_r) \in D[x_1, x_2, \ldots, x_r]$ such that

$$a(x_1, x_2, \ldots, x_r) = f(s_1^{(r)}(x_1, \ldots, x_r), s_2^{(r)}(x_1, \ldots, x_r), \ldots, s_r^{(r)}(x_1, \ldots, x_r)).$$

We will not prove this theorem here, but the interested reader can find a proof in Appendix 2.

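EXAMPLE 6. Let $a(x_1, x_2, x_3) \in Z[x_1, x_2, x_3]$ be the symmetric polynomial $x_1^3 + x_2^3 + x_3^3$. We have

$$(x_1 + x_2 + x_3)(x_1 x_2 + x_2 x_3 + x_3 x_1) = x_1^2 x_2 + x_1^2 x_3 + x_2^2 x_1 + x_2^2 x_3 + x_3^2 x_1 + x_3^2 x_2 + 3x_1 x_2 x_3$$

and

$$(x_1 + x_2 + x_3)^3 = x_1^3 + x_2^3 + x_3^3 + 3(x_1^2 x_2 + x_1^2 x_3 + x_2^2 x_1 + x_2^2 x_3 + x_3^2 x_1 + x_3^2 x_2) + 6x_1 x_2 x_3.$$

Hence,

$$x_1^3 + x_2^3 + x_3^3 = (s_1^{(3)})^3 - 3 s_1^{(3)} s_2^{(3)} + 3 s_3^{(3)}.$$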

The general procedure followed in Example 6 can be used to express any symmetric polynomial $a(x_1, x_2, \ldots, x_r)$ in $D[x_1, x_2, \ldots, x_r]$ in terms of the elementary symmetric polynomials. Roughly speaking, the process consists of computing all products (including powers) of elementary symmetric polynomials such that the products have total degree no greater than the total degree of $a(x_1, x_2, \ldots, x_r)$. It is then possible (usually by inspection) to express $a(x_1, x_2, \ldots, x_r)$ as a sum with coefficients in $D$ of these products. This procedure can be systematized, but the statement of the exact process is somewhat complicated. In practice, the method of trial and error is usually effective.
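When an expression in terms of elementary symmetric polynomials has been found by trial and error, it is worth checking it at a few points. The sketch below does this for the identity $x_1^3 + x_2^3 + x_3^3 = s_1^3 - 3 s_1 s_2 + 3 s_3$ of Example 6; the function names are our own. Since both sides have degree at most 3 in each indeterminate, agreement on the 7 by 7 by 7 grid used here is in fact enough to force the identity, although the check is offered mainly as a safeguard against slips.

```python
from itertools import product

def power_sum_3(x1, x2, x3):
    """The symmetric polynomial x1^3 + x2^3 + x3^3 of Example 6."""
    return x1 ** 3 + x2 ** 3 + x3 ** 3

def via_elementary(x1, x2, x3):
    """The same value computed as s1^3 - 3*s1*s2 + 3*s3."""
    s1 = x1 + x2 + x3
    s2 = x1 * x2 + x2 * x3 + x3 * x1
    s3 = x1 * x2 * x3
    return s1 ** 3 - 3 * s1 * s2 + 3 * s3

identity_holds = all(
    power_sum_3(*point) == via_elementary(*point)
    for point in product(range(-3, 4), repeat=3)
)
```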

1. Formulate the definitions of the following concepts for the special case of polynomials in the two indeterminates $x$ and $y$.
(a) the total degree of $a(x, y)$
(b) the value of $a(x, y)$ for $x = u$, $y = v$
(c) a zero of $a(x, y)$ in the ring $A$

2. What are $\mathrm{Deg}_x[a(x, y)]$, $\mathrm{Deg}_y[a(x, y)]$, and the total degree of $a(x, y)$ for the following polynomials?

3. Prove by induction on $r$ that every element of $D[x_1, x_2, \ldots, x_r]$ can be expressed uniquely in the form (10-2).

4. Prove (10-1.2).

5. Prove that there are no polynomials $f(x, y)$, $g(x, y)$ in $D[x, y]$ such that $x f(x, y) + y g(x, y) = 1$. [Hint: Substitute $x$ for $y$.]

6. Describe geometrically the zeros in $R$ of the following polynomials in $R[x, y]$.
(a) $x^2 + y^2$
(b) $x - y$
(c) $xy$
(d) $(x - 1)^2 + (y + 2)^2 - 4$
(e) $x^2 + 2x - 3$
(f) $y^2 + 1$

7. Find the solutions in $R$ of the following systems of equations.
(a) $x + y - 5 = 0$, $x - y + 1 = 0$
(b) $2x - 3y + 1 = 0$, $10x - 15y + 2 = 0$
(c) $2x - 3y + 1 = 0$, $10x - 15y + 5 = 0$
(d) $x^2 - y - 5 = 0$, $x + y + 1 = 0$
(e) $x^2 - y^2 + 10 = 0$, $x^2 + y^2 - 28 = 0$
(f) $x^3 y = 0$, $x^2 + y^2 - 1 = 0$

8. Determine $s_i^{(4)}(x_1, x_2, x_3, x_4)$ for $1 \le i \le 4$ and $s_i^{(5)}(x_1, x_2, x_3, x_4, x_5)$ for $1 \le i \le 5$.

9. Which of the following polynomials in $D[x_1, x_2, x_3, x_4]$ are symmetric? Prove your assertions.
(a) $x_1^2 x_2 + x_2^2 x_3 + x_3^2 x_4 + x_4^2 x_1$
(b) $(x_1 + x_2 + x_3)(x_2 + x_3 + x_4)(x_1 + x_3 + x_4)(x_1 + x_2 + x_4)$
(c) $x_1 x_2 + x_2 x_3 + x_3 x_1$

10. Give the details of the proof of (10-1.10). 11. Express the following symmetric polynomials in Z[xl, x2, xa] in terms of the elementary symmetric polynomials. (a) 2: x2 23 (b) x?xz 22x3 ~ 3 x 1 x?x3 22x1 23x2 (4 x : 22 23 2 2 (d) x?x%3 xlxgxg f XlX2x3 12. Suppose that the roots of the polynomial x3 - 2x2 x and r3. Find the cubic polynomial whose roots are rS, r2, and rg.

+ + + + + + +

+ + 1 are rl, r2,

13. (a) Show that in $Q[x, y]$, every symmetric polynomial $a(x, y)$ can be written in the form

$$a(x, y) = \sum_{i \ge 0} \sum_{j \ge 0} r_{i,j} (xy)^i (x^j + y^j)$$

(a finite sum), where $r_{i,j} \in Q$. [Hint: Let $a(x, y) = \sum r_{k,l}\, x^k y^l$ and observe that since $a(x, y)$ is symmetric, $a(x, y) = \tfrac{1}{2}[a(x, y) + a(y, x)]$.]
(b) Prove the fundamental theorem on symmetric polynomials for $Q[x, y]$ by showing that for all $j \ge 0$, $x^j + y^j$ can be written in the form $f(x + y, xy)$ for some $f(x, y) \in Q[x, y]$. [Hint: Note that $x^{j+2} + y^{j+2} = (x + y)(x^{j+1} + y^{j+1}) - xy(x^j + y^j)$, and use induction.]

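14. (a) Use (10-1.4a, b) to prove by induction on $m$ and $n$ respectively that if

$$f(x_1, \ldots, x_r) = c_1(x_1, \ldots, x_r) + c_2(x_1, \ldots, x_r) + \cdots + c_m(x_1, \ldots, x_r)$$

and

$$g(x_1, \ldots, x_r) = d_1(x_1, \ldots, x_r)\, d_2(x_1, \ldots, x_r) \cdots d_n(x_1, \ldots, x_r)$$

in $D[x_1, x_2, \ldots, x_r]$, then for any $u_1, u_2, \ldots, u_r$ in a commutative ring containing $D$ as a subring

$$f(u_1, \ldots, u_r) = c_1(u_1, \ldots, u_r) + c_2(u_1, \ldots, u_r) + \cdots + c_m(u_1, \ldots, u_r)$$

and

$$g(u_1, \ldots, u_r) = d_1(u_1, \ldots, u_r)\, d_2(u_1, \ldots, u_r) \cdots d_n(u_1, \ldots, u_r).$$

(b) Use this result to prove (10-1.4d) by induction on $s$.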


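10-2 Systems of linear equations. One of the most important special cases of systems of polynomial equations arises when each equation of the system is linear. That is, the system is of the form

$$a_1(x_1, \ldots, x_r) = 0,\quad a_2(x_1, \ldots, x_r) = 0,\quad \ldots,\quad a_s(x_1, \ldots, x_r) = 0,$$

where the total degree of each polynomial

$$a_i(x_1, x_2, \ldots, x_r)$$

is no greater than one. Thus, the equations can be written in the form

$$\begin{aligned} a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,r}x_r &= b_1 \\ a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,r}x_r &= b_2 \\ &\ \,\vdots \\ a_{s,1}x_1 + a_{s,2}x_2 + \cdots + a_{s,r}x_r &= b_s, \end{aligned} \tag{10-4}$$

where the coefficients $a_{i,j}$ and $b_i$ are elements of an integral domain $D$. We refer to (10-4) as a system of $s$ linear equations in $r$ indeterminates (or unknowns) with coefficients in $D$. For example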

is a system of two equations in and h 1 . +x2 2x1 -k 3x2 0x1 0x2

four indeterminates with coefficients in 2,


=

+ + *x3 + -$x4 + Ox5 + 4x3 + 5x4 + 0x5 + + 0x3 + 0x4 + 0x5

1 = 1 = 0

is a system of three equations in five unknowns with coefficients in the field $Q$. Note that the case in which all of the coefficients $a_{i,1}, a_{i,2}, \ldots, a_{i,r}$ and the constant term of one or more equations in a system are zero is not excluded. It is often convenient to omit terms which have zero coefficient,


provided that this does not cause confusion. For example, instead of xl 0x1 \ve would write

+ 0x2 + Ox3 - x4 = 1

+ x2 + x3 + x4
xz

o,

+ +
X3

X l - X4 =
Xq

1 = 0.

However, it would be confusing to omit the terms Ox4 in the system

because then it would not be clear that the system is in four indeterminates rather than three, unless this fact were mentioned explicitly. Therefore, whenever such a system is written, all indeterminates will be exhibited.

In dealing with arbitrary systems of linear equations, it is convenient to use the summation notation, and write

$$\sum_{j=1}^{r} a_{i,j}x_j = b_i, \qquad i = 1, 2, \ldots, s,$$

instead of (10-4). This notation is not convenient for specific systems in which $r$ and $s$ are small. If $r \le 3$, we will use $x$, $y$, and $z$ instead of $x_1$, $x_2$, and $x_3$.

Definition 10-1.5, of a solution of a general system of polynomial equations, applies to systems of linear equations in particular. That is, if

$$\sum_{j=1}^{r} a_{i,j}x_j = b_i, \qquad i = 1, 2, \ldots, s,$$

is a system of $s$ linear equations in $r$ unknowns with coefficients in the integral domain $D$, and if $A$ is a commutative ring containing $D$ as a subring, then a solution in $A$ of this system consists of an ordered string $(c_1, c_2, \ldots, c_r)$ of $r$ elements in $A$, such that $\sum_{j=1}^{r} a_{i,j}c_j = b_i$, for $i = 1, 2, \ldots, s$.

DEFINITION 10-2.1. A system of linear equations with coefficients in an integral domain $D$ is called consistent if it has a solution in some commutative ring containing $D$ as a subring. Otherwise, the system is called inconsistent.

When $D$ is a field, there is a way to decide whether or not a system of linear equations with coefficients in $D$ is consistent, and to find all of the solutions of the system if it is consistent. In the remainder of this section


we will explain this method of solving systems of linear equations.* The general idea of the process is to construct a new system of equations from the given one. The new system is such that its consistency can be determined by inspection, and when it is consistent, its solutions are easily found. Moreover, the new system is constructed in such a way that it has exactly the same set of solutions as the original system.

DEFINITION 10-2.2. Let

and

be systems of linear equations with coefficients in a field $F$. The systems are equivalent if every solution of the first system is a solution of the second, and vice versa.

For example, the system

$$\begin{aligned} x + y &= 0 \\ 2x + 2y &= 0 \end{aligned}$$

is evidently equivalent to the system consisting of the single equation

$$x + y = 0.$$

It is obvious that the relation of equivalence of systems of equations is reflexive, symmetric, and transitive. That is, every system is equivalent to itself; if the system $S_1$ is equivalent to the system $S_2$, then $S_2$ is equivalent to $S_1$; and if the system $S_1$ is equivalent to the system $S_2$ and $S_2$ is equivalent to a system $S_3$, then $S_1$ is equivalent to $S_3$. Moreover, any two inconsistent systems are equivalent.

* The theory of determinants furnishes another method of solving systems of linear equations. In the simplest case of $r$ equations in $r$ unknowns, with the determinant of the coefficients not equal to zero, the familiar Cramer's rule provides explicit formulas for the unknowns as quotients of certain determinants. However, if the number of equations and unknowns exceeds four, then it requires considerable computation to evaluate these determinants, so that Cramer's rule is of more theoretical than practical importance. In this book we will not discuss determinants or their application to the solution of linear equations. A complete discussion of these topics can be found in References 20, 21, 22, 24, and 25 listed at the end of this book.


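There are three basic operations called elementary transformations which replace a given system of equations with coefficients in a field $F$ by an equivalent system. These operations are described as follows:
(1) interchange two equations;
(2) multiply an equation by an element of $F$ and add the result to a different equation of the system;
(3) multiply an equation by a nonzero element of $F$.

Thus, if the original system of equations is (10-4), then the forms of systems obtained by applying elementary transformations of the three types are as follows.

Type 1, where $1 \le m < n \le s$ (the $m$th and $n$th equations are interchanged):

$$\begin{aligned} a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,r}x_r &= b_1 \\ &\ \,\vdots \\ a_{n,1}x_1 + a_{n,2}x_2 + \cdots + a_{n,r}x_r &= b_n \\ &\ \,\vdots \\ a_{m,1}x_1 + a_{m,2}x_2 + \cdots + a_{m,r}x_r &= b_m \\ &\ \,\vdots \\ a_{s,1}x_1 + a_{s,2}x_2 + \cdots + a_{s,r}x_r &= b_s. \end{aligned}$$

Type 2, where $1 \le m < n \le s$, and $c \in F$:

$$\begin{aligned} a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,r}x_r &= b_1 \\ &\ \,\vdots \\ a_{m,1}x_1 + a_{m,2}x_2 + \cdots + a_{m,r}x_r &= b_m \\ &\ \,\vdots \\ (a_{n,1} + c a_{m,1})x_1 + (a_{n,2} + c a_{m,2})x_2 + \cdots + (a_{n,r} + c a_{m,r})x_r &= b_n + c b_m \\ &\ \,\vdots \\ a_{s,1}x_1 + a_{s,2}x_2 + \cdots + a_{s,r}x_r &= b_s. \end{aligned}$$

Type 3, where $c \ne 0$ in $F$ (the $m$th equation is multiplied by $c$):

$$\begin{aligned} a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,r}x_r &= b_1 \\ &\ \,\vdots \\ c a_{m,1}x_1 + c a_{m,2}x_2 + \cdots + c a_{m,r}x_r &= c b_m \\ &\ \,\vdots \\ a_{s,1}x_1 + a_{s,2}x_2 + \cdots + a_{s,r}x_r &= b_s. \end{aligned}$$

It is clear that each type of elementary transformation takes a system of $s$ linear equations in $r$ unknowns with coefficients in $F$ into a system of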


linear equations of the same sort, that is, s equations in r unknowns with coefficients in F.
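The three elementary transformations can be sketched as operations on a list of rows, each row holding the coefficients $a_{i,1}, \ldots, a_{i,r}$ followed by $b_i$. This is a minimal sketch under our own assumptions: the field is $Q$ (represented by `fractions.Fraction`), rows are indexed from 0, and the function names and the sample system are hypothetical.

```python
from fractions import Fraction

def interchange(system, m, n):
    """Type 1: interchange equations m and n."""
    rows = [row[:] for row in system]
    rows[m], rows[n] = rows[n], rows[m]
    return rows

def add_multiple(system, m, n, c):
    """Type 2: add c times equation m to a different equation n."""
    if m == n:
        raise ValueError("the two equations must be different")
    rows = [row[:] for row in system]
    rows[n] = [x + c * y for x, y in zip(rows[n], rows[m])]
    return rows

def scale(system, m, c):
    """Type 3: multiply equation m by a nonzero element c."""
    if c == 0:
        raise ValueError("the multiplier must be nonzero")
    rows = [row[:] for row in system]
    rows[m] = [c * x for x in rows[m]]
    return rows

def is_solution(system, values):
    """Check that substituting values satisfies every equation."""
    return all(sum(a * v for a, v in zip(row[:-1], values)) == row[-1]
               for row in system)

# Sample system (an assumption for illustration): x + y = 3, 2x - y = 0.
system = [[Fraction(1), Fraction(1), Fraction(3)],
          [Fraction(2), Fraction(-1), Fraction(0)]]
solution = (Fraction(1), Fraction(2))

step1 = interchange(system, 0, 1)
step2 = add_multiple(step1, 0, 1, Fraction(-1, 2))
transformed = scale(step2, 0, Fraction(1, 2))
```

Each function returns a new system with the same number of equations and unknowns, and the solution $(1, 2)$ of the sample system still satisfies the transformed system.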

THEOREM 10-2.3. Suppose that $S$ and $S'$ are systems of linear equations with coefficients in a field $F$ such that $S'$ is obtained from $S$ by means of a sequence of elementary transformations. That is, there are systems of linear equations $S_0, S_1, S_2, \ldots, S_n$ such that $S_0$ is $S$ and $S_n$ is $S'$, and for each natural number $k \le n$ the system $S_k$ is obtained from the system $S_{k-1}$ by means of an elementary transformation. Then the systems $S$ and $S'$ are equivalent.

Proof. Since the relation of equivalence between two systems of linear equations is transitive, it is sufficient to prove that for each $k \le n$, $S_k$ is equivalent to $S_{k-1}$. There are three cases to consider, depending on which type of elementary transformation is used in passing from $S_{k-1}$ to $S_k$. If $S_k$ is obtained from $S_{k-1}$ by interchanging two equations in the list, then it is obvious that every solution of $S_k$ is a solution of $S_{k-1}$, and vice versa. Suppose that $S_k$ is obtained from $S_{k-1}$ by adding a multiple of one equation to another. That is,

$$S_{k-1}:\ \sum_{j=1}^{r} a_{i,j}x_j = b_i, \qquad S_k:\ \sum_{j=1}^{r} d_{i,j}x_j = e_i, \qquad i = 1, 2, \ldots, s,$$

where the equation $\sum_{j=1}^{r} d_{i,j}x_j = e_i$ is the same as $\sum_{j=1}^{r} a_{i,j}x_j = b_i$ for $i \ne n$, and

$$d_{n,j} = a_{n,j} + c\,a_{m,j}, \qquad e_n = b_n + c\,b_m,$$

where $m \ne n$. Let $(c_1, c_2, \ldots, c_r)$ be a solution of $S_{k-1}$. Then $(c_1, c_2, \ldots, c_r)$ plainly satisfies every equation of $S_k$, except possibly $\sum_{j=1}^{r} d_{n,j}x_j = e_n$. However,

$$\sum_{j=1}^{r} a_{m,j}c_j = b_m \quad\text{and}\quad \sum_{j=1}^{r} a_{n,j}c_j = b_n.$$

Multiplying the first of these equations by $c$ and adding it to the second, we obtain from the general distributive, associative, and commutative laws

$$\sum_{j=1}^{r} (a_{n,j} + c\,a_{m,j})c_j = b_n + c\,b_m.$$

That is, $\sum_{j=1}^{r} d_{n,j}c_j = e_n$. Therefore, $(c_1, c_2, \ldots, c_r)$ is a solution of $S_k$. Conversely, if $(c_1, c_2, \ldots, c_r)$ is a solution of $S_k$, then $\sum_{j=1}^{r} a_{i,j}c_j = b_i$ for $i \ne n$ and $\sum_{j=1}^{r} (a_{n,j} + c\,a_{m,j})c_j = b_n + c\,b_m$. Subtracting from this equality $c$ times the equation $\sum_{j=1}^{r} a_{m,j}c_j = b_m$ gives $\sum_{j=1}^{r} a_{n,j}c_j = b_n$. Thus, $(c_1, c_2, \ldots, c_r)$ is a solution of $S_{k-1}$. Thus $S_k$ and $S_{k-1}$ are equivalent in this case also. The proof that $S_k$ is equivalent to $S_{k-1}$ if $S_k$ is obtained from $S_{k-1}$ by multiplying some equation by a nonzero element of $F$ is left as an exercise for the reader (see Problem 7 below).

We now illustrate by an example the way in which a system of linear equations can be transformed by a sequence of elementary transformations into an equivalent system which can easily be solved.

EXAMPLE l . Consider the system


-3y+
32 =

2x+*y-z=o

3x-2y+z

with coefficients in Q. In Table 10-1, the elementary transformation is described

Interchange the first and second equations

2x+ 3x-

*yz = o - 3y+3z=4 2y+ z = l

Multiply the first equation by 3 Multiply the first equation by -3 and add to the third equation Multiply the second equation by -3 Multiply the second equation by and add to the third equation Multiply the third equation by

x+&y-32=0 - 3y+3z=4 3x2y+ z = l

+ &Y

- fz = O
=

- 3y+$z=4

- m 2y3 + s z

x+&yjz=o y - 1, = -4 6 3
23 - ~ y + ~ =z 1

+ &Y

-" 3

o
= -4

y - A,

= -31

15

x+&y-jz=o y - 1, =
6
2 =

-4

-4%


on the left and the resulting equivalent system is given on the right. The final system of equations in this table is easily solved. If (cl, ea, es) is a solution, then c3 = ' -4 5 7 0 (from the second equation), and el = 1 2 7 ) c2 = iC3 3 = -&e2 +c3 = -M (from the first equation). It is routine to check by direct substitution that (-*%, is a solution of the system

,,, -M, -w)

Therefore, this system has exactly one solution in any cominutative ring containing Q as a subring. It follo~vs from Theorem 10-2.3 that the original system

has the unique solution

(-m? -570 -381) 381) i:?)'

It is the special form of the last system of equations in Table 10-1 that makes it possible to obtain this solution so easily. This system is a particular case of a system of equations which is in "echelon form."

DEFINITION 10-2.4. A system of linear equations

$$\sum_{j=1}^{r} a_{i,j}x_j = b_i, \qquad i = 1, 2, \ldots, s,$$

is said to be in echelon form if there exists an integer $m$ with $0 \le m \le s$ and a sequence of natural numbers $(n_1, n_2, \ldots, n_m)$ such that
(a) $1 \le n_1 < n_2 < \cdots < n_m \le r$;
(b) if $1 \le i \le m$, then $a_{i,j} = 0$ for $j < n_i$ and $a_{i,n_i} = 1$;
(c) if $m < i \le s$, then $a_{i,j} = 0$ for all $j$. [If $m = s$, case (c) does not occur.]

In Example 1, the last system obtained in Table 10-1 is in echelon form, with $m = 3$, $n_1 = 1$, $n_2 = 2$, and $n_3 = 3$. The system

is in echelon form with $m = 2$, $n_1 = 1$, and $n_2 = 3$.

The system

is also in echelon form with $m = 0$. Systems of this kind (with the coefficients of all indeterminates equal to zero) seem rather trivial, but it would be inconvenient to exclude them from our discussion. In general, if $m = 0$ in Definition 10-2.4, then the set $\{n_1, n_2, \ldots, n_m\}$ of natural numbers is empty. In this case, the conditions (a) and (b) are satisfied vacuously, and condition (c) implies that $a_{i,j} = 0$ for all $i$ and $j$. Note that by condition (a) in Definition 10-2.4, the number $m$ cannot exceed $r$, because it is impossible to have more than $r$ different natural numbers $n_i$ which satisfy $1 \le n_i \le r$.

THEOREM 10-2.5. If $S$ is a system of $s$ linear equations in $r$ unknowns with coefficients in a field $F$, then it is possible to transform $S$ into a system of linear equations $S'$ in echelon form by means of a finite sequence of elementary transformations.

Proof. The proof of this theorem is by course of values induction on the number $t$ of different indeterminates which have nonzero coefficients in the system. That is, $t$ is the number of indeterminates having at least one nonzero coefficient. Of course, $t \le r$. If this number is zero, then the system must have the trivial form

$$\begin{aligned} 0x_1 + 0x_2 + \cdots + 0x_r &= b_1 \\ 0x_1 + 0x_2 + \cdots + 0x_r &= b_2 \\ &\ \,\vdots \\ 0x_1 + 0x_2 + \cdots + 0x_r &= b_s, \end{aligned}$$

which is already in echelon form (with $m = 0$). Thus, the basis of the induction $t = 0$ offers no difficulty. Assume that $t > 0$ and every system in which fewer than $t$ indeterminates appear with nonzero coefficients can be transformed to a system in echelon form by means of elementary transformations. Suppose that

$$\sum_{j=1}^{r} a_{i,j}x_j = b_i, \qquad i = 1, 2, \ldots, s,$$

is a system in which $t$ indeterminates occur with nonzero coefficients. Let


$n_1$ be the least natural number such that $x_{n_1}$ has a nonzero coefficient in one of the equations. Since $t > 0$, it follows from the well-ordering principle that such an $n_1$ exists. If the coefficient of $x_{n_1}$ is zero in the first equation, interchange the first equation with an equation in which the coefficient of $x_{n_1}$ is not zero. Multiply the new first equation by the inverse in $F$ of the coefficient of $x_{n_1}$. After these elementary transformations, the system has the form

In turn, for $i = 2, 3, \ldots, s$, multiply the first equation by the negative of the coefficient of $x_{n_1}$ in the $i$th equation and add to the $i$th equation to obtain

The construction of (10-6) from the original system is effected by a finite number of elementary transformations. Moreover, it is evident that if an indeterminate $x_k$ occurs with zero coefficient in every equation of the original system

then every coefficient of $x_k$ in (10-6) is also zero. Consequently, in the system

at most $t - 1$ indeterminates appear with nonzero coefficients. By the induction hypothesis, the system (10-7) can be transformed into echelon form by a finite sequence of elementary transformations. Clearly, in the resulting echelon system obtained from (10-7), the indeterminates $x_j$ for $j \le n_1$ will occur with coefficient zero. That is, the echelon system obtained will be of the form

Consequently, combining this system with the first equation of (10-6), we obtain an echelon system

Since a sequence of elementary transformations applied to (10-7) can be considered as a sequence of elementary transformations applied to (10-6) which do not involve the first equation, it follows that we can get from our original system to a system in echelon form by applying a finite number of elementary transformations. This completes the induction, and proves Theorem 10-2.5.

By combining the results of Theorems 10-2.3 and 10-2.5, we obtain the most important result of this section.

THEOREM 10-2.6. Any system $S$ of $s$ linear equations in $r$ unknowns with coefficients in a field $F$ is equivalent to a system $S'$ of $s$ linear equations in $r$ unknowns with coefficients in $F$ where $S'$ is in echelon form.

It should be emphasized that a system of linear equations may be equivalent to many different systems in echelon form. The system $S'$ in Theorem 10-2.6 is by no means unique (see Problem 5 below).

The reduction process described in Example 1 and in the proof of Theorem 10-2.5 works for arbitrary fields. When it is used for fields of the form $Z_p$, where $p$ is a prime number, the results can be interpreted to obtain information concerning the solution of linear congruences with a prime modulus (see the discussion following Theorem 9-7.8).
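The reduction in the proof of Theorem 10-2.5 can be written as a loop instead of an induction. The following is a minimal sketch, assuming the field is $Q$ (via `fractions.Fraction`), rows of the form $[a_{i,1}, \ldots, a_{i,r}, b_i]$, and 0-based indices; the function name `to_echelon` and the sample system are our own and are not taken from the text.

```python
from fractions import Fraction

def to_echelon(system):
    """Transform a system into echelon form by elementary transformations."""
    rows = [[Fraction(x) for x in row] for row in system]
    s, r = len(rows), len(rows[0]) - 1
    top = 0  # index of the next equation to receive a leading 1
    for col in range(r):
        if top == s:
            break
        # Find an equation at or below `top` with nonzero coefficient here.
        pivot = next((i for i in range(top, s) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[top], rows[pivot] = rows[pivot], rows[top]        # type 1
        inverse = 1 / rows[top][col]
        rows[top] = [inverse * x for x in rows[top]]           # type 3
        for i in range(top + 1, s):                            # type 2
            c = -rows[i][col]
            rows[i] = [x + c * y for x, y in zip(rows[i], rows[top])]
        top += 1
    return rows

# Sample system (an assumption for illustration):
#        2y + 4z = 2
#   x +   y +  z = 3
#  2x +  2y + 2z = 6
system = [[0, 2, 4, 2],
          [1, 1, 1, 3],
          [2, 2, 2, 6]]
echelon = to_echelon(system)
```

For the sample system the result is $x + y + z = 3$, $y + 2z = 1$, $0 = 0$, which is in echelon form with $m = 2$, $n_1 = 1$, and $n_2 = 2$.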

EXAMPLE 2. Let the system


2x1

+ 4x2 + x3 +

x4


have coefficients in Z5, the integers modulo 5. We list the successive equivalent systems, arriving finally a t a system in echelon form. The reader should describe the elementary transformations a t each step.

This system is not satisfied for any choice of $x_1$, $x_2$, $x_3$, and $x_4$ because the final equation $0 = 3$ is never satisfied. Therefore the original system has no solution in any commutative ring containing $Z_5$. The linear system of equations in this example can be regarded as a system of simultaneous linear congruences

+ si+ 3xi + 4x1 +


2x1 xl+

4x2 3x2 4x2 x2

+ + + 2x3 + + + + +
23 x3 x3 x3+

= 1 (mod 5) $4 = 2 (mod 5) 2 x 4 = 3 (mod 5)


24 3x4

= 4 (mod 5)
O(mod5).
( e l , c2, c3,

x2+

24-

Our result shows that this system of congruences has no solution lvhere ci E 2.

c4)
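The same reduction can be carried out over $Z_p$ by doing all arithmetic modulo $p$. The sketch below is an illustration under our own assumptions: the small system over $Z_5$ is hypothetical (it is not the system of Example 2), the function names are our own, and inverses are computed with Python's built-in `pow(a, -1, p)`, which requires Python 3.8 or later.

```python
from itertools import product

def to_echelon_mod(system, p):
    """Echelon form of a system with coefficients in Z_p, p prime."""
    rows = [[x % p for x in row] for row in system]
    s, r = len(rows), len(rows[0]) - 1
    top = 0
    for col in range(r):
        if top == s:
            break
        pivot = next((i for i in range(top, s) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[top], rows[pivot] = rows[pivot], rows[top]
        inverse = pow(rows[top][col], -1, p)  # inverse in the field Z_p
        rows[top] = [(inverse * x) % p for x in rows[top]]
        for i in range(top + 1, s):
            c = (-rows[i][col]) % p
            rows[i] = [(x + c * y) % p for x, y in zip(rows[i], rows[top])]
        top += 1
    return rows

def is_inconsistent(echelon_rows):
    """True if some equation reads 0 = b with b nonzero."""
    return any(all(a == 0 for a in row[:-1]) and row[-1] != 0
               for row in echelon_rows)

P = 5
# Hypothetical congruences: x1 + 2*x2 = 1 and 2*x1 + 4*x2 = 3 (mod 5).
system = [[1, 2, 1],
          [2, 4, 3]]
echelon = to_echelon_mod(system, P)

# Independent check: try every pair (x1, x2) of residues modulo 5.
brute_force = [v for v in product(range(P), repeat=2)
               if all(sum(a * x for a, x in zip(row[:-1], v)) % P == row[-1] % P
                      for row in system)]
```

Here the echelon system ends with the equation $0 = 1$, and the exhaustive search confirms that the congruences have no solution.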

EXAMPLE 3 . The system

with coefficients in $Q$ is in echelon form. Let $x_5 = c$, where $c$ is an element in any commutative ring $A$ containing $Q$ as a subring. Then from the last equation, $x_4 = 1 - c$. Substituting $x_4 = 1 - c$, $x_5 = c$ in the second equation and choosing $x_3 = d \in A$, we have
xz
=

-id

+ z(1

C) - c

+ S = 1 - 3c -

'5d .

From the first equation

Thus, (5

-3 Sd

,1 -

- 3d, d , 1 - c, c )

422

SYSTEMS OF EQUATIONS AND MATRICES

[CHAP.

10

is a solution of the given system, where c and d are arbitrary elements in A. For example, if A = Q[x] and c = d = x, then a solution is
(5+*x,
1 - fx, x, 1 - x, x).

It is clear that this system has infinitely many different solutions.

Examples 1, 2, and 3 illustrate the fact that systems of equations in echelon form can be solved (or shown to be inconsistent) without much trouble. In fact, we can prove the following general results. THEOREM 10-2.7. Let

be a system of linear equations with coefficients in F, which is in echelon i 5 m and ai,j = O for al1 form: ai,j = O for j < ni, a;,ni = 1 for 1 j i f m < i < s , w h e r e l 5 nl < n 2 <nm randO 2 m 5 s. (a) The system is consistent if and only if either m = S, or bi = O for every i satisfying m < i <_ s. If the system is consistent, then it has a solution (cl. c2, . . . , c,) with each ci in F. (b) If the system is consistent, then its solution is unique if and only if m = r. When this condition is not satisfied (that is, the system has more than one solution) then it is always possible to find at least as many solutions (cl, c2, . . . , c,) with ci E F as there are elements in F.

<

< . . a

<

Proof. (a) Suppose that $m < s$ and there is an $i > m$ such that $b_i \ne 0$. Then the ith equation of the given system is

$$0x_1 + 0x_2 + \cdots + 0x_r = b_i.$$

This equation plainly has no solution in any ring A containing F as a subring. On the other hand, if either $m = s$, or $b_i = 0$ for all i satisfying $m < i \le s$, then it is easy to see that $(c_1, c_2, \ldots, c_r)$ is a solution with $c_i \in F$, where we define recursively

$$c_{n_m} = b_m, \qquad c_{n_k} = b_k - \sum_{l=k+1}^{m} a_{k,n_l}c_{n_l} \quad (k = m-1, m-2, \ldots, 1),$$

and $c_j = 0$ for all indices j which are not among the indices $n_1, n_2, \ldots, n_m$.

10-21

SYSTEMS OF LINEAR EQUATIONS

423

Note that the $c_{n_k}$ are determined by the $a_{i,j}$ and $b_i$. For example,

$$c_{n_m} = b_m, \qquad c_{n_{m-1}} = b_{m-1} - a_{m-1,n_m}b_m.$$

It follows that our system is consistent and has a solution in F.

(b) Suppose that the system is consistent and $m = r$. Since the natural numbers $n_1, n_2, \ldots, n_r$ satisfy $1 \le n_1 < n_2 < \cdots < n_r \le r$, it follows that $n_k = k$ for $k = 1, 2, \ldots, r$. That is, the system has the form

$$\begin{aligned}
x_1 + a_{1,2}x_2 + \cdots + a_{1,r}x_r &= b_1 \\
x_2 + \cdots + a_{2,r}x_r &= b_2 \\
&\;\;\vdots \\
x_r &= b_r \\
0x_r &= b_{r+1} = 0 \\
&\;\;\vdots \\
0x_r &= b_s = 0.
\end{aligned}$$

If $(c_1, c_2, \ldots, c_r)$ is a solution of this system, then necessarily

$$c_r = b_r.$$

Suppose inductively that $c_r, c_{r-1}, \ldots, c_{r-k+1}$ are uniquely determined in any solution. Then since

$$c_{r-k} = b_{r-k} - a_{r-k,r-k+1}c_{r-k+1} - \cdots - a_{r-k,r}c_r,$$

it follows that $c_{r-k}$ is also unique. Hence, by the principle of induction, the system of equations has a unique solution.

Conversely, if the system is consistent, but the condition $m = r$ is not satisfied, then there exists an index l such that $l \ne n_k$ for all $1 \le k \le m$. Let $c \in F$. Define $e_i = b_i - a_{i,l}c$. Then the system

$$\sum_{j=1}^{r} a_{i,j}x_j = e_i, \qquad i = 1, 2, \ldots, s,$$

is still consistent because if $i > m$, then $a_{i,l} = 0$ and $e_i = b_i = 0$. By the proof of part (a), this new system has a solution $(c_1, c_2, \ldots, c_r)$ with $c_i \in F$, such that $c_l = 0$. It is then clear that

$$(c_1, \ldots, c_{l-1}, c, c_{l+1}, \ldots, c_r)$$

is a solution of our original system of equations. Since c can be arbitrary, it follows that the system has at least as many different solutions (in F) as


there are elements in F. In particular, since every field contains at least two elements, the system has more than one solution.

As a consequence of Theorems 10-2.6 and 10-2.7, we have the following useful result.

THEOREM 10-2.8. If a system of linear equations in r unknowns with coefficients in a field F is consistent, then the system has a solution $(c_1, c_2, \ldots, c_r)$ with $c_i \in F$.

Proof. By Theorem 10-2.6, the given system S is equivalent to a system S' of linear equations with coefficients in F, such that S' is in echelon form. Since S is consistent and S' is equivalent to S, it follows that S' is consistent. By Theorem 10-2.7, S' has a solution $(c_1, c_2, \ldots, c_r)$ with $c_i \in F$. Since S' is equivalent to S, it follows that $(c_1, c_2, \ldots, c_r)$ is also a solution of S.
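The recursive construction used in the proof of Theorem 10-2.7(a) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `solve_echelon` and the list-of-rows representation are ours, the field is taken to be Q (via `fractions.Fraction`), and the input is assumed to be in echelon form already, with all nonzero equations listed first.

```python
from fractions import Fraction

def solve_echelon(rows, rhs):
    """Return one solution of an echelon system over Q, or None if inconsistent.

    rows[i] lists the coefficients of equation i and rhs[i] is its right side.
    Assumption: the nonzero equations come first, each with leading
    coefficient 1 in a column strictly to the right of the previous one.
    """
    r = len(rows[0])
    pivots = []  # pivots[k] is the column n_k of the leading 1 in equation k
    for row, b in zip(rows, rhs):
        nonzero = [j for j, a in enumerate(row) if a != 0]
        if nonzero:
            pivots.append(nonzero[0])
        elif b != 0:
            return None  # an equation 0 = b with b != 0 has no solution
    c = [Fraction(0)] * r  # unknowns that lead no equation are set to 0
    for k in reversed(range(len(pivots))):
        # subtract the contribution of the unknowns to the right of the pivot
        tail = sum(rows[k][j] * c[j] for j in range(pivots[k] + 1, r))
        c[pivots[k]] = Fraction(rhs[k]) - tail
    return c

rows = [[1, 2, 3, 1], [0, 0, 1, 3], [0, 0, 0, 0]]
print(solve_echelon(rows, [4, 5, 0]))  # a solution with x2 = x4 = 0
print(solve_echelon(rows, [4, 5, 3]))  # None: the last equation reads 0 = 3
```

Setting the non-leading unknowns to 0 gives the particular solution of the proof; assigning them other values of F before the backward pass would produce the further solutions described in part (b).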
When $b_1 = b_2 = \cdots = b_s = 0$ in (10-4), the system is called homogeneous. A homogeneous system of linear equations is always consistent since (0, 0, . . . , 0) is a solution. An interesting question concerning homogeneous equations is whether or not they have solutions other than the trivial one (0, 0, . . . , 0). This problem can always be referred to the case in which the homogeneous system is in echelon form. Indeed, it is clear that every elementary transformation carries a homogeneous system into a homogeneous system. Therefore, by Theorems 10-2.3 and 10-2.5, every homogeneous system is equivalent to a homogeneous system in echelon form. It is clear that if a homogeneous system has a unique solution, then it has no solution other than the trivial one (0, 0, . . . , 0). Consequently, Theorem 10-2.7(b) provides a condition for a homogeneous system in echelon form to have a nontrivial solution, namely $m < r$, where m is the number of equations of the system in which some nonzero coefficient appears and r is the number of indeterminates. In particular, if the number s of equations is less than the number r of unknowns, then $m \le s < r$, so the system has a nontrivial solution. Consequently, we obtain the following useful result.

THEOREM 10-2.9. Let

$$\sum_{j=1}^{r} a_{i,j}x_j = 0, \qquad i = 1, 2, \ldots, s,$$

be a homogeneous system of s linear equations in r unknowns with coefficients in the field F. Suppose that $s < r$. Then $c_1, c_2, \ldots, c_r$ exist in F, not all zero, such that

$$\sum_{j=1}^{r} a_{i,j}c_j = 0 \qquad \text{for } i = 1, 2, \ldots, s.$$

Proof. By Theorems 10-2.3 and 10-2.5, the system $\sum_{j=1}^{r} a_{i,j}x_j = 0$, $i = 1, 2, \ldots, s$, is equivalent to a homogeneous system S' of s linear equations in r unknowns with coefficients in F such that S' is in echelon form. Since $m \le s < r$, it follows from Theorem 10-2.7(b) and the fact that every field contains at least two elements that there is a solution

$$(c_1, c_2, \ldots, c_r)$$

of S' which is different from (0, 0, . . . , 0). Since S' is equivalent to the given system, it follows that $\sum_{j=1}^{r} a_{i,j}c_j = 0$ for $i = 1, 2, \ldots, s$.
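Theorem 10-2.9 is constructive in spirit: reducing the system by elementary transformations exposes an unknown that leads no equation, and that unknown can be given a nonzero value. The following Python sketch illustrates this over Q. The name `nontrivial_solution` is ours, and for convenience the code clears each leading column in every other row (a slightly stronger reduction than the echelon form of the text), so that the solution can be read off directly.

```python
from fractions import Fraction

def nontrivial_solution(a):
    """Return a nonzero solution of the homogeneous system with coefficient
    rows `a` over Q, or None if only the trivial solution exists."""
    m = [[Fraction(x) for x in row] for row in a]
    r = len(m[0])
    pivot_cols = []
    row = 0
    for col in range(r):
        # find an equation at or below `row` with a nonzero entry in this column
        p = next((i for i in range(row, len(m)) if m[i][col] != 0), None)
        if p is None:
            continue
        m[row], m[p] = m[p], m[row]              # type 1: interchange
        lead = m[row][col]
        m[row] = [x / lead for x in m[row]]      # type 3: make the leading entry 1
        for i in range(len(m)):                  # type 2: clear the rest of the column
            if i != row and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[row])]
        pivot_cols.append(col)
        row += 1
        if row == len(m):
            break
    free = [j for j in range(r) if j not in pivot_cols]
    if not free:
        return None  # every unknown leads an equation, so m = r
    f = free[0]
    c = [Fraction(0)] * r
    c[f] = Fraction(1)                           # choose the free unknown to be 1
    for k, col in enumerate(pivot_cols):
        c[col] = -m[k][f]                        # each leading unknown is then forced
    return c

print(nontrivial_solution([[1, 2, 3], [4, 5, 6]]))  # two equations, three unknowns
```

When there are fewer equations than unknowns, the list `free` can never be empty, which is exactly the content of the theorem.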

EXAMPLE 4. By elementary transformations, the homogeneous system

can be transformed into the system

The value of $x_4$ can be chosen arbitrarily and the equations solved for $x_3$, $x_2$, and $x_1$.

1. Reduce the following systems of linear equations with coefficients in Q to echelon form by means of elementary transformations, describing the elementary transformation being used a t each step. y = 3 (a) 2 x x - y = l x+ y = 2

(b) Zxi
Xl

22

- x2

+ +x3 -

- +x3+

x4 = 1
24

(c)

X -

y = 2

x+ y = 2 32y = 2 x + 7 y = 2


25 = O (d) 2x1 - 3x2 4x3 - ~4 x i - 2x2 - 5 3 - x5=l

4x1 2x1

22 X2

X3

+ + + 4x5 + 2x5
54 X4

= =

2 0

(e)

x ! 2 y

(-l)i'~j

(-1Ii,

1, 2,

. . . , 100

2. Discuss the solution of each of the systems in Problem 1. That is, determine whether or not each system is consistent, and if it is, describe all possible solutions (as in Example 3).

3. Describe the elementary transformations used at each step in Example 2.

4. Solve the following systems of linear equations with coefficients in $Z_7$.
(a) 2x+ 2y+ 32 = 1 4x+6y+ x = 4 X x = 3 (b) xl

+ + 2x2 + 3x3 + 4x4 + 5x5 + 6x6 -/- 4x2 + 2x3 + 2x4 + 4x5 -/- xg + 6x3 + + 6x5 + 6x6 xi + $c 2x2 + 4x3 + 4x4 + 2x5 + x i + 4x2 + 5x3 + 2x4 + 3x5 + 6x6

+ + + + +
22
53

24

X5

$6

X i X i

1 = 1
=

= 1 =

22

$4

X i

X6 =

1 1 1

5. Show that by elementary transformations it is possible to reduce the system

to any of the following systems in echelon form

Does this list of systems include all possible echelon forms to which the given system can be reduced?

6. Suppose that the system $\sum_{j=1}^{r} a_{i,j}x_j = b_i$, $i = 1, 2, \ldots, r$, of r linear equations in r unknowns with coefficients in a field F has the unique solution $(c_1, c_2, \ldots, c_r)$. Show that it is possible to reduce this system by elementary transformations to the form

$$x_1 = c_1, \quad x_2 = c_2, \quad \ldots, \quad x_r = c_r.$$

7. Complete the proof of Theorem 10-2.3 by showing that if a system S' is obtained from a system S by an elementary transformation of type 3 (multiplication of an equation in S by a nonzero element of F), then S and S' are equivalent systems.

8. Let a, b, c, d, e, and f be elements of any field with $a \ne 0$. Prove that the system

$$ax + by = e, \qquad cx + dy = f$$

is consistent if and only if either (i) $ad - bc \ne 0$, or (ii) $ad - bc = af - ec = 0$.

9. Prove that the homogeneous system

has a solution different from (O, 0, O) if and only if

10. Show that if the system S' is obtained from the system S by an elementary transformation, then there is an elementary transformation which carries the system S' into S.

11. Show that if $(c_1, c_2, \ldots, c_r)$ is any solution of the homogeneous system

$$\sum_{j=1}^{r} a_{i,j}x_j = 0, \qquad i = 1, 2, \ldots, s,$$

with all of the $c_j$ belonging to some ring A containing all $a_{i,j}$, and if d is any element of A, then $(dc_1, dc_2, \ldots, dc_r)$ is also a solution of the homogeneous system.

12. Show that if $(c_1, c_2, \ldots, c_r)$ is any solution of the system S:

$$\sum_{j=1}^{r} a_{i,j}x_j = b_i, \qquad i = 1, 2, \ldots, s,$$

and if $(d_1, d_2, \ldots, d_r)$ is a solution of the associated homogeneous system

$$\sum_{j=1}^{r} a_{i,j}x_j = 0, \qquad i = 1, 2, \ldots, s,$$

then $(c_1 + d_1, c_2 + d_2, \ldots, c_r + d_r)$ is a solution of S.

10-3 The algebra of matrices. The study of linear equations in the preceding section serves as a natural introduction to the concept of a rectangular matrix. The system of equations S,

can be completely determined if the coefficients of S are given and the position of each coefficient in the system is known. This information is conveniently presented by the rectangular array

which is called a matrix.

DEFINITION 10-3.1. Let A be a ring. An m by n matrix (plural: matrices) with elements in A is a rectangular array*

$$\begin{bmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & & \vdots \\
a_{m,1} & a_{m,2} & \cdots & a_{m,n}
\end{bmatrix}$$

with m rows and n columns, where the entries $a_{i,j}$ are elements of the ring A. For example,

$$\begin{bmatrix} 2 & 0 & -1 & 6 \\ 0 & -7 & 2 & 20 \end{bmatrix}$$

* In this section and the following one, boldface capital letters will denote matrices.

10-31

THE ALGEBRA OF MATRICES

429

is a 2 by 4 matrix with elements in the ring Z of integers. In this example,

$$a_{1,1} = 2, \quad a_{1,2} = 0, \quad a_{1,3} = -1, \quad a_{1,4} = 6,$$

and

$$a_{2,1} = 0, \quad a_{2,2} = -7, \quad a_{2,3} = 2, \quad a_{2,4} = 20.$$

The entries $a_{i,j}$ of a matrix are called the elements of the matrix, and the position of each element in the matrix is indicated by its subscripts. For instance, $a_{1,1}$ is the element in the first row and first column (the upper left-hand corner) of the matrix, while $a_{3,4}$ is the element in the third row and fourth column. In general, $a_{i,j}$ is the element in the ith row and jth column for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$.

The number m of rows and the number n of columns in a matrix can be arbitrary natural numbers. These numbers are called the dimensions of the matrix. If $\mathbf{A}$ is an n by n matrix, that is, the number of rows is equal to the number of columns, then $\mathbf{A}$ is called a square matrix. A matrix with only one column, that is, an m by 1 matrix, is called a column matrix, or a column vector. Similarly, a matrix with only one row is called a row matrix, or a row vector.

The reader should be careful not to confuse matrices with determinants. Corresponding to every square matrix $\mathbf{A}$ with elements in a commutative ring A, there is associated in a certain way an element of A called the determinant of $\mathbf{A}$. For example, if $\mathbf{A}$ is the 2 by 2 matrix

$$\begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{bmatrix},$$

then the determinant of $\mathbf{A}$ is

$$a_{1,1}a_{2,2} - a_{1,2}a_{2,1},$$

and if $\mathbf{A}$ is the 3 by 3 matrix

$$\begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \end{bmatrix},$$

the determinant of $\mathbf{A}$ is

$$a_{1,1}a_{2,2}a_{3,3} + a_{1,2}a_{2,3}a_{3,1} + a_{1,3}a_{2,1}a_{3,2} - a_{1,3}a_{2,2}a_{3,1} - a_{1,2}a_{2,1}a_{3,3} - a_{1,1}a_{2,3}a_{3,2}.$$

The matrix $\mathbf{A}$ is not an element of the ring A, whereas the determinant of $\mathbf{A}$ is an element of A. For r by s matrices with $r \ne s$, the determinant is not even defined.


Matrices are more than just convenient forms for presenting numerical data. By defining suitable operations of addition, subtraction, and multiplication, it is possible to develop an algebra of matrices which has numerous applications. The purpose of this section is to define these matrix operations and derive their basic properties. Some of the applications of the algebra of matrices will be described in examples.

Two matrices will be called equal if they are identically the same. That is, if

then A = B if and only if m = r, n = s (thus, A and B have the same dimensions), and ai,j = bi,j for i = 1, 2, . . . , m and j = 1, 2, . . . , n. For example, if

A=

[U

:]

and
=

B=

[ o o'1,
-1

then A # B ; if A = (1 1 1) and B

(1 1)) then A # B ; however, if

[O O O]

and

[+ o
I

i2

O
2-(\/2)2

then A = B.

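DEFINITION 10-3.2. If $\mathbf{A}$ and $\mathbf{B}$ are m by n matrices with elements $a_{i,j}$ and $b_{i,j}$ in a ring A, then the sum, $\mathbf{A} + \mathbf{B}$, of $\mathbf{A}$ and $\mathbf{B}$ is the matrix

$$\begin{bmatrix}
a_{1,1} + b_{1,1} & \cdots & a_{1,n} + b_{1,n} \\
\vdots & & \vdots \\
a_{m,1} + b_{m,1} & \cdots & a_{m,n} + b_{m,n}
\end{bmatrix}.$$

Thus, $\mathbf{C} = \mathbf{A} + \mathbf{B}$ is an m by n matrix with elements in A such that $c_{i,j} = a_{i,j} + b_{i,j}$ for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$. It is clear that addition of matrices is a well defined binary operation on the set of all m by n matrices with elements in A. However, the sum $\mathbf{A} + \mathbf{B}$ is not defined unless $\mathbf{A}$ and $\mathbf{B}$ have the same dimensions.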


EXAMPLE 1. The matrices

are 4 by 3 matrices with elements in the field Q of rational numbers.

Since matrices are added "elementwise", according to Definition 10-3.2, the properties of addition which hold in the ring A are also satisfied by matrix addition.
(10-3.3). Matrix addition is associative.

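Proof. Let $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ be m by n matrices with elements $a_{i,j}$, $b_{i,j}$, and $c_{i,j}$ in a ring A. Then by Definition 10-3.2,

$$(\mathbf{A} + \mathbf{B}) + \mathbf{C} = \begin{bmatrix}
(a_{1,1} + b_{1,1}) + c_{1,1} & \cdots & (a_{1,n} + b_{1,n}) + c_{1,n} \\
\vdots & & \vdots \\
(a_{m,1} + b_{m,1}) + c_{m,1} & \cdots & (a_{m,n} + b_{m,n}) + c_{m,n}
\end{bmatrix}.$$

Similarly,

$$\mathbf{A} + (\mathbf{B} + \mathbf{C}) = \begin{bmatrix}
a_{1,1} + (b_{1,1} + c_{1,1}) & \cdots & a_{1,n} + (b_{1,n} + c_{1,n}) \\
\vdots & & \vdots \\
a_{m,1} + (b_{m,1} + c_{m,1}) & \cdots & a_{m,n} + (b_{m,n} + c_{m,n})
\end{bmatrix}.$$

Both $(\mathbf{A} + \mathbf{B}) + \mathbf{C}$ and $\mathbf{A} + (\mathbf{B} + \mathbf{C})$ are m by n matrices with elements in A, and since addition is associative in A, it follows that $(a_{i,j} + b_{i,j}) + c_{i,j} = a_{i,j} + (b_{i,j} + c_{i,j})$ for all i, j. Thus, according to the definition of equality of matrices, $(\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C})$.

The commutative law of addition in a ring A leads to the corresponding property of matrix addition.

(10-3.4). Matrix addition is commutative.

It will be left as an exercise for the reader to prove (10-3.4), that is, to show that if $\mathbf{A}$ and $\mathbf{B}$ are m by n matrices with elements in a ring A, then $\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}$.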
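Let $\mathbf{O}$ denote the m by n matrix which has the zero element of A in every position. Then it follows from Definitions 4-2.1(c) and 10-3.2 that

$$\mathbf{A} + \mathbf{O} = \mathbf{A}, \tag{10-8}$$

where $\mathbf{A}$ is any m by n matrix with elements in A. Of course, $\mathbf{O} + \mathbf{A} = \mathbf{A}$ also. Because $\mathbf{O}$ satisfies (10-8), it is called the zero matrix. Let

$$\mathbf{A} = \begin{bmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
\vdots & \vdots & & \vdots \\
a_{m,1} & a_{m,2} & \cdots & a_{m,n}
\end{bmatrix}$$

be an m by n matrix with elements $a_{i,j}$ in a ring A. Define the negative of $\mathbf{A}$ to be the m by n matrix

$$-\mathbf{A} = \begin{bmatrix}
-a_{1,1} & -a_{1,2} & \cdots & -a_{1,n} \\
\vdots & \vdots & & \vdots \\
-a_{m,1} & -a_{m,2} & \cdots & -a_{m,n}
\end{bmatrix}. \tag{10-9}$$

In (10-9), the element $-a_{i,j}$ of $-\mathbf{A}$ is the negative of $a_{i,j}$ in the ring A. Thus, we have

$$\mathbf{A} + (-\mathbf{A}) = \begin{bmatrix}
a_{1,1} + (-a_{1,1}) & \cdots & a_{1,n} + (-a_{1,n}) \\
\vdots & & \vdots \\
a_{m,1} + (-a_{m,1}) & \cdots & a_{m,n} + (-a_{m,n})
\end{bmatrix} = \begin{bmatrix}
0 & \cdots & 0 \\
\vdots & & \vdots \\
0 & \cdots & 0
\end{bmatrix} = \mathbf{O}. \tag{10-10}$$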
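The elementwise operations just described are easy to experiment with. The sketch below represents a matrix as a list of rows with integer entries; the names `mat_add`, `mat_neg`, and `zero_matrix` are ours and are meant only to illustrate Definition 10-3.2, (10-8), (10-9), and (10-10).

```python
def mat_add(a, b):
    """Elementwise sum of two matrices of the same dimensions."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("A + B is defined only for matrices of the same dimensions")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_neg(a):
    """Negative of a matrix: every element is replaced by its negative."""
    return [[-x for x in row] for row in a]

def zero_matrix(m, n):
    """The m by n matrix with 0 in every position."""
    return [[0] * n for _ in range(m)]

A = [[1, 2], [3, 4]]
B = [[0, -1], [5, 2]]
C = [[7, 7], [7, 7]]

print(mat_add(mat_add(A, B), C) == mat_add(A, mat_add(B, C)))  # associativity
print(mat_add(A, B) == mat_add(B, A))                          # commutativity
print(mat_add(A, mat_neg(A)) == zero_matrix(2, 2))             # A + (-A) = O
```

Each comparison is an instance, for particular integer matrices, of a law that the text proves for matrices over an arbitrary ring.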


Let ${}_mM_n(A)$ denote the set of all m by n matrices with elements in a ring A. Then with addition and negation defined by Definition 10-3.2 and (10-9), the properties (10-3.3), (10-3.4) and equations (10-8), (10-10) correspond exactly to the conditions of Definition 4-2.1 (a), (b), (c), and (d) in the definition of a ring. The reader might expect that the next step would be to introduce an "elementwise" multiplication in the set ${}_mM_n(A)$, which together with addition and negation would make ${}_mM_n(A)$ into a ring. Indeed, this can be done (see Problem 6 below). However, it turns out that in the various applications of matrices, a different definition of matrix multiplication is more useful.

DEFINITION 10-3.5. Let $\mathbf{A}$ be an m by n matrix with elements $a_{i,j}$ in a ring A and let $\mathbf{B}$ be an n by q matrix with elements $b_{i,j} \in A$. Then the product $\mathbf{AB}$ is the m by q matrix which has the element $\sum_{k=1}^{n} a_{i,k}b_{k,j}$ in the ith row and jth column for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, q$.

According to this definition, it is possible to multiply two matrices with elements in a ring only when the first matrix has the same number of columns as the second matrix has rows. Therefore, if $m \ne n$, Definition 10-3.5 does not define the product of two matrices in the set ${}_mM_n(A)$. However, if $m = n$, then it does define a binary operation on the set ${}_nM_n(A)$.
EXAMPLE 2. Let

be matrices with elements in the field Q of rational numbers. Since A has three columns and B has three rows, the product AB is defined. In fact, according to Definition 10-3.5,

The product BA is not defined, since B has three columns, while A has only two rows.
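Definition 10-3.5 translates directly into a short program. In the sketch below the function name `mat_mul` and the two sample matrices are our own choices (they are not the matrices of Example 2); the matrices have the same shapes as in that example, so the first product exists and the second does not.

```python
def mat_mul(a, b):
    """Product of an m by n matrix a and an n by q matrix b (lists of rows)."""
    n = len(b)
    if any(len(row) != n for row in a):
        raise ValueError("columns of the first matrix must equal rows of the second")
    q = len(b[0])
    # the (i, j) element is the sum over k of a[i][k] * b[k][j]
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(q)]
            for i in range(len(a))]

A = [[1, 2, 0],
     [0, 1, -1]]          # 2 by 3
B = [[1, 0, 2],
     [0, 1, 1],
     [3, 0, 1]]           # 3 by 3

print(mat_mul(A, B))      # [[1, 2, 4], [-3, 1, 0]]

try:
    mat_mul(B, A)         # B has three columns, A has only two rows
except ValueError as err:
    print("BA is not defined:", err)
```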

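EXAMPLE 3. Using the definition of multiplication given in Definition 10-3.5, it is possible to write a system of s linear equations in r unknowns as a single matrix equation. Let

$$\sum_{j=1}^{r} a_{i,j}x_j = b_i, \qquad i = 1, 2, \ldots, s,$$

be a system of linear equations with coefficients in an integral domain D. The s by r matrix

$$\mathbf{A} = \begin{bmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,r} \\
\vdots & \vdots & & \vdots \\
a_{s,1} & a_{s,2} & \cdots & a_{s,r}
\end{bmatrix}$$

is called the matrix of coefficients of the system. Define column matrices $\mathbf{X}$ and $\mathbf{B}$ by

$$\mathbf{X} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_r \end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_s \end{bmatrix}.$$

The elements of $\mathbf{A}$, $\mathbf{X}$, and $\mathbf{B}$ can be thought of as being in the ring $D[x_1, x_2, \ldots, x_r]$. Since $\mathbf{A}$ has r columns and $\mathbf{X}$ has r rows, it is possible to form the product $\mathbf{AX}$. By definition, this product is a column matrix with s rows, namely,

$$\mathbf{AX} = \begin{bmatrix}
a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,r}x_r \\
\vdots \\
a_{s,1}x_1 + a_{s,2}x_2 + \cdots + a_{s,r}x_r
\end{bmatrix}.$$

Consequently, the matrix equation $\mathbf{AX} = \mathbf{B}$ is identical with the system of equations

$$\sum_{j=1}^{r} a_{i,j}x_j = b_i, \qquad i = 1, 2, \ldots, s.$$

Using this notation, a solution of the system of equations is a column matrix with r rows

$$\mathbf{C} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_r \end{bmatrix}$$

with elements in a commutative ring A containing D as a subring, such that

$$\mathbf{AC} = \mathbf{B}.$$

EXAMPLE 4. Let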

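be a system of linear equations with coefficients in the integral domain D. Suppose that $y_1, y_2, \ldots, y_t$ are new unknowns which are related to $x_1, x_2, \ldots, x_r$ by the equations

$$\begin{aligned}
x_1 &= d_{1,1}y_1 + d_{1,2}y_2 + \cdots + d_{1,t}y_t \\
x_2 &= d_{2,1}y_1 + d_{2,2}y_2 + \cdots + d_{2,t}y_t \\
&\;\;\vdots \\
x_r &= d_{r,1}y_1 + d_{r,2}y_2 + \cdots + d_{r,t}y_t,
\end{aligned}$$

with $d_{j,k} \in D$ for all j and k. In compact notation,

$$x_j = \sum_{k=1}^{t} d_{j,k}y_k, \qquad j = 1, 2, \ldots, r.$$

Thus, the given system becomes

$$\sum_{j=1}^{r} a_{i,j}\Big(\sum_{k=1}^{t} d_{j,k}y_k\Big) = b_i, \qquad i = 1, 2, \ldots, s,$$

which, by the generalized distributive, commutative, and associative laws, is equivalent to

$$\sum_{k=1}^{t}\Big(\sum_{j=1}^{r} a_{i,j}d_{j,k}\Big)y_k = b_i, \qquad i = 1, 2, \ldots, s.$$

The matrix of coefficients of this new system of equations is the s by t matrix which has the element $\sum_{j=1}^{r} a_{i,j}d_{j,k}$ in the ith row and kth column. That is, if $\mathbf{A}$ is the matrix of coefficients of the original system, and if

$$\mathbf{D} = \begin{bmatrix}
d_{1,1} & d_{1,2} & \cdots & d_{1,t} \\
\vdots & \vdots & & \vdots \\
d_{r,1} & d_{r,2} & \cdots & d_{r,t}
\end{bmatrix}$$

is the matrix of the coefficients in the system of equations which relate $x_1, x_2, \ldots, x_r$ to $y_1, y_2, \ldots, y_t$, then the matrix of coefficients of the new system is $\mathbf{AD}$, according to Definition 10-3.5.

These calculations can be carried out within the algebra of matrices. Let

$$\mathbf{X} = \begin{bmatrix} x_1 \\ \vdots \\ x_r \end{bmatrix}, \qquad \mathbf{Y} = \begin{bmatrix} y_1 \\ \vdots \\ y_t \end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix} b_1 \\ \vdots \\ b_s \end{bmatrix}.$$

Then the relation between the x's and y's can be expressed by the matrix equation $\mathbf{X} = \mathbf{DY}$ (see Example 3). Also, the original system of equations can be written in the form

$$\mathbf{AX} = \mathbf{B}.$$

Substituting $\mathbf{DY}$ for $\mathbf{X}$ in this equation gives

$$\mathbf{A}(\mathbf{DY}) = \mathbf{B}.$$

It must be noted of course that the number of columns of $\mathbf{A}$ is equal to the number of rows of $\mathbf{DY}$, so that $\mathbf{A}(\mathbf{DY})$ makes sense. In a moment we will show that matrix multiplication is associative. Assuming this fact, it follows that

$$\mathbf{A}(\mathbf{DY}) = (\mathbf{AD})\mathbf{Y}.$$

Consequently, the new system of equations in matrix form is

$$(\mathbf{AD})\mathbf{Y} = \mathbf{B}.$$

The matrix of coefficients of this system is clearly $\mathbf{AD}$, which is what we proved above by writing the systems in full. This example illustrates the notational savings which matrices provide.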

We will now establish the associativity of matrix multiplication which was mentioned in Example 4.

(10-3.6). Matrix multiplication is associative.

Proof. Let $\mathbf{A}$ be an m by n matrix with elements $a_{i,j}$ in a ring A, $\mathbf{B}$ an n by q matrix with elements $b_{i,j}$ in A, and $\mathbf{C}$ a q by r matrix with elements $c_{i,j}$ in A. Then the products $\mathbf{AB}$, $\mathbf{BC}$, $(\mathbf{AB})\mathbf{C}$, and $\mathbf{A}(\mathbf{BC})$ are all defined. We wish to prove that these last two products are equal. By Definition 10-3.5,

$$(\mathbf{AB})\mathbf{C}$$

is the m by r matrix which has the element

$$\sum_{l=1}^{q}\Big(\sum_{k=1}^{n} a_{i,k}b_{k,l}\Big)c_{l,j}$$

in the ith row and jth column for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, r$. Again using Definition 10-3.5,

$$\mathbf{A}(\mathbf{BC})$$

is an m by r matrix which has the element

$$\sum_{k=1}^{n} a_{i,k}\Big(\sum_{l=1}^{q} b_{k,l}c_{l,j}\Big)$$

in the ith row and jth column for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, r$. By the distributive laws, Definition 4-2.1(f) and (g), and the commutative law for addition, Definition 4-2.1(a), which are satisfied in the ring A,

$$\sum_{l=1}^{q}\Big(\sum_{k=1}^{n} a_{i,k}b_{k,l}\Big)c_{l,j} = \sum_{l=1}^{q}\sum_{k=1}^{n} (a_{i,k}b_{k,l})c_{l,j}, \qquad \sum_{k=1}^{n} a_{i,k}\Big(\sum_{l=1}^{q} b_{k,l}c_{l,j}\Big) = \sum_{l=1}^{q}\sum_{k=1}^{n} a_{i,k}(b_{k,l}c_{l,j}).$$

Since $(a_{i,k}b_{k,l})c_{l,j} = a_{i,k}(b_{k,l}c_{l,j})$ by the associative law for multiplication in A, it follows that the element in the ith row and jth column of $(\mathbf{AB})\mathbf{C}$ is the same as the element in the ith row and jth column of $\mathbf{A}(\mathbf{BC})$ for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, r$. Therefore,

$$(\mathbf{AB})\mathbf{C} = \mathbf{A}(\mathbf{BC})$$

by the definition of equality of matrices.
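The change of unknowns in Example 4 and the associative law (10-3.6) can be checked on a small numerical case. The matrices below are hypothetical data chosen by us for illustration, and `mat_mul` is our own helper implementing Definition 10-3.5.

```python
def mat_mul(a, b):
    """Product of an m by n matrix a and an n by q matrix b (lists of rows)."""
    n = len(b)
    if any(len(row) != n for row in a):
        raise ValueError("columns of the first matrix must equal rows of the second")
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(len(b[0]))]
            for i in range(len(a))]

A = [[1, 2, 0],
     [0, 1, -1]]          # coefficients of a system in x1, x2, x3
D = [[1, 1],
     [0, 2],
     [1, 0]]              # expresses x1, x2, x3 in terms of y1, y2
Y = [[3],
     [4]]                 # a column of values for y1, y2

X = mat_mul(D, Y)         # the corresponding values of x1, x2, x3
AD = mat_mul(A, D)        # matrix of coefficients of the new system

print(AD)                                  # [[1, 5], [-1, 2]]
print(mat_mul(A, X) == mat_mul(AD, Y))     # A(DY) = (AD)Y
```

A single numerical agreement is of course no proof; it only shows the identity $\mathbf{A}(\mathbf{DY}) = (\mathbf{AD})\mathbf{Y}$ at work for one choice of entries.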

If $\mathbf{A}$ and $\mathbf{B}$ are m by n and n by q matrices, respectively, with elements in a ring, then $\mathbf{AB}$ is defined, but $\mathbf{BA}$ has no meaning unless $m = q$. However, even in the case where both products $\mathbf{AB}$ and $\mathbf{BA}$ are defined, they are not necessarily equal. Indeed, if $\mathbf{A}$ is an m by n matrix and $\mathbf{B}$ is an n by m matrix, then $\mathbf{AB}$ is m by m and $\mathbf{BA}$ is n by n. Thus, if $m \ne n$, the two products do not have the same dimensions, and are not equal. The following example shows that even when $\mathbf{A}$ and $\mathbf{B}$ are both n by n square matrices (so that $\mathbf{AB}$ and $\mathbf{BA}$ are also n by n matrices), the products $\mathbf{AB}$ and $\mathbf{BA}$ may not be equal.
EXAMPLE 5 . Let

be 2 by 2 matrices with elements in Z. Then

be 3 by 3 matrices with elements in Q. Then

7 - 9 CD
=

[%

-%?-

, ] : 1 0

DC

8 -10

[-: y :l.
7 -6

-2

25

We will now adopt the simpler notation $M_n(A)$ for the set ${}_nM_n(A)$ of all n by n matrices with elements in a ring A. The matrices of $M_n(A)$ are called n-rowed square matrices with elements in A. We have already proved most of the results needed for the following theorem.

THEOREM 10-3.7. The set $M_n(A)$ of all n-rowed square matrices with elements in a ring A, with addition, multiplication, and negation, defined by Definitions 10-3.2 and 10-3.5 and (10-9), is a ring. If A contains an identity element 1, then the n by n matrix

$$\mathbf{I} = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}$$

(whose elements $e_{i,j}$ are 1 if $i = j$ and 0 if $i \ne j$) is the identity in $M_n(A)$. Moreover, if $n \ge 2$ and $1 \ne 0$ in A, then $M_n(A)$ is not commutative.

Proof. The only identities left to verify in order to prove that $M_n(A)$ is a ring are the distributive laws, Definition 4-2.1(f) and (g), that is,

$$\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{AB} + \mathbf{AC}, \qquad (\mathbf{A} + \mathbf{B})\mathbf{C} = \mathbf{AC} + \mathbf{BC}.$$

These follow easily from the properties of addition and multiplication in A, and we leave their proof as an exercise for the reader. To prove that $\mathbf{I}$ is an identity in $M_n(A)$, let

$$\mathbf{A} = \begin{bmatrix}
a_{1,1} & \cdots & a_{1,n} \\
\vdots & & \vdots \\
a_{n,1} & \cdots & a_{n,n}
\end{bmatrix}$$

be an arbitrary matrix in $M_n(A)$. Then the element of the ith row and jth column of $\mathbf{AI}$ is

$$a_{i,1}e_{1,j} + a_{i,2}e_{2,j} + \cdots + a_{i,n}e_{n,j},$$

where $e_{k,j} = 0$ if $k \ne j$ and $e_{j,j} = 1$. Thus, $a_{i,1}e_{1,j} + a_{i,2}e_{2,j} + \cdots + a_{i,n}e_{n,j} = a_{i,j} \cdot 1 = a_{i,j}$. Since i and j are arbitrary, it follows that $\mathbf{AI} = \mathbf{A}$. Similarly, $\mathbf{IA} = \mathbf{A}$. To complete the proof, it will be sufficient to exhibit two matrices $\mathbf{A}$ and $\mathbf{B}$ in $M_n(A)$ such that $\mathbf{AB} \ne \mathbf{BA}$ (assuming of course that $n \ge 2$ and $1 \ne 0$ in A). Let

Then it follows easily from Definition 10-3.5 that

AB

ro 1 o . . . oo o o ... o
=

. . . . . . . . .
-0

and

BA

o o o ... o O o ...
e e
s

=
. .
.

. .
.

o o . . . o-

-0

o ...

Therefore $\mathbf{AB} \ne \mathbf{BA}$, so that the proof is complete.
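A small computation shows how noncommutativity arises as soon as $n \ge 2$. The pair of matrices below is one possible choice, made by us for illustration and not necessarily the pair used in the proof above: each has a single entry 1 and zeros elsewhere. The helper names `identity`, `unit_matrix`, and `mat_mul` are our own.

```python
def mat_mul(a, b):
    """Product of two square matrices of the same size (lists of rows)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    """The n by n matrix with 1 on the main diagonal and 0 elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def unit_matrix(n, i, j):
    """The n by n matrix with 1 in row i, column j (0-indexed) and 0 elsewhere."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(n)]

A = unit_matrix(2, 0, 0)
B = unit_matrix(2, 0, 1)

print(mat_mul(A, B))   # [[0, 1], [0, 0]]
print(mat_mul(B, A))   # [[0, 0], [0, 0]]
print(mat_mul(A, identity(2)) == A and mat_mul(identity(2), A) == A)
```

The same construction works for every $n \ge 2$ by placing the two entries 1 in the upper left 2 by 2 corner.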

1. (a) Write the 5 by 3 matrix with elements in Z which has $a_{i,j} = i \cdot j$ for $i = 1, 2, 3, 4, 5$ and $j = 1, 2, 3$. (b) Construct the 2 by 4 matrix with elements in Q which has $a_{i,j} = i/j$ for $i = 1, 2$, and $j = 1, 2, 3, 4$.

2. List every 2 by 2 matrix which has elements in $Z_2$, the ring of integers modulo 2.

3. If $\mathbf{A}$ and $\mathbf{B}$ are m by n matrices with elements in a ring A, then the difference of $\mathbf{A}$ and $\mathbf{B}$ is defined by $\mathbf{A} - \mathbf{B} = \mathbf{A} + (-\mathbf{B})$. Prove that $\mathbf{A} - \mathbf{B}$ is the unique solution of the matrix equation $\mathbf{B} + \mathbf{X} = \mathbf{A}$.

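4. Perform the indicated operations.

5. Prove (10-3.4).

6. Define multiplication in ${}_mM_n(A)$ by the rule

$$\begin{bmatrix}
a_{1,1} & \cdots & a_{1,n} \\
\vdots & & \vdots \\
a_{m,1} & \cdots & a_{m,n}
\end{bmatrix} \cdot \begin{bmatrix}
b_{1,1} & \cdots & b_{1,n} \\
\vdots & & \vdots \\
b_{m,1} & \cdots & b_{m,n}
\end{bmatrix} = \begin{bmatrix}
a_{1,1}b_{1,1} & \cdots & a_{1,n}b_{1,n} \\
\vdots & & \vdots \\
a_{m,1}b_{m,1} & \cdots & a_{m,n}b_{m,n}
\end{bmatrix}.$$

Prove that with this multiplication and with addition and negation, defined by Definition 10-3.2 and (10-9), ${}_mM_n(A)$ is a ring. Prove that if A is commutative, then ${}_mM_n(A)$ is commutative. Is it true that ${}_mM_n(A)$ is an integral domain if A is an integral domain?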

7. Compute the following matrix products.

8. Write the systems of homogeneous linear equations in the unknowns $x_1$, $x_2$, $x_3$, and $x_4$ whose matrices of coefficients are as follows.

(c) $\mathbf{I}$, where $\mathbf{I}$ is the identity matrix of $M_4(Q)$.


9. Find the matrix of coefficients of the systems obtained from the homogeneous systems in Problem 8 by making the following change of unknowns:

10. Complete the proof of Theorem 10-3.7 by proving the distributive laws in $M_n(A)$. Prove, more generally, that if $\mathbf{A} \in {}_mM_n(A)$ and $\mathbf{B}, \mathbf{C} \in {}_nM_q(A)$, then $\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{AB} + \mathbf{AC}$. State a general form of the other distributive law.

11. Prove that if $n \ge 2$ and the ring A contains an element $a \ne 0$, then $M_n(A)$ contains proper divisors of zero.

12. Let A be a ring with identity. Suppose that $n \ge 2$. Find matrices $\mathbf{A}$ and $\mathbf{B}$ in $M_n(A)$ such that $(\mathbf{AB})^2 \ne \mathbf{A}^2\mathbf{B}^2$.

13. Prove that for any ring A, the ring $M_1(A)$ of all 1 by 1 matrices with elements in A is isomorphic to A.

10-41

THE INVERSE OF A SQUARE MATRIX

443

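10-4 The inverse of a square matrix. If F is a field, then by Theorem 10-3.7 the ring $M_n(F)$ of all n-rowed square matrices with elements in F has the identity

$$\mathbf{I} = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix},$$

which is the n by n matrix with 1 in every position on the diagonal line from the upper left-hand corner to the lower right-hand corner (the so-called "main diagonal") and 0 in every other position. The existence of an identity element in $M_n(F)$ makes it possible to define inverses.

DEFINITION 10-4.1. Let $\mathbf{A}$ and $\mathbf{B}$ be in $M_n(F)$, where F is a field. If $\mathbf{AB} = \mathbf{BA} = \mathbf{I}$, then the matrix $\mathbf{B}$ is called an inverse of the matrix $\mathbf{A}$ in $M_n(F)$.

If $\mathbf{B}$ is an inverse of $\mathbf{A}$, then of course $\mathbf{A}$ is an inverse of $\mathbf{B}$, since Definition 10-4.1 is symmetrical in $\mathbf{A}$ and $\mathbf{B}$. A matrix $\mathbf{A}$ may not have an inverse, but if an inverse does exist, then it is unique. In fact, suppose that $\mathbf{AB} = \mathbf{BA} = \mathbf{I}$ and $\mathbf{AC} = \mathbf{CA} = \mathbf{I}$, where $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ belong to $M_n(F)$. Then by the associative law,

$$\mathbf{B} = \mathbf{BI} = \mathbf{B}(\mathbf{AC}) = (\mathbf{BA})\mathbf{C} = \mathbf{IC} = \mathbf{C}.$$

We will denote the unique inverse of $\mathbf{A}$, when it exists, by $\mathbf{A}^{-1}$. Matrices which have no inverse are called singular; if $\mathbf{A}$ has an inverse, then $\mathbf{A}$ is called nonsingular.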
EXAMPLE 1. The matrix

in $M_2(Q)$ does not have an inverse. Assume that

is such that $\mathbf{AB} = \mathbf{I}$. Then


Therefore, the numbers b i , 1, b i ,2, b2,1, and b2,2 must satisfy the following equations : +bi,i @2,1 = 1 Lb1,2 4- $32,2 = 0 +bi,i 4- 3b2,i = O

Multiplying the first equation by -2 and adding it to the third equation, we get an equivalent system of equations:

which is inconsistent. This proves that $\mathbf{A}$ has no inverse in $M_2(Q)$.

EXAMPLE 2. The matrix

in $M_3(C)$ has an inverse

in $M_3(C)$, as the reader can verify by checking that $\mathbf{A}^{-1}\mathbf{A} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$.

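An important elementary property of the set of all nonsingular matrices is the fact that this set is closed under multiplication. In fact, the inverse of the product of nonsingular matrices can be given explicitly in terms of the inverses of the given matrices.

THEOREM 10-4.2. Let $\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_k$ be nonsingular matrices in $M_n(F)$, where F is a field. Then $\mathbf{A}_k^{-1} \cdots \mathbf{A}_2^{-1}\mathbf{A}_1^{-1}$ is the inverse of the product $\mathbf{A}_1\mathbf{A}_2 \cdots \mathbf{A}_k$, so that this product is nonsingular.

Proof. If $k = 1$, the assertion to be proved is that $\mathbf{A}_1^{-1}$ is the inverse of $\mathbf{A}_1$. This is true by the definition of $\mathbf{A}_1^{-1}$. Suppose that $k = 2$. Then

$$(\mathbf{A}_1\mathbf{A}_2)(\mathbf{A}_2^{-1}\mathbf{A}_1^{-1}) = \mathbf{A}_1(\mathbf{A}_2\mathbf{A}_2^{-1})\mathbf{A}_1^{-1} = \mathbf{A}_1\mathbf{I}\mathbf{A}_1^{-1} = \mathbf{A}_1\mathbf{A}_1^{-1} = \mathbf{I}$$

and

$$(\mathbf{A}_2^{-1}\mathbf{A}_1^{-1})(\mathbf{A}_1\mathbf{A}_2) = \mathbf{A}_2^{-1}(\mathbf{A}_1^{-1}\mathbf{A}_1)\mathbf{A}_2 = \mathbf{A}_2^{-1}\mathbf{I}\mathbf{A}_2 = \mathbf{A}_2^{-1}\mathbf{A}_2 = \mathbf{I}.$$

Thus, by Definition 10-4.1, $\mathbf{A}_2^{-1}\mathbf{A}_1^{-1}$ is an inverse of $\mathbf{A}_1\mathbf{A}_2$. Since inverses are unique, $(\mathbf{A}_1\mathbf{A}_2)^{-1} = \mathbf{A}_2^{-1}\mathbf{A}_1^{-1}$. The proof of the general case is obtained by induction on k, using the case $k = 2$ to establish the induction step. We omit the details.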
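The case $k = 2$ of Theorem 10-4.2, including the reversal of the order of the factors, can be checked numerically. In the sketch below the inverse of a 2 by 2 matrix over Q is computed from the familiar determinant formula; this is a shortcut assumed by us for the illustration and not the method developed in this section. The function names and sample matrices are likewise our own.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Product of two square matrices of the same size (lists of rows)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_2x2(a):
    """Inverse of a 2 by 2 matrix over Q, or None if the determinant is 0."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        return None
    d = Fraction(1) / det
    return [[a[1][1] * d, -a[0][1] * d],
            [-a[1][0] * d, a[0][0] * d]]

A1 = [[1, 2], [3, 4]]
A2 = [[0, 1], [1, 1]]

left = inverse_2x2(mat_mul(A1, A2))                  # (A1 A2)^(-1)
right = mat_mul(inverse_2x2(A2), inverse_2x2(A1))    # A2^(-1) A1^(-1)
print(left == right)
print(inverse_2x2([[1, 2], [2, 4]]))                 # None: a singular matrix
```

Multiplying the inverses in the original order, `mat_mul(inverse_2x2(A1), inverse_2x2(A2))`, gives a different matrix for this pair, which is why the order of the factors in the theorem matters.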

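If

$$\sum_{j=1}^{n} a_{i,j}x_j = b_i, \qquad i = 1, 2, \ldots, n,$$

is a system of n linear equations in n unknowns with coefficients in a field F, then the matrix of coefficients of this system

$$\mathbf{A} = \begin{bmatrix}
a_{1,1} & \cdots & a_{1,n} \\
\vdots & & \vdots \\
a_{n,1} & \cdots & a_{n,n}
\end{bmatrix}$$

belongs to $M_n(F)$. If the matrix $\mathbf{A}$ has an inverse in $M_n(F)$, and if this inverse is known, then the system of equations can easily be solved. In fact, suppose that $(c_1, c_2, \ldots, c_n)$ is any solution of the system. As we observed in Example 3, Section 10-3, $\mathbf{AC} = \mathbf{B}$, where

$$\mathbf{C} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}.$$

Multiplying each side of this equation by $\mathbf{A}^{-1}$ gives

$$\mathbf{A}^{-1}(\mathbf{AC}) = \mathbf{A}^{-1}\mathbf{B}.$$

Therefore, by the associative law, $\mathbf{C} = \mathbf{IC} = (\mathbf{A}^{-1}\mathbf{A})\mathbf{C} = \mathbf{A}^{-1}(\mathbf{AC}) = \mathbf{A}^{-1}\mathbf{B}$. That is, the solution $(c_1, c_2, \ldots, c_n)$ can be obtained in the form of a column matrix by computing $\mathbf{A}^{-1}\mathbf{B}$, provided that $\mathbf{A}^{-1}$ is known. Conversely, by direct substitution of $\mathbf{C} = \mathbf{A}^{-1}\mathbf{B}$ for $\mathbf{X}$ in the matrix equation $\mathbf{AX} = \mathbf{B}$, it follows that $\mathbf{C}$ is a solution. Therefore, the elements of $\mathbf{C}$ furnish a solution of the original system of linear equations. Note that the solution of the system is unique since $\mathbf{C} = \mathbf{A}^{-1}\mathbf{B}$ and $\mathbf{A}^{-1}$ is unique.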
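The formula $\mathbf{C} = \mathbf{A}^{-1}\mathbf{B}$ can be tried on a small system over Q. The system $2x_1 + x_2 = 3$, $x_1 - x_2 = 0$ below is a hypothetical example of ours, and the 2 by 2 inverse is again taken from the determinant formula purely for convenience.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Product of an m by n matrix a and an n by q matrix b (lists of rows)."""
    n = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(len(b[0]))]
            for i in range(len(a))]

def inverse_2x2(a):
    """Inverse of a 2 by 2 matrix over Q, or None if the determinant is 0."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        return None
    d = Fraction(1) / det
    return [[a[1][1] * d, -a[0][1] * d],
            [-a[1][0] * d, a[0][0] * d]]

A = [[2, 1],
     [1, -1]]            # matrix of coefficients
B = [[3],
     [0]]                # column of right-hand sides

C = mat_mul(inverse_2x2(A), B)   # the unique solution, as a column matrix
print(C)
print(mat_mul(A, C) == B)        # direct substitution confirms AC = B
```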

EXAMPLE 3. Consider the following system of linear equations with coefficients in C:


ixl
-

ix2

23

The matrix of the coefficients of this system is the matrix A whose inverse was given in Example 2. By our discussion, the unique solution of this system is obtained from the column matrix

thus, $((-17 + 4i)/61,\ (3 + 28i)/61,\ (-24 + 20i)/61)$ is the solution of the given system.

The above discussion gives some indication of why it is important to be able to decide whether or not a matrix has an inverse, and if it has, to find the inverse. In the remainder of this section, we will describe a practical method* of finding the inverse of any nonsingular square matrix with elements in a field. The process is similar to the method of solving systems of linear equations which was explained in Section 10-2.

Suppose that $\sum_{j=1}^{n} a_{i,j}x_j = b_i$, $i = 1, 2, \ldots, m$, is a system of m linear equations in n unknowns with coefficients in the field F. Let $\mathbf{A}$ be the matrix of coefficients of this system. If we apply an elementary transformation to this system, then a system of linear equations is obtained whose matrix of coefficients $\mathbf{B}$ can be described in terms of the matrix $\mathbf{A}$. For example, if the elementary transformation interchanges the equations k and l, then $\mathbf{B}$ is obtained from $\mathbf{A}$ by interchanging the rows k and l. This observation motivates the definition of an elementary row transformation of a matrix $\mathbf{A}$ in ${}_mM_n(F)$. There are three types of such elementary transformations, which can be described as follows.

* It can be shown that a square matrix $\mathbf{A}$ with elements in a field is nonsingular if and only if the determinant of $\mathbf{A}$ is not zero. An explicit expression can even be given for the inverse of $\mathbf{A}$ in terms of certain determinants. However, the method which we will explain below is a more practical way to find $\mathbf{A}^{-1}$ than by evaluating these determinants.

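(1) Interchange two rows of $\mathbf{A}$: the ith row $(a_{i,1}, a_{i,2}, \ldots, a_{i,n})$ and the jth row $(a_{j,1}, a_{j,2}, \ldots, a_{j,n})$ exchange places, where $i \ne j$.

(2) Multiply a row of $\mathbf{A}$ by some element of F and add to a different row of $\mathbf{A}$:

$$(a_{j,1}, a_{j,2}, \ldots, a_{j,n}) \longrightarrow (a_{j,1} + ca_{i,1},\ a_{j,2} + ca_{i,2},\ \ldots,\ a_{j,n} + ca_{i,n}),$$

where $c \in F$, and $i \ne j$.

(3) Multiply a row of $\mathbf{A}$ by some nonzero element of F:

$$(a_{i,1}, a_{i,2}, \ldots, a_{i,n}) \longrightarrow (ca_{i,1},\ ca_{i,2},\ \ldots,\ ca_{i,n}),$$

where $c \in F$, and $c \ne 0$.

It is clear that the method used to prove Theorem 10-2.6 can be employed to show that any matrix can be carried into echelon form:

by a sequence of elementary transformations.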


EXAMPLE 4. Let

By elementary row transformations, A is taken into echelon form as follows:

It should be clear which elementary row transformation is applied at each step.

A sequence of elementary row transformations on a matrix $\mathbf{A} \in {}_mM_n(F)$ can be accomplished by multiplying $\mathbf{A}$ by a matrix $\mathbf{P} \in M_m(F)$. This fact can be used to give a necessary and sufficient condition for a square matrix to have an inverse, and to calculate the inverse when it exists. In order to carry out this program, we need several preliminary results.

(10-4.3). Let $\mathbf{I}^{(i,j)}$ be the matrix obtained from the identity matrix $\mathbf{I} \in M_m(F)$ by interchanging the ith and jth rows of $\mathbf{I}$. Let $\mathbf{A} \in {}_mM_n(F)$. Then the matrix $\mathbf{I}^{(i,j)}\mathbf{A}$ is the matrix obtained from $\mathbf{A}$ by interchanging the ith and jth rows of $\mathbf{A}$.

Proof. The matrix $\mathbf{I}^{(i,j)}$ has 1's on the diagonal except in the ith and jth rows where the diagonal element is zero, 1 in the (i, j)-position, 1 in the (j, i)-position, and zeros elsewhere. If $\mathbf{A} \in {}_mM_n(F)$ is a matrix with elements $a_{i,j}$, then it follows from the definition of matrix multiplication that $\mathbf{I}^{(i,j)}\mathbf{A}$ is the matrix $\mathbf{A}$ with its ith and jth rows interchanged.

For example, $\mathbf{I}^{(2,4)}$ is the matrix obtained from the identity matrix in $M_4(F)$ by interchanging the second and fourth rows:

$$\mathbf{I}^{(2,4)} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}.$$

(10-4.4). Let $\mathbf{I}_c^{(i,j)}$ be the matrix obtained from the identity matrix $\mathbf{I} \in M_m(F)$ by multiplying each element of the ith row of $\mathbf{I}$ by $c \in F$ and adding it to the corresponding element of the jth row ($i \ne j$). Let $\mathbf{A} \in {}_mM_n(F)$. Then the matrix $\mathbf{I}_c^{(i,j)}\mathbf{A}$ is the matrix obtained from $\mathbf{A}$ by multiplying each element of the ith row of $\mathbf{A}$ by c and adding it to the corresponding element of the jth row.

Proof. Observe that $\mathbf{I}_c^{(i,j)}$ is a matrix with 1's on the diagonal, the element $c \in F$ in the (j, i)-position and zeros elsewhere. Let $\mathbf{A}$ have elements $a_{i,j}$. Then, by the definition of matrix multiplication, the jth row of $\mathbf{I}_c^{(i,j)}\mathbf{A}$ is $(a_{j,1} + ca_{i,1},\ a_{j,2} + ca_{i,2},\ \ldots,\ a_{j,n} + ca_{i,n})$, and every other row is the same as the corresponding row of $\mathbf{A}$.

For instance, $\mathbf{I}_c^{(1,3)}$ is the matrix obtained from the identity matrix in $M_4(F)$ by multiplying each element of the first row of $\mathbf{I}$ by c and adding to the corresponding element of the third row:

$$\mathbf{I}_c^{(1,3)} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ c & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

(10-4.5). Let $\mathbf{I}_c^{(i)}$ be the matrix obtained from the identity matrix $\mathbf{I} \in M_m(F)$ by multiplying each element of the ith row of $\mathbf{I}$ by $c \ne 0$ in F. Let $\mathbf{A} \in {}_mM_n(F)$. Then $\mathbf{I}_c^{(i)}\mathbf{A}$ is the matrix obtained from $\mathbf{A}$ by multiplying each element of the ith row of $\mathbf{A}$ by c.

Proof. The result follows at once when we note that $\mathbf{I}_c^{(i)}$ is a matrix with 1's on the diagonal except in the ith row where c is on the diagonal, and zeros elsewhere.


We will refer to the matrices $I^{(i,j)}$, $I_c^{(i,j)}$, and $I_c^{(i)}$ as elementary transformation matrices of type 1, 2, and 3, respectively. The results (10-4.3), (10-4.4), and (10-4.5) show that each elementary row transformation on a matrix can be accomplished by multiplying the given matrix on the left by a matrix obtained from I by this same elementary transformation.
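The statement that a row transformation equals left multiplication by the corresponding elementary transformation matrix can be checked numerically. The following is only a sketch: the function names, the 0-based row indices, and the sample matrix `A` are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

def identity(m):
    """The m-by-m identity matrix with exact rational entries."""
    return [[Fraction(int(r == c)) for c in range(m)] for r in range(m)]

def matmul(X, Y):
    """Ordinary matrix product X * Y."""
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]

def swap_matrix(m, i, j):
    """Type 1: the identity with rows i and j interchanged."""
    E = identity(m)
    E[i], E[j] = E[j], E[i]
    return E

def add_multiple_matrix(m, i, j, c):
    """Type 2: the identity with c times row i added to row j (i != j)."""
    E = identity(m)
    E[j][i] = Fraction(c)   # c sits in the (j, i)-position
    return E

def scale_matrix(m, i, c):
    """Type 3: the identity with row i multiplied by c (c != 0)."""
    E = identity(m)
    E[i][i] = Fraction(c)
    return E

# Hypothetical sample matrix, chosen only for illustration.
A = [[Fraction(v) for v in row] for row in [[0, 2, 4], [1, 3, 5], [2, 8, 7]]]

swapped = matmul(swap_matrix(3, 0, 1), A)             # rows 0 and 1 interchanged
added = matmul(add_multiple_matrix(3, 0, 2, -2), A)   # row 2 minus 2 * row 0
scaled = matmul(scale_matrix(3, 1, Fraction(1, 2)), A)
```

Each product changes only the rows named in the transformation, which is the content of (10-4.3), (10-4.4), and (10-4.5).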

EXAMPLE 5. We will find a matrix $P \in M_3(Q)$ such that PA is in echelon form, where A is a given matrix in $M_3(Q)$ [display of A not recoverable].

In Table 10-2, we list a sequence of elementary row transformations which will carry A into echelon form, the corresponding elementary transformation matrices, and the result of performing these elementary transformations.

TABLE 10-2 [matrix columns not recoverable; only the transformations survive]

1. Interchange the first and second rows.
2. Multiply the first row by 2.
3. Multiply the first row by -2 and add to the third row.
4. Multiply the second row by … and add to the third row.
5. Multiply the third row by -….


From the table, we see that

and the required matrix P is the product of the five elementary transformation matrices. Since

it is evident that P is obtained from the identity matrix I by performing the given sequence of elementary transformations on I. Thus, P can be computed without resorting to matrix multiplication. The following steps carry I into P by the elementary transformations listed in Table 10-2:

The reader can check that

(10-4.6). Each elementary transformation matrix in Mm(F) has an inverse in Mm(F) which is an elementary transformation matrix of the same type.

Proof. By (10-4.3), when a matrix is multiplied on the left by $I^{(i,j)}$, the ith and jth rows of the matrix are interchanged. Since $I^{(i,j)}$ is obtained from I by interchanging the ith and jth rows of I, it follows that $I^{(i,j)}I^{(i,j)} = I$.


Therefore, the inverse of $I^{(i,j)}$ is $I^{(i,j)}$. By (10-4.4), multiplying a matrix on the left by $I_{-c}^{(i,j)}$ adds -c times each element of the ith row of the matrix to the corresponding element of the jth row. Since $I_c^{(i,j)}$ is obtained from I by multiplying each element of the ith row of I by c and adding to the corresponding element of the jth row, it follows that $I_{-c}^{(i,j)}I_c^{(i,j)} = I$. A similar argument shows that $I_c^{(i,j)}I_{-c}^{(i,j)} = I$. Therefore $I_{-c}^{(i,j)}$ is the inverse of $I_c^{(i,j)}$. Finally, it is easy to check that $I_{1/c}^{(i)}$ is the inverse of $I_c^{(i)}$, and this completes the proof.
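The three inverse relations in this proof can be confirmed on a small case. This is a minimal sketch under assumed names (`type1`, `type2`, `type3`) and 0-based row indices; the values m = 4, c = 5 are arbitrary choices for illustration.

```python
from fractions import Fraction

def identity(m):
    return [[Fraction(int(r == c)) for c in range(m)] for r in range(m)]

def matmul(X, Y):
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]

def type1(m, i, j):
    """Identity with rows i and j interchanged."""
    E = identity(m)
    E[i], E[j] = E[j], E[i]
    return E

def type2(m, i, j, c):
    """Identity with c times row i added to row j (i != j)."""
    E = identity(m)
    E[j][i] = Fraction(c)
    return E

def type3(m, i, c):
    """Identity with row i multiplied by the nonzero scalar c."""
    E = identity(m)
    E[i][i] = Fraction(c)
    return E

m, c = 4, Fraction(5)
I = identity(m)

# Each elementary matrix times its claimed inverse should give the identity.
check_type1 = matmul(type1(m, 1, 3), type1(m, 1, 3)) == I
check_type2 = (matmul(type2(m, 0, 2, -c), type2(m, 0, 2, c)) == I
               and matmul(type2(m, 0, 2, c), type2(m, 0, 2, -c)) == I)
check_type3 = matmul(type3(m, 2, 1 / c), type3(m, 2, c)) == I
```

In every case the inverse is an elementary transformation matrix of the same type, as (10-4.6) asserts.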

Since any product of nonsingular matrices has an inverse, by Theorem 10-4.2, the following result is obtained from (10-4.6).

(10-4.7). A matrix $P \in M_m(F)$ which is a product of elementary transformation matrices has an inverse in $M_m(F)$.

We now return to the consideration of n-rowed square matrices. One more preliminary result is needed before the main theorem.

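(10-4.8). Let the matrix A in $M_n(F)$ be in the echelon form. If A has 1 in every main diagonal position, then

$$A = \begin{pmatrix} 1 & d_{1,2} & d_{1,3} & \cdots & d_{1,n} \\ 0 & 1 & d_{2,3} & \cdots & d_{2,n} \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & d_{n-1,n} \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix},$$

and it is possible to transform A into the identity matrix I in $M_n(F)$ by a sequence of elementary row transformations.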

Proof. If the last row of A is multiplied by $-d_{1,n}$ and added to the first row, then multiplied by $-d_{2,n}$ and added to the second row, etc., we obtain the matrix which is identical with A except that $d_{1,n}, d_{2,n}, \ldots, d_{n-1,n}$ are replaced by 0.


Next, the (n - 1)st row of this new matrix is multiplied by $-d_{1,n-1}$ and added to the first row, then multiplied by $-d_{2,n-1}$ and added to the second row, and so forth. This sequence of elementary row transformations leads to a matrix whose last two columns agree with those of the identity matrix.

It is obvious how this process is continued to finally obtain the identity matrix I.

EXAMPLE 6. By using type 2 elementary transformations, the matrix [display not recoverable] is reduced to the identity matrix in $M_4(Q)$ in the following five steps:

THEOREM 10-4.9. Let F be a field and suppose that $A \in M_n(F)$. Then A has an inverse in $M_n(F)$ if and only if A can be transformed into the identity matrix I of $M_n(F)$ by a sequence of elementary row transformations. The inverse of A can be obtained by applying to I the same sequence of elementary row transformations that is used to get from A to I.

Proof. Suppose that A can be transformed into I by a sequence of elementary transformations. Then by (10-4.3), (10-4.4), and (10-4.5) there is a sequence $E_1, E_2, \ldots, E_{k-1}, E_k$ of elementary transformation matrices such that $E_kE_{k-1} \cdots E_2E_1A = I$.

Let $B = E_kE_{k-1} \cdots E_2E_1$. Then BA = I. We wish to show that B is the inverse of A. By Definition 10-4.1, it is sufficient to prove that AB = I. Note that by (10-4.7), B has an inverse $B^{-1}$. From this fact and the identity BA = I we obtain the desired result:

$$AB = IAB = B^{-1}BAB = B^{-1}IB = B^{-1}B = I.$$

By definition, $B = E_kE_{k-1} \cdots E_2E_1I$, so that B is obtained from I by applying in order the elementary transformations corresponding to $E_1, E_2, \ldots, E_{k-1}$, and finally $E_k$. This proves the last statement of the theorem.

The only thing left to show is that if A has an inverse, then A can be transformed into I by a sequence of elementary transformations. Suppose that $A^{-1}$ exists. As we remarked before, any matrix A can be transformed into the echelon form (10-11) by means of a sequence of elementary row transformations. Consequently, by (10-4.3), (10-4.4), and (10-4.5), there is a matrix $P \in M_n(F)$, such that P is a product of elementary transformation matrices and C = PA is in echelon form (10-11). To complete the proof, it is sufficient by (10-4.8) to show that C has the form

$$C = \begin{pmatrix} 1 & c_{1,2} & \cdots & c_{1,n} \\ 0 & 1 & \cdots & c_{2,n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$$

with 1 in every diagonal position. Suppose that C does not have this form. Then because C is a square matrix in echelon form, it follows that every element of the last row of C is zero. Therefore, by the definition of matrix multiplication, if D is any matrix in Mn(F), then every element in the last row of CD is zero. In particular, C cannot have an inverse. However, C = PA. By assumption A has an inverse, and since P is a product of elementary transformation matrices, it follows from (10-4.7) that P has an inverse. Therefore, by Theorem 10-4.2, C has an inverse. This contradiction shows that C must have the form (10-12), which completes the proof.
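The procedure of Theorem 10-4.9 can be sketched in code by carrying A and I along together and applying each row transformation to both. This is an illustrative sketch, not the book's own presentation: the function name `invert` is an assumption, and the code clears each column above and below the pivot in one pass instead of first reaching echelon form and then back-eliminating as in (10-4.8).

```python
from fractions import Fraction

def invert(A):
    """Return the inverse of the square matrix A, or None if A is singular.

    Works on the augmented rows [A | I]; when the left half has become I,
    the right half is the inverse.
    """
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(int(r == c)) for c in range(n)]
         for r, row in enumerate(A)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None                      # no 1 can be produced on the diagonal
        M[col], M[pivot] = M[pivot], M[col]  # type 1: interchange two rows
        p = M[col][col]
        M[col] = [v / p for v in M[col]]     # type 3: multiply a row by 1/p
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                # type 2: add -f times the pivot row to row r
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

inverse = invert([[2, 1], [5, 3]])
singular = invert([[1, 2], [2, 4]])
```

For the first sample matrix the right half ends as the inverse; for the second, a zero column below the diagonal appears and the function reports that no inverse exists, which matches the remark that a zero last row in echelon form rules out an inverse.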


The last part of the above proof shows that no matter how a nonsingular matrix A is reduced to echelon form by elementary row transformations, the result will be of the form (10-12). Otherwise A could not have an inverse. Therefore, if a matrix $A \in M_n(F)$ reduces by elementary row transformations to an echelon form different from (10-12) (which means that the last row must contain all zeros), then A does not have an inverse in $M_n(F)$.

EXAMPLE 7. We will show that the matrix A [display not recoverable] in $M_4(Q)$ has no inverse. In fact, by the usual process of carrying A into echelon form, we obtain

At this point, it is possible to stop, even though complete reduction to echelon form has not been achieved. It is clear however that elementary row transformations applied to the last two rows of this matrix cannot produce a 1 on the main diagonal in the third row and third column. Therefore, A can be transformed by elementary transformations into an echelon matrix which is not of the form (10-12). Consequently, A has no inverse in $M_4(Q)$. Note that this same conclusion could not be obtained from the next to last matrix in the above sequence, because of the presence of 4 in the fourth row, second column.

EXAMPLE 8. Let us apply the process described in Theorem 10-4.9 to obtain the inverse which was given (without any motivation) for the matrix

in Example 2. From the second line on, the first column of Table 10-3 describes an elementary row transformation. The second and third columns give the matrices which are obtained by applying these elementary transformations to the corresponding matrices of the preceding lines. The second and third columns of the first line contain the matrices A and I in $M_3(C)$, respectively. The second and third columns of the last line of Table 10-3 contain I and $A^{-1}$.

1. Check that $AA^{-1} = A^{-1}A = I$ in Example 2.

2. Complete the induction in the proof of Theorem 10-4.2.

3. Show by an example that the sum of two nonsingular matrices is not necessarily nonsingular. Can the sum of two singular matrices be nonsingular?

4. Carry the following matrices into echelon form (10-11) by elementary row transformations.

5. Write the following elementary transformation matrices in $M_5(Q)$: [list not recoverable]. Describe in words the elementary row transformations to which each of these matrices corresponds.

TABLE 10-3 [matrix columns not recoverable; the transformations that survive are listed in order]

Multiply the first row by -i
Multiply the first row by -3i and add to the second row
Multiply the first row by -1 and add to the third row
Interchange the second and third rows
Multiply the second row by -(4 + 3i) and add to the third row



6. Find a matrix P such that PA is in echelon form (10-11) for each matrix A listed in Problem 4.

7. Find the inverses of the elementary transformation matrices of Problem 5.

8. Which of the matrices of Problem 4 have inverses? Find the inverses when they exist.

9. Prove that $A \in M_n(F)$ has an inverse if and only if A is a product of elementary transformation matrices.

10. Let A be the matrix of coefficients of the homogeneous system of n equations in n unknowns with coefficients in a field F. Write this system in matrix form, AX = O.
(a) Show that if B is a nonsingular matrix, then the solutions of AX = O are the same as the solutions of (BA)X = O.
(b) Use the result of (a) to prove that A is nonsingular if and only if AX = O has only the trivial solution X = O. [Hint: To prove that this condition is sufficient, let B be a product of elementary transformation matrices such that BA is in echelon form. Use the result of Theorem 10-2.7(b), together with (10-4.8) and Theorem 10-4.9.]
(c) Use part (b) to prove that if $A \in M_n(F)$ is such that BA = I for some $B \in M_n(F)$, then A is nonsingular, and $B = A^{-1}$.

11. Prove that the matrix

$$\begin{pmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \end{pmatrix}$$

is nonsingular if and only if its determinant

$$a_{1,1}a_{2,2}a_{3,3} + a_{1,2}a_{2,3}a_{3,1} + a_{1,3}a_{2,1}a_{3,2} - a_{1,3}a_{2,2}a_{3,1} - a_{1,1}a_{2,3}a_{3,2} - a_{1,2}a_{2,1}a_{3,3}$$

is not zero.

APPENDIX 1

THE PROOF OF STURM'S THEOREM

THEOREM A1-1. Sturm's theorem. Let f(x) be a polynomial in R[x] with Sturm sequence

$$f(x),\quad f'(x),\quad s_1(x) = q_1(x)f'(x) - f(x),\quad s_2(x) = q_2(x)s_1(x) - f'(x),\quad s_3(x) = q_3(x)s_2(x) - s_1(x),\quad \ldots,\quad s_k(x) = q_k(x)s_{k-1}(x) - s_{k-2}(x). \tag{1}$$

Let c and d be real numbers such that c < d and $f(c) \ne 0$ and $f(d) \ne 0$. For each real number t, let N(t) be the number of variations in sign in the sequence (1) evaluated at x = t. Then the number of distinct real roots of f(x) between c and d is equal to N(c) - N(d).

Proof. The first step in the proof is to replace the Sturm sequence (1) by a modified sequence for which the value of N(c) - N(d) is the same as for (1). The last Sturm polynomial $s_k(x)$ is a g.c.d. of f(x) and f'(x) (see Section 9-11), and is a divisor of every polynomial in the Sturm sequence (1). The modified sequence is

$$g(x) = \frac{f(x)}{s_k(x)},\quad g_0(x) = \frac{f'(x)}{s_k(x)},\quad g_1(x) = \frac{s_1(x)}{s_k(x)},\quad \ldots,\quad g_k(x) = \frac{s_k(x)}{s_k(x)} = 1. \tag{2}$$

Dividing the relations in (1) by $s_k(x)$ gives $g_1(x) = q_1(x)g_0(x) - g(x)$, $g_2(x) = q_2(x)g_1(x) - g_0(x)$, ..., $g_k(x) = q_k(x)g_{k-1}(x) - g_{k-2}(x)$.

Since $s_k(x)$ divides f(x) and $f(c) \ne 0$, it follows that $s_k(c) \ne 0$. Therefore, dividing each polynomial of sequence (1) by $s_k(x)$ leaves the signs the same at x = c if $s_k(c) > 0$, and reverses each sign at x = c if $s_k(c) < 0$. In either case, the number of variations in sign in the sequence

$$g(c),\ g_0(c),\ g_1(c),\ \ldots,\ g_k(c)$$

is the same as the number of variations in

$$f(c),\ f'(c),\ s_1(c),\ \ldots,\ s_k(c).$$

That is, N(c) is the same for sequence (2) as for sequence (1). Similarly, $s_k(d) \ne 0$ since $f(d) \ne 0$, and N(d) calculated from (2) is the same as N(d) computed from (1). Thus, the modified Sturm sequence (2) yields the same value of N(c) - N(d) as the original sequence (1).

We next observe that the real roots of g(x) are the same as the real roots of f(x), although possibly with different multiplicities. In fact, suppose that the distinct real roots of f(x) are $u_1, u_2, \ldots, u_p$. Then

$$f(x) = a(x - u_1)^{m_1}(x - u_2)^{m_2} \cdots (x - u_p)^{m_p}\, q_1(x)^{n_1} q_2(x)^{n_2} \cdots q_w(x)^{n_w},$$

where a is a nonzero real number, $m_1, \ldots, m_p, n_1, \ldots, n_w$ are natural numbers, and $q_1(x), q_2(x), \ldots, q_w(x)$ are distinct monic polynomials of degree greater than one which are irreducible in R[x] and consequently have no real roots. By Theorem 9-6.4,

$$s_k(x) = b(x - u_1)^{m_1 - 1} \cdots (x - u_p)^{m_p - 1}\, q_1(x)^{n_1 - 1} \cdots q_w(x)^{n_w - 1},$$

where b is some nonzero real number. Hence,

$$g(x) = \frac{f(x)}{s_k(x)} = \frac{a}{b}(x - u_1)(x - u_2) \cdots (x - u_p)\, q_1(x) q_2(x) \cdots q_w(x).$$

Thus, the different real roots of g(x) are also $u_1, u_2, \ldots,$ and $u_p$. Moreover, we note for future reference that each $u_i$ is a simple root of g(x). Thus, to prove the theorem, it is sufficient to show that N(c) - N(d) calculated from sequence (2) is the number of roots of g(x) in the interval from c to d.

Divide the interval between c and d at each point which corresponds to a root of any one of the polynomials in the sequence (2). We then have a finite set of real numbers

$$c = x_0 < x_1 < x_2 < \cdots < x_{r-1} < x_r = d$$

such that each $x_i$ for $1 \le i < r$ is a root of some polynomial in (2), and every root of every polynomial in (2) in the interval from c to d is in the set $\{x_0, x_1, x_2, \ldots, x_{r-1}, x_r\}$. Thus if t satisfies $x_{i-1} < t < x_i$ for i = 1, 2, ..., r, then none of the polynomials in (2) is equal to 0 at x = t. The proof is carried out by showing that (i) the value of N(t) remains unchanged in each interval $x_{i-1} < t < x_i$, (ii) the value of N(t) is the same in two adjacent intervals $x_{i-1} < t < x_i$ and $x_i < t < x_{i+1}$ if $x_i$ is not a root of g(x), and (iii) the value of N(t) for $x_i < t < x_{i+1}$ is one less than the value of N(t) for $x_{i-1} < t < x_i$ if $x_i$ is a root of g(x).


Proof of (i). Suppose that one of the polynomials in sequence (2) changes sign in an interval $x_{i-1} < t < x_i$. Denote this polynomial by h(x). Then $h(t_1)$ and $h(t_2)$ have opposite signs where $x_{i-1} < t_1 < t_2 < x_i$. By Theorem 9-10.1, h(x) has a root between $t_1$ and $t_2$. However, this contradicts the fact that every root of h(x) between c and d is in the set $\{x_0, x_1, x_2, \ldots, x_{r-1}, x_r\}$. Therefore, every polynomial in sequence (2) has the same sign for all t such that $x_{i-1} < t < x_i$. This implies that N(t), which is the number of variations in sign in the sequence

$$g(t),\ g_0(t),\ g_1(t),\ \ldots,\ g_k(t),$$

is the same for all t such that $x_{i-1} < t < x_i$.

Proof of (ii). Suppose that $x_i$ is not a root of g(x). We will compare the sequences

$$g(t),\ g_0(t),\ g_1(t),\ \ldots,\ g_k(t) = 1, \tag{3}$$

where $x_{i-1} < t < x_i$, and

$$g(x_i),\ g_0(x_i),\ g_1(x_i),\ \ldots,\ g_k(x_i) = 1. \tag{4}$$

By the proof of (i), the signs of the numbers in (4) are the same as those for the corresponding numbers in (3), except that some of the numbers in sequence (4) may be zero. Observe that the first and last terms in (4) are not zero, since $x_i$ is not a root of g(x) and $g_k(x_i) = 1$. Moreover, no two consecutive terms in (4) are zero. For otherwise, examination of the equations (2) shows that all following terms would be zero. In particular, $g_k(x_i) = 0$, which is impossible. It also follows from (2) that those numbers in sequence (4) which are adjacent to a zero have opposite signs. For example, if $g_2(x_i) = 0$, then since $g_3(x) = q_3(x)g_2(x) - g_1(x)$, we have $0 \ne g_3(x_i) = -g_1(x_i)$. Therefore, at a place where a zero occurs in (4), there are the following possibilities for the signs in the sequences (3) and (4):

    Signs in (3):   + + -    + - -    - + +    - - +
    Signs in (4):   + 0 -    + 0 -    - 0 +    - 0 +

Thus, the variation in sign that occurs in (3) is preserved in (4). Hence, N(t), the total number of variations in sign in (3), is the same as $N(x_i)$, the total number of variations in sign in (4). If t satisfies $x_i < t < x_{i+1}$ in (3), then the above argument shows that $N(x_i) = N(t)$. Therefore, N(t) is the same for all t such that $x_{i-1} < t < x_{i+1}$, which completes the proof of (ii). The reader should observe that since $g(c) \ne 0$, $g(d) \ne 0$, we have incidentally proved that N(t) is the same for all t such that $c = x_0 \le t < x_1$ as well as for all t such that $x_{r-1} < t \le x_r = d$.

Proof of (iii). Note first that if $x_i$ is a root of g(x), then $i \ne 0$ and $i \ne r$, since $g(x_0) = g(c) \ne 0$ and $g(x_r) = g(d) \ne 0$. Suppose that $x_i$ is a root of f(x) of multiplicity m. Then

$$f(x) = (x - x_i)^m a(x), \tag{5}$$

where $x_i$ is not a root of a(x). Moreover, $s_k(x) = (x - x_i)^{m-1}s(x)$, where $x_i$ is not a root of s(x). Thus, s(x) and $x - x_i$ are relatively prime, so that s(x) divides a(x). Since $s_k(x)$ divides

$$f'(x) = m(x - x_i)^{m-1}a(x) + (x - x_i)^m a'(x),$$

it follows that s(x) also divides a'(x). Let b(x) = a(x)/s(x) and c(x) = a'(x)/s(x). Then we have

$$g(x) = (x - x_i)b(x), \tag{6}$$
$$g_0(x) = mb(x) + (x - x_i)c(x), \tag{7}$$

where $x_i$ is not a root of b(x). It follows that $b(t) \ne 0$ for all t such that $x_{i-1} < t < x_{i+1}$. Indeed, $b(x_i) \ne 0$, since $x_i$ is not a root of b(x). If b(t) = 0 for $t \ne x_i$, then by (6) g(t) = 0. This is impossible because $x_i$ is the only root of g(x) between $x_{i-1}$ and $x_{i+1}$. It therefore follows from Theorem 9-10.1 that b(t) has the same sign throughout the interval $x_{i-1} < t < x_{i+1}$. Suppose that b(t) > 0 for all t in this interval. Then

$$g(t) = (t - x_i)b(t) < 0 \quad \text{if } x_{i-1} < t < x_i, \quad \text{and} \quad g(t) = (t - x_i)b(t) > 0 \quad \text{if } x_i < t < x_{i+1}.$$

By (7),

$$g_0(x_i) = mb(x_i) + (x_i - x_i)c(x_i) = mb(x_i) > 0.$$

Therefore, $g_0(t) > 0$ for all t such that $x_{i-1} < t < x_{i+1}$. Hence, for $x_{i-1} < t < x_i$, the signs of the sequence $g(t), g_0(t), g_1(t), \ldots, g_k(t) = 1$ are

$$-,\ +,\ \ldots,\ +,$$

and for $x_i < t < x_{i+1}$, the signs are

$$+,\ +,\ \ldots,\ +.$$

This same result is obtained if we suppose that $b(x_i) < 0$. If $x_i$ is not a root of any polynomial in (2) except g(x), then each term of the abbreviated sequence

$$g_0(t),\ g_1(t),\ \ldots,\ g_k(t)$$

has the same sign throughout the interval $x_{i-1} < t < x_{i+1}$. In this case, the complete sequence

$$g(t),\ g_0(t),\ g_1(t),\ \ldots,\ g_k(t)$$

has exactly one less variation in sign when $x_i < t < x_{i+1}$ than when $x_{i-1} < t < x_i$. If $x_i$ is a root of some polynomial in (2) other than g(x), then $x_i$ must be a root of one of the polynomials $g_1(x), g_2(x), \ldots, g_{k-1}(x)$, since $g_0(x_i) \ne 0$ and $g_k(x_i) = 1 \ne 0$. It is now possible to use the result of (ii) applied to the sequence

$$g_0(x),\ g_1(x),\ g_2(x),\ \ldots,\ g_{k-1}(x),\ g_k(x).$$

That is, since $g_0(x_i) \ne 0$ and $g_k(x_i) \ne 0$, the number of variations in sign in $g_0(t), g_1(t), g_2(t), \ldots, g_{k-1}(t), g_k(t)$ is the same for $x_{i-1} < t < x_i$ as for $x_i < t < x_{i+1}$. Therefore, in every case, the value of N(t) is exactly one less in the interval $x_i < t < x_{i+1}$ than in the interval $x_{i-1} < t < x_i$. This completes the proof of (iii).

Combining the results (i), (ii), and (iii), we have proved that the only change which occurs in the value of N(t) for $c \le t \le d$ is that N(t) is diminished by 1 at each root of g(x) in the given interval. Therefore, the number of roots of g(x) [which is the number of distinct real roots of the polynomial f(x)] between c and d is N(c) - N(d).
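Sturm's theorem translates directly into a root-counting procedure. The sketch below is illustrative only: polynomials are stored as coefficient lists from the constant term upward, all function names are assumptions, and f is assumed to have degree at least one with f(c) and f(d) nonzero. Each new term is the negative of a division remainder, which agrees with the relations $s_i(x) = q_i(x)s_{i-1}(x) - s_{i-2}(x)$.

```python
from fractions import Fraction

def trim(p):
    """Copy of p with trailing (highest-degree) zero coefficients removed."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_rem(a, b):
    """Remainder of a(x) on division by the nonzero polynomial b(x)."""
    a, b = trim(a), trim(b)
    while len(a) >= len(b):
        factor = Fraction(a[-1]) / b[-1]
        shift = len(a) - len(b)
        for i, coeff in enumerate(b):
            a[shift + i] -= factor * coeff
        a = trim(a)                      # the leading term has been cancelled
    return a

def derivative(p):
    return trim([i * c for i, c in enumerate(p)][1:])

def sturm_sequence(f):
    """f, f', s_1, ..., s_k, where each s_i is minus a remainder."""
    seq = [trim(f), derivative(f)]
    while True:
        r = poly_rem(seq[-2], seq[-1])
        if not r:
            return seq                   # the last term is a g.c.d. of f and f'
        seq.append([-c for c in r])

def evaluate(p, t):
    value = Fraction(0)
    for c in reversed(p):                # Horner's rule
        value = value * t + c
    return value

def variations(seq, t):
    """Number of sign changes in the sequence at x = t, ignoring zeros."""
    signs = [v > 0 for v in (evaluate(p, t) for p in seq) if v != 0]
    return sum(1 for s, u in zip(signs, signs[1:]) if s != u)

def count_real_roots(f, c, d):
    """N(c) - N(d): the number of distinct real roots of f between c and d."""
    seq = sturm_sequence(f)
    return variations(seq, c) - variations(seq, d)

three_roots = count_real_roots([1, -3, 0, 1], -2, 2)     # x^3 - 3x + 1
double_root = count_real_roots([2, -3, 0, 1], -3, 3)     # (x - 1)^2 (x + 2)
```

The second sample has a repeated root, and the count still gives the number of distinct roots, as the passage from sequence (1) to sequence (2) in the proof predicts.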

APPENDIX 2

THE PROOF OF THE FUNDAMENTAL THEOREM OF SYMMETRIC POLYNOMIALS


In this appendix we will prove the fundamental theorem on symmetric polynomials. Actually, a slightly stronger result than Theorem 10-1.11 will be obtained. This strengthening is motivated by the following observation:

(1) If $a(x_1, x_2, \ldots, x_r)$ is symmetric in $D[x_1, x_2, \ldots, x_r]$, then for $1 \le i < j \le r$,

$$\mathrm{Deg}_{x_i}[a(x_1, x_2, \ldots, x_r)] = \mathrm{Deg}_{x_j}[a(x_1, x_2, \ldots, x_r)].$$

In fact, it is easily seen that

$$\mathrm{Deg}_{x_i}[a(x_1, \ldots, x_i, \ldots, x_j, \ldots, x_r)] = \mathrm{Deg}_{x_j}[a(x_1, \ldots, x_j, \ldots, x_i, \ldots, x_r)]$$

for any $a(x_1, x_2, \ldots, x_r) \in D[x_1, x_2, \ldots, x_r]$. Since

$$a(x_1, \ldots, x_i, \ldots, x_j, \ldots, x_r) = a(x_1, \ldots, x_j, \ldots, x_i, \ldots, x_r)$$

if $a(x_1, x_2, \ldots, x_r)$ is symmetric, the assertion (1) is proved. The result which we will prove is the following.

THEOREM A2-1. Let $a(x_1, x_2, \ldots, x_r)$ be a nonzero symmetric polynomial in $D[x_1, x_2, \ldots, x_r]$ such that $\mathrm{Deg}_{x_1}[a(x_1, x_2, \ldots, x_r)] = \cdots = \mathrm{Deg}_{x_r}[a(x_1, x_2, \ldots, x_r)] = n$. Then there is a polynomial

$$f(x_1, x_2, \ldots, x_r) \in D[x_1, x_2, \ldots, x_r]$$

of total degree n such that

$$a(x_1, x_2, \ldots, x_r) = f(S_1^{(r)}, S_2^{(r)}, \ldots, S_r^{(r)}).$$

Proof. The proof is in the form of a double induction. The first induction is on the number r of indeterminates. The second induction is on n, and it occurs in proving the induction step: if the theorem is true for symmetric polynomials in fewer than r indeterminates, then it is true for polynomials in r indeterminates. Before carrying out this induction, it is convenient to establish some preliminary facts.
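(2) If $a(x_1, x_2, \ldots, x_{r-1}, x_r)$ is symmetric in $D[x_1, x_2, \ldots, x_r]$, and

$$a(x_1, \ldots, x_{r-1}, x_r) = b_0(x_1, \ldots, x_{r-1}) + b_1(x_1, \ldots, x_{r-1})x_r + \cdots + b_n(x_1, \ldots, x_{r-1})x_r^n,$$

then each of the polynomials $b_i(x_1, x_2, \ldots, x_{r-1})$ is symmetric in $D[x_1, x_2, \ldots, x_{r-1}]$.

If $j_1, j_2, \ldots, j_{r-1}$ is any permutation of $\{1, 2, \ldots, r-1\}$, then $j_1, j_2, \ldots, j_{r-1}, r$ is a permutation of $\{1, 2, \ldots, r-1, r\}$. Since $a(x_1, x_2, \ldots, x_{r-1}, x_r)$ is symmetric in $D[x_1, x_2, \ldots, x_{r-1}, x_r]$, it follows that

$$a(x_{j_1}, x_{j_2}, \ldots, x_{j_{r-1}}, x_r) = a(x_1, x_2, \ldots, x_{r-1}, x_r).$$

That is,

$$\sum_{i=0}^{n} b_i(x_{j_1}, \ldots, x_{j_{r-1}})x_r^i = \sum_{i=0}^{n} b_i(x_1, \ldots, x_{r-1})x_r^i.$$

Thus by Definition 9-2.1, for i = 0, 1, ..., n,

$$b_i(x_{j_1}, x_{j_2}, \ldots, x_{j_{r-1}}) = b_i(x_1, x_2, \ldots, x_{r-1}).$$

Since $j_1, j_2, \ldots, j_{r-1}$ was an arbitrary permutation of 1, 2, ..., r - 1, it follows that each $b_i(x_1, x_2, \ldots, x_{r-1})$ is symmetric in $D[x_1, x_2, \ldots, x_{r-1}]$.

The elementary symmetric polynomials in $D[x_1, x_2, \ldots, x_r]$ were defined in Definition 10-1.9 by the identity

$$(x_{r+1} - x_1)(x_{r+1} - x_2) \cdots (x_{r+1} - x_r) = x_{r+1}^r - S_1^{(r)}x_{r+1}^{r-1} + S_2^{(r)}x_{r+1}^{r-2} - \cdots + (-1)^r S_r^{(r)}$$

(where we have written $S_i^{(r)}$ instead of $S_i^{(r)}(x_1, x_2, \ldots, x_r)$ for simplicity). It is convenient to also define

$$S_0^{(r)}(x_1, x_2, \ldots, x_r) = 1, \qquad S_j^{(r)}(x_1, x_2, \ldots, x_r) = 0 \ \text{ for } j > r.$$

Using this convention, we obtain the next result.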


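(3) If r > 1, then

$$S_i^{(r)}(x_1, x_2, \ldots, x_r) = S_i^{(r-1)}(x_1, x_2, \ldots, x_{r-1}) + S_{i-1}^{(r-1)}(x_1, x_2, \ldots, x_{r-1})\,x_r$$

for $1 \le i \le r$. By definition,

$$(x_{r+1} - x_1)(x_{r+1} - x_2) \cdots (x_{r+1} - x_r) = x_{r+1}^r - S_1^{(r)}x_{r+1}^{r-1} + S_2^{(r)}x_{r+1}^{r-2} - \cdots + (-1)^r S_r^{(r)}.$$

Also, since r > 1,

$$\begin{aligned}
(x_{r+1} - x_1) \cdots (x_{r+1} - x_r)
&= [(x_{r+1} - x_1) \cdots (x_{r+1} - x_{r-1})](x_{r+1} - x_r) \\
&= [x_{r+1}^{r-1} - S_1^{(r-1)}x_{r+1}^{r-2} + S_2^{(r-1)}x_{r+1}^{r-3} - \cdots + (-1)^{r-1}S_{r-1}^{(r-1)}](x_{r+1} - x_r) \\
&= x_{r+1}^r - (S_1^{(r-1)} + S_0^{(r-1)}x_r)x_{r+1}^{r-1} + (S_2^{(r-1)} + S_1^{(r-1)}x_r)x_{r+1}^{r-2} - \cdots \\
&\qquad + (-1)^{r-1}(S_{r-1}^{(r-1)} + S_{r-2}^{(r-1)}x_r)x_{r+1} + (-1)^r(S_r^{(r-1)} + S_{r-1}^{(r-1)}x_r).
\end{aligned}$$

Therefore, comparing coefficients of the powers of $x_{r+1}$ in the two expressions, $S_i^{(r)} = S_i^{(r-1)} + S_{i-1}^{(r-1)}x_r$ for $1 \le i \le r$.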
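The recurrence in (3) is easy to test numerically. In this sketch the function names and the sample values are assumptions chosen for illustration; `elem_sym(i, xs)` computes the ith elementary symmetric polynomial as the sum of all products of i distinct entries, which also gives the conventions $S_0 = 1$ and $S_j = 0$ for j greater than the number of entries.

```python
from itertools import combinations
from math import prod

def elem_sym(i, xs):
    """Sum of the products of all i-element selections from xs."""
    # combinations(xs, 0) yields one empty selection with product 1,
    # and combinations(xs, i) yields nothing when i > len(xs).
    return sum(prod(selection) for selection in combinations(xs, i))

def recurrence_holds(xs):
    """Check S_i(x_1..x_r) = S_i(x_1..x_{r-1}) + S_{i-1}(x_1..x_{r-1}) * x_r."""
    head, last = xs[:-1], xs[-1]
    return all(
        elem_sym(i, xs) == elem_sym(i, head) + elem_sym(i - 1, head) * last
        for i in range(1, len(xs) + 1)
    )

sample = (2, 3, 5, 7)
values = [elem_sym(i, sample) for i in range(6)]
ok = recurrence_holds(sample)
```

A check at particular integer values is of course no proof, but it guards against sign or index slips when reading the identity.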

(4) For $r \ge 1$ and $1 \le i \le r$, $S_i^{(r)}(x_1, x_2, \ldots, x_r) \ne 0$ and

$$\mathrm{Deg}_{x_r}[S_i^{(r)}(x_1, x_2, \ldots, x_r)] = 1.$$

If r = i = 1, then $S_1^{(1)}(x_1) = x_1$, for which the statement (4) is true. We can therefore make the induction hypothesis that (4) is true for r - 1. Note that also $S_0^{(r-1)} = 1 \ne 0$. Hence, if $1 \le i \le r$, it follows from (3) that $S_i^{(r)}(x_1, x_2, \ldots, x_r) \ne 0$ and its degree in $x_r$ is exactly one.

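(5) Suppose that $g(x_1, x_2, \ldots, x_r) \in D[x_1, x_2, \ldots, x_r]$ has total degree m. Then $g(S_1^{(r)}, S_2^{(r)}, \ldots, S_r^{(r)})$ is symmetric in $D[x_1, x_2, \ldots, x_r]$. Moreover, if* $g(S_1^{(r)}, S_2^{(r)}, \ldots, S_r^{(r)}) \ne 0$, then

$$\mathrm{Deg}_{x_r}[g(S_1^{(r)}, S_2^{(r)}, \ldots, S_r^{(r)})] \le m.$$

The fact that $g(S_1^{(r)}, S_2^{(r)}, \ldots, S_r^{(r)})$ is symmetric was observed in (10-1.10). To prove the second statement, note that by (4),

$$\mathrm{Deg}_{x_r}[(S_1^{(r)})^{i_1}(S_2^{(r)})^{i_2} \cdots (S_r^{(r)})^{i_r}] = i_1\mathrm{Deg}_{x_r}[S_1^{(r)}] + i_2\mathrm{Deg}_{x_r}[S_2^{(r)}] + \cdots + i_r\mathrm{Deg}_{x_r}[S_r^{(r)}] = i_1 + i_2 + \cdots + i_r.$$

Let

$$g(x_1, x_2, \ldots, x_r) = \sum c_{i_1, i_2, \ldots, i_r}\, x_1^{i_1}x_2^{i_2} \cdots x_r^{i_r}.$$

If $g(S_1^{(r)}, S_2^{(r)}, \ldots, S_r^{(r)}) \ne 0$, then

$$\mathrm{Deg}_{x_r}[g(S_1^{(r)}, S_2^{(r)}, \ldots, S_r^{(r)})] \le \max\{\mathrm{Deg}_{x_r}[(S_1^{(r)})^{i_1}(S_2^{(r)})^{i_2} \cdots (S_r^{(r)})^{i_r}] \mid c_{i_1, \ldots, i_r} \ne 0\} = \max\{i_1 + i_2 + \cdots + i_r \mid c_{i_1, \ldots, i_r} \ne 0\},$$

which by definition is the total degree m.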

(6) Let $a(x_1, x_2, \ldots, x_{r-1}, x_r)$ be symmetric in $D[x_1, x_2, \ldots, x_r]$. If $a(x_1, x_2, \ldots, x_{r-1}, 0) \ne 0$, then

$$\mathrm{Deg}_{x_{r-1}}[a(x_1, x_2, \ldots, x_{r-1}, 0)] \le \mathrm{Deg}_{x_r}[a(x_1, x_2, \ldots, x_r)].$$

This statement is a direct consequence of (1), because obviously

$$\mathrm{Deg}_{x_{r-1}}[a(x_1, x_2, \ldots, x_{r-1}, 0)] \le \mathrm{Deg}_{x_{r-1}}[a(x_1, x_2, \ldots, x_{r-1}, x_r)] = \mathrm{Deg}_{x_r}[a(x_1, x_2, \ldots, x_{r-1}, x_r)].$$

(7) Let $a(x_1, x_2, \ldots, x_{r-1}, x_r)$ be symmetric in $D[x_1, x_2, \ldots, x_r]$. If $a(x_1, x_2, \ldots, x_{r-1}, 0) = 0$, then there is a polynomial $b(x_1, x_2, \ldots, x_r)$ which is symmetric in $D[x_1, x_2, \ldots, x_r]$, and such that

$$a(x_1, x_2, \ldots, x_r) = S_r^{(r)}(x_1, x_2, \ldots, x_r) \cdot b(x_1, x_2, \ldots, x_r).$$

* It can be proved that if $g(x_1, x_2, \ldots, x_r) \ne 0$, then $g(S_1^{(r)}, S_2^{(r)}, \ldots, S_r^{(r)}) \ne 0$. However, this fact will not be needed.

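Let

$$a(x_1, x_2, \ldots, x_r) = b_0(x_1, \ldots, x_{r-1}) + b_1(x_1, \ldots, x_{r-1})x_r + \cdots + b_m(x_1, \ldots, x_{r-1})x_r^m.$$

Our assumption is that $0 = a(x_1, x_2, \ldots, x_{r-1}, 0) = b_0(x_1, x_2, \ldots, x_{r-1})$. If r = 1, then

$$a(x_1) = x_1(b_1 + b_2x_1 + \cdots + b_mx_1^{m-1}) = S_1^{(1)}(x_1) \cdot b(x_1),$$

which proves (7) in the case r = 1, because every polynomial in $D[x_1]$ is symmetric. Assume that (7) holds for r - 1. We have

$$a(x_1, x_2, \ldots, x_r) = b_1(x_1, \ldots, x_{r-1})x_r + \cdots + b_m(x_1, \ldots, x_{r-1})x_r^m.$$

By (2), each $b_k(x_1, x_2, \ldots, x_{r-1})$ is symmetric in $D[x_1, x_2, \ldots, x_{r-1}]$. Moreover, since $a(x_1, x_2, \ldots, x_{r-1}, x_r)$ is symmetric in $D[x_1, x_2, \ldots, x_r]$,

$$a(x_1, \ldots, x_{r-2}, x_{r-1}, x_r) = a(x_1, \ldots, x_{r-2}, x_r, x_{r-1}).$$

From the assumption that $a(x_1, x_2, \ldots, x_{r-1}, 0) = 0$, it follows that

$$0 = a(x_1, \ldots, x_{r-2}, 0, x_r) = \sum_{k=1}^{m} b_k(x_1, \ldots, x_{r-2}, 0)\,x_r^k.$$

Hence, for k = 1, 2, ..., m, we obtain $b_k(x_1, x_2, \ldots, x_{r-2}, 0) = 0$. Therefore, by the induction hypothesis,

$$b_k(x_1, \ldots, x_{r-1}) = S_{r-1}^{(r-1)}(x_1, \ldots, x_{r-1}) \cdot c_k(x_1, \ldots, x_{r-1}),$$

where each $c_k(x_1, \ldots, x_{r-1})$ is symmetric in $D[x_1, x_2, \ldots, x_{r-1}]$. By (3),

$$S_r^{(r)}(x_1, x_2, \ldots, x_r) = S_r^{(r-1)}(x_1, \ldots, x_{r-1}) + S_{r-1}^{(r-1)}(x_1, \ldots, x_{r-1})\,x_r = 0 + S_{r-1}^{(r-1)}(x_1, x_2, \ldots, x_{r-1})\,x_r,$$

so that

$$a(x_1, x_2, \ldots, x_r) = S_r^{(r)}(x_1, x_2, \ldots, x_r) \cdot b(x_1, x_2, \ldots, x_r), \quad \text{where } b(x_1, \ldots, x_r) = \sum_{k=1}^{m} c_k(x_1, \ldots, x_{r-1})\,x_r^{k-1}.$$

If $j_1, j_2, \ldots, j_r$ is any permutation of $\{1, 2, \ldots, r\}$, then

$$S_r^{(r)}(x_1, \ldots, x_r) \cdot b(x_{j_1}, \ldots, x_{j_r}) = a(x_{j_1}, \ldots, x_{j_r}) = a(x_1, \ldots, x_r) = S_r^{(r)}(x_1, \ldots, x_r) \cdot b(x_1, \ldots, x_r).$$

Consequently, since $D[x_1, x_2, \ldots, x_r]$ is an integral domain and $S_r^{(r)}(x_1, \ldots, x_r) \ne 0$,

$$b(x_{j_1}, x_{j_2}, \ldots, x_{j_r}) = b(x_1, x_2, \ldots, x_r).$$

Therefore, $b(x_1, x_2, \ldots, x_r)$ is symmetric in $D[x_1, x_2, \ldots, x_r]$. This completes the induction which proves (7).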
(8) We can now give the inductive proof of the fundamental theorem on symmetric polynomials. For r = 1, there is essentially nothing to prove: every polynomial in $D[x_1]$ is symmetric, and $S_1^{(1)}(x_1) = x_1$. Assume therefore that r > 1, and that Theorem A2-1 is true for polynomials which are symmetric in $D[x_1, x_2, \ldots, x_{r-1}]$. Let

$$a(x_1, x_2, \ldots, x_r) = b_0(x_1, \ldots, x_{r-1}) + b_1(x_1, \ldots, x_{r-1})x_r + \cdots + b_n(x_1, \ldots, x_{r-1})x_r^n,$$

where $b_n(x_1, x_2, \ldots, x_{r-1}) \ne 0$ in $D[x_1, x_2, \ldots, x_{r-1}]$. If n = 0, then $\mathrm{Deg}_{x_r}[a(x_1, x_2, \ldots, x_r)] = 0$. Hence, by (1), $\mathrm{Deg}_{x_i}[a(x_1, x_2, \ldots, x_r)] = 0$ for all i with $1 \le i \le r$. That is, $a(x_1, x_2, \ldots, x_r) = a \in D$. In this case, take $f(x_1, x_2, \ldots, x_r) = a$, and $a(x_1, x_2, \ldots, x_r) = f(S_1^{(r)}, S_2^{(r)}, \ldots, S_r^{(r)})$, where $f(x_1, x_2, \ldots, x_r)$ has total degree zero. Therefore, let us make our second induction hypothesis: Theorem A2-1 is true for polynomials which are symmetric in $D[x_1, x_2, \ldots, x_r]$ and have degree in $x_r$ less than n.

By (2), $b_0(x_1, x_2, \ldots, x_{r-1}) = a(x_1, x_2, \ldots, x_{r-1}, 0)$ is symmetric in $D[x_1, x_2, \ldots, x_{r-1}]$. The two cases $a(x_1, x_2, \ldots, x_{r-1}, 0) = 0$ and $a(x_1, x_2, \ldots, x_{r-1}, 0) \ne 0$ are treated separately. Suppose first that $a(x_1, x_2, \ldots, x_{r-1}, 0) = 0$. By (7),

$$a(x_1, x_2, \ldots, x_r) = S_r^{(r)}(x_1, x_2, \ldots, x_r) \cdot d(x_1, x_2, \ldots, x_r),$$

where $d(x_1, x_2, \ldots, x_r)$ is symmetric in $D[x_1, x_2, \ldots, x_r]$. Since

$$\mathrm{Deg}_{x_r}[S_r^{(r)}(x_1, x_2, \ldots, x_r)] = 1,$$

it follows that

$$\mathrm{Deg}_{x_r}[d(x_1, x_2, \ldots, x_r)] = \mathrm{Deg}_{x_r}[a(x_1, x_2, \ldots, x_r)] - 1 = n - 1.$$

By the second induction hypothesis, there is a polynomial $h(x_1, x_2, \ldots, x_r)$ of total degree n - 1, such that

$$d(x_1, x_2, \ldots, x_r) = h(S_1^{(r)}, S_2^{(r)}, \ldots, S_r^{(r)}).$$

Let $f(x_1, x_2, \ldots, x_r) = x_r \cdot h(x_1, x_2, \ldots, x_r)$. Then $f(x_1, x_2, \ldots, x_r)$ has total degree n, and

$$a(x_1, x_2, \ldots, x_r) = S_r^{(r)} \cdot h(S_1^{(r)}, S_2^{(r)}, \ldots, S_r^{(r)}) = f(S_1^{(r)}, S_2^{(r)}, \ldots, S_r^{(r)}).$$

Now suppose that $a(x_1, x_2, \ldots, x_{r-1}, 0) \ne 0$.

By the first induction hypothesis, there is a polynomial $g(x_1, x_2, \ldots, x_{r-1})$ in $D[x_1, x_2, \ldots, x_{r-1}]$, having total degree at most n, such that

$$b_0(x_1, x_2, \ldots, x_{r-1}) = g(S_1^{(r-1)}, S_2^{(r-1)}, \ldots, S_{r-1}^{(r-1)}).$$

Let

$$c(x_1, x_2, \ldots, x_r) = a(x_1, x_2, \ldots, x_r) - g(S_1^{(r)}, S_2^{(r)}, \ldots, S_{r-1}^{(r)}).$$

If $c(x_1, x_2, \ldots, x_r) = 0$, then

$$a(x_1, x_2, \ldots, x_r) = g(S_1^{(r)}, S_2^{(r)}, \ldots, S_{r-1}^{(r)}).$$

In this case, let $f(x_1, x_2, \ldots, x_{r-1}, x_r) = g(x_1, x_2, \ldots, x_{r-1})$. Then

$$a(x_1, x_2, \ldots, x_r) = f(S_1^{(r)}, S_2^{(r)}, \ldots, S_r^{(r)}),$$

where the total degree of $f(x_1, x_2, \ldots, x_r)$ is at most n. If the total degree of $f(x_1, x_2, \ldots, x_r)$ were less than n, then by (5), we would have

$$\mathrm{Deg}_{x_r}[f(S_1^{(r)}, S_2^{(r)}, \ldots, S_r^{(r)})] < n.$$

This is impossible since $\mathrm{Deg}_{x_r}[a(x_1, x_2, \ldots, x_r)] = n$. Therefore, the total degree of $f(x_1, x_2, \ldots, x_r)$ is exactly n.

Finally, if $c(x_1, x_2, \ldots, x_r) \ne 0$, then by (5),

$$\mathrm{Deg}_{x_r}[c(x_1, x_2, \ldots, x_r)] \le \max\{\mathrm{Deg}_{x_r}[a(x_1, x_2, \ldots, x_r)],\ \mathrm{Deg}_{x_r}[g(S_1^{(r)}, S_2^{(r)}, \ldots, S_{r-1}^{(r)})]\} = n.$$

Moreover, by (3), $S_i^{(r)}(x_1, x_2, \ldots, x_{r-1}, 0) = S_i^{(r-1)}(x_1, x_2, \ldots, x_{r-1})$ for $1 \le i \le r - 1$, so that

$$\begin{aligned}
c(x_1, x_2, \ldots, x_{r-1}, 0) &= a(x_1, x_2, \ldots, x_{r-1}, 0) - g(S_1^{(r)}(x_1, \ldots, x_{r-1}, 0), \ldots, S_{r-1}^{(r)}(x_1, \ldots, x_{r-1}, 0)) \\
&= b_0(x_1, x_2, \ldots, x_{r-1}) - g(S_1^{(r-1)}, S_2^{(r-1)}, \ldots, S_{r-1}^{(r-1)}) = 0.
\end{aligned}$$

At this point we have reached essentially the same situation as when we assumed that $a(x_1, x_2, \ldots, x_{r-1}, 0) = 0$: $c(x_1, x_2, \ldots, x_r)$ is symmetric in $D[x_1, x_2, \ldots, x_r]$, has degree in $x_r$ at most n, and

$$c(x_1, x_2, \ldots, x_{r-1}, 0) = 0.$$

Therefore, by the proof for that case, there is a polynomial $e(x_1, x_2, \ldots, x_r)$ of total degree at most n such that

$$c(x_1, x_2, \ldots, x_r) = e(S_1^{(r)}, S_2^{(r)}, \ldots, S_r^{(r)}).$$

Let $f(x_1, x_2, \ldots, x_r) = e(x_1, x_2, \ldots, x_r) + g(x_1, x_2, \ldots, x_{r-1})$. Then

$$a(x_1, x_2, \ldots, x_r) = f(S_1^{(r)}, S_2^{(r)}, \ldots, S_r^{(r)}),$$

where the total degree of $f(x_1, x_2, \ldots, x_r)$ is at most n. As before, it follows from (5) that the total degree of $f(x_1, x_2, \ldots, x_r)$ is exactly n. The induction is therefore complete and Theorem A2-1 is proved.
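Two familiar instances of Theorem A2-1 can be checked numerically. The polynomials `f2` and `f3` below are standard identities supplied here as illustrations (they are not taken from the text), and all names are assumptions. For $a = x_1^2 + x_2^2 + x_3^2$ the degree in each indeterminate is 2 and $f = y_1^2 - 2y_2$ has total degree 2; for $a = x_1^3 + x_2^3 + x_3^3$ the corresponding f has total degree 3.

```python
from itertools import combinations, product
from math import prod

def elem_sym(i, xs):
    """ith elementary symmetric polynomial evaluated at the numbers xs."""
    return sum(prod(selection) for selection in combinations(xs, i))

def power_sum(k, xs):
    """The symmetric polynomial x_1^k + x_2^k + ... evaluated at xs."""
    return sum(x ** k for x in xs)

def f2(y1, y2, y3):
    # Candidate f for x1^2 + x2^2 + x3^2; total degree 2.
    return y1 ** 2 - 2 * y2

def f3(y1, y2, y3):
    # Candidate f for x1^3 + x2^3 + x3^3; total degree 3.
    return y1 ** 3 - 3 * y1 * y2 + 3 * y3

def agrees_on_grid(k, f, values=range(-3, 4)):
    """Compare a = power_sum(k, .) with f(S_1, S_2, S_3) at every grid point."""
    return all(
        power_sum(k, xs) == f(*(elem_sym(i, xs) for i in (1, 2, 3)))
        for xs in product(values, repeat=3)
    )

squares_ok = agrees_on_grid(2, f2)
cubes_ok = agrees_on_grid(3, f3)
```

Agreement on a grid of integer points is only evidence, not a proof, but in both cases the total degree of f equals the degree of a in a single indeterminate, as the theorem states.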

APPENDIX 3

THE PROOF OF THE FUNDAMENTAL THEOREM OF ALGEBRA


The purpose of this appendix is to give a proof of Theorem 9-8.1, the fundamental theorem of algebra. Several of the preliminary results needed for the proof are interesting and important, and we will prove them in a more general form than is needed for our immediate purposes. The first step in our program is to obtain a weak first approximation to Theorem 9-8.1.

THEOREM A3-1. Let F be any field, and let p(x) be a polynomial which is irreducible in F[x]. Then there is a field K which contains F as a subring, such that p(x) has a root in K.
Proof. The construction of K uses the method indicated in Problem 3, Section 6-5. We will leave most of the details for the reader to fill in. Define a relation ~ on F[x] by the condition

a(x) ~ b(x) if p(x) divides a(x) - b(x) in F[x].

The following facts can easily be verified, using the properties of divisibility in F[x].
(1) ~ is an equivalence relation.
(2) If a(x) ~ b(x) and c(x) ~ d(x), then a(x) + c(x) ~ b(x) + d(x), a(x) · c(x) ~ b(x) · d(x), and -a(x) ~ -b(x).
(3) If a and b are in F, then a ~ b if and only if a = b.
(4) p(x) ~ 0.

Define K to be the set of all equivalence classes [a(x)] of elements of F[x] under the equivalence relation ~ (see Definition 6-4.3). Define operations ⊕, ⊙, and ⊖ in K by the conditions:

$$[a(x)] \oplus [b(x)] = [a(x) + b(x)], \qquad [a(x)] \odot [b(x)] = [a(x) \cdot b(x)], \qquad \ominus[a(x)] = [-a(x)]. \tag{5}$$

Using (2), it is easy to show that ⊕, ⊙, and ⊖ are well-defined operations on K (see the discussion at the beginning of Section 6-5). Moreover, with these operations, K is easily seen to be a commutative ring with an identity [1]. By (3), a ↔ [a] is a one-to-one correspondence between F and a subring {[a] | a ∈ F} of K. It follows easily from (5) that this correspondence is an isomorphism: for example a + b = c implies [a] ⊕ [b] = [c]. Conversely, if [a] ⊕ [b] = [c], then a + b ~ c, so that by (3), a + b = c.

As usual, we identify F with the subring {[a] | a ∈ F} of K, and for simplicity write a instead of [a]. Let us also write u for [x]. It then follows by induction from (5) that

$$[a_0 + a_1x + a_2x^2 + \cdots + a_nx^n] = a_0 \oplus a_1u \oplus a_2u^2 \oplus \cdots \oplus a_nu^n$$

(where $a_1u = [a_1] \odot u$, $a_2u^2 = [a_2] \odot u \odot u$, and so forth) for any $a_0, a_1, \ldots, a_n$ in F. In particular, considering p(x) to be an element of K[x], we can substitute u for x in p(x) to obtain from (4),

$$p(u) = [p(x)] = [0] = 0.$$

Therefore, u is a root of p(x) in K.

The only thing left to show is that K is a field. Here, for the first time, we use the assumption that p(x) is irreducible in F[x]. We must show that any nonzero element of K has an inverse (see Problem 12, Section 6-2). If v is any element of K, then v is of the form [a(x)] for some a(x) ∈ F[x]. The assumption v = [a(x)] ≠ 0 means that a(x) is not equivalent to 0, that is, p(x) does not divide a(x) - 0. Since p(x) is irreducible, the monic greatest common divisor of a(x) and p(x) must therefore be 1. Thus, by Theorem 9-4.4, there exist polynomials g(x) and h(x) in F[x] such that g(x)a(x) + h(x)p(x) = 1. Therefore, g(x)a(x) ~ 1, so that

$$[g(x)] \odot v = [g(x)] \odot [a(x)] = [g(x)a(x)] = [1] = 1.$$

This proves that every nonzero element of K has an inverse.

Having constructed K in this proof, we will now revert to our usual notation +, ·, and - for the operations in K as well as in F.

Although the proof of Theorem A3-1 makes essential use of the fact that p(x) is irreducible, this restriction is not really necessary, as the following strengthened version of Theorem A3-1 shows.
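The construction in the proof of Theorem A3-1 can be made concrete by computing with remainders modulo p(x). The sketch below is an illustration under stated assumptions: F is taken to be the rationals, p(x) = x² - 2 is a sample irreducible polynomial, polynomials are coefficient lists from the constant term upward, and every function name is hypothetical. The inverse is found with the Euclidean algorithm, which produces the g(x) with g(x)a(x) + h(x)p(x) = 1 used in the proof.

```python
from fractions import Fraction

def trim(p):
    """Copy of p with trailing (highest-degree) zero coefficients removed."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(a, b):
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return trim([x + y for x, y in zip(a, b)])

def mul(a, b):
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def divmod_poly(a, b):
    """Quotient and remainder of a(x) on division by the nonzero b(x)."""
    a, b = trim(a), trim(b)
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    while len(a) >= len(b):
        shift = len(a) - len(b)
        factor = Fraction(a[-1]) / b[-1]
        q[shift] = factor
        for i, c in enumerate(b):
            a[shift + i] -= factor * c
        a = trim(a)
    return trim(q), a

def mul_mod(a, b, p):
    """Representative of [a] * [b] in K: the remainder of a*b modulo p."""
    return divmod_poly(mul(a, b), p)[1]

def inverse_mod(a, p):
    """Representative g with [g] * [a] = [1]; assumes p irreducible, [a] != 0."""
    # Invariant: r0 ~ s0 * a and r1 ~ s1 * a modulo p.
    r0, s0 = trim(p), []
    r1, s1 = divmod_poly(a, p)[1], [Fraction(1)]
    while r1:
        q, r2 = divmod_poly(r0, r1)
        s2 = add(s0, [-c for c in mul(q, s1)])
        r0, s0, r1, s1 = r1, s1, r2, s2
    if len(r0) != 1:
        raise ValueError("a(x) and p(x) are not relatively prime")
    return divmod_poly([c / r0[0] for c in s0], p)[1]

p = [-2, 0, 1]                 # p(x) = x^2 - 2, irreducible over the rationals
u = [0, 1]                     # u = [x]
u_squared = mul_mod(u, u, p)   # u is a root of p, so u * u should be [2]
g = inverse_mod([1, 1], p)     # inverse of the class of 1 + x
```

Here the class u = [x] squares to 2, so u is a root of p(x) in K, and the class of 1 + x has the class of x - 1 as its inverse.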

THEOREM A3-2. Let F be any field, and let a(x) be a polynomial of degree m > 0 in F[x]. Then there is a field K containing F as a subring, such that in K[x]

$$a(x) = a_0(x - u_1)(x - u_2) \cdots (x - u_m),$$

where $a_0$ is a nonzero element of F and $u_1, u_2, \ldots, u_m$ are elements of K.

Proof. The theorem is clearly true for polynomials of degree one, in which case we can let K = F. Therefore, assume that m > 1 and that the theorem holds for all polynomials of degree less than m, with coefficients in any field. If a(x) is not irreducible in F[x], then it is possible to write a(x) = b(x)c(x), where 0 < Deg [b(x)] < m and 0 < Deg [c(x)] < m. Consequently, the induction hypothesis applies to b(x) and c(x). Hence there is a field L containing F as a subring, such that

$$b(x) = b_0(x - u_1)(x - u_2) \cdots (x - u_r),$$

where r = Deg [b(x)], $b_0 \ne 0$, and $u_1, u_2, \ldots, u_r$ belong to L. Now think of c(x) as a polynomial in L[x], and apply the induction hypothesis again to obtain $c(x) = c_0(x - v_1) \cdots (x - v_s)$, where s = Deg [c(x)], $c_0 \ne 0$ and $v_1, \ldots, v_s$ are in a field K which contains L as a subring. Thus, $F \subseteq L \subseteq K$, and F is a subring of K. In K, we have

$$a(x) = b_0c_0(x - u_1) \cdots (x - u_r)(x - v_1) \cdots (x - v_s).$$

Let $a_0 = b_0c_0 \ne 0$. Obviously $a_0$ is the leading coefficient of a(x), so that $a_0 \in F$. Since $u_1, \ldots, u_r, v_1, \ldots, v_s$ belong to K, and r + s = Deg [b(x)] + Deg [c(x)] = Deg [a(x)] = m, the proof of Theorem A3-2 is complete in the case that a(x) is not irreducible.

Suppose therefore that a(x) is irreducible. By Theorem A3-1, there exists a field L containing F as a subfield such that a(x) is not irreducible in L[x]. Indeed, by the factor theorem, Theorem 9-7.5,

a(x) = (x - u) d(x), where u ∈ L and d(x) ∈ L[x]. Hence, by what we have just shown, there is a field K containing L as a subring (therefore also containing F as a subring) such that

$$a(x) = a_0(x - u_1)(x - u_2) \cdots (x - u_m),$$

where $u_1, u_2, \ldots, u_m$ are in K, and $a_0 \ne 0$. Again, since $a_0$ is the leading coefficient of a(x), it must belong to F. This completes the induction and proves Theorem A3-2.

The proof of the fundamental theorem makes use of some special polynomials, which are defined as follows. Let h be a natural number, and let $x, x_1, x_2, \ldots, x_m$ be distinct indeterminates, with $m \ge 2$. Let $g_h(x)$ be the polynomial defined by the display at this point [display not recoverable].

It is useful to consider $g_h(x)$ in two ways: as an element of $(Z[x])[x_1, x_2, \ldots, x_m]$, and as an element of $(Z[x_1, x_2, \ldots, x_m])[x]$. The notation $g_h(x)$ conforms with this second viewpoint.

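(A3-3) Considered as an element of $(Z[x_1, x_2, \ldots, x_m])[x]$, the polynomial $g_h(x)$ has degree $\tfrac{1}{2}m(m - 1)$.

This is clear, because the number of distinct pairs $(i, j)$ satisfying $1 \leq i < j \leq m$ is exactly $\binom{m}{2} = \tfrac{1}{2}m(m - 1)$ (see Section 1-3).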

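(A3-4) Considered as an element of $(Z[x])[x_1, x_2, \ldots, x_m]$, the polynomial $g_h(x)$ is symmetric. Consequently, we can write

$$g_h(x) = k_h\bigl(x, s_1^{(m)}(x_1, \ldots, x_m), s_2^{(m)}(x_1, \ldots, x_m), \ldots, s_m^{(m)}(x_1, \ldots, x_m)\bigr),$$

where $k_h(x, x_1, x_2, \ldots, x_m) \in Z[x, x_1, x_2, \ldots, x_m]$.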

Proof. To prove this, it is sufficient, by Theorem 10-1.7, to show that $g_h(x)$ is left unchanged when $x_k$ and $x_l$ are interchanged, for each pair $(k, l)$ with $1 \leq k < l \leq m$. Note that $g_h(x)$ can be written in the form

$$P_1 \cdot P_2 \cdot P_3 \cdot P_4 \cdot P_5 \cdot P_6 \cdot P_7 \cdot P_8,$$

where

$$\begin{aligned}
P_1 &= \prod_{\substack{1 \leq i < j \leq m \\ i \neq k,\ i \neq l,\ j \neq k,\ j \neq l}} (x - x_i - x_j - h x_i x_j), \\
P_2 &= \prod_{i < k} (x - x_i - x_k - h x_i x_k), \\
P_3 &= \prod_{j > l} (x - x_l - x_j - h x_l x_j), \\
P_4 &= \prod_{i < k} (x - x_i - x_l - h x_i x_l), \\
P_5 &= \prod_{k < i < l} (x - x_i - x_l - h x_i x_l), \\
P_6 &= \prod_{k < j < l} (x - x_k - x_j - h x_k x_j), \\
P_7 &= \prod_{j > l} (x - x_k - x_j - h x_k x_j), \\
P_8 &= x - x_k - x_l - h x_k x_l.
\end{aligned}$$

The effect of interchanging $x_k$ with $x_l$ in these various products is clearly that $P_1$ is left unchanged, $P_2$ goes into $P_4$ and $P_4$ goes into $P_2$, $P_3$ goes into $P_7$ and $P_7$ goes into $P_3$, $P_5$ goes into $P_6$ and $P_6$ goes into $P_5$, and $P_8$ is left unchanged. Consequently, $g_h(x) = P_1P_2P_3P_4P_5P_6P_7P_8$ goes into $P_1P_4P_7P_2P_6P_5P_3P_8 = g_h(x)$. Hence, $g_h(x)$ is symmetric in $x_1, x_2, \ldots, x_m$. The last statement of (A3-4) is a consequence of the fundamental theorem on symmetric polynomials, Theorem 10-1.11.
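The symmetry asserted in (A3-4) and the count of factors behind (A3-3) can be spot-checked numerically. The following sketch is an illustration only, not part of the proof: it evaluates $g_h(x)$ at arbitrarily chosen integer values (an assumed case with $m = 4$ and $h = 3$) and confirms that the value does not change when the $x_i$ are permuted. The function names `g` and `num_factors` are hypothetical.

```python
from itertools import combinations, permutations


def g(h, x, xs):
    """Evaluate g_h(x) = product over i < j of (x - x_i - x_j - h*x_i*x_j)."""
    value = 1
    for xi, xj in combinations(xs, 2):
        value *= x - xi - xj - h * xi * xj
    return value


def num_factors(m):
    """Number of pairs (i, j) with 1 <= i < j <= m, i.e. the degree of g_h in x."""
    return sum(1 for _ in combinations(range(m), 2))


# Assumed sample values; any integers would serve for a spot check.
xs = (2, -3, 5, 7)
h = 3
x = 11

reference = g(h, x, xs)
# Every rearrangement of the x_i should give the same value of g_h(x).
symmetric = all(g(h, x, perm) == reference for perm in permutations(xs))
print(symmetric, num_factors(len(xs)))
```

Agreement at one point does not prove symmetry, of course; the argument with $P_1, \ldots, P_8$ above is what establishes it for the polynomial itself.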


We are now in a position to show that every real polynomial has at least one root in the field $C$ of complex numbers.

THEOREM A3-5. Let $a(x) \in R[x]$ have degree $m > 0$. Then there is at least one complex number $u$ such that $a(u) = 0$.

Proof. The proof of this theorem is carried out by induction on the highest power of 2 which divides $m$, that is, on the nonnegative integer $n$ for which $m = 2^nk$, where $k$ is odd. If $n = 0$, then $m$ is odd, and $a(x)$ has a real root, by Theorem 9-10.4. Therefore, we can assume that $n > 0$ and make the induction hypothesis that every polynomial $f(x) \in R[x]$ for which the highest power of 2 dividing $\operatorname{Deg}[f(x)]$ is $2^{n-1}$ has a complex root. Our objective is to prove that the polynomial $a(x)$, which is of degree $2^nk$ (with $k$ odd), has a complex root.

Consider $a(x)$ as an element of $C[x]$. By Theorem A3-2, there is a field $K$ containing $C$ such that

$$a(x) = a_0(x - u_1)(x - u_2) \cdots (x - u_m),$$

where $a_0 \neq 0$, and $u_1, u_2, \ldots, u_m$ belong to $K$. We wish to prove that at least one of the $u_i$ is in $C$. Since $a_0$ is the leading coefficient of $a(x)$, it must be a real number. Thus, we have

$$a_0^{-1}a(x) = (x - u_1)(x - u_2) \cdots (x - u_m) = x^m - s_1^{(m)}(u_1, \ldots, u_m)x^{m-1} + \cdots + (-1)^m s_m^{(m)}(u_1, \ldots, u_m).$$

Since $a(x) \in R[x]$ and $a_0 \neq 0$, it follows that $s_1^{(m)}(u_1, u_2, \ldots, u_m)$, $s_2^{(m)}(u_1, u_2, \ldots, u_m)$, $\ldots$, and $s_m^{(m)}(u_1, u_2, \ldots, u_m)$ are real numbers. For each natural number $h$, let

$$f_h(x) = \prod_{1 \leq i < j \leq m} (x - u_i - u_j - h u_i u_j).$$

That is, $f_h(x)$ is obtained from $g_h(x)$ by substituting $u_1$ for $x_1$, $u_2$ for $x_2$, $\ldots$, and $u_m$ for $x_m$. By (A3-4), $f_h(x)$ can be considered as a polynomial in $x$, $s_1^{(m)}(u_1, u_2, \ldots, u_m)$, $s_2^{(m)}(u_1, u_2, \ldots, u_m)$, $\ldots$, and $s_m^{(m)}(u_1, u_2, \ldots, u_m)$ with integral coefficients. Thus, considered as a polynomial in $x$, $f_h(x)$ belongs to $R[x]$. Moreover, by (A3-3),

$$\operatorname{Deg}[f_h(x)] = \tfrac{1}{2}m(m - 1) = 2^{n-1}k(2^nk - 1).$$

Therefore, since $n > 0$, the highest power of 2 which divides $\operatorname{Deg}[f_h(x)]$ is $2^{n-1}$. Consequently, the induction hypothesis applies to $f_h(x)$, that is, $f_h(x)$ has at least one complex root. However, we know from the definition of $f_h(x)$ that its roots are

$$u_i + u_j + h u_i u_j, \qquad 1 \leq i < j \leq m.$$
Thus, for some pair $(k, l)$ with $1 \leq k < l \leq m$, the element $u_k + u_l + hu_ku_l$ belongs to $C$. Note that $k$ and $l$ may depend on the integer $h$. However, such a $k$ and $l$ exist for every natural number $h$. In particular, among the integers $1, 2, \ldots, \tfrac{1}{2}m(m - 1) + 1$, there must be two different values of $h$, say $h = r$ and $h = s$, such that $u_k + u_l + ru_ku_l$ is in $C$ and $u_k + u_l + su_ku_l$ is in $C$ for the same pair $(k, l)$. Otherwise, we could obtain a one-to-one mapping of the set $\{1, 2, \ldots, \tfrac{1}{2}m(m - 1) + 1\}$ into the set $\{(i, j) \mid 1 \leq i < j \leq m\}$, which contains only $\tfrac{1}{2}m(m - 1)$ elements. If $u_k + u_l + ru_ku_l \in C$ and $u_k + u_l + su_ku_l \in C$, then $(r - s)u_ku_l \in C$, so that $u_ku_l \in C$, and hence also $u_k + u_l \in C$. By Theorem 8-2.7, the polynomial

$$(x - u_k)(x - u_l) = x^2 - (u_k + u_l)x + u_ku_l$$

has two roots in $C$, and of course these must be $u_k$ and $u_l$. Thus, $u_k$ and $u_l$ are both in $C$. This completes the induction.
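Two arithmetic steps of this proof lend themselves to a quick numerical illustration. The sketch below makes two assumptions for illustration only: the helper names `two_adic_valuation` and `recover_pair` are hypothetical, and the sample values of $u_k$, $u_l$, $r$, and $s$ are chosen arbitrarily. The first part checks that when $m$ is even, the exponent of 2 in $\tfrac{1}{2}m(m - 1)$ is one less than the exponent of 2 in $m$. The second part starts from the two quantities $u_k + u_l + ru_ku_l$ and $u_k + u_l + su_ku_l$, recovers $u_ku_l$ and $u_k + u_l$, and then solves the quadratic.

```python
import cmath


def two_adic_valuation(n):
    """Return the exponent of the highest power of 2 dividing the positive integer n."""
    count = 0
    while n % 2 == 0:
        n //= 2
        count += 1
    return count


# Induction measure: for even m = 2^n * k (k odd, n > 0), the degree m(m-1)/2
# of f_h is divisible by exactly 2^(n-1).
valuation_drops = all(
    two_adic_valuation(m * (m - 1) // 2) == two_adic_valuation(m) - 1
    for m in range(2, 200, 2)
)


def recover_pair(A, B, r, s):
    """Given A = u+v+r*u*v and B = u+v+s*u*v with r != s, return the pair u, v."""
    product = (A - B) / (r - s)   # u*v, since A - B = (r - s)*u*v
    total = A - r * product       # u + v
    disc = cmath.sqrt(total * total - 4 * product)
    # Roots of x^2 - (u + v)x + u*v.
    return (total + disc) / 2, (total - disc) / 2


# Assumed sample data (not from the text).
u, v = 1 + 2j, 3 - 1j
r, s = 1, 4
A = u + v + r * u * v
B = u + v + s * u * v
root1, root2 = recover_pair(A, B, r, s)
print(valuation_drops, root1, root2)
```

The recovered roots agree with the chosen pair up to order and floating-point rounding. In the proof itself the $u_i$ are not known in advance to be complex, which is exactly why the argument passes through $u_ku_l$ and $u_k + u_l$.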

It is worth the reader's effort to examine the proof of Theorem A3-5 very carefully. This proof is the deepest argument which he will find in this book. Both the basis of the induction and the induction step use fundamental results from algebra and the theory of the real number system. The induction itself is somewhat unusual in that the induction variable is the exponent n of 2 in the factorization of Deg [a(x)] into powers of primes. Our objective can now be easily attained.
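THEOREM A3-6. Fundamental theorem of algebra. Every polynomial of positive degree in $C[x]$ has at least one root in $C$.

Proof. Let $f(x) = c_0 + c_1x + \cdots + c_nx^n$, $c_n \neq 0$, $n \geq 1$, and $c_0, c_1, \ldots, c_n$ in $C$. Define $\bar{f}(x) = \bar{c}_0 + \bar{c}_1x + \cdots + \bar{c}_nx^n$, where $\bar{c}_i$ is the complex conjugate of $c_i$. Let

$$a(x) = f(x)\bar{f}(x) = a_0 + a_1x + \cdots + a_{2n}x^{2n},$$

where $a_k = \sum_{i+j=k} c_i\bar{c}_j$. Note that

$$a_k = c_0\bar{c}_k + c_1\bar{c}_{k-1} + \cdots + c_{k-1}\bar{c}_1 + c_k\bar{c}_0$$

for $0 \leq k \leq n$, and

$$a_k = c_{k-n}\bar{c}_n + c_{k-n+1}\bar{c}_{n-1} + \cdots + c_{n-1}\bar{c}_{k-n+1} + c_n\bar{c}_{k-n}$$

for $n < k \leq 2n$. In both cases, $\bar{a}_k$ is the same sum as $a_k$, except in reverse order. Hence, $\bar{a}_k = a_k$ for all $k$. This means that the numbers $a_0, a_1, \ldots, a_{2n}$ are all real, and therefore $a(x) \in R[x]$. The degree of $a(x)$ is $2n$, because $a_{2n} = c_n\bar{c}_n = |c_n|^2 \neq 0$. Thus, $a(x)$ has positive degree, so that by Theorem A3-5 there is a complex number $u$ such that $a(u) = 0$. That is, $f(u)\bar{f}(u) = 0$. Therefore, either $f(u) = 0$, in which case $f(x)$ has the complex root $u$, or else $\bar{f}(u) = 0$. However,

$$\overline{\bar{f}(u)} = \overline{\bar{c}_0 + \bar{c}_1u + \cdots + \bar{c}_nu^n} = c_0 + c_1\bar{u} + \cdots + c_n\bar{u}^n = f(\bar{u}),$$

so that $\bar{f}(u) = 0$ implies $f(\bar{u}) = 0$. Thus, $f(x)$ has a complex root in this case also.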
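The key construction in this proof, the real polynomial $a(x) = f(x)\bar{f}(x)$, can be tried out on a concrete case. The sketch below is a hedged illustration: the polynomial $f(x) = (1 + 2i) + (3 - i)x + (2 + i)x^2$ is an assumed example, and the function name `conj_product` is hypothetical. It computes the coefficients $a_k = \sum_{i+j=k} c_i\bar{c}_j$ and checks that they are real, that there are $2n + 1$ of them, and that the leading one equals $|c_n|^2$.

```python
def conj_product(coeffs):
    """Coefficients a_k = sum over i+j=k of c_i * conj(c_j) of a(x) = f(x) * fbar(x)."""
    n = len(coeffs) - 1
    a = [0j] * (2 * n + 1)
    for i, ci in enumerate(coeffs):
        for j, cj in enumerate(coeffs):
            a[i + j] += ci * cj.conjugate()
    return a


# Assumed example: f(x) = (1+2i) + (3-i)x + (2+i)x^2, constant term first.
f = [1 + 2j, 3 - 1j, 2 + 1j]
a = conj_product(f)

# Each a_k should have zero imaginary part, and a_{2n} should equal |c_n|^2.
all_real = all(abs(z.imag) < 1e-12 for z in a)
print(all_real, [z.real for z in a])
```

Python's built-in complex type is used here for brevity; exact arithmetic with pairs of integers would serve equally well for Gaussian-integer coefficients.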

REFERENCES
There are many fine books on mathematics. Our purpose here is to call the reader's attention to some of these. We have selected a few of the good English language textbooks which deal mainly with the topics considered in this book. Most of them will carry the reader beyond his present state of knowledge, even assuming that he has mastered every word up to this point.

General references. There are several excellent books which deal with many of the topics we have considered. The three which are listed here are not textbooks in the usual sense, although they have been used as such. They are perhaps read with the most enjoyment and profit by someone who knows (or thinks he knows) something about everything in them.

1. RICHARD COURANT and HERBERT ROBBINS, What is Mathematics?, Oxford Press, New York, 1941. This book comes closer than any other we know to answering the question posed in its title.

2. R. L. WILDER, The Foundations of Mathematics, Wiley, New York, 1952. This book should be read first and studied afterward. It does an excellent job of presenting an honest picture of the foundations of mathematics.

3. FELIX KLEIN, Elementary Mathematics from an Advanced Standpoint, Macmillan, New York, 1932 (reprinted by Dover, New York, 1945). This book was taken from lectures delivered around 1908 by Professor Klein to German secondary school teachers. Modern college teachers can learn a great deal from Klein's lectures.

References on mathematical logic and reasoning. (Introduction.)

4. ALFRED TARSKI, Introduction to Logic, Oxford University Press, New York, 1941.

5. PATRICK SUPPES, Introduction to Logic, Van Nostrand, Princeton, 1957. The books of Suppes and Tarski both present elementary mathematical logic with admirable clarity.

6. GEORGE POLYA, How to Solve It, Princeton University Press, Princeton, 1957.

7. JACQUES HADAMARD, The Psychology of Invention in the Mathematical Field, Princeton University Press, Princeton, 1949. The aim of Polya's book is to teach the reader to think like a mathematician. Hadamard's study demonstrates by examples that only a born mathematician can think like a mathematician. Both books are interesting.

8. ERNEST NAGEL and JAMES NEWMAN, Gödel's Proof, New York University Press, New York, 1958. A popular exposition of the work of Gödel which brought a revolution in the philosophy of mathematics. This is an interesting book for light reading, but it leaves a craving for the complete story.

9. S. C. KLEENE, Introduction to Metamathematics, Van Nostrand, Princeton, 1952. Kleene's book shows what happens when mathematics is used to study logic. The book is heavy and difficult, but the first two parts of it are within the reach of good college undergraduates.


References on set theory. (Chapter 1.)

10. F. HAUSDORFF, Set Theory, translated and reprinted by Chelsea, New York, 1957. Although it is essentially a monograph, Hausdorff's book has been a standard source of information on informal set theory for many years.

11. P. R. HALMOS, Naive Set Theory, Van Nostrand, Princeton, 1960.

12. PATRICK SUPPES, Axiomatic Set Theory, Van Nostrand, Princeton, 1960. These two recent books by Suppes and Halmos approach set theory in a more formal way. Nevertheless, both books are clearly written and not too difficult. A more elementary discussion of set theory than is found in Hausdorff, Halmos, or Suppes is given in the following recently published textbook.

13. NORMAN HAMILTON and JOSEPH LANDIN, Set Theory: The Structure of Arithmetic, Allyn and Bacon, Boston, 1961.

References on mathematical induction. (Chapter 2.)

14. I. S. SOMINSKII, The Method of Mathematical Induction, Blaisdell, New York, 1961. This work is a pamphlet recently translated from Russian which contains numerous interesting examples of the use of mathematical induction.

References on the development of the number systems. (Chapters 3, 4, 6, 7, and 8.)

15. E. G. H. LANDAU, The Foundations of Analysis, Chelsea, New York, 1951. Landau's classical monograph begins with Peano's axioms and relentlessly proceeds to construct each number system from N to C. Nothing essential is omitted, and nothing inessential is included. More leisurely developments of the number systems can be found in Suppes' book (12), and in the work (13) of Hamilton and Landin.

References on the theory of numbers. (Chapter 5.) There are many first rate textbooks on the elementary theory of numbers. The following four are particularly noteworthy. They are listed in the order of increasing comprehensiveness.

16. W. J. LE VEQUE, Elementary Theory of Numbers, Addison-Wesley, Reading, Mass., 1962.

17. IVAN NIVEN and H. S. ZUCKERMAN, An Introduction to the Theory of Numbers, Wiley, New York, 1960.

18. G. H. HARDY and E. M. WRIGHT, An Introduction to the Theory of Numbers, 4th ed., Oxford University Press, London, 1960.

19. W. J. LE VEQUE, Topics in Number Theory, Volumes I and II, Addison-Wesley, Reading, Mass., 1956.

References on the theory of equations and linear algebra. (Chapters 9 and 10.)

20. LOUIS WEISNER, Introduction to the Theory of Equations, Macmillan, New York, 1938. Weisner's work is one of the best books on the theory of equations which is written from a modern point of view.

21. ROSS A. BEAUMONT and RICHARD W. BALL, Introduction to Modern Algebra and Matrix Theory, Holt, Rinehart, Winston, New York, 1954. Beaumont and Ball covers most of the subjects which we have discussed, plus several others: groups, vector spaces, linear transformations, and canonical forms for matrices. The reader will find that the style of Beaumont and Ball is remarkably similar to what he has encountered in this book.

22. B. L. VAN DER WAERDEN, Modern Algebra, translated and reprinted from the second revised edition, Ungar, New York, 1949. This classical textbook has served many generations of mathematics graduate students, and it will probably serve many more. Volume II and the last few chapters of Volume I are fairly advanced. If possible, the German fourth edition should be used. It is the best-known example of "easy" mathematical German.

23. HARRY POLLARD, The Theory of Algebraic Numbers, Carus Monograph number nine, New York, 1950. This monograph is an excellent elementary introduction to algebraic number theory.

24. D. C. MURDOCH, Linear Algebra for Undergraduates, Wiley, New York, 1957.

25. L. J. PAIGE and J. D. SWIFT, Elements of Linear Algebra, Ginn, Boston, 1961.

26. P. R. HALMOS, Finite Dimensional Vector Spaces, 2nd edition, Van Nostrand, Princeton, 1958. The books of Murdoch, Paige and Swift, and Halmos are all textbooks on matrices and linear algebra. They are listed in increasing order of sophistication. Halmos' work has an especially interesting collection of problems.

References on the history of mathematics. The literature on the history of mathematics is not as large as it might be. For example, practically nothing has been written about the mathematics of the 20th century, a period during which more mathematics has been done than in all of the years up to 1900. Choosing "good books" on the history of mathematics is largely a matter of taste. The following two books are very different, but both of them are enjoyable.

27. D. J. STRUIK, A Concise History of Mathematics, Dover, New York, 1948.

28. E. T. BELL, Men of Mathematics, Simon and Schuster, New York, 1937. Bell presents a collection of short biographies of leading mathematicians up to the 20th century. Although it is not a scholarly history, Bell's book is certainly a classic of its kind.

INDEX
Complement of a set, 31 Complete factorization, 336 Complete ordered integral domain, 251 Completing the square, 296 Complex conjugate, 291 Complex numbers, 2, 287 Complex plane, 300 Composite number, 153 Congruence, 176 Congruence classes, 182 Congruent modulo m, 176, 215 Conjugate, 291 Consistent systems of equations, 411 Constant polynomial, 317 Constant term, 317 Contrapositive, 7 Convergent sequence, 258 Convergent series, 264 Converse, 7 Coordinate line, 230 Coordinate plane, 299 Coordinate system, 230, 238 Course of values induction, 68 Cubic equation, 361 Division algorithm, for integers, 135 for polynomials, 322 Divisor of zero, 122 Domain of polynomials, 317, 394 Duodecimal system, 139

Abscissa, 299 Absolute value, 132, 292 Addition, of complex numbers, 287, 301 of integers, 101 of matrices, 430 of natural numbers, 89 of polynomials, 314 of rational numbers, 219 of real numbers, 241 in a ring, 107 Algebraic number, 385 Amplitude, 304 Archimedes, 226 Archimedes' principle, 233 Argument, 304 Associate, 331 Associative law, for addition, 90 for matrix multiplication, 437 for multiplication, 93 for set operations, 37
B

Base, 138 Basis of induction, 57, 68 Binary operation, for rings, 107, 114 for sets, 30 Binary system, 140 Binomial coefficients, 61, 64 Binomial theorem, 63 Bombelli, Rafael, 286 Bounds for roots, 374

Cancellation law, of addition, 90, 96 of multiplication, 93, 96 in a ring, 121 Cantor, Georg, 18 Cantor's theorem, 279 Cardan, Girolamo, 309 Cardan's formulas, 364 Cardinality, 18, 20, 166 Cardinal number, of a finite set, 18, 86 of a set, 20, 83 Cartesian coordinates, 299 Casting out nines, 179 Characteristic of an integral domain, 210 Chinese remainder theorem, 188 Coefficient, 313 Column matrix, 429 Common divisor, 144, 150 Common multiple, 151 Commutative law, for addition, 90 for multiplication, 93 Commutative ring, 108 Comparison test, 269

Decimal fraction, 226 Decimal representation, of rational numbers, 282 of real numbers, 277 Dedekind cuts, 237 Dedekind, Richard, 224 Definition, 7 Degree, of an algebraic number, 386 of a polynomial, 321, 397 Demoivre's theorem, 305 Density property, 233 Denumerable, 23 Derivative, 338 Descartes, René, 299 Determinant, 412, 429 Diagonal method, 280 Difference, of natural numbers, 96 of sets, 36 Dimensions of a matrix, 429 Diophantine equation, 169 Direct proof, 9 Discriminant, 368 Disjoint sets, 44 Disjunctive normal form theorem, 40 Distance, 299 Distributive law, for natural numbers, 93 for set operations, 39 Divergent series, 264 Division, in an integral domain, 122 of polynomials, 326

Echelon form, of a matrix, 447 of a system of equations, 416 Element, of a matrix, 429 of a set, 11, 17 Elementary symmetric polynomial, 406 Elementary transformation matrix, 451 Elementary transformations, of a matrix, 446 of a system of equations, 413 Empty set, 13, 17 Equal, matrices, 430 polynomials, 313 sets, 11 Equivalence, of sets, 20 of statements, 6 Equivalence class, 216 Equivalence relation, 214 Equivalent systems of equations, 412 Euclid, 148, 160 Euclidean algorithm, for integers, 148 for polynomials, 330 Euler, Leonhard, 189 Euler's theorem, 191 Exponent, 194, 206

F
Factor, 122, 326 Factor theorem, 347 Fermat conjecture, 172 Fermat numbers, 162 Fermat, Pierre, 162 Fermat's theorem, 191 Fibonacci sequence, 79, 151, 152 Field, 204 Finite ordinal numbers, 84 Finite sets, 18, 20, 86 Fundamental theorem, of algebra, 355, 474 of arithmetic, 155 of decimal representation, 272 of symmetric polynomials, 407, 466

General associative law, 115, 117 General commutative law, 115, 117 General distributive law, 118


Gödel numbering, 165 Graph, 369 Greatest common divisor, for integers, 146, 148, 150, 157 for polynomials, 327, 332 Greatest element, 130 Greatest integer function, 278 Greatest lower bound, 249

Homogeneous system of equations, 424


I

Identity, 4 Identity element, 93, 121 Identity matrix, 443 Imaginary part of a complex number, 291 Implication, 4 Incongruent solutions modulo m, 183 Inconsistent system of equations, 411 Indeterminate, 317 Index set, 37 Index of summation, 115 Indirect proof, 9 Induction hypothesis, 57, 68 Induction step, 57, 68 Inductive definitions, 79 Inequality, 129 Infinite decimal sequence, 228 Infinite sequence, 258 Infinite series, 264 Infinite set, 18, 20 Integers, 2, 100 Integral domain, 121 Intersection of sets, 30, 36 Inverse, of an element, 205 of an implication, 7 of a square matrix, 443 Irrational number, 224 Irreducible polynomial, 333, 357, 358 Isomorphic rings, 111 Isomorphism, 111

Measure of a set, 41, 45 Mersenne number, 162 Method of infinite descent, 174 m-fold root, 348 Minimum element, 130 Minimum polynomial, 386 Modulus, 176, 292 Monic associate, 328 Monic polynomial, 328 mth root, 256 Multiple factor, 345 Multiple root, 349 Multiplication, of complex numbers, 287, 305 of integers, 104 of matrices, 433 of natural numbers, 91, 92 of polynomials, 315 of rational numbers, 219 of real numbers, 244 in a ring, 107 Multiplicity, 348

Natural numbers, 2, 82, 85 Negation, of complex numbers, 287 of integers, 101 of polynomials, 315 of rational numbers, 219 of real numbers, 243 in a ring, 107 Negative elements, 127 Negative, of a matrix, 432 of a polynomial, 315 Negative numbers, 100, 244 Nim, 140 Nonnegative elements, 128 Nonnegative real numbers, 244 Nonsingular matrix, 443 n-place decimal approximation, 273 n-place decimal fraction, 270 n-rowed square matrix, 439 nth roots of unity, 308 Number of divisors, 156

Peano's axioms, 87 Perfect number, 163 Permutation, 20 Polar representation, 303 Polynomial, 313 in several indeterminates, 394 Positive elements, 127 Positive integers, 128 Positive real numbers, 244 Power set, 26 Prime characteristic, 210 Prime number, 68, 153, 159 Prime number theorem, 161 Prime pair, 161 Primitive root, 195, 352 Principle of mathematical induction, 57, 76 Product, of matrices, 433 of natural numbers, 91, 92 of polynomials, 315 of sets, 25 Product sign, 118 Probability measure, 43 Proofs, 8 Proper divisor of zero, 122 Proper subset, 15 Pythagoras' theorem, 224, 234

Q
Quadratic equation, 296 Quartic equation, 365 Quotient, 124, 136, 325
R

Range of a variable, 3 Rational numbers, 2, 200, 218 Real numbers, 2, 224, 238 Real part of a complex number, 291 Recursive definitions, 79 Reduced cubic equation, 362 Reducible polynomial, 333 Reflexive law, 215 Relation, 213 Relatively prime integers, 147 Relatively prime polynomials, 331

Largest element, 130 Latin square, 185 Law of substitution, 8 Leading coefficient, 321 Least common multiple, of integers, 151, 157 of polynomials, 332 Least element, 130 Least upper bound, 249 Limit, 260 Linear congruence, 181 Lower bound, 248 for roots, 374

M

Mathematical induction, 53 Matrix, 428 of coefficients, 434 Maximum element, 130

One-to-one correspondence, 19 Order modulo m, 194, 352 Ordered field, 207 Ordered pair, 24 Ordered integral domain, 126 Ordering, of integers, 125 of natural numbers, 95 of rational numbers, 219 of real numbers, 240 Ordinate, 299 Origin, 230, 298 Orthogonal Latin square, 185
P

Remainder, 138, 325 Remainder theorem, 346 Residue classes, 182 Resolvent cubic equation, 366 Ring, 107 of subsets, 43, 108 Root, 346 Row matrix, 429 Rule, of detachment, 8 of double negation, 101

Pairwise disjoint collection of sets, 44 Parallelogram rule, 302 Partial sum, 264 Partially ordered set, 248 Partition, 218 Pascal triangle, 61

Scheffer stroke operation, 36 Sentence, 4 Sentential function, 4 Sequence, 76 Set, 11, 17 Set builder, 13 Sieve of Eratosthenes, 159 Simple root, 349 Singular matrix, 443 Smallest element, 130


Solution, of a polynomial equation, 400 of a system of equations, 400 Square matrix, 429 Square root, 295 Sturm sequence, 378 Sturm's theorem, 379, 461 Subring, 110 Subset, 15 Substitution in a polynomial, 345, 398 Subtraction, of natural numbers, 97 in a ring, 109 Sum, of divisors, 156 of an infinite series, 264 of matrices, 430 of natural numbers, 89 of real numbers, 241 Summation sign, 115 Symmetric law, 215 Symmetric polynomial, 402 System, of linear equations, 410 of polynomial equations, 400

Taylor's theorem, 354 Total degree, 397 Totient, 191, 193 Transcendental number, 385 Transitive law, 215 Triangle inequality, 293

Variables, 2 Variation in sign, 379 Venn diagrams, 31


W

Well defined, 219 Well-ordering principle, 77 Wilson's theorem, 355


X

Ultimately periodic decimal sequence, 280 Unary operation, for rings, 107 for sets, 31 Union of sets, 30, 36 Unique factorization theorem, 334 Universal set, 31 Upper bound, 248 for roots, 374

x-axis, 298
Y

y-axis, 298

Z
Zero, integer, 100 of a polynomial, 346, 400 of a ring, 107 Zero matrix, 432

V
Value, of a variable, 3 of a polynomial, 345, 398