Sets
Set theory is the foundation of probability and statistics, as it is for almost every branch of mathematics.
If S is a set and x an object, then x ∈ S means that x is an element (or member) of S, while x ∉ S means that x is not an element of S. If A and B are sets, then A ⊆ B means that every element of A is also an element of B, so that A is a subset of B.
Membership is a primitive, undefined concept in naive set theory, the type we are studying here. However, the
following construction, known as Russell's paradox, after the mathematician and philosopher Bertrand Russell,
shows that we cannot be too cavalier in the construction of sets.
Let R be the set of all sets A such that A ∉ A. Then R ∈ R if and only if R ∉ R.
Proof:
The contradiction follows from the definition of R: If R ∈ R, then by definition of R, R ∉ R. If R ∉ R, then by
definition of R, R ∈ R. The net result, of course, is that R is not a well-defined set.
Usually, the sets under discussion in a particular context are all subsets of a well-defined, specified set S, often
called a universal set. The use of a universal set prevents the type of problem that arises in Russell's paradox.
That is, if S is a given set and p(x) is a predicate on S (that is, a valid mathematical statement that is either true
or false for each x ∈ S), then {x ∈ S : p(x)} is a valid subset of S. Defining a set in this way is known as
predicate form. The other basic way to define a set is simply by listing its elements; this method is known as list
form.
In contrast to a universal set, the empty set, denoted ∅, is the set with no elements.
∅ ⊆ A for every set A: ∅ ⊆ A means that x ∈ ∅ ⟹ x ∈ A. Since the premise is false, the implication is true.
Suppose that A, B and C are subsets of a set S. The subset relation is a partial order on the collection of subsets
of S. That is,
1. A ⊆ A (the reflexive property).
2. If A ⊆ B and B ⊆ A then A = B (the anti-symmetric property).
3. If A ⊆ B and B ⊆ C then A ⊆ C (the transitive property).
If A ⊆ B and A ≠ B, we sometimes write A ⊂ B, and A is said to be a strict subset of B. If A ⊂ B, then A
is also called a proper subset of B. If S is a set, then the set of all subsets of S is known as the power set of S and is
denoted P(S).
Special Sets
The following special sets are used throughout this text. Defining them will also give us practice using list and
predicate form.
R, the set of real numbers, which serves as the universal set for the other subsets in this list.
N = {0, 1, 2, …}, the set of natural numbers.
N+ = {1, 2, 3, …}, the set of positive integers.
Z = {…, −2, −1, 0, 1, 2, …}, the set of integers.
Q = {m/n : m ∈ Z and n ∈ N+}, the set of rational numbers.
A = {x ∈ R : p(x) = 0 for some polynomial p with integer coefficients}, the set of algebraic
numbers.
Note that N+ ⊆ N ⊆ Z ⊆ Q ⊆ A ⊆ R. We will also occasionally need the set of complex numbers C.
Set Operations
We are now ready to review the basic operations of set theory. For the following definitions, suppose that A and
B are subsets of a universal set S.
The union of A and B is the set of elements in A or B (or both):
A ∪ B = {x ∈ S : x ∈ A or x ∈ B}
The intersection of A and B is the set of elements common to both A and B:
A ∩ B = {x ∈ S : x ∈ A and x ∈ B}
The set difference of B from A is the set of elements that are in A but not in B:
A ∖ B = {x ∈ S : x ∈ A and x ∉ B}
The complement of A is the set of elements of S that are not in A:
Aᶜ = {x ∈ S : x ∉ A}
Note that union, intersection, and difference are binary set operations, while complement is a unary set
operation.
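Python's built-in set type implements these four operations directly; the following sketch (with an illustrative universal set and subsets, not taken from the text) shows the correspondence:

```python
# Illustrative universal set and subsets
S = set(range(10))          # universal set S = {0, 1, ..., 9}
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

union = A | B               # A ∪ B: elements in A or B (or both)
intersection = A & B        # A ∩ B: elements in both A and B
difference = A - B          # A ∖ B: elements of A that are not in B
complement = S - A          # Aᶜ: elements of S that are not in A

print(union, intersection, difference, complement)
```

Note that complement is the one operation that needs the universal set S, matching its status as a unary operation relative to S.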
In the Venn diagram applet, select each of the following and note the shaded area in the diagram.
1. A
2. B
3. Aᶜ
4. Bᶜ
5. A ∪ B
6. A ∩ B
Basic Rules
In the following problems, A, B, and C are subsets of a universal set S. The proofs are straightforward, and just
use the definitions and basic logic. Try the proofs yourself before reading the ones in the text.
A ∩ B ⊆ A ⊆ A ∪ B.
The identity laws:
1. A ∪ ∅ = A
2. A ∩ S = A
The idempotent laws:
1. A ∪ A = A
2. A ∩ A = A
The complement laws:
1. A ∪ Aᶜ = S
2. A ∩ Aᶜ = ∅
The double complement law: (Aᶜ)ᶜ = A
The commutative laws:
1. A ∪ B = B ∪ A
2. A ∩ B = B ∩ A
Proof:
These results follow from the commutativity of the or and and logical operators.
The associative laws:
1. A ∪ (B ∪ C) = (A ∪ B) ∪ C
2. A ∩ (B ∩ C) = (A ∩ B) ∩ C
Proof:
These results follow from the associativity of the or and and logical operators.
Thus, we can write A ∪ B ∪ C without ambiguity. Note that x is an element of this set if and only if x is an
element of at least one of the three given sets. Similarly, we can write A ∩ B ∩ C without ambiguity. Note that x
is an element of this set if and only if x is an element of all three of the given sets.
The distributive laws:
1. A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
2. A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
Proof:
1. x ∈ A ∩ (B ∪ C) if and only if x ∈ A and x ∈ B ∪ C if and only if x ∈ A and either x ∈ B or x ∈ C if and
only if x ∈ A and x ∈ B, or, x ∈ A and x ∈ C if and only if x ∈ A ∩ B or x ∈ A ∩ C if and only if
x ∈ (A ∩ B) ∪ (A ∩ C).
2. The proof is exactly the same as (a), but with or and and interchanged.
De Morgan's laws (named after Augustus De Morgan):
1. (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ
2. (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
Proof:
1. x ∈ (A ∪ B)ᶜ if and only if x ∉ A ∪ B if and only if x ∉ A and x ∉ B if and only if x ∈ Aᶜ and x ∈ Bᶜ if and
only if x ∈ Aᶜ ∩ Bᶜ.
2. x ∈ (A ∩ B)ᶜ if and only if x ∉ A ∩ B if and only if x ∉ A or x ∉ B if and only if x ∈ Aᶜ or x ∈ Bᶜ if and
only if x ∈ Aᶜ ∪ Bᶜ.
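The laws above can be checked mechanically on finite sets. A small Python sketch (the universal set and subsets are illustrative choices) verifies both De Morgan's laws for several pairs of subsets:

```python
from itertools import product

S = set(range(6))
# A few illustrative subsets of S, including the extremes
subsets = [set(), {0}, {1, 2}, {0, 2, 4}, S]

for A, B in product(subsets, repeat=2):
    # (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ
    assert S - (A | B) == (S - A) & (S - B)
    # (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
    assert S - (A & B) == (S - A) | (S - B)

print("De Morgan's laws verified")
```

Such brute-force checks are not proofs, of course, but they are a quick sanity test of the algebraic identities.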
The following statements are equivalent:
1. A ⊆ B
2. Bᶜ ⊆ Aᶜ
3. A ∪ B = B
4. A ∩ B = A
5. A ∖ B = ∅
Proof:
1. Recall that A ⊆ B means that x ∈ A ⟹ x ∈ B.
2. Bᶜ ⊆ Aᶜ means that x ∉ B ⟹ x ∉ A. This is the contrapositive of (a) and hence is equivalent to (a).
The equivalences of (a) with (c) and (d) follow similarly, directly from the definitions.
5. Suppose that A ⊆ B. If x ∈ A then x ∈ B, so
x ∉ A ∖ B. Thus A ∖ B = ∅. Conversely suppose that A ∖ B = ∅. If x ∈ A then x ∉ A ∖ B so x ∈ B. Thus
A ⊆ B.
In addition to the special sets mentioned earlier, we also have R ∖ Q, the set of irrational numbers, and R ∖ A, the set of transcendental numbers.
Since Q ⊆ A ⊆ R it follows that R ∖ A ⊆ R ∖ Q; that is, every transcendental number is also irrational.
Set difference can be expressed in terms of complement and intersection. All of the other set operations
(complement, union, and intersection) can be expressed in terms of difference.
Results for set difference:
1. B ∖ A = B ∩ Aᶜ
2. Aᶜ = S ∖ A
3. A ∩ B = A ∖ (A ∖ B)
4. A ∪ B = S ∖ {(S ∖ A) ∖ [(S ∖ A) ∖ (S ∖ B)]}
Proof:
1. This is clear from the definition: B ∖ A = B ∩ Aᶜ = {x ∈ S : x ∈ B and x ∉ A}.
2. This follows from (a) with B = S.
3. Using (a), De Morgan's law, and the distributive law, the right side is
A ∩ (A ∩ Bᶜ)ᶜ = A ∩ (Aᶜ ∪ B) = (A ∩ Aᶜ) ∪ (A ∩ B) = ∅ ∪ (A ∩ B) = A ∩ B
4. Using (a), (b), De Morgan's law, and the distributive law, the right side is
[Aᶜ ∩ (Aᶜ ∩ B)ᶜ]ᶜ = A ∪ (Aᶜ ∩ B) = (A ∪ Aᶜ) ∩ (A ∪ B) = S ∩ (A ∪ B) = A ∪ B
So in principle, we could do all of set theory using the one operation of set difference. But as (c) and (d)
suggest, the results would be hideous.
(A ∪ B) ∖ (A ∩ B) = (A ∖ B) ∪ (B ∖ A).
Proof:
A direct proof is simple, but for practice let's give a proof using set algebra, in particular De Morgan's law and
the distributive law:
(A ∪ B) ∖ (A ∩ B) = (A ∪ B) ∩ (A ∩ B)ᶜ = (A ∪ B) ∩ (Aᶜ ∪ Bᶜ) = (A ∩ Aᶜ) ∪ (B ∩ Aᶜ) ∪ (A ∩ Bᶜ) ∪ (B ∩ Bᶜ) =
(B ∖ A) ∪ (A ∖ B) = (A ∖ B) ∪ (B ∖ A)
The set in the previous result is called the symmetric difference of A and B, and is sometimes denoted A △ B.
The elements of this set belong to one but not both of the given sets. Thus, the symmetric difference
corresponds to exclusive or in the same way that union corresponds to inclusive or. That is, x ∈ A ∪ B if and
only if x ∈ A or x ∈ B (or both); x ∈ A △ B if and only if x ∈ A or x ∈ B, but not both. On the other hand, the
complement of the symmetric difference consists of the elements that belong to both or neither of the given sets:
(A △ B)ᶜ = (A ∩ B) ∪ (Aᶜ ∩ Bᶜ) = (Aᶜ ∪ B) ∩ (Bᶜ ∪ A)
Proof:
Again, a direct proof is simple, but let's give an algebraic proof for practice:
(A △ B)ᶜ = [(A ∪ B) ∩ (A ∩ B)ᶜ]ᶜ = (A ∪ B)ᶜ ∪ (A ∩ B) = (Aᶜ ∩ Bᶜ) ∪ (A ∩ B) = (Aᶜ ∪ A) ∩ (Aᶜ ∪ B) ∩ (Bᶜ ∪ A) ∩
(Bᶜ ∪ B) = S ∩ (Aᶜ ∪ B) ∩ (Bᶜ ∪ A) ∩ S = (Aᶜ ∪ B) ∩ (Bᶜ ∪ A)
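Python exposes the symmetric difference as the ^ operator on sets; a brief sketch (with illustrative sets) confirms both characterizations from the result above:

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}

sym_diff = A ^ B                        # symmetric difference operator
assert sym_diff == (A | B) - (A & B)    # (A ∪ B) ∖ (A ∩ B)
assert sym_diff == (A - B) | (B - A)    # (A ∖ B) ∪ (B ∖ A)
print(sym_diff)                         # {1, 2, 5}
```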
There are 16 different (in general) sets that can be constructed from two given sets A and B.
Proof:
S is the union of 4 pairwise disjoint sets: A ∩ B, A ∩ Bᶜ, Aᶜ ∩ B, and Aᶜ ∩ Bᶜ. If A and B are in general position,
these 4 sets are distinct. Every set that can be constructed from A and B is a union of some (perhaps none,
perhaps all) of these 4 sets. There are 2⁴ = 16 sub-collections of the 4 sets.
Open the Venn diagram applet. This applet lists the 16 sets that can be constructed from given sets A and B
using the set operations.
1. Select each of the four subsets in the proof of the last exercise: A ∩ B, A ∩ Bᶜ, Aᶜ ∩ B, and Aᶜ ∩ Bᶜ. Note
that these are disjoint and their union is S.
2. Select each of the other 12 sets and show how each is a union of some of the sets in (a).
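The counting argument in the proof can be replayed in Python: form the four atoms and collect all unions of sub-collections. The sets S, A, and B below are illustrative choices in general position:

```python
from itertools import combinations

S = set(range(8))
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}

# The four pairwise disjoint "atoms" determined by A and B
atoms = [A & B, A - B, B - A, S - (A | B)]

# Every set constructible from A and B is a union of a sub-collection of atoms
constructible = set()
for r in range(len(atoms) + 1):
    for combo in combinations(atoms, r):
        constructible.add(frozenset().union(*combo))

print(len(constructible))   # 16, since A and B are in general position here
```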
General Operations
The operations of union and intersection can easily be extended to a finite or even an infinite collection of sets.
Definitions
Suppose that 𝒜 is a nonempty collection of subsets of a universal set S. In some cases, the subsets in 𝒜 may be
naturally indexed by a nonempty index set I, so that 𝒜 = {Aᵢ : i ∈ I}. (In a technical sense, any collection of
subsets can be indexed.)
The union of the collection of sets 𝒜 is the set obtained by combining the elements of the sets in 𝒜:
⋃𝒜 = {x ∈ S : x ∈ A for some A ∈ 𝒜}
The intersection of the collection of sets 𝒜 is the set of elements common to all of the sets in 𝒜:
⋂𝒜 = {x ∈ S : x ∈ A for every A ∈ 𝒜}
If 𝒜 = {Aᵢ : i ∈ I}, we write ⋃ᵢ∈I Aᵢ for the union and ⋂ᵢ∈I Aᵢ for the intersection. If I = {1, 2, …, n} for some
n ∈ N+, we would write ⋃ᵢ₌₁ⁿ Aᵢ for the union and ⋂ᵢ₌₁ⁿ Aᵢ for the intersection.
A collection of sets 𝒜 is pairwise disjoint if the intersection of any two distinct sets in the collection is empty:
A ∩ B = ∅ for every A, B ∈ 𝒜 with A ≠ B. The collection of sets 𝒜 is said to partition a set B if the collection
𝒜 is pairwise disjoint and ⋃𝒜 = B.
Partitions are intimately related to equivalence relations.
Basic Rules
In the following problems, 𝒜 = {Aᵢ : i ∈ I} is a collection of subsets of a universal set S, indexed by a nonempty
set I, and B is a subset of S.
The general distributive laws:
1. (⋃ᵢ∈I Aᵢ) ∩ B = ⋃ᵢ∈I (Aᵢ ∩ B)
2. (⋂ᵢ∈I Aᵢ) ∪ B = ⋂ᵢ∈I (Aᵢ ∪ B)
Restate the laws in the notation where the collection A is not indexed.
Proof:
1. x is an element of either set if and only if x ∈ B and x ∈ Aᵢ for some i ∈ I.
2. x is an element of either set if and only if x ∈ B or x ∈ Aᵢ for every i ∈ I.
(⋃𝒜) ∩ B = ⋃{A ∩ B : A ∈ 𝒜},  (⋂𝒜) ∪ B = ⋂{A ∪ B : A ∈ 𝒜}
The general De Morgan's laws:
1. (⋃ᵢ∈I Aᵢ)ᶜ = ⋂ᵢ∈I Aᵢᶜ
2. (⋂ᵢ∈I Aᵢ)ᶜ = ⋃ᵢ∈I Aᵢᶜ
Restate the laws in the notation where the collection 𝒜 is not indexed.
Proof:
1. x ∈ (⋃ᵢ∈I Aᵢ)ᶜ if and only if x ∉ ⋃ᵢ∈I Aᵢ if and only if x ∉ Aᵢ for every i ∈ I if and only if x ∈ Aᵢᶜ for
every i ∈ I if and only if x ∈ ⋂ᵢ∈I Aᵢᶜ.
2. The proof is analogous, with for every replaced by for some.
(⋃𝒜)ᶜ = ⋂{Aᶜ : A ∈ 𝒜},  (⋂𝒜)ᶜ = ⋃{Aᶜ : A ∈ 𝒜}
Suppose that the collection 𝒜 partitions S. For any subset B, the collection {A ∩ B : A ∈ 𝒜} partitions B.
Proof:
Suppose 𝒜 = {Aᵢ : i ∈ I} where I is an index set. If i, j ∈ I with i ≠ j then
(Aᵢ ∩ B) ∩ (Aⱼ ∩ B) = (Aᵢ ∩ Aⱼ) ∩ B = ∅ ∩ B = ∅, so the collection {Aᵢ ∩ B : i ∈ I} is pairwise disjoint. Moreover,
⋃ᵢ∈I (Aᵢ ∩ B) = (⋃ᵢ∈I Aᵢ) ∩ B = S ∩ B = B
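The general distributive and De Morgan laws can be checked for a finite indexed collection. In this sketch the family {Aᵢ : i ∈ I} is an illustrative dictionary keyed by the index set:

```python
S = set(range(12))
B = {0, 2, 4, 6, 8, 10}
# An illustrative indexed collection {A_i : i in I}, with I = {1, 2, 3}
family = {1: {0, 1, 2}, 2: {2, 3, 4}, 3: {2, 4, 6}}

union_all = set().union(*family.values())
inter_all = set.intersection(*family.values())

# General distributive laws
assert union_all & B == set().union(*(A & B for A in family.values()))
assert inter_all | B == set.intersection(*(A | B for A in family.values()))

# General De Morgan's laws
assert S - union_all == set.intersection(*(S - A for A in family.values()))
assert S - inter_all == set().union(*(S - A for A in family.values()))

print("general laws verified")
```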
Product sets
Definitions
Product sets are sets of sequences. The defining property of a sequence, of course, is that order as well as
membership is important.
Let us start with ordered pairs. In this case, the defining property is that (a,b) = (c,d) if and only if a = c and
b = d. Interestingly, the structure of an ordered pair can be defined using only set theory. The construction in the
result below is due to Kazimierz Kuratowski.
Define (a,b) = {{a}, {a,b}}. This definition captures the defining property of an ordered pair.
Proof:
Suppose that (a,b) = (c,d), so that {{a}, {a,b}} = {{c}, {c,d}}. In the case that a = b, note that (a,b) = {{a}}.
Thus we must have {c} = {c,d} = {a} and hence c = d = a, and in particular, a = c and b = d. In the case that
a ≠ b, we must have {c} = {a} and hence c = a. But we cannot have {c,d} = {a}, because then (c,d) = {{a}}
and hence {a,b} = {a}, which would force a = b, a contradiction. Thus we must have {c,d} = {a,b}. Since c = a
and a ≠ b, we must have d = b. The converse is trivial: if a = c and b = d then {a} = {c} and {a,b} = {c,d}, so
(a,b) = (c,d).
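The Kuratowski construction can be modeled with Python frozensets, which, like mathematical sets, carry no order of their own; the pair helper below is an illustrative name:

```python
def pair(a, b):
    """Kuratowski ordered pair: (a, b) = {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})

# The defining property: (a,b) = (c,d) if and only if a = c and b = d
assert pair(1, 2) == pair(1, 2)
assert pair(1, 2) != pair(2, 1)                    # order matters
assert pair(3, 3) == frozenset({frozenset({3})})   # degenerate case a = b
print("Kuratowski pairs behave as ordered pairs")
```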
For ordered triples, the defining property is (a,b,c) = (d,e,f) if and only if a = d, b = e, and c = f. Ordered
triples can be defined in terms of ordered pairs, which via the last result, uses only set theory.
Define (a,b,c) = (a,(b,c)). This definition captures the defining property of an ordered triple.
Proof:
Suppose (a,b,c) = (d,e,f). Then (a,(b,c)) = (d,(e,f)). Hence by the definition of an ordered pair, we must
have a = d and (b,c) = (e,f). Using the definition again we have b = e and c = f. Conversely, if a = d, b = e, and
c = f, then (b,c) = (e,f) and hence (a,(b,c)) = (d,(e,f)); that is, (a,b,c) = (d,e,f).
More generally, for n-tuples, (x₁,x₂,…,xₙ) = (y₁,y₂,…,yₙ) if and only if xᵢ = yᵢ for all i ∈ {1,2,…,n}. For infinite
sequences, (x₁,x₂,…) = (y₁,y₂,…) if and only if xᵢ = yᵢ for all i ∈ N+.
Suppose now that we have a sequence of n sets, (S₁,S₂,…,Sₙ), where n ∈ N+. The Cartesian product
of the sets is the set of all n-tuples whose coordinates come from the corresponding sets:
S₁ × S₂ × ⋯ × Sₙ = {(x₁,x₂,…,xₙ) : xᵢ ∈ Sᵢ for each i ∈ {1,2,…,n}}
Rⁿ is n-dimensional Euclidean space, named after Euclid, of course. The elements of {0,1}ⁿ are called bit
strings of length n. As the name suggests, we sometimes represent elements of this product set as strings rather
than sequences (that is, we omit the parentheses and commas). Since the coordinates just take two values, there
is no risk of confusion.
Suppose that we have an infinite sequence of sets (S₁,S₂,…). The Cartesian product of the sets is defined by
S₁ × S₂ × ⋯ = {(x₁,x₂,…) : xᵢ ∈ Sᵢ for each i ∈ N+}
When Sᵢ = S for i ∈ N+, the Cartesian product set is sometimes written as a Cartesian power as S^∞ or as S^N+.
An explanation for the last notation is given in the next section on functions. Also, notation similar to that of
general union and intersection is often used for Cartesian product, with × as the operator. Thus
⨉ᵢ₌₁ⁿ Sᵢ = S₁ × S₂ × ⋯ × Sₙ,  ⨉ᵢ₌₁^∞ Sᵢ = S₁ × S₂ × ⋯
Rules for Product Sets
We will now see how the set operations relate to the Cartesian product operation. Suppose that S and T are sets
and that A ⊆ S, B ⊆ S and C ⊆ T, D ⊆ T. The sets in the exercises below are subsets of S × T.
The most important rules that relate Cartesian product with union, intersection, and difference are the
distributive rules:
1. A × (C ∪ D) = (A × C) ∪ (A × D)
2. (A ∪ B) × C = (A × C) ∪ (B × C)
3. A × (C ∩ D) = (A × C) ∩ (A × D)
4. (A ∩ B) × C = (A × C) ∩ (B × C)
5. A × (C ∖ D) = (A × C) ∖ (A × D)
6. (A ∖ B) × C = (A × C) ∖ (B × C)
Proof:
1. (x,y) ∈ A × (C ∪ D) if and only if x ∈ A and y ∈ C ∪ D if and only if x ∈ A and either y ∈ C or y ∈ D if
and only if x ∈ A and y ∈ C, or, x ∈ A and y ∈ D if and only if (x,y) ∈ A × C or (x,y) ∈ A × D if and
only if (x,y) ∈ (A × C) ∪ (A × D).
2. Similar to (a), but with the roles of the coordinates reversed.
3. (x,y) ∈ A × (C ∩ D) if and only if x ∈ A and y ∈ C ∩ D if and only if x ∈ A and y ∈ C and y ∈ D if and
only if (x,y) ∈ A × C and (x,y) ∈ A × D if and only if (x,y) ∈ (A × C) ∩ (A × D).
4. Similar to (c) but with the roles of the coordinates reversed.
5. (x,y) ∈ A × (C ∖ D) if and only if x ∈ A and y ∈ C ∖ D if and only if x ∈ A and y ∈ C and y ∉ D if and
only if (x,y) ∈ A × C and (x,y) ∉ A × D if and only if (x,y) ∈ (A × C) ∖ (A × D).
6. Similar to (e) but with the roles of the coordinates reversed.
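A sketch using itertools.product verifies several of the distributive rules for small illustrative sets; the helper cart is an assumed name, not from the text:

```python
from itertools import product

A, B = {1, 2}, {2, 3}
C, D = {'x', 'y'}, {'y', 'z'}

def cart(X, Y):
    """Cartesian product X × Y as a set of ordered pairs."""
    return set(product(X, Y))

# Distributive rules for the Cartesian product
assert cart(A, C | D) == cart(A, C) | cart(A, D)
assert cart(A, C & D) == cart(A, C) & cart(A, D)
assert cart(A, C - D) == cart(A, C) - cart(A, D)
assert cart(A - B, C) == cart(A, C) - cart(B, C)

print("distributive rules verified")
```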
In general, the product of unions is larger than the corresponding union of products.
1. (A ∪ B) × (C ∪ D) = (A × C) ∪ (A × D) ∪ (B × C) ∪ (B × D)
2. (A × C) ∪ (B × D) ⊆ (A ∪ B) × (C ∪ D)
Proof:
1. (x,y) ∈ (A ∪ B) × (C ∪ D) if and only if x ∈ A ∪ B and y ∈ C ∪ D if and only if at least one of the
following is true: x ∈ A and y ∈ C; x ∈ A and y ∈ D; x ∈ B and y ∈ C; x ∈ B and y ∈ D. This holds if and only if
(x,y) ∈ (A × C) ∪ (A × D) ∪ (B × C) ∪ (B × D).
2. This follows from (a). Note that the inclusion can be proper, since by (a) the right side also contains
A × D and B × C, which need not be contained in the left side.
On the other hand, the product of intersections is the same as the corresponding intersection of products.
(A × C) ∩ (B × D) = (A ∩ B) × (C ∩ D)
Proof:
(x,y) ∈ (A × C) ∩ (B × D) if and only if (x,y) ∈ A × C and (x,y) ∈ B × D if and only if x ∈ A and y ∈ C and
x ∈ B and y ∈ D if and only if x ∈ A ∩ B and y ∈ C ∩ D if and only if (x,y) ∈ (A ∩ B) × (C ∩ D).
In general, the product of differences is smaller than the corresponding difference of products.
1. (A ∖ B) × (C ∖ D) = [(A × C) ∖ (A × D)] ∖ [(B × C) ∖ (B × D)]
2. (A ∖ B) × (C ∖ D) ⊆ (A × C) ∖ (B × D)
Proof:
1. By the distributive rules above, (A × C) ∖ (A × D) = A × (C ∖ D) and (B × C) ∖ (B × D) = B × (C ∖ D).
Hence the right side is [A × (C ∖ D)] ∖ [B × (C ∖ D)] = (A ∖ B) × (C ∖ D), again by the distributive rules.
2. (x,y) ∈ (A ∖ B) × (C ∖ D) if and only if x ∈ A and x ∉ B and y ∈ C and y ∉ D. In particular,
(x,y) ∈ A × C and (x,y) ∉ B × D, so (x,y) ∈ (A × C) ∖ (B × D).
Suppose that C ⊆ S × T. The cross section of C at x ∈ S is
Cₓ = {y ∈ T : (x,y) ∈ C}
Similarly, the cross section of C at y ∈ T is
Cʸ = {x ∈ S : (x,y) ∈ C}
Note that Cₓ ⊆ T for x ∈ S and Cʸ ⊆ S for y ∈ T. We define the projection of C onto T and the projection of C onto S by
C_T = {y ∈ T : (x,y) ∈ C for some x ∈ S},  C_S = {x ∈ S : (x,y) ∈ C for some y ∈ T}
Cross sections preserve the set operations. Suppose that C, D ⊆ S × T and x ∈ S. Then
1. (C ∪ D)ₓ = Cₓ ∪ Dₓ
2. (C ∩ D)ₓ = Cₓ ∩ Dₓ
3. (C ∖ D)ₓ = Cₓ ∖ Dₓ
Proof:
1. y ∈ (C ∪ D)ₓ if and only if (x,y) ∈ C ∪ D if and only if (x,y) ∈ C or (x,y) ∈ D if and only if y ∈ Cₓ or y ∈ Dₓ.
2. The proof is just like (a), with and replacing or.
3. The proof is just like (a), with and not replacing or.
For projections, the results are a bit more complicated. We give the results for projections onto T; naturally the
results for projections onto S are analogous.
Suppose again that C, D ⊆ S × T. Then
1. (C ∪ D)_T = C_T ∪ D_T
2. (C ∩ D)_T ⊆ C_T ∩ D_T
3. (C_T)ᶜ ⊆ (Cᶜ)_T
Proof:
1. Suppose that y ∈ (C ∪ D)_T. Then there exists x ∈ S such that (x,y) ∈ C ∪ D. Hence (x,y) ∈ C so y ∈ C_T,
or (x,y) ∈ D so y ∈ D_T. In either case, y ∈ C_T ∪ D_T. Conversely, suppose that y ∈ C_T ∪ D_T. Then y ∈ C_T
or y ∈ D_T. If y ∈ C_T then there exists x ∈ S such that (x,y) ∈ C. But then (x,y) ∈ C ∪ D so y ∈ (C ∪ D)_T.
Similarly if y ∈ D_T then y ∈ (C ∪ D)_T.
2. Suppose that y ∈ (C ∩ D)_T. Then there exists x ∈ S such that (x,y) ∈ C ∩ D. Hence (x,y) ∈ C so y ∈ C_T,
and (x,y) ∈ D so y ∈ D_T. Therefore y ∈ C_T ∩ D_T.
3. Suppose that y ∈ (C_T)ᶜ. Then y ∉ C_T, so for every x ∈ S, (x,y) ∉ C. Fix x₀ ∈ S. Then (x₀,y) ∉ C, so
(x₀,y) ∈ Cᶜ and hence y ∈ (Cᶜ)_T.
Equality does not hold in general in parts (b) and (c). In part (b) for example, suppose that
A₁, A₂ ⊆ S are nonempty and disjoint and B ⊆ T is nonempty. Let C = A₁ × B and D = A₂ × B. Then C ∩ D = ∅,
so (C ∩ D)_T = ∅. But C_T = D_T = B. In part (c) for example, suppose that A is a nonempty proper subset of S
and B is a nonempty proper subset of T. Let C = A × B. Then C_T = B so (C_T)ᶜ = Bᶜ. On the other hand,
Cᶜ = (Aᶜ × T) ∪ (A × Bᶜ), so (Cᶜ)_T = T.
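Cross sections and projections translate directly into set comprehensions. The following sketch uses illustrative sets and hypothetical helper names:

```python
# Illustrative subset C of S × T
S = {1, 2, 3}
T = {'a', 'b', 'c'}
C = {(1, 'a'), (1, 'b'), (2, 'b')}

def cross_section_at_x(C, x):
    """Cross section C_x = {y : (x, y) in C}."""
    return {y for (u, y) in C if u == x}

def projection_onto_T(C):
    """Projection C_T = {y : (x, y) in C for some x}."""
    return {y for (x, y) in C}

print(cross_section_at_x(C, 1))   # {'a', 'b'}
print(projection_onto_T(C))       # {'a', 'b'}

# Projection of an intersection can be strictly smaller than
# the intersection of the projections:
D = {(3, 'a')}
assert projection_onto_T(C & D) <= projection_onto_T(C) & projection_onto_T(D)
```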
Computational Exercises
Subsets of R
The universal set is [0,∞). Let A = [0,5] and B = (3,7). Express each of the following in terms of intervals:
1. A ∩ B
2. A ∪ B
3. A ∖ B
4. B ∖ A
5. Aᶜ
Answer:
1. (3,5]
2. [0,7)
3. [0,3]
4. (5,7)
5. (5,∞)
The universal set is N. Let A = {n ∈ N : n is even} and let B = {n ∈ N : n ≤ 9}. Give each of the following:
1. A ∩ B in list form
2. A ∪ B in predicate form
3. A ∖ B in list form
4. B ∖ A in list form
5. Aᶜ in predicate form
6. Bᶜ in list form
Answer:
1. {0,2,4,6,8}
2. {n ∈ N : n is even or n ≤ 9}
3. {10,12,14,…}
4. {1,3,5,7,9}
5. {n ∈ N : n is odd}
6. {10,11,12,…}
Coins and Dice
Let S = {1,2,3,4} × {1,2,3,4,5,6}. This is the set of outcomes when a 4-sided die and a 6-sided die are tossed.
Further let A = {(x,y) ∈ S : x = 2} and B = {(x,y) ∈ S : x + y = 7}. Give each of the following sets in list form:
1. A
2. B
3. A ∩ B
4. A ∪ B
5. A ∖ B
6. B ∖ A
Answer:
1. {(2,1),(2,2),(2,3),(2,4),(2,5),(2,6)}
2. {(1,6),(2,5),(3,4),(4,3)}
3. {(2,5)}
4. {(2,1),(2,2),(2,3),(2,4),(2,5),(2,6),(1,6),(3,4),(4,3)}
5. {(2,1),(2,2),(2,3),(2,4),(2,6)}
6. {(1,6),(3,4),(4,3)}
Let S = {0,1}³. This is the set of outcomes when a coin is tossed 3 times (0 denotes tails and 1 denotes heads).
Further let A = {(x₁,x₂,x₃) ∈ S : x₂ = 1} and B = {(x₁,x₂,x₃) ∈ S : x₁ + x₂ + x₃ = 2}. Give each of the following
sets in list form, using bit-string notation:
1. S
2. A
3. B
4. Aᶜ
5. Bᶜ
6. A ∩ B
7. A ∪ B
8. A ∖ B
9. B ∖ A
Answer:
1. {000,100,010,001,110,101,011,111}
2. {010,110,011,111}
3. {110,011,101}
4. {000,100,001,101}
5. {000,100,010,001,111}
6. {110,011}
7. {010,110,011,111,101}
8. {010,111}
9. {101}
Let S = {0,1}². This is the set of outcomes when a coin is tossed twice (0 denotes tails and 1 denotes heads).
Give P(S) in list form.
Answer:
{∅, {00}, {01}, {10}, {11}, {00,01}, {00,10}, {00,11}, {01,10}, {01,11}, {10,11}, {00,01,10}, {00,01,11},
{00,10,11}, {01,10,11}, {00,01,10,11}}
Cards
A standard card deck can be modeled by the Cartesian product set
D = {1,2,3,4,5,6,7,8,9,10,j,q,k} × {♣,♦,♥,♠}
where the first coordinate encodes the denomination or kind (ace, 2-10, jack, queen, king) and where the second
coordinate encodes the suit (clubs, diamonds, hearts, spades). Sometimes we represent a card as a string rather
than an ordered pair (for example q♥). For the problems in this subsection, the card deck D is the universal set.
Let H denote the set of hearts and F the set of face cards. Find each of the following:
1. H ∩ F
2. H ∖ F
3. F ∖ H
4. H ∪ F
Answer:
1. {j♥, q♥, k♥}
2. {1♥, 2♥, 3♥, 4♥, 5♥, 6♥, 7♥, 8♥, 9♥, 10♥}
3. {j♣, q♣, k♣, j♦, q♦, k♦, j♠, q♠, k♠}
4. {1♥, 2♥, 3♥, 4♥, 5♥, 6♥, 7♥, 8♥, 9♥, 10♥, j♥, q♥, k♥, j♣, q♣, k♣, j♦, q♦, k♦, j♠, q♠, k♠}
A bridge hand is a subset of D with 13 cards. Often bridge hands are described by giving the cross sections by
suit. Thus, suppose that N is a bridge hand, held by a player named North, with
N♣ = {2,5,q},  N♦ = {1,5,8,q,k},  N♥ = {8,10,j,q},  N♠ = {1}
A poker hand is a subset of D with 5 cards. Suppose that B is a poker hand with cross sections by denomination
B₁ = {♣,♠},  B₈ = {♣,♠},  B_q = {♥}
Find each of the following:
1. The nonempty cross sections of B by suit.
2. The projection of B onto the set of suits.
3. The projection of B onto the set of denominations
Answer:
1. B♣ = {1,8}, B♥ = {q}, B♠ = {1,8}
2. {♣, ♥, ♠}
3. {1, 8, q}
The poker hand in the last exercise is known as a dead man's hand. Legend has it that Wild Bill Hickok held
this hand at the time of his murder in 1876.
General unions and intersections
For the problems in this subsection, the universal set is R.
Let Aₙ = [0, 1 − 1/n] for n ∈ N+. Find
1. ⋂ₙ₌₁^∞ Aₙ
2. ⋃ₙ₌₁^∞ Aₙ
3. ⋂ₙ₌₁^∞ Aₙᶜ
4. ⋃ₙ₌₁^∞ Aₙᶜ
Answer:
1. {0}
2. [0,1)
3. (−∞,0) ∪ [1,∞)
4. R ∖ {0}
Let Aₙ = (2 − 1/n, 5 + 1/n) for n ∈ N+. Find
1. ⋂ₙ₌₁^∞ Aₙ
2. ⋃ₙ₌₁^∞ Aₙ
3. ⋂ₙ₌₁^∞ Aₙᶜ
4. ⋃ₙ₌₁^∞ Aₙᶜ
Answer:
1. [2,5]
2. (1,6)
3. (−∞,1] ∪ [6,∞)
4. (−∞,2) ∪ (5,∞)
Subsets of R²
Let T be the closed triangular region in R² with vertices (0,0), (1,0), and (1,1). Find each of the following:
1. The cross section Tₓ for x ∈ R
2. The cross section Tʸ for y ∈ R
3. The projection of T onto the horizontal axis
4. The projection of T onto the vertical axis
Answer:
1. Tₓ = [0,x] for x ∈ [0,1], Tₓ = ∅ otherwise
2. Tʸ = [y,1] for y ∈ [0,1], Tʸ = ∅ otherwise
3. [0,1]
4. [0,1]
2. Functions
Functions play a central role in probability and statistics, as they do in every other branch of mathematics. For
the most part, the proofs in this section are straightforward, so be sure to try them yourself before reading the
ones in the text.
Formally, a function f from a set S into a set T is a subset of the product set S × T with the property that for each
x ∈ S, there is a unique y ∈ T with (x,y) ∈ f, and we write y = f(x). Less formally, a function f from S into T is a rule (or procedure or algorithm) that assigns to each
x ∈ S a unique element f(x) ∈ T. The definition of a function as a set of ordered pairs is due to Kazimierz
Kuratowski. When f is a function from S into T we write f: S → T.
Recall that f is one-to-one if f(u) = f(v) implies u = v, and that f maps S onto T if every y ∈ T has the form
y = f(x) for some x ∈ S. If f is one-to-one and onto, the inverse of f is the function f⁻¹ from T onto S given by
f⁻¹(y) = x ⟺ f(x) = y;  x ∈ S, y ∈ T
If you like to think of a function as a set of ordered pairs, then f⁻¹ = {(y,x) ∈ T × S : (x,y) ∈ f}. The fact that f
is one-to-one and onto ensures that f⁻¹ is a valid function from T onto S. Sets S and T are in one-to-one
correspondence if there exists a one-to-one function from S onto T. One-to-one correspondence plays an
essential role in the study of cardinality.
Restrictions
Suppose that f: S → T and that A ⊆ S. The function f_A : A → T defined by the rule f_A(x) = f(x) for x ∈ A is
called, appropriately enough, the restriction of f to A. As a set of ordered pairs, note that
f_A = {(x,y) ∈ f : x ∈ A}.
Composition
Suppose that g: R → S and f: S → T. The composition of f with g is the function f ∘ g : R → T defined by
(f ∘ g)(x) = f(g(x)),  x ∈ R
Composition is associative:
Suppose that h: R → S, g: S → T, and f: T → U. Then
f ∘ (g ∘ h) = (f ∘ g) ∘ h
Proof:
Note that both functions map R into U. Using the definition of composition, the value of both functions at
xR is f(g(h(x))).
Thus we can write f ∘ g ∘ h without ambiguity. On the other hand, composition is not commutative. Indeed,
depending on the domains and co-domains, f ∘ g might be defined when g ∘ f is not. Even when both are
defined, they may have different domains and co-domains, and so of course cannot be the same function. Even
when both are defined and have the same domains and co-domains, the two compositions will not be the same
in general. Examples of all of these cases are given in the computational exercises below.
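A minimal sketch of composition in Python; the compose helper and the sample functions are illustrative:

```python
def compose(f, g):
    """Return f ∘ g, defined by (f ∘ g)(x) = f(g(x))."""
    return lambda x: f(g(x))

f = lambda x: x + 1
g = lambda x: 2 * x
h = lambda x: x * x

fg = compose(f, g)   # x -> 2x + 1
gf = compose(g, f)   # x -> 2(x + 1)

print(fg(3), gf(3))  # 7 8

# Composition is associative but not commutative
assert compose(f, compose(g, h))(5) == compose(compose(f, g), h)(5)
assert fg(3) != gf(3)
```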
Suppose that g: R → S and f: S → T.
1. If f and g are one-to-one then f ∘ g is one-to-one.
2. If f and g are onto then f ∘ g is onto.
Proof:
1. Suppose that u, v ∈ R and (f ∘ g)(u) = (f ∘ g)(v). Then f(g(u)) = f(g(v)). Since f is one-to-one,
g(u) = g(v). Since g is one-to-one, u = v.
2. Suppose that y ∈ T. Since f is onto, there exists z ∈ S with f(z) = y. Since g is onto, there exists x ∈ R
with g(x) = z. Then (f ∘ g)(x) = f(g(x)) = f(z) = y.
The inverse of a function is really the inverse with respect to composition. Suppose that f is a one-to-one
function from S onto T, and let I_S and I_T denote the identity functions on S and T, respectively. Then
1. f⁻¹ ∘ f = I_S
2. f ∘ f⁻¹ = I_T
Proof:
1. Note that f⁻¹ ∘ f : S → S. For x ∈ S, (f⁻¹ ∘ f)(x) = f⁻¹(f(x)) = x.
2. Note that f ∘ f⁻¹ : T → T. For y ∈ T, (f ∘ f⁻¹)(y) = f(f⁻¹(y)) = y.
An element x ∈ Sⁿ can be thought of as a function from {1,2,…,n} into S. Similarly, an element x ∈ S^∞ can
be thought of as a function from N+ into S. For such a sequence x, of course, we usually write xᵢ instead of
x(i). More generally, if S and T are sets, then the set of all functions from S into T is denoted by T^S. In
particular, as we noted in the last section, S^∞ is also (and more accurately) written as S^N+.
Suppose that g is a one-to-one function from R onto S and that f is a one-to-one function from S onto T. Then
(f ∘ g)⁻¹ = g⁻¹ ∘ f⁻¹.
Proof:
Note that (f ∘ g)⁻¹ : T → R and g⁻¹ ∘ f⁻¹ : T → R. For y ∈ T, let x = (f ∘ g)⁻¹(y). Then (f ∘ g)(x) = y so that
f(g(x)) = y. Hence g(x) = f⁻¹(y) and therefore x = g⁻¹(f⁻¹(y)) = (g⁻¹ ∘ f⁻¹)(y).
Inverse Images
Suppose that f: S → T. If A ⊆ T, the inverse image of A under f is the subset of S given by
f⁻¹(A) = {x ∈ S : f(x) ∈ A}
Technically, the inverse images define a new function from P(T) into P(S). We use the same notation as for
the inverse function, which is defined when f is one-to-one and onto. These are very different functions, but
usually no confusion results. The following important theorem shows that inverse images preserve all set
operations.
Suppose that f: S → T, and that A ⊆ T and B ⊆ T. Then
1. f⁻¹(A ∪ B) = f⁻¹(A) ∪ f⁻¹(B)
2. f⁻¹(A ∩ B) = f⁻¹(A) ∩ f⁻¹(B)
3. f⁻¹(A ∖ B) = f⁻¹(A) ∖ f⁻¹(B)
4. If A ⊆ B then f⁻¹(A) ⊆ f⁻¹(B)
5. If A and B are disjoint, so are f⁻¹(A) and f⁻¹(B)
Proof:
1. x ∈ f⁻¹(A ∪ B) if and only if f(x) ∈ A ∪ B if and only if f(x) ∈ A or f(x) ∈ B if and only if
x ∈ f⁻¹(A) or x ∈ f⁻¹(B) if and only if x ∈ f⁻¹(A) ∪ f⁻¹(B). The remaining parts follow in the same way,
directly from the definitions.
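Since inverse images preserve all the set operations, they are easy to test by brute force on a finite domain; the preimage helper below is an illustrative name:

```python
def preimage(f, domain, A):
    """Inverse image f⁻¹(A) = {x in domain : f(x) in A}."""
    return {x for x in domain if f(x) in A}

S = set(range(-3, 4))
f = lambda x: x * x      # a many-to-one function, f(x) = x²

A = {1, 4}
B = {4, 9}

# Inverse images preserve union, intersection, and difference
assert preimage(f, S, A | B) == preimage(f, S, A) | preimage(f, S, B)
assert preimage(f, S, A & B) == preimage(f, S, A) & preimage(f, S, B)
assert preimage(f, S, A - B) == preimage(f, S, A) - preimage(f, S, B)

print(preimage(f, S, A))   # {-2, -1, 1, 2}
```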
Forward Images
Suppose again that f: S → T. If A ⊆ S, the forward image of A under f is the subset of T given by
f(A) = {f(x) : x ∈ A}
Forward images preserve unions: f(A ∪ B) = f(A) ∪ f(B) for A, B ⊆ S. For the other operations, however,
only partial results hold in general: f(A) ∖ f(B) ⊆ f(A ∖ B), and f(⋂ᵢ∈I Aᵢ) ⊆ ⋂ᵢ∈I f(Aᵢ) for a collection
{Aᵢ : i ∈ I} of subsets of S, with equality in both if f is one-to-one. For example, suppose that y ∈ f(A ∖ B).
Then y = f(x) for some x ∈ A ∖ B. Hence x ∈ A so y ∈ f(A). In general, the proof breaks down at this point.
However, if f is one-to-one and f(u) = y for some u ∈ B, then u = x so x ∈ B, a contradiction. Hence
y ∉ f(B), so y ∈ f(A) ∖ f(B). Similarly, suppose that y ∈ ⋂ᵢ∈I f(Aᵢ). Then y ∈ f(Aᵢ) for every
i ∈ I. Hence for every i ∈ I there exists xᵢ ∈ Aᵢ with y = f(xᵢ). If f is one-to-one, xᵢ = xⱼ for all i, j ∈ I.
Call the common value x. Then x ∈ Aᵢ for every i ∈ I so x ∈ ⋂ᵢ∈I Aᵢ and therefore y ∈ f(⋂ᵢ∈I Aᵢ).
Suppose again that f: S → T. As noted earlier, the forward images of f define a function from P(S) into P(T)
and the inverse images define a function from P(T) into P(S). One might hope that these functions are
inverses of one another, but alas no.
Suppose that f: S → T.
1. A ⊆ f⁻¹[f(A)] for A ⊆ S. Equality holds if f is one-to-one.
2. f[f⁻¹(B)] ⊆ B for B ⊆ T. Equality holds if f is onto.
Proof:
1. If x ∈ A then f(x) ∈ f(A) and hence x ∈ f⁻¹[f(A)]. Conversely suppose that x ∈ f⁻¹[f(A)]. Then
f(x) ∈ f(A) so f(x) = f(u) for some u ∈ A. At this point we can go no further. But if f is one-to-one,
then u = x and hence x ∈ A.
2. Suppose y ∈ f[f⁻¹(B)]. Then y = f(x) for some x ∈ f⁻¹(B). But then y = f(x) ∈ B. Conversely
suppose that f is onto and y ∈ B. There exists x ∈ S with f(x) = y. Hence x ∈ f⁻¹(B) and so
y ∈ f[f⁻¹(B)].
Spaces of Real Functions
Let S be a set and let V denote the set of all functions from S into R. The usual arithmetic operations on
members of V are defined pointwise. That is, if f ∈ V, g ∈ V, and c ∈ R, then f + g, f − g, fg, and cf are defined as
follows for all x ∈ S:
(f + g)(x) = f(x) + g(x)
(f − g)(x) = f(x) − g(x)
(fg)(x) = f(x)g(x)
(cf)(x) = c f(x)
If g(x) ≠ 0 for every x ∈ S, then f/g, defined by (f/g)(x) = f(x)/g(x) for x ∈ S, is also in V. The zero function
0, of course, is defined by 0(x) = 0 for all x ∈ S. Each of the functions above is also in V.
A fact that is very important in probability as well as other branches of analysis is that V with addition and
scalar multiplication as defined above is a vector space.
(V, +, ·) is a vector space over R. That is, for all f, g, h ∈ V and a, b ∈ R:
1. f + g = g + f (commutative rule)
2. f + (g + h) = (f + g) + h (associative rule)
3. a(f + g) = af + ag (distributive rule)
4. (a + b)f = af + bf (distributive rule)
5. f + 0 = f (existence of zero vector)
6. f + (−f) = 0 (existence of additive inverses)
7. 1 · f = f (unity rule)
Proof:
Each of these properties follows from the corresponding property in R.
Indicator Functions
Suppose that A is a subset of a universal set S. The indicator function of A is the function 1_A : S → {0,1}
defined as follows:
1_A(x) = 1 if x ∈ A, and 1_A(x) = 0 if x ∉ A
Thus, the indicator function of A simply indicates whether or not x ∈ A for each x ∈ S.
Conversely, any function f: S → {0,1} is the indicator function of the set A = f⁻¹{1} = {x ∈ S : f(x) = 1}.
Thus, there is a one-to-one correspondence between P(S), the power set of S, and the collection of indicator
functions {0,1}^S. The next exercise shows how the set algebra of subsets corresponds to the arithmetic algebra
of the indicator functions.
Suppose that A and B are subsets of S. Then
1. 1_{A∩B} = 1_A 1_B = min{1_A, 1_B}
2. 1_{A∪B} = 1 − (1 − 1_A)(1 − 1_B) = max{1_A, 1_B}
3. 1_{Aᶜ} = 1 − 1_A
4. 1_{A∖B} = 1_A (1 − 1_B)
5. A ⊆ B if and only if 1_A ≤ 1_B
Proof:
1. Note that both functions on the right just take the values 0 and 1, and each takes the value 1 at x if and
only if x ∈ A and x ∈ B. Parts (b)-(d) follow by similar arguments.
5. Since both functions just take the values 0 and 1, note that 1_A ≤ 1_B if and only if 1_A(x) = 1 implies
1_B(x) = 1 for each x ∈ S. But this holds if and only if x ∈ A implies x ∈ B, that is, A ⊆ B.
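The correspondence between set algebra and indicator arithmetic can be verified pointwise; the indicator helper and the sets here are illustrative:

```python
S = set(range(8))
A = {1, 2, 3}
B = {2, 3, 4, 5}

def indicator(A):
    """Indicator function 1_A, as a Python function."""
    return lambda x: 1 if x in A else 0

one_A, one_B = indicator(A), indicator(B)
for x in S:
    # 1_{A∩B} = 1_A 1_B
    assert one_A(x) * one_B(x) == indicator(A & B)(x)
    # 1_{A∪B} = 1_A + 1_B − 1_A 1_B
    assert one_A(x) + one_B(x) - one_A(x) * one_B(x) == indicator(A | B)(x)
    # 1_{Aᶜ} = 1 − 1_A
    assert 1 - one_A(x) == indicator(S - A)(x)

print("indicator identities verified")
```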
For a fixed universal set S, the set operators on P(S) were studied in the section on Sets: union, intersection,
and difference are binary operators on P(S), while complement is a unary operator.
As these examples illustrate, a binary operator f is often written as x f y rather than f(x,y). Still, it is useful to
know that operators are simply functions of a special type.
Suppose now that f is a unary operator on a set S, g is a binary operator on S, and that A ⊆ S. Then A is said to
be closed under f if x ∈ A implies f(x) ∈ A. Thus, f restricted to A is a unary operator on A. Similarly, A is
said to be closed under g if (x,y) ∈ A × A implies g(x,y) ∈ A. Thus, g restricted to A × A is a binary operator
on A. Returning to our basic examples, note that
N is closed under plus and times, but not under minus and difference.
Q is closed under plus, times, minus, and difference. Q ∖ {0} is closed under reciprocal.
Many properties that you are familiar with for special operators (such as the arithmetic and set operators) can
now be formulated generally. Suppose that f and g are binary operators on S. In the following definitions, x, y,
and z are arbitrary elements of S.
Suppose that 𝒮 is a collection of nonempty subsets of a set S. The axiom of choice states that there exists a
function f: 𝒮 → S with the property that f(A) ∈ A for each A ∈ 𝒮. The function f is known as a choice
function.
Stripped of most of the mathematical jargon, the idea is very simple. Since each set A ∈ 𝒮 is nonempty, we can
select an element of A; we will call the element we select f(A), and thus our selections define a function. In
fact, you may wonder why we need an axiom at all. The problem is that we have not given a rule (or procedure
or algorithm) for selecting the elements of the sets in the collection. Indeed, we may not know enough about the
sets in the collection to define a specific rule, so in such a case, the axiom of choice simply guarantees the
existence of a choice function. Some mathematicians, known as constructivists, do not accept the axiom of
choice, and insist on well-defined rules for constructing functions.
A nice consequence of the axiom of choice is a type of duality between one-to-one functions and onto functions.
Suppose that f is a function from a set S onto a set T. There exists a one-to-one function g from T into S.
Proof.
For each y ∈ T, the set f⁻¹{y} is nonempty, since f is onto. By the axiom of choice, we can select an element
g(y) from f⁻¹{y} for each y ∈ T. The resulting function g is one-to-one.
Suppose that f is a one-to-one function from a set S into a set T. There exists a function g from T onto S.
Proof.
Fix a special element x₀ ∈ S. If y ∈ range(f), there exists a unique x ∈ S with f(x) = y; define g(y) = x. If
y ∉ range(f), define g(y) = x₀. Then g is a function from T onto S, since g(f(x)) = x for every x ∈ S.
Computational Exercises
Some Elementary Functions
Each of the following rules defines a function from R into R.
f(x) = x²
g(x) = sin(x)
h(x) = ⌊x⌋
u(x) = eˣ / (1 + eˣ)
Find the range of the function and determine if the function is one-to-one in each of the following cases:
1. f
2. g
3. h
4. u
Answer:
1. Range [0,). Not one-to-one.
2. Range [1,1]. Not one-to-one.
3. Range Z. Not one-to-one.
4. Range (0,1). One-to-one.
Find the following inverse images:
1. f⁻¹[4,9]
2. g⁻¹{0}
3. h⁻¹{2,3,4}
Answer:
1. [−3,−2] ∪ [2,3]
2. {nπ : n ∈ Z}
3. [2,5)
The function u is one-to-one. Find (that is, give the domain and rule for) the inverse function u⁻¹.
Answer:
u⁻¹ : (0,1) → R is given by u⁻¹(p) = ln(p / (1 − p)).
Dice
Let S = {1,2,3,4,5,6}², the set of outcomes when two standard dice are tossed, and define the functions f, g, u,
and v on R² by
f(x,y) = x + y
g(x,y) = y − x
u(x,y) = min{x,y}
v(x,y) = max{x,y}
Further, let U = (u,v) and F = (f,g). Find the range of the function (restricted to S) in each of the following cases:
1. f
2. g
3. u
4. v
5. U
Answer:
1. {2,3,4,…,12}
2. {−5,−4,…,4,5}
3. {1,2,3,4,5,6}
4. {1,2,3,4,5,6}
5. {(i,j) ∈ {1,2,3,4,5,6}² : i ≤ j}
Find each of the following inverse images:
1. f⁻¹{6}
2. u⁻¹{3}
3. v⁻¹{4}
4. U⁻¹{(3,4)}
Answer:
1. {(1,5),(2,4),(3,3),(4,2),(5,1)}
2. {(3,3),(3,4),(4,3),(3,5),(5,3),(3,6),(6,3)}
3. {(1,4),(4,1),(2,4),(4,2),(3,4),(4,3),(4,4)}
4. {(3,4),(4,3)}
Find each of the following compositions:
1. f ∘ U
2. g ∘ U
3. u ∘ F
4. v ∘ F
5. F ∘ U
6. U ∘ F
Answer:
1. f ∘ U = f
2. g ∘ U = |g|
3. u ∘ F = g
4. v ∘ F = f
5. F ∘ U = (f, |g|)
6. U ∘ F = (g, f)
Note that while f ∘ U is well-defined, U ∘ f is not. Note also that f ∘ U = f even though U is not the identity
function on S.
Bit Strings
Let n ∈ N+ and let S = {0,1}ⁿ and T = {0,1,…,n}. Recall that the elements of S are bit strings of length n,
and could represent the possible outcomes of n tosses of a coin (where 1 means heads and 0 means tails). Let
f: S → T be the function defined by f(x₁,x₂,…,xₙ) = x₁ + x₂ + ⋯ + xₙ. Note that f(x) is just the number of 1s in
the bit string x. Let g: T → S be the function defined by g(k) = xₖ where xₖ denotes the bit string with k 1s
followed by n − k 0s.
Find each of the following
1. f ∘ g
2. g ∘ f
Answer:
1. f ∘ g : T → T and (f ∘ g)(k) = k.
2. g ∘ f : S → S and (g ∘ f)(x) = xₖ where k = f(x) = x₁ + x₂ + ⋯ + xₙ. In words, (g ∘ f)(x) is the bit string with the
same number of 1s as x, but rearranged so that all the 1s come first.
In the previous exercise, note that f ∘ g and g ∘ f are both well-defined, but have different domains, and so of
course are not the same. Note also that f ∘ g is the identity function on T, but f is not the inverse of g. Indeed f
is not one-to-one, and so does not have an inverse. However, f restricted to {xₖ : k ∈ T} (the range of g) is one-to-one and is the inverse of g.
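The functions f and g of this exercise can be sketched in Python with bit strings represented as tuples (an illustrative encoding):

```python
n = 4

def f(x):
    """Number of 1s in bit string x (a tuple of 0s and 1s)."""
    return sum(x)

def g(k):
    """Bit string with k 1s followed by n - k 0s."""
    return (1,) * k + (0,) * (n - k)

# f ∘ g is the identity on T = {0, 1, ..., n}
assert all(f(g(k)) == k for k in range(n + 1))

# g ∘ f sorts the 1s of x to the front
x = (0, 1, 1, 0)
print(g(f(x)))   # (1, 1, 0, 0)
```

Note that g ∘ f is idempotent: applying it a second time changes nothing, since the 1s are already at the front.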
Let n = 4. Give f⁻¹({k}) in list form for each k ∈ T.
Answer:
1. f⁻¹({0}) = {0000}
2. f⁻¹({1}) = {1000, 0100, 0010, 0001}
3. f⁻¹({2}) = {1100, 1010, 1001, 0110, 0101, 0011}
4. f⁻¹({3}) = {1110, 1101, 1011, 0111}
5. f⁻¹({4}) = {1111}
Again let n=4. Let A={1000,1010} and B={1000,1100}. Give each of the following in list form:
1. f(A)
2. f(B)
3. f(A ∩ B)
4. f(A) ∩ f(B)
5. f⁻¹(f(A))
Answer:
1. {1,2}
2. {1,2}
3. {1}
4. {1,2}
5. {1000,0100,0010,0001,1100,1010,1001,0110,0101,0011}
In the previous exercise, note that f(A ∩ B) is a proper subset of f(A) ∩ f(B), and that A is a proper subset of f⁻¹(f(A)).
Indicator Functions
Suppose that A and B are subsets of a universal set S. Express, in terms of 1_A and 1_B, the indicator function of
each of the 14 non-trivial sets that can be constructed from A and B. Use the Venn diagram applet to help.
Answer:
1. 1_A
2. 1_B
3. 1_{Aᶜ} = 1 − 1_A
4. 1_{Bᶜ} = 1 − 1_B
5. 1_{A∩B} = 1_A 1_B
6. 1_{A∪B} = 1_A + 1_B − 1_A 1_B
7. 1_{A∩Bᶜ} = 1_A − 1_A 1_B
8. 1_{B∩Aᶜ} = 1_B − 1_A 1_B
9. 1_{A∪Bᶜ} = 1 − 1_B + 1_A 1_B
10. 1_{B∪Aᶜ} = 1 − 1_A + 1_A 1_B
11. 1_{Aᶜ∩Bᶜ} = 1 − 1_A − 1_B + 1_A 1_B
12. 1_{Aᶜ∪Bᶜ} = 1 − 1_A 1_B
13. 1_{(A∩Bᶜ)∪(B∩Aᶜ)} = 1_A + 1_B − 2 · 1_A 1_B
14. 1_{(A∩B)∪(Aᶜ∩Bᶜ)} = 1 − 1_A − 1_B + 2 · 1_A 1_B
Suppose that A, B, and C are subsets of a universal set S. Give the indicator function of each of the following,
in terms of 1_A, 1_B, and 1_C in sum-product form:
1. D = {x ∈ S : x is an element of exactly one of the given sets}
2. E = {x ∈ S : x is an element of exactly two of the given sets}
Answer:
1. 1_D = 1_A + 1_B + 1_C − 2(1_A 1_B + 1_A 1_C + 1_B 1_C) + 3 · 1_A 1_B 1_C
2. 1_E = 1_A 1_B + 1_A 1_C + 1_B 1_C − 3 · 1_A 1_B 1_C
Operators
Recall the standard arithmetic operators on R discussed above.
We all know that sum is commutative and associative, and that product distributes over sum.
1. Is difference commutative?
2. Is difference associative?
3. Does product distribute over difference?
4. Does sum distribute over product?
Answer:
1. No. x − y ≠ y − x in general.
2. No. x − (y − z) ≠ (x − y) − z in general.
3. Yes. x(y − z) = xy − xz.
4. No. x + yz ≠ (x + y)(x + z) in general.
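Single numeric counterexamples suffice for the negative answers. A quick check in Python (the values are arbitrary):

```python
# Concrete counterexamples for the answers above, plus the one identity that holds
x, y, z = 5, 3, 2
assert x - y != y - x                      # difference is not commutative
assert x - (y - z) != (x - y) - z          # difference is not associative
assert x * (y - z) == x * y - x * z        # product distributes over difference
assert x + y * z != (x + y) * (x + z)      # sum does not distribute over product
```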
3. Relations
Definitions
Suppose that S and T are sets. A relation from S to T is a subset of the product set S × T. As the name suggests,
a relation R from S into T is supposed to define a relationship between the elements of S and the elements of
T, and so we usually use the more suggestive notation x R y when (x,y) ∈ R. The domain of R is the set of first
coordinates and the range of R is the set of second coordinates.
Constructions
Set Operations
Since a relation is just a set of ordered pairs, the set operations can be used to build new relations from existing
ones. Specifically, if Q and R are relations from S to T, then so are Q∪R, Q∩R, and Q∖R. If R is a relation from
Basic Properties
The important classes of relations that we will study in the next couple of sections are characterized by certain
basic properties. Here are the definitions:
Suppose that R is a relation on S.
(y,x) ∈ R and hence (x,y) ∈ R. Thus R = R⁻¹. Conversely, suppose R = R⁻¹. If (x,y) ∈ R then (x,y) ∈ R⁻¹
and hence (y,x) ∈ R.
A relation R on S is transitive if and only if R∘R ⊆ R.
Proof:
Suppose that R is transitive. If (x,z) ∈ R∘R then there exists y ∈ S such that (x,y) ∈ R and (y,z) ∈ R. But
then (x,z) ∈ R by transitivity. Hence R∘R ⊆ R. Conversely, suppose that R∘R ⊆ R. If (x,y) ∈ R and
antisymmetry, x = y. Conversely suppose that (x,y) ∈ R∩R⁻¹ implies x = y. If (x,y) ∈ R and (y,x) ∈ R then
(y,z) ∈ Q, and (y,z) ∈ R. Hence (x,z) ∈ Q and (x,z) ∈ R, so (x,z) ∈ Q∩R. Hence Q∩R is transitive.
Suppose that R is a relation on a set S.
1. Give an explicit definition for the property R is not reflexive.
2. Give an explicit definition for the property R is not irreflexive.
3. Are any of the properties R is reflexive, R is not reflexive, R is irreflexive, R is not irreflexive
equivalent?
Answer:
1. R is not reflexive if and only if there exists x ∈ S such that (x,x) ∉ R.
2. R is not irreflexive if and only if there exists x ∈ S such that (x,x) ∈ R.
3. Nope.
Suppose that R is a relation on a set S.
1. Give an explicit definition for the property R is not symmetric.
2. Give an explicit definition for the property R is not antisymmetric.
3. Are any of the properties R is symmetric, R is not symmetric, R is antisymmetric, R is not
antisymmetric equivalent?
Answer:
1. R is not symmetric if and only if there exist x, y ∈ S such that (x,y) ∈ R and (y,x) ∉ R.
2. R is not antisymmetric if and only if there exist distinct x, y ∈ S such that (x,y) ∈ R and (y,x) ∈ R.
3. Nope.
Computational Exercises
Let R be the relation defined on ℝ by x R y if and only if sin(x) = sin(y). Determine if R has each of the
following properties:
1. reflexive
2. symmetric
3. transitive
4. irreflexive
5. antisymmetric
Answer:
1. yes
2. yes
3. yes
4. no
5. no
The relation R in the previous exercise is a member of an important class of equivalence relations.
Let R be the relation defined on ℝ by x R y if and only if x² + y² ≤ 1. Determine if R has each of the following
properties:
1. reflexive
2. symmetric
3. transitive
4. irreflexive
5. antisymmetric
Answer:
1. no
2. yes
3. no
4. no
5. no
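Properties like these can be probed (though of course not proved) on a finite sample of points. The helper below is an illustrative sketch, not from the text; it returns the five properties in the order asked, and the sample is chosen to expose the failures for the unit-disk relation:

```python
import itertools

def check(points, related):
    """Test the five properties of a relation on a finite sample of points."""
    pairs = list(itertools.product(points, repeat=2))
    reflexive = all(related(x, x) for x in points)
    irreflexive = all(not related(x, x) for x in points)
    symmetric = all(related(y, x) for x, y in pairs if related(x, y))
    antisymmetric = all(x == y for x, y in pairs if related(x, y) and related(y, x))
    transitive = all(related(x, z) for x, y, z in itertools.product(points, repeat=3)
                     if related(x, y) and related(y, z))
    return reflexive, symmetric, transitive, irreflexive, antisymmetric

# The relation x R y iff x^2 + y^2 <= 1, sampled at a few points
disk = lambda x, y: x * x + y * y <= 1
props = check([-1.0, -0.5, 0.0, 0.5, 1.0, 2.0], disk)
assert props == (False, True, False, False, False)
```

For instance, transitivity fails here because (1, 0) and (0, 1) are in R but (1, 1) is not.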
4. Partial Orders
Basic Theory
Definitions
A relation ≼ on S that is reflexive, anti-symmetric, and transitive is said to be a partial order on S, and the pair
(S, ≼) is called a partially ordered set.
As the name and notation suggest, a partial order is a type of ordering of the elements of S. Partial orders occur
naturally in many areas of mathematics, including probability. A partial order on a set naturally gives rise to
several other relations on the set.
Suppose that ≼ is a partial order on a set S. The relations ≽, ≺, ≻, ⊥, and ∥ are defined as follows:
1. x ≽ y if and only if y ≼ x.
2. x ≺ y if and only if x ≼ y and x ≠ y.
3. x ≻ y if and only if y ≺ x.
4. x ⊥ y if and only if x ≼ y or y ≼ x.
5. x ∥ y if and only if neither x ≼ y nor y ≼ x.
Note that ≽ is the inverse of ≼, and ≻ is the inverse of ≺. Note also that x ≼ y if and only if either x ≺ y or
x = y, so the relation ≺ completely determines the relation ≼. Finally, note that x ⊥ y means that x and y are
related in the partial order, while x ∥ y means that x and y are unrelated in the partial order. Thus, the relations
⊥ and ∥ are complements of each other, as sets of ordered pairs. A total or linear order is a partial order in
which there are no unrelated elements.
A partial order ≼ on S is a total order or linear order if for every x, y ∈ S, either x ≼ y or y ≼ x.
Suppose that ≼₁ and ≼₂ are partial orders on a set S. Then ≼₁ is a sub-order of ≼₂, or equivalently ≼₂ is an
extension of ≼₁, if and only if x ≼₁ y implies x ≼₂ y for x, y ∈ S.
Thus if ≼₁ is a sub-order of ≼₂, then as sets of ordered pairs, ≼₁ is a subset of ≼₂. We need one more relation
that arises naturally from a partial order.
Suppose that ≼ is a partial order on a set S. For x, y ∈ S, y is said to cover x if x ≺ y but no element z ∈ S
satisfies x ≺ z ≺ y.
If S is finite, the covering relation completely determines the partial order, by virtue of the transitive property.
The covering graph or Hasse graph is the directed graph with vertex set S and directed edge set E, where
(x,y) ∈ E if and only if y covers x. Thus, x ≺ y if and only if there is a directed path in the graph from x to y.
Hasse graphs are named for the German mathematician Helmut Hasse. The graphs are often drawn with the
edges directed upward. In this way, the directions can be inferred without having to actually draw arrows.
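As a concrete illustration (not from the text), the covering edges of the division order on S = {1, 2, 3, 4, 6, 12} can be computed directly from the definition:

```python
S = [1, 2, 3, 4, 6, 12]
divides = lambda m, n: n % m == 0

def covers(x, y):
    """y covers x in the divisibility order on S: x strictly divides y
    and no element of S fits strictly between them."""
    if x == y or not divides(x, y):
        return False
    between = [z for z in S if z not in (x, y) and divides(x, z) and divides(z, y)]
    return not between

edges = [(x, y) for x in S for y in S if covers(x, y)]
assert (1, 2) in edges and (2, 4) in edges and (6, 12) in edges
assert (1, 4) not in edges   # 1 | 2 | 4, so (1, 4) is not a covering edge
assert (2, 12) not in edges  # 2 | 4 | 12 and 2 | 6 | 12
```

The full order is then recovered as reachability along directed paths in `edges`, as the text describes.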
Basic Examples
Of course, the ordinary order ≤ is a total order on the set of real numbers ℝ. The subset partial order is one of
the most important in probability theory:
Suppose that S is a set. The subset relation ⊆ is a partial order on P(S), the power set of S.
Proof:
We proved this result in the section on sets. To review, recall that for A, B ∈ P(S), A ⊆ B means that x ∈ A
implies x ∈ B. Also A = B means that x ∈ A if and only if x ∈ B. Thus
1. A ⊆ A
2. A ⊆ B and B ⊆ A if and only if A = B
3. A ⊆ B and B ⊆ C imply A ⊆ C
Let ∣ denote the division relation on the set of positive integers ℕ₊. That is, m ∣ n if and only if there exists
Thus m = n, so ∣ is antisymmetric. Finally, suppose m ∣ n and n ∣ p, where m, n, p ∈ ℕ₊. Then there
exist j, k ∈ ℕ₊ such that n = jm and p = kn. Substituting gives p = (jk)m, so m ∣ p. Thus ∣ is transitive.
2. If m, n ∈ ℕ₊ and m ∣ n, then there exists k ∈ ℕ₊ such that n = km. Since k ≥ 1, m ≤ n.
Basic Properties
The proofs of the following basic properties are straightforward. Be sure to try them yourself before reading the
ones in the text.
The inverse ≽ of a partial order ≼ is also a partial order.
Proof:
Clearly the reflexive, antisymmetric, and transitive properties hold for ≽.
If ≼ is a partial order on S and A is a subset of S, then the restriction of ≼ to A is a partial order on A.
Proof:
The reflexive, antisymmetric, and transitive properties given above hold for all x, y, z ∈ S and hence hold for
all x, y, z ∈ A.
The following theorem characterizes relations that correspond to strict orders.
Let S be a set. A relation ≼ is a partial order on S if and only if the corresponding strict relation ≺ is transitive and irreflexive.
Proof:
Suppose that ≼ is a partial order on S. Recall that ≺ is defined by x ≺ y if and only if x ≼ y and x ≠ y. If x ≺ y
and y ≺ z then x ≼ y and y ≼ z, and so x ≼ z. On the other hand, if x = z then x ≼ y and y ≼ x so x = y, a
contradiction. Hence x ≠ z and so x ≺ z. Therefore ≺ is transitive. If x ≺ y then x ≠ y by definition, so ≺ is
irreflexive.
Conversely, suppose that ≺ is a transitive and irreflexive relation on S. Recall that ≼ is defined by x ≼ y if and
only if x ≺ y or x = y. By definition then, ≼ is reflexive: x ≼ x for every x ∈ S. Next, suppose that x ≼ y and
y ≼ x. If x ≺ y and y ≺ x then x ≺ x by the transitive property of ≺. But this is a contradiction by the irreflexive
property, so we must have x = y. Thus ≼ is antisymmetric. Suppose x ≼ y and y ≼ z. There are four cases:
1. If x ≺ y and y ≺ z then x ≺ z by the transitive property of ≺.
2. If x = y and y ≺ z then x ≺ z by substitution.
Recall the definition of the indicator function 1_A associated with a subset A of a universal set S: For x ∈ S,
Two partially ordered sets (S, ≼_S) and (T, ≼_T) are said to be isomorphic if there exists a one-to-one function f
from S onto T such that x ≼_S y if and only if f(x) ≼_T f(y), for all x, y ∈ S. The function f is an isomorphism.
Generally, a mathematical system often consists of a set and various structures defined in terms of the set, such
as relations, operators, or a collection of subsets. Loosely speaking, two mathematical systems of the same type
are isomorphic if there exists a one-to-one function from one of the sets onto the other that preserves the
structures, and again, the function is called an isomorphism. The basic idea is that isomorphic systems are
mathematically identical, except for superficial matters of appearance. The word isomorphism is from the Greek
and means equal shape.
Suppose that the partially ordered sets (S, ≼_S) and (T, ≼_T) are isomorphic, and that f : S → T is an isomorphism.
Then f and f⁻¹ are strictly increasing.
Proof:
We need to show that for x, y ∈ S, x ≺_S y if and only if f(x) ≺_T f(y). If x ≺_S y then by definition, f(x) ≼_T f(y).
But if f(x) = f(y) then x = y since f is one-to-one. This is a contradiction, so f(x) ≺_T f(y). Similarly, if
f(x) ≺_T f(y) then by definition, x ≼_S y. But if x = y then f(x) = f(y), a contradiction. Hence x ≺_S y.
In a sense, the subset partial order is universal: every partially ordered set is isomorphic to (𝒮, ⊆) for some
collection of sets 𝒮.
Suppose that ≼ is a partial order on a set S. Then there exists 𝒮 ⊆ P(S) such that (S, ≼) is isomorphic to
(𝒮, ⊆).
Proof:
For each x ∈ S, let A_x = {u ∈ S : u ≼ x}, and then let 𝒮 = {A_x : x ∈ S}, so that 𝒮 ⊆ P(S). We will show that the
function x ↦ A_x from S onto 𝒮 is one-to-one, and satisfies
x ≼ y if and only if A_x ⊆ A_y
First, suppose that x, y ∈ S and A_x = A_y. Then x ∈ A_x so x ∈ A_y and hence x ≼ y. Similarly, y ∈ A_y so y ∈ A_x
and hence y ≼ x. Thus x = y, so the mapping is one-to-one. Next, suppose that x ≼ y. If u ∈ A_x then u ≼ x so
u ≼ y by the transitive property, and hence u ∈ A_y. Thus A_x ⊆ A_y. Conversely, suppose A_x ⊆ A_y. As before,
x ∈ A_x, so x ∈ A_y and hence x ≼ y.
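The construction in the proof is easy to carry out for a finite example. Here is an illustrative Python sketch using the division order on S = {1, 2, 3, 4, 6, 12}, with each A_x computed exactly as in the proof:

```python
S = [1, 2, 3, 4, 6, 12]
divides = lambda m, n: n % m == 0

# A_x = {u in S : u divides x}, the "down-set" of x in the division order
down = {x: frozenset(u for u in S if divides(u, x)) for x in S}

# The map x -> A_x is one-to-one ...
assert len(set(down.values())) == len(S)

# ... and it turns divisibility into subset containment
for x in S:
    for y in S:
        assert divides(x, y) == (down[x] <= down[y])
```

So the division order on this set is isomorphic to a collection of subsets ordered by containment, just as the theorem asserts.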
Extremal Elements
Various types of extremal elements play important roles in partially ordered sets. Here are the definitions:
Suppose that ≼ is a partial order on a set S and that A ⊆ S.
inf(A). Similarly, the least upper bound of A is unique, if it exists, and is denoted lub(A) or sup(A). Note
that every element of S is a lower bound and an upper bound for ∅, since the conditions in the definition hold
vacuously.
The symbols ⋀ and ⋁ are also used for infimum and supremum, respectively, so ⋀A = inf(A) and
⋁A = sup(A) if they exist. In particular, for x, y ∈ S, operator notation is more commonly used, so
x ∧ y = inf{x, y} and x ∨ y = sup{x, y}, if these elements exist. If x ∧ y and x ∨ y exist for every x, y ∈ S, then
(S, ≼) is called a lattice.
For the subset partial order, the inf and sup operators correspond to intersection and union, respectively:
Let S be a set and consider the subset partial order ⊆ on P(S), the power set of S. Let 𝒜 be a nonempty
subset of P(S), that is, a nonempty collection of subsets of S. Then
1. inf(𝒜) = ⋂𝒜
2. sup(𝒜) = ⋃𝒜
Proof:
1. First, ⋂𝒜 ⊆ A for every A ∈ 𝒜 and hence ⋂𝒜 is a lower bound of 𝒜. If B is a lower bound of 𝒜 then
B ⊆ A for every A ∈ 𝒜 and hence B ⊆ ⋂𝒜. Therefore ⋂𝒜 is the greatest lower bound.
2. First, A ⊆ ⋃𝒜 for every A ∈ 𝒜 and hence ⋃𝒜 is an upper bound of 𝒜. If B is an upper bound of 𝒜
then A ⊆ B for every A ∈ 𝒜 and hence ⋃𝒜 ⊆ B. Therefore ⋃𝒜 is the least upper bound.
In particular, A ∧ B = A ∩ B and A ∨ B = A ∪ B, so (P(S), ⊆) is a lattice.
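A small Python check of the theorem (illustrative; the particular collection is arbitrary):

```python
from functools import reduce

# A nonempty collection of subsets of S = {1, ..., 5}
A = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({2, 5})]

inf_A = reduce(frozenset.intersection, A)  # greatest lower bound: the intersection
sup_A = reduce(frozenset.union, A)         # least upper bound: the union

assert inf_A == {2}
assert sup_A == {1, 2, 3, 4, 5}
assert all(inf_A <= a <= sup_A for a in A)  # bounds under the subset order
```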
Consider the division partial order ∣ on the set of positive integers ℕ₊ and let A be a nonempty subset of ℕ₊.
1. inf(A) is the greatest common divisor of A, usually denoted gcd(A) in this context.
2. If A is infinite then sup(A) does not exist. If A is finite then sup(A) is the least common multiple of
S is said to be well ordered by ≼. One of the most important examples is ℕ₊, which is well ordered by the
ordinary order ≤. On the other hand, the well ordering principle, which is equivalent to the axiom of choice,
states that every nonempty set can be well ordered.
Orders on Product Spaces
Suppose that S and T are sets with partial orders ≼_S and ≼_T respectively. Define the relation ≼ on S × T by
(u,v), (x,y) ∈ S × T and that (u,v) ≼ (x,y) and (x,y) ≼ (u,v). Then u ≼_S x and x ≼_S u, and v ≼_T y and
y ≼_T v. Hence u = x and v = y, so (u,v) = (x,y). Hence ≼ is antisymmetric. Finally, suppose that
(u,v), (x,y), (z,w) ∈ S × T, and that (u,v) ≼ (x,y) and (x,y) ≼ (z,w). Then u ≼_S x and x ≼_S z, and
v ≼_T y and y ≼_T w. Hence u ≼_S z and v ≼_T w, and therefore (u,v) ≼ (z,w). Hence ≼ is transitive.
2. Suppose that (S, ≼_S) = (T, ≼_T), and that x, y ∈ S are distinct. If (x,y) ≼ (y,x) then x ≼_S y and y ≼_S x
and hence x = y, a contradiction. Similarly (y,x) ≼ (x,y) leads to the same contradiction x = y. Hence
The product order on ℝ². The region shaded red is the set of points u with u ≼ (x,y). The region shaded blue is the set of points u with (x,y) ≼ u. The region shaded white is the set of points
that are not comparable with (x,y).
Product order extends in a straightforward way to the Cartesian product of a finite or an infinite sequence of
partially ordered spaces. For example, suppose that Sᵢ is a set with partial order ≼ᵢ for each i ∈ {1, 2, …, n},
where n ∈ ℕ₊. Let S = S₁ × S₂ × ⋯ × Sₙ. The product order ≼ on S is defined as follows: for x = (x₁, x₂,
…, xₙ) ∈ S and y = (y₁, y₂, …, yₙ) ∈ S, x ≼ y if and only if xᵢ ≼ᵢ yᵢ for each i ∈ {1, 2, …, n}. Recalling that
sequences are just functions defined on a particular type of index set, the product order can be generalized even
further:
Suppose that Sᵢ is a set with partial order ≼ᵢ for each i in a nonempty index set I. Let
(u,v), (x,y) ∈ S × T, and that (u,v) ≼ (x,y) and (x,y) ≼ (u,v). There are four cases, but only one can
actually happen.
1. If u ≺_S x and x ≺_S u then u ≺_S u. This is a contradiction since ≺_S is irreflexive, and so this case
cannot happen.
2. If u ≺_S x, x = u, and y ≼_T v, then u ≺_S u. This case also cannot happen.
3. If u = x, v ≼_T y, and x ≺_S u, then u ≺_S u. This case also cannot happen.
4. If u = x, v ≼_T y, x = u, and y ≼_T v, then u = x and v = y, so (u,v) = (x,y).
Hence ≼ is antisymmetric. Finally, suppose that (u,v), (x,y), (z,w) ∈ S × T and that (u,v) ≼ (x,y) and
If u = x, v ≼_T y, x = z, and y ≼_T w, then u = z and v ≼_T w, and hence (u,v) ≼ (z,w).
Therefore ≼ is transitive.
2. Suppose that ≼_S and ≼_T are total orders. Let (u,v), (x,y) ∈ S × T. Then either u ≼_S x or x ≼_S u, and
either v ≼_T y or y ≼_T v. Again there are four cases.
The lexicographic order on ℝ². The region shaded red is the set of points u with u ≼ (x,y). The region shaded blue is the set of points u with (x,y) ≼ u. The region shaded white is the set of
points that are not comparable with (x,y).
As with the product order, the lexicographic order can be generalized to a collection of partially ordered spaces.
However, we need the index set to be totally ordered.
Suppose that Sᵢ is a set with partial order ≼ᵢ for each i in a nonempty index set I. Suppose also that ≤ is a total
order on I. Let
f(j) ≺ⱼ g(j). Then
1. ≼ is a partial order on S, known again as the lexicographic order.
2. If ≼ᵢ is a total order for each i ∈ I, and I is well ordered by ≤, then ≼ is a total order on S.
Proof:
1. By the result above on strict orders, we need to show that ≺ is irreflexive and transitive. Thus, suppose
that f, g ∈ S and that f ≺ g and g ≺ f. Then there exists j ∈ I such that f(i) = g(i) if i < j and
f(j) ≺ⱼ g(j). Similarly there exists k ∈ I such that g(i) = f(i) if i < k and g(k) ≺ₖ f(k). Since I is totally
ordered under ≤, either j < k, k < j, or j = k. If j < k then we have g(j) = f(j) and f(j) ≺ⱼ g(j), a
contradiction. If k < j then we have f(k) = g(k) and g(k) ≺ₖ f(k), another contradiction. If j = k, we
have f(j) ≺ⱼ g(j) and g(j) ≺ⱼ f(j), another contradiction. Hence ≺ is irreflexive. Next, suppose that
f, g, h ∈ S and that f ≺ g and g ≺ h. Then there exists j ∈ I such that f(i) = g(i) if i < j and f(j) ≺ⱼ g(j).
Similarly, there exists k ∈ I such that g(i) = h(i) if i < k and g(k) ≺ₖ h(k). Again, since I is totally
ordered, either j < k or k < j or j = k. If j < k, then f(i) = g(i) = h(i) if i < j and f(j) ≺ⱼ g(j) = h(j). If
k < j, then f(i) = g(i) = h(i) if i < k and f(k) = g(k) ≺ₖ h(k). If j = k, then f(i) = g(i) = h(i) if i < j and
f(j) ≺ⱼ g(j) ≺ⱼ h(j). In all cases, f ≺ h so ≺ is transitive.
2. Suppose now that ≼ᵢ is a total order on Sᵢ for each i ∈ I and that I is well ordered by ≤. Let f, g ∈ S
with f ≠ g. Let J = {i ∈ I : f(i) ≠ g(i)}. Then J ≠ ∅ by assumption, and hence J has a minimum element j.
If i < j then i ∉ J and hence f(i) = g(i). On the other hand, f(j) ≠ g(j) since j ∈ J and therefore, since
Sⱼ is totally ordered, we must have either f(j) ≺ⱼ g(j) or g(j) ≺ⱼ f(j). In the first case, f ≺ g and in the
second case g ≺ f. Hence ≼ is a total order.
The term lexicographic comes from the way that we order words alphabetically: We look at the first letter; if
these are different, we know how to order the words. If the first letters are the same, we look at the second
letter; if these are different, we know how to order the words. We continue in this way until we find letters that
are different, and we can order the words. In fact, the lexicographic order is sometimes referred to as the first
difference order. Note also that if Sᵢ is a set and ≼ᵢ a total order on Sᵢ for i ∈ I, then by the well ordering
principle, there exists a well ordering ≤ of I, and hence there exists a lexicographic total order on the product
space S. As a mathematical structure, the lexicographic order is not as obscure as you might think.
(ℝ, ≤) is isomorphic to the lexicographic product of (ℤ, ≤) with ([0,1), ≤), where ≤ is the ordinary order for
real numbers.
Proof:
Every x ∈ ℝ can be uniquely expressed in the form x = n + t where n = ⌊x⌋ ∈ ℤ is the integer part and
t = x − n ∈ [0,1) is the remainder. Thus x ↦ (n,t) is a one-to-one function from ℝ onto ℤ × [0,1). For example,
5.3 maps to (5, 0.3), while −6.7 maps to (−7, 0.3). Suppose that x = m + s, y = n + t ∈ ℝ, where of course
m, n ∈ ℤ are the integer parts of x and y, respectively, and s, t ∈ [0,1) are the corresponding remainders. Then
x < y if and only if m < n, or m = n and s < t. Again, to illustrate with real numbers, we can tell that
5.3 < 7.8 just by comparing the integer parts: 5 < 7. We can ignore the remainders. On the other hand, to see that
6.4 < 6.7 we need to compare the remainders: 0.4 < 0.7, since the integer parts are the same.
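The map x ↦ (n, t) and the comparison rule are easy to spell out in code. A short sketch (dyadic sample values are used so the floating-point arithmetic is exact; Python compares tuples lexicographically, which is exactly the order in the theorem):

```python
import math

def split(x):
    """Map x to (integer part, remainder) in Z x [0, 1)."""
    n = math.floor(x)
    return (n, x - n)

assert split(5.25) == (5, 0.25)
assert split(-6.75) == (-7, 0.25)   # floor(-6.75) = -7, remainder 0.25

# Comparing reals agrees with comparing their pairs lexicographically
samples = [-6.75, -6.5, -1.0, 0.0, 0.25, 5.25, 5.5, 7.75]
for x in samples:
    for y in samples:
        assert (x < y) == (split(x) < split(y))
```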
Limits of Sequences of Real Numbers
Suppose that (a₁, a₂, …) is a sequence of real numbers.
The sequence inf{aₙ, aₙ₊₁, …} is increasing in n ∈ ℕ₊.
Since the sequence of infimums in the last result is increasing, the limit exists in ℝ ∪ {−∞, ∞}, and is called the
limit inferior of the original sequence:
liminf_{n→∞} aₙ = lim_{n→∞} inf{aₙ, aₙ₊₁, …}
The sequence sup{aₙ, aₙ₊₁, …} is decreasing in n ∈ ℕ₊.
Since the sequence of supremums in the last result is decreasing, the limit exists in ℝ ∪ {−∞, ∞}, and is called
the limit superior of the original sequence:
limsup_{n→∞} aₙ = lim_{n→∞} sup{aₙ, aₙ₊₁, …}
Note that liminf_{n→∞} aₙ ≤ limsup_{n→∞} aₙ, and equality holds if and only if lim_{n→∞} aₙ exists (and is the
common value).
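For a concrete sequence, the tail infimums and supremums can be approximated from a long finite stretch. The sketch below (illustrative only) uses aₙ = (−1)ⁿ + 1/n, whose limit inferior is −1 and limit superior is 1; truncating each tail at N approximates, but does not equal, the true infimum or supremum over an infinite tail:

```python
N = 10_000
a = [(-1) ** n + 1 / n for n in range(1, N + 1)]  # a_1, ..., a_N

# inf{a_n, a_{n+1}, ...} and sup{a_n, a_{n+1}, ...}, truncated at N
tail_infs = [min(a[n:]) for n in range(100)]
tail_sups = [max(a[n:]) for n in range(100)]

assert all(tail_infs[n] <= tail_infs[n + 1] for n in range(99))  # increasing
assert all(tail_sups[n] >= tail_sups[n + 1] for n in range(99))  # decreasing
assert abs(tail_infs[-1] - (-1)) < 0.02   # approaching liminf = -1
assert abs(tail_sups[-1] - 1) < 0.02      # approaching limsup = 1
```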
Computational Exercises
Let S={2,3,4,6,12}.
1. Sketch the Hasse graph corresponding to the ordinary order ≤ on S.
Consider the ordinary order ≤ on the set of real numbers ℝ, and let A = [a,b) where a < b. Find each of the
following that exist:
1. The set of minimal elements of A
2. The set of maximal elements of A
3. min(A)
4. max(A)
5. The set of lower bounds of A
6. The set of upper bounds of A
7. inf(A)
8. sup(A)
Answer:
1. {a}
2. ∅
3. a
4. Does not exist
5. (−∞, a]
6. [b, ∞)
7. a
8. b
Again consider the division partial order ∣ on the set of positive integers ℕ₊ and let A = {2,3,4,6,12}. Find
each of the following that exist:
1. The set of minimal elements of A
2. The set of maximal elements of A
3. min(A)
4. max(A)
5. The set of lower bounds of A
6. The set of upper bounds of A
7. inf(A)
8. sup(A).
Answer:
1. {2,3}
2. {12}
3. Does not exist
4. 12
5. {1}
6. {12,24,36,}
7. 1
8. 12
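These answers can be recovered mechanically. An illustrative Python sketch for the division order (the searches for bounds are restricted to a finite window, which is enough here):

```python
from math import gcd
from functools import reduce

A = [2, 3, 4, 6, 12]
divides = lambda m, n: n % m == 0

minimal = [x for x in A if not any(divides(y, x) for y in A if y != x)]
maximal = [x for x in A if not any(divides(x, y) for y in A if y != x)]
assert minimal == [2, 3]   # two minimal elements, so min(A) does not exist
assert maximal == [12]     # one maximal element, and max(A) = 12

# Common divisors (lower bounds) and common multiples (upper bounds)
assert [d for d in range(1, 13) if all(divides(d, x) for x in A)] == [1]
assert [u for u in range(1, 40) if all(divides(x, u) for x in A)] == [12, 24, 36]

lcm = lambda m, n: m * n // gcd(m, n)
assert reduce(gcd, A) == 1    # inf(A) = gcd(A)
assert reduce(lcm, A) == 12   # sup(A) = lcm(A)
```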
Let S={a,b,c}.
1. Give P(S) in list form.
Note that the Hasse graph of ⊇ looks the same as the graph of ⊆, except for the labels on the vertices. This
symmetry is a consequence of the complement relationship.
Let S={a,b,c,d}.
1. Give P(S) in list form.
2. Describe the Hasse graph of (P(S), ⊆)
Answer:
1. P(S)={,{a},{b},{c},{d},{a,b},{a,c},{a,d},{b,c},{b,d},{c,d},{a,b,c},{a,b,d},{a,c,d},
{b,c,d},S}
2. For A ∈ P(S) and x ∈ S ∖ A, there is a directed edge from A to A ∪ {x}
Note again that the Hasse graph of ⊇ looks the same as the graph of ⊆, except for the labels on the vertices.
This symmetry is a consequence of the complement relationship.
Suppose that A and B are subsets of a universal set S. Let 𝒜 denote the collection of the 16 subsets of S that
can be constructed from A and B using the set operations. Show that (𝒜, ⊆) is isomorphic to the partially
ordered set in the previous exercise. Use the Venn diagram applet to help.
Proof:
Let a = A∩B, b = A∩Bᶜ, c = Aᶜ∩B, d = Aᶜ∩Bᶜ. Our basic assumption is that A and B are in general position,
so that a, b, c, d are distinct and nonempty. Note also that {a, b, c, d} partitions S. Now, map each subset 𝒮 of
{a, b, c, d} to ⋃𝒮. This function is an isomorphism from P({a, b, c, d}) to 𝒜. That is, for subsets 𝒮 and 𝒯 of {a, b, c, d},
𝒮 ⊆ 𝒯 if and only if ⋃𝒮 ⊆ ⋃𝒯.
5. Equivalence Relations
Basic Theory
Definitions
A relation ≈ on a nonempty set S that is reflexive, symmetric, and transitive is said to be an equivalence
relation on S. Thus, for all x, y, z ∈ S,
1. x ≈ x, the reflexive property.
2. If x ≈ y then y ≈ x, the symmetric property.
3. If x ≈ y and y ≈ z then x ≈ z, the transitive property.
As the name and notation suggest, an equivalence relation is intended to define a type of equivalence among the
elements of S. Like partial orders, equivalence relations occur naturally in most areas of mathematics, including
probability.
Suppose that ≈ is an equivalence relation on S. The equivalence class of an element x ∈ S is the set of all
elements that are equivalent to x, and is denoted
[x] = {y ∈ S : y ≈ x}
Results
The most important result is that an equivalence relation on a set S defines a partition of S, by means of the
equivalence classes.
Suppose that ≈ is an equivalence relation on a set S.
1. If x ≈ y then [x] = [y].
2. If x ≉ y then [x] ∩ [y] = ∅.
3. The collection of (distinct) equivalence classes is a partition of S into nonempty sets.
Proof:
1. Suppose that x ≈ y. If u ∈ [x] then u ≈ x and hence u ≈ y by the transitive property. Hence u ∈ [y].
Similarly, if u ∈ [y] then u ≈ y. But y ≈ x by the symmetric property, and hence u ≈ x by the transitive
property. Hence u ∈ [x].
2. Suppose that x ≉ y. If u ∈ [x] ∩ [y], then u ∈ [x] and u ∈ [y], so u ≈ x and u ≈ y. But then x ≈ u by the
symmetric property, and then x ≈ y by the transitive property. This is a contradiction, so [x] ∩ [y] = ∅.
3. From (a) and (b), the (distinct) equivalence classes are disjoint. If x ∈ S, then x ≈ x by the reflexive
property, and hence x ∈ [x]. Therefore ⋃_{x∈S} [x] = S.
Sometimes the set 𝒮 of equivalence classes is denoted S/≈. The idea is that the equivalence classes are new
objects obtained by identifying elements in S that are equivalent. Conversely, every partition of a set defines an
equivalence relation on the set.
Suppose that 𝒮 is a collection of nonempty sets that partition a given set S. Define the relation ≈ on S
by x ≈ y if and only if x ∈ A and y ∈ A for some A ∈ 𝒮.
1. ≈ is an equivalence relation.
2. 𝒮 is the set of equivalence classes.
Proof:
1. If x ∈ S, then x ∈ A for some A ∈ 𝒮, since 𝒮 partitions S. Hence x ≈ x, and so the reflexive property holds.
Next, ≈ is trivially symmetric by definition. Finally, suppose that x ≈ y and y ≈ z. Then x, y ∈ A for
some A ∈ 𝒮 and y, z ∈ B for some B ∈ 𝒮. But then y ∈ A ∩ B. The sets in 𝒮 are disjoint, so A = B.
Hence x, z ∈ A, so x ≈ z. Thus ≈ is transitive.
2. If x ∈ S, then x ∈ A for a unique A ∈ 𝒮, and then by definition, [x] = A.
That is, 𝒮 = S/≈.
Suppose that S is a nonempty set. The most basic equivalence relation on S is the equality relation =. In this
case [x] = {x} for each x ∈ S. At the other extreme is the trivial relation ≈ defined by x ≈ y for all x, y ∈ S. In
this case S is the only equivalence class.
Every function f defines an equivalence relation on its domain, known as the equivalence relation associated
with f. Moreover, the equivalence classes have a simple description in terms of the inverse images of f.
Suppose that f : S → T. Define the relation ≈ on S by x ≈ y if and only if f(x) = f(y).
1. The relation ≈ is an equivalence relation on S.
2. The set of equivalence classes is S/≈ = {f⁻¹{t} : t ∈ range(f)}.
3. The function F : S/≈ → T defined by F([x]) = f(x) is well defined and is one-to-one.
Proof:
1. If x ∈ S then trivially f(x) = f(x), so x ≈ x. Hence ≈ is reflexive. If x ≈ y then f(x) = f(y) so trivially
f(y) = f(x) and hence y ≈ x. Thus ≈ is symmetric. If x ≈ y and y ≈ z then f(x) = f(y) and f(y) = f(z),
so trivially f(x) = f(z) and so x ≈ z. Hence ≈ is transitive.
2. Recall that t ∈ range(f) if and only if f(x) = t for some x ∈ S. Then by definition,
[x] = f⁻¹{t} = {y ∈ S : f(y) = t} = {y ∈ S : f(y) = f(x)}
3. From (1), [x] = [y] if and only if x ≈ y if and only if f(x) = f(y). This shows both that F is well defined,
and that F is one-to-one.
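The equivalence classes f⁻¹{t} can be computed by grouping the domain by function value. An illustrative sketch with f(x) = x²:

```python
from collections import defaultdict

def classes(S, f):
    """Partition S into the classes of x ~ y iff f(x) == f(y)."""
    blocks = defaultdict(set)
    for x in S:
        blocks[f(x)].add(x)          # [x] = f^{-1}{f(x)}
    return dict(blocks)

blocks = classes(range(-3, 4), lambda x: x * x)
assert blocks[9] == {-3, 3}
assert blocks[0] == {0}
assert len(blocks) == 4              # one class per value in range(f): 0, 1, 4, 9
```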
Suppose that we have a relation that is reflexive and transitive, but fails to be a partial order because it's not
anti-symmetric. The relation and its inverse naturally lead to an equivalence relation, and then in turn, the
original relation defines a true partial order on the equivalence classes. This is a common construction, and the
details are given in the next theorem.
Suppose that ≼ is a relation on a set S that is reflexive and transitive. Define the relation ≈ on S by x ≈ y if and
only if x ≼ y and y ≼ x.
1. ≈ is an equivalence relation on S.
2. If A and B are equivalence classes and x ≼ y for some x ∈ A and y ∈ B, then u ≼ v for all u ∈ A and
v ∈ B.
3. Define the relation ⪯ on the collection of equivalence classes 𝒮 by A ⪯ B if and only if x ≼ y for some
(and hence all) x ∈ A and y ∈ B. Then ⪯ is a partial order on 𝒮.
Proof:
1. If x ∈ S then x ≼ x since ≼ is reflexive. Hence x ≈ x, so ≈ is reflexive. Clearly ≈ is symmetric by the
symmetry of the definition. Suppose that x ≈ y and y ≈ z. Then x ≼ y, y ≼ z, z ≼ y and y ≼ x. Hence x ≼ z
and z ≼ x since ≼ is transitive. Therefore x ≈ z so ≈ is transitive.
2. Suppose that A and B are equivalence classes of ≈ and that x ≼ y for some x ∈ A and y ∈ B. If u ∈ A
and v ∈ B, then x ≈ u and y ≈ v. Therefore u ≼ x and y ≼ v. By transitivity, u ≼ v.
3. Suppose that A ∈ 𝒮. If x, y ∈ A then x ≈ y and hence x ≼ y. Therefore A ⪯ A and so ⪯ is reflexive. Next
suppose that A, B ∈ 𝒮 and that A ⪯ B and B ⪯ A. If x ∈ A and y ∈ B then x ≼ y and y ≼ x. Hence x ≈ y,
so A = B. Therefore ⪯ is antisymmetric. Finally, suppose that A, B, C ∈ 𝒮 and that A ⪯ B and B ⪯ C.
Note that B ≠ ∅, so let y ∈ B. If x ∈ A, z ∈ C then x ≼ y and y ≼ z. Hence x ≼ z and therefore A ⪯ C. So
⪯ is transitive.
A prime example of the construction in the previous theorem occurs when we have a function whose range
space is partially ordered. We can construct a partial order on the equivalence classes in the domain that are
associated with the function.
Suppose that S and T are sets and that ≼_T is a partial order on T. Suppose also that f : S → T. Define the relation
≼_S on S by x ≼_S y if and only if f(x) ≼_T f(y).
Proof:
1. If x ∈ S then f(x) ≼_T f(x) since ≼_T is reflexive, and hence x ≼_S x. Thus ≼_S is reflexive. Suppose
that x, y, z ∈ S and that x ≼_S y and y ≼_S z. Then f(x) ≼_T f(y) and f(y) ≼_T f(z). Hence f(x) ≼_T f(z)
since ≼_T is transitive. Thus ≼_S is transitive.
2. For the equivalence relation ≈ on S constructed in (8), x ≈ y if and only if x ≼_S y and y ≼_S x if and only
if f(x) ≼_T f(y) and f(y) ≼_T f(x) if and only if f(x) = f(y), since ≼_T is antisymmetric. Thus ≈ is the
equivalence relation associated with f.
3. This follows immediately from (8) and parts (a) and (b). If u, v ∈ range(f), then f⁻¹({u}) ⪯ f⁻¹({v})
if and only if u ≼_T v.
[f] = {f + c : c ∈ ℝ}
Congruence
Recall the division relation from ℕ₊ to ℤ: For d ∈ ℕ₊ and n ∈ ℤ, d ∣ n means that n = kd for some k ∈ ℤ. In
words, d divides n or equivalently n is a multiple of d. In the previous section, we showed that ∣ is a partial
order on ℕ₊.
Now fix d ∈ ℕ₊. Define the relation ≡_d on ℤ by m ≡_d n if and only if d ∣ (n − m). The relation ≡_d is known as
congruence modulo d. Also, let r_d : ℤ → {0, 1, …, d−1} be defined so that r_d(n) is the remainder when n is
divided by d. Specifically, recall that by the Euclidean division theorem, named for Euclid of course, n ∈ ℤ can
be written uniquely in the form n = kd + q where k ∈ ℤ and q ∈ {0, 1, …, d−1}, and then r_d(n) = q.
Congruence modulo d.
1. ≡_d is the equivalence relation associated with the function r_d.
2. There are d distinct equivalence classes, given by [q]_d = {q + kd : k ∈ ℤ} for q ∈ {0, 1, …, d−1}.
Proof:
1. Recall that for the equivalence relation associated with r_d, integers m and n are equivalent if and only if
r_d(m) = r_d(n). By the division theorem, m = jd + p and n = kd + q, where j, k ∈ ℤ and p, q ∈ {0, 1, …, d−1},
and these representations are unique. Thus n − m = (k − j)d + (q − p), and so m ≡_d n if and only if
[0]₄ = {0, 4, 8, 12, …} ∪ {−4, −8, −12, −16, …}
[1]₄ = {1, 5, 9, 13, …} ∪ {−3, −7, −11, −15, …}
[2]₄ = {2, 6, 10, 14, …} ∪ {−2, −6, −10, −14, …}
[3]₄ = {3, 7, 11, 15, …} ∪ {−1, −5, −9, −13, …}
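A window of each class can be generated with Python's `%` operator, which returns remainders in {0, 1, …, d−1} for a positive modulus (matching r_d above):

```python
d = 4
# Group a window of integers by remainder mod d; each block sits inside [q]_d
blocks = {q: [n for n in range(-16, 17) if n % d == q] for q in range(d)}

assert blocks[0][:3] == [-16, -12, -8]
assert 13 in blocks[1] and -3 in blocks[1]   # -3 = (-1)*4 + 1, so r_4(-3) = 1
# m and n are congruent mod d exactly when d divides n - m
assert all((n - m) % d == 0 for q in blocks for m in blocks[q] for n in blocks[q])
```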
Linear Algebra
Linear algebra provides several examples of important and interesting equivalence relations. To set the stage, let
ℝ^{m×n} denote the set of m×n matrices with real entries, for m, n ∈ ℕ₊.
If A, B ∈ ℝ^{m×n}, then A and B are said to be row equivalent if A can be transformed into B by a finite sequence of
row operations.
Row equivalence is an equivalence relation on ℝ^{m×n}.
Proof:
If A ∈ ℝ^{m×n}, then A is row equivalent to itself: we can simply do nothing, or if you prefer, we can multiply the
first row of A by 1. For symmetry, the key is that each row operation can be reversed by another row operation:
multiplying a row by c ≠ 0 is reversed by multiplying the same row of the resulting matrix by 1/c.
Interchanging two rows is reversed by interchanging the same two rows of the resulting matrix. Adding c times
row i to row j is reversed by adding −c times row i to row j in the resulting matrix. Thus, if we can transform
A into B by a finite sequence of row operations, then we can transform B into A by applying the reversed row
operations in the reverse order. Transitivity is clear: If we can transform A into B by a sequence of row
operations and B into C by another sequence of row operations, then we can transform A into C by putting the
two sequences together.
Next recall that n×n matrices A and B are said to be similar if there exists an invertible n×n matrix P such
that P⁻¹AP = B. Similarity is very important in the study of linear transformations, change of basis, and the
theory of eigenvalues and eigenvectors.
Similarity is an equivalence relation on ℝ^{n×n}.
Proof:
If A ∈ ℝ^{n×n} then A = I⁻¹AI, where I is the n×n identity matrix, so A is similar to itself. Suppose that
A, B ∈ ℝ^{n×n} and that A is similar to B so that B = P⁻¹AP for some invertible P ∈ ℝ^{n×n}. Then
A = PBP⁻¹ = (P⁻¹)⁻¹BP⁻¹ so B is similar to A. Finally, suppose that A, B, C ∈ ℝ^{n×n} and that A is similar
to B and that B is similar to C. Then B = P⁻¹AP and C = Q⁻¹BQ for some invertible P, Q ∈ ℝ^{n×n}. Then
C = Q⁻¹P⁻¹APQ = (PQ)⁻¹A(PQ), so A is similar to C.
Next recall that for A ∈ ℝ^{m×n}, the transpose of A is the matrix Aᵀ ∈ ℝ^{n×m} with the property that the (i,j) entry of
A is the (j,i) entry of Aᵀ, for i ∈ {1, 2, …, m} and j ∈ {1, 2, …, n}. Simply stated, Aᵀ is the matrix whose
rows are the columns of A. For the theorem that follows, we need to remember that (AB)ᵀ = BᵀAᵀ for
A, B ∈ ℝ^{n×n} and that A is congruent to B so that B = PᵀAP for some invertible P ∈ ℝ^{n×n}. Then A = (Pᵀ)⁻¹BP⁻¹ = (P⁻¹)ᵀBP⁻¹
to B and that B is congruent to C. Then B = PᵀAP and C = QᵀBQ for some invertible P, Q ∈ ℝ^{n×n}. Then
C = QᵀPᵀAPQ = (PQ)ᵀA(PQ), so A is congruent to C.
Number Systems
Equivalence relations play an important role in the construction of complex mathematical structures from
simpler ones. Often the objects in the new structure are equivalence classes of objects constructed from the
simpler structures, modulo an equivalence relation that captures the essential properties of the new objects.
The construction of number systems is a prime example of this general idea. The next exercise explores the
construction of rational numbers from integers.
Define a relation ≈ on ℤ × ℕ₊ by (j,k) ≈ (m,n) if and only if jn = km.
1. ≈ is an equivalence relation.
2. Define m/n = [(m,n)], the equivalence class generated by (m,n), for m ∈ ℤ and n ∈ ℕ₊. This definition
captures the essential properties of the rational numbers.
Proof:
(m,n) ∈ ℤ × ℕ₊ and (j,k) ≈ (m,n), then jn = km so trivially mk = nj, and hence (m,n) ≈ (j,k). Thus ≈
is symmetric. Finally, suppose that (j,k), (m,n), (p,q) ∈ ℤ × ℕ₊ and that (j,k) ≈ (m,n) and
(m,n) ≈ (p,q). Then jn = km and mq = np, so jnp = kmp, which implies jmq = kmp, and so jq = kp.
Hence (j,k) ≈ (p,q) so ≈ is transitive.
2. Suppose that j/k and m/n are rational numbers in the usual, informal sense, where j, m ∈ ℤ and k, n ∈ ℕ₊.
Then j/k = m/n if and only if jn = km if and only if (j,k) ≈ (m,n), so it makes sense to define m/n as the
equivalence class generated by (m,n). Addition and multiplication are defined in the usual way: if (j,k),
(m,n) ∈ ℤ × ℕ₊ then
j/k + m/n = (jn + mk)/(kn), (j/k)(m/n) = (jm)/(kn)
The definitions are consistent; that is, they do not depend on the particular representations of the
equivalence classes.
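The construction can be mimicked directly with pairs, and the consistency claim spot-checked on equivalent representatives (an illustrative sketch, not part of the text):

```python
# A pair (m, n) in Z x N+ represents the fraction m/n; two pairs are
# equivalent when they cross-multiply equal: (j,k) ~ (m,n) iff j*n == k*m.
equiv = lambda p, q: p[0] * q[1] == q[0] * p[1]

assert equiv((1, 2), (3, 6)) and not equiv((1, 2), (2, 3))

def add(p, q):
    """j/k + m/n = (jn + mk)/(kn), computed on representatives."""
    j, k = p
    m, n = q
    return (j * n + m * k, k * n)

# Consistency: equivalent representatives produce equivalent sums
assert equiv(add((1, 2), (1, 3)), add((2, 4), (3, 9)))
assert equiv(add((1, 2), (1, 3)), (5, 6))   # 1/2 + 1/3 = 5/6
```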
6. Cardinality
Basic Theory
Definitions
Suppose that S is a non-empty collection of sets. We define a relation on S by AB if and only if there
exists a one-to-one function f from A onto B. The relation is an equivalence relation on S. That is, for all
A,B,CS,
1. AA, the reflexive property
2. If AB then BA, the symmetric property
3. If AB and BC then AC, the transitive property
Proof:
1. The identity function IA on A, given by IA(x) = x for x ∈ A, maps A one-to-one onto A. Hence A ≈ A.
2. If A ≈ B then there exists a one-to-one function f from A onto B. But then f⁻¹ is a one-to-one function
from B onto A, so B ≈ A.
3. Suppose that A ≈ B and B ≈ C. Then there exists a one-to-one function f from A onto B and a one-to-one function g from B onto C. But then g ∘ f is a one-to-one function from A onto C, so A ≈ C.
A one-to-one function f from A onto B is sometimes called a bijection. Thus if A ≈ B then A and B are in one-to-one correspondence and are said to have the same cardinality. Thus, the equivalence classes under this
equivalence relation capture the notion of having the same number of elements.
Let N0 = ∅, and for k ∈ N+, let Nk = {0, 1, …, k − 1}. As always, N = {0, 1, 2, …} is the set of all natural
numbers.
Suppose that A is a set.
1. A is finite if A ≈ Nk for some k ∈ N, in which case k is the cardinality of A, and we write #(A) = k.
2. A is infinite if A is not finite.
3. A is countably infinite if A ≈ N.
4. A is countable if A is finite or countably infinite.
5. A is uncountable if A is not countable.
In part 1, think of Nk as a reference set with k elements; any other set with k elements must be equivalent to
this one. We will study the cardinality of finite sets in the next two sections on Counting Measure and
Combinatorial Structures. In this section, we will concentrate primarily on infinite sets. In part 4, a countable
set is one that can be enumerated or counted by putting the elements into one-to-one correspondence with Nk
for some k ∈ N or with all of N. An uncountable set is one that cannot be so counted. Countable sets play a
special role in probability theory, as in many other branches of mathematics. A priori, it's not clear that there are
uncountable sets, but we will soon see examples.
Preliminary Examples
If S is a set, recall that P(S) denotes the power set of S (the set of all subsets of S). If A and B are sets, then
A^B is the set of all functions from B into A. In particular, {0, 1}^S denotes the set of functions from S into
{0, 1}.
If S is a set then P(S) ≈ {0, 1}^S.
Proof:
The mapping that takes a set A ∈ P(S) into its indicator function 1A ∈ {0, 1}^S is one-to-one and onto.
Specifically, if A, B ∈ P(S) and 1A = 1B, then A = B, so the mapping is one-to-one. On the other hand, if
f ∈ {0, 1}^S then f = 1A where A = {x ∈ S : f(x) = 1}, so the mapping is onto.
Naively, the set E of even natural numbers might seem to have only half as many elements as N, while Z might seem to have twice as many elements as
N. But as the previous result shows, that point of view is incorrect: N, E, and Z all have the same cardinality
(and are countably infinite). The next example shows that there are indeed uncountable sets.
If A is a set with at least two elements then S = A^N, the set of all functions from N into A, is uncountable.
Proof:
The proof is by contradiction, and uses a nice trick known as the diagonalization method. Suppose that S is
countably infinite (it's clearly not finite), so that the elements of S can be enumerated: S = {f0, f1, f2, …}. Let a
and b denote distinct elements of A and define g: N → A by g(n) = b if fn(n) = a and g(n) = a if fn(n) ≠ a.
Note that g ≠ fn for each n ∈ N, since g and fn differ at n. Hence g ∉ {f0, f1, f2, …} = S. This contradicts the fact that S is the set of all functions from N into
A.
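The diagonalization trick can be illustrated on a finite list of finite sequences (here with A = {0, 1}, a hypothetical sample, not from the text): the constructed sequence disagrees with the n-th listed sequence at position n, so it cannot appear in the list.

```python
# Build a sequence that differs from the n-th sequence at position n.
def diagonal(seqs):
    # flip the n-th entry of the n-th sequence
    return [1 - seqs[n][n] for n in range(len(seqs))]

seqs = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
]
g = diagonal(seqs)

# g differs from each listed sequence in at least one position,
# so g is not in the list
assert all(g[n] != seqs[n][n] for n in range(len(seqs)))
assert g not in seqs
```

No matter how the list is chosen, the same construction produces a sequence outside it; that is exactly why no enumeration of A^N can be complete.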
Subsets of Infinite Sets
Surely a set must be at least as large as any of its subsets, in terms of cardinality. On the other hand, by the
example above, the set of natural numbers N, the set of even natural numbers E, and the set of integers Z all
have exactly the same cardinality, even though E ⊂ N ⊂ Z. In this subsection, we will explore some interesting
and somewhat paradoxical results that relate to subsets of infinite sets. Along the way, we will see that
countable infinity is the smallest of the infinities.
If S is an infinite set then S has a countably infinite subset.
Proof:
Select a0 ∈ S. It's possible to do this since S is infinite and therefore nonempty. Inductively, having chosen
{a0, a1, …, ak−1} ⊆ S, select ak ∈ S ∖ {a0, a1, …, ak−1}. Again, it's possible to do this since S is not finite.
Manifestly, {a0, a1, …} is a countably infinite subset of S.
A set S is infinite if and only if S is equivalent to a proper subset of S.
Proof:
If S is finite, then S is not equivalent to a proper subset by the pigeonhole principle. If S is infinite, then S has a
countably infinite subset {a0, a1, a2, …} by the previous result. Define the function f: S → S by f(an) = a2n for
n ∈ N and f(x) = x for x ∈ S ∖ {a0, a1, a2, …}. Then f maps S one-to-one onto S ∖ {a1, a3, a5, …}, a proper subset of S.
On the other hand, if there exists a function that maps a set A onto a set B, then in a sense, there is a copy of B
contained in A. Hence A should be at least as large as B.
Suppose that f: A → B is onto.
1. If A is finite then B is finite.
2. If B is infinite then A is infinite.
3. If A is countable then B is countable.
4. If B is uncountable then A is uncountable.
Proof:
For each y ∈ B, select a specific x ∈ A with f(x) = y (if you are persnickety, you may need to invoke the axiom
of choice). Let C be the set of chosen points. Then f maps C one-to-one onto B, so C ≈ B and C ⊆ A. The
results now follow from the comparison result above:
1. If A is finite then C is finite and hence B is finite.
2. If B is infinite then C is infinite and hence A is infinite.
3. If A is countable then C is countable and hence B is countable.
4. If B is uncountable then C is uncountable and hence A is uncountable.
The previous exercise also could be proved from the one before, since if there exists a function f mapping A
onto B, then there exists a function g mapping B one-to-one into A. This duality is proven in the discussion of
the axiom of choice. A simple and useful corollary of the previous two theorems is that if B is a given countably
infinite set, then a set A is countable if and only if there exists a one-to-one function f from A into B, if and
only if there exists a function g from B onto A.
If Ai is a countable set for each i in a countable index set I, then ⋃_{i∈I} Ai is countable.
Proof:
Consider the most extreme case in which the index set I is countably infinite, so that we may take I = N. Since Ai is countable, there exists
a function fi that maps N onto Ai for each i ∈ N. Let M = {2^i 3^j : (i, j) ∈ N × N}. Note that the points in M are
distinct; that is, 2^i 3^j ≠ 2^m 3^n if (i, j), (m, n) ∈ N × N and (i, j) ≠ (m, n). Hence M is infinite, and since M ⊆ N,
M is countably infinite. The function f given by f(2^i 3^j) = fi(j) maps M onto ⋃_{i∈N} Ai, and hence this last set
is countable.
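The embedding used in the proof can be illustrated numerically. The sets Ai below are a hypothetical example (multiples of i + 1), chosen only to give each fi something concrete to enumerate.

```python
# M = {2^i * 3^j} is a copy of N x N inside N, by unique factorization.
def encode(i, j):
    return 2**i * 3**j

# check distinctness of the encoding on a sample, and record the decoding
codes = {}
for i in range(10):
    for j in range(10):
        m = encode(i, j)
        assert m not in codes      # no collisions: encoding is one-to-one
        codes[m] = (i, j)

# illustrative choice: A_i = multiples of i + 1, enumerated by f_i(j) = (i+1)*j
def f(m):
    i, j = codes[m]
    return (i + 1) * j

# f maps (part of) M onto (part of) the union of the A_i
union_sample = {f(m) for m in codes}
```

The key point the code checks is the distinctness claim: 2^i 3^j determines (i, j) uniquely, so M really is a faithful copy of N × N inside N.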
If A and B are countable sets then A × B is countable.
Proof:
Note that A × B = ⋃_{a∈A} ({a} × B), a countable union of countable sets.
Both proofs work because the set M is essentially a copy of N × N, embedded inside of N. The last theorem
generalizes to the statement that a finite product of countable sets is still countable. But, from the result above
on sets of functions, a product of infinitely many sets (with at least 2 elements each) will be uncountable.
The set of rational numbers Q is countably infinite.
Proof:
The sets Z and N+ are countably infinite and hence the set Z × N+ is countably infinite. The function
(j, k) ↦ j/k maps Z × N+ onto Q, and hence Q is countable. Of course N ⊆ Q, so Q is countably infinite.
The set A of algebraic numbers (real roots of polynomials with integer coefficients) is countably infinite.
Proof:
Let C = ⋃_{n=1}^∞ Z^(n+1). Think of C as the set of coefficient sequences and note that C is countably infinite. Let P denote the set
of polynomials of degree 1 or more, with integer coefficients. The function
(a0, a1, …, an) ↦ a0 + a1 x + ⋯ + an x^n maps C onto P, and hence P is countable. For p ∈ P, let Ap denote the set of
roots of p. A polynomial of degree n in P has at most n roots, by the fundamental theorem of algebra, so in
particular Ap is finite for each p ∈ P. Finally, note that A = ⋃_{p∈P} Ap and so A is countable. Of course N ⊆ A, so
A is countably infinite.
Now let's look at some uncountable sets.
The interval [0,1) is uncountable.
Proof:
Recall that {0, 1}^N+ is the set of all functions from N+ into {0, 1}, which in this case can be thought of as the set of
infinite sequences or bit strings x = (x1, x2, …) with xn ∈ {0, 1} for each n ∈ N+. For n ∈ N+, let
Dn = {x ∈ {0, 1}^N+ : xk = 1 for all k ≥ n}, and let D = ⋃_{n=1}^∞ Dn, the set of bit strings that eventually terminate in
repeating 1s. Clearly Dn is finite for all n ∈ N+, so D is countable, and therefore S = {0, 1}^N+ ∖ D is uncountable.
In fact, S ≈ {0, 1}^N+. The function
x ↦ ∑_{n=1}^∞ xn / 2^n
maps S one-to-one onto [0, 1). In words, every number in [0, 1) has a unique binary expansion in the form of a
sequence in S. Hence [0, 1) ≈ S and in particular is uncountable. The reason for eliminating the bit strings that
terminate in repeating 1s is to ensure uniqueness, so that the mapping is one-to-one. The bit string x1 x2 … xk 0 1 1 1 …
corresponds to the same number in [0, 1) as the bit string x1 x2 … xk 1 0 0 0 ….
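The map and the collision that motivates excluding strings ending in repeating 1s can be illustrated on truncated bit strings (a numerical sketch, not part of the text):

```python
# interpret a finite bit list as the binary expansion 0.b1 b2 b3 ...
def to_real(bits):
    return sum(b / 2**(n + 1) for n, b in enumerate(bits))

# x1 x2 x3 0 1 1 1 ... and x1 x2 x3 1 0 0 0 ... agree in the limit;
# here we truncate the repeating tail at 50 bits
a = [1, 0, 0] + [1] * 50   # 0.100 0111... (truncated)
b = [1, 0, 1] + [0] * 50   # 0.101 0000...

# the two truncations differ only by the missing tail, 2^(-53)
assert abs(to_real(a) - to_real(b)) < 1e-15
```

This is why the mapping is restricted to S: with both representations allowed, it would fail to be one-to-one.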
The following sets have the same cardinality, and in particular all are uncountable:
1. R, the set of real numbers.
2. Any interval I of R, as long as the interval is not empty or a single point.
3. R ∖ Q, the set of irrational numbers.
4. R ∖ A, the set of transcendental numbers.
5. P(N), the power set of N.
Proof:
1. The mapping x ↦ (2x − 1)/[x(1 − x)] maps (0, 1) one-to-one onto R, so (0, 1) ≈ R. But (0, 1) = [0, 1) ∖ {0}, so
(0, 1) ≈ [0, 1) by the result above on subsets of infinite sets. Hence [0, 1) ≈ (0, 1) ≈ R, and all of these sets are uncountable by the previous result.
2. Suppose a, b ∈ R and a < b. The mapping x ↦ a + (b − a)x maps (0, 1) one-to-one onto (a, b), and hence (0, 1) ≈ (a, b).
A natural question is whether, given a one-to-one function from A into B and a one-to-one function from B into A, there must exist a one-to-one function from
A onto B. Fortunately, the answer is yes; the result is known as the Schröder-Bernstein theorem, named for
Ernst Schröder and Felix Bernstein.
The proof uses the fixed point theorem for partially ordered sets: the collection P(A), partially ordered by set inclusion, has the property that every subcollection of
P(A) has a supremum (namely the union of the subcollection). Suppose that f maps A one-to-one into B and
g maps B one-to-one into A. Define the function h: P(A) → P(A) by h(U) = A ∖ g[B ∖ f(U)] for U ⊆ A. Then
h is increasing:
U ⊆ V ⟹ f(U) ⊆ f(V) ⟹ B ∖ f(V) ⊆ B ∖ f(U) ⟹ g[B ∖ f(V)] ⊆ g[B ∖ f(U)] ⟹ A ∖ g[B ∖ f(U)] ⊆ A ∖ g[B ∖ f(V)]
From the fixed point theorem for partially ordered sets, there exists U ⊆ A such that h(U) = U. Hence
A ∖ U = g[B ∖ f(U)]. Now define F: A → B by F(x) = f(x) if x ∈ U and F(x) = g⁻¹(x) if x ∈ A ∖ U. Then F maps A one-to-one onto B.
The continuum hypothesis states that there is no set whose cardinality is strictly between that of N and that of
R. The continuum hypothesis actually started out as the continuum conjecture, until it was shown to be
consistent with the usual axioms of set theory (by Kurt Gödel in 1940) and independent of those
axioms (by Paul Cohen in 1963).
If S is uncountable, then there exists A ⊆ S such that A and A^c are uncountable.
Proof:
There is an easy proof, assuming the continuum hypothesis. Under this hypothesis, if S is uncountable then
there exists a one-to-one function f: [0, 1) → S. Let A = f[0, 1/2). Then A is uncountable, and
since f[1/2, 1) ⊆ A^c, A^c is uncountable.
There is a more complicated proof just using the axiom of choice.
7. Counting Measure
Basic Theory
Suppose that S is a finite set. For AS, the cardinality of A is the number of elements in A, and is denoted
#(A). The function # on P(S) is called counting measure. Counting measure plays a fundamental role in
discrete probability structures, and particularly those that involve sampling from a finite set. The set S is
typically very large, hence efficient counting methods are essential. The first combinatorial problem is attributed
to the Greek mathematician Xenocrates.
In many cases, a set of objects can be counted by establishing a one-to-one correspondence between the given
set and some other set. Naturally, the two sets have the same number of elements, but for some reason, the
second set may be easier to count.
The Addition Rule
The addition rule of combinatorics is simply the additivity axiom of counting measure.
If {A1, A2, …, An} is a collection of pairwise disjoint subsets of S then
#(⋃_{i=1}^n Ai) = ∑_{i=1}^n #(Ai)
Inequalities
This subsection gives two inequalities that are useful for obtaining bounds on the number of elements in a set.
The first is Boole's inequality (named after George Boole) which gives an upper bound on the cardinality of a
union.
If {A1, A2, …, An} is a finite collection of subsets of S then
#(⋃_{i=1}^n Ai) ≤ ∑_{i=1}^n #(Ai)
Proof:
Let B1 = A1 and Bi = Ai ∖ (A1 ∪ ⋯ ∪ Ai−1) for i ∈ {2, 3, …, n}. Note that {B1, B2, …, Bn} is a pairwise disjoint
collection and has the same union as {A1, A2, …, An}. From the increasing property, #(Bi) ≤ #(Ai) for
each i ∈ {1, 2, …, n}. Hence by the addition rule,
#(⋃_{i=1}^n Ai) = #(⋃_{i=1}^n Bi) = ∑_{i=1}^n #(Bi) ≤ ∑_{i=1}^n #(Ai)
Intuitively, Boole's inequality holds because parts of the union have been counted more than once in the
expression on the right. The second inequality is Bonferroni's inequality (named after Carlo Bonferroni), which
gives a lower bound on the cardinality of an intersection.
If {A1, A2, …, An} is a finite collection of subsets of S then
#(⋂_{i=1}^n Ai) ≥ #(S) − ∑_{i=1}^n [#(S) − #(Ai)]
Proof:
Using the complement rule, Boole's inequality, and De Morgan's law,
#(⋂_{i=1}^n Ai) = #(S) − #(⋃_{i=1}^n Ai^c) ≥ #(S) − ∑_{i=1}^n #(Ai^c) = #(S) − ∑_{i=1}^n [#(S) − #(Ai)]
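Both inequalities can be spot-checked on random subsets of a small universal set (an illustration, not a proof):

```python
import random

random.seed(0)
S = set(range(20))
for _ in range(100):
    # four random subsets of S
    sets = [set(random.sample(sorted(S), random.randint(0, 20)))
            for _ in range(4)]
    union = set().union(*sets)
    inter = S.intersection(*sets)
    # Boole: #(union) <= sum of the cardinalities
    assert len(union) <= sum(len(A) for A in sets)
    # Bonferroni: #(intersection) >= #(S) - sum of complement cardinalities
    assert len(inter) >= len(S) - sum(len(S) - len(A) for A in sets)
```

Note that Bonferroni's bound is only informative when the sets are large (the right side is often negative otherwise), which matches its usual use for intersections of high-probability events.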
The Inclusion-Exclusion Formula
The inclusion-exclusion formula gives the cardinality of a union of sets in terms of the cardinality of the various
intersections of the sets. The formula is useful because intersections are often easier to count. We start with the
special cases of two sets and three sets. As usual, we assume that the sets are subsets of a finite universal set S.
If A and B are subsets of S then #(A ∪ B) = #(A) + #(B) − #(A ∩ B).
Proof:
Note first that A ∪ B = A ∪ (B ∖ A) and the latter two sets are disjoint. From the addition rule and the difference
rule,
#(A ∪ B) = #(A) + #(B ∖ A) = #(A) + #(B) − #(A ∩ B)
If A, B, and C are subsets of S then #(A ∪ B ∪ C) = #(A) + #(B) + #(C) − #(A ∩ B) − #(A ∩ C) − #(B ∩ C) + #(A ∩ B ∩ C).
Proof:
Note that A ∪ B ∪ C = (A ∪ B) ∪ [C ∖ (A ∪ B)] and that A ∪ B and C ∖ (A ∪ B) are disjoint. Using the addition
rule and the difference rule,
#(A ∪ B ∪ C) = #(A ∪ B) + #[C ∖ (A ∪ B)] = #(A ∪ B) + #(C) − #[C ∩ (A ∪ B)] = #(A ∪ B) + #(C)
− #[(A ∩ C) ∪ (B ∩ C)]
Now using the inclusion-exclusion rule for two sets (twice) we have
#(A ∪ B ∪ C) = #(A) + #(B) − #(A ∩ B) + #(C) − #(A ∩ C) − #(B ∩ C) + #(A ∩ B ∩ C)
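The three-set formula can also be verified exhaustively over all subsets of a small universal set (illustration only):

```python
from itertools import combinations, product

S = [0, 1, 2, 3]
# all 16 subsets of S
subsets = [set(c) for r in range(len(S) + 1) for c in combinations(S, r)]

# check inclusion-exclusion for every choice of (A, B, C)
for A, B, C in product(subsets, repeat=3):
    lhs = len(A | B | C)
    rhs = (len(A) + len(B) + len(C)
           - len(A & B) - len(A & C) - len(B & C)
           + len(A & B & C))
    assert lhs == rhs
```

Since the formula only involves the pattern of memberships, checking it on a 4-element universal set exercises all the ways an element can belong to A, B, and C.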
The general inclusion-exclusion formula: if {Ai : i ∈ I} is a collection of n subsets of S, then
#(⋃_{i∈I} Ai) = ∑_{k=1}^n (−1)^(k−1) ∑_{J⊆I, #(J)=k} #(⋂_{j∈J} Aj)
Proof:
The proof is by induction on n. The formula holds for n = 2 sets and for n = 3 sets by the two results above. Suppose
the formula holds for a given n ∈ N+, and suppose that {A1, A2, …, An, An+1} is a collection of n + 1 subsets of S.
Then
⋃_{i=1}^{n+1} Ai = (⋃_{i=1}^n Ai) ∪ [An+1 ∖ (⋃_{i=1}^n Ai)]
and the two sets connected by the central union are disjoint. Using the addition rule and the difference rule,
#(⋃_{i=1}^{n+1} Ai) = #(⋃_{i=1}^n Ai) + #(An+1) − #[An+1 ∩ (⋃_{i=1}^n Ai)] = #(⋃_{i=1}^n Ai)
+ #(An+1) − #[⋃_{i=1}^n (Ai ∩ An+1)]
By the induction hypothesis, the formula holds for the two unions of n sets in the last expression. The result
then follows by simplification.
The general Bonferroni inequalities, named again for Carlo Bonferroni, state that if the sum on the right is truncated
after k terms (k < n), then the truncated sum is an upper bound for the cardinality of the union if k is odd (so
that the last term has a positive sign) and is a lower bound for the cardinality of the union if k is even (so that
the last term has a negative sign).
The Multiplication Rule
The multiplication rule of combinatorics is based on the formulation of a procedure (or algorithm) that
generates the objects to be counted. Specifically, suppose that the procedure consists of k steps, performed
sequentially, and that for each j{1,2,,k}, step j can be performed in nj ways regardless of the choices
made on the previous steps. Then the number of ways to perform the entire algorithm (and hence the number of
objects) is n1n2nk.
The key to a successful application of the multiplication rule to a counting problem is the clear formulation of
an algorithm that generates the objects being counted, so that each object is generated once and only once. That
is, we must neither over-count nor under-count. It's also important to notice that the set of choices available at
step j may well depend on the previous steps; the assumption is only that the number of choices available does
not depend on the previous steps.
The first two results below give equivalent formulations of the multiplication principle.
Suppose that S is a set of sequences of length k, and that we denote a generic element of S by (x1,x2,,xk).
Suppose that for each j{1,2,,k}, xj has nj different values, regardless of the values of the previous
coordinates. Then #(S)=n1n2nk.
Proof:
A procedure that generates the sequences in S consists of k steps. Step j is to select the jth coordinate.
Suppose that T is an ordered tree with depth k and that each vertex at level i − 1 has ni children for i ∈ {1, 2, …, k}. Then the number of endpoints of T is n1 n2 ⋯ nk.
Proof:
Each endpoint of the tree is uniquely associated with the path from the root vertex to the endpoint. Each such
path is a sequence of length k, in which there are nj values for coordinate j for each j{1,2,,k}. Hence the
result follows from the previous result on sequences.
Product Sets
If Si is a set with ni elements for i ∈ {1, 2, …, k} then
#(S1 × S2 × ⋯ × Sk) = n1 n2 ⋯ nk
Proof:
This is a corollary of the result above on sequences.
If S is a set with n elements, then S^k has n^k elements.
Proof:
This is a corollary of the previous result on product sets.
In the previous result, note that the elements of S^k can be thought of as ordered samples of size k that can be
chosen with replacement from a population of n objects. Elements of {0, 1}^n are sometimes called bit strings of
length n. Thus, there are 2^n bit strings of length n.
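These product-set counts can be checked directly by enumeration (a sketch in Python, not part of the text):

```python
from itertools import product

# bit strings of length 5: {0, 1}^5 has 2^5 elements
bits = list(product([0, 1], repeat=5))
assert len(bits) == 2**5

# more generally, #(S1 x S2 x S3) = n1 * n2 * n3
S1, S2, S3 = 'ab', 'xyz', range(4)
assert len(list(product(S1, S2, S3))) == 2 * 3 * 4
```

The enumeration mirrors the multiplication rule exactly: `product` generates one tuple per way of choosing a coordinate at each step.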
Functions
The number of functions from a set A of m elements into a set B of n elements is n^m.
Proof:
An algorithm for constructing a function f: A → B is to choose the value of f(x) ∈ B for each x ∈ A. There are
n choices for each of the m elements of A, so there are n^m functions.
Note that if S is a set with n elements, then the elements of the Cartesian power set S^k can be thought of as functions from {1, 2, …, k}
into S. Thus, the result above on Cartesian powers can be thought of as a corollary of the previous result on
functions.
Subsets
If S is a set with n elements then there are 2^n subsets of S.
Computational Exercises
A license number consists of two letters (uppercase) followed by five digits. How many different license
numbers are there?
Answer:
67600000
Suppose that a Personal Identification Number (PIN) is a four-symbol code word in which each entry is either a
letter (uppercase) or a digit. How many PINs are there?
Answer:
1679616
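Both answers follow from the multiplication rule; a quick check in Python (illustration only):

```python
letters, digits = 26, 10

# license numbers: two uppercase letters followed by five digits
assert letters**2 * digits**5 == 67600000

# PINs: four symbols, each one of 26 letters or 10 digits (36 choices each)
assert (letters + digits)**4 == 1679616
```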
In the board game Clue, Mr. Boddy has been murdered. There are 6 suspects, 6 possible weapons, and 9
possible rooms for the murder.
1. The game includes a card for each suspect, each weapon, and each room. How many cards are there?
2. The outcome of the game is a sequence consisting of a suspect, a weapon, and a room (for example,
Colonel Mustard with the knife in the billiard room). How many outcomes are there?
3. Once the three cards that constitute the outcome have been randomly chosen, the remaining cards are
dealt to the players. Suppose that you are dealt 5 cards. In trying to guess the outcome, what hand of
cards would be best?
Answer:
1. 21 cards
2. 324 outcomes
3. The best hand would be the 5 remaining weapons or the 5 remaining suspects.
An experiment consists of rolling a fair die, drawing a card from a standard deck, and tossing a fair coin. How
many outcomes are there?
Answer:
624
A fair die is rolled 5 times and the sequence of scores recorded. How many outcomes are there?
Answer:
7776
Suppose that 10 persons are chosen and their birthdays recorded. How many possible outcomes are there?
Answer:
41969002243198805166015625
In the card game Set, each card has 4 properties: number (one, two, or three), shape (diamond, oval, or
squiggle), color (red, blue, or green), and shading (solid, open, or striped). The deck has one card of each
(number, shape, color, shading) configuration. A set in the game is defined as a set of three cards for which, for
each property, the cards are either all the same or all different.
1. How many cards are in the deck?
2. How many sets are there?
Answer:
1. 81
2. 1080
A coin is tossed 10 times and the sequence of scores recorded. How many sequences are there?
Answer:
1024
A string of lights has 20 bulbs, each of which may be good or defective. How many configurations are there?
Answer:
1048576
The die-coin experiment consists of rolling a die and then tossing a coin the number of times shown on the die.
The sequence of coin results is recorded.
1. How many outcomes are there?
2. How many outcomes are there with all heads?
3. How many outcomes are there with exactly one head?
Answer:
1. 126
2. 6
3. 21
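The three counts can be verified by enumerating the outcomes directly (a sketch, recording heads as 1 and tails as 0):

```python
from itertools import product

# for die score n, the coin sequences are the 2^n tuples in {0, 1}^n
outcomes = [(n, seq) for n in range(1, 7)
            for seq in product([0, 1], repeat=n)]

assert len(outcomes) == 126                              # all outcomes
assert sum(1 for n, s in outcomes if all(s)) == 6        # all heads
assert sum(1 for n, s in outcomes if sum(s) == 1) == 21  # exactly one head
```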
Run the die-coin experiment 100 times and observe the outcomes.
Suppose that a sandwich at a restaurant consists of bread, meat, cheese, and various toppings. There are 4
different types of bread (select one), 3 different types of meat (select one), 5 different types of cheese (select
one), and 10 different toppings (select any). How many sandwiches are there?
Answer:
61440
Consider a deck of cards as a set D with 52 elements.
1. How many subsets of D are there?
2. How many functions are there from D into the set {1,2,3,4}?
Answer:
1. 4503599627370496
2. 20282409603651670423947251286016
The Galton Board
The Galton Board, named after Francis Galton, is a triangular array of pegs. Galton, apparently too modest to
name the device after himself, called it a quincunx from the Latin word for five twelfths (go figure). The rows
are numbered, from the top down, by (0, 1, …). Row n has n + 1 pegs that are labeled, from left to right, by
(0, 1, …, n). Thus, a peg can be uniquely identified by an ordered pair (n, k) where n is the row number and k
is the peg number in that row.
A ball is dropped onto the top peg (0,0) of the Galton board. In general, when the ball hits peg (n,k), it either
bounces to the left to peg (n+1,k) or to the right to peg (n+1,k+1). The sequence of pegs that the ball hits is
a path in the Galton board.
There is a one-to-one correspondence between each pair of the following three collections:
1. Bit strings of length n
2. Paths in the Galton board from (0,0) to any peg in row n.
3. Subsets of a set with n elements.
Thus, each of these collections has 2^n elements.
Open the Galton board applet.
1. Move the ball from (0,0) to (10,6) along a path of your choice. Note the corresponding bit string and
subset.
2. Generate the bit string 0111001010. Note the corresponding subset and path.
3. Generate the subset {2,4,5,9,10}. Note the corresponding bit string and path.
4. Generate all paths from (0,0) to (4,2). How many paths are there?
Answer:
4. 6