
Applied Combinatorics Math 6409

S. E. Payne
Student Version - Fall 2003
Contents

0.1 Notation
0.2 Introduction

1 Basic Counting Techniques
1.1 Sets and Functions: The Twelvefold Way
1.2 Composition of Positive Integers
1.3 Multisets
1.4 Multinomial Coefficients
1.5 Permutations
1.6 Partitions of Integers
1.7 Set Partitions
1.8 Table Entries in the Twelvefold Way
1.9 Recapitulation
1.10 Cayley's Theorem: The Number of Labeled Trees
1.11 The Matrix-Tree Theorem
1.12 Number Theoretic Functions
1.13 Inclusion-Exclusion
1.14 Rook Polynomials
1.15 Permutations With Forbidden Positions
1.16 Recurrence Relations: Ménage Numbers Again
1.17 Solutions and/or Hints to Selected Exercises

2 Systems of Representatives and Matroids
2.1 The Theorem of Philip Hall
2.2 An Algorithm for SDRs
2.3 Theorems of König and G. Birkhoff
2.4 The Theorem of Marshall Hall, Jr.
2.5 Matroids and the Greedy Algorithm
2.6 Solutions and/or Hints to Selected Exercises

3 Pólya Theory
3.1 Group Actions
3.2 Applications
3.3 The Cycle Index: Pólya's Theorem
3.4 Sylow Theory Via Group Actions
3.5 Patterns and Weights
3.6 The Symmetric Group
3.7 Counting Graphs
3.8 Solutions and/or Hints to Selected Exercises

4 Formal Power Series as Generating Functions
4.1 Using Power Series to Count Objects
4.2 A Famous Example: Stirling Numbers of the 2nd Kind
4.3 Ordinary Generating Functions
4.4 Formal Power Series
4.5 Composition of Power Series
4.6 The Formal Derivative and Integral
4.7 Log, Exp and Binomial Power Series
4.8 Exponential Generating Functions
4.9 Famous Example: Bernoulli Numbers
4.10 Famous Example: Fibonacci Numbers
4.11 Roots of a Power Series
4.12 Laurent Series and Lagrange Inversion
4.13 EGF: A Second Look
4.14 Dirichlet Series - The Formal Theory
4.15 Rational Generating Functions
4.16 More Practice with Generating Functions
4.17 The Transfer Matrix Method
4.18 A Famous NONLINEAR Recurrence
4.19 MacMahon's Master Theorem
4.19.1 Preliminary Results on Determinants
4.19.3 Permutation Digraphs
4.19.4 A Class of General Digraphs
4.19.5 MacMahon's Master Theorem for Permutations
4.19.8 Dixon's Identity as an Application of the Master Theorem
4.20 Solutions and/or Hints to Selected Exercises
4.21 Addendum on Exercise 4.19.9
4.21.1 Symmetric Polynomials
4.21.7 A Special Determinant
4.21.9 Application of the Master Theorem to the Matrix B
4.21.10 Sums of Cubes of Binomial Coefficients

5 Möbius Inversion on Posets
5.1 Introduction
5.2 POSETS
5.3 Vector Spaces and Algebras
5.4 The Incidence Algebra I(P, K)
5.5 Optional Section on ...
5.6 The Action of I(P, K) and Möbius Inversion
5.7 Evaluating µ: the Product Theorem
5.8 More Applications of Möbius Inversion
5.9 Lattices and Gaussian Coefficients
5.10 Posets with Finite Order Ideals
5.11 Solutions and/or Hints to Selected Exercises
0.1 Notation

Throughout these notes the following notation will be used.

C = the set of complex numbers
N = the set of nonnegative integers
P = the set of positive integers
Q = the set of rational numbers
R = the set of real numbers
Z = the set of integers
N = {a_1, ..., a_n} = typical set with n elements
[n] = {1, 2, ..., n}; [0] = ∅
[i, j] = {i, i+1, ..., j}, if i ≤ j
⌊x⌋ = the floor of x (i.e., the largest integer not larger than x)
⌈x⌉ = the ceiling of x (i.e., the smallest integer not smaller than x)
P([n]) = {A : A ⊆ [n]}
P(S) = {A : A ⊆ S} (for any set S)
|A| = the number of elements of A (also denoted #A)
\binom{N}{k} = {A : A ⊆ N and |A| = k} = the set of k-subsets of N
\binom{n}{k} = #\binom{N}{k} = the number of k-subsets of N (0 ≤ k ≤ n)
((S, k)) = the set of all k-multisets on S
((n, k)) = the number of k-multisets of an n-set
\binom{n}{a_1, ..., a_m} = the number of ways of putting each element of an n-set into one of m categories C_1, ..., C_m, with a_i objects in C_i, Σ a_i = n
(j)_q = 1 + q + q^2 + ··· + q^{j−1}
(n)!_q = (1)_q (2)_q ··· (n)_q   ("n-q-torial")
\binom{n}{k}_q = (n)!_q / ((k)!_q (n−k)!_q)   the Gaussian q-binomial coefficient
S_n = the symmetric group on [n]
c(n, k) = #{σ ∈ S_n : σ has k cycles} = the signless Stirling number of the first kind
s(n, k) = (−1)^{n−k} c(n, k)   (the Stirling number of the first kind)
S(n, k) = the number of partitions of an n-set into k nonempty subsets (blocks) = the Stirling number of the second kind
B(n) = the total number of partitions of an n-set = the Bell number
n^{\underline{k}} = (n)_k = n(n−1)···(n−k+1)   ("n to the k falling")
n^{\overline{k}} = n(n+1)···(n+k−1)   ("n to the k rising")
0.2 Introduction
The course at CU-Denver for which these notes were assembled, Math 6409
(Applied Combinatorics), deals more or less entirely with enumerative
combinatorics. Other courses deal with combinatorial structures such as Latin
squares, designs of many types, finite geometries, etc. This course is a
one semester course, but as it has been taught different ways in different
semesters, the notes have grown to contain more than we are now able to
cover in one semester. On the other hand, these notes contain considerably
less material than the standard textbooks listed below. It is always difficult
to decide what to leave out, and the choices clearly are a reflection of the
likes and dislikes of the author. We have tried to include some truly
traditional material and some truly nontrivial material, albeit with a
treatment that makes it accessible to the student.
Since the greater part of this course is, ultimately, devoted to developing
ever more sophisticated methods of counting, we begin with a brief discussion
of what it means to count something. As a first example, for n ∈ N, put
f(n) = |P([n])|. Then no one will argue that the formula f(n) = 2^n is
anything but nice. As a second example, let d(n) be the number of derangements
of (1, ..., n). Then (as we show at least twice later on)

    d(n) = n! Σ_{i=0}^{n} (−1)^i / i!.

This is not so nice an answer as the first one, but there are very clear proofs.
Also, d(n) is the nearest integer to n!/e. This is a convenient answer, but it
lacks combinatorial significance. Finally, let f(n) be the number of n × n
matrices of 0's and 1's such that each row and column has three 1's. It has been
shown that

    f(n) = 6^{−n} Σ (−1)^β (n!)^2 (β + 3γ)! 2^α 3^β / (α! β! (γ!)^2 6^γ),

where the sum is over all α, β, γ ∈ N for which α + β + γ = n. As far as we
know, this formula is not good for much of anything, but it is a very specific
answer that can be evaluated by computer for relatively small n.
As a different kind of example, suppose we want the Fibonacci numbers
F_0, F_1, F_2, ..., and what we know about them is that they satisfy the
recurrence relation

    F_{n+1} = F_n + F_{n−1}   (n ≥ 1; F_0 = F_1 = 1).

The sequence begins with 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, .... There are
exact, not very complicated formulas for F_n, as we shall see later. But just to
introduce the idea of a generating function, here is how a
generatingfunctionologist might answer the question: the nth Fibonacci number
F_n is the coefficient of x^n in the expansion of the function 1/(1 − x − x^2)
as a power series about the origin. (See the book generatingfunctionology by
H. S. Wilf.) Later we shall investigate this problem a great deal more.
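The generating-function description is easy to check numerically. Below is a minimal sketch (Python, not part of the original notes): `series_coeffs` is a hypothetical helper of ours that expands a rational function num(x)/den(x) by formal long division, and its output is compared against the recurrence, using the convention F_0 = F_1 = 1 from these notes.

```python
def fibonacci(n):
    """F_0 = F_1 = 1 and F_{n+1} = F_n + F_{n-1}, as in these notes."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def series_coeffs(num, den, terms):
    """First `terms` power-series coefficients of num(x)/den(x) about the
    origin, assuming den has nonzero constant term.  num and den are
    coefficient lists in increasing powers of x."""
    coeffs = []
    num = num + [0] * terms
    for k in range(terms):
        c = num[k] / den[0]
        coeffs.append(c)
        # subtract c * x^k * den(x) from the running numerator
        for i, d in enumerate(den):
            if k + i < len(num):
                num[k + i] -= c * d
    return coeffs

# F_n should be the coefficient of x^n in 1/(1 - x - x^2)
coeffs = series_coeffs([1], [1, -1, -1], 10)
print([fibonacci(n) for n in range(10)])
print([round(c) for c in coeffs])
```

Both lines print the same list, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, which is the claimed agreement between the recurrence and the series expansion.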
We shall derive a variety of techniques for counting, some purely
combinatorial, some involving algebra in a moderately sophisticated way. But
many of the famous problems of combinatorial theory were solved before the
sophisticated theory was developed. And often the simplest ways to count some
type of object offer the greatest insight into the solution. So before we
develop the elaborate structure theory that mechanizes some of the counting
problems, we give some specific examples that involve only rather elementary
ideas. The first several sections are short, with attention to specific
problems rather than to erecting large theories. Then later we develop more
elaborate theories that begin to show some of the sophistication of modern
combinatorics.

Many common topics from combinatorics have received very little treatment in
these notes. Other topics have received no mention at all! We propose that the
reader consult at least the following well known textbooks for additional
material.

P. J. Cameron, Combinatorics, Cambridge University Press, 1994.

I. P. Goulden and D. M. Jackson, Combinatorial Enumeration,
Wiley-Interscience, 1983.

R. Graham, D. Knuth and O. Patashnik, Concrete Mathematics, Addison-Wesley
Pub. Co., 1991.

R. P. Stanley, Enumerative Combinatorics, Vol. I, Wadsworth and Brooks/Cole,
1986.

J. H. van Lint and R. M. Wilson, A Course in Combinatorics, Cambridge
University Press, 1992.

H. S. Wilf, generatingfunctionology, Academic Press, 1990.

In addition there is the monumental Handbook of Combinatorics, edited by
R. L. Graham, M. Grötschel and L. Lovász, published in 1995 by the MIT Press
in the USA and by North-Holland outside the USA. This is a two volume set with
over 2000 pages of articles contributed by many experts. It is a very
sophisticated compendium of combinatorial mathematics that
Chapter 1
Basic Counting Techniques
1.1 Sets and Functions: The Twelvefold Way
Let N, X be finite sets with #N = n, #X = x. Put X^N = {f : N → X}.
We want to compute #(X^N) subject to three types of restrictions on f and
four types of restrictions on when two functions are considered the same.

Restrictions on f:

(i) f is arbitrary

(ii) f is injective

(iii) f is surjective

Consider N to be a set of balls, X to be a set of boxes and f : N → X
a way to put balls into boxes. The balls and the boxes may be labeled or
unlabeled. We illustrate the various possibilities with the following
examples. N = {1, 2, 3}, X = {a, b, c, d}.

    f = (1 ↦ a, 2 ↦ a, 3 ↦ b);  g = (1 ↦ a, 2 ↦ b, 3 ↦ a);
    h = (1 ↦ b, 2 ↦ b, 3 ↦ d);  i = (1 ↦ c, 2 ↦ b, 3 ↦ b).
Case 1. Both the balls and the boxes are labeled (or distinguishable). Then
f, g, h, i are four distinct placements: for example f puts balls 1 and 2 in
box a and ball 3 in box b, while g puts balls 1 and 3 in box a and ball 2 in
box b.

Case 2. Balls unlabeled; boxes labeled. Only the number of balls in each box
matters, so f ≡ g (two balls in box a, one in box b), while h (two in b, one
in d) and i (two in b, one in c) are distinct.

Case 3. Balls labeled; boxes unlabeled. Only the grouping of the balls
matters, so f ≡ h (balls 1 and 2 together, ball 3 alone), while g (balls 1
and 3 together) and i (balls 2 and 3 together) are distinct.

Case 4. Both balls and boxes unlabeled. All that matters is that two balls
share a box and one ball is alone, so f ≡ g ≡ h ≡ i.
For the four different possibilities arising according as N and X are each
labeled or unlabeled there are different definitions describing when two
functions from N to X are equivalent.

Definition: Two functions f, g : N → X are equivalent

1. with N unlabeled provided there is a bijection π : N → N such that
f(π(a)) = g(a) for all a ∈ N. (In words: provided some relabeling of the
elements of N turns f into g.)

2. with X unlabeled provided there is a bijection σ : X → X such that
σ(f(a)) = g(a) for all a ∈ N. (In words: provided some relabeling of the
elements of X turns f into g.)

3. with both N and X unlabeled provided there are bijections π : N → N
and σ : X → X with σ(f(π(a))) = g(a) for all a ∈ N, i.e., the composite

    N --π--> N --f--> X --σ--> X

equals g : N → X, so the corresponding diagram commutes.
Obs. 1. These three notions of equivalence determine equivalence relations
on X^N. So the number of different functions with respect to one of these
equivalences is the number of different equivalence classes.

Obs. 2. If f and g are equivalent in any of the above ways, then f is
injective (resp., surjective) iff g is injective (resp., surjective). So we
say the notions of injectivity and surjectivity are compatible with the
equivalence relation. By the number of inequivalent injective functions
f : N → X we mean the number of equivalence classes all of whose elements are
injective. Similarly for surjectivity.

As we develop notations, methods and results, the reader should fill in the
blanks in the following table with the number of functions of the indicated
type. Of course, sometimes the formula will be elegant and simple. Other
times it may be rather implicit, e.g., the coefficient of some term of a given
power series.
The Twelvefold Way

Value of #{f : N → X} if |N| = n and |X| = x

N          X          f unrestricted   f injective   f surjective
Labeled    Labeled          1               2              3
Unlabeled  Labeled          4               5              6
Labeled    Unlabeled        7               8              9
Unlabeled  Unlabeled       10              11             12
Let N = {a_1, ..., a_n}, X = {0, 1}. Let P(N) be the set of all subsets of
N. For A ⊆ N, define f_A : N → X by

    f_A(a_i) = 1 if a_i ∈ A;  f_A(a_i) = 0 if a_i ∉ A.

Then F : P(N) → X^N : A ↦ f_A is a bijection, so

    |P(N)| = #(X^N) = 2^n.    (1.1)

Exercise: 1.1.1 Generalize the result of Eq. 1.1 to provide an answer for
the first blank in the twelvefold way.
We define \binom{N}{k} to be the set of all k-subsets of N, and put
\binom{n}{k} = #\binom{N}{k}. Let N(n, k) be the number of ways to choose a
k-subset T of N and then linearly order the elements of T. Clearly
N(n, k) = \binom{n}{k} k!. On the other hand, we could choose any element of
N to be the first element of T in n ways, then choose the second element in
n − 1 ways, ..., and finally choose the kth element in n − k + 1 ways. So
N(n, k) = \binom{n}{k} k! = n(n−1)···(n−k+1) := n^{\underline{k}}, where this
last expression is read as "n to the k falling". (Note: similarly,
n^{\overline{k}} = n(n+1)···(n+k−1) is read as "n to the k rising".) This
proves the following:

    \binom{n}{k} = n(n−1)···(n−k+1) / k! = n^{\underline{k}} / k!    (1.2)

    (= (n)_k / k! according to some authors).
Exercise: 1.1.2 Prove: \binom{n}{k−1} + \binom{n}{k} = \binom{n+1}{k}.
Exercise: 1.1.3 Binomial Expansion. If xy = yx and n ∈ N, show that

    (x + y)^n = Σ_{i=0}^{n} \binom{n}{i} x^i y^{n−i}.    (1.3)

Note: \binom{n}{k} := n^{\underline{k}} / k! makes sense for k ∈ N and
n ∈ C. What if n ∈ Z, k ∈ Z and k < 0 or k > n? The best thing to do is to
define the value of the binomial coefficient to be zero in these cases. Then
we may write the following:

    (1 + x)^n = Σ_{k=0}^{n} \binom{n}{k} x^k = Σ_k \binom{n}{k} x^{n−k},    (1.4)

where in the second summation the index may be allowed to run over all
integers, since the coefficient on x^{n−k} is nonzero for at most a finite
number of values of k.
Put x = −1 in Eq. 1.4 to obtain

    Σ_k (−1)^k \binom{n}{k} = 0.    (1.5)

Put x = 1 in Eq. 1.4 to obtain

    Σ_k \binom{n}{k} = 2^n.    (1.6)

Differentiate Eq. 1.4 with respect to x (so n(1 + x)^{n−1} =
Σ_k k \binom{n}{k} x^{k−1}) and put x = 1 to obtain

    Σ_k k \binom{n}{k} = n · 2^{n−1}.    (1.7)
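Identities 1.5-1.7 make convenient sanity checks for any binomial-coefficient code. A quick numerical verification (Python, using the standard-library `math.comb`; this sketch is ours, not part of the original notes):

```python
from math import comb

# check the three specializations of (1 + x)^n for a range of n
for n in range(1, 12):
    assert sum((-1) ** k * comb(n, k) for k in range(n + 1)) == 0          # Eq. 1.5
    assert sum(comb(n, k) for k in range(n + 1)) == 2 ** n                 # Eq. 1.6
    assert sum(k * comb(n, k) for k in range(n + 1)) == n * 2 ** (n - 1)   # Eq. 1.7
print("identities 1.5-1.7 hold for n = 1, ..., 11")
```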
Exercise: 1.1.4 Prove that Σ_{j=m}^{r} \binom{j}{m} = \binom{r+1}{m+1}.
(Hint: a straightforward induction argument works easily. For a more amusing
approach try the following: Σ_{i=0}^{n−1} (1+y)^i = ((1+y)^n − 1)/((1+y) − 1)
= Σ_{j=1}^{n} \binom{n}{j} y^{j−1}. Now compute the coefficient of y^m.)
Exercise: 1.1.5 Prove that for all a, b, n ∈ N the following holds:

    Σ_i \binom{a}{i} \binom{b}{n−i} = \binom{a+b}{n}.
Exercise: 1.1.6 Let r, s, k be any nonnegative integers. Then the following
identity holds:

    Σ_{j=0}^{k} \binom{r+j}{r} \binom{s+k−j}{s} = \binom{r+s+k+1}{r+s+1}.
Exercise: 1.1.7 Evaluate the following two sums:

a. Σ_{i=1}^{n} \binom{n}{i} i 3^i

b. Σ_{i=2}^{n} \binom{n}{i} i(i−1) m^i
Exercise: 1.1.8 Show that if 0 ≤ m < n, then

    Σ_{k=m+1}^{n} (−1)^k \binom{n}{k} \binom{k−1}{m} = (−1)^{m+1}.

(Hint: Fix m and induct on n.)
Exercise: 1.1.9 Show that if 0 ≤ m < n, then

    Σ_{k=0}^{m} (−1)^k \binom{n}{k} = (−1)^m \binom{n−1}{m}.

(Hint: Fix n > 0 and use finite induction on m.)
Exercise: 1.1.10 Show that:

(a) Σ_{k≥0} \binom{k+n}{k} (1/2^k) = 2^{n+1}.

(b) Σ_{k=0}^{n} \binom{k+n}{k} (1/2^k) = 2^n.
Exercise: 1.1.11 If n is a positive integer, show that

    Σ_{k=1}^{n} ((−1)^{k+1} / k) \binom{n}{k} = Σ_{k=1}^{n} 1/k.
1.2 Composition of Positive Integers

A composition of n ∈ P is an ordered tuple σ = (a_1, ..., a_k) of positive
integers for which n = a_1 + ··· + a_k. In this case σ has k parts, i.e.,
σ is a k-composition of n. Given a k-composition σ = (a_1, ..., a_k), define
a (k−1)-subset φ(σ) of [n−1] by

    φ(σ) = {a_1, a_1 + a_2, ..., a_1 + a_2 + ··· + a_{k−1}}.

φ is a bijection between the set of k-compositions of n and the (k−1)-subsets
of [n−1]. This proves the following:

    There are exactly \binom{n−1}{k−1} k-compositions of n.    (1.8)

Moreover, the total number of compositions of n is

    Σ_{k=1}^{n} \binom{n−1}{k−1} = Σ_{k=0}^{n−1} \binom{n−1}{k} = 2^{n−1}.    (1.9)

The bijection φ is often represented schematically by drawing n dots in a
row and drawing k − 1 vertical bars between the n − 1 spaces separating the
dots - at most one bar to a space. For example,

    • | • • | • • • | • | • •   ↔   1 + 2 + 3 + 1 + 2 = 9.
There is a closely related problem. Let N(n, k) be the number of solutions
(also called weak compositions) (x_1, ..., x_k) in nonnegative integers such
that x_1 + x_2 + ··· + x_k = n. Put y_i = x_i + 1 to see that N(n, k) is the
number of solutions in positive integers y_1, ..., y_k to
y_1 + y_2 + ··· + y_k = n + k, i.e., the number of k-compositions of n + k.
Hence

    N(n, k) = \binom{n+k−1}{k−1} = \binom{n+k−1}{n}.    (1.10)
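The dots-and-bars bijection behind Eq. 1.8 translates directly into an enumerator, which then confirms Eqs. 1.8-1.10 for small n. A sketch (Python; the function name `compositions` is ours, not from the notes):

```python
from itertools import combinations
from math import comb

def compositions(n, k):
    """All k-compositions of n, via the bijection with (k-1)-subsets of
    [n-1]: a subset {s_1 < ... < s_{k-1}} gives the parts
    s_1, s_2 - s_1, ..., n - s_{k-1}."""
    for cut in combinations(range(1, n), k - 1):
        bounds = (0,) + cut + (n,)
        yield tuple(bounds[i + 1] - bounds[i] for i in range(k))

n = 7
for k in range(1, n + 1):
    assert sum(1 for _ in compositions(n, k)) == comb(n - 1, k - 1)      # Eq. 1.8
assert sum(comb(n - 1, k - 1) for k in range(1, n + 1)) == 2 ** (n - 1)  # Eq. 1.9
# weak k-compositions of n correspond to k-compositions of n + k (Eq. 1.10)
assert sum(1 for _ in compositions(n + 3, 3)) == comb(n + 3 - 1, 3 - 1)
print("composition counts verified")
```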
Exercise: 1.2.1 Find the number of solutions (x_1, ..., x_k) in nonnegative
integers to Σ x_i ≤ n. Here k can vary over the numbers in the range
0 ≤ k ≤ r. (Hint: Σ_{j=0}^{n} N(j, k) = \binom{n+k}{k}, and
Σ_{j=m}^{r} \binom{j}{m} = \binom{r+1}{m+1}.)
Exercise: 1.2.2 Find the number of solutions (x_1, ..., x_k) in integers to
Σ x_i = n with x_i ≥ a_i for preassigned integers a_i, 1 ≤ i ≤ k. (Hint: try
y_i = x_i + 1 − a_i.) (Ans: if k = 4 and m = a_1 + a_2 + a_3 + a_4, the
answer is (n+3−m)(n+2−m)(n+1−m)/6.)
Exercise: 1.2.3 a) Show that
Σ_{k=1}^{n} \binom{m+k−1}{k} = Σ_{k=1}^{m} \binom{n+k−1}{k}. b) Derive a
closed form formula for the number of weak compositions of n into at most
m parts.
Exercise: 1.2.4 Let S be a set of n elements. Count the ordered pairs (A, B)
of subsets of S such that A ⊆ B ⊆ S. Let c(j, k) denote the number of such
ordered pairs for which |A| = j and |B| = k. Show that:

    (1 + y + xy)^n = Σ_{0 ≤ j ≤ k ≤ n} c(j, k) x^j y^k.

What does this give if x = y = 1?
Exercise: 1.2.5 Show that

    f_k(x) = \binom{x}{k} = x^{\underline{k}} / k!

is a polynomial in x with rational coefficients (not all of which are
integers) and such that for each integer m (positive, negative or zero)
f_k(m) is also an integer.
1.3 Multisets

A finite multiset M on a set S is a function ν : S → N such that
Σ_{x∈S} ν(x) < ∞. If Σ_{x∈S} ν(x) = k, M is called a k-multiset. Sometimes
we write k = #M. If S = {x_1, ..., x_n} and ν(x_i) = a_i, write
M = {x_1^{a_1}, ..., x_n^{a_n}}. Then let ((S, k)) denote the set of all
k-multisets on S and put ((n, k)) = #((S, k)).

If M' = ν' : S → N is a second multiset on S, we say M' is a submultiset of
M provided ν'(x) ≤ ν(x) for all x ∈ S.

Note: The number of submultisets of M is Π_{x∈S} (ν(x) + 1). And each
element of ((S, k)) corresponds to a weak n-composition of k:
a_1 + a_2 + ··· + a_n = k. This proves the following:

    ((n, k)) = \binom{n+k−1}{n−1} = \binom{n+k−1}{k}.    (1.11)
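Eq. 1.11 can be checked by brute force, since `itertools.combinations_with_replacement` enumerates k-multisets directly. A small sketch (Python, ours, not part of the original notes):

```python
from itertools import combinations_with_replacement
from math import comb

def multiset_count(n, k):
    """Number of k-multisets on an n-set, by direct enumeration."""
    return sum(1 for _ in combinations_with_replacement(range(n), k))

# compare the enumeration against Eq. 1.11 for a grid of small n, k
for n in range(1, 7):
    for k in range(0, 7):
        assert multiset_count(n, k) == comb(n + k - 1, k)
print("Eq. 1.11 verified for n, k < 7")
```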
In the present context we give a glimpse of the generating functions that
will be studied in greater detail later.

    (1 + x_1 + x_1^2 + ···)(1 + x_2 + x_2^2 + ···) ··· (1 + x_n + x_n^2 + ···)
        = Σ_{ν : S → N} Π_{x_i ∈ S} x_i^{ν(x_i)}.

Put all x_i's equal to x:

    (1 + x + x^2 + ···)^n = Σ_{ν : S → N} x^{Σ ν(x_i)} = Σ_{M on S} x^{#M}
        = Σ_{k≥0} ((n, k)) x^k.

Hence we have proved that

    (1 − x)^{−n} = Σ_{k≥0} ((n, k)) x^k.    (1.12)

From this it follows that (−1)^k \binom{−n}{k} = ((n, k)), a fact which is
also easy to check directly. Replacing n with n + 1 gives the following
version, which is worth memorizing:

    (1 − x)^{−(n+1)} = Σ_{k≥0} \binom{n+k}{n} x^k.    (1.13)

Exercise: 1.3.1 \binom{−n}{k} = (−1)^k \binom{n+k−1}{k}.
1.4 Multinomial Coefficients

Let \binom{n}{a_1, ..., a_m} be the number of ways of putting each element of
an n-set into one of m labeled categories C_1, ..., C_m, so that C_i gets
a_i elements. This is also the number of ways of distributing n labeled balls
into m labeled boxes so that box B_i gets a_i balls.

Consider n linearly ordered blanks to be assigned one of m letters
B_1, ..., B_m so that B_i is used a_i times. It is easy to see that the
number of words of length n from an alphabet {B_1, ..., B_m} with m letters
where the ith letter B_i is used a_i times is \binom{n}{a_1, ..., a_m}. Using
this fact it is easy to see that

    \binom{n}{a_1, ..., a_m}
        = \binom{n}{a_1} \binom{n−a_1}{a_2} \binom{n−a_1−a_2}{a_3} ···
          \binom{n−a_1−a_2−···−a_{m−1}}{a_m}
        = n! / (a_1! a_2! ··· a_m!).

Theorem 1.4.1 The coefficient of x_1^{a_1} x_2^{a_2} ··· x_m^{a_m} in
(x_1 + ··· + x_m)^n is \binom{n}{a_1, ..., a_m}.

The multinomial coefficient is defined to be zero whenever it is not the
case that a_1, ..., a_m are nonnegative integers whose sum is n.

Exercise: 1.4.2 Prove that

    \binom{n}{a_1, ..., a_m} = \binom{n−1}{a_1−1, a_2, ..., a_m}
        + \binom{n−1}{a_1, a_2−1, ..., a_m} + ···
        + \binom{n−1}{a_1, a_2, ..., a_m−1}.
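The factorial formula for the multinomial coefficient, and its interpretation as counting words, can be checked in a few lines. A sketch (Python, ours; `multinomial` is a hypothetical helper name, not from the notes):

```python
from itertools import permutations
from math import factorial

def multinomial(n, parts):
    """n! / (a_1! a_2! ... a_m!), zero unless the a_i are nonnegative
    integers summing to n, matching the convention in the text."""
    if any(a < 0 for a in parts) or sum(parts) != n:
        return 0
    result = factorial(n)
    for a in parts:
        result //= factorial(a)
    return result

# words of length 4 over {A, B} using each letter twice
distinct = set(permutations("AABB"))
assert len(distinct) == multinomial(4, (2, 2))      # 4!/(2!2!) = 6 words
assert multinomial(9, (2, 3, 4)) == 1260
assert multinomial(5, (2, 2)) == 0                  # parts do not sum to n
print("multinomial checks passed")
```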
1.5 Permutations

There are several ways to approach the study of permutations. One of the
most basic is as a group of bijections. Let A be any nonempty set (finite or
infinite). Put S(A) = {π : A → A : π is a bijection}.

Notation: If π : a_1 ↦ a_2 we write π(a_1) = a_2 (unless noted otherwise).
If π, σ ∈ S(A), define the composition π ∘ σ by (π ∘ σ)(a) = π(σ(a)).

Theorem 1.5.1 (S(A), ∘) is a group.

For simplicity in notation we take A = [n] = {1, ..., n}, for n ∈ P, and we
write S_n = S([n]). One way to represent σ ∈ S_n is as a two-rowed array

    σ = ( 1 2 ... n ; σ(1) σ(2) ... σ(n) ).

From this representation it is easy to write σ either as a linearly ordered
sequence σ = σ(1), σ(2), ..., σ(n) or as a product of disjoint cycles

    σ = (1 σ(1) σ^2(1) ···)(j σ(j) σ^2(j) ···) ··· (···),

where each cycle is obtained by iterating σ starting from a point not in any
earlier cycle.

Example: σ = ( 1 2 3 4 5 6 7 8 9 ; 2 6 1 9 8 3 7 5 4 ). Then
σ = 261983754 as a linearly ordered sequence, and σ = (1263)(49)(58)(7) as a
product of disjoint cycles. Recall that disjoint cycles commute, and
(135) = (351) = (513) ≠ (153), etc.
We now introduce the so-called standard representation of a permutation
σ ∈ S_n. Write σ as a product of disjoint cycles in such a way that

(a) each cycle is written with its largest element first, and

(b) the cycles are ordered (left to right) in increasing order of largest
elements.

Example: σ = ( 1 2 3 4 5 6 7 ; 4 2 7 1 3 6 5 ) = (14)(2)(375)(6) =
(2)(41)(6)(753), where the last expression is the standard representation of
σ. Given a permutation σ, let σ̂ be the word (or permutation written as a
linearly ordered sequence) obtained by writing σ in standard form and erasing
the parentheses. So for the example above, σ̂ = 2416753. We can recover σ
from σ̂ by inserting a left parenthesis preceding each left-to-right maximum,
i.e., before each a_i such that a_i > a_j for every j < i in
σ̂ = a_1 a_2 ··· a_n. Then put right parentheses where they have to be. It
follows that σ ↦ σ̂ is a bijection from S_n to itself.

Theorem 1.5.2 σ ↦ σ̂ : S_n → S_n is a bijection. And σ has k cycles if and
only if σ̂ has k left-to-right maxima.
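The bijection σ ↦ σ̂ of Theorem 1.5.2 is short to implement. In the sketch below (Python, ours; cycles are lists of integers), `to_word` produces the word from any disjoint-cycle form, and `from_word` recovers the cycles by scanning for left-to-right maxima:

```python
def to_word(cycles):
    """Standard representation -> word: rotate each cycle so its largest
    element comes first, sort the cycles by largest element, then erase
    the parentheses."""
    std = [c[c.index(max(c)):] + c[:c.index(max(c))] for c in cycles]
    std.sort(key=max)
    return [x for c in std for x in c]

def from_word(word):
    """Recover the cycles: a new cycle opens at each left-to-right maximum."""
    cycles, best = [], 0
    for a in word:
        if a > best:
            cycles.append([a])
            best = a
        else:
            cycles[-1].append(a)
    return cycles

# the example from the text: sigma = (14)(2)(375)(6)
word = to_word([[1, 4], [2], [3, 7, 5], [6]])
print(word)                  # -> [2, 4, 1, 6, 7, 5, 3], i.e. 2416753
assert from_word(word) == [[2], [4, 1], [6], [7, 5, 3]]
```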
Define c(n, k) to be the number of permutations in S_n that have exactly k
cycles. (Many authors write this number with a bracket notation analogous to
the binomial coefficient.) So as a corollary of Theorem 1.5.2 we have

Corollary 1.5.3 The number of permutations in S_n with exactly k
left-to-right maxima is c(n, k).

Put s(n, k) := (−1)^{n−k} c(n, k). Then s(n, k) is called the Stirling
number of the first kind, and c(n, k) is the signless Stirling number of the
first kind.
Lemma 1.5.4 c(n, k) = (n−1) c(n−1, k) + c(n−1, k−1) for n, k > 0. And
c(0, 0) = 1, but otherwise if n ≤ 0 or k ≤ 0 put c(n, k) = 0.

Proof: Let σ ∈ S_{n−1} be written as a product of k disjoint cycles. We can
insert the symbol n after any of the numbers 1, ..., n−1 (in its cycle).
This can be done in n − 1 ways, yielding the disjoint cycle decomposition of
a permutation σ' ∈ S_n with k cycles for which n appears in a cycle of length
greater than or equal to 2. So there are (n−1) c(n−1, k) permutations
σ' ∈ S_n with k cycles for which σ'(n) ≠ n. On the other hand, we can choose
a permutation σ ∈ S_{n−1} with k − 1 cycles and extend it to a permutation
σ' ∈ S_n with k cycles satisfying σ'(n) = n. This gives each σ' ∈ S_n exactly
once, proving the desired result.
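Lemma 1.5.4 translates directly into code. A sketch (Python, ours, not part of the notes), with the sanity check that for fixed n the numbers c(n, k) sum to n!, since every permutation of [n] has some number of cycles:

```python
from functools import lru_cache
from math import factorial

@lru_cache(maxsize=None)
def c(n, k):
    """Signless Stirling numbers of the first kind, via Lemma 1.5.4."""
    if n == 0 and k == 0:
        return 1
    if n <= 0 or k <= 0:
        return 0
    return (n - 1) * c(n - 1, k) + c(n - 1, k - 1)

# row sums: every permutation of [n] is counted once, so they equal n!
for n in range(1, 9):
    assert sum(c(n, k) for k in range(n + 1)) == factorial(n)
print([c(5, k) for k in range(6)])   # row n = 5: [0, 24, 50, 35, 10, 1]
```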
Theorem 1.5.5 For n ∈ N,

    Σ_{k=0}^{n} c(n, k) x^k = x^{\overline{n}} = x(x+1)···(x+n−1).

Proof: Put F_n(x) := x^{\overline{n}} = x(x+1)···(x+n−1) =
Σ_{k=0}^{n} b(n, k) x^k. If n = 0, F_n(x) is a void product, which by
convention is 1. So we put b(0, 0) = 1, and b(n, k) = 0 if n < 0 or k < 0.
Then F_n(x) = (x + n − 1) F_{n−1}(x) implies that

    Σ_{k=0}^{n} b(n, k) x^k
        = x Σ_{k=0}^{n−1} b(n−1, k) x^k + (n−1) Σ_{k=0}^{n−1} b(n−1, k) x^k
        = Σ_{k=1}^{n} b(n−1, k−1) x^k + (n−1) Σ_{k=0}^{n−1} b(n−1, k) x^k
        = Σ_{k=0}^{n} [b(n−1, k−1) + (n−1) b(n−1, k)] x^k.

This implies that b(n, k) = (n−1) b(n−1, k) + b(n−1, k−1). Hence the
b(n, k) satisfy the same recurrence and initial conditions as the c(n, k),
implying that they are the same, viz., b(n, k) = c(n, k).

Corollary 1.5.6 x^{\underline{n}} = Σ_{k=0}^{n} (−1)^{n−k} c(n, k) x^k.

Proof: Replace x with −x in Theorem 1.5.5 and simplify. (Use
x^{\underline{n}} = (−1)^n (−x)^{\overline{n}}.)
Cycle Type: If σ ∈ S_n, then c_i = c_i(σ) is the number of cycles of length
i in σ, 1 ≤ i ≤ n. Note: n = Σ_{i=1}^{n} i c_i. Then σ has type
(c_1, ..., c_n) and the total number of cycles of σ is
c(σ) = Σ_{i=1}^{n} c_i(σ).

Theorem 1.5.7 The number of σ ∈ S_n with type (c_1, ..., c_n) is

    n! / (1^{c_1} c_1! · 2^{c_2} c_2! ··· n^{c_n} c_n!).

Proof: Let w = a_1 ··· a_n be any word, i.e., permutation in S_n written as
a linearly ordered sequence. Suppose (c_1, ..., c_n) is an admissible cycle
type, i.e., c_i ≥ 0 for all i and n = Σ_i i c_i. Insert parentheses in w so
that the first c_1 cycles have length 1, the next c_2 cycles have length 2,
..., etc. This defines a map Φ : S([n]) → S_c([n]), where
S_c([n]) = {σ ∈ S([n]) : σ has type (c_1, ..., c_n)}. Clearly Φ is onto
S_c([n]). We claim that if σ ∈ S_c([n]), then the number of w mapped to σ is
1^{c_1} c_1! · 2^{c_2} c_2! ··· n^{c_n} c_n!. This follows because in writing
σ as a product of disjoint cycles, we can order the cycles of length i
(among themselves) in c_i! ways, and then choose the first elements of all
these cycles in i^{c_i} ways. These choices for different i are all
independent. So Φ : S([n]) → S_c([n]) is a many to one map onto S_c([n])
mapping the same number of w to each σ. Since #S([n]) = n!, we obtain the
desired result.
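Theorem 1.5.7 can be verified exhaustively for small n by computing the type of every permutation. A sketch (Python, ours; a permutation of [n] is stored as the tuple (σ(1), ..., σ(n))):

```python
from itertools import permutations
from math import factorial
from collections import Counter

def cycle_type(perm):
    """Cycle type (c_1, ..., c_n) of a permutation of [n];
    perm[i - 1] is the image of i."""
    n = len(perm)
    seen, lengths = set(), []
    for start in range(1, n + 1):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:           # trace the cycle through start
            seen.add(j)
            j = perm[j - 1]
            length += 1
        lengths.append(length)
    counts = Counter(lengths)
    return tuple(counts.get(i, 0) for i in range(1, n + 1))

def predicted(n, ctype):
    """Theorem 1.5.7: n! / (1^{c_1} c_1! 2^{c_2} c_2! ... n^{c_n} c_n!)."""
    denom = 1
    for i, ci in enumerate(ctype, start=1):
        denom *= i ** ci * factorial(ci)
    return factorial(n) // denom

n = 5
observed = Counter(cycle_type(p) for p in permutations(range(1, n + 1)))
assert all(count == predicted(n, t) for t, count in observed.items())
print("Theorem 1.5.7 verified for n =", n)
```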
1.6 Partitions of Integers

A partition of n ∈ N is a sequence λ = (λ_1, ..., λ_k) ∈ N^k such that

a) Σ λ_i = n, and

b) λ_1 ≥ ··· ≥ λ_k ≥ 0.

Two partitions of n are identical if they differ only in the number of
terminal 0's. For example, (3, 3, 2, 1) ≡ (3, 3, 2, 1, 0, 0). The nonzero
λ_i are the parts of the partition λ. If λ = (λ_1, ..., λ_k) with
λ_1 ≥ ··· ≥ λ_k > 0, we say that λ has k parts. If λ has α_i parts equal to
i, we may write λ = ⟨1^{α_1}, 2^{α_2}, ...⟩, where terms with α_i = 0 may be
omitted, and the superscript α_i = 1 may be omitted.

Notation: λ ⊢ n means λ is a partition of n. As an example we have

    (4, 4, 2, 2, 2, 1) = ⟨1^1, 2^3, 3^0, 4^2⟩ = ⟨1, 2^3, 4^2⟩ ⊢ 15.

Put p(n) equal to the total number of partitions of n, and p_k(n) equal to
the number of partitions of n with k parts.

Convention: p(0) = p_0(0) = 1.

    p_n(n) = 1.
    p_{n−1}(n) = 1 if n > 1.
    p_1(n) = 1 for n ≥ 1.
    p_2(n) = ⌊n/2⌋.

Exercise: 1.6.1 p_k(n) = p_{k−1}(n−1) + p_k(n−k).

Exercise: 1.6.2 Show p_k(n) = Σ_{s=1}^{k} p_s(n−k).

A great deal of time and effort has been spent studying the partitions of
n and much is known about them. However, most of the results concerning the
numbers p_k(n) have been obtained via the use of generating functions. Hence
after we have studied formal power series it would be reasonable to return
to the topic of partitions. Unfortunately we probably will not have time to
do this, so this topic would be a great one for a term project.
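The recurrence of Exercise 1.6.1 already suffices to compute p_k(n) and p(n). A sketch (Python, ours, not part of the notes), checked against the well-known initial values of p(n):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def p_k(n, k):
    """Number of partitions of n with exactly k parts (Exercise 1.6.1):
    either the smallest part is 1 (delete it) or all parts exceed 1
    (subtract 1 from each of the k parts)."""
    if n == 0 and k == 0:
        return 1
    if n <= 0 or k <= 0:
        return 0
    return p_k(n - 1, k - 1) + p_k(n - k, k)

def p(n):
    """Total number of partitions of n."""
    return sum(p_k(n, k) for k in range(n + 1))

assert [p(n) for n in range(1, 11)] == [1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
assert p_k(9, 2) == 9 // 2                      # p_2(n) = floor(n/2)
print("partition recurrence verified")
```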
1.7 Set Partitions

A partition of a finite set N is a collection π = {B_1, ..., B_k} of subsets
of N such that:

(a) B_i ≠ ∅ for all i;

(b) B_i ∩ B_j = ∅ if i ≠ j;

(c) B_1 ∪ B_2 ∪ ··· ∪ B_k = N.

We call B_i a block of π and say that π has k = |π| = #π blocks. Put
S(n, k) = the number of partitions of an n-set into k blocks. S(n, k) is
called a Stirling number of the second kind.

We immediately have the following list of Stirling numbers:

    S(0, 0) = 1;  S(n, k) = 0 if k > n ≥ 1;  S(n, 0) = 0 if n > 0;
    S(n, 1) = 1;  S(n, 2) = 2^{n−1} − 1;  S(n, n) = 1;
    S(n, n−1) = \binom{n}{2}.

Theorem 1.7.1 S(n, k) = k S(n−1, k) + S(n−1, k−1).

Proof: To obtain a partition of [n] into k blocks, we can either

(i) partition [n−1] into k blocks and place n into any of these blocks, in
k S(n−1, k) ways, or

(ii) put n into a block by itself and partition [n−1] into k − 1 blocks, in
S(n−1, k−1) ways.
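The recurrence of Theorem 1.7.1 is again immediate to code; summing a row gives the Bell numbers B(n) discussed next. A sketch (Python, ours, not part of the notes):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling numbers of the second kind S(n, k), via Theorem 1.7.1."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def bell(n):
    """Total number of partitions of an n-set."""
    return sum(stirling2(n, k) for k in range(n + 1))

assert stirling2(4, 2) == 2 ** 3 - 1          # S(n, 2) = 2^{n-1} - 1
assert stirling2(5, 4) == 10                  # S(n, n-1) = binom(n, 2)
print([bell(n) for n in range(1, 8)])         # -> [1, 2, 5, 15, 52, 203, 877]
```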
Bell Number: Let B(n) be the total number of partitions of an n-set. Hence

    B(n) = Σ_{k=1}^{n} S(n, k) = Σ_{k=0}^{n} S(n, k),

for all n ≥ 1.

Theorem 1.7.2 x^n = Σ_k S(n, k) x^{\underline{k}}, n ∈ N.

Proof: Check n = 0 and n = 1. Then note that
x · x^{\underline{k}} = x^{\underline{k+1}} + k x^{\underline{k}}, because
x^{\underline{k+1}} = x^{\underline{k}} (x − k)
= x · x^{\underline{k}} − k · x^{\underline{k}}. Now let our induction
hypothesis be that x^{n−1} = Σ_k S(n−1, k) x^{\underline{k}} for some n ≥ 2.
Then

    x^n = x · x^{n−1} = x Σ_k S(n−1, k) x^{\underline{k}}
        = Σ_k S(n−1, k) x^{\underline{k+1}} + Σ_k S(n−1, k) k x^{\underline{k}}
        = Σ_k S(n−1, k−1) x^{\underline{k}} + Σ_k k S(n−1, k) x^{\underline{k}}
        = Σ_k [k S(n−1, k) + S(n−1, k−1)] x^{\underline{k}}
        = Σ_k S(n, k) x^{\underline{k}}.

Corollary 1.7.3 x^n = Σ_k (−1)^{n−k} S(n, k) x^{\overline{k}}.
1.8 Table Entries in the Twelvefold Way
The reader should supply whatever is still needed for a complete proof for
each of the following.
Entry #1. #(X
N
) = x
n
.
Entry #2. #f X
N
: f is one-to-one = x
n
.
Entry #3. #f X
N
: f is onto = x!
_
n
x
_
, as it is the number of
ways of partitioning the balls (say [n]) into x blocks and then linearly ordering
the blocks. This uniquely determines an f of the type being counted.
Entry #4. A function from unlabeled N to labeled X is a placing of
unlabeled balls in labeled boxes: the only important thing is how many balls
28 CHAPTER 1. BASIC COUNTING TECHNIQUES
are there to be in each box. Each choice corresponds to an n-multiset of an
x-set, i.e.,
__
x
n
__
=
_
n +x 1
n
_
=
_
n+x1
x1
_
.
Entry #5. Here N is unlabeled and X is labeled and f is one-to-one. Each function corresponds to putting 0 or 1 ball in each box so that n balls are used, so the desired number of functions is the binomial coefficient C(x, n).
Entry #6. Here N is unlabeled, X is labeled, and f is onto. So each f corresponds to an n-multiset on X with each box chosen at least once. The number of such functions is

((x choose n−x)) = C(n−1, x−1) = C(n−1, n−x).
Entry #7. With N labeled and X unlabeled, a function f : N → X is determined by the sets {f^{−1}(b) : b ∈ X}. Hence f corresponds to a partition of N into at most x parts. The number of such partitions is Σ_{k=1}^x S(n,k).
Entry #8. Here N is labeled, X is unlabeled and f is one-to-one. Such an f amounts to putting just one ball into each of n of the unlabeled boxes. This is possible iff n ≤ x, and in that case there is just 1 way.
Entry #9. With N labeled, X unlabeled, and f onto, such an f corresponds to a partition of N into x parts. Hence the number of such functions is S(n,x).
Entry #10. With N unlabeled, X unlabeled and f arbitrary, f is determined by the number of elements in each block of ker(f), i.e., f is essentially just a partition of the integer n with at most x parts. Hence the number of such f is p_1(n) + ⋯ + p_x(n).
Entry #11. Here N and X are both unlabeled, and f is one-to-one. So n unlabeled balls are distributed into x unlabeled boxes, which is possible in just one way if n ≤ x and not at all otherwise.
Entry #12. Here both N and X are unlabeled and f is onto. Clearly f corresponds to a partition of n into x parts, so there are p_x(n) such functions.
Several of the entries of the Twelvefold Way are quite satisfactory, but
others need considerable further development before they are really useful.
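For tiny parameters the first three entries can be confirmed by brute force over all functions. A Python sketch (our own check, not from the text; names are ours):

```python
from itertools import product
from math import factorial

def count_functions(n, x, pred):
    """Brute-force count of functions f : [n] -> [x] satisfying pred."""
    return sum(1 for f in product(range(x), repeat=n) if pred(f))

def S(n, k):  # Stirling numbers of the second kind, via Theorem 1.7.1
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return k * S(n - 1, k) + S(n - 1, k - 1)

n, x = 4, 3
assert count_functions(n, x, lambda f: True) == x ** n                               # Entry #1
assert count_functions(n, x, lambda f: len(set(f)) == n) == 0                        # Entry #2: no injections when n > x
assert count_functions(n, x, lambda f: len(set(f)) == x) == factorial(x) * S(n, x)   # Entry #3
assert count_functions(3, 5, lambda f: len(set(f)) == 3) == 5 * 4 * 3                # Entry #2: (x)_n injections
```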
1.9 Recapitulation
We have already established the following.
1. C(n,k) = C(n−1,k) + C(n−1,k−1)    ("n choose k")

2. c(n,k) = (n−1)·c(n−1,k) + c(n−1,k−1)    ("n cycle k")

3. S(n,k) = k·S(n−1,k) + S(n−1,k−1)    ("n subset k")

4. x^(n) = Σ_k c(n,k) x^k, and x^(n) = (−1)^n (−x)_n

5. (x)_n = Σ_k (−1)^{n−k} c(n,k) x^k, and (x)_n = (−1)^n (−x)^(n)

6. x^n = Σ_k (−1)^{n−k} S(n,k) x^(k)

7. x^n = Σ_k S(n,k) (x)_k

Here (x)_n = x(x−1)⋯(x−n+1) and x^(n) = x(x+1)⋯(x+n−1) are the falling and rising factorials, and c(n,k) is the number of permutations of [n] with exactly k cycles.

It appears that 4. and 6. (resp., 5. and 7.) are some kind of inverses of each other. Later we shall make this a little more formal as we study the incidence algebra of a finite POSET.

Also in this section we want to recap certain results on compositions of integers, etc.
1. P(r; r_1, r_2, . . . , r_n) = r!/(r_1! r_2! ⋯ r_n!) (the multinomial coefficient)

= the number of ways to split up r people into n labeled committees with r_i people in committee C_i

= the number of words of length r with r_i letters of type i, 1 ≤ i ≤ n, where r_1 + r_2 + ⋯ + r_n = r

= the coefficient on x_1^{r_1} x_2^{r_2} ⋯ x_n^{r_n} in (x_1 + x_2 + ⋯ + x_n)^r, where r_1 + r_2 + ⋯ + r_n = r

= the number of ways of putting r distinct balls into n labeled boxes with r_i balls in the i-th box.
2. C(n, r) = n!/(r!(n−r)!)

= the number of ways of selecting a committee of size r from a set of n people

= the coefficient of x^r y^{n−r} in (x + y)^n.
3. C(r+n−1, r) = (r+n−1)!/(r!(n−1)!)

= the number of ways of ordering r hot dogs of n different types (selection with repetition)

= the number of ways of putting r identical balls into n labeled boxes (distribution of identical objects)

= the number of ordered n-tuples (x_1, . . . , x_n) of nonnegative integers such that x_1 + x_2 + ⋯ + x_n = r.
4. Suppose that a_1, . . . , a_n are given integers, not necessarily nonnegative. Then

C(r − Σa_i + n − 1, r − Σa_i) = |{(y_1, . . . , y_n) : Σ_{i=1}^n y_i = r and y_i ≥ a_i for 1 ≤ i ≤ n}|.

For a proof, just put x_i = y_i − a_i and note that y_i ≥ a_i iff x_i ≥ 0.

NOTE: If Σa_i > r, then the number of solutions is 0.
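The shifted count can be verified directly for small cases. A Python sketch (our own check; the lower bounds below are arbitrary test data, not from the text):

```python
from itertools import product
from math import comb

def count_shifted(r, a):
    """Brute-force count of integer tuples (y_1..y_n) with sum r and y_i >= a_i."""
    hi = r - sum(a)          # each y_i can exceed a_i by at most r - sum(a)
    if hi < 0:
        return 0
    ranges = (range(ai, ai + hi + 1) for ai in a)
    return sum(1 for y in product(*ranges) if sum(y) == r)

a, r = [-1, 2, 0], 5
expected = comb(r - sum(a) + len(a) - 1, r - sum(a))   # C(r - sum(a_i) + n - 1, r - sum(a_i))
assert count_shifted(r, a) == expected == 15
```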
1.10 Cayley's Theorem: The Number of Labeled Trees

One of the most famous results in combinatorics is Cayley's Theorem, which says that the number of labeled trees on n vertices is n^{n−2}. We give several proofs that illustrate different types of combinatorial arguments. But first we recall the basic facts about trees.
Let G be a finite graph G = (V, E) on n = |V| vertices and having b = |E| edges. Then G is called a tree provided G is connected and has no cycles. It follows that for any two vertices x, y ∈ V, there is a unique path in G from x to y. Moreover, if x and y are two vertices at maximum distance in G, then x and y each have degree 1. Hence any tree with at least two vertices has at least two vertices of degree 1. Such a vertex will be called a hanging vertex. An easy induction argument shows that if G is a tree on n vertices, then it has n − 1 edges. As a kind of converse, if G is an acyclic graph on n vertices with n − 1 edges, it must be a tree. Clearly we need only verify that G is connected. Each connected component of G is a tree by definition. But if G has k connected components T_1, . . . , T_k, where T_i has n_i vertices and n_i − 1 edges, and where n_1 + ⋯ + n_k = n, then G has Σ_{i=1}^k (n_i − 1) = n − k edges. So n − k = n − 1 implies G is connected. Similarly, if G is connected with b = n − 1, then G has a spanning tree T with n − 1 edges. Hence G = T must be acyclic and hence a tree. It follows that if G is a graph on n vertices and b edges, then G is a tree if and only if at least two (and hence all three) of the following hold:

(a) G is connected;
(b) G is acyclic;
(c) b = n − 1.

A labeled tree on [n] is just a spanning tree of the complete graph K_n on [n]. Hence we may state Cayley's theorem as follows.

Theorem 1.10.1  The number of spanning trees of K_n is n^{n−2}.
Proof #1. The first proof is due to H. Prüfer (1918). It uses an algorithm that uniquely characterizes the tree.

Let T be a tree with V = [n], so the vertex set already has a natural order. Let T_1 := T. For i = 1, 2, . . . , n − 2, let b_i denote the vertex of degree 1 with the smallest label in T_i, let a_i be the vertex adjacent to b_i, and let T_{i+1} be the tree obtained by deleting the vertex b_i and the edge {a_i, b_i} from T_i. The code assigned to the tree T is [a_1, a_2, . . . , a_{n−2}].
[Figure: A Tree on 10 Points. The tree has edge set {2,3}, {2,4}, {1,2}, {1,5}, {6,7}, {1,7}, {1,10}, {8,10}, {9,10}.]
As an example, consider the tree T = T_1 on 10 points in the figure. The vertex of degree 1 with smallest index is 3. It is joined to vertex 2. We define a_1 = 2, b_1 = 3, then delete vertex 3 and edge {3, 2}, to obtain a tree T_2 with one edge and one vertex less. This procedure is repeated eight times, yielding the sequences

[a_1, a_2, . . . , a_8] = [2, 2, 1, 1, 7, 1, 10, 10],
[b_1, b_2, . . . , b_8] = [3, 4, 2, 5, 6, 7, 1, 8],

and terminating with the edge {9, 10}. The code for the tree is the sequence [a_1, a_2, . . . , a_8] = [2, 2, 1, 1, 7, 1, 10, 10].
To reverse the procedure, start with any code [a_1, a_2, . . . , a_{n−2}] (in our example, [2, 2, 1, 1, 7, 1, 10, 10]). Write a_{n−1} := n. For i = 1, 2, . . . , n − 1, let b_i be the vertex with smallest index which is not in

{a_i, a_{i+1}, . . . , a_{n−1}} ∪ {b_1, b_2, . . . , b_{i−1}}.

Then {{a_i, b_i} : i = 1, . . . , n − 1} will be the edge set of a spanning tree.
Exercise: 1.10.2  With the sequence (b_i) defined from the code as indicated in the proof above, show that {{b_i, a_i} : i = 1, . . . , n − 1} will be the edge set of a tree on [n]. Fill in the details of why the mapping associating a code to a tree, and the mapping associating a tree to a code, are inverses.
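Both directions of Prüfer's algorithm can be run mechanically on the 10-point example. A Python sketch (our own implementation of the procedure described above; names are ours):

```python
def prufer_code(edges, n):
    """Prufer code of a labeled tree on 1..n: repeatedly remove the
    lowest-labeled leaf b_i and record its unique neighbor a_i."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    code = []
    for _ in range(n - 2):
        b = min(v for v in adj if len(adj[v]) == 1)   # lowest-labeled leaf
        a = adj[b].pop()                              # its unique neighbor
        adj[a].discard(b)
        del adj[b]
        code.append(a)
    return code

def tree_from_code(code):
    """Inverse map: rebuild an edge set from a code of length n-2."""
    n = len(code) + 2
    a = list(code) + [n]                              # append a_{n-1} := n
    used = set()
    edges = []
    for i in range(n - 1):
        b = min(v for v in range(1, n + 1) if v not in used and v not in a[i:])
        used.add(b)
        edges.append((a[i], b))
    return edges

# The 10-vertex example tree from the text.
T = [(3, 2), (4, 2), (2, 1), (5, 1), (6, 7), (7, 1), (1, 10), (8, 10), (9, 10)]
assert prufer_code(T, 10) == [2, 2, 1, 1, 7, 1, 10, 10]
assert {frozenset(e) for e in tree_from_code([2, 2, 1, 1, 7, 1, 10, 10])} == \
       {frozenset(e) for e in T}
```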
Proof #2. This proof starts by showing that the number N(d_1, . . . , d_n) of labeled trees on vertices v_1, . . . , v_n in which v_i has degree d_i + 1, 1 ≤ i ≤ n, is the multinomial coefficient P(n−2; d_1, . . . , d_n) = (n−2)!/(d_1! ⋯ d_n!). As an inductive hypothesis we assume that this result holds for trees with fewer than n vertices and leave to the reader the task of checking that the result holds for n = 3. Since the degree of each vertex is at least 1, we know that the d's are all nonnegative integers. The sum of the degrees of the vertices counts the n − 1 edges twice, so we have 2(n−1) = Σ_{i=1}^n (d_i + 1) = (Σ d_i) + n, whence Σ d_i = n − 2. Hence at least P(n−2; d_1, . . . , d_n) is in proper form. We also know that any tree has at least two vertices with degree 1. We need to show that if (d_1, . . . , d_n) is a sequence of nonnegative integers with Σ d_i = n − 2, then (d_1 + 1, . . . , d_n + 1) really is the degree sequence of P(n−2; d_1, . . . , d_n) labeled trees. Clearly if Σ d_i = n − 2 then at least two of the d_i's equal zero. The following argument would work with any particular d_j = 0, but for notational ease we suppose that d_n = 0. If there is a labeled tree with degree sequence (d_1 + 1, . . . , 1), then the vertex v_n is adjacent to a unique vertex v_j, with degree at least 2. So the tree obtained by removing v_n and the edge {v_j, v_n} has degree sequence (d_1 + 1, . . . , d_j, . . . , d_{n−1} + 1). It follows that

N(d_1, . . . , d_{n−1}, 0) = N(d_1 − 1, d_2, . . . , d_{n−1}) + N(d_1, d_2 − 1, . . . , d_{n−1}) + ⋯ + N(d_1, d_2, . . . , d_{n−1} − 1).

By the induction hypothesis this is the sum of the multinomial coefficients

P(n−3; d_1 − 1, d_2, . . . , d_{n−1}) + P(n−3; d_1, d_2 − 1, . . . , d_{n−1}) + ⋯ + P(n−3; d_1, d_2, . . . , d_{n−1} − 1) = P(n−2; d_1, d_2, . . . , d_{n−1}, 0).
Cayley's Theorem now follows. For the number T(n) of labeled trees on n vertices is the sum of all the terms N(d_1, . . . , d_n) with d_i ≥ 0 and Σ_{i=1}^n d_i = n − 2, which is the sum of all terms P(n−2; d_1, d_2, . . . , d_n) with d_i ≥ 0 and Σ_{i=1}^n d_i = n − 2. Now in the multinomial expansion of (a_1 + a_2 + ⋯ + a_n)^{n−2} set a_1 = ⋯ = a_n = 1 to obtain the desired result T(n) = (1 + 1 + ⋯ + 1)^{n−2} = n^{n−2}.
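The final summation step can be checked by machine: summing the multinomial coefficients over all degree sequences does give n^{n−2}. A Python sketch (our own check; helper names are ours):

```python
from math import factorial

def multinomial(n, parts):
    """(n)! / (d_1! d_2! ... d_k!) for parts summing to n."""
    out = factorial(n)
    for p in parts:
        out //= factorial(p)
    return out

def compositions(total, parts):
    """All ordered tuples of `parts` nonnegative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

for n in range(3, 8):
    T = sum(multinomial(n - 2, d) for d in compositions(n - 2, n))
    assert T == n ** (n - 2)   # Cayley's theorem via the degree-sequence count
```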
Proof #3. This proof establishes a bijection between the set of labeled trees on n vertices and the set of mappings from the set {2, 3, . . . , n − 1} to the set [n] = {1, 2, . . . , n}. Clearly the number of such mappings is n^{n−2}. Suppose f is such a mapping. Construct a functional digraph D on the vertices 1 through n by defining (i, f(i)), i = 2, . . . , n − 1, to be the arcs. Clearly 1 and n have zero outdegree in D, but each of them could have positive indegree. In either case, the (weakly) connected component containing 1 (respectively, n) may be viewed as an in-tree rooted at 1 (respectively, n). Any other component consists of an oriented circuit, to each point of which an in-tree is attached with that point as root. Some of these in-trees may consist only of the root. Suppose there are k oriented circuits. The i-th oriented circuit has smallest element r_i, and the circuits are to be ordered among themselves so that r_1 < r_2 < . . . < r_k. In the i-th circuit, let l_i be the vertex to which the arc from r_i points, i.e., f(r_i) = l_i. We may now construct a tree T from D by deleting the arcs (r_i, l_i) (to create a forest of trees) and then adjoining the arcs (1, l_1), (r_1, l_2), . . . , (r_{k−1}, l_k), (r_k, n).

For the reverse process, suppose the labeled tree T is given. Put r_0 := 1, and define r_i to be the smallest vertex on the (unique) path from r_{i−1} to n. Now delete the edges {r_{i−1}, l_i}, i = 1, . . . , k, and {r_k, n}, to create k + 2 components. View the vertex 1 as the root of a directed in-tree. Similarly, view each vertex along the path from l_i to r_i as the root of an in-tree. Now adjoin the directed arcs (r_i, l_i). We may now view this directed tree as the functional digraph of a unique function from {2, 3, . . . , n − 1} to [n]. Moreover, it should be clear that this correspondence between functions from {2, 3, . . . , n − 1} to [n] and labeled trees on [n] is a bijection.
Proof #4. In this proof, due to Joyal (1981), we describe a many-to-one function F from the set of n^n functions from [n] to [n] to the set of labeled trees on [n] such that the preimage of each labeled tree contains n^2 functions.

First, recall that a permutation of the elements of a set S may be viewed simultaneously as a linear arrangement of the elements of S and as a product of disjoint oriented cycles of the objects of S. In the present context we want S to be a set of disjoint rooted trees on [n] that use precisely all the elements of [n]. But there are many such sets S, and the general result we need is that the number of linear arrangements of disjoint rooted trees on [n] that use precisely all the elements of [n] is the same as the number of collections of disjoint oriented cycles of disjoint rooted trees on [n] that use precisely all the elements of [n].

To each function f we may associate its functional digraph, which has an arc from i to f(i) for each i in [n]. Every (weakly) connected component of a functional digraph (i.e., connected component of the underlying undirected graph) can be represented by an oriented cycle of rooted trees, so that the cycles corresponding to different components are disjoint and all the components use the elements in [n], each exactly once. Clearly there are n^n functions from [n] to [n], each corresponding uniquely to a functional digraph which is represented by a collection of disjoint oriented cycles of rooted trees on [n] that together use each element in [n] exactly once. Each collection of disjoint oriented cycles of rooted trees (using each element of [n] exactly once) corresponds uniquely to a linear arrangement of a collection of rooted trees (using each element of [n] exactly once). Hence n^n is the number of linear arrangements of rooted trees on [n] (by which we always mean that the rooted trees in a given linear arrangement use each element of [n] exactly once).

We claim now that n^n = n^2 · t_n, where t_n is the number of (labeled) trees on [n]. It is clear that n^2 · t_n is the number of triples (x, y, T), where x, y ∈ [n] and T is a tree on [n]. Given such a triple, we obtain a linear arrangement of rooted trees by removing all arcs on the unique path from x to y, taking the nodes on this path to be the roots of the trees that remain, and ordering these trees by the order of their roots in the original path from x to y. In this way each labeled tree corresponds to n^2 linear arrangements of rooted trees on [n].
1.11 The Matrix-Tree Theorem

The matrix-tree theorem expresses the number of spanning trees in a graph as the determinant of an appropriate matrix, from which we obtain one more proof of Cayley's theorem counting labeled trees. The main ingredient in the proof is the following theorem, known as the Cauchy-Binet Theorem. It is more commonly stated and applied with the diagonal matrix Δ below taken to be the identity matrix. However, the generality given here actually simplifies the proof.
Theorem 1.11.1  Let A and B be, respectively, r × m and m × r matrices, with r ≤ m. Let Δ be the m × m diagonal matrix with entry e_i in the (i, i)-position. For an r-subset S of [m], let A_S and B_S denote, respectively, the r × r submatrices of A and B consisting of the columns of A, or the rows of B, indexed by the elements of S. Then

det(AΔB) = Σ_S det(A_S) det(B_S) Π_{i∈S} e_i,

where the sum is over all r-subsets S of [m].
Proof: We prove the theorem assuming that e_1, . . . , e_m are independent (commuting) indeterminates over F. Of course it will then hold for all values of e_1, . . . , e_m in F.

Recall that if C = (c_{ij}) is any r × r matrix over F, then

det(C) = Σ_{σ∈S_r} sgn(σ) c_{1σ(1)} c_{2σ(2)} ⋯ c_{rσ(r)}.

Given that A = (a_{ij}) and B = (b_{ij}), the (i,j)-entry of AΔB is Σ_{k=1}^m a_{ik} e_k b_{kj}, and this is a linear form in the indeterminates e_1, . . . , e_m. Hence det(AΔB) is a homogeneous polynomial of degree r in e_1, . . . , e_m. Suppose that det(AΔB) has a monomial e_1^{t_1} e_2^{t_2} ⋯ where the number of indeterminates e_i that have t_i > 0 is less than r. Substitute 0 for the indeterminates e_i that do not appear in e_1^{t_1} e_2^{t_2} ⋯, i.e., that have t_i = 0. This will not affect the monomial e_1^{t_1} e_2^{t_2} ⋯ or its coefficient in det(AΔB). But after this substitution Δ has rank less than r, so AΔB has rank less than r, implying that det(AΔB) must be the zero polynomial. Hence we see that the coefficient of a monomial in the polynomial det(AΔB) is zero unless that monomial is the product of r distinct indeterminates e_i, i.e., unless it is of the form Π_{i∈S} e_i for some r-subset S of [m].

The coefficient of a monomial Π_{i∈S} e_i in det(AΔB) is found by setting e_i = 1 for i ∈ S, and e_i = 0 for i ∉ S. When this substitution is made in Δ, AΔB evaluates to A_S B_S. So the coefficient of Π_{i∈S} e_i in det(AΔB) is det(A_S) det(B_S).
Exercise: 1.11.2  Let M be an n × n matrix all of whose line sums are zero. Then one of the eigenvalues of M is λ_1 = 0. Let λ_2, . . . , λ_n be the other eigenvalues of M. Show that all principal (n−1) × (n−1) submatrices have the same determinant, and that this value is (1/n) λ_2 λ_3 ⋯ λ_n.
An incidence matrix N of a directed graph H is a matrix whose rows are indexed by the vertices V of H, whose columns are indexed by the edges E of H, and whose entries are defined by:

N(x, e) = 0 if x is not incident with e, or e is a loop;
N(x, e) = 1 if x is the head of e;
N(x, e) = −1 if x is the tail of e.
Lemma 1.11.3  If H has k components, then rank(N) = |V| − k.

Proof: N has v = |V| rows. The rank of N is v − n, where n is the dimension of the left null space of N, i.e., the dimension of the space of row vectors g for which gN = 0. But if e is any edge, directed from x to y, then the entry of gN indexed by e is g(y) − g(x). Hence gN = 0 iff g is constant on each component of H, which says that n is the number k of components of H.
Lemma 1.11.4  Let A be a square matrix that has at most two nonzero entries in each column, at most one 1 in each column, at most one −1 in each column, and whose entries are all either 0, 1 or −1. Then det(A) is 0, 1 or −1.

Proof: This follows by induction on the number of rows. If every column has both a 1 and a −1, then the sum of all the rows is zero, so the matrix is singular and det(A) = 0. Otherwise, expand the determinant along a column with at most one nonzero entry, to find that it is equal to 0 or ±1 times the determinant of a smaller matrix with the same property.
Corollary 1.11.5  Every square submatrix of an incidence matrix of a directed graph has determinant 0 or ±1. (Such a matrix is called totally unimodular.)
Theorem 1.11.6 (The Matrix-Tree Theorem)  The number of spanning trees in a connected graph G on n vertices and without loops is the determinant of any (n−1) × (n−1) principal submatrix of the matrix D − A, where A is the adjacency matrix of G and D is the diagonal matrix whose diagonal contains the degrees of the corresponding vertices of G.
Proof: First let H be a connected digraph with n vertices and with incidence matrix N. H must have at least n − 1 edges, because it is connected and must have a spanning tree, so we may let S be a set of n − 1 edges. Using the notation of the Cauchy-Binet Theorem, consider the n × (n−1) submatrix N_S of N whose columns are indexed by elements of S. By Lemma 1.11.3, N_S has rank n − 1 iff the spanning subgraph of H with S as edge set is connected, i.e., iff S is the edge set of a spanning tree in H. Let N′ be obtained by dropping any single row of the incidence matrix N. Since the sum of all rows of N (or of N_S) is zero, the rank of N′_S is the same as the rank of N_S. Hence we have the following:

det(N′_S) = ±1 if S is the edge set of a spanning tree in H, and det(N′_S) = 0 otherwise.  (1.14)
Now let G be a connected loopless graph on n vertices. Let H be any digraph obtained by orienting G, and let N be an incidence matrix of H. Then we claim NN^T = D − A. For,

(NN^T)_{xy} = Σ_{e∈E(G)} N(x, e) N(y, e) = deg(x) if x = y, and −t if x ≠ y and x and y are joined by t edges in G.

An (n−1) × (n−1) principal submatrix of D − A is of the form N′N′^T, where N′ is obtained from N by dropping any one row. By Cauchy-Binet,

det(N′N′^T) = Σ_S det(N′_S) det(N′^T_S) = Σ_S (det(N′_S))^2,

where the sum is over all (n−1)-subsets S of the edge set. By Eq. 1.14 this is the number of spanning trees of G.
Exercise: 1.11.7 (Cayley's Theorem Again)  In the Matrix-Tree Theorem, take G to be the complete graph K_n. Here the matrix D − A is nI − J, where I is the identity matrix of order n, and J is the n × n matrix of all 1's. Now calculate the determinant of any (n−1) × (n−1) principal submatrix of this matrix to obtain another proof that K_n has n^{n−2} spanning trees.
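The determinant in Exercise 1.11.7 can be evaluated numerically with exact arithmetic. A Python sketch (our own check, using rational elimination so no floating-point error enters):

```python
from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination over the rationals (exact)."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    d = Fraction(1)
    for i in range(n):
        pivot = next((r for r in range(i, n) if A[r][i] != 0), None)
        if pivot is None:
            return 0
        if pivot != i:
            A[i], A[pivot] = A[pivot], A[i]
            d = -d
        d *= A[i][i]
        for r in range(i + 1, n):
            factor = A[r][i] / A[i][i]
            for c in range(i, n):
                A[r][c] -= factor * A[i][c]
    return int(d)

def spanning_tree_count_Kn(n):
    # D - A for K_n is nI - J; delete one row and column, take the determinant.
    M = [[n - 1 if i == j else -1 for j in range(n - 1)] for i in range(n - 1)]
    return det(M)

assert all(spanning_tree_count_Kn(n) == n ** (n - 2) for n in range(2, 9))
```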
Exercise: 1.11.8  In the statement of the Matrix-Tree Theorem it is not necessary to use principal subdeterminants. If the (n−1) × (n−1) submatrix M is obtained by deleting the i-th row and j-th column from D − A, then the number of spanning trees is (−1)^{i+j} det(M). This follows from the more general lemma: If A is an (n−1) × n matrix whose row sums are all equal to 0 and if A_j is obtained by deleting the j-th column of A, 1 ≤ j ≤ n, then det(A_j) = −det(A_{j+1}).
1.12 Number Theoretic Functions
An arithmetic function (sometimes called a number theoretic function) is a function whose domain is the set 𝒫 of positive integers and whose range is a subset of the complex numbers ℂ. Hence ℂ^𝒫 is just the set of all arithmetic functions. If f is an arithmetic function that is not the zero function, f is said to be multiplicative provided f(mn) = f(m)f(n) whenever (m, n) = 1, and to be totally multiplicative provided f(mn) = f(m)f(n) for all m, n ∈ 𝒫. The following examples will be of special interest to us here.
Example 1.12.1  I(1) = 1 and I(n) = 0 if n > 1.

Example 1.12.2  U(n) = 1 for all n ∈ 𝒫.

Example 1.12.3  E(n) = n for all n ∈ 𝒫.

Example 1.12.4  The omega function: ω(n) is the number of distinct primes dividing n.

Example 1.12.5  The mu function: μ(n) = (−1)^{ω(n)} if n is square-free, and μ(n) = 0 otherwise.

Example 1.12.6  Euler's phi-function: φ(n) is the number of integers k, 1 ≤ k ≤ n, with (k, n) = 1.

The following additional examples often arise in practice.

Example 1.12.7  The Omega function: Ω(n) is the number of primes dividing n counting multiplicity. So Ω(n) = ω(n) iff n is square-free.

Example 1.12.8  The tau function: τ(n) is the number of positive divisors of n.

Example 1.12.9  The sigma function: σ(n) is the sum of the positive divisors of n.

Example 1.12.10  A generalization of the sigma function: σ_k(n) is the sum of the k-th powers of the positive divisors of n.
Dirichlet (convolution) Product of Arithmetic Functions.

Def. If f and g are arithmetic functions, define the Dirichlet product f∗g by:

(f∗g)(n) = Σ_{d|n} f(d) g(n/d) = Σ_{d_1 d_2 = n} f(d_1) g(d_2).
Obs. 1.12.11  f∗g = g∗f.

Obs. 1.12.12  If f, g, h are arithmetic functions, then (f∗g)∗h = f∗(g∗h), and

[(f∗g)∗h](n) = Σ_{d_1 d_2 d_3 = n} f(d_1) g(d_2) h(d_3).
Obs. 1.12.13  I∗f = f∗I = f for all f. And I is the unique multiplicative identity.

Obs. 1.12.14  An arithmetic function f has a (necessarily unique) multiplicative inverse f^{−1} iff f(1) ≠ 0.

Proof: If f∗f^{−1} = I, then f(1) f^{−1}(1) = (f∗f^{−1})(1) = I(1) = 1, so f(1) ≠ 0. Conversely, if f(1) ≠ 0, then f^{−1}(1) = (f(1))^{−1}. Use induction on n. For n > 1, if f^{−1}(1), f^{−1}(2), . . . , f^{−1}(n−1) are known, f^{−1}(n) may be obtained from 0 = I(n) = (f∗f^{−1})(n) = Σ_{d|n} f(d) f^{−1}(n/d).
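The recursion in the proof of Obs. 1.12.14 is effective: it computes f^{−1}(n) one value at a time. A Python sketch (our own illustration; function names are ours) that recovers the Möbius function as the Dirichlet inverse of U:

```python
def dirichlet(f, g, N):
    """Dirichlet product (f*g)(n) = sum over d|n of f(d) g(n/d), for n = 1..N.
    Arithmetic functions are dicts indexed by 1..N."""
    h = {n: 0 for n in range(1, N + 1)}
    for d in range(1, N + 1):
        for n in range(d, N + 1, d):
            h[n] += f[d] * g[n // d]
    return h

def dirichlet_inverse(f, N):
    """Inverse via 0 = sum_{d|n} f(d) f^{-1}(n/d) for n > 1 (Obs. 1.12.14).
    For simplicity this sketch assumes f(1) = 1, which keeps everything integral."""
    assert f[1] == 1
    inv = {1: 1}
    for n in range(2, N + 1):
        inv[n] = -sum(f[d] * inv[n // d] for d in range(2, n + 1) if n % d == 0)
    return inv

N = 60
U = {n: 1 for n in range(1, N + 1)}
I = {n: (1 if n == 1 else 0) for n in range(1, N + 1)}
mu = dirichlet_inverse(U, N)            # so U^{-1} is the Mobius function
assert dirichlet(U, mu, N) == I
assert [mu[n] for n in [1, 2, 3, 4, 6, 12, 30]] == [1, -1, -1, 0, 1, 0, -1]
```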
The following theorem has essentially been proved.

Theorem 1.12.15  The set of all arithmetic functions f with f(1) ≠ 0 forms a group under Dirichlet multiplication.
Theorem 1.12.16  The set of all multiplicative functions is a subgroup.

Proof: f(1) ≠ 0 ≠ g(1) implies (f∗g)(1) ≠ 0. Associativity holds by Obs. 1.12.12. The identity I is clearly multiplicative. So suppose f, g are multiplicative. Let (m, n) = 1. Then

(f∗g)(mn) = Σ_{d|mn} f(d) g(mn/d)
= Σ_{d_1|m} Σ_{d_2|n} f(d_1 d_2) g(mn/(d_1 d_2))
= Σ_{d_1|m} Σ_{d_2|n} f(d_1) f(d_2) g(m/d_1) g(n/d_2)
= (Σ_{d_1|m} f(d_1) g(m/d_1)) · (Σ_{d_2|n} f(d_2) g(n/d_2))
= (f∗g)(m) · (f∗g)(n).

Finally, we need to show that if f is multiplicative, in which case f^{−1} exists, then also f^{−1} is multiplicative. Define g as follows. Put g(1) = 1, and for every prime p and every j > 0 put g(p^j) = f^{−1}(p^j). Then extend g multiplicatively to all n ∈ 𝒫. Since f and g are both multiplicative, so is f∗g. Then for any prime power p^k,

(f∗g)(p^k) = Σ_{d_1 d_2 = p^k} f(d_1) g(d_2) = Σ_{d_1 d_2 = p^k} f(d_1) f^{−1}(d_2) = (f∗f^{−1})(p^k) = I(p^k).

So f∗g and I coincide on prime powers and are multiplicative. Hence f∗g = I, implying g = f^{−1}, i.e., f^{−1} is multiplicative.
Clearly μ is multiplicative, and Σ_{d|n} μ(d) = 1 if n = 1. For n = p^e,

Σ_{d|n} μ(d) = Σ_{j=0}^e μ(p^j) = 1 + (−1) + 0 + ⋯ + 0 = 0.

Hence μ∗U = I and we have proved the following:
Obs. 1.12.17  μ^{−1} = U;  U^{−1} = μ.
Theorem 1.12.18 (Möbius Inversion)  F = U∗f iff f = μ∗F.

This follows from μ^{−1} = U and associativity. In its more usual form it appears as:

F(n) = Σ_{d|n} f(d) for all n ∈ 𝒫  iff  f(n) = Σ_{d|n} μ(d) F(n/d) for all n ∈ 𝒫.

NOTE: Here we sometimes say F is the sum function of f. When F and f are related this way it is interesting to note that F is multiplicative if and only if f is multiplicative. For if f is multiplicative, then F = U∗f is multiplicative. Conversely, if F = U∗f is multiplicative, then f = μ∗F is also multiplicative.
Exercise: 1.12.19
1. τ = U∗U is multiplicative, and τ(n) = Π_{p^α||n} (α + 1).
2. φ = μ∗E is multiplicative. (First show φ∗U = E.)
3. σ = U∗E is multiplicative, and σ(n) = Π_{p^α||n} (p^{α+1} − 1)/(p − 1).
4. σ = φ∗τ.
5. σ∗φ = E∗E.
6. E^{−1}(n) = n·μ(n).
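The convolution identities in the exercise can be tested numerically before proving them. A Python sketch (our own check; names are ours):

```python
from math import gcd

N = 100
divisors = {n: [d for d in range(1, n + 1) if n % d == 0] for n in range(1, N + 1)}

def conv(f, g):
    """Dirichlet product of two arithmetic functions, tabulated for n = 1..N."""
    return {n: sum(f(d) * g(n // d) for d in divisors[n]) for n in range(1, N + 1)}

U = lambda n: 1
E = lambda n: n
tau = lambda n: len(divisors[n])
sigma = lambda n: sum(divisors[n])
phi = lambda n: sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

assert conv(U, U) == {n: tau(n) for n in range(1, N + 1)}      # tau = U*U
assert conv(U, E) == {n: sigma(n) for n in range(1, N + 1)}    # sigma = U*E
assert conv(phi, U) == {n: n for n in range(1, N + 1)}         # phi*U = E
```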
Sometimes it is useful to have even more structure on ℂ^𝒫. For f, g ∈ ℂ^𝒫, define the sum of f and g as follows:

(f + g)(n) = f(n) + g(n).

Then a large part of the following theorem has already been proved and the remainder is left as an exercise.

Theorem 1.12.20  With the above definitions of addition and convolution product, (ℂ^𝒫, +, ∗) is a commutative ring with unity I, and f ∈ ℂ^𝒫 is a unit iff f(1) ≠ 0.
Exercise: 1.12.21  For g ∈ ℂ^𝒫, define g̃ ∈ ℂ^𝒫 by g̃(n) = n·g(n). Show that the map g ↦ g̃ is a ring automorphism. In particular, (g̃)^{−1} is the image of g^{−1} under this map.
Exercise: 1.12.22 In how many ways can a necklace with n beads be
formed out of beads labeled L, R, 1, 2, . . . , m so there is at least one L,
and the Ls and Rs alternate (so that the number of Ls is the same as the
number of Rs)?
1.13 Inclusion-Exclusion

Let E be a set of N objects. Let a_1, a_2, . . . , a_m be a set of m properties that these objects may or may not have. In general these properties are not mutually exclusive. Let A_i be the set of objects in E that have property a_i. In fact, it could even happen that A_i and A_j are the same set even when i and j are different. Let N(a_i) be the number of objects that have the property a_i. Let N(a_i') be the number of objects that do not have property a_i. Then N(a_i a_j') denotes the number of objects that have property a_i but do not have property a_j. It is easy to see how to generalize this notation and to establish identities such as the following:

N = N(a_i) + N(a_i');   N = N(a_i a_j) + N(a_i a_j') + N(a_i' a_j) + N(a_i' a_j').  (1.15)
We now introduce some additional notation.

s_0 = N;
s_1 = N(a_1) + N(a_2) + ⋯ + N(a_m) = Σ_i N(a_i);
s_2 = N(a_1 a_2) + N(a_1 a_3) + ⋯ + N(a_{m−1} a_m) = Σ_{i<j} N(a_i a_j);
s_3 = N(a_1 a_2 a_3) + ⋯ + N(a_{m−2} a_{m−1} a_m) = Σ_{1≤i<j<k≤m} N(a_i a_j a_k);
. . .
s_m = N(a_1 a_2 ⋯ a_m).
Also,

e_0 = N(a_1' a_2' ⋯ a_m');
e_1 = N(a_1 a_2' a_3' ⋯ a_m') + N(a_1' a_2 a_3' ⋯ a_m') + ⋯ + N(a_1' a_2' ⋯ a_{m−1}' a_m);
. . .
e_m = N(a_1 a_2 ⋯ a_m).

In other words, e_i is the number of objects that have exactly i of the properties.
Theorem 1.13.1  For 0 ≤ r ≤ m, we have

e_r = Σ_{j=0}^{m−r} (−1)^j C(r+j, j) s_{r+j}.

Proof: Clearly, if an object has fewer than r of the properties, then it contributes zero to both sides of the equation. Suppose an object has exactly r of the properties. Then it contributes exactly 1 to the left hand side. On the right hand side it contributes 1 to the term with j = 0 and 0 to all the other terms. Suppose an object has exactly r + k properties with 0 < k ≤ m − r, so it contributes exactly 0 to the left hand side. On the right hand side it contributes exactly C(r+k, r+j) to s_{r+j}. So the total count it contributes to the right hand side is

Σ_{j=0}^{m−r} (−1)^j C(r+j, j) C(r+k, r+j).

Notice that

C(r+j, j) C(r+k, r+j) = [(r+j)!/(r! j!)] · [(r+k)!/((r+j)!(k−j)!)] = [(r+k)!/(r! k!)] · [k!/(j!(k−j)!)] = C(r+k, r) C(k, j).

Thus the total count on the right hand side is

C(r+k, r) Σ_{j=0}^k (−1)^j C(k, j) = 0.

This concludes the proof.
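Theorem 1.13.1 can be exercised on randomly generated property sets, comparing both sides against brute-force counts. A Python sketch (our own check; the random data is merely illustrative):

```python
from itertools import combinations
from math import comb
from random import Random

rng = Random(1)
m, N = 4, 30
# Random property sets A_1..A_m over objects 0..N-1.
A = [set(x for x in range(N) if rng.random() < 0.4) for _ in range(m)]

def s(k):
    """s_k: sum over k-subsets of properties of N(a_{i_1} ... a_{i_k})."""
    if k == 0:
        return N
    return sum(len(set.intersection(*(A[i] for i in idx)))
               for idx in combinations(range(m), k))

def e(r):
    """e_r: number of objects with exactly r of the properties, counted directly."""
    return sum(1 for x in range(N) if sum(x in Ai for Ai in A) == r)

for r in range(m + 1):
    assert e(r) == sum((-1) ** j * comb(r + j, j) * s(r + j) for j in range(m - r + 1))
```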
Put S(x) = Σ_{i=0}^m s_i x^i and E(x) = Σ_{i=0}^m e_i x^i. Using Theorem 1.13.1 we see that

E(x) = Σ_{r=0}^m e_r x^r = Σ_{r=0}^m [Σ_{j=0}^{m−r} (−1)^j C(r+j, j) s_{r+j}] x^r
= Σ_{0≤r≤m, 0≤j≤m−r} (−1)^j C(r+j, j) x^r s_{r+j}
= Σ_{k=0}^m s_k [Σ_{r=0}^k (−1)^{k−r} C(k, k−r) x^r]
= Σ_{k=0}^m s_k (x−1)^k = S(x−1).

This proves the following:

Theorem 1.13.2  E(x) = S(x−1).
Of course, it now follows that E(x + 1) = S(x), from which we easily
deduce the following results:
s_j = Σ_{k=j}^m C(k, j) e_k.  (1.16)

E(0) = S(−1) = Σ_{i=0}^m (−1)^i s_i.  (1.17)

(1/2)[E(1) + E(−1)] = Σ_i e_{2i} = (1/2)[s_0 + Σ_{j=0}^m (−2)^j s_j].  (1.18)

(1/2)[E(1) − E(−1)] = Σ_i e_{2i+1} = (1/2)[s_0 − Σ_{j=0}^m (−2)^j s_j].  (1.19)

Equation 1.17 is the traditional inclusion-exclusion principle. Equation 1.18 gives the number of objects having an even number of properties, and Equation 1.19 gives the number of objects having an odd number of properties. We can also easily find a formula for the number of objects having at least t of the properties.
Exercise: 1.13.3  For k ≥ 0, t ≥ 0, show that Σ_{j=0}^k (−1)^j C(t+k, j) = (−1)^k C(t+k−1, k).
Theorem 1.13.4  The number of objects having at least t of the m properties is given by

Σ_{r≥t} e_r = Σ_{j=0}^{m−t} (−1)^j C(t+j−1, t−1) s_{t+j}.

Proof: The proof of this result amounts to collecting terms appropriately in the left hand sum and observing that the coefficient on s_{t+j} is Σ_{i=0}^j (−1)^i C(t+j, i), which by Ex. 1.13.3 is equal to (−1)^j C(t+j−1, t−1). In detail, and using Theorem 1.13.1,

Σ_{r=t}^m e_r = Σ_{r=t}^m Σ_{j=0}^{m−r} (−1)^j C(r+j, j) s_{r+j}.

Here t and m are fixed with 0 ≤ t ≤ m, and r and j are dummy variables with t ≤ r ≤ m and 0 ≤ j ≤ m − r. We introduce new dummy variables k and i by the invertible substitution r = t + i, j = k − i. The constraints on k and i are 0 ≤ i ≤ k ≤ m − t. So continuing, we have

Σ_{r=t}^m e_r = Σ_{0≤i≤k≤m−t} (−1)^{k−i} C(t+k, k−i) s_{t+k}
= Σ_{k=0}^{m−t} [Σ_{i=0}^k (−1)^{k−i} C(t+k, k−i)] s_{t+k}
= Σ_{k=0}^{m−t} [Σ_{j=0}^k (−1)^j C(t+k, j)] s_{t+k}
= Σ_{k=0}^{m−t} (−1)^k C(t+k−1, k) s_{t+k}.
If we replace each property a_i with its negation a_i', and use the notation s̄_i, S̄(x), ē_i, etc., for the corresponding analogues of s_i, e_i, etc., we can see how to write S̄(x) in terms of the s_i.

Theorem 1.13.5  S̄(x) = Σ_{r=0}^m [Σ_k (−1)^k C(m−k, m−r) s_k] x^r.
Proof: It is clear that ē_i = e_{m−i}, i.e., Ē(x) is the reverse x^m E(1/x) of E(x). Recall that S(x) = E(x+1). Then

S̄(x) = Ē(x+1) = (x+1)^m E(1/(x+1)) = (x+1)^m S(1/(x+1) − 1) = (x+1)^m S(−x/(x+1)) = Σ_{k=0}^m (−x)^k (x+1)^{m−k} s_k.

The coefficient of x^r in this expression is

[x^r] (Σ_k (−x)^k (x+1)^{m−k} s_k) = Σ_k [x^{r−k}] ((−1)^k (x+1)^{m−k} s_k) = Σ_k (−1)^k C(m−k, r−k) s_k,

and C(m−k, r−k) = C(m−k, m−r), from which the theorem follows.
Application to derangements: Let D_n be the number of permutations π = b_1 ⋯ b_n of {1, 2, . . . , n} for which b_i ≠ i for all i, i.e., D_n is the number of derangements of n things. Here the N objects are the n! permutations. The property a_i is defined by: the permutation has property a_i provided b_i = i. Then N(a_{i_1} ⋯ a_{i_r}) = (n−r)!, and s_j = C(n, j)(n−j)! = n!/j!. So

e_0 = Σ_{p=0}^n (−1)^p n!/p! = n! Σ_{p=0}^n (−1)^p/p!.
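The derangement formula is easy to confront with a direct enumeration. A Python sketch (our own check; names are ours):

```python
from itertools import permutations
from math import factorial

def derangements_formula(n):
    """D_n = sum_{p=0}^{n} (-1)^p n!/p!, as derived above."""
    return sum((-1) ** p * factorial(n) // factorial(p) for p in range(n + 1))

def derangements_bruteforce(n):
    """Count permutations of {0,...,n-1} with no fixed point."""
    return sum(1 for pi in permutations(range(n))
               if all(pi[i] != i for i in range(n)))

assert all(derangements_formula(n) == derangements_bruteforce(n) for n in range(8))
```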
Problème des Rencontres: Let D_{n,r} be the number of permutations π = b_1 ⋯ b_n with exactly r fixed elements, i.e., b_j = j for exactly r values of j. Choose the r fixed symbols in C(n, r) ways, and multiply by the number D_{n−r} of derangements on the remaining n − r elements:

D_{n,r} = [n!/(r!(n−r)!)] · (n−r)! (1/0! − 1/1! + 1/2! − ⋯ + (−1)^{n−r}/(n−r)!).  Or,

D_{n,r} = Σ_{p=0}^{n−r} (−1)^p C(r+p, r) n!/(r+p)! = (n!/r!) Σ_{p=0}^{n−r} (−1)^p/p!.
Application to Euler's phi function: Let $n = p_1^{a_1} \cdots p_r^{a_r}$ be the prime power factorization of the positive integer $n$. Apply the Inclusion-Exclusion Principle with $E = [n] = \{1, \ldots, n\}$, and let $a_i$ be the property (of a positive integer) that it is divisible by $p_i$, $1 \le i \le r$. This yields
$$\phi(n) = n - \sum_{i=1}^{r}\frac{n}{p_i} + \sum_{1 \le i < j \le r}\frac{n}{p_i p_j} - \cdots = n\prod_{i=1}^{r}\left(1 - \frac{1}{p_i}\right).$$
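The product form makes $\phi$ easy to compute (a sketch using trial division, adequate for small $n$; not from the text):

```python
from math import gcd

def phi(n):
    # phi(n) = n * prod over distinct primes p | n of (1 - 1/p),
    # evaluated with integer arithmetic as result // p * (p - 1)
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            result = result // p * (p - 1)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:                      # leftover prime factor
        result = result // m * (m - 1)
    return result

def phi_brute(n):
    # definition: count of 1 <= k <= n coprime to n
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

assert all(phi(n) == phi_brute(n) for n in range(1, 200))
```

The division `result // p` is exact at each step because $p$ divides the current value of `result`.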
1.14 Rook Polynomials
Let $C$ be an $n \times m$ matrix, $1 \le n \le m$, each of whose entries is a 0 or a 1. A line of $C$ is a row or column of $C$. An independent $k$-set of $C$ is a set of $k$ 1s of $C$ with no two on the same line. Given the matrix $C$, for $0 \le k \le n$ we let $r_k(C) = r_k$ be the number of independent $k$-sets of $C$. A diagonal of $C$ is a set of $n$ entries of $C$ with no two in the same line, i.e., a set of $n$ entries of $C$ with one in each row and no two in the same column. Let $E$ be the set of diagonals of $C$ and let $N = |E| = m(m-1)\cdots(m-n+1)$ be the number of diagonals of $C$. Let $a_j$ be the property (that a diagonal may or may not have) that the entry in row $j$ of the diagonal is a 1.
If we select an independent $j$-set of $C$ (in $r_j$ ways) and then in the remaining $n-j$ rows select $n-j$ entries in $(m-j)(m-j-1)\cdots(m-n+1)$ ways, we see that, using the notation of the inclusion-exclusion principle, we have
$$s_j = r_j(m-j)(m-j-1)\cdots(m-n+1) = \frac{(m-j)!}{(m-n)!}r_j. \qquad (1.20)$$
The number of diagonals of $C$ with exactly $j$ 1s is clearly the same as the number of diagonals of $C$ with exactly $n-j$ 0s. If $J$ is the $n \times m$ (0,1)-matrix each entry of which is a 1, then $\bar C = J - C$ is the complement of $C$. If $E(x) = \sum_{i=0}^{n} e_i x^i$, where $e_i$ is the number of diagonals of $C$ having exactly $i$ 1s of $C$, then the reverse polynomial is given by
$$\bar E(x) = x^n E\left(\frac{1}{x}\right) = \sum_{i=0}^{n} e_i x^{n-i} = \sum_{i=0}^{n} e_{n-i} x^i,$$
where $\bar e_i$ is the number of diagonals of $C$ with exactly $i$ 0s of $C$. Clearly $\bar e_i = e_{n-i}$. Applying Eq. 1.20 to $\bar C$ and arguing as in the proof of Theorem 1.13.5,
$$\sum_k \frac{(m-k)!}{(m-n)!}\bar r_k x^k = \sum_k \bar s_k x^k = \bar S(x) = \bar E(x+1) = (x+1)^n E\left(\frac{1}{x+1}\right)$$
$$= (x+1)^n E\left(1 + \frac{-x}{x+1}\right) = (x+1)^n S\left(\frac{-x}{x+1}\right) = \frac{(x+1)^n}{(m-n)!}\sum_i r_i (m-i)!\left(\frac{-x}{x+1}\right)^i$$
$$= \sum_i r_i \frac{(m-i)!}{(m-n)!}(-x)^i (x+1)^{n-i}.$$
The coefficient of $x^k$ in the first term of this sequence of equal expressions is clearly $\frac{(m-k)!}{(m-n)!}\bar r_k$. The coefficient of $x^k$ in the last term is
$$[x^k]\left[\sum_i r_i \frac{(m-i)!}{(m-n)!}(-x)^i(x+1)^{n-i}\right] = \sum_i [x^{k-i}]\left[r_i \frac{(m-i)!}{(m-n)!}(-1)^i (x+1)^{n-i}\right] = \sum_i (-1)^i \frac{(m-i)!}{(m-n)!}\binom{n-i}{k-i}r_i.$$
Hence we have established the following theorem.
Theorem 1.14.1
$$\bar r_k = \sum_i (-1)^i \frac{(m-i)!}{(m-k)!}\binom{n-i}{n-k} r_i.$$
There is a special case that is often used.
Corollary 1.14.2 Suppose $m = n$. Then
$$\bar r_n = \sum_i (-1)^i (n-i)!\, r_i.$$
For a given $n \times m$ (0,1)-matrix $C$ we continue to let $r_k$ denote the number of independent $k$-sets of $C$, and let
$$R_C(x) = R(x) = \sum_{k=0}^{n} r_k x^k$$
be the ordinary generating function of the sequence $(r_0, r_1, r_2, \ldots)$. Then $R(x)$ is the rook polynomial of the given matrix.
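Since $r_k$ just counts independent $k$-sets, the whole rook polynomial can be computed by brute force for small boards (an illustrative Python sketch, exponential in the number of 1s; `rook_poly` is my name):

```python
from itertools import combinations

def rook_poly(C):
    # returns coefficients [r_0, r_1, ...]: r_k = number of ways to choose
    # k ones of C with no two in the same row or column
    ones = [(i, j) for i, row in enumerate(C) for j, v in enumerate(row) if v]
    n = min(len(C), len(C[0]))
    coeffs = []
    for k in range(n + 1):
        count = 0
        for s in combinations(ones, k):
            rows = {i for i, _ in s}
            cols = {j for _, j in s}
            if len(rows) == k and len(cols) == k:   # no shared line
                count += 1
        coeffs.append(count)
    return coeffs

# identity matrix I_4: r_k = C(4, k), so R(x) = (1 + x)^4
I4 = [[int(i == j) for j in range(4)] for i in range(4)]
print(rook_poly(I4))  # [1, 4, 6, 4, 1]
```

This helper is handy for checking the worked examples and exercises in this section.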
If $C$ is the direct sum of two (0,1)-matrices $C_1$ and $C_2$, i.e., no line of $C$ contains 1s in both $C_1$ and $C_2$, it is easy to see that the independent sets of $C_1$ are completely independent of the independent sets of $C_2$. It follows that $r_k(C) = \sum_{j=0}^{k} r_j(C_1)\, r_{k-j}(C_2)$, and hence
$$R_C(x) = R_{C_1}(x)\, R_{C_2}(x). \qquad (1.21)$$
It is also easy to see that if some one line of $C$ contains all the 1s of $C$, then $R_C(x) = 1 + ax$, where $a$ is the number of 1s of $C$.
Suppose that in a given matrix $C$ an entry $1_{ij}$ (in row $i$ and column $j$) is selected and marked as a special entry. Let $C'$ denote the matrix obtained by deleting row $i$ and column $j$ of the matrix $C$, and let $C''$ denote the matrix obtained by replacing the entry $1_{ij}$ of $C$ with a 0. Then the independent $k$-sets of $C$ are naturally divided into two classes: those that have a 1 in row $i$ and column $j$ and those that do not. The number of independent $k$-sets of the first type is $r_{k-1}(C')$ and the number of the second type is $r_k(C'')$. Hence we have the relation
$$r_k(C) = r_{k-1}(C') + r_k(C'').$$
It is now easy to see that we have
$$R_C(x) = x\, R_{C'}(x) + R_{C''}(x). \qquad (1.22)$$
Equation 1.22 is called the expansion formula. The rook polynomial of a (0,1)-matrix of arbitrary size and shape may be found by repeated applications of the expansion formula. To facilitate giving an example of this, let
$$\begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1' \end{pmatrix}$$
denote the rook polynomial of the displayed matrix, where the $1'$ indicates the entry about which the expansion formula is about to be applied. Then by the expansion formula we have (since we write the matrix to mean its rook polynomial)
$$R_C(x) = x\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1' \end{pmatrix} + \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1' & 1 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$
$$= x\left[x\begin{pmatrix} 1 & 1 \end{pmatrix} + \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1' & 0 \end{pmatrix}\right] + \left[x\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} + \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}\right]$$
$$= x^2(1 + 2x) + x\left[x\begin{pmatrix} 1 & 0 \end{pmatrix} + \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}\right] + x(1+x)^2 + (1+2x)^2$$
$$= x^2 + 2x^3 + x^2(1+x) + x(1+2x) + x + 2x^2 + x^3 + 1 + 4x + 4x^2$$
$$= 4x^3 + 10x^2 + 6x + 1.$$
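The outcome of the expansion can be verified directly by counting independent $k$-sets (an illustrative Python check; the helper `r` is mine, not from the text):

```python
from itertools import combinations

def r(C, k):
    # number of ways to choose k ones of C, no two in a row or column
    ones = [(i, j) for i, row in enumerate(C) for j, v in enumerate(row) if v]
    return sum(1 for s in combinations(ones, k)
               if len({i for i, _ in s}) == k and len({j for _, j in s}) == k)

C = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1]]
print([r(C, k) for k in range(4)])  # [1, 6, 10, 4], i.e. 1 + 6x + 10x^2 + 4x^3
```

This matches the polynomial obtained above by repeated expansion.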
Exercise: Compute the rook polynomials of the following matrices:
a. $\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$.
b. $\begin{pmatrix} 0 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 1 & 0 \end{pmatrix}$. (Answer: $1 + 6x + 7x^2 + x^3$.)
c. $\begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 \end{pmatrix}$.
1.15 Permutations with Forbidden Positions
Consider the distribution of four distinct objects, labeled a, b, c, and d, into four distinct positions, labeled 1, 2, 3, and 4, with no two objects occupying the same position. A distribution can be represented in the form of a matrix as illustrated below, where the rows correspond to the objects and the columns correspond to the positions. A 1 in a cell indicates that the object in the row containing the cell occupies the position in the column containing the cell. Thus, the distribution shown in the figure is as follows: a is placed in the second position, b is placed in the fourth position, c is placed in the first position, and d is placed in the third position.
$$\begin{matrix} a \\ b \\ c \\ d \end{matrix}\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$
Since an object cannot be placed in more than one position and a position cannot hold more than one object, in the matrix representation of an acceptable distribution there will never be more than one 1 in a row or column. Hence an acceptable distribution is equivalent to an independent 4-set of 1s.
We can extend this notion to the case where there are forbidden positions for each of the objects. For example, for the derangement of four objects, the forbidden positions are just those along the main diagonal. Also, it is easy to see that $r_k(I) = \binom{4}{k}$, so that $R_I(x) = (1+x)^4$. Hence the problem of enumerating the number of derangements of four objects is equivalent to the problem of finding the value of $r_4$ for the complementary matrix
$$\begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix}.$$
By Theorem 1.14.1,
$$r_4 = \sum_{i=0}^{4}(-1)^i\frac{(4-i)!}{(4-4)!}\binom{4-i}{4-4}r_i = \sum_{i=0}^{4}(-1)^i(4-i)!\binom{4}{i} = 4!\sum_{i=0}^{4}(-1)^i\frac{1}{i!}.$$
Of course, this agrees with the usual formula.
Nontaking Rooks: A chess piece called a rook can capture any opponent's piece in the same row or column of the given rook (provided there are no intervening pieces). Instead of using a normal $8 \times 8$ chessboard, suppose we play chess on the board consisting solely of those positions of an $n \times m$ (0,1)-matrix where the 1s appear. Counting the number of ways to place $k$ mutually nontaking rooks on this board of entries equal to 1 is equivalent to our earlier problem of counting the number of independent $k$-sets of 1s in the matrix. Consider the example represented by the following matrix:
$$B = \begin{pmatrix} 1 & 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 0 \end{pmatrix}$$
Thus $r_k(B)$ counts the number of ways $k$ nontaking rooks can be placed in those entries of $B$ equal to 1. The $5 \times 5$ matrix $B$ could be considered to have arisen from a job assignment problem. The rows correspond to workers, the columns to jobs, and the $(i,j)$ entry is a 1 provided worker $i$ is suitable for job $j$. We wish to determine the number of ways in which each worker can be assigned to one job, no more than one worker per job, so that a worker only gets a job to which he or she is suited. It is easy to see that this is equivalent to the problem of computing $r_5(B)$. Since there are several more 1s than 0s, it might be easier to deal with the complementary matrix
$$B' = \begin{pmatrix} 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$
Easy Exercise: Show that if the matrix $C'$ is obtained from matrix $C$ by deleting rows or columns with no entries equal to 1, then $r_k(C') = r_k(C)$.
Let $B''$ be the matrix obtained by deleting column 1 and row 4 from $B'$. So
$$B'' = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
By Equation 1.21, we see that $R_{B''}(x) = R_{C_1}(x)\, R_{C_2}(x)$, where
$$C_1 = C_2 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$
We easily compute $R_{C_1}(x) = 1 + 3x + x^2$, so
$$R_{B''}(x) = R_{B'}(x) = (1 + 3x + x^2)^2 = 1 + 6x + 11x^2 + 6x^3 + x^4.$$
Now using (the dual of) Theorem 1.14.1 we find
$$r_5(B) = \sum_i (-1)^i\frac{(5-i)!}{(5-5)!}\binom{5-i}{5-5}r_i = \sum_i (-1)^i(5-i)!\, r_i$$
$$= 5!\cdot 1 - 4!\cdot 6 + 3!\cdot 11 - 2!\cdot 6 + 1!\cdot 1 - 0!\cdot 0 = 31.$$
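The value 31 can be confirmed directly from $B$ by enumerating all $5!$ diagonals (illustrative Python, not from the text):

```python
from itertools import permutations

B = [[1, 1, 0, 0, 1],
     [1, 1, 1, 0, 1],
     [1, 0, 1, 1, 0],
     [1, 1, 1, 1, 1],
     [1, 1, 1, 1, 0]]

# r_5(B): complete assignments worker -> job, i.e. diagonals of B that are all 1s
r5 = sum(1 for p in permutations(range(5)) if all(B[i][p[i]] for i in range(5)))
print(r5)  # 31
```

For a 0-1 matrix this quantity is just the permanent of $B$.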
The next example is a $5 \times 7$ matrix $B$ that arises from a problem of storing computer programs. The $(i,j)$ position is a 1 provided storage location $j$ has sufficient storage capacity for program $i$. We wish to assign each program to a storage location with sufficient storage capacity, at most one program per location. The number of ways this can be done is again given by $r_5(B)$.
$$B = \begin{pmatrix} 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix}$$
Exercise: Compute $r_5(B)$.
Problème des Ménages: A ménage is a permutation of $1, 2, \ldots, n$ in which $i$ does not appear in position $i$ or $i+1 \pmod{n}$. Let $P_n$ be the permutation matrix with a 1 in position $(i, i+1 \pmod{n})$, $1 \le i \le n$. So $P_n$ represents the cycle $(1, 2, 3, \ldots, n)$. Then let $M_n = I_n + P_n$, and let $M_n(x)$ be the rook polynomial of $M_n$. If $J_n$ is the $n \times n$ matrix of 1s, then $e_n(J_n - I_n - P_n)$ is the number $U_n$ of ménages.
Let $M_n^*$ be obtained from $M_n$ by changing the 1 in position $(n,1)$ to a 0, and let $M_n^0$ be obtained from $M_n$ by changing both 1s of column 1 to 0s (i.e., the 1s in positions $(1,1)$ and $(n,1)$ become 0). It should be clear after a little thought (using the expansion formula and the fact that a matrix and its transpose have the same rook polynomial) that
$$M_n(x) = M_n^*(x) + x\, M_{n-1}^*(x).$$
Since $M_n^0$ has only zeros in its first column, $M_n^0(x) = \widetilde M_n^0(x)$, where
$$\widetilde M_n^0 = \begin{pmatrix} 1 & & & \\ 1 & 1 & & \\ & 1 & \ddots & \\ & & \ddots & 1 \\ & & & 1 \end{pmatrix}$$
with 1s in positions $(1,1), (2,1), (2,2), (3,2), (3,3), \ldots, (n-1,n-2), (n,n-1)$. Here $\widetilde M_n^0$ is $n \times (n-1)$. Now select the 1 in position $(n,n-1)$ to use the expansion theorem for rook polynomials. So deleting the row and column containing $1_{(n,n-1)}$, we get $\widetilde M_{n-1}^0$. Also, replacing $1_{(n,n-1)}$ with a zero and then removing the bottom row of 0s, we get $(M_{n-1}^*)^T$. Since a matrix and its transpose have the same rook polynomial, we have
$$M_n^0(x) = x\, M_{n-1}^0(x) + M_{n-1}^*(x). \qquad (1.23)$$
$M_n^*$ has one 1 in its bottom row, in position $(n,n)$. Expand about this 1. Deleting the row and column of this 1 gives $M_{n-1}^*$. Changing this 1 to a 0 and deleting the bottom row of zeros gives the transpose of $\widetilde M_n^0$. Hence
$$M_n^*(x) = x\, M_{n-1}^*(x) + M_n^0(x) = (x+1)M_{n-1}^*(x) + x\, M_{n-1}^0(x). \qquad (1.24)$$
Rewrite the last two equations as a matrix equation:
$$\begin{pmatrix} M_n^0(x) \\ M_n^*(x) \end{pmatrix} = \begin{pmatrix} x & 1 \\ x & x+1 \end{pmatrix}\begin{pmatrix} M_{n-1}^0(x) \\ M_{n-1}^*(x) \end{pmatrix}. \qquad (1.25)$$
$$M_1^* = (1), \quad M_1^*(x) = 1 + x; \qquad M_1^0 = (0), \quad M_1^0(x) = 1.$$
$$M_2^* = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad M_2^*(x) = 1 + 3x + x^2; \qquad M_2^0 = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}, \quad M_2^0(x) = 1 + 2x.$$
Induction Hypothesis (for $n \ge 1$):
$$M_n^0(x) = \sum_{k=0}^{n-1}\binom{2(n-1)-k+1}{k}x^k; \qquad M_n^*(x) = \sum_{k=0}^{n}\binom{2n-k}{k}x^k.$$
Then
$$\begin{pmatrix} x & 1 \\ x & x+1 \end{pmatrix}\begin{pmatrix} M_n^0(x) \\ M_n^*(x) \end{pmatrix} = \begin{pmatrix} \sum_{k=0}^{n-1}\binom{2(n-1)-k+1}{k}x^{k+1} + \sum_{k=0}^{n}\binom{2n-k}{k}x^k \\[1ex] \sum_{k=0}^{n-1}\binom{2(n-1)-k+1}{k}x^{k+1} + \sum_{k=0}^{n}\binom{2n-k}{k}(x^{k+1}+x^k) \end{pmatrix}$$
$$= \begin{pmatrix} \sum_{k=1}^{n}\binom{2n-k}{k-1}x^k + \sum_{k=0}^{n}\binom{2n-k}{k}x^k \\[1ex] \sum_{k=1}^{n}\binom{2n-k}{k-1}x^k + \sum_{k=1}^{n+1}\binom{2n-k+1}{k-1}x^k + \sum_{k=0}^{n}\binom{2n-k}{k}x^k \end{pmatrix}.$$
By Pascal's identity the first coordinate is $\sum_{k=0}^{n}\binom{2n-k+1}{k}x^k = M_{n+1}^0(x)$ and the second is $\sum_{k=0}^{n+1}\binom{2n-k+2}{k}x^k = M_{n+1}^*(x)$, which completes the induction.
At this point we can easily compute that
$$M_n(x) = M_n^*(x) + x\, M_{n-1}^*(x) = \sum_{j=0}^{n}\frac{2n}{2n-j}\binom{2n-j}{j}x^j,$$
i.e.,
$$r_j(I+P) = \frac{2n}{2n-j}\binom{2n-j}{j}, \qquad s_j(I+P) = (n-j)!\,\frac{2n}{2n-j}\binom{2n-j}{j},$$
so that
$$U_n = e_n(J - I - P) = e_0(I+P) = \sum_{j=0}^{n}(-1)^j(n-j)!\,\frac{2n}{2n-j}\binom{2n-j}{j}.$$
1.16 Recurrence Relations: Ménage Numbers Again
The original Problème des Ménages was probably that formulated by Lucas. This asks for the number of ways of seating $n$ married couples at a circular table with men and women in alternate positions and such that no wife sits next to her husband. The wives may be seated first, and this may be done in $2 \cdot n!$ ways. Then each husband is excluded from the two seats beside his wife, but the number of ways of seating the husbands is independent of the seating arrangement of the wives. Thus if $M_n$ denotes the number of seating arrangements for this version of the problème des ménages, it is clear that
$$M_n = 2 \cdot n!\, U_n.$$
Consequently we may concentrate our attention on the ménage numbers $U_n$. The formula we derived using rook polynomials will now be obtained using recursion techniques.
Lemma 1.16.1 Let $f(n,k)$ denote the number of ways of selecting $k$ objects, no two consecutive, from $n$ objects arranged in a row. Then
$$f(n,k) = \binom{n-k+1}{k}. \qquad (1.26)$$
Proof: Clearly
$$f(n,1) = \binom{n}{1} = n,$$
and for $n > 1$,
$$f(n,n) = \binom{1}{n} = 0.$$
Now let $1 < k < n$. Split the selections into those that include the first object and those that do not. Those that include the first object cannot include the second object and are enumerated by $f(n-2, k-1)$. Those that do not include the first object are enumerated by $f(n-1, k)$. Hence we have the recurrence
$$f(n,k) = f(n-1,k) + f(n-2,k-1). \qquad (1.27)$$
We may now prove Eq. 1.26 by strong induction on $n$. Our induction hypothesis includes the assertions that
$$f(n-1,k) = \binom{n-k}{k}; \qquad f(n-2,k-1) = \binom{n-k}{k-1}.$$
These together with Eq. 1.27 clearly imply Eq. 1.26.
Lemma 1.16.2 Let $g(n,k)$ denote the number of ways of selecting $k$ objects, no two consecutive, from $n$ objects arranged in a circle. Then
$$g(n,k) = \frac{n}{n-k}\binom{n-k}{k} \qquad (n > k).$$
Proof: As before, split the selections into those that include the first object and those that do not. The selections that include the first object cannot include the second object or the last object and are enumerated by $f(n-3, k-1)$. The selections that do not include the first object are enumerated by $f(n-1, k)$. Hence
$$g(n,k) = f(n-1,k) + f(n-3,k-1),$$
and Lemma 1.16.2 is an easy consequence of Lemma 1.16.1.
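Both lemmas are easy to sanity-check by enumeration (illustrative Python; the function names are mine):

```python
from itertools import combinations
from math import comb

def f(n, k):
    # selections of k pairwise non-consecutive objects from n in a row
    return comb(n - k + 1, k)

def g(n, k):
    # same, but the n objects are arranged in a circle (requires n > k)
    return comb(n - k, k) * n // (n - k)

def brute(n, k, circular):
    def ok(s):
        if any(b - a == 1 for a, b in zip(s, s[1:])):
            return False                      # two consecutive objects chosen
        # on a circle, objects 0 and n-1 are also neighbors
        return not (circular and s and s[0] == 0 and s[-1] == n - 1)
    return sum(1 for s in combinations(range(n), k) if ok(s))

for n in range(2, 10):
    for k in range(1, n):
        assert f(n, k) == brute(n, k, circular=False)
        assert g(n, k) == brute(n, k, circular=True)
```

The division in `g` is exact: $\frac{n}{n-k}\binom{n-k}{k}$ is always an integer for $n > k$.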
Now return again to the consideration of the permutations of $1, 2, \ldots, n$. Let $a_i$ be the property that a permutation has $i$ in position $i$, $1 \le i \le n$, and let $b_i$ be the property that a permutation has $i$ in position $i+1$, $1 \le i \le n-1$, with $b_n$ the property that the permutation has $n$ in position 1. Now let the $2n$ properties be listed in a row:
$$a_1, b_1, a_2, b_2, \ldots, a_n, b_n.$$
Select $k$ of these properties and ask for the number of permutations that satisfy each of the $k$ properties. The answer is 0 if the $k$ properties are not compatible. If they are compatible, then $k$ images under the permutation are fixed and there are $(n-k)!$ ways to complete the permutation. Let $v_k$ denote the number of ways of selecting $k$ compatible properties from the $2n$ properties. Then by the classical inclusion-exclusion principle,
$$U_n = \sum_{i=0}^{n}(-1)^i v_i (n-i)!. \qquad (1.28)$$
It remains to evaluate $v_k$. But we see that if the $2n$ properties are arranged in a circle, then only the consecutive ones are not compatible. Hence by Lemma 1.16.2,
$$v_k = \frac{2n}{2n-k}\binom{2n-k}{k}, \qquad (1.29)$$
and this completes the proof.
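The resulting formula for $U_n$ can be checked against a direct enumeration (illustrative Python; `menage_U` and `menage_brute` are my names, and positions are 0-indexed in the code):

```python
from itertools import permutations
from math import comb, factorial

def menage_U(n):
    # U_n = sum_{k=0}^{n} (-1)^k * (2n/(2n-k)) * C(2n-k, k) * (n-k)!
    total = 0
    for k in range(n + 1):
        v_k = 2 * n * comb(2 * n - k, k) // (2 * n - k)   # exact: v_k is an integer
        total += (-1)**k * v_k * factorial(n - k)
    return total

def menage_brute(n):
    # permutations b of 0..n-1 with b[i] != i and b[i] != (i+1) mod n
    return sum(1 for b in permutations(range(n))
               if all(b[i] != i and b[i] != (i + 1) % n for i in range(n)))

for n in range(2, 8):
    assert menage_U(n) == menage_brute(n)
print([menage_U(n) for n in range(2, 8)])  # [0, 1, 2, 13, 80, 579]
```

The values $U_3 = 1$, $U_4 = 2$, $U_5 = 13$, $U_6 = 80$ agree with the classical ménage numbers.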
Chapter 2
Systems of Representatives and
Matroids
2.1 The Theorem of Philip Hall
The material of this chapter does not belong to enumerative combinatorics, but it is of such fundamental importance in the general field of combinatorics that we feel impelled to include it.
Let $S$ and $I$ be arbitrary sets. For each $i \in I$ let $A_i \subseteq S$. If $a_i \in A_i$ for all $i \in I$, we say $\{a_i : i \in I\}$ is a system of representatives for $\mathcal{A} = (A_i : i \in I)$. If in addition $a_i \ne a_j$ whenever $i \ne j$, even though $A_i$ may equal $A_j$, then $\{a_i : i \in I\}$ is a system of distinct representatives (SDR) for $\mathcal{A}$. Our first problem is: Under what conditions does some family $\mathcal{A}$ of subsets of a set $S$ have an SDR?
For a finite collection of sets a reasonable answer was given by Philip Hall in 1935. It is obvious that if $\mathcal{A} = (A_i : i \in I)$ has an SDR, then the union of each $k$ of the members of $\mathcal{A} = (A_i : i \in I)$ must have at least $k$ elements. Hall's observation was that this obvious necessary condition is also sufficient. We state the condition formally as follows:
Condition (H): Let $I = [n] = \{1, 2, \ldots, n\}$, and let $S$ be any (nonempty) set. For each $i \in I$, let $S_i \subseteq S$. Then $\mathcal{A} = (S_1, \ldots, S_n)$ satisfies Condition (H) provided for each $K \subseteq I$, $|\cup_{k \in K} S_k| \ge |K|$.
Theorem 2.1.1 The family $\mathcal{A} = (S_1, \ldots, S_n)$ of finitely many (not necessarily distinct) sets has an SDR if and only if it satisfies Condition (H).
Proof: As Condition (H) is clearly necessary, we now show that it is also sufficient. $B_{r,s}$ denotes a block of $r$ subsets $(S_{i_1}, \ldots, S_{i_r})$ belonging to $\mathcal{A}$, where $s = |\cup\{S_j : S_j \in B_{r,s}\}|$. So Condition (H) says: $s \ge r$ for each block $B_{r,s}$. If $s = r$, $B_{r,s}$ is called a critical block. (By convention, the empty block $B_{0,0}$ is critical.)
If $B_{r,s} = (A_1, \ldots, A_u, C_{u+1}, \ldots, C_r)$ and $B_{t,v} = (A_1, \ldots, A_u, D_{u+1}, \ldots, D_t)$, write $B_{r,s} \cap B_{t,v} = (A_1, \ldots, A_u)$ and $B_{r,s} \cup B_{t,v} = (A_1, \ldots, A_u, C_{u+1}, \ldots, C_r, D_{u+1}, \ldots, D_t)$. Here the notation implies that $A_1, \ldots, A_u$ are precisely the subsets in both blocks. Then write $B_{r,s} \cap B_{t,v} = B_{u,w}$, where $w = |\cup\{A_i : 1 \le i \le u\}|$, and $B_{r,s} \cup B_{t,v} = B_{y,z}$, where $y = r + t - u$, $z = |\cup\{S_i : S_i \in B_{r,s} \cup B_{t,v}\}|$.
The proof will be by induction on the number $n$ of sets in the family $\mathcal{A}$, but first we need two lemmas.
The proof will be by induction on the number n of sets in the family /,
but rst we need two lemmas.
Lemma 2.1.2 If / satises Condition (H), then the union and intersection
of critical blocks are themselves critical blocks.
Proof of Lemma 2.1.2. Let B
r,r
and B
t,t
be given critical blocks. Say B
r,r

B
t,t
= B
u,v
; B
r,r
B
t,t
= B
y,z
. The z elements of the union will be the r + t
elements of B
r,r
and B
t,t
reduced by the number of elements in both blocks,
and this latter number includes at least the v elements in the intersection:
z r +t v. Also v u and z y by Condition (H). Note: y +u = r +t.
Hence r +t v z y = r +t u r +t v, implying that equality holds
throughout. Hence u = v and y = z as desired for the proof of Lemma 2.1.2
.
Lemma 2.1.3 If $B_{k,k}$ is any critical block of $\mathcal{A}$, the deletion of elements of $B_{k,k}$ from all sets in $\mathcal{A}$ not belonging to $B_{k,k}$ produces a new family $\mathcal{A}'$ in which Condition (H) is still valid.
Proof of Lemma 2.1.3: Let $B_{r,s}$ be an arbitrary block, and $(B_{r,s})' = B'_{r,s'}$ the block after the deletion. We must show that $s' \ge r$. Let $B_{r,s} \cap B_{k,k} = B_{u,v}$ and $B_{r,s} \cup B_{k,k} = B_{y,z}$. Say
$$B_{r,s} = (A_1, \ldots, A_u, C_{u+1}, \ldots, C_r), \qquad B_{k,k} = (A_1, \ldots, A_u, D_{u+1}, \ldots, D_k).$$
So $B_{u,v} = (A_1, \ldots, A_u)$ and $B_{y,z} = (A_1, \ldots, A_u, C_{u+1}, \ldots, C_r, D_{u+1}, \ldots, D_k)$. The deleted block $(B_{r,s})' = B'_{r,s'}$ is $(A_1, \ldots, A_u, C'_{u+1}, \ldots, C'_r)$. But $C_{u+1}, \ldots, C_r$, as blocks of the union $B_{y,z}$, contain at least $z - k$ elements not in $B_{k,k}$. Thus
$$s' \ge v + (z - k) \ge u + y - k = u + (r + k - u) - k = r.$$
Hence $s' \ge r$, as desired for the proof of Lemma 2.1.3.
As indicated above, for the proof of the main theorem we now use induction on $n$. For $n = 1$ the theorem is obviously true.
Induction Hypothesis: Suppose the theorem holds (Condition (H) implies that there is an SDR) for any family of $m$ sets, $1 \le m < n$.
We need to show the theorem holds for a system of $n$ sets. So let $1 < n$, assume the induction hypothesis, and let $\mathcal{A} = (S_1, \ldots, S_n)$ be a given collection of subsets of $S$ satisfying Condition (H).
First Case: There is some critical block $B_{k,k}$ with $1 \le k < n$. Delete the elements in the members of $B_{k,k}$ from the remaining subsets, to obtain a new family $\mathcal{A}' = B_{k,k} \cup B'_{n-k,v}$, where $B_{k,k}$ and $B'_{n-k,v}$ have no common elements in their members. By Lemma 2.1.3, Condition (H) holds in $\mathcal{A}'$, and hence holds separately in $B_{k,k}$ and in $B'_{n-k,v}$ viewed as families of sets. By the induction hypothesis, $B_{k,k}$ and $B'_{n-k,v}$ have (disjoint) SDRs whose union is an SDR for $\mathcal{A}$.
Remaining Case: There is no critical block for $\mathcal{A}$ except possibly the entire system. Select any $S_j$ of $\mathcal{A}$ and then select any element of $S_j$ as its representative. Delete this element from all remaining sets to obtain a family $\mathcal{A}'$. Hence a block $B_{r,s}$ with $r < n$ becomes a block $B'_{r,s'}$ with $s' \ge s - 1$. By hypothesis $B_{r,s}$ was not critical, so $s \ge r + 1$ and $s' \ge r$. So Condition (H) holds for the family $\mathcal{A}' \setminus \{S_j\}$, which by induction has an SDR. Add to this SDR the element selected as a representative for $S_j$ to obtain an SDR for $\mathcal{A}$.
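Condition (H) and the existence of an SDR can be checked by brute force on small families (illustrative Python, exponential in the number of sets; the function names are mine):

```python
from itertools import combinations, permutations

def hall_condition(sets):
    # Condition (H): every k of the sets cover at least k elements
    n = len(sets)
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            if len(set().union(*(sets[i] for i in idx))) < k:
                return False
    return True

def find_sdr(sets):
    # brute-force search for a system of distinct representatives
    elements = sorted(set().union(*sets))
    for perm in permutations(elements, len(sets)):
        if all(perm[i] in sets[i] for i in range(len(sets))):
            return list(perm)
    return None

A = [{1, 2}, {2, 3}, {1, 3}, {1, 2, 3, 4}]
assert hall_condition(A) and find_sdr(A) is not None
B = [{1, 2}, {1, 2}, {1, 2}]   # three sets, only two elements: (H) fails
assert not hall_condition(B) and find_sdr(B) is None
```

In agreement with Theorem 2.1.1, an SDR exists exactly when Condition (H) holds.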
In the text by van Lint and Wilson, Theorem 5.3 gives a lower bound on the number of SDRs for a family of sets that depends only on the sizes of the sets. It is as follows.
Theorem 5.3 of van Lint and Wilson: Let $\mathcal{A} = (S_0, S_1, \ldots, S_{n-1})$ be a family of $n$ sets that does have an SDR. Put $m_i = |S_i|$ and suppose that $m_0 \le m_1 \le \cdots \le m_{n-1}$. Then the number of SDRs for $\mathcal{A}$ is greater than or equal to
$$F_n(m_0, m_1, \ldots, m_{n-1}) := \prod_{i=0}^{n-1}(m_i - i)^*,$$
where $(a)^* := \max\{1, a\}$.
They leave as an exercise the problem of showing that this is the best possible lower bound depending only on the sizes of the sets.
Exercise: 2.1.4 Let $\mathcal{A} = (A_1, \ldots, A_n)$ be a family of subsets of $\{1, \ldots, n\}$. Suppose that the incidence matrix of the family is invertible. Show that the family has an SDR.
Exercise: 2.1.5 Prove the following generalization of Hall's Theorem: Let $\mathcal{A} = (A_1, \ldots, A_n)$ be a family of subsets of $X$ that satisfies the following property: There is an integer $r$ with $0 \le r < n$ for which the union of each subfamily of $k$ subsets of $\mathcal{A}$, for all $k$ with $0 \le k \le n$, has at least $k - r$ elements. Then there is a subfamily of size $n - r$ which has an SDR. (Hint: Start by adding $r$ dummy elements that belong to all the sets.)
Exercise: 2.1.6 Let $G$ be a (finite, undirected, simple) graph with vertex set $V$. Let $C = \{C_x : x \in V\}$ be a family of sets indexed by the vertices of $G$. For $X \subseteq V$, let $C_X = \cup_{x \in X} C_x$. A set $X \subseteq V$ is $C$-colorable if one can assign to each vertex $x \in X$ a color $c_x \in C_x$ so that $c_x \ne c_y$ whenever $x$ and $y$ are adjacent in $G$. Prove that if $|C_X| \ge |X|$ whenever $X$ induces a connected subgraph of $G$, then $V$ is $C$-colorable. (In the current literature of graph theory, the sets assigned to the vertices are called lists, and the desired proper coloring of $G$ chosen from the lists is a list coloring of $G$. When $G$ is a complete graph, this exercise gives precisely Hall's Theorem on SDRs. A current research topic in graph theory is the investigation of modifications of this condition that suffice for the existence of list colorings.)
Exercise: 2.1.7 With the same notation of the previous exercise, prove that if every proper subset of $V$ is $C$-colorable and $|C_V| \ge |V|$, then $V$ is $C$-colorable.
We now interpret the SDR problem as one on matchings in bipartite graphs. Let $G = (X, Y, E)$ be a bipartite graph. For each $S \subseteq X$, let $N(S)$ denote the set of elements of $Y$ connected to at least one element of $S$ by an edge, and put $\delta(S) = |S| - |N(S)|$. Put $\delta(G) = \max\{\delta(S) : S \subseteq X\}$. Since $\delta(\emptyset) = 0$, clearly $\delta(G) \ge 0$. Then Hall's theorem states that $G$ has an $X$-saturating matching if and only if $\delta(G) = 0$.
Theorem 2.1.8 $G$ has a matching of size $t$ (or larger) if and only if $t \le |X| - \delta(S)$ for all $S \subseteq X$.
Proof: First note that Hall's theorem says that $G$ has a matching of size $t = |X|$ if and only if $\delta(S) \le 0$ for all $S \subseteq X$, iff $|X| \le |X| - \delta(S)$ for all $S \subseteq X$. So our theorem is true in case $t = |X|$. Now suppose that $t < |X|$. Form a new graph $G' = (X, Y \cup Z, E')$ by adding new vertices $Z = \{z_1, \ldots, z_{|X|-t}\}$ to $Y$, and join each $z_i$ to each element of $X$ by an edge of $G'$.
If $G$ has a matching of size $t$, then $G'$ has a matching of size $|X|$, implying that for all $S \subseteq X$,
$$|S| \le |N'(S)| = |N(S)| + |X| - t,$$
implying
$$|N(S)| \ge |S| - |X| + t = t - (|X| - |S|) = t - |X \setminus S|.$$
This is also equivalent to $t \le |X| - (|S| - |N(S)|) = |X| - \delta(S)$.
Conversely, suppose $|N(S)| \ge t - |X \setminus S| = t - (|X| - |S|)$. Then $|N'(S)| = |N(S)| + |X| - t \ge (t - |X| + |S|) + |X| - t = |S|$. By Hall's theorem, $G'$ has an $X$-saturating matching $M$. At most $|X| - t$ edges of $M$ join $X$ to $Z$, so at least $t$ edges of $M$ are from $X$ to $Y$.
Note that $t \le |X| - \delta(S)$ for all $S \subseteq X$ iff $t \le \min_{S \subseteq X}(|X| - \delta(S)) = |X| - \max_{S \subseteq X}\delta(S) = |X| - \delta(G)$.
Corollary 2.1.9 The largest matching of $G$ has size $|X| - \delta(G) = m(G)$, i.e., $m(G) + \delta(G) = |X|$.
2.2 An Algorithm for SDRs
Suppose sets $S_1, \ldots, S_n$ are given and we have picked an SDR $A_r = \{a_1, \ldots, a_r\}$ of $S_1, \ldots, S_r$ in any way. Here is how to find an SDR for $S_1, \ldots, S_r, S_{r+1}$, or to determine that $S_1, \ldots, S_r, S_{r+1}$ does not satisfy Hall's Condition (H).
Construct ordered sets $T_1, T_2, \ldots$, etc., as follows. Put $T_1 = S_{r+1} = \{b_1, \ldots, b_t\}$. If some $b_i$ is not yet used in $A_r$, let $a_{r+1} = b_i$. Otherwise, assume that all the elements $b_1, \ldots, b_t$ are already in $A_r$. Form $T_2$ as follows. First, let $S(b_1)$ denote the $S_j$ for which $b_1 = a_j$. Then
$$T_2 = \{\check b_1, b_2, \ldots, b_t;\ b_{t+1}, \ldots, b_s\},$$
where $b_{t+1}, \ldots, b_s$ are the elements in $S(b_1)$ not already in $T_1$.
If some one of $b_{t+1}, \ldots, b_s$ is not in $A_r$, use it to represent $S(b_1)$ and use $b_1$ to represent $S_{r+1}$. Leave the other $a_i$'s as before.
Each list $T_j$ looks like $T_j = \{\check b_1, \check b_2, \ldots, \check b_k, b_{k+1}, \ldots, b_m\}$. If all members of $T_j$ are in $A_r$, construct
$$T_{j+1} = \{\check b_1, \ldots, \check b_k, \check b_{k+1}, b_{k+2}, \ldots, b_m, \text{(list here any members of } S(\check b_{k+1}) \text{ not already listed)}\}.$$
If some $b_{m+s}$, $s > 0$, is not in $A_r$, let $b_{m+s}$ represent $S(b_{k+1})$. Then if $b_{k+1} \in S(b_j) \setminus S(b_{j-1})$, let $b_{k+1}$ represent $S(b_j)$. And if $b_j \in S(b_i) \setminus S(b_{i-1})$, let $b_j$ represent $S(b_i)$. Then $b_i \in S(b_u) \setminus S(b_{u-1})$; let $b_i$ represent $S(b_u)$. Eventually, working down subscripts, some $b_p \in S(b_j)$ with $b_j \in T_1$. Let $b_j$ represent $S_{r+1}$, and let $b_p$ represent $S(b_j)$. (Each $b_j$ is in some $S(b_i)$ with $i < j$.)
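The bookkeeping above is easier to absorb as code. The following sketch is my own rendering of the augmenting search (not the author's notation): for each newly listed element it remembers the element whose set produced it, and walks that chain backwards to reassign representatives. It is exercised on the job data of Exercise 2.2.1 below.

```python
def extend_sdr(sets, rep):
    # rep is a partial SDR: rep[i] represents sets[i] for i < r, r = len(rep).
    # Try to extend it to sets[r] by an augmenting search in the spirit of
    # the lists T_1, T_2, ...; returns False if Condition (H) fails here.
    r = len(rep)
    used = {e: i for i, e in enumerate(rep)}    # element -> index of set it represents
    parent = {}                                 # element -> element whose set listed it
    frontier = sorted(sets[r])                  # T_1 = S_{r+1}
    seen = set(frontier)
    while frontier:
        b = frontier.pop(0)
        if b not in used:
            # b is free: walk the chain back, reassigning representatives
            while b in parent:
                i = used[parent[b]]
                rep[i], b = b, parent[b]
            rep.append(b)                       # the chain ends in an element of sets[r]
            return True
        for c in sorted(sets[used[b]] - seen):  # scan S(b) for unlisted elements
            parent[c] = b
            seen.add(c)
            frontier.append(c)
    return False

# data of Exercise 2.2.1: employee i+1 selects these two jobs
sets = [{1, 2}, {5, 6}, {2, 3}, {6, 7}, {3, 4}, {7, 6}, {4, 2}]
rep = [2, 6]                                    # P1 -> J2 and P2 -> J6 to start
while len(rep) < 7:
    assert extend_sdr(sets, rep)
print(rep)  # [1, 5, 3, 7, 4, 6, 2]
```

Each call either extends the partial SDR by one set or certifies that Condition (H) fails for the enlarged family.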
Exercise: 2.2.1 You have seven employees $P_1, \ldots, P_7$ and seven jobs $J_1, \ldots, J_7$. You ask each employee to select two jobs. They select jobs as given below. You start by assigning to $P_1$ the job $J_2$ and to $P_2$ the job $J_6$. Illustrate our algorithm for producing systems of distinct representatives (when they exist) to complete this job assignment (if it is possible).
$P_1$ selects jobs numbered 1 and 2.
$P_2$ selects jobs numbered 5 and 6.
$P_3$ selects jobs numbered 2 and 3.
$P_4$ selects jobs numbered 6 and 7.
$P_5$ selects jobs numbered 3 and 4.
$P_6$ selects jobs numbered 7 and 6.
$P_7$ selects jobs numbered 4 and 2.
2.3 Theorems of König and G. Birkhoff
Theorem 2.3.1 If the entries of a rectangular matrix are zeros and ones, the minimum number of lines (i.e., rows and columns) that contain all the ones is equal to the maximum number of ones that can be chosen with no two on a line.
Proof: Let $A = (a_{ij})$ be an $n \times t$ matrix of 0s and 1s. Let $m$ be the minimum number of lines containing all the 1s, and $M$ the maximum number of 1s no two on a line. Then trivially $m \ge M$, since no line can pass through two of the 1s counted by $M$. We need to show $M \ge m$.
Suppose a minimum covering by $m$ lines consists of $r$ rows and $s$ columns, where $r + s = m$. We may reorder rows and columns so these become the first $r$ rows and first $s$ columns. Without loss of generality assume $r \ge 1$. For $i = 1, \ldots, r$, put $S_i = \{j : a_{ij} = 1 \text{ and } j > s\}$. So $S_i$ indicates which columns beyond the first $s$ have a 1 in row $i$.
Claim: $\mathcal{A} = (S_1, \ldots, S_r)$ satisfies Condition (H). For suppose some $k$ of these sets contain together at most $k-1$ elements. Then these $k$ rows could be replaced by the appropriate $k-1$ (or fewer) columns, and all the 1s would still be covered by this choice of rows and columns. By the minimality of $m$ this is not possible! Hence $\mathcal{A}$ has an SDR corresponding to $r$ 1s in the first $r$ rows, no two in the same line and none in the first $s$ columns. By a dual argument (if $s \ge 1$), we may choose $s$ 1s, no two on a line, none in the first $r$ rows and all in the first $s$ columns. These $r + s = m$ 1s have no two on a line, so $m \le M$. If $s = 0$, i.e., $r = m$, just use the $r$ 1s to see $r = m \le M$.
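Both quantities in Theorem 2.3.1 can be computed by brute force on small matrices (an illustrative Python check, exponential in the size of the matrix; function names are mine):

```python
from itertools import combinations, permutations

def max_independent_ones(A):
    # maximum number of 1s with no two in the same row or column
    n, t = len(A), len(A[0])
    best = 0
    for k in range(1, min(n, t) + 1):
        for rows in combinations(range(n), k):
            for cols in permutations(range(t), k):
                if all(A[r][c] for r, c in zip(rows, cols)):
                    best = max(best, k)
    return best

def min_line_cover(A):
    # minimum number of rows plus columns that together contain every 1
    n, t = len(A), len(A[0])
    lines = [('r', i) for i in range(n)] + [('c', j) for j in range(t)]
    ones = [(i, j) for i in range(n) for j in range(t) if A[i][j]]
    for k in range(len(lines) + 1):
        for chosen in combinations(lines, k):
            rows = {i for kind, i in chosen if kind == 'r'}
            cols = {j for kind, j in chosen if kind == 'c'}
            if all(i in rows or j in cols for i, j in ones):
                return k
    return len(lines)

A = [[1, 0, 1, 0],
     [0, 1, 0, 0],
     [1, 0, 1, 0]]
assert max_independent_ones(A) == min_line_cover(A) == 3
```

The equality of the two functions on any 0-1 matrix is exactly König's theorem.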
Theorem 2.3.2 (Systems of Common Representatives) If a set $S$ is partitioned into a finite number $n$ of subsets in two ways, $S = A_1 + \cdots + A_n = B_1 + \cdots + B_n$, and if no $k$ of the $A$'s are contained in fewer than $k$ of the $B$'s, for each $k = 1, \ldots, n$, then there will be elements $x_1, \ldots, x_n$ that are simultaneously representatives of the $A$'s and $B$'s (maybe after reordering the $B$'s).
Proof: For each $A_i$, put $S_i = \{j : A_i \cap B_j \ne \emptyset\}$. The hypothesis of the theorem is just Condition (H) for the system $\mathcal{A} = (S_1, \ldots, S_n)$. Let $j_1, \ldots, j_n$ be an SDR for $\mathcal{A}$, and choose $x_i \in A_i \cap B_{j_i}$. Then $x_1, \ldots, x_n$ is simultaneously an SDR for both the $A$'s and the $B$'s.
Corollary 2.3.3 If $B$ is a finite group with (not necessarily distinct) subgroups $H$ and $K$, with $|H| = |K|$, then there is a set of elements of $B$ that are simultaneously representatives for right cosets of $H$ and left (or right!) cosets of $K$.
Exercise: 2.3.4 Sixteen (male-female) couples and a caller attend a square dance. At the door each dancer selects a name-tag of one of the colors red, blue, green, white. There are four tags of each color for males, and the same for females. As the tags are selected, each dancer fails to notice what color her/his partner selects. The caller is then given the job of constructing four squares with four (original!) couples each in such a way that in each square no two dancers of the same sex have tags of the same color. Show that this is possible no matter how the dancers select their name tags.
Corollary 2.3.5 (Theorem of G. Birkhoff) Let $A = (a_{ij})$ be an $n \times n$ matrix where the $a_{ij}$ are nonnegative real numbers such that each row and column has the same sum. Then $A$ is a sum of nonnegative multiples of permutation matrices.
Proof: A permutation matrix $P$ is a square matrix of 0s and 1s with a single 1 in each row and column. We are to prove that if $\sum_{i=1}^{n} a_{ij} = t = \sum_{j=1}^{n} a_{ij}$, $a_{ij} \ge 0$, then $A = \sum u_i P_i$, $u_i \ge 0$, each $P_i$ a permutation matrix. The proof is by induction on the number $w$ of nonzero entries $a_{ij}$.
If $A \ne 0$, then $w \ge n$. If $w = n$, then clearly (?) $A = tP$ for some permutation matrix $P$. So suppose $w > n$, and that the theorem has been established for all such matrices with fewer than $w$ nonzero entries. For each $i = 1, \ldots, n$, let $S_i$ be the set of $j$'s for which $a_{ij} > 0$.
Claim: $\mathcal{A} = (S_1, \ldots, S_n)$ satisfies Condition (H). For suppose some $k$ of the sets $S_{i_1}, \ldots, S_{i_k}$ contain together at most $k-1$ indices $j$. Then rows $i_1, \ldots, i_k$ have positive entries in at most $k-1$ columns. But adding these entries by rows we get $tk$, and adding by columns we get at most $(k-1)t$, an impossibility. Hence $\mathcal{A}$ has an SDR $j_1, \ldots, j_n$. This means that each of $a_{1j_1}, a_{2j_2}, \ldots, a_{nj_n}$ is positive. Put $P_1 = (c_{ij})$, where
$$c_{ij} = \begin{cases} 1, & \text{if } j = j_i \\ 0, & \text{otherwise.} \end{cases}$$
Put $u_1 = \min\{a_{ij_i} : 1 \le i \le n\}$. Then $A_1 = A - u_1 P_1$ is a matrix of nonnegative numbers in which each row and column sum is $t - u_1$. By the choice of $u_1$, $A_1$ has fewer nonzero entries than does $A$. Hence by the induction hypothesis there are permutation matrices $P_2, \ldots, P_s$ and nonnegative numbers $u_2, \ldots, u_s$ for which $A_1 = \sum_{j=2}^{s} u_j P_j$. So $A = \sum_{i=1}^{s} u_i P_i$, as desired.
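The induction in this proof is effectively an algorithm: find a positive diagonal, subtract the largest possible multiple of the corresponding permutation matrix, and repeat. A minimal sketch (my own, assuming rational entries so that `Fraction` gives exact arithmetic, and using brute-force search for the positive diagonal):

```python
from fractions import Fraction
from itertools import permutations

def birkhoff(A):
    # greedy decomposition of a matrix with equal row and column sums into
    # nonnegative multiples of permutation matrices, following the proof
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    terms = []
    while any(any(row) for row in A):
        # a diagonal of positive entries exists by the Condition (H) claim
        sigma = next(p for p in permutations(range(n))
                     if all(A[i][p[i]] > 0 for i in range(n)))
        u = min(A[i][sigma[i]] for i in range(n))
        for i in range(n):
            A[i][sigma[i]] -= u          # at least one entry becomes zero
        terms.append((u, sigma))
    return terms

A = [[3, 1, 2], [2, 3, 1], [1, 2, 3]]    # all row and column sums equal 6
for u, sigma in birkhoff(A):
    print(u, sigma)
```

Each iteration zeroes at least one entry, so the loop terminates, mirroring the induction on the number of nonzero entries.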
Exercise: 2.3.6 Let $n$ be a positive integer, and let $a_{ij}$ ($1 \le i, j \le n-1$) be real numbers in $\left(\frac{n-2}{(n-1)^2}, \frac{1}{n-1}\right)$ which are independent over the rationals and such that their rational span does not contain 1. Let $A$ be the matrix of order $n$ whose $(i,j)$ entry equals
$$\begin{cases} a_{ij} & \text{if } 1 \le i, j \le n-1; \\ 1 - \sum_{k=1}^{n-1} a_{ik} & \text{if } i \ne n \text{ and } j = n; \\ 1 - \sum_{k=1}^{n-1} a_{kj} & \text{if } j \ne n \text{ and } i = n; \\ 2 - n + \sum_{k=1}^{n-1}\sum_{l=1}^{n-1} a_{kl} & \text{if } i = j = n. \end{cases}$$
Show that $A$ is a doubly stochastic matrix of order $n$. Show that $A$ cannot be expressed as the nonnegative linear combination of $n^2 - 2n + 1$ permutation matrices.
Theorem 2.3.7 Let $A = (a_{ij})$ be a doubly stochastic matrix of order $n$ with $f(A)$ fully indecomposable components and $\#(A)$ nonzero entries. Then $A$ is the nonnegative linear combination of $\#(A) - 2n + f(A) + 1$ permutation matrices.

Proof: The proof is by induction on $\#(A)$. Since $A$ is doubly stochastic, $\#(A) \ge n$. If $\#(A) = n$, then $A$ is a permutation matrix, and since $\#(A) - 2n + f(A) + 1 = n - 2n + n + 1 = 1$, the theorem holds in this case. Now assume that $\#(A) > n$. Let $k$ and $l$ be integers such that $a_{kl}$ is a smallest positive entry of $A$. Since $A$ is doubly stochastic there exists a permutation matrix $P = (p_{ij})$ such that if $p_{ij} = 1$ then $a_{ij} > 0$, and $p_{kl} = 1$. (By the proof of the claim in the proof of Cor. 2.3.5.) Since $\#(A) > n$,
$a_{kl} \ne 1$. Let $B = \left(\frac{1}{1 - a_{kl}}\right)(A - a_{kl}P)$. Then $B$ is a doubly stochastic matrix with $\#(B) < \#(A)$. By induction $B$ is a nonnegative linear combination of $\#(B) - 2n + f(B) + 1$ permutation matrices. Hence $A$ is a nonnegative linear combination of $\#(B) - 2n + f(B) + 2$ permutation matrices. If $f(B) = f(A)$, then since $\#(A) > \#(B)$, $A$ is a nonnegative linear combination of $\#(A) - 2n + f(A) + 1$ permutation matrices, and we are done. Now suppose that $f(B) > f(A)$. Let $S$ be the set of $(i,j)$ such that $p_{ij} = 1$ and $a_{ij} = a_{kl}$. By permuting rows and columns we may assume that $B$ is a direct sum $B_1 \oplus B_2 \oplus \cdots \oplus B_k$ of fully indecomposable doubly stochastic matrices. Let $A_{ij}$ denote the submatrix of $A$ with rows those of $B_i$ and columns those of $B_j$. If $A_{ii}$ is not a fully indecomposable component of $A$, then there exists a $j$, $j \ne i$, such that $A_{ij} \ne 0$. It follows that $|S| \ge f(B) - f(A) + 1$. Since $\#(B) + |S| = \#(A)$, we have
$$\#(B) - 2n + f(B) + 2 \le \#(A) - 2n + f(A) + 1.$$
Therefore $A$ is a nonnegative linear combination of $\#(A) - 2n + f(A) + 1$ permutation matrices, and the proof follows by induction.
In the proof of the following theorem we need the following elementary inequality, which can be proved by induction on $r$.

Exercise: 2.3.8 Let $k_1, \ldots, k_r$, $r \ge 1$, be $r$ positive integers. Then
$$\sum_{i=1}^r k_i^2 + r \le \left(\sum_{i=1}^r k_i\right)^2 + 1.$$
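Before proving the inequality by induction on $r$, it is easy to spot-check it by brute force; a throwaway Python sketch (ranges chosen arbitrarily for illustration):

```python
from itertools import product

# Check sum(k_i^2) + r <= (sum k_i)^2 + 1 for all tuples of positive
# integers k_1, ..., k_r with r <= 3 and each k_i <= 6.
for r in range(1, 4):
    for ks in product(range(1, 7), repeat=r):
        assert sum(k * k for k in ks) + r <= sum(ks) ** 2 + 1, ks
```

Equality holds exactly when $r = 1$, which suggests the shape of the induction step.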
Theorem 2.3.9 Let $A$ be a doubly stochastic matrix of order $n$. Then $A$ is the nonnegative linear combination of $n^2 - 2n + 2$ permutation matrices.

Proof: Let $A$ have $r = f(A)$ fully indecomposable components $A_1, \ldots, A_r$ with $A_i$ being $k_i$ by $k_i$. Then
$$\#(A) + f(A) \le \sum_{i=1}^r k_i^2 + r \le \left(\sum_{i=1}^r k_i\right)^2 + 1 = n^2 + 1.$$
Then by the previous theorem, $A$ is the nonnegative linear combination of $\#(A) - 2n + 1 + f(A) \le (n^2 + 1) - 2n + 1 = n^2 - 2n + 2$ permutation matrices.
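The inductive proof at the start of this section is effectively an algorithm: repeatedly find a permutation matrix supported on the positive entries (guaranteed by the SDR in the claim) and subtract as large a multiple of it as possible. A minimal Python sketch, assuming matrices small enough that a brute-force search over permutations can stand in for the SDR/matching step (all identifiers here are illustrative, not from the text):

```python
from itertools import permutations

def birkhoff_decompose(A, tol=1e-9):
    """Write a doubly stochastic matrix A (list of lists) as a
    nonnegative combination sum(u_i * P_i) of permutation matrices,
    following the inductive proof: find a permutation sigma with
    a[i][sigma(i)] > 0 for all i, subtract u = min_i a[i][sigma(i)],
    and repeat until nothing positive remains."""
    n = len(A)
    A = [row[:] for row in A]          # work on a copy
    terms = []
    while True:
        # Brute force over permutations stands in for the SDR argument.
        sigma = next((s for s in permutations(range(n))
                      if all(A[i][s[i]] > tol for i in range(n))), None)
        if sigma is None:
            break
        u = min(A[i][sigma[i]] for i in range(n))
        terms.append((u, sigma))
        for i in range(n):
            A[i][sigma[i]] -= u
    return terms                       # list of (coefficient, permutation)
```

Each pass zeroes at least one entry, so at most $\#(A)$ passes are needed; the coefficients sum to $t$ (here 1 for a doubly stochastic matrix).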
2.4 The Theorem of Marshall Hall, Jr.
Many of the ideas of finite combinatorics have generalizations to situations in which some of the sets involved are infinite. We just touch on this subject.
Given a family $\mathcal{A}$ of sets, if the number of sets in the family is infinite, there are several ways the theorem of P. Hall can be generalized. One of the first (and to our mind one of the most useful) was given by Marshall Hall, Jr. (no relation to P. Hall), and is as follows.

Theorem 2.4.1 Suppose that for each $i$ in some index set $I$ there is a finite subset $A_i$ of a set $S$. The system $\mathcal{A} = (A_i)_{i \in I}$ has an SDR if and only if the following condition holds: for each finite subset $I'$ of $I$ the system $\mathcal{A}' = (A_i)_{i \in I'}$ satisfies Condition (H).
Proof: We establish a partial order on deletions, writing $D_1 \le D_2$ for deletions $D_1$ and $D_2$ iff each element deleted by $D_1$ is also deleted by $D_2$. Of course, we are interested only in deletions which preserve Condition (H). If all deletions in an ascending chain $D_1 \le D_2 \le \cdots \le D_i \le \cdots$ preserve Condition (H), let $D$ be the deletion which consists of deleting an element $b$ from a set $A$ iff there is some $i$ for which $b$ is deleted from $A$ by $D_i$. We assert that deletion $D$ also preserves Condition (H).

In any block $B_{r,s}$ of $\mathcal{A}$ $(r, s < \infty)$, at most a finite number of deletions in the chain can affect $B_{r,s}$. If no deletion of the chain affects $B_{r,s}$, then of course $D$ does not affect $B_{r,s}$, and Condition (H) still holds for $B_{r,s}$. Otherwise, let $D_n$ be the last deletion that affects $B_{r,s}$. So under $D_n$ (and hence also under $D$) $(B_{r,s})' = B'_{r,s}$ still satisfies Condition (H) by hypothesis, i.e., $s' \ge r$. But $B_{r,s}$ is arbitrary, so $D$ preserves Condition (H) on $\mathcal{A}$. By Zorn's Lemma, there will be a maximal deletion $\bar{D}$ preserving Condition (H). We show that under such a maximal deletion $\bar{D}$ preserving Condition (H), each deleted set $S'_i$ has only a single element. Clearly these elements would form an SDR for the original $\mathcal{A}$.
Suppose there is an $a_1$ not belonging to a critical block. Delete $a_1$ from every set $A_i$ containing $a_1$. Under this deletion a block $B_{r,s}$ is replaced by a block $B'_{r,s}$ with $s' \ge s - 1 \ge r$, so Condition (H) is preserved. Hence after a maximal deletion each element left is in some critical block. And if $B_{k,k}$ is a critical block, we may delete elements of $B_{k,k}$ from all sets not in $B_{k,k}$ and still preserve Condition (H) by Lemma 2.1.3 (since it needs to apply only to finitely many sets at a time). By Theorem 2.1.1 each critical block $B_{k,k}$ (being finite) possesses an SDR when Condition (H) holds. Hence we may perform an additional deletion leaving $B_{k,k}$ as a collection of singleton sets and with Condition (H) still holding for the entire remaining sets. It is now clear that after a maximal deletion $\bar{D}$ preserving Condition (H), each element is in a critical block, and each critical block consists of singleton sets. Hence after a maximal deletion $\bar{D}$ preserving Condition (H), each set consists of a single element, and these elements form an SDR for $\mathcal{A}$.
The following theorem, sometimes called the Cantor–Schroeder–Bernstein Theorem, will be used with the theorem of M. Hall, Jr. to show that any two bases of a vector space $V$ over a field $F$ must have the same cardinality.

Theorem 2.4.2 Let $X$, $Y$ be sets, and let $\phi : X \to Y$ and $\psi : Y \to X$ be injective mappings. Then there exists a bijection $\theta : X \to Y$.

Proof: The elements of $X$ will be referred to as males, those of $Y$ as females. For $x \in X$, if $\phi(x) = y$, we say $y$ is the daughter of $x$ and $x$ is the father of $y$. Analogously, if $\psi(y) = x$, we say $x$ is the son of $y$ and $y$ is the mother of $x$. A male with no mother is said to be an "adam." A female with no father is said to be an "eve." Ancestors and descendants are defined in the natural way, except that each $x$ or $y$ is both an ancestor of itself and a descendant of itself. If $z \in X \cup Y$ has an ancestor that is an adam (resp., eve) we say that $z$ has an adam (resp., eve). Partition $X$ and $Y$ into the following disjoint sets:
$$X_1 = \{x \in X : x \text{ has no eve}\}; \qquad X_2 = \{x \in X : x \text{ has an eve}\};$$
$$Y_1 = \{y \in Y : y \text{ has no eve}\}; \qquad Y_2 = \{y \in Y : y \text{ has an eve}\}.$$
Now a little thought shows that $\phi : X_1 \to Y_1$ is a bijection, and $\psi^{-1} : X_2 \to Y_2$ is a bijection. So
$$\theta = \phi|_{X_1} \cup \psi^{-1}|_{X_2}$$
is a bijection from $X$ to $Y$.
Corollary 2.4.3 If $V$ is a vector space over the field $F$ and if $B_1$ and $B_2$ are two bases for $V$, then $|B_1| = |B_2|$.
Proof: Let $B_1 = \{x_i : i \in I\}$ and $B_2 = \{y_j : j \in J\}$. For each $i \in I$, let $\Gamma_i = \{j \in J : y_j$ occurs with nonzero coefficient in the unique linear expression for $x_i$ in terms of the $y_j$'s$\}$. Then the union of any $k$ $(\ge 1)$ of the $\Gamma_i$'s, say $\Gamma_{i_1}, \ldots, \Gamma_{i_k}$, each of which of course is finite, must contain at least $k$ distinct elements. For otherwise $x_{i_1}, \ldots, x_{i_k}$ would belong to a space of dimension less than $k$, and hence be linearly dependent. Thus the family $(\Gamma_i : i \in I)$ of sets must have an SDR. This means there is a function $\theta : I \to J$ which is an injection. Similarly, there is an injection $J \to I$. So by the preceding theorem there is a bijection $J \to I$, i.e., $|B_1| = |B_2|$.
2.5 Matroids and the Greedy Algorithm
A matroid on a set $X$ is a collection $\mathcal{I}$ of subsets of $X$ (called independent subsets of $X$) satisfying the following:

Subset Rule: Each subset of an independent set (including the empty set) is independent.

Expansion Rule: If $I, J \in \mathcal{I}$ with $|I| < |J|$, then there is some $x \in J$ for which $I \cup \{x\} \in \mathcal{I}$.

An independent set of maximal size in $\mathcal{I}$ is called a basis. An additive cost function $f$ is a function $f : \mathcal{P}(X) \to \mathbb{R}$ with $f(\emptyset) = 0$ and such that $f(S) = \sum_{x \in S} f(x)$, for each $S \subseteq X$.
Theorem 2.5.1 Let $\mathcal{I}$ be a matroid of independent sets on $X$, and let $f : \mathcal{P}(X) \to \mathbb{R}$ be an additive cost function. Then the greedy algorithm (given below) selects a basis of minimum cost.

Proof: The Greedy Algorithm is as follows:

1. Let $I = \emptyset$.
2. From set $X$ pick an element $x$ with $f(x)$ minimum.
3. If $I \cup \{x\}$ is independent, replace $I$ with $I \cup \{x\}$.
4. Delete $x$ from $X$.

Repeat Steps 2 through 4 until $X$ is empty.

The Expansion Rule implies that all maximal independent sets have the same size, and the Greedy Algorithm will add to $I$ until it is a basis. Suppose the Greedy Algorithm selects a basis $B = (b_1, b_2, \ldots, b_k)$ and that $A = (a_1, \ldots, a_k)$ is some other basis, both ordered so that if $i < j$ then $f(b_i) \le f(b_j)$ and $f(a_i) \le f(a_j)$. By Step 2 of the Greedy Algorithm, $f(b_1) \le f(a_1)$. If $f(b_i) \le f(a_i)$ for all $i$, then $f(B) \le f(A)$. So suppose there is some $j$ such that $f(b_j) > f(a_j)$, but $f(b_i) \le f(a_i)$ for $i = 1, 2, \ldots, j-1$. Then $\{a_1, \ldots, a_j\}$ and $\{b_1, b_2, \ldots, b_{j-1}\}$ are both independent. By the Expansion Property, for some $a_i$ with $i \le j$, $\{b_1, \ldots, b_{j-1}, a_i\}$ is independent. Since $f(a_i) \le f(a_j) < f(b_j)$, $a_i$ would have been selected by the Greedy Algorithm to be $b_j$ (almost true; at least $b_j$ would not have been chosen!). Hence $f(b_j) \le f(a_j)$ and $B$ must be a minimum-cost basis.
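Steps 1 through 4 above amount to a single pass over $X$ in order of increasing cost, keeping an element whenever independence survives. A hedged Python sketch, with the matroid supplied as an independence oracle (the function names and the uniform-matroid example are illustrative assumptions, not from the text):

```python
def greedy_basis(X, f, independent):
    """Greedy algorithm of Theorem 2.5.1: scan the elements of X in
    order of increasing cost f(x), keeping x whenever the result is
    still independent.  `independent` is an oracle mapping a frozenset
    of elements to True/False."""
    I = set()
    for x in sorted(X, key=f):
        if independent(frozenset(I | {x})):
            I.add(x)
    return I
```

For instance, on the uniform matroid $U_{2,4}$ (all sets of size at most 2 independent) the oracle is just a size test, and the two cheapest elements are selected.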
We now consider three contexts in which the Greedy Algorithm has turned out to be very useful.

Let $\mathcal{A} = (S_1, \ldots, S_m)$ be a family of (not necessarily distinct) subsets of a set $X$. Suppose each element of $X$ has a weight (or cost) assigned to it. For $x \in X$ let $f(x)$ be the cost assigned to $x$. And for $S \subseteq X$, the cost of $S$ is to be $f(S) = \sum_{x \in S} f(x)$. We want to construct an SDR for some subfamily $\mathcal{A}' = (S_{i_1}, \ldots, S_{i_k})$ of $\mathcal{A}$ which is as large as possible, and such that the cost $f(D)$ of the SDR $D$ is the minimum possible among all maximum-sized SDRs.

We say that a subset $I$ of $X$ is an independent set (of representatives for $\mathcal{A}$) if the elements of $I$ may be matched with members of $\mathcal{A}$ so that they form an SDR for the sets with which they are matched. So we want to be able to find a cheapest independent set of maximal size.
The same problem can be expressed in terms of bipartite graphs. Given a bipartite graph $G = (X, Y, E)$, we say that a subset $S$ of $X$ is independent (for matchings of $X$ into $Y$) if there is a matching $M$ of $G$ which matches all elements of $S$ to elements of $Y$. If each element $x$ of $X$ is assigned a (nonnegative) cost $f(x)$, for each subset $S \subseteq X$ put $f(S) = \sum_{x \in S} f(x)$. Then the problem is to find a cheapest (or sometimes a most expensive) independent set in $X$, and it is usually desirable also to have a corresponding matching.

Given $\mathcal{A} = (S_1, \ldots, S_m)$, $S_i \subseteq X$, construct a bipartite graph $G = (X, \mathcal{A}, E)$ with $(x, S_j) \in E$ iff $x \in S_j$. Then a subset $I$ of $X$ is independent in the matching sense iff it is independent in the SDR sense.
Theorem 2.5.2 In both examples we have just given (matchings of bipartite
graphs and SDRs) the independent sets form a matroid.
We do the case for matchings in bipartite graphs, and we note that if the empty set (as a subset of $X$ in $G = (X, Y, E)$) is defined to be independent, then clearly the set $\mathcal{I}$ of independent subsets of $X$ satisfies the Subset Rule. So we consider the Expansion Rule. However, before proceeding into the proof we need to be sure that our terminology is clear and we need to prove a couple of preliminary lemmas.

A matching $M$ of size $m$ in a graph $G$ is a set of $m$ edges, no two of which have a vertex in common. A vertex is said to be matched (to another vertex) by $M$ if it lies in an edge of $M$. We defined a bipartite graph $G$ with parts $X$ and $Y$ to be a graph whose vertex set is the union of the two disjoint sets $X$ and $Y$ and whose edges all connect a vertex in $X$ with a vertex in $Y$. A complete matching of $X$ into $Y$ is a matching of $G$ with $|X|$ edges. Our first lemma describes the interaction of two matchings.
Lemma 2.5.3 Let $M_1$ and $M_2$ be matchings of the graph $G = (V, E)$. Let $G' = (V, E')$ be the subgraph with $E' = (M_1 \cup M_2) - (M_1 \cap M_2) = (M_1 - M_2) \cup (M_2 - M_1)$. Then each connected component of $G'$ is one of the following three types:

(i) a single vertex;

(ii) a cycle with an even number of edges and whose edges are alternately in $M_1$ and $M_2$;

(iii) a chain whose edges are alternately in $M_1$ and $M_2$, and whose two end vertices are each matched by one of $M_1$, $M_2$ but not both.

Moreover, if $|M_1| < |M_2|$, there is a component of $G'$ of type (iii) with first and last edges in $M_2$ and whose endpoints are not $M_1$-matched.
Proof: If $x$ is a vertex of $G$ that is neither $M_1$-matched nor $M_2$-matched, then $x$ is an isolated vertex of $G'$. Similarly, if some edge through $x$ lies in both $M_1$ and $M_2$, then $x$ is an isolated vertex of $G'$. Now suppose $G_1$ is a connected component of $G'$ having $n$ vertices and at least one edge. Since $G_1$ is connected, it has at least $n - 1$ edges (since it has a spanning tree). Each vertex of $G_1$ has degree 1 or 2 (on at most one edge of $M_1$ and at most one edge of $M_2$). Hence
$$2(n - 1) \le 2(\text{number of edges of } G_1) = \sum_{x \in V(G_1)} \deg(x) \le 2n.$$
So $G_1$ has either $n - 1$ or $n$ edges. If $G_1$ has $n - 1$ edges, it must be a tree in which each vertex has degree at most 2, i.e., $G_1$ is a chain whose edges are alternately in $M_1$ and $M_2$. If $x$ is an endpoint of $G_1$, it is easy to see that $x$ is matched by only one of $M_1$, $M_2$. (If $x \in e_1 \in M_1$ and $x \in e_2 \in M_2$, then if $e_1 = e_2$ this edge is not in $E'$; if $e_1 \ne e_2$, both edges $e_1, e_2$ are in $G'$ and $x$ could not be an endpoint of $G_1$.)

If $G_1$ has $n$ edges, it must be a cycle whose edges alternate in $M_1$ and $M_2$, forcing it to have an even number of edges.

Finally, if $|M_1| < |M_2|$, there must be some connected component $G_1$ with more $M_2$-edges than $M_1$-edges. So $G_1$ is of type (iii) with first and last edge in $M_2$ and whose endpoints are not $M_1$-matched.
If $M$ is a matching for $G$, a path $v_0 e_1 v_1 e_2 \cdots e_n v_n$ is an alternating path for $M$ if whenever $e_i$ is in $M$, $e_{i+1}$ is not, and whenever $e_i$ is not in $M$, $e_{i+1}$ is in $M$. We now show how to use the kind of alternating path that arises in case (iii) of the previous Lemma to obtain a larger matching.
Lemma 2.5.4 Let $M$ be a matching in a graph $G$ and let $P$ be an alternating path with edge set $E'$ beginning and ending at unmatched vertices. Let $M' = M \cap E'$. Then
$$(M - M') \cup (E' - M') = (M - E') \cup (E' - M)$$
is a matching with one more edge than $M$ has.

Proof: Every other edge of $P$ is in $M$. However, $P$ begins and ends with edges not in $M$, so there is a number $k$ such that $P$ has $k$ edges in $M$ and $k + 1$ edges not in $M$. The first and last vertices of $P$ are unmatched, and all other vertices in $P$ are matched by $M'$, so no edge in $M - M'$ contains any vertex in $P$. Thus, the edges of $M - M'$ have no vertices in common with the edges of $E' - M'$. Further, since $P$ is a path and $E' - M'$ consists of every other edge of the path, the edges of $E' - M'$ have no vertices in common. Thus
$$(M - M') \cup (E' - M') = (M - E') \cup (E' - M)$$
is a matching and by the sum principle, it has $m - k + k + 1 = m + 1$ edges.
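The augmentation step of Lemma 2.5.4 is the engine of the standard algorithm for maximum matchings in bipartite graphs: starting from the empty matching, repeatedly look for an alternating path between unmatched vertices and flip it. A compact Python sketch (augmenting-path search by depth-first search; all identifiers are illustrative):

```python
def max_bipartite_matching(adj):
    """Maximum matching in a bipartite graph G = (X, Y, E), where
    adj[x] is an iterable of the Y-neighbors of x.  Each successful
    call to try_augment flips one alternating path ending at a free
    vertex of Y, gaining one edge as in Lemma 2.5.4."""
    match_of_y = {}                    # current partner in X of each matched y

    def try_augment(x, seen):
        for y in adj[x]:
            if y in seen:
                continue
            seen.add(y)
            # y is free, or its current partner can be rematched elsewhere:
            if y not in match_of_y or try_augment(match_of_y[y], seen):
                match_of_y[y] = x
                return True
        return False

    for x in adj:
        try_augment(x, set())
    return {x: y for y, x in match_of_y.items()}
```

Applied to the SDR graph $G = (X, \mathcal{A}, E)$ of the previous page, a matching of size $|\mathcal{A}|$ is exactly an SDR.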
We are now ready for the proof of the theorem.

Proof: As mentioned above, the set $\mathcal{I}$ of independent subsets of $X$ (of vertices of a bipartite graph $G = (X, Y, E)$) satisfies the Subset Rule. So we now consider the Expansion Rule.

Suppose $M_1$ is a matching of $S$ into $Y$, $M_2$ is a matching of $T$ into $Y$, where $S \subseteq X$, $T \subseteq X$ and $|S| < |T|$. Let $G'$ be the graph on $X \cup Y$ with edge set $E' = (M_1 \cup M_2) - (M_1 \cap M_2) = (M_1 - M_2) \cup (M_2 - M_1)$. Here $|M_1| = |S| < |T| = |M_2|$, and clearly $|E' \cap M_1| < |E' \cap M_2|$.

At least one of the connected components of $G'$ has one more edge in $M_2$ than it has in $M_1$. So by Lemma 2.5.3 the graph $G'$ has a connected component that must be an $M_1$-alternating path $P$ whose first and last edges are in $M_2$. Each vertex of this path that is touched by an $M_1$ edge is also touched by an $M_2$ edge. And the endpoints of this path are not $M_1$-matched. Since the path has an odd number of edges, its two endpoints lie one in $X$, one in $Y$. Say $x$ is the endpoint lying in $X$. Let $E''$ be the edge set of the path $P$. Then $M' = (M_1 - E'') \cup (E'' - M_1)$ is a matching with one edge more than $M_1$, and $M'$ is a matching of $S \cup \{x\}$ into $Y$. Hence $S \cup \{x\}$ is independent, and we selected $x$ from $T$.
There is a converse due to Berge that is quite interesting, but we leave
its proof as an exercise.
Theorem 2.5.5 Suppose G is a graph and M is a matching of G. Then M
is a matching of maximum size (among all matchings) if and only if there is
no alternating path connecting two unmatched vertices.
Exercise: 2.5.6 Prove Theorem 2.5.5.
There is a third standard example of a matroid.
Theorem 2.5.7 The edge sets of forests of a graph $G = (V, E)$ form the independent sets of a matroid on $E$.

Proof: If for some $F \subseteq E$ it is true that $(V, F)$ has no cycles, then $(V, F')$ has no cycles for any subset $F'$ of $F$. This says that the Subset Rule is satisfied.

Recall that a tree on $k$ vertices is a connected graph with $k - 1$ edges. Thus a forest on $n$ vertices with $c$ connected components will consist of $c$ trees and will thus have $n - c$ edges. Suppose $F'$ and $F$ are forests (contained in $E$) with $r$ edges and $s$ edges, respectively, with $r < s$. If no edge of $F$ can be added to $F'$ to give an independent set, then adding any edge of $F$ to $F'$ gives a cycle. In particular, each edge of $F$ must connect two points in the same connected component of $(V, F')$. Thus each connected component of $(V, F)$ is a subset of a connected component of $(V, F')$. Then $(V, F)$ has no more edges than $(V, F')$, so $s \le r$, a contradiction. Hence the forests of $G$ satisfy the Expansion Rule, implying that the collection of edge sets of forests of $G$ is a collection of independent sets of a matroid on $E$.
Corollary 2.5.8 The Greedy Algorithm applied to cost-weighted edges of a connected graph produces a minimum-cost spanning tree. In fact, this is what is usually called Kruskal's algorithm.
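A minimal Python sketch of Kruskal's algorithm as the greedy algorithm on the forest matroid, with a union-find structure serving as the independence (no-cycle) test; the edge representation is an illustrative assumption:

```python
def kruskal(n, edges):
    """Minimum-cost spanning tree of a connected graph on vertices
    0..n-1, given edges as (cost, u, v) triples.  This is the Greedy
    Algorithm on the forest matroid: union-find detects when an edge
    would close a cycle, i.e., leave the independent sets."""
    parent = list(range(n))

    def find(u):                       # root of u's component, with path halving
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    for cost, u, v in sorted(edges):   # Step 2: cheapest remaining edge first
        ru, rv = find(u), find(v)
        if ru != rv:                   # u, v in different components: no cycle
            parent[ru] = rv
            tree.append((cost, u, v))
    return tree
```

By Theorem 2.5.1 and Theorem 2.5.7, the edge set returned is a minimum-cost basis, i.e., a minimum-cost spanning tree.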
Chapter 3
Polya Theory
3.1 Group Actions
Let $X$ be a nonempty set and $\mathcal{S}_X$ the symmetric group on $X$, i.e., $\mathcal{S}_X$ is the group of all permutations of the elements of $X$ with the group operation being the composition of functions. Let $G$ be a group. An action of $G$ on $X$ is a homomorphism $\phi : G \to \mathcal{S}_X$. In other words, $\phi$ is a function from $G$ to $\mathcal{S}_X$ satisfying
$$\phi(g_1) \circ \phi(g_2) = \phi(g_1 g_2) \qquad (3.1)$$
for all $g_1, g_2 \in G$.
Often $(\phi(g))(x)$ is written as $g(x)$ if only one action is being considered. The only difference between thinking of $G$ as acting on $X$ and thinking of $G$ as a group of permutations of the elements of $X$ is that for some $g_1, g_2 \in G$, $g_1 \ne g_2$, it might be that $\phi(g_1)$ and $\phi(g_2)$ are actually the same permutation, i.e., $g_1(x) = g_2(x)$ for all $x \in X$. Also, sometimes there are several different actions of $G$ on $X$ which may be considered in the same context.
Theorem 3.1.1 Let $\phi$ be an action of $G$ on $X$ and let $e$ be the identity of $G$. Then the following hold:

(i) $\phi(e)$ is the identity permutation on $X$.

(ii) $\phi(g^{-1}) = [\phi(g)]^{-1}$, for each $g \in G$.

(iii) More generally, for each $n \in \mathbb{Z}$, $\phi(g^n) = (\phi(g))^n$.

Proof: These results are special cases of results usually proved for homomorphisms in general. If you don't remember them, you should work out the proofs in this special case.
Let $G$ act on $X$. For $x, y \in X$, define $x \sim y$ iff there is some $g \in G$ for which $g(x) = y$.

Theorem 3.1.2 The relation $\sim$ is an equivalence relation on $X$.

Proof: This is an easy exercise.

The equivalence classes are called $G$-orbits in $X$. The orbit containing $x$ is denoted $x^G$ or sometimes just $[x]$ if there is no likelihood of confusion.

For $g \in G$, put $X_g = \{x \in X : g(x) = x\}$, so $X_g$ is the set of elements of $X$ fixed by $g$. For $x \in X$, put $G_x = \{g \in G : g(x) = x\}$. $G_x$ is the stabilizer of $x$ in $G$.

Theorem 3.1.3 For $x \in X$, $G_x$ is a subgroup of $G$ (written $G_x \le G$). If $G$ is finite, then $|G| = |[x]| \cdot |G_x|$.
Proof: It is an easy exercise to show that $G_x$ is a subgroup of $G$. Having done that, define a function $f$ from the set of left cosets of $G_x$ in $G$ to $[x]$ by:
$$f(gG_x) = g(x).$$
First we show that $f$ is well-defined. If $g_1 G_x = g_2 G_x$, then $g_2^{-1} g_1 \in G_x$, so that $(g_2^{-1} g_1)(x) = x$, which implies $g_1(x) = g_2(x)$. Hence $f(g_1 G_x) = f(g_2 G_x)$. So $f$ is well-defined. Now we claim $f$ is a bijection. Suppose $f(g_1 G_x) = f(g_2 G_x)$, so by definition $g_1(x) = g_2(x)$ and $(g_2^{-1} g_1)(x) = x$. Hence $g_2^{-1} g_1 \in G_x$, implying $g_1 G_x = g_2 G_x$, so $f$ is one-to-one. If $y \in [x]$, then there is a $g \in G$ with $g(x) = y$. So $f(gG_x) = g(x) = y$, implying $f$ is onto $[x]$.

Hence $f$ is a bijection from the set of left cosets of $G_x$ in $G$ to $[x]$, i.e., $|G|/|G_x| = |[x]|$ as claimed.
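The equality $|G| = |[x]| \cdot |G_x|$ is easy to confirm numerically for a small group; a throwaway Python check using the dihedral group of the square acting on its four corners (the 0-based labels and generator choice are illustrative assumptions):

```python
from itertools import product

# The dihedral group of the square as permutations of {0,1,2,3}:
# close a rotation r and a flip s under composition.  A permutation g
# is stored as the tuple (g(0), g(1), g(2), g(3)).
r = (1, 2, 3, 0)
s = (1, 0, 3, 2)
compose = lambda g, h: tuple(g[h[i]] for i in range(4))
G = {r, s}
while True:
    new = {compose(g, h) for g, h in product(G, G)} | G
    if new == G:
        break
    G = new

# Orbit-stabilizer (Theorem 3.1.3): |G| = |[x]| * |G_x| for every x.
for x in range(4):
    orbit = {g[x] for g in G}
    stab = {g for g in G if g[x] == x}
    assert len(G) == len(orbit) * len(stab)
```

Here $|G| = 8$, every orbit has size 4, and every stabilizer has size 2.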
Theorem 3.1.4 For some $x, y \in X$ and $g \in G$, suppose that $g(x) = y$. Then

(i) $H \le G_x$ iff $gHg^{-1} \le G_y$; in particular,

(ii) $G_{g(x)} = gG_x g^{-1}$.
Proof: Easy exercise.
Theorem 3.1.5 (The Orbit-Counting Lemma, "Not Burnside's Lemma") Let $k$ be the number of $G$-orbits in $X$. Then
$$k = \frac{1}{|G|} \left( \sum_{g \in G} |X_g| \right).$$

Proof: Put $S = \{(x, g) \in X \times G : g(x) = x\}$. We determine $|S|$ in two ways: $|S| = \sum_{x \in X} |G_x| = \sum_{g \in G} |X_g|$. Since $x \sim y$ iff $[x] = [y]$, in which case $|[x]| = |[y]|$, it must be that $|G|/|G_x| = |G|/|G_y|$ whenever $x \sim y$. So $\sum_{y \in [x]} |G_y| = \sum_{y \in [x]} |G_x| = |[x]| \cdot |G_x| = |G|$. Hence $\sum_{x \in X} |G_x| = k \cdot |G| = \sum_{g \in G} |X_g|$. And hence $k = \left( \sum_{g \in G} |X_g| \right) / |G|$.
The following situation often arises. There is some given action $\phi$ of $G$ on some set $X$. $T$ is the set of all functions from $X$ into some set $Y$. Then there is a natural action $\tau$ of $G$ on $T$ defined by: For each $g \in G$ and each $f \in T = Y^X$,
$$\tau(g)(f) = f \circ \phi(g^{-1}).$$

Theorem 3.1.6 $\tau : G \to \mathcal{S}_T$ is an action of $G$ on $T$.
Proof: First note that
$$\tau(g_1 g_2)(f) = f \circ \phi((g_1 g_2)^{-1}) = f \circ \phi(g_2^{-1} g_1^{-1}) = f \circ [\phi(g_2^{-1}) \circ \phi(g_1^{-1})]$$
$$= [f \circ \phi(g_2^{-1})] \circ \phi(g_1^{-1}) = \tau(g_1)(f \circ \phi(g_2^{-1})) = \tau(g_1)(\tau(g_2)(f)) = (\tau(g_1) \circ \tau(g_2))(f).$$
So $\tau$ is an action of $G$ on $T$ provided each $\tau(g)$ is actually a permutation of the elements of $T$.

So suppose $\tau(g)(f_1) = \tau(g)(f_2)$, i.e., $f_1 \circ \phi(g^{-1}) = f_2 \circ \phi(g^{-1})$. But since $\phi(g^{-1})$ is a permutation of the elements of $X$, it must be that $f_1 = f_2$, so $\tau(g)$ is one-to-one on $T$. For each $g \in G$ and $f : X \to Y$, $f \circ \phi(g) \in Y^X$ and $\tau(g)(f \circ \phi(g)) = (f \circ \phi(g)) \circ \phi(g^{-1}) = f$, implying that $\tau(g)$ is onto.
To use Not Burnside's Lemma to count $G$-orbits in $T$, we need to compute $|T_{\tau(g)}|$ for each $g \in G$.

Theorem 3.1.7 For $g \in G$, let $c$ be the number of cycles of $\phi(g)$ as a permutation on $X$. Then $|T_{\tau(g)}| = |Y|^c$.

Proof: For $f : X \to Y$, $g \in G$, we want to know when is $f \circ \phi(g^{-1}) = f$, i.e., $(f \circ \phi(g^{-1}))(x) = f(x)$. This holds iff $f(\phi(g^{-1})(x)) = f(x)$. So $f$ must have the same value at $x$, $\phi(g^{-1})(x)$, $\phi(g^{-2})(x)$, \ldots, etc. This just says that $f$ is constant on the orbits of $\phi(g)$ in $X$. So if $c$ is the number of cycles of $\phi(g)$ as a permutation on $X$, then $|Y|^c$ is the number of functions $f : X \to Y$ which are constant on the orbits of $\phi(g)$.

Applying Not Burnside's Lemma to the action of $G$ on $T = Y^X$, we have:

Theorem 3.1.8 The number of $G$-orbits in $T$ is
$$\frac{1}{|G|} \sum_{g \in G} |T_{\tau(g)}| = \frac{1}{|G|} \sum_{g \in G} |Y|^{c(g)},$$
where $c(g)$ is the number of cycles of $\phi(g)$ as a permutation on $X$.
3.2 Applications
Example 3.2.1 Let $G$ be the group of symmetries of the square $\begin{smallmatrix}1&2\\4&3\end{smallmatrix}$, written as permutations of $[4] = \{1, 2, 3, 4\}$. The convention here is that if a symmetry of the square moves a corner labeled $i$ to the corner previously labeled $j$, then $\sigma(i) = j$. We want to paint the corners with W and R (white and red) and then determine how many essentially different paintings there are.

Here $X = \{1, 2, 3, 4\}$, $Y = \{W, R\}$. A painting $f$ is just a function $f : X \to Y$. Two paintings $f_1, f_2$ are the same if there is a $g \in G$ with $f_1 \circ g^{-1} = f_2$. So the number of distinct paintings is the number of $G$-orbits in $T = \{f : X \to Y\}$, which is
$$\frac{1}{|G|} \sum_{g \in G} |Y|^{c(g)}.$$
$$G = \{e, (1234), (13)(24), (1432), (24), (12)(34), (13), (14)(23)\}.$$
So the number of distinct paintings is
$$\frac{1}{8}\left(2^4 + 2^1 + 2^2 + 2^1 + 2^3 + 2^2 + 2^3 + 2^2\right) = 6.$$
The distinct paintings are listed as follows:

W W    W R    W R    W W    W R    R R
W W    W W    R W    R R    R R    R R
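The count of six can be double-checked by listing all 16 paintings and collapsing each orbit to a canonical representative; a brute-force Python sketch (the eight tuples below are the symmetries of the square from the example, shifted to 0-based corner labels):

```python
from itertools import product

# The eight symmetries of the square, each stored as the tuple
# (g(0), g(1), g(2), g(3)) of images of the corners 0..3.
G = [(0, 1, 2, 3), (1, 2, 3, 0), (2, 3, 0, 1), (3, 0, 1, 2),   # rotations
     (0, 3, 2, 1), (1, 0, 3, 2), (2, 1, 0, 3), (3, 2, 1, 0)]   # reflections

paintings = list(product('WR', repeat=4))      # all f: corners -> {W, R}
# The orbit of f is {f o g : g in G}; take its lexicographic minimum
# as the canonical representative, and count distinct representatives.
canon = {min(tuple(f[g[i]] for i in range(4)) for g in G) for f in paintings}
assert len(canon) == 6
```

The enumeration and the Orbit-Counting Lemma necessarily agree; the brute force is only feasible because $|Y|^{|X|} = 16$ is tiny.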
Example 3.2.2 How many necklaces are there with $n$ beads of $m$ colors if two are the same provided one can be rotated into the other?

Let $\sigma$ be the basic rotation $\sigma = (1, 2, 3, \ldots, n)$. So $G = \{\sigma^i : 1 \le i \le n\}$. The length of each cycle in $\sigma^i$ is the order of $\sigma^i$, which is $\frac{n}{\gcd(n, i)}$. So the number of cycles in $\sigma^i$ is $\gcd(n, i)$. For each $d$ such that $d \mid n$, there are $\varphi(n/d)$ integers $i$ with $1 \le i \le n$ and $d = \gcd(n, i)$. So the number of $\sigma^i \in G$ with $d$ cycles is $\varphi(n/d)$, for each $d$ with $d \mid n$. If $Y$ is the set of $m$ colors to be used, the number of distinct necklace patterns under the action of $G$ is
$$\frac{1}{n} \sum_{g \in G} m^{c(g)} = \frac{1}{n} \sum_{d \mid n} \varphi(n/d)\, m^d = \frac{1}{n} \sum_{d \mid n} \varphi(d)\, m^{n/d}.$$

Note: If $f(x) = \frac{1}{n} \sum_{d \mid n} \varphi(d)\, x^{n/d}$, then $f(x)$ is a polynomial of degree $n$ with rational coefficients which lie between 0 and 1, but $f(m)$ is a positive integer for each positive integer $m$.
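The closed formula of the example translates directly to code; a small Python sketch (the naive totient computation is an illustrative choice, fine for small $n$):

```python
from math import gcd

def necklaces(n, m):
    """Number of necklaces with n beads in m colors, under rotation only:
    (1/n) * sum over d|n of phi(d) * m^(n/d)   (Example 3.2.2)."""
    def phi(k):                              # Euler's totient, naively
        return sum(1 for i in range(1, k + 1) if gcd(i, k) == 1)
    total = sum(phi(d) * m ** (n // d) for d in range(1, n + 1) if n % d == 0)
    assert total % n == 0                    # f(m) is always a positive integer
    return total // n
```

For instance, `necklaces(6, 2)` returns 14, the number of binary necklaces of length 6.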
Exercise: 3.2.3 If $G$ also has flips, so $|G| = 2n$, i.e., $G$ is dihedral, how many necklaces of $n$ beads in $m$ colors are there? (Hint: Do the cases $n$ odd, $n$ even separately.)

Solution: If $n$ is odd, each of the $n$ additional permutations is a flip about a uniquely defined vertex (i.e., bead), and it has $1 + \frac{n-1}{2} = \frac{n+1}{2}$ cycles. If $n$ is even, each of the $n$ additional permutations is a flip. $n/2$ of them are about an axis through two opposite beads, and each has $2 + \frac{n-2}{2} = \frac{n+2}{2}$ cycles. The other $n/2$ flips are about an axis that misses each bead, and each has $\frac{n}{2}$ cycles. So for $n$ even the number of distinct necklace patterns under the action of $G$ is
$$\frac{1}{2n} \left( \sum_{d \mid n} \varphi\!\left(\frac{n}{d}\right) m^d + \frac{n}{2}\, m^{\frac{n+2}{2}} + \frac{n}{2}\, m^{\frac{n}{2}} \right).$$

Answer to Exercise:

(i) For $n$ odd, the number of necklaces is:
$$\frac{1}{2n} \left( \sum_{d \mid n} \varphi(n/d)\, m^d + n\, m^{\frac{n+1}{2}} \right).$$

(ii) For $n$ even, the number of necklaces is:
$$\frac{1}{2n} \left( \sum_{d \mid n} \varphi(n/d)\, m^d + \frac{1}{2} n\, m^{\frac{n}{2}}(m + 1) \right).$$
Example 3.2.4 A switching function $f$ in $n$ variables is a function $f : \mathbb{Z}_2^n \to \mathbb{Z}_2$. Starting with a group $G$ acting on the set $[n] = \{1, \ldots, n\}$ we can define an action of $G$ on the set of all switching functions in $n$ variables and then ask how many inequivalent switching functions in $n$ variables there are.

As a first case, let $G = \mathcal{S}_n$ be the group of all permutations of the elements of $[n]$, and define an action $\phi$ of $G$ on
$$\mathbb{Z}_2^{[n]} = \{x : [n] \to \mathbb{Z}_2 = \{0, 1\}\}$$
according to the following: For $x \in \mathbb{Z}_2^{[n]}$ (write $x = (x_1, \ldots, x_n)$: $x_i = 0$ or 1, and where $x_i$ is the image of $i$ under $x$), and $g \in G$, put
$$(\phi(g))(x) = x \circ g^{-1} = (x_{g^{-1}(1)}, \ldots, x_{g^{-1}(n)}).$$
Now let
$$T_n = \{f : \mathbb{Z}_2^n \to \mathbb{Z}_2\} = \mathbb{Z}_2^{(\mathbb{Z}_2^n)}.$$
So there is an action $\tau$ of $G$ on $T_n$ defined by: For $f \in T_n$, $g \in G$,
$$(\tau(g))(f) = f \circ \phi(g^{-1}),$$
i.e.,
$$((\tau(g))(f))(x) = f(\phi(g^{-1})(x)) = f(x \circ g) = f(x_{g(1)}, \ldots, x_{g(n)}).$$
This last equation says that $g \in \mathcal{S}_n$ acts on $T_n$ by $(g(f))(x) = f(x \circ g)$, i.e.,
$$g(f) : (x_1, \ldots, x_n) \mapsto f(x_{g(1)}, \ldots, x_{g(n)}).$$
We say that $f_1$ and $f_2$ are equivalent if they are in the same $G$-orbit, and we would like to determine how many inequivalent switching functions in $n$ variables there are. This is too difficult for us to do for general $n$, so we put $n = 3$. But first we rename the elements of $\mathbb{Z}_2^3$, essentially by listing them in a natural order, so that we can simplify notation in what follows:
$$(000) \leftrightarrow 0; \quad (001) \leftrightarrow 1; \quad (010) \leftrightarrow 2; \quad (011) \leftrightarrow 3;$$
$$(100) \leftrightarrow 4; \quad (101) \leftrightarrow 5; \quad (110) \leftrightarrow 6; \quad (111) \leftrightarrow 7.$$
We note that
$$G = \mathcal{S}_3 = \{e, (123), (132), (12), (13), (23)\}$$
effects an action on $\mathbb{Z}_2^3$ by $\phi(g)(x) = x \circ g^{-1}$:

$e : (x_1, x_2, x_3) \mapsto (x_1, x_2, x_3)$; \quad $\phi(e) = (0)(1)(2)(3)(4)(5)(6)(7)$

$(123) : (x_1, x_2, x_3) \mapsto (x_3, x_1, x_2)$; \quad $\phi((123)) = (0)(142)(356)(7)$

$(12) : (x_1, x_2, x_3) \mapsto (x_2, x_1, x_3)$; \quad $\phi((12)) = (0)(1)(24)(35)(6)(7)$

$(13) : (x_1, x_2, x_3) \mapsto (x_3, x_2, x_1)$; \quad $\phi((13)) = (0)(14)(2)(36)(5)(7)$

$(132) : (x_1, x_2, x_3) \mapsto (x_2, x_3, x_1)$; \quad $\phi((132)) = (0)(124)(365)(7)$

$(23) : (x_1, x_2, x_3) \mapsto (x_1, x_3, x_2)$; \quad $\phi((23)) = (0)(12)(3)(4)(56)(7)$
$$T_3 = \{f : \mathbb{Z}_2^3 \to \mathbb{Z}_2\} = \{f : \{0, \ldots, 7\} \to \mathbb{Z}_2\}.$$
For $g \in G$, $|(T_3)_{\tau(g)}| = 2^{c(g)}$, where $c(g)$ is the number of cycles in $\phi(g)$. So the number of $G$-orbits in $T_3$ is
$$\frac{1}{|G|} \sum_{g \in G} 2^{c(g)} = \frac{1}{6}\left[2^8 + 2^4 + 2^6 + 2^6 + 2^4 + 2^6\right] = 80$$
(whereas $|T_3| = 2^{(2^3)} = 2^8 = 256$).
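The count of 80 agrees with a direct enumeration of all 256 functions; a brute-force Python check (the index conventions are an illustrative choice):

```python
from itertools import permutations, product

bits = list(product((0, 1), repeat=3))       # the 8 points of Z_2^3
idx = {x: i for i, x in enumerate(bits)}     # the renaming (000)->0, ..., (111)->7
fns = list(product((0, 1), repeat=8))        # f as a tuple of values f(0), ..., f(7)

def act(g, f):
    # (g(f))(x) = f(x o g) = f(x_{g(1)}, ..., x_{g(n)}); g is a tuple of images.
    return tuple(f[idx[tuple(x[g[i]] for i in range(3))]] for x in bits)

# One orbit per canonical (lexicographically minimal) representative.
orbits = {min(act(g, f) for g in permutations(range(3))) for f in fns}
assert len(orbits) == 80
```

Since $\mathcal{S}_3$ is closed under inverses, the set $\{g(f) : g \in G\}$ computed here is exactly the $G$-orbit of $f$.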
Exercise: 3.2.5 Repeat this problem with n = 2.
Exercise: 3.2.6 Repeat this problem with $n = 3$, but extend $G$ to a group of order 12 on $\mathbb{Z}_2^3$ by allowing complementation: $\bar{x}_i = 1 + x_i$, with addition mod 2.
3.3 The Cycle Index: Polya's Theorem

Let $G$ act on a set $X$, with $n = |X|$. For each $g \in G$, let $\lambda_t(g)$ be the number of cycles of length $t$ in the cycle decomposition of $g$ as a permutation on $X$, $1 \le t \le n$. Let $x_1, \ldots, x_n$ be variables. Then the CYCLE INDEX of $G$ (relative to the given action of $G$) is the polynomial
$$P_G(x_1, \ldots, x_n) = \frac{1}{|G|} \sum_{g \in G} x_1^{\lambda_1(g)} \cdots x_n^{\lambda_n(g)}.$$
If $Y$ is a set with $m = |Y|$, then $G$ induces an action on $Y^X$, as we have seen above. And the ordinary version of Polya's counting theorem is given as follows:

Theorem 3.3.1 The number of $G$-orbits in $Y^X$ is
$$P_G(m, \ldots, m) = \frac{1}{|G|} \sum_{g \in G} m^{\left(\sum_{t=1}^n \lambda_t(g)\right)}.$$

Proof: Of course, this is exactly what Theorem 3.1.8 says.
Example 3.3.2 THE GROUP OF RIGID MOTIONS OF THE CUBE

Consider a cube in 3-space. It has 8 vertices, 6 faces and 12 edges. Let $G$ be the group of rigid motions (i.e., rotations) of the cube. $G$ consists of the following 24 rotations:

(a) The identity.

(b) Three rotations of 180 degrees about axes connecting centers of opposite faces.

(c) Six rotations of 90 degrees around axes connecting centers of opposite faces.

(d) Six rotations of 180 degrees around axes joining midpoints of opposite edges.

(e) Eight rotations of 120 degrees about axes connecting opposite vertices.

Exercise: 3.3.3 Compute the Cycle Index of $G$ considered as a group of permutations on the vertices (resp., edges; resp., faces) of the cube.

CONVENTION: If $A$ labels a vertex (edge, face, etc.) which is carried by a motion $\sigma$ to the position originally held by the vertex (edge, face, etc.) labeled $B$, we write $\sigma(A) = B$.
3.4 Sylow Theory Via Group Actions
We begin with a preliminary result.
Theorem 3.4.1 Let $n = p^{\alpha} m$ where $p$ is a prime number, and let $p^r \,\|\, m$ (i.e., $p^r$ divides $m$, but $p^{r+1}$ does not divide $m$). Then
$$p^r \,\Big\|\, \binom{p^{\alpha} m}{p^{\alpha}}.$$

Proof: The question is: What power of $p$ divides
$$\binom{p^{\alpha} m}{p^{\alpha}} = \frac{(p^{\alpha} m)!}{(p^{\alpha})!\,(p^{\alpha} m - p^{\alpha})!} = \frac{p^{\alpha} m\,(p^{\alpha} m - 1) \cdots (p^{\alpha} m - i) \cdots (p^{\alpha} m - p^{\alpha} + 1)}{p^{\alpha}\,(p^{\alpha} - 1) \cdots (p^{\alpha} - i) \cdots 1}\,?$$
Looking at this expression written out, one can see that except for the factor $m$ in the numerator, the power of $p$ dividing $p^{\alpha} m - i = p^{\alpha}(m - 1) + (p^{\alpha} - i)$ is the same as that dividing $p^{\alpha} - i$, since $0 \le i \le p^{\alpha} - 1$, so all powers of $p$ cancel out except the power which divides $m$.
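Theorem 3.4.1 is easy to sanity-check numerically; a throwaway Python sketch using the standard library's `math.comb` (the ranges tested are an arbitrary illustrative choice):

```python
from math import comb

def p_adic_val(n, p):
    """Largest r with p^r dividing n (for n >= 1)."""
    r = 0
    while n % p == 0:
        n //= p
        r += 1
    return r

# Theorem 3.4.1: if p^r || m, then p^r || C(p^a * m, p^a).
for p in (2, 3, 5):
    for a in range(0, 4):
        for m in range(1, 21):
            r = p_adic_val(m, p)
            assert p_adic_val(comb(p**a * m, p**a), p) == r, (p, a, m)
```

The case $a = 0$ reduces to $\binom{m}{1} = m$, which is the "factor $m$ in the numerator" surviving the cancellation.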
Theorem 3.4.2 (Sylow Theorem 1) Let $G$ be a finite group with $|G| = n = p^{\beta} q$, $p^{\alpha} \,\|\, n$. (So $\beta \le \alpha$.) Then $G$ has a subgroup of order $p^{\beta}$.

Proof: From the preceding result it follows that $p^{\alpha - \beta} \,\|\, \binom{n}{p^{\beta}}$. Put $X = \{S \subseteq G : |S| = p^{\beta}\}$. Then $\phi : G \to \mathcal{S}_X$ is an action of $G$ on $X$, where $\phi$ is defined by
$$\phi(g)S = gS = \{gs : s \in S\}.$$
Fix $S = \{g_1, \ldots, g_{p^{\beta}}\} \in X$. If $g \in G_S$, then $S = gS = \{gg_1, \ldots, gg_{p^{\beta}}\}$, which implies that $gg_1 = g_k$ for some $k$, so that
$$g = g_k g_1^{-1} \in \{g_1 g_1^{-1}, g_2 g_1^{-1}, \ldots, g_{p^{\beta}} g_1^{-1}\}.$$
From this it is clear that $|G_S| \le p^{\beta}$.

Let $O_1, \ldots, O_h$ be the distinct $G$-orbits in $X$. Then $\binom{n}{p^{\beta}} = |X| = \sum_{t=1}^h |O_t|$, so that $p^{\alpha - \beta + 1}$ does not divide $|X|$. Hence there is some $t$ for which $p^{\alpha - \beta + 1}$ does not divide $|O_t|$. Let $O$ be any one of the orbits $O_t$ for which $p^{\alpha - \beta + 1}$ does not divide $|O|$. For any $S \in O$, $O = \{gS : g \in G\}$, so $|G| = |G_S| \cdot |O| = p^{\alpha} d$ where $p$ does not divide $d$. Hence $|O| = \frac{p^{\alpha} d}{|G_S|}$, forcing $p^{\beta}$ to divide $|G_S|$. Putting this together with the result of the previous paragraph, $|G_S| = p^{\beta}$.

This shows that if $O$ is an orbit for which $p^{\alpha - \beta + 1}$ does not divide $|O|$ (and there must be at least one such), then for each $S \in O$, $|G_S| = p^{\beta}$. So $G$ has a subgroup of order $p^{\beta}$.
Theorem 3.4.3 (Sylow Theorem 2) If $p^{\alpha}$ is an exact divisor of $|G|$, then all subgroups of $G$ with order $p^{\alpha}$ (i.e., all Sylow $p$-subgroups of $G$) are conjugate in $G$.

Proof: In Theorem 3.4.2 put $\beta = \alpha$. So there is an orbit $O$ for which $p$ does not divide $|O|$, and $|G_S| = p^{\alpha}$ for each $S \in O$.

We claim that the converse holds: If $H \le G$ with $|H| = p^{\alpha}$, then there must be some $T \in O$ for which $H = G_T$, so that $H$ and $G_S$ are conjugate for each $S \in O$ (stabilizers of elements in the same orbit are conjugate by Theorem 3.1.4).

Given $H \le G$ with $|H| = p^{\alpha}$, clearly $H$ acts on $O$. Let $Q_1, \ldots, Q_m$ be all the $H$-orbits in $O$, so that $|O| = \sum_{t=1}^m |Q_t|$. Since $p$ does not divide $|O|$, there must be some $t$ for which $p$ does not divide $|Q_t|$. Choose any $T \in Q_t$, so $Q_t = \{hT : h \in H\}$. Then $|H_T| \cdot |Q_t| = |H| = p^{\alpha}$ and $|Q_t| = \frac{p^{\alpha}}{|H_T|}$. Hence $p^{\alpha} = |H_T|$, since $p$ does not divide $|Q_t|$. Since $H_T \le H$ and $|H_T| = |H|$, we have $H = H_T$. Also $T \in O$ with $p$ not dividing $|O|$, so $|G_T| = p^{\alpha}$. Then $H_T \le G_T$ and $|H_T| = |G_T|$ imply that $H_T = G_T$. Hence $H = G_T$.
Theorem 3.4.4 (Sylow Theorem 3) Let $p^\alpha$ be an exact divisor of $|G|$, with $|G| = p^\alpha q$. Let $t_p$ be the number of Sylow $p$-subgroups in $G$. Then:

(a) $t_p \equiv 1 \pmod p$;

(b) $t_p \mid q$.

Proof: Let $H_1, \dots, H_r$ be the distinct Sylow $p$-subgroups of $G$. Recall that there is an orbit $O$ with $p$ not dividing $|O|$, and each $H_i$ is the stabilizer of some element of $O$. In fact, each $H_i$ is the stabilizer of the same number $s$ of elements of $O$, so $|O| = rs$.

Let $P_1 = \{S \in O : H_1 = G_S\}$. So $|P_1| = s$. If $T \in O \setminus P_1$, then the $H_1$-orbit $\mathcal{U}$ containing $T$ has more than one element, and $|(H_1)_T| \cdot |\mathcal{U}| = |H_1|$. But $|H_1| = p^\alpha$ and $|\mathcal{U}| > 1$ imply that $p$ divides $|\mathcal{U}|$. So $rs = |O| = s + mp$ for some integer $m$, whence $rs \equiv s \not\equiv 0 \pmod p$. So $s \not\equiv 0 \pmod p$ and $s(r - 1) \equiv 0 \pmod p$ imply that $r \equiv 1 \pmod p$.

Finally, $|O| = |G|/|G_S| = q$ for any $S \in O$, so that $rs = q$, and hence $r \mid q$. With $r = t_p$ the theorem is proved.
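All three Sylow theorems can be confirmed by brute force for a small group. The sketch below is my own verification, not part of the text; it works in $S_4$ (order $24 = 2^3 \cdot 3$) and enumerates subgroups generated by at most two elements, which suffices here because every Sylow subgroup of $S_4$ (dihedral of order 8, cyclic of order 3) is 2-generated.

```python
from itertools import permutations

def compose(a, b):
    """(a o b)(i) = a[b[i]]; permutations of {0,1,2,3} as tuples."""
    return tuple(a[i] for i in b)

G = list(permutations(range(4)))  # S_4

def generated(gens):
    """Subgroup generated by gens: close up under composition
    (in a finite group, closure under products gives a subgroup)."""
    H = {tuple(range(4))} | set(gens)
    while True:
        new = {compose(a, b) for a in H for b in H} - H
        if not new:
            return frozenset(H)
        H |= new

# All subgroups of S_4 generated by at most two elements.
subs = {generated([a, b]) for a in G for b in G}

for p, alpha, q in [(2, 3, 3), (3, 1, 8)]:
    t_p = sum(1 for H in subs if len(H) == p**alpha)
    assert t_p % p == 1 and q % t_p == 0    # Sylow Theorem 3
    print(f"p={p}: t_p = {t_p}")
```

This reports $t_2 = 3$ and $t_3 = 4$: indeed $3 \equiv 1 \pmod 2$ with $3 \mid 3$, and $4 \equiv 1 \pmod 3$ with $4 \mid 8$.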
3.5 Patterns and Weights
Let $X$ and $Y$ be finite, nonempty sets (with $|X| = m$), and let $\phi : G \to \mathcal{S}_X$ be an action of the group $G$ on the set $X$. Then $G$ induces an equivalence relation on $Y^X$, with equivalence classes being called patterns: viz., for $f, h \in Y^X$, $f \sim h$ if and only if there is some $g \in G$ for which $f = h \circ \phi(g^{-1})$.

Let $R$ be a commutative ring containing the rational numbers $\mathbb{Q}$ as a subring. Frequently, $R$ is a polynomial ring over $\mathbb{Q}$ in finitely many variables. Also, we suppose there is some weight function $w : Y \to R$. $Y$ is called the store, and $\sum_{y \in Y} w(y)$ is the store inventory. A weight function $W : Y^X \to R$ is then defined by
$$W(f) = \prod_{x \in X} w(f(x)).$$
It is easy to see that if $f$ and $h$ belong to the same pattern, then $W(f) = W(h)$. For, with $h = f \circ \phi(g^{-1})$,
$$W(h) = \prod_{x \in X} w(h(x)) = \prod_{x \in X} w[f(\phi(g^{-1})(x))] = \prod_{x \in X} w[f(x)] = W(f),$$
since $\phi(g^{-1})(x)$ varies over all elements of $X$ as $x$ varies over all elements of $X$. So the weight of a pattern may be defined as the weight of any function in that pattern. The inventory $\sum_{f \in Y^X} W(f)$ of $Y^X$ is equal to $\left( \sum_{y \in Y} w(y) \right)^{|X|}$. This is a special case (with each $|X_i| = 1$) of the following result, whose proof is given.
Theorem 3.5.1 If $X$ is partitioned into disjoint, nonempty subsets $X = X_1 + \cdots + X_k$, put
$$S = \{f \in Y^X : f \text{ is constant on each } X_i,\ i = 1, \dots, k\}.$$
Then the inventory of $S$ is defined to be $\sum_{f \in S} W(f)$ and is equal to
$$\prod_{i=1}^{k} \left( \sum_{y \in Y} (w(y))^{|X_i|} \right).$$

Proof: A term in the product is obtained by selecting one term in each factor and multiplying them together. This is equivalent to selecting a mapping $\psi$ of the set $\{1, \dots, k\}$ into $Y$, yielding the term $\prod_{i=1}^{k} [w(\psi(i))]^{|X_i|}$. Let $\sigma : X \to \{1, \dots, k\}$ be defined by $\sigma(x) = i$ if and only if $x \in X_i$. Put $f = \psi \circ \sigma$. Then
$$[w(\psi(i))]^{|X_i|} = \prod_{x \in X_i} w(\psi(i)) = \prod_{x \in X_i} w((\psi \circ \sigma)(x)) = \prod_{x \in X_i} w(f(x)),$$
from which it follows that
$$\prod_{i=1}^{k} [w(\psi(i))]^{|X_i|} = \prod_{x \in X} w(f(x)) = W(f).$$
Since each $f \in S$ can be written uniquely in the form $f = \psi \circ \sigma$ for some $\psi : \{1, \dots, k\} \to Y$, the desired result is easily seen to hold; viz.:
$$\prod_{i=1}^{k} \left( \sum_{y \in Y} (w(y))^{|X_i|} \right) = \sum_{\psi : \{1,\dots,k\} \to Y} \left( \prod_{i=1}^{k} [w(\psi(i))]^{|X_i|} \right) = \sum_{f \in S} W(f).$$
If each $|X_i| = 1$, we have $S = Y^X$ and $\sum_{f \in Y^X} W(f) = \left( \sum_{y \in Y} w(y) \right)^{|X|}$.
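Theorem 3.5.1 can be spot-checked by brute force with numeric weights. The sketch below is mine, not the text's; the partition, the store $Y$, and the integer weights are arbitrary choices (any commutative-ring values would do).

```python
from itertools import product

parts = [[0, 1], [2], [3]]        # X = {0,1,2,3} split as X_1 + X_2 + X_3
Y = [0, 1, 2]
w = {0: 1, 1: 2, 2: 5}            # an arbitrary weight function on Y

# Inventory of S: sum of W(f) over all f constant on each part,
# i.e. over all psi: {1,...,k} -> Y.
lhs = 0
for psi in product(Y, repeat=len(parts)):
    W = 1
    for part, y in zip(parts, psi):
        W *= w[y] ** len(part)    # contributes w(f(x)) for every x in the part
    lhs += W

# Right side of the theorem: product over parts of sum_y w(y)^{|X_i|}.
rhs = 1
for part in parts:
    rhs *= sum(w[y] ** len(part) for y in Y)

assert lhs == rhs
print(lhs)  # both sides agree
```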
Theorem 3.5.2 (Pólya-Redfield) The pattern inventory is given by:
$$\sum_{F} W(F) = P_G\left( \sum_{y \in Y} w(y),\ \sum_{y \in Y} [w(y)]^2,\ \dots,\ \sum_{y \in Y} [w(y)]^m \right),$$
where the summation is over all patterns $F$, and $P_G$ is the cycle index. In particular, if all weights are chosen equal to 1, the number of patterns is $P_G(|Y|, |Y|, \dots, |Y|)$. If $f \in F$ where $F$ is a given pattern, then $W(F) = W(f) = \prod_{x \in X} w(f(x))$. If $w(y_i) = x_i$, where $x_1, \dots, x_m$ are independent commuting variables, then $W(f) = x_1^{b_1} x_2^{b_2} \cdots x_m^{b_m}$, where $b_i$ is the number of times the color $y_i$ appears in the coloring of any $f$ in the pattern $F$. Hence the coefficient of $x_1^{b_1} x_2^{b_2} \cdots x_m^{b_m}$ in $P_G\left( \sum_{y} w(y),\ \sum_{y} [w(y)]^2,\ \dots,\ \sum_{y} [w(y)]^m \right)$ is the number of patterns in which the color $y_i$ appears $b_i$ times.
Proof: Let $w$ be one of the possible values that the weight of a function may have. Put $S = \{f \in Y^X : W(f) = w\}$. If $g \in G$, then $W(f \circ \phi(g^{-1})) = w$. Hence for each $g \in G$, $\psi(g) : f \mapsto f \circ \phi(g^{-1})$ maps $S$ into $S$. (And it is easy to see from earlier results that $\psi$ is an action of $G$ on $S$.) Clearly, for $f_1, f_2 \in S$, $f_1$ and $f_2$ belong to the same pattern (in the sense mentioned at the beginning of this section) if and only if they are equivalent relative to the action $\psi$ of $G$ on $S$. Now Burnside's Lemma applied to $\psi : G \to \mathcal{S}_S$ says that the number of patterns contained in $S$ is equal to $\frac{1}{|G|} \sum_{g \in G} \psi_w(g)$, where $\psi_w(g)$ denotes the number of functions $f$ with $W(f) = w$ and $f = \psi(g)(f) = f \circ \phi(g^{-1})$.

The patterns contained in $S$ all have weight $w$. So if we multiply by $w$ and sum over all possible values of $w$, we obtain the pattern inventory
$$\sum_{F} W(F) = \frac{1}{|G|} \sum_{g \in G} \sum_{w} \psi_w(g)\, w.$$
Also,
$$\sum_{w} \psi_w(g)\, w = \sum_{f}{}^{(g)}\, W(f),$$
where the right hand side is summed over all $f \in Y^X$ with $f = f \circ \phi(g^{-1})$. It follows that
$$\sum_{F} W(F) = \frac{1}{|G|} \sum_{g \in G} \sum_{f}{}^{(g)}\, W(f).$$
Here $\phi(g)$ splits $X$ into cycles. And $f = f \circ \phi(g^{-1})$ means
$$f(x) = f(\phi(g^{-1})(x)) = \cdots = f(\phi(g^{-i})(x)),$$
i.e., $f$ is constant on each cycle of $\phi(g^{-1})$, and hence on each cycle of $\phi(g)$. Conversely, each $f$ constant on each cycle of $\phi(g)$ automatically satisfies $f = f \circ \phi(g^{-1})$, since $\phi(g^{-1})(x)$ always belongs to the same cycle as $x$ itself. Thus if the cycles are $X_1, \dots, X_k$, then $\sum_{f}^{(g)} W(f)$ is the inventory calculated by Theorem 3.5.1 to be
$$\sum_{f}{}^{(g)}\, W(f) = \prod_{i=1}^{k} \left( \sum_{y \in Y} [w(y)]^{|X_i|} \right).$$
Let $(b_1, \dots, b_m)$ be the cycle type of $\phi(g)$. This means that among the numbers $|X_1|, \dots, |X_k|$, the number 1 occurs $b_1$ times, 2 occurs $b_2$ times, etc. Hence
$$\sum_{f}{}^{(g)}\, W(f) = \left( \sum_{y \in Y} w(y) \right)^{b_1} \cdot \left( \sum_{y \in Y} (w(y))^2 \right)^{b_2} \cdots \left( \sum_{y \in Y} (w(y))^m \right)^{b_m}.$$
Finally, $\sum_{F} W(F) = \frac{1}{|G|} \sum_{g \in G} \sum_{f}^{(g)} W(f)$ is obtained by putting $x_i = \sum_{y \in Y} (w(y))^i$ in
$$P_G(x_1, \dots, x_m) = \frac{1}{|G|} \sum_{g \in G} x_1^{b_1} \cdots x_m^{b_m}.$$
We close this section with some examples that partly duplicate some of
those given earlier.
Example 3.5.3 Suppose we want to distribute $m$ counters over three persons $P_1, P_2, P_3$ with the condition that $P_1$ obtain the same number as $P_2$. In how many ways is this possible?

We are not interested in the individual counters, but only in the number each person gets. Hence we want functions $f$ defined on $X = \{P_1, P_2, P_3\}$ with range $Y = \{0, 1, \dots, m\}$ and with the restrictions $f(P_1) = f(P_2)$ and $\sum_{i=1}^{3} f(P_i) = m$. Put $X_1 = \{P_1, P_2\}$ and $X_2 = \{P_3\}$. Define $w : Y \to R$ by $w(i) = x^i$. Thus the functions we are interested in have weight $x^m$, and they are the only ones with weight $x^m$. By Theorem 3.5.1 the inventory $\sum_{f \in S} W(f)$ must be equal to
$$\prod_{i=1}^{2} \left( \sum_{y \in Y} w(y)^{|X_i|} \right) = (1 + x^2 + x^4 + \cdots + x^{2m})(1 + x + x^2 + \cdots + x^m).$$
But the coefficient of $x^m$ in this product is the coefficient of $x^m$ in
$$(1 - x^2)^{-1}(1 - x)^{-1} = \frac{1}{4}(1 + x)^{-1} + \frac{1}{2}(1 - x)^{-2} + \frac{1}{4}(1 - x)^{-1},$$
which is the coefficient of $x^m$ in
$$\frac{1}{4}\left( 1 - x + x^2 - x^3 + \cdots + (-1)^m x^m + \cdots \right) + \frac{1}{2} \sum_{i=0}^{\infty} \binom{2 + i - 1}{i} x^i + \frac{1}{4}\left( 1 + x + x^2 + \cdots + x^m + \cdots \right),$$
which is equal to
$$\frac{1}{2}(m + 1) + \frac{1}{4}\left( (-1)^m + 1 \right) = \begin{cases} \frac{1}{2}m + 1, & m \text{ even}, \\ \frac{1}{2}(m + 1), & m \text{ odd}. \end{cases}$$
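A direct count agrees with this answer. The sketch below is my check, not the text's; `by_formula` is a hypothetical name for the case formula just derived, and the loop enumerates distributions $(a, a, c)$ with $2a + c = m$.

```python
def by_formula(m):
    # m/2 + 1 for even m, (m+1)/2 for odd m (both equal m//2 + 1).
    return m // 2 + 1 if m % 2 == 0 else (m + 1) // 2

for m in range(50):
    direct = sum(1 for a in range(m + 1) for c in range(m + 1)
                 if 2 * a + c == m)
    assert direct == by_formula(m), m
print("formula agrees with the direct count for m = 0..49")
```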
For the next few examples let G be the group of rigid motions of a cube.
The elements of G were given earlier, but this time we want to include the
details giving the cycle indices. Recall the elements of G from Example 3.3.2.
Example 3.5.4 Let $X$ be the set of vertices of the cube. The cycle types are indicated as follows:

(a) $x_1^8$; (b) $x_2^4$; (c) $x_4^2$; (d) $x_2^4$; (e) $x_1^2 x_3^2$.

So $P_G = \frac{1}{24}\left( x_1^8 + 9x_2^4 + 6x_4^2 + 8x_1^2 x_3^2 \right)$.
Example 3.5.5 Let $X$ be the set of edges of the cube. The cycle types are indicated as follows:

(a) $x_1^{12}$; (b) $x_2^6$; (c) $x_4^3$; (d) $x_1^2 x_2^5$; (e) $x_3^4$.

So $P_G = \frac{1}{24}\left( x_1^{12} + 3x_2^6 + 6x_4^3 + 6x_1^2 x_2^5 + 8x_3^4 \right)$.
Example 3.5.6 Let $X$ be the set of faces of the cube. Then
$$P_G = \frac{1}{24}\left( x_1^6 + 3x_1^2 x_2^2 + 6x_1^2 x_4 + 6x_2^3 + 8x_3^2 \right).$$
Example 3.5.7 Determine how many ways a cube can be painted so that each face is red or blue. In other words, how many patterns are there?

Let $X$ be the set of faces of the cube, and $\phi : G \to \mathcal{S}_X$ as in the preceding example. Put $Y = \{\text{red}, \text{blue}\}$, with the weight of each element being 1. Then the number of patterns is
$$P_G(2, 2, \dots) = \frac{1}{24}\left( 2^6 + 3 \cdot 2^4 + 6 \cdot 2^3 + 6 \cdot 2^3 + 8 \cdot 2^2 \right) = 10.$$
(Summary: (a) all faces red; (b) five red, one blue; (c) two opposite faces blue, the others red; (d) two adjacent faces blue, the others red; (e) three faces at one vertex red, the others blue; (f) two opposite faces plus one other red, the remaining faces blue; (g), (h), (i) and (j) obtained from (d), (c), (b) and (a) upon interchanging red and blue.)
Example 3.5.8 In the preceding example, how many color patterns show four red faces and two blue?

Let $w(\text{red}) = x$, $w(\text{blue}) = y$. Then the pattern inventory is
$$\sum_{F} W(F) = \frac{1}{24}\left[ (x + y)^6 + 3(x + y)^2 (x^2 + y^2)^2 + 6(x + y)^2 (x^4 + y^4) + 6(x^2 + y^2)^3 + 8(x^3 + y^3)^2 \right].$$
The coefficient of $x^4 y^2$ is $\frac{1}{24}(15 + 9 + 6 + 18 + 0) = 2$.
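The coefficient can be confirmed by expanding the inventory directly. The sketch below is mine, not the text's; it represents bivariate polynomials as coefficient dictionaries (`mul`, `power`, `add_scaled` are hypothetical helper names).

```python
from collections import defaultdict
from fractions import Fraction

# Bivariate polynomials as {(i, j): coeff} for x^i * y^j.
def mul(p, q):
    r = defaultdict(Fraction)
    for (a, b), c in p.items():
        for (d, e), f in q.items():
            r[(a + d, b + e)] += c * f
    return dict(r)

def power(p, n):
    r = {(0, 0): Fraction(1)}
    for _ in range(n):
        r = mul(r, p)
    return r

def add_scaled(terms):
    """Sum of (coefficient, polynomial) pairs."""
    r = defaultdict(Fraction)
    for c, p in terms:
        for k, v in p.items():
            r[k] += c * v
    return dict(r)

s1 = {(1, 0): Fraction(1), (0, 1): Fraction(1)}   # x + y
s2 = {(2, 0): Fraction(1), (0, 2): Fraction(1)}   # x^2 + y^2
s3 = {(3, 0): Fraction(1), (0, 3): Fraction(1)}   # x^3 + y^3
s4 = {(4, 0): Fraction(1), (0, 4): Fraction(1)}   # x^4 + y^4

inv = add_scaled([
    (Fraction(1, 24), power(s1, 6)),
    (Fraction(3, 24), mul(power(s1, 2), power(s2, 2))),
    (Fraction(6, 24), mul(power(s1, 2), s4)),
    (Fraction(6, 24), power(s2, 3)),
    (Fraction(8, 24), power(s3, 2)),
])

print(inv[(4, 2)])                 # patterns with four red, two blue faces
assert sum(inv.values()) == 10     # total number of patterns, Example 3.5.7
```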
Example 3.5.9 In how many ways can the eight vertices be painted with $n$ colors?

Let $X$ be the set of vertices, with $\phi : G \to \mathcal{S}_X$ as in Example 3.5.4. Let $Y = \{c_1, \dots, c_n\}$, with $w(c_i) = x_i$. Then the pattern inventory $PI$ is given by
$$PI = \frac{1}{24}\left[ (x_1 + \cdots + x_n)^8 + 9(x_1^2 + \cdots + x_n^2)^4 + 6(x_1^4 + \cdots + x_n^4)^2 + 8(x_1 + \cdots + x_n)^2 (x_1^3 + \cdots + x_n^3)^2 \right].$$
If the total number of patterns is all that is sought, putting $x_i = 1$ shows this number to be
$$\frac{1}{24}\, n^2 \left( n^6 + 17n^2 + 6 \right).$$
Example 3.5.10 Let $G$ be a finite group of order $m$. For each $a \in G$ put $\lambda_a(g) = ag$. So $\lambda_a \in \mathcal{S}_G$. Then $G^* = \{\lambda_a : a \in G\}$ is a subgroup of $\mathcal{S}_G$, and $\phi : G \to G^* : a \mapsto \lambda_a$ is an isomorphism called the left regular representation of $G$. We now calculate the cycle index of $G$ relative to its left regular representation.

Let $k(a)$ be the order of $a$ for each $a \in G$. Then $\lambda_a$ splits $G$ into $m/k(a)$ cycles of length $k(a)$. So
$$P_G = \frac{1}{m} \sum_{a \in G} x_{k(a)}^{m/k(a)} = \frac{1}{m} \sum_{d \mid m} \psi(d)\, x_d^{m/d},$$
where $\psi(d)$ is the number of elements $a$ in $G$ of order $k(a) = d$. If $G$ is cyclic of order $m$, then $\psi(d) = \varphi(d)$ (Euler's totient) for each $d$ such that $d \mid m$, so
$$P_G = \frac{1}{m} \sum_{d : d \mid m} \varphi(d)\, x_d^{m/d}.$$
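For a cyclic group this cycle index counts necklaces, which gives an easy cross-check. The sketch below is mine, not the text's; `necklaces` and `brute` are hypothetical names comparing $P_G(n, \dots, n)$ with a direct orbit count under rotation.

```python
from math import gcd

def totient(d):
    return sum(1 for i in range(1, d + 1) if gcd(i, d) == 1)

def necklaces(m, n):
    """Number of n-colorings of Z_m up to rotation:
    P_G(n,...,n) = (1/m) * sum over d|m of phi(d) * n^(m/d)."""
    return sum(totient(d) * n ** (m // d)
               for d in range(1, m + 1) if m % d == 0) // m

def brute(m, n):
    """Count rotation orbits on length-m words over n letters directly."""
    seen, count = set(), 0
    for c in range(n ** m):
        word = tuple((c // n**i) % n for i in range(m))
        if word in seen:
            continue
        count += 1
        for s in range(m):
            seen.add(word[s:] + word[:s])
    return count

for m, n in [(4, 2), (6, 2), (5, 3)]:
    assert necklaces(m, n) == brute(m, n), (m, n)
print(necklaces(6, 2))  # binary necklaces of length 6
```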
3.6 The Symmetric Group
Let $X$ be a finite set with $m$ elements. Let $G = \mathcal{S}_X \simeq \mathcal{S}_m$. Let $\bar{b} = (b_1, \dots, b_m)$ be a permissible cycle type of some $g \in G$, i.e., $b_i \ge 0$ and $b_1 + 2b_2 + 3b_3 + \cdots + mb_m = m$. Then:

Theorem 3.6.1 The number $\#(\bar{b})$ of permutations in $\mathcal{S}_m$ having type $\bar{b}$ is
$$\#(\bar{b}) = \frac{m!}{b_1!\,1^{b_1}\; b_2!\,2^{b_2}\; b_3!\,3^{b_3} \cdots b_m!\,m^{b_m}}.$$
Proof: There are $\binom{m}{b_1}$ ways to form the 1-cycles. Suppose we have taken care of the 1-cycles, 2-cycles, \dots, $(k-1)$-cycles and are about to form the $k$-cycles. We have $\ell_{k-1} = m - b_1 - 2b_2 - \cdots - (k-1)b_{k-1}$ elements at our disposal ($\ell_0 = m$). The first $k$-cycle can be formed in $(k-1)!\binom{\ell_{k-1}}{k}$ ways, the second $k$-cycle in $(k-1)!\binom{\ell_{k-1} - k}{k}$ ways, \dots, the $b_k$th $k$-cycle in $(k-1)!\binom{\ell_{k-1} - (b_k - 1)k}{k}$ ways. Hence the $k$-cycles can be formed in
$$\frac{1}{b_k!}\left[ (k-1)!\binom{\ell_{k-1}}{k} \cdot (k-1)!\binom{\ell_{k-1} - k}{k} \cdot (k-1)!\binom{\ell_{k-1} - 2k}{k} \cdots (k-1)!\binom{\ell_{k-1} - (b_k - 1)k}{k} \right] =$$
$$= \frac{1}{b_k!}\left[ \frac{(k-1)!\,\ell_{k-1}!}{k!\,(\ell_{k-1} - k)!} \cdot \frac{(k-1)!\,(\ell_{k-1} - k)!}{k!\,(\ell_{k-1} - 2k)!} \cdot \frac{(k-1)!\,(\ell_{k-1} - 2k)!}{k!\,(\ell_{k-1} - 3k)!} \cdots \frac{(k-1)!\,(\ell_{k-1} - k(b_k - 1))!}{k!\,(\ell_{k-1} - b_k k)!} \right] = \frac{\ell_{k-1}!}{b_k!\, k^{b_k}\, \ell_k!}$$
ways. (The initial factor $\frac{1}{b_k!}$ arises from the fact that the $k$-cycles may be written in any order.) Multiplying these counts over $k = 1, \dots, m$, the $\ell$-factorials telescope and leave the stated formula.
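The count in Theorem 3.6.1 can be verified exhaustively for small $m$. The following sketch is my check, not the text's; it classifies all $120$ permutations of $S_5$ by cycle type and compares with the formula (`cycle_type` and `count_by_formula` are hypothetical names).

```python
from collections import Counter
from itertools import permutations
from math import factorial

def cycle_type(perm):
    """b = (b_1, ..., b_m) with b_i = number of i-cycles of perm."""
    m, seen, b = len(perm), set(), [0] * len(perm)
    for start in range(m):
        if start not in seen:
            length, j = 0, start
            while j not in seen:
                seen.add(j)
                j, length = perm[j], length + 1
            b[length - 1] += 1
    return tuple(b)

def count_by_formula(b):
    m = sum((i + 1) * bi for i, bi in enumerate(b))
    denom = 1
    for i, bi in enumerate(b):
        denom *= factorial(bi) * (i + 1) ** bi
    return factorial(m) // denom

observed = Counter(cycle_type(p) for p in permutations(range(5)))
for b, n in observed.items():
    assert n == count_by_formula(b), b
print(len(observed), "cycle types in S_5; all counts match the formula")
```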
From Theorem 3.6.1 it follows readily that:

Theorem 3.6.2 The cycle index of $\mathcal{S}_m$ is given by
$$P_{\mathcal{S}_m} = \sum_{\bar{b}} \frac{x_1^{b_1} \cdots x_m^{b_m}}{b_1!\,1^{b_1}\; b_2!\,2^{b_2} \cdots b_m!\,m^{b_m}},$$
where the sum is over all $\bar{b} = (b_1, \dots, b_m)$ for which $b_i \ge 0$ and $b_1 + 2b_2 + \cdots + mb_m = m$.
Now let $M = \{1, 2, \dots, m\}$, and let $X$ be the set of unordered pairs $\{i, j\}$ of distinct elements of $M$. The symmetric group $\mathcal{S}_m$ acting on $M$ also has a natural action on $X$. For each $\sigma \in \mathcal{S}_m$, define $\bar\sigma \in \mathcal{S}_X$ by
$$\bar\sigma : \{i, j\} \mapsto \{\sigma(i), \sigma(j)\}.$$
Then $\phi : \sigma \mapsto \bar\sigma$ is an action of $\mathcal{S}_m$ on $X$, and
$$\phi : \mathcal{S}_m \to \mathcal{S}_m^{(2)} = \{\bar\sigma : \sigma \in \mathcal{S}_m\}$$
is an isomorphism. We need to calculate the cycle index of $\mathcal{S}_m^{(2)}$. This is feasible because the cycle type of each $\sigma$ determines the cycle type of the corresponding $\bar\sigma$. Specifically, each factor $x_t^{b_t}$ in a term $x_1^{b_1} \cdots x_m^{b_m}$ of $P_{\mathcal{S}_m}$ corresponding to a $\sigma$ of cycle type $(b_1, \dots, b_m)$ yields specific factors in the term corresponding to $\bar\sigma$.
Let $\sigma \in \mathcal{S}_m$ have cycle type $(b_1, \dots, b_m)$, $\sum_{i=1}^{m} i b_i = m$. Then for $\{i, j\} \in X$ we ask: what is the length of the cycle of $\bar\sigma$ to which $\{i, j\}$ belongs?

First suppose $i$ and $j$ belong to the same cycle of $\sigma$, of length $t$. If $t$ is odd, $(\bar\sigma)^k : \{i, j\} \mapsto \{\sigma^k(i), \sigma^k(j)\} = \{i, j\}$ if and only if 1) $\sigma^k(i) = i$ and $\sigma^k(j) = j$, or 2) $\sigma^k(i) = j$ and $\sigma^k(j) = i$. Since $\sigma^k(i) = j$ and $\sigma^k(j) = i$ imply $\sigma^{2k}(i) = i$, so that $t$ divides $2k$, and $t$ is odd, it must be that $k$ is a multiple of $t$; but then $\sigma^k(i) = i \ne j$, so case 2) cannot occur. Hence $\sigma^k(i) = i$ and $\sigma^k(j) = j$, and the $\bar\sigma$-cycle of $\{i, j\}$ has length $t$ also. Hence one cycle of $\sigma$ having odd length $t$ produces $\frac{1}{t}\binom{t}{2} = \frac{t-1}{2}$ cycles of length $t$.

Now suppose $i$ and $j$ belong to the same cycle of $\sigma$ of even length $t = 2k$. This cycle produces one cycle of $\bar\sigma$ of length $k$ (on the pairs of the form $\{i, \sigma^k(i)\}$) and
$$\frac{1}{2k}\left[ \binom{2k}{2} - k \right] = \frac{1}{2k}\left[ \frac{2k(2k-1)}{2} - \frac{2k}{2} \right] = \frac{t-2}{2}$$
cycles of length $t$.

Since there are $b_t$ cycles of length $t$, the pairs $\{i, j\}$ belonging to common cycles of $\sigma$ yield cycles of $\bar\sigma$ in such a way that the terms of the cycle index of $\mathcal{S}_m$ yield terms of the cycle index of $\mathcal{S}_m^{(2)}$ as follows:
$$x_t^{b_t} \in P_{\mathcal{S}_m} \;\longmapsto\; \begin{cases} x_t^{b_t \frac{t-1}{2}}, & t \text{ odd}; \\[4pt] \left( x_{t/2}\; x_t^{\frac{t-2}{2}} \right)^{b_t}, & t \text{ even}. \end{cases}$$
Now we suppose that $i$ and $j$ come from distinct cycles $c_r$ and $c_t$ of $\sigma$, of lengths $r$ and $t$, respectively. (Note: $(r, t)$ and $[r, t]$ denote the greatest common divisor and the least common multiple, respectively, of $r$ and $t$; also $rt = (r, t)[r, t]$.) The cycles $c_r$ and $c_t$ induce, on the pairs of elements one each from $c_r$ and $c_t$, exactly $(r, t)$ cycles of length $[r, t]$, since $\{\sigma^k(i), \sigma^k(j)\} = \{i, j\}$ if and only if $k \equiv 0 \pmod{[r, t]}$. In particular, if $r = t = k$, each pair of distinct $k$-cycles contributes $k$ cycles of length $k$. Hence we have the following:
$$\text{If } r \ne t:\qquad x_r^{b_r} x_t^{b_t} \in P_{\mathcal{S}_m} \;\longmapsto\; x_{[r,t]}^{(r,t)\, b_r b_t} \in P_{\mathcal{S}_m^{(2)}}.$$
$$\text{If } r = t = k:\qquad x_k^{b_k} \in P_{\mathcal{S}_m} \;\longmapsto\; x_k^{k \binom{b_k}{2}}.$$
Multiplying over appropriate cases, and summing over permissible cycle types $\bar{b}$, we finally obtain the cycle index of $\mathcal{S}_m^{(2)}$:
$$P_{\mathcal{S}_m^{(2)}}(x_1, \dots, x_m) = \frac{1}{m!} \sum_{\bar{b}} \frac{m!}{b_1! \cdots b_m!\; 1^{b_1} 2^{b_2} \cdots m^{b_m}} \cdot \prod_{k=0}^{\lfloor (m-1)/2 \rfloor} x_{2k+1}^{k\, b_{2k+1}} \cdot \prod_{k=1}^{\lfloor m/2 \rfloor} \left( x_k\, x_{2k}^{k-1} \right)^{b_{2k}} \cdot \prod_{k=1}^{\lfloor m/2 \rfloor} x_k^{k \binom{b_k}{2}} \cdot \prod_{1 \le r < t \le m-1} x_{[r,t]}^{(r,t)\, b_r b_t}.$$
Example 3.6.3 For $m = 4$ we have
$$P_{\mathcal{S}_4} = \frac{1}{4!}\left( x_1^4 + 6x_1^2 x_2 + 8x_1 x_3 + 3x_2^2 + 6x_4 \right),$$
$$P_{\mathcal{S}_4^{(2)}} = \frac{1}{4!}\left( x_1^6 + 6x_1^2 x_2^2 + 8x_3^2 + 3x_1^2 x_2^2 + 6x_2 x_4 \right).$$
For future reference we calculate:
$$P_{\mathcal{S}_4^{(2)}}(1 + x,\ 1 + x^2,\ 1 + x^3,\ 1 + x^4) = 1 + x + 2x^2 + 3x^3 + 2x^4 + x^5 + x^6.$$
3.7 Counting Graphs
We count the graphs on $m$ vertices with $q$ edges. Let $\mathcal{G}$ denote the set of graphs on the vertices $M = \{1, 2, \dots, m\}$. Such a graph $\Gamma$ is a function from the set $X$ of unordered pairs $\{i, j\}$ of distinct elements of $M$ to the set $Y = \{0, 1\}$, where $\Gamma(\{i, j\})$ is 1 or 0 according as $\{i, j\}$ is an edge or a nonedge of the graph $\Gamma$.

Two such graphs $\Gamma_1$ and $\Gamma_2$ are equivalent if and only if there is a relabeling of the vertices of $\Gamma_1$ so that it has the same edges as $\Gamma_2$, i.e., iff there is a permutation $\sigma \in \mathcal{S}_m$ so that as functions $\Gamma_2 = \Gamma_1 \circ \bar\sigma$, where for each $\sigma \in \mathcal{S}_m$, $\bar\sigma$ acts on $X$ as usual: $\bar\sigma : \{i, j\} \mapsto \{\sigma(i), \sigma(j)\}$.

Let $w : Y \to \{1, x\}$ be defined by $w(i) = x^i$. Then the weight of a graph $\Gamma \in Y^X$ is $W(\Gamma) = x^q$, where $q$ is the number of edges of $\Gamma$. The pattern inventory of $\mathcal{G} = Y^X$ is $P_{\mathcal{S}_m^{(2)}}(1 + x,\ 1 + x^2,\ 1 + x^3,\ \dots)$ by Pólya's theorem. So the number of graphs on $m$ vertices with $q$ edges is the coefficient of $x^q$ in $P_{\mathcal{S}_m^{(2)}}(1 + x,\ 1 + x^2,\ 1 + x^3,\ \dots)$.

In particular, putting $m = 4$, we see from the preceding section that there are 11 graphs on 4 vertices.
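This count can be confirmed without Pólya theory. The brute-force sketch below is mine, not the text's; it groups the $2^6$ edge sets on 4 labeled vertices into $S_4$-orbits and tallies them by number of edges, reproducing the coefficients $1, 1, 2, 3, 2, 1, 1$ of the polynomial in Example 3.6.3.

```python
from itertools import combinations, permutations

m = 4
pairs = list(combinations(range(m), 2))
perms = list(permutations(range(m)))

def relabel(edges, sigma):
    return frozenset(frozenset(sigma[i] for i in e) for e in edges)

seen, by_q = set(), [0] * (len(pairs) + 1)
for r in range(len(pairs) + 1):
    for edges in combinations(pairs, r):
        E = frozenset(frozenset(e) for e in edges)
        if E in seen:
            continue
        by_q[r] += 1                      # new orbit with r edges
        for sigma in perms:               # mark the whole S_4-orbit
            seen.add(relabel(E, sigma))

print(by_q)          # coefficients of x^0 .. x^6
print(sum(by_q))     # total graphs on 4 vertices
```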
Chapter 4

Formal Power Series as Generating Functions

4.1 Using Power Series to Count Objects

Suppose we are interested in all combinations of three objects $O_1, O_2, O_3$:
$$\emptyset,\quad \{O_1\},\ \{O_2\},\ \{O_3\},\quad \{O_1, O_2\},\ \{O_1, O_3\},\ \{O_2, O_3\},\quad \{O_1, O_2, O_3\}.$$
Consider the generating function
$$C_3(x) = (1 + O_1 x)(1 + O_2 x)(1 + O_3 x) = 1 + (O_1 + O_2 + O_3)x + (O_1 O_2 + O_1 O_3 + O_2 O_3)x^2 + (O_1 O_2 O_3)x^3.$$
This is readily generalized to
$$C_n(x) = (1 + O_1 x)(1 + O_2 x) \cdots (1 + O_n x) = 1 + a_1 x + \cdots + a_n x^n,$$
where $a_k$ is the $k$th elementary symmetric function of the $n$ variables $O_1$ to $O_n$. $C_n(x)$, after multiplication of its factors, contains the actual exhibition of the combinations. If only the number of combinations is of interest, the object labels may be ignored and the generating function becomes an enumerating generating function (sometimes just called an enumerator), i.e.,
$$C_n(x) = (1 + x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k.$$
As a simple (and familiar!) example of the way in which generating functions are used, consider the following:
$$C_n(x) = (1 + x)^n = (1 + x)C_{n-1}(x) = (1 + x)\sum_{k=0}^{n-1}\binom{n-1}{k}x^k$$
$$= \binom{n-1}{0} + \left[\binom{n-1}{0} + \binom{n-1}{1}\right]x + \cdots + \left[\binom{n-1}{n-2} + \binom{n-1}{n-1}\right]x^{n-1} + \binom{n-1}{n-1}x^n,$$
where for $1 \le k \le n-1$ the coefficient on $x^k$ is $\binom{n-1}{k-1} + \binom{n-1}{k}$.
But
$$C_n(x) = \sum_{k=0}^{n} \binom{n}{k} x^k,$$
so
$$\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k},$$
a basic relation for binomial coefficients.
Now look again at the generating function for combinations of $n$ distinct objects. In its factored form, each object is represented by a binomial, and each binomial spells out the fact that its object has two possibilities in any combination: either it is absent (the term 1) or it is present (the term $O_i x$ for the object $O_i$). So $C_n(x) = (1 + O_1 x) \cdots (1 + O_n x)$ is the generating function for combinations without repetition. For repetitions of a certain kind, we use a special generating function. For example, if an object may appear in a combination zero, one or two times, then the function is the polynomial $1 + Ox + O^2 x^2$. If the number of repetitions is unlimited, it is the function $1 + Ox + O^2 x^2 + O^3 x^3 + \cdots = (1 - Ox)^{-1}$. If the number of repetitions is even, the generating function is $1 + O^2 x^2 + O^4 x^4 + \cdots + O^{2k} x^{2k} + \cdots$. Moreover, the specification of repetitions may be made arbitrarily for each object. The generating function is a representation of this specification in its factored form as well as a representation of the corresponding combinations in its developed (multiplied out) form.
EXAMPLE: For combinations of $n$ objects with no restrictions on the number of repetitions for any object, the enumerating generating function is
$$(1 + x + x^2 + \cdots)^n = (1 - x)^{-n} = \sum_{k=0}^{\infty} \binom{-n}{k}(-x)^k = \sum_{k=0}^{\infty} \frac{(-n)(-n-1)\cdots(-n-k+1)}{k!}\,(-x)^k = \sum_{k=0}^{\infty} \binom{n+k-1}{k} x^k.$$
This is worth stating as a theorem.
Theorem 4.1.1 The number of combinations, with arbitrary repetition, of $n$ objects taken $k$ at a time is the same as the number of combinations without repetition of $n + k - 1$ objects taken $k$ at a time. The corresponding generating function is given by:
$$(1 + x + x^2 + \cdots)^n = (1 - x)^{-n} = \sum_{k=0}^{\infty} \binom{n+k-1}{k} x^k. \qquad (4.1)$$
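Theorem 4.1.1 is easy to sanity-check by enumerating multisets directly (my check, not the text's):

```python
from itertools import combinations_with_replacement
from math import comb

# Multisets of size k from n objects vs. C(n+k-1, k).
for n in range(1, 6):
    for k in range(7):
        direct = sum(1 for _ in combinations_with_replacement(range(n), k))
        assert direct == comb(n + k - 1, k), (n, k)
print("multiset counts match C(n+k-1, k)")
```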
For the problem in the above example with the added specification that each object must appear at least once, the enumerator is
$$(x + x^2 + \cdots)^n = x^n (1 - x)^{-n} = \sum_{k=0}^{\infty} \binom{n+k-1}{k} x^{k+n} = \sum_{k=n}^{\infty} \binom{k-1}{k-n} x^k.$$
Hence the number of combinations of $n$ objects taken $k$ at a time, with the restriction that each object appear at least once, is the same as the number of combinations without repetition of $k - 1$ objects taken $k - n$ at a time.
In this section we have seen power series manipulated without any concern for whether or not they converge in an analytic sense. We want to establish a firm foundation for the theory of formal power series so that we can continue to work magic with series, so a significant part of this chapter will be devoted to developing those properties of formal power series that have proved to be most useful. Before starting this development we give one more example to illustrate the power of these methods.
4.2 A famous example: Stirling numbers of the 2nd kind

We have let $\left\{{n \atop k}\right\}$ be the number of partitions of $[n]$ into $k$ nonempty classes, for integers $n, k$ with $0 < k \le n$. Also, we put $\left\{{n \atop k}\right\} = 0$ if $k > n$ or $n < 0$ or $k < 0$. Further, we take $\left\{{n \atop 0}\right\} = 0$ for $n \ne 0$. Hence we have
$$\left\{{n \atop k}\right\} = \left\{{n-1 \atop k-1}\right\} + k\left\{{n-1 \atop k}\right\} \qquad (4.2)$$
for $(n, k) \ne (0, 0)$. And $\left\{{0 \atop 0}\right\} = 1$. (Recall: we proved this earlier for $0 < k \le n$.)

For each integer $k \ge 0$ put
$$B_k(x) = \sum_{n} \left\{{n \atop k}\right\} x^n. \qquad (4.3)$$
Multiply Eq. 4.2 by $x^n$ and sum over $n$:
$$B_k(x) = x B_{k-1}(x) + kx B_k(x) \qquad (k \ge 1;\ B_0(x) = 1). \qquad (4.4)$$
This leads to:
$$B_k(x) = \frac{x}{1 - kx}\, B_{k-1}(x) = \frac{x^2}{(1 - kx)(1 - (k-1)x)}\, B_{k-2}(x).$$
So $B_1(x) = \frac{x}{1-x} \cdot 1$; $B_2(x) = \frac{x}{1-2x} B_1(x) = \frac{x^2}{(1-x)(1-2x)}$; $B_3(x) = \frac{x}{1-3x} B_2(x) = \frac{x^3}{(1-x)(1-2x)(1-3x)}$. And in general,
$$B_k(x) = \sum_{n} \left\{{n \atop k}\right\} x^n = \frac{x^k}{(1-x)(1-2x)\cdots(1-kx)}, \qquad k \ge 0. \qquad (4.5)$$
The partial fraction expansion of Eq. 4.5 has the form
$$\frac{1}{(1-x)(1-2x)\cdots(1-kx)} = \sum_{j=1}^{k} \frac{\alpha_j}{1 - jx}.$$
To find the $\alpha$'s, fix $r$, $1 \le r \le k$, and multiply both sides by $1 - rx$:
$$\frac{1}{(1-x)\cdots(1-(r-1)x)(1-(r+1)x)\cdots(1-kx)} = \sum_{j=1;\, j \ne r}^{k} \frac{\alpha_j (1 - rx)}{1 - jx} + \alpha_r.$$
Now putting $x = 1/r$ gives:
$$\alpha_r = \frac{1}{(1 - \frac{1}{r})(1 - \frac{2}{r})\cdots(1 - \frac{r-1}{r})(1 - \frac{r+1}{r})\cdots(1 - \frac{k}{r})} = \frac{r^{k-1}}{(r-1)(r-2)\cdots(1)\cdot(-1)\cdots(-(k-r))} = \frac{r^{k-1}}{(-1)^{k-r}(r-1)!\,(k-r)!},$$
which implies
$$\alpha_r = \frac{(-1)^{k-r}\, r^{k-1}}{(r-1)!\,(k-r)!}, \qquad 1 \le r \le k.$$
Notation: If $f(x) = \sum_{n=0}^{\infty} a_n x^n$, then $[x^n]f = a_n$. Clearly in general we have $[x^n]f = [x^{n+r}]\,x^r f$.
We now have for $k \ge 1$:
$$\left\{{n \atop k}\right\} = [x^n]\left( \frac{x^k}{(1-x)\cdots(1-kx)} \right) = [x^{n-k}]\left( \frac{1}{(1-x)\cdots(1-kx)} \right) = [x^{n-k}] \sum_{r=1}^{k} \frac{\alpha_r}{1-rx}$$
$$= \sum_{r=1}^{k} \alpha_r\, [x^{n-k}]\, \frac{1}{1-rx} = \sum_{r=1}^{k} \alpha_r\, r^{n-k} = \sum_{r=1}^{k} \frac{(-1)^{k-r}\, r^{n-1}}{(r-1)!\,(k-r)!} = \frac{1}{(k-1)!} \sum_{r=1}^{k} (-1)^{k-r}\, r^{n-1} \binom{k-1}{r-1}.$$
This gives a closed form formula for $\left\{{n \atop k}\right\}$:
$$\left\{{n \atop k}\right\} = \frac{1}{(k-1)!} \sum_{r=0}^{k-1} (-1)^{k-r-1}\, (r+1)^{n-1} \binom{k-1}{r}. \qquad (4.6)$$
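The closed form (4.6) can be cross-checked against the recurrence (4.2). The sketch below is my verification, not the text's (`stirling2` and `closed_form` are hypothetical names):

```python
from functools import lru_cache
from math import comb, factorial

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling numbers of the 2nd kind via the recurrence (4.2)."""
    if n == k == 0:
        return 1
    if n <= 0 or k <= 0 or k > n:
        return 0
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)

def closed_form(n, k):
    """Formula (4.6); valid for k >= 1."""
    s = sum((-1) ** (k - r - 1) * (r + 1) ** (n - 1) * comb(k - 1, r)
            for r in range(k))
    return s // factorial(k - 1)   # the division is exact

for n in range(1, 12):
    for k in range(1, n + 1):
        assert closed_form(n, k) == stirling2(n, k), (n, k)
print("closed form (4.6) matches the recurrence")
```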
4.3 Ordinary Generating Functions
Now that we have seen some utility for the use of power series as generating functions, we want to introduce formal power series and lay a sound theoretical foundation for their use. However, this material is rather technical, and we believe it makes better pedagogical sense to use power series informally as ordinary generating functions first. This will permit the reader to get used to working with these objects before having to deal with the abstract definitions and theorems.

Def. The symbol $f \stackrel{ops}{\longleftrightarrow} \{a_n\}_0^\infty$ means that $f$ is the ordinary power series generating function for the sequence $\{a_n\}_0^\infty$, i.e., $f(x) = \sum_{i=0}^{\infty} a_i x^i$.
Rule 1. If $f \stackrel{ops}{\longleftrightarrow} \{a_n\}_0^\infty$, then for a positive integer $h$,
$$\frac{f - a_0 - a_1 x - \cdots - a_{h-1} x^{h-1}}{x^h} \stackrel{ops}{\longleftrightarrow} \{a_{n+h}\}_{n=0}^{\infty}.$$
This follows immediately, since the L.H.S. equals
$$\frac{a_h x^h + a_{h+1} x^{h+1} + \cdots}{x^h} = a_h + a_{h+1} x + \cdots + a_{h+n} x^n + \cdots.$$
Def. The derivative of $f = \sum a_n x^n$ is $f' = \sum n a_n x^{n-1}$.

Proposition 1. $f' = 0$ iff $f = a_0$, i.e., $a_i = 0$ for $i \ge 1$.

Proposition 2. If $f' = f$, then $f = c e^x$.

Proof: $f = f'$ iff $\sum_{n=0}^{\infty} a_n x^n = \sum_{n=1}^{\infty} n a_n x^{n-1} = \sum_{n=0}^{\infty} (n+1) a_{n+1} x^n$ iff $a_n = (n+1) a_{n+1}$ for all $n \ge 0$, i.e., $a_{n+1} = \frac{a_n}{n+1}$ for $n \ge 0$. By induction on $n$, $a_n = a_0/n!$ for all $n \ge 0$. So $f = a_0 \sum \frac{x^n}{n!} = a_0 e^x$.
Starting with $f = \sum_{n=0}^{\infty} a_n x^n$, we clearly have
$$f' = \sum_{n=0}^{\infty} (n+1) a_{n+1} x^n.$$
Then we see that
$$x f' = \sum_{n=0}^{\infty} (n+1) a_{n+1} x^{n+1} = \sum_{n=0}^{\infty} n a_n x^n.$$
So if we let $D$ be the operator on $f$ that gives $f'$, then
$$f \stackrel{ops}{\longleftrightarrow} \{a_n\}_0^\infty \implies (xD)f \stackrel{ops}{\longleftrightarrow} \{n a_n\}_0^\infty.$$
Then clearly $(xD)^2 f \stackrel{ops}{\longleftrightarrow} \{n^2 a_n\}_0^\infty$. And $(2 - 3(xD) + 5(xD)^2)f \stackrel{ops}{\longleftrightarrow} \{(2 - 3n + 5n^2) a_n\}_0^\infty$. More generally, we obtain

Rule 2: If $f \stackrel{ops}{\longleftrightarrow} \{a_n\}_0^\infty$ and $P$ is a polynomial, then
$$(P(xD))f \stackrel{ops}{\longleftrightarrow} \{P(n)\, a_n\}_{n \ge 0}.$$
The next Rule is just the product definition.

Rule 3: If $f \stackrel{ops}{\longleftrightarrow} \{a_n\}_0^\infty$ and $g \stackrel{ops}{\longleftrightarrow} \{b_n\}_0^\infty$, then
$$fg \stackrel{ops}{\longleftrightarrow} \left\{ \sum_{r=0}^{n} a_r b_{n-r} \right\}_0^\infty.$$
An immediate generalization is:

Rule 4: If $f \stackrel{ops}{\longleftrightarrow} \{a_n\}_0^\infty$, then
$$f^k \stackrel{ops}{\longleftrightarrow} \left\{ \sum a_{n_1} a_{n_2} \cdots a_{n_k} \right\}_{n=0}^{\infty},$$
where the sum is over all $(n_1, \dots, n_k)$ for which $n_1 + n_2 + \cdots + n_k = n$ and $n_i \ge 0$.
Example: Let $f(n, k)$ be the number of weak $k$-compositions of $n$. Since $\frac{1}{1-x} \stackrel{ops}{\longleftrightarrow} \{1\}$, by Rule 4, $\frac{1}{(1-x)^k} \stackrel{ops}{\longleftrightarrow} \{f(n, k)\}_{n=0}^{\infty}$. And as we have already seen,
$$(1-x)^{-k} = \sum_{n=0}^{\infty} \binom{-k}{n} (-1)^n x^n \quad \text{implies that} \quad f(n, k) = (-1)^n \binom{-k}{n} = \binom{n+k-1}{n}.$$
We now ask what happens when we multiply by $\frac{1}{1-x}$:
$$\frac{f(x)}{1-x} = (a_0 + a_1 x + a_2 x^2 + \cdots)(1 + x + x^2 + \cdots) = a_0 + (a_0 + a_1)x + (a_0 + a_1 + a_2)x^2 + \cdots.$$
This leads to Rule 5:

Rule 5: If $f \stackrel{ops}{\longleftrightarrow} \{a_n\}_0^\infty$, then $f/(1-x) \stackrel{ops}{\longleftrightarrow} \left\{ \sum_{j=0}^{n} a_j \right\}_{n \ge 0}$.
Exercise: 4.3.1 $D^n\left( \frac{1}{(1-x)^{m+1}} \right) = \frac{(m+n)!}{m!\,(1-x)^{m+n+1}}$; $m, n \ge 0$.
Recall that there is a formal Taylor's formula. If you did not prove it when we met it earlier, do it now.
Exercise: 4.3.2 If $f(x) = \sum_{n=0}^{\infty} a_n x^n$, then $a_n = \frac{1}{n!}\, D^n(f(x))\big|_{x=0}$.
We can put together the two previous exercises to obtain
$$[x^n]\left( \frac{1}{(1-x)^{m+1}} \right) = \frac{(m+n)!}{m!\,n!}. \qquad (4.7)$$
Example 4.3.3 Start with $f = \frac{1}{1-x} \stackrel{ops}{\longleftrightarrow} \{1\}_{n \ge 0}$. By Rule 2 we have $(xD)^2\left( \frac{1}{1-x} \right) \stackrel{ops}{\longleftrightarrow} \{n^2\}_{n \ge 0}$. Then by Rule 5, $\frac{1}{1-x}(xD)^2\left( \frac{1}{1-x} \right) \stackrel{ops}{\longleftrightarrow} \left\{ \sum_{j=0}^{n} j^2 \right\}_{n \ge 0}$. This implies
$$\sum_{j=0}^{n} j^2 = [x^n]\left( \frac{1}{1-x}(xD)^2\left( \frac{1}{1-x} \right) \right) = [x^n]\, \frac{x(1+x)}{(1-x)^4},$$
after some calculation. From Eq. 4.7 with $m = 3$, $[x^n]\frac{1}{(1-x)^4} = \frac{(n+3)!}{3!\,n!} = \binom{n+3}{3}$. Hence
$$\sum_{j=0}^{n} j^2 = [x^n]\left( \frac{x}{(1-x)^4} + \frac{x^2}{(1-x)^4} \right) = [x^{n-1}]\frac{1}{(1-x)^4} + [x^{n-2}]\frac{1}{(1-x)^4} = \binom{n+2}{3} + \binom{n+1}{3} = \frac{n(n+1)(2n+1)}{6}.$$
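The whole manipulation can be replayed on truncated series with exact arithmetic. The sketch below is mine, not the text's; it applies $(xD)^2$ and multiplication by $\frac{1}{1-x}$ coefficientwise.

```python
from fractions import Fraction

N = 20  # work with series truncated at degree N

def mul(f, g):
    h = [Fraction(0)] * (N + 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            if i + j <= N:
                h[i + j] += a * b
    return h

def xD(f):
    # (xD) sum a_n x^n = sum n a_n x^n
    return [n * a for n, a in enumerate(f)]

geom = [Fraction(1)] * (N + 1)          # 1/(1-x) = 1 + x + x^2 + ...

# (1/(1-x)) * (xD)^2 (1/(1-x)) should generate the partial sums of squares.
series = mul(geom, xD(xD(geom)))
for n in range(N + 1):
    assert series[n] == sum(j * j for j in range(n + 1)), n
print("coefficient of x^n equals 0^2 + 1^2 + ... + n^2 for n <= 20")
```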
Example 4.3.4 (Harmonic numbers) Put $H_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}$, $n \ge 1$. Start with
$$f = \sum_{n \ge 1} \frac{x^n}{n} = \log\left( \frac{1}{1-x} \right).$$
(Check: $f' = \sum_{n \ge 1} x^{n-1} = \sum_{n \ge 0} x^n = \frac{1}{1-x}$, and $\left( \log\frac{1}{1-x} \right)' = -(\log(1-x))' = -\frac{-1}{1-x} = \frac{1}{1-x}$.) So $f \stackrel{ops}{\longleftrightarrow} \left\{ \frac{1}{n} \right\}_{n \ge 1}$. By Rule 5, $\frac{1}{1-x}\, f \stackrel{ops}{\longleftrightarrow} \{H_n\}_{n=1}^{\infty}$. So
$$\frac{1}{1-x} \log\left( \frac{1}{1-x} \right) \stackrel{ops}{\longleftrightarrow} \{H_n\}_{n=1}^{\infty}.$$
In the first three sections of this chapter we gave examples of how to use ordinary power series as generating functions. There are other ways to associate a possibly infinite sequence with a power series. In later sections we explore a couple more types, the more important of which are the exponential generating functions. But first we begin our exposition of formal power series.
Exercise: 4.3.5 Given three types of objects $O_1$, $O_2$ and $O_3$, let $a_n$ be the number of combinations of these objects, $n$ at a time, so that $O_1$ is selected at most 2 times, $O_2$ is selected an odd number of times, and $O_3$ is selected at least once. Determine the ordinary generating function of the sequence $\{a_n\}_{n=0}^{\infty}$ and determine a recursion that is satisfied by the sequence. Compute a closed form formula for $a_n$. As always, be sure to include enough details so that I can see a proof that your answers are correct.
Exercise: 4.3.6 If $O_1$ can appear any number of times and $O_2$ can appear any multiple of $k$ times, use Rule 5 to show that the number of combinations of $n$ objects must be $a_n = 1 + \lfloor n/k \rfloor$.
Exercise: 4.3.7 Put $a_n = |\{(i, j) : i \ge 0;\ j \ge 0;\ i + 3j = n\}|$. Determine the ordinary generating function for the sequence $\{a_n\}_{n=0}^{\infty}$ and determine an explicit value in closed form for $a_n$.
Exercise: 4.3.8 If $O_1$ can appear an even number of times and $O_2$ a multiple of 4 times, determine the number of combinations of $n$ of the objects.
Exercise: 4.3.9 If $O_1$ can appear an odd number of times, $O_2$ an even number of times, and $O_3$ either 0 or 1 time, how many combinations of $n$ of these three objects are there? (Find the enumerating generating function.)
4.4 Formal Power Series
If we want to be fairly general, we define formal power series in one indeterminate $x$ over a commutative ring $A$ with unity 1. Even though usually in our applications $A$ is the field of complex numbers, or perhaps even just the field of real numbers, it really does not take much additional effort to assume merely that $A$ is a commutative ring with unity. On the other hand, often when power series are defined and their theory developed, series in $n$ indeterminates are discussed. For simplicity, we will use only one indeterminate in our formal development and in many applications. But the extension to several indeterminates is conceptually natural and will be more or less taken for granted after a certain stage.

A power series in one indeterminate $x$ over the commutative ring $A$ with unity 1 is a formal sum
$$\sum_{i=0}^{\infty} a_i x^i = a_0 + a_1 x + \cdots + a_n x^n + \cdots, \qquad a_i \in A.$$
To begin with, $\{a_n\}_{n=0}^{\infty}$ is an arbitrary sequence in $A$; no question of convergence is considered. Two power series $\sum a_i x^i$ and $\sum b_i x^i$ are equal iff $a_i = b_i$ for all $i = 0, 1, 2, \dots$.
Define addition and multiplication of two power series as follows:
$$\text{(i)}\quad \sum a_i x^i + \sum b_i x^i = \sum (a_i + b_i) x^i, \quad \text{and}$$
$$\text{(ii)}\quad \left( \sum a_i x^i \right)\left( \sum b_i x^i \right) = \sum c_i x^i, \quad \text{where } c_i = \sum_{j=0}^{i} a_j b_{i-j}.$$
As an example of multiplication, consider the product $(1 - x) \sum_{n=0}^{\infty} x^n = 1$. Be sure to check this out! It says that $(1 - x)^{-1} = \sum_{n=0}^{\infty} x^n$.

The ring of all formal power series in $x$ over the commutative ring $A$ is denoted by $A[[x]]$.
Exercise: 4.4.1 Prove that with these definitions, if $A$ is an integral domain, then the set $A[[x]]$ of all formal power series in $x$ over $A$ is an integral domain with unity equal to $\sum_{i=0}^{\infty} a_i x^i$, where $a_0 = 1$ and $a_i = 0$ for $i > 0$.
Proof: All the details are fairly routine. We include here only the details for associativity of multiplication. So, to show
$$\left[ \left( \sum a_i x^i \right)\left( \sum b_i x^i \right) \right]\left( \sum c_i x^i \right) = \left( \sum a_i x^i \right)\left[ \left( \sum b_i x^i \right)\left( \sum c_i x^i \right) \right],$$
we must show that the coefficient of $x^j$ on the left hand side is equal to the coefficient of $x^j$ on the right hand side. The coefficient of $x^j$ on the L.H.S. is $\sum_{k=0}^{j} d_k c_{j-k}$, where $d_k = \sum_{i=0}^{k} a_i b_{k-i}$. This is
$$\sum_{k=0}^{j} \left( \sum_{i=0}^{k} a_i b_{k-i} \right) c_{j-k} = \sum a_i b_k c_l,$$
where this last sum is over all $i, k, l$ for which $i + k + l = j$, $0 \le i, k, l \le j$. And the last equality is obtained by observing that each term of the R.H.S. of this last equality appears exactly once on the left, and vice versa. Similarly, the coefficient of $x^j$ on the R.H.S. of the main equality to be established is also equal to $\sum a_i b_k c_l$, where this sum is over all $i, k, l$ for which $i + k + l = j$, $0 \le i, k, l \le j$.

We have established the following:
$$[x^j]\left[ \left( \sum a_i x^i \right)\left( \sum b_i x^i \right)\left( \sum c_i x^i \right) \right] = \sum a_i b_k c_l \qquad (4.8)$$
where the sum is over all $i, k, l$ for which $i + k + l = j$, $0 \le i, k, l \le j$.
This suggests more general formulas, but we write down only one special case:
$$[x^n]\left( \left( \sum_{i=0}^{\infty} a_i x^i \right)^k \right) = \sum a_{n_1} a_{n_2} \cdots a_{n_k} \qquad (4.9)$$
where the sum is over all $(n_1, n_2, \dots, n_k)$ for which $n_i \ge 0$ and $n_1 + n_2 + \cdots + n_k = n$.
Def. Let $f(x) = \sum a_i x^i$ be a nonzero element of $A[[x]]$. Then there must be a smallest integer $n$ for which $a_n \ne 0$. This $n$ is called the order of $f(x)$ and will be denoted by $o(f(x))$. If $n = o(f(x))$, then $a_n$ will be called the initial term of $f(x)$. We say that the order of the zero power series (i.e., the zero element of the ring $A[[x]]$) is $+\infty$.
Theorem 4.4.2 If $f, g \in A[[x]]$, then

1. $o(f + g) \ge \min\{o(f), o(g)\}$;

2. $o(fg) \ge o(f) + o(g)$.

Furthermore, if $A$ is an integral domain, then $A[[x]]$ is an integral domain and $o(fg) = o(f) + o(g)$.

Proof: Easy exercise.
Theorem 4.4.3 If $f(x) = \sum_{i=0}^{\infty} a_i x^i \in A[[x]]$, then $f(x)$ is a unit in $A[[x]]$ iff $a_0$ is a unit in $A$.

Proof: If $f(x)g(x) = 1$, then (with $g(x) = \sum b_i x^i$) $a_0 b_0 = 1$, implying $a_0$ is a unit in $A$. Conversely, suppose $a_0$ is a unit in $A$. Then there is a $b_0 \in A$ for which $a_0 b_0 = 1$. And there is a (unique!) $b_1 \in A$ for which $a_0 b_1 + a_1 b_0 = 0$, i.e., $b_1 = -a_0^{-1}(a_1 b_0) = -a_1 b_0^2$. Proceed inductively. Suppose $b_0, b_1, \dots, b_n$ have been determined so that $\sum_{i=0}^{j} a_i b_{j-i} = 0$ for $j = 1, 2, \dots, n$. Then put
$$b_{n+1} = -a_0^{-1}(a_1 b_n + \cdots + a_{n+1} b_0) = -b_0(a_1 b_n + \cdots + a_{n+1} b_0).$$
By induction, $g(x) = \sum_{i=0}^{\infty} b_i x^i \in A[[x]]$ is constructed (uniquely!) so that $f(x)g(x) = 1$.
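The inductive construction in this proof is directly computable. The sketch below is my illustration, not the text's; `inverse` is a hypothetical name computing the first coefficients of $1/f$ over $\mathbb{Q}$ exactly as in the proof.

```python
from fractions import Fraction

def inverse(a, N):
    """First N+1 coefficients of 1/f for f = sum a_i x^i with a_0 a unit,
    via the recursion in the proof of Theorem 4.4.3."""
    b = [Fraction(1) / a[0]]
    for n in range(1, N + 1):
        # Solve sum_{i=0}^{n} a_i b_{n-i} = 0 for b_n.
        s = sum(a[i] * b[n - i] for i in range(1, min(n, len(a) - 1) + 1))
        b.append(-b[0] * s)
    return b

# 1/(1 - x) should come out as 1 + x + x^2 + ...
one_minus_x = [Fraction(1), Fraction(-1)]
assert inverse(one_minus_x, 8) == [Fraction(1)] * 9

# For another series, check that f * f^{-1} = 1 up to degree N.
f = [Fraction(2), Fraction(3), Fraction(0), Fraction(-1)]
N = 12
b = inverse(f, N)
for n in range(N + 1):
    conv = sum(f[i] * b[n - i] for i in range(min(n, len(f) - 1) + 1))
    assert conv == (1 if n == 0 else 0), n
print("power series inverse matches Theorem 4.4.3")
```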
Theorem 4.4.4 If $K$ is a field, then the units of $K[[x]]$ are the power series of order 0, i.e., those with nonzero constant term.

The next result is inserted here for the algebraists. It will not be needed in this course.
Theorem 4.4.5 Let $K$ be a field. Then the ring $K[[x]]$ has a unique maximal ideal (generated by $x$), and the nontrivial ideals are all powers of this one.

Theorem 4.4.6 Let $K$ be a field, and let $f(x), g(x) \in K[[x]]$ with $f(x) \ne 0$. Then there is a unique power series $q(x) \in K[[x]]$ and there is a unique polynomial $r(x) \in K[x]$ such that $g(x) = q(x)f(x) + r(x)$ and either $\deg(r(x)) < o(f(x))$ or $r(x) = 0$.
Proof: Let g(x) = Σ a_i x^i, f(x) = Σ b_i x^i, where h = o(f(x)) < ∞. Put
r(x) = a_0 + a_1 x + ··· + a_{h-1} x^{h-1}. Then g(x) - r(x) = a_n x^n + a_{n+1} x^{n+1} + ···,
where a_n ≠ 0 and n ≥ h. Now f(x) = b_h x^h + b_{h+1} x^{h+1} + ··· = x^h (b_h +
b_{h+1} x + ···), where b_h + b_{h+1} x + ··· is invertible with a unique inverse v(x) ∈
K[[x]]. Then g(x) - r(x) = x^n (a_n + a_{n+1} x + ···) = x^{n-h} (a_n + a_{n+1} x +
···) x^h = x^{n-h} (a_n + a_{n+1} x + ···) x^h (b_h + b_{h+1} x + ···) v(x) = x^{n-h} (a_n + a_{n+1} x +
···) v(x) f(x). So we put q(x) = x^{n-h} (a_n + a_{n+1} x + ···) v(x). Then g(x) =
q(x)f(x) + r(x) where r(x) = 0 or deg(r(x)) < o(f(x)). Now suppose there
could be two power series q_1(x), q_2(x) and two polynomials r_1(x), r_2(x), with
g(x) = q_i(x)f(x) + r_i(x), where deg(r_i(x)) < o(f(x)) or r_i = 0, i = 1, 2. Then
(q_1(x) - q_2(x))f(x) = r_2(x) - r_1(x). Clearly q_1(x) = q_2(x) iff r_1(x) = r_2(x),
since K is a field and f(x) ≠ 0. So suppose q_2(x) - q_1(x) ≠ 0 ≠ r_1(x) - r_2(x).
Then deg(r_2(x) - r_1(x)) < o(f(x)) ≤ o((q_1(x) - q_2(x))f(x)) = o(r_2(x) - r_1(x)).
This is impossible, since no nonzero polynomial can have degree less than its
order. Hence q_1(x) = q_2(x) and r_1(x) = r_2(x).
A very common technique used to obtain identities that at first glance
look impossible to prove is to calculate the coefficient on x^n in two different
expressions for the same formal power series. Here are some exercises that
offer practice using this method.
Exercise: 4.4.7 Show that

Σ_{r=0}^{m+1} (-1)^r C(m+1, r) C(m+n-r, m) = 1 if n = 0, and 0 if n > 0.

Hint: Recall that

1 = (1 - z)^{m+1} Σ_{k=0}^∞ C(m+k, k) z^k,

and compute the coefficient on z^n.
Exercise: 4.4.8 Compute the coefficient of x^{2n} in both sides of

(1 - x^2)^{2n} = Σ_{r=0}^{2n} (-1)^r C(2n, r) x^{2r}

to show that

Σ_{k=0}^{2n} (-1)^k C(2n, k)^2 = (-1)^n C(2n, n).
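Before attempting the coefficient computation, the identity of Exercise 4.4.8 can be sanity-checked by brute force:

```python
from math import comb

# Direct numerical check of  sum_k (-1)^k C(2n,k)^2 = (-1)^n C(2n,n)
for n in range(1, 8):
    lhs = sum((-1)**k * comb(2*n, k)**2 for k in range(2*n + 1))
    assert lhs == (-1)**n * comb(2*n, n)
```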
Exercise: 4.4.9 Show that

Σ_{k=0}^{n+1} [ C(n, k) - C(n, k-1) ]^2 = (2/(n+1)) C(2n, n).

Hint: Compute the coefficient of x^n on both sides of

[ (1 - x)(1 + x)^n ] [ (1 - 1/x)(1 + x)^n ] = (-1/x + 2 - x)(1 + x)^{2n}.
4.5 Composition of Power Series
For each n ≥ 0 let f_n(x) = Σ_{i=0}^∞ c_{ni} x^i be a formal power series. Suppose that
for fixed i, only finitely many c_{ni} could be nonzero. Say c_{ni} = 0 for n > n_i.
Then we can formally define

Σ_{n=0}^∞ f_n(x) = Σ_{i=0}^∞ ( Σ_{n=0}^{n_i} c_{ni} ) x^i.
By hypothesis this formal sum of infinitely many power series involves
the sum of only finitely many terms in computing the coefficient of any given
power of x. This definition allows the introduction of substitution of one power
series b(x) for the variable x of a second power series a(x), at least when
o(b(x)) ≥ 1.
If a(x) = Σ_{n=0}^∞ a_n x^n, and if b(x) = Σ_{n=1}^∞ b_n x^n, i.e., b_0 = 0, then the
powers b^n(x) := (b(x))^n satisfy the condition for formal addition, i.e.,

a(b(x)) := Σ_{n=0}^∞ a_n b^n(x).
As an example, let a(x) := (1 - x)^{-1}, b(x) = 2x - x^2. Then formally

h(x) := a(b(x)) = 1 + (2x - x^2) + (2x - x^2)^2 + ···
     = 1 + 2x + 3x^2 + 4x^3 + ···
     = (1 - x)^{-2}.

The middle equality is a bit mysterious. It follows from (legitimate)
algebraic manipulation:

a(b(x)) = (1 - (2x - x^2))^{-1} = ((1 - x)^2)^{-1} = (1 - x)^{-2}.
If we try to verify it directly we find that the coefficient on x^n in
1 + (2x - x^2) + (2x - x^2)^2 + ··· is

Σ_{j=0}^n (-1)^{n-j} 2^{2j-n} C(j, 2j-n),

which we must then show is equal to n + 1. This is quite tricky to show
directly, say by induction.
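The claimed evaluation to n + 1 can at least be confirmed numerically; a quick check:

```python
from math import comb

def coeff(n):
    # [x^n] of 1 + (2x - x^2) + (2x - x^2)^2 + ...
    # (the binomial C(j, 2j-n) vanishes unless n/2 <= j <= n)
    return sum((-1)**(n - j) * 2**(2*j - n) * comb(j, 2*j - n)
               for j in range((n + 1) // 2, n + 1))

for n in range(12):
    assert coeff(n) == n + 1
```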
From analysis we know that 1 + 2x + 3x^2 + 4x^3 + ··· is the power series
expansion of (1 - x)^{-2}, so this must be true in C[[x]]. Indeed we have

(1 - x)h(x) = Σ_{n=0}^∞ x^n = (1 - x)^{-1}.
If b(x) = Σ_{n=0}^∞ b_n x^n is a power series with o(b(x)) = 1, i.e., b_0 = 0
and b_1 ≠ 0, we can also find the (unique!) inverse function as a power
series. We solve the equation b(a(x)) = x by substitution, assuming that
a(x) = Σ_{n=1}^∞ a_n x^n with a_0 = 0. Then

x = b_1 (Σ_{n=1}^∞ a_n x^n) + b_2 (Σ_{n=1}^∞ a_n x^n)^2 + b_3 (Σ_{n=1}^∞ a_n x^n)^3 + ···
From this we find b_1 a_1 = 1, b_1 a_2 + b_2 a_1^2 = 0, b_1 a_3 + b_2 (a_1 a_2 + a_2 a_1) +
b_3 (a_1 a_1 a_1) = 0, etc. In general the zero coefficient on x^n (for n > 1) must
equal an expression that starts with b_1 a_n and for which the other terms
involve the coefficients b_1, ..., b_n and coefficients a_k with 1 ≤ k < n. Hence
recursively we may solve for a_n starting with a_1. This compositional inverse of
b(x) will be denoted b^{[-1]}(x) to distinguish it from the multiplicative inverse
b^{-1}(x) (sometimes denoted b(x)^{-1}) of b(x). Note that for b(x) ∈ A[[x]],
b^{[-1]}(x) exists iff o(b(x)) = 1, and b(x)^{-1} exists iff o(b(x)) = 0 and b(0)^{-1}
exists in A. Also note that if b(a(x)) = x, and we put y = a(x), then
a(b(y)) = a(b(a(x))) = a(x) = y, i.e., a(b(y)) = y, so if b is a left inverse of
a, it is also the unique right inverse of a.
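The recursive solution for the a_n can be carried out mechanically. A minimal Python sketch (helper names are our own), tested on b(x) = x/(1-x) = x + x^2 + x^3 + ···, whose compositional inverse is x/(1+x) = x - x^2 + x^3 - ···:

```python
from fractions import Fraction

def comp_inverse(b, N):
    """Coefficients a_1..a_N of the compositional inverse of
    b(x) = b_1 x + b_2 x^2 + ..., by matching coefficients in b(a(x)) = x.
    `b` is the list [b_1, b_2, ...]."""
    a = [Fraction(1, 1) / b[0]]               # b_1 a_1 = 1
    for n in range(2, N + 1):
        # [x^n] of b(a(x)) must be 0; it equals b_1 a_n + (terms known so far)
        known = Fraction(0)
        for k in range(2, n + 1):             # contribution of b_k (a(x))^k
            if k - 1 < len(b):
                known += b[k - 1] * power_coeff(a, k, n)
        a.append(-known / b[0])
    return a

def power_coeff(a, k, n):
    """[x^n] (a_1 x + a_2 x^2 + ...)^k for the partial list a = [a_1, ...]."""
    coeffs = {0: Fraction(1)}
    for _ in range(k):
        new = {}
        for e, c in coeffs.items():
            for i, ai in enumerate(a, start=1):
                if e + i <= n:
                    new[e + i] = new.get(e + i, Fraction(0)) + c * ai
        coeffs = new
    return coeffs.get(n, Fraction(0))

b = [Fraction(1)] * 6                          # b(x) = x + x^2 + ... + x^6 + ...
assert comp_inverse(b, 6) == [1, -1, 1, -1, 1, -1]
```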
Note that certain substitutions that make perfect sense in analysis are
forbidden within the present theory. For suppose b(x) = 1 + x, i.e., o(b(x)) =
0. If a(x) = e^x = Σ_{n=0}^∞ x^n/n!, then we are not allowed to substitute 1 + x in place
of x in the formula for e^x to find the power series for e^{1+x}. If we try to do this
anyway, we see that e^{1+x} would appear as

Σ_{n=0}^∞ (1+x)^n/n! = Σ_{n=0}^∞ (1/n!) Σ_{j=0}^n C(n, j) x^j,

which has infinitely many nonzero contributions to the coefficient on x^i for
each i. This is not defined.
4.6 The Formal Derivative and Integral
Let f(x) = Σ_{i≥0} c_i x^i and g(x) = Σ_{j≥0} d_j x^j be two power series. We define
the formal derivative f'(x) by:

f'(x) = Σ_{i≥0} (i + 1) c_{i+1} x^i.    (4.10)
It is now easy to show that the Sum Rule holds:

(f(x) + g(x))' = f'(x) + g'(x),    (4.11)
and

(f(x)g(x))' = Σ_{i,j≥0} (i + j) c_i d_j x^{i+j-1} = Σ_{i,j≥0} i c_i d_j x^{i-1+j} + Σ_{i,j≥0} j c_i d_j x^{j-1+i},

which proves the Product Rule:

(f(x)g(x))' = g(x)f'(x) + f(x)g'(x).    (4.12)
Suppose that o(g(x)) ≥ 1, so that the composition f(g(x)) is defined.
The Chain Rule

(f(g(x)))' = f'(g(x)) g'(x)    (4.13)

is established first for f(x) = x^n, n ≥ 1, by using induction on n along with
the product rule, and then by linear extension for any power series f(x).
If f(0) ≠ 0, start with the equation f^{-1}(x)f(x) = 1. Take the formal
derivative of both sides, applying the product rule. It follows that (f^n(x))' =
n f^{n-1}(x) f'(x), for n ≤ -1. But now given f(x) and g(x) as above with
g(0) ≠ 0, we can use this formula for (f^{-1}(x))' = -f'(x)/f^2(x) with the
product rule to establish the Quotient Rule:

(f(x)/g(x))' = [ g(x)f'(x) - f(x)g'(x) ] / g^2(x).    (4.14)
If R has characteristic 0 (so we can divide by any positive integral mul-
tiple of 1), the usual Taylor's Formula in one variable (well, MacLaurin's
formula) holds:

f(x) = Σ_{n≥0} f^{(n)}(0) x^n / n!.    (4.15)

If R has characteristic zero and f, g ∈ R[[x]], then

f' = g' and f(0) = g(0)  iff  f(x) = g(x).    (4.16)
In order to define the integral of f(x) we need to assume that the char-
acteristic of A is zero. In that case define the formal integral I_x f(x) by

I_x f(x) = ∫_0^x f(x)dx = Σ_{i≥1} i^{-1} c_{i-1} x^i.    (4.17)

It is easy to see that ∫_0^x (f(x) + g(x))dx = ∫_0^x f(x)dx + ∫_0^x g(x)dx. The
following also follow easily:

(I_x f(x))' = f(x);  if F'(x) = f(x), then ∫_0^x f(x) = F(x) - F(0).    (4.18)
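Eqs. 4.10 and 4.17 act on coefficient lists; a small Python sketch (function names are our own) checks that the formal derivative undoes the formal integral, as in Eq. 4.18:

```python
from fractions import Fraction

def deriv(c):
    """Formal derivative: coefficient of x^i becomes (i+1) c_{i+1}  (Eq. 4.10)."""
    return [Fraction(i + 1) * c[i + 1] for i in range(len(c) - 1)]

def integ(c):
    """Formal integral: prepend constant term 0, divide by the new exponent (Eq. 4.17)."""
    return [Fraction(0)] + [Fraction(c[i], 1) / (i + 1) for i in range(len(c))]

f = [Fraction(v) for v in (5, 0, 7, -2)]
assert deriv(integ(f)) == f        # (I_x f)' = f, Eq. 4.18
```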
4.7 Log, Exp and Binomial Power Series
In this section we assume that R is an integral domain with characteristic 0.
The exponential series is

e^x = exp(x) = Σ_{j≥0} x^j / j!.    (4.19)

The logarithmic series is

log((1 - x)^{-1}) = Σ_{j≥1} x^j / j.    (4.20)

If y is also an indeterminate, then the binomial series is

(1 + x)^y = 1 + Σ_{j≥1} y(y-1)···(y-j+1) x^j/j! = Σ_{j≥0} C(y, j) x^j ∈ (R[y])[[x]].    (4.21)
If o(f(x)) ≥ 1, the compositions of these functions with f are defined.
So exp(f), log(1 + f) and (1 + f)^y are defined. Also, any element of R[[x]]
may be substituted for y in (1 + x)^y. Many of the usual properties of these
analytic functions over C hold in the formal setting. For example:

(e^x)' = e^x

(log((1 - x)^{-1}))' = (1 - x)^{-1}

((1 + x)^y)' = y(1 + x)^{y-1}

The application of the chain rule to these is immediate except possibly
in the case of the logarithm.
We ask: When is it permissible to compute log(f(x)) for f(x) ∈ R((x))?
To determine this, note that log(f(x)) = log( 1/(1 - (1 - f^{-1})) ) = Σ_{j≥1} (1 - f^{-1})^j / j,
where this latter expression is well-defined provided o(1 - f^{-1}) ≥ 1. So in
particular we need o(f(x)) = 0 and f(0) = 1. In this case D_x log f(x) =
f(x) D_x (1 - f^{-1}(x)), and

(log(f(x)))' = f'(x)/f(x).    (4.22)
By the chain rule, D_x log(e^x) = (e^x)^{-1} e^x = 1 = D_x x, and log(e^x)|_{x=0} =
0 = x|_{x=0}, which implies that

log(e^x) = x.    (4.23)
Similarly, using both the product and chain rules,

D_x { (1 - x) exp(log((1 - x)^{-1})) }
  = -exp(log((1 - x)^{-1})) + (1 - x)(1 - x)^{-1} exp(log((1 - x)^{-1})) = 0,

so that

(1 - x) exp(log((1 - x)^{-1})) = 1,

and

exp(log((1 - x)^{-1})) = (1 - x)^{-1}.    (4.24)

Again, this is because both (1 - x) exp(log((1 - x)^{-1})) and 1 have derivative
0 and constant term 1.
Now consider properties of the binomial series. We have already seen that
for positive integers n:

(1 + x)^n = Σ_{j≥0} C(n, j) x^j.    (4.25)

This is the binomial expansion for positive integers. Thus for positive
integers m and n, [x^k] can be applied to the binomial series expansion of
the identity (1 + x)^m (1 + x)^n = (1 + x)^{m+n}, giving the Vandermonde
Convolution

Σ_{i=0}^k C(n, i) C(m, k-i) = C(m+n, k).    (4.26)
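The Vandermonde Convolution (4.26) is easy to verify for small parameters:

```python
from math import comb

# math.comb(n, i) returns 0 when i > n, so the sum may safely run over all i.
for m in range(6):
    for n in range(6):
        for k in range(m + n + 1):
            assert sum(comb(n, i) * comb(m, k - i) for i in range(k + 1)) == comb(m + n, k)
```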
If f(x) is a polynomial in x of degree k, and the equation f(x) = 0
has more than k roots, then f(x) = 0 identically. Thus the polynomial
C(y+z, k) - Σ_{i=0}^k C(y, i) C(z, k-i) in indeterminates y and z must be identi-
cally 0, since it has an infinite number of roots, namely all positive integers.
Accordingly we have the binomial series identity

(1 + x)^y (1 + x)^z = (1 + x)^{y+z}.    (4.27)
Substitution of -y for z yields (1 + x)^y (1 + x)^{-y} = (1 + x)^0 = 1, so

((1 + x)^y)^{-1} = (1 + x)^{-y}.    (4.28)
This allows us to prove that

log((1 + x)^y) = y log(1 + x)    (4.29)

by the following differential argument:

D_x (log((1 + x)^y)) = (1 + x)^{-y} · y(1 + x)^{y-1} = y(1 + x)^{-1} = D_x (y log(1 + x)),

and

log((1 + 0)^y) = 0 = y log(1 + 0).
Combining these results gives

((1 + x)^y)^z = exp(log(((1 + x)^y)^z)) = exp(z log((1 + x)^y))
  = exp(zy log(1 + x)) = exp(log((1 + x)^{yz})),

so

((1 + x)^y)^z = (1 + x)^{yz}.    (4.30)
By the binomial theorem,

exp(x + y) = Σ_{n≥0} (x + y)^n / n! = Σ_{n≥0} Σ_{i=0}^n (x^i / i!)(y^{n-i} / (n-i)!),

so

exp(x + y) = (exp x)(exp y).    (4.31)

The substitution of -x for y yields exp(0) = (exp(x))(exp(-x)), and we
have

(exp(x))^{-1} = exp(-x).    (4.32)
By making the substitution x = f, for f ∈ R[[x]] with o(f(x)) ≥ 1,
and y = g for any g ∈ R[[x]], in the preceding results, we obtain many of
the results that are familiar to us in terms of the corresponding analytic
functions. The only results that do not hold for the formal power series are
those that correspond to making inadmissible substitutions. For example, it
is not the case that exp(log(x)) = x, since log(x) does not exist as a formal
power series.
HERE ARE TWO MORE OFTEN USED POWER SERIES:

sin(x) = Σ_{k=0}^∞ (-1)^k x^{2k+1} / (2k+1)!

cos(x) = Σ_{k=0}^∞ (-1)^k x^{2k} / (2k)!
It is a good exercise to check out the usual properties of these formal
power series.
4.8 Exponential Generating Functions
Recall that P(n, k) is the number of k-permutations of an n-set, and P(n, k) =
n!/(n - k)! = n(n-1)···(n-k+1). The ordinary generating function of
the sequence P(n, 0), P(n, 1), ... is

G(x) = P(n, 0)x^0 + P(n, 1)x^1 + ··· + P(n, n)x^n.

Also recall the similar binomial expansion

C(x) = C(n, 0)x^0 + ··· + C(n, n)x^n = (1 + x)^n.
But we can't find a nice closed form for G(x). On the other hand,
P(n, r) = C(n, r) r!, so the equation for C(x) can be written

P(n, 0)x^0/0! + P(n, 1)x^1/1! + P(n, 2)x^2/2! + ··· + P(n, n)x^n/n! = (1 + x)^n,

i.e.,

Σ_{k=0}^n P(n, k) x^k/k! = (1 + x)^n.

So P(n, k) is the coefficient of x^k/k! in (1 + x)^n. This suggests another
kind of generating function - to be called the exponential generating function
- as follows: If (a_n) is a sequence, the exponential generating function for this
sequence is

H(x) = Σ_{n=0}^∞ a_n x^n/n!
EXAMPLE 1. a_k = 1, 1, 1, ... has H(x) = Σ x^k/k! = e^x as its
exponential generating function.

EXAMPLE 2. From above, we already see that (1 + x)^n is the expo-
nential generating function of the sequence P(n, 0), P(n, 1), .... Then the
exponential generating function of 1, α, α^2, ... is

H(x) = Σ_{k=0}^∞ α^k x^k/k! = Σ_{k=0}^∞ (αx)^k/k! = e^{αx}.
Now suppose we have k kinds of letters in an alphabet. We want to form
a word with n letters using i_1 of the 1st kind, i_2 of the second kind, ..., i_k
of the kth kind. The number of such words is

p(n; i_1, ..., i_k) = C(n; i_1, ..., i_k) = n!/(i_1! ··· i_k!).
Consider the product:

(1 + O_1 x^1/1! + O_1^2 x^2/2! + O_1^3 x^3/3! + ···) ··· (1 + O_k x^1/1! + O_k^2 x^2/2! + ···).
The term involving O_1^{i_1} O_2^{i_2} ··· O_k^{i_k} is (if we put n = i_1 + i_2 + ··· + i_k)

(O_1^{i_1} x^{i_1}/i_1!)(O_2^{i_2} x^{i_2}/i_2!) ··· (O_k^{i_k} x^{i_k}/i_k!)
  = O_1^{i_1} ··· O_k^{i_k} (x^{i_1 + ··· + i_k})/(i_1! ··· i_k!) = O_1^{i_1} ··· O_k^{i_k} (x^n/n!)(n!/(i_1! ··· i_k!))
  = O_1^{i_1} ··· O_k^{i_k} p(n; i_1, ..., i_k) x^n/n!
The complete coefficient on x^n/n! is

Σ_{i_1 + ··· + i_k = n} O_1^{i_1} ··· O_k^{i_k} C(n; i_1, ..., i_k),

provided that there is no restriction on
the number of repetitions of any given object, except that i_1 + ··· + i_k = n.
And the various O_j really do not illustrate the permutations, so we place
each O_j equal to 1. Also, for the object O_j, if there are restrictions on the
number of times O_j can appear, then for its generating function we include
only those terms of the form x^m/m! if O_j can appear m times. Specifically,
let O be an object (i.e., a letter) to be used. For the exponential generating
function of O, use Σ_{k=0}^∞ a_k x^k/k!, where a_k is 1 or 0 according as O can appear
exactly k times or not.
EXAMPLE 3: Suppose O_1 can appear 0, 2 or 3 times, O_2 can appear
4 or 5 times, and O_3 can be used without restriction. Then the exponential
generating functions for O_1, O_2, O_3 are:

(1 + x^2/2! + x^3/3!)(x^4/4! + x^5/5!)(1 + x + x^2/2! + ···)
Theorem 4.8.1 Suppose we have k kinds of objects O_1, ..., O_k. Let f_j(x)
be the exponential generating function of the object O_j determined as above
by whatever restrictions there are on the number of occurrences allowed O_j.
Then the number of distinct permutations using n of the objects (i.e., words of
length n), subject to the restrictions used in determining f_1(x), f_2(x), ..., f_k(x),
is the coefficient of x^n/n! in f_1(x) ··· f_k(x).
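Theorem 4.8.1 can be exercised directly: build each f_j(x) as a truncated coefficient list, multiply, and read off n![x^n]. A minimal Python sketch (function names are our own):

```python
from fractions import Fraction
from math import factorial

def egf(allowed, N):
    """EGF coefficients [x^0..x^N] for one letter: term x^m/m! iff m is allowed."""
    return [Fraction(1, factorial(m)) if m in allowed else Fraction(0)
            for m in range(N + 1)]

def word_count(egfs, n):
    """n! [x^n] f_1(x)...f_k(x): words of length n under the given restrictions."""
    prod = [Fraction(1)] + [Fraction(0)] * n
    for f in egfs:
        prod = [sum(prod[i] * f[j - i] for i in range(j + 1)) for j in range(n + 1)]
    return prod[n] * factorial(n)

# Two letters, each usable any number of times: 2^5 = 32 words of length 5.
free = egf(range(6), 5)
assert word_count([free, free], 5) == 32
```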
EXAMPLE 4. Suppose there are k objects with no restrictions on repeti-
tions. So each individual exponential generating function is Σ_{n=0}^∞ x^n/n! = e^x. The
complete exponential generating function is then

(e^x)^k = e^{kx} = Σ_{n=0}^∞ (kx)^n/n! = Σ_{n=0}^∞ k^n (x^n/n!).

But we already know that k^n is the number of words of length n with k
types of objects and all possible repetitions allowed.
EXAMPLE 5. Again suppose there are k types of object, but that each
object must appear at least once. So the exponential generating function is
(e^x - 1)^k. The coefficient of x^n/n! in

(e^x - 1)^k = Σ_{j=0}^k C(k, j) e^{jx} (-1)^{k-j} = Σ_{j=0}^k C(k, j) (-1)^{k-j} Σ_{n=0}^∞ (jx)^n/n!
  = Σ_{n=0}^∞ ( Σ_{j=0}^k C(k, j) (-1)^{k-j} j^n ) x^n/n!

is

Σ_{j=0}^k C(k, j) (-1)^{k-j} j^n.

This proves the strange result:

The number of permutations of n objects of k types, each type appearing
at least once, is

Σ_{j=0}^k C(k, j) (-1)^{k-j} j^n.
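The "strange result" checks out against a brute-force count of words in which every letter appears:

```python
from math import comb
from itertools import product

def by_formula(n, k):
    return sum(comb(k, j) * (-1)**(k - j) * j**n for j in range(k + 1))

def by_brute_force(n, k):
    # words of length n over k letters in which every letter appears at least once
    return sum(1 for w in product(range(k), repeat=n) if len(set(w)) == k)

for n in range(1, 7):
    for k in range(1, 5):
        assert by_formula(n, k) == by_brute_force(n, k)
```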
Def. The symbol f --egf--> ⟨a_n⟩_0 means that the power series f is the expo-
nential generating function of the sequence ⟨a_n⟩_0, i.e., that f = Σ_{n≥0} a_n x^n/n!.

So suppose f --egf--> ⟨a_n⟩_0. Then f' = Σ_{n=1}^∞ a_n x^{n-1}/(n-1)! = Σ_{n=0}^∞ a_{n+1} x^n/n!, i.e.,
f' --egf--> ⟨a_{n+1}⟩_0. By induction we have an analogue to Rule 1:

Rule 1': If f --egf--> ⟨a_n⟩_0, then for h ∈ N, D^h f --egf--> ⟨a_{n+h}⟩_{n=0}^∞.
4.9 Famous Example: Bernoulli Numbers
Define B_n, n ≥ 0, by

x/(e^x - 1) = Σ_{n=0}^∞ B_n x^n/n!.
The defining equation for B_n is equivalent to

1 = ( Σ_{k=0}^∞ x^k/(k+1)! ) ( Σ_{k=0}^∞ B_k x^k/k! ).
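Solving the displayed equation coefficient by coefficient gives the B_k; a minimal sketch with exact rational arithmetic:

```python
from fractions import Fraction
from math import factorial

def bernoulli(N):
    """B_0..B_N from 1 = (sum x^k/(k+1)!)(sum B_k x^k/k!): for n >= 1 the
    coefficient of x^n in the product, sum_k B_k/(k!(n-k+1)!), must vanish."""
    B = [Fraction(1)]
    for n in range(1, N + 1):
        s = sum(B[k] / (factorial(k) * factorial(n - k + 1)) for k in range(n))
        B.append(-s * factorial(n))
    return B

B = bernoulli(6)
assert B[0] == 1 and B[1] == Fraction(-1, 2)
assert B[2] == Fraction(1, 6) and B[4] == Fraction(-1, 30) and B[6] == Fraction(1, 42)
assert B[3] == 0 and B[5] == 0
```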
Recursively we can solve for the B_k using this equation. But first notice
the following: Replace x by -x in the egf for B_n:

Σ_{k=0}^∞ B_k (-x)^k/k! = -x/(e^{-x} - 1) = x e^x/(e^x - 1).

So

x/(e^x - 1) - x e^x/(e^x - 1) = -x = Σ_{k=0}^∞ B_k [ (1 - (-1)^k)/k! ] x^k.
This implies that

-x = B_0 · 0 + B_1 · (2/1) x + B_2 · 0 · x^2 + B_3 · (2/3!) x^3 + B_4 · 0 · x^4 + ···

which implies that

B_1 = -1/2 and B_{2k+1} = 0 for k ≥ 1.

Then recursively from above we find B_0 = 1; B_1 = -1/2; B_2 = 1/6; B_4 = -1/30;
B_6 = 1/42, ....
A famous result of Euler is the following:

ζ(2k) = Σ_{n=1}^∞ 1/n^{2k} = [ (-1)^k π^{2k} 2^{2k-1} / (2k-1)! ] · ( -B_{2k}/(2k) ),  k = 1, 2, ....

In particular,

ζ(2) = π^2/6.
Bernoulli originally introduced the B_n to give a closed form formula for

S_n(m) = 1^n + 2^n + 3^n + ··· + m^n.
On the one hand

x(e^{mx} - 1)/(e^x - 1) = [ x/(e^x - 1) ] (e^{mx} - 1) = ( Σ_{k=0}^∞ B_k x^k/k! ) ( Σ_{j=1}^∞ m^j x^j/j! )
  = Σ_{n=0}^∞ ( Σ_{i=1}^n [B_{n-i}/(n-i)!] [m^i/i!] ) x^n = Σ_{n=0}^∞ ( Σ_{i=1}^n C(n, i) B_{n-i} m^i ) x^n/n!.

(The coefficient on x^0/0! is 0.)

On the other hand:

x(e^{mx} - 1)/(e^x - 1) = x [ (e^{mx} - 1)/(e^x - 1) ] = x(e^{(m-1)x} + e^{(m-2)x} + ··· + e^x + 1)
  = x Σ_{j=0}^{m-1} ( Σ_{r≥0} j^r x^r/r! ) = Σ_{r=0}^∞ (x^{r+1}/r!) ( Σ_{j=0}^{m-1} j^r )
  = Σ_{r=0}^∞ S_r(m-1) x^{r+1}/r! = Σ_{n=1}^∞ S_{n-1}(m-1) n x^n/n!.

Equating the coefficients of x^n/n! we get:

Σ_{i=1}^n C(n, i) B_{n-i} m^i = S_{n-1}(m-1) · n,  n ≥ 2

(the case n = 1 must be excluded: there, for r = 0, the j = 0 term above
contributes 0^0 = 1, so Σ_{j=0}^{m-1} j^0 = m rather than S_0(m-1) = m - 1), or

Σ_{i=1}^{n+1} C(n+1, i) B_{n+1-i} (m+1)^i = S_n(m) · (n+1),  n ≥ 1.

So Bernoulli's formula is:

S_n(m) = 1^n + 2^n + ··· + m^n = [ Σ_{i=1}^{n+1} C(n+1, i) B_{n+1-i} (m+1)^i ] / (n+1),  n ≥ 1.
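Bernoulli's formula can be verified directly against the power sums. A self-contained sketch (the B_k are recomputed here, by the same recursion as before, so that the check stands alone):

```python
from fractions import Fraction
from math import comb, factorial

def bernoulli(N):
    B = [Fraction(1)]
    for n in range(1, N + 1):
        s = sum(B[k] / (factorial(k) * factorial(n - k + 1)) for k in range(n))
        B.append(-s * factorial(n))
    return B

def power_sum(n, m, B):
    """Bernoulli's formula for S_n(m) = 1^n + ... + m^n, valid for n >= 1."""
    total = sum(comb(n + 1, i) * B[n + 1 - i] * (m + 1)**i for i in range(1, n + 2))
    return total / (n + 1)

B = bernoulli(10)
for n in range(1, 8):
    for m in range(1, 10):
        assert power_sum(n, m, B) == sum(j**n for j in range(1, m + 1))
```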
4.10 Famous Example: Fibonacci Numbers
Let F_{n+1} = F_n + F_{n-1} for n ≥ 0, and put F_{-1} = 0, F_0 = 1. Put
f --egf--> ⟨F_n⟩_0, i.e., f = Σ_{n=0}^∞ F_n x^n/n!. By Rule 1' we have f' --egf--> ⟨F_{n+1}⟩_{n=0}^∞ and
f'' --egf--> ⟨F_{n+2}⟩_{n=0}^∞. Use the recursion given in the form F_{n+2} = F_{n+1} + F_n,
n ≥ 0. So by Rule 1' we have f'' = f' + f. From the theory of differential
equations we see that f(x) = c_1 e^{r_+ x} + c_2 e^{r_- x}, where r_± = (1 ± √5)/2, and where
c_1 and c_2 are to be determined by the initial conditions: f(0) = F_0 = 1, and
f'(0) = 1!F_1 = 1. Then f(0) = c_1 + c_2 = 1 and f'(0) = r_+ c_1 + r_- c_2 = 1. So,
solving this 2×2 linear system by Cramer's rule,

c_1 = det[1, 1; 1, r_-] / det[1, 1; r_+, r_-] = (r_- - 1)/(r_- - r_+)
    = ((1 - √5)/2 - 1)/(-√5) = ((1 + √5)/2)/√5 = r_+/√5.

Similarly,

c_2 = det[1, 1; r_+, 1] / det[1, 1; r_+, r_-] = (1 - r_+)/(r_- - r_+) = -r_-/√5.

So f = (1/√5)(r_+ e^{r_+ x} - r_- e^{r_- x}) = (1/√5) [ r_+ Σ_{n=0}^∞ r_+^n x^n/n! - r_- Σ_{n=0}^∞ r_-^n x^n/n! ].

Then

F_n = [x^n/n!] f = (1/√5)(r_+^{n+1} - r_-^{n+1}).    (4.33)

Suppose f --egf--> ⟨a_n⟩_0. Then Df = f' --egf--> ⟨a_{n+1}⟩_{n=0}^∞ and (xD)f --egf-->
⟨n a_n⟩_0: f = Σ a_n x^n/n! implies f' = Σ a_n x^{n-1}/(n-1)!, so xf' = (xD)f =
Σ a_n x^n/(n-1)! = Σ n a_n x^n/n!. So xf' --egf--> ⟨n a_n⟩_{n=0}^∞. This leads easily to
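Eq. 4.33 is a Binet-style formula (with the indexing here, F_n is the usual Fibonacci number shifted by one place); a quick floating-point check against the recursion:

```python
from math import sqrt

def fib_egf_formula(n):
    """F_n = (r_+^{n+1} - r_-^{n+1}) / sqrt(5), with r_± = (1 ± sqrt 5)/2  (Eq. 4.33)."""
    rp = (1 + sqrt(5)) / 2
    rm = (1 - sqrt(5)) / 2
    return round((rp ** (n + 1) - rm ** (n + 1)) / sqrt(5))

# Check against the recursion F_{n+1} = F_n + F_{n-1} with F_{-1} = 0, F_0 = 1,
# i.e. the sequence 1, 1, 2, 3, 5, 8, ...
F = [1, 1]
for n in range(1, 30):
    F.append(F[n] + F[n - 1])
for n in range(30):
    assert fib_egf_formula(n) == F[n]
```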
4.11 Roots of a Power Series
We continue to assume that R is an integral domain with characteristic 0.
Occasionally we need to solve a polynomial equation for a power series.
As an example we consider the nth root. Let g(x) ∈ R[[x]] satisfy g(0) = β^n,
β ∈ R, β^{-1} ∈ R. We want to determine f ∈ R[[x]] such that
f^n(x) = g(x) with f(0) = β.    (4.34)

Then the unique such power series is (write β^{-n} g(x) = ((β^{-n} g(x) - 1) + 1))

f(x) = β (β^{-n} g(x))^{1/n} = β Σ_{i≥0} C(1/n, i) (β^{-n} g(x) - 1)^i,    (4.35)

since (β^{-n} g(x) - 1)^i ∈ R[[x]] with o(β^{-n} g(x) - 1) ≥ 1, so that
o((β^{-n} g(x) - 1)^i) ≥ i. This is a solution since

f^n(x) = β^n [ (β^{-n} g(x))^{1/n} ]^n = β^n (β^{-n} g(x))^1 = g(x),

from Eq. 4.30 and since f(0) = β. To establish uniqueness, suppose that f
and h are both solutions to Eq. 4.34, so that

0 = f^n - h^n = (f - h)(f^{n-1} + f^{n-2} h + ··· + f h^{n-2} + h^{n-1}).

Since R, and therefore R[[x]], has no zero divisors, then either f - h = 0
or (f^{n-1} + f^{n-2} h + ··· + f h^{n-2} + h^{n-1}) = 0. But

(f^{n-1} + f^{n-2} h + ··· + f h^{n-2} + h^{n-1})|_{x=0} = n β^{n-1} ≠ 0

since β ≠ 0 and R has characteristic 0 and no zero divisors. Thus f = h and
the f of Eq. 4.35 is the unique solution to Eq. 4.34.
This result is used most frequently when f(x) satisfies a quadratic equa-
tion with a given initial condition.
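Eq. 4.35 can be exercised on the square root of g(x) = 1 + x (so β = 1, n = 2); squaring the truncated result recovers g exactly. A minimal Python sketch (helper names are our own):

```python
from fractions import Fraction
from math import factorial

def gbinom(y, i):
    """Generalized binomial coefficient C(y, i) for rational y."""
    p = Fraction(1)
    for j in range(i):
        p *= (Fraction(y) - j)
    return p / factorial(i)

def series_root(g, n, N):
    """Coefficients of f = g^{1/n} with g[0] = 1 (so beta = 1 in Eq. 4.35),
    via f = sum_i C(1/n, i) (g - 1)^i, truncated past x^N."""
    u = [Fraction(c) for c in g]; u[0] -= 1        # u = g - 1 has order >= 1
    f = [Fraction(0)] * (N + 1)
    upow = [Fraction(1)] + [Fraction(0)] * N       # u^0
    for i in range(N + 1):
        for m in range(N + 1):
            f[m] += gbinom(Fraction(1, n), i) * upow[m]
        upow = [sum(upow[a] * (u[m - a] if m - a < len(u) else 0)
                    for a in range(m + 1)) for m in range(N + 1)]
    return f

f = series_root([1, 1], 2, 8)                      # f = (1+x)^{1/2}
sq = [sum(f[a] * f[m - a] for a in range(m + 1)) for m in range(9)]
assert sq[0] == 1 and sq[1] == 1 and all(c == 0 for c in sq[2:])
```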
4.12 Laurent Series and Lagrange Inversion
Again in this section we assume that R is a field with characteristic 0, so
that the notation and results of the preceding sections will apply. Usually
this field is the field of quotients of an integral domain obtained by starting
with the complex numbers and adjoining polynomials and/or power series in
sets of commuting independent variables.
The quotient field of R[[x]] may be identified with the set R((x)) of so-
called Laurent series f(x) = Σ_{n=k}^∞ a_n x^n, where k ∈ Z is the order o(f(x)),
provided a_k ≠ 0. When k ≥ 0 this agrees with the earlier definition of
order. We give the coefficient of x^{-1} a name familiar from complex analysis.
If a(x) = Σ_{n=k}^∞ a_n x^n, we say that a_{-1} is the residue of a(x). This will be
written as Res a(x) = [x^{-1}] a(x).

For a Laurent series f, the multiplicative inverse exists iff o(f) < ∞. If
o(f) = k, then f = x^k g where g ∈ C((x)) has o(g) = 0. In this case we define
f^{-1} = x^{-k} g^{-1}.
Since the usual quotient formula for derivatives of power series holds, it is
straightforward to carry the theory of the derivative over to Laurent series.
The following facts are then easily proved (Exercises!):
Exercise: 4.12.1 If w(x) is a Laurent series, then

(R1) Res(w'(x)) = 0;

and

(R2) [x^{-1}] { w'(x)/w(x) } = o(w(x)).
We have already mentioned the idea of an inverse function of a power
series with order 1 and have shown how to compute its coefficients recursively.
The next theorem gives an expression for the coefficients.
Theorem 4.12.2 Let W(x) = w_1 x + w_2 x^2 + ··· be a power series with w_1 ≠
0. Let Z(w) = c_1 w + c_2 w^2 + ··· be a power series in w such that Z(W(x)) = x.
Then

c_n = Res { 1/(n W^n(x)) }.    (4.36)
Proof: From our computations above we see that c_1 = w_1^{-1}. Now apply
formal derivation to Z(W(x)) = x. This yields:

1 = Σ_{k=1}^∞ k c_k W^{k-1}(x) W'(x).    (4.37)

Consider the series obtained by dividing this equation by n W^n(x):

[x^{-1}] { 1/(n W^n(x)) } = [x^{-1}] { c_n W'(x)/W(x) } + [x^{-1}] { Σ_{k≥1, k≠n} (k c_k/n) W^{k-1-n}(x) W'(x) }.

If n ≠ k, then the term W^{k-1-n}(x) W'(x) is a derivative by the chain rule
and hence has residue 0 by (R1). Now apply (R2) to the term with n = k
to see that the residue of the R.H.S. is equal to c_n · o(W(x)) = c_n, proving
the theorem.
Practice using the previous theorem on the following exercises:
Exercise: 4.12.3 (i) If w = W(z) = Σ_{n=1}^∞ z^n, use the previous theorem
to compute z = Z(w). Check your result by expressing z and w as simple
rational functions of each other.

(ii) Put w = W(z) = z/(1-z)^2. Use the previous theorem to compute z =
Z(w).

(iii) Put w = W(z) = z/(1-z)^2. Use the quadratic formula to solve for z
as a function of w. Then use the binomial expansion (for (1 + t)^{1/2}) with an
appropriate t to solve for z as a function of w.

(iv) Show that the two answers you get for parts (ii) and (iii) are in fact
equal.
Before proving our next result we need to recall the so-called Rule of
Leibniz:
Exercise: 4.12.4 Let D denote the derivative operator, and for a function
f (in our case a formal power series or Laurent series) let f^{(j)} denote the
jth derivative of f, i.e., D^j(f) = f^{(j)}.

(i) Prove that

D^n(f · g) = Σ_{i=0}^n C(n, i) f^{(i)} g^{(n-i)}.

(ii) Derive as a corollary to part (i) the fact that

D^j(f^2) = Σ_{i_1 + i_2 = j} C(j; i_1, i_2) f^{(i_1)} f^{(i_2)}.

(iii) Now use part (i) and induction on n to prove that

D^j(f^n) = Σ_{i_1 + ··· + i_n = j} C(j; i_1, ..., i_n) f^{(i_1)} ··· f^{(i_n)}.
Theorem 4.12.5 (Special Case of Lagrange Inversion Formula) Let f(z) =
Σ_{i=0}^∞ f_i z^i with f_0 ≠ 0, so in particular f(z)^{-1} ∈ C[[z]]. If w = W(z) = z/f(z)
(which is a power series with o(W(z)) = 1), then we can solve for z = Z(w)
as a power series in w with order 1. Specifically,

z = Z(w) = Σ_{n=1}^∞ c_n w^n, with

c_n = Res { f^n(z)/(n z^n) } = (1/n!) (D^{n-1} f^n)(0).
Proof: Since f(z) = Σ_{i=0}^∞ f_i z^i with f_0 ≠ 0, in C[[z]],

W(z) = z/Σ_{i=0}^∞ f_i z^i = 1/Σ_{i=0}^∞ f_i z^{i-1} = (f_0 z^{-1} + f_1 + f_2 z + f_3 z^2 + ···)^{-1} = Σ_{n=1}^∞ w_n z^n.

Here f_0 ≠ 0 implies that w_1 ≠ 0. By Theorem 4.12.2, z = Σ_{n=1}^∞ c_n w^n, where

c_n = Res { (1/n) · 1/(z/f(z))^n } = Res { f^n(z)/(n z^n) } = (1/n) [z^{n-1}] f^n(z) = (1/n) Σ f_{i_1} ··· f_{i_n}.

Here the sum is over all (i_1, ..., i_n) for which i_1 + ··· + i_n = n - 1 and i_j ≥ 0
for 1 ≤ j ≤ n. But now we need to evaluate

(1/n!) (D^{n-1} f^n)(0) = (1/n!) Σ C(n-1; i_1, ..., i_n) f^{(i_1)} ··· f^{(i_n)} (0),

where the sum is over all (i_1, ..., i_n) for which i_1 + ··· + i_n = n - 1 and i_j ≥ 0
for 1 ≤ j ≤ n. In this expression f^{(i_j)} is the (i_j)th derivative of f and when
evaluated at 0 yields f_{i_j} · i_j! by Taylor's formula. Hence the sum in question
equals

(1/n!) Σ [ (n-1)! · i_1! f_{i_1} ··· i_n! f_{i_n} / (i_1! ··· i_n!) ] = (1/n) Σ f_{i_1} ··· f_{i_n},

as desired.
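Taking f(z) = (1+z)^2 in the theorem gives c_n = (1/n) [z^{n-1}] (1+z)^{2n} = (1/n) C(2n, n-1), the Catalan numbers 1, 2, 5, 14, ...; composing the two series back together is a good check. A sketch:

```python
from fractions import Fraction
from math import comb

N = 8

def mul(a, b):
    """Truncated product of two coefficient lists of length N+1."""
    return [sum(a[i] * b[m - i] for i in range(m + 1)) for m in range(N + 1)]

# w = W(z) = z/(1+z)^2 = sum_{m>=1} (-1)^{m-1} m z^m
W = [Fraction(0)] + [Fraction((-1) ** (m - 1) * m) for m in range(1, N + 1)]

# Lagrange: z = sum_n c_n w^n with c_n = (1/n) C(2n, n-1)
Z = [Fraction(0)] * (N + 1)
Wpow = [Fraction(1)] + [Fraction(0)] * N
for n in range(1, N + 1):
    Wpow = mul(Wpow, W)                 # now W^n
    cn = Fraction(comb(2 * n, n - 1), n)
    for m in range(N + 1):
        Z[m] += cn * Wpow[m]

# Substituting the series for w back in recovers z exactly (mod x^{N+1}):
assert Z[1] == 1 and all(Z[m] == 0 for m in range(2, N + 1))
```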
Let

f(x) = Σ_{i=1}^j a_{-i} x^{-i} + a_0 + Σ_{i≥1} a_i x^i;

f'(x) = Σ_{i=1}^j (-i) a_{-i} x^{-i-1} + 0 + Σ_{i≥1} i a_i x^{i-1}.
Similarly, let

g(x) = Σ_{i=1}^k b_{-i} x^{-i} + b_0 + Σ_{i≥1} b_i x^i;

g'(x) = Σ_{i=1}^k (-i) b_{-i} x^{-i-1} + 0 + Σ_{i≥1} i b_i x^{i-1}.
Then it is easy to compute the following:

[x^{-1}] { f(x) g'(x) } = Σ_{i=1}^j i a_{-i} b_i - Σ_{i=1}^k i a_i b_{-i};

[x^{-1}] { f'(x) g(x) } = -Σ_{i=1}^j i a_{-i} b_i + Σ_{i=1}^k i a_i b_{-i} = -[x^{-1}] { f(x) g'(x) }.

Note that neither a_0 nor b_0 affects this value. Hence we may write for
f, g ∈ R((x)),

[x^{-1}] f g' = -[x^{-1}] f' (g(x) - g(0)).    (4.38)
When we use this a little later, g(w) = log(φ(w)), so g'(w) = φ'(w)/φ(w),
and Eq. 4.38 then appears as

[w^{-1}] { f(w) φ'(w) (1/φ(w)) } = -[w^{-1}] { f'(w) log( φ(w)/φ(0) ) }.    (4.39)
The next result allows us to change variables when computing residues,
and in some ways is the main result of this section, since the full Lagrange
Inversion formula follows from it.

Theorem 4.12.6 (Residue Composition) Let f(x), r(x) ∈ C((x)) and sup-
pose that α = o(r(x)) > 0. We want to make the substitution x = r(z). Then

α [x^{-1}] f(x) = [z^{-1}] f(r(z)) r'(z).
Proof: First consider f(x) = x^n, -1 ≠ n ∈ Z. Then [z^{-1}] r^n(z) r'(z)
= (n+1)^{-1} [z^{-1}] { (d/dz) r^{n+1}(z) } = 0 by (R1), since r^{n+1}(z) ∈ C((z)). Also,
α [x^{-1}] x^n = 0.

On the other hand, if n = -1, then [z^{-1}] r' r^{-1} = o(r(z)) = α > 0. It
follows that for all integers n, [z^{-1}] r^n(z) r'(z) = α δ_{n,-1} = α [x^{-1}] x^n. Now
let f(x) = Σ_{n≥k} a_n x^n (o(f(x)) = k < ∞). Since o(r(z)) > 0, f(r(z)) exists,
and we have

α [x^{-1}] f(x) = [z^{-1}] Σ_{n≥k} a_n r^n(z) r'(z) = [z^{-1}] f(r(z)) r'(z).
As an application of Residue Composition we present the following prob-
lem: Find a closed form formula for the sum

S = Σ_{k=0}^n C(2n+1, 2k+1) C(j+k, 2n).
We give the major steps as little exercises.

1. Put f(x) = (1/2x) [ (1+x)^{2n+1} - (1-x)^{2n+1} ]. Show that

f(x) = Σ_{k=0}^n C(2n+1, 2k+1) x^{2k}.

2. f((1+y)^{1/2}) = Σ_{k=0}^n (1+y)^k [x^{2k}] f(x).

3. [y^{2n}] { (1+y)^j Σ_{k=0}^n (1+y)^k C(2n+1, 2k+1) } = Σ_k C(2n+1, 2k+1) C(j+k, 2n) = S.
(Hint: At one stage you will have to use the fact that

Σ_m C(a, m) C(b, n-m) = C(a+b, n)

for appropriate choices of a, b, m, n. You might want to prove this as a
separate step if you have not already done so.)
So at this stage we have

S = [y^{2n}] { (1+y)^j Σ_{k=0}^n (1+y)^k [x^{2k}] f(x) } = [y^{-1}] { y^{-(2n+1)} (1+y)^j f((1+y)^{1/2}) }.
At this point we want to use Residue Composition with the substitution
y = y(z) = z^2(z^2 - 2), so α = o(y(z)) = 2, and y'(z) = 4z(z^2 - 1). Also, (1+y)^{1/2} =
1 - z^2. Now use

f((1+y)^{1/2}) = f(1 - z^2) = (1/(2(1 - z^2))) { (2 - z^2)^{2n+1} - z^{4n+2} }

and Residue Composition to obtain

S = (1/2) [z^{-1}] { z^{-(4n+2)} (z^2 - 2)^{-(2n+1)} (1 - z^2)^{2j} · [ (2 - z^2)^{2n+1} - z^{4n+2} ] / (2(1 - z^2)) · 4z(z^2 - 1) },

which simplifies to

[z^{-1}] { (z^2 - 1)^{2j} ( 1/(z^2 - 2)^{2n+1} + 1/z^{4n+2} ) z } = [z^{-1}] { (z^2 - 1)^{2j} z^{-(4n+1)} } + 0,

since 1/(z^2 - 2)^{2n+1} is a power series, so when multiplied by (z^2 - 1)^{2j} z it contributes
nothing to [z^{-1}]. Hence

S = [z^{4n}] { (z^2 - 1)^{2j} } = C(2j, 2n).
Theorem 4.12.7 (Lagrange Inversion) Let φ(x) ∈ C[[x]] with o(φ(x)) = 0.
Hence φ^{-1}(x) exists and w φ^{-1}(w) has order 1 in w. Put t = w φ^{-1}(w),
i.e., w = t φ(w). We want to generalize the earlier special case of Lagrange
Inversion that found w as a function of t. Here we let f(λ) be some Laurent
series about λ and find the coefficients on powers of t in f(W(t)). Specifically,
we have the following:

1. If f(λ) ∈ C((λ)), then

[t^n] f(W(t)) = (1/n) [λ^{n-1}] f'(λ) φ^n(λ), for 0 ≠ n ≥ o(f);

[t^0] f(W(t)) = [λ^0] f(λ) + [λ^{-1}] { f'(λ) log( φ(λ)/φ(0) ) }, for n = 0.

2. If F(λ) ∈ C[[λ]], then

F(w) { 1 - w φ'(w)/φ(w) }^{-1} = Σ_{n≥0} c_n t^n, where c_n = [λ^n] F(λ) φ^n(λ).
Proof: Let ψ(w) = w/φ(w), so t = ψ(w) and o(ψ(w)) = 1, which implies that
ψ^{[-1]}(λ) exists. Here w = ψ^{[-1]}(t) is the unique solution w of w = t φ(w). For
any integer n: [t^n] f(W(t)) = [t^{-1}] t^{-(n+1)} f(ψ^{[-1]}(t)). Now use Residue
Composition to substitute t = ψ(w) with α = 1 = o(ψ), where the f(x) of the
Residue Composition theorem is now t^{-(n+1)} f(ψ^{[-1]}(t)). Hence, for n ≠ 0,

[t^n] f(W(t)) = [w^{-1}] ψ^{-(n+1)}(w) f(w) ψ'(w) = -(1/n) [w^{-1}] f(w) (ψ^{-n}(w))'
  = (1/n) [w^{-1}] f'(w) ψ^{-n}(w) = (1/n) [w^{-1}] { f'(w) φ^n(w)/w^n }.

If n = 0, [t^0] f(W(t)) = [w^{-1}] ψ^{-1}(w) f(w) ψ'(w)
  = [w^{-1}] { f(w) (φ(w)/w) [ (φ(w) - w φ'(w))/φ^2(w) ] }
  = [w^{-1}] { f(w)/w } - [w^{-1}] { f(w) φ'(w)/φ(w) }
  = [w^0] f(w) + [w^{-1}] { f'(w) log( φ(w)/φ(0) ) },

using Eq. 4.39 at the last step. This completes the proof of 1. Now let
F(λ) ∈ C[[λ]]. It follows that F(λ) φ^{-1}(λ) ∈ C[[λ]]. Hence we may put
f(w) = ∫_0^w F(λ) φ^{-1}(λ) dλ and know that f(w) ∈ C[[w]]. Also, since
f'(λ) = F(λ) φ^{-1}(λ), we see that F(w) = f'(w) φ(w). By 1.,
f(w) = f(0) + Σ_{n≥1} (1/n) [λ^{n-1}] { φ^n(λ) f'(λ) } t^n. Differentiate
this latter equality with respect to t:

f'(w) dw/dt = Σ_{n≥1} [λ^{n-1}] { φ^n(λ) f'(λ) } t^{n-1} = Σ_{n≥0} [λ^n] { φ^{n+1}(λ) f'(λ) } t^n.

But w = t φ(w) implies that dw/dt = φ(w) + t φ'(w) dw/dt, from which we see
that

dw/dt = φ(w) [1 - t φ'(w)]^{-1}.

Putting this all together, we find

f'(w) φ(w) [1 - t φ'(w)]^{-1} = F(w) [1 - t φ'(w)]^{-1} = Σ_{n≥0} [λ^n] { φ^{n+1}(λ) f'(λ) } t^n
  = Σ_{n≥0} [λ^n] { φ^n(λ) F(λ) } t^n.

We write this finally in the form:

F(W(t)) / (1 - t φ'(W(t))) = Σ_{n≥0} [λ^n] { φ^n(λ) F(λ) } t^n.
The following example illustrates the above ideas and gives some idea of
the power of the method.
Example of Inversion Formula Suppose that for all n ∈ Z we have
the following relation:

b_n = Σ_k C(k, n-k) a_k.    (4.40)

Then we want to show that

n a_n = Σ_k C(2n-k-1, n-k) (-1)^{n-k} k b_k.    (4.41)

The latter formula says nothing for n = 0, but the former says that
a_0 = b_0. Multiply Eq. 4.40 by w^n and sum over n:
B(w) = Σ_n b_n w^n = Σ_n ( Σ_k C(k, n-k) a_k ) w^n = Σ_k a_k Σ_n C(k, n-k) w^n
  = Σ_k a_k w^k Σ_n C(k, n-k) w^{n-k} = Σ_k a_k w^k Σ_{n=0}^k C(k, n) w^n = Σ_k a_k w^k (1 + w)^k
  = A(w + w^2) = A(t),

where we have put t = w + w^2 = w(1 + w), or w = t ( 1/(1+w) ). So in the
notation of the Theorem of Lagrange, φ(w) = 1/(1+w). So if w = W(t) we want
to find A(t) = B(W(t)), i.e., we want the coefficients of A in terms of the
coefficients of B: Σ a_n t^n = Σ b_k (W(t))^k. At this stage we can say:

a_n = [t^n] Σ_k b_k (W(t))^k = Σ_k b_k [t^n] (W(t))^k.
In the notation of the theorem of Lagrange, put f(u) = u^k, so that f'(u) =
k u^{k-1} and f(W(t)) = (W(t))^k. So for n > 0,

[t^n] (W(t))^k = (1/n) [λ^{n-1}] f'(λ) φ^n(λ)
  = (1/n) [λ^{n-1}] { k λ^{k-1} (1 + λ)^{-n} } = (k/n) [λ^{n-k}] { (1 + λ)^{-n} }
  = (k/n) [λ^{n-k}] Σ_{i=0}^∞ C(-n, i) λ^i = (k/n) C(-n, n-k) = (k/n) (-1)^{n-k} C(2n-k-1, n-k).

This implies that a_n = Σ_k (k/n) (-1)^{n-k} C(2n-k-1, n-k) b_k, as desired.
Central Trinomial Numbers We shall use the second statement in
the Theorem of Lagrange to find the generating function of the central
trinomial numbers c_n defined by c_n = [λ^n] (1 + λ + λ^2)^n. Clearly c_n =
[λ^n] F(λ) φ^n(λ) where F(λ) = 1, φ(λ) = 1 + λ + λ^2. So φ'(λ) = 1 + 2λ. Part
2 of the Theorem of Lagrange says that

Σ_{n≥0} c_n t^n = F(w) { 1 - t φ'(w) }^{-1},

where w = t φ(w) = t(1 + w + w^2). Hence t w^2 + (t - 1) w + t = 0, implying that
w = [ 1 - t - √(1 - 2t - 3t^2) ] / (2t). It is easy to compute that
1 - t φ'(w) = √(1 - 2t - 3t^2). Now it follows that

Σ_{n≥0} c_n t^n = (1 - 2t - 3t^2)^{-1/2}.
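Both descriptions of c_n can be compared numerically: the central coefficient of (1+λ+λ^2)^n on one side, the expansion of (1-2t-3t^2)^{-1/2} on the other. A sketch with exact arithmetic:

```python
from fractions import Fraction

N = 10

def trinomial_central(n):
    """c_n = [x^n] (1 + x + x^2)^n by direct polynomial multiplication."""
    p = [1]
    for _ in range(n):
        q = [0] * (len(p) + 2)
        for i, a in enumerate(p):
            q[i] += a; q[i + 1] += a; q[i + 2] += a
        p = q
    return p[n]

# (1 - 2t - 3t^2)^{-1/2} = sum_k C(-1/2, k) (-1)^k (2t + 3t^2)^k, truncated
series = [Fraction(0)] * (N + 1)
upow = [Fraction(1)] + [Fraction(0)] * N        # (2t + 3t^2)^k
coeff = Fraction(1)                             # C(-1/2, k) (-1)^k = (2k-1)!!/(2^k k!)
for k in range(N + 1):
    for m in range(N + 1):
        series[m] += coeff * upow[m]
    coeff *= Fraction(2 * k + 1, 2 * (k + 1))   # ratio of successive coefficients
    upow = [2 * (upow[m - 1] if m >= 1 else 0) + 3 * (upow[m - 2] if m >= 2 else 0)
            for m in range(N + 1)]

for n in range(N + 1):
    assert series[n] == trinomial_central(n)    # 1, 1, 3, 7, 19, 51, ...
```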
Exercise: 4.12.8 Show that c
n
=

n
2
in
135(2i1)3
ni
(ni)!(2in)!2
ni
=

n
2
in
_
n
i
__
i
n i
_
. (Hint: Remember that you now have c
n
described
in two dierent ways as a coecient of a certain term in a power series
expansion of some ordinary generating function.)
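As a quick sanity check on the exercise (our own sketch, not the text's), one can compare the defining coefficient $[\lambda^n](1+\lambda+\lambda^2)^n$ against the binomial-sum form:

```python
from math import comb

def c_poly(n):
    # c_n = [lambda^n](1 + lambda + lambda^2)^n by repeated polynomial multiplication
    coeffs = [1]
    for _ in range(n):
        new = [0] * (len(coeffs) + 2)
        for i, c in enumerate(coeffs):
            for j in range(3):          # multiply by 1 + lambda + lambda^2
                new[i + j] += c
        coeffs = new
    return coeffs[n]

def c_binom(n):
    # second form in Exercise 4.12.8
    return sum(comb(n, i) * comb(i, n - i) for i in range((n + 1) // 2, n + 1))

print([c_poly(n) for n in range(8)])    # [1, 1, 3, 7, 19, 51, 141, 393]
```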
4.13 EGF: A Second Look

Let $M$ denote a type of combinatorial structure. Let $m_k$ be the number of ways of giving a labeled $k$-set such a structure. In each separate case we shall specify whether we take $m_0 = 0$ or $m_0 = 1$. Then define
$$M(x) = \sum_{k=0}^\infty m_k \frac{x^k}{k!}.$$
Consider a few examples. If $T$ denotes the structure "labeled tree," then as we saw above, $T(x) = \sum_{k=1}^\infty k^{k-2}\frac{x^k}{k!}$. Similarly, if $S$ denotes the structure "a set" (often called the uniform structure), then $s_k = 1$ for all $k \geq 0$, so $S(x) = \sum_{k=0}^\infty \frac{x^k}{k!} = e^x$. If $C$ denotes "oriented circuit," then $c_k = (k-1)!$ for $k \geq 1$. Put $c_0 = 0$. Then $C(x) = \sum_{k=1}^\infty \frac{x^k}{k} = \log\left(\frac{1}{1-x}\right) = -\log(1-x)$. If $\Pi$ denotes the structure "permutation," then $\Pi(x) = \sum_{k=0}^\infty k!\frac{x^k}{k!} = \sum_{k=0}^\infty x^k = \frac{1}{1-x}$.

Suppose we wish to consider the number of ways a labeled $n$-set can be partitioned into two parts, one with a structure of type $A$ and the other with a structure of type $B$. The number of ways to do this is clearly $\sum_{k=0}^n \binom{n}{k} a_k b_{n-k}$. It follows that if we call this a structure of type $A \oplus B$, then
$$(A\oplus B)(x) = \sum_{n=0}^\infty \left( \sum_{k=0}^n \binom{n}{k} a_k b_{n-k} \right) \frac{x^n}{n!} = A(x)\cdot B(x).$$

Famous Example: Derangements again. Let $D$ denote the structure "derangement." Any permutation consists of a set of fixed points (interpreted as a set) and a derangement on the remaining points. Hence we have $\Pi(x) = S(x)\cdot D(x)$, i.e., $(1-x)^{-1} = D(x)\cdot e^x$, implying
$$D(x) = e^{-x}(1-x)^{-1} = \sum_{k=0}^\infty (-1)^k\frac{x^k}{k!}\cdot\sum_{j=0}^\infty x^j = \sum_{n=0}^\infty \left( \sum_{k=0}^n \frac{(-1)^k}{k!}\cdot 1 \right) x^n = \sum_{n=0}^\infty \left( n!\sum_{k=0}^n \frac{(-1)^k}{k!} \right)\frac{x^n}{n!}.$$
It follows that we get the usual formula:
$$d_n = n!\sum_{k=0}^n \frac{(-1)^k}{k!}.$$
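The usual formula can be checked directly against a brute-force count (our own sketch; the function names are illustrative):

```python
from math import factorial
from itertools import permutations

def d_formula(n):
    # d_n = n! * sum_{k=0}^{n} (-1)^k / k!, kept exact with integer arithmetic
    return sum((-1) ** k * factorial(n) // factorial(k) for k in range(n + 1))

def d_count(n):
    # brute force: permutations of {0,...,n-1} with no fixed point
    return sum(all(p[i] != i for i in range(n)) for p in permutations(range(n)))

print([d_formula(n) for n in range(8)])   # [1, 0, 1, 2, 9, 44, 265, 1854]
```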
EXAMPLE 6. In how many ways can a labeled $n$-set be split into a number of pairs and a number of singletons?

First, let $p_n$ be the number of ways to split an $n$-set into pairs. Clearly, if $n$ is odd, $p_n = 0$. By convention we say that $p_0 = 1$. Suppose $n = 2k \geq 2$. Pick a first element $a_1$ in $2k$ ways, and then the second element in $2k-1$ ways, the third in $2k-2$ ways, etc., so that $(a_1, a_2, a_3, a_4, \ldots, a_{2k-1}, a_{2k})$ is chosen in $(2k)!$ ways. But the same pairs could be chosen in $k!$ orders, and each pair in two ways, so that
$$p_{2k} = \frac{(2k)!}{2^k k!} = \frac{(2k)(2k-1)(2k-2)(2k-3)\cdots 1}{2^k k!} = \frac{2^k k!\,(2k-1)!!}{2^k k!} = (2k-1)!!$$
Here $(2k-1)!! = (2k-1)(2k-3)(2k-5)\cdots 1$, with $(2\cdot 0 - 1)!! = 1$ by convention. Then we find
$$P(x) := \sum_{n=0}^\infty p_n\frac{x^n}{n!} = \sum_{k=0}^\infty p_{2k}\frac{x^{2k}}{(2k)!}$$
$$= \sum_{k=0}^\infty (2k-1)!!\frac{x^{2k}}{(2k)!} = \sum_{k=0}^\infty \frac{x^{2k}}{2^k k!} = \sum_{k=0}^\infty \frac{\left(\frac{x^2}{2}\right)^k}{k!} = e^{\frac{x^2}{2}}.$$
The number of ways to pick $n$ singletons from an $n$-set is 1, i.e., the corresponding egf is $S(x) = e^x$. Hence
$$(P\oplus S)(x) = P(x)\cdot S(x) = \exp(\tfrac{1}{2}x^2)\cdot\exp(x) = \exp(x + \tfrac{1}{2}x^2).$$
We can also obtain the same result by using a recursion relation. Denote the structure $P\oplus S$ by $B$. In the set $\{1,\ldots,n\}$ we can either let $n$ be a singleton or make a pair $\{x, n\}$ with $1 \leq x \leq n-1$. So $b_n = b_{n-1} + (n-1)b_{n-2}$, $n \geq 1$. As $b_1 = 1$ by definition, and $b_1 = b_0$ according to the recursion, it must be that $b_0 = 1$. Multiply the recursion by $\frac{x^{n-1}}{(n-1)!}$ for $n \geq 1$ and sum.
$$\sum_{n=1}^\infty b_n\frac{x^{n-1}}{(n-1)!} = \sum_{n=1}^\infty b_{n-1}\frac{x^{n-1}}{(n-1)!} + \sum_{n=1}^\infty (n-1)b_{n-2}\frac{x^{n-1}}{(n-1)!}.$$
Also $B(x) = \sum_{n=0}^\infty b_n\frac{x^n}{n!}$ implies
$$B'(x) = \sum_{n=1}^\infty b_n\frac{x^{n-1}}{(n-1)!}.$$
This implies that
$$B'(x) = B(x) + xB(x) = (1+x)B(x).$$
Since $B(0) = 1$, the theory of differential equations shows that $B(x) = \exp(x + \tfrac{1}{2}x^2)$.
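The two derivations can be played off against each other numerically: the recursion values $b_n$ should match the coefficients $n!\,[x^n]\exp(x + x^2/2)$. A minimal sketch (ours, not the text's):

```python
from math import factorial
from fractions import Fraction

def b_recursion(N):
    # b_n = b_{n-1} + (n-1) b_{n-2}, with b_0 = b_1 = 1
    b = [1, 1]
    for n in range(2, N):
        b.append(b[n - 1] + (n - 1) * b[n - 2])
    return b[:N]

def b_egf(N):
    # n! [x^n] exp(x) * exp(x^2/2): coefficient of x^n is sum over i + 2j = n
    coeffs = [Fraction(0)] * N
    for i in range(N):
        for j in range((N - 1 - i) // 2 + 1):
            coeffs[i + 2 * j] += Fraction(1, factorial(i) * factorial(j) * 2 ** j)
    return [int(coeffs[n] * factorial(n)) for n in range(N)]

print(b_recursion(8))   # [1, 1, 2, 4, 10, 26, 76, 232]
```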
Example 7. Recall (Theorem 1.7.1) that the Stirling number $S(n,k)$ of the second kind, the number of partitions of an $n$-set into $k$ nonempty blocks, satisfies the following recursion:
$$S(n,k) = kS(n-1,k) + S(n-1,k-1),\ n \geq k;\quad S(n,k) = 0 \text{ for } n < k. \qquad (4.42)$$
Multiply Eq. 4.42 by $\frac{x^{n-1}}{(n-1)!}$ and sum over $n \geq k$:
$$\sum_{n\geq k} S(n,k)\frac{x^{n-1}}{(n-1)!} = \sum_{n\geq k} kS(n-1,k)\frac{x^{n-1}}{(n-1)!} + \sum_{n\geq k} S(n-1,k-1)\frac{x^{n-1}}{(n-1)!}.$$
Put $F_k(x) = \sum_{n\geq k} S(n,k)\frac{x^n}{n!}$. Then $F_k'(x) = \sum_{n\geq k} S(n,k)\frac{x^{n-1}}{(n-1)!}$ and
$$\sum_{n\geq k} kS(n-1,k)\frac{x^{n-1}}{(n-1)!} = k\sum_{n\geq k+1} S(n-1,k)\frac{x^{n-1}}{(n-1)!} = k\sum_{n\geq k} S(n,k)\frac{x^n}{n!}.$$
Also
$$\sum_{n\geq k} S(n-1,k-1)\frac{x^{n-1}}{(n-1)!} = \sum_{n-1\geq k-1} S(n-1,k-1)\frac{x^{n-1}}{(n-1)!} = F_{k-1}(x).$$
The preceding says that
$$F_k'(x) = kF_k(x) + F_{k-1}(x). \qquad (4.43)$$
We now use induction on $k$ in Eq. 4.43 to prove the following:

Theorem 4.13.1
$$\sum_{n\geq k} S(n,k)\frac{x^n}{n!} = \frac{1}{k!}(e^x-1)^k.$$
Proof: For $n \geq 1$, $S(n,1) = 1$. And $\sum_{n\geq 1} 1\cdot\frac{x^n}{n!} = \frac{1}{1!}(e^x-1)^1$. So the theorem is true for $k = 1$.

The induction hypothesis is that for $1 \leq t < k$, $F_t(x) = \frac{1}{t!}(e^x-1)^t$. Then $F_k'(x) = kF_k(x) + F_{k-1}(x)$ implies
$$F_k'(x) = kF_k(x) + \frac{1}{(k-1)!}(e^x-1)^{k-1}.$$
$$[x^k]F_k(x) = [x^k]\left( \sum_{n\geq k} S(n,k)\frac{x^n}{n!} \right) = \frac{S(k,k)}{k!} = \frac{1}{k!}.$$
Put $G_k(x) = \frac{1}{k!}(e^x-1)^k$. Then $[x^k]G_k(x) = \frac{1}{k!}$ and
$$G_k'(x) = \frac{1}{(k-1)!}(e^x-1)^{k-1}e^x.$$
Also
$$kG_k(x) + G_{k-1}(x) = \frac{k}{k!}(e^x-1)^k + \frac{1}{(k-1)!}(e^x-1)^{k-1} = \frac{1}{(k-1)!}(e^x-1)^{k-1}\left[ e^x - 1 + 1 \right] = G_k'(x).$$
This is enough to guarantee that $F_k(x) = G_k(x)$.
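Theorem 4.13.1 is equivalent to the classical explicit formula $S(n,k) = \frac{1}{k!}\sum_j \binom{k}{j}(-1)^{k-j}j^n$ (expand $(e^x-1)^k$ by the binomial theorem and take $n!\,[x^n]$). A short check of that consequence against the recursion (our own code):

```python
from math import factorial, comb

def stirling2(n, k):
    # recursion (4.42): S(n,k) = k S(n-1,k) + S(n-1,k-1)
    if n == k:
        return 1
    if k == 0 or n < k:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def egf_coeff(n, k):
    # n! [x^n] (e^x - 1)^k / k!, since n![x^n] e^(jx) = j^n
    return sum(comb(k, j) * (-1) ** (k - j) * j ** n for j in range(k + 1)) // factorial(k)

print([[stirling2(n, k) for k in range(1, n + 1)] for n in range(1, 6)])
```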
Let $n$ be a positive integer. For each $k$-tuple $(b_1,\ldots,b_k)$ of nonnegative integers for which $b_1 + 2b_2 + \cdots + kb_k = n$, we count how many ways there are to partition an $n$-set into $b_1$ parts of size 1, $b_2$ parts of size 2, \ldots, $b_k$ parts of size $k$. Imagine the elements of the $n$-set are to be placed in $n$ positions. The positions are grouped from left to right in bunches. The first $b_1$ bunches have one position each; the next $b_2$ bunches have two positions each, etc. There are $n!$ ways to order the integers in the positions. Within each bunch of $j$ positions there are $j!$ ways to permute the integers within those positions. So we divide by $(1!)^{b_1}(2!)^{b_2}\cdots(k!)^{b_k}$. But the bunches of the same cardinality can be permuted without affecting the partition. So we divide by $b_1!\,b_2!\cdots b_k!$. Hence the number of partitions is:
$$\frac{n!}{b_1!\cdots b_k!\,(1!)^{b_1}\cdots(k!)^{b_k}}.$$
Now suppose that each $j$-set can have $n_j$ structures of type $N$ on it. So each partition gives $(n_1)^{b_1}\cdots(n_k)^{b_k}$ configurations. Hence the total number of such configurations is
$$\frac{n!}{b_1!\cdots b_k!}\cdot\left(\frac{n_1}{1!}\right)^{b_1}\cdots\left(\frac{n_k}{k!}\right)^{b_k}.$$
It follows that the number of configurations on an $n$-set is
$$a_n = \sum \frac{n!}{b_1!\cdots b_k!}\cdot\left(\frac{n_1}{1!}\right)^{b_1}\cdots\left(\frac{n_k}{k!}\right)^{b_k},$$
where the sum is over $k$-tuples $(b_1,\ldots,b_k)$ with $b_1+2b_2+\cdots+kb_k = n$, $b_i \geq 0$; $k \geq 0$. Among the tuples $(b_1,\ldots,b_k)$ for which $b_1+2b_2+\cdots+kb_k = n$, we lump together those for which $b_1+\cdots+b_k$ is a constant, say $b_1+\cdots+b_k = m$, $m = 0, 1, \ldots$. If we let $A(x) = \sum_{n=0}^\infty a_n\frac{x^n}{n!}$, we see that the coefficient on $x^n$ equals
$$\sum \frac{1}{b_1!\cdots b_k!}\left(\frac{n_1}{1!}\right)^{b_1}\cdots\left(\frac{n_k}{k!}\right)^{b_k},$$
where the sum is as above, but we think of it as coming in parts, part $m$ being the sum of those terms with $b_1+b_2+\cdots+b_k = m$.

Put $N(x) = \sum_{i=1}^\infty n_i\frac{x^i}{i!}$. What does $\frac{N(x)^m}{m!}$ contribute to the coefficient of $x^n$ in $\sum_{m=0}^\infty \frac{N(x)^m}{m!}$? In expanding $\frac{1}{m!}N(x)N(x)\cdots N(x)$ ($m$ factors), choose terms of degree $i$, $b_i$ times, $1 \leq i \leq n$. There are $\binom{m}{b_1,\ldots,b_k}$ ways to choose terms of degree $i$, $b_i$ times, where $b_1+\cdots+b_k = m$. This gives a term of degree $1b_1+2b_2+\cdots+kb_k$. So the contribution to the term of degree $n$ is
$$\sum \frac{1}{m!}\binom{m}{b_1,\ldots,b_k}\left(\frac{n_1 x}{1!}\right)^{b_1}\cdots\left(\frac{n_k x^k}{k!}\right)^{b_k} = \sum \frac{1}{b_1!\cdots b_k!}\left(\frac{n_1}{1!}\right)^{b_1}\cdots\left(\frac{n_k}{k!}\right)^{b_k} x^{b_1+2b_2+\cdots+kb_k}.$$
The sum is over all $k \geq 0$, and over all $(b_1,\ldots,b_k)$ with $b_i \geq 0$, $\sum ib_i = n$, $\sum b_i = m$. Now sum over all $m$. (Of course the contribution is zero unless $m \leq n$.) It is clear that $A(x) = \sum_{n=0}^\infty a_n\frac{x^n}{n!} = \exp(N(x))$, and we have proved the following theorem.

Theorem 4.13.2 If the compound structure $S(N)$ is obtained by splitting a set into parts, each of which then gets a structure of type $N$, and if a $k$-set gets $n_k$ structures of type $N$, so $N(x) = \sum_{k=1}^\infty n_k\frac{x^k}{k!}$, and there are $\binom{n}{k}$ ways of selecting a $k$-set, then
$$S(N)(x) = \exp(N(x)).$$
(Keep in mind that $S(x) = \sum 1\cdot\frac{x^k}{k!}$, since there is only 1 way to impose the structure of "set" on a set.)
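As an illustration of Theorem 4.13.2 (our own sketch, not from the text): if each nonempty block gets exactly one structure ($n_k = 1$ for all $k \geq 1$), then $N(x) = e^x - 1$ and $S(N)(x) = \exp(e^x - 1)$ must be the egf of set partitions, i.e., of the Bell numbers.

```python
from math import factorial, comb
from fractions import Fraction

def bell_recursion(N):
    # B_{n+1} = sum_k C(n, k) B_k, with B_0 = 1
    B = [1]
    for n in range(N - 1):
        B.append(sum(comb(n, k) * B[k] for k in range(n + 1)))
    return B

def bell_egf(N):
    # n! [x^n] exp(e^x - 1): sum the truncated series sum_m N(x)^m / m!
    ncoef = [Fraction(0)] + [Fraction(1, factorial(i)) for i in range(1, N)]
    total = [Fraction(0)] * N
    power = [Fraction(1)] + [Fraction(0)] * (N - 1)      # N(x)^0
    for m in range(N):
        for d in range(N):
            total[d] += power[d] / factorial(m)
        power = [sum(power[i] * ncoef[d - i] for i in range(d + 1))
                 for d in range(N)]
    return [int(total[n] * factorial(n)) for n in range(N)]

print(bell_recursion(8))   # [1, 1, 2, 5, 15, 52, 203, 877]
```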
Example 8. If we substitute the structure "oriented cycle" into the uniform structure ("set"), then we are considering the compound structure consisting of a partition of an $n$-set into oriented cycles, i.e., the structure $\Pi$ with $\Pi_0 = 1$. So we must have $\Pi(x) = \exp(C(x))$. Indeed, above we determined that $\Pi(x) = (1-x)^{-1}$ and $C(x) = -\log(1-x)$.
Exercise: 4.13.3 A directed tree with all edges pointing toward one vertex called the root is called an arborescence. Let $T(x) = \sum_{n=1}^\infty t_n\frac{x^n}{n!}$, where $t_n$ is the number of labeled trees on $n$ vertices. And let $A(x) = \sum_{n=1}^\infty a_n\frac{x^n}{n!}$, where $a_n$ is the number of arborescences on $n$ vertices. Since a labeled tree on $n$ vertices can be rooted in $n$ ways and turned into an arborescence, and the process is reversible, clearly $a_n = nt_n$, i.e., $A(x) = xT'(x)$. Consider a labeled tree on $n+1$ vertices as an arborescence with vertex $n+1$ as its root. Then delete the root and all incident edges. The result is a rooted forest on $n$ vertices, with the roots of the individual trees being exactly the vertices that were originally adjacent to the root $n+1$. If $F(x) = \sum_{n=1}^\infty f_n\frac{x^n}{n!}$, where $f_n$ is the number of rooted forests on $n$ vertices (and $f_0 = 1$ by convention), then by Theorem 4.13.2, $\exp(A(x)) = F(x)$. Hence we have
$$\exp(A(x)) = \sum_{n=0}^\infty f_n\frac{x^n}{n!} = \sum_{n=0}^\infty t_{n+1}\frac{x^n}{n!} = T'(x) = x^{-1}A(x).$$
Use the special case of Lagrange Inversion to find $c_n$ if $A(x) = \sum_{n=1}^\infty c_n x^n = \sum_{n=1}^\infty a_n\frac{x^n}{n!}$, and complete another proof of Cayley's Theorem.
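The functional equation $\exp(A(x)) = x^{-1}A(x)$, i.e., $A(x) = x\exp(A(x))$, can be solved numerically by fixed-point iteration on truncated series; Cayley's Theorem predicts $a_n = n^{n-1}$ (so $t_n = n^{n-2}$). A sketch of that check (our own code, not part of the exercise):

```python
from math import factorial
from fractions import Fraction

def arborescence_counts(N):
    # Solve A(x) = x * exp(A(x)) by iterating on series truncated at degree N.
    A = [Fraction(0)] * (N + 1)
    for _ in range(N):                  # N iterations stabilize degrees <= N
        expA = [Fraction(1)] + [Fraction(0)] * N
        power = list(expA)              # A(x)^0
        for m in range(1, N + 1):
            power = [sum(power[i] * A[d - i] for i in range(d + 1))
                     for d in range(N + 1)]
            for d in range(N + 1):
                expA[d] += power[d] / factorial(m)
        A = [Fraction(0)] + expA[:N]    # multiply exp(A) by x
    return [int(A[n] * factorial(n)) for n in range(1, N + 1)]

print(arborescence_counts(6))   # [1, 2, 9, 64, 625, 7776], i.e. n^(n-1)
```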
4.14 Dirichlet Series - The Formal Theory

In this brief section we just introduce the notion of Dirichlet Series.

Def'n. Given a sequence $(a_n)_{n=1}^\infty$, the formal series
$$f(s) = \sum_{n=1}^\infty \frac{a_n}{n^s}$$
is the Dirichlet series generating function (Dsgf) of the given sequence: $f(s) \stackrel{Dsgf}{\longleftrightarrow} (a_n)$.

Suppose $A(s) = \sum \frac{a_n}{n^s}$ and $B(s) = \sum \frac{b_n}{n^s}$. The Dirichlet convolution product is
$$A(s)B(s) = \sum_{m,n=1}^\infty \frac{a_n}{n^s}\cdot\frac{b_m}{m^s} = \sum_{n=1}^\infty \left( \sum_{d|n} a_d b_{n/d} \right)\frac{1}{n^s}.$$
Rule 1$''$ $\quad A(s)B(s) \stackrel{Dsgf}{\longleftrightarrow} \left( \sum_{d|n} a_d b_{n/d} \right)_{n=1}^\infty$.

Rule 2$''$ $\quad A(s)^k \stackrel{Dsgf}{\longleftrightarrow} \left( \sum_{(n_1,\ldots,n_k):\, n_1\cdots n_k = n} a_{n_1}a_{n_2}\cdots a_{n_k} \right)_{n=1}^\infty$.
A most famous example is given by the Riemann zeta function
$$\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s} \stackrel{Dsgf}{\longleftrightarrow} (1)_{n=1}^\infty.$$
Theorem 4.14.1 Let $f$ be a multiplicative arithmetic function. Then

(i) $L(f,s) = \sum_{n=1}^\infty \frac{f(n)}{n^s} = \prod_{p\ \mathrm{prime}}\left( \sum_{i=0}^\infty \frac{f(p^i)}{p^{is}} \right)$.

(ii) If $f$ is completely multiplicative, then
$$L(f,s) = \prod_{p\ \mathrm{prime}}\left( 1 - \frac{f(p)}{p^s} \right)^{-1}.$$
Proof: If the unique factorization of $n$ is $n = p_1^{e_1}\cdots p_r^{e_r}$, then there is a unique term in the product that looks like
$$\prod_{i=1}^r \frac{f(p_i^{e_i})}{(p_i^{e_i})^s} = \frac{f(n)}{n^s}.$$
Since $U$ defined by $U(n) = 1$ is completely multiplicative, we may write
$$\zeta(s) = \sum \frac{1}{n^s} = \prod_{p\ \mathrm{prime}}\left( 1 - \frac{1}{p^s} \right)^{-1} = \left( \prod_{p\ \mathrm{prime}}(1 - p^{-s}) \right)^{-1}.$$
Hence
$$\zeta(s)^{-1} = \prod_{p\ \mathrm{prime}}\left( 1 - p^{-s} \right). \qquad (4.44)$$
$$L(\mu, s) = \sum_{n=1}^\infty \frac{\mu(n)}{n^s} = \prod_p \left( \sum_{i=0}^\infty \frac{\mu(p^i)}{p^{is}} \right) = \prod_p \left( 1 - p^{-s} \right) = \frac{1}{\zeta(s)}. \qquad (4.45)$$
In other words,
$$\frac{1}{\zeta(s)} \stackrel{Dsgf}{\longleftrightarrow} (\mu(n))_{n=1}^\infty. \qquad (4.46)$$
In the present context we give another proof of the usual Möbius Inversion Formula.

Theorem 4.14.2 Let $F$ and $f$ be arithmetic functions. Then
$$F(n) = \sum_{d|n} f(d) \text{ for all } n \in \mathcal{N} \iff f(n) = \sum_{d|n} F(d)\mu(n/d) = \sum_{d|n} \mu(d)F(n/d), \text{ for all } n \in \mathcal{N}.$$
Proof: Suppose $F(s) \stackrel{Dsgf}{\longleftrightarrow} F(n)$, $f(s) \stackrel{Dsgf}{\longleftrightarrow} f(n)$. Then
$$F(s) = f(s)\cdot\zeta(s) \iff F(s)(\zeta(s))^{-1} = f(s).$$
So $F = f * U$ if and only if $f = F * \mu$.
Recall the following commonly used multiplicative arithmetic functions in this context.
$$I(n) = \begin{cases} 1, & n = 1;\\ 0, & n > 1. \end{cases}$$
So $L(I,s) = \sum \frac{I(n)}{n^s} = 1$ is the multiplicative identity.

$U(n) = 1$ for all $n \in \mathcal{N}$. So $L(U,s) = \zeta(s)$.

$E(n) = n$ for all $n \in \mathcal{N}$. So $L(E,s) = \sum \frac{n}{n^s} = \sum \frac{1}{n^{s-1}} = \zeta(s-1)$.

$\tau(n) = \sum_{d|n} 1$, so $\tau = U * U$ is multiplicative, and
$$\zeta^2(s) = \sum_n \left( \sum_{d|n} 1\cdot 1 \right)\frac{1}{n^s} = \sum_n \frac{\tau(n)}{n^s}.$$
$\sigma(n) = \sum_{d|n} d = \sum_{d|n} E(d)U(n/d) = (E*U)(n)$. Hence $\sigma = E*U$ is multiplicative.

Since $\mu = U^{-1}$, $E * \mu = \phi$, which says $n = \sum_{d|n} \phi(d)$ inverts to $\phi(n) = \sum_{d|n}\mu(d)(n/d)$. And
$$\zeta(s)\cdot\zeta(s-1) = \left( \sum \frac{1}{n^s} \right)\cdot\left( \sum \frac{n}{n^s} \right) = \sum_n \left( \sum_{k|n} \frac{n}{k} \right)\frac{1}{n^s} = \sum \frac{\sigma(n)}{n^s}.$$
Similarly,
$$\zeta(s)\cdot\zeta(s-q) = \sum \frac{1}{n^s}\cdot\sum \frac{n^q}{n^s} = \sum_n \left( \sum_{d|n} 1\cdot\left(\frac{n}{d}\right)^q \right)\frac{1}{n^s} = \sum_n \frac{\sum_{d|n} d^q}{n^s}.$$
We give some more examples.
Example 4.14.3
$$\zeta'(s) = \sum_{n=1}^\infty \left( \frac{1}{n^s} \right)' = \sum \frac{-n^s\log(n)}{n^{2s}} = -\sum \frac{\log(n)}{n^s} \Longrightarrow -\zeta'(s) \stackrel{Dsgf}{\longleftrightarrow} \log(n).$$
Example 4.14.4 The familiar identity
$$\sum_{d|n} \phi(d) = n$$
says that $\phi * U = E$, from which we see $\phi = E * \mu$, i.e.,
$$\phi(n) = \sum_{d|n} \mu(d)\frac{n}{d} = \sum_{d|n} d\,\mu\left(\frac{n}{d}\right),$$
which is the same thing as:
$$\frac{\zeta(s-1)}{\zeta(s)} = L(\phi, s).$$
Example 4.14.5 Put $f(n) = |\mu(n)|$ for all $n \in \mathcal{N}$. Clearly $f$ is multiplicative. So
$$\sum \frac{f(n)}{n^s} = \prod_p \left( 1 + \frac{f(p)}{p^s} + \frac{f(p^2)}{p^{2s}} + \cdots \right) = \prod_p \left( 1 + \frac{1}{p^s} \right) = \prod_p \frac{1 - \frac{1}{p^{2s}}}{1 - \frac{1}{p^s}} = \prod_p \left( 1 - \frac{1}{p^{2s}} \right)\cdot\prod_p \frac{1}{1 - \frac{1}{p^s}}.$$
Also,
$$(\zeta(2s))^{-1} = \prod_p \left( 1 - \frac{1}{p^{2s}} \right).$$
Hence,
$$\sum \frac{|\mu(n)|}{n^s} = \frac{\zeta(s)}{\zeta(2s)}.$$
Example 4.14.6
$$1 = \zeta(s)\cdot\frac{1}{\zeta(s)} = \left( \sum \frac{1}{n^s} \right)\left( \sum \frac{\mu(n)}{n^s} \right) = \sum_n \left( \sum_{d|n} 1\cdot\mu\left(\frac{n}{d}\right) \right)\frac{1}{n^s} \Longrightarrow \sum_{d|n} \mu(d) = \begin{cases} 1, & n = 1;\\ 0, & n > 1. \end{cases}$$
Example 4.14.7
$$\zeta(s) = \frac{1}{\zeta(s)}\cdot\zeta^2(s) \Longrightarrow 1 = \sum_{d|n} \mu(d)\,\tau\left(\frac{n}{d}\right).$$
This also follows from doing Möbius inversion on $\tau(n) = \sum_{d|n} 1$.
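The identities in Examples 4.14.4, 4.14.6 and 4.14.7 are all finite statements about divisor sums, so they can be verified mechanically (our own sketch; the helper names are illustrative):

```python
from math import gcd

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def mu(n):
    # Mobius function by trial division: 0 on non-squarefree n,
    # otherwise (-1)^(number of prime factors)
    sign, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            sign = -sign
        p += 1
    return -sign if n > 1 else sign

phi = lambda n: sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)
tau = lambda n: len(divisors(n))

for n in range(1, 200):
    assert sum(mu(d) for d in divisors(n)) == (1 if n == 1 else 0)  # Ex. 4.14.6
    assert sum(mu(d) * tau(n // d) for d in divisors(n)) == 1       # Ex. 4.14.7
    assert sum(mu(d) * (n // d) for d in divisors(n)) == phi(n)     # Ex. 4.14.4
print("Mobius identities verified for n < 200")
```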
4.15 Rational Generating Functions

In this section we consider the simplest general class of generating functions, namely, the rational generating functions in one variable, and their connection with homogeneous linear recursions. These are generating functions of the form
$$U(x) = \sum_{n\geq 0} u_n x^n$$
for which there are $p(x), q(x) \in \mathcal{C}[x]$ with
$$U(x) = \frac{p(x)}{q(x)}.$$
Here we assume $q(0) \neq 0$, so $q(x)^{-1}$ exists in $\mathcal{C}[[x]]$. Before considering the connection between rational generating functions and homogeneous linear recursions, we recall the notion of reverse of a polynomial.

Let $f(x) = a_n + a_{n-1}x + a_{n-2}x^2 + \cdots + a_0 x^n \in \mathcal{C}[x]$. The reverse $\tilde{f}(x)$ of $f(x)$ is defined by
$$\tilde{f}(x) = x^n f\left(\frac{1}{x}\right) = a_0 + a_1 x + \cdots + a_n x^n.$$
If $n_0$ is the multiplicity of 0 as a zero of $f(x)$, i.e., $a_n = a_{n-1} = \cdots = a_{n-n_0+1} = 0$, but $a_{n-n_0} \neq 0$, and if $w_1, \ldots, w_q$ are the nonzero roots of $f(x) = 0$, then $\frac{1}{w_1}, \ldots, \frac{1}{w_q}$ are the roots of $\tilde{f}(x) = 0$, and $\tilde{f}(x) = a_0(1 - w_1 x)\cdots(1 - w_q x)$. So $\deg(\tilde{f}(x)) = n - n_0$.

Alternatively, if $f(x) = (x - \alpha_1)^{m_1}\cdots(x - \alpha_s)^{m_s}$, where $m_1 + \cdots + m_s = n$ and $\alpha_1, \ldots, \alpha_s$ are distinct, then
$$\tilde{f}(x) = x^n f\left(\frac{1}{x}\right) = (1 - \alpha_1 x)^{m_1}\cdots(1 - \alpha_s x)^{m_s}.$$
If $a_0 a_n \neq 0$, so neither $f(x)$ nor $\tilde{f}(x)$ has $x = 0$ as a zero, then $\tilde{\tilde{f}} = f$, and $f(\alpha) = 0$ if and only if $\tilde{f}\left(\frac{1}{\alpha}\right) = 0$.
Suppose that $U(x) = \sum_{n\geq 0} u_n x^n = \frac{p(x)}{q(x)}$, where $\deg(p(x)) < \deg(q(x))$, is a rational generating function. We assume $q(0) \neq 0$ in order that $q(x)^{-1}$ exist in $\mathcal{C}[[x]]$, so we may assume without loss of generality that $q(0) = 1$. Hence $q(x) = 1 + a_1x + a_2x^2 + \cdots + a_kx^k$, $p(x) = p_0 + p_1x + \cdots + p_dx^d$, $d < k$. From this it follows that
$$p_0 + \cdots + p_dx^d = (1 + a_1x + \cdots + a_kx^k)(u_0 + u_1x + \cdots + u_nx^n + \cdots).$$
The right hand side of this equality expands to
$$u_0 + (u_1 + a_1u_0)x + (u_2 + a_1u_1 + a_2u_0)x^2 + \cdots + (u_{k-1} + a_1u_{k-2} + \cdots + a_{k-1}u_0)x^{k-1} + \cdots.$$
And for $n \geq k$,
$$u_n + a_1u_{n-1} + \cdots + a_ku_{n-k} = 0, \qquad (4.47)$$
which is the coefficient on $x^n$. If $u_0, \ldots, u_{k-1}$ are given, then $u_n$ is determined recursively for $n \geq k$.

Put $f(x) = \tilde{q}(x)$. Then for the complex number $\alpha$, it is easily checked that $f(\alpha) = 0$ if and only if $u_n = \alpha^n$ is a solution of the recurrence of Eq. 4.47. The polynomial $f(x)$ is the auxiliary polynomial of the recurrence of Eq. 4.47.

Theorem 4.15.1 If $U(x) = \frac{p(x)}{q(x)}$, where $\deg(p(x)) < \deg(q(x))$, is a rational generating function, then the sequence $(u_n)_{n=0}^\infty$, where $U(x) = \sum_{n\geq 0} u_nx^n$, satisfies a homogeneous linear recurrence, and the denominator $q(x)$ is the reverse of the auxiliary polynomial of the corresponding recurrence.
Now take the converse point of view. Let $c_0, c_1, \ldots, c_{k-1}$ be given complex constants, and let $a_1, \ldots, a_k$ also be given. Let $U = (u_n)$, $n \geq 0$, be the unique sequence determined by the following initial conditions and homogeneous linear recursion:
$$[HLR] \qquad u_0 = c_0,\ u_1 = c_1,\ \ldots,\ u_{k-1} = c_{k-1};$$
$$u_{n+k} + a_1u_{n+k-1} + a_2u_{n+k-2} + \cdots + a_ku_n = 0,\ n \geq 0.$$

Theorem 4.15.2 The ordinary generating function for the sequence $(u_n)$ defined by [HLR] is
$$U(x) = \sum_{n=0}^\infty u_nx^n = R(x)/(1 + a_1x + \cdots + a_kx^k),$$
where $R(x)$ is a polynomial with degree less than $k$.

Proof: Consider the product:
$$(1 + a_1x + \cdots + a_kx^k)(u_0 + u_1x + \cdots).$$
The coefficient on $x^{n+k}$ is
$$u_{n+k} + a_1u_{n+k-1} + a_2u_{n+k-2} + \cdots + a_ku_n.$$
And this equals 0 for $n \geq 0$ by [HLR], so the only coefficients that are possibly nonzero in the product are those on $1, x, \ldots, x^{k-1}$.

Note that the coefficients of $R(x)$ may be obtained from multiplying out the two factors (just as we did above):
$$R(x) = u_0 + (u_1 + a_1u_0)x + (u_2 + a_1u_1 + a_2u_0)x^2 + \cdots + (u_{k-1} + a_1u_{k-2} + \cdots + a_{k-1}u_0)x^{k-1}.$$
As $u_0, \ldots, u_{k-1}$ are given by the initial conditions, $R(x)$ is determined.
Theorem 4.15.3 Suppose $(u_n)$ is given by [HLR] and the auxiliary polynomial has the form
$$f(t) = (t - \alpha_1)^{m_1}\cdots(t - \alpha_s)^{m_s}.$$
Then
$$u_n = \sum_{i=1}^s P_i(n)\,\alpha_i^n,$$
where $P_i$ is a polynomial with degree at most $m_i - 1$, $1 \leq i \leq s$.

Proof: By the theory of partial fractions, $U(x)$ can be written as the sum of $s$ expressions of the form:
$$(*) \qquad \gamma_1/(1 - \alpha x) + \gamma_2/(1 - \alpha x)^2 + \cdots + \gamma_m/(1 - \alpha x)^m,$$
where in each such expression $\alpha = \alpha_i$, $m = m_i$, for some $i$ in the range $1 \leq i \leq s$. Recall:
$$(1 - \alpha x)^{-n} = \sum_{k=0}^\infty \binom{n+k-1}{k}\alpha^kx^k.$$
So the coefficient of $x^k$ in $(*)$ is
$$\gamma_1\binom{1+k-1}{k}\alpha^k + \gamma_2\binom{2+k-1}{k}\alpha^k + \cdots + \gamma_m\binom{m+k-1}{k}\alpha^k$$
$$= \left[ \gamma_1\binom{k}{0} + \gamma_2\binom{k+1}{1} + \cdots + \gamma_m\binom{m+k-1}{m-1} \right]\alpha^k = P(k)\,\alpha^k.$$
The formula
$$\binom{k+l}{l} = (k+l)(k+l-1)\cdots(k+1)/l(l-1)\cdots 1$$
shows that $\binom{k+l}{l}$ is a polynomial in $k$ with degree $l$. Hence $P(k)$ is a polynomial in $k$ with degree at most $m-1$. The theorem follows.
In practice we assume the form of the result for $u_n$ and obtain the coefficients of the polynomials $P_i(n)$ by substituting in the initial values of $u_0, u_1, \ldots, u_{k-1}$ and solving $k$ equations in $k$ unknowns.
Example 4.15.4 The Fibonacci Sequence again. Put $F_0 = F_1 = 1$ and $F_{n+2} - F_{n+1} - F_n = 0$ for $n \geq 0$. So the auxiliary equation is $0 = f(t) = t^2 - t - 1 = (t - \alpha_1)(t - \alpha_2)$, where $\alpha_1 = \frac{1+\sqrt{5}}{2}$, $\alpha_2 = \frac{1-\sqrt{5}}{2}$. Put $F(x) = \sum_{n\geq 0} F_nx^n$, and compute $F(x)(1 - x - x^2) = 1$, so $F(x) = \frac{1}{1-x-x^2} \stackrel{ops}{\longleftrightarrow} (F_n)_{n=0}^\infty$. Then
$$F(x) = \frac{A}{1 - \alpha_1x} + \frac{B}{1 - \alpha_2x} = \frac{1}{1-x-x^2}$$
leads to
$$F(x) = \frac{2}{5 - \sqrt{5}}\sum_i (\alpha_1)^ix^i + \frac{2}{5 + \sqrt{5}}\sum_i (\alpha_2)^ix^i.$$
Hence
$$F_n = [x^n]F(x) = \frac{2(1+\sqrt{5})^n}{(5-\sqrt{5})2^n} + \frac{2(1-\sqrt{5})^n}{(5+\sqrt{5})2^n}.$$
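The closed form is easy to check against the recursion itself (a small sketch of ours, evaluated in floating point and rounded):

```python
def fib(N):
    # book's indexing: F_0 = F_1 = 1
    F = [1, 1]
    while len(F) < N:
        F.append(F[-1] + F[-2])
    return F[:N]

def fib_closed(n):
    # the closed form derived in Example 4.15.4
    r5 = 5 ** 0.5
    return round(2 * (1 + r5) ** n / ((5 - r5) * 2 ** n)
                 + 2 * (1 - r5) ** n / ((5 + r5) * 2 ** n))

print([fib_closed(n) for n in range(10)])   # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```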
Exercise: 4.15.5 Let $(u_n)_{n=0}^\infty$ be the sequence satisfying the recurrence
$$u_{n+4} = 2u_{n+3} - 2u_{n+1} + u_n,\ n \geq 0,$$
and satisfying the initial conditions
$$u_0 = -1,\ u_1 = +1,\ u_2 = 0,\ u_3 = 1.$$
Find a formula for $u_n$. Also find the generating function for the sequence $(u_n)_{n=0}^\infty$.
4.16 More Practice with Generating Functions

Theorem 4.16.1 $[y^j]\left\{ \frac{1}{1-x-xy} \right\} = \sum_k \binom{k}{j}x^k = \frac{x^j}{(1-x)^{j+1}}$.

Proof: For $j \geq 0$, put $g_j(x) = \sum_k \binom{k}{j}x^k$. Note that $g_0(x) = \frac{1}{1-x}$. We claim $g_{j+1}(x) = \frac{x}{1-x}g_j(x)$, for $j \geq 0$. For $j \geq 1$,
$$xg_{j-1}(x) + xg_j(x) = \sum_{k\geq j-1}\binom{k}{j-1}x^{k+1} + \sum_{k\geq j}\binom{k}{j}x^{k+1} = \binom{j-1}{j-1}x^j + \sum_{k\geq j}\binom{k+1}{j}x^{k+1}$$
$$= x^j + \sum_{k\geq j+1}\binom{k}{j}x^k = \sum_{k\geq j}\binom{k}{j}x^k = g_j(x).$$
Hence for $j \geq 1$, $g_j(x) = \frac{x}{1-x}g_{j-1}(x)$. Now put $H(x,y) = \sum_{j=0}^\infty g_j(x)y^j$. Then $\sum_{j\geq 1} g_j(x)y^j = \frac{x}{1-x}\sum_{j\geq 1} g_{j-1}(x)y^j$, implying $H(x,y) - g_0(x) = \frac{xy}{1-x}\sum_{j\geq 0} g_j(x)y^j = \frac{xy}{1-x}H(x,y)$. Hence $H(x,y)\left( 1 - \frac{xy}{1-x} \right) = g_0(x) = \frac{1}{1-x}$, and thus
$$H(x,y) = \frac{1}{1 - x - xy}.$$
This forces
$$g_j(x) = \sum_k \binom{k}{j}x^k = [y^j]\,H(x,y) = [y^j]\left\{ \frac{1}{1-x-xy} \right\} = [y^j]\left\{ \frac{1}{1-x}\cdot\frac{1}{1 - \left(\frac{x}{1-x}\right)y} \right\} = [y^j]\left\{ \frac{1}{1-x}\sum_i \left(\frac{x}{1-x}\right)^iy^i \right\} = \frac{x^j}{(1-x)^{j+1}}.$$
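A brute-force check of Theorem 4.16.1 (our own sketch): expand $H = 1/(1-x-xy)$ as a truncated bivariate series via the fixed-point relation $H = 1 + (x+xy)H$, and compare the coefficients with $\binom{k}{j}$.

```python
from math import comb

def expand_H(N):
    # truncated bivariate power series of H = 1/(1 - x - xy),
    # via H = 1 + (x + xy) * H, iterated until degrees < N in x stabilize
    H = [[0] * N for _ in range(N)]     # H[k][j] = [x^k y^j]
    H[0][0] = 1
    for _ in range(N):
        new = [[0] * N for _ in range(N)]
        new[0][0] = 1
        for k in range(N - 1):
            for j in range(N):
                if H[k][j]:
                    new[k + 1][j] += H[k][j]          # times x
                    if j + 1 < N:
                        new[k + 1][j + 1] += H[k][j]  # times xy
        H = new
    return H

H = expand_H(10)
assert all(H[k][j] == comb(k, j) for k in range(10) for j in range(10))
print("[x^k y^j] 1/(1-x-xy) = C(k,j), as in Theorem 4.16.1")
```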
Theorem 4.16.2
$$\sum_k \binom{k}{n-k}x^{n-k} = \sum_k \binom{n-k}{k}x^k = [y^n]\left\{ \frac{1}{1-y-xy^2} \right\}.$$
Proof: For $n \geq 0$, put $f_n(x) = \sum_k \binom{n-k}{k}x^k$ $(0 \leq k \leq \frac{n}{2})$. We claim that $f_{n+2}(x) = xf_n(x) + f_{n+1}(x)$. For,
$$x\sum_{0\leq k\leq\frac{n}{2}}\binom{n-k}{k}x^k + \sum_{0\leq k\leq\frac{n+1}{2}}\binom{n+1-k}{k}x^k$$
$$= \sum_{0\leq k\leq\frac{n}{2}}\binom{n-k}{k}x^{k+1} + \binom{n+1}{0} + \sum_{1\leq k\leq\frac{n+1}{2}}\binom{n+1-k}{k}x^k$$
$$= \sum_{1\leq t\leq\frac{n+2}{2}}\binom{n+1-t}{t-1}x^t + \binom{n+2}{0} + \sum_{1\leq t\leq\frac{n+1}{2}}\binom{n+1-t}{t}x^t$$
$$= \binom{n+2}{0} + \sum_{1\leq t\leq\frac{n+2}{2}}\binom{n+2-t}{t}x^t = f_{n+2}(x),$$
where the last step uses the Pascal identity $\binom{n+1-t}{t-1} + \binom{n+1-t}{t} = \binom{n+2-t}{t}$.

Note that $f_0(x) = 1$; $f_1(x) = 1$; $f_2(x) = 1 + x$.

Put $G(x,y) = \sum_{n=0}^\infty f_n(x)y^n$. Multiply the recursion just established for $f_n(x)$ by $y^{n+2}$, $n \geq 0$, and sum over $n$.
$$\sum_{n=0}^\infty xf_n(x)y^{n+2} + \sum_{n=0}^\infty f_{n+1}(x)y^{n+2} = \sum_{n=0}^\infty f_{n+2}(x)y^{n+2}$$
$$xy^2G(x,y) + y\left( G(x,y) - f_0(x) \right) = G(x,y) - f_0(x) - f_1(x)y$$
$$G(x,y)\cdot[xy^2 + y - 1] = y(1-1) - 1 = -1.$$
$$G(x,y) = \frac{1}{1 - y - xy^2}.$$
Note that $f_n(1) = \sum_k \binom{n-k}{k} = [y^n]\left\{ \frac{1}{1-y-y^2} \right\} = F_n$, the $n^{th}$ Fibonacci number.
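Both the recursion $f_{n+2} = xf_n + f_{n+1}$ and the Fibonacci specialization $f_n(1) = F_n$ can be checked directly (our own sketch):

```python
from math import comb

def f_n(n, x):
    # f_n(x) = sum_k C(n-k, k) x^k, 0 <= k <= n/2
    return sum(comb(n - k, k) * x ** k for k in range(n // 2 + 1))

# the recursion f_{n+2} = x f_n + f_{n+1}, checked at several integer points x
for x in range(-3, 4):
    for n in range(20):
        assert f_n(n + 2, x) == x * f_n(n, x) + f_n(n + 1, x)

# f_n(1) gives the Fibonacci numbers (book's indexing: F_0 = F_1 = 1)
print([f_n(n, 1) for n in range(10)])   # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```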
Exercise: 4.16.3 (Ex. 10F, p. 77 of van Lint & Wilson) Show that
$$\sum_{k=0}^n (-1)^k\binom{2n-k}{k}2^{2n-2k} = 2n+1.$$

Exercise: 4.16.4 Evaluate the sum $\sum_k \binom{n-k}{k}(-1)^k$.

Exercise: 4.16.5 Evaluate $\sum_k \binom{n-k}{k}$.
Theorem 4.16.6 Sylvia's Problem. Establish the identity
$$\sum_{j=2n}^k (-2)^j\binom{k}{j}\binom{j-n-1}{n-1} = \begin{cases} 4^n\binom{\lfloor\frac{k}{2}\rfloor}{n}, & n \geq 1;\\ 0, & n = 0. \end{cases} \qquad (4.48)$$
Proof: It is clear that the L.H.S. in the desired equality equals 0 when $n = 0$. So assume $n \geq 1$ and note that
$$\sum_{j=2n}^k (-2)^j\binom{k}{j}\binom{j-n-1}{n-1} = 4^n\sum_{j=2n}^k \binom{k}{j}\binom{j-n-1}{n-1}(-2)^{j-2n}.$$
Since $\binom{j-n-1}{n-1} = \binom{j-n-1}{j-2n}$, we may restate the desired result as:
$$\sum_{j=2n}^k \binom{k}{j}\binom{j-n-1}{j-2n}(-2)^{j-2n} = \begin{cases} \binom{\lfloor\frac{k}{2}\rfloor}{n}, & n \geq 1;\\ 0, & n = 0. \end{cases} \qquad (4.49)$$
Put
$$T^*(x,y) = \sum_{k,n}\binom{\lfloor\frac{k}{2}\rfloor}{n}x^ky^n = \sum_k\left( \sum_n \binom{\lfloor\frac{k}{2}\rfloor}{n}y^n \right)x^k = \sum_k (1+y)^{\lfloor\frac{k}{2}\rfloor}x^k$$
$$= (1+x) + (1+y)(x^2+x^3) + (1+y)^2(x^4+x^5) + \cdots = (1+x)\sum_{i=0}^\infty (1+y)^ix^{2i} = \frac{1+x}{1-(1+y)x^2}.$$
Note that $[y^0]T^*(x,y) = \frac{1}{1-x}$.

Now put
$$T(x,y) = T^*(x,y) - \frac{1}{1-x} = \frac{1+x}{1-(1+y)x^2} - \frac{1}{1-x} = \frac{1-x^2-1+x^2+yx^2}{(1-x)(1-(1+y)x^2)} = \frac{x^2y}{(1-x)(1-(1+y)x^2)}.$$
So
$$[x^ky^n]\left\{ \frac{x^2y}{(1-x)(1-(1+y)x^2)} \right\} = \begin{cases} \binom{\lfloor\frac{k}{2}\rfloor}{n}, & n \geq 1;\\ 0, & n = 0. \end{cases}$$
Hence $T(x,y) = \frac{x^2y}{(1-x)(1-(1+y)x^2)}$ is the generating function for the doubly-infinite sequence of terms on the R.H.S. of Eq. 4.49.
Put
$$S(x,y) = \sum_{k,n,j}\binom{k}{j}\binom{j-n-1}{j-2n}(-2)^{j-2n}x^ky^n.$$
Then $[x^ky^n]\,S(x,y)$ is the desired sum (on the L.H.S. of Eq. 4.49). Hence our task is equivalent to showing that $S(x,y) = T(x,y)$.

Make the invertible substitution (change of variables):
$$\begin{pmatrix} t\\ s \end{pmatrix} = \begin{pmatrix} 1 & -2\\ 1 & -1 \end{pmatrix}\begin{pmatrix} j\\ n \end{pmatrix},\ \text{i.e., } t = j - 2n,\ s = j - n,\ \text{with inverse } j = 2s - t,\ n = s - t.$$
Hence we have
$$S(x,y) = \sum_{k,s,t}\binom{k}{2s-t}\binom{s-1}{t}(-2)^tx^ky^{s-t}$$
$$= \sum_s\left[ \sum_t\left( \sum_k\binom{k}{2s-t}x^k \right)y^{s-t}(-2)^t\binom{s-1}{t} \right] \qquad \text{(now use Theorem 4.16.1)}$$
$$= \sum_s\left[ \sum_t\left( \frac{x^{2s-t}}{(1-x)^{2s-t+1}} \right)y^{s-t}(-2)^t\binom{s-1}{t} \right]$$
$$= \sum_s\left[ \sum_t\binom{s-1}{t}\left( \frac{(1-x)(-2)}{xy} \right)^t \right]\frac{x^{2s}y^s}{(1-x)^{2s+1}}$$
$$= \sum_{s\geq 1}\left( \frac{xy - 2(1-x)}{xy} \right)^{s-1}\left( \frac{x^2y}{(1-x)^2} \right)^s\frac{1}{1-x}$$
$$= \frac{x^2y}{(1-x)^3}\sum_{j\geq 0}\left( \frac{(xy - 2(1-x))x}{(1-x)^2} \right)^j = \frac{x^2y}{(1-x)^3}\cdot\frac{1}{1 - \frac{(xy-2(1-x))x}{(1-x)^2}}$$
$$= \frac{x^2y}{(1-x)(1 - 2x + x^2 - x^2y + 2x - 2x^2)} = \frac{x^2y}{(1-x)(1 - x^2 - x^2y)} = T(x,y).$$
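Identity (4.48) is finite for each pair $(k,n)$, so it can also be verified numerically (our own sketch; the binomial is extended by zero outside its natural range):

```python
from math import comb

def C(a, b):
    # binomial that vanishes for negative arguments (math.comb already gives 0 for b > a)
    return comb(a, b) if a >= 0 and b >= 0 else 0

def lhs(k, n):
    return sum((-2) ** j * C(k, j) * C(j - n - 1, n - 1)
               for j in range(2 * n, k + 1))

def rhs(k, n):
    return 4 ** n * comb(k // 2, n) if n >= 1 else 0

assert all(lhs(k, n) == rhs(k, n) for k in range(18) for n in range(9))
print("identity (4.48) verified for k < 18, n < 9")
```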
4.17 The Transfer Matrix Method

The Transfer Matrix Method, when applicable, is often used to show that a given sequence has a rational generating function. Sometimes that knowledge helps one to compute the generating function using other information.

Let $A$ be a $p\times p$ matrix over the complex numbers $\mathcal{C}$. Let
$$f(\lambda) = \det(\lambda I - A) = a_{p-n_0}\lambda^{n_0} + \cdots + a_1\lambda^{p-1} + \lambda^p$$
be the characteristic polynomial of $A$ with $a_{p-n_0} \neq 0$. So the reverse polynomial $\tilde{f}$ (cf. Section 4.15) is given by
$$\tilde{f}(\lambda) = 1 + a_1\lambda + \cdots + a_{p-n_0}\lambda^{p-n_0}.$$
Hence $\det(I - \lambda A) = \lambda^p\det\left( \frac{1}{\lambda}I - A \right) = \tilde{f}(\lambda)$.
We have essentially proved the following:
Lemma 4.17.1 If $f(\lambda) = \det(\lambda I - A)$, then $\tilde{f}(\lambda) = \det(I - \lambda A)$. Moreover, if $A$ is invertible, so $n_0 = 0$, then $\tilde{\tilde{f}} = f$, and $f(\lambda) = \det(\lambda I - A)$ iff $\tilde{f}(\lambda) = \det(I - \lambda A)$.

For $1 \leq i, j \leq p$, define the generating function
$$F_{ij}(A, \lambda) = \sum_{n\geq 0}(A^n)_{ij}\lambda^n. \qquad (4.50)$$
Here $A^0 = I$ even if $A$ is not invertible.
Theorem 4.17.2 $F_{ij}(A,\lambda) = \frac{(-1)^{i+j}\det[(I-\lambda A);j,i]}{\det(I-\lambda A)}$.

Proof: Here $(B; i, j)$ denotes the matrix obtained from $B$ by deleting the $i^{th}$ row and the $j^{th}$ column. Recall that $(B^{-1})_{ij} = \frac{(-1)^{i+j}\det(B;j,i)}{\det(B)}$. Suppose that $B = I - \lambda A$, so $B^{-1} = (I - \lambda A)^{-1} = \sum_{n=0}^\infty A^n\lambda^n$, and
$$\frac{(-1)^{i+j}\det(B;j,i)}{\det(B)} = (B^{-1})_{ij} = \sum_{n=0}^\infty (A^n)_{ij}\lambda^n = F_{ij}(A,\lambda),$$
proving the theorem.
Corollary 4.17.3 $F_{ij}$ is a rational function of $\lambda$ whose degree is strictly less than the multiplicity $n_0$ of 0 as an eigenvalue of $A$.

Proof: Let $f(\lambda) = \det(\lambda I - A)$ as in the paragraph preceding the statement of Lemma 4.17.1, so $\tilde{f}(\lambda) = \det(I - \lambda A)$ has degree $p - n_0$, and $\deg(\det((I - \lambda A);j,i)) \leq p - 1$. Hence $\deg(F_{ij}(A,\lambda)) \leq (p-1) - (p-n_0) = n_0 - 1 < n_0$.
Now write $q(\lambda) = \det(I - \lambda A) = \tilde{f}(\lambda)$. If $w_1, \ldots, w_q$ are the nonzero eigenvalues of $A$, then $\frac{1}{w_1}, \ldots, \frac{1}{w_q}$ are the zeros of $q(\lambda)$, so $q(\lambda) = a\left( \lambda - \frac{1}{w_1} \right)\cdots\left( \lambda - \frac{1}{w_q} \right)$ for some nonzero $a$. From the definition of $q(\lambda)$ we see that $q(0) = \det(I) = 1$, so
$$q(\lambda) = (-1)^qw_1\cdots w_q\left( \lambda - \frac{1}{w_1} \right)\cdots\left( \lambda - \frac{1}{w_q} \right). \qquad (4.51)$$
Then after computing the derivative $q'(\lambda)$ we see easily that
$$\frac{q'(\lambda)}{q(\lambda)} = \frac{1}{\lambda - \frac{1}{w_1}} + \cdots + \frac{1}{\lambda - \frac{1}{w_q}} \qquad (4.52)$$
$$= -\left( \frac{w_1}{1 - w_1\lambda} + \frac{w_2}{1 - w_2\lambda} + \cdots + \frac{w_q}{1 - w_q\lambda} \right).$$
Hence
$$\frac{-\lambda q'(\lambda)}{q(\lambda)} = \sum_{i=1}^q\frac{w_i\lambda}{1 - w_i\lambda} = \sum_{i=1}^q\sum_{n=1}^\infty w_i^n\lambda^n = \sum_{n=1}^\infty\left( \sum_{i=1}^q w_i^n \right)\lambda^n = \sum_{n=1}^\infty \mathrm{tr}(A^n)\lambda^n.$$
We have proved the following corollary:

Corollary 4.17.4 If $q(\lambda) = \det(I - \lambda A)$, then
$$\sum_{n=1}^\infty \mathrm{tr}(A^n)\lambda^n = \frac{-\lambda q'(\lambda)}{q(\lambda)}.$$
Let $D = (V, E, \phi)$ be a finite digraph, where $V = \{v_1, \ldots, v_p\}$ is the set of vertices, $E$ is a set of (directed) edges or arcs, and $\phi: E \to V\times V$ determines the edges. If $\phi(e) = (u, v)$, then $e$ is an edge from $u$ to $v$, with initial vertex $\mathrm{int}(e) = u$ and final vertex $\mathrm{fin}(e) = v$. If $u = v$, then $e$ is a loop. A walk $\Gamma$ in $D$ of length $n$ from $u$ to $v$ is a sequence $e_1e_2\cdots e_n$ of $n$ edges such that $\mathrm{int}(e_1) = u$, $\mathrm{fin}(e_n) = v$, and $\mathrm{fin}(e_i) = \mathrm{int}(e_{i+1})$ for $1 \leq i < n$. If also $u = v$, then $\Gamma$ is called a closed walk based at $u$. (Note: If $\Gamma$ is a closed walk, then $e_ie_{i+1}\cdots e_ne_1\cdots e_{i-1}$ is in general a different closed walk.)

Now let $w: E \to R$ be a weight function on $E$ ($R$ is some commutative ring; usually $R = \mathcal{C}$ or $R = \mathcal{C}[x]$). If $\Gamma = e_1e_2\cdots e_n$ is a walk, then the weight of $\Gamma$ is defined by $w(\Gamma) = w(e_1)w(e_2)\cdots w(e_n)$. Fix $i$ and $j$, $1 \leq i, j \leq p$. Put $A_{ij}(n) = \sum_\Gamma w(\Gamma)$, where the sum is over all walks $\Gamma$ in $D$ of length $n$ from $v_i$ to $v_j$. In particular, $A_{ij}(0) = \delta_{ij}$. The fundamental problem treated by the transfer matrix method (TMM) is the evaluation of $A_{ij}(n)$, or at least the determination of some generating function for the $A_{ij}(n)$.
Define a $p\times p$ matrix $A = (A_{ij})$ by
$$A_{ij} = \sum_e w(e),$$
where the sum is over all edges $e$ with $\mathrm{int}(e) = v_i$ and $\mathrm{fin}(e) = v_j$. So $A_{ij} = A_{ij}(1)$. $A$ is the adjacency matrix of $D$ with respect to the weight function $w$.

Theorem 4.17.5 Let $n \in \mathcal{N}$. Then the $(i,j)$-entry of $A^n$ is equal to $A_{ij}(n)$. (By convention, $A^0 = I$ even if $A$ is not invertible.)

Proof: $(A^n)_{ij} = \sum A_{ii_1}A_{i_1i_2}\cdots A_{i_{n-1}j}$, where the sum is over all sequences $(i_1, \ldots, i_{n-1}) \in [p]^{n-1}$. (Here $i = i_0$ and $j = i_n$.) The summand is zero unless there is a walk $e_1\cdots e_n$ from $v_i$ to $v_j$ with $\mathrm{int}(e_k) = v_{i_{k-1}}$ $(1 \leq k \leq n)$, and $\mathrm{fin}(e_k) = v_{i_k}$ $(1 \leq k \leq n)$. If such a walk exists, then the summand is equal to the sum of the weights of all such walks.

We give a special case that occasionally works out in a very satisfying way. Let $C_D(n) = \sum_\Gamma w(\Gamma)$, where the sum is over all closed walks $\Gamma$ in $D$ of length $n$. In this case we have the following.
Corollary 4.17.6 $\sum_{n\geq 1} C_D(n)\lambda^n = \frac{-\lambda q'(\lambda)}{q(\lambda)}$, where $q(\lambda) = \det(I - \lambda A)$.

Proof: Clearly $C_D(1) = \mathrm{tr}(A)$, and by Theorem 4.17.5 we have $C_D(n) = \mathrm{tr}(A^n)$. Hence by Cor. 4.17.4 we have $\sum_{n\geq 1} C_D(n)\lambda^n = \frac{-\lambda q'(\lambda)}{q(\lambda)}$.
Often an enumeration problem can be represented as counting the number of sequences $a_1a_2\cdots a_n \in [p]^n$ of integers $1, \ldots, p$ subject to certain restrictions on the subsequences $a_ia_{i+1}$ that may appear. In this case we form a digraph $D$ with vertices $v_i = i$, $1 \leq i \leq p$, and put an arc $e = (i,j)$ from $i$ to $j$ provided the subsequence $ij$ is permitted. So a permitted sequence $a_{i_1}a_{i_2}\cdots a_{i_n}$ corresponds to a walk $\Gamma = (i_1,i_2)(i_2,i_3)\cdots(i_{n-1},i_n)$ in $D$ of length $n-1$ from $i_1$ to $i_n$. If $w(e) = 1$ for all edges in $D$ and if $A$ is the adjacency matrix of $D$ with respect to this particular weight function, then clearly $f(n) := \sum_{i,j=1}^p A_{ij}(n-1)$ is the number of sequences $a_1a_2\cdots a_n \in [p]^n$ subject to the restrictions used in defining $D$. Put $q(\lambda) = \det(I - \lambda A)$ and $q_{ij}(\lambda) = \det((I - \lambda A);j,i)$. Then by Theorem 4.17.2
$$F(\lambda) := \sum_{n\geq 0} f(n+1)\lambda^n = \sum_{n\geq 0}\left( \sum_{i,j=1}^p A_{ij}(n) \right)\lambda^n \qquad (4.53)$$
$$= \sum_{i,j=1}^p\sum_{n\geq 0} A_{ij}(n)\lambda^n = \sum_{i,j=1}^p F_{ij}(A,\lambda) = \sum_{i,j=1}^p\frac{(-1)^{i+j}q_{ij}(\lambda)}{q(\lambda)}.$$
We state this as a corollary.

Corollary 4.17.7 If $w(e) = 1$ for all edges in $D$ and $f(n)$ is the number of sequences $a_1a_2\cdots a_n \in [p]^n$ subject to the restrictions used in defining $D$, then
$$\sum_{n\geq 0} f(n+1)\lambda^n = \sum_{i,j=1}^p\frac{(-1)^{i+j}q_{ij}(\lambda)}{q(\lambda)}. \qquad (4.54)$$
We give an easy example that can be checked by other more elementary means.

Example 1. Let $f(n)$ be the number of sequences $a_1a_2\cdots a_n \in [3]^n$ with the property that $a_1 = a_n$ and $a_i \neq a_{i+1}$ for $1 \leq i \leq n-1$. Then the adjacency matrix $A$ for this example is
$$A = \begin{pmatrix} 0 & 1 & 1\\ 1 & 0 & 1\\ 1 & 1 & 0 \end{pmatrix}.$$
We apply Cor. 4.17.6. It is easy to check that $q(\lambda) = \det(I - \lambda A) = (1+\lambda)^2(1-2\lambda)$, and $q'(\lambda) = -6\lambda(1+\lambda)$. Using partial fractions, etc., we find that
$$\frac{-\lambda q'(\lambda)}{q(\lambda)} = -3 + \frac{2}{1+\lambda} + \frac{1}{1-2\lambda} = -3 + \sum_{n=0}^\infty 2(-\lambda)^n + \sum_{n=0}^\infty 2^n\lambda^n = -3 + \sum_{n=0}^\infty\left( 2^n + (-1)^n\cdot 2 \right)\lambda^n.$$
Here $n = 3$ gives $8 - 2 = 6$. The six sequences are of the form $aba$ with $a$ and $b$ arbitrary but distinct elements of $\{1, 2, 3\}$.
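Example 1 can be confirmed by brute force (our own sketch): a sequence of length $n$ with $a_1 = a_n$ corresponds to a closed walk of length $n-1$, whose count the eigenvalue computation gives as $2^{n-1} + 2(-1)^{n-1}$.

```python
from itertools import product

def f_brute(n):
    # sequences a_1...a_n over {1,2,3} with a_1 = a_n and a_i != a_{i+1}
    return sum(1 for a in product(range(3), repeat=n)
               if a[0] == a[-1] and all(a[i] != a[i + 1] for i in range(n - 1)))

def closed_walks(n):
    # C_D(n) = tr(A^n) from the eigenvalues 2, -1, -1 of A
    return 2 ** n + 2 * (-1) ** n

print([f_brute(n) for n in range(2, 8)])   # [0, 6, 6, 18, 30, 66]
```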
Example 2. Let $D$ be the complete (weighted) digraph on two vertices, i.e., $p = 2$, $V = \{v_0, v_1\}$, and the weight $w(e_{ij})$ of the edge $e_{ij}$ from $v_i$ to $v_j$ is the indeterminate $x_{ij}$, $0 \leq i, j \leq 1$. A sequence $a_0a_1a_2\cdots a_n$ of $n+1$ 0's and 1's corresponds to a walk of length $n$ along edges $a_0a_1, a_1a_2, \ldots, a_{n-1}a_n$, and has weight $x_{a_0a_1}x_{a_1a_2}\cdots x_{a_{n-1}a_n}$. The adjacency matrix is
$$A = \begin{pmatrix} x_{00} & x_{01}\\ x_{10} & x_{11} \end{pmatrix}.$$
Then $(A^n)_{ij} = \sum w(\Gamma)$, where the summation is over all walks $\Gamma$ of length $n$ from $i$ to $j$, $0 \leq i, j \leq 1$. At this level of generality we are in a position to consider several different problems.

Problem 2.1 Let $f(n)$ be the number of sequences of $n$ 0's and 1's with 11 never appearing as a subsequence $a_ia_{i+1}$, i.e., $x_{11} = 0$. Then as in Cor. 4.17.7 we put $x_{00} = x_{01} = x_{10} = 1$ and we have $\sum_{n\geq 0} f(n+1)\lambda^n = \sum_{i,j=1}^2\frac{(-1)^{i+j}q_{ij}(\lambda)}{q(\lambda)}$, where
$$q(\lambda) = \det\left( I - \lambda\begin{pmatrix} 1 & 1\\ 1 & 0 \end{pmatrix} \right) = 1 - \lambda - \lambda^2.$$
A quick computation shows that $q_{11} = 1$, $q_{12} = -\lambda$; $q_{21} = -\lambda$; $q_{22} = 1 - \lambda$. Hence
$$\sum_{n\geq 0} f(n+1)\lambda^n = \frac{2+\lambda}{1-\lambda-\lambda^2}.$$
We recognize that this denominator gives a Fibonacci type sequence. If we solve
$$\frac{2+\lambda}{1-\lambda-\lambda^2} = \frac{b}{1-\alpha\lambda} + \frac{c}{1-\beta\lambda}\ \text{with}\ \alpha = \frac{1-\sqrt{5}}{2}\ \text{and}\ \beta = \frac{1+\sqrt{5}}{2}$$
for $b$ and $c$, we eventually find that $\frac{2+\lambda}{1-\lambda-\lambda^2} = \sum_{n\geq 0} f(n+1)\lambda^n$ if and only if
$$f(n+1) = \left( \frac{5 - 2\sqrt{5}}{5} \right)\left( \frac{1-\sqrt{5}}{2} \right)^n + \left( \frac{5 + 2\sqrt{5}}{5} \right)\left( \frac{1+\sqrt{5}}{2} \right)^n.$$
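The closed form for $f(n+1)$ is easy to test against a direct count (our own sketch):

```python
from itertools import product

def f_brute(n):
    # binary sequences of length n with 11 never appearing
    return sum(1 for a in product((0, 1), repeat=n)
               if all(not (a[i] and a[i + 1]) for i in range(n - 1)))

def f_closed(m):
    # the Problem 2.1 closed form for f(m+1)
    r5 = 5 ** 0.5
    return round((5 - 2 * r5) / 5 * ((1 - r5) / 2) ** m
                 + (5 + 2 * r5) / 5 * ((1 + r5) / 2) ** m)

print([f_brute(n + 1) for n in range(8)])   # [2, 3, 5, 8, 13, 21, 34, 55]
```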
Problem 2.2. Find the number of sequences of $n+1$ 0's and 1's with 11 never appearing as a subsequence $a_ia_{i+1}$, i.e., $x_{11} = 0$ as above, but this time consider only those sequences starting with a fixed $i \in \{0, 1\}$ and ending with a fixed $j \in \{0, 1\}$.

For this situation we need to find the $ij$ entry of the $n^{th}$ power of $A = \begin{pmatrix} 1 & 1\\ 1 & 0 \end{pmatrix}$. Here we diagonalize the matrix $A$ to find its powers:
$$A^n = \frac{1}{4\sqrt{5}}\begin{pmatrix} 2 & -2\\ \sqrt{5}-1 & \sqrt{5}+1 \end{pmatrix}\begin{pmatrix} \left(\frac{1+\sqrt{5}}{2}\right)^n & 0\\ 0 & \left(\frac{1-\sqrt{5}}{2}\right)^n \end{pmatrix}\begin{pmatrix} 1+\sqrt{5} & 2\\ 1-\sqrt{5} & 2 \end{pmatrix}$$
$$= \frac{1}{4\sqrt{5}}\begin{pmatrix} \frac{(1+\sqrt{5})^{n+1}}{2^{n-1}} - \frac{(1-\sqrt{5})^{n+1}}{2^{n-1}} & \frac{(1+\sqrt{5})^n}{2^{n-2}} - \frac{(1-\sqrt{5})^n}{2^{n-2}}\\[2pt] \frac{(\sqrt{5}-1)(1+\sqrt{5})^{n+1}}{2^n} + \frac{(\sqrt{5}+1)(1-\sqrt{5})^{n+1}}{2^n} & \frac{(\sqrt{5}-1)(1+\sqrt{5})^n}{2^{n-1}} + \frac{(\sqrt{5}+1)(1-\sqrt{5})^n}{2^{n-1}} \end{pmatrix}.$$
For example the 12 entry of this matrix is the number of sequences of $n+1$ 0's and 1's with 11 never appearing as a subsequence $a_ia_{i+1}$ and starting with 0 and ending with 1. A little routine computation gives
$$\mathrm{tr}(A^n) = \frac{(1+\sqrt{5})^n}{2^n} + \frac{(1-\sqrt{5})^n}{2^n}.$$
We could also have used Cor. 4.17.6 and calculated
$$\frac{-\lambda q'(\lambda)}{q(\lambda)} = -2 + \frac{2-\lambda}{1-\lambda-\lambda^2} = -2 + \frac{1}{1-\alpha\lambda} + \frac{1}{1-\beta\lambda}.$$
This agrees with the above for $n \geq 1$, but in the proof of Cor. 4.17.6 the term $C_D(0)$ is not accounted for.
Problem 2.3 Suppose we still require that two 1s never appear together,
but now we want to count sequences with prescribed numbers of 0s and 1s.
Return to the situation where A =
_
x
00
x
01
x
10
x
11
_
. Then

n=0
A
n
= (I A)
1
=
_
1 x
00
x
01
x
10
1 x
11
_
1
=
_
1 x
11
x
01
x
10
1 x
00
_
(1 x
00
)(1 x
11
) x
01
x
10
=
_
1 x
11
x
01
x
10
1 x
00
_

_
_
1
(1 x
00
)(1 x
11

1
1
x
01
x
10
(1x
00
)(1x
11
)
_
_
=
_
1 x
11
x
01
x
10
1 x
00
_

i=0
x
i
01
x
i
10
(1 x
00
)
i+1
(1 x
11
)
i+1
4.17. THE TRANSFER MATRIX METHOD 173
=
_
1 x
11
x
01
x
10
1 x
00
_

i=0
x
i
01
x
i
10

j=0

k=0
_
i +j
j
_
x
j
00
_
i +k
k
_
x
k
11
.
If we suppose that the pair 11 never appears, so x_{11} = 0, then x_{11}^k = δ_{k,0}. And

Σ_{n=0}^∞ A^n = [ 1, x_{01} ; x_{10}, 1 − x_{00} ] · Σ_{i,j=0}^∞ C(i+j, j) x_{01}^i x_{10}^i x_{00}^j.
We now consider what this equation implies for the (i, j) position, 1 ≤ i, j ≤ 2.

Case 1. (1,1) position: Σ_{n=0}^∞ (A^n)_{11} = Σ_{i,j=0}^∞ C(i+j, j) x_{01}^i x_{10}^i x_{00}^j. So there must be C(i+j, j) ways of forming walks of length 2i + j using the edges x_{01} and x_{10} each i times and the edge x_{00} j times. This corresponds to a sequence of length 2i + j + 1 with exactly i 1's (and i + j + 1 0's), starting and ending with a 0, and never having two 1's next to each other. Another way to view this is as needing to fill i + 1 boxes with 0's (the boxes before and after each 1) so that each box has at least one 0. This is easily seen to be the same as an (i + 1)-composition of i + j + 1, of which there are C(i+j, i) = C(i+j, j). (See pages 15-16 of our class notes.)
Case 2. (1,2) position: Σ_{n=0}^∞ (A^n)_{12} = Σ_{i,j=0}^∞ C(i+j, j) x_{01}^{i+1} x_{10}^i x_{00}^j. Here there must be C(i+j, j) walks of length 2i + j + 1 using the edge x_{01} i + 1 times, the edge x_{10} i times, and the edge x_{00} j times. This corresponds to a sequence of length 2i + j + 2 with exactly i + 1 1's (and i + j + 1 0's), starting with a 0 and ending with a 1, and never having two 1's next to each other. It is clear that this kind of sequence is just one from Case 1 with a 1 appended at the end.
Case 3. (2,1) position: This is the same as Case 2, with the roles of x_{01} and x_{10} interchanged, and the 1 appended at the beginning of the sequence.
Case 4. (2,2) position:

Σ_{n=0}^∞ (A^n)_{22} = Σ_{i,j=0}^∞ C(i+j, j) x_{01}^i x_{10}^i x_{00}^j − Σ_{i,j=0}^∞ C(i+j, j) x_{01}^i x_{10}^i x_{00}^{j+1}

= Σ_{i=0}^∞ x_{01}^i x_{10}^i + Σ_{i≥0, j≥1} x_{01}^i x_{10}^i x_{00}^j [ C(i+j, j) − C(i+j−1, j−1) ]
= 1 + Σ_{i≥1, j≥0} x_{01}^i x_{10}^i x_{00}^j C(i+j−1, j),

after some computation. A term x_{01}^i x_{10}^i x_{00}^j corresponds to a sequence of length 2i + j + 1 = n (starting and ending with a 1), and n must be at least 3 before anything interesting shows up. Here i + j − 1 = n − 3 − (i − 1) and j = n − 2i − 1, so (n − 3 − (i − 1)) − (n − 2i − 1) = i − 1. Hence the number of such sequences of 0's and 1's of length n ≥ 3 with 11 never appearing is

Σ_{1≤i≤(n−1)/2} C(n−3−(i−1), i−1) = Σ_{0≤k≤(n−3)/2} C(n−3−k, k).

We recognize this as F_{n−3}, the (n−3)rd Fibonacci number. (See Section 4.11.)
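This last count is easy to confirm by brute force. The sketch below (an illustrative addition, not from the notes) checks that the number of 0-1 sequences of length n that start and end with 1 and avoid 11 equals the sum Σ_{0≤k≤(n−3)/2} C(n−3−k, k):

```python
from itertools import product
from math import comb

def count_case4(n):
    """Brute-force: 0-1 sequences of length n, starting and ending with 1,
    with 11 never appearing as consecutive terms."""
    return sum(
        1
        for seq in product((0, 1), repeat=n)
        if seq[0] == 1 and seq[-1] == 1
        and all(not (seq[i] == 1 and seq[i + 1] == 1) for i in range(n - 1))
    )

def fib_sum(n):
    """The Case 4 sum: C(n-3-k, k) summed over 0 <= k <= (n-3)/2."""
    return sum(comb(n - 3 - k, k) for k in range((n - 3) // 2 + 1))

for n in range(3, 15):
    assert count_case4(n) == fib_sum(n)
print("Case 4 counts agree for n = 3, ..., 14")
```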
Example 3. Let f(n) be the number of sequences a_1 · · · a_n ∈ [3]^n such that neither 11 nor 23 appears as two consecutive terms a_i a_{i+1}. Determine f(n), or at least a generating function for f(n).
Solution: Let D be the digraph on V = [3] with an edge (i, j) if and only if j is allowed to follow i in the sequence. Also let w(e) = 1 for each edge e of D. The corresponding adjacency matrix is

A = [ 0, 1, 1 ; 1, 1, 0 ; 1, 1, 1 ].

So f(n) = Σ_{i,j=1}^3 (A^{n−1})_{ij}. Put q(λ) = det(I − λA), and let q_{ij}(λ) = det(I − λA : j, i) denote the determinant of the matrix obtained from I − λA by deleting row j and column i. By Theorem 4.17.2,

F(λ) := Σ_{n≥0} f(n + 1)λ^n = Σ_{n≥0} ( Σ_{i,j=1}^3 (A^n)_{ij} ) λ^n = Σ_{i,j=1}^3 (−1)^{i+j} q_{ij}(λ)/q(λ).

It is easy to work out q(λ) = 1 − 2λ − λ² + λ³. Then det[(I − λA)^{−1}] = [det(I − λA)]^{−1}. By Cor. 4.17.3, each F_{ij}(A, λ), and hence F(λ), is a rational function of λ of degree less than the multiplicity n_0 of 0 as an eigenvalue of A. But q(λ) has degree 3, forcing A to have rank at least 3. But A is 3 × 3, so n_0 = 0. Since the denominator of F(λ) is q(λ), which has degree 3, the numerator of F(λ) has degree at most 2, so it is determined by its values at three points. Note: we need

A² = [ 2, 2, 1 ; 1, 2, 1 ; 2, 3, 2 ].

Then

f(1) = Σ_{i,j=1}^3 (A^0)_{ij} = tr(I) = 3,

f(2) = Σ_{i,j=1}^3 (A^1)_{ij} = 7.
f(3) = Σ_{i,j=1}^3 (A²)_{ij} = 16.
Then for some a_0, a_1, a_2 ∈ Q,

F(λ) = (a_0 + a_1 λ + a_2 λ²)/(1 − 2λ − λ² + λ³) = Σ_{n≥0} f(n + 1)λ^n = f(1) + f(2)λ + f(3)λ² + · · · ,

which implies that

a_0 + a_1 λ + a_2 λ² = (1 − 2λ − λ² + λ³)(3 + 7λ + 16λ² + · · ·) = 3 + λ − λ².

Hence

F(λ) = Σ_{n≥0} f(n + 1)λ^n = (3 + λ − λ²)/(1 − 2λ − λ² + λ³),

from which it follows that

Σ_{n≥0} f(n + 1)λ^{n+1} = (3λ + λ² − λ³)/(1 − 2λ − λ² + λ³).

Now add f(0) = 1 to both sides to get

Σ_{n=0}^∞ f(n)λ^n = (1 + λ)/(1 − 2λ − λ² + λ³).

The above generating function for f(n) implies that

f(n + 3) = 2f(n + 2) + f(n + 1) − f(n). (4.55)
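The recurrence (4.55) can be double-checked numerically. In this sketch (ours, for illustration; not part of the notes), f(n) is computed directly as the sum of the entries of A^{n−1}:

```python
def mat_mult(M, N):
    """Multiply two square matrices of the same size."""
    n = len(M)
    return [[sum(M[i][k] * N[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def f(n):
    """f(n) = sum of all entries of A^(n-1), where A is the adjacency
    matrix of the digraph on [3] with the transitions 11 and 23 forbidden."""
    A = [[0, 1, 1],
         [1, 1, 0],
         [1, 1, 1]]
    P = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # identity = A^0
    for _ in range(n - 1):
        P = mat_mult(P, A)
    return sum(sum(row) for row in P)

assert [f(1), f(2), f(3)] == [3, 7, 16]
# Check the recurrence f(n+3) = 2 f(n+2) + f(n+1) - f(n).
for n in range(1, 10):
    assert f(n + 3) == 2 * f(n + 2) + f(n + 1) - f(n)
print("recurrence verified")
```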
For a variation on the problem, let g(n) be the number of sequences a_1 · · · a_n such that neither 11 nor 23 appears as two consecutive terms a_i a_{i+1} or as a_n a_1. So g(n) = C_D(n), the number of closed walks in D of length n. So we just need to compute

−λ q′(λ)/q(λ) = (2λ + 2λ² − 3λ³)/(1 − 2λ − λ² + λ³).
The preceding example is Example 4.7.4 from R. P. Stanley, Enumerative Combinatorics, Vol. 1, Wadsworth & Brooks/Cole, 1986. See that reference for further examples of applications of the transfer matrix method.
Exercise: 4.17.8 Let f(n) be the number of sequences a_1 a_2 · · · a_n ∈ [3]^n with [3] = {0, 1, 2} and with the property that a_1 = a_n and 0 and 2 are never next to each other. Use the transfer matrix method to find a generating function for the sequence (f(n))_{n=1}^∞, and then find a formula for f(n).
4.18 A Famous NONLINEAR Recurrence

For n ≥ 3 let u_n be the number of ways to associate (i.e., parenthesize) the product of a finite sequence x_1, . . . , x_n. As a first example,

u_3 = |{x_1(x_2x_3), (x_1x_2)x_3}| = 2.

Similarly,

u_4 = 5 = |{x_1(x_2(x_3x_4)), x_1((x_2x_3)x_4), (x_1x_2)(x_3x_4), (x_1(x_2x_3))x_4, ((x_1x_2)x_3)x_4}|.

By convention, u_1 = u_2 = 1.
A given associated product always looks like (x_1 · · · x_r)(x_{r+1} · · · x_n), where 1 ≤ r ≤ n − 1. So u_n = u_1u_{n−1} + u_2u_{n−2} + · · · + u_{n−1}u_1, n ≥ 2. Hence

u_n = Σ_{i=1}^{n−1} u_i u_{n−i}.

Put f(x) = Σ_{n=1}^∞ u_n x^n. Then

(f(x))² = Σ_{n=2}^∞ ( Σ_{i=1}^{n−1} u_i u_{n−i} ) x^n = Σ_{n=2}^∞ u_n x^n = f(x) − x.
It follows that [f(x)]² − f(x) + x = 0, so f(x) = (1/2)[1 ± (1 − 4x)^{1/2}]. We must use the minus sign, since the constant term of f(x) is f(0) = 0. This leads to

f(x) = 1/2 − (1/2)(1 − 4x)^{1/2} = 1/2 − (1/2) Σ_{n=0}^∞ C(1/2, n) (−4x)^n.

Then a little computation shows that

u_n = −(1/2)(−1)^{n−1}(1 · 3 · 5 · · · (2n − 3))(−1)^n 4^n / (2^n n!) = (2n − 2)!/(n!(n − 1)!) = (1/n) C(2(n−1), n−1) = C_{n−1}.
These numbers C_n = (1/(n+1)) C(2n, n) are the famous Catalan numbers. See Chapter 14 of van Lint and Wilson for a great deal more about them.
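A short sketch (ours, not part of the notes) comparing the convolution recurrence for u_n with the closed form (2n − 2)!/(n!(n − 1)!):

```python
from math import comb, factorial

def u_list(N):
    """Compute u_1, ..., u_N from u_1 = 1 and the convolution recurrence
    u_n = sum_{i=1}^{n-1} u_i u_{n-i}."""
    u = [0, 1]  # u[0] unused, u[1] = 1
    for n in range(2, N + 1):
        u.append(sum(u[i] * u[n - i] for i in range(1, n)))
    return u[1:]

u = u_list(10)
for n in range(1, 11):
    closed = factorial(2 * n - 2) // (factorial(n) * factorial(n - 1))
    assert u[n - 1] == closed == comb(2 * (n - 1), n - 1) // n
print(u)  # the Catalan numbers C_0, C_1, C_2, ... = 1, 1, 2, 5, 14, 42, ...
```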
4.19 MacMahon's Master Theorem

4.19.1 Preliminary Results on Determinants
Theorem 4.19.2 Let R be a commutative ring with 1, and let A be an n × n matrix over R. The characteristic polynomial of A is given by

f(x) = det(xI − A) = Σ_{i=0}^n c_i x^{n−i}, (4.56)

where c_0 = 1, and for 1 ≤ i ≤ n, c_i = (−1)^i Σ det(B), where B ranges over all the i × i principal submatrices of A.
Proof: Clearly det(xI − A) is a polynomial of degree n which is monic, i.e., c_0 = 1, and with constant term det(−A) = (−1)^n det(A). Suppose 1 ≤ i ≤ n − 1 and consider the coefficient c_i of x^{n−i} in the polynomial det(xI − A). Recall that in general, if D = (d_{ij}) is an n × n matrix over a commutative ring with 1, then

det(D) = Σ_{σ∈S_n} sgn(σ) d_{1,σ(1)} d_{2,σ(2)} · · · d_{n,σ(n)}.

So to get a term of degree n − i in det(xI − A) = Σ_{σ∈S_n} sgn(σ) (xI − A)_{1,σ(1)} · · · (xI − A)_{n,σ(n)} we first select n − i indices j_1, . . . , j_{n−i}, with complementary indices k_1, . . . , k_i. Then, in expanding the product (xI − A)_{1,σ(1)} · · · (xI − A)_{n,σ(n)} when σ fixes j_1, . . . , j_{n−i}, we select the term x from the factors (xI − A)_{j_1,j_1}, . . . , (xI − A)_{j_{n−i},j_{n−i}}, and the terms (−A)_{k_1,σ(k_1)}, . . . , (−A)_{k_i,σ(k_i)} otherwise. So if A(k_1, . . . , k_i) is the principal submatrix of A indexed by rows and columns k_1, . . . , k_i, then (−1)^i det(A(k_1, . . . , k_i)) is the associated contribution to the coefficient of x^{n−i}. It follows that c_i = (−1)^i Σ det(B), where B ranges over all the principal i × i submatrices of A.

Suppose the permutation σ ∈ S_n consists of k permutation cycles of sizes l_1, . . . , l_k, respectively, where Σ l_i = n. Then sgn(σ) can be computed by

sgn(σ) = (−1)^{(l_1−1)+(l_2−1)+···+(l_k−1)} = (−1)^{n−k} = (−1)^n (−1)^k.

We record this formally as:

sgn(σ) = (−1)^n (−1)^k if σ ∈ S_n is the product of k disjoint cycles. (4.57)
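Theorem 4.19.2 (with the signs written as c_i = (−1)^i Σ det(B)) can be spot-checked by machine. The sketch below (an illustrative addition; the helper names are ours) expands determinants by the permutation formula and compares det(xI − A) with Σ c_i x^{n−i} at several integer values of x:

```python
from itertools import combinations, permutations

def det(M):
    """Determinant via the permutation expansion (fine for small matrices)."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        # parity of the permutation via counting inversions
        inv = sum(1 for i in range(n) for j in range(i + 1, n)
                  if perm[i] > perm[j])
        term = (-1) ** inv
        for i in range(n):
            term *= M[i][perm[i]]
        total += term
    return total

def char_poly_coeffs(A):
    """Coefficients c_0, ..., c_n with det(xI - A) = sum c_i x^(n-i),
    computed as c_i = (-1)^i * (sum of the i x i principal minors)."""
    n = len(A)
    coeffs = [1]
    for i in range(1, n + 1):
        s = sum(det([[A[r][c] for c in rows] for r in rows])
                for rows in combinations(range(n), i))
        coeffs.append((-1) ** i * s)
    return coeffs

A = [[2, 1, 0], [1, 3, 1], [0, 1, 1]]
c = char_poly_coeffs(A)
n = len(A)
for x in range(-3, 4):  # compare det(xI - A) with the claimed polynomial
    xI_minus_A = [[(x if i == j else 0) - A[i][j] for j in range(n)]
                  for i in range(n)]
    assert det(xI_minus_A) == sum(c[i] * x ** (n - i) for i in range(n + 1))
print("c =", c)
```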
4.19.3 Permutation Digraphs

Let A = (a_{ij}) be an n × n matrix over the commutative ring R with 1. Let D_n be the complete digraph of order n with vertices 1, 2, . . . , n, and for which each ordered pair (i, j) is an arc of D_n. Assign to each arc (i, j) the weight a_{ij} to obtain a weighted digraph. The weight of a directed cycle γ : i_1 → i_2 → · · · → i_k → i_1 is defined to be

wt(γ) = −a_{i_1i_2} a_{i_2i_3} · · · a_{i_{k−1}i_k} a_{i_ki_1}, (4.58)

which is the negative of the product of the weights of its arcs.

Let σ ∈ S_n. The permutation digraph D(σ) has vertices 1, . . . , n and arcs (i, σ(i)), 1 ≤ i ≤ n. So D(σ) is a spanning subgraph of D_n. The directed cycles of the graph D(σ) are in 1-1 correspondence with the permutation cycles of σ. Also, the arc sets of the directed cycles of D(σ) partition the set of arcs of D(σ).

The weight wt(D(σ)) of the permutation digraph D(σ) is defined to be the product of the weights of its directed cycles. Hence if σ has k permutation cycles,

wt(D(σ)) = (−1)^k a_{1,σ(1)} a_{2,σ(2)} · · · a_{n,σ(n)}. (4.59)

Then using Equations 4.56 and 4.57 we obtain

det(−A) = Σ wt(D(σ)), (4.60)

where D(σ) ranges over all permutation digraphs of order n.
Fix X ⊆ [n] = {1, 2, . . . , n} and let σ ∈ S_X. The permutation digraph D(σ) has vertex set X and is a (not necessarily spanning) subgraph of D_n with weight equal to the product of the weights of its cycles. (If X = ∅, the corresponding weight is defined to be 1.) If B is the principal submatrix of A whose rows and columns are the (intersections of the) rows and columns of A indexed by X ⊆ [n], then det(−B) = Σ_{σ∈S_X} wt(D(σ)). If we put x = 1 in Eq. 4.56 we obtain

det(I_n − A) = Σ_{X⊆[n]} Σ_{σ∈S_X} wt(D(σ)). (4.61)

Let y_1, . . . , y_n be independent commuting variables over R, and put R′ = R[y_1, . . . , y_n]. Replace A in the preceding discussion with AY, where Y is
the diagonal matrix with diagonal entries y_1, . . . , y_n. So (AY)_{ij} = a_{ij}y_j. So if σ ∈ S_n has k permutation cycles, D(σ) has k directed cycles. And

wt(D(σ)) = (−1)^k a_{1,σ(1)}y_{σ(1)} · · · a_{n,σ(n)}y_{σ(n)}. (4.62)

From a different point of view, let ℋ be the set of all digraphs H of order n for which each vertex has the same indegree and outdegree, and this common value is either 0 or 1. Then H ∈ ℋ consists of a number of pairwise disjoint directed cycles, and hence is a permutation digraph on a subset of [n]. The weight wt(H) of a digraph H ∈ ℋ is defined to be wt(H) = (−1)^{c(H)} times the product of the weights of its arcs, where c(H) is the number of directed cycles of H and the weight of an arc (i, j) of H is wt(i, j) = a_{ij}y_j. So if H ∈ ℋ satisfies H = D(σ), σ ∈ S_X, X ⊆ [n], then wt(H) is given by Eq. 4.62. Moreover, if wt(ℋ) = Σ_{H∈ℋ} wt(H), by Eq. 4.61 we have

wt(ℋ) = Σ_{H∈ℋ} wt(H) = det(I_n − AY). (4.63)
4.19.4 A Class of General Digraphs

We now consider the set 𝒟 of general digraphs D on vertices in [n], for which the arcs having i as initial vertex are linearly ordered, and such that for each i, 1 ≤ i ≤ n, there is a nonnegative integer m_i such that m_i equals both the indegree and the outdegree of the vertex i. Recall that a loop on i contributes 1 to both the indegree and the outdegree of i. We still have the n × n matrix A = (a_{ij}) and the independent indeterminates y_1, . . . , y_n. If D is a general digraph, and if (i, j) is the t-th arc with i as initial vertex, let a^t_{ij} y_j be the weight of the arc (i, j) (a^t_{ij} is the (i, j) entry of A with a superscript t adjoined). The weight wt(D) of D is the product of the weights of its arcs. Each D is uniquely identified by wt(D).
Moreover, suppose that the variables y_1, . . . , y_n commute with all the entries of A, but do not commute with each other. We show that each D is identified uniquely by the word in y_1, . . . , y_n associated with wt(D). As an example, suppose that

wt(D) = a^1_{11}y_1 a^2_{13}y_3 a^3_{13}y_3 a^1_{22}y_2 a^1_{31}y_1 a^2_{31}y_1 a^3_{33}y_3 (4.64)

= a^1_{11} a^2_{13} a^3_{13} a^1_{22} a^1_{31} a^2_{31} a^3_{33} · y_1y_3y_3y_2y_1y_1y_3. (4.65)

Here in D vertex 1 has outdegree 3 and indegree 3, i.e., m_1 = 3. Similarly, m_2 = 1 and m_3 = 3. Notice that the word y_1y_3²y_2y_1²y_3 is sufficient to
recreate the digraph D along with the linear order on its arcs. To see this, start with y_1y_3y_3y_2y_1y_1y_3 and work from the left. For each j, 1 ≤ j ≤ 3, the number of y_j's appearing in the word is the indegree m_j of j. Since m_1 = 3, m_2 = 1, and m_3 = 3, the first 3 arcs have initial vertex 1, the 4th arc has initial vertex 2, and the last 3 arcs have initial vertex 3.

As another example, consider the word y_2y_1y_2y_3²y_1², and let D be the associated digraph. Here m_1 = 3, m_2 = 2, m_3 = 2. It follows that

wt(D) = a^1_{12} a^2_{11} a^3_{12} a^1_{23} a^2_{23} a^1_{31} a^2_{31} · y_2y_1y_2y_3²y_1².
Two digraphs D_1 and D_2 in 𝒟 are considered the same if and only if for each i, 1 ≤ i ≤ n, and for each t, 1 ≤ t ≤ m_i, the t-th arc of D_1 having initial vertex i and the t-th arc of D_2 having initial vertex i both have the same terminal vertex.

Consider the product

Π_{i=1}^n (a_{i1}y_1 + · · · + a_{in}y_n)^{m_i}. (4.66)
Label the factors in each power, say,

(a_{i1}y_1 + · · · + a_{in}y_n)^{m_i} = (a_{i1}y_1 + · · · + a_{in}y_n)_1 (a_{i1}y_1 + · · · + a_{in}y_n)_2 · · · (a_{i1}y_1 + · · · + a_{in}y_n)_{m_i},

and then write a^t_{ij} in place of a_{ij} in the t-th factor. Then the product appears as

(a^1_{i1}y_1 + · · · + a^1_{in}y_n)(a^2_{i1}y_1 + · · · + a^2_{in}y_n) · · · (a^{m_i}_{i1}y_1 + · · · + a^{m_i}_{in}y_n). (4.67)
Consider the product as i goes from 1 to n of the product in Eq. 4.67. Each summand of the expanded product that involves a word in the y's using m_j of the y_j's, 1 ≤ j ≤ n, corresponds to (i.e., is the weight of) a unique general digraph in which vertex i has both indegree and outdegree equal to m_i. If we remove the superscript t on the element a^t_{ij} and now assume that the y's commute, we see that if B(m_1, . . . , m_n) is the coefficient of y_1^{m_1} y_2^{m_2} · · · y_n^{m_n} in the product as i goes from 1 to n of the product in Eq. 4.67, then

wt(𝒟) = Σ_{D∈𝒟} wt(D) = Σ_{(m_1,...,m_n)≥(0,...,0)} B(m_1, . . . , m_n) y_1^{m_1} · · · y_n^{m_n}. (4.68)
To see this, let

𝒟_{(m_1,...,m_n)} = {D ∈ 𝒟 : m_i = outdeg(i) = indeg(i) in D}.

Clearly,

wt(𝒟_{(m_1,...,m_n)}) = Σ_{D∈𝒟_{(m_1,...,m_n)}} wt(D) = B(m_1, . . . , m_n) y_1^{m_1} · · · y_n^{m_n}.
4.19.5 MacMahon's Master Theorem for Permutations

Continue with the same use of notation for A and Y.

Theorem 4.19.6 Let A(m_1, . . . , m_n) be the coefficient of y_1^{m_1} y_2^{m_2} · · · y_n^{m_n} in the formal inverse det(I_n − AY)^{−1} of the polynomial det(I_n − AY). Let B(m_1, . . . , m_n) be the coefficient of y_1^{m_1} y_2^{m_2} · · · y_n^{m_n} in the product

Π_{i=1}^n (a_{i1}y_1 + a_{i2}y_2 + · · · + a_{in}y_n)^{m_i}.

Then A(m_1, . . . , m_n) = B(m_1, . . . , m_n).
Proof: Put 𝒢 = 𝒟 × ℋ = {(D, H) : D ∈ 𝒟, H ∈ ℋ}, and define the weight of the pair (D, H) by wt(D, H) = wt(D) · wt(H). Then

wt(𝒢) := Σ_{(D,H)∈𝒢} wt(D, H) = wt(𝒟) · wt(ℋ).

This implies (by Eqs. 4.63 and 4.68) that

wt(𝒢) = ( Σ_{(m_1,...,m_n)≥(0,...,0)} B(m_1, . . . , m_n) y_1^{m_1} · · · y_n^{m_n} ) · det(I_n − AY). (4.69)

If we can show that wt(𝒢) = 1, we will have proved MacMahon's Master Theorem.

Let ∅ denote the digraph on vertices 1, . . . , n with an empty set of arcs. Then wt(∅, ∅) = 1. We want to define an involution φ on the set 𝒢 \ {(∅, ∅)} which is sign-reversing on weights.

Given a pair (D, H) ∈ 𝒢 \ {(∅, ∅)}, we determine the first vertex u whose outdegree in either D or H is positive. Beginning at that vertex u we walk
along the arcs of D, always choosing the topmost arc (the arc a^t_{ij} from i with t the largest available), until one of the following occurs:

(i) We encounter a previously visited vertex (and have thus located a directed cycle γ of D).

(ii) We encounter a vertex which has a positive outdegree in H (and thus is a vertex on a directed cycle γ of H).

We note that if u is a vertex with positive outdegree in H then we are immediately in case (ii). We also note that cases (i) and (ii) cannot occur simultaneously. If case (i) occurs, we form a new element of 𝒢 by removing γ from D and putting it in H. If case (ii) occurs, we remove γ from H and put it in D in such a way that each arc of γ is put in front of (in the linear order) those with the same initial vertex. Let (D′, H′) be the pair obtained in this way. Then D′ is in 𝒟 and H′ is in ℋ, and hence (D′, H′) is in 𝒢. Moreover, since the number of directed cycles in H′ differs from the number in H by one, it follows that wt(D′, H′) = −wt(D, H). Define φ(D, H) = (D′, H′) and note that φ(D′, H′) = (D, H). Thus φ is an involution on 𝒢 \ {(∅, ∅)} which is sign-reversing on weights. It follows that wt(𝒢) = wt(∅, ∅) = 1. Hence the proof is complete.
We give two examples to help the reader be sure that the above proof is understood. Let D be the general digraph with arcs

D : a^1_{15} a^1_{23} a^1_{32} a^2_{35} a^3_{31} a^1_{53} a^2_{53}.

Let X = {2, 3, 4, 6} ⊆ [6]. Let σ = (2, 4, 6)(3) ∈ S_X, and let H = D(σ) : a_{24} a_{33} a_{46} a_{62}. Since the first vertex 1 has positive outdegree in D, we start walking along arcs in D: first is a^1_{15}. As 5 does not have positive outdegree in H, the next arc is a^2_{53}. As 3 has positive outdegree in H, it belongs to a directed cycle of H, namely the loop γ = a_{33}. We put this loop into D as the arc a^4_{33}, and remove it from H to obtain H′ = a_{24} a_{46} a_{62}. So φ(D, H) = (D′, H′). We now check that φ(D′, H′) = (D, H).
So let D be the same as D′ above, and suppose X = {2, 4, 6} and σ = (2, 4, 6). So

φ(D, H) = φ(a^1_{15} a^1_{23} a^1_{32} a^2_{35} a^3_{31} a^4_{33} a^1_{53} a^2_{53}, a_{24} a_{46} a_{62}).

We start our walk with a^1_{15}, moving to a^2_{53}, then to a^4_{33}. Since 3 is a repeated vertex, the loop γ : 3 → 3 represented by a^4_{33} is removed from D and adjoined to H as the loop a_{33}. We have now obtained the original element of 𝒢.
When we specialize to n = 2 we obtain the following: if A = [ a_{11}, a_{12} ; a_{21}, a_{22} ], then

det(I − AY)^{−1} = ( 1 − a_{11}y_1 − a_{22}y_2 + (a_{11}a_{22} − a_{12}a_{21}) y_1y_2 )^{−1}

= Σ_{(m_1,m_2)≥(0,0)} [ Σ_i C(m_1, i) C(m_2, m_1−i) a_{11}^i a_{12}^{m_1−i} a_{21}^{m_1−i} a_{22}^{m_2−m_1+i} ] y_1^{m_1} y_2^{m_2}. (4.70)
Note: If some a_{ij} = 0, then to get a nonzero contribution the power on a_{ij} must be zero.
Computing det(I − AY)^{−1} directly, we get

Σ_{k=0}^∞ [ a_{11}y_1 + a_{22}y_2 − (a_{11}a_{22} − a_{12}a_{21}) y_1y_2 ]^k.

Then computing the coefficient of y_1^{m_1} y_2^{m_2} in this sum (and writing Δ in place of a_{11}a_{22} − a_{12}a_{21}) we get

Σ_{k=0}^∞ ( k ; k−m_2, k−m_1, m_1+m_2−k ) a_{11}^{k−m_2} a_{22}^{k−m_1} Δ^{m_1+m_2−k} (−1)^{m_1+m_2−k}, (4.71)

where ( k ; k−m_2, k−m_1, m_1+m_2−k ) denotes a multinomial coefficient.
This gives a variety of equalities. In particular, suppose each a_{ij} = 1. Hence Δ = 0, so k = m_1 + m_2 for a nonzero contribution. Then the Master Theorem yields the familiar equality:

Σ_i C(m_1, i) C(m_2, m_1−i) = ( m_1+m_2 ; m_1, m_2, 0 ) = C(m_1+m_2, m_1). (4.72)
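Identity (4.72) is the Vandermonde convolution; a short numerical check (ours, not part of the notes):

```python
from math import comb

def vandermonde_lhs(m1, m2):
    """Left side of (4.72): sum over i of C(m1, i) * C(m2, m1 - i).
    math.comb(n, k) returns 0 when k > n, which handles the range cleanly."""
    return sum(comb(m1, i) * comb(m2, m1 - i) for i in range(m1 + 1))

for m1 in range(8):
    for m2 in range(8):
        assert vandermonde_lhs(m1, m2) == comb(m1 + m2, m1)
print("Vandermonde convolution verified for 0 <= m1, m2 < 8")
```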
Exercise: 4.19.7 Prove that Σ_k ( k ; k−m, k−n, m+n−k ) (−1)^{m+n−k} = 1. (Hint: Compute the coefficient of a_{11}^{m_1} a_{22}^{m_2} in the two equations Eq. 4.70 and Eq. 4.71, which must be equal by the Master Theorem.)
4.19.8 Dixon's Identity as an Application of the Master Theorem

Problem: Evaluate the sum S = Σ_{k=0}^n (−1)^k C(n, k)³.

Since each summand is the product of three binomial coefficients with upper index n, we are led to consider the expression:

(1 − x/y)^n (1 − y/z)^n (1 − z/x)^n = Σ_{0≤i,j,k≤n} C(n, i) C(n, j) C(n, k) (−1)^{i+j+k} x^{i−k} y^{j−i} z^{k−j}.
To force the lower indices in the binomial coefficients to be equal, we apply the operator [x⁰y⁰z⁰]. From the above we see that

S = [x⁰y⁰z⁰] { (1 − x/y)^n (1 − y/z)^n (1 − z/x)^n } = Σ_{0≤i≤n} C(n, i)³ (−1)^{3i}.

We can see directly that this is equal to

[x^n y^n z^n] (y − x)^n (z − y)^n (x − z)^n,
but the point of this exercise is to get it from the Master Theorem. Now let

A = [ 0, 1, −1 ; −1, 0, 1 ; 1, −1, 0 ]  and  Y = [ x, 0, 0 ; 0, y, 0 ; 0, 0, z ].

A simple calculation shows that

I − AY = [ 1, −y, z ; x, 1, −z ; −x, y, 1 ],

and

det(I − AY)^{−1} = (1 + xy + yz + zx)^{−1} = Σ_{i,j,k≥0} (−1)^{i+j+k} ( i+j+k ; i, j, k ) (xy)^i (yz)^j (zx)^k. (4.73)
MacMahon's Master Theorem with m_1 = m_2 = m_3 = n applied to I − AY says that

[x^n y^n z^n] ( det(I − AY)^{−1} ) = [x^n y^n z^n] (y − z)^n (z − x)^n (x − y)^n, (4.74)

from which we obtain

S = [x^n y^n z^n] (y − z)^n (z − x)^n (x − y)^n = [x^n y^n z^n] Σ_{i,j,k≥0} (−1)^{i+j+k} ( i+j+k ; i, j, k ) (xy)^i (yz)^j (zx)^k = Σ (−1)^{i+j+k} ( i+j+k ; i, j, k ),

where the sum is over all (i, j, k) for which i + j = j + k = k + i = n. Hence i = j = k = n/2, and i, j and k are integers. From this it follows that

S = (−1)^m (3m)!/(m!)³ if n = 2m, and S = 0 otherwise.
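Dixon's identity in this form is easily tested by machine (an illustrative check, not part of the notes):

```python
from math import comb, factorial

def dixon_sum(n):
    """S = sum_{k=0}^n (-1)^k C(n, k)^3."""
    return sum((-1) ** k * comb(n, k) ** 3 for k in range(n + 1))

def closed_form(n):
    """(-1)^m (3m)! / (m!)^3 if n = 2m, and 0 for odd n."""
    if n % 2 == 1:
        return 0
    m = n // 2
    return (-1) ** m * factorial(3 * m) // factorial(m) ** 3

for n in range(0, 13):
    assert dixon_sum(n) == closed_form(n)
print("Dixon's identity verified for n = 0, ..., 12")
```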
Exercise: 4.19.9 Apply the Master Theorem to the matrix B = [ 0, 1, 1 ; 1, 0, 1 ; 1, 1, 0 ]. Show that

Σ_i C(m, i)³ = Σ_n ( m+n ; m−2n, n, n, n ) 2^{m−2n}.

Show that this is the number of permutations of the letters in the sequence x_1^m x_2^m x_3^m such that no letter is placed in a position originally occupied by itself.
Chapter 5

Möbius Inversion on Posets

This chapter deals with locally finite partially ordered sets (posets), their incidence algebras, and Möbius inversion on these algebras.
5.1 Introduction

Recall first that we have proved the following (see Theorem 1.5.5):

x(x + 1) · · · (x + n − 1) = Σ_{k=0}^n c(n, k) x^k, (5.1)

where c(n, k) is the number of σ ∈ S_n with k cycles. Replacing x with −x and observing that (−1)^n (−x)(−x + 1) · · · (−x + n − 1) = x(x − 1) · · · (x − n + 1) = (x)_n, we obtained

(x)_n = Σ_{k=0}^n s(n, k) x^k, (5.2)

where s(n, k) = (−1)^{n−k} c(n, k) is a Stirling number of the first kind.
Let Π_n be the set of all partitions of the set [n], and S(n, k) the number of partitions of [n] with exactly k parts. For each function f : [n] → [m], let π_f denote the partition of [n] determined by f. For π ∈ Π_n, let

α_π(m) = |{f : [n] → [m] : π = π_f}| = |{f : [ℓ(π)] → [m] : f is one-to-one}| = (m)_{ℓ(π)},

where ℓ(π) denotes the number of parts of π. Given any f : [n] → [m], there is a unique π ∈ Π_n for which f is one of the maps counted by α_π(m),
i.e., π = π_f. And m^n = |{f : [n] → [m]}|. So m^n = Σ_π α_π(m) = Σ_{π∈Π_n} (m)_{ℓ(π)} = Σ_{k=0}^n S(n, k)(m)_k for all n ≥ 0. Hence

x^n = Σ_{k=0}^n S(n, k)(x)_k, n ≥ 0, (5.3)

where S(n, k) is a Stirling number of the second kind. If we use the same trick of replacing x with −x again, we get

x^n = Σ_{k=0}^n (−1)^{n−k} S(n, k) x(x + 1) · · · (x + k − 1). (5.4)

Here we can see that Eq. 5.1 and Eq. 5.4 are inverses of each other, and Eq. 5.2 and Eq. 5.3 are inverses of each other. We proceed to make this a little more formal.
Let P_n be the set of all polynomials of degree k, 0 ≤ k ≤ n (along with the zero polynomial), with coefficients in C. Then P_n is an (n + 1)-dimensional vector space, and

B_1 = {1, x, x², . . . , x^n},

B_2 = {(x)_0 = 1, (x)_1, . . . , (x)_n},

and

B_3 = {x^{(0)} = 1, x^{(1)}, . . . , x^{(n)}},  where x^{(k)} = x(x + 1) · · · (x + k − 1),

are three ordered bases of P_n. Recall that if B = {v_1, . . . , v_m} and B′ = {w_1, . . . , w_m} are two bases of the same vector space over C (or over any field K), then there are unique scalars a_{ij}, 1 ≤ i, j ≤ m, for which w_j = Σ_{i=1}^m a_{ij} v_i, and unique scalars a′_{ij}, 1 ≤ i, j ≤ m, for which v_j = Σ_{i=1}^m a′_{ij} w_i. And the matrices A = (a_{ij}) and A′ = (a′_{ij}) are inverses of each other.
So put:

A = (a_{ij}), 0 ≤ i, j ≤ n, a_{ij} = c(j, i);

B = (b_{ij}), 0 ≤ i, j ≤ n, b_{ij} = s(j, i);

C = (c_{ij}), 0 ≤ i, j ≤ n, c_{ij} = S(j, i);

D = (d_{ij}), 0 ≤ i, j ≤ n, d_{ij} = (−1)^{j−i} S(j, i).
Then A and D are inverses of each other, and B and C are inverses. So

Σ_{k=0}^n S(j, k) s(k, i) = Σ_{k=0}^n b_{ik} c_{kj} = (BC)_{ij} = δ_{ij}, (5.5)

Σ_{k=0}^n (−1)^{j−k} S(j, k) c(k, i) = Σ_{k=0}^n a_{ik} d_{kj} = (AD)_{ij} = δ_{ij}. (5.6)
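Equation (5.5) can be confirmed directly. The sketch below (ours, not part of the notes) builds c(n, k) and S(n, k) from their standard recurrences and verifies Σ_k S(j, k)s(k, i) = δ_{ij}:

```python
def stirling_tables(N):
    """Return tables c, S of unsigned Stirling numbers of the first kind
    and Stirling numbers of the second kind, for 0 <= n, k <= N, using
    c(n,k) = c(n-1,k-1) + (n-1) c(n-1,k) and
    S(n,k) = S(n-1,k-1) + k S(n-1,k)."""
    c = [[0] * (N + 1) for _ in range(N + 1)]
    S = [[0] * (N + 1) for _ in range(N + 1)]
    c[0][0] = S[0][0] = 1
    for n in range(1, N + 1):
        for k in range(1, N + 1):
            c[n][k] = c[n - 1][k - 1] + (n - 1) * c[n - 1][k]
            S[n][k] = S[n - 1][k - 1] + k * S[n - 1][k]
    return c, S

N = 7
c, S = stirling_tables(N)
# With s(k, i) = (-1)^(k-i) c(k, i), check sum_k S(j, k) s(k, i) = delta_ij.
for i in range(N + 1):
    for j in range(N + 1):
        entry = sum((-1) ** (k - i) * c[k][i] * S[j][k] for k in range(N + 1))
        assert entry == (1 if i == j else 0)
print("sum_k S(j, k) s(k, i) = delta_ij verified")
```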
We want to see Eq. 5.5 expressed in the context of Möbius inversion over a finite partially ordered set. Also, when two matrices, such as A and D above, are recognized as being inverses of each other, we obtain an inversion principle: b = aA if and only if a = bD (for row vectors a and b).

Consider a second example. Let A, B, C be three subsets of a universal set E. Then

|E − (A ∪ B ∪ C)| = |E| − (|A| + |B| + |C|) + (|A ∩ B| + |A ∩ C| + |B ∩ C|) − |A ∩ B ∩ C|.

This is a very special case of the general principle of inclusion-exclusion that we met much earlier and which we now want to view as Möbius inversion over a certain finite partially ordered set.
As a third example, recall Möbius inversion as we studied it earlier:

f(n) = Σ_{d|n} g(d) for all n ∈ N  if and only if  g(n) = Σ_{d|n} μ(d) f(n/d) for all n ∈ N,

where μ is the classical Möbius function of elementary number theory. The general goal is to introduce the abstract theory of Möbius inversion over finite posets and look at special applications that yield the above results and more as special examples of this general theory. As usual, we just scratch the surface of this broad subject. An interesting observation, however, is that although special examples have been appearing at least since the 1930's, the general theory has been developed primarily by G.-C. Rota and his students, starting with Rota's 1964 paper, On the foundations of combinatorial theory I. Theory of Möbius functions, Z. Wahrsch. Verw. Gebiete 2 (1964), 340-368.
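The divisor-sum inversion above can be checked mechanically. In the sketch below (ours, for illustration; the arbitrary choice of g is ours), f(n) = Σ_{d|n} g(d) is formed and g is then recovered:

```python
def mobius(n):
    """Classical Mobius function via trial factorization."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # squared prime factor
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

# Pick an arbitrary g, form f(n) = sum_{d | n} g(d), then invert.
g = {n: n * n + 1 for n in range(1, 50)}
f = {n: sum(g[d] for d in divisors(n)) for n in range(1, 50)}
for n in range(1, 50):
    recovered = sum(mobius(d) * f[n // d] for d in divisors(n))
    assert recovered == g[n]
print("Mobius inversion recovers g from f for n = 1, ..., 49")
```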
5.2 POSETS

A partially ordered set P (i.e., a poset P) is a set P together with a relation ≤ on P for which (P, ≤) satisfies the following:
PO1. ≤ is reflexive (x ≤ x for all x ∈ P);

PO2. ≤ is transitive (x ≤ y and y ≤ z ⇒ x ≤ z for all x, y, z ∈ P);

PO3. ≤ is antisymmetric (x ≤ y and y ≤ x ⇒ x = y for all x, y ∈ P).

A poset (P, ≤) is a chain (or is linearly ordered) provided

PO4. For all x, y ∈ P, either x ≤ y or y ≤ x.

Given a poset (P, ≤), an interval of P is a set of the form

[x, y] = {z ∈ P : x ≤ z ≤ y},

where x ≤ y. So [x, x] = {x}, but ∅ is NOT an interval. P is called locally finite provided |[x, y]| < ∞ whenever x, y ∈ P, x ≤ y. An element of P is called a zero (resp., one) of P and denoted 0̂ (resp., 1̂) provided 0̂ ≤ x for all x ∈ P (resp., x ≤ 1̂ for all x ∈ P). Finally, we write x < y provided x ≤ y but x ≠ y.
EXAMPLES OF LOCALLY FINITE POSETS

Example 5.2.1 ℙ = {1, 2, . . .}, with the usual linear order. Here ℙ is a chain with 0̂ = 1. For each n ∈ ℙ, let [n] = {1, 2, . . . , n} with the usual linear order.

Example 5.2.2 For each n ∈ ℕ, B_n consists of the subsets of [n] ordered by inclusion (recall that [0] = ∅). So we usually write B_n = 2^{[n]}, with S ≤ T in B_n iff S ⊆ T ⊆ [n].

Example 5.2.3 In general any collection of sets can be ordered by inclusion to form a poset. For example, let L_n(q) consist of all subspaces of an n-dimensional vector space V_n(q) over the field F = GF(q), ordered by inclusion.

Example 5.2.4 Put D = ℙ with ≤ defined by: for i, j ∈ D, i ≤ j iff i|j. For each n ∈ ℙ, let D_n be the interval [1, n] = {d : 1 ≤ d ≤ n and d|n}. For i, j ∈ D_n, i ≤ j iff i|j.
Example 5.2.5 Let n ∈ ℙ. The set Π_n of all partitions of [n] is made into a poset by defining (for σ, τ ∈ Π_n) σ ≤ τ iff each block of σ is contained in some block of τ. In that case we say σ is a refinement of τ.

Example 5.2.6 A linear partition π of [n] is a partition of [n] with a linear order on each block of π. The blocks themselves are unordered, and ℓ(π) denotes the number of blocks of π. ℒ_n is the set of linear partitions of [n] with partial order defined by: π ≤ σ, for π, σ ∈ ℒ_n, iff each block of σ can be obtained by the juxtaposition of blocks of π.

Example 5.2.7 For n ∈ ℙ, let S_n denote the set of permutations of the elements of [n] with the following partial order: given σ, τ ∈ S_n, we say σ ≤ τ iff each cycle of σ (written with smallest element first) is composed of a string of consecutive integers from some cycle of τ (also written with the smallest element first).

For example, (12)(3) ≤ (123), (1)(23) ≤ (123), but (13)(2) ≰ (123). The 0̂ of S_n is 0̂ = (1)(2) · · · (n). As an example, for σ = (12435) ∈ S_5, we give the Hasse diagram of the interval [0̂, σ]. Note, for example, that (12)(435) is not in the interval since it would appear as (12)(354).
[Hasse diagram of the interval [0̂, σ] in S_5, σ = (12435). Its elements, by number of cycles, are: 0̂ = (1)(2)(3)(4)(5); (12)(3)(4)(5), (1)(24)(3)(5), (1)(2)(35)(4); (124)(3)(5), (12)(35)(4), (1)(243)(5), (1)(24)(35); (124)(35), (1243)(5), (1)(2435); and σ = (12435). Figure not reproduced here.]
5.3 Vector Spaces and Algebras

Let K be any field (but K = C is the usual choice for us). Let P be any (nonempty) set. The standard way to make K^P = {f : P → K} into a vector space over K is to define vector addition by (f + g)(p) = f(p) + g(p) for all p ∈ P and any f, g ∈ K^P. And then scalar multiplication is defined by (af)(p) = a · f(p), for all a ∈ K, f ∈ K^P and p ∈ P. The usual axioms for a vector space are then easily verified.
If V is any vector space over K, V is an algebra over K if there is also a vector product which is bilinear over K. This means that for each pair (x, y) of elements of V, there is a product vector xy ∈ V for which the following bilinearity conditions hold:

(1) (x + y)z = xz + yz and x(y + z) = xy + xz, for all x, y, z ∈ V;

(2) a(xy) = (ax)y = x(ay) for all a ∈ K and x, y ∈ V.

In these notes we shall be interested only in finite dimensional (linear) algebras, i.e., algebras in which the vector space is finite dimensional over K. So suppose V has a basis e_1, . . . , e_n as a vector space over K. Then e_ie_j is to be an element of V, so e_ie_j = Σ_{k=1}^n c_{ijk} e_k for unique scalars c_{ijk} ∈ K. The n³ elements c_{ijk} are called the multiplication constants of the algebra relative to the chosen basis. They give the value of each product e_ie_j, 1 ≤ i, j ≤ n. Moreover, these products determine every product in V.
For suppose x = Σ_{i=1}^n a_i e_i and y = Σ_{j=1}^n b_j e_j are any two elements of V. Then

xy = (Σ_i a_i e_i)(Σ_j b_j e_j) = Σ_{i,j} (a_i e_i)(b_j e_j) = Σ_{i,j} a_i (e_i (b_j e_j)) = · · · = Σ_{i,j} a_i b_j (e_i e_j),

and hence xy is completely determined by all the products e_ie_j. In fact, if we define e_ie_j (any way we please!) to be some vector of V, 1 ≤ i, j ≤ n, and then define xy = Σ_{i,j=1}^n a_i b_j (e_i e_j) for x and y as above, then it is an easy exercise to show that conditions (1) and (2) hold, so that V with this product is an algebra.

An algebra V over K is said to be associative provided its multiplication satisfies the associative law (xy)z = x(yz) for all x, y, z ∈ V.

Theorem 5.3.1 An algebra V over K with finite basis e_1, . . . , e_n as a vector space over K is associative iff (e_ie_j)e_k = e_i(e_je_k) for 1 ≤ i, j, k ≤ n.

Proof: If V is associative, clearly (e_ie_j)e_k = e_i(e_je_k) for all i, j, k = 1, . . . , n. Conversely, suppose this holds, and let x = Σ a_i e_i, y = Σ b_j e_j, z = Σ c_k e_k be any three elements of V. Then (xy)z = Σ a_i b_j c_k (e_ie_j)e_k and x(yz) = Σ a_i b_j c_k e_i(e_je_k). Hence (xy)z = x(yz) and V is associative.

The algebras we study here are finite dimensional linear associative algebras.
5.4 The Incidence Algebra I(P, K)

Let (P, ≤) be a locally finite poset, and let Int(P) denote the set of intervals of P. Let K be any field. If f ∈ K^{Int(P)}, i.e., f : Int(P) → K, write f(x, y) for f([x, y]).

Here is an example we will find of interest. P = D_n = {d : 1 ≤ d ≤ n and d|n}. An interval of P is a set of the form [i, j] = {k : i|k and k|j}, where 1 ≤ i ≤ j ≤ n and i|j, i|n, j|n. Define μ_n : Int(P) → C by μ_n(i, j) = μ(j/i), where μ is the classical Möbius function we studied earlier and [i, j] is any interval of P.

Def'n. The incidence algebra I(P, K) of P over K is the K-algebra of all functions f : Int(P) → K, with the usual structure of a vector space over K, and where the algebra multiplication (called convolution) is defined by

(f ∗ g)(x, y) = Σ_{z : x≤z≤y} f(x, z) g(z, y), (5.7)

for all intervals [x, y] of P and all f, g ∈ I(P, K). The sum in Eq. 5.7 is finite (so f ∗ g really is defined), since P is locally finite.

Theorem 5.4.1 I(P, K) is an associative K-algebra with two-sided identity denoted δ (or sometimes denoted 1) and defined on intervals [x, y] by

δ(x, y) = 1 if x = y, and δ(x, y) = 0 if x ≠ y.
Proof: This is a straightforward and worthwhile (but tedious!) exercise. It is probably easier to establish associativity by showing (f ∗ g) ∗ h = f ∗ (g ∗ h) in general than it is to establish associativity for some specific basis and then to use Theorem 5.3.1. And to establish that δ ∗ f = f ∗ δ = f for all f ∈ I(P, K) is almost trivial. The bilinearity conditions are also easily established.

Note: It is quite helpful to think of I(P, K) as the set of all formal expressions

f = Σ_{[x,y]∈Int(P)} f(x, y)[x, y] (allowing infinite linear combinations).
Then convolution is defined by requiring that

[x, y] ∗ [z, w] = [x, w] if y = z, and [x, y] ∗ [z, w] = 0 if y ≠ z,

for all [x, y], [z, w] ∈ Int(P), and then extending to all of I(P, K) by bilinearity. This shows that when Int(P) is finite, one basis of I(P, K) may be obtained by setting 1_{[x,y]} equal to the function from Int(P) to K defined by

1_{[x,y]}(z, w) = 1 if [z, w] = [x, y], and 1_{[x,y]}(z, w) = 0 if [z, w] ≠ [x, y] (but [x, y], [z, w] ∈ Int(P)).

Then the set {1_{[x,y]} : [x, y] ∈ Int(P)} is a basis for I(P, K) and the multiplication constants are given by

1_{[x,y]} ∗ 1_{[z,w]} = δ_{y,z} 1_{[x,w]}, where δ_{y,z} = 1 if y = z, and 0 otherwise. (5.8)
Exercise: 5.4.2 Show that if P is any finite poset, its elements can be labeled as x_1, x_2, . . . , x_n so that x_i < x_j in P implies that i < j.
Suppose that P is a finite poset, say P = {x_1, . . . , x_n}. Let

    f = Σ_{x_i ≤ x_j} f(x_i, x_j)[x_i, x_j] ∈ K^Int(P).

Then define the n × n matrix M_f by

    (M_f)_{i,j} = { f(x_i, x_j), if x_i ≤ x_j;  0, otherwise.

We claim that the map f ↦ M_f is an algebra isomorphism from I(P, K) to the algebra of all n × n matrices over K with (i, j) entries equal to zero if x_i ≰ x_j. It is almost obvious that f ↦ M_f is an isomorphism of vector spaces, but we have to work a little to see that multiplication is preserved. So suppose f, g ∈ K^Int(P). Using [x, y] ∗ [z, w] = δ_{y,z}[x, w] we have

    f ∗ g = ( Σ_{x_i ≤ x_j} f(x_i, x_j)[x_i, x_j] ) ∗ ( Σ_{x_k ≤ x_l} g(x_k, x_l)[x_k, x_l] )

          = Σ_{x_i ≤ x_j} Σ_{x_k ≤ x_l} f(x_i, x_j)g(x_k, x_l) [x_i, x_j] ∗ [x_k, x_l]

          = Σ_{x_i ≤ x_l} ( Σ_{x_j: x_i ≤ x_j ≤ x_l} f(x_i, x_j)g(x_j, x_l) ) [x_i, x_l].

So

    (f ∗ g)([x_i, x_l]) = Σ_{x_j: x_i ≤ x_j ≤ x_l} f(x_i, x_j)g(x_j, x_l).

Now with matrices M_f and M_g defined as above:

    (M_f M_g)_{i,l} = Σ_{j=1}^{n} (M_f)_{i,j}(M_g)_{j,l} = Σ_{x_j: x_i ≤ x_j ≤ x_l} (M_f)_{i,j}(M_g)_{j,l}

        = { Σ_{x_j: x_i ≤ x_j ≤ x_l} f(x_i, x_j)g(x_j, x_l), if x_i ≤ x_l;  0, otherwise.
Note that if P is ordered so that x_i < x_j implies i < j, then the matrices M_f are exactly all upper triangular matrices M = (m_{ij}) over K, 1 ≤ i, j ≤ n, with m_{ij} = 0 if x_i ≰ x_j.
For example, if P has the (Hasse) diagram

    [figure: a 5-element poset on x_1, . . . , x_5]

then I(P, K) is isomorphic to the algebra of all 5 × 5 matrices whose (i, j) entry is allowed to be nonzero exactly when x_i ≤ x_j and must be 0 otherwise.
Theorem 5.4.3 Let f ∈ I(P, K). Then the following are equivalent:

(i) f has a left inverse;

(ii) f has a right inverse;

(iii) f has a two-sided inverse f^{-1} (which is necessarily the unique left and right inverse of f);

(iv) f(x, x) ≠ 0 for all x ∈ P.

Moreover, if f^{-1} exists, then f^{-1}(x, y) depends only on the poset [x, y].
Proof: f ∗ g = δ iff f(x, x)g(x, x) = 1 for all x ∈ P and 0 = Σ_{z: x≤z≤y} f(x, z)g(z, y) whenever x < y, x, y ∈ P. The last equation is equivalent to

    g(x, y) = −f(x, x)^{-1} Σ_{z: x<z≤y} f(x, z)g(z, y)

whenever x < y, x, y ∈ P, and also to

    f(x, y) = −g(y, y)^{-1} Σ_{z: x≤z<y} f(x, z)g(z, y).

It follows that if f has a right inverse g then f(x, x) ≠ 0 for all x ∈ P, and in this case g(x, y) = f^{-1}(x, y) depends only on [x, y]. For the converse, suppose this condition holds. (Note: If x_i < x_j implies i < j, so that M_f is upper triangular, then M_f is invertible iff it has no zero on the main diagonal, which is iff f(x, x) ≠ 0 for all x ∈ P. And of course f is invertible iff M_f is invertible. But we give the general proof.) First define g(y, y) = f(y, y)^{-1} for all y ∈ P. Then if [x, y] = {x, y} with x ≠ y, put g(x, y) = −(f(x, x))^{-1} f(x, y)g(y, y). If the maximum length of any chain in [x, y] is 3, say [x, y] = {x, z_1, . . . , z_k, y} with x < z_i < y and z_i, z_j incomparable for all i ≠ j, put

    g(x, y) = −(f(x, x))^{-1} ( Σ_{i=1}^{k} f(x, z_i)g(z_i, y) + f(x, y)g(y, y) ).

Now suppose that the maximum length of any chain in [x, y] is 4, say x < w < z < y is a maximal chain in [x, y]. For each u ∈ [x, y] \ {x}, either [u, y] = {y}, or [u, y] = {u, y}, or [u, y] = {u, w, y} for some w ∈ [x, y]. In any case, g(u, y) is already defined for all u ∈ [x, y] \ {x}. So we may define

    g(x, y) = −(f(x, x))^{-1} Σ_{u∈[x,y]\{x}} f(x, u)g(u, y).

Proceed downward by induction on the maximum length of chain contained in [x, y]. Since each interval of P is finite, this process will terminate in a finite number of steps. Clearly if f^{-1} exists, then f^{-1}(x, y) depends only on the poset [x, y].

Similarly, g has a left inverse f iff g(y, y) ≠ 0 for all y ∈ P, and in this case f(x, y) = g^{-1}(x, y) depends only on [x, y]. But here we define f(x, y) by an upward induction. Applying this argument to f (instead of g), we see f has a left inverse h iff f(x, x) ≠ 0 for all x ∈ P iff f has a right inverse g. But since ∗ is associative, from f ∗ g = δ = h ∗ f we have g = (h ∗ f) ∗ g = h ∗ (f ∗ g) = h.
The zeta function ζ of P is defined by

    ζ(x, y) = 1 for all [x, y] ∈ Int(P).      (5.9)

The zeta function is of interest in its own right (we include an optional section dealing with ζ), but for us the main interest in ζ is that by Theorem 5.4.3 it has an inverse μ, called the Möbius function of the poset P.

One can define μ inductively without reference to the incidence algebra I(P, K). Namely, μ ∗ ζ = δ is equivalent to

    μ(x, x) = 1 for all x ∈ P,

and

    μ(x, y) = −Σ_{z: x≤z<y} μ(x, z) whenever x < y.      (5.10)

Similarly, ζ ∗ μ = δ is equivalent to

    μ(x, x) = 1 for all x ∈ P,

and

    μ(x, y) = −Σ_{z: x<z≤y} μ(z, y) whenever x < y.      (5.11)
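The recursion in Eq. 5.10 is directly computable. The sketch below (ours; `mu` and `leq` are invented names) evaluates the Möbius function of the divisor poset D_12 and confirms that μ ∗ ζ = δ, i.e., that summing μ(x, z) over an interval gives 1 exactly when the interval is a single point.

```python
from functools import lru_cache

# Möbius function of D_12 via the recursion mu(x, y) = -sum mu(x, z), x <= z < y.
n = 12
P = sorted(d for d in range(1, n + 1) if n % d == 0)
leq = lambda x, y: y % x == 0

@lru_cache(maxsize=None)
def mu(x, y):
    """mu(x, x) = 1; otherwise minus the sum of mu(x, z) over x <= z < y."""
    if x == y:
        return 1
    return -sum(mu(x, z) for z in P if leq(x, z) and leq(z, y) and z != y)

def mu_conv_zeta(x, y):
    """(mu * zeta)(x, y) = sum over x <= z <= y of mu(x, z)."""
    return sum(mu(x, z) for z in P if leq(x, z) and leq(z, y))
```

In D_n one finds μ(i, j) = μ(j/i), the classical Möbius function, a fact proved in Section 5.7 below.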
5.5 Optional Section on ζ

Start with ζ(x, y) = 1 for all [x, y] ∈ Int(P). Then ζ²(x, y) = Σ_{z: x≤z≤y} 1 = |[x, y]| if x ≤ y. More generally, for each positive integer k,

    ζ^k(x, y) = Σ_{(x_0,...,x_k): x=x_0≤x_1≤···≤x_k=y} 1

is the number of multichains of length k from x to y.

Theorem 5.5.1 (ζ − δ)(x, y) = { 1, x < y;  0, x = y.

Proof: Clear.

Hence for each positive integer k, (ζ − δ)^k(x, y) is the number of chains x = x_0 < x_1 < · · · < x_k = y of length k from x to y.
Theorem 5.5.2 (2δ − ζ)(x, y) = { 1, if x = y;  −1, if x < y.   So (2δ − ζ)^{-1} exists, and (2δ − ζ)^{-1}(x, y) is equal to the total number of chains x = x_0 < x_1 < · · · < x_k = y from x to y.

Proof: Let l be the length of the longest chain in the interval [x, y]. Then (ζ − δ)^{l+1}(u, v) = 0 whenever x ≤ u ≤ v ≤ y. Thus for x ≤ u ≤ v ≤ y,

    (2δ − ζ) ∗ [δ + (ζ − δ) + (ζ − δ)² + · · · + (ζ − δ)^l](u, v)
        = (δ − (ζ − δ)) ∗ [δ + (ζ − δ) + · · · + (ζ − δ)^l](u, v)
        = (δ − (ζ − δ)^{l+1})(u, v) = δ(u, v).

Hence (2δ − ζ)^{-1} = δ + (ζ − δ) + · · · + (ζ − δ)^l when restricted to the interval [x, y]. But by the definition of l and Theorem 5.5.1 it is clear that (δ + (ζ − δ) + · · · + (ζ − δ)^l)(x, y) is the total number of chains from x to y.
Theorem 5.5.3 Define η : Int(P) → K by

    η(x, y) = { 1, if y covers x;  0, otherwise.

Then (δ − η)^{-1}(x, y) is equal to the total number of maximal chains in [x, y].

Proof: η^k(x, y) = Σ 1, where the sum is over all (z_0, . . . , z_k) where x = z_0, y = z_k, and z_{i+1} covers z_i, i = 0, . . . , k − 1. So Σ_{k≥0} η^k(x, y) is the total number of maximal chains in [x, y]. If l is the length of the longest maximal chain in [x, y], then (δ − η)^{-1} = Σ_{k≥0} η^k, which equals Σ_{k=0}^{l} η^k on [x, y].
5.6 The Action of I(P, K) and Möbius Inversion

The Möbius function plays a central role in Möbius inversion, as does the incidence algebra I(P, K). But before we can make this precise we need to see how I(P, K) acts on the vector space K^P = {f : P → K}. Clearly K^P is a vector space over K in the usual way.

Each α ∈ I(P, K) acts in two ways as a linear transformation on K^P. On the right α acts by

    (f α)(x) = Σ_{y≤x} f(y)α(y, x) for all x ∈ P and f ∈ K^P.      (5.12)

In particular (f δ)(x) = Σ_{y≤x} f(y)δ(y, x) = f(x), implying that f δ = f. On the left α acts by

    (α f)(x) = Σ_{y≥x} α(x, y)f(y), for all x ∈ P, f ∈ K^P.      (5.13)

Similarly, δ f = f.

For these to be actions in the usual sense, it must be true that the following hold:

    (f α_1) α_2 = f (α_1 ∗ α_2), for all f ∈ K^P; α_1, α_2 ∈ I(P, K),      (5.14)

and

    α_1 (α_2 f) = (α_1 ∗ α_2) f for all f ∈ K^P; α_1, α_2 ∈ I(P, K).      (5.15)

We verify Eq. 5.14 and leave Eq. 5.15 as a similar exercise. So for each x ∈ P,
    ((f α_1) α_2)(x) = Σ_{y≤x} (f α_1)(y) α_2(y, x) = Σ_{w,y: w≤y≤x} f(w) α_1(w, y) α_2(y, x)

        = Σ_{w: w≤x} f(w) ( Σ_{y: w≤y≤x} α_1(w, y) α_2(y, x) ) = Σ_{w: w≤x} f(w)(α_1 ∗ α_2)(w, x)

        = (f (α_1 ∗ α_2))(x).
Theorem 5.6.1 (Möbius Inversion Formula) Let P be a poset in which every principal order ideal Λ_x = {y ∈ P : y ≤ x} is finite. Let f, g : P → K. Then

    g(x) = Σ_{y≤x} f(y) for all x ∈ P   iff   f(x) = Σ_{y≤x} g(y)μ(y, x) for all x ∈ P.

Dually, if each principal dual order ideal V_x = {y ∈ P : y ≥ x} is finite and f, g : P → K, then

    g(x) = Σ_{y≥x} f(y) for all x ∈ P   iff   f(x) = Σ_{y≥x} μ(x, y)g(y) for all x ∈ P.

Proof: The first version of Möbius inversion is just that f ζ = g iff f = g μ. The second is that ζ f = g iff f = μ g. These follow easily from the above. For example, if f ζ = g we have g μ = (f ζ) μ = f (ζ ∗ μ) = f δ = f.
Example 5.6.2 Consider the chain P = ℕ with the usual linear ordering, so 0̂ = 0. Here μ(x, x) = 1 for all x ∈ P, and for x < y, μ(x, y) = −Σ_{z: x≤z<y} μ(x, z). So if y covers x, μ(x, y) = −1. If y covers z and z covers x, then μ(x, y) = −(μ(x, x) + μ(x, z)) = −(1 − 1) = 0. If y covers z, z covers w, and w covers x, then μ(x, y) = −(μ(x, x) + μ(x, w) + μ(x, z)) = −(1 + (−1) + 0) = 0. By induction,

    μ(i, j) = { 1, if i = j;  −1, if j = i + 1;  0, otherwise.

Then Möbius inversion takes the following form: For f, g : P → K, g(n) = Σ_{i=0}^{n} f(i) for all n ≥ 0 iff f(n) = Σ_{i=0}^{n} g(i)μ(i, n) = g(n) − g(n − 1). So the sum operator Σ ((Σf)(n) = Σ_{i=0}^{n} f(i)) and the difference operator Δ ((Δf)(n) = f(n) − f(n − 1)) are inverses of each other. This may be viewed as a finite difference analogue of the fundamental theorem of calculus.
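The sum/difference duality just described can be sketched in a few lines (ours, not from the text; `sum_op` and `diff_op` are invented names), treating a function on {0, 1, . . . , n} as a list of values:

```python
# Example 5.6.2 in miniature: the sum operator and difference operator
# on the chain 0 < 1 < 2 < ... are mutually inverse.
def sum_op(f, n):
    """g(i) = f(0) + f(1) + ... + f(i) for 0 <= i <= n."""
    return [sum(f[: i + 1]) for i in range(n + 1)]

def diff_op(g, n):
    """f(i) = g(i) - g(i - 1), with g(-1) taken to be 0."""
    return [g[i] - (g[i - 1] if i > 0 else 0) for i in range(n + 1)]

f = [3, 1, 4, 1, 5, 9]
g = sum_op(f, 5)        # the partial sums of f
```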
5.7 Evaluating μ: the Product Theorem

One obstacle to applying Möbius inversion is that even when the poset (P, ≤) is fairly well understood, evaluating the Möbius function μ can be quite difficult. In this section we prove the Product Theorem and apply it to three well known posets. One corollary is the famous Principle of Inclusion-Exclusion.

Let P and Q be posets. Then the direct (or Cartesian) product of P and Q is the poset P × Q = {(x, y) : x ∈ P and y ∈ Q}, with (x, y) ≤ (x′, y′) in P × Q iff x ≤ x′ in P and y ≤ y′ in Q. The direct product P × P × · · · × P with n factors is then defined in the natural way and is denoted by P^n. There are three examples of interest at this point.

Example 5.7.1 For positive integers m, n, let B_m, B_n be the posets of all subsets of [m], [n], respectively, ordered by inclusion. Then B_m × B_n ≅ B_{m+n}. If we identify B_m with [2]^{[m]} = {f : [m] → [2]} in the usual way, then we have [2]^{[m]} × [2]^{[n]} ≅ [2]^{m+n}. On the other hand, if 2 = {1, 2} with partial order defined by 1 ≤ 1 ≤ 2 ≤ 2 (i.e., 2 is the set [2] with the usual linear order), then B_1 ≅ 2, so B_n ≅ B_1 × · · · × B_1 ≅ 2 × 2 × · · · × 2 = 2^n. Hence 2^m × 2^n ≅ 2^{m+n}.
Example 5.7.2 Recall that for a positive integer k, k denotes [k] with the usual linear order. Then let n_1, . . . , n_k ∈ ℕ and put P = (1 + n_1) × · · · × (1 + n_k). We may identify the elements of P with the set of k-tuples (a_1, . . . , a_k) ∈ ℕ^k with 0 ≤ a_i ≤ n_i, ordered componentwise, i.e., (a_1, . . . , a_k) ≤ (b_1, . . . , b_k) iff a_i ≤ b_i for i = 1, 2, . . . , k. Then if this relation holds, the interval [(a_1, . . . , a_k), (b_1, . . . , b_k)] is isomorphic to (b_1 − a_1 + 1) × (b_2 − a_2 + 1) × · · · × (b_k − a_k + 1).
A little thought reveals that Example 5.7.2 is a straightforward generalization of Example 5.7.1.
Example 5.7.3 Recall that Π_n is the poset of partitions of [n], where two partitions σ, π ∈ Π_n satisfy σ ≤ π iff each block of σ is contained in a single block of π. Now suppose that π = {A_1, . . . , A_k} and that A_i is partitioned into λ_i blocks in σ. Then in Π_n, the interval [σ, π] is isomorphic to Π_{λ_1} × · · · × Π_{λ_k}. Note that Π_2 ≅ 2 and (Π_2)^k ≅ (2)^k. Hence if σ and π are partitions of [n] for which π = {A_1, . . . , A_k} has k parts, each of which breaks into two parts in σ, then the interval [σ, π] in Π_n is isomorphic to B_k.
Theorem 5.7.4 (The Product Theorem) Let P and Q be locally finite posets with Möbius functions μ_P and μ_Q, respectively, and let P × Q be their direct product with Möbius function μ_{P×Q}. Then whenever (x, y) ≤ (x′, y′) in P × Q,

    μ_{P×Q}((x, y), (x′, y′)) = μ_P(x, x′)μ_Q(y, y′).

Proof:

    Σ_{(u,v): (x,y)≤(u,v)≤(x′,y′)} μ_P(x, u)μ_Q(y, v)
        = ( Σ_{u: x≤u≤x′} μ_P(x, u) ) ( Σ_{v: y≤v≤y′} μ_Q(y, v) )
        = δ_{x,x′} δ_{y,y′} = δ_{(x,y),(x′,y′)}.

Also

    Σ_{(u,v): (x,y)≤(u,v)≤(x′,y′)} μ_{P×Q}((x, y), (u, v)) = δ_{(x,y),(x′,y′)}.

But since Σ_{z: x≤z≤y} μ(x, z) = δ_{x,y} in a poset with Möbius function μ determines μ uniquely, it must be that μ_{P×Q} = μ_P μ_Q.

Using the product theorem we can say a great deal about the Möbius functions of the three examples above.
Theorem 5.7.5 If μ is the Möbius function of B_n, then μ(S, T) = (−1)^{|T−S|} if S ⊆ T, for all S, T ∈ B_n.

Proof: A subset S (of [n], say) corresponds to an n-tuple (s_1, . . . , s_n) with each s_i equal to 0 or 1. Similarly, T ↔ (t_1, . . . , t_n). Then S ⊆ T iff s_i ≤ t_i for all i = 1, . . . , n. Then (with a natural abuse of notation) μ_i(s_i, t_i) = 1 or −1 according as s_i = t_i or s_i ≠ t_i. And μ(S, T) = ∏_{i=1}^{n} μ_i(s_i, t_i) = ∏_{i=1}^{n} (−1)^{t_i − s_i} = (−1)^{|T−S|}.
It follows that the two versions of Möbius inversion for B_n become:

Theorem 5.7.6 Let f, g : B_n → K. Then

(i) g(S) = Σ_{T: T⊆S} f(T) for all S ∈ B_n iff f(S) = Σ_{T: T⊆S} (−1)^{|S−T|} g(T) for all S ∈ B_n;

and

(ii) g(S) = Σ_{T: T⊇S} f(T) for all S ∈ B_n iff f(S) = Σ_{T: T⊇S} (−1)^{|T−S|} g(T) for all S ∈ B_n.

Either of these two statements is called the Principle of Inclusion-Exclusion.
The following is a standard combinatorial situation involving inclusion-exclusion. We are given a set E of objects and a set P = {P_1, . . . , P_n} of properties. Each object in E either does or does not have each of the properties. (We may actually think of P_i as just being a subset of E, but without the assumption that P_i ≠ P_j when i ≠ j.) For each collection T = {P_{i_1}, . . . , P_{i_k}} of the properties (i.e., for T ⊆ P), let f(T) = |{x ∈ E : x ∈ P_i iff P_i ∈ T}|. And let g(T) = |{x ∈ E : x ∈ P_i whenever P_i ∈ T}|. Then g(T) = Σ_{S: S⊇T} f(S). (This just says that an object x of E has all the properties in T iff it has precisely the properties in S for some S ⊇ T.) So by the second version of Möbius inversion, we have

    f(T) = Σ_{S: S⊇T} (−1)^{|S−T|} g(S).      (5.16)

In particular, the number of objects x of E having none of the properties in P is given by

    f(∅) = Σ_{Y⊆P} (−1)^{|Y|} g(Y).   (By convention g(∅) = |E|.)      (5.17)

We look at three classical applications.
Application 1. Let A_1, . . . , A_n be subsets of some (finite) set E. Then by Eq. 5.17 the number of objects of E in none of the A_i is

    |E − (A_1 ∪ · · · ∪ A_n)| = |E| − Σ_{i=1}^{n} |A_i| + Σ_{1≤i<j≤n} |A_i ∩ A_j|
        − Σ_{1≤i<j<k≤n} |A_i ∩ A_j ∩ A_k| + · · · + (−1)^n |A_1 ∩ A_2 ∩ · · · ∩ A_n|.      (5.18)
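Eq. 5.18 is easy to verify mechanically. The sketch below (ours; `none_of` is an invented name) sums over all subcollections of the A_i with alternating signs and compares the result with a direct count:

```python
from itertools import combinations

# Inclusion-exclusion count of elements of E in none of the sets A[i].
E = set(range(30))
A = [set(range(0, 30, 2)), set(range(0, 30, 3)), set(range(0, 30, 5))]

def none_of(E, A):
    total = 0
    for k in range(len(A) + 1):
        for idx in combinations(range(len(A)), k):
            # Intersection over the chosen sets; the empty intersection is E.
            inter = E if not idx else set.intersection(*(A[i] for i in idx))
            total += (-1) ** k * len(inter)
    return total
```

Here the answer counts the integers 0–29 divisible by none of 2, 3, 5.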
Application 2 (Derangements). Let E = S_n, the set of all permutations of the elements of [n]. Let A_i = {π ∈ S_n : π(i) = i}, i = 1, . . . , n. If T is a set of j of the A_i's, then g(T) = |{π ∈ S_n : π(i) = i for all A_i ∈ T}| = (n − j)!. It follows that the number D(n) of derangements in S_n (i.e., π ∈ S_n with π(i) ≠ i for all i, so π ∈ S_n − (A_1 ∪ · · · ∪ A_n)) is given by

    D(n) = Σ_{T: T⊆{A_1,...,A_n}} (−1)^{|T|} g(T) = Σ_{i=0}^{n} (−1)^i C(n, i)(n − i)!
         = n! Σ_{i=0}^{n} (−1)^i / i!.      (5.19)
This is a special case of a general situation: suppose f(T) = f(T′) whenever |T| = |T′|. (As above, f(T) is the number of objects of E having a property P_i if and only if P_i ∈ T.) Then g(T) = Σ_{S: S⊇T} f(S) also depends only on |T|. So for each i, 0 ≤ i ≤ n, if |T| = i, let a(n − i) = f(T) and b(n − i) = g(T). Then g(T) = Σ_{S: S⊇T} f(S) becomes b(n − i) = Σ_{j=i}^{n} C(n − i, j − i) a(n − j), or, writing m = n − i, k = j − i, we have

    b(m) = Σ_{k=0}^{m} C(m, k) a(m − k) = Σ_{i=0}^{m} C(m, i) a(i).      (5.20)

And f(T) = Σ_{S: S⊇T} (−1)^{|S−T|} g(S) becomes

    a(n − i) = Σ_{j=i}^{n} (−1)^{j−i} C(n − i, j − i) b(n − j),

which we rewrite as

    a(m) = Σ_{k=0}^{m} (−1)^k C(m, k) b(m − k) = Σ_{k=0}^{m} (−1)^{m−k} C(m, k) b(k).      (5.21)

Hence the Möbius inversion formula says:

    b(m) = Σ_{i=0}^{m} C(m, i) a(i) for 0 ≤ m ≤ n
    iff a(m) = Σ_{i=0}^{m} (−1)^{m−i} C(m, i) b(i), 0 ≤ m ≤ n.      (5.22)
Exercise: 5.7.7 If A is the matrix whose (i, j) entry is C(j, i), 0 ≤ i, j ≤ n, then A^{-1} is the matrix whose (i, j) entry is (−1)^{j−i} C(j, i). (Hint: Try putting b(m) = (x + 1)^m and a(m) = x^m for 0 ≤ m ≤ n.)
We give an explicit example of the above to illustrate the simple nature of the statement of the result when viewed in matrix form.

    [ 1 1 1 1 1 ]^{-1}   [ 1 -1  1 -1  1 ]
    [ 0 1 2 3 4 ]        [ 0  1 -2  3 -4 ]
    [ 0 0 1 3 6 ]      = [ 0  0  1 -3  6 ]
    [ 0 0 0 1 4 ]        [ 0  0  0  1 -4 ]
    [ 0 0 0 0 1 ]        [ 0  0  0  0  1 ]
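The matrix identity above can be confirmed in a few lines (a sketch of ours, not from the text), multiplying the two matrices and checking that the product is the identity:

```python
from math import comb

# A has entries C(j, i); B has entries (-1)^(j-i) C(j, i), 0 <= i, j < 5.
n = 5
A = [[comb(j, i) for j in range(n)] for i in range(n)]
B = [[(-1) ** (j - i) * comb(j, i) for j in range(n)] for i in range(n)]

# Plain matrix product AB, which should be the identity matrix.
AB = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
      for i in range(n)]
```

Since comb(j, i) = 0 for j < i, both matrices are automatically upper triangular.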
Application 3 (Euler's phi-function again). Let n be a positive integer and suppose p_1, . . . , p_k are the distinct prime divisors of n. Put E = [n] and A_i = {x ∈ E : p_i | x}, i = 1, . . . , k. First note that |A_{i_1} ∩ A_{i_2} ∩ · · · ∩ A_{i_t}| = n / (p_{i_1} · · · p_{i_t}). Then the principle of inclusion-exclusion gives

    φ(n) = n − Σ_{1≤i≤k} n/p_i + Σ_{1≤i<j≤k} n/(p_i p_j) − · · · + (−1)^k n/(p_1 · · · p_k) = n ∏_{i=1}^{k} (1 − 1/p_i).

It is easy to show that this agrees with our formula developed quite some time ago.
Note:

    1 − Σ 1/p_i + Σ 1/(p_i p_j) − · · · + (−1)^k 1/(p_1 · · · p_k) = Σ_{d|n} μ(d)/d,

where μ is the classical Möbius function. So φ(n) = Σ_{d|n} μ(d) n/d. Now using classical Möbius inversion (in reverse), n = Σ_{d|n} φ(d), a familiar equality.
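The two expressions for φ just derived can be compared numerically. The sketch below (ours; the helper names are invented, and `mu` uses plain trial factorization) checks the direct count against φ(n) = Σ_{d|n} μ(d) n/d, and also the companion identity n = Σ_{d|n} φ(d):

```python
from math import gcd

def mu(m):
    """Classical Möbius function, by trial factorization."""
    result, p = 1, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0            # square factor: mu vanishes
            result = -result
        p += 1
    return -result if m > 1 else result

def phi_direct(n):
    """Count the integers in [n] coprime to n."""
    return sum(1 for x in range(1, n + 1) if gcd(x, n) == 1)

def phi_mobius(n):
    """phi(n) = sum over d | n of mu(d) * n/d."""
    return sum(mu(d) * (n // d) for d in range(1, n + 1) if n % d == 0)
```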
We now propose to illustrate the connection between classical Möbius inversion and our new version over posets. Recall Example 5.7.2 from the beginning of this section: P = (n_1 + 1) × · · · × (n_k + 1), as well as the Möbius function for chains as worked out in Section 5.6. By the product theorem, if [(a_1, . . . , a_k), (b_1, . . . , b_k)] is an interval in P,

    μ((a_1, . . . , a_k), (b_1, . . . , b_k)) = { (−1)^{Σ(b_i − a_i)}, if each b_i − a_i = 0 or 1;  0, otherwise.      (5.23)

Now suppose n is a positive integer of the form n = p_1^{n_1} · · · p_k^{n_k}, where p_1, . . . , p_k are distinct primes. Let D_n be the poset of positive integral divisors of n, ordered by division (i.e., i ≤ j in D_n iff i|j). We identify P above with D_n according to the following scheme: (a_1, . . . , a_k) ∈ P corresponds to p_1^{a_1} · · · p_k^{a_k} in D_n. (Here it is convenient to let the elements of n_k + 1 be 0, 1, . . . , n_k.) Then Eq. 5.23, when interpreted for D_n, becomes: For r, s ∈ D_n,

    μ(r, s) = { (−1)^t, if s/r is a product of t distinct primes;  0, otherwise.      (5.24)
In other words, μ(r, s) is just the classical Möbius function μ(s/r). Then our new Möbius inversion formula in D_n looks like:

    g(m) = Σ_{d|m} f(d), m|n, iff f(m) = Σ_{d|m} g(d)μ(d, m), m|n.

Writing μ(m/d) in place of μ(d, m) gives:

    g(m) = Σ_{d|m} f(d) for all m|n iff f(m) = Σ_{d|m} μ(m/d) g(d) for all m|n.

As n is arbitrary, this is just the classical Möbius inversion formula.

At this point we have seen that the classical Principle of Inclusion-Exclusion is just Möbius inversion over B_n and the classical Möbius inversion formula is just Möbius inversion over D_n.
Exercise: 5.7.8 (The Möbius Function of the Poset Π_n, n ≥ 1.) Recall that to make Π_n into a poset, for σ, π ∈ Π_n, we defined σ ≤ π iff each part of σ is contained in some part of π. For π ∈ Π_n, define ℓ(π) to be the number of parts of π. Example: If π = {{1, 3, 5}, {7, 8}, {2, 4, 6}}, then ℓ(π) = 3. The goal of this exercise is to compute the Möbius function of Π_n. The underlying field of coefficients is denoted by K. The exercise is broken into ten small steps.

Step 1. Π_n has a 0̂ and a 1̂, with ℓ(π) = n iff π = 0̂ and ℓ(π) = 1 iff π = 1̂.

Step 2. Let π = {B_1, . . . , B_k} ∈ Π_n. Suppose that B_i is partitioned into λ_i blocks in σ. The interval [σ, π] in Π_n is isomorphic to the direct product Π_{λ_1} × · · · × Π_{λ_k}. Illustrate this with σ, π partitions of {1, 2, 3, 4, 5, 6, 7, 8, 9}. As a special case, for σ ∈ Π_n, [σ, 1̂] ≅ Π_{ℓ(σ)}.
Step 3. For each positive integer n, let μ_n = μ(0̂, 1̂), where μ is the Möbius function of Π_n. Then using the notation of Step 2, i.e., [σ, π] ≅ Π_{λ_1} × · · · × Π_{λ_k}, we have μ(σ, π) = μ_{λ_1} · · · μ_{λ_k}.

Step 4. Recall (from where?) that x^n = Σ_{k=0}^{n} S(n, k)(x)_k. Then for each positive integer m,

    m^n = Σ_{σ∈Π_n} (m)_{ℓ(σ)}.
Define f : Π_n → K : σ ↦ (m)_{ℓ(σ)}, and g : Π_n → K : σ ↦ m^{ℓ(σ)}. Then

    g(0̂) = Σ_{σ≥0̂} f(σ).

Step 5. For each π ∈ Π_n, (briefly justify each step)

    g(π) = m^{ℓ(π)} = Σ_{σ∈Π_{ℓ(π)}} (m)_{ℓ(σ)} = Σ_{σ∈Π_n: σ≥π} (m)_{ℓ(σ)} = Σ_{σ∈Π_n: σ≥π} f(σ).

For each π ∈ Π_n, the poset P_π = {σ ∈ Π_n : σ ≥ π} = [π, 1̂] is isomorphic to Π_{ℓ(π)}. So π in Π_n is 0̂ in P_π. And g(0̂) = Σ_{σ≥0̂} f(σ) stated for P_π says that for all π ∈ Π_n, g(π) = Σ_{σ≥π} f(σ). To this we may apply Möbius inversion to obtain

    f(π) = Σ_{σ≥π} μ(π, σ)g(σ).
Step 6. For each π ∈ Π_n,

    f(π) = Σ_{σ∈Π_n: σ≥π} μ(π, σ) m^{ℓ(σ)}.

In this put π = 0̂ to obtain

    (m)_n = f(0̂) = Σ_{σ∈Π_n} μ(0̂, σ) m^{ℓ(σ)} = Σ_k ( Σ_{σ∈Π_n: ℓ(σ)=k} μ(0̂, σ) ) m^k.

As this holds for infinitely many m, we have a polynomial identity:

    (x)_n = Σ_{k=0}^{n} ( Σ_{σ∈Π_n: ℓ(σ)=k} μ(0̂, σ) ) x^k.
Recall (from where?) that (x)_n = Σ_{k=0}^{n} s(n, k)x^k, where s(n, k) = (−1)^{n−k} c(n, k) is a Stirling number of the first kind. So comparing the coefficients on x^k we find once again that s(n, k) = Σ_{σ∈Π_n: ℓ(σ)=k} μ(0̂, σ) = w_{n−k} (which is called the (n − k)th Whitney number of Π_n of the first kind).
Step 7. For each positive integer m,

    (m)_n = Σ_{k=1}^{n} ( Σ_{σ∈Π_n: ℓ(σ)=k} μ(0̂, σ) ) m^k.

Step 8. As polynomials in x we have

    (x)_n = Σ_{k=1}^{n} ( Σ_{σ∈Π_n: ℓ(σ)=k} μ(0̂, σ) ) x^k,

and

    Σ_{σ∈Π_n: ℓ(σ)=k} μ(0̂, σ) = (−1)^{n−k} c(n, k), 1 ≤ k ≤ n.

Step 9. μ(0̂, 1̂) = (−1)^{n−1}(n − 1)!.

Step 10. If π = {B_1, . . . , B_k} and B_i breaks into λ_i parts in σ, then

    μ(σ, π) = ∏_{i=1}^{k} (−1)^{λ_i − 1}(λ_i − 1)!.
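Step 9 can be verified by brute force for small n. The sketch below (ours, not part of the exercise; all names are invented) enumerates every set partition of [n], computes μ(0̂, π) directly from the defining recursion, and checks μ(0̂, 1̂) against (−1)^{n−1}(n − 1)!:

```python
from itertools import combinations

def partitions(s):
    """All set partitions of the sequence s, as frozensets of frozensets."""
    s = list(s)
    if not s:
        yield frozenset()
        return
    first, rest = s[0], s[1:]
    for k in range(len(rest) + 1):
        for block_rest in combinations(rest, k):
            block = frozenset((first,) + block_rest)
            remaining = [x for x in rest if x not in block_rest]
            for p in partitions(remaining):
                yield p | {block}

def refines(sigma, pi):
    """sigma <= pi in Pi_n: each block of sigma lies inside a block of pi."""
    return all(any(b <= c for c in pi) for b in sigma)

def mu_bottom(n):
    """mu(0-hat, 1-hat) in Pi_n via mu(0,pi) = -sum of mu(0,s) over s < pi."""
    Pi = list(partitions(range(n)))
    zero_hat = frozenset(frozenset([i]) for i in range(n))
    one_hat = frozenset([frozenset(range(n))])
    mu = {}
    for pi in sorted(Pi, key=len, reverse=True):   # finer partitions first
        if pi == zero_hat:
            mu[pi] = 1
        else:
            mu[pi] = -sum(mu[s] for s in Pi if refines(s, pi) and s != pi)
    return mu[one_hat]
```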
5.8 More Applications of Möbius Inversion

Consider the following three familiar sequences of polynomials.

(1) The power sequence: x^n, n = 0, 1, . . ..

(2) The falling factorial sequence: (x)_n = x(x − 1) · · · (x − n + 1), n = 0, 1, . . ..

(3) The rising factorial sequence (written ⟨x⟩_n here): ⟨x⟩_n = x(x + 1) · · · (x + n − 1), n = 0, 1, . . ..

For n = 0 we have the following conventions: x^0 = (x)_0 = ⟨x⟩_0 = 1.

Theorem 5.8.1 For positive integers m, n we have the following:

(i) m^n = |{f : [n] → [m]}| = |[m]^{[n]}|.

(ii) (m)_n = |{f : [n] → [m] : f is one-to-one}|.

(iii) ⟨m⟩_n = |{f : [n] → [m] : f is a disposition, i.e., for each d ∈ [m], f^{-1}(d) is assigned a linear order}|.

Proof: The first two identities need no further explanation, but the third one probably does. A disposition may be visualized as a placing of n distinguishable flags on m distinguishable flagpoles. The poles are not ordered, but the flags on each pole are ordered. For the first flag there are m choices of flagpole. For the second flag there are m − 1 choices of pole other than the one flag 1 is on. On that pole there are two choices, giving a total of m + 1 choices for flag 2. Similarly, it is easy to see that there is one more choice for flag k + 1 than there was for flag k. Hence the number of ways to assign all n flags is m(m + 1) · · · (m + n − 1) = ⟨m⟩_n.
Theorem 5.8.2 For each n ∈ ℕ we have the following:

(i) x^n = Σ_{k=0}^{n} S(n, k)(x)_k. This is Theorem 1.7.2.

(i)′ (x)_n = Σ_{k=0}^{n} s(n, k)x^k. This is Corollary 1.5.6.

(ii) ⟨x⟩_n = Σ_{k=1}^{n} (n!/k!) C(n−1, k−1) (x)_k.

(ii)′ (x)_n = Σ_{k=1}^{n} (−1)^{n−k} (n!/k!) C(n−1, k−1) ⟨x⟩_k.

(iii) ⟨x⟩_n = Σ_k c(n, k)x^k. This is Theorem 1.5.5.

(iii)′ x^n = Σ_k (−1)^{n−k} S(n, k)⟨x⟩_k. This is Corollary 1.7.3.
Proof: The only two parts that need proving are (ii) and (ii)′, and we now establish (ii).

A linear partition λ of [n] is a partition of [n] together with a total order on the numbers in each part of λ. The parts themselves are unordered. Let L_n denote the collection of all linear partitions of [n], and let ℓ(λ) denote the number of blocks of λ. Each disposition from [n] to [m] may be thought of as a pair consisting of a linear partition λ of [n] into k blocks and a one-to-one function g mapping [k] to [m]. Since ⟨m⟩_n counts the total number of dispositions of [n] and (m)_k counts the one-to-one functions from [k] to [m], we have

    ⟨m⟩_n = Σ_{λ∈L_n} (m)_{ℓ(λ)}.      (5.25)

To obtain the number of linear partitions of [n] into k blocks, note that there are n! C(n−1, k−1) linear partitions of [n] with k ordered blocks. Visualize this as a placing of k − 1 slashes into the n − 1 interior spaces of a permutation (ordered array) of [n], at most one slash per space. Then divide by k! to get unordered blocks:

    (n!/k!) C(n−1, k−1) = # linear partitions of [n] with k blocks.      (5.26)

These numbers (n!/k!) C(n−1, k−1) are called Lah numbers. Then (ii) follows immediately from Eqs. 5.25 and 5.26. Now replacing x with −x interchanges (ii) and (ii)′.
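Identity (ii) is convenient to sanity-check numerically. The sketch below (ours; the helper names `falling`, `rising`, `lah`, and `rhs` are invented) evaluates both sides at many integer points:

```python
from math import comb, factorial

def falling(x, n):
    """(x)_n = x (x-1) ... (x-n+1)."""
    out = 1
    for i in range(n):
        out *= x - i
    return out

def rising(x, n):
    """<x>_n = x (x+1) ... (x+n-1)."""
    out = 1
    for i in range(n):
        out *= x + i
    return out

def lah(n, k):
    """Lah number n!/k! C(n-1, k-1)."""
    return factorial(n) // factorial(k) * comb(n - 1, k - 1)

def rhs(x, n):
    """Right-hand side of identity (ii)."""
    return sum(lah(n, k) * falling(x, k) for k in range(1, n + 1))
```

Agreement at more than n integer points forces the polynomial identity.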
We now do Möbius inversion on each of three carefully chosen posets to explore the relationship between (a) and (a)′, for a = i, ii, and iii.

Let Π_n be the set of all partitions of [n] made into a poset by: for σ, π ∈ Π_n, σ ≤ π iff each part of σ is contained in some part of π. In the proof of Eq. 5.3 (which is equivalent to (i)) we obtained the following:

    m^n = Σ_{π∈Π_n} (m)_{ℓ(π)}.      (5.27)

Define f : Π_n → K by f(π) = (m)_{ℓ(π)} and define g : Π_n → K by g(π) = m^{ℓ(π)}. Since 0̂ in Π_n is 0̂ = {{1}, {2}, {3}, . . . , {n}}, and ℓ(π) = n iff π = 0̂, we have

    g(0̂) = m^n = Σ_{π∈Π_n} (m)_{ℓ(π)} = Σ_{π≥0̂} f(π).      (5.28)

For each π ∈ Π_n, the poset P_π = {σ ∈ Π_n : σ ≥ π} = [π, 1̂] is isomorphic to Π_{ℓ(π)}. So π ∈ Π_n is 0̂ in P_π. And Eq. 5.28 applied to P_π says:

    g(π) = Σ_{σ≥π} f(σ), for all π ∈ Π_n.      (5.29)

Apply Möbius inversion to Eq. 5.29 to obtain

    f(π) = Σ_{σ≥π} μ(π, σ)g(σ) = Σ_{σ≥π} μ(π, σ)m^{ℓ(σ)}.      (5.30)
Putting π = 0̂ yields

    (m)_n = f(0̂) = Σ_{σ∈Π_n} μ(0̂, σ)m^{ℓ(σ)} = Σ_{k=1}^{n} ( Σ_{σ∈Π_n and ℓ(σ)=k} μ(0̂, σ) ) m^k.      (5.31)

As this holds for infinitely many m, we have a polynomial identity:

    (x)_n = Σ_{k=1}^{n} ( Σ_{σ∈Π_n and ℓ(σ)=k} μ(0̂, σ) ) x^k.      (5.32)

Comparing Eq. 5.32 with (i)′ we see that

    s(n, k) = Σ_{σ∈Π_n and ℓ(σ)=k} μ(0̂, σ) = w_{n−k},      (5.33)

the (n − k)th Whitney number of Π_n of the first kind. This shows that (i) and (i)′ are related by Möbius inversion on Π_n.
Putting k = 1 in Eq. 5.32 (ℓ(σ) = 1 iff σ = {{1, 2, . . . , n}} = 1̂) yields:

    μ(0̂, 1̂) is the coefficient of x in (x)_n, which is (−1)^{n−1}(n − 1)!.      (5.34)

If σ has type (a_1, . . . , a_k), i.e., σ has a_i parts of size i, then [0̂, σ] ≅ (Π_1)^{a_1} × (Π_2)^{a_2} × · · · × (Π_k)^{a_k}. Hence μ(0̂, σ) = ∏_{i=1}^{k} [(−1)^{i−1}(i − 1)!]^{a_i}. Putting this in Eq. 5.33 yields a rather strange formula for s(n, k).
For our second example turn to the set L_n of all linear partitions of [n]. For λ, ρ ∈ L_n, say λ ≤ ρ iff each block of ρ can be obtained by juxtaposition of blocks of λ. Then L_n is a finite poset.

Fix a positive integer m. Define f : L_n → K by

    f(λ) = (m)_{ℓ(λ)}.      (5.35)

Define g : L_n → K by

    g(λ) = ⟨m⟩_{ℓ(λ)}.      (5.36)

Note that for λ ∈ L_n, λ = 0̂ iff ℓ(λ) = n. Then Eq. 5.25 implies that

    ⟨m⟩_n = g(0̂) = Σ_{λ≥0̂} (m)_{ℓ(λ)} = Σ_{λ≥0̂} f(λ).      (5.37)

Exercise: 5.8.3 Show that P_λ = {ρ ∈ L_n : ρ ≥ λ} is isomorphic to L_{ℓ(λ)}.

So Eq. 5.37 generalizes to

    g(λ) = Σ_{ρ≥λ} f(ρ) for all λ ∈ L_n.      (5.38)

Then Möbius inversion gives

    f(λ) = Σ_{ρ≥λ} g(ρ)μ(λ, ρ) for all λ ∈ L_n.      (5.39)

Putting λ = 0̂ in Eq. 5.39 yields
    (m)_n = Σ_{ρ∈L_n} μ(0̂, ρ)⟨m⟩_{ℓ(ρ)}.      (5.40)

For each ρ ∈ L_n, the interval B_ρ = [0̂, ρ] is Boolean. For example, if ρ = {(1, 2), (3, 4, 5, 6), (7)}, there are 1 + 3 + 0 = 4 places to put slashes between members of one (ordered) part to obtain lower linear partitions. So the set whose subsets form the Boolean poset is the set of positions between members of a same part of ρ. And μ(0̂, ρ) must then be (−1)^k, where k is the total number of positions between members of a same part of ρ. If ρ has ℓ(ρ) parts, then k = n − ℓ(ρ). Hence from Eq. 5.40 we have

    (m)_n = Σ_{ρ∈L_n} μ(0̂, ρ)⟨m⟩_{ℓ(ρ)} = Σ_{ρ∈L_n} (−1)^{n−ℓ(ρ)}⟨m⟩_{ℓ(ρ)}
          = Σ_{k=1}^{n} (−1)^{n−k} (n!/k!) C(n−1, k−1) ⟨m⟩_k.      (5.41)

This holds for all positive integers m, so yields a polynomial identity

    (x)_n = Σ_{k=1}^{n} (−1)^{n−k} (n!/k!) C(n−1, k−1) ⟨x⟩_k.      (5.42)

So Eq. 5.42, which is (ii)′, is related to (ii) by Möbius inversion on L_n.
For the third example, make S_n into a poset as follows. Always write a permutation π ∈ S_n as a product of disjoint cycles so that in each cycle the smallest element always comes first (furthest to the left in the cycle). Then given σ, π ∈ S_n, say σ ≤ π iff each cycle of σ is composed of a string of consecutive integers from some cycle of π. For example, (12)(3) ≤ (123), (1)(23) ≤ (123), but (13)(2) ≰ (123). See Example 5.2.7 where we gave the Hasse diagram of the interval [0̂, π].

Equation (iii) of Theorem 5.8.1 can be written as

    ⟨m⟩_n = Σ_{π≥0̂} m^{c(π)}, where c(π) is the number of cycles of π.      (5.43)

So c(π) = n iff π = 0̂.

Fix π = c_1 c_2 · · · c_k ∈ S_n (where, of course, each cycle c_j is written with its smallest element first, and if i < j, the smallest element of c_i is less than the smallest element of c_j). Then σ ≤ π iff each cycle of π is made up of the juxtaposition of some cycles of σ. It follows that P_π = {σ ∈ S_n : σ ≥ π} is isomorphic to S_k. So Eq. 5.43 generalizes to

    ⟨m⟩_{c(π)} = Σ_{σ≥π} m^{c(σ)}.      (5.44)
Define f : S_n → K and g : S_n → K by

    f(π) = ⟨m⟩_{c(π)} and g(π) = m^{c(π)}, for all π ∈ S_n.      (5.45)

Then Eq. 5.44 says

    f(π) = Σ_{σ≥π} g(σ), for all π ∈ S_n.      (5.46)

So by Möbius inversion we have

    g(π) = Σ_{σ≥π} μ(π, σ)f(σ), for all π ∈ S_n.      (5.47)

Putting π = 0̂ in Eq. 5.47 gives

    m^n = Σ_{σ∈S_n} μ(0̂, σ)⟨m⟩_{c(σ)}.      (5.48)

We now wish to evaluate μ(0̂, σ). Say σ ∈ S_n is increasing if each of its cycles increases. So if (i_1, . . . , i_s) is a cycle of σ, then i_1 < i_2 < · · · < i_s.

Lemma 5.8.4 The Möbius function for S_n satisfies the following: For each σ ∈ S_n,

    μ(0̂, σ) = { (−1)^{n−c(σ)}, if σ is increasing;  0, otherwise.      (5.49)
Proof: Given σ ∈ S_n, consider the interval [0̂, σ]. The atoms of [0̂, σ] correspond to transpositions (i_r, i_{r+1}) where (i_r, i_{r+1}) is a substring of a cycle of σ and i_r < i_{r+1}. Thus if σ is increasing, the atoms of I_σ = [0̂, σ] correspond to all of the possible n − c(σ) transpositions. In that case I_σ is Boolean, and μ(0̂, σ) = (−1)^{n−c(σ)}. So suppose σ is not increasing. Then some cycle of σ has a consecutive pair (. . . , i_r, i_{r+1}, . . .) with i_r > i_{r+1}. Form a new permutation σ′ from σ by inserting a pair )( of parentheses between i_r, i_{r+1} for every consecutive pair (. . . , i_r, i_{r+1}, . . .) of all cycles where i_r > i_{r+1}. Then σ′ ≥ τ for every atom τ of [0̂, σ], and σ′ < σ. It follows that in the upper Möbius algebra A_∨([0̂, σ], K), if X is the set of atoms of [0̂, σ],

    ∏_{x∈X} x = Σ_{τ: τ≥x for all x∈X} σ_τ has σ_{σ′} as a summand.      (5.50)

Hence no (nonempty) product of atoms ever equals σ. Then by the dual of Theorem 5.10.4 (with σ the 1̂ of [0̂, σ]), μ(0̂, σ) = 0.
Now Eqs. 5.48 and 5.49 give

    m^n = Σ_{σ∈S_n, σ increasing} (−1)^{n−c(σ)}⟨m⟩_{c(σ)} = Σ_k (−1)^{n−k}⟨m⟩_k ( Σ_{σ increasing and c(σ)=k} 1 ).      (5.51)

Since the number of increasing permutations σ with k cycles is easily seen to be equal to S(n, k) (the number of partitions of [n] with k blocks), we have derived (iii)′ from (iii) by Möbius inversion on S_n.

Recapitulation: (i) and (i)′ are related by Möbius inversion on Π_n; (ii) and (ii)′ are related by Möbius inversion on L_n; and (iii) and (iii)′ are related by Möbius inversion on S_n.
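The identity (iii)′ just derived can be sketched numerically (our code, not from the text; `stirling2`, `rising`, and `rhs` are invented names), using the standard recurrence for Stirling numbers of the second kind:

```python
def stirling2(n, k):
    """S(n, k) via S(n, k) = k S(n-1, k) + S(n-1, k-1)."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def rising(x, n):
    """<x>_n = x (x+1) ... (x+n-1)."""
    out = 1
    for i in range(n):
        out *= x + i
    return out

def rhs(m, n):
    """Right-hand side of (iii)': sum of (-1)^(n-k) S(n, k) <m>_k."""
    return sum((-1) ** (n - k) * stirling2(n, k) * rising(m, k)
               for k in range(n + 1))
```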
5.9 Lattices and Gaussian Coefficients

A lattice L is a poset with the property that any finite subset S ⊆ L has a meet (or greatest lower bound), that is, an element b ∈ L for which

1. b ≤ a for all a ∈ S, and

2. if c ≤ a for all a ∈ S, then c ≤ b.

And dually, there is a join (or least upper bound), i.e., an element b ∈ L for which

1′. a ≤ b for all a ∈ S, and

2′. if a ≤ c for all a ∈ S, then b ≤ c.

The meet and join of a two element set S = {x, y} are denoted, respectively, by x ∧ y and x ∨ y. It is easily seen that ∧ and ∨ are commutative, associative, idempotent binary operations. Moreover, if all 2-element subsets have meets and joins, then any finite subset has a meet and a join.

The lattices we will consider have the property that there are no infinite chains. Such a lattice has a (unique) least element (denoted 0̂ or 0_L), because the condition that no infinite chains exist allows us to find a minimal element m, and any minimal element m is a minimum, since if m ≰ a, then m ∧ a would be less than m. Similarly, there is a unique largest element 1_L (or 1̂).

For elements a and b of a poset, we say a covers b and write a ⋗ b, provided a > b but there are no elements c with a > c > b. For example, when U and W are linear subspaces of a vector space, then U ⋗ W iff U ⊇ W and dim(U) = dim(W) + 1. A point of a lattice with 0̂ is an element that covers 0̂. A copoint of a lattice with 1̂ is an element covered by 1̂.
Theorem 5.9.1 (L. Weisner, 1935) Let μ be the Möbius function of a finite lattice L, and let a ∈ L with a > 0̂. Then

    Σ_{x: x∨a=1̂} μ(0̂, x) = 0.

Proof: Fix a. Put S := Σ_{x,y∈L} μ(0̂, x)ζ(x, y)ζ(a, y)μ(y, 1̂). Now compute S in two different ways.

    S = Σ_{x∈L} Σ_{y: y≥x, y≥a} μ(0̂, x)μ(y, 1̂) = Σ_x μ(0̂, x) Σ_{y: y≥x, y≥a} μ(y, 1̂)

      = Σ_x μ(0̂, x) Σ_{y: y≥x∨a} μ(y, 1̂)

      = Σ_x μ(0̂, x) Σ_{x∨a≤y≤1̂} μ(y, 1̂) = Σ_x μ(0̂, x) · { 1, if x ∨ a = 1̂;  0, otherwise }

      = Σ_{x: x∨a=1̂} μ(0̂, x), which is the sum in the theorem.

Also,

    S = Σ_{y≥a} μ(y, 1̂) Σ_{0̂≤x≤y} μ(0̂, x) = Σ_{y≥a} μ(y, 1̂) · 0 = 0,

since y ≥ a > 0̂.
Let $V_n(q)$ denote an $n$-dimensional vector space over $F_q = GF(q)$. The term $k$-subspace will denote a $k$-dimensional subspace. It is fairly easy to see that the poset $L_n(q)$ of all subspaces of $V_n(q)$ is a lattice with $\hat 0 = \{0\}$ and $\hat 1 = V_n(q)$. We begin with some counting.
Exercise: 5.9.2 The number of ordered bases for a $k$-subspace of $V_n(q)$ is $(q^k - 1)(q^k - q)(q^k - q^2) \cdots (q^k - q^{k-1})$. How many ordered, linearly independent subsets of size $k$ are there in $V_n(q)$?
To obtain a maximal chain (i.e., a chain of size $n + 1$ containing one subspace of each possible dimension) in the poset $L_n(q)$ of all subspaces of $V_n(q)$, we start with the 0-subspace. After we have chosen an $i$-subspace $U_i$, $0 \le i < n$, we can choose an $(i+1)$-subspace $U_{i+1}$ that contains $U_i$ in
$$\frac{q^n - q^i}{q^{i+1} - q^i}$$
ways, since we can take the span of $U_i$ and any of the $q^n - q^i$ vectors not in $U_i$. But an $(i+1)$-subspace will arise exactly $q^{i+1} - q^i$ times in this manner. Hence the number of maximal chains of subspaces in $V_n(q)$ is:
$$M(n, q) = \frac{q^n - q^0}{q^1 - q^0} \cdot \frac{q^n - q^1}{q^2 - q^1} \cdots \frac{q^n - q^{n-1}}{q^n - q^{n-1}}$$
$$= \frac{(q^n - 1)\, q(q^{n-1} - 1)\, q^2(q^{n-2} - 1) \cdots q^{n-1}(q - 1)}{(q - 1)\, q(q - 1)\, q^2(q - 1) \cdots q^{n-1}(q - 1)} = \frac{(q^n - 1)(q^{n-1} - 1) \cdots (q - 1)}{(q - 1)^n}.$$
This implies that
$$M(n, q) = (q^{n-1} + q^{n-2} + \cdots + q + 1)(q^{n-2} + \cdots + q + 1) \cdots (q + 1).$$
We may consider M(n, q) as a polynomial in q for each integer n. When
the indeterminate q is replaced by a prime power, we have the number of
maximal chains in the poset PG(n, q).
Note: When q is replaced by 1, we have M(n, 1) = n!, which is the
number of maximal chains in the poset of subsets of an n-set.
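The product formula for $M(n, q)$ is easy to compute directly, and the specialization $M(n, 1) = n!$ can be checked numerically. A minimal sketch (the function names `bracket` and `M` are ours):

```python
from math import factorial

def bracket(j, q):
    """(j)_q = 1 + q + ... + q^(j-1), the q-analogue of the integer j."""
    return sum(q ** i for i in range(j))

def M(n, q):
    """Number of maximal chains in the subspace lattice of V_n(q):
    M(n, q) = (n)_q (n-1)_q ... (2)_q (1)_q."""
    out = 1
    for j in range(1, n + 1):
        out *= bracket(j, q)
    return out

# q = 1 recovers maximal chains in the Boolean lattice of subsets: M(n, 1) = n!
assert all(M(n, 1) == factorial(n) for n in range(8))

# Small sanity check: V_2(2) has three 1-subspaces, hence 3 maximal chains.
assert M(2, 2) == 3
```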
The Gaussian number (or Gaussian coefficient) $\binom{n}{k}_q$ can be defined as the number of $k$-subspaces of $V_n(q)$. This holds for $0 \le k \le n$, where $\binom{n}{0}_q = 1$.
To evaluate $\binom{n}{k}_q$, count the number $N$ of pairs $(U, C)$ where $U$ is a $k$-subspace and $C$ is a maximal chain that contains $U$. Since every maximal chain contains one subspace of dimension $k$, clearly $N = M(n, q)$. On the other hand, we get each maximal chain uniquely by appending to a maximal chain in the poset of subspaces of $U$ (of which there are $M(k, q)$) a maximal chain in the poset of all subspaces of $V_n(q)$ that contain $U$. There are $M(n - k, q)$ of these, since the poset $\{W : U \subseteq W \subseteq V\}$ is isomorphic to the poset of subspaces of $V/U$, and $\dim(V/U) = n - k$. Hence
$$M(n, q) = \binom{n}{k}_q M(k, q)\, M(n - k, q),$$
which implies that
$$\binom{n}{k}_q = \frac{M(n, q)}{M(k, q)\, M(n - k, q)} = \binom{n}{n - k}_q$$
$$= \frac{(q^{n-1} + q^{n-2} + \cdots + q + 1)(q^{n-2} + q^{n-3} + \cdots + q + 1) \cdots (q + 1)}{(q^{k-1} + \cdots + 1) \cdots (q + 1) \cdot (q^{n-k-1} + \cdots + q + 1) \cdots (q + 1)}$$
$$= \frac{(q^{n-1} + \cdots + 1)(q^{n-2} + \cdots + 1) \cdots (q^{n-k} + \cdots + 1)}{(q^{k-1} + \cdots + 1) \cdots (q + 1)} = \frac{(q^n - 1)(q^{n-1} - 1) \cdots (q^{n-k+1} - 1)}{(q^k - 1)(q^{k-1} - 1) \cdots (q - 1)}.$$
In fact there is a satisfactory way to generalize the notion and the notation of Gaussian coefficient to the multinomial case. (See the book by R. Stanley for this.) However, for our present purposes it suffices to consider just the binomial case. Define $(0)_q = 1$, and for a positive integer $j$ put $(j)_q = 1 + q + q^2 + \cdots + q^{j-1}$. Then put $(0)!_q = 1$ and for a positive integer $k$, put $(k)!_q = (1)_q (2)_q \cdots (k)_q$. So $(n)!_q = M(n, q)$. With this notation we have
$$\binom{n}{k}_q = \frac{(n)!_q}{(k)!_q\, (n - k)!_q}. \qquad (5.52)$$
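Equation (5.52) translates directly into a short program. The sketch below (function names `qint`, `qfact`, and `gauss` are ours) evaluates the Gaussian coefficient and checks two facts from the discussion above: $\binom{4}{2}_2 = 35$ counts the 2-subspaces of $V_4(2)$, and at $q = 1$ the Gaussian coefficient collapses to the ordinary binomial coefficient.

```python
from math import comb

def qint(j, q):
    """(j)_q = 1 + q + ... + q^(j-1)."""
    return sum(q ** i for i in range(j))

def qfact(k, q):
    """(k)!_q = (1)_q (2)_q ... (k)_q."""
    out = 1
    for j in range(1, k + 1):
        out *= qint(j, q)
    return out

def gauss(n, k, q):
    """Gaussian coefficient per Eq. (5.52): (n)!_q / ((k)!_q (n-k)!_q)."""
    if k < 0 or k > n:
        return 0
    return qfact(n, q) // (qfact(k, q) * qfact(n - k, q))

assert gauss(4, 2, 2) == 35                 # 2-subspaces of V_4(2)
assert gauss(5, 2, 3) == gauss(5, 3, 3)     # symmetry in k and n-k
assert all(gauss(n, k, 1) == comb(n, k) for n in range(8) for k in range(n + 1))
```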
For some purposes it is better to think of $\binom{n}{k}_q$ as a polynomial in an indeterminate $q$ rather than as a function of a prime power $q$. That $\binom{n}{k}_q$ is a polynomial in $q$ is an easy corollary of the following exercise.
Exercise: 5.9.3 Prove the following recurrence:
$$\binom{n}{k}_q = \binom{n-1}{k}_q + q^{n-k} \binom{n-1}{k-1}_q.$$
Exercise: 5.9.4 Prove the following recurrence:
$$\binom{n+1}{k}_q = \binom{n}{k-1}_q + q^k \binom{n}{k}_q.$$
(Hint: There is a completely elementary proof just using the formulas for the symbols.)
Note that the relation of the previous exercise reduces to the binomial
recurrence when q = 1. However, unlike the binomial recurrence, it is not
symmetric.
Exercise: 5.9.5 Show that $\binom{n}{k}_q = \sum_{l \ge 0} a_l\, q^l$, where $a_l$ is the number of partitions of $l$ into at most $k$ parts, each of which is at most $n - k$.
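Exercises 5.9.3 and 5.9.5 can both be checked at once: build the coefficient list of $\binom{n}{k}_q$ as a polynomial in $q$ using the recurrence of Exercise 5.9.3, then compare each coefficient with a brute-force partition count. (The names `gauss_poly` and `partitions` are ours; this is a verification sketch, not the exercise's intended proof.)

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def gauss_poly(n, k):
    """Coefficient tuple (c_0, c_1, ...) of [n choose k]_q as a polynomial in q,
    built from the recurrence of Exercise 5.9.3."""
    if k < 0 or k > n:
        return ()
    if k == 0 or k == n:
        return (1,)
    a = gauss_poly(n - 1, k)            # [n-1 choose k]_q
    b = gauss_poly(n - 1, k - 1)        # shifted by q^(n-k): [n-1 choose k-1]_q
    out = [0] * max(len(a), (n - k) + len(b))
    for i, c in enumerate(a):
        out[i] += c
    for i, c in enumerate(b):
        out[n - k + i] += c
    return tuple(out)

def partitions(l, parts, maxpart):
    """Number of partitions of l into at most `parts` parts, each at most `maxpart`."""
    if l == 0:
        return 1
    if parts == 0 or maxpart == 0:
        return 0
    # Condition on the largest part m; remaining parts are at most m.
    return sum(partitions(l - m, parts - 1, m) for m in range(1, min(l, maxpart) + 1))

n, k = 6, 3
coeffs = gauss_poly(n, k)
assert all(coeffs[l] == partitions(l, k, n - k) for l in range(len(coeffs)))
```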
If we regard a Gaussian coefficient $\binom{n}{k}_q$ as a function of the real variable $q$ (where $n$ and $k$ are fixed integers), then we find that the limit as $q$ goes to 1 of a Gaussian coefficient is a binomial coefficient.
Exercise: 5.9.6
$$\lim_{q \to 1} \binom{n}{k}_q = \binom{n}{k}.$$
Exercise: 5.9.7 (The q-binomial Theorem) Prove that:
$$(1 + x)(1 + qx) \cdots (1 + q^{n-1} x) = \sum_{i=0}^{n} q^{\frac{i(i-1)}{2}} \binom{n}{i}_q x^i, \quad \text{for } n \ge 1.$$
Letting $q \to 1$, we obtain the usual Binomial Theorem.
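The identity of Exercise 5.9.7 can be checked numerically by expanding the left side as a polynomial in $x$ for a fixed $q$ and comparing coefficients. A minimal sketch (names `gauss` and `poly_mul` are ours):

```python
def gauss(n, k, q):
    """Gaussian coefficient via (q^n - 1)...(q^(n-k+1) - 1) / ((q^k - 1)...(q - 1))."""
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (i + 1) - 1
    return num // den

def poly_mul(p, r):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(r) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(r):
            out[i + j] += a * b
    return out

q, n = 3, 5
lhs = [1]                            # (1 + x)(1 + qx)...(1 + q^(n-1) x)
for i in range(n):
    lhs = poly_mul(lhs, [1, q ** i])
rhs = [q ** (i * (i - 1) // 2) * gauss(n, i, q) for i in range(n + 1)]
assert lhs == rhs
```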
Exercise: 5.9.8 Prove that:
$$\binom{n+m}{k}_q = \sum_{i=0}^{k} \binom{n}{i}_q \binom{m}{k-i}_q q^{(n-i)(k-i)}.$$
Define the Gaussian polynomials $g_n(x) \in \mathbb{R}[x]$ as follows: $g_0(x) = 1$; $g_n(x) = (x - 1)(x - q) \cdots (x - q^{n-1})$ for $n > 0$. Clearly the Gaussian polynomials form a basis for $\mathbb{R}[x]$ as a vector space over $\mathbb{R}$.
Theorem 5.9.9 The Gaussian coefficients connect the usual monomials to the Gaussian polynomials, viz.:

(i) $x^n = \sum_{k=0}^{n} \binom{n}{k} (x - 1)^k$;

(ii) $x^n = \sum_{k=0}^{n} \binom{n}{k}_q\, g_k(x)$.
Proof: (i) is a special case of the binomial theorem. And (ii) becomes (i) if $q = 1$. To prove (ii), suppose $V, W$ are vector spaces over $F = GF(q)$ with $\dim(V) = n$ and $|W| = r$. Here $r = q^t$ is any power of $q$ with $t \ge n$. Then $|\mathrm{Hom}_F(V, W)| = r^n$.

Now classify $f \in \mathrm{Hom}_F(V, W)$ according to the kernel subspace $f^{-1}(0) \subseteq V$. Given some subspace $U \subseteq V$, let $u_1, \ldots, u_k$ be an ordered basis of $U$ and extend it to an ordered basis $u_1, \ldots, u_k, u_{k+1}, \ldots, u_n$ of $V$. Then $f^{-1}(0) = U$ iff $f(u_i) = 0$ for $1 \le i \le k$, and $f(u_{k+1}), \ldots, f(u_n)$ are linearly independent vectors in $W$. Now
$$r^n = \sum_{U \subseteq V} (r - 1)(r - q) \cdots (r - q^{\,n - \dim(U) - 1})$$
$$= \sum_{k=0}^{n} \binom{n}{k}_q (r - 1)(r - q) \cdots (r - q^{n-k-1}) = \sum_{k=0}^{n} \binom{n}{k}_q (r - 1)(r - q) \cdots (r - q^{k-1})$$
(use the fact that $\binom{n}{k}_q = \binom{n}{n-k}_q$)
$$= \sum_{k=0}^{n} \binom{n}{k}_q\, g_k(r).$$
As $r$ can be any power of $q$ as long as $r \ge q^n$, the polynomials $x^n$ and $\sum_{k=0}^{n} \binom{n}{k}_q g_k(x)$ agree on infinitely many values of $x$ and hence must be identical.
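Since both sides of (ii) are polynomials of degree $n$ in $x$, checking equality at more than $n$ sample points certifies the identity for a given $q$ and $n$. A sketch (names `gauss` and `g` are ours):

```python
def gauss(n, k, q):
    """Gaussian coefficient via the quotient-of-products formula."""
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (i + 1) - 1
    return num // den

def g(k, x, q):
    """Gaussian polynomial g_k(x) = (x - 1)(x - q)...(x - q^(k-1))."""
    out = 1
    for i in range(k):
        out *= x - q ** i
    return out

q, n = 2, 4
# Equality of degree-n polynomials is certified by agreement at n+1 or more points.
for x in range(-3, 10):
    assert x ** n == sum(gauss(n, k, q) * g(k, x, q) for k in range(n + 1))
```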
The inverse connection can be obtained from the q-binomial theorem (cf. Ex. 5.9.7).
Exercise: 5.9.10 Prove that
$$g_n(x) = \sum_{i=0}^{n} \binom{n}{i}_q\, q^{\binom{n-i}{2}} (-1)^{n-i}\, x^i.$$
(Hint: In Ex. 5.9.7 first replace $x$ with $-x$ and then replace $q$ with $q^{-1}$ and simplify.)
If $(a_n)_{n=0}^{\infty}$ is a given sequence of numbers we have considered its ordinary generating function $\sum_{n \ge 0} a_n x^n$ and its exponential generating function $\sum_{n \ge 0} a_n \frac{x^n}{n!}$. (Also considered in Chapter 4 was the Dirichlet generating series function.) There is a vast theory of Eulerian generating series functions defined by $\sum_{n \ge 0} a_n \frac{x^n}{(n)!_q}$. (See the book by R. Stanley for an introduction to this subject with several references.) The next exercise shows that two specific Eulerian generating functions are inverses of each other.
Exercise: 5.9.11
$$\left( \sum_{k \ge 0} \frac{(-t)^k\, q^{\binom{k}{2}}}{(k)!_q} \right) \left( \sum_{k \ge 0} \frac{t^k}{(k)!_q} \right) = 1.$$
(Hint: Compute the coefficient of $t^n$ separately for $n = 0$ and $n \ge 1$. Then use the q-binomial theorem with $x = -1$.)
Exercise: 5.9.12 Gauss inversion: Let $(u_i)_{i=0}^{\infty}$ and $(v_i)_{i=0}^{\infty}$ be two sequences of real numbers. Then
$$v_n = \sum_{i=0}^{n} \binom{n}{i}_q u_i \ \ (n \ge 0) \iff u_n = \sum_{i=0}^{n} (-1)^{n-i}\, q^{\binom{n-i}{2}} \binom{n}{i}_q v_i \ \ (n \ge 0).$$
(Hint: Use Exercise 5.9.11.)
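Gauss inversion is easy to test as a round trip: apply the forward transform and then the inverse and recover the original sequence. A sketch (names `gauss`, `forward`, `backward` are ours):

```python
def gauss(n, k, q):
    """Gaussian coefficient via the quotient-of-products formula."""
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (i + 1) - 1
    return num // den

def forward(u, q):
    """v_n = sum_i [n choose i]_q u_i."""
    return [sum(gauss(n, i, q) * u[i] for i in range(n + 1)) for n in range(len(u))]

def backward(v, q):
    """u_n = sum_i (-1)^(n-i) q^C(n-i, 2) [n choose i]_q v_i."""
    return [sum((-1) ** (n - i) * q ** ((n - i) * (n - i - 1) // 2)
                * gauss(n, i, q) * v[i] for i in range(n + 1))
            for n in range(len(v))]

q = 2
u = [3, -1, 4, 1, -5, 9, 2, 6]
assert backward(forward(u, q), q) == u
assert forward(backward(u, q), q) == u
```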
See the book by R. P. Stanley and that by Goulden and Jackson for a great deal more on the subject of q-binomial (and q-multinomial) coefficients.

We are now going to compute the Möbius function of the lattice $L_n(q)$.
Theorem 5.9.13 The Möbius function of the lattice $L_n(q)$ of subspaces of a vector space of dimension $n$ over the Galois field $F = GF(q)$ is given by
$$\mu(U, W) = \begin{cases} (-1)^k q^{\binom{k}{2}}, & \text{if } U \le W \text{ and } k = \dim(W) - \dim(U); \\ 0, & \text{if } U \not\le W. \end{cases}$$
Proof: The idea is to use Weisner's theorem on the interval $[U, W]$, viewed as isomorphic to the lattice of subspaces of the quotient space $W/U$. This means that we need only compute $\mu(\hat 0, \hat 1)$, where $V = \hat 1$ is a space of dimension $n$ and $\hat 0 = \{0\}$. If $V$ has dimension 1, then $L_1(q)$ is a chain with two elements and $\mu(\hat 0, \hat 1) = -1 = (-1)^1 q^{\binom{1}{2}}$. Now suppose $n = 2$. Let $a$ be a point. By Weisner's theorem
$$\mu(\hat 0, \hat 1) = -\sum_{\substack{p \,:\, p \vee a = \hat 1 \\ p \ne \hat 1}} \mu(\hat 0, p) = \left| \{ p : p \vee a = \hat 1 \text{ and } p \ne \hat 1 \} \right| = q = (-1)^2 q^{\binom{2}{2}}.$$
Now suppose that our induction hypothesis is that
$$\mu(\hat 0, V) = (-1)^k q^{\binom{k}{2}} \quad \text{if } k = \dim(V) < n.$$
Let $p$ cover $\hat 0$ (i.e., let $p$ be a point). By Weisner's Theorem,
$$\mu(\hat 0, V) = -\sum_{\substack{U \,:\, U \vee p = V \\ U \ne V}} \mu(\hat 0, U).$$
The subspaces $U$ such that $U \vee p = V$ and $U \ne V$ are those of dimension $n - 1$ (i.e., hyperplanes) that do not contain $p$. The number of hyperplanes on $p$ is the number of points on a hyperplane, which is $\binom{n-1}{1}_q$, so that the number of hyperplanes not containing $p$ is
$$\binom{n}{1}_q - \binom{n-1}{1}_q = q^{n-1}.$$
So if $\dim(V) = n$, then $\mu(\hat 0, V) = (-1) \cdot q^{n-1} \cdot (-1)^{n-1} q^{\binom{n-1}{2}} = (-1)^n q^{\binom{n}{2}}$, after a little simplification.
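Theorem 5.9.13 can be verified by brute force for a small case: enumerate all subspaces of $V_3(2)$, compute $\mu(\hat 0, W)$ directly from the defining recursion of the Möbius function, and compare with $(-1)^k q^{\binom k2}$. (The helper names `span`, `mu`, etc. are ours; over $GF(2)$ a span is just the set of subset sums of the generators.)

```python
import math
from itertools import combinations

n, q = 3, 2
vectors = [tuple((i >> j) & 1 for j in range(n)) for i in range(q ** n)]

def add(u, v):
    return tuple((a + b) % q for a, b in zip(u, v))

def span(gens):
    """Over GF(2), the span is the set of all subset sums of the generators."""
    S = {tuple([0] * n)}
    for g in gens:
        S |= {add(s, g) for s in S}
    return frozenset(S)

subspaces = {span(g) for r in range(n + 1) for g in combinations(vectors, r)}
assert len(subspaces) == 16          # 1 + 7 + 7 + 1 subspaces of V_3(2)

def mu(U, W):
    """Mobius function from its defining recursion on the interval [U, W]."""
    if U == W:
        return 1
    return -sum(mu(U, T) for T in subspaces if U <= T and T < W)

zero = span([])
for W in subspaces:
    k = int(math.log2(len(W)))       # dim(W), since |W| = 2^dim(W)
    assert mu(zero, W) == (-1) ** k * q ** (k * (k - 1) // 2)
```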
As an application we count the number of linear transformations from an $n$-dimensional vector space $Y$ onto an $m$-dimensional vector space $V$ over $F = GF(q)$. Clearly we must have $n \ge m$ if this number is to be nonzero. However, we do not make this assumption.
Theorem 5.9.14 If $Y$ and $V$ are vector spaces over $F$ with $\dim(Y) = n$ and $\dim(V) = m$, then
$$\left| \{ T \in \mathrm{Hom}(Y, V) : T(Y) = V \} \right| = \sum_{k=0}^{m} (-1)^{m-k} \binom{m}{k}_q q^{nk + \binom{m-k}{2}}.$$
Proof: For each subspace $U$ of $V$, let $f(U) = |\{T \in \mathrm{Hom}(Y, V) : T(Y) = U\}|$, and let $g(U) = |\{T \in \mathrm{Hom}(Y, V) : T(Y) \subseteq U\}|$. Then $g(U) = q^{nr}$ if $\dim(U) = r$, and clearly $g(U) = \sum_{W : W \subseteq U} f(W)$. By Möbius inversion we have $f(U) = \sum_{W : W \subseteq U} \mu(W, U)\, q^{n \dim(W)}$. If $U = V$, by our formula for the Möbius function on $L_n(q)$ we have
$$f(V) = \sum_{W} \mu(W, V)\, q^{n \dim(W)} = \sum_{k=0}^{m} (-1)^{m-k} q^{\binom{m-k}{2}} \binom{m}{k}_q q^{nk},$$
which finishes the proof.
Corollary 5.9.15 The number of $n \times m$ matrices over $F = GF(q)$ with rank $r$ is
$$\binom{m}{r}_q \sum_{k=0}^{r} (-1)^{r-k} \binom{r}{k}_q q^{nk + \binom{r-k}{2}}.$$
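Corollary 5.9.15 (and, for full rank, Theorem 5.9.14) can be checked exhaustively for linear maps $V_3(2) \to V_2(2)$: there are only $2^6 = 64$ matrices, and their rank distribution should match the formula. (The helpers `rank_count` and `rank_gf2` are ours; rows are stored as bitmasks.)

```python
from itertools import product

def gauss(n, k, q):
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (i + 1) - 1
    return num // den

def rank_count(n, m, r, q):
    """Corollary 5.9.15: number of rank-r linear maps V_n(q) -> V_m(q)."""
    C2 = lambda a: a * (a - 1) // 2
    return gauss(m, r, q) * sum((-1) ** (r - k) * gauss(r, k, q)
                                * q ** (n * k + C2(r - k)) for k in range(r + 1))

def rank_gf2(rows, nbits):
    """Rank over GF(2) by elimination; each row is an nbits-bit mask."""
    rows = list(rows)
    rank = 0
    for bit in range(nbits):
        idx = next((i for i, r in enumerate(rows) if (r >> bit) & 1), None)
        if idx is None:
            continue
        pivot = rows.pop(idx)
        rows = [r ^ pivot if (r >> bit) & 1 else r for r in rows]
        rank += 1
    return rank

n, m, q = 3, 2, 2
counts = [0] * (min(n, m) + 1)
for rows in product(range(2 ** n), repeat=m):   # all m x n matrices over GF(2)
    counts[rank_gf2(rows, n)] += 1
assert counts == [rank_count(n, m, r, q) for r in range(min(n, m) + 1)]
```

The rank-$m$ entry is exactly the count of onto maps from Theorem 5.9.14.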
Corollary 5.9.16 The number of invertible $n \times n$ matrices over $GF(q)$ is
$$\sum_{k=0}^{n} (-1)^{n-k} \binom{n}{k}_q q^{nk + \binom{n-k}{2}}.$$
Remark: There are $g_n(q^m) = (q^m - 1)(q^m - q) \cdots (q^m - q^{n-1})$ injective linear transformations from $V_n$ to $V_m$. If $m = n$, then injective is equivalent to onto. Hence
$$g_n(q^n) = (q^n - 1)(q^n - q) \cdots (q^n - q^{n-1}) = \sum_{k=0}^{n} (-1)^{n-k} \binom{n}{k}_q q^{nk + \binom{n-k}{2}}.$$
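The equality between the product formula $g_n(q^n)$ and the alternating sum of Corollary 5.9.16 is easy to confirm for a range of small $n$ and $q$. A sketch (names `gl_product` and `gl_sum` are ours):

```python
def gauss(n, k, q):
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (i + 1) - 1
    return num // den

def gl_product(n, q):
    """|GL(n, q)| = (q^n - 1)(q^n - q)...(q^n - q^(n-1))."""
    out = 1
    for i in range(n):
        out *= q ** n - q ** i
    return out

def gl_sum(n, q):
    """The alternating sum of Corollary 5.9.16."""
    C2 = lambda a: a * (a - 1) // 2
    return sum((-1) ** (n - k) * gauss(n, k, q) * q ** (n * k + C2(n - k))
               for k in range(n + 1))

assert all(gl_product(n, q) == gl_sum(n, q) for n in range(1, 6) for q in (2, 3, 4, 5))
assert gl_product(2, 2) == 6          # |GL(2, 2)| = 6
```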
Exercise: 5.9.17 It is possible to define the Gaussian coefficients for any positive integer $q$, not just the prime powers. Read the following article: John Konvalina, A Unified Interpretation of the Binomial Coefficients, the Stirling Numbers, and the Gaussian Coefficients, Amer. Math. Monthly, 107 (2000), 901-910.
5.10 Posets with Finite Order Ideals
Let $P$ be a poset for which each order ideal $\Lambda_x = \{ y \in P : y \le x \}$ is finite, and let $\mu$ be the Möbius function of $P$. If $K$ is any field, the (lower) Möbius algebra $A_\wedge(P, K)$ is the algebra obtained by starting with the vector space $K^P$ and defining a (bilinear) multiplication on basis elements $x, y \in P$ by
$$x \cdot y = \sum_{s \,:\, s \le x \text{ and } s \le y} \left( \sum_{t \in [s,x] \cap [s,y]} \mu(s, t) \right) s = \sum_{(s,t) \,:\, s \le t \le x \text{ and } s \le t \le y} \mu(s, t)\, s = \sum_{t \in \Lambda_x \cap \Lambda_y} \left( \sum_{s \,:\, s \le t} \mu(s, t)\, s \right). \qquad (5.53)$$
So if we put $\epsilon_t = \sum_{s : s \le t} \mu(s, t)\, s$, we have
$$x \cdot y = \sum_{t \in \Lambda_x \cap \Lambda_y} \epsilon_t. \qquad (5.54)$$
If we fix $z \in P$,
$$\sum_{t \,:\, t \le z} \epsilon_t = \sum_{(s,t) \,:\, s \le t \le z} \mu(s, t)\, s = \sum_{s \,:\, s \le z} \left( \sum_{t \,:\, s \le t \le z} \mu(s, t) \right) s = \sum_{s \,:\, s \le z} \delta_{s,z}\, s = z.$$
Hence:
$$z = \sum_{t \le z} \epsilon_t, \quad \text{and} \quad x \cdot x = \sum_{t \le x} \epsilon_t = x. \qquad (5.55)$$
Moreover,
$$x \cdot y = y \iff y \le x. \qquad (5.56)$$
Note: In the above we are thinking of $f \in K^P$ as a formal (possibly infinite) linear combination of the elements of $P$: $f = \sum_{x \in P} f(x)\, x$. So the element $x$ of $P$ is identified with the element $1_x = \sum_{y \in P} \delta_{x,y}\, y = x$. Then the above discussion shows that $\{ \epsilon_t : t \in P \}$ (as well as $\{ x : x \in P \}$) is a basis for $A_\wedge(P, K)$.
Let $A'_\wedge(P, K)$ be the abstract algebra $\prod_{x \in P} K_x$ with $K_x$ isomorphic to $K$. So $A'_\wedge(P, K)$ is $K^{|P|}$ with direct product operations. Let $\epsilon'_x$ be the identity of $K_x$, so $\epsilon'_x \epsilon'_y = \delta_{x,y}\, \epsilon'_x$. Then define a linear transformation $\theta : A_\wedge(P, K) \to A'_\wedge(P, K)$ by $\theta(\epsilon_x) = \epsilon'_x$, and extend by linearity.
Theorem 5.10.1 $\theta$ is an algebra isomorphism.

Proof: For each $x \in P$, put $x' = \sum_{y \le x} \epsilon'_y \in A'_\wedge(P, K)$. As $\theta$ is clearly a vector space isomorphism with $\theta(x) = \theta\left( \sum_{y \le x} \epsilon_y \right) = \sum_{y \le x} \epsilon'_y = x'$, it suffices to show that $\theta(x \cdot y) = \theta(x)\theta(y)$ for all $x, y \in P$. So,
$$\theta(x \cdot y) = \sum_{t \,:\, t \le x \text{ and } t \le y} \epsilon'_t = \sum_{t \le x,\; s \le y} \epsilon'_t\, \epsilon'_s = \left( \sum_{t \le x} \epsilon'_t \right) \left( \sum_{t \le y} \epsilon'_t \right) = x' y' = \theta(x)\theta(y).$$
As a simple corollary we have the following otherwise not so obvious result.

Theorem 5.10.2 If $P$ is finite, $\{ \epsilon_t : t \in P \}$ is a complete set of orthogonal idempotents, and $\sum_{t \in P} \epsilon_t$ is the multiplicative identity of $A_\wedge(P, K)$.

(Note: In the notation for lattices, $x \cdot y = z$ iff $\Lambda_x \cap \Lambda_y = \Lambda_z$ iff $z = x \wedge y$.)
Theorem 5.10.3 Let $P$ be finite with $|P| \ge 2$. Let $a, x \in P$, $a \ne x$. On the one hand,
$$a \cdot \epsilon_x = a \cdot \sum_{t \le x} \mu(t, x)\, t = \sum_{t \le x} \mu(t, x)\,(a \cdot t) = \sum_{d \in P} \left( \sum_{t \,:\, t \le x \text{ and } a \wedge t = d} \mu(t, x) \right) d.$$
On the other hand,
$$a \cdot \epsilon_x = \sum_{t \,:\, t \le a} \epsilon_t\, \epsilon_x = \sum_{t \,:\, t \le a} \delta_{t,x}\, \epsilon_x = \begin{cases} \epsilon_x, & \text{if } x \le a; \\ 0, & \text{if } x \not\le a. \end{cases}$$
This has the following consequences:
(i) If $x \not\le a$ and $d \in P$, then $\sum_{t \,:\, t \le x \text{ and } a \wedge t = d} \mu(t, x) = 0$.

For example, if $a \ne \hat 1 \in P$, then $\sum_{t \,:\, a \wedge t = d} \mu(t, \hat 1) = 0$.

As a special case, $\sum_{t \,:\, a \wedge t = \hat 0} \mu(t, \hat 1) = 0$, if $P$ has $\hat 0$ and $\hat 1$, with $a \ne \hat 1$.

(ii) If $x \le a$,
$$\sum_{d} \left( \sum_{t \,:\, t \le x \text{ and } a \wedge t = d} \mu(t, x) \right) d = \epsilon_x = \sum_{d \,:\, d \le x} \mu(d, x)\, d. \text{ So}$$

(a) If $d \le x \le a$, then $\sum_{t \,:\, t \le x \text{ and } a \wedge t = d} \mu(t, x) = \mu(d, x)$, and

(b) If $d \not\le x \le a$, then $\sum_{t \,:\, t \le x \text{ and } a \wedge t = d} \mu(t, x) = 0$.
Theorem 5.10.4 Let $P$ be a finite poset with $\hat 0$ and $\hat 1$, $\hat 0 \ne \hat 1$. And let $X$ be the set of coatoms of $P$ (i.e., elements covered by $\hat 1$). Then
$$\mu(\hat 0, \hat 1) = \sum_{k=1}^{|X|} (-1)^k N_k,$$
where $N_k$ is the number of subsets of $X$ of size $k$ whose product is $\hat 0$.
Proof: For any $x \in X$, $x = \left( \sum_{t \in P} \epsilon_t \right) - \sum_{t \,:\, t \not\le x} \epsilon_t$. Hence
$$\prod_{x \in X} (1 - x) = \prod_{x \in X} \left( \sum_{t \,:\, t \not\le x} \epsilon_t \right) = \epsilon_{\hat 1},$$
since if $t \ne s$, $\epsilon_t \epsilon_s = 0$ and $\epsilon_{\hat 1}$ is the only idempotent appearing in all terms of the product. The coefficient of $\hat 0$ in $\epsilon_{\hat 1} = \sum_{s \,:\, s \le \hat 1} \mu(s, \hat 1)\, s$ is $\mu(\hat 0, \hat 1)$. The coefficient of $\hat 0$ in $\prod_{x \in X} (1 - x)$ is exactly $\sum_k (-1)^k N_k$.
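Theorem 5.10.4 is easy to try on the Boolean lattice $B_3$ of subsets of a 3-set, where the coatoms are the three 2-subsets, the product (meet) is intersection, and $\mu(\hat 0, \hat 1) = (-1)^3$. A sketch (the name `N` mirrors $N_k$ in the theorem):

```python
from itertools import combinations

# Boolean lattice B_3: subsets of {0, 1, 2}; 0-hat = {}, 1-hat = {0, 1, 2}.
full = frozenset({0, 1, 2})
coatoms = [full - {i} for i in range(3)]

def N(k):
    """Number of k-subsets of coatoms whose meet (intersection) is 0-hat."""
    count = 0
    for S in combinations(coatoms, k):
        meet = full
        for c in S:
            meet &= c
        if not meet:
            count += 1
    return count

mobius = sum((-1) ** k * N(k) for k in range(1, len(coatoms) + 1))
# No single coatom or pair of coatoms meets in 0-hat, but all three together do,
# so the sum is (-1)^3 * 1 = -1, matching mu(0-hat, 1-hat) = (-1)^3 for B_3.
assert mobius == -1
```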
If $P$ is a poset for which each dual order ideal $V_x = \{ y \in P : y \ge x \}$ is finite, we can dualize the construction of the (lower) Möbius algebra and define the upper Möbius algebra $A_\vee(P, K)$, which has primitive idempotents of the form $\epsilon_x = \sum_{y \ge x} \mu(x, y)\, y$.
Theorem 5.10.5 Let $P$ and $Q$ be finite posets. If $\phi : P \to Q$ is any map, then $\phi$ extends to an algebra homomorphism $\bar\phi : A_\vee(P, K) \to A_\vee(Q, K)$ iff the following hold:

(i) $\phi$ is order preserving, and

(ii) for any $q \in Q$, the set $\{ p \in P : \phi(p) \le q \}$ either has a maximum or is empty. (That is, if $I$ is a principal order ideal of $Q$, then $\phi^{-1}(I)$ is principal or empty.)
Proof: First suppose that $\phi$ extends to a homomorphism $\bar\phi$. Since $x \le y$ iff $x \cdot y = y$ (in $A_\vee(P, K)$), it must be that $x \le y$ iff $x \cdot y = y$ iff $\phi(x)\phi(y) = \phi(y)$ iff $\phi(x) \le \phi(y)$, so $\phi$ is order preserving. Now for a fixed $q \in Q$ suppose that $\{ p \in P : \phi(p) \le q \} \ne \emptyset$ and choose a $p \in P$ for which $\phi(p) \le q$. Then
$$\sum_{y \ge \phi(p)} \epsilon_y = \phi(p) = \bar\phi\left( \sum_{x \ge p} \epsilon_x \right) = \sum_{x \ge p} \bar\phi(\epsilon_x). \qquad (5.57)$$
Since $\{ \epsilon_y : y \in P \}$ and $\{ \epsilon_y : y \in Q \}$ are bases for $A_\vee(P, K)$ and $A_\vee(Q, K)$, respectively, we see from Eq. 5.57 that $\epsilon_q \in A_\vee(Q, K)$ appears as a summand in $\bar\phi(\epsilon_x)$ for some $x \ge p$. Moreover, this $x$ is unique, because if $\epsilon_q$ is a summand in both $\bar\phi(\epsilon_x)$ and $\bar\phi(\epsilon_y)$ with $x \ne y$, then $\bar\phi(\epsilon_x)\bar\phi(\epsilon_y) \ne 0$. But $\bar\phi(\epsilon_x)\bar\phi(\epsilon_y) = \bar\phi(\epsilon_x \epsilon_y) = \bar\phi(0) = 0$. We claim that the unique $x$, $x \ge p$, for which $\epsilon_q$ is a summand of $\bar\phi(\epsilon_x)$ is $x = \max\{ p \in P : \phi(p) \le q \}$. The above argument at least shows that $x \ge p$ for each $p$ such that $\phi(p) \le q$. But as $\epsilon_q$ is a summand of $\bar\phi(\epsilon_x)$, it is also a summand of $\bar\phi(x) = \bar\phi\left( \sum_{t \ge x} \epsilon_t \right) = \sum_{t \ge x} \bar\phi(\epsilon_t)$. But $\bar\phi(x) = \phi(x) = \sum_{t \ge \phi(x)} \epsilon_t$ having $\epsilon_q$ as a summand implies that $q \ge \phi(x)$. We now have: $x \ge p$ for each $p$ with $\phi(p) \le q$, and $\phi(x) \le q$. So $x = \max\{ p \in P : \phi(p) \le q \}$. Hence $\{ p \in P : \phi(p) \le q \} = \phi^{-1}(\Lambda_q) = \Lambda_x$.

This completes the proof that when $\phi$ extends to an algebra homomorphism both (i) and (ii) hold.

Conversely, suppose both (i) and (ii) hold. Let $Q_0 = \{ q \in Q : q \ge \phi(p) \text{ for some } p \} = \{ q \in Q : \phi^{-1}(\Lambda_q) \ne \emptyset \}$. If $p \in P$, then $q = \phi(p)$ is automatically in $Q_0$. And if $q \in Q_0$, put $\psi(q) = \max\{ p \in P : \phi(p) \le q \}$. So $\Lambda_{\psi(q)} = \{ p \in P : \phi(p) \le q \}$.

If $q = \phi(x)$, then $q = \phi(\psi(q)) = \phi(\psi(\phi(x))) = \phi(x)$. On the other hand, if $q$ is not in the image of $\phi$, then $\phi(\psi(q)) = \phi(\max\{ x : \phi(x) \le q \}) < q$.

If $p$ is the largest element of $P$ with its image under $\phi$ (that is, if $\phi(p_1) = \phi(p)$ with $p_1 \ne p$ implies $p_1 < p$), put
$$\bar\phi(\epsilon_p) = \sum_{q \in Q \,:\, \phi(\psi(q)) = \phi(p)} \epsilon_q,$$
where the sum is over all $\epsilon_q$ for which $q$ satisfies the following: $q \ge \phi(p)$ and the largest $x$ with $\phi(x) \le q$ has $\phi(x) = \phi(p)$; otherwise put $\bar\phi(\epsilon_p) = 0$. In this set of $q$'s, $\phi(p)$ is the only one in the image of $\phi$. The other $q$'s are only slightly larger than $\phi(p)$. (This set of $q$'s is the set $\{ q : [\phi(p), q] \cap \phi(P) = \{\phi(p)\} \}$.)

Fix $p \in P$. Then for each $q \in Q$ with $q \ge \phi(p)$ there is a unique $x$ in $P$ for which $x$ is the largest element of $P$ with $\phi(x) \le q$. Necessarily $x \ge p$. Similarly, if we fix $x$, $x \ge p$, there is a well-defined set of $q$'s for which $\phi(x)$ is the largest element of $\phi(P)$ which is less than or equal to $q$. Hence for $p \in P$,
$$\bar\phi(p) = \bar\phi\left( \sum_{x \,:\, x \ge p} \epsilon_x \right) = \sum_{x \,:\, x \ge p} \bar\phi(\epsilon_x) = \sum_{\substack{x \,:\, x \ge p \text{ and} \\ x = \max\{ y \,:\, \phi(x) = \phi(y) \}}} \bar\phi(\epsilon_x)$$
$$= \sum_{\substack{x \,:\, x \ge p \text{ and} \\ x = \max\{ y \,:\, \phi(x) = \phi(y) \}}} \; \left( \sum_{\substack{q \,:\, q \ge \phi(x) \text{ and} \\ x = \max\{ t \in P \,:\, \phi(t) \le q \}}} \epsilon_q \right) = \sum_{q \,:\, q \ge \phi(p)} \epsilon_q = \phi(p).$$
Hence $\bar\phi$ is the desired extension of $\phi$ to $A_\vee(P, K)$.
Corollary 5.10.6 Let $(P, \le)$ be a finite poset, and let $P_0 \subseteq P$. Then the injection $\phi : P_0 \to P : p \mapsto p$ extends to an algebra homomorphism of $A_\vee(P_0, K)$ into $A_\vee(P, K)$ iff the restriction of each principal order ideal of $P$ to $P_0$ is either empty or principal.