Combinatorics

Alexander Hulpke
(Fall 2025)

Enumeration and Structure

Course Notes – MATH 501/2


August 25, 2025

Alexander Hulpke
Department of Mathematics
Colorado State University
1874 Campus Delivery
Fort Collins, CO, 80523
Title graphics: Window in the Southern Transept of the Cathedral in Cologne (detail), by Gerhard Richter.

These notes accompany my course MATH 501/2, Combinatorics, held Fall 2025 at Colorado State University.

©2025 Alexander Hulpke. Copying for personal use is permitted.

1 Preface

{epigraphs}\qitem

Counting is innate to man. History of India
Abū Rayhān Muhammad ibn Ahmad Al-Bīrūnī

These are lecture notes I prepared for a graduate Combinatorics course, which ran in 2016/17, 2020/21, 2024, and 2025 at Colorado State University.

They started many years ago from an attempt to supplement the book  [cameron] (which has great taste in selecting topics, but sometimes is exceedingly terse) with further explanations and topics, without aiming for the encyclopedic completeness of [stanley1, stanley2].

In compiling these notes, in addition to the books already mentioned, I have benefited from [cameron, cameronlint, godsilroyle, lidlniederreiterapp, lintwilson, hall, concrete, knuth3, reichmeider].

You are welcome to use these notes freely for your own courses or students – I’d be indebted to hear if you found them useful.

Fort Collins, \thedate
Alexander Hulpke
hulpke@colostate.edu

Chapter 0 Introduction

{epigraphs}\qitem

A combinatorial structure is one which has combinatorial properties. Combinatorial properties are those possessed by combinatorial structures. So formal definitions are not getting us anywhere.
We shall leave combinatorial structure as an undefined term. […] Course on Undergraduate Combinatorics
Solomon Golomb and Andy Liu

1 What is Combinatorics?

If one looks at university mathematics classes around 1920, one already finds the basic pattern of many current courses. Single-variable analysis had already taken much of its present shape. Algebra had begun to formalize groups, rings, and fields — concepts that, a few years later, looked very much like what is taught today. Much of the theory of differential equations was known, and numerical methods lacked only the availability of fast computers. But there was little combinatorics beyond the basic counting formulas used in statistics. Combinatorics began to grow suddenly in the 1950s and 1960s, in part motivated by the advent of computers, and arguably did not have a standard list of topics until the 1980s or 1990s. It differs from many other areas of mathematics in that it was not driven by a small number of deep (often unresolved) problems, but by the observation that problems from seemingly different areas actually follow similar patterns and can be studied with similar methods. Its scope, in the broadest sense, is the study of the different ways objects can be related to one another.

This investigation naturally splits into three parts: the question of existence (can certain configurations exist?); counting the number of possible configurations (if we can count them, this often implies we have a good overview of what can exist); and finding extreme, often optimal, cases.

This course looks at combinatorics split into two main areas, roughly corresponding to semesters: the first is enumerative combinatorics, the study of counting the different ways configurations can be set up. The second is the study of properties of combinatorial structures that consist of many objects subject to certain prescribed conditions.

2 Prerequisites

This being a graduate class we shall assume knowledge of some topics that have been covered in undergraduate classes. In particular we shall use:

Sets, Functions, Relations

We assume the reader is comfortable with the concept of sets and standard constructs such as Cartesian products. We denote the set of all subsets of X by 𝒫(X); it is called the power set of X.

A relation on a set X is a subset of $X\times X$; more generally, a relation between sets X and Y is a subset of $X\times Y$. Functions can be considered as a particular class of relations: we may consider a function $f\colon X\to Y$ as a subset of $X\times Y$.

Another important class of relations are equivalence relations. Via equivalence classes they correspond to partitions of the set.

Induction

The technique of proof by induction is intimately related to the concept of recursion. It is assumed that the reader is comfortable with the various variants of finite induction (different starting values, referring to multiple previous values, postulating a smallest counterexample). We also might sometimes just state that a proof follows by induction, if the base case or the inductive step is obvious or standard.

1 Abstract Algebra

Abstract algebra is often useful in providing a formal framework for describing objects. We assume the reader is familiar with the standard concepts from an undergraduate abstract algebra class – groups, permutations (we multiply permutations from left to right), cycle form, polynomial rings, finite fields, and linear algebra.

2 Graph Terminology

This being a graduate class, the assumption is that the reader has encountered the basic definitions of graph theory – such as: vertex, edge, degree, directed/undirected, path, tree – already in an undergraduate class.

3 Calculus

It often comes as a surprise to students that combinatorics – this epitome of discrete mathematics – uses techniques from calculus. Some of it is the classical use of approximations to estimate growth, but we shall also need it as a toolset for manipulating power series. Still, there is no need for the reader to worry that we would encounter messy approximations, convergence tests, or bathtubs that are filled while simultaneously draining.

3 OEIS

A problem that arises often in combinatorics is that we can easily describe small examples, but that it is initially hard to see the underlying patterns. For example, we might be able to count the total number of objects of small size, but will be unable to count how many there are of larger size. In investigating such situations, the Online Encyclopedia of Integer Sequences (OEIS, at oeis.org) is an invaluable tool that allows one to look up number sequences that fit a particular pattern given by a few values, and for many of these gives a huge number of connections and references. Sequences in this encyclopedia have a “storage number” starting with the letter “A”, and we will refer to these at times by an indicator box OEIS A002106.

Chapter 1 Basic Counting

One of the basic tasks of combinatorics is to determine the cardinality of (finite) classes of objects. Beyond basic applicability of such a number – for example to estimate probabilities – the actual process of counting may be of interest, as it gives further insight into the problem:

  • If we cannot count a class of objects, we cannot claim that we know it.

  • The process of enumeration might – for example by giving a bijection between classes of different sets – uncover a relation between different classes of objects.

  • The process of counting might lend itself to become constructive, that is allow an actual construction of (or iteration through) all objects in the class.

The class of objects might have a clear mathematical description – e.g., all subsets of the set $\{1,\ldots,5\}$. In other situations the description itself needs to be translated into proper mathematical language. For example:

Definition 1 (Derangements, informal definition).

Given n letters and n addressed envelopes, a derangement is an assignment of letters to envelopes such that no letter is in the correct envelope.

(How many derangements of n letters exist? This particular problem will be solved in section 5.)

Here the translation to more formal objects is that we consider the letters and envelopes to be numbered from 1 to n, and the assignment to be a function. That is:

Definition 2 (Derangements).

For an integer n, a derangement is a bijection $d\colon N\to N$ on $N = \{1,\ldots,n\}$ (i.e. a permutation), such that $d(i)\neq i$ for all $i\in N$.

We will start in this chapter by considering the enumeration of some basic constructs – sets and sequences. More interesting problems, such as the derangements here, arise later if further conditions restrict to sub-classes, or if in the obvious description several sequences describe the same object.
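For small n, the formal definition can be checked directly by exhaustion. The following Python sketch (for illustration only; the function name is ours, not from the notes) counts derangements by filtering all permutations:

```python
# Count derangements of {1,...,n} by brute force (illustrative sketch).
from itertools import permutations

def count_derangements(n):
    """Number of permutations p of {1,...,n} with p(i) != i for all i."""
    return sum(1 for p in permutations(range(1, n + 1))
               if all(p[i - 1] != i for i in range(1, n + 1)))

print([count_derangements(n) for n in range(1, 7)])  # [0, 1, 2, 9, 44, 265]
```

Looking the resulting sequence up in the OEIS, as suggested in the introduction, identifies it as A000166.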

1 Basic Counting of Sequences and Sets

{epigraphs}\qitem

I am the sea of permutation.
I live beyond interpretation.
I scramble all the names and the combinations.
I penetrate the walls of explanation. Lay My Love
Brian Eno

The number of elements of a set A, denoted by |A| (or sometimes as #A) and called its cardinality, is defined (we only deal with the finite case here; there are generalizations for infinite sets) as the unique n such that there is a bijective function from A to $\{1,2,\ldots,n\}$.

There are three basic principles that underlie counting:

Disjoint Union

If $A = A_1 \cup A_2$ with $A_1 \cap A_2 = \emptyset$, then $|A| = |A_1| + |A_2|$.

Cartesian product

$|A\times B| = |A|\cdot|B|$.

Equivalence classes

If we can represent each element of A in m different ways by elements of B, then $|A| = |B|/m$.

A sequence (or tuple) of length k is simply an element of the k-fold Cartesian product. Entries are chosen independently, that is, if the first entry has a choices and the second entry b, there are $a\cdot b$ possible choices for a length two sequence. Thus, if we consider sequences of length k, entries chosen from a set A of cardinality n, there are $n^k$ such sequences.

This allows for duplication of entries, but in some cases – arranging objects in sequence – this is not desired. In this case we can still choose n entries in the first position, but in the second position need to avoid the entry already chosen in the first position, giving $n-1$ options. (The number of options is always the same; the actual set of options of course depends on the choice in the first position.) The number of such sequences of length k thus is $(n)_k = n(n-1)(n-2)\cdots(n-k+1) = \frac{n!}{(n-k)!}$, called “n lower factorial k”. (Warning: the notation $(n)_k$ has a different meaning in other areas of mathematics!)

This could be continued up to a sequence of length n (after which all n element choices have been exhausted). Such a sequence is called a permutation of A. There are $n! = (n)_n = n\cdot(n-1)\cdots 2\cdot 1$ such permutations.

Next we consider sets of elements. While every duplicate-free sequence describes a set, sequences that have the same elements arranged in different order describe the same set. Every set of k elements from n thus will be described by k! different duplicate-free sequences. To enumerate sets, we therefore need to divide by this factor, and get for the number of k-element sets from n the count given by the binomial coefficient

$\binom{n}{k} = \frac{(n)_k}{k!} = \frac{n!}{(n-k)!\,k!}$

Note (using the convention $0! = 1$) we get that $\binom{n}{0} = \binom{n}{n} = 1$. It also can be convenient to define $\binom{n}{k} = 0$ for $k<0$ or $k>n$.

Often one counting process can be modified to count somewhat different objects: Consider compositions of a number n into k parts, that is, ways of writing n as a sum of exactly k positive integers with ordering being relevant. For example, $4 = 2+2 = 1+3 = 3+1$ gives the 3 possible compositions of 4 into 2 parts.

To get a formula of the number of possibilities, write the maximum composition

$n = 1+1+1+\cdots+1$

which has $n-1$ plus signs. We obtain the possible compositions into k parts by grouping summands together to have only k summands. That is, we designate $k-1$ plus signs from the given $n-1$ possible ones as giving us the separation. The number of possibilities thus is $\binom{n-1}{k-1}$.

If we also want to allow summands of 0 when writing n as a sum of k terms, we can simply assume that we temporarily add 1 to each summand. This guarantees that each summand is positive, but adds k to the sum. We thus count the number of ways to express $n+k$ as a sum of k positive summands, which by the previous formula is $\binom{n+k-1}{k-1}$.

{tryout}

Check this for n=4 and k=2.
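The plus-sign argument can also be verified by exhaustive enumeration for small parameters; in this sketch (function name ours, not from the notes) we list compositions directly and compare with the binomial-coefficient formulas:

```python
# Enumerate compositions of n into k parts and compare with the formulas
# C(n-1, k-1) (positive parts) and C(n+k-1, k-1) (parts >= 0). Sketch only.
from itertools import product
from math import comb

def compositions(n, k, allow_zero=False):
    """All ways to write n as an ordered sum of k integer parts."""
    lo = 0 if allow_zero else 1
    return [c for c in product(range(lo, n + 1), repeat=k) if sum(c) == n]

assert len(compositions(4, 2)) == comb(3, 1) == 3            # 2+2, 1+3, 3+1
assert len(compositions(4, 2, allow_zero=True)) == comb(5, 1) == 5
```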

Again, slightly rephrasing the problem, this is also the number of multisets, that is set-like collections in which we allow the same element to appear multiple times, with n elements chosen from k possibilities, the i-th summand indicating how often the i-th element is chosen.

Swapping the roles of n and k, we denote by $\left(\!\binom{n}{k}\!\right) = \binom{n+k-1}{n-1} = \binom{n+k-1}{k}$ (the second equality follows from Prop. 4 a) the number of k-element multisets chosen from n possibilities.

The results of the previous paragraphs are summarized in the following theorem:

Theorem 3.

The number of ways to select k objects from a set of n is given by the following table:

	Repetition	No Repetition
Order significant (sequences)	$n^k$	$(n)_k$
Order not significant (sets)	$\binom{n+k-1}{k} = \left(\!\binom{n}{k}\!\right)$	$\binom{n}{k}$
Note 1.1.

Instead of using the word significant, some books talk about ordered or unordered selections. I find this confusing, as this use of ordered is opposite to that of the word sorted, which has the same meaning in common language. We therefore use the language of significance.
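All four entries of the table in Theorem 3 can be checked against exhaustive enumeration; Python's itertools happens to provide an iterator for each of the four selection modes. (A sketch for illustration; the variable names are ours.)

```python
# Verify the fourfold table of Theorem 3 for one choice of n and k (sketch).
from itertools import product, permutations, combinations, combinations_with_replacement
from math import comb, perm

n, k = 5, 3
A = range(n)
assert len(list(product(A, repeat=k))) == n ** k               # repetition, order significant
assert len(list(permutations(A, k))) == perm(n, k)             # no repetition: (n)_k
assert len(list(combinations_with_replacement(A, k))) == comb(n + k - 1, k)  # multisets
assert len(list(combinations(A, k))) == comb(n, k)             # sets
```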

2 Bijections and Double Counting

{epigraphs}\qitem

As long as there is no double counting, Section 3(a) adopts the principle of the recent cases allowing recovery of both a complainant’s actual losses and a misappropriator’s unjust benefit… Draft of Uniform Trade Secrets Act
American Bar Association

In this section we consider two further important counting principles that can be used to build on the basic constructs.

Instead of counting a set A of objects directly, it might be easiest to establish a bijection to another set B, that is a function f:AB which is one-to-one (also called injective) and onto (also called surjective). Once such a function has been established we know that |A|=|B| and if we know |B| we thus have counted |A|.

We used this idea already above when we were counting compositions by instead considering possible positions of plus signs.

As a further example, consider the following problem: We have an n×n grid of points with horizontal and vertical connections (depicted in figure 1 for 3×3) and want to count the number of different paths from the bottom left to the top right corner that only go right or up.

Figure 1: A 3×3 grid

Each such path thus has exactly $n-1$ right steps and $n-1$ up steps. We thus (this already could be considered as one bijection) could count instead 0/1 sequences (0 is right, 1 is up) of length $2n-2$ that contain exactly $n-1$ ones (and $n-1$ zeros). Denote the set of such sequences by A.

To determine |A|, we observe that each sequence is determined uniquely by the positions of the ones, and there are exactly $n-1$ of them. Thus let B be the set of all $(n-1)$-element subsets of $\{1,\ldots,2n-2\}$.

We define $f\colon A\to B$ to assign to a sequence the positions of the ones, $f\colon (a_1,\ldots,a_{2n-2}) \mapsto \{\,i \mid a_i = 1\,\}$.

As every sequence has exactly $n-1$ ones, f indeed goes from A to B. As the positions of the ones define the sequence, f is injective. And as we clearly can construct a sequence which has ones in exactly $n-1$ given positions, f is surjective as well. Thus f is a bijection.

Since $|B| = \binom{2n-2}{n-1}$, this is also the cardinality of A.

{tryout}

Check this for n=2, n=3.
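The bijection can be tested by enumerating the 0/1 sequences directly (a sketch; names ours):

```python
# Count monotone grid paths as 0/1 sequences with exactly n-1 ones (sketch).
from itertools import product
from math import comb

def monotone_paths(n):
    """Paths on the n x n grid going only right or up, as 0/1 sequences."""
    return sum(1 for s in product((0, 1), repeat=2 * n - 2)
               if sum(s) == n - 1)

for n in (2, 3, 4):
    assert monotone_paths(n) == comb(2 * n - 2, n - 1)
```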

The second useful technique is “double counting”: counting the same set in two different ways (that is, using a bijection from a set to itself). Both counts must give the same result, which can often give rise to interesting identities. The following lemma is an example of this paradigm:

Lemma 2.1 (Handshaking Lemma).

At a convention (where not everyone greets everyone else but no pair greets twice), the number of delegates who shake hands an odd number of times is even.

Proof 2.2.

We assume without loss of generality that the delegates are $\{1,\ldots,n\}$. Consider the set of handshakes

$S = \{\,(i,j) \mid i \text{ and } j \text{ shake hands}\,\}.$

We know that if (i,j) is in S, so is (j,i). This means that |S| is even, say $|S| = 2y$, where y is the total number of handshakes occurring.

On the other hand, let $x_i$ be the number of pairs with i in the first position. We thus get $\sum_i x_i = |S| = 2y$. If a sum of numbers is even, there must be an even number of odd summands.

But $x_i$ is also the number of times that i shakes hands, proving the result.
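The lemma is easy to sanity-check on randomly generated symmetric relations (an illustrative sketch; names ours):

```python
# Check the Handshaking Lemma on random symmetric handshake relations (sketch).
import random

def odd_handshakers(S):
    """Delegates occurring an odd number of times in first position of S."""
    people = {a for (a, b) in S}
    return sum(1 for i in people
               if sum(1 for (a, b) in S if a == i) % 2 == 1)

random.seed(1)
for _ in range(100):
    S = set()
    for i in range(1, 9):
        for j in range(i + 1, 9):
            if random.random() < 0.5:      # i and j shake hands
                S.add((i, j)); S.add((j, i))
    assert odd_handshakers(S) % 2 == 0     # always evenly many odd shakers
```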

{epigraphs}\qitem

About binomial theorem I’m teeming with a lot o’ news,
With many cheerful facts about the square of the hypotenuse. The Pirates of Penzance
W.S. Gilbert

The combinatorial interpretation of binomial coefficients and double counting allow us to easily prove some identities for binomial coefficients (which typically are proven by induction in undergraduate classes):

Proposition 4.

Let n,k be nonnegative integers with kn. Then:

  • a) $\binom{n}{k} = \binom{n}{n-k}$.

  • b) $k\binom{n}{k} = n\binom{n-1}{k-1}$.

  • c) $\binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k}$ (Pascal’s triangle).

  • d) $\sum_{k=0}^{n} \binom{n}{k} = 2^n$.

  • e) $\sum_{k=0}^{n} \binom{n}{k}^2 = \binom{2n}{n}$.

  • f) $(1+t)^n = \sum_{k=0}^{n} \binom{n}{k}\,t^k$ (Binomial Theorem).

Proof 2.3.

a) Instead of selecting a subset of k elements we could select the $n-k$ elements not in the set.
b) Suppose we want to count committees of k people (out of n) with a designated chair. We can do so by either choosing first the $\binom{n}{k}$ committees and then, for each committee, the k possible chairs out of the committee members. Or we choose first the n possible chairs and then the remaining $k-1$ committee members out of the $n-1$ remaining persons.
c) Suppose that I am part of a group that contains $n+1$ persons and we want to determine subsets of this group that contain k people. These either include me (and $k-1$ further persons from the n others), or do not include me and thus consist of k of the n other people.
d) We count the total number of subsets of a set with n elements. Each subset can be described by a 0/1 sequence of length n, indicating whether the i-th element is in the set.
e) Suppose we have n men and n women and we want to select groups of n persons. The number of such groups is the right hand side. The number of possibilities with exactly k women is $\binom{n}{k}\binom{n}{n-k} = \binom{n}{k}^2$ by a). The left hand side of the equation simply sums these over all possible k.
f) Clearly $(1+t)^n$ is a polynomial of degree n. The coefficient of $t^k$ gives the number of possibilities to choose the t-summand when multiplying out the product

$(1+t)(1+t)\cdots(1+t)$

of n factors so that there are k such summands overall. This is simply the number of k-subsets, $\binom{n}{k}$.

{tryout}

Prove the theorem using induction. Compare the effort. Which method gives you more insight?
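Before (or instead of) an inductive proof, identities a)–e) can also be checked numerically for small n (a sketch; not part of the notes):

```python
# Numerical check of the identities in Proposition 4 (illustrative sketch).
from math import comb

for n in range(10):
    assert sum(comb(n, k) for k in range(n + 1)) == 2 ** n               # d)
    assert sum(comb(n, k) ** 2 for k in range(n + 1)) == comb(2 * n, n)  # e)
    for k in range(n + 1):
        assert comb(n, k) == comb(n, n - k)                              # a)
    for k in range(1, n + 1):
        assert k * comb(n, k) == n * comb(n - 1, k - 1)                  # b)
        assert comb(n + 1, k) == comb(n, k - 1) + comb(n, k)             # c)
```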

3 Stirling’s Estimate

Since the factorial function is somewhat unwieldy in estimates, it can be useful to have an approximation in terms of elementary functions. The most prominent of such estimates is given by Stirling’s formula. Our description follows [fellerstirling]:

Theorem 5.
$n! = \sqrt{2\pi n}\,\left(\frac{n}{e}\right)^{n}\left(1 + O\!\left(\tfrac{1}{n}\right)\right).$ (1)

Here $1+O(1/n)$ means that the quotient of the estimate and the true value differs from 1 by at most a constant multiple of $1/n$; that is, the relative error is $O(1/n)$. Figure 2 shows a plot of the ratio of the estimate to n!.

Figure 2: Plot of $\sqrt{2\pi n}\,(n/e)^n / n!$
Proof 3.1.

Consider the natural logarithm (all logarithms in this book are natural, unless stated differently) of the factorial:

$\log(n!) = \log 1 + \log 2 + \cdots + \log n$

If we define the step function L by $L(x) = \log k$ for $k-\tfrac12 < x \le k+\tfrac12$, we thus have that $\log(n!)$ is an integral of L(x). We also know that

$I(x) = \int_0^x \log t\,dt = x\log x - x.$

We thus need to consider the integral over $L(x) - \log(x)$. To avoid divergence issues, we consider this in two parts. Let

$a_k = \tfrac12\log k - \int_{k-1/2}^{k}\log x\,dx = \int_{k-1/2}^{k}\log(k/x)\,dx,$

and

$b_k = \int_{k}^{k+1/2}\log x\,dx - \tfrac12\log k = \int_{k}^{k+1/2}\log(x/k)\,dx.$

Then

$S_n = a_1 - b_1 + a_2 - b_2 + \cdots + a_n = \log n! - \tfrac12\log n - I(n) + I(\tfrac12).$

A substitution gives

$a_k = \int_0^{1/2}\log\frac{1}{1-(t/k)}\,dt, \qquad b_k = \int_0^{1/2}\log(1+t/k)\,dt,$

from which we see that $a_k > b_k > a_{k+1} > 0$. By Leibniz’ criterion, $S_n$ thus converges to a value S, and we have that

$\log n! - (n+\tfrac12)\log n + n \longrightarrow S - I(\tfrac12).$

Taking the exponential function we get that

$n! \sim e^{C}\,\sqrt{n}\,\left(\frac{n}{e}\right)^{n}$

with $C = \sum_{k=1}^{\infty}(a_k - b_k) - I(\tfrac12)$.

Using standard analysis techniques, one can now show that $e^C = \sqrt{2\pi}$; see [fellerstirling] for details. Alternatively, in concrete calculations, we could simply approximate the value to any accuracy desired.
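Numerically, the quality of the estimate is easy to observe. The sketch below (names ours) uses the known fact that the relative error behaves like $1/(12n)$, which is consistent with the $O(1/n)$ statement of Theorem 5:

```python
# Relative error of Stirling's estimate for a few values of n (sketch).
from math import sqrt, pi, e, factorial

def stirling(n):
    """The estimate sqrt(2 pi n) (n/e)^n from Theorem 5."""
    return sqrt(2 * pi * n) * (n / e) ** n

for n in (1, 10, 100):
    ratio = stirling(n) / factorial(n)
    # the relative error is in fact approximately 1/(12n)
    assert abs(ratio - 1) < 1 / (12 * n) + 1e-9
```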

The appearance of e might seem surprising, but the following example shows that it arises naturally in this context:

Proposition 6.

The number $s_n$ of all sequences (of arbitrary length) without repetition that can be formed from n objects is $\lfloor e\cdot n!\rfloor$.

We note that no sequence can have length more than n, as this would require repetition.

Proof 3.2.

If we have a sequence of length k, there are $(n)_k = k!\binom{n}{k} = \frac{n!}{(n-k)!}$ sequences of that length. Summing over all values of k, we get (with an index shift, replacing k by $n-k$):

$s_n = \sum_{k=0}^{n}\frac{n!}{k!} = n!\sum_{k=0}^{n}\frac{1}{k!}.$

Using the Taylor series for $e^x$ we see that

$e\cdot n! - s_n = \frac{1}{n+1} + \frac{1}{(n+1)(n+2)} + \cdots < \frac{1}{n+1} + \frac{1}{(n+1)^2} + \cdots = \frac{1}{n} < 1.$

This is an example (if we ignore language meaning) of the popular problem of how many words can be formed from the letters of a given word, provided no letter is duplicated.
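Proposition 6 can be confirmed by brute-force enumeration for small n; note that, as in the formula, the empty sequence is included as the k = 0 term. (A sketch; names ours.)

```python
# Count all repetition-free sequences over n objects and compare with
# floor(e * n!) (illustrative sketch).
from itertools import permutations
from math import e, factorial, floor

def s(n):
    """Number of repetition-free sequences of any length 0,...,n."""
    return sum(1 for k in range(n + 1) for _ in permutations(range(n), k))

for n in range(1, 8):
    assert s(n) == floor(e * factorial(n))
```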

{tryout}

If we allow duplicate letters the situation gets harder. For example, consider words (ignoring meaning) made from the letters of the word COLORADO. The letter O occurs thrice, the other five letters only once. If a word contains at most one O, the formula from above gives $\lfloor e\cdot 6!\rfloor = \sum_{k=0}^{6}\frac{6!}{k!} = 720+720+360+120+30+6+1 = 1957$ such words.

For more than one O, the above formula can’t be used any longer, but we need to go back to summations. If the word contains two O and k other letters, there are $\binom{5}{k}$ options to select these letters and $(k+2)!/2$ possibilities to arrange the letters (the denominator 2 making up for the fact that the two O cannot be distinguished). Thus we get

$\frac{2!}{2} + 5\cdot\frac{3!}{2} + 10\cdot\frac{4!}{2} + 10\cdot\frac{5!}{2} + 5\cdot\frac{6!}{2} + \frac{7!}{2} = 1+15+120+600+1800+2520 = 5056$

such words.

If the word contains three O and k other letters, we get a similar formula, but with a cofactor $(k+3)!/6$, reflecting the $3! = 6$ arrangements of the three O which yield the same word. We thus have

$\frac{3!}{6} + 5\cdot\frac{4!}{6} + 10\cdot\frac{5!}{6} + 10\cdot\frac{6!}{6} + 5\cdot\frac{7!}{6} + \frac{8!}{6} = 1+20+200+1200+4200+6720 = 12341$

possibilities, summing up to 19354 possibilities in total.

Luckily we do not live in MISSISSIPPI!
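The three counts (and their total) can be confirmed by brute force, deduplicating arrangements with a set; as in the sums above, the total includes the empty word as the k = 0 term. (A sketch; names ours.)

```python
# Brute-force count of distinct words from the letters of COLORADO (sketch).
from itertools import permutations

letters = "COLORADO"                 # the letter O occurs three times
words = set()
for k in range(len(letters) + 1):    # k = 0 contributes the empty word
    words.update(permutations(letters, k))
print(len(words))  # 19354
```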

4 The Twelvefold Way

A generalization of counting sets and sequences is given by considering functions between finite sets. We shall consider functions $f\colon N\to X$ with $|N| = n$ and $|X| = x$. These functions could be arbitrary, injective, or surjective.

What does the concept of order (in)significant mean in this context? If the order is not significant, we actually only care about the set of values of a function, but not the values on particular elements. That is, all the elements of N are equivalent, we call them indistinguishable or unlabeled (since labels will force objects to be different). Otherwise we talk about distinguishable or labeled objects.

Formally, we are counting equivalence classes of functions, in which two functions $f,g\colon N\to X$ are called N-equivalent if there is a bijection $u\colon N\to N$ such that $f(u(a)) = g(a)$ for all $a\in N$.

Similarly, we define an X-equivalence of functions, calling f and g equivalent if there is a bijection $v\colon X\to X$ such that $v(f(a)) = g(a)$ for all $a\in N$. If we say the elements of X are indistinguishable, we count functions up to X-equivalence.

We can combine both equivalences to get even larger equivalence classes, which are the case of elements of both N and X being indistinguishable.

(The reader might feel this to be insufficiently stringent, or wonder about the case of different classes of equivalent objects. We will treat such situations in Section LABEL:polya under the framework of group actions.)

This setup (N and X distinguishable or not, and functions being injective, surjective, or neither) gives in total $3\cdot 2\cdot 2 = 12$ possible classes of functions. This set of counting problems is called the Twelvefold Way in [stanley1, Section 1.9] (and attributed there to Gian-Carlo Rota).

{tryout}

For each of the 12 categories, give an example of a concrete counting problem in common language. Say, we have n balls and x boxes (either of them might be labeled), and we might require that no box has more than one ball, or that every box contains at least one ball.

Say, we have N={1,2} and X={a,b,c}. Then we have the following functions $f\colon N\to X$:

All functions:

There are 9 functions, namely (giving functions as sequences of values): [a,a],[a,b],[a,c],[b,a],[b,b],[b,c],[c,a],[c,b],[c,c].

Injective functions:

There are 6 functions, namely [a,b],[a,c],[b,a],[b,c],[c,a],[c,b].

Surjective functions:

There are no such functions, but there are 6 functions from {1,2,3} to {a,b}, namely: [a,a,b],[a,b,a],[a,b,b],[b,a,a],[b,a,b],[b,b,a].

Up to permutation of N

So we consider only the sets of values, which gives 6 possibilities: {a,a},{a,b},{a,c},{b,b},{b,c},{c,c}.

Injective, up to permutation of N

The values need to be different, so 3 possibilities: {a,b},{a,c},{b,c}.

Surjective, up to permutation of N

There are no such functions, but there are 2 such functions from {1,2,3} to {a,b}, namely: [a,a,b],[a,b,b].

Up to permutations of X

Since |N|=2, the question is just whether the two values are the same or not: [a,a],[a,b].

Injective, up to permutations of X

Here [a,b] is the only such function.

Surjective, up to permutations of X

Again, no such function, but from {1,2,3} to {a,b} there are 3 such functions namely [a,a,b],[a,b,a],[b,a,a].

Up to permutations of N and X

Again two possibilities, [a,a],[a,b]; but if N={1,2,3}, there are three possibilities, namely [a,a,a],[a,a,b],[a,b,c].

Injective, up to permutations of N and X

Again, [a,b] is the only such function.

Surjective, up to permutations of N and X

Again, no such function, but from {1,2,3} to {a,b} there is one, namely [a,a,b].

We are now getting ready to give formulas for the number of functions in each class, depending only on n and x. For this we introduce the following definitions. Determining closed formulae for these is not always easy, and will require further work in subsequent chapters.

A partition of a set A is a collection $\{A_i\}$ of subsets $A_i\subseteq A$ (called parts or cells) such that:

  • $\bigcup_i A_i = A$

  • $A_i\cap A_j = \emptyset$ for $j\neq i$.

Note that a partition of a set gives an equivalence relation and that any equivalence relation on a set defines a partition into equivalence classes.

Definition 7.

We denote the number of partitions of $\{1,\ldots,n\}$ into k (non-empty) parts by S(n,k). It is called the Stirling number of the second kind. (There also is a Stirling number of the first kind, OEIS A008275.) The total number of partitions of $\{1,\ldots,n\}$ is given by the Bell number OEIS A000110:

$B_n = \sum_{k=1}^{n} S(n,k).$
{tryout}

There are $B_3 = 5$ partitions of the set $\{1,2,3\}$.

Again we might want to set $S(n,k) = 0$ unless $1\le k\le n$.

We will give a formula for S(n,k) in Lemma 6.1 and study $B_n$ in section 1.

In some cases we shall care not which numbers are in which cells of a partition, but only about the sizes of the cells. (That is, we only care about writing n as a sum over a nondecreasing sequence of positive integers: n = 1+3+4+6.) Sometimes this is called an integer partition, a Young diagram, or a Ferrers diagram (the difference between the last two is purely in whether we draw boxes or circles). Figure 3 shows these diagrams for the partition (1,3,4,6). In one style (called “English”), the lengths of the rows decrease as one goes downwards; in another (“French”) they do as one goes upwards.

We denote the number of partitions of n into k parts (ignoring which numbers are in which part) by $p_k(n)$, and the total number of partitions by p(n). OEIS A000041.

Figure 3: Partition (1,3,4,6) as Ferrers diagram and Young diagrams, English and French style

Again we study the function p(n) later; however, we shall not achieve a closed formula for its value.

{tryout}

We have $B_3 = 5$, but $p(3) = 3$. This is because the three partitions $\{\{1\},\{2,3\}\}$, $\{\{2\},\{1,3\}\}$, $\{\{3\},\{1,2\}\}$ all have the same cell size pattern.
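Both counting sequences can be generated by brute force for small n; the sketch below (names ours, not from the notes) enumerates set partitions as restricted growth strings and integer partitions as nonincreasing tuples:

```python
# Enumerate set partitions and integer partitions of small n (sketch).

def set_partitions(n):
    """Yield partitions of {1,...,n} as restricted growth strings:
    entry i is the (0-based) index of the cell containing i+1."""
    def rec(prefix, m):
        if len(prefix) == n:
            yield tuple(prefix)
        else:
            for c in range(m + 1):
                yield from rec(prefix + [c], max(m, c + 1))
    yield from rec([], 0)

def int_partitions(n, maxpart=None):
    """All partitions of the integer n as nonincreasing tuples."""
    maxpart = n if maxpart is None else maxpart
    if n == 0:
        return [()]
    return [(f,) + rest
            for f in range(min(n, maxpart), 0, -1)
            for rest in int_partitions(n - f, f)]

assert sum(1 for p in set_partitions(3)) == 5                       # B_3
assert sum(1 for p in set_partitions(4) if len(set(p)) == 2) == 7   # S(4,2)
assert len(int_partitions(3)) == 3                                  # p(3)
assert sum(1 for p in int_partitions(3) if len(p) == 2) == 1        # p_2(3)
```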

1 The twelvefold way theorem

We now extend the table of theorem 3:

Theorem 8.

If N, X are finite sets with $|N| = n$ and $|X| = x$, the number of (equivalence classes of) functions $f\colon N\to X$ is given by the following table. (In the first two columns, d/i indicates whether elements are considered distinguishable or indistinguishable. The boxed numbers refer to the explanations in the proof.):

N	X	f arbitrary	f injective	f surjective
d	d	1) $x^n$	2) $(x)_n$	3) $x!\,S(n,x)$
i	d	4) $\left(\!\binom{x}{n}\!\right)$	5) $\binom{x}{n}$	6) $\binom{n-1}{x-1} = \left(\!\binom{x}{n-x}\!\right)$
d	i	7) $\sum_{k=1}^{x} S(n,k)$	8) 1 if $n\le x$, 0 if $n>x$	9) $S(n,x)$
i	i	10) $\sum_{k=1}^{x} p_k(n)$	11) 1 if $n\le x$, 0 if $n>x$	12) $p_x(n)$
Proof 4.1.

If N is distinguishable, we can simply write the elements of N in a row and consider a function on N as a sequence of values. In 1), we thus have sequences of length n with x possible values, in 2) such sequences without repetition, both formulas we already know.

If such a sequence takes exactly x values, each value f(a) can be taken to indicate the cell of a partition into x parts into which a is placed. As we consider a partition as a set of parts, it does not distinguish the elements of X; that shows that the value in 9) has to be S(n,x). If we distinguish the elements of X we need to account for the x! possible arrangements of cells, yielding the value in 3).

Similarly to 9), if we do not require f to be surjective, the number of different values of f gives us the number of parts. Up to x different parts are possible, thus we need to add the values of the Stirling numbers.

To get 12) from 9) and 10) from 7) we notice that making the elements of N indistinguishable simply means that we only care about the sizes of the parts, not which number is in which part. This means that the Stirling number S(n,x) gets replaced by the partition (shape) count px(n).

If we again start at 1) but now consider the elements of N as indistinguishable, we go from sequences to sets. If f is injective we have ordinary sets, in the general case multisets, and have already established the results of 4) and 5).

For 6), we interpret the x distinct values of f to separate the elements of N into x parts. This is a composition of n into x parts, for which the count has been established.

In 8) and 11), finally, injectivity demands that we assign all elements of N to different values, which is only possible if $n\le x$. As we do not distinguish the elements of X it does not matter what the actual values are, thus there is only one such function up to equivalence.
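Several entries of the table can be verified by brute force over all functions, represented as value tuples; canonical forms implement the two equivalences. The following sketch (names ours) checks entries 1)–5) and 9) for n = x = 3:

```python
# Brute-force check of some twelvefold-way entries for n = x = 3 (sketch).
from itertools import product
from math import comb, perm, factorial

def S(n, k):
    """Stirling numbers of the second kind via S(n,k) = k S(n-1,k) + S(n-1,k-1)."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return k * S(n - 1, k) + S(n - 1, k - 1)

n, x = 3, 3
funcs = list(product(range(x), repeat=n))        # all f: N -> X as value tuples
inj = [f for f in funcs if len(set(f)) == n]
sur = [f for f in funcs if len(set(f)) == x]

def pattern(f):
    """Canonical relabeling of values: represents X-equivalence classes."""
    seen = {}
    return tuple(seen.setdefault(v, len(seen)) for v in f)

assert len(funcs) == x ** n                                           # 1)
assert len(inj) == perm(x, n)                                         # 2)
assert len(sur) == factorial(x) * S(n, x)                             # 3)
assert len({tuple(sorted(f)) for f in funcs}) == comb(x + n - 1, n)   # 4)
assert len({tuple(sorted(f)) for f in inj}) == comb(x, n)             # 5)
assert len({pattern(f) for f in sur}) == S(n, x)                      # 9)
```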

Chapter 2 Recurrence and Generating Functions

{epigraphs}\qitem

There’s al-gebra. That’s like sums with letters. For…for people whose brains aren’t clever enough for numbers, see? Jingo
Terry Pratchett

Finding a closed formula for a combinatorial counting function can be hard. It often is much easier to establish a recursion, based on a reduction of the problem. Such a reduction often is the principal tool when constructing all objects in the respective class.

An easy example of such a situation is given by the number of partitions of n, given by the Bell numbers Bn:

Lemma 0.1.

For $n\ge 1$ we have:

$B_n = \sum_{k=1}^{n} \binom{n-1}{k-1}\,B_{n-k}$
Proof 0.2.

Consider a partition of $\{1,\ldots,n\}$. Being a partition, it must contain 1 in some cell. We group the partitions according to how many points are in the cell containing 1. If there are k elements in this cell, there are $\binom{n-1}{k-1}$ options for the other points in this cell. And the rest of the partition is simply a partition of the remaining $n-k$ points.
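The recursion translates directly into a computation of the Bell numbers (a sketch; names ours):

```python
# Bell numbers via the recursion of Lemma 0.1, with B_0 = 1 (sketch).
from math import comb

def bell(n):
    B = [1]                                  # B_0 = 1 (the empty partition)
    for m in range(1, n + 1):
        B.append(sum(comb(m - 1, k - 1) * B[m - k] for k in range(1, m + 1)))
    return B[n]

print([bell(n) for n in range(6)])  # [1, 1, 2, 5, 15, 52]
```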

1 Power Series – A touch of Calculus

A powerful technique for working with recurrence relations is that of generating functions. The definition is easy: for a counting function $f\colon\mathbb{N}_0\to\mathbb{R}$ defined on the nonnegative integers we define the associated generating function as the power series

$F(t) = \sum_{n=0}^{\infty} f_n t^n \in \mathbb{R}[[t]].$

Here $\mathbb{R}[[t]]$ is the ring of formal power series in t, that is, the set of all formal sums $\sum_{n=0}^{\infty} a_n t^n$ with $a_n\in\mathbb{R}$.

When writing down such objects, we ignore the question whether the series converges, i.e. whether F can be interpreted as a function on a subset of the real numbers.

The reader interested in a rigorous, self-contained presentation of the theory of formal power series, one that does not rely on results from analysis, can find it in [sambale2024].

Addition and multiplication are as one would expect: If $F(t) = \sum_{n\ge0} f_n t^n$ and $G(t) = \sum_{n\ge0} g_n t^n$, we define new power series $F+G$ and $F\cdot G$ by:

$(F+G)(t) := \sum_{n\ge0}(f_n+g_n)\,t^n$
$(F\cdot G)(t) := \sum_{n\ge0}\Big(\sum_{a+b=n} f_a g_b\Big)\,t^n.$

With this arithmetic $\mathbb{R}[[t]]$ becomes a commutative ring. (There is no convergence issue, as the sum over $a+b=n$ is always finite.)

We also define two operators, called differentiation and integration, on $\mathbb{R}[[t]]$ by

\[ \frac{d}{dt}\Big(\sum_{n\ge0}a_nt^n\Big)=\sum_{n\ge1}(na_n)t^{n-1} \]

and

\[ \int\Big(\sum_{n\ge0}a_nt^n\Big)=\sum_{n\ge0}\frac{a_n}{n+1}t^{n+1} \]

(with the integration constant set to $0$).

Two power series are equal if and only if all coefficients are equal. An identity involving a power series as a variable is called a functional equation.

As a “shorthand”, pretending we knew nothing from Calculus (after all, we do not establish convergence), we define the following names for particular power series:

\begin{align*}
\exp(t)&=\sum_{n\ge0}\frac{t^n}{n!}\\
\log(1+t)&=\sum_{n\ge1}(-1)^{n-1}\frac{t^n}{n}
\end{align*}

and note\footnote{All of this can either be proven directly for power series; alternatively, we choose $t$ in the interval of convergence, use results from Calculus, and obtain the result from the uniqueness of power series representations.} that the usual functional identities $\exp(a+b)=\exp(a)\exp(b)$, $\log(ab)=\log(a)+\log(b)$, $\log(\exp(t))=t$ and differential identities $\frac{d}{dt}\exp(t)=\exp(t)$, $\frac{d}{dt}\log(t)=1/t$ hold.

A telescoping argument shows that for $r\in\mathbb{R}$ we have that $(1-rt)\big(\sum_n r^nt^n\big)=1$, that is we can use the geometric series identity

\[ \sum_n r^nt^n=\frac{1}{1-rt} \]

to embed rational functions (whose denominators factor completely; this can be taken as given if we allow for complex coefficients) into the ring of power series.

For a real number $r$, we define\footnote{Unless $r$ is an integer, this is a definition in the ring of formal power series.} the generalized binomial coefficient as:

\[ \binom{r}{n}=\frac{r(r-1)\cdots(r-n+1)}{n!} \]

as well as a series representation of the $r$-th power:

\[ (1+t)^r=\sum_{n\ge0}\binom{r}{n}t^n. \]

For obvious reasons we call this definition the binomial formula. Note that for integral $r$ this agrees with the usual definition of exponents and binomial coefficients, so there is no conflict with the traditional definitions.

We also notice that for arbitrary $r,s$ we have that $(1+t)^r(1+t)^s=(1+t)^{r+s}$ and $\frac{d}{dt}(1+t)^r=r(1+t)^{r-1}$.

Up to this point, generating functions seem to be a formality for formality's sake. They come into their own, however, with the following observations: Operations on the entries of a sequence, relations amongst its entries (such as recursion), or a sequence having been built on top of other sequences often have natural analogues with generating functions. Recursive identities amongst a coefficient sequence then become functional equations or differential equations for the generating functions.

If these equations have solutions, known start values typically give uniqueness of the solution, and (assuming it converges in an open interval), its power series representation must be identical to the generating function. Methods from Analysis, such as Taylor’s theorem, then can be used to determine expressions for the terms of the sequence.

Before describing this more formally, let us look at a toy example:

We define a function recursively by setting $f_0=1$ and $f_{n+1}=2f_n$. (This recursion comes from the number of subsets of a set of cardinality $n$ – fix one element and distinguish between subsets containing this element and those that don’t.) The associated generating function is $F(t)=\sum_{n\ge0}f_nt^n$. We now observe that

\[ 2tF(t)=\sum_{n\ge0}2f_nt^{n+1}=\sum_{n\ge0}f_{n+1}t^{n+1}=F(t)-1. \]

We solve this functional equation as

\[ F(t)=\frac{1}{1-2t} \]

and (geometric series!) obtain the power series representation

\[ F(t)=\sum_{n\ge0}2^nt^n. \]

This allows us to conclude that $f_n=2^n$, solving the recursion, and giving the combinatorial result (which we knew already by another method) that a set of cardinality $n$ has $2^n$ subsets.
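The expansion of a rational function into its power series can also be carried out mechanically by long division; a small sketch (function name my own) that recovers the coefficients $2^n$ of $1/(1-2t)$, and, with denominator $1-t-t^2$, the Fibonacci numbers:

```python
def series_inverse(den, prec):
    """Power-series inverse of a polynomial den (den[0] != 0), by long division:
    the coefficients inv satisfy sum_k den[k] * inv[n-k] = 0 for n >= 1."""
    inv = [0.0] * prec
    inv[0] = 1 / den[0]
    for n in range(1, prec):
        inv[n] = -sum(den[k] * inv[n - k]
                      for k in range(1, min(n, len(den) - 1) + 1)) / den[0]
    return inv

print(series_inverse([1, -2], 6))      # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
print(series_inverse([1, -1, -1], 7))  # [1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0]
```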

1 Operations on Generating Functions

To prepare us for working with generating functions, let’s look in more detail at what generating function operations do with the coefficients.

For this, assume that $f_n$ and $g_n$ are sequences, associated to generating functions $F(t)=\sum_nf_nt^n$ and $G(t)=\sum_ng_nt^n$, respectively.

We start with basic linearity. $\lambda F(t)+G(t)$ is the generating function associated to the sequence $\lambda f_n+g_n$. The case $\lambda=1$ is sometimes called the sum rule, describing the case that $f$ and $g$ count the cardinality of disjoint sets and $f_n+g_n$ the cardinality of their union.

Shifting

Shifting the indices corresponds to multiplication by (positive or negative) powers of t.

Differentiation

If $F(t)=\sum_{i\ge0}f_it^i$, the derivative of $F(t)$ corresponds to the sequence $s_n=(n+1)f_{n+1}$. This will allow for coefficients that also depend on the index position. This can often be used effectively together with shifts. For example, the constant sequence $f_n=1$ corresponds (geometric series) to the generating function $\frac{1}{1-t}$. Its derivative, $\frac{1}{(1-t)^2}$, thus corresponds to the sequence $1,2,3,4,\dots$. Thus $\frac{t}{(1-t)^2}$ corresponds to the sequence $0,1,2,3,\dots$. If we take a further derivative, we multiply each coefficient with its index position and shift back, resulting in the sequence of squares $1,4,9,16,\dots$. The associated generating function will be the derivative $\frac{1+t}{(1-t)^3}$. We can shift once more to get the sequence $s_n=n^2$, associated to the generating function $\frac{t(1+t)}{(1-t)^3}$.

Products

If we define a new sequence as sum of terms whose indices add up to a given value (this is sometimes called convolution):

\[ c_n=f_0g_n+f_1g_{n-1}+f_2g_{n-2}+\cdots+f_ng_0 \]

its generating function will be $F(t)\cdot G(t)$. This is called the product rule.

One particular case of this is the summation rule: Suppose we define one sequence by summing over another: $c_n=\sum_{i=0}^nf_i$. We can interpret this as convolution with the constant sequence $g_i=1$, whose generating function is $\frac{1}{1-t}$. The generating function for the summatory sequence thus is $C(t)=\frac{F(t)}{1-t}$.

Continuing the previous example, the sequence $s_n=\sum_{i=0}^ni^2$ thus has the generating function $S(t)=\frac{t(1+t)}{(1-t)^4}$.
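The summation rule is easy to verify in small cases. A sketch (names my own): the partial sums of the squares must match the coefficients of $\frac{t(1+t)}{(1-t)^4}$, which by the binomial series are $\binom{n+2}{3}+\binom{n+1}{3}$:

```python
from math import comb

def partial_sums(F):
    """Convolution with the all-ones sequence, i.e. division by (1 - t)."""
    out, run = [], 0
    for c in F:
        run += c
        out.append(run)
    return out

squares = [n * n for n in range(8)]                     # coefficients of t(1+t)/(1-t)^3
gf = [comb(n + 2, 3) + comb(n + 1, 3) for n in range(8)]  # coefficients of t(1+t)/(1-t)^4
print(partial_sums(squares) == gf)  # True
```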

The standard reference for generating functions is [generatingfunctionology]. The book [concrete] has a dedicated chapter, written on the level of an (advanced) undergraduate textbook.

We now put these tools to use, looking at general methods for finding closed-form expressions for recursively defined sequences.

2 Linear recursion with constant coefficients

{epigraphs}\qitem

Then, at the age of forty, you sit,
theologians without Jehovah,
hairless and sick of altitude,
in weathered suits,
before an empty desk,
burned out, oh Fibonacci,
oh Kummer1, oh Gödel, oh Mandelbrot
in the purgatory of recursion.
1  German “Kummer” also means: “grief”

{epigraphs}\qitem

Dann, mit vierzig, sitzt ihr,

o Theologen ohne Jehova,

haarlos und höhenkrank

in verwitterten Anzügen

vor dem leeren Schreibtisch,

ausgebrannt, o Fibonacci,

o Kummer, o Gödel, o Mandelbrot,

im Fegefeuer der Rekursion. Die Mathematiker

Hans Magnus Enzensberger

Suppose that $f_n$ satisfies a recursion of the form

\[ f_n=a_1f_{n-1}+a_2f_{n-2}+\cdots+a_kf_{n-k}. \]

That is, there is a fixed number of recursion terms and each term is just a scalar multiple of a prior value (we shall see below that easy cases of index dependence also can be treated this way). We also assume that $k$ initial values $f_0,\dots,f_{k-1}$ have been established.\footnote{Some readers might have seen a linear algebra approach that solves such a recursion through the tool of matrix diagonalization. This approach results in the same characteristic polynomial and similar calculatory effort, but will not be generalizable in the same way as generating functions are.}

The most prominent case of this is clearly the Fibonacci numbers OEIS A000045 with $k=2$, recursion $f_n=f_{n-1}+f_{n-2}$ and initial values $f_0=f_1=1$. We shall use these as an example.

Step 1: Get the functional equation

Using the recursion, expand the coefficient $f_n$ in the generating function $F(t)=\sum_nf_nt^n$ with terms of lower index. Note that for $n<k$ the recursion does not hold; you will need to look at the initial values to see whether the given formula suffices, or if you need to add explicit multiples of powers of $t$ to get equality.

Separate summands into different sums and factor out powers of $t$ to get $f_n$ combined with $t^n$.

Replace all expressions $\sum_nf_nt^n$ back with the generating function $F(t)$. The whole expression also must be equal to $F(t)$; this is the functional equation.

{tryout}

In the case of the Fibonacci numbers, the recursion is $f_n=f_{n-1}+f_{n-2}$ for $n>1$. Thus we get, using the initial values $f_0=f_1=1$, that

\begin{align*}
\sum_{n\ge0}f_nt^n&=\sum_{n>1}(f_{n-1}t^n+f_{n-2}t^n)+f_1t+f_0\\
&=\sum_{n>1}f_{n-1}t^n+\sum_{n>1}f_{n-2}t^n+t+1\\
&=t\sum_{n>1}f_{n-1}t^{n-1}+t^2\sum_{n>1}f_{n-2}t^{n-2}+t+1\\
&=t\sum_{n>0}f_nt^n+t^2\sum_{n\ge0}f_nt^n+t+1\\
&=t\Big(\sum_{n\ge0}f_nt^n-f_0\Big)+t^2\sum_{n\ge0}f_nt^n+t+1\\
&=tF(t)-tf_0+t^2F(t)+t+1=tF(t)+t^2F(t)+1.
\end{align*}

The functional equation is thus $F(t)=tF(t)+t^2F(t)+1$, respectively $F(t)(1-t-t^2)=1$.

Step 2: Partial Fractions

We can solve the functional equation to express $F(t)$ as a rational function in $t$. (This is possible, because the functional equation will be a linear polynomial in $F(t)$.) Then, using partial fractions (Calculus 2), we can write this as a sum of terms of the form $\frac{a_i}{(t-\alpha_i)^{e_i}}$.

{tryout}

We solve the functional equation and obtain $F(t)=\frac{1}{1-t-t^2}$. For a partial fraction decomposition, let $\alpha=\frac{-1+\sqrt5}{2}$, $\beta=\frac{-1-\sqrt5}{2}$ be the roots of the denominator polynomial $1-t-t^2$. Then

\[ F(t)=\frac{1}{1-t-t^2}=\frac{a}{t-\alpha}+\frac{b}{t-\beta} \]

We solve this as $a=-1/\sqrt5$, $b=1/\sqrt5$.

Step 3: Use known power series to express each summand as a power series

The geometric series gives us that

\[ \frac{a}{t-\alpha}=-\sum_{n\ge0}\frac{a}{\alpha^{n+1}}t^n \]

If there are multiple roots, denominators could arise in powers. For this we notice that

\[ \frac{1}{(t-\alpha)^2}=\sum_{n\ge0}\frac{n+1}{\alpha^{n+2}}t^n \]

and for an integer $c\ge1$ that

\[ \frac{1}{(t-1)^c}=(-1)^c\sum_{n\ge0}\binom{c+n-1}{n}t^n \]

Using these formulae, we can write each summand of the partial fraction decomposition as an infinite series.

{tryout}

In the example we get

\begin{align*}
F(t)&=-\frac{1}{\sqrt5}\cdot\frac{1}{t-\alpha}+\frac{1}{\sqrt5}\cdot\frac{1}{t-\beta}\\
&=\frac{1}{\sqrt5}\sum_{n\ge0}\frac{1}{\alpha^{n+1}}t^n-\frac{1}{\sqrt5}\sum_{n\ge0}\frac{1}{\beta^{n+1}}t^n
\end{align*}
Step 4: Combine to one sum, and read off coefficients

We now take this (unique!) power series expression and read off the coefficients. The coefficient of tn will be fn, which gives us an explicit formula.

{tryout}

Continuing the calculation above, we get

\[ F(t)=\sum_{n\ge0}\Big(\frac{1}{\sqrt5}\,\frac{1}{\alpha^{n+1}}-\frac{1}{\sqrt5}\,\frac{1}{\beta^{n+1}}\Big)t^n \]

and thus a closed formula for the Fibonacci numbers:

\begin{align*}
f_n&=\frac{1}{\sqrt5}\,\frac{1}{\alpha^{n+1}}-\frac{1}{\sqrt5}\,\frac{1}{\beta^{n+1}}\\
&=\frac{1}{\sqrt5}\left(\frac{1+\sqrt5}{2}\right)^{n+1}-\frac{1}{\sqrt5}\left(\frac{1-\sqrt5}{2}\right)^{n+1}\\
&=\frac{1}{\sqrt5}\left(\left(\frac{2}{\sqrt5-1}\right)^{n+1}-\left(\frac{-2}{\sqrt5+1}\right)^{n+1}\right)
\end{align*}

We notice that $\frac{2}{\sqrt5-1}>\frac{2}{\sqrt5+1}>0$, thus asymptotically, as $n\to\infty$:

\[ f_{n+1}/f_n\to\frac{2}{\sqrt5-1}=\phi\approx1.618 \]

the value of the golden ratio.
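The closed form can be checked against the recursion directly; a sketch in floating-point arithmetic (function name my own), with rounding absorbing the tiny numerical error:

```python
from math import sqrt

def fib_closed(n):
    """f_n = (((1+sqrt5)/2)^(n+1) - ((1-sqrt5)/2)^(n+1)) / sqrt5."""
    s5 = sqrt(5)
    return (((1 + s5) / 2) ** (n + 1) - ((1 - s5) / 2) ** (n + 1)) / s5

f = [1, 1]
for n in range(2, 25):
    f.append(f[n - 1] + f[n - 2])
print(all(round(fib_closed(n)) == f[n] for n in range(25)))  # True
```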

1 Another example

We try another example. Take the (somewhat random, but involving an index-dependent term that makes it more complicated) recursion given by

\begin{align*}
g_0&=g_1=1\\
g_n&=g_{n-1}+2g_{n-2}+(-1)^n,\quad\text{for }n\ge2
\end{align*}

We get for the generating function

\begin{align*}
G(t)&=\sum_ng_nt^n=\sum_{n\ge1}g_{n-1}t^n+2\sum_{n\ge2}g_{n-2}t^n+\sum_{n\ge0}(-1)^nt^n+t\\
&=tG(t)+2t^2G(t)+\frac{1}{1+t}+t.
\end{align*}

(You should verify that the addition of $t$ was all that was required to resolve the initial values.)

We solve this functional equation as

\[ G(t)=\frac{1+t+t^2}{(1-2t)(1+t)^2} \]

and get (e.g. in Wolfram Alpha: partial fractions (1+t+t^2)/(1-2*t)/(1+t)^2) the partial fraction decomposition

\[ G(t)=-\frac{7}{18\left(t-\frac12\right)}-\frac{1}{9(t+1)}+\frac{1}{3(t+1)^2}. \]

and can read off the power series representation

\begin{align*}
G(t)&=\frac{7}{18}\sum_n2^{n+1}t^n+\frac19\sum_n(-1)^{n+1}t^n+\frac13\sum_n(-1)^n(n+1)t^n\\
&=\sum_n\Big(\frac79\,2^n-\frac19(-1)^n+\frac13(-1)^n(n+1)\Big)t^n\\
&=\sum_n\Big(\frac79\,2^n+\Big(\frac n3+\frac29\Big)(-1)^n\Big)t^n,
\end{align*}

solving the recursion as $g_n=\frac79\,2^n+\left(\frac n3+\frac29\right)(-1)^n$.
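Exact rational arithmetic makes such a result easy to double-check; a short sketch comparing the closed form with the recursion:

```python
from fractions import Fraction

# recursion g_0 = g_1 = 1, g_n = g_{n-1} + 2 g_{n-2} + (-1)^n
g = [1, 1]
for n in range(2, 15):
    g.append(g[n - 1] + 2 * g[n - 2] + (-1) ** n)

# closed form g_n = (7/9) 2^n + (n/3 + 2/9) (-1)^n, in exact arithmetic
closed = [Fraction(7, 9) * 2**n + (Fraction(n, 3) + Fraction(2, 9)) * (-1) ** n
          for n in range(15)]
print(closed == g)  # True
```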

3 Nested Recursions: Domino Tilings

{epigraphs}\qitem

The Domino Theory had become conventional wisdom and was rarely challenged. Diplomacy
Henry Kissinger

We now consider a number of problems that stem from counting arrangements of objects in the plane. While these counts themselves are not of deep importance, they produce nice examples of recursions.

Refer to caption
Figure 1: A 2×10 domino tiling

Suppose we have tiles that have dimensions $1\times2$ (in your favorite units) and we want to tile a corridor. Let $d_n$ be the number of possible tilings of a corridor that has dimensions $2\times n$. We can start on the left with a vertical domino (thus leaving to the right of it a tiling of a corridor of length $n-1$) or with two horizontal dominos (leaving to the right of it a corridor of length $n-2$). This gives us the recursion

\[ d_n=d_{n-1}+d_{n-2},\quad n>1, \]

with $d_1=1$ and $d_2=2$ (and thus $d_0=1$ to fit the recursion). These are again the Fibonacci numbers we have already worked out.

If we assume that the tiles are not symmetric, there are actually two ways to place a horizontal tile and two ways to place a vertical tile. We thus get a recursion with different coefficients,

\[ d_n=2d_{n-1}+4d_{n-2},\quad n>1 \]

with $d_1=2$, $d_2=8$ (and thus $d_0=1$).

1 Multiple recursions

We go back to the assumption of symmetric tiles, but expand the corridor to dimensions $3\times n$. The recursion approach now seems to be problematic – it is possible to have patterns of arbitrary length that do not reduce to a shorter length, see figure 2.

Refer to caption
Figure 2: A tiling pattern that has no vertical cut

We thus instead look only at the right end of the tiling and argue that every tiling has to end with one of the three patterns depicted in the top row of figure 3.

Removing these end pieces either produces a tiling of length $n-2$, or a tiling of length $n-1$ in which the top (or bottom) right corner is missing. We thus introduce a count $e_n$ for tilings of length $n$ with the top right corner missing; the count for the bottom right corner missing will by symmetry be the same. The three possible end cases thus give us the recursion

\[ d_n=d_{n-2}+2e_{n-1} \]
Refer to caption
Figure 3: Variants for a 3×n tiling

Since we also need to know the values of $e_n$, we build a recursive formula for them as well. Consider a tiling with the top right corner missing. Its right end must be a vertical tile or two horizontal tiles. In this second case there also will have to be a further horizontal tile in the top row; the ends thus look as in the bottom row of figure 3.

This gives us the recursion

\[ e_n=d_{n-1}+e_{n-2} \]

We use the initial values $d_0=1$, $d_1=0$, $d_2=3$, $e_0=0$, $e_1=1$, and thus get the following identities for the associated generating functions

\begin{align*}
D(t)&=\sum_nd_nt^n=\sum_{n>1}(d_{n-2}+2e_{n-1})t^n+d_0+d_1t\\
&=t^2\sum_{n>1}d_{n-2}t^{n-2}+2t\sum_{n>1}e_{n-1}t^{n-1}+1\\
&=t^2\sum_{n\ge0}d_nt^n+2t\Big(\sum_{n\ge0}e_nt^n-e_0t^0\Big)+1\\
&=t^2D(t)+2tE(t)+1
\end{align*}

and

\begin{align*}
E(t)&=\sum_ne_nt^n=\sum_{n>1}(d_{n-1}+e_{n-2})t^n+e_0+e_1t\\
&=t(D(t)-d_0)+t^2E(t)+t=tD(t)+t^2E(t)
\end{align*}

We solve this second equation as

\[ E(t)=\frac{t}{1-t^2}D(t) \]

and substitute into the first equation, obtaining the functional equation

\[ D(t)=t^2D(t)+\frac{2t^2}{1-t^2}D(t)+1 \]

which we solve for

\[ D(t)=\frac{1-t^2}{1-4t^2+t^4} \]

This is a function in $t^2$, indicating that $d_n=0$ for odd $n$ (indeed, this must be, as in this case $3\times n$ is odd and cannot be tiled with tiles of area $2$). We thus can consider instead the function

\[ R(t)=\frac{1-t}{1-4t+t^2}=\sum_nr_nt^n \]

with $d_{2n}=r_n$. Partial fraction decomposition, geometric series, and final collection of coefficients gives us the formula

\[ d_{2n}=r_n=\frac{(2+\sqrt3)^n}{3-\sqrt3}+\frac{(2-\sqrt3)^n}{3+\sqrt3} \]

and the sequence of $r_n$ given by OEIS A001835

\[ 1,3,11,41,153,571,2131,7953,29681,110771,413403,\dots \]
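The coupled recursions and the closed form are easy to confirm numerically; a sketch (variable names my own):

```python
from math import sqrt

# coupled recursions d_n = d_{n-2} + 2 e_{n-1},  e_n = d_{n-1} + e_{n-2}
d, e = [1, 0], [0, 1]
for n in range(2, 22):
    d.append(d[n - 2] + 2 * e[n - 1])
    e.append(d[n - 1] + e[n - 2])

r = d[0::2]      # d_{2n}; odd lengths cannot be tiled
print(r[:6])     # [1, 3, 11, 41, 153, 571]

# closed form d_{2n} = (2+sqrt3)^n/(3-sqrt3) + (2-sqrt3)^n/(3+sqrt3)
s3 = sqrt(3)
closed = [(2 + s3) ** n / (3 - s3) + (2 - s3) ** n / (3 + s3) for n in range(len(r))]
print(all(round(c) == v for c, v in zip(closed, r)))  # True
```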

If we consider wider corridors the recursions become more complicated. It is therefore somewhat surprising that it is possible to give a general formula for the number of ways of tiling an $m\times n$ rectangle with dominoes. According to [temperleyfisher], it is

\[ \prod_{j=1}^{\lceil m/2\rceil}\prod_{k=1}^{\lceil n/2\rceil}\left(4\cos^2\frac{\pi j}{m+1}+4\cos^2\frac{\pi k}{n+1}\right), \]

but its derivation is beyond the scope of this course.

4 Catalan Numbers

{epigraphs}\qitem

The induction I used was pretty tedious, but I do not doubt that this result could be obtained much easier. Concerning the progression of the numbers 1, 2, 5, 14, 42, 132, etc. …

{epigraphs}\qitem

Die Induction aber, so ich gebraucht, war ziemlich mühsam, doch zweifle ich nicht, dass diese Sach nicht sollte weit leichter entwickelt werden können. Ueber die Progression der Zahlen 1, 2, 5, 14, 42, 132, etc. … Letter to Goldbach

September 4, 1751

Leonard Euler

Next, we look at an example of a recursion which is not linear; we also use what is essentially the product rule:

Definition 9.

The $n$-th Catalan number\footnote{Named in honor of Eugène Charles Catalan (1814–1894) who first stated the standard formula. The naming after Catalan only stems from a 1968 book, see http://www.math.ucla.edu/~pak/papers/cathist4.png. Catalan himself attributed them to Segner, though Euler’s work is even earlier.} $C_n$ OEIS A000108 is defined\footnote{Careful, some books use a shifted index, starting at $1$ only!} as the number of different ways a sum of $n+1$ variables can be evaluated by inserting parentheses.

{bsp}

We have $C_0=C_1=1$, $C_2=2$: $(a+b)+c$ and $a+(b+c)$, and $C_3=5$:

\[ ((a+b)+c)+d\quad(a+(b+c))+d\quad a+((b+c)+d)\quad a+(b+(c+d))\quad(a+b)+(c+d) \]

To get a recursion, consider the position of the “outermost” addition: suppose it is after $k+1$ of the variables have been encountered. On its left side is a parenthesized expression in $k+1$ variables, on the right side is an expression in $(n+1)-(k+1)=n-k$ variables. We thus get the recursion

\[ C_n=\sum_{k=0}^{n-1}C_kC_{n-k-1},\ \text{if }n>0,\qquad C_0=1. \]

in which we sum over the products of lower terms.

This is basically the pattern of the product rule, just shifted by one.

We thus get the functional equation

\[ \mathcal{C}(t)=t\,\mathcal{C}(t)^2+1. \]

(which is easiest seen by writing out the expression for $t\,\mathcal{C}(t)^2$ and collecting terms). The factor $t$ is due to the way we index and $1$ is due to initial values.

This is a quadratic equation in $\mathcal{C}(t)$ and yields the solution

\[ \mathcal{C}(t)=\frac{1-\sqrt{1-4t}}{2t}. \]

The branch of the root was chosen so that the function has a limit, namely the value of $C_0$, at $t\to0$. (The other branch yields a singularity, see Figure 4.)

Refer to caption
Figure 4: The two branches of $\frac{1\pm\sqrt{1-4t}}{2t}$

The binomial series gives

\[ \sqrt{1-4t}=\sum_{k\ge0}\binom{1/2}{k}(-4t)^k=1+\sum_{k\ge1}\binom{1/2}{k}(-4t)^k. \]

We also observe that, for $k\ge1$,

\begin{align*}
\binom{1/2}{k}(-4t)^k&=\frac{\frac12\cdot\frac{-1}{2}\cdot\frac{-3}{2}\cdots\frac{3-2k}{2}}{k!}\,(-1)^k2^{2k}t^k\\
&=\frac{(-1)^{k-1}(2k-3)(2k-5)\cdots1}{2^k\,k!}\,(-1)^k2^{2k}t^k\qquad\text{set }n=k-1\\
&=-\frac{2^{n+1}(2n-1)(2n-3)\cdots1}{(n+1)!}\,t^{n+1}\\
&=-\frac{2\,(2n)!}{(n+1)!\,n!}\,t^{n+1}=-\frac{2}{n+1}\binom{2n}{n}t^{n+1}.
\end{align*}

Therefore

\[ \frac{1-\sqrt{1-4t}}{2t}=\sum_{n\ge0}\frac{1}{n+1}\binom{2n}{n}t^n, \]

which shows that

\[ C_n=\binom{2n}{n}\frac{1}{n+1}=\frac{(2n)!}{n!\,(n+1)!}=\binom{2n}{n}-\binom{2n}{n-1}. \]

Catalan numbers have many other combinatorial interpretations, and we shall encounter some in the exercises. Exercise 6.19 in [stanley2] (and its algebraic continuation 6.25, as well as an online supplement) contain hundreds of combinatorial interpretations.
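The recursion and the closed formula can be played off against each other in a short computation (function name my own):

```python
from math import comb

def catalan_rec(N):
    """C_0 = 1, C_n = sum_{k=0}^{n-1} C_k C_{n-k-1}, for n = 0..N-1."""
    C = [1]
    for n in range(1, N):
        C.append(sum(C[k] * C[n - k - 1] for k in range(n)))
    return C

C = catalan_rec(12)
print(C[:7])  # [1, 1, 2, 5, 14, 42, 132]
print(all(C[n] == comb(2 * n, n) // (n + 1) for n in range(12)))  # True
```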

5 Index-dependent coefficients and exponential generating functions

A recursion does not necessarily have constant coefficients, but might have a coefficient that is a polynomial in $n$. In this situation we can use (formal) differentiation, which will convert a term $f_nt^n$ into $nf_nt^{n-1}$. The second derivative will give a term $n(n-1)f_nt^{n-2}$; first and second derivative thus allow us to construct a coefficient $n^2f_n$, and so on for higher order polynomials.

The functional equation for the generating function then becomes a differential equation, and we might hope that a solution for it can be found in the extensive literature on differential equations.

Alternatively, (using the special form of derivatives for a typical summand), such a situation often can be translated immediately to a generating function by using the power series

\[ \frac{1}{(1-t)^{k+1}}=\sum_n\binom{n+k}{n}t^n. \]

For an example of variable coefficients, we take the case of counting derangements OEIS A000166, that is permutations that leave no point fixed. We denote by $d_n$ the number of derangements on $\{1,\dots,n\}$.

To build a recursion formula, suppose that $\pi$ is a derangement on $\{1,\dots,n\}$. Then $n^\pi=i<n$. We now distinguish two cases, depending on which point is mapped to $n$:

a) If $i^\pi=n$, then $\pi$ swaps $i$ and $n$ and is a derangement of the remaining $n-2$ points, thus there are $d_{n-2}$ derangements that swap $i$ and $n$. As there are $n-1$ choices for $i$, there are $(n-1)d_{n-2}$ derangements that swap $n$ with a smaller point.

b) Suppose there is $j\ne i$ such that $j^\pi=n$. In this case we can “bend” $\pi$ into another permutation $\psi$, by setting

\[ k^\psi=\begin{cases}k^\pi&\text{if }k\ne j\\ i&\text{if }k=j.\end{cases} \]

We notice that $\psi$ is a derangement of the points $\{1,\dots,n-1\}$.

Vice versa, if $\psi$ is a derangement on $\{1,\dots,n-1\}$, and we choose a point $j<n$, we can define $\pi$ on $\{1,\dots,n\}$ by

\[ k^\pi=\begin{cases}k^\psi&\text{if }k\ne j,n\\ n&\text{if }k=j\\ j^\psi&\text{if }k=n.\end{cases} \]

We again notice that $\pi$ is a derangement, and that different choices of $\psi$ and $j$ result in different $\pi$’s. Furthermore, the two constructions are mutually inverse, that is every derangement that is not in class a) is obtained by this construction.

There are $d_{n-1}$ possible derangements $\psi$ and $n-1$ choices for $j$, so there are $(n-1)d_{n-1}$ derangements in this second class.

We thus obtain a recursion

\[ d_n=(n-1)(d_{n-1}+d_{n-2}) \]

and hand-calculate\footnote{The reader might have an issue with the choice of $d_0=1$, as it is unclear what a derangement on no points is. But we know that $d_1=0$ and $d_2=1$, forcing this value for $d_0$ to make the recursion consistent.} the initial values $d_0=1$ and $d_1=0$.

From this recursion we could now construct a differential equation for the generating function of dn, but there is a problem: Because of the factor n in the recursion, the values dn grow roughly like n!. The resulting series thus will have convergence radius 0, making it unlikely that a function satisfying this differential equation can be found in the literature.

We therefore introduce the exponential generating function which is defined simply by dividing the i-th coefficient by a factor of i!, thus keeping coefficient growth bounded.

In our example, we get

\[ D(t)=\sum_n\frac{d_n}{n!}t^n \]

and thus

\[ \frac{d}{dt}D(t)=\sum_{n\ge1}\frac{n\,d_n}{n!}t^{n-1}=\sum_{n\ge1}\frac{d_n}{(n-1)!}t^{n-1}=\sum_{n\ge0}\frac{d_{n+1}}{n!}t^n. \]

We also have

\begin{align*}
tD(t)&=\sum_{n\ge0}\frac{d_nt^{n+1}}{n!}=\sum_{n\ge0}\frac{(n+1)d_nt^{n+1}}{(n+1)!}=\sum_{n\ge1}\frac{n\,d_{n-1}t^n}{n!}=\sum_{n\ge0}\frac{n\,d_{n-1}t^n}{n!}\\
tD'(t)&=\sum_{n\ge0}\frac{n\,d_n}{n!}t^n
\end{align*}

From this, the recursion (written as $d_{n+1}=n(d_n+d_{n-1})$) gives:

\begin{align*}
tD'(t)+tD(t)&=\sum_{n\ge0}(n\,d_{n-1}+n\,d_n)\frac{t^n}{n!}\\
&=\sum_{n\ge0}d_{n+1}\frac{t^n}{n!}=D'(t),
\end{align*}

and thus the separable differential equation

\[ \frac{D'(t)}{D(t)}=\frac{t}{1-t}. \]

with $D(0)=d_0=1$.

Standard techniques from Calculus give the solution

\[ D(t)=\frac{e^{-t}}{1-t}. \]

of this differential equation. Looking up this function for Taylor coefficients (respectively determining the formula by induction) shows that

\[ D(t)=\sum_{n\ge0}\Big(\sum_{i=0}^n\frac{(-1)^i}{i!}\Big)t^n \]

and thus (introducing a factor $n!$ to make up for the denominator in the generating function) that

\[ d_n=n!\,\sum_{i=0}^n\frac{(-1)^i}{i!}. \]

This is $n!$, multiplied with a Taylor approximation of $e^{-1}$. Indeed, if we consider the difference to $n!\,e^{-1}$, the alternating series gives, for $n\ge1$:

\begin{align*}
\left|d_n-\frac{n!}{e}\right|&=n!\left|\sum_{i=n+1}^\infty\frac{(-1)^i}{i!}\right|\\
&<n!\left|\frac{(-1)^{n+1}}{(n+1)!}\right|=\frac{1}{n+1}\le\frac12
\end{align*}

We have proven:

Lemma 5.1.

$d_n$ is the integer closest to $n!/e$.

That is, asymptotically, if we randomly put letters into envelopes, the probability is $1/e$ that no letter is in the correct envelope.
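Both the recursion and the nearest-integer description of Lemma 5.1 can be confirmed with a few lines of Python:

```python
from math import factorial, e

# recursion d_n = (n-1)(d_{n-1} + d_{n-2}) with d_0 = 1, d_1 = 0
d = [1, 0]
for n in range(2, 12):
    d.append((n - 1) * (d[n - 1] + d[n - 2]))

print(d[:7])  # [1, 0, 1, 2, 9, 44, 265]
print(all(d[n] == round(factorial(n) / e) for n in range(1, 12)))  # True
```

(The check starts at $n=1$, since $d_0=1$ is a convention rather than the rounding of $0!/e\approx0.37$.)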

6 The product rule, revisited

Exponential generating functions have one more trick up their sleeve, arguably their most important contribution. For this, let us return to the product rule. It corresponds to the situation that the objects of a certain “weight”\footnote{The word “weight” indicates a size measurement, e.g. the number of vertices in a graph.} can be described in terms of combining objects of lower weights (that sum up to the desired weight) in all possible ways.

This splitting-up however only considers the number of (sub)objects in each part, not which particular ones are in each part. In other words, we consider the constituent objects as indistinguishable.

Suppose now, however, that we are counting objects whose parts have identifying labels. In the example of the Catalan numbers this would be the case, for example, if we cared not only about the parentheses placement, but also about the symbols we add, that is $(a+b)+c$ would be different from $(c+a)+b$.

In such a situation, the recursion formula must, for each $k$, account for which $k$ elements are chosen to be in the “left side”, with the rest being in the “right side”. Every combination is possible. That is, the recursion becomes:

\[ d_n=\sum_{k=0}^n\binom{n}{k}a_kb_{n-k}. \]

We can write this as

\[ \frac{d_n}{n!}=\sum_{k=0}^n\frac{a_k}{k!}\cdot\frac{b_{n-k}}{(n-k)!}, \]

which is the formula for multiplication of the exponential generating functions!

Let us look at this in a pathetic example: the number of functions from $N=\{1,\dots,n\}$ to $\{1,\dots,r\}$ (which we know already well as $r^n$).

Let $a_n$ count the number of constant functions on an $n$-element set, that is $a_n=1$. The associated exponential generating function thus is

\[ A(t)=\sum_n\frac{t^n}{n!}=\exp(t) \]

(which, incidentally, shows why these are called “exponential” generating functions).

If we take an arbitrary function $f$ on $N$, we can partition $N$ into $r$ (possibly empty) sets $N_1,\dots,N_r$, such that $f$ is constant on $N_i$ and the $N_i$ are maximal with this property.

We get all possible functions $f$ by combining constant functions on the possible $N_i$’s for all possible partitions of $N$. Note that the ordering of the partitions is significant – they indicate the actual values.

We are thus exactly in the situation described, and get as exponential generating function (start with $r=2$, then use induction for larger $r$) the $r$-fold product of the exponential generating functions for the number of constant functions:

\[ D(t)=\underbrace{\exp(t)\cdots\exp(t)}_{r\text{ factors}}=\exp(rt) \]

The coefficient for $t^n$ in the power series for $\exp(rt)$ is $\frac{r^n}{n!}$, and hence the counting function is $r^n$, as expected.

1 Bell Numbers

{epigraphs}\qitem
\[ \int_1^{\sqrt[3]{3}}z^2\,dz\times\cos\frac{3\pi}{9}=\log\sqrt[3]{e} \]

The integral z-squared dz,
From one to the cube root of three,
Times the cosine,
Of three pi over nine
Equals log of the cube root of e. Anon.

We try this approach next to obtain an exponential generating function for the Bell numbers (though not a closed form expression for its coefficients):

Recall that the Bell numbers $B_n$ give the total number of partitions of $\{1,\dots,n\}$ and satisfy (Lemma 0.1) the recursion:

\[ B_n=\sum_{k=1}^n\binom{n-1}{k-1}B_{n-k}=\sum_{k=0}^{n-1}\binom{n-1}{k}B_{n-1-k} \]

In light of the product rule, we insert a factor $1$ and write this (after reindexing) as

\[ B_{n+1}=\sum_{k=0}^n\binom{n}{k}\cdot1\cdot B_{n-k} \]

and thus

\[ \sum_n\frac{B_{n+1}}{n!}t^n=\sum_n\Big(\sum_{k=0}^n\frac{1}{k!}\cdot\frac{B_{n-k}}{(n-k)!}\Big)t^n \]

If we denote the exponential generating function of the $B_n$ by $F(t)=\sum_nB_nt^n/n!$, the right hand side thus will give the product of the generating function of the constant sequence $a_n=1$ (which we just saw is $\exp(t)$) with $F(t)$. This reflects the split that gave us the recursion – into a set of size $k$ containing the number $1$ (the total number of such sets being $1$ once the numbers are chosen), and a partition of the remaining numbers.

The left hand side is the exponential generating function of $B_{n+1}$, which is just the derivative of $F(t)$; thus we have that

\[ \frac{d}{dt}F(t)=\sum_{n\ge1}\frac{B_nt^{n-1}}{(n-1)!}=\sum_{n\ge0}\frac{B_{n+1}t^n}{n!}=\exp(t)F(t). \]

This is a separable differential equation; its solution is

\[ F(t)=c\exp(\exp(t)) \]

for some constant $c$. As $F(0)=1$ we solve for $c=\exp(-1)$ and hence get the exponential generating function

\[ \sum_nB_n\frac{t^n}{n!}=\exp(\exp(t)-1). \]

There is no nice way to express the power series coefficients of this function in closed form; a Taylor approximation is (with denominators deliberately kept in the form $n!$ to allow reading off the Bell numbers):

\[ 1+t+2\frac{t^2}{2!}+5\frac{t^3}{3!}+15\frac{t^4}{4!}+52\frac{t^5}{5!}+203\frac{t^6}{6!}+877\frac{t^7}{7!}+4140\frac{t^8}{8!}+21147\frac{t^9}{9!}+115975\frac{t^{10}}{10!}+\cdots \]

One, somewhat surprising, application of Bell numbers is to consider rhyme schemes. Given a sequence of $n$ lines, the lines which rhyme form the cells of a partition of $\{1,\dots,n\}$. For example, the partition $\{\{1,2,5\},\{3,4\}\}$ is the scheme AABBA used by Limericks, while Poe’s The Raven uses AABCCCBBB or

\[ \{\{1,2\},\{3,7,8,9\},\{4,5,6\}\}. \]

We can read off from the Taylor expansion that $B_5=52$. The classic 11th century Japanese novel Genji monogatari (The Tale of Genji) has 54 chapters, of which the first and last are considered “extra”. The remaining 52 chapters are each introduced with a 5-line poem in one of the 52 possible rhyme schemes and a symbol illustrating the scheme. These symbols, see figure 5, the Genji-mon, have been used extensively in art. See https://www.viewingjapaneseprints.net/texts/topics_faq/genjimon.html

Refer to caption
Figure 5: The Genji-mon
\attribution

2 Stirling numbers

We apply the same idea to the Stirling numbers of the second kind, $S(n,k)$ denoting the number of partitions of $\{1,\dots,n\}$ into $k$ (non-empty) parts. According to 8, part 3) there are $k!\,S(n,k)$ order-significant partitions into $k$ parts.

We denote the associated exponential generating function (for order-significant partitions) by

\[ \mathcal{S}_k(t)=\sum_nk!\,S(n,k)\frac{t^n}{n!}. \]

We also know that there is – apart from the empty set – exactly one partition into one cell. That is

\[ \mathcal{S}_1(t)=t+\frac{t^2}{2!}+\frac{t^3}{3!}+\cdots=\exp(t)-1 \]

If we have a partition into $k$ parts, we can fix the first cell and then partition the rest further. Thus, for $k>1$ we have that

\[ \mathcal{S}_k(t)=\mathcal{S}_1(t)\cdot\mathcal{S}_{k-1}(t), \]

which immediately gives that

\[ \mathcal{S}_k(t)=(\exp(t)-1)^k \]

We deduce that the Stirling numbers of the second kind have the exponential generating function

\[ \sum_nS(n,k)\frac{t^n}{n!}=\frac{(\exp(t)-1)^k}{k!}. \]

Using the fact that $B_n=\sum_{k=1}^nS(n,k)$, we thus get the exponential generating function for the Bell numbers as

\begin{align*}
\sum_nB_n\frac{t^n}{n!}&=\sum_k\sum_nS(n,k)\frac{t^n}{n!}\\
&=\sum_k\frac{(\exp(t)-1)^k}{k!}=\exp(\exp(t)-1)
\end{align*}

in agreement with the above result.

We also can use the exponential generating function for the Stirling numbers to deduce a coefficient formula for them:

Lemma 6.1.

\[ S(n,k)=\frac{1}{k!}\sum_{i=1}^k(-1)^{k-i}\binom{k}{i}i^n. \]
Proof 6.2.

We first note that – as $0^n=0$ for $n\ge1$ – the $i$-sum could equivalently start at $1$ or $0$ without changing the result.

Then, multiplying through with $k!$, we know by the binomial formula that

\begin{align*}
k!\sum_nS(n,k)\frac{t^n}{n!}&=(\exp(t)-1)^k=\sum_{i=0}^k\binom{k}{i}\exp(t)^i(-1)^{k-i}\\
&=\sum_{i=0}^k\binom{k}{i}\exp(it)(-1)^{k-i}=\sum_{i=0}^k\binom{k}{i}\Big(\sum_n\frac{(it)^n}{n!}\Big)(-1)^{k-i}\\
&=\sum_n\Big(\sum_{i=0}^k\binom{k}{i}(-1)^{k-i}i^n\Big)\frac{t^n}{n!}.
\end{align*}

and we read off the coefficients.
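The formula of Lemma 6.1 can be cross-checked against the standard triangle recursion $S(n,k)=kS(n-1,k)+S(n-1,k-1)$ (which is not derived in these notes, but is a well-known identity); a sketch with names of my own choosing:

```python
from math import comb, factorial

def stirling2(n, k):
    """S(n,k) via the alternating-sum formula of the lemma."""
    return sum((-1) ** (k - i) * comb(k, i) * i ** n for i in range(k + 1)) // factorial(k)

# triangle recursion S(n,k) = k S(n-1,k) + S(n-1,k-1), starting from S(0,0) = 1
S = {(0, 0): 1}
for n in range(1, 9):
    for k in range(0, n + 1):
        S[(n, k)] = k * S.get((n - 1, k), 0) + S.get((n - 1, k - 1), 0)

print(all(stirling2(n, k) == S[(n, k)]
          for n in range(1, 9) for k in range(1, n + 1)))  # True
```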

3 Involutions

Finally, let us use this in a new situation:

Definition 10.

A permutation $\pi$ on $\{1,\dots,n\}$ is called\footnote{Group theorists often exclude the identity, but it is convenient to allow it here.} an involution OEIS A000085 if $\pi^2=1$, i.e. $(i^\pi)^\pi=i$ for all $i$.

We want to determine the number $s_n$ of involutions on $n$ points.

Consider the number of cycles (including 1-cycles, that is fixed points). Let $s_r(n)$ be the number of involutions on $n$ points with exactly $r$ cycles. Clearly $s_0(0)=1$, $s_1(0)=0$, $s_1(1)=1$, $s_1(2)=1$, hence the exponential generating function for a single cycle is $S_1(t)=t+\frac{t^2}{2}$.

When considering an arbitrary involution, we can split off a cycle, seemingly leading to a formula

\[ \Big(t+\frac{t^2}{2}\Big)^r. \]

But (similar as when we considered a generating function for $k!\,S(n,k)$ for the Stirling numbers), such a product of exponential generating functions will consider the arrangement of cycles, i.e. consider $(1,2)(3,4)$ different from $(3,4)(1,2)$. We correct this by dividing by $r!$ and thus get

\[ S_r(t)=\frac{1}{r!}\Big(t+\frac{t^2}{2}\Big)^r \]

and thus for the exponential generating function of the number of involutions a sum over all possible $r$:

\[ S(t)=\sum_{r=0}^\infty S_r(t)=\sum_r\frac{1}{r!}\Big(t+\frac{t^2}{2}\Big)^r=\exp\Big(t+\frac{t^2}{2}\Big)=\exp(t)\exp\Big(\frac{t^2}{2}\Big). \]

We easily write down power series for the two factors

\begin{align*}
\exp(t)&=\sum_n\frac{t^n}{n!}\\
\exp\Big(\frac{t^2}{2}\Big)&=\sum_n\frac{t^{2n}}{2^nn!}
\end{align*}

and multiply out, yielding

\begin{align*}
S(t)&=\Big(\sum_n\frac{t^{2n}}{2^nn!}\Big)\Big(\sum_n\frac{t^n}{n!}\Big)\\
&=\sum_m\sum_{k=0}^{\lfloor m/2\rfloor}\frac{t^{2k}t^{m-2k}}{2^kk!(m-2k)!}
\end{align*}

and thus (again introducing a factor of $n!$ to make up for the exponential generating function)

\[ s_n=\sum_{k=0}^{\lfloor n/2\rfloor}\frac{n!}{2^kk!(n-2k)!}. \]
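For small $n$ the formula can be checked by brute force, simply counting the permutations that square to the identity (function names are my own):

```python
from itertools import permutations
from math import factorial

def invol_formula(n):
    """s_n = sum over k of n! / (2^k k! (n-2k)!)."""
    return sum(factorial(n) // (2**k * factorial(k) * factorial(n - 2 * k))
               for k in range(n // 2 + 1))

def invol_brute(n):
    """Count permutations p of {0,...,n-1} with p(p(i)) = i for all i."""
    return sum(1 for p in permutations(range(n))
               if all(p[p[i]] == i for i in range(n)))

print([invol_formula(n) for n in range(8)])  # [1, 1, 2, 4, 10, 26, 76, 232]
print(all(invol_formula(n) == invol_brute(n) for n in range(7)))  # True
```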

Chapter 3 Inclusion, Incidence, and Inversion

It is calculus exam week and, as happens every time, a number of students have been reported to require alternate accommodations:

  • 14 students are sick.

  • 12 students are scheduled to play for the university Quidditch team at the county tournament.

  • 12 students are planning to go on the restaurant excursion for the food appreciation class.

  • 5 students on the Quidditch team are sick (having been hit by balls).

  • 4 students are scheduled for the excursion and the tournament.

  • 3 students of the food appreciation class are sick (with food poisoning), and

  • 2 of these students also planned to go to the tournament, i.e. have all three excuses.

The course coordinator wonders how many alternate exams need to be provided.

Using a Venn diagram and some trial-and-error, it is not hard to come up with the diagram in figure 1, showing that there are 28 alternate exams to offer.

Refer to caption
Figure 1: An example of Inclusion/Exclusion

1 The Principle of Inclusion and Exclusion

{epigraphs}\qitem

By the method of exclusion, I had arrived at this result, for no other hypothesis would meet the facts. A Study in Scarlet
Arthur Conan Doyle

The Principle of Inclusion and Exclusion (PIE) formalizes this process for an arbitrary number of sets:

Let $X$ be a set and $\{A_1,\dots,A_n\}$ a family of subsets. For any subset $I\subseteq\{1,\dots,n\}$ we define

\[ A_I=\bigcap_{i\in I}A_i, \]

using that $A_\emptyset=X$. Then

Lemma 1.1.

The number of elements that lie in none of the subsets $A_i$ is given by

\[ \sum_{I\subseteq\{1,\dots,n\}}(-1)^{|I|}|A_I|. \]
Proof 1.2.

Take $x\in X$ and consider the contribution of this element to the given sum. If $x\notin A_i$ for every $i$, it only is counted for $I=\emptyset$, that is, it contributes $1$.

Otherwise let $J=\{1\le a\le n\mid x\in A_a\}$ and let $j=|J|$. We have that $x\in A_I$ if and only if $I\subseteq J$. Thus $x$ contributes

\[ \sum_{I\subseteq J}(-1)^{|I|}=\sum_{i=0}^j\binom{j}{i}(-1)^i=(1-1)^j=0 \]

As a first application we determine the number of derangements on $n$ points in a different way:

Let $X=S_n$ be the set of all permutations of degree $n$, and let $A_i$ be the set of all permutations $\pi$ with $i^\pi=i$. Then $S_n\setminus\bigcup_iA_i$ is exactly the set of derangements, there are $\binom{n}{i}$ possibilities to intersect $i$ of the $A_i$’s (each intersection having cardinality $(n-i)!$), and the formula gives us:

\[ d(n)=\sum_{i=0}^n(-1)^i\binom{n}{i}(n-i)!=n!\sum_{i=0}^n\frac{(-1)^i}{i!}. \]

For a second example, we calculate the number of surjective mappings from an $n$-set to a $k$-set (which we know already from 8 to be $k!\,S(n,k)$):

Let $X$ be the set of all mappings from $\{1,\dots,n\}$ to $\{1,\dots,k\}$, then $|X|=k^n$. Let $A_i$ be the set of those mappings $f$ such that $i$ is not in the image of $f$, so $|A_i|=(k-1)^n$. More generally, if $I\subseteq\{1,\dots,k\}$ we have that $|A_I|=(k-|I|)^n$. The surjective mappings are exactly those in $X$ outside any of the $A_i$, thus the formula gives us the count

\[ \sum_{i=0}^k(-1)^i\binom{k}{i}(k-i)^n, \]

using again that there are $\binom{k}{i}$ possible sets $I$ of cardinality $i$.
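A brute-force count over all mappings confirms the inclusion/exclusion formula for small parameters (function names my own):

```python
from math import comb
from itertools import product

def surjections_pie(n, k):
    """Inclusion/exclusion count: sum over i of (-1)^i C(k,i) (k-i)^n."""
    return sum((-1) ** i * comb(k, i) * (k - i) ** n for i in range(k + 1))

def surjections_brute(n, k):
    """Enumerate all k^n mappings and keep the surjective ones."""
    return sum(1 for f in product(range(k), repeat=n) if set(f) == set(range(k)))

print(surjections_pie(4, 2))  # 14
print(all(surjections_pie(n, k) == surjections_brute(n, k)
          for n in range(1, 6) for k in range(1, 5)))  # True
```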

The factor (1)i in a formula often is a good indication that inclusion/exclusion is to be used.
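The surjection count can likewise be verified by brute force over all k^n mappings (a hedged sketch with names of our choosing):

```python
from math import comb
from itertools import product

def surjections_pie(n, k):
    # Inclusion/exclusion count of surjections from an n-set onto a k-set.
    return sum((-1)**i * comb(k, i) * (k - i)**n for i in range(k + 1))

def surjections_brute(n, k):
    # Enumerate all k^n maps and keep those whose image is all of {0,...,k-1}.
    return sum(set(f) == set(range(k))
               for f in product(range(k), repeat=n))
```

For instance, there are 2^4 − 2 = 14 surjections from a 4-set onto a 2-set.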

Lemma 1.3.
∑_{i=0}^{n} (−1)^i \binom{n}{i} \binom{m+n−i}{k−i} = \binom{m}{k} if m ≥ k, and 0 if m < k.
Proof 1.4.

To use PIE, the sets A_i need to involve choosing from an n-set, and after choosing i of these we must choose from a set of size m+n−i.

Consider a bucket filled with n blue balls, labeled with 1,,n, and m red balls. How many selections of k balls only involve red balls? Clearly the answer is the right hand side of the formula.

Let X be the set of all k-subsets of balls and A_i those subsets that contain blue ball number i; then PIE gives the left side of the formula.

We finish this section with an application from number theory. The Euler function φ(n) counts the number of integers 1 ≤ k ≤ n with gcd(k,n) = 1.

Suppose that n = ∏_{i=1}^{r} p_i^{e_i}, X = {1,…,n} and A_i the integers in X that are multiples of p_i. Thus |A_i| = n/p_i, and for I = {i_1,…,i_k} we have that A_I consists of the multiples of P_I = p_{i_1}⋯p_{i_k}, and thus |A_I| = n/P_I. Then φ(n) counts the number of elements in X that do not lie in any of the A_i, and (inclusion/exclusion)

φ(n) = n − ∑_{i=1}^{r} n/p_i + ∑_{1≤i<j≤r} n/(p_i p_j) − ⋯ = n ∏_{i=1}^{r} (1 − 1/p_i),

with the second identity obtained by multiplying out the product on the right hand side.
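The PIE expression for φ can be compared with the gcd definition directly (a sketch; the helper names are ours):

```python
from math import gcd, prod
from itertools import combinations

def prime_factors(n):
    # distinct prime divisors of n by trial division
    ps, p = [], 2
    while p * p <= n:
        if n % p == 0:
            ps.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        ps.append(n)
    return ps

def phi_pie(n):
    # phi(n) = sum over subsets I of the primes of (-1)^|I| * n / prod(I)
    ps = prime_factors(n)
    return sum((-1)**k * sum(n // prod(I) for I in combinations(ps, k))
               for k in range(len(ps) + 1))

def phi_brute(n):
    return sum(gcd(k, n) == 1 for k in range(1, n + 1))
```
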

We also note – exercise LABEL:phisum – that ∑_{d|n} φ(d) = n. On its own this looks entirely separate from the inclusion/exclusion concept we just considered. But by generalizing the concept of a hierarchy, given hitherto through inclusion of subsets, to more general situations, we shall see that this is a structural consequence of the previous formula.

2 Partially Ordered Sets and Lattices

{epigraphs}\qitem

The doors are open; and the surfeited grooms
Do mock their charge with snores:
I have drugg’d their possets,
That death and nature do contend about them
Macbeth, Act II, Scene II
William Shakespeare

A poset or partially ordered set is a set A with a relation R ⊆ A×A on the elements of A, which we will typically write as a ≤ b instead of (a,b) ∈ R, such that for all a,b,c ∈ A:

(reflexive)

a ≤ a.

(antisymmetric)

a ≤ b and b ≤ a imply that a = b.

(transitive)

a ≤ b and b ≤ c imply that a ≤ c.

The elements of a poset thus are the elements of A, not the pairs of the underlying relation, and its cardinality is that of A.

For example, A could be the set of subsets of a particular set, with ≤ being the “subset or equal” relation.

A convenient way to describe a poset on a finite set A is by its Hasse diagram. Say that a covers b if b ≤ a, a ≠ b, and there is no c ∉ {a,b} with b ≤ c ≤ a. The Hasse diagram of the poset is a graph in the plane which connects two vertices a and b only if a covers b, and in this case the edge from b to a goes upwards.

Because of transitivity, we have that a ≤ b if and only if one can go up along edges from a to reach b.

Refer to caption
Figure 2: Hasse Diagrams of Small Posets and Lattices

Figure 2 gives a number of examples of posets, given by their Hasse diagrams, including all posets on 3 elements.

An isomorphism of posets is a bijection that preserves the relation.

An element in a poset is called maximal if there is no larger element (wrt. ≤); minimal is defined in the same way. Posets might have multiple maximal and minimal elements.

For another cute example (taken from lecture notes [axue]) consider in Figure 3, left, the arrangement of fruits according to convenience and taste, as given by https://xkcd.com/388/ (pardon the title of the cartoon).

Refer to caption
Refer to caption
Figure 3: Fruits, arranged by subjective taste and ease-of-use
\attribution

R. Munroe, F*** Grapefruit, https://xkcd.com/388/, and [axue], p.108

We can read off a partial order from this by declaring a fruit as “better” than another if it is both more tasty and easier to consume. The resulting Hasse diagram is on the right side of Figure 3. It is easily seen that not every pair of fruits is comparable this way, and that there is no universal “best” or “worst” fruit.

1 Linear extension

{epigraphs}\qitem

My scheme of Order gave me the most trouble Autobiography
Benjamin Franklin

A partial order is called a total order if for every pair a,b ∈ A of elements we have that a ≤ b or b ≤ a.

While this is not part of our definition, we can always embed a partial order into a total order.

Proposition 11.

Let R ⊆ A×A be a partial order on A. Then there exists a total order (called a linear extension) T ⊆ A×A such that R ⊆ T.

To avoid set acrobatics we shall prove this only in the case of a finite set A. Note that in computer science the process of finding such an embedding is called topological sorting.

Proof 2.1.

We proceed by induction over the number of pairs a,b that are incomparable. In the base case we already have a total order.

Otherwise, let a,b be such an incomparable pair. We decide (arbitrarily) that a < b. Now let

L = {x ∈ A : x R a},  U = {x ∈ A : b R x}.

We claim that S = R ∪ {(l,u) : l ∈ L, u ∈ U} is a partial order. As (a,b) ∈ S, it has fewer incomparable pairs; this shows by induction that there exists a total order T ⊇ S ⊇ R, proving the theorem.

Since R is reflexive, so is S. For antisymmetry, suppose by contradiction that for some x ≠ y we have that (x,y),(y,x) ∈ S. Since R is a partial order, not both could have been in R.

Thus assume (WLOG) that

(y,x) ∈ S ∖ R ⊆ {(l,u) : l ∈ L, u ∈ U}.

This implies that y R a and b R x. If (x,y) ∈ R, this implies by transitivity that b R a, contradicting the choice of a,b. If (x,y) ∉ R, then also (x,y) ∈ S ∖ R and thus by definition x R a. Transitivity implies again b R x R a, in contradiction to the choice of a,b.

For transitivity, the definition of S implies that we cannot have both pairs in S ∖ R. Thus first suppose that (x,y) ∈ S ∖ R and (y,z) ∈ S. Then (y,z) ∈ R, as otherwise b R y R a. But then b R y R z, implying that (x,z) ∈ S. The other remaining case is analogous.

This theorem implies that we can always label the elements of a countable poset with positive integers, such that the poset ordering implies the integer ordering. Such an embedding is in general not unique, see Theorem 21.
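The process in the proof, repeatedly splitting off a minimal element, is essentially Kahn's topological sorting algorithm, which can be sketched as follows (the function name and the encoding of the order as a predicate are our choices):

```python
def linear_extension(elements, less):
    # Kahn-style topological sort: repeatedly remove a minimal element.
    # `less(a, b)` encodes the strict partial order a < b.
    remaining = list(elements)
    order = []
    while remaining:
        # a minimal element has nothing strictly below it among the remaining ones
        m = next(x for x in remaining
                 if not any(less(y, x) for y in remaining if y != x))
        order.append(m)
        remaining.remove(m)
    return order
```

For the divisor poset of 12 this produces an ordering in which every divisor precedes its multiples.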

The case of a totally ordered subset gets a special name:

Definition 12.

A chain in a poset P is a subset of P such that any two elements of it are comparable. (That is, restricted to the chain the order is total.)

2 Lattices

Definition 13.

Let A be a poset and a,bA.

  • A greatest lower bound of a and b is an element c with c ≤ a and c ≤ b that is maximal in the set of elements with this property.

  • A least upper bound of a and b is an element c with c ≥ a and c ≥ b that is minimal in the set of elements with this property.

A is a lattice if any pair a,b ∈ A has a unique greatest lower bound, called the meet and denoted by a∧b, as well as a unique least upper bound, called the join and denoted by a∨b.

Amongst the Hasse diagrams in figure 2, e), j), k), l) are lattices, while the others are not. Finite lattices always have unique maximal and minimal elements, sometimes denoted by 0 (minimal) and 1 (maximal).

Other examples of lattices are:

  1. 1.

Given a set X, let A = 𝒫(X) = {Y : Y ⊆ X} be the power set of X, with ≤ defined by inclusion. Meet is the intersection, join the union of subsets.

  2. 2.

Given an integer n, let A be the set of divisors of n, with ≤ given by “divides”. Meet and join are gcd, respectively lcm.

  3. 3.

For an algebraic structure S, let A be the set of all substructures (e.g. a group and its subgroups) of S, with ≤ given by inclusion. Meet is the intersection, join the substructure spanned by the two constituents.

  4. 4.

    For particular algebraic structures there might be classes of substructures that are closed under meet and join, e.g. normal subgroups. These then form a (sub)lattice.

Using meet and join as binary operations, we can axiomatize the structure of a lattice:

Proposition 14.

Let X be a set with two binary operations ∧ and ∨ and two distinguished elements 0,1 ∈ X. Then (X,∧,∨,0,1) is a lattice if and only if the following axioms are satisfied for all x,y,z ∈ X:

Associativity:

x∧(y∧z) = (x∧y)∧z and x∨(y∨z) = (x∨y)∨z;

Commutativity:

x∧y = y∧x and x∨y = y∨x;

Idempotence:

x∧x = x and x∨x = x;

Inclusion:

(x∧y)∨x = x = (x∨y)∧x;

Maximality:

x∧0 = 0 and x∨1 = 1.

Proof 2.2.

The verification that these axioms hold for a lattice is left as an exercise to the reader.

Vice versa, assume that these axioms hold. We need to produce a poset structure and thus define that x ≤ y iff x∧y = x. Using commutativity and inclusion this implies the dual property x∨y = (x∧y)∨y = y.

To show that ≤ is a partial order: idempotence shows reflexivity. If x ≤ y and y ≤ x, then x = x∧y = y∧x = y, giving antisymmetry. Finally suppose that x ≤ y and y ≤ z, that is x = x∧y and y = y∧z. Then

x∧z = (x∧y)∧z = x∧(y∧z) = x∧y = x

and thus x ≤ z. Associativity gives us that x∧y ≤ x, y; if also z ≤ x, y, then

z∧(x∧y) = (z∧x)∧y = z∧y = z

and thus z ≤ x∧y, so x∧y is the unique greatest lower bound. The least upper bound is treated in the same way, and the last axiom shows that 0 is the unique minimal and 1 the unique maximal element.

Definition 15.

An element x of a lattice L is join-irreducible (JI) if x ≠ 0 and if x = y∨z implies that x = y or x = z.

For example, figure 4 shows a lattice in which the black vertices are JI, the others not.

{tryout}

If we take the lattice of subsets of a set, the join-irreducibles are the 1-element sets. If we take divisors of n, the join-irreducibles are prime powers.

When representing elements of a finite lattice, it is possible to store the JI elements once and to represent every element by the set of JI elements below it. This is used, for example, in one of the algorithms for calculating the subgroups of a group.

3 Product of posets

The cartesian product provides a way to construct new posets (or lattices) from old ones: Suppose that X, Y are posets with orderings ≤_X, ≤_Y. We define a partial order on X×Y by setting

(x_1,y_1) ≤ (x_2,y_2) if and only if x_1 ≤_X x_2 and y_1 ≤_Y y_2.
Proposition 16.

This is a partial ordering, so X×Y is a poset. If furthermore both X and Y are lattices, then so is X×Y.

The proof of this is exercise LABEL:posetproduct.

This allows us to describe two familiar lattices as constructed from smaller pieces (with a proof also delegated to the exercises):

Proposition 17.

a) Let |A| = n and 𝒫(A) the power-set lattice (that is, the subsets of A, ordered by inclusion). Then 𝒫(A) is (isomorphic to) the direct product of n copies of the two-element lattice {0,1}.
b) For an integer n = ∏_{i=1}^{r} p_i^{e_i} > 1 written as a product of powers of distinct primes, let 𝒟(n) be the lattice of divisors of n. Then 𝒟(n) ≅ 𝒟(p_1^{e_1}) × ⋯ × 𝒟(p_r^{e_r}).

3 Distributive Lattices

A lattice L is distributive if for any x,y,z ∈ L one (and thus also the other) of the two following laws holds:

x∧(y∨z) = (x∧y)∨(x∧z)
x∨(y∧z) = (x∨y)∧(x∨z)
{tryout}

These laws clearly hold for the lattice of subsets of a set or the lattice of divisors of an integer n.

Lattices of substructures of algebraic structures are typically not distributive; the easiest example (diagram l) in figure 2) is the lattice of subgroups of C_2×C_2, which also is the lattice of subspaces of F_2^2.

If P = (X,≤) is a poset, a subset Y ⊆ X is an order ideal if for any y ∈ Y and z ∈ X we have that z ≤ y implies z ∈ Y.

Lemma 3.1.

The set of order ideals is closed under union and intersection.

Proof 3.2.

Let A, B be order ideals, y ∈ A∪B and z ≤ y. Then y ∈ A or y ∈ B. In the first case we have that z ∈ A, in the second case that z ∈ B, and thus always z ∈ A∪B. The same argument also works for intersections.

This implies:

Lemma 3.3.

The set of order ideals of P, denoted by J(P), is a lattice under intersection and union.

As a sublattice of the lattice of subsets, J(P) is clearly distributive.

For example, if P is the poset on 4 elements with a Hasse diagram given by the letter N (figure 2, g) then figure 4 describes the lattice J(P).

Refer to caption
Figure 4: Order-ideal lattice for the “N” poset.
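The order ideals of the “N” poset can be enumerated by brute force. The labeling a < b, c < b, c < d below is one possible reading of the diagram (an assumption, since figure 2 does not name the vertices):

```python
from itertools import combinations

def order_ideals(elements, le):
    # All down-closed subsets of the poset (elements, le).
    ideals = []
    for r in range(len(elements) + 1):
        for sub in combinations(elements, r):
            s = set(sub)
            # down-closed: everything below a member is also a member
            if all(z in s for y in s for z in elements if le(z, y)):
                ideals.append(frozenset(s))
    return ideals

# The "N" poset: a < b, c < b, c < d (reflexive pairs included).
N_REL = {('a','a'), ('b','b'), ('c','c'), ('d','d'),
         ('a','b'), ('c','b'), ('c','d')}
ideals_N = order_ideals(['a', 'b', 'c', 'd'], lambda x, y: (x, y) in N_REL)
```

With this labeling J(P) has 8 elements and is closed under union and intersection, as Lemma 3.1 asserts.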

In fact, any finite distributive lattice can be obtained this way:

Theorem 18 (Fundamental Theorem for Finite Distributive Lattices, Birkhoff).

Let L be a finite distributive lattice. Then there is a unique (up to isomorphism) finite poset P such that L ≅ J(P).

To prove this theorem we use the following definition:

Definition 19.

For any element x of the poset P, let x↓ = {y : y ≤ x} be the principal order ideal generated by x.

Lemma 3.4.

An order ideal of a finite poset P is join-irreducible in J(P) if and only if it is principal.

Proof 3.5.

First consider a principal order ideal x↓ and suppose that x↓ = b∪c with b and c order ideals. Then x ∈ b or x ∈ c, which by the order ideal property implies that x↓ ⊆ b or x↓ ⊆ c; as both are subsets of x↓, this gives x↓ = b or x↓ = c.

Vice versa, suppose that a is a join-irreducible order ideal and assume that a is not principal. Then for every x ∈ a, x↓ is a proper subset of a. But clearly a = ⋃_{x∈a} x↓, writing a as a union of properly smaller ideals, contradicting join-irreducibility.

Corollary 20.

Given a finite poset P, the set of join-irreducibles of J(P), considered as a subposet of J(P), is isomorphic to P.

Proof 3.6.

Consider the map x ↦ x↓. It maps P bijectively to the set of join-irreducibles, and clearly preserves inclusion.

We now can prove theorem 18.

Proof 3.7.

Given a distributive lattice L, let X be the set of join-irreducible elements of L and P be the subposet of L formed by them. By corollary 20, this is the only option for P up to isomorphism, which will show uniqueness.

Let ϕ: L → J(P) be defined by ϕ(a) = {x ∈ X : x ≤ a}, that is, ϕ assigns to every element of L the set of JI elements below it. (Note that ϕ(a) indeed is an order ideal.) We want to show that ϕ is an isomorphism of lattices.

Step 1: Clearly we have that a = ⋁_{x∈ϕ(a)} x for any a ∈ L (using that the join over the empty set equals 0). Thus ϕ is injective.

Step 2: To show that ϕ is surjective, let Y ∈ J(P) be an order ideal of P, and let a = ⋁_{y∈Y} y. We aim to show that ϕ(a) = Y: Clearly every y ∈ Y also has y ≤ a, so Y ⊆ ϕ(a). Next take a join-irreducible x ∈ ϕ(a), that is x ≤ a. Then x ≤ ⋁_{y∈Y} y and thus

x = x ∧ (⋁_{y∈Y} y) = ⋁_{y∈Y} (x∧y)

by the distributive law. Because x is JI, we must have that x = x∧y for some y ∈ Y, implying that x ≤ y. But as Y is an order ideal this implies that x ∈ Y. Thus ϕ(a) ⊆ Y and thus equality, showing that ϕ is surjective.

Step 3: We finally need to show that ϕ respects the lattice operations: Let x ∈ X. Then x ≤ a∧b if and only if x ≤ a and x ≤ b. Thus ϕ(a∧b) = ϕ(a)∩ϕ(b).

For the join, take x ∈ ϕ(a)∪ϕ(b). Then x ∈ ϕ(a), implying x ≤ a, or (same argument) x ≤ b; therefore x ≤ a∨b. Vice versa, suppose that x ∈ ϕ(a∨b), so x ≤ a∨b and thus

x = x ∧ (a∨b) = (x∧a) ∨ (x∧b).

Because x is JI, this implies x = x∧a, respectively x = x∧b.

In the first case this gives x ≤ a and thus x ∈ ϕ(a); the second case similarly gives x ∈ ϕ(b). Thus ϕ(a∨b) = ϕ(a)∪ϕ(b).

We close with a further use of the order ideal lattice:

Theorem 21.

Let P be a poset of size m. The number of different linear extensions of P is equal to the number of chains of (maximal) length m in J(P).

Proof 3.8.

Exercise.
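The statement can at least be tested on the “N” poset from above (again under our labeling a < b, c < b, c < d): it has 5 linear extensions, and J(P) has 5 maximal chains.

```python
from itertools import permutations, combinations

ELEMS = ['a', 'b', 'c', 'd']
LESS = {('a', 'b'), ('c', 'b'), ('c', 'd')}  # strict order of the "N" poset

def linear_extensions():
    # orderings of the elements compatible with the partial order
    return [p for p in permutations(ELEMS)
            if all(p.index(x) < p.index(y) for (x, y) in LESS)]

def ideals():
    down = lambda s: all((x, y) not in LESS or x in s for y in s for x in ELEMS)
    return [frozenset(c) for r in range(5) for c in combinations(ELEMS, r)
            if down(set(c))]

def maximal_chains():
    # count chains of ideals from the empty ideal to the full one,
    # growing by one element at each step
    I = ideals()
    chains = 0
    def grow(top):
        nonlocal chains
        if len(top) == len(ELEMS):
            chains += 1
            return
        for nxt in I:
            if len(nxt) == len(top) + 1 and top <= nxt:
                grow(nxt)
    grow(frozenset())
    return chains
```
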

4 Chains, Antichains, and Extremal Set Theory

{epigraphs}\qitem

Man is born free;
and everywhere he is in chains
The Social Contract
Jean-Jacques Rousseau

We start with a definition dual to that of a chain:

Definition 22.

An antichain in a poset is a subset, such that any two (different) elements are incomparable.

We shall consider partitions of (the elements of) a poset into a collection of chains (or of antichains).

Clearly a chain C and antichain A can intersect in at most one element. This gives the following duality:

Lemma 4.1.

Let P be a poset.
a) If P has a chain of size r, then it cannot be partitioned into fewer than r antichains.
b) If P has an antichain of size r, then it cannot be partitioned into fewer than r chains.

A stronger version of this usually goes under the name of Dilworth's theorem (proven earlier by Gallai and Milgram).

Theorem 23 (Dilworth, 1950).

The minimum number m of chains in a partition of a finite poset P is equal to the maximum number M of elements in an antichain.

Proof 4.2.

The previous lemma shows that m ≥ M, so we only need to show that we can partition P into M chains. We use induction on |P|; in the base case |P| = 1 nothing needs to be shown.

Consider a chain C in P of maximal size. If every antichain in P∖C contains at most M−1 elements, we apply induction, partition P∖C into M−1 chains, and together with C are done.

Thus assume now that {a_1,…,a_M} is an antichain in P∖C. Let

S− = {x ∈ P : x ≤ a_i for some i}
S+ = {x ∈ P : x ≥ a_i for some i}

Then S− ∪ S+ = P, as otherwise there would be an element we could add to the antichain, increasing its size.

As C is of maximal size, the largest element of C cannot be in S−, so S− is a proper subset of P and we can apply induction to it. As there is an antichain of cardinality M in S−, we partition S− into M disjoint chains.

Similarly we partition S+ into M disjoint chains. But each a_i is the maximal element of exactly one chain in S− and the minimal element of exactly one chain of S+. We can combine these chains at the a_i's and thus partition P into M chains.
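Dilworth's theorem can be verified by brute force on a small example, say the divisor poset of 12, where both quantities equal 2 (a sketch; the helper names are ours):

```python
from itertools import combinations, product

DIVS = [1, 2, 3, 4, 6, 12]
comparable = lambda a, b: a % b == 0 or b % a == 0

def max_antichain():
    # largest set of pairwise incomparable divisors
    for r in range(len(DIVS), 0, -1):
        for s in combinations(DIVS, r):
            if all(not comparable(a, b) for a, b in combinations(s, 2)):
                return r
    return 0

def is_chain(s):
    return all(comparable(a, b) for a, b in combinations(s, 2))

def min_chain_partition():
    # smallest k such that DIVS splits into k chains (brute force over labelings)
    n = len(DIVS)
    for k in range(1, n + 1):
        for labels in product(range(k), repeat=n):
            parts = [[d for d, l in zip(DIVS, labels) if l == i] for i in range(k)]
            if all(is_chain(p) for p in parts):
                return k
```
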

Corollary 24.

If P is a poset with nm+1 elements, it has a chain of size n+1 or an antichain of size m+1.

Proof 4.3.

Suppose not; then every chain has at most n elements and every antichain has at most m elements. By Dilworth's theorem we can partition P into at most m chains, each of size at most n, so |P| ≤ mn, a contradiction.

Corollary 25 (Erdős-Szekeres, 1935).

Every sequence of nm+1 distinct integers contains an increasing subsequence of length at least n+1, or a decreasing subsequence of length at least m+1.

Proof 4.4.

Suppose the sequence is a_1,…,a_N with N = nm+1. We construct a poset on N elements x_1,…,x_N by defining x_i ≤ x_j if and only if i ≤ j and a_i ≤ a_j. (Verify that it is a partial order!) Chains in this poset correspond to increasing subsequences, antichains to decreasing ones, and the statement follows from corollary 24.
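The theorem can be checked exhaustively for small n, m with a standard quadratic dynamic program for longest monotone subsequences (our names; not part of the notes):

```python
def longest_monotone(seq, up=True):
    # O(N^2) dynamic program: best[j] = longest monotone subsequence ending at j
    best = [1] * len(seq)
    for j in range(len(seq)):
        for i in range(j):
            if (seq[i] < seq[j]) == up and seq[i] != seq[j]:
                best[j] = max(best[j], best[i] + 1)
    return max(best, default=0)

def erdos_szekeres_holds(n, m):
    # every sequence of nm+1 distinct entries has an increasing subsequence
    # longer than n or a decreasing one longer than m
    from itertools import permutations
    N = n * m + 1
    return all(longest_monotone(p, True) > n or longest_monotone(p, False) > m
               for p in permutations(range(N)))
```

The sequence 2,1,4,3 shows that nm entries do not suffice for n = m = 2.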

The theorems of this section in fact belong to a bigger context, to which a chapter of its own, chapter 4, is devoted.

A similar argument applies in the following two famous theorems, that are part of foundational material of extremal set theory.

Theorem 26 (Sperner, 1928).

Let N = {1,…,n} and A_1,…,A_m ⊆ N, such that A_i ⊈ A_j if i ≠ j. Then m ≤ \binom{n}{⌊n/2⌋}.

Proof 4.5.

Consider the poset of subsets of N and let 𝒜={A1,,Am}. Then 𝒜 is an antichain.

A maximal chain 𝒞 in this poset consists of sets that iteratively add one new point, so there are n! maximal chains, and k!(n−k)! maximal chains that involve a particular k-subset of N.

We now count the pairs (A,𝒞) such that A𝒜 and 𝒞 is a maximal chain with A𝒞. As a chain can contain at most one element of an antichain this is at most n!.

On the other hand, denoting by a_k the number of sets A_i with |A_i| = k, we know there are

∑_{k=0}^{n} k!(n−k)! a_k = n! ∑_{k=0}^{n} a_k / \binom{n}{k}

such pairs, so that ∑_{k=0}^{n} a_k / \binom{n}{k} ≤ 1. As \binom{n}{k} is maximal for k = ⌊n/2⌋ we get

\binom{n}{⌊n/2⌋} ≥ \binom{n}{⌊n/2⌋} ∑_{k=0}^{n} a_k / \binom{n}{k} ≥ ∑_{k=0}^{n} a_k = m.

We note that equality is achieved if 𝒜 is the set of all ⌊n/2⌋-subsets of N.
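For very small n, Sperner's bound can be confirmed by exhausting all families of subsets (bitmask encoding; a sketch, feasible only for n ≤ 4):

```python
from itertools import combinations
from math import comb

def largest_antichain_of_subsets(n):
    # brute force over all families of subsets of {0,...,n-1}, as bitmasks
    subsets = list(range(2 ** n))
    # neither set contains the other
    incomparable = lambda a, b: (a | b) != a and (a | b) != b
    best = 0
    for fam in range(2 ** len(subsets)):
        members = [s for i, s in enumerate(subsets) if fam >> i & 1]
        if all(incomparable(a, b) for a, b in combinations(members, 2)):
            best = max(best, len(members))
    return best
```
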

Theorem 27 (Erdős-Ko-Rado, 1961).

Let 𝒜 = {A_1,…,A_m} be a collection of m distinct k-subsets of N = {1,…,n}, where k ≤ n/2, such that any two subsets have nonempty intersection. Then m ≤ \binom{n−1}{k−1}.

Proof 4.6.

Consider the “cyclic k-sequences” ℱ = {F_1,…,F_n} with F_i = {i, i+1, …, i+k−1}, taken “modulo n” (that is, each number should be ((x−1) mod n) + 1).

Note that |𝒜 ∩ ℱ| ≤ k: if some F_i equals a set A_j, then any other F_l ∈ 𝒜 must intersect F_i, so we only need to consider (again taking indices modulo n) the F_l with i−k+1 ≤ l ≤ i+k−1. But F_l does not intersect F_{l+k}, allowing at most a set of k consecutive F_l's to lie in 𝒜.

As this holds for an arbitrary 𝒜, the result remains true after applying an arbitrary permutation π to the numbers in ℱ. Thus

z := ∑_{π∈S_n} |𝒜 ∩ ℱπ| ≤ k·n!.

We now calculate the sum z in a second way, by fixing A_j ∈ 𝒜 and F_i: there are k!(n−k)! permutations π such that F_iπ = A_j, so z = m·n·k!(n−k)!. Combining, m·n·k!(n−k)! ≤ k·n!, that is m ≤ (k/n)·\binom{n}{k} = \binom{n−1}{k−1}, proving the theorem.
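For n = 6, k = 2 the bound \binom{n−1}{k−1} = 5 can be confirmed by exhausting all families of 2-subsets (a brute-force sketch; a star through one point achieves the bound):

```python
from itertools import combinations
from math import comb

N, K = 6, 2
KSETS = [frozenset(c) for c in combinations(range(N), K)]

def largest_intersecting_family():
    # brute force over all 2^15 families of 2-subsets of a 6-set
    best = 0
    for fam in range(2 ** len(KSETS)):
        members = [s for i, s in enumerate(KSETS) if fam >> i & 1]
        if all(a & b for a, b in combinations(members, 2)):
            best = max(best, len(members))
    return best
```
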

5 Incidence Algebras and Möbius Functions

A common tool in mathematics is to consider, instead of a set S, the set of functions defined on S. To use this paradigm for (finite) posets, define an interval of a poset P, for a given pair x ≤ y in P, as the set [x,y] = {z ∈ P : x ≤ z ≤ y}, and denote by Int(P) the set of all intervals. (It might be convenient to set [x,y] = ∅ if x ≰ y.)

For a field K, we shall consider the set of functions on the intervals:

I(P) = I(P,K) = {f: Int(P) → K}

(with f(∅) = 0) and call it the incidence algebra of P. If we denote intervals by their end points x,y, we shall write f(x,y) for f([x,y]) for f ∈ I(P).

This set of functions is obviously a K-vector space under pointwise operations. We also define a multiplication on I(P) by defining, for f,g ∈ I(P), a function fg by

(fg)(x,y) = ∑_{x≤z≤y} f(x,z) g(z,y)

In exercise LABEL:incidencealgebra we will show that with this definition I(P) becomes an associative K-algebra444An algebra is a structure that is both a vector space and a ring, such that vector space and ring operations interact as one would expect. The prototype is the set of matrices over a field. with a one, given by

δ(x,y) = 1 if x = y, and 0 if x ≠ y.

We could consider I(P) as the set of formal K-linear combinations of intervals [x,y] and a product defined by

[x,y]·[a,b] = [x,b] if a = y, and = 0 if a ≠ y,

and extended bilinearily.

If P is finite, we can, by proposition 11, arrange the elements of P as x_1,…,x_n where x_i ≤ x_j implies that i ≤ j. Then I(P) is, by exercise LABEL:incalgmatalg, isomorphic to the algebra of upper triangular matrices M = (m_{i,j}) with m_{i,j} = 0 if x_i ≰ x_j.

Lemma 5.1.

Let f ∈ I(P). Then f has a (two-sided) inverse if and only if f(x,x) ≠ 0 for all x ∈ P.

Proof 5.2.

The property fg = δ is equivalent to:

f(x,x) g(x,x) = 1 for all x ∈ P,

(implying the necessity of f(x,x) ≠ 0) and

g(x,y) = −f(x,x)^{−1} ∑_{x<z≤y} f(x,z) g(z,y).

If f(x,x) ≠ 0, the second formula defines the values of a right inverse g uniquely, depending only on the interval [x,y]. Reversing the roles of f and g shows the existence of a left inverse, and standard algebra shows that both have to be equal.

The zeta function of P is the characteristic function of the underlying relation, that is ζ(x,y) = 1 if and only if x ≤ y (and ζ(∅) = 0).

This implies that

ζ²(x,y) = ∑_{x≤z≤y} ζ(x,z) ζ(z,y) = ∑_{x≤z≤y} 1 = |{z : x ≤ z ≤ y}|

is the size of the interval.

By Lemma 5.1, ζ is invertible. The inverse μ = ζ^{−1} is called the Möbius function of the poset P. The identities

μ(x,x) = 1 (1)
μ(x,y) = −∑_{x≤z<y} μ(x,z) for x < y (2)

follow from μζ=δ and allow for a recursive computation of values of μ and imply that μ is integer-valued.
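The recursion (2) can be run directly, for example on the divisor poset of 60 (a sketch; the recursion re-derives the values that Corollary 28 below describes in closed form):

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def moebius(x, y, n):
    # mu on the divisor poset of n: mu(x,x) = 1 and, for x < y,
    # mu(x,y) = - sum over x <= z < y (in divisibility) of mu(x,z)
    if x == y:
        return 1
    return -sum(moebius(x, z, n) for z in divisors(n)
                if z % x == 0 and y % z == 0 and z != y)
```

The values only depend on the quotient y/x, e.g. μ(2,12) = μ(1,6) = 1.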

For illustration, we shall compute the Möbius function for a number of common posets.

Lemma 5.3.

Let P be the total order on the numbers {1,,n}. Then for any x,yP we have:

μ(x,y) = 1 if x = y, −1 if x+1 = y, and 0 otherwise.
Proof 5.4.

The case x = y is trivial. If x+1 = y, the sum in (2) has only one summand, and the result follows. Thus assume that x ≤ y but y ∉ {x, x+1}. Then

μ(x,y) = −μ(x,x) − μ(x,x+1) − ∑_{x+2≤z<y} μ(x,z) = −∑_{x+2≤z<y} μ(x,z)

and the result follows by induction on y−x.

Lemma 5.5.

If P, Q are posets, the Möbius function on P×Q satisfies

μ((x_1,y_1),(x_2,y_2)) = μ_P(x_1,x_2) · μ_Q(y_1,y_2)
Proof 5.6.

It is sufficient to verify that the right hand side of the equation satisfies (2).

Together with Proposition 17 and Lemma 5.3 we get:

Corollary 28.

a) For X,Y ∈ 𝒫(A), we have that μ(X,Y) = (−1)^{|Y|−|X|} if X ⊆ Y, and 0 otherwise.
b) If x,y are divisors of n, then in 𝒟(n) we have that μ(x,y) = (−1)^d if y/x is the product of d different primes, and 0 otherwise.

Part b) explains the name: μ(1,n) is the value of the classical number theoretic Möbius function.

Part a) connects us back to section 1: The Möbius function gives the coefficients for inclusion/exclusion over an arbitrary poset. We will investigate and clarify this further in the rest of this section.

1 Möbius inversion

The property of being inverse of the incidence function can be used to invert summation formulas with the aid of the Möbius function:

Theorem 29 (Möbius inversion formula).

Let P be a finite poset, and f,g:PK, where K is a field. Then

g(t) = ∑_{s≤t} f(s) for all t ∈ P

is equivalent to

f(t) = ∑_{s≤t} g(s) μ(s,t) for all t ∈ P.
Proof 5.7.

Let K^P be the K-vector space of functions P → K. The incidence algebra I(P) acts linearly on this vector space by

(fξ)(t) = ∑_{s≤t} f(s) ξ(s,t) for ξ ∈ I(P).

The two equations thus become

g = fζ, respectively f = gμ,

and their equivalence follows from the fact that μ is the inverse of ζ.

The classical Möbius inversion formula from Number Theory follows as a special case of this.
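In the classical case, g(n) = ∑_{d|n} f(d) is inverted by f(n) = ∑_{d|n} μ(n/d) g(d) with the number-theoretic Möbius function. Applied to ∑_{d|n} φ(d) = n this recovers φ, which we can check (a sketch, with μ computed by trial division):

```python
def mu(n):
    # classical number-theoretic Moebius function
    if n == 1:
        return 1
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # square factor
            result = -result
        p += 1
    return -result if n > 1 else result

def phi(n):
    # invert  sum_{d|n} phi(d) = n  via Moebius inversion:
    # phi(n) = sum_{d|n} mu(n // d) * d
    return sum(mu(n // d) * d for d in range(1, n + 1) if n % d == 0)
```
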

If we consider the linear poset {1,…,n} with total ordering, Lemma 5.3 gives us (assuming f(0) = g(0)) the unsurprising result that

g(k)=i=0kf(i)

is equivalent to

f(k)=g(k)g(k1),

the finite difference analog of the fundamental theorem of Calculus!

Going back to the start of the chapter, consider n subsets A_i of a set X. We take the poset of subsets of {1,…,n} and define, for I ⊆ {1,…,n},

f(I) = (−1)^{|I|} |X ∖ ⋃_{i∈I} A_i|
g(I) = |⋂_{i∈I} A_i|.

Then g(I) = ∑_{J⊆I} f(J) by Inclusion/Exclusion over the complements X∖A_i, while

f(I) = ∑_{J⊆I} g(J) μ(J,I) = ∑_{J⊆I} (−1)^{|I|−|J|} |⋂_{j∈J} A_j| = (−1)^{|I|} |X ∖ ⋃_{i∈I} A_i|

is the ordinary inclusion/exclusion formula.

Chapter 4 Connections

{epigraphs}\qitem

Mathematicians are like Frenchmen.
When you talk to them, they translate
it into their own language,
and then it is something quite different.

{epigraphs}\qitem

Die Mathematiker sind eine Art Franzosen;

redet man zu ihnen, so übersetzen sie es

in ihre Sprache, und dann ist es

alsobald ganz etwas anders. Maximen und Reflexionen:

Über Natur und Naturwissenschaft

Johann Wolfgang von Goethe

{epigraphs}\qitem

With so many joints and connections, leaks were plentiful. As the magazine The Builder remarked, in 1856: “The fate of a theater is to be burned. It seems simply a question of time.” Connections
James Burke

In this chapter we will look at connections – both in an applied sense of modeling situations of connected objects, in the abstract sense of connecting mathematical theorems that initially seem to be unrelated, and in connecting concepts that might seem to be hopelessly abstract to practical applications. One of the joys of mathematics is to discover such connections and see the unity of the discipline.

This practical relevance of the results places much of this chapter also in close contact to the realm of (discrete) optimization.

We shall investigate a gaggle of theorems (each fundamental in its own area) which turn out to be equivalent, in the sense that we can deduce each theorem as a consequence of any other. Furthermore, this kind of derivation is often easier than deriving the theorem from scratch. The description owes much to [reichmeider].

Many theorems in this chapter are formulated in the language of graphs: We typically denote a graph by Γ = (V,E) with vertices V and edges E being 2-element sets of vertices. A digraph is a graph with directed edges (that is, we consider edges as elements of V×V instead of 2-element subsets of V). In a digraph, we would allow distinct edges (x,y) and (y,x). Weighted edges means we have a weight function w: E → ℝ.


Our start will be Dilworth’s Theorem 23 that we have proven already:

The minimum number m of chains in a partition of a finite poset P is equal to the maximum number M of elements in an antichain.

1 Halls’ Marriage Theorem

{epigraphs}\qitem

He was, methinks, an understanding fellow who said, ’twas a happy marriage betwixt a blind wife and a deaf husband.

{epigraphs}\qitem

Celuy là s’y entendoit, ce me semble, qui dict qu’un bon mariage se dressoit d’une femme aveugle avec un mary sourd. Essais, Livre III

Michel de Montaigne

Consider a family of subsets A_1,…,A_n ⊆ X. A system of distinct representatives (SDR) for these sets is an n-tuple (x_1,…,x_n) of pairwise distinct elements such that x_i ∈ A_i.

{tryout}

SDRs do not have to exist, for example consider A1=A2={1}.

We define, for a subset J{1,,n} of indices, a set

A(J) = ⋃_{j∈J} A_j.

We immediately see a reason for the above example failing: For an SDR to exist, by necessity |A(J)| ≥ |J| for any such subset J, since there are otherwise not sufficiently many elements available to pick distinct representatives. The following theorem (proven by the British mathematician Philip Hall, and extended by the unrelated American mathematician Marshall Hall to the infinite case; thus the apostrophe placement in the section title) shows this condition is not only necessary, but also sufficient.

Theorem 30 (Halls’ Marriage Theorem).

The family {A1,,An} of finite sets has a system of distinct representatives if and only if

|A(J)| ≥ |J| for all J ⊆ {1,…,n}. (1)

The name “Marriage Theorem” comes from the following interpretation: Suppose we have a set of m men and n women. We let A_i be the set of men that woman i would consider for a potential marriage. (The theorem goes back to times of more restrictive social mores. The concerned reader might want to consider k applicants for n jobs and A_i being the set of candidates that satisfy the conditions for job i.) Then every woman can marry a suitable man if and only if every group of k women together considers at least k men as suitable.

Proof 1.1 (Dilworth ⇒ Hall).

As the necessity of the condition is trivial, we show sufficiency:

Given sets A_i satisfying condition (1), let Y = {y_1,…,y_n} be n symbols representing the sets. We create a poset P in the following way: The elements of P are the disjoint union X ∪ Y. The only relations are that x ≤ y_i iff x ∈ A_i.

Clearly X is an antichain in P. Suppose S is another arbitrary antichain and let J = {1 ≤ j ≤ n : y_j ∈ S}. The antichain criterion imposes that A(J) ∩ S = ∅, so

|S| ≤ |J| + (|X| − |A(J)|) ≤ |X|

because of (1). That means X is an antichain of maximal size, and by Dilworth's theorem, P can be partitioned into |X| chains. As a chain cannot contain more than one point from X or more than one point from Y, the pigeonhole principle implies that each chain contains exactly one point from X and at most one point from Y. Suppose that x_i ∈ X is the element that lies in a chain together with y_i ∈ Y. Then x_i ≤ y_i, and thus x_i ∈ A_i, so (x_1,…,x_n) is an SDR.
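For small families, Hall's condition can be tested by enumerating index subsets, and an SDR found by backtracking; the two always agree (a sketch, with names of our choosing):

```python
from itertools import combinations

def hall_condition(sets):
    # |A(J)| >= |J| for every nonempty set J of indices
    n = len(sets)
    return all(len(set().union(*(sets[j] for j in J))) >= len(J)
               for r in range(1, n + 1) for J in combinations(range(n), r))

def find_sdr(sets, chosen=()):
    # backtracking search for a system of distinct representatives
    if len(chosen) == len(sets):
        return list(chosen)
    for x in sets[len(chosen)]:
        if x not in chosen:
            result = find_sdr(sets, chosen + (x,))
            if result is not None:
                return result
    return None
```
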

As an example of the use of this theorem we consider the following result due to G. Birkhoff. A matrix M with nonnegative integer entries is called doubly statistic if every row and every column of M has the same sum. (There also is the term doubly stochastic, which denotes real matrices with row and column sums 1; every doubly statistic matrix can be scaled to a doubly stochastic one.)

Corollary 31.

A doubly statistic matrix M=(mij) with row/column sum l can be written as the sum of l permutation matrices.

Proof 1.2.

We define sets A_i, corresponding to the row indices, by A_i = {j : m_{ij} > 0}. For any k-tuple K of row indices, the sum of the corresponding rows of M is kl. As every column of M has column sum l, these k rows must have nonzero entries in at least k columns, that is |⋃_{i∈K} A_i| ≥ k. Thus (1) is satisfied and there is an SDR. We set p_{i,j} = 1 if j is the chosen representative for A_i (and p_{i,j} = 0 otherwise). Then P = (p_{i,j}) is a permutation matrix and M − P is doubly statistic with sum l−1. The statement follows by induction on l.
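The inductive proof is constructive; for small n one can even find each permutation matrix by brute force over permutations (a sketch, encoding a permutation matrix by the tuple p with entry 1 in position (i, p[i])):

```python
from itertools import permutations

def decompose(M):
    # write a doubly statistic matrix as a sum of permutation matrices
    n = len(M)
    M = [row[:] for row in M]  # work on a copy
    perms = []
    while any(any(row) for row in M):
        # a permutation supported on positive entries exists by Hall's theorem
        p = next(p for p in permutations(range(n))
                 if all(M[i][p[i]] > 0 for i in range(n)))
        perms.append(p)
        for i in range(n):
            M[i][p[i]] -= 1
    return perms
```
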

2 Kőnig’s Theorems – Matchings

{epigraphs}\qitem

Eventually everything connects - people, ideas, objects…the quality of the connections is the key to quality per se. Charles Eames

Let A be an m×n matrix with 0/1 entries. A line of A is a row or column. A set of lines covers A, if every nonzero entry of A lies on at least one of the lines. Nonzero entries are called independent if no two lie on a common line. The term rank of A is the maximum number of independent entries of A.

The following theorem is reminiscent of the properties of a basis in linear algebra:

Theorem 32 (Kőnig-Egerváry).

The minimum number of lines covering A equals the term rank of A.

{tryout}

Let

A =
1 0 0 0
0 1 0 0
1 1 0 0
0 1 1 1

Then A can be covered with 3 lines, and has an independent set of cardinality 3.
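Both quantities for this example can be computed by exhaustive search (a sketch; the helper names are ours):

```python
from itertools import combinations

A = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 1, 1, 1]]

def term_rank(A):
    # largest set of 1-entries with pairwise distinct rows and columns
    m, n = len(A), len(A[0])
    ones = [(i, j) for i in range(m) for j in range(n) if A[i][j]]
    for r in range(min(m, n), 0, -1):
        for S in combinations(ones, r):
            if len({i for i, _ in S}) == r and len({j for _, j in S}) == r:
                return r
    return 0

def min_line_cover(A):
    # smallest set of rows and columns covering all 1-entries
    m, n = len(A), len(A[0])
    ones = [(i, j) for i in range(m) for j in range(n) if A[i][j]]
    lines = [('r', i) for i in range(m)] + [('c', j) for j in range(n)]
    for k in range(len(lines) + 1):
        for L in combinations(lines, k):
            if all(('r', i) in L or ('c', j) in L for i, j in ones):
                return k
```
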

Proof 2.1 (Hall ⇒ Kőnig-Egerváry).

Given A ∈ {0,1}^{m×n}, let p be the term rank of A and q the minimum number of lines covering A. We have to show that p = q.

We first show that pq: A cover of l lines can cover at most l independent 1’s (no line covers more than one) but we can cover all ones with q lines, so pq.

To see that p ≥ q, without loss of generality permute the rows and columns of A such that a minimal cover involves the first r rows and the last s columns of the matrix. We aim to find an independent set that has r entries in the first r rows and in columns 1,…,n−s, and s entries in the last s columns in rows r+1,…,m:

For a row index 1 ≤ i ≤ r let N_i = {1 ≤ j ≤ n−s : A_{i,j} = 1}. Then the union of any k of these N_i's contains at least k column indices – otherwise we could replace these k rows with fewer than k columns in a minimal cover. By Hall's theorem we thus have an SDR (x_1,…,x_r). By definition, x_i ∈ N_i implies that A_{i,x_i} = 1. Let S = {(i,x_i) : i = 1,…,r}. Since the x_i are distinct, this is an independent set.

A dual argument for the last s columns gives an independent set T of positions in the last s columns and rows r+1,…,m. Since no position in S shares a row or column index with any position in T, S ∪ T also is an independent set, of cardinality r+s = q.

We reformulate this theorem in the language of graphs. A graph is called bipartite if its vertex set V can be written as a disjoint union V = V_1 ∪ V_2, so that no vertex in V_i has a neighbor in V_i (but only neighbors in V_{3−i}). A vertex cover in a graph is a set of vertices such that every edge is incident to at least one vertex in the cover. A matching in a graph is a set of edges such that no two edges are incident to the same vertex. A matching is called maximum if no matching with a larger number of edges exists.

We assume that the graphs we consider here have no isolated vertices.

Theorem 33 (Kőnig).

In a bipartite graph without isolated vertices, the number of edges in a maximum matching equals the number of vertices in a minimum vertex cover.

{tryout}

Figure 1 shows a bipartite graph with a maximum matching (bold).

Refer to caption
Figure 1: A bipartite graph with maximum matching and minimum vertex cover.
Proof 2.2 (Kőnig-Egerváry ⇒ Kőnig).

Let W,UV be the two parts of the bipartition of vertices. Suppose |W|=m, |U|=n. We describe the adjacency in an m×n matrix A with 0/1 entries, Ai,j=1 iff wi is adjacent to uj. (Note that the examples for this and the previous theorem are illustrating such a situation.)

A matching in the graph corresponds to an independent set — edges sharing no common vertices correspond to 1's sharing no line. A vertex cover — a set of vertices such that every edge is incident to one of them — corresponds to a line cover of the matrix. The result follows immediately from the Kőnig-Egerváry theorem.

Matchings in bipartite graphs can be interpreted as creating assignments between tasks and operators, between customers and producers, between men and women, etc., and thus clearly have practical implications. (Compare the Marriage Theorem interpretation of Hall's theorem!)

Due to this practical relevance, there is obvious interest in algorithms for finding maximum matchings in a bipartite graph, the assignment problem. The first published algorithm uses Kőnig's theorem as a criterion for whether an existing matching can be improved. Due to the nationality of the theorem's authors, it has been named the Hungarian Method. (It turns out that this method had been discovered earlier, independently, by Jacobi [jacobiassign].)

Let ME be a (not necessarily maximal) matching in a graph Γ=(V,E). We call an edge matched if it is in M, and free otherwise. Similarly, vertices incident to matched edges are called matched, vertices not incident to any matched edge are free.

An augmenting path for M is a path whose end points are free and whose edges are alternatingly matched and free. This means all vertices but the end points are matched. Such a path allows us to replace M with a larger matching, in that we can replace the matched edges of the augmenting path in M with the free edges in the path, of which there is one more.

If the cardinality of $M$ equals that of a vertex cover $C$, of course no augmenting path can exist: every edge of $M$ will be incident to exactly one vertex in $C$, leaving one of the two free edges at the ends of the path without cover.

But if the cardinality is smaller there must be an augmenting path:

Corollary 34.

If |M|<|C| for a vertex cover C, there either is a vertex cover of smaller size, or there is an augmenting path for M.

Refer to caption
Figure 2: The two reductions in Corollary 34. Shaded area is the cover.
Proof 2.3.

We prove the statement by induction over the number of vertices |V|, the base case being trivial.

Case 1: Assume that two vertices $v,w\in C$ are connected by an edge in $M$. Then both $v$ and $w$ must be incident to other edges, as we otherwise could drop one of them from $C$ and obtain a strictly smaller cover.

Furthermore one such edge $\{v,v_1\}$ must lead to a vertex $v_1\notin C$ (as we otherwise could drop $v$ from the vertex cover). Clearly $\{v,v_1\}$ is free. We similarly find a free edge $\{w,w_1\}$ to a vertex $w_1\notin C$. If both $v_1$ and $w_1$ are free, then $v_1,v,w,w_1$ is an augmenting path. Otherwise at least one vertex, WLOG $v_1$, is matched and thus incident to an edge $\{v_1,v_2\}\in M$. As we assumed that $v_1\notin C$, this implies that $v_2\in C$. As the graph is bipartite we also know that $v_2\ne w$. Now consider the smaller graph $\Gamma'$ obtained from $\Gamma$ by deleting $v$ and $v_1$ (and the incident edges) and adding an edge $\{v_2,w\}$ if it did not yet exist. Then $M'=(M\setminus\{\{v,w\},\{v_1,v_2\}\})\cup\{\{v_2,w\}\}$ is a matching in $\Gamma'$ and $C'=C\setminus\{v\}$ a vertex cover in $\Gamma'$ (see Figure 2, left). By induction, we can either obtain a strictly smaller vertex cover $D$ for $\Gamma'$ (in which case $D\cup\{v\}$ or $D\cup\{v_1\}$ is a strictly smaller cover for $\Gamma$), or an augmenting path for $\Gamma'$. If this path uses the new matching edge $\{v_2,w\}$, we replace this edge by the edge sequence $\{v_2,v_1\},\{v_1,v\},\{v,w\}$ and obtain an augmenting path in $\Gamma$. This completes Case 1.

In Case 2, we now may assume that no two vertices of the cover are connected by an edge in the matching. As $|M|<|C|$, there must be an unmatched vertex $v\in C$.

If $v$ is adjacent to any other unmatched vertex, this gives an augmenting path of length 1.

Otherwise $v$ is adjacent to a matched vertex $w\notin C$ (if all neighbors were in $C$, we could remove $v$ from $C$). Via a matched edge, $w$ is adjacent to another vertex $x$, and to cover that edge we must have $x\in C$. We furthermore may assume that $w$ was chosen so that $x$ is also incident to an edge not in the matching, as we otherwise could replace all such $x$'s with the respective $w$'s in the cover and remove $v$ from $C$, reducing the cover size.

Now consider the graph $\Gamma'$, obtained by removing the vertices $v$ and $w$ and their incident edges, and set $M'=M\setminus\{\{x,w\}\}$ and $C'=C\setminus\{v\}$ (Figure 2, right).

Then $C'$ is a vertex cover of $\Gamma'$ (all edges that $v$ covered are gone). Also $M'$ is a matching with $|M'|<|C'|$. By induction, either $\Gamma'$ has a smaller cover $D$ (in which case $D\cup\{v\}$ is a smaller cover for $\Gamma$), or there is an augmenting path. If this path involves $x$, we extend it by $x$–$w$–$v$ and thus obtain an augmenting path for $\Gamma$.

To find a maximum matching, one can thus simply start with some arbitrary matching (pick edges as long as no two share a vertex) and vertex cover and then use Corollary 34 to refine them iteratively.
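The augmenting-path idea can be organized directly into a short program. The following Python sketch uses the standard augmenting-path search often attributed to Kuhn, rather than the cover-based induction of Corollary 34; the data layout and names are ours:

```python
def max_bipartite_matching(adj, n_right):
    """adj[u] = list of right-vertices adjacent to left-vertex u.
    Grows a matching by searching for an augmenting path from
    each free left vertex in turn."""
    match_right = [None] * n_right  # match_right[v] = left partner of v

    def try_augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current partner can be rerouted:
            if match_right[v] is None or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    size = 0
    for u in range(len(adj)):
        if try_augment(u, set()):
            size += 1
    return size, match_right

# Example: left vertices {0,1,2}, right vertices {0,1,2}.
adj = [[0, 1], [0], [1, 2]]
size, match = max_bipartite_matching(adj, 3)
print(size)  # 3
```

In the example, vertex 2 first tries the occupied right vertex 1; the recursive call reroutes earlier choices, exactly the "replace matched edges by free edges" step described above.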

It is not hard to generalize this approach to (complete) bipartite graphs with weighted edges to obtain a perfect matching (one in which all vertices are matched) of maximum weight (profit):

Given an arbitrary graph with weighted edges, replace an edge $\{a,b\}$ with (assume: integral) weight $w$ by a set of edges $\{a,b_1\},\{b_1,a_2\},\{a_2,b_2\},\dots,\{a_w,b\}$ with newly introduced vertices $a_i,b_i$ that are not connected in any other way. Selecting the edge $\{a,b\}$ in the original graph thus now allows selection of $w$ edges. Thus the maximum weight of a matching in the original graph corresponds to the cardinality of a maximum matching in the new graph.
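The path gadget for a single weighted edge can be sketched as follows; the tuple-based names for the new inner vertices are an implementation choice of ours to keep them distinct from everything else:

```python
def expand_edge(a, b, w):
    """Replace edge {a, b} of integral weight w by the path gadget
    a - b_1 - a_2 - b_2 - ... - a_w - b from the text:
    2w vertices in total, hence 2w - 1 unit-weight edges."""
    inner = []
    for i in range(1, w):
        inner.append(("b", a, b, i))      # b_i
        inner.append(("a", a, b, i + 1))  # a_{i+1}
    verts = [a] + inner + [b]
    return list(zip(verts, verts[1:]))

print(len(expand_edge("a", "b", 3)))  # 5 edges for weight 3
print(len(expand_edge("a", "b", 1)))  # 1 edge for weight 1
```

A path on $2w$ vertices has a maximum matching of size $w$, so selecting the original edge indeed contributes $w$ to the matching in the expanded graph.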

1 Stable Matchings

{epigraphs}\qitem

…to Alvin E. Roth and Lloyd S. Shapley “for the theory of stable allocations and the practice of market design”. Citation for the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel 2012

Also of practical interest is the concept of a stable matching (or stable marriage). Imagine a matching in a bipartite graph represents an assignment of students to college places. Every student has a personal preference ranking of colleges; every college (place – assume for simplicity that every college has just one place) has a personal ranking of students. A matching is unstable (otherwise: stable) if there is a student $s$ and a college $c$ such that $s$ ranks $c$ higher than her current college, and $c$ ranks $s$ higher than its current student.

A priori it is unclear that stable matchings exist; the following algorithm not only proves that they do, but also gives a way of producing one:

{algo}

[Deferred Acceptance, Gale, Shapley] Every student has a temporarily assigned college, colleges may have a temporarily assigned student. Once a college has been assigned a student, it will replace it only by one it ranks higher. A student may be moved down from higher ranked to lower ranked colleges.

For bookkeeping, students carry a rejected label that will change status as the algorithm goes on, and also carry for each college an indicator whether they have been rejected by this college.

  1. 1.

    [Initialize] Label every student as rejected

  2. 2.

    [Complete] If no student is rejected, terminate.

  3. 3.

    [Student Choice] Every student marked as rejected applies to the college she ranks highest amongst those who have not yet rejected her.

  4. 4.

[College Choice] Every college tentatively picks, from the students who chose it, the one it ranks highest (and removes that student's rejection status). It rejects all other students who have selected it, even if they had been picked tentatively before.

  5. 5.

    Go back to step 2.

Proof 2.4.

The only way the process terminates is if no student is rejected, that is, every student has been tentatively accepted by a college, and this college has no applicant it would rank higher. All colleges which a student ranks higher than her own have rejected her because they were able to accept a student they rank higher, so the matching is stable.

In every round, students select from colleges and are either tentatively accepted, or are left with colleges of lower preference. As there is a finite set of students and preferences, this process must terminate.
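A compact Python sketch of deferred acceptance in the one-place-per-college setting of the text (the data layout and names are ours):

```python
def deferred_acceptance(student_prefs, college_ranks):
    """student_prefs[s]: list of colleges in order of s's preference.
    college_ranks[c][s]: rank college c gives student s (lower = better).
    Returns a dict college -> tentatively held student (one seat each)."""
    free = list(range(len(student_prefs)))  # currently rejected students
    next_choice = [0] * len(student_prefs)  # next college each will try
    held = {}                               # college -> student
    while free:
        s = free.pop()
        c = student_prefs[s][next_choice[s]]
        next_choice[s] += 1
        if c not in held:
            held[c] = s                     # empty seat: tentative accept
        elif college_ranks[c][s] < college_ranks[c][held[c]]:
            free.append(held[c])            # bump the lower-ranked student
            held[c] = s
        else:
            free.append(s)                  # rejected; will try next college
    return held

students = [[0, 1], [0, 1]]   # both students prefer college 0
ranks = [[1, 0], [0, 1]]      # college 0 prefers student 1
print(deferred_acceptance(students, ranks))  # {0: 1, 1: 0}
```

In the toy run, both students apply to college 0; it keeps student 1 and rejects student 0, who then settles at college 1 — a stable outcome, since college 0 already holds the student it prefers.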

This algorithm is used for example in the US to match graduating medical doctors to hospital training positions, www.nrmp.org. It also was instrumental for the award of the 2012 Nobel prize in Economics to Shapley and Roth.

3 Menger’s theorem

{epigraphs}\qitem

Live in fragments no longer. Only connect, and the beast and the monk, robbed of the isolation that is life to either, will die. Howards End
E. M. Forster

Kőnig's aim in proving Theorem 33 was as a tool towards a proof of a more general theorem by Menger that deals with connections in a graph (and gets us, in statement and proof, back close to Dilworth's theorem), but whose proof, when first published, turned out not to work in the case of bipartite graphs.

Let $u,w$ be nonadjacent vertices in a graph $\Gamma$. A $uw$-path in $\Gamma$ is a path from $u$ to $w$. A collection of $uw$-paths is called independent if the paths are pairwise disjoint except for $u$ and $w$. A set $S$ of vertices, excluding $u$ and $w$, is $uw$-separating if every $uw$-path must contain a vertex of $S$. Clearly the minimum cardinality of a $uw$-separating set must be at least as large as the maximum number of independent $uw$-paths. It turns out that the two quantities are equal:

Theorem 35 (Menger).

Let $u,w$ be nonadjacent vertices in a graph. Then the maximum number of independent $uw$-paths equals the minimum cardinality of a $uw$-separating set.

As Kőnig’s proof is comparatively long, we will instead give a direct proof due to G. Dirac. Note the similarity to the proof of Dilworth’s theorem!

Proof 3.1.

Let $m$ be the minimum cardinality of a separating set, and $M$ the maximum number of independent paths. We have seen that $m\ge M$.

Assume that the theorem is false and let $\Gamma=(V,E)$ be a graph with a minimum number of edges for which the theorem fails; that is, we have two nonadjacent vertices $u$ and $w$ and fewer than $m$ independent $uw$-paths.

If $e$ is an edge not incident to $u$ or $w$, we can remove $e$ and obtain a smaller graph $\Delta$ for which the theorem holds. Since $\Gamma$, and thus $\Delta$, has at most $m-1$ independent $uw$-paths, this means that in $\Delta$ there is a $uw$-separating set $T$ of size $m-1$. If we add one of the vertices incident to $e$ to this set $T$, we obtain a set $S$ of cardinality $m$ which is $uw$-separating – any path connecting $u$ and $w$ in $\Gamma$ but not in $\Delta$ clearly would have to use the edge $e$. We thus may assume that $S$ contains a vertex not adjacent to $u$ or $w$.

Next, we note that there is no path $u$–$s$–$w$ with $s\in S$ of length 2 – if there was, we could remove $s$ and would have a graph with fewer edges, $m-1$ separating vertices, and fewer than $m-1$ independent paths, contradicting the minimality of $\Gamma$.

For the final contradiction, denote by $P_u$ the set of paths between $u$ and exactly one vertex in $S$, and by $P_w$ ditto for $w$. The paths in $P_u$ and $P_w$ have only vertices of $S$ in common (otherwise we could circumvent $S$, contradicting that it is separating).

We claim that all vertices of $S$ are adjacent to $u$, or that all vertices of $S$ are adjacent to $w$. This will contradict the choice of $S$ and prove the theorem:

Consider the graph consisting of $P_u$, $w$, and edges $sw$ for $s\in S$. If $w$ is not adjacent to all vertices in $S$, this graph has fewer edges than $\Gamma$, so by assumption has $m$ independent paths from $u$ to $w$. If we leave out the $sw$-step from these paths we obtain a set $R_u$ of $m$ independent paths from $u$ to the $m$ elements of $S$. Similarly, we get a set of independent paths $R_w$ from $w$ to the elements of $S$. Combining these paths at the elements of $S$ yields $m$ independent paths from $u$ to $w$, a contradiction.

A dual version of the theorem holds for disjoint paths:

Theorem 36 (Menger’s theorem, edge version).

The maximum number of edge-disjoint paths from $u$ to $w$ equals the minimum number of edges whose removal would put $u$ and $w$ into disjoint components.

The proof of the theorem is by the ordinary version, applied to the line graph $L(\Gamma)$ of $\Gamma$. The vertices of $L(\Gamma)$ are the edges of $\Gamma$; two vertices are adjacent if the corresponding edges in $\Gamma$ are incident to a common vertex. See Figure 3 for an example.

Refer to caption
Figure 3: A graph and its line graph
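The line graph construction is entirely mechanical; a small Python sketch for undirected graphs (names are ours):

```python
from itertools import combinations

def line_graph(edges):
    """Vertices of L(G) are the edges of G; two are adjacent iff
    the original edges share an endpoint."""
    edges = [frozenset(e) for e in edges]
    return [(e, f) for e, f in combinations(edges, 2) if e & f]

# A path a-b-c: its line graph has a single edge {ab, bc}.
print(len(line_graph([("a", "b"), ("b", "c")])))            # 1
# A triangle: its line graph is again a triangle.
print(len(line_graph([("a", "b"), ("b", "c"), ("c", "a")])))  # 3
```

Vertex-disjointness of paths in $L(\Gamma)$ then translates exactly into edge-disjointness of the corresponding paths in $\Gamma$.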

Menger’s theorem also generalizes in the obvious way to directed graphs, though we shall not give a statement or proof here.

4 Network Flows, Max-flow/Min-cut

{epigraphs}\qitem

And he thought of himself floating on his back down a river, or striking out from one island to another, and he felt that that was really the life for a Tigger. The house at Pooh corner
A. A. Milne

A network is a digraph $(V,E)$ with weighted edges and two distinguished vertices, the source $s$ and the target $t$. We assume that the weights (which we call capacities) $c\colon E\to\mathbb{Q}_{\ge 0}$ are nonnegative. (Below we shall assume that weights are in fact rational – clearly we can approximate real weights to arbitrary precision. We shall investigate the issue of irrational capacities in the exercises.) An example is given in Figure 4. For a (directed) edge $e=(a,b)$ we denote by $\iota(e)=a$ the initial, and by $\tau(e)=b$ the terminal vertex of this edge.

Refer to caption
Figure 4: A network

The reader might want to consider this as a network of roads with weights being the maximal capacity. (It is possible to have different capacities in different directions on the same road.) We want to transport vehicles from s to t (and implicitly ask how many we can transport through the network).

Note that we are looking for a steady state; that is, we do not want to transport once, but forever, and ask for the transport capacity per time unit.

A flow is a function f defined on the edges (indicating the number of units we transport along that edge) such that

  • $0\le f(e)\le c(e)$ for all $e\in E$ (do not exceed capacity).

  • $\sum_{\iota(e)=v}f(e)=\sum_{\tau(e)=v}f(e)$ for all $v\in V\setminus\{s,t\}$ (no intermediate vertex can buffer capacity or create new flow; this is Kirchhoff's law in physics).

The value of a flow is the net transport out of the source:

$$\mathrm{val}(f)=\sum_{\iota(e)=s}f(e)-\sum_{\tau(e)=s}f(e).$$

This definition might seem asymmetric in focusing on the source, however Lemma 4.1 will show that this is a somewhat arbitrary choice.

The question we ask thus is for the maximal value of a flow. The main tool for this is to consider the flow passing “by a point” in the network:

A cut $C\subseteq V$ in a network is a set of vertices such that $s\in C$, $t\notin C$. We do not require the cut to be connected, though in practice it usually will be.

We note that the value of any flow is equal to the net flow out of a cut. (This implies that the value of a flow also is equal to the net transport into the target):

Lemma 4.1.

Let C be a cut. We denote the edges crossing the cut by

$$C_o=\{e\in E\mid \iota(e)\in C,\ \tau(e)\notin C\},$$
$$C_i=\{e\in E\mid \tau(e)\in C,\ \iota(e)\notin C\}.$$

Then for any flow f we have that

$$\mathrm{val}(f)=\sum_{e\in C_o}f(e)-\sum_{e\in C_i}f(e).$$
Proof 4.2.

We prove the statement by induction on the number of vertices in C. For |C|=1 we have that C={s} and the statement is the definition of the value of a flow.

Now suppose that $|C|>1$, let $s\ne v\in C$ and $D=C\setminus\{v\}$. By induction, the statement holds for $D$.

We now consider the following disjoint sets of edges:

$$P_o=\{e\in E\mid \iota(e)\in D,\ \tau(e)\notin C\},$$
$$P_i=\{e\in E\mid \tau(e)\in D,\ \iota(e)\notin C\},$$
$$Q_o=\{e\in E\mid \iota(e)\in D,\ \tau(e)=v\},$$
$$Q_i=\{e\in E\mid \tau(e)\in D,\ \iota(e)=v\},$$
$$R_o=\{e\in E\mid \iota(e)=v,\ \tau(e)\notin C\},$$
$$R_i=\{e\in E\mid \tau(e)=v,\ \iota(e)\notin C\}.$$

Any edge that starts at $v$ must be in $Q_i$ or in $R_o$; any edge ending at $v$ must be in $Q_o$ or in $R_i$. Kirchhoff's law thus gives us that

$$\sum_{e\in Q_i}f(e)+\sum_{e\in R_o}f(e)=\sum_{e\in Q_o}f(e)+\sum_{e\in R_i}f(e).$$

It is easily seen that $C_o=P_o\cup R_o$, $C_i=P_i\cup R_i$ and that $D_o=P_o\cup Q_o$ and $D_i=P_i\cup Q_i$. Therefore, using the previous equation to replace the difference of $Q$-sums by a difference of $R$-sums:

$$\mathrm{val}(f)=\sum_{e\in D_o}f(e)-\sum_{e\in D_i}f(e)$$
$$=\sum_{e\in P_o}f(e)-\sum_{e\in P_i}f(e)+\sum_{e\in Q_o}f(e)-\sum_{e\in Q_i}f(e)$$
$$=\sum_{e\in P_o}f(e)-\sum_{e\in P_i}f(e)+\sum_{e\in R_o}f(e)-\sum_{e\in R_i}f(e)$$
$$=\sum_{e\in C_o}f(e)-\sum_{e\in C_i}f(e).$$

The claim follows by induction.

We define the capacity of a cut as the sum of the capacity of the edges leaving the cut:

$$\mathrm{cap}(C)=\sum_{e\in C_o}c(e).$$

Lemma 4.1 thus implies that the value of any flow must be bounded by the capacity of any cut and thus (unsurprisingly) the maximal flow value is bounded by the minimum cut capacity – a chain is only as strong as its weakest link.

Similar to the dualities we considered before, equality is attained:

Theorem 37 (Max-Flow Min-Cut, integer version).

Suppose the capacity function $c$ is integer valued. Then the maximum value of a flow in a network is equal to the minimum capacity of a cut. Furthermore, there is a maximum flow $f$ that is integer valued.

Note 4.3.

If the capacity function is rational valued, we can simply scale with the lcm of the denominators and obtain an integer valued capacity function. In the case of irrational capacities, the theorem holds by a boring approximation argument.

Proof 4.4 (Menger ⇒ Max-flow).

Given a directed network, replace every arc with integral weight $c$ by $c$ disjoint directed edges (or, to ensure disjointness, edges subdivided by intermediate vertices).

Clearly, if one edge of a $c$-fold multi-edge set is in a minimum separating set, the other $c-1$ edges have to be as well.

Thus the cardinality of a minimum edge-separating set equals the minimum capacity of a cut, while $k$ edge-disjoint paths give a flow of value $k$. By the edge version of Menger's theorem, the minimum cardinality of an edge-separating set equals the maximum number of edge-disjoint paths, proving the theorem.

The existence of a maximum flow with integer values will follow from the following algorithm.

The algorithm of Ford and Fulkerson finds a maximal flow for a given network. It starts with a valid flow (say $f(e)=0$ for all edges) and then improves this flow if possible, using the following main step: {algo}[increase flow] Given a flow $f$, return a flow with higher value, or show that no such flow exists.

  1. 1.

Let $A:=\{s\}$. \alginvar$A$ is a set of vertices that can be supplied at a higher rate from the source. $I:=\{\}$. \alginvar$I$ is a set of edges that are not yet at capacity. $R:=\{\}$. \alginvar$R$ is a set of edges whose flow should be reduced.

  2. 2.

If $t\in A$, go to step 6.

  3. 3.

If there is an edge $e\in A_o$ with $f(e)<c(e)$, set $A:=A\cup\{\tau(e)\}$, $I:=I\cup\{e\}$. Go to step 2.

  4. 4.

If there is an edge $e\in A_i$ with $f(e)>0$ \alginvarThis is flow into $A$ which we could reduce, set $A:=A\cup\{\iota(e)\}$, $R:=R\cup\{e\}$. Go to step 2.

  5. 5.

    If neither of these two properties hold, terminate with a message that the flow is maximal.

  6. 6.

By tracing back how $t$ got added to $A$, we construct an (undirected) path (the augmenting path) $P=(e_1,e_2,\dots)$ from $s$ to $t$. Let

$$d=\min\Bigl(\min_{e\in P\cap I}\bigl(c(e)-f(e)\bigr),\ \min_{e\in P\cap R}f(e)\Bigr).$$

Increase the flow on $P\cap I$ by $d$, reduce the flow on $P\cap R$ by $d$. \alginvarThis satisfies Kirchhoff's law. Return the larger flow.

Proof 4.5.

We shall show termination in the case of integral capacities: every time we adjust the flow, its value increases by a positive integral amount. This can only happen a finite number of times.

Once the algorithm ends in step 5, we have a cut $A$ such that $\sum_{e\in A_i}f(e)=0$ and $\sum_{e\in A_o}f(e)=\sum_{e\in A_o}c(e)$, thus $\mathrm{val}(f)=\mathrm{cap}(A)$ and the flow must be maximal, since by Lemma 4.1 no flow can exceed the capacity of a cut.
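The augmentation loop can be sketched compactly in Python using a residual graph, in which reducing flow on an edge appears as capacity on a reverse edge. We search for the augmenting path by breadth-first search, i.e. we already use a shortest path (the Edmonds-Karp refinement discussed in the notes below); all names and the data layout are ours:

```python
from collections import deque

def max_flow(capacity, s, t):
    """capacity: dict (u, v) -> nonnegative integer capacity.
    Repeatedly augments along a shortest path in the residual graph."""
    res, adj = {}, {}
    for (u, v), c in capacity.items():
        res[(u, v)] = res.get((u, v), 0) + c
        res.setdefault((v, u), 0)       # reverse edge for flow reduction
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    flow = 0
    while True:
        # breadth-first search for an augmenting path s -> t
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow                 # no augmenting path: flow maximal
        path, v = [], t
        while parent[v] is not None:    # trace the path back to s
            path.append((parent[v], v))
            v = parent[v]
        d = min(res[e] for e in path)   # bottleneck along the path
        for (u, v) in path:
            res[(u, v)] -= d
            res[(v, u)] += d            # allow later reduction of this flow
        flow += d

# The network of Note 4.6 below: capacity 100 on the outer arcs, 1 on AB.
cap = {("s", "A"): 100, ("s", "B"): 100, ("A", "B"): 1,
       ("A", "t"): 100, ("B", "t"): 100}
print(max_flow(cap, "s", "t"))  # 200
```

Because the breadth-first search prefers the two short routes $s$–$A$–$t$ and $s$–$B$–$t$, this example finishes in two augmentations instead of the 200 tiny steps described in Note 4.6.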

{tryout}

We illustrate the algorithm in the example from above. Figure 5, a) shows some flow (in blue). The total flow is 24.

We start building the list of under-supplied vertices and mark edges on which to increase, respectively reduce the flow:

Vertices Increase Reduce
s
B sB
A AB
D AD
t DT

We find an augmenting path $s$–$B$–$A$–$D$–$t$ (red in Figure 5 b)) and note that we can increase the flow by 2 (the limiting factor being edge $sB$) to a total of 26.

Starting again, we obtain (Figure 5 c)) the augmenting path $s$–$C$–$F$–$B$–$A$–$E$–$t$, on which we can increase the flow by 2 again, resulting in a flow of 28.

Finally we build the set of under-supplied vertices as $C,F,B$ and then cannot increase the set further. Thus (Figure 5 d), shaded) $\{s,C,F,B\}$ is a cut whose capacity of 28 is reached, proving that we have a maximal flow.

a) Refer to caption b) Refer to caption

c) Refer to caption d) Refer to caption

Figure 5: Increasing a given flow to maximum
Refer to caption
Figure 6: Example for bad augmenting paths
Note 4.6.

As given, the algorithm is not necessarily in polynomial time, as there might be a huge number of small incremental steps. Take the network in Figure 6. Starting with a flow of 0, we choose the augmenting path sABt on which we can increase the flow by 1. Next, we take the augmenting path sBAt, reducing the flow on AB to 0 again and have a total flow of 2. We can iterate this, increasing the flow by 1 each step until we get the maximum flow of 200.

Polynomial time can however be achieved by a more careful choice of the augmenting path (e.g. a shortest path: the Edmonds-Karp algorithm); that analysis is beyond the scope of these notes.

Note 4.7.

To close the circle, observe that the Max-flow/Min-cut theorem implies Dilworth’s theorem: Given a poset, we model it as a directed network with flow from bottom to top and arc capacity one. Thus all theorems mentioned in this section (and in fact a few more we have not mentioned) are in a sense mutually equivalent and form the fundamental theorem of discrete optimization.

1 Braess’ Paradox

{epigraphs}\qitem

On Earth Day this year, New York City’s Transportation Commissioner decided to close 42d Street, which as every New Yorker knows is always congested. ”Many predicted it would be doomsday,” said the Commissioner […] But to everyone’s surprise […] Traffic flow actually improved when 42d Street was closed. New York Times, 12/25/1990, p.38
Gina Kolata

{epigraphs}\qitem

From the time of Say and Ricardo the classical economists have taught that supply creates its own demand The General Theory of Employment, Interest, and Money
John Maynard Keynes

Often the question of maximal flow turns up not just for an existing network, but already at the stage of network design, aiming to maximize flow. Braess’ paradox shows that this can be very nonintuitive.

While we shall give a theoretical example, concrete instances of this effect have been observed in the real world in several cities (New York; Stuttgart, Germany; Winnipeg, Canada;…) when roads had been temporarily blocked because of building work, or after new roads were built. The interested reader might observe obvious implications to society and politics.

Suppose we have four cities, A, B, C, D with connecting roads as shown in Figure 7, left. The reader may observe that this configuration is similar to the one in Note 4.6.

Cities A and B as well as C and D are connected by a minor road which easily clogs up. The driving time thus depends strongly on the number of cars; it is

$$t_{AB}=t_{CD}=10x$$

with $x$ the number of drivers in thousands. An obstacle blocks direct connections from B to C.

Refer to caption
Figure 7: Braess’ paradox: Before and after a new road is built.

High capacity roads have been built to connect A and C, as well as B and D. While the overall time is longer, the impact of extra cars is smaller. Driving time with $x$ thousand drivers on the road is

$$t_{AC}=t_{BD}=50+x.$$

Assume 6000 drivers want to travel from A to D. A symmetry argument shows that half (3000) travel via B and half via C. Their individual travel time is $10\cdot 3+(50+3)=83$. It is not hard to show that this is a stable equilibrium, i.e. no traveler can gain by changing their route. Furthermore, the system will settle in this equilibrium by itself, as long as we assume that drivers want to minimize their travel time and, when starting their journey, have perfect knowledge of how many cars are on each road.


Eventually, a new connection is built from B to C through the obstacle (right image). (For simplicity of the argument we shall assume that it is one-way; this is not crucial for the paradox, but simplifies the analysis.) It has high capacity and is short, so travel time on this route is

$$t_{BC}=10+x.$$

We now have three possible routes from A to D: ABD, ACD, and ABCD. Again assuming perfect information, drivers will move towards a distribution in which travel time along all used routes is equal. We claim that such an equilibrium exists with 2000 drivers each using ABD, ACD, and ABCD. (This means that 4000 drivers travel AB and CD, respectively, while 2000 travel BC.) This is because we have

$$t_{ABD}=10\cdot 4+(50+2)=92$$
$$t_{ABCD}=10\cdot 4+(10+2)+10\cdot 4=92$$
$$t_{ACD}=(50+2)+10\cdot 4=92.$$

Again, one can show that this is the only equilibrium, and that it is stable.

But this new optimal travel time is 92>83 and thus more than before. Implications that this might have for society are left to the reader as an exercise.
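The travel times in both scenarios are easy to verify numerically. A small Python check (the function and variable names are ours; route loads are given in thousands of drivers):

```python
def travel_times(n_ABD, n_ACD, n_ABCD):
    """Travel time per route under the cost functions of the text:
    t_AB = t_CD = 10x, t_AC = t_BD = 50 + x, t_BC = 10 + x,
    where x is the load on that road segment in thousands."""
    x_AB = n_ABD + n_ABCD   # total load on the minor road A-B
    x_CD = n_ACD + n_ABCD   # total load on the minor road C-D
    return {
        "ABD":  10 * x_AB + (50 + n_ABD),
        "ACD":  (50 + n_ACD) + 10 * x_CD,
        "ABCD": 10 * x_AB + (10 + n_ABCD) + 10 * x_CD,
    }

# Without the new road: the 3/3 split gives 83 on both usable routes
# (the ABCD entry is moot, since the road does not exist yet).
t = travel_times(3, 3, 0)
print(t["ABD"], t["ACD"])                 # 83 83
# With the new road: the 2/2/2 equilibrium gives 92 on every route.
t = travel_times(2, 2, 2)
print(t["ABD"], t["ACD"], t["ABCD"])      # 92 92 92
```

The check confirms the paradox: adding capacity moved the equilibrium travel time from 83 up to 92.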

References

Index

Some Counting Sequences

Bell

OEIS A000110, page 7
B0=1,B1=1,B2=2,B3=5,B4=15,B5=52,B6=203,B7=877,B8=4140,B9=21147,B10=115975

Catalan

OEIS A000108, page 9
C0=1,C1=1,C2=2,C3=5,C4=14,C5=42,C6=132,C7=429,C8=1430,C9=4862,C10=16796

Derangements

OEIS A000166, page 5
d(0)=1,d(1)=0,d(2)=1,d(3)=2,d(4)=9,d(5)=44,d(6)=265,d(7)=1854,d(8)=14833

Involutions

OEIS A000085, page 10
s(0)=1,s(1)=1,s(2)=2,s(3)=4,s(4)=10,s(5)=26,s(6)=76,s(7)=232,s(8)=764

Fibonacci

OEIS A000045, page 2
F0=1,F1=1,F2=2,F3=3,F4=5,F5=8,F6=13,F7=21,F8=34,F9=55,F10=89,F11=144

Partitions

OEIS A000041, page LABEL:partdef
p(0)=1,p(1)=1,p(2)=2,p(3)=3,p(4)=5,p(5)=7,p(6)=11,p(7)=15,p(8)=22,p(9)=30