In the beginning, there was the empty set

What would you say if I told you that in math everything is a set? That, whatever mathematical object you name, it can be defined using sets? And not sets that contain some special ingredient somewhere deep inside - but by basically wrapping up empty sets and merging them with more wrapped empty sets in various degrees of wrapping?

Because that’s how things are. If you are curious how to do it, let us take a small walk through the foundations of the set theory!

Let’s start with some axioms

If we want to build up the whole universe with sets, we need to set up some ground rules to follow. Otherwise, we wouldn’t know what is a set and what is allowed and what is not.

To be honest, we don’t define a set. It just is. A thing that contains other things… or not. That’s basically what we can tell about a set: does it contain something or not.

Set(1, 2, 3).contains(2) // true
Set(1, 2, 3).contains(4) // false

Axiom of extensionality

So, the first thing we can do with sets is taking them and checking if they are the same thing. Therefore we say, that two sets are equal (and they are the same set) if they contain the same elements.

X,Yz[(zX    zY)    X=Y]\forall_{X,Y} \forall_{z} [(z \in X \iff z \in Y) \implies X = Y]

(For all possible (sets) XX, YY, for all possible (elements) zz , zz belongs to XX if and only if zz belongs YY, implies that (set) XX is equal to YY).

The important takeaway here is that sets equality completely ignores the order. Because, you know, sets are unordered, you would have to define order (which we’ll do, another day).

Set(1, 2, 3) ==
  Set(1, 3, 2) ==
  Set(2, 1, 3) ==
  Set(2, 3, 1) ==
  Set(3, 1, 2) ==
  Set(3, 2, 1)

Also, another property that is implied is that sets don’t allow you to repeat elements.

Set(1, 2, 3, 2) == Set(1, 2, 3)

Axiom of empty set

There is an empty set (\emptyset).

x¬(x)\exists_{\emptyset}\forall_x\neg(x \in \emptyset)

If we want to build set(s) up, we need to start up with something. The empty set is that something.

The classical set theory does it in another way - uses axiom schema of specification, which tells that for each set you can define a subset. (It’s called schema because it has to be defined for each set in separation, but there is a template for it where you just need to fix the set).

Axiom of pairing

If you have (sets) xx and yy, you can create a set, that contains them both.

xyAxAyA\forall_x\forall_y\exists_A x \in A \land y \in A
Set(x, y)

If x=yx=y, then we are effectively building a xx singleton ({x}\{x\}).

Set(x, x) == Set(x)

Axiom of union

If you have two sets, you can combine/sum/union them into one (a sum/union set).

XYZaXaY    aZ\forall_X\forall_Y\exists_Z a \in X \lor a \in Y \iff a \in Z

(For each XX, YY exists ZZ that contains elements of both of them.)

Set(1) ++ Set(2) == Set(1, 2)

We can write it down using set sum operator

Z=XYZ = X \cup Y

(Notice the similarity between \lor and \cup - it helps to remember what is sum \cup and what is intersection \cap).

Together with the axiom of pairing it lets you build up a set out of any elements you want: you just made singletons or sets of pairs out of them and then unionize the resulting sets until you get the set you want.

Set(1, 2) ++ Set(3, 3) == Set(1, 2, 3)

Axiom of regularity (axiom of foundation)

If we said, that a set can contain itself, or some other set that would contain such a set, things would get pretty hairy, pretty soon - it would easily lead to the paradox of set of all sets (Russell’s paradox).

To prevent that, we require that each set as well as each of its elements (if there are some; such elements are also sets) would be disjoint (share no common elements).

X[AAX    YX¬(zzXzY)]\forall_X[\exists_A A \in X \implies \exists_{Y \in X} \neg(\exists_z z \in X \land z \in Y)]
x = Set(x) // not allowed!

Axiom schema of replacement

Let’s say we want to have a pair of things and we want to remember which one is first and which one is second. We can define it like this

(a,b)={{a},{a,b}}(a, b) = \{ \{a\}, \{a, b\} \}

An inner set containing two elements defines our 2 elements and an inner set with one element shows which one is first. This is Kuratowski’s ordered pair/tuple definition.

Now, if we had a set of such pairs, and we knew that there are no two pairs that have the same first element, then we could call such set a function.

y=f(x)    (x,y)fy = f(x) \iff (x, y) \in f

The set of first elements would be a function’s domain, while the set of second values would be the function’s image or codomain.

Axiom schema of replacement is a way of stating that if there is a way of creating a function FF for set AA (domain), then its image F[A]F[A] is also a set. It expresses it using logical predicate where y=f(x)xAy = f(x) \land x \in A is denoted as ϕ(x,y,A)\phi(x,y,A) (yy is a value returned for an argument xx belonging to a domain AA).

A([xA!yϕ(x,y,A)]    By[yB    xAϕ(x,y,A)])\forall_A ( [ \forall_{x \in A} \exists!_y \phi(x, y, A) ] \implies \exists_B \forall_y [y \in B \iff \exists_{x \in A} \phi(x, y, A) ] )

Axiom of infinity

Let’s define S(w)=w{w}S(w) = w \cup \{w\} (union of ww set with ww singleton - achievable thanks to axioms of pairing and union). Then we can build a tower of SSes:

,S(),S(S()),...\emptyset, S(\emptyset), S(S(\emptyset)), ...

Axiom of infinity says that we can do this infinitely many times and obtain the final (infinite) set:

{,{},{,{}},{,{},{,{}}},...}\{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}, \{\emptyset, \{\emptyset\}, \{\emptyset,\{\emptyset\}\}\}, ...\}

(These are called von Neumann ordinals and we get back to them later).

To allow the existence of such a set it’s enough if we allow that

X[Xy(yX    S(y)X)]\exists_X [\emptyset \in X \land \forall_y(y\in X\implies S(y) \in X)]

Of course, this set XX can contain more elements than required above (the von Neumann set is just a minimal example).

Having one infinite set, we can create other infinite sets just by applying other axioms on it.

Axiom of power set

Let’s say we have a set A={a,b,c}A = \{a,b,c\}. It has 8 possible subsets:

,{a},{b},{c},{a,b},{a,c},{b,c},{a,b,c}\emptyset, \{a\}, \{b\}, \{c\}, \{a,b\}, \{a,c\}, \{b,c\}, \{a,b,c\}

Set of all possible subsets is called A’s power set. We call it P(A)P(A). The axiom of power set states that for every set AA there exists a power set P(A)P(A).

AP(A)x[xA    xP(A)]\forall_A\exists_{P(A)}\forall_x[x\subseteq A\implies x\in P(A)]

If our AA set has nn elements (or a cardinality of nn, n=An = \|A\|), then the power set will have

  • (first element yes/no) ×\times (second element yes/no) ×\times ...... ×\times (n-th element yes/no)
  • 2×2×...×22 \times 2 \times ... \times 2 (nn times)
  • 2n2^n

elements (P(A)=2n=2A\|P(A)\| = 2^n = 2^{\|A\|}). Its cardinality is 22 power AA’s cardinality thus the name.

Axiom of choice

We defined a sum set and a tuple. So for completeness, let’s define a product set.

If we have sets AA and BB we can define a Cartesian product of A×BA \times B as a set of all pairs where the first element belongs to AA and second to BB.

A×B={(a,b):aAbB}A \times B = \{ (a,b): a \in A \land b \in B \}

If we generalize tuple (e.g. by stating that (a,b,c)=(a,(b,c))(a, b, c) = (a, (b, c)), (a,b,(c,d))=(a,b,c,d)(a, b, (c, d)) = (a, b, c, d) and so on) we can also generalize Cartesian product

X1×...×Xn={(x1,...,xn):x1X1,...,xnXn}X_1 \times ... \times X_n = \{(x_1, ..., x_n): x_1 \in X_1, ..., x_n \in X_n \}

It looks like product set’s cardinality should be the result of multiplication (product) of cardinalities of multiplied sets. So, if all of the sets are non-empty the cardinality of a product should also be non-zero and the set itself non-empty.

Now, if we looked at it from the other way round, it means that if the product is non-empty X1,...XnX_1, ... X_n we can always pick an n-tuple that contains one element from each set (basically any element of a product set will do). We can also think of such a tuple as a function from XiX_i to xiXix_i \in X_i. We would call such function a selector.

The axiom of choice basically tells, that for any set of non-empty sets you can find a selector.

X[∉X    f:XXAX (f(A)A)]\forall_X [ \emptyset \not\in X \implies \exists f: X \rightarrow \bigcup X\quad \forall A\in X\ \land (f(A)\in A)]

(For all sets of sets XX, if XX doesn’t have an empty set, then there is a function ff from XX to sum of all X elements, such that for each set AXA \in X it returns value belonging to AA).

The axiom of choice is sometimes replaced by well-ordering theorem or Kuratowski-Zorn lemma.

With that, we have some foundations to build on.

Peano arithmetic or how natural numbers came to be

Minimal requirements for building up natural numbers were defined in the 19th century by Giuseppe Peano. He presented that to define numbers we need 2 things:

  • something that will serve as 00,
  • something that will return a value n+1n+1 when we pass it nn - a successor function or succsucc.

So, in Peano arithmetic, to build up the number nn we need to apply succsucc nn times:

1=succ(0)1 = succ(0) 2=succ(1)=succ(succ(0))2 = succ(1) = succ(succ(0)) 3=succ(2)=succ(succ(1))=succ(succ(succ(0)))3 = succ(2) = succ(succ(1)) = succ(succ(succ(0)))

and so on.

So we could define natural numbers as a set made of 00 as well as all results of applying succsucc to other natural numbers. We did something similar with axiom of infinity - von Neumann set starts with an empty set and then created its successor with s=S()s_{\emptyset} = S(\emptyset), then pass sss_{s_{\emptyset}} to SS and so on. It perfectly fits the requirement of Peano arithmetic! As a matter of fact, these are called von Neumann ordinals and are one way of implementing it in set theory. Another would be Zermelo ordinals.

To make it a true arithmetic, we need some operations, e.g.:

  • addition:

    n+0=nn + 0 = n n+succ(m)=succ(n)+mn + succ(m) = succ(n) + m

    We basically move succsucc from one side to the other, until one side is 00 - then we have a result.

    1+2=succ(0)+succ(succ(0))=1 + 2 = succ(0) + succ(succ(0)) = =succ(succ(0))+succ(0)== succ(succ(0)) + succ(0) = =succ(succ(succ(0)))+0=3= succ(succ(succ(0))) + 0 = 3

  • multiplication:

    n×0=0×n=0n \times 0 = 0 \times n = 0 n×1=1×n=nn \times 1 = 1 \times n = n n×succ(m)=n+(n×m)n \times succ(m) = n + (n \times m)

    Here, we basically unroll multiplication into a sequence of additions. And we have the addition already defined.

    2×2=succ(succ(0))×succ(succ(0))=2 \times 2 = succ(succ(0)) \times succ(succ(0)) = =succ(succ(0))+(succ(succ(0))×succ(0))== succ(succ(0)) + (succ(succ(0)) \times succ(0)) = =succ(succ(0))+succ(succ(0))== succ(succ(0)) + succ(succ(0)) = =succ(succ(succ(0)))+succ(0)==succ(succ(succ(0))) + succ(0) = =succ(succ(succ(succ(0))))+0==succ(succ(succ(succ(0)))) + 0 = =succ(succ(succ(succ(0))))=4=succ(succ(succ(succ(0)))) = 4
  • comparison:

    0<succ(n)0 < succ(n) succ(n)>0succ(n) > 0 0=00 = 0 succ(n)<succ(m)    n<msucc(n) < succ(m) \iff n < m succ(n)>succ(m)    n>msucc(n) > succ(m) \iff n > m succ(n)=succ(m)    n=msucc(n) = succ(m) \iff n = m

    If you have 00 on either side, it is obvious. If not, remove succsucc from both sides and try again.

These definitions assume that just as you are able to wrap things up, you can also unwrap them and check whether they match some pattern. In functional programming, it is called pattern matching. In mathematics, I never saw any name for it as it is a granted property of the world.

Integers, anyone?

OK, so we defined an infinite set of natural numbers $\N = {0, 1, 2, …}$. It has addition, multiplication, and comparison, but it doesn’t have subtraction – at least not one that is defined for all possible arguments. If we define it like:

ab=c    a=b+ca - b = c \iff a = b + c

and then we’ll just try to find xx matching the equation, then:

21=x    1+x=22 - 1 = x \iff 1 + x = 2

has one solution for x=1x = 1. But:

12=?1 - 2 = ?

What would be the meaning of that? (I saw a definition of a subtraction in lambda calculus, that would return 0 - because it has to return something, but we would like something better than that).

So, let’s upgrade our numbers. Let’s write all pairs of natural numbers and draw them as points on a graph:

^
|(0,3)  (1,3)  (2,3)  (3,3)
+
|(0,2)  (1,2)  (2,2)  (3,2)
+
|(0,1)  (1,1)  (2,1)  (3,1)
+
|(0,0)  (1,0)  (2,0)  (3,0)
+------+------+------+------>

If we assume that (a,b)(a, b) represents bab - a, then we’ll have the result defined for all tuples lying on diagonal (n,n)(n,n) and below it. We’ll also see, we can group tuples with the same result.

I0={(0,0),(1,1),(2,2),(3,3),...}I_0 = \{(0,0), (1,1), (2,2), (3,3), ...\} I1={(1,0),(2,1),(3,2),(4,3),...}I_1 = \{(1,0), (2,1), (3,2), (4,3), ...\} I2={(2,0),(3,1),(4,2),(5,3),...}I_2 = \{(2,0), (3,1), (4,2), (5,3), ...\}

In general, if for any (n,m)(n, m) where n,mNn,m \in \mathbb{N}, (n+1,m+1)(n+1,m+1) would have the same subtraction, and so it will belong to the same group. If we connect the dots in the same group we get:

^
|(0,3)  (1,3)  (2,3)  (3,3)
+                    .*
|                 .*
|(0,2)  (1,2)  (2,2)  (3,2)
+             .*     .*
|          .*     .*
|(0,1)  (1,1)  (2,1)  (3,1)
+      .*     .*     .*
|   .*     .*     .*
|(0,0)  (1,0)  (2,0)  (3,0)
+------+------+------+------>

From now on let’s identify a natural number with the group in which it is a result of subtraction. As a matter of fact, what we did here is the creation of an equivalence class - we partitioned the whole set of pairs into smaller disjoint subsets that altogether cover the whole original set N×N\mathbb{N} \times \mathbb{N}, where each pair is identical to the other in some regard (here: same result of subtraction). Because they are the same (in a way), we might not distinguish between them, and even identify the whole set with them. And (as long as we won’t violate the property used to partition them), we can translate the operation on elements of the set to the operation on a whole set. E.g. if we add any pairs that represent 1 to any pair that represents 2, we receive a pair that must represent 3. Of course, as long as we define that addition properly.

So, from now on:

  • addition:

    n+m=In+Im={(a+c,b+d):(a,b)In(c,d)Im}n + m = I_n + I_m = \{ (a+c, b+d): (a, b) \in I_n \land (c,d) \in I_m \}
  • multiplication:

    n×m=In×Im={(bd+ac,bc+ad):(a,b)In(c,d)Im}n \times m = I_n \times I_m = \{ (bd + ac, bc + ad): (a, b) \in I_n \land (c,d) \in I_m \}

    (Try to calculate (ba)(dc)(b-a)(d-c) to check where it came from).

What we might not have paid attention to so far is that from the very beginning such a definition defines equivalence classes for all possible pairs - even those that were not handled by the original definition of subtraction.

^
|(0,3)  (1,3)  (2,3)  (3,3)
+      .*     .*     .*
|   .*     .*     .*
|(0,2)  (1,2)  (2,2)  (3,2)
+      .*     .*     .*
|   .*     .*     .*
|(0,1)  (1,1)  (2,1)  (3,1)
+      .*     .*     .*
|   .*     .*     .*
|(0,0)  (1,0)  (2,0)  (3,0)
+------+------+------+------>

As you probably guessed, these new numbers are negative numbers we missed.

  ^
  |(0,3)  (1,3)  (2,3)  (3,3)
-3+      .*     .*     .*
  |   .*     .*     .*
  |(0,2)  (1,2)  (2,2)  (3,2)
-2+      .*     .*     .*
  |   .*     .*     .*
  |(0,1)  (1,1)  (2,1)  (3,1)
-1+      .*     .*     .*
  |   .*     .*     .*
  |(0,0)  (1,0)  (2,0)  (3,0)
  +------+------+------+------>
 0      1      2      3       

So, now we can reliably subtract things. But we still cannot reliably divide them. For that, we’ll need rational numbers.

Rationals - still talking about tuples

Rational numbers are fractions. Dividend, divisor and a fraction bar between them.

pq:p,qZq0\frac{p}{q}: p, q \in \mathbb{Z} \land q \neq 0

At this point, you should know what we’re going to do: use tuples!

pq=(p,q):p,qZq0\frac{p}{q} = (p, q): p, q \in \mathbb{Z} \land q \neq 0

Again, we will introduce equivalence classes, stating that:

(p,q)=(n×p,n×q)nZ{0}(p, q) = (n \times p, n \times q) \land n \in \mathbb{Z}-\{0\}

meaning that you can reduce a fraction and it is still the same fraction.

Also, for compatibility: p1=p,pZ\frac{p}{1} = p, p \in \mathbb{Z}

Then, we can define operations:

  • addition:

    (a,b)+(c,d)=(ad+bc,bd)(a, b) + (c, d) = (ad + bc, bd)
  • multiplication:

    (a,b)×(c,d)=(ab,cd)(a, b) \times (c, d) = (ab, cd)
  • subtraction:

    (a,b)(c,d)=(a,b)+(c,d)=(a,b)+(c,d)(a, b) - (c, d) = (a, b) + (-c, d) = (a, b) + (c, -d)
  • division:

    (a,b)/(c,d)=(ad,bc)(a, b) / (c, d) = (ad, bc)

At this point, I’m skipping showing how comparison works, or things like neutral elements, but the curious reader has all the tools to figure them out.

Real numbers - finally, something made up!

So far everything could be solved with tricks like wrapping things up in a specific way. With real numbers, things are more difficult. In a way, as for something based in eponymous reality, they are the first thing we cannot easily construct from smaller things.

This indirectly became an issue for Pythagoreans, when they found out that numbers like 2\sqrt{2} or π\pi cannot be constructed from (finite amount of) rational numbers, no matter how you arrange them. It offended their sense of mathematical aesthetics. Even millennia later we call such numbers irrationals. Yet, we need them to be able to describe positions and sizes in (among others) Euclidean spaces.

There are 2 popular ways to define real numbers using rationals: Dedekind cuts and Cauchy sequences.

Dedekind cuts use bounds or partitions to define numbers. E.g. I am looking for the smallest number that, when squared, would be greater or equal to 2. In rational numbers, there is no such smallest element. We can get closer and closer to something that, when squared, becomes 2, but it’s always slightly bigger.

Then we can state that on e.g. a numerical line we can reach such a point, and we need a good name for it. Since no rational fits it, we can name it 2\sqrt{2} and call it a day. Every time we will see 2\sqrt{2} we would be able to think that this is a placeholder for something that fulfills the condition that this is the smallest number that, when squared, is greater than or equal to 2 (or it’s the least upper bound of ii such that i2>=2i^2 >= 2).

The problem with such a definition is that we aren’t constructing anything, we just list the requirements, give a name to something that fulfills them, and from now on pretend that this is a thing (and, since it’s mathematics, we can often get away with it if the definition is not contradictory).

Cauchy sequences are sequences where elements become closer to each other as the sequence progresses. E.g.

4,4,2,2,1,1,12,12,14,14,...4, -4, 2, -2, 1, -1, \frac{1}{2}, -\frac{1}{2}, \frac{1}{4}, -\frac{1}{4}, ...

We can easily design a sequence that will approach something that is not a rational number. E.g.

21,1410,141100,14141000,...\frac{2}{1}, \frac{14}{10}, \frac{141}{100}, \frac{1414}{1000}, ...

could be a Cauchy sequence of rationals that gets closer and closer to 2\sqrt{2}.

Then, we could identify each sequence with a number it is approaching and this way we would design real numbers. An infinite sequence is always something that we can construct (as opposed to hand-waving with partitions).

How do we operate on such sequences? Well, if you add, subtract, multiply or divide values on corresponding positions in 2 sequences, you should get a sequence that approaches the result of addition, subtraction, multiplication or division.

For comparison, you can subtract one sequence from the other and check the sign.

Now, you can construct and use every real number.

Beyond real

So far everything was a more or less convoluted way of wrapping up the empty set and sets of the empty set, and sets of sets of the empty set, etc. We build natural numbers, integers, rationals, reals, tuples, and functions this way. Is everything made up of sets? Well, in set theory, basically yes.

Vectors and matrices could be defined as functions from elements’ coordinates to the element (sets). Or tuples (vectors) and tuples of tuples with matching sizes (matrices). Complex numbers are, well, tuples of real and imaginary parts. Also coordinates on a complex plane. Quaternions? Matrices. Algebras are sets of values with operations on these values (functions, thus also sets). Graphs? A tuple consisting of a set of vertices and a set of nodes.

At some point, mathematicians introduced the idea of a class - a collection of objects defined by properties objects must have. They did it because sets have to be constructible from bottom-up (one way or the other), which excludes, e.g., the ability to build the set of all sets, which might be limiting. The class of all sets is a thing. Also, an example of a proper class - a class which is not a set.

However, besides proper classes (and proper classes of proper classes, etc), everything based on set theory operates on sets. And each of these sets originates from an empty set wrapped in different ways. Lambda calculus needs only α\alpha-congruences and β\beta-reductions to implement everything, category theory has objects and arrows. And set theory achieves it all with the empty sets and wrapping.