In the beginning, there was the empty set

What would you say if I told you that in math everything is a set? That, whatever mathematical object you name, it can be defined using sets? And not sets, that contains some special ingredient somewhere deep inside - but by basically wrapping up empty sets and merging it with more wrapped empty sets in various degree of wrapping?

Because that’s how things are. If you are curious how to do it, let us take a small walk through the foundations of the set theory!

Let’s start with some axioms

If we want to build up the whole universe with sets, we need to set up some ground rules to follow. Otherwise, we wouldn’t know what is a set and what is allowed and what is not.

To be honest, we don’t define a set. It just is. A thing that contains other things… or not. That’s basically what we can tell about a set: does it contains something or not.

Set(1, 2, 3).contains(2) // true
Set(1, 2, 3).contains(4) // false

Axiom of extensionality

So, the first thing we can do with sets is taking them and checking if they are the same thing. Therefore we say, that two sets are equal (and they are the same set) if they contain the same elements.

X,Yz[(zX    zY)    X=Y]\forall_{X,Y} \forall_{z} [(z \in X \iff z \in Y) \implies X = Y]

(For all possible (sets) XX, YY, for all possible (elements) zz , zz belongs to XX if and only if zz belongs YY, implies that (set) XX is equal to YY).

The important takeaway here is that sets equality completely ignores the order. Because, you know, sets are unordered, you would have to define order (which we’ll do, another day).

Set(1, 2, 3) ==
  Set(1, 3, 2) ==
  Set(2, 1, 3) ==
  Set(2, 3, 1) ==
  Set(3, 1, 2) ==
  Set(3, 2, 1)

Also, another property that is implied is that sets don’t allow you to repeat elements.

Set(1, 2, 3, 2) == Set(1, 2, 3)

Axiom of empty set

There is an empty set (\emptyset).

x¬(x)\exists_{\emptyset}\forall_x\neg(x \in \emptyset)

If we want to build set(s) up, we need to start up with something. The empty set is that something.

The classical set theory does it in another way - uses axiom schema of specification, which tells that for each set you can define a subset. (It’s called schema because it has to be defined for each set in separation, but there is a template for it where you just need to fix the set).

Axiom of pairing

If you have (sets) xx and yy, you can create a set, that contains them both.

xyAxAyA\forall_x\forall_y\exists_A x \in A \land y \in A
Set(x, y)

If x=yx=y, then we are effectively building a xx singleton ({x}\{x\}).

Set(x, x) == Set(x)

Axiom of union

If you have two sets, you can combine/sum/union them into one (a sum/union set).

XYZaXaY    aZ\forall_X\forall_Y\exists_Z a \in X \lor a \in Y \iff a \in Z

(For each XX, YY exists ZZ that contains elements of both of them.)

Set(1) ++ Set(2) == Set(1, 2)

We can write it dawn using set sum operator

Z=XYZ = X \cup Y

(Notice the similarity between \lor and \cup - it helps to remember what is sum \cup and what is intersection \cap).

Together with the axiom of pairing it lets you build up a set out of any elements you want: you just made singletons or sets of pairs out of them and then unionize the resulting sets until you get the set you want.

Set(1, 2) ++ Set(3, 3) == Set(1, 2, 3)

Axiom of regularity (axiom of foundation)

If we said, that set can contain itself, or some other set that would contain such set, things would get pretty hairy, pretty soon - it would easily lead to the paradox of set of all sets (Russell’s paradox).

To prevent that, we require that each set as well as each of its elements (if there are some; such elements are also sets) would be disjoint (share no common elements).

X[AAX    YX¬(zzXzY)]\forall_X[\exists_A A \in X \implies \exists_{Y \in X} \neg(\exists_z z \in X \land z \in Y)]
x = Set(x) // not allowed!

Axiom schema of replacement

Let’s say we want to have a pair of things and we want to remember which one is first and which one is second. We can define it like this

(a,b)={{a},{a,b}}(a, b) = \{ \{a\}, \{a, b\} \}

An inner set containing two elements defines our 2 elements and inner set with one element shows which one is first. This is Kuratowski’s ordered pair/tuple definition.

Now, if we had a set of such pairs, and we knew that there are no two pairs that have the same first element, then we could call such set a function.

y=f(x)    (x,y)fy = f(x) \iff (x, y) \in f

The set of first elements would be a function’s domain, while the set of second values would be the function’s image or codomain.

Axiom schema of replacement is a way of stating that if way of creating function FF for set AA (domain), then its image F[A]F[A] is also a set. It express it using logical predicate where y=f(x)xAy = f(x) \land x \in A is denoted as ϕ(x,y,A)\phi(x,y,A) (yy is a value returned for an argument xx belonging to a domain AA).

A([xA!yϕ(x,y,A)]    By[yB    xAϕ(x,y,A)])\forall_A ( [ \forall_{x \in A} \exists!_y \phi(x, y, A) ] \implies \exists_B \forall_y [y \in B \iff \exists_{x \in A} \phi(x, y, A) ] )

Axiom of infinity

Let’s define S(w)=w{w}S(w) = w \cup \{w\} (union of ww set with ww singleton - achievable thanks to axiom of pairing and ). Then we can build a tower of SSes:

,S(),S(S()),...\emptyset, S(\emptyset), S(S(\emptyset)), ...

Axiom of infinity says, that we can do this infinitely many times and obtain the final (infinite) set:

{,{},{,{}},{,{},{,{}}},...}\{\emptyset, \{\emptyset\}, \{\emptyset, \{\emptyset\}\}, \{\emptyset, \{\emptyset\}, \{\emptyset,\{\emptyset\}\}\}, ...\}

(These are called von Neumann ordinals and we get back to them later).

To allow the existence of such set it’s enough if we allow that

X[Xy(yX    S(y)X)]\exists_X [\emptyset \in X \land \forall_y(y\in X\implies S(y) \in X)]

Of course, this set XX can contain more elements, than required above (von Neuman set is just a minimal example).

Having one infinite set, we can create other infinite sets just by applying other axioms on it.

Axiom of power set

Let’s say we have a set A={a,b,c}A = \{a,b,c\}. It has 8 possible subsets:

,{a},{b},{a},{a,b},{a,c},{b,c},{a,b,c}\emptyset, \{a\}, \{b\}, \{a\}, \{a,b\}, \{a,c\}, \{b,c\}, \{a,b,c\}

Set of all possible subsets is called A’s power set. We call it P(A)P(A). The axiom of power set states that for every set AA there exists a power set P(A)P(A).

AP(A)x[xA    xP(A)]\forall_A\exists_{P(A)}\forall_x[x\subseteq A\implies x\in P(A)]

If our AA set has nn elements (or a cardinality of nn, n=An = \|A\|), then the power set will have

  • (first element yes/no) ×\times (second element yes/no) ×\times ...... ×\times (n-th element yes/no)
  • 2×2×...×22 \times 2 \times ... \times 2 (nn times)
  • 2n2^n

elements (P(A)=2n=2A\|P(A)\| = 2^n = 2^{\|A\|}). Its cardinality is 22 power AA’s cardinality thus the name.

Axiom of choice

We defined a sum set and a tuple. So for completeness, let’s define a product set.

If we have sets AA and BB we can define a Cartesian product of A×BA \times B as a set of all pairs where the first element belongs to AA and second to BB.

A×B={(a,b):aAbB}A \times B = \{ (a,b): a \in A \land b \in B \}

If we generalize tuple (e.g. by stating that (a,b,c)=(a,(b,c))(a, b, c) = (a, (b, c)), (a,b,(c,d))=(a,b,c,d)(a, b, (c, d)) = (a, b, c, d) and so on) we can also generalize Cartesian product

X1×...×Xn={(x1,...,xn):x1X1,...,xnXn}X_1 \times ... \times X_n = \{(x_1, ..., x_n): x_1 \in X_1, ..., x_n \in X_n \}

It looks like product set’s cardinality should be the result of multiplication (product) of cardinalities of multiplied sets. So, if all of the sets are non-empty the cardinality of a product should also be non-zero and the set itself non-empty.

Now, if we looked at it from the other way round, it means that if the product is non-empty X1,...XnX_1, ... X_n we can always pick n-tuple, that contains one element from each set (basically any element of a product set will do). We can also think of such tuple as a function from XiX_i to xiXix_i \in X_i. We would call such function a selector.

The axiom of choice basically tells, that for any set of non-empty sets you can find a selector.

X[∉X    f:XXAX (f(A)A)]\forall_X [ \emptyset \not\in X \implies \exists f: X \rightarrow \bigcup X\quad \forall A\in X\ \land (f(A)\in A)]

(For all sets of sets XX, if XX doesn’t have an empty set, then there is a function ff from XX to sum of all X elements, such that for each set AXA \in X it returns value belonging to AA).

The axiom of choice is sometimes replaced by well-ordering theorem or Kuratowski–Zorn lemma.

With that, we have some foundations to build on.

Peano arithmetics or how natural numbers came to be

Minimal requirements for building up natural numbers were defined in the 19th century by Giuseppe Peano. He presented that to the define numbers we need 2 things:

  • something that will serve as a 00,
  • something that will return a value n+1n+1 when we pass it nn - a successor function or succsucc.

So, in Peano arithmetic, to build up a nn number we need to apply succsucc nn times:

1=succ(0)1 = succ(0) 2=succ(1)=succ(succ(0))2 = succ(1) = succ(succ(0)) 3=succ(2)=succ(succ(1))=succ(succ(succ(0)))3 = succ(2) = succ(succ(1)) = succ(succ(succ(0)))

and so on.

So we could define natural numbers as a set made of 00 as well as all results of applying succsucc to other natural numbers. We did something similar with axiom of infinity - von Neuman set starts with an empty set and then created its successor with s=S()s_{\emptyset} = S(\emptyset), then pass sss_{s_{\emptyset}} to SS and so on. It perfectly fits the requirement of Peano arithmetic! As a matter of the fact, these are called von Neumann ordinals and are one way of implementing it in set theory. Other would be Zermelo ordinals.

To make it a true arithmetic, we need some operations, e.g.:

  • addition:

    n+0=nn + 0 = n n+succ(m)=succ(n)+mn + succ(m) = succ(n) + m

    We basically move succsucc from one side to the other, until one side is 00 - then we have a result.

    1+2=succ(0)+succ(succ(0))=1 + 2 = succ(0) + succ(succ(0)) = =succ(succ(0))+succ(0)== succ(succ(0)) + succ(0) = =succ(succ(succ(0)))+0=3= succ(succ(succ(0))) + 0 = 3

  • multiplication:

    n×0=0×n=0n \times 0 = 0 \times n = 0 n×1=1×n=nn \times 1 = 1 \times n = n n×succ(m)=n+(n×m)n \times succ(m) = n + (n \times m)

    Here, we basically unroll multiplication into a sequence of additions. And we have the addition already defined.

    2×2=succ(succ(0))×succ(succ(0))=2 \times 2 = succ(succ(0)) \times succ(succ(0)) = =succ(succ(0))+(succ(succ(0))×succ(0))== succ(succ(0)) + (succ(succ(0)) \times succ(0)) = =succ(succ(0))+succ(succ(0))== succ(succ(0)) + succ(succ(0)) = =succ(succ(succ(0)))+succ(0)==succ(succ(succ(0))) + succ(0) = =succ(succ(succ(succ(0))))+0==succ(succ(succ(succ(0)))) + 0 = =succ(succ(succ(succ(0))))=4=succ(succ(succ(succ(0)))) = 4
  • comparison:

    0<succ(n)0 < succ(n) succ(n)>0succ(n) > 0 0=00 = 0 succ(n)<succ(m)    n<msucc(n) < succ(m) \iff n < m succ(n)>succ(m)    n>msucc(n) > succ(m) \iff n > m succ(n)=succ(m)    n=msucc(n) = succ(m) \iff n = m

    If you have 00 of either side, it is obvious. If not, remove succsucc from both sides and try again.

These definitions assume, that just as you are able to wrap things up, you can also unwrap, then and check if they match some pattern. In functional programming, it is called a pattern matching. In mathematics, I never saw any name for it as it is a granted property of the world.

Integers, anyone?

OK, so we defined an infinite set of natural numbers $\N = {0, 1, 2, …}$. It has addition, multiplication, and comparison, but it doesn’t have subtraction. At least one that would be defined for all possible arguments. If we define it like:

ab=c    a=b+ca - b = c \iff a = b + c

and then we’ll just try to find 3 matching the equation, then:

21=x    1+x=22 - 1 = x \iff 1 + x = 2

has one solution for x=1x = 1. But:

12=?1 - 2 = ?

What would be the meaning of that? (I saw a definition of a subtraction in lambda calculus, that would return 0 - because it has to return something, but we would like something better than that).

So, let’s upgrade our numbers. Let’s write all pairs of natural numbers and draw them as points on a graph:

^
|(0,3)  (1,3)  (2,3)  (3,3)
+
|(0,2)  (1,2)  (2,2)  (3,2)
+
|(0,1)  (1,1)  (2,1)  (3,1)
+
|(0,0)  (1,0)  (2,0)  (3,0)
+------+------+------+------>

If we assume that (a,b)(a, b) represents bab - a, then we’ll have the result defined for all tuples lying on diagonal (n,n)(n,n) and below it. We’ll also see, we can group tuples with the same result.

I0={(0,0),(1,1),(2,2),(3,3),...}I_0 = \{(0,0), (1,1), (2,2), (3,3), ...\} I1={(1,0),(2,1),(3,2),(4,3),...}I_1 = \{(1,0), (2,1), (3,2), (4,3), ...\} I2={(2,0),(3,1),(4,2),(5,3),...}I_2 = \{(2,0), (3,1), (4,2), (5,3), ...\}

In general, if for any (n,m)(n, m) where n,mNn,m \in \mathbb{N}, (n+1,m+1)(n+1,m+1) would have the same subtraction, and so it will belong to the same group. If we connect the dots in the same group we get:

^
|(0,3)  (1,3)  (2,3)  (3,3)
+                    .*
|                 .*
|(0,2)  (1,2)  (2,2)  (3,2)
+             .*     .*
|          .*     .*
|(0,1)  (1,1)  (2,1)  (3,1)
+      .*     .*     .*
|   .*     .*     .*
|(0,0)  (1,0)  (2,0)  (3,0)
+------+------+------+------>

Now, let’s say from now on we’ll identify a natural number with the group to in which it is a result of subtraction. As a matter of the fact, what we did here is creation of an equivalent class - we partitioned the whole set of pairs to smaller disjoints subsets, that altogether cover the whole original set N×N\mathbb{N} \times \mathbb{N}, where each pair is identical to the other in some regard (here: same result of subtraction). Because they are the same (in a way), we might not distinguish between them, and even identify the whole set with them. And (as long as we won’t violate the property used to partition them), we can translate the operation on elements of the set to the operation on a whole set. E.g. if we add any pairs which represent 1 to any pair which represent 2, we receive a pair that must represent 3. Of course, as long as we define that addition properly.

So, from now on:

  • addition:

    n+m=In+Im={(a+c,b+d):(a,b)In(c,d)Im}n + m = I_n + I_m = \{ (a+c, b+d): (a, b) \in I_n \land (c,d) \in I_m \}
  • multiplication:

    n×m=In×Im={(bd+ac,bc+ad):(a,b)In(c,d)Im}n \times m = I_n \times I_m = \{ (bd + ac, bc + ad): (a, b) \in I_n \land (c,d) \in I_m \}

    (Try to calculate (ba)(dc)(b-a)(d-c) to check where it came from).

What we might have not paid attention so far, is that from the very beginning such definition defines classes of equivalence for all possible pairs - even those, that were not handled by the original definition of subtraction.

^
|(0,3)  (1,3)  (2,3)  (3,3)
+      .*     .*     .*
|   .*     .*     .*
|(0,2)  (1,2)  (2,2)  (3,2)
+      .*     .*     .*
|   .*     .*     .*
|(0,1)  (1,1)  (2,1)  (3,1)
+      .*     .*     .*
|   .*     .*     .*
|(0,0)  (1,0)  (2,0)  (3,0)
+------+------+------+------>

As you probably guessed, these new numbers are negative numbers we missed.

  ^
  |(0,3)  (1,3)  (2,3)  (3,3)
-3+      .*     .*     .*
  |   .*     .*     .*
  |(0,2)  (1,2)  (2,2)  (3,2)
-2+      .*     .*     .*
  |   .*     .*     .*
  |(0,1)  (1,1)  (2,1)  (3,1)
-1+      .*     .*     .*
  |   .*     .*     .*
  |(0,0)  (1,0)  (2,0)  (3,0)
  +------+------+------+------>
 0      1      2      3       

So, now we can reliably subtract things. But we still cannot reliably divide them. For that, we’ll need rational numbers.

Rationals - still talking about tuples

Rational numbers are fractions. Dividend, divisor and a fraction bar between them.

pq:p,qZq0\frac{p}{q}: p, q \in \mathbb{Z} \land q \neq 0

At this point, you should know, what we’re going to do: use tuples!

pq=(p,q):p,qZb0\frac{p}{q} = (p, q): p, q \in \mathbb{Z} \land b \neq 0

Again, we will introduce a equality classes, stating that:

(p,q)=(n×q,n×q)nZ{0}(p, q) = (n \times q, n \times q) \land n \in \mathbb{Z}-\{0\}

meaning that you can reduce a fraction and it is still the same fraction.

Also, for compatibility: p1=p,pZ\frac{p}{1} = p, p \in \mathbb{Z}

Then, we can define operations:

  • addition:

    (a,b)+(c,d)=(ad+bc,bd)(a, b) + (c, d) = (ad + bc, bd)
  • multiplication:

    (a,b)×(c,d)=(ab,cd)(a, b) \times (c, d) = (ab, cd)
  • subtraction:

    (a,b)(c,d)=(a,b)+(c,d)=(a,b)+(c,d)(a, b) - (c, d) = (a, b) + (-c, d) = (a, b) + (c, -d)
  • division:

    (a,b)/(c,d)=(ad,bc)(a, b) / (c, d) = (ad, bc)

At this point, I’m skipping showing how comparison work, or things like neutral elements, but the curious reader has all the tools to figure them out.

Real numbers - finally, something made up!

So far everything could be solved with tricks like wrapping things up in a specific way. With real numbers, things are more difficult. In a way, as for something based in eponymous reality, they are the first thing we cannot easily construct from smaller things.

This indirectly became an issue for Pythagoreans, when they found out that numbers like 2\sqrt{2} or π\pi cannot be constructed from (finite amount of) rational numbers, no matter how you arrange them. It offended their sense of mathematical aesthetic. Even millenniums later we call such numbers irrationals. Yet, we need them to be able to describe positions and sizes in (among others) Euclidean spaces.

There are 2 popular ways to define real numbers using rationals: Dedekind’s cuts and Cauchy’s sequences.

Dedekind’s cuts uses bounds or partitions to define numbers. E.g. I am looking for the smallest number that, when squared, would be greater or equal to 2. In rational numbers, there is no such smallest element. We can get closer and closer to something that squared becomes 2, but it’s always slightly bigger.

Then we can state, that on e.g. numerical line we can reach such a point, and we need a good name for it. Since no rational fit it, we can name it 2\sqrt{2} and call it a day. Every time we will see 2\sqrt{2} we would be able to think, that this is a placeholder for something that fulfills the condition, that this is the smallest number that squared is greater or equal to 2 (or it’s a least upper bound of ii such that i2>=2i^2 >= 2).

Problem with such definition is that we aren’t constructing anything, we just list the requirements, give a name to something that fulfills them, and from now on pretend that this is a thing (and, since it’s mathematics, we can often get away with it if the definition is not contradictory).

Cauchy’s sequences are sequences where elements become closer to each other as the sequence progresses. E.g.

4,4,2,2,1,1,12,12,14,14,...4, -4, 2, -2, 1, -1, \frac{1}{2}, -\frac{1}{2}, \frac{1}{4}, -\frac{1}{4}, ...

We can easily design a sequence, that will approach something that is not a rational number. E.g.

21,1410,141100,14141000,...\frac{2}{1}, \frac{14}{10}, \frac{141}{100}, \frac{1414}{1000}, ...

could be a Cauchy’s sequence of rationals that gets closer and closer to 2\sqrt{2}.

Then, we could identify each sequence, with a number it is approaching and this way we would design real numbers. An infinite sequence is always something that we can construct (as opposed to hand-waving with partitions).

How to operate on such sequences? Well, if you add, subtract, multiply or divide values on corresponding positions in 2 sequences, you should get a sequence that approaches the result of addition, subtraction, multiplication or division.

For comparison, you can subtract one sequence from the other and check the sign.

Now, you can construct and use every real number.

Beyond real

So far everything was more or less convoluted way of wrapping up empty set and sets of the empty set, and sets of sets of the empty set, etc. We build this way natural numbers, integers, rationals and reals, tuples and functions. Is everything made up of sets? Well, in set theory, basically yes.

Vectors and matrices could be defined as functions from elements coordinates to the element (sets). Or tuples (vectors) and tuples of tuples with matching sizes (matrices). Complex numbers are, well, tuples of real and imaginary part. Also coordinates on a complex plane. Quaternions? Matrices. Algebras are sets of values with operations on these values (functions, thus also sets). Graphs? A tuples of vertices’ set and nodes’ set.

At some point, mathematicians introduced the idea of a class - a collection of objects defined by properties objects must have. They did it because sets have to be constructible from bottom-up (one way or the other), which excludes e.g. ability to build the set of all sets, which might be limiting. The class of all sets is a thing. Also, an example of a proper class - a class which is not a set.

However, besides proper classes (and proper classes of proper classes, etc), everything based on a set theory operates on sets. And each of these sets originates from an empty set wrapped in different ways. Lambda calculus needs only α\alpha-congruences and β\beta-reductions to implement everything, category theory has objects and arrows. And a set theory achieves it all with the empty sets and wrapping.