What would you say if I told you that in math everything is a set? That, whatever mathematical object you name, it can be defined using sets? And not sets, that contains some special ingredient somewhere deep inside  but by basically wrapping up empty sets and merging it with more wrapped empty sets in various degree of wrapping?
Because that’s how things are. If you are curious how to do it, let us take a small walk through the foundations of the set theory!
Let’s start with some axioms
If we want to build up the whole universe with sets, we need to set up some ground rules to follow. Otherwise, we wouldn’t know what is a set and what is allowed and what is not.
To be honest, we don’t define a set. It just is. A thing that contains other things… or not. That’s basically what we can tell about a set: does it contains something or not.
Set(1, 2, 3).contains(2) // true
Set(1, 2, 3).contains(4) // false
Axiom of extensionality
So, the first thing we can do with sets is taking them and checking if they are the same thing. Therefore we say, that two sets are equal (and they are the same set) if they contain the same elements.
(For all possible (sets) , , for all possible (elements) , belongs to if and only if belongs , implies that (set) is equal to ).
The important takeaway here is that sets equality completely ignores the order. Because, you know, sets are unordered, you would have to define order (which we’ll do, another day).
Set(1, 2, 3) ==
Set(1, 3, 2) ==
Set(2, 1, 3) ==
Set(2, 3, 1) ==
Set(3, 1, 2) ==
Set(3, 2, 1)
Also, another property that is implied is that sets don’t allow you to repeat elements.
Set(1, 2, 3, 2) == Set(1, 2, 3)
Axiom of empty set
There is an empty set ().
If we want to build set(s) up, we need to start up with something. The empty set is that something.
The classical set theory does it in another way  uses axiom schema of specification, which tells that for each set you can define a subset. (It’s called schema because it has to be defined for each set in separation, but there is a template for it where you just need to fix the set).
Axiom of pairing
If you have (sets) and , you can create a set, that contains them both.
Set(x, y)
If , then we are effectively building a singleton ().
Set(x, x) == Set(x)
Axiom of union
If you have two sets, you can combine/sum/union them into one (a sum/union set).
(For each , exists that contains elements of both of them.)
Set(1) ++ Set(2) == Set(1, 2)
We can write it dawn using set sum operator
(Notice the similarity between and  it helps to remember what is sum and what is intersection ).
Together with the axiom of pairing it lets you build up a set out of any elements you want: you just made singletons or sets of pairs out of them and then unionize the resulting sets until you get the set you want.
Set(1, 2) ++ Set(3, 3) == Set(1, 2, 3)
Axiom of regularity (axiom of foundation)
If we said, that set can contain itself, or some other set that would contain such set, things would get pretty hairy, pretty soon  it would easily lead to the paradox of set of all sets (Russell’s paradox).
To prevent that, we require that each set as well as each of its elements (if there are some; such elements are also sets) would be disjoint (share no common elements).
x = Set(x) // not allowed!
Axiom schema of replacement
Let’s say we want to have a pair of things and we want to remember which one is first and which one is second. We can define it like this
An inner set containing two elements defines our 2 elements and inner set with one element shows which one is first. This is Kuratowski’s ordered pair/tuple definition.
Now, if we had a set of such pairs, and we knew that there are no two pairs that have the same first element, then we could call such set a function.
The set of first elements would be a function’s domain, while the set of second values would be the function’s image or codomain.
Axiom schema of replacement is a way of stating that if way of creating function for set (domain), then its image is also a set. It express it using logical predicate where is denoted as ( is a value returned for an argument belonging to a domain ).
Axiom of infinity
Let’s define (union of set with singleton  achievable thanks to axiom of pairing and ). Then we can build a tower of es:
Axiom of infinity says, that we can do this infinitely many times and obtain the final (infinite) set:
(These are called von Neumann ordinals and we get back to them later).
To allow the existence of such set it’s enough if we allow that
Of course, this set can contain more elements, than required above (von Neuman set is just a minimal example).
Having one infinite set, we can create other infinite sets just by applying other axioms on it.
Axiom of power set
Let’s say we have a set . It has 8 possible subsets:
Set of all possible subsets is called A’s power set. We call it . The axiom of power set states that for every set there exists a power set .
If our set has elements (or a cardinality of , ), then the power set will have
 (first element yes/no) (second element yes/no) (nth element yes/no)
 ( times)
elements (). Its cardinality is power ’s cardinality thus the name.
Axiom of choice
We defined a sum set and a tuple. So for completeness, let’s define a product set.
If we have sets and we can define a Cartesian product of as a set of all pairs where the first element belongs to and second to .
If we generalize tuple (e.g. by stating that , and so on) we can also generalize Cartesian product
It looks like product set’s cardinality should be the result of multiplication (product) of cardinalities of multiplied sets. So, if all of the sets are nonempty the cardinality of a product should also be nonzero and the set itself nonempty.
Now, if we looked at it from the other way round, it means that if the product is nonempty we can always pick ntuple, that contains one element from each set (basically any element of a product set will do). We can also think of such tuple as a function from to . We would call such function a selector.
The axiom of choice basically tells, that for any set of nonempty sets you can find a selector.
(For all sets of sets , if doesn’t have an empty set, then there is a function from to sum of all X elements, such that for each set it returns value belonging to ).
The axiom of choice is sometimes replaced by wellordering theorem or Kuratowski–Zorn lemma.
With that, we have some foundations to build on.
Peano arithmetics or how natural numbers came to be
Minimal requirements for building up natural numbers were defined in the 19th century by Giuseppe Peano. He presented that to the define numbers we need 2 things:
 something that will serve as a ,
 something that will return a value when we pass it  a successor function or .
So, in Peano arithmetic, to build up a number we need to apply times:
and so on.
So we could define natural numbers as a set made of as well as all results of applying to other natural numbers. We did something similar with axiom of infinity  von Neuman set starts with an empty set and then created its successor with , then pass to and so on. It perfectly fits the requirement of Peano arithmetic! As a matter of the fact, these are called von Neumann ordinals and are one way of implementing it in set theory. Other would be Zermelo ordinals.
To make it a true arithmetic, we need some operations, e.g.:

addition:
We basically move from one side to the other, until one side is  then we have a result.

multiplication:
Here, we basically unroll multiplication into a sequence of additions. And we have the addition already defined.

comparison:
If you have of either side, it is obvious. If not, remove from both sides and try again.
These definitions assume, that just as you are able to wrap things up, you can also unwrap, then and check if they match some pattern. In functional programming, it is called a pattern matching. In mathematics, I never saw any name for it as it is a granted property of the world.
Integers, anyone?
OK, so we defined an infinite set of natural numbers $\N = {0, 1, 2, …}$. It has addition, multiplication, and comparison, but it doesn’t have subtraction. At least one that would be defined for all possible arguments. If we define it like:
and then we’ll just try to find 3 matching the equation, then:
has one solution for . But:
What would be the meaning of that? (I saw a definition of a subtraction in lambda calculus, that would return 0  because it has to return something, but we would like something better than that).
So, let’s upgrade our numbers. Let’s write all pairs of natural numbers and draw them as points on a graph:
^
(0,3) (1,3) (2,3) (3,3)
+
(0,2) (1,2) (2,2) (3,2)
+
(0,1) (1,1) (2,1) (3,1)
+
(0,0) (1,0) (2,0) (3,0)
++++>
If we assume that represents , then we’ll have the result defined for all tuples lying on diagonal and below it. We’ll also see, we can group tuples with the same result.
In general, if for any where , would have the same subtraction, and so it will belong to the same group. If we connect the dots in the same group we get:
^
(0,3) (1,3) (2,3) (3,3)
+ .*
 .*
(0,2) (1,2) (2,2) (3,2)
+ .* .*
 .* .*
(0,1) (1,1) (2,1) (3,1)
+ .* .* .*
 .* .* .*
(0,0) (1,0) (2,0) (3,0)
++++>
Now, let’s say from now on we’ll identify a natural number with the group to in which it is a result of subtraction. As a matter of the fact, what we did here is creation of an equivalent class  we partitioned the whole set of pairs to smaller disjoints subsets, that altogether cover the whole original set , where each pair is identical to the other in some regard (here: same result of subtraction). Because they are the same (in a way), we might not distinguish between them, and even identify the whole set with them. And (as long as we won’t violate the property used to partition them), we can translate the operation on elements of the set to the operation on a whole set. E.g. if we add any pairs which represent 1 to any pair which represent 2, we receive a pair that must represent 3. Of course, as long as we define that addition properly.
So, from now on:

addition:

multiplication:
(Try to calculate to check where it came from).
What we might have not paid attention so far, is that from the very beginning such definition defines classes of equivalence for all possible pairs  even those, that were not handled by the original definition of subtraction.
^
(0,3) (1,3) (2,3) (3,3)
+ .* .* .*
 .* .* .*
(0,2) (1,2) (2,2) (3,2)
+ .* .* .*
 .* .* .*
(0,1) (1,1) (2,1) (3,1)
+ .* .* .*
 .* .* .*
(0,0) (1,0) (2,0) (3,0)
++++>
As you probably guessed, these new numbers are negative numbers we missed.
^
(0,3) (1,3) (2,3) (3,3)
3+ .* .* .*
 .* .* .*
(0,2) (1,2) (2,2) (3,2)
2+ .* .* .*
 .* .* .*
(0,1) (1,1) (2,1) (3,1)
1+ .* .* .*
 .* .* .*
(0,0) (1,0) (2,0) (3,0)
++++>
0 1 2 3
So, now we can reliably subtract things. But we still cannot reliably divide them. For that, we’ll need rational numbers.
Rationals  still talking about tuples
Rational numbers are fractions. Dividend, divisor and a fraction bar between them.
At this point, you should know, what we’re going to do: use tuples!
Again, we will introduce a equality classes, stating that:
meaning that you can reduce a fraction and it is still the same fraction.
Also, for compatibility:
Then, we can define operations:

addition:

multiplication:

subtraction:

division:
At this point, I’m skipping showing how comparison work, or things like neutral elements, but the curious reader has all the tools to figure them out.
Real numbers  finally, something made up!
So far everything could be solved with tricks like wrapping things up in a specific way. With real numbers, things are more difficult. In a way, as for something based in eponymous reality, they are the first thing we cannot easily construct from smaller things.
This indirectly became an issue for Pythagoreans, when they found out that numbers like or cannot be constructed from (finite amount of) rational numbers, no matter how you arrange them. It offended their sense of mathematical aesthetic. Even millenniums later we call such numbers irrationals. Yet, we need them to be able to describe positions and sizes in (among others) Euclidean spaces.
There are 2 popular ways to define real numbers using rationals: Dedekind’s cuts and Cauchy’s sequences.
Dedekind’s cuts uses bounds or partitions to define numbers. E.g. I am looking for the smallest number that, when squared, would be greater or equal to 2. In rational numbers, there is no such smallest element. We can get closer and closer to something that squared becomes 2, but it’s always slightly bigger.
Then we can state, that on e.g. numerical line we can reach such a point, and we need a good name for it. Since no rational fit it, we can name it and call it a day. Every time we will see we would be able to think, that this is a placeholder for something that fulfills the condition, that this is the smallest number that squared is greater or equal to 2 (or it’s a least upper bound of such that ).
Problem with such definition is that we aren’t constructing anything, we just list the requirements, give a name to something that fulfills them, and from now on pretend that this is a thing (and, since it’s mathematics, we can often get away with it if the definition is not contradictory).
Cauchy’s sequences are sequences where elements become closer to each other as the sequence progresses. E.g.
We can easily design a sequence, that will approach something that is not a rational number. E.g.
could be a Cauchy’s sequence of rationals that gets closer and closer to .
Then, we could identify each sequence, with a number it is approaching and this way we would design real numbers. An infinite sequence is always something that we can construct (as opposed to handwaving with partitions).
How to operate on such sequences? Well, if you add, subtract, multiply or divide values on corresponding positions in 2 sequences, you should get a sequence that approaches the result of addition, subtraction, multiplication or division.
For comparison, you can subtract one sequence from the other and check the sign.
Now, you can construct and use every real number.
Beyond real
So far everything was more or less convoluted way of wrapping up empty set and sets of the empty set, and sets of sets of the empty set, etc. We build this way natural numbers, integers, rationals and reals, tuples and functions. Is everything made up of sets? Well, in set theory, basically yes.
Vectors and matrices could be defined as functions from elements coordinates to the element (sets). Or tuples (vectors) and tuples of tuples with matching sizes (matrices). Complex numbers are, well, tuples of real and imaginary part. Also coordinates on a complex plane. Quaternions? Matrices. Algebras are sets of values with operations on these values (functions, thus also sets). Graphs? A tuples of vertices’ set and nodes’ set.
At some point, mathematicians introduced the idea of a class  a collection of objects defined by properties objects must have. They did it because sets have to be constructible from bottomup (one way or the other), which excludes e.g. ability to build the set of all sets, which might be limiting. The class of all sets is a thing. Also, an example of a proper class  a class which is not a set.
However, besides proper classes (and proper classes of proper classes, etc), everything based on a set theory operates on sets. And each of these sets originates from an empty set wrapped in different ways. Lambda calculus needs only congruences and reductions to implement everything, category theory has objects and arrows. And a set theory achieves it all with the empty sets and wrapping.