If you work with anything that can be modeled mathematically, you most likely know that many things you work can be expressed with algebras. However, if you are not a graduate of a computer science course or similar you might not know how ubiquitous they are and how often you rely on some of them. (And I don’t mean F-algebras and FP-concepts). So what are algebras and where can we meet some most common of them?
As many other words related to mathematics, algebra was introduced to European languages through the works of Arabian mathematicians and researchers during Medieval and Renaissance. This particular work came from the original title of the book The Compendious Book on Calculation by Completion and Balancing , which was one of the first texts that wanted to decompose problem to some primitives, study their properties and compose them again in order to be able to solve an infinite class of problems instead of just one problem in front of you. The word al-jabr translates as the reunion of broken parts, which not only describes, that the treatise is about but also what modern algebra wants to achieve.
The word algebra itself has several meanings. First of them (which might be pluralized) describes a mathematical structure - usually a set, operations defined on it and conditions they all should meet. The other is a branch of mathematics dedicated to studying them and relations between them (always singular). Usually, it’s clear from the context which of these is described, but sometimes for clarity, the former is called algebraic structure, and the later might be specified as linear algebra, elementary algebra, abstract (modern) algebra, etc.
But let’s start with some example to get the feeling that it’s all about.
Branches of mathematics like arithmetics or number theory care mostly about numbers. Geometry cares about shapes and points in space (however defined). Algebra doesn’t care about anything like that.
Let’s say you have a sheet of paper and you can glue them together. It’s is not really important how, as long as every 2 pieces you would always glue together the same way. The result you can think of as one piece of paper. Can you glue together 1000 pieces of paper?
Now, let’s say you need to buy a present for your friend. You have 2 presents to choose from and you need to pick a better option. You know your friend well, so no matter what 2 present propositions you get, you can always decide which is better. Can you pick the best present out of 100 propositions?
If we don’t focus too much about a reality where 1000 pieces of paper glued together would consist more of glue than paper, the answer to the first question is yes. If we ignore the fact, that about 5-6 decision we will be completely indifferent what we’ll buy, we can also assume, that we can always pick the best present. What these 2 stories have in common?
In each of them we knew that we can:
- take 2 things of the same kind (pieces of paper, gifts),
- do an operation on it (glue, pick better),
- get the result of the same kind as initial things (piece of paper, gift),
- take another thing of the same kind (piece of paper, gift),
- because again we have 2 things o the same kind, that we know how to operate on, we can repeat the operation again and again.
Actually, we made use of much more properties than that, but the most important one right now is that we have 2 things of some kind and we can turn them into one thing of the same kind. In algebra, this is called magma. Now, let’s try to formalize this description.
Magma is a pair , where is some set and is an operation closed over set , that is for each two elements of there is always a defined result also in :
We say that is closed under .
Magma does not have any other requirement about . It means e.g. that if our operation is sensitive to how we group operations so , it is not an issue. E.g. natural numbers with power operation can be a magma.
The power operation might not be associative, but in many cases, we rely on such property.
When you use list or string concatenation, you unconsciously rely on 2 properties: the fact that concatenations preserve the order of orders inside and that - as long as the order of lists/strings is the same - how you group them in parenthesis doesn’t affect the final result.
When you add numbers or multiply square matrices of the same rank, it is hard to talk about internal order, but still - the arrangement of parenthesis do not matter.
The algebraic structure , such that
- - is closed under
- - is associative
is called a semigroup. We can see that a semigroup is a magma with an associative operation.
Associativity is quite useful in parallel computing. If you have e.g. a list of numbers you have to sum, you can group them with parenthesis, have each group’s sum be calculated on a separate thread, and then sum up partial results - associativity guarantees, that the results will be the same as if you added then sequentially (and in certain situations a lot faster).
But there are also less obvious semigroup candidates. Let’s say you need to build up some validated data structure and each component might be correct or incorrect. If all elements are correct you can combine them together. If they are not correct, you use associativity to build a collection of validated errors. In Scala, there are 2 implementations of exactly this idea: Validated in Cats and Validation in Scalaz.
This property becomes even more important (and in an even more sophisticated way) when you start working with distributed systems. Many of them sooner or later will face CAP theorem, which make them choose - either systems will have to sacrifice availability being always in synchronized state or they will have to go the eventual consistency. Quite a lot of systems prefer the latter option.
Eventual consistency embraces the idea that different parts of the system get out of synch with each other, and that one has to be able to exchange data so that each part will eventually reach the same state. The problem that will inevitably arise is: how to deal with conflicts, where changes in different nodes cannot be applied together?
One of the possible approaches is designing a model where each conflict can be always solved: imagine your data is some initial state with a sequence of updates (sorted by the order of application). Your structure can accept any possible update (even command to remove already removed data) gracefully without going into invalid state. Then, if you could model things so that you could merge 2 journals and get a resulting journal of operations in such order as if all update commands were sent to one node instead of 2 (or more), then you could apply apply resulting journal on initial state and end up with states merged without conflicts.
Of course, it is hardly ever practical to actually rerun the whole log, but it is enough if your structure act as if that was happening. This is the idea behind conflict-free replicated data types (CRDT), that are used e.g. to implement Redis or CKEditor. While it might be initially challenging to design your data type to be associative, in certain cases it immediately pays off, resulting in a simpler architecture.
Let’s say you read some whitepaper, e.g. about map-reduce and you want to immediately implement it. Maybe you want to use it to calculate a total cost of materials for a building construction, so you take a list of materials, turn it into a list of costs and you want to add them. You do it in parallel because you believe that will be faster (not always true, but let’s leave it today). You can always add 2 elements together. But what if groups end up having even number of elements?
For instance, order of additions might be:
a b c d e f g h i j k \_/ | \_/ \_/ \_/ \_/ \_/ \___/ \___/ \_______/ | \______________/ | result
It creates a kind of inelegant graph. If you have to run computations you need to consider some special cases, delay some computations, move some results from one stage to the other because they didn’t have a pair… Also, what if your list of elements to add will be empty? Should the program break?
Since we are talking about numbers and sum, it should be quite obvious to us, that we could easily handle such problem using 0 as a placeholder for missing elements in pairs. 0 is a neutral element of addition so we can add it wherever we like, and the result won’t change. But our code can. Corder cases disappear. We might return it as a result for an empty list and have all inputs covered.
a b c d e f g h i j k | | | 0 | | | | | | | | \_/ \_/ \_/ \_/ \_/ \_/ \___/ \___/ \___/ 0 \_______/ \_____/ \_______________/ | result
Semigroup with such identity/neutral/empty element is called a monoid.
Formally, monoid is a triple such that:
- - is closed under
- - is associative
- - is neutral element of
When we look closer at Lists and String, we’ll see that they have something in common. Basically, both are sequences of some elements which we can concatenate. We can formalize this observation to describe all such structures.
First, we’ll describe the sequence as something formal. When we have some set (called an alphabet), we can build some tuples with them. would be just an alphabet (or set of words of length of 1), would be set of words of length of 2, , and so on (basically is a cartesian product of alphabets set of words of length ). would be a singleton of words of length (cartesian product doesn’t allow that, so can use some special element e.g. to describe that).
Next, we’ll add all sets of particular length together:
This is known as Kleene star. If we remove an empty word, we’ll get a Kleene plus:
Let’s take some alphabet and build a set of all possible words for it:
Now, let’s define a concatenation operation:
Then, a triple creates a free monoid generated by . If we remove an empty word we’ll end up with a free semigroup generated by .
Here, free refers to the fact, that is free of any assumptions about , yet we are certain that it has all the properties required of a monoid. Similarly, other constructions, e.g. free monads and free applicatives, allows you to generate an algebra out of some without any assumptions about .
Examples of free monoids from our everyday life are data structures like Lists or Vectors. If we treat String as a List/Vector of
chars, that it is also a free monoid generated by
Trivia: and are where regular expressions got
+modifiers. Formally, they describe, that this part of the expression is a word belonging to and . Modern Algebra with Applications by Gilbert and Nicholson even describes how one can represent a (finite state) automaton (equivalent with regular expression) with a syntactic monoid.
We have the operation. We have certainty, that this operation is associative. We have certainty, that this operation has an identity/neutral element. What we miss to describe all useful properties of e.g. addition or multiplication of real numbers? Inverse element.
If we have a group (additive notation) or (multiplicative notation) a inverse element of is defined as:
in additive notation and
in multiplicative notation.
If for each you can find an inverse element, then your monoid is called a group.
Additive and multiplicative notations are used, when your set can have 2 different algebras defined for it, e.g. real numbers are group for addition: and multiplication: . 2 different notations help us keep trace which group we are talking about at the moment.
Besides numbers, there is one interesting group to describe, one that is actually used in chemistry (crystallography to be precise). It is a symmetry group.
Let’s say we have a 2D Euclidean space. We can put some geometric shape there. Our neutral element will be an operation, that does nothing to this shape - . Now, we can define:
rotation by 90 degrees , 180 degrees , 270 degrees (rotation center is point of a plane),
1|2 3|4 2|3 -+- -R1> -+- -R1> -+- 4|3 2|1 1|4 | ^ ^ +---R2---+ | +-----------------+
we will also add vertival flip and horizontal flip (using OX and OY axes),
1|2 4|3 -+- -V> -+- 4|3 1|2 1|2 2|1 -+- -H> -+- 4|3 3|4
diagonal flip and ,
1|2 1|4 -+- -D\> -+- 4|3 2|3 1|2 3|2 -+- -D/> -+- 4|3 4|1
we can apply operations and one after the other and describe result as .
It might be simpler, if we put it in a table (called Cayley’s table):
We might notice some interesting things here:
- every two pairs of transformation has defined the sum of transformations,
- each element has an inverse element,
- it might not be obvious, but combining transformations is associative,
- in each row and in each column each element occurs exactly once (so the operations table is a Latin Square)
- if we applied some argument several times we would be able to create cycles (and create a smaller group of a subset of ).
In general, we see that we defined a group. We can define similar groups for 3D, 4D and any dimensions you want. Also, we don’t have to limit ourselves to just 4 rotations and 4 axes of symmetry.
Once we calculated such table, we are able to combine any sequence of operations and save some CPU cycles, that we would otherwise waste on transforming whole 2D image. (Though graphics programmers would probably move straight on to quaternons).
Another interesting group is a permutation group. When you have some ordered elements (e.g. a tuple) and you design a function which assigns each of these elements new position (position might but don’t have to be changed, however as in the old order each element has to have a distinct index). We call such function a permutation.
E.g. would be a permutation. If we assume, that we enumerate inputs with natural numbers , we might describe a permutation by writing down from each position was taken element on current position e.g. .
Permutations can be combined. E.g.
1 2 3 4 into 1 4 3 2 + 1 2 3 4 into 3 4 1 2 = 1 2 3 4 into 3 2 1 4
(Trick: take a look at the second permutation and each number substitute with -th number from the first permutation).
A permutation, which doesn’t reorder elements would be a neutral element. For each permutation we can easily create an inverse element, e.g.:
1 2 3 4 into 1 4 3 2 invert 1 4 3 2 1 2 3 4 order by top row 1 2 3 4 1 4 3 2
We can easily see that permutations also form a group. Each such group of permutations of elements would have a cardinality of .
Now, let’s stop for a moment and make an observation. If we take a look at our symmetry group, we’ll see that if we use it for transforming positions of 4 elements, then it appears that it also can be described using permutations of 4 elements. However, there are only 8 elements, while a set of all possible permutations of 4 elements should have a cardinality of . (It makes sense, as missing transformation would have to tear figure down, and we just wanted to rotate/flip it). So, our symmetry group is isomorphic to some subgroup of 4 element permutation group.
That is a nice example of Cayley’s theorem:
Each group is isomorphic to a subgroup of some permutation group.
Many of groups would change the result of operation if you switched the operands. while . . Same with groups of square matrices of rank with multiplication.
But there are groups where you can flip operands. and . .
Such property where for all 2 elements in algebra flipping the order doesn’t change the result of an operation is called commutativity.
A group with such property is called commutative group or abelian group after Niels Henrik Abel. Examples are addition of most numbers (naturals, integers, rationals, real, complex), matrices of the same dimension, groups (not all symmetry groups are abelian, it is more of an exception!).
Such property might be useful if we want to compute a large sequence of operations in a group. We might e.g. reorder operands so that inverse elements are next to inverted elements, and - since they together form a neutral element - remove them from the sequence altogether. In certain cases it might spare us expensive calculations.
So far we only talked about one operation at once. But what if set might be a subject to 2 different operations? For instance, numbers can be both added and multiplied. We even introduced 2 different notations (additive and multiplicative) to distinguish them. Such algebras exist of course. The first of them is a ring.
A ring is a quadruple , such that:
is an abelan group (associativity, commutativety, neutral and inverse elements),
is a monoid (associativety, neutral element),
multiplication () is distributive with respect to addition ():
As we can imagine most numbers are rings. Square matrices of the same rank (e.g. ) would also be rings. But besides standard sets of numbers, there are also other interesting rings.
Let’s take integers. It is an infinite set . What would happen, if we only took even numbers?
Well, it wouldn’t be a ring. Such set would not have which is required as the neutral element of the multiplication. However, would still be a group, and a subgroup of .
What is more, this subgroup would have an interesting property. For each element of , if we multiplied them by any element of , we would get an element of .
Such we would call an ideal of . If this property would hold only for or we would call it a right ideal of or a left ideal of respectively.
But I promised other interesting rings, while ideals are not rings. This is because ideals can be used to construct rings.
Ideal demands that if you multiply something from whole by some element of you get some element of . There is no such requirement of addition.
Let’s still use our example (all elements of divisible by , so just even numbers).
If we add to every element of , we get
so odd numbers. If we add 2 () we get again. so odd numbers again. You can see where it’s all going. By calculating all possible sets where , we end up with just 2 possibilities: (even numbers) and (odd numbers).
We can create a set of such sets, let’s denote it with . If we think about it we partitioned whole into smaller disjoint sets of numbers which are somehow the same (in this example: has the same remainder in a division by ), that sum up to whole set. In terms of set theory, we could say, that we defined some equivalence relation (has the same reminder as), and then we partitioned R into equivalence classes, where all elements are (in a way) the same.
So, we have a set . But, do know that we can define operations on it?
If we try it out we’ll see, that:
We might as well remove to make it shorter:
That is basically how addition modulo 2 works. As a matter of the fact, we just defined it!
shows that we defined multiplication modulo 2.
Our new 0 and 1 (, ) are neutral elements of addition and multiplication modulo 2. We can also find inverse elements for the addition.
The construction we just made is a ring called a quotient ring . And it is the way we are defining operations modulo, so it is a starting point for any theories which use modulo arithmetic. If and I is being constructed by taking all numbers divisible by (or multiplying all numbers by - ), such is denoted as (an additive group of the ring of integers modulo ).
is not a group with . However, we can create a subgroup, which would be:
that is, a subset of elements from coprime with . We could call it a multiplicative group of the ring of integers modulo .
In a ring only set with addition forms a group. What if we want multiplication form a group with the set as well?
A field is an algebra , where both and are groups (addition form abelan group), which has a property of distributivity of multiplication over addition. Since both addition and multiplication have to have inverse elements (, ) we effectively require it to has: addition, substraction, multiplication and division defined.
Again, most numbers (from rationals up) are fields. They can be described as extensions of the rational numbers field, they are called algebraic number fields.
But there are other fields as well. Actually, there is one very small and very famous.
Named after George Boole who introduced it, this algebra formalizes logic operations on true and false, specifying sets of conditions we need to fulfill in order to perform logic operations (no matter how we actually implement it).
A Boolean algebra is defined as , where:
is a group,
is a group,
is unary operator returning an inverse of (negation),
operations are distributive:
(so far we said, that is a field),
operations have an absorbtion property:
(This is where differes from : instead of ).
Depending on context we might see different notations used for operations:
depending on how we implement our algebra:
using logical operations (if he happens to treat them as something given and not a Boolean algebra implemented in another way in disguise)
Each of these implementations is isomorphic - you can map them into each other without ambiguity.
It might be handy if you ever wanted to perform logical operations on a type-level.
So, we designed a way of implementing modulo arithmetics on numbers. Next thing we could do, would be making polynomials out of them.
Let us remind quotient rings for a moment. was partitioning the whole ring into smaller disjoint sets of somehow identical elements. Now, let’s denote a ring of polynomials with coefficients from as . What would mean?
Basically, it would partition the whole set of polynomials into smaller sets of polynomials with the same remainders from division by polynomial irreducible in . E.g.
There are and (the latter of degree smaller than ’s one) for which
The ring we obtain by calculating all possible rests would look like this.
As we could expect is a neutral element of and is neutral element of .
We would like to have a root at some (). Currently, doesn’t have such element, so extend it:
We could subsutute and in our ring tables we’ll get:
We might not know that, but we just build a Galois field with an order of 4.
A Galois field (or finite field) is a finite field which has elements (has an order of ). There is a theorem stating that each finite field’s order is of a form of where is some prime and is some natural number. We denote Galois field with an order of as .
If we wanted to build , we would need to find irreducible polynomial of degree from (with root at ), and then we could construct .
In our example, before we just happen to use the and , so things played out.
This is the right moment to ask, what is the application of such algebra?
The answer: correction codes.
Correction codes are a way of encoding messages with some extra information, so that in case of some error during transmission we would be able to recover (at least to some degree).
An example of correction codes are linear codes. They translate a vector of bits of information into a vector of bits containing both information as well as extra information required to recover (we would describe them as ; ).
We call it a liner code because it is defined using linearly independent vectors, that can be added and/or multiplied by a constant. You might have already guessed, that translation is done by multiplying a -element vector by a matrix.
An example. We have a . Definition of our code would be:
We have lineary independent vectors, that defnine a linear space. and carries an actual information, while , , are used to recovery. How, we could recover? We need to:
at first we calculate all possible vectors - we can do this by trying out all possible values of and
encoding vector 0 0 00000 0 1 01011 1 0 10101 1 1 11110
next, we need to calculate the set of syndroms - vectors representing transmission errors on particular bits. For binary code we should be able to recover errors, so seach syndrom would have at most 2 bits on,
(empty syndrom - no errors), , , , , , , ,
we start adding syndroms to list we created before - we should end up with vectors grouped in layers:
information 00 01 10 11 vectors 00000 01011 10101 11110 layers 00000 01011 10101 11110 10000 11011 00101 01110 01000 00011 11101 10110 00100 01111 10001 11010 00010 01001 10111 11100 00001 01010 10100 11111 11000 10011 01101 00110 01100 00111 11001 10010
in each rower elements have the same syndromes (the same transmission errors). If we want to correct them, we need to find received vector in layers and then pick the right encoding vector from 2nd row,
received vector can be translated back into the information.
If we cannot find our vector anywhere then there were more errors during transmission than we can recover from.
Linear code as matrix
Our 4 vectors can be calculated also other way. We can calculate them as:
Considering, that input would be a vector , it naturally translates to matrix multiplication:
This will be our generator matrix. As mentioned earlier we can encode linear codes by multiplying by matrix. We might also consider transforming by rearranging and adding rows to each other (such linear transformations will not change its code-correcting properties). Then we might normalize it to the form
where is square identity matrix of degree and is of coefficients used in correction. Such normalized G would be
In order to check errors and decode information we will use parity check matrix . We define it as matrix that . We can calculate it knowing :
Such can be calculated to calculate a syndrome
which in turn we can use to correct message and then retrieve the original information.
Generating codes from polynomials
We can take a polynomial of degree over . If we multiplied all possible polynomials by , we could receive all polynomials divisible by without rest. Since elements belong to finite field some of them after this multiplication would be of degree less than . They would be words of code generated by this polynomial.
Example (from Wiki): if then and we and up with modulo 2 arithmetic. If we took , we would obtain words like:
, , , , , ,
which resolves to:
, , , ,
We can also represent them using vectors of their coefficients:
, , , ,
These vectors create a linear space. One that we can represent using matrix .
By carefully choosing parameters of the polynomial we might create a Bose–Chaudhuri–Hocquenghem code - BCH codes are the most powerful class of correcting codes, they are used in hardware: DVDs, satellites SSDs, etc. Syndromes allow for easy and low-power consuming error correction. These algebras are what makes whole digital communication and storage reliable, so I thought it is worth mentioning how much we rely on it, even if we do not remember about it during our everyday work.
I wanted to show that there are a lot of algebras, that have practical usage. That we use them every day and rely on their properties even if we are not hardcore functional programmers. Last but not least I wanted to show that algebras as not only semigroup, monoid, and F-algebras - in FP-communities other algebras are hardly ever mentioned.
During writing, I used and borrowed few examples from (already mentioned) Modern Algebra with Applications by Gilbert and Nicholson. I also checked few things in Kody Korekcyjne i Kryptografia by Mochnacki (Correction codes and cryptography, Polish-only, sorry). I merely scratched the surface of algebras to show that that have some practical applications, so if you are interested in more details, I encourage you to consult the books - I had to skip over quite a lot of things to make this post possibly short. It still didn’t come short, so you can only imagine how incredible it would be if I haven’t.