Algebras we love

If you work with anything that can be modeled mathematically, you most likely know that many of the things you work with can be expressed with algebras. However, if you are not a graduate of a computer science course or similar, you might not know how ubiquitous they are and how often you rely on some of them. (And I don’t mean F-algebras and FP concepts). So what are algebras and where can we meet some of the most common ones?

Al-jabr

Like many other words related to mathematics, algebra was introduced to European languages through the works of Arab mathematicians and researchers during the Middle Ages and the Renaissance. This particular word comes from the original title of the book The Compendious Book on Calculation by Completion and Balancing, which was one of the first texts that aimed to decompose a problem into primitives, study their properties, and compose them again, in order to solve an infinite class of problems instead of just the one problem in front of you. The word al-jabr translates as the reunion of broken parts, which describes not only what the treatise is about but also what modern algebra wants to achieve.

The word algebra itself has several meanings. The first (which can be pluralized) describes a mathematical structure: usually a set, operations defined on it, and conditions they should all meet. The other is a branch of mathematics dedicated to studying such structures and the relations between them (always singular). Usually it’s clear from the context which of these is meant, but sometimes for clarity the former is called an algebraic structure, and the latter might be specified as linear algebra, elementary algebra, abstract (modern) algebra, etc.

But let’s start with some examples to get a feeling for what it’s all about.

Magma

Branches of mathematics like arithmetic or number theory care mostly about numbers. Geometry cares about shapes and points in space (however defined). Algebra doesn’t care about anything like that.

Let’s say you have sheets of paper and you can glue them together. It is not really important how, as long as any 2 pieces would always be glued together the same way. You can think of the result as one piece of paper. Can you glue together 1000 pieces of paper?

Now, let’s say you need to buy a present for your friend. You have 2 presents to choose from and you need to pick the better option. You know your friend well, so no matter which 2 present propositions you get, you can always decide which is better. Can you pick the best present out of 100 propositions?

If we don’t focus too much on the reality where 1000 pieces of paper glued together would consist more of glue than paper, the answer to the first question is yes. If we ignore the fact that after about 5-6 decisions we will be completely indifferent to what we’ll buy, we can also assume that we can always pick the best present. What do these 2 stories have in common?

In each of them we knew that we can:

  • take 2 things of the same kind (pieces of paper, gifts),
  • do an operation on them (glue, pick the better one),
  • get a result of the same kind as the initial things (piece of paper, gift),
  • take another thing of the same kind (piece of paper, gift),
  • because again we have 2 things of the same kind that we know how to operate on, we can repeat the operation again and again.

Actually, we made use of many more properties than that, but the most important one right now is that we have 2 things of some kind and we can turn them into one thing of the same kind. In algebra, a structure like this is called a magma. Now, let’s try to formalize this description.

A magma is a pair $(M, \cdot)$, where $M$ is some set and $\cdot$ is an operation closed over the set $M$, that is, for any two elements of $M$ there is always a defined result $a \cdot b$ also in $M$:

$$\forall_{a, b \in M}\ a \cdot b \in M$$

We say that $M$ is closed under $\cdot$.

A magma does not have any other requirement about $\cdot$. It means e.g. that if our operation is sensitive to how we group operations, so that $a \cdot (b \cdot c) \neq (a \cdot b) \cdot c$, it is not an issue. E.g. natural numbers with the power operation, $(\mathbb{N}, \mathrm{pow})$, form a magma.
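To get a feel for how little a magma promises, here is a minimal sketch of it as a Scala type class (the names Magma and combine are mine, not taken from any particular library):

```scala
// A magma: a type A with a binary operation closed over A.
// No laws beyond closedness are required.
trait Magma[A] {
  def combine(x: A, y: A): A
}

object Magma {
  // Exponentiation on natural numbers is closed, but neither associative
  // nor commutative - still a perfectly valid magma.
  val pow: Magma[BigInt] = new Magma[BigInt] {
    def combine(x: BigInt, y: BigInt): BigInt = x.pow(y.toInt)
  }
}
```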

Semigroup

The power operation is not associative, but in many cases we rely on exactly that property.

When you use list or string concatenation, you unconsciously rely on 2 properties: the fact that concatenation preserves the order of elements inside, and that, as long as the order of lists/strings stays the same, how you group them with parentheses doesn’t affect the final result.

When you add numbers or multiply square matrices of the same rank, it is hard to talk about internal order, but still - the arrangement of parentheses does not matter.

The algebraic structure $(S, \oplus)$, such that

  • $\forall_{a,b \in S}\ a \oplus b \in S$ - $S$ is closed under $\oplus$
  • $\forall_{a,b,c \in S}\ (a \oplus b) \oplus c = a \oplus (b \oplus c)$ - $\oplus$ is associative

is called a semigroup. We can see that a semigroup is a magma with an associative operation.

Associativity is quite useful in parallel computing. If you have e.g. a list of numbers you have to sum, you can group them with parentheses, have each group’s sum calculated on a separate thread, and then sum up the partial results - associativity guarantees that the result will be the same as if you had added them sequentially (and in certain situations it is a lot faster).
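A minimal sketch of that idea in Scala (the chunk size and the use of Futures are arbitrary choices of mine, not the only way to do it):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Sum a list in parallel: split into chunks, sum each chunk on its own
// thread, then sum the partial results. Associativity of + guarantees
// the answer matches a sequential fold.
def parallelSum(numbers: List[Long], chunkSize: Int = 1000): Long = {
  val partialSums = numbers
    .grouped(chunkSize)
    .map(chunk => Future(chunk.sum))
    .toList
  Await.result(Future.sequence(partialSums).map(_.sum), 1.minute)
}
```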

But there are also less obvious semigroup candidates. Let’s say you need to build up some validated data structure and each component might be correct or incorrect. If all elements are correct, you can combine them together. If some of them are incorrect, you use the same kind of associative combination to accumulate the validation errors. In Scala, there are 2 implementations of exactly this idea: Validated in Cats and Validation in Scalaz.

This property becomes even more important (and in an even more sophisticated way) when you start working with distributed systems. Many of them sooner or later will face the CAP theorem, which makes them choose: either the system sacrifices availability in order to always stay in a synchronized state, or it has to settle for eventual consistency. Quite a lot of systems prefer the latter option.

Eventual consistency embraces the idea that different parts of the system get out of sync with each other, and that they have to be able to exchange data so that each part will eventually reach the same state. The problem that will inevitably arise is: how to deal with conflicts, where changes made on different nodes cannot be applied together?

One of the possible approaches is designing a model where each conflict can always be resolved: imagine your data is some initial state with a sequence of updates (sorted by the order of application). Your structure can accept any possible update (even a command to remove already removed data) gracefully, without going into an invalid state. Then, if you could model things so that you could merge 2 journals and get a resulting journal of operations in the same order as if all update commands had been sent to one node instead of 2 (or more), you could apply the resulting journal to the initial state and end up with the states merged without conflicts.

Of course, it is hardly ever practical to actually rerun the whole log, but it is enough if your structure acts as if that were happening. This is the idea behind conflict-free replicated data types (CRDTs), which are used e.g. in Redis or CKEditor. While it might be initially challenging to design your data type this way, in certain cases it immediately pays off, resulting in a simpler architecture.
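As a toy illustration (not how Redis or CKEditor actually implement it), here is a grow-only counter CRDT sketched in Scala: each node increments only its own slot, and merging takes the per-node maximum, which is associative, commutative and idempotent, so replicas converge no matter in which order they exchange state:

```scala
// A grow-only counter (G-Counter) CRDT: one slot per node id.
final case class GCounter(counts: Map[String, Long]) {
  def increment(nodeId: String): GCounter =
    GCounter(counts.updated(nodeId, counts.getOrElse(nodeId, 0L) + 1))

  // Merge takes the per-node maximum; it is associative, commutative
  // and idempotent, so replicas converge regardless of exchange order.
  def merge(other: GCounter): GCounter =
    GCounter((counts.keySet ++ other.counts.keySet).map { id =>
      id -> math.max(counts.getOrElse(id, 0L), other.counts.getOrElse(id, 0L))
    }.toMap)

  def value: Long = counts.values.sum
}
```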

Monoid

Let’s say you read some whitepaper, e.g. about map-reduce, and you want to implement it right away. Maybe you want to use it to calculate the total cost of materials for a building construction, so you take a list of materials, turn it into a list of costs, and you want to add them up. You do it in parallel because you believe it will be faster (not always true, but let’s leave that for today). You can always add 2 elements together. But what if a group ends up having an odd number of elements?

For instance, the order of additions might be:

a b c   d e f g   h i j k
\_/ |   \_/ \_/   \_/ \_/
 \_/     \___/     \___/
  \_______/          |
      \______________/
             |
           result

It creates a kind of inelegant graph. If you have to run the computations, you need to consider some special cases, delay some computations, move some results from one stage to the other because they didn’t have a pair... Also, what if your list of elements to add is empty? Should the program break?

Since we are talking about numbers and sums, it should be quite obvious that we could easily handle such a problem using 0 as a placeholder for missing elements in pairs. 0 is the neutral element of addition, so we can add it wherever we like and the result won’t change - but our code can: corner cases disappear. We might also return it as the result for an empty list and have all inputs covered.

a b c   d e f g   h i j k
| | | 0 | | | |   | | | |
\_/ \_/ \_/ \_/   \_/ \_/
 \___/   \___/     \___/   0
   \_______/         \_____/
       \_______________/
               |
             result

A semigroup with such an identity/neutral/empty element is called a monoid.

Formally, a monoid is a triple $(M, \oplus, e)$ such that:

  • $\forall_{a,b \in M}\ a \oplus b \in M$ - $M$ is closed under $\oplus$
  • $\forall_{a,b,c \in M}\ (a \oplus b) \oplus c = a \oplus (b \oplus c)$ - $\oplus$ is associative
  • $\forall_{a \in M}\ a \oplus e = e \oplus a = a$ - $e$ is the neutral element of $\oplus$
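Sketched as a Scala type class (names are mine, though Cats and Scalaz ship equivalents), the neutral element is exactly what lets us fold any list, including an empty one, without special cases:

```scala
// A monoid: a semigroup with a neutral (empty) element.
trait Monoid[A] {
  def empty: A                // neutral element
  def combine(x: A, y: A): A  // associative operation
}

object Monoid {
  implicit val longAddition: Monoid[Long] = new Monoid[Long] {
    val empty: Long = 0L
    def combine(x: Long, y: Long): Long = x + y
  }

  // The neutral element covers the empty-list case without special-casing.
  def combineAll[A](as: List[A])(implicit m: Monoid[A]): A =
    as.foldLeft(m.empty)(m.combine)
}

// Monoid.combineAll(List(1L, 2L, 3L)) == 6L
// Monoid.combineAll(List.empty[Long]) == 0L
```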

Free monoids

When we look closer at Lists and Strings, we’ll see that they have something in common. Basically, both are sequences of some elements which we can concatenate. We can formalize this observation to describe all such structures.

First, we’ll describe the sequence formally. When we have some set $A$ (called an alphabet), we can build tuples with its elements. $A^1 = A$ would be just the alphabet (the set of words of length 1), $A^2 = A \times A$ would be the set of words of length 2, $A^3 = A \times A \times A$, and so on (basically $A^n$ is a Cartesian product of $n$ alphabets, that is, the set of words of length $n$). $A^0$ would be a singleton containing the word of length 0 (a Cartesian product doesn’t allow that, so we can use some special element, e.g. $\Lambda$, to describe it).

Next, we’ll add all the sets of words of each length together:

$$A^* = A^0 \cup A^1 \cup A^2 \cup A^3 \cup A^4 \cup \ldots = \bigcup_{n=0}^{\infty} A^n$$

This ${}^*$ is known as the Kleene star. If we remove the empty word, we’ll get the Kleene plus:

$$A^+ = A^* - A^0 = A^1 \cup A^2 \cup A^3 \cup A^4 \cup \ldots = \bigcup_{n=1}^{\infty} A^n$$

Let’s take some alphabet $A$ and build the set of all possible words for it:

$$FM(A) = A^*$$

Now, let’s define a concatenation operation:

$$a_i = a_{i1}a_{i2}\ldots a_{i(i-1)}a_{ii} \in A^i$$
$$a_j = a_{j1}a_{j2}\ldots a_{j(j-1)}a_{jj} \in A^j$$
$$a_i \star a_j = a_{i1}a_{i2}\ldots a_{i(i-1)}a_{ii}a_{j1}a_{j2}\ldots a_{j(j-1)}a_{jj} \in A^{i + j}$$

Then, the triple $(FM(A), \star, \Lambda)$ is the free monoid generated by $A$. If we remove the empty word, we’ll end up with the free semigroup $(FM(A), \star)$ generated by $A$.

Here, free refers to the fact that $FM(A)$ is free of any assumptions about $A$, yet we are certain that it has all the properties required of a monoid. Similarly, other free constructions, e.g. free monads and free applicatives, allow you to generate an algebra out of some $A$ without any assumptions about $A$.

Examples of free monoids from our everyday life are data structures like Lists or Vectors. If we treat a String as a List/Vector of chars, then it is also a free monoid generated by the char set.
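Building on the hypothetical Monoid trait sketched above, freeness can be stated in code: any function from the alphabet A into a monoid B extends to a monoid homomorphism from List[A] (words over A) into B - the familiar foldMap:

```scala
// The defining property of the free monoid: any f: A => B into a monoid B
// extends to a homomorphism from List[A] (words over A) to B.
def foldMap[A, B](as: List[A])(f: A => B)(implicit m: Monoid[B]): B =
  as.foldLeft(m.empty)((acc, a) => m.combine(acc, f(a)))

// e.g. total length of words via the (Long, +, 0) monoid:
// foldMap(List("foo", "quux"))(_.length.toLong) == 7L
```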

Trivia: $A^*$ and $A^+$ are where regular expressions got their * and + modifiers. Formally, they describe that this part of the expression is a word belonging to $A^*$ or $A^+$ respectively. Modern Algebra with Applications by Gilbert and Nicholson even describes how one can represent a (finite state) automaton (equivalent to a regular expression) with a syntactic monoid.

Group

We have the operation. We have certainty that this operation is associative. We have certainty that this operation has an identity/neutral element. What do we miss to describe all the useful properties of e.g. addition or multiplication of real numbers? An inverse element.

If we have a group $(A, \oplus, 0)$ (additive notation) or $(A, \otimes, 1)$ (multiplicative notation), an inverse element of $a$ is defined as:

$$-a \in A: a \oplus (-a) = 0$$

in additive notation and

$$a^{-1} \in A: a \otimes a^{-1} = 1$$

in multiplicative notation.

If for each $a \in A$ you can find an inverse element, then your monoid is called a group.

Additive and multiplicative notations are used when your set $A$ can have 2 different algebras defined on it, e.g. the real numbers form a group under addition: $(\mathbb{R}, +, 0)$, and the non-zero reals form a group under multiplication: $(\mathbb{R} \setminus \{0\}, \cdot, 1)$. The 2 different notations help us keep track of which group we are talking about at the moment.

Symmetry group

Besides numbers, there is one interesting group worth describing, one that is actually used in chemistry (crystallography to be precise). It is a symmetry group.

Let’s say we have a 2D Euclidean space. We can put some geometric shape there. Our neutral element will be an operation that does nothing to this shape - $I$. Now, we can define:

  • rotation by 90 degrees $R_1$, 180 degrees $R_2$, 270 degrees $R_3$ (the rotation center is the point $(0,0)$ of the plane),

    1|2      3|4      2|3
    -+- -R1> -+- -R1> -+-
    4|3      2|1      1|4
     |        ^        ^
     +---R2---+        |
     +-----------------+
    
  • we will also add a vertical flip $V$ and a horizontal flip $H$ (using the OX and OY axes),

    1|2     4|3
    -+- -V> -+-
    4|3     1|2
      
    1|2     2|1
    -+- -H> -+-
    4|3     3|4
    
  • diagonal flips $D_{\diagdown}$ and $D_{\diagup}$,

    1|2      1|4
    -+- -D\> -+-
    4|3      2|3
      
    1|2      3|2
    -+- -D/> -+-
    4|3      4|1
    
  • we can apply operations $a$ and $b$ one after the other and describe the result as $a + b$.

Then:

$$S = \{ I, V, H, D_{\diagdown}, D_{\diagup} \} \cup \{ R_n : n = 1, 2, 3 \}$$
$$R_2 = R_1 + R_1$$
$$R_3 = R_2 + R_1 = R_1 + R_2 = R_1 + R_1 + R_1$$
$$D_{\diagdown} = V + R_1 = H + R_3$$
$$D_{\diagup} = H + R_1 = V + R_3$$
$$I = H + H = V + V = R_1 + R_1 + R_1 + R_1 = R_2 + R_2 = R_3 + R_3 + R_3 + R_3$$
$$H = V + R_1 + R_1 = V + R_2 = V + R_3 + R_3$$
$$V = H + R_1 + R_1 = H + R_2 = H + R_3 + R_3$$
$$\forall_{s \in S}\ s + I = I + s = s$$

It might be simpler if we put it into a table (called a Cayley table):

| $+$ | $R_3$ | $R_2$ | $R_1$ | $I$ | $V$ | $H$ | $D_{\diagup}$ | $D_{\diagdown}$ |
|---|---|---|---|---|---|---|---|---|
| $R_1$ | $I$ | $R_3$ | $R_2$ | $R_1$ | $D_{\diagup}$ | $D_{\diagdown}$ | $V$ | $H$ |
| $R_2$ | $R_1$ | $I$ | $R_3$ | $R_2$ | $H$ | $V$ | $D_{\diagdown}$ | $D_{\diagup}$ |
| $R_3$ | $R_2$ | $R_1$ | $I$ | $R_3$ | $D_{\diagdown}$ | $D_{\diagup}$ | $H$ | $V$ |
| $I$ | $R_3$ | $R_2$ | $R_1$ | $I$ | $V$ | $H$ | $D_{\diagup}$ | $D_{\diagdown}$ |
| $V$ | $D_{\diagup}$ | $H$ | $D_{\diagdown}$ | $V$ | $I$ | $R_2$ | $R_3$ | $R_1$ |
| $H$ | $D_{\diagdown}$ | $V$ | $D_{\diagup}$ | $H$ | $R_2$ | $I$ | $R_1$ | $R_3$ |
| $D_{\diagup}$ | $H$ | $D_{\diagdown}$ | $V$ | $D_{\diagup}$ | $R_3$ | $R_1$ | $I$ | $R_2$ |
| $D_{\diagdown}$ | $V$ | $D_{\diagup}$ | $H$ | $D_{\diagdown}$ | $R_1$ | $R_3$ | $R_2$ | $I$ |

We might notice some interesting things here:

  • every pair of transformations has a defined sum of transformations,
  • each element has an inverse element,
  • it might not be obvious, but combining transformations is associative,
  • in each row and in each column each element occurs exactly once (so the operation table is a Latin square),
  • if we combined some element with itself several times, we would be able to create cycles (and create a smaller group out of a subset of $S$).

In general, we see that we have defined a group. We can define similar groups for 3D, 4D and any dimension you want. Also, we don’t have to limit ourselves to just 4 rotations and 4 axes of symmetry.

Once we have calculated such a table, we are able to combine any sequence of operations and save some CPU cycles that we would otherwise waste on transforming the whole 2D image. (Though graphics programmers would probably move straight on to quaternions).
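A toy Scala sketch of that trick, restricted for brevity to the rotation subgroup {I, R1, R2, R3}, where composition is simply addition modulo 4 (the encoding is mine):

```scala
// Rotations by 0, 90, 180 and 270 degrees encoded as 0..3 quarter turns.
// Rotation(0) is the neutral element I.
final case class Rotation(quarterTurns: Int) {
  require(0 <= quarterTurns && quarterTurns < 4)

  // Composing rotations is addition modulo 4 - one tiny computation
  // instead of transforming the whole image twice.
  def compose(other: Rotation): Rotation =
    Rotation((quarterTurns + other.quarterTurns) % 4)

  def inverse: Rotation = Rotation((4 - quarterTurns) % 4)
}

// Collapse a whole sequence of rotations into one before touching pixels:
// List(Rotation(1), Rotation(3), Rotation(2)).reduce(_ compose _) == Rotation(2)
```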

Permutation group

Another interesting group is the permutation group. When you have some ordered elements (e.g. a tuple), you can design a function which assigns each of these elements a new position (a position might, but doesn’t have to, change; however, just as in the old order, each element has to end up with a distinct index). We call such a function a permutation.

E.g. $(a,b,c,d) \rightarrow (a,c,d,b)$ would be a permutation. If we assume that we enumerate inputs with natural numbers $1, 2, 3, 4, \ldots$, we might describe a permutation by writing down, for each position, from which position the element currently there was taken, e.g. $(1,3,4,2)$.

Permutations can be combined. E.g. $(1,4,3,2) \cdot (3,4,1,2) =$

1 2 3 4 into
1 4 3 2
   +
1 2 3 4 into
3 4 1 2
   =
1 2 3 4 into
3 2 1 4   

(Trick: take a look at the second permutation and substitute each number $n$ with the $n$-th number from the first permutation).

A permutation which doesn’t reorder elements, $(1,2,3,\ldots,n)$, would be the neutral element. For each permutation we can easily create an inverse element, e.g.:

1 2 3 4 into
1 4 3 2

invert

1 4 3 2
1 2 3 4

order by top row

1 2 3 4
1 4 3 2

We can easily see that permutations also form a group. Each such group of permutations of $n$ elements has a cardinality of $n!$.
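The same mechanics, sketched in Scala (1-based positions as in the text; the encoding is mine):

```scala
// A permutation of n elements, stored as "the element at position i was
// taken from position from(i)" (1-based, as in the text).
final case class Perm(from: Vector[Int]) {
  // Compose using the trick from the text: substitute each entry n of
  // `other` with the n-th entry of `this`.
  def compose(other: Perm): Perm =
    Perm(other.from.map(n => from(n - 1)))

  // Invert by swapping "position" and "origin" and sorting by position.
  def inverse: Perm = {
    val arr = new Array[Int](from.length)
    for ((origin, idx) <- from.zipWithIndex) arr(origin - 1) = idx + 1
    Perm(arr.toVector)
  }
}

// Perm(Vector(1,4,3,2)).compose(Perm(Vector(3,4,1,2))) == Perm(Vector(3,2,1,4))
// Perm(Vector(1,4,3,2)).inverse                        == Perm(Vector(1,4,3,2))
```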

Now, let’s stop for a moment and make an observation. If we take a look at our symmetry group, we’ll see that if we use it for transforming the positions of 4 elements, then it can also be described using permutations of 4 elements. However, it has only 8 elements, while the set of all possible permutations of 4 elements has a cardinality of $4! = 1 \cdot 2 \cdot 3 \cdot 4 = 24$. (It makes sense, as the missing transformations would have to tear the figure apart, and we just wanted to rotate/flip it). So, our symmetry group is isomorphic to some subgroup of the group of permutations of 4 elements.

That is a nice example of Cayley’s theorem:

Each group is isomorphic to a subgroup of some permutation group.

Abelian group

Many groups would change the result of an operation if you switched the operands: $(2, 1, 3) \cdot (1, 3, 2) = (2, 3, 1)$ while $(1, 3, 2) \cdot (2, 1, 3) = (3, 1, 2)$; $abc + def \neq def + abc$. The same goes for groups of square matrices of rank $n$ with multiplication.

But there are groups where you can flip the operands: $2 + 3 = 3 + 2$ and $4 \times 5 = 5 \times 4$. $H + V = R_2 = V + H$.

Such a property, where for any 2 elements of the algebra flipping the order doesn’t change the result of the operation, is called commutativity:

$$\forall_{a,b \in A}\ a \oplus b = b \oplus a$$

A group with such a property is called a commutative group or an abelian group, after Niels Henrik Abel. Examples are the addition of most kinds of numbers (integers, rationals, reals, complex) and the addition of matrices of the same dimensions. Note, however, that our symmetry group $S$ is not abelian, even though $H + V = V + H$: the Cayley table shows e.g. that $R_1 + V \neq V + R_1$ (most symmetry groups are not abelian; commutativity there is more of an exception!).

Such a property might be useful if we want to compute a long sequence of operations in a group. We might e.g. reorder the operands so that inverse elements land next to the elements they invert and, since together they form a neutral element, remove them from the sequence altogether. In certain cases this might spare us expensive calculations.

Ring

So far we have only talked about one operation at a time. But what if a set might be subject to 2 different operations? For instance, numbers can be both added and multiplied. We even introduced 2 different notations (additive and multiplicative) to distinguish them. Such algebras exist, of course. The first of them is a ring.

A ring is an algebra $(R, \oplus, \otimes, 0, 1)$, such that:

  • $(R, \oplus, 0)$ is an abelian group (associativity, commutativity, neutral and inverse elements),

  • $(R, \otimes, 1)$ is a monoid (associativity, neutral element),

  • multiplication ($\otimes$) is distributive with respect to addition ($\oplus$):

    $$\forall_{a,b,c \in R}\ a \otimes (b \oplus c) = (a \otimes b) \oplus (a \otimes c)$$
    $$\forall_{a,b,c \in R}\ (b \oplus c) \otimes a = (b \otimes a) \oplus (c \otimes a)$$

As we can imagine, most kinds of numbers form rings. Square matrices of the same rank (e.g. $2$) would also form rings. But besides the standard sets of numbers, there are also other interesting rings.

Ideal

Let’s take the integers. They form an infinite set $\mathbb{Z} = \{\ldots,-3,-2,-1,0,1,2,3,\ldots\}$. What would happen if we only took the even numbers, $I = \{\ldots,-4,-2,0,2,4,\ldots\}$?

Well, it wouldn’t be a ring. Such a set would not have $1$, which is required as the neutral element of multiplication. However, $(I, +, 0)$ would still be a group, and a subgroup of $(\mathbb{Z}, +, 0)$.

What is more, this subgroup would have an interesting property. For each element of $R$ (here $R = \mathbb{Z}$), if we multiplied it by any element of $I$, we would get an element of $I$:

$$\forall_{r \in R,\ i \in I}\quad r \times i \in I,\ i \times r \in I$$

Such an $I$ we would call an ideal of $R$. If this property held only for $i \times r$ or only for $r \times i$, we would call it a right ideal of $R$ or a left ideal of $R$ respectively.

But I promised other interesting rings, and ideals are not rings. That is because ideals can be used to construct rings.

Quotient ring

An ideal demands that if you multiply anything from the whole $R$ by some element of $I$, you get some element of $I$. There is no such requirement for addition.

Let’s still use our example $I = \{ i : i \in R, 2 \mid i \}$ (all elements of $R$ divisible by $2$, so just the even numbers).

If we add $1$ to every element of $I$, we get

$$I+1 = \{ i+1 : i \in R, 2 \mid i \} = \{\ldots,-3,-1,1,3,\ldots\}$$

so the odd numbers. If we add 2 ($I+2$), we get $I$ again. $I+3 = I+1$, so odd numbers again. You can see where this is all going. By calculating all possible sets $I+r$ where $r \in R$, we end up with just 2 possibilities: $I+0$ (even numbers) and $I+1$ (odd numbers).

We can create a set of such sets; let’s denote it by $R / I$. If we think about it, we have partitioned the whole $R$ into smaller disjoint sets of numbers which are somehow the same (in this example: they have the same remainder when divided by $2$) and which together sum up to the whole set $R$. In terms of set theory, we could say that we defined an equivalence relation (has the same remainder as), and then we partitioned $R$ into equivalence classes, in which all elements are (in a way) the same.

So, we have a set $R/I$. But did you know that we can define operations on it?

$$\forall_{I+a,\ I+b \in R/I}\ (I+a) + (I+b) = \{ (i+a)+(j+b) : i,j \in I \}$$
$$\forall_{I+a,\ I+b \in R/I}\ (I+a) \times (I+b) = \{ (i+a) \times (j+b) : i,j \in I \}$$

If we try it out, we’ll see that:

$$(I+0)+(I+0) = (I+1)+(I+1) = I+0$$
$$(I+1)+(I+0) = (I+0)+(I+1) = I+1$$

We might as well drop the $I+$ prefix to make it shorter:

$$0 + 0 = 1 + 1 = 0$$
$$0 + 1 = 1 + 0 = 1$$

That is basically how addition modulo 2 works. As a matter of fact, we just defined it!

Similarly:

$$0 \times 0 = 0 \times 1 = 1 \times 0 = 0$$
$$1 \times 1 = 1$$

shows that we defined multiplication modulo 2.

Our new 0 and 1 ($I+0$, $I+1$) are the neutral elements of addition and multiplication modulo 2. We can also find inverse elements for addition.

The construction we have just made is a ring called a quotient ring $R/I$. And it is the way we define modulo operations, so it is a starting point for any theory which uses modular arithmetic. If $R = \mathbb{Z}$ and $I$ is constructed by taking all numbers divisible by $n$ (or multiplying all numbers by $n$: $n\mathbb{Z}$), then $R/n\mathbb{Z}$ is denoted as $Z_n$ (the additive group of the ring of integers modulo $n$).
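To make the construction concrete, here is a minimal Scala sketch of $Z_n$ (my own encoding: each equivalence class $I + a$ is represented by its representative in the range 0 until n):

```scala
// The ring of integers modulo n: a value `a` stands for the class I + a.
// (Callers are expected to combine values with the same modulus n.)
final case class ZMod(value: Int, n: Int) {
  require(n > 1 && 0 <= value && value < n)

  def +(other: ZMod): ZMod = ZMod((value + other.value) % n, n)
  def *(other: ZMod): ZMod = ZMod((value * other.value) % n, n)
  def negate: ZMod = ZMod((n - value) % n, n) // additive inverse
}

// Z_2 behaves exactly like the tables we derived:
// ZMod(1, 2) + ZMod(1, 2) == ZMod(0, 2)
// ZMod(1, 2) * ZMod(1, 2) == ZMod(1, 2)
```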

$Z_n$ is not a group with $\otimes$. However, we can pick out a subset which does form a group under multiplication:

$$Z_n^* = \{ a : a \in Z_n, \gcd(a,n) = 1 \}$$

that is, the subset of elements from $Z_n$ coprime with $n$. We call it the multiplicative group of the ring of integers modulo $n$.

Field

In a ring, only the set with addition forms a group. What if we want multiplication to form a group with the set as well?

A field is an algebra $(F, \oplus, \otimes, 0, 1)$ where both $(F, \oplus, 0)$ and $(F \setminus \{0\}, \otimes, 1)$ are groups (the additive one abelian), and multiplication is distributive over addition. Since both addition and multiplication have inverse elements ($-a$, $a^{-1}$), we effectively require the structure to have addition, subtraction, multiplication and division defined.

Again, most kinds of numbers (from the rationals up) are fields. Fields that can be described as extensions of the field of rational numbers are called algebraic number fields.

But there are other fields as well. Actually, there is one very small and very famous algebra closely related to the smallest of them.

Boolean algebra

Named after George Boole, who introduced it, this algebra formalizes logic operations on true and false, specifying the set of conditions we need to fulfill in order to perform logic operations (no matter how we actually implement them).

A Boolean algebra is defined as $(B, +, \times, -, 0, 1)$, where:

  • $(B, +, 0)$ is a commutative monoid,

  • $(B, \times, 1)$ is a commutative monoid,

  • $-b$ is a unary operator returning the complement of $b$ (negation),

  • the operations are distributive:

    $$a \times (b + c) = (a \times b) + (a \times c) \qquad (b + c) \times a = (b \times a) + (c \times a)$$

    (so far, this looks a lot like the field $Z_2$),

  • the operations have an absorption property:

    $$a + (a \times b) = a \qquad a \times (a + b) = a$$

    (this is where $B$ differs from $Z_2$: $1 \times (1+1) = 1$ instead of $0$).

Depending on the context, we might see different notations used for the operations:

$$+ = \lor = \cup = \max \qquad \times = \land = \cap = \min$$

depending on how we implement our algebra:

  • using numbers

    $(\{0,1\}, \max, \min, f(b) = 1-b, 0, 1)$

  • using sets

    $(\{\emptyset, \{\emptyset\}\}, \cup, \cap, f(b) = \{\emptyset\} \setminus b, \emptyset, \{\emptyset\})$

  • using logical operations (if we happen to treat them as something given and not as a Boolean algebra implemented in another way in disguise)

    $(\{\mathrm{true}, \mathrm{false}\}, \lor, \land, \neg, \mathrm{false}, \mathrm{true})$

Each of these implementations is isomorphic to the others - you can map them onto each other without ambiguity.

It might come in handy if you ever wanted to perform logical operations on the type level.
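For illustration, a tiny Scala sketch of the {0, 1} implementation together with its mapping onto the {true, false} one (names are mine):

```scala
// Boolean algebra on {0, 1}: + is max, × is min, negation is 1 - b.
object BoolAlg {
  def or(a: Int, b: Int): Int  = math.max(a, b)
  def and(a: Int, b: Int): Int = math.min(a, b)
  def not(a: Int): Int         = 1 - a

  // The isomorphism onto the {true, false} implementation.
  def toBoolean(a: Int): Boolean   = a == 1
  def fromBoolean(b: Boolean): Int = if (b) 1 else 0

  // e.g. or(1, and(0, not(0))) == fromBoolean(true || (false && !false))
}
```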

Galois field

So, we have designed a way of implementing modular arithmetic on numbers. The next thing we could do is make polynomials out of them.

Let us recall quotient rings for a moment. $R/I$ partitioned the whole ring into smaller disjoint sets of somehow identical elements. Now, let’s denote the ring of polynomials with coefficients from $R$ as $R[x]$. What would $R[x]/q(x)$ mean?

Basically, it would partition the whole set of polynomials $R[x]$ into smaller sets of polynomials with the same remainder from division by a polynomial $q(x)$ irreducible in $Z_2[x]$. E.g.

$$p(x) = x^3 + 1 \in Z_2[x] \qquad q(x) = x^2 + x + 1 \in Z_2[x]$$

There are $s(x)$ and $r(x)$ (the latter of degree smaller than $q(x)$’s) for which

$$p(x) = q(x)s(x) + r(x)$$

The ring we obtain by calculating all possible remainders would look like this:

| $+$ | $q(x)$ | $q(x)+1$ | $q(x)+x$ | $q(x)+1+x$ |
|---|---|---|---|---|
| $q(x)$ | $q(x)$ | $q(x)+1$ | $q(x)+x$ | $q(x)+1+x$ |
| $q(x)+1$ | $q(x)+1$ | $q(x)$ | $q(x)+1+x$ | $q(x)+x$ |
| $q(x)+x$ | $q(x)+x$ | $q(x)+1+x$ | $q(x)$ | $q(x)+1$ |
| $q(x)+1+x$ | $q(x)+1+x$ | $q(x)+x$ | $q(x)+1$ | $q(x)$ |

| $\times$ | $q(x)$ | $q(x)+1$ | $q(x)+x$ | $q(x)+1+x$ |
|---|---|---|---|---|
| $q(x)$ | $q(x)$ | $q(x)$ | $q(x)$ | $q(x)$ |
| $q(x)+1$ | $q(x)$ | $q(x)+1$ | $q(x)+x$ | $q(x)+1+x$ |
| $q(x)+x$ | $q(x)$ | $q(x)+x$ | $q(x)+1+x$ | $q(x)+1$ |
| $q(x)+1+x$ | $q(x)$ | $q(x)+1+x$ | $q(x)+1$ | $q(x)+x$ |

As we could expect, $q(x)$ (the class of polynomials divisible by $q(x)$) is the neutral element of $+$ and $q(x)+1$ is the neutral element of $\times$.

We would like $q(x)$ to have a root at some $\alpha$ ($q(\alpha) = 0$). Currently, $Z_2$ doesn’t have such an element, so let’s extend it:

$$Z_2(\alpha) = Z_2 \cup \{ \alpha + x : x \in Z_2 \} = \{0, 1, \alpha, \alpha+1\}$$

If we substitute $x = \alpha$ and $q(\alpha) = 0$ in our $Z_2[x]/q(x)$ ring tables, we’ll get:

| $+$ | $0$ | $1$ | $\alpha$ | $1+\alpha$ |
|---|---|---|---|---|
| $0$ | $0$ | $1$ | $\alpha$ | $1+\alpha$ |
| $1$ | $1$ | $0$ | $1+\alpha$ | $\alpha$ |
| $\alpha$ | $\alpha$ | $1+\alpha$ | $0$ | $1$ |
| $1+\alpha$ | $1+\alpha$ | $\alpha$ | $1$ | $0$ |

| $\times$ | $0$ | $1$ | $\alpha$ | $1+\alpha$ |
|---|---|---|---|---|
| $0$ | $0$ | $0$ | $0$ | $0$ |
| $1$ | $0$ | $1$ | $\alpha$ | $1+\alpha$ |
| $\alpha$ | $0$ | $\alpha$ | $1+\alpha$ | $1$ |
| $1+\alpha$ | $0$ | $1+\alpha$ | $1$ | $\alpha$ |

We might not have realized it, but we have just built a Galois field of order 4.

A Galois field (or finite field) is a field which has finitely many, namely $p^m$, elements (it has an order of $p^m$). There is a theorem stating that each finite field’s order is of the form $p^m$, where $p$ is some prime and $m$ is some natural number. We denote a Galois field of order $p^m$ as $GF(p^m)$.

If we wanted to build $GF(p^m)$, we would need to find an irreducible polynomial $q(x)$ of degree $m$ over $Z_p$ (with a root at $\alpha$), and then we could construct $GF(p^m) = Z_p(\alpha) = Z_p[x]/q(x)$.

In our example above, we just happened to use $p=2$ and $m=2$, so things played out.
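As a toy Scala sketch, we can encode GF(4) = {0, 1, α, 1+α} as the integers 0-3 and hard-code the tables above (this is just an illustration of this particular field, not a general GF(p^m) implementation):

```scala
// GF(4) with elements encoded as 0 = 0, 1 = 1, 2 = α, 3 = 1 + α.
object GF4 {
  // Addition is XOR of the two coefficient bits (see the + table above).
  def add(a: Int, b: Int): Int = a ^ b

  // Multiplication hard-coded from the × table above.
  private val mulTable: Vector[Vector[Int]] = Vector(
    Vector(0, 0, 0, 0),
    Vector(0, 1, 2, 3),
    Vector(0, 2, 3, 1),
    Vector(0, 3, 1, 2)
  )
  def mul(a: Int, b: Int): Int = mulTable(a)(b)

  // e.g. mul(2, 2) == 3, i.e. α × α = 1 + α
}
```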

This is the right moment to ask: what is the application of such an algebra?

The answer: correction codes.

Correction codes

Correction codes are a way of encoding messages with some extra information, so that in case of errors during transmission we are able to recover the original message (at least to some degree).

An example of correction codes are linear codes. They translate a vector of $k$ bits of information into a vector of $n$ bits containing both the information and the extra bits required for recovery (we describe them as $(n, k)$ codes, with $n > k$).

We call it a linear code because it is defined using $n-k$ linearly independent vectors, which can be added and/or multiplied by a constant. You might have already guessed that the translation is done by multiplying a $k$-element vector by a $k \times n$ matrix.

An example: we have an $(n, k) = (5, 2)$ code. The definition of our code would be:

$$a_3 = a_1 \qquad a_4 = a_2 \qquad a_5 = a_1 + a_2$$

We have $n-k = 5-2 = 3$ linearly independent vectors, which define a linear space. $a_1$ and $a_2$ carry the actual information, while $a_3$, $a_4$, $a_5$ are used for recovery. How could we recover? We need to:

  • at first we calculate all possible vectors - we can do this by trying out all possible values of $a_1$ and $a_2$:

    | $a_1$ | $a_2$ | encoding vector |
    |-------|-------|-----------------|
    | 0     | 0     | 00000           |
    | 0     | 1     | 01011           |
    | 1     | 0     | 10101           |
    | 1     | 1     | 11110           |
  • next, we need to calculate the set of syndromes - vectors representing transmission errors on particular bits. For our $(5, 2)$ binary code we should be able to recover from $\lfloor\frac{n-k-1}{2}\rfloor$ errors, so each syndrome would have at most 2 bits set,

    00000 (empty syndrome - no errors), 00001, 00010, 00100, 01000, 10000, 00011, 00101, ...

  • we start adding syndromes to the list we created before - we should end up with all $2^5 = 32$ vectors grouped into $2^{5-2} = 2^3 = 8$ layers:

    | information | 00    | 01    | 10    | 11    |
    |-------------|-------|-------|-------|-------|
    | vectors     | 00000 | 01011 | 10101 | 11110 |
    | layers      | 00000 | 01011 | 10101 | 11110 |
    |             | 10000 | 11011 | 00101 | 01110 |
    |             | 01000 | 00011 | 11101 | 10110 |
    |             | 00100 | 01111 | 10001 | 11010 |
    |             | 00010 | 01001 | 10111 | 11100 |
    |             | 00001 | 01010 | 10100 | 11111 |
    |             | 11000 | 10011 | 01101 | 00110 |
    |             | 01100 | 00111 | 11001 | 10010 |
  • in each row, elements have the same syndrome (the same transmission errors). If we want to correct a received vector, we need to find it among the layers and then pick the right encoding vector from the 2nd row (the top of its column),

  • the corrected vector can then be translated back into the information.

If we cannot find our vector anywhere then there were more errors during transmission than we can recover from.
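To make the recovery procedure concrete, here is a toy Scala sketch of the (5, 2) code above, with the codewords and the error patterns (one per layer) hard-coded from the tables (names and encoding are mine):

```scala
// The (5, 2) code from the example: 2 information bits -> 5 transmitted bits.
object Linear52 {
  private val codewords: Map[String, String] =
    Map("00" -> "00000", "01" -> "01011", "10" -> "10101", "11" -> "11110")

  // Coset leaders, one per layer of the table above.
  private val errorPatterns: List[String] =
    List("00000", "10000", "01000", "00100", "00010", "00001", "11000", "01100")

  private def xor(a: String, b: String): String =
    a.zip(b).map { case (x, y) => if (x == y) '0' else '1' }.mkString

  def encode(info: String): String = codewords(info)

  // Strip each candidate error pattern; if the result is a codeword,
  // we have found the layer and can read the information back.
  def decode(received: String): Option[String] =
    errorPatterns.flatMap { e =>
      val candidate = xor(received, e)
      codewords.find { case (_, cw) => cw == candidate }.map(_._1)
    }.headOption
}

// Linear52.decode("11011") == Some("01")  // a single flipped bit corrected
// Linear52.decode("11111") == Some("11")  // 11110 with the last bit flipped
```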

Linear code as matrix

Our 4 vectors can also be calculated another way. We can calculate them as:

$$a_1 \times [01011] + a_2 \times [10101]$$

Considering that the input would be a vector $[a_1\ a_2]$, it naturally translates to matrix multiplication:

$$[a_1\ a_2] \times \begin{bmatrix}01011 \\ 10101\end{bmatrix}$$

This $G = \begin{bmatrix}01011 \\ 10101\end{bmatrix}$ will be our generator matrix. As mentioned earlier, we can encode linear codes by multiplying by the $G$ matrix. We might also consider transforming $G$ by rearranging rows and adding rows to each other (such linear transformations will not change its error-correcting properties). Then we might normalize it to the form

$$G = \begin{bmatrix}I_k & P_{k,n-k}\end{bmatrix}$$

where $I_k$ is the square identity matrix of degree $k$ and $P_{k,n-k}$ is a $k \times (n-k)$ matrix of coefficients used in correction. Such a normalized $G$ would be

$$G = \begin{bmatrix}10101 \\ 01011\end{bmatrix}$$

In order to check for errors and decode the information, we will use a parity check matrix $H$. We define it as a matrix such that $G \times H^T = 0$. We can calculate it knowing $P_{k,n-k}$:

$$H = \begin{bmatrix}P_{k,n-k}^T & I_{n-k}\end{bmatrix}$$

Such an $H$ can then be used to calculate a syndrome

$$s = mH^T$$

which in turn we can use to correct the message and then retrieve the original information.

Generating codes from polynomials

We can take a polynomial $g(x)$ over $GF(p^m)$. If we multiplied all possible polynomials by $g(x)$, we would receive all polynomials divisible by $g(x)$ without remainder. Those of them of degree less than some bound $n$ would be the words of the code generated by this polynomial.

Example (from Wikipedia): if $p^m = 2$ then $GF(2) = \{0,1\}$ and we end up with modulo 2 arithmetic. If we took $g(x) = x^2 + x + 1$, we would obtain words like:

$$0 \times g(x),\ 1 \times g(x),\ x \times g(x),\ (x + 1) \times g(x),\ \ldots,\ (x^2 + x + 1) \times g(x)$$

which resolves to:

$$0,\ x^2 + x + 1,\ x^3 + x^2 + x,\ \ldots,\ x^4 + x^2 + 1$$

We can also represent them using vectors of their coefficients:

$$00000,\ 00111,\ 01110,\ \ldots,\ 10101$$

These vectors create a linear space - one that we can represent using a matrix $G$.
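A small Scala sketch of that generation step, with polynomials over GF(2) encoded as the set of exponents whose coefficient is 1 (my own encoding, just enough to reproduce the words above):

```scala
object PolyCodes {
  // A polynomial over GF(2) represented as the set of exponents whose
  // coefficient is 1, e.g. x^2 + x + 1 == Set(2, 1, 0).
  type Poly = Set[Int]

  // Multiplication over GF(2): cross-multiply terms; exponents that appear
  // an even number of times cancel (coefficients are added modulo 2).
  def mul(p: Poly, q: Poly): Poly =
    p.toList.flatMap(i => q.toList.map(j => i + j))
      .groupBy(identity)
      .collect { case (exp, occ) if occ.size % 2 == 1 => exp }
      .toSet

  val g: Poly = Set(2, 1, 0) // x^2 + x + 1

  // All products m(x) * g(x) for messages m(x) of degree < 3:
  val messages: List[Poly] = List(
    Set.empty[Int], Set(0), Set(1), Set(1, 0),
    Set(2), Set(2, 0), Set(2, 1), Set(2, 1, 0)
  )
  val codeWords: List[Poly] = messages.map(mul(_, g))
  // mul(Set(2, 1, 0), g) == Set(4, 2, 0), i.e. x^4 + x^2 + 1 == 10101
}
```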

By carefully choosing the parameters of the polynomial we might create a Bose–Chaudhuri–Hocquenghem code - BCH codes are among the most powerful classes of correction codes; they are used in hardware: DVDs, satellites, SSDs, etc. Syndromes allow for easy, low-power error correction. These algebras are what makes digital communication and storage reliable, so I thought it was worth mentioning how much we rely on them, even if we do not remember about it during our everyday work.

Summary

I wanted to show that there are a lot of algebras that have practical uses, that we use them every day and rely on their properties even if we are not hardcore functional programmers. Last but not least, I wanted to show that algebras are not only semigroups, monoids, and F-algebras - in FP communities other algebras are hardly ever mentioned.

While writing, I used and borrowed a few examples from the (already mentioned) Modern Algebra with Applications by Gilbert and Nicholson. I also checked a few things in Kody Korekcyjne i Kryptografia by Mochnacki (Correction Codes and Cryptography, Polish only, sorry). I merely scratched the surface of algebras to show that they have some practical applications, so if you are interested in more details, I encourage you to consult the books - I had to skip over quite a lot of things to keep this post reasonably short. It still didn’t come out short, so you can only imagine how long it would have been if I hadn’t.