My Profile Photo

kubuszok.com


Personally just a "developer" without X in front of it, currently working with Scala.

I enjoy learning new things, especially more abstract like mathematics or algorithmics.

If you want to know about new posts follow me on Twitter or RSS feed!


Implicits, type classes, and extension methods, part 1: with type classes in mind

Implicits. For some people, they are an indispensable feature of Scala, that allows achieving otherwise impossible things. For others a sole reason to avoid using the language. As far as I can tell the majority of the later never really learned how to use them right.

In this mini-series I want to explain different concepts related to implicits: the motivation behind, different mechanisms based on them and a bit on how to work with them. Originally it was supposed to be a single post, but it was way too long for anyone to read.

Implicit val, def, conversion, class…

First of all, implicit is not a thing. There are several distinct ideas, all of which use the same keyword implicit, using the same internal mechanism, but in the end, doing different things for different reasons. We could group them like this:

  • implicit parameter,
  • implicit val and implicit def taking only implicit parameters,
  • implicit def taking one parameter - implicit conversion,
  • implicit class.

However, this division tells us nothing. I think, in order to understand implicits it’s best to talk about them using concrete use cases, and only then look at what they have in common.

Type-based Dependency Injection

Let’s start with a simple case.

def runServer(config: Config, ec: ExecutionContext): Unit 
def runMigrations(config: Config, ec: ExecutionContext): Unit
def runREPL(config: Config, ec: ExecutionContext): Unit

// config.mode is an enum telling which run mode user wanted
def startApp(config: Config, ec: ExecutionContext): Unit =
config.mode match {
  case Server => runServer(config, ec)
  case Migrations => runMigrations(config, ec)
  case REPL => runREPL(config, ec)
}
val config: Config
val ec: ExecutionContext

startApp(config, ec)

As we can see, here we simply pass dependencies around. Additionally, these dependencies have unique types. In this scope we could find only one instance for each type we would ask compiler about. So, we can imagine, we could ask the compiler to do something like:

def startApp(use here the only thing of type Config,
             use here the only thing of type: ExecutionContext): Unit =
(that only Config thing).mode match {
  case Server => runServer(pass Config for me, pass ExecutionContext for me)
  case Migrations => runMigrations(pass Config for me, pass ExecutionContext for me)
  case REPL => runREPL(pass Config for me, pass ExecutionContext for me)
}
startApp(pass Config for me, pass ExecutionContext for me)

Of course, this would break pretty easily. Put another Config and ExecutionContext here and the compiler would not be able to guess, which object we want. So, let’s mark them:

mark as special val config: Config
mark as special val ec: ExecutionContext

This too would fail if we marked 2 Configs or ExecutionContexts as special, but at least this would be a thing we could do something about.

At this point, we can notice, that since it’s the compiler that passes objects for us, we don’t have to write them ourselves:

startApp

The compiler knows, that it needs here one Config and one ExecutionContext, and in order to find them it searches through the list of objects marked as special. Then all we would see is that these parameters were passed implicitly.

This is the reason, why implicit is the keyword we use to mark the variables. It is also the keyword used to mark the parameter list which arguments should be filled by values from implicit scope:

def runServer(implicit config: Config, ec: ExecutionContext): Unit 
def runMigrations(implicit config: Config, ec: ExecutionContext): Unit
def runREPL(implicit config: Config, ec: ExecutionContext): Unit

def startApp(implicit config: Config, ec: ExecutionContext): Unit =
config.mode match {
  case Server => runServer
  case Migrations => runMigrations
  case REPL => runREPL
}
implicit val config: Config
implicit val ec: ExecutionContext

startApp

There are some rules regarding implicits:

  • implicits cannot be ambiguous. You cannot define 2 implicits with the same type in the same scope,
  • however, class and its subclass are slightly different scopes. The rule of thumb is that in such cases, the closer scope would win. However, as always it is a bit more complex, so we will explain that later on,
  • companion objects are also checked for implicits. If you have type X, then X companion will be checked for X implicits, but also for F[X] or X[A] (this will be explained later, as we reach type classes),
  • a method can have one parameter list, which will be marked as implicit. As of now, it will always be the last parameter list. Each of params will become implicit value (so you cannot get params from the implicit scope, but not make them implicits themselves).

For convenience scala.Predef defines utility, that allows us to get implicit by its type:

@inline def implicitly[T](implicit e: T) = e
implicit val s: String = "test"

implicity[String] // "test"

Most of the time, these rules work quite good. There is at least one case though, where they can break your code unexpectedly:

implicitly[X] // gets correct X from some other scope

class Y {
  
  implicit val x: X = implicitly[X]
}

(new Y).x // null!

If you run this particular code, you could be surprised, why y.x would return null. Problem is, that usually when we define variables, they become available only from the moment they are defined. However, class parameters are always accessible from any place in the class - even before they are initialized. Normally, scalac would produce an error if you explicitly tried to access uninitialized val. However, here it is not so obvious - value comes from implicitly[X]. Unfortunately, implicitly takes it from the closest scope it can find - which is Y class - at this point, not initialized completely.

I think I stumbled on this issue only twice. But the first time it was a nightmare, to figure out what was happening. It only shows how complex things can get once several features are used at once.

Instance produced on demand

Notice, that implicit instances so far were defined as values. We had to define a complete type and implementation. But, let’s say we want to always have an instance of a List that is an empty List. The List has a type parameter. What now?

implicit def emptyList[T]: List[T] = List.empty[T]

implicitly[List[String]] // Nil
implicitly[List[Int]] // Nil

That wasn’t very interesting. But let’s say we want a Stream, and we want it to be filled with a value of implicit scope. Can we do it?

implicit s: String = "test"
implicit i: Int = 0

implicit def streamOfT[T](implicit value: T): Stream[T] =
  Stream.continually(value)

implicitly[Stream[String]] // Stream("test", "test", ...)
implicitly[Stream[Int]] // Stream(0, 0, ...)

We see, that we can use implicit def to define value in scope in a more dynamic way. It can even be fed with other values from implicit scope! But now one would write a code like that in examples: passing around primitives is not a not going to work in a long run. So, what is the actual benefit of this mechanism and what is the reason for implicits’ existence in the first place?

Flavors of polymorphism

It all begins with Haskell and polymorphism. Both mathematicians and programmers want to reuse concepts that they already spent time inventing. In order to do so, they need a way to generalize where possible, only specializing where needed. Polymorphism is a general idea, about having different kinds of data, that have something in common, which allows you to perform same operations on them. In object-oriented programming, that usually is related to sharing a common interface, but that’s just one case. Actually, there are 3 different kinds of polymorphism:

  • Subtyping aka subtype polymorphism. Within one type, you have a subset of values making another type. OO languages usually implement it with interface inheritance. Because the subtype must fulfill all requirements and assertions of the supertype, the supertype can be treated as a generalization and common denominator of all its subtypes.

    (That is the ideal situation - OO programmers need to be actively reminded about Liskov substitution principle. A lot of bugs were born when someone overrode a method and created a class that shared an interface with the superclass but couldn’t be used as a drop-in replacement).

  • Parametric polymorphism, List of Strings should behave the same as List of Ints. So why not let them have the same implementation? Of course, they have to have different types, so that we don’t put integers into a list of strings and vice versa. Type parameters let us define a whole implementation where instead of a type we put a placeholder, and when we supplement it with a concrete type, the implementation is materialized.

    There is also another way to look at it - mathematical one. Functions, in general, are sets of arguments-values pairs. Types are also sets of values. And set might be both an argument and a value of a function. So it is possible to create a function taking a type as an argument and returning another type as a value. An example of such function (called type constructor) would be List[_], which takes a parameter e.g. String and then returns a concrete type e.g. List[String].

  • Ad-hoc polymorphism aka operator overloading. You have * (or +) operator and you want to be able to use if for multiplying: ints, longs, doubles, floats, complex numbers, vectors, matrices… For each type it would have a slightly different implementation, so you cannot just create one function that rules them all, and simply adjust the type (as is the case with parametric polymorphism), nor extract common behavior (what matrices multiplication has in common with multiplying scalars?).

    As a matter of the fact, each overloaded function is actually a set of different functions, sharing the same name, where one that will be used is based on a type of the argument(s) (and their number aka arity).

The distinction between all 3 kinds of polymorphism is very important. At some point, developers of Haskell needed to introduce some kind of polymorphism to handle things like mentioned multiplication or addition, but not by having everything being build into the language. They wanted their polymorphic solution to be extensible. Parametric polymorphism is not a solution to this problem and they didn’t want to introduce subtyping with its shares of problems with inheritance.

Type classes in Haskell

The solution originally described in the paper How to make ad-hoc polymorphism less ad hoc works as follows:

  • let us define a named contract. This contract would require, that for a type certain functions have to be defined and meet certain assertions,
  • when type declared itself as following the said contract, then somewhere in the scope of program required functions have to be defined and explicitly pointed to (and have to follow assertions),
  • operators like *, + etc have assigned contracts, and are used as syntactic sugar for functions mentioned in their contracts,
  • one contract can inherit obligations from another contract and add to them.

Such contracts would be named type classes. Let us see this idea by the example of a monoid.

A monoid is an algebra of elements that could be combined together.

We could define it formally as a triple , where:

  • we would require to be binary operator over :
  • which is also associative:
  • and has a neutral element :

Examples of such monoids would be: numbers with addition and 0, numbers with multiplication and 1, lists of the same type with list concatenation and an empty list, matrices of the same size with matrix addition with a matrix of zeros, square matrices of the same size with matrix multiplication and an identity matrix.

Haskell would define such contract this way:

class Monoid m where
  mempty :: m            -- neutral element
  mappend :: m -> m -> m -- operator A x A -> A
  mconcat :: [m] -> m    -- fold list (e.g. calc sum/product)
  -- defining mconcat is optional, since it has the following default:
  mconcat = foldr mappend mempty
 
-- this infix synonym for mappend is found in Data.Monoid
x <> y = mappend x y
infixr 6 <>

together with the following laws:

-- Identity laws
x <> mempty = x
mempty <> x = x
 
-- Associativity
(x <> y) <> z = x <> (y <> z)

An implementation for numbers would look like:

newtype Sum n = Sum n
 
instance Num n => Monoid (Sum n) where
  mempty = Sum 0
  mappend (Sum x) (Sum y) = Sum (x + y)
  
newtype Product n = Product n
 
instance Num n => Monoid (Product n) where
  mempty = Product 1
  mappend (Product x) (Product y) = Product (x * y)

(Examples are taken from Haskell wiki).

While this syntax might be alien to quite a lot of people, it should show us a few things:

  • Haskell uses names that no other language would consider newcomer friendly (mempty - monoid-empty, mappend - monoid-append, mconcat - monoid-concatenate),
  • Haskell kind of wraps types in Sum or Product, because both addition and multiplication are monoids and it has to distinct them somehow which one we want to use,
  • Haskell borrowed class and instance keywords from object-oriented languages,
  • if we think about it… it is not about declaring a class at all! That is unless we assume that Haskell only lets us use class to define parametrized trait with an interface and assertions, while instance only allow us to create one global singleton instance of the interface per type used as a parameter.

Considering that a type class is not really a class, but has class in its name it is not surprising that people are confused about them! But let’s get back to Scala.

Type classes in Scala

Odersky found an idea of type classes appealing and wanted to have them in Scala. There were some issues though:

  • JVM basically won’t let you make sure that several functions are defined at once in other way than by putting them into the same interface,
  • Haskell would supply the right implementation inside parametrized type and each parametrized function is turned into a specialized version, which might have all occurrences of an overload function replaced with a pointer to a specific implementation,
  • also Haskell can look through the whole scope for an implementation for a specific type. With JVM ecosystem where you compile stuff directly to the bytecode and later on you would have trouble re-compiling stuff because new type appeared. @Specialize won’t help either, since it only generates special cases for primitives.

To address the issues, he chooses following solutions:

  • type classes with be implemented as actual classes. Or trait. This way you will have to guarantee,
  • wherever you use a type class, you will call a method on an actual instance which will store all necessary information about types inside,
  • the fact that you call this code from type-erased generic implementation that could have been compiler somewhere else won’t be an issue, because you will use dependency injection to get it.

So you could implement type classes this way:

trait Monoid[A] {
  def empty: A
  def append(a1: A, a2: A): A
  def concat(list: List[A]): A = list.foldRight(empty)(append)
}

val intMonoid = new Monoid[Int] {
  def empty: Int = 0
  def append(a1: Int, a2: Int): Int = a1 + a2
}

def combineValues[T](
  map: Map[String, List[T]]
)(monoid: Monoid[T]): Map[String, T] =
  map.mapValues(monoid.concat)

combineValues(Map(
  "test1" -> List(1, 2, 3, 4, 5),
  "test2" -> List(6, 7, 8, 9, 0)
))(intMonoid)

Thing is, usually, you want to have only one type class for each type in the scope. Additionally, such usage would introduce clumsiness the original solution hasn’t. Therefore the idea of implicits was born. Originally, implicits were introduced as a way of automatically passing type classes where needed basing solely on their type.

object Monoid {
  
  implicit val intMonoid: Monoid[Int] = new Monoid[Int] {
    def empty: Int = 0
    def append(a1: Int, a2: Int): Int = a1 + a2
  }
}

def combineValues[T](
  map: Map[String, List[T]]
)(implicit monoid: Monoid[T]): Map[String, T] =
  map.mapValues(monoid.concat)

combineValues(Map(
  "test1" -> List(1, 2, 3, 4, 5),
  "test2" -> List(6, 7, 8, 9, 0)
))

Here we can notice one interesting thing about type classes and implicits: when you need an implicit of type F[A] compiler will look for it inside companion objects of F and A if no implicit could be found in the current scope. This way, you don’t have to define/import them everywhere if you just want to rely on default.

But passing type classes could be made simpler even further. Since the most common use case is a type class with one parameter, you can declare them as context bounds.

def needMonoid[T](ts: List[T])(implicit monoid: Monoid[T]): T =
  monoid.concat(ts)
// could be written with context bound
def needMonoid[T: Monoid](ts: List[T]): T =
  implicitly[Monoid[T]].concat(ts)
def severalBoundArePossible[T : Bound1 : Bound2]: Unit = {
  implicitly[Bound1[T]]
  implicitly[Bound2[T]]
}

With context bound notation you declare that you need an implicit. It will become available in implicit scope, but it has no name you can use. This is why:

  • it won’t cause unused parameter issue if you just want to force some contract, but not made use of if (e.g. implicit evidence),
  • you can pass it inside a method to another method that requires it,
  • if you want to use it, you need to extract it from implicit scope manually e.g. with implicitly[T].

In type-class-heavy libraries like e.g. Cats you can see an interesting pattern to avoid using the whole implicitly[Typeclass[T]]:

object Monoid {
  
  def apply[T: Monoid]: Monoid[T] = implicitly[Monoid[T]]
  def append[A: Monoid]:(a1: A, a2: A): A = apply[A].append(a1, a2)
  def concat[A: Monoid](list: List[A]) = apply[A].concat(list)
}

Once you define such helpers inside a companion, you can use them inside a code making the context very clear:

def combineValues[T: Monoid](map: Map[String, List[T]]): Map[String, T] =
  map.mapValues(Monoid.concat[T])

If you’ll ever start working with really FP-heavy libraries like Cats or Scalaz, you’ll notice that concepts everywhere.

Summary

In this post, we learned a bit about the first usage of implicits - type-based dependency injection as well as the biggest reason behind introducing them - the type classes.

In next post, we’ll talk about how type-classes (or other implicit instances) are created on demand.