Kinds of types in Scala

When I try to explain to someone why I prefer Scala to Java, and why functional programming works better with Scala, one of my arguments is that it has a better type system. But what exactly does that mean? What advantage does it have over the static type systems of languages like Java or Go?

Origin of types

First of all, let us think about what types themselves are. Historically, they were introduced to solve the issue of ambiguities in mathematics, which appeared under certain conditions, and were developed as a part of the effort to formalize mathematics.

Before formalization, mathematicians had to face paradoxes like the set of all sets. Bertrand Russell described it as one of the entities whose existence we are unable to decide without building upon strict, formal definitions. Nowadays, set theory solves this problem using the Zermelo–Fraenkel axioms:

  • let us assume, that the set of all sets could be constructed,
  • the power set of some set is always bigger than said set,
  • if this set is the set of all sets it would have to be a strict superset of itself and its own power set as well,
  • this leads to the conclusion, that the set would have to be bigger than itself (not equal!),
  • by contradiction we conclude, that building the set of all sets is impossible.
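Sketching the argument in symbols (my own rendering, with $V$ standing for the hypothetical set of all sets and $\mathcal{P}$ for the power set): Cantor's theorem gives $|\mathcal{P}(V)| > |V|$, yet every element of $\mathcal{P}(V)$ is a set, so $\mathcal{P}(V) \subseteq V$ and therefore $|\mathcal{P}(V)| \leq |V|$ - a contradiction.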

At the time though, Bertrand Russell proposed another way to avoid the paradox:

  • each term should be assigned a type. It is usually denoted with a colon: $2: nat$,
  • if we wanted to define a function, we would define on which types it operates: e.g. $times2: nat \rightarrow nat$.

$f: X \rightarrow Y$ is read as: $f$ is a function from $X$ (domain) to $Y$ (codomain).

He proposed that we should build objects bottom-up: if we define something using predicates, we can only use already defined objects and never other predicates. However, such a definition has a flaw: since we cannot reuse other predicates, we couldn't e.g. compare cardinalities of sets (how could you prove that the types match?).

His initial (ramified) type theory was later on refined using the axiom of reducibility, which allowed whole hierarchies of types to collapse, creating a simple type theory. It introduced the following rule:

  • terms can be converted using explicitly defined rules: e.g. $2 + 2 \twoheadrightarrow 4$ means that $2 + 2$ reduces to $4$, so it has the same type.

In 1940, Alonzo Church combined the type theory with his own lambda calculus, creating the simply typed lambda calculus. Since then, more advanced type theories have appeared.

Types, intuitively

Type theory exists in parallel to set theory: one is not defined using the other. Because of that, strictly speaking, we cannot compare types to sets.

But we can if it's about intuition. Saying that the value $a$ belongs to a type $A$ ($a: A$) can be compared to saying that the value $a$ belongs to a set $A$ ($a \in A$). Saying that type $A$ is a supertype of $B$ and a subtype of $C$ is analogous to saying that $B \subset A \subset C$. When we say that $t$ is of both types $X$ and $Y$, it is the same as saying that $t \in X \cap Y$.

If we defined a set of all values besides itself, in type theory we would call its counterpart a top type: a type which contains (is a supertype of) all other types. Similarly, the empty set would have a counterpart in the form of the bottom type. To make it easier to remember, imagine types as a hierarchy where more generic (less restricted) types are above and more specific ones (with more requirements) below. Then the type with no requirements would sit at the top and contain all values, while the type at the bottom, with the requirements of all the types above it (which must be contradictory), would be empty.

In Scala

If we look at how Scala implements types, we'll see that a judgment $a: A$ ($a$ is of type $A$) translates directly into the syntax:

val a: A
def f(a: A): B

When it comes to Scala syntax, we don't talk about a judgment though. Instead, we talk about a type ascription. The notation for function types also comes from type theory: $times2: int \rightarrow int$ becomes:

val times2: Int => Int

Scala also has the notion of top and bottom types. The top type - a type we have no assertions about, which is a supertype of every other type - is Any. In Java, the closest thing would be Object, except it is not a supertype for primitives. Scala's Any is a supertype for both objects (AnyRef) and primitives (AnyVal). When we see Any as an inferred type, it usually means we messed up and combined two unrelated types.

The bottom type - a type which is a subtype of all others, and has no citizens - in Scala is Nothing. We cannot construct a value of type Nothing, so it often appears as a type parameter of empty containers or as the return type of functions that can only throw.
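A minimal sketch of how Nothing shows up in practice (my own example):

def fail(msg: String): Nothing =
  throw new IllegalStateException(msg) // never returns normally, so Nothing is honest

val noInts: List[Int] = Nil // Nil is a List[Nothing], which fits any List[T]

def half(x: Int): Int =
  if (x % 2 == 0) x / 2 else fail("odd") // branches unify to Int, since Nothing <: Int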

(Quite often I see Null mentioned as another bottom type in Scala, together with Nothing. It is a type extending everything that is an Object on the JVM. So it is kind of like Nothing without primitives, Unit/void, and functions that only throw).

Also, a word about functions. In Scala we make a distinction between a function and a method. A function is a value, which can be called:

val times2: Int => Int = x => x * 2

A method, on the other hand, is not a value. It cannot be assigned, or directly passed, though via a process called eta-expansion a method can be turned into a function.

def times2(x: Int): Int = x * 2
// val y = def times2(x: Int): Int = x * 2 - error!
val y = { def times2(x: Int): Int = x * 2 } // Unit
// however
val z: Int => Int = times2 _ // works!

A less accurate, but simpler version: a function is an instance of some Function0, Function1, …, Function22 trait, while a method is something bound to a JVM object. We make the distinction because the authors of Scala haven't solved the issue in a more elegant way.

Oh, and one more thing. With a method, you can declare an argument as implicit. You cannot do that with a function. For now. It will be possible with implicit function types in Dotty.

Type inference and Least Upper Bounds

This hierarchy of types is quite important when it comes to type inference. We expect inference to provide us with the most concrete type possible, while complying with any type restrictions imposed in a given case. Let's take a look at a few cases:

val s = "test"

"test" has a type of String. There are no additional restrictions. So, the inferred type is String.

def a(b: Boolean) = if (b) Some(12) else None

Here, the returned value is one of two types: Some[Int] or None.type (the type of the None object). How do we decide the common type for these two?

Let’s build a graph of a type hierarchy:

Any -> AnyRef
AnyRef -> Equals -> Product -> Option[T]
AnyRef -> java.io.Serializable -> Serializable -> Option[T]
Option[T] -> Some[T]
Option[T] -> None.type

As we can see, there are quite a lot of candidates for the type of the value returned by the function - any type which is above both Some[T] and None.type at once is a valid candidate. Each of them is an upper bound for the sought type. (By analogy, if we were looking for the type of a parameter we want to pass to a function, one that would have to conform to several types, each type lying below all of them would be a lower bound.)

But having several upper bounds is not enough; it is ambiguous. However, it just so happens that there is always one type that is the smallest of them (in the partial-order meaning of smallest). Here, we can see that Option[T] is the least of the upper bounds that fulfill our requirements. We call it the least upper bound (LUB) and it is exactly the type our function will get.

Here is why knowledge about the top type comes in handy: if the inferred type is the top type (Any), some of our requirements are most likely wrong, e.g. we ask Scala to find the greatest common denominator for a String and an Int (List("test", 10)). Because Any is the only upper bound they share, it automatically becomes the least upper bound and the inferred type. We hardly ever want such behavior, so it is no wonder that wartremover treats an inferred Any as a code smell.

Algebraic Data Types

When we think of types as sets, there are 2 special constructions that help us build new sets from existing ones, and which complement each other. One is the Cartesian product and the other is the disjoint union/disjoint set sum.

Let’s start with the Cartesian product. We can define it as a generalization of an ordered pair. A set of ordered pairs $(A, B)$ would be a set of values $(a, b)$, such that $a \in A, b \in B$, which we can distinguish by their position inside the brackets. More formally, we could define an ordered pair as (Kuratowski’s definition):

$(a, b) = \{ \{a\}, \{a, b\} \}$

Such a definition helps us indicate the order using a construction that itself has no notion of order (that is, a set).

The Cartesian product generalizes the idea. It is an operator which takes 2 sets and creates a set of tuples from them:

$A \times B = \{ (a, b): a \in A, b \in B \}$

$\{ x: p(x) \}$ is set builder notation meaning: a set made of elements $x$ for which a predicate $p(x)$ is true

Such a definition is not associative ($(A \times B) \times C \ne A \times (B \times C)$), which we can easily check:

$((1, 2), 3) = \{ \{ \{ \{ 1 \}, \{ 1, 2 \} \} \}, \{ \{ \{ 1 \}, \{ 1, 2 \} \}, 3 \} \}$
$(1, (2, 3)) = \{ \{ 1 \}, \{ 1, \{ \{ 2 \}, \{ 2, 3 \} \} \} \}$

This is why we must explicitly define an $n$-tuple and a product of $n$ sets as left-associative operations:

$(x_1, ..., x_{n-1}, x_n) = ((x_1, ..., x_{n-1}), x_n)$
$X_1 \times ... \times X_{n-1} \times X_n = (X_1 \times ... \times X_{n-1}) \times X_n$

If we consider records (well, any class actually) as $n$-tuples of their attributes, and containers like lists as infinite tuples (where all elements after some index are empty), then we’ll understand why they are called product types and why they often share a common scala.Product trait.

type IntTwice = (Int, Int)
case class TwoInts(first: Int, second: Int)
type IntList = List[Int]
// all of the above are products

(IntList is a type alias, another name for an existing type. We could use List[Int] instead of IntList and it would make no difference to Scala).

The other construct - a disjoint union, sum type or coproduct - is a sum of a finite number of sets, where no two sets have a non-empty intersection. The types are also tagged, that is, we can always tell from which subset any value of the union originated.

Scala 2.x doesn’t allow us to actually build up a union from already existing types. If we want to let the compiler know that types should form a coproduct, we need to use a sealed hierarchy:

sealed trait Either[+A, +B] {}
case class Left[+A, +B](a: A) extends Either[A, B]
case class Right[+A, +B](b: B) extends Either[A, B]

sealed trait Option[+A] {}
case class Some[A](value: A) extends Option[A]
case object None extends Option[Nothing]

The sealed keyword makes sure that no new subtype of said trait can be created outside the current file, so the compiler will always know all the types that directly inherit the trait (and form the disjoint union). This allows pattern matching to be exhaustive: the compiler can tell us that we didn’t match all components of the union.
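For instance, with the sealed Option above, an incomplete match gets flagged (a sketch; the exact warning wording depends on the compiler version):

def describe[A](opt: Option[A]): String = opt match {
  case Some(value) => s"got $value"
  // no case for None - the compiler warns that the match may not be exhaustive
}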

Dotty will introduce union types, which will let us build a union out of already existing types.

def f(intOrString: Int | String) = intOrString match {
  case i: Int    => ...
  case s: String => ...
}

Product types and sum types together are known as algebraic data types. They are the foundation that data modeling in functional programming builds upon.

They also allow us to use generic programming in Scala with libraries like shapeless.

Compound and intersection types

We can calculate a sum of sets (a sum type) and a Cartesian product of sets (a product type), so why not an intersection? We can declare a type belonging to several types using… the with keyword (yup, it’s more than just a mixin!).

trait Str { def str: String }
trait Count { def count: Int }

def repeat(cd: Str with Count): String =
  Iterator.fill(cd.count)(cd.str).mkString

repeat(new Str with Count {
  val str = "test"
  val count = 3
})
// "testtesttest"

It follows the mathematical definition exactly:

$x \in A \cap B \iff x \in A \land x \in B \iff x \in B \land x \in A \iff x \in B \cap A$

which means that the order of composing a type using with does not matter:

val sc: Str with Count
val ca: Count with Str
repeat(sc) // works as expected
repeat(ca) // also works!

Well, at least when it comes to types. Since we can compose types this way, we have to face the diamond problem sooner or later.

trait A { def value = 10 }
trait B extends A { override def value = super.value * 2 }
trait C extends A { override def value = super.value + 2 }
(new B with C {}).value // ???
(new C with B {}).value // ???

How would Scala deal with such cases? The answer is a trait linearization:

trait X extends A with B with C

is the same as

trait AnonymousB extends A {
  // B overrides A
  override def value = super.value * 2
}
trait AnonymousC extends AnonymousB {
  // C overrides AnonymousB
  override def value = super.value + 2
}
trait X extends AnonymousC

(implementation-wise, the types B and C are not lost). If we switch the order:

trait Y extends A with C with B

then it is as if we wrote:

trait AnonymousC extends A {
  // C overrides A
  override def value = super.value + 2
}
trait AnonymousB extends AnonymousC {
  // B overrides AnonymousC
  override def value = super.value * 2
}
trait Y extends AnonymousB

It should be obvious now, that

(new B with C {}).value // (10 * 2) + 2 = 22
(new C with B {}).value // (10 + 2) * 2 = 24

In Dotty, intersection types will be denoted using &. The documentation claims that it will differ from with in that Dotty wants to guarantee there will be no difference between A & B and B & A (the operator is commutative). Currently, while A with B can be used in place of B with A and it just works, their behavior might differ due to the change of order in trait linearization. Dotty will also handle intersections of parametric types (List[A] & List[B] = List[A & B]). Because with gives no such guarantees, we call types created with it compound types instead.

Classes

In mathematics, classes came to life as a generalization of sets. In set theory, we build sets inductively from smaller elements. We cannot speak of the set of all sets, as it cannot be built in such a way. But what if we had another way of grouping objects, e.g. by some predicate?

A class is a group of objects for which some predicate (an indicator function, we could say) returns true.

Because a class does not have to be a set, we can talk e.g. about the class of all sets:

$\{ x: isSet(x) \}$

Programming languages borrowed the word class exactly from this concept. If we think about it, in all languages that have classes, a class is basically a specification of a type/set of instances, where each member must have some set of properties and methods. It is true for:

  • statically typed languages (like C/C++ where class definition dictates memory layout of an instance),
  • dynamically-typed ones (like Python or Ruby, where a class is basically a factory object that creates other objects according to the spec),
  • or prototype-based (like JavaScript, where question does it have a property/methods? is extended to does it or its prototype-object have a property/method?).

Of course, in Scala, a class (and subclass) creates a new type (and subtype). The same can be said about traits.

Unit

This one is special. In lambda calculus, there are no constants, no nullary functions, and no multi-parameter functions. Everything is a single-argument function returning another single-argument function (therefore currying is the only way of achieving a function of arity greater than 1).

Similarly, in category theory, each arrow (function abstracted away from its body) goes from one object to one object (function domains/codomains abstracted away from specific values).

In such a situation, we would have to somehow implement functions returning no (meaningful) value, as well as functions taking no parameters (nullary ones). Category theory names such irrelevant garbage input an initial object or void. It assumes that for every object you can create an arrow from void to that object (for every type there is a nullary function returning a value of this type). Since you never have the need to actually pass any value of the void type (it won’t be used anyway), the set might be pretty much empty, and an actual implementation can work around it (category theory doesn’t care about the values, remember?). Thus the name void.

Similarly, there is also a final object or unit: for each object there is an arrow from that object to unit (for every type there is a function that could be fed values of this type and doesn’t return anything useful). If it only serves the purpose of being returned, we don’t attribute it with anything meaningful. No matter how many values of this type exist, we cannot tell them apart. For us, the unit might as well be a singleton. Thus the name unit.

While the creators of Scala didn’t see a point in making void an actual type (and making people confuse it with Java’s void), they did want to make sure that every function returns something. That something is (), the sole citizen of the Unit type.
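In practice this means every Scala method returns a value, even the purely side-effecting ones:

val u: Unit = () // the only value of Unit
def log(msg: String): Unit = println(msg) // "returns nothing useful", i.e. ()
val result: Unit = log("hello") // still a value we can assign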

Type constructors and parametric types

We can think of a type as a set. But we can also think of a function as a set: a set of parameter-returned value pairs. This also lets us think of a function signature as a type: a set of all possible sets of such pairs where the arguments belong to one type and the returned values to the other (and where the first value of each pair is unique within the set):

  • type T - a set of values

  • t: T - a value belonging to the set T ($t \in T$)

  • type S => T - a set of subsets of $S \times T$ such that each element is a function $S \rightarrow T$ (which means that the first element of each pair must be unique)

    $\{ f: (f: S \rightarrow T) \} \iff \{ f: f \subset S \times T \land \forall_{(s_1, t_1), (s_2, t_2) \in f}\ s_1 \neq s_2 \}$
  • f: S => T - a subset of pairs, where the first element of each pair is unique

    $f: S \rightarrow T \iff (f \subset S \times T \land \forall_{(s_1, t_1), (s_2, t_2) \in f}\ s_1 \neq s_2)$

A little bit mind-numbing, but not as much as the next discovery: we could pass a set as an argument of a function and receive another set as a returned value.

Let’s build one such function:

  • at first, let’s build a set of all possible types:

    $T' = \{ T: isType(T) \}$
  • we can define a list of a type T as a function from an index (a natural number) to a value $t: T$ (not necessarily a total function - it has no defined values after its last index)

    $isList_T(l) = \forall_{(i, t) \in l} (i \in \mathbb{N} \land t: T \land \forall_{(i_2, t_2) \in l} (i = i_2 \implies t = t_2))$

    or shorter

    $isList_T(l) = l: \mathbb{N} \rightarrow T$
  • now, we can build a set of all possible lists of a type $T$:

    $L_T = \{ l: isList_T(l) \}$
  • the next step would be to build a set of all possible lists of a type $T$ for each possible type $T$:

    $L = \{ L_T: isType(T) \}$
  • finally, we can define a function which takes one $T \in T'$ and returns another set $L_T \in L$

    $List: T' \rightarrow L$
    $List(T) = \{ l: isList_T(l) \}$

What we built here is a function that takes a type $T$ and returns… a type $List_T$, or List[T] using Scala’s notation.

Such an interesting function, which takes a type and builds another type, is called a type constructor. In Scala, we could denote the $List$ type constructor as List[_].

Type constructors are interesting to us because they are a mathematical way of defining parametric types, known as class templates to C++ programmers and generics to Java programmers (though, to be honest, from a mathematician’s point of view they are a crippled implementation of parametric types). (You will also see this term frequently when you start reading the definitions of type classes on the Haskell wiki.)

Where parametric types differ from what you might know from C++ or Java is a consequence of their functional definition. You can have a higher-order function: a function which takes and/or returns another function. Same with type constructors: you can have a type constructor which takes and/or returns another type constructor. Many type class definitions take a type constructor as a parameter:

trait Functor[F[_]] { // <- F is a type constructor
  
  def map[A, B](fa: F[A])(f: A => B): F[B]
}
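For illustration, here is a minimal instance of that Functor for List (my own sketch, not a full type-class encoding):

val listFunctor: Functor[List] = new Functor[List] {
  def map[A, B](fa: List[A])(f: A => B): List[B] = fa.map(f)
}

listFunctor.map(List(1, 2, 3))(_ * 2) // List(2, 4, 6)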

Kinds and higher-kinded types

These types of types (a type, a type constructor, etc.) are known as kinds. Scala lets us investigate them in the REPL using :kind:

scala> :kind String
String's kind is A

scala> :kind List
List's kind is F[+A]

scala> :kind Either
Either's kind is F[+A1,+A2]

scala> import scala.language.higherKinds
scala> trait NeedTC[F[_]]
scala> :kind NeedTC
NeedTC's kind is X[F[A]]

You can add a -v flag to get a detailed description with a type-theory notation (* read as type):

scala> :kind -v String
String's kind is A
*
This is a proper type.

scala> :kind -v List
List's kind is F[+A]
* -(+)-> *
This is a type constructor: a 1st-order-kinded type.

scala> :kind -v Either
Either's kind is F[+A1,+A2]
* -(+)-> * -(+)-> *
This is a type constructor: a 1st-order-kinded type.

scala> import scala.language.higherKinds
scala> trait NeedTC[F[_]]
scala> :kind -v NeedTC
NeedTC's kind is X[F[A]]
(* -> *) -> *
This is a type constructor that takes type constructor(s): a higher-kinded type.

But functions allow us to do partial application. Can we do the same for types in Scala? Well, we can, but in a messy way.

scala> :kind ({ type T[A] = Map[String, A] })#T
scala.collection.immutable.Map[String,?]'s kind is F[+A]

scala> :kind -v ({ type T[A] = Map[String, A] })#T
scala.collection.immutable.Map[String,?]'s kind is F[+A]
* -(+)-> *
This is a type constructor: a 1st-order-kinded type.

(Such a thing is called a type lambda, but we’ll talk more about it when we learn more about structural and path-dependent types. Let’s skip the detailed explanation for now).

Among the detailed descriptions, there are terms like 1st-order-kinded types and higher-kinded types. Indeed, the same way functions taking/returning functions are called higher-order functions, type constructors taking/returning type constructors are called higher-kinded types.

Let’s take a look at how values and functions correspond to types, and how it is reflected in types and type constructors correspondence to kinds:

Value | Type | Notes
--- | --- | ---
"test" | String | a plain value
(i: Int) => i*2 | Int => Int | a 1st-order function
(a: Int) => (b: Int) => a+b | Int => Int => Int | a higher-order function
((i: Int) => i*2)(2) | Int | a plain value
((a: Int) => (b: Int) => a+b)(2) | Int => Int | a 1st-order function

and on a type level:

Type | Kind | Notes
--- | --- | ---
String | * | a proper type
Int => Int | * | a proper type (a 1st-order function)
Int => Int => Int | * | a proper type (a higher-order function)
List | * -> * | a type constructor, a 1st-order-kinded type
Either | * -> * -> * | a type constructor, a 1st-order-kinded type
List[A] | * | a proper type
({ type T[A] = Either[String, A] })#T | * -> * | a type constructor, a 1st-order-kinded type

These tables show us how elegantly values correspond to types, functions to type constructors, and function application to passing a type to a type constructor. Values can be thought of as nullary (0-argument) functions, while types can be thought of as nullary type constructors, and it still works perfectly! They are exactly the same thing, just one level of abstraction higher.

As far as I know, there are even higher levels in the values-types-kinds hierarchy, such as sorts, but they would hardly ever have practical applications.

Type constraints

We already talked about type bounds, and how we might impose restrictions on an inferred type just by giving it values to infer against. With parametric types, we are able to do it more directly.

Let’s say we have an ADT like this:

sealed trait User { val id: Int }
case class Member(id: Int, name: String) extends User
case class Admin(id: Int, access: Set[String]) extends User

We would like to have a function that takes a set of users and returns a map from id to user… except we would like that map to adjust: if we pass a set of Users it should return a Map[Int, User], if we pass a set of Members it should return a Map[Int, Member].

Our first attempt would probably look like this:

def byId(users: Set[User]): Map[Int, User] =
  users.map { u => u.id -> u }.toMap

It works, but it is not generic. We wanted it to be generic.

def byId[U](users: Set[U])(getId: U => Int): Map[Int, U] =
  users.map { u => getId(u) -> u }.toMap

It also works, but it forces us to pass an extractor every time - not a bad idea if it did something different each time, but it only needs to get the same property all our cases share via the User class. If only there was a way to suggest to the type system that U needs to extend User (or has User as an upper bound), and then make use of its properties…

As a matter of fact, there is.

def byId[U <: User](users: Set[U]): Map[Int, U] =
  users.map { u => u.id -> u }.toMap
byId(users: Set[Member]) // Map[Int, Member]

<: denotes an upper bound in type parameters. It looks like this so that the parser won’t confuse it with <, but its meaning is similar - the type on the left is smaller (lies lower in the hierarchy) than the type on the right. There is also a notation for a lower bound, >::

def recover[E, A, B >: A](
    either: Either[E, A])(f: E => B): Either[E, B] =
  either match {
    case Left(e)  => Right(f(e))
    case Right(a) => Right(a)
  }
recover[String, Admin, User](err: Either[String, Admin]) { _ =>
  fallback: Member
}
// Either[String, User]

These are the type constraints provided by the language. Besides them, there are also generalized type constraints, which exist as type classes provided by the compiler (if the compiler cannot create them, it means the types fail the constraints):

  • <:< - lets us require that one type is a subtype of the other.

    def upcast[A, B](set: Set[A])(
      implicit ev: A <:< B
    ): Set[B] = set.map(ev(_))
      
    upcast[Member, User](Set(m: Member)) // Set[User]
    

    It is defined inside scala.Predef.

  • =:= - lets us require that one type is equal to the other.

    def update[A, B](set: Set[A])(f: A => B)(
      implicit ev: A =:= B
    ): Set[B] = set.map(f)
      
    val members: Set[Member]
      
    update[Member, Member](members)(identity) // ok
    update[Member, User](members) { member =>
      member: User
    } // compilation error!
    

    It is defined inside scala.Predef as well.

  • <%< - already removed from Scala, but it used to denote that type A is either a subtype of B or could be implicitly converted to it.

  • =:!= - provided by shapeless. It proves that types are different.

If you require that there is a type class in scope for your type parameter:

def doSomething[F[_]: Monad]: F[String]

I have met people who would also call this implicit syntax for type classes a type constraint (I require of the type F to be a monad).
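That reading makes some sense, since the context bound above is just syntactic sugar for an implicit parameter (assuming some Monad type class definition is in scope; the compiler generates a fresh name instead of F):

def doSomething[F[_]](implicit F: Monad[F]): F[String] // desugared form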

Variance

Type constraints are useful when we want to describe our expectations about a type parameter and/or a returned type. But there are use cases where they are not enough.

Let’s take a look at Option. The first implementation would be something like:

sealed trait Option[T] {}
case class Some[T](value: T) extends Option[T]
case class None[T]() extends Option[T]

It is not something we are used to. We don’t want to create a new None for each type. Actually, we want just one case object, which we could use as a valid Option[T] everywhere. However, if we tried:

sealed trait Option[T] {}
case class Some[T](value: T) extends Option[T]
case object None extends Option[Nothing]

then, when we tried to use None as an Option[String] or Option[Int], we would get a type error.

def f(stringOpt: Option[String]): Unit
f(None) // compilation error!

The reason for that is that we defined Option as an invariant parametric type. Invariance means that even if B <: A, Option[B] still cannot be substituted for Option[A]. It is the default, and a good one!

A long time ago, Java let us do such things with arrays. In Java, you can create an Array[Member] and pass it as a valid parameter to a function expecting an Array[User]. That function assumes that the array can contain anything as long as it’s a User, so it can insert an Admin into the array. Then you end up with a java.lang.ArrayStoreException. Suddenly you have a reference to an array of Members that contains an Admin - the JVM recognizes such a state as invalid. To avoid a repetition of this problem, when Java introduced generics, it made them invariant without an option to change the behavior.

But the issue only appears if you share mutable state. If you have an immutable object, you should be able to e.g. pass a List[Member] as a List[User], or an Option[B] as an Option[A], and you would never end up with invalid behavior (if we modified something, we would create a modified copy, so the original would still fulfill its requirements).

A situation where we explicitly state that B :> A implies F[B] :> F[A] is called covariance. To mark a type parameter as covariant we use +:

sealed trait Option[+T] {}
case class Some[+T](value: T) extends Option[T]
case object None extends Option[Nothing]

Covariance propagates down, so if we make T in Option covariant, we will receive complaints if we don’t do the same for T in Some. Usually, when we mark the parameter as covariant and we implement the type as an object, e.g. an empty container (None, Nil, EmptySet), we use Nothing as a hardcoded type parameter - as the bottom type it subtypes all types, so such an object can always be upcast. Additionally, it is a nice explanation that this container is empty, as it is literally a container of nothing.
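Now the single None object can be used wherever any Option[T] is expected:

val noString: Option[String] = None // Option[Nothing] <: Option[String]
val noInt: Option[Int] = None       // the same object serves every T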

(As we can guess, most immutable collections are covariant, while most, if not all, mutable collections are invariant).

It is quite common to have a situation where we want to pass a value of a specific parametric type and generalize it. What about the opposite situation, where we have a consumer of parametric type values and we want to specialize it:

trait Consumer[T] {
  def consume(value: T): Unit
}

val members: List[Member]
def consumer: Consumer[User]
members.foreach(consumer.consume) // error!

A case where we want to say that A <: B implies F[A] >: F[B] is called contravariance, as it is the opposite of covariance. We denote it with the - sign:

trait Consumer[-T] {
  def consume(value: T): Unit
}

val members: List[Member]
def consumer: Consumer[User]
members.foreach(consumer.consume) // works

By analogy to covariance and the bottom type Nothing, here Any (the top type) would be the type parameter of a consumer that consumes any value.

Covariance and contravariance are very important to functional programming in Scala. They allow for intuitive behavior of the central element of the function-as-a-first-class-citizen approach, that is, the function itself:

trait Function[-A, +B] {
  def apply(a: A): B
}

It is obvious to us that a function that takes a User should also take a Member, and a function returning a Member could be used as a substitute for a function returning a User. This also shows us that invariance, covariance, and contravariance - known together as variance - describe a type parameter and not the whole type. Here, A is a contravariant type parameter of Function while B is a covariant one.

(Actually, Scala’s function definition looks a little bit different - and depends on its arity - but the principle holds. It also holds for a type derived from a function, a PartialFunction).
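A sketch of both directions at once, reusing the User/Member ADT from the type constraints section:

val idOf: User => Int = _.id
val memberId: Member => Int = idOf // contravariant input: a function on User accepts Members

val makeMember: String => Member = Member(0, _)
val makeUser: String => User = makeMember // covariant output: a Member is a User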

Type theory also allows bivariance - a case where the type parameter is covariant and contravariant at the same time - but Scala doesn’t allow such a situation.

Existential types

We talked a bit about type parameters, how we can put constraints on them, and how they translate into the whole type being a super/subtype. But what should we do when we don’t care?

def count[T](seqs: Seq[T]*): Int = seqs.map(_.size).sum
count(Seq(1,2,3), Seq("test")) // 4

Here, we don’t actually use the type T at all. It should work for all Ts we pass (or, if we put some type constraints there, for all Ts that match them). So, we can describe this function with a universal quantifier:

$\forall_T\ count: Seq_{Seq_T} \rightarrow int$

Because we can describe it with a universal quantifier, we could call the type Seq[T] (and Seq[Seq[T]]) a universal type.

But what if our function count was a part of another function or a class, this was the only place where we needed T, and we would have to pass it down anyway? Or what if we wanted to store sequences in a map, but we didn’t care about the type, because each of them could be of some other type (ClassTag[T] -> Set[T])?

If it was Java, we could use ? to indicate, that we don’t care about the type:

int count(java.util.List<?>... seqs) {
   return Arrays.stream(seqs).mapToInt(seq -> seq.size()).sum();
}

This ? tells us something like: we know that there is some concrete T we could put here, but we don’t care about it at the moment. We don’t know any details about this T and, to be honest, we don’t care. All we need is the assumption that there is some. In math, we could express that with an existential quantifier:

$seq: Seq_? \iff \exists_{T}\ seq: Seq_T$
$count: Seq_{Seq_?} \rightarrow int$

By analogy, we could name such a type an existential type. Scala lets us express it in two ways:

def count(seqs: (Seq[T] forSome {type T})*): Int = 
  seqs.map(_.size).sum

forSome is a syntax existing solely for existential types, I believe. The other, shorter version is:

def count(seqs: Seq[_]*): Int = 
  seqs.map(_.size).sum

which looks dangerously similar to the type constructor syntax. As a matter of fact, we can distinguish these 2 different usages merely by the context in which they appear: if F[_] appears in the definition of a type parameter, it is a type constructor; if it is used as the type of a variable/parameter, it is an existential type.
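The map-of-sequences idea mentioned above could then be sketched like this (my own example):

val seqs: Map[String, Seq[_]] = Map(
  "ints"    -> Seq(1, 2, 3),
  "strings" -> Seq("a", "b")
)

seqs.values.map(_.size).sum // 5 - size doesn't need to know the element types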

There are a few things that would be difficult to implement without existential types (fixed-point types in recursion schemes, I guess?), but in our everyday life we don’t use them that often. During his work on Dotty, Odersky found out that some of their applications are among the things that were making the language unsound, which is why the forSome notation is removed from Dotty.

A mathematician says that some theory is sound when all formulas in the theory can be logically derived from the axioms and rules of the theory.

In the case of a type system, it means that if our type system decides that some expression has a type T, then when you run it, the returned value will also be of type T.

It might be surprising for some, but (besides null, which breaks most type systems that allow it, and everybody is aware of that) there are a lot of cases where a language can be broken and forced to return a value of an invalid type.

F-bound types

The last type-parameter-related types I want to talk about are F-bound types. The name originates from the System F type system, which was the first one that allowed their existence. But what are they and what problem do they solve? Let’s look at an example:

trait Number {
  
  def +(number: Number): Number
  def *(number: Number): Number
}

class Int extends Number {
  
  def +(number: Number): Number
  def *(number: Number): Number
}

class Double extends Number {
  
  def +(number: Number): Number
  def *(number: Number): Number
}
val a: Int
val b: Int
val c = a + b // Number!

In our example, if we defined our interface without any type parameters, we would end up with a situation where any operation on a specific Number would lose the information about the specific type. Also, without any notice, it would allow additions of unrelated implementations:

val a: Char // Char extends Number
val b: Double
val c = a + b // Number, but what Number exactly?

We might make our type Number parametric so that we could control the type of parameters and returned values:

trait Number[N] {
  
  def +(number: N): N
  def *(number: N): N
}

class Int extends Number[Int] {
  
  def +(number: Int): Int
  def *(number: Int): Int
}

class Double extends Number[Double] {
  
  def +(number: Double): Double
  def *(number: Double): Double
}
val a: Int
val b: Int
val c = a + b // Int!

Issue with operations on unrelated implementations also disappears:

val a: Char
val b: Double
val c = a + b // error!

However, now we have a new issue:

class User extends Number[Int] // why is that a thing?

What we need here is some sort of restriction, where we require that the type parameter must be the type that extends the parameterized type. In Scala, we can do it in 2 ways:

  • using self-types:

    trait Number[N] { self: N =>
      // ...
    }
    

    (we’ll talk more about self-types later on),

  • using type constraints:

    trait Number[N <: Number[_]] {
      // ...
    }
    

    (Here Number[_] is an existential type, not a type constructor).

    This definition relies on a certain limitation of Scala (and Java) - we cannot implement the same parametrized type with different parameters in one class. Therefore:

    class Double extends Number[Double]
    class X extends Number[Double] // fails - different type params in Numbers
    class Y extends Number[String] // fails - string is not a Number
    

Both ways help us ensure that the implementation is type-recursive.
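Here is a compilable sketch of my own, combining both techniques, where the concrete type survives the operation:

trait Doubling[N <: Doubling[N]] { self: N =>
  def value: Int
  def make(value: Int): N
  def double: N = make(value * 2) // returns the concrete subtype, not Doubling
}

final case class Meters(value: Int) extends Doubling[Meters] {
  def make(value: Int): Meters = Meters(value)
}

Meters(21).double // Meters(42), typed as Meters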

This is quite a popular way of solving the initial problem in languages that don’t support (or encourage) type classes. In Java, it is the built-in method of implementing enums (enum X becomes abstract class X extends Enum[X]). C++ knows F-bound types under the name Curiously Recurring Template Pattern (CRTP).

This pattern is certainly less encouraged in languages that DO support type classes (and do not support classes and inheritance, which means half the FP languages).

Structural and refined types

We know that in math a class is a collection of objects that fulfill some conditions, and that programming borrowed the idea. But what if we wanted to specify these conditions without giving them a name? In TypeScript, you can define a type like:

type User = { name: string, surname: string }

By most OOP languages’ standards, it is not a class. But in TypeScript and Go (and in mathematics), this is an allowed method of creating a collection of values by a predicate (has a name property of type string and a surname property of type string) - in other words, a type.

Scala also allows us to define a type this way:

type User = {
  val name: String
  val surname: String
}

We don’t have to extend User to make a value conform to the type - it is enough to fulfill the requirements:

case class Somebody(name: String, surname: String)

def needUser(user: User): Unit

needUser(Somebody("test", "test")) // works!

We can call User a structural type. It is an interesting way of implementing a type, one where the compiler checks if an object has all the right methods and fields instead of scanning the class hierarchy. A problem with such an approach is that at runtime Scala has to use reflection to access these fields, so it comes with a performance penalty.

Structural types are not restricted to definitions using only vals and defs. We can demand that a value extends some class as well:

trait X { val x: String }
type Y = { val y: Int }
val z: X with Y = new X { val x = "test"; val y = 0 }

So a compound type containing a structural type is also a structural type.

The example above shows us another interesting thing. If we asked a REPL about a type of:

new X { val x = "test"; val y = 0 }

the answer would be:

AnyRef with X{val y: Int}

Scala automatically adds the information about the new fields and methods to the type! Actually, any time we add something extra to the interface, if we don’t upcast the returned value, Scala will reflect these extra properties in the type. We call such types refined types, but they are just a special case of structural types.

It appears refined types are not unique to Scala. When local type inference arrived in Java, it also started to support them, though in a half-baked way. You cannot declare a structural type, so such types can only be used via vars, which hold the information about the refinement. This incomplete implementation can lead some Java programmers to think that such a feature is dangerous.

Path-dependent types

Let’s say we want to model a card game:

case class Card(color: String, value: String)
case class Player(name: String)
class Game {
  
  def getPlayers: Set[Player] = ???
  
  def getCards: Set[Card] = ???
  
  def playerPlayCard(player: Player, card: Card): Unit = ???
}

However, in our application there could be several games running at once, so we are at risk of messing up objects from different games:

val game1: Game
val game2: Game
game1.playerPlayCard(game2.getPlayers.head,
                     game2.getCards.head)

Could we make sure that we can only use objects from the right Game, and do it at compile time (assertions thrown at runtime might be a little too late for our needs)?

Path-dependent types are a way of stating that the type of one object depends on another object. They are a limited version of the more general idea of dependent types. Actually, Dotty takes its name from the Dependent Object Types (DOT) calculus that Odersky designed to make future versions of Scala sound.

So, how could we define such type?

class Game {
  
  case class Card(color: String, value: String)
  case class Player(name: String)
  
  def getPlayers: Set[this.Player] = ???
  
  def getCards: Set[this.Card] = ???
  
  def playerPlayCard(player: this.Player, card: this.Card): Unit = ???
}

Let’s check what types the values returned from game1 and game2 will have:

val game1 = new Game
val game2 = new Game
game1.getPlayers // Set[game1.Player]
game2.getPlayers // Set[game2.Player]

We got different types, so we can expect that we cannot substitute one for the other:

game1.playerPlayCard(game2.getPlayers.head,
                     game2.getCards.head) // fails!

To create a path-dependent type, all we need is to declare a type inside another type! It doesn’t have to be a class:

class X {
  type Y = String
  val y: Y = "y"
}

val x1 = new X
val x2 = new X

However, if it’s just a type alias, Scala can prove that 2 types are the same, and so it will not help us much:

def y(x: X)(y: x.Y): Unit = ()

y(x1)(x2.y) // no complaints: x1.Y = String = x2.Y

That is, unless we make it a little less obvious that the types are equal:

trait X {
  type Y = String
  val y: Y = "y"
}

class X2 extends X

val x1 = new X2
val x2 = new X2

y(x1)(x2.y) // fails!

It is not obvious anymore because we could e.g. make type Y abstract:

trait X {
  type Y
  val y: Y
}

val x1 = new X {
  type Y = String
  val y: Y = "y"
}
val x2 = new X {
  type Y = Int
  val y: Y = 1
}

OK, but what if we wanted to loosen our requirements at some point?

def takeAnyPlayer(p: ???): Unit

We need a way of passing a path-dependent type without its context when needed. Such an ability is granted to us via #:

def takeAnyPlayer(p: Game#Player): Unit

At this point, there is very little we know for certain about our p. Scala most likely won’t be able to tell what the exact type is, even if it would be obvious to you from the code. If there are some type constraints on the type that would be guaranteed, you can rely on them. But anything that comes with the specification of a path-dependent type is lost:

trait X {
  type Y <: AnyVal { def toLong: Long }
  val y: Y
}

val x1 = new X {
  override type Y = Int
  override val y: Y = 1
}
val x2 = new X {
  override type Y = Long
  override val y: Y = 1L
}

Hmm, here our type Y is defined directly inside an X instance, so Scala should be able to guess its exact type. We can confirm that indeed it can:

x1.y * x1.y // 1: Int
x2.y * x2.y // 1: Long

The same goes for #:

scala> trait X { type Y }
scala> :kind (X { type Y = Int })#Y
// Int's kind is A

We can use path-dependent types as method return types (path-dependent methods). However, we are not allowed to create a function which uses path-dependent types. That will be introduced in Dotty under the name dependent function types.

Kind projector/type lambda

At this point, we have enough information to understand what was happening inside the weird construction we used for partial application of a type constructor with multiple type parameters:

scala> :kind ({ type T[A] = Either[String, A] })#T
scala.util.Either[String,?]'s kind is F[+A]

We start with Either[_, _] and we want to partially apply String as the first type parameter. In order to do so, we:

  • create a type alias T[A] = Either[String, A], which creates a parametric type T[A] with one type parameter,
  • we put it inside a structural type to create an opportunity for creating a path-dependent type,
  • finally we extract T as a path-dependent type, such that Scala can tell its specific type exactly,
  • since we didn’t apply the type parameter, we end up with a single-parameter type constructor,
  • we achieved partial application of type parameters to a type constructor, aka a type lambda or kind projection (see the sketch below).
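Here is the promised sketch, reusing the Functor trait from before (Either's own map requires Scala 2.12+):

val eitherFunctor: Functor[({ type T[A] = Either[String, A] })#T] =
  new Functor[({ type T[A] = Either[String, A] })#T] {
    def map[A, B](fa: Either[String, A])(f: A => B): Either[String, B] =
      fa.map(f)
  }

eitherFunctor.map(Right(2))(_ * 3) // Right(6)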

Quite often kind projectors use λ to denote the output type. Because the syntax is quite messy, you might prefer to use a dedicated compiler plugin that would generate it for you. With the kind-projector compiler plugin you would need to write just:

Either[String, ?]

In Dotty, type lambdas are supported natively with syntax:

[T] => Either[String, T]

Kind projectors become more useful the more you dive into hardcore functional programming as promoted by Cats and Scalaz.

Self-types

Mixins. Sometimes we want to add some functionality to an existing class, and traits are a great way of achieving that:

class Producer {
  
  def produce: Int = 1
}

trait DoubleProducer extends Producer {
  
  override def produce: Int = super.produce * 2
}

trait IncreaseProducer extends Producer {
  
  override def produce: Int = super.produce + 1
}

class MyProducer extends Producer with DoubleProducer with IncreaseProducer

(new MyProducer).produce // 3

The thing is, if we do it that way, we cannot make Producer a trait or an abstract class. Perhaps we would like to use it only as an interface, and then decorate one of its (possibly many) implementations.

In Scala, we sometimes need to refer to the current instance. Usually, we might use this, but if we nest type definitions, this will only point to the innermost one we are currently inside. That makes it hard to access the outer object. We can give a name to this to make it less ambiguous:

trait Outer { self =>
  
  trait Inner {
    self // Outer
    this // self.Inner
  }
}

However, we might also use a type ascription to put a constraint on this self value (or whatever we decide to name it):

trait Producer {
  
  def produce: Int
}

trait DoubleProducer { self: Producer =>
  
  def doubleProduce: Int = produce * 2
}

As we can see, we no longer need to extend the decorated type - the type ascription already allows us to access the decorated type’s members (here, produce). Additionally, the self-type acts as a type constraint on any class that would like to mix the trait in.

class OneProducer extends Producer {
  def produce: Int = 0
}

class MyProducer extends OneProducer with DoubleProducer // OK
class Invalid extends User with DoubleProducer // error!

Self-types might be used to enforce the ordering of low-priority implicits:

object MyType extends MyTypeImplicits

trait MyTypeImplicits extends MyTypeLowPriorityImplicits {
  
  // implicits
}

trait MyTypeLowPriorityImplicits { self: MyTypeImplicits =>
  
  // low-priority implicits
}

That is not the only use case though. Some people use self-types as a way of type-safe dependency injection:

trait FirstServiceComponent {
  
  trait FirstService
  val firstService: FirstService
}

trait SecondServiceComponent {
  
  trait SecondService
  val secondService: SecondService
}
trait FirstServiceComponentImpl extends FirstServiceComponent {
    self: SecondServiceComponent =>
  
  val firstService = new FirstService {
    // use secondService
  }
}

trait SecondServiceComponentImpl extends SecondServiceComponent {
    self: FirstServiceComponent => 
  
  val secondService = new SecondService {
    // use firstService
  }
}

object Application
    extends FirstServiceComponentImpl
    with SecondServiceComponentImpl

Each component of such a construct is called a layer. When we compose layers together, we are baking a cake. This metaphor is the origin of the name of this pattern - the cake pattern. I, like many other programmers, discovered that such a bastardized mixin causes more trouble than it solves. I have already described the problems with extension, maintenance, and initialization order in another article.

Creating new types out of existing: tagged and value types

Sometimes we don’t want to create a completely new product with 2 or more fields, nor a sum type. We also don’t want to use a type alias, which doesn’t create a distinct type. We might want something like Surname or Name, which is like String but with no risk of accidentally passing values the wrong way (a surname as a name, a name as a surname).

One way of handling this issue was designed by Miles Sabin for shapeless; the other one was provided by the language authors.

The solution proposed by Miles Sabin works by tricking the compiler into believing that our value is of a type it is not:

object tag {
  def apply[U] = new Tagger[U]

  trait Tagged[U]
  type @@[+T, U] = T with Tagged[U]

  class Tagger[U] {
    def apply[T](t : T) : T @@ U = t.asInstanceOf[T @@ U]
  }
}

To understand this, let us see an example:

sealed trait Name
sealed trait Surname

val name = tag[Name]("John") // String @@ Name
val surname = tag[Surname]("Smith") // String @@ Surname

This @@ construct takes 2 types: our initial type, of which we provide a value, and another type called a tag, which serves as a way of making the newly created type unique. We can always demand a type String with Tagged[Name] (or String @@ Name if you prefer) - the fact that String is final is checked only during class declaration, and a mismatch with String with Tagged[Name] could only be found at runtime.

Meanwhile, the only methods we will have declared will be the methods from String - which our value surely is. The JVM will never have an opportunity to complain, so as long as the compiler is OK with it, it will work. It appears that .asInstanceOf[String @@ Name] is enough to silence it.

Tagged types require that we define traits/classes that will serve as tags. They can quite often be sealed traits that you won’t use anywhere else. As long as you have the tags, you can easily lift any type to a tagged type. At runtime, the values will be untagged though, which might be an advantage or a disadvantage depending on the situation (runtime reflection!). At compile time they will be distinct types, so you have to take that into consideration if you are doing any type-level programming - type classes provided for primitives won’t work with tagged types (there is no one standard for tagging) and you will have to handle it yourself.

To provide a standard solution to the mentioned problem, Scala came up with value types: basically, a class extending AnyVal with one val member.

// for convenience people prefer to use case classes for it
case class Name(value: String) extends AnyVal

// though it is not required
class Surname(val value: String) extends AnyVal
object Surname {
  def apply(value: String): Surname = new Surname(value)
}

Since there are a few restrictions imposed on this class, the compiler has a lot of opportunities to optimize the wrapper away. And since it is the standard solution, many libraries handle it out of the box. The disadvantage is that the optimization doesn’t work reliably, so e.g. mocking is almost completely broken (because sometimes the value is boxed and sometimes the wrapper is optimized away).
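The boxing cases are well defined, though easy to hit; a sketch of common ones (my own example):

case class Name(value: String) extends AnyVal

val names: List[Name] = List(Name("John")) // used as a type argument: instances get boxed
val any: Any = Name("John")                // upcast to a supertype: boxed as well
val array: Array[Name] = Array(Name("John")) // arrays of value classes box their elements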

As some sort of middle ground, the newtype library was designed.

@newtype case class Name(value: String)

It lets you declare your new type as a case class (which helps with IDE support), while a macro annotation rewrites it into:

type Name = Name.Type
object Name {
  type Repr = String
  type Base = Any { type Name$newtype }
  trait Tag extends Any
  type Type <: Base with Tag

  def apply(x: String): Name = x.asInstanceOf[Name]

  // here AnyVal is used as extension methods optimization
  implicit final class Ops$newtype(val $this$: Type) extends AnyVal {
    def value: String = $this$.asInstanceOf[String]
  }
}

Phantom and literal types

Defining sealed traits for each tag pollutes the JVM with unused interfaces. Could that be avoided? Well, it is planned.

There is a pattern for implementing something that is basically a type-level state machine.

import shapeless._ // provides the =:!= evidence used below

sealed trait Switch
sealed trait On extends Switch
sealed trait Off extends Switch

class Bulb[S <: Switch] {
  
  def turnOn(implicit ev: S =:!= On): Bulb[On] =
    new Bulb[On]
  
  def turnOff(implicit ev: S =:!= Off): Bulb[Off] =
    new Bulb[Off]
}

val bulb = new Bulb[Off]
bulb.turnOff // error!
bulb.turnOn // Bulb[On]
bulb.turnOn.turnOn // error!
bulb.turnOn.turnOff // Bulb[Off]

Because Switch, On and Off are not used for anything other than indicating the state at the type level, they are named phantom types. Currently, it is just a pattern, but Dotty will let us mark a term as erased, which makes it exist only at compile time.

Another type-level-specific thing that will arrive - this time with Scala 2.13 - is the literal type. With libraries like shapeless, you can express e.g. a case class as an HList. The thing is, it only stores information about the types and their positions. There is no information about the properties’ names, so generating things like JSON codecs would be impossible. Or it would be, if not for LabelledGeneric, which returns an HList of fields tagged with Witness types. Wait, what is a Witness? Let’s try to figure it out from an example:

import shapeless._

case class Test(a: String, b: Int)
val test = Test("xyz", 1)

def getGeneric[T, U](
  t: T)(implicit gen: Generic.Aux[T, U]): U = gen.to(t)
def getLabelledGeneric[T, U](
  t: T)(implicit gen: LabelledGeneric.Aux[T, U]): U = gen.to(t)

getGeneric(test) // String :: Int :: HNil = "xyz" :: 1 :: HNil
getLabelledGeneric(test) 
// labelled.FieldType[Symbol @@ a, String] ::
//   labelled.FieldType[Symbol @@ b, Int] ::
//   ops.hlist.ZipWithKeys.hnilZipWithKeys.Out =
//     "xyz" :: 1 :: HNil

Interestingly, LabelledGeneric created an instance which stores the name of the field it originated from! (FieldType works the same way as tagged types - no extra information at runtime, but extra data at compile time).

It appears that the compiler can create a singleton type of just one value. Since it is defined by its literal, it is also known as a literal type. But even though the compiler knows such types, it doesn’t let us create one… without some hacking.

Witness is a helper trait whose implementation is provided by a shapeless macro:

trait Witness extends Serializable {
  type T
  val value: T {}
}

object Witness extends Dynamic {
  type Aux[T0] = Witness { type T = T0 }
  type Lt[Lub] = Witness { type T <: Lub }

  implicit def apply[T]: Witness.Aux[T] =
    macro SingletonTypeMacros.materializeImpl[T]

  implicit def apply[T](t: T): Witness.Lt[T] =
    macro SingletonTypeMacros.convertImpl
  
  // ...
}

This macro lets us access literal types today, even though they are so far accessible only via compiler internals. With Scala 2.13 they will become available in the language without costly hacks.
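Once that lands (SIP-23), sketches like this should compile natively, without shapeless:

val one: 1 = 1 // 1 used as a type, inhabited by the single value 1
def needOne(x: 1): Int = x + 1
// val two: 1 = 2 // would not compile: 2 is not of type 1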

Summary

In this article, I intended to talk a bit about types in Scala from the perspective of type and set theories, but also explain certain idiosyncrasies of Scala when it comes to implementing them.

We can see that Scala has quite a powerful type system, which is only going to get better. Even on its own, it is enough to give Scala a competitive edge over many other languages. It has allowed the development of many features which would otherwise be hard to implement, fragile, or impossible.