Tagged or AnyVal?

When we want to describe our domain better, at some point we might start using types to describe what each value means. String, Int, or Double tell us everything about what we can do with a value, but do they really explain the context?

Motives

Most style guides would tell us that a variable name should explain its purpose. So

val name: String = "John"
val surname: String = "Smith"

is a good way of conveying the intent. Still, one can make an error like

def logUser(name: String, surname: String): Unit

logUser(surname, name) // compiles, although the arguments are swapped

We could argue that named parameters would make the error obvious, but we all know that sooner or later someone will make such a mistake anyway. Some languages allow us to create new types, which would make the code look like

val name: Name = "John"
val surname: Surname = "Smith"

def logUser(name: Name, surname: Surname): Unit

logUser(surname, name) // compile error

Unfortunately, a type alias is just an alias. While type Surname = String would make the code more readable, it doesn’t improve type safety.
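Here is a minimal illustration. The String-returning logUser is a hypothetical variant, chosen so the result can be inspected. The aliases let the swapped call through:

```scala
object AliasDemo {
  // Type aliases: Name and Surname are just other spellings of String
  type Name = String
  type Surname = String

  // Hypothetical variant of logUser that returns the line instead of printing it
  def logUser(name: Name, surname: Surname): String = s"$name $surname"

  val name: Name = "John"
  val surname: Surname = "Smith"

  // Compiles without complaint: the compiler only sees String and String
  val swapped: String = logUser(surname, name)
}
```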

Tagged types

One of the attempts to address this problem was made by Miles Sabin, who posted a gist with an interesting hack exploiting the Scala compiler and the JVM.

trait Name
trait Surname

type Tagged[U] = { type Tag = U }
type @@[T, U] = T with Tagged[U]

// ... declarations of helper tag[U](tValue) utils ...

val name: String @@ Name = tag[Name]("John")
val surname: String @@ Surname = tag[Surname]("Smith")

def logUser(name: String @@ Name, surname: String @@ Surname): Unit

logUser(name, surname) // compiles
logUser(surname, name) // compile error
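The tag helper is omitted above, so here is a self-contained sketch. The Tagger class and the tag object are assumptions modeled on the usual shape of such helpers (Shapeless uses a similar tag[U](value) syntax). They are not the exact code from the gist. logUser returns its line instead of printing it so the result can be checked:

```scala
object TaggedDemo {
  trait Name
  trait Surname

  type Tagged[U] = { type Tag = U }
  type @@[T, U] = T with Tagged[U]

  // Hypothetical helper: tag[U](value) is nothing more than a cast
  class Tagger[U] {
    def apply[T](value: T): T @@ U = value.asInstanceOf[T @@ U]
  }
  object tag {
    def apply[U]: Tagger[U] = new Tagger[U]
  }

  // Returns the line instead of printing it, so the result can be checked
  def logUser(name: String @@ Name, surname: String @@ Surname): String =
    s"$name $surname"

  val name: String @@ Name = tag[Name]("John")
  val surname: String @@ Surname = tag[Surname]("Smith")

  val logged: String = logUser(name, surname)
  // logUser(surname, name) // does not compile: type mismatch
}
```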

How does this work? Let’s take a look at the code that would tag our value (omitted above):

"John".asInstanceOf[String @@ Name]

which is equivalent to:

"John".asInstanceOf[String with Tagged[Name]]

which in turn expands to:

"John".asInstanceOf[String with { type Tag = Name }]

Ok, so what does it mean?

{ type Tag = Name } is a structural type. This means that Scala may use reflection to access the members and methods declared by this ad hoc type. For that reason, if we use WartRemover, it will warn us that this piece of code is discouraged as inefficient. But is it really inefficient?

When it comes to structural types, Scala uses reflection when we attempt to access their members. With A with { def show: String }, calling show requires the runtime to use reflection to find a show method on the actual object. But { type Tag = U } declares no methods and no value members, only a type member, which exists solely at compile time! There is simply no circumstance under which anything would have to be looked up, or could turn out to be missing, at runtime.
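Here is a small sketch of the difference, with a purely illustrative Dog class. Calling show on a structurally typed parameter is compiled into a reflective method call, and the scala.language.reflectiveCalls import silences the corresponding feature warning. A value whose type only carries the { type Tag = U } refinement is used through ordinary String methods:

```scala
import scala.language.reflectiveCalls

object StructuralDemo {
  class Dog { def show: String = "Woof" }

  // Structural type with a method: x.show is invoked via reflection at runtime
  def describe(x: { def show: String }): String = x.show

  type Tagged[U] = { type Tag = U }
  trait Name

  // Refinement with only a type member: nothing to look up at runtime,
  // so length is an ordinary call on java.lang.String
  def nameLength(name: String with Tagged[Name]): Int = name.length

  val johnLength: Int = nameLength("John".asInstanceOf[String with Tagged[Name]])
}
```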

On the other hand, { type Tag = Name } is different from { type Tag = Surname }, so the compiler won’t allow them to be used interchangeably. This holds even when they are used as mixins: String with Tagged[Name] is not the same type as String with Tagged[Surname].

But why does the compiler allow us to create such constructs? String is final, so nothing should be able to extend it, right?

Well, the compiler’s logic here is a bit different:

  • x belongs to A with B iff x belongs to A and x belongs to B,

    $$x \in A \cap B \iff (x \in A) \land (x \in B)$$
  • so "John" is String with { type Tag = Name } iff "John" is String (obviously) and "John" is { type Tag = Name },

    $$\text{"John"} \in String \cap \{ x : x.Tag = Name \} \iff (\text{"John"} \in String) \land (\text{"John"} \in \{ x : x.Tag = Name \})$$
    $$\text{"John"} \in String = \text{true}$$
  • { type Tag = Name } is an interesting construct. If we try to pass any existing value where { type Tag = Name } is expected, the compiler complains that the type doesn’t match. But we can tell the compiler to treat "John" as { type Tag = Name } with a cast:
    "John".asInstanceOf[{ type Tag = Name }] // compiles!
    
  • so, after the cast, "John" is both a String and a { type Tag = Name }, hence "John" is a String with { type Tag = Name },

  • ergo, the type is valid, and the fact that String is final is completely irrelevant.
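The reasoning above can be checked with a short sketch. The object and value names are illustrative. A plain assignment to the refinement is rejected, but the cast is accepted even though String is final. The result is the very same String instance, not an instance of some subclass:

```scala
object FinalDemo {
  trait Name
  type Tagged[U] = { type Tag = U }

  // val rejected: Tagged[Name] = "John" // does not compile: String has no Tag member

  // Accepted once we cast, even though String is final
  val tagged: String with Tagged[Name] = "John".asInstanceOf[String with Tagged[Name]]

  // The cast creates no subclass and no new object: it is the same String instance
  val sameObject: Boolean = tagged eq "John"
}
```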

@@ is just a nice alias that makes use of infix notation:

  • type @@[T, U] = T with Tagged[U],
  • so String with Tagged[Name] can be accessed as @@[String, Name],
  • @@[String, Name] could be written with infix notation as String @@ Name.
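The compiler can check both spellings and the expansion of @@. In this sketch, the =:= evidence compiles only because @@ is a plain type alias. The <:< evidence compiles only because a tagged String is a subtype of String:

```scala
object AliasSpellingDemo {
  trait Name
  type Tagged[U] = { type Tag = U }
  type @@[T, U] = T with Tagged[U]

  // Prefix spelling of the tagged type
  val prefix: @@[String, Name] = "John".asInstanceOf[@@[String, Name]]
  // Infix spelling of the very same type: the assignment needs no cast
  val infix: String @@ Name = prefix

  // Compiles only because @@ is a plain alias for the compound type
  val sameType = implicitly[(String @@ Name) =:= (String with Tagged[Name])]
  // A tagged String is still a String; <:< evidence doubles as a conversion function
  val asString = implicitly[(String @@ Name) <:< String]
}
```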

Tagged to go

What are the selling points of tagged types?

  • they exist only at compile time — at runtime String @@ Name degenerates into just String, so there is no performance penalty,
  • since tagging is just a matter of casting, you can lift anything, even types that weren’t designed for it; e.g., you can take an existing type class instance and cast it into a tagged representation,
  • they are so easy to implement that one can add them to a project in about 5 minutes. Then, if used consistently, they help model the domain better and make invalid logic harder to write by accident.
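Here is a sketch of the first two points. liftTagged is a hypothetical helper, not part of any library. The tagged value's runtime class is plain String. An existing Ordering[String] is reused for tagged names with a single cast, which is sound only because the tag does not exist at runtime:

```scala
object TaggedToGo {
  trait Name
  type Tagged[U] = { type Tag = U }
  type @@[T, U] = T with Tagged[U]

  val name: String @@ Name = "John".asInstanceOf[String @@ Name]

  // At runtime the tag is gone: the value is a plain java.lang.String
  val runtimeClass: Class[_] = name.getClass

  // Hypothetical helper: reuse an existing F[T] as F[T @@ U] with a cast.
  // This is only sound because a tagged value is the untagged value at runtime.
  def liftTagged[F[_], T, U](instance: F[T]): F[T @@ U] =
    instance.asInstanceOf[F[T @@ U]]

  val nameOrdering: Ordering[String @@ Name] =
    liftTagged[Ordering, String, Name](Ordering[String])

  val names: List[String @@ Name] =
    List("Zoe", "Adam", "John").map(_.asInstanceOf[String @@ Name])

  val maxName: String @@ Name = names.max(nameOrdering)
}
```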

Tagged @@ Nope

So it works, great, but are there any downsides? Well, there are:

  • tagged types are a compiler hack. As such, they are not standardized, and there is no single implementation one could use in all codebases. I know of the Shapeless, Scalaz and SoftwareMill ones, not to mention ad hoc implementations with their authors’ own tweaks posted on blogs,
  • as a result, hardly any library supports them out of the box. When I used tagged types with Circe or Slick, I ended up writing my own extension methods, which were basically lifting existing type classes,
  • libraries that do rely on tagged types - like Shapeless - assume that they are used internally and not directly by the programmer. In the current version of Shapeless (2.3.2) labelled generics don’t work with tagged types - it was fixed in a snapshot version, but if you try to use type class derivation with tagged types on the stable version, you’ll end up with compiler errors,
  • type classes are often invariant, so even if String @@ Name can be passed everywhere plain String fits, Rep[String @@ Name] is not Rep[String], so, e.g., Slick extension methods for a String database column won’t work.
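The invariance problem can be reproduced without Slick. In this sketch, Show is a hypothetical invariant type class standing in for something like a column mapping or an encoder. The compiler does not pick up Show[String] for String @@ Name. The value either has to be upcast, or the instance has to be lifted by hand:

```scala
object InvarianceDemo {
  trait Name
  type Tagged[U] = { type Tag = U }
  type @@[T, U] = T with Tagged[U]

  // Hypothetical invariant type class, standing in for an encoder or a column mapping
  trait Show[A] { def show(a: A): String }

  implicit val showString: Show[String] = new Show[String] {
    def show(a: String): String = s"String($a)"
  }

  def render[A](a: A)(implicit s: Show[A]): String = s.show(a)

  val name: String @@ Name = "John".asInstanceOf[String @@ Name]

  // Upcasting to String picks up Show[String]
  val viaUpcast: String = render(name: String)

  // Without this lifted instance, render(name) would not compile:
  // Show is invariant, so Show[String] is not a Show[String @@ Name]
  implicit val showName: Show[String @@ Name] =
    showString.asInstanceOf[Show[String @@ Name]]

  val viaLifted: String = render(name)
}
```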

AnyVals

So the community expressed a need to create new types that would provide compile-time safety and no performance penalty, and that could be defined in one standardized way. The response was the introduction of value classes extending AnyVal.

The premise is simple: one defines a new type with the following syntax:

class Name(val value: String) extends AnyVal

and the language will make sure that new Name("John") exists only at compile time: the emitted JVM bytecode will see only the plain String "John".

One would use it like:

val name: Name = new Name("John")

Of course, Name is not a String, so if one wanted to provide some Stringy operation, one would have to add methods or extract the wrapped value:

name.value.toUpperCase
// or
// class Name(val value: String) extends AnyVal {
//   def toUpperCase: String = value.toUpperCase
// }
name.toUpperCase

On the one hand, it’s a bit more inconvenient; on the other, it is more consistent with how everything else works.
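For comparison with the tagged version, here is a sketch of the Name/Surname example with value classes. logUser returns a String only so that the result can be checked:

```scala
object AnyValDemo {
  final class Name(val value: String) extends AnyVal {
    // Stringy operations have to be forwarded explicitly
    def toUpperCase: String = value.toUpperCase
  }
  final class Surname(val value: String) extends AnyVal

  def logUser(name: Name, surname: Surname): String =
    s"${name.value} ${surname.value}"

  val logged: String = logUser(new Name("John"), new Surname("Smith"))
  // logUser(new Surname("Smith"), new Name("John")) // does not compile
  val shouted: String = new Name("John").toUpperCase
}
```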

AnyVal for the win

Reasons to prefer value classes are:

  • as a standardized solution, basically all libraries that allow the use of custom types support value classes out of the box: Shapeless, Circe, Slick claim support for AnyVals,
  • unlike tagged types, AnyVals can be used in pattern matching. With tagged types, matching would break, because at runtime there is no information that discriminates between different tags. However, for this case the documentation states that the AnyVal will be instantiated as an actual wrapper object, so there is a cost related to allocating it,
  • they are easier for newcomers to understand: you just use a class like any other wrapper, and the compiler makes sure that equals and hashCode are redirected to the value’s implementation, that the wrapper allocation is optimized away, and that pattern matching works; no one needs to wonder how things work, or why they don’t when something breaks.
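Here is a sketch of the pattern matching point, using case classes that extend AnyVal. Because describe takes Any, each value is boxed into a real wrapper. That wrapper is exactly what lets the runtime type test distinguish Name from Surname. With tagged types, the same match could only see a String:

```scala
object MatchDemo {
  // Case classes can be value classes too; the compiler generates unapply
  final case class Name(value: String) extends AnyVal
  final case class Surname(value: String) extends AnyVal

  // Taking Any forces a real wrapper allocation (boxing), which is what makes
  // the runtime type tests below possible
  def describe(x: Any): String = x match {
    case Name(n)    => s"name: $n"
    case Surname(s) => s"surname: $s"
    case other      => s"unknown: $other"
  }
}
```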

new AnyValAllocation(value)

There is a small issue with the assumption that the compiler optimizes the allocation away: it is not always true. Creating a Seq of Names, for instance, does impose a performance penalty, as investigated in this nice article.

Additionally, several times when I tried to use an AnyVal to get rid of a wrapper, I ended up with compiler errors:

final class Result[+T](val toTask: Task[Validated[ServiceError, T]])
    extends AnyVal {
  
  // ... utilities
}

used in tests as:

val result = resultReturningService()

fails with:

Result type in structural refinement may not refer to user-defined value class

Sure, I get that there are limitations, and they are even described in the docs… but when 30k lines of production code compile and the test code doesn’t, I am not happy about debugging compiler quirks.

While this is a slightly different use case (I created a wrapper to provide behavior, not a distinction between different types), it undermines my faith in the current implementation of value classes. However, I am sure it should work well for simpler cases.
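The test code around that line is not shown, so the following is only an assumption about a common way to hit this error. The assumed trigger is a public member of an anonymous class whose type is a value class, since scalac treats that anonymous class as a structural refinement. Wrapper and Fixture are hypothetical stand-ins. The sketch shows the usual workaround of declaring a named class instead:

```scala
object ReproDemo {
  // Hypothetical stand-in for a value class such as Result
  final class Wrapper(val value: Int) extends AnyVal

  // Assumed trigger (commented out): the anonymous class's public member has a
  // value-class type, so it becomes part of a structural refinement
  // val fixture = new { val result: Wrapper = new Wrapper(42) }

  // Workaround sketch: a named class is not a structural refinement
  class Fixture {
    val result: Wrapper = new Wrapper(42)
  }

  val fixture: Fixture = new Fixture
}
```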

Summary

So what do I take from all of that? Mostly, that both available options are imperfect, and I cannot clearly say that one is the victor. They crack at different corner cases, so the best way to choose what to use is to simply look at one’s cases and see where each choice would lead.

Tagged types can be used with everything and impose no performance penalty, but hardly any library supports them out of the box.

Value classes claim to work with everything, impose no performance penalty and are supported by basically all major libraries, but they come with a long list of gotchas.

In the end, people will side with one solution or the other based on what burns them most. Personally, I would say that, given their different shortcomings, one should always consider both solutions.