Tagged or AnyVal?

When we want to describe our domain better, at some point we might start using types to describe what each value means. String, Int or Double tell us everything about what we can do with a value, but do they really explain its context?

Motives

Most style guides tell us that a variable's name should explain its purpose. So

val name: String = "John"
val surname: String = "Smith"

is a good way of conveying intent. Still, one can make a mistake like

def logUser(name: String, surname: String): Unit = println(s"$name $surname")

logUser(surname, name)

We could argue that named arguments would make the error obvious, but we all know that sooner or later someone will make such a mistake. Some languages let us create new types, which would make the code look like

val name: Name = "John"
val surname: Surname = "Smith"

def logUser(name: Name, surname: Surname): Unit

logUser(surname, name) // compile error

Unfortunately, a type alias is just an alias. While type Surname = String would make the code more readable, it doesn't bring better type safety to the table: Surname is still just String, so swapped arguments compile exactly as before.
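To make the problem concrete, here is a minimal sketch (assuming Scala 2.13; the object name AliasDemo is made up for illustration). logUser returns the formatted string instead of Unit only so that the result can be inspected.

```scala
object AliasDemo {
  // Plain aliases: both are just other names for String.
  type Name = String
  type Surname = String

  def logUser(name: Name, surname: Surname): String = s"$name $surname"

  val name: Name = "John"
  val surname: Surname = "Smith"

  // Compiles without any complaint, even though the arguments are swapped.
  val swapped: String = logUser(surname, name) // "Smith John"
}
```

The aliases document intent for the reader, but the compiler sees two Strings and has nothing to complain about.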

Tagged types

One attempt to address this problem was made by Miles Sabin, who posted a gist with an interesting hack on the Scala compiler and the JVM.

trait Name
trait Surname

type Tagged[U] = { type Tag = U }
type @@[T, U] = T with Tagged[U]

// ... declarations of helper tag[U](tValue) utils ...

val name: String @@ Name = tag[Name]("John")
val surname: String @@ Surname = tag[Surname]("Smith")

def logUser(name: String @@ Name, surname: String @@ Surname): Unit

logUser(name, surname) // compiles
logUser(surname, name) // compile error
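Since the helper above is elided, here is a self-contained sketch of the whole thing, assuming Scala 2.13. The Tagger class and tag method are hypothetical stand-ins written in the spirit of the original gist, not a copy of any particular library's API.

```scala
object TaggedDemo {
  trait Name
  trait Surname

  type Tagged[U] = { type Tag = U }
  type @@[T, U] = T with Tagged[U]

  // Hypothetical helper: tag[U] fixes the tag, apply[T] infers the value's type.
  class Tagger[U] {
    def apply[T](t: T): T @@ U = t.asInstanceOf[T @@ U]
  }
  def tag[U]: Tagger[U] = new Tagger[U]

  val name: String @@ Name = tag[Name]("John")
  val surname: String @@ Surname = tag[Surname]("Smith")

  def logUser(name: String @@ Name, surname: String @@ Surname): String =
    s"$name $surname"

  val logged: String = logUser(name, surname)
  // logUser(surname, name) // does not compile: type mismatch

  // A tagged String is still a String, so String methods work directly.
  val shout: String = name.toUpperCase

  // At runtime the value is an ordinary java.lang.String.
  val runtimeClass: Class[_] = name.getClass
}
```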

How does this work? Let's take a look at the code that tags our value (omitted above):

"John".asInstanceOf[String @@ Name]

which is equivalent to:

"John".asInstanceOf[String with Tagged[Name]]

which in turn expands to:

"John".asInstanceOf[String with { type Tag = Name }]

Ok, so what does it mean?

{ type Tag = Name } is a structural type. This means that Scala may use reflection to make sure that our object has all the members and methods declared by this ad hoc type. For that reason, if we use WartRemover, it will warn us that this piece of code is discouraged as inefficient. But is it really?

When it comes to structural types, Scala uses reflection only when we actually access one of their members. With A with { def show: String }, calling show would make the runtime look up and invoke show on the object via reflection. But { type Tag = U } declares no methods and no value members, only a type member, which exists solely at compile time. There is simply nothing to call, so no reflective call is ever emitted.
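A minimal sketch of the difference, assuming Scala 2.13 (Box, describe and passThrough are made-up names). Only the call through a structural method needs reflection. A value tagged with a type-only refinement is used through String's own, statically resolved methods.

```scala
import scala.language.reflectiveCalls

object StructuralDemo {
  trait Name

  // Structural type with a *method*: x.show is dispatched via reflection.
  def describe(x: { def show: String }): String = x.show

  class Box { def show: String = "box" }

  // Structural type with only a *type* member: nothing to call at runtime.
  type Tagged[U] = { type Tag = U }

  // x.length is String#length, resolved statically; no reflection involved.
  def passThrough(x: String with Tagged[Name]): Int = x.length

  val described: String = describe(new Box)
  val length: Int = passThrough("John".asInstanceOf[String with Tagged[Name]])
}
```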

On the other hand, { type Tag = Name } is different from { type Tag = Surname }, so the compiler won't allow them to be used interchangeably. This holds even when we use them as mixins: String with Tagged[Name] is not the same type as String with Tagged[Surname].

But why does the compiler even allow us to create such constructs? String is final, so nothing should be able to extend it, right?

Well, the compiler’s logic here is a bit different:

  • x belongs to A with B iff x belongs to A and x belongs to B,

    $$x \in A \cap B \iff (x \in A) \land (x \in B)$$
  • so "John" is String with { type Tag = Name } iff "John" is String (obviously) and "John" is { type Tag = Name },

    $$"John" \in String \cap \{ x : x.Tag = Name \} \iff ("John" \in String) \land ("John" \in \{ x : x.Tag = Name \})$$
    $$"John" \in String = \text{true}$$
  • { type Tag = Name } is an interesting construct. Whatever value we pass where a { type Tag = Name } is expected, the compiler complains that the type doesn't match. But we can tell the compiler to treat "John" as a { type Tag = Name }:
    "John".asInstanceOf[{ type Tag = Name }] // compiles!
    
  • so, after the cast, "John" is both a String and a { type Tag = Name }, and therefore "John" is a String with { type Tag = Name },

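  • ergo, the type is valid, and the fact that String is final is completely irrelevant: no class ever extends String, since the intersection exists only as a type the compiler reasons about.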
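The reasoning above can be checked with a small sketch, assuming Scala 2.13 (IntersectionDemo and its members are illustrative names). No new object is created by the cast: the "tagged" value is the very same String instance.

```scala
object IntersectionDemo {
  trait Name

  val plain: String = "John"

  // Casting to the bare structural type: "John" now also counts as { type Tag = Name }.
  val onlyTag: { type Tag = Name } = plain.asInstanceOf[{ type Tag = Name }]

  // The intersection of String and the refinement: still backed by the same String.
  val both: String with { type Tag = Name } =
    plain.asInstanceOf[String with { type Tag = Name }]

  // No wrapper, no subclass of String: it is literally the same object.
  val sameInstance: Boolean = both eq plain
  val stillAString: Int = both.length
}
```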

@@ is just a nice alias that makes use of infix notation:

  • type @@[T, U] = T with Tagged[U],
  • so String with Tagged[Name] can be written as @@[String, Name],
  • and @@[String, Name] can be written in infix notation as String @@ Name.
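A quick way to convince ourselves that all three spellings denote one and the same type is to ask the compiler for =:= evidence, which only exists for equal types. This is a sketch assuming Scala 2.13; InfixDemo is a made-up name.

```scala
object InfixDemo {
  trait Name
  type Tagged[U] = { type Tag = U }
  type @@[T, U] = T with Tagged[U]

  // Each implicitly call compiles only if the two sides are the same type.
  val infixIsPrefix: (String @@ Name) =:= @@[String, Name] =
    implicitly[(String @@ Name) =:= @@[String, Name]]

  val prefixIsExpanded: @@[String, Name] =:= (String with Tagged[Name]) =
    implicitly[@@[String, Name] =:= (String with Tagged[Name])]
}
```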

Tagged to go

What are the selling points of tagged types?

  • they exist only at compile time: at runtime String @@ Name degenerates into plain String, so there is no performance penalty,
  • since tagging is just a matter of casting, you can tag anything, even types that weren't designed with tagging in mind. With tagged types we can, e.g., take an existing type class instance and cast it into its tagged representation,
  • they are so easy to implement that one can add them to a project in about five minutes. Then, if used consistently, they let one model the domain better and make invalid logic harder to write by accident.
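Reusing an existing type class instance for a tagged type could look like the sketch below, assuming Scala 2.13. Show and lift are hypothetical names made up for this example; Show stands in for any invariant type class such as an encoder.

```scala
object LiftDemo {
  trait Name
  type Tagged[U] = { type Tag = U }
  type @@[T, U] = T with Tagged[U]

  // Hypothetical invariant type class.
  trait Show[A] { def show(a: A): String }
  implicit val showString: Show[String] = (s: String) => s"<$s>"

  // Reuse the instance for the tagged type: just a cast, no new logic.
  def lift[F[_], T, U](ft: F[T]): F[T @@ U] = ft.asInstanceOf[F[T @@ U]]
  implicit val showName: Show[String @@ Name] = lift[Show, String, Name](showString)

  val john: String @@ Name = "John".asInstanceOf[String @@ Name]
  val shown: String = implicitly[Show[String @@ Name]].show(john)
}
```

The cast is safe here only because a tagged String is a String at runtime, so the String instance handles it correctly.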

Tagged @@ Nope

So it works, great, but are there any downsides? Well, there are:

  • tagged types are a compiler hack. As such, they are not standardized, and there is no single implementation one could use in all codebases: I know of the Shapeless, Scalaz and SoftwareMill ones, not to mention ad hoc implementations with their authors' own tweaks posted on blogs,
  • consequently, hardly any library supports them out of the box. When I used tagged types with Circe or Slick, I ended up writing my own extension methods, which basically lifted existing type classes,
  • libraries that do rely on tagged types, like Shapeless, assume that they are used internally rather than directly by the programmer. In the current version of Shapeless (2.3.2), labelled generics don't work with tagged types. It is fixed in the snapshot version, but if you try to use type class derivation with tagged types on the stable version, you'll end up with compiler errors,
  • type classes are often invariant, so even though String @@ Name can be passed everywhere a plain String fits, Rep[String @@ Name] is not a Rep[String], and so, e.g., Slick extension methods for String database columns won't work.
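The invariance problem can be reproduced without Slick. In this sketch (Scala 2.13 assumed), Column is a hypothetical invariant type standing in for something like Rep, and upperLabel is a made-up extension method available only for String columns.

```scala
object InvarianceDemo {
  trait Name
  type Tagged[U] = { type Tag = U }
  type @@[T, U] = T with Tagged[U]

  // Hypothetical invariant column type.
  final class Column[T](val label: String)

  // Extension method defined only for Column[String].
  implicit class StringColumnOps(c: Column[String]) {
    def upperLabel: String = c.label.toUpperCase
  }

  val nameColumn: Column[String @@ Name] = new Column[String @@ Name]("name")

  // Covariant containers are fine: a List of tagged Strings is a List[String].
  val names: List[String] = List("John".asInstanceOf[String @@ Name])

  // nameColumn.upperLabel // does not compile: Column[String @@ Name] is not a Column[String]

  // Workaround: cast the column back to its untagged type first.
  val upper: String = nameColumn.asInstanceOf[Column[String]].upperLabel
}
```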

AnyVals

So the community expressed a need to create new types that would provide compile-time safety with no performance penalty, in one standardized way. The response was the introduction of value classes, i.e. classes extending AnyVal.

The premise is simple: one defines a new type with the following syntax:

class Name(val value: String) extends AnyVal

and the language will make sure that new Name("John") exists only at compile time: the emitted JVM bytecode will only see the plain String "John".

One would use it like:

val name: Name = new Name("John")

Of course, Name is not a subtype of String, so to perform a String operation one has to either extract the wrapped value or add a forwarding method:

name.value.toUpperCase
// or
// class Name(val value: String) extends AnyVal {
//   def toUpperCase: String = value.toUpperCase
// }
name.toUpperCase

On the one hand, this is a bit less convenient; on the other, it is more consistent with how everything else works.
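Putting the pieces together, the earlier logUser example with value classes could look like this sketch (Scala 2.13 assumed; ValueClassDemo is an illustrative name). Value classes must be top-level or members of a statically accessible object, hence the enclosing object.

```scala
object ValueClassDemo {
  final class Name(val value: String) extends AnyVal {
    def toUpperCase: String = value.toUpperCase // forwarding method
  }
  final class Surname(val value: String) extends AnyVal

  def logUser(name: Name, surname: Surname): String = s"${name.value} ${surname.value}"

  val name = new Name("John")
  val surname = new Surname("Smith")

  val logged: String = logUser(name, surname)
  // logUser(surname, name) // does not compile: type mismatch

  val upper1: String = name.value.toUpperCase // extract the wrapped value
  val upper2: String = name.toUpperCase       // or go through the forwarding method

  // Equality compares the wrapped values.
  val equalByValue: Boolean = new Name("John") == new Name("John")
}
```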

AnyVal for the win

Reasons to prefer value classes are:

  • as a standardized solution, they are supported out of the box by basically all libraries that allow custom types: Shapeless, Circe and Slick all claim support for AnyVals,
  • unlike tagged types, AnyVals can be used in pattern matching. With tagged types, matching breaks, because at runtime there is no information that discriminates between different tags. However, in this case the documentation requires the AnyVal to be instantiated as an actual wrapper object, so there is a cost related to allocating it,
  • they are easier for newcomers to understand: you use the class just like any other wrapper, and the compiler makes sure that equals and hashCode are delegated to the wrapped value's implementation, optimizes out the allocation of the wrapper, and keeps pattern matching working. No one needs to wonder how things work, or why they stopped working when something broke.
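The pattern matching difference can be shown with a short sketch, assuming Scala 2.13 (describe and the other names are illustrative). Passing a value class as Any forces a real wrapper allocation, which is exactly what makes the runtime type test possible. Tagged values, by contrast, are indistinguishable plain Strings.

```scala
object MatchDemo {
  trait NameTag
  trait SurnameTag
  type Tagged[U] = { type Tag = U }
  type @@[T, U] = T with Tagged[U]

  final class Name(val value: String) extends AnyVal
  final class Surname(val value: String) extends AnyVal

  // Here the value classes are boxed into real Name/Surname objects.
  def describe(x: Any): String = x match {
    case n: Name    => s"name ${n.value}"
    case s: Surname => s"surname ${s.value}"
    case other      => s"something else: $other"
  }

  val taggedName: String @@ NameTag = "John".asInstanceOf[String @@ NameTag]
  val taggedSurname: String @@ SurnameTag = "Smith".asInstanceOf[String @@ SurnameTag]

  // Both tagged values are plain Strings at runtime, so the tags cannot be told apart.
  val sameRuntimeClass: Boolean = taggedName.getClass == taggedSurname.getClass

  val fromName: String = describe(new Name("John"))
  val fromSurname: String = describe(new Surname("Smith"))
  val fromTagged: String = describe(taggedName)
}
```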

new AnyValAllocation(value)

There is a small issue with the assumption that the compiler optimizes out the allocation: namely, that it is not always true. Creating a Seq of Names, for example, imposes a performance penalty, because a generic collection has to store actual Name objects, as investigated in this nice article.
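The boxing is easy to observe in a sketch like the one below (Scala 2.13 assumed; BoxingDemo is a made-up name). Viewing the collection through its covariant supertype Seq[Any] exposes the objects it really stores.

```scala
object BoxingDemo {
  final class Name(val value: String) extends AnyVal

  val names: Seq[Name] = Seq(new Name("John"), new Name("Jane"))

  // Seq is generic, so every element is stored as an allocated Name wrapper,
  // not as a bare String.
  val storedClass: Class[_] = (names: Seq[Any]).head.getClass

  // Reading an element back unboxes it to call .value.
  def firstValue(ns: Seq[Name]): String = ns.head.value
}
```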

Additionally, several times when I tried to use AnyVal to get rid of some wrapper, I ended up with compiler errors:

final class Result[+T](val toTask: Task[Validated[ServiceError, T]])
    extends AnyVal {
  
  // ... utilities
}

used in tests as:

val result = resultReturningService()

fails with:

Result type in structural refinement may not refer to user-defined value class

Sure, I get that there are limitations, and they are even described in the docs… but when 30k lines of production code compile and the test code doesn't, I am not happy about debugging compiler quirks.

While this is a somewhat different use case (I created the wrapper to provide behavior, not to distinguish between types), it undermines my faith in the current implementation of value classes. However, I am sure that for simpler cases they work well.
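The failing test code isn't shown, so the following is only an assumption about one common way to hit this message: an anonymous class with extra public members gets an inferred structural type, and a value class in a result position of that refinement is rejected. The sketch (Scala 2.13 assumed; Score and Fixture are hypothetical names) shows the usual workaround of giving such an object a named type so that no refinement is inferred.

```scala
object FixtureDemo {
  final class Score(val value: Int) extends AnyVal

  // Assumed trigger (not compiled here):
  //   val fixture = new { def score: Score = new Score(42) }
  // whose inferred type would be a structural refinement returning a value class.

  // Workaround sketch: declare a named type for the fixture.
  trait Fixture { def score: Score }
  val fixture: Fixture = new Fixture { def score: Score = new Score(42) }

  val answer: Int = fixture.score.value
}
```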

Summary

So what do I take from all of that? Mostly, that both available options are imperfect, and I cannot clearly declare either one the victor. They crack at different corner cases, so the best way to choose is simply to look at one's own use cases and see where each choice would lead.

Tagged types can be used with everything and impose no performance penalty, but hardly any library supports them.

Value classes claim to work with everything, impose no performance penalty and are supported by basically all major libraries, but they come with a long list of gotchas.

In the end, people will side with one solution or the other based on what burns them the most. Personally, I would say that, given their different shortcomings, one should always consider both solutions as an option.