Relearn your sbt

When I started learning sbt, I noticed that there is a huge gap between how I’m told to write builds for simple projects and how I have to write them when I maintain a complex multi-module monstrosity. After a while I came to the conclusion that, very often, the way we write build.sbt is little more than cargo cult programming.

Our first contact with sbt usually looks like this:

// build.sbt in our project's root
name := "my-project"
organization := "my-organization"
version := "1.0.0"
libraryDependencies += "org.typelevel" %% "cats-core" % "1.0.1"

Obviously, it doesn’t translate well to e.g. multi-module builds. So, for now, let us forget that we ever learned that syntax.

build.sbt?

First: we assume that our project is located in the same directory as build.sbt. As a matter of fact, we always assume that our project is in the directory sbt is run in. There could be no build.sbt at all - sbt would still assume some defaults, so if we put Scala files according to the convention (src/main/scala) it would compile just fine. All you need to do is cd into the directory and call sbt.
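For instance, this hypothetical layout compiles with a bare sbt invocation and no build.sbt whatsoever:

my-project/              // cd here and run: sbt compile
  src/main/scala/        // sources found by convention
    Hello.scala
  src/main/resources/    // classpath resources found by convention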

But if there is a build.sbt file, then sbt will surely use it, right? Yes. More than that! sbt will read all files with the .sbt extension. (This is one of the reasons why I find naming each .sbt file build.sbt in a multi-module project cargo cult programming. If you give each of these files a meaningful name, it becomes easier to navigate them quickly, or to pinpoint which file errored on sbt start).

So, let us leave the build.sbt name for our root project. The moment we add more files, we’ll remember to name them accordingly.

(Hint: a version.sbt file which only contains version := "value" makes it easier to do a version bump programmatically).
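For example (a hypothetical value), the entire file would be:

// version.sbt - the only line in the file, trivial for a release script to rewrite
version := "1.0.1-SNAPSHOT"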

build.sbt as Scala code

Ok, so we have a build.sbt file for which the relative path to the directory with the project files is .. So, let’s declare a project placed in .:

lazy val root = (project in file("."))

So far there are no custom settings, so we haven’t really brought anything to the table. But we could use this project value to add some settings to it:

lazy val root = (project in file("."))
  .settings(
    name := "my-project",
    organization := "my-organization",
    version := "1.0.0",
    libraryDependencies += "org.typelevel" %% "cats-core" % "1.0.1"
  )

Hmm, that looks weird. := is an assignment operator, so do we put the result of an assignment into settings? Same for the += operator.

Actually, these are not mutating operations. In name := "my-project", name is a SettingKey object whose := method returns a Setting object carrying the "my-project" value. The same goes for libraryDependencies += "org.typelevel" %% "cats-core" % "1.0.1".
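We can see this in plain Scala (a minimal sketch; the type ascriptions are only there to show what the DSL returns):

import sbt._
import sbt.Keys._

// := and += are ordinary methods returning immutable Setting values
val s1: Def.Setting[String] = name := "my-project"
val s2: Def.Setting[Seq[ModuleID]] =
  libraryDependencies += "org.typelevel" %% "cats-core" % "1.0.1"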

So, what we do here is not an actual assignment. It’s a DSL for building a sequence of Settings, which some engine in sbt’s guts turns into the immutable build graph! It was made to look like imperative-style programming… which is quite confusing. What sbt does is provide an imperative-looking DSL for creating immutable settings, which are then used to create an immutable build graph. You can think that somewhere deep inside of sbt something like this happens:

listOfSettings.foldLeft(initialSbtConfig) {
  (config, nextDefinition) =>
    // case settings => define/override value
    // case task => (re)define task
    //
    // in both cases you might refer to other settings
    // by their reference
}

Once the model is evaluated (which is done before calling the first task or starting a shell) it is immutable - that’s what people mean when they say that sbt uses an immutable build model. However, when you add another setting, you are in fact appending another modification to the queue of changes that will be evaluated each time you start sbt - in this regard, it is very mutable.

It also explains why sometimes you think your setting is being ignored - it simply landed earlier in the queue of modifications and was overwritten by another value. Sometimes one that you didn’t define yourself - it might have come from a plugin.

Now, you might start to understand why people who have dug a bit into how sbt works cringe when they have to explain what sbt does.

Multi-module project

So, we defined a lazy val for root. Is defining a subproject any more difficult?

lazy val moduleA = (project in file("module-a"))
  .settings(
    name := "module-a",
    organization := "my-organization",
    version := "1.0.0",
    libraryDependencies += "org.typelevel" %% "cats-core" % "1.0.1"
  )

lazy val moduleB = (project in file("module-b"))
  .settings(
    name := "module-a",
    organization := "my-organization",
    version := "1.0.0",
    libraryDependencies += "org.typelevel" %% "cats-core" % "1.0.1"
  )

Not much of a difference. Instead of the "." path we used the relative path to the subproject’s directory ("module-a" or "module-b").

Ok, but what if we wanted to make module B depend on module A?

lazy val moduleB = (project in file("module-b"))
  .settings(
    ...
  )
  .dependsOn(moduleA)

This defines a compile dependency between the modules: before we compile module B, module A must be compiled successfully first. When we work within module B, we can use classes from module A’s src/main/scala (and resources from its src/main/resources). But if we wanted to reuse module A’s test code in B’s tests, it won’t work. We can make it work with:

lazy val moduleB = (project in file("module-b"))
  .settings(
    ...
  )
  .dependsOn(moduleA % "compile->compile;test->test")

That defines 2 dependencies: it makes the moduleB / Compile / compile task depend on moduleA / Compile / compile, and the moduleB / Test / compile task depend on the moduleA / Test / compile task.

Well, technically it’s more like: the moduleB / Compile config depends on the moduleA / Compile config, and their respective settings depend on each other. The "compile" here refers to the name of a config (lazy val Compile = Configuration.of("Compile", "compile")).
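A sketch of how to read the arrow syntax:

// "A->B" means: this project's config A depends on the dependency's config B
.dependsOn(moduleA % "compile->compile;test->test")
// compile->compile : moduleB's main code sees moduleA's main code
// test->test       : moduleB's test code sees moduleA's test code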

It’s a mess. I also don’t like that everything is defined using types in a more-or-less type-safe way, and suddenly we are using strings.

Besides dependsOn there is also aggregate. The difference is that dependsOn is about depending on the results of the dependent project’s tasks (e.g. adding compiled classes from the dependency to the classpath), while aggregate makes sure that when you run a task, the same task is triggered in all aggregated projects:

// moduleB
  // you can access moduleA classes in moduleB classes
  // and moduleA test classes in moduleB test classes
  .dependsOn(moduleA % "compile->compile;test->test")
  // running moduleB / test will also trigger moduleA / test
  .aggregate(moduleA)

That should explain why aggregate is often used in the root project of a multi-module setup: this way you only have to run sbt test, without running tests for each module separately. If the root project does only that and has no source code, you don’t have to use dependsOn with it.
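A minimal sketch of such a root project (reusing the moduleA and moduleB defined above):

// root has no sources of its own; it only fans tasks out to the submodules
lazy val root = (project in file("."))
  .aggregate(moduleA, moduleB)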

project / config / key

You might be wondering why I used the moduleA / Test / compile notation. It is something called unified (slash) notation, and it was introduced in the sbt 1.1.0 release.

Basically, settings and tasks are organized within a hierarchy of sorts:

  • the top-level unit is the project. It might be the root project or a submodule. Setting something here sets it for all configs and tasks (unless they override it),
  • then there is the config. Examples of configs are Compile, Test, IntegrationTest. Out of the box only Compile and Test are configured, with IntegrationTest available to import and use in a project. You might want to define your own at some point, like FunctionalTest. Configs exist so that e.g. compiler options and linters can differ between production (Compile) and test code. They are basically named collections of settings,
  • finally, we get to the task keys and setting keys - they are the names to which we bind values or actions to perform.

sbt automatically propagates values in each scope downward, where they can be overridden by more specific settings:

  • mySetting := "value" will be also available as Test / mySetting
  • we can override the value just for a particular task: myTask / mySetting := "other value"
  • values will be propagated down, so if we change Test / mySetting it will not affect Compile / mySetting

It is quite intuitive if you think about it.
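Here is a small sketch of that resolution in action (greeting is a hypothetical key):

lazy val greeting = settingKey[String]("what to print")

lazy val example = (project in file("example"))
  .settings(
    greeting := "hello",               // project-level default
    Test / greeting := "hello, tests"  // override for the Test config only
  )

// example / greeting           -> "hello"
// example / Compile / greeting -> "hello" (delegates to the project level)
// example / Test / greeting    -> "hello, tests"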

This unified slash notation arrived with sbt 1.1.0. Earlier, it was available as a plugin, and without that plugin it was necessary to scope settings like this:

  • mySetting in Test := "value" (now: Test / mySetting := "value")
  • mySetting in (Test, myTask) := "other value" (now: Test / myTask / mySetting := "other value")
  • mySetting in (project, Test, myTask) := "another value" (now: project / Test / myTask / mySetting := "another value")
  • mySetting in (project, myTask) := "something else" (now: project / myTask / mySetting := "something else")

I can only be happy that it is legacy syntax now. The order of scope elements there was a mess.

Custom settings and tasks

So, how can one define one’s own settings and tasks? As a matter of fact, it’s just a matter of:

import sbt._

val mySetting = settingKey[String]("Description of the setting I just defined")
val myTask = taskKey[String]("Description of the task I just defined")

and then

  .settings(
    mySetting := "value",
    myTask := {
      val settingValue = mySetting.value
      streams.value.log.info(s"mySetting: $settingValue")
      settingValue
    }
  )

Notice the .value after mySetting. It is an sbt macro that can only be used inside a setting or task definition. If you are wondering why sbt’s authors decided to do it that way, look at the listOfSettings snippet that I wrote before - if you simply used the setting’s or task’s value directly, you would get whatever value was current during the build graph evaluation. Meanwhile, we want to use the final one. A macro was chosen as a way to ensure that you don’t break that by accidentally messing with the internals.
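A small sketch of that final-value behavior (greeting and shout are hypothetical keys):

val greeting = settingKey[String]("greeting text")
val shout    = taskKey[String]("greeting, but louder")

// in .settings(...):
//   greeting := "hello",
//   greeting := greeting.value + ", world", // refines the previous value
//   shout := greeting.value.toUpperCase     // sees the FINAL value: "HELLO, WORLD"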

Interestingly, it is also the way sbt decides task execution order. When you define a task, you call .value on each of your dependencies, and by doing so you define the order of task execution. Of course, if needed, you can also add a dependency to an existing task:

.settings(
  IntegrationTest / test := {
    // flywayMigrate comes from the flyway-sbt plugin
    (IntegrationTest / test) dependsOn (IntegrationTest / flywayMigrate)
  }.value // yes, dependsOn requires calling .value
)

With that in mind, you can see that the only thing that can really go wrong is a circular dependency. sbt does its best to detect it but, once in a while, you’ll manage to smuggle one in. It’s good to know that at such times you should take a closer look at your .value calls.

Familiar syntax

Now that we know the details, we might try to understand what actually happens in:

name := "my-project"
organization := "my-organization"
version := "1.0.0"
libraryDependencies += "org.typelevel" %% "cats-core" % "1.0.1"

The answer is: sbt takes all of these top-level values - the ones not put into any .settings(...) - and adds them to a default root project:

lazy val namedAsProjectDirectory = project.in(file("."))
  .settings(
    // here go settings values
  )

It has its limitations: without an explicit project value, you cannot call e.g. project.disablePlugins(SomePlugin).
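For instance (a sketch - RevolverPlugin stands in for whichever auto plugin you want to turn off):

lazy val root = (project in file("."))
  .disablePlugins(RevolverPlugin) // impossible with bare top-level settings
  .settings(name := "my-project")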

Also, as far as syntactic sugar goes, .sbt files automatically:

  • import sbt._ and sbt.Keys._,
  • import the autoImport._ contents of each auto plugin (though a plugin’s settings won’t be applied if it was disabled).

Besides that, a .sbt file is almost a regular Scala file. You can import things, define vals and defs, and use values - though top-level classes and objects are not allowed there; those belong in the project/ directory. So, if you could put all of the common stuff in some object, you could reuse it, right?

project/

It just so happens that the project/ directory is special. Whatever is defined there is available to the build definition, but not to the project(s) themselves. So you can define classes there for your convenience and import them in .sbt files. You can also add libraryDependencies there, which you can then use inside task definitions without making them part of the project’s own dependencies.
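For example (a sketch; os-lib is just a stand-in for any library your tasks might need):

// project/build.sbt
// available to the build definition and to code in project/,
// NOT added to the project's own classpath
libraryDependencies += "com.lihaoyi" %% "os-lib" % "0.8.1"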

I am sure you have made use of this even if you weren’t fully aware of all the implications. After all, every plugin installation manual tells you to create or edit the project/plugins.sbt file and add one line there. Well, now you know it doesn’t even have to be named plugins.sbt.
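The usual one-liner looks like this (sbt-scalafmt picked as an arbitrary example):

// project/plugins.sbt - though the name could be anything ending in .sbt
addSbtPlugin("org.scalameta" % "sbt-scalafmt" % "2.4.6")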

(The global settings directory works the same way. If you use IntelliJ IDEA and build the project with the sbt shell, then if you open ~/.sbt/1.0/plugins you can find - among other global sbt settings - plugins.sbt and idea.sbt. Since sbt parses all .sbt files, IDEA can safely add its own sbt plugin in a dedicated file, knowing that you probably won’t edit it.)

Another common practice you might come across is having:

  • object Dependencies and
  • object Settings

defined in project/Dependencies.scala and project/Settings.scala. In these files, maintainers of bigger projects (ones which had to be split into modules) can share settings, so that they don’t have to e.g. verify in 15 places that they upgraded all the Akka libraries. Instead, they have:

// project/Dependencies.scala

import sbt._

object Dependencies {
  object versions {
    val akka = "2.5.11"
  }
  
  val akkaActor = "com.typesafe.akka" %% "akka-actor" % versions.akka
  val akkaContrib = "com.typesafe.akka" %% "akka-contrib" % versions.akka
}

// submodule/submodule.sbt

libraryDependencies ++= Seq(
  Dependencies.akkaActor,
  Dependencies.akkaContrib
)

Settings can also be shared:

// project/Settings.scala

import sbt._

object Settings {
  
  val commonSettings: Seq[Def.Setting[_]] = ...
}

// build.sbt

lazy val module = (project in file("."))
  .settings(Settings.commonSettings: _*)

Notice that .scala files need to have things imported manually (import sbt._).

Summary

sbt is just an (over-engineered) Scala DSL with a few syntactic add-ons for .sbt files. If we use it as such, it is much easier to maintain our codebase and understand what is going on.

The syntax that is promoted by tutorials should be perceived as just syntactic sugar - it is easier to reason about it that way.

The sbt build model is immutable only after the whole graph is evaluated - during that process it is pretty much mutable, and the question of what is actually meant by build model immutability is a source of confusion.

Once we finally grasp that mental model, it becomes much easier to manage our builds.