Improving your project with SBT

I believe that the work on keeping quality high should start from the very beginning of the project. When it comes to the actual implementation, setting up the build configuration is the very first thing one does. The choice of tools has a huge impact on both the process and the results.

Additionally, the build itself is a program as well (and an important one!), so there is no excuse for avoiding good practices like readability, DRY, SOLID, etc.

That is why in this post I want to write down some good ideas about SBT usage that I’ve learned in both commercial and my own small projects - ideas that help me write better code, keep the build maintainable and improve projects in general.

Originally this post was published on the ScalaC blog on 2016-05-12.

Basics

In the case of simpler projects, we will find that our project follows a Maven-like layout similar to:

/our-project
 +- build.sbt
 +- /project
 +- /src
     +- /main
     |   +- /scala
     +- /test
         +- /scala

The layout of /src should be obvious to everyone who has ever worked on a project with a Maven-ish directory structure. We have 2 directories here, /src/main and /src/test, which in turn group source code by language (so Java files would be under the /java subdirectory, Scala files within /scala, etc.) and keep resources in a resources directory (there are exceptions like Android build configurations, but we’ll leave that for another day).

Right now build.sbt and /project are more interesting to us. The former is the most important file SBT looks up when we run the sbt command within the our-project directory. /project is kind of a second-class citizen here: we can use it to empower the build.sbt file and make sure that the version of SBT used to build the project will be consistent in all environments.

A simple build definition could look like this one:

name := "our-project"
version := "0.1"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq("org.scalaz" %% "scalaz-core" % "7.1.3")

What we see here is a DSL created using SBT magic. As a matter of fact, it is a somewhat restricted Scala subset with several implicit imports already made for us. Those properties may look like something mutable, but underneath they are actually immutable values! If we import the project into our favorite IDE with SBT support, we can check that all of those keys are actually sbt.SettingKey instances, and that operators like := and ++= create immutable Setting values from them. Those settings are then used underneath as arguments for something similar to project.settings(setting1, setting2), which returns a modified copy of the immutable project. So despite the mutable-looking DSL, everything stays immutable at the core.
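As a small sketch of that idea (assuming SBT 0.13, where vals are allowed directly in .sbt files), we can assign the result of := to a plain Scala value and reuse it later:

// := does not mutate the scalaVersion key - it produces an immutable Setting value
val ourScalaVersionSetting = scalaVersion := "2.11.8"

// ...which could later be passed to a project, e.g. someProject.settings(ourScalaVersionSetting)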

How about modules?

Those pieces of information are quite useful when we consider a multi-project build. What are the reasons to do that? For me personally, it’s about keeping things simple: it is easier to work on a project when responsibilities are clearly separated. Because the order of compilation and the direction of dependencies are clearly defined, we can use modules to enforce concepts like layered architecture, hexagonal architecture and (to a degree) bounded contexts.

Of course, it comes at a price: maintenance of such a build can be more complex and (as of now) SBT has trouble with caching dependency resolutions, meaning that resolving libraries might take a while. However, I have seen more than once that keeping things tidy and clean is definitely worth it.

As for the issue: SBT developers try to address it with an experimental cached resolution feature. When it comes to snapshots, one can also try to suppress resolution with the offline := true setting.
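For reference, both can be turned on in the build definition like this (a sketch, assuming SBT 0.13.7+ where the updateOptions key with cached resolution is available):

// opt into the experimental cached dependency resolution
updateOptions := updateOptions.value.withCachedResolution(true)

// work offline: do not hit remote repositories to re-resolve (e.g. snapshot) dependencies
offline := true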

Simple setup

A basic setup of a multi-project build would look like this:

lazy val moduleA = project.in(file("modules/a"))
  .settings(scalaVersion := "2.11.8")
  .settings(libraryDependencies ++= Seq("org.scalaz" %% "scalaz-core" % "7.1.3"))
  .dependsOn(moduleB)

lazy val moduleB = project.in(file("modules/b"))
  .settings(scalaVersion := "2.11.8")
  .settings(libraryDependencies ++= Seq("org.scalaz" %% "scalaz-core" % "7.1.3"))

Of course, we have to make sure that there are modules a and b within the modules directory, each following the same Maven conventions as the single-module build described before.
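For example, the directory structure could then look similar to this:

/our-project
 +- build.sbt
 +- /project
 +- /modules
     +- /a
     |   +- /src
     |       +- /main
     |       |   +- /scala
     |       +- /test
     |           +- /scala
     +- /b
         +- /src
             +- /main
             |   +- /scala
             +- /test
                 +- /scala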

That setup will make sbt load an aggregating project named after our directory:

$ sbt
[info] Set current project to our-project (in build file:/home/user/our-project/)
>

If we want more control over it, we can create it explicitly:

lazy val root = project.in(file(".")).aggregate(moduleA, moduleB)

lazy val moduleA = project.in(file("modules/a"))
  .settings(scalaVersion := "2.11.8")
  .settings(libraryDependencies ++= Seq("org.scalaz" %% "scalaz-core" % "7.1.3"))
  .dependsOn(moduleB)

lazy val moduleB = project.in(file("modules/b"))
  .settings(scalaVersion := "2.11.8")
  .settings(libraryDependencies ++= Seq("org.scalaz" %% "scalaz-core" % "7.1.3"))

then:

$ sbt
[info] Set current project to root (in build file:/home/user/our-project/)
>

loads the root project on start, as expected.

Magic names?

Let us stop here for a moment. When we list projects with sbt projects we’ll get:

[info] In file:/home/user/our-project/
[info]        moduleA
[info]        moduleB
[info]      * root

How exactly did SBT determine the names for those? In earlier versions we had to define them explicitly with:

lazy val moduleA = Project(id = "moduleA", base = file("modules/a"))

but currently we can use the project macro, which looks up the name of the val and uses it to populate the module identifier and location:

lazy val mymodule = project
// is equal to
lazy val mymodule = Project(id = "mymodule", base = file("mymodule"))

Notice that the macro requires a val here. We cannot just pass a reference into some utility function and hope that things work. As such, project is only useful to initiate a Project definition, which we then customize from there.
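So, as a purely hypothetical illustration, a helper like the one below is not expected to work - the macro has no val to take the module name from:

// hypothetical - NOT expected to work: project has to be assigned directly to a val
def makeModule(path: String) = project.in(file(path))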

DRY in settings

It is difficult to overlook that something repeats in our configuration:

something
  .settings(scalaVersion := "2.11.8")
  .settings(libraryDependencies ++= Seq("org.scalaz" %% "scalaz-core" % "7.1.3"))

That doesn’t look good and can easily lead to errors. For instance, a moment ago I forgot to copy-paste the line

.settings(scalaVersion := "2.11.8")

into moduleB. What happened?

$ sbt "project moduleA" compile
[info] Loading global plugins from /home/dev/.sbt/0.13/plugins
[info] Set current project to root (in build file:/home/user/our-project)
[success] Total time: 0 s, completed May 4, 2016 6:32:14 PM
[info] Set current project to moduleA (in build file:/home/user/our-project)
[info] Updating {file:/home/user/our-project}moduleB...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] Updating {file:/home/user/our-project}moduleA...
[info] Resolving moduleb#moduleb_2.11;0.1-SNAPSHOT ...
[warn]     module not found: moduleb#moduleb_2.11;0.1-SNAPSHOT
[warn] ==== local: tried
[warn]   /home/dev/.ivy2/local/moduleb/moduleb_2.11/0.1-SNAPSHOT/ivys/ivy.xml
[warn] ==== public: tried
[warn]   https://repo1.maven.org/maven2/moduleb/moduleb_2.11/0.1-SNAPSHOT/moduleb_2.11-0.1-SNAPSHOT.pom
[info] Resolving jline#jline;2.12.1 ...
[warn]     ::::::::::::::::::::::::::::::::::::::::::::::
[warn]     ::          UNRESOLVED DEPENDENCIES         ::
[warn]     ::::::::::::::::::::::::::::::::::::::::::::::
[warn]     :: moduleb#moduleb_2.11;0.1-SNAPSHOT: not found
[warn]     ::::::::::::::::::::::::::::::::::::::::::::::
[warn]
[warn]     Note: Unresolved dependencies path:
[warn]         moduleb:moduleb_2.11:0.1-SNAPSHOT
[warn]           +- modulea:modulea_2.11:0.1-SNAPSHOT

Modules A and B were built using different versions of Scala and, as a result, the dependency couldn’t be resolved. This would never happen if settings common to all projects could somehow be shared, right? Let us try to create our first file within the /project directory.

import sbt._
import sbt.Keys._

object Common {

  val settings = Seq(
    scalaVersion := "2.11.8",
    libraryDependencies ++= Seq("org.scalaz" %% "scalaz-core" % "7.1.3")
  )
}

Then we can refer to common settings with:


lazy val moduleA = project.in(file("modules/a"))
  .settings(Common.settings:_*)
  .dependsOn(moduleB)

lazy val moduleB = project.in(file("modules/b"))
  .settings(Common.settings:_*)

The settings method has the signature def settings(ss: Def.SettingsDefinition*): Project, which is the reason we have to use the vararg type ascription :_* to pass in our Seq value.

build.sbt, project/ and modules

Another way of defining settings (better suited for things specific to a module) is… putting another build.sbt in the module’s directory. Personally, I try to keep all common settings and dependencies within project/* and use modules/*/build.sbt for libraries used only in one module. One also has to keep in mind that the project directory can only be used with the root project. In the case of modules, it will be ignored.
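As a sketch, such a per-module file could be as small as this (a hypothetical modules/a/build.sbt adding a library used only by module A):

// modules/a/build.sbt - a dependency used only by module A
libraryDependencies += "ch.qos.logback" % "logback-classic" % "1.1.3"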

One can also try to remove the top-level build.sbt completely and instead create a build object in project/ like this:

import sbt._

object Build extends Build {

  lazy val moduleA = project.in(file("modules/a"))
    .settings(Common.settings:_*)
    .dependsOn(moduleB)

  lazy val moduleB = project.in(file("modules/b"))
    .settings(Common.settings:_*)
}

As a matter of fact, that way of defining modules was (and in many places still is) quite popular in a lot of open source projects. However, newer versions of SBT deprecated it, and build.sbt is the only option if we decide on the newest versions.

Refactoring build?

While I have seen some projects rely on the Common.scala approach, I have also seen some (more compelling ones) where this blob was split into something more self-explanatory, like Dependencies and Settings. For instance, here is something that I would use in my own project (below I assume the two parts live in project/Dependencies.scala and project/Settings.scala):

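// project/Dependencies.scala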
import sbt._

import Dependencies._

object Dependencies {

  // scala version
  val scalaVersion = "2.11.7"

  // resolvers
  val resolvers = Seq(
    Resolver sonatypeRepo "public",
    Resolver typesafeRepo "releases"
  )

  // functional utils
  val scalaz        = "org.scalaz" %% "scalaz-core" % "7.1.3"

  // logging
  val logback = "ch.qos.logback" % "logback-classic" % "1.1.3"

  // testing
  val mockito    = "org.mockito" % "mockito-core" % "1.10.8"
  val spec2      = "org.specs2" %% "specs2" % "2.4.1"
  val spec2Core  = "org.specs2" %% "specs2-core" % "2.4.1"
  val spec2JUnit = "org.specs2" %% "specs2-junit" % "2.4.1"
}

trait Dependencies {

  val scalaVersionUsed = scalaVersion

  val commonResolvers = resolvers

  val mainDeps = Seq(scalaz, logback)

  val testDeps = Seq(mockito, spec2, spec2Core, spec2JUnit)
}
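
// project/Settings.scala

import sbt._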
import sbt.Keys._

object Settings extends Dependencies {

  val modulesSettings = Seq(
    organization := "io.scalac",
    version := "0.1.0-SNAPSHOT",

    scalaVersion := scalaVersionUsed,

    scalacOptions ++= Seq(
      "-unchecked",
      "-deprecation",
      "-feature",
      "-language:existentials",
      "-language:higherKinds",
      "-language:implicitConversions",
      "-language:postfixOps",
      "-Ywarn-dead-code",
      "-Ywarn-infer-any",
      "-Ywarn-unused-import",
      "-Xfatal-warnings",
      "-Xlint"
    ),

    resolvers ++= commonResolvers,

    libraryDependencies ++= mainDeps,
    libraryDependencies ++= testDeps map (_ % "test")
  )
}

This way the Scala version (and standard library), dependencies and resolvers are kept in one place and separated from the settings. I’ve separated test dependencies from the main ones to make sure that we won’t rely on unit test frameworks in production (libraryDependencies ++= testDeps map (_ % "test")). I’ve also added some scalac compiler options to enforce better code quality.
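With all of that in place, a module definition in build.sbt shrinks to something like this (a sketch reusing the Settings object defined above):

lazy val moduleA = project.in(file("modules/a"))
  .settings(Settings.modulesSettings:_*)
  .dependsOn(moduleB)

lazy val moduleB = project.in(file("modules/b"))
  .settings(Settings.modulesSettings:_*)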

Testing

While we’re at testing, we can also think about some small improvements. By default we have access to the test task, which runs JUnit/ScalaTest/Specs2/whatever framework is in fashion at the time. But is it enough? Making CI run the tests only informs us that no test got broken; it doesn’t say how much of the code is actually checked.

Code coverage tools are a great way to figure out which parts of the codebase should get special attention. When you see that some critical part of your application is severely untested, you might start to worry, and this should motivate you to throw some tests there. Mind that the number itself is meaningless. What would be the point of 100% coverage of a module made entirely of POJOs or plain case classes? We should use coverage values reasonably, to decide which parts of the code need special attention and which require more testing, but requiring any particular level of coverage should be something that responsible programmers decide for themselves. We all know that any form of coverage forced on developers against their will would just lead to meaningless tests that touch everything and check nothing. ;)

Ad rem. Measuring test coverage in SBT cannot be done out of the box, but it can be provided via SBT plugins. For this article, I’ll use SCoverage, but there are plenty of others to choose from.

First, let’s make sure that everyone running our project uses the same SBT version - similarly to how Scala libraries’ packages are bound to specific Scala versions, SBT plugins are bound to SBT releases. And we would want other devs to just run the build, not fight against it. We can define a fixed SBT version by providing a project/build.properties file with content like:

sbt.version=0.13.9

Then we can provide plugins for SBT within project/plugins.sbt:

addSbtPlugin("org.scoverage" % "sbt-scoverage" % "1.3.3")

From this moment we can access SCoverage settings within our build definitions. In single-module builds coverage would be enabled automatically. In multi-module builds, however, we have to enable it in each module individually:

import scoverage.ScoverageSbtPlugin

lazy val module = project.enablePlugins(ScoverageSbtPlugin)

If we also want to enable coverage measurement by default (which I do NOT recommend, but let’s leave it for now) we can configure it with:

import scoverage.ScoverageKeys._

val settings = Seq(
  //...
  coverageEnabled := true,
  //...
)

Now we can measure coverage by running the following command:

$ sbt clean coverage test coverageReport

for a single build or

$ sbt clean coverage test coverageAggregate

for a multi-project.

Why that way? Why not with just one command? Well, there are limitations to the tools used to measure coverage. First, they measure coverage within files modified/recompiled since the last rebuild (or at least they appear to). As a result, you’ll often get coverage values that make no sense unless you clean the build prior to measurement. Second, they have to be manually instructed to start measuring - this can be worked around by setting coverageEnabled := true as shown above, but as a side effect running the application via sbt run might cause it to fail, since it will still try to load some (absent in a normal runtime) coverage dependency (and that’s why I recommend against it, and so does the author of the plugin). The last limitation is the need to manually trigger the generation of the coverage report.

After that, you can read the reports under the target/scoverage-report directory. You can also define the minimal coverage for the build to pass on CI using options like:

coverageMinimum := 80
coverageFailOnMinimum := true

but as I said, first make sure that your team agrees. It is also worth knowing that coverage of some parts of the code can be disabled with:

  // $COVERAGE-OFF$Reason for disabling coverage
  ...
  // $COVERAGE-ON$

so that all kinds of safe code (case classes, etc.) or code that couldn’t reasonably be tested (once you finish extracting dependencies, you eventually end up with some place where you ultimately gather all the instances and inject them into the components) would not cause any disturbance.

Do it with style

I have seen a few big and successful projects. What they had in common were developers who wanted to keep quality high on every level. That means they all had style guidelines that everyone was obligated to follow - but hardly anyone would try to learn formatting rules by heart! Instead, each of those projects relied on some automatic formatter that was not subject to opinion or mistake. Simply - you work here, your code will be formatted with X, EOT. That got rid of all discussions about indentation and where spaces should or shouldn’t go, and with good tests covering a large part of the project, reviewers could actually focus on more important things: whether the code makes sense, whether it will be maintainable in the future, whether it leaves no room for misunderstandings, etc. That is why some people consider defining a formatter for the project rule 0 of project configuration.

What I used with great success was a combination of Scalariform and Scalastyle. The former is a formatter that (by default) runs on each compilation (which means that as long as our developers commit code they actually ran, we have a consistent codebase with no additional effort). The latter is a style checker. As those two don’t cover exactly the same elements of a style guide, they complement each other. For instance, by default Scalariform might merge some lines into one (it doesn’t have a sense of a line length limit, unfortunately), and then Scalastyle might catch that and let us know that we need to handle this specific case ourselves (I admit that the lack of a line-length limit is the greatest weakness of Scalariform).

To use them we start by adding plugins to SBT (again, in project/plugins.sbt):

addSbtPlugin("org.scalariform" % "sbt-scalariform" % "1.5.1")

addSbtPlugin("org.scalastyle" %% "scalastyle-sbt-plugin" % "0.7.0")

(some versions of SBT would break if we didn’t put those empty lines between the plugins). Then Scalariform can be configured with:

import com.typesafe.sbt.SbtScalariform._

import scalariform.formatter.preferences._

val settings = Seq(
  //...
  ScalariformKeys.preferences := ScalariformKeys.preferences.value
    .setPreference(AlignArguments, true)
    .setPreference(AlignParameters, true)
    .setPreference(AlignSingleLineCaseStatements, true)
    .setPreference(DoubleIndentClassDeclaration, true)
    .setPreference(IndentLocalDefs, false)
    .setPreference(PreserveSpaceBeforeArguments, true),
  //...
)

Scalastyle has a slightly different approach to configuration - it uses a scalastyle-config.xml file. We can generate it with the sbt scalastyleGenerateConfig command and then edit it to our heart’s content. Once we’re done, we can check the style with sbt scalastyle.
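In other words, the workflow boils down to:

$ sbt scalastyleGenerateConfig
$ sbt scalastyle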

If you’re as crazy about quality as I am, you would like the build to fail if the style is not up to standard. You can achieve that by configuring:

import org.scalastyle.sbt.ScalastylePlugin._

val settings = Seq(
  //...
  scalastyleFailOnError := true,
  //...
)

and marking all offending warnings as errors within scalastyle-config.xml.

In case something breaks here and you don’t want it to be fixed (because, e.g., you don’t agree with the tools in this particular case), you can suppress the tools with:

// format: OFF
...
// polished stuff
...
// format: ON

or

// scalastyle:off
...
// naughty stuff
...
// scalastyle:on

Summary

Here we showed how we can start up (or improve) an SBT project with modules that clearly define the direction of dependencies between its different parts, highlight the architecture and add several tasks that help us keep code quality high. We would just run

$ sbt clean coverage test coverageAggregate scalastyle

to make sure that tests pass, coverage is high enough and style guidelines are followed, so that code reviewers are able to focus on the important stuff - the stuff that no automated tool could check for us.