Speed up things in scalac and sbt

1 Jun 2018 • series: sbt • tags: scala sbt

Scala is not the fastest language to compile, and sbt adds its own overhead. So in the life of most (every?) business application written in Scala there comes a point when CI builds are so long that after git push you can go watch the next episode of a TV show. Local changes take ages, even with Zinc. And you don’t want to rewrite half the stuff, nor do you have the budget to consider things like Triplequote Hydra. What then?

Well, let’s face it - you have limited options. Even so, I noticed that sometimes, long build (and test) times are not an issue with the compiler itself, but rather with how you structure and run things. So there are some options you might try out before doing invasive changes or switching to another Scala compiler.

In this post, I want to list a few things one might try when the CI build or local development becomes too slow. None is guaranteed to work - every project is unique. For the same reason I didn’t want to provide detailed benchmarks: whatever worked for me might not work for you and vice versa. I will just provide estimated numbers for a project I worked on some time ago. Exact CI numbers would be even harder to get, since the CI I am using can report different build times for the same commit depending on the time of day. I can only guess that when the USA starts its day, more people push to their repos, triggering more builds, and small latencies in the loaded infrastructure add up to a difference of several minutes.

Tune JVM

Since we want to start with the least invasive changes, let’s begin with the way you use the JVM.

sbt-extras by Paul Phillips is a wrapper for vanilla sbt that adds several interesting options:

saner JVM defaults. In one project (which used more shapeless), at some point CI started giving me stack overflows during implicit resolution, and things were getting really slow: it took about 30 minutes to compile and test 30k lines of code. Simply switching to the wrapper’s defaults made this error go away for good. This let me grow the project to about 60k lines of code, where at some point the build took 45 minutes to complete,
.jvmopts file - sbt lets us use JVM_OPTS (and SBT_OPTS) to set up JVM options: the amount of memory, the garbage collector, whether or not to use the server mode. The wrapper adds the ability to keep these options in a file and passes them to sbt. You can check how e.g. cats uses it to give sbt several gigs of memory,
automatic download of the right version of sbt and Scala - if you want to onboard a new dev with little sbt experience, this script will save them a few minutes.
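To confirm that sbt’s JVM actually received what you put into .jvmopts, you can inspect the running JVM. A minimal sketch, assuming you paste it into `sbt console` (which is not forked, so it shares sbt’s JVM); `JvmSettingsCheck` and its methods are hypothetical names, not part of sbt or sbt-extras, and `-Xss8M` below is only an example value:

```scala
import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

// Hypothetical helper for checking which options the current JVM was started with.
object JvmSettingsCheck {
  /** Flags passed to the current JVM, e.g. -Xmx4G or -Xss8M coming from .jvmopts. */
  def inputArguments: List[String] =
    ManagementFactory.getRuntimeMXBean.getInputArguments.asScala.toList

  /** Maximum heap the JVM will try to use, in megabytes (roughly what -Xmx asked for). */
  def maxHeapMb: Long = Runtime.getRuntime.maxMemory / (1024L * 1024L)

  /** Values of all flags starting with the given prefix, e.g. "-Xss" gives "8M" for -Xss8M. */
  def flagValues(prefix: String, args: List[String] = inputArguments): List[String] =
    args.collect { case arg if arg.startsWith(prefix) => arg.stripPrefix(prefix) }
}
```

E.g. `JvmSettingsCheck.flagValues("-Xss")` should return `List("8M")` if .jvmopts contains `-Xss8M`. An empty list means the option never reached sbt.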

Now check the specs of your CI and of your dev machines, choose the lowest common denominator and tune the JVM against it.

Once you have exhausted all the other options, you can upgrade your CI plan and bump these values if needed.

Change JVM

Some people might not know, but there are more JVMs and JDKs than just OpenJDK and Oracle JDK. IBM released their own. Azul Zulu and Azul Zing are a thing (especially the latter if you need some absurd GC capabilities like no GC pauses with 64GB of memory, or caching optimizations to start the JVM already warmed up). But the JVM I want you to try out next is GraalVM.

In my case, a cold OpenJDK JVM compiled all production code, unit and integration tests locally in ~620s. Once I switched to GraalVM, compilation time on a cold VM dropped to ~515s. Complete CI builds also sped up: at the time I had already done some tuning, so the build took 24 minutes; with Graal it dropped to 20 minutes.

Small warning: GraalVM currently doesn’t ship with all of the Oracle JVM’s certificates, so e.g. fetching artifacts from Sonatype will fail. That is, unless you install them yourself:

GRAALVM_HOME=/usr/lib/jvm/java-8-graal # location of your GraalVM installation
ORACLE_JRE_URL='http://download.oracle.com/otn-pub/java/jdk/8u171-b11/512cd62ec5174c3487ac17c61aaa89e8/jdk-8u171-linux-x64.tar.gz'
curl -jkL -H "Cookie: oraclelicense=accept-securebackup-cookie" $ORACLE_JRE_URL | \
  tar -zxvf - --directory $GRAALVM_HOME \
              --wildcards "*/jre/lib/security/cacerts" \
              --strip-components 1
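To check that the copied certificates are in place, you can count the entries in the trust store of the JVM that runs sbt. A sketch under a few assumptions: `TrustStoreCheck` is a hypothetical helper, the trust store is expected at `lib/security/cacerts` relative to the `java.home` system property (for a Java 8 based JDK, such as GraalVM 1.0, `java.home` points at its `jre` directory), and the store is read without a password, which only skips the integrity check:

```scala
import java.io.{File, FileInputStream}
import java.security.KeyStore

// Hypothetical helper: counts certificates in the default trust store of a given Java home.
object TrustStoreCheck {
  /** Default trust store location relative to java.home. */
  def cacertsFile(javaHome: String): File =
    new File(javaHome, "lib/security/cacerts")

  /** Number of entries in the trust store, or a description of why it could not be read. */
  def certificateCount(javaHome: String = System.getProperty("java.home")): Either[String, Int] = {
    val file = cacertsFile(javaHome)
    if (!file.isFile) Left(s"no trust store at ${file.getPath}")
    else {
      val in = new FileInputStream(file)
      try {
        val keyStore = KeyStore.getInstance(KeyStore.getDefaultType)
        keyStore.load(in, null) // null password: entries are read, integrity is not verified
        Right(keyStore.size())
      } catch {
        case e: Exception => Left(s"cannot read ${file.getPath}: ${e.getMessage}")
      } finally in.close()
    }
  }
}
```

If `TrustStoreCheck.certificateCount()` returns a `Left` or a suspiciously small number, the certificates are most likely still missing.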

Caching CI deps

I have now covered the things that might speed up local development without any changes to the code or the build script. But before I move on to (slightly) invasive changes, let us try to speed up CI even more.

One of the things that add time to CI is getting dependencies. By that I mean both fetching artifacts for sbt and installing packages with the package manager. If the CI doesn’t have to do it on every build, it can cut a few minutes from each one.

Caching artifacts is easy; only the syntax depends on the CI provider. E.g. you can add:

cache:
  directories:
    - $HOME/.ivy2/cache
    - $HOME/.sbt/boot

before_cache:
  # Tricks to avoid unnecessary cache updates
  - find $HOME/.ivy2 -name "ivydata-*.properties" -delete
  - find $HOME/.sbt -name "*.lock" -delete

to your .travis.yml file or

#build:
  cache_dir_list:
              - $HOME/.ivy2/cache
              - $HOME/.sbt

to .shippable.yml. (Other providers should have their own syntax).

Caching packages requires a little bit more work. For example, on Shippable I used the ability to provide my own Dockerfile, build CI image and reuse it for subsequent builds:

#build:
  pre_ci:
              - docker build ci-image -t project/ci:tl-scala-2.12.4-sbt-1.1.5-graalvm-1.0.0-rc1
  pre_ci_boot:
    image_name: project/ci
    image_tag:  tl-scala-2.12.4-sbt-1.1.5-graalvm-1.0.0-rc1
    options:    "-e HOME=/root"

Other providers usually let you build the image locally, push it to DockerHub and fetch it from there.

Both changes saved me about 5-7 minutes on each CI run.

Keep sbt’s JVM warm

Theoretically, it should go without saying… but I have met people who used sbt for a year or two and never learned about the shell mode, nor about running several tasks at once. So, yeah, you can replace:

> sbt task1 # starts JVM, resolves settings, runs task1, shuts down the JVM
> sbt task2 # starts JVM, resolves settings, runs task2, shuts down the JVM
# JVM started and warmed up twice

with:

> sbt # starts JVM, resolves settings, opens shell
>> task1 # run task1
>> task2 # run task2
>> # JVM is started and warmed up once

or (in CI scripts):

sbt task1 task2 # run task with one JVM start
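In CI scripts the task list can also live in the build itself, so that a single `sbt ci` call runs everything in one JVM. A sketch for build.sbt; the alias name `ci` and the tasks in it are only examples:

```scala
// build.sbt
// `sbt ci` expands to the commands below and runs them one after another in the same JVM
addCommandAlias("ci", ";clean;compile;test")
```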

If you are developing locally you might use reload to load sbt settings anew (e.g. after switching branches) without losing JVM warmup.

If you are working on several projects at once, you might consider bloop: a build server that keeps one long-running, warm JVM compiling all the builds you export to it, instead of warming up a separate JVM for each one.

It is hard for me to estimate the speedup, but every task run in an already warm JVM saves a bit more time. Additionally, in bigger projects resolving the settings alone takes several seconds or more, so doing it once makes a noticeable difference during local development.

Careless database

If you run any kind of integration tests against a database, you are probably annoyed by how long they take (compared to mocks or an in-memory DB). Still, you want to run them against an actual database, to make sure you haven’t accidentally used that one feature of your DB library that doesn’t work with your RDBMS of choice (e.g. Slick + upsert + older PostgreSQL).

However, this is one of the few cases where it makes perfect sense to turn off the D in ACID.

Let me show an example with PostgreSQL:

echo 'fsync = off' >> /etc/postgresql/9.6/main/postgresql.conf
echo 'synchronous_commit = off' >> /etc/postgresql/9.6/main/postgresql.conf

It basically means: I want my SQL to have these kick-ass benchmarks like MongoDB, even if it is only a bit safer than writing to /dev/null.

This change saved me about 30-40% of the time each time I ran integration tests on CI.

Just to be sure: never ever use this in production. Or on your local computer. The official documentation mentions only one valid use for these settings in production: when you are setting up a server from e.g. a backup, need to load data quickly, and nothing bad happens if things fail - you then just remove the database and start anew.

There are other things you can try out when it comes to testing against a database, but they are more invasive, so I will describe them later on.

scalac

If you don’t have a really good reason not to, you should keep up with Scala updates. Most of the time it is just a matter of waiting a month or two after a major release for libraries to migrate, then updating Scala and the libraries, then running tests to check that things are OK. I might be weird, but I never needed more than a day for the whole process, though with bigger multi-project codebases I guess it can take even a week. (Because moving tickets on a Jira board is exhausting.) A minor version update usually goes even more smoothly: bump the version and you are done.

To understand the benefit, just take a look at the benchmarks. You can see that the scalac team is dedicated to avoiding performance regressions and introduces small but consistent improvements over time, making each release slightly faster.

Additionally, you might want to check some experimental compiler flags. Scala 2.12.5 introduced -Ybackend-parallelism N, which lets you emit bytecode in parallel - the PR claimed a 4-5% improvement with 8 backend threads.

Another interesting flag is -Yinduction-heuristics, which might result in faster derivation in some cases. Currently it is available only in Typelevel Scala 4, but it let me get another few percent of speedup. However, TL Scala 4 is based on 2.12.4, while the current version is Scala 2.12.6, and I can imagine that not many people would consider what is basically a downgrade with experimental features turned on. Hopefully, the improvements tested with this flag will be available by default in Scala 2.13.
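Turning the backend flag on is a one-line change in build.sbt. A sketch assuming the build already uses Scala 2.12.5 or newer (older compilers don’t know this flag); 8 threads is just the value from the PR’s benchmark, so tune it to the cores available on your CI:

```scala
// build.sbt
// requires Scala 2.12.5+; emits bytecode using 8 backend worker threads
scalacOptions ++= Seq("-Ybackend-parallelism", "8")
```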

Modules and dependencies

The less scalac needs to process at a time, the better. Phrased like that, it should be clear why the next two sentences don’t conflict with each other.

You should have as few dependencies as possible.

Several modules depending on one another might be better than having one monolithic module.

Now, if you think about it, the compiler should have less work analyzing already compiled classes (it doesn’t have to prove anything about them, just assume), while classes that are yet to be compiled need to get through all these costly phases.

To remove redundant libraries, you might check sbt-dependency-graph. It let me learn that adding e.g. swagger-core also pulled in jackson and several other libraries I wouldn’t otherwise use. So I wrote sbt-swagger-2 to generate swagger.json with the build tool, without burdening the compiler or the runtime with an otherwise redundant set of libraries. But bigger gains could come from restructuring your project.
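Adding the plugin is a single line in project/plugins.sbt. A sketch; as far as I know 0.9.0 is the first release of the plugin supporting sbt 1.x, so check whether a newer one is available:

```scala
// project/plugins.sbt
// afterwards, running `dependencyTree` in the sbt shell prints the whole dependency tree,
// showing e.g. which library pulled in jackson
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.0")
```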

As I mentioned earlier, my old project at some point reached 45 minutes in CI while it had ~60k lines of code. I noticed that one of my modules - the one containing the whole domain layer: entities, values, services, repositories, factories, published language and implementation - took a reaaaally long time to compile. I think about 15-20 minutes of these 45 was just the compilation of this module. So I decided to try to split it into several smaller modules.

It required a few changes, but not many - I already had things nicely layered, so in a few places I split the interface from the implementation, but mostly I just moved things from one directory to another. These changes didn’t alter my architecture or how the application works, but they still let me cut 6-7 minutes from the CI build.
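A split like this is expressed with `dependsOn` in build.sbt. A minimal sketch; the module names and directories are hypothetical and only illustrate separating interfaces from implementations, not the actual layout of my project:

```scala
// build.sbt
ThisBuild / scalaVersion := "2.12.6"

// entities, values, service and repository interfaces, published language
lazy val domainApi = (project in file("domain-api"))

// implementations: compiled against domainApi's already compiled classes
lazy val domainImpl = (project in file("domain-impl"))
  .dependsOn(domainApi)

// application wiring, depends on both
lazy val app = (project in file("app"))
  .dependsOn(domainApi, domainImpl)
```

Modules that don’t depend on each other can also be compiled by sbt in parallel.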

I also noticed a benefit locally. Even with the Zinc incremental compiler, I see some inefficiencies. E.g. I have to compile 212 files: Zinc compiles 32 of them, and when that succeeds, it moves on to compile the next 280. If it crashes at this point, it only caches the result of the successful 32-file compilation. So once I fix the bug, I have to compile all 280 files again, even if only one of them had an error that couldn’t affect the correctness of the rest. More modules give sbt more stepping stones, which might be helpful during development.

Of course, you shouldn’t create 100 modules of 10 files each. Resolving settings and dependencies of dependent modules is not free, so such a split would have its own inefficiencies, especially if you use snapshots.

Also keep in mind (bigger private companies always do) that if you reuse some component and hardly ever update it, it is probably best to make it a separate artifact and put it into your artifactory. Fetching it (and storing it in a cache) will always be faster than building it.

No snapshots

I mentioned that snapshots are bad, but why?

If you are loading your project for the first time (or after some changes to dependencies), sbt will query repositories and then fetch dependencies to your local cache. After that, sbt does a rather good job of not querying artifactories when not needed. Unless one of the dependencies is a snapshot.

The idea of a snapshot is that it is not a release. Which means it is not final. Which means that each time you start the build, sbt should make sure that it didn’t change, or redownload it if it did. So even one snapshot in your dependency list forces sbt to query artifactories, even if nothing actually changed. It doesn’t affect CI that much - it has to do it anyway, even with the CI cache enabled - but in local development it can be quite annoying.

The other implication has nothing to do with speed, but is even more serious: the build is not reproducible. You might trigger a build on CI, have all tests pass, then click merge… and because the snapshot was updated in the meantime, your master breaks unexpectedly. Of course, git bisect and stuff also become unreliable when you are building against a moving target.

Thing is: sometimes library authors don’t give you a choice. Old versions break your code, the next release is not yet available, but you can try out a snapshot. The deadline is approaching, you have no control over the library, so you have no other option.

I think that as a maintainer it is your responsibility to never force your users to rely on snapshot builds. A much better option is something like a version containing the commit hash (e.g. as Monix 3 does). I do it using sbt-git. It has another advantage: if something breaks, I can pinpoint the exact commit, even though it is not a release build. If you don’t want to do that, you can always publish milestone or release candidate builds. They may not be releases, but they are still safe targets to build against.
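The versioning scheme itself is simple. sbt-git can derive such versions from the repository; to show just the idea without relying on the plugin’s settings, here is a sketch with a hypothetical `VersionScheme.versionFor` helper that receives the base version, the current commit hash and an optional release tag:

```scala
// Hypothetical helper illustrating "version = base version + commit hash" (not sbt-git's API).
object VersionScheme {
  /**
   * A commit tagged as a release (e.g. "v1.0.0") gets the tag's version;
   * any other commit gets "<baseVersion>-<first 7 characters of the commit hash>".
   */
  def versionFor(baseVersion: String, commitHash: String, releaseTag: Option[String]): String =
    releaseTag match {
      case Some(tag) => tag.stripPrefix("v")
      case None      => s"$baseVersion-${commitHash.take(7)}"
    }
}
```

Unlike `1.0.0-SNAPSHOT`, a version like `1.0.0-abc1234` always points at exactly one commit, so sbt can cache it like any release and it never changes under you.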

Shapeless and macros

Scala without code generation (by the compiler) might compile slowly. Macros can add a bit. Shapeless and implicit-based derivation can add a lot.

When I developed my old project, I had the problem of manually rewriting API case classes into domain case classes and back. It inspired the chimney project, which I started with Piotr Krzemiński (who has basically owned the project ever since).

The initial version of the library was Shapeless based and contributed a few minutes to overall build time. Later on, I tried to use it for some extreme case (it was another project): I had 10 model case classes and 10 API case classes. Big case classes, some 20+ attributes. These 10 transformations compiled in over 10 minutes.

Later on, Piotr rewrote the Shapeless-based implementation using macros, without changing the API. These 10 complex files then compiled in under 2 minutes. Still long, but hey!

It’s worth mentioning that when we use implicit-based derivation in our code, the results are not automatically shared across the codebase. E.g. if we use Circe to automatically generate codecs for some ApiModel inside 3 files, then the Encoder/Decoder will be derived 3 times: once for each file. Actually, once for each distinct scope - if one of the files had some implicit in scope that would affect the derivation, there would be (at least) 2 separate derivations.

If we don’t need to modify the result and just want to share one implicit across the whole codebase, it might be a good idea to put it into the companion object. Additionally, some libraries (e.g. Circe) already provide macro annotations that add these to the companion object without additional effort on our side.

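// needs circe-generic; on Scala 2.12 @JsonCodec also needs the macro paradise compiler plugin
// import codecs for any non-standard field types here as well
import io.circe.generic.JsonCodec

@JsonCodec
case class ApiModel(value: String)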
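Without macro annotations, the same sharing can be done by hand: define the instance once in the companion object, which implicit search always consults for `ApiModel`. A dependency-free sketch; `JsonEncoder` is a hypothetical stand-in for a library type class such as circe’s `Encoder`, and the instance is hand-written where real code would derive it (with circe, e.g. via `deriveEncoder` from `io.circe.generic.semiauto`):

```scala
// Hypothetical stand-in for a library type class such as circe's Encoder.
trait JsonEncoder[A] {
  def encode(a: A): String
}

object JsonEncoder {
  // Summons whatever instance implicit search finds for A.
  def apply[A](implicit encoder: JsonEncoder[A]): JsonEncoder[A] = encoder
}

final case class ApiModel(value: String)

object ApiModel {
  // Defined once, in the companion object: implicit search looks here automatically,
  // so every file gets this same instance without importing or deriving it again.
  // In real code the body would be a derivation, e.g. circe's semiauto deriveEncoder.
  implicit val encoder: JsonEncoder[ApiModel] = new JsonEncoder[ApiModel] {
    def encode(a: ApiModel): String = "{\"value\":\"" + a.value + "\"}"
  }
}
```

Every file that needs `JsonEncoder[ApiModel]` now finds this one instance, so in real code the derivation runs once, when the companion object is compiled, instead of once per scope.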

Personally, I noticed that replacing shapeless with a macro-based implementation spared me a few minutes. Additionally, using @JsonCodec saved me a minute… though I didn’t notice it at first. I used each codec once in production code and once in test code, so when I measured only compile, I saw no difference. Only after I started measuring test:compile did I find out that it actually helps.

Speeding up local tests with a database

I promised to describe another way of speeding up integration tests (namely tests against a database) that doesn’t rely on limiting synchronization with the drive. That other way is… limiting the situations where the database would attempt to synchronize at all.

Slick-pg has a good starting point for such tests:

start with an empty database
during a test, first create the schema and populate the tables
then add your actual logic and assertions
once done, drop the schema
run it all in a transaction

It might sound stupid, but it does affect performance! Because everything runs in a transaction, Postgres limits the number of disk accesses and tries to do more in memory.

Still, dropping the schema is a waste of time. Hence the next optimization: instead of dropping the schema, roll back the transaction.

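// assumes: import slick.jdbc.PostgresProfile.api._ and an implicit ExecutionContext in scope;
// createSchema and dbioWithTestsAndAssertions stand for your own DBIO actions
final case class IntentionalRollbackException[R](result: R)
  extends Exception("Rolling back transaction after test")

database.run {
  createSchema
    .andThen(dbioWithTestsAndAssertions)
    .flatMap { result =>
      // any exception triggers rollback
      DBIOAction.failed(IntentionalRollbackException(result))
    }
    .transactionally
}.recover {
  // turn the intentional failure back into the test's result
  case IntentionalRollbackException(result) => result
}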

To improve things even more, we can create the schema only once, before running the tests. This might create an issue with IDs, though: if your tests assume that the first record has id = 1, they will fail (rollback doesn’t reset sequences). But that can be easily fixed:

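// reset the sequence behind every serial column in the public schema, so IDs start from 1 again
// (assumes the default <table>_<column>_seq naming of serial sequences)
database.run {
  sql"""SELECT setval('"' || tb.table_name || '_' || cols.column_name || '_seq"', 1, false)
        FROM information_schema.tables AS tb
        INNER JOIN information_schema.columns AS cols ON tb.table_name = cols.table_name
        WHERE tb.table_name <> 'schema_version'
        AND tb.table_schema = 'public'
        AND column_default like 'nextval(%)'""".as[Long] // setval returns bigint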
    .andThen(dbioWithTestsAndAssertions)
    .flatMap { result =>
      // any exception triggers rollback
      DBIOAction.failed(IntentionalRollbackException(result))
    }
    .transactionally
}.recover {
  case IntentionalRollbackException(result) => result
}
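If you would rather not repeat this in every test, the whole pattern fits into one helper. A sketch assuming Slick 3 with `PostgresProfile`; `RollbackTesting` and `runAndRollback` are hypothetical names, and schema creation or the sequence reset from the snippets above would simply be chained in front of `test`:

```scala
import scala.concurrent.{ExecutionContext, Future}
import slick.jdbc.PostgresProfile.api._

// Hypothetical helper extracting the run-in-transaction-then-roll-back pattern.
object RollbackTesting {

  /** Carries a successful result out of a transaction that is rolled back on purpose. */
  final case class IntentionalRollbackException[R](result: R)
      extends Exception("Rolling back transaction after test")

  /** Runs `test` in a transaction, always rolls it back, and still returns its result. */
  def runAndRollback[T](db: Database, test: DBIO[T])(implicit ec: ExecutionContext): Future[T] =
    db.run {
      test
        .flatMap(result => DBIO.failed(IntentionalRollbackException(result)))
        .transactionally
    }.recover {
      case IntentionalRollbackException(result) => result.asInstanceOf[T]
    }
}
```

Real test failures still fail the returned Future, since only `IntentionalRollbackException` is recovered.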

As far as I can tell, this might indeed force you to rewrite some integration tests (unless you have already extracted the whole create-test-drop procedure into a method). So, is it worth it?

In my old project, I started with 300s for integration test run. Adding transactions got me to 240s. Rollback and creating the schema before tests got me to 120s. Disabling sync and commits got me to 70s.

The database is just an example here: in general, integration tests might be slowed down by external dependencies and IO, so it’s worth trying out different options to make sure that tests don’t waste IO time on unrelated things.

Alternative scalac

Once one runs out of options, one can always try another compiler.

Grzegorz Kossakowski has proven with Kentucky Mule that it is possible to define a subset of Scala that can be compiled in parallel and really fast. Twitter developed his findings into an actual alternative: Reasonable Scala. This is NOT a drop-in replacement for the whole of Scala, but if one uses only a subset of features (matching Twitter’s use cases), it is possible to get several times faster compilation. However, that is at odds with type-level-heavy approaches.

Another option is to try out Triplequote Hydra. It is a drop-in replacement for the Scala compiler which can give a 1.5x-3.5x speedup. However, it is a commercial product, so I had no opportunity to test it (and probably couldn’t afford it).

I list these options last, as they can impact architecture (or budget) so heavily that I would recommend them only as a last resort. (Not to mention that I didn’t have the opportunity to test them.) Hopefully, Dotty will optimize things further and maybe incorporate some findings from Reasonable Scala.

Summary

This is a list of things I personally tested (excluding other compilers) on one project I worked on some time ago. This project at its worst ran on CI for 45 minutes while it had 60k lines of code. Currently, it runs for 17 minutes with 70k lines of code. Big difference.

It doesn’t prove that these methods are universal, or that commercial solutions (better hardware, a higher CI tier, premium JVMs, Hydra) are redundant. However, for me it shows that quite often the reasons for slow builds lie in our carelessness with code and its maintenance, and that the speed of the Scala compiler is not the sole offender. So with relatively simple and cheap means, we can make the project better.

Only after we have tried all of these methods does it make sense to invest in more expensive solutions. (Though I do believe that companies paying for them already know that.)

I do not know if it would help you, and to what degree, but surely it is a nice checklist to have.

kubuszok.com