Speed up things in scalac and sbt

Scala is not the fastest language to compile. sbt adds its own overhead on top of that. So, in the life of most (every?) business applications written in Scala, there comes a moment when the CI build is so long that after a git push you can go watch the next episode of a TV show. Local changes take ages, even with Zinc. And you don’t want to rewrite half the codebase, nor do you have the budget to consider things like Triplequote Hydra. What then?

Well, let’s face it - you have limited options. Even so, I noticed that sometimes long build (and test) times are not an issue with the compiler itself, but rather with how you structure and run things. So there are some options you might try out before making invasive changes or switching to another Scala compiler.

In this post, I want to list a few things one might try when the CI build or local development becomes too slow. None of them is guaranteed to work - every project is unique. I also didn’t want to provide detailed benchmarks - for the same reason: every project is unique, and whatever worked for me might not work for you and vice versa. I will just provide estimated numbers for a project I worked on some time ago. Exact numbers for CI would be even harder to get, since I noticed that the CI I am using might report different build times for the same commit depending on the time of day. I can only guess that when the USA starts its day, more people push to their repositories, triggering more builds, and the small latencies in the infrastructure under that burden add up to a several-minute difference.

Tune JVM

Since we want to start with the least invasive changes, let’s begin with the way you use the JVM.

sbt-extras by Paul Phillips is a wrapper for vanilla sbt that adds several interesting options:

  • saner JVM defaults. In one project (which had quite some shapeless-heavy code), at some point CI started giving me stack overflows during implicit resolution, and things were getting really slow. It took about 30 minutes to compile and test 30k lines of code. Simply by using the wrapper, the defaults changed so that I never saw this error again. This let me grow the project to about 60k lines of code, at which point the build took 45 minutes to complete,
  • .jvmopts file - sbt allows us to use JVM_OPTS (and SBT_OPTS) to set up JVM options: the amount of memory, the garbage collector, whether or not to use the server mode. The wrapper adds the ability to put these options into a file and passes them to sbt (see the sketch after this list). You can check how e.g. cats uses it to give several gigabytes of memory to sbt,
  • automatic download of the right version of sbt and Scala - if you want to onboard a new dev with little sbt experience, this script will save them a few minutes.
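
As an example, a .jvmopts file could look like the sketch below - the exact values are just a starting point and should be tuned to your machine and project:

-Xms1g
-Xmx4g
-XX:ReservedCodeCacheSize=512m
-XX:MaxMetaspaceSize=1g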

Now you can check your CI specs and your dev machine specs, choose the lowest common denominator, and tune the JVM against it.

Once you have exhausted all the other options, you might upgrade your CI plan and bump these values if needed.

Change JVM

Some people might not know it, but there are more JVMs and JDKs than just OpenJDK and Oracle JDK. IBM released their own version. Azul Zulu and Azul Zing are a thing (especially the latter if you need some absurd GC capabilities like no GC pauses with 64 GB of memory, or caching optimizations to start the JVM already warmed up). But the JVM I want you to try out next is GraalVM.

In my case, a cold OpenJDK JVM compiled all production code, unit tests and integration tests locally in ~620s. Once I switched to GraalVM, compilation time on a cold VM dropped to ~515s. Complete CI builds also sped up: by that time I had already done some tuning, so the build took 24 minutes. With Graal it dropped to 20 minutes.

Small warning: GraalVM is currently not shipped with all of the Oracle JVM’s certificates, so e.g. fetching artifacts from Sonatype will fail. That is, unless you install them yourself:

GRAALVM_HOME=/usr/lib/jvm/java-8-graal # location of your GraalVM installation
ORACLE_JRE_URL='http://download.oracle.com/otn-pub/java/jdk/8u171-b11/512cd62ec5174c3487ac17c61aaa89e8/jdk-8u171-linux-x64.tar.gz'
curl -jkL -H "Cookie: oraclelicense=accept-securebackup-cookie" $ORACLE_JRE_URL | \
  tar -zxvf - --directory $GRAALVM_HOME \
              --wildcards "*/jre/lib/security/cacerts" \
              --strip-components 1

Caching CI deps

So far I have used up the things that might speed up local development without any changes to the code or the build script. But before I move on to those (slightly) invasive changes, let us try to speed up CI even more.

One of the things that adds time to CI is getting dependencies. I mean here both fetching artifacts for sbt and installing packages with the package manager. It can cut a few minutes from each build if the CI does not have to do it every time.

Caching artifacts is easy. The exact configuration is CI-provider dependent, but you can add:

cache:
  directories:
    - $HOME/.ivy2/cache
    - $HOME/.sbt/boot

before_cache:
  # Tricks to avoid unnecessary cache updates
  - find $HOME/.ivy2 -name "ivydata-*.properties" -delete
  - find $HOME/.sbt -name "*.lock" -delete

to your .travis.yml file or

build:
  cache_dir_list:
    - $HOME/.ivy2/cache
    - $HOME/.sbt

to .shippable.yml. (Other providers should have their own syntax).

Caching packages requires a little bit more work. For e.g. Shippable, I used the ability to provide my own Dockerfile, build a CI image and reuse it for subsequent builds:

build:
  pre_ci:
    - docker build ci-image -t project/ci:tl-scala-2.12.4-sbt-1.1.5-graalvm-1.0.0-rc1
  pre_ci_boot:
    image_name: project/ci
    image_tag:  tl-scala-2.12.4-sbt-1.1.5-graalvm-1.0.0-rc1
    options:    "-e HOME=/root"

Other providers at the very least give you the ability to build the image locally, push it to Docker Hub and fetch it from there.
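
That workflow boils down to two Docker commands - a sketch with placeholder image names:

> docker build -t myorg/ci:scala-2.12.6-sbt-1.1.5 ci-image # build the image from the Dockerfile in ./ci-image
> docker push myorg/ci:scala-2.12.6-sbt-1.1.5 # publish it so that the CI can pull it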

Both changes saved me about 5-7 minutes on each CI run.

Keep sbt’s JVM warm

Theoretically, it should go without saying… but I have already met people who used sbt for a year or two and never learned about the shell mode, and who never ran several tasks at once. So, yeah, you can replace:

> sbt task1 # starts jvm, resolves settings, runs task1, shuts down jvm
> sbt task2 # starts jvm, resolves settings, runs task2, shuts down jvm
# jvm started and warmed up twice

with:

> sbt # starts jvm, resolves settings, opens shell
>> task1 # run task1
>> task2 # run task2
>> # jvm is started and warmed up once

or (in CI scripts):

sbt task1 task2 # runs both tasks with one jvm start

If you are developing locally you might use reload to load sbt settings anew (e.g. after switching branches) without losing JVM warmup.

If you are working on several projects at once, you might consider using Bloop to start one global, long-living compilation server JVM and have each build’s compilation actually run within it.
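
A rough sketch of that workflow, assuming the sbt-bloop plugin is already added to project/plugins.sbt and with a placeholder project name:

> sbt bloopInstall # exports the sbt build into Bloop's configuration files
> bloop compile my-project # compiles via the long-living Bloop server
> bloop test my-project # reuses the same warmed-up JVM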

It is hard for me to estimate the speedup, but each task run without restarting the JVM saves a bit more time. Additionally, in bigger projects resolving the settings itself takes several seconds or more, so doing it once gives noticeable effects during local development.

Careless database

If you run any kind of integration tests against a database, you are probably annoyed by how long they take (compared to mocks or an in-memory DB). Still, you want to run them against an actual database, to make sure you haven’t accidentally used that one feature of your DB library that doesn’t work with your RDBMS of choice (e.g. Slick + upsert + older PostgreSQL).

However, this is one of the few cases where it makes perfect sense to turn off the D part of ACID.

Let us show an example with PostgreSQL:

echo 'fsync = off' >> /etc/postgresql/9.6/main/postgresql.conf
echo 'synchronous_commit = off' >> /etc/postgresql/9.6/main/postgresql.conf

It basically means: I want my SQL to have these kick-ass benchmarks like MongoDB, even if it is only a bit safer than writing to /dev/null.

This change saved me something like 30-40% of the time, each time I ran integration tests on CI.

Just to be sure: never ever use this in production. Or on your local computer. The official documentation mentions only one valid usage for these settings in production: when you are setting up a server from e.g. a backup, need to quickly load data, and nothing bad happens if things fail - you then just remove the database and start anew.

There are other things you can try out when it comes to testing against a database, but they are more invasive, so I will describe them later on.

scalac

Unless you have a really good reason not to, you should keep up with Scala updates. Most of the time it is just a matter of waiting a month or two after a major release for libraries to migrate, then updating Scala and the libraries, then running tests to check if things are OK. I might be weird, but I never needed more than a day for the whole process, though with bigger multi-project codebases I guess it can take even a week. (Because moving tickets on a Jira board is exhausting.) A minor version update usually goes even easier: bump the version and you are done.

To understand the benefit, just take a look at the benchmarks. You can see that the scalac team became dedicated to avoiding performance regressions and introduces small but consistent improvements over time, making each release slightly faster.

Additionally, you might want to check some experimental compiler flags. Scala 2.12.5 introduced -Ybackend-parallelism N, which lets you emit bytecode in parallel - the PR claimed a 4-5% improvement with 8 backend threads.
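
A minimal sketch of enabling it in build.sbt - the thread count is just an example, match it to the cores available on your CI and dev machines:

// requires Scala 2.12.5+; the value is passed as a separate argument
scalacOptions ++= Seq("-Ybackend-parallelism", "8")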

Another interesting flag is -Yinduction-heuristics, which might result in faster derivation in some cases. Currently it is available only in Typelevel Scala 4, but it allowed me to get another few percent of speedup. However, TL Scala 4 is based on 2.12.4, and the current version is Scala 2.12.6, so I can believe not many people would consider what is basically a downgrade with experimental features turned on. Hopefully, the improvements tested with this flag will be available by default in Scala 2.13.

Modules and dependencies

The less scalac needs to process at a time, the better. If we phrase it like that, you might understand why the next two sentences don’t conflict with each other.

You should have as little dependencies as possible.

Several modules depending on one another might be better than having one monolithic module.

Now, if you think about it, the compiler has less work to do when analyzing already compiled classes (it doesn’t have to prove anything about them, it just assumes they are correct), while classes that are yet to be compiled need to go through all those costly phases.

To remove redundant libraries you might check sbt-dependency-graph. It let me learn that adding e.g. swagger-core also pulled in jackson and several other libraries I wouldn’t otherwise use. So I wrote sbt-swagger-2 to generate swagger.json with the build tool and not burden the compiler nor the runtime with an otherwise redundant set of libraries. But bigger gains could come from restructuring your project.
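
Using the plugin is a matter of one line in project/plugins.sbt and a task in the sbt shell - a sketch, with an illustrative plugin version:

// project/plugins.sbt - check for the current version
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.2")

> sbt
>> dependencyTree # prints the resolved dependency tree, including everything pulled in transitively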

As I mentioned earlier, my old project at some point reached 45 minutes on CI while it had ~60k lines of code. I noticed that one of my modules - the one containing the whole domain layer: entities, values, services, repositories, factories, published language and implementations - took a reaaaally long time to compile. I think something like 15-20 minutes of these 45 was just the compilation of this module. I decided to try to split it from:

commons → domains → application

into

commons → published-language
published-language → controllers
published-language → persistence
controllers → application
persistence → implementations
implementations → application

It required a few changes, but not that many - I already had things nicely layered, so in a few places I split an interface from its implementation, but mostly I just moved things from one directory to another. These actions didn’t change my architecture, nor affect how it works, but they still let me cut 6-7 minutes from the CI build.
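
In sbt such a split boils down to a handful of project definitions wired together with dependsOn - a rough sketch of the layout above (directory names are of course specific to my project):

// build.sbt
lazy val commons = project.in(file("modules/commons"))

lazy val publishedLanguage = project.in(file("modules/published-language"))
  .dependsOn(commons)

lazy val controllers = project.in(file("modules/controllers"))
  .dependsOn(publishedLanguage)

lazy val persistence = project.in(file("modules/persistence"))
  .dependsOn(publishedLanguage)

lazy val implementations = project.in(file("modules/implementations"))
  .dependsOn(persistence)

lazy val application = project.in(file("modules/application"))
  .dependsOn(controllers, implementations)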

I also noticed a benefit locally. Even with the Zinc incremental compiler, I see some inefficiencies. E.g. I have to compile 312 files, Zinc compiles 32 of them, and when that succeeds it attempts to compile the remaining 280. If it crashes at this point, it only caches the result of the successful 32-file compilation. So, once I fix the bug, I will have to compile all 280 files again, even if only one of them had an error that couldn’t affect the correctness of the rest. More modules force sbt to have more stepping stones, which might be helpful during development.

Of course, you shouldn’t make 100 modules of 10 files each. Resolving settings and dependencies between modules is not free, so such a split would have its own inefficiencies. Especially if you use snapshots.

You should also remember (and bigger private companies always do) that if you are reusing some component and hardly ever update it, making it a separate artifact and putting it into your artifactory would probably be best. Fetching (and storing in the cache) will always be faster than building.
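
Publishing such a rarely changing component is a one-time sbt setup - a minimal sketch, where the repository URL and credentials path are placeholders for your own infrastructure:

// build.sbt of the extracted component
publishTo := Some("Company Artifactory" at "https://artifactory.example.com/artifactory/libs-release-local")
credentials += Credentials(Path.userHome / ".sbt" / ".credentials")

Then a plain sbt publish makes it available to the other builds.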

No snapshots

I mentioned that snapshots are bad, but why?

If you are loading your project for the first time (or after some changes to dependencies), sbt will query the repositories and then fetch the dependencies into your local cache. After that, sbt does a rather good job of not querying artifactories when it is not needed. Unless one of the dependencies is a snapshot.

The idea of a snapshot is that it is not a release. Which means it is not final. Which means that each time you start the build you should make sure that it didn’t change, and redownload it if it did. This means that even one snapshot on your dependency list will force sbt to query artifactories, even if nothing actually changed. It doesn’t affect CI that much - it has to do it anyway, even with the CI cache enabled - but during local development it can be quite annoying.

The other implication has nothing to do with speed, but it is even more serious: the build is not reproducible. You might trigger a build on CI, have all tests pass, then click merge… and because the snapshot was updated in the meantime, your master breaks unexpectedly. Of course, git bisect and friends also become unreliable when you are building against a moving target.

The thing is: sometimes library authors don’t give you a choice. The old version breaks your code, the next release is not yet available, but you can try out a snapshot. The deadline is approaching, you have no control over the library, so you have no other option.

I think that, as a maintainer, it is your responsibility to never force your users to rely on a snapshot build. A much better option is something like a version containing the commit hash (an example from Monix 3). I do it using sbt-git. It has another advantage: if something breaks, I can pinpoint the exact commit, even though it is not a release build. If you don’t want to do that, you can always publish milestone or release candidate builds. They may not be releases, but they are still a safe target to build against.
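
A minimal sketch of such versioning with sbt-git - the plugin version is illustrative and the exact version scheme is up to you:

// project/plugins.sbt
addSbtPlugin("com.typesafe.sbt" % "sbt-git" % "1.0.0")

// build.sbt
enablePlugins(GitVersioning)
// derive the version from `git describe`: the last tag plus commit information,
// so every published build points at an exact commit
git.useGitDescribe := true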

Shapeless and macros

Scala even without code generation (by the compiler) might compile slowly. Macros can add a bit. Shapeless and implicit-based derivation can add a lot.

When I developed my old project, I had the problem of manually rewriting API case classes into domain case classes and back. It inspired the chimney project, which I started with Piotr Krzemiński (who has basically owned the project ever since).

The initial version of the library was Shapeless-based and contributed a few minutes to the overall build time. Later on, I tried to use it for an extreme case (in another project): I had 10 model case classes and 10 API case classes. Big case classes, some with 20+ attributes. These 10 transformations compiled for over 10 minutes.

Later on, Piotr rewrote the Shapeless parts into macros without changing the API. These 10 complex files compiled in under 2 minutes. Still long, but hey!

It’s worth mentioning that when we use implicit-based derivation in our code, the results are not automatically shared across the codebase. E.g. if we use Circe’s automatic derivation for some ApiModel inside 3 files, then the Encoder/Decoder will be derived 3 times: once for each file. Actually, once for each distinct scope - if one of the files had some implicit in scope that would affect the derivation, then there would be (at least) 2 separate derivations.

If we don’t want to modify the result, but just want to share one implicit across the whole codebase, it is a good idea to put it into the companion object. Additionally, some libraries (e.g. Circe) already have macro annotations that add these to the companion object without additional effort on our side:

import io.circe.generic.JsonCodec // requires circe-generic and the macro paradise plugin
// plus any custom implicits that should affect the derivation

@JsonCodec // generates Encoder[ApiModel] and Decoder[ApiModel] in the companion object
case class ApiModel(value: String)
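
If you prefer not to use macro annotations, the same effect can be achieved by hand with semi-automatic derivation - a minimal sketch:

import io.circe.{Decoder, Encoder}
import io.circe.generic.semiauto.{deriveDecoder, deriveEncoder}

case class ApiModel(value: String)

object ApiModel {
  // derived once, picked up everywhere ApiModel is (de)serialized
  implicit val decoder: Decoder[ApiModel] = deriveDecoder[ApiModel]
  implicit val encoder: Encoder[ApiModel] = deriveEncoder[ApiModel]
}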

Personally, I noticed that when I switched from the Shapeless-based implementation to the macro-based one, it spared me a few minutes. Additionally, the usage of @JsonCodec saved me a minute… though I didn’t notice it initially. I used each codec once in production code and once in test code. When I only measured the time of compile, I didn’t notice any difference. Only after I started measuring test:compile did I find out that it actually helps me.

Speeding up local tests with a database

I promised I’ll describe another way of speeding up integration tests (namely tests with a database), so that one doesn’t have to rely only on limiting synchronization with the drive. So, another way is… limiting the situations in which the database would attempt those writes in the first place.

Slick-pg has a good starting point for such tests:

  • start with an empty database
  • at the beginning of a test, create the schema and populate the tables
  • then add your actual logic and assertions
  • once done, drop the schema
  • run it all in a transaction

It might sound stupid, but it indeed affects performance! Because it all runs in a transaction, Postgres limits the number of disk accesses and tries to do more things in memory.

Still, it’s a waste of time to drop the schema every time. Thus the next optimization: instead of dropping it, use a rollback.

database.run {
  createSchema
    .andThen(dbioWithTestsAndAssertions)
    .flatMap { result =>
      // any exception triggers a rollback
      DBIOAction.failed(IntentionalRollbackException(result))
    }
    .transactionally
}.recover {
  case IntentionalRollbackException(result) => result
}
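
The IntentionalRollbackException used here is not something provided by Slick - it is just a small helper you define yourself to carry the result out of the rolled-back transaction, e.g.:

// custom helper, not part of Slick
final case class IntentionalRollbackException[T](result: T)
  extends Exception("Rolling back the transaction intentionally")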

To improve things even more, we can create the schema only once - before running the tests. It might create an issue with the IDs though - if your tests assume that the first record has id = 1, then they will fail (a rollback doesn’t reset the sequences). But that can be easily fixed:

database.run {
  sql"""SELECT setval('"' || tb.table_name || '_' || cols.column_name || '_seq"', 1, false)
        FROM information_schema.tables AS tb
        INNER JOIN information_schema.columns AS cols ON tb.table_name = cols.table_name
        WHERE tb.table_name <> 'schema_version'
        AND tb.table_schema = 'public'
        AND column_default like 'nextval(%)'""".as[Int]
    .andThen(dbioWithTestsAndAssertions)
    .flatMap { result =>
      // any exception triggers a rollback
      DBIOAction.failed(IntentionalRollbackException(result))
    }
    .transactionally
}.recover {
  case IntentionalRollbackException(result) => result
}

As far as I can tell, it might indeed force you to rewrite some integration tests (unless you have already extracted the whole create-test-drop procedure into a method). So, is it worth it?

In my old project, I started at 300s for an integration test run. Adding transactions got me to 240s. Rollbacks and creating the schema before the tests got me to 120s. Disabling fsync and synchronous commits got me to 70s.

Here the database is just an example, but in general integration tests might be slowed down by external dependencies and IO - it’s worth trying out different options to make sure that a test doesn’t waste IO time on unrelated things.

Alternative scalac

Once one runs out of options, one can always try another compiler.

Grzegorz Kossakowski has proven with Kentucky Mule that it is possible to define a subset of Scala that can be compiled in parallel and really fast. His findings were turned by Twitter into an actual alternative, namely Reasonable Scala. This is NOT a drop-in replacement for the whole of Scala, but if one uses only a subset of features (one that matches Twitter’s use cases), then it is possible to get several times faster compilation. However, that is at odds with any type-level-heavy approach.

Another option is to try out Triplequote Hydra. It is a drop-in replacement for the Scala compiler which can give a 1.5x-3.5x speedup. However, this is a premium product, so I had no opportunity to test it and probably couldn’t afford it.

I list these options last, as they can impact the architecture (or the budget) so heavily that I would recommend them only as a last resort. (Not to mention the fact that I didn’t have the opportunity to test them.) Hopefully, Dotty will optimize things further and maybe incorporate some findings from Reasonable Scala.

Summary

This is a list of things I personally tested (excluding the other compilers) on one project I worked on some time ago. At its worst moment, this project ran on CI for 45 minutes while it had 60k lines of code. Currently, it runs for 17 minutes with 70k lines of code. Big difference.

It doesn’t prove that these methods are universal, or that commercial solutions (better hardware, a higher CI tier, premium JVMs, Hydra) are redundant. However, for me it shows that quite often the reason for slow build times lies in our carelessness with the code and its maintenance - the speed of the Scala compiler is not the sole offender. So with relatively simple and cheap means, we can make the project better.

Only after we have tried all of these methods does it make sense to invest in more expensive solutions. (Though I do believe that companies paying for them already know that.)

I do not know whether it will help you, or to what degree, but it is surely a nice checklist to have.