Modeling in Scala, part 1: modeling your domain
Scala gives us a lot of power. We can easily model things with classes and OOP, we can pull in functional libraries and model everything with functions and values, we can implement a stateless monolith or a distributed system. And today we have a lot of books and tutorials that cover these topics: how to glue the code together to create a maintainable application with raw Futures/Akka/Cats ecosystem/Zio ecosystem/Scala as typed Python.
However, before we glue things together it would be nice to discuss a bit how we can design the pieces of code that we will glue, the ones that the business is directly interested in us providing: models reflecting their business domain. The only book that comes to my mind is Domain Modeling Made Functional by Scott Wlaschin - it is a great book, mostly language agnostic (most of its F# reads like pseudocode), but it could help to know how to translate some concepts to Scala (the only presentation I know of is Scala 3 by Example - Algebraic Data Types for Domain Driven Design - Part 1 and Part 2 by Philip Schwarz; the official docs aren't much of a guideline, more like a list). So in this article (largely inspired by the book) we will discuss how we could implement our domain in Scala.
Before we jump in to analysis and coding, a disclaimer. I will refer to Domain-Driven Design a lot. But DDD is not a set of rules about how to code. It's about your whole company embracing a certain attitude. What do I mean? When you implement invoicing you want to be on the same page as your financial experts. When you implement discount coupons you want to understand their goals and usages the same way as your sales. When you implement a FooBar you want to understand it the same way as your users/customers, the same way as the designers that will present it to the customers, the same way as the marketing which will promote it, etc. It helps when you all use the same words, when the same words mean the same things, and when you acknowledge that there might actually be several smaller pieces of your whole business where similar words are used for slightly different things: for the security team a "user" might be an entity that can have credentials and authorization for specific resources, while for the financial team it will be a set of data needed to make a charge. These might be connected to the same real-world entity (here: a human being), but it might not be a good idea to put all the possible definitions into one single model. After all, a model is just a useful approximation. So the Security subdomain of your business might have its own User definition, the Financial subdomain its own, and when you talk about the User you should explicitly tell which one, so that everyone is on the same page.
This transformation of attitude (attitude, not some shamanistic rituals) is really beneficial and really hard, and I don't want to discuss it here. I want to discuss the part where - once you have already embraced the idea - you want to implement it. So when I use some terminology from DDD, please treat it just as a way of avoiding ambiguity. If you embraced the attitude, it is not so important what names and practices you pick as long as they work for you.
I started writing this article on 14th July 2021, and kinda finished it on 16th August 2021. I wanted to expand on A Few Tips on Modelling Things in Scala, a presentation that I gave at Scala Love earlier that year. It was supposed to be the first one of a series - the next posts could be about event sourcing and testing without Mockito - and I wanted to publish it when I'd have at least 2 articles in the pipeline for publishing.
As you can see, 3 and a half years later, I still haven't written that second article, which blocked me from publishing this one. I think that, while outdated, it might still be useful, so I decided to publish it after some editing.
Blank slate
When I sit down with a new functionality or even a project, I often feel the urge to immediately start drawing SQL schemas and relations, and what can be cached, and how to create an amazing architecture… I learned to ignore it. I don't mean that good architecture isn't needed - quite the opposite - it's just that if you wander away into the realm of code, you won't be there to listen to what this code should do. You might start thinking about memory optimisations, or how you would push it through a distributed pipeline, or whether you need a window function in SQL. There will be plenty of time for that later on. Who needs the functionality, why they need it, what they don't need, what this functionality is not. What do they actually mean when they say [put any word used by the expert]. What that thing is made of (tip: it is still not a good moment to think of the smallest things in terms of Strings, Ints and Doubles). What are the possible values and which should be rejected. Are User's First Name and Last Name always needed, or only during the checkout? Should the list of Addresses contain at least one of them? We also need to ask whether we move some data around "as it is" or treat it as a reference to something that has a lifecycle. If you stored Address as something which has its own ID and lifecycle, then when you updated it, you would accidentally update past invoices and orders. On the other hand, if the customer informed you that they want to update the Order (e.g. change the Shipping Address), you probably would like to just update the Order instead of cancelling it and creating a new one.
So, it helps to ask yourself what you need to know, and then keep on asking the people who do know it. Sometimes getting back to them in the middle of the coding, when you realise that you missed something. And, when you start to implement all these domain operations, and models representing data… you still don't have to think about the DB schema, JSON documents and network calls. There is this thing called being persistence-agnostic. Your models should try to reflect reality in the best way possible. You won't necessarily achieve that if you see your models only as representations of a SQL table row. Same when it comes to calling external services (or other subdomains). Even if you realise your payments with Stripe, it doesn't mean that your Payment models have to be Stripe API representations (especially if one day you will be using more payment providers). In a way, it's nice - for a moment - to forget that there are storages and network calls, and just imagine that all of your data is already in memory. (And in fact, you can have multiple implementations of your side-effecting interfaces, where one of them is an in-memory implementation!)
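To make "persistence-agnostic" concrete, here is a minimal sketch (all names - OrderRepository, Order, OrderID - are hypothetical at this point; we will model orders later in the article), assuming cats-effect's IO and Ref: the domain talks only to an interface, and one of its implementations can live entirely in memory.

import cats.effect.{ IO, Ref }

trait OrderRepository {
  def save(order: Order): IO[Unit]
  def find(id: OrderID): IO[Option[Order]]
}

// An in-memory implementation - useful for tests and prototyping,
// swappable for a SQL-backed one without touching the domain logic.
final class InMemoryOrderRepository(
  state: Ref[IO, Map[OrderID, Order]]
) extends OrderRepository {
  def save(order: Order): IO[Unit] =
    state.update(_ + (order.id -> order))
  def find(id: OrderID): IO[Option[Order]] =
    state.get.map(_.get(id))
}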
Let’s describe some domain
So let’s model a customer making an order. For that we surely would need to:
- define a customer
- define an order
- a way of checking out the order
These descriptions are way too abstract, we need to start translating them into something specific.
Functions for actions, values for data
So our first draft could be something like this:
type Customer
type Order
def checkout(customer: Customer): Order = ???
Ok, we have the first version, but it is not implemented. The types are abstract and uninstantiable, so we cannot implement checkout. At this point we aren't sure what it needs to do, and whether it should be a part of some model, as a method, or a standalone thing. So we go to discuss it with the people who handle orders, and learn that to make an order we need some data:
- customer’s first and last name
- billing and shipping addresses
- list of items that are being ordered
So let’s describe that in Order
’s definition:
type FirstName
type LastName
type Address
type Item
final case class Order(
customerFirstName: FirstName,
customerLastName: LastName,
billingAddress: Address,
shippingAddress: Address,
items: List[Item]
)
This slowly starts to look like a definition that we could show to someone. But we still cannot implement checkout. Where do the first name and last name come from? Should we always take the data of the currently logged-in user? Or is this something that users would fill in themselves, with us only providing an autofill option in the UI? Same with the address. Should we maintain a list of predefined addresses? Should users only be able to pick an address out of their lists? Or should they provide one individually for each order? Where do the items come from? We discuss things with our experts (sometimes over 5 meetings across a whole week) and we learn that in our business:
- we always use current user’s data for first name, last name and billing address
- they are able to customize their shipping address
- items will come from the basket - customer will first place the items there and later during the checkout we will retrieve them
That helps us move a bit further. We now know that the Customer holds some data:
final case class Customer(
firstName: FirstName,
lastName: LastName,
billingAddress: Address
)
With that we are able to implement the checkout function, modifying its signature to adjust to what we learned:
def checkout(
customer: Customer,
shippingAddress: Address, // added
items: List[Item] // added
): Order =
// implemented!
Order(
customerFirstName = customer.firstName,
customerLastName = customer.lastName,
billingAddress = customer.billingAddress,
shippingAddress = shippingAddress,
items = items
)
It looks really good! Not only are we able to express everything we learned so far with the types, the code is also so transparent that you could show it to a non-technical person and they would be able to keep up!
case and sealed are your friends
So we take this and go to our experts to show them our implementation so far. And they notice several things:
- Order should have some ID, so that they know which order they are handling; it should be generated according to the specification
- Order should have a status: unpaid, paid, in progress, shipped, cancelled - after creation an order should be unpaid
- we shouldn't be able to create an order without items
So, let’s implement these changes in our code:
type OrderID
// IO because generation is probably a side-effect
def generateNewOrderID: IO[OrderID] = ???
type Status
final case class Order(
id: OrderID, // added
customerFirstName: FirstName,
customerLastName: LastName,
billingAddress: Address,
shippingAddress: Address,
items: NonEmptyList[Item], // non-empty! e.g. from Cats
status: Status // added
)
We will eventually have to define generateNewOrderID, run it, and pass its result to checkout… but at the moment we are modelling checkout, so let's leave it for now.
Hmm, we need to define Status to fix checkout. In Scala 2 we could do it like this:
// In Scala 2 I always add Product with Serializable
// to avoid inferring Status with Product with Serializable
// when mixing case classes and case objects - but it is not necessary.
sealed trait Status extends Product with Serializable
object Status {
case object Unpaid extends Status
case object Paid extends Status
case object InProgress extends Status
case object Shipped extends Status
case object Cancelled extends Status
}
There is an old, deprecated way of defining enumerations using scala.Enumeration. There is also a recommended way of improving this sealed trait + case object pattern using the Enumeratum library. We'll explain later why the former is discouraged and the latter encouraged.
and in Scala 3 like:
enum Status:
case Unpaid, Paid, InProgress, Shipped, Cancelled
I will show examples in both Scala 2 and Scala 3 syntax the first time I introduce some idea, but later on I will stick to Scala 3 syntax as it’s much more readable.
With that we can update our checkout definition:
def checkout(
orderID: OrderID, // added
customer: Customer,
shippingAddress: Address,
items: List[Item]
): Either[String, Order] =
// validated items!
NonEmptyList.fromList(items) match {
case Some(nonEmptyItems) =>
Right(
Order(
id = orderID,
customerFirstName = customer.firstName,
customerLastName = customer.lastName,
billingAddress = customer.billingAddress,
shippingAddress = shippingAddress,
items = nonEmptyItems,
status = Status.Unpaid
)
)
case None =>
Left("Order Items shouldn't be empty")
}
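To see how callers experience this, here is a hypothetical usage (orderID, customer, shippingAddress and item stand for values created earlier): the empty-basket case is an ordinary value to handle, not an exception to catch.

checkout(orderID, customer, shippingAddress, items = Nil)
// => Left("Order Items shouldn't be empty")

checkout(orderID, customer, shippingAddress, items = List(item))
// => Right(Order(id = orderID, ..., status = Status.Unpaid))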
There are no primitives in my domain
We are missing some type definitions.
type FirstName
type LastName
type Address
type Item
type OrderID
After a few rounds of discussions with our sales experts, we were able to get some information about them:
- first name and last name are just text (no surprise here)
- order ID is also text, generated according to a specific rule (let’s skip it for now)
- address contains: a country, a city, a postal code and 2 lines reserved for street name or similar
- item contains: a catalogue number (text), a unit price and quantity
Nothing stops us from implementing FirstName and LastName as type aliases:
type FirstName = String
type LastName = String
or even from using String directly. Type aliases have a certain advantage - you can pass the type they are aliasing, or anything that resolves to it, and the compiler won't complain. They also have a certain disadvantage - you can pass the type they are aliasing, or anything that resolves to it, and the compiler won't complain. An alias doesn't create a new type, it merely lets us use another name for the same type. It is a huge advantage when you design a generic library and the types get wild, and the signature of combinations of lists and eithers and effects and functions etc. doesn't necessarily explain the intent. But when you are designing the domain, it is so much better to just have completely separate types for any 2 things that represent different concepts. Even if these concepts could be implemented by the same primitive like String, Int or Boolean.
I pureposefully didn’t mention
Float
orDouble
as the first domain-related use case for them that comes to mind are money-related calculations. Floating point arithmetics - due to how it handles roundings - is the absolutely worst representation for money-related data and computations. Fixed-points/BigDecimal
s and explicit control over the when and how you round are the only responsible choice.
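A one-liner in the REPL shows the problem - the classic binary floating point artifact that BigDecimal (constructed from strings) avoids:

0.1 + 0.2 == 0.3 // false: the Double sum is 0.30000000000000004
BigDecimal("0.1") + BigDecimal("0.2") == BigDecimal("0.3") // true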
So, let’s make use of our type system and implement even simples domain object as distinct types. There are several ways of doing so (in Scala 2, and Scala 3 introduces a new one), we will discuss them later, for now we could simply use:
final case class FirstName(value: String)
final case class LastName(value: String)
final case class OrderID(value: String)
We could also implement Address and Item:
enum Country:
case Canada, UnitedKingdom, UnitedStates, ...
final case class City(value: String)
final case class PostalCode(value: String)
final case class AddressFirstLine(value: String)
final case class AddressSecondLine(value: String)
final case class Address(
country: Country,
city: City,
postalCode: PostalCode,
firstLine: AddressFirstLine,
secondLine: AddressSecondLine
)
// There are some existing Money representations you could use,
// like one from Squants: https://github.com/typelevel/squants
enum Currency:
case CAD, EUR, USD, ...
final case class UnitPrice(
currency: Currency,
amount: BigDecimal
)
final case class Quantity(value: BigDecimal)
final case class CatalogueNumber(value: String)
final case class Item(
catalogueNumber: CatalogueNumber,
unitPrice: UnitPrice,
quantity: Quantity
)
At this point we can start wondering: OK, if we want to make the list of items displayed to the user reproducible, then the item data behind a CatalogueNumber should also be immutable, so each change to an item page should result in a new number. Or the catalogue number is just an ID for an updatable entity, and then the item should have a copy of every single property of it, so that in the future, when the catalogue item is edited, historical data will still present the item as it was when it was ordered. These are very valid questions, but we want this to be an example and not a several-day-long workshop, so I'll leave this and similar questions as an exercise for curious readers.
It looks like we have all the models implemented so far. So we might take a look at improving them, to make them less ambiguous and more useful. For instance, errors. String isn't a very good error format. It's OK for logging, but you also need to recover from errors, or translate errors from one context into another. Would you like to pattern match on String with literals or regular expressions? It would be quite fallible. So, how about a separate type?
enum OrderError:
case EmptyOrderList
def checkout(
orderID: OrderID,
customer: Customer,
shippingAddress: Address,
items: List[Item]
): Either[OrderError, Order] =
NonEmptyList.fromList(items) match {
case Some(nonEmptyItems) =>
Right(Order(...))
case None =>
Left(OrderError.EmptyOrderList)
}
In Exception-driven error handling, you would just throw Exceptions. There are some tiny, little issues with that:
- there is a difference between domain errors and logical/infrastructural errors - domain errors… aren't actually errors. They are a part of the flow that you have to model and design for. Malformed JSON, an interrupted connection to the DB - these usually aren't a part of the flow. When a customer sees information that an item is out of stock, or that they cannot check out anything with an empty basket, or that they left some field unfilled, you are guiding them towards how they can achieve their results. If the client app messes something up, or if the infrastructure is down - the customer cannot do anything about it, the application's logic for the most part cannot do anything to recover from it; all that can be done is to transparently jump to the top of the call stack and report that shit hit the fan. It doesn't necessarily have to be about domain vs non-domain errors, but there are errors that you want to force someone to do something about (errors returned as values) and errors that you just want to be there, to be handled by someone, somewhere, if they feel like it (Exceptions thrown or passed inside an IO monad)
- Exceptions (and Throwables in general) are an open hierarchy. You cannot find one place in your codebase with a list of all the possible issues that your call can have. The opt-in-per-implementation approach is quite good - at the end of the world. Within your domain, it is absolutely beneficial to be able to enumerate all the possible issues to handle; better yet, to use pattern matching exhaustiveness to ensure that you explicitly decided how each of them should be handled. Added a new Exception? Everything compiles, no need to change the code. Your tests didn't happen to trigger it? Oopsie, you learn about the unhandled exception after a deployment on Friday. It makes things much easier to maintain when you just add a new case to an enum, and the compiler fails in every place where you forgot to handle it. Used responsibly, you can fix bugs before you even push the changes to CI.
But it might take some practice, experience and several arguments before you and your team figure out which errors are errors and which are just alternative valid outcomes of your domain operations. And to actually change your habits and notice the benefits of replacing Exceptions with ADTs.
If you aren’t experienced with this approach I suggest you to NOT go all overzelaous-neophyte about it, because passing every single little thing explicitly through all the layers is an amazing way of burning yourself. Try out by passing errors thi way locally, and gradually increasing the scope that the modeled errors traverls within
Either
,Validated
,BIO
or whatever you will use.
In the meantime, let’s try to take a look at our records and think a bit if their design could be slightly improved.
Rethinking our models
Let's start by summarising what we wrote so far:
// address-related
enum Country:
case Canada, UnitedKingdom, UnitedStates, ...
final case class City(value: String)
final case class PostalCode(value: String)
final case class AddressFirstLine(value: String)
final case class AddressSecondLine(value: String)
final case class Address(
country: Country,
city: City,
postalCode: PostalCode,
firstLine: AddressFirstLine,
secondLine: AddressSecondLine
)
// customer-related
final case class FirstName(value: String)
final case class LastName(value: String)
final case class Customer(
firstName: FirstName,
lastName: LastName,
billingAddress: Address
)
// money-related
enum Currency:
case CAD, EUR, USD, ...
final case class UnitPrice(
currency: Currency,
amount: BigDecimal
)
// order
final case class OrderID(value: String)
final case class CatalogueNumber(value: String)
final case class Quantity(value: BigDecimal)
final case class Item(
catalogueNumber: CatalogueNumber,
unitPrice: UnitPrice,
quantity: Quantity
)
enum Status:
case Unpaid, Paid, InProgress, Shipped, Cancelled
final case class Order(
id: OrderID,
customerFirstName: FirstName,
customerLastName: LastName,
billingAddress: Address,
shippingAddress: Address,
items: NonEmptyList[Item],
status: Status
)
// errors
enum OrderError:
case EmptyOrderList
// services
def generateNewOrderID: IO[OrderID] = ??? // TODO
def checkout(
orderID: OrderID,
customer: Customer,
shippingAddress: Address,
items: List[Item]
): Either[OrderError, Order] =
NonEmptyList.fromList(items) match {
case Some(nonEmptyItems) =>
Right(
Order(
id = orderID,
customerFirstName = customer.firstName,
customerLastName = customer.lastName,
billingAddress = customer.billingAddress,
shippingAddress = shippingAddress,
items = nonEmptyItems,
status = Status.Unpaid
)
)
case None =>
Left(OrderError.EmptyOrderList)
}
When we look at this, it has all the right things, but - at least for me - the namespacing feels kinda off. We have all the definitions at the top level, even though e.g. FirstName and LastName belong exclusively to Customer, and AddressFirstLine and AddressSecondLine belong exclusively to Address. We could express that relation using namespacing with companion objects.
final case class Address(
country: Country,
city: Address.City,
postalCode: Address.PostalCode,
firstLine: Address.FirstLine,
secondLine: Address.SecondLine
)
object Address {
final case class City(value: String)
final case class PostalCode(value: String)
final case class FirstLine(value: String)
final case class SecondLine(value: String)
}
final case class Customer(
firstName: Customer.FirstName,
lastName: Customer.LastName,
billingAddress: Address
)
object Customer {
final case class FirstName(value: String)
final case class LastName(value: String)
}
This change doesn’t impact the functionality at all… but (again, it’s subjective) I feel that now I have clear indication of context where each of these properties is being used. Additionally, if I were looking for Address.City
I would be certain that I could look for it in the same file as Address
(so if you stick to the standard convention of your.project
packages translating to your/problem
directories and ClassName
to ClassName.scala
you can tell where to look for the definition even without any tooling).
Why Algebraic Data Types
So, let’s talk a bit about changes that do affect functionality and code behavior. There are reason why we are using sealed
hierarchiy (enums
in Scala 3, both examples of sum types) and case class
es (an example of product types) to model our data (BTW, product types + sum types = algebraic data types):
- a good equality check out of the box - as long as every element of a case class has a good equals implementation, the case class will also have a well-behaving equals; as long as every element of a sealed hierarchy has a good equals, so does the whole sealed hierarchy. Which usually means that only:
  - raw Arrays (they compare only references and you cannot override it; it usually makes sense to wrap them in e.g. ArraySeq to make them easier to compare and display)
  - functions (their comparison is undecidable in general, so you can hardly ever do more than compare references)
  - and non-case custom types in your hierarchy (for which you didn't implement equals and hashCode)
  could break that check
- an unapply that matches apply out of the box - the way you construct your value looks exactly the same as the way you pattern match on it:
  case class Foo(bar: Bar)
  Foo(someBar) match {
    case Foo(matchedBar) => // ...
  }
- good defaults for toString - if you want to debug your ADT-based data, the default implementation will show your whole structure without any additional effort from your side
- type class derivation - we'll describe that in detail later, but if you are able to define the behavior of a whole object as a way of composing the behaviors of its parts, then for ADTs you can generate this behavior just by describing the behavior of the primitives, a way of combining the behaviors of the elements of a case class, and a way of dispatching behavior in a sealed hierarchy/enum - you can basically generate the implementation at compile time and guarantee that it handles all the cases
- when you are working in a multithreaded environment, immutable data will give you a lot of safety by design. You want to modify data? You have to send a modified copy. You have to pass data explicitly? No accidents with storing things in globals, and no little race-condition accidents. And if you want to create a modified copy, a case class gives you the .copy method. (We will show that modifying nested data and polymorphic data is also not an issue)
This is a lot of (correct default) behavior that we get for free. So, when we model our data, and when we have several options to implement something which would all describe the domain equally well, we might think of picking the one that would be the easiest to use. When we are dealing with values - as understood by functional programming but also as understood by DDD - there is usually hardly any issue. Their value is their whole story. You just follow the definitions and you get the code. Things look a little bit different when we look at entities.
Entities and Values
Entities are things with a lifecycle. They can change over time and you still have to be able to point at them and say: this object has different properties than the previous one, but it is the same entity. In other words, they have an identity which is different than the sum of all their fields. When someone updates their nickname, you are still able to say that they are the same person. When a company changes its address, it is still the same company. This forces us to ask some questions: how do you compare entities? If we are using ADTs to represent our data, then we'll be comparing the whole object, always. But if we override equals (and hashCode) to e.g. only look at some ID field, then we might end up with some unfunny behavior: you might e.g. try to trace all the versions of some entity in your code and see only one of them - if the ID is equal, then there will be only one of them in a Set or in a collection that you run distinct on.
Don’t be afraid to nest
My personal take on this - which you might not agree with - is to unflatten the data. For instance, our Order
final case class OrderID(value: String)
final case class Order(
id: OrderID,
customerFirstName: FirstName,
customerLastName: LastName,
billingAddress: Address,
shippingAddress: Address,
items: NonEmptyList[Item],
status: Status
)
could be represented like this:
final case class Order(
id: Order.ID,
data: Order.Data
)
object Order {
final case class ID(value: String)
final case class Data(
customerFirstName: FirstName,
customerLastName: LastName,
billingAddress: Address,
shippingAddress: Address,
items: NonEmptyList[Item],
status: Status
)
}
Suddenly, the issue almost disappears. If we want to compare whole orders, we use them directly. If we want to compare their IDs, we map/filter/etc. on _.id. We can even check if two different orders have the same content with _.data! As a matter of fact, by breaking the big object into smaller ones, we make it easier to work with the data. If we wanted to check if two orders have the same customers, we would have to do
o1.data.customerFirstName == o2.data.customerFirstName &&
o1.data.customerLastName == o2.data.customerLastName
By giving a name to commonly used values (because if we are often using them together, they are probably some separate, yet unidentified model, and it deserves a name), we make it easier to work with them
final case class Customer(
billingName: Customer.BillingName,
billingAddress: Address
)
object Customer {
final case class FirstName(value: String)
final case class LastName(value: String)
final case class BillingName(
firstName: Customer.FirstName,
lastName: Customer.LastName,
)
}
final case class Order(...)
object Order {
// ...
final case class Data(
billingName: Customer.BillingName,
billingAddress: Address,
shippingAddress: Address,
items: NonEmptyList[Item],
status: Status
)
}
o1.data.billingName == o2.data.billingName
In his DDD book, Eric Evans calls such moments - when you look at your models, talk to the expert and figure out something which makes you understand the domain better and improve the model - a discovery process. A series of such discoveries might drastically change the way your model works. Which is not a bad thing - it simply means that our old model was a worse representation of reality and the new one is closer and better. It would be bad if we had tied our model to how we store data and, in general, how we interact with the external world. But we didn't, we started from the blank slate. And, after a series of small discoveries, we arrived at the following design:
// address-related
enum Country:
case Canada, UnitedKingdom, UnitedStates, ...
final case class Address(
country: Country,
city: Address.City,
postalCode: Address.PostalCode,
firstLine: Address.FirstLine,
secondLine: Address.SecondLine
)
object Address {
final case class City(value: String)
final case class PostalCode(value: String)
final case class FirstLine(value: String)
final case class SecondLine(value: String)
}
// customer-related
final case class Customer(
billingName: Customer.BillingName,
billingAddress: Address
)
object Customer {
final case class FirstName(value: String)
final case class LastName(value: String)
final case class BillingName(
firstName: Customer.FirstName,
lastName: Customer.LastName,
)
}
// money-related
enum Currency:
case CAD, EUR, USD, ...
final case class UnitPrice(
currency: Currency,
amount: BigDecimal
)
// order
final case class CatalogueNumber(value: String)
final case class Quantity(value: BigDecimal)
final case class Item(
catalogueNumber: CatalogueNumber,
unitPrice: UnitPrice,
quantity: Quantity
)
enum Status:
case Unpaid, Paid, InProgress, Shipped, Cancelled
final case class Order(
id: Order.ID,
data: Order.Data
)
object Order {
final case class ID(value: String)
final case class Data(
billingName: Customer.BillingName,
billingAddress: Address,
shippingAddress: Address,
items: NonEmptyList[Item],
status: Status
)
}
// errors
enum OrderError:
case EmptyOrderList
// services
def generateNewOrderID: IO[Order.ID] = ??? // TODO
def checkout(
orderID: Order.ID,
customer: Customer,
shippingAddress: Address,
items: List[Item]
): Either[OrderError, Order] =
NonEmptyList.fromList(items) match {
case Some(nonEmptyItems) =>
Right(
Order(
id = orderID,
data = Order.Data(
// Since we copied the whole Customer content,
// does it mean we could just store it here?
// Have we made another discovery?
billingName = customer.billingName,
billingAddress = customer.billingAddress,
shippingAddress = shippingAddress,
items = nonEmptyItems,
status = Status.Unpaid
)
)
)
case None =>
Left(OrderError.EmptyOrderList)
}
I believe that, at first glance, the differences between this code and the flat code before are not large. You might feel that the representation basically hasn't changed, and that the change was superficial. After all, it contains exactly the same data. However, think a bit about how you would work with the old representation and the new one:
- if you wanted to find all Orders in a set with the same BillingName, which one would result in easier-to-read and less error-prone code?
- if you wanted to find duplicates among Orders with different IDs, how would you implement it in the old version and how in the new one?
- if you wanted to extract some data with pattern matching and e.g. skip BillingNames with _, which version would require slightly more effort and which less? (Here it is only a 1-underscore difference, but for bigger models it could be a greater difference - see the sketch below)
- if you wanted to tell in what context some model was defined, what it relates to, how it connects with other things - without guessing! - would you prefer the nested structure, or maybe the flat one?
This is just my personal opinion, but by breaking down bigger models into small ones, we not only made the code easier to read and understand but also easier to maintain. Without losing any data. So, after looking at how we can make a better use of product types, let's take a look at sum types.
You don’t have to validate what you cannot create
During the discussion with an accountant, you learn that your order has to display the VAT. You think to yourself: it’s simple, I’ll just add another field, and call it a day.
final case class UnitPrice(
currency: Currency,
netAmount: BigDecimal,
vat: BigDecimal
) {
def grossAmount: BigDecimal =
netAmount * (BigDecimal(1) + vat)
}
However, the accountant quickly informs you: in some circumstances you are NOT adding the VAT to the invoice, e.g. when the customer is a company from a different country which has signed a treaty about avoiding double taxation. So, suddenly, your UnitPrice would come from a service, one which should adjust to such cases. And you have to distinguish the cases when the VAT was not collected due to such a treaty from situations when some item is subject to 0% VAT. So, you think to yourself: I'll just make this field optional:
final case class UnitPrice(
currency: Currency,
netAmount: BigDecimal,
vat: Option[BigDecimal] // None -> avoiding double taxation
) {
def grossAmount: BigDecimal =
netAmount * (BigDecimal(1) + vat.getOrElse(BigDecimal(0)))
}
Then you learn that when a customer orders bigger amounts of a specific item, the system should decrease its unit price (and that this is different from the discount calculated from the total order). BUT, because of the company's regulations, this discount can only be assigned to domestic customers.
final case class UnitPrice(
currency: Currency,
netAmount: BigDecimal,
vat: Option[BigDecimal], // None -> avoiding double taxation
discount: Option[BigDecimal] // can only be Some if the customer is domestic
) {
// this is ugly, and impossible to verify at compile time:
// a discount (domestic customers only) implies that the VAT is collected
assert(vat.isDefined || discount.isEmpty)
def grossAmount: BigDecimal =
netAmount *
(BigDecimal(1) - discount.getOrElse(BigDecimal(0))) *
(BigDecimal(1) + vat.getOrElse(BigDecimal(0)))
}
Slowly, this design becomes problematic. We have more and more options - which don't explain anything about themselves. We have several BigDecimals which can be easily confused. It can only get worse. At this point you can either start adding more and more Options, handled by more and more ifs, or go insane, or quit your job. Or rethink your model. We have to:
- distinguish between domestic and non-domestic customers
- among the non-domestic ones, distinguish those with a treaty about avoiding double taxation
- allow the domestic ones to have per-item-type discounts
- ideally, be able to communicate the intent, so that when we inevitably change the requirements, we will still understand what this data meant and why it was here, or why it might not be there (something which Option fails to communicate)
We can address these by making several distinctions. For starters, let's distinguish money from the price. We could have done this sooner, but now it has become even more apparent that we have to do it:
final case class Ratio(asFraction: BigDecimal) {
// operations needed later for the price calculations:
def + (other: Ratio): Ratio = Ratio(asFraction + other.asFraction)
def - (other: Ratio): Ratio = Ratio(asFraction - other.asFraction)
}
object Ratio {
val whole: Ratio = Ratio(BigDecimal(1))
}
// This could be implemented by some library like squants
// which would add much more functionality to it, like
// e.g. symbols for each currency, the rules of rounding(!!!)
// (different for each currency), etc.
final case class Money(
currency: Currency,
amount: BigDecimal
) {
// Money operations, e.g:
def * (ratio: Ratio): Money = copy(
amount = amount * ratio.asFraction
)
def * (quantity: Quantity): Money = copy(
amount = amount * quantity.value
)
}
We could also make it explicit whether there is a double tax avoiding treaty or not:
// Scala 2
sealed trait NonDomesticType extends Product with Serializable
object NonDomesticType {
case object NoTreaties extends NonDomesticType
case object DoubleTaxAvoiding extends NonDomesticType
}
// Scala 3
enum NonDomesticType:
case NoTreaties, DoubleTaxAvoiding
whether there is a discount or not
// Scala 2
sealed trait Discount extends Product with Serializable
object Discount {
case object NoDiscount extends Discount
final case class LargeOrderDiscount(discount: Ratio) extends Discount
}
// Scala 3
enum Discount:
case NoDiscount
case LargeOrderDiscount(discount: Ratio)
and finally whether the order is domestic or not
// Scala 2
sealed trait PriceHandling extends Product with Serializable
object PriceHandling {
final case class Domestic(discount: Discount) extends PriceHandling
final case class NonDomestic(nonDomesticType: NonDomesticType) extends PriceHandling
}
// Scala 3
enum PriceHandling:
case Domestic(discount: Discount)
case NonDomestic(nonDomesticType: NonDomesticType)
Now we can rethink our price approach:
// we could make Value = Ratio when we just store taxation rate
// or Value = Money when we will return the collected amount...
// but without losing a reason if we don't collect it!
// Scala 2
sealed trait Vat[+Value] extends Product with Serializable {
def map[Another](
f: Value => Another
): Vat[Another] = this match {
case Vat.Collected(value) => Vat.Collected(f(value))
case Vat.NotCollectedAvoidingDoubleTax => Vat.NotCollectedAvoidingDoubleTax
}
def fold[B](
collected: Value => B,
avoidDoubleTax: => B
): B = this match {
case Vat.Collected(value) => collected(value)
case Vat.NotCollectedAvoidingDoubleTax => avoidDoubleTax
}
}
object Vat {
final case class Collected[+Value](value: Value) extends Vat[Value]
case object NotCollectedAvoidingDoubleTax extends Vat[Nothing]
}
// Scala 3 version
enum Vat[+Value]:
case Collected(value: Value)
case NotCollectedAvoidingDoubleTax
def map[Another](
f: Value => Another
): Vat[Another] = this match {
case Collected(value) => Collected(f(value))
case NotCollectedAvoidingDoubleTax => NotCollectedAvoidingDoubleTax
}
def fold[B](
collected: Value => B,
avoidDoubleTax: => B
): B = this match {
case Collected(value) => collected(value)
case NotCollectedAvoidingDoubleTax => avoidDoubleTax
}
This let’s us express the logic and relationships in a very clear way:
final case class ItemPrice(
netPrice: Money,
vatRate: Vat[Ratio]
) {
// Stores the information about the VAT collected as money,
// without losing the information about the reason why the VAT
// was not collected, if it wasn't.
def vatValue: Vat[Money] = vatRate.map(rate => netPrice * rate)
// Calculates the total (gross) price.
def grossPrice: Money = vatRate.fold(
collected = rate => netPrice * (Ratio.whole + rate),
avoidDoubleTax = netPrice
)
}
final case class UnitPrice(
base: Money,
priceHandling: PriceHandling
)
val standardVatRate: Vat[Ratio] = Vat.Collected(Ratio(BigDecimal("0.23")))
// Expresses the logic in a very straightforward way.
def calculatePrice(unitPrice: UnitPrice, quantity: Quantity): ItemPrice =
// Exhaustive pattern matching forces us to handle all cases.
// At the same time, it is impossible to have an invalid state
// like price with a discount and without VAT.
unitPrice.priceHandling match {
// You don't have to deal with double-tax-avoiding treaties
// in a domestic sale, so it is not available for them.
case PriceHandling.Domestic(Discount.NoDiscount) =>
ItemPrice(
netPrice = unitPrice.base * quantity,
vatRate = standardVatRate
)
// Only domestic sales can have discounts.
case PriceHandling.Domestic(Discount.LargeOrderDiscount(discount)) =>
ItemPrice(
netPrice = unitPrice.base * quantity * (Ratio.whole - discount),
vatRate = standardVatRate
)
// You cannot use discount as it is not available in these cases.
case PriceHandling.NonDomestic(NonDomesticType.NoTreaties) =>
ItemPrice(
netPrice = unitPrice.base * quantity,
vatRate = standardVatRate
)
// DoubleTaxAvoiding can only be used with NonDomestic sale.
case PriceHandling.NonDomestic(NonDomesticType.DoubleTaxAvoiding) =>
ItemPrice(
netPrice = unitPrice.base * quantity,
vatRate = Vat.NotCollectedAvoidingDoubleTax
)
}
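A hypothetical usage: three domestic, non-discounted units at 100.00 EUR net each, with the hardcoded 23% VAT from above.

val price = calculatePrice(
  UnitPrice(
    base = Money(Currency.EUR, BigDecimal("100.00")),
    priceHandling = PriceHandling.Domestic(Discount.NoDiscount)
  ),
  Quantity(BigDecimal(3))
)

price.netPrice   // Money(Currency.EUR, 300.00)
price.grossPrice // Money(Currency.EUR, 369.00) - net * 1.23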
I believe that this code expresses our domain and intents much better than the alternative which uses one flat case class and several optional fields for which not all combinations should be allowed. Of course, someone could argue that this logic could be done better, that the VAT for each item could be a part of the UnitPrice definition or calculated in some other, non-hardcoded way, that there might be a certain kind of discount allowed for non-domestic sales, etc. However, all these changes could still be introduced in such a way that we could use types to guide and protect us from unhandled corner cases, eliminating the invalid states by definition and turning runtime errors into compile-time errors (like in this exhaustive pattern matching).
Of course, this approach has some challenges: if you design your models in too restrictive a way, then adding a new feature might require some effort. So perhaps it might not work very well in a prototype, where you don't want things to be bulletproof, preferring instead to quickly iterate over different designs. But it definitely pays off in the long term, when you want to maintain a project together with other people, who will have to work with this code even after you are gone to greener pastures.
The other challenge, which we will address in a moment, is how to deal with side effects, and in particular how to deal with persisting your data and transferring it, when the external world is much less type-safe than the core of the application.
Securing the boundaries
So far we worked mostly with pure, side-effect free representation of domain’s objects and operations. We totally skipped the implementation of side-effecting operations like
def generateNewOrderID: IO[OrderID] = ??? // TODO
and we skipped talking about the ways of composing such side-effecting computations. However, there is more to it than just gluing together functions that modify some state e.g. of the database, or otherwise communicate with the outside world e.g. through HTTP endpoints.
Non-uniform model
No matter how well your domain is modeled, no matter how well you put types to good use, no matter what kind of techniques you pull to make the expression of the business logic bug-free-by-design - the external world is outside your control. You can only control how you talk to it, and how you respond to it. HTTP endpoints might evolve their format, users' data might be corrupted and unreadable, the database might have been migrated to another schema. You have to deal with these cases. And, at the same time, you want to keep your business logic free from dealing with these cases, or it will bury you.
The (imperfect) solution to this problem is taking all the data from the outside world and passing it through some layer where the only data that comes out from the other side is valid data. Valid as in described by your domain, because e.g. informing the user that the date they entered is not acceptable might be a subject of your business' law regulations, not an IllegalArgumentException, so it is a valid input for your domain, which would process it and respond with another valid piece of information for your user.
The way such layer is often constructed is by running validations. Something which works more or less like this:
def isPieceOfDataValid(pieceOfData: PieceOfData): Boolean
val pieceOfData: PieceOfData
if (!isPieceOfDataValid(pieceOfData)) {
throw new IllegalArgumentException("Bad input")
}
handlePieceOfData(pieceOfData)
This code is quite flexible. You could put any number of if (check) { throw Exception } blocks in it. You could also comment out any number of such blocks. This means that there is virtually nothing in this code stopping you from skipping all the checks, commenting them out and forgetting to uncomment them before deployment. Either you have enough tests to stop yourself from accepting any user's input, or it's just a matter of time before you or your colleague make a funny deployment (and I kinda heard about such funny deployments in Big Financial Institutions Which Have Processes Preventing This From Happening. Especially when the checks were provided by Runtime Reflection, Annotations And Friends).
So, the compiler would never complain, the types are OK, so what is there to complain about? But, actually, why? Why should we assume that the input returned by the database, a JSON decoder or protobuf should match the format that we chose for our domain?
- domain models are designed for the correctness and ease of maintenance of business logic
- data transfer objects (DTO) are designed for compatibility with the medium we are passing data through (JSON, Protobufs, Avro, …)
- data access objects (DAO) are designed for compatibility with (usually) the persistence layer (relational DB, document DB, whatever you can think of)
These are different use cases, and they might be subject to different limitations. If we try to use one and the same model for all of them, then our model will be some compromise, subject to the constraints and limitations of all the possible layers. Our ADTs are nice and handy, but we might have some issues using them if we wanted to not only express our logic with them but also serialize them as JSONs in endpoint inputs and outputs and persist them in a database. I can imagine that veeeery quickly you would end up with models like
final case class ItemPrice(
netPriceAmount: BigDecimal,
netPriceCurrency: String,
vatPercent: Option[BigDecimal],
vatMissingReason: Option[String],
) {
assert(vatPercent.isDefined ^ vatMissingReason.isDefined)
}
instead of
final case class ItemPrice(
netPrice: Money,
vatRate: Vat[Ratio]
)
And you could use such a definition everywhere, just because this is the simplest way to make your JSON codecs work and shove things in SQL back and forth. Initially it might sound like a good idea: you aren't overdesigning, you deliver quickly, unit tests keep errors at bay. Things start to fall apart when the project stops fitting inside your head. You cannot make any larger changes, because they would break too many tests, and you stop remembering how these tests actually work. You pinned down not only the behavior but also the implementation. Any change in the database affects the JSON directly, and any front-end change request affects directly what you return from the database.
Such a uniform modeling approach is quite popular in Rapid Application Development frameworks. As long as the JSONs you shove around match the records you upsert, things work smoothly. As long as virtually all the data the user provides is valid, there are no bugs. If you have to move away from these restrictions, it is also not a problem, as long as the scope of your whole project fits into a human skull. Especially if the project is short-lived, this might in fact be the optimal approach.
The problems arise when the project is not short-lived, nor small, and it evolves over time. Then it is worth maintaining a separate representation for each purpose (domain, DTO, DAO or whatever you have there). This is an example of how we could implement such a non-uniform model:
// Domain model, optimized for the ease of working with the domain
final case class Money(
currency: Currency,
amount: BigDecimal
) { /* operations */ }
enum Vat[+Value]:
case Collected(value: Value)
case NotCollectedAvoidingDoubleTax
// operations
final case class ItemPrice(
netPrice: Money,
vatRate: Vat[Ratio]
) { /* operations */ }
// DTO model, optimized for passing JSONs around
final case class MoneyDTO(
currency: String,
amount: BigDecimal
)
// It is easier to create an Encoder when type parameters are invariant
enum VatDTO[Value]:
case Collected(value: Value)
case NotCollectedAvoidingDoubleTax()
final case class ItemPriceDTO(
netPrice: MoneyDTO,
vatRate: VatDTO[BigDecimal]
)
{
"net_price": {
"currency": "EUR",
"amount": "1000.00"
},
"vat_rate": {
"type": "collected",
"value": "0.23"
}
}
// DAO model optimized for the ease of working with an SQL database
final case class ItemPriceDAO(
currency: String,
netAmount: BigDecimal,
vatRate: Option[BigDecimal],
noVatReason: Option[String]
)
I believe a lot of people would be immediately against such an approach. You could always write your own codecs (in the case of JSON), and your own DB handlers for each model (e.g. Meta if you use Doobie). So why write a whole separate model? My own reasons would be:
- you see explicitly what model each usage expects. You don't have to look at the unit tests to figure out the input/output JSON, nor look into the DB schema to see what the table looks like
- you can have these codecs/Meta/whatever generated for you. I find it much easier to just update the model than to find where the manually written handler is stored and update its complex structure of nested checks
- you have to explicitly transform one form into the other, and there you can put additional checks to make sure that your model not only has a proper structure but also passes all the other checks that you might have prepared for it
- over time, the optimal structures for each use case might drift so far away from each other that you will be forced to do things explicitly anyway
Parsing, not validating
So, we separated the representations. And we cannot use validation anymore.
val input: ItemPriceDAO
def isInputValid(input: ItemPriceDAO): Boolean
def handleItemPrice(itemPrice: ItemPrice): Something
Even if we pass the validation, it is still useless, because we need the domain model, and we have a DAO object. So we need something like:
def parseInput(input: ItemPriceDAO): Either[ParsingError, ItemPrice]
which we could use like this:
// calling .map directly
parseInput(input)
.map(output => handleItemPrice(output))
// for-comprehension
for {
output <- parseInput(input)
} yield handleItemPrice(output)
// async-await, since 2.13.3, requires the -Xasync flag
// (note: the standard scala-async library targets Future;
// an Either version like this needs a custom -Xasync implementation)
async {
val output = await(parseInput(input))
handleItemPrice(output)
}
So, let’s implement such transformations to-and-from DTO!
// Let's start with defining possible parsing errors
// (we can extend this type as needed),
enum ParsingError:
case InvalidEnum(
enumName: String,
value: String
)
case IllegalCombination(
msg: String
)
// Then let's define bidirectional transformation
// for all the nested types.
final case class MoneyDTO(
currency: String,
amount: BigDecimal
) {
def toDomain: Either[ParsingError, Money] = for {
// Using Enumeratum would make this MUCH simpler
// but let's work with the vanilla Scala for now.
currency <- this.currency.toUpperCase match {
case "USD" => Right(Currency.USD)
case "EUR" => Right(Currency.EUR)
...
case _ => Left(ParsingError.InvalidEnum("Currency", this.currency))
}
amount = this.amount
} yield Money(
currency = currency,
amount = amount
)
}
object MoneyDTO {
def fromDomain(domain: Money): MoneyDTO = MoneyDTO(
// Here Enumeratum would make things easier as well.
currency = domain.currency match {
case Currency.USD => "USD"
...
},
amount = domain.amount
)
}
enum VatDTO[Dto]:
case Collected(value: Dto)
case NotCollectedAvoidingDoubleTax()
def toDomain[Domain](
f: Dto => Either[ParsingError, Domain]
): Either[ParsingError, Vat[Domain]] =
// This could be simplified with Cats.
// Again, leave it as vanilla Scala for now.
this match {
case VatDTO.Collected(value) =>
f(value).map(Vat.Collected(_))
case VatDTO.NotCollectedAvoidingDoubleTax() =>
Right(Vat.NotCollectedAvoidingDoubleTax)
}
object VatDTO:
def fromDomain[Domain, Dto](domain: Vat[Domain])(f: Domain => Dto): VatDTO[Dto] =
domain match {
case Vat.Collected(value) =>
VatDTO.Collected(f(value))
case Vat.NotCollectedAvoidingDoubleTax =>
VatDTO.NotCollectedAvoidingDoubleTax
}
end VatDTO
// Finally, let's transform the whole object.
final case class ItemPriceDTO(
netPrice: MoneyDTO,
vatRate: VatDTO[BigDecimal]
) {
def toDomain: Either[ParsingError, ItemPrice] = for {
netPrice <- this.netPrice.toDomain
vatRate <- this.vatRate.toDomain(rate => Right(Ratio(rate)))
} yield ItemPrice(
netPrice = netPrice,
vatRate = vatRate
)
}
object ItemPriceDTO {
def fromDomain(domain: ItemPrice): ItemPriceDTO = ItemPriceDTO(
netPrice = MoneyDTO.fromDomain(domain.netPrice),
vatRate = VatDTO.fromDomain(domain.vatRate)(_.asFraction)
)
}
Such an approach (promoted by Alexis King in her "Parse, don't validate" post) has several advantages:
- you know where to look for it (here: I decided that, by convention, the translation will be placed in the DTO/DAO models, to keep the domain free from the dirtiness of the external world)
- it's so not-clever that errors will be easier to spot (especially if you are reusing names and stick to using named parameters)
- it is defined explicitly and as a pure function, so you can test it easily without starting up any IO-ing framework:

// given
val dto: ItemPriceDTO = ...
val expected: Either[ParsingError, ItemPrice] = ...
// when
val domain = dto.toDomain
// then
assert(domain == expected)

- you can easily reuse it: whether you want to use Circe, or Tapir, or some handrolled solution - you just use these functions and you are done. They are framework- and library-agnostic:

import io.circe.Decoder
import io.circe.generic.extras.semiauto.deriveConfiguredDecoder

// turn our ParsingError into a plain message for circe's emap
val ourErrorToMessage: ParsingError => String = ...

implicit val configuration: io.circe.generic.extras.Configuration =
  io.circe.generic.extras.Configuration
    .default
    .withSnakeCaseMemberNames
    .withDiscriminator("type")

// We can translate DTO to Domain here or
// within the endpoint logic.
implicit val itemPriceDecoder: Decoder[ItemPrice] =
  deriveConfiguredDecoder[ItemPriceDTO].emap { dto =>
    dto.toDomain.left.map(ourErrorToMessage)
  }
But sometimes we want to secure the domain values even further. What if we decided that Ratio can only hold non-negative values?
Smart constructor
A smart constructor is a constructor which creates the value - but only if it can be done without violating any constraint. If we use a normal class, it could look like this:
// Private constructor can only be called from within
// the class or its companion object.
final class Ratio private (val asFraction: BigDecimal)
object Ratio {
def apply(value: BigDecimal): Either[ParsingError, Ratio] =
if (value >= BigDecimal(0)) Right(new Ratio(value))
else Left(ParsingError.IllegalCombination("Ratio has to be non-negative"))
}
If we are OK with using normal classes (sometimes we are), that's all we have to do. However, without a case class we would lose several functionalities which might make sense here:
- an autogenerated toString for easy debugging
- autogenerated hashCode and equals for a reliable equality check
- an autogenerated unapply for pattern matching, symmetrical to apply
- all attributes public by default, without the need for val
We would also lose the autogenerated .copy, BUT if we want to parse things, .copy could make things harder (break them) rather than easier. So what would happen if we just implemented:
case class Ratio(asFraction: BigDecimal)
object Ratio {
def parse(value: BigDecimal): Either[ParsingError, Ratio] =
if (value >= BigDecimal(0)) Right(new Ratio(value))
else Left(ParsingError.IllegalCombination("Ratio has to be non-negative"))
}
?
Let’s analyse:
- parse would build only a valid object, good
- new would let the programmers bypass the verification of the input, bad
- subclasses (even anonymous ones: new Ratio(BigDecimal(-1)) {}) would also bypass the constraint, bad
- .copy could create a new Ratio via valid.copy(asFraction = BigDecimal(-1)), bad
so, to keep things safe, we have to make sure that this case class disallows new, inheritance and .copy anywhere outside of the Ratio class and its companion.
Since Scala 2.13.2, with the -Xsource:3 compiler flag (and without any extra flags in Scala 3), .copy, new and apply all share the constructor's visibility modifier, so we just have to:
// Notice only one "private" and no need for "final".
case class Ratio private (asFraction: BigDecimal)
object Ratio {
def parse(value: BigDecimal): Either[ParsingError, Ratio] =
if (value >= BigDecimal(0)) Right(Ratio(value))
else Left(ParsingError.IllegalCombination("Ratio has to be non-negative"))
}
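What the compiler now allows and rejects (ratio stands for some previously parsed, valid instance):

Ratio.parse(BigDecimal("0.23")) // Right(Ratio(0.23))
Ratio.parse(BigDecimal(-1))     // Left(ParsingError.IllegalCombination(...))

// Ratio(BigDecimal(-1))                   // does not compile: apply is private
// new Ratio(BigDecimal(-1))               // does not compile: constructor is private
// ratio.copy(asFraction = BigDecimal(-1)) // does not compile: copy is private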
However, if you have to work with/support older Scala versions, there is a bit more hassle:
- you have to make the case class final or sealed, to prevent uncontrolled inheritance
- you have to make the constructor private, to prevent the usage of new
- you have to make the class abstract, to prevent the generation of .copy; and since you cannot use final and abstract together, you have to make it sealed. Additionally, you have to create a new subclass inside the smart constructor
Altogether, you end up with:

// This long chain of keywords is unfortunately required...
sealed abstract case class Ratio private (asFraction: BigDecimal)
object Ratio {
  def parse(value: BigDecimal): Either[ParsingError, Ratio] =
    // ... just like this anonymous instance here:
    if (value >= BigDecimal(0)) Right(new Ratio(value) {})
    else Left(ParsingError.IllegalCombination("Ratio has to be non-negative"))
}
Before we start lamenting how dreadful the cases are where we have to write such smart constructors, a few notes:

- there are quite a lot of checks that we could basically autogenerate, if we express the type in a particular way, e.g. using Refined types (we’ll see how when we get to the Quality of life improvements section)
- this is especially true when we need such checks for single-property records (ditto)

Since we know a basic way of keeping invalid objects out of our domain (by, you know, preventing their creation in the first place), we can move on to the next challenge: weaving side effects into the picture.
Separating side-effects from value computation
We defined some domain operations as pure functions. We could do that with quite a lot of them - even event sourcing can often be expressed as taking the old state and an event, and using some function to compute the new state:
def applyEvent(event: Event, state: State): State
(and it certainly is in e.g. Akka Typed’s `EventSourcedBehavior`).
But usually we have to somehow combine these functions, and for people new to functional programming it might be difficult to understand how things like `Either`, `Future`, `IO` and so on combine the computations. So, let’s do a quick summary.
Producers and pipelines
Let’s say you have some value producer. Perhaps it produces values when you use an index (e.g. `List`) or a key (`Map`). Perhaps to obtain a produced value you have to wait a bit (`Future`). Perhaps the producer might produce one value or no value at all (`Either` when we expect the value in `Right`, `Option`). Each particular producer might have different ways of producing values. At some point, when you work with it, you might even care how it does that. But not at the moment.
At the moment you have e.g. a `Producer` of `Int`s (`Producer[Int]`). You also know what you would do with every single value produced by it (whether it produces one value, a million, or zero). E.g. you would want to do `i => i.toString`. There could be an interface which - without you caring how this particular producer works - tells you that you are allowed to take this `Int => String` and generate a `Producer[Int] => Producer[String]`. If this interface was generic, you could express quite a lot of your logic as just compositions of `Producer` transformations:

- take a `Producer[A]`
- take an `A => B` and turn it into a `Producer[A] => Producer[B]`
- take the next `B => C` and turn it into a `Producer[B] => Producer[C]`
- combine all these functions with `andThen` and pass the `Producer[A]` into them
As long as a 1-to-1 mapping of the inputs of each of these functions is enough for you, you could express your whole program as just a pipeline of producers! It is so common that it has a shortcut. Instead of writing:
(lift(f: A => B): Producer[A] => Producer[B]).apply(pa: Producer[A])
we usually write just:
pa.map(f)
// examples
import scala.util.{ Try, Success, Failure }
import scala.concurrent.{ Await, Future }
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
val f: Int => String = _.toString
List(1, 2, 3).map(f) == List("1", "2", "3")
Nil.map(f) == Nil
Some(1).map(f) == Some("1")
None.map(f) == None
Right(1).map(f) == Right("1")
Left("ups").map(f) == Left("ups")
val err = new Exception
Success(1).map(f) == Success("1")
Failure(err).map(f) == Failure(err)
def await[A](fa: Future[A]) = Try(Await.result(fa, Duration.Inf))
await(Future(1).map(f)) == await(Future("1"))
await(Future(throw err).map(f)) == await(Future(throw err))
We also have a common name for this kind of producer: we call it a functor. All it means is that you can express your program as pipelines of mapped e.g. `Option`s, `List`s, `Either`s, `Future`s, `IO`s.
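If we wanted to write this interface down ourselves, a minimal sketch could look like this (cats’ real `Functor` has more operations and laws, which we skip here):

trait Functor[Producer[_]] {
  // lifts A => B into Producer[A] => Producer[B]
  def map[A, B](pa: Producer[A])(f: A => B): Producer[B]
}

// e.g. an instance for Option
implicit val optionFunctor: Functor[Option] = new Functor[Option] {
  def map[A, B](pa: Option[A])(f: A => B): Option[B] = pa match {
    case Some(a) => Some(f(a))
    case None    => None
  }
}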
This pipeline has a restriction: each lifted function takes one input and returns one output, so we cannot filter values out or expand our pipeline. We also cannot make it recursive. And we cannot combine 2 pipelines. If we drew it, it would be a straight line with some points on it.
So let’s add another operation to our pipeline of producers: a Cartesian product of all values returned by both of them. A Cartesian product of `List`s would be tuples of these lists’ elements. A Cartesian product of `Either`s would be a tuple - if both elements are `Right`; if one isn’t, then a `Left`. A Cartesian product of `Future`s would be a tuple of the successful values - if one `Future` failed, then the product would be a failure as well. Makes sense: a product of 2 singletons should be a singleton, and a product of sets where at least one is empty should be empty. Since a tuple is hardly ever the most useful thing, we would `.map` over it immediately.
The logic above - make a Cartesian product and map over it - is exactly what we are expecting when we see that our `Producer` provides `.map2`:

(pa: Producer[A]).map2(pb: Producer[B]) { (a: A, b: B) =>
  f(a, b): C
}
// examples
import cats.implicits._
List(1, 2, 3).map2(List(4, 5, 6)) { (a, b) => s"$a $b" } ===
List("1 4", "1 5", "1 6", "2 4", "2 5", "2 6", "3 4", "3 5", "3 6")
List(1, 2, 3).map2(Nil) { (a, b) => s"$a $b" } ===
Nil
Option(1).map2(Option(2)) { (a, b) => s"$a $b" } ===
Option("1 2")
Option(1).map2(None) { (a, b) => s"$a $b" } ===
None
(Right(1): Either[String, Int]).map2(Right(2)) { (a, b) => s"$a $b" } ===
Right("1 2")
(Right(1): Either[String, Int]).map2(Left("ups")) { (a, b) => s"$a $b" } ===
Left(value = "ups")
await(Future(1).map2(Future(2)) { (a, b) => s"$a $b" }) ===
await(Future("1 2"))
await(Future(1).map2(Future(throw err)) { (a, b) => s"$a $b" }) ===
await(Future(throw err))
This `.map2` allows us to combine 2 producers into one “in parallel”: we combine all of the values from one with all of the values from the other. Our pipeline is slightly less restricted this way. We call such a producer an applicative functor.
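The “Cartesian product, then map” intuition can also be written down directly; a sketch (cats splits these responsibilities between `Semigroupal` and `Apply`):

trait Applicative[Producer[_]] extends Functor[Producer] {
  // Cartesian product of two producers
  def product[A, B](pa: Producer[A], pb: Producer[B]): Producer[(A, B)]

  // map2 is just product + map
  def map2[A, B, C](pa: Producer[A], pb: Producer[B])(f: (A, B) => C): Producer[C] =
    map(product(pa, pb)) { case (a, b) => f(a, b) }
}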
But we want even more. What if we don’t want to handle some values? Or if we want to produce several values out of one? Or if for some values we wanted to just return all the original values of the original producer?
We could try to use `.map` to return as a value… a Producer. If we created it ourselves, we could create an empty Producer (whatever empty means for a particular producer: `None`, `Nil`, `Left`, `Failure`). We could return one with several values (e.g. for `List`, where it makes sense). And for async data structures with a trampoline (`Future`, `IO`) we could return them (or create them anew) to handle e.g. retry logic. The only issue is that after `.map` we would have a `Producer[Producer[B]]`, so the next transformation would have a problem.

So let’s require that our `Producer` allows us to `.flatten` it. Now, we can have fine-grained control over what we do with each value. We just have to call `.flatten` right after `.map`. Or merge these 2 into `.flatMap`. And if there was some function that takes one value and creates a producer of it, e.g. `pure` (`Some(_)`, `List(_)`, `Right(_)`, `Future.successful(_)`), we could just implement `map` as `flatMap(f andThen pure)`:
(pa: Producer[A]).flatMap { a =>
  // ...
  pb: Producer[B]
}: Producer[B]
// same as
(pa: Producer[A]).map { a =>
  // ...
  pb: Producer[B]
}.flatten: Producer[B]

// while (f: A => B)
(pa: Producer[A]).map(f): Producer[B]
// same as
(pa: Producer[A]).flatMap(a => pure(f(a))): Producer[B]
// examples
import cats.implicits._
// These 2 List examples
List(1, 2).flatMap(i => List(i.toString, (-i).toString)) ===
List("1", "-1", "2", "-2")
List(1, 2).flatMap(i => List.empty[String]) ===
List.empty[String]
// are the same as these 2 List examples
List(1, 2)
  .map { i =>
    List(i.toString, (-i).toString)
  } // List(List("1", "-1"), List("2", "-2")): List[List[String]]
  .flatten ===
  List("1", "-1", "2", "-2")
List(1, 2)
  .map { i =>
    List.empty[String]
  } // List(List(), List()): List[List[String]]
  .flatten ===
  List.empty[String]
// These 2 Option examples
1.some.flatMap(i => i.toString.some) ===
  "1".some
1.some.flatMap(i => none[String]) ===
  none[String]
// are the same as these 2 Option examples
1.some
  .map { i =>
    Option(i.toString)
  } // Some(Some("1"))
  .flatten ===
  "1".some
1.some
  .map { i =>
    none[String]
  } // Some(None)
  .flatten ===
  none[String]
// These 2 Either examples
1.asRight[String].flatMap(i => (-i).toString.asRight[String]) ===
  "-1".asRight[String]
1.asRight[String].flatMap(i => "error".asLeft[String]) ===
  "error".asLeft[String]
// are the same as these 2 Either examples
1.asRight[String]
  .map { i =>
    (-i).toString.asRight[String]
  } // Right(Right("-1"))
  .flatten ===
  "-1".asRight[String]
1.asRight[String]
  .map { i =>
    "error".asLeft[String]
  } // Right(Left("error"))
  .flatten ===
  "error".asLeft[String]
// These 2 Future examples
await(
  Future(1).flatMap(i => Future((-i).toString))
) === await(
  Future("-1")
)
await(
  Future(1).flatMap(i => Future[String](throw err))
) === await(
  Future[String](throw err)
)
// are the same as these 2 Future examples
await(
  Future(1)
    .map { i =>
      Future((-i).toString)
    } // Future[Future[String]]
    .flatten
) === await(
  Future("-1")
)
await(
  Future(1)
    .map { i =>
      Future[String](throw err)
    } // Future[Future[String]]
    .flatten
) === await(
  Future[String](throw err)
)
A producer with such flattenable pipelines is what we call a monad.
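Written down as an interface, a sketch could look like this (cats’ `Monad` additionally demands laws that we skip here):

trait Monad[Producer[_]] extends Applicative[Producer] {
  // lifts a plain value into the producer
  def pure[A](a: A): Producer[A]
  // collapses a nested producer
  def flatten[A](ppa: Producer[Producer[A]]): Producer[A]
  // .map + .flatten merged together
  def flatMap[A, B](pa: Producer[A])(f: A => Producer[B]): Producer[B] =
    flatten(map(pa)(f))
}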
So, summarizing, we will build our app as a pipeline of producers:

- we will add new stages to producers with `.map` (functors)
- we will combine Cartesian products of values from two pipelines with `.map2` (or something that uses it) (applicatives)
- or we will create producers out of values ourselves and then flatten them with `.flatMap` to have finer-grained control (interrupting computations, looping computations, etc.) (monads)
Depending on the use case, we might also introduce other interfaces, e.g. for handling errors in the pipeline.
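For instance, recovering from errors is already possible with the producers we used - cats generalizes this as e.g. `MonadError`, but the standard library methods show the idea:

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val err = new Exception("ups")

// Either: fall back to another Either on the error channel
val recovered1: Either[String, Int] = Left("ups").orElse(Right(0))

// Future: recover from a failure with a default value
val recovered2: Future[Int] = Future[Int](throw err).recover { case _: Exception => 0 }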
Domain operations as pipelines
If we come from an imperative background, with a lot of statements, loops and mutations, the idea that our program is just a big pipeline might be unintuitive. After wiring your brain around mutating everything everywhere, initially you might not see how to model your whole program as a series of pipelines (this problem is addressed by Practical FP in Scala: A hands-on approach by Gabriel Volpe, which describes building a whole server as a purely functional application).
For our purpose we’ll just focus on one endpoint, and assume that somewhere this endpoint handler is a part of a `Request => Response` pipeline (true e.g. for Http4s - very explicitly - and for Akka HTTP - hidden in the internals). We would only handle the `Input => F[Either[Error, Output]]` part (like in Tapir). E.g. implementing something like:
def handleCheckout(
  customer: Customer, // extracted from session
  shippingAddress: ShippingAddress // POSTed in JSON
): IO[Either[OrderError, Order]]
We already defined
def checkout(
  orderID: Order.ID,
  customer: Customer,
  shippingAddress: Address,
  items: List[Item]
): Either[OrderError, Order] = ...
and we are yet to define
def generateNewOrderID: IO[Order.ID] = ???
and here we would also need something like:
// Probably at this point inventing and using Customer.ID might
// make more sense but let's leave it for another day.
def getBasket(customer: Customer): IO[List[Item]] = ???
def emptyBasket(customer: Customer): IO[Unit] = ???
def saveOrder(order: Order): IO[Unit] = ???
We could implement it all in-memory, and then compose it all to create the endpoint implementation:
import cats.effect.IO
import cats.effect.concurrent.Ref
import cats.syntax.functor._ // for .void

// Ref is an IO wrapper around an atomic reference - so it's quite
// a good way of implementing some in-memory state storage.
class BasketRepository(itemsByCustomers: Ref[IO, Map[Customer, List[Item]]]) {
  // Maybe there would also be addItemToBasket, etc.
  def getBasket(customer: Customer): IO[List[Item]] =
    itemsByCustomers.get.map(_.getOrElse(customer, Nil))
  def emptyBasket(customer: Customer): IO[Unit] =
    itemsByCustomers.tryUpdate(_.removed(customer)).void
}
object BasketRepository {
  def create: IO[BasketRepository] =
    Ref.of[IO, Map[Customer, List[Item]]](Map.empty)
      .map(new BasketRepository(_))
}

class OrderRepository(orders: Ref[IO, Map[Order.ID, Order]]) {
  def saveOrder(order: Order): IO[Unit] =
    orders.tryUpdate(_.updated(order.id, order)).void
}
object OrderRepository {
  def create: IO[OrderRepository] =
    Ref.of[IO, Map[Order.ID, Order]](Map.empty)
      .map(new OrderRepository(_))
}

// This could consult a DB, or an external service, or whatever -
// it's just an illustration that ID generation is probably not
// referentially transparent, so we want to wrap it with some IO.
def generateNewOrderID: IO[Order.ID] = IO {
  Order.ID(java.util.UUID.randomUUID.toString)
}
With that we can finally implement the endpoint:
// IO[A] = side effects resulting in A
// Either[L, R] = typed error (L - what we are left with on error, R - right value)
// EitherT[IO, L, R] = side effect + typed error (wrapper for IO[Either[L, R]])
import cats.data.EitherT

def handleCheckout(
  customer: Customer, // extracted from session
  shippingAddress: ShippingAddress // POSTed in JSON
): IO[Either[OrderError, Order]] = (for {
  items   <- EitherT.liftF(basketRepository.getBasket(customer))
  orderID <- EitherT.liftF(generateNewOrderID)
  order   <- EitherT.fromEither[IO](checkout(orderID, customer, shippingAddress, items))
  _       <- EitherT.liftF(orderRepository.saveOrder(order))
  _       <- EitherT.liftF(basketRepository.emptyBasket(customer))
} yield order).value // .value unwraps to IO[Either[...]]
`EitherT` is just one of the possible approaches to doing IO with typed errors, and I don’t want to discuss them all here. I just wanted to show that for starters you can use IO for side effects and Either for the errors that you do NOT want to accidentally disappear somewhere in the pipeline. And then just define your domain in terms of such pipelines.
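For comparison, a sketch of the same endpoint without `EitherT` - just `IO[Either[...]]` and explicit matching (more nesting, same behavior):

import cats.syntax.apply._ // for *>

def handleCheckoutNoEitherT(
  customer: Customer,
  shippingAddress: ShippingAddress
): IO[Either[OrderError, Order]] = for {
  items   <- basketRepository.getBasket(customer)
  orderID <- generateNewOrderID
  result  <- checkout(orderID, customer, shippingAddress, items) match {
               case Right(order) =>
                 orderRepository.saveOrder(order) *>
                   basketRepository.emptyBasket(customer) *>
                   IO.pure(Right(order): Either[OrderError, Order])
               case Left(error) =>
                 IO.pure(Left(error): Either[OrderError, Order])
             }
} yield result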
Interestingly, this is quite easy to test. It might not look like it, but:

- pure functions that don’t rely on some state don’t require mocking
- values (case classes and sealed hierarchies) can be created just with constructors; you can usually check them for equality and almost always pattern match on them
- behaviors defined as functions are easy to stub - you just create a function in your test
- if you have several side-effecting functions - but you hide that state in some IO, making the functions technically pure - and you combine these functions into a class (and treat that class as a module, rather than an encapsulation utility), then you still don’t have to use mocks: you can use an in-memory implementation based on e.g. `Ref` instead (see the sketch after this list). In our example the repository classes didn’t encapsulate any state on their own, they were just utilities written around a `Ref`, which might as well have been replaced by a config used to connect to an external database. On its own, each class was stateless
- if you have an in-memory implementation paired with each of your “production” implementations, you can just use the in-memory one in your unit tests and be sure that you haven’t e.g. accidentally mocked a case that never happens
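For instance, exercising the in-memory `BasketRepository` needs no mocking framework at all - a minimal sketch with plain assertions (`customer` here is some test fixture; `unsafeRunSync` is cats-effect 2 style, in a real suite you would rather use e.g. munit):

val check: IO[(List[Item], List[Item])] = for {
  repo   <- BasketRepository.create
  before <- repo.getBasket(customer) // Nil for a fresh repository
  _      <- repo.emptyBasket(customer)
  after  <- repo.getBasket(customer)
} yield (before, after)

val (before, after) = check.unsafeRunSync()
assert(before == Nil)
assert(after == Nil)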
Another thing to remember (and this was already noticed by OOP programmers - I saw it mentioned in Clean Code) is that types can force the order of operations. You want operation A to be called before operation B? Require that B takes some input that can only be produced by operation A. (In particular, smart constructors and parsing can force filtering out all invalid values and states before passing them into the domain functions.) You just need to have separate types for each of these, ones that can be obtained in only one obvious way. This helps guide the user into the pit of success. (Personally, I aim for an API where users can perform Ctrl-Spacebar-driven development and achieve their goals without reading docs or implementations, using only types and argument names.)
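A sketch of that idea (`RawOrder`, `ValidatedOrder` and `OrderError.EmptyBasket` are hypothetical names): operation B (`ship`) takes an input that only operation A (`parseOrder`) can produce:

final case class RawOrder(items: List[Item])

// the only obvious way to obtain a ValidatedOrder is parseOrder below
sealed abstract case class ValidatedOrder private (items: List[Item])

def parseOrder(raw: RawOrder): Either[OrderError, ValidatedOrder] =
  if (raw.items.nonEmpty) Right(new ValidatedOrder(raw.items) {})
  else Left(OrderError.EmptyBasket) // hypothetical error case

// ship cannot be called before validation - there is no other way
// to produce its argument
def ship(order: ValidatedOrder): IO[Unit] = IO.unit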
I am aware that you might not be interested in `IO` (initially). In such a case, using `Either`s and running many operations synchronously can still take you a long way. But there is one part of the Scala ecosystem which does require a bit more explanation: Akka.
Database models
If you read a bit about DDD you will meet a thing called an Aggregate (Aggregate Root). An aggregate is a piece of data that would be:

- self-contained - it contains all the information related to something, without any need to fetch more data to know everything about it
- a unit of persistence - if you passed this piece of data around, it would carry all the information needed to run a transaction that would persist it - without passing any extra helper objects

This concept is kind of hard to get, especially if you are used to CRUDs and Database-Driven Design. So, let’s try showing it with an example. Let’s treat a Customer as this single, self-contained unit of domain data. Let’s give it an ID to make it easier to trace its lifecycle:
// Customer as a self-contained piece of data (aggregate)
final case class Customer(
  id: Customer.ID,
  data: Customer.Data
)
object Customer {
  final case class ID(value: UUID)

  final case class Data(
    billingName: Customer.BillingName,
    billingAddress: Address
  )

  final case class FirstName(value: String)
  final case class LastName(value: String)
  final case class BillingName(
    firstName: Customer.FirstName,
    lastName: Customer.LastName
  )
}
If we were modelling things in the database first and then propagating DB records as our models, we could instead end up with:

// If we started from a Database schema
// and then leaked it into our models...
// yuk!
final case class Address(
  id: UUID,
  country: String,
  city: String,
  postalCode: String,
  firstLine: String,
  secondLine: String
)
final case class Customer(
  id: UUID,
  billingFirstName: String,
  billingLastName: String,
  billingAddressID: UUID
)
and then our whole domain would be filled with us manually fetching and matching all these dumb objects, with the domain logic hidden somewhere in the noise. But we didn’t do that. We used the former design, with `Customer` as an aggregate. It’s easy to work with - even a beginner would be able to add/update and test a domain function. But we would like to store it in a different way. Probably one that would be compatible with our later example:
-- treat this table as immutable
CREATE TABLE addresses (
  address_id  UUID PRIMARY KEY,
  customer_id UUID NOT NULL, -- for ease of removing GDPR data
  country     country NOT NULL,
  city        TEXT NOT NULL,
  postal_code TEXT NOT NULL,
  first_line  TEXT NOT NULL,
  second_line TEXT NOT NULL
);
CREATE TABLE customers (
  id                 UUID PRIMARY KEY,
  billing_first_name TEXT NOT NULL,
  billing_last_name  TEXT NOT NULL,
  billing_address    UUID NOT NULL REFERENCES addresses
);
Perhaps you would find some other design more fitting for your particular use cases, perhaps not. But we want to investigate what to do in a situation where the DB representation that is more suitable for your queries would be a real pain to work with as your domain representation. So, for the sake of our example, let’s assume that this DB design is the fitting one. How do we make one representation cooperate with the other?
Let’s start with addresses. We could use a special representation which maps nicely into our database representation. Since not all fields of the DAO are available in the domain object, we provide them as arguments when mapping into the DAO, and simply drop them when mapping into the domain:
// Address Data Access Object
final case class AddressDAO(
  id: UUID,
  customerID: UUID,
  country: Country,
  city: String,
  postalCode: String,
  firstLine: String,
  secondLine: String
) {
  def toDomain: Either[ParsingError, Address] = Right(
    Address(
      country = country,
      city = Address.City(city),
      postalCode = Address.PostalCode(postalCode),
      firstLine = Address.FirstLine(firstLine),
      secondLine = Address.SecondLine(secondLine)
    )
  )
}
object AddressDAO {
  // Address is missing Customer.ID, so we will
  // provide it as a separate parameter.
  // The UUID is also something we resolve during persistence.
  def fromDomain(
    addressID: UUID,
    address: Address,
    customerID: Customer.ID
  ): AddressDAO = AddressDAO(
    id = addressID,
    customerID = customerID.value,
    country = address.country,
    city = address.city.value,
    postalCode = address.postalCode.value,
    firstLine = address.firstLine.value,
    secondLine = address.secondLine.value
  )
}
A similar situation happens when we are trying to define `CustomerDAO`. We know only the `billingAddress` ID and not its value. So let’s resolve it:
// Customer Data Access Object
final case class CustomerDAO(
  id: UUID,
  billingFirstName: String,
  billingLastName: String,
  billingAddress: UUID
) {
  // We're resolving AddressDAO's ID to find out the right Address value.
  def toDomain(
    addressDaos: List[AddressDAO]
  ): Either[ParsingError, Customer] = {
    val parsedAddress = addressDaos.find { addressDao =>
      addressDao.customerID == id && addressDao.id == billingAddress
    } match {
      case Some(addressDao) =>
        addressDao.toDomain
      case None =>
        Left(ParsingError.IllegalCombination("Customer is missing BillingAddress"))
    }
    parsedAddress.map { billingAddress =>
      Customer(
        id = Customer.ID(id),
        data = Customer.Data(
          billingName = Customer.BillingName(
            firstName = Customer.FirstName(billingFirstName),
            lastName = Customer.LastName(billingLastName)
          ),
          billingAddress = billingAddress
        )
      )
    }
  }
}
object CustomerDAO {
  // Customer is missing the billing Address's ID, so we will
  // provide it as a separate parameter.
  def fromDomain(
    customer: Customer,
    billingAddressID: UUID
  ): CustomerDAO = CustomerDAO(
    id = customer.id.value,
    billingFirstName = customer.data.billingName.firstName.value,
    billingLastName = customer.data.billingName.lastName.value,
    billingAddress = billingAddressID
  )
}
We have defined mappings to and from the DAOs. How could we run them against the DB? Let’s take a look at an example using Doobie:
// Doobie with Postgres extensions
import cats.implicits._
import doobie._
import doobie.implicits._
import doobie.postgres._
import doobie.postgres.implicits._

implicit val countryMeta: Meta[Country] = pgEnumString[Country](
  "country",
  // These would be almost free with Enumeratum:
  str => str.toLowerCase match {
    case "eur" => Country.EUR
    ...
  },
  value => value match {
    case Country.EUR => "eur"
    ...
  }
)
// This whole method could be optimized, but in this example
// we aim for readability; we can optimize later if needed.
def upsertCustomer(
  customer: Customer
): ConnectionIO[(CustomerDAO, AddressDAO)] = for {
  // Check if such an Address already exists in the DB
  addressIDOpt <-
    sql"""SELECT address_id
         |FROM addresses
         |WHERE country = ${customer.data.billingAddress.country}
         |AND city = ${customer.data.billingAddress.city.value}
         |AND postal_code = ${customer.data.billingAddress.postalCode.value}
         |AND first_line = ${customer.data.billingAddress.firstLine.value}
         |AND second_line = ${customer.data.billingAddress.secondLine.value}
         |""".stripMargin.query[UUID].option
  // Domain -> DAO
  addressID = addressIDOpt.getOrElse(UUID.randomUUID)
  addressDao = AddressDAO.fromDomain(addressID, customer.data.billingAddress, customer.id)
  customerDao = CustomerDAO.fromDomain(customer, addressID)
  // Upsert Address
  _ <- sql"""INSERT INTO addresses (
       |  address_id,
       |  customer_id,
       |  country,
       |  city,
       |  postal_code,
       |  first_line,
       |  second_line
       |) VALUES (
       |  ${addressDao.id},
       |  ${addressDao.customerID},
       |  ${addressDao.country},
       |  ${addressDao.city},
       |  ${addressDao.postalCode},
       |  ${addressDao.firstLine},
       |  ${addressDao.secondLine}
       |)
       |ON CONFLICT (address_id)
       |DO NOTHING
       |""".stripMargin.update.run
  // Upsert Customer
  _ <- sql"""INSERT INTO customers (
       |  id,
       |  billing_first_name,
       |  billing_last_name,
       |  billing_address
       |) VALUES (
       |  ${customerDao.id},
       |  ${customerDao.billingFirstName},
       |  ${customerDao.billingLastName},
       |  ${customerDao.billingAddress}
       |)
       |ON CONFLICT (id)
       |DO UPDATE
       |SET
       |  billing_first_name = ${customerDao.billingFirstName},
       |  billing_last_name = ${customerDao.billingLastName},
       |  billing_address = ${customerDao.billingAddress}
       |""".stripMargin.update.run
} yield (customerDao, addressDao)
// This method can have 3 outcomes:
// - Customer found and parsed correctly
// - Customer not found
// - Customer found but DB state was invalid
def findCustomer(
  customerID: Customer.ID
): ConnectionIO[Either[ParsingError, Option[Customer]]] =
  sql"""SELECT id,
       |  billing_first_name,
       |  billing_last_name,
       |  billing_address
       |FROM customers
       |WHERE id = ${customerID.value}
       |""".stripMargin.query[CustomerDAO].option.flatMap {
    case Some(customerDAO) =>
      sql"""SELECT address_id,
           |  customer_id,
           |  country,
           |  city,
           |  postal_code,
           |  first_line,
           |  second_line
           |FROM addresses
           |WHERE address_id = ${customerDAO.billingAddress}
           |  AND customer_id = ${customerDAO.id}
           |""".stripMargin.query[AddressDAO].unique.map { addressDAO =>
        customerDAO.toDomain(List(addressDAO)).map(Some(_))
      }
    case None =>
      (Right(None): Either[ParsingError, Option[Customer]]).pure[ConnectionIO]
  }
// If we want to treat invalid DB state as unrecoverable
// error we can simply turn it into Exception.
def findCustomerUnsafe(
  customerID: Customer.ID
): ConnectionIO[Option[Customer]] =
  findCustomer(customerID).map {
    case Right(opt) => opt
    case Left(err)  => throw new Exception(err.toString)
  }
Here we defined methods allowing us to upsert and fetch a `Customer` from Postgres. We have DAO representations to make generating SQL queries easier, and a domain representation, which makes working with the data easier. On the other hand, there is a lot of boilerplate, so people coming from e.g. Rapid Application Development might take issue with it. So let’s discuss the tradeoffs:

- we have separate representations, which means we have to maintain both of them, as well as the bidirectional mappings between them
- any change to any representation requires updating the mappings
- creating a new model will often require creating a new DAO and mapping
- however, types and named parameters allow us to detect mismatches during compilation
- there are libraries like Chimney which can lessen the burden of transforming one representation into another (we’ll discuss that later)
- we can reorganise how we persist the data completely… and we will only have to update the code that talks to the database directly. All the domain code and its tests will remain untouched!
- you don’t have to think about how `Address` relates to `Customer` in the database - you simply have it as data, already there
My personal experience is that this persistence-agnostic, non-uniformly-modeled approach has a large upfront cost, especially if you compare it to the mentioned RAD applications. However, in the long run it simplifies development and maintenance, as it limits the impact of a change. You can redesign your internal model and still use the old persistence model, or redesign your database (or maybe even migrate to another database type) and still keep your old model. In certain projects development might almost halt if you don’t start separating things. But I agree that these changes pay back over a long time, so e.g. in a 6-month project with trivial CRUD logic, it won’t be noticeable. However, if you aim to continually develop something for several years and prepare it to be maintainable even after the original authors are gone, I consider the uniform approach to be irresponsible. And after a while, as your team gets used to it (and learns a library or two), it will regain its old speed.
Akka and its approach
If you read some sources on Akka and on how you should design your domain with it, the default solution you get is Actor = Aggregate. It implies that each command sent to an Actor would update the Aggregate as a whole (even if it models something nested, like `Customer` or `Order`). But it also implies that the Actor is the model. For instance, we could implement it like this:
class Customer(id: String) extends PersistentActor {
  override def persistenceId = s"customer:${id}"

  var state = Customer.emptyState

  val receiveCommand: Receive = {
    case command: Customer.Command =>
      persist(
        Customer.handleCommand(state, command)
      ) { event =>
        state = Customer.projectEvent(state, event)
      }
  }
  val receiveRecover: Receive = {
    case event: Customer.Event =>
      state = Customer.projectEvent(state, event)
  }
}
object Customer {
  enum Command:
    ...
  enum Event:
    ...
  final case class State(
    billingFirstName: String,
    billingLastName: String,
    billingAddress: Address
  )
  val emptyState = State(null, null, null)

  def handleCommand(state: State, command: Command): Event = ...
  def projectEvent(state: State, event: Event): State = ...
}
or - if we are not savages using Akka Classic and use Akka Typed instead:
object Customer {
  enum Command:
    ...
  enum Event:
    ...
  final case class State(
    billingFirstName: String,
    billingLastName: String,
    billingAddress: Address
  )
  val emptyState = State(null, null, null)

  def handleCommand(state: State, command: Command): Effect[Event, State] = ...
  def projectEvent(state: State, event: Event): State = ...

  def apply(id: String): Behavior[Command] = EventSourcedBehavior(
    PersistenceId.of("customer", id),
    emptyState,
    handleCommand,
    projectEvent
  )
}
Such an approach works well with OOP. An actor is basically an object - we’re sending commands to it, it encapsulates its state - but one which handles all commands asynchronously, doesn’t have to reply to all of them, and magically works in a clustered environment (well, not really magically, but as far as most people are concerned - at least the ones not actually involved in configuring the cluster, sharding and cluster singletons - it just works).
But following this approach naively means that we would put all domain logic into the actor’s internal behaviors. Most services would be just commands handled by actors. I even saw some people writing several-hundred-line-long `def receive: Receive` methods. Not to mention that it actually couples your domain methods with the Actor System, which - as far as I am concerned - is just infrastructure (and in the case of Persistent Actors also a persistence layer). Meanwhile, we would like to keep things separated, so that revisiting some infrastructural choices would not touch the core of our application.
Personally, I suggest using Actors as exactly that - an infrastructure layer that shouldn’t define any domain code on its own. E.g. you could still define `Customer`
as:
final case class Customer(
  id: Customer.ID,
  data: Customer.Data
)
object Customer {
  final case class ID(value: UUID)

  final case class Data(
    billingName: Customer.BillingName,
    billingAddress: Address
  )

  final case class FirstName(value: String)
  final case class LastName(value: String)
  final case class BillingName(
    firstName: Customer.FirstName,
    lastName: Customer.LastName
  )
}
and define some domain operations for it:
def createCustomer(
  id: Customer.ID,
  billingName: Customer.BillingName,
  billingAddress: Address
): Customer = ...
def updateBillingName(
  customer: Customer,
  newBillingName: Customer.BillingName
): Customer = ...
def updateBillingAddress(
  customer: Customer,
  newBillingAddress: Address
): Customer = ...
and even if these operations are impure (running in IO
or whatever) you can still delegate to them from actor:
object CustomerActor {

  enum Command:
    case GetState(
      replyTo: ActorRef[Result]
    )
    case Create(
      replyTo: ActorRef[Result],
      id: Customer.ID,
      billingName: Customer.BillingName,
      billingAddress: Address
    )
    case UpdateBillingName(
      replyTo: ActorRef[Result],
      newBillingName: Customer.BillingName
    )
    case UpdateBillingAddress(
      replyTo: ActorRef[Result],
      newBillingAddress: Address
    )

    val replyTo: ActorRef[Result] // shared by all commands
  end Command

  enum Event:
    case Created(
      id: Customer.ID,
      billingName: Customer.BillingName,
      billingAddress: Address
    )
    case BillingNameUpdated(
      newBillingName: Customer.BillingName
    )
    case BillingAddressUpdated(
      newBillingAddress: Address
    )
  end Event

  enum State:
    case Uninitialized
    case Exists(customer: Customer)
    case Deleted

  enum Error:
    case RequireCreated
    case AlreadyCreated
    case AlreadyDeleted

  // custom, monomorphic type is less problematic for serialization
  // than e.g. Tuples or Eithers
  enum Result:
    case Success(state: State)
    case Failure(error: Error)

  def handleCommand(
    state: State,
    command: Command
  ): Effect[Event, State] = (state, command) match {
    case (_, Command.GetState(replyTo)) =>
      Effect.reply(replyTo)(Result.Success(state))
    case (State.Deleted, _) =>
      Effect.reply(command.replyTo)(Result.Failure(Error.AlreadyDeleted))
    case (State.Uninitialized, Command.Create(replyTo, id, name, address)) =>
      Effect.persist(Event.Created(id, name, address))
        .thenReply(replyTo)(newState => Result.Success(newState))
    case (_, Command.Create(replyTo, _, _, _)) =>
      Effect.reply(replyTo)(Result.Failure(Error.AlreadyCreated))
    case (State.Exists(_), Command.UpdateBillingName(replyTo, name)) =>
      Effect.persist(Event.BillingNameUpdated(name))
        .thenReply(replyTo)(newState => Result.Success(newState))
    case (_, Command.UpdateBillingName(replyTo, _)) =>
      Effect.reply(replyTo)(Result.Failure(Error.RequireCreated))
    case (State.Exists(_), Command.UpdateBillingAddress(replyTo, address)) =>
      Effect.persist(Event.BillingAddressUpdated(address))
        .thenReply(replyTo)(newState => Result.Success(newState))
    case (_, Command.UpdateBillingAddress(replyTo, _)) =>
      Effect.reply(replyTo)(Result.Failure(Error.RequireCreated))
  }

  def handleEvent(
    state: State,
    event: Event
  ): State = (state, event) match {
    case (State.Uninitialized, Event.Created(id, name, address)) =>
      State.Exists(createCustomer(id, name, address))
    case (State.Uninitialized, _) =>
      state // but it should never happen
    case (State.Exists(_), Event.Created(_, _, _)) =>
      state // but it should never happen
    case (State.Exists(customer), Event.BillingNameUpdated(name)) =>
      State.Exists(updateBillingName(customer, name))
    case (State.Exists(customer), Event.BillingAddressUpdated(address)) =>
      State.Exists(updateBillingAddress(customer, address))
    case (State.Deleted, _) =>
      state // but it should never happen
  }

  def apply(id: String): Behavior[Command] = EventSourcedBehavior(
    PersistenceId.of("customer", id),
    State.Uninitialized,
    handleCommand,
    handleEvent
  )

  val typeKey = EntityTypeKey[CustomerActor.Command]("customer")
}
Such actor could be used (in cluster) more or less like this:
implicit val ex: ExecutionContext = ...
implicit val timeput: Timeput = ...
val sharding = ClusterSharding(system)
sharding.init(Entity(CustomerActor.typeKey) { ctx =>
CustomerActor(ctx.entityId)
})
def commandCustomer(
id: Customer.ID,
command: ActorRef[CustomerActor.Result] => CustomerActor.Command
): Future[CustomerActor.Result] = {
val ref = sharding.entityRefFor(CustomerActor.typeKey, id.value)
ref.ask(replyTo => command(replyTo))
}
// Depending on our conventions we could map returned CustomerActor.Result
// to some Either[DomainError, Customer] or whatever we find convenient.
def createCustomerImpl(
id: Customer.ID,
name: Customer.BillingName,
address: Address
): Future[CustomerActor.Result] = commandCustomer(id) { replyTo =>
CustomerActor.Command.Create(replyTo, id, name, address)
}
I find Actors particularly challenging for this kind of separation, because they basically encourage you to implement everything as synchronous methods run within actors, and it is quite easy to end up in a situation where you cannot test your domain without starting an `ActorSystem`, which - if you want to use Akka Persistence, Sharding, Clustering, etc. - will require you to configure tons of things just to run a simple test. So you can easily end up writing only large integration tests… or no tests at all. It might be just my subjective feeling, but I can test Akka codebases easily only if I put up with global, shared, mutable state in my whole test suite, which I don’t like at all; treating Akka merely as an infrastructure layer helps me focus on testing it as such - because all my domain logic was already tested in small, readable unit tests running mostly pure functions. But I do admit that sorting things out required a lot of discipline from me, and I can see how some people might not find it worth the effort.
Use modules, Luke
Some people believe that the only way of separating several subdomains is a request over the network - that is, splitting your application into services. While I do agree it is an option, and if you have a lot of traffic it might help you manage resources (e.g. by making all the newly allocated CPUs and RAM be used exclusively by the part of your application that is currently facing higher usage), it is NOT the only way. And for many companies and projects it is a huge operational overhead that could be easily avoided.
When you have several (sub)domains you would like to have them separated. Entities in one should not rely on entities in another, neither at the business-logic level nor at the persistence level. A lot of people do not trust programmers to be able to handle that by convention (which is kinda justified), and instead believe that it has to be enforced by putting each domain into a separate repository which will run as a separate application (which isn’t always justified). How could we achieve this otherwise?
Well, many languages - including Scala - allow you to use access modifiers to keep the external world away from the internals. In Scala’s case you can:

- add no modifier - which makes the class/method/variable public
- add the `private` modifier - then only other instances of the same class (and its companion object) are able to interact with this piece of code
- add the `protected` modifier - then instances of the same class or its subclasses (and the companion object) are able to interact with this piece of code
- add `private[package.name]` - it will make the code available only to code declared in `package.name` or its subpackages
The last one is quite useful. I could for instance do this:
package com.mycompany.mydomaina

trait MyService {
  // methods
}
object MyService {
  def implementation: MyService = new MyServiceImpl
  def inMemory: MyService = new MyServiceInmemory
}

private[mydomaina] class MyServiceImpl extends MyService { /* ... */ }
private[mydomaina] class MyServiceInmemory extends MyService { /* ... */ }
I could put `MyServiceImpl` and `MyServiceInmemory` anywhere under `com.mycompany.mydomaina`. I wouldn’t have to make them private implementations inside `MyService`’s companion object. I might even use another file and subpackage:
// MyService.scala
package com.mycompany.mydomaina

trait MyService {
  // methods
}
object MyService {
  def implementation: MyService = new impl.MyServiceImpl
  def inMemory: MyService = new inmemory.MyServiceInmemory
}

// impl/MyServiceImpl.scala
package com.mycompany.mydomaina.impl

import com.mycompany.mydomaina.MyService

private[mydomaina] class MyServiceImpl extends MyService { /* ... */ }

// inmemory/MyServiceInmemory.scala
package com.mycompany.mydomaina.inmemory

import com.mycompany.mydomaina.MyService

private[mydomaina] class MyServiceInmemory extends MyService { /* ... */ }
So I could define my published language as the public API and everything internal to the domain as `private`. But I understand that some people don’t trust programmers with that - after all, you can always remove the modifier.
But then there is another way of limiting that access: splitting your application into modules. For instance, I could define the following structure:
my-application/
|
+- modules/
   |
   +- commons/         -- common definitions
   |
   +- domain-foo/      -- published language of domain Foo
   |
   +- domain-foo-impl/ -- implementation of domain Foo
   |
   +- domain-bar/      -- published language of domain Bar
   |
   +- domain-bar-impl/ -- implementation of domain Bar
   |
   +- app/             -- wiring everything together
// sbt
lazy val root = project.in(file("."))
  .aggregate(commons, foo, fooImpl, bar, barImpl, app)
// commons
lazy val commons = project.in(file("modules/commons"))
// domain Foo
lazy val foo = project.in(file("modules/domain-foo"))
  .dependsOn(commons)
lazy val fooImpl = project.in(file("modules/domain-foo-impl"))
  .dependsOn(foo)
// domain Bar
lazy val bar = project.in(file("modules/domain-bar"))
  .dependsOn(commons)
lazy val barImpl = project.in(file("modules/domain-bar-impl"))
  .dependsOn(bar)
// whole App
lazy val app = project.in(file("modules/app"))
  .dependsOn(fooImpl, barImpl)
Here, preventing exposure of internals is even easier. The `foo` and `bar` modules contain only the published language - things that describe the domain (values, entities, services, events) that are known outside. The implementation - persistence, event handling, domain logic - is hidden inside `fooImpl` and `barImpl`. These do not know anything about each other, and any attempt to use something defined in one from the other ends up as a compilation failure. If one domain has to talk to another, it has to be done through the published language. But if you want to trace the flow of the logic inside the app - everything is still within the same codebase!
Each module might have its own dependencies and configs, so when we initialize them we might e.g. easily prevent all services from using the same database and the same schema. Each module might have a separate configuration that connects to a different persistent storage. For instance, what I sometimes do is define configs and module initialization helpers. One for domain/module `Foo`:
:
package com.mycompany.foo

import cats.effect.{ IO, Resource }

final case class FooConfig()

final case class FooModule(
  service1: Service1,
  service2: Service2
)
object FooModule {
  // uses foo.impl classes
  def implementation(
    config: FooConfig
  ): Resource[IO, FooModule] = ...
}
and for domain Bar
:
package com.mycompany.bar

import cats.effect.{ IO, Resource }

final case class BarConfig()

final case class BarModule(
  service1: Service1,
  service2: Service2
)
object BarModule {
  // uses bar.impl classes
  def implementation(
    config: BarConfig
  ): Resource[IO, BarModule] = ...
}
and then, inside some `app` module, when I am wiring the whole app together:
// could be read from some HOCON using e.g. pureconfig
final case class AppConfig(
  foo: FooConfig,
  bar: BarConfig
)

val appConfig: AppConfig = ...

for {
  FooModule(
    fooService1,
    fooService2
  ) <- FooModule.implementation(appConfig.foo)
  BarModule(
    barService1,
    barService2
  ) <- BarModule.implementation(appConfig.bar)
  // use fooService1, fooService2, barService1, barService2 here
} yield println("Initialized!")
So if you e.g. used Doobie for your database connections, you could initialize a `Transactor` inside each `.implementation` method, using data provided by each module’s specific config. Since a module would manage its lifecycle within a `Resource` and hide it from the module’s users (you only have to return things declared as published language, and you are free to hide all the internal details), you couldn’t even reuse the same connections in 2 modules - they are separated on the architectural level. To have them connect to the same database, you would have to pass them the same configuration.
And if one module depended on another, it would only have to depend on the published language. A dependency, e.g. a service, could be taken as an argument on module initialization, and that would be it. You just have to initialize the `Resource`s in the right order to get the instance that you can pass to the constructor. The only difficult thing would be a circular dependency… but usually that’s a good thing.
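For instance, if `Bar` needed one of `Foo`’s services, the module initializer could simply take it as a parameter (a sketch extending the example above):

package com.mycompany.bar

import cats.effect.{ IO, Resource }
import com.mycompany.foo // Foo's published language only

object BarModule {
  // bar-impl never sees foo-impl; it only receives an instance
  // of a service described by Foo's published language.
  def implementation(
    config: BarConfig,
    fooService1: foo.Service1
  ): Resource[IO, BarModule] = ...
}

// and during wiring in the app module:
// barModule <- BarModule.implementation(appConfig.bar, fooModule.service1)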
With this kind of setup I was easily able to keep the complexity of some monolithic applications in check. Sure, there are limits to its scalability… but they are much further away than most microservice evangelists claim them to be.
Quality of life improvements
While the biggest improvement that we can get is moving to Scala 3 (as soon as all the libraries we need are available for it - which means as soon as all the macros we use are rewritten), there are things that we can do to make our life easier even now. This section will describe some of them.
Enumerations
If our enum type (declared either as an `enum` or as a `sealed` hierarchy) consists only of `case object`s, we might want to be able to get all of its values at once - to iterate over them, or to get the collection of all its members, for a variety of reasons. In Scala 2 this used to be implemented using `scala.Enumeration`:
object MyEnum extends Enumeration {
  val A, B, C = Value
}

import MyEnum._
MyEnum.values.foreach {
  case A => ...
  case B => ...
  case C => ...
}
But `scala.Enumeration` has several issues:

- its values are not of type `X` (where `object X extends Enumeration`) but of type `X.Value`, since `Value` is an internal type of `scala.Enumeration`
- each value returned by `Value` is of the same type, so the only thing that differentiates them is their reference
- you could provide your own fields and methods to `Value`… but it is less intuitive than with `sealed` hierarchies:

  // Example from the docs:
  object Planet extends Enumeration {
    // You have to override the internal type of Enumeration...
    protected case class PlanetVal(mass: Double, radius: Double) extends super.Val {
      def surfaceGravity: Double = Planet.G * mass / (radius * radius)
      def surfaceWeight(otherMass: Double): Double = otherMass * surfaceGravity
    }
    // ...and then use an implicit conversion to downcast each value
    import scala.language.implicitConversions
    implicit def valueToPlanetVal(x: Value): PlanetVal = x.asInstanceOf[PlanetVal]

    val G: Double = 6.67300E-11
    val Mercury = PlanetVal(3.303e+23, 2.4397e6)
    val Venus = PlanetVal(4.869e+24, 6.0518e6)
    val Earth = PlanetVal(5.976e+24, 6.37814e6)
    ...
  }

- but worst of all, since the compiler knows nothing about `scala.Enumeration` (it’s defined on the library level, not on the language or type level), it cannot check for exhaustivity - so you can either skip some `case` because the compiler didn’t complain and get a runtime error, or - once you add `case _ =>` - not notice that you should have added special handling for a newly added value
Using `sealed trait`s/`sealed class`es/`enum`s makes dealing with all of the above straightforward… but you lose the ability to get all of the values as a collection. This is solved by the Enumeratum library:
import enumeratum._

sealed abstract class Planet(
  mass: Double,
  radius: Double
) extends EnumEntry // extend EnumEntry in your type

object Planet extends Enum[Planet] { // and Enum in its companion
  val G: Double = 6.67300E-11

  case object Mercury extends Planet(3.303e+23, 2.4397e6)
  case object Venus extends Planet(4.869e+24, 6.0518e6)
  case object Earth extends Planet(5.976e+24, 6.37814e6)

  val values = findValues // macro provided by Enumeratum
}
Enumeratum generates a lot of things out of the box. In our earlier example (with `MoneyDTO`) we had to map Currencies to their names and back again. With Enumeratum we could implement `Currency` as an `EnumEntry`:
sealed trait Currency extends EnumEntry
object Currency extends Enum[Currency] {
  case object CAD extends Currency
  case object EUR extends Currency
  case object USD extends Currency
  ...
  val values = findValues
}
and use methods provided by it:
final case class MoneyDTO(
  currency: String,
  amount: BigDecimal
) {
  def toDomain: Either[ParsingError, Money] = for {
    // withNameInsensitiveEither is an Enumeratum utility
    currency <- Currency.withNameInsensitiveEither(this.currency)
      .left.map(_ => ParsingError.InvalidEnum("Currency", this.currency))
    amount <- Ratio.parse(this.amount)
  } yield Money(
    currency = currency,
    amount = amount
  )
}
object MoneyDTO {
  def fromDomain(domain: Money): MoneyDTO = MoneyDTO(
    // .name is generated by Enumeratum as well
    currency = domain.currency.name,
    amount = domain.amount.asFraction
  )
}
Since it is so useful it has integrations with a lot of libraries: Circe, Doobie, Tapir, etc.
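For example, with the enumeratum-circe module, mixing in one extra trait gives us JSON codecs for free (a sketch; `CirceEnum` comes from the enumeratum-circe artifact):

import enumeratum._

sealed trait Currency extends EnumEntry
object Currency extends Enum[Currency] with CirceEnum[Currency] {
  case object CAD extends Currency
  case object EUR extends Currency
  case object USD extends Currency
  val values = findValues
}
// io.circe.Encoder[Currency] and io.circe.Decoder[Currency]
// now exist, (de)serializing the values by their .name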
Enumeratum is not available for Scala 3, although a lot of its functionality is provided by `enum` out of the box:

enum Currency:
  case CAD, EUR, USD // ...

Currency.valueOf("CAD")
Currency.fromOrdinal(0)
Currency.values
Leaner value wrappers (AnyVal, opaque, @newtype)
When we wanted to create a new type by wrapping a primitive, we did it this way:
final case class Username(value: String)
It works, it cooperates with derivation, `apply` and `unapply`, but it allocates. Some people might complain that every single little thing in their codebase allocates an extra wrapper. The first solution to this problem was `AnyVal`:
final case class Username(value: String) extends AnyVal
Such code will avoid wrapping… most of the time. If we take a look at the JVM bytecode we will see that a raw `String` is passed around in many places. But not all places. Pattern matching wouldn’t be able to distinguish whether we are matching a normal `String` or a `Username` `String`, and the language designers considered that too risky. So we have code that wraps and allocates for pattern matching. And for several other circumstances, like passing the value into some generic interface (including collections and functions).
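A couple of the typical places where the wrapper is still allocated:

final case class Username(value: String) extends AnyVal

// used generically, the wrapper is instantiated: the list stores
// Username instances, not raw Strings
val names: List[Username] = List(Username("a"), Username("b"))

// pattern matching instantiates the wrapper as well
def describe(u: Username): String = u match {
  case Username(value) => value
}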
For that reason, people who would like to NOT wrap and allocate EVER came up with their own solutions. There were several historical attempts (first Miles Sabin’s tagged types, then the `@@` type alias added by Jason Zaugg, then several different reimplementations by various projects). Finally, one particular implementation, based on a macro annotation, caught on: the NewType library.
import io.estatico.newtype.macros.newtype
import scala.language.implicitConversions
// HAS TO be placed in some object (might be package object)
@newtype
final case class Username(value: String)
The code above is translated into:
// This explains the need for object
// (type alias cannot be top level definition in Scala 2) ...
type Username = Username.Type
object Username {
type Repr = String
type Base = Any { type Username$newtype }
trait Tag extends Any
type Type <: Base with Tag
def apply(x: String): Username = x.asInstanceOf[Username]
// ...and this explains the need for implicitConversions
implicit final class Ops$newtype(val $this$: Type) extends AnyVal {
def value: String = $this$.asInstanceOf[String]
}
}
This (like any other tagged type) relies on a compile-time trick where you use `asInstanceOf` to cast your primitive type into something that is considered a separate type by the type system, but which won’t trigger any runtime error. Since the compiler knows almost nothing about `Username.Type`, it won’t let you call any funny methods that would cause a runtime error. And all the methods that you added to the `case class` are rewritten into something that casts the type back into its actual representation. Ingenious!
NewType doesn’t provide any integrations on its own… but it lets you generate code for your inner representation and safely cast it using the `Coercible` type class. This way you can decide whether each instance should be derived for a particular NewType or for all NewTypes.
While NewType doesn’t provide any integrations itself, the derivation library Derevo provides one for it.
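For example, following the pattern from the NewType README, a Circe decoder can be derived once for every newtype whose underlying representation already has one (a sketch):

import io.circe.Decoder
import io.estatico.newtype.Coercible

// Coercible proves that Decoder[Repr] can be safely cast to Decoder[N]
implicit def coercibleDecoder[Repr, N](
  implicit ev: Coercible[Decoder[Repr], Decoder[N]],
  base: Decoder[Repr]
): Decoder[N] = ev(base)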
In Scala 3 NewType is not available, but its role is quite well fulfilled by `opaque type`s:
opaque type Username = String
// code in this scope knows that Username is a String
object Username {
  def apply(value: String): Username = value
  def unapply(value: Username): Some[String] = Some(value)
}
extension (username: Username)
  def value: String = username
If you are not happy about the need to implement `apply`, `unapply`, etc. yourself, you can easily implement a helper:
trait NewType[Repr] {
  opaque type Type = Repr
  // You can use wrap to convert a raw value into the wrapper.
  final protected def wrap(value: Repr): Type = value
  // We're using the Product matcher, which requires returning a Product.
  final def unapply(value: Type): Tuple1[Repr] = Tuple1(value)
  extension (newType: Type)
    def value: Repr = newType
}
// You have to implement apply yourself,
// in case you need a smart constructor.
object Username extends NewType[String] {
  def apply(value: String): Type = wrap(value)
}
type Username = Username.Type

// extension method works
Username("test1").value
// apply is matched with unapply
Username("test2") match {
  case Username(value) => println(value)
}
`opaque type`s, similarly to `@newtype`, never box at runtime, which is an advantage when you want to avoid allocations, and an issue if you are a fervent user of runtime reflection (I am not).
Autogenerating smart constructors from type (Refined types)
Earlier, we used a smart constructor to prevent creating an invalid value of the `Ratio` type:
sealed abstract case class Ratio private (asFraction: BigDecimal)
object Ratio {
  def parse(value: BigDecimal): Either[ParsingError, Ratio] =
    if (value >= BigDecimal(0)) Right(new Ratio(value) {})
    else Left(ParsingError.IllegalCombination("Ratio has to be non-negative"))
}
With Scala 3 improvements (or with -Xsource:3
as mentioned before) we could simplify it to:
case class Ratio private (asFraction: BigDecimal)
object Ratio {
  def parse(value: BigDecimal): Either[ParsingError, Ratio] =
    if (value >= BigDecimal(0)) Right(Ratio(value))
    else Left(ParsingError.IllegalCombination("Ratio has to be non-negative"))
}
But you still have to write that code yourself. And this is where Refined types come into play.
import eu.timepit.refined._
import eu.timepit.refined.api.Refined
import eu.timepit.refined.numeric._
val fraction: Either[String, BigDecimal Refined NonNegative] =
refineV[NonNegative](BigDecimal(1))
println(fraction) // Right(1)
Refined types use types to put constraints on our type (refine it) and then generate the parsing code based solely on that type. It is very useful, although moving around a `BigDecimal Refined NonNegative` doesn’t tell us much about the usage of this type. Using a type alias:
type Fraction = BigDecimal Refined NonNegative
val fraction: Either[String, Fraction] =
refineV[NonNegative](BigDecimal(1))
is a bit better, but it is still an alias, and it can be mixed up with some other value with the same constraints. That’s why e.g. Gabriel Volpe popularises in his book the refined-newtype approach:
@newtype
case class Ratio(asFraction: BigDecimal Refined NonNegative)
To create a `Ratio` we have to provide it with a validated `BigDecimal`, so we no longer need to write a smart constructor (although we may add one as a utility method built on the autogenerated code). Meanwhile, the `Refined` value has to go through parsing. So out of the box, to create such a value we have to go through:
val ratio: Either[String, Ratio] =
refineV[NonNegative](BigDecimal(1)).map(Ratio(_))
This way it is much harder to inject wrong values into our domain. Sometimes we might find it too annoying - especially if we know the value is hardcoded to something that is correct. But if we are using literals (`String`s, `Int`s, etc.), there are macros lifting the types automatically:
import eu.timepit.refined.auto._
// Implicit conversion happens in the compile time
// if the compiler can prove that BigDecimal(1) is NonNegative.
val ratio = Ratio(BigDecimal(1))
(of course, all runtime values have to be validated).
There is an ongoing effort to figure out a way of implementing Refined in Scala 3, so it might take a while, but eventually it should be possible. (See the blog post by Michał Sitko, or the tweets of Tamer Abdulradi.)
Easier work with data types (Quicklens and Chimney)
When you are working with immutable data types, at some point you will want to modify nested data. There is no issue when you want to do this:
final case class Foo(
  foo1: String,
  foo2: String
)

val original = Foo(foo1 = "original", foo2 = "original")
val modified = original.copy(foo1 = "modified")
but it quickly gets annoying when you start working with nested structures:
final case class Foo(
  foo1: Bar,
  foo2: String
)
final case class Bar(
  bar1: Baz,
  bar2: String
)
final case class Baz(
  baz1: String,
  baz2: String
)

val original = Foo(
  foo1 = Bar(
    bar1 = Baz(
      baz1 = "original",
      baz2 = "original"
    ),
    bar2 = "original"
  ),
  foo2 = "original"
)
val modified = original.copy(
  foo1 = original.foo1.copy(
    bar1 = original.foo1.bar1.copy(
      baz1 = "modified"
    )
  )
)
Do you see how much effort it takes to modify just one nested field? The fact that you have to copy all the intermediate values is pretty apparent. Cases like this are what make people dislike nested models, even when they would suit the modeled domain much better.
Except this is not needed at all. All this code can be easily generated using Quicklens:
import com.softwaremill.quicklens._
// updating nested value to precomputed value
val modified1 = original.modify(_.foo1.bar1.baz1).setTo("modified")
// updating nested value with a function
val modified2 = original.modify(_.foo1.bar1.baz1).using(old => "modified")
This approach works with sealed hierarchies as well:
final case class Foo(foo: Bar)
sealed trait Bar extends Product with Serializable
object Bar {
  final case class Baz1(bar: Baz) extends Bar
  case object Baz2 extends Bar
}
final case class Baz(baz: String)

val original = Foo(Bar.Baz1(Baz("original")))
val modified = original.modify(_.foo.when[Bar.Baz1].bar.baz).setTo("modified")
The mathematics behind these immutable getters and setters is called optics. It might sound scary if we always had to roll them ourselves, but for working with domain data Quicklens is more than enough. And if you want to take a look at some more specific application, e.g. one that you could use in a library, take a look at how Circe optics use Monocle under the hood.
Quicklens is available for Scala 3, so there is no harm in using it if you intend to migrate to Scala 3 anytime soon.
Updating nested data is something we would do inside our domain, probably somewhere in our pure functions. But how about the edges of the bounded context? All these pieces of code where we rewrite one model into another very-similar-but-not-identical model? We had such a case already:
final case class AddressDAO(
id: UUID,
customerID: UUID,
country: Country,
city: String,
postalCode: String,
firstLine: String,
secondLine: String
) {
def toDomain: Either[ParsingError, Address] = Right(
Address(
country = country,
// Let's assume that all of the below are @newtypes
city = Address.City(city),
      postalCode = Address.PostalCode(postalCode),
      firstLine = Address.FirstLine(firstLine),
      secondLine = Address.SecondLine(secondLine)
)
)
}
object AddressDAO {
def fromDomain(
addressID: UUID,
address: Address,
customerID: Customer.ID
): AddressDAO = AddressDAO(
id = addressID,
customerID = customerID.value,
country = address.country,
city = address.city.value,
    postalCode = address.postalCode.value,
    firstLine = address.firstLine.value,
    secondLine = address.secondLine.value
)
}
There is a library that could help us transform one into another - Chimney. But let’s explain it step by step:
final case class Foo(baz: Int)
final case class Bar(baz: Int)
In such a case, if I wanted to rewrite Foo into Bar, all fields would have the same names and types. So I could do:
val foo = Foo(1)
import io.scalaland.chimney.dsl._
val bar1 = foo.into[Bar].transform
// or for short
val bar2 = foo.transformInto[Bar]
The library will create the whole transformation for us if every field in the target has a corresponding field in the source, and it knows how to turn the source type into the target type. If a field was renamed, or we wanted to provide the value ourselves (because it is non-obvious or missing), we could:
// Field Foo.baz is translated as Bar.newName
foo.into[Bar].withFieldRenamed(_.baz, _.newName).transform
// Field Bar.baz is computed using computeValueOfBaz(foo)
foo.into[Bar].withFieldComputed(_.baz, foo => computeValueOfBaz(foo)).transform
But we can also provide the transformation as a type class:
final case class Source(field: OldField)
final case class Target(field: NewField)
implicit val oldToNew: Transformer[OldField, NewField] = ...
// Source.field will be converted to Target.field using Transformer
source.into[Target].transform
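To make that snippet complete, here is a minimal sketch of such a Transformer - the OldField/NewField definitions below are hypothetical; the point is only that Transformer is a single-method trait, so a lambda suffices:
import io.scalaland.chimney.Transformer
import io.scalaland.chimney.dsl._

final case class OldField(value: String)
final case class NewField(value: String)

final case class Source(field: OldField)
final case class Target(field: NewField)

implicit val oldToNew: Transformer[OldField, NewField] =
  old => NewField(old.value.trim)

// Source.field is converted to Target.field using the implicit Transformer
val target = Source(OldField(" abc ")).transformInto[Target]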
This should be enough knowledge to generate the most mundane part of domain-to-DAO transformations and back:
- if we like to use @newtypes for domain primitives, we can generate Transformers for them:
// Handles wrapping and unwrapping of every @newtype that we are using!
implicit def newTypeTransformer[From, To](
  implicit ev: Coercible[From, To]
): Transformer[From, To] = value => ev(value)
- then we only have to use it in the code:
final case class AddressDAO(
  id: UUID,
  customerID: UUID,
  country: Country,
  city: String,
  postalCode: String,
  firstLine: String,
  secondLine: String
) {
  def toDomain: Either[ParsingError, Address] = Right(
    // Not used fields will be skipped,
    // @newtypes will be wrapped.
    this.transformInto[Address]
  )
}
object AddressDAO {
  def fromDomain(
    addressID: UUID,
    address: Address,
    customerID: Customer.ID
  ): AddressDAO =
    // Missing fields are explicitly provided,
    // @newtypes will be unwrapped.
    address.into[AddressDAO]
      .withFieldConst(_.id, addressID)
      .withFieldConst(_.customerID, customerID.value)
      .transform
}
This code might look like some runtime reflection cheat, but it is completely type-safe. If you modify your case classes and the compiler loses the ability to provide a safe mapping, the compilation will fail. If it compiles, all fields of the new value can be computed.
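For instance, reusing the earlier Foo/Bar example (BarExtended is a hypothetical type of mine), adding a field that cannot be inferred breaks the compilation until we tell Chimney what to do:
final case class BarExtended(baz: Int, extra: String)

// foo.transformInto[BarExtended] // would not compile - no source for extra
val barExtended = foo
  .into[BarExtended]
  .withFieldConst(_.extra, "default")
  .transform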
Actually, we could relax the requirements a bit. We could use Chimney to perform transformations with parsing: if the data can be (recursively) validated, it will be transformed; if it cannot be validated, you’ll get an error. It would come in handy in our second example:
final case class CustomerDAO(
id: String,
billingFirstName: String,
billingLastName: String,
billingAddress: UUID
) {
def toDomain(
addressDaos: List[AddressDAO]
): Either[ParsingError, Customer] = {
val parsedAddress = addressDaos.find { addressDao =>
addressDao.customerID.toString == id && addressDao.id == billingAddress
} match {
case Some(addressDao) =>
addressDao.toDomain
case None =>
Left(ParsingError.IllegalCombination("Customer is missing BillingAddress"))
}
parsedAddress.map { billingAddress =>
Customer(
id = Customer.ID(id),
data = Customer.Data(
        billingName = Customer.BillingName(
          firstName = Customer.FirstName(billingFirstName),
          lastName = Customer.LastName(billingLastName)
),
billingAddress = billingAddress
)
)
}
}
}
object CustomerDAO {
def fromDomain(
customer: Customer,
billingAddress: UUID
): CustomerDAO = CustomerDAO(
    id = customer.id.value,
    billingFirstName = customer.data.billingName.firstName.value,
    billingLastName = customer.data.billingName.lastName.value,
    billingAddress = billingAddress
)
}
In the current version of Chimney, the counterpart of a Transformer that can transform and validate at once is TransformerF (or lifted transformer) - named like this because you can put your validating algebra in as a type parameter (e.g. F[A] = Either[List[String], A], with List because we are aggregating errors instead of taking just the first one). For instance, we could define the transformation for our CustomerDAO like this:
final case class CustomerDAO(
id: String,
billingFirstName: String,
billingLastName: String,
billingAddress: UUID
) {
def toDomain(
addressDaos: List[AddressDAO]
): Either[ParsingError, Customer] =
// We take the piece of Customer that is the most similar
// to the source CustomerData.
this.into[Customer.Data]
// We start by mapping values that are already there,
// but in different format (under different name)...
.withFieldRenamed(_.billingFirstName, _.firstName)
.withFieldRenamed(_.billingLastName, _.lastName)
// ...then we add logic which can fail - we model it
// with F[A]=Either[ParsingError, A].
.withFieldConstF(_.billingAddress, {
addressDaos.find { addressDao =>
addressDao.customerID.toString == id && addressDao.id == billingAddress
} match {
case Some(addressDao) =>
addressDao.toDomain // delegate to AddressDao parsing
case None =>
Left(ParsingError.IllegalCombination("Customer is missing BillingAddress"))
}
})
.transform
// We got Either[ParsingError, Customer.Data],
// so we add the final touch.
.map(Customer(
id = id.transformInto[Customer.ID],
data = _
))
}
object CustomerDAO {
def fromDomain(
customer: Customer,
billingAddress: UUID
): CustomerDAO =
// We take the piece of Customer that is the most similar
// to the target CustomerDAO.
customer.data.billingName.into[CustomerDAO]
// Values missing from the input we provide manually...
.withFieldConst(_.id, customer.id.value)
.withFieldConst(_.billingAddress, billingAddress)
// ...and for values which are already there but under
// different name we provide the rename - Chimney will
// figure out how to repack them if it is obvious.
.withFieldRenamed(_.firstName, _.billingFirstName)
.withFieldRenamed(_.lastName, _.billingLastName)
.transform
}
These 2 examples show us:
- that we are actually able to derive transformations between 2 different representations of the same data, and these transformations can be generated recursively (for more advanced features, I recommend consulting the documentation),
- that the more similar the structures of the source and target data formats are, the less code we have to write. If the formats are spitting images of each other, then there is almost no configuration to provide - so we can make the decision to separate them early without imposing on ourselves the cost of writing dumb code,
- that when the differences between formats become significant, configuring the generator might take more code than simply writing the mapping by hand - but at that point we already have 2 separate models, each serving its own purpose, so maintaining the mapping by hand doesn’t feel wrong.
My personal takeaway is that I don’t have to postpone separating the models until some later point. I can split them immediately, generate the mappings between them without any configuration, and just add configuration as they diverge, until mapping by hand would feel justified.
Both Quicklens and Chimney remove the reasons against using ADTs for modeling our domain:
- we don’t have to fear editing nested immutable data
- we don’t have to fear a mismatch between the requirements of different edges of our application (how we communicate with the users, how we persist the data, how we express everything in-between)
- we don’t have to maintain tons of boilerplate without any interesting logic
Just in case anyone has to hear it - it doesn’t mean that you can just write the first ADT representation that comes to mind and it will work out. Probably some representations will make it easier to transform, exchange or persist the data, and you will move in that direction by trial and error. However, your hands are not tied, and you can apply these optimizations later, once you prove that you need them.
Generating behavior from types
Using ADTs as values in your model has a great advantage - you can generate a lot of code just from the information about the type (quite often using a type class). Among the use cases already mentioned in this article are:
- JSON decoders and encoders (e.g. Circe)
- JDBC mappings for data-to-query-params and results-to-data (e.g. Put and Get from Doobie)
- getters, setters, prisms, lenses (e.g. Quicklens or Monocle)
- data transformations (e.g. Chimney)
but there are also:
- Show (from Cats) and ShowPretty (from Kittens) for logging and debugging
- Diffx to display difference between data in e.g. tests
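For example, cats.Show alone already pays for itself in logs - a quick sketch with a hypothetical User type:
import cats.Show
import cats.syntax.show._

final case class User(name: String, age: Int)
object User {
  implicit val show: Show[User] = Show.show(u => s"User(name=${u.name})")
}

val user = User("Ann", 30)
// uses Show instead of toString, so we control what leaks into logs
val logLine: String = show"created $user"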
Implementing these generators is not the subject of this article (there will be a separate one for that). We are only interested in the fact that they exist, that we are using them, and that we can make their usage easier. In particular, we can make using type classes easier.
Usually, we have 2 ways of generating the behavior:
- automatic derivation - TypeClass[MyData] is generated when you need it, without waiting for any action from your side. You forgot to provide an instance yourself? No problem: as long as there is enough information to generate the implementation, it will be generated where you need it. This approach makes starting easy - you just need to add an import, e.g. import io.circe.generic.auto._ in the case of Circe, and all codecs will magically be there for you. Sometimes a library will provide it out of the box (e.g. Doobie). The disadvantage is that if you need to provide a custom implementation, because the derived one is somehow wrong, you might accidentally derive a new one and not notice it. Because of that it is safer to use the other option, which is
- semiautomatic derivation - you have to explicitly ask the compiler to derive the type class for you. For instance, for Circe that would be:
import io.circe.generic.semiauto._
implicit val myEncoder: Encoder[MyData] = deriveEncoder[MyData]
This removes the risk of accidentally introducing a second implementation, but might be a pain. E.g. Circe will NOT derive codecs for your codec’s elements recursively, so if your codec needs another codec, and it is not derived, compilation will fail. On the other hand, e.g. Jsoniter Scala derives codecs recursively all the way down.
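A small sketch of that caveat (Inner/Outer are hypothetical types of mine) - with semiauto, the inner codec has to be spelled out, or the outer derivation fails to compile:
import io.circe.Decoder
import io.circe.generic.semiauto._

final case class Inner(a: Int)
final case class Outer(inner: Inner)

// without this line, deriveDecoder[Outer] would not compile:
implicit val innerDecoder: Decoder[Inner] = deriveDecoder[Inner]
implicit val outerDecoder: Decoder[Outer] = deriveDecoder[Outer]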
Usually, you want to have only one implementation of a type class for your type. Usually, you also don’t want to import it manually every time you need it. You can solve both problems by putting the implementation into the type’s companion object:
final case class Foo(field: String)
object Foo {
implicit val decoder: Decoder[Foo] = ...
implicit val encoder: Encoder[Foo] = ...
}
Obviously, that makes sense if you don’t mind storing the implementation in the same place as the data. I would mind e.g. storing JSON codecs, Doobie mappers and any non-domain logic in my domain models. But I wouldn’t mind storing them next to DTO or DAO objects. I also wouldn’t mind storing there e.g. cats.Show, cats.Eq or cats.Order type classes, as the ability to show"Debug print $foo", foo1 === foo2 or foo1 < foo2 is always useful inside the domain. But all the type classes that you don’t want to put into companions you will have to import from somewhere, so you might e.g. put them into some thematically organized objects and import them in batches:
import io.circe.generic.semiauto._
object FooCodecs {
  implicit val fooDecoder: Decoder[Foo] = deriveDecoder[Foo]
  implicit val fooEncoder: Encoder[Foo] = deriveEncoder[Foo]
  // and so on
}
import FooCodecs._ // import all codecs at once
So making sure that there is only one implementation will help us ensure correctness. But there is another reason why we should do this: derivation adds to the compile time, so if we can do the job once, we avoid unnecessarily slowing down the compilation. And if the implementation is a val (not possible for generic types, which need to take another implicit), we also initialize it once, so we save some time by not allocating things multiple times. But I would focus on the correctness part - the latter is an optimization, and optimization is something you should base on benchmarks.
Derivation next to type definition
OK, so we decided to use some type classes and we want to avoid unnecessary boilerplate when the implementations we want are the defaults provided by derivation. Both Scala 2 and Scala 3 have this solved.
Scala 2 has libraries like Derevo, which let you put these type classes into the companion through a macro annotation:
import derevo.derive
import derevo.cats.{eq, show}
import derevo.circe.{decoder, encoder}
@derive(eq, show, decoder, encoder)
final case class Foo(int: Int, string: String)
Meanwhile, Scala 3 has a built-in mechanism for that:
final case class Foo(int: Int, string: String)
derives Eq, Show, Decoder, Encoder
// Provided that the Eq, Show, Decoder and Encoder
// companions have the right .derived method.
Derevo supports @newtype (if you configure it to do so). Usually, the provided configuration will just cast the wrapper type to its inner representation and vice versa, while deriving the type class for the inner type:
// Our type class.
trait MyType[A] {
// ...
}
// Our own derivation configuration for Derevo.
object MyType extends Derivation[MyType] with NewTypeDerivation[MyType] {
// maybe we use macro
def instance[A]: MyType[A] = macro MyTypeDerivation.impl
// or maybe we use some semiauto pattern
def instance[A](implicit A: DerivedMyType[A]): MyType[A] = A
}
@derive(MyType) @newtype
final case class Foo(int: Int)
Although, you would only write this if there was no existing integration available as a library (and there are a few already).
Putting all imports in one place
There are a few more things we can do to make derivation easier. If you were ever tired of repeating several imports in every file that had to generate some behavior or provide several integrations, there is a solution.
For instance, when you are using Doobie and you have to repeat the same imports in each file:
import doobie._
import doobie.implicits._
import doobie.implicits.legacy.instant._
import doobie.postgres._
import doobie.postgres.implicits._
import doobie.refined._
you can actually take a look at the source code behind these imports and learn that they usually follow the pattern:
package library
trait SupportForOneThing { ... }
trait SupportForAnotherThing { ... }
package object allThings
  extends SupportForOneThing
  with SupportForAnotherThing
// import library.allThings._
Since virtually everything is put into traits, and then package objects are composed out of them, we might as well compose our own objects with all the imports already defined for us. E.g. for Doobie we could:
package com.ourcompany
// Allows `import DoobieSupport._` instead of... a lot of imports.
// Additionally provides support for a few useful but missing features.
object DoobieSupport
extends doobie.Aliases // basic functionalities
with doobie.hi.Modules
with doobie.syntax.AllSyntax
with doobie.free.Modules
with doobie.free.Types
with doobie.free.Instances
with doobie.postgres.Instances // postgres extensions (without postgis)
with doobie.postgres.hi.Modules
with doobie.postgres.free.Modules
with doobie.postgres.free.Types
with doobie.postgres.free.Instances
with doobie.postgres.syntax.ToPostgresMonadErrorOps
with doobie.postgres.syntax.ToFragmentOps
with doobie.postgres.syntax.ToPostgresExplainOps
with doobie.refined.Instances // refined types
with doobie.util.meta.MetaConstructors // Java Time extensions
with doobie.util.meta.TimeMetaInstances
// import com.ourcompany.DoobieSupport._
We might also put our own definitions there! We wrote our own support for @newtypes?
implicit def coercibleMeta[R, N](
implicit
ev: Coercible[Meta[R], Meta[N]],
R: Meta[R]
): Meta[N] = ev(R)
We can put them in our Support object (or in a trait that we would compose into the Support object).
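A hypothetical usage sketch - with coercibleMeta in scope (e.g. via such a Support object), Meta instances for @newtype wrappers come for free from their underlying types:
import doobie.Meta
import io.estatico.newtype.macros.newtype

object ids {
  @newtype final case class UserName(value: String)
}

// resolves via Meta[String] and the Coercible provided by @newtype
val userNameMeta: Meta[ids.UserName] = Meta[ids.UserName]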
We want to handle enumeratum types automatically?
implicit def enumeratumMeta[A <: enumeratum.EnumEntry](
implicit
enum: enumeratum.Enum[A],
typeName: TypeName[A]
): Meta[A] =
Meta[String].timap(enum.withNameInsensitive)(_.entryName)
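A hedged usage sketch (the table and the enum below are made up) - with that Meta in scope, an enumeratum enum maps to a text column for free:
import doobie.implicits._

sealed trait Color extends enumeratum.EnumEntry
object Color extends enumeratum.Enum[Color] {
  case object Red extends Color
  case object Blue extends Color
  val values = findValues
}

// .query[Color] works because Get[Color] is derived from our Meta
val selectColors: doobie.Query0[Color] =
  sql"SELECT color FROM products".query[Color]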
Same story - it goes into the Support object too. We can build such an all-in-one import for virtually everything… as long as the authors provide all definitions in traits. If they put things directly into package objects, you have to redirect each definition manually. (I wrote the code for both cases for Doobie, Tapir, Jsoniter and Pureconfig in one of my projects if you need an example.)
Things are much easier in Scala 3. It deprecated package objects, and all values can be put into top-level definitions. The pattern with traits and package objects will die out (mixing traits into normal objects could still be a thing). However, Scala 3 introduced export as a dual to import:
package com.mycompany.support.doobie
export doobie.{given, *}
export doobie.implicits.{given, *}
export doobie.implicits.legacy.instant.{given, *}
export doobie.postgres.{given, *}
export doobie.postgres.implicits.{given, *}
export doobie.refined.{given, *}
// import com.mycompany.support.doobie.{given, *}
So building these single-line imports should be even easier.
Intermediate objects
There is an issue I see on StackOverflow every now and then: customizing the derivation. You have to use:
final case class Foo(
bar: String,
baz: Int
)
but it has to be mapped to JSON like:
{
"payload": {
"fooBar": ""
"fooBaz": 0
}
}
My first suggestion would be: just write a FooDTO which maps exactly to the JSON structure:
final case class FooDTO(
fooBar: String,
fooBaz: Int
)
final case class FooEnvelope(
payload: FooDTO
)
and derive the type class for the DTO, and map it (just like we already showed in the previous examples).
But let’s say that - for whatever reasons - you have to make your codecs deserialize directly to this Foo representation. Usually, you would see the suggestion to use whatever the library at hand offers to build your codec by hand (e.g. with the combinators that Circe or Play JSON provide us).
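For reference, a sketch of what that by-hand approach looks like with Circe cursors - workable, but every field path is repeated and easy to get subtly wrong:
import io.circe.Decoder

implicit val fooDecoder: Decoder[Foo] = Decoder.instance { c =>
  for {
    bar <- c.downField("payload").get[String]("fooBar")
    baz <- c.downField("payload").get[Int]("fooBaz")
  } yield Foo(bar, baz)
}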
Thing is, this is not the easiest way of implementing such a mapping. The pattern that I use (in those rare cases where it makes sense in the first place) is to just put the mapping inside the codec, e.g.:
final case class Foo(
bar: String,
baz: Int
)
object Foo {
// Hide the intermediate representation
// from the world.
private final case class FooDTO(
fooBar: String,
fooBaz: Int
)
private final case class FooEnvelope(
payload: FooDTO
)
// Cherish consistent representation between
// whatever type class you are using.
  implicit val decoder: Decoder[Foo] = deriveDecoder[FooEnvelope]
.map {
case FooEnvelope(FooDTO(bar, baz)) =>
Foo(bar, baz)
}
  implicit val encoder: Encoder[Foo] = deriveEncoder[FooEnvelope]
.contramap {
case Foo(bar, baz) =>
        FooEnvelope(FooDTO(bar, baz))
}
// Tapir schema matching codecs
  implicit val schema: Schema[Foo] = Schema.derived[FooEnvelope]
.asInstanceOf[Schema[Foo]]
}
The majority of type classes offer .map, .contramap or similar, so this pattern lets you make sure that all of these implementations are consistent. E.g. the JSON codecs and the Tapir schema match, so your autogenerated Swagger documentation actually represents what you accept/return.
I don’t use this often, as the domain-DTO-DAO separation solves a lot of problems like this (and lets you decide when to do the translation and how/when to handle errors). But occasionally it is useful, if your DTO codec itself has some weird customized things going on and you are strongly willing to handle them before the controller’s logic actually starts.
Some useful practices
The last topic that I wanted to touch on in this article is a set of practices that could help you work with your domain model. They aren’t patterns like the practices discussed so far, but rather some recommendations:
- try to avoid primitive types in your domain (String, Int, Boolean, Arrays, etc.) - they carry no meaning, you cannot add any guards to them, it is easy to mix them up with other values of the same type used in a different context in your domain, and they require additional documentation of their purpose. AnyVals/@newtypes/opaque types will let you control the usage of each simple domain definition while keeping it cheap at runtime
- try to avoid Boolean types in particular - a two-element enum will work just as well, but it will additionally carry the context of what each value means (see the sketch after this list)
- if you have to use Boolean, use named parameters - refreshData(entity, true) requires jumping to the function declaration to figure out what is going on, refreshData(entity, invalidateCaches = true) immediately tells you what each parameter means
- if you have more than 3 parameters or any types repeat, prefer calling with named parameters - the IDE will happily fill in the names for you, and future you will be glad to be able to tell at a glance whether or not parameters were mistakenly switched
- do not use .toString, Cats’ Show and similar for any domain-related logic! - while this functionality is super convenient, it is also super unspecified and beyond any contracts. If you want to serialize something, create a dedicated method. Even if the particular type at hand could delegate to .toString, don’t rely on this behavior when designing your logic. While some types output the same format that they would accept in a YourType.parse method, you want to handle such important logic explicitly rather than rely on what is usually just debug data intended for logging. Any serialization and deserialization should be explicitly specified and used through an interface intended only for this purpose, and an interface defined for every object on the JVM without any contracts is a poor candidate for that
- domain code and business code is not library code - I have seen certain Scala leaders argue that your business logic should be written just like your libraries, with config only specified in the program’s entrypoint (main), but I fundamentally disagree with that. Libraries need to be flexible, because you don’t know who your user is, and you only restrict them when you know that these restrictions are needed. Business logic is the opposite of that: it’s as restrictive as possible, only being permissive when it has to be, because you don’t want to commit to functionalities that would be accidental. It is much easier to constrain the business logic to only work with certain types, with hardcoded types for errors (e.g. hardcoded error ADTs in your result type) and side-effects (e.g. a hardcoded IO monad, or a monad transformer). In my humble experience, the majority of programmers I worked with struggled with tagless final, hardly anyone could understand MTL, and in virtually no project that I have seen so far did people actually utilize the power that these concepts were supposed to grant them. If you have to pick a monad to model side-effects and errors explicitly, I would suggest picking Monix BIO, ZIO’s IO or EitherT on top of Cats Effect IO (in this order). In library code tagless final is OK; in business code - avoid it like the plague.
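Here is the sketch promised above - a hypothetical two-element enum replacing a Boolean flag:
final case class Entity(id: String)

sealed trait CacheBehavior extends Product with Serializable
object CacheBehavior {
  case object InvalidateCaches extends CacheBehavior
  case object KeepCaches extends CacheBehavior
}

// named domain values instead of a bare true/false flag
def refreshData(entity: Entity, caches: CacheBehavior): Unit =
  caches match {
    case CacheBehavior.InvalidateCaches => () // refresh and drop caches
    case CacheBehavior.KeepCaches       => () // refresh, keep caches warm
  }

// refreshData(entity, CacheBehavior.InvalidateCaches) reads unambiguously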
These are just my own suggestions, drawn from my own experience. If yours are different, good for you. But I will stick with mine.
Summary
In this article we discussed some ways of modeling your domain logic using values (in the form of case classes and sealed hierarchies) and functions (defining behaviors and domain operations). We showed that a lot of our domain logic can be encoded explicitly, as just updates to passed immutable values. It helps us follow the first approach from Tony Hoare’s quote:
There are two methods in software design. One is to make the program so simple, there are obviously no errors. The other is to make it so complicated, there are no obvious errors.
We also showed existing, ready-to-use ways of dealing with the boilerplate that could otherwise discourage us from implementing this design.
This approach isn’t always possible, and surely not applicable everywhere. But it is simple and easy to understand, so we should treat it as a good default and starting point, unless we can prove it would be inefficient. We also have to remember that each problem can be expressed in several different ways, which won’t work out equally well - so even with the functional approach we still have to think about our model and look for better ways of expressing the problem at hand.
I hope that you found this article helpful, even though it didn’t discuss everything about designing your program in Scala. The following articles will add more to it.