Becoming More Functional
People on /r/scala sometimes ask how to make their Scala more functional, or what "advanced" techniques they should learn. This is a list aimed at people who already follow the Twitter Scala style guide and want to know where to go from there. I'll assume ScalaZ is in scope (because I can't find the scaladocs for Cats); learning ScalaZ may be a useful reference for some things. I will also use kind-projector syntax, as this tends to be more readable even if one isn't actually using kind-projector.
Use types to represent data, avoid branching
Types help you keep track of distinctions in your code - if a value has two different states, make them two different types. e.g. a few months ago I had a bug where I passed a graph to a function that expected that graph to have been filtered by another function first. Solution: make the filtered graph and the unfiltered graph different types.
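As a minimal sketch of the filtered/unfiltered graph idea, assuming a hypothetical graph representation and filtering rule (all names below are invented for illustration, not taken from the original code):

```scala
object GraphTypes {
  // Hypothetical representation: each node maps to its outgoing neighbours.
  final case class RawGraph(edges: Map[String, Set[String]])

  // The constructor is private, so the only way to obtain a FilteredGraph
  // is to go through FilteredGraph.filter.
  final class FilteredGraph private (val edges: Map[String, Set[String]])

  object FilteredGraph {
    // Hypothetical filtering rule: drop nodes that have no outgoing edges.
    def filter(g: RawGraph): FilteredGraph =
      new FilteredGraph(g.edges.filter { case (_, out) => out.nonEmpty })
  }

  // Passing an unfiltered RawGraph here is now a compile error, not a runtime bug.
  def nodeCount(g: FilteredGraph): Int = g.edges.size
}
```

Calling `GraphTypes.nodeCount` with a `RawGraph` no longer compiles, which is the point: the "has this been filtered?" question is answered by the type.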
- if/else is generally a sign that you want an "ADT" (sealed trait). So is a datastructure full of Options or Eithers, especially if the state of one field affects that of another (e.g. "if a is Some then b is Left").
- match constructs are easy to write unsafely and can often be replaced with fold (e.g. never match an Option or an Either (unless you need to for @tailrec)).
  - It might be worth defining your own fold methods on any custom sealed traits.
  - If you find yourself writing a fold with identity or {} in one of the branches, see whether the datatype defines a more specialized method for that use case (e.g. Option#getOrElse).
- Use shapeless-based typeclass derivation to avoid having to write boilerplate for custom datatypes.
  - Particularly applicable to "walk the object graph"-like problems e.g. JSON serialization.
  - This is much safer than reflection (and higher-performance too) since it happens at compile time rather than run time, and can give you an error if you try to e.g. include a File in your JSON output.
- Tentative: Use matryoshka to avoid boilerplate for custom tree-like datatypes where you want to enable traversal.
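To make the first two bullets concrete, here is a small sketch of a sealed trait with its own fold method. The Payment domain and every name in it are hypothetical; only the standard library is used.

```scala
object AdtFold {
  // Hypothetical domain: a payment is in exactly one of these states.
  sealed trait Payment {
    // One argument per case, so every caller has to handle every state.
    def fold[A](pending: => A, paid: Long => A, failed: String => A): A =
      this match {
        case Pending        => pending
        case Paid(cents)    => paid(cents)
        case Failed(reason) => failed(reason)
      }
  }
  case object Pending extends Payment
  final case class Paid(amountCents: Long) extends Payment
  final case class Failed(reason: String) extends Payment

  // The only match lives inside fold; callers cannot forget a case.
  def describe(p: Payment): String =
    p.fold[String]("pending", cents => s"paid $cents", reason => s"failed: $reason")

  // Option already ships a fold, plus more specialised methods such as getOrElse.
  def label(name: Option[String]): String = name.getOrElse("anonymous")
}
```

Because the trait is sealed, the compiler checks the single match inside fold for exhaustiveness, and adding a new case forces a new fold parameter that every call site must then supply.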
Replace general foldLeft and friends with more specific operations
foldLeft is a very general/powerful method, which makes it hard to reason about - it's more or less as powerful as a general imperative for loop - though at least it reduces the scope of the "variables" and makes it explicit exactly what is being threaded through each iteration. So while one of the first steps in making a codebase more functional is replacing looping constructs with foldLeft calls, a next step is often to replace that with a more specific/constrained construct:
- Sometimes there is simply a method for your case e.g. find, forall, exists, groupBy, min, max and partition. ScalaZ separate also sees some use.
- reduce should often be suml - you may need to define a Monoid instance for your type, or use shapeless-scalaz to derive one.
- map followed by suml is foldMap.
- foldLeft where the body includes a flatMap should usually be traverse (or sequence). This can sometimes be simplified further:
  - traverse followed by map(_.suml) is foldMapM
  - traverse followed by map(_.flatten) is traverseM
- If a foldLeft accumulates or modifies a "secondary" parameter along with its primary operation, often the clearest way to express this is using Writer or State (see the next section). In which case you can then often rewrite the foldLeft as a traverse as above.
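The sketch below contrasts a general foldLeft with the more specific methods, using only the standard library. The traverseOpt helper is a hand-written stand-in, specialised to Option, for what ScalaZ traverse provides generically; it is an illustration and not the ScalaZ implementation.

```scala
object SpecificOps {
  val xs: List[Int] = List(3, 8, 1, 6)

  // General foldLeft: the reader has to work out what is being computed.
  val hasBigFold: Boolean = xs.foldLeft(false)((acc, x) => acc || x > 5)
  // Specific method: the name states the intent.
  val hasBig: Boolean = xs.exists(_ > 5)

  // partition replaces a foldLeft that builds two lists at once.
  val (small, big) = xs.partition(_ <= 5)

  // Hand-written stand-in for traverse, specialised to Option:
  // the result is None if f returns None for any element.
  def traverseOpt[A, B](as: List[A])(f: A => Option[B]): Option[List[B]] =
    as.foldRight(Option(List.empty[B])) { (a, acc) =>
      for { b <- f(a); bs <- acc } yield b :: bs
    }

  // Hypothetical effectful step: parsing may fail.
  def parse(s: String): Option[Int] = scala.util.Try(s.toInt).toOption
}
```

Once the helper exists, call sites read as `traverseOpt(inputs)(parse)` and no longer mention the accumulator at all.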
Use standard for/yield-enabled types for "secondary" parameters and concerns
Often code has to manage a secondary concern as well as the primary thing it does. Often these are application-wide "cross-cutting concerns", e.g. audit logging, database transaction management, async I/O, or error handling - we can think of these as "effects". There is a tension between making these things explicit enough that the reader understands what the code is "actually" doing (and isn't confused by "magic" or subtle differences in the handling of different effects as in AOP approaches), and ensuring that the "happy path" and primary concern of the code remains clear (a difficulty in "handle errors where they happen" approaches).
Scala's for/yield offers a useful "third way": one can write a chain of for { a <- f(); b = g(); c <- h() } yield ... where the reader can clearly see where the secondary concerns are happening (the <- calls) but they don't obscure the straight-through control flow (and the function can remain single-entry/single-exit). We can shift seamlessly between the "value perspective" (where the full effectful value is an ordinary value that we can reason about like any other value, and if necessary compose "manually" with flatMap - remember that for/yield is just a different syntax for flatMap chains) and the "happy path perspective" (where we write our code in "straight through style" and trust (in a compiler-verified way) that the secondary effects will be handled somewhere) as appropriate for a given piece of code.
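As a small illustration of the two perspectives, here is the same computation written once with for/yield and once with explicit flatMap/map. Option stands in for the effect, and f, g and h are hypothetical functions mirroring the names in the paragraph above. The second form is roughly what the compiler generates (the real desugaring of the b = ... line goes through an intermediate tuple).

```scala
object ForYield {
  // Hypothetical steps: f and h are "effectful" (may be absent), g is pure.
  def f(): Option[Int] = Some(2)
  def g(a: Int): Int = a + 1
  def h(b: Int): Option[Int] = if (b > 0) Some(b * 10) else None

  // "Happy path" perspective: straight-through code, effects marked by <-.
  val sugared: Option[Int] =
    for {
      a <- f()
      b = g(a)
      c <- h(b)
    } yield a + c

  // "Value" perspective: the same chain composed by hand.
  val desugared: Option[Int] =
    f().flatMap { a =>
      val b = g(a)
      h(b).map(c => a + c)
    }
}
```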
Better still, there are well-known libraries of these types that have already been written, covering many of the common cases and making it easy for your colleagues to know exactly what any given effectful value represents. There are also library functions for managing effects. E.g. to sequence effectful operations on collections, use operations like traverse mentioned in the previous section. If you've defined a treelike datatype using matryoshka, the standard traversal operations on it will come in "monadic" versions that work like traverse (i.e. they will perform the traversal using flatMap to compose the effects at each stage). Note that these library functions are also usable for any custom effects that conform to the standard interfaces.
- Code that produces a value and accumulates a secondary value (often a list) should be represented as ScalaZ Writer.
  - This is particularly useful for structured logging, possibly with the treelog library.
- If you want to thread a secondary value through a series of function calls that also need to change that secondary value, use ScalaZ State.
  - One especially clear sign of this is if you're passing the secondary value into functions and getting a tuple of (primary result, new secondary value) back.
- For validation-like code:
  - Want fail-fast? Use Either (in pre-2.12 Scala use ScalaZ \/ or the Either enhancements from recent Cats).
    - If you need to integrate with a library that uses exceptions for failures, you can convert these into Either values using the constructs in scala.util.control.Exception._:
      - catching(classOf[SomeSpecificException]) either someLibraryMethod (returns Either[Throwable, ...], with the caught exception in the Left)
      - nonFatalCatch either someLibraryMethod (catches all the exceptions that are sensible to retry - everything except fatal system errors)
      - catching(classOf[SomeSpecificException]) opt someLibraryMethod (returns Option[...])
  - Want to accumulate all failures? Use ScalaZ Validation and accept that you won't be able to use for/yield.
    - for/yield can't accumulate all errors, because later validations are allowed to depend on the results of earlier ones, but if an earlier validation fails there's no input value for the later validation.
    - Look at applicative chaining (using *>) or "applicative builder syntax" (using |@| / ⊛) instead.
  - Want to accumulate failures but still return a result value even if there are failures? Use ScalaZ Writer.
    - Writer can use for/yield and accumulate all failures, because earlier validations always return a value even when there's a failure.
- Need to pass a read-only "context" value down through your business-logic layers even though it's only going to be used at low level? ScalaZ Reader might be appropriate.
  - This can be used as a replacement for global constants / objects / singletons or even as a form of dependency injection, but beware of overusing it. If an object contains no business logic I would leave it as a "global static" object, and only move towards a Reader style if you actually need to pass different values on occasion (e.g. test stubs).
  - IMO conventional dependency injection can be a relatively benign form of "magic" provided you:
    - Use constructor injection (rather than field injection), so that object constructors and fields still behave as expected. In Scala this is if anything more concise, and it ensures that objects can still easily be constructed "normally" e.g. for unit testing.
    - If there is the possibility of tooling (e.g. "find references") not knowing about the DI mechanism, ensure that there is some visible marker on classes that are constructed through DI so that a reader can immediately see this class is instantiated in a non-standard way.
      - Spring example: @Component class MyService @Autowired() (someDependency: SomeDependency)
  - For a "green field" project, MacWire is a good pure-Scala option.
  - That said, manual object construction is lightweight enough in Scala that I generally prefer to do my DI "by hand".
  - I appreciate the theoretical elegance of the "cake pattern", but I find it's too much (code-level) overhead to use in practice.
- Have a piece of effectful code that you can't or won't model in detail, but still want to be able to pass around as a value (i.e. control when the effects happen)? Use ScalaZ Task.
- Want to do async I/O? Use ScalaZ Task.
  - This is usually better than using akka actors, since Task is typesafe and you can keep reasoning about functions rather than having to think about messages.
  - If your async tasks need to access isolated pieces of state concurrently, they can safely use traditional Java tools e.g. AtomicInteger, LongAdder, ConcurrentHashMap, AtomicReference.
    - Actors are only useful if you have two or more pieces of state that you need to access concurrently but also always keep in sync, IME. (Or if you need akka's distribution functionality.)
  - You can also use (standard library) Future, but beware that it doesn't control when the effects happen.
    - Futures with effects inside them aren't generally values you can pass around and control when they actually happen - rather the effects (e.g. a web request) start immediately when the Future is instantiated.
    - Future would make sense for pure computations. But async in general probably has more overhead than it's worth for cases where you're working simultaneously rather than waiting simultaneously - where async shines is things like external web requests - and in those cases you usually want to control when the I/O happens.
- Have operations that need to happen in some kind of "block" or "context"? (e.g. a database transaction) Represent the operations as a value that you pass into a single method that does the open/close, so that you can't have a path where you forget to match them up.
  - This is often a good replacement for "magic" proxies/interceptors (based on method annotations, XML pointcuts or similar).
  - At its simplest the value could just be a function (or a Task created using Task.delay).
    - In that case you have to be careful not to allow the context to escape (e.g. a file handle that will be closed at the end of the block).
      - Tentative: there is a theoretical technique for avoiding this, but I don't think there's a practical library for it yet.
  - If you want a more declarative/introspectable/testable way to express your commands, define a custom ADT (sealed trait).
  - If you want to allow "composite" commands connected by functions (so that you can pass around e.g. a series of database operations to be executed in a single transaction), the Free monad is a way to do this without any boilerplate. You define the "primitive" operations in your ADT (e.g. Load/Save), and then building composite operations out of them (using for/yield syntax) is supported without any further code.
  - Tentative: If you want to allow composite commands but ensure they can be executed in parallel, consider FreeAp.
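For the fail-fast validation bullets in the list above, here is a minimal standard-library sketch: an exception-throwing call is converted into an Either with scala.util.control.Exception.catching, then chained with for/yield (this assumes Scala 2.12 or later, where Either is right-biased). parsePort and the port rules are hypothetical stand-ins for a real library call and real validation logic.

```scala
import scala.util.control.Exception.catching

object FailFastDemo {
  // Hypothetical stand-in for a library method that throws on bad input.
  def parsePort(raw: String): Int = raw.toInt

  // Turn the exception into a value. `either` yields Either[Throwable, Int];
  // the Left is then mapped to a plain error message.
  def safePort(raw: String): Either[String, Int] =
    catching(classOf[NumberFormatException])
      .either(parsePort(raw))
      .left.map(_ => s"not a number: $raw")

  // Hypothetical second validation that depends on the first one's result.
  def inRange(port: Int): Either[String, Int] =
    if (port > 0 && port < 65536) Right(port) else Left(s"out of range: $port")

  // Fail-fast: the first Left ends the chain and later steps never run.
  def validate(raw: String): Either[String, Int] =
    for {
      parsed <- safePort(raw)
      port   <- inRange(parsed)
    } yield port
}
```

Note that inRange needs the parsed value, which is exactly why this style cannot report both failures at once; accumulating all errors needs the applicative approach described under Validation.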
Tentative: Combining multiple for/yield-oriented types
A theory I'm considering lately is that effects are only ever problematic when two or more effects interact. E.g. implicit, pervasive, unmanaged state mutation is fine on its own. Implicit, pervasive, unmanaged asynchronicity is fine on its own. But the interaction of both is extremely difficult to debug. So traditional imperative programming allows working with one effect at a time, but no more.
The techniques in the previous section provide a huge advance over this, because they make it practical to work with two effects at once: one effect that you're managing via for/yield, and one implicit, pervasive, global effect. This is probably enough for most programs and even for many libraries (which potentially have to deal with user-defined effects if they ever accept callbacks or similar), as demonstrated by the fact that the techniques for dealing with more have only really been developed in the last few years, at least in Scala. (Haskell handles I/O as an explicitly managed effect, so without these techniques Haskell programs would expend their for/yield-equivalent managing I/O and have difficulty expressing explicit management/sequencing of other effects, at least in code that was also performing I/O.) Certainly everything in this section is tentative/experimental. However, a limit of two effects seems decidedly inelegant, and can become a practical issue as a codebase gets large enough or wants to manage many effects explicitly. So:
- Double-flatMap (flatMap { _.flatMap { ... } }, or similar constructs involving map e.g. flatMap { _.map { ... } }) is often a sign that you should be using a monad transformer.
- The simplest way to work with monad transformers is usually to define a single consistent "stack" that you can use throughout your application.
  - E.g. EitherT[WriterT[Task, Vector[AuditEvent], ?], ValidationError, ?] for an application that needs to record audit events, report validation errors, and perform async I/O.
  - You can define type aliases for your stack, and helper methods for "lifting" single effects into a complete stack:
    - type Action[A] = EitherT[WriterT[Task, Vector[AuditEvent], ?], ValidationError, A]
    - def log(ae: AuditEvent): Action[Unit] = EitherT.rightU[ValidationError](WriterT.put(Task.now({}))(Vector(ae)))
- If you need to write code that can be reused in different effect stacks (in different parts of your application which have different effect stacks e.g. with/without database access, or because you're writing a library that will be used with a user-provided effect stack), you can write it in a "stack-generic" form using a typeclass constraint:
  - def log[F[_]](ae: AuditEvent)(implicit mt: MonadTell[F, Vector[AuditEvent]]): F[Unit] = mt.tell(Vector(ae))
  - You can also put the F[_] type parameter on a service class.
  - Accept dependencies parameterized by the same F: class MyService[F[_]: MonadTell[?[_], Vector[AuditEvent]]](dependentService: DependentService[F])
  - This also lets you instantiate with a minimal F in unit tests (e.g. Writer[Vector[AuditEvent], ?]), and then the full "stack" (as required by the real DependentService) for the "real" service.
- Alternatively, look into one of the various "free coproduct" libraries - FreeK seems to have the most momentum behind it. (Paperdoll is my own effort.)