Becoming More Functional

People on /r/scala sometimes ask how to make their Scala more functional, or about what "advanced" techniques they should learn. This is a list aimed at people who already follow the Twitter Scala style guide and want to know where to go from there. I'll assume ScalaZ is in scope (because I can't find the scaladocs for Cats); learning ScalaZ may be a useful reference for some things. I will also use kind-projector syntax, as this tends to be more readable even if one isn't actually using kind-projector.

Use types to represent data, avoid branching

Types help you keep track of distinctions in your code: if a value has two different states, make them two different types. For example, a few months ago I had a bug where I passed a graph to a function that expected that graph to have already been filtered by another function. Solution: make the filtered graph and the unfiltered graph different types.
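A minimal sketch of the filtered/unfiltered graph idea, using only the standard library. The names here (Graph, FilteredGraph, nodeCount) and the filtering rule are hypothetical stand-ins, not the code from the actual bug:

```scala
object GraphTypes {
  // Hypothetical graph representation: node id -> ids of its neighbours.
  final case class Graph(edges: Map[Int, Set[Int]])

  // A separate type for graphs that have been through the filter.
  // The constructor is private, so `FilteredGraph.filter` is the only way to get one.
  final class FilteredGraph private (val graph: Graph)

  object FilteredGraph {
    // Hypothetical filtering rule: drop nodes that have no outgoing edges.
    def filter(g: Graph): FilteredGraph =
      new FilteredGraph(Graph(g.edges.filter { case (_, out) => out.nonEmpty }))
  }

  // Requires a filtered graph; passing a plain Graph is a compile error.
  def nodeCount(g: FilteredGraph): Int = g.graph.edges.size
}
```

With this in place, nodeCount(someGraph) no longer compiles; the caller has to write nodeCount(FilteredGraph.filter(someGraph)), so the "forgot to filter" bug cannot reach runtime.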

Replace general foldLeft and friends with more specific operations

foldLeft is a very general and powerful method, which makes it hard to reason about: it's more or less as powerful as a general imperative for loop, though at least it reduces the scope of the "variables" and makes explicit exactly what is being threaded through each iteration. So while one of the first steps in making a codebase more functional is replacing looping constructs with foldLeft calls, a next step is often to replace those calls with a more specific/constrained construct:
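As a small illustration (my own toy example, with a hypothetical prices list), here are three foldLeft calls next to the more specific standard-library method that says the same thing:

```scala
object FoldVsSpecific {
  val prices: List[Int] = List(3, 8, 5)

  // General: the reader has to inspect each lambda to learn what is being computed.
  val totalViaFold: Int = prices.foldLeft(0)((acc, p) => acc + p)
  val anyLargeViaFold: Boolean = prices.foldLeft(false)((acc, p) => acc || p > 7)
  val doubledViaFold: List[Int] = prices.foldLeft(List.empty[Int])((acc, p) => acc :+ (p * 2))

  // Specific: the method name already tells the reader the shape of the computation.
  val total: Int = prices.sum
  val anyLarge: Boolean = prices.exists(_ > 7)
  val doubled: List[Int] = prices.map(_ * 2)
}
```

The specific versions are also more constrained: map cannot change the length of the list, and exists can only return a Boolean, so there is less for a reviewer to check.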

Use standard for/yield-enabled types for "secondary" parameters and concerns

Often code has to manage a secondary concern as well as the primary thing it does. Often these are application-wide "cross-cutting concerns", e.g. audit logging, database transaction management, async I/O, or error handling - we can think of these as "effects". There is a tension between making these things explicit enough that the reader understands what the code is "actually" doing (and isn't confused by "magic" or subtle differences in the handling of different effects as in AOP approaches), and ensuring that the "happy path" and primary concern of the code remains clear (a difficulty in "handle errors where they happen" approaches).

Scala's for/yield offers a useful "third way": one can write a chain of for { a <- f(); b = g(); c <- h() } yield ... where the reader can clearly see where the secondary concerns are happening (the <- calls) but they don't obscure the straight-through control flow (and the function can remain single-entry/single-exit). We can shift seamlessly between the "value perspective" (where the full effectful value is an ordinary value that we can reason about like any other value, and if necessary compose "manually" with flatMap - remember that for/yield is just a different syntax for flatMap chains) and the "happy path perspective" (where we write our code in "straight through style" and trust (in a compiler-verified way) that the secondary effects will be handled somewhere) as appropriate for a given piece of code.
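A sketch of the two perspectives, using the standard library's Either as the error-handling effect. The functions parseAge and checkAdult are hypothetical examples; the point is only that the for/yield version and the flatMap version are the same program:

```scala
object ForYieldSketch {
  import scala.util.Try

  // Hypothetical steps, each of which may fail with an error message.
  def parseAge(s: String): Either[String, Int] =
    Try(s.trim.toInt).toOption.toRight(s"not a number: $s")

  def checkAdult(age: Int): Either[String, Int] =
    if (age >= 18) Right(age) else Left(s"too young: $age")

  // "Happy path perspective": the <- lines show where errors can occur.
  def describe(input: String): Either[String, String] =
    for {
      age   <- parseAge(input)
      years  = age - 18
      adult <- checkAdult(age)
    } yield s"age $adult, adult for $years years"

  // "Value perspective": the same logic composed manually with flatMap and map.
  def describeViaFlatMap(input: String): Either[String, String] =
    parseAge(input).flatMap { age =>
      val years = age - 18
      checkAdult(age).map(adult => s"age $adult, adult for $years years")
    }
}
```

Neither version contains an explicit branch on failure; the first Left short-circuits the rest of the chain.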

Better still, there are well-known libraries of these types that have already been written, covering many of the common cases and making it easy for your colleagues to know exactly what any given effectful value represents. There are also library functions for managing effects. E.g. to sequence effectful operations on collections, use operations like traverse mentioned in the previous section. If you've defined a treelike datatype using matryoshka, the standard traversal operations on it will come in "monadic" versions that work like traverse (i.e. they will perform the traversal using flatMap to compose the effects at each stage). Note that these library functions are also usable for any custom effects that conform to the standard interfaces.
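To make the behaviour of traverse concrete, here is a hand-written version specialised to List and Either, using only the standard library. This is a sketch of the idea, not how ScalaZ implements it; the library version is generic over the collection and the effect. The names traverseEither and parse are hypothetical:

```scala
object TraverseSketch {
  import scala.util.Try

  // Apply f to every element, composing the Either effects with flatMap
  // (via for/yield); the result is the first Left in list order, if any.
  def traverseEither[E, A, B](as: List[A])(f: A => Either[E, B]): Either[E, List[B]] =
    as.foldRight(Right(Nil): Either[E, List[B]]) { (a, acc) =>
      for {
        b  <- f(a)
        bs <- acc
      } yield b :: bs
    }

  // Hypothetical effectful operation to run on each element.
  def parse(s: String): Either[String, Int] =
    Try(s.toInt).toOption.toRight(s"not a number: $s")
}
```

The fold lives inside the library-style function, so calling code only has to say traverseEither(inputs)(parse) and gets back either all the parsed values or an error.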

Tentative: Combining multiple for/yield-oriented types

A theory I'm considering lately is that effects are only ever problematic when two or more effects interact. E.g. implicit, pervasive, unmanaged state mutation is fine on its own. Implicit, pervasive, unmanaged asynchronicity is fine on its own. But the interaction of both is extremely difficult to debug. So traditional imperative programming allows working with one effect at a time, but no more.

The techniques in the previous section provide a huge advance over this, because they make it practical to work with two effects at once: one effect that you're managing via for/yield, and one implicit, pervasive, global effect. This is probably enough for most programs and even for many libraries (which potentially have to deal with user-defined effects if they ever accept callbacks or similar), as demonstrated by the fact that the techniques for dealing with more have only really been developed in the last few years, at least in Scala. (Haskell handles I/O as an explicitly managed effect, so without these techniques Haskell programs would expend their for/yield-equivalent managing I/O and have difficulty expressing explicit management/sequencing of other effects, at least in code that was also performing I/O.) Certainly everything in this section is tentative/experimental. However, a limit of two effects seems decidedly inelegant, and can become a practical issue as a codebase gets large enough or wants to manage many effects explicitly. So:
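To show what managing two effects explicitly in one for/yield can look like, here is a hand-rolled wrapper that stacks "may fail with a message" (Either) on top of "may be absent" (Option). It is a sketch under my own assumptions: ResultT and the lookup functions are hypothetical, and it is a fixed special case of the monad-transformer idea (ScalaZ's OptionT is the general form), not a recommendation of one technique over the others:

```scala
object StackedEffects {
  // Two effects at once: failure with a message, and absence of a value.
  type Result[A] = Either[String, Option[A]]

  // Wrapper whose map/flatMap see through both layers, so for/yield works directly.
  final case class ResultT[A](run: Result[A]) {
    def map[B](f: A => B): ResultT[B] = ResultT(run.map(_.map(f)))

    def flatMap[B](f: A => ResultT[B]): ResultT[B] =
      run match {
        case Right(Some(a)) => f(a)                  // both effects succeeded: continue
        case Right(None)    => ResultT(Right(None))  // absent: stop, but not an error
        case Left(e)        => ResultT(Left(e))      // failed: stop with the error
      }
  }

  // Hypothetical data and lookups.
  val users: Map[Int, String] = Map(1 -> "alice", 2 -> "bob")
  val emails: Map[String, String] = Map("alice" -> "alice@example.com")

  def findUser(id: Int): ResultT[String] =
    if (id < 0) ResultT(Left(s"invalid id: $id")) else ResultT(Right(users.get(id)))

  def findEmail(name: String): ResultT[String] = ResultT(Right(emails.get(name)))

  def emailFor(id: Int): Result[String] =
    (for {
      name  <- findUser(id)
      email <- findEmail(name)
    } yield email).run
}
```

Without the wrapper, emailFor would need a for/yield over Either with a nested match or flatMap over Option inside it; the wrapper moves that plumbing into one place.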
