Language Design is Hard. Let's Go Shopping

There's been a bit of fuss around Scala on Hacker News this week, which ultimately pointed me towards some more useful criticism (video). Paul Phillips may not be a great speaker, but what he's saying is important, and in many cases true. I agree with the criticism, but disagree with the proposed solutions[1]. So I want to talk about why Scala is my favourite language, and what I'd do to make a better one.

Where Phillips starts to go wrong, I think, is about 2/3 of the way through his video, where he says that for a typical project we have maybe four layers of languages involved: Ant XML, Java, and Scala. Leaving aside the questionable arithmetic, this is simply false for modern Scala projects: it's very much possible to write a project in pure Scala, with no Java anywhere, and with SBT your build is configured in Scala too. And that's not all; things that would be another format in Java are handled in pure Scala. Squeryl's Table models are normal code, where Hibernate would be configured with annotations or XML. Spray-json serializers are again ordinary Scala, as opposed to Jackson's annotation approach. Play does use a separate route-mapping file and I consider this a mark against it; this is one of the main reasons my employer uses Spray instead. In Spray, your route declarations are, once again, simply code. To my eyes this is half the genius of Scala: things that would have to be config files in another language are instead just Scala code that uses particular functions and classes[2].

To a certain extent, you get this in Python (and Ruby, as I understand it). Your build system is Python; your Django models are Python classes, and your routes are configured with a mix of code and decorators - and unlike Java annotations, decorators have well-defined semantics in Python, so this is really just pure Python code.

All this is true...ish. Django models are real Python classes... sort of (they use custom metaclasses, which are approximately equivalent to macros in terms of their effect on the predictability of your code). Django's core config file ends up returning a bunch of strings, arranged in some maps - which class to use for a particular piece of functionality is determined by a string classname, not an object you pass around. The routing configuration is somewhat better - it boils down to a map of matcher -> callable. It's nice that we can at least build these objects up using real Python code - your Django config file can connect to a database to assemble its configuration if you want - but it doesn't hold a candle to the flexibility of Spray, where a route is really just a function, any function, and you can compose two routes in the standard fashion, or in your own custom way if that's not good enough for you - e.g. randomly choosing between two subroutes.
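To make the "a route is just a function" point concrete, here is a minimal sketch. It is not Spray's real API: Request, Response, Route, path, orElse and choose are all hypothetical names invented for illustration, and real Spray routes are considerably richer. The only point is that once a route is an ordinary function, both the standard composition and an oddball one are a couple of lines each.

```scala
object RouteSketch {
  // Hypothetical, heavily simplified types; these are NOT Spray's own.
  final case class Request(path: String)
  final case class Response(body: String)

  // A route is just a function that may or may not handle a request.
  type Route = Request => Option[Response]

  // Build a route that answers only for one exact path.
  def path(p: String)(body: String): Route =
    req => if (req.path == p) Some(Response(body)) else None

  // "Standard" composition: try the first route, fall back to the second.
  def orElse(a: Route, b: Route): Route =
    req => a(req).orElse(b(req))

  // Custom composition: a caller-supplied chooser picks one of two subroutes.
  def choose(pickFirst: () => Boolean)(a: Route, b: Route): Route =
    req => if (pickFirst()) a(req) else b(req)

  val hello: Route = path("/hello")("hello")
  val bye: Route = path("/bye")("bye")
  val site: Route = orElse(hello, bye)

  // Randomly choosing between two subroutes, as described above.
  val randomHello: Route =
    choose(() => scala.util.Random.nextBoolean())(
      path("/hello")("variant A"),
      path("/hello")("variant B")
    )
}
```

Because Route is a plain function type, any higher-order function from any other library that accepts a Request => Option[Response] will accept these routes too, which is exactly what the string-and-map style of configuration can't promise.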

Of course it's possible to Greenspun this into Django, but don't expect the existing routing infrastructure to work with you. Don't expect to be able to pass a Django route into a higher-order function defined in a different library and get something useful out. I mean, Python won't stop you - the "we're all consenting adults here" philosophy means you can call any method with any argument - and it might even work, for a while. But there's no way to know (short of asking the devs, or hoping their roadmap is accurate) which styles of calling routes are supported, and which will be broken in the next release.

This is the problem with dynamic syntax in, well, dynamic languages. As Phillips puts it, ignorance is strength, and freedom is slavery; the only way to make maintainable software is by imposing non-leaky interfaces between layers (and while this is compromised slightly by reference equality, how much worse is it in a language where we can dynamically add attributes onto an existing object?). Sadly I only know one framework (as distinct from library) that has ever used proper abstractions: Wicket, whose private members and final classes ensure that the only possible ways to use it are the supported ones. But Scala inherits from Java at least the possibility of doing this, and the growing emphasis on compatibility in recent releases makes me hopeful for the future.

The other development that lightens my heart is Scala 2.11's modularization work. The second half of Scala's thesis is that a language core should be small but flexible; despite the power and complexity of Scala libraries, Scala's language specification is much shorter than Java's, because it offers a relatively small number of very general features in the core language. E.g. typeclasses, a core language feature for Haskell, are an "emergent feature" arising from Scala's implicits and higher kinds. C# has async/await as a core language feature; Scala enables a similar coding style with its for/yield sugar, but this generalises to other "context-like objects"[3] such as Validation or Option. Fantom has Option-like null handling as a core language feature, which sounds cool until you realise this means you can't make methods that are polymorphic in nullness, or methods that work for Option and other "context-like" types (e.g. Scalaz's sequence, which is the same method whether it's applied to a Set[Validation] or a List[Future] or a Vector[Option]). Actors, a core feature in Erlang, are just another library in Scala. Which always made it a little odd that XML literals were baked into the language, but as of 2.11 they will be optional, along with documentation generation, parser combinators, the compiler and the interpreter. As Phillips acknowledges during Q&A, and contrary to his earlier fears, the compiler internals have been greatly improved in recent releases. Some parts of Scala may already have ossified, but most of the criticism I hear is in the opposite direction: that Scala changes too often, without putting enough emphasis on backwards compatibility.
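A rough sketch of what "emergent" means here, using nothing but the standard library. The Applicative trait and this sequence are hypothetical cut-down stand-ins that I've made up for illustration; Scalaz's real versions are more general (they are not tied to List on the outside, for a start). The first half shows for/yield working on Option; the second shows one method, written once, serving any "context-like" type that has an instance.

```scala
object ContextSketch {
  // for/yield is only sugar for flatMap/map, so it works for Option
  // (or any other type offering those methods), not just async results.
  def parseInt(s: String): Option[Int] =
    scala.util.Try(s.trim.toInt).toOption

  def addOpt(a: String, b: String): Option[Int] =
    for { x <- parseInt(a); y <- parseInt(b) } yield x + y

  // A hypothetical minimal typeclass, built from a higher-kinded type
  // parameter plus implicits; no dedicated language feature is involved.
  trait Applicative[F[_]] {
    def pure[A](a: A): F[A]
    def map2[A, B, C](fa: F[A], fb: F[B])(f: (A, B) => C): F[C]
  }

  object Applicative {
    implicit val optionInstance: Applicative[Option] = new Applicative[Option] {
      def pure[A](a: A): Option[A] = Some(a)
      def map2[A, B, C](fa: Option[A], fb: Option[B])(f: (A, B) => C): Option[C] =
        for { a <- fa; b <- fb } yield f(a, b)
    }

    implicit val listInstance: Applicative[List] = new Applicative[List] {
      def pure[A](a: A): List[A] = List(a)
      def map2[A, B, C](fa: List[A], fb: List[B])(f: (A, B) => C): List[C] =
        for { a <- fa; b <- fb } yield f(a, b)
    }
  }

  // Written once; usable for every F that has an Applicative instance.
  def sequence[F[_], A](xs: List[F[A]])(implicit ap: Applicative[F]): F[List[A]] =
    xs.foldRight(ap.pure(List.empty[A]))((fa, acc) => ap.map2(fa, acc)(_ :: _))
}
```

With these definitions, sequence(List(Option(1), Option(2))) and sequence(List(List(1, 2), List(3))) go through the same method body. A language with null handling baked into the core has no obvious way to express that.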

Still, there are misfeatures that seem unlikely to change. Being unable to fuse chained calls to filter() because we aren't entitled to assume a filter function won't have side effects is definitely a downside. And while I wouldn't rule out the possibility of a clever Scalaz implementation of the function we actually want, analogous to ≟ (the safer equals method), Phillips is right that these should be core language features. Most if not all of the methods on java.lang.Object have no business being there implicitly; user-created classes might default to extending something like this so that beginners can get started without having to understand monads and typeclasses, but there should at least be the option to opt one's class out of reference equality and express this at the type level.
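For anyone who hasn't met the "safer equals" idea, here is a minimal sketch of how a typeclass-based equality can work. Equal and safeEquals are hypothetical names of my own; Scalaz's actual Equal and its ≟ / === syntax are more complete. The point is that equality becomes something a type opts into, rather than something inherited from java.lang.Object.

```scala
object EqualSketch {
  // Hypothetical minimal typeclass: a type is comparable only if an
  // instance of Equal exists for it.
  trait Equal[A] {
    def equal(x: A, y: A): Boolean
  }

  object Equal {
    implicit val intEqual: Equal[Int] = new Equal[Int] {
      def equal(x: Int, y: Int): Boolean = x == y
    }

    implicit val stringEqual: Equal[String] = new Equal[String] {
      def equal(x: String, y: String): Boolean = x == y
    }

    // Lists are comparable whenever their elements are.
    implicit def listEqual[A](implicit ev: Equal[A]): Equal[List[A]] =
      new Equal[List[A]] {
        def equal(xs: List[A], ys: List[A]): Boolean =
          xs.length == ys.length &&
            xs.zip(ys).forall { case (x, y) => ev.equal(x, y) }
      }
  }

  // Both arguments must share a type A, and A must have an instance.
  def safeEquals[A](x: A, y: A)(implicit ev: Equal[A]): Boolean =
    ev.equal(x, y)

  // A class with no Equal instance has effectively opted out.
  final class Connection

  // safeEquals(1, "1")                          // does not compile: no Equal[Any]
  // safeEquals(new Connection, new Connection)  // does not compile: no Equal[Connection]
}
```

This only helps if everyone remembers to call safeEquals instead of ==, which is why it really wants to be a core language feature rather than a library convention.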

I'm not familiar with any matryoshka languages, but they seem like a plausible route to what I want. I can immediately identify several levels that I want to distinguish between: inert data, pure functions, functions that perform I/O, functions that use other kinds of effects like dates, objects that contain mutable state. We can bolt this on with EFFTP, but I've had experience with similar annotation-driven systems in Java; unless the language is designed with them in mind, they tend to require too much handholding to be of much use, and break down at inconvenient times. I don't necessarily want a full matryoshka language, but I do want these effect distinctions to be visible at the type level - and I don't think Scala is willing to do this.
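As a rough illustration of what "visible at the type level" could look like, here is a toy sketch. The IO wrapper below is a hypothetical, deliberately naive type of my own (not stack-safe, and not any particular library's IO); it only shows the shape of the distinction between inert data, a pure function, and a function whose result depends on mutable state.

```scala
import java.util.concurrent.atomic.AtomicInteger

object EffectSketch {
  // Inert data: nothing but values.
  final case class Reading(value: Int)

  // Hypothetical wrapper: a description of an effectful computation
  // that does nothing until unsafeRun is invoked.
  final class IO[A](val unsafeRun: () => A) {
    def map[B](f: A => B): IO[B] =
      new IO[B](() => f(unsafeRun()))
    def flatMap[B](f: A => IO[B]): IO[B] =
      new IO[B](() => f(unsafeRun()).unsafeRun())
  }

  object IO {
    // By-name parameter, so the effect is captured rather than performed.
    def apply[A](a: => A): IO[A] = new IO[A](() => a)
  }

  // Pure function: the signature promises no effects.
  def double(r: Reading): Reading = Reading(r.value * 2)

  // Effectful function: the IO in the return type says so.
  def nextReading(counter: AtomicInteger): IO[Reading] =
    IO(Reading(counter.incrementAndGet()))

  // Combining effects keeps the IO marker in the result type.
  def twoReadings(counter: AtomicInteger): IO[Reading] =
    for {
      a <- nextReading(counter)
      b <- nextReading(counter)
    } yield double(Reading(a.value + b.value))
}
```

Nothing in Scala stops double from quietly touching a counter itself, which is the whole problem: the convention is available, but the compiler won't enforce it.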

Can we even make this type-level distinction without compromising accessibility for beginning programmers, as Haskell seems to? Honestly, I don't know. Having seen this even more detailed critique of some current problems with Scala from Edward Kmett, I'll be watching Ermine with interest. Kmett seems to know and love Scala, and the problems he identifies are real (though I'm not convinced they're insoluble within Scala). The emphasis on code quality of the compiler itself is well deserved; a shortage of this seems to be holding back not just Scala but even Java. And Ermine is built with compilation to Javascript in mind, which, sadly, seems like it will be a necessity in the future.[4]

Of course, Ermine is, as Kmett says, unbaked. Which brings me back to the worst possibility, that maybe programming languages have a natural half-life. Perhaps every sufficiently popular language reaches a point where backward compatibility becomes too important to allow major changes, where decisions that turned out to be wrong can no longer be corrected. Python, which was my favourite language before Scala, is five years into a migration centered on what're ultimately some quite minor (though deeply necessary) changes to string handling, and expects to take five more before it's done. Worse, "BDFL" Guido van Rossum seems opposed to many functional tools, downplaying map/reduce/filter in the new version. Do languages reach a point where their maintainers feel compelled to "dumb down" for a wider audience? Scala's recent introduction of language feature flags, and Odersky's recent talks (revealing a disturbing plan to, as Kmett puts it, "take the parts of it that do work (type parameters) and possibly replac[e] them with the parts that more often don't (existential members)"), make me fear that we may be headed in that direction.

So, in five years' time I may have to jump ship. Maybe Ermine or a similar upstart will represent the pragmatic rewrite of Scala that I want, increasing safety without diminishing its power or accessibility. Maybe one of the existing super-safe languages like Coq will finally figure out how to make this kind of programming accessible to mere mortals. Maybe someone will show me how to use Haskell to solve OO-shaped problems and how to live without implicits.

But ultimately, right now, the start of Phillips' talk is the most relevant part. While Scala doesn't make control and isolation of effects as easy as I'd like, and Kmett describes plenty of niggles when one starts dealing with higher-level types, it's still the best language available for practical programming today. Whenever I return to Python (or Perl, or Ruby, or...) I wonder how I ever survived without implicits and compile-time typing. Clojure appears to come with a typed module available, but I fear this would feel bolted on in the same way as EFFTP. When I look back to Java I'm horrified at how much code it takes to write something as simple as a chain of async calls, and know I could never stand to work with Validations in this kind of language; even Java 8 or Dart seem far too heavy. Looking at Kotlin I find myself envious of the first-class delegation support, but I couldn't give up typeclass functionality (and in any case the language is too immature for me to use). Even F# simply isn't powerful enough (never mind OCaml), and I've grown too used to this style to give it up. Haskell... I could survive in Haskell, if I had to. But comparing Scala to Haskell, the JVM infrastructure is nice, the library ecosystem is very nice (particularly given continued reports of Cabal's flakiness), and the ease of using OO style where appropriate is just the icing on the cake. Is Haskell actually simpler and easier to learn than Scala? Maybe, and maybe it's easier by enough to be worth the loss of these features. But my experience is that it's far easier to persuade one's employer to adopt Scala; interoperability with your existing Java codebase is a huge plus (and it's only by having first-class support for OO style, along with more esoteric things like existential types, that Scala is able to offer this interoperability). Is it worth the sacrifices? For me, 100% - because if it weren't for that interoperability, I'd still be writing Java.

[1] I'm reminded of Zed Shaw's complaints about the web and OOP.

[2] To me the value of a codebase in a single language is almost self-evident; it's what enables shared code ownership, which I know from experience is far better than the alternative. But for a more populist argument, look at the explosive growth in Javascript use in recent years. Javascript isn't a good language; in fact it's a downright terrible language, with typing that's not merely dynamic but outright weak, notions of inheritance and even scoping that differ from every other mainstream language, and violations of the Principle of Least Surprise left and right (video). But for many people, all of this is outweighed by the benefits: you can have your server-side and client-side code running in the same language, and things like route definitions and build files are also usually written in Javascript. Even better, if you're using a funky modern datastore like MongoDB/CouchDB/etc., you store Javascript objects in the "database", and you can query them with a map-reduce construct of Javascript functions.

[3] I'm avoiding saying the "M" word here.

[4] I don't entirely agree with Ermine's emphasis on "row types" - IMO SQL represents the wrong approach to the problem it solves, and I'd rather extend a "real language" like Scala to be better at solving SQL-like problems (that is, problems that need to work natively with sets of results, and problems involving semi ad-hoc querying over large datasets, where we can do some degree of precomputation/indexing because we know broadly what kind of queries will be performed, but can't fully specify our queries up front). To this end Spark/Shark is far more interesting, and I hope to start using it on Real Problems soon. I also have serious doubts about using a non-strict language - I work on the kind of problems where consistent performance can be even more important than high throughput.
