Code Smell: Concrete Abstraction
This is a hand-wavy philosophical article about programming, without quantifiable justification, but with some actionable advice and a case study.
Suppose that there are two types in the program, Blorb
and Gonk
.
Suppose also that they both can blag
.
Does it make sense to add the following trait?
I claim that it makes sense only if you have a function like
That is, if some part of you program is generic over T: Blag
.
If in every x.blag()
the x
is either Blorg
, or Gonk
, but never a T
(each usage is concrete), you don’t need this abstraction.
“Need” is used in a literal sense here: replace a trait with two inherent methods named blag
, and the code will be essentially the same.
Using a trait here doesn’t achieve any semantic compression.
Given that abstractions have costs “don’t need” can be strengthen to “probably shouldn’t”.
Not going for an abstraction often allows a for more specific interface.
A monad in Haskell is a thing with >>=
.
Which isn’t telling much.
Languages like Rust and OCaml can’t express a general monad, but they still have concrete monads.
The >>=
is called and_then
for futures and flat_map
for lists.
These names are more specific than >>=
and are easier to understand.
The >>=
is only required if you want to write code generic over type of monad itself, which happens rarely.
Another example of abstraction which is used mostly concretely are collection hierarchies.
In Java or Scala, there’s a whole type hierarchy for things which can hold other things.
Rust’s type system can’t express Collection
trait, so we have to get by with using Vec
, HashSet
and BTreeSet
directly.
And it isn’t actually a problem in practice.
Turns out, writing code which is generic over collections (and not just over iterators) is not that useful.
The “but I can change the collection type later” argument also seems overrated — often, there’s only single collection type that makes sense.
Moreover, swapping HashSet
for BTreeSet
is mostly just a change at the definition site, as the two happen to have almost identical interface anyway.
The only case where I miss Java collections is when I return Vec<T>
, but mean a generic unordered collection.
In Java, the difference is captured by List<T>
vs Collection<T>
.
In Rust, there’s nothing built-in for this.
It is possible to define a VecSet<T>(Vec<T>)
, but doesn’t seem worth the effort.
Collections also suffer from >>=
problem — collapsing similar synonyms under a single name.
Java’s
Queue
has add
, offer
, remove
, and poll
methods, because it needs to be a collection, but also is a special kind of collection.
In C++, you have to spell push_back
for vector
’s push operation, so that it duck-types with deque
’s front
and back
.
Finally, the promised case study!
rust-analyzer needs to convert a bunch of internal type to types suitable for converting them into JSON message of the Language Server Protocol.
ra::Completion
is converted into lsp::Completion
; ra::Completion
contains ra::TextRange
which is converted to lsp::Range
, etc.
The first implementation started with an abstraction for conversion:
This abstraction doesn’t work for all cases — sometimes the conversion requires additional context.
For example, to convert a rust-analyzer’s offset (a position of byte in the file) to an LSP position ((line, column)
pair), a table with positions of newlines is needed.
This is easy to handle:
Naturally, there was an intricate web of delegating impls. The typical one looked like this:
There were a couple of genuinely generic impls for converting iterators of convertible things.
The code was hard to understand.
It also was hard to use: if calling .conv
didn’t work immediately, it took a lot of time to find which specific impl didn’t apply.
Finally, there were many accidental (as in “accidental complexity”) changes to the shape of code: CTX
being passed by value or by reference, switching between generic parameters and associated types, etc.
I was really annoyed by how this conceptually simple pure boilerplate operation got expressed as clever and fancy abstraction. Crucially, almost all of the usages of the abstraction (besides those couple of iterator impls) were concrete. So I replaced the whole edifice with much simpler code, a bunch of functions:
Simplicity and ease of use went up tremendously.
Now instead of typing x.conv()
and trying to figure out why an impl I think should apply doesn’t apply, I just auto-complete to_proto::range
and let the compiler tell me exactly which types don’t line up.
I’ve lost fancy iterator impls, but the
total diff
for the commit was +999,-1123
.
There was some genuine code re-use in those impls, but it was not justified by the overall compression, even disregarding additional complexity tax.
To sum up, “is this abstraction used exclusively concretely?” is a meaningful question about the overall shape of code. If the answer is “Yes!”, then the abstraction can be replaced by a number of equivalent non-abstract implementations. As the latter tend to be simpler, shorter, and more direct, “Concrete Abstraction” can be considered a code smell. As usual though, any abstract programming advice can be applied only in a concrete context — don’t blindly replace abstractions with concretions, check if provided justifications work for your particular case!
Discussion on /r/rust.