A Trick For Test Maintenance
This is a post about an interesting testing technique which feels like it should be well known. However, I haven’t seen it mentioned anywhere. I don’t even have a good name for it, I’ve semi-discovered it in the wild. If you know how this thing is called, please leave a comment!
A long time ago…
I was reading Dart analysis server source code, and came across this line. Immediately I was struck as if by lighting. Well, not exactly in the same way, but you get the idea.
What does this line do? I actually don’t know, but I have a guess. My explanation is further down (to give you a chance to discover the trick as well!), but the general idea is that this line helps tremendously with making tests more maintainable.
The two mundane problems
Two tasks which programmers typically enjoy less than furiously cranking out new features are maintaining existing code and writing tests. And, as an old Russian joke says, maintaining tests is the worst. Here are some pain points specific to the post:
Negative tests. You want to check that something does not happen. Writing a test in this situation is tricky because the test might actually pass for a trivial reason instead of the intended one. The rule of thumb is to verify that the test actually fails if the specific condition which it covers is commented out. The problem with this rule of thumb is that it works in a single point in time. As the code evolves, the test might begin to pass for a trivial reason.
Duplicated tests. Test suites are usually append-only and grow indefinitely. Almost inevitably this leads to a situation where different tests are testing essentially the same features, or where one test is a superset of another.
Bifurcated suites. Somewhat similar to the previous point, you may end up in a situation where a single component has two separate test-suites in different parts of the code base. I’d want to say that this happens when two developers write tests independently, but practice says that me and me one month later are enough to create such a mess :)
Tests discoverability. This is a problem a new contributor usually faces. Finding a piece of code where the bug fix should be applied is usually comparatively easier than locating the corresponding tests.
The underlying issue is that it is non-trivial to answer these two questions:
-
Given a line of code, where is the test for this specific line?
-
Given a test, where is the code that is being tested?
The solution
The beautiful solution to this problem (which I hypothesise the
_coverageMarker()
line in Dart does) is to track code coverage on the
test-by-test basis. That is, when running a test, verify that
specific lines of code were covered by this test.
I’ve put together a small Rust library to do this, called
uncover
. It provides two macros:
covered_by
and covers
.
The first macro is used in the code under test, like this:
The second macro is used in the corresponding test:
If the block where covers
is used does not cause the execution of
the corresponding covered_by
line then the error will be raised at
the end of the block.
Under the hood, this is implemented as a global HashMap<String, u64>
which
counts how many times each line was executed. So covered_by!
increments
the corresponding count, and covers!
returns a guard object that
checks
in Drop
that the count was incremented. It is possible to disable these checks
at compile time. And yes, the library actually
exposes
a macro which defines macros :)
I haven’t had a chance to apply this technique in large projects (and it is less useful for smaller ones), but it looks very promising.
It’s now easy to navigate between code and tests: just ripgrep the string literal (or write a plugin for this for your IDE). You will be able to find the test for the specific if-branch! This should be especially handy for new contributors.
If this technique is used pervasively, you also get an idea about the overall test coverage.
During refactorings, you became aware of tests which might be affected. Moreover, because coverage is actually checked by the tests themselves, you’ll notice if some test stop to exercise the code it was intended to check.
Once again, if you know how this thing is called, please do enlighten me in comments! Discussion on /r/rust.