Self Modifying Code
This post has nothing to do with JIT-like techniques for patching machine code on the fly (though they are cool!). Instead, it describes a cute/horrible trick/hack you can use to generate source code if you are not a huge fan of macros. The final technique is going to be independent of any particular programming language, but the lead-up is going to be Rust-specific. The pattern can be applied to a wide variety of tasks, but we’ll use a model problem to study different solutions.
I have a field-less enum representing various error conditions:
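For illustration, here is a minimal sketch of such an enum. The variant names are hypothetical stand-ins, and the same three are reused in the later sketches.

```rust
/// Hypothetical error conditions; a real project would have more variants.
/// Field-less (C-like) variants get implicit discriminants 0, 1, 2, ... in order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Error {
    InvalidSignature,
    AccountNotFound,
    InsufficientBalance,
}
```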
This is a type I expect to change fairly often. I predict that it will grow a lot. Even the initial version contains half a dozen variants already! For brevity, I am showing only a subset here.
For the purposes of serialization, I would like to convert this error to and from an error code. One direction is easy, as there’s a built-in mechanism for this in Rust:
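The built-in direction is an as cast from a field-less enum to an integer. The sketch below wraps it in a hypothetical to_code helper. The cast itself is standard Rust, and it yields the variant's discriminant.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Error {
    InvalidSignature,
    AccountNotFound,
    InsufficientBalance,
}

impl Error {
    /// `to_code` is a hypothetical name; the conversion is just `as`.
    /// Without explicit discriminants, variants are numbered 0, 1, 2, ... in declaration order.
    pub fn to_code(self) -> u32 {
        self as u32
    }
}
```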
The other direction is more annoying: it isn’t handled by the language automatically yet (although there’s an in-progress PR which adds just that!), so we have to write some code ourselves:
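A hand-written inverse might look like the sketch below, again with hypothetical variants. Every arm duplicates knowledge about the enum's declaration order, which is exactly what can silently drift.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Error {
    InvalidSignature,
    AccountNotFound,
    InsufficientBalance,
}

impl Error {
    /// Inverse of `error as u32`. Each arm must be kept in sync with the enum by hand.
    pub fn from_code(code: u32) -> Option<Error> {
        let res = match code {
            0 => Error::InvalidSignature,
            1 => Error::AccountNotFound,
            2 => Error::InsufficientBalance,
            _ => return None,
        };
        Some(res)
    }
}
```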
Now, given that I expect this type to change frequently, this is asking for trouble!
It’s very easy for the match and the enum definition to get out of sync!
What should we do? What can we do?
Now, seasoned Rust developers are probably already thinking about macros (or maybe even about specific macro crates). And we’ll get there! But first, let’s see how I usually solve the problem when I’d rather not add macros, which is my default.
The idea is to trick the compiler into telling us the number of elements in the enum, which would allow us to implement some sanity checking. We can do this by adding a fake element at the end of the enum:
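Here is a sketch of the fake-variant trick with hypothetical variant names. Error::__LAST as usize equals the number of real variants, so it can be used as the length of an ALL array. The from_code function then becomes a bounds-checked index into that array.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Error {
    InvalidSignature,
    AccountNotFound,
    InsufficientBalance,
    // Sentinel: must stay last, so its discriminant is the count of real variants.
    __LAST,
}

impl Error {
    // The array type's length comes from the sentinel. Adding a variant without
    // listing it here is a length mismatch, i.e. a compile error.
    const ALL: [Error; Error::__LAST as usize] = [
        Error::InvalidSignature,
        Error::AccountNotFound,
        Error::InsufficientBalance,
    ];

    pub fn from_code(code: u32) -> Option<Error> {
        // Out-of-range codes (including the sentinel's own) yield None.
        Error::ALL.get(code as usize).copied()
    }
}

// The length is checked at compile time, but the order is not; this test covers it.
#[test]
fn all_is_in_discriminant_order() {
    for (i, e) in Error::ALL.iter().enumerate() {
        assert_eq!(*e as usize, i);
    }
}
```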
Now, if we add a new error variant but forget to update the ALL array, the code will fail to compile, which is exactly the reminder we need.
The major drawback here is that the __LAST variant has to exist.
This is fine for internal stuff, but not really great for a public, clean API.
Now, let’s get to macros, and let’s start with the simplest possible one I can think of!
Pretty simple, huh? Let’s look at the definition of define_error:
That’s … quite literally a puzzle! Declarative macro machinery is comparatively inexpressive, so you need to get creative to get what you want. Here, ideally I’d write
Alas, counting in macro by example is possible, but not trivial. It’s a subpuzzle! Rather than solving it, I use the following work-around:
And then I have to add #![allow(non_upper_case_globals)] to prevent the compiler from complaining.
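One plausible way to write such a define_error is sketched below. This is an assumption, not necessarily the author's exact macro, and the variant names are hypothetical. The counting work-around declares a const for each variant, named after that variant and holding its discriminant. The match can then use those consts as patterns instead of literal numbers. Because the consts have CamelCase names, non_upper_case_globals has to be allowed.

```rust
macro_rules! define_error {
    ($($err:ident),* $(,)?) => {
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        pub enum Error {
            $($err,)*
        }

        impl Error {
            pub fn from_code(code: u32) -> Option<Error> {
                #![allow(non_upper_case_globals)]
                // One const per variant, e.g. `const AccountNotFound: u32 = 1;`,
                // so no explicit counting is needed in the macro.
                $(const $err: u32 = Error::$err as u32;)*
                match code {
                    // Each `$err` here resolves to the const above (a constant pattern).
                    $($err => Some(Error::$err),)*
                    _ => None,
                }
            }
        }
    };
}

define_error![InvalidSignature, AccountNotFound, InsufficientBalance];
```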
The big problem with this macro is that it’s not only the internal implementation that is baroque! The call site is pretty inscrutable as well! Let’s imagine we are new to a codebase and come across the following snippet:
The question I would ask here is “what is that Error thing?”.
Luckily, we live in the age of powerful IDEs, so we can just “goto definition” to answer that, right?
Well, not really.
An IDE says that the Error token is produced by something inside that macro invocation.
That’s a correct answer, if not the most useful one!
So I have to read the definition of the define_error macro and understand how it works internally to get an idea of the public API available externally (e.g., that Error refers to a public enum).
And here the puzzler nature of declarative macros is exacerbated.
It’s hard enough to figure out how to express the idea you want using the restricted language of macros.
It’s doubly hard to understand the idea the macro’s author had when you can’t peek inside their brain and can observe only the implementation of the macro.
One remedy here is to make macro input look more like the code we want to produce. Something like this:
This indeed is marginally friendlier for IDEs and people to make sense of:
The cost for this is a more complicated macro implementation. Generally, a macro needs to do two things: parse arbitrary token stream input, and emit valid Rust code as output. Parsing is usually the more complicated task. That’s why in our minimal attempt we used maximally simple syntax, just a list of identifiers. However, if we want to make the input of the macro look more like Rust, we have to parse a subset of Rust, and that’s more involved:
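A sketch of such a macro, assuming it only needs to accept this subset: outer attributes (doc comments included, since they become #[doc = "..."]), an optional visibility, and comma-separated field-less variants, each of which may carry its own attributes. Explicit discriminants, variants with fields, and generics are not handled. For counting, it declares one const per variant named after that variant, and matches on those consts.

```rust
macro_rules! define_error {
    (
        $(#[$meta:meta])*
        $vis:vis enum $name:ident {
            $($(#[$vmeta:meta])* $variant:ident),* $(,)?
        }
    ) => {
        // Re-emit the enum with its attributes and visibility intact.
        $(#[$meta])*
        $vis enum $name {
            $($(#[$vmeta])* $variant),*
        }

        impl $name {
            $vis fn from_code(code: u32) -> Option<$name> {
                #![allow(non_upper_case_globals)]
                $(const $variant: u32 = $name::$variant as u32;)*
                match code {
                    $($variant => Some($name::$variant),)*
                    _ => None,
                }
            }
        }
    };
}

define_error! {
    /// Errors that are sent over the wire as `u32` codes.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Error {
        /// Hypothetical variant with a doc comment, to exercise `$vmeta`.
        InvalidSignature,
        AccountNotFound,
        InsufficientBalance,
    }
}
```

Note how much of the matcher is about visibility and attributes rather than the actual task.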
We have to carefully deal with all those visibilities and attributes. Even after we do that, the connection between the input Rust-like syntax and the output Rust is skin-deep. This is mostly smoke and mirrors, and is not much different from, e.g., using Haskell syntax here:
We can meaningfully increase the fidelity between macro input and macro output by switching to a derive macro. In contrast to function-like macros, derives require that their input is syntactically and even semantically valid Rust.
So the result looks like this:
enum Error here is an honest, simple enum!
It’s not an alien beast which just wears enum’s skin.
And the implementation of the macro doesn’t look too bad either, thanks to @dtolnay’s tasteful API design:
Unlike declarative macros, here we just directly express the syntax that we want to emit — a match over consecutive natural numbers.
The biggest drawback here is that, at the call site, we now have no idea about the extra API generated by the macro.
If, with declarative macros, you can notice a pub fn from_code in the same file and guess that it’s part of an API, with a procedural macro that string is in a completely different crate!
While proc macros can greatly improve the ergonomics of using and implementing macros (inflated compile times notwithstanding), for the reader they are arguably even more opaque than declarative macros.
Self Modifying Code
Finally, let’s see the promised hacky solution :) While, as you might have noticed, I am not a huge fan of macros, I like plain old code generation — text in, text out. Text manipulation is much worse-is-betterer than advanced macro systems.
So what we are going to do is:
Read the file with the enum definition as a string (the file!() macro will be useful here).
“Parse” the enum definition using unsophisticated string splitting (cut would be our parser).
Generate the code we want by concatenating strings.
Paste the resulting code into a specially marked position.
Overwrite the file in place, if there are changes.
And we are going to use a #[test] to drive the process!
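A minimal sketch of the whole loop in a single file. Several things here are assumptions rather than details from the post. The marker comments "// BEGIN GENERATED" and "// END GENERATED" are hypothetical names. Parsing uses str::split_once, which plays the role of cut here. The enum body is assumed to contain only bare variant names separated by commas, with no per-variant attributes or doc comments. `cargo test` is assumed to run from the package root so that the relative path from file!() resolves, which holds for a single-package crate. Since split_once finds the first occurrence, the enum and the generated region must sit above the helper code that mentions the same strings.

```rust
/// Error codes; the `from_code` impl below is generated from this definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Error {
    InvalidSignature,
    AccountNotFound,
    InsufficientBalance,
}

// BEGIN GENERATED
impl Error {
    pub fn from_code(code: u32) -> Option<Error> {
        let res = match code {
            0 => Error::InvalidSignature,
            1 => Error::AccountNotFound,
            2 => Error::InsufficientBalance,
            _ => return None,
        };
        Some(res)
    }
}
// END GENERATED

/// "Parses" the enum: takes the text between the enum header and the first `}`.
fn parse_variants(src: &str) -> Vec<&str> {
    let (_, rest) = src.split_once("pub enum Error {").expect("enum not found");
    let (body, _) = rest.split_once('}').expect("unterminated enum");
    body.split(',').map(str::trim).filter(|v| !v.is_empty()).collect()
}

/// Generates the impl by plain string concatenation (text in, text out).
fn generate(variants: &[&str]) -> String {
    let mut arms = String::new();
    for (i, v) in variants.iter().enumerate() {
        arms.push_str(&format!("            {i} => Error::{v},\n"));
    }
    format!(
        "impl Error {{
    pub fn from_code(code: u32) -> Option<Error> {{
        let res = match code {{
{arms}            _ => return None,
        }};
        Some(res)
    }}
}}
"
    )
}

const BEGIN: &str = "// BEGIN GENERATED";
const END: &str = "// END GENERATED";

/// Replaces everything between the first pair of markers with `generated`.
fn splice(src: &str, generated: &str) -> String {
    let (head, rest) = src.split_once(BEGIN).expect("begin marker not found");
    let (_stale, tail) = rest.split_once(END).expect("end marker not found");
    format!("{}{}\n{}{}{}", head, BEGIN, generated, END, tail)
}

#[test]
fn generated_code_is_fresh() {
    let path = std::path::Path::new(file!());
    let src = std::fs::read_to_string(path).unwrap();
    let fresh = splice(&src, &generate(&parse_variants(&src)));
    if fresh != src {
        // Write only when something changed, then fail so a stale file is noticed.
        std::fs::write(path, &fresh).unwrap();
        panic!("{} was out of date and has been updated; re-run the tests", path.display());
    }
}
```

Failing after the rewrite is a design choice in this sketch. It makes a forgotten regeneration show up as a red test rather than slip through silently.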
That’s the whole pattern!
Note how, unlike every other solution, it is crystal clear how the generated code works.
It’s just code which you can goto-definition, or step through in debugging.
You can be completely oblivious to the shady #[test] machinery, and that won’t harm understanding in any way.
The code of the “macro” is also easy to understand — that’s literally string manipulation. What’s more, you can easily see how it works by just running the test!
The “read and update your own source code” part is a bit mind-bending! But the implementation is tiny and only uses the standard library, so it should be easy to understand.
Unlike macros, this doesn’t try to enforce at compile time that the generated code is fresh.
If you update the Error definition, you need to re-run the test for the generated code to be updated as well.
But forgetting to do so will be caught by the tests.
Note the important detail — the test only tries to update the source code if there are, in fact, changes.
That is, a writable src/ is required only during development.
That’s all, hope this survey was useful! Discussion on /r/rust.