Unsafe as a Human-Assisted Type System
This is a short note about yet another way to look at Rust’s unsafe
.
Today, an interesting bug was found in rustc, which made me aware just how useful unsafe
is for making code maintainable.
The story begins a couple of months ago, when I was casually browsing through recent pull requests for rust-lang/rust.
I was probably waiting for my code to compile at that moment :]
Anyway, a pull request caught my attention, and, while I was reading the diff, I noticed a usage of unsafe
.
It looked roughly like this:
This function applies a T -> T
function to a &mut T
value, a-la take_mut
crate.
There is a safe way to do this in Rust, by temporary replacing the value with something useless (Jones’s trick):
In map_in_place
we don’t have a T: Default
bound, so the trick is not applicable.
Instead, the function uses (unsafe
) ptr::read
to get an owned value out of a unique reference, and then uses ptr::write
to store the new value back, without calling the destructor.
However, the code has a particular unsafe
code smell: it calls user-supplied code (f
) from within an unsafe
block.
This is usually undesirable, because it makes reasoning about invariants harder: arbitrary code can do arbitrary unexpected things.
And, indeed, this function is unsound: if f
panics and unwinds, the t
value would be dropped twice!
The solution here (which I know from the take_mut
crate) is to just abort the process if the closure panics.
Stern, but effective!
I felt really torn about bringing this issue up: clearly, inside the compiler we know what we are doing, and the error case seems extremely marginal. Nevertheless, I did leave the comment, and the abort trick was implemented.
And guess what?
Today a bug report came in (#62894), demonstrating that closure does panic in some cases, and rustc
aborts.
To be clear, the abort in this case is a good thing!
If rustc didn’t abort, it would be a use-after-free.
Note how cool is this: a casual code-reviewer was able to prevent a memory-safety issue by looking at just a single one-line function. This was possible for two reasons:
-
The code was marked
unsafe
which made it stand out. -
The safety reasoning was purely local: I didn’t need to understand the PR (or surrounding code) as a whole to reason about the
unsafe
block.
The last bullet point is especially interesting, because it is what makes type systems [1] in general effective in large-scale software development:
- Checking types is a local (per-expression, per-function, per-module, depending on the language) procedure. Every step is almost trivial: verify that sub-expressions have the right type and work out the result type.
- Together, these local static checks guarantee a highly non-trivial global property: during runtime, actual types of all the values match inferred static types of variables.
Rust’s unsafe
is similar: if we verify every usage of unsafe
(local property!) to be correct, then we guarantee that the program as a whole does not contain undefined behavior.
The devil is in the details, however, so the reality is slightly more nuanced.
First, unsafe
should be checked by humans, thus a human-assisted type system.
The problem with humans, however, is that they make mistakes all the time.
Second, checking unsafe
can involve a rather large chunk of code.
For example, if you implement Vec
, you can (safely) write to its length
field from anywhere in the defining module.
That means that correctness of Deref
impl for Vec
depends on the whole module.
Common wisdom says that the boundary for unsafe
code is a module, but I would love to see a more precise characteristic.
For example, in map_in_place
case it’s pretty clear that only a single function should be examined.
On the other hand, if Vec
’s field are pub(super)
, parent module should be scrutinized as well.
Third, it’s trivial to make all unsafe
blocks technically correct by just making every function unsafe
.
That wouldn’t be a useful thing to do though!
Similarly, if unsafe
is used willy-nilly across the ecosystem, its value is decreased, because there would be many incorrect unsafe
blocks, and reviewing each additional block would be harder.
Fourth, and probably most disturbing, correctness of two unsafe
blocks in isolation does not guarantee that they together are correct!
We shouldn’t panic though: in practice, realistic usages of unsafe
do compose.
Discussion on r/rust.
Update(2020-08-17): oops, I did it again.
[1] unsafe
is really an effect system, but the difference is not important here.