Newtype Index Pattern
Similarly to the previous post, we will once again add types to the Rust code which works perfectly fine without them. This time, we’ll try to improve the pervasive pattern of using indexes to manage cyclic data structures.
Often one wants to work with a data structure which contains a cycle
of some form: object
bar, which references
foo again. The textbook example here is a graph of
vertices and edges. In practice, however, true graphs are a rare
encounter. Instead, you are more likely to see a tree with parent
pointers, which contains a lot of trivial cycles. And sometimes cyclic
graphs are implicit: an
Employee can be the head of a
Departement has a
Vec<Employee> personal. This is sort-of a
graph in disguise: in usual graphs, all vertices are of the same type,
Departement are different types.
Working with such data structures is hard in any language. To arrive
at a situation when
A points to
B which points back to
form of mutability is required. Indeed, either
B must be
created first, and so it can not point to the other immediately after
construction. You can paper over this mutability with
let rec, as in
OCaml, or with laziness, as in Haskell, but it is still there.
Rust tends to surface subtle problems in the form of compile-time errors, so implementing such graphs in Rust is challenging. The three usual approaches are:
- reference counting, explanation by nrc,
- arena and real cyclic references, explanation by simonsapin (this one is really neat!),
- arena and integer indices, explanation by nikomatsakis.
(apparently, rewriting a Haskell monad tutorial in Rust results in a graphs blog post).
I personally like the indexing approach the most. However it presents
an interesting readability challenge. With references, you have a
foo of type
&Foo, and it is immediately clear what that
and what you can do with it. With indexes, however, you have a
usize, and it is not obvious that you somehow can get a
worse, if indexes are used for two types of objects, like
Bar, you may end up with
thing: usize. While writing the code with
usize actually works pretty well (I don’t think I’ve ever used the
wrong index type), reading it later is more complicated, because
usize is much less suggestive of what you could do.
One way to ameliorate this problem is to introduce a newtype wrapper
Here, “one should use
FooIdx to index into
Vec<Foo>” is still just
a convention. A cool thing about Rust is that we can turn this
convention into a property verified during type checking. By adding an
appropriate impl, we should be able to index into
The impl would look like this:
It’s insightful to study why this impl is allowed. In Rust, types, traits and impls are separate. This creates a room for a problem: what if there are two impl blocks for a given (trait, type) pair? The obvious choice is to forbid to have two impls in the first place, and this is what Rust does.
Actually enforcing this restriction is tricky! The simplest rule of “error if a set of crates currently compiled contains duplicate impls” has severe drawbacks. First of all, this is a global check, which requires the knowledge of all compiled crates. This postpones the check until the later stages of compilation. It also plays awfully with dependencies, because two completely unrelated crates might fail the compilation if present simultaneously. What’s more, it doesn’t actually solve the problem, because the compiler does not necessary know the set of all crates beforehand. For example, you may load additional code at runtime via dynamic libraries, and silent bad things might happen if you program and dynamic library have duplicate impls.
To be able to combine crates freely, we want a much stronger property:
not only the set of crates currently compiled, but all existing and
even future crates must not violate the one impl restriction. How on
earth is it possible to check this? Should
cargo publish look for
conflicting impls across all of the crates.io?
Luckily, and this is stunningly beautiful, it is possible to loosen
this world-global property to a local one. In the simplest form, we
can place a restriction that
impl Foo for Bar can appear either in
the crate that defines
Foo, or in the one that defines
Bar. Crucially, whichever one defines the impl has to use the other,
which makes it possible to detect the conflict.
This is all really nifty, but we’ve just defined an
Index impl for
Vec, and both
Vec are from the standard library! How
is it possible? The trick is that
Index has a type parameter:
Index<Idx: ?Sized>. It is a template for a trait of sorts, and we get
a “real” trait when we substitute type parameter with a type. Because
FooIdx is a local type, the resulting
Index<FromIdx> trait is also
considered local. The precise rules here are quite tricky, this
RFC explains them pretty well.
Index<BarIdx> are different traits, one
type can implement both of them. This is convenient for containers
which hold distinct types:
It’s also helpful to define arithmetic operations and conversions for
the newtyped indexes. I’ve put together a
typed_index_derive crate to automate this boilerplate via a
proc macro, the end result looks like this:
Discussion on /r/rust.