Fast and Simple Rust Interner
This post describes a simple technique for writing interners in Rust which I haven’t seen documented before.
String interning is a classical optimization when you have to deal with many equal strings. The canonical example would be a compiler: most identifiers in a program are repeated several times.
Interning works by ensuring that there’s only one canonical copy of each distinct string in memory. It can give the following benefits:
- Less memory allocated to hold strings.
- If all strings are canonicalized, comparison can be done in O(1) (instead of O(n)) by using pointer equality.
- Interned strings themselves can be represented with an index (typically u32) instead of a (ptr, len) pair. This makes data structures which embed strings more compact.
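As a small illustration of the last point, the two representations can be compared by size. This is only a sketch: the `ByName` and `ById` structs are hypothetical, and it relies on `&str` being a fat pointer (a pointer plus a length).

```rust
use std::mem::size_of;

/// Hypothetical AST node that embeds the identifier as a string slice.
pub struct ByName<'a> {
    pub name: &'a str,
    pub arity: u32,
}

/// The same node with the identifier replaced by an interned index.
pub struct ById {
    pub name: u32,
    pub arity: u32,
}

/// Returns `(size_of::<ByName>(), size_of::<ById>())`.
pub fn node_sizes() -> (usize, usize) {
    (size_of::<ByName<'static>>(), size_of::<ById>())
}
```

The index-based node is two `u32`s, while the slice-based node has to carry both the pointer and the length.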
The simplest possible interner in Rust could look like this:
To remove duplicates, we store strings in a HashMap.
To map from an index back to the string, we also store strings in a Vec.
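A minimal sketch of this design could look as follows. The field names `map` and `vec` and the method names `intern` and `lookup` are assumptions made for illustration.

```rust
use std::collections::HashMap;

#[derive(Default)]
pub struct Interner {
    /// Deduplication: string contents -> index.
    map: HashMap<String, u32>,
    /// Reverse mapping: index -> string contents.
    vec: Vec<String>,
}

impl Interner {
    /// Returns the index of `name`, adding it if it was not seen before.
    pub fn intern(&mut self, name: &str) -> u32 {
        if let Some(&idx) = self.map.get(name) {
            return idx;
        }
        let idx = self.vec.len() as u32;
        // Two separate allocations per new string: one for each container.
        self.map.insert(name.to_owned(), idx);
        self.vec.push(name.to_owned());
        idx
    }

    /// Maps an index returned by `intern` back to the string.
    pub fn lookup(&self, idx: u32) -> &str {
        self.vec[idx as usize].as_str()
    }
}
```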
I didn’t quite like this solution yesterday, for two reasons:
- It allocates a lot — each interned string is two separate allocations.
- HashMap feels like cheating: surely there should be a better, more classical data structure!
So I’ve spent a part of the evening cobbling together a non-allocating trie-based interner.
The result: the trie does indeed asymptotically reduce the number of allocations.
Unfortunately, it is slower, larger and way more complex than the above snippet.
Minimizing allocations is important, but allocators are pretty fast, and that shouldn’t be done at the expense of everything else.
HashMap (implemented by @Amanieu based on Swiss Table) is fast.
For the curious, the Trie design I've used
The trie is built on a per-byte basis (each node has at most 256 children). Each internal node is marked with a single byte. Leaf nodes are marked with substrings, so that only the common prefix requires a node per byte.
To avoid allocating individual interned strings, we store them in a single long String.
An interned string is represented by a Span (pair of indexes) inside the big buffer.
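The buffer-plus-span idea can be sketched on its own, without the trie. The `StrBuf` type and its `push` and `resolve` methods are hypothetical names introduced for this example.

```rust
/// Half-open byte range `start..end` inside the shared buffer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Span {
    pub start: u32,
    pub end: u32,
}

/// All strings concatenated into one allocation.
#[derive(Default)]
pub struct StrBuf {
    buf: String,
}

impl StrBuf {
    /// Appends `s` to the buffer and returns the span that covers it.
    pub fn push(&mut self, s: &str) -> Span {
        let start = self.buf.len() as u32;
        self.buf.push_str(s);
        Span { start, end: self.buf.len() as u32 }
    }

    /// Turns a span back into a string slice borrowed from the buffer.
    pub fn resolve(&self, span: Span) -> &str {
        &self.buf[span.start as usize..span.end as usize]
    }
}
```

Spans stay valid when the buffer reallocates, because they store offsets rather than pointers.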
The trie itself is a tree structure, and we can use a standard trick of packing its nodes into an array and using indexes to avoid allocating every node separately. However, nodes themselves can be of varying size, as each node can have a different number of children. We can still array-allocate them, by rolling our own mini-allocator (using a segregated free list)!
Node’s children are represented as a sorted array of links. We use binary search for indexing and simple linear shift insertion. With at most 256 children per node, it shouldn’t be that bad. Additionally, we pre-allocate 256 nodes and use array indexing for the first transition.
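The sorted-array representation of children can be sketched like this. For simplicity the example keeps the links in a plain `Vec` rather than in the chunked storage described here, and the names `Link`, `find_child` and `insert_child` are assumptions.

```rust
/// Edge to a child node, labelled with one byte.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Link {
    pub byte: u8,
    pub node: u32,
}

/// Binary search for the child reached by `byte`.
pub fn find_child(links: &[Link], byte: u8) -> Option<u32> {
    links
        .binary_search_by_key(&byte, |link| link.byte)
        .ok()
        .map(|i| links[i].node)
}

/// Inserts `link` at its sorted position, shifting later links right.
/// Returns `false` if a link with the same byte already exists.
pub fn insert_child(links: &mut Vec<Link>, link: Link) -> bool {
    match links.binary_search_by_key(&link.byte, |l| l.byte) {
        Ok(_) => false,
        Err(pos) => {
            links.insert(pos, link);
            true
        }
    }
}
```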
Links are organized in layers.
Layer n stores a number of [Link] chunks of length 2^n (in a single contiguous array).
Each chunk represents the links for a single node (with possibly some extra capacity).
A node can find its chunk because it knows the number of links (which gives the layer number) and the first link in the layer.
A new link for the node is added to the current chunk if there’s space.
If the chunk is full, it is copied to a chunk twice as big first.
The old chunk is then added to the list of free chunks for reuse.
Here’s the whole definition of the data structure:
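A sketch of what such a definition might look like, inferred from the description above. All type and field names here are assumptions, and `push_link` only shows the chunk-growing step (keeping links sorted is left out).

```rust
/// Half-open byte range inside `Interner::buf`.
#[derive(Clone, Copy)]
pub struct Span { pub start: u32, pub end: u32 }

/// Edge to a child node, labelled with one byte.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Link { pub byte: u8, pub node: u32 }

#[derive(Clone, Copy, Default)]
pub struct Node {
    pub str: Option<u32>, // index into `strs` if a string ends at this node
    pub n_links: u16,     // number of children; this also determines the layer
    pub first_link: u32,  // offset of this node's chunk inside its layer
}

/// Layer `n`: chunks of `2^n` links stored back to back, plus a free list.
#[derive(Default)]
pub struct Layer { pub links: Vec<Link>, pub free: Vec<u32> }

#[derive(Default)]
pub struct Interner {
    pub trie: Vec<Node>,
    pub links: Vec<Layer>,
    pub strs: Vec<Span>,
    pub buf: String,
}

/// Smallest `n` such that a chunk of `2^n` links can hold `n_links` links.
pub fn layer_for(n_links: usize) -> usize {
    n_links.max(1).next_power_of_two().trailing_zeros() as usize
}

impl Layer {
    /// Reuses a freed chunk if there is one, otherwise grows the array.
    fn alloc_chunk(&mut self, len: usize) -> u32 {
        if let Some(offset) = self.free.pop() {
            return offset;
        }
        let offset = self.links.len() as u32;
        self.links.resize(self.links.len() + len, Link { byte: 0, node: 0 });
        offset
    }
}

impl Interner {
    /// The occupied part of the node's chunk.
    pub fn links_of(&self, node: usize) -> &[Link] {
        let Node { n_links, first_link, .. } = self.trie[node];
        if n_links == 0 {
            return &[];
        }
        let start = first_link as usize;
        &self.links[layer_for(n_links as usize)].links[start..start + n_links as usize]
    }

    /// Appends a link, moving the node to a chunk twice as big when full.
    pub fn push_link(&mut self, node: usize, link: Link) {
        let Node { n_links, first_link, .. } = self.trie[node];
        let old_len = n_links as usize;
        let (old_layer, new_layer) = (layer_for(old_len), layer_for(old_len + 1));
        let mut first = first_link;
        if old_len == 0 || new_layer != old_layer {
            if self.links.len() <= new_layer {
                self.links.resize_with(new_layer + 1, Layer::default);
            }
            first = self.links[new_layer].alloc_chunk(1 << new_layer);
            if old_len > 0 {
                // Copy the old chunk over, then put it on the free list.
                for i in 0..old_len {
                    let moved = self.links[old_layer].links[first_link as usize + i];
                    self.links[new_layer].links[first as usize + i] = moved;
                }
                self.links[old_layer].free.push(first_link);
            }
        }
        self.links[new_layer].links[first as usize + old_len] = link;
        self.trie[node].n_links += 1;
        self.trie[node].first_link = first;
    }
}
```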
Isn’t it incredibly cool that you can look only at the fields and understand how the thing works, without even seeing the remaining 150 lines of relatively tricky implementation?
However, implementing a trie made me realize that there’s a simple optimization we can apply to our naive interner to get rid of extra allocations.
In the trie, I concatenate all interned strings into one giant String and use (u32, u32) index pairs as an internal representation of a string slice.
If we translate this idea to our naive interner, we get:
The problem here is that we can’t actually write implementations of Eq and Hash for Span to make this work.
In theory, this is possible: to compare two Spans, you resolve them to &str via buf, and then compare the strings.
However, the Rust API does not allow expressing this idea.
Moreover, even if HashMap allowed supplying a key closure at construction time, it wouldn’t help!
Such an API would run afoul of the borrow checker: key_fn would have to borrow from the same struct that owns the map.
What would work is supplying a key_fn at the call site for every HashMap operation, but that would hurt ergonomics and ease of use a lot.
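To make the call-site idea concrete, here is a rough emulation. The standard HashMap has no such API, so this sketch hashes the string at the call site and keeps buckets of candidate ids, resolving spans through `buf` on every operation. The `SpanInterner` type and `hash_str` helper are hypothetical, and the per-bucket `Vec` brings allocations back, so this is an illustration rather than a practical design.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

#[derive(Clone, Copy)]
struct Span {
    start: u32,
    end: u32,
}

#[derive(Default)]
pub struct SpanInterner {
    /// Hash of the string contents -> ids of strings with that hash.
    map: HashMap<u64, Vec<u32>>,
    vec: Vec<Span>,
    buf: String,
}

/// The "key function", applied by hand at every call site.
fn hash_str(s: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    s.hash(&mut hasher);
    hasher.finish()
}

impl SpanInterner {
    pub fn intern(&mut self, name: &str) -> u32 {
        let hash = hash_str(name);
        if let Some(ids) = self.map.get(&hash) {
            for &id in ids {
                // Equality is checked by resolving the span through `buf`.
                if self.lookup(id) == name {
                    return id;
                }
            }
        }
        let start = self.buf.len() as u32;
        self.buf.push_str(name);
        let id = self.vec.len() as u32;
        self.vec.push(Span { start, end: self.buf.len() as u32 });
        self.map.entry(hash).or_default().push(id);
        id
    }

    pub fn lookup(&self, id: u32) -> &str {
        let Span { start, end } = self.vec[id as usize];
        &self.buf[start as usize..end as usize]
    }
}
```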
This exact problem requires
design of lazy values in Rust.
However, with a bit of unsafe, we can make something similar work.
The trick is to add strings to buf in such a way that they are never moved, even if more strings are added on top.
That way, we can just store &str in the HashMap.
To achieve address stability, we use another trick: if buf is full (so that adding a new string would invalidate old pointers), we allocate a new buffer, twice as large, without copying the contents of the old one.
Here’s the full implementation:
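A sketch of an interner along these lines is shown below. The names `buf`, `alloc` and `lookup` come from the text; `map`, `vec`, `full` and `with_capacity` are assumptions, and the details may differ from the author's listing.

```rust
use std::collections::HashMap;
use std::mem;

pub struct Interner {
    map: HashMap<&'static str, u32>,
    vec: Vec<&'static str>,
    /// Active buffer; it is never reallocated once strings point into it.
    buf: String,
    /// Retired buffers, kept alive so that old pointers stay valid.
    full: Vec<String>,
}

impl Interner {
    pub fn with_capacity(cap: usize) -> Interner {
        Interner {
            map: HashMap::new(),
            vec: Vec::new(),
            buf: String::with_capacity(cap.next_power_of_two()),
            full: Vec::new(),
        }
    }

    pub fn intern(&mut self, name: &str) -> u32 {
        if let Some(&id) = self.map.get(name) {
            return id;
        }
        // The fake 'static reference is stored only inside `self` and is
        // never handed out with that lifetime.
        let name = unsafe { self.alloc(name) };
        let id = self.vec.len() as u32;
        self.map.insert(name, id);
        self.vec.push(name);
        debug_assert!(self.lookup(id) == name);
        id
    }

    /// Lifetime elision ties the result to `&self`, not to 'static.
    pub fn lookup(&self, id: u32) -> &str {
        self.vec[id as usize]
    }

    /// Copies `name` into the active buffer without moving older strings.
    unsafe fn alloc(&mut self, name: &str) -> &'static str {
        let cap = self.buf.capacity();
        if cap < self.buf.len() + name.len() {
            // Not enough room: retire the old buffer instead of growing it.
            let new_cap = (cap.max(name.len()) + 1).next_power_of_two();
            let new_buf = String::with_capacity(new_cap);
            let old_buf = mem::replace(&mut self.buf, new_buf);
            self.full.push(old_buf);
        }
        let interned: &str = {
            let start = self.buf.len();
            self.buf.push_str(name);
            &self.buf[start..]
        };
        unsafe { &*(interned as *const str) }
    }
}
```

Starting with a tiny capacity, for example `Interner::with_capacity(4)`, exercises the buffer switch early, which is handy when testing that previously interned strings still resolve.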
The precise rule for increasing capacity is slightly more complicated:
Just doubling won’t be enough, we also need to make sure that the new string actually fits.
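One rule that satisfies both requirements can be written as a small helper. The function name `next_capacity` is hypothetical, and the exact formula is an assumption rather than necessarily the author's.

```rust
/// New buffer capacity: strictly larger than both the old capacity `cap`
/// and the `needed` length of the incoming string, rounded up to a power
/// of two.
pub fn next_capacity(cap: usize, needed: usize) -> usize {
    (cap.max(needed) + 1).next_power_of_two()
}
```

With a capacity of 8 and a 3-byte string this gives 16 (plain doubling), while a 100-byte string gives 128, so the new string always fits.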
We could have used a single bufs: Vec<String> in place of both buf and the list of full buffers.
The benefit of splitting the last buffer into a dedicated field is that we statically guarantee that there’s at least one buffer.
That way, we avoid a bounds check and/or .unwrap when accessing the active buffer.
We also use &'static str to fake interior references.
Miri (Rust’s in-progress UB checker) is not entirely happy about this.
I haven’t dug into this yet, it might be another instance of
To be on the safe side, we can use *const str instead, with a bit of boilerplate to delegate equality and hashing to the underlying string.
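The boilerplate in question might look roughly like this. `RawStr` is a hypothetical wrapper name; the `Borrow<str>` impl is what lets a HashMap keyed by the wrapper be queried with a plain `&str`.

```rust
use std::borrow::Borrow;
use std::hash::{Hash, Hasher};

/// Raw pointer to a string that is owned elsewhere.
pub struct RawStr(*const str);

impl RawStr {
    /// Safety: the pointee must stay alive and must not move for as long
    /// as this wrapper (or any map containing it) is in use.
    pub unsafe fn new(s: &str) -> RawStr {
        RawStr(s as *const str)
    }

    fn as_str(&self) -> &str {
        // Relies on the contract of `RawStr::new`.
        unsafe { &*self.0 }
    }
}

// Equality and hashing delegate to the string contents, not the address.
impl PartialEq for RawStr {
    fn eq(&self, other: &RawStr) -> bool {
        self.as_str() == other.as_str()
    }
}

impl Eq for RawStr {}

impl Hash for RawStr {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.as_str().hash(state)
    }
}

impl Borrow<str> for RawStr {
    fn borrow(&self) -> &str {
        self.as_str()
    }
}
```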
Some kind of (hypothetical) 'unsafe lifetime could also be useful here!
The critical detail that makes our use of fake 'static sound here is that the alloc function is private.
The lookup function shortens the lifetime to that of &self (via lifetime elision).
For the real implementation, I would change two things:
- Use rustc_hash::FxHashMap. It’s a standard Rust HashMap with a faster (but not DoS-resistant) hash function. Fx stands for Firefox; this is a modification of the FNV hash originally used in the browser.
- Add a newtype wrapper for string indexes:
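Such a wrapper could be as small as the following. The name `StrId` and the exact set of derives are assumptions.

```rust
/// Index of an interned string. A distinct type, so that it cannot be
/// mixed up with other `u32` values by accident.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct StrId(pub u32);
```

The wrapper is the same size as a bare `u32`, so it costs nothing at runtime.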
That’s all I have to say about fast and simple string interning in Rust! Discussion on /r/rust.