Large Rust Workspaces
In this article, I’ll share my experience with organizing large Rust projects. This is in no way authoritative — just some tips I’ve discovered through trial and error.
Cargo, Rust’s build system, follows convention over configuration principle. It provides a set of good defaults for small projects, and it is especially well-tailored for public crates.io libraries. The defaults are not perfect, but they are good enough. The resulting ecosystem-wide consistency is also welcome.
However, Cargo is less opinionated when it comes to large, multi-crate projects, organized as a Cargo workspace. Workspaces are flexible — Cargo doesn’t have a preferred layout for them. As a result, people try different things, with varying degrees of success.
To cut to the chase, I think for projects in between ten thousand and one million lines of code, the flat layout makes the most sense. rust-analyzer (200k lines) is good example here. The repository is laid out this:
1 2 3 4 5 6 7 8 9 rust-analyzer/ Cargo.toml Cargo.lock crates/ rust-analyzer/ hir/ hir_def/ hir_ty/ ...
In the root of the repo, Cargo.toml defines a virtual manifest:
1 2 [workspace] members = ["crates/*"]
Everything else (including rust-analyzer “main” crate) is nested one-level deep under
The name of each directory is equal to the name of the crate:
1 2 3 4 [package] name = "hir_def" version = "0.0.0" edition = "2018"
At the time of writing, there are 32 different subfolders in
It’s interesting that this advice goes against the natural tendency to just organize everything hierarchically:
1 2 3 4 5 6 7 8 rust-analyzer/ Cargo.toml src/ hir/ Cargo.toml src/ def/ ty/
There are several reasons why trees are inferior in this case.
First, the Cargo-level namespace of crates is flat.
It’s not possible to write
hir::def in Cargo.toml, so crates typically have prefixes in their names.
Tree layout creates an alternative hierarchy, which adds a possibility for inconsistencies.
Second, even comparatively large lists are easier to understand at a glance than even small trees.
ls ./crates gives immediate bird’s eye view of the project, and this view is small enough:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 16:22:57|~/projects/rust-analyzer|master✓ λ ls ./crates base_db cfg flycheck hir hir_def hir_expand hir_ty ide ide_assists ide_completion ide_db ide_diagnostics ide_ssr limit mbe parser paths proc_macro_api proc_macro_srv proc_macro_test profile project_model rust-analyzer sourcegen stdx syntax test_utils text_edit toolchain tt vfs
Doing the same for a tree-based layout is harder.
Looking at a single level doesn’t tell you which folders contains nested crates.
Looking at all level lists too many folders.
Looking only at folder that contain Cargo.toml gives the right result, but is not as trivial as just
It is true that nested structure scales better than a flat one. But the constant matters — until you hit a million lines of code, the number of crates in the project will probably fit on one screen.
Finally, the last problem with hierarchical layout is that there are no perfect hierarchies. With a flat structure, adding or splitting the crates is trivial. With a tree, you need to figure out where to put the new crate, and, if there isn’t a perfect match for it already, you’ll have to either:
add a stupid mostly empty folder near the top
add a catch-all utils folder
place the code in a known suboptimal directory.
This is a significant issue for long-lived multi-person projects — tree structure tends to deteriorate over time, while flat structure doesn’t need maintenance.
Make the root of the workspace a virtual manifest.
It might be tempting to put the main crate into the root, but that pollutes the root with
src/, requires passing
--workspace to every Cargo command, and adds an exception to an otherwise consistent structure.
Don’t succumb to the temptation to strip common prefix from folder names. If each crate is named exactly as the folder it lives in, navigation and renames become easier. Cargo.tomls of reverse dependencies mention both the folder and the crate name, it’s useful when they are exactly the same.
For large projects a lot of repository bloat often comes from ad-hoc automation — Makefiles and various prepare.sh scripts here and there.
To avoid both the bloat and proliferation of ad-hoc workflows, write all automation in Rust in a dedicated crate.
One pattern useful for this is
version = "0.0.0" for internal crates you don’t intend to publish.
If you do want to publish a subset of crates with proper semver API, be very deliberate about them.
It probably makes sense to extract all such crates into a separate top-level folder,
It makes it easier to check that things in
libs/ don’t use things from
Some crates consist only of a single-file.
For those, it is tempting to flatten out the
src directory and keep
Cargo.toml in the same directory.
I suggest not doing that — even if crate is single file now, it might get expanded later.
|This post is a part of One Hundred Thousand Lines of Rust series.|