Notes On Module System

Unedited summary of what I think a better module system for a Rust-like language would look like.

Today’s Rust module system is its most exciting feature, after the borrow checker. Two design choices make decentralized library ecosystems à la crates.io robust. The first is the explicit separation between crates (which form a DAG) and modules (which might be mutually dependent). The second is the absence of a single global namespace: crates don’t have innate names. Instead, the name is written on a dependency edge between two crates, and the same crate might be known under different names in two of its dependents. In particular, Rust allows linking in several versions of the same crate without fear of naming conflicts.

However, I feel that the specific surface syntax we use to express this model is suboptimal. The module system is pretty confusing: in the pre-2018 surveys, it was by far the most confusing aspect of the language after lifetimes. The post-2018 system is better, but there are still regular questions about the module system. What can we do better?

First, be more precise about visibilities. The single most important question about an item is “can it be visible outside of the CU (compilation unit)?”. Depending on the answer, you have either a closed-world assumption (all usages are known) or an open-world assumption (usages are not knowable). This should be reflected in the module system. pub is for “visible inside the whole CU, but not further”. export or (my favorite) pub* is for “visible to the outer world”. You sorta can have these in today’s Rust with pub(crate), -Dunreachable_pub and some tolerance for compiler false positives.
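Here is a minimal sketch of that approximation in today’s Rust. The module and function names are made up for illustration. pub(crate) plays the role of the proposed pub, and plain pub plays the role of pub*. The crate-level deny(unreachable_pub) attribute has the same effect as passing -Dunreachable_pub on the command line.

```rust
// Today's-Rust approximation of the proposed pub / pub* split.
// The lint rejects any plain `pub` item that cannot actually escape the crate.
#![deny(unreachable_pub)]

mod parsing {
    // Crate-internal helper: the proposal would spell this `pub fn`.
    pub(crate) fn parse_digit(c: char) -> Option<u32> {
        c.to_digit(10)
    }
    // Writing `pub fn parse_digit` here instead would trip `unreachable_pub`,
    // because `parsing` is private and the item is never re-exported.
}

// Part of the crate's outward-facing API: the proposal would spell this `pub* fn`.
pub fn sum_digits(s: &str) -> u32 {
    s.chars().filter_map(parsing::parse_digit).sum()
}

fn main() {
    println!("{}", sum_digits("a1b2c3"));
}
```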

I am not sure if the rest of Rust’s visibility system pulls its weight. It is OK, but it is pretty complex (pub(in some::path)) and doesn’t really help. Making visibilities more precise within a single CU doesn’t meaningfully improve the code, as you can control and rewrite all of that code anyway. A CU doesn’t have internal boundaries that visibilities could reflect. If we go this way, we get a nice, simple system: fn foo() is visible in the current module only (not its children), pub fn foo() is visible anywhere inside the current crate, and pub* fn foo() is visible to other crates using ours. But then again, the current tree-based visibility is OK. We can leave it in as long as pub/pub* is more explicit and -Dunreachable_pub is an error by default.
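Consider one concrete difference from today’s rules, shown as a sketch with hypothetical names. In current Rust, a private item is visible in its own module and in all of that module’s descendants, so the call below compiles. Under the simpler scheme above, fn secret() would be visible in outer only, and inner would need it to be pub (crate-wide).

```rust
mod outer {
    // Private: today this is visible in `outer` *and* every module nested in it.
    fn secret() -> u32 {
        7
    }

    pub(crate) mod inner {
        pub(crate) fn reveal() -> u32 {
            // Compiles in today's Rust because children see ancestors' private items.
            // Under the proposed scheme, `secret` would have to be `pub` for this to work.
            super::secret() * 6
        }
    }
}

fn main() {
    println!("{}", outer::inner::reveal());
}
```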

Similarly, the fact that use is an item (i.e., a::b can use items imported in a) is an unnecessary cuteness. Imports should only introduce the name into the module’s namespace, and should be separate from intentional re-exports. It might make sense to ban glob re-exports. That would give you the nice property that all the names existing in a module are spelled out explicitly, which is useful for tooling. Though, as Rust has namespaces, looking at pub use submod::thing doesn’t tell you whether the thing is a type or a value, so this might not be a meaningful property after all.
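Here is what “use is an item” means in today’s Rust, as a small sketch. The modules a and b and the function make are hypothetical. A private import in a is reachable from the child a::b via super::, which the proposal would rule out. A deliberate re-export is a separate concept that the proposal keeps.

```rust
mod a {
    // Plain import: today it also becomes a (private) item of `a`.
    use std::collections::HashMap;

    // Intentional re-export, a separate concept under the proposal.
    pub(crate) use std::collections::BTreeMap;

    pub(crate) mod b {
        // Compiles in today's Rust: the child reaches through the parent's
        // private import. Under the proposal, `a::b` would import HashMap itself.
        pub(crate) fn make() -> super::HashMap<&'static str, u32> {
            let mut m = super::HashMap::new();
            m.insert("answer", 42);
            m
        }
    }
}

fn main() {
    let m = a::b::make();
    // Uses the re-export from outside `a`.
    let sorted: a::BTreeMap<_, _> = m.into_iter().collect();
    println!("{sorted:?}");
}
```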

The second thing to change would be the mapping between the module tree and the directory structure. The current system creates quite a few visible problems:

A bunch of less-objective issues:

I think a better system would say that a compilation unit is equivalent to a directory with Rust source files, and that (relative) file paths correspond to module paths. There’s neither mod foo; nor mod foo {}. Yes, sometimes those are genuinely useful. No, the fact that something can be useful doesn’t mean it should be part of the language: it’s very hard to come up with a language feature that would be completely useless (though I think mod foo {} could be added back relatively painlessly). We keep mod.rs, but name it _$name_of_the_module$.rs instead. This solves two issues: the file sorts first alphabetically, and it gets a unique, fuzzy-findable name. So, something like this:

/home/matklad/projects/regex
  Cargo.toml
  src/
    _regex.rs
    parsing/
      _parsing.rs
      ast.rs
    rt/
      _rt.rs
      dfa.rs
      nfa.rs
  bins/
    grep/
      _grep.rs
      cli.rs
  tests/
    _tests.rs   # just a single integration-test binary by default!
    lookahead.rs
    fuzz.rs

The library there would give the following module tree:

crate::{
    parsing::{ast},
    rt::{nfa, dfa},
}
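The path-to-module mapping is simple enough to state as code. The following sketch uses a hypothetical helper, module_path, which is not part of any real tool. It assumes paths are given relative to src/ with / separators. It accepts _name.rs only when name matches the enclosing directory, or anywhere at the crate root, where name is the crate’s own name.

```rust
/// Hypothetical helper: maps a path relative to src/ (e.g. "rt/dfa.rs")
/// to a module path (e.g. "crate::rt::dfa") under the proposed convention.
/// Returns None for non-.rs files and for a _name.rs file whose name
/// doesn't match its enclosing directory.
fn module_path(rel: &str) -> Option<String> {
    let stem_path = rel.strip_suffix(".rs")?;
    let mut dirs: Vec<&str> = stem_path.split('/').collect();
    // split always yields at least one item, so pop() cannot fail here.
    let file = dirs.pop()?;

    let mut segments = vec!["crate"];
    segments.extend(dirs.iter().copied());

    match file.strip_prefix('_') {
        // _name.rs is the module's own file: it names the enclosing directory
        // (at the crate root, where there is no directory, it names the crate).
        Some(name) => {
            if let Some(&dir) = dirs.last() {
                if dir != name {
                    return None;
                }
            }
        }
        // Any other file is a child module named after the file.
        None => segments.push(file),
    }
    Some(segments.join("::"))
}

fn main() {
    let files = [
        "_regex.rs",
        "parsing/_parsing.rs",
        "parsing/ast.rs",
        "rt/_rt.rs",
        "rt/dfa.rs",
        "rt/nfa.rs",
    ];
    for f in files {
        println!("{f:<20} -> {:?}", module_path(f));
    }
}
```

Because the answer depends only on the path, each file can be placed in the crate namespace without reading any other file.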

To do conditional compilation, you’d do:

mutex/
  _mutex.rs
  linux_mutex.rs
  windows_mutex.rs

where _mutex.rs is

#[cfg(target_os = "linux")]
use linux_mutex as os_mutex;
#[cfg(windows)]
use windows_mutex as os_mutex;

pub struct Mutex {
    inner: os_mutex::Mutex,
}

and linux_mutex.rs starts with #![cfg(target_os = "linux")]. But of course we shouldn’t implement conditional compilation by barbarically cutting the AST. Instead, we should push conditional compilation past type checking, so that you can at least check, on Linux, that the Windows version of your code wouldn’t fail due to some stupid typos in the names of #[cfg(windows)] functions. Alas, I don’t know how to design such a conditional compilation system.
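To make the mutex idiom concrete, here is a rough approximation in today’s Rust. Inline mod blocks stand in for the separate files, because today’s Rust needs mod declarations and the proposal has none. The sketch uses Rust’s real target_os = "linux" and windows cfg predicates. The fallback module and the backend method are hypothetical, added only so that the sketch compiles and does something observable on any OS.

```rust
// Stand-ins for linux_mutex.rs, windows_mutex.rs, and a hypothetical fallback.
#[cfg(target_os = "linux")]
mod linux_mutex {
    // Placeholder type: a real implementation would wrap an OS primitive.
    #[derive(Default)]
    pub struct Mutex;
    impl Mutex {
        pub fn backend(&self) -> &'static str {
            "linux"
        }
    }
}

#[cfg(windows)]
mod windows_mutex {
    #[derive(Default)]
    pub struct Mutex;
    impl Mutex {
        pub fn backend(&self) -> &'static str {
            "windows"
        }
    }
}

#[cfg(not(any(target_os = "linux", windows)))]
mod fallback_mutex {
    #[derive(Default)]
    pub struct Mutex;
    impl Mutex {
        pub fn backend(&self) -> &'static str {
            "other"
        }
    }
}

// Exactly one of these imports survives cfg-stripping on any given target.
#[cfg(target_os = "linux")]
use linux_mutex as os_mutex;
#[cfg(windows)]
use windows_mutex as os_mutex;
#[cfg(not(any(target_os = "linux", windows)))]
use fallback_mutex as os_mutex;

#[derive(Default)]
pub struct Mutex {
    inner: os_mutex::Mutex,
}

impl Mutex {
    pub fn backend(&self) -> &'static str {
        self.inner.backend()
    }
}

fn main() {
    println!("mutex backend: {}", Mutex::default().backend());
}
```

This also shows the weakness described above. On Linux, the windows_mutex module is cut away before type checking, so a typo inside it goes unnoticed.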

The same re-export idiom would be used for specifying non-default visibility: pub* use rt; would make regex::rt a public module (yeah, this particular bit is sketchy :-) ).

I think this approach would make most of the pitfalls impossible. For example, it wouldn’t be possible to mix several different crates in one source tree. It would also be a great help for IDEs: each file can be processed independently, and the file’s contents and path alone tell you where in the crate namespace its items are mounted. That unlocks a map-reduce style IDE.

While we are at it, use should definitely follow exactly the same path resolution rules as the rest of the language, without any kind of “implicit leading ::” special cases. Oh, and we shouldn’t have nested use groups:

use std::collections::{
    hash_map::{Entry, HashMap},
    BTreeMap,
};

Some projects use them, some projects don’t, and sufficiently large projects inconsistently do both.
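For contrast, the flat style imports exactly one path per use line. That leaves only one way to write a given import, and every import is trivially greppable. Below is a small sketch of that style; word_counts is a made-up function that exists only to exercise the imports.

```rust
// Flat imports: one path per line, no nested {} groups.
use std::collections::BTreeMap;
use std::collections::hash_map::Entry;
use std::collections::hash_map::HashMap;

/// Counts occurrences of each whitespace-separated word,
/// returning them sorted by word.
fn word_counts(text: &str) -> BTreeMap<String, usize> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for word in text.split_whitespace() {
        match counts.entry(word.to_string()) {
            Entry::Occupied(mut e) => *e.get_mut() += 1,
            Entry::Vacant(e) => {
                e.insert(1);
            }
        }
    }
    // Re-collect into a BTreeMap so iteration order is deterministic.
    counts.into_iter().collect()
}

fn main() {
    println!("{:?}", word_counts("the cat saw the dog"));
}
```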

Afterword: as I said at the beginning, this is unedited and not something I’ve thought about very hard or for very long. Please don’t take this as the one true way to do things; my level of confidence in these ideas is about 0.5, I guess.