Notes On Module System

Unedited summary of what I think a better module system for a Rust-like language would look like.

Today's Rust module system is its most exciting feature, after the borrow checker. The explicit separation between crates (which form a DAG) and modules (which might be mutually dependent), together with the absence of a single global namespace, makes decentralized library ecosystems à la crates.io robust. Crates don't have innate names. Instead, the name is written on a dependency edge between two crates, and the same crate might be known under different names in two of its dependents. In particular, Rust allows linking in several versions of the same crate without fear of naming conflicts.

However, I feel that the specific surface syntax we use to express this model is suboptimal. The module system is pretty confusing: in the pre-2018 surveys, it was by far the most confusing aspect of the language after lifetimes. The post-2018 system is better, but there are still regular questions about the module system. What can we do better?

First, be more precise about visibilities. The single most important question about an item is "can it be visible outside of the compilation unit (CU)?". Depending on the answer, you have either a closed-world (all usages are known) or an open-world (usages are unknowable) assumption. This should be reflected in the module system. pub would mean "visible inside the whole CU, but not further", while export or (my favorite) pub* would mean "visible to the outer world". You can sort of have these in today's Rust with pub(crate), -Dunreachable_pub, and some tolerance for compiler false positives.
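A minimal sketch of that today's-Rust approximation, assuming a single-file binary crate: pub(crate) plays the role of the proposed pub, and the unreachable_pub lint (turned into an error here with #![deny(unreachable_pub)], the in-source equivalent of -Dunreachable_pub) is meant to catch plain pub items that can't actually leave the crate. The tokenize function is made up for illustration.

```rust
// `unreachable_pub` is allow-by-default; this turns it into a hard error,
// like passing `-Dunreachable_pub` on the command line.
#![deny(unreachable_pub)]

mod parsing {
    // Crate-wide visibility: the closest existing analogue of the proposed
    // `pub` ("visible inside the whole CU, but not further").
    pub(crate) fn tokenize(input: &str) -> Vec<&str> {
        input.split_whitespace().collect()
    }

    // Writing `pub fn tokenize` here instead should be flagged by the lint:
    // `parsing` is a private module, so the item cannot escape the crate.
}

fn main() {
    let tokens = parsing::tokenize("a+ b*");
    println!("{tokens:?}");
}
```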

I am not sure the rest of Rust's visibility system pulls its weight. It is OK, but it is pretty complex (pub(in some::path)), and it doesn't really help: making visibilities more precise within a single CU doesn't meaningfully improve the code, as you control, and can rewrite, all of that code anyway. A CU doesn't have internal boundaries which could be reflected in visibilities. If we go this way, we get a nice, simple system: fn foo() is visible in the current module only (not its children), pub fn foo() is visible anywhere inside the current crate, and pub* fn foo() is visible to other crates using ours. But then again, the current tree-based visibility is OK, and we can leave it in, as long as pub/pub* is more explicit and -Dunreachable_pub is an error by default.
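For contrast, a small sketch of how today's tree-based rules behave (all module and function names are hypothetical): a private fn is also visible to child modules, and pub(in path) lets an item name the ancestor module it is visible to. Under the simpler scheme above, the super::secret() call would be an error.

```rust
mod outer {
    // Private to `outer`. Under the proposal, plain `fn` would mean "this
    // module only"; in today's Rust, child modules of `outer` see it too.
    fn secret() -> u32 {
        42
    }

    pub(crate) mod inner {
        // A child module reaching into its parent's private item.
        pub(crate) fn reveal() -> u32 {
            super::secret()
        }

        // `pub(in path)` widens visibility to a named ancestor: `outer` and
        // its descendants may call this, the rest of the crate may not.
        pub(in crate::outer) fn for_outer_only() -> u32 {
            1
        }
    }

    pub(crate) fn total() -> u32 {
        inner::reveal() + inner::for_outer_only()
    }
}

fn main() {
    println!("{}", outer::inner::reveal()); // 42
    println!("{}", outer::total()); // 43
}
```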

In a similar way, the fact that use is an item (i.e., a::b can use items imported in a) is an unnecessary cuteness. Imports should only introduce the name into the module's namespace, and should be separate from intentional re-exports. It might make sense to ban glob re-exports: this gives you the nice property that all the names existing in a module are spelled out explicitly, which is useful for tooling. Though, as Rust has namespaces, looking at pub use submod::thing doesn't tell you whether thing is a type or a value, so this might not be a meaningful property after all.
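A sketch of the "use is an item" behavior in today's Rust, with modules a and b as in the text and a hypothetical make function: a private import in a can be named from a::b through super, and an intentional re-export looks almost exactly like an import.

```rust
mod a {
    // A plain, private import into `a`'s namespace.
    use std::collections::HashMap;

    pub(crate) mod b {
        // Because `use` declares an item, the child `a::b` can name the
        // parent's import through `super`, as if `a` had defined `HashMap`.
        pub(crate) fn make() -> super::HashMap<&'static str, i32> {
            let mut m = super::HashMap::new();
            m.insert("one", 1);
            m
        }
    }

    // An intentional re-export differs from the import above only by its
    // visibility modifier.
    pub(crate) use self::b::make;
}

fn main() {
    let m = a::make();
    println!("{}", m["one"]); // 1
}
```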

The second thing to change would be the mapping between the module tree and the directory structure. The current system creates quite a few visible problems:

A bunch of less-objective issues:

I think a better system would say that a compilation unit is equivalent to a directory with Rust source files, and that (relative) file paths correspond to module paths. There's neither mod foo; nor mod foo {} (yes, sometimes those are genuinely useful. No, the fact that something can be useful doesn't mean it should be part of the language: it's very hard to come up with a language feature which would be completely useless. Though mod foo {}, I think, can be added back relatively painlessly.) We keep the idea of mod.rs, but name the file _$name_of_the_module$.rs instead, to solve two issues: it sorts first alphabetically, and it gets a unique, fuzzy-findable name. So, something like this:

/home/matklad/projects/regex
  Cargo.toml
  src/
    _regex.rs
    parsing/
      _parsing.rs
      ast.rs
    rt/
      _rt.rs
      dfa.rs
      nfa.rs
  bins/
    grep/
      _grep.rs
      cli.rs
  tests/
    _tests.rs   # just a single integration test binary by default!
    lookahead.rs
    fuzz.rs

The library there would give the following module tree:

crate::{
    parsing::{ast},
    rt::{nfa, dfa},
}
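To make the mapping concrete, here is a hypothetical helper (not part of any existing tool) that computes the module path a file would define under this layout. It assumes paths are relative to the compilation unit's root (src/ above), and that the only underscore-prefixed file allowed in a directory is the one named after that directory (or after the crate, at the root).

```rust
/// Hypothetical helper: maps a file path, relative to the CU root, to the
/// module path it would define under the proposed layout. Returns `None`
/// for files that don't fit the convention.
fn module_path(crate_name: &str, rel_path: &str) -> Option<String> {
    // Only Rust source files define modules.
    let stem = rel_path.strip_suffix(".rs")?;
    let mut parts: Vec<&str> = stem.split('/').collect();
    // Reject empty components such as "a//b.rs" or a bare ".rs".
    if parts.iter().any(|p| p.is_empty()) {
        return None;
    }
    // `split` always yields at least one piece, so this never fails.
    let file = parts.pop()?;

    // The directory's own file is `_<dir>.rs`; at the root, `_<crate>.rs`.
    let owner = parts.last().copied().unwrap_or(crate_name);
    match file.strip_prefix('_') {
        // `_<owner>.rs` is the directory module itself.
        Some(name) if name == owner => {}
        // Any other `_`-prefixed file doesn't fit the convention.
        Some(_) => return None,
        // An ordinary file becomes a child module of its directory.
        None => parts.push(file),
    }

    let mut path = String::from("crate");
    for part in parts {
        path.push_str("::");
        path.push_str(part);
    }
    Some(path)
}

fn main() {
    let files = ["_regex.rs", "parsing/_parsing.rs", "parsing/ast.rs", "rt/dfa.rs", "rt/_parsing.rs"];
    for file in files {
        println!("{file:<20} => {:?}", module_path("regex", file));
    }
}
```

Note that the answer depends only on the path, never on the contents of some other file, which is exactly the property that lets each file be processed independently.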

To do conditional compilation, you'd write:

mutex/
  _mutex.rs
  linux_mutex.rs
  windows_mutex.rs

where _mutex.rs is

#[cfg(target_os = "linux")]
use linux_mutex as os_mutex;
#[cfg(windows)]
use windows_mutex as os_mutex;

pub struct Mutex {
    inner: os_mutex::Mutex,
}

and linux_mutex.rs starts with #![cfg(target_os = "linux")]. But of course, we shouldn't implement conditional compilation by barbarically cutting the AST. Instead, we should push conditional compilation to after type checking, so that you can at least check, on Linux, that the Windows version of your code wouldn't fail due to some stupid typo in the name of a #[cfg(windows)] function. Alas, I don't know how to design such a conditional compilation system.
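For comparison, a sketch of the same idiom in today's Rust, with hypothetical inline modules standing in for the platform files and cfg(unix) used so it builds on both Linux and macOS; it only compiles on Unix and Windows targets. It also shows the AST-cutting problem: on a Unix host, the windows_mutex module is parsed but never name-resolved or type-checked, so a typo inside it goes unnoticed until someone builds on Windows.

```rust
// Hypothetical stand-ins for real platform-specific mutexes.
#[cfg(unix)]
mod unix_mutex {
    pub(crate) struct Mutex {
        pub(crate) backend: &'static str,
    }

    impl Mutex {
        pub(crate) fn new() -> Self {
            Mutex { backend: "unix" }
        }
    }
}

// On a Unix host this whole module is stripped before type checking.
#[cfg(windows)]
mod windows_mutex {
    pub(crate) struct Mutex {
        pub(crate) backend: &'static str,
    }

    impl Mutex {
        pub(crate) fn new() -> Self {
            Mutex { backend: "windows" }
        }
    }
}

// Exactly one alias survives cfg-stripping on a given target.
#[cfg(unix)]
use unix_mutex as os_mutex;
#[cfg(windows)]
use windows_mutex as os_mutex;

pub struct Mutex {
    inner: os_mutex::Mutex,
}

impl Mutex {
    pub fn new() -> Self {
        Mutex { inner: os_mutex::Mutex::new() }
    }

    pub fn backend(&self) -> &'static str {
        self.inner.backend
    }
}

fn main() {
    println!("{}", Mutex::new().backend());
}
```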

The same re-export idiom would be used for specifying non-default visibility: pub* use rt; would make regex::rt a public module (yeah, this particular bit is sketchy :-) ).

I think this approach would make most of the pitfalls impossible. E.g., it wouldn't be possible to mix several different crates in one source tree. Additionally, it'd be a great help for IDEs, as each file could be processed independently, and it would be clear just from the file contents and path where in the crate namespace the items are mounted, unlocking a map-reduce style IDE.

While we are at it, use should definitely follow exactly the same path resolution rules as the rest of the language, without any implicit leading :: special cases. Oh, and we shouldn't have nested use groups:

use std::collections::{
    hash_map::{Entry, HashMap},
    BTreeMap,
};

Some projects use them, some projects don't, and sufficiently large projects inconsistently both use and don't use them.
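The alternative is one imported item per use line, which leaves nothing to be inconsistent about. A small sketch (word_counts is made up for illustration); rustfmt's unstable imports_granularity = "Item" option can enforce this style mechanically.

```rust
// Flat style: one item per `use`, no nested groups, so every import can
// be found with a plain text search.
use std::collections::hash_map::Entry;
use std::collections::hash_map::HashMap;
use std::collections::BTreeMap;

// Counts words, returning them sorted by word.
fn word_counts(text: &str) -> BTreeMap<&str, usize> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for word in text.split_whitespace() {
        match counts.entry(word) {
            Entry::Occupied(mut e) => *e.get_mut() += 1,
            Entry::Vacant(e) => {
                e.insert(1);
            }
        }
    }
    counts.into_iter().collect()
}

fn main() {
    println!("{:?}", word_counts("b a b")); // {"a": 1, "b": 2}
}
```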

Afterword: as I've said in the beginning, this is unedited and not something I've thought very hard and long about. Please don't take this as the one true way to do things; my level of confidence in these ideas is about 0.5, I guess.