Minimal Version Selection Revisited

In this post, I want to highlight one aspect of Go-style minimal version selection that I have missed completely at first. Maybe you missed it too?

If you recall, some years ago there was a discussion of various version selection algorithms in the Go and the Rust communities. See this list for a refresher:

https://research.swtch.com/vgo

Broadly, when you write foo = "1.2.3" in the manifest for your library, what you are specifying is not a specific version of foo, but rather a set of versions (a version constraint). Some of these versions might not had existed when you wrote foo = "1.2.3" line!

As an author of a library, you cant actually pick your dependencies exactly. Instead, the final application that uses your library is in charge of picking specific dependencies, including those for your library. In particular, if that application depends on some other library which says foo = "1.3.0", then your library gets at least 1.3.0 as well, although youve written 1.2.3.

In general, there are many ways to pick a specific set of versions that satisfy the constraints. Its interesting that Go and Rust chose, in some sense, opposite algorithms for solving these constraints. Rust selects maximal versions, while Go selects minimal versions.

The original discussion focused (or at least I perceived it as being focused) on the algorithmic complexity. The strawhat claim was that min-ver can be solved linearly, while max-ver requires NP-hard SAT-solving. This is not a particularly strong claim:

Its much more interesting to look at the ecosystem implications of the two approaches.

The big benefit of max-ver is the ecosystem-wide live-at-head: if everyone uses latest versions, than everyone uses the same combination of versions of libraries. In particular, ecosystem-wide testing ends up being concentrated on a few combinations that everyone is using. In practice, if you release a new version of a library, and it turns out to be buggy, someone immediately discovers it in their tests, so that you can quickly yank&regroup.

But the min-ver has a dual property (this is the new thing Ive realized only recently). In min-ver, if you get foo=1.2.3 in your build graph, that means that someone somewhere made an explicit decision to write foo=1.2.3 in the manifest for their library. That is, any version in your dependency graph is additionally vetted manually by someone who is not the original library developer.

This is quite neat it puts a natural damper on the supply chain attacks. If a bad version of a library is released, someone needs to explicitly opt into this new version. Whats more, the deeper in your dependency tree the library is, the more explicit approvals are required for the library to propagate to your project.


Consider this hypothetical system for specifying dependencies:

In the manifest file, you specify dependencies by listing both a version and a checksum:

foo = { version = "1.2.3", checksum = "sha256:bee71ecaffe" }

Only direct dependencies are listed. Checksums are of course filled-in by a tool.

Checksum covers the content of a dependency, including its manifest file. In other words, the checksum for foo above implicitly includes checksums of all direct dependencies of foo.

By exploring the set of dependencies from the root, we collect the set of dependencies and their checksums that are protected by some checksum in the root manifest. Then, if we have several versions for a single dependency, we pick the largest one.

This is cool. This is essentially bring your own registry version resolution, where the set of available versions is fixed, and pinned by the root hashes. All dependencies in the set have themin-ver property that someone somewhere in the dependency graph asked for this version of the dependency explicitly.

We also need neither lockfiles, nor the Go-style checksum files. Manifest form a distributed lockfile, where each dependency locks a part of the graph, and this lockfile fragment is itself hashed&locked by some direct-dependency specification.

But we still get dependency resolution only direct dependencies are specified, and, if both foo=1.2.3 and foo=1.3.0 end up being in the set of dependencies, only 1.3.0 will be included in the final graph.

My understanding is that Zig is trying to do something like this, and, as all things Zig, it feels extremely neat, even if rough at the edges. It never occurred to me before that putting hashes of direct dependencies in the manifest also locks transitive dependencies. So cute!


Update(2024-12-25): It is worth mentioning another qualitative difference between min-ver and max-ver. The latter generally requires some form of centralized registry an entity which knows all the versions of all the libraries. When you pick a maximal version, you pick the highest version known to the registry. For example, in the case of Cargo the central registry is implemented as a git repository (https://github.com/rust-lang/crates.io-index) which holds JSON files describing available versions.

With min-ver, you dont need a registry, as the set of versions to choose from can be incrementally discovered by crawling the web of dependencies starting from the final application, as long dependency specifications include not only a name and a version, but also a location (URL) and a checksum. Though, keep in mind that you still need some solution to ensure availability of dependencies with a large enough dependency tree there will be at least one dependency hosted from grandmas old PC, prone to flooding. You can use either Go-style global shared cache (it doesnt have to be trusted, as long as hashes are included in dependency specification), or make it very easy to create a project-local cache a ./vendor directory in the same repository which embeds all the dependencies (again, it is important that ./vendor is strictly a cache, and the source of truth is the checksum in the manifest).