STD Doesn’t Have to Abstract OS IO
Aug 12, 2024

A short note on what goes into a language’s standard library, and what’s left for third-party libraries to implement!

Usually, the main driving factor here is cardinality. If it is important that there’s only one of a thing, it goes into std. If having many of a thing is a requirement, it is better handled by a third-party library. That is, the usual physical constraint is that there’s only a single standard library, and everyone uses that same standard library. In contrast, there are many different third-party libraries, and they can all be used at the same time.

So, until very recently, my set of rules of thumb for what goes into stdlib looked roughly like this:

- If this is a vocabulary type, which will be used by APIs of different libraries, it should be in the stdlib.
- If this is a cross-platform abstraction around an IO facility provided by an OS, and this IO facility has a reasonable common subset across most OSes, it should be in the stdlib.
- If there’s one obvious way to implement it, it might go to stdlib.

So for example something like Vec goes into a standard library, because all other libraries are going to use vectors at the interfaces.
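To make the vocabulary-type point concrete, here is a minimal sketch. The two modules stand in for independent third-party libraries; `tokenizer`, `stats`, and `longest_word` are hypothetical names invented for illustration. Neither module knows about the other, yet they compose because both speak `Vec<String>` at the boundary.

```rust
// Two hypothetical, unrelated "libraries", modelled here as modules.
mod tokenizer {
    /// Splits a line on whitespace and returns the standard vector type.
    pub fn words(line: &str) -> Vec<String> {
        line.split_whitespace().map(str::to_owned).collect()
    }
}

mod stats {
    /// Accepts the standard vector type; knows nothing about `tokenizer`.
    pub fn longest(words: Vec<String>) -> Option<String> {
        words.into_iter().max_by_key(|w| w.len())
    }
}

/// Glue code in the application: no conversion layer is needed, because
/// both sides agree on `Vec` from the standard library.
pub fn longest_word(line: &str) -> Option<String> {
    stats::longest(tokenizer::words(line))
}
```

If each library shipped its own vector type instead, the glue function would have to copy elements from one container into the other.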

Something like lazy_static doesn’t: while it is often needed, it is not a vocabulary interface type.

But it is acceptable for something like OnceCell to be in std — it is still not a vocabulary type, but, unlike lazy_static, it is clear that the API is more or less optimal, and that there aren’t that many good options to do this differently.
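For reference, a small sketch of what that API looks like in today's std, using `std::cell::OnceCell` and its thread-safe sibling `std::sync::OnceLock`. The function names and the stored values are made up for illustration; only the `new` and `get_or_init` calls are std API.

```rust
use std::cell::{Cell, OnceCell};
use std::sync::OnceLock;

/// Reads the cell twice and returns how many times the initializer ran.
pub fn init_runs_once() -> u32 {
    let runs = Cell::new(0);
    let cell: OnceCell<u32> = OnceCell::new();
    for _ in 0..2 {
        // The closure is only invoked while the cell is still empty.
        let value = cell.get_or_init(|| {
            runs.set(runs.get() + 1);
            92
        });
        assert_eq!(*value, 92);
    }
    runs.get()
}

/// A lazily initialized global, the use case lazy_static is often used for.
pub fn greeting() -> &'static str {
    static GREETING: OnceLock<String> = OnceLock::new();
    GREETING.get_or_init(|| String::from("initialized on first use"))
}
```

The whole surface is essentially "create empty, get or initialize", which is part of why it is hard to imagine a substantially different design.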

But I’ve changed my mind about the second bullet point, about facilities like file IO or TCP sockets. I was always under the impression that these things are a must for a standard library. But now I think that’s not necessarily true!

Consider randomness. Not the PRNG kind of randomness you’d use to make a game fun, but the cryptographically secure randomness you’d use to generate an SSH key pair. This sort of randomness ultimately bottoms out in hardware, and fundamentally requires talking to the OS and doing IO. This falls squarely under bullet point number 2. And Rust is an interesting case study here: it failed to provide this abstraction in std, even though std itself actually needs it! But this turned out to be mostly a non-issue in practice: a third-party crate, getrandom, took on the job of writing all the relevant bindings to the various platform-specific APIs and using a bunch of conditional compilation to abstract all of that away behind a nice cross-platform API.
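The general shape of that approach (a raw OS binding plus conditional compilation, all in ordinary library code) can be sketched as follows. This is not how getrandom is actually implemented; it is a simplified illustration that assumes the platform's libc exports `getentropy`, which holds for reasonably recent glibc and musl on Linux and for macOS. The names `imp`, `fill`, and `fill_random` are hypothetical.

```rust
// Backend for platforms where this sketch assumes libc exports `getentropy`.
#[cfg(any(target_os = "linux", target_os = "macos"))]
mod imp {
    use std::ffi::{c_int, c_void};

    // Hand-written binding to the C function; no compiler privileges needed.
    // (`unsafe extern` needs Rust 1.82+; older toolchains spell it `extern`.)
    unsafe extern "C" {
        fn getentropy(buf: *mut c_void, buflen: usize) -> c_int;
    }

    pub fn fill(buf: &mut [u8]) -> Result<(), &'static str> {
        // getentropy rejects requests above 256 bytes, so go chunk by chunk.
        for chunk in buf.chunks_mut(256) {
            let rc = unsafe { getentropy(chunk.as_mut_ptr().cast(), chunk.len()) };
            if rc != 0 {
                return Err("getentropy failed");
            }
        }
        Ok(())
    }
}

// Fallback so the sketch still compiles elsewhere; a real library would
// add one backend per supported platform instead.
#[cfg(not(any(target_os = "linux", target_os = "macos")))]
mod imp {
    pub fn fill(_buf: &mut [u8]) -> Result<(), &'static str> {
        Err("platform not covered by this sketch")
    }
}

/// The cross-platform entry point that callers see.
pub fn fill_random(buf: &mut [u8]) -> Result<(), &'static str> {
    imp::fill(buf)
}
```

A caller would write something like `let mut key = [0u8; 32]; fill_random(&mut key)?;` and never see which backend was selected.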

So, no, std does not have to wrap every wrappable IO API. This could be handled by the library ecosystem, provided the language allows first-class bindings to raw OS APIs outside of compiler-privileged code (and Rust certainly allows for that).

So perhaps it wouldn’t be too unreasonable to leave even things like files and sockets to community experimentation? In a sense, that is already happening in async land anyway.

To clarify, I still believe that Rust should provide bindings to OS-sourced crypto randomness, and I am extremely happy to see recent motion in that area. But the reason for this belief has changed. I no longer find the mere fact that OS-specific APIs are involved particularly salient. What still holds is that there’s more or less one correct way to do this, and that is the third rule of thumb, not the second.