Ever since reading
What If We Pretended That a Task = Thread?
I can’t stop thinking about borrowing non-Sync data
across .await. In this post, I’d love to take one more
look at the problem.
To warm up, a refresher on
Send and
Sync auto-traits. These traits are a library feature that enables fearless concurrency — a
statically checked guarantee that non-thread-safe data structures
don’t escape from their original thread.
Why do we need two traits, rather than just a single ThreadSafe? Because there are two degrees of
thread-unsafety.
Some types are fine to use from multiple threads, as long as only a
single thread at a time uses a particular value. An example here
would be a Cell<i32>. If two threads have a
reference to a cell at the same time, a &Cell<i32>, we are in trouble — Cell’s loads and stores are not atomic and are UB by
definition if used concurrently. However, if two different threads
have exclusive access to a Cell, that’s fine — because
the access is exclusive, it necessarily means that it is not
simultaneous. That is, it’s OK for thread A to send a Cell<i32> to a different thread B, as long as A itself
loses access to the cell.
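For instance, this compiles (a minimal sketch; the concrete value is arbitrary):

```rust
use std::cell::Cell;
use std::thread;

fn main() {
    let cell = Cell::new(92);
    // Moving (sending) the Cell into the new thread is fine: this thread
    // gives up access, so accesses are exclusive and never simultaneous.
    let handle = thread::spawn(move || {
        cell.set(1);
    });
    handle.join().unwrap();
}
```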
But there are also types which are unsafe to use from multiple
threads even if only a single thread at a time has access to a
value. An example here would be an Arc<Cell<i32>>. It’s not possible to safely send
such an Arc to a different thread, because a .clone call can be used to get an independent copy of an
Arc, effectively creating a share operation
out of a send one.
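The compiler rejects a sketch like this, because one clone of the Arc stays behind while the other is sent away:

```rust
use std::cell::Cell;
use std::sync::Arc;
use std::thread;

fn main() {
    let arc = Arc::new(Cell::new(92));
    let arc2 = Arc::clone(&arc);
    // Does not compile: `Arc<Cell<i32>>` is !Send, because sending one
    // clone while keeping another would turn a "send" into a "share" of
    // the same non-atomic Cell.
    thread::spawn(move || arc2.set(1));
    arc.set(2);
}
```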
But it turns out both cases are covered by just a single trait, Send. The thing is, to share a
Cell<i32> across two threads, it is necessary to
send a &Cell<i32>. So we get the
following table:
| Send        | !Send             |
|-------------|-------------------|
| Cell<i32>   | &Cell<i32>        |
| i32         | Arc<Cell<i32>>    |
| &i32        | &Arc<Cell<i32>>   |
If T is Send, &T might or
might not be Send. And that's where the Sync trait comes from: &T: Send if and
only if (iff) T: Sync. Which gives the following table:
|         | Send        | !Send            |
|---------|-------------|------------------|
| Sync    | i32         |                  |
| !Sync   | Cell<i32>   | Arc<Cell<i32>>   |
What about that last empty cell? Types which are Sync
and !Send are indeed quite rare, and I don’t know of
examples which don’t boil down to “underlying API mandates that a
type doesn’t leave a thread”. One example here would be MutexGuard from the standard library — pthreads require
that only the thread that originally locked a mutex can unlock it.
This isn’t a fundamental requirement for a mutex — a MutexGuard from parking_lot
can be Send.
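A quick way to see both properties, using a pair of helper functions (the helpers are my own, not part of std):

```rust
use std::sync::MutexGuard;

fn assert_sync<T: Sync>() {}
fn assert_send<T: Send>() {}

fn check() {
    // std's MutexGuard is Sync (for T: Sync), so &MutexGuard may cross threads...
    assert_sync::<MutexGuard<'static, i32>>();
    // ...but it is deliberately !Send, so the guard itself cannot move to
    // (and unlock the mutex from) a different thread:
    // assert_send::<MutexGuard<'static, i32>>(); // does not compile
}
```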
As you see, the Send & Sync
infrastructure is quite intricate. Is it worth it? Absolutely, as it
leads to simpler code. In Rust, you can explicitly designate certain
parts of a code base as non-thread-safe, and then avoid worrying
about threads, because the compiler will catch your hand if you
accidentally violate this constraint.
The power of Rust is not defensively making everything thread-safe;
it’s the ability to use thread-unsafe code fearlessly.
And it seems like async doesn’t quite have this power.
Let’s build an example, a litmus test!
Let’s start with a Context pattern, where a bunch of
stuff is grouped into a single struct so that it can be threaded
through the program as one parameter. Such a Context
object is usually scoped to a particular operation — the ultimate
owner of Context is a local variable in some top-level
“main” function, it is threaded as &Context or
&mut Context everywhere, and usually isn’t stored
anywhere. For the &Context variant, it is also
customary to add some interior mutability for things like caches.
One real-life example would be a Config type from
Cargo:
config/mod.rs#L168.
Distilling the pattern down, we get something like this:
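(A reconstruction of the shape of the snippet; the field and function names are illustrative, only the Cell and the .await matter.)

```rust
use std::cell::Cell;

struct Context {
    // ...config, interned data, etc.
    counter: Cell<i32>, // interior mutability for a cache makes Context: !Sync
}

async fn task_main() {
    let context = Context { counter: Cell::new(0) };
    f(&context).await;
}

async fn f(context: &Context) {
    context.counter.set(context.counter.get() + 1);
    // A suspension point while `&Context` is live across it:
    std::future::ready(()).await;
}
```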
Do you see the problem? Surprisingly, even rustc
doesn’t see it — the code above compiles in isolation. However, when
we start using it with Tokio’s work-stealing runtime, we get an error:
error: future cannot be sent between threads safely
 --> src/main.rs:29:18
  |
  |     tokio::spawn(task_main());
  |                  ^^^^^^^^^^^ future returned by `task_main` is not `Send`
  |
within `Context`, the trait `Sync` is not implemented for `Cell<i32>`.
If you want to do aliasing and mutation between multiple threads,
use `std::sync::RwLock` or `std::sync::atomic::AtomicI32` instead.
What happened here? When compiling async fn f, the compiler
reifies its stack frame as a Rust struct:
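(A hand-written approximation; the real generated type is an unnameable state machine.)

```rust
// Rough picture of the compiler-generated state for `async fn f`:
// everything live across an .await ends up as a field.
struct FStackFrame<'a> {
    context: &'a Context,
    // ...other locals live across suspension points, plus resume state
}
```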
This struct contains a reference to our Context type,
and then Context: !Sync implies &Context: !Send, which implies
FStackFrame<'_>: !Send.
And that finally clashes with the signature of
tokio::spawn:
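```rust
// Signature of tokio::spawn (abridged from Tokio's documentation):
pub fn spawn<F>(future: F) -> JoinHandle<F::Output>
where
    F: Future + Send + 'static,
    F::Output: Send + 'static,
{ /* ... */ }
```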
Tokio’s default executor is work-stealing. It’s going to poll the
future from different threads, and that’s why it is required that
the future is Send.
In my eyes this is a rather significant limitation, and a big
difference with synchronous Rust. Async Rust has to be defensively
thread-safe, while sync Rust is free to use non-thread-safe data
structures when convenient.
Let me explain first why this works, and then why this can’t work.
A Future is essentially a stack-frame of an
asynchronous function. The tokio::spawn above requires that all such
stack frames are thread-safe. This is not what happens in
synchronous code — there, functions are free to put cells on their
stacks. Send-ness is only checked when data is
actually sent to a different thread, in Sender::send
and thread::spawn. The spawn function in
particular says nothing about the stack of a new thread. It
only requires that the data used to create the first stack frame is
Send.
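For comparison, std::thread::spawn only constrains the closure and its return value:

```rust
// From std (bounds abridged): only `f` — the seed of the new thread's
// first stack frame — and the returned value must be Send; nothing is
// said about what the new thread later keeps on its stack.
pub fn spawn<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{ /* ... */ }
```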
And that’s what the async version does: instead of spawning a
future directly, the spawn function, just like its sync counterpart,
takes a closure. The closure is moved to a different execution
context, so it must be : Send. The actual future created by the
closure in the new context can be whatever — an async runtime is free
to poll this future from different threads regardless of its Sync
status.
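A sketch of what such a closure-taking spawn could look like (hypothetical; spawn_task is not a real Tokio API, only the placement of the bounds matters):

```rust
use std::future::Future;

// Only the closure crosses into the new execution context, so only it
// needs to be Send. The future it builds there carries no Send bound.
fn spawn_task<F, Fut>(f: F)
where
    F: FnOnce() -> Fut + Send + 'static,
    Fut: Future<Output = ()> + 'static, // note: no `Fut: Send`
{
    // runtime-specific machinery goes here
    let _ = f;
}
```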
Async work-stealing still works for the same reason that blocking
work stealing works. Logical threads of execution can migrate
between physical CPU cores because the OS restores execution context
when switching threads. Tasks can migrate between threads because the
async runtime restores execution context when switching tasks. Go is
a proof that this is possible — goroutines migrate between different
threads, but they are free to use on-stack non-thread-safe state. The
pattern is clearly sound, the question is, can we express this
fundamental soundness in Rust’s type system, like we managed to do
for OS threads?
This is going to be tricky, because Send today absolutely means “same thread”, not “same execution
context”. Here’s one example that would break:
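(A reconstruction; the thread-local and function names are illustrative, and yield_now stands in for any suspension point.)

```rust
use std::rc::Rc;

thread_local! {
    static TOKEN: Rc<()> = Rc::new(());
}

async fn task() {
    // Two tasks polled on the same OS thread each grab a clone of the same
    // thread-local Rc. So far the Rc has never left its thread.
    let token = TOKEN.with(Rc::clone);
    tokio::task::yield_now().await;
    // If the runtime resumes the tasks on *different* threads, both clones
    // now update the same non-atomic reference count concurrently.
    drop(token);
}
```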
If the .await migrates to a different thread, we are in
trouble: two tasks can start on the same thread, then diverge, but
continue to hammer the same non-atomic reference count.
Another breakage example is various OS APIs that just mandate that
things happen on a particular execution thread, like pthread_mutex_unlock. Though I think that the turtle those
APIs stand on is thread locals again?
Can we fix it? As an absolute strawman proposal, let’s redefine
Send & Sync in terms of abstract
“execution contexts”, add OsThreadSend and OsThreadSync, and change APIs which involve thread locals to
use the OsThread variants. It seems that everything
else works?
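In pseudo-Rust (auto traits and negative impls are unstable features, and these names are invented for this strawman):

```rust
#![feature(auto_traits, negative_impls)] // strawman only

// Send/Sync would talk about abstract execution contexts; these two would
// capture "must stay on one OS thread" specifically.
pub auto trait OsThreadSend {}
pub auto trait OsThreadSync {}

// A type that leans on OS-thread identity (thread locals, pthread mutex
// ownership) would opt out:
pub struct PthreadGuard {
    raw: *mut (), // e.g. a pointer to a pthread mutex
}
impl !OsThreadSend for PthreadGuard {}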
I would like to posit four questions to the wider async Rust
community.
Does this work in theory? As far as I can tell, this does indeed
work, but I am not an async expert. Am I missing something?
Ideally, I’d love to see small, self-contained litmus test
examples that break OsThreadSend
Rust.
Is this an important problem in practice to look into? On the
one hand, people are quite successful with async Rust as it is.
On the other hand, the expressivity gap here is real, and Rust,
as a systems programming language, strives to minimize such
gaps. And then there’s the fact that the failure mode today is
rather nasty — although the actual type error is inside the
f function, we learn about it only at the call site in main.
EDIT: I am also wondering — if we stop caring whether futures
are : Send, does that mean we no longer need an
explicit syntax for Send bounds in async traits?
Assuming that this idea does work, and we decide that we care
enough to try to fix it, is there a backwards-compatible path we
could take to make this a reality?
EDIT: to clarify, no way we are really adding a new auto-trait
like OsThreadSend. But there could be some less
invasive change to get the desired result. For example, a more
promising approach is to expose some runtime hook for async
runtimes to switch TLS, such that each task gets an independent
copy of thread-local storage, as if task=thread.
Is it a new idea that !Send futures and
work-stealing don’t conflict with each other? For me, that 22.05.2023 post
was the first time I learned that having a &Cell<i32> in a future’s state machine does
not preclude polling it from different OS threads. But there’s
nothing particularly new there — the relevant APIs were
stabilized years ago. Was this issue articulated and discussed
back when async Rust was designed, or is it a genuinely new
finding?
Update (2023-12-30): there was some
discussion of these ideas on
Zulip. It looks like this isn’t completely broken and that, indeed,
thread-locals are the main principled obstacle.
I think I also got a clear picture of a solution for an ideal world,
where we are not bound by backwards-compatibility requirements: make
thread-local access unsafe. Specifically:
First, remove any references to OS threads from the
definition of Send and Sync. Instead,
define them in terms of abstract concurrency. I am not well-versed
enough in the formal side of things to understand precisely what that
should entail, but I have a litmus test. The new definition should
work for interrupt handlers in embedded. In OS and embedded
programming, one needs to deal with interrupt handlers — code that
is run by the CPU in response to a hardware interrupt. When the CPU is
interrupted, it saves the current execution context, runs the
interrupt handler, and then restores the original context. Although it all
happens on a single core and there are no OS threads in sight, the
restrictions are similar to those of threads: an interrupt can
arrive in the middle of a reference-count upgrade. To rephrase:
Sync should be a core trait. Right now it
is defined in core, but its definition references OS
threads — a concept no_std is agnostic about!
Second, replace the thread_local! macro with a
#[thread_local] attribute on (unsafe) statics. There
are two reasons why people reach for thread locals:
- to implement really fast concurrent data structures (e.g., a global
  allocator or an async runtime),
- as a programming shortcut, to avoid passing a Context
  argument everywhere.
The thread_local! macro mostly addresses the second
use-case — for a very long time it wasn’t even a zero-cost
abstraction, so implementing a fast allocator in Rust on top of it was
impossible! But, given that this pattern is rare in software (and,
where it is used, it tends to take years to refactor away, as
was the case with rustc’s use of thread locals for the parsing
session), I think it’s OK to say that Rust flat-out doesn’t support
it safely, like it doesn’t support mutable statics.
The safety contract for #[thread_local] statics would
be stricter than the contract on static mut: the
user must also ensure that the value isn’t used past the
corresponding thread’s lifetime.
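For illustration, roughly this shape (a sketch; today’s unstable #[thread_local] attribute exists, but its exact safety rules differ from what is proposed here):

```rust
#![feature(thread_local)]

use std::cell::Cell;

// The low-level building block: a plain thread-local static, with no lazy
// initialization and no destructors.
#[thread_local]
static ALLOCATIONS: Cell<u64> = Cell::new(0);

fn note_allocation() {
    // Under this proposal the access would have to be `unsafe`: the caller
    // must promise the value is neither used past the thread's lifetime nor
    // smuggled into another execution context.
    ALLOCATIONS.set(ALLOCATIONS.get() + 1);
}
```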