Ever since reading What If We Pretended That a Task = Thread?, I can’t stop thinking about borrowing non-Sync data across .await. In this post, I’d love to take one more look at the problem.
To warm up, a refresher on the Send and Sync auto-traits. These traits are a library feature that enables fearless concurrency — a statically checked guarantee that non-thread-safe data structures don’t escape from their original thread.
Why do we need two traits, rather than just a single ThreadSafe? Because there are two degrees of thread-unsafety.
Some types are fine to use from multiple threads, as long as only a single thread at a time uses a particular value. An example here would be a Cell<i32>. If two threads have a reference to a cell at the same time, a &Cell<i32>, we are in trouble — Cell’s loads and stores are not atomic and are UB by definition if used concurrently. However, if two different threads have exclusive access to a Cell, that’s fine — because the access is exclusive, it necessarily means that it is not simultaneous. That is, it’s OK for thread A to send a Cell<i32> to a different thread B, as long as A itself loses access to the cell.
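A quick illustration (the commented-out line is the one the compiler rejects):

```rust
use std::cell::Cell;
use std::thread;

fn main() {
    let cell = Cell::new(92);
    // Sending the Cell itself is fine: the receiving thread gets exclusive access.
    thread::spawn(move || cell.set(1)).join().unwrap();

    let shared = Cell::new(92);
    let r: &Cell<i32> = &shared;
    // Sharing it is not: `&Cell<i32>` is not `Send`, so this does not compile.
    // thread::spawn(move || r.set(1));
    let _ = r;
}
```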
But there are also types which are unsafe to use from multiple threads even if only a single thread at a time has access to a value. An example here would be an Arc<Cell<i32>>. It’s not possible to safely send such an Arc to a different thread, because a .clone call can be used to get an independent copy of an Arc, effectively creating a share operation out of a send one.
But it turns out both cases are covered by just a single trait, Send. The thing is, to share a Cell<i32> across two threads, it is necessary to send an &Cell<i32>. So we get the following table:
| Send      | !Send           |
|-----------|-----------------|
| Cell<i32> | &Cell<i32>      |
| i32       | Arc<Cell<i32>>  |
| &i32      | &Arc<Cell<i32>> |
If T is Send, &T might or might not be Send. And that’s where the Sync trait comes from: &T: Send if and only if (iff) T: Sync. Which gives the following table:
|       | Send      | !Send          |
|-------|-----------|----------------|
| Sync  | i32       |                |
| !Sync | Cell<i32> | Arc<Cell<i32>> |
What about that last empty cell? Types which are Sync and !Send are indeed quite rare, and I don’t know examples which don’t boil down to “underlying API mandates that a type doesn’t leave a thread”. One example here would be MutexGuard from the standard library — pthreads require that only the thread that originally locked a mutex can unlock it. This isn’t a fundamental requirement for a mutex — a MutexGuard from parking_lot can be Send.
As you see, the Send & Sync infrastructure is quite intricate. Is it worth it? Absolutely, as it leads to simpler code. In Rust, you can explicitly designate certain parts of a code base as non-thread-safe, and then avoid worrying about threads, because the compiler will catch your hand if you accidentally violate this constraint.
The power of Rust is not defensively making everything thread-safe; it’s the ability to use thread-unsafe code fearlessly.
And it seems like async doesn’t quite have this power. Let’s build an example, a litmus test!
Let’s start with the Context pattern, where a bunch of stuff is grouped into a single struct, so that it can be threaded through the program as one parameter. Such a Context object is usually scoped to a particular operation — the ultimate owner of the Context is a local variable in some top-level “main” function, it is threaded as &Context or &mut Context everywhere, and usually isn’t stored anywhere. For the &Context variant, it is also customary to add some interior mutability for things like caches. One real-life example would be the Config type from Cargo: config/mod.rs#L168.
Distilling the pattern down, we get something like this:
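(A minimal sketch; the exact fields don’t matter, only the Cell and the .await do.)

```rust
use std::cell::Cell;

struct Context {
    verbosity: u32,
    // Interior mutability for a cache-like counter.
    files_read: Cell<i32>,
}

async fn f(context: &Context) {
    context.files_read.set(context.files_read.get() + 1);
    // Pretend there is some async I/O here; the borrow of `context`
    // is held across the `.await`.
    tokio::task::yield_now().await;
    let _ = context.verbosity;
}

async fn task_main() {
    let context = Context { verbosity: 1, files_read: Cell::new(0) };
    f(&context).await;
}
```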
Do you see the problem? Surprisingly, even rustc doesn’t see it: the code above compiles in isolation. However, when we start using it with Tokio’s work-stealing runtime, we get:
```
error: future cannot be sent between threads safely
  --> src/main.rs:29:18
   |
   | tokio::spawn(task_main());
   |              ^^^^^^^^^^^ future returned by `task_main` is not `Send`
   |
   = note: within `Context`, the trait `Sync` is not implemented for `Cell<i32>`.
           if you want to do aliasing and mutation between multiple threads,
           use `std::sync::RwLock` or `std::sync::atomic::AtomicI32` instead
```
What happened here? When compiling async fn f, the compiler reifies its stack frame as a Rust struct.
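Roughly like this (a hand-written approximation; the real state machine has more fields and states):

```rust
// Everything that is live across an `.await` ends up stored in the future,
// including the borrow of `Context`.
struct FStackFrame<'a> {
    context: &'a Context,
    // ... plus the suspension point and the rest of the live locals
}
```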
This struct contains a reference to our Context type, and then Context: !Sync implies &Context: !Send, which implies FStackFrame<'_>: !Send. And that finally clashes with the signature of tokio::spawn.
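Tokio documents (roughly) these bounds on spawn:

```rust
pub fn spawn<T>(future: T) -> JoinHandle<T::Output>
where
    T: Future + Send + 'static,
    T::Output: Send + 'static,
```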
Tokio’s default executor is work-stealing. It’s going to poll the future from different threads, and that’s why it is required that the future is Send.
In my eyes this is a rather significant limitation, and a big difference from synchronous Rust. Async Rust has to be defensively thread-safe, while sync Rust is free to use non-thread-safe data structures when convenient.
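Contrast this with a hypothetical spawn that, like thread::spawn, takes a closure which creates the future. To be clear, this signature is a strawman of mine, not an API Tokio actually provides:

```rust
fn spawn<F, Fut>(f: F) -> JoinHandle<Fut::Output>
where
    // Only the closure crosses the thread boundary...
    F: FnOnce() -> Fut + Send + 'static,
    // ...the future it creates is allowed to be !Send.
    Fut: Future + 'static,
    Fut::Output: Send + 'static,
```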
Let me explain first why this works, and then why this can’t work.
A Future is essentially a stack frame of an asynchronous function. The original Tokio version requires that all such stack frames are thread-safe. This is not what happens in synchronous code — there, functions are free to put cells on their stacks. Send-ness is only checked when data is actually sent to a different thread, in Channel::send and thread::spawn. The spawn function in particular says nothing about the stack of the new thread. It only requires that the data used to create the first stack frame is Send.
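Concretely, std::thread::spawn constrains only the closure and its result:

```rust
pub fn spawn<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
```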
And that’s what we do in the async version: instead of spawning a future directly, the spawn function, just like the sync version, takes a closure. The closure is moved to a different execution context, so it must be : Send. The actual future created by the closure in the new context can be whatever. An async runtime is free to poll this future from different threads regardless of whether it is Send or Sync.
Async work-stealing still works for the same reason that blocking work-stealing works. Logical threads of execution can migrate between physical CPU cores because the OS restores the execution context when switching threads. Tasks can migrate between threads because the async runtime restores the execution context when switching tasks. Go is a proof that this is possible — goroutines migrate between different threads, but they are free to use on-stack non-thread-safe state. The pattern is clearly sound; the question is, can we express this fundamental soundness in Rust’s type system, like we managed to do for OS threads?
This is going to be tricky, because Send today absolutely means “same thread”, not “same execution context”. Here’s one example that would break:
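(The thread-local and the yield point below are mine, for illustration.)

```rust
use std::cell::Cell;
use std::rc::Rc;

thread_local! {
    // Each OS thread gets its own Rc; within one thread this is perfectly fine.
    static COUNTER: Rc<Cell<i32>> = Rc::new(Cell::new(0));
}

async fn tick() {
    // Grab a clone of *this* thread's Rc...
    let counter = COUNTER.with(|c| c.clone());
    // ...suspend; a work-stealing runtime may resume the task on another thread...
    tokio::task::yield_now().await;
    // ...and now two tasks that started on the same thread can hammer the same
    // non-atomic reference count (and the Cell) from two different threads.
    counter.set(counter.get() + 1);
}
```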
If the .await migrates to a different thread, we are in trouble: two tasks can start on the same thread, then diverge, but continue to hammer the same non-atomic reference count.
Another breakage example is various OS APIs that just mandate that things happen on a particular execution thread, like pthread_mutex_unlock. Though I think that the turtle those APIs stand on is thread locals again?
Can we fix it? As an absolute strawman proposal, let’s redefine Send & Sync in terms of abstract “execution contexts”, add OsThreadSend and OsThreadSync, and change APIs which involve thread locals to use the OsThread variants. It seems that everything else works?
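In code, the strawman could look something like this (nightly auto-trait syntax; the names are invented for the sake of the example):

```rust
#![feature(auto_traits, negative_impls)]

// Send/Sync would be re-read as "safe to move/share across *execution
// contexts*", while these auto traits keep the narrower "stays on one OS
// thread" meaning for the APIs that genuinely need it.
pub unsafe auto trait OsThreadSend {}
pub unsafe auto trait OsThreadSync {}

// Something tied to a real OS thread (thread locals, a pthread mutex guard)
// would opt out explicitly:
pub struct PthreadMutexGuard<'a>(&'a ());
impl<'a> !OsThreadSend for PthreadMutexGuard<'a> {}
```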
I would like to pose four questions to the wider async Rust community.
Does this work in theory? As far as I can tell, this does indeed work, but I am not an async expert. Am I missing something?
Ideally, I’d love to see small, self-contained litmus test examples that break OsThreadSend Rust.
Is this an important problem in practice to look into? On the one hand, people are quite successful with async Rust as it is. On the other hand, the expressivity gap here is real, and Rust, as a systems programming language, strives to minimize such gaps. And then there’s the fact that the failure mode today is rather nasty — although the actual type error is inside the f function, we learn about it only at the call site in main.
EDIT: I am also wondering — if we stop caring whether futures are : Send, does that mean we no longer need an explicit syntax for Send bounds in async traits?
Assuming that this idea does work, and we decide that we care enough to try to fix it, is there a backwards-compatible path we could take to make this a reality?
EDIT: to clarify, no way we are really adding a new auto-trait like OsThreadSend. But there could be some less invasive change to get the desired result. For example, a more promising approach is to expose some runtime hook for async runtimes to switch TLS, such that each task gets an independent copy of thread-local storage, as if task=thread.
Is it a new idea that !Send futures and work-stealing don’t conflict with each other? For me, that 22.05.2023 post was the first time I learned that having a &Cell<i32> in a future’s state machine does not preclude polling it from different OS threads. But there’s nothing particularly new there; the relevant APIs were stabilized years ago. Was this issue articulated and discussed back when async Rust was designed, or is it a genuinely new finding?
Update (2023-12-30): there was some discussion of the ideas on Zulip. It looks like this isn’t completely broken and that, indeed, thread-locals are the main principled obstacle.
I think I also got a clear picture of a solution for an ideal world, where we are not bound by backwards-compatibility requirements: make thread-local access unsafe. Specifically:
First, remove any references to OS threads from the definitions of Send and Sync. Instead, define them in terms of abstract concurrency. I am not well-versed enough in the formal side of things to understand precisely what that should entail, but I have a litmus test: the new definition should work for interrupt handlers in embedded. In OS and embedded programming, one needs to deal with interrupt handlers — code that is run by the CPU as a response to a hardware interrupt. When the CPU is interrupted, it saves the current execution context, runs the interrupt, and then restores the original context. Although it all happens on a single core and there are no OS threads in sight, the restrictions are similar to those of threads: an interrupt can arrive in the middle of a reference-counter upgrade. To rephrase: Sync should be a core trait. Right now it is defined in core, but its definition references OS threads — a concept no_std is agnostic about!
Second, replace the thread_local! macro with a #[thread_local] attribute on (unsafe) statics. There are two reasons why people reach for thread locals:
- to implement really fast concurrent data structures (e.g., a global allocator or an async runtime),
- as a programming shortcut, to avoid passing a Context argument everywhere.
The thread_local! macro mostly addresses the second use-case — for a very long time, it even was a non-zero-cost abstraction, so implementing a fast allocator in Rust was impossible! But, given that this pattern is rare in software (and, where it is used, it then takes years to refactor it away, as was the case with rustc’s usage of thread locals for the parsing session), I think it’s OK to say that Rust flat-out doesn’t support it safely, like it doesn’t support mutable statics.
The safety contract for #[thread_local] statics would be more strict than the contract on static mut: the user must also ensure that the value isn’t used past the corresponding thread’s lifetime.
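A sketch of the shape I have in mind, using today’s unstable #[thread_local] attribute for illustration (making the access unsafe is the proposed part, not current nightly behaviour):

```rust
#![feature(thread_local)]
use std::cell::Cell;

// Today's safe, but comparatively heavyweight, macro:
thread_local! {
    static FILES_READ: Cell<u32> = Cell::new(0);
}

// The proposed shape: a plain per-thread static, morally similar to
// `static mut`. Every access would be unsafe, with the caller promising
// that no reference outlives the owning thread.
#[thread_local]
static FILES_READ_RAW: Cell<u32> = Cell::new(0);

fn bump() {
    FILES_READ.with(|c| c.set(c.get() + 1));
    // Under the proposal, this line would require an `unsafe` block:
    FILES_READ_RAW.set(FILES_READ_RAW.get() + 1);
}
```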