# Non-Send Futures When?
Ever since reading "What If We Pretended That a Task = Thread?", I can't stop thinking about borrowing non-`Sync` data across `.await`. In this post, I'd love to take one more look at the problem.
## Send And Sync
To warm up, a refresher on the `Send` and `Sync` auto-traits. These traits are a library feature that enables fearless concurrency — a statically checked guarantee that non-thread-safe data structures don't escape from their original thread.

Why do we need two traits, rather than just a single `ThreadSafe`? Because there are two degrees of thread-unsafety.
Some types are fine to use from multiple threads, as long as only a single thread at a time uses a particular value. An example here would be a `Cell<i32>`. If two threads have a reference to a cell at the same time, a `&Cell<i32>`, we are in trouble — `Cell`'s loads and stores are not atomic and are UB by definition if used concurrently. However, if two different threads have exclusive access to a `Cell`, that's fine — because the access is exclusive, it necessarily means that it is not simultaneous. That is, it's OK for thread A to send a `Cell<i32>` to a different thread B, as long as A itself loses access to the cell.
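To make the send-but-not-share distinction concrete, here's a minimal sketch (the helper name and the value `92` are illustrative):

```rust
use std::cell::Cell;
use std::thread;

// Sending a `Cell` by value is fine: thread A loses access when the
// closure captures the cell by move, so access stays exclusive.
fn send_cell_demo() -> i32 {
    let cell = Cell::new(92);
    thread::spawn(move || {
        // Thread B now has exclusive access.
        cell.set(cell.get() + 1);
        cell.get()
    })
    .join()
    .unwrap()
}
```

By contrast, trying to capture the cell by reference would hand a `&Cell<i32>` to another thread, which the compiler rejects.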
But there are also types which are unsafe to use from multiple threads even if only a single thread at a time has access to a value. An example here would be an `Arc<Cell<i32>>`. It's not possible to safely send such an `Arc` to a different thread, because a `.clone` call can be used to get an independent copy of the `Arc`, effectively turning a send operation into a share operation.
But it turns out both cases are covered by just a single trait, `Send`. The thing is, to share a `Cell<i32>` across two threads, it is necessary to send a `&Cell<i32>`. So we get the following table:

| `Send` | `!Send` |
|---|---|
| `Cell<i32>` | `&Cell<i32>` |
| `i32` | `Arc<Cell<i32>>` |
| `&i32` | `&Arc<Cell<i32>>` |
If `T` is `Send`, `&T` might or might not be `Send`. And that's where the `Sync` trait comes from: `&T: Send` if and only if (iff) `T: Sync`. Which gives the following table:

| | `Send` | `!Send` |
|---|---|---|
| `Sync` | `i32` | |
| `!Sync` | `Cell<i32>` | `Arc<Cell<i32>>` |
What about that last empty cell? Types which are `Sync` and `!Send` are indeed quite rare, and I don't know of examples which don't boil down to "the underlying API mandates that a type doesn't leave a thread". One example here would be `MutexGuard` from the standard library — pthreads require that only the thread that originally locked a mutex can unlock it. This isn't a fundamental requirement for a mutex — a `MutexGuard` from parking_lot can be `Send`.
## Thread Safety And Async
As you see, the `Send` & `Sync` infrastructure is quite intricate. Is it worth it? Absolutely, as it leads to simpler code. In Rust, you can explicitly designate certain parts of a code base as non-thread-safe, and then avoid worrying about threads, because the compiler will catch you red-handed if you accidentally violate this constraint.

The power of Rust is not defensively making everything thread-safe; it's the ability to use thread-unsafe code fearlessly.
And it seems like `async` doesn't quite have this power. Let's build an example, a litmus test!

Let's start with a `Context` pattern, where a bunch of stuff is grouped into a single struct, so that they can be threaded through the program as one parameter. Such a `Context` object is usually scoped to a particular operation — the ultimate owner of `Context` is a local variable in some top-level "main" function, it is threaded as `&Context` or `&mut Context` everywhere, and usually isn't stored anywhere. For the `&Context` variant, it is also customary to add some interior mutability for things like caches. One real-life example would be the `Config` type from Cargo: config/mod.rs#L168.

Distilling the pattern down, we get something like this:
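A minimal sketch (the field name is illustrative):

```rust
use std::cell::Cell;

// A distilled `Context`: state threaded through the program as one
// parameter, with a bit of interior mutability for bookkeeping.
struct Context {
    counter: Cell<u32>,
}
```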
Here, `counter` is an interior-mutable value which could, e.g., track the cache hit rate. And here's how this type could be used:
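A sketch of the synchronous usage, assuming a `Context` with an interior-mutable `counter` as described:

```rust
use std::cell::Cell;

struct Context {
    counter: Cell<u32>,
}

// `f` only needs a shared reference, yet it can update the counter
// thanks to interior mutability.
fn f(context: &Context) {
    context.counter.set(context.counter.get() + 1);
}

fn run() -> u32 {
    // The ultimate owner of `Context` is a local variable in some
    // top-level "main" function...
    let context = Context { counter: Cell::new(0) };
    // ...and it is threaded everywhere as `&Context`.
    f(&context);
    f(&context);
    context.counter.get()
}
```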
However, the async version of the code doesn’t really work, and in a subtle way:
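A sketch of the async version, with a stand-in `step` future for some real asynchronous work:

```rust
use std::cell::Cell;

struct Context {
    counter: Cell<u32>,
}

async fn step() {} // stand-in for some real asynchronous work

// Compiles fine in isolation: nothing here looks thread-unsafe...
async fn f(context: &Context) {
    context.counter.set(context.counter.get() + 1);
    step().await; // ...but `&Context` is live across this `.await`.
    context.counter.set(context.counter.get() + 1);
}
```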
Do you see the problem? Surprisingly, even rustc doesn't see it; the code above compiles in isolation. However, when we start using it with Tokio's work-stealing runtime, we'll hit an error:
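A sketch of the triggering call site, using the `Context`/`f` pair from above (the error text is paraphrased and varies by compiler version):

```rust
// Does not compile with a work-stealing runtime:
#[tokio::main]
async fn main() {
    let context = Context { counter: Cell::new(0) };
    let handle = tokio::spawn(async move {
        // `&Context` is held across an `.await` inside `f`, so this
        // future is `!Send`. The error is roughly:
        //   error[E0277]: `Cell<u32>` cannot be shared between threads safely
        f(&context).await;
    });
    handle.await.unwrap();
}
```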
What happened here? When compiling `async fn f`, the compiler reifies its stack frame as a Rust struct:
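Roughly like this (the real generated type is anonymous; this hand-written `FStackFrame` is just an approximation):

```rust
// Holds the locals of `f` that are live across suspension points,
// plus the resume point and the state of any sub-future.
struct FStackFrame<'a> {
    context: &'a Context,
    // ... resume point, sub-future state, etc.
}
```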
This struct contains a reference to our `Context` type, and then `Context: !Sync` implies `&Context: !Send` implies `FStackFrame<'_>: !Send`. And that finally clashes with the signature of `tokio::spawn`:
Tokio's default executor is work-stealing. It's going to poll the future from different threads, and that's why it is required that the future is `Send`.
In my eyes this is a rather significant limitation, and a big difference from synchronous Rust. Async Rust has to be defensively thread-safe, while sync Rust is free to use non-thread-safe data structures when convenient.
## A Better Spawn
One solution here is to avoid work-stealing executors: Local Async Executors and Why They Should be the Default. That post correctly identifies the culprit. But as for the fix, I think Auri (blaz.is) got it right. The fix is not to remove the `+ Send` bound, but rather to mirror `std::thread::spawn` more closely:
```rust
// std::thread::spawn
pub fn spawn<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,

// A hypothetical better async spawn
pub fn spawn<F, Fut>(f: F) -> JoinHandle<Fut::Output>
where
    F: FnOnce() -> Fut + Send + 'static,
    Fut: Future,
    Fut::Output: Send + 'static,
```
Let me explain first why this works, and then why this can't work.

A `Future` is essentially a stack frame of an asynchronous function. The original Tokio version requires that all such stack frames are thread-safe. This is not what happens in synchronous code — there, functions are free to put cells on their stacks. `Send`ness is only guarded when data is actually sent to a different thread, in `Channel::send` and `thread::spawn`. The `spawn` function in particular says nothing about the stack of a new thread. It only requires that the data used to create the first stack frame is `Send`.
And that's what we do in the async version: instead of spawning a future directly, the new spawn, just like the sync version, takes a closure. The closure is moved to a different execution context, so it must be `: Send`. The actual future created by the closure in the new context can be whatever. An async runtime is free to poll this future from different threads regardless of its `Sync` status.
Async work-stealing still works for the same reason that blocking work-stealing works. Logical threads of execution can migrate between physical CPU cores because the OS restores execution context when switching threads. Tasks can migrate between threads because the async runtime restores execution context when switching tasks. Go is a proof that this is possible — goroutines migrate between different threads, but they are free to use on-stack non-thread-safe state. The pattern is clearly sound; the question is, can we express this fundamental soundness in Rust's type system, like we managed to do for OS threads?
This is going to be tricky, because `Send` today absolutely means "same thread", not "same execution context". Here's one example that would break:
```rust
use std::rc::Rc;

async fn sneaky() {
    thread_local! { static TL: Rc<()> = Rc::new(()); }
    let rc = TL.with(|it| it.clone());
    async {}.await;
    rc.clone();
}
```
If the `.await` migrates to a different thread, we are in trouble: two tasks can start on the same thread, then diverge, but continue to hammer the same non-atomic reference count.

Another breakage example is various OS APIs that just mandate that things happen on a particular execution thread, like `pthread_mutex_unlock`. Though I think that the turtle those APIs stand on is thread locals again?
Can we fix it? As an absolute strawman proposal, let's redefine `Send` & `Sync` in terms of abstract "execution contexts", add `OsThreadSend` and `OsThreadSync`, and change APIs which involve thread locals to use the `OsThread` variants. It seems that everything else works?
## Four Questions
I would like to posit four questions to the wider async Rust community.

- Does this work in theory? As far as I can tell, this does indeed work, but I am not an async expert. Am I missing something? Ideally, I'd love to see small, self-contained litmus test examples that break `OsThreadSend` Rust.
- Is this an important problem in practice to look into? On the one hand, people are quite successful with async Rust as it is. On the other hand, the expressivity gap here is real, and Rust, as a systems programming language, strives to minimize such gaps. And then there's the fact that the failure mode today is rather nasty — although the actual type error is inside the `f` function, we learn about it only at the call site in `main`. EDIT: I am also wondering — if we stop caring whether futures are `: Send`, does that mean we no longer need an explicit syntax for `Send` bounds in async traits?
- Assuming that this idea does work, and we decide that we care enough to try to fix it, is there a backwards-compatible path we could take to make this a reality? EDIT: to clarify, there's no way we are really adding a new auto-trait like `OsThreadSend`. But there could be some less invasive change to get the desired result. For example, a more promising approach is to expose some runtime hook for async runtimes to switch TLS, such that each task gets an independent copy of thread-local storage, as if task = thread.
- Is it a new idea that `!Send` futures and work-stealing don't conflict with each other? For me, that 22.05.2023 post was the first time I learned that having a `&Cell<i32>` in a future's state machine does not preclude polling it from different OS threads. But there's nothing particularly new there; the relevant APIs were stabilized years ago. Was this issue articulated and discussed back when async Rust was designed, or is it a genuinely new finding?
Update (2023-12-30): there was some discussion of the ideas on Zulip. It looks like this isn't completely broken and that, indeed, thread-locals are the main principled obstacle.

I think I also got a clear picture of a solution for an ideal world, where we are not bound by backwards-compatibility requirements: make thread-local access unsafe. Specifically:
First, remove any references to OS threads from the definition of `Send` and `Sync`. Instead, define them in terms of abstract concurrency. I am not well-versed enough in the formal side of things to understand precisely what that should entail, but I have a litmus test: the new definition should work for interrupt handlers in embedded. In OS and embedded programming, one needs to deal with interrupt handlers — code that is run by a CPU as a response to a hardware interrupt. When the CPU is interrupted, it saves the current execution context, runs the interrupt, and then restores the original context. Although it all happens on a single core and there are no OS threads in sight, the restrictions are similar to those of threads: an interrupt can arrive in the middle of a reference-count upgrade. To rephrase: `Sync` should be a `core` trait. Right now it is defined in `core`, but its definition references OS threads — a concept `no_std` is agnostic about!
Second, replace the `thread_local!` macro with a `#[thread_local]` attribute on (unsafe) statics. There are two reasons why people reach for thread locals:

- to implement really fast concurrent data structures (e.g., a global allocator or an async runtime),
- as a programming shortcut, to avoid passing a `Context` argument everywhere.
The `thread_local!` macro mostly addresses the second use-case — for a very long time, it even was a non-zero-cost abstraction, such that implementing a fast allocator in Rust was impossible! But, given that this pattern is rare in software (and, where it is used, it then takes years to refactor it away, as was the case with rustc's usage of thread locals for the parsing session), I think it's OK to say that Rust flat-out doesn't support it safely, just like it doesn't support mutable statics.
The safety contract for `#[thread_local]` statics would be stricter than the contract on `static mut`: the user must also ensure that the value isn't used past the corresponding thread's lifetime.