Goroutines Are Not Significantly Smaller Than Threads
The most commonly cited drawback of OS-level threads is that they use a lot of RAM. This is not true on Linux.
Let’s compare the memory footprint of 10_000 Linux threads with that of 10_000 goroutines. We spawn 10k workers, each of which runs for about 10 seconds, sleeping and waking up every 10 milliseconds. Each worker's start is staggered by a pseudorandom delay of up to 200 milliseconds to avoid the thundering herd problem.
We use the time utility to measure memory usage.
A thread is only 3 times as large as a goroutine. The absolute numbers matter too: 10k threads require only 100 megabytes of overhead, which works out to roughly 10 KB per thread. If the application does 10k concurrent things, 100 MB might be negligible.
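A process can also read this kind of figure about itself. The sketch below assumes the number of interest is peak resident set size, which on Linux is available through getrusage and is reported in kilobytes; the helper name peakRSSKiB is made up for this example.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// peakRSSKiB is a hypothetical helper that returns the peak resident set
// size of the current process. On Linux, ru_maxrss is in kilobytes.
func peakRSSKiB() (int64, error) {
	var ru syscall.Rusage
	if err := syscall.Getrusage(syscall.RUSAGE_SELF, &ru); err != nil {
		return 0, err
	}
	return int64(ru.Maxrss), nil
}

func main() {
	kib, err := peakRSSKiB()
	if err != nil {
		fmt.Fprintln(os.Stderr, "getrusage failed:", err)
		os.Exit(1)
	}
	fmt.Printf("peak RSS: %d KiB\n", kib)
}
```

Calling peakRSSKiB once before the workers start and once after they finish, then dividing the difference by the number of workers, gives a rough per-worker estimate.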
Note that it is wrong to use this benchmark to compare the performance of threads and goroutines. The workload is representative for measuring absolute memory overhead, but it is not representative of time overhead.
That being said, it is possible to explain why threads need 21 seconds of CPU time while goroutines need only 14. The Go runtime spawns a thread per CPU core, and tries hard to keep each goroutine tied to a specific thread (and, by extension, a specific CPU). Threads, by default, migrate between CPUs, which incurs synchronization overhead. Pinning threads to cores in a round-robin fashion removes this overhead.
With pinning, the total CPU time is approximately the same, but the distribution is different. On this workload, the goroutine scheduler spends roughly the same number of cycles in userspace as the thread scheduler spends in the kernel.
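Round-robin pinning can be illustrated with a small Linux-only sketch. It is not the benchmark's code: it uses goroutines locked to their OS threads as stand-ins for plain threads, and the helper names cpuFor and pinCurrentThread are assumptions made for this example. The raw sched_setaffinity system call is used so that only the standard library is needed.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
	"unsafe"
)

// cpuFor maps worker i onto one of ncpu cores in round-robin order.
func cpuFor(i, ncpu int) int { return i % ncpu }

// pinCurrentThread restricts the calling OS thread to a single CPU via the
// Linux sched_setaffinity system call (pid 0 means "the calling thread").
func pinCurrentThread(cpu int) error {
	var mask [16]uint64 // bitmask with room for 1024 CPUs
	if cpu < 0 || cpu >= len(mask)*64 {
		return fmt.Errorf("cpu %d out of range", cpu)
	}
	mask[cpu/64] |= uint64(1) << uint(cpu%64)
	_, _, errno := syscall.RawSyscall(syscall.SYS_SCHED_SETAFFINITY,
		0, unsafe.Sizeof(mask), uintptr(unsafe.Pointer(&mask[0])))
	if errno != 0 {
		return errno
	}
	return nil
}

func main() {
	ncpu := runtime.NumCPU()
	const workers = 4
	done := make(chan string)
	for i := 0; i < workers; i++ {
		go func(i int) {
			// Keep this goroutine on one OS thread. The thread is never
			// unlocked, so the runtime discards it when the goroutine exits.
			runtime.LockOSThread()
			cpu := cpuFor(i, ncpu)
			if err := pinCurrentThread(cpu); err != nil {
				done <- fmt.Sprintf("worker %d: pinning to CPU %d failed: %v", i, cpu, err)
				return
			}
			done <- fmt.Sprintf("worker %d pinned to CPU %d", i, cpu)
		}(i)
	}
	for i := 0; i < workers; i++ {
		fmt.Println(<-done)
	}
}
```

Pinning can fail when the process is restricted to a subset of CPUs (for example inside a container), so the sketch reports the error instead of aborting.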
Code for the benchmarks is available here: matklad/10k_linux_threads.