
Concurrency Models

Threads, event loops, async/await, actors, CSP — and when to pick each.



Concurrency is overlapping multiple tasks in time. Parallelism is actually running them simultaneously on separate cores. Most programs need concurrency; some also need parallelism. The model you pick determines cost per task, scheduling guarantees, and the kind of bugs you'll encounter.

Analogy

Think of a barista running a morning rush. Concurrency is one barista juggling seven drinks — grind, steam, pour, grind, steam — none of the drinks get their full attention but all of them progress. Parallelism is hiring a second barista with a second machine so two drinks can actually be pulling espresso at the exact same second. A single barista can be wildly concurrent without ever being parallel, and a two-person team can still deadlock if they both reach for the last jug of oat milk.

OS threads

The kernel schedules threads onto cores. With N cores and N CPU-bound threads you get real parallelism. Each thread owns a stack (typically 1–2 MB by default), and synchronisation goes through locks, mutexes, and condition variables.

pthread_t t;
pthread_create(&t, NULL, worker, arg);  /* worker has signature void *worker(void *) */
pthread_join(t, NULL);                  /* block until worker returns */

Strengths: real parallelism, straightforward mental model. Weaknesses: threads are expensive (thousands is borderline, millions is impossible), context switches cost microseconds, and lock-heavy code is hard to reason about.

When to use: CPU-bound workloads with a small number of long-running tasks — video encoding, scientific compute, a database engine's worker pool.
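The create/join lifecycle above looks much the same in most languages. Here is a minimal sketch using Python's threading module (the summing workload is a placeholder; note that CPython's GIL serialises pure-Python CPU work, so true parallelism there needs multiprocessing or native extensions):

```python
import threading

results = []

def worker(n):
    # Stand-in for real work. list.append is atomic under the GIL,
    # so this particular shared mutation needs no explicit lock.
    results.append(sum(range(n)))

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()   # spawn: each thread gets its own kernel-scheduled stack
for t in threads:
    t.join()    # join: block until the thread finishes

# results == [49995000] * 4
```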

Event loops

One thread. A big loop that asks the kernel (epoll on Linux, kqueue on BSD/macOS, IOCP on Windows) "which of my file descriptors is ready?" and runs the handler. Node.js, nginx, redis-server are all event-loop-based.

loop:
  events = epoll_wait(fds)
  for event in events:
    callback(event)

Each connection is just a socket + a callback. Ten thousand idle websockets cost nearly nothing. Strengths: scales to enormous numbers of idle connections; no locks because there's one thread. Weaknesses: any blocking call on that thread freezes the server; you can't use more than one core without offloading (worker threads, cluster).

When to use: I/O-bound services with huge fanout — chat servers, proxies, real-time dashboards, front-door gateways.
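The pseudocode above maps directly onto Python's selectors module, which wraps epoll/kqueue/IOCP behind one API. A minimal sketch showing one turn of the loop over a local socket pair (on_ready is a made-up handler):

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on macOS/BSD
results = []

def on_ready(sock):
    # Handlers run only when the kernel reports the fd as readable,
    # so recv() here will not block.
    results.append(sock.recv(1024))

a, b = socket.socketpair()
sel.register(b, selectors.EVENT_READ, on_ready)
a.send(b"hello")

# One turn of the loop: ask the kernel which fds are ready, run their callbacks.
for key, _events in sel.select(timeout=1):
    key.data(key.fileobj)

# results == [b"hello"]
```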

async / await

Syntactic sugar for "suspend this function until I/O is ready, resume on the same or another thread." Under the hood, the compiler rewrites the function into a state machine. Rust, Python, C#, JavaScript, Swift, Kotlin all have it.

async function handle(req: Request) {
  const user = await db.getUser(req.userId);
  const posts = await db.getPostsByUser(user.id);
  return { user, posts };
}

The code reads sequentially; the runtime interleaves many such calls on a small OS-thread pool. Strengths: the ergonomics of synchronous code with the throughput of an event loop; usually multi-threaded underneath, so you get some parallelism. Weaknesses: function colouring (sync code can't await an async function, so asyncness spreads up the call stack), and harder debugging when a stack trace is lost across an await.

When to use: general-purpose I/O services, especially where you have a mix of fanout widths and want readable code.
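A Python sketch of the same shape using asyncio, with fetch_user standing in for a real database call: three handlers interleave on one event loop, so total latency is roughly one simulated round trip rather than three.

```python
import asyncio

async def fetch_user(uid):
    await asyncio.sleep(0.01)   # stand-in for a database round trip
    return {"id": uid}

async def handle(uid):
    # Reads sequentially, but many handle() calls interleave on one loop.
    user = await fetch_user(uid)
    return user

async def main():
    # Run three "requests" concurrently; gather preserves argument order.
    return await asyncio.gather(*(handle(i) for i in range(3)))

results = asyncio.run(main())
# results == [{"id": 0}, {"id": 1}, {"id": 2}]
```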

Actor model

Each unit of work is an actor — a lightweight process with its own mailbox. Actors never share memory; they communicate by sending messages. Erlang, Elixir, and Akka on the JVM are the reference implementations.

Pid = spawn(fun() -> loop([]) end),  % spawn a process with its own mailbox
Pid ! {set, 1}.                      % sending a message is the only way in

Strengths: total isolation. An actor crashing takes nothing else down. Supervision trees let you design explicit failure semantics ("let it crash"). Weaknesses: message-passing overhead; shared read-mostly state is awkward.

When to use: systems with natural message boundaries, high availability requirements, and heavy shared mutable state that would otherwise need aggressive locking.
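A toy actor in Python, using a thread as the actor and a Queue as its mailbox. Real runtimes like Erlang's BEAM use far lighter processes plus supervision; this only sketches the mailbox discipline: state is private, and replies travel back through a queue the sender provides.

```python
import queue
import threading

class Actor:
    """Minimal actor: private state, a mailbox, no shared memory."""

    def __init__(self):
        self.mailbox = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        state = []                       # only this thread ever touches state
        while True:
            msg = self.mailbox.get()
            if msg[0] == "set":
                state.append(msg[1])
            elif msg[0] == "get":
                msg[1].put(list(state))  # reply via the caller's queue
            elif msg[0] == "stop":
                return

    def send(self, msg):
        self.mailbox.put(msg)

a = Actor()
a.send(("set", 1))
reply = queue.Queue()
a.send(("get", reply))
print(reply.get(timeout=1))  # prints [1]
```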

CSP — Go channels

Communicating Sequential Processes: goroutines (millions of them, ~2 KB stacks that grow as needed) pass values through typed channels. The runtime multiplexes them onto a small pool of OS threads.

ch := make(chan int)
go produce(ch)  // produce must close(ch) when done, or the range below never exits
for v := range ch {
    process(v)
}

Strengths: very lightweight; the select statement is a clean way to multiplex. Weaknesses: deadlocks are easy if you're sloppy about channel ownership; shared mutation still needs sync.Mutex.

When to use: anything where "thousands of cheap concurrent tasks" is the right abstraction — crawlers, pipelines, streaming aggregators.
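The Go pattern above can be approximated in Python with a bounded Queue standing in for a channel and a sentinel standing in for close(ch); a sketch, not a real CSP runtime:

```python
import queue
import threading

def produce(ch):
    for v in range(3):
        ch.put(v)     # blocks when the buffer is full: backpressure
    ch.put(None)      # sentinel plays the role of close(ch)

ch = queue.Queue(maxsize=1)  # maxsize=1 behaves like a small buffered channel
threading.Thread(target=produce, args=(ch,)).start()

out = []
for v in iter(ch.get, None):  # receive until the sentinel arrives
    out.append(v * 10)

# out == [0, 10, 20]
```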

Picking a model

A rough decision tree:

  1. Heavy shared mutable state? → actors.
  2. Huge fanout of I/O? → event loop or async/await.
  3. Few, long-running CPU tasks? → OS threads.
  4. Many small CPU tasks? → CSP goroutines (or thread pools if the language lacks them).
  5. Mixed workload, nothing extreme? → async/await is the modern default.

Cost per task

Model                 Per-task cost        Max tasks (rule of thumb)
OS threads            ~1 MB stack          ~10k
Event-loop callback   ~1 KB                100k+
async/await future    ~hundreds of bytes   100k+
Actor (Erlang)        ~2 KB                millions
Goroutine             ~2 KB, grows         millions

Concurrency is a tool for structure; parallelism is a tool for speed. Pick the concurrency model that matches the workload shape, then worry about cores.