
Present: Using Tokio for CPU-Bound Tasks (Works Really Well) at Tokio Conf 2026 #19770

@alamb

Description

Is your feature request related to a problem or challenge?

At @akurmustafa's great suggestion on #18260, I submitted a talk proposal to the first TokioConf about using Tokio as the DataFusion CPU runtime engine, and it was accepted.

Here is more detail
https://www.tokioconf.com/speakers

Here is the talk summary

Using Tokio for CPU-Bound Tasks (Works Really Well)

The Tokio runtime at the heart of the Rust async ecosystem is also a good choice for CPU-heavy jobs such as those found in analytics engines. We will review what makes Tokio a compelling choice for CPU-bound workloads, address common concerns, and report on our experience using Tokio as the thread scheduler for Apache DataFusion.

Describe the solution you'd like

I want to create this talk and slides in the open. If the talks aren't recorded, I will also record a second version of the talk.

Describe alternatives you've considered

The high-level idea is to summarize the findings in
https://www.influxdata.com/blog/using-rustlangs-async-tokio-runtime-for-cpu-bound-tasks/
and then refresh the major pitfalls.

Talk outline:

  1. Analytic DB 101 + Volcano Model: Explain the DataFusion execution model (data flow graphs and vectorized execution)
  2. Explain why people thought using Tokio for CPU-bound work was a bad idea, and the counterarguments
  3. Demonstrate how Tokio's scheduler effectively implements the "get_next_batch()" API on the same thread (a sketch follows this list)
  4. Discuss pitfalls
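
To make point 3 concrete, here is a minimal, illustrative sketch (not DataFusion's actual ExecutionPlan / RecordBatchStream API) of how a Volcano-style pull loop maps onto polling an async Stream. The `Batch` type and `filter_op` operator are stand-ins I made up for the example; the point is that each `next().await` pulls one batch through the operator chain, and Tokio keeps the chain on the same worker thread as long as nothing blocks.

```rust
use futures::stream::{self, Stream, StreamExt};

// Stand-in for a RecordBatch: just a vector of values.
type Batch = Vec<i64>;

// A "filter" operator: wraps its child stream and transforms each batch it pulls.
fn filter_op(child: impl Stream<Item = Batch>) -> impl Stream<Item = Batch> {
    child.map(|batch| batch.into_iter().filter(|v| *v % 2 == 0).collect::<Batch>())
}

#[tokio::main]
async fn main() {
    // Leaf "scan" operator: produces three batches.
    let scan = stream::iter(vec![vec![1, 2, 3], vec![4, 5, 6], vec![7, 8, 9]]);

    // The consumer drives the plan by repeatedly asking for the next batch:
    // Volcano's get_next_batch(), expressed as poll_next under the hood.
    let mut plan = Box::pin(filter_op(scan));
    while let Some(batch) = plan.next().await {
        println!("got batch: {:?}", batch);
    }
}
```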

Major pitfall 1: Using the same async runtime for IO and CPU-bound tasks

  • Explain symptoms (everything just slows down under high concurrency); the theory is that this is due to the network protocol's congestion control
  • Explain the solution: use separate runtimes and thread the CPU runtime through (see the sketch after this list)
  • TODO: find DF example of multiple runtimes
  • TODO: mention the challenge of having to pass a new runtime to different IO libraries (object_store, etc)
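
As a rough illustration of the "separate runtimes" solution (this is a sketch of the pattern from the InfluxData post, not a DataFusion API; the names io_rt, cpu_rt, and the worker-thread counts are arbitrary, and it assumes tokio with the multi-thread runtime enabled): build one runtime for IO and a second dedicated runtime for CPU work, spawn the heavy work onto the CPU runtime, and await the result over a oneshot channel so the IO workers never stall.

```rust
use tokio::runtime::Builder;
use tokio::sync::oneshot;

fn main() {
    // Two independent runtimes: one for IO, one for CPU-bound work.
    let io_rt = Builder::new_multi_thread().enable_all().build().unwrap();
    let cpu_rt = Builder::new_multi_thread()
        .worker_threads(4)
        .thread_name("cpu-worker")
        .build()
        .unwrap();
    let cpu = cpu_rt.handle().clone();

    io_rt.block_on(async move {
        // Pretend this batch arrived from the network on the IO runtime...
        let batch: Vec<i64> = (0..1_000_000).collect();

        // ...and hand the expensive aggregation to the CPU runtime, then
        // await the result without tying up an IO worker thread.
        let (tx, rx) = oneshot::channel();
        cpu.spawn(async move {
            // Ignore the send error: the caller may have been cancelled.
            let _ = tx.send(batch.iter().sum::<i64>());
        });
        let sum: i64 = rx.await.expect("CPU task panicked");
        println!("sum = {}", sum);
    });
}
```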

Major pitfall 2: Hot loops and cancelling

  • Basically summarize the contents of https://datafusion.apache.org/blog/2025/06/30/cancellation/ from @pepijnve
  • Explain symptoms: the query is cancelled but the plan keeps running
  • Solution 1: (obvious one) no hot loops
  • Solution 2: (less obvious) make sure we periodically yield back to the scheduler (otherwise tasks keep running and the scheduler never gets a chance to notice that the consumers have been dropped); see the sketch below
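
A small sketch of solution 2 (illustrative only; build_hash_table is a hypothetical stand-in for an operator's hot loop, and the 8192-row yield interval is arbitrary): yielding periodically gives Tokio an await point at which an aborted task can actually be cancelled, instead of running to completion.

```rust
use std::time::Duration;
use tokio::task;

async fn build_hash_table(rows: u64) -> u64 {
    let mut sum = 0u64;
    for i in 0..rows {
        // ... pretend this is expensive per-row work ...
        sum = sum.wrapping_add(i);

        // Every 8192 rows, give control back to the scheduler. If the task
        // has been aborted, it is cancelled here rather than finishing the loop.
        if i % 8192 == 0 {
            task::yield_now().await;
        }
    }
    sum
}

#[tokio::main]
async fn main() {
    let handle = tokio::spawn(build_hash_table(1_000_000_000));

    // Simulate the user cancelling the query shortly after it starts.
    tokio::time::sleep(Duration::from_millis(10)).await;
    handle.abort();

    match handle.await {
        Ok(sum) => println!("finished: {}", sum),
        Err(e) if e.is_cancelled() => println!("query was cancelled promptly"),
        Err(e) => println!("task failed: {}", e),
    }
}
```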

Additional context

No response
