Description
Is your feature request related to a problem or challenge?
At @akurmustafa's great suggestion on #18260, I submitted a proposal and was accepted to speak at the first TokioConf about using Tokio as the DataFusion CPU runtime engine.
Here is more detail
https://www.tokioconf.com/speakers
Here is the talk summary
Using Tokio for CPU-Bound Tasks (Works Really Well)
The Tokio runtime at the heart of the Rust async ecosystem is also a good choice for CPU-heavy jobs such as those found in analytics engines. We will review what makes Tokio a compelling choice for CPU-bound workloads, address common concerns, and report on our experience using Tokio as the thread scheduler for Apache DataFusion.
Describe the solution you'd like
I want to create this talk and its slides in the open. If the talks aren't recorded, I will also record a second version of the talk.
Describe alternatives you've considered
The high-level idea is to summarize the findings in
https://www.influxdata.com/blog/using-rustlangs-async-tokio-runtime-for-cpu-bound-tasks/
and then revisit the major pitfalls.
Talk outline:
- Analytic DB 101 + Volcano Model: Explain DataFusion's execution model (data flow graphs and vectorized execution)
- Explain why people thought using Tokio for CPU-bound work was a bad idea, and the counterarguments
- Demonstrate how Tokio's scheduler effectively implements the "get_next_batch()" API on the same thread (see the sketch after this list)
- Discuss pitfalls
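
To make the same-thread `get_next_batch()` point concrete, here is a minimal sketch of a Volcano-style filter operator written as a Rust `Stream`. The types and names (`Batch`, `FilterExec`) are illustrative rather than DataFusion's actual operator API; the point is that `poll_next` plays the role of `get_next_batch()`, and the Tokio worker that polls the top of the plan polls its child on the same thread:

```rust
use std::pin::Pin;
use std::task::{Context, Poll};

use futures::stream::{Stream, StreamExt};

/// Stand-in for a vectorized batch of rows (hypothetical type).
struct Batch(Vec<i64>);

/// A "filter" operator that pulls batches from its child and keeps matching rows.
struct FilterExec<S> {
    input: S,
}

impl<S: Stream<Item = Batch> + Unpin> Stream for FilterExec<S> {
    type Item = Batch;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Batch>> {
        let this = self.get_mut();
        // The moral equivalent of `get_next_batch()`: ask the child operator
        // for its next batch, on the same worker thread that polled us.
        match this.input.poll_next_unpin(cx) {
            Poll::Ready(Some(batch)) => {
                // CPU-bound, vectorized work happens right here on the worker thread.
                let kept: Vec<i64> = batch.0.into_iter().filter(|v| *v > 0).collect();
                Poll::Ready(Some(Batch(kept)))
            }
            other => other,
        }
    }
}

#[tokio::main]
async fn main() {
    // A tiny two-operator "plan": a scan (stream of batches) feeding a filter.
    let input = futures::stream::iter(vec![Batch(vec![-1, 2, 3]), Batch(vec![4, -5])]);
    let mut plan = FilterExec { input };
    while let Some(batch) = plan.next().await {
        println!("{} rows", batch.0.len());
    }
}
```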
Major pitfall 1: Using the same async runtime for IO and CPU-bound tasks
- Explain symptoms (everything just slows down under high concurrency); the theory is that this is due to the network protocol's congestion control
- Explain solution: use separate runtimes and thread the dedicated runtime's handle through the execution path (see the sketch after this list)
- TODO: find DF example of multiple runtimes
- TODO: mention the challenge of having to pass a new runtime to different IO libraries (object_store, etc.)
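
For pitfall 1, the slides could include a minimal sketch of the separate-runtime pattern (the structure is illustrative, not an existing DataFusion API): the main Tokio runtime handles network IO, while CPU-bound query execution is spawned onto a dedicated runtime and its `JoinHandle` is awaited from the IO side:

```rust
use std::thread::available_parallelism;

use tokio::runtime::{Builder, Runtime};

/// Build a Tokio runtime reserved for CPU-bound query execution.
fn cpu_runtime() -> Runtime {
    Builder::new_multi_thread()
        .thread_name("cpu-worker")
        .worker_threads(available_parallelism().map(|n| n.get()).unwrap_or(4))
        .enable_all()
        .build()
        .expect("failed to build CPU runtime")
}

#[tokio::main] // the "IO" runtime: handles network connections, RPC, etc.
async fn main() {
    let cpu_rt = cpu_runtime();

    // Hand the CPU-heavy work (e.g. executing a query plan) to the dedicated
    // runtime, and await the result from the IO runtime; the JoinHandle is
    // itself a future, so the two runtimes compose naturally.
    let join = cpu_rt.spawn(async {
        (0..10_000_000u64).sum::<u64>() // stand-in for query execution
    });

    let result = join.await.expect("CPU task panicked");
    println!("result = {result}");

    // Shut the CPU runtime down without blocking an IO worker thread
    // (dropping a Runtime inside async code would panic).
    cpu_rt.shutdown_background();
}
```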
Major pitfall 2: Hot loops and cancelling
- Basically summarize the contents of https://datafusion.apache.org/blog/2025/06/30/cancellation/ from @pepijnve
- Explain symptoms: the query is cancelled but the plan keeps running
- Solution 1: (obvious one) no hot loops
- Solution 2: (less obvious) make sure we periodically yield back to the scheduler; otherwise tasks keep running and the scheduler never gets a chance to notice that the consumers have been dropped (see the sketch after this list)
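
For pitfall 2, a minimal sketch of the periodic-yield idea (the loop body is a stand-in for real per-batch work, not DataFusion code): yielding creates an await point where an aborted task, or one whose consumer has been dropped, actually stops instead of running to completion:

```rust
use tokio::task::yield_now;

/// Pretend per-batch CPU work (hash, sort, aggregate, ...).
fn process_batch(i: u64) -> u64 {
    (0..10_000).fold(i, |acc, x| acc.wrapping_mul(31).wrapping_add(x))
}

async fn run_hot_loop(num_batches: u64) -> u64 {
    let mut acc = 0u64;
    for i in 0..num_batches {
        acc = acc.wrapping_add(process_batch(i));

        // Yield every 64 batches: this is an await point, so an abort (or a
        // dropped consumer in a stream-based plan) takes effect here instead
        // of only after the whole loop has finished.
        if i % 64 == 0 {
            yield_now().await;
        }
    }
    acc
}

#[tokio::main]
async fn main() {
    let task = tokio::spawn(run_hot_loop(1_000_000));

    // Simulate a client cancelling the query shortly after it starts.
    tokio::time::sleep(std::time::Duration::from_millis(10)).await;
    task.abort();

    match task.await {
        Ok(v) => println!("finished: {v}"),
        Err(e) if e.is_cancelled() => println!("query was cancelled promptly"),
        Err(e) => println!("task failed: {e}"),
    }
}
```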
Additional context
No response