perf: speed up format 2.2 300% by spawning structural decode batch tasks#5982
Conversation
PR Review: This PR correctly aligns the structural decode batch path with the existing non-structural path (lines 1467-1475) by spawning CPU-heavy decode work. No P0/P1 issues found.
LGTM ✓
westonpace
left a comment
We keep going back and forth on this one 😄. You originally added the spawn here (7c19c22) and then I removed it here (70636f6).
I think we have some competing goals. This spawn can improve scan performance because we are reading large blocks of data and the decode is expensive. However, it also hurts random access performance because in that case we have a very cheap decode and the introduction of a spawn increases tokio overhead.
I am also still worried about whether or not this will boost performance in an actual query. For example, if we were filtering on this data then not having the spawn means we would decode and filter in the same thread task. By introducing the spawn the decode and filter happen on different thread tasks which means data might have to get loaded into the CPU cache twice.
Can you add some kind of reader config setting? Ideally in a way where we can change the default value for this setting with an environment variable.
Yep, I realized that.
Seems to be a good idea, will try.
This reverts commit 628683b.
cc @westonpace, I added a flag based on our query pattern and an env variable to allow users to override it. Let me know what you think about this change. This seems like an interesting issue that may require a more complete design for us to address. I will add that as a follow-up.
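A minimal sketch of the kind of reader setting discussed above: a boolean flag whose default can be overridden by an environment variable. The struct, field, env-var name (`LANCE_SPAWN_DECODE`), and accepted values are all illustrative assumptions, not the actual Lance API.

```rust
use std::env;

/// Hypothetical reader config; only the spawn flag is sketched here.
#[derive(Debug, Clone)]
pub struct ReaderConfig {
    /// Whether to spawn structural decode batch conversion onto its own task.
    pub spawn_decode: bool,
}

/// Parse a boolean flag value, falling back to `default` when unset.
/// Separated from env access so the logic is easy to test.
fn parse_flag(value: Option<&str>, default: bool) -> bool {
    match value {
        Some(s) => matches!(s.to_ascii_lowercase().as_str(), "1" | "true" | "on"),
        None => default,
    }
}

impl Default for ReaderConfig {
    fn default() -> Self {
        // Assumed env var name for illustration only.
        let spawn_decode =
            parse_flag(env::var("LANCE_SPAWN_DECODE").ok().as_deref(), true);
        Self { spawn_decode }
    }
}

fn main() {
    println!("{:?}", ReaderConfig::default());
}
```

Defaulting to `true` would favor large-scan throughput (the case benchmarked in this PR), while random-access-heavy workloads could opt out via the env variable without a code change.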
westonpace
left a comment
Thanks for adding the variable!
perf: speed up format 2.2 300% by spawning structural decode batch tasks (lance-format#5982)

`NextDecodeTask::into_batch` is synchronous and can be CPU-heavy. Running it inline in the future poll path blocks Tokio workers and reduces effective decode concurrency. This change becomes more meaningful when using zstd.

Benchmarks were run on AWS EC2 using both local and S3 copies of the same dataset (`fineweb.lance.v2_2.lz4`) with repeated scans.

Main run (3 rounds, 20 repeats each):

- Local median latency:
  - p50: `894675us -> 289781us` (`3.087x`, `-67.61%`)
  - p95: `929515us -> 307874us` (`3.019x`, `-66.88%`)
  - p99: `1034383us -> 375041us` (`2.758x`, `-63.74%`)
- S3 median latency:
  - p50: `3998660us -> 3510771us` (`1.139x`, `-12.20%`)
  - p95: `4068799us -> 3572090us` (`1.139x`, `-12.21%`)
  - p99: `4153371us -> 3592478us` (`1.156x`, `-13.50%`)

## Changes

Move structural decode batch conversion in `StructuralBatchDecodeStream::into_stream` to `tokio::spawn(...).await`.