fix: spawn part load in fts training by westonpace · Pull Request #5977 · lance-format/lance

westonpace · 2026-02-21T20:33:28Z

When training an FTS index we load all of the partitions and merge them. The code was set up to load partitions in parallel. However, there was no spawn call, so it wasn't actually loading anything in parallel. This led to starvation on HEAD calls. The order of operations (everything was serialized) was something like...

Start load of part 1 file 1
Start load of part 2 file 1
...
Start load of part N file 1
Finish load and do CPU work to load file 1
Start load of part 1 file 2
Finish load and do CPU work to load file 2
...

The load request for part N file 1 would not get polled again until a lot of CPU work had been done (to parse files 1 through N-1). As a result the HEAD request was starved, which looked like an S3 timeout.
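
To make the starvation concrete, here is a minimal std-only model of the polling order described above. It is a sketch, not the Lance code: `PartLoad`, `NoopWake`, `poll_on_one_task`, the one-file-per-part simplification, and the logical tick clock are all hypothetical, and the hand-written driver only approximates how a single task polls several futures in turn. Each load "sends" its request on the first poll, then on the second poll "receives" the response and does its CPU work inside that same poll.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; the driver below simply re-polls in order.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical stand-in for one partition load.
struct PartLoad {
    clock: Rc<Cell<u64>>, // logical time shared by every future on the task
    parse_cost: u64,      // ticks of CPU work done inside poll
    sent_at: Option<u64>, // tick at which the request was issued
}

impl Future for PartLoad {
    // Ticks the response sat unpolled after the request was issued.
    type Output = u64;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u64> {
        let this = self.get_mut();
        match this.sent_at {
            None => {
                // First poll: issue the request and yield.
                this.sent_at = Some(this.clock.get());
                Poll::Pending
            }
            Some(sent) => {
                // Second poll: measure the wait, then parse on this task.
                let waited = this.clock.get() - sent;
                this.clock.set(this.clock.get() + this.parse_cost);
                Poll::Ready(waited)
            }
        }
    }
}

/// Polls every load from a single task, in order, until all are ready.
/// Returns how long each part's request waited for its next poll.
fn poll_on_one_task(parts: usize, parse_cost: u64) -> Vec<u64> {
    let clock = Rc::new(Cell::new(0));
    let mut loads: Vec<PartLoad> = (0..parts)
        .map(|_| PartLoad { clock: Rc::clone(&clock), parse_cost, sent_at: None })
        .collect();
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut waits = vec![None; parts];
    while waits.iter().any(|w| w.is_none()) {
        for (i, load) in loads.iter_mut().enumerate() {
            if waits[i].is_none() {
                if let Poll::Ready(waited) = Pin::new(load).poll(&mut cx) {
                    waits[i] = Some(waited);
                }
            }
        }
    }
    waits.into_iter().map(|w| w.unwrap()).collect()
}
```

Calling `poll_on_one_task(4, 10)` yields waits of 0, 10, 20 and 30 ticks: in this model the request for part N goes unpolled for (N-1) times the parse cost, which is the window in which a real HEAD request could hit its timeout. Spawning each load as its own task lets the runtime poll it independently of the other parts' CPU work.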

This fix uses spawn. I've tested it against 15M rows of fineweb data and ensured the RAM is still reasonably bounded even with parallel loading.

github-actions · 2026-02-21T20:34:45Z

Code Review

LGTM ✓

The fix correctly addresses the I/O starvation issue by wrapping partition loads in tokio::task::spawn(). Without spawn, buffered() doesn't actually parallelize work - it just allows multiple futures to be polled, but they still execute on the same task, causing serialization.

Error handling is correct: JoinError converts properly via lance-core's error module, and the inner load error is unwrapped on line 271.

No P0/P1 concerns.

codecov · 2026-02-21T21:08:04Z

Codecov Report

❌ Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
rust/lance-index/src/scalar/inverted/merger.rs	50.00%	0 Missing and 1 partial ⚠️

westonpace added 2 commits February 21, 2026 12:29

don't use buffered without spawn

19dde42

Add a spawn to the part load in FTS training

b8b00f3

westonpace mentioned this pull request Feb 21, 2026

fix: don't use buffered without spawn in fts index training #5974

Closed

github-actions Bot added the bug Something isn't working label Feb 21, 2026

wjones127 approved these changes Feb 21, 2026

westonpace merged commit e13dbc0 into lance-format:main Feb 22, 2026
30 checks passed

andrea-reale mentioned this pull request Mar 30, 2026

emilk/fix write starvation rerun-io/lance#12

Closed

westonpace merged 2 commits into lance-format:main from westonpace:fix/fts-train-index-starvation-fast