fix: spawn part load in fts training #5977

Merged
westonpace merged 2 commits into lance-format:main from westonpace:fix/fts-train-index-starvation-fast
Feb 22, 2026

@westonpace
Member

When training an FTS index we load all of the partitions and merge them. The code was set up to load partitions in parallel. However, there was no spawn call, so nothing was actually loaded in parallel. This led to starvation on HEAD calls. Because everything was serialized, the order of operations was something like...

Start load of part 1 file 1
Start load of part 2 file 1
...
Start load of part N file 1
Finish load and do CPU work to load file 1
Start load of part 1 file 2
Finish load and do CPU work to load file 2
...

The load request for part N file 1 would not get polled again until a lot of CPU work had been done (parsing file 1 of parts 1 through N-1). As a result the HEAD request was starved, which looked like an S3 timeout.
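The starvation described above can be illustrated with a toy virtual-time model. This is only a sketch under made-up assumptions: every request is issued at t = 0, each response arrives after `io_ms`, and handling a response costs `cpu_ms` of CPU work. The function name `unpolled_wait_ms` and all the numbers are hypothetical and are not taken from the Lance code.

```rust
/// Toy model (not Lance code): returns, for each part, how long its response
/// sat unpolled after arriving. `workers` is the number of tasks that can do
/// CPU work at the same time: 1 models the un-spawned stream, more than 1
/// models spawned tasks on a multi-threaded runtime.
fn unpolled_wait_ms(parts: usize, workers: usize, io_ms: u64, cpu_ms: u64) -> Vec<u64> {
    assert!(workers > 0, "need at least one worker");
    // Time at which each worker next becomes free.
    let mut free_at = vec![0u64; workers];
    let mut waits = Vec::with_capacity(parts);
    for _ in 0..parts {
        // Hand the next response to the worker that becomes free first.
        let w = (0..workers)
            .min_by_key(|&i| free_at[i])
            .expect("at least one worker");
        // A response cannot be handled before it has arrived.
        let start = free_at[w].max(io_ms);
        waits.push(start - io_ms);
        free_at[w] = start + cpu_ms;
    }
    waits
}

fn main() {
    // Hypothetical numbers: 8 parts, 100 ms of I/O, 500 ms of parsing each.
    let serialized = unpolled_wait_ms(8, 1, 100, 500);
    let spawned = unpolled_wait_ms(8, 4, 100, 500);
    println!("single task:  worst wait = {} ms", serialized.iter().max().unwrap());
    println!("four workers: worst wait = {} ms", spawned.iter().max().unwrap());
}
```

In this model the last part's response waits behind all of the earlier CPU work when there is a single task, so the wait grows linearly with the number of parts. With several workers the wait is bounded by how many parts share a worker.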

This fix uses spawn. I've tested it against 15M rows of fineweb data and ensured the RAM is still reasonably bounded even with parallel loading.

@github-actions
Contributor

Code Review

LGTM

The fix correctly addresses the I/O starvation issue by wrapping partition loads in tokio::task::spawn(). Without spawn, buffered() gives concurrency but not parallelism: it keeps multiple futures in flight, but they are all polled from the same task, so any CPU work inside them is serialized.

Error handling is correct: JoinError converts properly via lance-core's error module, and the inner load error is unwrapped on line 271.
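The two layers of error handling mentioned above can be sketched without any external crates. As an assumption for illustration, `std::thread::spawn` stands in for `tokio::task::spawn` here, and `load_partition` and `load_all` are hypothetical names rather than functions from the PR. The shape is the same in both cases: joining a spawned unit of work yields an outer result (did the task itself fail?) wrapped around the inner result returned by the load.

```rust
use std::thread;

/// Hypothetical stand-in for loading one partition; not a Lance function.
fn load_partition(id: u32) -> Result<String, String> {
    if id == 3 {
        Err(format!("partition {} failed to load", id))
    } else {
        Ok(format!("partition-{}", id))
    }
}

/// Loads every partition on its own thread and collects the results in order.
fn load_all(ids: &[u32]) -> Result<Vec<String>, String> {
    // Start every load before joining any of them so they run concurrently.
    let handles: Vec<_> = ids
        .iter()
        .map(|&id| thread::spawn(move || load_partition(id)))
        .collect();

    let mut loaded = Vec::with_capacity(handles.len());
    for handle in handles {
        // Outer layer: the spawned work itself panicked.
        let result = handle
            .join()
            .map_err(|_| "partition load panicked".to_string())?;
        // Inner layer: the load ran to completion but returned an error.
        loaded.push(result?);
    }
    Ok(loaded)
}

fn main() {
    println!("{:?}", load_all(&[1, 2]));
    println!("{:?}", load_all(&[1, 3]));
}
```

Unlike a `buffered()` stream, this sketch starts one thread per partition with no upper bound, so it does not model the bounded concurrency (or bounded RAM) that the PR description mentions.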

No P0/P1 concerns.

@codecov

codecov Bot commented Feb 21, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| rust/lance-index/src/scalar/inverted/merger.rs | 50.00% | 0 Missing and 1 partial ⚠️ |

@westonpace westonpace merged commit e13dbc0 into lance-format:main Feb 22, 2026
30 checks passed

Labels

bug Something isn't working