fix: don't use buffered without spawn in fts index training#5974

Closed
westonpace wants to merge 1 commit intolance-format:mainfrom
westonpace:fix/fts-index-train-starvation

@westonpace
Member

When training an FTS index, we load all of the partitions and merge them. The code was set up to load partitions in parallel. However, there was no spawn call, so nothing was actually being loaded in parallel. This led to starvation on HEAD calls. Everything was serialized, so the order of operations was something like:

Start load of part 1 file 1
Start load of part 2 file 1
...
Start load of part N file 1
Finish load and do CPU work to load file 1
Start load of part 1 file 2
Finish load and do CPU work to load file 2
...

The load request for part N file 1 would not get polled again until a lot of CPU work had been done (to parse files 1 through N-1). This starved the HEAD request, which looked like an S3 timeout.
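
The ordering described above can be reproduced with a toy model. The sketch below is an illustration only, not code from this PR: it uses just the standard library, and `LoadPart`, `max_wait_buffered`, `max_wait_sequential`, and the virtual clock are hypothetical names. It assumes one file per partition, a request that is sent on the first poll and parsed on the second, and a clock that advances only when CPU work runs. It is not how `buffered` is implemented; it only mimics the "start everything, then finish one at a time on the same task" behavior.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; the toy drivers below re-poll manually.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical stand-in for loading one partition file.
struct LoadPart {
    clock: Rc<Cell<u64>>, // virtual time, advanced only by CPU work
    sent_at: Option<u64>, // when the request was "sent" (first poll)
    cpu_cost: u64,        // ticks of parsing done on the final poll
}

impl LoadPart {
    fn new(clock: Rc<Cell<u64>>, cpu_cost: u64) -> Self {
        Self { clock, sent_at: None, cpu_cost }
    }
}

impl Future for LoadPart {
    /// Ticks the request sat unpolled between being sent and being finished.
    type Output = u64;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u64> {
        let this = self.get_mut();
        match this.sent_at {
            None => {
                // First poll: send the request and yield.
                this.sent_at = Some(this.clock.get());
                Poll::Pending
            }
            Some(sent) => {
                // Second poll: the response is handled, then parsed inline.
                let waited = this.clock.get() - sent;
                this.clock.set(this.clock.get() + this.cpu_cost);
                Poll::Ready(waited)
            }
        }
    }
}

/// Start every load up front, then finish them one by one on the same task
/// (the pattern the description attributes to buffered without spawn).
fn max_wait_buffered(parts: usize, cpu_cost: u64) -> u64 {
    let clock = Rc::new(Cell::new(0u64));
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut loads: Vec<LoadPart> = (0..parts)
        .map(|_| LoadPart::new(clock.clone(), cpu_cost))
        .collect();
    for load in loads.iter_mut() {
        assert!(Pin::new(load).poll(&mut cx).is_pending());
    }
    let mut max_wait = 0;
    for load in loads.iter_mut() {
        if let Poll::Ready(waited) = Pin::new(load).poll(&mut cx) {
            max_wait = max_wait.max(waited);
        }
    }
    max_wait
}

/// Start and finish each load before starting the next one.
fn max_wait_sequential(parts: usize, cpu_cost: u64) -> u64 {
    let clock = Rc::new(Cell::new(0u64));
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut max_wait = 0;
    for _ in 0..parts {
        let mut load = LoadPart::new(clock.clone(), cpu_cost);
        assert!(Pin::new(&mut load).poll(&mut cx).is_pending());
        if let Poll::Ready(waited) = Pin::new(&mut load).poll(&mut cx) {
            max_wait = max_wait.max(waited);
        }
    }
    max_wait
}
```

In this model, `max_wait_buffered(4, 10)` returns 30 (the last request waits behind three parse steps), while `max_wait_sequential(4, 10)` returns 0. The wait in the first case grows linearly with the number of partitions, which is consistent with a request eventually looking like a timeout.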

This is the safest fix: it just removes the call to buffered.

A more aggressive fix would be to add a spawn call, but that could affect RAM usage, and I think #5970 would be an even better approach anyway. I can quickly test the spawn fix on Monday and put up a PR for it later if it looks good, but I wanted to get something in now.

github-actions bot added the bug label on Feb 21, 2026
@github-actions
Contributor

Code Review

P0: Compilation Error

The PR removes get_num_compute_intensive_cpus from imports at rust/lance-index/src/scalar/inverted/merger.rs:6, but the test test_merge_streams_partitions_in_batches still uses it at line 390:

let num_parts = get_num_compute_intensive_cpus().saturating_add(2);

The test module uses use super::* which previously brought in this function, but after this change it will no longer be available.

Fix: Add use lance_core::utils::tokio::get_num_compute_intensive_cpus; to the test module imports, or replace the usage with a fixed value like let num_parts = 8;.


The approach (replacing buffered without spawn with sequential loading) is reasonable as a quick fix for the starvation issue described in the PR.

@westonpace
Member Author

Closing in favor of #5977

@westonpace westonpace closed this Feb 21, 2026