fix: don't use buffered without spawn in fts index training by westonpace · Pull Request #5974 · lance-format/lance

westonpace · 2026-02-21T17:07:42Z

When training an FTS index, we load all of the partitions and merge them. The code was set up to load partitions in parallel. However, there was no spawn call, so nothing was actually loaded in parallel. This led to starvation on HEAD calls. Because everything was serialized, the order of operations was something like:

Start load of part 1 file 1
Start load of part 2 file 1
...
Start load of part N file 1
Finish load and do CPU work to load file 1
Start load of part 1 file 2
Finish load and do CPU work to load file 2
...

The load request for part N file 1 would not get polled again until a lot of CPU work was done (to parse files 1 through N-1). This starved the HEAD request, which looked like an S3 timeout.

This is the safest fix: it just removes the call to buffered.
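The poll ordering described above, and the effect of dropping buffered, can be sketched with a toy model that uses only the standard library. This is not the Lance code: `LoadPart`, `run_buffered_without_spawn`, `run_sequential`, and `cpu_steps_while_waiting` are hypothetical names, and the model assumes each load needs exactly two polls (one to issue the request, one to read and parse the response) and that parsing runs inline in the same task.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type Log = Rc<RefCell<Vec<String>>>;

/// Waker that does nothing; the toy drivers below poll by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical stand-in for loading one partition. The first poll issues
/// the request; the second poll reads the response and parses it (CPU work).
struct LoadPart {
    part: usize,
    started: bool,
    log: Log,
}

impl Future for LoadPart {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        let part = self.part;
        if !self.started {
            self.started = true;
            self.log.borrow_mut().push(format!("start {part}"));
            Poll::Pending
        } else {
            self.log.borrow_mut().push(format!("finish+cpu {part}"));
            Poll::Ready(())
        }
    }
}

fn new_loads(n: usize, log: &Log) -> Vec<LoadPart> {
    (1..=n)
        .map(|part| LoadPart { part, started: false, log: log.clone() })
        .collect()
}

/// Models a buffered stream with no spawn: every request is issued up front,
/// but each one is polled again only after the earlier parts were parsed.
fn run_buffered_without_spawn(n: usize) -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut loads = new_loads(n, &log);
    for load in loads.iter_mut() {
        let _ = Pin::new(load).poll(&mut cx); // issue every request
    }
    for load in loads.iter_mut() {
        let _ = Pin::new(load).poll(&mut cx); // finish and parse, in order
    }
    let out = log.borrow().clone();
    out
}

/// Models the sequential fix: one request is issued and completed before
/// the next one starts, so no request sits unpolled behind CPU work.
fn run_sequential(n: usize) -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut loads = new_loads(n, &log);
    for load in loads.iter_mut() {
        let _ = Pin::new(&mut *load).poll(&mut cx); // issue the request
        let _ = Pin::new(&mut *load).poll(&mut cx); // finish and parse
    }
    let out = log.borrow().clone();
    out
}

/// Number of parse steps logged between issuing `part` and polling it again.
fn cpu_steps_while_waiting(log: &[String], part: usize) -> usize {
    let start = log.iter().position(|e| *e == format!("start {part}")).unwrap();
    let finish = log.iter().position(|e| *e == format!("finish+cpu {part}")).unwrap();
    log[start + 1..finish]
        .iter()
        .filter(|e| e.starts_with("finish+cpu"))
        .count()
}
```

In this model, `cpu_steps_while_waiting(&run_buffered_without_spawn(n), n)` grows with `n`, while the sequential run keeps it at zero for every part. The cost is that the sequential run never has more than one request in flight.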

A more aggressive fix would be to add a spawn call, but that could affect RAM usage, and I think #5970 would be an even better approach anyway. I can quickly test the spawn fix on Monday and put up a PR for it later if it looks good, but I wanted to get something in now.

github-actions · 2026-02-21T17:09:32Z

Code Review

P0: Compilation Error

The PR removes get_num_compute_intensive_cpus from imports at rust/lance-index/src/scalar/inverted/merger.rs:6, but the test test_merge_streams_partitions_in_batches still uses it at line 390:

let num_parts = get_num_compute_intensive_cpus().saturating_add(2);

The test module uses use super::* which previously brought in this function, but after this change it will no longer be available.

Fix: Add use lance_core::utils::tokio::get_num_compute_intensive_cpus; to the test module imports, or replace the usage with a fixed value like let num_parts = 8;.
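The dependency the review describes can be reproduced in a small standalone sketch. The module and function names here (`merger`, `clamp_parts`, `expected_parts`) are hypothetical and use a standard-library import in place of `get_num_compute_intensive_cpus`; the point is only that a child module with `use super::*` sees the parent's `use` declarations, so deleting the parent import breaks the child.

```rust
mod merger {
    // Parent-level import. The child module below sees `max` only through
    // `use super::*`; deleting this line makes `tests::expected_parts`
    // fail to compile, unless `tests` imports `max` itself.
    use std::cmp::max;

    /// Hypothetical helper standing in for parent code that used the import.
    pub fn clamp_parts(requested: usize) -> usize {
        max(requested, 1)
    }

    pub mod tests {
        // Brings the parent's names, including its imported `max`, into scope.
        use super::*;

        pub fn expected_parts() -> usize {
            max(2, clamp_parts(0)).saturating_add(2)
        }
    }
}
```

Adding the import directly inside the child module, as the review suggests, removes that hidden dependency on the parent's import list.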

The approach (removing buffered without spawn in favor of sequential) is reasonable as a quick fix for the starvation issue described in the PR.

westonpace · 2026-02-21T20:33:38Z

Closing in favor of #5977

don't use buffered without spawn

9b13e7b

github-actions Bot added the bug Something isn't working label Feb 21, 2026

westonpace closed this Feb 21, 2026

westonpace wants to merge 1 commit into lance-format:main from westonpace:fix/fts-index-train-starvation