perf: don't concat the batches for writing posting lists #5769

Merged
BubbleCal merged 3 commits into main from yang/write-posting-write on Jan 21, 2026

@BubbleCal
Contributor

The file writer has an internal buffer, so we don't need to concatenate the batches before writing them. This slightly improves indexing performance (6%) and reduces the memory footprint (-4%).

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
@github-actions
Contributor

Code Review

This PR removes the custom batching logic in write_posting_lists and relies on the file writer's internal buffer instead. The change looks correct and well-tested.

Summary

The PR simplifies the posting list writing code by removing the LANCE_FTS_FLUSH_SIZE environment variable and the associated manual batching logic. Since FileWriter already has internal buffering (8MiB per column by default via data_cache_bytes), this external batching was redundant.
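
To illustrate why the external batching was redundant, here is a minimal, std-only sketch. `BufferingWriter`, `flush_threshold`, `write_with_manual_concat`, and `write_each_batch` are hypothetical names used only for illustration; they are not Lance's `FileWriter` API, and plain byte vectors stand in for record batches. The sketch assumes only what the summary states: the writer buffers incoming data and flushes a page once enough has accumulated.

```rust
/// Hypothetical stand-in for a file writer that buffers internally.
/// This is NOT Lance's FileWriter; it only models "buffer, then flush a page".
struct BufferingWriter {
    buffer: Vec<u8>,
    flush_threshold: usize,
    pages: Vec<Vec<u8>>, // flushed pages, standing in for the file on disk
}

impl BufferingWriter {
    fn new(flush_threshold: usize) -> Self {
        Self { buffer: Vec::new(), flush_threshold, pages: Vec::new() }
    }

    /// Append one batch; flush a page once the buffer reaches the threshold.
    fn write_batch(&mut self, batch: &[u8]) {
        self.buffer.extend_from_slice(batch);
        if self.buffer.len() >= self.flush_threshold {
            self.flush();
        }
    }

    fn flush(&mut self) {
        if !self.buffer.is_empty() {
            self.pages.push(std::mem::take(&mut self.buffer));
        }
    }

    /// Flush whatever is left and return all pages.
    fn finish(mut self) -> Vec<Vec<u8>> {
        self.flush();
        self.pages
    }
}

/// Old shape (assumed): accumulate batches, concatenate, then write the merged batch.
fn write_with_manual_concat(batches: &[Vec<u8>], flush_size: usize, w: &mut BufferingWriter) {
    let mut pending: Vec<&[u8]> = Vec::new();
    let mut pending_bytes = 0usize;
    for b in batches {
        pending_bytes += b.len();
        pending.push(b.as_slice());
        if pending_bytes >= flush_size {
            let merged = pending.concat(); // extra copy that the writer's buffer makes unnecessary
            w.write_batch(&merged);
            pending.clear();
            pending_bytes = 0;
        }
    }
    if !pending.is_empty() {
        w.write_batch(&pending.concat());
    }
}

/// New shape: hand each batch straight to the writer and let it buffer.
fn write_each_batch(batches: &[Vec<u8>], w: &mut BufferingWriter) {
    for b in batches {
        w.write_batch(b);
    }
}
```

In this model both functions put the same bytes into the writer in the same order; the second simply skips the intermediate concatenation and the copy that comes with it.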

Observations

  1. Correctness: The change is sound. The file writer's write_batch method already buffers data internally before flushing pages, so the manual concat_batches + size-based flush was unnecessary overhead.

  2. Test Coverage: Good addition of test_write_posting_lists_writes_each_batch with a mock CountingStore to verify the new behavior writes each batch individually.
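
The counting-mock idea behind such a test can be sketched as follows. This is a simplified analogue, not the PR's actual test: the real `CountingStore` is not shown in this conversation, so `BatchSink`, `CountingSink`, and `write_batches` below are hypothetical names, and `Vec<u32>` stands in for a record batch.

```rust
/// Hypothetical sink trait; stands in for whatever the real writer exposes.
trait BatchSink {
    fn write_batch(&mut self, batch: &[u32]);
}

/// Counting mock: records one entry (the row count) per write call.
#[derive(Default)]
struct CountingSink {
    rows_per_write: Vec<usize>,
}

impl BatchSink for CountingSink {
    fn write_batch(&mut self, batch: &[u32]) {
        self.rows_per_write.push(batch.len());
    }
}

/// Simplified analogue of the new behavior: forward each batch unchanged.
fn write_batches<S: BatchSink>(batches: &[Vec<u32>], sink: &mut S) {
    for batch in batches {
        sink.write_batch(batch);
    }
}
```

A test built on this would pass a few batches of different sizes through `write_batches` and assert that `rows_per_write` has one entry per input batch with matching lengths, which would fail if the batches had been merged before writing.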

No blocking issues found.

@codecov

codecov Bot commented Jan 21, 2026

Codecov Report

❌ Patch coverage is 59.18367% with 20 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| rust/lance-index/src/scalar/inverted/builder.rs | 59.18% | 19 Missing and 1 partial ⚠️ |

@BubbleCal BubbleCal merged commit e8f173c into main Jan 21, 2026
29 checks passed
@BubbleCal BubbleCal deleted the yang/write-posting-write branch January 21, 2026 13:48
majin1102 pushed a commit to majin1102/lance that referenced this pull request Jan 23, 2026
vivek-bharathan pushed a commit to vivek-bharathan/lance that referenced this pull request Feb 2, 2026