perf: materialize the tokens after WAND done #5572
Conversation
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Code Review - This PR optimizes WAND search by deferring token materialization, reducing peak memory from 497 to 222 MiB.
- P1: The bounds check in index.rs silently skips invalid indices; consider making this a hard error.
- P1: Add a unit test for the token-to-index-to-token roundtrip.
- Note: The benchmark scripts are useful, but consider gitignoring them if they are not intended for CI.
Code Review

Summary: This PR optimizes memory usage in WAND FTS search by deferring token string materialization until after candidate selection. Instead of copying token strings for each candidate during the search, it stores term indices and builds a token lookup table per partition.

Observations

P1 - Potential out-of-bounds access in the flat_search path: the flat_search method in wand.rs was not updated to use the new iter_term_freqs pattern. It still uses iter_token_freqs and collects (token.to_owned(), freq) tuples (lines 397-400 and 525-528). This creates a mismatch: the DocCandidate::freqs type has changed to Vec<(u32, u32)>, but flat_search tries to store (String, u32). This should cause a compilation error; please verify that the code compiles and that flat_search was correctly updated. If I am reading the diff incorrectly and the current code compiles, please ignore this comment.

Minor Observations (non-blocking)
Overall the approach is sound: using term indices instead of strings during the hot path and materializing tokens once per partition is a clean optimization with significant memory savings (497 MiB to 222 MiB as reported).
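A minimal sketch of the deferred-materialization idea the review describes, assuming illustrative names (`DocCandidate`, `token_table`, `materialize_tokens` are placeholders, not the actual lance API): candidates store numeric term indices during the hot path, and token strings are looked up once per partition after candidates are selected.

```rust
// Hypothetical sketch: freqs holds (term_index, frequency) pairs instead of
// (String, frequency), so no token string is copied per candidate.
struct DocCandidate {
    doc_id: u64,
    freqs: Vec<(u32, u32)>,
}

// After candidate selection, map term indices back to token strings using a
// lookup table built once per partition. Borrowing &str from the table avoids
// allocating new strings here as well.
fn materialize_tokens<'a>(
    candidate: &DocCandidate,
    token_table: &'a [String],
) -> Vec<(&'a str, u32)> {
    candidate
        .freqs
        .iter()
        .map(|&(term_idx, freq)| (token_table[term_idx as usize].as_str(), freq))
        .collect()
}

fn main() {
    let token_table = vec!["apple".to_string(), "banana".to_string()];
    let cand = DocCandidate {
        doc_id: 7,
        freqs: vec![(1, 3), (0, 1)],
    };
    let toks = materialize_tokens(&cand, &token_table);
    println!("doc {} -> {:?}", cand.doc_id, toks);
}
```

The hot loop only touches 8-byte index pairs; string work is paid once per partition rather than once per candidate.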
```rust
let mut candidates = if limit <= BLOCK_SIZE * 10 {
    BinaryHeap::with_capacity(limit)
} else {
    BinaryHeap::new()
};
```
Even if the limit is quite large, starting with BinaryHeap::with_capacity(BLOCK_SIZE * 10) would still be a good approach.
Codecov Report: ❌ Patch coverage is
This avoids copying tokens for each candidate during WAND search; these small strings could lead to high memory usage when there are many partitions and the queries have a large limit.
Verified on a 1M dataset:
before:
Peak RSS delta: 497.09 MiB
after:
Peak RSS delta: 222.34 MiB
This also improves FTS performance by ~10% for such queries.
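The memory saving follows from the per-entry footprint: each `(String, u32)` pair carries a string header plus a separate heap allocation per token, while a `(u32, u32)` index pair is 8 bytes inline with no allocation. A quick check of the inline sizes (on a typical 64-bit target):

```rust
use std::mem::size_of;

fn main() {
    // A String is (ptr, capacity, len) = 3 usizes on 64-bit targets, and its
    // character data lives in a separate heap allocation on top of that.
    println!("String header: {} bytes", size_of::<String>());
    // An index pair is stored entirely inline.
    println!("(u32, u32): {} bytes", size_of::<(u32, u32)>());
}
```

Multiplied across many partitions and large per-query candidate sets, dropping both the per-token header and the per-token heap allocation accounts for the kind of peak-RSS reduction reported above.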