fix: index overestimates the posting list size#5327
fix: index overestimates the posting list size#5327BubbleCal merged 2 commits intolance-format:mainfrom
Conversation
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| self.blocks.get_buffer_memory_size() | ||
| + self | ||
| .positions | ||
| .as_ref() | ||
| .map(|positions| positions.get_array_memory_size()) | ||
| .map(|positions| positions.get_buffer_memory_size()) |
There was a problem hiding this comment.
Include compressed position buffers in deep size
Switching positions to get_buffer_memory_size() means the deep size now counts only the outer list’s offset/null buffers and omits the buffers of the inner LargeBinaryArray that hold the compressed positions themselves. Because CompressedPostingList has no other deep-size traversal, any posting list with positions will have most of its memory (the compressed position payloads) ignored when the cache computes weights, so cache limits can be exceeded whenever positions are present.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Codex is wrong here. I agree with you that get_buffer_memory_size is correct. The array structure itself will already be included.
we double counted the fields' size and it overestimated the posting list size by 1.2x~2x for small ones