feat: merge-insert with primary key dedupe #5633
jackye1995 merged 2 commits into lance-format:main from
ACTION NEEDED: The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error, please inspect the "PR Title Check" action.
Code Review: Bloom Filter Merge-Insert Conflict Detection
This PR adds bloom filter-based conflict detection for merge-insert operations, tracking the keys of newly inserted rows to detect concurrent insert conflicts.
Summary
The implementation adds:
P0: Critical - Inconsistent Hashing Between Bloom Filter Helper and Main SBBF
File:
The newly added
// New helper (lines 428-447) - custom rolling seed hash
pub fn bloom_contains_hash(hash: u64, bitmap: &[u8], num_bits: u32) -> bool {
    let mut seed = 0x9e3779b97f4a7c15u64;
    for _i in 0..SBBF_NUM_HASHES {
        let pos = ((hash.wrapping_add(seed)) % m) as usize;
        // ... bit_test on raw bytes
        seed = seed.rotate_left(13) ^ 0x517cc1b727220a95u64;
    }
}
The existing
Looking at the actual usage in
Recommendation: Either:
P1: Test Coverage Gap for Cross-Filter Type Intersection
File:
The tests cover Bloom-to-Bloom intersection via concurrent merge inserts, but don't explicitly test:
Given the P0 issue above, these paths would be broken. Consider adding explicit unit tests for
Minor Observations (non-blocking)
The P0 hashing inconsistency should be addressed before merge, as it could cause incorrect conflict detection behavior (both missed conflicts and spurious retries).
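To make the P0 concern concrete, here is a minimal sketch of why a filter written with one probing scheme cannot be queried reliably with another. It is not the PR's code. `rolling_seed_positions` mirrors the rolling-seed loop quoted above. `double_hash_positions` is a hypothetical stand-in for whatever scheme the main SBBF uses, which this review does not show. `NUM_HASHES`, `set_bits`, and `all_bits_set` are likewise illustrative names, and the bit layout (LSB-first within each byte) is an assumption.

```rust
/// Number of probes per key. Hypothetical value for this sketch; the value of
/// the PR's `SBBF_NUM_HASHES` constant is not shown in the review.
const NUM_HASHES: u32 = 8;

/// Probe positions derived from a rolling seed, mirroring the quoted helper.
fn rolling_seed_positions(hash: u64, num_bits: u32) -> Vec<usize> {
    let m = num_bits as u64;
    let mut seed = 0x9e3779b97f4a7c15u64;
    let mut out = Vec::with_capacity(NUM_HASHES as usize);
    for _ in 0..NUM_HASHES {
        out.push((hash.wrapping_add(seed) % m) as usize);
        seed = seed.rotate_left(13) ^ 0x517cc1b727220a95u64;
    }
    out
}

/// Probe positions derived by double hashing. This is an assumed stand-in
/// for the "main" filter's scheme, chosen only because it differs.
fn double_hash_positions(hash: u64, num_bits: u32) -> Vec<usize> {
    let m = num_bits as u64;
    let step = (hash >> 32) | 1; // force an odd step so probes advance
    (0..NUM_HASHES as u64)
        .map(|i| (hash.wrapping_add(i.wrapping_mul(step)) % m) as usize)
        .collect()
}

/// Set the given bit positions in a raw byte bitmap (LSB-first per byte).
fn set_bits(bitmap: &mut [u8], positions: &[usize]) {
    for &p in positions {
        bitmap[p / 8] |= 1u8 << (p % 8);
    }
}

/// Return true only if every given bit position is set in the bitmap.
fn all_bits_set(bitmap: &[u8], positions: &[usize]) -> bool {
    positions
        .iter()
        .all(|&p| (bitmap[p / 8] & (1u8 << (p % 8))) != 0)
}
```

With a 65,536-bit bitmap, inserting hash `0` via `double_hash_positions` sets bits 0 through 7, while `rolling_seed_positions` probes bit `0x7c15` first, so the lookup reports the key as absent even though it was inserted. In conflict detection that false negative is a missed conflict. Positions that happen to collide in the other direction would surface as spurious retries.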
2ca9e9a to 0882e74
14fde36 to 70d8816
Codecov Report
❌ Patch coverage is
70d8816 to bfa5dc0
bfa5dc0 to dd6bb27
Based on lance-format#4787
Co-authored-by: vinoyang <vinoyang@apache.org>