Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
To reduce memory usage, and potentially also improve speed, the data in visited_left_side in the HashJoinStream could be stored in a bitmap instead of a Vec<bool>. This would save ~7/8 of a byte per left row.
If we store only 32-bit integers on the left, the savings would be ~4-5%, assuming we use 4 bytes for the items and roughly 16 bytes per left-side row for the hashmap. Not too big, but a nice win in some cases. The relative savings could be bigger if we use a more memory-efficient data structure for the hashmap.
Additionally, when no row is matched or every row is matched, it could include a fast path for those cases.
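As a rough check of that estimate, the arithmetic can be sketched as below. The function name and parameters are hypothetical, and the model assumes the only per-row costs are the item, the hashmap entry, and the visited flag:

```rust
/// Fraction of per-row memory saved by replacing a 1-byte `bool`
/// with a 1-bit flag. Hypothetical helper, for illustration only.
pub fn bitmap_saving_fraction(item_bytes: f64, hashmap_bytes: f64) -> f64 {
    let bool_bytes = 1.0; // one Vec<bool> element per left row
    let bit_bytes = 1.0 / 8.0; // one bit per left row in a bitmap
    let before = item_bytes + hashmap_bytes + bool_bytes;
    (bool_bytes - bit_bytes) / before
}
```

With 4 bytes per item and 16 bytes per hashmap entry this gives 0.875 / 21, a little over 4%.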
Describe the solution you'd like
Use a bitmap instead of Vec<bool>. The bitmap could be from arrow or maybe the bitvec crate.
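A minimal sketch of the idea, using only the standard library rather than arrow or bitvec. The type VisitedBitmap and its methods are hypothetical names, not existing DataFusion code; it also shows how a population count could drive the fast paths mentioned above:

```rust
/// Hypothetical fixed-size bitmap: one bit per left-side row.
pub struct VisitedBitmap {
    words: Vec<u64>,
    len: usize,
}

impl VisitedBitmap {
    /// Creates a bitmap for `len` rows, all initially unvisited.
    pub fn new(len: usize) -> Self {
        Self { words: vec![0; (len + 63) / 64], len }
    }

    /// Marks `row` as visited (matched by at least one right-side row).
    pub fn set(&mut self, row: usize) {
        assert!(row < self.len, "row out of bounds");
        self.words[row / 64] |= 1u64 << (row % 64);
    }

    /// Returns whether `row` has been visited.
    pub fn get(&self, row: usize) -> bool {
        assert!(row < self.len, "row out of bounds");
        (self.words[row / 64] >> (row % 64)) & 1 == 1
    }

    /// Number of visited rows, computed one 64-bit word at a time.
    pub fn count_visited(&self) -> usize {
        self.words.iter().map(|w| w.count_ones() as usize).sum()
    }

    /// Indices of unvisited rows, with fast paths for the
    /// all-visited and none-visited cases.
    pub fn unvisited_rows(&self) -> Vec<usize> {
        let visited = self.count_visited();
        if visited == self.len {
            return Vec::new(); // every row matched: nothing to emit
        }
        if visited == 0 {
            return (0..self.len).collect(); // no row matched: emit all
        }
        (0..self.len).filter(|&row| !self.get(row)).collect()
    }
}
```

A real implementation would more likely reuse an existing bitmap type from arrow or bitvec, as suggested above; the hand-rolled version is only meant to make the memory layout and the fast paths concrete.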
Describe alternatives you've considered
Keep using a Vec<bool>
Additional context
n/a