Left join could use a bitmap instead of Vec<bool> #240

@Dandandan

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
To save memory, and potentially run faster, the data in visited_left_side in the HashJoinStream could be stored in a bitmap instead of a Vec<bool>. This would save ~7/8 of a byte per left row.
If we store only 32-bit integers on the left, the savings would be ~4-5%, assuming we use 4 bytes for the items and roughly 16 bytes per left-side row for the hashmap. Not too big, but a nice win in some cases. The relative savings could be bigger if we use a more memory-efficient data structure for the hashmap.
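
The estimate above can be checked with a few lines of arithmetic. This is only a sketch: the function name `bitmap_saving_fraction` is hypothetical, and the byte counts (4 bytes per item, roughly 16 bytes of hashmap overhead per row, 1 byte per `bool`) are the assumptions stated in this issue, not measurements.

```rust
/// Fraction of per-row memory saved by storing the "visited" flag as one bit
/// instead of one `bool` byte. Byte counts are assumptions from the issue text.
fn bitmap_saving_fraction(item_bytes: f64, hashmap_bytes: f64) -> f64 {
    let flag_as_bool = 1.0; // Vec<bool> uses one byte per row
    let flag_as_bit = 1.0 / 8.0; // a bitmap uses one bit per row
    let before = item_bytes + hashmap_bytes + flag_as_bool;
    // Only the flag shrinks, so the saving is 7/8 of a byte per row.
    (flag_as_bool - flag_as_bit) / before
}
```

With 4-byte items and 16 bytes of hashmap overhead, this gives 0.875 / 21, or about 4.2%, which lines up with the ~4-5% figure.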

Additionally, it could include a fast path for the cases where every row is unmatched or no row is unmatched.
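
One possible shape for that fast path, assuming the bitmap is kept as a slice of `u64` words with any padding bits past `len` left at zero. The names `MatchSummary` and `summarize` are hypothetical and are not taken from the existing code.

```rust
/// Coarse summary of which left rows were visited (matched).
#[derive(Debug, PartialEq)]
enum MatchSummary {
    NoneMatched, // every left row is unmatched
    AllMatched,  // no left row is unmatched
    Mixed,       // fall back to the per-row scan
}

/// Classify a bitmap of `len` rows stored in 64-bit words.
/// Assumes bits at positions >= `len` are zero.
fn summarize(words: &[u64], len: usize) -> MatchSummary {
    // Counting set bits word by word avoids inspecting rows individually.
    let set: usize = words.iter().map(|w| w.count_ones() as usize).sum();
    if set == 0 {
        MatchSummary::NoneMatched
    } else if set == len {
        MatchSummary::AllMatched
    } else {
        MatchSummary::Mixed
    }
}
```

In the `NoneMatched` case the whole left side could be emitted as unmatched without building an index list, and in the `AllMatched` case the unmatched output could be skipped entirely.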

Describe the solution you'd like
Use a bitmap instead of Vec<bool>. The bitmap could be from arrow or maybe the bitvec crate.
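
To illustrate the idea without depending on either crate, here is a minimal standard-library-only sketch. The type `VisitedBitmap` and its methods are hypothetical; a real change would more likely reuse the bitmap types from arrow or bitvec mentioned above.

```rust
/// Minimal bitmap: one bit per left row, packed into 64-bit words.
struct VisitedBitmap {
    words: Vec<u64>,
    len: usize,
}

impl VisitedBitmap {
    /// Create a bitmap for `len` rows with every row marked as not visited.
    fn new(len: usize) -> Self {
        Self { words: vec![0; (len + 63) / 64], len }
    }

    /// Mark row `i` as visited.
    fn set(&mut self, i: usize) {
        assert!(i < self.len, "row index out of bounds");
        self.words[i / 64] |= 1u64 << (i % 64);
    }

    /// Return whether row `i` has been visited.
    fn get(&self, i: usize) -> bool {
        assert!(i < self.len, "row index out of bounds");
        (self.words[i / 64] >> (i % 64)) & 1 == 1
    }

    /// Bytes used by the packed words (a Vec<bool> would use `len` bytes).
    fn byte_size(&self) -> usize {
        self.words.len() * 8
    }
}
```

For 130 rows this uses three words (24 bytes) where a Vec<bool> would use 130 bytes.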

Describe alternatives you've considered
Keep using a Vec<bool>

Additional context
n/a

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request
