
[Rust] [DataFusion] Add inner (hash) equijoin physical plan #25621

@asfimport

Description


Here is an overview of how I think we should implement support for equijoins, at least in the initial version.

  • Read all batches from the left side of the join into a single Vec
  • Create a hash map, something like HashMap<Vec, Vec<(usize,usize)>>, that maps join-key values to (batch, row) indices
  • Iterate over this Vec and, for each row, insert an entry into the hash map that maps its join-key values to the index of its batch and of the row within that batch
  • For each input partition on the right side of the join, return an output partition that is an iterator/stream that:
    • For each input row, evaluates the join keys
    • Looks up those join keys in the hash map
    • If a match is found:
      • For each (batch, row) index, creates an output row containing the values from both the left and the right row, and emits it
    • If no match is found:
      • Does not emit a row (inner-join semantics)
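The build and probe steps above can be sketched in Rust as follows. This is a minimal, single-threaded sketch under stated assumptions, not DataFusion's actual implementation: it uses a hypothetical simplified data model (a row is a Vec<i64>, a batch is a Vec of rows) instead of Arrow record batches, identifies join keys by column index, and the names Row, Batch, build_hash_table and probe_partition are illustrative only.

```rust
use std::collections::HashMap;

// Hypothetical simplified data model (assumption, not Arrow/DataFusion types):
// a row is a vector of i64 values and a batch is a vector of rows.
type Row = Vec<i64>;
type Batch = Vec<Row>;

/// Build phase: index every left-side row by its join-key values, mapping
/// each key to the (batch index, row index) pairs where it occurs.
fn build_hash_table(
    left: &[Batch],
    left_keys: &[usize],
) -> HashMap<Vec<i64>, Vec<(usize, usize)>> {
    let mut table: HashMap<Vec<i64>, Vec<(usize, usize)>> = HashMap::new();
    for (batch_idx, batch) in left.iter().enumerate() {
        for (row_idx, row) in batch.iter().enumerate() {
            // Extract the join-key columns of this row.
            let key: Vec<i64> = left_keys.iter().map(|&c| row[c]).collect();
            table.entry(key).or_default().push((batch_idx, row_idx));
        }
    }
    table
}

/// Probe phase for one right-side partition: lazily yields one joined row
/// (left values followed by right values) per matching (batch, row) index.
/// Right rows whose key is not in the table produce nothing (inner join).
fn probe_partition<'a>(
    left: &'a [Batch],
    table: &'a HashMap<Vec<i64>, Vec<(usize, usize)>>,
    right_partition: &'a [Batch],
    right_keys: &'a [usize],
) -> impl Iterator<Item = Row> + 'a {
    right_partition.iter().flatten().flat_map(move |right_row| {
        // Evaluate the join keys for this right-side row.
        let key: Vec<i64> = right_keys.iter().map(|&c| right_row[c]).collect();
        table
            .get(&key)
            .into_iter()
            .flatten()
            .map(move |&(b, r)| {
                // Concatenate the matching left row with the right row.
                let mut out = left[b][r].clone();
                out.extend_from_slice(right_row);
                out
            })
    })
}
```

In this sketch the hash table is built once and only read during probing, so each right-side partition could call probe_partition independently and consume its own lazy iterator. Storing (batch, row) indices instead of copies keeps the table small, at the cost of keeping the whole left side materialized in memory.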

Reporter: Jorge Leitão / @jorgecarleitao
Assignee: Jorge Leitão / @jorgecarleitao


Note: This issue was originally created as ARROW-9555. Please see the migration documentation for further details.
