
Expose DataFusion extensions (plan nodes, plan optimizer rules) #1782

@wjones127


The code in dataset/scanner.rs has gotten extremely complicated, to a point where it is hard to test. Before we make any improvements, we need to refactor this to be easier to test and extend.

In addition, outside codebases may wish to extend Lance's capabilities by modifying or composing plans. For example, in LanceDB, we'll want to add a separate WAL that needs to be queried during KNN queries and scans.
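To make the WAL case concrete, here is a minimal sketch of what composing plans from an outside crate could look like. It uses only the standard library and a toy `Plan` enum; `Plan`, `PlanRule`, and `IncludeWal` are hypothetical names for illustration, not types from Lance or DataFusion, and the real extension points may end up looking quite different.

```rust
/// Toy stand-in for a query plan tree (NOT DataFusion's `LogicalPlan`).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Full scan of a named source.
    Scan { dataset: String },
    /// Nearest-neighbor search over `input`, keeping the `k` closest rows.
    Knn { input: Box<Plan>, column: String, k: usize },
    /// Concatenation of the rows produced by several inputs.
    Union { inputs: Vec<Plan> },
}

/// Hypothetical hook: a rewrite rule that a downstream crate could supply.
trait PlanRule {
    fn rewrite(&self, plan: Plan) -> Plan;
}

/// Hypothetical downstream rule: every scan also reads from a WAL.
struct IncludeWal {
    wal_name: String,
}

impl PlanRule for IncludeWal {
    fn rewrite(&self, plan: Plan) -> Plan {
        match plan {
            // Replace each scan with a union of that scan and a WAL scan.
            Plan::Scan { dataset } => Plan::Union {
                inputs: vec![
                    Plan::Scan { dataset },
                    Plan::Scan { dataset: self.wal_name.clone() },
                ],
            },
            // Recurse into the KNN input so the search also sees WAL rows.
            Plan::Knn { input, column, k } => Plan::Knn {
                input: Box::new(self.rewrite(*input)),
                column,
                k,
            },
            // Recurse into each branch of an existing union.
            Plan::Union { inputs } => Plan::Union {
                inputs: inputs.into_iter().map(|p| self.rewrite(p)).collect(),
            },
        }
    }
}
```

Applying `IncludeWal` to a `Knn` over a `Scan` yields a `Knn` over a `Union` of the original scan and the WAL scan. The rule in this sketch is not idempotent, so it is meant to be applied once per plan.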

Tasks

  • Create a LogicalPlan for KNN search
  • Write an extension for LogicalPlanBuilder and rewrite Scanner::create_plan() in terms of that.
  • Create a custom PhysicalPlanner to convert our logical plans into relevant nodes
  • Implement a PhysicalOptimizerRule for each of the optimizations we make:
    • Using vector indices (ANN)
    • Using scalar indices (ANN and scan)
  • Make sure the schema returned from the scanner comes from the logical plan.
  • Metadata is returned only some of the time. Make this behavior consistent.
  • Make _rowid and _distance less awkward to deal with, so that select([]).with_row_id() is just select("_rowid"). We should reserve that column name.
  • Don't return _distance unless the query is select * or explicitly selects _distance.
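The last two tasks amount to a small set of projection rules, which could be sketched as below. This is a std-only illustration, not Lance code: `Projection`, `is_reserved`, and `output_columns` are hypothetical names, and the sketch assumes that `select *` does not return `_rowid` and that `_distance` only exists for KNN queries.

```rust
/// Reserved column names taken from the issue text.
const ROW_ID: &str = "_rowid";
const DISTANCE: &str = "_distance";

/// Hypothetical projection type: `select *` or an explicit column list.
#[derive(Debug, Clone, PartialEq)]
enum Projection {
    All,
    Columns(Vec<String>),
}

/// True if a user-defined column would collide with a reserved name.
fn is_reserved(name: &str) -> bool {
    name == ROW_ID || name == DISTANCE
}

/// Resolve the output column names for a query.
///
/// Assumptions in this sketch: `select *` returns the data columns plus
/// `_distance` for KNN queries, and `_rowid` is returned only when it is
/// selected explicitly.
fn output_columns(
    data_columns: &[&str],
    projection: &Projection,
    is_knn: bool,
) -> Result<Vec<String>, String> {
    match projection {
        Projection::All => {
            let mut out: Vec<String> =
                data_columns.iter().map(|c| c.to_string()).collect();
            if is_knn {
                out.push(DISTANCE.to_string());
            }
            Ok(out)
        }
        Projection::Columns(cols) => {
            let mut out = Vec::with_capacity(cols.len());
            for col in cols {
                let known = match col.as_str() {
                    // `_rowid` is always selectable.
                    ROW_ID => true,
                    // `_distance` only exists for KNN queries.
                    DISTANCE => is_knn,
                    other => data_columns.iter().any(|c| *c == other),
                };
                if !known {
                    return Err(format!("unknown column: {}", col));
                }
                out.push(col.clone());
            }
            Ok(out)
        }
    }
}
```

Under these rules, selecting only `_rowid` needs no separate `with_row_id()` flag, and a KNN query that selects specific columns returns `_distance` only if it is named in the list.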

Labels: enhancement, rust
