[Rust][DataFusion] Reduce copies in DataFusion LogicalPlan and Expr creation #27552

@asfimport

Description

The theme of this epic is to make the plan and expression rewriting phases of DataFusion more efficient by leveraging the Rust type system to avoid copies.

Benefits:

  • More standard / idiomatic Rust usage

  • Faster / more efficient (I don't have numbers to back this up)

Downsides:

  • These will be backwards-incompatible changes

Background

Many things in DataFusion look like

Input -transformation-> output

and the input is not used again. In Rust, you can model this by giving ownership of the input to the transformation.

At a high level, the idea is to avoid so much cloning in DataFusion.

The basic principle is that if a function needs to clone one of its arguments, the caller should be given the choice of when to do that. Often, the caller can give up ownership without issue.
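To make the principle concrete, here is a minimal sketch using a toy `Plan` type. Everything in it (`Plan`, `limit_borrowed`, `limit_owned`, `filter_owned`, `demo`) is hypothetical and made up for illustration; it is not DataFusion's actual `LogicalPlan` or API.

```rust
// Hypothetical toy plan type for illustration; not DataFusion's LogicalPlan.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { table: String },
    Filter { predicate: String, input: Box<Plan> },
    Limit { n: usize, input: Box<Plan> },
}

// Borrowing style: the new node needs an owned input, so the function
// has to clone internally, whether or not the caller still needs `plan`.
fn limit_borrowed(plan: &Plan, n: usize) -> Plan {
    Plan::Limit { n, input: Box::new(plan.clone()) }
}

// Owning style: the input is moved into the new node; no copy is made here.
fn limit_owned(plan: Plan, n: usize) -> Plan {
    Plan::Limit { n, input: Box::new(plan) }
}

// Same owning style for a second kind of node.
fn filter_owned(plan: Plan, predicate: &str) -> Plan {
    Plan::Filter { predicate: predicate.to_string(), input: Box::new(plan) }
}

fn demo() -> (Plan, Plan) {
    let scan = Plan::Scan { table: "t".to_string() };
    // The caller still needs `scan` afterwards, so it opts in to the clone.
    let limited = limit_owned(scan.clone(), 10);
    // The caller is done with `scan`, so it is moved and nothing is copied.
    let filtered = filter_owned(scan, "a > 1");
    (limited, filtered)
}
```

With the owning signatures, the clone shows up at the call site only where the original is actually reused; with `limit_borrowed` it happens on every call.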

I envision at least the following items:

1. Optimizer passes that take &LogicalPlan and produce a new LogicalPlan, even though most callsites do not need the original
2. Expr builder calls that take &Expr and return a new Expr
3. An expression rewriter (TODO) while working through the optimizer passes
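Items 2 and 3 could look roughly like the sketch below: a builder method that takes `self` by value, and a rewriter that consumes the tree and moves unchanged nodes into the result. The `Expr` and `Op` types, `plus`, `rewrite`, and `fold_constants` are all assumptions invented for this example, not DataFusion's real definitions.

```rust
// Hypothetical toy expression type; not DataFusion's Expr.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Add,
    Mul,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Binary { left: Box<Expr>, op: Op, right: Box<Expr> },
}

impl Expr {
    // Builder that takes both operands by value, so nothing is cloned.
    fn plus(self, other: Expr) -> Expr {
        Expr::Binary { left: Box::new(self), op: Op::Add, right: Box::new(other) }
    }

    // Consuming rewriter: children are rewritten first, then `f` is applied
    // to the rebuilt node. Leaves are moved through without copying.
    fn rewrite<F: Fn(Expr) -> Expr>(self, f: &F) -> Expr {
        let rebuilt = match self {
            Expr::Binary { left, op, right } => Expr::Binary {
                left: Box::new((*left).rewrite(f)),
                op,
                right: Box::new((*right).rewrite(f)),
            },
            leaf => leaf,
        };
        f(rebuilt)
    }
}

// Example rewrite rule: fold `literal + literal` into a single literal.
fn fold_constants(e: Expr) -> Expr {
    match e {
        Expr::Binary { left, op: Op::Add, right } => {
            let folded = match (left.as_ref(), right.as_ref()) {
                (Expr::Literal(a), Expr::Literal(b)) => Some(a + b),
                _ => None,
            };
            match folded {
                Some(v) => Expr::Literal(v),
                // Not foldable: reuse the existing boxes as they are.
                None => Expr::Binary { left, op: Op::Add, right },
            }
        }
        other => other,
    }
}

fn demo_expr() -> Expr {
    // Builds a + (1 + 2) without any clone, then folds the constant part.
    let expr = Expr::Column("a".to_string()).plus(Expr::Literal(1).plus(Expr::Literal(2)));
    expr.rewrite(&fold_constants)
}
```

A caller that still needs the original expression can write `expr.clone().rewrite(...)`; callers that do not need it pay nothing for the copy.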

I think this style takes advantage of Rust's ownership model and will let us avoid a lot of copying and allocations, and avoid the need for something like slab allocators.

Reporter: Andrew Lamb / @alamb

Subtasks:

Related issues:

Note: This issue was originally created as ARROW-11689. Please see the migration documentation for further details.
