Description
The theme of this epic is to make the plan and expression rewriting phases of DataFusion more efficient by using the Rust type system to avoid copies.
Benefits:
- More standard / idiomatic Rust usage
- Faster / more efficient (I don't have numbers to back this up)
Downsides:
- These will be backwards incompatible changes
Background
Many things in DataFusion look like
Input -> transformation -> output
and the input is not used again. In Rust, you can model this by giving ownership of the input to the transformation.
At a high level, the idea is to avoid so much cloning in DataFusion.
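As a minimal sketch of the difference, assuming a toy `Plan` type and two hypothetical helper functions (none of these are DataFusion's real types or APIs), compare a transformation that borrows its input with one that takes ownership:

```rust
/// Hypothetical stand-in for a plan node; not DataFusion's real `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
struct Plan {
    steps: Vec<String>,
}

/// Borrowing version: the function has to clone the input to build its
/// output, even when the caller never uses the original again.
fn add_step_borrowed(plan: &Plan, step: &str) -> Plan {
    let mut out = plan.clone(); // copies the Vec and every String in it
    out.steps.push(step.to_string());
    out
}

/// Owning version: the input is moved in, modified in place, and moved out.
/// No copy of the existing steps is made.
fn add_step_owned(mut plan: Plan, step: &str) -> Plan {
    plan.steps.push(step.to_string());
    plan
}
```

With the owning version, a caller that still needs the original writes `add_step_owned(plan.clone(), "filter")` and pays for the copy explicitly; a caller that does not simply passes `plan` and pays nothing.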
The basic principle is that if a function needs to `clone` one of its arguments, the caller should be given the choice of when to do that. Often, the caller can give up ownership without issue.
I envision at least the following items:
- Optimizer passes that take `&LogicalPlan` and produce a new `LogicalPlan` even though most callsites do not need the original
- Expr builder calls that take `&expr` and return a new `Expr`
- An expression rewriter (TODO) while running down optimizer passes
I think this style takes advantage of Rust's ownership model and will let us avoid a lot of copying and allocations, and avoid the need for something like slab allocators.
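The builder and rewriter items could look roughly like the sketch below. It uses a heavily simplified, hypothetical `Expr` enum and a hypothetical `rename_column` function for illustration; DataFusion's actual `Expr` and rewriter interfaces are not shown here and may differ.

```rust
/// Hypothetical, heavily simplified expression tree; not DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Builder method that takes `self` by value: both operands are moved
    /// into the new node, so neither subtree is cloned.
    fn add(self, rhs: Expr) -> Expr {
        Expr::Add(Box::new(self), Box::new(rhs))
    }
}

/// Owning rewrite: consumes the tree and returns a rewritten one.
/// Columns named `from` are replaced with `to`; every other leaf is
/// moved into the result rather than cloned.
fn rename_column(expr: Expr, from: &str, to: &str) -> Expr {
    match expr {
        Expr::Column(name) if name == from => Expr::Column(to.to_string()),
        Expr::Add(l, r) => Expr::Add(
            Box::new(rename_column(*l, from, to)),
            Box::new(rename_column(*r, from, to)),
        ),
        // Literals and non-matching columns pass through unchanged.
        other => other,
    }
}
```

A caller that wants to keep the original expression clones it before the call, for example `rename_column(expr.clone(), "a", "b")`; everyone else hands the tree over and avoids the copy.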
Reporter: Andrew Lamb / @alamb
Subtasks:
- [Rust][DataFusion] Avoid Expr::clone in Expr builder methods
- [Rust][DataFusion] Move `expressions` and `inputs` into LogicalPlan rather than helpers in util
- [Rust][DataFusion] Implement ExprRewriter to avoid tree traversal redundancy
- [Rust][DataFusion] Rename ExpressionVisitor --> ExprVisitor and standardize input
- [Rust][DataFusion] Introduce PlanRewriter for rewriting plans
- [Rust][DataFusion] Change plan builder signature to take Vec rather than &[Expr]
Related issues:
Note: This issue was originally created as ARROW-11689. Please see the migration documentation for further details.