-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Optimize planning / stop cloning Strings / Fields so much (2-3% faster planning time) #18415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| metadata, | ||
| .. | ||
| }) => { | ||
| let field = expr.to_field(schema).map(|(_, f)| f.as_ref().clone())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these as_ref().clone() type methods are doing a deep clone on the Fields each time even when the name is not changing
|
🤖 |
| /// Returns an immutable reference of a specific `Field` instance selected using an | ||
| /// offset within the internal `fields` vector | ||
| pub fn field(&self, i: usize) -> &Field { | ||
| pub fn field(&self, i: usize) -> &FieldRef { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the changes in this file are the public API changes -- they return a reference to the Arc rather than what is in the Arc meaning the callsite can clone the arcs when needed
|
🤖: Benchmark completed Details
|
|
🤖 |
|
🤖: Benchmark completed Details
|
074e485 to
879d1bf
Compare
879d1bf to
9dcfdfc
Compare
| let filter_df_fields = filter_df_fields | ||
| .into_iter() | ||
| .map(|(qualifier, field)| { | ||
| (qualifier.cloned(), Arc::new(field.clone())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was always a deep clone that is now remved
|
🤖 |
|
🤖: Benchmark completed Details
|
|
🤖 |
|
🤖: Benchmark completed Details
|
| } | ||
|
|
||
| impl FieldExt for Field { | ||
| fn renamed(self, new_name: &str) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added several new methods to avoid cloning when it is unnecesary
| @@ -429,38 +430,36 @@ | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is called as part of Expr::nullable and Expr::nullable so it is a very hot path on planning
|
@Jefffrey do you happen to have time to review this PR? |
Jefffrey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me 👍
datafusion/common/src/dfschema.rs
Outdated
| &self.inner.fields | ||
| } | ||
|
|
||
| /// Returns an immutable reference of a specific `Field` instance selected using an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps need to update some of these method docstrings since they assume returning a Field
| /// let renamed_field = int_field.renamed("your_int"); | ||
| /// assert_eq!(renamed_field.name(), "your_int"); | ||
| /// ``` | ||
| fn renamed(self, new_name: &str) -> Self; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder for these new methods if there is existing documentation elsewhere we'd need to upgrade to make these new methods more visible? i.e. "use these new methods if possible", though not sure how relevant might be considering these mainly replace the old as_ref from Arc -> clone -> modify field -> wrap in new Arc loop 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is a good idea -- I will review the existing docs and try to make it more obvious / clear how to use the new methdods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to update the documentation with some more details on how to discover the new traits
Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
…on into alamb/stop_cloning_fields
xudong963
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the effort @alamb, our team is also interested in the planner performance!
Thanks @xudong963 -- the good news is I think there is some more significant improvement we could make (by not copying strings so much). I'll try and explore it when I have more time |
|
run benchmarks sql_planner |
|
🤖 Hi @alamb, thanks for the request (#18415 (comment)).
Please choose one or more of these with |
|
run benchmark sql_planner |
|
show benchmark queue |
|
🤖 Hi @alamb, you asked to view the benchmark queue (#18415 (comment)).
|
|
🤖 |
|
🤖: Benchmark completed Details
|
|
Thanks again @xudong963 and @Jefffrey |
Which issue does this PR close?
Rationale for this change
Avoid a bunch of clones / String copies during planning
What changes are included in this PR?
Change several methods on DFSchema to return
&FieldRefrather than&Fieldwhich permitsArc::clonerather than a deepFieldcloneAre these changes tested?
yes by CI
I also ran benchmarks that show a small but consistent speedup in the planning benchmarks
Are there any user-facing changes?
Yes, there are several API changes in DFSchema that now return
FieldRefrather thanFieldwhich allows usingArc::clonerather thanclone. I have updated the upgrading guide too