-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Add support for EXPLAIN ANALYZE #858
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| }, | ||
| /// Runs the actual plan, and then prints the physical plan with | ||
| /// with execution metrics. | ||
| Analyze { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE: I chose a new LogicalPlan node because the implementation for ANALYZE is so different than EXPLAIN. However, it would be possible to re-use the same LogicalPlan node if people prefer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am about to ask when I saw the above code that has analyze as different function. I am worry about future inconsistency and headache of keeping them consistent, as well as redundant work when we change or improve something. I would prefer to keep them in the same LogicalPlan
97b5928 to
c7d1d4a
Compare
c7d1d4a to
3ff7ffd
Compare
NGA-TRAN
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the fact you built this infrastructure before adding new coulters. It is easy to understand and review. My only concern is the analyze is implemented in different path of the explain which seems also in different path of the actual plan. I am not sure how tricky to keep them in the same path but if we can, it will help us keep the results consistent, avoid redundancy and headache for future work.
| .and_then(|plan| plan.build()) | ||
| .map_err(BallistaError::DataFusionError)?; | ||
|
|
||
| roundtrip_test!(plan); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we talk I need to learn the the effectiveness of this round trip test thats convert a logical/physical plan into photo and back. The tests look simple and easy to understand this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest I simply copy/pasted the test for roundtrip_explain below -- I agree the pattern is quite nice
| }, | ||
| /// Runs the actual plan, and then prints the physical plan with | ||
| /// with execution metrics. | ||
| Analyze { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am about to ask when I saw the above code that has analyze as different function. I am worry about future inconsistency and headache of keeping them consistent, as well as redundant work when we change or improve something. I would prefer to keep them in the same LogicalPlan
@NGA-TRAN I also went back and forth on this point. The existing And thus even though there is definitely some redundancy, I eventually concluded that a new The physical plan ( |
|
I agree that a new variant makes sense here, for the reasons @alamb enumerated. Also, pretty awesome PR! 💯 |
3ff7ffd to
87caa91
Compare
Which issue does this PR close?
Resolves #779
Rationale for this change
EXPLAIN PLANis great to understand what DataFusion plans to do, but it is hard today using the SQL or dataframe interface to understand in more depth what actually happened during execution.My real usecase is being able to see how many rows flowed through each operator as well as the "time spent" and "rows produced" by each operator, and this PR is a step in that direction.
What changes are included in this PR?
EXPLAIN ANALYZEand EXPLAIN ANALYZE VERBOSE` (example below) sqlParquetStrreamintoRecordBatchReceiverStreamfor reuseAre there any user-facing changes?
Yes,
EXPLAIN ANALYZEnow does something different thanEXPLAINExample of use
Run CLI
Example
EXPLAIN ANALYZEoutputExample
EXPLAIN ANALYZE VERBOSEoutputFuture work:
Note this PR is designed just to hook up / plumb the existing code and metrics we have into SQL (basically what got added in #662). I plan a sequence of follow on PRs to both improve the metrics infrastructure #679 and add/fix the metrics that are actually reported so they are consistent. The specific metrics that are displayed are verbose and somewhat ad hoc at the moment.