Improve the output of SQL explain message#11908
Improve the output of SQL explain message#11908abhishekagarwal87 merged 22 commits intoapache:masterfrom
Conversation
kfaraz
left a comment
There was a problem hiding this comment.
Thanks for the changes, @LakshSingla !
On the whole, the PR looks good.
There are a few things that need changing/explanation.
|
We can support both the formats simultaneously in Calcite's syntax using So potentially, |
|
Commented out the |
|
@LakshSingla - just a quick question - will this be backward-compatible? |
|
@dbardbar - will you still need that custom code once this change is merged? This change is trying to solve the same problem of not being able to see the final native query. |
|
hmm. I guess you would still want to rid of the extra fields such as row signature etc |
|
@LakshSingla - the native query can be extracted from the response today, but it does require some logic to extract it. Not the prettiest piece of code, but not that complicated. |
|
@abhishekagarwal87 - for our use-case we want to extract from the native query on the part related to the 'WHERE' clause, so anyway we'll need some basic handling, even with the new code. With the new code, the extraction will be a bit easier. |
|
This looks nice 👍 Since this changes the output of a query, i'm +1 for adding a feature flag to allow the previous results to be returned. Since this output seems totally better I think it is ok to default it to the new stuff. It might be nice to also add a context parameter so that it could be overridden per query to make it easier for developers to migrate their apps to the new output while debugging without having to set it for the entire cluster. |
|
In accordance with the above suggestions, I have added a config option in the |
abhishekagarwal87
left a comment
There was a problem hiding this comment.
looks good to me overall. just one minor comment.
|
Should the property |
Yes. You can put it in |
|
Updated the description with pros and cons of the newer approach. |
|
Very excited for this change! I got 2 questions: (1) is it possible to provide the signature as JSON and not as a string? instead of provide it as the signature strings are really annoying to parse and these signatures are really important (sometimes more so than the query) (2) you included screenshots of the old format in the console query view, but how does it work with the explain dialog? Does it need to be updated as part of this PR or right afterwards? Just to be clear I am talking about this dialog: As you can see the current dialog parses the old format thus relying on it. I would LOVE nothing more than to switch to this new format (providing pt.1) but does this PR break this dialog? Ideally there would be a context flag to trigger the new format but the old format would be the default (at least for a few releases). |
|
it looks like |
|
Thanks for the comment @vogievetsky
|
clintropolis
left a comment
There was a problem hiding this comment.
👍
#11959 got merged faster than I was expecting 😅, so maybe consider the rename/inversion when fixing up conflicts
| |`druid.sql.planner.metadataSegmentPollPeriod`|How often to poll coordinator for published segments list if `druid.sql.planner.metadataSegmentCacheEnable` is set to true. Poll period is in milliseconds. |60000| | ||
| |`druid.sql.planner.authorizeSystemTablesDirectly`|If true, Druid authorizes queries against any of the system schema tables (`sys` in SQL) as `SYSTEM_TABLE` resources which require `READ` access, in addition to permissions based content filtering.|false| | ||
| |`druid.sql.planner.useLegacyDruidExplain`|If true, `EXPLAIN PLAN FOR` will return the explain plan in legacy format, else it will return a JSON representation of the native queries the given SQL statement translates to. It can be overridden per query with `useLegacyDruidExplain` context key.|false| | ||
| |`druid.sql.planner.useLegacyDruidExplain`|If true, `EXPLAIN PLAN FOR` will return the explain plan in legacy format, else it will return a JSON representation of the native queries the given SQL statement translates to. It can be overridden per query with `useLegacyDruidExplain` context key.|true| |
There was a problem hiding this comment.
since the default has been flipped, I wonder if we should invert this setting and call something like `useJsonStringExplain" or something similar that defaults to false. (I just did a similar thing when swapping the default to keep old behavior in #11184, swapping a "use legacy" to "use new thing", which I think maybe is better since is a bit odd for "use legacy" to be the default of something)
There was a problem hiding this comment.
If in the future, the newer explain plan is to be used, then I think useLegacyDruidExplain would work better, since we won't have to change the feature flag then. But it does look weird right now, since it hasn't been deprecated yet. I am fine with keeping it either way.
There was a problem hiding this comment.
I agree. The choice is between legacy vs new plan so from that the flag name does sound good to me.
|
Just merged in the changes! |
|
Minor comment: which frees the referenced code from usage of |
I think that the current approach is fine since the new approach means creating a new type. If in future, we do similar deserialization in other places, we can use a custom type specifically for explain output. |
|
Thank you @LakshSingla. I have merged your change. |
This is a follow up to the PR #11908. This fixes the bug in top level union all queries when there are more than 2 SQL subqueries are present.


Description
Currently, when we try to do
EXPLAIN PLAN FOR, it returns the structure of the SQL parsed (via Calcite's internal planner util), which is verbose (since it tries to explain about the nodes in the SQL, instead of the Druid Query), and not representative of the native Druid query which will get executed on the broker side.This PR aims to change the format when user tries to
EXPLAIN PLAN FORfor queries which are executed by converting them into Druid's native queries (i.e. notsysschemas).The explanation now will be a list of columns with information about
resources(no change) andexplanation.Explanation is a string representing an array of queries & their signatures (in JSON format). Final shape of the
explanationwill look like:Examples:
Query Shape:
EXPLAIN PLAN FOR ( SELECT dim1, dim2 FROM druid.foo )BEFORE
Output in console


Plan (expanded)
AFTER
Query Shape:
EXPLAIN PLAN FOR ( SELECT dim1 FROM druid.foo UNION ALL SELECT dim2 FROM druid.foo2)BEFORE
Output in console


Plan (expanded)
AFTER
Some parts truncated
Query Shape:
EXPLAIN PLAN FOR ( SELECT a.dim1, COUNT(*) FROM druid.foo a INNER JOIN ( SELECT dim1, dim2 FROM druid.foo UNION ALL SELECT dim1, dim2 FROM druid.foo2 ) b ON a.dim1 = b.dim1 WHERE a.dim1 = 1.0 b.dim1 = 2.0 GROUP BY a.dim1 )BEFORE
Output in console


Plan(expanded)
AFTER
Some parts truncated
The older format vs the newer format
Older Format:
RelNodesformed from the SQL statement. This could help in understanding how the given query was arrangedNewer Format
EXPLAIN PLAN FORsemantics of other databases like SQL/Oracle etc. (Can be said for the older version as well)We are using a external context flag to switch between legacy and native mode, and not relying on
EXPLAIN PLAN FOR ... AS JSONsince the latter should ideally only vary the format of the EXPLAIN PLAN and not the output itself (which is being done here).Key changed/added classes in this PR
DruidPlanner#planExplanation.RowSignatureTestfor serializabilityThis PR has: