Today is that day - Single pass through Calcite planner#12636
clintropolis merged 4 commits into apache:master from
Conversation
heh, I think I wrote that comment 😅 IIRC the main reason I didn't make the change at the time is that I didn't want to validate that stateful JDBC connections actually worked if we made them actually stateful (instead of repeating the work per request). So this is probably the area where most testing should focus, especially on prepared statements with dynamic parameters. IIRC, for dynamic parameters, we do actually want to re-run through the planner once we have values for the actual parameter bindings, because some stuff can be optimized during planning, but... it was a while ago when I wired them up, so maybe it's not an issue? Anyway, I'll try to have a closer look at this sometime soon 👍
@gianm, thanks for the background. The new planner test framework will catch the kinds of issues you mention: it captures the details of the Calcite logical plan (as well as the native query, like the existing tests do). The good news is that the original version of these changes was done in a branch that had the new planner tests, and they reported no changes in the Calcite artifacts. That said, it is worth spending time to trace exactly how we handle parameters to be sure we account for the proper flow. In general, there should be no reason to replan a query under normal conditions: other tools that use Calcite generally make do with a single pass. Where tools tend to plan a second time is when the planner paints itself into a corner, some global state is changed, and the planner is run a different way. (Impala used to do that.)
Dynamic parameters are handled in two places, the first by rewriting the
I somewhat remember getting different results depending on whether the parameter bindings (and, more importantly, their types) were present when running things through the planner, but if I remember correctly it might not change the end-result native query much. But yeah, in the JDBC case the state is a bit different once the parameters have values bound to them, because they then have a type. SQL over HTTP also supports dynamic parameters, but in that case the bindings come with the HTTP request instead of after preparing the statement as in JDBC, so there is no intermediate state where we have a bunch of typeless things hanging out.
@clintropolis, thanks again for the description; it makes sense. @clintropolis described offline the process we use for parameters. Basically, when a query is to be run and parameters are provided along with the query text, the values are bound during planning. If the query comes via Avatica as a PREPARE, then there are no parameter values: parameters are left as placeholders, with values to be filled in per-query. Given all this (and after a bit more code tweaking and comment-adding), the one-pass approach in this PR seems to handle all of the above paths through the planner.
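The two parameter flows described above can be sketched in miniature. This is an illustrative model, not Druid's actual API; all names are made up. It shows why JDBC's PREPARE creates an intermediate state with typeless placeholders, while HTTP-style binding supplies types up front:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a query whose parameters may be bound before planning
// (HTTP style) or after PREPARE (JDBC style).
class ParameterizedQuery {
  final String sql;
  private final List<Object> bindings = new ArrayList<>();

  ParameterizedQuery(String sql) {
    this.sql = sql;
  }

  // JDBC style: bind a value after PREPARE; HTTP style: bind before planning.
  void bind(Object value) {
    bindings.add(value);
  }

  // Before binding, a placeholder has no type; after binding, the value's
  // Java class stands in for the SQL type the planner would infer.
  String typeOf(int position) {
    return position < bindings.size()
        ? bindings.get(position).getClass().getSimpleName()
        : "UNKNOWN";
  }
}
```

Under this model, a JDBC PREPARE sees every placeholder as `UNKNOWN` until execution binds values, which is the intermediate state the comment above describes.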
gianm left a comment
Very nice to have this cleaned up! The changes mostly look good to me. I had a few line comments. I didn't review the code that was marked as copied from Calcite, because I assume that it's the same stuff we're already using.
I also am not familiar with the JDBC / dynamic parameter issue that you and @clintropolis have been discussing, so I didn't consider that particular situation in my review. Do the tests cover it well enough?
 * Calcite planner. Clone of Calcite's {@code SqlPlannerImpl},
 * but with the validator made accessible.
 */
public class CalcitePlanner implements Planner, ViewExpander
I'm surprised checkstyle accepted this file. We must have pretty loose rules, since it doesn't look to me like a normal Druid source file. I suppose that's OK.
I did have to edit the file quite a bit to convert it from Calcite to Druid style. I refrained from more than the minimal changes to make it easier to compare the two. If we like, I can reformat the whole file to be closer to the Druid format. I would probably hold off changing the code structure to be more Druid-like. Calcite has its quirks, and it might be safer to leave well enough alone on that front. Thoughts? Any particularly egregious issues you'd like addressed?
{
  final RelDataType rowType;
  try (final DruidPlanner planner = plannerFactory.createPlanner(viewSql, new QueryContext())) {
    planner.validate(false);
I didn't realize that view expansion was done with an empty query context. I suppose this makes sense: it ensures the view is the same for everyone.
IMO, it's better to set this to always true instead of always false. Right now, it doesn't matter either way, because the context is empty. But if it's ever made nonempty, then true is a safer constant value because it defaults to the more restrictive security mode. If that doesn't make sense in the future, then the future person that is adding the context parameters can sort out what's best.
A comment would be helpful too, like:
// Since the context is empty, it doesn't matter if the authorizeContextParams flag is true or false.
So, this turns out to be another messy/tricky area. The authorizeContextParams flag simply adds the context variables to the set of resource actions. But nothing in the above code path checks the resource actions. In fact, it seems that it can't do so: there is no authorization result available. So the code is ambiguous: it's claiming to do something that it can't do.
As part of the auth cleanup, I moved the authorizeContextParams flag to the authorization step, and clearly stated that a statement (or view) can be prepared without authorization. Only execution (AKA "plan") needs authorization.
The standard way to handle views is to assign them an owner. Views run with the owner's permissions. Queries that use the view must be authorized against the view, not against the resources which the view uses. Of course, Druid has no concept of users, so the idea of "owner" is ill-defined. Maybe we stash a bundle of permissions with the view or some such? That's a question for another time.
Context variables are a property of the query, not of the view. So, we should check them as part of the query check, and not confuse ourselves trying to figure out if we should check them with views.
Anyway, take a look at the revised code to see if it makes sense.
As part of the auth cleanup, I moved the authorizeContextParams flag to the authorization step, and clearly stated that a statement (or view) can be prepared without authorization. Only execution (AKA "plan") needs authorization.
Hmm, currently DruidStatement calls validateAndAuthorize for both prepare and execute. Are you saying that prepare no longer authorizes, or that it never did? (Because looking at the current code, it seems to do it.)
The standard way to handle views is to assign them an owner. Views run with the owner's permissions. Queries that use the view must be authorized against the view, not against the resources which the view uses. Of course, Druid has no concept of users, so the idea of "owner" is ill-defined. Maybe we stash a bundle of permissions with the view or some such? That's a question for another time.
There is a 'VIEW' resource that views are authorized against, not the resources that the view query uses, which I think is more or less the same thing?
What I saw in the code is that the Avatica path did validate, authorize, prepare. But the view path did validate, prepare with no authorize (it has no auth result or request). The flow before and after the change is the same. For Avatica, the path is validate, authorize, prepare. The query path is validate, authorize, plan. The view path is validate, prepare (with no authorize). Does this sound right?
Oh, I misunderstood what you were saying. Yes within the view's usage of the planner it is basically unauthorized on purpose, because the view itself as a whole is authorized as a ResourceType.VIEW by the SqlLifecycle of the query using the view. The planner usage here is to get the row type information so the query planner using the view can do its thing. #10812 has details on view authorization stuff (which is also the PR that left the comments that this PR is all about).
.filter(action -> action.getAction() == Action.READ)
.collect(Collectors.toSet());

// TODO: This is not really a state check since there is a race condition.
What's the TODO here — I don't see what change needs to be made later?
And, whatever that change is, why not make it now?
Thanks for the comment: forced me to look more closely at this. So, the check itself is OK as it is a sanity check. The broader issue is whether the auth check was done: did the caller properly make use of the ResourceActions? This bugged me a bit: it is critical, yet is outside of this planner.
To address that, I reworked how we do authorization: it is now done in the planner, and is guaranteed by a new planner state. Since the work is done in the planner, the VerificationResult became redundant and was removed. Since SqlLifecycleTest verifies implementation, rather than functionality, that needed adjustment also.
That rework suggested that there are several other areas to clean up, and that SqlLifecycleTest should be rewritten. That will come later, to limit the blast radius of this PR.
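The planner state machine described above can be sketched as follows. This is a minimal illustration, not Druid's actual code; the states and method names are assumptions. It captures the invariant discussed in this thread: planning requires prior authorization, while prepare (used for views) needs only validation:

```java
// Illustrative state machine: authorization is guaranteed to happen between
// validation and planning, while prepare() can run without authorization.
class PlannerStateMachine {
  enum State { START, VALIDATED, AUTHORIZED, PREPARED, PLANNED }

  private State state = State.START;

  void validate() {
    require(State.START, "validate");
    state = State.VALIDATED;
  }

  // Prepare only returns row-type info, so no auth result is required.
  void prepare() {
    require(State.VALIDATED, "prepare");
    state = State.PREPARED;
  }

  void authorize() {
    require(State.VALIDATED, "authorize");
    state = State.AUTHORIZED;
  }

  // Planning produces an executable query, so it demands prior authorization.
  void plan() {
    require(State.AUTHORIZED, "plan");
    state = State.PLANNED;
  }

  private void require(State expected, String op) {
    if (state != expected) {
      throw new IllegalStateException("Cannot " + op + " in state " + state);
    }
  }

  State state() {
    return state;
  }
}
```

Testing such a state machine directly (rather than mocking call sequences) is one way to verify the transitions without depending on implementation details.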
@clintropolis, thanks for the history of parameters. I reread the links that you generously provided: very helpful. @gianm, your advice to look into the JDBC (Avatica) code path was wise: doing so was insightful. It turns out that Druid's implementation of the JDBC protocol is a bit off, which may have led to some of the confusion around when to prepare and when to expect parameters. Issue #12682 spells out the gaps: in particular, the implementation combines ideas that JDBC keeps separate. PR #12708 handles the Calcite planner change.
Grew too large. Will split out supporting work into separate PRs.
Rebased on the latest master, which incorporates some of the changes previously here. Squashed commits to make the merge easier. There are no new changes in this latest commit: only removals of the files committed elsewhere.
try (final DruidPlanner planner = plannerFactory.createPlanner(viewSql, new QueryContext())) {
  rowType = planner.plan().rowType();
  planner.validate();
  rowType = planner.prepare().getRowType();
any reason to switch to prepare row type or just to do a bit less work?
The state machine added to the planner enforces that authorization is done before planning. However, in this context, we have no auth context. So, by using prepare instead, we get the information without the need to authorize. There is a comment in DruidPlanner.authorize() which explains the issue.
And, as you point out, the job of prepare is to provide this info: planning (to get a physical execution plan) is overkill.
 * Clone of Calcite's {@code CalciteSqlValidator} which is not
 * visible to Druid.
 */
class DruidSqlValidator extends SqlValidatorImpl
hmm, should this be instead modifications to https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/calcite/prepare/DruidSqlValidator.java or does the DruidPlanner really need to clone the validator differently than the other one used by CalcitePlanner?
Good catch. So, what's supposed to happen is that the now-renamed BaseDruidSqlValidator simply exposes the Java-specific CalciteSqlValidator by living in the Calcite name space and extending a protected class. Then, DruidSqlValidator is the Druid version, with soon-to-be-added extensions for the INSERT statement. The latest commit makes this clearer: DruidSqlValidator extends BaseDruidSqlValidator and removes the methods we now inherit from CalciteSqlValidator. Clear as mud?
Let's let Travis do its thing to validate this latest adjustment, then please take another look.
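The validator layering described above can be illustrated with stand-in classes. These are placeholders, not the real Calcite/Druid types: the actual CalciteSqlValidator is not visible outside Calcite, which is why the real base class lives in an org.apache.calcite.* package:

```java
// Plays the role of Calcite's CalciteSqlValidator (hypothetical stand-in).
class CalciteSqlValidatorStandIn {
  String rules() {
    return "calcite";
  }
}

// Exists only to expose the Calcite validator to Druid code; in the real
// codebase this is BaseDruidSqlValidator in the Calcite namespace.
class BaseDruidSqlValidator extends CalciteSqlValidatorStandIn {
}

// Druid's extension point; INSERT-specific checks would be added here.
class DruidSqlValidator extends BaseDruidSqlValidator {
  @Override
  String rules() {
    return "druid + " + super.rules();
  }
}
```

The point of the layering is that `DruidSqlValidator` inherits the Calcite behavior through the base class rather than duplicating it.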
Resolves the issue that required two parse/plan cycles: one for validate, another for plan. Creates a clone of the Calcite planner and validator to resolve the conflict that prevented the merger.
Argh... Tests that use mocks to enforce a call sequence of a class's implementation are quite tedious, and don't shed much light. Adjusted the affected tests. Also rebased on master. To do so, squashed all prior commits; the most recent changes will appear in a new commit on top of the rebase.
heh, sorry, I'm responsible for that test; it was intended to ensure that state transitions were happening as expected, and the actions within each state as expected, so it was sort of testing implementation details on purpose. But since it was using mocks, those sorts of tests are painful when refactoring stuff. It's probably worth retaining testing of some sort to ensure the state machine behaves as it is supposed to, so that new callers can't mess stuff up, but I'm not sure of the best way to do that without being fragile/implementation-dependent.
@clintropolis, understood regarding the test. The present build again failed due to a corrupted Maven-downloaded jar file. Since this has happened twice now, I wonder if the file is corrupted upstream. If so, we'll need to wait for the file to be corrected, or the cache to be flushed. Without this file working, we can't run the ITs.
Looks like the query integration tests are legitimately failing due to changed error messages |
@clintropolis, yes, foiled by a single space. Fixed that, and now the build is clean and ready for a final review. Thanks.
The Druid planner contains the following comment:
Well, today is that day. The Druid planner now makes only one pass through Calcite planner. This means we parse and validate the query only once, not twice as before.
The original two-pass approach likely was a way to resolve a conflict. Druid wants access to the "validator" used to validate a query so we can extract the "resource actions" (the datasources which the query accesses). The validator is private to the default `CalcitePlanner`, so Druid created its own. However, to then continue on to planning, the Calcite planner throws a fit because it thinks we skipped the validation step, since we didn't do validation through Calcite's own default planner. Hence, we had to start over and do it Calcite's way.

Now, as it turns out, there is nothing special about the default `CalcitePlannerImpl`: it seems to exist as a handy way to support out-of-the-box, JDBC-based queries. More advanced use cases generally provide their own implementation. So, this PR clones the Calcite class, but with adjustments for Druid. The main adjustment is to provide access to the validator.

Once we resolve the validation issue, the rest is just plumbing: saving the parser state between validate on the one hand, and prepare/plan on the other.
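The one-pass idea can be sketched as follows. This is an illustrative model with made-up names, not the real planner classes; it only demonstrates the shape of the change, namely that validation parses the query once and planning reuses the saved state rather than starting over:

```java
// Hypothetical one-pass planner: parse/validate once, reuse the result.
class OnePassPlanner {
  private String validatedTree;  // stand-in for the retained parse tree and validator
  private int parseCount = 0;    // counts parses, to show single-pass behavior

  void validate(String sql) {
    parseCount++;                // the only parse of the query text
    validatedTree = "validated(" + sql + ")";
  }

  String plan() {
    if (validatedTree == null) {
      throw new IllegalStateException("validate first");
    }
    return "plan of " + validatedTree;  // reuses saved state; no second parse
  }

  int parseCount() {
    return parseCount;
  }
}
```

In the old two-pass flow, `plan()` would have parsed and validated the query text a second time; here the count stays at one.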
`SqlLifecycle` and `DruidPlanner` change a bit to handle that task. The PR also moves some validation-like code from the `plan()` method to `validate()` to better fit into the new flow.

The key risk with this kind of change is that we break something. To catch any regression, this work was done in a private branch that also had the planner test framework. The planner artifacts (schema, logical plan, native query) were identical before and after the change. The various `Calcite?QueryTest` cases provide a lighter validation in this PR itself, since the planner framework is not yet in master, nor is it included in this PR.

The result of this change is that:
Authorization
Review comments called attention to some oddities in authorization that are addressed in a revised commit:

- Authorization moved from `SqlLifecycle` to `DruidPlanner` so it can be enforced as part of the query lifecycle. A new `authorize()` call does the job and ensures that the call is a) made, and b) made at the correct time.
- `ValidationResult` became redundant and was removed: the resource actions are mostly now private to the planner itself.