Today is that day - Single pass through Calcite planner#12636
clintropolis merged 4 commits into apache:master from
Conversation
heh, I think I wrote that comment 😅 IIRC the main reason I didn't make the change at the time is that I didn't want to validate that stateful JDBC connections actually worked if we made them actually stateful (instead of repeating the work per request). So this is probably the area where most testing should focus, especially on prepared statements with dynamic parameters. IIRC, for dynamic parameters, we do actually want to re-run through the planner once we have values for the actual parameter bindings, because some stuff can be optimized during planning, but... it was a while ago when I wired them up, so maybe it's not an issue? Anyway, I'll try to have a closer look at this sometime soon 👍
@gianm, thanks for the background. The new planner test framework will catch the kinds of issues you mention: it captures the details of the Calcite logical plan (as well as the native query, like the existing tests do). The good news is that the original version of these changes was done in a branch that had the new planner tests, and they reported no changes in the Calcite artifacts. That said, it is worth spending time to trace exactly how we handle parameters to be sure we account for the proper flow. In general, there should be no reason to replan a query under normal conditions: other tools that use Calcite generally make do with a single pass. Where tools tend to plan a second time is when the planner paints itself into a corner, some global state is changed, and the planner is run a different way. (Impala used to do that.)
Dynamic parameters are handled in two places, the first by rewriting the
I somewhat remember getting different results depending on whether the parameter bindings (and, more importantly, their types) were present when running things through the planner, but if I remember correctly it might not change the end-result native query much. But yeah, in the JDBC case the state is a bit different once the parameters have values bound to them, because they then have a type. SQL over HTTP also supports dynamic parameters, but in that case the bindings come with the HTTP request instead of after preparing the statement as in JDBC, so there is no intermediate state where we have a bunch of typeless things hanging out.
@clintropolis, thanks again for the description; it makes sense. @clintropolis described offline the process we use for parameters. Basically, when a query is to be run and parameters are provided along with the query text, the values are bound during planning. If the query comes via Avatica as a PREPARE, then there are no parameter values: parameters are left as placeholders, with values to be filled in per-query. Given all this (and after a bit more code tweaking and comment-adding), the one-pass approach in this PR seems to handle all of the above paths through the planner.
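The two parameter flows described above can be sketched in miniature. This is an illustrative model, not Druid's actual API; all names are made up. It shows why JDBC's PREPARE creates an intermediate state with typeless placeholders, while HTTP-style binding supplies types up front:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a query whose parameters may be bound before planning
// (HTTP style) or after PREPARE (JDBC style).
class ParameterizedQuery {
  final String sql;
  private final List<Object> bindings = new ArrayList<>();

  ParameterizedQuery(String sql) {
    this.sql = sql;
  }

  // JDBC style: bind a value after PREPARE; HTTP style: bind before planning.
  void bind(Object value) {
    bindings.add(value);
  }

  // Before binding, a placeholder has no type; after binding, the value's
  // Java class stands in for the SQL type the planner would infer.
  String typeOf(int position) {
    return position < bindings.size()
        ? bindings.get(position).getClass().getSimpleName()
        : "UNKNOWN";
  }
}
```

Under this model, a JDBC PREPARE sees every placeholder as `UNKNOWN` until execution binds values, which is the intermediate state the comment above describes.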
gianm left a comment
Very nice to have this cleaned up! The changes mostly look good to me. I had a few line comments. I didn't review the code that was marked as copied from Calcite, because I assume that it's the same stuff we're already using.
I also am not familiar with the JDBC / dynamic parameter issue that you and @clintropolis have been discussing, so I didn't consider that particular situation in my review. Do the tests cover it well enough?
 * Calcite planner. Clone of Calcite's {@code SqlPlannerImpl},
 * but with the validator made accessible.
 */
public class CalcitePlanner implements Planner, ViewExpander
I'm surprised checkstyle accepted this file. We must have pretty loose rules, since it doesn't look to me like a normal Druid source file. I suppose that's OK.
I did have to edit the file quite a bit to convert it from Calcite to Druid style. I refrained from more than the minimal changes to make it easier to compare the two. If we like, I can reformat the whole file to be closer to the Druid format. I would probably hold off changing the code structure to be more Druid-like. Calcite has its quirks, and it might be safer to leave well enough alone on that front. Thoughts? Any particularly egregious issues you'd like addressed?
{
  final RelDataType rowType;
  try (final DruidPlanner planner = plannerFactory.createPlanner(viewSql, new QueryContext())) {
    planner.validate(false);
I didn't realize that view expansion was done with an empty query context. I suppose this makes sense: it ensures the view is the same for everyone.
IMO, it's better to set this to always true instead of always false. Right now, it doesn't matter either way, because the context is empty. But if it's ever made nonempty, then true is a safer constant value because it defaults to the more restrictive security mode. If that doesn't make sense in the future, then the future person that is adding the context parameters can sort out what's best.
A comment would be helpful too, like:
// Since the context is empty, it doesn't matter if the authorizeContextParams flag is true or false.
So, this turns out to be another messy/tricky area. The authorizeContextParams flag simply adds the context variables to the set of resource actions. But nothing in the above code path checks the resource actions. In fact, it seems that it can't do so: there is no authorization result available. So the code is ambiguous: it's claiming to do something that it can't do.
As part of the auth cleanup, I moved the authorizeContextParams flag to the authorization step, and clearly stated that a statement (or view) can be prepared without authorization. Only execution (AKA "plan") needs authorization.
The standard way to handle views is to assign them an owner. Views run with the owner's permissions. Queries that use the view must be authorized against the view, not against the resources which the view uses. Of course, Druid has no concept of users, so the idea of "owner" is ill-defined. Maybe we stash a bundle of permissions with the view or some such? That's a question for another time.
Context variables are a property of the query, not of the view. So, we should check them as part of the query check, and not confuse ourselves trying to figure out if we should check them with views.
Anyway, take a look at the revised code to see if it makes sense.
As part of the auth cleanup, I moved the authorizeContextParams flag to the authorization step, and clearly stated that a statement (or view) can be prepared without authorization. Only execution (AKA "plan") needs authorization.
Hmm, currently DruidStatement calls validateAndAuthorize for both prepare and execute. Are you saying that prepare no longer authorizes, or that it never did? (Because looking at the current code, it seems to do it.)
The standard way to handle views is to assign them an owner. Views run with the owner's permissions. Queries that use the view must be authorized against the view, not against the resources which the view uses. Of course, Druid has no concept of users, so the idea of "owner" is ill-defined. Maybe we stash a bundle of permissions with the view or some such? That's a question for another time.
There is a 'VIEW' resource that views are authorized against, not the resources that the view query uses, which I think is more or less the same thing?
What I saw in the code is that the Avatica path did validate, authorize, prepare. But the view path did validate, prepare with no authorize (it has no auth result or request). The flow before and after the change is the same. For Avatica, the path is validate, authorize, prepare. The query path is validate, authorize, plan. The view path is validate, prepare (with no authorize). Does this sound right?
Oh, I misunderstood what you were saying. Yes within the view's usage of the planner it is basically unauthorized on purpose, because the view itself as a whole is authorized as a ResourceType.VIEW by the SqlLifecycle of the query using the view. The planner usage here is to get the row type information so the query planner using the view can do its thing. #10812 has details on view authorization stuff (which is also the PR that left the comments that this PR is all about).
.filter(action -> action.getAction() == Action.READ)
.collect(Collectors.toSet());

// TODO: This is not really a state check since there is a race condition.
What's the TODO here — I don't see what change needs to be made later?
And, whatever that change is, why not make it now?
Thanks for the comment: forced me to look more closely at this. So, the check itself is OK as it is a sanity check. The broader issue is whether the auth check was done: did the caller properly make use of the ResourceActions? This bugged me a bit: it is critical, yet is outside of this planner.
To address that, I reworked how we do authorization: it is now done in the planner, and is guaranteed by a new planner state. Since the work is done in the planner, the VerificationResult became redundant and was removed. Since SqlLifecycleTest verifies implementation, rather than functionality, that needed adjustment also.
That rework suggested that there are several other areas to clean up, and that SqlLifecycleTest should be rewritten. That will come later, to limit the blast radius of this PR.
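The planner state machine described above can be sketched as follows. This is a minimal illustration, not Druid's actual code; the states and method names are assumptions. It captures the invariant discussed in this thread: planning requires prior authorization, while prepare (used for views) needs only validation:

```java
// Illustrative state machine: authorization is guaranteed to happen between
// validation and planning, while prepare() can run without authorization.
class PlannerStateMachine {
  enum State { START, VALIDATED, AUTHORIZED, PREPARED, PLANNED }

  private State state = State.START;

  void validate() {
    require(State.START, "validate");
    state = State.VALIDATED;
  }

  // Prepare only returns row-type info, so no auth result is required.
  void prepare() {
    require(State.VALIDATED, "prepare");
    state = State.PREPARED;
  }

  void authorize() {
    require(State.VALIDATED, "authorize");
    state = State.AUTHORIZED;
  }

  // Planning produces an executable query, so it demands prior authorization.
  void plan() {
    require(State.AUTHORIZED, "plan");
    state = State.PLANNED;
  }

  private void require(State expected, String op) {
    if (state != expected) {
      throw new IllegalStateException("Cannot " + op + " in state " + state);
    }
  }

  State state() {
    return state;
  }
}
```

Testing such a state machine directly (rather than mocking call sequences) is one way to verify the transitions without depending on implementation details.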
@clintropolis, thanks for the history of parameters. I reread the links that you generously provided: very helpful. @gianm, your advice to look into the JDBC (Avatica) code path was wise: doing so was insightful. It turns out that Druid's implementation of the JDBC protocol is a bit off, which may have led to some of the confusion around when to prepare and when to expect parameters. Issue #12682 spells out the gaps: in particular, the implementation combines ideas that JDBC keeps separate. PR #12708 handles the Calcite planner change.
Grew too large. Will split out supporting work into separate PRs.
Rebased on the latest master, which incorporates some of the changes previously here. Squashed commits to make the merge easier. There are no new changes in this latest commit: only removals of the files committed elsewhere.
try (final DruidPlanner planner = plannerFactory.createPlanner(viewSql, new QueryContext())) {
  rowType = planner.plan().rowType();
  planner.validate();
  rowType = planner.prepare().getRowType();
any reason to switch to prepare row type or just to do a bit less work?
The state machine added to the planner enforces that authorization is done before planning. However, in this context, we have no auth context. So, by using prepare instead, we get the information without the need to authorize. There is a comment in DruidPlanner.authorize() which explains the issue.
And, as you point out, the job of prepare is to provide this info: planning (to get a physical execution plan) is overkill.
 * Clone of Calcite's {@code CalciteSqlValidator} which is not
 * visible to Druid.
 */
class DruidSqlValidator extends SqlValidatorImpl
hmm, should this be instead modifications to https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/calcite/prepare/DruidSqlValidator.java or does the DruidPlanner really need to clone the validator differently than the other one used by CalcitePlanner?
Good catch. So, what's supposed to happen is that the now-renamed BaseDruidSqlValidator simply exposes the Java-specific CalciteSqlValidator by living in the Calcite name space and extending a protected class. Then, DruidSqlValidator is the Druid version, with soon-to-be-added extensions for the INSERT statement. The latest commit makes this clearer: DruidSqlValidator extends BaseDruidSqlValidator and removes the methods we now inherit from CalciteSqlValidator. Clear as mud?
Let's let Travis do its thing to validate this latest adjustment, then please take another look.
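The validator layering described above can be illustrated with stand-in classes. These are placeholders, not the real Calcite/Druid types: the actual CalciteSqlValidator is not visible outside Calcite, which is why the real base class lives in an org.apache.calcite.* package:

```java
// Plays the role of Calcite's CalciteSqlValidator (hypothetical stand-in).
class CalciteSqlValidatorStandIn {
  String rules() {
    return "calcite";
  }
}

// Exists only to expose the Calcite validator to Druid code; in the real
// codebase this is BaseDruidSqlValidator in the Calcite namespace.
class BaseDruidSqlValidator extends CalciteSqlValidatorStandIn {
}

// Druid's extension point; INSERT-specific checks would be added here.
class DruidSqlValidator extends BaseDruidSqlValidator {
  @Override
  String rules() {
    return "druid + " + super.rules();
  }
}
```

The point of the layering is that `DruidSqlValidator` inherits the Calcite behavior through the base class rather than duplicating it.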
Resolves the issue that required two parse/plan cycles: one for validate, another for plan. Creates a clone of the Calcite planner and validator to resolve the conflict that prevented the merger.
Argh... Tests that use mocks to enforce a call sequence of a class's implementation are quite tedious, and don't shed much light. Adjusted the affected tests. Also rebased on master. To do so, squashed all prior commits; the most recent changes will appear in a new commit on top of the rebase.
heh, sorry, I'm responsible for that test; it was intended to ensure that state transitions were happening as expected, and the actions within each state as expected, so it was sort of testing implementation details on purpose. But since it was using mocks, those sorts of tests are painful when refactoring stuff. It's probably worth retaining testing of some sort to ensure the state machine behaves as it is supposed to, so that new callers can't mess stuff up, but I'm not sure of the best way to do that without being fragile/implementation-dependent.
@clintropolis, understood regarding the test. The present build again failed due to a corrupted Maven-downloaded jar file. Since this has happened twice now, I wonder if the file is corrupted upstream. If so, we'll need to wait for the file to be corrected, or the cache to be flushed. Without this file working, we can't run the ITs.
Looks like the query integration tests are legitimately failing due to changed error messages |
@clintropolis, yes, foiled by a single space. Fixed that, and now the build is clean and ready for a final review. Thanks.
The Druid planner contains the following comment:
Well, today is that day. The Druid planner now makes only one pass through Calcite planner. This means we parse and validate the query only once, not twice as before.
The original two-pass approach likely was a way to resolve a conflict. Druid wants access to the "validator" used to validate a query so we can extract the "resource actions" (the datasources which the query accesses). The validator is private to the default `CalcitePlanner`, so Druid created its own. However, to then continue on to planning, the Calcite planner throws a fit because it thinks we skipped the validation step, since we didn't do validation through Calcite's own default planner. Hence, we had to start over and do it Calcite's way.

Now, as it turns out, there is nothing special about the default `CalcitePlannerImpl`: it seems to exist as a handy way to support out-of-the-box, JDBC-based queries. More advanced use cases generally provide their own implementation. So, this PR clones the Calcite class, but with adjustments for Druid. The main adjustment is to provide access to the validator.

Once we resolve the validation issue, the rest is just plumbing: saving the parser state between validate on the one hand, and prepare/plan on the other.
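The one-pass idea can be sketched as follows. This is an illustrative model with made-up names, not the real planner classes; it only demonstrates the shape of the change, namely that validation parses the query once and planning reuses the saved state rather than starting over:

```java
// Hypothetical one-pass planner: parse/validate once, reuse the result.
class OnePassPlanner {
  private String validatedTree;  // stand-in for the retained parse tree and validator
  private int parseCount = 0;    // counts parses, to show single-pass behavior

  void validate(String sql) {
    parseCount++;                // the only parse of the query text
    validatedTree = "validated(" + sql + ")";
  }

  String plan() {
    if (validatedTree == null) {
      throw new IllegalStateException("validate first");
    }
    return "plan of " + validatedTree;  // reuses saved state; no second parse
  }

  int parseCount() {
    return parseCount;
  }
}
```

In the old two-pass flow, `plan()` would have parsed and validated the query text a second time; here the count stays at one.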
`SqlLifecycle` and `DruidPlanner` change a bit to handle that task. The PR also moves some validation-like code from the `plan()` method to `validate()` to better fit into the new flow.

The key risk with this kind of change is that we break something. To catch any regression, this work was done in a private branch that also had the planner test framework. The planner artifacts (schema, logical plan, native query) were identical before and after the change. The various `Calcite?QueryTest` cases provide a lighter validation in this PR itself, since the planner framework is not yet in master, nor is it included in this PR.

The result of this change is that:
Authorization
Review comments called attention to some oddities in authorization that are addressed in a revised commit:

- Authorization moved from `SqlLifecycle` to `DruidPlanner` so it can be enforced as part of the query lifecycle. A new `authorize()` call does the job and ensures that the call is a) made, and b) made at the correct time.
- `ValidationResult` became redundant and was removed: the resource actions are mostly now private to the planner itself.