Skip to content

Convert the Druid planner to use statement handlers#12905

Merged
abhishekagarwal87 merged 7 commits intoapache:masterfrom
paul-rogers:220815-handler
Sep 19, 2022
Merged

Convert the Druid planner to use statement handlers#12905
abhishekagarwal87 merged 7 commits intoapache:masterfrom
paul-rogers:220815-handler

Conversation

@paul-rogers
Copy link
Copy Markdown
Contributor

@paul-rogers paul-rogers commented Aug 16, 2022

Druid has traditionally supported just one kind of SQL statement: SELECT. The planner was thus designed to process "a query", and an ever-increasing amount of conditional code was added to support other statements such as INSERT and REPLACE. As we look toward adding DDL statements, the current approach will become unworkable. Other SQL products introduce an additional layer to handle statement types: the statement handler. This PR adds statement handlers to Druid.

This PR builds on the single-pass planner PR to heavily refactor the Druid planner to split statement-specific code into a set of statement-specific handler classes. All handlers implement a simple interface:

interface SqlStatementHandler
{
  SqlNode sqlNode();
  void validate() throws ValidationException;
  Set<ResourceAction> resourceActions();
  void prepare();
  PrepareResult prepareResult();
  PlannerResult plan() throws ValidationException;
}

The details of what is needed for each statement is a (complex) implementation detail of the handler classes.

At present, all the SQL statements which Druid supports include a SELECT: EXPLAIN, INSERT, REPLACE and, of course, SELECT itself. To reflect this fact, a base QueryHandler class handles the common aspects. As we add other statements (such as DDL), completely new handlers will handle those cases.

For the most part, the code is identical between master and this PR, but the code is heavily refactored and shifted around.

This PR is a step toward modifying the SQL validator to handle the newer INSERT and REPLACE nodes. The validation logic for these two statements that migrated to handers in this PR will migrate again into a Druid version of the Calcite validator.

This PR is a redo of an earlier one that did this same work. This version incorporates the many planner changes done recently.


This PR has:

  • been self-reviewed.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • been tested in a test Druid cluster.

Comment thread sql/src/main/java/org/apache/druid/sql/calcite/parser/DruidSqlInsert.java Outdated
Comment thread sql/src/main/java/org/apache/druid/sql/calcite/planner/IngestHandler.java Outdated
this.ingestionGranularity = insertNode.getPartitionedBy();
}

protected static SqlNode convertQuery(DruidSqlIngest sqlNode) throws ValidationException
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that the DruidSqlIngest abstraction will become limiting as query conversion of an INSERT and REPLACE diverges in future. Do we really need the Ingest abstraction?

Copy link
Copy Markdown
Contributor Author

@paul-rogers paul-rogers Sep 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class seems justified by the 180 lines of code in this class that would otherwise be copy/paste duplicated across the INSERT and REPLACE handlers. This is a DIY argument.

Another argument is that REPLACE and INSERT are both forms of "ingest" in Druid: they funnel down to the same MSQ engine. There difference is mainly in how they treat existing data (INSERT adds to it, REPLACE replaces it.)

If the two statements diverge, then we've got some user experience issues to deal with. (Why do I need to learn two different ways to do basically the same thing?) But, if it does occur, we simply copy the once-common code into the two handlers, and change it in one or both places.

Does this answer the question about whether this abstraction is needed?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. btw I meant that the interval implementation of INSERT vs REPLACE can diverge (e.g. validation logic is a bit different, error messages thrown could be different, etc.) But I agree that code reuse is significant.

@paul-rogers
Copy link
Copy Markdown
Contributor Author

@abhishekagarwal87, thank you for your review. This commit addresses the issues you raised.

Also rebased this PR on the latest master to resolve a merge conflict. A line in DruidPlanner changed, and that line now lives in a handler file, so copied the change to that handler file.

this.ingestionGranularity = insertNode.getPartitionedBy();
}

protected static SqlNode convertQuery(DruidSqlIngest sqlNode) throws ValidationException
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. btw I meant that the interval implementation of INSERT vs REPLACE can diverge (e.g. validation logic is a bit different, error messages thrown could be different, etc.) But I agree that code reuse is significant.

Converts the large collection of if-statements for statement
types into a set of classes: one per supported statement type.
Cleans up a few error messages.
@paul-rogers
Copy link
Copy Markdown
Contributor Author

@abhishekagarwal87, thanks for the clarification. I agree that the analysis of INSERT and REPLACE will differ. That difference will appear in the next step: when we move the analysis out of these planner classes and into the Calcite planner. This particular PR sets us up to make the shift of the analysis code out of the various ad-hoc places where it now resides into a customized validator. (And, we want to do that so that the validation can use catalog information available via the Calcite metadata mechanism.)

Thanks for the approval. There was one build failure due to a race condition fixed in a PR a few days back. Rebased on the latest master to pick up that change and @imply-cheddar's change. Resolved the conflict with @imply-cheddar's change. Let's verify that the build works.

@paul-rogers
Copy link
Copy Markdown
Contributor Author

Rebased on the recent QueryResponse change. Added type parameters to avoid IDE warnings.

@paul-rogers
Copy link
Copy Markdown
Contributor Author

@abhishekagarwal87, the build is now clean except for one flaky IT, described in issue #13112. Let me know if you believe this IT to be actually flaky, or if we should investigate the failure (and rerun the build) before we commit this change.

@abhishekagarwal87 abhishekagarwal87 merged commit 8ce03eb into apache:master Sep 19, 2022
@abhishekagarwal87
Copy link
Copy Markdown
Contributor

I have merged the PR. Thank you @paul-rogers.

@kfaraz kfaraz added this to the 25.0 milestone Nov 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants