Convert the Druid planner to use statement handlers #12905
abhishekagarwal87 merged 7 commits into apache:master
Branch updated from efddda1 to a0a76bd.
```java
    this.ingestionGranularity = insertNode.getPartitionedBy();
  }

  protected static SqlNode convertQuery(DruidSqlIngest sqlNode) throws ValidationException
```
I feel that the DruidSqlIngest abstraction will become limiting as query conversion of INSERT and REPLACE diverges in the future. Do we really need the Ingest abstraction?
The class seems justified by its 180 lines of code, which would otherwise be copy/pasted into both the INSERT and REPLACE handlers. This is a DRY (Don't Repeat Yourself) argument.
Another argument is that REPLACE and INSERT are both forms of "ingest" in Druid: they funnel down to the same MSQ engine. Their difference is mainly in how they treat existing data (INSERT adds to it; REPLACE replaces it).
If the two statements diverge, then we've got some user experience issues to deal with. (Why do I need to learn two different ways to do basically the same thing?) But, if it does occur, we simply copy the once-common code into the two handlers, and change it in one or both places.
Does this answer the question about whether this abstraction is needed?
Thanks for the explanation. btw, I meant that the internal implementation of INSERT vs. REPLACE can diverge (e.g., the validation logic is a bit different, the error messages thrown could be different, etc.). But I agree that code reuse is significant.
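To make the trade-off in this thread concrete, here is a minimal hypothetical sketch in plain Java (no Calcite) of how a shared ingest base class can keep the common checks in one place while leaving a hook where INSERT and REPLACE validation and error messages are free to diverge. Every name here (`IngestStatement`, `IngestHandlerSketch`, `InsertHandlerSketch`, `ReplaceHandlerSketch`) is an illustrative assumption, not Druid's actual code, and the rules (both statements need `PARTITIONED BY`; only REPLACE takes `OVERWRITE`) are a simplified model of Druid's SQL syntax.

```java
// Hypothetical, simplified model of a parsed INSERT/REPLACE statement.
// None of these names are Druid's actual classes.
final class IngestStatement {
  final String targetTable;
  final String partitionedBy;   // e.g. "DAY"; null if the clause is missing
  final boolean hasOverwrite;   // true if an OVERWRITE clause is present

  IngestStatement(String targetTable, String partitionedBy, boolean hasOverwrite) {
    this.targetTable = targetTable;
    this.partitionedBy = partitionedBy;
    this.hasOverwrite = hasOverwrite;
  }
}

// Shared logic lives once in the base class (the DRY argument).
abstract class IngestHandlerSketch {
  protected final IngestStatement stmt;

  IngestHandlerSketch(IngestStatement stmt) {
    this.stmt = stmt;
  }

  /** Name used in error messages, e.g. "INSERT" or "REPLACE". */
  abstract String operationName();

  /** Statement-specific checks: the place where INSERT and REPLACE may diverge. */
  abstract void validateSpecific();

  /** Common validation first, then the statement-specific hook. */
  final void validate() {
    if (stmt.targetTable == null || stmt.targetTable.isEmpty()) {
      throw new IllegalArgumentException(operationName() + " requires a target table");
    }
    if (stmt.partitionedBy == null) {
      throw new IllegalArgumentException(operationName() + " statements must specify PARTITIONED BY");
    }
    validateSpecific();
  }
}

final class InsertHandlerSketch extends IngestHandlerSketch {
  InsertHandlerSketch(IngestStatement stmt) { super(stmt); }

  @Override String operationName() { return "INSERT"; }

  @Override void validateSpecific() {
    if (stmt.hasOverwrite) {
      throw new IllegalArgumentException("INSERT does not accept an OVERWRITE clause; use REPLACE");
    }
  }
}

final class ReplaceHandlerSketch extends IngestHandlerSketch {
  ReplaceHandlerSketch(IngestStatement stmt) { super(stmt); }

  @Override String operationName() { return "REPLACE"; }

  @Override void validateSpecific() {
    if (!stmt.hasOverwrite) {
      throw new IllegalArgumentException("REPLACE requires an OVERWRITE clause");
    }
  }
}

final class IngestDemo {
  /** Returns null if validation passes, otherwise the error message. */
  static String check(IngestHandlerSketch handler) {
    try {
      handler.validate();
      return null;
    } catch (IllegalArgumentException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    IngestStatement noOverwrite = new IngestStatement("wiki", "DAY", false);
    System.out.println(check(new InsertHandlerSketch(noOverwrite)));   // prints null
    System.out.println(check(new ReplaceHandlerSketch(noOverwrite)));  // prints REPLACE requires an OVERWRITE clause
  }
}
```

If the two statements diverge further, the affected checks simply move out of `validate()` and into the subclasses, which is the fallback the reply above proposes.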
Branch updated from a0a76bd to 4221071.
@abhishekagarwal87, thank you for your review. This commit addresses the issues you raised. Also rebased this PR on the latest master to resolve a merge conflict. A line in …
Branch updated from 7807d9b to 90811a6.
Converts the large collection of if-statements for statement types into a set of classes: one per supported statement type. Cleans up a few error messages.
@abhishekagarwal87, thanks for the clarification. I agree that the analysis of …

Thanks for the approval. There was one build failure due to a race condition fixed in a PR a few days back. Rebased on the latest master to pick up that change and @imply-cheddar's change. Resolved the conflict with @imply-cheddar's change. Let's verify that the build works.
Branch updated from 90811a6 to 3fb24c0.
Forces a rebuild due to a flaky test
Rebased on the recent …
@abhishekagarwal87, the build is now clean except for one flaky IT, described in issue #13112. Let me know whether you believe this IT is genuinely flaky, or whether we should investigate the failure (and rerun the build) before we commit this change.
I have merged the PR. Thank you @paul-rogers.
Druid has traditionally supported just one kind of SQL statement: `SELECT`. The planner was thus designed to process "a query", and an ever-increasing amount of conditional code was added to support other statements such as `INSERT` and `REPLACE`. As we look toward adding DDL statements, the current approach will become unworkable. Other SQL products introduce an additional layer to handle statement types: the statement handler. This PR adds statement handlers to Druid.

This PR builds on the single-pass planner PR to heavily refactor the Druid planner, splitting statement-specific code into a set of statement-specific handler classes. All handlers implement a simple interface:
What each statement needs is a (complex) implementation detail of the handler classes.
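As a purely hypothetical illustration of the pattern (not the PR's actual interface), the sketch below shows a per-statement handler interface plus a lookup table that replaces a chain of `if` statements on the statement type. The names `StatementKind`, `StatementHandlerSketch`, `HandlerRegistry`, and `SimpleHandler`, and the two-step `validate()`/`plan()` lifecycle, are assumptions for illustration; Druid itself works with Calcite's `SqlNode` trees.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical statement kinds; CREATE_TABLE stands in for a future DDL statement.
enum StatementKind { SELECT, EXPLAIN, INSERT, REPLACE, CREATE_TABLE }

// Hypothetical handler interface: each statement type gets its own implementation.
interface StatementHandlerSketch {
  void validate();
  String plan();
}

final class HandlerRegistry {
  // One factory per supported statement kind replaces a chain of if-statements.
  private static final Map<StatementKind, Function<String, StatementHandlerSketch>> FACTORIES =
      new EnumMap<>(StatementKind.class);

  static {
    FACTORIES.put(StatementKind.SELECT, sql -> new SimpleHandler("select", sql));
    FACTORIES.put(StatementKind.EXPLAIN, sql -> new SimpleHandler("explain", sql));
    FACTORIES.put(StatementKind.INSERT, sql -> new SimpleHandler("insert", sql));
    FACTORIES.put(StatementKind.REPLACE, sql -> new SimpleHandler("replace", sql));
  }

  static StatementHandlerSketch handlerFor(StatementKind kind, String sql) {
    Function<String, StatementHandlerSketch> factory = FACTORIES.get(kind);
    if (factory == null) {
      // A single, clear error for statement types without a handler.
      throw new UnsupportedOperationException("Unsupported SQL statement: " + kind);
    }
    return factory.apply(sql);
  }

  /** The planner only sees the interface: validate first, then plan. */
  static String planStatement(StatementKind kind, String sql) {
    StatementHandlerSketch handler = handlerFor(kind, sql);
    handler.validate();
    return handler.plan();
  }

  public static void main(String[] args) {
    System.out.println(planStatement(StatementKind.SELECT, "SELECT 1"));  // prints select plan for: SELECT 1
  }
}

// Trivial stand-in implementation; real handlers would differ per statement.
final class SimpleHandler implements StatementHandlerSketch {
  private final String name;
  private final String sql;
  private boolean validated;

  SimpleHandler(String name, String sql) {
    this.name = name;
    this.sql = sql;
  }

  @Override
  public void validate() {
    if (sql == null || sql.trim().isEmpty()) {
      throw new IllegalArgumentException("Empty statement");
    }
    validated = true;
  }

  @Override
  public String plan() {
    if (!validated) {
      throw new IllegalStateException("validate() must be called before plan()");
    }
    return name + " plan for: " + sql;
  }
}
```

Under this design, supporting a new statement means registering one more factory rather than adding another branch to the planner.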
At present, all the SQL statements that Druid supports include a `SELECT`: `EXPLAIN`, `INSERT`, `REPLACE` and, of course, `SELECT` itself. To reflect this fact, a base `QueryHandler` class handles the common aspects. As we add other statements (such as DDL), completely new handlers will handle those cases.

For the most part, the code is identical between `master` and this PR, but it is heavily refactored and shifted around.

This PR is a step toward modifying the SQL validator to handle the newer `INSERT` and `REPLACE` nodes. The validation logic for these two statements, which migrated to handlers in this PR, will migrate again into a Druid version of the Calcite validator.

This PR is a redo of an earlier one that did this same work. This version incorporates the many planner changes made recently.
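The role of the base `QueryHandler` class described above can be sketched as follows: every statement that wraps a `SELECT` shares the query-planning step, and each subclass decides only what to do with the planned query. This is a minimal hypothetical sketch; `QueryHandlerSketch`, its subclasses, and the string "plans" are illustrative stand-ins, not Druid's classes or planner output.

```java
// Hypothetical base class: every statement that wraps a SELECT shares its planning.
abstract class QueryHandlerSketch {
  protected final String selectSql;

  QueryHandlerSketch(String selectSql) {
    this.selectSql = selectSql;
  }

  /** Common step shared by SELECT, EXPLAIN, INSERT and REPLACE (stand-in for real planning). */
  protected final String planQuery() {
    return "QueryPlan[" + selectSql + "]";
  }

  /** Each subclass decides what to do with the planned query. */
  abstract String execute();
}

final class SelectQueryHandler extends QueryHandlerSketch {
  SelectQueryHandler(String sql) { super(sql); }

  // SELECT runs the planned query.
  @Override String execute() { return "run " + planQuery(); }
}

final class ExplainQueryHandler extends QueryHandlerSketch {
  ExplainQueryHandler(String sql) { super(sql); }

  // EXPLAIN returns the plan instead of running it.
  @Override String execute() { return "explain " + planQuery(); }
}

final class InsertQueryHandler extends QueryHandlerSketch {
  private final String target;

  InsertQueryHandler(String target, String sql) {
    super(sql);
    this.target = target;
  }

  // INSERT runs the planned query and appends the results to the target table.
  // A REPLACE handler would look the same but overwrite existing data instead.
  @Override String execute() { return "append " + planQuery() + " into " + target; }
}

final class QueryHandlerDemo {
  public static void main(String[] args) {
    System.out.println(new SelectQueryHandler("SELECT 1").execute());  // prints run QueryPlan[SELECT 1]
  }
}
```

A DDL statement, which has no `SELECT`, would not extend such a base class; it would get a completely separate handler, as the description notes.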
This PR has: