Skip to content

Clone Calcite planner to access validator#12708

Merged
clintropolis merged 2 commits intoapache:masterfrom
paul-rogers:220625-validator
Jul 15, 2022
Merged

Clone Calcite planner to access validator#12708
clintropolis merged 2 commits intoapache:masterfrom
paul-rogers:220625-validator

Conversation

@paul-rogers
Copy link
Copy Markdown
Contributor

This is a breakout from PR #12636: Single Pass through the Calcite planner. This PR adds just one item: a clone of the Calcite planner, lightly converted to Druid's coding style, with a method to provide access to the validator. The only tricky bit is the validator itself. While #12636 cloned the validator, this PR uses a sneaky approach: it defines DruidSqlValidator as a subclass of the unreachable CalciteSqlValidator. To do that, it lives in the same Calcite name space as CalciteSqlValidator. Some checkstyle hacking was needed to allow this odd state of affairs.

There is zero change in functionality: everything works exactly as before. The goal here is simply to review the replacement of the Druid home-grown validator with the Calcite-provided one. This will then allow, in a later PR, to combine the validation and plan steps in the Druid planner.

Testing is handled by the large number of existing SQL tests, including integration tests.


This PR has:

  • been self-reviewed.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • been tested in a test Druid cluster.

@paul-rogers
Copy link
Copy Markdown
Contributor Author

Build is clean, except for a known flaky IT: 37272.52 (Compile=openjdk8, Run=openjdk8) perfect rollup parallel batch index integration test with Indexer, reported as Issue #12692. Ready for review.

Done in preparation for the "single-pass" planner.
@paul-rogers
Copy link
Copy Markdown
Contributor Author

Ready for review. Different flaky IT in the latest build: (Compile=openjdk8, Run=openjdk8) query retry integration test for missing segments, described in Issue #12733.

* In some future this could perhaps re-use some work done by {@link #validate(boolean)}
* instead of repeating it, but that day is not today.
*/
// RelConversionException is no longer thrown, but removing it causes
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is it really that many files?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! It expands ever-outward. Is good to clean up, but as a separate PR to keep this one small.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, was curious because on the surface it only looks like SqlLifecycle and some tests. As long as we don't forget about it i think its fine.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. From SqlLifecycle, the exception is exposed to the callers, which include the tests. Many tests then expose the exception in their test function signatures. So, to fully clean up, we have to clean up many test functions. That's the work to be done in another PR.

@abhishekagarwal87
Copy link
Copy Markdown
Contributor

This PR also paves the way to support multiple SQL statements now that we have forked the PlannerImpl class. We could change the CalcitePlanner to call parseStmtList() instead of parseStmt(). The calcite parser supports multiple SQL statements but the planner doesn't expose it.

Copy link
Copy Markdown
Member

@rohangarg rohangarg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes! 👍
left some comments

System.setProperty("saffron.default.collation.name", StringUtils.format("%s$en_US", charset));

// The following are the current names. See org.apache.calcite.config.CalciteSystemProperty
// https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/config/CalciteSystemProperty.java
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for checking up on this 👍 - can we also drop setting the old properties?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably. I was being super-cautious to limit changes in this PR.

There is some mystery bug I never did solve: somewhere along the line Calcite stopped using our property values. If found this issue when trying to figure out why Calcite was ignoring our values.

}

@Override
public RelNode transform(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the dropping of RelConversionException from the Planner methods intended? I tried but could not figure out the reason for it

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it turns out, Calcite declares an exception which is not thrown. Our IntelliJ inspections then refuse to allow this and require that we rectify Calcite's error. That causes the cascading issues mentioned elsewhere. This is a start to say that the exception isn't really thrown.

Comment thread sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java Outdated
Comment thread sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java Outdated
@paul-rogers
Copy link
Copy Markdown
Contributor Author

@abhishekagarwal87, good point:

This PR also paves the way to support multiple SQL statements now that we have forked the PlannerImpl class. We could change the CalcitePlanner to call parseStmtList() instead of parseStmt(). The calcite parser supports multiple SQL statements but the planner doesn't expose it.

The trick is that planning more than one statement at a time creates a race condition. In a normal DB, the following would fail:

CREATE TABLE foo ...;
SELECT * FROM foo

If we plan both statements up front, the second will fail because table foo doesn't exist yet. Once we support INSERT, we have the same issue, since INSERT can create a table. Still, if we had ALTER SESSION SET <option> = <value> (in place of context variables), it would be handy to do:

ALTER SESSION SET "priority" = 1;
SELECT ...

One solution that some tools use is to first split the statements at the semi-colon delimiter, then plan and execute them one by one. That eliminates the race conditions. For the above, we could create a mini per-request "session" to track the options, seeded with query context values.

@paul-rogers
Copy link
Copy Markdown
Contributor Author

Latest build failed with what I assume to be a flaky IT: (Compile=openjdk8, Run=openjdk8) leadership and high availability integration tests. The same IT passed in earlier builds, and nothing in the latest commit should affect the servers. The error says one of the servers failed to become ready.

Perhaps a committer can kindly restart that one test to see if we get lucky this time.

Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

* In some future this could perhaps re-use some work done by {@link #validate(boolean)}
* instead of repeating it, but that day is not today.
*/
// RelConversionException is no longer thrown, but removing it causes
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, was curious because on the surface it only looks like SqlLifecycle and some tests. As long as we don't forget about it i think its fine.

@abhishekagarwal87
Copy link
Copy Markdown
Contributor

@abhishekagarwal87, good point:

This PR also paves the way to support multiple SQL statements now that we have forked the PlannerImpl class. We could change the CalcitePlanner to call parseStmtList() instead of parseStmt(). The calcite parser supports multiple SQL statements but the planner doesn't expose it.

The trick is that planning more than one statement at a time creates a race condition. In a normal DB, the following would fail:

CREATE TABLE foo ...;
SELECT * FROM foo

If we plan both statements up front, the second will fail because table foo doesn't exist yet. Once we support INSERT, we have the same issue, since INSERT can create a table. Still, if we had ALTER SESSION SET <option> = <value> (in place of context variables), it would be handy to do:

ALTER SESSION SET "priority" = 1;
SELECT ...

One solution that some tools use is to first split the statements at the semi-colon delimiter, then plan and execute them one by one. That eliminates the race conditions. For the above, we could create a mini per-request "session" to track the options, seeded with query context values.

Indeed. statements like ALTER SESSION was infact the very reason why we were exploring supporting multiple statements. So users can write the SQL query like

SET CONTEXT timeout=30000;
SET CONTEXT priority=1;
SELECT COUNT(*) from the_awesome_table

Takes us one step closer to just submitting SQL as a string to Druid. Right now, that SQL needs to be wrapped up inside a JSON.

@clintropolis clintropolis merged commit ee15c23 into apache:master Jul 15, 2022
@abhishekagarwal87 abhishekagarwal87 added this to the 24.0.0 milestone Aug 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants