Skip to content

SQL INSERT planner support.#11959

Merged
gianm merged 7 commits intoapache:masterfrom
gianm:sql-insert-p1
Nov 24, 2021
Merged

SQL INSERT planner support.#11959
gianm merged 7 commits intoapache:masterfrom
gianm:sql-insert-p1

Conversation

@gianm
Copy link
Copy Markdown
Contributor

@gianm gianm commented Nov 19, 2021

This patch is doing parts 1–3 and 5 from the "Proposed changes" in #11929. Nothing is exposed to end users as a result of this patch. It's just changes to the planner that enable understanding and authorizing INSERTs, and writing some initial tests. The main logic is in DruidPlanner and the main tests are in CalciteInsertDmlTest. Major things you won't find are the ability to actually execute INSERT queries (that needs a new QueryMaker) and the ability to handle PARTITION BY and CLUSTER BY (that will require parser changes). Those would be done in follow-on patches.

The main changes are:

  1. DruidPlanner is able to validate and authorize INSERT queries. They require WRITE permission on the target datasource.

  2. QueryMaker is now an interface, and there is a QueryMakerFactory that creates instances of it. There is only one production implementation of each (NativeQueryMaker and NativeQueryMakerFactory), which together behave the same way as the former QueryMaker class. But this opens the door to executing queries in ways other than the Druid query stack, and is used by unit tests (CalciteInsertDmlTest) to test the INSERT planning functionality.

  3. Adds an EXTERN table macro that allows references external data using InputSource and InputFormat from Druid's batch ingestion API. This is not exposed in production yet, but is used by unit tests.

  4. Adds a QueryFeature concept that enables the planner to change its behavior slightly depending on the capabilities of the execution system.

  5. Adds an "AuthorizableOperator" concept that enables SqlOperators to require additional permissions. This is used by the EXTERN table macro.

Related odds and ends:

  • Add equals, hashCode, toString methods to InlineInputSource. Aids in the "from external" tests in CalciteInsertDmlTest.
  • Add JSON-serializability to RowSignature.
  • Move the SQL string inside PlannerContext so it is "baked into" the planner when the planner is created. Cleans up the code a bit, since in practice, the same query is passed in every time to the same planner anyway.

The main changes are:

1) DruidPlanner is able to validate and authorize INSERT queries. They
   require WRITE permission on the target datasource.

2) QueryMaker is now an interface, and there is a QueryMakerFactory that
   creates instances of it. There is only one production implementation
   of each (NativeQueryMaker and NativeQueryMakerFactory), which
   together behave the same way as the former QueryMaker class. But this
   opens the door to executing queries in ways other than the Druid
   query stack, and is used by unit tests (CalciteInsertDmlTest) to
   test the INSERT planning functionality.

3) Adds an EXTERN table macro that allows references external data using
   InputSource and InputFormat from Druid's batch ingestion API. This is
   not exposed in production yet, but is used by unit tests.

4) Adds a QueryFeature concept that enables the planner to change its
   behavior slightly depending on the capabilities of the execution
   system.

5) Adds an "AuthorizableOperator" concept that enables SqlOperators
   to require additional permissions. This is used by the EXTERN table
   macro.

Related odds and ends:

- Add equals, hashCode, toString methods to InlineInputSource. Aids in
  the "from external" tests in CalciteInsertDmlTest.
- Add JSON-serializability to RowSignature.
- Move the SQL string inside PlannerContext so it is "baked into" the
  planner when the planner is created. Cleans up the code a bit, since
  in practice, the same query is passed in every time to the
  same planner anyway.
Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very nice, i'm a bit impressed how cleanly this fit into the existing stuff with a bit of refactoring 👍

* Validates an SQL query and collects a {@link ValidationResult} which contains a set of
* {@link org.apache.druid.server.security.Resource} corresponding to any Druid datasources or views which are taking
* part in the query
* Validates a SQL query and populates {@link PlannerContext#getResourceActions()}.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an “es-cue-el” person, this will never read right to me :trollface:

On a serious note, the refactors in DruidPlanner are nice 👍

// finalize aggregations for the outermost query even if we don't explicitly ask it to.

final DruidQuery query = toDruidQuery(false);
if (query != null) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this null check be moved into NativeQueryMaker or the DruidRel method or can it not be null here?

Copy link
Copy Markdown
Contributor Author

@gianm gianm Nov 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can't be null; that was a hold-over from an earlier time when null was used by the old semi-join rel to mean "known to be empty". The new join rel doesn't work that way, since it pushes computation down into the native layer. These days, DruidRel.toDruidQuery is not @Nullable anymore.

{
PreparedStatement statement = client.prepareStatement("SELECT COUNT(*) AS cnt FROM sys.servers WHERE servers.host = ?");
PreparedStatement statement =
client.prepareStatement("SELECT COUNT(*) AS cnt FROM sys.servers WHERE servers.host = ?");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it a bug that this didn't explode before or is it an artifact of changes elsewhere or just a flaw in the original test?

Copy link
Copy Markdown
Member

@clintropolis clintropolis Nov 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah nevermind, found the change in CalciteTests that limits to datasource and view permissions, leaving here in case anyone else wonders the same thing

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, this is happening because I made the test authorizer more strict for the "regular user". The regular user now only has permissions to datasources and views, not state or external data.

Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤘

@gianm gianm merged commit 0354407 into apache:master Nov 24, 2021
@gianm gianm deleted the sql-insert-p1 branch November 24, 2021 20:14
@abhishekagarwal87 abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants