Skip to content

refactor sql lifecycle, druid planner, views, and view permissions#10812

Merged
jon-wei merged 9 commits intoapache:masterfrom
clintropolis:split-up-views-and-druids
Feb 5, 2021
Merged

refactor sql lifecycle, druid planner, views, and view permissions#10812
jon-wei merged 9 commits intoapache:masterfrom
clintropolis:split-up-views-and-druids

Conversation

@clintropolis
Copy link
Copy Markdown
Member

@clintropolis clintropolis commented Jan 28, 2021

Description

This PR primarily does two things:

  • refactors the (afaict) currently unused Druid view system in the hopes that someday it can be built into something useful, by splitting into a separate calcite schema and introducing a new ResourceType.VIEW authorization construct to allow defining access to views separately from datasources.
  • refactors SqlLifecycle to authorize datasources (and now views) up front, before preparing or planning a query, by analyzing the SQL expression directly rather than waiting until after the transformation from SQL to native Druid query is done.

Note that this PR does not introduce or propose a view management system at this time, that exercise is left for future work, what is going on here is just refactoring some internal stuff that might someday be used for something cool.

Views

Instead of living in DruidSchema, a new ViewSchema has been introduced to hold all Druid views. This is technically an incompatible change, since druid.some_view is now view.some_view, but since this isn't currently a real feature I think this should not be problematic.

Prior to this PR, authorization was performed against the final set of datasources which would be touched by the native Druid query, and done after planning has been completed (and done twice, once in SqlLifecycle and also again in QueryLifecycle). However, this is not appropriate for views, as it has an overly strict requirement that to query a view, you must also be authorized to read from all underlying datasources, which of course precludes scenarios like providing a restricted view onto a larger underlying table as a means to control access.

To remedy this, a new ResourceType.VIEW construct has been introduced, and SQL query authorization is now done against the set of Resource of ResourceType.DATASOURCE and ResourceType.VIEW which are utilized in the query. In this model views are more or less authorized the same as tables, just separately.

Splitting the 'view' and 'druid' schema also makes any theoretical view manager implementation easier to implement because it would not have to worry about name collisions with Druid datasources, just other views, and requires no other changes to Druid like a real solution cohabitating the same schema would require.

SqlLifecycle and DruidPlanner

Since authorization required a refactor to support independent view authorization anyway, I took the opportunity to rework a bit how SqlLifecycle functions, changing the order of the state transitions to move authorization earlier in the process, which is now modeled by this new flow:

new -> initialized -> authorizing -> authorized -> (unauthorized | planned -> executing -> done)

To collect the set of Resource to authenticate, the DruidPlanner which SqlLifecycle wraps, has a new method validate which returns a ValidationResult containing the set of all views and datasources that were identified in the query. Mechanically this list is constructed using a SqlShuttle which walks the SQL expression tree to examine SqlIdentifier which the validator associates with a table identifier to lookup whether its a 'druid' or a 'view', and construct the correct Resource accordingly.

The added DruidPlannerResourceAnalyzeTest has some tests which are trying to trip up the SqlResourceCollectorShuttle, it looks good so far in my testing but I'm sure there are scenarios I'm not thinking of though. If anyone is worried about this new approach to resolving the resources to authorize, I can introduce a config option to retain the previous check against the old set of datasource names after planning (though I would rather not do this if possible).

Authorizing before planning should also help reduce some potential waste spent planning queries, which can be non-trivial depending on complexity, and then would later not be able to execute due to authorization failure.

I think there are some further improvements that can be made with SqlLifecycle and DruidPlanner, but have not done these changes in this PR to keep it from growing too large. For example, there is a lot of repeated parsing/validation work done between the phases of the lifecycle, which could probably be re-used.

Finally, I tried to fill out some of the javadoc in this area, since it was kind of lacking, hopefully it's better.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

* Initialize the query lifecycle, setting the raw string SQL, initial query context, and assign a sql query id.
*
* If successful (it will be), it will transition the lifecycle to {@link State#INITIALIZED}.
*/
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding javadocs 👍

private PlannerContext plan(AuthenticationResult authenticationResult)
throws RelConversionException
/**
* Validate SQL query and authorize against any datasources or views which the query.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which the query reads?

// raw tables and views and such will have a IdentifierNamespace
// since we are scoped to identifiers here, we should only pick up these
SqlValidatorNamespace namespace = validator.getNamespace(id);
if (namespace instanceof IdentifierNamespace) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per Calcite, we should use isWrappedFor() instead of instanceof.

Copy link
Copy Markdown
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but please add javadoc before merge.

import java.util.Set;

public class ResourceResult
public class ValidationResult
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some javadoc since now it's not intuitive what resources in ValidationResult mean.

new SqlResourceCollectorShuttle(validator, frameworkConfig.getDefaultSchema().getName());
validated.accept(resourceCollectorShuttle);
return new ResourceResult(resourceCollectorShuttle.getResources());
plannerContext.setResources(resourceCollectorShuttle.getResources());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe plannerContext.setValidationResult(validationResult)?

@jon-wei jon-wei merged commit fe30f4b into apache:master Feb 5, 2021
@clintropolis clintropolis deleted the split-up-views-and-druids branch February 10, 2021 03:44
@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021
jon-wei added a commit to jon-wei/druid that referenced this pull request Nov 22, 2021
jon-wei added a commit to jon-wei/druid that referenced this pull request Nov 22, 2021
Revert "refactor sql lifecycle, druid planner, views, and view permissions (apache#10812)"
jon-wei pushed a commit to jon-wei/druid that referenced this pull request Nov 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants