refactor sql lifecycle, druid planner, views, and view permissions#10812
Merged
jon-wei merged 9 commits intoapache:masterfrom Feb 5, 2021
jihoonson reviewed Feb 3, 2021
 * Initialize the query lifecycle, setting the raw string SQL, initial query context, and assign a sql query id.
 *
 * If successful (it will be), it will transition the lifecycle to {@link State#INITIALIZED}.
 */
Contributor
Thanks for adding javadocs 👍
private PlannerContext plan(AuthenticationResult authenticationResult)
    throws RelConversionException
/**
 * Validate SQL query and authorize against any datasources or views which the query.
// raw tables and views and such will have a IdentifierNamespace
// since we are scoped to identifiers here, we should only pick up these
SqlValidatorNamespace namespace = validator.getNamespace(id);
if (namespace instanceof IdentifierNamespace) {
Contributor
Per Calcite, we should use isWrapperFor() instead of instanceof.
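To illustrate why a wrapper-style check can behave differently from `instanceof`, here is a minimal, self-contained sketch. None of these types are Calcite classes: `Namespace`, `IdentifierNs`, and `DelegatingNs` are hypothetical stand-ins that only mirror the `isWrapperFor`/`unwrap` contract, under the assumption that a namespace may be handed back wrapped in a delegating object.

```java
import java.util.Optional;

/** Self-contained stand-ins; none of these types are Calcite classes. */
public class WrapperCheckSketch {
  /** Hypothetical namespace contract with wrapper-style lookup. */
  interface Namespace {
    default boolean isWrapperFor(Class<?> clazz) {
      return clazz.isInstance(this);
    }

    default <T> T unwrap(Class<T> clazz) {
      return clazz.cast(this);
    }
  }

  /** Hypothetical namespace for a plain table identifier. */
  static final class IdentifierNs implements Namespace {
    final String name;

    IdentifierNs(String name) {
      this.name = name;
    }
  }

  /** A decorator that forwards wrapper lookups to the namespace it wraps. */
  static final class DelegatingNs implements Namespace {
    private final Namespace delegate;

    DelegatingNs(Namespace delegate) {
      this.delegate = delegate;
    }

    @Override
    public boolean isWrapperFor(Class<?> clazz) {
      return clazz.isInstance(this) || delegate.isWrapperFor(clazz);
    }

    @Override
    public <T> T unwrap(Class<T> clazz) {
      return clazz.isInstance(this) ? clazz.cast(this) : delegate.unwrap(clazz);
    }
  }

  /** Finds the identifier name even when the namespace is wrapped. */
  static Optional<String> identifierName(Namespace ns) {
    if (ns.isWrapperFor(IdentifierNs.class)) {
      return Optional.of(ns.unwrap(IdentifierNs.class).name);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    Namespace wrapped = new DelegatingNs(new IdentifierNs("foo"));
    // instanceof only sees the outer decorator, so this check fails.
    System.out.println(wrapped instanceof IdentifierNs);
    // The wrapper-aware lookup reaches the inner identifier namespace.
    System.out.println(identifierName(wrapped).orElse("<none>"));
  }
}
```

In this sketch the `instanceof` test misses the wrapped identifier namespace, while the `isWrapperFor`/`unwrap` pair finds it.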
jihoonson approved these changes Feb 4, 2021
Contributor
jihoonson left a comment
LGTM, but please add javadoc before merge.
import java.util.Set;

public class ResourceResult
public class ValidationResult
Contributor
Please add some javadoc since now it's not intuitive what resources in ValidationResult mean.
    new SqlResourceCollectorShuttle(validator, frameworkConfig.getDefaultSchema().getName());
validated.accept(resourceCollectorShuttle);
return new ResourceResult(resourceCollectorShuttle.getResources());
plannerContext.setResources(resourceCollectorShuttle.getResources());
Contributor
nit: maybe plannerContext.setValidationResult(validationResult)?
jon-wei approved these changes Feb 5, 2021
jon-wei added a commit to jon-wei/druid that referenced this pull request Nov 22, 2021
…sions (apache#10812)" This reverts commit fe30f4b.
jon-wei added a commit to jon-wei/druid that referenced this pull request Nov 22, 2021
Revert "refactor sql lifecycle, druid planner, views, and view permissions (apache#10812)"
jon-wei pushed a commit to jon-wei/druid that referenced this pull request Nov 22, 2021
Description
This PR primarily does two things:
- Introduces a new `ResourceType.VIEW` authorization construct to allow defining access to views separately from datasources.
- Refactors `SqlLifecycle` to authorize datasources (and now views) up front, before preparing or planning a query, by analyzing the SQL expression directly rather than waiting until after the transformation from SQL to native Druid query is done.

Note that this PR does not introduce or propose a view management system at this time; that exercise is left for future work. What is going on here is just refactoring some internal stuff that might someday be used for something cool.
Views
Druid views no longer live in `DruidSchema`; a new `ViewSchema` has been introduced to hold them. This is technically an incompatible change, since `druid.some_view` is now `view.some_view`, but since this isn't currently a real feature I think this should not be problematic.

Prior to this PR, authorization was performed against the final set of datasources that would be touched by the native Druid query, and was done after planning had completed (and done twice, once in `SqlLifecycle` and again in `QueryLifecycle`). However, this is not appropriate for views, as it imposes an overly strict requirement that to query a view, you must also be authorized to read from all underlying datasources, which of course precludes scenarios like providing a restricted view onto a larger underlying table as a means to control access.

To remedy this, a new `ResourceType.VIEW` construct has been introduced, and SQL query authorization is now done against the set of `Resource` of `ResourceType.DATASOURCE` and `ResourceType.VIEW` which are utilized in the query. In this model views are authorized more or less the same way as tables, just separately.

Splitting the 'view' and 'druid' schemas also makes any theoretical view manager implementation easier, because it would only have to worry about name collisions with other views, not with Druid datasources, and it requires no other changes to Druid like a real solution cohabitating the same schema would require.
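As a rough illustration of the authorization model described above, the following self-contained sketch checks a query's required resources against a set of grants. It is an assumption-laden simplification: the `Resource` and `ResourceType` classes here are local stand-ins, not Druid's classes, and real authorizers would typically involve actions and name patterns rather than an exact-match set.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Local stand-ins only; these are not Druid's Resource/ResourceType classes. */
public class ViewAuthSketch {
  enum ResourceType { DATASOURCE, VIEW }

  /** A named resource of a given type, compared by value. */
  static final class Resource {
    final String name;
    final ResourceType type;

    Resource(String name, ResourceType type) {
      this.name = name;
      this.type = type;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Resource)) {
        return false;
      }
      Resource other = (Resource) o;
      return name.equals(other.name) && type == other.type;
    }

    @Override
    public int hashCode() {
      return Objects.hash(name, type);
    }
  }

  /** True only if every resource the query references has been granted. */
  static boolean authorizeAll(Set<Resource> granted, Set<Resource> required) {
    return granted.containsAll(required);
  }

  public static void main(String[] args) {
    // The user may read the view but not the table underneath it.
    Set<Resource> granted = new HashSet<>();
    granted.add(new Resource("restricted_view", ResourceType.VIEW));

    // A query on the view requires only the VIEW resource in this model.
    Set<Resource> viewQuery =
        Collections.singleton(new Resource("restricted_view", ResourceType.VIEW));
    // A query directly on the table requires the DATASOURCE resource.
    Set<Resource> tableQuery =
        Collections.singleton(new Resource("big_table", ResourceType.DATASOURCE));

    System.out.println(authorizeAll(granted, viewQuery));
    System.out.println(authorizeAll(granted, tableQuery));
  }
}
```

The point of the sketch is that the view and its underlying datasource are distinct resources, so access to one does not imply or require access to the other.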
SqlLifecycle and DruidPlanner
Since authorization required a refactor to support independent view authorization anyway, I took the opportunity to rework a bit how `SqlLifecycle` functions, changing the order of the state transitions to move authorization earlier in the process.

To collect the set of `Resource` to authorize, the `DruidPlanner`, which `SqlLifecycle` wraps, has a new method `validate` which returns a `ValidationResult` containing the set of all views and datasources that were identified in the query. Mechanically, this set is constructed using a `SqlShuttle` which walks the SQL expression tree to examine each `SqlIdentifier` that the validator associates with a table identifier, looks up whether it belongs to the 'druid' or the 'view' schema, and constructs the correct `Resource` accordingly.

The added `DruidPlannerResourceAnalyzeTest` has some tests which try to trip up the `SqlResourceCollectorShuttle`. It looks good so far in my testing, but I'm sure there are scenarios I'm not thinking of. If anyone is worried about this new approach to resolving the resources to authorize, I can introduce a config option to retain the previous check against the old set of datasource names after planning (though I would rather not do this if possible).

Authorizing before planning should also help reduce effort wasted on planning queries, which can be non-trivial depending on complexity, that would later fail to execute due to authorization failure.
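The tree walk described above can be sketched roughly as follows. This is a toy model, not the actual `SqlResourceCollectorShuttle`: the `Node` type stands in for a parsed SQL tree, identifiers are assumed to already be fully qualified as schema plus table, and resources are rendered as plain strings to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of a resource-collecting tree walk; not Druid or Calcite code. */
public class ResourceCollectorSketch {
  /** Toy SQL tree node: either a call with operands or a table identifier. */
  static final class Node {
    final List<String> identifier; // e.g. ["view", "some_view"]; null for calls
    final List<Node> operands;

    private Node(List<String> identifier, List<Node> operands) {
      this.identifier = identifier;
      this.operands = operands;
    }

    static Node id(String schema, String table) {
      return new Node(Arrays.asList(schema, table), new ArrayList<Node>());
    }

    static Node call(Node... operands) {
      return new Node(null, Arrays.asList(operands));
    }
  }

  /** Walks the whole tree and returns the resources it references. */
  static Set<String> collect(Node root) {
    Set<String> resources = new TreeSet<>();
    visit(root, resources);
    return resources;
  }

  private static void visit(Node node, Set<String> out) {
    if (node.identifier != null) {
      String schema = node.identifier.get(0);
      String table = node.identifier.get(1);
      if ("druid".equals(schema)) {
        out.add("DATASOURCE:" + table);
      } else if ("view".equals(schema)) {
        out.add("VIEW:" + table);
      }
      // Identifiers in any other schema yield no resource in this sketch.
      return;
    }
    for (Node operand : node.operands) {
      visit(operand, out);
    }
  }

  public static void main(String[] args) {
    // Roughly: a query over druid.foo joined with a subquery over view.bar.
    Node query = Node.call(Node.id("druid", "foo"), Node.call(Node.id("view", "bar")));
    System.out.println(collect(query));
  }
}
```

Because the walk recurses into every operand, identifiers nested inside subqueries are collected too, which is the kind of case the added tests try to exercise.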
I think there are some further improvements that can be made to `SqlLifecycle` and `DruidPlanner`, but I have not made these changes in this PR to keep it from growing too large. For example, there is a lot of repeated parsing/validation work done between the phases of the lifecycle, the results of which could probably be re-used.

Finally, I tried to fill out some of the javadoc in this area, since it was kind of lacking; hopefully it's better.
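For readers who want a concrete picture of the reordered lifecycle, where authorization must succeed before planning can start, here is a minimal sketch. Only `INITIALIZED` is a state name taken from the javadoc in this PR; the other state names, the method names, and the boolean `allowed` flag are assumptions made for illustration and do not reflect the real `SqlLifecycle` API.

```java
/** Illustrative state machine; state and method names are mostly hypothetical. */
public class LifecycleOrderSketch {
  enum State { NEW, INITIALIZED, AUTHORIZED, PLANNED }

  private State state = State.NEW;

  private void require(State expected, String action) {
    if (state != expected) {
      throw new IllegalStateException("Cannot " + action + " while in state " + state);
    }
  }

  void initialize() {
    require(State.NEW, "initialize");
    state = State.INITIALIZED;
  }

  /** Authorization happens right after initialization, before any planning. */
  void authorize(boolean allowed) {
    require(State.INITIALIZED, "authorize");
    if (!allowed) {
      throw new SecurityException("Unauthorized");
    }
    state = State.AUTHORIZED;
  }

  /** Planning is only reachable once authorization has succeeded. */
  void plan() {
    require(State.AUTHORIZED, "plan");
    state = State.PLANNED;
  }

  State state() {
    return state;
  }

  public static void main(String[] args) {
    LifecycleOrderSketch lifecycle = new LifecycleOrderSketch();
    lifecycle.initialize();
    lifecycle.authorize(true);
    lifecycle.plan();
    System.out.println(lifecycle.state());
  }
}
```

With this ordering an unauthorized query fails in `authorize` and never reaches `plan`, so no planning work is spent on it.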
This PR has: