Add support for authorizing query context params#12396
Add support for authorizing query context params#12396abhishekagarwal87 merged 16 commits intoapache:masterfrom
Conversation
clintropolis
left a comment
There was a problem hiding this comment.
overall lgtm, this seems pretty useful to be able to control access to this stuff 🤘
| * using {@link #withContext(QueryContext)}. When callers use query context, they should check first | ||
| * if the query holder has a valid query context using {@link #isValidContext()}. | ||
| */ | ||
| public class QueryHolder<T> |
There was a problem hiding this comment.
this feels somewhat sad given the existence of QueryPlus, the other wrapper, though i guess it is used for processing queries instead of the lead up to it... still, seems like we have too many wrappers...
There was a problem hiding this comment.
I added QueryHolder at first because it was quite confusing which is the valid context between Query.getContext() and QueryContext. QueryHolder was a stateful object indicating whether Query.getContext is valid. QueryLifecycle could find a valid query context using QueryHolder.isValidContext(). A better approach is replacing the context map in Query with QueryContext as we can consolidate the query context stores. I didn't do it at first because it seems quite invasive. However, I was curious how invasive it would be, so went ahead and tried it. It actually doesn't seem that invasive, this PR is rather a little bit smaller than it was before.
The Query interface now has getQueryContext() which returns QueryContext. This new interface is preferred over Query.getContext() which internally simply calls QueryContext.getMergedParams(). QueryContext in Query is "valid" only in the broker in a sense that defaultParams, userParams, and systemParams will not be kept after serialization. All parameters will be stored in userParams after it is deserialized. This should not cause any issue today.
| * - Updated with context after authorization in {@link #doAuthorize}. | ||
| */ | ||
| @MonotonicNonNull | ||
| private QueryHolder<?> baseQuery; |
There was a problem hiding this comment.
this seems error prone to have two contexts like this with one hidden in the wrapper, I think it might be easier to follow what is going on here to keep this as a Query and have a baseContext separate from the context.
| queryId | ||
| ); | ||
|
|
||
| final NonnullPair<QueryHolder<?>, QueryContext> pair = readQuery(req, in, ioReaderWriter); |
There was a problem hiding this comment.
again this feels confusing to have two query contexts, one of which is wrapped, i'd personally probably rather just see them all separately to make it easier to follow what is what
| { | ||
| private static final Logger LOG = new Logger(DruidConnection.class); | ||
| private static final Set<String> SENSITIVE_CONTEXT_FIELDS = Sets.newHashSet( | ||
| public static final Set<String> SENSITIVE_CONTEXT_FIELDS = ImmutableSet.of( |
There was a problem hiding this comment.
this isn't used here anymore, and can be moved to DruidMeta
|
|
||
| public Query getQuery() | ||
| /** | ||
| * Returns the Query wrapped inside QueryHolder. |
| private Query baseQuery; | ||
|
|
||
| /** | ||
| * A holder for the user query to run. |
| return Objects.hash(getMergedParams()); | ||
| } | ||
|
|
||
| // TODO: toString? |
| * You can use {@code getX} methods or {@link #getMergedParams()} to compute the context params | ||
| * merging 3 types of params above. | ||
| * | ||
| * Currently, this class is mainly used for query context parameter authorization in query entires, |
There was a problem hiding this comment.
| * Currently, this class is mainly used for query context parameter authorization in query entires, | |
| * Currently, this class is mainly used for query context parameter authorization, |
| * Auto-generated queryId or sqlQueryId are also set as default parameters. These default parameters can | ||
| * be overridden by user or system parameters. | ||
| * - User parameters. These are the params set by the user. User params override default parameters but | ||
| * are overridden by system paramters. |
There was a problem hiding this comment.
| * are overridden by system paramters. | |
| * are overridden by system parameters. |
kfaraz
left a comment
There was a problem hiding this comment.
Overall looks good. Added some minor comments.
We could also update the utility methods inside QueryContexts that accept a map to now accept a QueryContext itself but this can be done later.
| return makeSQLQueryRequest(httpClient, query, ImmutableMap.of(), expectedStatus); | ||
| } | ||
|
|
||
| protected StatusResponseHolder makeSQLQueryRequest( |
There was a problem hiding this comment.
Suggestion: Maybe rename to something like makeSqlRequestAndVerifyStatus.
There was a problem hiding this comment.
This is actually not a new method, but I added a new parameter for this method. I would like to not rename it as this PR is already quite big.
|
|
||
| DateTimeZone getTimezone(); | ||
|
|
||
| @Deprecated |
There was a problem hiding this comment.
Nit: Please add a comment/javadoc about the deprecation and the alternative.
| return false; | ||
| } | ||
| QueryContext context = (QueryContext) o; | ||
| return getMergedParams().equals(context.getMergedParams()); |
There was a problem hiding this comment.
Nit:
As we are comparing only the final merged params, two QueryContext objects that are currently equal might not be so after performing the same operation (say removeUserParam) on the two of them.
For example:
Context1: userParam={p1=10}, systemParam={}
Context2: userParam={}, systemParam={p1=10}
These two are currently equal.
But after performing the same operation, say removeUserParam(p1), the two contexts will not remain equal anymore.
I guess this should be okay?
There was a problem hiding this comment.
The scenario you described sounds OK to me. defaultParams, userParams, or systemParams are to track how those params are set, and should not affect the equality test of the queryContext object. Any particular concern about the behavior?
There was a problem hiding this comment.
No concern as such. Just wanted to be sure.
| protected abstract String getAuthenticatorName(); | ||
| protected abstract String getAuthorizerName(); | ||
| protected abstract String getExpectedAvaticaAuthError(); | ||
| protected abstract Properties getAvaticaConnectionProperties(); |
There was a problem hiding this comment.
Nit:
Maybe we should just add a new testAvaticaQuery(properties, url)and leave the existingtestAvaticaQuery as is?
Otherwise, we are forced to pass the properties to testAvaticaQuery in every downstream test.
There was a problem hiding this comment.
It was my bad that I removed this method before. I missed that implementations of this class can have a different implementation for this method. I added them back.
for testAvaticaQuery, I think it's OK to change the method signature as the new signature makes it clearer what user's credentials are used for the test.
| } else { | ||
| mergedUserAndConfigContext = defaultQueryConfig.getContext(); | ||
| } | ||
| baseQuery.getQueryContext().addDefaultParam(BaseQuery.QUERY_ID, UUID.randomUUID().toString()); |
| new QueryContext( | ||
| ImmutableMap.of( | ||
| PlannerConfig.CTX_KEY_USE_GROUPING_SET_FOR_EXACT_DISTINCT, | ||
| "true" | ||
| ) | ||
| ) |
There was a problem hiding this comment.
Nit: Maybe add a private util method for this? This code is present in 4 places.
There was a problem hiding this comment.
I'm not sure if that is worth. I think we rather need to refactor and split CalciteQueryTest since this class is too big now.
There was a problem hiding this comment.
+1 on splitting up the CalciteQueryTest
| final DruidStatement statement = new DruidStatement( | ||
| "", | ||
| 0, | ||
| new QueryContext(), | ||
| sqlLifecycleFactory.factorize(), | ||
| () -> {} | ||
| ).prepare(sql, -1, AllowAllAuthenticator.ALLOW_ALL_RESULT); |
There was a problem hiding this comment.
Nit: Maybe add a createStatement(sql) private util method for this to avoid code repetition.
| operand(DruidRel.class, any()) | ||
| ) | ||
| ); | ||
| this.enableLeftScanDirect = QueryContexts.getEnableJoinLeftScanDirect(plannerContext.getQueryContext()); |
There was a problem hiding this comment.
isEnableJoinLeftScanDirect seems too specific a use case to be a part of the base QueryContext class itself. Why have we moved it from QueryContexts?
There was a problem hiding this comment.
IMO, now QueryContexts should remain just as a holder of predefined context parameters. It doesn't give any more benefit over QueryContext. getEnableJoinLeftScanDirect is just a util method for our convenience and they all should be moved to QueryContext.
There was a problem hiding this comment.
👍🏻 Moving all of these methods to QueryContext would be much better than it is now.
| } | ||
|
|
||
| public PlannerConfig withOverrides(final Map<String, Object> context) | ||
| public PlannerConfig withOverrides(final QueryContext queryAndContext) |
There was a problem hiding this comment.
Suggestion: Rename to queryContext
| * If successful (it will be), it will transition the lifecycle to {@link State#INITIALIZED}. | ||
| */ | ||
| public String initialize(String sql, Map<String, Object> queryContext) | ||
| public String initialize(String sql, QueryContext queryAndContext) |
There was a problem hiding this comment.
Nit:
| public String initialize(String sql, QueryContext queryAndContext) | |
| public String initialize(String sql, QueryContext queryContext) |
|
tagging |
|
@clintropolis maybe we can add a default implementation for |
|
Adds a default implementation of getQueryContext, which was added to the Query interface in #12396. Query is marked with @ExtensionPoint, and lately we have been trying to be less volatile on these interfaces by providing default implementations to be more chill for extension writers. The way this default implementation is done in this PR is a bit strange due to the way that getQueryContext is used (mutated with system default and system generated keys); the default implementation has a specific object that it returns, and I added another temporary default method isLegacyContext that checks if the getQueryContext returns that object or not. If not, callers fall back to using getContext and withOverriddenContext to set these default and system values. I am open to other ideas as well, but this way should work at least without exploding, and added some tests to ensure that it is wired up correctly for QueryLifecycle, including the context authorization stuff. The added test shows the strange behavior if query context authorization is enabled, mainly that the system default and system generated query context keys also need to be granted as permissions for things to function correctly. This is not great, so I mentioned it in the javadocs as well. Not sure if it needs to be called out anywhere else.
Adds a default implementation of getQueryContext, which was added to the Query interface in apache#12396. Query is marked with @ExtensionPoint, and lately we have been trying to be less volatile on these interfaces by providing default implementations to be more chill for extension writers. The way this default implementation is done in this PR is a bit strange due to the way that getQueryContext is used (mutated with system default and system generated keys); the default implementation has a specific object that it returns, and I added another temporary default method isLegacyContext that checks if the getQueryContext returns that object or not. If not, callers fall back to using getContext and withOverriddenContext to set these default and system values. I am open to other ideas as well, but this way should work at least without exploding, and added some tests to ensure that it is wired up correctly for QueryLifecycle, including the context authorization stuff. The added test shows the strange behavior if query context authorization is enabled, mainly that the system default and system generated query context keys also need to be granted as permissions for things to function correctly. This is not great, so I mentioned it in the javadocs as well. Not sure if it needs to be called out anywhere else.
Adds a default implementation of getQueryContext, which was added to the Query interface in #12396. Query is marked with @ExtensionPoint, and lately we have been trying to be less volatile on these interfaces by providing default implementations to be more chill for extension writers. The way this default implementation is done in this PR is a bit strange due to the way that getQueryContext is used (mutated with system default and system generated keys); the default implementation has a specific object that it returns, and I added another temporary default method isLegacyContext that checks if the getQueryContext returns that object or not. If not, callers fall back to using getContext and withOverriddenContext to set these default and system values. I am open to other ideas as well, but this way should work at least without exploding, and added some tests to ensure that it is wired up correctly for QueryLifecycle, including the context authorization stuff. The added test shows the strange behavior if query context authorization is enabled, mainly that the system default and system generated query context keys also need to be granted as permissions for things to function correctly. This is not great, so I mentioned it in the javadocs as well. Not sure if it needs to be called out anywhere else. Co-authored-by: Clint Wylie <cwylie@apache.org>
Description
The query context is a way that the user gives a hint to the Druid query engine, so that they enforce a certain behavior or at least let the query engine prefer a certain plan during query planning. Today, there are 3 types of query context params as below.
druid.query.default.contextin runtime properties. Any user context params can be default params.Today, any context params are allowed to users. This can cause 1) a bad UX if the context param is not matured yet or 2) even query failure or system fault in the worst case if a sensitive param is abused, ex)
maxSubqueryRows.This PR adds an ability to limit context params per user role. That means, a query will fail if you have a context param set in the query that is not allowed to you. To do that, this PR adds a new built-in resource type,
QUERY_CONTEXT. The resource to authorize has a name of the context param (such asmaxSubqueryRows) and the type ofQUERY_CONTEXT. To allow a certain context param for a user, the user should be grantedWRITEpermission on the context param resource. Here is an example of the permission.Each role can have multiple permissions for context params. Each permission should be set for different context params.
When a query is issued with a query context
X, the query will fail if the user who issued the query does not have WRITE permission on the query contextX. In this case,Note: there is a context param called
brokerServicethat is used only by the router. This param is used to pin your query to run it in a specific broker. Because the authorization is done not in the router, but in the broker, if you havebrokerServiceset in your query without a proper permission, your query will fail in the broker after routing is done. Technically, this is not right because the authorization is checked after the context param takes effect. However, this should not cause any user-facing issue and thus should be OK. The query will still fail if the user doesn’t have permission forbrokerService.The context param authorization can be enabled using
druid.auth.authorizeQueryContextParams. This is disabled by default to avoid any hassle when someone upgrades his cluster blindly without reading release notes.Key changed/added classes in this PR
QueryContexttracks user params and separates them from others.QueryHolderhas a state indicating whether the context in the native query is valid.QueryLifecycleretrieves context params from a valid source.This PR has: