Add policy enforcer to sanity check on policy in query execution #17774
gianm merged 111 commits into apache:master from
Conversation
…l and broker to use the common druid root directory
left a few comments; happy to talk about it further!
Implement pushTaskPayload/streamTaskPayload as introduced in apache#14887 for HDFS storage to allow larger mm-less ingestion payloads when using HDFS as the deep storage location.
* Add deprecated com.google.common.io.Files#write to forbiddenApis * Replace deprecated Files.write()
Mistakenly categorized under deep storage instead of metadata store.
Changes --------- - Bind `SegmentMetadataCache` only once to `HeapMemorySegmentMetadataCache` in `SQLMetadataStorageDruidModule` - Invoke start and stop of the cache from `DruidOverlord` rather than on lifecycle start/stop - Do not override the binding in `CliOverlord`
…task time (apache#17770) Changes --------- - Use `maxIntervalToKill` to determine search interval for killing unused segments. - If no segment has been killed for the datasource yet, use durationToRetain
…pec was unmodified (apache#17707) Add an optional query parameter called skipRestartIfUnmodified to the /druid/indexer/v1/supervisor endpoint. Callers can set skipRestartIfUnmodified=true to not restart the supervisor if the spec is unchanged. Example: curl -X POST --header "Content-Type: application/json" -d @supervisor.json localhost:8888/druid/indexer/v1/supervisor?skipRestartIfUnmodified=true
Changes --------- - Emit time lag from Kafka similar to Kinesis as metrics `ingest/kafka/lag/time`, `ingest/kafka/maxLag/time`, `ingest/kafka/avgLag/time` - Add new method in `KafkaSupervisor` to fetch timestamps of latest records in stream to compute time lag - Add new field `emitTimeLagMetrics` in `KafkaSupervisorIOConfig` to toggle emission of new metrics
* suggest filter values when known * update snapshots * add more d * fix load rule clamp * better segment timeline init
Copilot reviewed 103 out of 104 changed files in this pull request and generated no comments.
Files not reviewed (1)
- extensions-core/multi-stage-query/pom.xml: Language not supported
Comments suppressed due to low confidence (3)
extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/test/MSQTestBase.java:541
- [nitpick] Ensure that the binding for PolicyEnforcer follows the project’s standard pattern for dependency injection and consider adding a brief comment explaining why the NoopPolicyEnforcer is used in tests.
binder -> binder.bind(PolicyEnforcer.class).toInstance(NoopPolicyEnforcer.instance())
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/querykit/BaseLeafFrameProcessorFactory.java:165
- Confirm that the updated createSegmentMapFunction method correctly accepts a PolicyEnforcer parameter and that all implementations have been updated accordingly.
final Function<SegmentReference, SegmentReference> segmentMapFn = ExecutionVertex.of(query).createSegmentMapFunction(frameContext.policyEnforcer());
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/querykit/SimpleSegmentMapFnProcessor.java:50
- [nitpick] Consider updating or adding unit tests to verify that the behavior of the segment mapping function is correctly influenced by the provided PolicyEnforcer.
public SimpleSegmentMapFnProcessor(final Query<?> query, final PolicyEnforcer policyEnforcer)
Hey, this is ready for review again, a few things to look for from last round:
gianm
left a comment
The new structure of PolicyEnforcer is imo a lot clearer.
* This should be called right before the segment is about to be processed by the query stack, and after
* {@link org.apache.druid.query.planning.ExecutionVertex#createSegmentMapFunction(PolicyEnforcer)}.
*/
default void validateOrElseThrow(PolicyEnforcer policyEnforcer)
There is a call missing at BroadcastJoinSegmentMapFnProcessor -- the path MSQ goes down when there is a segment-level join happening. Maybe move the PolicyEnforcer arg from ExecutionVertex to DataSource#createSegmentMapFunction itself?
Or perhaps have BroadcastJoinSegmentMapFnProcessor apply the enforcer on its own. If you go that route, please also add a comment to the javadoc for DataSource#createSegmentMapFunction reminding callers that if they are using this method instead of ExecutionVertex, they need to apply a PolicyEnforcer to the resulting segment.
Looking at BroadcastJoinSegmentMapFnProcessor, it converts InputNumberDataSource to InlineDataSource, but the enforcer won't have effects on inline data.
I managed to run some tests based on BroadcastJoin in MSQ, with the enforcer disabled on the DruidQuery side, and it actually catches a security validation failure in the scanning step when I use TableDataSource as either the left or right child. The join step (runWithInputChannel) doesn't enforce anything, because it uses a FrameSegment.
Looking at BroadcastJoinSegmentMapFnProcessor, it converts InputNumberDataSource to InlineDataSource, but the enforcer won't have effects on inline data.
It only inlines if the input number is in the inputNumberToProcessorChannelMap map. The base input won't be in that map, and that base input might be a regular table. I think in the simplest case where a regular table is joined with some subquery result, like select page, q.c from wikipedia, (select count(*) c from "kttm-v2-2019-08-25") q, the segments from wikipedia won't be checked since they'll be using the map function generated here.
I tried setting a breakpoint at validateOrElseThrow(ReferenceCountingSegment segment, Policy policy) and confirmed exactly that: for MSQ tasks, the breakpoint triggered for kttm-v2-2019-08-25 but not for wikipedia.
For Dart the situation was a little worse; the breakpoint for the validateOrElseThrow(ReferenceCountingSegment segment, Policy policy) method triggered also only for kttm-v2-2019-08-25, but then validate wasn't called because Dart has two layers of ReferenceCountingSegment, so the check baseSegment instanceof QueryableIndexSegment || baseSegment instanceof IncrementalIndexSegment is false.
IMO, the best approach to the first issue (segment map fn created by BroadcastJoinSegmentMapFnProcessor not having enforcement) is to move the PolicyEnforcer into DataSource#createSegmentMapFunction, just like withPolicies. That way we never need to think about whether a particular bare call to DataSource#createSegmentMapFunction is safe or not, because all calls would need the enforcer present.
For the second issue (Dart not validating properly because of double-wrapped segments) -- perhaps it would make sense to move validateOrElseThrow from ReferenceCountingSegment to Segment, and have the impl in ReferenceCountingSegment call getBaseSegment().validateOrElseThrow(policyEnforcer) rather than validateOrElseThrow(this, policyEnforcer)?
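To illustrate the suggestion above, here is a minimal, self-contained sketch (all types are hypothetical, simplified stand-ins for the real Druid classes, not the actual interfaces): moving validateOrElseThrow onto Segment, with the ReferenceCountingSegment implementation delegating to its base segment, means double-wrapping no longer hides the inner segment from validation.

```java
// Hypothetical stand-ins for Druid's classes, sketching the proposed
// delegation: wrappers forward validation to their base segment.
interface PolicyEnforcer {
    // Simplified: returns whether the named leaf segment passes the policy check.
    boolean validate(String segmentName);
}

interface Segment {
    // Proposed: validation lives on Segment itself.
    void validateOrElseThrow(PolicyEnforcer enforcer);
}

class QueryableIndexSegmentSketch implements Segment {
    final String name;
    QueryableIndexSegmentSketch(String name) { this.name = name; }
    @Override public void validateOrElseThrow(PolicyEnforcer enforcer) {
        if (!enforcer.validate(name)) {
            throw new IllegalStateException("Policy validation failed for " + name);
        }
    }
}

class ReferenceCountingSegmentSketch implements Segment {
    final Segment base;
    ReferenceCountingSegmentSketch(Segment base) { this.base = base; }
    @Override public void validateOrElseThrow(PolicyEnforcer enforcer) {
        // Delegate to the base segment rather than validating `this`, so
        // Ref(Ref(base)) still reaches the real leaf segment.
        base.validateOrElseThrow(enforcer);
    }
}

class DelegationDemo {
    public static void main(String[] args) {
        // Dart-style double wrapping: Ref(Ref(QueryableIndexSegment)).
        Segment s = new ReferenceCountingSegmentSketch(
            new ReferenceCountingSegmentSketch(new QueryableIndexSegmentSketch("wikipedia")));
        // Validation is delegated all the way down to the leaf segment.
        s.validateOrElseThrow(name -> true);
        System.out.println("validated through double wrapping");
    }
}
```

With this shape, the enforcer only ever sees leaf segments, so the number of wrapper layers becomes irrelevant.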
Thanks for the example, I was able to reproduce the error now. It looks like the right child is inlined, but it has already gone through a scan stage, and the current implementation misses the check on the left child.
I'm thinking we can ask BroadcastJoinSegmentMapFnProcessor to create an ExecutionVertex based on the query, and pass the enforcer into ExecutionVertex's createSegmentMapFunction, like so:
DataSource transformed = inlineChannelData(query.getDataSource());
ExecutionVertex ev = ExecutionVertex.of(query.withDataSource(transformed));
return ev.createSegmentMapFunction(policyEnforcer);
This way we can centralize the enforcer in ExecutionVertex without changing DataSource interface.
For the Dart issue, I didn't know it could wrap two layers of ReferenceCountingSegment. Maybe we can make the validation recursive? For a restricted segment, would the structure be
- Restrict
  - Ref
    - Ref_base
  - policy

or

- Ref
  - Restrict
    - Ref_base
    - policy

?
The changes to BroadcastJoinSegmentMapFnProcessor look good to me.
The implementation of PolicyEnforcer#validateOrElseThrow for ReferenceCountingSegment seems a little sketchy still, for a couple reasons:
- The "else skip validation" seems like a hole for stuff to fall into. Prior to the latest fix, an inner ReferenceCountingSegment did fall into it. I'm wondering if other things could potentially fall in, like extension segment types, new segment types, etc.
- The branch doing instanceof QueryableIndexSegment || instanceof IncrementalIndexSegment isn't strong enough to detect whether a Segment corresponds to a regular table, because extensions can add other kinds of segments backing tables. This is a specific kind of thing that could fall into the hole mentioned previously.
- I don't understand why the instanceof RestrictedSegment branch is needed -- is it really possible for RestrictedSegment to wrap another RestrictedSegment? I would think this is impossible given that RestrictedDataSource can only wrap a TableDataSource.
I observe that the DataSource version of validation is cleaner and more robust, and it can be because there are two important things DataSource has:
- DataSource#getChildren exists, so datasource trees can be walked robustly.
- We know that tables correspond to TableDataSource, so it's always possible to identify whether a leaf datasource is a regular table or not.
To have equally robust validation for Segment, I think we would need to add Segment#getChildren to address the first point. For the second point I think we can use a check like this for leaf segments to see if they correspond to regular tables: as(PhysicalSegmentInspector.class) != null.
I tried to think for a while of a robust way to write this code without adding Segment#getChildren and wasn't able to. So I do think we should do that. The default implementation should throw an unsupported operation error.
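A minimal sketch of how the proposed Segment#getChildren could close the "else skip validation" hole (the types here are hypothetical stand-ins, not Druid's real classes): the default implementation throws, wrappers return their delegates, and the validator walks the tree so every leaf is checked with no instanceof escape hatches.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the getChildren idea for segments.
interface SegmentNode {
    // Default throws, so a segment type that forgets to implement it fails
    // loudly instead of silently skipping validation.
    default List<SegmentNode> getChildren() {
        throw new UnsupportedOperationException("getChildren not implemented");
    }
}

class LeafSegment implements SegmentNode {
    final String name;
    LeafSegment(String name) { this.name = name; }
    @Override public List<SegmentNode> getChildren() { return Collections.emptyList(); }
}

class WrapperSegment implements SegmentNode {
    final SegmentNode delegate;
    WrapperSegment(SegmentNode delegate) { this.delegate = delegate; }
    @Override public List<SegmentNode> getChildren() { return Collections.singletonList(delegate); }
}

class TreeValidator {
    // Walk the tree and validate every leaf; leafCheck stands in for the enforcer.
    static void validateOrElseThrow(SegmentNode node, Predicate<SegmentNode> leafCheck) {
        List<SegmentNode> children = node.getChildren();
        if (children.isEmpty()) {
            if (!leafCheck.test(node)) {
                throw new IllegalStateException("Policy validation failed at leaf");
            }
        } else {
            for (SegmentNode child : children) {
                validateOrElseThrow(child, leafCheck);
            }
        }
    }
}
```

Because the default throws rather than skips, an extension segment type that doesn't implement getChildren surfaces immediately instead of silently bypassing enforcement.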
I think SegmentReference could serve the purpose of Segment#getChildren. The default impl is to throw, and every extension (ReferenceCountingSegment, HashJoinSegment, WrappedSegmentReference) needs to implement its own validation (basically asking the delegate to validate).
To solidify: can we eliminate the use cases where ReferenceCountingSegment is wrapped with another SegmentReference? I think it would be simpler if we know the structure is a RestrictedSegment wrapping a ReferenceCountingSegment, which wraps a basic segment (QueryableIndexSegment, LookupSegment, etc).
It seems tests using SpecificSegmentsQuerySegmentWalker follow this structure as well.
To solidify: can we eliminate the use cases where ReferenceCountingSegment is wrapped with another SegmentReference? I think it would be simpler if we know the structure is a RestrictedSegment wrapping a ReferenceCountingSegment, which wraps a basic segment (QueryableIndexSegment, LookupSegment, etc).
I did see double-wrapping (ReferenceCountingSegment on top of ReferenceCountingSegment) with Dart with the query select page, q.c from wikipedia, (select count(*) c from "kttm-v2-2019-08-25") q. However, I doubt this is "necessary". The Dart code could likely be adjusted to not do this. So, if it's helpful, go ahead and restrict things so ReferenceCountingSegment cannot wrap another SegmentReference.
To have equally robust validation for Segment, I think we would need to add Segment#getChildren to address the first point. For the second point I think we can use a check like this for leaf segments to see if they correspond to regular tables: as(PhysicalSegmentInspector.class) != null.
About this (using PhysicalSegmentInspector as a proxy for "Segment represents a regular table"), I thought about it some more and IMO this approach is still not ideal. Nothing in the interface says that it can't be implemented for a non-table, and nothing requires that it is implemented for a table.
I think a nicer idea would be to make Segment#getId nullable, and spec it so that it should return nonnull for regular tables backed by actual segments, null for anything else. This to me would make more sense than the current way getId() works, because currently dummy IDs are needed for Segment that aren't backed by actual segments, which seems odd. I skimmed usages of getId(); most seem to either be running in scenarios where it would always be nonnull anyway, or else are using it to interpolate into log messages (could use toString or asString instead).
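A tiny sketch of how the nullable-getId proposal might be consumed by a strict enforcer (this is a hypothetical illustration, not the actual Druid implementation; the method names are invented for the example):

```java
// Hypothetical sketch of the nullable Segment#getId proposal: a nonnull id
// marks a segment backed by a real table segment; null marks everything else
// (inline, lookup, external, frames).
final class SegmentIdCheck {
    static boolean isRegularTableSegment(String segmentId) {
        return segmentId != null;
    }

    // Under the proposal, only real table-backed segments would require a policy.
    static void validateLeaf(String segmentId, boolean hasPolicy) {
        if (isRegularTableSegment(segmentId) && !hasPolicy) {
            throw new IllegalStateException("Table-backed segment " + segmentId + " has no policy");
        }
    }
}
```

This would let the enforcer skip inline, lookup, and external segments by construction rather than by enumerating segment classes with instanceof checks.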
In the interests of having this PR merged before it grows too large, I suggest doing a super-strict check now -- validate all leaf segments regardless of whether they are actual tables or not. This will essentially mean that validation will always fail on lookups, external, and inline segments, because they will not have policies applied. In a follow-up we can add something to constrain validation to regular-table-backed Segments only, perhaps using this nullable Segment#getId idea.
Completely agree on the SegmentId approach; it would be a cleaner solution, and we can have a better mapping from the different datasource kinds (table, inline, lookup, external) to segmentId.
I reverted the double-wrapping check in commit 1e6632f, and then actually moved the no-double-wrapping check f604abf to another pr (#17943). Please take a look again!
…icyEnforcer should also deal with multiple layer wrapped segments/
…ce, and PolicyEnforcer now validates all segments, remove test cases for inline/lookup.
Description
Added PolicyEnforcer interface, NoopPolicyEnforcer and RestrictAllTablesPolicyEnforcer. It'd be configurable by druid.policy.enforcer, through PolicyModule.

PolicyEnforcer works in building DruidQuery and in the authorization step in QueryLifecycle, via withPolicies(Map<String, Optional<Policy>> policyMap, PolicyEnforcer policyEnforcer). Specifically, enforcer.validateOrElseThrow(TableDataSource ds, Policy policy) throws an exception if validation fails.

PolicyEnforcer is passed in ExecutionVertex.createSegmentMapFunction(PolicyEnforcer); when the mapFn is called, it validates the mapped segment via SegmentReference.validateOrElseThrow(PolicyEnforcer policyEnforcer). Specifically:
- HashJoinSegment and WrappedSegment call delegate.validateOrElseThrow
- RestrictedSegment and ReferenceCountingSegment call the enforcer to validate with a ReferenceCountingSegment and a policy (null for a ReferenceCountingSegment not wrapped in a RestrictedSegment)

As a singleton object, the enforcer is injected into QueryLifecycleFactory and TaskToolboxFactory, and makes its way down to SinkQuerySegmentWalker, FrameContext and QueryLifecycle.

Additionally:
- Updated ServerManagerTest to use Guice bindings (new test dependency), and TestSegmentCacheManager for loading segments.
- Added MSQTaskQueryMakerTest class for easy binding of the enforcer.

Key changed/added classes in this PR
- PolicyEnforcer
- NoopPolicyEnforcer (with test class too)
- RestrictAllTablesPolicyEnforcer (with test class too)
- PolicyModule (with test class too)
- ServerManagerTest
- MSQTaskQueryMakerTest

This PR has:
PolicyEnforcerNoopPolicyEnforcer(with test class too)RestrictAllTablesPolicyEnforcer(with test class too)PolicyModule(with test class too)ServerManagerTestMSQTaskQueryMakerTestThis PR has: