ParallelIndexSubTask: support ingestSegment in delegating factories#7089
ParallelIndexSubTask: support ingestSegment in delegating factories#7089jihoonson merged 1 commit intoapache:masterfrom apollographql:glasser/ingestsegment-inject
Conversation
|
Note that this PR is integration-tested by ITIndexerTest. Please note two FIXME comments in the PR. My intention is to squash the two commits of the PR together (and keep the PR description as the commit message). I thought reviewers might like to see my initial attempt which injected a SegmentLoaderFactory instead of directly creating a SegmentLoaderLocalCacheManager. That commit led to this error in ITIndexerTest: |
|
OK, I had a better idea — it seemed like not too hard to improve SegmentLoaderFactory to be injectible without needing to inject SegmentLoaderConfig. Hopefully this works; I stayed up too late and don't want to wait for tests to run :) |
|
Ok great, that worked... But I don't understand how to interpret the Team City report. Looks like tens of thousands of errors in files I didn't touch... |
|
OK, looks like it somehow split my 4-commit PR into two changes (because I pushed twice?) The first adds 1 error and removes 2813. The second adds 2812 errors and removes 1. That... seems like net negative? (That one new added error is definitely a problem in an earlier part of my PR that is fixed on HEAD.) Did it get confused because I rebased? |
|
@glasser thanks for raising a new PR. Sadly, TeamCity sometimes reports something weird. I restarted it. |
There was a problem hiding this comment.
I don't think it's necessary after this PR, but would be worth to double check.
There was a problem hiding this comment.
OK. Do you think just removing this statement and running integration tests would be enough to double check?
There was a problem hiding this comment.
Yeah, I think it would be enough.
There was a problem hiding this comment.
There's a difference between using SegmentListUsedAction and directly calling this API: users can configure the retry policy on failure for all remote task actions including SegmentListUsedAction. (I'm not sure this is intended, but the actual max number of retries for remote task actions is RetryPolicy.maxNumRetries * DruidLeaderClient.MAX_RETRIES).
Can we do the same thing for the coordinator client to provide the same experience?
There was a problem hiding this comment.
OK, what are you proposing exactly?
(a) Should there be a separate retry policy config bound elsewhere, or should this reuse druid.peon.taskActionClient.retry
(b) Should the retry be performed inside this method here, or in the caller?
There was a problem hiding this comment.
(a) I think we should reuse the same configuration name for backward compatibility.
(b) I think it's fine to retry in this method.
There was a problem hiding this comment.
Hmm, the whole RetryPolicy set of classes is defined in indexing-service, so I think I'll ignore your suggestion on (b) and do it in IngestSegmentFirehoseFactory.
There was a problem hiding this comment.
I'm fine by moving CoordinatorClient to indexing-service along with CoordinatorBasedSegmentHandoffNotifier, CoordinatorBasedSegmentHandoffNotifierConfig, and CoordinatorBasedSegmentHandoffNotifierFactory , but it's up to you.
There was a problem hiding this comment.
CoordinatorBasedSegmentHandoffNotifierFactory is used by RealtimeModule which seems to otherwise use very little (though nonzero) from indexing-service. I think doing this move feels a little intense to me and I'd like to leave it alone if that's OK.
There was a problem hiding this comment.
What do you mean by encoding url here?
There was a problem hiding this comment.
Hmm, it's confusing that IntelliJ reformatted this part, which is unchanged other than my comment. I do think my settings are based on the Druid configuration...
In my new method below (getDatabaseSegmentDataSourceSegments) it was necessary for me to write StringUtils.urlEncode(dataSource) in order to get the integration tests (which helpfully put a bunch of non-ASCII characters in the data source name) to pass.
isHandOffComplete above also does that. But this method does not. So that makes me think that this method has a bug (unrelated to the work in this PR).
There was a problem hiding this comment.
Oh, yeah dataSource should be encoded. Thanks for catching it!
|
OK, responded to your comments and Travis is passing. Commits are intended to be squashed; the PR description has been updated to be a good commit description. |
There was a problem hiding this comment.
This looks good to me, but I feel like we can use RetryUtils.retry() to remove duplicate codes even though the retry logic is slightly different. It's up to you.
There was a problem hiding this comment.
It seems like the logic in RetryUtils is somewhat different and not configurable. If we want to keep the compatibility I'd rather keep it this way. A nice little project might be to move RetryPolicy and friends into core along with RetryUtils and then have a version of RetryUtils.retry that takes a RetryPolicyConfig or something — actually maybe just dropping RetryPolicy and RetryPolicyFactory entirely? But I feel like this PR refactors enough already.
There was a problem hiding this comment.
I agree. Integrating RetryPolicy and RetryUtils sounds good, but should be done in a separate PR.
|
I restarted teamcity. |
|
Yay, all checks passed! |
|
Nice! I'll merge this PR tonight unless there's any other opinion. |
IndexTask had special-cased code to properly send a TaskToolbox to a IngestSegmentFirehoseFactory that's nested inside a CombiningFirehoseFactory, but ParallelIndexSubTask didn't. This change refactors IngestSegmentFirehoseFactory so that it doesn't need a TaskToolbox; it instead gets a CoordinatorClient and a SegmentLoaderFactory directly injected into it. This also refactors SegmentLoaderFactory so it doesn't depend on an injectable SegmentLoaderConfig, since its only method always replaces the preconfigured SegmentLoaderConfig anyway. This makes it possible to use SegmentLoaderFactory without setting druid.segmentCaches.locations to some dummy value. Another goal of this PR is to make it possible for IngestSegmentFirehoseFactory to list data segments outside of connect() --- specifically, to make it a FiniteFirehoseFactory which can query the coordinator in order to calculate its splits. See #7048. This also adds missing datasource name URL-encoding to an API used by CoordinatorBasedSegmentHandoffNotifier.
|
All I did is squash it and Team City got sad :( |
|
We usually squash commits when the PR is merged (please check https://github.com/apache/incubator-druid/blob/master/CONTRIBUTING.md#if-your-pull-request-shows-conflicts-with-master). I'm going to merge this PR now. Thanks @glasser! |
IndexTask had special-cased code to properly send a TaskToolbox to a
IngestSegmentFirehoseFactory that's nested inside a CombiningFirehoseFactory,
but ParallelIndexSubTask didn't.
This change refactors IngestSegmentFirehoseFactory so that it doesn't need a
TaskToolbox; it instead gets a CoordinatorClient and a SegmentLoaderFactory
directly injected into it.
This also refactors SegmentLoaderFactory so it doesn't depend on
an injectable SegmentLoaderConfig, since its only method always
replaces the preconfigured SegmentLoaderConfig anyway.
This makes it possible to use SegmentLoaderFactory without setting
druid.segmentCaches.locations to some dummy value.
Another goal of this PR is to make it possible for IngestSegmentFirehoseFactory
to list data segments outside of connect() --- specifically, to make it a
FiniteFirehoseFactory which can query the coordinator in order to calculate its
splits. See #7048.
This also adds missing datasource name URL-encoding to an API used by
CoordinatorBasedSegmentHandoffNotifier.