[do not merge: see #7089 instead] ParallelIndexSubTask: support ingestSegment in delegating factories#7063
glasser wants to merge 2 commits into apache:master from apollographql:glasser/ingestsegment-in-combining
Would you please check the build failure? Here's the error.
CI passed (at least the Travis one).
Hmm, there are a few other delegating factories that should implement this too.
Do you have any plans to add anything else in addition to taskToolBox? If not, it looks too general to me.
Also, what do you think about adding a new parameter for taskToolBox to connect()? Even though it's currently used only in IngestSegmentFirehoseFactory.connect(), the delegating firehoseFactories should pass it along, so I think it makes sense.
I'm fine to just make it setTaskToolbox. But I had trouble making the argument be declared TaskToolbox rather than Object due to which module TaskToolbox is declared in.
Re adding it to connect(), that won't help for the use case where we want a TaskToolbox to determine splits (which is my actual motivation here)...
IndexTask had special-cased code to properly send a TaskToolbox to an IngestSegmentFirehoseFactory that's nested inside a CombiningFirehoseFactory, but ParallelIndexSubTask didn't. This commit generalizes the concept to an optional setContext method on FirehoseFactory that CombiningFirehoseFactory and IngestSegmentFirehoseFactory implement. Also pass the context through for ClippedFirehoseFactory, FixedCountFirehoseFactory, and TimedShutoffFirehoseFactory; that is, allow IngestSegmentFirehoseFactory to be used from within these wrappers inside both IndexTask and ParallelIndexSubTask.
De-generalized setContext to setTaskToolbox. CI is passing.
 * method signature uses Object so that FirehoseFactories don't all have to be inside the
 * indexing-service module.
 */
default void setTaskToolbox(Object taskToolbox)
I'm really not sure how we can make this better without a huge refactoring. Let's open an issue about it after this PR.
A potential other approach: instead of setting a TaskToolbox, just inject ( The only other thing you need from the toolbox is the SegmentLoader, which maybe can also be injected? It's nice how the DataSegments returned from the coordinator are self-describing: you can even imagine configuring this to talk to an unrelated Druid cluster, as long as you have the proper permissions to download segments from deep storage.
(Or even just inject the IndexerMetadataStorageCoordinator directly? I don't quite understand what's going on in CliPeon's configureTaskActionClient with respect to local vs remote.)
I think #7089 is a better approach to solving this problem.
Closing in favor of #7089.
IndexTask had special-cased code to properly send a TaskToolbox to an
IngestSegmentFirehoseFactory that's nested inside a CombiningFirehoseFactory,
but ParallelIndexSubTask didn't.
This commit generalizes the concept to an optional setTaskToolbox method on
FirehoseFactory that CombiningFirehoseFactory and IngestSegmentFirehoseFactory
implement.
Also pass the toolbox through for ClippedFirehoseFactory,
FixedCountFirehoseFactory, and TimedShutoffFirehoseFactory; that is, allow
IngestSegmentFirehoseFactory to be used from within these wrappers inside
both IndexTask and ParallelIndexSubTask.
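
For readers skimming the thread, the delegation pattern described above could be sketched roughly as follows. This is not the PR's actual diff: the interface is a cut-down stand-in for FirehoseFactory, and LeafFactory, WrapperFactory, CombiningFactory, and PlainFactory are hypothetical names standing in for IngestSegmentFirehoseFactory, a single-delegate wrapper such as ClippedFirehoseFactory, CombiningFirehoseFactory, and any factory that ignores the toolbox. Only the setTaskToolbox(Object) signature comes from the review excerpt.

```java
import java.util.Arrays;
import java.util.List;

public class ToolboxDelegationSketch {
    // Cut-down stand-in for FirehoseFactory (assumption: the real interface has more methods).
    interface FirehoseFactory {
        // No-op by default, so factories that don't need a toolbox are unaffected.
        // Typed as Object, as in the PR, so that implementations don't all have to
        // live in the indexing-service module.
        default void setTaskToolbox(Object taskToolbox) {}
    }

    // Hypothetical stand-in for IngestSegmentFirehoseFactory: it keeps the toolbox.
    static class LeafFactory implements FirehoseFactory {
        Object taskToolbox;

        @Override
        public void setTaskToolbox(Object taskToolbox) {
            this.taskToolbox = taskToolbox;
        }
    }

    // Hypothetical stand-in for a factory that never uses a toolbox.
    static class PlainFactory implements FirehoseFactory {}

    // Hypothetical stand-in for a single-delegate wrapper (e.g. ClippedFirehoseFactory).
    static class WrapperFactory implements FirehoseFactory {
        private final FirehoseFactory delegate;

        WrapperFactory(FirehoseFactory delegate) {
            this.delegate = delegate;
        }

        @Override
        public void setTaskToolbox(Object taskToolbox) {
            delegate.setTaskToolbox(taskToolbox); // pass through unchanged
        }
    }

    // Hypothetical stand-in for CombiningFirehoseFactory: forwards to every delegate.
    static class CombiningFactory implements FirehoseFactory {
        private final List<FirehoseFactory> delegates;

        CombiningFactory(List<FirehoseFactory> delegates) {
            this.delegates = delegates;
        }

        @Override
        public void setTaskToolbox(Object taskToolbox) {
            for (FirehoseFactory delegate : delegates) {
                delegate.setTaskToolbox(taskToolbox);
            }
        }
    }

    public static void main(String[] args) {
        LeafFactory leaf = new LeafFactory();
        FirehoseFactory root =
            new WrapperFactory(new CombiningFactory(Arrays.asList(new PlainFactory(), leaf)));

        Object toolbox = new Object(); // placeholder for a TaskToolbox
        // A task would make this one call on the outermost factory, whatever its type.
        root.setTaskToolbox(toolbox);

        System.out.println(leaf.taskToolbox == toolbox); // prints: true
    }
}
```

With this shape, a task needs no instanceof checks for particular wrapper types: it calls setTaskToolbox once on whatever factory it was given, and each wrapper is responsible for forwarding the call.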