Skip to content

[do not merge: see #7089 instead] ParallelIndexSubTask: support ingestSegment in delegating factories#7063

Closed
glasser wants to merge 2 commits intoapache:masterfrom
apollographql:glasser/ingestsegment-in-combining
Closed

[do not merge: see #7089 instead] ParallelIndexSubTask: support ingestSegment in delegating factories#7063
glasser wants to merge 2 commits intoapache:masterfrom
apollographql:glasser/ingestsegment-in-combining

Conversation

@glasser
Copy link
Copy Markdown
Contributor

@glasser glasser commented Feb 13, 2019

IndexTask had special-cased code to properly send a TaskToolbox to a
IngestSegmentFirehoseFactory that's nested inside a CombiningFirehoseFactory,
but ParallelIndexSubTask didn't.

This commit generalizes the concept to an optional setTaskToolbox method on
FirehoseFactory that CombiningFirehoseFactory and IngestSegmentFirehoseFactory
implement.

Also pass the toolbox through for ClippedFirehoseFactory,
FixedCountFirehoseFactory, and TimedShutoffFirehoseFactory --- ie, allow
IngestSegmentFirehoseFactory to be used from within these wrappers inside
both IndexTask and ParallelIndexSubTask.

@jihoonson
Copy link
Copy Markdown
Contributor

jihoonson commented Feb 13, 2019

Would you please check the build failure? Here's the error.

[ERROR] /home/travis/build/apache/incubator-druid/indexing-service/src/main/java/org/apache/druid/indexing/firehose/IngestSegmentFirehoseFactory.java:128: Line matches the illegal pattern 'Use "string".equals(javaVar) and STRING_CONSTANT.equals(javaVar) instead of javaVar.equals("string") and javaVar.equals(STRING_CONSTANT)'. [Regexp]

@glasser
Copy link
Copy Markdown
Contributor Author

glasser commented Feb 13, 2019

CI passed (at least the Travis one).

@glasser
Copy link
Copy Markdown
Contributor Author

glasser commented Feb 13, 2019

Hmm, there are a few other delegating factories that should implement this too.

@glasser glasser changed the title ParallelIndexSubTask: support ingestSegment wrapped in combining ParallelIndexSubTask: support ingestSegment in delegating factories Feb 13, 2019
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any plan to add other stuffs in addition to taskToolBox? If not, it looks too general to me.

Also, what do you think about adding a new parameter for taskToolBox to connect()? Even though It's currently used only in IngestSegmentFirehoseFactory.connect(), the delegating firehoseFactories should pass them, so I think it makes sense.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine to just make it setTaskToolbox. But I had trouble making the argument be declared TaskToolbox rather than Object due to which module TaskToolbox is declared in.

Re adding it to connect(), that won't help for the use case where we want a TaskToolbox to determine splits (which is my actual motivation here)...

IndexTask had special-cased code to properly send a TaskToolbox to a
IngestSegmentFirehoseFactory that's nested inside a CombiningFirehoseFactory,
but ParallelIndexSubTask didn't.

This commit generalizes the concept to an optional setContext method on
FirehoseFactory that CombiningFirehoseFactory and IngestSegmentFirehoseFactory
implement.

Also pass the context through for ClippedFirehoseFactory,
FixedCountFirehoseFactory, and TimedShutoffFirehoseFactory --- ie, allow
IngestSegmentFirehoseFactory to be used from within these wrappers inside
both IndexTask and ParallelIndexSubTask.
@glasser
Copy link
Copy Markdown
Contributor Author

glasser commented Feb 14, 2019

De-generalized setContext to setTaskToolbox. CI is passing.

Copy link
Copy Markdown
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @glasser! The latest change looks good to me, but please consider my comment.

* method signature uses Object so that FirehoseFactories don't all have to be inside the
* indexing-service module.
*/
default void setTaskToolbox(Object taskToolbox)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really not sure how we can make this better without huge refactoring.. Let's open an issue about it after this PR.

@glasser
Copy link
Copy Markdown
Contributor Author

glasser commented Feb 16, 2019

A potential other approach: instead of setting a TaskToolbox, just inject (@JacksonInject? still learning Jackon/Guice) a CoordinatorClient, which can learn how to POST /druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments?full. That's equivalent to the SegmentListUsedAction.

The only other thing you need from the toolbox is the SegmentLoader, which maybe can also be injected?

It's nice how the DataSegments returned from the coordinator are self-describing: you can even imagine configuring this to talk to an unrelated Druid cluster, as long as you have the proper permissions to download segments from deep storage.

@glasser
Copy link
Copy Markdown
Contributor Author

glasser commented Feb 16, 2019

(Or even just inject the IndexerMetadataStorageCoordinator directly? I don't quite understand what's going on in CliPeon's configureTaskActionClient with respect to local vs remote.)

@glasser
Copy link
Copy Markdown
Contributor Author

glasser commented Feb 19, 2019

I think #7089 is a better approach to solving this problem.

@glasser glasser changed the title ParallelIndexSubTask: support ingestSegment in delegating factories [do not merge] ParallelIndexSubTask: support ingestSegment in delegating factories Feb 20, 2019
@glasser glasser changed the title [do not merge] ParallelIndexSubTask: support ingestSegment in delegating factories [do not merge: see #7089 instead] ParallelIndexSubTask: support ingestSegment in delegating factories Feb 20, 2019
@glasser
Copy link
Copy Markdown
Contributor Author

glasser commented Feb 20, 2019

Closing in favor of #7089.

@glasser glasser closed this Feb 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants