Skip to content

ParallelIndexSubTask: support ingestSegment in delegating factories#7089

Merged
jihoonson merged 1 commit intoapache:masterfrom
apollographql:glasser/ingestsegment-inject
Feb 24, 2019
Merged

ParallelIndexSubTask: support ingestSegment in delegating factories#7089
jihoonson merged 1 commit intoapache:masterfrom
apollographql:glasser/ingestsegment-inject

Conversation

@glasser
Copy link
Copy Markdown
Contributor

@glasser glasser commented Feb 19, 2019

IndexTask had special-cased code to properly send a TaskToolbox to a
IngestSegmentFirehoseFactory that's nested inside a CombiningFirehoseFactory,
but ParallelIndexSubTask didn't.

This change refactors IngestSegmentFirehoseFactory so that it doesn't need a
TaskToolbox; it instead gets a CoordinatorClient and a SegmentLoaderFactory
directly injected into it.

This also refactors SegmentLoaderFactory so it doesn't depend on
an injectable SegmentLoaderConfig, since its only method always
replaces the preconfigured SegmentLoaderConfig anyway.
This makes it possible to use SegmentLoaderFactory without setting
druid.segmentCaches.locations to some dummy value.

Another goal of this PR is to make it possible for IngestSegmentFirehoseFactory
to list data segments outside of connect() --- specifically, to make it a
FiniteFirehoseFactory which can query the coordinator in order to calculate its
splits. See #7048.

This also adds missing datasource name URL-encoding to an API used by
CoordinatorBasedSegmentHandoffNotifier.

@glasser glasser marked this pull request as ready for review February 19, 2019 07:30
@glasser
Copy link
Copy Markdown
Contributor Author

glasser commented Feb 19, 2019

Note that this PR is integration-tested by ITIndexerTest.

Please note two FIXME comments in the PR.

My intention is to squash the two commits of the PR together (and keep the PR description as the commit message). I thought reviewers might like to see my initial attempt which injected a SegmentLoaderFactory instead of directly creating a SegmentLoaderLocalCacheManager. That commit led to this error in ITIndexerTest:

Can not construct instance of org.apache.druid.indexing.firehose.IngestSegmentFirehoseFactory, problem: Unable to provision, see the following errors:

1) druid.segmentCache.locations - may not be empty
  at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:151) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> org.apache.druid.guice.StorageNodeModule)
  at org.apache.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:151) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> org.apache.druid.guice.StorageNodeModule)
  while locating com.google.common.base.Supplier<org.apache.druid.segment.loading.SegmentLoaderConfig>
  at org.apache.druid.guice.ConfigProvider.bind(JsonConfigProvider.java:152) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> org.apache.druid.guice.StorageNodeModule)
  while locating org.apache.druid.segment.loading.SegmentLoaderConfig
    for the 2nd parameter of org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.<init>(SegmentLoaderLocalCacheManager.java:67)
  while locating org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager
    for the 1st parameter of org.apache.druid.indexing.common.SegmentLoaderFactory.<init>(SegmentLoaderFactory.java:41)
  while locating org.apache.druid.indexing.common.SegmentLoaderFactory

1 error
 at [Source: HttpInputOverHTTP@3eff14c3[c=1623,q=0,[0]=null,s=STREAM]; line: 47, column: 13] (through reference chain: org.apache.druid.indexing.common.task.IndexTask["spec"]->org.apache.druid.indexing.common.task.IndexIngestionSpec["ioConfig"]->org.apache.druid.indexing.common.task.IndexIOConfig["firehose"])

@glasser
Copy link
Copy Markdown
Contributor Author

glasser commented Feb 19, 2019

OK, I had a better idea — it seemed like not too hard to improve SegmentLoaderFactory to be injectible without needing to inject SegmentLoaderConfig. Hopefully this works; I stayed up too late and don't want to wait for tests to run :)

@glasser
Copy link
Copy Markdown
Contributor Author

glasser commented Feb 19, 2019

Ok great, that worked... But I don't understand how to interpret the Team City report. Looks like tens of thousands of errors in files I didn't touch...

@glasser
Copy link
Copy Markdown
Contributor Author

glasser commented Feb 19, 2019

OK, looks like it somehow split my 4-commit PR into two changes (because I pushed twice?) The first adds 1 error and removes 2813. The second adds 2812 errors and removes 1. That... seems like net negative? (That one new added error is definitely a problem in an earlier part of my PR that is fixed on HEAD.)

Did it get confused because I rebased?

@jihoonson
Copy link
Copy Markdown
Contributor

@glasser thanks for raising a new PR. Sadly, TeamCity sometimes reports something weird. I restarted it.

Copy link
Copy Markdown
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@glasser thanks! I think this PR is definitely better!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's necessary after this PR, but would be worth to double check.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Do you think just removing this statement and running integration tests would be enough to double check?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it would be enough.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a difference between using SegmentListUsedAction and directly calling this API: users can configure the retry policy on failure for all remote task actions including SegmentListUsedAction. (I'm not sure this is intended, but the actual max number of retries for remote task actions is RetryPolicy.maxNumRetries * DruidLeaderClient.MAX_RETRIES).

Can we do the same thing for the coordinator client to provide the same experience?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, what are you proposing exactly?

(a) Should there be a separate retry policy config bound elsewhere, or should this reuse druid.peon.taskActionClient.retry
(b) Should the retry be performed inside this method here, or in the caller?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(a) I think we should reuse the same configuration name for backward compatibility.
(b) I think it's fine to retry in this method.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the whole RetryPolicy set of classes is defined in indexing-service, so I think I'll ignore your suggestion on (b) and do it in IngestSegmentFirehoseFactory.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine by moving CoordinatorClient to indexing-service along with CoordinatorBasedSegmentHandoffNotifier, CoordinatorBasedSegmentHandoffNotifierConfig, and CoordinatorBasedSegmentHandoffNotifierFactory , but it's up to you.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CoordinatorBasedSegmentHandoffNotifierFactory is used by RealtimeModule which seems to otherwise use very little (though nonzero) from indexing-service. I think doing this move feels a little intense to me and I'd like to leave it alone if that's OK.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by encoding url here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it's confusing that IntelliJ reformatted this part, which is unchanged other than my comment. I do think my settings are based on the Druid configuration...

In my new method below (getDatabaseSegmentDataSourceSegments) it was necessary for me to write StringUtils.urlEncode(dataSource) in order to get the integration tests (which helpfully put a bunch of non-ASCII characters in the data source name) to pass.

isHandOffComplete above also does that. But this method does not. So that makes me think that this method has a bug (unrelated to the work in this PR).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah dataSource should be encoded. Thanks for catching it!

@glasser
Copy link
Copy Markdown
Contributor Author

glasser commented Feb 21, 2019

OK, responded to your comments and Travis is passing. Commits are intended to be squashed; the PR description has been updated to be a good commit description.

Copy link
Copy Markdown
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I left a couple of trivial comments. Thanks @glasser!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, but I feel like we can use RetryUtils.retry() to remove duplicate codes even though the retry logic is slightly different. It's up to you.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the logic in RetryUtils is somewhat different and not configurable. If we want to keep the compatibility I'd rather keep it this way. A nice little project might be to move RetryPolicy and friends into core along with RetryUtils and then have a version of RetryUtils.retry that takes a RetryPolicyConfig or something — actually maybe just dropping RetryPolicy and RetryPolicyFactory entirely? But I feel like this PR refactors enough already.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Integrating RetryPolicy and RetryUtils sounds good, but should be done in a separate PR.

@jihoonson
Copy link
Copy Markdown
Contributor

I restarted teamcity.

@glasser
Copy link
Copy Markdown
Contributor Author

glasser commented Feb 22, 2019

Yay, all checks passed!

@jihoonson
Copy link
Copy Markdown
Contributor

Nice! I'll merge this PR tonight unless there's any other opinion.

IndexTask had special-cased code to properly send a TaskToolbox to a
IngestSegmentFirehoseFactory that's nested inside a CombiningFirehoseFactory,
but ParallelIndexSubTask didn't.

This change refactors IngestSegmentFirehoseFactory so that it doesn't need a
TaskToolbox; it instead gets a CoordinatorClient and a SegmentLoaderFactory
directly injected into it.

This also refactors SegmentLoaderFactory so it doesn't depend on
an injectable SegmentLoaderConfig, since its only method always
replaces the preconfigured SegmentLoaderConfig anyway.
This makes it possible to use SegmentLoaderFactory without setting
druid.segmentCaches.locations to some dummy value.

Another goal of this PR is to make it possible for IngestSegmentFirehoseFactory
to list data segments outside of connect() --- specifically, to make it a
FiniteFirehoseFactory which can query the coordinator in order to calculate its
splits. See #7048.

This also adds missing datasource name URL-encoding to an API used by
CoordinatorBasedSegmentHandoffNotifier.
@glasser
Copy link
Copy Markdown
Contributor Author

glasser commented Feb 23, 2019

All I did is squash it and Team City got sad :(

@jihoonson
Copy link
Copy Markdown
Contributor

We usually squash commits when the PR is merged (please check https://github.com/apache/incubator-druid/blob/master/CONTRIBUTING.md#if-your-pull-request-shows-conflicts-with-master).

I'm going to merge this PR now. Thanks @glasser!

@jihoonson jihoonson merged commit 1c2753a into apache:master Feb 24, 2019
@jihoonson jihoonson added this to the 0.15.0 milestone May 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants