Avoid deletion of load/drop entry from CuratorLoadQueuePeon in case of load timeout #10213
Changes to `SegmentReplicantLookup`:

```diff
@@ -36,9 +36,15 @@
   */
  public class SegmentReplicantLookup
  {
-   public static SegmentReplicantLookup make(DruidCluster cluster)
+   public static SegmentReplicantLookup make(DruidCluster cluster, boolean replicateAfterLoadTimeout)
    {
      final Table<SegmentId, String, Integer> segmentsInCluster = HashBasedTable.create();
+
+     /**
+      * For each tier, this stores the number of replicants for all the segments presently queued to load in {@link cluster}.
+      * Segments that have failed to load due to the load timeout may not be present in this table if {@link replicateAfterLoadTimeout} is true.
+      * This is to enable additional replication of the timed out segments for improved availability.
+      */
      final Table<SegmentId, String, Integer> loadingSegments = HashBasedTable.create();

      for (SortedSet<ServerHolder> serversByType : cluster.getSortedHistoricalsByTier()) {
@@ -59,7 +65,11 @@ public static SegmentReplicantLookup make(DruidCluster cluster)
          if (numReplicants == null) {
            numReplicants = 0;
          }
-         loadingSegments.put(segment.getId(), server.getTier(), numReplicants + 1);
+         // Timed out segments need to be replicated in another server for faster availability.
+         // Therefore we skip incrementing numReplicants for timed out segments if replicateAfterLoadTimeout is enabled.
+         if (!replicateAfterLoadTimeout || !serverHolder.getPeon().getTimedOutSegments().contains(segment)) {
+           loadingSegments.put(segment.getId(), server.getTier(), numReplicants + 1);
+         }
        }
      }
    }
```
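The counting logic in the diff can be sketched in isolation. The following is a minimal, self-contained model, not Druid's actual code: plain nested `Map`s stand in for Guava's `HashBasedTable`, segments are reduced to id strings, and the timed-out set stands in for `peon.getTimedOutSegments()`. It only illustrates the skip-on-timeout branch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LoadingSegmentCounter
{
  /**
   * Counts queued replicants per (segmentId, tier). When replicateAfterLoadTimeout
   * is enabled, segments that timed out while loading are NOT counted as "loading",
   * so the coordinator sees them as under-replicated and can schedule another copy
   * on a different server.
   */
  public static Map<String, Map<String, Integer>> count(
      List<String[]> queuedLoads,       // each entry: {segmentId, tier}
      Set<String> timedOutSegments,     // segment ids that exceeded the load timeout
      boolean replicateAfterLoadTimeout
  )
  {
    Map<String, Map<String, Integer>> loadingSegments = new HashMap<>();
    for (String[] load : queuedLoads) {
      String segmentId = load[0];
      String tier = load[1];
      // Mirrors the diff: skip incrementing the count for timed out segments
      // if replicateAfterLoadTimeout is enabled.
      if (!replicateAfterLoadTimeout || !timedOutSegments.contains(segmentId)) {
        loadingSegments
            .computeIfAbsent(segmentId, id -> new HashMap<>())
            .merge(tier, 1, Integer::sum);
      }
    }
    return loadingSegments;
  }
}
```

With the flag enabled, a segment that timed out on one server simply disappears from the loading table, which is exactly what lets the replication logic assign an extra replica elsewhere.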
**Contributor:**
> As @himanshug pointed out in #10193 (comment), there could be two types of slow segment loading. This particular change in …

**Contributor (Author):**
> @jihoonson @himanshug Would it make sense to make the replication behavior user configurable? We could have a dynamic config like …

**Member:**
> Adding a config seems reasonable to me 👍

**Contributor:**
> It sounds good to me too.

**Contributor (Author):**
> Added a config. I've set …

**Contributor:**
> It sounds good to me to preserve the existing behavior by default.
> I'm not sure why we don't emit exceptions currently (using `EmittingLogger.makeAlert()`), but should we? At least for the segment loading timeout error, it would be nice to emit those errors so that cluster operators can notice there is something going wrong with segment loading.

> Alerting sounds like a good idea, but my concern is that since the alert would happen per segment, slowness on the historical side can generate a large number of alerts for a fairly large cluster. What do you think?

> Also, as a followup PR I was planning to add the timed-out segment list to `/druid/coordinator/v1/loadqueue`, along with some docs about its usage in understanding the cluster behavior.

> I think it's a valid concern. We may be able to emit those exceptions in bulk if they are thrown within a short time frame. I believe this should be done in a separate PR even if we want it, and thus my comment is not a blocker for this PR.

> Thanks. It sounds good to me.
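The bulk-emission idea floated above could look roughly like the following sketch: a time-windowed aggregator that collapses per-segment load-timeout errors into a single rolled-up alert. This is a hypothetical illustration under stated assumptions — the `Consumer<String>` sink stands in for Druid's `EmittingLogger.makeAlert()`, and the windowing scheme is an invention for the example, not anything in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BulkAlertAggregator
{
  private final List<String> pendingSegmentIds = new ArrayList<>();
  private final Consumer<String> alertSink;  // stands in for EmittingLogger.makeAlert()
  private final long windowMillis;
  private long windowStart;

  public BulkAlertAggregator(Consumer<String> alertSink, long windowMillis, long nowMillis)
  {
    this.alertSink = alertSink;
    this.windowMillis = windowMillis;
    this.windowStart = nowMillis;
  }

  /**
   * Records one load-timeout failure. Instead of alerting per segment (which
   * could flood operators on a large cluster), a single rolled-up alert is
   * emitted once the current time window has elapsed.
   */
  public void onLoadTimeout(String segmentId, long nowMillis)
  {
    pendingSegmentIds.add(segmentId);
    if (nowMillis - windowStart >= windowMillis) {
      alertSink.accept(
          "Segment load timed out for " + pendingSegmentIds.size()
          + " segment(s): " + pendingSegmentIds
      );
      pendingSegmentIds.clear();
      windowStart = nowMillis;
    }
  }
}
```

A design like this trades alert latency (up to one window) for volume: a historical that times out on hundreds of segments produces one alert per window rather than hundreds, which addresses the concern raised in the thread.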