Change default handoffConditionTimeout to 15 minutes.#14539
Merged
gianm merged 4 commits intoapache:masterfrom Jul 13, 2023
Merged
Change default handoffConditionTimeout to 15 minutes.#14539gianm merged 4 commits intoapache:masterfrom
gianm merged 4 commits intoapache:masterfrom
Conversation
Most of the time, when handoff is taking this long, it's because something is preventing Historicals from loading new data. In this case, we have two choices: 1) Stop making progress on ingestion, wait for Historicals to load stuff, and keep the waiting-for-handoff segments available on realtime tasks. (handoffConditionTimeout = 0, the current default) 2) Continue making progress on ingestion, by exiting the realtime tasks that were waiting for handoff. Once the Historicals get their act together, the segments will be loaded, as they are still there on deep storage. They will just not be continuously available. (handoffConditionTimeout > 0) I believe most users would prefer [2], because [1] risks ingestion falling behind the stream, which causes many other problems. It can cause data loss if the stream ages-out data before we have a chance to ingest it. Due to the way tuningConfigs are serialized -- defaults are baked into the serialized form that is written to the database -- this default change will not change anyone's existing supervisors. It will take effect for newly created supervisors.
Contributor
|
Thinking a bit about this change and discussing it with @kfaraz, it probably is ok to make this change. Initially, I was worried about this leading to holes in the data and recovery. But since it's only the hand-off, those holes will be temporary since the segment has been created. |
abhishekagarwal87
approved these changes
Jul 10, 2023
ektravel
reviewed
Jul 12, 2023
ektravel
reviewed
Jul 12, 2023
ektravel
approved these changes
Jul 12, 2023
Contributor
ektravel
left a comment
There was a problem hiding this comment.
A couple of minor nits.
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
sergioferragut
pushed a commit
to sergioferragut/druid
that referenced
this pull request
Jul 21, 2023
* Change default handoffConditionTimeout to 15 minutes. Most of the time, when handoff is taking this long, it's because something is preventing Historicals from loading new data. In this case, we have two choices: 1) Stop making progress on ingestion, wait for Historicals to load stuff, and keep the waiting-for-handoff segments available on realtime tasks. (handoffConditionTimeout = 0, the current default) 2) Continue making progress on ingestion, by exiting the realtime tasks that were waiting for handoff. Once the Historicals get their act together, the segments will be loaded, as they are still there on deep storage. They will just not be continuously available. (handoffConditionTimeout > 0) I believe most users would prefer [2], because [1] risks ingestion falling behind the stream, which causes many other problems. It can cause data loss if the stream ages-out data before we have a chance to ingest it. Due to the way tuningConfigs are serialized -- defaults are baked into the serialized form that is written to the database -- this default change will not change anyone's existing supervisors. It will take effect for newly created supervisors. * Fix tests. * Update docs/development/extensions-core/kafka-supervisor-reference.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/kinesis-ingestion.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> --------- Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Most of the time, when handoff is taking this long, it's because something is preventing Historicals from loading new data. In this case, we have two choices:
Stop making progress on ingestion, wait for Historicals to load stuff,
and keep the waiting-for-handoff segments available on realtime tasks.
(handoffConditionTimeout = 0, the current default)
Continue making progress on ingestion, by exiting the realtime tasks
that were waiting for handoff. Once the Historicals get their act
together, the segments will be loaded, as they are still there on
deep storage. They will just not be continuously available.
(handoffConditionTimeout > 0)
I believe most users would prefer [2], because [1] risks ingestion falling behind the stream, which causes many other problems. It can cause data loss if the stream ages-out data before we have a chance to ingest it.
Due to the way tuningConfigs are serialized -- defaults are baked into the serialized form that is written to the database -- this default change will not change anyone's existing supervisors. It will take effect for newly created supervisors.