Simplify concurrent streaming ingestion with replace #15844
Closed
AmatyaAvadhanula wants to merge 1 commit into apache:master
Closing this in favor of a different approach in #16144
This PR aims to simplify how a streaming ingestion job runs concurrently with a replacing commit.
The original approach required updating the pending segments and sending the resulting mapping to the peon, which is error-prone and leaves room for uncertainty.
The new approach is as follows:
When a segment is upgraded to another segment of a higher version, the original segment id is stored as a descriptor in the newly created segment.
The CachingClusteredClient on the Broker then includes all overshadowed realtime segments (if any), except those whose higher-version counterparts already exist on historicals. This filtered set of realtime segments is then added to the segments returned by the lookup of the complete timeline, and together they form the set of segments to be processed.
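The two steps above can be illustrated with a minimal sketch. It is not the code in this PR: the `Segment` class, its `upgradedFromId` field, and both method names are hypothetical stand-ins, and it assumes that "higher versions which already exist on historicals" means an upgraded segment on a historical whose stored original id matches the realtime segment.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these types or methods are Druid classes.
final class UpgradedSegmentFilterSketch
{
  /** Minimal stand-in for a segment known to the Broker. */
  static final class Segment
  {
    final String id;
    // Id of the segment this one was upgraded from; null if it was not created by an upgrade.
    final String upgradedFromId;

    Segment(String id, String upgradedFromId)
    {
      this.id = id;
      this.upgradedFromId = upgradedFromId;
    }
  }

  /**
   * Keeps only the overshadowed realtime segments whose upgraded
   * (higher-version) copy is not yet served by a historical.
   */
  static List<Segment> realtimeSegmentsToInclude(
      List<Segment> overshadowedRealtime,
      List<Segment> onHistoricals
  )
  {
    // Collect the original ids recorded on upgraded segments that historicals already serve.
    Set<String> alreadyUpgraded = new HashSet<>();
    for (Segment segment : onHistoricals) {
      if (segment.upgradedFromId != null) {
        alreadyUpgraded.add(segment.upgradedFromId);
      }
    }

    List<Segment> result = new ArrayList<>();
    for (Segment segment : overshadowedRealtime) {
      if (!alreadyUpgraded.contains(segment.id)) {
        result.add(segment);
      }
    }
    return result;
  }

  /** Segments to process: the timeline lookup result plus the filtered realtime segments. */
  static List<Segment> segmentsToProcess(
      List<Segment> timelineLookup,
      List<Segment> overshadowedRealtime,
      List<Segment> onHistoricals
  )
  {
    List<Segment> all = new ArrayList<>(timelineLookup);
    all.addAll(realtimeSegmentsToInclude(overshadowedRealtime, onHistoricals));
    return all;
  }
}
```

In this sketch the descriptor alone decides whether a realtime segment is still needed, so no mapping has to be sent to the peon. Skipping realtime segments whose upgraded copy is already on a historical avoids processing the same rows twice.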
TODO:
Evaluate performance impact on queries
Clean up the pending segment and task communication methods used in the original approach
This PR has:
been self-reviewed.
added documentation for new or modified features or behaviors.
a release note entry in the PR description.
added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
added or updated version, license, or notice information in licenses.yaml.
added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
added integration tests.
been tested in a test Druid cluster.