Allocate pending segments at latest committed version #15459
Merged: abhishekagarwal87 merged 2 commits into apache:master on Dec 14, 2023
AmatyaAvadhanula approved these changes on Dec 6, 2023
Contributor: Could you please tick all the relevant items from the checklist at the end of the PR description?
Contributor (Author): Thanks for the review, @AmatyaAvadhanula!
Background
The segment allocation algorithm reuses an already allocated pending segment if the new allocation request is made for the same parameters:
- datasource
- sequence name
- interval
- skipSegmentLineageCheck (false for batch append, true for streaming append)
- previous segment ID (used only when skipSegmentLineageCheck = false)

The above parameters can thus uniquely identify a pending segment (enforced by the UNIQUE constraint on the sequence_name_prev_id_sha1 column in the druid_pendingSegments metadata table).
This reuse is done in order to
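A minimal sketch of the reuse described above, written as a toy Python model rather than Druid code. The hash function, class, and toy segment ID format are hypothetical, and hashing the interval when skipSegmentLineageCheck is true but the previous segment ID otherwise is an assumption made for illustration. A Python dict stands in for the UNIQUE constraint on sequence_name_prev_id_sha1, and the model covers a single datasource.

```python
import hashlib


def pending_segment_sha1(sequence_name, interval, previous_segment_id, skip_segment_lineage_check):
    """Hypothetical stand-in for the value stored in sequence_name_prev_id_sha1."""
    hasher = hashlib.sha1()
    hasher.update(sequence_name.encode("utf-8"))
    hasher.update(b"\xff")  # separator so adjacent fields cannot run together
    if skip_segment_lineage_check:
        # Assumption: without a lineage check, the interval identifies the slot.
        hasher.update(interval.encode("utf-8"))
    else:
        # With a lineage check, the previous segment ID is part of the identity.
        hasher.update(previous_segment_id.encode("utf-8"))
    return hasher.hexdigest()


class ToyPendingSegmentTable:
    """In-memory model of one datasource's pending segments; dict keys play the UNIQUE column."""

    def __init__(self):
        self.rows = {}  # sha1 -> pending segment ID

    def allocate(self, sequence_name, interval, previous_segment_id,
                 skip_segment_lineage_check, version):
        key = pending_segment_sha1(sequence_name, interval, previous_segment_id,
                                   skip_segment_lineage_check)
        if key in self.rows:
            # Same parameters as an earlier request: hand back the existing pending segment.
            return self.rows[key]
        # Toy ID: interval_version_partition, where partition = number of rows so far.
        segment_id = f"{interval}_{version}_{len(self.rows)}"
        self.rows[key] = segment_id
        return segment_id


table = ToyPendingSegmentTable()
first = table.allocate("seq_0", "2023-12-01/2023-12-02", "", True, "V1")
again = table.allocate("seq_0", "2023-12-01/2023-12-02", "", True, "V1")
other = table.allocate("seq_1", "2023-12-01/2023-12-02", "", True, "V1")
```

In this toy the version argument is not part of the key, so a later request with the same parameters but a different version still receives the first pending segment.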
Breaking scenario
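Because of this reuse, an allocation request can be handed the existing pending segment pendingV11 even after a newer version of the same interval has been committed by a replace task. pendingV11 is then committed as segmentV11 even though it belongs to an already overshadowed version. segmentV11 gets immediately marked as unused and eventually deleted, thus causing loss of the appended data.
Fix
Allocate pending segments at the latest committed version of the interval.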
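The breaking scenario can be reproduced with a small toy model. This is a sketch, not Druid's allocator: the class and method names are hypothetical, versions V1 and V2 are compared as plain strings, and the key deliberately omits the version to mirror the reuse described in Background.

```python
import hashlib


class PreFixAllocator:
    """Toy model of version-agnostic pending segment reuse; all names are hypothetical."""

    def __init__(self, interval, committed_version):
        self.latest_committed = {interval: committed_version}  # interval -> latest committed version
        self.pending = {}  # sha1 of request parameters -> (version, partition)

    @staticmethod
    def key(sequence_name, interval):
        # The version is not part of the key, mirroring the behavior being fixed.
        return hashlib.sha1(f"{sequence_name}|{interval}".encode("utf-8")).hexdigest()

    def allocate(self, sequence_name, interval):
        k = self.key(sequence_name, interval)
        if k not in self.pending:
            self.pending[k] = (self.latest_committed[interval], len(self.pending))
        # A matching row is returned even if a newer version has been committed since.
        return self.pending[k]

    def commit_replace(self, interval, version):
        self.latest_committed[interval] = version

    def is_overshadowed(self, segment, interval):
        # Toy rule: a segment is overshadowed if its version is older than the latest committed one.
        version, _partition = segment
        return version < self.latest_committed[interval]


interval = "2023-12-01/2023-12-02"
alloc = PreFixAllocator(interval, "V1")
before = alloc.allocate("seq_0", interval)  # allocated at V1
alloc.commit_replace(interval, "V2")         # an interleaved replace task commits V2
after = alloc.allocate("seq_0", interval)   # same parameters: the V1 pending segment is reused
```

After the replace commit, the reused allocation still carries V1, so anything published from it is overshadowed as soon as it lands.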
Changes
- Include the version of the pending segment in the computation of sequence_name_prev_id_sha1, thus preserving the UNIQUE constraint
Alternate approach
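A minimal sketch of the change, again as a hypothetical toy model rather than the actual Druid code: allocation reads the latest committed version of the interval, and that version is hashed into the key. Replays at the same version still reuse the pending segment, while a request made after a replace commit gets a new row, which a version-less key could not insert without colliding with the existing one.

```python
import hashlib


class VersionedAllocator:
    """Toy allocator: allocate at the latest committed version and hash that version into the key."""

    def __init__(self, interval, committed_version):
        self.latest_committed = {interval: committed_version}  # interval -> latest committed version
        self.pending = {}  # dict keys stand in for the UNIQUE sequence_name_prev_id_sha1 column

    @staticmethod
    def key(sequence_name, interval, version):
        # Including the version means the same request against a newer version gets a new key.
        raw = f"{sequence_name}|{interval}|{version}"
        return hashlib.sha1(raw.encode("utf-8")).hexdigest()

    def allocate(self, sequence_name, interval):
        version = self.latest_committed[interval]
        k = self.key(sequence_name, interval, version)
        if k not in self.pending:
            # Toy partition number = number of rows so far.
            self.pending[k] = (version, len(self.pending))
        return self.pending[k]

    def commit_replace(self, interval, version):
        self.latest_committed[interval] = version


interval = "2023-12-01/2023-12-02"
alloc = VersionedAllocator(interval, "V1")
first = alloc.allocate("seq_0", interval)
replay = alloc.allocate("seq_0", interval)         # same version: still reused
alloc.commit_replace(interval, "V2")
after_replace = alloc.allocate("seq_0", interval)  # new pending segment at V2
```

The design keeps the existing reuse semantics for identical requests and only changes the outcome once the committed version moves forward.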
Clean up pending segments as soon as they are no longer needed. However, it is difficult to ensure that a pending segment is not currently in use, as multiple tasks might be using the same segment ID.
Release note
Fix a bug in segment allocation that can cause loss of appended data when running interleaved append and replace tasks.
This PR has: