Skip to content

Allocate pending segments at latest committed version#15459

Merged
abhishekagarwal87 merged 2 commits intoapache:masterfrom
kfaraz:fix_seg_alloc
Dec 14, 2023
Merged

Allocate pending segments at latest committed version#15459
abhishekagarwal87 merged 2 commits intoapache:masterfrom
kfaraz:fix_seg_alloc

Conversation

@kfaraz
Copy link
Copy Markdown
Contributor

@kfaraz kfaraz commented Nov 30, 2023

Background

The segment allocation algorithm reuses an already allocated pending segment if the new allocation request is made for the same parameters:

  • datasource
  • sequence name
  • same interval
  • same value of skipSegmentLineageCheck (false for batch append, true for streaming append)
  • same previous segment id (used only when skipSegmentLineageCheck = false)

The above parameters can thus uniquely identify a pending segment (enforced by the UNIQUE constraint on the sequence_name_prev_id_sha1 column in druid_pendingSegments metadata table).

This reuse is done in order to

  • allow replica tasks (in case of streaming ingestion) to use the same set of segment IDs.
  • allow re-run of a failed batch task to use the same segment ID and prevent unnecessary allocations

Breaking scenario

  • Append task T1 allocates pending segment against version V1. Let's call this pending segment pendingV11
  • T1 fails and is unable to commit pendingV11
  • Replace task T2 commits a new version V2 which overshadows V1.
  • Append task T3 picks up from where T1 had failed. T3 thus tries to reuse pendingV11.
  • T3 completes and commits pendingV11 as segmentV11 even though it belongs to an already overshadowed version.
  • segmentV11 gets immediately marked as unused and eventually deleted thus causing loss of the appended data

Fix

  • A pending segment must always be allocated for the latest version of committed (used) segments in that interval.
  • If a pending segment already exists but belongs to an older version, do not reuse it. It will eventually be cleaned up by the coordinator.

Changes

  • Add the version of the allocated segment to the set of parameters that uniquely define a pending segment
  • Thus for a given datasource, interval, sequence, prev_segment_id, there can be multiple pending segments, but each of them must have a different version.
  • Include the version while calculating the hash sequence_name_prev_id_sha1 thus preserving the UNIQUE constraint
  • Add test to assert above behaviour. This test would fail without the changes in this PR.

Alternate approach

Clean up pending segments as soon as they are not needed. It is difficult to ensure that a pending segment is not currently in use as multiple tasks might be using the same segment id.

Release note

Fix bug in segment allocation that can potentially cause loss of appended data when running interleaved append and replace tasks.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@kfaraz kfaraz added the Bug label Nov 30, 2023
@AmatyaAvadhanula
Copy link
Copy Markdown
Contributor

Could you please tick all the relevant items from the checklist at the end of the PR description?
Otherwise, changes LGTM!

@abhishekagarwal87 abhishekagarwal87 merged commit feeb4f0 into apache:master Dec 14, 2023
@kfaraz kfaraz deleted the fix_seg_alloc branch December 14, 2023 14:15
@kfaraz
Copy link
Copy Markdown
Contributor Author

kfaraz commented Dec 14, 2023

Thanks for the review, @AmatyaAvadhanula !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants