Skip to content

Fix concurrent append to interval with unused segments#18230

Merged
kfaraz merged 4 commits intoapache:masterfrom
kfaraz:fix_concurrent_append_with_unused
Jul 11, 2025
Merged

Fix concurrent append to interval with unused segments#18230
kfaraz merged 4 commits intoapache:masterfrom
kfaraz:fix_concurrent_append_with_unused

Conversation

@kfaraz
Copy link
Copy Markdown
Contributor

@kfaraz kfaraz commented Jul 10, 2025

Description

This is a better fix for #18216 as suggested by @imply-cheddar in the PR comments.

Note on the original fix

The case of version collision during segment allocation handled in the original PR can happen only if the following conditions are met:

  • The task performing the allocation is an append task (appendToExisting: true)
    • .Non-append tasks don't use segment allocation anyway and just replace segments that have the same version as their locks.
  • The task is using an APPEND lock.
    • Otherwise, the task would have been using an EXCLUSIVE lock which would always have a unique version (due to exclusivity).
  • The interval in question has no used segments but it does have some unused segments.
    • If there were no unused segments, there would be no clash.
    • If there were any used segments, we would simply be appending to that version which wouldn't constitute a clash.
  • The unused segments in the interval must have been created by a prior task that was also using an APPEND lock.
    • Any other kind of lock would have resulted in a non-1970 version, thus avoiding clash with subsequent APPEND tasks.

Despite this fact, appending to an already existing version as a side effect is not desirable and can lead to data consistency issues when this version is marked used / unused.

Changes

  • Use a new segment version rather than allocating new partition numbers for an existing unused version

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@cryptoe cryptoe added this to the 34.0.0 milestone Jul 10, 2025
Copy link
Copy Markdown
Contributor

@imply-cheddar imply-cheddar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments. Nothing to block merge over, so approving anyway.

}

if (foundFreshVersion) {
return new SegmentIdWithShardSpec(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably such a seldom-done thing that an INFO log indicating that a new version was found and what it is shouldn't flood the logs, but also should provide decent context on what's going on in cases where this ends up getting called/run unexpectedly.

Comment on lines +1145 to +1147
// Verify that the new segment gets a different version
Assert.assertEquals(SEGMENT_V0 + "S", pendingSegment2.getVersion());
Assert.assertEquals(0, pendingSegment2.getShardSpec().getPartitionNum());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably also be good to validate the "keeps appending" behavior in the tests as well (i.e. if one is created, marked unused, a new one created, marked unused, created again, marked unused, etc. it should just build mroe and more 'S's)

@kfaraz kfaraz closed this Jul 11, 2025
@kfaraz kfaraz reopened this Jul 11, 2025
@kfaraz kfaraz closed this Jul 11, 2025
@kfaraz kfaraz reopened this Jul 11, 2025
@cryptoe cryptoe closed this Jul 11, 2025
@cryptoe cryptoe reopened this Jul 11, 2025
@kfaraz kfaraz closed this Jul 11, 2025
@kfaraz kfaraz reopened this Jul 11, 2025
@kfaraz kfaraz closed this Jul 11, 2025
@kfaraz kfaraz reopened this Jul 11, 2025
@kfaraz kfaraz closed this Jul 11, 2025
@kfaraz kfaraz reopened this Jul 11, 2025
@kfaraz kfaraz closed this Jul 11, 2025
@kfaraz kfaraz reopened this Jul 11, 2025
@Akshat-Jain Akshat-Jain reopened this Jul 11, 2025
@kfaraz kfaraz merged commit e26d007 into apache:master Jul 11, 2025
76 checks passed
@kfaraz kfaraz deleted the fix_concurrent_append_with_unused branch July 11, 2025 17:36
capistrant pushed a commit to capistrant/incubator-druid that referenced this pull request Jul 17, 2025
This is a better approach to the fix in apache#18216

Changes:
- When allocating the first segment in an interval which already contains an unused segment,
use a fresh version rather than reusing the old version (now unused)
capistrant added a commit that referenced this pull request Jul 17, 2025
This is a better approach to the fix in #18216

Changes:
- When allocating the first segment in an interval which already contains an unused segment,
use a fresh version rather than reusing the old version (now unused)

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants