Skip to content

Fix attempts to publish the same pending segments multiple times#16605

Merged
AmatyaAvadhanula merged 3 commits intoapache:masterfrom
AmatyaAvadhanula:fix_duplicate_pending_segment_commits
Jun 18, 2024
Merged

Fix attempts to publish the same pending segments multiple times#16605
AmatyaAvadhanula merged 3 commits intoapache:masterfrom
AmatyaAvadhanula:fix_duplicate_pending_segment_commits

Conversation

@AmatyaAvadhanula
Copy link
Copy Markdown
Contributor

#16144 introduced a bug in the flow of batch segment allocation where pending segments are allocated exactly once but are not deduplicated before commit. This leads to a log like:

Duplicate entry <pendingSegmentId> for key 'PRIMARY'

The pending segments are not duplicated and this log is transient. However, it can be quite frequent on clusters having streaming ingestion with multiple replicas.

This PR fixes the issue by ensuring that each record is committed exactly once.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@AmatyaAvadhanula AmatyaAvadhanula requested a review from kfaraz June 14, 2024 09:16
@AmatyaAvadhanula AmatyaAvadhanula marked this pull request as ready for review June 14, 2024 09:16
));

final String now = DateTimes.nowUtc().toString();
final Set<String> processedSegmentIds = new HashSet<>();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use set of SegmentIdWithShardSpec instead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please share why? The string is unique, and also acts as the primary key for the table

Copy link
Copy Markdown
Contributor

@kfaraz kfaraz Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes the code easier to understand. Logically, having a set of SegmentIdWithShardSpec would be the same as a set of string since the equals and hashCode of SegmentIdWithShardSpec uses the same id which is returned in toString().

IIRC, the original code (i.e. pre-PendingSegmentRecord) was also maintaining a set of SegmentIdWithShardSpec.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks for the explanation. I've added a test as well.

Copy link
Copy Markdown
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a unit test to verify the behaviour, if possible.

@AmatyaAvadhanula AmatyaAvadhanula requested a review from kfaraz June 18, 2024 03:55
Copy link
Copy Markdown
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix, @AmatyaAvadhanula !

}

@Test
public void testDuplicatePendingSegmentEntriesAreNotInserted()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@AmatyaAvadhanula
Copy link
Copy Markdown
Contributor Author

Thank you, @kfaraz!

@AmatyaAvadhanula AmatyaAvadhanula merged commit 4c8932e into apache:master Jun 18, 2024
@kfaraz kfaraz added this to the 31.0.0 milestone Oct 4, 2024
vivek807 pushed a commit to deep-bi/druid that referenced this pull request Dec 6, 2024
…che#16605)

* Fix attempts to publish the same pending segments multiple times

(cherry picked from commit 4c8932e)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants