Fix attempts to publish the same pending segments multiple times #16605
Conversation
final String now = DateTimes.nowUtc().toString();
final Set<String> processedSegmentIds = new HashSet<>();
Use set of SegmentIdWithShardSpec instead.
Could you please share why? The string is unique, and also acts as the primary key for the table
It makes the code easier to understand. Logically, a set of SegmentIdWithShardSpec would behave the same as a set of strings, since the equals and hashCode of SegmentIdWithShardSpec use the same id that toString() returns.
IIRC, the original code (i.e. pre-PendingSegmentRecord) was also maintaining a set of SegmentIdWithShardSpec.
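To illustrate the reviewer's point, here is a minimal, self-contained sketch. The class below is a hypothetical, simplified stand-in for Druid's SegmentIdWithShardSpec, not the real one: because equals and hashCode delegate to the same id string that toString() returns, a HashSet of these objects deduplicates exactly like a HashSet of the id strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for Druid's SegmentIdWithShardSpec.
// equals/hashCode delegate to the same id string that toString() returns,
// so a Set of these deduplicates exactly like a Set<String> of the ids.
class SegmentIdWithShardSpec {
    private final String id;

    SegmentIdWithShardSpec(String id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SegmentIdWithShardSpec
               && id.equals(((SegmentIdWithShardSpec) o).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }

    @Override
    public String toString() {
        return id;
    }
}

class DedupEquivalenceDemo {
    public static void main(String[] args) {
        Set<SegmentIdWithShardSpec> typed = new HashSet<>();
        Set<String> raw = new HashSet<>();
        for (String id : Arrays.asList("ds_2024_v1_0", "ds_2024_v1_0", "ds_2024_v1_1")) {
            typed.add(new SegmentIdWithShardSpec(id));
            raw.add(id);
        }
        // Both sets collapse the duplicate id, so the sizes match.
        System.out.println(typed.size() == raw.size()); // prints: true
    }
}
```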
Done. Thanks for the explanation. I've added a test as well.
kfaraz left a comment
Please add a unit test to verify the behaviour, if possible.
kfaraz left a comment
Thanks for the fix, @AmatyaAvadhanula !
@Test
public void testDuplicatePendingSegmentEntriesAreNotInserted()
Thank you, @kfaraz!
Fix attempts to publish the same pending segments multiple times (…che#16605) (cherry picked from commit 4c8932e)
#16144 introduced a bug in the flow of batch segment allocation: pending segments are allocated exactly once but are not deduplicated before commit, which produces a warning log.
The pending segments themselves are not duplicated and the log is transient, but it can be quite frequent on clusters running streaming ingestion with multiple replicas.
This PR fixes the issue by ensuring that each record is committed exactly once.
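The core idea of the fix can be sketched as follows. This is a hypothetical illustration of the deduplication technique, not the actual Druid code: track already-seen segment ids in a set and skip duplicates before committing, so each pending segment record is committed exactly once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the fix's idea: deduplicate pending segment
// records by id before committing, so the same record is never
// committed twice even if it appears multiple times in the input
// (e.g. once per streaming-ingestion replica).
class PendingSegmentDedup {
    static List<String> dedupe(List<String> incomingIds) {
        Set<String> seen = new HashSet<>();
        List<String> toCommit = new ArrayList<>();
        for (String id : incomingIds) {
            // Set.add returns false when the id was already seen,
            // so duplicates are skipped.
            if (seen.add(id)) {
                toCommit.add(id);
            }
        }
        return toCommit;
    }

    public static void main(String[] args) {
        List<String> incoming = Arrays.asList("seg_a", "seg_a", "seg_b");
        System.out.println(PendingSegmentDedup.dedupe(incoming)); // prints: [seg_a, seg_b]
    }
}
```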