Remediate ingestion failures when number of segments in time period is larger than 32767 #15090
I think just checking the argument in the constructor of `RootPartitionRange` is enough. Since `RootPartitionRange` does not accept a `startPartitionId` or `endPartitionId` less than 0, we will not have a case where the `partitionId` passed to this method is less than 0. And even if it is, we should throw an exception rather than silently convert it to the max value.
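For illustration, here is a minimal sketch of the kind of constructor check suggested above. The class and field names mirror `RootPartitionRange`, but this is a hypothetical standalone example, not the Druid source:

```java
// Hypothetical sketch: validate partition ids at construction time and fail fast,
// instead of silently clamping or wrapping them later.
class RootPartitionRange
{
    private final short startPartitionId;
    private final short endPartitionId;

    RootPartitionRange(short startPartitionId, short endPartitionId)
    {
        if (startPartitionId < 0 || endPartitionId < 0 || startPartitionId > endPartitionId) {
            throw new IllegalArgumentException(
                "Invalid partition range [" + startPartitionId + ", " + endPartitionId + "]"
                + "; a negative id likely means an int value overflowed the short range"
            );
        }
        this.startPartitionId = startPartitionId;
        this.endPartitionId = endPartitionId;
    }
}
```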
Thanks for checking on this, @kfaraz! The constructor of `RootPartitionRange` that takes a `short` `startPartitionId` is defined as `private`, so we are forced to go through the path that takes `int` values. Since `startPartitionId` has `Integer` range, once `startPartitionId > Short.MAX_VALUE` (32767) the int -> short cast starts producing negative numbers, and the wraparound repeats as `startPartitionId` keeps growing, because casting from int to short loses precision.

We happened to run into exactly this short-overflow scenario, described in #15091, and our ingestion task for new data was completely broken because of it; throwing an exception would still make the ingestion fail.
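A minimal, standalone demonstration of the wraparound (plain Java, no Druid code involved):

```java
public class ShortOverflowDemo
{
    public static void main(String[] args)
    {
        // Narrowing an int just past Short.MAX_VALUE keeps only the low 16 bits,
        // so the value flips negative.
        int startPartitionId = 32768;                 // Short.MAX_VALUE + 1
        System.out.println((short) startPartitionId); // -32768
        System.out.println((short) 32769);            // -32767
        System.out.println((short) 65536);            // 0 -- the cycle repeats every 65536
    }
}
```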
Here I'm capping `startPartitionId` at `Short.MAX_VALUE` so that it won't produce `java.lang.IllegalArgumentException: fromKey > toKey` and break ingestion when we do `stateMap.subMap(lowFence, false, highFence, false)`; this is just a remediation.
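To show how the inverted fences trigger that exception, here is a standalone sketch; `stateMap`, `lowFence`, and `highFence` are stand-in names for the fields discussed above, not the actual Druid code:

```java
import java.util.TreeMap;

public class SubMapFenceDemo
{
    public static void main(String[] args)
    {
        TreeMap<Short, String> stateMap = new TreeMap<>();
        short lowFence = (short) 32767;   // Short.MAX_VALUE
        short highFence = (short) 32769;  // wraps to -32767, now *below* lowFence
        // Throws java.lang.IllegalArgumentException: fromKey > toKey
        stateMap.subMap(lowFence, false, highFence, false);
    }
}
```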
I believe a better way to handle this is to allow `startPartitionId` and `endPartitionId` to be `int` and avoid the problematic precision-losing cast altogether. I can send another PR with that solution if you can confirm why we had the `short` limit originally and whether widening it is appropriate.

Best,
Dun
Hi @kfaraz, thanks again for spending time on this!
Bumping this just in case you missed my last message. I'm more than glad to clarify anything that needs it, and I can also connect on a Zoom call if that's convenient for you!
Best Regards,
Dun
@dulu98Kurz - I am not entirely sure, but `short` was likely chosen to save on the memory that storing these ids takes. 32K partitions in one single interval is too high. Can you describe a bit more how your cluster ends up in this situation and why it is a genuine scenario? In my experience, almost every time an interval touches this high a number, it means that compaction is not configured or ingestion is misconfigured.
Hi @abhishekagarwal87, thanks for checking on this!
You are right: our investigation suggests both late messages from upstream and compaction falling behind. Specifically, we found random late messages mixed into the Kafka topics; they kept adding tiny segments to finalized time chunks and eventually went beyond the `short` range, breaking live ingestion tasks for new data. Setting a rejection period was not ideal because it means we lose data, and because compaction was falling behind we couldn't afford to wait for it to catch up. I ended up hard-deleting the problematic time chunk, and then I realized that relying solely on compaction seems inadequate.
Admittedly, handling random late messages is not an ideal use case for Druid, but it was a really difficult choice when the user had to pick between letting ingestion break and deleting the problematic time chunk.
So instead of capping at `short` max, we can possibly cover the gap by widening the partition ids to the `int` range.
For users who do not have late messages or compaction issues, this change has no impact, because they won't store more than `short`-max segments anyway, so we don't break the original intention of saving memory.
For users who actually produce segments beyond `short` max, this buys more time to compact and reduce the number of segments, which may eventually avoid the difficult situation above.
From a code-quality perspective, `Short.toUnsignedInt` is the visible half of a precision-losing `int -> short` round-trip, and we use it 18 times across 2 files; we can simplify the logic and improve readability if we change to `int`.
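To illustrate the precision loss (plain Java, independent of the Druid code): the `int -> short -> Short.toUnsignedInt` round-trip only survives for values up to 65535 and silently wraps beyond that:

```java
public class ToUnsignedIntDemo
{
    public static void main(String[] args)
    {
        for (int id : new int[]{32767, 32768, 65535, 65536, 70000}) {
            short stored = (short) id;                   // precision-losing narrowing cast
            int recovered = Short.toUnsignedInt(stored); // reinterprets the 16 bits as unsigned
            System.out.printf("%d -> (short) %d -> toUnsignedInt %d%n", id, stored, recovered);
        }
        // 32768..65535 survive the round-trip (stored as negative shorts),
        // but 65536 comes back as 0 and 70000 comes back as 4464.
    }
}
```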
Lastly, when `partitionId` is out of range, the logic we use to handle it right now is simply wrong: for certain out-of-range inputs, `stateMap.subMap(lowFence, false, highFence, false)` will return all entries instead of an empty map ...
If we are OK with the remediation in this PR, we can proceed with merging; if we are OK with refactoring, please allow me to send another PR to fix it more completely.
Adding the PR that refactors `partitionId` from `short` to `int` so that we can compare the scope of changes: #15116