HDDS-12115. RM selects replicas to delete non-deterministically if nodes are overloaded #7728
What changes were proposed in this pull request?
When the Replication Manager (RM) selects replicas to delete, it sorts the replicas by datanode UUID and then iterates over the list. If the selected node is overloaded, then rather than holding that delete for later, the RM skips it and tries the next replica in the list. This can result in non-deterministic delete selection, which we want to avoid.
This PR changes that behavior so the original replica is no longer skipped; it will be tried again on the next iteration instead.
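For illustration only, here is a minimal sketch of the intended selection behavior. The `Replica` record, `selectReplicaToDelete`, and `isOverloaded` are hypothetical stand-ins, not the actual ReplicationManager classes touched by this PR:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for a container replica and the datanode hosting it. */
record Replica(String datanodeUuid) {}

public class DeterministicDeleteSelector {

  /**
   * Picks the replica to delete for an over-replicated container.
   * Replicas are sorted by datanode UUID so the choice is stable across
   * runs. If the chosen datanode is overloaded, the selection is deferred:
   * the same replica will be picked again on the next pass, rather than
   * falling through to a different replica as before.
   */
  Optional<Replica> selectReplicaToDelete(List<Replica> replicas) {
    Optional<Replica> candidate = replicas.stream()
        .min(Comparator.comparing(Replica::datanodeUuid));

    if (candidate.isPresent() && isOverloaded(candidate.get())) {
      // Old behavior: skip this replica and try the next one in the sorted
      // list, which made the final choice depend on transient datanode load.
      // New behavior: defer, keeping the choice deterministic.
      return Optional.empty();
    }
    return candidate;
  }

  // Hypothetical helper standing in for the RM's command-throttling check.
  boolean isOverloaded(Replica replica) {
    return false;
  }
}
```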
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-12115
How was this patch tested?
A couple of new mis-replication tests were added, and a test that started failing after the change was updated to reflect the new intended behavior.
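The tests below are not the ones added in this PR (those exercise the ReplicationManager's mis-replication handling); they are a sketch of the behavior being verified, written against the hypothetical `DeterministicDeleteSelector` shown above:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.util.List;
import org.junit.jupiter.api.Test;

class DeterministicDeleteSelectorTest {

  @Test
  void selectionIsDeferredWhenChosenNodeIsOverloaded() {
    List<Replica> replicas = List.of(new Replica("aaa"), new Replica("bbb"));

    // Selector that reports the lowest-UUID node as overloaded.
    DeterministicDeleteSelector selector = new DeterministicDeleteSelector() {
      @Override
      boolean isOverloaded(Replica replica) {
        return replica.datanodeUuid().equals("aaa");
      }
    };

    // No replica is chosen this pass; "aaa" will be retried later
    // instead of the delete falling through to "bbb".
    assertTrue(selector.selectReplicaToDelete(replicas).isEmpty());
  }

  @Test
  void selectionIsStableWhenNoNodeIsOverloaded() {
    DeterministicDeleteSelector selector = new DeterministicDeleteSelector();
    List<Replica> replicas = List.of(new Replica("bbb"), new Replica("aaa"));

    // The lowest datanode UUID is always chosen, regardless of list order.
    assertEquals("aaa",
        selector.selectReplicaToDelete(replicas).orElseThrow().datanodeUuid());
  }
}
```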