HDDS-12115. RM selects replicas to delete non-deterministically if nodes are overloaded #7728

sodonnel · 2025-01-21T12:43:41Z

What changes were proposed in this pull request?

When the Replication Manager (RM) selects nodes to delete replicas from, it sorts the replicas by datanode UUID and then iterates over the list. If a node is overloaded when it is selected for delete, RM skips it and tries the next replica in the list, rather than holding that delete for later. This can result in non-deterministic delete selection, which we want to avoid.

This PR changes that, so that the original replica is no longer skipped, but will be tried again on the next iteration.
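To illustrate the change described above, here is a minimal sketch, not the actual Ozone code. The class `DeleteSelectionSketch`, the method `selectForDelete`, and its parameters are hypothetical names made up for this example. It also assumes that "the next iteration" means a later selection pass and that replicas are ordered with Java's natural `UUID` ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.UUID;

/** Illustrative only: these names are hypothetical, not Ozone's real classes. */
public class DeleteSelectionSketch {

  /**
   * Picks up to `excess` datanodes to receive a delete command.
   * Candidates are visited in datanode UUID order. On reaching an
   * overloaded node, selection stops rather than skipping ahead, so the
   * same node is the first candidate again on the next pass.
   */
  static List<UUID> selectForDelete(
      List<UUID> replicaNodes, Set<UUID> overloaded, int excess) {
    // Sort a copy so the caller's list is left untouched.
    List<UUID> sorted = new ArrayList<>(replicaNodes);
    sorted.sort(Comparator.naturalOrder());

    List<UUID> selected = new ArrayList<>();
    for (UUID node : sorted) {
      if (selected.size() >= excess) {
        break;
      }
      if (overloaded.contains(node)) {
        // The behaviour described as "before" would `continue` here and
        // pick a different replica depending on momentary node load.
        // The behaviour described as "after" holds the delete for later.
        break;
      }
      selected.add(node);
    }
    return selected;
  }
}
```

With three replicas and one excess copy, the sketch returns the lowest-UUID node when it is not overloaded and an empty list when it is, so the outcome no longer depends on which nodes happen to be busy at that moment.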

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-12115

How was this patch tested?

Added a couple of new mis-replication tests, and modified a test that started failing after the change so that it reflects the new intended behavior.

siddhantsangwan

LGTM!

siddhantsangwan · 2025-01-25T18:44:14Z

Thanks for the patch.

* CDPD-78092. HDDS-12114. Prevent delete commands running after a long lock wait and send ICR earlier (apache#7726) (cherry picked from commit b6cc4af)
  Conflicts:
    hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
    hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueHandler.java
  Change-Id: I62ffb7203f2af5be2901ef923f333de53bbc3656
* CDPD-78149. HDDS-12115. RM selects replicas to delete non-deterministically if nodes are overloaded (apache#7728) (cherry picked from commit efd8adc)
  Conflicts:
    hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestRatisOverReplicationHandler.java
  Change-Id: Ia3d54917c7c488a9b706f6ce941e7f466746d3bd
* CDPD-78286. HDDS-12135. Set RM default deadline to 12 minutes and datanode offset to 6 minutes (apache#7747) (cherry picked from commit d7616ec)
  Change-Id: I36f237705f5a94d453bcec72c32056c2be8f38ba
* CDPD-78213. HDDS-12127. RM should not expire pending deletes, but retry until delete is confirmed or node is dead (apache#7746) (cherry picked from commit 04f6255)
  Conflicts:
    hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ContainerReplicaPendingOps.java
    hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java
    hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/balancer/TestMoveManager.java
    hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestContainerReplicaPendingOps.java
    hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestECContainerReplicaCount.java
    hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestReplicationManager.java
    hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestReplicationManagerScenarios.java
    hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/TestBlockDeletion.java
  Change-Id: Ic01591f72706f2473c63dd2e44c3f2a94fb70d43

---------

Co-authored-by: Stephen O'Donnell <stephen.odonnell@gmail.com>

HDDS-12115. RM selects replicas to delete non-deterministically if no…

d47660d

…des are overloaded

sodonnel requested a review from siddhantsangwan January 21, 2025 12:43

siddhantsangwan approved these changes Jan 25, 2025

View reviewed changes

siddhantsangwan merged commit efd8adc into apache:master Jan 25, 2025
42 checks passed

sodonnel added a commit to sodonnel/hadoop-ozone that referenced this pull request Jan 30, 2025

HDDS-12115. RM selects replicas to delete non-deterministically if no…

ed5311a

…des are overloaded (apache#7728) (cherry picked from commit efd8adc)

nandakumar131 pushed a commit to nandakumar131/ozone that referenced this pull request Feb 10, 2025

HDDS-12115. RM selects replicas to delete non-deterministically if no…

78297ea

…des are overloaded (apache#7728)

Cyrill pushed a commit to Cyrill/ozone that referenced this pull request Nov 10, 2025

HDDS-12115. RM selects replicas to delete non-deterministically if no…

43cb2e5

…des are overloaded (apache#7728)

Cyrill pushed a commit to Cyrill/ozone that referenced this pull request Nov 25, 2025

HDDS-12115. RM selects replicas to delete non-deterministically if no…

27deb3d

…des are overloaded (apache#7728) (cherry picked from commit efd8adc)
