HDDS-12114. Prevent delete commands running after a long lock wait and send ICR earlier #7726
errose28
left a comment
Thanks @sodonnel, overall looks good, just some minor comments.
@errose28 I believe I have addressed the comments. Please have another check.
errose28
left a comment
Thanks for the quick improvement @sodonnel. It is optional, but you may want to fix the minor whitespace diff before merging.
public static final String
    OZONE_RECOVERING_CONTAINER_TIMEOUT_DEFAULT = "20m";
nit. whitespace diff
…d send ICR earlier (apache#7726) (cherry picked from commit b6cc4af) Conflicts: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueHandler.java
…d send ICR earlier (apache#7726)
* CDPD-78092. HDDS-12114. Prevent delete commands running after a long lock wait and send ICR earlier (apache#7726) (cherry picked from commit b6cc4af)
  Conflicts:
    hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
    hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueHandler.java
  Change-Id: I62ffb7203f2af5be2901ef923f333de53bbc3656
* CDPD-78149. HDDS-12115. RM selects replicas to delete non-deterministically if nodes are overloaded (apache#7728) (cherry picked from commit efd8adc)
  Conflicts:
    hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestRatisOverReplicationHandler.java
  Change-Id: Ia3d54917c7c488a9b706f6ce941e7f466746d3bd
* CDPD-78286. HDDS-12135. Set RM default deadline to 12 minutes and datanode offset to 6 minutes (apache#7747) (cherry picked from commit d7616ec)
  Change-Id: I36f237705f5a94d453bcec72c32056c2be8f38ba
* CDPD-78213. HDDS-12127. RM should not expire pending deletes, but retry until delete is confirmed or node is dead (apache#7746) (cherry picked from commit 04f6255)
  Conflicts:
    hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ContainerReplicaPendingOps.java
    hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java
    hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/balancer/TestMoveManager.java
    hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestContainerReplicaPendingOps.java
    hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestECContainerReplicaCount.java
    hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestReplicationManager.java
    hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/replication/TestReplicationManagerScenarios.java
    hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/TestBlockDeletion.java
  Change-Id: Ic01591f72706f2473c63dd2e44c3f2a94fb70d43
---------
Co-authored-by: Stephen O'Donnell <stephen.odonnell@gmail.com>
What changes were proposed in this pull request?
We have seen some instances where delete container commands are picked from the command queue within the SCM-defined deadline but then run for a very long time in the handler. This causes SCM to think the delete has been dropped or has failed, when it is actually still running.
The causes of the slow running command could be:
To compound this problem, an ICR confirming the delete is not sent until the very last stage of the delete process.
To combat this, two changes are included in this PR:
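As a rough illustration of the two ideas named in the PR title (not running a delete after a long lock wait, and sending the ICR earlier), here is a minimal Java sketch. It is not the actual KeyValueHandler change: the class `DeleteContainerSketch`, its `Result` enum, the `lockTimeoutMs` parameter, and the recorded event strings are all hypothetical names chosen for the example, and the real handler's locking and reporting code may look quite different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch only; not the real Ozone KeyValueHandler code. */
final class DeleteContainerSketch {

  /** Hypothetical outcome of a single delete attempt. */
  enum Result { DELETED, LOCK_TIMEOUT, INTERRUPTED }

  // Stands in for the per-container lock the delete has to take.
  private final ReentrantLock containerLock = new ReentrantLock();

  // Records what happened and in which order, in place of real side effects.
  private final List<String> events = new ArrayList<>();

  Result deleteContainer(long containerId, long lockTimeoutMs) {
    boolean locked;
    try {
      // Bounded wait: give up instead of running the delete long after
      // the caller (SCM in the PR description) has stopped expecting it.
      locked = containerLock.tryLock(lockTimeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return Result.INTERRUPTED;
    }
    if (!locked) {
      events.add("lock-timeout:" + containerId);
      return Result.LOCK_TIMEOUT;
    }
    try {
      events.add("marked-deleted:" + containerId);
      // Send the report (ICR) as soon as the container is logically gone,
      // before the potentially slow on-disk cleanup.
      events.add("icr-sent:" + containerId);
      events.add("files-removed:" + containerId);
      return Result.DELETED;
    } finally {
      containerLock.unlock();
    }
  }

  ReentrantLock lock() {
    return containerLock;
  }

  List<String> events() {
    return events;
  }
}
```

In this sketch a caller that gets `LOCK_TIMEOUT` knows the delete did not run at all, so a later retry is safe, and on the success path the report is recorded ahead of the file removal rather than at the very end.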
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-12114
How was this patch tested?
New unit test added.