-
Notifications
You must be signed in to change notification settings - Fork 594
HDDS-4244. Container deleted wrong replica cause mis-replicated. #1423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@maobaolong , can we add a policy unsatisfied UT too? |
|
@ChenSammi Thank you for remind me, I add a |
|
LGTM +1. |
* master: (47 commits) HDDS-4104. Provide a way to get the default value and key of java-based-configuration easily (apache#1369) HDDS-4250. Fix wrong logger name (apache#1429) HDDS-4244. Container deleted wrong replica cause mis-replicated. (apache#1423) HDDS-4053. Volume space: add quotaUsageInBytes and update it when write and delete key. (apache#1296) HDDS-4210. ResolveBucket during checkAcls fails. (apache#1398) HDDS-4075. Retry request on different OM on AccessControlException (apache#1303) HDDS-4166. Documentation index page redirects to the wrong address (apache#1372) HDDS-4039. Reduce the number of fields in hdds.proto to improve performance (apache#1289) HDDS-4155. Directory and filename can end up with same name in a path. (apache#1361) HDDS-3927. Rename Ozone OM,DN,SCM runtime options to conform to naming conventions (apache#1401) HDDS-4119. Improve performance of the BufferPool management of Ozone client (apache#1336) HDDS-4217.Remove test TestOzoneContainerRatis (apache#1408) HDDS-4218.Remove test TestRatisManager (apache#1409) HDDS-4129. change MAX_QUOTA_IN_BYTES to Long.MAX_VALUE. (apache#1337) HDDS-4228: add field 'num' to ALLOCATE_BLOCK of scm audit log. (apache#1413) HDDS-4196. Add an endpoint in Recon to query Prometheus (apache#1390) HDDS-4211. [OFS] Better owner and group display for listing Ozone volumes and buckets (apache#1397) HDDS-4150. recon.api.TestEndpoints test is flaky (apache#1396) HDDS-4170 - Fix typo in method description. (apache#1406) HDDS-4064. Show container verbose info with verbose option (apache#1290) ...
…ponse * HDDS-4122-quota-attempt2: (51 commits) Remove redundant check status calls in children of AbstractOMOpenKeyDeleteRequest Remove unused inports and fix super constructor calls Move common volume byte usage update code to AbstractOMKeyDeleteResponse Add volume quota update to open key delete response, and group duplicate code HDDS-4104. Provide a way to get the default value and key of java-based-configuration easily (apache#1369) HDDS-4250. Fix wrong logger name (apache#1429) HDDS-4244. Container deleted wrong replica cause mis-replicated. (apache#1423) HDDS-4053. Volume space: add quotaUsageInBytes and update it when write and delete key. (apache#1296) HDDS-4210. ResolveBucket during checkAcls fails. (apache#1398) HDDS-4075. Retry request on different OM on AccessControlException (apache#1303) HDDS-4166. Documentation index page redirects to the wrong address (apache#1372) HDDS-4039. Reduce the number of fields in hdds.proto to improve performance (apache#1289) HDDS-4155. Directory and filename can end up with same name in a path. (apache#1361) HDDS-3927. Rename Ozone OM,DN,SCM runtime options to conform to naming conventions (apache#1401) HDDS-4119. Improve performance of the BufferPool management of Ozone client (apache#1336) HDDS-4217.Remove test TestOzoneContainerRatis (apache#1408) HDDS-4218.Remove test TestRatisManager (apache#1409) HDDS-4129. change MAX_QUOTA_IN_BYTES to Long.MAX_VALUE. (apache#1337) HDDS-4228: add field 'num' to ALLOCATE_BLOCK of scm audit log. (apache#1413) HDDS-4196. Add an endpoint in Recon to query Prometheus (apache#1390) ...
| if (misReplicated || !nowMisRep) { | ||
| // Remove the replica if the container was already mis-replicated | ||
| // OR if losing this replica does not make it become mis-replicated | ||
| ContainerPlacementStatus nowPS = | ||
| getPlacementStatus(replicaSet, replicationFactor); | ||
| if ((!ps.isPolicySatisfied() | ||
| && nowPS.actualPlacementCount() == ps.actualPlacementCount()) | ||
| || (ps.isPolicySatisfied() && nowPS.isPolicySatisfied())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need such a complex condition check?
Won't fixing the previous check solve the problem?
if (misReplicated || !nowMisRep) to if (!misReplicated || nowMisRep)
|
How exactly does the new test cases verify that the ReplicationManager is considering Rack Awareness and picks the correct replica for deletion? As far as I can understand, the new test cases are checking if the delete command is sent or not. The test cases don't verify any kind of Rack Placement. Am I missing something here? |
What changes were proposed in this pull request?
Container deleted wrong replica cause mis-replicated.
What is the link to the Apache JIRA
HDDS-4244
How was this patch tested?
First you should make the 3 replicas of the tested container on the same racks, like racks1.
secondly, change the config file update the request config.
2 ReplicationManager interval, you can use the following command to verify the container replicas are in the 2 racks.
ozone admin container info #xxx