HDDS-6829. Limit the no of inflight replication tasks in SCM. #3482
Conversation
I have a couple of thoughts here:
I think we would be better off placing a limit on the pending in-flight replication tasks, rather than a limit on the number of containers processed. That way the replication manager will still process over-replication and all the other health-check tasks, but we can skip scheduling a replication for under-replication if there are too many pending already. The report can also be populated fully, with over- / under-replicated counts, even if not all the under-replication tasks are scheduled. It would also be good to count how many were skipped on each iteration, if possible, so it can be logged to give some insight into what is happening. It might be slightly tricky to do that the way things are currently structured, so treat it as a nice-to-have.
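A minimal sketch of that approach, with all names illustrative rather than taken from the actual ReplicationManager code: scheduling of under-replication work is skipped once a hypothetical inflight limit is reached, and the skips are counted so they can be logged each iteration.

```java
import java.util.List;

// Illustrative sketch only: limit the pending inflight replications rather than
// the number of containers processed, and count what gets skipped.
final class InflightLimitSketch {
  private final int inflightReplicationLimit;   // hypothetical config value
  private int inflightReplications;             // pending replication commands
  private int skippedThisIteration;

  InflightLimitSketch(int inflightReplicationLimit) {
    this.inflightReplicationLimit = inflightReplicationLimit;
  }

  void processUnderReplicated(List<String> underReplicatedContainers) {
    skippedThisIteration = 0;
    for (String container : underReplicatedContainers) {
      if (inflightReplications >= inflightReplicationLimit) {
        // Still counted as under-replicated in the report; only the
        // scheduling is deferred to a later iteration.
        skippedThisIteration++;
        continue;
      }
      scheduleReplication(container);
    }
    System.out.println("Skipped scheduling " + skippedThisIteration
        + " under-replicated containers this iteration");
  }

  private void scheduleReplication(String container) {
    inflightReplications++;  // placeholder for sending a replicate command
  }
}
```

Because the loop never exits early, over-replication and the other health checks would still run for every container; only the under-replication scheduling is throttled.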
@sodonnel, thanks for the comments! Let me see how to change the code.
@sodonnel, I checked the code. It actually is not hard to limit the inflight replications and the inflight deletions. Since the new ReplicationManager is coming, we don't want to over-engineer the legacy code. Also, we should make sure the new confs added here can be used in the new ReplicationManager. Are there any confs in the new ReplicationManager related to inflight replication and inflight deletion?
No, not as yet. The plan is to use the currently queued command count on the DNs to limit the replication commands in the new RM. It will find all the under-replicated containers, prioritize them by remaining redundancy, and then schedule them over time. The plan is to have a per-DN limit, so faster DNs can perhaps accept more work. We have not yet worked out the low-level details, but that is the high-level thinking. The new replication manager is going to use a new class to track inflight operations, called ContainerReplicaPendingOps. That part is already committed.
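A rough sketch of that per-DN idea, assuming a simple map of queued command counts per datanode; the names are illustrative and this is not the committed ContainerReplicaPendingOps API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Illustrative sketch: cap the replication commands queued per datanode, so a
// faster datanode (given a higher limit) could accept more work over time.
final class PerDatanodeLimitSketch {
  private final Map<UUID, Integer> queuedCommands = new HashMap<>();
  private final int perDatanodeLimit;

  PerDatanodeLimitSketch(int perDatanodeLimit) {
    this.perDatanodeLimit = perDatanodeLimit;
  }

  /** Returns true if a replication command could be queued for the target DN. */
  boolean trySchedule(UUID targetDatanode) {
    int queued = queuedCommands.getOrDefault(targetDatanode, 0);
    if (queued >= perDatanodeLimit) {
      return false;  // this datanode already has enough queued work
    }
    queuedCommands.put(targetDatanode, queued + 1);
    return true;     // placeholder for actually sending the command
  }

  /** Called when the datanode reports the command as complete. */
  void onCommandCompleted(UUID targetDatanode) {
    // Drop the entry once its count reaches zero.
    queuedCommands.computeIfPresent(targetDatanode, (dn, n) -> n > 1 ? n - 1 : null);
  }
}
```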
Then, I would suggest having the following confs:
Note that this is still work-in-progress.
@sodonnel, please take a look at the current change. If it is good, I will add some new tests for the inflight limits. Thanks.
If we limit the inflight replications, I don't think we need this change to limit the number of containers we process any longer. We should always just process all containers and then skip adding replications for which there is no space in the queue.
Sure.
I guess we should probably check the return status here, and increment some metric if we are skipping the replication for now?
Sure, let's add a metric.
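For illustration only (the metric name and wiring here are hypothetical, not the existing ReplicationManager metrics): the scheduling call reports whether the command was actually queued, and the caller bumps a skip counter when it was not.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of checking the return status and counting skipped replications.
final class SkipMetricSketch {
  private final AtomicLong inflightReplicationSkipped = new AtomicLong();

  void onScheduleResult(boolean scheduled) {
    if (!scheduled) {
      inflightReplicationSkipped.incrementAndGet();  // exposed via metrics or logs
    }
  }

  long getInflightReplicationSkipped() {
    return inflightReplicationSkipped.get();
  }
}
```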
The limit here is based on the number of actions for an individual container, I think? Do we not want to limit on the total number of inflight actions across all containers? As a crude estimate that would be map.size(), although each entry could potentially have several replications against it.
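To illustrate the difference (types simplified; the real map is keyed by ContainerID with a list of inflight actions per entry): map.size() counts containers that have at least one inflight action, while the exact total sums the per-container lists.

```java
import java.util.List;
import java.util.Map;

// Illustrative only: per-container count vs. the two flavours of a global count.
final class InflightCountSketch {
  /** Number of inflight actions for one container. */
  static int perContainerCount(Map<String, List<Object>> inflight, String containerId) {
    List<Object> actions = inflight.get(containerId);
    return actions == null ? 0 : actions.size();
  }

  /** The crude estimate: containers with at least one inflight action. */
  static int containersWithInflightActions(Map<String, List<Object>> inflight) {
    return inflight.size();
  }

  /** The exact total across all containers. */
  static int totalInflightActions(Map<String, List<Object>> inflight) {
    return inflight.values().stream().mapToInt(List::size).sum();
  }
}
```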
I also wonder if we would be better off checking the size in handleUnderReplicatedContainer() and skipping it before doing all the work to find a new target etc. If there is no capacity to schedule a replica, then we may as well skip the work too.
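A sketch of that early exit, with illustrative names rather than the real LegacyReplicationManager signature: the limit is checked before any placement work is done.

```java
import java.util.function.IntSupplier;

// Illustrative sketch: bail out of under-replication handling as soon as the
// inflight limit is reached, before any target selection is attempted.
final class EarlySkipSketch {
  private final IntSupplier inflightReplicationCount;  // e.g. current inflight map size
  private final int inflightReplicationLimit;
  private long skipped;

  EarlySkipSketch(IntSupplier inflightReplicationCount, int inflightReplicationLimit) {
    this.inflightReplicationCount = inflightReplicationCount;
    this.inflightReplicationLimit = inflightReplicationLimit;
  }

  void handleUnderReplicatedContainer(String container) {
    if (inflightReplicationCount.getAsInt() >= inflightReplicationLimit) {
      skipped++;  // cheap to skip here, before finding new targets
      return;
    }
    // ... expensive work: choose target datanodes, send the replicate command
  }
}
```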
Sure. Let me see how to do it.
The windbags failure does not seem related to this. @sodonnel, could you review the change? Thank you in advance.
final Boolean remove = processor.apply(i.next());
if (remove == Boolean.TRUE) {
The windbags failure does not seem related to this.
M B RC: Suspicious comparison of Boolean references in org.apache.hadoop.hdds.scm.container.replication.LegacyReplicationManager$InflightMap.iterate(ContainerID, Function) At LegacyReplicationManager.java:[line 160]
I think it is related, assuming windbags == findbugs. :)
Maybe I'm missing something, but can we use Predicate<InflightAction> instead of Function<InflightAction, Boolean>?
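A minimal sketch of what that change could look like, using a simplified stand-in for InflightMap.iterate rather than the actual code: a Predicate returns a primitive boolean, so there is no boxed Boolean left to compare by reference.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Simplified stand-in for the iterate() pattern above. With Function<T, Boolean>,
// `remove == Boolean.TRUE` compares object references (the findbugs RC warning);
// with Predicate<T> the result is a primitive boolean.
final class PredicateIterateSketch<T> {
  void iterate(List<T> actions, Predicate<T> processor) {
    Iterator<T> i = actions.iterator();
    while (i.hasNext()) {
      if (processor.test(i.next())) {  // plain boolean, no boxing
        i.remove();                    // drop the action when asked to
      }
    }
  }
}
```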
@adoroszlai, thanks for pointing out the findbugs warning. I kept checking all the highlighted warnings like this https://github.com/apache/ozone/runs/6874432894?check_suite_focus=true#step:5:1669 but missed the real findbugs warning, which was not highlighted: https://github.com/apache/ozone/runs/6874432894?check_suite_focus=true#step:5:1726
And yes, windbags is findbugs after the auto spelling correction. :)
Also, Predicate sounds great!
The workflow has a "summary of failures" section which greps for the real findbugs problems, to avoid the need for checking the complete output manually:
https://github.com/apache/ozone/runs/6874432894#step:6:8
Good to know. Thanks!
Force-pushed from a5c88a7 to 62777a2.
sodonnel left a comment
Changes LGTM
Thanks @sodonnel and @adoroszlai for reviewing this!
* master: (34 commits)
  HDDS-6868 Add S3Auth information to thread local (apache#3527)
  HDDS-6877. Keep replication port unchanged when restarting datanode in MiniOzoneCluster (apache#3510)
  HDDS-6907. OFS should create buckets with FILE_SYSTEM_OPTIMIZED layout. (apache#3528)
  HDDS-6875. Migrate parameterized tests in hdds-common to JUnit5 (apache#3513)
  HDDS-6924. OBJECT_STORE isn't flat namespaced (apache#3533)
  HDDS-6899. [EC] Remove warnings and errors from console during online reconstruction of data. (apache#3522)
  HDDS-6695. Enable SCM Ratis by default for new clusters only (apache#3499)
  HDDS-4123. Integrate OM Open Key Cleanup Service Into Existing Code (apache#3319)
  HDDS-6882. Correct exit code for invalid arguments passed to command-line tools. (apache#3517)
  HDDS-6890. EC: Fix potential wrong replica read with over-replicated container. (apache#3523)
  HDDS-6902. Duplicate mockito-core entries in pom.xml (apache#3525)
  HDDS-6752. Migrate tests with rules in hdds-server-scm to JUnit5 (apache#3442)
  HDDS-6806. EC: Implement the EC Reconstruction coordinator. (apache#3504)
  HDDS-6829. Limit the no of inflight replication tasks in SCM. (apache#3482)
  HDDS-6898. [SCM HA finalization] Modify acceptance test configuration to speed up test finalization (apache#3521)
  HDDS-6577. Configurations to reserve HDDS volume space. (apache#3484)
  HDDS-6870 Clean up isTenantAdmin to use UGI (apache#3503)
  HDDS-6872. TestAuthorizationV4QueryParser should pass offline (apache#3506)
  HDDS-6840. Add MetaData volume information to the SCM and OM - UI (apache#3488)
  HDDS-6697. EC: ReplicationManager - create class to detect EC container health issues (apache#3512)
  ...
HDDS-6829. Limit the no of inflight replication tasks in SCM. (apache#3482)
(cherry picked from commit 94945ae)
Change-Id: Ife5a3c73abfbb9e843c0d983b5743a7336b51b24
See https://issues.apache.org/jira/browse/HDDS-6829