
Conversation

@szetszwo
Contributor

@szetszwo szetszwo commented Jun 3, 2022

@szetszwo szetszwo requested a review from sodonnel June 6, 2022 17:42
@sodonnel
Contributor

sodonnel commented Jun 7, 2022

I have a couple of thoughts here:

  1. Our main concern is pending replications. The replication manager also processes over-replication, as well as handling containers which need to be closed, quasi-closed, unhealthy, etc. Putting the limiter where it is in this change delays all of those things too. It will also impact EC containers, which are going to go through a new code path and hence will not face the same problem.

  2. There is a replication manager report in place, and if we only process some of the containers, that report, which is only updated after each full replication manager run, will not have the correct numbers.

  3. This solution is a temporary measure until we get the new replication manager ready, and it only really affects things inside the LegacyReplicationManager. I think it would be better to confine this solution to the LegacyReplicationManager, which will allow us to develop the new version freely.

I think we would be better off placing a limit on the pending in-flight replication tasks, rather than a limit on the number of containers processed. That way the replication manager will still process over-replication and all the other health check tasks, but we can skip scheduling a replication for under-replication if there are too many pending already. The report can also be populated fully, with over- / under-replicated counts, even if not all of the under-replication tasks are scheduled.

It would also be good to count how many were skipped on each iteration, if possible, so it can be logged to give some insight into what is happening. It might be slightly tricky to do this the way things are currently structured, so that would be a nice-to-have.
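For illustration only, a rough sketch of that idea, with entirely hypothetical names (this is not the actual patch), could look like the following:

    // Hypothetical sketch: skip scheduling a new replication when the in-flight
    // limit is reached, and count the skips so each iteration can log them.
    import java.util.concurrent.atomic.AtomicInteger;

    class InflightReplicationLimiter {
      private final int limit;                 // 0 means unlimited
      private final AtomicInteger inflight = new AtomicInteger();
      private final AtomicInteger skippedThisRun = new AtomicInteger();

      InflightReplicationLimiter(int limit) {
        this.limit = limit;
      }

      /** Returns true if a new replication command may be scheduled now. */
      boolean tryAcquire() {
        if (limit > 0 && inflight.get() >= limit) {
          skippedThisRun.incrementAndGet();    // surfaced in the RM report/log
          return false;                        // leave it for the next RM run
        }
        inflight.incrementAndGet();
        return true;
      }

      /** Call when a scheduled replication completes or times out. */
      void release() {
        inflight.decrementAndGet();
      }

      int skippedThisRun() {
        return skippedThisRun.get();
      }
    }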

@szetszwo
Contributor Author

szetszwo commented Jun 8, 2022

@sodonnel , thanks for the comments! Let me see how to change the code.

@szetszwo
Contributor Author

szetszwo commented Jun 8, 2022

@sodonnel, I checked the code. It actually is not hard to limit the inflight replications and the inflight deletions. Since the new ReplicationManager is coming, we don't want to over-engineer the legacy code. Also, we should make sure the new confs added here can be reused in the new ReplicationManager.

Are there any confs in the new ReplicationManager related to inflight replication and inflight deletion?

@sodonnel
Contributor

sodonnel commented Jun 8, 2022

Are there any confs in the new ReplicationManager related to inflight replication and inflight deletion?

No, not as yet. The plan is to use the currently queued command count on the DNs to limit the replication commands in the new RM. It will find all the under-replicated containers, prioritize them by remaining redundancy, and then schedule them over time. The plan is to have a per-DN limit, so faster DNs can perhaps accept more work. We have not yet worked out the low-level details, but that is the high-level thinking.

The new replication manager is going to use a new class to track inflight operations, called ContainerReplicaPendingOps. That part is already committed.
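As a purely illustrative sketch of that per-DN idea (hypothetical names throughout; the real design was not settled at this point), the limit could be tracked per datanode roughly like this:

    // Illustrative only: track queued replication commands per datanode and
    // refuse new work for a node that has reached its limit.
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class PerDatanodeCommandLimiter {
      private final Map<String, Integer> queued = new ConcurrentHashMap<>();
      private final int perDatanodeLimit;

      PerDatanodeCommandLimiter(int perDatanodeLimit) {
        this.perDatanodeLimit = perDatanodeLimit;
      }

      /** True if this datanode still has headroom for another command. */
      boolean canAccept(String datanodeUuid) {
        return queued.getOrDefault(datanodeUuid, 0) < perDatanodeLimit;
      }

      void commandQueued(String datanodeUuid) {
        queued.merge(datanodeUuid, 1, Integer::sum);
      }

      void commandCompleted(String datanodeUuid) {
        queued.merge(datanodeUuid, -1, Integer::sum);
      }
    }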

@szetszwo
Contributor Author

szetszwo commented Jun 8, 2022

Then, I would suggest the following confs:

    @Config(key = "container.max",
        type = ConfigType.INT,
        defaultValue = "0", // 0 means unlimited.
        tags = {SCM, OZONE},
        description = "This property is used to limit the maximum number " +
            "of containers to process in each loop."
    )
    private int containerMax = 0;

    @Config(key = "container.inflight.replication.limit",
        type = ConfigType.INT,
        defaultValue = "0", // 0 means unlimited.
        tags = {SCM, OZONE},
        description = "This property is used to limit the maximum number " +
            "of inflight replications for each container."
    )
    private int containerInflightReplicationLimit = 0;

    @Config(key = "container.inflight.deletion.limit",
        type = ConfigType.INT,
        defaultValue = "0", // 0 means unlimited.
        tags = {SCM, OZONE},
        description = "This property is used to limit the maximum number " +
            "of inflight deletions for each container."
    )
    private int containerInflightDeletionLimit = 0;
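For illustration, the "0 means unlimited" semantics of these confs could be applied with a small helper like the one below (hypothetical, not part of the proposal itself):

    // Hypothetical helper illustrating the "0 means unlimited" semantics above.
    static boolean limitReached(int inflightCount, int limit) {
      // A limit of 0 disables the check, matching defaultValue = "0".
      return limit > 0 && inflightCount >= limit;
    }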

@szetszwo
Contributor Author

szetszwo commented Jun 9, 2022

Inflight limit

Note that this is still work-in-progress.

@szetszwo
Contributor Author

szetszwo commented Jun 9, 2022

@sodonnel, please take a look at the current change. If it looks good, I will add some new tests for the inflight limits. Thanks.

Contributor

If we limit the inflight replications, I don't think we need this change any longer to limit the number of containers we process. We should always just process all containers and then skip adding replications for which there is no space in the queue.

Contributor Author

Sure.

Contributor

I guess we should probably check the return status here, and increment some metric if we are skipping the replication for now?

Contributor Author

Sure, let's add some metric.

Contributor

The limit here is based on the number of actions for an individual container, I think? Do we not want to limit the total number of inflight actions across all containers? As a crude estimate that would be map.size(), although each entry could potentially have several replications against it.
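For illustration, totalling the in-flight actions across all containers (rather than per container) could look roughly like this, assuming a map from container ID to its list of in-flight actions:

    // Hypothetical: total in-flight actions across all containers; each map
    // entry may hold several actions, so map.size() alone would under-count.
    import java.util.List;
    import java.util.Map;

    static <K, A> int totalInflightActions(Map<K, List<A>> inflight) {
      return inflight.values().stream().mapToInt(List::size).sum();
    }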

Contributor

I also wonder if we would be better off checking the size in handleUnderReplicatedContainer() and skipping it before doing all the work to find a new target etc. If there is no capacity to schedule a replica, then we may as well skip the work too.

Contributor Author

Sure. Let me see how to do it.

@szetszwo
Contributor Author

The windbags failure does not seem related to this.

@sodonnel , could you review the change? Thank you in advance.

Comment on lines 159 to 160
final Boolean remove = processor.apply(i.next());
if (remove == Boolean.TRUE) {
Contributor

The windbags failure does not seem related to this.

M B RC: Suspicious comparison of Boolean references in org.apache.hadoop.hdds.scm.container.replication.LegacyReplicationManager$InflightMap.iterate(ContainerID, Function) At LegacyReplicationManager.java:[line 160]

I think it is related, assuming windbags == findbugs. :)

Maybe I'm missing something, but can we use Predicate<InflightAction> instead of Function<InflightAction, Boolean>?

Contributor Author

@szetszwo szetszwo Jun 14, 2022

@adoroszlai, thanks for pointing out the findbugs warning. I kept checking all the highlighted warnings like this one: https://github.com/apache/ozone/runs/6874432894?check_suite_focus=true#step:5:1669 but missed the real findbugs warning, which was not highlighted: https://github.com/apache/ozone/runs/6874432894?check_suite_focus=true#step:5:1726

And yes, windbags is findbugs after the auto spelling correction. :)

Contributor Author

Also, Predicate sounds great!
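For reference, a sketch of the Predicate-based variant (simplified to a plain iterator loop, rather than the actual iterate(ContainerID, Function) signature) avoids the boxed-Boolean comparison flagged by findbugs:

    // Sketch: Predicate returns a primitive boolean, so no `== Boolean.TRUE`.
    import java.util.Iterator;
    import java.util.function.Predicate;

    static <T> void iterate(Iterator<T> i, Predicate<T> processor) {
      while (i.hasNext()) {
        if (processor.test(i.next())) {
          i.remove();
        }
      }
    }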

Contributor

The workflow has a "summary of failures" section which greps for the real findbugs problems, to avoid the need for checking the complete output manually:

https://github.com/apache/ozone/runs/6874432894#step:6:8

Contributor Author

Good to know. Thanks!

@szetszwo szetszwo force-pushed the HDDS-6829 branch 2 times, most recently from a5c88a7 to 62777a2 Compare June 14, 2022 15:36
Contributor

@sodonnel sodonnel left a comment

Changes LGTM

@szetszwo szetszwo merged commit 94945ae into apache:master Jun 16, 2022
@szetszwo
Contributor Author

Thanks @sodonnel and @adoroszlai for reviewing this!

errose28 added a commit to errose28/ozone that referenced this pull request Jun 23, 2022
* master: (34 commits)
  HDDS-6868 Add S3Auth information to thread local (apache#3527)
  HDDS-6877. Keep replication port unchanged when restarting datanode in MiniOzoneCluster (apache#3510)
  HDDS-6907. OFS should create buckets with FILE_SYSTEM_OPTIMIZED layout. (apache#3528)
  HDDS-6875. Migrate parameterized tests in hdds-common to JUnit5 (apache#3513)
  HDDS-6924. OBJECT_STORE isn't flat namespaced (apache#3533)
  HDDS-6899. [EC] Remove warnings and errors from console during online reconstruction of data. (apache#3522)
  HDDS-6695. Enable SCM Ratis by default for new clusters only (apache#3499)
  HDDS-4123. Integrate OM Open Key Cleanup Service Into Existing Code (apache#3319)
  HDDS-6882. Correct exit code for invalid arguments passed to command-line tools. (apache#3517)
  HDDS-6890. EC: Fix potential wrong replica read with over-replicated container. (apache#3523)
  HDDS-6902. Duplicate mockito-core entries in pom.xml (apache#3525)
  HDDS-6752. Migrate tests with rules in hdds-server-scm to JUnit5 (apache#3442)
  HDDS-6806. EC: Implement the EC Reconstruction coordinator. (apache#3504)
  HDDS-6829. Limit the no of inflight replication tasks in SCM. (apache#3482)
  HDDS-6898. [SCM HA finalization] Modify acceptance test configuration to speed up test finalization (apache#3521)
  HDDS-6577. Configurations to reserve HDDS volume space. (apache#3484)
  HDDS-6870 Clean up isTenantAdmin to use UGI (apache#3503)
  HDDS-6872. TestAuthorizationV4QueryParser should pass offline (apache#3506)
  HDDS-6840. Add MetaData volume information to the SCM and OM - UI (apache#3488)
  HDDS-6697. EC: ReplicationManager - create class to detect EC container health issues (apache#3512)
  ...
duongkame pushed a commit to duongkame/ozone that referenced this pull request Aug 16, 2022
…SCM. (apache#3482)

HDDS-6829. Limit the no of inflight replication tasks in SCM. (apache#3482)

(cherry picked from commit 94945ae)
Change-Id: Ife5a3c73abfbb9e843c0d983b5743a7336b51b24