
Conversation

@szetszwo
Contributor

@szetszwo szetszwo commented Jun 3, 2022

@szetszwo szetszwo requested a review from sodonnel June 6, 2022 17:42
@sodonnel
Contributor

sodonnel commented Jun 7, 2022

I have a couple of thoughts here:

  1. Our main concern is pending replications. The replication manager also processes over-replication, as well as handling containers which need to be closed, quasi-closed, unhealthy, etc. Putting the limiter where it is in this change delays all of those things too. It will also impact EC containers, which are going to go through a new code path and hence will not face the same problem.

  2. There is a replication manager report in place, and if we only process some of the containers, that report, which is only updated after each full replication manager run, will not have the correct numbers.

  3. This solution is a temporary measure until we get the new replication manager ready, and it only really affects things inside the LegacyReplicationManager. I think it would be better to confine this solution to the LegacyReplicationManager, which will allow us to develop the new version freely.

I think we would be better off placing a limit on the pending in-flight replication tasks, rather than a limit on the number of containers processed. That way the replication manager will still process over-replication and all the other health check tasks, but we can skip scheduling a replication for under-replication if there are too many pending already. The report can also be populated fully, with over- / under-replicated counts, even if not all of the under-replication tasks are scheduled.

It would also be good to count how many were skipped on each iteration, if possible, so it can be logged to give some insight into what is happening. It might be slightly tricky to do this the way things are currently structured, so that would be a nice-to-have.
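For illustration only, a rough sketch of that idea, with entirely hypothetical names (this is not the actual patch), could look like the following:

    // Hypothetical sketch: skip scheduling a new replication when the in-flight
    // limit is reached, and count the skips so each iteration can log them.
    import java.util.concurrent.atomic.AtomicInteger;

    class InflightReplicationLimiter {
      private final int limit;                 // 0 means unlimited
      private final AtomicInteger inflight = new AtomicInteger();
      private final AtomicInteger skippedThisRun = new AtomicInteger();

      InflightReplicationLimiter(int limit) {
        this.limit = limit;
      }

      /** Returns true if a new replication command may be scheduled now. */
      boolean tryAcquire() {
        if (limit > 0 && inflight.get() >= limit) {
          skippedThisRun.incrementAndGet();    // surfaced in the RM report/log
          return false;                        // leave it for the next RM run
        }
        inflight.incrementAndGet();
        return true;
      }

      /** Call when a scheduled replication completes or times out. */
      void release() {
        inflight.decrementAndGet();
      }

      int skippedThisRun() {
        return skippedThisRun.get();
      }
    }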

@szetszwo
Contributor Author

szetszwo commented Jun 8, 2022

@sodonnel , thanks for the comments! Let me see how to change the code.

@szetszwo
Contributor Author

szetszwo commented Jun 8, 2022

@sodonnel, I checked the code. It actually is not hard to limit the inflight replications and the inflight deletions. Since the new ReplicationManager is coming, we don't want to over-engineer the legacy code. Also, we should make sure the new confs added here can be reused in the new ReplicationManager.

Are there any confs in the new ReplicationManager related to inflight replication and inflight deletion?

@sodonnel
Contributor

sodonnel commented Jun 8, 2022

Are there any confs in the new ReplicationManager related to inflight replication and inflight deletion?

No, not as yet. The plan is to use the currently queued command count on the DNs to limit the replication commands in the new RM. It will find all the under-replicated containers, prioritize them by remaining redundancy, and then schedule them over time. The plan is to have a per-DN limit, so faster DNs can perhaps accept more work. We have not yet worked out the low-level details, but that is the high-level thinking.

The new replication manager is going to use a new class to track inflight operations, called ContainerReplicaPendingOps. That part is already committed.
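As a purely illustrative sketch of that per-DN idea (hypothetical names throughout; the real design was not settled at this point), the limit could be tracked per datanode roughly like this:

    // Illustrative only: track queued replication commands per datanode and
    // refuse new work for a node that has reached its limit.
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class PerDatanodeCommandLimiter {
      private final Map<String, Integer> queued = new ConcurrentHashMap<>();
      private final int perDatanodeLimit;

      PerDatanodeCommandLimiter(int perDatanodeLimit) {
        this.perDatanodeLimit = perDatanodeLimit;
      }

      /** True if this datanode still has headroom for another command. */
      boolean canAccept(String datanodeUuid) {
        return queued.getOrDefault(datanodeUuid, 0) < perDatanodeLimit;
      }

      void commandQueued(String datanodeUuid) {
        queued.merge(datanodeUuid, 1, Integer::sum);
      }

      void commandCompleted(String datanodeUuid) {
        queued.merge(datanodeUuid, -1, Integer::sum);
      }
    }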

@szetszwo
Contributor Author

szetszwo commented Jun 8, 2022

Then, I would suggest the following confs:

    @Config(key = "container.max",
        type = ConfigType.INT,
        defaultValue = "0", // 0 means unlimited.
        tags = {SCM, OZONE},
        description = "This property is used to limit the maximum number " +
            "of containers to process in each loop."
    )
    private int containerMax = 0;

    @Config(key = "container.inflight.replication.limit",
        type = ConfigType.INT,
        defaultValue = "0", // 0 means unlimited.
        tags = {SCM, OZONE},
        description = "This property is used to limit the maximum number " +
            "of inflight replications for each container."
    )
    private int containerInflightReplicationLimit = 0;

    @Config(key = "container.inflight.deletion.limit",
        type = ConfigType.INT,
        defaultValue = "0", // 0 means unlimited.
        tags = {SCM, OZONE},
        description = "This property is used to limit the maximum number " +
            "of inflight deletions for each container."
    )
    private int containerInflightDeletionLimit = 0;
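For illustration, the "0 means unlimited" semantics of these confs could be applied with a small helper like the one below (hypothetical, not part of the proposal itself):

    // Hypothetical helper illustrating the "0 means unlimited" semantics above.
    static boolean limitReached(int inflightCount, int limit) {
      // A limit of 0 disables the check, matching defaultValue = "0".
      return limit > 0 && inflightCount >= limit;
    }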

@szetszwo
Contributor Author

szetszwo commented Jun 9, 2022

Inflight limit

Note that this is still work-in-progress.

@szetszwo
Contributor Author

szetszwo commented Jun 9, 2022

@sodonnel, please take a look at the current change. If it looks good, I will add some new tests for the inflight limits. Thanks.

Contributor

If we limit the inflight replications, I don't think we need this change any longer to limit the number of containers we process. We should always just process all containers and then skip adding replications for which there is no space in the queue.

Contributor Author

Sure.

Contributor

I guess we should probably check the return status here, and increment some metric if we are skipping the replication for now?

Contributor Author

Sure, let's add some metric.

Contributor

The limit here is based on the number of actions for an individual container, I think? Do we not want to limit the total number of inflight actions across all containers? As a crude estimate that would be map.size(), although each entry could potentially have several replications against it.
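For illustration, totalling the in-flight actions across all containers (rather than per container) could look roughly like this, assuming a map from container ID to its list of in-flight actions:

    // Hypothetical: total in-flight actions across all containers; each map
    // entry may hold several actions, so map.size() alone would under-count.
    import java.util.List;
    import java.util.Map;

    static <K, A> int totalInflightActions(Map<K, List<A>> inflight) {
      return inflight.values().stream().mapToInt(List::size).sum();
    }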

Contributor

I also wonder if we would be better off checking the size in handleUnderReplicatedContainer() and skipping it before doing all the work to find a new target etc. If there is no capacity to schedule a replica, then we may as well skip the work too.

Contributor Author

Sure. Let me see how to do it.

@szetszwo
Contributor Author

The windbags failure does not seem related to this.

@sodonnel , could you review the change? Thank you in advance.

Comment on lines 159 to 160
final Boolean remove = processor.apply(i.next());
if (remove == Boolean.TRUE) {
Contributor

The windbags failure does not seem related to this.

M B RC: Suspicious comparison of Boolean references in org.apache.hadoop.hdds.scm.container.replication.LegacyReplicationManager$InflightMap.iterate(ContainerID, Function) At LegacyReplicationManager.java:[line 160]

I think it is related, assuming windbags == findbugs. :)

Maybe I'm missing something, but can we use Predicate<InflightAction> instead of Function<InflightAction, Boolean>?

Contributor Author

@szetszwo szetszwo Jun 14, 2022

@adoroszlai, thanks for pointing out the findbugs warning. I kept checking all the highlighted warnings like this one: https://github.com/apache/ozone/runs/6874432894?check_suite_focus=true#step:5:1669 but missed the real findbugs warning, which was not highlighted: https://github.com/apache/ozone/runs/6874432894?check_suite_focus=true#step:5:1726

And yes, windbags is findbugs after the auto spelling correction. :)

Contributor Author

Also, Predicate sounds great!
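For reference, a sketch of the Predicate-based variant (simplified to a plain iterator loop, rather than the actual iterate(ContainerID, Function) signature) avoids the boxed-Boolean comparison flagged by findbugs:

    // Sketch: Predicate returns a primitive boolean, so no `== Boolean.TRUE`.
    import java.util.Iterator;
    import java.util.function.Predicate;

    static <T> void iterate(Iterator<T> i, Predicate<T> processor) {
      while (i.hasNext()) {
        if (processor.test(i.next())) {
          i.remove();
        }
      }
    }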

Contributor

The workflow has a "summary of failures" section which greps for the real findbugs problems, to avoid the need for checking the complete output manually:

https://github.com/apache/ozone/runs/6874432894#step:6:8

Contributor Author

Good to know. Thanks!

@szetszwo szetszwo force-pushed the HDDS-6829 branch 2 times, most recently from a5c88a7 to 62777a2 Compare June 14, 2022 15:36
Contributor

@sodonnel sodonnel left a comment

Changes LGTM

@szetszwo szetszwo merged commit 94945ae into apache:master Jun 16, 2022
@szetszwo
Contributor Author

Thanks @sodonnel and @adoroszlai for reviewing this!

errose28 added a commit to errose28/ozone that referenced this pull request Jun 23, 2022
* master: (34 commits)
  HDDS-6868 Add S3Auth information to thread local (apache#3527)
  HDDS-6877. Keep replication port unchanged when restarting datanode in MiniOzoneCluster (apache#3510)
  HDDS-6907. OFS should create buckets with FILE_SYSTEM_OPTIMIZED layout. (apache#3528)
  HDDS-6875. Migrate parameterized tests in hdds-common to JUnit5 (apache#3513)
  HDDS-6924. OBJECT_STORE isn't flat namespaced (apache#3533)
  HDDS-6899. [EC] Remove warnings and errors from console during online reconstruction of data. (apache#3522)
  HDDS-6695. Enable SCM Ratis by default for new clusters only (apache#3499)
  HDDS-4123. Integrate OM Open Key Cleanup Service Into Existing Code (apache#3319)
  HDDS-6882. Correct exit code for invalid arguments passed to command-line tools. (apache#3517)
  HDDS-6890. EC: Fix potential wrong replica read with over-replicated container. (apache#3523)
  HDDS-6902. Duplicate mockito-core entries in pom.xml (apache#3525)
  HDDS-6752. Migrate tests with rules in hdds-server-scm to JUnit5 (apache#3442)
  HDDS-6806. EC: Implement the EC Reconstruction coordinator. (apache#3504)
  HDDS-6829. Limit the no of inflight replication tasks in SCM. (apache#3482)
  HDDS-6898. [SCM HA finalization] Modify acceptance test configuration to speed up test finalization (apache#3521)
  HDDS-6577. Configurations to reserve HDDS volume space. (apache#3484)
  HDDS-6870 Clean up isTenantAdmin to use UGI (apache#3503)
  HDDS-6872. TestAuthorizationV4QueryParser should pass offline (apache#3506)
  HDDS-6840. Add MetaData volume information to the SCM and OM - UI (apache#3488)
  HDDS-6697. EC: ReplicationManager - create class to detect EC container health issues (apache#3512)
  ...
duongkame pushed a commit to duongkame/ozone that referenced this pull request Aug 16, 2022
…SCM. (apache#3482)

HDDS-6829. Limit the no of inflight replication tasks in SCM. (apache#3482)

(cherry picked from commit 94945ae)
Change-Id: Ife5a3c73abfbb9e843c0d983b5743a7336b51b24