HDDS-13410. Control block deletion for each DN from SCM. #8767
Conversation
@ashishkumar50 Thanks for your patch. Could I know the background of this PR? What problem is this limit trying to solve?

@xichen01 Thanks for looking into it.
xichen01
left a comment
@ashishkumar50 Thanks for your patch, I have left a few comments.
Map<DatanodeID, Map<Long, CmdStatus>> commandStatus) {
  ...
  // Check if all replicas satisfy the maxBlocksPerDatanode condition
  if (!replicas.stream().allMatch(replica -> {
If the deleted replicas are not distributed evenly, this return may make the blockDeletionLimit hard to reach (for example, in a recently expanded cluster the data to be deleted may sit mostly on the old DNs).
Maybe we can set a maximum number of loop iterations, e.g. end the while loop in DeletedBlockLogImpl#getTransactions once it exceeds 3 * (DN count) * maxDeleteBlocksPerDatanode.
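A minimal sketch of that loop cap, using a plain iterator over transaction IDs as a stand-in for the actual table iterator in DeletedBlockLogImpl (the class name, counters, and values below are illustrative assumptions, not the merged code):

```java
import java.util.Iterator;
import java.util.List;

// Illustrative sketch of the suggested scan cap: stop walking the
// deleted-block log after a bounded number of entries, even if the overall
// block deletion limit has not been reached yet.
public final class BoundedTxnScanSketch {
  public static void main(String[] args) {
    int datanodeCount = 20;                    // assumed: live DN count known to SCM
    int maxDeleteBlocksPerDatanode = 100_000;  // assumed per-DN cap
    long maxScanned = 3L * datanodeCount * maxDeleteBlocksPerDatanode;

    // Stand-in for the iterator over deleted-block transactions.
    Iterator<Long> txnIds = List.of(1L, 2L, 3L).iterator();

    long scanned = 0;
    while (txnIds.hasNext() && scanned < maxScanned) {
      Long txnId = txnIds.next();
      scanned++;
      // ... existing per-DN / per-interval filtering of txnId would go here ...
    }
    System.out.println("Scanned " + scanned + " transactions");
  }
}
```

The factor of 3 mirrors the typical Ratis replication factor and, as the reply below points out, is only a heuristic.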
Thanks for the suggestion; breaking out of the while loop looks fine to me. The problem is that 3 * (DN count) * maxDeleteBlocksPerDatanode may not always hold: with under-replicated containers the replica count will be less than 3, and it does not hold for EC containers either, which may have 5-8 replicas or so.
Assume we set the total limit to 2M and the per-datanode limit to 100K.
It will iterate the whole table only when the cluster size is very small, maybe 15-20 nodes, and we then expand the cluster.
The problem arises only when the gap between the total limit and the per-datanode limit is far too large compared to the number of DNs in the cluster.
And if ((DN count) * maxDeleteBlocksPerDatanode) < blockDeletionLimit, DeletedBlockLogImpl#getTransactions will iterate over all the data in the table.
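One way to avoid that full-table scan, sketched under the assumption that the effective per-interval limit can simply be clamped to the aggregate per-DN capacity (the class and method names are hypothetical, not the actual fix):

```java
// Illustrative sketch: once every DN has hit its cap, scanning further rows
// cannot produce more work, so the effective deletion limit can be clamped
// to dnCount * maxDeleteBlocksPerDatanode.
public final class EffectiveDeletionLimitSketch {
  static int effectiveDeletionLimit(int blockDeletionLimit, int dnCount,
      int maxDeleteBlocksPerDatanode) {
    long aggregatePerDnCapacity = (long) dnCount * maxDeleteBlocksPerDatanode;
    return (int) Math.min(blockDeletionLimit, aggregatePerDnCapacity);
  }

  public static void main(String[] args) {
    // 10 DNs * 100K per DN = 1M, so a 2M per-interval limit is clamped to 1M.
    System.out.println(effectiveDeletionLimit(2_000_000, 10, 100_000));
  }
}
```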
aryangupta1998
left a comment
Thanks, @ashishkumar50, for the patch. It looks good, I have a few suggestions.
One suggestion regarding the default value of ozone.scm.block.deletion.max.blocks.per.dn: it's currently hardcoded to 100k, which is safe and conservative but may not scale well or distribute load efficiently in larger clusters.
We already have another config, hdds.scm.block.deletion.per-interval.max (default: 500k), which controls the total number of blocks SCM can process per cycle. If both values are set to 100k, a single DN could potentially receive the full deletion load, defeating the goal of balanced distribution.
Consider a case where hdds.scm.block.deletion.per-interval.max = 2M. With ozone.scm.block.deletion.max.blocks.per.dn = 100k, we would ideally need 20 DNs to fully utilize the interval. If the cluster has fewer than 20 DNs, the load won't be evenly distributed — some DNs may hit the cap, while others remain underutilized.
To improve flexibility and balance, we could consider deriving the default value dynamically, for example:
ozone.scm.block.deletion.max.blocks.per.dn = hdds.scm.block.deletion.per-interval.max / (number of DNs / 2)
This ensures better load spreading in large clusters, while still avoiding overload in smaller ones. The /2 factor acts as a safety buffer but can be tuned based on experimentation.
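A rough sketch of that derivation, assuming the per-DN cap is recomputed from the per-interval maximum and the number of healthy DNs (the method name, the fallback for an empty cluster, and the example values are illustrative assumptions, not the code that was merged):

```java
// Illustrative sketch: derive the per-DN block deletion cap from the
// per-interval total, spreading the budget over roughly half the DNs
// (the /2 safety buffer suggested above).
public final class PerDatanodeLimitSketch {
  static int derivePerDatanodeLimit(int perIntervalMax, int healthyDatanodeCount) {
    if (healthyDatanodeCount <= 0) {
      return perIntervalMax;               // no DN info yet: fall back to the global cap
    }
    int divisor = Math.max(1, healthyDatanodeCount / 2);  // guard 1-node clusters
    return Math.max(1, perIntervalMax / divisor);
  }

  public static void main(String[] args) {
    // per-interval.max = 2M with 40 healthy DNs -> 100K blocks per DN per cycle.
    System.out.println(derivePerDatanodeLimit(2_000_000, 40));
  }
}
```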
(Three resolved inline review comments on hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java, two of them marked outdated.)
Thanks @aryangupta1998 for the review, updated the patch with the dynamically derived value for each DN.
aryangupta1998
left a comment
Thanks @ashishkumar50, for updating the patch, LGTM!
Thanks @ashishkumar50 for the contribution and @xichen01 for the review!
Merge of master (730 commits), most recent commits:
- HDDS-13083. Handle cases where block deletion generates tree file before scanner (apache#8565)
- HDDS-12982. Reduce log level for snapshot validation failure (apache#8851)
- HDDS-13396. Documentation: Improve the top-level overview page for new users. (apache#8753)
- HDDS-13176. containerIds table value format change to proto from string (apache#8589)
- HDDS-13449. Incorrect Interrupt Handling for DirectoryDeletingService and KeyDeletingService (apache#8817)
- HDDS-2453. Add Freon tests for S3 MPU Keys (apache#8803)
- HDDS-13237. Container data checksum should contain block IDs. (apache#8773)
- HDDS-13489. Fix SCMBlockdeleting unnecessary iteration in corner case. (apache#8847)
- HDDS-13464. Make ozone.snapshot.filtering.service.interval reconfigurable (apache#8825)
- HDDS-13473. Amend validation for OZONE_OM_SNAPSHOT_DB_MAX_OPEN_FILES (apache#8829)
- HDDS-13435. Add an OzoneManagerAuthorizer interface (apache#8840)
- HDDS-8565. Recon memory leak in NSSummary (apache#8823)
- HDDS-12852. Implement a sliding window counter utility (apache#8498)
- HDDS-12000. Add unit test for RatisContainerSafeModeRule and ECContainerSafeModeRule (apache#8801)
- HDDS-13092. Container scanner should trigger volume scan when marking a container unhealthy (apache#8603)
- HDDS-13070. OM Follower changes to create and place sst files from hardlink file. (apache#8761)
- HDDS-13482. Mark testWriteStateMachineDataIdempotencyWithClosedContainer as flaky
- HDDS-13481. Fix success latency metric in SCM panels of deletion grafana dashboard (apache#8835)
- HDDS-13468. Update default value of ozone.scm.ha.dbtransactionbuffer.flush.interval. (apache#8834)
- HDDS-13410. Control block deletion for each DN from SCM. (apache#8767)
- ...

Files listed with the merge:
- hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerReplicaInfo.java
- hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/container/ReconcileSubcommand.java
- hadoop-ozone/cli-admin/src/test/java/org/apache/hadoop/hdds/scm/cli/container/TestReconcileSubcommand.java
What changes were proposed in this pull request?
Control the number of delete-block requests sent to each DN in a cycle. With this in place, the total delete-block limit in SCM can safely be increased for each cycle, since SCM can distribute the requests across more DNs.
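For context, a hedged sketch of how the two related settings discussed in the review could be raised together once the per-DN cap is in place (the values are examples, and the OzoneConfiguration usage is illustrative rather than taken from this patch):

```java
import org.apache.hadoop.hdds.conf.OzoneConfiguration;

// Illustrative: raise the per-interval total together with a per-DN cap so the
// larger budget is spread across DNs instead of landing on a single one.
public final class DeletionLimitsConfigSketch {
  public static void main(String[] args) {
    OzoneConfiguration conf = new OzoneConfiguration();
    // Total blocks SCM may pick up per deletion interval (example value).
    conf.set("hdds.scm.block.deletion.per-interval.max", "2000000");
    // Per-DN cap per interval; with ~40 DNs this keeps the load evenly spread.
    conf.set("ozone.scm.block.deletion.max.blocks.per.dn", "100000");
    System.out.println(conf.get("ozone.scm.block.deletion.max.blocks.per.dn"));
  }
}
```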
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-13410
How was this patch tested?
New unit test