HDDS-13667. [DiskBalancer] Improve DiskBalancer CLI message for failed commands #9091

Gargi-jais11 · 2025-10-03T07:02:13Z

What changes were proposed in this pull request?

Currently, the DiskBalancer start, stop and update commands are sent only to datanodes (DNs) that are IN_SERVICE and HEALTHY, but the user is not told this. This PR improves the CLI output message to say so, as shown below:

bash-5.1$ ozone admin datanode diskbalancer start -t 0.0001 -a
Starting DiskBalancer on datanode(s) which are IN_SERVICE and HEALTHY.

When a start, stop or update command is sent to a specific DN that is not IN_SERVICE and HEALTHY, the command should be rejected, just as such a DN is excluded when the command is sent to all DNs.
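The eligibility rule described above can be sketched roughly as follows. This is a minimal, hypothetical Java illustration, not the code in DiskBalancerManager: the class `DiskBalancerEligibilitySketch`, its `Node`, `OpState` and `Health` types, and both method names are assumptions made for the example, and the error text only approximates the message shown in the manual test output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the eligibility rule; not the actual Ozone implementation. */
public class DiskBalancerEligibilitySketch {

  // Simplified stand-ins for a datanode's operational and health states (assumed names).
  public enum OpState {
    IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED, ENTERING_MAINTENANCE, IN_MAINTENANCE
  }

  public enum Health { HEALTHY, STALE, DEAD }

  /** Minimal datanode description used only in this sketch. */
  public static final class Node {
    final String host;
    final OpState opState;
    final Health health;

    public Node(String host, OpState opState, Health health) {
      this.host = host;
      this.opState = opState;
      this.health = health;
    }

    /** A node is considered optimal only when it is both IN_SERVICE and HEALTHY. */
    boolean isOptimal() {
      return opState == OpState.IN_SERVICE && health == Health.HEALTHY;
    }
  }

  /** Explicitly targeted node: returns an error message if the command must be rejected. */
  public static Optional<String> rejectionFor(Node node) {
    if (node.isOptimal()) {
      return Optional.empty();
    }
    return Optional.of(node.host
        + ": Datanode is not in optimal state for disk balancing. NodeStatus: "
        + node.opState + "-" + node.health);
  }

  /** All-datanodes case: keep only the hosts that are IN_SERVICE and HEALTHY. */
  public static List<String> eligibleHosts(List<Node> nodes) {
    List<String> hosts = new ArrayList<>();
    for (Node node : nodes) {
      if (node.isOptimal()) {
        hosts.add(node.host);
      }
    }
    return hosts;
  }
}
```

In this sketch, a command aimed at one named datanode surfaces an error through `rejectionFor`, while a command aimed at every datanode quietly narrows its target list through `eligibleHosts`. Both paths rely on the same `isOptimal` check, which is the consistency this PR is after.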

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13667

How was this patch tested?

Updated the existing integration test TestDiskBalancerDuringDecommissionAndMaintenance.
Also tested manually on a docker cluster:

bash-5.1$ ozone admin datanode decommission -id scmservice ozone-ha-datanode-5
Started decommissioning datanode(s):
ozone-ha-datanode-5

bash-5.1$ ozone admin datanode diskbalancer status
Status result:
Datanode                            Status          Threshold(%)    BandwidthInMB   Threads      SuccessMove  FailureMove  BytesMoved(MB)  EstBytesToMove(MB) EstTimeLeft(min)
ozone-ha-datanode-3.ozone-ha_default STOPPED         10.0000         200             5            0            0            0               0               0              
ozone-ha-datanode-2.ozone-ha_default STOPPED         10.0000         200             5            0            0            0               0               0              
ozone-ha-datanode-4.ozone-ha_default STOPPED         10.0000         200             5            0            0            0               0               0              
ozone-ha-datanode-1.ozone-ha_default STOPPED         10.0000         200             5            0            0            0               0               0              

Note: Estimated time left is calculated based on the estimated bytes to move and the configured disk bandwidth.

bash-5.1$ ozone admin datanode diskbalancer start -b 200 -d ozone-ha-datanode-5
Error: ozone-ha-datanode-5.ozone-ha_default: Datanode is not in optimal state for disk balancing. NodeStatus: DECOMMISSIONING(no expiry)-HEALTHY
Some nodes could not start DiskBalancer.

bash-5.1$ ozone admin datanode decommission -id scmservice ozone-ha-datanode-3 
Started decommissioning datanode(s):
ozone-ha-datanode-3

bash-5.1$ ozone admin datanode decommission -id scmservice ozone-ha-datanode-4 
Started decommissioning datanode(s):
ozone-ha-datanode-4

bash-5.1$ ozone admin datanode diskbalancer start -t 0.002 -a
Starting DiskBalancer on datanode(s) which are IN_SERVICE and HEALTHY.

sarvekshayr

Thanks for the patch @Gargi-jais11.

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DiskBalancerManager.java

...i-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/datanode/DiskBalancerCommonOptions.java

sarvekshayr · 2025-10-06T08:51:40Z

Thanks for updating the patch. Please change the log message from the first output in the PR description.

sumitagrawl

@Gargi-jais11 I have left a minor comment.

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DiskBalancerManager.java

sumitagrawl

LGTM

ChenSammi · 2025-10-16T08:21:25Z

Thanks @Gargi-jais11 for the contribution, and @sumitagrawl for the review.

diskbalancer should not start, stop or update when not in optimal state

62e6b01

Gargi-jais11 marked this pull request as ready for review October 3, 2025 09:01

sarvekshayr reviewed Oct 6, 2025

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DiskBalancerManager.java (outdated, resolved)

...i-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/datanode/DiskBalancerCommonOptions.java (resolved)

updated javadoc and cli message

7f9a666

sumitagrawl reviewed Oct 9, 2025

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DiskBalancerManager.java (outdated, resolved)

updated error statement and renamed method

68071a0

Gargi-jais11 requested a review from sumitagrawl October 13, 2025 05:42

fixed TestDiskBalancerDuringDecommissionAndMaintenance failure

305fcf9

sumitagrawl reviewed Oct 13, 2025

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DiskBalancerManager.java (outdated, resolved)

updated javadoc

e6b3dcc

Gargi-jais11 requested a review from sumitagrawl October 13, 2025 13:00

sumitagrawl approved these changes Oct 13, 2025

ChenSammi approved these changes Oct 16, 2025

ChenSammi merged commit 2fa932c into apache:HDDS-5713 Oct 16, 2025
83 of 84 checks passed
