HDDS-12437. [DiskBalancer] Estimate the total size pending to move before disk usage becomes even #8056

Gargi-jais11 · 2025-03-12T05:59:50Z

What changes were proposed in this pull request?

The value is an estimate, because other activities are going on at the same time, such as block deletion, new container creation, container replica deletion, and new data ingestion.

This value indicates roughly how much time is still needed for the disk usage to become even.

This value should also be included in the status report.

The status report now gives the estimated time remaining before disk usage becomes even, for the happy path.
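The PR text does not spell out how the estimate is computed. One plausible definition, assuming the balancer aims to bring every volume within the threshold of the node-wide average utilization: for volumes i with used bytes U_i and capacity C_i, and a threshold t in percent, the bytes pending to move B are the bytes above the upper limit L on each over-utilized volume. The symbols here are illustrative, not taken from the PR.

```latex
\begin{aligned}
\bar{u} &= \frac{\sum_i U_i}{\sum_i C_i}, \qquad u_i = \frac{U_i}{C_i}, \qquad L = \bar{u} + \frac{t}{100},\\
B &= \sum_{i \,:\, u_i > L} (u_i - L)\, C_i = \sum_{i \,:\, u_i > L} \left( U_i - L\, C_i \right).
\end{aligned}
```

For example, with two 1000-byte volumes holding 900 and 100 bytes and t = 10, the average utilization is 0.5 and L = 0.6; only the first volume exceeds L, so B = 900 - 0.6 × 1000 = 300 bytes.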

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-12437

How was this patch tested?

A unit test for the function is added in TestDiskBalancerService#testCalculateBytesToMove.

Also tested manually on a local Docker cluster.

bash-5.1$ ozone admin datanode diskbalancer status
Status result:
Datanode                            VolumeDensity             Status          Threshold(%)    BandwidthInMB   Threads      SuccessMove  FailureMove  EstBytesToMove(MB) EstTimeLeft(min)
ozone-datanode-2.ozone_default      0.001334742544789370      RUNNING         0.0002          10              5            5            0            816                2           
ozone-datanode-1.ozone_default      0.001486062001891789      RUNNING         0.0002          10              5            5            0            3185               6           

Note: Estimated time left is calculated based on the estimated bytes to move and the configured disk bandwidth.

bash-5.1$ ozone admin datanode diskbalancer status
Status result:
Datanode                            VolumeDensity             Status          Threshold(%)    BandwidthInMB   Threads      SuccessMove  FailureMove  EstBytesToMove(MB) EstTimeLeft(min)
ozone-datanode-2.ozone_default      0.001334742544789370      RUNNING         0.0002          10              5            8            2            683                2           
ozone-datanode-1.ozone_default      0.001486062001891789      RUNNING         0.0002          10              5            5            3            763                2           

Note: Estimated time left is calculated based on the estimated bytes to move and the configured disk bandwidth.
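The note can be checked against the two status outputs above. A minimal sketch, assuming BandwidthInMB is a per-second rate and that a partial minute is rounded up to a full minute; neither is stated in the PR, but together they reproduce every EstTimeLeft(min) value shown (for instance, 816 MB at 10 MB/s is 81.6 s, which rounds up to 2 minutes). The class and method names are hypothetical, not the Ozone CLI code.

```java
// Hypothetical helper, not the Ozone CLI code: estimates minutes left from the
// estimated megabytes still to move and a bandwidth assumed to be MB per second.
final class EstTimeLeft {

    private static final long SECONDS_PER_MINUTE = 60;

    static long estimateMinutesLeft(long bytesToMoveMB, long bandwidthMBPerSec) {
        if (bandwidthMBPerSec <= 0) {
            throw new IllegalArgumentException("bandwidth must be positive");
        }
        if (bytesToMoveMB <= 0) {
            return 0;
        }
        long mbPerMinute = bandwidthMBPerSec * SECONDS_PER_MINUTE;
        // Integer ceiling division: any partial minute counts as a full minute.
        return (bytesToMoveMB + mbPerMinute - 1) / mbPerMinute;
    }

    public static void main(String[] args) {
        // EstBytesToMove(MB) values from the status outputs, with BandwidthInMB = 10.
        long[] estMB = {816, 3185, 683, 763};
        for (long mb : estMB) {
            System.out.println(mb + " MB -> " + estimateMinutesLeft(mb, 10) + " min");
        }
    }
}
```

Rounding up keeps a small remaining amount from being reported as 0 minutes while the balancer is still running.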

…fore disk usage becomes even

# Conflicts:
#   hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/diskbalancer/DiskBalancerInfo.java
#   hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/diskbalancer/DiskBalancerService.java
#   hadoop-hdds/interface-client/src/main/proto/hdds.proto
#   hadoop-hdds/interface-server/src/main/proto/ScmServerDatanodeHeartbeatProtocol.proto
#   hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DiskBalancerManager.java
#   hadoop-hdds/tools/src/main/java/org/apache/hadoop/hdds/scm/cli/datanode/DiskBalancerStatusSubcommand.java

ChenSammi · 2025-03-12T08:18:45Z

@Gargi-jais11, let's show EstBytesToMove(MB) together with the estimated time, and add a note at the end of the command output explaining that the estimated time is calculated from the estimated bytes to move and the bandwidth.

Gargi-jais11 · 2025-03-12T08:21:58Z

Sure, I will make the changes mentioned above.

ChenSammi · 2025-03-14T09:14:12Z

...ce/src/test/java/org/apache/hadoop/ozone/container/diskbalancer/TestDiskBalancerService.java

+          getDiskBalancerService(containerSet, conf, keyValueHandler, null, 1);
+      svc.setShouldRun(true);
+      svc.setThreshold(10);
+      svc.setQueueSize(2);


Why hard-code the queue size to 2?

Because if I don't hard-code the queue size, the queue size is always 0 when calculateBytesToMove computes bytesToMove, since getTask is not called in the test. When I tried calling getTask, containerChoosingPolicy threw lots of errors, so to just check calculateBytesToMove I hard-coded the queue size.

You can pass the volumeSet and OzoneConfiguration to calculateBytesToMove() and make it public, so that you can directly call calculateBytesToMove from the unit test, and don't have to deal with the containerChoosingPolicy.

I tried passing the volumeSet and OzoneConfiguration to calculateBytesToMove() without hard-coding the queue size (to 2 or to the volume count), but bytesToMove still returns 0 because the queue size is 0. After refactoring the unit test as you suggested and hard-coding the queue size to 2, all test cases pass.
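The reviewer's suggestion amounts to making the estimate depend only on its arguments, so a test can call it with synthetic volumes instead of going through the container choosing policy. A minimal sketch of that shape, with hypothetical names throughout: `BytesToMoveEstimator`, `VolumeUsage`, and this `calculateBytesToMove(List, double)` signature are illustrations, not the PR's code, and the formula (bytes above average utilization plus threshold on each volume) is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the Ozone DiskBalancerService API: the estimate takes
// its inputs as parameters so a unit test can call it with synthetic volumes.
final class BytesToMoveEstimator {

    /** Illustrative stand-in for one volume's usage; real code would read it from the volume set. */
    static final class VolumeUsage {
        final long usedBytes;
        final long capacityBytes;

        VolumeUsage(long usedBytes, long capacityBytes) {
            this.usedBytes = usedBytes;
            this.capacityBytes = capacityBytes;
        }
    }

    /**
     * Sums the bytes above (average utilization + threshold) on each volume,
     * i.e. roughly what must move for all volumes to fall within the threshold.
     */
    static long calculateBytesToMove(List<VolumeUsage> volumes, double thresholdPercent) {
        long totalUsed = 0;
        long totalCapacity = 0;
        for (VolumeUsage v : volumes) {
            totalUsed += v.usedBytes;
            totalCapacity += v.capacityBytes;
        }
        if (totalCapacity == 0) {
            return 0; // no capacity information, nothing to estimate
        }
        double upperLimit = (double) totalUsed / totalCapacity + thresholdPercent / 100.0;
        long bytesToMove = 0;
        for (VolumeUsage v : volumes) {
            if (v.capacityBytes == 0) {
                continue;
            }
            double utilization = (double) v.usedBytes / v.capacityBytes;
            if (utilization > upperLimit) {
                // Round to the nearest byte to absorb floating-point noise.
                bytesToMove += Math.round((utilization - upperLimit) * v.capacityBytes);
            }
        }
        return bytesToMove;
    }

    public static void main(String[] args) {
        List<VolumeUsage> skewed = Arrays.asList(
            new VolumeUsage(900, 1000), new VolumeUsage(100, 1000));
        System.out.println(calculateBytesToMove(skewed, 10)); // 300
    }
}
```

With two 1000-byte volumes holding 900 and 100 bytes and a 10% threshold, the average utilization is 50%, the upper limit is 60%, and 300 bytes must leave the first volume. Because nothing here depends on a task queue, the test needs no hard-coded queue size.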

...ce/src/test/java/org/apache/hadoop/ozone/container/diskbalancer/TestDiskBalancerService.java

ChenSammi · 2025-03-14T09:22:52Z

Please also update the console output.

Gargi-jais11 · 2025-03-14T09:34:39Z

done

ChenSammi · 2025-03-24T07:40:04Z

Thanks @Gargi-jais11 .

Gargi Jaiswal added 5 commits March 11, 2025 15:22

HDDS-12437. [DiskBalancer] Estimate the total size pending to move before disk usage becomes even

3c54d33

updating scm side to get success and failure move count to status cli

240e506

Resolved checkstyle failures and aligned cli properly

3b8cea7

fixed minor issues

8b0a503

Gargi-jais11 force-pushed the HDDS-12437 branch from 263978d to 8b0a503 on March 12, 2025 07:54