Skip to content

Conversation

@szetszwo
Copy link
Contributor

@szetszwo szetszwo commented Dec 15, 2025

What changes were proposed in this pull request?

  • In DefaultVolumeChoosingPolicy.chooseVolume(..), getCurrentUsage() is called multiple times so it can return different values. The biggest problem is the sorted(..) call. The code calls getCurrentUsage() in the comparator. The behavior is ill-defined for sorting changing values.

  • Another bug in the code below is that it has filtered the volumes > threshold first and then choose destVolume. The destVolume must also be > threshold, which is inccorrect.

Code filtering out volumes > threshold:

List<HddsVolume> volumes = allVolumes
.stream()
.filter(volume -> {
SpaceUsageSource usage = volume.getCurrentUsage();
return Math.abs(
((double)((usage.getCapacity() - usage.getAvailable())
+ deltaMap.getOrDefault(volume, 0L) + volume.getCommittedBytes()))
/ usage.getCapacity() - idealUsage) >= normalizedThreshold;
}).sorted((v1, v2) -> {

Code choosing destVolume:

HddsVolume destVolume = volumes.get(volumes.size() - 1);
while (!filter.test(destVolume)) {
// If the destination volume does not have enough space, try the next
// one in the list.
LOG.debug("Destination volume {} does not have enough space, trying next volume.",
destVolume.getStorageID());
volumes.remove(destVolume);
if (volumes.size() <= 1) {
LOG.debug("Can not find appropriate Source volume and Dest Volume.");
return null;
}
destVolume = volumes.get(volumes.size() - 1);
}

What is the link to the Apache JIRA

HDDS-14175

How was this patch tested?

By updating existing tests.

@ivandika3 ivandika3 requested a review from ChenSammi December 16, 2025 01:41
Copy link
Contributor

@Gargi-jais11 Gargi-jais11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @szetszwo for finding out these potential bugs. The patch overall LGTM.
Below are some comments.

@Override
public Pair<HddsVolume, HddsVolume> chooseVolume(MutableVolumeSet volumeSet,
double threshold, Map<HddsVolume, Long> deltaMap, long containerSize) {
double thresholdPercentage, Map<HddsVolume, Long> deltaMap, long containerSize) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method parameter is named thresholdPercentage but the interface (DiskBalancerVolumeChoosingPolicy) uses thresholdso the parameter name should match the interface for consistency. Could you please change in interface as well from threshold to thresholdPercentage.

Current code:

* @param threshold - the threshold to choose source and dest volumes.
* @param deltaSizes - the sizes changes of inProgress balancing jobs.
* @param containerSize - the estimated size of container to be moved.
* @return Source volume and Dest volume.
*/
Pair<HddsVolume, HddsVolume> chooseVolume(MutableVolumeSet volumeSet,
double threshold, Map<HddsVolume, Long> deltaSizes, long containerSize);

Comment on lines 63 to 65
// Calculate usages and sort in ascending order of utilization
final List<VolumeFixedUsage> volumeUsages = allVolumes.stream()
.map(VolumeFixedUsage::new)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here as well we should use getVolumeUsages() utility method that exists in DiskBalancerVolumeCalculation instead of creating inline VolumeFixedUsage objects.

final List<VolumeFixedUsage> volumeUsages = getVolumeUsages(volumeSet).stream()
    .sorted(Comparator.comparingDouble(v -> v.computeUtilization(deltaMap)))
    .collect(Collectors.toList());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use getVolumeUsages(volumeSet), it will copy the list again. The list may has changed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yaa correct that makes sense.

Comment on lines 129 to 130
public static double computeUtilization(HddsVolume volume, long required) {
return computeUtilization(volume.getCurrentUsage(), volume.getCommittedBytes(), required);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szetszwo I think this calls getCurrentUsage() which can return different values on each call, potentially causing inconsistent behavior.
Its better to accept VolumeFixedUsage incomputeUtilization rather than
computeUtilization(HddsVolume volume, long required)

What do you say about this?

Copy link
Contributor Author

@szetszwo szetszwo Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For correctness, it is okay since it is used in choosing container. But for performance, it is better to get it first and keep using the same object. Let me change it.

@szetszwo szetszwo changed the title HDDS-14175. DefaultVolumeChoosingPolicy should not call getCurrentUsage() multiple times. HDDS-14175. DiskBalancer should not call getCurrentUsage() multiple times. Dec 16, 2025
*
* @param volumeSet - volumes to choose from.
* @param threshold - the threshold to choose source and dest volumes.
* @param thresholdPercentage the threshold percentage in range [0, 100] to choose the source volume.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Threshold range 0 and 100 are exclusive, so (0, 100).

Copy link
Contributor Author

@szetszwo szetszwo Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For double, the end values are unimportant except for spacial reasons such as division-by-zeros. For example, why 0.001 and 99.999 are allowed but not 0 and 100?

BTW, 0 and 100 should be allowed in this case

  • 0: balance any volumes > average.
  • 100: disable it in this method.

If we want to disallow them, we should validate the conf and throw exceptions instead of relying on the javadoc.

In the javadoc here, I just want to emphasize that the threshold here is a percentage [0,100] but not a ratio [0,1]. The previous code use the word "threshold" interchangeably for both percentage and ratio. (Currently, some other code may still do that.)

Actually, it is better to change the conf to understand percentage -- i.e. users can use both "0.1" and "10%" for the same value. After parsing the conf, the code always treat threshold as a ratio [0,1].

Copy link
Contributor

@ChenSammi ChenSammi Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, "threshold.percentage" is more clear, and a overall check of threshold configuration is desired. We can do this in a new JIRA.

@ChenSammi
Copy link
Contributor

@szetszwo, thanks for this improvement. Overall it looks great. Just one minor comment about the threshold range comment.

Another bug is that it has filtered the volumes > threshold first and then choose destVolume. The destVolume must also be > threshold, which is undesiable.

For this description, would you elaborate it a bit more? IIUC, it seems not the case.

Comment on lines 63 to -71

// Threshold is given as a percentage
double normalizedThreshold = threshold / 100;
List<HddsVolume> volumes = allVolumes
.stream()
.filter(volume -> {
SpaceUsageSource usage = volume.getCurrentUsage();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChenSammi , this is the bug filtering out < threashold volumes. Later on, it uses volumes list to find destVolume

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szetszwo, I got the point after a second thought. My previous understanding of this filtering out < threashold volumes is it tries to implement a high efficient way of selecting destVolume, as a straightforward thinking is, if there is one volume beyond the utilization threshold, there is likely one volume below the utilization threshold, but realized that actually there are other cases, that there is one volume beyond threshold and no volumes under threshold, or there is one volume under threshold and no volumes beyond threshold,

(1) one volume beyond threshold and no volumes under threshold

Disk1, 30, 100
Disk2, 30, 100
Disk3, 40, 100

100 / 300 = 33.3%
Disk1: 30%
Disk2: 30%
Disk3: 40%

Threshold: 10
Disk utilization range (23.3, 43.3)
Out range volume list: NULL

Threshold: 5
Disk utilization range (28.3, 38.3)
Out range volume list: Disk3

(2) one volume under threshold and no volumes beyond threshold

Disk1, 30, 100
Disk2, 30, 100
Disk3, 20, 100

80 / 300 = 26.7%
Disk1: 30%
Disk2: 30%
Disk3: 20%

Threshold: 10
Disk utilization range (16.7, 36.7)
Out range volume list: NULL

Threshold: 5
Disk utilization range (21.7, 31.7)
Out range volume list: Disk3

So above two cases, are typical cases which are not covered by existing logic. And it looks like case (2) is not covered by new logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... And it looks like case (2) is not covered by new logic.

@ChenSammi , both the old and the new code won't balance case (2). I thought it is the expectation. No?

In your comment, there are upper and lower thresholds but, in the (old) code, it only considers the upper the threshold but never uses the lower threshold.

BTW, I am not trying to change logic here. Just want to fix the bugs described.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. Will address case (2) with a new JIRA. @Gargi-jais11 , could you open a new follow up JIRA for this?

Thanks for pointing out the issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure @ChenSammi

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the follow up JIRA, we should add the tests described in https://github.com/apache/ozone/pull/9505/changes#r2629666258

@szetszwo
Copy link
Contributor Author

For this description, would you elaborate it a bit more? ...

Updated

Copy link
Contributor

@ChenSammi ChenSammi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @szetszwo .

@ChenSammi ChenSammi merged commit 2abd751 into apache:HDDS-5713 Dec 19, 2025
47 checks passed
@szetszwo
Copy link
Contributor Author

@Gargi-jais11 , @ChenSammi , thanks for reviewing this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants