-
Notifications
You must be signed in to change notification settings - Fork 594
HDDS-14175. DiskBalancer should not call getCurrentUsage() multiple times. #9505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Gargi-jais11
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @szetszwo for finding out these potential bugs. The patch overall LGTM.
Below are some comments.
| @Override | ||
| public Pair<HddsVolume, HddsVolume> chooseVolume(MutableVolumeSet volumeSet, | ||
| double threshold, Map<HddsVolume, Long> deltaMap, long containerSize) { | ||
| double thresholdPercentage, Map<HddsVolume, Long> deltaMap, long containerSize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method parameter is named thresholdPercentage but the interface (DiskBalancerVolumeChoosingPolicy) uses thresholdso the parameter name should match the interface for consistency. Could you please change in interface as well from threshold to thresholdPercentage.
Current code:
Lines 33 to 39 in 388084e
| * @param threshold - the threshold to choose source and dest volumes. | |
| * @param deltaSizes - the sizes changes of inProgress balancing jobs. | |
| * @param containerSize - the estimated size of container to be moved. | |
| * @return Source volume and Dest volume. | |
| */ | |
| Pair<HddsVolume, HddsVolume> chooseVolume(MutableVolumeSet volumeSet, | |
| double threshold, Map<HddsVolume, Long> deltaSizes, long containerSize); |
| // Calculate usages and sort in ascending order of utilization | ||
| final List<VolumeFixedUsage> volumeUsages = allVolumes.stream() | ||
| .map(VolumeFixedUsage::new) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here as well we should use getVolumeUsages() utility method that exists in DiskBalancerVolumeCalculation instead of creating inline VolumeFixedUsage objects.
final List<VolumeFixedUsage> volumeUsages = getVolumeUsages(volumeSet).stream()
.sorted(Comparator.comparingDouble(v -> v.computeUtilization(deltaMap)))
.collect(Collectors.toList());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use getVolumeUsages(volumeSet), it will copy the list again. The list may has changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yaa correct that makes sense.
| public static double computeUtilization(HddsVolume volume, long required) { | ||
| return computeUtilization(volume.getCurrentUsage(), volume.getCommittedBytes(), required); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@szetszwo I think this calls getCurrentUsage() which can return different values on each call, potentially causing inconsistent behavior.
Its better to accept VolumeFixedUsage incomputeUtilization rather than
computeUtilization(HddsVolume volume, long required)
What do you say about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For correctness, it is okay since it is used in choosing container. But for performance, it is better to get it first and keep using the same object. Let me change it.
| * | ||
| * @param volumeSet - volumes to choose from. | ||
| * @param threshold - the threshold to choose source and dest volumes. | ||
| * @param thresholdPercentage the threshold percentage in range [0, 100] to choose the source volume. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Threshold range 0 and 100 are exclusive, so (0, 100).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For double, the end values are unimportant except for spacial reasons such as division-by-zeros. For example, why 0.001 and 99.999 are allowed but not 0 and 100?
BTW, 0 and 100 should be allowed in this case
- 0: balance any volumes > average.
- 100: disable it in this method.
If we want to disallow them, we should validate the conf and throw exceptions instead of relying on the javadoc.
In the javadoc here, I just want to emphasize that the threshold here is a percentage [0,100] but not a ratio [0,1]. The previous code use the word "threshold" interchangeably for both percentage and ratio. (Currently, some other code may still do that.)
Actually, it is better to change the conf to understand percentage -- i.e. users can use both "0.1" and "10%" for the same value. After parsing the conf, the code always treat threshold as a ratio [0,1].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya, "threshold.percentage" is more clear, and a overall check of threshold configuration is desired. We can do this in a new JIRA.
|
@szetszwo, thanks for this improvement. Overall it looks great. Just one minor comment about the threshold range comment.
For this description, would you elaborate it a bit more? IIUC, it seems not the case. |
|
|
||
| // Threshold is given as a percentage | ||
| double normalizedThreshold = threshold / 100; | ||
| List<HddsVolume> volumes = allVolumes | ||
| .stream() | ||
| .filter(volume -> { | ||
| SpaceUsageSource usage = volume.getCurrentUsage(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChenSammi , this is the bug filtering out < threashold volumes. Later on, it uses volumes list to find destVolume
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@szetszwo, I got the point after a second thought. My previous understanding of this filtering out < threashold volumes is it tries to implement a high efficient way of selecting destVolume, as a straightforward thinking is, if there is one volume beyond the utilization threshold, there is likely one volume below the utilization threshold, but realized that actually there are other cases, that there is one volume beyond threshold and no volumes under threshold, or there is one volume under threshold and no volumes beyond threshold,
(1) one volume beyond threshold and no volumes under threshold
Disk1, 30, 100
Disk2, 30, 100
Disk3, 40, 100
100 / 300 = 33.3%
Disk1: 30%
Disk2: 30%
Disk3: 40%
Threshold: 10
Disk utilization range (23.3, 43.3)
Out range volume list: NULL
Threshold: 5
Disk utilization range (28.3, 38.3)
Out range volume list: Disk3
(2) one volume under threshold and no volumes beyond threshold
Disk1, 30, 100
Disk2, 30, 100
Disk3, 20, 100
80 / 300 = 26.7%
Disk1: 30%
Disk2: 30%
Disk3: 20%
Threshold: 10
Disk utilization range (16.7, 36.7)
Out range volume list: NULL
Threshold: 5
Disk utilization range (21.7, 31.7)
Out range volume list: Disk3
So above two cases, are typical cases which are not covered by existing logic. And it looks like case (2) is not covered by new logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... And it looks like case (2) is not covered by new logic.
@ChenSammi , both the old and the new code won't balance case (2). I thought it is the expectation. No?
In your comment, there are upper and lower thresholds but, in the (old) code, it only considers the upper the threshold but never uses the lower threshold.
BTW, I am not trying to change logic here. Just want to fix the bugs described.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense. Will address case (2) with a new JIRA. @Gargi-jais11 , could you open a new follow up JIRA for this?
Thanks for pointing out the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure @ChenSammi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the follow up JIRA, we should add the tests described in https://github.com/apache/ozone/pull/9505/changes#r2629666258
Updated |
ChenSammi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @szetszwo .
|
@Gargi-jais11 , @ChenSammi , thanks for reviewing this! |
What changes were proposed in this pull request?
In DefaultVolumeChoosingPolicy.chooseVolume(..), getCurrentUsage() is called multiple times so it can return different values. The biggest problem is the sorted(..) call. The code calls getCurrentUsage() in the comparator. The behavior is ill-defined for sorting changing values.
Another bug in the code below is that it has filtered the volumes > threshold first and then choose destVolume. The destVolume must also be > threshold, which is inccorrect.
Code filtering out volumes > threshold:
ozone/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/diskbalancer/policy/DefaultVolumeChoosingPolicy.java
Lines 68 to 78 in 388084e
Code choosing
destVolume:ozone/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/diskbalancer/policy/DefaultVolumeChoosingPolicy.java
Lines 98 to 110 in 388084e
What is the link to the Apache JIRA
HDDS-14175
How was this patch tested?
By updating existing tests.