HDDS-8844. Internal move logic for DiskBalancer #4887
Conversation
@ChenSammi Could you help review? I will provide more test cases as the review progresses.
if (toBalanceContainer != null) {
  queue.add(new DiskBalancerTask(toBalanceContainer, sourceVolume,
      destVolume));
  inProgressContainers.add(toBalanceContainer.getContainerID());
The 'isBalancingContainer' method is never used. What will happen if a command to delete a container/block is received while the container is being balanced? Or do the locks in the container solve the synchronization problem?
I was thinking we can drop the commands and let SCM resend them.
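To make that idea concrete, here is a minimal sketch of the drop-and-let-SCM-retry approach. Only isBalancingContainer and inProgressContainers mirror names from this PR; the surrounding class and handler method are hypothetical.

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    // Hypothetical sketch; only isBalancingContainer and
    // inProgressContainers mirror names from this PR.
    class DiskBalancerCommandFilter {
      private static final Logger LOG =
          LoggerFactory.getLogger(DiskBalancerCommandFilter.class);

      // IDs of containers whose replicas are mid-move.
      private final Set<Long> inProgressContainers =
          ConcurrentHashMap.newKeySet();

      boolean isBalancingContainer(long containerId) {
        return inProgressContainers.contains(containerId);
      }

      // Returns false if the delete command should be dropped for now;
      // SCM resends pending commands, so the delete is retried after
      // the move completes.
      boolean shouldProcessDelete(long containerId) {
        if (isBalancingContainer(containerId)) {
          LOG.info("Container {} is being balanced; dropping delete command",
              containerId);
          return false;
        }
        return true;
      }
    }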
@symious What is the current status of this PR? It has been open for a while. Are you interested in moving it forward?
@sodonnel Yes, it's been a while for this PR. I'll fix the checks first.
.resolve(DISK_BALANCER_DIR).resolve(String.valueOf(containerId));

// Copy container to new Volume's tmp Dir
ozoneContainer.getController().copyContainer(containerData,
Is there somewhere else in the code that blocks operations against the container while the export / import is happening? E.g., suppose we run copyContainer; after it completes, a delete block command is processed, deleting the block from the original copy but not the new copy. Then we continue, and at the end we have a block in the moved container that should have been removed. I think each of these operations on the container (copy, export, import, etc.) locks the container instance, but it feels like we need something to lock across the entire move cycle, while also allowing reads to continue.
I think currently we don't have such locks. Since a lock held across export and import could be time-consuming, other operations might complain about it.
Also, it's already possible for some replicas to differ from the others, due to DN restarts or exporting, so it would be good to have a mechanism to sync all replicas.
We have a design in progress that will deal with mismatches between the containers in terms of deleted blocks, but I feel we should try to avoid causing such problems if we can. I am not sure how hard that might be with the current code layout.
E.g., if a read lock were held for the duration of the container move, it would allow block reads etc. to come through, but would stop block deletes. However, at the moment I don't think readChunk has any locks involved, so this would require a lot of difficult changes, I think.
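For illustration, a minimal sketch of the locking scheme described above, assuming a per-container ReentrantReadWriteLock (none of these names exist in the PR): the mover holds the read lock for the whole move, so reads, which also take the read lock, continue, while deletes need the write lock and must wait.

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Illustrative only; none of these names exist in the PR.
    class ContainerMoveLockSketch {
      private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

      void readChunk(Runnable read) {
        lock.readLock().lock();      // shared with an in-flight move
        try {
          read.run();
        } finally {
          lock.readLock().unlock();
        }
      }

      void deleteBlock(Runnable delete) {
        lock.writeLock().lock();     // blocks while a move holds the read lock
        try {
          delete.run();
        } finally {
          lock.writeLock().unlock();
        }
      }

      void moveContainer(Runnable copyAndImport) {
        lock.readLock().lock();      // held across the entire move cycle
        try {
          copyAndImport.run();
        } finally {
          lock.readLock().unlock();
        }
      }
    }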
We have a design in progress that will deal with mismatches between the containers in terms of deleted blocks
Looking forward to this feature.
Currently I think reads won't be affected; clients can still read data while the move is ongoing.
I think it will be OK if the block deletes get missed. This isn't really any different from replication between nodes, as after tarring a container a block could be deleted before the new replica is available on the target node.
Preconditions.checkNotNull(container, "container cannot be null");

long containerId = container.getContainerData().getContainerID();
if (!containerMap.containsKey(containerId)) {
We should probably use computeIfPresent() here, as it makes the operation atomic. There is a small chance the mapping could be removed between the containsKey() check and the put.
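A minimal, self-contained illustration of the race and the atomic alternative; the map's value type is simplified to Object here, whereas the real map holds Container instances.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    class ContainerMapSketch {
      // Simplified stand-in for the real container map.
      private final ConcurrentMap<Long, Object> containerMap =
          new ConcurrentHashMap<>();

      // Racy: another thread may remove the mapping between the
      // containsKey() check and the put().
      void updateRacy(long containerId, Object container) {
        if (containerMap.containsKey(containerId)) {
          containerMap.put(containerId, container);
        }
      }

      // Atomic: the check and the update happen under the map's own
      // synchronization, so the window above is closed.
      void updateAtomic(long containerId, Object container) {
        containerMap.computeIfPresent(containerId, (id, old) -> container);
      }
    }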
Is this the last PR needed to get the disk balancer working? Also, have you been running a version of this on your own clusters already? I understand you needed this feature, but it has been under development for a long time now.
Yes.
Yes, but the version on our cluster is different from this PR. Also, our cluster's Ozone version is quite different from master.
sodonnel left a comment
I think this change is good to commit. +1 from me.
What changes were proposed in this pull request?
This ticket includes the internal move logic for the DiskBalancer.
The code is mainly from HDDS-7233.
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-8844
How was this patch tested?
Used unit test from HDDS-7233.
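For context, a hedged sketch of the per-container move cycle as it appears from the snippets in this thread: copy the container to the destination volume's tmp dir, import it there, then drop the source replica. Only copyContainer and the DISK_BALANCER_DIR tmp-dir layout come from the diffs above; every other name, including the Controller interface and the constant's value, is a stand-in.

    import java.io.IOException;
    import java.nio.file.Path;

    // Hedged sketch; only copyContainer and the DISK_BALANCER_DIR
    // tmp-dir layout mirror the diff snippets above.
    class DiskBalancerMoveSketch {

      interface Controller {
        void copyContainer(long containerId, Path destTmpDir) throws IOException;
        void importContainer(long containerId, Path fromDir) throws IOException;
        void deleteSourceReplica(long containerId) throws IOException;
      }

      // Placeholder value; the real constant is defined in the PR.
      static final String DISK_BALANCER_DIR = "diskBalancer";

      void move(Controller controller, long containerId, Path destVolumeRoot)
          throws IOException {
        // Stage under <destVolume>/<DISK_BALANCER_DIR>/<containerId> so a
        // crash mid-copy never leaves a half-imported container visible.
        Path tmpDir = destVolumeRoot
            .resolve(DISK_BALANCER_DIR).resolve(String.valueOf(containerId));

        controller.copyContainer(containerId, tmpDir);   // 1. copy data
        controller.importContainer(containerId, tmpDir); // 2. activate on dest
        controller.deleteSourceReplica(containerId);     // 3. remove source
      }
    }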