HDDS-8844. Internal move logic for DiskBalancer #4887
Conversation
@ChenSammi Could you help review? I will provide more test cases as the review progresses.
if (toBalanceContainer != null) {
  queue.add(new DiskBalancerTask(toBalanceContainer, sourceVolume,
      destVolume));
  inProgressContainers.add(toBalanceContainer.getContainerID());
The 'isBalancingContainer' method is never used. What will happen if a command to delete a container/block is received while the container is being balanced? Or do the locks in the container solve the synchronization problem?
I was thinking we can drop the commands and let SCM resend them.
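To make that idea concrete, here is a minimal sketch of the drop-and-let-SCM-retry approach. Only isBalancingContainer and inProgressContainers mirror names from this PR; the surrounding class and handler method are hypothetical.

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    // Hypothetical sketch; only isBalancingContainer and
    // inProgressContainers mirror names from this PR.
    class DiskBalancerCommandFilter {
      private static final Logger LOG =
          LoggerFactory.getLogger(DiskBalancerCommandFilter.class);

      // IDs of containers whose replicas are mid-move.
      private final Set<Long> inProgressContainers =
          ConcurrentHashMap.newKeySet();

      boolean isBalancingContainer(long containerId) {
        return inProgressContainers.contains(containerId);
      }

      // Returns false if the delete command should be dropped for now;
      // SCM resends pending commands, so the delete is retried after
      // the move completes.
      boolean shouldProcessDelete(long containerId) {
        if (isBalancingContainer(containerId)) {
          LOG.info("Container {} is being balanced; dropping delete command",
              containerId);
          return false;
        }
        return true;
      }
    }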
@symious What is the current status of this PR? It has been open for a while. Are you interested in moving it forward?
@sodonnel Yes, it's been a while for this PR. I'll fix the checks first.
.resolve(DISK_BALANCER_DIR).resolve(String.valueOf(containerId));

// Copy container to new Volume's tmp Dir
ozoneContainer.getController().copyContainer(containerData,
Is there somewhere else in the code that blocks operations against the container while the export / import is happening? E.g., suppose we run copyContainer; after it completes, a delete block command is processed, deleting the block from the original copy but not the new copy. Then we continue, and at the end we have a block in the moved container that should have been removed. I think each of these operations on the container (copy, export, import, etc.) locks the container instance, but it feels like we need something to lock across the entire move cycle, while also allowing reads to continue.
I think currently we don't have such locks. Since a lock held across export and import could be time-consuming, other operations might complain about it.
Also, it's already possible for some replicas to differ from the others, due to DN restarts or exporting, so it would be good to have a mechanism to sync all replicas.
We have a design in progress that will deal with mismatches between the containers in terms of deleted blocks, but I feel we should try to avoid causing such problems if we can. I am not sure how hard that might be with the current code layout.
E.g., if a read lock were held for the duration of the container move, it would allow block reads etc. to come through, but would stop block deletes. However, at the moment I don't think readChunk has any locks involved, so this would require a lot of difficult changes, I think.
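For illustration, a minimal sketch of the locking scheme described above, assuming a per-container ReentrantReadWriteLock (none of these names exist in the PR): the mover holds the read lock for the whole move, so reads, which also take the read lock, continue, while deletes need the write lock and must wait.

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Illustrative only; none of these names exist in the PR.
    class ContainerMoveLockSketch {
      private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

      void readChunk(Runnable read) {
        lock.readLock().lock();      // shared with an in-flight move
        try {
          read.run();
        } finally {
          lock.readLock().unlock();
        }
      }

      void deleteBlock(Runnable delete) {
        lock.writeLock().lock();     // blocks while a move holds the read lock
        try {
          delete.run();
        } finally {
          lock.writeLock().unlock();
        }
      }

      void moveContainer(Runnable copyAndImport) {
        lock.readLock().lock();      // held across the entire move cycle
        try {
          copyAndImport.run();
        } finally {
          lock.readLock().unlock();
        }
      }
    }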
We have a design in progress that will deal with mismatches between the containers in terms of deleted blocks
Looking forward to this feature.
Currently I think reads won't be affected; clients can still read data while the move is ongoing.
I think it will be OK if the block deletes get missed. This isn't really any different from replication between nodes, as after tarring a container a block could be deleted before the new replica is available on the target node.
Preconditions.checkNotNull(container, "container cannot be null");

long containerId = container.getContainerData().getContainerID();
if (!containerMap.containsKey(containerId)) {
We should probably use computeIfPresent() here, as it makes the operation atomic. There is a small chance the mapping could be removed between the containsKey() check and the put.
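A minimal, self-contained illustration of the race and the atomic alternative; the map's value type is simplified to Object here, whereas the real map holds Container instances.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    class ContainerMapSketch {
      // Simplified stand-in for the real container map.
      private final ConcurrentMap<Long, Object> containerMap =
          new ConcurrentHashMap<>();

      // Racy: another thread may remove the mapping between the
      // containsKey() check and the put().
      void updateRacy(long containerId, Object container) {
        if (containerMap.containsKey(containerId)) {
          containerMap.put(containerId, container);
        }
      }

      // Atomic: the check and the update happen under the map's own
      // synchronization, so the window above is closed.
      void updateAtomic(long containerId, Object container) {
        containerMap.computeIfPresent(containerId, (id, old) -> container);
      }
    }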
Is this the last PR needed to get the disk balancer working? Also, have you been running a version of this on your own clusters already? I understand you needed this feature, but it has been under development for a long time now.
Yes.
Yes, but the version on our cluster is different from this PR. Also, our cluster's Ozone version is quite different from master.
sodonnel left a comment
I think this change is good to commit. +1 from me.
What changes were proposed in this pull request?
This ticket includes the internal move logic for the DiskBalancer.
The code is mainly from HDDS-7233.
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-8844
How was this patch tested?
Used unit test from HDDS-7233.
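For context, a hedged sketch of the per-container move cycle as it appears from the snippets in this thread: copy the container to the destination volume's tmp dir, import it there, then drop the source replica. Only copyContainer and the DISK_BALANCER_DIR tmp-dir layout come from the diffs above; every other name, including the Controller interface and the constant's value, is a stand-in.

    import java.io.IOException;
    import java.nio.file.Path;

    // Hedged sketch; only copyContainer and the DISK_BALANCER_DIR
    // tmp-dir layout mirror the diff snippets above.
    class DiskBalancerMoveSketch {

      interface Controller {
        void copyContainer(long containerId, Path destTmpDir) throws IOException;
        void importContainer(long containerId, Path fromDir) throws IOException;
        void deleteSourceReplica(long containerId) throws IOException;
      }

      // Placeholder value; the real constant is defined in the PR.
      static final String DISK_BALANCER_DIR = "diskBalancer";

      void move(Controller controller, long containerId, Path destVolumeRoot)
          throws IOException {
        // Stage under <destVolume>/<DISK_BALANCER_DIR>/<containerId> so a
        // crash mid-copy never leaves a half-imported container visible.
        Path tmpDir = destVolumeRoot
            .resolve(DISK_BALANCER_DIR).resolve(String.valueOf(containerId));

        controller.copyContainer(containerId, tmpDir);   // 1. copy data
        controller.importContainer(containerId, tmpDir); // 2. activate on dest
        controller.deleteSourceReplica(containerId);     // 3. remove source
      }
    }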