HDDS-4883. Persist replicationIndex on datanode side #2069
Conversation
Why was this test removed?
Very good question; I planned to discuss it.
- // This test is for if we upgrade, and then .container files added by new
- // server will have new fields added to .container file, after a while we
- // decided to rollback. Then older ozone can read .container files
- // created or not.
This is a limitation of the current checksum calculation. With this restriction we can never add new fields to the container file.
I think the proper fix here is to remove the test and rely on the upgrade framework to avoid this situation. This unit test is fine for master temporarily, but we need a way to add new fields once new features are allowed (after finalize).
Today that's not possible, as the upgrade work is not merged, but it will be possible with the upgrade framework (or by simply denying all new EC requests). Therefore, I think it's safe to remove the unit test, move forward, and add specific upgrade tests for EC later.
(cc @avijayanhwx )
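To make the rollback problem concrete, here is a minimal self-contained sketch (plain Java, not Ozone code; the field names are illustrative) of why a checksum computed over a fixed field set can never match again once a newer version writes an extra field:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;

public final class ContainerChecksumDemo {

  /** Checksum over a sorted field->value map, like a .container file body. */
  static String checksum(Map<String, String> fields) throws Exception {
    MessageDigest md = MessageDigest.getInstance("SHA-256");
    for (Map.Entry<String, String> e : new TreeMap<>(fields).entrySet()) {
      md.update((e.getKey() + "=" + e.getValue() + "\n")
          .getBytes(StandardCharsets.UTF_8));
    }
    StringBuilder sb = new StringBuilder();
    for (byte b : md.digest()) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> oldFields = new TreeMap<>();
    oldFields.put("containerID", "1");
    oldFields.put("state", "OPEN");

    // Newer code writes one extra field and stores a checksum over all three.
    Map<String, String> newFields = new TreeMap<>(oldFields);
    newFields.put("replicaIndex", "1");
    String storedChecksum = checksum(newFields);

    // After a rollback, the old code only knows the original two fields,
    // so its recomputed checksum can never match the stored one.
    System.out.println(storedChecksum.equals(checksum(oldFields))); // false
  }
}
```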
I understand your point.
I think we can try the following way to handle this (we discussed this point, and to keep the discussion open, I am posting the comment here): how about adding an additional field that holds a new checksum calculated over the new field as well? The old checksum field stays unchanged, so older clients verify the old checksum and newer clients verify the new checksum field.
Let's see if this can work.
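A rough illustration of this dual-checksum idea, reusing the checksum(...) helper from the sketch above (field names are again illustrative, not the real .container keys):

```java
// The old checksum stays computed over the original field set; a second
// checksum covers the extended set.
Map<String, String> v1Fields = new TreeMap<>();
v1Fields.put("containerID", "1");
v1Fields.put("state", "OPEN");

Map<String, String> v2Fields = new TreeMap<>(v1Fields);
v2Fields.put("replicaIndex", "1"); // the new EC-only field

String checksumV1 = checksum(v1Fields); // old readers verify this one
String checksumV2 = checksum(v2Fields); // new readers verify this one
// After a rollback, old code recomputes checksum(v1Fields) and still
// matches; new code validates the extended set through checksumV2.
```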
Yes, this is possible, but we wouldn't like to introduce a new checksum field every time we introduce new fields. It should be done (IMHO) together with another fix that ensures the new checksum field is always backward compatible (for example, because all the fields present in the file are used to calculate the checksum).
I totally agree that it should be done, but I think it should be done in a separate issue/patch.
I was thinking the newer client could always compute the checksum over the fields available in the file, rather than over a fixed set of Java enum fields. Older clients would behave as they do today.
Also consider the option of ignoring the additional field when replicaIndex is 0, which is the default; then newer clients can treat a non-existent field as 0.
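A hedged sketch of how these two ideas could combine on the read path; readFieldsFromContainerFile is a hypothetical helper, not an existing Ozone API:

```java
// Compute the expected checksum over the keys actually present in the
// file (field-driven) instead of a fixed Java enum, and treat a missing
// replicaIndex as the non-EC default 0.
Map<String, String> fileFields = readFieldsFromContainerFile(); // hypothetical
String expected = checksum(fileFields); // reuses the helper sketched earlier
int replicaIndex =
    Integer.parseInt(fileFields.getOrDefault("replicaIndex", "0"));
```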
We can do it, but it doesn't solve the compatibility issue, and it requires a bigger refactor of the write path. I checked, and it means refactoring ContainerDataRepresenter (or the surrounding code), which may be a bigger piece of work.
If we ignore replicaIndex=0, EC-disabled clusters should see no impact at all, right?
Thanks for creating the separate JIRA for the improvement.
If we ignore replicaIndex=0, EC-disabled clusters should see no impact at all, right?
Sorry, I am not sure I understood this question.
- Today we have generic code that serializes all the white-listed fields. It could be modified to support default values and only write a field when its value differs from the default, but that requires some code reorganization (see the sketch after this list).
- Even if we do this (i.e., do not write replicaIndex to the container file for non-EC containers), it doesn't solve the backward-compatibility problem. When a new EC-type container is written (e.g., replicaIndex=1), the old cluster code couldn't read it (it has no way to know whether it's an EC container or not). Therefore we should enable writing replicaIndex only after the finalize step. In that case there is no functional difference between writing and not writing replicaIndex to the container yaml file for normal containers. It's more of a code-style question: if you think it's worth the extra code reorganization, I am fine adding it, but behavior-wise the results are the same.
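For illustration only, a SnakeYAML 1.x representer along these lines could skip the field while it holds the default value; this is an assumption about how such a change might look, not the actual ContainerDataRepresenter patch:

```java
import org.yaml.snakeyaml.introspector.Property;
import org.yaml.snakeyaml.nodes.NodeTuple;
import org.yaml.snakeyaml.nodes.Tag;
import org.yaml.snakeyaml.representer.Representer;

/**
 * Omits replicaIndex from the emitted yaml while it still has the
 * default value 0, so non-EC container files keep their old layout.
 */
public class DefaultSkippingRepresenter extends Representer {

  @Override
  protected NodeTuple representJavaBeanProperty(Object javaBean,
      Property property, Object propertyValue, Tag customTag) {
    if ("replicaIndex".equals(property.getName())
        && Integer.valueOf(0).equals(propertyValue)) {
      return null; // returning null drops the key/value pair from the output
    }
    return super.representJavaBeanProperty(
        javaBean, property, propertyValue, customTag);
  }
}
```

With such a representer, an EC container (replicaIndex > 0) still gets the field written, while legacy files stay unchanged byte-for-byte.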
I meant that, after the EC branch is merged, clusters that don't use EC containers at all will see no impact if we skip writing the index.
For EC containers we still need to deal with it anyway. One question: do we allow an old-code DN to serve EC container data?
I am worried that an old DN will not know about any EC-specific logic added on the DN side.
For EC containers we still need to deal with it anyway. One question: do we allow an old-code DN to serve EC container data?
No. It's not possible, as EC data will be written only after finalization, which means the old DN code (downgrade) won't be supported any more.
Yes. Let's skip writing the replicationIndex=0 field to the yaml file in this patch.
@avijayanhwx If we bump the KeyValueContainer version, does the upgrade framework handle checking the version numbers?
Currently, as noted in this JIRA, we need to write one additional field to the yaml file for EC. Since it changes the on-disk metadata, we need to worry about compatibility. While an upgrade is in progress, the plan is not to allow new features to be used, right? In that case, if there is already a check for the KeyValueContainer version, bumping the version would make things cleaner. Could you please comment with your thoughts here? Thanks.
ContainerProtos.ContainerType.KeyValueContainer;
createRequest.setContainerType(containerType);
...
if (containerRequest.hasWriteChunk()) {
I think we decided to take the index from the pipeline object, not from the BlockID, right?
Sorry, I am not sure about the question. What do you mean by packing it from the BlockID?
I think you suggested including the replicaIndex in the block ID, which became a (containerID, localID, replicaIndex) tuple. That is the reason for the getBlockID().getReplicaIndex() call here.
But let me know if I misunderstood something.
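A minimal sketch of the tuple described here (an illustrative plain-Java model, not the exact Ozone BlockID class or its protobuf definition):

```java
/** Block identifier extended with the EC replica index. */
public final class BlockID {
  private final long containerID;
  private final long localID;
  private final int replicaIndex; // 0 for Ratis/standalone, 1..n for EC

  public BlockID(long containerID, long localID, int replicaIndex) {
    this.containerID = containerID;
    this.localID = localID;
    this.replicaIndex = replicaIndex;
  }

  public long getContainerID() { return containerID; }
  public long getLocalID() { return localID; }
  public int getReplicaIndex() { return replicaIndex; }
}
```

With such a shape, the datanode handler can read request.getBlockID().getReplicaIndex() when it creates the container, which is the call being discussed here.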
I got your point. I was confused because we haven't added the client-side code to send the index yet, but we are already trying to use it here. For completeness, might it be good to include the client side sending this index?
A simplified client-side change is here: elek@c868788
I think this PR is a well-scoped and unit-tested change, and the client-side change may need additional work. But if you would like to include this small commit, I would be happy to add it.
It is fine for me. When it's needed on the client side, we can add it.
@umamaheswararao #4c243349b persists the replicaIndex. Can you PTAL?
umamaheswararao left a comment
The latest changes look good to me.
Please take care of a nit comment before commit.
+1
import java.util.Set;
import java.util.TreeSet;
import java.util.*;
nit: can we just avoid this auto-organized (wildcard) import?
Thanks for the review @umamaheswararao. Merging it now (the import is fixed and we have a green build).
What changes were proposed in this pull request?
When a container is created for EC replication, the replicaIndex metadata should be persisted to the container yaml file and reported back to the SCM with the container reports.
Note: this PR requires #2055 to be merged (therefore I marked it as a draft, but feel free to comment on it).
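A toy, self-contained model of that flow (all names below are illustrative stand-ins, not the actual Ozone classes):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class ReplicaIndexFlowDemo {

  /** Stand-in for the datanode-side container metadata. */
  static final class ContainerData {
    final long containerID;
    final int replicaIndex;

    ContainerData(long containerID, int replicaIndex) {
      this.containerID = containerID;
      this.replicaIndex = replicaIndex;
    }

    /** Fields that would be written to the .container yaml file. */
    Map<String, Object> yamlFields() {
      Map<String, Object> fields = new LinkedHashMap<>();
      fields.put("containerID", containerID);
      fields.put("replicaIndex", replicaIndex);
      return fields;
    }
  }

  public static void main(String[] args) {
    // 1. A container is created for EC with replica index 3.
    ContainerData data = new ContainerData(42L, 3);
    // 2. The persisted fields now include the replicaIndex key...
    System.out.println("yaml: " + data.yamlFields());
    // 3. ...and the same value goes back to SCM in the container report.
    System.out.println("container report: containerID=" + data.containerID
        + ", replicaIndex=" + data.replicaIndex);
  }
}
```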
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-4883
How was this patch tested?
e2e tests with https://github.com/elek/ozone/tree/ec