Conversation

@elek (Member) commented Mar 22, 2021

What changes were proposed in this pull request?

When a container is created for EC replication, the replicaIndex metadata should be persisted to the container yaml file and reported back to the SCM with the container reports.
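For illustration only (the field names and layout here are assumptions for this sketch, not copied from the patch), the persisted `.container` file could then carry the new field roughly like this:

```yaml
# Hypothetical .container file sketch; actual Ozone field names may differ.
containerID: 1
containerType: KeyValueContainer
state: OPEN
replicaIndex: 3   # new: which EC replica this container holds
checksum: <checksum-over-the-fields-above>
```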

Note: this PR requires #2055 to be merged (therefore I marked it as a draft, but feel free to comment on it)

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-4883

How was this patch tested?

e2e tests with https://github.com/elek/ozone/tree/ec

@elek elek changed the base branch from master to HDDS-3816-ec March 22, 2021 12:45
@elek elek marked this pull request as ready for review March 30, 2021 09:46
}
}


Contributor

Why is this test removed?

Member Author

Very good question, I planned to discuss it.

-    // This test is for if we upgrade, and then .container files added by new
-    // server will have new fields added to .container file, after a while we
-    // decided to rollback. Then older ozone can read .container files
-    // created or not.

This is a limitation of the current checksum calculation. With this restriction we can never add new fields to the container file.

I think the proper fix here is removing the test and using the upgrade framework to avoid such situations. This unit test is fine for master temporarily, but we should have the option to add new fields once new features are allowed (after finalize).

Today it's not possible, as the upgrade work is not merged, but it will be possible with the upgrade framework (or by simply denying all new EC requests). Therefore, I think it's safe to remove the unit test, move forward, and later add specific upgrade tests for EC.

(cc @avijayanhwx )

Contributor

I understand your point.
I think we can try the following way to handle this (we discussed this point, and to keep the discussion open, I am posting the comment here):

How about having an additional field that stores a new checksum calculated over the new fields as well? The old checksum field stays unchanged, so older clients will verify the old checksum and newer clients will verify the new checksum field.
Let's see if this can work.
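A minimal sketch of the dual-checksum idea proposed above (class, method, and field names are hypothetical, and CRC32 stands in for whatever digest the container file actually uses): the old checksum covers only the original fields, while a second checksum also covers newly added fields such as replicaIndex.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch of the dual-checksum proposal; not the Ozone implementation.
public class DualChecksumSketch {

  // Old checksum: calculated over the original fields only, so old readers
  // keep validating exactly what they always validated.
  public static long oldChecksum(String containerId, String state) {
    CRC32 crc = new CRC32();
    crc.update((containerId + state).getBytes(StandardCharsets.UTF_8));
    return crc.getValue();
  }

  // New checksum: additionally covers the new replicaIndex field; only
  // newer readers would know about and verify this field.
  public static long newChecksum(String containerId, String state,
      int replicaIndex) {
    CRC32 crc = new CRC32();
    crc.update((containerId + state + replicaIndex)
        .getBytes(StandardCharsets.UTF_8));
    return crc.getValue();
  }
}
```

With this shape, an old client ignores the unknown new checksum field entirely, while a new client can detect corruption of the new field as well.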

Member Author

Yes, this is possible, but we wouldn't like to introduce a new checksum field every time we introduce new fields. It should be done (IMHO) together with another fix which ensures that the new checksum field is always backward compatible (for example, because all the fields are used to calculate the checksum).

I totally agree that it should be done, but I think it should be done in a separate issue/patch.

Contributor

I was thinking the newer client would always compute checksums over the fields available in the file, rather than over the Java enum fields. The older client would work as-is.
Also consider the option of ignoring the additional field when replicaIndex is 0, which is the default. So newer clients can treat a non-existent field as 0.

Contributor

We can do it, but it doesn't help the compatibility issue, and it requires a bigger refactor of the write code. I checked it, and it requires refactoring ContainerDataRepresenter (or the surrounding code), which may be bigger work.

With ignoring replicaIndex = 0, EC-disabled clusters should not have any impact, right?

Thanks for creating the separate JIRA for the improvement.

Member Author

With ignoring replicaIndex = 0, EC-disabled clusters should not have any impact, right?

Sorry, I am not sure I understood this question.

  1. Today we have generic code to serialize all the white-listed fields. It can be modified to support default values and write out a specific field only if its value differs from the default, but it requires some code re-organization.

  2. Even if we do this (do not write replicaIndex to the container file for non-EC containers), it doesn't help the backward-compatibility problem. When a new EC-type container is written (e.g. replicaIndex=1), it couldn't be read by the old cluster code (as it's not known whether it's an EC container or not). Therefore, we should enable writing the replicaIndex only after the finalize step. In this case there is no advantage to writing or not writing replicaIndex to the container yaml file for normal containers. (It's more of a code-style question; if you think it's worth the extra code re-organization, I am fine with adding it, but from a behavior point of view we will have the same results.)
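The default-value idea discussed in point 1 can be sketched as follows (class and method names are hypothetical; the real serialization goes through Ozone's yaml representer code): the new field is emitted only when it differs from its default of 0, so non-EC containers keep producing the same file contents as before.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of default-value-aware field serialization;
// not the actual Ozone ContainerDataRepresenter code.
public class ReplicaIndexFieldSketch {

  // Builds the field map that would be written to the .container yaml file.
  public static Map<String, Object> toYamlFields(long containerId,
      int replicaIndex) {
    Map<String, Object> fields = new LinkedHashMap<>();
    fields.put("containerID", containerId);
    if (replicaIndex > 0) {
      // Only EC containers (replicaIndex >= 1) persist the new field;
      // for replicaIndex == 0 (the default) the file is unchanged.
      fields.put("replicaIndex", replicaIndex);
    }
    return fields;
  }
}
```

This mirrors the behavior the PR eventually settled on: persist (and checksum) replicaIndex only when it is greater than 0.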

Contributor

I meant that after the EC branch is merged, but EC containers are not used at all, there will be no impact if we skip writing the index.
Anyway, for EC containers we need to deal with this. One question: do we allow old-code DNs to serve EC container data?
I am worried that an old DN will not know about any EC-specific logic added at the DN.

Member Author

Anyway, for EC containers we need to deal with this. One question: do we allow old-code DNs to serve EC container data?

No. It's not possible, as EC data will be written only after finalization, which means that old DN code (downgrade) won't be supported any more.

Contributor

Yes. Let's skip writing the replicationIndex=0 field to the yaml file in this patch.
@avijayanhwx If we bump the KeyValueContainer version, does the upgrade framework handle checking the version numbers?

Currently, as you noticed in this JIRA, we need to write one additional field in the yaml file for EC. Since it changes the on-disk metadata, we need to worry about compatibility. While an upgrade is in progress, your plan is not to allow new features to be used, right? In that case, if you already have a check for the KeyValueContainer version, then bumping the version would make things cleaner. Could you please comment your thoughts here? Thanks

ContainerProtos.ContainerType.KeyValueContainer;
createRequest.setContainerType(containerType);

if (containerRequest.hasWriteChunk()) {
Contributor

I think we decided to take the index from the pipeline object, not from the BlockID, right?

Member Author

Sorry, I am not sure about the question. What do you mean by taking it from the BlockID?

I think you suggested including the replicaIndex in the block id, which became a (containerid, localid, replicaindex) triple. This is the reason to have getBlockID().getReplicaIndex() here.

But let me know if I misunderstood something.
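The (containerid, localid, replicaindex) triple mentioned above can be sketched roughly like this (the class name and shape here are illustrative; the real Ozone BlockID class differs in detail):

```java
import java.util.Objects;

// Illustrative sketch of a block id extended with the EC replica index;
// not the actual Ozone BlockID implementation.
public final class BlockIdSketch {
  private final long containerId;
  private final long localId;
  private final int replicaIndex; // 0 for non-EC blocks, >= 1 for EC replicas

  public BlockIdSketch(long containerId, long localId, int replicaIndex) {
    this.containerId = containerId;
    this.localId = localId;
    this.replicaIndex = replicaIndex;
  }

  public int getReplicaIndex() {
    return replicaIndex;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof BlockIdSketch)) {
      return false;
    }
    BlockIdSketch other = (BlockIdSketch) o;
    // All three components participate in identity, so the same logical
    // block on two different EC replicas has two distinct ids.
    return containerId == other.containerId
        && localId == other.localId
        && replicaIndex == other.replicaIndex;
  }

  @Override
  public int hashCode() {
    return Objects.hash(containerId, localId, replicaIndex);
  }
}
```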

Contributor

I got your point. I was confused because we have not added the code to send the index from the client yet, but we are trying to use it here. For completeness, it may be good to include the client side sending this index?

Member Author

A simplified client-side change is here: elek@c868788

I think this PR is a well-scoped and unit-tested change; the client-side change may need additional work. But if you would like to include that small commit, I would be happy to add it.

Contributor

It is fine for me. When needed at the client side, we can add that.

@swagle swagle requested a review from sodonnel April 26, 2021 15:28
bharatviswa504 and others added 17 commits April 28, 2021 07:05
Co-authored-by: Doroszlai, Attila <adoroszlai@apache.org>
@elek (Member Author) commented May 19, 2021

@umamaheswararao #4c243349b persists replicaIndex (and uses it in the checksum) only if it's >0. Somewhat hacky, but it works.

Can you PTAL?

Contributor

@umamaheswararao umamaheswararao left a comment

The latest changes look good to me.
Please take care of the nit comment before committing.

+1

import java.util.Set;
import java.util.TreeSet;
import java.util.*;

Contributor

nit: can we just avoid this auto-organized wildcard import?

ns7381 and others added 9 commits May 24, 2021 20:59
* Require block token for all chunk/block operations

* Addendum: implement getBlockID; add tests

* No block token for ListBlock (unsupported operation anyway), as it has no block ID

* Test for more block/chunk operations

* Require container token for ListBlock, but not for ListContainer

* Do not require...Token for unsupported operations
@elek (Member Author) commented May 25, 2021

Thanks for the review @umamaheswararao. Merging it now (the import is fixed and we have a green build).

@elek elek merged commit ad790b6 into apache:HDDS-3816-ec May 25, 2021