-
Notifications
You must be signed in to change notification settings - Fork 594
HDDS-13326. Add acceptance tests for ozone debug replicas verify --container-state #8783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tejaskriya
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @sarvekshayr, left 1 comment below.
Although it would be nice to find a way to reduce duplication between the 3 rules added such that we can manage with just 1 rule / just 1 rule file. But I am not aware of a way to do it. @ssulav do you have any ideas for this?
hadoop-ozone/dist/src/main/smoketest/debug/container-state-verifier.robot
Outdated
Show resolved
Hide resolved
|
Yes, we can merge all the rules in a single file if we want, and apply the rule file only once.
I have an idea, but not tested. We may achieve this via an environment variable with a conditional check. |
|
Also right now we are overriding the org.apache.hadoop.ozone.container.common.impl.ContainerData.getState() to return as we wish with the help of byteman fault. And the debug verify cli helps to return the state, as it is also calling the same API. I would say this is not the best implementation. It would be better if we could somehow make the state persistent in the datatnode and not just override the API response. |
dombizita
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this @sarvekshayr! Overall it looks good to me. I see that you reduced code duplication, but I think @ssulav meant for the byteman files as well, right? That they could be merged to one rule, instead of the 3 files; but I'm not sure how that would work
I think it would be a lot of pre-test setup work to actually make these unhealthy containers on the datanodes, which would not even necessarily test this tool's functionality, more the container health handling on the datanodes. I get that overriding the same method which is used by this tool could be not ideal, but in this case overriding this makes sense to me, as this method is used at other places in the code for testing container health related things. What do you think, could be any more place which could be used that wouldn't mean to make scenarios on the DN side to achieve the unhealthy containers? |
I tested the byteman rule by overriding getState() using environment variable but it didn't seem to work. |
hadoop-ozone/dist/src/main/smoketest/debug/container-state-verifier.robot
Outdated
Show resolved
Hide resolved
hadoop-ozone/dist/src/main/smoketest/debug/container-state-verifier.robot
Show resolved
Hide resolved
ssulav
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for working on the review coments.
|
Thanks for the test @sarvekshayr , and the reviews @ssulav @dombizita |
What changes were proposed in this pull request?
Added Byteman rules to inject container states
UNHEALTHY,DELETED, andINVALIDin order to test and validate the output ofozone debug replicas verify --container-state. This ensures that theContainerStateVerifiercorrectly identifies and reports that the checks have failed.What is the link to the Apache JIRA
HDDS-13326
How was this patch tested?
CI: https://github.com/sarvekshayr/ozone/actions/runs/16200808599/job/45739884752#step:13:193