-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-29190 Allow disabling regions in FAILED_OPEN state #6797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
junegunn
wants to merge
7
commits into
apache:master
Choose a base branch
from
junegunn:HBASE-29190
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+114
−5
Open
Changes from all commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
d9ddd43
Add test case for disabling table with FAILED_OPEN regions
junegunn 42715b6
Allow disabling FAILED_OPEN regions
junegunn cab6eb4
Remove unnecessary error handler
junegunn dd8716f
Add more steps to the test case (disable, fix, and re-enable)
junegunn 718156e
Use two separate ProcedureFutureUtil.suspendIfNecessary calls
junegunn 8bd00b5
Revert the previous change
junegunn c8ec640
Merge remote-tracking branch 'origin/master' into HBASE-29190
junegunn File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not need to set the state to CLOSED when the state is FAILED_OPEN? And if the state is CLOSED, do we still need to call udpateRegionLocation?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback. Before I answer the questions, please take a look at dd8716f. I added a few more steps so that the intention of this change is clearer.
So the question is: "Is
regionNode.setState(State.CLOSED, State.FAILED_OPEN)really necessary? Can we just checkregionNode.isInState(State.FAILED_OPEN)here?" Am I right?If we don't change the state,
we will run into an assertion error in
confirmClosed:hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
Lines 423 to 449 in cab6eb4
Because
FAILED_OPENis not covered in the conditions. If we addFAILED_OPENhere like so:We can avoid the assertion error, but
CloseTableRegionsProcedurewill not finish and endlessly retry:So we also need to change the implementation of
numberOfUnclosedRegionsto account forFAILED_OPENregions.hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
Lines 1147 to 1163 in 42715b6
But I felt this was getting too complicated, and it's simpler to just change the state to
CLOSED.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I called the method to persist the in-memory change of
regionNodeto the meta table. Would it be clearer if I callpersistToMetainstead? The result is roughly the same.If the question is about if it's necessary to persist the changed state to meta, it looks like it's not necessary in this case, though we have to change an assertion in the test code.
However, I think we should try to keep the in-memory state and persistent meta state synchronized to avoid confusion and any potential issues that might arise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, in the if condition you call setState, I misread the code, I thought it is isInState.
Then the logic is fine. But better to add more comments to say why here we do not need to treat CLOSED state specially.
And I suggest that we use two ProcedureFutureUtil.suspendIfNecessary calls for these two conditions, so we do not need to add extra check in closeRegionAfterUpdatingMeta, since for the region in FAILED_OPEN state, the only action after updating meta is to change the state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh.. I remembered why I organized the code this way. This was not super obvious.
When you disable the table with "FAILED_OPEN" regions,
DisableTableProcedurecreates aCloseTableRegionsProcedure,TRSPwithREGION_STATE_TRANSITION_CLOSEas the initial statehbase/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
Lines 157 to 160 in cab6eb4
TRSP#closeRegion,hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
Lines 516 to 518 in cab6eb4
closeRegionAfterUpdatingMetahbase/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
Lines 394 to 401 in cab6eb4
closeRegionAfterUpdatingMetafor aFAILED_OPENregion and HBase creates aCloseRegionProcedurewhich is exactly what we tried to avoid.CloseRegionProcedureis a subclass ofRegionRemoteProcedureBasewhich requirestargetServer, but aFAILED_OPENregion lacks it, and the procedure fails and hangs.So that explains why I had to put the check at the start of
closeRegionAfterUpdatingMeta.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it looks like my analysis was not entirely correct, because at step 4, the initial call to
checkFuture { closeRegionAfterUpdatingMeta }will not actually triggercloseRegionAfterUpdatingMeta. But the suspension of the procedure insuspendIfNecessaryis what triggers multiplecloseRegioncalls which leads to a call tocloseRegionAfterUpdatingMeta.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed that in 718156e. I also added more comments.
The extra check is still there for the reason mentioned above.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the noise. I reverted the change after realizing that
ProcedureFutureUtil.suspendIfNecessarydoes not guarantee execution ofactionAfterDonewhen itsfutureis suspended. So we may get unpredictable behavior ifactionAfterDoneof the previouscheckFutureand that ofsuspendIfNecessaryare not identical.So with the latest commit,
closeRegionAfterUpdatingMetaserves as the only terminal point of the procedure, regardless of the previous state of the region or whether thefuturewas suspended or not.