
[Bugfix] Fix the bug data balance causes tablet loss #9971

Merged
morningman merged 7 commits into apache:master from platoneko:rebalance-fuzzy-test on Jun 15, 2022


@platoneko
Contributor

Proposed changes

Issue Number: close #6061

Problem Summary:

  1. Provide an FE config to test reliability in the single-replica case when tablet scheduling is frequent.
  2. Port the fix from [Bug]Fix the bug data balance causes tablet loss #6063 to the current code, mostly as-is.

Checklist(Required)

  1. Does it affect the original behavior: (Yes/No/I Don't know)
  2. Have unit tests been added: (No/No Need)
  3. Has document been added or modified: (Yes/No/No Need)
  4. Does it need to update dependencies: (Yes/No)
  5. Are there any changes that cannot be rolled back: (Yes/No)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@platoneko changed the title from "Rebalance fuzzy test" to "[Bugfix] Fix the bug data balance causes tablet loss" on Jun 6, 2022
@morningman added the dev/1.0.1-deprecated label (should be merged into dev-1.0.1 branch) on Jun 7, 2022
@morningman added this to the v1.1 milestone on Jun 7, 2022
* It's used to test reliability in the single-replica case when tablet scheduling is frequent.
* Default is false.
*/
@ConfField
Contributor


Suggested change
- @ConfField
+ @ConfField(mutable = false, masterOnly = true)

Contributor


Please also add a comment explaining in more detail how this works.

}
if (to_drop_tablet->replica_id() != replica_id && replica_id != 0) {
    LOG(WARNING) << "fail to drop tablet because replica_id not match. "
                 << "tablet_id=" << tablet_id;
Contributor


Print replica_id in this log as well.
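The replica-ID guard in the reviewed snippet can be read as a small predicate. The following is a minimal sketch, not the Doris implementation: should_drop_tablet is a hypothetical name, std::cerr stands in for Doris's LOG(WARNING), and treating a requested replica ID of 0 as "not specified" is an assumption inferred from the replica_id != 0 condition. The log line prints both replica IDs, as the reviewer asks.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>

// Hypothetical stand-in for the replica-ID guard on the BE drop-tablet path.
// A requested_replica_id of 0 is treated as "not specified" (assumption
// mirroring the `replica_id != 0` condition), so the drop proceeds.
// Returns true if the tablet should be dropped.
bool should_drop_tablet(int64_t tablet_id, int64_t local_replica_id,
                        int64_t requested_replica_id) {
    if (requested_replica_id != 0 && local_replica_id != requested_replica_id) {
        // std::cerr stands in for LOG(WARNING); both replica IDs are printed,
        // as requested in the review comment.
        std::cerr << "fail to drop tablet because replica_id not match. "
                  << "tablet_id=" << tablet_id
                  << ", local replica_id=" << local_replica_id
                  << ", requested replica_id=" << requested_replica_id << '\n';
        return false;
    }
    return true;
}

// Usage: the three cases the guard distinguishes.
void demo_should_drop_tablet() {
    assert(should_drop_tablet(10001, 5, 5));   // IDs match: drop proceeds
    assert(!should_drop_tablet(10001, 6, 5));  // stale request: drop skipped
    assert(should_drop_tablet(10001, 6, 0));   // no ID in request: drop proceeds
}
```

Keeping the check as a pure function makes each of the three cases easy to exercise in isolation.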

@morningman added the kind/fix label (Categorizes issue or PR as related to a bug.) on Jun 9, 2022
@platoneko force-pushed the rebalance-fuzzy-test branch from ae953f6 to a4e3ca5 on June 10, 2022 02:24
morningman previously approved these changes on Jun 10, 2022
Contributor

@morningman left a comment


LGTM

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on Jun 10, 2022
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@qidaye left a comment


  1. When cloning a tablet, after the source tablet's snapshot files are downloaded, you need to reset the replica ID in the tablet meta to mark the tablet as a new tablet.
    Otherwise, the drop task will still delete the tablet after the clone finishes, which is exactly the bug.

  2. The replica ID also needs to be added to the report replica info task.
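The failure mode described in point 1 can be illustrated with a toy model. This is a minimal sketch, not Doris code: TabletMeta, Backend, clone_from_snapshot, handle_drop, and cloned_tablet_survives are hypothetical names, and passing the fresh replica ID in from the caller is an assumption about how the new ID is obtained. It shows that if a cloned tablet's meta still carries the replica ID that a pending drop task targets, the drop deletes the freshly cloned tablet; resetting the replica ID during clone turns the stale drop into a no-op.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical, simplified tablet metadata: only the fields this model needs.
struct TabletMeta {
    int64_t tablet_id;
    int64_t replica_id;
};

// Hypothetical single-BE tablet store keyed by tablet_id.
struct Backend {
    std::map<int64_t, TabletMeta> tablets;

    // Install a tablet from downloaded snapshot meta. When reset_replica_id is
    // true, the local copy is stamped with new_replica_id so it is treated as a
    // new replica; otherwise it keeps the snapshot's replica_id.
    void clone_from_snapshot(TabletMeta snapshot_meta, int64_t new_replica_id,
                             bool reset_replica_id) {
        if (reset_replica_id) {
            snapshot_meta.replica_id = new_replica_id;
        }
        tablets[snapshot_meta.tablet_id] = snapshot_meta;
    }

    // Drop task guarded by replica ID; 0 means "no replica ID specified".
    // Returns true if the tablet was removed.
    bool handle_drop(int64_t tablet_id, int64_t replica_id) {
        auto it = tablets.find(tablet_id);
        if (it == tablets.end()) return false;
        if (replica_id != 0 && it->second.replica_id != replica_id) return false;
        tablets.erase(it);
        return true;
    }
};

// Replays the race: a clone lands first, then a delayed drop task aimed at
// the old replica arrives. Returns whether the cloned tablet survived.
bool cloned_tablet_survives(bool reset_replica_id) {
    Backend be;
    const int64_t tablet_id = 10001;
    const int64_t stale_replica_id = 100;  // ID the pending drop task targets
    const int64_t new_replica_id = 200;    // ID assigned to the new replica
    be.clone_from_snapshot({tablet_id, stale_replica_id}, new_replica_id,
                           reset_replica_id);
    be.handle_drop(tablet_id, stale_replica_id);  // the delayed drop task
    return be.tablets.count(tablet_id) == 1;
}

// Usage: without the reset the tablet is lost; with it the drop is ignored.
void demo_clone_then_stale_drop() {
    assert(!cloned_tablet_survives(false));  // meta kept stale ID: tablet lost
    assert(cloned_tablet_survives(true));    // meta reset: stale drop ignored
}
```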

@platoneko
Contributor Author

platoneko commented Jun 12, 2022

  1. I have already done this in SnapshotManager::convert_rowset_ids. Is there anything else that needs to be done?
  2. Good catch! I forgot to set the replica ID in the tablet report; I will fix it soon.

@platoneko force-pushed the rebalance-fuzzy-test branch from a4e3ca5 to e5802c2 on June 13, 2022 04:00
@github-actions bot removed the approved label (Indicates a PR has been approved by one committer.) on Jun 13, 2022
@platoneko force-pushed the rebalance-fuzzy-test branch from e5802c2 to d5bea88 on June 13, 2022 08:33
@platoneko force-pushed the rebalance-fuzzy-test branch from d5bea88 to 8f39c81 on June 13, 2022 15:27
@platoneko force-pushed the rebalance-fuzzy-test branch from 8f39c81 to 65c8c27 on June 14, 2022 15:40
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on Jun 15, 2022
@morningman merged commit f4e2f78 into apache:master on Jun 15, 2022
@morningman added the dev/backlog label (waiting to be merged in future dev branch) and removed the dev/1.0.1-deprecated label (should be merged into dev-1.0.1 branch) on Jun 15, 2022

Labels

approved: Indicates a PR has been approved by one committer.
dev/backlog: waiting to be merged in future dev branch
kind/fix: Categorizes issue or PR as related to a bug.
reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Tablet is lost due to data balancing

3 participants