Skip to content

Conversation

@yujun777
Copy link
Contributor

Proposed changes

pick #25236

fe should sync replica's version with be's report:

I. In some cases, replica version of fe may greater than be's report version due to program bugs. If there's no write later, fe replica's last failed version will always be -1. And this replica couldn't be repair.

To fix these, if fe's version > be's report version, after 5min, fe will mark this replica as missing versions. Later this replica can be repair;

II. In some other cases, it will cause case replica's version = partition's visible version = partition's committed version, and replica's last failed version = partition's + 1. case as follow:
a. suppose partition visible version = 10, committed version = 11, the committed txn version T is not published yet;
b. clone a replica A to B, after cloning, B's version = 10.
c. publish txn T, A version = 11. Also due to txn publish bug, B's version = 11 (PR #23706 has fix this);
d. publish another txn, A and B's version become 12;
e. now B's backend version is: [1, 10], [12, 12], it will report it missing version = 11; then fe will mark this replica's last failed version = version + 1 = 12 + 1 = 13;
f. fe will let B clone version 11 from A, after cloning, B will contains all the version. But Replica.updateVersion has bug,
if it found its version not change (12 -> 12),it will not set replica's last failed version to -1.
Then later, B will always be: version = 12, last failed version = 13.

To Fix this problem, after cloning or when be report, if replica's version >= partition's commit version(more precise, replica's version = partition's commit version), and replica's last failed version > partition's commit (more precise, replica's last failed version = partition's commit version + 1), then we should set replica's last failed version to -1.

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@yujun777
Copy link
Contributor Author

run buildall

Copy link
Contributor

@dataroaring dataroaring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Oct 18, 2023
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@yujun777 yujun777 force-pushed the branch-2.0-fe-sync-version-with-be branch from 55529d8 to 7a51cf2 Compare October 19, 2023 12:44
@yujun777
Copy link
Contributor Author

run buildall

@wm1581066 wm1581066 requested a review from yiguolei October 20, 2023 07:10
@dataroaring dataroaring merged commit 66974a8 into apache:branch-2.0 Oct 20, 2023
xiaokang added a commit that referenced this pull request Oct 20, 2023
xiaokang added a commit that referenced this pull request Oct 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants