[improvement](sync version) fe sync version with be #25574
Merged
Proposed changes
pick #25236
FE should sync each replica's version with the version BE reports:
I. In some cases, a replica's version in FE may be greater than the version BE reports, due to program bugs. If there are no later writes, the FE replica's last failed version stays at -1, so the replica can never be repaired.
To fix this, if FE's version > BE's reported version, then after 5 min FE marks the replica as missing versions, so that it can be repaired later.
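The 5-minute rule for case I can be sketched as follows. This is a minimal illustration, not the actual Doris code: the class `RegressiveVersionSketch`, its fields, and the `missingVersions` flag are hypothetical names, and the sketch assumes the timer starts at the first report that is behind FE and resets as soon as BE catches up.

```java
import java.time.Duration;

// Hypothetical, simplified model of an FE-side replica; not the real Doris Replica class.
final class RegressiveVersionSketch {
    // Grace period taken from the PR description (5 min).
    static final Duration GRACE = Duration.ofMinutes(5);

    final long feVersion;            // version FE believes the replica has
    long behindSinceMs = -1;         // time of the first BE report that was behind FE, -1 = none
    boolean missingVersions = false; // assumption: "marked as missing versions" modeled as a flag

    RegressiveVersionSketch(long feVersion) {
        this.feVersion = feVersion;
    }

    // Called for each BE report; nowMs is the report time in milliseconds.
    void onBackendReport(long reportedVersion, long nowMs) {
        if (reportedVersion >= feVersion) {
            behindSinceMs = -1;      // BE is not behind FE: reset the timer
            return;
        }
        if (behindSinceMs < 0) {
            behindSinceMs = nowMs;   // first report that is behind: start the timer
            return;
        }
        if (nowMs - behindSinceMs >= GRACE.toMillis()) {
            missingVersions = true;  // still behind after the grace period: make it repairable
        }
    }
}
```

With `feVersion = 12` and BE repeatedly reporting 10, the flag stays `false` for reports at 0 and 4 minutes and becomes `true` at the 5-minute report.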
II. In other cases, a replica can end up with version = partition's visible version = partition's committed version, while its last failed version = partition's committed version + 1. For example:
a. suppose the partition's visible version = 10 and committed version = 11; the committed txn T (version 11) is not published yet;
b. clone replica A to B; after cloning, B's version = 10;
c. publish txn T, so A's version = 11. Due to a txn publish bug, B's version also becomes 11 (fixed by PR #23706);
d. publish another txn; A's and B's versions become 12;
e. B's versions on the backend are now [1, 10], [12, 12], so BE reports missing version 11; FE then marks the replica's last failed version = version + 1 = 12 + 1 = 13;
f. FE lets B clone version 11 from A; after cloning, B contains all versions. But Replica.updateVersion has a bug:
if it finds that the version did not change (12 -> 12), it does not reset the replica's last failed version to -1.
From then on, B stays at version = 12, last failed version = 13.
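Step e above can be illustrated with a small sketch of how a gap in the version ranges yields the missing version and the resulting last failed version. The class and method names are hypothetical, and the sketch assumes the ranges are sorted by start version; it is not how the BE actually tracks rowsets.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of Doris.
final class MissingVersionSketch {
    // Returns the first version not covered by the sorted [start, end] ranges,
    // or -1 if the ranges are contiguous (or the list is empty).
    static long firstMissingVersion(List<long[]> sortedRanges) {
        if (sortedRanges.isEmpty()) {
            return -1;
        }
        long expected = sortedRanges.get(0)[0];
        for (long[] range : sortedRanges) {
            if (range[0] > expected) {
                return expected;                 // gap found before this range
            }
            expected = Math.max(expected, range[1] + 1);
        }
        return -1;
    }

    // Step e: FE marks last failed version = replica version + 1.
    static long lastFailedVersionAfterMissing(long replicaVersion) {
        return replicaVersion + 1;
    }
}
```

For B's ranges `[1, 10], [12, 12]`, `firstMissingVersion` returns 11, and `lastFailedVersionAfterMissing(12)` returns 13, matching the numbers in the example.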
To fix this, after cloning or when BE reports, if the replica's version >= the partition's committed version (more precisely, replica's version = partition's committed version) and the replica's last failed version > the partition's committed version (more precisely, replica's last failed version = partition's committed version + 1), FE sets the replica's last failed version to -1.
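The reset condition can be written as a single predicate. The sketch below is only an illustration under assumed names (`LastFailedVersionSketch`, `reconcile`); the real change lives in the FE replica handling code and may be structured differently.

```java
// Hypothetical helper for illustration; not the actual Doris implementation.
final class LastFailedVersionSketch {
    // Returns the last failed version the replica should have after a clone
    // finishes or a BE report arrives.
    static long reconcile(long replicaVersion,
                          long lastFailedVersion,
                          long partitionCommittedVersion) {
        boolean caughtUp = replicaVersion >= partitionCommittedVersion;
        boolean staleFailure = lastFailedVersion > partitionCommittedVersion;
        if (caughtUp && staleFailure) {
            return -1;   // the failure refers to a version that was never committed
        }
        return lastFailedVersion;
    }
}
```

For replica B in the example (version 12, last failed version 13, committed version 12), `reconcile(12, 13, 12)` returns -1. A replica that is still behind, such as `reconcile(11, 13, 12)`, keeps its last failed version of 13.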