
@xy720
Member

@xy720 xy720 commented Oct 19, 2023

Proposed changes

Recently, in version 1.2.7, we encountered a situation where all replicas of the tablets in a table had missing versions. In this case, FE is unable to recover these tablets, and the table cannot be queried.

In PR #18986, I noticed that we already support the config recover_with_skip_missing_version, which ignores the visible version recorded in the FE partition.

This config only works when the replica version is behind the partition version. If a replica on BE has a missing version (a gap in its version history), the config does not actually skip any missing versions on BE.

For example,

1、If the replica versions on BE are {1-100, 101, 102, 103} and the partition visible version is 100, recover_with_skip_missing_version = ignore_version works well.

2、If the replica versions on BE are {1-100, 103} and the partition visible version is 100, recover_with_skip_missing_version = ignore_version does not work.
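The difference between the two cases can be made concrete with a minimal Python sketch. It is an illustration, not Doris code: missing_versions is a hypothetical helper, and a replica's rowsets are assumed to be modeled as inclusive (start, end) version ranges. The helper lists the versions absent below the replica's highest version.

```python
def missing_versions(rowsets):
    """Return versions absent between 1 and the replica's highest version.

    Hypothetical model: each rowset is an inclusive (start, end) version range.
    """
    covered = set()
    for start, end in rowsets:
        covered.update(range(start, end + 1))
    top = max(end for _, end in rowsets)
    return [v for v in range(1, top + 1) if v not in covered]


case1 = [(1, 100), (101, 101), (102, 102), (103, 103)]
case2 = [(1, 100), (103, 103)]
print(missing_versions(case1))  # []
print(missing_versions(case2))  # [101, 102]
```

Case 1 has no gap, while case 2 lacks versions 101 and 102 inside the replica itself, and adjusting only the FE-side version check does not fill that gap.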

This commit adds a session variable, skip_missing_version, to control this query behavior.

If skip_missing_version is set to true, the query will always try to select the replica with the highest lastSuccessVersion among all alive BE replicas.

In addition, the query will skip the missing rowsets on BE and return data only from the rowsets that exist.
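A minimal Python sketch of the two behaviors described above, under stated assumptions: Replica, choose_replica and rowsets_to_read are hypothetical names, not the actual Doris FE/BE code, and rowsets are modeled as inclusive version ranges. It shows the preference for the highest lastSuccessVersion among alive replicas, and the choice between failing on a version gap and skipping it.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    # Hypothetical stand-in for a tablet replica; not a Doris class.
    backend_id: int
    alive: bool
    last_success_version: int
    # Inclusive (start, end) version ranges of the rowsets present on the BE.
    rowsets: list = field(default_factory=list)


def choose_replica(replicas):
    """Among alive replicas, pick the one with the highest lastSuccessVersion."""
    alive = [r for r in replicas if r.alive]
    if not alive:
        raise RuntimeError("no alive replica")
    return max(alive, key=lambda r: r.last_success_version)


def rowsets_to_read(replica, version, skip_missing_version):
    """Rowsets read up to `version`; a gap raises unless skipping is enabled."""
    selected = sorted(rs for rs in replica.rowsets if rs[1] <= version)
    expected = 1  # next version a gap-free read would need
    for start, end in selected:
        if start > expected and not skip_missing_version:
            raise RuntimeError(f"missing versions {expected}-{start - 1}")
        expected = end + 1
    if expected <= version and not skip_missing_version:
        raise RuntimeError(f"missing versions {expected}-{version}")
    return selected


replicas = [
    Replica(1, True, 103, [(1, 100), (103, 103)]),
    Replica(2, False, 105, [(1, 105)]),  # highest version, but its BE is down
    Replica(3, True, 100, [(1, 100)]),
]
chosen = choose_replica(replicas)
print(chosen.backend_id)  # 1
print(rowsets_to_read(chosen, 103, skip_missing_version=True))  # [(1, 100), (103, 103)]
```

With skip_missing_version=False, the same call raises on the 101-102 gap, which is the situation described in case 2.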

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@xy720
Member Author

xy720 commented Oct 19, 2023

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

1 similar comment
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@xy720
Member Author

xy720 commented Oct 19, 2023

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.54 seconds
stream load tsv: 556 seconds loaded 74807831229 Bytes, about 128 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162156215 Bytes

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.09 seconds
stream load tsv: 554 seconds loaded 74807831229 Bytes, about 128 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162115673 Bytes

@xy720
Member Author

xy720 commented Oct 20, 2023

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.97% (8283/22405)
Line Coverage: 29.06% (66436/228624)
Region Coverage: 27.72% (34499/124438)
Branch Coverage: 24.33% (17522/72020)
Coverage Report: http://coverage.selectdb-in.cc/coverage/6a64c9df2272c78a4f44104259ea6fac04d6276c_6a64c9df2272c78a4f44104259ea6fac04d6276c/report/index.html

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.77 seconds
stream load tsv: 556 seconds loaded 74807831229 Bytes, about 128 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.2 seconds inserted 10000000 Rows, about 342K ops/s
storage size: 17161981198 Bytes

@xy720
Member Author

xy720 commented Oct 20, 2023

run p0

@dataroaring
Contributor

We'd better use a session variable to control the behavior of queries. Recovery is different from query: recovery is an action without data loss, whereas if a query ignores some missing versions on one replica while another replica has all versions, the query returns a wrong result. If a query may return a wrong result, that should be controlled per query, and users should be aware of it.
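To illustrate the concern with made-up numbers (a hypothetical 10 rows per load version; count_rows is an illustrative helper, not Doris code), the sketch below counts the rows a scan would return from a complete replica and from one missing versions 101 and 102. The skipped versions silently disappear from the result.

```python
def count_rows(present_versions, rows_per_version):
    """Rows a scan returns if it reads only the versions that are present."""
    return sum(rows_per_version[v] for v in present_versions)


rows_per_version = {v: 10 for v in range(1, 104)}  # hypothetical: 10 rows per load
complete = set(range(1, 104))                      # replica holding versions 1-103
gapped = complete - {101, 102}                     # replica missing 101 and 102

print(count_rows(complete, rows_per_version))  # 1030
print(count_rows(gapped, rows_per_version))    # 1010
```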

Contributor

@dataroaring dataroaring left a comment


@xy720
Member Author

xy720 commented Oct 20, 2023

OK, I understand. I will use a session variable so that this behavior only affects one session.


@xy720 xy720 force-pushed the enhance-skip-missing-version branch from 6a64c9d to 5dc000d Compare October 21, 2023 14:20
@xy720 xy720 changed the title [enhancement](recover) support skipping missing version in select by config [enhancement](recover) support skipping missing version in select by session variable Oct 21, 2023
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@xy720
Member Author

xy720 commented Oct 21, 2023

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

1 similar comment
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.04% (8301/22411)
Line Coverage: 29.20% (66663/228334)
Region Coverage: 27.83% (34611/124366)
Branch Coverage: 24.41% (17578/72002)
Coverage Report: http://coverage.selectdb-in.cc/coverage/f096b41aaf47ad5f30fbb86a151b0ac7963d841f_f096b41aaf47ad5f30fbb86a151b0ac7963d841f/report/index.html

@xy720
Member Author

xy720 commented Oct 22, 2023

run p0

@xy720 xy720 force-pushed the enhance-skip-missing-version branch from df85575 to 578f008 Compare October 22, 2023 07:36
@xy720
Member Author

xy720 commented Oct 22, 2023

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.05% (8303/22411)
Line Coverage: 29.21% (66701/228334)
Region Coverage: 27.85% (34630/124366)
Branch Coverage: 24.43% (17593/72002)
Coverage Report: http://coverage.selectdb-in.cc/coverage/578f00892830a5fb8c6ccd21faddcfc29cb8d506_578f00892830a5fb8c6ccd21faddcfc29cb8d506/report/index.html

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.95 seconds
stream load tsv: 552 seconds loaded 74807831229 Bytes, about 129 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17161969847 Bytes

@xy720
Member Author

xy720 commented Oct 22, 2023

run p0

1 similar comment
@xy720
Member Author

xy720 commented Oct 22, 2023

run p0

@xy720 xy720 force-pushed the enhance-skip-missing-version branch from de72e15 to 64d33b8 Compare November 1, 2023 12:38
@xy720
Member Author

xy720 commented Nov 1, 2023

run buildall

@github-actions
Contributor

github-actions bot commented Nov 1, 2023

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Contributor

github-actions bot commented Nov 1, 2023

clang-tidy review says "All clean, LGTM! 👍"

1 similar comment
@github-actions
Contributor

github-actions bot commented Nov 1, 2023

clang-tidy review says "All clean, LGTM! 👍"

@xy720
Member Author

xy720 commented Nov 1, 2023

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.42% (8487/22680)
Line Coverage: 29.81% (68915/231216)
Region Coverage: 28.27% (35670/126154)
Branch Coverage: 25.15% (18278/72666)
Coverage Report: http://coverage.selectdb-in.cc/coverage/36fad9a1e47a77fb5e2f1124dd96eb1c176ac507_36fad9a1e47a77fb5e2f1124dd96eb1c176ac507/report/index.html

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.04 seconds
stream load tsv: 559 seconds loaded 74807831229 Bytes, about 127 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.1 seconds inserted 10000000 Rows, about 343K ops/s
storage size: 17162044251 Bytes

Contributor

@dataroaring dataroaring left a comment


LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 1, 2023
@github-actions
Contributor

github-actions bot commented Nov 1, 2023

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions bot commented Nov 1, 2023

PR approved by anyone and no changes requested.

@xy720
Member Author

xy720 commented Nov 1, 2023

run feut

@xy720
Member Author

xy720 commented Nov 1, 2023

run pipelinex_p0

@lide-reed lide-reed self-requested a review November 2, 2023 11:58
Contributor

@lide-reed lide-reed left a comment


LGTM


Labels

approved Indicates a PR has been approved by one committer. dev/1.2.8-merged dev/2.0.11-merged reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants