Skip to content

Conversation

@morningman
Copy link
Contributor

Proposed changes

Issue Number: close #8208

Problem Summary:

In the original tablet reporting information, the version missing information is done by combining
two pieces of information as follows:

  1. the maximum consecutive version number
  2. the version_miss field

The logic of this approach is confusing and inconsistent with the logic of checking for missing versions when querying.

After the change, we directly use the version checking logic used in the query, and set version_miss to true
if a missing version is found

and on the FE processing side. Originally, only the bad replica information was syncronized among FEs,
but not the version missing information. As a result, the non-master FE is not aware of the missing version information.

In the new design, we deprecate the original log persistence class BackendTabletsInfo and use the new
BackendReplicasInfo to record replica reporting information and write both bad and version missing
information to metadata so that other FEs can synchronize these information.

Checklist(Required)

  1. Does it affect the original behavior: (No)
  2. Has unit tests been added: (Yes)
  3. Has document been added or modified: (No Need)
  4. Does it need to update dependencies: (No)
  5. Are there any changes that cannot be rolled back: (No)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@morningman morningman added the dev/backlog waiting to be merged in future dev branch label Feb 23, 2022
}

OLAPStatus Tablet::check_version_integrity(const Version& version) {
OLAPStatus Tablet::check_version_integrity(const Version& version, bool quite) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does quite mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if !quite, it will print log for missing versions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean quiet? slient may be better

yangzhg
yangzhg previously approved these changes Mar 8, 2022
Copy link
Member

@yangzhg yangzhg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

yangzhg
yangzhg previously approved these changes Mar 9, 2022
Copy link
Member

@yangzhg yangzhg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 9, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2022

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2022

PR approved by anyone and no changes requested.

@yangzhg yangzhg dismissed their stale review March 9, 2022 03:27

maybe typo

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Mar 9, 2022
Copy link
Member

@yangzhg yangzhg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 9, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2022

PR approved by at least one committer and no changes requested.

@morningman morningman merged commit 826467e into apache:master Mar 9, 2022
morningman added a commit to morningman/doris that referenced this pull request Mar 10, 2022
apache#8209)

In the original tablet reporting information, the version missing information is done by combining
two pieces of information as follows:

1. the maximum consecutive version number
2. the `version_miss` field

The logic of this approach is confusing and inconsistent with the logic of checking for missing versions when querying.

After the change, we directly use the version checking logic used in the query, and set `version_miss` to true
if a missing version is found

and on the FE processing side. Originally, only the **bad replica** information was syncronized among FEs,
but not the **version missing** information. As a result, the non-master FE is not aware of the missing version information.

In the new design, we deprecate the original log persistence class `BackendTabletsInfo` and use the new
`BackendReplicasInfo` to record replica reporting information and write both **bad** and **version missing**
information to metadata so that other FEs can synchronize these information.
morningman added a commit that referenced this pull request Mar 11, 2022
#8444)

1.
This bug was introduced by #8209.
Error in fe.warn.log:
```
java.lang.IllegalStateException: 560278
        at com.google.common.base.Preconditions.checkState(Preconditions.java:508) ~[spark-dpp-0.15-SNAPSHOT.jar:0.15-SNAPSHOT]
        at org.apache.doris.catalog.TabletInvertedIndex.getReplica(TabletInvertedIndex.java:462) ~[palo-fe.jar:0.15-SNAPSHOT]
        at org.apache.doris.catalog.Catalog.replayBackendReplicasInfo(Catalog.java:6941) ~[palo-fe.jar:0.15-SNAPSHOT]
        at org.apache.doris.persist.EditLog.loadJournal(EditLog.java:626) [palo-fe.jar:0.15-SNAPSHOT]
        at org.apache.doris.catalog.Catalog.replayJournal(Catalog.java:2446) [palo-fe.jar:0.15-SNAPSHOT]
        at org.apache.doris.master.Checkpoint.doCheckpoint(Checkpoint.java:116) [palo-fe.jar:0.15-SNAPSHOT]
        at org.apache.doris.master.Checkpoint.runAfterCatalogReady(Checkpoint.java:74) [palo-fe.jar:0.15-SNAPSHOT]
        at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) [palo-fe.jar:0.15-SNAPSHOT]
        at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:0.15-SNAPSHOT]
```

Since the reporting of a tablet and the deletion of a tablet are two independent events
and are not mutually exclusive, it may happen that the tablet is deleted first and the reporting is done later.

2.
Change the tablet report info. Now, the version of a tablet report from BE is the largest continuous version.
Eg, versions: [1,2,3,5,7], the report version of this tablet will be 3.
morningman added a commit that referenced this pull request Mar 11, 2022
#8444)

1.
This bug was introduced by #8209.
Error in fe.warn.log:
```
java.lang.IllegalStateException: 560278
        at com.google.common.base.Preconditions.checkState(Preconditions.java:508) ~[spark-dpp-0.15-SNAPSHOT.jar:0.15-SNAPSHOT]
        at org.apache.doris.catalog.TabletInvertedIndex.getReplica(TabletInvertedIndex.java:462) ~[palo-fe.jar:0.15-SNAPSHOT]
        at org.apache.doris.catalog.Catalog.replayBackendReplicasInfo(Catalog.java:6941) ~[palo-fe.jar:0.15-SNAPSHOT]
        at org.apache.doris.persist.EditLog.loadJournal(EditLog.java:626) [palo-fe.jar:0.15-SNAPSHOT]
        at org.apache.doris.catalog.Catalog.replayJournal(Catalog.java:2446) [palo-fe.jar:0.15-SNAPSHOT]
        at org.apache.doris.master.Checkpoint.doCheckpoint(Checkpoint.java:116) [palo-fe.jar:0.15-SNAPSHOT]
        at org.apache.doris.master.Checkpoint.runAfterCatalogReady(Checkpoint.java:74) [palo-fe.jar:0.15-SNAPSHOT]
        at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) [palo-fe.jar:0.15-SNAPSHOT]
        at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:0.15-SNAPSHOT]
```

Since the reporting of a tablet and the deletion of a tablet are two independent events
and are not mutually exclusive, it may happen that the tablet is deleted first and the reporting is done later.

2.
Change the tablet report info. Now, the version of a tablet report from BE is the largest continuous version.
Eg, versions: [1,2,3,5,7], the report version of this tablet will be 3.
cambyzju pushed a commit to cambyzju/incubator-doris that referenced this pull request Mar 12, 2022
apache#8209)

In the original tablet reporting information, the version missing information is done by combining
two pieces of information as follows:

1. the maximum consecutive version number
2. the `version_miss` field

The logic of this approach is confusing and inconsistent with the logic of checking for missing versions when querying.

After the change, we directly use the version checking logic used in the query, and set `version_miss` to true
if a missing version is found

and on the FE processing side. Originally, only the **bad replica** information was syncronized among FEs,
but not the **version missing** information. As a result, the non-master FE is not aware of the missing version information.

In the new design, we deprecate the original log persistence class `BackendTabletsInfo` and use the new 
`BackendReplicasInfo` to record replica reporting information and write both **bad** and **version missing**
information to metadata so that other FEs can synchronize these information.
morningman pushed a commit to morningman/doris that referenced this pull request Mar 12, 2022
morningman added a commit that referenced this pull request Mar 12, 2022
morningman added a commit that referenced this pull request Mar 12, 2022
morningman added a commit that referenced this pull request Mar 14, 2022
#8209)

In the original tablet reporting information, the version missing information is done by combining
two pieces of information as follows:

1. the maximum consecutive version number
2. the `version_miss` field

The logic of this approach is confusing and inconsistent with the logic of checking for missing versions when querying.

After the change, we directly use the version checking logic used in the query, and set `version_miss` to true
if a missing version is found

and on the FE processing side. Originally, only the **bad replica** information was syncronized among FEs,
but not the **version missing** information. As a result, the non-master FE is not aware of the missing version information.

In the new design, we deprecate the original log persistence class `BackendTabletsInfo` and use the new
`BackendReplicasInfo` to record replica reporting information and write both **bad** and **version missing**
information to metadata so that other FEs can synchronize these information.
@morningman morningman added dev/merged-1.0.1-deprecated PR has been merged into dev-1.0.1 and removed dev/backlog waiting to be merged in future dev branch labels Mar 14, 2022
zhengte pushed a commit to zhengte/incubator-doris that referenced this pull request Mar 15, 2022
apache#8209)

In the original tablet reporting information, the version missing information is done by combining
two pieces of information as follows:

1. the maximum consecutive version number
2. the `version_miss` field

The logic of this approach is confusing and inconsistent with the logic of checking for missing versions when querying.

After the change, we directly use the version checking logic used in the query, and set `version_miss` to true
if a missing version is found

and on the FE processing side. Originally, only the **bad replica** information was syncronized among FEs,
but not the **version missing** information. As a result, the non-master FE is not aware of the missing version information.

In the new design, we deprecate the original log persistence class `BackendTabletsInfo` and use the new 
`BackendReplicasInfo` to record replica reporting information and write both **bad** and **version missing**
information to metadata so that other FEs can synchronize these information.
zhengte pushed a commit to zhengte/incubator-doris that referenced this pull request Mar 15, 2022
apache#8444)

1.
This bug was introduced by apache#8209.
Error in fe.warn.log:
```
java.lang.IllegalStateException: 560278
        at com.google.common.base.Preconditions.checkState(Preconditions.java:508) ~[spark-dpp-0.15-SNAPSHOT.jar:0.15-SNAPSHOT]
        at org.apache.doris.catalog.TabletInvertedIndex.getReplica(TabletInvertedIndex.java:462) ~[palo-fe.jar:0.15-SNAPSHOT]
        at org.apache.doris.catalog.Catalog.replayBackendReplicasInfo(Catalog.java:6941) ~[palo-fe.jar:0.15-SNAPSHOT]
        at org.apache.doris.persist.EditLog.loadJournal(EditLog.java:626) [palo-fe.jar:0.15-SNAPSHOT]
        at org.apache.doris.catalog.Catalog.replayJournal(Catalog.java:2446) [palo-fe.jar:0.15-SNAPSHOT]
        at org.apache.doris.master.Checkpoint.doCheckpoint(Checkpoint.java:116) [palo-fe.jar:0.15-SNAPSHOT]
        at org.apache.doris.master.Checkpoint.runAfterCatalogReady(Checkpoint.java:74) [palo-fe.jar:0.15-SNAPSHOT]
        at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) [palo-fe.jar:0.15-SNAPSHOT]
        at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:0.15-SNAPSHOT]
```

Since the reporting of a tablet and the deletion of a tablet are two independent events
and are not mutually exclusive, it may happen that the tablet is deleted first and the reporting is done later.

2.
Change the tablet report info. Now, the version of a tablet report from BE is the largest continuous version.
Eg, versions: [1,2,3,5,7], the report version of this tablet will be 3.
zhengte pushed a commit to zhengte/incubator-doris that referenced this pull request Mar 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/merged-1.0.1-deprecated PR has been merged into dev-1.0.1 reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] query failed with -214

3 participants