Conversation

@zhannngchen
Contributor

Proposed changes

Issue Number: close #xxx

Problem summary

The version column is added only to unique key tables.
We've met some data inconsistency issues; with this version column, it's much easier to locate the cause.
For example:

  1. If there are duplicate keys in a unique key table, the version column lets us determine whether the duplicates were generated by a single load or by compaction.
  2. We can track how load and compaction replace and merge keys in a unique key table.
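
To illustrate the first point, here is a minimal sketch of the kind of check a per-row version value makes possible. It is not Doris code: the `Row` type, the `classify_duplicates` helper, and the sample data are all hypothetical, and how the two outcomes map onto "single load" versus "compaction" is an assumption of this sketch.

```python
from collections import defaultdict
from typing import NamedTuple


class Row(NamedTuple):
    key: str      # unique key of the row
    value: str    # payload, irrelevant to the diagnosis
    version: int  # value of the hidden version column for this row (hypothetical data)


def classify_duplicates(rows):
    """For each key that appears more than once, report whether all copies
    carry the same version or different versions."""
    versions_by_key = defaultdict(list)
    for row in rows:
        versions_by_key[row.key].append(row.version)

    report = {}
    for key, versions in versions_by_key.items():
        if len(versions) < 2:
            continue  # key is unique, nothing to diagnose
        if len(set(versions)) == 1:
            report[key] = "same version"
        else:
            report[key] = "different versions"
    return report


rows = [
    Row("k1", "a", 5),
    Row("k1", "b", 5),  # duplicate inside version 5
    Row("k2", "c", 3),
    Row("k2", "d", 7),  # duplicate across versions 3 and 7
    Row("k3", "e", 4),  # not duplicated
]

print(classify_duplicates(rows))
```

In this sketch, copies that share a version would point at a single load, while copies with different versions would point at a later merge step that failed to deduplicate.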

Checklist(Required)

  1. Does it affect the original behavior:
    • Yes
    • No
    • I don't know
  2. Has unit tests been added:
    • Yes
    • No
    • No Need
  3. Has document been added or modified:
    • Yes
    • No
    • No Need
  4. Does it need to update dependencies:
    • Yes
    • No
  5. Are there any changes that cannot be rolled back:
    • Yes (If Yes, please explain WHY)
    • No

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions github-actions bot added the area/planner Issues or PRs related to the query planner label Feb 8, 2023
@zhannngchen zhannngchen changed the title from "Add version col" to "[Enhancement](storage) add a new hidden column __DORIS_VERSION_COL__ for unique key table" Feb 8, 2023

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@github-actions

github-actions bot commented Feb 8, 2023

clang-tidy review says "All clean, LGTM! 👍"

@hello-stephen
Contributor

hello-stephen commented Feb 8, 2023

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 34.49 seconds
stream load tsv: 466 seconds loaded 74807831229 Bytes, about 153 MB/s
stream load json: 36 seconds loaded 2358488459 Bytes, about 62 MB/s
stream load orc: 68 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 29 seconds loaded 861443392 Bytes, about 28 MB/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230218091954_clickbench_pr_99535.html
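
As a quick sanity check on the stream load figures above, the sketch below recomputes each rate from the reported byte count and duration. The reported "MB/s" values match when 1 MB is taken as 2^20 bytes; the function name is illustrative, and rounding down is an assumption (rounding to nearest gives the same four figures).

```python
MIB = 2 ** 20  # the reported "MB/s" figures are consistent with 1 MB = 2^20 bytes

# (bytes loaded, seconds) copied from the test result above
stream_loads = {
    "tsv": (74807831229, 466),
    "json": (2358488459, 36),
    "orc": (1101869774, 68),
    "parquet": (861443392, 29),
}


def throughput_mib_per_s(total_bytes, seconds):
    """Average load rate in MiB/s, rounded down to a whole number."""
    return int(total_bytes / seconds / MIB)


for fmt, (total_bytes, seconds) in stream_loads.items():
    print(f"stream load {fmt}: about {throughput_mib_per_s(total_bytes, seconds)} MB/s")
```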

@github-actions

github-actions bot commented Feb 9, 2023

clang-tidy review says "All clean, LGTM! 👍"

Contributor

@liaoxin01 liaoxin01 left a comment

LGTM

@github-actions

PR approved by anyone and no changes requested.

Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Feb 23, 2023
@github-actions

PR approved by at least one committer and no changes requested.

@zhannngchen
Contributor Author

run compile

@zhannngchen zhannngchen merged commit edead49 into apache:master Feb 23, 2023
zhannngchen added a commit to zhannngchen/incubator-doris that referenced this pull request Feb 24, 2023
morningman pushed a commit that referenced this pull request Feb 27, 2023
yagagagaga pushed a commit to yagagagaga/doris that referenced this pull request Mar 9, 2023
dataroaring pushed a commit that referenced this pull request Mar 18, 2025
…placed by fake version when merging tmp rowset in sort SC (#49193)

### What problem does this PR solve?

When converting historical rowsets in a sort schema change, the process may
write many temp rowsets and later merge them into one rowset if memory is
insufficient. However, these rowsets have fake versions such as
`[2^29+x, 2^29+x]`, so the values of `__DORIS_VERSION_COL__` in these
temp rowsets are wrongly replaced by the fake versions (see
#16509) in `Merger::vmerge_rowsets`
when they are merged into a single rowset.

This PR modifies these fake versions to avoid the problem.
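
The effect described in this commit message can be pictured with a toy model. This is not the Doris implementation: `merge_stamping_version` is a hypothetical stand-in for the merge step, and the rule that each row's version column is overwritten with the end version of the rowset it was read from is an assumption made only for illustration.

```python
FAKE_VERSION_BASE = 2 ** 29  # fake versions look like [2^29+x, 2^29+x] per the commit message


def merge_stamping_version(rowsets):
    """Toy merge: every output row's version column is overwritten with the
    end version of its source rowset (assumed behaviour, not Doris code)."""
    merged = []
    for (_start, end), rows in rowsets:
        for key, _stored_version in rows:
            merged.append((key, end))  # the stored version is lost here
    return sorted(merged)


# Two temp rowsets whose rows originally carried real versions 3, 5 and 4.
temp_rowsets = [
    ((FAKE_VERSION_BASE + 0, FAKE_VERSION_BASE + 0), [("a", 3), ("c", 5)]),
    ((FAKE_VERSION_BASE + 1, FAKE_VERSION_BASE + 1), [("b", 4)]),
]

merged = merge_stamping_version(temp_rowsets)
for key, version in merged:
    print(key, version)
```

In this model the merged rows end up with versions at or above 2^29 instead of 3, 4 and 5, which is the kind of corruption the commit message reports.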
github-actions bot pushed a commit that referenced this pull request Mar 18, 2025
…placed by fake version when merging tmp rowset in sort SC (#49193)

koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…placed by fake version when merging tmp rowset in sort SC (apache#49193)

Labels

    • approved: Indicates a PR has been approved by one committer.
    • area/planner: Issues or PRs related to the query planner
    • dev/1.2.3-merged
    • kind/test
    • reviewed

5 participants