Skip to content

Conversation

@bobhan1
Copy link
Contributor

@bobhan1 bobhan1 commented Sep 25, 2023

Proposed changes

When we insert a row with delete sign marked, it will not mark itself on the delete bitmap if the table has sequence column(see #24011). So if we delete a row using delete sign and then insert a row with same keys using partial update, it will wrongly fill the values in in the unmentioned columns in which the delete sign column will be filled with 1 and the row that should have been inserted will be deleted. So in this occasion we must check the delete sign column to check that if a row really exists in the table.

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@bobhan1
Copy link
Contributor Author

bobhan1 commented Sep 25, 2023

run buildall

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 36.31% (8111/22336)
Line Coverage: 28.43% (64876/228158)
Region Coverage: 27.31% (33634/123137)
Branch Coverage: 23.97% (17168/71632)
Coverage Report: http://coverage.selectdb-in.cc/coverage/aafaf082f7e72b15880defb6491a65b7b2f90fae_aafaf082f7e72b15880defb6491a65b7b2f90fae/report/index.html

@bobhan1 bobhan1 force-pushed the fix-partial-update-delete-sign branch from aafaf08 to 354b933 Compare September 25, 2023 14:15
@bobhan1
Copy link
Contributor Author

bobhan1 commented Sep 25, 2023

run buildall

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 36.34% (8117/22337)
Line Coverage: 28.45% (64927/228229)
Region Coverage: 27.33% (33669/123200)
Branch Coverage: 23.97% (17182/71672)
Coverage Report: http://coverage.selectdb-in.cc/coverage/354b933b695de4a3ffa1b0d757f87930744146bf_354b933b695de4a3ffa1b0d757f87930744146bf/report/index.html

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.79 seconds
stream load tsv: 573 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17162257880 Bytes

@bobhan1 bobhan1 force-pushed the fix-partial-update-delete-sign branch from 354b933 to 819937b Compare September 26, 2023 04:05
@bobhan1
Copy link
Contributor Author

bobhan1 commented Sep 26, 2023

run buildall

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 36.33% (8115/22338)
Line Coverage: 28.43% (64897/228260)
Region Coverage: 27.32% (33665/123224)
Branch Coverage: 23.97% (17180/71674)
Coverage Report: http://coverage.selectdb-in.cc/coverage/819937b2fbdb4fdeb789bbca7a377f7fe86d4e73_819937b2fbdb4fdeb789bbca7a377f7fe86d4e73/report/index.html

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.37 seconds
stream load tsv: 576 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.4 seconds inserted 10000000 Rows, about 352K ops/s
storage size: 17162337625 Bytes

@bobhan1 bobhan1 requested a review from zhannngchen September 26, 2023 05:03
Copy link
Contributor

@zhannngchen zhannngchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added approved Indicates a PR has been approved by one committer. reviewed labels Sep 26, 2023
@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@zhannngchen zhannngchen merged commit 1abda1c into apache:master Sep 26, 2023
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 7, 2023
vinlee19 pushed a commit to vinlee19/doris that referenced this pull request Oct 7, 2023
xiaokang pushed a commit that referenced this pull request Oct 8, 2023
…n merge-on-write unique table (#25071)

* pick "[Fix](Partial update) Fix wrong position using in segment writer #22782"

* pick "[Enhancement](merge-on-write) use delete bitmap to mark delete for rows with delete sign when sequence column doesn't exist #24011"

* pick "[Fix](merge-on-write) Correct the alignment process when the existing rows with same key has marked delete sign #24877"
dataroaring pushed a commit that referenced this pull request Jun 14, 2024
…rows with delete sign (#36210)

Issue Number: close #34296

1. When partial update filling in the missing fields, if a load job
previously wrote data with a delete sign, it will also read out the data
in the column with the delete sign, so that the newly written data will
also become invisible
2. This problem was fixed in #24877, but was introduced again in #26721,
and was never found because the case was changed to the wrong output in
#26721.
3. The fix in #24877 didn't take into account the handling of concurrent
conflicts in the publish phase, the current PR adds this part of the
handling, and adds the corresponding case.
dataroaring pushed a commit that referenced this pull request Jun 17, 2024
…rows with delete sign (#36210)

Issue Number: close #34296

1. When partial update filling in the missing fields, if a load job
previously wrote data with a delete sign, it will also read out the data
in the column with the delete sign, so that the newly written data will
also become invisible
2. This problem was fixed in #24877, but was introduced again in #26721,
and was never found because the case was changed to the wrong output in
#26721.
3. The fix in #24877 didn't take into account the handling of concurrent
conflicts in the publish phase, the current PR adds this part of the
handling, and adds the corresponding case.
zhannngchen added a commit to zhannngchen/incubator-doris that referenced this pull request Jun 24, 2024
…rows with delete sign (apache#36210)

Issue Number: close apache#34296

1. When partial update filling in the missing fields, if a load job
previously wrote data with a delete sign, it will also read out the data
in the column with the delete sign, so that the newly written data will
also become invisible
2. This problem was fixed in apache#24877, but was introduced again in apache#26721,
and was never found because the case was changed to the wrong output in
3. The fix in apache#24877 didn't take into account the handling of concurrent
conflicts in the publish phase, the current PR adds this part of the
handling, and adds the corresponding case.
zhannngchen added a commit to zhannngchen/incubator-doris that referenced this pull request Jun 24, 2024
…rows with delete sign (apache#36210)

Issue Number: close apache#34296

1. When partial update filling in the missing fields, if a load job
previously wrote data with a delete sign, it will also read out the data
in the column with the delete sign, so that the newly written data will
also become invisible
2. This problem was fixed in apache#24877, but was introduced again in apache#26721,
and was never found because the case was changed to the wrong output in
3. The fix in apache#24877 didn't take into account the handling of concurrent
conflicts in the publish phase, the current PR adds this part of the
handling, and adds the corresponding case.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/2.0.3-merged merge_conflict reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants