[fix](iceberg) Don't prematurely erase DeleteRows in reading iceberg table with position delete #47977
run buildall
TPC-H: Total hot run time: 31562 ms
TPC-DS: Total hot run time: 189428 ms
TeamCity be ut coverage result:
ClickBench: Total hot run time: 30.39 s
morningman
left a comment
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
…table with position delete (apache#47977)
…ng iceberg table with position delete (#47977) (#48308)
…ng iceberg table with position delete (#47977) (#48309)
What problem does this PR solve?
Issue Number: close #41460
Problem Summary:
When reading the Iceberg table, previously read `DeleteRows` should not be released immediately, as the Iceberg data file is split into multiple `IcebergSplit`s for execution. These `IcebergSplit`s belong to the same data file, meaning they share the same `DeleteRows`. Therefore, `DeleteRows` in the `DeleteFile` should not be released prematurely. Instead, they should be released when the shared_kv is reset, at which point all `DeleteRows` will be freed along with the cached `DeleteFile`.
Release note
None
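The lifetime problem described in the Problem Summary can be illustrated with a small, self-contained C++ sketch. It is not the Doris implementation: `DeleteRows`, `DeleteFile`, `SharedKV`, and `IcebergSplit` here are simplified stand-ins, and `count_live_rows`, `run_two_splits`, the file names, and the row positions are all hypothetical. The sketch assumes one data file of 10 rows read as two splits, with positions 1 and 7 deleted, and shows one plausible way the symptom could appear when the shared `DeleteRows` entry is erased after the first split has used it.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not the actual Doris classes.
using DeleteRows = std::vector<int64_t>;  // positions of deleted rows in one data file

struct DeleteFile {
    // data file path -> row positions deleted in that data file
    std::unordered_map<std::string, DeleteRows> rows_by_data_file;
};

// Plays the role of the shared_kv cache: delete file path -> parsed DeleteFile.
struct SharedKV {
    std::unordered_map<std::string, std::unique_ptr<DeleteFile>> cache;
    void reset() { cache.clear(); }  // frees every DeleteFile and its DeleteRows
};

// One slice [first_row, first_row + num_rows) of a data file.
struct IcebergSplit {
    std::string data_file;
    int64_t first_row;
    int64_t num_rows;
};

// Counts the rows of one split that survive the position deletes.
// erase_after_read = true mimics releasing DeleteRows right after first use.
int64_t count_live_rows(DeleteFile& delete_file, const IcebergSplit& split,
                        bool erase_after_read) {
    assert(split.num_rows >= 0);
    auto it = delete_file.rows_by_data_file.find(split.data_file);
    if (it == delete_file.rows_by_data_file.end()) {
        return split.num_rows;  // no deletes known for this data file
    }
    int64_t deleted = 0;
    for (int64_t pos : it->second) {
        if (pos >= split.first_row && pos < split.first_row + split.num_rows) {
            ++deleted;
        }
    }
    if (erase_after_read) {
        delete_file.rows_by_data_file.erase(it);  // premature release
    }
    return split.num_rows - deleted;
}

// Reads one 10-row data file as two splits that share the same DeleteRows.
int64_t run_two_splits(bool erase_after_read) {
    SharedKV shared_kv;
    auto parsed = std::make_unique<DeleteFile>();
    parsed->rows_by_data_file["data-0.parquet"] = {1, 7};
    shared_kv.cache["delete-0.parquet"] = std::move(parsed);
    DeleteFile& delete_file = *shared_kv.cache.at("delete-0.parquet");

    const IcebergSplit splits[] = {{"data-0.parquet", 0, 5}, {"data-0.parquet", 5, 5}};
    int64_t live = 0;
    for (const IcebergSplit& split : splits) {
        live += count_live_rows(delete_file, split, erase_after_read);
    }
    shared_kv.reset();  // the only point where DeleteRows should be freed
    return live;
}
```

In this sketch, `run_two_splits(false)` returns 8 because both splits see the shared `DeleteRows`, while `run_two_splits(true)` returns 9 because the second split no longer finds the entry and the deleted row at position 7 reappears. Deferring the release to `SharedKV::reset()` keeps the entry alive for every split of the same data file.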