[cherry-pick](branch-3.0) Don't prematurely erase DeleteRows in reading iceberg table with position delete (#47977) #48309
run buildall
…table with position delete (apache#47977)
Issue Number: close apache#41460
Problem Summary: When reading an Iceberg table, previously read `DeleteRows` should not be released immediately, because an Iceberg data file is split into multiple `IcebergSplit`s for execution. These `IcebergSplit`s belong to the same data file, so they share the same `DeleteRows`. Therefore, `DeleteRows` in the `DeleteFile` should not be released prematurely. Instead, they should be released when the shared_kv is reset, at which point all `DeleteRows` are freed along with the cached `DeleteFile`.
e589895 to
20a27fe
TPC-H: Total hot run time: 40291 ms
TPC-DS: Total hot run time: 197888 ms
ClickBench: Total hot run time: 33.86 s
TPC-H: Total hot run time: 40806 ms
TPC-DS: Total hot run time: 197709 ms
ClickBench: Total hot run time: 31.09 s
dataroaring left a comment
LGTM
What problem does this PR solve?
pick #47977
Issue Number: close #41460
Problem Summary:
When reading an Iceberg table, previously read `DeleteRows` should not be released immediately, because an Iceberg data file is split into multiple `IcebergSplit`s for execution. These `IcebergSplit`s belong to the same data file, so they share the same `DeleteRows`. Therefore, `DeleteRows` in the `DeleteFile` should not be released prematurely. Instead, they should be released when the shared_kv is reset, at which point all `DeleteRows` are freed along with the cached `DeleteFile`.
Release note
None
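To illustrate the problem summary above, here is a minimal, self-contained C++ sketch of why erasing a data file's delete positions after the first split lets deleted rows leak into later splits. It is not the Doris implementation: the `DeleteFile`, `Split`, `read_split`, and `count_live_rows` definitions, the `data-0.parquet` path, and the row ranges are all hypothetical stand-ins chosen for the example, and a plain `std::map` stands in for the cached entry kept in the shared_kv.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not the actual Doris types.
using DeleteRows = std::vector<int64_t>;  // sorted positions deleted in one data file

struct DeleteFile {
    // data file path -> positions deleted in that data file
    std::map<std::string, DeleteRows> rows_by_data_file;
};

struct Split {
    std::string data_file;
    int64_t first_row;  // inclusive
    int64_t last_row;   // exclusive
};

// Returns the row positions of one split that survive the position deletes.
// When erase_after_read is true, the entry is erased right after use, which
// mimics the premature release described in the problem summary.
std::vector<int64_t> read_split(DeleteFile& cached, const Split& split,
                                bool erase_after_read) {
    assert(split.first_row <= split.last_row);
    std::vector<int64_t> live;
    auto it = cached.rows_by_data_file.find(split.data_file);
    const bool has_deletes = it != cached.rows_by_data_file.end();
    for (int64_t row = split.first_row; row < split.last_row; ++row) {
        const bool deleted =
                has_deletes &&
                std::binary_search(it->second.begin(), it->second.end(), row);
        if (!deleted) {
            live.push_back(row);
        }
    }
    if (erase_after_read && has_deletes) {
        cached.rows_by_data_file.erase(it);
    }
    return live;
}

// Reads one 8-row data file as two splits that share the same delete positions.
std::size_t count_live_rows(bool erase_after_read) {
    DeleteFile cached;
    cached.rows_by_data_file["data-0.parquet"] = {1, 6};
    const std::vector<Split> splits = {{"data-0.parquet", 0, 4},
                                       {"data-0.parquet", 4, 8}};
    std::size_t live = 0;
    for (const Split& split : splits) {
        live += read_split(cached, split, erase_after_read).size();
    }
    return live;
}
```

In this sketch, `count_live_rows(false)` keeps the entry until the cache itself is dropped and returns 6, while `count_live_rows(true)` returns 7 because the second split no longer sees that row 6 was deleted.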
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)