Skip to content

Conversation

@suxiaogang223
Copy link
Contributor

What problem does this PR solve?

Issue Number: close #41460

Problem Summary:
When reading the Iceberg table, previously read DeleteRows should not be released immediately, as the Iceberg data file is split into multiple IcebergSplits for execution. These IcebergSplits belong to the same data file, meaning they share the same DeleteRows. Therefore, DeleteRows in the DeleteFile should not be released prematurely. Instead, they should be released when the shared_kv is reset, at which point all DeleteRows will be freed along with the cached DeleteFile.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Copy link
Contributor

Thearas commented Feb 17, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223
Copy link
Contributor Author

run buildall

@suxiaogang223 suxiaogang223 changed the title [fix](iceberg) Don't erase DeleteRows after [fix](iceberg) Don't erase DeleteRows after set_delete_rows Feb 17, 2025
@suxiaogang223 suxiaogang223 changed the title [fix](iceberg) Don't erase DeleteRows after set_delete_rows [fix](iceberg) Don't prematurely erase DeleteRows in reading iceberg table with position delete Feb 17, 2025
@doris-robot
Copy link

TPC-H: Total hot run time: 31562 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a9d8c94067971cd4383da2f95324b7736e688934, data reload: false

------ Round 1 ----------------------------------
q1	17608	5288	5034	5034
q2	2046	306	171	171
q3	10411	1238	761	761
q4	10202	998	549	549
q5	7507	2380	2307	2307
q6	186	166	131	131
q7	889	735	595	595
q8	9291	1286	1110	1110
q9	4807	4712	4808	4712
q10	6845	2311	1897	1897
q11	486	277	265	265
q12	343	345	217	217
q13	17770	3762	3047	3047
q14	225	224	204	204
q15	515	462	457	457
q16	624	612	575	575
q17	574	851	334	334
q18	6704	6165	6182	6165
q19	1277	945	555	555
q20	327	341	196	196
q21	2750	2287	1977	1977
q22	363	330	303	303
Total cold run time: 101750 ms
Total hot run time: 31562 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5127	5148	5112	5112
q2	244	325	234	234
q3	2149	2642	2307	2307
q4	1435	1819	1375	1375
q5	4212	4156	4150	4150
q6	206	175	133	133
q7	1868	1795	1649	1649
q8	2618	2611	2551	2551
q9	7183	7145	7107	7107
q10	3028	3188	2751	2751
q11	563	522	502	502
q12	664	768	586	586
q13	3557	3830	3310	3310
q14	276	288	264	264
q15	512	466	475	466
q16	657	663	668	663
q17	1129	1530	1370	1370
q18	7701	7306	7365	7306
q19	792	835	992	835
q20	1961	2028	1898	1898
q21	5447	5001	4930	4930
q22	629	589	539	539
Total cold run time: 51958 ms
Total hot run time: 50038 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 189428 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a9d8c94067971cd4383da2f95324b7736e688934, data reload: false

query1	1300	952	935	935
query2	6216	1805	1792	1792
query3	10989	4436	4372	4372
query4	55403	25327	23109	23109
query5	4992	548	495	495
query6	325	197	182	182
query7	4897	492	290	290
query8	307	251	232	232
query9	5297	2515	2491	2491
query10	403	295	263	263
query11	16503	15041	14972	14972
query12	162	106	107	106
query13	1027	529	386	386
query14	10956	6318	6692	6318
query15	214	197	183	183
query16	7177	643	490	490
query17	1082	737	583	583
query18	1562	426	323	323
query19	203	194	164	164
query20	132	124	127	124
query21	210	164	101	101
query22	4534	4526	4235	4235
query23	34036	33355	33516	33355
query24	5788	2429	2431	2429
query25	466	464	404	404
query26	699	272	156	156
query27	1806	487	341	341
query28	2706	2439	2407	2407
query29	575	547	424	424
query30	210	188	151	151
query31	871	888	799	799
query32	72	59	62	59
query33	423	360	299	299
query34	775	875	507	507
query35	789	839	759	759
query36	977	994	912	912
query37	123	96	69	69
query38	4374	4401	4579	4401
query39	1489	1439	1451	1439
query40	226	115	103	103
query41	49	50	51	50
query42	121	103	103	103
query43	491	502	458	458
query44	1308	798	797	797
query45	181	170	163	163
query46	885	1057	651	651
query47	1870	1882	1791	1791
query48	380	415	302	302
query49	687	502	434	434
query50	720	754	416	416
query51	4313	4327	4251	4251
query52	113	105	93	93
query53	232	258	185	185
query54	477	479	390	390
query55	85	82	79	79
query56	285	270	266	266
query57	1214	1211	1118	1118
query58	247	239	254	239
query59	2704	2815	2749	2749
query60	292	280	274	274
query61	118	115	114	114
query62	734	731	673	673
query63	235	192	201	192
query64	1916	1048	709	709
query65	3200	3153	3126	3126
query66	738	395	302	302
query67	15716	15525	15384	15384
query68	5619	761	495	495
query69	523	290	257	257
query70	1215	1127	1110	1110
query71	438	302	272	272
query72	6335	3601	3777	3601
query73	1386	801	345	345
query74	8926	9153	8836	8836
query75	3212	3169	2683	2683
query76	3784	1163	739	739
query77	535	374	281	281
query78	10324	10119	9350	9350
query79	2615	805	587	587
query80	714	515	450	450
query81	508	270	241	241
query82	435	122	93	93
query83	182	168	152	152
query84	288	86	76	76
query85	776	357	303	303
query86	373	301	294	294
query87	4617	4392	4568	4392
query88	3581	2247	2158	2158
query89	397	311	291	291
query90	1752	193	185	185
query91	137	149	120	120
query92	71	61	58	58
query93	2427	1022	570	570
query94	695	409	302	302
query95	356	266	255	255
query96	474	549	267	267
query97	2823	2833	2731	2731
query98	230	200	197	197
query99	1339	1414	1268	1268
Total cold run time: 296694 ms
Total hot run time: 189428 ms

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 43.81% (11447/26126)
Line Coverage: 33.82% (96409/285062)
Region Coverage: 32.51% (49292/151601)
Branch Coverage: 28.19% (24772/87870)
Coverage Report: http://coverage.selectdb-in.cc/coverage/a9d8c94067971cd4383da2f95324b7736e688934_a9d8c94067971cd4383da2f95324b7736e688934/report/index.html

@doris-robot
Copy link

ClickBench: Total hot run time: 30.39 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a9d8c94067971cd4383da2f95324b7736e688934, data reload: false

query1	0.04	0.03	0.03
query2	0.07	0.03	0.03
query3	0.24	0.06	0.06
query4	1.62	0.10	0.10
query5	0.39	0.42	0.39
query6	1.16	0.66	0.65
query7	0.02	0.01	0.01
query8	0.05	0.03	0.03
query9	0.60	0.51	0.51
query10	0.57	0.56	0.56
query11	0.16	0.11	0.10
query12	0.14	0.11	0.11
query13	0.63	0.60	0.61
query14	2.68	2.76	2.70
query15	0.92	0.85	0.86
query16	0.38	0.38	0.37
query17	1.03	1.09	1.04
query18	0.21	0.20	0.19
query19	1.96	1.89	2.00
query20	0.02	0.01	0.01
query21	15.36	0.88	0.52
query22	0.76	1.31	0.65
query23	14.82	1.38	0.61
query24	9.62	2.33	0.74
query25	0.46	0.28	0.08
query26	0.76	0.20	0.13
query27	0.05	0.04	0.05
query28	6.84	0.75	0.44
query29	12.54	3.94	3.27
query30	0.25	0.09	0.06
query31	2.83	0.57	0.40
query32	3.22	0.54	0.47
query33	2.98	2.97	3.03
query34	15.59	5.07	4.48
query35	4.48	4.51	4.53
query36	0.66	0.51	0.48
query37	0.09	0.06	0.07
query38	0.05	0.04	0.03
query39	0.03	0.02	0.02
query40	0.18	0.14	0.13
query41	0.08	0.03	0.02
query42	0.03	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 104.61 s
Total hot run time: 30.39 s

Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Feb 19, 2025
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@morningman morningman merged commit cb5b057 into apache:master Feb 19, 2025
30 of 31 checks passed
@suxiaogang223 suxiaogang223 deleted the fix_iceberg_position_delete branch February 19, 2025 13:15
@morningman morningman added the usercase Important user case type label label Feb 21, 2025
lzyy2024 pushed a commit to lzyy2024/doris that referenced this pull request Feb 21, 2025
…table with position delete (apache#47977)

### What problem does this PR solve?

Issue Number: close apache#41460 

Problem Summary:
When reading the Iceberg table, previously read `DeleteRows` should not
be released immediately, as the Iceberg data file is split into multiple
`IcebergSplit`s for execution. These `IcebergSplit`s belong to the same
data file, meaning they share the same `DeleteRows`. Therefore,
`DeleteRows` in the `DeleteFile` should not be released prematurely.
Instead, they should be released when the shared_kv is reset, at which
point all `DeleteRows` will be freed along with the cached `DeleteFile`.
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Feb 25, 2025
…table with position delete (apache#47977)

Issue Number: close apache#41460

Problem Summary:
When reading the Iceberg table, previously read `DeleteRows` should not
be released immediately, as the Iceberg data file is split into multiple
`IcebergSplit`s for execution. These `IcebergSplit`s belong to the same
data file, meaning they share the same `DeleteRows`. Therefore,
`DeleteRows` in the `DeleteFile` should not be released prematurely.
Instead, they should be released when the shared_kv is reset, at which
point all `DeleteRows` will be freed along with the cached `DeleteFile`.
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Feb 25, 2025
…table with position delete (apache#47977)

Issue Number: close apache#41460

Problem Summary:
When reading the Iceberg table, previously read `DeleteRows` should not
be released immediately, as the Iceberg data file is split into multiple
`IcebergSplit`s for execution. These `IcebergSplit`s belong to the same
data file, meaning they share the same `DeleteRows`. Therefore,
`DeleteRows` in the `DeleteFile` should not be released prematurely.
Instead, they should be released when the shared_kv is reset, at which
point all `DeleteRows` will be freed along with the cached `DeleteFile`.
yiguolei pushed a commit that referenced this pull request Feb 27, 2025
…ng iceberg table with position delete (#47977) (#48308)

### What problem does this PR solve?
Issue Number: close #41460
Problem Summary:
When reading the Iceberg table, previously read `DeleteRows` should not
be released immediately, as the Iceberg data file is split into multiple
`IcebergSplit`s for execution. These `IcebergSplit`s belong to the same
data file, meaning they share the same `DeleteRows`. Therefore,
`DeleteRows` in the `DeleteFile` should not be released prematurely.
Instead, they should be released when the shared_kv is reset, at which
point all `DeleteRows` will be freed along with the cached `DeleteFile`.

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
morningman pushed a commit to suxiaogang223/doris that referenced this pull request Mar 5, 2025
…table with position delete (apache#47977)

Issue Number: close apache#41460

Problem Summary:
When reading the Iceberg table, previously read `DeleteRows` should not
be released immediately, as the Iceberg data file is split into multiple
`IcebergSplit`s for execution. These `IcebergSplit`s belong to the same
data file, meaning they share the same `DeleteRows`. Therefore,
`DeleteRows` in the `DeleteFile` should not be released prematurely.
Instead, they should be released when the shared_kv is reset, at which
point all `DeleteRows` will be freed along with the cached `DeleteFile`.
@yiguolei yiguolei mentioned this pull request Mar 25, 2025
dataroaring pushed a commit that referenced this pull request Mar 25, 2025
…ng iceberg table with position delete (#47977) (#48309)

Issue Number: close #41460 

Problem Summary:
When reading the Iceberg table, previously read `DeleteRows` should not
be released immediately, as the Iceberg data file is split into multiple
`IcebergSplit`s for execution. These `IcebergSplit`s belong to the same
data file, meaning they share the same `DeleteRows`. Therefore,
`DeleteRows` in the `DeleteFile` should not be released prematurely.
Instead, they should be released when the shared_kv is reset, at which
point all `DeleteRows` will be freed along with the cached `DeleteFile`.
@gavinchou gavinchou mentioned this pull request Apr 23, 2025
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…table with position delete (apache#47977)

### What problem does this PR solve?

Issue Number: close apache#41460 

Problem Summary:
When reading the Iceberg table, previously read `DeleteRows` should not
be released immediately, as the Iceberg data file is split into multiple
`IcebergSplit`s for execution. These `IcebergSplit`s belong to the same
data file, meaning they share the same `DeleteRows`. Therefore,
`DeleteRows` in the `DeleteFile` should not be released prematurely.
Instead, they should be released when the shared_kv is reset, at which
point all `DeleteRows` will be freed along with the cached `DeleteFile`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/2.1.9-merged dev/3.0.5-merged reviewed usercase Important user case type label

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] V2.1.6 query iceberg MOR v2 table crash

7 participants