@suxiaogang223
Contributor

@suxiaogang223 suxiaogang223 commented Feb 25, 2025

What problem does this PR solve?

pick #47977

Issue Number: close #41460

Problem Summary:
When reading an Iceberg table with position deletes, the DeleteRows that have already been read must not be released immediately. An Iceberg data file is split into multiple IcebergSplits for execution, and all IcebergSplits that belong to the same data file share the same DeleteRows, so a later split may still need them after an earlier split has finished. The DeleteRows in a DeleteFile should therefore be released only when the shared_kv is reset, at which point all DeleteRows are freed together with the cached DeleteFile.
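
The lifetime rule described above can be illustrated with a minimal C++ sketch. This is not the Doris implementation: `SharedDeleteCache`, `get_or_load`, `count_live_rows`, and the use of a plain vector for `DeleteRows` are hypothetical names and simplifications chosen for illustration. The sketch only shows the idea that delete rows are loaded once per data file, stay cached while every split of that file is processed, and are freed only when the shared cache is reset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in: deleted row positions for one data file.
using DeleteRows = std::vector<int64_t>;

// Hypothetical cache shared by all splits of a scan (plays the role of shared_kv).
class SharedDeleteCache {
public:
    // Returns the delete rows for data_file, invoking loader only on first use.
    // Entries are NOT erased after a split finishes reading them.
    template <typename Loader>
    const DeleteRows& get_or_load(const std::string& data_file, Loader loader) {
        auto it = _rows.find(data_file);
        if (it == _rows.end()) {
            it = _rows.emplace(data_file, loader()).first;
        }
        return it->second;
    }

    // Frees every cached entry at once, e.g. when the scan is finished.
    void reset() { _rows.clear(); }

    std::size_t size() const { return _rows.size(); }

private:
    std::unordered_map<std::string, DeleteRows> _rows;
};

// Counts rows in the split range [first_row, last_row) that are not deleted.
int64_t count_live_rows(const DeleteRows& deleted, int64_t first_row, int64_t last_row) {
    assert(first_row <= last_row);
    int64_t live = last_row - first_row;
    for (int64_t pos : deleted) {
        if (pos >= first_row && pos < last_row) {
            --live;
        }
    }
    return live;
}
```

In this sketch, two splits covering rows [0, 10) and [10, 20) of the same file would both call `get_or_load` with the same key. If the entry were erased after the first split, the second split would have to reload the delete rows or would run without them, which is the situation the PR description warns against.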

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223
Contributor Author

run buildall

@suxiaogang223 changed the title from "[cherry-pick](branch-3.0) Don't prematurely erase DeleteRows in reading iceberg table with position delete #47977" to "[cherry-pick](branch-3.0) Don't prematurely erase DeleteRows in reading iceberg table with position delete (#47977)" on Feb 26, 2025
@suxiaogang223
Contributor Author

run buildall

…table with position delete (apache#47977)

Issue Number: close apache#41460

Problem Summary:
When reading the Iceberg table, previously read `DeleteRows` should not
be released immediately, as the Iceberg data file is split into multiple
`IcebergSplit`s for execution. These `IcebergSplit`s belong to the same
data file, meaning they share the same `DeleteRows`. Therefore,
`DeleteRows` in the `DeleteFile` should not be released prematurely.
Instead, they should be released when the shared_kv is reset, at which
point all `DeleteRows` will be freed along with the cached `DeleteFile`.
@morningman morningman force-pushed the fix_iceberg_opsition_bug_3.0 branch from e589895 to 20a27fe Compare March 5, 2025 13:51
@morningman
Contributor

run buildall

@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40291 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit be0de631a93d2331bb61789a4b6bf073f1697a2c, data reload: false

------ Round 1 ----------------------------------
q1	17574	6770	6583	6583
q2	2062	174	180	174
q3	10536	1108	1239	1108
q4	10459	743	723	723
q5	7781	2920	2855	2855
q6	227	137	135	135
q7	983	617	606	606
q8	9357	1987	2066	1987
q9	6669	6442	6449	6442
q10	7055	2254	2330	2254
q11	481	272	268	268
q12	412	220	212	212
q13	17785	2999	3026	2999
q14	251	209	211	209
q15	494	468	469	468
q16	673	577	590	577
q17	996	594	575	575
q18	7383	6708	6639	6639
q19	1406	1102	1077	1077
q20	476	205	197	197
q21	4036	3302	3203	3203
q22	1106	1017	1000	1000
Total cold run time: 108202 ms
Total hot run time: 40291 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6583	6862	6590	6590
q2	332	228	235	228
q3	2905	2741	2937	2741
q4	2037	1806	1803	1803
q5	5826	5798	5875	5798
q6	227	131	130	130
q7	2284	1808	1846	1808
q8	3410	3560	3517	3517
q9	8882	8965	8968	8965
q10	3594	3542	3547	3542
q11	595	501	502	501
q12	816	586	608	586
q13	9549	3219	3194	3194
q14	305	285	288	285
q15	530	459	485	459
q16	700	635	629	629
q17	1864	1616	1623	1616
q18	8382	7828	7766	7766
q19	1731	1485	1621	1485
q20	2075	1923	1860	1860
q21	5638	5351	5408	5351
q22	1167	1081	1033	1033
Total cold run time: 69432 ms
Total hot run time: 59887 ms

@doris-robot

TPC-DS: Total hot run time: 197888 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit be0de631a93d2331bb61789a4b6bf073f1697a2c, data reload: false

query1	1280	906	909	906
query2	6264	2032	2028	2028
query3	10959	4728	4483	4483
query4	33022	23623	23253	23253
query5	4631	469	456	456
query6	285	209	197	197
query7	4023	319	330	319
query8	294	234	228	228
query9	9540	2622	2628	2622
query10	494	267	258	258
query11	18117	15049	15131	15049
query12	160	118	102	102
query13	1548	428	426	426
query14	8467	6591	7609	6591
query15	246	185	191	185
query16	8042	498	490	490
query17	1600	592	573	573
query18	2109	314	327	314
query19	245	167	164	164
query20	130	123	110	110
query21	224	109	122	109
query22	4742	4428	4447	4428
query23	34910	35651	34056	34056
query24	11885	3002	3063	3002
query25	546	433	446	433
query26	1120	179	182	179
query27	2866	380	351	351
query28	7856	2472	2464	2464
query29	700	460	475	460
query30	261	172	178	172
query31	1157	833	875	833
query32	106	60	58	58
query33	786	311	311	311
query34	943	533	538	533
query35	915	771	753	753
query36	1119	1003	976	976
query37	204	68	74	68
query38	4128	4214	4149	4149
query39	1587	1495	1478	1478
query40	263	106	107	106
query41	50	53	49	49
query42	117	106	107	106
query43	527	503	497	497
query44	1374	807	820	807
query45	186	170	167	167
query46	1167	757	753	753
query47	1985	1893	1945	1893
query48	483	393	397	393
query49	1004	426	393	393
query50	868	469	427	427
query51	7426	7283	7332	7283
query52	109	93	104	93
query53	267	187	196	187
query54	1063	483	472	472
query55	84	77	82	77
query56	288	265	270	265
query57	1273	1165	1169	1165
query58	238	223	217	217
query59	3341	3154	3177	3154
query60	293	268	254	254
query61	112	109	109	109
query62	861	671	677	671
query63	228	197	203	197
query64	4721	692	650	650
query65	3270	3209	3207	3207
query66	1130	311	291	291
query67	15938	15664	15670	15664
query68	5500	587	579	579
query69	438	264	264	264
query70	1164	1143	1138	1138
query71	332	272	252	252
query72	5847	4111	3999	3999
query73	787	353	368	353
query74	10381	9159	9199	9159
query75	3383	2678	2638	2638
query76	3206	1037	1143	1037
query77	396	282	283	282
query78	10477	9593	9628	9593
query79	1161	622	602	602
query80	685	451	454	451
query81	532	246	236	236
query82	743	91	93	91
query83	167	146	152	146
query84	235	82	86	82
query85	1403	349	293	293
query86	447	294	304	294
query87	4374	4376	4214	4214
query88	4044	2383	2398	2383
query89	411	288	291	288
query90	2135	186	187	186
query91	185	150	150	150
query92	68	50	51	50
query93	2318	557	566	557
query94	942	288	275	275
query95	362	262	263	262
query96	614	287	275	275
query97	3301	3142	3193	3142
query98	223	198	202	198
query99	1507	1318	1305	1305
Total cold run time: 303945 ms
Total hot run time: 197888 ms

@doris-robot

ClickBench: Total hot run time: 33.86 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit be0de631a93d2331bb61789a4b6bf073f1697a2c, data reload: false

query1	0.03	0.03	0.03
query2	0.07	0.04	0.02
query3	0.24	0.07	0.07
query4	1.63	0.10	0.10
query5	0.51	0.51	0.50
query6	1.15	0.73	0.72
query7	0.03	0.01	0.02
query8	0.04	0.03	0.04
query9	0.56	0.51	0.49
query10	0.56	0.55	0.54
query11	0.14	0.10	0.10
query12	0.14	0.11	0.11
query13	0.62	0.60	0.61
query14	2.73	2.82	2.82
query15	0.90	0.83	0.84
query16	0.39	0.36	0.42
query17	0.99	1.01	1.01
query18	0.24	0.22	0.22
query19	1.89	1.85	1.93
query20	0.02	0.01	0.01
query21	15.34	0.58	0.59
query22	2.83	2.89	2.90
query23	17.10	1.00	0.89
query24	3.01	1.38	2.00
query25	0.22	0.14	0.11
query26	0.56	0.14	0.14
query27	0.04	0.04	0.04
query28	9.30	0.50	0.48
query29	12.59	3.21	3.26
query30	0.25	0.06	0.05
query31	2.88	0.39	0.38
query32	3.25	0.47	0.45
query33	2.96	3.03	3.06
query34	17.35	4.60	4.50
query35	4.52	4.54	4.53
query36	0.68	0.48	0.49
query37	0.09	0.07	0.06
query38	0.04	0.03	0.04
query39	0.03	0.02	0.02
query40	0.16	0.13	0.12
query41	0.07	0.02	0.02
query42	0.04	0.03	0.02
query43	0.04	0.03	0.04
Total cold run time: 106.23 s
Total hot run time: 33.86 s

@suxiaogang223
Contributor Author

run buildall

@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40806 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7e11cc44a7fc347500576253297ce7eb96d8a144, data reload: false

------ Round 1 ----------------------------------
q1	17586	7022	6664	6664
q2	2076	179	186	179
q3	10691	1118	1204	1118
q4	10554	801	731	731
q5	7748	2906	2853	2853
q6	220	135	134	134
q7	975	609	614	609
q8	9361	2011	2079	2011
q9	6747	6519	6477	6477
q10	7043	2285	2324	2285
q11	462	261	258	258
q12	408	225	212	212
q13	17796	3088	3164	3088
q14	244	208	213	208
q15	513	483	480	480
q16	682	603	584	584
q17	1001	592	594	592
q18	7726	6888	6784	6784
q19	1388	1087	1165	1087
q20	475	204	200	200
q21	4116	3280	3281	3280
q22	1077	1004	972	972
Total cold run time: 108889 ms
Total hot run time: 40806 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6619	6608	6627	6608
q2	342	240	226	226
q3	2955	2964	2970	2964
q4	2068	1830	1835	1830
q5	5785	5792	5764	5764
q6	209	122	126	122
q7	2446	1821	1831	1821
q8	3449	3611	3523	3523
q9	8867	8965	8950	8950
q10	3605	3541	3568	3541
q11	610	491	488	488
q12	782	625	601	601
q13	9445	3179	3201	3179
q14	303	279	287	279
q15	540	483	490	483
q16	713	653	655	653
q17	1909	1617	1600	1600
q18	8348	7885	7765	7765
q19	1664	1530	1587	1530
q20	2135	1884	1914	1884
q21	5570	5407	5475	5407
q22	1150	1052	1026	1026
Total cold run time: 69514 ms
Total hot run time: 60244 ms

@doris-robot

TPC-DS: Total hot run time: 197709 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7e11cc44a7fc347500576253297ce7eb96d8a144, data reload: false

query1	1301	920	920	920
query2	6270	2041	1991	1991
query3	10918	4280	4409	4280
query4	66177	28504	23618	23618
query5	4984	454	444	444
query6	408	173	174	173
query7	5619	311	307	307
query8	307	224	223	223
query9	8906	2583	2595	2583
query10	459	285	279	279
query11	17765	15144	15727	15144
query12	150	105	103	103
query13	1519	444	434	434
query14	10348	7528	7558	7528
query15	205	176	188	176
query16	7136	490	506	490
query17	1093	601	587	587
query18	1876	331	326	326
query19	222	164	160	160
query20	115	113	112	112
query21	208	109	107	107
query22	4663	4446	4440	4440
query23	34491	34032	34074	34032
query24	6276	2915	2960	2915
query25	538	426	423	423
query26	679	169	173	169
query27	1920	356	360	356
query28	3957	2463	2449	2449
query29	719	505	437	437
query30	244	168	162	162
query31	1000	805	831	805
query32	72	56	55	55
query33	459	299	297	297
query34	916	502	535	502
query35	878	751	733	733
query36	1103	962	977	962
query37	118	67	70	67
query38	4084	4029	4110	4029
query39	1526	1510	1510	1510
query40	199	100	101	100
query41	48	48	47	47
query42	113	104	104	104
query43	529	498	498	498
query44	1188	843	793	793
query45	181	168	167	167
query46	1160	723	741	723
query47	2036	1905	1929	1905
query48	495	371	389	371
query49	740	394	408	394
query50	878	460	440	440
query51	7415	7087	7113	7087
query52	102	90	88	88
query53	254	178	182	178
query54	551	451	451	451
query55	80	76	78	76
query56	248	230	244	230
query57	1250	1114	1107	1107
query58	205	206	213	206
query59	3073	2882	2907	2882
query60	282	253	256	253
query61	108	109	110	109
query62	779	676	634	634
query63	213	189	198	189
query64	1397	660	638	638
query65	3228	3181	3196	3181
query66	700	295	294	294
query67	15798	15691	15620	15620
query68	3946	585	572	572
query69	428	264	255	255
query70	1181	1101	1153	1101
query71	352	257	252	252
query72	6402	4052	3937	3937
query73	744	349	347	347
query74	10203	9126	9124	9124
query75	3322	2601	2631	2601
query76	1968	1085	1109	1085
query77	498	273	273	273
query78	10476	9612	9545	9545
query79	1118	610	603	603
query80	958	419	426	419
query81	521	236	235	235
query82	178	86	90	86
query83	162	141	144	141
query84	277	86	80	80
query85	962	312	292	292
query86	399	291	283	283
query87	4508	4354	4339	4339
query88	4030	2395	2370	2370
query89	415	293	293	293
query90	2011	185	186	185
query91	185	149	148	148
query92	66	50	52	50
query93	1557	550	554	550
query94	831	271	307	271
query95	361	251	251	251
query96	607	288	282	282
query97	3362	3176	3187	3176
query98	219	206	198	198
query99	1602	1272	1300	1272
Total cold run time: 316728 ms
Total hot run time: 197709 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 38.89% (10178/26172)
Line Coverage 30.32% (86850/286466)
Region Coverage 29.35% (44598/151942)
Branch Coverage 25.89% (22693/87664)

@doris-robot

ClickBench: Total hot run time: 31.09 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 7e11cc44a7fc347500576253297ce7eb96d8a144, data reload: false

query1	0.03	0.04	0.03
query2	0.07	0.03	0.03
query3	0.23	0.06	0.06
query4	1.63	0.10	0.10
query5	0.53	0.53	0.53
query6	1.13	0.72	0.73
query7	0.02	0.02	0.02
query8	0.03	0.03	0.03
query9	0.56	0.54	0.50
query10	0.55	0.55	0.54
query11	0.14	0.10	0.10
query12	0.15	0.12	0.12
query13	0.60	0.60	0.59
query14	2.74	2.84	2.79
query15	0.88	0.83	0.84
query16	0.38	0.37	0.38
query17	0.95	1.04	1.04
query18	0.24	0.22	0.21
query19	1.97	1.83	2.01
query20	0.02	0.01	0.01
query21	15.35	0.58	0.57
query22	2.25	3.03	1.26
query23	16.96	0.97	0.73
query24	3.25	0.49	1.97
query25	0.21	0.14	0.06
query26	0.45	0.14	0.14
query27	0.06	0.05	0.04
query28	10.11	0.57	0.46
query29	13.03	3.22	3.21
query30	0.25	0.06	0.05
query31	2.87	0.40	0.37
query32	3.23	0.46	0.45
query33	2.99	3.02	3.01
query34	17.31	4.56	4.49
query35	4.65	4.53	4.49
query36	0.69	0.50	0.50
query37	0.08	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.03	0.02
query40	0.17	0.13	0.13
query41	0.08	0.02	0.02
query42	0.03	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.99 s
Total hot run time: 31.09 s

Contributor

@dataroaring dataroaring left a comment

LGTM

@dataroaring dataroaring merged commit 9d65907 into apache:branch-3.0 Mar 25, 2025
20 of 23 checks passed
@suxiaogang223 suxiaogang223 deleted the fix_iceberg_opsition_bug_3.0 branch July 10, 2025 09:02