@airborne12
Member

@airborne12 airborne12 commented Jun 18, 2025

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
This PR adds error handling around CLucene interactions in the string inverted index reader to prevent core dumps on IO failures, and introduces unit tests to verify that failures are correctly converted to error statuses.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Jun 18, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@airborne12 airborne12 requested a review from Copilot June 18, 2025 02:51
@airborne12
Member Author

run buildall

Copilot AI left a comment

Pull Request Overview

This PR adds error handling around CLucene interactions in the string inverted index reader to prevent core dumps on IO failures and introduces unit tests to verify that failures are correctly converted to error statuses.

  • Wraps the search and cache logic in StringTypeInvertedIndexReader::query inside a try/catch for CLuceneError and logs the error.
  • Returns INVERTED_INDEX_CLUCENE_ERROR instead of crashing on IO exceptions.
  • Adds mock readers and new tests (CacheErrorScenarios, TokenizedIndexQueryErrorScenarios) to cover both un-tokenized and tokenized index error paths.
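The pattern described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `FakeIndexError`, `Code`, `Status`, and `guarded_query` are hypothetical stand-ins for CLucene's `CLuceneError`, the Doris status type, and `StringTypeInvertedIndexReader::query`, whose real definitions are not shown in this conversation.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the index library's exception type.
struct FakeIndexError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical, simplified status type (assumption: only two codes are needed here).
enum class Code { OK, INVERTED_INDEX_CLUCENE_ERROR };

struct Status {
    Code code;
    std::string msg;
    bool ok() const { return code == Code::OK; }
};

// Runs the search callback inside a try/catch. An index-library exception is
// logged and converted into an error status instead of escaping to the caller.
Status guarded_query(const std::function<int()>& search, int* hits) {
    assert(hits != nullptr);
    try {
        *hits = search();
        return Status{Code::OK, ""};
    } catch (const FakeIndexError& e) {
        std::cerr << "inverted index query failed: " << e.what() << '\n';
        return Status{Code::INVERTED_INDEX_CLUCENE_ERROR, e.what()};
    }
}
```

A caller would check `status.ok()` and propagate the error upward, so a failed read surfaces as a query error rather than terminating the process.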

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
be/src/olap/rowset/segment_v2/inverted_index_reader.cpp	Wrapped query body in try/catch, added logging and status return on CLuceneError.
be/test/olap/rowset/segment_v2/inverted_index_reader_test.cpp	Added mocks to simulate CLucene IO failures and tests for both cache-enabled and tokenized queries.
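
As a rough illustration of the testing approach in the second file, a test double can force the failure path without a real index on disk. The sketch below is not taken from the PR's test file and does not use a test framework: `TermReader`, `FailingTermReader`, and `try_count` are hypothetical names, and `std::runtime_error` stands in for the CLucene exception.

```cpp
#include <cassert>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical reader interface (illustrative only, not a Doris class).
class TermReader {
public:
    virtual ~TermReader() = default;
    virtual int count_matches(const std::string& term) = 0;
};

// Test double that fails every lookup, imitating an IO failure in the index library.
class FailingTermReader : public TermReader {
public:
    int count_matches(const std::string&) override {
        throw std::runtime_error("simulated IO failure");
    }
};

// Code under test: reports failure through its return value instead of
// letting the exception escape. `out` is left untouched on failure.
bool try_count(TermReader& reader, const std::string& term, int* out) {
    assert(out != nullptr);
    try {
        *out = reader.count_matches(term);
        return true;
    } catch (const std::exception& e) {
        std::cerr << "lookup failed: " << e.what() << '\n';
        return false;
    }
}
```

A test then passes a `FailingTermReader` to `try_count` and asserts that it returns `false`, which exercises the error path deterministically.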

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@airborne12
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34395 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c1315a940fd038223f1e35b48f8d8c972c35dc72, data reload: false

------ Round 1 ----------------------------------
q1	17642	5221	5038	5038
q2	1937	282	191	191
q3	10346	1414	717	717
q4	10262	1036	563	563
q5	8152	2923	2378	2378
q6	186	161	133	133
q7	908	759	625	625
q8	9332	1359	1206	1206
q9	6786	5111	5136	5111
q10	6930	2390	1975	1975
q11	486	300	289	289
q12	357	365	232	232
q13	17797	3682	3127	3127
q14	221	230	220	220
q15	561	479	492	479
q16	423	432	373	373
q17	617	855	375	375
q18	7742	7136	7131	7131
q19	1459	964	576	576
q20	335	336	224	224
q21	3934	3210	2471	2471
q22	1053	1014	961	961
Total cold run time: 107466 ms
Total hot run time: 34395 ms
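
The columns in these tables are not labeled in the bot comment. The Round 1 totals above are consistent with reading each row as one cold run followed by two hot runs and the better of the two hot runs: summing the first numeric column over all 22 rows gives 107466 ms, and summing the per-query minimum of the two hot runs gives 34395 ms. A small sketch of that arithmetic, with the column interpretation as an assumption and `Row`, `Totals`, and `summarize` as illustrative names:

```cpp
#include <algorithm>
#include <vector>

// One table row in milliseconds. Assumption: the first value is the cold run
// and the next two are hot runs; the table's last column is their minimum.
struct Row {
    long cold;
    long hot1;
    long hot2;
};

struct Totals {
    long cold;
    long hot;
};

// Sums the cold runs and the best (smallest) hot run of each query.
Totals summarize(const std::vector<Row>& rows) {
    Totals t{0, 0};
    for (const Row& r : rows) {
        t.cold += r.cold;
        t.hot += std::min(r.hot1, r.hot2);
    }
    return t;
}
```

For example, the q1 to q3 rows of Round 1 (`{17642, 5221, 5038}`, `{1937, 282, 191}`, `{10346, 1414, 717}`) contribute 29925 ms to the cold total and 5946 ms to the hot total.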

----- Round 2, with runtime_filter_mode=off -----
q1	5144	5062	5058	5058
q2	245	318	227	227
q3	2176	2705	2284	2284
q4	1384	1823	1390	1390
q5	4277	4317	4520	4317
q6	235	179	133	133
q7	2044	1930	1775	1775
q8	2627	2575	2484	2484
q9	7177	7164	7246	7164
q10	3046	3279	2888	2888
q11	590	510	514	510
q12	692	790	610	610
q13	3610	3971	3339	3339
q14	278	316	266	266
q15	540	475	474	474
q16	435	500	443	443
q17	1185	1478	1414	1414
q18	7868	7594	7371	7371
q19	868	859	956	859
q20	1992	2080	1938	1938
q21	5037	4424	4318	4318
q22	1066	1016	985	985
Total cold run time: 52516 ms
Total hot run time: 50247 ms

@doris-robot

TPC-DS: Total hot run time: 185782 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c1315a940fd038223f1e35b48f8d8c972c35dc72, data reload: false

query1	992	398	388	388
query2	6542	1827	1831	1827
query3	6755	222	216	216
query4	26419	23485	23432	23432
query5	4397	651	473	473
query6	304	211	224	211
query7	4621	510	297	297
query8	260	229	211	211
query9	8622	2660	2652	2652
query10	467	328	267	267
query11	15608	15130	15289	15130
query12	167	109	108	108
query13	1654	548	429	429
query14	9219	6116	6062	6062
query15	197	195	177	177
query16	7206	655	471	471
query17	1200	734	580	580
query18	1997	429	318	318
query19	201	189	167	167
query20	128	120	123	120
query21	220	130	110	110
query22	4174	4153	4041	4041
query23	34197	33248	33306	33248
query24	8527	2355	2394	2355
query25	541	450	391	391
query26	1227	275	148	148
query27	2749	504	348	348
query28	4332	2128	2111	2111
query29	777	571	438	438
query30	290	216	188	188
query31	946	846	764	764
query32	68	63	62	62
query33	551	346	340	340
query34	777	856	528	528
query35	797	830	745	745
query36	975	1018	889	889
query37	121	99	73	73
query38	4175	4222	4045	4045
query39	1494	1423	1426	1423
query40	203	117	102	102
query41	63	60	57	57
query42	141	109	116	109
query43	487	502	470	470
query44	1309	809	820	809
query45	176	171	166	166
query46	843	1021	621	621
query47	1755	1793	1700	1700
query48	388	421	323	323
query49	742	468	393	393
query50	641	670	422	422
query51	4224	4128	4032	4032
query52	110	107	95	95
query53	226	248	176	176
query54	571	600	501	501
query55	81	88	80	80
query56	294	305	308	305
query57	1173	1201	1111	1111
query58	265	263	254	254
query59	2636	2620	2595	2595
query60	334	323	314	314
query61	122	120	123	120
query62	813	740	674	674
query63	224	192	187	187
query64	4356	985	649	649
query65	4280	4149	4212	4149
query66	1159	407	306	306
query67	15885	15709	15260	15260
query68	8042	886	537	537
query69	483	301	268	268
query70	1191	1175	1087	1087
query71	467	313	313	313
query72	5666	4765	4796	4765
query73	706	608	358	358
query74	8931	9074	8591	8591
query75	3929	3194	2696	2696
query76	3728	1192	745	745
query77	788	375	288	288
query78	9999	10210	9304	9304
query79	1990	829	585	585
query80	615	509	443	443
query81	469	253	228	228
query82	427	126	100	100
query83	254	246	231	231
query84	245	111	94	94
query85	785	354	307	307
query86	347	309	280	280
query87	4320	4429	4295	4295
query88	3598	2361	2287	2287
query89	388	317	292	292
query90	1936	215	209	209
query91	140	139	112	112
query92	140	61	56	56
query93	1536	946	587	587
query94	675	404	302	302
query95	384	290	290	290
query96	491	573	281	281
query97	2701	2790	2604	2604
query98	247	210	198	198
query99	1481	1452	1286	1286
Total cold run time: 274494 ms
Total hot run time: 185782 ms

@doris-robot

TPC-H: Total hot run time: 33825 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c1315a940fd038223f1e35b48f8d8c972c35dc72, data reload: false

------ Round 1 ----------------------------------
q1	17582	5071	4987	4987
q2	1939	271	167	167
q3	10321	1266	757	757
q4	10214	999	524	524
q5	7532	2297	2316	2297
q6	177	159	127	127
q7	901	739	602	602
q8	9327	1253	1127	1127
q9	6794	5061	5054	5054
q10	6915	2385	1997	1997
q11	493	286	268	268
q12	340	357	215	215
q13	17777	3664	3079	3079
q14	238	228	217	217
q15	579	480	483	480
q16	421	420	379	379
q17	594	853	351	351
q18	7652	7217	7152	7152
q19	1217	962	539	539
q20	342	346	228	228
q21	3726	3088	2341	2341
q22	1063	1030	937	937
Total cold run time: 106144 ms
Total hot run time: 33825 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5267	5135	5049	5049
q2	241	317	222	222
q3	2176	2651	2352	2352
q4	1365	1766	1372	1372
q5	4173	4092	4347	4092
q6	211	172	127	127
q7	2005	1937	1785	1785
q8	2619	2562	2559	2559
q9	7413	7135	7209	7135
q10	3116	3245	2793	2793
q11	589	521	481	481
q12	714	787	647	647
q13	3445	3905	3333	3333
q14	272	299	292	292
q15	526	490	459	459
q16	473	484	459	459
q17	1139	1533	1360	1360
q18	7775	7460	7449	7449
q19	811	804	1050	804
q20	2006	2035	1894	1894
q21	5056	4341	4316	4316
q22	1115	1093	1020	1020
Total cold run time: 52507 ms
Total hot run time: 50000 ms

@doris-robot

TPC-DS: Total hot run time: 193001 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c1315a940fd038223f1e35b48f8d8c972c35dc72, data reload: false

query1	1424	1017	991	991
query2	6447	1842	1851	1842
query3	11141	4723	4585	4585
query4	25954	23821	23414	23414
query5	4928	626	475	475
query6	315	225	218	218
query7	3998	509	302	302
query8	266	230	221	221
query9	8531	2629	2639	2629
query10	500	327	275	275
query11	15801	15067	14903	14903
query12	173	113	110	110
query13	1582	551	427	427
query14	9554	6147	6182	6147
query15	193	193	170	170
query16	7610	638	523	523
query17	1175	771	582	582
query18	2074	413	314	314
query19	215	200	173	173
query20	127	121	122	121
query21	204	141	113	113
query22	4571	4583	4285	4285
query23	34709	33815	33833	33815
query24	8312	2469	2379	2379
query25	518	467	398	398
query26	724	269	153	153
query27	2652	520	345	345
query28	4544	2174	2155	2155
query29	625	580	465	465
query30	280	232	197	197
query31	904	867	817	817
query32	80	73	63	63
query33	575	358	298	298
query34	781	866	524	524
query35	809	850	750	750
query36	961	1017	907	907
query37	125	100	79	79
query38	4297	4352	4198	4198
query39	1511	1472	1462	1462
query40	214	120	119	119
query41	63	56	61	56
query42	120	113	115	113
query43	515	510	478	478
query44	1377	835	844	835
query45	187	178	169	169
query46	848	1051	668	668
query47	1863	1894	1786	1786
query48	397	450	337	337
query49	676	495	402	402
query50	641	708	403	403
query51	4293	4242	4204	4204
query52	112	112	104	104
query53	223	260	184	184
query54	602	591	514	514
query55	90	88	85	85
query56	312	324	297	297
query57	1240	1248	1194	1194
query58	264	263	264	263
query59	2787	2801	2692	2692
query60	341	316	313	313
query61	123	124	124	124
query62	794	723	691	691
query63	241	197	195	195
query64	3032	1086	707	707
query65	4425	4308	4253	4253
query66	785	394	297	297
query67	15860	15608	15361	15361
query68	8914	935	529	529
query69	489	294	266	266
query70	1177	1193	1115	1115
query71	479	320	300	300
query72	5197	4765	4709	4709
query73	714	590	356	356
query74	8961	9182	8987	8987
query75	4105	3187	2682	2682
query76	3676	1176	753	753
query77	792	371	277	277
query78	10227	10343	9302	9302
query79	2116	871	577	577
query80	584	506	450	450
query81	481	278	230	230
query82	498	129	98	98
query83	248	249	237	237
query84	257	116	97	97
query85	858	465	313	313
query86	385	292	276	276
query87	4492	4434	4296	4296
query88	3606	2267	2230	2230
query89	424	316	287	287
query90	1825	206	206	206
query91	144	138	115	115
query92	73	60	60	60
query93	1775	951	574	574
query94	633	405	319	319
query95	365	285	291	285
query96	489	571	277	277
query97	2684	2778	2661	2661
query98	234	288	205	205
query99	1347	1394	1263	1263
Total cold run time: 279864 ms
Total hot run time: 193001 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 86.84% (33/38) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 56.34% (15051/26713)
Line Coverage 45.11% (134612/298376)
Region Coverage 44.26% (67706/152983)
Branch Coverage 38.84% (34737/89440)

@doris-robot

TPC-H: Total hot run time: 33662 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c1315a940fd038223f1e35b48f8d8c972c35dc72, data reload: false

------ Round 1 ----------------------------------
q1	17575	5091	4933	4933
q2	1923	277	158	158
q3	10416	1245	760	760
q4	10220	1003	528	528
q5	7488	2334	2315	2315
q6	178	161	129	129
q7	894	740	592	592
q8	9327	1280	1067	1067
q9	6768	5080	5063	5063
q10	6894	2364	1963	1963
q11	503	292	269	269
q12	340	360	207	207
q13	17791	3688	3070	3070
q14	234	230	215	215
q15	547	482	486	482
q16	414	439	373	373
q17	581	853	373	373
q18	7476	7077	7136	7077
q19	1232	940	551	551
q20	330	332	214	214
q21	3758	2531	2357	2357
q22	1049	996	966	966
Total cold run time: 105938 ms
Total hot run time: 33662 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5024	5032	5004	5004
q2	235	321	223	223
q3	2169	2612	2321	2321
q4	1353	1818	1326	1326
q5	4197	4083	4165	4083
q6	205	172	130	130
q7	2013	1958	1790	1790
q8	2592	2624	2507	2507
q9	7390	7296	7298	7296
q10	3144	3323	2792	2792
q11	560	508	497	497
q12	721	814	646	646
q13	3613	4083	3438	3438
q14	294	308	282	282
q15	537	479	478	478
q16	442	505	448	448
q17	1177	1562	1409	1409
q18	7921	7716	7596	7596
q19	799	939	1170	939
q20	1990	2037	1855	1855
q21	4937	4326	4488	4326
q22	1051	1028	1028	1028
Total cold run time: 52364 ms
Total hot run time: 50414 ms

@doris-robot

TPC-DS: Total hot run time: 191581 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c1315a940fd038223f1e35b48f8d8c972c35dc72, data reload: false

query1	1388	1042	987	987
query2	6154	1822	1865	1822
query3	11019	4417	4465	4417
query4	55431	24379	22896	22896
query5	5011	476	451	451
query6	359	203	200	200
query7	4927	490	296	296
query8	283	227	207	207
query9	5987	2626	2620	2620
query10	451	318	289	289
query11	15046	14994	14757	14757
query12	156	110	111	110
query13	1082	532	425	425
query14	10242	6232	6199	6199
query15	202	211	176	176
query16	7107	650	541	541
query17	1082	733	597	597
query18	1574	412	322	322
query19	204	204	166	166
query20	134	121	114	114
query21	213	122	101	101
query22	4452	4482	4210	4210
query23	34389	33620	33509	33509
query24	6945	2396	2394	2394
query25	480	482	413	413
query26	697	263	151	151
query27	2286	530	354	354
query28	3025	2150	2154	2150
query29	570	570	435	435
query30	276	228	195	195
query31	892	835	782	782
query32	69	63	61	61
query33	449	348	307	307
query34	765	863	519	519
query35	801	807	779	779
query36	930	1006	896	896
query37	109	101	81	81
query38	4158	4274	4145	4145
query39	1504	1476	1449	1449
query40	216	127	120	120
query41	66	60	57	57
query42	126	107	111	107
query43	507	500	496	496
query44	1350	835	827	827
query45	185	201	165	165
query46	864	1028	653	653
query47	1845	1857	1788	1788
query48	394	433	333	333
query49	671	482	409	409
query50	666	713	404	404
query51	4273	4292	4221	4221
query52	112	107	99	99
query53	229	250	195	195
query54	579	569	515	515
query55	82	83	80	80
query56	306	309	273	273
query57	1214	1267	1199	1199
query58	279	266	254	254
query59	2742	2762	2708	2708
query60	348	346	340	340
query61	147	149	147	147
query62	706	774	681	681
query63	238	193	209	193
query64	1597	1142	795	795
query65	4268	4132	4163	4132
query66	698	397	315	315
query67	15792	15783	15380	15380
query68	7994	881	519	519
query69	540	310	267	267
query70	1239	1109	1047	1047
query71	527	333	300	300
query72	5577	4758	4835	4758
query73	1377	648	348	348
query74	8964	9059	8772	8772
query75	3769	3193	2676	2676
query76	4226	1185	746	746
query77	602	364	285	285
query78	10026	10176	9340	9340
query79	2413	780	588	588
query80	663	498	443	443
query81	491	259	226	226
query82	483	129	97	97
query83	383	244	237	237
query84	292	105	86	86
query85	797	353	326	326
query86	407	304	308	304
query87	4368	4431	4312	4312
query88	3506	2327	2245	2245
query89	403	316	307	307
query90	1894	202	205	202
query91	142	138	111	111
query92	74	59	57	57
query93	1779	934	578	578
query94	733	405	288	288
query95	367	290	286	286
query96	492	567	282	282
query97	2750	2784	2666	2666
query98	224	213	202	202
query99	1428	1415	1271	1271
Total cold run time: 300279 ms
Total hot run time: 191581 ms

@doris-robot

ClickBench: Total hot run time: 29.71 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c1315a940fd038223f1e35b48f8d8c972c35dc72, data reload: false

query1	0.03	0.03	0.03
query2	0.08	0.03	0.04
query3	0.24	0.07	0.07
query4	1.61	0.10	0.11
query5	0.43	0.41	0.42
query6	1.16	0.65	0.66
query7	0.03	0.01	0.02
query8	0.05	0.04	0.04
query9	0.59	0.52	0.51
query10	0.57	0.57	0.57
query11	0.15	0.11	0.12
query12	0.14	0.11	0.12
query13	0.61	0.60	0.61
query14	0.80	0.82	0.83
query15	0.90	0.88	0.88
query16	0.40	0.37	0.39
query17	1.04	1.07	1.09
query18	0.22	0.21	0.21
query19	1.99	1.81	1.85
query20	0.02	0.01	0.01
query21	15.40	0.89	0.52
query22	0.76	1.18	0.67
query23	14.91	1.36	0.60
query24	6.56	2.23	1.07
query25	0.52	0.11	0.09
query26	0.61	0.16	0.14
query27	0.06	0.05	0.05
query28	9.97	0.88	0.44
query29	12.54	3.93	3.26
query30	0.26	0.09	0.06
query31	2.83	0.60	0.38
query32	3.24	0.55	0.46
query33	3.06	3.12	3.12
query34	16.12	5.31	4.78
query35	4.82	4.82	4.83
query36	0.70	0.51	0.49
query37	0.09	0.07	0.07
query38	0.05	0.04	0.04
query39	0.03	0.02	0.03
query40	0.20	0.15	0.15
query41	0.08	0.03	0.02
query42	0.03	0.02	0.02
query43	0.05	0.04	0.03
Total cold run time: 103.95 s
Total hot run time: 29.71 s

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 86.84% (33/38) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 61.01% (16040/26290)
Line Coverage 50.48% (150553/298218)
Region Coverage 47.80% (86022/179957)
Branch Coverage 41.28% (42210/102246)

Contributor

@zzzxl1993 zzzxl1993 left a comment

LGTM

@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@csun5285 csun5285 left a comment

LGTM

Member

@eldenmoon eldenmoon left a comment

LGTM

@airborne12 airborne12 merged commit 21aa3fc into apache:master Jun 18, 2025
27 of 30 checks passed
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jun 18, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@airborne12 airborne12 deleted the fix-index branch June 18, 2025 09:41
airborne12 added a commit to airborne12/apache-doris that referenced this pull request Jul 7, 2025
…d index string reader (apache#51844)

Problem Summary:
This PR adds error handling around CLucene interactions in the string
inverted index reader to prevent core dumps on IO failures and
introduces unit tests to verify that failures are correctly converted
to error statuses.
airborne12 added a commit to airborne12/apache-doris that referenced this pull request Jul 7, 2025
…d index string reader (apache#51844)

Problem Summary:
This PR adds error handling around CLucene interactions in the string
inverted index reader to prevent core dumps on IO failures and
introduces unit tests to verify that failures are correctly converted
to error statuses.
dataroaring pushed a commit that referenced this pull request Jul 8, 2025
airborne12 added a commit that referenced this pull request Jul 8, 2025

Labels

approved (Indicates a PR has been approved by one committer.), cloud, dev/3.0.7-merged, dev/3.1.0-merged, reviewed

9 participants