
@zhiqiang-hhhh
Contributor

@zhiqiang-hhhh zhiqiang-hhhh commented Aug 12, 2024

  • Problem

We are currently facing an issue where pipeline tasks leak in certain situations. A pipeline task leak is the scenario where a query has already completed, but its associated data structures still persist on the Backend (BE). As a result, some memory or computational resources on the BE may never be released.

  • Fix

In the cancel work thread, the BE will periodically reconcile its queries with the Frontends (FE). Once it detects a query that has completed on the FE but still exists on the BE, it cancels that query to release its resources promptly. To avoid cancelling queries by mistake, we use a conservative strategy: for instance, no query is proactively cancelled if any FE is detected to be in an abnormal state or if there are network errors.
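The reconciliation described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: query ids are plain strings instead of `TUniqueId`, and `FeFetchResult` and `find_leaked_queries` are hypothetical names. A BE query counts as leaked only if every FE answered and none of them lists it; if any FE fails to answer, or no FE is known at all, nothing is cancelled.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in: Doris identifies queries by TUniqueId.
using QueryId = std::string;

// Outcome of asking one FE which queries it still considers running.
struct FeFetchResult {
    bool rpc_ok;                        // false if the RPC failed or timed out
    std::set<QueryId> running_queries;  // only meaningful when rpc_ok is true
};

// Returns the BE-local queries that no FE reports as running.
// Conservative: if no FE is known or any FE did not answer, cancel nothing,
// because an unreachable FE may still be coordinating some of these queries.
std::vector<QueryId> find_leaked_queries(const std::vector<FeFetchResult>& fes,
                                         const std::vector<QueryId>& be_queries) {
    if (fes.empty()) {
        return {};
    }
    std::set<QueryId> running_on_fes;
    for (const auto& fe : fes) {
        if (!fe.rpc_ok) {
            return {};
        }
        running_on_fes.insert(fe.running_queries.begin(), fe.running_queries.end());
    }
    std::vector<QueryId> leaked;
    for (const auto& q : be_queries) {
        if (running_on_fes.count(q) == 0) {
            leaked.push_back(q);  // finished on every FE but still alive on this BE
        }
    }
    return leaked;
}
```

The all-or-nothing early return is what makes the check conservative: a single failed fetch means the BE cannot distinguish "finished" from "unknown", so it waits for the next round.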

@doris-robot
Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@zhiqiang-hhhh zhiqiang-hhhh changed the title from TEMP to [opt](query cancel) cancel query if it has pipeline task leakage Aug 12, 2024
@zhiqiang-hhhh zhiqiang-hhhh force-pushed the opt-cancel branch 2 times, most recently from 5eb9c7c to af8853f August 12, 2024 09:53
@zhiqiang-hhhh
Contributor Author

run buildall

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions


DEFINE_mInt16(topn_agg_limit_multiplier, "2");

DEFINE_mInt64(pipeline_task_leakage_detect_period_sec, "60");
Contributor

secs

Contributor Author

fixed
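As a rough sketch of how the cancel work thread might gate detection on this period config (default "60"), assuming the thread wakes up more often than the period and asks whether a detection round is due. `DetectionSchedule` is a hypothetical helper, not a Doris class.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <optional>

// Hypothetical helper (not a Doris class): the cancel work thread calls due()
// on every wake-up; a leak-detection round runs at most once per period.
class DetectionSchedule {
public:
    explicit DetectionSchedule(int64_t period_secs) : period_(period_secs) {
        assert(period_secs > 0);
    }

    bool due(std::chrono::steady_clock::time_point now) {
        if (last_run_ && now - *last_run_ < period_) {
            return false;  // the previous round was too recent
        }
        last_run_ = now;
        return true;
    }

private:
    std::chrono::seconds period_;
    std::optional<std::chrono::steady_clock::time_point> last_run_;
};
```

Using `steady_clock` keeps the schedule immune to wall-clock adjustments on the BE host.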


for (const auto& query_id : query_ids_and_rpc_succeed.first) {
LOG_INFO("Running query id: {}", print_id(query_id));
result_ref.insert(query_id);
Contributor

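This probably shouldn't be written this way. If fetching from one FE fails, we cannot assume that no queries are running on that FE; in that case, all of its queries should be treated as valid.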

Contributor Author

If the fetch from an FE fails, execution never reaches this point; line 216 returns false directly.

@doris-robot
TPC-H: Total hot run time: 39804 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit af8853fd0f57899a0686956700fa32a3de8ba8d9, data reload: false

------ Round 1 ----------------------------------
q1	17866	4470	4318	4318
q2	2016	211	203	203
q3	11549	1022	1130	1022
q4	10545	755	738	738
q5	7522	2562	2581	2562
q6	221	141	141	141
q7	986	619	593	593
q8	9415	1948	1958	1948
q9	9929	6549	6674	6549
q10	7021	2193	2195	2193
q11	469	239	244	239
q12	410	226	229	226
q13	17757	3003	3009	3003
q14	283	245	235	235
q15	528	480	513	480
q16	505	389	385	385
q17	989	686	640	640
q18	8099	7505	7352	7352
q19	6560	1066	1019	1019
q20	674	354	344	344
q21	5485	4658	4619	4619
q22	1114	1043	995	995
Total cold run time: 119943 ms
Total hot run time: 39804 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4514	4233	4240	4233
q2	386	278	274	274
q3	2937	2714	2803	2714
q4	2012	1715	1783	1715
q5	5525	5491	5399	5399
q6	235	137	138	137
q7	2229	1756	1738	1738
q8	3256	3470	3459	3459
q9	8763	8923	8865	8865
q10	3551	3210	3277	3210
q11	607	483	490	483
q12	835	640	630	630
q13	17183	3205	3173	3173
q14	320	286	300	286
q15	550	477	503	477
q16	494	461	460	460
q17	1821	1506	1525	1506
q18	8209	7881	7794	7794
q19	1805	1600	1730	1600
q20	2950	1852	1852	1852
q21	5457	5215	5266	5215
q22	1157	1046	1073	1046
Total cold run time: 74796 ms
Total hot run time: 56266 ms

@doris-robot
TPC-DS: Total hot run time: 202724 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit af8853fd0f57899a0686956700fa32a3de8ba8d9, data reload: false

query1	1678	1464	1476	1464
query2	6347	1982	1866	1866
query3	14006	10181	323	323
query4	33554	23057	23096	23057
query5	3658	508	507	507
query6	293	169	148	148
query7	3922	288	294	288
query8	243	212	200	200
query9	8333	2360	2341	2341
query10	543	425	476	425
query11	15481	15000	14873	14873
query12	130	105	93	93
query13	1534	369	351	351
query14	9759	7872	7829	7829
query15	283	248	250	248
query16	7518	485	480	480
query17	1428	598	577	577
query18	1405	290	293	290
query19	234	147	150	147
query20	127	108	108	108
query21	204	106	101	101
query22	4543	4327	4280	4280
query23	34216	33656	33728	33656
query24	10969	2557	2497	2497
query25	552	368	399	368
query26	716	150	146	146
query27	2136	274	278	274
query28	6201	1979	1978	1978
query29	713	406	429	406
query30	257	148	145	145
query31	976	756	762	756
query32	84	53	54	53
query33	724	286	285	285
query34	874	458	461	458
query35	953	821	812	812
query36	1094	902	902	902
query37	138	79	80	79
query38	4316	4083	4048	4048
query39	1411	1370	1361	1361
query40	207	118	118	118
query41	47	44	45	44
query42	115	94	97	94
query43	503	482	459	459
query44	1198	732	740	732
query45	244	207	204	204
query46	1106	753	733	733
query47	1873	1773	1774	1773
query48	380	299	306	299
query49	855	437	483	437
query50	822	417	431	417
query51	6830	6703	6691	6691
query52	99	91	88	88
query53	254	179	179	179
query54	935	447	441	441
query55	78	74	79	74
query56	275	248	254	248
query57	1151	1080	1035	1035
query58	232	234	229	229
query59	2960	2636	2649	2636
query60	299	271	271	271
query61	97	96	92	92
query62	813	638	636	636
query63	207	178	184	178
query64	9312	2274	1734	1734
query65	3222	3107	3189	3107
query66	762	330	325	325
query67	15259	14686	14796	14686
query68	4440	538	542	538
query69	465	370	409	370
query70	1177	1114	1141	1114
query71	372	276	273	273
query72	18801	16705	16832	16705
query73	793	327	323	323
query74	9224	8831	8673	8673
query75	3376	2669	2679	2669
query76	2128	1066	996	996
query77	461	316	310	310
query78	10449	8919	9001	8919
query79	2997	517	513	513
query80	2007	499	505	499
query81	609	224	219	219
query82	770	148	141	141
query83	250	155	147	147
query84	275	77	82	77
query85	2017	285	271	271
query86	464	299	292	292
query87	4698	4529	4573	4529
query88	4187	2492	2524	2492
query89	413	303	287	287
query90	1838	202	198	198
query91	122	94	99	94
query92	57	51	50	50
query93	3230	548	534	534
query94	724	308	285	285
query95	361	263	269	263
query96	613	281	275	275
query97	3235	3059	3039	3039
query98	230	197	197	197
query99	1759	1274	1252	1252
Total cold run time: 314230 ms
Total hot run time: 202724 ms

@doris-robot
ClickBench: Total hot run time: 31.33 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit af8853fd0f57899a0686956700fa32a3de8ba8d9, data reload: false

query1	0.04	0.04	0.04
query2	0.08	0.04	0.04
query3	0.22	0.06	0.06
query4	1.66	0.09	0.08
query5	0.51	0.49	0.51
query6	1.13	0.73	0.72
query7	0.02	0.01	0.02
query8	0.06	0.05	0.04
query9	0.55	0.48	0.49
query10	0.55	0.54	0.53
query11	0.15	0.11	0.11
query12	0.15	0.12	0.12
query13	0.62	0.59	0.59
query14	0.76	0.78	0.76
query15	0.85	0.81	0.82
query16	0.36	0.36	0.38
query17	1.02	1.05	0.96
query18	0.23	0.22	0.23
query19	1.88	1.81	1.74
query20	0.01	0.00	0.01
query21	15.40	0.76	0.66
query22	4.12	7.33	2.63
query23	18.30	1.29	1.25
query24	2.12	0.23	0.23
query25	0.16	0.09	0.08
query26	0.30	0.22	0.21
query27	0.45	0.22	0.23
query28	13.24	1.01	1.00
query29	12.65	3.32	3.30
query30	0.24	0.05	0.05
query31	2.89	0.39	0.39
query32	3.28	0.48	0.47
query33	2.88	2.93	2.92
query34	17.08	4.32	4.40
query35	4.41	4.40	4.41
query36	0.65	0.49	0.46
query37	0.19	0.16	0.16
query38	0.16	0.15	0.15
query39	0.04	0.04	0.03
query40	0.16	0.13	0.12
query41	0.09	0.04	0.05
query42	0.06	0.05	0.04
query43	0.04	0.04	0.04
Total cold run time: 109.76 s
Total hot run time: 31.33 s

@zhiqiang-hhhh
Contributor Author

run buildall

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@doris-robot
TPC-H: Total hot run time: 40396 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 109c13032badad4116edebfaa3329711129ef764, data reload: false

------ Round 1 ----------------------------------
q1	18432	4647	4389	4389
q2	2715	177	179	177
q3	11069	1211	1128	1128
q4	11439	747	691	691
q5	8110	2929	2872	2872
q6	229	141	140	140
q7	954	602	598	598
q8	9319	2067	2100	2067
q9	8831	6632	6616	6616
q10	7027	2157	2198	2157
q11	489	243	249	243
q12	395	224	220	220
q13	18905	2964	2973	2964
q14	282	239	233	233
q15	531	484	476	476
q16	490	387	380	380
q17	1015	721	686	686
q18	8166	7525	7456	7456
q19	5909	1003	1041	1003
q20	686	337	354	337
q21	5873	4567	4647	4567
q22	1108	996	997	996
Total cold run time: 121974 ms
Total hot run time: 40396 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4570	4319	4281	4281
q2	369	273	266	266
q3	2870	2581	2645	2581
q4	1899	1604	1630	1604
q5	5383	5408	5399	5399
q6	225	135	135	135
q7	2091	1661	1639	1639
q8	3206	3396	3363	3363
q9	8426	8431	8399	8399
q10	3420	3186	3161	3161
q11	599	490	495	490
q12	792	626	624	624
q13	17443	2990	2987	2987
q14	291	285	285	285
q15	527	482	479	479
q16	480	416	411	411
q17	1784	1509	1469	1469
q18	7705	7436	7415	7415
q19	1670	1598	1495	1495
q20	2013	1845	1771	1771
q21	5308	5017	5111	5017
q22	1121	1035	1013	1013
Total cold run time: 72192 ms
Total hot run time: 54284 ms

@doris-robot
TPC-DS: Total hot run time: 184586 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 109c13032badad4116edebfaa3329711129ef764, data reload: false

query1	919	375	369	369
query2	6467	1951	1921	1921
query3	6643	203	212	203
query4	34043	23329	23200	23200
query5	4262	501	487	487
query6	278	154	159	154
query7	4577	296	285	285
query8	233	195	189	189
query9	8581	2470	2446	2446
query10	438	276	257	257
query11	15620	14904	14989	14904
query12	145	98	98	98
query13	1624	361	361	361
query14	10470	7725	6835	6835
query15	218	169	174	169
query16	7606	511	448	448
query17	1580	563	561	561
query18	1949	288	282	282
query19	203	148	142	142
query20	110	106	105	105
query21	207	100	97	97
query22	4275	3994	3968	3968
query23	33702	33354	33119	33119
query24	11518	2915	2807	2807
query25	673	394	414	394
query26	1381	158	157	157
query27	2873	274	272	272
query28	7462	2052	2048	2048
query29	945	423	418	418
query30	307	144	147	144
query31	961	763	730	730
query32	98	53	57	53
query33	771	291	289	289
query34	952	468	468	468
query35	854	744	728	728
query36	1095	913	949	913
query37	156	84	81	81
query38	3823	3702	3821	3702
query39	1421	1389	1363	1363
query40	274	123	114	114
query41	50	46	49	46
query42	118	96	102	96
query43	503	485	464	464
query44	1201	720	725	720
query45	191	170	167	167
query46	1118	702	710	702
query47	1869	1793	1777	1777
query48	365	309	301	301
query49	1102	419	420	419
query50	809	408	408	408
query51	6844	6727	6639	6639
query52	95	92	88	88
query53	256	180	185	180
query54	925	457	450	450
query55	76	79	76	76
query56	274	252	248	248
query57	1134	1055	1084	1055
query58	249	216	226	216
query59	3041	3053	2666	2666
query60	307	272	273	272
query61	119	115	111	111
query62	856	654	654	654
query63	210	296	178	178
query64	9812	2282	1730	1730
query65	3196	3121	3152	3121
query66	1107	337	340	337
query67	15548	14882	14818	14818
query68	6766	548	555	548
query69	720	388	296	296
query70	1192	1121	1103	1103
query71	525	268	272	268
query72	7910	2257	2048	2048
query73	809	327	325	325
query74	9151	8695	8887	8695
query75	5022	2701	2633	2633
query76	4910	1006	1010	1006
query77	786	303	298	298
query78	9673	8989	8987	8987
query79	8794	525	528	525
query80	1007	519	481	481
query81	588	222	226	222
query82	767	132	131	131
query83	336	143	144	143
query84	270	78	73	73
query85	1367	273	263	263
query86	389	303	288	288
query87	4464	4128	4112	4112
query88	4404	2476	2484	2476
query89	518	285	292	285
query90	1983	196	191	191
query91	120	112	94	94
query92	63	49	48	48
query93	7189	538	535	535
query94	947	291	292	291
query95	365	260	257	257
query96	613	280	276	276
query97	3237	3056	3039	3039
query98	223	200	196	196
query99	1575	1283	1233	1233
Total cold run time: 317772 ms
Total hot run time: 184586 ms

@doris-robot
ClickBench: Total hot run time: 30.89 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 109c13032badad4116edebfaa3329711129ef764, data reload: false

query1	0.05	0.04	0.04
query2	0.08	0.04	0.05
query3	0.22	0.05	0.05
query4	1.67	0.08	0.08
query5	0.50	0.48	0.47
query6	1.14	0.72	0.72
query7	0.02	0.01	0.02
query8	0.04	0.04	0.04
query9	0.54	0.47	0.47
query10	0.53	0.53	0.54
query11	0.16	0.12	0.11
query12	0.15	0.12	0.13
query13	0.61	0.60	0.58
query14	0.78	0.78	0.78
query15	0.84	0.81	0.81
query16	0.37	0.37	0.37
query17	0.95	1.02	0.96
query18	0.23	0.23	0.22
query19	1.76	1.76	1.74
query20	0.01	0.01	0.03
query21	15.40	0.78	0.66
query22	4.44	6.97	2.06
query23	18.27	1.41	1.32
query24	2.08	0.22	0.23
query25	0.15	0.08	0.09
query26	0.30	0.22	0.22
query27	0.45	0.23	0.22
query28	13.34	1.02	1.01
query29	12.64	3.39	3.38
query30	0.24	0.04	0.04
query31	2.88	0.39	0.39
query32	3.26	0.48	0.46
query33	2.91	2.94	2.91
query34	17.14	4.32	4.32
query35	4.43	4.40	4.40
query36	0.65	0.45	0.46
query37	0.19	0.15	0.16
query38	0.15	0.15	0.15
query39	0.05	0.04	0.04
query40	0.15	0.13	0.12
query41	0.09	0.06	0.05
query42	0.06	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 109.96 s
Total hot run time: 30.89 s

@zhiqiang-hhhh
Contributor Author

run buildall

@doris-robot
TPC-H: Total hot run time: 40116 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 109c13032badad4116edebfaa3329711129ef764, data reload: false

------ Round 1 ----------------------------------
q1	18778	4649	4409	4409
q2	3040	190	183	183
q3	11711	1259	1208	1208
q4	10628	704	701	701
q5	8618	2812	2827	2812
q6	226	137	143	137
q7	952	603	589	589
q8	9328	2040	2041	2040
q9	8793	6581	6587	6581
q10	7076	2257	2222	2222
q11	471	245	261	245
q12	394	217	223	217
q13	18796	2977	2985	2977
q14	293	239	232	232
q15	535	478	485	478
q16	523	387	378	378
q17	998	677	731	677
q18	8164	7608	7353	7353
q19	5289	1038	926	926
q20	652	327	337	327
q21	5354	4697	4431	4431
q22	1109	993	1010	993
Total cold run time: 121728 ms
Total hot run time: 40116 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4419	4253	4232	4232
q2	393	270	261	261
q3	2864	2659	2602	2602
q4	1889	1596	1619	1596
q5	5391	5569	5382	5382
q6	229	136	136	136
q7	2069	1685	1679	1679
q8	3209	3344	3376	3344
q9	8417	8411	8400	8400
q10	3404	3158	3156	3156
q11	620	484	500	484
q12	786	614	599	599
q13	16338	2976	3028	2976
q14	324	268	290	268
q15	523	493	475	475
q16	491	424	410	410
q17	1806	1515	1470	1470
q18	7758	7655	7380	7380
q19	1691	1616	1525	1525
q20	2037	1801	1805	1801
q21	5865	5119	5218	5119
q22	1107	1010	1031	1010
Total cold run time: 71630 ms
Total hot run time: 54305 ms

@doris-robot
TPC-DS: Total hot run time: 185556 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 109c13032badad4116edebfaa3329711129ef764, data reload: false

query1	910	390	378	378
query2	6470	2104	2002	2002
query3	6644	208	221	208
query4	30852	23607	23140	23140
query5	4191	488	510	488
query6	281	169	175	169
query7	4612	300	300	300
query8	248	199	203	199
query9	8453	2431	2416	2416
query10	440	277	277	277
query11	16931	15047	15091	15047
query12	145	102	100	100
query13	1634	383	366	366
query14	10347	6951	6877	6877
query15	236	162	165	162
query16	7756	470	464	464
query17	1411	577	542	542
query18	1973	291	281	281
query19	188	139	139	139
query20	110	103	107	103
query21	207	107	99	99
query22	4330	4074	4047	4047
query23	34076	33569	33309	33309
query24	12155	2881	2822	2822
query25	680	372	373	372
query26	1842	163	158	158
query27	2886	268	272	268
query28	7491	2024	2009	2009
query29	1148	403	432	403
query30	294	150	147	147
query31	965	733	767	733
query32	97	54	56	54
query33	747	290	277	277
query34	968	456	466	456
query35	839	702	717	702
query36	1090	922	918	918
query37	275	85	83	83
query38	4021	3871	3839	3839
query39	1424	1387	1398	1387
query40	270	112	113	112
query41	46	44	45	44
query42	119	94	99	94
query43	500	496	465	465
query44	1254	750	730	730
query45	200	164	163	163
query46	1114	741	734	734
query47	1865	1785	1755	1755
query48	372	305	289	289
query49	1196	418	410	410
query50	803	408	392	392
query51	6796	6820	6651	6651
query52	134	91	92	91
query53	254	182	183	182
query54	994	449	438	438
query55	77	75	75	75
query56	263	249	251	249
query57	1156	1075	1093	1075
query58	247	222	225	222
query59	3153	2821	2994	2821
query60	287	273	270	270
query61	97	100	96	96
query62	842	652	637	637
query63	211	178	182	178
query64	10577	2321	1780	1780
query65	3229	3162	3224	3162
query66	1376	343	327	327
query67	15196	14775	14874	14775
query68	4494	545	557	545
query69	417	272	276	272
query70	1090	1108	1011	1011
query71	433	290	275	275
query72	7107	2263	2059	2059
query73	752	331	329	329
query74	9110	8760	8807	8760
query75	3618	2737	2707	2707
query76	2803	1003	1042	1003
query77	496	333	315	315
query78	9813	9050	9915	9050
query79	3096	523	524	523
query80	1941	521	497	497
query81	590	222	226	222
query82	797	142	142	142
query83	283	153	150	150
query84	257	84	78	78
query85	1871	298	282	282
query86	305	281	272	272
query87	4404	4203	4244	4203
query88	3769	2413	2417	2413
query89	386	298	281	281
query90	1785	206	194	194
query91	124	98	99	98
query92	63	49	52	49
query93	2375	555	535	535
query94	697	302	277	277
query95	358	269	259	259
query96	606	281	271	271
query97	3185	3037	3047	3037
query98	234	207	213	207
query99	1682	1299	1282	1282
Total cold run time: 300936 ms
Total hot run time: 185556 ms

@doris-robot
ClickBench: Total hot run time: 31.25 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 109c13032badad4116edebfaa3329711129ef764, data reload: false

query1	0.05	0.04	0.04
query2	0.08	0.04	0.04
query3	0.22	0.05	0.06
query4	1.67	0.08	0.07
query5	0.49	0.50	0.47
query6	1.13	0.73	0.73
query7	0.02	0.02	0.01
query8	0.05	0.05	0.04
query9	0.54	0.49	0.47
query10	0.55	0.54	0.54
query11	0.15	0.11	0.12
query12	0.15	0.12	0.12
query13	0.60	0.60	0.58
query14	0.76	0.79	0.78
query15	0.85	0.83	0.81
query16	0.37	0.37	0.37
query17	0.96	1.06	1.03
query18	0.22	0.22	0.22
query19	1.92	1.70	1.85
query20	0.02	0.01	0.01
query21	15.39	0.75	0.66
query22	4.28	6.28	2.45
query23	18.24	1.32	1.25
query24	2.09	0.22	0.22
query25	0.14	0.08	0.07
query26	0.30	0.21	0.22
query27	0.45	0.23	0.23
query28	13.38	1.02	0.99
query29	12.65	3.35	3.32
query30	0.24	0.05	0.05
query31	2.90	0.40	0.38
query32	3.27	0.50	0.47
query33	2.93	2.96	2.98
query34	16.83	4.34	4.33
query35	4.42	4.42	4.46
query36	0.66	0.46	0.49
query37	0.19	0.15	0.15
query38	0.15	0.15	0.15
query39	0.05	0.04	0.04
query40	0.16	0.13	0.12
query41	0.09	0.05	0.05
query42	0.05	0.04	0.05
query43	0.04	0.05	0.04
Total cold run time: 109.7 s
Total hot run time: 31.25 s

const std::map<TNetworkAddress, FrontendInfo>& running_fes =
ExecEnv::GetInstance()->get_running_frontends();

std::vector<TNetworkAddress> qualified_fes;
Contributor

The return value should not be a set;
it should be map<feuid, set>.
When checking, if a query's FE uid is in this map but the query is not in the corresponding set, then the query is invalid.

Contributor

If a query's FE uid cannot be found in this map, the query should not be processed.
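A minimal sketch of the check proposed in this thread, assuming query ids are strings and each BE-side query records the process uuid of its coordinator FE (`LocalQuery` and `find_leaked_queries_by_fe` are hypothetical names). Only FEs that answered appear in the map, so a query coordinated by an unreachable or unknown FE, including one whose uuid is 0, is skipped rather than cancelled.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

using QueryId = std::string;    // hypothetical stand-in for TUniqueId
using FeProcessUuid = int64_t;  // process uuid of the coordinator FE

struct LocalQuery {
    QueryId id;
    FeProcessUuid fe_uuid;  // 0 when unknown, e.g. no coordinator FE
};

// running_by_fe holds an entry only for FEs that answered successfully.
// A query is leaked when its coordinator FE answered but did not list it.
std::vector<QueryId> find_leaked_queries_by_fe(
        const std::map<FeProcessUuid, std::set<QueryId>>& running_by_fe,
        const std::vector<LocalQuery>& be_queries) {
    std::vector<QueryId> leaked;
    for (const auto& q : be_queries) {
        auto it = running_by_fe.find(q.fe_uuid);
        if (it == running_by_fe.end()) {
            continue;  // no answer from this FE (or uuid unknown): leave the query alone
        }
        if (it->second.count(q.id) == 0) {
            leaked.push_back(q.id);
        }
    }
    return leaked;
}
```

Compared with a single merged set, keying by FE lets one failed fetch affect only that FE's queries instead of blocking the whole round.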

auto future_status = future.wait_for(std::chrono::seconds(3));
if (future_status != std::future_status::ready) {
LOG_WARNING("Fetch running queries from frontend timeout");
continue;
Contributor

Why continue here instead of reporting an error (return false)?
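The alternative suggested here can be sketched as below, assuming each FE's answer arrives through a `std::future` (the snippet above waits with `std::chrono::seconds(3)`). `wait_running_queries` and `collect_running_queries` are hypothetical helpers: a timeout yields `std::nullopt` and aborts the round, instead of skipping the FE and letting its queries look finished.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <optional>
#include <set>
#include <string>
#include <thread>
#include <vector>

using QueryId = std::string;  // hypothetical stand-in for TUniqueId

// Waits for one FE's answer. A timeout yields std::nullopt, i.e. "unknown",
// which the caller must not confuse with "no running queries".
std::optional<std::set<QueryId>> wait_running_queries(std::future<std::set<QueryId>>& future,
                                                      std::chrono::milliseconds timeout) {
    if (future.wait_for(timeout) != std::future_status::ready) {
        return std::nullopt;
    }
    return future.get();
}

// Any missing answer aborts the whole round (the equivalent of return false).
bool collect_running_queries(std::vector<std::future<std::set<QueryId>>>& futures,
                             std::chrono::milliseconds timeout, std::set<QueryId>& out) {
    for (auto& f : futures) {
        auto result = wait_running_queries(f, timeout);
        if (!result) {
            return false;
        }
        out.insert(result->begin(), result->end());
    }
    return true;
}
```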

// 2. the fe is starting, hb has not come yet
// 3. this query does not have coordinator at all (eg. streamload, spark connector)
if (q_ctx->get_fe_process_uuid() == 0) {
white_list_queries.insert(q_ctx->query_id());
Contributor

This isn't needed. Logically, if get running queries returns a map, then any query whose FE uid is not in that map should simply be ignored.

// Typically, this means this query is invalid, eg. we have some bugs in pipeline scheduler which
// makes the query can not be closed normally.
// We need to cancel these query to release resources.
LOG_ERROR(
Contributor

Log the time interval as well, e.g. between the first check and the second check.
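One way to produce such a log, assuming (hypothetically) that a query is only treated as leaked after it is found missing on the FE in two consecutive checks. `LeakTracker` and `LeakReport` are illustrative names: the sketch records when a query was first seen missing and, if it is still missing at the next check, reports it together with the interval between the two checks.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <map>
#include <string>
#include <utility>
#include <vector>

using QueryId = std::string;  // hypothetical stand-in for TUniqueId
using Clock = std::chrono::steady_clock;

struct LeakReport {
    QueryId id;
    std::chrono::seconds interval;  // time between the first and second check
};

// Hypothetical policy: report a query only if it is missing on the FE at two
// consecutive checks, and log how far apart those checks were.
class LeakTracker {
public:
    std::vector<LeakReport> check(const std::vector<QueryId>& missing_now, Clock::time_point now) {
        std::map<QueryId, Clock::time_point> pending;  // missing for the first time
        std::vector<LeakReport> reports;
        for (const auto& id : missing_now) {
            auto it = first_seen_.find(id);
            if (it == first_seen_.end()) {
                pending[id] = now;  // first check: remember it, do not cancel yet
                continue;
            }
            auto interval = std::chrono::duration_cast<std::chrono::seconds>(now - it->second);
            std::printf("query %s missing at first check and at second check, %lld s later\n",
                        id.c_str(), static_cast<long long>(interval.count()));
            reports.push_back({id, interval});
        }
        // Queries that reappeared on the FE, or were just reported, are forgotten.
        first_seen_ = std::move(pending);
        return reports;
    }

private:
    std::map<QueryId, Clock::time_point> first_seen_;
};
```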

@zhiqiang-hhhh
Contributor Author

run buildall

@zhiqiang-hhhh zhiqiang-hhhh marked this pull request as ready for review August 14, 2024 15:52
@github-actions github-actions bot added the approved label ("Indicates a PR has been approved by one committer.") Aug 14, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@zhiqiang-hhhh
Contributor Author

run buildall

@doris-robot
TPC-H: Total hot run time: 38033 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 451f22f581b37d86563a1ff2ae25180d264a824f, data reload: false

------ Round 1 ----------------------------------
q1	18745	4990	4403	4403
q2	3093	184	182	182
q3	11062	1190	1092	1092
q4	10581	694	712	694
q5	7867	2872	2828	2828
q6	229	144	146	144
q7	988	616	602	602
q8	9457	2038	2074	2038
q9	7440	6551	6589	6551
q10	7011	2221	2222	2221
q11	456	244	249	244
q12	416	229	219	219
q13	17756	3005	2996	2996
q14	290	236	234	234
q15	532	494	491	491
q16	509	385	392	385
q17	987	673	675	673
q18	7519	6828	6730	6730
q19	5432	1073	1083	1073
q20	701	329	346	329
q21	4385	2916	2912	2912
q22	1088	1011	992	992
Total cold run time: 116544 ms
Total hot run time: 38033 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4573	4216	4261	4216
q2	363	276	272	272
q3	2887	2624	2632	2624
q4	1953	1649	1632	1632
q5	5390	5415	5396	5396
q6	217	137	134	134
q7	2077	1685	1678	1678
q8	3184	3381	3402	3381
q9	8467	8452	8454	8452
q10	3389	3190	3140	3140
q11	588	493	497	493
q12	809	612	645	612
q13	16370	3021	3047	3021
q14	331	277	284	277
q15	518	470	477	470
q16	467	416	428	416
q17	1796	1496	1469	1469
q18	7808	7674	7631	7631
q19	1658	1618	1476	1476
q20	1996	1801	1841	1801
q21	5431	5116	5215	5116
q22	1126	1029	1024	1024
Total cold run time: 71398 ms
Total hot run time: 54731 ms

@doris-robot
TPC-DS: Total hot run time: 184060 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 451f22f581b37d86563a1ff2ae25180d264a824f, data reload: false

query1	899	383	359	359
query2	6465	1919	1814	1814
query3	6659	203	210	203
query4	33369	23237	23171	23171
query5	4189	485	467	467
query6	261	179	153	153
query7	4597	290	287	287
query8	259	208	201	201
query9	8524	2436	2388	2388
query10	419	263	259	259
query11	17926	14934	14986	14934
query12	152	95	96	95
query13	1636	364	369	364
query14	9132	6245	7017	6245
query15	213	162	166	162
query16	7738	506	387	387
query17	1380	574	546	546
query18	1995	283	284	283
query19	195	145	149	145
query20	116	108	104	104
query21	207	102	104	102
query22	4473	4006	3932	3932
query23	34139	33324	33500	33324
query24	12075	2873	2826	2826
query25	707	394	399	394
query26	1842	155	156	155
query27	2950	269	280	269
query28	7540	2060	2039	2039
query29	1194	414	412	412
query30	304	158	152	152
query31	1043	725	746	725
query32	97	54	58	54
query33	773	292	286	286
query34	934	461	466	461
query35	843	708	731	708
query36	1104	943	926	926
query37	304	82	85	82
query38	3909	3834	3820	3820
query39	1432	1367	1370	1367
query40	278	119	120	119
query41	50	46	48	46
query42	115	95	100	95
query43	496	458	440	440
query44	1221	744	733	733
query45	200	165	164	164
query46	1094	734	724	724
query47	1845	1770	1789	1770
query48	362	292	286	286
query49	1205	432	428	428
query50	810	397	408	397
query51	6792	6675	6645	6645
query52	106	95	94	94
query53	253	187	185	185
query54	1017	452	450	450
query55	77	75	75	75
query56	268	258	257	257
query57	1170	1077	1036	1036
query58	250	239	245	239
query59	2979	2753	2695	2695
query60	302	275	276	275
query61	117	116	115	115
query62	952	653	664	653
query63	208	174	184	174
query64	6176	2217	1780	1780
query65	3335	3110	3135	3110
query66	1376	342	322	322
query67	15622	14832	14853	14832
query68	7085	558	553	553
query69	692	380	312	312
query70	1189	1130	1103	1103
query71	532	273	270	270
query72	7626	2277	2042	2042
query73	823	315	317	315
query74	9211	8739	8724	8724
query75	5209	2717	2687	2687
query76	4891	992	959	959
query77	766	322	351	322
query78	9892	8974	8955	8955
query79	8054	525	532	525
query80	1009	503	499	499
query81	598	218	225	218
query82	796	140	140	140
query83	326	147	146	146
query84	276	74	72	72
query85	1365	297	268	268
query86	404	297	290	290
query87	4440	4327	4135	4135
query88	4877	2297	2300	2297
query89	516	297	286	286
query90	2042	200	188	188
query91	124	97	96	96
query92	61	51	49	49
query93	6354	530	536	530
query94	924	297	300	297
query95	355	254	255	254
query96	611	270	270	270
query97	3192	3049	3048	3048
query98	227	201	210	201
query99	1548	1269	1293	1269
Total cold run time: 316685 ms
Total hot run time: 184060 ms

@doris-robot
ClickBench: Total hot run time: 30.94 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 451f22f581b37d86563a1ff2ae25180d264a824f, data reload: false

query1	0.06	0.04	0.04
query2	0.08	0.04	0.04
query3	0.23	0.05	0.06
query4	1.67	0.08	0.08
query5	0.50	0.50	0.50
query6	1.13	0.73	0.72
query7	0.02	0.02	0.02
query8	0.05	0.04	0.05
query9	0.55	0.49	0.48
query10	0.53	0.54	0.54
query11	0.15	0.12	0.11
query12	0.15	0.12	0.13
query13	0.61	0.60	0.57
query14	0.76	0.78	0.77
query15	0.85	0.82	0.81
query16	0.36	0.36	0.36
query17	0.96	0.97	1.03
query18	0.23	0.21	0.22
query19	1.85	1.75	1.77
query20	0.01	0.01	0.01
query21	15.44	0.74	0.64
query22	4.89	6.60	2.04
query23	18.27	1.36	1.32
query24	2.10	0.24	0.23
query25	0.16	0.08	0.08
query26	0.30	0.22	0.22
query27	0.46	0.22	0.23
query28	13.26	1.01	1.01
query29	12.61	3.32	3.29
query30	0.24	0.05	0.05
query31	2.89	0.41	0.40
query32	3.24	0.49	0.47
query33	2.94	2.97	2.99
query34	16.84	4.42	4.34
query35	4.38	4.43	4.42
query36	0.65	0.46	0.47
query37	0.18	0.16	0.16
query38	0.15	0.15	0.15
query39	0.04	0.04	0.04
query40	0.16	0.12	0.12
query41	0.09	0.05	0.04
query42	0.06	0.06	0.04
query43	0.05	0.04	0.05
Total cold run time: 110.15 s
Total hot run time: 30.94 s

@zhiqiang-hhhh
Contributor Author

run buildall

@doris-robot
TPC-H: Total hot run time: 37700 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit bdf142aeff417bdb8b01240a522576d4d29328ef, data reload: false

------ Round 1 ----------------------------------
q1	18054	5230	4247	4247
q2	2037	180	182	180
q3	11817	1012	1162	1012
q4	10512	727	786	727
q5	7764	2808	2801	2801
q6	225	136	132	132
q7	951	585	597	585
q8	9558	2047	2037	2037
q9	8694	6545	6544	6544
q10	7006	2207	2183	2183
q11	454	240	241	240
q12	390	216	217	216
q13	17757	3004	2974	2974
q14	280	230	232	230
q15	533	484	493	484
q16	503	396	378	378
q17	979	740	693	693
q18	7429	6911	6780	6780
q19	6300	1068	1043	1043
q20	652	333	334	333
q21	3891	2966	2890	2890
q22	1136	1029	991	991
Total cold run time: 116922 ms
Total hot run time: 37700 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4456	4233	4263	4233
q2	366	269	278	269
q3	2859	2634	2620	2620
q4	2045	1684	1693	1684
q5	5593	5724	5668	5668
q6	222	131	130	130
q7	2168	1711	1720	1711
q8	3347	3457	3437	3437
q9	8709	8788	8847	8788
q10	3600	3275	3260	3260
q11	595	503	508	503
q12	808	608	621	608
q13	17025	3104	3180	3104
q14	313	291	280	280
q15	525	494	481	481
q16	516	447	425	425
q17	1868	1520	1517	1517
q18	8209	7945	7918	7918
q19	1755	1505	1587	1505
q20	2128	1883	1881	1881
q21	9608	5343	5532	5343
q22	1123	1064	1030	1030
Total cold run time: 77838 ms
Total hot run time: 56395 ms

@doris-robot
TPC-DS: Total hot run time: 189947 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit bdf142aeff417bdb8b01240a522576d4d29328ef, data reload: false

query1	1262	900	869	869
query2	6469	1980	1936	1936
query3	10732	3897	3624	3624
query4	58511	25822	23253	23253
query5	5472	487	484	484
query6	419	154	161	154
query7	5812	296	293	293
query8	287	200	210	200
query9	9210	2435	2410	2410
query10	488	277	263	263
query11	18207	14953	15182	14953
query12	158	97	102	97
query13	1589	393	371	371
query14	10725	6998	6771	6771
query15	237	172	185	172
query16	7540	505	484	484
query17	1149	600	598	598
query18	1918	294	291	291
query19	293	148	150	148
query20	126	119	126	119
query21	210	108	102	102
query22	4754	4495	4386	4386
query23	34150	33660	33440	33440
query24	5988	2934	2819	2819
query25	546	403	405	403
query26	688	159	161	159
query27	1767	274	272	272
query28	3706	2061	2054	2054
query29	675	417	431	417
query30	221	147	150	147
query31	923	783	751	751
query32	79	57	54	54
query33	455	288	286	286
query34	840	458	452	452
query35	826	721	679	679
query36	1057	923	934	923
query37	133	80	79	79
query38	3962	3847	3883	3847
query39	1448	1415	1434	1415
query40	205	116	116	116
query41	50	46	50	46
query42	122	101	95	95
query43	506	469	473	469
query44	1090	725	731	725
query45	194	163	166	163
query46	1081	735	735	735
query47	1881	1780	1782	1780
query48	370	295	297	295
query49	762	426	435	426
query50	805	397	402	397
query51	6812	6778	6729	6729
query52	102	95	91	91
query53	252	185	181	181
query54	561	452	556	452
query55	76	75	76	75
query56	255	241	240	240
query57	1146	1038	1051	1038
query58	219	227	229	227
query59	2999	2813	2860	2813
query60	291	262	265	262
query61	95	100	91	91
query62	760	661	644	644
query63	206	175	190	175
query64	3187	1714	2758	1714
query65	3195	3156	3115	3115
query66	681	324	324	324
query67	15670	15057	15209	15057
query68	6852	551	539	539
query69	717	381	330	330
query70	1156	1085	1116	1085
query71	534	273	268	268
query72	7663	2286	2027	2027
query73	945	317	322	317
query74	9021	8767	8885	8767
query75	5063	2681	2689	2681
query76	4655	1053	1001	1001
query77	776	305	308	305
query78	9754	9135	9004	9004
query79	6910	519	513	513
query80	1958	479	467	467
query81	591	220	228	220
query82	706	138	133	133
query83	295	148	144	144
query84	266	76	112	76
query85	926	274	274	274
query86	338	289	302	289
query87	4281	4148	4263	4148
query88	4096	2303	2292	2292
query89	425	288	283	283
query90	2310	188	190	188
query91	120	94	93	93
query92	63	50	49	49
query93	5428	533	534	533
query94	1120	290	280	280
query95	358	249	250	249
query96	615	270	270	270
query97	3251	3005	3028	3005
query98	215	208	231	208
query99	1570	1262	1259	1259
Total cold run time: 329583 ms
Total hot run time: 189947 ms

@doris-robot

ClickBench: Total hot run time: 30.25 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit bdf142aeff417bdb8b01240a522576d4d29328ef, data reload: false

query1	0.05	0.04	0.04
query2	0.08	0.04	0.04
query3	0.23	0.06	0.06
query4	1.66	0.08	0.09
query5	0.51	0.47	0.48
query6	1.12	0.74	0.72
query7	0.02	0.01	0.01
query8	0.05	0.04	0.04
query9	0.55	0.49	0.50
query10	0.53	0.54	0.54
query11	0.15	0.12	0.11
query12	0.15	0.11	0.12
query13	0.62	0.60	0.59
query14	0.75	0.80	0.78
query15	0.84	0.81	0.82
query16	0.35	0.36	0.37
query17	0.97	0.98	1.02
query18	0.22	0.22	0.23
query19	1.75	1.76	1.66
query20	0.01	0.01	0.02
query21	15.40	0.74	0.65
query22	4.17	8.18	1.58
query23	18.28	1.33	1.24
query24	2.06	0.24	0.22
query25	0.15	0.07	0.09
query26	0.31	0.21	0.22
query27	0.46	0.23	0.23
query28	13.25	1.02	0.99
query29	12.66	3.33	3.32
query30	0.24	0.05	0.04
query31	2.86	0.39	0.38
query32	3.28	0.49	0.47
query33	2.90	2.95	2.91
query34	16.92	4.38	4.37
query35	4.45	4.37	4.40
query36	0.65	0.48	0.47
query37	0.19	0.16	0.16
query38	0.16	0.14	0.15
query39	0.04	0.04	0.04
query40	0.16	0.14	0.13
query41	0.10	0.05	0.05
query42	0.06	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 109.4 s
Total hot run time: 30.25 s

@zhiqiang-hhhh
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 37966 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit bdf142aeff417bdb8b01240a522576d4d29328ef, data reload: false

------ Round 1 ----------------------------------
q1	18893	6438	4307	4307
q2	2030	179	184	179
q3	11491	1019	1110	1019
q4	10487	710	883	710
q5	7759	2864	2818	2818
q6	229	136	137	136
q7	956	618	616	616
q8	9323	2038	2081	2038
q9	7323	6538	6558	6538
q10	7005	2203	2211	2203
q11	444	245	243	243
q12	387	215	218	215
q13	18787	3002	3071	3002
q14	294	261	243	243
q15	526	489	497	489
q16	511	408	405	405
q17	1028	782	701	701
q18	7499	7012	6810	6810
q19	6543	1041	1013	1013
q20	696	322	332	322
q21	3954	2926	2926	2926
q22	1147	1033	1075	1033
Total cold run time: 117312 ms
Total hot run time: 37966 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4533	4266	4268	4266
q2	382	277	281	277
q3	2831	2631	2814	2631
q4	1960	1755	1703	1703
q5	5678	5672	5625	5625
q6	230	138	134	134
q7	2132	1786	1812	1786
q8	3335	3552	3543	3543
q9	8810	8844	8832	8832
q10	3556	3323	3244	3244
q11	603	516	519	516
q12	831	655	658	655
q13	15877	3227	3219	3219
q14	338	330	288	288
q15	541	490	495	490
q16	500	450	440	440
q17	1834	1543	1498	1498
q18	8132	8099	7852	7852
q19	1799	1619	1737	1619
q20	2457	1874	1879	1874
q21	9133	5288	5290	5288
q22	1164	1083	1084	1083
Total cold run time: 76656 ms
Total hot run time: 56863 ms
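
The summary lines in these robot comments appear to be derived from the per-query rows. The cold total matches the sum of the first column, and the hot total matches the sum of the faster of the second and third runs (the value repeated in the last column). As a quick arithmetic check, the sketch below recomputes the Round 1 TPC-H totals from the table above. The column interpretation is an inference from the numbers, and `QueryTiming`, `cold_total` and `hot_total` are illustrative names, not part of the Doris benchmark tools.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One row of the robot output: query name followed by three run times (ms).
// Assumption: run1 is the cold run, run2/run3 are hot runs, and the trailing
// fourth column in the table is min(run2, run3).
struct QueryTiming {
    std::string name;
    long long run1;  // cold run
    long long run2;  // first hot run
    long long run3;  // second hot run
};

// "Total cold run time": sum of the first (cold) column.
long long cold_total(const std::vector<QueryTiming>& rows) {
    long long total = 0;
    for (const auto& r : rows) total += r.run1;
    return total;
}

// "Total hot run time": sum of the best hot run per query.
long long hot_total(const std::vector<QueryTiming>& rows) {
    long long total = 0;
    for (const auto& r : rows) total += std::min(r.run2, r.run3);
    return total;
}

// TPC-H sf100, Round 1 values copied from the table above.
const std::vector<QueryTiming> kTpchRound1 = {
    {"q1", 18893, 6438, 4307},  {"q2", 2030, 179, 184},
    {"q3", 11491, 1019, 1110},  {"q4", 10487, 710, 883},
    {"q5", 7759, 2864, 2818},   {"q6", 229, 136, 137},
    {"q7", 956, 618, 616},      {"q8", 9323, 2038, 2081},
    {"q9", 7323, 6538, 6558},   {"q10", 7005, 2203, 2211},
    {"q11", 444, 245, 243},     {"q12", 387, 215, 218},
    {"q13", 18787, 3002, 3071}, {"q14", 294, 261, 243},
    {"q15", 526, 489, 497},     {"q16", 511, 408, 405},
    {"q17", 1028, 782, 701},    {"q18", 7499, 7012, 6810},
    {"q19", 6543, 1041, 1013},  {"q20", 696, 322, 332},
    {"q21", 3954, 2926, 2926},  {"q22", 1147, 1033, 1075},
};
```

Applied to Round 1, `cold_total` gives 117312 and `hot_total` gives 37966, matching the reported totals in milliseconds.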

@doris-robot

TPC-DS: Total hot run time: 189933 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit bdf142aeff417bdb8b01240a522576d4d29328ef, data reload: false

query1	1259	902	869	869
query2	6364	1884	1893	1884
query3	10753	3951	3968	3951
query4	59510	24064	23122	23122
query5	5619	490	488	488
query6	510	159	160	159
query7	6434	290	297	290
query8	310	203	202	202
query9	8697	2470	2432	2432
query10	500	270	259	259
query11	18103	15115	15126	15115
query12	165	102	99	99
query13	1560	384	365	365
query14	12159	7339	6727	6727
query15	241	174	178	174
query16	7774	428	520	428
query17	1143	594	586	586
query18	2112	305	309	305
query19	302	162	166	162
query20	122	115	110	110
query21	221	107	109	107
query22	4589	4441	4328	4328
query23	34315	33630	33412	33412
query24	5627	2875	2872	2872
query25	547	408	431	408
query26	698	161	162	161
query27	1775	274	275	274
query28	3878	2077	2081	2077
query29	748	419	424	419
query30	245	151	164	151
query31	921	764	746	746
query32	83	53	58	53
query33	500	307	289	289
query34	851	457	480	457
query35	806	727	731	727
query36	1058	936	943	936
query37	139	86	84	84
query38	3966	3824	3863	3824
query39	1432	1374	1381	1374
query40	202	117	117	117
query41	49	46	45	45
query42	119	99	98	98
query43	497	473	463	463
query44	1073	732	743	732
query45	196	171	165	165
query46	1095	745	740	740
query47	1849	1766	1743	1743
query48	369	297	291	291
query49	768	432	457	432
query50	805	410	409	409
query51	6775	6765	6784	6765
query52	98	96	90	90
query53	258	185	185	185
query54	570	451	556	451
query55	75	73	75	73
query56	263	240	244	240
query57	1109	1048	1043	1043
query58	220	236	245	236
query59	2907	2790	2772	2772
query60	297	269	270	269
query61	96	92	93	92
query62	756	648	646	646
query63	214	185	184	184
query64	4111	2242	1724	1724
query65	3255	3138	3156	3138
query66	672	336	332	332
query67	15570	15200	14910	14910
query68	8442	543	586	543
query69	725	384	292	292
query70	1564	1086	1071	1071
query71	505	277	283	277
query72	6856	2259	2007	2007
query73	2738	323	318	318
query74	9126	8647	8707	8647
query75	4776	2722	2726	2722
query76	5054	979	902	902
query77	711	323	303	303
query78	11439	9461	9028	9028
query79	11420	530	524	524
query80	1427	486	496	486
query81	585	224	221	221
query82	523	129	128	128
query83	272	152	146	146
query84	270	74	74	74
query85	717	271	312	271
query86	336	297	259	259
query87	4612	4168	4233	4168
query88	4596	2288	2302	2288
query89	453	285	281	281
query90	2477	188	191	188
query91	123	93	94	93
query92	64	53	48	48
query93	4723	541	523	523
query94	1162	307	293	293
query95	361	251	254	251
query96	595	271	268	268
query97	3217	3038	3083	3038
query98	238	207	193	193
query99	1533	1277	1271	1271
Total cold run time: 341944 ms
Total hot run time: 189933 ms

@doris-robot

ClickBench: Total hot run time: 30.85 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit bdf142aeff417bdb8b01240a522576d4d29328ef, data reload: false

query1	0.05	0.04	0.04
query2	0.09	0.04	0.04
query3	0.23	0.05	0.06
query4	1.66	0.09	0.08
query5	0.51	0.49	0.50
query6	1.14	0.72	0.74
query7	0.02	0.01	0.01
query8	0.05	0.04	0.05
query9	0.55	0.49	0.49
query10	0.55	0.55	0.53
query11	0.16	0.12	0.12
query12	0.14	0.13	0.13
query13	0.62	0.59	0.58
query14	0.76	0.79	0.78
query15	0.85	0.82	0.81
query16	0.36	0.37	0.37
query17	1.07	1.02	1.05
query18	0.23	0.22	0.22
query19	1.95	1.79	1.70
query20	0.01	0.01	0.01
query21	15.40	0.75	0.65
query22	4.33	7.11	1.96
query23	18.24	1.41	1.20
query24	2.15	0.23	0.22
query25	0.16	0.08	0.08
query26	0.30	0.20	0.21
query27	0.46	0.23	0.23
query28	13.21	1.02	1.01
query29	12.64	3.37	3.26
query30	0.25	0.05	0.05
query31	2.91	0.41	0.38
query32	3.25	0.49	0.48
query33	2.95	3.00	2.98
query34	17.22	4.42	4.42
query35	4.46	4.44	4.49
query36	0.66	0.47	0.47
query37	0.19	0.17	0.15
query38	0.16	0.15	0.15
query39	0.04	0.03	0.04
query40	0.16	0.13	0.12
query41	0.09	0.06	0.06
query42	0.06	0.05	0.04
query43	0.04	0.05	0.04
Total cold run time: 110.33 s
Total hot run time: 30.85 s

Contributor

@wangbo wangbo left a comment

LGTM

@yiguolei yiguolei merged commit b0f22f0 into apache:master Aug 17, 2024
dataroaring pushed a commit that referenced this pull request Aug 17, 2024

* Problem

We are currently facing an issue where pipeline tasks experience leaks
in certain situations. The leak in pipeline tasks refers to the scenario
where a query has already been completed, but its associated data
structures still persist on the backend (BE). This could lead to some
memory or computational resources on the BE never being released.

* Fix

We will periodically reconcile queries with the Frontend (FE) in the
cancel work thread. Once we detect that a query has been completed on
the FE but still exists on the Backend (BE), we will cancel the query to
promptly release the resources. To avoid mistakenly triggering
cancellations, we employ a conservative strategy. For instance, we will
not proactively cancel queries if we detect any FE is in an abnormal
state or if there are network conflicts.
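
A minimal sketch of the conservative reconciliation described in the commit message above, under stated assumptions: each BE-side query records the FE that coordinates it, each FE can be asked for the ids of its running queries, and an FE counts as healthy only when that request succeeds and the FE reports a normal state. `LocalQuery`, `FeSnapshot` and `find_leaked_queries` are hypothetical names, not Doris APIs. Such a check would presumably run once per detection period (the PR adds a `pipeline_task_leakage_detect_period_sec` config with a default of 60).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: query ids are strings and each FE is keyed by "host:port".
struct LocalQuery {
    std::string query_id;
    std::string coordinator_fe;  // FE that coordinates this query
};

// What one reconciliation round learned from a single FE.
struct FeSnapshot {
    bool healthy = false;           // RPC succeeded and the FE reported a normal state
    std::set<std::string> running;  // query ids the FE still considers running
};

// Returns the ids of BE-side queries that look leaked and are safe to cancel.
// Assumption: `local` is collected BEFORE the FE snapshots are taken and an FE
// registers a query before dispatching it to BEs, so a query started after a
// snapshot cannot appear in `local` and be mistaken for a leak.
std::vector<std::string> find_leaked_queries(
        const std::vector<LocalQuery>& local,
        const std::map<std::string, FeSnapshot>& fe_snapshots) {
    // Conservative rule: if any FE is abnormal or unreachable, cancel nothing.
    for (const auto& entry : fe_snapshots) {
        if (!entry.second.healthy) {
            return {};
        }
    }
    std::vector<std::string> leaked;
    for (const auto& q : local) {
        auto it = fe_snapshots.find(q.coordinator_fe);
        if (it == fe_snapshots.end()) {
            continue;  // no information about this FE: never guess
        }
        if (it->second.running.count(q.query_id) == 0) {
            leaked.push_back(q.query_id);  // finished on the FE, still alive on the BE
        }
    }
    return leaked;
}
```

Returning an empty list whenever any FE is unhealthy mirrors the "do not cancel if any FE is abnormal" rule. A missed leak is simply retried in the next round, whereas a wrong cancellation would kill a live query, which is why the sketch errs on the side of doing nothing.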
yiguolei pushed a commit that referenced this pull request Aug 19, 2024
… (#39537)

pick #39223 with some modifications. Optimization will only be applied
to pipeline x.
@zhiqiang-hhhh zhiqiang-hhhh deleted the opt-cancel branch September 5, 2024 09:42