
@sollhui
Contributor

sollhui commented Jul 4, 2025

pick (#52654)

### What problem does this PR solve?

A routine load task can block the scheduling queue in the following case:
1. A user created a job as the admin user of clusterA; later, clusterA was deleted and clusterB was renamed to clusterA.
2. The cluster ID saved in the job is now invalid, so no BE can be found for it.
3. The task is repeatedly taken out of the queue and put back because there is no BE to execute it, which causes the other tasks to get stuck.
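
The blocking pattern described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not the Doris scheduler code: the class `RoutineLoadQueueSketch`, the `Task` type, and the `schedule` method are hypothetical names, and the sketch assumes the failed task is re-inserted at the front of the queue (the description does not say where it is re-inserted). The alternative branch, which drops the task and records it as skipped, is one possible mitigation and not necessarily what this PR implements.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of a task queue; none of these names come from Doris.
final class RoutineLoadQueueSketch {

    /** A task whose hasBackend flag is false models a job with a stale cluster ID. */
    static final class Task {
        final String name;
        final boolean hasBackend;

        Task(String name, boolean hasBackend) {
            this.name = name;
            this.hasBackend = hasBackend;
        }
    }

    /**
     * Runs at most maxRounds scheduling rounds and returns the dispatched task names.
     * requeueAtHead = true: a task with no BE is put straight back at the front
     * (assumed behaviour that reproduces the blocking).
     * requeueAtHead = false: the task is removed and recorded in skipped instead.
     */
    static List<String> schedule(List<Task> tasks, int maxRounds,
                                 boolean requeueAtHead, List<String> skipped) {
        Deque<Task> queue = new ArrayDeque<>(tasks);
        List<String> dispatched = new ArrayList<>();
        for (int round = 0; round < maxRounds && !queue.isEmpty(); round++) {
            Task task = queue.pollFirst();
            if (task.hasBackend) {
                dispatched.add(task.name);
            } else if (requeueAtHead) {
                queue.addFirst(task); // the same task is picked again next round
            } else {
                skipped.add(task.name); // stop retrying a task that can never run
            }
        }
        return dispatched;
    }

    public static void main(String[] args) {
        List<Task> tasks = List.of(
                new Task("stale-cluster-task", false),
                new Task("task-b", true),
                new Task("task-c", true));
        // Blocking variant: ten rounds are all spent on the first task.
        System.out.println(schedule(tasks, 10, true, new ArrayList<>()));  // []
        // Skipping variant: the healthy tasks are dispatched.
        System.out.println(schedule(tasks, 10, false, new ArrayList<>())); // [task-b, task-c]
    }
}
```

In the blocking variant every round is consumed by the task that has no BE, so the tasks behind it never run, which matches step 3 of the description.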


Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

… find any BE (apache#52654)

sollhui requested a review from morrySnow as a code owner July 4, 2025 09:17
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@sollhui
Contributor Author

sollhui commented Jul 4, 2025

run buildall

1 similar comment
@morrySnow
Contributor

run buildall

@doris-robot

TPC-H: Total hot run time: 39618 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 543ef30ecd64cbd7489bbf5f60a634a96a7520ec, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17595	6814	6578	6578
q2	2062	190	176	176
q3	10652	1116	1141	1116
q4	10415	754	749	749
q5	7786	2870	2786	2786
q6	217	139	135	135
q7	981	622	613	613
q8	9345	1948	2013	1948
q9	6690	6414	6397	6397
q10	7074	2253	2271	2253
q11	470	263	262	262
q12	404	205	204	204
q13	17778	3021	2988	2988
q14	234	205	210	205
q15	513	463	464	463
q16	505	381	377	377
q17	960	581	576	576
q18	7443	6478	6667	6478
q19	1311	957	967	957
q20	450	205	209	205
q21	3882	3196	3142	3142
q22	1088	1010	1010	1010
Total cold run time: 107855 ms
Total hot run time: 39618 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	6627	6597	6546	6546
q2	327	231	233	231
q3	2906	2878	2871	2871
q4	2099	1837	1858	1837
q5	5749	5783	5799	5783
q6	207	131	132	131
q7	2224	1880	1840	1840
q8	3412	3565	3558	3558
q9	8782	8809	8854	8809
q10	3569	3544	3532	3532
q11	594	492	496	492
q12	786	595	618	595
q13	7379	3149	3190	3149
q14	299	272	274	272
q15	522	460	463	460
q16	477	443	449	443
q17	1831	1638	1569	1569
q18	8222	7919	7705	7705
q19	1678	1567	1562	1562
q20	2106	1810	1808	1808
q21	5220	5038	5032	5032
q22	1174	1088	1043	1043
Total cold run time: 66190 ms
Total hot run time: 59268 ms

@doris-robot

TPC-DS: Total hot run time: 197001 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 543ef30ecd64cbd7489bbf5f60a634a96a7520ec, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1280	916	901	901
query2	6337	1933	1870	1870
query3	10938	4494	4483	4483
query4	33037	23492	23433	23433
query5	3510	466	437	437
query6	264	195	176	176
query7	3989	318	315	315
query8	292	233	246	233
query9	9570	2567	2564	2564
query10	479	268	255	255
query11	17862	15295	15531	15295
query12	164	105	102	102
query13	1577	435	418	418
query14	10018	6803	6596	6596
query15	245	187	201	187
query16	8068	517	518	517
query17	1634	617	614	614
query18	2190	323	322	322
query19	236	165	181	165
query20	133	121	116	116
query21	209	109	112	109
query22	4718	4502	4399	4399
query23	35368	34226	34210	34210
query24	11475	2939	2917	2917
query25	675	435	438	435
query26	1207	174	174	174
query27	2393	375	353	353
query28	7322	2199	2164	2164
query29	886	455	465	455
query30	260	157	176	157
query31	1063	814	810	810
query32	93	56	58	56
query33	771	307	300	300
query34	1000	518	523	518
query35	874	721	732	721
query36	1075	928	939	928
query37	131	63	66	63
query38	4086	3999	3996	3996
query39	1547	1503	1502	1502
query40	207	100	105	100
query41	47	47	48	47
query42	113	97	100	97
query43	519	486	486	486
query44	1243	801	806	801
query45	191	170	175	170
query46	1186	745	751	745
query47	2040	1986	1974	1974
query48	442	328	349	328
query49	981	406	391	391
query50	845	429	419	419
query51	7531	7373	7236	7236
query52	106	95	92	92
query53	264	198	186	186
query54	1344	479	479	479
query55	81	79	79	79
query56	267	253	253	253
query57	1312	1217	1214	1214
query58	232	206	229	206
query59	3286	3068	3000	3000
query60	280	268	262	262
query61	130	109	137	109
query62	896	700	698	698
query63	218	194	192	192
query64	4228	682	639	639
query65	3367	3201	3185	3185
query66	976	291	292	291
query67	15799	15636	15668	15636
query68	4520	575	562	562
query69	439	265	255	255
query70	1125	1045	1128	1045
query71	344	252	269	252
query72	6321	4155	4001	4001
query73	754	349	348	348
query74	10638	9122	9101	9101
query75	3418	2608	2649	2608
query76	2770	1089	1153	1089
query77	405	271	277	271
query78	10603	9705	9450	9450
query79	1080	605	595	595
query80	708	427	429	427
query81	515	221	220	220
query82	595	91	87	87
query83	235	142	147	142
query84	237	82	77	77
query85	1049	306	298	298
query86	326	301	304	301
query87	4415	4307	4252	4252
query88	3411	2397	2377	2377
query89	390	293	300	293
query90	2055	187	189	187
query91	137	108	117	108
query92	62	50	49	49
query93	1054	543	550	543
query94	767	306	286	286
query95	368	262	252	252
query96	614	280	278	278
query97	3291	3156	3293	3156
query98	231	217	212	212
query99	1594	1298	1317	1298
Total cold run time: 299578 ms
Total hot run time: 197001 ms

@doris-robot

ClickBench: Total hot run time: 30.2 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 543ef30ecd64cbd7489bbf5f60a634a96a7520ec, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.04	0.03
query2	0.07	0.04	0.03
query3	0.24	0.07	0.06
query4	1.61	0.11	0.10
query5	0.54	0.53	0.50
query6	1.13	0.72	0.73
query7	0.02	0.02	0.01
query8	0.05	0.04	0.03
query9	0.56	0.49	0.50
query10	0.57	0.56	0.58
query11	0.15	0.10	0.11
query12	0.14	0.11	0.12
query13	0.60	0.60	0.61
query14	0.78	0.80	0.80
query15	0.84	0.82	0.84
query16	0.38	0.39	0.38
query17	1.03	1.06	1.07
query18	0.23	0.22	0.22
query19	1.89	1.76	1.87
query20	0.01	0.02	0.01
query21	15.40	0.59	0.57
query22	2.28	2.67	1.72
query23	17.10	0.97	0.92
query24	2.78	0.87	1.19
query25	0.19	0.06	0.11
query26	0.53	0.15	0.14
query27	0.05	0.05	0.04
query28	10.57	0.47	0.46
query29	12.58	3.26	3.26
query30	0.26	0.07	0.07
query31	2.85	0.40	0.39
query32	3.22	0.47	0.45
query33	2.97	3.01	3.01
query34	17.09	4.56	4.51
query35	4.57	4.63	4.56
query36	0.64	0.48	0.48
query37	0.09	0.06	0.06
query38	0.04	0.03	0.04
query39	0.04	0.02	0.02
query40	0.16	0.12	0.12
query41	0.09	0.03	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 104.45 s
Total hot run time: 30.2 s

morrySnow merged commit b987e64 into apache:branch-3.1 Jul 8, 2025
21 of 22 checks passed

4 participants