@github-actions
Contributor

@github-actions github-actions bot commented Jul 8, 2025

Cherry-picked from #52887

…52887)

### What problem does this PR solve?

A routine load job cannot transition from RUNNING to NEED_SCHEDULE.
When the partition count increases and the job is rescheduled, the
scheduler throws an exception, so the new partitions are never consumed:
```
2025-07-07 14:35:39,847 WARN (Routine load scheduler|41) [RoutineLoadScheduler.runAfterCatalogReady():59] Failed to process one round of RoutineLoadScheduler
org.apache.doris.common.DdlException: errCode = 2, detailMessage = Could not transform RUNNING to NEED_SCHEDULE
        at org.apache.doris.load.routineload.RoutineLoadJob.checkStateTransform(RoutineLoadJob.java:788) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.load.routineload.RoutineLoadJob.unprotectUpdateState(RoutineLoadJob.java:1366) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.load.routineload.RoutineLoadJob.update(RoutineLoadJob.java:1483) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.load.routineload.RoutineLoadManager.updateRoutineLoadJob(RoutineLoadManager.java:839) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.load.routineload.RoutineLoadScheduler.process(RoutineLoadScheduler.java:65) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.load.routineload.RoutineLoadScheduler.runAfterCatalogReady(RoutineLoadScheduler.java:57) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.common.util.Daemon.run(Daemon.java:116) ~[doris-fe.jar:1.2-SNAPSHOT]
```

This restriction was introduced by #40728 and should be removed.
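
To illustrate the failure mode, here is a minimal sketch of a table-driven state-transition check. It is not the actual `RoutineLoadJob` code: the class `JobStateSketch`, the `ALLOWED` table, and the exact set of permitted transitions are assumptions made for illustration. Only the method name `checkStateTransform`, the two state names, and the error message come from the stack trace above.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the transition table below is an assumption,
// not a copy of the Doris implementation.
final class JobStateSketch {
    enum JobState { NEED_SCHEDULE, RUNNING, PAUSED, STOPPED, CANCELLED }

    // For each source state, the set of target states that are permitted.
    private static final Map<JobState, Set<JobState>> ALLOWED = new EnumMap<>(JobState.class);

    static {
        ALLOWED.put(JobState.NEED_SCHEDULE,
                EnumSet.of(JobState.RUNNING, JobState.PAUSED, JobState.STOPPED, JobState.CANCELLED));
        // RUNNING -> NEED_SCHEDULE has to be permitted so that a running job
        // can be rescheduled when the partition count grows.
        ALLOWED.put(JobState.RUNNING,
                EnumSet.of(JobState.NEED_SCHEDULE, JobState.PAUSED, JobState.STOPPED, JobState.CANCELLED));
        ALLOWED.put(JobState.PAUSED,
                EnumSet.of(JobState.NEED_SCHEDULE, JobState.STOPPED, JobState.CANCELLED));
        // Terminal states: no outgoing transitions.
        ALLOWED.put(JobState.STOPPED, EnumSet.noneOf(JobState.class));
        ALLOWED.put(JobState.CANCELLED, EnumSet.noneOf(JobState.class));
    }

    static boolean canTransform(JobState from, JobState to) {
        return ALLOWED.get(from).contains(to);
    }

    // Throws with the same message shape as the log above when a transition is rejected.
    static void checkStateTransform(JobState from, JobState to) {
        if (!canTransform(from, to)) {
            throw new IllegalStateException("Could not transform " + from + " to " + to);
        }
    }
}
```

With a table like this, a limit that drops `NEED_SCHEDULE` from the `RUNNING` entry would reproduce the exception on every reschedule, which is the behavior described above.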
@github-actions github-actions bot requested a review from dataroaring as a code owner July 8, 2025 01:55
@Thearas
Contributor

Thearas commented Jul 8, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring closed this Jul 8, 2025
@dataroaring dataroaring reopened this Jul 8, 2025
@Thearas
Contributor

Thearas commented Jul 8, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 40218 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 155e06da717602b64a5a7effee94d0884f2ef5bd, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17617	6804	6635	6635
q2	2049	202	195	195
q3	10506	1139	1156	1139
q4	10391	732	741	732
q5	7786	2973	2897	2897
q6	216	134	132	132
q7	989	634	632	632
q8	9363	2054	2059	2054
q9	6659	6413	6455	6413
q10	7029	2281	2313	2281
q11	466	264	289	264
q12	395	219	217	217
q13	17776	2961	2944	2944
q14	243	206	208	206
q15	526	462	471	462
q16	491	376	380	376
q17	990	645	568	568
q18	7421	6607	6586	6586
q19	1438	1043	1101	1043
q20	477	199	206	199
q21	3995	3329	3289	3289
q22	1067	990	954	954
Total cold run time: 107890 ms
Total hot run time: 40218 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	6609	6580	6580	6580
q2	333	225	239	225
q3	2963	2961	2933	2933
q4	2044	1811	1841	1811
q5	5782	5742	5720	5720
q6	214	127	128	127
q7	2262	1795	1849	1795
q8	3357	3535	3571	3535
q9	8809	8931	8883	8883
q10	3542	3543	3511	3511
q11	633	498	507	498
q12	828	561	612	561
q13	9737	3182	3122	3122
q14	307	273	280	273
q15	520	468	464	464
q16	517	450	460	450
q17	1863	1612	1583	1583
q18	8399	7743	7738	7738
q19	1693	1434	1432	1432
q20	2062	1842	1886	1842
q21	5196	4971	4926	4926
q22	1136	1074	999	999
Total cold run time: 68806 ms
Total hot run time: 59008 ms
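
For readers checking the totals: in the tables above, the last column matches the smaller of the two hot runs, the cold total matches the sum of the first numeric column, and the hot total matches the sum of the last column. The sketch below reproduces that arithmetic. The class and method names are hypothetical and are not part of the tpch-tools scripts.

```java
import java.util.List;

// Hypothetical helper; each row is {cold, hotRun1, hotRun2} in milliseconds.
final class BenchTotalsSketch {
    // Total cold run time: sum of the cold column.
    static long totalCold(List<long[]> rows) {
        long sum = 0;
        for (long[] r : rows) {
            sum += r[0];
        }
        return sum;
    }

    // Total hot run time: sum of the per-query minimum of the two hot runs.
    static long totalHot(List<long[]> rows) {
        long sum = 0;
        for (long[] r : rows) {
            sum += Math.min(r[1], r[2]);
        }
        return sum;
    }
}
```

Applied to the first three Round 1 rows (q1 to q3), this gives a cold subtotal of 30172 ms and a hot subtotal of 7969 ms.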

@doris-robot

TPC-DS: Total hot run time: 196578 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 155e06da717602b64a5a7effee94d0884f2ef5bd, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	1297	904	873	873
query2	6326	1885	1829	1829
query3	10822	4313	4485	4313
query4	40579	24026	23867	23867
query5	5289	454	439	439
query6	237	176	187	176
query7	3711	310	319	310
query8	264	246	218	218
query9	6450	2608	2599	2599
query10	464	261	257	257
query11	16082	15227	15084	15084
query12	160	99	104	99
query13	864	429	420	420
query14	9742	6685	7317	6685
query15	235	187	200	187
query16	7645	480	500	480
query17	1542	603	590	590
query18	2056	314	324	314
query19	258	170	166	166
query20	120	118	116	116
query21	216	108	107	107
query22	4702	4444	4408	4408
query23	35008	34208	34456	34208
query24	11287	2941	2911	2911
query25	543	392	405	392
query26	744	170	167	167
query27	2276	355	359	355
query28	6406	2193	2193	2193
query29	700	459	426	426
query30	261	159	172	159
query31	1031	790	807	790
query32	66	59	54	54
query33	609	299	302	299
query34	909	506	515	506
query35	937	737	717	717
query36	1055	960	951	951
query37	108	67	64	64
query38	4082	4003	3962	3962
query39	1527	1451	1448	1448
query40	206	98	99	98
query41	49	50	47	47
query42	115	105	108	105
query43	516	495	493	493
query44	1375	805	808	805
query45	185	171	167	167
query46	1143	718	764	718
query47	1987	1874	1930	1874
query48	502	403	396	396
query49	936	400	405	400
query50	823	422	420	420
query51	7373	7233	7354	7233
query52	107	90	85	85
query53	267	194	199	194
query54	578	478	464	464
query55	82	78	80	78
query56	259	256	263	256
query57	1269	1221	1189	1189
query58	237	214	214	214
query59	3163	2995	3038	2995
query60	299	276	271	271
query61	117	138	122	122
query62	847	704	691	691
query63	225	192	194	192
query64	3278	687	730	687
query65	3357	3283	3210	3210
query66	751	297	310	297
query67	15692	15509	15308	15308
query68	4119	607	592	592
query69	422	278	277	277
query70	1190	1054	1127	1054
query71	320	268	263	263
query72	6500	4263	4013	4013
query73	754	351	348	348
query74	10099	9181	9057	9057
query75	3361	2643	2665	2643
query76	2123	1009	1020	1009
query77	383	272	264	264
query78	10524	9657	9505	9505
query79	1841	600	601	600
query80	1427	430	420	420
query81	550	219	217	217
query82	935	90	89	89
query83	233	143	147	143
query84	239	75	79	75
query85	1375	311	302	302
query86	420	294	300	294
query87	4457	4282	4300	4282
query88	3629	2407	2348	2348
query89	418	291	286	286
query90	2013	186	180	180
query91	185	149	151	149
query92	67	51	52	51
query93	2681	558	550	550
query94	866	299	299	299
query95	357	258	261	258
query96	618	291	286	286
query97	3296	3112	3116	3112
query98	216	205	198	198
query99	1555	1266	1289	1266
Total cold run time: 299521 ms
Total hot run time: 196578 ms

@doris-robot

ClickBench: Total hot run time: 28.98 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 155e06da717602b64a5a7effee94d0884f2ef5bd, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.03	0.03	0.03
query2	0.07	0.02	0.03
query3	0.24	0.07	0.06
query4	1.62	0.11	0.11
query5	0.52	0.59	0.52
query6	1.13	0.72	0.73
query7	0.01	0.01	0.01
query8	0.04	0.03	0.03
query9	0.58	0.50	0.50
query10	0.54	0.55	0.56
query11	0.14	0.11	0.10
query12	0.15	0.11	0.11
query13	0.61	0.60	0.59
query14	0.77	0.78	0.82
query15	0.84	0.82	0.82
query16	0.37	0.38	0.37
query17	1.00	1.02	0.97
query18	0.23	0.22	0.22
query19	1.94	1.85	1.77
query20	0.01	0.01	0.01
query21	15.39	0.58	0.56
query22	2.80	2.21	1.35
query23	16.97	1.01	0.88
query24	3.16	0.36	2.12
query25	0.13	0.10	0.17
query26	0.48	0.13	0.14
query27	0.04	0.05	0.04
query28	10.01	0.50	0.51
query29	12.57	3.21	3.22
query30	0.25	0.07	0.06
query31	2.86	0.38	0.39
query32	3.26	0.47	0.46
query33	2.93	2.96	3.00
query34	17.18	4.49	4.46
query35	4.56	4.52	4.45
query36	0.68	0.48	0.48
query37	0.09	0.07	0.06
query38	0.04	0.03	0.04
query39	0.04	0.03	0.02
query40	0.17	0.13	0.13
query41	0.08	0.02	0.03
query42	0.03	0.02	0.02
query43	0.04	0.03	0.04
Total cold run time: 104.6 s
Total hot run time: 28.98 s

Contributor

@dataroaring dataroaring left a comment

LGTM

@dataroaring dataroaring merged commit 8b62d34 into branch-3.0 Jul 9, 2025
23 of 25 checks passed
@github-actions github-actions bot deleted the auto-pick-52887-branch-3.0 branch July 9, 2025 02:23
5 participants