@sollhui
Contributor

@sollhui sollhui commented Jul 24, 2025

pick #53799

… leader change (apache#53799)

Multi-table load planning fails after the master FE restarts or the leader changes:
```
mysql> show routine load for test_multi_table\G
***************************
Id: 1753247186255
Name: test2
CreateTime: 2025-07-23 13:06:53
PauseTime: NULL
EndTime: NULL
DbName: db
TableName:
IsMultiTable: true
State: RUNNING
DataSourceType: KAFKA
CurrentTaskNum: 1
JobProperties: {"max_batch_rows":"3000000","timezone":"Asia/Shanghai","send_batch_parallelism":"1","load_to_single_tablet":"false","column_separator":";'''","line_delimiter":"\n","delete":"*","current_concurrent_number":"1","partial_columns":"false","merge_type":"APPEND","exec_mem_limit":"2147483648","strict_mode":"false","max_batch_interval":"20","max_batch_size":"209715200","escape":"\u0000","enclose":"\u0000","partitions":"**","columnToColumnExpr":"","whereExpr":"*****'',"desired_concurrent_number":"256","precedingFilter":"*","format":"csv","max_error_number":"0","max_filter_ratio":"1.0","sequence_col":"****}
DataSourceProperties: {"topic":"my-topic","currentKafkaPartitions":"0","brokerList":"10.16.10.10.10.77:19092"}
CustomProperties: {"kafka_default_offsets":"OFFSET_BEGINNING","group.id": "test2_7f6143d8-f270-4667-851a-e8fb87c27d32"}
Statistic: {"receivedBytes":89,"runningTxns":[1542060502549504],"errorRows":0,"committedTaskNum":0,"loadedRows":1,"loadRowsRate":0,"abortedTaskNum":7,"errorRowsAfterResumed":0,"totalRows":1,"unselectedRows":0,"receivedBytesRate":1,"taskExecuteTimeMs":51588}
Progress: {"0":"0"}
Lag: {"0":1}
ReasonOfStateChanged:
ErrorLogUrls:
OtherMsg: 2025-07-23 13:08:07: [INTERNAL_ERROR]TStatus:AnalysisException: errCode = 2, detailMessage = , connect context's user is null, ComputeGroupException: CURRENT_USER_NO_AUTH_TO_USE_DEFAULT_COMPUTE_GROUP, you can contact the system administrator and request that they grant you the default compute group permissions, use SQL `SHOW PROPERTY like 'default_compute_group'` and `GRANT USAGE_PRIV ON COMPUTE GROUP {compute_group_name} TO {user}`
0# doris::Status doris::Status::create<true>(doris::TStatus const&) at /mnt/disk1/laihui/build/ldb_toolchain/bin/../lib/gcc/x86_64-pc-linux-gnu/114/include/g++-v14/bits/basic_string.h:228
1# doris::io::MultiTablePipe::request_and_exec_plans() at /mnt/disk1/laihui/doris/be/src/common/status.h:522
2# doris::RoutineLoadTaskExecutor::exec_task(std::shared_ptr<doris::StreamLoadContext>, doris::DataConsumerPool*, std::function<void (std::shared_ptr<doris::StreamLoadContext>)>) at /mnt/disk1/laihui/doris/be/src/runtime/routine_load/routine_load_task_executor.cpp:0
3# std::_Function_handler<void (), ... (reason is truncated, check fe.log with txnId for details(1
User: root
Comment:
```

None

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 -->

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
@sollhui sollhui requested a review from morrySnow as a code owner July 24, 2025 06:16
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@sollhui
Contributor Author

sollhui commented Jul 24, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 32039 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a4ef25e6539ccaa51f65360385816208c9917fbb, data reload: false

------ Round 1 ----------------------------------
q1	17587	5491	5365	5365
q2	2043	290	175	175
q3	10449	1220	735	735
q4	10202	855	433	433
q5	7636	2322	2079	2079
q6	179	161	131	131
q7	884	748	610	610
q8	9308	1378	1117	1117
q9	5178	4845	4912	4845
q10	6756	2235	1835	1835
q11	464	268	246	246
q12	328	351	205	205
q13	17783	3576	3041	3041
q14	216	214	204	204
q15	506	459	477	459
q16	414	419	374	374
q17	569	843	350	350
q18	6744	6405	6220	6220
q19	1197	933	534	534
q20	319	343	207	207
q21	2749	2076	1879	1879
q22	1046	1058	995	995
Total cold run time: 102557 ms
Total hot run time: 32039 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5536	5451	5466	5451
q2	235	330	237	237
q3	2225	2584	2275	2275
q4	1365	1736	1321	1321
q5	4414	4753	4762	4753
q6	159	157	126	126
q7	2025	1935	1783	1783
q8	2604	2788	2649	2649
q9	7214	7089	7099	7089
q10	3049	3261	2768	2768
q11	565	518	489	489
q12	638	742	566	566
q13	3375	3772	3199	3199
q14	281	295	274	274
q15	514	460	466	460
q16	439	471	424	424
q17	1186	1710	1240	1240
q18	7539	7449	7368	7368
q19	771	1333	1206	1206
q20	2000	2009	1871	1871
q21	5259	4786	4616	4616
q22	1109	1049	1027	1027
Total cold run time: 52502 ms
Total hot run time: 51192 ms

@doris-robot

TPC-DS: Total hot run time: 196475 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a4ef25e6539ccaa51f65360385816208c9917fbb, data reload: false

query1	1276	917	906	906
query2	6221	1983	1964	1964
query3	10997	4328	4547	4328
query4	32918	23817	23492	23492
query5	3751	621	465	465
query6	283	203	185	185
query7	3988	484	323	323
query8	298	239	235	235
query9	9475	2605	2604	2604
query10	447	305	252	252
query11	18115	15754	15091	15091
query12	169	103	101	101
query13	1552	524	412	412
query14	10243	7147	6973	6973
query15	230	201	181	181
query16	7913	645	508	508
query17	1541	747	569	569
query18	2104	421	307	307
query19	217	196	160	160
query20	124	121	117	117
query21	204	133	106	106
query22	4517	4537	4484	4484
query23	35241	34049	34387	34049
query24	7182	2684	2687	2684
query25	518	490	429	429
query26	848	280	174	174
query27	2057	472	357	357
query28	5580	2212	2173	2173
query29	692	592	464	464
query30	248	189	163	163
query31	1009	920	820	820
query32	92	60	59	59
query33	506	376	320	320
query34	734	846	506	506
query35	796	811	743	743
query36	1007	1088	943	943
query37	97	92	68	68
query38	4057	4070	4003	4003
query39	1539	1464	1459	1459
query40	207	138	105	105
query41	52	47	45	45
query42	118	104	106	104
query43	517	532	479	479
query44	1376	847	864	847
query45	180	176	173	173
query46	892	1062	667	667
query47	2000	1986	1945	1945
query48	408	433	329	329
query49	737	485	392	392
query50	676	681	432	432
query51	7308	7358	7254	7254
query52	103	104	92	92
query53	237	256	190	190
query54	547	537	482	482
query55	84	79	77	77
query56	257	275	266	266
query57	1290	1260	1212	1212
query58	241	224	219	219
query59	3089	3154	2966	2966
query60	289	271	262	262
query61	109	103	112	103
query62	784	749	676	676
query63	226	196	196	196
query64	3698	1012	626	626
query65	3324	3245	3269	3245
query66	820	411	300	300
query67	16357	15762	15543	15543
query68	8350	820	534	534
query69	480	303	261	261
query70	1121	1151	1092	1092
query71	392	295	263	263
query72	5121	3714	3715	3714
query73	623	734	362	362
query74	10378	9302	9282	9282
query75	3480	3073	2670	2670
query76	3348	1159	753	753
query77	570	354	278	278
query78	10425	10281	9514	9514
query79	3763	898	581	581
query80	787	518	425	425
query81	502	266	215	215
query82	1187	116	94	94
query83	159	156	143	143
query84	247	100	79	79
query85	777	377	294	294
query86	412	306	252	252
query87	4316	4367	4241	4241
query88	5047	2394	2393	2393
query89	413	323	298	298
query90	1770	188	184	184
query91	135	131	104	104
query92	63	53	50	50
query93	3133	874	529	529
query94	746	408	293	293
query95	333	270	264	264
query96	483	606	278	278
query97	3162	3298	3139	3139
query98	241	203	201	201
query99	1351	1400	1291	1291
Total cold run time: 298027 ms
Total hot run time: 196475 ms

@doris-robot

ClickBench: Total hot run time: 28.41 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a4ef25e6539ccaa51f65360385816208c9917fbb, data reload: false

query1	0.03	0.03	0.03
query2	0.06	0.03	0.03
query3	0.23	0.07	0.07
query4	1.62	0.11	0.11
query5	0.52	0.53	0.50
query6	1.14	0.72	0.72
query7	0.02	0.01	0.01
query8	0.04	0.03	0.02
query9	0.56	0.50	0.50
query10	0.55	0.54	0.57
query11	0.14	0.11	0.11
query12	0.13	0.11	0.11
query13	0.61	0.60	0.60
query14	0.77	0.79	0.77
query15	0.83	0.83	0.81
query16	0.38	0.38	0.39
query17	1.05	1.04	1.04
query18	0.24	0.21	0.23
query19	1.86	1.86	1.82
query20	0.01	0.02	0.01
query21	15.41	0.94	0.58
query22	0.76	0.75	0.56
query23	15.29	1.40	0.52
query24	2.93	1.79	0.68
query25	0.20	0.26	0.07
query26	0.31	0.14	0.14
query27	0.06	0.05	0.07
query28	13.62	1.02	0.43
query29	12.63	3.94	3.28
query30	0.24	0.08	0.07
query31	2.83	0.59	0.37
query32	3.22	0.53	0.46
query33	2.98	3.03	3.04
query34	16.89	5.20	4.46
query35	4.59	4.57	4.54
query36	0.63	0.50	0.48
query37	0.09	0.06	0.06
query38	0.04	0.04	0.03
query39	0.03	0.02	0.02
query40	0.15	0.13	0.12
query41	0.08	0.03	0.02
query42	0.03	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 103.83 s
Total hot run time: 28.41 s

@morrySnow morrySnow changed the title from "branch-3.1: [fix](load) fix multi table load plan fail after restart master Fe or leader change (#53799)" to "branch-3.1: [fix](load) fix multi table load plan fail after restart master Fe or leader change #53799" Jul 25, 2025
@morrySnow morrySnow merged commit f604067 into apache:branch-3.1 Jul 25, 2025
24 checks passed