
@gavinchou
Contributor

In cloud mode, the FE master is not required to commit a load txn, and committing the load txn via the FE master adds overhead (locks and RPC).
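A minimal sketch of the decision described above, assuming a BE-side helper that picks the commit path from the deployment mode. `CommitPath` and `choose_commit_path` are hypothetical names used only for illustration; they are not Doris APIs.

```cpp
#include <cassert>

// Hypothetical illustration of the commit-path choice; not Doris code.
enum class CommitPath {
    kViaFeMaster,      // BE forwards the commit to the FE master (extra RPC, FE-side locks)
    kOnBeCoordinator,  // cloud mode: the BE coordinator commits the txn itself
};

// In cloud mode the FE master is not required to commit a load txn, so the
// BE coordinator commits directly and skips the FE round trip.
CommitPath choose_commit_path(bool cloud_mode) {
    return cloud_mode ? CommitPath::kOnBeCoordinator : CommitPath::kViaFeMaster;
}
```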

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the documentation has been moved to doris-website.
See Doris Document.

@wm1581066 wm1581066 requested a review from liaoxin01 June 13, 2024 06:08
@gavinchou gavinchou requested a review from Jibing-Li June 13, 2024 06:08
@gavinchou
Contributor Author

run buildall

@gavinchou gavinchou changed the title [Fix](streamload) Commit txn for streamload in cloud mode to avoid RPC and lock overhead [Fix](streamload) Commit txn for streamload on BE coordinator in cloud mode to avoid RPC and lock overhead Jun 13, 2024
@gavinchou gavinchou changed the title [Fix](streamload) Commit txn for streamload on BE coordinator in cloud mode to avoid RPC and lock overhead [Opt](streamload) Commit txn on BE coordinator in cloud mode to avoid RPC and lock overhead Jun 13, 2024
Contributor

@github-actions github-actions bot left a comment


clang-tidy made some suggestions

put_result.params.is_mow_table) ||
(put_result.__isset.pipeline_params && put_result.pipeline_params.__isset.is_mow_table &&
put_result.pipeline_params.is_mow_table)) {
return true;
Contributor


warning: redundant boolean literal in conditional return statement [readability-simplify-boolean-expr]

be/src/runtime/stream_load/stream_load_context.cpp:353:

-     if ((put_result.__isset.params && put_result.params.__isset.is_mow_table &&
-          put_result.params.is_mow_table) ||
-         (put_result.__isset.pipeline_params && put_result.pipeline_params.__isset.is_mow_table &&
-          put_result.pipeline_params.is_mow_table)) {
-         return true;
-     }
-     return false;
+     return (put_result.__isset.params && put_result.params.__isset.is_mow_table &&
+             put_result.params.is_mow_table) ||
+            (put_result.__isset.pipeline_params && put_result.pipeline_params.__isset.is_mow_table &&
+             put_result.pipeline_params.is_mow_table);
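The clang-tidy rewrite above is behavior-preserving: an `if` that only returns `true`/`false` can return its condition directly. The sketch below checks this on mock types. `ExecParams` and `PutResult` are hypothetical stand-ins that imitate the `__isset` member Thrift generates for optional fields; the real `put_result` uses two different Thrift types for `params` and `pipeline_params`, which are collapsed into one here for brevity.

```cpp
#include <cassert>

// Hypothetical stand-ins for Thrift-generated structs (optional fields are
// tracked through an `__isset` member, as Thrift's C++ generator does).
struct ExecParams {
    struct Isset { bool is_mow_table = false; } __isset;
    bool is_mow_table = false;
};

struct PutResult {
    struct Isset { bool params = false; bool pipeline_params = false; } __isset;
    ExecParams params;
    ExecParams pipeline_params;
};

// Original form: an if-statement that returns boolean literals.
bool is_mow_table_verbose(const PutResult& r) {
    if ((r.__isset.params && r.params.__isset.is_mow_table && r.params.is_mow_table) ||
        (r.__isset.pipeline_params && r.pipeline_params.__isset.is_mow_table &&
         r.pipeline_params.is_mow_table)) {
        return true;
    }
    return false;
}

// Form suggested by readability-simplify-boolean-expr: return the condition.
bool is_mow_table(const PutResult& r) {
    return (r.__isset.params && r.params.__isset.is_mow_table && r.params.is_mow_table) ||
           (r.__isset.pipeline_params && r.pipeline_params.__isset.is_mow_table &&
            r.pipeline_params.is_mow_table);
}
```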

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.50% (9012/24688)
Line Coverage: 28.08% (73922/263216)
Region Coverage: 27.54% (38384/139354)
Branch Coverage: 24.22% (19549/80698)
Coverage Report: http://coverage.selectdb-in.cc/coverage/3be880fac01cf8b4b86f1801713659f3c4b267ef_3be880fac01cf8b4b86f1801713659f3c4b267ef/report/index.html

@doris-robot

TPC-H: Total hot run time: 39630 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3be880fac01cf8b4b86f1801713659f3c4b267ef, data reload: false

------ Round 1 ----------------------------------
query	cold(ms)	hot1(ms)	hot2(ms)	best(ms)
q1	17626	4297	4305	4297
q2	2020	197	207	197
q3	10436	1147	1113	1113
q4	10198	824	708	708
q5	7573	2653	2630	2630
q6	217	135	132	132
q7	951	611	590	590
q8	9215	2072	2081	2072
q9	8876	6517	6521	6517
q10	8955	3703	3693	3693
q11	443	233	239	233
q12	485	231	224	224
q13	17757	2982	2947	2947
q14	269	216	223	216
q15	507	468	459	459
q16	512	379	368	368
q17	969	669	727	669
q18	7988	7524	7368	7368
q19	7818	1474	1441	1441
q20	659	319	329	319
q21	4959	3109	4003	3109
q22	392	328	338	328
Total cold run time: 118825 ms
Total hot run time: 39630 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold(ms)	hot1(ms)	hot2(ms)	best(ms)
q1	4379	4221	4237	4221
q2	380	279	272	272
q3	3051	3006	2945	2945
q4	1960	1754	1788	1754
q5	5535	5511	5468	5468
q6	232	134	130	130
q7	2259	1853	1761	1761
q8	3284	3432	3449	3432
q9	8710	8757	8784	8757
q10	4086	3773	3701	3701
q11	603	505	487	487
q12	815	623	665	623
q13	15956	3183	3135	3135
q14	325	277	276	276
q15	524	487	481	481
q16	492	438	439	438
q17	1847	1526	1510	1510
q18	8063	7928	7726	7726
q19	1827	1662	1512	1512
q20	2869	1900	1872	1872
q21	10658	4846	4898	4846
q22	621	529	560	529
Total cold run time: 78476 ms
Total hot run time: 55876 ms
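The per-query rows have no header in the bot output. The numbers suggest each row is the cold run, two hot runs, and then the faster hot run: for q1, min(4297, 4305) = 4297, and in Round 1 the last column sums to the reported 39630 ms while the first column sums to the reported 118825 ms. The sketch below reproduces that arithmetic under this reading; `QueryTiming`, `best_hot_ms`, `total_hot_ms` and `total_cold_ms` are illustrative names, not part of the Doris benchmark scripts.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One per-query row: a cold run followed by two hot runs, in milliseconds.
struct QueryTiming {
    std::string name;
    long cold_ms;
    long hot1_ms;
    long hot2_ms;
};

// Last column of each row: the faster of the two hot runs.
long best_hot_ms(const QueryTiming& t) { return std::min(t.hot1_ms, t.hot2_ms); }

// "Total hot run time": sum of the best hot time of every query.
long total_hot_ms(const std::vector<QueryTiming>& rows) {
    long total = 0;
    for (const auto& r : rows) total += best_hot_ms(r);
    return total;
}

// "Total cold run time": sum of the cold-run column.
long total_cold_ms(const std::vector<QueryTiming>& rows) {
    long total = 0;
    for (const auto& r : rows) total += r.cold_ms;
    return total;
}
```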

@doris-robot

TPC-DS: Total hot run time: 173660 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3be880fac01cf8b4b86f1801713659f3c4b267ef, data reload: false

query	cold(ms)	hot1(ms)	hot2(ms)	best(ms)
query1	925	372	370	370
query2	6449	2499	2379	2379
query3	6638	209	200	200
query4	19073	17194	17266	17194
query5	3611	468	459	459
query6	234	151	151	151
query7	4572	300	291	291
query8	333	285	276	276
query9	8532	2470	2452	2452
query10	574	315	285	285
query11	10642	10049	10067	10049
query12	116	84	88	84
query13	1636	361	365	361
query14	8663	7372	7606	7372
query15	236	198	187	187
query16	7767	269	262	262
query17	1800	546	504	504
query18	1953	270	273	270
query19	201	148	146	146
query20	92	82	80	80
query21	213	131	127	127
query22	4275	4160	3875	3875
query23	33651	33680	33716	33680
query24	11140	2858	2892	2858
query25	643	374	366	366
query26	1116	154	155	154
query27	2370	344	324	324
query28	6806	2164	2152	2152
query29	906	654	633	633
query30	237	152	156	152
query31	983	759	764	759
query32	95	55	54	54
query33	780	282	266	266
query34	1089	488	477	477
query35	739	624	632	624
query36	1172	962	978	962
query37	140	74	70	70
query38	2942	2807	2817	2807
query39	909	833	835	833
query40	208	132	125	125
query41	57	55	55	55
query42	104	101	105	101
query43	588	553	545	545
query44	1187	717	752	717
query45	189	166	164	164
query46	1062	746	741	741
query47	1847	1766	1768	1766
query48	373	292	303	292
query49	844	395	395	395
query50	767	403	391	391
query51	6820	6666	6622	6622
query52	99	91	94	91
query53	351	279	294	279
query54	852	434	433	433
query55	70	71	72	71
query56	269	251	252	251
query57	1103	1084	1060	1060
query58	235	247	255	247
query59	3481	3158	3171	3158
query60	284	282	285	282
query61	110	106	103	103
query62	650	438	437	437
query63	324	280	283	280
query64	8831	2212	1729	1729
query65	3159	3120	3094	3094
query66	749	327	322	322
query67	15211	15125	14907	14907
query68	4476	543	549	543
query69	495	396	339	339
query70	1209	1085	1097	1085
query71	407	311	265	265
query72	7464	5705	5162	5162
query73	738	326	331	326
query74	5845	5471	5502	5471
query75	3342	2658	2619	2619
query76	2250	904	939	904
query77	419	291	288	288
query78	10327	9798	9749	9749
query79	2533	513	509	509
query80	1430	455	454	454
query81	600	219	223	219
query82	886	100	98	98
query83	270	172	167	167
query84	251	85	83	83
query85	1116	280	266	266
query86	440	326	293	293
query87	3342	3142	3089	3089
query88	4250	2475	2441	2441
query89	468	373	380	373
query90	1666	197	195	195
query91	135	110	110	110
query92	57	51	51	51
query93	1930	507	507	507
query94	1097	199	197	197
query95	417	319	389	319
query96	594	277	275	275
query97	3243	3151	3021	3021
query98	221	198	202	198
query99	1266	841	865	841
Total cold run time: 267261 ms
Total hot run time: 173660 ms

@doris-robot

ClickBench: Total hot run time: 31.7 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 3be880fac01cf8b4b86f1801713659f3c4b267ef, data reload: false

query	cold(s)	hot1(s)	hot2(s)
query1	0.04	0.04	0.04
query2	0.08	0.04	0.04
query3	0.22	0.04	0.04
query4	1.69	0.07	0.07
query5	0.50	0.49	0.48
query6	1.13	0.72	0.72
query7	0.02	0.01	0.01
query8	0.05	0.04	0.04
query9	0.53	0.49	0.50
query10	0.52	0.56	0.54
query11	0.15	0.11	0.11
query12	0.15	0.12	0.12
query13	0.59	0.58	0.60
query14	0.78	0.78	0.80
query15	0.83	0.82	0.80
query16	0.35	0.35	0.36
query17	1.05	0.99	0.97
query18	0.24	0.24	0.20
query19	1.76	1.77	1.74
query20	0.01	0.02	0.01
query21	15.42	0.66	0.66
query22	3.81	6.44	3.01
query23	18.24	1.42	1.34
query24	2.15	0.21	0.22
query25	0.15	0.09	0.08
query26	0.26	0.17	0.17
query27	0.08	0.09	0.07
query28	13.24	1.02	1.00
query29	12.60	3.30	3.26
query30	0.28	0.06	0.06
query31	2.87	0.38	0.37
query32	3.29	0.46	0.47
query33	2.89	2.86	2.96
query34	17.07	4.46	4.51
query35	4.55	4.50	4.49
query36	0.65	0.47	0.47
query37	0.18	0.16	0.15
query38	0.15	0.15	0.15
query39	0.04	0.03	0.04
query40	0.20	0.15	0.16
query41	0.09	0.04	0.05
query42	0.05	0.05	0.04
query43	0.04	0.04	0.04
Total cold run time: 108.99 s
Total hot run time: 31.7 s

@gavinchou
Contributor Author

gavinchou commented Jun 13, 2024

Refer to #34548:
produceEvent() should be called inside afterCommitTxnResp().

        afterCommitTxnResp(commitTxnResponse);
        // Here, we only wait for the EventProcessor to finish processing the event,
        // but regardless of the success or failure of the result,
        // it does not affect the logic of transaction
        try {
            produceEvent(dbId, tableList);
        } catch (Throwable t) {
            // According to normal logic, no exceptions will be thrown,
            // but in order to avoid bugs affecting the original logic, all exceptions are caught
            LOG.warn("produceEvent failed: ", t);
        }
    }
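A minimal sketch of the pattern requested here, assuming the point is that afterCommitTxnResp() is the hook every commit path reaches (including a commit performed on the BE coordinator), so producing the event inside it keeps any path from skipping it. The C++ below only mirrors the Java shown above; `TxnCommitHooks`, `after_commit_txn_resp` and `CommitTxnResponse` are hypothetical stand-ins, not FE code.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for the commit response type.
struct CommitTxnResponse {};

class TxnCommitHooks {
public:
    explicit TxnCommitHooks(std::function<void()> produce_event)
        : produce_event_(std::move(produce_event)) {}

    // Called once per committed txn, whichever node performed the commit.
    void after_commit_txn_resp(const CommitTxnResponse& /*resp*/) {
        ++handled_responses_;  // stands in for the real post-commit bookkeeping
        // Producing the event is best-effort: any failure is logged and
        // swallowed so it cannot change the outcome of the transaction.
        try {
            produce_event_();
        } catch (const std::exception& e) {
            std::cerr << "produceEvent failed: " << e.what() << '\n';
        } catch (...) {
            std::cerr << "produceEvent failed: unknown error\n";
        }
    }

    int handled_responses() const { return handled_responses_; }

private:
    std::function<void()> produce_event_;
    int handled_responses_ = 0;
};
```

Catching everything mirrors the `catch (Throwable t)` in the Java snippet: the event is a side effect, so it must never turn a successful commit into a reported failure.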

@gavinchou gavinchou force-pushed the gavin-fix-strema-load-commit-txn branch from 5dbd545 to 062b257 Compare June 13, 2024 13:43
@gavinchou
Contributor Author

run buildall

@gavinchou gavinchou force-pushed the gavin-fix-strema-load-commit-txn branch from 062b257 to 22b306e Compare June 13, 2024 13:46
Contributor

@github-actions github-actions bot left a comment


clang-tidy made some suggestions

@gavinchou
Contributor Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

1 similar comment
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Contributor

@liaoxin01 liaoxin01 left a comment


LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jun 13, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.44% (8988/24664)
Line Coverage: 28.02% (73674/262926)
Region Coverage: 27.49% (38265/139193)
Branch Coverage: 24.20% (19510/80636)
Coverage Report: http://coverage.selectdb-in.cc/coverage/22b306e4abeda4b77b5649e0773fb5d04b7c0429_22b306e4abeda4b77b5649e0773fb5d04b7c0429/report/index.html

Contributor

@zhannngchen zhannngchen left a comment


LGTM

@doris-robot

TPC-H: Total hot run time: 39877 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 22b306e4abeda4b77b5649e0773fb5d04b7c0429, data reload: false

------ Round 1 ----------------------------------
query	cold(ms)	hot1(ms)	hot2(ms)	best(ms)
q1	17645	4354	4317	4317
q2	2034	186	185	185
q3	10458	1173	1116	1116
q4	10198	808	801	801
q5	7534	2689	2654	2654
q6	222	133	134	133
q7	962	603	592	592
q8	9220	2074	2099	2074
q9	9052	6497	6518	6497
q10	8992	3774	3710	3710
q11	432	235	235	235
q12	439	230	221	221
q13	18986	2934	2954	2934
q14	265	214	212	212
q15	531	463	472	463
q16	488	386	375	375
q17	969	713	675	675
q18	7978	7407	7320	7320
q19	8522	1473	1458	1458
q20	641	312	321	312
q21	4829	3268	3902	3268
q22	391	352	325	325
Total cold run time: 120788 ms
Total hot run time: 39877 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold(ms)	hot1(ms)	hot2(ms)	best(ms)
q1	4371	4269	4277	4269
q2	363	255	275	255
q3	3272	2905	2874	2874
q4	2046	1788	1694	1694
q5	5462	5434	5554	5434
q6	220	130	133	130
q7	2235	1770	1829	1770
q8	3319	3397	3421	3397
q9	8704	8921	8754	8754
q10	4024	3720	3841	3720
q11	583	503	494	494
q12	786	651	622	622
q13	17209	3135	3188	3135
q14	306	269	291	269
q15	524	471	495	471
q16	479	421	415	415
q17	1813	1510	1545	1510
q18	8035	7974	7837	7837
q19	1853	1675	1508	1508
q20	2180	1823	1851	1823
q21	4948	4859	4843	4843
q22	615	545	558	545
Total cold run time: 73347 ms
Total hot run time: 55769 ms

@doris-robot

TPC-DS: Total hot run time: 173411 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 22b306e4abeda4b77b5649e0773fb5d04b7c0429, data reload: false

query	cold(ms)	hot1(ms)	hot2(ms)	best(ms)
query1	930	376	380	376
query2	6390	2375	2362	2362
query3	6638	212	216	212
query4	18960	17341	17223	17223
query5	3663	462	447	447
query6	242	157	168	157
query7	4585	291	292	291
query8	321	295	297	295
query9	8577	2397	2408	2397
query10	548	338	269	269
query11	10522	9964	10089	9964
query12	118	85	83	83
query13	1630	358	352	352
query14	9493	7669	7580	7580
query15	238	195	190	190
query16	7715	253	261	253
query17	1914	549	502	502
query18	1820	267	269	267
query19	202	174	158	158
query20	94	83	83	83
query21	211	130	127	127
query22	4387	4025	4005	4005
query23	33541	33567	33574	33567
query24	11081	2965	2902	2902
query25	659	361	381	361
query26	768	150	152	150
query27	2344	325	334	325
query28	6475	2109	2106	2106
query29	889	619	617	617
query30	233	153	156	153
query31	974	772	740	740
query32	106	53	53	53
query33	781	272	313	272
query34	1077	498	476	476
query35	771	658	645	645
query36	1108	958	982	958
query37	149	74	79	74
query38	2931	2848	2831	2831
query39	938	865	812	812
query40	219	125	127	125
query41	53	52	51	51
query42	112	97	99	97
query43	589	563	540	540
query44	1224	730	736	730
query45	190	164	162	162
query46	1081	731	719	719
query47	1868	1751	1761	1751
query48	367	314	295	295
query49	836	393	395	393
query50	767	382	384	382
query51	6711	6644	6708	6644
query52	102	92	92	92
query53	353	285	287	285
query54	879	445	430	430
query55	74	76	73	73
query56	267	260	257	257
query57	1122	1035	1038	1035
query58	240	255	250	250
query59	3313	3151	3106	3106
query60	287	258	280	258
query61	110	86	90	86
query62	605	449	434	434
query63	320	291	284	284
query64	8678	2230	1761	1761
query65	3170	3067	3090	3067
query66	747	322	328	322
query67	15474	15017	14742	14742
query68	6159	541	529	529
query69	650	464	423	423
query70	1221	1085	1164	1085
query71	477	273	262	262
query72	7397	5286	5459	5286
query73	809	323	322	322
query74	5981	5586	5462	5462
query75	3896	2624	2617	2617
query76	3854	1028	982	982
query77	645	346	289	289
query78	10405	9940	9748	9748
query79	1709	510	508	508
query80	2029	459	447	447
query81	559	213	218	213
query82	788	103	101	101
query83	290	163	162	162
query84	258	82	84	82
query85	1284	285	261	261
query86	450	304	311	304
query87	3235	3057	3125	3057
query88	3629	2332	2327	2327
query89	473	379	369	369
query90	1740	198	187	187
query91	128	115	100	100
query92	56	51	55	51
query93	1711	506	491	491
query94	1057	183	180	180
query95	397	309	303	303
query96	584	272	261	261
query97	3242	2969	3067	2969
query98	224	204	191	191
query99	1243	836	855	836
Total cold run time: 270361 ms
Total hot run time: 173411 ms

@doris-robot

ClickBench: Total hot run time: 30.74 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 22b306e4abeda4b77b5649e0773fb5d04b7c0429, data reload: false

query	cold(s)	hot1(s)	hot2(s)
query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.23	0.05	0.05
query4	1.69	0.08	0.07
query5	0.52	0.49	0.49
query6	1.12	0.72	0.73
query7	0.02	0.01	0.01
query8	0.06	0.05	0.04
query9	0.56	0.49	0.49
query10	0.55	0.55	0.55
query11	0.16	0.12	0.11
query12	0.15	0.12	0.12
query13	0.59	0.58	0.60
query14	0.79	0.79	0.77
query15	0.82	0.81	0.81
query16	0.36	0.34	0.36
query17	1.05	1.07	1.01
query18	0.21	0.24	0.25
query19	1.89	1.79	1.83
query20	0.01	0.00	0.00
query21	15.40	0.66	0.65
query22	4.56	6.81	2.33
query23	18.35	1.28	1.09
query24	2.12	0.21	0.22
query25	0.16	0.09	0.08
query26	0.26	0.18	0.18
query27	0.08	0.09	0.07
query28	13.29	1.01	0.99
query29	12.63	3.30	3.23
query30	0.25	0.07	0.06
query31	2.87	0.39	0.37
query32	3.26	0.48	0.46
query33	2.89	2.89	2.86
query34	17.26	4.38	4.42
query35	4.54	4.46	4.55
query36	0.65	0.49	0.47
query37	0.17	0.15	0.15
query38	0.15	0.14	0.14
query39	0.04	0.04	0.03
query40	0.17	0.15	0.14
query41	0.09	0.04	0.05
query42	0.06	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 110.19 s
Total hot run time: 30.74 s

@dataroaring dataroaring merged commit 18a9b7b into apache:master Jun 15, 2024
gavinchou added a commit to gavinchou/doris that referenced this pull request Jun 18, 2024
Fix incomplete optimization introduced by apache#36237
[Opt](streamload) Commit txn on BE coordinator in cloud mode to avoid RPC and lock overhead (apache#36237)
gavinchou added a commit to gavinchou/doris that referenced this pull request Jun 18, 2024
Fix incomplete optimization introduced by apache#36237
[Opt](streamload) Commit txn on BE coordinator in cloud mode to avoid RPC and lock overhead (apache#36237)
gavinchou added a commit to gavinchou/doris that referenced this pull request Jun 19, 2024
Fix incomplete optimization introduced by apache#36237
[Opt](streamload) Commit txn on BE coordinator in cloud mode to avoid RPC and lock overhead (apache#36237)
gavinchou added a commit that referenced this pull request Jun 20, 2024
…troduced by #36237 (#36496)

Fix incomplete optimization introduced by #36237
[Opt](streamload) Commit txn on BE coordinator in cloud mode to avoid
RPC and lock overhead (#36237)
iszhangpch pushed a commit to iszhangpch/doris-p that referenced this pull request Jun 21, 2024
…troduced by apache#36237 (apache#36496)

Fix incomplete optimization introduced by apache#36237
[Opt](streamload) Commit txn on BE coordinator in cloud mode to avoid
RPC and lock overhead (apache#36237)
dataroaring pushed a commit that referenced this pull request Jun 21, 2024
… RPC and lock overhead (#36237)

In cloud mode, the FE master is not required to commit a load txn, and
committing the load txn via the FE master adds overhead (locks and RPC).
dataroaring pushed a commit that referenced this pull request Jun 21, 2024
…troduced by #36237 (#36496)

Fix incomplete optimization introduced by #36237
[Opt](streamload) Commit txn on BE coordinator in cloud mode to avoid
RPC and lock overhead (#36237)
dataroaring pushed a commit that referenced this pull request Jul 6, 2024
…7347)

## Proposed changes

Introduced by #36237.

If an error such as `out of range` occurs, the routine load job state should
change from running to paused.

When a transaction is rolled back, the RPC is sent to the meta service directly,
so the job state cannot be changed based on the transaction information.

This PR sends the RPC to FE first to solve this problem.
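A toy model of why the rollback has to pass through FE, assuming FE owns the routine load job state while the meta service only aborts the transaction. All names are hypothetical, and matching on the substring "out of range" is only an assumed stand-in for whatever rule FE actually applies when deciding to pause a job.

```cpp
#include <cassert>
#include <string>

// Hypothetical model; not Doris code.
enum class JobState { kRunning, kPaused };

struct RoutineLoadJob {
    JobState state = JobState::kRunning;
    int aborted_txns = 0;
};

// Meta service path: aborts the txn but knows nothing about the job,
// so the job state is left unchanged.
void abort_in_meta_service(RoutineLoadJob& job) { ++job.aborted_txns; }

// Assumed stand-in for FE's rule that certain data errors pause the job.
bool should_pause(const std::string& reason) {
    return reason.find("out of range") != std::string::npos;
}

// Path after the fix: FE inspects the abort reason first, updates the job
// state, and then forwards the abort to the meta service.
void abort_via_fe(RoutineLoadJob& job, const std::string& reason) {
    if (job.state == JobState::kRunning && should_pause(reason)) {
        job.state = JobState::kPaused;
    }
    abort_in_meta_service(job);
}
```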
dataroaring pushed a commit that referenced this pull request Jul 6, 2024
…7347)

## Proposed changes

Introduced by #36237.

If an error such as `out of range` occurs, the routine load job state should
change from running to paused.

When a transaction is rolled back, the RPC is sent to the meta service directly,
so the job state cannot be changed based on the transaction information.

This PR sends the RPC to FE first to solve this problem.

Labels

approved Indicates a PR has been approved by one committer. dev/3.0.0-merged reviewed


5 participants