@kaijchen commented Oct 22, 2024

backport #42039

…ache#42039)

## Proposed changes
Currently, an upstream BE (sink_v2) opens multiple streams to a
downstream BE (load_stream).
If any of the streams fails, the use_cnt on the downstream BE becomes
incorrect.
The load_stream then does not report any successful tablets to the sink_v2,
because from its point of view there are still unfinished streams.

So fault tolerance when opening streams is not meaningful in practice, and
may cause data loss:
the upstream thinks there are still working streams to transfer data, but
the downstream does not report any commit info.
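
The mismatch described above can be illustrated with a toy model. This is only a sketch, not Doris code: the PR does not show how use_cnt is maintained, so the class `ToyLoadStream`, the function `run_load`, the tablet ids, and the rule "report tablets only when use_cnt reaches zero" are all assumptions made for illustration.

```python
from typing import List


class ToyLoadStream:
    """Hypothetical stand-in for the downstream load_stream (not Doris code)."""

    def __init__(self, expected_streams: int) -> None:
        # Assumption: the downstream waits for this many streams to close.
        self.use_cnt = expected_streams
        self.success_tablets = [101, 102]  # made-up tablet ids

    def close_one_stream(self) -> List[int]:
        """Close one stream; report tablets only after every expected stream closed."""
        self.use_cnt -= 1
        if self.use_cnt > 0:
            # From the downstream's point of view there are unfinished streams.
            return []
        return self.success_tablets


def run_load(total_streams: int, opened_streams: int) -> List[int]:
    """Upstream side that tolerates open failures and closes only what it opened."""
    downstream = ToyLoadStream(expected_streams=total_streams)
    reported: List[int] = []
    for _ in range(opened_streams):
        reported = downstream.close_one_stream()
    return reported
```

In this model, `run_load(3, 3)` returns the tablet list, while `run_load(3, 2)` returns an empty list: the upstream finished all the streams it had, but the downstream is still waiting for a stream that never opened.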

This PR removes fault tolerance when opening multiple streams to the same
backend.
If any of the opens fails, the upstream sink_v2 should mark the
downstream BE as a failed replica.
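
A minimal sketch of the all-or-nothing policy described above, under stated assumptions: `open_streams_all_or_nothing`, the `try_open` callback, and the integer backend ids are hypothetical names introduced for illustration and do not correspond to the actual sink_v2 code.

```python
from typing import Callable, Dict, Set


def open_streams_all_or_nothing(
    backends: Dict[int, int],
    try_open: Callable[[int, int], bool],
) -> Set[int]:
    """Return the ids of backends that must be treated as failed replicas.

    backends maps a hypothetical backend id to the number of streams to open.
    try_open(backend_id, stream_index) is a hypothetical callback returning
    True when that stream opened successfully.
    """
    failed: Set[int] = set()
    for backend_id, stream_count in backends.items():
        for stream_index in range(stream_count):
            if not try_open(backend_id, stream_index):
                # No partial success: one failed open fails the whole backend.
                failed.add(backend_id)
                break
    return failed


def flaky_open(backend_id: int, stream_index: int) -> bool:
    """Example callback: only the second stream to backend 2 fails to open."""
    return not (backend_id == 2 and stream_index == 1)


print(open_streams_all_or_nothing({1: 3, 2: 3}, flaky_open))  # prints {2}
```

Treating the backend as failed on the first unsuccessful open means the upstream never believes it has working streams to a backend that will not report commit info.
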
@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@kaijchen

run buildall

@doris-robot

TPC-H: Total hot run time: 40355 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e2814d6cadaa498d3a181b2d3d67063f5bda9937, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17658	7473	7408	7408
q2	2128	153	143	143
q3	10914	1101	1155	1101
q4	10229	724	736	724
q5	7721	2829	2758	2758
q6	234	153	146	146
q7	969	601	617	601
q8	9343	1880	1951	1880
q9	6515	6355	6347	6347
q10	6996	2292	2242	2242
q11	436	241	254	241
q12	404	213	213	213
q13	17811	3001	2964	2964
q14	235	219	209	209
q15	568	531	534	531
q16	672	601	602	601
q17	955	536	521	521
q18	7318	6611	6517	6517
q19	1834	960	991	960
q20	571	283	271	271
q21	3834	3178	3008	3008
q22	1026	991	969	969
Total cold run time: 108371 ms
Total hot run time: 40355 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	7349	7253	7246	7246
q2	320	224	232	224
q3	2858	2677	2761	2677
q4	1869	1707	1708	1707
q5	5355	5444	5453	5444
q6	231	143	141	141
q7	2102	1645	1654	1645
q8	3198	3380	3411	3380
q9	8501	8472	8555	8472
q10	3427	3370	3392	3370
q11	580	474	479	474
q12	760	555	587	555
q13	16882	2968	2980	2968
q14	285	264	256	256
q15	599	564	518	518
q16	689	657	666	657
q17	1797	1564	1556	1556
q18	7617	7464	7247	7247
q19	1641	1479	1442	1442
q20	1971	1794	1796	1794
q21	5236	5029	5024	5024
q22	1083	978	971	971
Total cold run time: 74350 ms
Total hot run time: 57768 ms

@doris-robot

TPC-DS: Total hot run time: 188471 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e2814d6cadaa498d3a181b2d3d67063f5bda9937, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	997	364	361	361
query2	6521	2099	2009	2009
query3	6706	212	227	212
query4	33996	23300	23642	23300
query5	4367	452	441	441
query6	263	171	166	166
query7	4637	312	301	301
query8	255	213	199	199
query9	9587	2674	2664	2664
query10	478	273	264	264
query11	17969	15048	15156	15048
query12	156	100	104	100
query13	1624	417	394	394
query14	9755	7103	6328	6328
query15	250	170	171	170
query16	8071	481	503	481
query17	1628	564	554	554
query18	2113	302	312	302
query19	256	146	149	146
query20	117	107	107	107
query21	214	105	104	104
query22	4445	4176	4207	4176
query23	34725	33747	34027	33747
query24	11707	2848	2756	2756
query25	696	410	403	403
query26	1772	167	170	167
query27	2838	301	293	293
query28	8022	2486	2439	2439
query29	1037	438	450	438
query30	332	156	160	156
query31	1010	789	818	789
query32	99	58	59	58
query33	760	298	292	292
query34	960	489	500	489
query35	913	750	716	716
query36	1083	929	917	917
query37	186	89	87	87
query38	3953	3933	3811	3811
query39	1465	1425	1400	1400
query40	295	102	98	98
query41	53	50	48	48
query42	116	97	94	94
query43	522	483	486	483
query44	1261	764	791	764
query45	194	166	165	165
query46	1122	721	714	714
query47	1882	1816	1786	1786
query48	469	373	363	363
query49	1257	396	385	385
query50	815	415	397	397
query51	7006	6865	6802	6802
query52	105	96	93	93
query53	259	188	179	179
query54	1204	468	461	461
query55	80	74	78	74
query56	276	250	257	250
query57	1203	1087	1086	1086
query58	276	246	246	246
query59	3075	2814	2818	2814
query60	311	282	279	279
query61	125	124	134	124
query62	837	650	678	650
query63	206	181	182	181
query64	5202	638	617	617
query65	3274	3308	3158	3158
query66	1422	313	293	293
query67	15714	15279	15115	15115
query68	4860	557	548	548
query69	535	285	286	285
query70	1137	1066	1124	1066
query71	452	279	281	279
query72	7372	3943	3816	3816
query73	760	343	362	343
query74	10435	8891	8926	8891
query75	3967	2645	2638	2638
query76	3427	870	930	870
query77	554	276	291	276
query78	9836	9396	9423	9396
query79	3083	578	590	578
query80	2338	441	446	441
query81	558	235	243	235
query82	937	143	146	143
query83	289	134	133	133
query84	300	87	89	87
query85	2195	291	284	284
query86	483	310	303	303
query87	4457	4325	4430	4325
query88	3987	2413	2399	2399
query89	402	280	285	280
query90	1959	184	183	183
query91	173	140	141	140
query92	67	48	49	48
query93	2431	546	529	529
query94	1036	297	287	287
query95	349	251	244	244
query96	612	280	281	280
query97	3299	3167	3184	3167
query98	213	194	197	194
query99	1640	1300	1279	1279
Total cold run time: 308667 ms
Total hot run time: 188471 ms

@liaoxin01 merged commit 262f845 into apache:branch-3.0 on Oct 22, 2024