@mymeiyi
Contributor

@mymeiyi mymeiyi commented Aug 5, 2024

Proposed changes

When replaying a WAL, the replay first aborts the txn with the WAL's label but does not check the result of the abort.
Then, when beginning the replay txn, if FE returns LabelAlreadyUsedException, it assumes the load already succeeded in a previous group commit load or WAL replay, and deletes the WAL directly.
But LabelAlreadyUsedException only means that a txn with this label exists; that txn may be in PREPARE / RUNNING / COMMITTED / VISIBLE status (the abort in the first step may have failed). So WAL replay should treat the load as successful only when FE returns LabelAlreadyUsedException and the txn status is COMMITTED / VISIBLE.

This PR also adds a case for WAL replay with schema change.
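
The decision rule described above can be sketched as follows. This is a minimal illustration in Python, not the code from this PR: `TxnStatus`, `ReplayAction`, and `decide_replay_action` are hypothetical names, and the choice to retry later when the label is taken by an unfinished txn is an assumption (the description only states when the WAL may be treated as already loaded).

```python
from enum import Enum
from typing import Optional


class TxnStatus(Enum):
    # The four statuses named in the PR description.
    PREPARE = "PREPARE"
    RUNNING = "RUNNING"
    COMMITTED = "COMMITTED"
    VISIBLE = "VISIBLE"


class ReplayAction(Enum):
    # Hypothetical outcomes, used only for this illustration.
    CONTINUE_REPLAY = "continue_replay"  # label is free, go on with the replay txn
    DELETE_WAL = "delete_wal"            # data already loaded, WAL is no longer needed
    RETRY_LATER = "retry_later"          # assumption: keep the WAL and try again


def decide_replay_action(label_already_used: bool,
                         txn_status: Optional[TxnStatus]) -> ReplayAction:
    """Decide what to do after trying to begin the replay txn.

    label_already_used: whether FE answered with LabelAlreadyUsedException.
    txn_status: status of the existing txn with that label, if known.
    """
    if not label_already_used:
        return ReplayAction.CONTINUE_REPLAY
    # The label being taken is not enough: the earlier abort may have failed,
    # leaving a PREPARE / RUNNING txn behind. Only a finished txn proves the
    # data was loaded.
    if txn_status in (TxnStatus.COMMITTED, TxnStatus.VISIBLE):
        return ReplayAction.DELETE_WAL
    return ReplayAction.RETRY_LATER
```

Under the old behaviour, every `label_already_used` case would map to `DELETE_WAL`; the sketch differs only for a txn that exists but has not been committed, where deleting the WAL could lose data.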

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@github-actions github-actions bot added the doing label Aug 5, 2024
@mymeiyi
Contributor Author

mymeiyi commented Aug 5, 2024

run buildall

@github-actions
Contributor

github-actions bot commented Aug 5, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 41220 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit aad5da799c956af9fcae0456ad5cd554bfa0399b, data reload: false

------ Round 1 ----------------------------------
q1	17617	4044	4036	4036
q2	2013	199	194	194
q3	10467	1242	1360	1242
q4	10150	860	923	860
q5	7571	2973	2936	2936
q6	221	140	137	137
q7	1056	610	601	601
q8	9423	1791	1922	1791
q9	8494	6589	6570	6570
q10	8743	3812	3816	3812
q11	434	248	251	248
q12	412	230	219	219
q13	17752	2918	2943	2918
q14	268	245	243	243
q15	523	487	489	487
q16	527	407	381	381
q17	959	925	867	867
q18	7935	7180	7258	7180
q19	1819	1186	1203	1186
q20	533	333	328	328
q21	5192	4738	4709	4709
q22	340	276	275	275
Total cold run time: 112449 ms
Total hot run time: 41220 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4077	4034	3991	3991
q2	328	234	219	219
q3	2968	2984	3023	2984
q4	1983	2001	1996	1996
q5	5528	5515	5458	5458
q6	217	140	135	135
q7	2085	1758	1790	1758
q8	3279	3351	3301	3301
q9	8672	8615	8769	8615
q10	3955	3992	3923	3923
q11	539	450	461	450
q12	755	583	586	583
q13	16323	3122	3094	3094
q14	306	298	274	274
q15	543	494	511	494
q16	465	412	417	412
q17	1779	1728	1704	1704
q18	8097	7692	7756	7692
q19	1718	1696	1706	1696
q20	2067	1904	1840	1840
q21	5665	5485	5324	5324
q22	526	474	469	469
Total cold run time: 71875 ms
Total hot run time: 56412 ms

@doris-robot

TPC-DS: Total hot run time: 168565 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit aad5da799c956af9fcae0456ad5cd554bfa0399b, data reload: false

query1	908	378	364	364
query2	6470	1691	1692	1691
query3	6661	212	214	212
query4	20265	17401	17125	17125
query5	3685	504	496	496
query6	271	166	169	166
query7	4594	299	290	290
query8	258	202	192	192
query9	8523	2388	2380	2380
query10	420	298	267	267
query11	10433	9846	9847	9846
query12	127	89	88	88
query13	1629	386	403	386
query14	8345	7557	6670	6670
query15	210	163	168	163
query16	6841	435	441	435
query17	968	563	561	561
query18	1917	299	283	283
query19	200	151	150	150
query20	96	85	86	85
query21	201	101	99	99
query22	4376	3996	3970	3970
query23	33655	33917	33404	33404
query24	9266	3117	3061	3061
query25	692	404	410	404
query26	1550	150	152	150
query27	3041	281	283	281
query28	7755	2051	2034	2034
query29	1131	448	421	421
query30	246	152	160	152
query31	987	786	755	755
query32	101	57	53	53
query33	673	324	338	324
query34	928	502	529	502
query35	856	757	763	757
query36	1051	864	892	864
query37	197	81	85	81
query38	2955	2881	2819	2819
query39	890	822	813	813
query40	284	118	114	114
query41	44	43	43	43
query42	120	100	99	99
query43	459	423	421	421
query44	1177	732	724	724
query45	205	176	178	176
query46	1088	802	781	781
query47	1779	1688	1737	1688
query48	353	292	294	292
query49	1025	408	427	408
query50	880	437	424	424
query51	6786	6710	6787	6710
query52	98	92	94	92
query53	248	194	185	185
query54	640	478	449	449
query55	75	73	75	73
query56	263	244	249	244
query57	1118	1032	1026	1026
query58	261	262	256	256
query59	2484	2344	2593	2344
query60	282	271	269	269
query61	99	98	94	94
query62	892	654	657	654
query63	210	187	184	184
query64	5737	1915	1922	1915
query65	3154	3118	3107	3107
query66	1275	338	328	328
query67	15190	14857	14747	14747
query68	4370	577	589	577
query69	474	301	291	291
query70	1081	1049	1023	1023
query71	412	282	278	278
query72	7645	2678	2472	2472
query73	756	330	335	330
query74	6018	5614	5653	5614
query75	3562	2748	2755	2748
query76	2537	1178	1281	1178
query77	587	304	311	304
query78	9405	8993	8822	8822
query79	2301	531	526	526
query80	1161	515	501	501
query81	552	225	226	225
query82	1082	132	131	131
query83	269	194	171	171
query84	340	78	83	78
query85	1251	326	298	298
query86	428	310	280	280
query87	3253	3119	3097	3097
query88	2921	2441	2400	2400
query89	398	284	295	284
query90	1828	194	191	191
query91	124	98	104	98
query92	66	49	50	49
query93	2036	612	615	612
query94	934	297	290	290
query95	375	268	268	268
query96	592	278	287	278
query97	3219	3069	3043	3043
query98	215	206	194	194
query99	1693	1273	1280	1273
Total cold run time: 261483 ms
Total hot run time: 168565 ms

@doris-robot

ClickBench: Total hot run time: 29.68 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit aad5da799c956af9fcae0456ad5cd554bfa0399b, data reload: false

query1	0.04	0.04	0.04
query2	0.07	0.04	0.03
query3	0.23	0.05	0.04
query4	1.68	0.07	0.06
query5	0.49	0.48	0.47
query6	1.14	0.72	0.72
query7	0.02	0.01	0.01
query8	0.06	0.04	0.05
query9	0.58	0.52	0.52
query10	0.56	0.57	0.56
query11	0.16	0.12	0.11
query12	0.14	0.12	0.12
query13	0.60	0.61	0.60
query14	0.77	0.79	0.79
query15	0.94	0.86	0.86
query16	0.36	0.35	0.34
query17	0.99	0.97	0.95
query18	0.22	0.24	0.20
query19	1.84	1.72	1.74
query20	0.02	0.01	0.01
query21	15.40	0.76	0.69
query22	4.09	8.10	1.15
query23	17.99	1.20	1.22
query24	2.26	0.22	0.22
query25	0.18	0.08	0.08
query26	0.33	0.22	0.21
query27	0.45	0.24	0.23
query28	13.18	0.98	0.96
query29	12.52	3.26	3.25
query30	0.25	0.07	0.06
query31	2.84	0.40	0.40
query32	3.25	0.50	0.49
query33	2.92	2.97	2.96
query34	15.47	4.23	4.25
query35	4.29	4.30	4.31
query36	0.68	0.48	0.47
query37	0.18	0.17	0.15
query38	0.17	0.15	0.15
query39	0.04	0.04	0.04
query40	0.17	0.12	0.14
query41	0.10	0.05	0.06
query42	0.06	0.05	0.04
query43	0.05	0.04	0.04
Total cold run time: 107.78 s
Total hot run time: 29.68 s

@mymeiyi
Contributor Author

mymeiyi commented Aug 5, 2024

run buildall

@github-actions
Contributor

github-actions bot commented Aug 5, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 41510 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2eed50a823641a57d02040502cbc644adfcc0fe5, data reload: false

------ Round 1 ----------------------------------
q1	18236	4286	4161	4161
q2	2637	209	196	196
q3	11820	1327	1361	1327
q4	11016	904	895	895
q5	8633	3017	2940	2940
q6	218	137	138	137
q7	1050	616	626	616
q8	9434	1764	1920	1764
q9	8402	6528	6567	6528
q10	8717	3834	3823	3823
q11	426	250	251	250
q12	406	228	221	221
q13	17755	2930	3009	2930
q14	273	250	243	243
q15	525	479	501	479
q16	484	401	387	387
q17	964	896	878	878
q18	7988	7378	7218	7218
q19	1449	1206	1211	1206
q20	568	312	327	312
q21	5338	4724	4770	4724
q22	345	275	282	275
Total cold run time: 116684 ms
Total hot run time: 41510 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4094	4028	4057	4028
q2	326	217	224	217
q3	2988	3024	2968	2968
q4	1863	1861	1828	1828
q5	5255	5227	5223	5223
q6	213	127	127	127
q7	2085	1687	1673	1673
q8	3166	3245	3222	3222
q9	8285	8258	8203	8203
q10	3774	3834	3836	3834
q11	543	467	441	441
q12	732	590	562	562
q13	13785	2967	2956	2956
q14	294	258	253	253
q15	518	473	475	473
q16	439	401	398	398
q17	1725	1708	1722	1708
q18	7646	7363	7320	7320
q19	1678	1653	1650	1650
q20	1975	1761	1742	1742
q21	5445	5129	5042	5042
q22	534	457	434	434
Total cold run time: 67363 ms
Total hot run time: 54302 ms

@doris-robot

TPC-DS: Total hot run time: 168462 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2eed50a823641a57d02040502cbc644adfcc0fe5, data reload: false

query1	920	380	366	366
query2	6484	1673	1650	1650
query3	6665	214	227	214
query4	19518	17277	17212	17212
query5	4279	511	538	511
query6	296	181	188	181
query7	4602	299	298	298
query8	258	208	197	197
query9	8523	2395	2386	2386
query10	444	278	261	261
query11	10321	10069	10075	10069
query12	139	88	86	86
query13	1651	389	377	377
query14	8508	6855	6773	6773
query15	197	159	160	159
query16	7093	444	414	414
query17	944	562	541	541
query18	1935	278	277	277
query19	192	138	141	138
query20	92	82	84	82
query21	199	99	94	94
query22	4025	4145	4155	4145
query23	33968	33069	32783	32783
query24	10421	3046	3029	3029
query25	701	383	382	382
query26	1858	151	149	149
query27	2992	277	279	277
query28	7015	1992	1999	1992
query29	1362	420	409	409
query30	285	147	150	147
query31	951	766	739	739
query32	101	54	55	54
query33	700	328	316	316
query34	899	481	487	481
query35	818	714	742	714
query36	996	854	837	837
query37	299	79	81	79
query38	2875	2781	2785	2781
query39	866	811	801	801
query40	284	111	113	111
query41	46	46	43	43
query42	115	99	98	98
query43	487	415	420	415
query44	1149	728	730	728
query45	206	174	179	174
query46	1068	824	792	792
query47	1771	1690	1710	1690
query48	359	313	301	301
query49	1198	433	414	414
query50	889	426	423	423
query51	6738	6672	6667	6667
query52	104	92	89	89
query53	250	185	177	177
query54	673	454	460	454
query55	80	73	76	73
query56	279	262	271	262
query57	1153	1034	1050	1034
query58	281	274	290	274
query59	2596	2412	2351	2351
query60	304	275	281	275
query61	102	128	96	96
query62	934	675	681	675
query63	215	179	182	179
query64	5871	1903	1880	1880
query65	3162	3118	3101	3101
query66	1444	329	332	329
query67	15289	14817	14976	14817
query68	4372	560	575	560
query69	443	299	309	299
query70	1093	1075	1042	1042
query71	420	286	293	286
query72	7227	2695	2484	2484
query73	771	324	332	324
query74	6021	5586	5638	5586
query75	3346	2747	2771	2747
query76	2490	1177	1283	1177
query77	440	321	318	318
query78	9472	8998	8929	8929
query79	1421	540	541	540
query80	997	499	504	499
query81	578	225	229	225
query82	978	132	133	132
query83	244	168	173	168
query84	273	88	81	81
query85	1308	324	296	296
query86	403	307	310	307
query87	3247	3147	3078	3078
query88	2886	2408	2416	2408
query89	397	302	300	300
query90	1806	192	192	192
query91	130	167	100	100
query92	63	49	53	49
query93	1366	628	612	612
query94	916	316	309	309
query95	375	270	261	261
query96	597	278	283	278
query97	3210	3041	3068	3041
query98	220	205	208	205
query99	1654	1306	1314	1306
Total cold run time: 260603 ms
Total hot run time: 168462 ms

@doris-robot

ClickBench: Total hot run time: 29.84 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2eed50a823641a57d02040502cbc644adfcc0fe5, data reload: false

query1	0.05	0.04	0.04
query2	0.07	0.04	0.04
query3	0.22	0.05	0.05
query4	1.68	0.07	0.07
query5	0.48	0.47	0.49
query6	1.15	0.72	0.71
query7	0.02	0.01	0.02
query8	0.05	0.04	0.04
query9	0.57	0.52	0.51
query10	0.56	0.57	0.56
query11	0.16	0.12	0.12
query12	0.15	0.12	0.12
query13	0.60	0.61	0.60
query14	0.76	0.81	0.81
query15	0.93	0.85	0.88
query16	0.36	0.35	0.35
query17	0.98	1.06	1.00
query18	0.22	0.20	0.21
query19	1.86	1.77	1.79
query20	0.01	0.01	0.00
query21	15.40	0.76	0.68
query22	4.21	8.94	1.15
query23	17.92	1.24	1.24
query24	2.24	0.22	0.22
query25	0.18	0.08	0.07
query26	0.32	0.21	0.21
query27	0.46	0.23	0.23
query28	13.15	1.01	0.98
query29	12.57	3.27	3.24
query30	0.25	0.06	0.05
query31	2.87	0.41	0.40
query32	3.24	0.49	0.49
query33	2.92	2.95	2.99
query34	15.46	4.28	4.24
query35	4.30	4.32	4.28
query36	0.67	0.48	0.49
query37	0.20	0.17	0.17
query38	0.15	0.15	0.16
query39	0.05	0.04	0.04
query40	0.15	0.12	0.14
query41	0.09	0.04	0.05
query42	0.06	0.05	0.05
query43	0.05	0.04	0.04
Total cold run time: 107.79 s
Total hot run time: 29.84 s

Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 5, 2024
@github-actions
Contributor

github-actions bot commented Aug 5, 2024

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions bot commented Aug 5, 2024

PR approved by anyone and no changes requested.

@dataroaring dataroaring merged commit 75b57f3 into apache:master Aug 7, 2024
mymeiyi added a commit to mymeiyi/doris that referenced this pull request Aug 7, 2024
dataroaring pushed a commit that referenced this pull request Aug 7, 2024
wyxxxcat pushed a commit to wyxxxcat/doris that referenced this pull request Aug 14, 2024
@gavinchou gavinchou mentioned this pull request Aug 19, 2024
Labels

approved Indicates a PR has been approved by one committer. dev/2.1.6-merged dev/3.0.1-merged doing reviewed

6 participants