
@Johnnyssc
Contributor

@Johnnyssc Johnnyssc commented Aug 15, 2024

If a broker load or stream load task runs against a table that is restoring data, the load task fails with an exception.
The exception message is "Table [xxx] is under restore" or "Table [xxx] is in restore process, can't load into it".

However, a restore job usually affects only some partitions of the table, not all of them, so loads into the other partitions should still succeed.
To achieve this, the load path now checks the state of the target partitions before checking the OLAP table state.

PS: the restore state on partitions is set by PR #8245.
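A minimal sketch of the partition-first check, under a simplified model in which the table and each of its partitions carry a NORMAL or RESTORE state. The enums, the `Table` record, and both method names are hypothetical stand-ins, not Doris's real classes or the code in this PR. It needs Java 16+ for records, and treating an empty target list as "all partitions" is an assumption.

```java
import java.util.List;
import java.util.Map;

public class LoadStateCheck {

    // Hypothetical, simplified states; Doris's real enums have more values.
    enum TableState { NORMAL, RESTORE }

    enum PartitionState { NORMAL, RESTORE }

    record Table(String name, TableState state, Map<String, PartitionState> partitions) {}

    // Old behaviour: a table in RESTORE state rejects every load.
    static boolean canLoadTableFirst(Table table) {
        return table.state() != TableState.RESTORE;
    }

    // New behaviour: inspect the partitions the load writes to first.
    // If none of them is being restored, a table-level RESTORE state is not a blocker.
    static boolean canLoadPartitionFirst(Table table, List<String> targets) {
        // Assumption: an empty target list means "all partitions of the table".
        List<String> toCheck = targets.isEmpty()
                ? List.copyOf(table.partitions().keySet())
                : targets;
        for (String p : toCheck) {
            PartitionState s = table.partitions().get(p);
            if (s == null) {
                throw new IllegalArgumentException("unknown partition: " + p);
            }
            if (s == PartitionState.RESTORE) {
                return false; // this target partition is being restored
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Table state is RESTORE because p202408 is being restored.
        Table tbl = new Table("tbl", TableState.RESTORE,
                Map.of("p202407", PartitionState.NORMAL,
                       "p202408", PartitionState.RESTORE));
        System.out.println(canLoadTableFirst(tbl));                          // false
        System.out.println(canLoadPartitionFirst(tbl, List.of("p202407"))); // true
        System.out.println(canLoadPartitionFirst(tbl, List.of("p202408"))); // false
    }
}
```

The `main` method mirrors the test case in this PR: while `p202408` is restoring, a load of data for `p202407` passes the partition-first check, whereas the old table-first check rejects it.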

Test case for this PR

Restore tbl's partition p202408

$ RESTORE SNAPSHOT db.tbl_p202408_test
FROM repo
ON(
    tbl PARTITION (p202408)
)
PROPERTIES(
    "backup_timestamp"="2024-08-22-20-32-37",
    "replication_num" = "1"
);

Check restore job state

$ SHOW RESTORE\G
*************************** 1. row ***************************
JobId: 21741
Label: tbl_p202408_test
Timestamp: 2024-08-22-20-32-37
State: DOWNLOADING
RestoreObjs: {
"name": "tbl_p202408_test",
"database": "db",
"olap_table_list": [
{
"name": "tbl",
"partition_names": ["p202408"]
}
]

Load into partition p202408: fails with an exception

$ curl --location-trusted -u root:"" \
    -H "label:tbl_test_load_19" \
    -H "timeout:300" \
    -H "format: parquet" \
    -T data_for_p202408.parquet \
    -XPUT http://fe_ip:8030/api/db/tbl/_stream_load
{
"TxnId": 3042,
"Label": "tbl_test_load_19",
"Comment": "",
"TwoPhaseCommit": "false",
"Status": "Fail",
"Message": "[ANALYSIS_ERROR]TStatus: errCode = 2, detailMessage = Table [zt_order_detail_v3], Partition [p202408] is in restore process. Can not load into it.etc.",
"NumberTotalRows": 682,
"NumberLoadedRows": 682,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 82025,
"LoadTimeMs": 48,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 7,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 38,
"CommitAndPublishTimeMs": 0
}

Load into partition p202407: succeeds

$ curl --location-trusted -u root:"" \
    -H "timeout:300" \
    -H "format: json" \
    -H "read_json_by_line:true" \
    -T data_for_p202407.json \
    -XPUT http://fe_ip:8030/api/db/tbl/_stream_load
{
"TxnId": 3043,
"Label": "2f2dae38-a495-4c22-9492-419ea70b724e",
"Comment": "",
"TwoPhaseCommit": "false",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 1,
"NumberLoadedRows": 1,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 1128,
"LoadTimeMs": 51,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 6,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 30,
"CommitAndPublishTimeMs": 13
}

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@github-actions github-actions bot added the area/load Issues or PRs related to all kinds of load label Aug 15, 2024
@Johnnyssc Johnnyssc force-pushed the dev_load_enhance branch 6 times, most recently from eeab365 to 729a8bc on August 17, 2024 07:34
@Johnnyssc
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 50582 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 729a8bc095da11bacbd82293a70afdf3c8b854de, data reload: false

------ Round 1 ----------------------------------
q1	17777	4441	4360	4360
q2	2088	193	188	188
q3	10226	1915	1960	1915
q4	10372	1310	1309	1309
q5	8559	3885	3948	3885
q6	269	144	143	143
q7	2064	1631	1617	1617
q8	9516	2767	2745	2745
q9	10835	10267	10226	10226
q10	8631	3546	3547	3546
q11	451	279	279	279
q12	499	330	327	327
q13	18685	4007	4091	4007
q14	384	350	365	350
q15	551	504	515	504
q16	719	602	619	602
q17	1167	996	998	996
q18	7329	6991	6980	6980
q19	1763	1685	1655	1655
q20	595	352	333	333
q21	4474	4219	4172	4172
q22	553	443	450	443
Total cold run time: 117507 ms
Total hot run time: 50582 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4343	4359	4339	4339
q2	363	270	265	265
q3	4200	4188	4206	4188
q4	2799	2782	2769	2769
q5	7211	7225	7221	7221
q6	257	142	143	142
q7	3356	2899	2964	2899
q8	4630	4707	4738	4707
q9	17248	17180	17046	17046
q10	4292	4350	4336	4336
q11	786	712	729	712
q12	1074	876	873	873
q13	7264	4022	3869	3869
q14	489	464	458	458
q15	563	517	497	497
q16	799	731	723	723
q17	3859	3890	3867	3867
q18	8821	8828	8843	8828
q19	1759	1763	1704	1704
q20	2400	2126	2140	2126
q21	8509	8438	8577	8438
q22	1064	980	964	964
Total cold run time: 86086 ms
Total hot run time: 80971 ms

@doris-robot

TPC-DS: Total hot run time: 209534 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 729a8bc095da11bacbd82293a70afdf3c8b854de, data reload: false

query1	989	422	405	405
query2	6779	2269	2409	2269
query3	6957	222	223	222
query4	20105	18304	18206	18206
query5	19949	6689	6794	6689
query6	340	255	279	255
query7	4188	332	333	332
query8	483	465	424	424
query9	3289	2815	2757	2757
query10	498	368	388	368
query11	11476	10868	10837	10837
query12	161	106	116	106
query13	5684	728	748	728
query14	17754	14130	13856	13856
query15	402	257	252	252
query16	6513	326	302	302
query17	1597	1478	946	946
query18	2331	476	480	476
query19	237	194	189	189
query20	110	114	117	114
query21	247	154	143	143
query22	5290	5032	5037	5032
query23	32613	31970	32295	31970
query24	7252	6570	6567	6567
query25	580	491	463	463
query26	565	198	193	193
query27	1850	330	345	330
query28	6120	2433	2391	2391
query29	2959	2824	2710	2710
query30	293	211	206	206
query31	1002	828	825	825
query32	103	93	91	91
query33	485	342	337	337
query34	901	513	529	513
query35	1199	1002	989	989
query36	1286	1208	1209	1208
query37	117	88	93	88
query38	3153	2990	2982	2982
query39	1491	1442	1437	1437
query40	258	142	150	142
query41	155	157	152	152
query42	109	115	117	115
query43	787	610	644	610
query44	1248	798	788	788
query45	278	271	271	271
query46	1273	1017	1011	1011
query47	1915	1854	1787	1787
query48	1053	739	732	732
query49	849	592	579	579
query50	947	681	730	681
query51	4917	4695	4807	4695
query52	124	103	102	102
query53	506	374	371	371
query54	2748	2532	2553	2532
query55	102	105	104	104
query56	327	296	292	292
query57	1324	1310	1186	1186
query58	313	314	305	305
query59	3577	3283	3347	3283
query60	302	290	277	277
query61	157	153	157	153
query62	916	648	561	561
query63	548	404	399	399
query64	2601	1677	1511	1511
query65	3651	3645	3591	3591
query66	1260	831	812	812
query67	15933	16481	15523	15523
query68	10630	693	675	675
query69	630	420	436	420
query70	2171	1469	1532	1469
query71	468	350	364	350
query72	6602	3555	3598	3555
query73	788	367	361	361
query74	6401	6022	5973	5973
query75	5529	3820	3864	3820
query76	6982	1201	1251	1201
query77	1356	422	413	413
query78	12401	12021	11716	11716
query79	8580	697	667	667
query80	1011	547	564	547
query81	535	276	275	275
query82	694	123	123	123
query83	254	217	216	216
query84	282	96	97	96
query85	971	409	408	408
query86	364	370	342	342
query87	3299	3040	3064	3040
query88	4538	2536	2503	2503
query89	510	334	359	334
query90	1977	252	263	252
query91	193	160	158	158
query92	89	85	83	83
query93	6555	594	618	594
query94	731	247	245	245
query95	1169	1104	1104	1104
query96	655	345	341	341
query97	6595	6308	6425	6308
query98	223	210	209	209
query99	3125	977	963	963
Total cold run time: 321834 ms
Total hot run time: 209534 ms

@doris-robot

ClickBench: Total hot run time: 31.01 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 729a8bc095da11bacbd82293a70afdf3c8b854de, data reload: false

query1	0.02	0.02	0.02
query2	0.07	0.03	0.03
query3	0.25	0.05	0.04
query4	1.79	0.07	0.07
query5	0.56	0.53	0.52
query6	1.32	0.67	0.62
query7	0.02	0.01	0.02
query8	0.04	0.03	0.03
query9	0.55	0.49	0.51
query10	0.55	0.54	0.55
query11	0.12	0.09	0.09
query12	0.12	0.10	0.10
query13	0.64	0.62	0.62
query14	0.78	0.80	0.80
query15	0.79	0.77	0.77
query16	0.37	0.37	0.39
query17	0.98	0.99	1.00
query18	0.23	0.26	0.25
query19	1.94	1.85	1.88
query20	0.02	0.01	0.01
query21	15.46	0.55	0.57
query22	2.16	2.46	1.32
query23	17.10	1.12	0.94
query24	5.90	0.91	0.92
query25	0.36	0.07	0.06
query26	0.68	0.17	0.16
query27	0.05	0.04	0.04
query28	7.40	0.78	0.72
query29	12.62	2.35	2.35
query30	0.83	0.79	0.77
query31	2.82	0.40	0.38
query32	3.38	0.51	0.51
query33	3.07	3.09	3.04
query34	15.25	4.80	4.79
query35	4.83	4.83	4.88
query36	1.06	1.04	1.04
query37	0.08	0.06	0.07
query38	0.05	0.03	0.04
query39	0.04	0.03	0.03
query40	0.19	0.17	0.17
query41	0.09	0.03	0.03
query42	0.04	0.03	0.03
query43	0.04	0.04	0.03
Total cold run time: 104.66 s
Total hot run time: 31.01 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 729a8bc095da11bacbd82293a70afdf3c8b854de with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       21.2 seconds inserted 10000000 Rows, about 471K ops/s

@Johnnyssc
Contributor Author

run external

@Johnnyssc Johnnyssc force-pushed the dev_load_enhance branch 2 times, most recently from f069d47 to 208203b on August 19, 2024 06:31
@Johnnyssc
Contributor Author

run buildall

@Johnnyssc
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 50293 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 208203b98daf8609c8b89473755dc9b1939aa3fc, data reload: false

------ Round 1 ----------------------------------
q1	18319	4445	4371	4371
q2	2081	197	184	184
q3	10194	1901	1929	1901
q4	10381	1274	1351	1274
q5	8643	4261	3949	3949
q6	276	145	141	141
q7	2088	1622	1629	1622
q8	9537	2765	2741	2741
q9	13556	10387	10136	10136
q10	8665	3533	3482	3482
q11	447	273	277	273
q12	509	331	328	328
q13	18434	4017	4132	4017
q14	402	369	362	362
q15	554	505	512	505
q16	727	637	639	637
q17	1162	977	935	935
q18	7463	7004	6942	6942
q19	2004	1657	1541	1541
q20	549	331	328	328
q21	4507	4196	4188	4188
q22	519	436	459	436
Total cold run time: 121017 ms
Total hot run time: 50293 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4403	4395	4400	4395
q2	366	268	264	264
q3	4208	4212	4185	4185
q4	2799	2798	2783	2783
q5	7208	7250	7155	7155
q6	258	142	134	134
q7	3327	3059	3046	3046
q8	4591	4746	4711	4711
q9	17373	17070	16906	16906
q10	4315	4325	4319	4319
q11	800	722	707	707
q12	1062	854	902	854
q13	7276	3777	3752	3752
q14	492	442	460	442
q15	547	498	522	498
q16	772	732	710	710
q17	3849	3917	3871	3871
q18	8804	8829	8869	8829
q19	1779	1791	1720	1720
q20	2423	2171	2147	2147
q21	8574	8576	8547	8547
q22	1060	998	949	949
Total cold run time: 86286 ms
Total hot run time: 80924 ms

@doris-robot

TPC-H: Total hot run time: 50686 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 208203b98daf8609c8b89473755dc9b1939aa3fc, data reload: false

------ Round 1 ----------------------------------
q1	18232	4482	4491	4482
q2	2071	195	191	191
q3	10181	1912	1917	1912
q4	10350	1274	1342	1274
q5	8545	3940	3930	3930
q6	274	143	141	141
q7	2044	1666	1656	1656
q8	9336	2786	2776	2776
q9	10507	10405	10379	10379
q10	8700	3552	3514	3514
q11	462	280	276	276
q12	498	339	351	339
q13	19208	3983	4056	3983
q14	370	350	349	349
q15	553	502	506	502
q16	709	605	612	605
q17	1156	996	999	996
q18	7310	6856	6954	6856
q19	1761	1634	1578	1578
q20	578	342	332	332
q21	4529	4227	4170	4170
q22	557	445	465	445
Total cold run time: 117931 ms
Total hot run time: 50686 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4530	4568	4504	4504
q2	377	295	296	295
q3	4297	4308	4283	4283
q4	2871	2852	2867	2852
q5	7490	7206	7231	7206
q6	261	147	144	144
q7	3304	2881	2925	2881
q8	4382	4542	4528	4528
q9	16994	16839	16676	16676
q10	4250	4331	4315	4315
q11	800	718	690	690
q12	1055	871	874	871
q13	7398	3791	3746	3746
q14	479	438	438	438
q15	544	500	512	500
q16	773	714	714	714
q17	3843	3867	3876	3867
q18	8827	8710	8801	8710
q19	1756	1747	1723	1723
q20	2404	2155	2146	2146
q21	8531	8466	8445	8445
q22	1071	976	948	948
Total cold run time: 86237 ms
Total hot run time: 80482 ms

@doris-robot

TPC-DS: Total hot run time: 209156 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 208203b98daf8609c8b89473755dc9b1939aa3fc, data reload: false

query1	978	430	409	409
query2	6783	2253	2238	2238
query3	6955	227	226	226
query4	21165	18263	18101	18101
query5	19948	6742	6746	6742
query6	335	263	276	263
query7	4187	333	337	333
query8	491	482	487	482
query9	3245	2815	2743	2743
query10	481	369	358	358
query11	11507	10767	10785	10767
query12	154	117	108	108
query13	5676	718	723	718
query14	18304	13385	14087	13385
query15	380	261	247	247
query16	6500	338	312	312
query17	1582	1465	934	934
query18	2338	503	480	480
query19	242	193	190	190
query20	115	114	112	112
query21	242	148	161	148
query22	5365	5162	4895	4895
query23	32734	32064	32034	32034
query24	7032	6544	6599	6544
query25	578	476	479	476
query26	557	192	189	189
query27	1840	328	332	328
query28	6070	2444	2390	2390
query29	2978	2820	2764	2764
query30	278	210	207	207
query31	1004	820	841	820
query32	99	96	92	92
query33	486	338	335	335
query34	912	520	528	520
query35	1181	978	1003	978
query36	1378	1189	1372	1189
query37	118	89	89	89
query38	3074	2921	2979	2921
query39	1492	1428	1446	1428
query40	258	144	145	144
query41	162	154	153	153
query42	110	107	112	107
query43	776	677	672	672
query44	1182	788	787	787
query45	284	271	267	267
query46	1292	989	1029	989
query47	1957	2035	1785	1785
query48	1053	753	739	739
query49	842	580	585	580
query50	951	723	672	672
query51	4830	4901	4740	4740
query52	112	99	109	99
query53	504	394	381	381
query54	2757	2551	2539	2539
query55	109	112	97	97
query56	318	304	289	289
query57	1321	1265	1286	1265
query58	317	308	314	308
query59	3664	3266	3533	3266
query60	292	287	291	287
query61	158	161	156	156
query62	951	546	552	546
query63	546	400	407	400
query64	2734	1697	1557	1557
query65	3662	3611	3611	3611
query66	1255	814	803	803
query67	16436	15463	15584	15463
query68	8616	707	702	702
query69	641	428	401	401
query70	1530	1572	1401	1401
query71	458	359	372	359
query72	6568	3542	3564	3542
query73	783	364	368	364
query74	6440	5981	5960	5960
query75	4918	3833	3880	3833
query76	4951	1215	1265	1215
query77	880	432	416	416
query78	12626	12041	11900	11900
query79	8984	664	687	664
query80	1965	550	557	550
query81	541	280	275	275
query82	1695	126	128	126
query83	249	220	214	214
query84	286	98	101	98
query85	1226	412	411	411
query86	383	332	330	330
query87	3247	3063	3113	3063
query88	5430	2544	2540	2540
query89	446	329	334	329
query90	1872	258	254	254
query91	193	171	157	157
query92	90	84	83	83
query93	5140	555	585	555
query94	838	257	252	252
query95	1151	1101	1097	1097
query96	657	350	338	338
query97	6530	6360	6417	6360
query98	218	211	207	207
query99	2937	963	976	963
Total cold run time: 320106 ms
Total hot run time: 209156 ms

@doris-robot

ClickBench: Total hot run time: 31.21 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 208203b98daf8609c8b89473755dc9b1939aa3fc, data reload: false

query1	0.03	0.02	0.03
query2	0.07	0.03	0.02
query3	0.25	0.05	0.05
query4	1.79	0.06	0.06
query5	0.54	0.52	0.53
query6	1.24	0.63	0.62
query7	0.02	0.01	0.02
query8	0.05	0.04	0.03
query9	0.53	0.49	0.49
query10	0.55	0.54	0.54
query11	0.13	0.10	0.09
query12	0.13	0.10	0.10
query13	0.63	0.62	0.63
query14	0.82	0.81	0.78
query15	0.78	0.77	0.76
query16	0.36	0.37	0.37
query17	1.04	0.99	1.02
query18	0.23	0.26	0.24
query19	1.83	1.87	1.86
query20	0.02	0.01	0.01
query21	15.45	0.55	0.55
query22	2.37	2.80	1.48
query23	16.74	1.02	0.88
query24	7.09	1.10	1.06
query25	0.38	0.08	0.07
query26	0.77	0.16	0.16
query27	0.05	0.05	0.05
query28	5.88	0.81	0.72
query29	12.80	2.38	2.37
query30	0.74	0.67	0.76
query31	2.83	0.40	0.39
query32	3.34	0.52	0.50
query33	3.09	3.07	3.07
query34	15.24	4.79	4.82
query35	4.88	4.86	4.85
query36	1.05	1.03	1.03
query37	0.08	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.04	0.03
query40	0.19	0.16	0.16
query41	0.08	0.04	0.03
query42	0.03	0.04	0.03
query43	0.04	0.04	0.04
Total cold run time: 104.24 s
Total hot run time: 31.21 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 208203b98daf8609c8b89473755dc9b1939aa3fc with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       21.3 seconds inserted 10000000 Rows, about 469K ops/s

@Johnnyssc Johnnyssc changed the title [enhancement](Load)only check partitions' state in loading process, instead of checking table state [enhancement](Load)allow load data to the other partitions when some partitions are restoring Aug 19, 2024
@Johnnyssc
Contributor Author

run beut

@w41ter
Contributor

w41ter commented Aug 20, 2024

Adding a partition to a table that is being restored is not concurrency-safe, since the new partition is added to the local table only after all the replica-creation tasks have finished.

See RestoreJob::checkAndPrepareMeta for details.

@Johnnyssc Johnnyssc force-pushed the dev_load_enhance branch 3 times, most recently from 8505f07 to b67e2a2 on August 20, 2024 09:57
@Johnnyssc
Copy link
Contributor Author

run buildall

@Johnnyssc
Contributor Author

@w41ter This PR does not add partitions to a restoring table; it allows loading data into some partitions of a table while other partitions of the same table are being restored.

@doris-robot

TPC-H: Total hot run time: 50316 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 8392e8230c346f4d914f7ee7520b7767ef292a35, data reload: false

------ Round 1 ----------------------------------
q1	17838	4396	4388	4388
q2	2110	195	189	189
q3	10317	1909	1940	1909
q4	10387	1219	1340	1219
q5	8576	3935	3957	3935
q6	269	145	144	144
q7	2117	1666	1638	1638
q8	9320	2753	2750	2750
q9	11138	10568	10360	10360
q10	8693	3546	3551	3546
q11	448	280	279	279
q12	500	327	332	327
q13	18388	3989	4065	3989
q14	372	342	351	342
q15	543	515	503	503
q16	707	611	608	608
q17	1146	972	929	929
q18	7355	6851	6918	6851
q19	1732	1572	1597	1572
q20	565	353	327	327
q21	4487	4200	4080	4080
q22	534	431	440	431
Total cold run time: 117542 ms
Total hot run time: 50316 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4387	4313	4348	4313
q2	359	270	263	263
q3	4236	4194	4204	4194
q4	2774	2760	2776	2760
q5	7289	7228	7181	7181
q6	257	137	139	137
q7	3332	2940	2920	2920
q8	4465	4548	4495	4495
q9	17411	16974	17196	16974
q10	4247	4273	4315	4273
q11	796	701	736	701
q12	1057	902	867	867
q13	7188	3709	3767	3709
q14	479	435	442	435
q15	554	494	519	494
q16	771	712	728	712
q17	3795	3900	3803	3803
q18	8856	8738	8814	8738
q19	1764	1743	1728	1728
q20	2401	2212	2155	2155
q21	8524	8444	8420	8420
q22	1083	999	949	949
Total cold run time: 86025 ms
Total hot run time: 80221 ms

@doris-robot

TPC-DS: Total hot run time: 210853 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 8392e8230c346f4d914f7ee7520b7767ef292a35, data reload: false

query1	988	422	417	417
query2	6759	2265	2124	2124
query3	6941	240	224	224
query4	20073	18272	18228	18228
query5	19945	6778	6801	6778
query6	330	263	295	263
query7	4188	344	349	344
query8	487	485	475	475
query9	3271	2834	2773	2773
query10	491	382	381	381
query11	11531	10904	10883	10883
query12	158	114	111	111
query13	5670	752	747	747
query14	18634	13947	13899	13899
query15	397	264	247	247
query16	6487	327	312	312
query17	1638	1467	953	953
query18	2494	483	483	483
query19	249	183	200	183
query20	116	117	120	117
query21	247	157	151	151
query22	5273	5211	5054	5054
query23	32898	32001	32082	32001
query24	7074	6645	6613	6613
query25	573	486	479	479
query26	563	195	188	188
query27	1833	331	338	331
query28	6083	2442	2398	2398
query29	2953	2747	2849	2747
query30	284	214	212	212
query31	996	830	837	830
query32	100	95	93	93
query33	483	337	350	337
query34	913	538	526	526
query35	1188	981	1009	981
query36	1345	1228	1338	1228
query37	117	98	92	92
query38	3107	2961	2989	2961
query39	1494	1440	1451	1440
query40	265	152	145	145
query41	160	162	158	158
query42	113	117	108	108
query43	701	700	738	700
query44	1207	791	790	790
query45	283	270	270	270
query46	1277	1023	1050	1023
query47	1996	2052	1797	1797
query48	1028	731	741	731
query49	848	592	608	592
query50	950	739	714	714
query51	4859	4859	4853	4853
query52	127	108	110	108
query53	508	387	376	376
query54	2742	2556	2558	2556
query55	100	104	102	102
query56	313	300	295	295
query57	1305	1210	1287	1210
query58	341	318	305	305
query59	3739	3276	3339	3276
query60	298	282	296	282
query61	162	166	167	166
query62	865	525	535	525
query63	542	400	404	400
query64	2755	1711	1673	1673
query65	3700	3605	3585	3585
query66	1255	839	829	829
query67	16090	15138	17358	15138
query68	8609	722	712	712
query69	636	420	415	415
query70	1752	1662	1550	1550
query71	457	361	394	361
query72	6639	3608	3580	3580
query73	781	384	371	371
query74	6382	5978	5921	5921
query75	5246	3772	3842	3772
query76	5150	1179	1296	1179
query77	1007	441	440	440
query78	12769	12115	12109	12109
query79	6608	693	712	693
query80	1145	563	557	557
query81	538	296	288	288
query82	1369	132	130	130
query83	278	222	220	220
query84	285	99	104	99
query85	997	426	436	426
query86	381	328	329	328
query87	3276	3075	3054	3054
query88	5011	2544	2542	2542
query89	398	336	358	336
query90	1985	258	268	258
query91	210	164	169	164
query92	99	87	92	87
query93	4224	599	616	599
query94	739	256	257	256
query95	1187	1131	1099	1099
query96	666	346	353	346
query97	6492	6510	6495	6495
query98	224	219	224	219
query99	3013	1040	936	936
Total cold run time: 315453 ms
Total hot run time: 210853 ms

@doris-robot

ClickBench: Total hot run time: 30.67 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 8392e8230c346f4d914f7ee7520b7767ef292a35, data reload: false

query1	0.02	0.02	0.02
query2	0.07	0.03	0.02
query3	0.25	0.06	0.06
query4	1.77	0.06	0.06
query5	0.54	0.52	0.53
query6	1.28	0.63	0.63
query7	0.02	0.01	0.01
query8	0.04	0.03	0.03
query9	0.54	0.48	0.48
query10	0.55	0.54	0.54
query11	0.13	0.10	0.09
query12	0.14	0.10	0.11
query13	0.63	0.63	0.62
query14	0.80	0.79	0.81
query15	0.79	0.77	0.80
query16	0.39	0.38	0.36
query17	1.03	0.99	1.03
query18	0.21	0.28	0.22
query19	1.94	1.85	1.83
query20	0.02	0.01	0.01
query21	15.45	0.55	0.56
query22	1.99	2.14	1.50
query23	16.93	1.10	1.01
query24	5.93	0.57	0.89
query25	0.35	0.12	0.05
query26	0.62	0.16	0.15
query27	0.05	0.04	0.04
query28	7.74	0.81	0.74
query29	12.64	2.32	2.18
query30	0.77	0.75	0.73
query31	2.81	0.40	0.38
query32	3.36	0.51	0.51
query33	3.09	3.09	3.06
query34	15.30	4.82	4.85
query35	4.89	4.84	4.88
query36	1.07	1.01	1.03
query37	0.07	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.03	0.04
query40	0.19	0.17	0.17
query41	0.09	0.04	0.03
query42	0.04	0.03	0.03
query43	0.05	0.03	0.04
Total cold run time: 104.67 s
Total hot run time: 30.67 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 8392e8230c346f4d914f7ee7520b7767ef292a35 with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       21.2 seconds inserted 10000000 Rows, about 471K ops/s

@doris-robot

TPC-H: Total hot run time: 50293 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 8392e8230c346f4d914f7ee7520b7767ef292a35, data reload: false

------ Round 1 ----------------------------------
q1	18209	4393	4401	4393
q2	2101	194	185	185
q3	10366	1932	1945	1932
q4	10330	1225	1320	1225
q5	8828	3909	3917	3909
q6	270	142	144	142
q7	2065	1651	1633	1633
q8	9335	2774	2733	2733
q9	10551	10290	10288	10288
q10	8653	3488	3542	3488
q11	447	272	270	270
q12	494	333	330	330
q13	18352	4016	4041	4016
q14	373	346	356	346
q15	559	499	505	499
q16	698	604	606	604
q17	1126	995	984	984
q18	7289	6915	6906	6906
q19	1731	1636	1540	1540
q20	561	338	307	307
q21	4906	4200	4125	4125
q22	514	438	448	438
Total cold run time: 117758 ms
Total hot run time: 50293 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4366	4350	4319	4319
q2	362	266	277	266
q3	4179	4164	4138	4138
q4	2760	2758	2759	2758
q5	7196	7105	7087	7087
q6	255	142	138	138
q7	3279	2898	2918	2898
q8	4423	4536	4510	4510
q9	16934	16660	16778	16660
q10	4277	4404	4246	4246
q11	751	695	723	695
q12	1043	902	897	897
q13	6412	3806	3766	3766
q14	488	440	440	440
q15	546	501	500	500
q16	766	706	714	706
q17	3811	3884	3872	3872
q18	8842	8704	8729	8704
q19	1769	1749	1690	1690
q20	2404	2163	2129	2129
q21	8507	8464	8408	8408
q22	1088	983	953	953
Total cold run time: 84458 ms
Total hot run time: 79780 ms

@doris-robot

TPC-DS: Total hot run time: 210201 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 8392e8230c346f4d914f7ee7520b7767ef292a35, data reload: false

query1	988	416	407	407
query2	6756	2386	2320	2320
query3	6940	234	221	221
query4	19983	18051	17993	17993
query5	19957	6700	6767	6700
query6	332	260	272	260
query7	4234	327	345	327
query8	477	468	450	450
query9	3269	2814	2765	2765
query10	496	392	387	387
query11	11412	10910	10877	10877
query12	156	111	114	111
query13	5668	722	728	722
query14	18256	13523	13789	13523
query15	386	249	242	242
query16	6500	326	301	301
query17	1573	1512	928	928
query18	2330	483	489	483
query19	244	184	179	179
query20	117	114	118	114
query21	245	157	151	151
query22	5281	4995	5043	4995
query23	32666	32288	31968	31968
query24	6967	6576	6553	6553
query25	568	481	482	481
query26	565	190	191	190
query27	1812	325	327	325
query28	6108	2429	2402	2402
query29	2839	2713	2786	2713
query30	285	209	210	209
query31	1014	827	870	827
query32	100	94	92	92
query33	472	334	332	332
query34	884	536	535	535
query35	1162	970	984	970
query36	1255	1364	1210	1210
query37	118	90	91	90
query38	3111	2960	2960	2960
query39	1478	1443	1428	1428
query40	256	149	145	145
query41	157	159	153	153
query42	106	116	110	110
query43	692	692	632	632
query44	1181	784	789	784
query45	281	272	262	262
query46	1275	1036	1027	1027
query47	1893	1845	2120	1845
query48	1052	760	723	723
query49	836	594	602	594
query50	950	695	714	695
query51	4835	4745	4721	4721
query52	119	108	118	108
query53	500	374	381	374
query54	2727	2542	2554	2542
query55	105	91	100	91
query56	323	301	288	288
query57	1282	1244	1246	1244
query58	334	302	316	302
query59	3472	3322	3596	3322
query60	292	270	293	270
query61	158	161	158	158
query62	841	572	528	528
query63	541	387	404	387
query64	2770	1716	1681	1681
query65	3673	3603	3646	3603
query66	1206	809	824	809
query67	17250	16357	16956	16357
query68	4218	679	719	679
query69	589	416	420	416
query70	1596	1596	1436	1436
query71	413	368	362	362
query72	6370	3538	3597	3538
query73	764	368	372	368
query74	6449	5861	5943	5861
query75	4367	3794	3769	3769
query76	2185	1202	1236	1202
query77	610	430	414	414
query78	12504	11683	11972	11683
query79	7053	676	694	676
query80	3269	554	558	554
query81	574	281	274	274
query82	1239	134	127	127
query83	330	220	215	215
query84	251	104	97	97
query85	1698	415	416	415
query86	472	329	347	329
query87	3313	3055	3062	3055
query88	5043	2551	2551	2551
query89	397	355	343	343
query90	1678	254	265	254
query91	196	161	162	161
query92	110	88	85	85
query93	1911	623	622	622
query94	1001	245	257	245
query95	1170	1116	1112	1112
query96	663	359	341	341
query97	6554	6358	6303	6303
query98	218	217	204	204
query99	1772	1066	1014	1014
Total cold run time: 305088 ms
Total hot run time: 210201 ms

@doris-robot

ClickBench: Total hot run time: 31.11 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 8392e8230c346f4d914f7ee7520b7767ef292a35, data reload: false

query1	0.02	0.02	0.03
query2	0.08	0.03	0.03
query3	0.25	0.05	0.05
query4	1.79	0.07	0.07
query5	0.54	0.53	0.55
query6	1.24	0.62	0.60
query7	0.01	0.01	0.01
query8	0.04	0.03	0.03
query9	0.53	0.49	0.49
query10	0.55	0.54	0.56
query11	0.13	0.10	0.09
query12	0.13	0.10	0.10
query13	0.62	0.62	0.62
query14	0.80	0.80	0.79
query15	0.80	0.77	0.79
query16	0.40	0.38	0.39
query17	1.01	1.01	1.03
query18	0.24	0.25	0.26
query19	1.88	1.86	1.84
query20	0.02	0.01	0.01
query21	15.44	0.56	0.59
query22	2.23	2.16	1.60
query23	17.15	0.96	0.98
query24	5.90	0.79	0.95
query25	0.36	0.13	0.05
query26	0.61	0.16	0.17
query27	0.04	0.04	0.04
query28	7.42	0.81	0.72
query29	12.61	2.36	2.32
query30	0.77	0.70	0.70
query31	2.84	0.40	0.39
query32	3.37	0.51	0.50
query33	3.11	3.04	3.12
query34	15.26	4.85	4.78
query35	4.87	4.85	4.85
query36	1.08	1.03	1.04
query37	0.08	0.06	0.06
query38	0.06	0.04	0.04
query39	0.04	0.04	0.03
query40	0.18	0.16	0.17
query41	0.09	0.04	0.03
query42	0.04	0.03	0.03
query43	0.04	0.04	0.04
Total cold run time: 104.67 s
Total hot run time: 31.11 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 8392e8230c346f4d914f7ee7520b7767ef292a35 with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       21.1 seconds inserted 10000000 Rows, about 473K ops/s

@w41ter
Contributor

w41ter commented Aug 21, 2024

Hi, @Johnnyssc.

We have to consider this scenario: once your PR is merged, users will be able to add new partitions to a table that is currently being restored, and these partitions might soon be recovered from the snapshot and added to the table. Although this isn't the intent of your PR, we still need to avoid any changes that could cause issues.

You must guarantee that the restore process won't add new partitions that might conflict with the partitions you want to add. To ensure this, you should look into how to modify Restore::checkAndPrepareMeta so that changing the partition state and adding the restoring partition happen as one atomic operation.

@w41ter
Contributor

w41ter commented Aug 21, 2024

BTW, you should submit the PR to master first instead of directly to branch-2.0.

@Johnnyssc
Contributor Author

@w41ter Thanks for your advice and the reminder. As you suggested, I checked the code of RestoreJob::checkAndPrepareMeta and InternalCatalog::addPartition and confirmed that, whether a whole table or only some of its partitions is being restored, the table state is always set to RESTORE. At the same time, addPartition first checks whether the olapTableState is NORMAL, which means restoring and adding partitions are already atomic with respect to each other.
If any questions remain, please contact me via WeChat: Dove2626. Thanks!
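The mutual-exclusion argument above can be modeled in a few lines. The following Java sketch is a hypothetical, simplified model: the classes Table, TableState and PartitionState and the methods beginRestore and addPartition are illustrative stand-ins, not Doris's RestoreJob or InternalCatalog code. It assumes both paths take the same table write lock, so that restore preparation moves the table to RESTORE and marks the restored partitions in one critical section, while adding a partition is refused unless the table is NORMAL.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified model of the invariant discussed above; not Doris code.
public class RestoreGuardSketch {
    enum TableState { NORMAL, RESTORE }
    enum PartitionState { NORMAL, RESTORE }

    static class Table {
        // Assumption: restore preparation and addPartition share this lock.
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        TableState state = TableState.NORMAL;
        final Map<String, PartitionState> partitions = new HashMap<>();

        // Stand-in for restore preparation: set the table state and mark the
        // restored partitions in one critical section.
        void beginRestore(Set<String> restoring) {
            lock.writeLock().lock();
            try {
                if (state != TableState.NORMAL) {
                    throw new IllegalStateException("table is not NORMAL");
                }
                state = TableState.RESTORE;
                for (String name : restoring) {
                    partitions.put(name, PartitionState.RESTORE);
                }
            } finally {
                lock.writeLock().unlock();
            }
        }

        // Stand-in for adding a partition: refused unless the table is NORMAL.
        void addPartition(String name) {
            lock.writeLock().lock();
            try {
                if (state != TableState.NORMAL) {
                    throw new IllegalStateException("table is under restore");
                }
                if (partitions.containsKey(name)) {
                    throw new IllegalStateException("partition already exists: " + name);
                }
                partitions.put(name, PartitionState.NORMAL);
            } finally {
                lock.writeLock().unlock();
            }
        }
    }

    public static void main(String[] args) {
        Table tbl = new Table();
        tbl.addPartition("p202407");
        tbl.beginRestore(Set.of("p202408"));
        try {
            tbl.addPartition("p202409");
            System.out.println("unexpected: partition added during restore");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running main prints `rejected: table is under restore`: once the table leaves NORMAL, no new partition can slip in while the restore is in progress.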

@w41ter
Contributor

w41ter commented Aug 22, 2024

@Johnnyssc Okay, I misunderstood.

@github-actions github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Aug 22, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@w41ter
Contributor

w41ter commented Aug 22, 2024

Ref: #39595

@w41ter w41ter merged commit ab4d81a into apache:branch-2.0 Aug 26, 2024
mongo360 pushed a commit to mongo360/doris that referenced this pull request Dec 11, 2024
…partitions are restoring (apache#39411)

If a broker load or stream load task is executed on a table that is
restoring data, the load task fails with an exception.
Exception info: "Table [xxx] is under restore" or "Table [xxx] is in
restore process, can't load into it".

But in most cases a restoreJob only affects some partitions of the table,
not all of them, so loads into the other partitions should still
succeed.
To achieve this, check the partition state first, before checking the
OLAP table state.

PS: the restore status for partitions is set in
PR apache#8245.
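The partition-first check can be sketched as follows. This is a hypothetical Java model, not the code in this PR: it assumes that a target partition whose own state is NORMAL may be loaded even while the table state is RESTORE, rejects a partition marked RESTORE with the partition-level message, and falls back to the table-level check only when the partition's state is unknown. The names LoadCheckSketch, Table and checkLoadable are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of "check partition state before table state" for loads.
// Names and types are illustrative; this is not the Doris implementation.
public class LoadCheckSketch {
    enum TableState { NORMAL, RESTORE }
    enum PartitionState { NORMAL, RESTORE }

    static class Table {
        final String name;
        TableState state = TableState.NORMAL;
        // Known partitions and their states; restoring partitions are RESTORE.
        final Map<String, PartitionState> partitions = new HashMap<>();

        Table(String name) {
            this.name = name;
        }
    }

    // Throws IllegalStateException if loading into the partition must be rejected.
    static void checkLoadable(Table table, String partition) {
        PartitionState ps = table.partitions.get(partition);
        // 1. Partition-level check first: reject only the partition under restore.
        if (ps == PartitionState.RESTORE) {
            throw new IllegalStateException("Table [" + table.name + "], Partition ["
                    + partition + "] is in restore process. Can not load into it.");
        }
        // 2. A NORMAL partition is loadable even if the table state is RESTORE
        //    (assumption: a partial restore leaves the other partitions untouched).
        if (ps == PartitionState.NORMAL) {
            return;
        }
        // 3. Partition state unknown: fall back to the table-level check.
        if (table.state == TableState.RESTORE) {
            throw new IllegalStateException("Table [" + table.name
                    + "] is in restore process, can't load into it");
        }
    }

    public static void main(String[] args) {
        Table tbl = new Table("tbl");
        tbl.state = TableState.RESTORE; // only p202408 is actually being restored
        tbl.partitions.put("p202407", PartitionState.NORMAL);
        tbl.partitions.put("p202408", PartitionState.RESTORE);
        for (String p : new String[] {"p202407", "p202408"}) {
            try {
                checkLoadable(tbl, p);
                System.out.println(p + ": load allowed");
            } catch (IllegalStateException e) {
                System.out.println(p + ": " + e.getMessage());
            }
        }
    }
}
```

Running main prints `p202407: load allowed` followed by `p202408: Table [tbl], Partition [p202408] is in restore process. Can not load into it.`, mirroring the two loads in the test case.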


## Test case for this PR

### Restore partition p202408 of tbl
$ RESTORE SNAPSHOT db.tbl_p202408_test
FROM repo
ON(
    `tbl` PARTITION (p202408)
)
PROPERTIES(
    "backup_timestamp"="2024-08-22-20-32-37",
    "replication_num" = "1"
);

### Check the restore job state
$ SHOW RESTORE\G
*************************** 1. row ***************************
JobId: 21741
Label: tbl_p202408_test
Timestamp: 2024-08-22-20-32-37
State: DOWNLOADING
RestoreObjs: {
  "name": "tbl_p202408_test",
  "database": "db",
  "olap_table_list": [
      {
          "name": "tbl",
          "partition_names": ["p202408"]
      }
]

### Load into partition p202408: fails with an exception
$ curl --location-trusted -u root:"" \
>   -H "label:tbl_test_load_19" \
>   -H "timeout:300" \
>   -H "format: parquet" \
>   -T data_for_p202408.parquet \
>   -XPUT http://fe_ip:8030/api/db/tbl/_stream_load
{
    "TxnId": 3042,
    "Label": "tbl_test_load_19",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Fail",
    "Message": "[ANALYSIS_ERROR]TStatus: errCode = 2, detailMessage = Table [zt_order_detail_v3], Partition [p202408] is in restore process. Can not load into it.etc.",
    "NumberTotalRows": 682,
    "NumberLoadedRows": 682,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 82025,
    "LoadTimeMs": 48,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 7,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 38,
    "CommitAndPublishTimeMs": 0
}

### Load into partition p202407: succeeds
$ curl --location-trusted -u root:"" \
>   -H "timeout:300" \
>   -H "format: json" \
>   -H "read_json_by_line:true" \
>   -T data_for_p202407.json \
>   -XPUT http://fe_ip:8030/api/db/tbl/_stream_load
{
    "TxnId": 3043,
    "Label": "2f2dae38-a495-4c22-9492-419ea70b724e",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 1,
    "NumberLoadedRows": 1,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 1128,
    "LoadTimeMs": 51,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 6,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 30,
    "CommitAndPublishTimeMs": 13
}

Co-authored-by: shenshoucheng <shenshoucheng@jd.com>

Labels

approved (Indicates a PR has been approved by one committer.)
area/load (Issues or PRs related to all kinds of load)
reviewed

3 participants