@Johnnyssc Johnnyssc commented Aug 20, 2024

If a broker load or stream load task runs against a table that is restoring data, the load task fails with an exception.
Exception message: "Table [xxx] is under restore" or "Table [xxx] is in restore process, can't load into it".

In most cases, however, a restore job affects only some partitions of the table, not all of them, so loads into the other partitions should still succeed.
To achieve this, check the partition state before checking the OLAP table state.

PS: the restore status for partitions is set in PR #8245.
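
The check order described above can be modelled roughly as follows. This is a minimal Python sketch, not the Doris FE code: the names `OlapTable`, `Partition`, `State`, `LoadRejected` and `check_load_allowed` are hypothetical, and falling back to the table-level check when the touched partitions are unknown is an assumption of the sketch. Only the error message wording is taken from this PR.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Iterable


class State(Enum):
    NORMAL = "NORMAL"
    RESTORE = "RESTORE"


class LoadRejected(Exception):
    """Raised when a load must not proceed (hypothetical exception type)."""


@dataclass
class Partition:
    name: str
    state: State = State.NORMAL


@dataclass
class OlapTable:
    name: str
    state: State = State.NORMAL
    partitions: Dict[str, Partition] = field(default_factory=dict)


def check_load_allowed(table: OlapTable, touched: Iterable[str] = ()) -> None:
    """Check partition state first, then the table state.

    `touched` is the set of partition names the load writes to. When it is
    empty (partitions unknown), the sketch falls back to the table-level
    check; that fallback is an assumption, not confirmed by the PR.
    """
    names = list(touched)
    for name in names:
        if table.partitions[name].state is State.RESTORE:
            raise LoadRejected(
                f"Table [{table.name}], Partition [{name}] is in restore process. "
                "Can not load into it."
            )
    if not names and table.state is State.RESTORE:
        raise LoadRejected(f"Table [{table.name}] is under restore")


# Table is under restore, but only p202408 is actually being restored.
tbl = OlapTable(
    "tbl",
    State.RESTORE,
    {
        "p202407": Partition("p202407"),
        "p202408": Partition("p202408", State.RESTORE),
    },
)
check_load_allowed(tbl, ["p202407"])  # passes: partition is not restoring
```

With this ordering, a load that touches only `p202407` goes through even though the table itself is in the restore state, while a load that touches `p202408` is rejected.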

Test case for this PR

Restore partition p202408 of tbl

$ RESTORE SNAPSHOT db.tbl_p202408_test
FROM repo
ON(
tbl PARTITION (p202408)
)
PROPERTIES(
"backup_timestamp"="2024-08-22-20-32-37",
"replication_num" = "1"
);

Check the restore job state

$ SHOW RESTORE\G
*************************** 1. row ***************************
JobId: 21741
Label: tbl_p202408_test
Timestamp: 2024-08-22-20-32-37
State: DOWNLOADING
RestoreObjs: {
"name": "tbl_p202408_test",
"database": "db",
"olap_table_list": [
{
"name": "tbl",
"partition_names": ["p202408"]
}
]

Load into partition p202408: fails with an exception

$ curl --location-trusted -u root:"" \
    -H "label:tbl_test_load_19" \
    -H "timeout:300" \
    -H "format: parquet" \
    -T data_for_p202408.parquet \
    -XPUT http://fe_ip:8030/api/db/tbl/_stream_load
{
"TxnId": 3042,
"Label": "tbl_test_load_19",
"Comment": "",
"TwoPhaseCommit": "false",
"Status": "Fail",
"Message": "[ANALYSIS_ERROR]TStatus: errCode = 2, detailMessage = Table [zt_order_detail_v3], Partition [p202408] is in restore process. Can not load into it.etc.",
"NumberTotalRows": 682,
"NumberLoadedRows": 682,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 82025,
"LoadTimeMs": 48,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 7,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 38,
"CommitAndPublishTimeMs": 0
}

Load into partition p202407: succeeds

$ curl --location-trusted -u root:"" \
    -H "timeout:300" \
    -H "format: json" \
    -H "read_json_by_line:true" \
    -T data_for_p202407.json \
    -XPUT http://fe_ip:8030/api/db/tbl/_stream_load
{
"TxnId": 3043,
"Label": "2f2dae38-a495-4c22-9492-419ea70b724e",
"Comment": "",
"TwoPhaseCommit": "false",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 1,
"NumberLoadedRows": 1,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 1128,
"LoadTimeMs": 51,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 6,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 30,
"CommitAndPublishTimeMs": 13
}
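
Note that the failed response above still reports `"NumberLoadedRows": 682`, so a client should decide on the `Status` field rather than on the row counters. A small Python sketch of that check follows; the helper name `stream_load_succeeded` is hypothetical, and treating every status other than `"Success"` as a failure is a simplifying assumption, since only the `"Success"` and `"Fail"` values appear in this test.

```python
import json


def stream_load_succeeded(response_body: str) -> bool:
    """Return True only when the response's "Status" field is "Success".

    Any other value (or a missing field) is treated as a failure; that is
    an assumption of this sketch.
    """
    result = json.loads(response_body)
    return result.get("Status") == "Success"


# Abbreviated copies of the two responses shown in the test case.
failed = '{"Status": "Fail", "NumberTotalRows": 682, "NumberLoadedRows": 682}'
succeeded = '{"Status": "Success", "Message": "OK", "NumberLoadedRows": 1}'

print(stream_load_succeeded(failed), stream_load_succeeded(succeeded))
```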

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@Johnnyssc Johnnyssc force-pushed the ssc_dev_load_enhance branch 2 times, most recently from 9a403cc to 02df1a7 Compare August 20, 2024 10:11
@Johnnyssc Johnnyssc force-pushed the ssc_dev_load_enhance branch from 02df1a7 to e82614f Compare August 20, 2024 10:31
@Johnnyssc

run buildall

@doris-robot

TPC-H: Total hot run time: 38257 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e82614f5572a25e409877086ffc45cd4336f7ee6, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17900	4314	4317	4314
q2	2064	208	217	208
q3	10479	1147	1128	1128
q4	10866	752	703	703
q5	7820	2871	2842	2842
q6	265	156	160	156
q7	1021	667	638	638
q8	9456	2092	2115	2092
q9	7175	6653	6650	6650
q10	7431	2246	2259	2246
q11	475	259	272	259
q12	472	258	263	258
q13	17780	3005	3006	3005
q14	301	253	255	253
q15	567	530	555	530
q16	551	422	415	415
q17	997	641	765	641
q18	7361	7052	6708	6708
q19	2327	1066	1025	1025
q20	694	375	359	359
q21	4005	2793	2791	2791
q22	1133	1036	1054	1036
Total cold run time: 111140 ms
Total hot run time: 38257 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4393	4330	4278	4278
q2	408	322	313	313
q3	2885	2646	2661	2646
q4	1932	1656	1687	1656
q5	5375	5367	5399	5367
q6	245	152	156	152
q7	2086	1722	1710	1710
q8	3206	3376	3346	3346
q9	8443	8458	8473	8458
q10	3407	3182	3155	3155
q11	626	526	529	526
q12	823	628	650	628
q13	16352	2980	3054	2980
q14	337	303	299	299
q15	558	530	531	530
q16	497	447	449	447
q17	1768	1500	1487	1487
q18	8000	7860	7563	7563
q19	1770	1656	1598	1598
q20	2090	1864	1817	1817
q21	12305	4959	5093	4959
q22	1119	1038	985	985
Total cold run time: 78625 ms
Total hot run time: 54900 ms

@doris-robot

TPC-DS: Total hot run time: 191544 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e82614f5572a25e409877086ffc45cd4336f7ee6, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	985	394	386	386
query2	6745	2053	2001	2001
query3	6663	240	240	240
query4	34285	23294	23242	23242
query5	4370	730	716	716
query6	322	218	213	213
query7	4614	325	325	325
query8	512	450	434	434
query9	8664	2554	2524	2524
query10	511	346	337	337
query11	17587	15021	15088	15021
query12	185	140	138	138
query13	1715	463	449	449
query14	10030	7272	7092	7092
query15	299	204	192	192
query16	8034	486	512	486
query17	1657	617	588	588
query18	2166	348	343	343
query19	247	170	174	170
query20	141	139	139	139
query21	252	145	143	143
query22	4202	4053	3963	3963
query23	34171	33356	33577	33356
query24	11339	2926	2972	2926
query25	660	430	434	430
query26	1193	180	185	180
query27	2484	319	305	305
query28	7355	2150	2130	2130
query29	866	469	451	451
query30	342	187	183	183
query31	1051	837	845	837
query32	124	82	84	82
query33	813	378	345	345
query34	915	522	507	507
query35	903	761	755	755
query36	1096	985	977	977
query37	175	103	106	103
query38	3985	3848	3967	3848
query39	1518	1467	1467	1467
query40	255	159	160	159
query41	145	143	141	141
query42	132	120	120	120
query43	554	513	509	509
query44	1305	790	806	790
query45	222	191	214	191
query46	1135	799	813	799
query47	1920	1817	1832	1817
query48	420	343	338	338
query49	1249	604	594	594
query50	882	475	466	466
query51	7238	7129	7143	7129
query52	124	110	110	110
query53	303	232	235	232
query54	950	507	505	505
query55	91	89	92	89
query56	341	319	323	319
query57	1230	1146	1109	1109
query58	325	305	317	305
query59	3181	2923	2830	2830
query60	386	374	383	374
query61	188	192	166	166
query62	880	701	722	701
query63	269	234	234	234
query64	5334	2406	1836	1836
query65	3236	3168	3150	3150
query66	1119	679	673	673
query67	15514	15043	14835	14835
query68	8520	608	619	608
query69	751	448	343	343
query70	1383	1186	1112	1112
query71	581	322	314	314
query72	7679	2297	2057	2057
query73	2048	363	361	361
query74	9308	8764	8785	8764
query75	5160	2712	2751	2712
query76	5225	1011	1032	1011
query77	909	448	456	448
query78	9776	9178	9634	9178
query79	8432	557	555	555
query80	1041	618	626	618
query81	615	263	258	258
query82	357	162	161	161
query83	375	215	218	215
query84	291	99	97	97
query85	1037	374	354	354
query86	365	333	312	312
query87	4377	4237	4235	4235
query88	4851	2517	2484	2484
query89	543	329	323	323
query90	2109	237	239	237
query91	157	128	128	128
query92	90	74	75	74
query93	5669	574	554	554
query94	979	322	333	322
query95	399	294	296	294
query96	631	284	291	284
query97	3250	3054	3057	3054
query98	254	233	222	222
query99	1584	1310	1345	1310
Total cold run time: 320850 ms
Total hot run time: 191544 ms

@doris-robot

ClickBench: Total hot run time: 30.4 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e82614f5572a25e409877086ffc45cd4336f7ee6, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.05	0.04	0.04
query2	0.08	0.05	0.05
query3	0.22	0.05	0.05
query4	1.67	0.07	0.07
query5	0.52	0.50	0.51
query6	1.13	0.72	0.72
query7	0.02	0.01	0.01
query8	0.05	0.05	0.05
query9	0.56	0.50	0.50
query10	0.55	0.55	0.55
query11	0.16	0.12	0.12
query12	0.16	0.13	0.14
query13	0.63	0.59	0.60
query14	0.75	0.79	0.77
query15	0.84	0.83	0.83
query16	0.38	0.40	0.38
query17	1.06	1.07	0.96
query18	0.21	0.20	0.21
query19	1.81	1.86	1.85
query20	0.02	0.01	0.01
query21	15.46	0.67	0.65
query22	4.27	7.98	1.32
query23	18.26	1.40	1.24
query24	2.07	0.23	0.23
query25	0.16	0.09	0.09
query26	0.27	0.20	0.18
query27	0.09	0.09	0.09
query28	13.24	1.02	1.01
query29	12.63	3.36	3.31
query30	0.43	0.25	0.24
query31	2.80	0.41	0.40
query32	3.24	0.48	0.48
query33	2.98	3.02	2.91
query34	16.89	4.33	4.31
query35	4.45	4.39	4.39
query36	0.67	0.51	0.48
query37	0.20	0.17	0.17
query38	0.17	0.16	0.16
query39	0.06	0.05	0.06
query40	0.17	0.15	0.14
query41	0.12	0.06	0.06
query42	0.08	0.06	0.06
query43	0.06	0.05	0.05
Total cold run time: 109.64 s
Total hot run time: 30.4 s

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 22, 2024
@github-actions

PR approved by at least one committer and no changes requested.

@github-actions

PR approved by anyone and no changes requested.

@w41ter w41ter merged commit 6580981 into apache:master Aug 26, 2024
yiguolei pushed a commit that referenced this pull request Aug 26, 2024
…partitions are restoring (#39915)

cherry pick from master branch, pr has been merged:
#39595

Co-authored-by: shenshoucheng <shenshoucheng@jd.com>
dataroaring pushed a commit that referenced this pull request Aug 26, 2024
…partitions are restoring (#39595)

Co-authored-by: shenshoucheng <shenshoucheng@jd.com>