[enhancement](Load)allow load data to the other partitions when some partitions are restoring #39411
|
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
eeab365 to 729a8bc
|
run buildall |
TPC-H: Total hot run time: 50582 ms |
TPC-DS: Total hot run time: 209534 ms |
ClickBench: Total hot run time: 31.01 s |
|
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G' |
|
run external |
f069d47 to 208203b
|
run buildall |
|
run buildall |
TPC-H: Total hot run time: 50293 ms |
TPC-H: Total hot run time: 50686 ms |
TPC-DS: Total hot run time: 209156 ms |
ClickBench: Total hot run time: 31.21 s |
|
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G' |
|
run beut |
|
Adding a partition to a restoring table is not concurrency-safe, since the new partition is only added to the local table after all the replica-creation tasks have finished. See |
8505f07 to b67e2a2
…partitions are restoring
b67e2a2 to 8392e82
|
run buildall |
|
TPC-H: Total hot run time: 50316 ms |
TPC-DS: Total hot run time: 210853 ms |
ClickBench: Total hot run time: 30.67 s |
|
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G' |
TPC-H: Total hot run time: 50293 ms |
TPC-DS: Total hot run time: 210201 ms |
ClickBench: Total hot run time: 31.11 s |
|
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G' |
|
Hi, @Johnnyssc. We have to consider this scenario: once your PR is merged, users will be able to add new partitions to a table that is currently being restored, and these partitions might soon be recovered from the snapshot and added to the table. Although this isn't the intent of your PR, we still need to avoid any changes that could cause issues. You must guarantee that the restore process won't add new partitions that might conflict with the partitions you want to add. To ensure this, you should look into how to modify |
|
BTW, you should submit the PR to master first instead of directly to branch-2.0. |
@w41ter Thanks for your advice and the reminder. As you suggested, I checked the code about |
|
@Johnnyssc Okay, I misunderstood. |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
|
Ref: #39595 |
…partitions are restoring (apache#39411)

If a broker load or stream load task runs against a table that is being restored, the load task fails with an exception.
Exception info: "Table [xxx] is under restore" or "Table [xxx] is in restore process, can't load into it".
In most cases, however, a restore job affects only some of the table's partitions, so loads into the other partitions should still succeed.
To achieve this, check the partition state before checking the OLAP table state.
PS: the restore status for partitions is set in this PR: apache#8245

## test case for this pr

### restore tbl's partition p202408

$ RESTORE SNAPSHOT db.tbl_p202408_test
FROM repo
ON(
`tbl` PARTITION (p202408)
)
PROPERTIES(
"backup_timestamp"="2024-08-22-20-32-37",
"replication_num" = "1"
);

### check restore job state

$ SHOW RESTORE\G
*************************** 1. row ***************************
JobId: 21741
Label: tbl_p202408_test
Timestamp: 2024-08-22-20-32-37
State: DOWNLOADING
RestoreObjs: {
"name": "tbl_p202408_test",
"database": "db",
"olap_table_list": [
{
"name": "tbl",
"partition_names": ["p202408"]
}
]
}

### load to partition p202408, failed with exception

$ curl --location-trusted -u root:"" \
    -H "label:tbl_test_load_19" \
    -H "timeout:300" \
    -H "format: parquet" \
    -T data_for_p202408.parquet \
    -XPUT http://fe_ip:8030/api/db/tbl/_stream_load
{
    "TxnId": 3042,
    "Label": "tbl_test_load_19",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Fail",
    "Message": "[ANALYSIS_ERROR]TStatus: errCode = 2, detailMessage = Table [zt_order_detail_v3], Partition [p202408] is in restore process. Can not load into it.etc.",
    "NumberTotalRows": 682,
    "NumberLoadedRows": 682,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 82025,
    "LoadTimeMs": 48,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 7,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 38,
    "CommitAndPublishTimeMs": 0
}

### load to partition p202407, successful

$ curl --location-trusted -u root:"" \
    -H "timeout:300" \
    -H "format: json" \
    -H "read_json_by_line:true" \
    -T data_for_p202407.json \
    -XPUT http://fe_ip:8030/api/db/tbl/_stream_load
{
    "TxnId": 3043,
    "Label": "2f2dae38-a495-4c22-9492-419ea70b724e",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 1,
    "NumberLoadedRows": 1,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 1128,
    "LoadTimeMs": 51,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 6,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 30,
    "CommitAndPublishTimeMs": 13
}

Co-authored-by: shenshoucheng <shenshoucheng@jd.com>
If a broker load or stream load task runs against a table that is being restored, the load task fails with an exception.
Exception info: "Table [xxx] is under restore" or "Table [xxx] is in restore process, can't load into it".
In most cases, however, a restore job affects only some of the table's partitions, so loads into the other partitions should still succeed.
To achieve this, check the partition state before checking the OLAP table state.
PS: the restore status for partitions is set in this PR: #8245
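The check order described here can be illustrated with a small model. This is only a sketch in Python, not the Doris FE code (which is Java): the `State`, `Partition`, `Table`, and `LoadRejected` types and the `check_can_load` helper are hypothetical names, and the fallback to the table-level check when no target partitions are known is an assumption made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    NORMAL = "NORMAL"
    RESTORE = "RESTORE"


@dataclass
class Partition:
    name: str
    state: State = State.NORMAL


@dataclass
class Table:
    name: str
    state: State = State.NORMAL
    partitions: dict = field(default_factory=dict)  # partition name -> Partition


class LoadRejected(Exception):
    """Raised when a load must not proceed (hypothetical error type)."""


def check_can_load(table, target_partitions):
    """Check partition state first; use table state only as a fallback."""
    if target_partitions:
        for name in target_partitions:
            part = table.partitions.get(name)
            if part is not None and part.state is State.RESTORE:
                raise LoadRejected(
                    f"Table [{table.name}], Partition [{name}] "
                    "is in restore process. Can not load into it."
                )
        # None of the targeted partitions is restoring, so the load may
        # proceed even though the table as a whole is in RESTORE state.
        return
    # Assumption for this sketch: with no known targets, keep the old
    # table-level behavior.
    if table.state is State.RESTORE:
        raise LoadRejected(f"Table [{table.name}] is under restore")


tbl = Table(
    "tbl",
    State.RESTORE,
    {
        "p202407": Partition("p202407"),
        "p202408": Partition("p202408", State.RESTORE),
    },
)
check_can_load(tbl, ["p202407"])  # allowed: p202407 is not restoring
```

Calling `check_can_load(tbl, ["p202408"])` on the same table raises `LoadRejected`, which mirrors the two outcomes in the test case.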
test case for this pr
restore tbl's partition p202408
$ RESTORE SNAPSHOT db.tbl_p202408_test
FROM repo
ON(
`tbl` PARTITION (p202408)
)
PROPERTIES(
"backup_timestamp"="2024-08-22-20-32-37",
"replication_num" = "1"
);
check restore job state
$ SHOW RESTORE\G
*************************** 1. row ***************************
JobId: 21741
Label: tbl_p202408_test
Timestamp: 2024-08-22-20-32-37
State: DOWNLOADING
RestoreObjs: {
"name": "tbl_p202408_test",
"database": "db",
"olap_table_list": [
{
"name": "tbl",
"partition_names": ["p202408"]
}
]
}
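To see which partitions a restore job touches, the `RestoreObjs` value can be read as JSON. The snippet below is a rough sketch: the JSON string copies the fields shown in the output here (with the closing brace supplied), and `restoring_partitions` is a hypothetical helper, not part of Doris.

```python
import json

# Copy of the RestoreObjs fields shown in the SHOW RESTORE output.
restore_objs = """
{
  "name": "tbl_p202408_test",
  "database": "db",
  "olap_table_list": [
    {"name": "tbl", "partition_names": ["p202408"]}
  ]
}
"""


def restoring_partitions(restore_objs_json, table_name):
    """Return the partition names the restore job lists for table_name."""
    objs = json.loads(restore_objs_json)
    names = set()
    for tbl in objs.get("olap_table_list", []):
        if tbl.get("name") == table_name:
            names.update(tbl.get("partition_names", []))
    return names


print(restoring_partitions(restore_objs, "tbl"))  # {'p202408'}
```

A load that targets only partitions outside this set is the case this PR is meant to allow.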
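load to partition p202408, failed with exception
$ curl --location-trusted -u root:"" \
    -H "label:tbl_test_load_19" \
    -H "timeout:300" \
    -H "format: parquet" \
    -T data_for_p202408.parquet \
    -XPUT http://fe_ip:8030/api/db/tbl/_stream_load
{
    "TxnId": 3042,
    "Label": "tbl_test_load_19",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Fail",
    "Message": "[ANALYSIS_ERROR]TStatus: errCode = 2, detailMessage = Table [zt_order_detail_v3], Partition [p202408] is in restore process. Can not load into it.etc.",
    "NumberTotalRows": 682,
    "NumberLoadedRows": 682,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 82025,
    "LoadTimeMs": 48,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 7,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 38,
    "CommitAndPublishTimeMs": 0
}
load to partition p202407, successful
$ curl --location-trusted -u root:"" \
    -H "timeout:300" \
    -H "format: json" \
    -H "read_json_by_line:true" \
    -T data_for_p202407.json \
    -XPUT http://fe_ip:8030/api/db/tbl/_stream_load
{
    "TxnId": 3043,
    "Label": "2f2dae38-a495-4c22-9492-419ea70b724e",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 1,
    "NumberLoadedRows": 1,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 1128,
    "LoadTimeMs": 51,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 6,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 30,
    "CommitAndPublishTimeMs": 13
}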
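When scripting this test case, the two outcomes can be told apart by the `Status` and `Message` fields of the stream load response. The following is a minimal sketch: the two JSON strings are trimmed copies of the responses recorded in this PR, `summarize_stream_load` is a hypothetical helper, and treating only `"Success"` as a successful load is a simplifying assumption.

```python
import json

# Trimmed copies of the two responses from this PR's test case;
# only the fields used below are kept.
FAILED = json.dumps({
    "TxnId": 3042,
    "Label": "tbl_test_load_19",
    "Status": "Fail",
    "Message": "[ANALYSIS_ERROR]TStatus: errCode = 2, detailMessage = "
               "Table [zt_order_detail_v3], Partition [p202408] is in "
               "restore process. Can not load into it.etc.",
})
SUCCEEDED = json.dumps({"TxnId": 3043, "Status": "Success", "Message": "OK"})


def summarize_stream_load(response_text):
    """Return (ok, message) for a stream load response body."""
    result = json.loads(response_text)
    ok = result.get("Status") == "Success"
    return ok, result.get("Message", "")


for raw in (FAILED, SUCCEEDED):
    ok, message = summarize_stream_load(raw)
    print("loaded" if ok else "rejected", "-", message)
```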