[fix](load) fix multi table load plan fail after restart master Fe or leader change #53799
run buildall
liaoxin01
left a comment
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
TPC-H: Total hot run time: 33535 ms
TPC-DS: Total hot run time: 187070 ms
ClickBench: Total hot run time: 32.5 s
dataroaring
left a comment
LGTM
… leader change (apache#53799)

multi table load plan fail after restart master Fe or leader change:

```
mysql> show routine load for test_multi_table\G
***************************
                  Id: 1753247186255
                Name: test2
          CreateTime: 2025-07-23 13:06:53
           PauseTime: NULL
             EndTime: NULL
              DbName: db
           TableName:
        IsMultiTable: true
               State: RUNNING
      DataSourceType: KAFKA
      CurrentTaskNum: 1
       JobProperties: {"max_batch_rows":"3000000","timezone":"Asia/Shanghai","send_batch_parallelism":"1","load_to_single_tablet":"false","column_separator":";'''","line_delimiter":"\n","delete":"*","current_concurrent_number":"1","partial_columns":"false","merge_type":"APPEND","exec_mem_limit":"2147483648","strict_mode":"false","max_batch_interval":"20","max_batch_size":"209715200","escape":"\u0000","enclose":"\u0000","partitions":"**","columnToColumnExpr":"","whereExpr":"*****'',"desired_concurrent_number":"256","precedingFilter":"*","format":"csv","max_error_number":"0","max_filter_ratio":"1.0","sequence_col":"****}
DataSourceProperties: {"topic":"my-topic","currentKafkaPartitions":"0","brokerList":"10.16.10.10.10.77:19092"}
    CustomProperties: {"kafka_default_offsets":"OFFSET_BEGINNING","group.id":"test2_7f6143d8-f270-4667-851a-e8fb87c27d32"}
           Statistic: {"receivedBytes":89,"runningTxns":[1542060502549504],"errorRows":0,"committedTaskNum":0,"loadedRows":1,"LoadRowsRate":0,"abortedTaskNum":7,"errorRowsAfterResumed":0,"totalRows":1,"unselectedRows":0,"receivedBytesRate":1,"taskExecuteTimeMs":51588}
            Progress: {"0":"0"}
                 Lag: {"0":1}
ReasonOfStateChanged:
        ErrorLogUrls:
            OtherMsg: 2025-07-23 13:08:07: [INTERNAL_ERROR]TStatus: AnalysisException: errCode = 2, detailMessage = , connect context's user is null, ComputeGroupException: CURRENT_USER_NO_AUTH_TO_USE_DEFAULT_COMPUTE_GROUP, you can contact the system administrator and request that they grant you the default compute group permissions, use SQL `SHOW PROPERTY like 'default_compute_group'` and `GRANT USAGE_PRIV ON COMPUTE GROUP {compute_group_name} TO {user}`

        0#  doris::Status doris::Status::create<true>(doris::TStatus const&) at /mnt/disk1/laihui/build/ldb_toolchain/bin/../lib/gcc/x86_64-pc-linux-gnu/114/include/g++-v14/bits/basic_string.h:228
        1#  doris::io::MultiTablePipe::request_and_exec_plans() at /mnt/disk1/laihui/doris/be/src/common/status.h:522
        2#  doris::RoutineLoadTaskExecutor::exec_task(std::shared_ptr<doris::StreamLoadContext>, doris::DataConsumerPool*, std::function<void (std::shared_ptr<doris::StreamLoadContext>)>) at /mnt/disk1/laihui/doris/be/src/runtime/routine_load/routine_load_task_executor.cpp:0
        3#  std::_Function_handler<void (), ... (reason is truncated, check fe.log with txnId for details(1
                User: root
             Comment:
```

None

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 -->

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
…master Fe or leader change (#53799) (#53829)

pick (#53799)
What problem does this PR solve?
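Multi-table load planning fails after the master FE restarts or the FE leader changes. `SHOW ROUTINE LOAD` then reports `connect context's user is null` in `OtherMsg`, together with a `ComputeGroupException` (`CURRENT_USER_NO_AUTH_TO_USE_DEFAULT_COMPUTE_GROUP`).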
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)