(cloud-merge) Support aborting txns when the coordinator BE restarts and when doing a schema change #37669
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
ca83c30 to 166a237
run buildall
clang-tidy made some suggestions
// specific language governing permissions and limitations
// under the License.

#include <gen_cpp/cloud.pb.h>
warning: 'gen_cpp/cloud.pb.h' file not found [clang-diagnostic-error]
#include <gen_cpp/cloud.pb.h>
^
run buildall
run buildall
TPC-H: Total hot run time: 39787 ms
TPC-DS: Total hot run time: 173334 ms
ClickBench: Total hot run time: 30.29 s
5db13f0 to 70332d6
run buildall
TPC-H: Total hot run time: 39994 ms
TPC-DS: Total hot run time: 172919 ms
ClickBench: Total hot run time: 30.32 s
run cloud_p0
fe/fe-core/src/main/java/org/apache/doris/transaction/GlobalTransactionMgr.java
70332d6 to caa19ef
run buildall
run buildall
run buildall
TPC-H: Total hot run time: 40310 ms
TPC-DS: Total hot run time: 173016 ms
ClickBench: Total hot run time: 30.66 s
run external
93ad64b to 7bff4b6
7bff4b6 to 0be9160
run buildall
fe/fe-core/src/main/java/org/apache/doris/transaction/GlobalTransactionMgr.java
fe/fe-core/src/main/java/org/apache/doris/transaction/GlobalTransactionMgr.java
run buildall
TPC-H: Total hot run time: 41733 ms
TPC-DS: Total hot run time: 171104 ms
ClickBench: Total hot run time: 29.87 s
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
…schema change (#37669)
When a schema change runs, it checks whether all transactions of the given database whose txnId is less than 'watershedTxnId' have finished. If any have not, the schema change waits.
When the coordinator (FE or BE) restarts, the txns that belong to that coordinator hang until they time out, so they need to be aborted promptly.
There are two optimizations.
1. Check the BE's lastStartTime when its heartbeat is received. If the reported lastStartTime is greater than the lastStartTime held in FE memory, abort all hanging txns that belong to this BE.
2. Check conflicting txns when doing a schema change. If a txn has failed (possibly because its coordinator BE/FE restarted), abort it directly.
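The two optimizations in the commit message above can be illustrated with a small, self-contained Java sketch. It is not the code from this PR: `CoordinatorRestartSketch`, `Txn`, `onHeartbeat`, `previousTxnsFinished`, and the `hasFailed` predicate are all hypothetical names chosen for illustration, and the way a "failed" txn is detected is left to the caller because the description does not spell it out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names are taken from the Doris code base.
final class CoordinatorRestartSketch {
    enum TxnStatus { PREPARE, VISIBLE, ABORTED }

    static final class Txn {
        final long id;
        final long coordinatorBeId;
        TxnStatus status = TxnStatus.PREPARE;

        Txn(long id, long coordinatorBeId) {
            this.id = id;
            this.coordinatorBeId = coordinatorBeId;
        }

        boolean isFinished() {
            return status != TxnStatus.PREPARE;
        }
    }

    // Last start time seen for each BE, as kept in FE memory (assumed layout).
    private final Map<Long, Long> lastStartTimeByBe = new HashMap<>();
    private final List<Txn> txns = new ArrayList<>();

    Txn begin(long id, long coordinatorBeId) {
        Txn txn = new Txn(id, coordinatorBeId);
        txns.add(txn);
        return txn;
    }

    // Optimization 1: a larger reported start time means the BE restarted,
    // so every unfinished txn it coordinated is aborted. Returns the count.
    int onHeartbeat(long beId, long reportedStartTime) {
        Long known = lastStartTimeByBe.get(beId);
        lastStartTimeByBe.put(beId, reportedStartTime);
        if (known == null || reportedStartTime <= known) {
            return 0;
        }
        int aborted = 0;
        for (Txn txn : txns) {
            if (txn.coordinatorBeId == beId && !txn.isFinished()) {
                txn.status = TxnStatus.ABORTED;
                aborted++;
            }
        }
        return aborted;
    }

    // Optimization 2: while checking txns below the watershed, abort the ones
    // the caller considers failed instead of waiting for their timeout.
    // Returns true when the schema change no longer has to wait.
    boolean previousTxnsFinished(long watershedTxnId, Predicate<Txn> hasFailed) {
        boolean allFinished = true;
        for (Txn txn : txns) {
            if (txn.id >= watershedTxnId || txn.isFinished()) {
                continue;
            }
            if (hasFailed.test(txn)) {
                txn.status = TxnStatus.ABORTED;
            } else {
                allFinished = false;
            }
        }
        return allFinished;
    }
}
```

In this sketch the first heartbeat from a BE only records its start time; a later heartbeat with a larger start time triggers the abort. A schema change would call `previousTxnsFinished` repeatedly until it returns true.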
When a schema change runs, it checks whether all transactions of the given database whose txnId is less than 'watershedTxnId' have finished. If any have not, the schema change waits.
When the coordinator (FE or BE) restarts, the txns that belong to that coordinator hang until they time out, so they need to be aborted promptly.
There are two optimizations.