smallx (Contributor) commented Aug 25, 2024

The version number of a unique key merge-on-write (MoW) table is strictly increasing, so for these tables we only need to ensure that the txn status reaches COMMITTED instead of waiting for VISIBLE. Otherwise, binlog replay may loop endlessly on this txn.
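
The proposed wait condition can be sketched as follows. This is only an illustration of the idea described above, not the actual ccr-syncer code: the `TxnStatus` type, its constants, and `txnDone` are hypothetical names, and only the two status values mentioned in this thread are modeled.

```go
package main

import "fmt"

// TxnStatus is a hypothetical stand-in for the transaction status
// reported by the destination cluster. Only the two values named in
// this thread are modeled here.
type TxnStatus string

const (
	TxnCommitted TxnStatus = "COMMITTED"
	TxnVisible   TxnStatus = "VISIBLE"
)

// txnDone reports whether binlog replay may stop waiting on a txn.
// commitIsEnough is an assumed flag that would be true for unique key
// MoW tables, where this PR proposes accepting COMMITTED.
func txnDone(status TxnStatus, commitIsEnough bool) bool {
	if status == TxnVisible {
		return true
	}
	return commitIsEnough && status == TxnCommitted
}

func main() {
	// Waiting strictly for VISIBLE: a txn stuck in COMMITTED never finishes.
	fmt.Println(txnDone(TxnCommitted, false)) // false
	// Proposed behavior for MoW tables: COMMITTED is accepted.
	fmt.Println(txnDone(TxnCommitted, true)) // true
}
```

With the strict condition, a txn that never leaves COMMITTED keeps the waiter retrying, which matches the endless loop shown in the logs below.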

CCR binlog replay error log (repeats endlessly):

ERROR wait transaction done failed, err +[normal] transaction 501 status: COMMITTED job=*** line=base/spec.go:798

Destination Doris cluster FE log (repeats endlessly):

WARN (thrift-server-pool-37|708) [MasterImpl.finishTask():94] finish task reports bad. request: TFinishTaskRequest(backend:TBackend(host:***, be_port:9060, http_port:8040), task_type:PUBLISH_VERSION, signature:501, task_status:TStatus(status_code:INTERNAL_ERROR, error_msgs:[(***)[E-3115]version not continuous for mow, tablet_id=15144, tablet_max_version=1037596, txn_version=1037627]), report_version:17244039530467, error_tablet_ids:[15028, 15036, 15044, 15048, 15056, 15064, 15068, 15076, 15084, 15088, 15096, 15104, 15108, 15116, 15124, 15128, 15136, 15144], succ_tablets:{}, table_id_to_delta_num_rows:{})

The status of txn 501 stays COMMITTED forever, with ErrMsg: wait for publishing partition 15027 version 1037597. self version: 1037627. table 15025.
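
The numbers in these two messages suggest a continuity rule: the txn can be published only when its version is exactly one past the tablet's max version. The sketch below works through that arithmetic. The rule is inferred from the log text, not taken from the Doris source, and `canPublish` and `missingVersions` are hypothetical helpers.

```go
package main

import "fmt"

// canPublish applies the continuity rule inferred from the
// "version not continuous for mow" error: the txn version must be the
// immediate successor of the tablet's max version. This is an
// assumption drawn from the log, not the actual BE check.
func canPublish(tabletMaxVersion, txnVersion int64) bool {
	return txnVersion == tabletMaxVersion+1
}

// missingVersions counts the versions that would have to be published
// before txnVersion becomes publishable.
func missingVersions(tabletMaxVersion, txnVersion int64) int64 {
	return txnVersion - tabletMaxVersion - 1
}

func main() {
	// Values copied from the FE log above.
	const tabletMax, txnVer int64 = 1037596, 1037627

	fmt.Println(canPublish(tabletMax, txnVer))      // false
	fmt.Println(tabletMax + 1)                      // 1037597
	fmt.Println(missingVersions(tabletMax, txnVer)) // 30
}
```

Under this reading, the FE is waiting for version 1037597 while the txn carries version 1037627, so nothing fills the gap and the txn cannot become VISIBLE.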

smallx (Contributor, Author) commented Aug 25, 2024

@JackDrogon @w41ter Please review, thank you.

w41ter (Contributor) commented Aug 26, 2024

@smallx This seems to be a bug in backup/restore:

version not continuous for mow, tablet_id=15144, tablet_max_version=1037596, txn_version=1037627]), report_version:17244039530467, error_tablet_ids:[15028, 15036, 15044, 15048, 15056, 15064, 15068, 15076, 15084, 15088, 15096, 15104, 15108, 15116, 15124, 15128, 15136, 15144

Which version of FE did you test with?

smallx (Contributor, Author) commented Aug 26, 2024

@w41ter Doris 2.0.14

w41ter (Contributor) commented Aug 26, 2024

@smallx apache/doris#38321 fixed a problem that was likely caused by backup/restore; it might be the root cause of the case you hit.

smallx (Contributor, Author) commented Aug 26, 2024

@w41ter Okay, I will try this patch. Thanks.

smallx (Contributor, Author) commented Aug 29, 2024

apache/doris#38321 has been merged into Doris 2.0.14. This issue should be fixed by apache/doris#40118, so I am closing this. :-)

@smallx smallx closed this Aug 29, 2024
@smallx smallx deleted the fix-commit-txn-for-mow-table branch August 29, 2024 10:33