@EmmyMiao87
Contributor

If the FE is restarted after a txn is committed but before it becomes visible, the load job is rescheduled and fails with a "label already exists" error.
The cause is an inconsistency between the transaction of the load job and the metadata of the load job.
The txn attachment therefore needs to be replayed in the function replayOnCommitted.
After that, the load job state and progress are correct.
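
A minimal sketch of the idea in the description, under simplifying assumptions. The names `SketchLoadJob`, `SketchTxnAttachment`, `SketchJobState` and `needsReschedule` are hypothetical stand-ins and not the actual Doris classes. The sketch only illustrates why applying the txn attachment during the committed replay stops a restarted FE from treating the job as unfinished.

```java
// Hypothetical, simplified stand-ins; not the actual Doris classes.
enum SketchJobState { PENDING, LOADING, COMMITTED, FINISHED }

final class SketchTxnAttachment {
    final int progress; // progress recorded by the txn at commit time

    SketchTxnAttachment(int progress) {
        this.progress = progress;
    }
}

final class SketchLoadJob {
    final String label;
    SketchJobState state = SketchJobState.LOADING;
    int progress = 0;

    SketchLoadJob(String label) {
        this.label = label;
    }

    // Copy what the txn recorded at commit time into the job metadata.
    void replayTxnAttachment(SketchTxnAttachment attachment) {
        if (attachment != null) {
            progress = attachment.progress;
        }
    }

    // Replay of a committed txn: sync the attachment, then mark the job committed.
    void replayOnCommitted(SketchTxnAttachment attachment) {
        replayTxnAttachment(attachment);
        state = SketchJobState.COMMITTED;
    }

    // Replay of a visible txn: sync the attachment, then mark the job finished.
    void replayOnVisible(SketchTxnAttachment attachment) {
        replayTxnAttachment(attachment);
        state = SketchJobState.FINISHED;
    }

    // Assumption: a scheduler only resubmits jobs that have not committed yet.
    boolean needsReschedule() {
        return state == SketchJobState.PENDING || state == SketchJobState.LOADING;
    }
}
```

In this sketch, a job whose txn was committed before the restart reports `needsReschedule() == false` after `replayOnCommitted`, so its label would not be submitted a second time.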

@EmmyMiao87 EmmyMiao87 added the kind/fix label (Categorizes issue or PR as related to a bug.) Oct 16, 2019
@EmmyMiao87 EmmyMiao87 added this to the 0.11 milestone Oct 16, 2019
@EmmyMiao87 EmmyMiao87 requested a review from morningman October 16, 2019 02:24
@EmmyMiao87 EmmyMiao87 self-assigned this Oct 16, 2019
@EmmyMiao87
Contributor Author

#1991


@Override
protected void executeReplayOnVisible(TransactionState txnState) {
protected void executeReplayTxnAttachment(TransactionState txnState) {
Contributor

Suggested change
protected void executeReplayTxnAttachment(TransactionState txnState) {
protected void replayTxnAttachment(TransactionState txnState) {

Contributor Author

Changed

map.put(loadJob.getLabel(), jobs);
}
jobs.add(loadJob);
if (!loadJob.isCompleted()) {
Contributor

Add a comment to explain this operation.

Contributor Author

OK

Contributor

I think you should find another way to achieve this. When this function is called, the Catalog is not ready yet, but you are already using items from the Catalog, which is error-prone.

Contributor Author

The callback needs to be added here, because the replay of the txn may use the callback to reload the job state.
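
To illustrate the reply above, here is a minimal sketch under stated assumptions. `TxnCallbackSketch`, `CallbackRegistrySketch`, `ReplayedJobSketch` and `loadFromImage` are hypothetical names and do not reflect the real callback factory API. The sketch shows that a txn replayed after the jobs are loaded can only reach a job that has already registered its callback.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a transaction state-change callback.
interface TxnCallbackSketch {
    long getId();

    void replayOnCommitted();
}

// Hypothetical registry; the real callback factory may differ.
final class CallbackRegistrySketch {
    private final Map<Long, TxnCallbackSketch> callbacks = new HashMap<>();

    void addCallback(TxnCallbackSketch callback) {
        callbacks.put(callback.getId(), callback);
    }

    // Called while replaying a committed txn; returns false if no job registered.
    boolean replayCommitted(long callbackId) {
        TxnCallbackSketch callback = callbacks.get(callbackId);
        if (callback == null) {
            return false;
        }
        callback.replayOnCommitted();
        return true;
    }
}

final class ReplayedJobSketch implements TxnCallbackSketch {
    private final long id;
    private final boolean completed;
    boolean committed = false;

    ReplayedJobSketch(long id, boolean completed) {
        this.id = id;
        this.completed = completed;
    }

    @Override
    public long getId() {
        return id;
    }

    boolean isCompleted() {
        return completed;
    }

    @Override
    public void replayOnCommitted() {
        committed = true;
    }

    // Mirrors the snippet under review: only unfinished jobs register a callback.
    static void loadFromImage(ReplayedJobSketch job, CallbackRegistrySketch registry) {
        if (!job.isCompleted()) {
            registry.addCallback(job);
        }
    }
}
```

Whether the registration belongs at this point or in a later preparation step is exactly what the reviewers debate here; the sketch takes no position on that.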

Contributor

@morningman morningman left a comment

Some other questions:

  1. The isCommitting and isCancellable variables in LoadJob.java are confusing. Please add some comments.
    Also, isCancellable appears to be useless.

}
jobs.add(loadJob);
if (!loadJob.isCompleted()) {
Catalog.getCurrentGlobalTransactionMgr().getCallbackFactory().addCallback(loadJob);
Contributor

This should be done in prepareJobs();

writeLock();
try {
executeReplayOnVisible(txnState);
executeReplayTxnAttachment(txnState);
Contributor

I think the name executeReplayTxnAttachment should be changed.
Otherwise, calling a "replay" method from a "non-replay" method looks weird.

imay previously approved these changes Oct 16, 2019
Contributor

@imay imay left a comment

LGTM

…he txn has been finished

Contributor

@morningman morningman left a comment

LGTM

Contributor

@imay imay left a comment

LGTM

@morningman morningman merged commit ac16318 into apache:master Oct 16, 2019
wuyunfeng pushed a commit to wuyunfeng/incubator-doris that referenced this pull request Oct 22, 2019
…he txn has been finished (apache#1992)

morningman pushed a commit to morningman/doris that referenced this pull request Dec 10, 2019
…he txn has been finished (apache#1992)

swjtu-zhanglei pushed a commit to swjtu-zhanglei/incubator-doris that referenced this pull request Jul 25, 2023