Skip to content

MINOR: Refactor code for restoring tasks#5768

Merged
mjsax merged 6 commits intoapache:trunkfrom
mjsax:minor-task-cleanup
Nov 23, 2018
Merged

MINOR: Refactor code for restoring tasks#5768
mjsax merged 6 commits intoapache:trunkfrom
mjsax:minor-task-cleanup

Conversation

@mjsax
Copy link
Copy Markdown
Member

@mjsax mjsax commented Oct 9, 2018

Only StreamTasks can be restored. Thus, moving restoring logic into AssignedStreamTasks class

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@mjsax mjsax added the streams label Oct 9, 2018
@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Oct 9, 2018

Call for review @guozhangwang @bbejeck @vvcephei

Copy link
Copy Markdown
Contributor

@guozhangwang guozhangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this cleanup! Thanks for doing this.

This is orthogonal to the PR, but while reviewing this I found that today it seems we always try to restore a task even if the store is not loggingEabled, what will happen if loggingEnabled is false?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: @Override.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Override.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for testing only? If yes let's make it at the bottom of the class with comments.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about add the interface to RestoringTasks? And then in initializeNewTasks we can:

                if (!entry.getValue().initializeStateStores()) {
                    log.debug("Transitioning {} {} to restoring", taskTypeName, entry.getKey());
                    ((RestoringTasks) this).addToRestoring(entry.getValue());
                } else {
                    transitionToRunning(entry.getValue());
                }

We can also move updateRestored to that interface as well.

Copy link
Copy Markdown
Member Author

@mjsax mjsax Nov 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 100% sure about this.

The original motivation to add the method addToRestoring to AssignedTasks was to avoid the cast in initializeNewTasks -- it's fair request to get rid of this exception code and trade in a cast. Neither are nice solutions.

However, adding both methods to RestoringTasks does not improve the situation, as we still need the cast (we could also directly cast to AssignedStreamTasks instead) while we loose package private visibility and both methods are public if we add them to RestoringTasks.

Also RestoringTasks was added for a different purpose and adding more method would "drill a hole" into the API that we might not want. (The idea of RestoringTasks was to protect access to AssignedStreamsTask from StoreChangelogReader.)

Thoughts?

Copy link
Copy Markdown
Member

@bbejeck bbejeck Nov 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree with @mjsax regarding the change in that adding the methods to RestoringTasks we are still left with a cast.

But IMHO the trade-off is worth it when considering the interaction with the StoreChangelogReader as the StoreChangelogReader should not have exposure to AssignedStreamsTask methods, but only the methods related to restoring as presented in the RestoringTasks interface.

Either way, it's a trade-off, but again IMHO presenting the StoreChangelogReader with an interface limited to only restoring concerns makes the most sense to me.

@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Nov 4, 2018

This is orthogonal to the PR, but while reviewing this I found that today it seems we always try to restore a task even if the store is not loggingEabled, what will happen if loggingEnabled is false?

Good question. I was looking into the code a little bit. It seems, this would not really be a problem. The task is added to restoring and also for all task partitions it's added to retoringByPartition (if all stores have loggingDisabled, we would still add for input topic partitions). In AssingedStreamTasks#updateRestored we would remove it from restored via

            if (restoredPartitions.containsAll(task.changelogPartitions())) {
                transitionToRunning(task);
                it.remove();

(if all stores have loggingDisabled task.changelogPartitions() would be empty and contiansAll() would return true).

We only don't clean up restoringByPartition it seems... Thoughts?

@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Nov 6, 2018

Call for review @bbejeck @vvcephei

Please consider and comment: #5768 (comment)

@mjsax mjsax force-pushed the minor-task-cleanup branch from 86e6ae9 to 09fee8f Compare November 6, 2018 05:19
Copy link
Copy Markdown
Member

@bbejeck bbejeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I don't have any further comments beyond what @guozhangwang has presented already. The cleanup looks good to me.

I've left my comments for #5768 (comment) in-line.

@bbejeck
Copy link
Copy Markdown
Member

bbejeck commented Nov 15, 2018

Can we run the EOS system tests as one last check before merging?

@guozhangwang
Copy link
Copy Markdown
Contributor

retest this please

@guozhangwang
Copy link
Copy Markdown
Contributor

Can we run the EOS system tests as one last check before merging?

+1

@guozhangwang
Copy link
Copy Markdown
Contributor

We only don't clean up restoringByPartition it seems... Thoughts?

Thanks for looking into this! I think it's okay to leave it in restoringByPartition since we do not actively remove entries from it until cleanup anyways, and we only fetch entries from it is restoringTaskFor which is not affected. So keeping this map entry does not hurt much.

@guozhangwang
Copy link
Copy Markdown
Contributor

LGTM. @mjsax please feel free to merge if the EOS system test passed as well.

@mjsax mjsax force-pushed the minor-task-cleanup branch from 09fee8f to 6f0cef5 Compare November 17, 2018 03:18
@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Nov 17, 2018

Remove the methods from RestoringTasks interface, as discussed, and rebased. If not further comments come in, I'll merge after build passed.

@guozhangwang
Copy link
Copy Markdown
Contributor

Checkstyle failure:

21:32:10 FAILURE: Build failed with an exception.
21:32:10 
21:32:10 * What went wrong:
21:32:10 Execution failed for task ':streams:checkstyleMain'.

All tests passed.

@guozhangwang
Copy link
Copy Markdown
Contributor

Remove the methods from RestoringTasks interface, as discussed, and rebased. If not further comments come in, I'll merge after build passed.

Did you run EOS system test to verify it passed?

@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Nov 18, 2018

Fixed checkstyle error.

Triggered system test: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/2085/

@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Nov 19, 2018

Some system tests failed because of unrelated bug. Filed #5928 to fix the system tests. Can rebase this PR after fix is merged.

Build timed out. Will retrigger Jenkins after rebase.

@mjsax mjsax force-pushed the minor-task-cleanup branch from e7dfdbc to 94a941e Compare November 20, 2018 02:46
@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Nov 20, 2018

Rebased to pick up system test fixes from #5928. Triggered system tests again: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/2090/

@mjsax mjsax force-pushed the minor-task-cleanup branch from 7109bb4 to 93b7daf Compare November 21, 2018 20:59
@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Nov 21, 2018

Rebased to pick up system test fix. Re-triggered system tests: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/2097/

@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Nov 23, 2018

System test passed. Merging.

@mjsax mjsax merged commit d0ed389 into apache:trunk Nov 23, 2018
@mjsax mjsax deleted the minor-task-cleanup branch November 23, 2018 20:32
pengxiaolong pushed a commit to pengxiaolong/kafka that referenced this pull request Jun 14, 2019
Reviewers: Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants