KAFKA-7956 In ShutdownableThread, immediately complete the shutdown if the thread has not been started#6218

Merged
junrao merged 9 commits into apache:trunk from gardnervickers:shutdownablethread-nonblocking-shutdown on Feb 26, 2019

@gardnervickers
Contributor

@gardnervickers gardnervickers commented Feb 1, 2019

In some test cases it's desirable to instantiate a subclass of ShutdownableThread without starting it. Since most subclasses of ShutdownableThread put cleanup logic in ShutdownableThread.shutdown(), being able to call shutdown() on the non-running thread would be useful.

This change allows us to avoid blocking in ShutdownableThread.shutdown() if the thread's run() method has not been called. We also add a check that initiateShutdown() was called before awaitShutdown(), to protect against the case where a user calls awaitShutdown() before the thread has been started, and unexpectedly is not blocked on the thread shutting down.
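The behavior described above could be sketched roughly as follows. `SketchShutdownableThread` is a hypothetical, simplified stand-in, not Kafka's `kafka.utils.ShutdownableThread` (logging and fatal-error handling are omitted), and it assumes the `isStarted` flag is set at the top of `run()`, which is one of the two placements debated later in this thread.

```scala
import java.util.concurrent.CountDownLatch

// Hypothetical, simplified stand-in for kafka.utils.ShutdownableThread (no logging,
// no FatalExitError handling). Subclasses implement doWork(), which runs in a loop.
abstract class SketchShutdownableThread(name: String) extends Thread(name) {
  private val shutdownInitiated = new CountDownLatch(1)
  private val shutdownComplete = new CountDownLatch(1)
  // Assumption: the flag is set when run() begins, so a thread whose run() never
  // executed is treated as having nothing to wait for.
  @volatile private var isStarted: Boolean = false

  def doWork(): Unit

  def isRunning: Boolean = shutdownInitiated.getCount > 0

  // Step 1: signal the loop to stop; returns false if shutdown was already initiated.
  def initiateShutdown(): Boolean = this.synchronized {
    if (isRunning) {
      shutdownInitiated.countDown()
      interrupt() // unblock doWork() if it is waiting in an interruptible call
      true
    } else false
  }

  // Step 2: wait for run() to finish, but only if run() actually started.
  def awaitShutdown(): Unit = {
    if (isRunning)
      throw new IllegalStateException("initiateShutdown() was not called before awaitShutdown()")
    if (isStarted)
      shutdownComplete.await()
  }

  // Returns immediately for a thread whose run() never executed.
  def shutdown(): Unit = {
    initiateShutdown()
    awaitShutdown()
  }

  override def run(): Unit = {
    isStarted = true
    try {
      while (isRunning) doWork()
    } catch {
      case _: InterruptedException => () // expected when interrupted during shutdown
    } finally {
      shutdownComplete.countDown()
    }
  }
}
```

One trade-off remains in this sketch: if `shutdown()` races with `start()` before `run()` has set the flag, `awaitShutdown()` returns without waiting.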

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@gardnervickers gardnervickers changed the title In ShutdownableThread, immediately complete the shutdown if the thread is not alive In ShutdownableThread, immediately complete the shutdown if the has not been started Feb 1, 2019
@gardnervickers gardnervickers force-pushed the shutdownablethread-nonblocking-shutdown branch from dac346a to d20095e Compare February 1, 2019 22:06
shutdownComplete latch if the thread is not running.
@ijuma ijuma requested a review from junrao February 2, 2019 19:19
@gardnervickers gardnervickers changed the title In ShutdownableThread, immediately complete the shutdown if the has not been started In ShutdownableThread, immediately complete the shutdown if the thread has not been started Feb 4, 2019
     def shutdown(): Unit = {
       initiateShutdown()
    -  awaitShutdown()
    +  if (this.isAlive) {
Contributor

I think it'd be clearer if this logic was moved to awaitShutdown.

Contributor Author

Makes sense, thanks!

Contributor Author

@dhruvilshah3 actually I think awaitShutdown() needs to block regardless of the thread being alive or not. There could be cases where the user calls awaitShutdown() from a thread before starting the ShutdownableThread. The goal I had here was to just short-circuit shutdown() if the thread was never started.

Contributor

@dhruvilshah3 dhruvilshah3 left a comment

Thanks for the PR, LGTM. Just a minor comment.

Contributor

@junrao junrao left a comment

@gardnervickers : Thanks for the patch. A couple of comments below.

Comment thread core/src/main/scala/kafka/utils/ShutdownableThread.scala Outdated

    if (this.isAlive) {
      awaitShutdown()
    } else {
      shutdownComplete.countDown()
Contributor

In terms of usage, a user could call (1) shutdown(), or (2) initiateShutdown(); some other code; awaitShutdown(). In the case when a thread is never started, the patch makes sure that (1) doesn't block. For consistency, it seems that we should make (2) not blocking too?

Contributor Author

Hmm, I'm not sure. My interpretation of awaitShutdown() was that it should always block the caller until shutdown is complete, regardless of whether the thread has been started.

Looking through the codebase, it seems like it's mostly used when we know the thread is already started, so it's probably best to do the this.isAlive check in awaitShutdown() in order to make both (1) and (2) non-blocking.

Contributor

@junrao junrao left a comment

@gardnervickers : Thanks for the updated patch. One more suggestion below.

     def awaitShutdown(): Unit = {
    -  shutdownComplete.await()
    +  if (this.isAlive)
    +    shutdownComplete.await()
Contributor

There are 2 ways to shut down a ShutdownableThread: (1) Just call shutdown(), which blocks until the thread completes. (2) First call initiateShutdown() and then call awaitShutdown(). The reason for this style is to allow the caller to add additional logic between the 2 calls. For example, if the thread can block in a selector, one can wake up the selector between the 2 calls. If the thread wasn't started or died unexpectedly, it's reasonable not to block the shutdown in either case.

Also, just calling awaitShutdown() w/o calling initiateShutdown() first is unexpected and not supported. To protect against this, perhaps we can add a check that the shutdownInitiated latch count is already 0 when awaitShutdown() is called, and throw an IllegalStateException otherwise.
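Pattern (2) above, with extra logic between the two calls, might look like the following minimal sketch. `SelectorPoller` is a hypothetical class (not Kafka code) that blocks in `java.nio.channels.Selector.select()` with no channels registered, so calling `Selector.wakeup()` between `initiateShutdown()` and `awaitShutdown()` is what lets the loop observe the shutdown. Its `awaitShutdown()` includes the `IllegalStateException` check suggested above.

```scala
import java.nio.channels.Selector
import java.util.concurrent.CountDownLatch

// Hypothetical poller illustrating the two-step shutdown; not Kafka code.
class SelectorPoller extends Thread("selector-poller") {
  val selector: Selector = Selector.open()
  private val shutdownInitiated = new CountDownLatch(1)
  private val shutdownComplete = new CountDownLatch(1)

  def isRunning: Boolean = shutdownInitiated.getCount > 0

  def initiateShutdown(): Unit = shutdownInitiated.countDown()

  def awaitShutdown(): Unit = {
    // The check proposed in the review: calling this before initiateShutdown() is a bug.
    if (isRunning)
      throw new IllegalStateException("initiateShutdown() was not called before awaitShutdown()")
    shutdownComplete.await()
  }

  override def run(): Unit =
    try {
      while (isRunning)
        selector.select() // blocks until wakeup(), an interrupt, or a ready channel
    } finally {
      shutdownComplete.countDown()
    }
}

val poller = new SelectorPoller
poller.start()
poller.initiateShutdown() // step 1: ask the loop to stop
poller.selector.wakeup()  // caller-specific logic: unblock select() so the loop sees the flag
poller.awaitShutdown()    // step 2: wait until run() has exited its loop
poller.join()
poller.selector.close()
```

If `wakeup()` happens before the poller reaches `select()`, the next `select()` returns immediately, so the ordering is safe either way.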

Contributor Author

Thanks! I believe we can cover both cases here if we check in awaitShutdown() that the thread has been started before blocking? Perhaps I'm misunderstanding though.

I'll add a check that shutdownInitiated == 0.

@gardnervickers
Contributor Author

During testing I found Thread.isAlive isn't consistent with the actual thread status. Occasionally Thread.isAlive would return false when being queried from a running thread. I will introduce an alternate strategy for tracking thread liveness.

@gardnervickers gardnervickers force-pushed the shutdownablethread-nonblocking-shutdown branch from 0b596ed to 7c6ac0d Compare February 18, 2019 20:18
@gardnervickers
Contributor Author

I updated the PR to set an isStarted flag in start() instead of relying on Thread.isAlive().

@gardnervickers
Contributor Author

I also have an alternate patch available which removes the usage of CountDownLatch in favor of explicitly defining the thread states and enforcing a state transition order.
https://github.com/apache/kafka/compare/trunk...gardnervickers:shutdownablethread-nonblocking-shutdown-new?expand=1

Beyond attempting to make explicit the various states a ShutdownableThread can be in, it removes the need to ensure call ordering between initiateShutdown() and awaitShutdown(). I can replace this current patch with the new one if it seems like the refactor is worth it.
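The general idea of explicit thread states with an enforced transition order could be sketched roughly as below. The state names and the `StateTracker` class are hypothetical; the linked branch is not reproduced in this thread, so this only illustrates rejecting out-of-order transitions with `AtomicReference.compareAndSet`, including a direct path from "not started" to "stopped" for the case this PR targets.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

// Hypothetical lifecycle states for a shutdownable thread.
sealed trait ThreadState
case object NotStarted extends ThreadState
case object Running extends ThreadState
case object ShuttingDown extends ThreadState
case object Stopped extends ThreadState

final class StateTracker {
  private val state = new AtomicReference[ThreadState](NotStarted)

  // Legal transitions: NotStarted -> Running -> ShuttingDown -> Stopped, plus
  // NotStarted -> Stopped for a thread that is shut down without ever running.
  private def allowed(from: ThreadState, to: ThreadState): Boolean = (from, to) match {
    case (NotStarted, Running) | (Running, ShuttingDown) |
         (ShuttingDown, Stopped) | (NotStarted, Stopped) => true
    case _ => false
  }

  // Atomically moves to `to` if that is legal from the current state; returns whether it moved.
  @tailrec
  final def transition(to: ThreadState): Boolean = {
    val from = state.get()
    if (!allowed(from, to)) false
    else if (state.compareAndSet(from, to)) true
    else transition(to) // lost a race with another thread; re-check against the new state
  }

  def current: ThreadState = state.get()
}
```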

@gardnervickers
Contributor Author

retest this please

Contributor

@junrao junrao left a comment

@gardnervickers : Since the usage of ShutdownableThread is mostly simple, perhaps we can just take the current patch for now. Could you file a jira and include it in the PR title so that we can track it? Just one other comment below.

      super.start()
    }
    override def run(): Unit = {
      isStarted = true
Contributor

Hmm, do we need to set isStarted here again since it's already set in start()?

Contributor Author

@gardnervickers gardnervickers Feb 19, 2019

I was trying to cover the case where run() is called without start(), either directly or by an executor implementation.

Contributor

In that case, could we just set it in run() and not in start()?

Contributor Author

I added it to start() because there could be a delay between start() being called and run() being called in the new thread, but I think this ends up producing inconsistent behavior depending on how the ShutdownableThread is started.

In the current state, if start() is used to start the ShutdownableThread, we're guaranteed to always correctly block in subsequent calls to awaitShutdown() since isStarted is set from the caller thread. If the ShutdownableThread is submitted to an executor, run() can be called at any time in the future, which makes it impossible for the caller thread to know if awaitShutdown() will block as we expect.

We end up with different guarantees based on how the class is run, which is not ideal. I think the only guarantee we can make across both start() and an Executor starting the thread is that the caller thread won't know when isStarted becomes true, and as a result, when it can rely on the blocking behavior of awaitShutdown().
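The inconsistency described above can be demonstrated with a small sketch. `FlagSetInStart` is a hypothetical class that, like the earlier revision of this patch, sets `isStarted` in an overridden `start()`. Because `Thread` implements `Runnable`, an `ExecutorService` can call its `run()` directly and bypass `start()` entirely.

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Hypothetical class: records isStarted in start(), as an earlier revision of the patch did.
class FlagSetInStart extends Thread("flag-set-in-start") {
  @volatile var isStarted: Boolean = false
  @volatile var ran: Boolean = false

  override def start(): Unit = {
    isStarted = true // set on the caller's thread, before the new thread runs
    super.start()
  }

  override def run(): Unit = {
    ran = true
  }
}

// Path 1: start() sets the flag synchronously, so the caller sees it right away.
val viaStart = new FlagSetInStart
viaStart.start()
val flagVisibleAfterStart = viaStart.isStarted
viaStart.join()

// Path 2: an executor invokes run() directly; start() never runs, so the flag stays false.
val pool = Executors.newSingleThreadExecutor()
val viaExecutor = new FlagSetInStart
pool.submit(viaExecutor: Runnable).get(5, TimeUnit.SECONDS)
pool.shutdown()
```

Moving the assignment into `run()` makes both paths behave the same, at the cost noted above: the caller can no longer tell when the flag becomes true.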

@gardnervickers gardnervickers changed the title In ShutdownableThread, immediately complete the shutdown if the thread has not been started KAFKA-7956 In ShutdownableThread, immediately complete the shutdown if the thread has not been started Feb 19, 2019
@gardnervickers
Contributor Author

retest this please

Contributor

@junrao junrao left a comment

@gardnervickers : Thanks for the patch. LGTM. Just waiting for the tests to pass.

@gardnervickers
Contributor Author

retest this please

@junrao
Contributor

junrao commented Feb 22, 2019

@gardnervickers : The test actually revealed a real issue.

    15:00:11 kafka.controller.ControllerEventManagerTest > testEventThatThrowsException STARTED
    15:00:11 kafka.controller.ControllerEventManagerTest.testEventThatThrowsException failed, log available in /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk8-scala2.11/core/build/reports/testOutput/kafka.controller.ControllerEventManagerTest.testEventThatThrowsException.test.stdout
    15:00:11
    15:00:11 kafka.controller.ControllerEventManagerTest > testEventThatThrowsException FAILED
    15:00:11     java.lang.IllegalStateException: initiateShutdown() was not called before awaitShutdown()
    15:00:11         at kafka.utils.ShutdownableThread.awaitShutdown(ShutdownableThread.scala:59)
    15:00:11         at kafka.controller.ControllerEventManager.close(ControllerEventManager.scala:65)
    15:00:11         at kafka.controller.ControllerEventManagerTest.tearDown(ControllerEventManagerTest.scala:38)

The issue is in ControllerEventManager.close(), which has the following logic:

    clearAndPut(KafkaController.ShutdownEventThread)
    thread.awaitShutdown()

We put a ShutdownEventThread event into the queue, which will be processed by the ControllerEventThread. When this thread handles the ShutdownEventThread event, it calls initiateShutdown(). Since this happens async, there is no guarantee that initiateShutdown() will be called before awaitShutdown().

We can probably change this logic a bit. In ControllerEventManager.close(), we can do:

    thread.initiateShutdown()
    clearAndPut(KafkaController.ShutdownEventThread)
    thread.awaitShutdown()

Then, we can just get rid of initiateShutdown() when handling the ShutdownEventThread event in the ControllerEventThread.
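The reordering suggested above can be sketched with a hypothetical `EventThread` loosely modeled on the controller's event thread; the class, event names, and the `close()` helper are assumptions, not Kafka's code, and `clear()` plus `put()` stands in for `clearAndPut`. `close()` initiates the shutdown on the caller's thread, then enqueues the shutdown event only to unblock `queue.take()`, and the handler ignores that event.

```scala
import java.util.concurrent.{CountDownLatch, LinkedBlockingQueue}

// Hypothetical events, loosely modeled on the controller's event queue.
sealed trait Event
case object ShutdownEventThread extends Event
final case class Work(id: Int) extends Event

class EventThread extends Thread("event-thread") {
  val queue = new LinkedBlockingQueue[Event]()
  private val shutdownInitiated = new CountDownLatch(1)
  private val shutdownComplete = new CountDownLatch(1)

  def isRunning: Boolean = shutdownInitiated.getCount > 0
  def initiateShutdown(): Unit = shutdownInitiated.countDown()

  def awaitShutdown(): Unit = {
    if (isRunning)
      throw new IllegalStateException("initiateShutdown() was not called before awaitShutdown()")
    shutdownComplete.await()
  }

  private def doWork(): Unit = queue.take() match {
    case ShutdownEventThread =>
      // The shutting down of the thread has been initiated at this point. Just ignore this event.
    case Work(_) =>
      // Event processing omitted in this sketch.
  }

  override def run(): Unit =
    try {
      while (isRunning) doWork()
    } finally {
      shutdownComplete.countDown()
    }
}

// Proposed ordering: initiate first (on the caller's thread), then wake the thread, then wait.
def close(thread: EventThread): Unit = {
  thread.initiateShutdown()
  thread.queue.clear() // stands in for clearAndPut(...)
  thread.queue.put(ShutdownEventThread)
  thread.awaitShutdown()
}
```

In this sketch, calling `awaitShutdown()` before `initiateShutdown()` fails fast with the same `IllegalStateException` message that appears in the test log above.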

@gardnervickers
Contributor Author

Thanks! I pushed a fix for that. I noticed there were a few more tests failing but I missed the test result caching window (again) so I'll track those down too.

Contributor

@junrao junrao left a comment

@gardnervickers : Thanks for the updated PR. LGTM. Just a minor comment below.

     override def doWork(): Unit = {
       queue.take() match {
    -    case KafkaController.ShutdownEventThread => initiateShutdown()
    +    case KafkaController.ShutdownEventThread =>
Contributor

Could we add a comment like "// The shutting down of the thread has been initiated at this point. Just ignore this event"?

Contributor Author

Yes, sounds good!

@junrao junrao merged commit bd6520a into apache:trunk Feb 26, 2019
jarekr pushed a commit to confluentinc/kafka that referenced this pull request Apr 18, 2019
* AK/trunk: (36 commits)
  KAFKA-7962: Avoid NPE for StickyAssignor (apache#6308)
  Address flakiness of CustomQuotaCallbackTest#testCustomQuotaCallback (apache#6330)
  KAFKA-7918: Inline generic parameters Pt. II: RocksDB Bytes Store and Memory LRU Caches (apache#6327)
  MINOR: fix parameter naming (apache#6316)
  KAFKA-7956 In ShutdownableThread, immediately complete the shutdown if the thread has not been started (apache#6218)
  MINOR: Refactor replica log dir fetching for improved logging (apache#6313)
  [TRIVIAL] Remove unused StreamsGraphNode#repartitionRequired (apache#6227)
  MINOR: Increase produce timeout to 120 seconds (apache#6326)
  KAFKA-7918: Inline generic parameters Pt. I: in-memory key-value store (apache#6293)
  MINOR: Fix line break issue in upgrade notes (apache#6320)
  KAFKA-7972: Use automatic RPC generation in SaslHandshake
  MINOR: Enable capture of full stack trace in StreamTask#process (apache#6310)
  KAFKA-7938: Fix test flakiness in DeleteConsumerGroupsTest (apache#6312)
  KAFKA-7937: Fix Flaky Test ResetConsumerGroupOffsetTest.testResetOffsetsNotExistingGroup (apache#6311)
  MINOR: Update docs to say 2.2 (apache#6315)
  KAFKA-7672 : force write checkpoint during StreamTask #suspend (apache#6115)
  KAFKA-7961; Ignore assignment for un-subscribed partitions (apache#6304)
  KAFKA-7672: Restoring tasks need to be closed upon task suspension (apache#6113)
  KAFKA-7864; validate partitions are 0-based (apache#6246)
  KAFKA-7492 : Updated javadocs for aggregate and reduce methods returning null behavior. (apache#6285)
  ...
pengxiaolong pushed a commit to pengxiaolong/kafka that referenced this pull request Jun 14, 2019
…f the thread has not been started (apache#6218)

In some test cases it's desirable to instantiate a subclass of `ShutdownableThread` without starting it. Since most subclasses of `ShutdownableThread` put cleanup logic in `ShutdownableThread.shutdown()`, being able to call `shutdown()` on the non-running thread would be useful.

This change allows us to avoid blocking in `ShutdownableThread.shutdown()` if the thread's `run()` method has not been called. We also add a check that `initiateShutdown()` was called before `awaitShutdown()`, to protect against the case where a user calls `awaitShutdown()` before the thread has been started, and unexpectedly is not blocked on the thread shutting down.

Reviewers : Dhruvil Shah <dhruvil@confluent.io>, Jun Rao <junrao@gmail.com>

4 participants