Skip to content

MINOR: MiniKdc JVM shutdown hook fix#7946

Merged
rajinisivaram merged 8 commits intoapache:trunkfrom
rondagostino:MINOR_MiniKdc_fix
Jan 24, 2020
Merged

MINOR: MiniKdc JVM shutdown hook fix#7946
rajinisivaram merged 8 commits intoapache:trunkfrom
rondagostino:MINOR_MiniKdc_fix

Conversation

@rondagostino
Copy link
Copy Markdown
Contributor

When started as a separate process (which happens only in system tests), the shutdown hook for MiniKdc was running immediately, which causes tests that depend on it for Kerberos to fail. The system tests in zookeeper_security_upgrade_test.py, for example, fail because ZooKeeper is unable to contact the KDC.

I ran the above system test locally and confirmed it failed prior to the fix and succeeded after the fix:

WITHOUT FIX:

$ ducktape tests/kafkatest/tests/core/zookeeper_security_upgrade_test.py
[INFO  - 2020-01-13 09:19:04,249 - runner_client - log - lineno:240]: RunnerClient: kafkatest.tests.core.zookeeper_security_upgrade_test.ZooKeeperSecurityUpgradeTest.test_zk_security_upgrade.security_protocol=PLAINTEXT: Summary: Zookeeper node failed to start
Traceback (most recent call last):
...
test_id:    kafkatest.tests.core.zookeeper_security_upgrade_test.ZooKeeperSecurityUpgradeTest.test_zk_security_upgrade.security_protocol=PLAINTEXT
status:     FAIL

WITH FIX:

$ ducktape tests/kafkatest/tests/core/zookeeper_security_upgrade_test.py
[INFO:2020-01-13 09:24:08,072]: RunnerClient: kafkatest.tests.core.zookeeper_security_upgrade_test.ZooKeeperSecurityUpgradeTest.test_zk_security_upgrade.security_protocol=PLAINTEXT: PASS
...
test_id:    kafkatest.tests.core.zookeeper_security_upgrade_test.ZooKeeperSecurityUpgradeTest.test_zk_security_upgrade.security_protocol=PLAINTEXT
status:     PASS

Copy link
Copy Markdown
Member

@ijuma ijuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for finding and fixing the issue. Could we add a unit or integration test that verifies the fix? That will help us not regress in the same way in the future.

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Jan 13, 2020

ok to test

@rondagostino
Copy link
Copy Markdown
Contributor Author

Thanks for finding and fixing the issue. Could we add a unit or integration test that verifies the fix? That will help us not regress in the same way in the future.

@ijuma I can think of two ways but not sure which one (if either) is appropriate.

  1. The brute force approach. Actually build and start a separate MiniKdc process, and test to make sure it doesn't exit right away. This would entail setting up the CLASSPATH environment variable with all the required jars, creating appropriate Kerberos config files, etc. It seems like a lot of work, and it isn't going to test anything else that creates a JVM shutdown hook -- it would just apply to the MiniKdc class. We add JVM shutdown hooks in 7 different places by my count, and all but the MiniKdc follow this pattern (MiniKdc leverages KafkaThread instead of Thread):
      Runtime.getRuntime().addShutdownHook(new Thread("<threadName>") {
        override def run(): Unit = {...}
      })

I think the possibility of regressing exists in the MiniKdc case because KafkaThread has so many overloaded constructors that it is possible to mistakenly not pass in the Runnable.

A similar potential for mistake exists with the other 6 cases (that create a Thread rather than a KafkaThread) as well because Thread isn't abstract; it is possible to mistakenly not override the run() method with this syntactically valid code:

      Runtime.getRuntime().addShutdownHook(new Thread("<threadName>") {
        miniKdc.stop() // no override def run() = good syntax but bad semantics
      })

Which leads us to the second possibility:

  1. Define a separate utility method that creates a JVM shutdown hook based on a provided instance of Runnable and a thread name:
  def addShutdownHook(threadName: String, runnable: Runnable): Unit = {
    Runtime.getRuntime.addShutdownHook(new KafkaThread(threadName, runnable, false))
  }

Such a utility method eliminates the possibility of mistakenly not overriding the run() method because this code is syntactically invalid:

addShutdownHook("minikdc-shutdown-hook", {miniKdc.stop()}) // does not compile

Static type checking that forces us to write it correctly:

SomeUtilityObject.addShutdownHook("minikdc-shutdown-hook", new Runnable() {override def run = miniKdc.stop()})

A test would make sure that whatever is passed in doesn't actually run right away.

A process that leverages the method wouldn't need so much configuration (just the test jar on the CLASSPATH and no Kerberos configs), so if we wanted to we could actually test that the Runnable in fact really does run when the JVM shuts down by creating a process and making the shutdown hook create a file or something like that when we kill the process.

A disadvantage of this second approach is that we won't test that MiniKdc (or anything else) actually applies a shutdown hook -- we would only be testing that any hook -- if applied -- will be called at the right time.

(2) seems like the better choice, but since shutdown hooks are generally difficult to test I wanted to confirm that the approach of writing the utility method seems reasonable and worth it before moving ahead.

@cmccabe
Copy link
Copy Markdown
Contributor

cmccabe commented Jan 13, 2020

I'm not sure that having a separate utility method is really that much more error-proof than having a constructor for KafkaThread like we do now. After all, you could always forget to call the utility method.

Since this is tested through ducktape, maybe that's enough for now? Unless we really want to add a junit test of spawning this thing in a separate JVM...

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Jan 13, 2020

Can you not call start and check if MiniKdc is not stopped at the end of the method call?

@rondagostino
Copy link
Copy Markdown
Contributor Author

@ijuma Do you mean pass in something at the command line (e.g. --testing-do-not-start) that will cause private def start(...) to only register the shutdown hook? That seems kind of contrived to me and doesn't address any of the other 6 places where shutdown hooks are used.

I was tending to agree more with @cmccabe, that maybe this isn't worth trying to test, but it did occur to me that creating a utility method and doing in-process testing (i.e. confirm the shutdown hook is not being called immediately) would be quick to implement, and doing so would provide the static type checking benefits relatively cheaply. Out-of-process testing to confirm that the utility method is actually registering the hook seems like overkill, but we could creating something akin to kafka.util.Exit(e.g. kafka.util.Runtime) to confirm it -- this would also be relatively inexpensive to do.

It is expensive to confirm that anything spawning as a process is adding the hook. Maybe we just skip this part and optimize for cost/benefit as opposed to perfection?

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Jan 13, 2020

@rondagostino I mean simply calling start in a test. No forking needed. Since stop will be called immediately, MiniKdc will be closed soon after the method returns. What am I missing?

@rondagostino
Copy link
Copy Markdown
Contributor Author

@ijuma That method is defined as private def start(...) on object MiniKdc. Would we have to change it to be public instead? We would also have to make a stop() method so we could tear down at the end of the test.

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Jan 14, 2020

You can make it private[minikdc]. You don't really need to add a stop method since the returned MiniKdc has a stop method.

@rondagostino
Copy link
Copy Markdown
Contributor Author

@ijuma I'm think a two-pronged approach would be quick and effective:

  1. Address the root cause of this particular occurrence of the problem -- ineffective static type checking -- via a separate utility method on the Exit class (Exit.addShutdownHook(name: String, runnable: Runnable)` that we can quickly test. Make all 7 cases leverage the new method.

  2. Add a test for the MinKdc class itself to make sure the JVM doesn't exit.

Unless I hear otherwise I'll go ahead with both.

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Jan 15, 2020

Sounds good. I agree that a utility method that takes a Runnable instead of a Thread is useful.

@rondagostino
Copy link
Copy Markdown
Contributor Author

@ijuma I changed the 7 Scala classes that were calling Runtime.getRuntime().addShutdownHook() and made them invoke kafka.utils.Exit.addShutdownHook() instead. I also changed the 20 instances of Java classes that were calling Runtime.getRuntime().addShutdownHook() and made them invoke org.apache.kafka.common.utils.Exit.addShutdownHook() instead.

There were 5 Java instances under streams/example/... and 3 Java instances under streams/quickstart/... that I did not modify; I thought perhaps these are supposed to represent code snippets that people can copy and it might be best for that code to not invoke non-public APIs.

@rondagostino
Copy link
Copy Markdown
Contributor Author

retest this please

throw new AssertionError("halt should not return, but it did.")
}

def addShutdownHook(runnable: Runnable, name: Option[String] = None): Unit = {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use a by-name parameter instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so because the runnable is dereferenced once, immediately, and the code passes it to the underlying Java implementation org.apache.kafka.common.utils.Exit

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why any of that disqualifies a by name parameter. The benefit of using a by-name is that the caller code becomes cleaner (no need for () => ...).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it working with a by-name parameter, and the below test passes. Is this what you were thinking? If so, I can add a new commit with the changes for a better look.

  @Test
  def shouldNotInvokeShutdownHookImmediately(): Unit = {
    val value = "value"
    val array:Array[Any] = Array(value)

    def sideEffect(): Unit = {
      // mutate the first element
      array(0) = array(0).toString + array(0).toString
    }
    Exit.addShutdownHook(sideEffect) // by-name parameter, not invoked
    // make sure the first element wasn't mutated
    assertEquals(value, array(0))
    Exit.addShutdownHook(sideEffect()) // by-name parameter, not invoked
    // again make sure the first element wasn't mutated
    assertEquals(value, array(0))
    Exit.addShutdownHook(sideEffect, Some("message")) // by-name parameter, not invoked
    // make sure the first element still isn't mutated
    assertEquals(value, array(0))
  }

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


def addShutdownHook(code: => Unit, name: Option[String] = None): Unit = {
JExit.addShutdownHook(() => code, name.orNull)
def addShutdownHook(statementByName: => Unit, name: Option[String] = None): Unit = {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scala is an expression oriented language, there are no statements. This could be simply f or shutdownHook. Also, I think name should not be optional and should be the first parameter.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are no statements
This could be simply f or shutdownHook

I was under the impression that statements were expressions that return no value (i.e. :Unit), but I could see how this may be technically incorrect. I'll rename it shutdownHook.

I think name should not be optional and should be the first parameter.

There are places where currently a name is not given, and the Thread class auto-generates a name when it is not supplied, but I don't see a problem with making it a mandatory first parameter and supplying a thread name in the places where one is not yet supplied to the shutdown hook thread; I will do that. I will switch the order in the Java Exit class as well (name first, Runnable second).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, Unit in Scala is actually a type and () a value. So, no statements. :)

With regards to names, I personally think it's very useful to give names to threads and we should require it. I think you've done that for the Scala code. Can we do it for the Java code too?

@rajinisivaram
Copy link
Copy Markdown
Contributor

@ijuma Looks like @rondagostino has addressed your comments. Does this PR need another full review or have you already reviewed the changes?

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Jan 22, 2020

@rajinisivaram I didn't do a proper review, just looked at the method definitions. So, it would probably be good for you to review.

@rondagostino
Copy link
Copy Markdown
Contributor Author

@rajinisivaram The last commit addresses the last review comment from @ijuma; we now accept a thread name with a Runnable from Java clients. We don't check that the name is non-null and throw an IllegalArgumentException in that case, but I think it's fine -- the Thread class will auto-generate a name in such cases.

Copy link
Copy Markdown
Contributor

@rajinisivaram rajinisivaram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rondagostino Thanks for the PR. Left a few minor comments, apart from that LGTM.

Comment thread core/src/main/scala/kafka/tools/MirrorMaker.scala Outdated
Comment thread core/src/main/scala/kafka/tools/MirrorMaker.scala Outdated
Comment thread core/src/main/scala/kafka/tools/MirrorMaker.scala Outdated
Comment thread core/src/main/scala/kafka/utils/Exit.scala Outdated
Comment thread core/src/test/scala/kafka/security/minikdc/MinikdcTest.scala Outdated
Comment thread core/src/test/scala/kafka/utils/ExitTest.scala Outdated
Comment thread core/src/test/scala/kafka/utils/ExitTest.scala Outdated
@rondagostino
Copy link
Copy Markdown
Contributor Author

Thanks, @rajinisivaram. All set. I reverted the other format changes -- I'm not sure how that happened. Probably the IDE did it without me realizing it.

@rajinisivaram
Copy link
Copy Markdown
Contributor

@rondagostino Thanks for the updates, merging to trunk.

@rajinisivaram rajinisivaram merged commit a3509c0 into apache:trunk Jan 24, 2020
ijuma added a commit to confluentinc/kafka that referenced this pull request Jan 25, 2020
Conflicts and/or compiler errors due to the fact that we
temporarily reverted the commit that removes
Scala 2.11 support:

* Exit.scala: replace SAMs with anonymous inner classes.
* MiniKdc.scala: take upstream changes.

# By A. Sophie Blee-Goldman (1) and others
# Via Jason Gustafson
* apache-github/trunk:
  KAFKA-9254; Overridden topic configs are reset after dynamic default change (apache#7870)
  MINOR: MiniKdc JVM shutdown hook fix (apache#7946)
  KAFKA-9152; Improve Sensor Retrieval (apache#7928)
  Correct exception message in DistributedHerder (apache#7995)
  KAFKA-7317: Use collections subscription for main consumer to reduce metadata (apache#7969)
  KAFKA-9181; Maintain clean separation between local and group subscriptions in consumer's SubscriptionState (apache#7941)
  KAFKA-7737; Use single path in producer for initializing the producerId (apache#7920)

# Conflicts:
#	core/src/test/scala/kafka/security/minikdc/MiniKdc.scala
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants