MINOR: Close ZooKeeperClient if waitUntilConnected fails during construction#5411
MINOR: Close ZooKeeperClient if waitUntilConnected fails during construction#5411ijuma merged 8 commits intoapache:trunkfrom
Conversation
|
Three is a zookeeper thread leak incase of unknown host address passed to zk connect string. This is due to recent ZK client lib changes. Previously zookeeper fail immediately for unknown address. In new version re-resolves hosts when connection attempts fail. This is leading to thread leak in ServerShutdownTest.testCleanShutdownAfterFailedStartup test, ZooKeeperClientTest.testUnresolvableConnectString. Thread leak in ServerShutdownTest.testCleanShutdownAfterFailedStartup Before zookeeper-3.4.13 With zookeeper-3.4.13 |
|
This fix doesn't look like the right fix. We want to close zookeeper if this fails during construction only right? In other cases, the caller can decide what to do if waitUntilConnected fails. |
|
Ye, we can close in constructor. Updated the PR. |
| case e: ZooKeeperClientTimeoutException => | ||
| zooKeeper.close() | ||
| throw e | ||
| } |
There was a problem hiding this comment.
This should just be a finally instead of catch, I think.
There was a problem hiding this comment.
Thinking about it some more, we actually need to shutdown the scheduler too.
There was a problem hiding this comment.
Maybe we can call ZooKeeperClient.close()?
|
@omkreddy I agree that writing a test is a bit challenging. The only way I can think of is to subclass ZooKeeperClient and catch the parent constructor exception in the subclass. |
| close() | ||
| throw e | ||
| } | ||
|
|
There was a problem hiding this comment.
Thanks for updating, I still think we should use a finally block instead of catching a specific exception.
There was a problem hiding this comment.
Oh wait. I was wrong, we can't use finally here. But we should probably catch a more general exception?
There was a problem hiding this comment.
Yeah, we can user ZooKeeperClientException, which parent is parent class for zk excpetions
There was a problem hiding this comment.
Right, as other exceptions can be thrown too. But to be honest, we might as well catch Throwable since we rethrow he exception anyway. We just want to avoid leaking the resource.
There was a problem hiding this comment.
Updated the PR with Throwable . Not sure, if we can catch parent constructor exception in subclass constructor. For now, added a test based on zk thread count.
| case e: Throwable => | ||
| close() | ||
| throw e | ||
| } |
There was a problem hiding this comment.
Nit: we don't need the block for try, e.g.
try waitUntilConnected(connectionTimeoutMs, TimeUnit.MILLISECONDS)
catch {
case e: Throwable =>
close()
throw e
}| .filter(_.isAlive) | ||
| .map(_.getName) | ||
| .count(t => t.contains("SendThread()") || t.contains(hostAddress)) // Verify threadName pattern for unresolvable host zk send threads | ||
|
|
There was a problem hiding this comment.
Good idea for a test. Why not simply:
@Test
def testUnresolvableConnectString(): Unit = {
val hostAddress = "some.invalid.hostname.foo.bar.local"
try {
new ZooKeeperClient(hostAddress, zkSessionTimeout, connectionTimeoutMs = 10, Int.MaxValue, time,
"testMetricGroup", "testMetricType")
} catch {
case _: ZooKeeperClientTimeoutException =>
assertEquals("ZooKeeper client threads still running", Set.empty, runningZkSendThreads)
}
}
private def runningZkSendThreads: collection.Set[String] = Thread.getAllStackTraces.keySet.asScala
.filter(_.isAlive)
.map(_.getName)
.filter(_.contains("SendThread()"))|
Merging to trunk and 2.0. |
…ruction (#5411) This has always been an issue, but the recent upgrade to ZooKeeper 3.4.13 means it is also an issue when an unresolvable ZK address is used, causing some tests to leak threads. The change in behaviour in ZK 3.4.13 is that no exception is thrown from the ZooKeeper constructor in case of an unresolvable address. Instead, ZooKeeper tries to re-resolve the address hoping it becomes resolvable again. We eventually throw a `ZooKeeperClientTimeoutException`, which is similar to the case where the the address is resolvable but ZooKeeper is not reachable. Reviewers: Ismael Juma <ismael@juma.me.uk>
* apache-github/2.0: MINOR: Close ZooKeeperClient if waitUntilConnected fails during construction (apache#5411) KAFKA-6897; Prevent KafkaProducer.send from blocking when producer is closed (apache#5027)
This has always been an issue, but the recent upgrade to ZooKeeper
3.4.13 means that it is also an issue when an unresolvable ZK
address is used, causing some tests to leak threads.
The change in behaviour in ZK 3.4.13 is that no exception is thrown
from the ZooKeeper constructor in case of an unresolvable address.
Instead, ZooKeeper tries to re-resolve the address hoping it becomes
resolvable again. We eventually throw a
ZooKeeperClientTimeoutException, which is similar to the casewhere the the address is resolvable, but ZooKeeper is not
reachable.
Committer Checklist (excluded from commit message)