Scenario:
1. Everything is up and running.
2. ZooKeeper goes away (for whatever reason).
3. Nodes start spamming:
2015-11-27T01:44:08,767 INFO [Curator-Framework-0-SendThread(monitowl-dev:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server monitowl-dev/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-11-27T01:44:08,767 WARN [Curator-Framework-0-SendThread(monitowl-dev:2181)] org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_66]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_66]
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) [zookeeper-3.4.6.jar:3.4.6-1569965]
(Which is "fine", I guess.)
4. I run kill -TERM <node_pid>.
5. Node says:
2015-11-27T01:44:09,446 INFO [Thread-42] com.metamx.common.lifecycle.Lifecycle - Running shutdown hook
2015-11-27 01:44:09,464 FATAL Unable to register shutdown hook because JVM is shutting down.
(Which seems to be normal, they always say that before dying.)
6. But... it does not die. The process is still shown as running (in htop, etc.), with no further log output.
Using 0.8.2, happens for all kinds of nodes AFAICT.
Possibly worth noting that when I deliberately start a node when zookeeper is down, it exits fine on SIGTERM.
Probably wouldn't even notice it, but this confuses our systemd configs quite a bit :-/.
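For anyone trying to reproduce steps 4 to 6, here is a minimal sketch of a check that sends SIGTERM to a process and reports whether it exits within a grace period, falling back to SIGKILL if it does not. The helper name exits_on_sigterm and the 10-second default are assumptions made for illustration; nothing here is part of the node software or its tooling.

```python
import subprocess
import sys


def exits_on_sigterm(proc: subprocess.Popen, grace_seconds: float = 10.0) -> bool:
    """Send SIGTERM and report whether the process exits within the grace period.

    Returns True if it exited on its own after SIGTERM. If it is still alive
    when the grace period ends, it is killed with SIGKILL and False is returned.
    """
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL, so the stuck process does not linger
        proc.wait()
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the command that starts a node as arguments.
    # Take ZooKeeper down by hand, then press Enter to send SIGTERM.
    node = subprocess.Popen(sys.argv[1:])
    input("Press Enter to send SIGTERM... ")
    print("exited on SIGTERM" if exits_on_sigterm(node) else "hung; had to SIGKILL")
```

A False result would correspond to step 6 above: the shutdown hook starts, but the process never exits on its own.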