-
Notifications
You must be signed in to change notification settings - Fork 7.3k
ZOOKEEPER-4736: Fix nio socket fd leak if network service is down #2047
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Sonatype Lift is retiringSonatype Lift will be retiring on Sep 12, 2023, with its analysis stopping on Aug 12, 2023. We understand that this news may come as a disappointment, and Sonatype is committed to helping you transition off it seamlessly. If you’d like to retain your data, please export your issues from the web console. |
|
@eolivelli, @kalmar, @hanm ,@jowiho Can you please take a minute look at this ? |
kezhuw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 in general and I have verified this with local tests.
I think it is better to write a test for this. You can use ClientCnxnSocketFragilityTest as an example.
| void registerAndConnect(SocketChannel sock, InetSocketAddress addr) throws IOException { | ||
| sockKey = sock.register(selector, SelectionKey.OP_CONNECT); | ||
| boolean immediateConnect = sock.connect(addr); | ||
| sockKey = sock.register(selector, SelectionKey.OP_CONNECT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but because in ClientCnxnSocketNIO::registerAndConnect method socket is registed to selector firstly and do sock.connect operation leading the fd of sock can't be closed.
// If this channel is not registered then it's safe to close the fd
// immediately since we know at this point that no thread is
// blocked in an I/O operation upon the channel and, since the
// channel is marked closed, no thread will start another such
// operation. If this channel is registered then we don't close
// the fd since it might be in use by a selector. In that case
// closing this channel caused its keys to be cancelled, so the
// last selector to deregister a key for this channel will invoke
// kill() to close the fd.
//
if (!isRegistered())
kill();sock.close will not close fd due to registered status. It is sad that:
- We never got a chance to
ClientCnxnSocketNIO#doTransportwhich callSelector::selectwhich process cancelled key and close fd. - We don't have client side session expiration, so never a chance to
ClientCnxnSocketNIO#closeandSelector::close. ZOOKEEPER-4508: Expire session in client side to avoid endless connection loss #2058
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asides from above, I think we should skip register in case of immediateConnect otherwise we may prime the connection twice. But it is probably another story and need verification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for review. I agree with your viewpoint on the reason of fd leak. if skip register we can not get the sockKey that the primeConnection() is based of in the follow-up actions. And this just register a SelectionKey.OP_CONNECT event,it will not affect primeConnection() implements that will alter the interestOps with "clientCnxnSocket.enableReadWriteOnly()".
|
Why not just invoke |
This sounds sensible and is a one for all solution. Currently, @lchqlchq What do you think on this ? I think we could add |
…addr) method throws "SocketException: Network is unreachable", because the socket had registered to selector, cleanup() method can't close the fd ,And SendThread keep doing startConnect() to keep zk client alive ,leading to fd leak.
|
I have taken over this so to move forward. It has been opened for two years. Could you please take a look ? @anmolnar @eolivelli @tisonkun |
`Socket::close` could only mark the object as closed but leave underlying socket fd open if the socket is registered to nio selector. So, we have to call `Selector::selectNow` in `cleanup` to close underlying socket fd. This could happen if network service get shutdown in socket connecting, say, "ifdown eth0" or "service network stop" and so on. See: * https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java#L841
anmolnar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm.
Socket::closecould only mark the object as closed but leaveunderlying socket fd open if the socket is registered to nio selector.
So, we have to call
Selector::selectNowincleanupto closeunderlying socket fd.
This could happen if network service get shutdown in socket connecting,
say, "ifdown eth0" or "service network stop" and so on.
See: