KAFKA-7168: Treat connection close during SSL handshake as retriable #5371
rajinisivaram merged 4 commits into apache:trunk
ijuma
left a comment
Thanks for the PR, left a few comments.
// this exception could be due to a write. If there is data available to unwrap,
// process the data so that any SSLExceptions are reported
// process the data so that any SSL handshake exceptions are reported
It would be good to elaborate on why we need to do this as it's not obvious by just reading the code.
Added comments in a new method to process the exceptions.
} catch (SSLException e1) {
} catch (SSLHandshakeException | SSLProtocolException | SSLPeerUnverifiedException | SSLKeyException e1) {
handshakeFailure(e1, false);
} catch (SSLException e1) {
Shouldn't we be rethrowing the IOException in this case as per the comment "If we get here, this is not a handshake failure, throw the original IOException"?
Also, I assume the IOException we get doesn't provide any clues?
No, IOException didn't give any clues here. We were falling through to propagate the original exception earlier, but I have added a method to process the exception now.
@ijuma Thanks for the review. Added a couple more tests and in the end I had to rely on the exception String to cover some of the cases. Can you take another look?
// We want to handle a) as a non-retriable SslAuthenticationException and b) as a retriable IOException.
// To do this we need to rely on the exception string. Since it is safer to throw a retriable exception
// when we are not sure, we will treat only the first exception string as a handshake exception.
private void maybeProcessHandshakeFailure(SSLException sslException, boolean flush, IOException ioException) throws IOException {
Would it be easier to understand if this handled all of the unwrap exceptions after the IOException? And then we could call this method processUnwrapExceptionAfterIOException.
@ijuma Thanks for the review. At the moment, this method is used to process SSLException in two places, regardless of whether the original exception was after IOException or not. I thought it would be better to do the String check in a single method rather than separate out handling of IOException. We need to handle both cases because SSLException due to close_notify may be processed before or after IOException. I can use two methods if a single method is confusing.
Good point. So the name doesn't work, but in both cases we have two catches like:
} catch (SSLHandshakeException | SSLProtocolException | SSLPeerUnverifiedException | SSLKeyException e) {
...
} catch (SSLException e1) {
...
}
Maybe we can catch SSLException and then pass it to the shared method. What do you think?
@ijuma Yes, I was in two minds about that - whether to use instanceof in one place or catch the exception in two places. I have updated to use the common method. Thanks.
ijuma
left a comment
Thanks for the PR, LGTM. Just a question and a minor suggestion.
if (sslException instanceof SSLHandshakeException || sslException instanceof SSLProtocolException ||
sslException instanceof SSLPeerUnverifiedException || sslException instanceof SSLKeyException) {
handshakeFailure(sslException, flush);
} else if (sslException.getMessage().contains("Unrecognized SSL message"))
Should this be an additional || in the previous if?
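For readers following the thread, the check under discussion can be sketched as a stand-alone classifier. This is a minimal illustration, not the code merged in this PR: the class `HandshakeFailureClassifier`, the `Outcome` enum, and the null guard on `getMessage()` are assumptions added for the example. Only the four exception subtypes and the "Unrecognized SSL message" string come from the diff above. The sketch folds the string check into the same condition as the `instanceof` checks, which is the shape the question suggests.

```java
import javax.net.ssl.SSLException;
import javax.net.ssl.SSLHandshakeException;
import javax.net.ssl.SSLKeyException;
import javax.net.ssl.SSLPeerUnverifiedException;
import javax.net.ssl.SSLProtocolException;

// Hypothetical stand-alone sketch; not the transport-layer code from the PR.
final class HandshakeFailureClassifier {

    // Hypothetical outcome labels for the two cases described in the diff comment.
    enum Outcome { AUTHENTICATION_FAILURE, RETRIABLE_IO_ERROR }

    static Outcome classify(SSLException e) {
        // Subtypes that the diff treats as definite handshake failures.
        boolean handshakeSubtype = e instanceof SSLHandshakeException
                || e instanceof SSLProtocolException
                || e instanceof SSLPeerUnverifiedException
                || e instanceof SSLKeyException;

        // Assumption: guard against a null message; the diff calls getMessage() directly.
        String message = e.getMessage();
        boolean unrecognizedMessage = message != null
                && message.contains("Unrecognized SSL message");

        // When unsure, prefer the retriable outcome, as the diff comment recommends.
        return (handshakeSubtype || unrecognizedMessage)
                ? Outcome.AUTHENTICATION_FAILURE
                : Outcome.RETRIABLE_IO_ERROR;
    }

    public static void main(String[] args) {
        // The message strings below are illustrative inputs only.
        System.out.println(classify(new SSLHandshakeException("bad certificate")));
        System.out.println(classify(new SSLException("Unrecognized SSL message, plaintext connection?")));
        System.out.println(classify(new SSLException("Received close_notify during handshake")));
    }
}
```

Matching on the exception message is brittle because the text is not part of any API contract, which is why the sketch falls back to the retriable outcome for anything it does not recognize.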
/**
* Tests that if the remote end closes connection ungracefully during SSL handshake while writing data,
* the disconnection is not treated as an authentication failure.
One question: how would the test fail if the disconnection was treated as an authentication failure?
assertTrue("Unexpected channel state " + state, state == ChannelState.State.AUTHENTICATE || state == ChannelState.State.READY)
All the tests check the state when the channel is disconnected. state will be AUTHENTICATION_FAILED if the disconnection is treated as an authentication failure.
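The state check described in this reply can be illustrated in isolation. The sketch below is an assumption-laden stand-in: the `State` enum is a hypothetical reduced copy that lists only the three states named in this thread, and `isAcceptableDisconnectState` is a helper invented for the example rather than a method from the test suite.

```java
// Hypothetical stand-alone sketch of the state assertion quoted above.
final class DisconnectStateCheck {

    // Assumption: a reduced stand-in for ChannelState.State with only the states named in the thread.
    enum State { AUTHENTICATE, READY, AUTHENTICATION_FAILED }

    // Mirrors the quoted assertion: a disconnect is acceptable only in these two states.
    static boolean isAcceptableDisconnectState(State state) {
        return state == State.AUTHENTICATE || state == State.READY;
    }

    public static void main(String[] args) {
        for (State state : State.values()) {
            System.out.println(state + " acceptable: " + isAcceptableDisconnectState(state));
        }
    }
}
```

With this check, a disconnection that was misclassified as an authentication failure leaves the channel in `AUTHENTICATION_FAILED`, so the assertion fails with the "Unexpected channel state" message.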
@ijuma Thanks for the reviews, merging to trunk and 2.0.
…5371) SSL `close_notify` from broker connection close was processed as a handshake failure in clients while unwrapping the message if a handshake is in progress. Updated to handle this as a retriable IOException rather than a non-retriable SslAuthenticationException to avoid authentication exceptions in clients during rolling restart of brokers. Reviewers: Ismael Juma <ismael@juma.me.uk>
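To show why the retriable versus non-retriable distinction matters during a rolling restart, here is a minimal sketch of a caller that retries on `IOException` but fails fast on an authentication error. Everything in it is hypothetical and written for illustration: `RetryOnDisconnect`, the `Connector` interface, `AuthFailedException` (a stand-in for a non-retriable authentication exception), and the `maxAttempts` parameter are not taken from the Kafka client.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of caller-side handling; not Kafka client code.
final class RetryOnDisconnect {

    // Hypothetical stand-in for a non-retriable authentication failure.
    static final class AuthFailedException extends RuntimeException {
        AuthFailedException(String message) { super(message); }
    }

    // Hypothetical connection attempt that may fail with a retriable IOException.
    interface Connector { void connect() throws IOException; }

    // Returns the number of attempts used; retries only on IOException.
    static int connectWithRetry(Connector connector, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                connector.connect();
                return attempt;
            } catch (IOException e) {
                last = e; // retriable: the broker may simply be restarting
            }
            // AuthFailedException is deliberately not caught, so it propagates at once.
        }
        throw new UncheckedIOException(last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulate two connection closes during handshake, then a successful connect.
        int attempts = connectWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("connection closed during handshake");
            }
        }, 5);
        System.out.println("connected after " + attempts + " attempts"); // prints "connected after 3 attempts"
    }
}
```

If a connection close were surfaced as the non-retriable exception instead, the loop above would abort on the first broker restart, which is the client-side symptom this change is meant to avoid.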
SSL `close_notify` from broker connection close is processed as an `SSLException` while unwrapping the final message when the I/O exception due to remote close is processed. This should be handled as a retriable `IOException` rather than a non-retriable `SslAuthenticationException`.