Skip to content

KAFKA-9190: Close connections with expired authentication sessions#7723

Merged
hachikuji merged 2 commits intoapache:trunkfrom
rondagostino:KAFKA-9190
Dec 4, 2019
Merged

KAFKA-9190: Close connections with expired authentication sessions#7723
hachikuji merged 2 commits intoapache:trunkfrom
rondagostino:KAFKA-9190

Conversation

@rondagostino
Copy link
Copy Markdown
Contributor

@hachikuji Here's the PR with two commits: one for the autoboxing/unboxing cleanup and the other for the close of the connection when an authentication session expires. You had originally suggested invoking selector.close(channel.id) but I committed close(channel.id) instead because that invokes selector.close() but also performs additional cleanup associated with the termination of a connection.

Signed-off-by: Ron Dagostino <rdagostino@confluent.io>
@hachikuji
Copy link
Copy Markdown
Contributor

Thanks @rondagostino . That makes sense. I'm wondering how much effort it would be to create a test case which hits this path.

@rondagostino
Copy link
Copy Markdown
Contributor Author

You stated in https://issues.apache.org/jira/browse/KAFKA-9190 that the problem was found when debugging the flaky test failure EndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl. But when I began to try to write a test case to exercise this code path I realized that the code path should only be executing in 1 of 2 separate situations:

  1. When a client connects that does not understand how to re-authenticate (in other words, either a non-Java client that doesn't yet have re-authentication logic or a Java client that is pre-2.2.0 -- and this is by definition impossible given that the code base under test in EndToEndAuthorizationTest is the post-2.2.0 Java client)
  2. When a client connects with a credential that has an expiration date before the point when the broker checks and the client does not generate a new credential with a later expiration date when it refreshes. I cannot think of a situation where this could happen. Maybe Kerberos (GSSAPI)? Certainly not OAUTHBEARER since the unsecured token implementation it is using (https://github.com/apache/kafka/blob/trunk/core/src/test/scala/unit/kafka/utils/JaasTestUtils.scala#L95) will create a token that is good for an hour both initially and whenever it refreshes itself.

So this raises two issues. One is that in order to test this code path we would have to create a test where we have access to older and/or non-Java clients.

The other issue is that I don't see how this code path was being executed. One thing that occurs to me is that maybe the act of debugging itself may have introduced an otherwise-impossible state. Maybe a thread was paused, for example, and any credential refresh thyat would normaly occur on the client side could not happen.

Thoughts?

@hachikuji
Copy link
Copy Markdown
Contributor

I think the failing test hits this case for two reasons:

  1. We use a low session lifetime of 1.5s
  2. When we are handling a request, we mute the socket. I believe this prevents reauthentication. I suspect the problem could be even worse when you get several requests queued up on the broker because we'd have to handle all of them before doing the reauthentication. I confess I am not very familiar with this logic though.

cc @rajinisivaram

@rajinisivaram
Copy link
Copy Markdown
Contributor

@hachikuji @rondagostino Yes, I think we can hit this in EndToEndAuthorizationTest because connections.max.reauth.ms=1500 and I believe this overrides the credential lifetime. Since the value is so small and clients perform reauthentication somewhere between 85-95% of this session lifetime, the amount of time the client gets to perform reauthentication without channel being disconnected by the broker is actually very small. And it does explain the test failures.

Agree that we should have a test for this and I think we could perhaps add a test in SocketServerTest. I don't think we have SASL tests in SocketServerTest at the moment, but we have SSL tests, so I think it shouldn't be too hard to add a SASL one. SocketServerTest already has idle disconnection tests using MockTime and also tests that override the Selector to make it easier to test these types of scenarios.

Good find @hachikuji !

@rondagostino
Copy link
Copy Markdown
Contributor Author

@rajinisivaram Let me know if the added test looks good. I confirmed that it fails prior to the fix and passes with the fix.

Also note that while writing the test I discovered that a client's SASL connection will never be disconnected even if it does not re-authenticate when the client fails to utilize SaslAuthenticateRequest -- in other words, any client that does not implement KIP-152: Improve diagnostics for SASL authentication failures. I know how to fix it and how to write a test for it, and I assume it should be its own separate JIRA ticket and PR -- I'll take care of it immediately once we have this PR reviewed since the test is pretty close to this one and I won't write it until this one passes review.

@rondagostino
Copy link
Copy Markdown
Contributor Author

Comment thread core/src/main/scala/kafka/network/SocketServer.scala Outdated
Copy link
Copy Markdown
Contributor

@rajinisivaram rajinisivaram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rondagostino Thanks for the update, left a few comments on the test.

Comment thread core/src/main/scala/kafka/network/RequestChannel.scala Outdated
Comment thread core/src/main/scala/kafka/network/SocketServer.scala Outdated
Comment thread core/src/test/scala/unit/kafka/network/SocketServerTest.scala Outdated
Comment thread core/src/test/scala/unit/kafka/network/SocketServerTest.scala Outdated
Comment thread core/src/test/scala/unit/kafka/network/SocketServerTest.scala Outdated
Comment thread core/src/test/scala/unit/kafka/network/SocketServerTest.scala Outdated
Comment thread core/src/test/scala/unit/kafka/network/SocketServerTest.scala Outdated
Signed-off-by: Ron Dagostino <rdagostino@confluent.io>
@rondagostino
Copy link
Copy Markdown
Contributor Author

@rajinisivaram All set. If you can merge it (and maybe don't squash the two commits to keep the autoboxing/unboxing commit separate) then I will follow up with a separate PR for KAFKA-9241: SASL Clients are not forced to re-authenticate if they don't leverage SaslAuthenticateRequest

Copy link
Copy Markdown
Contributor

@hachikuji hachikuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a few pending comments on the test case, but looks like they all overlapped with @rajinisivaram's review 😄 . The updates LGTM, but I will let Rajini take a final look.

Comment thread core/src/test/scala/unit/kafka/network/SocketServerTest.scala
Copy link
Copy Markdown
Contributor

@rajinisivaram rajinisivaram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rondagostino Thanks for the updates, LGTM. Will merge after @hachikuji 's last remaining comment is addressed.

@hachikuji hachikuji merged commit 0871f7b into apache:trunk Dec 4, 2019
hachikuji pushed a commit that referenced this pull request Dec 4, 2019
…7723)

This patch fixes a bug in `SocketServer` in the expiration of connections which have not re-authenticated quickly enough. Previously these connections were left hanging, but now they are properly closed and cleaned up. This was one cause of the flaky test failures in `EndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl`.

Reviewers: Jason Gustafson<jason@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
hachikuji pushed a commit that referenced this pull request Dec 4, 2019
…7723)

This patch fixes a bug in `SocketServer` in the expiration of connections which have not re-authenticated quickly enough. Previously these connections were left hanging, but now they are properly closed and cleaned up. This was one cause of the flaky test failures in `EndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl`.

Reviewers: Jason Gustafson<jason@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
hachikuji pushed a commit that referenced this pull request Dec 4, 2019
…7723)

This patch fixes a bug in `SocketServer` in the expiration of connections which have not re-authenticated quickly enough. Previously these connections were left hanging, but now they are properly closed and cleaned up. This was one cause of the flaky test failures in `EndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl`.

Reviewers: Jason Gustafson<jason@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants