Skip to content

KIP-368: Allow SASL Connections to Periodically Re-Authenticate#5582

Merged
rajinisivaram merged 12 commits intoapache:trunkfrom
rondagostino:KAFKA-7352
Oct 26, 2018
Merged

KIP-368: Allow SASL Connections to Periodically Re-Authenticate#5582
rajinisivaram merged 12 commits intoapache:trunkfrom
rondagostino:KAFKA-7352

Conversation

@rondagostino
Copy link
Copy Markdown
Contributor

@rondagostino rondagostino commented Aug 28, 2018

The adoption of KIP-255: OAuth Authentication via
SASL/OAUTHBEARER in release 2.0.0 creates the possibility of
using information in the bearer token to make authorization
decisions. Unfortunately, however, Kafka connections are
long-lived, so there is no ability to change the bearer token
associated with a particular connection. Allowing SASL
connections to periodically re-authenticate would resolve this.
In addition to this motivation there are two others that are
security-related. First, to eliminate access to Kafka for
connected clients, the current requirement is to remove all
authorizations (i.e. remove all ACLs). This is necessary because
of the long-lived nature of the connections. It is operationally
simpler to shut off access at the point of authentication, and
with the release of KIP-86: Configurable SASL Callback Handlers
it is going to become more and more likely that installations
will authenticate users against external directories (e.g. via
LDAP). The ability to stop Kafka access by simply disabling an
account in an LDAP directory (for example) is desirable. The
second motivating factor for re-authentication related to
security is that the use of short-lived tokens is a common OAuth
security recommendation, but issuing a short-lived token to a
Kafka client (or a broker when OAUTHBEARER is the inter-broker
protocol) currently has no benefit because once a client is
connected to a broker the client is never challenged again and
the connection may remain intact beyond the token expiration time
(and may remain intact indefinitely under perfect circumstances).
This KIP proposes adding the ability for clients (and brokers
when OAUTHBEARER is the inter-broker protocol) to re-authenticate
their connections to brokers and for brokers to close connections
that continue to use expired sessions.

Signed-off-by: Ron Dagostino rndgstn@gmail.com

@rondagostino
Copy link
Copy Markdown
Contributor Author

Beginning to add tests, and I need to rebase to get KAFKA-7324: NPE due to lack of SASLExtensions in SASL/OAUTHBEARER (#5552)

@rondagostino
Copy link
Copy Markdown
Contributor Author

Rebasing again to resolve conflicts with KAFKA-6950: Delay response to failed client authentication to prevent potential DoS issues (KIP-306)

@rondagostino
Copy link
Copy Markdown
Contributor Author

@rajinisivaram I rebased and pushed a squashed commit that implements your approach except it triggers a re-auth upon a write as described by the KIP rather than upon a read (which is what your prototype used). I think triggering upon a write is correct. I'll add metrics and tests soon. Hopefully we will get the 3rd binding up-vote in time for the 2.1.0 deadline today. Thanks again for that prototype -- I could not have gotten there myself in a reasonable amount of time because the low-level network code is pretty tough to pick up.

@rondagostino
Copy link
Copy Markdown
Contributor Author

@rajinisivaram This PR now has everything except:

  • implementation and tests for client-side latency metrics
  • implementation and tests for server-side connection kill metrics
  • a test to see if server-side kill works

I added re-authentication to the NioEchoServer and I test for it with bearer tokens and delegation tokens within SaslAuthenticatorTest. I added re-authentication every 50 ms for all end-to-end scala authorization integration tests as well.

I will work on the remaining 3 items on Wednesday. Can you review what is here already, or would you prefer to wait?

Ron

@rajinisivaram
Copy link
Copy Markdown
Contributor

@rondagostino I will review the PR tonight or tomorrow. There is a failing test iin the PR builds (org.apache.kafka.common.security.authenticator.SaslAuthenticatorTest.testTokenReauthenticationOverSaslScram[reauthenticate=true]) - can you check if that is a test issue or a code issue? I will start a system test run once you get a chance to look at that.

@rondagostino
Copy link
Copy Markdown
Contributor Author

@rajinisivaram It is a test issue. I cannot reproduce the error locally -- it passes on my laptop. I pushed a commit to sleep a bit longer in the hope that it will fix the issue during the build.

@rajinisivaram
Copy link
Copy Markdown
Contributor

@rondagostino Thanks, I have started a system test run on your branch: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1979/. Will review the PR later tonight.

@rondagostino
Copy link
Copy Markdown
Contributor Author

@rajinisivaram Test still fails here but cannot reproduce the issue locally. Pushed another commit to try to fix it...

Copy link
Copy Markdown
Contributor

@rajinisivaram rajinisivaram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rondagostino Thanks for the PR. I had a quick look and left some comments. Will do a more thorough review tomorrow (it was a much larger PR that I expected!)

Comment thread clients/src/main/java/org/apache/kafka/common/network/Authenticator.java Outdated
Comment thread clients/src/main/java/org/apache/kafka/common/network/Authenticator.java Outdated
Comment thread clients/src/main/java/org/apache/kafka/common/network/KafkaChannel.java Outdated
Comment thread clients/src/main/java/org/apache/kafka/common/network/ChannelBuilders.java Outdated
Comment thread clients/src/main/java/org/apache/kafka/common/network/ChannelBuilders.java Outdated
Comment thread clients/src/test/java/org/apache/kafka/common/protocol/ApiKeysTest.java Outdated
Comment thread core/src/main/scala/kafka/server/KafkaApis.scala Outdated
@rondagostino
Copy link
Copy Markdown
Contributor Author

@rajinisivaram Pushed a commit that I think fixes/resolves most of the issues you raised. It's getting late here so I couldn't get to all of them, but I think we are in a better place at this point.

@rondagostino
Copy link
Copy Markdown
Contributor Author

@rajinisivaram The test failures are due to the tests being flaky rather than a coding problem in the feature implementation. I may have been able to solve the delegation token-related test that was failing (it didn't fail in these runs) by increasing the session lifetime in that test (from 50 ms to 500 ms).
I will try to fix these flaky tests by increasing the associated session lifetimes as well. I think what may be going on is the tests are taking a long time to run and by the time they send data the session expiration time has arrived and the connection is being killed. It has also occurred to me that setting the session lifetime to something less than 1000 ms is probably not a good idea in general -- it may invite this type of flakiness due to unexpected delays, for example -- and maybe we should change the units of the configuration value to seconds instead of milliseconds to make sure it is never less than 1000 ms. Thoughts? Commit pushed with larger lifetimes, it was clean on my laptop.

@rondagostino
Copy link
Copy Markdown
Contributor Author

@rajinisivaram I just pushed a commit to add an ExpiredSessionsKilledCount metric. The KIP called for expired-connections-killed-total as the metric name but ExpiredSessionsKilledCount is consistent with the existing metric RequestHandlerAvgIdlePercent. Thoughts?

@rajinisivaram
Copy link
Copy Markdown
Contributor

@rondagostino Sorry, I haven't had any time today to look into this PR. Given the complexity of the PR and how close it is to feature freeze, it would be very helpful if we could get another reviewer. I will try and take another look tomorrow, but not sure if we will have enough time to get this merged without help from one more reviewer. If it didn't meet feature freeze deadline, would it work for you if it was committed only to trunk and not the 2.1.0 branch?

@rondagostino
Copy link
Copy Markdown
Contributor Author

@rajinisivaram We would really like this in 2.1.0 if at all possible. Is there anything I can do to help? Maybe find another reviewer, perhaps someone else who has voted for the KIP?

@rajinisivaram
Copy link
Copy Markdown
Contributor

@rondagostino Yes, getting another reviewer would definitely speed up the process.

@rondagostino
Copy link
Copy Markdown
Contributor Author

@rajinisivaram There are a couple of changes that are not necessary as part of this PR -- more like cleanup of things I discovered while working on the alternate implementations that we considered and rejected. I could remove those changes and submit them as minor tickets post-2.1.0 if it helps to minimize the surface area requiring review. ExpiringCredentialRefreshingLogin is one, for example, and OAuthBearerSaslClientTest is another.

@rondagostino
Copy link
Copy Markdown
Contributor Author

@rajinisivaram I discovered that many of the failing tests were due to the fact that GSSAPI was not re-authenticating because it uses the CLIENT_COMPLETE state on the client and I wasn't handling that possibility correctly. This meant its connections were continually being killed by the server. This is now fixed. I also moved the expiration check from KafkaRequestHandler to SocketServer so that the check is done more upstream and therefore with as little delay as possible; I think that delay sometimes has an impact when we run with really tight expiration times on the order of milliseconds. I'll be curious to see how the tests look now.

@rondagostino
Copy link
Copy Markdown
Contributor Author

@rajinisivaram Both builds passed. I'm going to work on adding the last metrics, and if I get that done tonight then I believe this will be complete.

@rondagostino
Copy link
Copy Markdown
Contributor Author

@rajinisivaram I was able to add the additional metrics. This PR now has all functionality, but I could not figure out how to test server-side expired connection kill or its metric since I have no way of running an older client. I know it works because it was the cause of the test failures in SaslGssapiSslEndToEndAuthorizationTest -- I just don't have a way to test it in an automated fashion because there is no way to disable re-authentication at the client side. I probably have to add metric documentation as well. But I think that's it. I asked someone to review, but this section of the product is pretty difficult to understand without experience, so I'm not optimistic about it. Any chance you can review this on Friday? Al tests passed last run, and I hope the same will be true of this one.

@rajinisivaram
Copy link
Copy Markdown
Contributor

@rondagostino In terms of testing, I think system test may be the way to go for testing with older clients since we already have other system tests for testing with older clients. If this goes in by Monday, you have until code freeze two weeks later to add the system test to 2.1.0.

@rondagostino
Copy link
Copy Markdown
Contributor Author

@rajinisivaram Ok. Last set of builds was clean again. I'm confident this is working as expected and is able to be merged assuming a review cycle can be completed.

@rondagostino
Copy link
Copy Markdown
Contributor Author

@rajinisivaram Metric documentation is added, this is now done except for any additional PR review required. All conversations from above are resolved/fixed. I've contacted 5 people (4 of whom up-voted the KIP) in the hope that at least one will be able to review.

@rondagostino
Copy link
Copy Markdown
Contributor Author

@rajinisivaram I reverted 9 files and eliminated 500 changed lines from this PR. I will submit minor tickets for these changes separately. These changes perform cleanup of code that was discovered while implementing rejected alternatives. These changes are minor and can be submitted separately at a later date. They have no impact on functionality or performance and are simply reducing technical debt. Keeping these separate reduces the surface area of this PR.

Copy link
Copy Markdown
Contributor

@rajinisivaram rajinisivaram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rondagostino Thanks for the PR. I made one pass over it and left some comments, mostly minor. But there are some significant issues that need addressing too:

  1. As mentioned in one of the comments, I think brokers need to be able to limit the number of re-authentications accepted from clients to avoid being over-loaded since this is not covered by quotas.
  2. We need to ensure that clients cannot re-authenticate with a different principal. Otherwise, if there is ever a vulnerability found in the re-authentication logic, the impact would be much worse.
  3. We need to ensure that clients are only allowed to re-authenticate using the same SASL mechanism as the original authentication.

I think we need more tests to cover these scenarios and also more around interoperability with the all the versions (pre-SASL_HANDSHAKE, SASL_HANDSHAKE v0 and v1, SASL_AUTHENTICATE v0 and v1). May be easier to do with system tests, but we do need them all.

Comment thread clients/src/main/java/org/apache/kafka/common/network/ChannelBuilders.java Outdated
Comment thread clients/src/main/java/org/apache/kafka/common/network/KafkaChannel.java Outdated
Comment thread clients/src/main/java/org/apache/kafka/common/network/KafkaChannel.java Outdated
Comment thread clients/src/main/java/org/apache/kafka/common/network/Selector.java Outdated
Comment thread core/src/test/scala/integration/kafka/api/EndToEndAuthorizationTest.scala Outdated
Comment thread core/src/main/scala/kafka/network/SocketServer.scala Outdated
@rondagostino
Copy link
Copy Markdown
Contributor Author

Ok, I see #5806 is where this will be fixed.

@rondagostino
Copy link
Copy Markdown
Contributor Author

@rajinisivaram Any progress on another review? Would like to merge ASAP so it gets as many builds and test runs as possible prior to the next release.

@rajinisivaram
Copy link
Copy Markdown
Contributor

@rondagostino Sorry, haven't had time to review yet, will try and do it today or tomorrow.

Copy link
Copy Markdown
Contributor

@rajinisivaram rajinisivaram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rondagostino Thanks for the updates, it is looking good. Left some minor comments.

Comment thread clients/src/main/java/org/apache/kafka/common/network/KafkaChannel.java Outdated
public boolean maybeBeginServerReauthentication(NetworkReceive saslHandshakeNetworkReceive,
Supplier<Long> nowNanosSupplier) throws AuthenticationException, IOException {
if (!ready())
throw new IllegalStateException("KafkaChannel should always be \"ready\" upon receiving SASL Handshake");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reword exception message since channel is not ready on receiving first handshake?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what I changed it to. Not sure if this is what you were getting at. Let me know if there is something else it should say. I also fixed the Javadoc -- it was still referring to "not muted" when we got rid of that check based on a review comment.

        if (!ready())
            throw new IllegalStateException(
                    "KafkaChannel should be \"ready\" when processing SASL Handshake for potential re-authentication");

@rondagostino
Copy link
Copy Markdown
Contributor Author

@rajinisivaram Latest comments are addressed/pushed. The one issue that might need more attention is the REAUTH_BAD_MECHANISM state in SaslServerAuthenticator. See my comment above.

Copy link
Copy Markdown
Contributor

@rajinisivaram rajinisivaram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rondagostino Thanks for the updates. Responded to a few of the remaining comments (minor changes). I think I will merge once these are addressed and if there are any other changes, we can make them in follow-on PRs. I think the test failures in the last build are unrelated. Can you just confirm that?

@omkreddy
Copy link
Copy Markdown
Contributor

@rondagostino my recent merge #5684 created some conflicts with this PR. Please rebase the PR.

The adoption of KIP-255: OAuth Authentication via
SASL/OAUTHBEARER in release 2.0.0 creates the possibility of
using information in the bearer token to make authorization
decisions. Unfortunately, however, Kafka connections are
long-lived, so there is no ability to change the bearer token
associated with a particular connection. Allowing SASL
connections to periodically re-authenticate would resolve this.
In addition to this motivation there are two others that are
security-related. First, to eliminate access to Kafka for
connected clients, the current requirement is to remove all
authorizations (i.e. remove all ACLs). This is necessary because
of the long-lived nature of the connections. It is operationally
simpler to shut off access at the point of authentication, and
with the release of KIP-86: Configurable SASL Callback Handlers
it is going to become more and more likely that installations
will authenticate users against external directories (e.g. via
LDAP). The ability to stop Kafka access by simply disabling an
account in an LDAP directory (for example) is desirable. The
second motivating factor for re-authentication related to
security is that the use of short-lived tokens is a common OAuth
security recommendation, but issuing a short-lived token to a
Kafka client (or a broker when OAUTHBEARER is the inter-broker
protocol) currently has no benefit because once a client is
connected to a broker the client is never challenged again and
the connection may remain intact beyond the token expiration time
(and may remain intact indefinitely under perfect circumstances).
This KIP proposes adding the ability for clients (and brokers
when OAUTHBEARER is the inter-broker protocol) to re-authenticate
their connections to brokers and for brokers to close connections
that continue to use expired sessions.

Signed-off-by: Ron Dagostino <rndgstn@gmail.com>
@rondagostino
Copy link
Copy Markdown
Contributor Author

@rajinisivaram Made two suggested changes, both builds were clean. Rebasing onto latest trunk now to resolve conflicts.

@rondagostino
Copy link
Copy Markdown
Contributor Author

@rajinisivaram After rebase JDK8 build was clean, JDK 11 build has 2 failures but they seem unrelated/transient issues.

@rajinisivaram
Copy link
Copy Markdown
Contributor

@rondagostino Thanks for the updates. I think we are good to go. I will take a look later today and merge to trunk (if there are any other changes required, we can do them in follow-on PRs).

Copy link
Copy Markdown
Contributor

@rajinisivaram rajinisivaram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rondagostino Thanks for the PR, LGTM! Merging to trunk.

@rajinisivaram rajinisivaram merged commit e8a3bc7 into apache:trunk Oct 26, 2018
pengxiaolong pushed a commit to pengxiaolong/kafka that referenced this pull request Jun 14, 2019
…IP-368) (apache#5582)

KIP-368 implementation to enable periodic re-authentication of SASL clients. Also adds a broker configuration option to terminate client connections that do not re-authenticate within the configured interval.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants