[feat][broker] OneStageAuth State: move authn out of constructor#19295
[feat][broker] OneStageAuth State: move authn out of constructor#19295lhotari merged 2 commits intoapache:masterfrom
Conversation
|
/pulsarbot rerun-failure-checks |
| // Authentication is already completed | ||
| return CompletableFuture.completedFuture(null); | ||
| } | ||
| this.authenticationDataSource = new AuthenticationDataCommand( |
There was a problem hiding this comment.
I have one question:
When having AuthChallenge, do we need to renew AuthenticationState?
It looks like so.
There was a problem hiding this comment.
Since this is a class for single stage authentication, there are no AuthChallenges. The primary reason to set this here instead of the constructor is that we set it with the authData that we authenticate. Note that this class always has isExpired return false, which means there are never any AuthChallenges generated.
If this were a multi-stage auth, this solution would not work.
There was a problem hiding this comment.
Your means that a single-stage authentication doesn't refresh authentication data, if so, when having AuthChallenge, we must new an AuthenticationState.
There was a problem hiding this comment.
single-stage authentication doesn't refresh authentication data
Yes, that is what I mean. However, I don't agree that it means we need to create a new AuthenticationState.
My understanding of the the protocol is that the AuthenticationState method isExpired returns true, Pulsar will then call refreshAuthentication() to get the bytes to send to the client. The client responds with AuthData, and Pulsar passes the response to authenticate(AuthData). Since isExpired is always false for OneStateAuthenticationState, there is no expectation of getting an AuthChallenge back. The single stage authentication is truly one stage.
|
@lhotari - yes, that proxy test is failing because of my changes. Pushing up a fix now. Note that the fix includes a new call to |
|
Looks like |
|
Looks like the error was transient. I am not sure what would have led to that error. @lhotari - PTAL |
That's a known issue, #13620 (comment) . Mockito is not thread safe. |
PIP: #12105 ### Motivation When authentication fails in the `ServerCnx`, the state is left in `Start` if the primary `authData` fails authentication and in `Connecting` or `Connected` if the `originalAuthData` authentication fails. To prevent any kind of unexpected behavior, we should go to `Failed` state. Note that the tests verify the current behavior where a failed `originalAuthData` results first in a `Connected` command from the broker and then an `Error` command. I documented that I think this is sub optimal here #19311. ### Modifications * Update `ServerCnx` state to `Failed` when there is an authentication exception during `handleConnect` and during `handleAuthResponse`. * Update `handleAuthResponse` reply to `"Unable to authenticate"` instead of the `AuthenticationState` exception. ### Verifying this change A new test is added. The added test covers the change made in #19295 where we updated `ServerCnx` so that we call `AuthState#authenticate` instead of relying on the implementation detail that the initialization calls `authenticate`. That PR should have added a test. ### Does this pull request potentially affect one of the following parts: This is not a breaking change. ### Documentation - [x] `doc-not-needed` ### Matching PR in forked repository PR in forked repository: michaeljmarshall#18
…he#19312) PIP: apache#12105 When authentication fails in the `ServerCnx`, the state is left in `Start` if the primary `authData` fails authentication and in `Connecting` or `Connected` if the `originalAuthData` authentication fails. To prevent any kind of unexpected behavior, we should go to `Failed` state. Note that the tests verify the current behavior where a failed `originalAuthData` results first in a `Connected` command from the broker and then an `Error` command. I documented that I think this is sub optimal here apache#19311. * Update `ServerCnx` state to `Failed` when there is an authentication exception during `handleConnect` and during `handleAuthResponse`. * Update `handleAuthResponse` reply to `"Unable to authenticate"` instead of the `AuthenticationState` exception. A new test is added. The added test covers the change made in apache#19295 where we updated `ServerCnx` so that we call `AuthState#authenticate` instead of relying on the implementation detail that the initialization calls `authenticate`. That PR should have added a test. This is not a breaking change. - [x] `doc-not-needed` PR in forked repository: #18 (cherry picked from commit 8049690)
…he#19312) PIP: apache#12105 When authentication fails in the `ServerCnx`, the state is left in `Start` if the primary `authData` fails authentication and in `Connecting` or `Connected` if the `originalAuthData` authentication fails. To prevent any kind of unexpected behavior, we should go to `Failed` state. Note that the tests verify the current behavior where a failed `originalAuthData` results first in a `Connected` command from the broker and then an `Error` command. I documented that I think this is sub optimal here apache#19311. * Update `ServerCnx` state to `Failed` when there is an authentication exception during `handleConnect` and during `handleAuthResponse`. * Update `handleAuthResponse` reply to `"Unable to authenticate"` instead of the `AuthenticationState` exception. A new test is added. The added test covers the change made in apache#19295 where we updated `ServerCnx` so that we call `AuthState#authenticate` instead of relying on the implementation detail that the initialization calls `authenticate`. That PR should have added a test. This is not a breaking change. - [x] `doc-not-needed` PR in forked repository: #18 (cherry picked from commit 8049690) (cherry picked from commit 3ef3bf1) (cherry picked from commit 467cd32) (cherry picked from commit 1fab71b)
This change was introduced by apache#19295. That PR had more changes than are worth cherry-picking, though, so this commit only has the additional call to authenticate the original auth data. As a result, this commit is slightly less efficient because in some implementations, the authdata will be validated twice. (cherry picked from commit f9727ca) (cherry picked from commit 1935f07) (cherry picked from commit 870bf04)
|
I forgot why I tagged this commit for a commit to all of the release branches. Those commits reference the fact that I added a call to authenticate the |
| if (authRole == null) { | ||
| throw new AuthenticationException("Must authenticate before calling getAuthRole"); | ||
| } |
There was a problem hiding this comment.
This PR looks like a breaking change. In KoP, this method is used to get the role from the token. After upgrading the dependency, many authorization related tests failed. See https://github.com/streamnative/kop/actions/runs/4343910383/jobs/7586547352
Not sure if it should be reverted in master branch, but it should not be cherry-picked to release branches. @michaeljmarshall
There was a problem hiding this comment.
@BewareMyPower - I agree that we shouldn't cherry pick this line. I don't think I did though, taking a quick look at one of the linked commits. I primarily cherry picked the code that calls authenticate on the original auth data. Are you seeing this error in the kop logs?
There was a problem hiding this comment.
I understand now. That KoP test references pulsar 3.0.0, so it must be from a recent build of master. First, I'll note that I didn't cherry pick the behavior change to old release branches. I cherry-picked the extra authentication call for original auth data, which was code introduced in this PR. Second, I think this change is necessary for enabling PIP 97. I first documented the contract here: #19283. The primary issue is that creating the AuthState should not trigger authenticating the AuthData. KoP relied on this implementation detail, and that is why the tests failed. Given that the only impacted parties are plugin developers, I think a PIP, documented release notes, a major version bump, and a fail fast behavior (the object throws an exception), make this change acceptable for 3.0.0. Let me know what you think, thanks.
There was a problem hiding this comment.
Sorry I missed the fact that you have removed these cherry-picked labels.
Regarding this change, I think you should add another constructor to OneStageAuthenticationState. Currently, the first argument authData is actually not used. Maybe you didn't remove it because it's public. (However, you removed the exception signature) But the semantic changed and it looked weird now.
To keep the compatibility, it's better to mark the constructor as deprecated first and keep it the semantic as "authenticate synchronously". To use the asynchronous semantic, you should add another constructor to do that.
public OneStageAuthenticationState(SocketAddress remoteAddress,
SSLSession sslSession,
AuthenticationProvider provider) {
this.provider = provider;
this.remoteAddress = remoteAddress;
this.sslSession = sslSession;
}
@Deprecated
public OneStageAuthenticationState(AuthData authData,
SocketAddress remoteAddress,
SSLSession sslSession,
AuthenticationProvider provider) throws AuthenticationException {
this(remoteAddress, sslSession, provider);
this.authenticationDataSource = new AuthenticationDataCommand(
new String(authData.getBytes(), StandardCharsets.UTF_8), remoteAddress, sslSession);
this.authRole = provider.authenticate(this.authenticationDataSource);
}KoP relied on this implementation detail
I'd rather use "semantics" instead of the "implementation detail". Take Producer#send as example:
- Semantics: the send is synchronous
- Implementation details: the exception message representation, etc.
If you made a change that makes send asynchronous, it should be treated as a breaking change. But if you just modified the exception message, for example, adding a common prefix to the exception message, it would be okay.
There was a problem hiding this comment.
@BewareMyPower - thanks for the context and the suggestion.
I think you should add another constructor to
OneStageAuthenticationState.
Thanks for the suggestion, I agree this is the best way to move forward without breaking any other applications when 3.0.0 is released. I also agree that it makes the API clearer because the unused AuthData won't be passed in. I shoul be able to get a PR submitted within 24 hours or so.
(However, you removed the exception signature)
I didn't realize this would cause an issue. I'll add it back when I fix the constructor. Out of curiosity, what is the consequence of breaking source compatibility by removing the exception from the method signature?
There was a problem hiding this comment.
Out of curiosity, what is the consequence of breaking source compatibility by removing the exception from the method signature?
No. I just noticed this point when I reviewed again. After looking for some documents and testing the ABI compatibility locally, it seems that removing the exception signature does not affect the ABI compatibility.
There was a problem hiding this comment.
@BewareMyPower - I finally got a chance to come back to this, and now I remember that I did try to use a similar solution that you proposed while writing this PR, but I found it didn't actually work as you propose. There are a few details to show the issue:
First, the OneStageAuthenticationState is not used directly by the broker or by plugins. Instead, the broker and plugins use AuthenticationProvider#newAuthState to create the AuthenticationState object. Here is the code:
As such, changing the constructor in the way you propose is unlikely to help users that built their own extensions. For example, KoP does the following:
Users must rely on the AuthenticationProvider#newAuthState in order to make the AuthenticationProvider pluggable.
When I noticed this detail while writing this PR, I thought about adding a new method to the AuthenticationProvider that would not take the AuthData object, in the same way that you proposed for the new OneStageAuthenticationState constructor. However, the main issue is how to default that method in the interface. We could use the following definition:
default AuthenticationState newAuthState(SocketAddress remoteAddress, SSLSession sslSession)
throws AuthenticationException {
return new OneStageAuthenticationState(remoteAddress, sslSession, this);
}This would be analogous to the current newAuthState method, but the issue is that custom AuthenticationProvider implementations that override the current newAuthState method likely don't want to use the OneStageAuthenticationState, and that would lead to unexpected behavior.
Another option could be:
default AuthenticationState newAuthState(SocketAddress remoteAddress, SSLSession sslSession)
throws AuthenticationException {
return newAuthState(null, remoteAddress, sslSession);
}However, that would require breaking the current implementation in the way you're concerned about. (For what it's worth, this option might be the best long term solution for cleaning up the interface if we want to remove the authData that is passed. The main point is that it doesn't take care of your concern of not breaking user code.)
Another option:
AuthenticationState newAuthState(SocketAddress remoteAddress, SSLSession sslSession);This unimplemented option has two problems. First, it will break user code if they do not implement the method. Since this whole exercise is an effort to try to prevent users from needing to change any code, that isn't a good option. Second, it removes the OneStageAuthenticationState default already present in the interface, which seems problematic.
In my view, the primary challenge is that the AuthenticationProvider interface has a brittle default behavior.
Given the above, do you have any ideas on how to make this better for users building their own broker extensions?
There was a problem hiding this comment.
but the issue is that custom AuthenticationProvider implementations that override the current newAuthState method likely don't want to use the OneStageAuthenticationState, and that would lead to unexpected behavior.
Could you show an example of a possible unexpected behavior? IMO, the existing authentication provider only overrides the previous newAuthState method still has the previous behavior and is not affected by the new newAuthState method.
There was a problem hiding this comment.
Could you show an example of a possible unexpected behavior? IMO, the existing authentication provider only overrides the previous
newAuthStatemethod still has the previous behavior and is not affected by the newnewAuthStatemethod.
The challenge comes from the fact that Apache Pulsar will transition to using the new method to build the state object and if that method is not overridden by the custom AuthenticationProvider implementation, the OneStageAuthenticationState will be used. That will lead to unexpected behavior.
A good example is the AuthenticationProviderToken class. That one would "work" with the OneStageAuthenticationState with the key difference being the isExpired value from OneStageAuthenticationState always returns false while the TokenAuthenticationState returns true when the token's exp claim indicates the token has expired.
There was a problem hiding this comment.
I got it. The key point is that Pulsar will prefer the new overload but the existing implementations might still implement the old overload. It seems that there is no better way to solve it perfectly.
### Modifications - Upgrade the Pulsar dependency to 3.0.0.1-SNAPSHOT - Use `List<EntryFilter>` from apache/pulsar#19364 as the filters - Remove the removed APIs in `AuthorizationService` - Use `TopicType` instead of String. - Fix authenticator failure caused by apache/pulsar#19295 Co-authored-by: Demogorgon314 <kwang@streamnative.io>
…che#19295) (cherry picked from commit e8695bf)
…he#19312) PIP: apache#12105 When authentication fails in the `ServerCnx`, the state is left in `Start` if the primary `authData` fails authentication and in `Connecting` or `Connected` if the `originalAuthData` authentication fails. To prevent any kind of unexpected behavior, we should go to `Failed` state. Note that the tests verify the current behavior where a failed `originalAuthData` results first in a `Connected` command from the broker and then an `Error` command. I documented that I think this is sub optimal here apache#19311. * Update `ServerCnx` state to `Failed` when there is an authentication exception during `handleConnect` and during `handleAuthResponse`. * Update `handleAuthResponse` reply to `"Unable to authenticate"` instead of the `AuthenticationState` exception. A new test is added. The added test covers the change made in apache#19295 where we updated `ServerCnx` so that we call `AuthState#authenticate` instead of relying on the implementation detail that the initialization calls `authenticate`. That PR should have added a test. This is not a breaking change. - [x] `doc-not-needed` PR in forked repository: #18 (cherry picked from commit 8049690)
PIP: #12105
Motivation
In order to implement PIP 97, we must replace
authenticatecalls withauthenticateAsync. This PR makes those changes forOneStageAuthenticationState. It also moves the call to authenticate theAuthDataout of the constructor and into theauthenticateAsyncmethod.One assumption I make in this PR, is that there will not be concurrent calls to the same
AuthenticationState#authenticateAsyncmethod. That seems like a reasonable assumption because Pulsar uses this state object in the netty eventloop handlers, but please let me know if you disagree.Modifications
authenticatecall out of the constructor so that authentication is only done on the first pass through theauthenticateAsyncmethod.authenticatebackwards compatible for any plugins that are using it.Verifying this change
New tests are added to verify the new and the old code.
Does this pull request potentially affect one of the following parts:
This could break 3rd party plugins in the broker if they were relying on authentication to happen in the constructor. In order to make those implementations fail fast, this PR includes a change to throw an exception when the
getAuthRoleis called without first callingauthenticateAsyncorauthenticate. That makes these changes semi-backwards compatible.Documentation
doc-not-neededMatching PR in forked repository
PR in forked repository: michaeljmarshall#17