KAFKA-18135: ShareConsumer HB UnsupportedVersion msg mixed with Consumer HB#18101
KAFKA-18135: ShareConsumer HB UnsupportedVersion msg mixed with Consumer HB#18101AndrewJSchofield merged 8 commits intoapache:trunkfrom
Conversation
| "group protocol. Set group.protocol=classic on the consumer configs to revert to the CLASSIC protocol " + | ||
| "until the cluster is upgraded."; | ||
|
|
||
| public static final String SHARE_PROTOCOL_NOT_SUPPORTED_MSG = "The cluster does not support the new SHARE " + |
There was a problem hiding this comment.
I would change it to "The cluster does not support the share group protocol. To use share groups, the cluster must have the share group protocol enabled." I think this message will remain valid over the next few releases without us needing to update it.
There was a problem hiding this comment.
Also, I would expect this constant should go in the ShareMembershipMgr right?
|
We should get @lianetm to make sure she's happy too. |
lianetm
left a comment
There was a problem hiding this comment.
Thanks for the changes! Took a first look, here are some comments
| "group protocol. Set group.protocol=classic on the consumer configs to revert to the CLASSIC protocol " + | ||
| "until the cluster is upgraded."; | ||
|
|
||
| public static final String SHARE_PROTOCOL_NOT_SUPPORTED_MSG = "The cluster does not support the new SHARE " + |
There was a problem hiding this comment.
Also, I would expect this constant should go in the ShareMembershipMgr right?
| if (!handleSpecificError(response, currentTimeMs)) { | ||
| logger.error("{} failed due to {}: {}", heartbeatRequestName(), error, errorMessage); | ||
| handleFatalFailure(error.exception(errorMessage)); | ||
| } |
There was a problem hiding this comment.
I wonder if it would be simpler to remove the UnsupportedVersion handling from here completely? In the end we're duplicating the logic we already have in the default case further down right? We would only need the logic in the Share/Consumer managers, simply because even if the error is not specific the handling is. Thoughts?
There was a problem hiding this comment.
You're right, I think the logic in default and UnsupportedVersion handling are the same. Thanks!
| if (errorMessage.contains(REGEX_RESOLUTION_NOT_SUPPORTED_MSG)) { | ||
| // If the error is about regex subscription, use the original error message | ||
| message = errorMessage; | ||
| } |
There was a problem hiding this comment.
I would expect we don't need this here. This regex error is generated on the client when building the request (it generates a response with exception) so the handling lands on the onFailure path (not on the onResponse path that uses this handleSpeficifError).
I would expect we just need to have here the same handling we had for the UnsupportedVersion in the parent class before this PR:
That being said, we do need to ensure that we handle responses for each consumer, to cover this bit that is now on the abstract mgr (so it could wrongly apply a consumer msg to the share consumer):
Both consumers need to ensure they propagate their specific msgs when that exception is in a response.
There was a problem hiding this comment.
So I guess we should add a handle onSpeficFailure in each consumer just like handleSpecificError did?
There was a problem hiding this comment.
Yes. The implementation of that onSpecificFailure/handleSpecificExceptionInResponse on each consumer should cover this bit I expect, with the correct msg:
And in the AbstractHBMgr onFailure probably just need something like:
if (exception instanceof RetriableException) {
...
} else if (!onSpecificFailure()) {
handleFatalFailure(exception);
}
We don't need any specifics for the regex case I would expect. It should land on the handleFatalFailure and propagate the msg it has (it will only ever be generated when building consumer HB, so no risk for the share consumer)
With that seems we cover the specifics of handling responses with exceptions (generated on the client) , and in a very similar way to how we handle responses with error codes (generated on the broker).
AndrewJSchofield
left a comment
There was a problem hiding this comment.
A few more changes requested. I have been working with the code on my local environment, and I think there's still something not totally right. This PR isn't making it worse in any way, but I think there's another improvement we can make here which would improve it more.
|
@peterxcli It seems to me that this doesn't quite work as well as it might. In my testing, connecting a share consumer to a broker which does not support The consumer checks the ApiVersions response and notices that What do you think @lianetm? Am I missing something? |
|
@peterxcli Actually, I've realised that this is actually a temporary problem. In AK 4.0, |
|
Just to answer @AndrewJSchofield's comment #18101 (comment) , you got it right, we basically have 2 paths to get the Unsupported:
For the KafkaConsumer we can have the 2 paths now but Share will always go down 1 as you described until the API becomes stable. |
| if (errorMessage.contains(REGEX_RESOLUTION_NOT_SUPPORTED_MSG)) { | ||
| // If the error is about regex subscription, use the original error message | ||
| message = errorMessage; | ||
| } |
There was a problem hiding this comment.
Yes. The implementation of that onSpecificFailure/handleSpecificExceptionInResponse on each consumer should cover this bit I expect, with the correct msg:
And in the AbstractHBMgr onFailure probably just need something like:
if (exception instanceof RetriableException) {
...
} else if (!onSpecificFailure()) {
handleFatalFailure(exception);
}
We don't need any specifics for the regex case I would expect. It should land on the handleFatalFailure and propagate the msg it has (it will only ever be generated when building consumer HB, so no risk for the share consumer)
With that seems we cover the specifics of handling responses with exceptions (generated on the client) , and in a very similar way to how we handle responses with error codes (generated on the broker).
|
|
||
| @ParameterizedTest | ||
| @ValueSource(strings = {SHARE_PROTOCOL_NOT_SUPPORTED_MSG}) | ||
| public void testUnsupportedVersion(String errorMsg) { |
There was a problem hiding this comment.
this is truly testUnsupportedVersionGeneratedOnTheClient (client response with exception). Renaming will maybe clarify, and we should ensure we have coverage for something like testUnsupportedVersionGeneratedOnTheBroker (client response with ErrorCode), is it already covered?
The outcome of both tests should be the same, but they execute diff code paths modified in this PR (handleSpecificError and handleSpecificFailure/exception)
|
Rebase CI hotfix #18284. |
|
Three unrelated tests failed.
|
AndrewJSchofield
left a comment
There was a problem hiding this comment.
A couple of final comments. This is a nice improvement.
| } | ||
|
|
||
| /** | ||
| * Error handling specific failure to a group type when sending the request. |
There was a problem hiding this comment.
I would add "and no response has been received" to the comment. This is the crucial difference between the usage of handleSpecificFailure and handleSpecificExceptionInResponse.
| String message = CONSUMER_PROTOCOL_NOT_SUPPORTED_MSG; | ||
| if (errorMessage.equals(REGEX_RESOLUTION_NOT_SUPPORTED_MSG)) { | ||
| message = REGEX_RESOLUTION_NOT_SUPPORTED_MSG; | ||
| logger.error("{} failed due to regex resolution not support: {}", heartbeatRequestName(), message); |
There was a problem hiding this comment.
nit: "regex resolution not supported".
…ove error handling - Move SHARE_PROTOCOL_NOT_SUPPORTED_MSG constant to AbstractHeartbeatRequestManager - Add test case for unsupported version error handling in ShareHeartbeatRequestManager - Improve error message handling for regex resolution errors
…artbeatRequestManager#onErrorResponse
- Split error handling into two distinct methods: handleSpecificFailure and handleSpecificExceptionInResponse - Move UnsupportedVersion handling from onFailure to specific implementations - Rename handleSpecificError to handleSpecificExceptionInResponse for clarity - Improve error logging messages to better distinguish between client-side and broker-side version errors This change makes the error handling more modular and clearer by separating client-side request failures from broker response errors.
- Split UnsupportedVersion tests into client-side and broker-side cases - Add separate test methods for each scenario in both Consumer and Share heartbeat managers - Fix test helper methods to properly simulate client vs broker version errors - Rename test methods to better describe test scenarios This change improves test clarity by explicitly testing both client-side and
|
Failing test is known as flaky and tracked by https://issues.apache.org/jira/browse/KAFKA-18298. |
…mer HB (apache#18101) Add specific error handling for unsupported version in share consumer and consumer Reviewers: Lianet Magrans <lmagrans@confluent.io>, Andrew Schofield <aschofield@confluent.io>
UnsupportedVersion error is handled in the parent AbstractHeartbeatRequestManager, so used by consumer and share consumer. If a share consumer gets the errors, it will end up with a msg that is currently specific to consumer
kafka/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractHeartbeatRequestManager.java
Lines 384 to 386 in 6fd951a
Handle the UnsupportedVersion separately in the existing
handleSpecificError (note that the unsupported version for consumer may also end up containing a msg specific to SubscriptionPattern not supported in HB v0, if regex is used without the required v1)
onSpecificFailurefor consumerHB and shareHBtestUnsupportedVersionGeneratedOnTheBrokerto cover both handleSpecificError and handleSpecificFailure/exception