KAFKA-18619: New consumer topic metadata events should set requireMetadata flag #18668

Merged: lianetm merged 1 commit into apache:trunk from frankvicky:KAFKA-18619 on Jan 29, 2025

@frankvicky
Contributor

JIRA: KAFKA-18619
In short, the new async consumer's topic metadata operations are unaware of metadata errors because their events do not override requireSubscriptionMetadata.
For further details, please refer to the Jira ticket.
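
To make the description above concrete, here is a minimal sketch of the idea. It is not Kafka's actual code: AppEvent, TopicMetadataEvent, OtherEvent and MetadataErrorDemo are hypothetical names, and the rule in errorFor is only an assumption about how such a flag might be consumed. The one detail taken from the PR description is that an event which does not override requireSubscriptionMetadata never learns about metadata errors.

```scala
// Hypothetical, simplified model; none of these are Kafka's real classes.
abstract class AppEvent {
  // By default an event does not ask to be told about metadata errors.
  def requireSubscriptionMetadata: Boolean = false
}

// A topic metadata lookup opts in by overriding the flag.
final class TopicMetadataEvent(val topic: String) extends AppEvent {
  override def requireSubscriptionMetadata: Boolean = true
}

// An event that keeps the default and therefore never sees the error.
final class OtherEvent extends AppEvent

object MetadataErrorDemo {
  // Assumed dispatch rule: surface a pending metadata error only to events that opted in.
  def errorFor(event: AppEvent, pending: Option[Throwable]): Option[Throwable] =
    if (event.requireSubscriptionMetadata) pending else None
}
```

With this model, an authentication failure recorded as a pending metadata error reaches TopicMetadataEvent but is hidden from OtherEvent, which mirrors the symptom described in the ticket.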

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

github-actions bot added the triage, consumer, clients, and small labels on Jan 22, 2025
@frankvicky
Contributor Author

Hi @lianetm,

I have enabled SaslClientsWithInvalidCredentialsTest.scala for the new consumer with this patch. Most of the tests pass, except for testConsumerWithAuthenticationFailure, which fails on a retry assertion (expected 1 but got 0).

However, I believe this is a separate issue that needs investigation and is not related to this patch.

@lianetm
Member

lianetm commented Jan 22, 2025

Thanks for the patch @frankvicky! Could you enable here all the tests that pass, so we have validation of the change? We can investigate the one that still fails; if it's related to another issue, we can track and address it separately. Makes sense?

Comment on lines +225 to +222
  private def verifyWithRetryPredicate(predicate: => Boolean): Unit = {
    var attempts = 0
    TestUtils.waitUntilTrue(() => {
      try {
        attempts += 1
        predicate
      } catch {
        case _: SaslAuthenticationException => false
      }
    }, s"Operation did not succeed within timeout after $attempts")
  }
Contributor Author

frankvicky commented Jan 23, 2025

Hi @lianetm
I found that testConsumerWithAuthenticationFailure fails because of a non-retriable assertion:

verifyWithRetry(assertEquals(1, consumer.poll(Duration.ofMillis(1000)).count))

If we pass the assertion as an argument to verifyWithRetry, it fails because verifyWithRetry does not catch AssertionFailedError (and it shouldn't). The subscribe call needs to wait for metadata updates, so it is not as quick as assign, and there is a high chance that the assertion fails on the first poll. When that happens, the AssertionFailedError propagates and the operation is never retried.

Member

lianetm commented Jan 23, 2025

Interesting, makes sense to me.

But then, couldn't we just change

verifyWithRetry(assertEquals(1, consumer.poll(Duration.ofMillis(1000)).count))

to

verifyWithRetry(consumer.poll(Duration.ofMillis(1000)).count == 1) 

(just to avoid introducing this new verifyWithRetryPredicate)
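
The difference between the two call forms discussed in this thread can be sketched without Kafka or JUnit. Everything below is a stand-in written under stated assumptions: AuthException plays the role of SaslAuthenticationException, FakeConsumer is a hypothetical consumer whose first two polls return no records, and this verifyWithRetry is a simplified helper (the real helper's signature is not shown in this thread). The sketch only illustrates that an assertion passed by name escapes on the first empty poll, while a Boolean predicate is simply evaluated again.

```scala
object RetryDemo {
  // Stand-in for SaslAuthenticationException (hypothetical).
  final class AuthException(msg: String) extends RuntimeException(msg)

  // Simplified stand-in for the test helper: re-evaluates the by-name `operation`
  // until it returns true or attempts run out. Only AuthException is retriable;
  // any other Throwable (including AssertionError) propagates immediately.
  // Returns the number of attempts that were needed.
  def verifyWithRetry(operation: => Boolean, maxAttempts: Int = 5): Int = {
    var attempts = 0
    var done = false
    while (!done && attempts < maxAttempts) {
      attempts += 1
      done = try operation catch { case _: AuthException => false }
    }
    if (!done)
      throw new IllegalStateException(s"Operation did not succeed after $attempts attempts")
    attempts
  }

  // Hypothetical consumer: the first two polls return no records, later polls return one.
  final class FakeConsumer {
    private var calls = 0
    def pollCount(): Int = { calls += 1; if (calls < 3) 0 else 1 }
  }
}
```

Calling RetryDemo.verifyWithRetry(consumer.pollCount() == 1) keeps polling until a record arrives, whereas wrapping an assert around the same poll throws on the first empty poll and never reaches a second attempt.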

Contributor Author

Sure, I'll give it a try.

github-actions bot removed the triage label on Jan 23, 2025
Member

lianetm left a comment

Thanks for the updates @frankvicky!

  @ParameterizedTest(name = TestInfoUtils.TestWithParameterizedQuorumAndGroupProtocolNames)
- @MethodSource(Array("getTestQuorumAndGroupProtocolParametersClassicGroupProtocolOnly"))
+ @MethodSource(Array("getTestQuorumAndGroupProtocolParametersAll"))
  def testTransactionalProducerWithAuthenticationFailure(quorum: String, groupProtocol: String): Unit = {
Member

It looks like this test does not use a consumer. Could you double-check? If so, there's no need to run it for both consumers.

Member

Should we remove the groupProtocol param, given that it does not apply here (and parameterize the test with the quorum only)?

  @ParameterizedTest
  @ValueSource(strings = Array("kraft"))

Contributor Author

I got your point 😄

  @ParameterizedTest(name = TestInfoUtils.TestWithParameterizedQuorumAndGroupProtocolNames)
- @MethodSource(Array("getTestQuorumAndGroupProtocolParametersClassicGroupProtocolOnly"))
+ @MethodSource(Array("getTestQuorumAndGroupProtocolParametersAll"))
  def testKafkaAdminClientWithAuthenticationFailure(quorum: String, groupProtocol: String): Unit = {
Member

Similar to the above: it looks like this test does not need to run for both consumers?

Member

ditto

frankvicky force-pushed the KAFKA-18619 branch 2 times, most recently from e7ba1be to d82baf5, on January 24, 2025 04:30
Member

lianetm left a comment

Thanks for the updates!

Member

lianetm left a comment

Thanks! LGTM. I just re-triggered the build to see if we can get rid of some of the unrelated failures that have been fixed recently.

@frankvicky
Contributor Author

Failed test is handled by #18735

lianetm merged commit 97a2280 into apache:trunk on Jan 29, 2025
lianetm pushed a commit that referenced this pull request Jan 29, 2025
…adata flag (#18668)

Reviewers: Lianet Magrans <lmagrans@confluent.io>
@lianetm
Member

lianetm commented Jan 29, 2025

Merged to trunk and cherry-picked to 4.0 52280cd

ijuma added a commit to ijuma/kafka that referenced this pull request Jan 30, 2025
…ibrdkafka-compressed-produce-fails

* apache-github/trunk:
  MINOR: prevent exception from HdrHistogram (apache#18674)
  KAFKA-18653: Fix mocks and potential thread leak issues causing silent RejectedExecutionException in share group broker tests (apache#18725)
  KAFKA-18646: Null records in fetch response breaks librdkafka (apache#18726)
  KAFKA-18619: New consumer topic metadata events should set requireMetadata flag (apache#18668)
  KAFKA-18488: Improve KafkaShareConsumerTest (apache#18728)
pdruley pushed a commit to pdruley/kafka that referenced this pull request Feb 12, 2025
…adata flag (apache#18668)

Reviewers: Lianet Magrans <lmagrans@confluent.io>
manoj-mathivanan pushed a commit to manoj-mathivanan/kafka that referenced this pull request Feb 19, 2025
…adata flag (apache#18668)

Reviewers: Lianet Magrans <lmagrans@confluent.io>