Skip to content

KAFKA-9530; Fix flaky test testDescribeGroupWithShortInitializationTimeout#8154

Merged
hachikuji merged 3 commits intoapache:trunkfrom
hachikuji:KAFKA-9530
Feb 23, 2020
Merged

KAFKA-9530; Fix flaky test testDescribeGroupWithShortInitializationTimeout#8154
hachikuji merged 3 commits intoapache:trunkfrom
hachikuji:KAFKA-9530

Conversation

@hachikuji
Copy link
Copy Markdown
Contributor

@hachikuji hachikuji commented Feb 21, 2020

This should fix the flakiness with this test case. With a short timeout, the call may timeout and the client might disconnect. Currently this can be exposed to the user as either a TimeoutException or a DisconnectException. To be consistent, rather than exposing the underlying retriable error, we handle both cases with a TimeoutException.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Copy link
Copy Markdown

@abbccdda abbccdda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix LGTM, just make sure to retry the test enough time locally.

Comment thread clients/src/main/java/org/apache/kafka/clients/admin/KafkaAdminClient.java Outdated
@hachikuji
Copy link
Copy Markdown
Contributor Author

Two green builds!

new Exception(prettyPrintException(throwable)));
}
handleFailure(throwable);
failWithTimeout(now, throwable);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below, we have a case where we throw the underlying exception if we run out of retries. Is that intentional?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was debating changing that one too. If retries are exhausted, do we consider that a timeout? I guess it's defensible.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, if retries are exhausted and it's a retriable exception, then it seems like it should be a TimeoutException.

Copy link
Copy Markdown
Member

@ijuma ijuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hachikuji hachikuji merged commit 2df7ea5 into apache:trunk Feb 23, 2020
@ijuma
Copy link
Copy Markdown
Member

ijuma commented Feb 24, 2020

@hachikuji please cherry-pick to the 2.5 branch as well.

ijuma added a commit to confluentinc/kafka that referenced this pull request Feb 24, 2020
* apache-github/trunk: (23 commits)
  KAFKA-9530; Fix flaky test `testDescribeGroupWithShortInitializationTimeout` (apache#8154)
  HOTFIX: fix NPE in Kafka Streams IQ (apache#8158)
  MINOR: set scala version automatically based on gradle.properties
  KAFKA-9577; SaslClientAuthenticator incorrectly negotiates SASL_HANDSHAKE version (apache#8142)
  KAFKA-9441: Add internal TransactionManager (apache#8105)
  MINOR: Document endpoints for connector topic tracking (KIP-558)
  MINOR: Standby task commit needed when offsets updated (apache#8146)
  KAFKA-9206; Throw KafkaException on CORRUPT_MESSAGE error in Fetch response (apache#8111)
  MINOR: Remove unwanted regexReplace on tests/kafkatest/__init__.py
  KAFKA-9586: Fix errored json filename in ops documentation
  KAFKA-9575: Mention ZooKeeper 3.5.7 upgrade
  KAFKA-9481: Graceful handling TaskMigrated and TaskCorrupted (apache#8058)
  HOTFIX: don't try to remove uninitialized changelogs from assignment & don't prematurely mark task closed (apache#8140)
  MINOR: Fix javadoc at org.apache.kafka.clients.producer.KafkaProducer.InterceptorCallback#onCompletion (apache#7337)
  MINOR: Improve EOS example exception handling (apache#8052)
  MINOR: Fix a number of warnings in clients test (apache#8073)
  MINOR: Update shell scripts to support z/OS system (apache#7913)
  MINOR: Wording fix in Streams DSL docs (apache#5692)
  MINOR: Add missing @test annotation to MetadataTest#testMetadataMerge (apache#8141)
  KAFKA-9533: ValueTransform forwards `null` values (apache#8108)
  ...
ijuma added a commit to confluentinc/kafka that referenced this pull request Feb 24, 2020
…etrics-common

* confluent/master: (76 commits)
  KAFKA-9530; Fix flaky test `testDescribeGroupWithShortInitializationTimeout` (apache#8154)
  HOTFIX: fix NPE in Kafka Streams IQ (apache#8158)
  MINOR: set scala version automatically based on gradle.properties
  KAFKA-9577; SaslClientAuthenticator incorrectly negotiates SASL_HANDSHAKE version (apache#8142)
  KAFKA-9441: Add internal TransactionManager (apache#8105)
  MINOR: Document endpoints for connector topic tracking (KIP-558)
  MINOR: Standby task commit needed when offsets updated (apache#8146)
  Changes to migrate to Artifactory (#263)
  KAFKA-9206; Throw KafkaException on CORRUPT_MESSAGE error in Fetch response (apache#8111)
  MINOR: Remove unwanted regexReplace on tests/kafkatest/__init__.py
  KAFKA-9586: Fix errored json filename in ops documentation
  KAFKA-9575: Mention ZooKeeper 3.5.7 upgrade
  KAFKA-9481: Graceful handling TaskMigrated and TaskCorrupted (apache#8058)
  HOTFIX: don't try to remove uninitialized changelogs from assignment & don't prematurely mark task closed (apache#8140)
  MINOR: Fix javadoc at org.apache.kafka.clients.producer.KafkaProducer.InterceptorCallback#onCompletion (apache#7337)
  MINOR: Improve EOS example exception handling (apache#8052)
  MINOR: Fix a number of warnings in clients test (apache#8073)
  MINOR: Update shell scripts to support z/OS system (apache#7913)
  MINOR: Wording fix in Streams DSL docs (apache#5692)
  MINOR: Add missing @test annotation to MetadataTest#testMetadataMerge (apache#8141)
  ...
mumrah pushed a commit that referenced this pull request Mar 5, 2020
…imeout` (#8154)

With a short timeout, a call in KafkaAdminClient may timeout and the client might disconnect. Currently this can be exposed to the user as either a TimeoutException or a DisconnectException. To be consistent, rather than exposing the underlying retriable error, we handle both cases with a TimeoutException.

Reviewers: Boyang Chen <boyang@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants