MINOR: fix flaky StreamsUpgradeTestIntegrationTest #7974

Merged
vvcephei merged 2 commits into apache:trunk from vvcephei:MINOR-fix-flaky-upgrade-test on Jan 16, 2020
@vvcephei
Contributor

This test failed on trunk recently. It looks like a race condition in which kafkaStreams4 hadn't completed the rebalance by the time we tested the final version number. Rather than trying to synchronize on state transitions, it's more foolproof (although potentially more opaque) to just wait a while for the versions to converge.

See https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk11/detail/kafka-trunk-jdk11/1085/tests for the failure
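
To illustrate the "wait a while for the versions to converge" approach from the description, here is a minimal sketch of a retry-until-timeout helper. It is not the helper the PR uses: `RetrySketch`, `retryUntilPasses`, and the timeout and pause values are hypothetical names and numbers chosen for illustration, and the background thread only stands in for a rebalance that finishes late.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;

final class RetrySketch {

    /**
     * Hypothetical helper: re-runs the check until it passes or the timeout
     * elapses, then rethrows the most recent failure.
     */
    static void retryUntilPasses(Duration timeout, Duration pause, Runnable check) {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                check.run();
                return; // the assertion held, so stop retrying
            } catch (AssertionError | RuntimeException e) {
                if (System.nanoTime() - deadline >= 0) {
                    throw e; // out of time: surface the last failure
                }
            }
            try {
                Thread.sleep(pause.toMillis());
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", ie);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the version a client reports after a rebalance.
        AtomicInteger usedVersion = new AtomicInteger(5);

        // Simulate a rebalance that completes some time after the check starts.
        Thread rebalance = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            usedVersion.set(6);
        });
        rebalance.start();

        // A one-shot assertion here could race with the rebalance; retrying tolerates the delay.
        retryUntilPasses(Duration.ofSeconds(5), Duration.ofMillis(50), () -> {
            if (usedVersion.get() != 6) {
                throw new AssertionError("expected 6 but was " + usedVersion.get());
            }
        });
        rebalance.join();
        System.out.println("converged to version " + usedVersion.get());
    }
}
```

The trade-off matches the one noted in the description: the check no longer depends on observing a particular state transition, but a genuine failure only surfaces after the full timeout.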

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@vvcephei vvcephei requested a review from bbejeck January 16, 2020 16:22
@guozhangwang
Contributor

/home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.13/streams/src/test/java/org/apache/kafka/streams/integration/StoreQuerySuite.java:46: error: cannot find symbol
08:39:47                         LagFetchIntegrationTest.class,
08:39:47                         ^
08:39:47   symbol: class LagFetchIntegrationTest
08:39:58 1 error

Contributor

@guozhangwang guozhangwang left a comment

There's a compilation error in the test suite; other than that, it LGTM.

assertThat(usedVersion6.get(), is(LATEST_SUPPORTED_VERSION + 1));
assertThat(usedVersion5.get(), is(LATEST_SUPPORTED_VERSION + 1));
assertThat(usedVersion4.get(), is(LATEST_SUPPORTED_VERSION + 1));
retryOnExceptionWithTimeout(() -> assertThat(usedVersion6.get(), is(LATEST_SUPPORTED_VERSION + 1)));
Contributor

Should we consider doing the same for second / first roll as well?

Contributor Author

Thanks, @guozhangwang. It should be unnecessary, since they already block until running (in startSync), and they should have gotten the expected number deterministically before transitioning to running. Time will tell if I am wrong, though...
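
The ordering argument above can be sketched with plain Java. This is an assumption-laden illustration, not the test's actual `startSync`: `StartSyncSketch`, `FakeClient`, and the `State` enum are hypothetical stand-ins, and the sketch simply assumes the version is recorded before the client announces RUNNING.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

final class StartSyncSketch {

    enum State { CREATED, REBALANCING, RUNNING }

    /** Hypothetical stand-in for a streams client; this is not the KafkaStreams API. */
    static final class FakeClient {
        final AtomicInteger usedVersion = new AtomicInteger(-1);
        private volatile Consumer<State> listener = s -> { };

        void setStateListener(Consumer<State> listener) {
            this.listener = listener;
        }

        void start(int negotiatedVersion) {
            new Thread(() -> {
                listener.accept(State.REBALANCING);
                usedVersion.set(negotiatedVersion); // assumed: recorded during the rebalance...
                listener.accept(State.RUNNING);     // ...strictly before RUNNING is announced
            }).start();
        }
    }

    /** Starts the client and blocks until it reports RUNNING, or fails after the timeout. */
    static void startSync(FakeClient client, int version, long timeoutMs) {
        CountDownLatch running = new CountDownLatch(1);
        client.setStateListener(state -> {
            if (state == State.RUNNING) {
                running.countDown();
            }
        });
        client.start(version);
        try {
            if (!running.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("client did not reach RUNNING in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for RUNNING", e);
        }
    }

    public static void main(String[] args) {
        FakeClient client = new FakeClient();
        startSync(client, 6, 5_000);
        // Safe to read without retrying: the write happened before RUNNING was signalled.
        System.out.println("usedVersion after startSync = " + client.usedVersion.get());
    }
}
```

Under that assumption, a plain assertion right after `startSync` returns cannot race with the version update, which is why retrying would only be needed where no such blocking step precedes the check.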

@vvcephei
Contributor Author

@guozhangwang Thanks for the review!

Oops, I dragged in an unrelated change from another branch while preparing the PR. Sorry about that.

I'll wait for the tests to pass before merging.

@vvcephei vvcephei removed the request for review from bbejeck January 16, 2020 18:22
@vvcephei
Contributor Author

The Java 11 failures were unrelated, known flaky tests:

kafka.api.ConsumerBounceTest.testCloseDuringRebalance
kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup

Proceeding to merge...

@vvcephei vvcephei merged commit cd4ab41 into apache:trunk Jan 16, 2020
@vvcephei vvcephei deleted the MINOR-fix-flaky-upgrade-test branch January 16, 2020 22:31
ijuma added a commit to confluentinc/kafka that referenced this pull request Jan 21, 2020
Conflicts or compilation errors due to the fact that we temporarily
reverted the commit that removes Scala 2.11 support:

* AclCommand.scala: take upstream changes.
* AclCommandTest.scala: take upstream changes.
* TransactionCoordinatorTest.scala: don't use SAMs, but adjust
mock call to putTransactionStateIfNotExists given new signature.
* TransactionStateManagerTest: use Runnable instead of SAMs.
* PartitionLockTest: use Runnable instead of SAMs.
* docs/upgrade.html: take upstream changes excluding line that
states that Scala 2.11 support has been removed.

* apache-github/trunk: (28 commits)
  KAFKA-9457; Fix flaky test org.apache.kafka.common.network.SelectorTest.testGracefulClose (apache#7989)
  MINOR: Update AclCommand help message to match implementation (apache#7990)
  MINOR: Update introduction page in Kafka documentation
  MINOR: Use Math.min for StreamsPartitionAssignor#updateMinReceivedVersion method (apache#7954)
  KAFKA-9338; Fetch session should cache request leader epoch (apache#7970)
  KAFKA-9329; KafkaController::replicasAreValid should return error message (apache#7865)
  KAFKA-9449; Adds support for closing the producer's BufferPool. (apache#7967)
  MINOR: Handle expandIsr in PartitionLockTest and ensure read threads not blocked on write (apache#7973)
  MINOR: Fix typo in connect integration test class name (apache#7976)
  KAFKA-9218: MirrorMaker 2 can fail to create topics (apache#7745)
  KAFKA-8847; Deprecate and remove usage of supporting classes in kafka.security.auth (apache#7966)
  MINOR: Suppress DescribeConfigs Denied log during CreateTopics (apache#7971)
  [MINOR]: Fix typo in Fetcher comment (apache#7934)
  MINOR: Remove unnecessary call to `super` in `MetricConfig` constructor (apache#7975)
  MINOR: fix flaky StreamsUpgradeTestIntegrationTest (apache#7974)
  KAFKA-9431: Expose API in KafkaStreams to fetch all local offset lags (apache#7961)
  KAFKA-9235; Ensure transaction coordinator is stopped after replica deletion (apache#7963)
  KAFKA-9410; Make groupId Optional in KafkaConsumer (apache#7943)
  MINOR: Removed accidental double negation in error message. (apache#7834)
  KAFKA-6144: IQ option to query standbys (apache#7962)
  ...
2 participants