KAFKA-9802; Increase transaction timeout in system tests to reduce flakiness #8736
Merged
hachikuji merged 2 commits into apache:trunk on May 28, 2020
ijuma (Member) approved these changes on May 28, 2020:
LGTM, thanks. There were two spaces before addingReplicas in the fixed log output. Is that correct or an issue during copy and paste? Looking at the code, it seemed like there should be a single space.
Author (Contributor):
Thanks, it was just a copy/paste bug I think. I tweaked the output a little.
bob-barrett (Contributor) reviewed on May 28, 2020:
The fix looks good to me. I just had one question about the new timeout value.
  self.num_seed_messages = 100000
  self.transaction_size = 750
- self.transaction_timeout = 10000
+ self.transaction_timeout = 30000
Contributor:
If the producer can block for up to 30 seconds, do we want a slightly longer transaction timeout, just to have some buffer?
Author (Contributor):
That's fair. Maybe we can add 5-10s padding.
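The padding suggestion can be read as simple arithmetic. The following is a minimal sketch, not code from the Kafka system tests: the helper `padded_transaction_timeout_ms` and the constant `MAX_PRODUCE_BLOCK_MS` are hypothetical names, while the 30 s blocking bound and the 5-10 s padding range are taken from the comments above.

```python
# Hypothetical helper for illustration; not part of the Kafka system tests.
MAX_PRODUCE_BLOCK_MS = 30000  # upper bound on producer blocking, as discussed in the thread


def padded_transaction_timeout_ms(padding_ms, max_block_ms=MAX_PRODUCE_BLOCK_MS):
    """Return a transaction timeout that exceeds the worst-case block by padding_ms."""
    if padding_ms < 0:
        raise ValueError("padding_ms must be non-negative")
    return max_block_ms + padding_ms


if __name__ == "__main__":
    # Print the candidate timeouts for the two ends of the suggested padding range.
    for padding in (5000, 10000):
        print(padding, padded_transaction_timeout_ms(padding))
```

With the padding range mentioned above, this yields a candidate timeout somewhere between 35000 and 40000 ms. Which value the test should use is a judgment call that the thread leaves open.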
hachikuji added a commit that referenced this pull request on May 28, 2020:
…akiness (#8736) We have been seeing increased flakiness in transaction system tests. I believe the cause might be due to KIP-537, which increased the default zk session timeout from 6s to 18s and the default replica lag timeout from 10s to 30s. In the system test, we use the default transaction timeout of 10s. However, since the system test involves hard failures, the Produce request could be blocking for as long as the max of these two in order to wait for an ISR shrink. Hence this patch increases the timeout to 30s. Reviewers: Bob Barrett <bob.barrett@confluent.io>, Ismael Juma <github@juma.me.uk>
Kvicii pushed a commit to Kvicii/kafka that referenced this pull request on May 30, 2020:
* 'trunk' of github.com:apache/kafka: (36 commits)
  Remove redundant `containsKey` call in KafkaProducer (apache#8761)
  KAFKA-9494; Include additional metadata information in DescribeConfig response (KIP-569) (apache#8723)
  KAFKA-10061; Fix flaky `ReassignPartitionsIntegrationTest.testCancellation` (apache#8749)
  KAFKA-9130; KIP-518 Allow listing consumer groups per state (apache#8238)
  KAFKA-9501: convert between active and standby without closing stores (apache#8248)
  KAFKA-10056; Ensure consumer metadata contains new topics on subscription change (apache#8739)
  MINOR: Log the reason for coordinator discovery failure (apache#8747)
  KAFKA-10029; Don't update completedReceives when channels are closed to avoid ConcurrentModificationException (apache#8705)
  MINOR: remove unnecessary timeout for admin request (apache#8738)
  MINOR: Relax Percentiles test (apache#8748)
  MINOR: regression test for task assignor config (apache#8743)
  MINOR: Update documentation.html to refer to 2.6 (apache#8745)
  MINOR: Update documentation.html to refer to 2.5 (apache#8744)
  KAFKA-9673: Filter and Conditional SMTs (apache#8699)
  KAFKA-9971: Error Reporting in Sink Connectors (KIP-610) (apache#8720)
  KAFKA-10052: Harden assertion of topic settings in Connect integration tests (apache#8735)
  MINOR: Slight MetadataCache tweaks to avoid unnecessary work (apache#8728)
  KAFKA-9802; Increase transaction timeout in system tests to reduce flakiness (apache#8736)
  KAFKA-10050: kafka_log4j_appender.py fixed for JDK11 (apache#8731)
  KAFKA-9146: Add option to force delete active members in StreamsResetter (apache#8589)
  ...

# Conflicts:
#	core/src/main/scala/kafka/log/Log.scala
We have been seeing increased flakiness in transaction system tests. I believe the cause may be KIP-537, which increased the default zk session timeout from 6s to 18s and the default replica lag timeout from 10s to 30s. In the system test, we use the default transaction timeout of 10s. However, since the system test involves hard failures, the Produce request could block for as long as the larger of these two timeouts while waiting for an ISR shrink. Hence this patch increases the transaction timeout to 30s.
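The reasoning in this description can be sketched as a small check. This is illustrative only: the dictionary keys and function names below are hypothetical and are not Kafka configuration names or system test code. The numeric values are the ones quoted in the description.

```python
# Illustrative only: names are hypothetical, values are those quoted in the description.
DEFAULTS_BEFORE_KIP_537 = {"zk_session_timeout_ms": 6000, "replica_lag_timeout_ms": 10000}
DEFAULTS_AFTER_KIP_537 = {"zk_session_timeout_ms": 18000, "replica_lag_timeout_ms": 30000}


def worst_case_produce_block_ms(defaults):
    """Upper bound on waiting for an ISR shrink: the larger of the two timeouts."""
    return max(defaults["zk_session_timeout_ms"], defaults["replica_lag_timeout_ms"])


def can_outlast_block(transaction_timeout_ms, defaults):
    """True if the transaction timeout is at least the worst-case Produce block."""
    return transaction_timeout_ms >= worst_case_produce_block_ms(defaults)


if __name__ == "__main__":
    # Compare the old 10s transaction timeout and the new 30s one against both sets of defaults.
    for timeout_ms in (10000, 30000):
        print(timeout_ms,
              can_outlast_block(timeout_ms, DEFAULTS_BEFORE_KIP_537),
              can_outlast_block(timeout_ms, DEFAULTS_AFTER_KIP_537))
```

Under this model, a 10s transaction timeout covers the worst case with the old defaults but not with the new ones, which is consistent with the flakiness appearing after KIP-537. A 30s timeout only just meets the new bound.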
Note this patch also includes a minor logging fix in Partition. Previously we would see messages like the following:

This patch fixes the log to print as the following: