KAFKA-8070: Increase consumer startup timeout in system tests #6405

Merged
rajinisivaram merged 1 commit into apache:trunk from rajinisivaram:KAFKA-8070-consumer-systemtest
Mar 8, 2019

@rajinisivaram
Contributor

We currently use 10 seconds as the timeout for the ConsoleConsumer process to start in ConsumerGroupCommandTest. For tests using SSL, the SSL keystores must be created first, and only then is the process started. In successful test runs, it typically takes between 5 and 7 seconds to start an SSL-enabled consumer process. However, there have been several test failures where the consumer was too slow to start in tests using SSL. The logs from the last two failures in the ConsumerGroupCommand test show consumers that started successfully with SSL but took ~13 seconds to log their first message. This change therefore increases the timeout to 20 seconds in system tests that check for consumer startup.
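
The polling pattern behind such a startup timeout can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in the Kafka system tests: `wait_until` and `make_delayed_condition` are hypothetical helpers written for this sketch, and the startup delay is simulated rather than measured.

```python
import time


def wait_until(condition, timeout_sec, backoff_sec=0.1, err_msg=""):
    """Poll `condition` until it returns True or `timeout_sec` elapses.

    Hypothetical helper for illustration only; raises TimeoutError on expiry.
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(err_msg)
        time.sleep(backoff_sec)


def make_delayed_condition(delay_sec):
    """Return a condition that becomes true `delay_sec` after creation.

    Stands in for "the consumer has logged its first message" (simulated).
    """
    ready_at = time.monotonic() + delay_sec
    return lambda: time.monotonic() >= ready_at
```

With a simulated 13-second startup, `wait_until(make_delayed_condition(13), timeout_sec=10)` would raise `TimeoutError`, while the same call with `timeout_sec=20` would return normally.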

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@rajinisivaram rajinisivaram requested a review from omkreddy March 8, 2019 14:34
@ijuma
Member

ijuma commented Mar 8, 2019

@rajinisivaram do we know what the consumer is doing through this period?

@rajinisivaram
Contributor Author

@ijuma The failing tests are SSL tests. The timer starts when we decide to start the SSL consumer and ends when the process is up and running. For SSL tests, we create the keystores, copy them to the worker, etc. after the timer starts, and that sometimes seems to take a bit longer.

@ijuma
Member

ijuma commented Mar 8, 2019

Sorry, my question was unclear. I was asking if we knew which parts were slower when the timeout triggered. In the past, we found some regressions when longer timeouts were needed for tests that had been there for a long time.

@rajinisivaram
Contributor Author

@ijuma The console consumer start command was issued after the 10 second timeout, so the delay was in creating/copying keystores/truststores. The only odd thing I see in the logs is two sets of `mkdir -p /mnt/security` (in both successful and failed runs). I will take a look to see if we are creating keystores twice unnecessarily.
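
To make the timing concrete, the sketch below models the timed window as the sum of the security setup and the process launch. The phase names and the 11-second/2-second split are assumptions invented for illustration; only the 10-second and 20-second timeouts and the roughly 13-second total come from this discussion.

```python
def startup_fits(phase_durations_sec, timeout_sec):
    """Return True if all phases covered by the timer fit within the timeout."""
    return sum(phase_durations_sec.values()) <= timeout_sec


# Hypothetical breakdown of a slow SSL run (~13 s in total).
slow_ssl_run = {
    "create_and_copy_keystores": 11.0,  # assumed split, not measured
    "start_consumer_process": 2.0,      # assumed split, not measured
}

print(startup_fits(slow_ssl_run, timeout_sec=10))  # False
print(startup_fits(slow_ssl_run, timeout_sec=20))  # True
```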

Member

@ijuma ijuma left a comment

LGTM

@rajinisivaram
Contributor Author

@ijuma Thanks for the review, merging to trunk and 2.2. I have created https://issues.apache.org/jira/browse/KAFKA-8074 to review all the system test services to ensure we are creating keystores/truststores only once for each client/server.
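
One way to guarantee that keystores/truststores are created only once per node is an idempotent setup guard, sketched below. This is a hypothetical illustration, not the change made for KAFKA-8074: the class `OneTimeSecuritySetup` and its methods are invented names, and the expensive work is replaced by a counter.

```python
class OneTimeSecuritySetup:
    """Hypothetical guard that runs the expensive security setup once per node."""

    def __init__(self):
        self._configured_nodes = set()
        self.expensive_calls = 0  # counts how often the real work actually ran

    def _create_and_copy_stores(self, node):
        # Placeholder for generating keystores/truststores and copying them
        # to the worker; here it only records that it was invoked.
        self.expensive_calls += 1

    def setup_node(self, node):
        """Set up `node` if needed; return True only when work was performed."""
        if node in self._configured_nodes:
            return False
        self._create_and_copy_stores(node)
        self._configured_nodes.add(node)
        return True


setup = OneTimeSecuritySetup()
setup.setup_node("worker1")
setup.setup_node("worker1")  # second call is a no-op
print(setup.expensive_calls)  # 1
```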

@rajinisivaram rajinisivaram merged commit 460b3a6 into apache:trunk Mar 8, 2019
rajinisivaram added a commit that referenced this pull request Mar 8, 2019
For consumers using SSL, this timeout includes the time to create and copy keystores and truststores, and sometimes it takes longer than 10s to complete the security setup before starting the consumer process.

Reviewers: Ismael Juma <ismael@juma.me.uk>
jarekr pushed a commit to confluentinc/kafka that referenced this pull request Apr 18, 2019
* warn-apache-kafka/trunk: (41 commits)
  MINOR: Avoid double null check in KStream#transform() (apache#6429)
  KAFKA-7944: Improve Suppress test coverage (apache#6382)
  KAFKA-3522: add missing guards for TimestampedXxxStore (apache#6356)
  MINOR: Change Trogdor agent's cleanup executor to a cached thread pool (apache#6309)
  KAFKA-7976; Update config before notifying controller of unclean leader update (apache#6426)
  KAFKA-7801: TopicCommand should not be able to alter transaction topic partition count
  KAFKA-8091; Wait for processor shutdown before testing removed listeners (apache#6425)
  MINOR: Update delete topics zk path in assertion error messages
  KAFKA-7939: Fix timing issue in KafkaAdminClientTest.testCreateTopicsRetryBackoff
  KAFKA-7922: Return authorized operations in Metadata request response (KIP-430 Part-2)
  MINOR: Print usage when parse fails during console producer
  MINOR: fix Scala compiler warning (apache#6417)
  KAFKA-7288; Fix check in SelectorTest to wait for no buffered bytes (apache#6415)
  KAFKA-8065: restore original input record timestamp in forward() (apache#6393)
  MINOR: cleanup deprectaion annotations (apache#6290)
  KAFKA-3522: Add TimestampedWindowStore builder/runtime classes (apache#6173)
  KAFKA-8069; Fix early expiration of offsets due to invalid loading of expire timestamp (apache#6401)
  KAFKA-8070: Increase consumer startup timeout in system tests (apache#6405)
  KAFKA-8040: Streams handle initTransactions timeout (apache#6372)
  KAFKA-7980 - Fix timing issue in SocketServerTest.testConnectionRateLimit (apache#6391)
  ...
pengxiaolong pushed a commit to pengxiaolong/kafka that referenced this pull request Jun 14, 2019
…#6405)

For consumers using SSL, this timeout includes the time to create and copy keystores and truststores, and sometimes it takes longer than 10s to complete the security setup before starting the consumer process.

Reviewers: Ismael Juma <ismael@juma.me.uk>

2 participants