MINOR: Do not wait for first line of console consumer output since we now have a more reliable test using JMX by ewencp · Pull Request #3447 · apache/kafka

ewencp · 2017-06-27T23:00:57Z

Waiting for the first line of output was added in KAFKA-2527 when JmxMixin was originally added as a heuristic to
determine when the process was ready. We've since determined this is not good enough given JmxTool's limitations
and now include a separate, more reliable check before starting JmxTool. This check is also dangerous since a
consumer that is started before data is available in the topic, it won't output anything to stdout and only logs
errors to a separate log file. This means we may have a long delay between starting the process and starting JMX
monitoring.

Since we have a more reliable check for liveness via JMX now (and in cases that need it, partition assignment
metrics via JMX), we should no longer need to wait for the first line of output.

… now have a more reliable test using JMX Waiting for the first line of output was added in KAFKA-2527 when JmxMixin was originally added as a heuristic to determine when the process was ready. We've since determined this is not good enough given JmxTool's limitations and now include a separate, more reliable check before starting JmxTool. This check is also dangerous since a consumer that is started before data is available in the topic, it won't output anything to stdout and only logs errors to a separate log file. This means we may have a long delay between starting the process and starting JMX monitoring. Since we have a more reliable check for liveness via JMX now (and in cases that need it, partition assignment metrics via JMX), we should no longer need to wait for the first line of output.

ewencp · 2017-06-27T23:04:16Z

@apurvam @ijuma This is particularly sensitive code since it's used everywhere so I'm doing at least one full round of tests here: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/941

I found this because of a timeout in an external test, but realized we could refine the last patch that added the nc check.

There is a slight change in behavior in this version because we only wait for JMX not the first message, where the first message would normally indicate a message had been consumed and therefore also indicates assignment must have happened. I'm honestly not sure if we have tests that depend on that behavior (unless there's some other output that may happen too, but I think we split out stdout, stderr, and log for just that reason). I am wondering if we should just put a wait_until on has_partitions_assigned after the JMX setup when it's the new consumer just to make use of the class simpler.

asfgit · 2017-06-28T00:07:11Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/5757/
Test PASSed (JDK 7 and Scala 2.11).

asfgit · 2017-06-28T00:29:33Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/5743/
Test PASSed (JDK 8 and Scala 2.12).

ijuma · 2017-06-28T08:15:55Z

Thanks for the PR. Is there any downside to adding the wait_until check you suggested? Not sure if we have any tests that would not to wait until the assignment has happened. If we do, we could make it optional, but true by default.

ewencp · 2017-06-28T16:15:12Z

To be honest, I'm not sure if there is a downside. I doubt any tests would rely on not waiting, but I can't be sure. Also, it would still only apply to the new consumer.

I can update the patch and re-run tests, but there are also some tests failing that I need to check on -- some of them do seem to be related to the consumer, but not sure if they are new failures or the same as the nightly failures we saw recently.

ewencp · 2017-06-28T16:28:59Z

Actually @ijuma I take it back. After writing the code I realize that adding that check doesn't really do anything. It would just be between the JMX code and the for loop over output. Since getting any stdout output would require that we've been assigned partitions, the loop effectively enforces that condition. The only thing we would gain by adding that check is some notification that we never successfully assigned partitions in the case that we get no output, but that seems like a very unusual case that we probably don't need to specifically catch.

apurvam · 2017-06-28T22:41:47Z

I agree with @ewencp that adding the wait_until check for assigned partitions doesn't make sense in the context of these changes, since the presence of stdout implies that partitions are assigned.

However, we do have tests that don't depend on consuming messages, but do depend on having partitions assigned, like the produce_consume_validate.py which explicitly checks for assigned partitions. See https://github.com/apache/kafka/blob/trunk/tests/kafkatest/tests/produce_consume_validate.py#L53

In particular, in the produce_consume_validate context, it is possible that the topic being consumed from is empty when you start the consumer. In that case, waiting on has_partitions_assigned is the only way you can be sure that the consumer is ready and will not miss any messages from the producer which starts later.

That is the only use of has_partitions_assigned that I know of, and it doesn't apply here. So I think we can leave it out.

ijuma

LGTM. @ewencp, I'll let you merge it as I am not sure which branches you want to merge this to.

ewencp · 2017-07-18T04:23:55Z

Thanks, I'll merge back to 0.10.1 since that's how far back I merged the other fix. 0.10.1 will need a manual merge, but it should be straightforward.

… now have a more reliable test using JMX Waiting for the first line of output was added in KAFKA-2527 when JmxMixin was originally added as a heuristic to determine when the process was ready. We've since determined this is not good enough given JmxTool's limitations and now include a separate, more reliable check before starting JmxTool. This check is also dangerous since a consumer that is started before data is available in the topic, it won't output anything to stdout and only logs errors to a separate log file. This means we may have a long delay between starting the process and starting JMX monitoring. Since we have a more reliable check for liveness via JMX now (and in cases that need it, partition assignment metrics via JMX), we should no longer need to wait for the first line of output. Author: Ewen Cheslack-Postava <me@ewencp.org> Reviewers: Ismael Juma <ismael@juma.me.uk>, Apurva Mehta <apurva@confluent.io> Closes #3447 from ewencp/dont-wait-first-line-console-consumer (cherry picked from commit d663005) Signed-off-by: Ewen Cheslack-Postava <me@ewencp.org>

ijuma approved these changes Jul 14, 2017

View reviewed changes

asfgit closed this in d663005 Jul 18, 2017

apurvam mentioned this pull request Jul 19, 2017

KAFKA-5608: Add --wait option for JmxTool and use in system tests to avoid race between JmxTool and monitored services #3547

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MINOR: Do not wait for first line of console consumer output since we now have a more reliable test using JMX#3447

MINOR: Do not wait for first line of console consumer output since we now have a more reliable test using JMX#3447
ewencp wants to merge 1 commit intoapache:trunkfrom
ewencp:dont-wait-first-line-console-consumer

ewencp commented Jun 27, 2017

Uh oh!

ewencp commented Jun 27, 2017

Uh oh!

asfgit commented Jun 28, 2017

Uh oh!

asfgit commented Jun 28, 2017

Uh oh!

ijuma commented Jun 28, 2017

Uh oh!

ewencp commented Jun 28, 2017

Uh oh!

ewencp commented Jun 28, 2017

Uh oh!

apurvam commented Jun 28, 2017

Uh oh!

ijuma left a comment

Uh oh!

ewencp commented Jul 18, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

ewencp commented Jun 27, 2017

Uh oh!

ewencp commented Jun 27, 2017

Uh oh!

asfgit commented Jun 28, 2017

Uh oh!

asfgit commented Jun 28, 2017

Uh oh!

ijuma commented Jun 28, 2017

Uh oh!

ewencp commented Jun 28, 2017

Uh oh!

ewencp commented Jun 28, 2017

Uh oh!

apurvam commented Jun 28, 2017

Uh oh!

ijuma left a comment

Choose a reason for hiding this comment

Uh oh!

ewencp commented Jul 18, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants