Skip to content

MINOR: Do not wait for first line of console consumer output since we now have a more reliable test using JMX#3447

Closed
ewencp wants to merge 1 commit intoapache:trunkfrom
ewencp:dont-wait-first-line-console-consumer
Closed

MINOR: Do not wait for first line of console consumer output since we now have a more reliable test using JMX#3447
ewencp wants to merge 1 commit intoapache:trunkfrom
ewencp:dont-wait-first-line-console-consumer

Conversation

@ewencp
Copy link
Copy Markdown
Contributor

@ewencp ewencp commented Jun 27, 2017

Waiting for the first line of output was added in KAFKA-2527 when JmxMixin was originally added as a heuristic to
determine when the process was ready. We've since determined this is not good enough given JmxTool's limitations
and now include a separate, more reliable check before starting JmxTool. This check is also dangerous since a
consumer that is started before data is available in the topic, it won't output anything to stdout and only logs
errors to a separate log file. This means we may have a long delay between starting the process and starting JMX
monitoring.

Since we have a more reliable check for liveness via JMX now (and in cases that need it, partition assignment
metrics via JMX), we should no longer need to wait for the first line of output.

… now have a more reliable test using JMX

Waiting for the first line of output was added in KAFKA-2527 when JmxMixin was originally added as a heuristic to
determine when the process was ready. We've since determined this is not good enough given JmxTool's limitations
and now include a separate, more reliable check before starting JmxTool. This check is also dangerous since a
consumer that is started before data is available in the topic, it won't output anything to stdout and only logs
errors to a separate log file. This means we may have a long delay between starting the process and starting JMX
monitoring.

Since we have a more reliable check for liveness via JMX now (and in cases that need it, partition assignment
metrics via JMX), we should no longer need to wait for the first line of output.
@ewencp
Copy link
Copy Markdown
Contributor Author

ewencp commented Jun 27, 2017

@apurvam @ijuma This is particularly sensitive code since it's used everywhere so I'm doing at least one full round of tests here: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/941

I found this because of a timeout in an external test, but realized we could refine the last patch that added the nc check.

There is a slight change in behavior in this version because we only wait for JMX not the first message, where the first message would normally indicate a message had been consumed and therefore also indicates assignment must have happened. I'm honestly not sure if we have tests that depend on that behavior (unless there's some other output that may happen too, but I think we split out stdout, stderr, and log for just that reason). I am wondering if we should just put a wait_until on has_partitions_assigned after the JMX setup when it's the new consumer just to make use of the class simpler.

@asfgit
Copy link
Copy Markdown

asfgit commented Jun 28, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/5757/
Test PASSed (JDK 7 and Scala 2.11).

@asfgit
Copy link
Copy Markdown

asfgit commented Jun 28, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/5743/
Test PASSed (JDK 8 and Scala 2.12).

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Jun 28, 2017

Thanks for the PR. Is there any downside to adding the wait_until check you suggested? Not sure if we have any tests that would not to wait until the assignment has happened. If we do, we could make it optional, but true by default.

@ewencp
Copy link
Copy Markdown
Contributor Author

ewencp commented Jun 28, 2017

To be honest, I'm not sure if there is a downside. I doubt any tests would rely on not waiting, but I can't be sure. Also, it would still only apply to the new consumer.

I can update the patch and re-run tests, but there are also some tests failing that I need to check on -- some of them do seem to be related to the consumer, but not sure if they are new failures or the same as the nightly failures we saw recently.

@ewencp
Copy link
Copy Markdown
Contributor Author

ewencp commented Jun 28, 2017

Actually @ijuma I take it back. After writing the code I realize that adding that check doesn't really do anything. It would just be between the JMX code and the for loop over output. Since getting any stdout output would require that we've been assigned partitions, the loop effectively enforces that condition. The only thing we would gain by adding that check is some notification that we never successfully assigned partitions in the case that we get no output, but that seems like a very unusual case that we probably don't need to specifically catch.

@apurvam
Copy link
Copy Markdown
Contributor

apurvam commented Jun 28, 2017

I agree with @ewencp that adding the wait_until check for assigned partitions doesn't make sense in the context of these changes, since the presence of stdout implies that partitions are assigned.

However, we do have tests that don't depend on consuming messages, but do depend on having partitions assigned, like the produce_consume_validate.py which explicitly checks for assigned partitions. See https://github.com/apache/kafka/blob/trunk/tests/kafkatest/tests/produce_consume_validate.py#L53

In particular, in the produce_consume_validate context, it is possible that the topic being consumed from is empty when you start the consumer. In that case, waiting on has_partitions_assigned is the only way you can be sure that the consumer is ready and will not miss any messages from the producer which starts later.

That is the only use of has_partitions_assigned that I know of, and it doesn't apply here. So I think we can leave it out.

Copy link
Copy Markdown
Member

@ijuma ijuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @ewencp, I'll let you merge it as I am not sure which branches you want to merge this to.

@ewencp
Copy link
Copy Markdown
Contributor Author

ewencp commented Jul 18, 2017

Thanks, I'll merge back to 0.10.1 since that's how far back I merged the other fix. 0.10.1 will need a manual merge, but it should be straightforward.

asfgit pushed a commit that referenced this pull request Jul 18, 2017
… now have a more reliable test using JMX

Waiting for the first line of output was added in KAFKA-2527 when JmxMixin was originally added as a heuristic to
determine when the process was ready. We've since determined this is not good enough given JmxTool's limitations
and now include a separate, more reliable check before starting JmxTool. This check is also dangerous since a
consumer that is started before data is available in the topic, it won't output anything to stdout and only logs
errors to a separate log file. This means we may have a long delay between starting the process and starting JMX
monitoring.

Since we have a more reliable check for liveness via JMX now (and in cases that need it, partition assignment
metrics via JMX), we should no longer need to wait for the first line of output.

Author: Ewen Cheslack-Postava <me@ewencp.org>

Reviewers: Ismael Juma <ismael@juma.me.uk>, Apurva Mehta <apurva@confluent.io>

Closes #3447 from ewencp/dont-wait-first-line-console-consumer

(cherry picked from commit d663005)
Signed-off-by: Ewen Cheslack-Postava <me@ewencp.org>
asfgit pushed a commit that referenced this pull request Jul 18, 2017
… now have a more reliable test using JMX

Waiting for the first line of output was added in KAFKA-2527 when JmxMixin was originally added as a heuristic to
determine when the process was ready. We've since determined this is not good enough given JmxTool's limitations
and now include a separate, more reliable check before starting JmxTool. This check is also dangerous since a
consumer that is started before data is available in the topic, it won't output anything to stdout and only logs
errors to a separate log file. This means we may have a long delay between starting the process and starting JMX
monitoring.

Since we have a more reliable check for liveness via JMX now (and in cases that need it, partition assignment
metrics via JMX), we should no longer need to wait for the first line of output.

Author: Ewen Cheslack-Postava <me@ewencp.org>

Reviewers: Ismael Juma <ismael@juma.me.uk>, Apurva Mehta <apurva@confluent.io>

Closes #3447 from ewencp/dont-wait-first-line-console-consumer

(cherry picked from commit d663005)
Signed-off-by: Ewen Cheslack-Postava <me@ewencp.org>
@asfgit asfgit closed this in d663005 Jul 18, 2017
asfgit pushed a commit that referenced this pull request Jul 18, 2017
… now have a more reliable test using JMX

Waiting for the first line of output was added in KAFKA-2527 when JmxMixin was originally added as a heuristic to
determine when the process was ready. We've since determined this is not good enough given JmxTool's limitations
and now include a separate, more reliable check before starting JmxTool. This check is also dangerous since a
consumer that is started before data is available in the topic, it won't output anything to stdout and only logs
errors to a separate log file. This means we may have a long delay between starting the process and starting JMX
monitoring.

Since we have a more reliable check for liveness via JMX now (and in cases that need it, partition assignment
metrics via JMX), we should no longer need to wait for the first line of output.

Author: Ewen Cheslack-Postava <me@ewencp.org>

Reviewers: Ismael Juma <ismael@juma.me.uk>, Apurva Mehta <apurva@confluent.io>

Closes #3447 from ewencp/dont-wait-first-line-console-consumer

(cherry picked from commit d663005)
Signed-off-by: Ewen Cheslack-Postava <me@ewencp.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants