Skip to content

KAFKA-12867: Fix ConsumeBenchWorker exit behavior for maxMessages config#10797

Merged
rajinisivaram merged 1 commit intoapache:trunkfrom
kowshik:KAFKA-12867_fix_trogdor_ConsumeBenchWorker
Jun 2, 2021
Merged

KAFKA-12867: Fix ConsumeBenchWorker exit behavior for maxMessages config#10797
rajinisivaram merged 1 commit intoapache:trunkfrom
kowshik:KAFKA-12867_fix_trogdor_ConsumeBenchWorker

Conversation

@kowshik
Copy link
Copy Markdown
Contributor

@kowshik kowshik commented May 31, 2021

Bug:
The trogdor ConsumeBenchWorker has a bug. It allows several consumption tasks to be run in parallel, the number is configurable using the threadsPerWorker config. If one of the consumption tasks completes executing successfully due to maxMessages being consumed, then, the consumption task prematurely notifies the doneFuture causing the entire ConsumeBenchWorker to halt. This becomes a problem when more than 1 consumption task is running in parallel, because the successful completion of 1 of the tasks shuts down the entire worker while the other tasks are still running. When the worker is shut down, it kills all the active consumption tasks, though they have not consumed maxMessages yet. This is not the desired behavior.

How to reproduce?:
The bug is easily reproducible by running a ConsumeBenchSpec task configured with maxMessages value and threadsPerWorker > 1. When a trogdor workload is started with such a spec, and when one of the threads (i.e. consumption task) has consumed maxMessages, you can notice that it prematurely shuts down the worker although the other threads have not yet consumed at least maxMessages.

Fix:
The fix is to defer the notification of the doneFuture to the CloseStatusUpdater thread. This thread is already responsible for tracking the status of the tasks and updating their status when all of the tasks complete, so this seems like the right place to inject the doneFuture notification.

@kowshik
Copy link
Copy Markdown
Contributor Author

kowshik commented May 31, 2021

cc @junrao @apovzner @rajinisivaram for review.

@apovzner It appears this behavior has been around since ConsumeBenchWorker was first implemented: #4775. Please let me know your thoughts.

Copy link
Copy Markdown
Contributor

@rajinisivaram rajinisivaram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kowshik Thanks for the PR, LGTM. Test failures not related, merging to trunk.

@rajinisivaram rajinisivaram merged commit 13ba9c9 into apache:trunk Jun 2, 2021
@kowshik kowshik deleted the KAFKA-12867_fix_trogdor_ConsumeBenchWorker branch June 3, 2021 07:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants