KAFKA-7576: Fix shutdown of replica fetcher threads by rajinisivaram · Pull Request #5875 · apache/kafka

rajinisivaram · 2018-11-02T23:07:52Z

ReplicaFetcherThread.shutdown attempts to close the fetcher's Selector while the thread is running. This is unsafe and can result in Selector.close() failing with an exception. The exception is caught and logged at debug level, but this can lead to a socket leak if the shutdown is due to a dynamic config update rather than broker shutdown. This PR changes the shutdown logic to close the Selector after the replica fetcher thread has shut down, with a wakeup() and a flag to terminate blocking sends first.
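For readers skimming the description, the ordering can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `SelectorWorker` and its methods are hypothetical names, and it uses `java.nio.channels.Selector` as a stand-in for Kafka's own Selector. The JDK selector tolerates a concurrent close, so the sketch only shows the sequence (set flag, wake up, join, then close), not the original failure.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical worker thread; a stand-in, not Kafka's ReplicaFetcherThread. */
final class SelectorWorker extends Thread {
    private final Selector selector;
    private final AtomicBoolean running = new AtomicBoolean(true);

    SelectorWorker() {
        try {
            this.selector = Selector.open();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void run() {
        try {
            while (running.get()) {
                // Blocks until a channel is ready or wakeup() is called.
                selector.select();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stops the thread first and closes the selector only afterwards. */
    void shutdown() {
        running.set(false);   // flag checked by the loop in run()
        selector.wakeup();    // unblocks a pending (or the next) select()
        try {
            join();           // wait until the thread no longer uses the selector
            selector.close(); // safe now: no other thread is inside select()
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted during shutdown", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean selectorOpen() {
        return selector.isOpen();
    }
}
```

Setting the flag before calling `wakeup()` matters: if the wakeup arrives before the thread enters `select()`, the next `select()` returns immediately and the loop then sees the flag.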

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

ijuma · 2018-11-03T00:49:01Z

Is the bug being fixed here a regression?

rajinisivaram · 2018-11-03T11:14:11Z

@ijuma It is not exactly a regression, since the bug was originally introduced under KAFKA-6051, which went into 1.1, the same release as dynamic config updates. That change can result in an exception being thrown from replica fetcher shutdown, either during broker shutdown or when the number of replica fetchers is reduced using a dynamic config update. In 2.0.1 and 2.1.0, the exception propagation was fixed under KAFKA-7464, which catches and ignores the exception. That may be OK during broker shutdown, but it would result in resource leakage with a dynamic config update. So this PR is essentially redoing the changes made under KAFKA-6051 and KAFKA-7464.

ijuma · 2018-11-03T16:51:12Z

Wouldn't it be better to have a way to initiate an orderly shutdown of KafkaClient? It seems a bit more general.

@ijuma Thanks for the review. I have updated with a new method in KafkaClient to initiate shutdown. Not sure if it matches what you had in mind. Let me know what you think.

ijuma · 2018-11-12T19:54:43Z

@hachikuji can you please review this one?

hachikuji · 2018-11-13T17:05:24Z

retest this please

hachikuji

Thanks @rajinisivaram. Looks good. Just had a few small comments.

hachikuji · 2018-11-14T18:20:30Z

Was there a strong reason to check aborted sends before this?

Not really, moved it to the start of the method.

hachikuji · 2018-11-14T18:41:49Z

nit: the "shutdown" at the end seems unintended?

hachikuji · 2018-11-14T18:44:43Z

This is kind of icky. I wonder if we should consider adding a field to ShutdownableThread to save the failure exception?

Agree this is not nice. The first log entry comparison was added under KAFKA-7464, and I extended that by adding another of the same type. I think we want the test to ensure that exceptions are not propagated to the caller and that we are closing the blockingSend instance. That is tested by this subset of the test:

    thread.initiateShutdown()
    thread.awaitShutdown()
    verify(mockBlockingSend)

We don't really need to verify the log entry at all for this. I suppose the log entry comparison checks that the test is correct by verifying that it did inject that exception, but I'm not sure we really need to do that. It feels unnecessary to store away the exception in ShutdownableThread just to check this. What do you think?

Yes, that makes sense to me.

@hachikuji Thank you, updated.
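The two properties discussed in this thread (the injected exception does not reach the caller, and the send is still closed) can be sketched without a mocking library. Everything below is a hypothetical stand-in: `FailingSend` and `QuietShutdownThread` are illustrative names, not Kafka's ShutdownableThread or the actual test, and the real test uses a mock with `verify` rather than a hand-rolled flag.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the mocked blocking send: close() always fails. */
final class FailingSend implements AutoCloseable {
    final AtomicBoolean closeCalled = new AtomicBoolean(false);

    @Override
    public void close() {
        closeCalled.set(true); // record the call before failing
        throw new IllegalStateException("injected failure");
    }
}

/** Hypothetical thread with a two-step shutdown, mirroring the calls quoted above. */
final class QuietShutdownThread extends Thread {
    private final AutoCloseable send;
    private volatile boolean running = true;

    QuietShutdownThread(AutoCloseable send) {
        this.send = send;
    }

    @Override
    public void run() {
        while (running) {
            try {
                Thread.sleep(10); // placeholder for real work
            } catch (InterruptedException e) {
                return; // interrupted by initiateShutdown()
            }
        }
    }

    void initiateShutdown() {
        running = false;
        interrupt();
    }

    /** Waits for the thread to exit, then closes the send, swallowing failures. */
    void awaitShutdown() {
        try {
            join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        try {
            send.close();
        } catch (Exception e) {
            // Logged rather than rethrown, so callers never see the failure.
            System.err.println("Ignoring failure while closing send: " + e);
        }
    }
}
```

A test over this sketch only needs to call `initiateShutdown()` and `awaitShutdown()` and then check `closeCalled`; no log inspection is involved.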

hachikuji · 2018-11-14T18:48:45Z

I guess there may not be a non-intrusive way to accomplish this more reliably. Perhaps we could check the size of the message queue?

I was being lazy here, updated the test to wait for the server to receive one byte.
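Waiting for the server to receive one byte can be sketched with a latch, as below. This is only an illustration of the synchronization idea under stated assumptions: `OneByteServer` and `sendOneByte` are hypothetical names built on plain `java.net` sockets, and the actual test in this PR may be structured quite differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical test server that signals once the first byte has arrived. */
final class OneByteServer implements AutoCloseable {
    private final CountDownLatch firstByte = new CountDownLatch(1);
    private final ServerSocket serverSocket = openLoopback();

    OneByteServer() {
        Thread acceptor = new Thread(() -> {
            try (Socket socket = serverSocket.accept()) {
                // read() returns -1 on end of stream, otherwise the byte value.
                if (socket.getInputStream().read() != -1) {
                    firstByte.countDown();
                }
            } catch (IOException e) {
                // Server closed before a byte arrived; the latch stays closed.
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
    }

    private static ServerSocket openLoopback() {
        try {
            // Port 0 asks the OS for any free port.
            return new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int port() {
        return serverSocket.getLocalPort();
    }

    /** Returns true if a byte was received within the timeout. */
    boolean awaitFirstByte(long timeoutMs) {
        try {
            return firstByte.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Client-side helper: connects to the given loopback port and writes one byte. */
    static void sendOneByte(int port) {
        try (Socket client = new Socket(InetAddress.getLoopbackAddress(), port)) {
            client.getOutputStream().write(1);
            client.getOutputStream().flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            serverSocket.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Compared with sleeping for a fixed interval, blocking on the latch ties the test to an observable event, which is the reliability concern raised in the comment above.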

rajinisivaram · 2018-11-15T11:28:32Z

@hachikuji Thanks for the review. I addressed some of the comments and left a couple of questions for the remaining.

hachikuji

LGTM. Thanks Rajini!

Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>

rajinisivaram force-pushed the KAFKA-7576-replica-fetcher-update branch from 6bea963 to dabb207 Compare November 3, 2018 11:05

ijuma reviewed Nov 3, 2018

rajinisivaram requested a review from ijuma November 9, 2018 09:53

ijuma requested a review from hachikuji November 12, 2018 19:54

hachikuji reviewed Nov 14, 2018

rajinisivaram and others added 5 commits November 15, 2018 11:01

KAFKA-7576: Fix shutdown of replica fetcher threads

5fb461d

Address review comment

6c09845

Initiate shutdown for controller request send thread as well

42ce4c4

Add test for controller shutdown

6661b73

Address review comments

b4c0a9b

rajinisivaram force-pushed the KAFKA-7576-replica-fetcher-update branch from 393120b to b4c0a9b Compare November 15, 2018 11:01

Address review comment

49f6212

hachikuji approved these changes Nov 16, 2018

hachikuji merged commit 1a4d44f into apache:trunk Nov 16, 2018
