Make KubernetesWorkItem.shutdown idempotent #18576
capistrant
left a comment
Seems like a logical solution to me. This might be a good time to step back and consider whether the log watch in shutdown is even necessary. What do you think?
this.kubernetesPeonLifecycle.shutdown();
if (isShutdown.compareAndSet(false, true)) {
  synchronized (this) {
    this.kubernetesPeonLifecycle.startWatchingLogs();
Do we even need to call startWatchingLogs() here? This is somewhat off topic for the PR, but part of the reason shutdown is slow, and can even hang, is that the LogWatch is unhealthy. #18444 put a time limit on log persistence when saving off task logs. In that code path we are actually writing the logs out somewhere. I don't see how the call here is useful if we are just trying to shut the task down.
Yeah, I agree. The only apparent effect of calling startWatchingLogs() is to initialize the KubernetesPeonLifecycle.logWatch field, which would be initialized anyway via join() -> finally -> saveLogs() -> doSaveLogs().
We can explore that in a separate PR.
Opened #18579 to propose dropping the LogWatch init in shutdown.
Subsequent calls to KubernetesWorkItem.shutdown() should not block; they should return immediately.
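For illustration, a minimal sketch of the idempotency guard, assuming an AtomicBoolean flag as in the diff excerpt. FakeLifecycle and WorkItem are hypothetical stand-ins, not the actual Druid classes, and the boolean return value is added only so the behavior is observable here.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class IdempotentShutdownSketch {
    /** Hypothetical stand-in for the peon lifecycle; it only counts shutdown calls. */
    static class FakeLifecycle {
        final AtomicInteger shutdownCalls = new AtomicInteger();

        void shutdown() {
            shutdownCalls.incrementAndGet();
        }
    }

    /** Hypothetical stand-in for the work item, reduced to the shutdown guard. */
    static class WorkItem {
        private final AtomicBoolean isShutdown = new AtomicBoolean(false);
        private final FakeLifecycle lifecycle;

        WorkItem(FakeLifecycle lifecycle) {
            this.lifecycle = lifecycle;
        }

        /** Returns true only for the call that actually performed the shutdown. */
        boolean shutdown() {
            // The flag is checked before taking the monitor, so a repeat caller
            // returns here even if the first caller is still inside the block.
            if (!isShutdown.compareAndSet(false, true)) {
                return false;
            }
            synchronized (this) {
                lifecycle.shutdown();
            }
            return true;
        }
    }

    public static void main(String[] args) {
        FakeLifecycle lifecycle = new FakeLifecycle();
        WorkItem item = new WorkItem(lifecycle);
        System.out.println(item.shutdown());               // true
        System.out.println(item.shutdown());               // false
        System.out.println(lifecycle.shutdownCalls.get()); // 1
    }
}
```

Because the compareAndSet happens outside the synchronized block, a second caller never waits on the monitor held by a first call that is stuck.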
Description
This patch tries to address the following condition, which is caused by a bug in the fabric8 client.
The callback executor in TaskQueue can get stuck in the following scenario.
- TaskQueue tries to shut down a task via notifyStatus() and queues up a TaskRunner.shutdown() on the callback executor.
- TaskQueue then removes the entry for this task from its in-memory data structures.
- TaskRunner.shutdown() remains stuck due to the fabric8 bug.
- TaskRunner still has the task in its memory.
- When TaskQueue periodically manages itself via startPendingTasksOnRunner(), it realizes that the TaskRunner has this unknown task and tries to shut it down again.
Fix
Subsequent calls to KubernetesWorkItem.shutdown() should not block; they should return immediately.
This PR has: