kubernetes-overlord-extension: Fix tasks not being shutdown #16711

Merged
georgew5656 merged 4 commits into apache:master from ac9817:fix-tasks-not-being-shutdown
Jul 15, 2024

@ac9817 ac9817 commented Jul 9, 2024

Description

In rare cases, when the fabric8 client is unable to talk to the k8s server, TaskQueue skips the complete shutdown logic because of the exception thrown while determining the task location. This PR lets the shutdown logic continue without populating the location on the task status, so that all the data structures are still cleaned up.
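The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `ShutdownSketch`, `safeLocation`, `activeTasks`, and `UNKNOWN_LOCATION` are hypothetical stand-ins, and the real Druid classes (TaskQueue, KubernetesPeonLifecycle) are not used here. The idea is only that a failed location lookup degrades to an "unknown" value instead of aborting the cleanup that follows.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for the queue and runner; NOT the actual Druid classes.
final class ShutdownSketch {
    // Placeholder used when the location cannot be determined (assumed name).
    static final String UNKNOWN_LOCATION = "unknown";

    // Bookkeeping that must be cleaned up when a task is shut down.
    final Map<String, String> activeTasks = new ConcurrentHashMap<>();

    // Wraps the lookup so that a failure never propagates to the caller.
    static String safeLocation(String taskId, Function<String, String> lookup) {
        try {
            return lookup.apply(taskId);
        } catch (Exception e) {
            // Log and fall back; the exception is included for diagnosis.
            System.err.println("Unable to find location for task [" + taskId + "]: " + e);
            return UNKNOWN_LOCATION;
        }
    }

    // Shutdown continues even if the location lookup failed.
    String shutdown(String taskId, Function<String, String> lookup) {
        String location = safeLocation(taskId, lookup);
        activeTasks.remove(taskId); // cleanup is always reached
        return location;
    }
}
```

With a lookup that throws (simulating an unreachable k8s server), `shutdown` still removes the task from `activeTasks` and reports the placeholder location.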


Key changed/added classes in this PR
  • KubernetesPeonLifecycle

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

maybePod = kubernetesClient.getPeonPod(taskId.getK8sJobName());
}
catch (Exception e) {
log.makeAlert("Unable to get location for task", e)

Check warning

Code scanning / CodeQL

Unused format argument

This format call refers to 0 argument(s) but supplies 1 argument(s).
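
To illustrate what this kind of warning means, here is a small sketch using plain `String.format` as a stand-in; it assumes the logger formats its message in a similar way, which is not confirmed here. `FormatArgSketch` and its methods are hypothetical names. When the format string has no placeholder, the extra argument is silently ignored.

```java
// Hypothetical illustration of an unused format argument; not Druid code.
final class FormatArgSketch {
    // No format specifier: the supplied argument is silently dropped.
    static String withoutPlaceholder(String taskId) {
        return String.format("Unable to get location for task", taskId);
    }

    // One "%s" specifier: the argument appears in the message.
    static String withPlaceholder(String taskId) {
        return String.format("Unable to get location for task [%s]", taskId);
    }
}
```

The first variant produces a message with no task identifier at all, which is the situation the scanner flags.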

@georgew5656 georgew5656 left a comment

would it make sense to just put this in KubernetesTaskRunner.getTaskLocation, since that's the method that TaskQueue calls? it seems like we really don't want that function specifically to ever throw an exception

@ac9817 ac9817 requested a review from georgew5656 July 15, 2024 15:00
Adithya Chakilam added 2 commits July 15, 2024 11:55
@georgew5656 georgew5656 merged commit 6cf6838 into apache:master Jul 15, 2024
}
}
catch (Exception e) {
log.warn("Unable to find location for task [%s]", taskId);

Is there a reason we are not logging the exception?

good point, will add it in

5 participants