KubernetesExecutor observability Improvements #35579

dirrao · 2023-11-11T02:44:10Z

Description

The scheduler runs housekeeping work (adopt_or_reset_orphaned_tasks, check_trigger_timeouts, _emit_pool_metrics, _find_zombies, clear_not_launched_queued_tasks and _check_worker_pods_pending_timeout) at regular intervals. Right now, we don't have any latency metrics for this housekeeping work, even though it can delay the scheduler heartbeat. It's a good idea to capture these latency metrics so that slow functions can be identified and the Airflow configuration tuned accordingly.
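The idea of timing each housekeeping function can be sketched in plain Python: wrap the function so that its wall-clock duration is emitted as a timing metric on every call. This is a minimal sketch under stated assumptions, not the code merged in this PR. `InMemoryStats`, `timed` and the metric name are hypothetical stand-ins for Airflow's real metrics client and the metric names the PR actually adds.

```python
import functools
import time
from collections import defaultdict
from typing import Any, Callable, Dict, List


class InMemoryStats:
    """Hypothetical stand-in for a StatsD-style metrics client.

    Stores every timing sample (in milliseconds) under its metric name.
    """

    def __init__(self) -> None:
        self.timings: Dict[str, List[float]] = defaultdict(list)

    def timing(self, name: str, value_ms: float) -> None:
        self.timings[name].append(value_ms)


stats = InMemoryStats()


def timed(metric_name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that emits each call's wall-clock duration as a timing metric."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even when the housekeeping call raises,
                # so slow failures still show up in the metrics.
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                stats.timing(metric_name, elapsed_ms)

        return wrapper

    return decorator


# Hypothetical metric name; the names actually added by the PR may differ.
@timed("kubernetes_executor.clear_not_launched_queued_tasks.duration")
def clear_not_launched_queued_tasks() -> int:
    """Placeholder for the real housekeeping work: sleeps briefly, returns 0."""
    time.sleep(0.01)
    return 0


if __name__ == "__main__":
    clear_not_launched_queued_tasks()
    samples = stats.timings["kubernetes_executor.clear_not_launched_queued_tasks.duration"]
    print(f"recorded {len(samples)} sample(s), latest {samples[-1]:.1f} ms")
```

Using `try`/`finally` means a housekeeping call that fails still reports how long it ran. In a real deployment the samples would go to the configured metrics backend (for example StatsD) rather than an in-memory dict, where they can be graphed against the scheduler heartbeat.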

Use case/motivation

As we run Airflow at a large scale, we found that the adopt_or_reset_orphaned_tasks and clear_not_launched_queued_tasks functions could take several minutes before the bug fix in #34877. This delays the scheduler heartbeat and can lead to the scheduler instance being restarted or killed. To detect these latency issues, we need metrics that capture these durations.

closes: #31957


potiuk

Good idea.

potiuk · 2023-11-11T10:32:20Z

Can you, however, also add the metrics to the documentation: https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/metrics.html#metric-descriptions

potiuk

Change is good but needs documentation update.

dirrao · 2023-11-11T11:28:27Z

> Change is good but needs documentation update.

Metrics documentation updated

gopal added 2 commits November 11, 2023 08:09

kubernetes executor a few crtical and time consuming functions timing…

dc689a4


kubernetes executor a few crtical and time consuming functions timing…

16afab4


dirrao requested review from hussein-awala and jedcunningham as code owners November 11, 2023 02:44

boring-cyborg bot added area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues labels Nov 11, 2023

potiuk approved these changes Nov 11, 2023


potiuk requested changes Nov 11, 2023


metrics documentation update

e0f9013

dirrao requested a review from potiuk November 11, 2023 11:28

Merge branch 'main' into 31957-observability_improvment

8c44d0d

eladkal changed the title ~~kubernetes executor a few crtical and time consuming functions timing…~~ KubernetesExecutor observability Improvements Nov 11, 2023

potiuk approved these changes Nov 12, 2023


potiuk merged commit cd296d2 into apache:main Nov 12, 2023

dirrao deleted the 31957-observability_improvment branch November 13, 2023 02:34

ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Nov 20, 2023

ephraimbuddy added this to the Airflow 2.8.0 milestone Nov 20, 2023

eladkal mentioned this pull request Nov 24, 2023

Status of testing Providers that were prepared on November 24, 2023 #35845

Closed

69 tasks

eladkal mentioned this pull request Jan 16, 2024

ECS Executor - add support to adopt orphaned tasks. #36803

Closed

dirrao commented Nov 11, 2023 • edited by eladkal