

@dirrao (Contributor) commented Nov 11, 2023

Description

The scheduler runs several housekeeping tasks at a fixed frequency: adopt_or_reset_orphaned_tasks, check_trigger_timeouts, _emit_pool_metrics, _find_zombies, clear_not_launched_queued_tasks and _check_worker_pods_pending_timeout. Right now, we don't have any latency metrics for this housekeeping work, even though it can delay the scheduler heartbeat. It's a good idea to capture these latency metrics so we can identify slow operations and tune the Airflow configuration.
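The idea of putting a latency metric around each housekeeping call can be sketched as follows. This is a minimal, standard-library-only illustration, not Airflow's actual implementation. The in-memory `TIMINGS` recorder is a hypothetical stand-in for a real metrics backend such as StatsD. The `housekeeping.<name>.duration` metric names and the stub function bodies are also hypothetical. Each call is wrapped in a timer that records its wall-clock duration, even when the call raises.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from functools import wraps

# Hypothetical in-memory sink standing in for a real metrics backend (e.g. StatsD).
# Maps metric name -> list of observed durations in milliseconds.
TIMINGS: dict[str, list[float]] = defaultdict(list)


@contextmanager
def timer(metric_name: str):
    """Record the wall-clock duration of the enclosed block under metric_name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record even if the block raises, so slow failures are still visible.
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        TIMINGS[metric_name].append(elapsed_ms)


def timed(metric_name: str):
    """Decorator form of timer() for wrapping a whole housekeeping function."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            with timer(metric_name):
                return func(*args, **kwargs)
        return wrapper
    return decorator


# Hypothetical stubs named after two of the housekeeping functions above;
# the sleeps only simulate work so that the recorded durations are non-trivial.
@timed("housekeeping.adopt_or_reset_orphaned_tasks.duration")
def adopt_or_reset_orphaned_tasks() -> int:
    time.sleep(0.01)
    return 0


@timed("housekeeping.clear_not_launched_queued_tasks.duration")
def clear_not_launched_queued_tasks() -> int:
    time.sleep(0.02)
    return 0


if __name__ == "__main__":
    adopt_or_reset_orphaned_tasks()
    clear_not_launched_queued_tasks()
    for name, durations in TIMINGS.items():
        print(f"{name}: {durations[-1]:.1f} ms")
```

In a real deployment, the recorder would be replaced by a timing call on the configured metrics client. The recorded durations could then be compared against the heartbeat interval to show which housekeeping step is delaying the scheduler.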

Use case/motivation

Because we run Airflow at a large scale, we found that the adopt_or_reset_orphaned_tasks and clear_not_launched_queued_tasks functions could take several minutes before the bug fix in #34877. This delays the scheduler heartbeat and can cause the scheduler instance to be restarted or killed. To detect such latency issues, we need metrics that capture these durations.

closes: #31957

@boring-cyborg boring-cyborg bot added area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues labels Nov 11, 2023
@potiuk (Member) left a comment


Good idea.

@potiuk (Member) commented Nov 11, 2023

@potiuk (Member) left a comment


The change is good but needs a documentation update.

@dirrao (Contributor, Author) commented Nov 11, 2023

> Change is good but needs documentation update.

Metrics documentation updated

@dirrao dirrao requested a review from potiuk November 11, 2023 11:28
@eladkal eladkal changed the title kubernetes executor a few crtical and time consuming functions timing… KubernetesExecutor observability Improvements Nov 11, 2023
@potiuk potiuk merged commit cd296d2 into apache:main Nov 12, 2023
@dirrao dirrao deleted the 31957-observability_improvment branch November 13, 2023 02:34
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Nov 20, 2023
@ephraimbuddy ephraimbuddy added this to the Airflow 2.8.0 milestone Nov 20, 2023

Labels

area:providers changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) provider:cncf-kubernetes Kubernetes (k8s) provider related issues


Development

Successfully merging this pull request may close these issues.

Airflow Observability Improvement Request

3 participants