@AutomationDev85
Contributor

@AutomationDev85 AutomationDev85 commented Jan 24, 2025

Overview

This PR enables the task.scheduled_duration metric. It is already described here:
https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/metrics.html#timers
Search for "task.scheduled_duration".
The metric does not work at the moment because the scheduled_dttm field is missing from the task_instance table.

I hope I caught all the places in the code where the task_instance is set to the scheduled state, so that the right time is collected, and that I understood the metric correctly.
The idea of this PR is to keep things as simple as possible while enabling this metric.

Details of changes:

  • Add the scheduled_dttm field to the task_instance table.
  • Set the scheduled_dttm field when a task is set to the scheduled state.
  • Enable the metric calculation.
  • Update the API to return the task_instance with the new field.
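
The changes listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `TaskInstanceSketch`, `set_scheduled`, and `set_queued` are hypothetical names, and the sketch assumes the metric is the time between entering the scheduled state and entering the queued state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class TaskInstanceSketch:
    """Hypothetical stand-in for a task_instance row; not Airflow's model."""

    state: Optional[str] = None
    scheduled_dttm: Optional[datetime] = None  # the new field
    queued_dttm: Optional[datetime] = None


def set_scheduled(ti: TaskInstanceSketch, now: datetime) -> None:
    """Record the timestamp at the moment the state becomes 'scheduled'."""
    ti.state = "scheduled"
    ti.scheduled_dttm = now


def set_queued(ti: TaskInstanceSketch, now: datetime) -> Optional[timedelta]:
    """Move to 'queued' and return the scheduled duration, if known."""
    ti.state = "queued"
    ti.queued_dttm = now
    if ti.scheduled_dttm is None:
        # Without a recorded scheduled time there is nothing to measure.
        return None
    return now - ti.scheduled_dttm


# Usage: a task that waits 7 seconds in the scheduled state.
t0 = datetime(2025, 1, 24, 12, 0, 0, tzinfo=timezone.utc)
ti = TaskInstanceSketch()
set_scheduled(ti, t0)
duration = set_queued(ti, t0 + timedelta(seconds=7))
```

In a real deployment the returned duration would be handed to the metrics backend as a timer value; that part is left out here because it depends on the stats client in use.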

Looking forward to discussing this metric and getting it live with Airflow 3.

Relates: #30612 #34493 #34771

@boring-cyborg boring-cyborg bot added area:API Airflow's REST/HTTP API area:db-migrations PRs with DB migration area:Scheduler including HA (high availability) scheduler area:Triggerer area:UI Related to UI/UX. For Frontend Developers. area:webserver Webserver related Issues kind:documentation labels Jan 24, 2025
@jscheffl jscheffl added the legacy api Whether legacy API changes should be allowed in PR label Jan 24, 2025
@jscheffl jscheffl closed this Jan 24, 2025
@jscheffl jscheffl reopened this Jan 24, 2025
@jscheffl jscheffl requested a review from Copilot January 24, 2025 15:08
Contributor

Copilot AI left a comment

Copilot reviewed 8 out of 23 changed files in this pull request and generated no comments.

Files not reviewed (15)
  • airflow/api_fastapi/core_api/openapi/v1-generated.yaml: Language not supported
  • docs/apache-airflow/img/airflow_erd.sha256: Language not supported
  • docs/apache-airflow/migrations-ref.rst: Language not supported
  • airflow/api_fastapi/core_api/datamodels/task_instances.py: Evaluated as low risk
  • airflow/api_connexion/schemas/task_instance_schema.py: Evaluated as low risk
  • tests/api_connexion/endpoints/test_task_instance_endpoint.py: Evaluated as low risk
  • airflow/models/dag.py: Evaluated as low risk
  • tests/api_connexion/schemas/test_task_instance_schema.py: Evaluated as low risk
  • airflow/models/dagrun.py: Evaluated as low risk
  • airflow/models/taskinstancehistory.py: Evaluated as low risk
  • tests/api_connexion/endpoints/test_mapped_task_instance_endpoint.py: Evaluated as low risk
  • airflow/models/taskinstance.py: Evaluated as low risk
  • tests/api_fastapi/core_api/routes/public/test_task_instances.py: Evaluated as low risk
  • airflow/models/trigger.py: Evaluated as low risk
  • airflow/triggers/base.py: Evaluated as low risk

Contributor

@jscheffl jscheffl left a comment

To me this looks good and reasonable. But since I am not an expert in the Scheduler details, I would like another pair of eyes to review it before merging.

(In any case, the merge conflict needs to be resolved before merging.)

@eladkal
Contributor

eladkal commented Jan 25, 2025

> This PR enables the task.scheduled_duration metric

Does this mean the metric was never produced in Airflow 2.x?

@jscheffl
Contributor

>> This PR enables the task.scheduled_duration metric

> Does this mean the metric was never produced in Airflow 2.x?

Yes, it was defined but never populated, because the scheduled time was not recorded. Only the queued time was written to the DB, so only the metric based on it could be produced. A good overview is in #30612.

@eladkal
Contributor

eladkal commented Jan 25, 2025

>>> This PR enables the task.scheduled_duration metric

>> Does this mean the metric was never produced in Airflow 2.x?

> Yes, it was defined but never filled - as the scheduled time was not recorded. Only the queued time was written into the DB then the metric was able to be produced. Good overview is in #30612

#45285 reports it was rarely sent. Not that it wasn't sent at all. That's odd.

@AutomationDev85 AutomationDev85 force-pushed the feature/enable-scheduled-duration-metric branch from eae3f28 to 79cf999 Compare January 27, 2025 06:59
@jscheffl
Contributor

>>>> This PR enables the task.scheduled_duration metric

>>> Does this mean the metric was never produced in Airflow 2.x?

>> Yes, it was defined but never filled - as the scheduled time was not recorded. Only the queued time was written into the DB then the metric was able to be produced. Good overview is in #30612

> #45285 reports it was rarely sent. Not that it wasn't sent at all. That's odd.

I don't know in which cases it was "rarely" sent. @AutomationDev85, do you know in which cases the metric was emitted in the past?

@eladkal does anything speak against merging this for Airflow 3 to make it "proper"? (e.g. additional column that adds overhead?)

@AutomationDev85
Contributor Author

AutomationDev85 commented Jan 28, 2025

@jscheffl That is a good question. I'm not sure how this metric could have been exported even rarely. In the current implementation the metric is exported only if start_date is available, but as far as I can tell start_date is set only after the task has moved from the scheduled state into the queued state.

Maybe if a task reruns, and the task_instance already includes a start_date that was not moved into task_instance_history, the metric could be exported, but the value would then be wrong for this metric. I do not know in which cases we can end up in such a state, though. :(
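
The behaviour described above can be illustrated with a small sketch. It is an assumption-based model of the guard being discussed, not a copy of Airflow's implementation; the function name `scheduled_duration_from_start_date` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def scheduled_duration_from_start_date(
    start_date: Optional[datetime], now: datetime
) -> Optional[timedelta]:
    """Hypothetical guard: emit only when a start_date is already present."""
    if start_date is None:
        # Typical first run: start_date is not set yet, so nothing is emitted.
        return None
    # Rerun with a leftover start_date: a value is emitted, but it is measured
    # from the previous run's start, not from the time of scheduling.
    return now - start_date


now = datetime(2025, 1, 28, 12, 0, 0, tzinfo=timezone.utc)

# First try: no start_date yet, so the metric is skipped.
first_try = scheduled_duration_from_start_date(None, now)

# Rerun: a start_date from an earlier attempt is still on the row.
stale_start = now - timedelta(hours=1)
rerun = scheduled_duration_from_start_date(stale_start, now)
```

Under these assumptions the first try yields no value at all, while the rerun yields one hour, which reflects the age of the old start_date and not the time spent in the scheduled state.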

@AutomationDev85 AutomationDev85 force-pushed the feature/enable-scheduled-duration-metric branch from 79cf999 to 6dd047d Compare January 28, 2025 11:55
@AutomationDev85 AutomationDev85 force-pushed the feature/enable-scheduled-duration-metric branch from 6dd047d to 1e06124 Compare January 30, 2025 10:15
@AutomationDev85 AutomationDev85 force-pushed the feature/enable-scheduled-duration-metric branch from 1e06124 to b3071b8 Compare January 30, 2025 15:53
@jscheffl
Contributor

Nobody is objecting. I expected a bit more of a discussion. I propose merging this as an observability improvement, with the trade-off of an additional column in the DB.

@jscheffl jscheffl merged commit 24d0fb9 into apache:main Jan 30, 2025
61 checks passed
jason810496 added a commit to jason810496/airflow that referenced this pull request Jan 31, 2025
jason810496 added a commit to jason810496/airflow that referenced this pull request Jan 31, 2025
Prab-27 pushed a commit to Prab-27/airflow that referenced this pull request Jan 31, 2025
* Enable scheduled_dttm field in task_instance

* Fixed unit tests

* Fixed Unit test

* Removed comment

* Fixed unit tests

---------

Co-authored-by: Marco Küttelwesch <marco.kuettelwesch@de.bosch.com>
shahar1 pushed a commit that referenced this pull request Feb 1, 2025
* Remove Alembic migration autogenerated comment in #46009

* Add new erd hash