Skip to content

Add an option to load the dags from db for command tasks run#32038

Merged
potiuk merged 1 commit intoapache:mainfrom
hussein-awala:feat/tasks_run_from_db
Jun 23, 2023
Merged

Add an option to load the dags from db for command tasks run#32038
potiuk merged 1 commit intoapache:mainfrom
hussein-awala:feat/tasks_run_from_db

Conversation

@hussein-awala
Copy link
Member

@hussein-awala hussein-awala commented Jun 20, 2023

closes: #32020

This PR adds a new param --read-from-db to tasks run command, when it is provided, the CLI will try to load the dag from the Metadata instead of parsing the dags files.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

Signed-off-by: Hussein Awala <hussein@awala.fr>
@hussein-awala hussein-awala added the type:new-feature Changelog: New Features label Jun 20, 2023
@hussein-awala hussein-awala added this to the Airflow 2.7.0 milestone Jun 20, 2023
Copy link
Contributor

@vandonr-amz vandonr-amz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't you think making reading from the DB the default and having a flag to force a full re-parse would make more sense ?
Why make the slow command the default when the fast one is going to be what the user wants 99% of the time ?

Also, it'd probably more likely occur to me to look in the doc for an option to force a refresh than for an option that does not refresh.

@shubham22
Copy link

shubham22 commented Jun 20, 2023

Agree with @vandonr-amz on making the default option as optimized version and then letting customers decide if they want to refresh the DAGs. And do we need same solution for the REST API https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#operation/get_task_instance? REST API doesn't seem to parse.

@ephraimbuddy
Copy link
Contributor

ephraimbuddy commented Jun 21, 2023

don't you think making reading from the DB the default and having a flag to force a full re-parse would make more sense ? Why make the slow command the default when the fast one is going to be what the user wants 99% of the time ?

Also, it'd probably more likely occur to me to look in the doc for an option to force a refresh than for an option that does not refresh.

It will be a breaking change for those that uses run_as_user in their dags if you make this change the default. Needs investigation...

@ephraimbuddy
Copy link
Contributor

@hussein-awala , even though you linked the issue you are working on in this PR, it's necessary to provide a good commit message for the PR

@hussein-awala
Copy link
Member Author

hussein-awala commented Jun 21, 2023

It will be a breaking change for those that uses run_as_user in their dags if you make this change the default. Needs investigation...

@ephraimbuddy I agree, but I think we should we can add a new Airflow configuration to specify the default value for this parameter. wdyt?
To do this, we would need to replace the boolean parameter with a string parameter, like:

DEFAULT_DAGS_SOURCE_FOR_TASKS_RUN = "dags_folder"
# or
DEFAULT_DAGS_SOURCE_FOR_TASKS_RUN = "metadata_db"

By implementing this approach, the worker will be capable of executing the task without requiring the parsing of the DAG file. Consequently, it will be possible to run the task without including them in the worker container. (testing is necessary to ensure its effectiveness)

@hussein-awala
Copy link
Member Author

@hussein-awala , even though you linked the issue you are working on in this PR, it's necessary to provide a good commit message for the PR

👍
I just added a description

@potiuk potiuk merged commit 05a67ef into apache:main Jun 23, 2023
ferruzzi pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Jun 27, 2023
@tirkarthi
Copy link
Contributor

tirkarthi commented Jul 6, 2023

Reading from SerializedModel was originally added as part of AIP-45 in https://github.com/apache/airflow/pull/21877/files#diff-ad618185a072910e49c11770954af009d1cc070b120a4fde5f2fc095a588271bR360-R363 . It was changed in #26750 . Not reading from database was intentional as @ephraimbuddy mentioned. I am not sure if I am misunderstanding something where serialized models where changed after 2.4 to allow this.

cc: @pingzh

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:CLI type:new-feature Changelog: New Features

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Airflow tasks run -m cli command giving 504 response

7 participants