Support DAGS folder being in different location on scheduler and runners #16860
airflow/jobs/scheduler_job.py
These are the defaults, so we don't need to pass them again.
airflow/models/dag.py
This relationship was added for "Duck-type compatibility" between a DAG and a DagModel in the case of subdags -- both now have a "parent_dag" attribute.
airflow/models/dag.py
How will this field behave for example DAGs?
In this case it would be /usr/lib/python3.7/.../airflow/example_dags/example_bash_operator.py -- so not relative at all.
I'll update/expand the docstring to say how it deals with DAGs outside of the dags folder.
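A minimal sketch of the behaviour described above, not the code in this PR: the stored location is made relative to the DAGs folder when the file lives inside it, and is left as the absolute path otherwise (as happens for the bundled example DAGs). The function name `relative_fileloc` as a free function and all sample paths are assumptions for illustration.

```python
from pathlib import PurePosixPath


def relative_fileloc(fileloc: str, dags_folder: str) -> PurePosixPath:
    """Return fileloc relative to dags_folder, or unchanged if it lies outside."""
    path = PurePosixPath(fileloc)
    try:
        return path.relative_to(dags_folder)
    except ValueError:
        # The file is not under dags_folder (e.g. a bundled example DAG),
        # so there is nothing to make it relative to.
        return path


# Hypothetical paths, chosen only to exercise both branches.
inside = relative_fileloc("/opt/airflow/dags/team/etl.py", "/opt/airflow/dags")
outside = relative_fileloc(
    "/srv/site-packages/airflow/example_dags/example_bash_operator.py",
    "/opt/airflow/dags",
)
```

Here `inside` is the relative path `team/etl.py`, while `outside` keeps its absolute form because it is not under the DAGs folder.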
Maybe we could also handle example DAGs by adding a suffix? This would allow us to install workers on shared machines without any problems.
For example:
[example_dags_folder]/example_bash_operator.py => /usr/lib/python3.7/.../airflow/example_dags/example_bash_operator.py
[dags_folder]/example_bash_operator.py => ~/home/airflow/dags
I think this change would be better done along with something like https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest, so I've not done this for now.
Maybe we shouldn’t call this relative fileloc, but just fileloc…? Not sure about this.
a72eb83 to 6a7d7ab
The PR most likely needs to run the full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly, please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.
airflow/models/dag.py
This reminds me I was wondering whether fileloc is guaranteed to be absolute or not and had to trace a lot of code. Maybe it’s worthwhile to add this to the docstring (and the one on DagModel).
uranusjr left a comment
I think this looks good to me in general
@ashb - can you rebase please? I think most of the issues with CI stability are solved.
be08c89 to af12ddd
There has been some vestigial support for this concept in Airflow for a while (all the CLI commands already turn the literal `DAGS_FOLDER` into the real value of the DAGS folder when loading DAGs), but sometime around 1.10.1-1.10.3 it got fully broken and the scheduler only ever passed full paths to DAG files. This PR brings back this behaviour.
ca640d3 to 0da2b8c
Looks like transient MSSQL errors :(
Linking #16025 (comment)
There has been some vestigial support for this concept in Airflow for a
while (all the CLI commands already turn the literal
`DAGS_FOLDER` into the real value of the DAGS folder when loading DAGs),
but sometime around 1.10.1-1.10.3 it got fully broken and the scheduler
only ever passed full paths to DAG files.
This PR brings back this behaviour.
Closes #8061
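The round trip described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Airflow's implementation: the helper names `to_portable` and `expand_dags_folder` and the folder paths are hypothetical; only the literal `DAGS_FOLDER` placeholder comes from the description.

```python
from pathlib import PurePosixPath

PLACEHOLDER = "DAGS_FOLDER"  # literal prefix mentioned in the PR description


def to_portable(fileloc: str, dags_folder: str) -> str:
    """Scheduler side (hypothetical helper): swap the local DAGs folder for the placeholder."""
    path = PurePosixPath(fileloc)
    try:
        rel = path.relative_to(dags_folder)
    except ValueError:
        # Outside the DAGs folder: pass the full path through unchanged.
        return fileloc
    return str(PurePosixPath(PLACEHOLDER) / rel)


def expand_dags_folder(subdir: str, dags_folder: str) -> str:
    """Runner side (hypothetical helper): swap the placeholder for the local DAGs folder."""
    parts = PurePosixPath(subdir).parts
    if parts and parts[0] == PLACEHOLDER:
        return str(PurePosixPath(dags_folder, *parts[1:]))
    return subdir


# Scheduler and runner keep their DAGs in different (hypothetical) locations.
sent = to_portable("/opt/scheduler/dags/team/etl.py", "/opt/scheduler/dags")
resolved = expand_dags_folder(sent, "/home/worker/dags")
```

Matching on the first path component, rather than doing a plain string replace, avoids rewriting a file that merely has `DAGS_FOLDER` somewhere in its name.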