[Airflow-2423] syncing DAGs without scheduler/web-server restart #3318
Make sure you have checked all steps below.
JIRA
Description
This PR enables syncing DAGs from a common remote location in S3 without a scheduler or web-server restart. Syncing DAGs is useful when running Airflow in a distributed setup (such as on Mesos), where the hosts/containers running the scheduler and the web-server can change over time, and where restarting the scheduler and web-server for every DAG update or addition is to be avoided. This PR periodically syncs DAGs from the S3 location to the local DAGs folders of the scheduler and web-server. A DAG newly added to or updated in S3 is reflected in the web-server and scheduler local directories, and added to the meta-store backend, on every call of `collect_dags`.
If the `s3_dags_folder` property is defined in the Airflow config, the `.py` files at the S3 location are scanned recursively. A DAG file is downloaded from S3 only if it is new or if its last-update timestamp is later than that of the local DAG file.
Tests
Tested locally, and in use with an Airflow deployment on Mesos. There is currently no test for `s3_hook`, which this PR depends on, so a test for the S3 DAG sync is not added.
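The sync condition described above (download a DAG file only when it is new, or when the S3 object's last-modified timestamp is later than the local copy's) can be sketched as below. This is a minimal illustration, not the PR's actual implementation: the function name `should_sync` and the use of plain `os.path.getmtime` timestamps are assumptions for the example.

```python
import os


def should_sync(local_path, remote_last_modified):
    """Decide whether a DAG file must be downloaded from S3.

    local_path: path of the local copy of the DAG file.
    remote_last_modified: POSIX timestamp of the S3 object's
        last-modified time (illustrative; the PR compares S3 and
        local file timestamps via the S3 hook).

    Returns True if the local copy is missing (new DAG) or older
    than the remote object, i.e. a download is needed.
    """
    if not os.path.exists(local_path):
        # New DAG: no local copy yet, so it must be downloaded.
        return True
    # Existing DAG: download only if the remote object is newer.
    return os.path.getmtime(local_path) < remote_last_modified
```

The caller would iterate over the `.py` keys found under `s3_dags_folder` and download each one for which `should_sync` returns True, leaving up-to-date local files untouched.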
Commits
Documentation
Code Quality
`git diff upstream/master -u -- "*.py" | flake8 --diff`