Conversation

@aditiverma

Make sure you have checked all steps below.

JIRA

Description

  • Here are some details about my PR, including screenshots of any UI changes:
    This PR enables syncing DAGs from a common remote location in S3 without a scheduler or web-server restart. Syncing DAGs is useful when running Airflow on a distributed setup (e.g. on Mesos), where the hosts/containers running the scheduler and the web-server can change over time, and where restarting the scheduler and web-server for every DAG update/addition should be avoided. This PR periodically syncs DAGs from the S3 location into the local DAG folders of the scheduler and web-server. A DAG that is newly added or updated in S3 is reflected in the web-server and scheduler local directories, and added to the metastore backend, on every call of collect_dags.
    If the s3_dags_folder property is defined in the Airflow config, the '.py' files under the S3 location are scanned recursively. A DAG file is downloaded from S3 only if it does not exist locally or if its last-modified timestamp in S3 is later than that of the local copy (a minimal sketch of this check is shown directly below).
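    For illustration only, here is a minimal sketch of the timestamp check described above. It uses boto3 directly rather than the PR's actual S3Hook-based code, and the function name and parameters (bucket, prefix, local_dags_folder) are assumptions, not identifiers from the PR:

    ```python
    import os

    import boto3


    def sync_dags_from_s3(bucket, prefix, local_dags_folder):
        """Download '.py' files under s3://<bucket>/<prefix> that are missing
        locally or newer than the local copy (illustrative sketch only)."""
        s3 = boto3.client("s3")
        paginator = s3.get_paginator("list_objects_v2")

        # Recursively list every object under the configured S3 prefix.
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                key = obj["Key"]
                if not key.endswith(".py"):
                    continue

                local_path = os.path.join(
                    local_dags_folder, os.path.relpath(key, prefix)
                )
                remote_mtime = obj["LastModified"].timestamp()

                # Download only if the file is missing locally or the S3 copy is newer.
                if (not os.path.exists(local_path)
                        or remote_mtime > os.path.getmtime(local_path)):
                    os.makedirs(os.path.dirname(local_path), exist_ok=True)
                    s3.download_file(bucket, key, local_path)
    ```

    In the PR this kind of check would presumably run as part of the periodic DAG-collection cycle; the sketch above is standalone and only shows the compare-and-download logic.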

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    Tested locally, and by using it with an Airflow deployment on Mesos. There is currently no test for s3_hook, which this PR depends on, so a test for the S3 DAG sync has not been added.

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

Code Quality

  • Passes git diff upstream/master -u -- "*.py" | flake8 --diff

@jgao54

jgao54 commented May 25, 2018

@aditiverma thanks for your contribution! We are in the process of abstracting a DAG fetcher, which will allow all kinds of fetchers to be extended from a common base (see #3138).

@aditiverma
Author

@jgao54 thanks for the initiative! Could you also review this PR to be included as a part of the dag fetcher?

@jgao54

jgao54 commented May 31, 2018

I want to put this PR on hold and revisit it once the fetcher is implemented, since this will need to extend BaseDagFetcher, and merging it now would add unnecessary complication to the fetcher implementation. I'd normally ask you to add some unit tests, but that would be premature given that the overall PR will need to adopt the fetcher once it's implemented.

@aditiverma
Author

@jgao54 sounds good. Please update me once the BaseDagFetcher is ready to be extended.

@ashb
Member

ashb commented Nov 1, 2018

See #3138. I'm going to close this PR for now.

ashb closed this Nov 1, 2018
@shekarraj3

Hi all, I want to implement the following logic:
if the 1st DAG is running, the 2nd DAG should not run; it should only start after the 1st DAG succeeds, based on some condition.
Likewise, the 3rd DAG should not run until the 1st DAG succeeds.
After that, either the 2nd or the 3rd DAG should start.
How can I implement this logic in Airflow?
