Skip to content

Conversation

@danmactough
Copy link
Contributor

@danmactough danmactough commented Jun 10, 2019

Make sure you have checked all steps below.

Jira

Description

  • Here are some details about my PR, including screenshots of any UI changes:

AIRFLOW-2900 introduced a change that broke Airflow whenever it tries to reprocess a packaged DAG. Since that change, the fileloc column in the database for packaged DAGs isn't an actual filepath. It looks something like: /usr/local/airflow/dags/package.zip/my_dag.py -- notice that the path to the DAG inside the zip is appended to the zip file's path. Then, when the DagBag#process_file method tries to load that filepath, it bails because that filepath is invalid.

This PR fixes that behavior by updating the DagBag#get_dag method such that when it fetches the DAG from the database and proceeds to reprocess the file, instead of using the orm_dag.fileloc property directly, we reuse the correct_maybe_zipped helper function to ensure that the passed value is an actual filepath.

Before

2019-06-07-ghosting-before

After

2019-06-07-ghosting-after

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

    • add a sanity check test to ensure that the fileloc property does not change for packaged DAGs
    • add a test to verify that packaged DAGs can be refreshed
    • add a test that plain .py DAGs can also be refreshed (seems silly to have test coverage for only packaged DAGs 😄)

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

Code Quality

  • Passes flake8

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good - though there is already a text/dags/test_zip.zip that we should use instead of creating a new example dag (those show up in every new install which we don't want here)

@danmactough
Copy link
Contributor Author

Thanks for the lightning-fast review @ashb. ⚡️Updated and waiting for Travis...

@danmactough danmactough force-pushed the master branch 5 times, most recently from 80840a4 to b779d64 Compare June 10, 2019 23:49
@codecov-io
Copy link

Codecov Report

Merging #5404 into master will decrease coverage by 68.16%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #5404       +/-   ##
==========================================
- Coverage   78.97%   10.8%   -68.17%     
==========================================
  Files         483     483               
  Lines       30254   30254               
==========================================
- Hits        23893    3270    -20623     
- Misses       6361   26984    +20623
Impacted Files Coverage Δ
airflow/models/dagbag.py 17.39% <ø> (-74.4%) ⬇️
airflow/models/dag.py 25.15% <ø> (-66.62%) ⬇️
...low/contrib/operators/wasb_delete_blob_operator.py 0% <0%> (-100%) ⬇️
airflow/example_dags/subdags/subdag.py 0% <0%> (-100%) ⬇️
airflow/contrib/sensors/emr_base_sensor.py 0% <0%> (-100%) ⬇️
airflow/contrib/operators/gcs_list_operator.py 0% <0%> (-100%) ⬇️
airflow/example_dags/example_subdag_operator.py 0% <0%> (-100%) ⬇️
airflow/contrib/operators/file_to_gcs.py 0% <0%> (-100%) ⬇️
airflow/contrib/sensors/cassandra_record_sensor.py 0% <0%> (-100%) ⬇️
airflow/www/blueprints.py 0% <0%> (-100%) ⬇️
... and 394 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f710a0d...8658ea6. Read the comment docs.

@danmactough
Copy link
Contributor Author

Phew! 😓 Wow, @ashb, that change caused a totally unexpected (to me) failure in Travis. Very painful to track down, but it's ✅ now!

@ashb ashb merged commit 34056f8 into apache:master Jun 11, 2019
ashb pushed a commit to ashb/airflow that referenced this pull request Jun 13, 2019
andriisoldatenko pushed a commit to andriisoldatenko/airflow that referenced this pull request Jul 26, 2019
wmorris75 pushed a commit to modmed-external/incubator-airflow that referenced this pull request Jul 29, 2019
dharamsk pushed a commit to postmates/airflow that referenced this pull request Aug 8, 2019
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jan 16, 2020
* changes:
  Cherrypick [AIRFLOW-4760] PR apache/airflow#5404
  Cherrypick [AIRFLOW-3626] apache/airflow#4439

GitOrigin-RevId: 05ea000a888d92c30b83c649efc888cbeb149a6e
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jan 23, 2020
* changes:
  Cherrypick [AIRFLOW-4760] PR apache/airflow#5404
  Cherrypick [AIRFLOW-3626] apache/airflow#4439

GitOrigin-RevId: 05ea000a888d92c30b83c649efc888cbeb149a6e
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Apr 8, 2020
* changes:
  Cherrypick [AIRFLOW-4760] PR apache/airflow#5404
  Cherrypick [AIRFLOW-3626] apache/airflow#4439

GitOrigin-RevId: 05ea000a888d92c30b83c649efc888cbeb149a6e
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jun 15, 2021
* changes:
  Cherrypick [AIRFLOW-4760] PR apache/airflow#5404
  Cherrypick [AIRFLOW-3626] apache/airflow#4439

GitOrigin-RevId: 05ea000a888d92c30b83c649efc888cbeb149a6e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants