Skip to content

Conversation

@BasPH
Copy link
Contributor

@BasPH BasPH commented Mar 30, 2019

This PR makes several changes to the task context variables. I realise these are multiple changes, if you'd like me to split the PR into individual PRs, let me know.

  • Sorted the context variables alphabetically, both in the code & in the docs.

  • I think it's useful to pass a number of variables from which users can compute derived values, however wouldn't pass and the "main" variables and the derived variables. I'd leave deriving values from such "main" variables up to the user. Otherwise, we could start adding ds_nextweek, ds_nextmonth, ds_nextyear, etc. which I consider a task for the user. Hence, I removed such "derived" values:

    • Removed yesterday_ds, yesterday_ds_nodash, tomorrow_ds, tomorrow_ds_nodash. IMO the next_* and previous_* variables are useful since these require complex logic to compute the next execution date, however would leave these variables up to the user since they are simple one-liners and don't relate to the DAG interval.
    • Removed tables. This is a field in params, and is thus also accessible by the user ({{ params.tables }}). Also, it was undocumented.
    • Removed latest_date. It's the same as ds and was also undocumented.
    • Removed inlets and outlets. Also undocumented, and have the inlets/outlets ever worked/ever been used by anybody?
    • Removed end_date and END_DATE. Both have the same value, so it doesn't make sense to have both variables. Also, the value is ds which contains the start date of the interval, so the naming didn't make sense to me. However, if anybody argues in favour of adding "start_date" and "end_date" to provide the start and end datetime of task instance intervals, I'd be happy to add them.
  • Grouped together statements computing similar variables (e.g. the statements building the ds_* variables are now grouped together)

Sidenote: in principle the ds* and ts* variables are also derivations from the execution_date, however these are widely used and I also consider them useful.

The remaining context variables are now:

{
    'conf': configuration,
    'dag': task.dag,
    'dag_run': dag_run,
    'ds': ds,
    'ds_nodash': ds_nodash,
    'execution_date': self.execution_date,
    'macros': macros,
    'next_ds': next_ds,
    'next_ds_nodash': next_ds_nodash,
    'next_execution_date': next_execution_date,
    'params': params,
    'prev_ds': prev_ds,
    'prev_ds_nodash': prev_ds_nodash,
    'prev_execution_date': prev_execution_date,
    'run_id': run_id,
    'task': task,
    'task_instance': self,
    'task_instance_key_str': ti_key_str,
    'test_mode': self.test_mode,
    'ti': self,
    'ts': ts,
    'ts_nodash': ts_nodash,
    'ts_nodash_with_tz': ts_nodash_with_tz,
    'var': {
        'json': VariableJsonAccessor(),
        'value': VariableAccessor(),
    },
}

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
    • https://issues.apache.org/jira/browse/AIRFLOW-4192
    • In case you are fixing a typo in the documentation you can prepend your commit with [AIRFLOW-XXX], code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

  • Here are some details about my PR, including screenshots of any UI changes:

See above.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Only altered tests, no additions.

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

Code Quality

  • Passes flake8

@codecov-io
Copy link

Codecov Report

Merging #5010 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5010      +/-   ##
==========================================
- Coverage   75.76%   75.75%   -0.01%     
==========================================
  Files         461      461              
  Lines       29955    29948       -7     
==========================================
- Hits        22694    22688       -6     
+ Misses       7261     7260       -1
Impacted Files Coverage Δ
...flow/contrib/example_dags/example_qubole_sensor.py 0% <ø> (ø) ⬆️
airflow/models/__init__.py 92.98% <100%> (+0.02%) ⬆️
airflow/operators/hive_to_druid.py 63.07% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 157f7bf...c9e4e70. Read the comment docs.

@codecov-io
Copy link

codecov-io commented Mar 30, 2019

Codecov Report

Merging #5010 into master will decrease coverage by 0.05%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #5010      +/-   ##
=========================================
- Coverage   76.36%   76.3%   -0.06%     
=========================================
  Files         471     471              
  Lines       30290   30313      +23     
=========================================
  Hits        23130   23130              
- Misses       7160    7183      +23
Impacted Files Coverage Δ
...flow/contrib/example_dags/example_qubole_sensor.py 0% <ø> (ø) ⬆️
airflow/models/__init__.py 92.98% <100%> (-0.03%) ⬇️
airflow/operators/hive_to_druid.py 63.07% <100%> (ø) ⬆️
airflow/ti_deps/deps/trigger_rule_dep.py 68.08% <0%> (-22.06%) ⬇️
airflow/utils/trigger_rule.py 100% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b93f264...60367e0. Read the comment docs.

Copy link
Member

@KevinYang21 KevinYang21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't have very strong opinion on whether to change the context like this but I think this worth a note in UPDATING.md.

@BasPH
Copy link
Contributor Author

BasPH commented Mar 31, 2019

I'd say some variables are debatable, e.g. if the PMCs have a very strong opinion about keeping yesterday_ds; find by me.

However the values of latest_date, end_date and END_DATE do not correspond to the name of the variables and are thus incorrect, so I'd definitely change or drop those.

Good point about the UPDATING.md, will add info there.

@BasPH
Copy link
Contributor Author

BasPH commented Mar 31, 2019

@KevinYang21 I've updated UPDATING.md.

Would be nice to have a PMC's opinion on this.

@Fokko
Copy link
Contributor

Fokko commented Mar 31, 2019

Thanks for working on this @BasPH

I believe that yesterday_ds does not make sense since we cannot assume that the schedule is daily. I don't see any usage of this variable. Personally I do use next_execution_date quite extensively. When you have a job that runs daily, but you want to change this to an hourly job. In such a case you don't want to change {{ (execution_date + macros.timedelta(days=1)) }} to {{ (execution_date + macros.timedelta(hours=1)) }}everywhere.

@bolkedebruin What was your plan with the {in,out}lets?

@BasPH
Copy link
Contributor Author

BasPH commented Mar 31, 2019

@Fokko yesterday_ds is removed and next_execution_date is still in there, were on the same page then 🙂

@BasPH
Copy link
Contributor Author

BasPH commented Mar 31, 2019

@Fokko after the last push 1 task failed, which succeeded before, can you rerun that one? Noticed there was now a conflict with the current master, so merged and pushed once again.

@Fokko
Copy link
Contributor

Fokko commented Apr 2, 2019

Indeed @BasPH. I would like to remove the yesterday_ds as well for this reason. The only thing I'm not sure about are the {in,out}lets.

@Fokko Fokko requested a review from bolkedebruin April 2, 2019 08:44
@feng-tao
Copy link
Member

feng-tao commented Apr 6, 2019

sorry, I thought you remove the prev_ds and next_ds. For the yesterday_ds and tomrrow_ds, I am fine on removing it.

@BasPH
Copy link
Contributor Author

BasPH commented Apr 6, 2019

I resolved @milton0825's question and once again the CI failed on 1 job with unrelated issues. Can somebody restart that job?

@feng-tao
Copy link
Member

feng-tao commented Apr 6, 2019

@BasPH , I understand that we could compute the removed macro with some expressions. But I assume someone created those for convenience access. I am ok from my side, but could you send an email to the mailing list get some feedbacks from user as macros are used in users' DAG? Once there are no concern from the mailing list, we could commit the change.

@ashb
Copy link
Member

ashb commented Apr 8, 2019

We need to note in UPDATING if we remove any variables - and can we change the title of this PR/Jira to reflect that too please?

Edit: you've already got a note in UPDATING - not sure how I managed to miss that.

@BasPH BasPH changed the title [AIRFLOW-4192] Reorganize task context variables [AIRFLOW-4192] Remove obsolete/derived task context variables Apr 8, 2019
@BasPH
Copy link
Contributor Author

BasPH commented Apr 8, 2019

@feng-tao I started this discussion: https://lists.apache.org/thread.html/a81b32efc95db41a5bf847ed03b64ae4d3e9c8a2cbeda1bdcf0eb2f5@%3Cdev.airflow.apache.org%3E

@stale
Copy link

stale bot commented Sep 3, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Sep 3, 2019
@ashb ashb removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Sep 5, 2019
@stale
Copy link

stale bot commented Nov 20, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Nov 20, 2019
@stale stale bot closed this Nov 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stale Stale PRs per the .github/workflows/stale.yml policy file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants