Skip to content

Conversation

@ashb
Copy link
Member

@ashb ashb commented Dec 2, 2021

Closes #19889


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@ashb ashb requested review from XD-DENG and kaxil as code owners December 2, 2021 13:05
@ashb ashb marked this pull request as draft December 2, 2021 13:05
@ashb

This comment has been minimized.

@ashb ashb force-pushed the operator-mapping branch from 6eda954 to 0792002 Compare December 8, 2021 15:20
@ashb ashb changed the title Map and Partial DAG authoring interface for Operators Map and Partial DAG authoring interface for Dynamic Task Mapping Dec 8, 2021
@ashb ashb force-pushed the operator-mapping branch 3 times, most recently from 93bbbf3 to a2f4bcb Compare December 10, 2021 17:56
@ashb
Copy link
Member Author

ashb commented Dec 10, 2021

This should now have all of the DAG authoring interface changes for AIP-42, including (some, if not extensive) tests.

Still to do:

  • Fix mypy issues in new code (there will be lots, I was running with SKIP=mypy
  • [ ] Add docs. That could be a follow up PR but it feels nicer to me to add at least basic docs here as part of this PR.

Docs will come later -- the interface might change yet still as we work out more details.

@ashb ashb force-pushed the operator-mapping branch 4 times, most recently from 9dc47d6 to 025cfee Compare December 16, 2021 16:36
@ashb ashb marked this pull request as ready for review December 16, 2021 17:25
@ashb ashb force-pushed the operator-mapping branch 2 times, most recently from 6b268bf to 0823885 Compare December 16, 2021 20:41
@ashb
Copy link
Member Author

ashb commented Dec 17, 2021

I've got another small change coming to better validate mapped/partial argument names (I was missing a few cases)

@ashb ashb requested a review from kaxil December 17, 2021 14:18
Copy link
Member

@uranusjr uranusjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not finished, but submitting this batch because my comments already went stale once before I finished.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me wonder, what should MyOperator.map(x=something).map(x=another) do? If I understand this correctly, this would currently discard something and just map to another. We should likely add something in to prevent this from happening, perhaps in _validate_arg_names?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's an unspecified point of the API. I think I'm leaning towards .map().map() being an error on general grounds.

But yes, in my head updating the params was my intent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think map().map() should be an error. We already agreed on suporting map(arg1, arg2) and:

There should be one-- and preferably only one --obvious way to do it.

Copy link
Member Author

@ashb ashb Dec 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unliekly to be able to make this change before stopping for Christmas, so either someone else can make this, or we can merge it with this and fix it later.

(I think that having a .map() function that returns an error would be clearer than having no map method, similar to how I have .partial() on a Task object throw an error still.)

Comment on lines 105 to 115
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What’s the purpose of this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the tests, to catch this:

    @task_group_decorator
    def tg():
        ...

    with pytest.warns(UserWarning, match='was never mapped'):
        with DAG("test-dag", start_date=DEFAULT_DATE):
            tg.partial()

I.e. tg = tg.partial().map() was needed. You doesn't make sense to have a partial TG without also mapping it once.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen in this case, nothing? (i.e. this is likely a user mistake)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct -- nothing, and likely a user mistake as the tasks in the task group wouldn't appear in the DAG

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if _validate_arg_names and this have similar logic to be refactored out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar, but the way this gets the possible names (most of the function) is quite differnt, so I couldn't work out a way of really sharing it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a test to ensure get_serialized_fields() is up-to-date? Otherwise we may accidentally add a field on BaseOperator but forget to serialize it, and this test would fail to reflect that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't, no, this is it. This will pick up anything added via self.x in the constructor, but won't pick up things like x: int = 0 at the class level.

Think it's worth it?

Copy link
Member

@uranusjr uranusjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn’t read the tests.

Comment on lines +159 to +163
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Mypy OK with this? I’d make _set_relatives() only accept the sequence case, and do the coercing outside in the public functions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, yes.

Comment on lines +150 to +152
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not change the parent class definition to be plain class-level variable declarations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because on BaseOperator these are defined as properties.

@ashb ashb requested a review from uranusjr December 17, 2021 17:45
Comment on lines 278 to 279
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, this one doesn't behave correctly right now, as:

@taks(task_id='x')
def y():
    ...

This currently stores task_id in the self.kwargs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this check only in deb54df

@ashb ashb force-pushed the operator-mapping branch from 7b04489 to deb54df Compare January 4, 2022 14:32
@github-actions
Copy link

github-actions bot commented Jan 5, 2022

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Jan 5, 2022
@ashb ashb merged commit e922613 into apache:main Jan 6, 2022
@ashb ashb deleted the operator-mapping branch January 6, 2022 08:18
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Apr 10, 2022
@jedcunningham jedcunningham added this to the Airflow 2.3.0 milestone Apr 26, 2022
kaxil added a commit to astronomer/airflow that referenced this pull request Dec 24, 2024
This was missed in the refactor: apache#19965 -- since the `task_decorator_factory` enforces keyword-only parameters for via the `*` in its signature.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:dynamic-task-mapping AIP-42 changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) full tests needed We need to run full set of tests for this PR to merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create BaseOperator.map and basic mapping framework

6 participants