Skip to content

Conversation

@uranusjr
Copy link
Member

TaskMap is populated by a TaskInstance when it pushes its return value to XCom.

Mostly for feedback for now. The pushing part needs some additional logic from #19965 to discover downstreams that need mapping.

@uranusjr uranusjr force-pushed the task-map-table branch 2 times, most recently from 637d31c to 13105b0 Compare December 15, 2021 12:48
@uranusjr uranusjr force-pushed the task-map-table branch 4 times, most recently from cc39c58 to 76cb924 Compare January 10, 2022 05:03
@uranusjr
Copy link
Member Author

uranusjr commented Jan 10, 2022

I think this is ready-ish. Probably needs more eyes on the migration code.

@uranusjr uranusjr marked this pull request as ready for review January 10, 2022 05:06
@uranusjr uranusjr requested a review from ashb January 10, 2022 05:07
Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Migration looks good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that it matters, but for consistency should we have the same default here as we do on TI's map_index column?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default on TI.map_index is mainly added so we don’t need to change too much existing code (otherwise we’d need to add a ton of map_index=-1), but this being a new class, I think I prefer being explicit instead.

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need more tests of _record_task_map_for_downstreams behaviour though please.

@uranusjr
Copy link
Member Author

Alright, modified the code a bit and added a few tests. We can’t put mapped task in a DAG yet (bag_dag kept throwing AttributeError at me about a ton of things expected on BaseOperator but not yet present on MappedOperator), so a bit of mocking is needed. But otherwise things seem to work as expected.

@ashb
Copy link
Member

ashb commented Jan 12, 2022

so a bit of mocking is needed. But otherwise things seem to work as expected.

Yeah, I guess those'll be fixed by #20743

TaskMap is populated by a TaskInstance when it pushes its return value
to XCom.
On MySQL you must not drop an index "needed by a primary key", so we
need to drop the primary key first. On both MySQL and MSSQL, a column
used by a primary key cannot be nullable (which we already accounted
for, but I forgot to add nullable=False in the migration file).
This call actually makes more sense in TI._execute_task(), which calls
xcom_push() for the return value, since we get to remove a somewhat
hackish if check.
This makes it easier to keep the logic in sync with BaseOperator.
An XCom value is arbitrary and can potentially be a giant blob of data,
which would make the log unreadable. Instead, only log the *type* of the
XCom value instead; since the offending TaskInstance's key is also
logged, the user can always look up the actual value in the XCom storage
if needed.

The exception still contains a reference to the actual value so the
catching frame can access it. Since that value is already in memory
anyway, keeping it a bit longer shouldn't be too big a problem.
We need to traverse the entire DAG to find mapped decendants because
task groups are not currently recorded in the upstream-downstream lists,
only operators. This should change in the future, but until then we need
to make do.
Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc change but LGTM

@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Jan 13, 2022
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
with op.batch_alter_table("task_instance") as batch_op:
# I think we always use this name for TaskInstance after 7b2661a43ba3?
batch_op.drop_constraint("task_instance_pkey", type_="primary")
batch_op.add_column(Column("map_index", Integer, nullable=False, default=-1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uranusjr Did this need server_default=-1 instead? - re #20876

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m investigating if it works as expected on all engines (if not I’ll just do an UPDATE SET map_index = -1 after this line)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing it with a server default is much preferred -- as otherwise an UPDATE requires us to re-write/touch every row, but a server default (at least on Postgres) doesn't.

@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Apr 11, 2022
@jedcunningham jedcunningham added this to the Airflow 2.3.0 milestone Apr 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:dynamic-task-mapping AIP-42 changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) full tests needed We need to run full set of tests for this PR to merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants