Skip to content

Conversation

@sean-rose
Copy link
Contributor

Since TaskInstanceHistory records are now used as the basis for showing previous task tries in the "Details", "Gantt", and "Logs" tabs (#40304), it's important they are recorded reliably.

After upgrading our Airflow instance from 2.9.3 to 2.10.5 we discovered that links to previous task tries which failed and were retried per their retries setting aren't showing up in the Airflow UI (similar to #43739), and I've confirmed this is due to the associated TaskInstanceHistory records not being successfully persisted to the database despite the TaskInstanceHistory.record_ti() call for retries happening (in our case I suspect it's due to unintended side effects of the on_retry_callback that acryl-datahub-airflow-plugin 0.13.2.4 is adding to all our tasks, as disabling that plugin makes the problem go away).

This change is meant to ensure that TaskInstanceHistory records are always persisted, despite things potentially going awry later in the retry handling.

related: #43739


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

@sean-rose sean-rose requested review from XD-DENG, ashb and kaxil as code owners April 4, 2025 02:15
@sean-rose sean-rose force-pushed the persist-new-TaskInstanceHistory branch from 18094a7 to a612488 Compare April 4, 2025 02:18
@ashb
Copy link
Member

ashb commented Apr 4, 2025

This is not right the right place to do this -- functions that take a session should not commit, that is either handled by the decorator, or should be the responsibility of the caller.

So this commit needs to be handled somewhere else

@sean-rose
Copy link
Contributor Author

sean-rose commented Apr 19, 2025

This is not right the right place to do this -- functions that take a session should not commit, that is either handled by the decorator, or should be the responsibility of the caller.

It would be ideal if the session commit could be handled by the decorator or caller, but in the scenario we're running into that isn't happening successfully, so IMO Airflow ought to be more proactive in persisting the task instance history records.

Also, from looking at other Airflow code it doesn't seem uncommon for functions that take a session to commit (e.g. there are eight such session.commit() calls in airflow/models/taskinstance.py, all within functions decorated with @provide_session).

So this commit needs to be handled somewhere else

The TaskInstanceHistory.record_ti() call that isn't successfully persisting the record in our case is in TaskInstance.fetch_handle_failure_context(), so one option would be adding a session.commit() call there (though that method is also decorated with @provide_session).

Or we could create a separate session there specifically for saving the task instance history record, like:

with create_session() as ti_history_session:
    TaskInstanceHistory.record_ti(ti, session=ti_history_session)

However, I'm not familiar enough with the Airflow codebase to know what drawbacks creating a session like that might have. I generally try to follow the conventions/precedents of the surrounding code when making changes in unfamiliar codebases, and at least in airflow/models/taskinstance.py calling session.commit() seemed like the more common convention than explicitly creating new sessions.

@ashb what approach would you recommend?

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Jul 16, 2025
@github-actions github-actions bot closed this Jul 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stale Stale PRs per the .github/workflows/stale.yml policy file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants