Return 410 Gone for heartbeat when cleared TI exists in TIH#61631
Return 410 Gone for heartbeat when cleared TI exists in TIH#61631andreahlert wants to merge 6 commits intoapache:mainfrom
Conversation
|
It's an intermittent failure, restarting it and taking a look |
amoghrajesh
left a comment
There was a problem hiding this comment.
Small comments, otherwise LGTM thanks!
airflow-core/src/airflow/api_fastapi/execution_api/routes/task_instances.py
Outdated
Show resolved
Hide resolved
airflow-core/src/airflow/api_fastapi/execution_api/routes/task_instances.py
Show resolved
Hide resolved
airflow-core/tests/unit/api_fastapi/execution_api/versions/head/test_task_instances.py
Outdated
Show resolved
Hide resolved
Done! Thanks for reviewing! |
|
@andreahlert can you check the failing tests? |
|
Tests need fixing |
|
@andreahlert This PR has been converted to draft because it does not yet meet our Pull Request quality criteria. Issues found:
What to do next:
Converting a PR to draft is not a rejection — it is an invitation to bring the PR up to the project's standards so that maintainer review time is spent productively. If you have questions, feel free to ask on the Airflow Slack. |
When a running task instance is cleared, its previous try is archived to the Task Instance History table and the TI receives a new UUID. Subsequent heartbeats from the old process get a 404 because the old UUID no longer exists in the TI table. This change improves the error handling by checking the TIH table when a heartbeat TI is not found. If the UUID exists in TIH, return 410 Gone instead of 404 Not Found, giving the client a more specific signal that the task was cleared rather than never existing. - Server: check TIH on heartbeat NoResultFound, return 410 if found - Supervisor: handle 410 Gone same as 404/409 (terminate process) - Keep 404 for TIs that genuinely never existed closes: apache#53140
Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com>
Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com>
Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com>
… message - Replace undefined ti_id_str with task_instance_id in TIH query and log - Use task_instance_id (UUID) for TIH.task_instance_id comparison - Set 410 Gone detail message to match test expectation
0a744f6 to
0ef9c07
Compare
|
Tests are failing |
|
Rebased on main. GitHub Actions infra issue, let's see if gets green again. |
|
Rebased and ready for review. @amoghrajesh @ashb @kaxil |
Closes: #53140
When a running Task Instance is cleared,
prepare_db_for_next_try()archives the current try to the Task Instance History (TIH) table and assigns a new UUID. This means heartbeats from the old process will fail with a 404 "Not Found" because the UUID no longer exists in the TI table. However, a generic 404 is misleading since the TI did exist - it was just cleared.This PR adds a TIH lookup in the
NoResultFoundhandler of the heartbeat endpoint:This gives the task SDK supervisor a more specific signal and matches HTTP semantics (410 means "the target resource is no longer available at the origin server and this condition is likely to be permanent").
Changes:
execution_api/routes/task_instances.py: Added TIH existence check in the heartbeatNoResultFoundhandler; added 410 to OpenAPI response schemaexecution_time/supervisor.py: AddedHTTPStatus.GONEto the set of status codes that the supervisor treats as "stop heartbeating" (alongside NOT_FOUND and CONFLICT)test_task_instances.py: Added test usingprepare_db_for_next_try()to properly simulate a cleared TI and verify the 410 responsePrevious attempt: #56443 (closed as stale). This PR addresses the review feedback from that PR by actually checking the TIH table rather than blindly changing 404 to 410.