Fix/db clean dag version fk constraint #59679
Closed
Fixes the `airflow db clean` command failing with a foreign key constraint error when attempting to delete old `dag_version` records that are still referenced by recent `task_instance` or `dag_run` records.

Problem
When running `airflow db clean`, the command fails with a foreign key constraint error.

Root Cause:
- `dag_version` rows are deleted based on their `created_at` timestamp
- `task_instance` rows are deleted based on their `start_date` timestamp
- Old `dag_version.created_at` → marked for deletion
- Recent `task_instance.start_date` → kept
- Migration `3ac9e5732b1f` changed the FK constraint to `ON DELETE RESTRICT`
- The database refuses to delete the `dag_version` row because `task_instance` still references it

Reproduction:
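1. Have a `dag_version` row with an old timestamp (`dag_version.created_at`)
2. Have a `task_instance` row that references it and has a more recent timestamp (`task_instance.start_date`)
3. Run `airflow db clean --clean-before-timestamp <date>` where date is between the two timestamps

Solution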
Modified the `_build_query()` function in `db_cleanup.py` to add special handling for the `dag_version` table:
- For each `dag_version` row, check if it's referenced by any `task_instance` or `dag_run`
- Only delete `dag_version` rows that have NO active references, regardless of their age

This ensures:
- Unreferenced `dag_version` records are cleaned up

Changes
Modified Files:
- `airflow-core/src/airflow/utils/db_cleanup.py`
  - Added `not_` import from sqlalchemy
  - Updated `_build_query()` to exclude `dag_version` rows with active references in `task_instance` or `dag_run` tables
- `airflow-core/tests/unit/utils/test_db_cleanup.py`
  - Added `test_dag_version_with_active_references_not_deleted()` test case
  - Verifies that `dag_version` rows with recent references are not deleted

Testing
New Test Case:
The test creates an old `dag_version` (60 days ago), a recent `task_instance` (55 days ago) referencing it, runs cleanup with a 30-day threshold, and verifies that `dag_version` is NOT deleted despite being old enough.
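
For readers who want to see the failure and the fix in isolation, here is a minimal sketch using only Python's built-in `sqlite3` module. It is not the code from this PR: the actual change lives in `_build_query()` and uses SQLAlchemy. The two-table schema, the column `dag_version_id`, the helper `demo()`, and the chosen dates are simplified assumptions for illustration, and `dag_run` is left out (it would be handled with a second `NOT EXISTS` clause of the same shape).

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical, heavily simplified schema; the real Airflow tables differ.
SCHEMA = """
CREATE TABLE dag_version (
    id INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL
);
CREATE TABLE task_instance (
    id INTEGER PRIMARY KEY,
    start_date TEXT NOT NULL,
    dag_version_id INTEGER REFERENCES dag_version(id) ON DELETE RESTRICT
);
"""

# Age-only cleanup: ignores whether the row is still referenced.
NAIVE_DELETE = "DELETE FROM dag_version WHERE created_at < ?"

# Age filter plus "no referencing task_instance" filter.
SAFE_DELETE = """
DELETE FROM dag_version
WHERE created_at < ?
  AND NOT EXISTS (
      SELECT 1 FROM task_instance ti
      WHERE ti.dag_version_id = dag_version.id
  )
"""


def demo():
    now = datetime(2025, 1, 1)  # fixed date so the run is deterministic
    cutoff = (now - timedelta(days=30)).isoformat()

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dag_version (id, created_at) VALUES (?, ?)",
        [
            (1, (now - timedelta(days=60)).isoformat()),  # old, referenced
            (2, (now - timedelta(days=60)).isoformat()),  # old, unreferenced
            (3, (now - timedelta(days=5)).isoformat()),   # recent
        ],
    )
    conn.execute(
        "INSERT INTO task_instance (id, start_date, dag_version_id) VALUES (?, ?, ?)",
        (1, (now - timedelta(days=10)).isoformat(), 1),  # recent, points at version 1
    )
    conn.commit()

    # The age-only delete hits the RESTRICT constraint on version 1.
    try:
        conn.execute(NAIVE_DELETE, (cutoff,))
        naive_failed = False
    except sqlite3.IntegrityError:
        naive_failed = True
    conn.rollback()

    # The filtered delete skips version 1 and removes only version 2.
    conn.execute(SAFE_DELETE, (cutoff,))
    conn.commit()
    remaining = [row[0] for row in conn.execute("SELECT id FROM dag_version ORDER BY id")]
    conn.close()
    return naive_failed, remaining


if __name__ == "__main__":
    print(demo())
```

In this sketch the cutoff sits between the old `dag_version.created_at` and the recent `task_instance.start_date`, matching the reproduction steps. The age-only delete raises an integrity error, while the filtered delete leaves the referenced row in place and still removes the old unreferenced one.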