
@Arunodoy18
Contributor

Fixes the airflow db clean command failing with a foreign key constraint error when attempting to delete old dag_version records that are still referenced by recent task_instance or dag_run records.

Problem

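When running airflow db clean, the command fails with:

IntegrityError: Cannot delete or update a parent row: a foreign key constraint fails (airflow.task_instance, CONSTRAINT task_instance_dag_version_id_fkey FOREIGN KEY (dag_version_id) REFERENCES dag_version (id) ON DELETE RESTRICT)
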
Root Cause:

  • dag_version rows are deleted based on their created_at timestamp
  • task_instance rows are deleted based on their start_date timestamp
  • A DAG created long ago but executed recently will have:
    • Old dag_version.created_at → marked for deletion
    • Recent task_instance.start_date → kept
  • Migration 3ac9e5732b1f changed the FK constraint to ON DELETE RESTRICT
  • Result: Cannot delete dag_version because task_instance still references it
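To make the timestamp mismatch concrete, here is a minimal Python sketch of the two per-table age checks described above. The helper is_past_cutoff and the sample dates are hypothetical illustrations, not Airflow code; the point is only that each table is aged by a different column, so a parent row can be selected while its child row is kept.

```python
from datetime import datetime, timedelta, timezone


def is_past_cutoff(timestamp: datetime, cutoff: datetime) -> bool:
    """Hypothetical per-table age check (not Airflow's implementation)."""
    return timestamp < cutoff


now = datetime(2026, 1, 5, tzinfo=timezone.utc)
cutoff = now - timedelta(days=30)

# A DAG version created long ago (dag_version is aged by created_at)...
dag_version_created_at = now - timedelta(days=90)
# ...whose DAG ran yesterday (task_instance is aged by start_date).
task_instance_start_date = now - timedelta(days=1)

delete_dag_version = is_past_cutoff(dag_version_created_at, cutoff)
delete_task_instance = is_past_cutoff(task_instance_start_date, cutoff)

print(delete_dag_version)    # True: parent selected for deletion
print(delete_task_instance)  # False: child still references the parent
```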

Reproduction:

  1. Have a DAG that hasn't been updated for a while (old dag_version.created_at)
  2. Run the DAG recently (recent task_instance.start_date)
  3. Execute airflow db clean --clean-before-timestamp <date>, where <date> falls between the two timestamps
  4. Command fails with the IntegrityError above
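The reproduction can be approximated without Airflow using Python's built-in sqlite3 module, assuming a heavily simplified schema that keeps only the columns named above and mirrors the ON DELETE RESTRICT foreign key. SQLite reports the violation as "FOREIGN KEY constraint failed" rather than the MySQL message quoted in this PR, but the failure mode is the same: the parent row is old enough to delete, yet a recent child row still points at it.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE dag_version (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE task_instance ("
    " id INTEGER PRIMARY KEY,"
    " start_date TEXT NOT NULL,"
    " dag_version_id INTEGER REFERENCES dag_version(id) ON DELETE RESTRICT)"
)

now = datetime(2026, 1, 5, tzinfo=timezone.utc)
# Step 1: old dag_version.created_at
conn.execute("INSERT INTO dag_version VALUES (1, ?)", ((now - timedelta(days=90)).isoformat(),))
# Step 2: recent task_instance.start_date referencing that version
conn.execute("INSERT INTO task_instance VALUES (10, ?, 1)", ((now - timedelta(days=1)).isoformat(),))

# Step 3: a cutoff between the two timestamps; only dag_version is aged here.
cutoff = (now - timedelta(days=30)).isoformat()

error = None
try:
    conn.execute("DELETE FROM dag_version WHERE created_at < ?", (cutoff,))
except sqlite3.IntegrityError as exc:  # Step 4: the delete is rejected
    error = exc

print(error)  # FOREIGN KEY constraint failed
```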

Solution

Modified the _build_query() function in db_cleanup.py to add special handling for the dag_version table:

  • Before attempting to delete a dag_version row, check whether it is referenced by any task_instance or dag_run row
  • Delete only dag_version rows that meet the age criterion AND have no references; a referenced row is kept no matter how old it is
  • Use SQL EXISTS subqueries (negated with not_) to check for references

This ensures:

  • ✅ Foreign key constraints are respected
  • ✅ No IntegrityError is raised
  • ✅ Database integrity is maintained
  • ✅ Only truly orphaned dag_version records are cleaned up
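A sketch of the guarded delete, again in sqlite3 with a simplified schema. The dag_run.dag_version_id column name is an assumption made for illustration; the real column names and the SQLAlchemy query produced by _build_query() may differ. Old rows still referenced by either table are skipped, and only the old, orphaned row is removed, so no IntegrityError is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Simplified schema; column names are assumptions, not Airflow's real DDL.
conn.executescript(
    """
    CREATE TABLE dag_version (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL);
    CREATE TABLE task_instance (
        id INTEGER PRIMARY KEY,
        dag_version_id INTEGER REFERENCES dag_version(id) ON DELETE RESTRICT
    );
    CREATE TABLE dag_run (
        id INTEGER PRIMARY KEY,
        dag_version_id INTEGER REFERENCES dag_version(id) ON DELETE RESTRICT
    );
    INSERT INTO dag_version VALUES
        (1, '2025-09-01'),  -- old, still used by a task_instance
        (2, '2025-09-01'),  -- old, still used by a dag_run
        (3, '2025-09-01'),  -- old and orphaned
        (4, '2026-01-01');  -- recent
    INSERT INTO task_instance VALUES (10, 1);
    INSERT INTO dag_run VALUES (20, 2);
    """
)

# Age filter plus NOT EXISTS guards: only unreferenced old rows qualify.
CLEANUP_SQL = """
DELETE FROM dag_version
WHERE created_at < :cutoff
  AND NOT EXISTS (SELECT 1 FROM task_instance ti WHERE ti.dag_version_id = dag_version.id)
  AND NOT EXISTS (SELECT 1 FROM dag_run dr WHERE dr.dag_version_id = dag_version.id)
"""
conn.execute(CLEANUP_SQL, {"cutoff": "2025-12-06"})  # no IntegrityError

remaining = [row[0] for row in conn.execute("SELECT id FROM dag_version ORDER BY id")]
print(remaining)  # [1, 2, 4]
```

In SQLAlchemy Core, a guard like this would typically be written as not_(exists().where(...)) combined with the age filter, which fits the not_ import listed under Changes; the exact expression in the patch is not reproduced here.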

Changes

Modified Files:

  1. airflow-core/src/airflow/utils/db_cleanup.py

    • Added not_ import from sqlalchemy
    • Modified _build_query() to exclude dag_version rows with active references in task_instance or dag_run tables
  2. airflow-core/tests/unit/utils/test_db_cleanup.py

    • Added test_dag_version_with_active_references_not_deleted() test case
    • Reproduces the exact scenario from the bug report
    • Verifies that old dag_version rows with recent references are not deleted

Testing

New Test Case:
The test creates an old dag_version (created 60 days ago) and a more recent task_instance (started 55 days ago) that references it, runs cleanup with a 30-day threshold, and verifies that the dag_version row is NOT deleted even though it is old enough.

# Run the new test
pytest airflow-core/tests/unit/utils/test_db_cleanup.py::TestDBCleanup::test_dag_version_with_active_references_not_deleted -v

# Run all db_cleanup tests
pytest  -v
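A standalone approximation of the test scenario, written as a pytest-style function against in-memory SQLite so it runs without an Airflow test database. clean_dag_version is a hypothetical stand-in for the patched cleanup logic, not the function touched by this PR, and the orphaned row (id 2) is an extra check added here to confirm that unreferenced old rows are still removed.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def clean_dag_version(conn: sqlite3.Connection, cutoff: datetime) -> None:
    """Hypothetical stand-in for the patched dag_version cleanup."""
    conn.execute(
        "DELETE FROM dag_version WHERE created_at < ? AND NOT EXISTS "
        "(SELECT 1 FROM task_instance ti WHERE ti.dag_version_id = dag_version.id)",
        (cutoff.isoformat(),),
    )


def test_dag_version_with_active_references_not_deleted() -> None:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE dag_version (id INTEGER PRIMARY KEY, created_at TEXT)")
    conn.execute(
        "CREATE TABLE task_instance (id INTEGER PRIMARY KEY, start_date TEXT, "
        "dag_version_id INTEGER REFERENCES dag_version(id) ON DELETE RESTRICT)"
    )
    now = datetime(2026, 1, 5, tzinfo=timezone.utc)
    sixty_days_ago = (now - timedelta(days=60)).isoformat()
    # id 1: old but referenced by a task_instance; id 2: old and orphaned.
    conn.executemany(
        "INSERT INTO dag_version VALUES (?, ?)", [(1, sixty_days_ago), (2, sixty_days_ago)]
    )
    conn.execute(
        "INSERT INTO task_instance VALUES (10, ?, 1)",
        ((now - timedelta(days=55)).isoformat(),),
    )

    # 30-day threshold: both dag_version rows are old enough to qualify.
    clean_dag_version(conn, cutoff=now - timedelta(days=30))  # must not raise

    remaining = [row[0] for row in conn.execute("SELECT id FROM dag_version ORDER BY id")]
    assert remaining == [1], remaining
```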

The root logger level was hardcoded to INFO instead of respecting the configured logging_level setting. As a result, messages from user task code using logging.getLogger(__name__).info() did not show up in task logs unless the log level was artificially high (e.g. level 55).

Changes:

- Set the root logger level to log_level.upper() instead of the hardcoded INFO
- Add tests verifying that the root logger respects the configured log level
- Add a test for INFO-level filtering (DEBUG messages are not shown)

This restores the behavior from Airflow 2.x where logger.info() worked as documented.
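A minimal sketch of the logging change using only the standard library. configure_root_logger is a hypothetical helper, and the level is passed in as a plain string; in Airflow it would come from the logging_level setting. Logger.setLevel accepts level names such as "DEBUG", so log_level.upper() can be passed to it directly.

```python
import io
import logging


def configure_root_logger(log_level: str, stream: io.StringIO) -> logging.Logger:
    """Hypothetical helper: apply the configured level instead of hardcoding INFO."""
    root = logging.getLogger()
    root.handlers.clear()  # keep the sketch deterministic across calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    root.addHandler(handler)
    root.setLevel(log_level.upper())  # previously the equivalent of setLevel(logging.INFO)
    return root


# Stands in for logging.getLogger(__name__) inside user task code.
task_log = logging.getLogger("my_dag.tasks")

debug_buf = io.StringIO()
configure_root_logger("debug", debug_buf)
task_log.debug("debug message")
task_log.info("info message")

info_buf = io.StringIO()
configure_root_logger("INFO", info_buf)
task_log.debug("debug message")  # filtered out at INFO
task_log.info("info message")

print(debug_buf.getvalue())
print(info_buf.getvalue())
```

The task logger has no level of its own, so it inherits the root logger's effective level; that is why the root level alone decides whether the task code's info() and debug() calls are emitted.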

When using 'airflow db clean', an IntegrityError occurs when attempting
to delete rows from the dag_version table that are still referenced by
task_instance or dag_run rows. This happens because:

1. dag_version rows are deleted based on their 'created_at' timestamp
2. task_instance rows are deleted based on their 'start_date' timestamp
3. A DAG created long ago but run recently has an old dag_version but
   recent task_instance records
4. The foreign key constraint task_instance_dag_version_id_fkey was
   changed to ON DELETE RESTRICT in migration 3ac9e5732b1f

This fix adds logic to exclude dag_version rows from deletion if they
are still referenced by any task_instance or dag_run rows, regardless
of the age of the dag_version record itself. This respects the foreign
key constraint and prevents the IntegrityError.

Changes:
- Modified _build_query() in db_cleanup.py to add EXISTS checks for
  dag_version table, excluding rows with active references
- Added a test case verifying that a dag_version row with active
  references is not deleted even when it is old enough to meet the age criteria

Fixes the issue where db clean fails with:
IntegrityError: Cannot delete or update a parent row: a foreign key
constraint fails (airflow.task_instance, CONSTRAINT
task_instance_dag_version_id_fkey FOREIGN KEY (dag_version_id)
REFERENCES dag_version (id) ON DELETE RESTRICT)
@kaxil
Member

kaxil commented Jan 5, 2026

the fact that you have 2 PRs with the same pr title doesn't instil confidence in your PRs anymore. #59525

@Arunodoy18 I am going to close your PRs -- Please review and test your changes with correct PR description. Using LLMs without those increases maintenance burdens and CI run time.

Feel free to recreate focussed PRs following those guidelines.

@kaxil kaxil closed this Jan 5, 2026