test(model): roll back uncommitted ds_col mutations in timestamp-expression tests#40451
Conversation
Code Review Agent Run #7e9867Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
…ession tests ``test_get_timestamp_expression`` and ``test_get_timestamp_expression_epoch`` mutate ``birth_names.ds_col.expression`` (and, in the ``_epoch`` variant, ``ds_col.python_date_format``) to exercise the ``get_timestamp_expression`` compile path. ``expression`` is restored at the end of each test; ``python_date_format`` is not. Neither test commits or rolls back, so the ``TableColumn`` row stays in ``session.dirty`` for the next autoflush. This is harmless today because nothing on master inspects ``TableColumn`` for dirty state at flush time. It becomes load-bearing the moment any code adds a SQLAlchemy listener that walks dirty rows — for example, the ``__versioned__`` instrumentation from the in-flight entity-versioning work (sc-103156 / PR apache#39603). With that instrumentation applied, the next autoflush captures the uncommitted mutation, writes a ``table_columns_version`` shadow row, and corrupts ``birth_names``' version timeline — cascading into 422s on downstream dataset-restore tests and empty RLS clauses on the virtual-dataset tests, but **only on Postgres** (SQLite's post-INSERT attribute-expire behaviour is laxer and hides the bug). Wrap each test body in ``try`` / ``finally`` with ``metadata_db.session.rollback()`` in the ``finally`` to discard the attribute history. Behaviour on vanilla master is unchanged; the fix just removes a latent footgun for anyone adding flush-time listeners to ``TableColumn``. Closes sc-107618.
2c687de to
bb94e0f
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #40451 +/- ##
=======================================
Coverage 64.18% 64.18%
=======================================
Files 2592 2592
Lines 139256 139256
Branches 32334 32334
=======================================
Hits 89385 89385
Misses 48336 48336
Partials 1535 1535
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Code Review Agent Run #a449f6Actionable Suggestions - 0Filtered by Review RulesBito filtered these suggestions based on rules created automatically for your feedback. Manage rules.
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
SUMMARY
tests/integration_tests/model_tests.py::TestSqlaTableModel::test_get_timestamp_expressionand::test_get_timestamp_expression_epochdirectly mutatebirth_names.ds_col.expression(and, in the_epochvariant,ds_col.python_date_format) to exercise theget_timestamp_expressioncompile path.expressionis restored at the end of each test;python_date_formatis not. Neither test commits nor rolls back, so theTableColumnrow stays insession.dirtyfor the next autoflush.This is harmless on master today because nothing inspects
TableColumnfor dirty state at flush time. It becomes load-bearing the moment any code adds a SQLAlchemy listener that walks dirty rows — concretely, the__versioned__(sqlalchemy-continuum) instrumentation from the in-flight entity-versioning work (#39603). With that instrumentation applied, the next autoflush captures the uncommitted mutation, writes atable_columns_versionshadow row fords_col, and corruptsbirth_names' version timeline.The polluter mechanism is plain SQLAlchemy session-dirty + autoflush, not anything backend-specific — so the cascade reproduces on all three backends:
test_restore_applies_scalar_field(downstream ofbirth_namesversion-history)test_rls_filter_applies_to_virtual_dataset(+_with_join)test-sqlitetest-mysqltest-postgres (current/next/previous)The virtual-dataset-RLS-cascade Postgres-only symptom appears to involve an additional Postgres path through the composite-PK
rls_filter_tablesM2M (not fully root-caused; out of scope here). The dataset-restore cascade is uniform across backends.Fix: wrap each test body in
try/finallywithmetadata_db.session.rollback()in thefinallyto discard the dirty attribute history. Behaviour on master is unchanged; the fix removes a latent footgun for anyone adding flush-time listeners toTableColumn.TESTING INSTRUCTIONS
The fix is hygiene-only on vanilla master — the two affected tests pass before and after:
pytest tests/integration_tests/model_tests.py::TestSqlaTableModel::test_get_timestamp_expression \ tests/integration_tests/model_tests.py::TestSqlaTableModel::test_get_timestamp_expression_epochTo verify the latent bug this prevents, apply any SQLAlchemy
__versioned__listener toTableColumn(e.g. viasqlalchemy-continuum) and run the full integration suite. Before this fix: 3 downstream tests fail in CI's full-suite ordering (1 on each backend; Postgres also picks up the two RLS virtual-dataset failures). After: clean run.ADDITIONAL INFORMATION