fix(db): rollback on SessionCTX exception + pool_pre_ping #20

Merged: jirhiker merged 2 commits into main from claude/epic-cray-a3b28b on May 12, 2026

@jirhiker

Summary

  • SessionCTX exception handling: __exit__ now rolls back the session before closing it when an exception propagates through the context, so a connection never returns to the pool with an uncommitted transaction still open. Previously this could surface as a phantom failure several call sites away from the original error.
  • Counter leak fix: the _session_cnt decrement in close_session moved into a finally block. An exception during flush/close no longer leaves _session_cnt stuck above zero (which would leak the session forever and cause DetachedInstanceError-adjacent symptoms downstream).
  • pool_pre_ping=True enabled on both the standard MySQL/Postgres engine and the Cloud SQL IAM engine (skipped for SQLite). Catches Cloud SQL idle-disconnects before the next query instead of surfacing "server has gone away" to the user.
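
The first two bullets can be illustrated with a minimal, stdlib-only sketch. It is not the pychron implementation: FakeSession stands in for a SQLAlchemy session, and the Adapter class and its layout are assumptions made for illustration. Only the names SessionCTX, close_session, skip_flush and _session_cnt come from this PR.

```python
import threading


class FakeSession:
    """Stand-in for a SQLAlchemy Session; records the calls made on it."""

    def __init__(self, fail_on_flush=False):
        self.calls = []
        self.fail_on_flush = fail_on_flush

    def flush(self):
        self.calls.append("flush")
        if self.fail_on_flush:
            raise RuntimeError("flush failed")

    def rollback(self):
        self.calls.append("rollback")

    def close(self):
        self.calls.append("close")


class Adapter:
    """Hypothetical owner of one shared, reference-counted session."""

    def __init__(self, session_factory=FakeSession):
        self._session_factory = session_factory
        self._session_lock = threading.RLock()
        self._session_cnt = 0
        self.session = None

    def create_session(self):
        with self._session_lock:
            if self.session is None:
                self.session = self._session_factory()
            self._session_cnt += 1

    def close_session(self, skip_flush=False):
        with self._session_lock:
            try:
                # Only the outermost exit actually flushes and closes.
                if self._session_cnt == 1 and self.session is not None:
                    # Clear the attribute first so a failing close() cannot leave it set.
                    session, self.session = self.session, None
                    try:
                        if not skip_flush:
                            session.flush()
                    finally:
                        session.close()
            finally:
                # Runs even if flush() or close() raised, so the count cannot stick.
                self._session_cnt = max(0, self._session_cnt - 1)

    def session_ctx(self):
        return SessionCTX(self)


class SessionCTX:
    def __init__(self, parent):
        self._parent = parent

    def __enter__(self):
        self._parent.create_session()
        return self._parent.session

    def __exit__(self, exc_type, exc, tb):
        failed = exc_type is not None
        if failed and self._parent.session is not None:
            # Discard the pending transaction before the connection is released.
            self._parent.session.rollback()
        # Skip flush on the error path so it cannot mask the original exception.
        self._parent.close_session(skip_flush=failed)
        return False  # never swallow the caller's exception
```

In this sketch an exception raised inside a nested inner context rolls back the shared session and drops the counter by one, while the outer context still owns the session and closes it on its own exit.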

Why

The user reported intermittent DetachedInstanceErrors and added session_ctx as a workaround. An audit of the context manager surfaced three robustness gaps, which this PR addresses:

  1. __exit__ closed without rollback on exception — dirty connection returned to pool.
  2. close_session decremented the refcount after flush(), so a flush exception left the counter stuck and the session permanently held.
  3. Cloud SQL IAM engine had pool_pre_ping=False, so idle-drops surfaced as errors instead of being silently recovered.
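
For the third gap, a rough sketch of how the engine options might be chosen per backend. pool_pre_ping is a real keyword argument of sqlalchemy.create_engine; the helper name engine_kwargs and the kind labels are hypothetical and not taken from the codebase.

```python
def engine_kwargs(kind):
    """Return keyword arguments intended for sqlalchemy.create_engine().

    `kind` is a hypothetical label such as "sqlite", "mysql",
    "postgresql" or "cloudsql_iam".
    """
    kwargs = {}
    if kind != "sqlite":
        # Ask the pool to test each connection with a lightweight ping on
        # checkout and replace it if the server dropped it while idle.
        kwargs["pool_pre_ping"] = True
    return kwargs
```

A caller would then pass the result through, for example create_engine(url, **engine_kwargs("postgresql")). The pre-ping costs a small round trip on each checkout, which is usually a reasonable trade against stale-connection errors on a server that drops idle connections.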

The expire_on_commit=False setting and the application-level catch in sample_browser_model.py mask symptoms; this PR fixes upstream causes.

What this does NOT do

Deferred to follow-up tasks (already spawned):

  • Fix the DetachedInstanceError catch in sample_browser_model.py (reattach() kills the app) — root cause is get_analyses_uuid returning ORM rows with lazy relationships. Needs eager-load or DTO conversion.
  • Eager-loading audit of hot DVC fan-out loops (make_analyses, repository_transfer, sample browser load). Needs profiling first.
  • scoped_session migration — no background DB threads found in the codebase, so the current refcount + lock model is adequate. Revisit if threading needs arise.

Test plan

  • Smoke test against in-memory SQLite confirms:
    • Happy nested session_ctx — counter increments/decrements correctly, session closes on outermost exit
    • Exception inside a single session_ctx — session rolls back, counter resets to 0, session attribute cleared
    • Exception inside a nested inner session_ctx — outer survives with counter back to 1, closes cleanly on outer exit
    • use_parent_session=False — separate session created, parent session restored on exit
  • Manual verification against real Cloud SQL instance (idle reconnect path)
  • Soak test in a long-running pychron session to confirm no session-counter drift

🤖 Generated with Claude Code

jirhiker and others added 2 commits May 11, 2026 17:52
SessionCTX.__exit__ now rolls back the session on exception before
closing, so dirty connections never return to the pool. The close
path was reworked under try/finally so a failure during flush/close
cannot leave _session_cnt stuck above zero (previously leaked the
session forever). close_session() gained a skip_flush flag used on
the exception path to avoid flush re-raising and masking the
original error.

pool_pre_ping is now enabled on both the standard MySQL/Postgres
engine and the Cloud SQL IAM engine (skipped for SQLite). This
catches Cloud SQL's silent idle-disconnects before the next query
instead of surfacing "server has gone away" to the user.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eError

reattach() in SampleBrowserModel previously caught DetachedInstanceError
and killed the app. Root cause: get_analyses_uuid returned ORM rows with
default lazy='select' chains (irradiation_position.level/sample.project)
that detached when session_ctx closed. Downstream callers
(progress_bind_records -> AnalysisTbl.bind()) then walked those chains
and raised.

Add joinedload/selectinload options matching get_labnumber_analyses,
extracted as _analysis_eager_options() helper, and apply to
get_analysis, get_analysis_uuid, get_analyses_uuid, get_analysis_runid,
and get_analysis_by_attr. Drop the try/except in reattach() so any
remaining bug surfaces instead of a "restart pychron" dialog.
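
The failure mode and the fix described in this commit can be reproduced with a small sketch against in-memory SQLite. The three models below are simplified, hypothetical stand-ins for the real tables, the chain is reduced to sample.project, and only joinedload is shown; the helper name _analysis_eager_options is taken from the commit message, but its body here is an assumption.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, joinedload, relationship, sessionmaker
from sqlalchemy.orm.exc import DetachedInstanceError

Base = declarative_base()


class Project(Base):
    __tablename__ = "project"
    id = Column(Integer, primary_key=True)
    name = Column(String)


class Sample(Base):
    __tablename__ = "sample"
    id = Column(Integer, primary_key=True)
    project_id = Column(Integer, ForeignKey("project.id"))
    project = relationship(Project)  # default lazy="select"


class Analysis(Base):
    __tablename__ = "analysis"
    id = Column(Integer, primary_key=True)
    uuid = Column(String)
    sample_id = Column(Integer, ForeignKey("sample.id"))
    sample = relationship(Sample)  # default lazy="select"


def _analysis_eager_options():
    # Load the whole chain in the same query so nothing is left to lazy-load.
    return (joinedload(Analysis.sample).joinedload(Sample.project),)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine, expire_on_commit=False)

# Seed one row with its full relationship chain.
with Session() as seed:
    seed.add(Analysis(uuid="a1", sample=Sample(project=Project(name="demo"))))
    seed.commit()


def get_analysis_uuid(uuid, eager):
    """Return a detached Analysis, optionally with its chain eager-loaded."""
    session = Session()
    try:
        q = session.query(Analysis).filter(Analysis.uuid == uuid)
        if eager:
            q = q.options(*_analysis_eager_options())
        return q.one()
    finally:
        session.close()


loaded = get_analysis_uuid("a1", eager=True)
project_name = loaded.sample.project.name  # already in memory, no session needed

lazy = get_analysis_uuid("a1", eager=False)
try:
    lazy.sample  # lazy load on a detached instance
    lazy_failed = False
except DetachedInstanceError:
    lazy_failed = True
```

The eager variant keeps working after the session closes because the related rows are already attached to the returned object, whereas the lazy variant has to go back to a session that no longer exists.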

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jirhiker merged commit 8d3807b into main on May 12, 2026
2 of 4 checks passed