fix: stale recovery state from crashed run poisons reprompt #572

Merged
Muizzkolapo merged 2 commits into integration/batch-online-unification from fix/f13-stale-recovery-state on May 19, 2026

@Muizzkolapo
Owner

Summary

  • F13 bug fix: _process_original_batch loaded recovery_state from disk and passed it to check_and_submit_reprompt. If a previous run crashed after writing state, the stale reprompt_attempt counter caused the next fresh run to skip reprompt entirely (believing attempts were exhausted).
  • Fix 1: Original batch path no longer loads recovery state — any existing file is stale by definition (recovery batches go through a separate code path).
  • Fix 2: _finalize_batch_output now deletes any leftover recovery state file before finalizing, matching the cleanup already done in the recovery path's _finalize_and_cleanup.
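The failure mode and both fixes can be sketched in isolation. This is a simplified illustration, not the project's code: the function bodies, the `MAX_REPROMPT_ATTEMPTS` limit, the JSON state format, and the `recovery_state.json` filename are all assumptions; only the `reprompt_attempt` field and the function names being imitated come from the description above.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

MAX_REPROMPT_ATTEMPTS = 2  # hypothetical limit, chosen for illustration


def check_and_submit_reprompt(recovery_state: Optional[dict]) -> bool:
    """Simplified stand-in: return True if a reprompt would be submitted."""
    attempts = (recovery_state or {}).get("reprompt_attempt", 0)
    return attempts < MAX_REPROMPT_ATTEMPTS


def process_original_batch_buggy(state_file: Path) -> bool:
    # Old behaviour: trust whatever state file happens to be on disk,
    # even if it was written by a run that crashed.
    state = json.loads(state_file.read_text()) if state_file.exists() else None
    return check_and_submit_reprompt(state)


def process_original_batch_fixed(state_file: Path) -> bool:
    # Fix 1: the original batch path never loads recovery state.
    return check_and_submit_reprompt(None)


def finalize_batch_output(state_file: Path) -> None:
    # Fix 2: delete any leftover state file before finalizing.
    state_file.unlink(missing_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "recovery_state.json"  # hypothetical filename
    # Simulate a crashed run that left an exhausted attempt counter behind.
    state_file.write_text(json.dumps({"reprompt_attempt": MAX_REPROMPT_ATTEMPTS}))

    buggy_result = process_original_batch_buggy(state_file)   # reprompt skipped
    fixed_result = process_original_batch_fixed(state_file)   # reprompt submitted
    finalize_batch_output(state_file)
    state_removed = not state_file.exists()
```

In this sketch the buggy path reads the exhausted counter and skips the reprompt, while the fixed path ignores the file and the finalizer removes it so later runs start clean.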

Verification

  • Updated existing test test_process_original_batch_ignores_stale_recovery_state to assert recovery_state=None (it previously asserted the buggy behavior)
  • Added TestStaleRecoveryState with 2 regression tests:
    • test_stale_state_does_not_skip_reprompt — stale exhausted state does not prevent fresh reprompt
    • test_finalize_deletes_stale_recovery_state — finalization cleans up leftover state files
  • 7011 tests passed, ruff clean
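The shape of the updated assertion might look roughly like the following. It is a hedged sketch using `unittest.mock`: the `RecoveryStateManager` class body, the `process_original_batch` signature, and the state filename are hypothetical stand-ins, and the real test in the repository may be structured differently.

```python
from unittest.mock import MagicMock, patch


class RecoveryStateManager:
    """Hypothetical stand-in; only the class name and `load` appear in the PR."""

    @staticmethod
    def load(path: str) -> dict:
        # Pretend a crashed run left an exhausted counter on disk.
        return {"reprompt_attempt": 99}


def process_original_batch(reprompt, state_path: str):
    # Fixed behaviour (assumed shape): never consult the state file here.
    return reprompt(recovery_state=None)


def test_process_original_batch_ignores_stale_recovery_state() -> bool:
    reprompt = MagicMock(return_value=True)
    with patch.object(RecoveryStateManager, "load") as load_mock:
        process_original_batch(reprompt, "recovery_state.json")
    # The original batch path must not read recovery state from disk...
    load_mock.assert_not_called()
    # ...and must pass recovery_state=None to the reprompt check.
    reprompt.assert_called_once_with(recovery_state=None)
    return True


test_passed = test_process_original_batch_ignores_stale_recovery_state()
```

Asserting on the call arguments directly keeps the check independent of how the reprompt logic interprets the state.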

The original batch path loaded recovery_state from disk and passed it to
check_and_submit_reprompt. If a previous run crashed after writing state,
the stale reprompt_attempt counter caused the next fresh run to skip
reprompt (believing attempts were exhausted).

Two fixes:
1. _process_original_batch no longer loads recovery state — in the
   original batch path, any existing file is stale by definition.
2. _finalize_batch_output now deletes any leftover recovery state file
   before finalizing, matching the cleanup already done in the recovery
   path's _finalize_and_cleanup.

test_process_original_batch_ignores_stale_recovery_state already verifies
that recovery_state=None is passed and RecoveryStateManager.load is not
called. The removed test duplicated this assertion with a fragile
side_effect capture pattern.
@Muizzkolapo merged commit 1b5e19c into integration/batch-online-unification on May 19, 2026
1 check passed
@github-actions github-actions Bot locked and limited conversation to collaborators May 19, 2026