Detect incomplete Claude assistant runs by ejc3 · Pull Request #109 · ejc3/fcvm

ejc3 · 2026-01-05T19:41:00Z

Summary

Fixes a bug where Claude assistant jobs show "success" even when killed mid-execution.

Root cause: When GitHub cancels a job (e.g., via cancel-in-progress), the Claude process receives SIGTERM and exits. The async iterator in runClaude() stops yielding, the function returns normally, and the process exits with code 0. GitHub sees exit code 0 and marks the job as "success".

Evidence: Run #20726814304 shows "success" but logs show Claude was mid-execution when "Post job cleanup" started.

Changes

Add CLAUDE_ASSISTANT_COMPLETE marker after runClaude() finishes in index.ts
Verify marker exists in all workflow steps before declaring success
Applies to: review, manual-review, respond, ci-fix jobs

Test plan

Push to trigger Claude review
Verify completion marker appears in successful runs
Verify cancelled runs fail instead of showing success

Stacked on: tty-interactive-mode (PR #108)

When a Claude assistant job is killed mid-execution (e.g., by cancel-in-progress), the process exits with code 0 and GitHub marks the job as "success" even though it didn't complete. Fix: - Add completion marker in index.ts after runClaude finishes - Verify marker exists in workflow steps before declaring success - Applies to review, manual-review, respond, and ci-fix jobs Tested: Verified old run 20726814304 has no completion marker in logs, confirming it was killed mid-execution.

claude-claude · 2026-01-05T19:43:04Z

🔍 Claude Review

SEVERITY: critical

Findings

[CRITICAL] Race condition in pipeline operation
The pattern 2>&1 | tee /tmp/claude-output.log followed by grep on the same file creates a race condition. The tee process may still be buffering output when grep executes. While unlikely in most cases, under high system load or with large output, the marker could be in tee's buffer but not yet written to /tmp/claude-output.log when grep runs.

[CRITICAL] Missing set -o pipefail
Without set -o pipefail, the pipeline pnpm exec tsx index.ts 2>&1 | tee /tmp/claude-output.log returns the exit code of the last command (tee), not the first (pnpm exec tsx). This means:

If index.ts throws an error (line 706: process.exit(1)), it exits with code 1
But tee succeeds (exit 0), so the pipeline succeeds
The grep check then runs and fails (marker not present)
The real error message from main().catch() is hidden by the generic "process may have been killed" message

[MEDIUM] No SIGTERM signal handler
The PR description identifies SIGTERM as the root cause of premature exits, but no SIGTERM handler is added. When Node.js receives SIGTERM (e.g., from cancel-in-progress):

Current async operations may complete
Process exits before reaching line 702 (console.log("CLAUDE_ASSISTANT_COMPLETE"))
The marker is never printed, which is the intended behavior, BUT...
Without a handler, you can't distinguish "killed by SIGTERM" from "hung/infinite loop"

[MEDIUM] Completion marker may not flush to stdout
console.log() writes to a buffered stdout stream. If the process receives SIGTERM (or exits quickly) after line 702 executes, Node.js may not flush the stdout buffer before terminating. The marker could be "printed" in-memory but never appear in logs.

Recommended Fixes

Fix pipeline: Use a temporary exit code file instead of relying on the pipeline:

cd scripts/claude-assistant && pnpm exec tsx index.ts 2>&1 | tee /tmp/claude-output.log; echo $? > /tmp/claude-exit-code.txt
EXIT_CODE=$(cat /tmp/claude-exit-code.txt)
if \! grep -q "CLAUDE_ASSISTANT_COMPLETE" /tmp/claude-output.log; then
  echo "ERROR: Claude assistant did not complete - process may have been killed"
  exit 1
fi
exit $EXIT_CODE

Add pipefail (simpler alternative to Replace custom PassthroughFs with fuse-backend-rs #1):

set -o pipefail
cd scripts/claude-assistant && pnpm exec tsx index.ts 2>&1 | tee /tmp/claude-output.log
# ... rest of check

Add SIGTERM handler (in index.ts):

let terminated = false;
process.on('SIGTERM', () => {
  terminated = true;
  console.log("\n=== CLAUDE_ASSISTANT_TERMINATED_BY_SIGNAL ===");
  process.exit(143); // Standard SIGTERM exit code
});

Force flush stdout (replace line 702):

process.stdout.write("\n=== CLAUDE_ASSISTANT_COMPLETE ===\n");
// Ensure synchronous flush (process.stdout.write is sync for TTY, but be explicit)

Summary

The approach is sound, but the implementation has critical flaws that could cause false positives (reporting killed when actually errored) or false negatives (marker buffered but not flushed). The pipeline handling and stdout buffering issues need to be addressed.

Review by Claude | Run

claude-claude · 2026-01-05T19:44:19Z

🔧 Auto-Fix Attempted

I attempted to create fixes for the critical issues, but encountered a permissions issue:

Error: The GitHub App cannot push changes to .github/workflows/claude.yml without the workflows permission.

Changes I Attempted to Make

I created the following fixes on branch claude/fix-20727096323:

Added SIGTERM handler (index.ts):
- Distinguishes cancellation from incomplete execution
- Outputs CLAUDE_ASSISTANT_TERMINATED_BY_SIGNAL marker
- Exits with code 143 (standard SIGTERM exit code)
Fixed stdout buffering (index.ts):
- Changed console.log() to process.stdout.write() for completion marker
- Ensures synchronous flush before process exits
Added set -o pipefail (claude.yml, all 4 jobs):
- Propagates exit codes from pnpm exec tsx through the pipeline
- Prevents tee's success from masking index.ts errors

Required Actions

Option 1: Manually apply the changes from the review comments above

Option 2: Grant the workflows permission to the GitHub App and I can push the fixes

Option 3: Cherry-pick the commit from my branch:

git fetch origin claude/fix-20727096323
git cherry-pick d4fd160

View Claude Run

Addresses review feedback: 1. Add SIGTERM handler (index.ts) - Outputs CLAUDE_ASSISTANT_TERMINATED_BY_SIGNAL marker - Exits with code 143 (standard SIGTERM exit code) - Distinguishes "killed by signal" from "hung/crashed" 2. Fix stdout buffering (index.ts) - Use process.stdout.write() instead of console.log() - Ensures synchronous flush before process exits - Prevents marker from being lost in buffer 3. Add set -o pipefail (claude.yml, all 4 jobs) - Propagates exit codes through pipeline - Prevents tee's success from masking index.ts errors - review, manual-review, respond, ci-fix jobs updated Tested: pnpm exec tsx --version (syntax check)

claude-claude · 2026-01-05T20:21:10Z

🔍 Claude Review

SEVERITY: critical

Findings

[CRITICAL] Pipeline exit code not preserved
The workflow script does not check the pipeline exit code after the marker verification. If index.ts successfully writes the completion marker but then errors in main().catch() (lines 714-716), the workflow will:

Execute the pipeline which exits with code 1 (due to set -o pipefail)
Find the completion marker (grep succeeds)
Exit the script without error (exit code 0)
Mark the job as "success" despite the error

Example scenario:

await runClaude(prompt);
process.stdout.write("\n=== CLAUDE_ASSISTANT_COMPLETE ===\n");
// If an error occurs here or in cleanup code
throw new Error("Something failed");

The marker is in the logs, but the process errored. Current workflow reports success.

Fix: Capture and propagate the pipeline exit code:

set -o pipefail
cd scripts/claude-assistant && pnpm exec tsx index.ts 2>&1 | tee /tmp/claude-output.log
PIPELINE_EXIT=$?
if \! grep -q "CLAUDE_ASSISTANT_COMPLETE" /tmp/claude-output.log; then
  echo "ERROR: Claude assistant did not complete - process may have been killed"
  exit 1
fi
exit $PIPELINE_EXIT

This ensures:

If marker missing → exit 1 (killed/incomplete)
If marker present but pipeline failed → exit with pipeline code
If marker present and pipeline succeeded → exit 0

Summary

The completion marker detection is sound, but the workflow doesn't preserve the pipeline exit code. This can cause false positives where errored runs are marked as successful if the error occurs after the completion marker is printed. The fix is to capture and propagate the pipeline exit code.

Review by Claude | Run

claude-claude · 2026-01-05T20:22:27Z

🔧 Auto-Fix Attempted

I identified the critical issue and created a fix, but encountered a permissions issue:

Error: The GitHub App cannot push changes to .github/workflows/claude.yml without the workflows permission.

Fix Required

Add exit code preservation to all 4 Claude assistant jobs in .github/workflows/claude.yml:

         run: |
           set -o pipefail
           cd scripts/claude-assistant && pnpm exec tsx index.ts 2>&1 | tee /tmp/claude-output.log
+          PIPELINE_EXIT=$?
           if ! grep -q "CLAUDE_ASSISTANT_COMPLETE" /tmp/claude-output.log; then
             echo "ERROR: Claude assistant did not complete - process may have been killed"
             exit 1
           fi
+          exit $PIPELINE_EXIT

This needs to be applied to all 4 jobs: review, manual-review, respond, and ci-fix.

Why This Fix Is Needed

Without capturing the pipeline exit code, if index.ts successfully writes the completion marker but then errors (e.g., in main().catch()), the workflow will:

Run the pipeline which fails (exit 1)
Check for the marker (found)
Exit without error (exit 0) ← BUG
Report "success" despite the error

Manual Application

You can apply this fix manually by editing .github/workflows/claude.yml and adding the two lines (PIPELINE_EXIT=$? and exit $PIPELINE_EXIT) to each of the 4 jobs.

View Claude Run

ejc3 mentioned this pull request Jan 5, 2026

Create fix branch from PR branch, not main #110

Merged

3 tasks

ejc3 deleted the branch tty-interactive-mode January 6, 2026 04:11

ejc3 closed this Jan 6, 2026

ejc3 deleted the ci-log-truncation-check branch January 17, 2026 03:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Detect incomplete Claude assistant runs#109

Detect incomplete Claude assistant runs#109
ejc3 wants to merge 2 commits intotty-interactive-modefrom
ci-log-truncation-check

ejc3 commented Jan 5, 2026

Uh oh!

claude-claude bot commented Jan 5, 2026

Uh oh!

claude-claude bot commented Jan 5, 2026

Uh oh!

claude-claude bot commented Jan 5, 2026

Uh oh!

claude-claude bot commented Jan 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ejc3 commented Jan 5, 2026

Summary

Changes

Test plan

Uh oh!

claude-claude bot commented Jan 5, 2026

🔍 Claude Review

Findings

Recommended Fixes

Summary

Uh oh!

claude-claude bot commented Jan 5, 2026

🔧 Auto-Fix Attempted

Changes I Attempted to Make

Required Actions

Uh oh!

claude-claude bot commented Jan 5, 2026

🔍 Claude Review

Findings

Summary

Uh oh!

claude-claude bot commented Jan 5, 2026

🔧 Auto-Fix Attempted

Fix Required

Why This Fix Is Needed

Manual Application

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant