Flowctl comprehensive optimization and hardening#1
Merged
Conversation
- Replace verify_receipt + read_receipt_verdict with single read_and_verify_receipt that reads file once and returns validity + verdict together (eliminates TOCTOU race) - Add Python fcntl.flock-based concurrent run lock on fd 200 to prevent multiple ralph.sh instances on the same repo (degrades gracefully on Windows) - Add --dry-run flag that runs selector loop only, printing iter/status/epic/task info without Claude invocation, branch creation, or state changes Task: fn-4-flowctl-comprehensive-optimization-and.6
The spec heading validation added by task .6 rejected all heading errors including missing headings, which breaks the smoke test set-spec --file test case (legitimate specs without ## Done summary / ## Evidence). Per acceptance criteria, only duplicate headings should be rejected. Task: fn-4-flowctl-comprehensive-optimization-and.4
- Tests handle_pre_tool_use with 11 blocked command patterns (chat-send --json, --new-chat re-review, direct codex exec/review, --last, setup-review missing flags, select-add missing --window, flowctl done missing flags, receipt before review) - Tests handle_pre_tool_use with 11 allowed patterns (flowctl wrappers, chat-send without --json, first --new-chat, done with all flags, git/flowctl show, etc.) - Tests handle_post_tool_use state transitions (chat-send success/null, flowctl done tracking, codex review verdicts, receipt reset, setup-review W=/T= tracking) - Tests edge cases (done in comments, echo, multi-line commands, $FLOWCTL variable) - Tests protected file checks, parse_receipt_path, state persistence, stop events Task: fn-4-flowctl-comprehensive-optimization-and.3
- Remove dead cmd_codex_completion_review (~220 lines) from prompts.py (active version lives in commands.py, dead copy was from pre-refactor) - Remove 6 stale imports only used by the dead function (json, Any, EPICS_DIR, SPECS_DIR, TASKS_DIR, load_json_or_exit, get_flow_dir, gather_context_hints) - Add test_admin.py: 12 tests for validate_task_spec_headings (valid, missing, duplicate, edge cases) - Add test_task.py: 10 tests for patch_task_section (replace, add, duplicate error, heading strip, edge cases) - Add test_review.py: 33 tests for parse_codex_verdict, resolve_codex_sandbox, is_sandbox_failure (all branches, env vars, JSON output parsing) Note: import shutil was already present in commands.py (fixed by prior commit) Task: fn-4-flowctl-comprehensive-optimization-and.2
- Memoize get_state_dir() per cwd with _reset_state_dir_cache() for tests - Add load_all_tasks_with_state() using os.scandir for single-pass loading - Replace per-file task loading in cmd_tasks/cmd_list/cmd_epics with batch - Add [iter N] [fn-X.Y] prefix to watch-filter output from env vars - Use $RUN_DIR/guard-debug.log when RUN_DIR is set in ralph-guard Task: fn-4-flowctl-comprehensive-optimization-and.8
…s plan review - validate_task_spec_headings now strips fenced code blocks before checking for duplicate headings, preventing false positives on specs with markdown examples containing ## headings - epic reopen now resets plan_review_status and plan_reviewed_at alongside completion_review_status, so flowctl next --require-plan-review correctly re-requires review after reopening - Updated test to verify fenced code block handling (was documenting as known limitation, now fixed) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
z23cc
added a commit
that referenced
this pull request
Apr 9, 2026
P0 fixes (state loss — root cause of 5 issues): - get_flow_dir() now walks up directory tree (FLOW_STATE_DIR env → walk-up → CWD) Fixes: #1 state loss, #3 state not persistent, #5 worker parallel fail, #9 .flow symlink issues. Same pattern as git finding .git. - flowctl recover --epic <id> [--dry-run]: rebuilds task completion status from git log. Fixes #11 no recovery after state loss. P1 fixes (guard + review): - Guard graceful fallback: missing tools → "skipped" (not "failed"). Only actual failures block pipeline. Fixes #8. - Review-backend availability check: if rp-cli/codex not in PATH, auto-fallback to "none" with warning. Fixes #7. P2 fixes (UX): - Slug max length 40→20 chars. "Django+React platform with account management" → "fn-3-django-react-plat" not 40-char monster. Fixes #2 #12. - Brainstorm auto-skip: trivial tasks (≤10 words, contains "fix"/"typo"/etc) skip brainstorm entirely. Fixes #6. - --interactive flag: pause at key decisions. Fixes #10. 370 tests pass. Zero new dependencies. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
z23cc
added a commit
that referenced
this pull request
Apr 9, 2026
When starting a task while other tasks in the same epic are already in_progress, flowctl now emits: - stderr warning about worktree isolation requirement - JSON field "parallel_warning" with concurrent task count This catches the #1 Critical audit finding (3 consecutive rounds): workers running in the same directory without worktree isolation. The warning is informational (not blocking) because flowctl cannot control the Agent tool's isolation parameter. But the message appears in the agent's tool output, making it harder to ignore. Use --force to suppress the warning. 370 tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
z23cc
added a commit
that referenced
this pull request
Apr 9, 2026
…xes) Fix 1: Global --project-dir (-C) flag flowctl -C /path/to/project <command> Changes CWD before any command runs. Fixes the #1 recurring issue across 5 audit rounds: CWD drift after cd to subdirectories causes "task not found" / "epic not found" errors. Fix 2: Evidence must contain prose, not just Q:N scores Old: --evidence "Q1:3 Q2:2 Q3:3..." (50 chars, easily fabricated) New: minimum 200 chars + must reference questions Rejects: "Q1:3 Q2:2..." → "too short (50 chars, minimum 200)" Accepts: "Q1(Right problem):3 evidence from code..." → 438 chars OK This addresses the fn-8 audit finding where --score 24 was provided with fabricated Q:N entries. AI must now include actual reasoning. 370 tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Major quality, robustness, and DX overhaul for the flow-code plugin based on a deep codebase audit.
Changes (8 tasks across 3 waves)
Testing (P0)
conftest.pyfixtures (flow_dir,git_repo)Bug fixes
cmd_codex_completion_reviewduplicate) fromprompts.pyshutilimport crash incommands.pyRobustness (P1)
atomic_writenow callsf.flush()+os.fsync()(with macOSF_FULLFSYNC) beforeos.replace()file_locks.jsonoperations usefcntl.flockfor mutual exclusion (prevents concurrent worker races)task set-spec,epic set-plan), not justvalidateReview module deduplication (P1)
get_diff_context()intocore/git.py(eliminates 3x duplicated diff-fetch pattern)codex_utils.pyRalph hardening (P1)
read_and_verify_receipt())fcntl.flock-based concurrent run protection--dry-runmode (runs selector loop without Claude invocation)New commands (P2)
flowctl doctor— superset ofvalidate, checks state-dir health, orphaned files, stale tasks, lock accumulationflowctl epic reopen— reopens closed epics, resets review metadata, fails on archived epicsPerformance & DX (P2-P3)
get_state_dir()memoized (eliminates repeatedgit rev-parsesubprocess)load_all_tasks_with_state()batch loading (singleos.scandirvs per-file)[iter N] [fn-X.Y])$RUN_DIRpath when availableTest plan
python3 -m pytest scripts/flowctl/tests/ -v— 212 passedbash scripts/smoke_test.sh— 90/90 passedbash scripts/ci_test.sh— 44/44 passed🤖 Generated with Claude Code