Flowctl comprehensive optimization and hardening by z23cc · Pull Request #1 · z23cc/flow-code

z23cc · 2026-04-03T04:30:24Z

Summary

Major quality, robustness, and DX overhaul for the flow-code plugin based on a deep codebase audit.

Changes (8 tasks across 3 waves)

Testing (P0)

212 pytest tests covering core modules (ids, io, state, config), command modules (admin, task, review), and ralph-guard.py hook logic
Test infrastructure with conftest.py fixtures (flow_dir, git_repo)

Bug fixes

Removed ~220 lines of dead code (cmd_codex_completion_review duplicate) from prompts.py
Fixed missing shutil import crash in commands.py
Fixed spec heading validation rejecting valid markdown inside fenced code blocks

Robustness (P1)

atomic_write now calls f.flush() + os.fsync() (with macOS F_FULLFSYNC) before os.replace()
file_locks.json operations use fcntl.flock for mutual exclusion (prevents concurrent worker races)
Spec heading validation runs on write (task set-spec, epic set-plan), not just validate

Review module deduplication (P1)

Extracted get_diff_context() into core/git.py (eliminates 3x duplicated diff-fetch pattern)
Extracted receipt lifecycle helpers into codex_utils.py

Ralph hardening (P1)

Fixed TOCTOU in receipt verification (single-read read_and_verify_receipt())
Added Python fcntl.flock-based concurrent run protection
Added --dry-run mode (runs selector loop without Claude invocation)

New commands (P2)

flowctl doctor — superset of validate, checks state-dir health, orphaned files, stale tasks, lock accumulation
flowctl epic reopen — reopens closed epics, resets review metadata, fails on archived epics

Performance & DX (P2-P3)

get_state_dir() memoized (eliminates repeated git rev-parse subprocess)
load_all_tasks_with_state() batch loading (single os.scandir vs per-file)
Watch-filter iteration/task prefix ([iter N] [fn-X.Y])
Guard debug log uses $RUN_DIR path when available

Test plan

python3 -m pytest scripts/flowctl/tests/ -v — 212 passed
bash scripts/smoke_test.sh — 90/90 passed
bash scripts/ci_test.sh — 44/44 passed
All Python files compile cleanly
Layer 3 adversarial review completed (2 iterations)

🤖 Generated with Claude Code

- Replace verify_receipt + read_receipt_verdict with single read_and_verify_receipt that reads file once and returns validity + verdict together (eliminates TOCTOU race) - Add Python fcntl.flock-based concurrent run lock on fd 200 to prevent multiple ralph.sh instances on the same repo (degrades gracefully on Windows) - Add --dry-run flag that runs selector loop only, printing iter/status/epic/task info without Claude invocation, branch creation, or state changes Task: fn-4-flowctl-comprehensive-optimization-and.6

The spec heading validation added by task .6 rejected all heading errors including missing headings, which breaks the smoke test set-spec --file test case (legitimate specs without ## Done summary / ## Evidence). Per acceptance criteria, only duplicate headings should be rejected. Task: fn-4-flowctl-comprehensive-optimization-and.4

- Tests handle_pre_tool_use with 11 blocked command patterns (chat-send --json, --new-chat re-review, direct codex exec/review, --last, setup-review missing flags, select-add missing --window, flowctl done missing flags, receipt before review) - Tests handle_pre_tool_use with 11 allowed patterns (flowctl wrappers, chat-send without --json, first --new-chat, done with all flags, git/flowctl show, etc.) - Tests handle_post_tool_use state transitions (chat-send success/null, flowctl done tracking, codex review verdicts, receipt reset, setup-review W=/T= tracking) - Tests edge cases (done in comments, echo, multi-line commands, $FLOWCTL variable) - Tests protected file checks, parse_receipt_path, state persistence, stop events Task: fn-4-flowctl-comprehensive-optimization-and.3

- Remove dead cmd_codex_completion_review (~220 lines) from prompts.py (active version lives in commands.py, dead copy was from pre-refactor) - Remove 6 stale imports only used by the dead function (json, Any, EPICS_DIR, SPECS_DIR, TASKS_DIR, load_json_or_exit, get_flow_dir, gather_context_hints) - Add test_admin.py: 12 tests for validate_task_spec_headings (valid, missing, duplicate, edge cases) - Add test_task.py: 10 tests for patch_task_section (replace, add, duplicate error, heading strip, edge cases) - Add test_review.py: 33 tests for parse_codex_verdict, resolve_codex_sandbox, is_sandbox_failure (all branches, env vars, JSON output parsing) Note: import shutil was already present in commands.py (fixed by prior commit) Task: fn-4-flowctl-comprehensive-optimization-and.2

- Memoize get_state_dir() per cwd with _reset_state_dir_cache() for tests - Add load_all_tasks_with_state() using os.scandir for single-pass loading - Replace per-file task loading in cmd_tasks/cmd_list/cmd_epics with batch - Add [iter N] [fn-X.Y] prefix to watch-filter output from env vars - Use $RUN_DIR/guard-debug.log when RUN_DIR is set in ralph-guard Task: fn-4-flowctl-comprehensive-optimization-and.8

…s plan review - validate_task_spec_headings now strips fenced code blocks before checking for duplicate headings, preventing false positives on specs with markdown examples containing ## headings - epic reopen now resets plan_review_status and plan_reviewed_at alongside completion_review_status, so flowctl next --require-plan-review correctly re-requires review after reopening - Updated test to verify fenced code block handling (was documenting as known limitation, now fixed) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

P0 fixes (state loss — root cause of 5 issues): - get_flow_dir() now walks up directory tree (FLOW_STATE_DIR env → walk-up → CWD) Fixes: #1 state loss, #3 state not persistent, #5 worker parallel fail, #9 .flow symlink issues. Same pattern as git finding .git. - flowctl recover --epic <id> [--dry-run]: rebuilds task completion status from git log. Fixes #11 no recovery after state loss. P1 fixes (guard + review): - Guard graceful fallback: missing tools → "skipped" (not "failed"). Only actual failures block pipeline. Fixes #8. - Review-backend availability check: if rp-cli/codex not in PATH, auto-fallback to "none" with warning. Fixes #7. P2 fixes (UX): - Slug max length 40→20 chars. "Django+React platform with account management" → "fn-3-django-react-plat" not 40-char monster. Fixes #2 #12. - Brainstorm auto-skip: trivial tasks (≤10 words, contains "fix"/"typo"/etc) skip brainstorm entirely. Fixes #6. - --interactive flag: pause at key decisions. Fixes #10. 370 tests pass. Zero new dependencies. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When starting a task while other tasks in the same epic are already in_progress, flowctl now emits: - stderr warning about worktree isolation requirement - JSON field "parallel_warning" with concurrent task count This catches the #1 Critical audit finding (3 consecutive rounds): workers running in the same directory without worktree isolation. The warning is informational (not blocking) because flowctl cannot control the Agent tool's isolation parameter. But the message appears in the agent's tool output, making it harder to ignore. Use --force to suppress the warning. 370 tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…xes) Fix 1: Global --project-dir (-C) flag flowctl -C /path/to/project <command> Changes CWD before any command runs. Fixes the #1 recurring issue across 5 audit rounds: CWD drift after cd to subdirectories causes "task not found" / "epic not found" errors. Fix 2: Evidence must contain prose, not just Q:N scores Old: --evidence "Q1:3 Q2:2 Q3:3..." (50 chars, easily fabricated) New: minimum 200 chars + must reference questions Rejects: "Q1:3 Q2:2..." → "too short (50 chars, minimum 200)" Accepts: "Q1(Right problem):3 evidence from code..." → 438 chars OK This addresses the fn-8 audit finding where --score 24 was provided with fabricated Q:N entries. AI must now include actual reasoning. 370 tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

z23cc and others added 6 commits April 3, 2026 11:50

z23cc merged commit 51d6280 into main Apr 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Flowctl comprehensive optimization and hardening#1

Flowctl comprehensive optimization and hardening#1
z23cc merged 6 commits intomainfrom
fn-4-flowctl-comprehensive-optimization-and

z23cc commented Apr 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

z23cc commented Apr 3, 2026

Summary

Changes (8 tasks across 3 waves)

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant