Skip to content

Flowctl comprehensive optimization and hardening#1

Merged
z23cc merged 6 commits intomainfrom
fn-4-flowctl-comprehensive-optimization-and
Apr 3, 2026
Merged

Flowctl comprehensive optimization and hardening#1
z23cc merged 6 commits intomainfrom
fn-4-flowctl-comprehensive-optimization-and

Conversation

@z23cc
Copy link
Copy Markdown
Owner

@z23cc z23cc commented Apr 3, 2026

Summary

Major quality, robustness, and DX overhaul for the flow-code plugin based on a deep codebase audit.

Changes (8 tasks across 3 waves)

Testing (P0)

  • 212 pytest tests covering core modules (ids, io, state, config), command modules (admin, task, review), and ralph-guard.py hook logic
  • Test infrastructure with conftest.py fixtures (flow_dir, git_repo)

Bug fixes

  • Removed ~220 lines of dead code (cmd_codex_completion_review duplicate) from prompts.py
  • Fixed missing shutil import crash in commands.py
  • Fixed spec heading validation rejecting valid markdown inside fenced code blocks

Robustness (P1)

  • atomic_write now calls f.flush() + os.fsync() (with macOS F_FULLFSYNC) before os.replace()
  • file_locks.json operations use fcntl.flock for mutual exclusion (prevents concurrent worker races)
  • Spec heading validation runs on write (task set-spec, epic set-plan), not just validate

Review module deduplication (P1)

  • Extracted get_diff_context() into core/git.py (eliminates 3x duplicated diff-fetch pattern)
  • Extracted receipt lifecycle helpers into codex_utils.py

Ralph hardening (P1)

  • Fixed TOCTOU in receipt verification (single-read read_and_verify_receipt())
  • Added Python fcntl.flock-based concurrent run protection
  • Added --dry-run mode (runs selector loop without Claude invocation)

New commands (P2)

  • flowctl doctor — superset of validate, checks state-dir health, orphaned files, stale tasks, lock accumulation
  • flowctl epic reopen — reopens closed epics, resets review metadata, fails on archived epics

Performance & DX (P2-P3)

  • get_state_dir() memoized (eliminates repeated git rev-parse subprocess)
  • load_all_tasks_with_state() batch loading (single os.scandir vs per-file)
  • Watch-filter iteration/task prefix ([iter N] [fn-X.Y])
  • Guard debug log uses $RUN_DIR path when available

Test plan

  • python3 -m pytest scripts/flowctl/tests/ -v — 212 passed
  • bash scripts/smoke_test.sh — 90/90 passed
  • bash scripts/ci_test.sh — 44/44 passed
  • All Python files compile cleanly
  • Layer 3 adversarial review completed (2 iterations)

🤖 Generated with Claude Code

z23cc and others added 6 commits April 3, 2026 11:50
- Replace verify_receipt + read_receipt_verdict with single
  read_and_verify_receipt that reads file once and returns
  validity + verdict together (eliminates TOCTOU race)
- Add Python fcntl.flock-based concurrent run lock on fd 200
  to prevent multiple ralph.sh instances on the same repo
  (degrades gracefully on Windows)
- Add --dry-run flag that runs selector loop only, printing
  iter/status/epic/task info without Claude invocation,
  branch creation, or state changes

Task: fn-4-flowctl-comprehensive-optimization-and.6
The spec heading validation added by task .6 rejected all heading errors
including missing headings, which breaks the smoke test set-spec --file
test case (legitimate specs without ## Done summary / ## Evidence).
Per acceptance criteria, only duplicate headings should be rejected.

Task: fn-4-flowctl-comprehensive-optimization-and.4
- Tests handle_pre_tool_use with 11 blocked command patterns (chat-send --json,
  --new-chat re-review, direct codex exec/review, --last, setup-review missing
  flags, select-add missing --window, flowctl done missing flags, receipt before review)
- Tests handle_pre_tool_use with 11 allowed patterns (flowctl wrappers, chat-send
  without --json, first --new-chat, done with all flags, git/flowctl show, etc.)
- Tests handle_post_tool_use state transitions (chat-send success/null, flowctl
  done tracking, codex review verdicts, receipt reset, setup-review W=/T= tracking)
- Tests edge cases (done in comments, echo, multi-line commands, $FLOWCTL variable)
- Tests protected file checks, parse_receipt_path, state persistence, stop events

Task: fn-4-flowctl-comprehensive-optimization-and.3
- Remove dead cmd_codex_completion_review (~220 lines) from prompts.py
  (active version lives in commands.py, dead copy was from pre-refactor)
- Remove 6 stale imports only used by the dead function (json, Any,
  EPICS_DIR, SPECS_DIR, TASKS_DIR, load_json_or_exit, get_flow_dir,
  gather_context_hints)
- Add test_admin.py: 12 tests for validate_task_spec_headings (valid,
  missing, duplicate, edge cases)
- Add test_task.py: 10 tests for patch_task_section (replace, add,
  duplicate error, heading strip, edge cases)
- Add test_review.py: 33 tests for parse_codex_verdict, resolve_codex_sandbox,
  is_sandbox_failure (all branches, env vars, JSON output parsing)

Note: import shutil was already present in commands.py (fixed by prior commit)

Task: fn-4-flowctl-comprehensive-optimization-and.2
- Memoize get_state_dir() per cwd with _reset_state_dir_cache() for tests
- Add load_all_tasks_with_state() using os.scandir for single-pass loading
- Replace per-file task loading in cmd_tasks/cmd_list/cmd_epics with batch
- Add [iter N] [fn-X.Y] prefix to watch-filter output from env vars
- Use $RUN_DIR/guard-debug.log when RUN_DIR is set in ralph-guard

Task: fn-4-flowctl-comprehensive-optimization-and.8
…s plan review

- validate_task_spec_headings now strips fenced code blocks before checking
  for duplicate headings, preventing false positives on specs with markdown
  examples containing ## headings
- epic reopen now resets plan_review_status and plan_reviewed_at alongside
  completion_review_status, so flowctl next --require-plan-review correctly
  re-requires review after reopening
- Updated test to verify fenced code block handling (was documenting as
  known limitation, now fixed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@z23cc z23cc merged commit 51d6280 into main Apr 3, 2026
z23cc added a commit that referenced this pull request Apr 9, 2026
P0 fixes (state loss — root cause of 5 issues):
- get_flow_dir() now walks up directory tree (FLOW_STATE_DIR env → walk-up → CWD)
  Fixes: #1 state loss, #3 state not persistent, #5 worker parallel fail,
  #9 .flow symlink issues. Same pattern as git finding .git.
- flowctl recover --epic <id> [--dry-run]: rebuilds task completion status
  from git log. Fixes #11 no recovery after state loss.

P1 fixes (guard + review):
- Guard graceful fallback: missing tools → "skipped" (not "failed").
  Only actual failures block pipeline. Fixes #8.
- Review-backend availability check: if rp-cli/codex not in PATH,
  auto-fallback to "none" with warning. Fixes #7.

P2 fixes (UX):
- Slug max length 40→20 chars. "Django+React platform with account
  management" → "fn-3-django-react-plat" not 40-char monster. Fixes #2 #12.
- Brainstorm auto-skip: trivial tasks (≤10 words, contains "fix"/"typo"/etc)
  skip brainstorm entirely. Fixes #6.
- --interactive flag: pause at key decisions. Fixes #10.

370 tests pass. Zero new dependencies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
z23cc added a commit that referenced this pull request Apr 9, 2026
When starting a task while other tasks in the same epic are already
in_progress, flowctl now emits:
- stderr warning about worktree isolation requirement
- JSON field "parallel_warning" with concurrent task count

This catches the #1 Critical audit finding (3 consecutive rounds):
workers running in the same directory without worktree isolation.

The warning is informational (not blocking) because flowctl cannot
control the Agent tool's isolation parameter. But the message appears
in the agent's tool output, making it harder to ignore.

Use --force to suppress the warning.

370 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
z23cc added a commit that referenced this pull request Apr 9, 2026
…xes)

Fix 1: Global --project-dir (-C) flag
  flowctl -C /path/to/project <command>
  Changes CWD before any command runs. Fixes the #1 recurring issue
  across 5 audit rounds: CWD drift after cd to subdirectories causes
  "task not found" / "epic not found" errors.

Fix 2: Evidence must contain prose, not just Q:N scores
  Old: --evidence "Q1:3 Q2:2 Q3:3..." (50 chars, easily fabricated)
  New: minimum 200 chars + must reference questions
  Rejects: "Q1:3 Q2:2..." → "too short (50 chars, minimum 200)"
  Accepts: "Q1(Right problem):3 evidence from code..." → 438 chars OK

This addresses the fn-8 audit finding where --score 24 was provided
with fabricated Q:N entries. AI must now include actual reasoning.

370 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant