
V2 reliability and performance hardening #3

Merged
z23cc merged 7 commits into main from fn-6-v2-reliability-and-performance-hardening
Apr 3, 2026

@z23cc z23cc commented Apr 3, 2026

Summary

Second optimization round, based on a deep audit of v0.1.18. It focuses on business logic reliability, performance hot paths, and review system robustness.

P0 — Critical reliability

  • cmd_done write ordering: Runtime state now written FIRST (authoritative), then spec, then receipt — prevents phantom incomplete tasks on crash
  • Workflow command tests: 43 new pytest tests for cmd_start, cmd_done, cmd_ready, cmd_next, cmd_restart, cmd_block
  • get_repo_root() cached: Module-level memoization eliminates 10+ subprocess calls per command
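
The write ordering in the first bullet can be sketched roughly as follows. This is a minimal illustration, not flowctl's actual code: the file names (`runtime.json`, `spec.md`, `receipt.json`) and the helpers `atomic_write` and `mark_done` are hypothetical. The point is only that the authoritative record is written first, so a crash between steps leaves the task marked done with at most a stale spec or a missing receipt.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write(path: Path, text: str) -> None:
    """Write via a temp file plus os.replace so readers never see a partial file."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def mark_done(task_dir: Path, task_id: str, summary: str) -> None:
    """Hypothetical completion routine: authoritative state first, derived files after."""
    # 1. Runtime state is authoritative, so it is written first.
    atomic_write(task_dir / "runtime.json",
                 json.dumps({"id": task_id, "status": "done"}))
    # 2. The spec is updated next; a crash here leaves a stale spec but a done task.
    spec = task_dir / "spec.md"
    existing = spec.read_text(encoding="utf-8") if spec.exists() else ""
    atomic_write(spec, existing + f"\n## Done summary\n{summary}\n")
    # 3. The receipt goes last; it only records what already happened.
    atomic_write(task_dir / "receipt.json",
                 json.dumps({"id": task_id, "summary": summary}))
```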

P1 — High priority fixes

  • ralph-guard TOCTOU: lock-check (read-only) replaced with lock --task (acquire-or-fail) — eliminates check-then-act race
  • Backward-compat writes removed: cmd_start/done/reset/restart no longer write status to git-tracked definition JSON
  • Codex review fixes: Timeout configurable via FLOW_CODEX_TIMEOUT, resume fallback logs warning, files_embedded respects sandbox mode
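
The difference between a read-only lock check and acquire-or-fail can be illustrated with an exclusive-create lock file. This sketch is an assumption about the general technique, not a description of how `lock --task` is implemented; `try_acquire` and the lock file layout are hypothetical.

```python
import os
from pathlib import Path


def try_acquire(lock_path: Path, owner: str) -> bool:
    """Atomically create the lock file; return False if it already exists."""
    try:
        # O_CREAT | O_EXCL turns "check" and "create" into one atomic step,
        # so two callers can never both believe the lock was free.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(owner)
    return True
```

A check-then-act version would call `lock_path.exists()` first and create the file afterwards, which leaves a window where a second process can pass the same check.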

P2 — Performance + DX

  • Batch git grep: gather_context_hints uses an alternation pattern (500 subprocess calls → ~10)
  • find_dependents O(N²) → O(N): Loads tasks once, traverses in-memory
  • cmd_next batch loading: Uses load_all_tasks_with_state() instead of per-file reads
  • rp.py --json consistency: All 15 error paths now respect args.json
  • Guard hardening: Command regex uses normalized input, state stored in state-dir, debug log rotates at 1MB
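
A rough sketch of the batched `git grep` idea from the first bullet: instead of one subprocess per symbol, symbols are joined into alternation patterns and searched a batch at a time. The function names and the batch size of 50 are assumptions for illustration; symbols are assumed to be plain identifiers, so `re.escape` leaves them unchanged.

```python
import re
import subprocess
from typing import Iterable, List


def build_patterns(symbols: Iterable[str], batch_size: int = 50) -> List[str]:
    """Group symbols into alternation patterns, one pattern per git grep call."""
    unique = sorted(set(symbols))
    return [
        "|".join(re.escape(s) for s in unique[i:i + batch_size])
        for i in range(0, len(unique), batch_size)
    ]


def files_mentioning(symbols: Iterable[str], repo: str = ".") -> List[str]:
    """Return tracked files that mention any symbol, using one call per batch."""
    hits = set()
    for pattern in build_patterns(symbols):
        proc = subprocess.run(
            ["git", "grep", "-l", "-w", "-E", "-e", pattern],
            cwd=repo, capture_output=True, text=True,
        )
        # git grep exits with 1 when nothing matched; anything else is an error.
        if proc.returncode not in (0, 1):
            raise RuntimeError(proc.stderr.strip())
        hits.update(proc.stdout.splitlines())
    return sorted(hits)
```

With this batch size, 500 distinct symbols produce 10 patterns and therefore 10 subprocess calls.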

P3 — Cross-platform + prompts

  • Windows flock: Uses msvcrt.locking() instead of no-op
  • Adversarial prompt validation: Warns on unconsumed {{placeholders}}
  • phases.md fix: codex exec → flowctl codex exec (matches guard allowlist)
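
The placeholder warning could look something like the sketch below. It assumes placeholders are simple `{{name}}` tokens and that substitution leaves unknown names untouched; `render`, `unconsumed`, and `render_with_warning` are hypothetical names, not functions from adversarial.py.

```python
import re
import sys
from typing import Dict, List

# Assumed placeholder syntax: {{name}} with optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render(template: str, values: Dict[str, str]) -> str:
    """Substitute known placeholders and leave unknown ones in place."""
    def substitute(match):
        return values.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(substitute, template)


def unconsumed(rendered: str) -> List[str]:
    """Names of placeholders still present after substitution."""
    return sorted(set(PLACEHOLDER.findall(rendered)))


def render_with_warning(template: str, values: Dict[str, str]) -> str:
    out = render(template, values)
    leftover = unconsumed(out)
    if leftover:
        print(f"WARNING: unconsumed placeholders: {', '.join(leftover)}",
              file=sys.stderr)
    return out
```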

Test plan

  • python3 -m pytest scripts/flowctl/tests/ -v — 255 passed (43 new)
  • bash scripts/smoke_test.sh — 90/90 passed
  • All Python files compile cleanly
  • Teams mode used for all parallel waves

🤖 Generated with Claude Code

z23cc and others added 7 commits April 3, 2026 14:22
…e files_embedded

- Replace hardcoded timeout=600 with FLOW_CODEX_TIMEOUT env var (default 600s)
- Print WARNING to stderr when codex resume falls back to new session
- Force files_embedded=True in plan-review when sandbox prevents disk reads

Task: fn-6-v2-reliability-and-performance-hardening.5
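
Reading the timeout from `FLOW_CODEX_TIMEOUT` might look like the sketch below. The 600-second default comes from the commit message; falling back to the default on non-numeric or non-positive values is an assumption, and `codex_timeout` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional

DEFAULT_TIMEOUT = 600  # seconds, matching the previously hardcoded value


def codex_timeout(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the timeout in seconds, falling back to the default on bad input."""
    env = os.environ if env is None else env
    raw = env.get("FLOW_CODEX_TIMEOUT", "")
    try:
        value = int(raw)
    except ValueError:
        # Unset, empty, or non-numeric: keep the default (assumed behavior).
        return DEFAULT_TIMEOUT
    return value if value > 0 else DEFAULT_TIMEOUT
```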

- compat.py: use msvcrt.locking() on Windows instead of no-op flock,
  with warning fallback for unknown platforms
- adversarial.py: warn on unconsumed {{...}} placeholders after
  template substitution
- phases.md: replace raw `codex exec` with `flowctl codex exec` to
  match guard allowlist
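
A platform-dispatching lock helper in the spirit of the compat.py change might be sketched like this. `lock_file` and `unlock_file` are hypothetical names, and locking a single byte at offset 0 is an assumption about how the region is chosen; `msvcrt.locking` and `fcntl.flock` are the standard-library calls.

```python
import sys
import warnings

if sys.platform == "win32":
    import msvcrt

    def lock_file(fh) -> None:
        # msvcrt.locking locks nbytes starting at the current file position.
        fh.seek(0)
        msvcrt.locking(fh.fileno(), msvcrt.LK_LOCK, 1)

    def unlock_file(fh) -> None:
        fh.seek(0)
        msvcrt.locking(fh.fileno(), msvcrt.LK_UNLCK, 1)
else:
    try:
        import fcntl
    except ImportError:  # platform with neither msvcrt nor fcntl
        fcntl = None

    def lock_file(fh) -> None:
        if fcntl is None:
            warnings.warn("file locking unavailable on this platform; proceeding unlocked")
            return
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX)

    def unlock_file(fh) -> None:
        if fcntl is not None:
            fcntl.flock(fh.fileno(), fcntl.LOCK_UN)
```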

Write runtime state before spec/receipt in cmd_done so a mid-write
crash still marks the task as done (runtime is authoritative). Cache
get_repo_root() per cwd to avoid repeated subprocess calls, following
the same pattern as get_state_dir().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Status is now managed exclusively by runtime state files.
Definition JSON files no longer have status written to them
in cmd_task_reset, cmd_restart, cmd_skip, and cmd_split.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- cmd_start: 11 tests (claim, deps, blocked, force, resume, invalid ID)
- cmd_done: 10 tests (evidence, spec update, duration, cross-actor, force)
- cmd_ready: 7 tests (unblocked, dep done, skipped-as-done, in_progress, blocked)
- cmd_next: 6 tests (ready task, resume, all-done, completion/plan review gates)
- cmd_restart: 6 tests (cascade, dry-run, force, skip-todo, invalid/missing)
- cmd_block: 3 tests (status, done-fails, spec update)

Task: fn-6-v2-reliability-and-performance-hardening.2

- Thread use_json parameter through all rp.py helpers (require_rp_cli,
  run_rp_cli, parse_windows, parse_builder_tab) and cmd_* functions
- All ~15 error_exit calls now use args.json instead of hardcoded False
- Add 1MB log rotation in ralph-guard _debug_log_path() (single-file .1)
- Bump RALPH_GUARD_VERSION to 0.15.0

Task: fn-6-v2-reliability-and-performance-hardening.7
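
Single-file rotation at 1MB, as described for the debug log, could be sketched like this. `rotate_if_needed` is a hypothetical helper, and treating 1MB as 1024 * 1024 bytes is an assumption.

```python
import os
from pathlib import Path

MAX_BYTES = 1024 * 1024  # assumed meaning of "1MB"


def rotate_if_needed(log_path: Path, max_bytes: int = MAX_BYTES) -> bool:
    """Rename the log to <name>.1 once it reaches max_bytes; True if rotated."""
    try:
        if log_path.stat().st_size < max_bytes:
            return False
    except FileNotFoundError:
        return False
    backup = log_path.with_name(log_path.name + ".1")
    # os.replace overwrites any previous .1, so at most one backup is kept.
    os.replace(log_path, backup)
    return True
```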

- gather_context_hints: collect all symbols, batch into ≤5 git grep calls
  instead of N per-symbol subprocess calls
- find_dependents: load all task files once upfront, then BFS over
  in-memory dict instead of re-globbing per BFS level
- cmd_next: use load_all_tasks_with_state(epic_id) single-scan batch
  load instead of per-file load_task_with_state loop

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
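
The load-once-then-BFS approach for dependents can be sketched as follows. The input shape (a dict mapping each task ID to the IDs it depends on) and the name `dependents_of` are assumptions for illustration; the real function reads task files from disk.

```python
from collections import deque
from typing import Dict, List, Set


def dependents_of(task_id: str, deps_by_task: Dict[str, List[str]]) -> Set[str]:
    """Return every task that depends on task_id, directly or transitively."""
    # Build the reverse edges once, instead of rescanning all tasks per BFS level.
    reverse: Dict[str, List[str]] = {}
    for tid, deps in deps_by_task.items():
        for dep in deps:
            reverse.setdefault(dep, []).append(tid)

    seen: Set[str] = set()
    queue = deque([task_id])
    while queue:
        current = queue.popleft()
        for child in reverse.get(current, []):
            if child != task_id and child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

Each task and each dependency edge is visited at most once, so the traversal is linear in the size of the in-memory graph.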
@z23cc z23cc merged commit 4c67897 into main Apr 3, 2026
z23cc added a commit that referenced this pull request Apr 9, 2026
P0 fixes (state loss — root cause of 5 issues):
- get_flow_dir() now walks up directory tree (FLOW_STATE_DIR env → walk-up → CWD)
  Fixes: #1 state loss, #3 state not persistent, #5 worker parallel fail,
  #9 .flow symlink issues. Same pattern as git finding .git.
- flowctl recover --epic <id> [--dry-run]: rebuilds task completion status
  from git log. Fixes #11 no recovery after state loss.
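
The lookup order described for get_flow_dir() (env override, then walk-up, then CWD) might be sketched like this. `find_flow_dir` is a hypothetical name, and two details are assumptions: that `FLOW_STATE_DIR` points directly at the state directory, and that the fallback is a `.flow` directory under the starting directory.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_flow_dir(start: Optional[Path] = None,
                  env: Optional[Mapping[str, str]] = None) -> Path:
    """Locate .flow: env override first, then nearest ancestor, then start dir."""
    env = os.environ if env is None else env
    override = env.get("FLOW_STATE_DIR")
    if override:
        return Path(override)

    start = (start or Path.cwd()).resolve()
    # Walk from the starting directory up to the filesystem root,
    # the same way git searches for .git.
    for candidate in (start, *start.parents):
        if (candidate / ".flow").is_dir():
            return candidate / ".flow"
    return start / ".flow"
```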

P1 fixes (guard + review):
- Guard graceful fallback: missing tools → "skipped" (not "failed").
  Only actual failures block pipeline. Fixes #8.
- Review-backend availability check: if rp-cli/codex not in PATH,
  auto-fallback to "none" with warning. Fixes #7.
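
The review-backend availability check can be sketched with `shutil.which`. The backend names `"rp"` and `"codex"` and their mapping to binaries are assumptions; the commit only states that rp-cli and codex are looked up in PATH and that the fallback is "none" with a warning.

```python
import shutil
import sys
from typing import Callable, Optional

# Assumed mapping from backend name to the binary it needs.
BACKEND_BINARIES = {"rp": "rp-cli", "codex": "codex"}


def resolve_review_backend(
    requested: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the requested backend, or "none" if its binary is not in PATH."""
    binary = BACKEND_BINARIES.get(requested)
    if binary is None:
        return requested  # "none" or a backend that needs no binary
    if which(binary) is None:
        print(f"WARNING: {binary} not found in PATH; review backend set to 'none'",
              file=sys.stderr)
        return "none"
    return requested
```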

P2 fixes (UX):
- Slug max length 40→20 chars. "Django+React platform with account
  management" → "fn-3-django-react-plat" not 40-char monster. Fixes #2 #12.
- Brainstorm auto-skip: trivial tasks (≤10 words, contains "fix"/"typo"/etc)
  skip brainstorm entirely. Fixes #6.
- --interactive flag: pause at key decisions. Fixes #10.
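
The brainstorm auto-skip rule might be approximated as below. Only "fix" and "typo" are named in the commit message, so the keyword set here is deliberately incomplete; requiring both the word limit and a keyword match is an assumption, and `should_skip_brainstorm` is a hypothetical name.

```python
import re

# Only these two keywords appear in the commit message; the full list is not given.
TRIVIAL_KEYWORDS = {"fix", "typo"}
MAX_WORDS = 10


def should_skip_brainstorm(request: str) -> bool:
    """True for short requests that contain a trivial-task keyword."""
    words = re.findall(r"[A-Za-z0-9']+", request.lower())
    return len(words) <= MAX_WORDS and any(w in TRIVIAL_KEYWORDS for w in words)
```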

370 tests pass. Zero new dependencies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>