Skip to content

feat(guardrails): autonomous pipeline — orchestration, auto-review, ship agent#145

Merged
terisuke merged 7 commits intodevfrom
feat/autonomous-pipeline-wave1
Apr 8, 2026
Merged

feat(guardrails): autonomous pipeline — orchestration, auto-review, ship agent#145
terisuke merged 7 commits intodevfrom
feat/autonomous-pipeline-wave1

Conversation

@terisuke
Copy link
Copy Markdown

@terisuke terisuke commented Apr 8, 2026

Summary

Key Changes

File Lines Description
guardrail.ts +290 Client type, autoReview pipeline, checklist gate, workflow state machine, system.transform
team.ts +25 Worktree verification, directory injection, auto-commit, regex fix
agents/ship.md +42 New agent with merge permission
commands/ship.md rewrite Executable merge workflow (was read-only gate)
commands/auto.md +18 New autonomous pipeline command
AGENTS.md +1 Ship agent entry

Test plan

  • bun test all pass (2062 pass, pre-existing LSP fail only)
  • Typecheck clean (bun turbo typecheck)
  • Local deploy + binary launch
  • Firing test: /implement sets workflow_phase
  • Firing test: session.idle triggers auto-review
  • Firing test: /ship executes gh pr merge
  • Firing test: /auto completes full pipeline

Closes #140, #141, #142, #144

🤖 Generated with Claude Code

terisuke and others added 5 commits April 8, 2026 23:59
…#141/#142

Bug 0-1: Add exit code guards to all git() helper callers to prevent
using stdout from failed commands in security decisions.

Bug 0-2: Fix team.ts mut() merge regex to use (\s|$) instead of \b,
preventing false matches on "git merge-base".

Bug 0-3: Create MUTATING_TOOLS constant (edit/write/apply_patch/multiedit)
and use it consistently for review_state resets and advisories.

Bug 0-4: Verified Python test detector regex already handles subdirectory
paths correctly via (^|\/) pattern — no change needed.

Bug 0-5: Convert chat.message git fetch from blocking await to
fire-and-forget with deferred advisory surfaced on next message.

Bug 0-6: Add secExempt pattern to skip .env.example/.sample/.template
files from secret material read blocks.

Bug 0-7: Add apply_patch to tool.execute.before preflight checks for
security files, config files, version baseline, and context budget.

Bug 0-8: Tighten mut[] regex patterns: \brm\b → \brm\s+, \bmv\b →
\bmv\s+, \bcp\b → \bcp\s+ to reduce false positives from arguments.

Closes #140, Closes #141, Closes #142

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add ship agent with gh pr merge permission for /ship command
- Rewrite /ship from read-only gate to executable merge workflow
- Fix delegation worktree empty init (Issue #144): verify files after creation
- Inject working directory path into child session prompts
- Auto-commit applied worker changes in merge() to prevent uncommitted drift
- Add ship agent to AGENTS.md subagents table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…p command

Add a dedicated ship agent with `gh pr merge` permission, replacing the
read-only review agent for the /ship command. The ship agent uses deny-by-default
permissions with explicit allowlists for git read commands and gh PR operations,
enabling OpenCode to actually execute merges instead of saying "please merge manually."

- New: packages/guardrails/profile/agents/ship.md (deny-by-default, gh pr merge allowed)
- Rewrite: /ship command from read-only gate check to executable merge workflow
- Update: AGENTS.md subagents table with ship agent entry
- Update: test replay fixture and assertion to use ship agent

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… management

Add autonomous pipeline support to guardrail.ts:
- Client type for session API (create, prompt, status, messages, abort)
- Workflow state tracking (phase, PR URL, review attempts, issues)
- Auto-review trigger on session.idle (3+ edits, review pending)
- pollIdle/readResult/parseFindings helpers for review pipeline
- checklist() gate for merge readiness (tests, review, CI, findings)
- Workflow phase transitions in tool.execute.after (PR/test/merge/CI/issue)
- experimental.chat.system.transform hook for pipeline guidance
- command.execute.before: /implement and /auto workflow initialization
- Compacting context includes workflow phase, PR, review attempts
- New /auto command for full autonomous pipeline execution

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…pipeline-wave1

# Conflicts:
#	packages/guardrails/profile/agents/ship.md
Copilot AI review requested due to automatic review settings April 8, 2026 15:47
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2026

New PR opened -- automated review will run on the next push.

To trigger a manual review, comment /review on this PR.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2026

This PR doesn't fully meet our contributing guidelines and PR template.

What needs to be fixed:

  • PR description is missing required template sections. Please use the PR template.

Please edit this PR description to address the above within 2 hours, or it will be automatically closed.

If you believe this was flagged incorrectly, please let a maintainer know.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2026

The following comment was made by an LLM, it may be inaccurate:

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the guardrails profile toward an autonomous workflow pipeline by adding auto-review orchestration, a dedicated /ship subagent that can execute gh pr merge, and tightening/repairing delegation worktree behavior.

Changes:

  • Add an auto-review pipeline triggered on session.idle (edits ≥ 3) plus workflow phase/state tracking and system prompt injection.
  • Introduce a new ship agent + rewrite /ship command to verify gates and run gh pr merge; add /auto command for end-to-end automation.
  • Improve delegation/worktree handling in team.ts (worktree verification, auto-commit, mutating regex adjustments) and update scenario tests/replays.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
packages/guardrails/profile/plugins/guardrail.ts Adds auto-review + workflow orchestration/state, expands mutation handling, and adjusts secret detection.
packages/guardrails/profile/plugins/team.ts Worktree creation verification, auto-commit after merge-back, and mutating command regex tweaks.
packages/guardrails/profile/agents/ship.md New ship subagent with restricted permissions and merge capability.
packages/guardrails/profile/commands/ship.md Rewrites /ship into an executable merge workflow (gates → merge).
packages/guardrails/profile/commands/auto.md Adds /auto command instructions for autonomous pipeline execution.
packages/guardrails/profile/AGENTS.md Documents the new ship agent.
packages/opencode/test/scenario/replay.ts Updates ship replay metadata to use the new ship agent/prompt.
packages/opencode/test/scenario/guardrails.test.ts Updates expectation that ship maps to the ship agent.
Comments suppressed due to low confidence (1)

packages/guardrails/profile/plugins/guardrail.ts:797

  • tool.execute.before applies deny/version/context-budget checks to edit/write/apply_patch, but not to multiedit. Since multiedit takes a filePath and performs sequential edits, this creates a policy bypass for secret/config protection and the version/context gates. Consider extending the same checks to multiedit (and if needed, iterating through edits to validate each operation) to keep behavior consistent with other mutating tools.
    "tool.execute.before": async (
      item: { tool: string; args?: unknown },
      out: { args: Record<string, unknown> },
    ) => {
      const file = pick(out.args ?? item.args)
      if (file && (item.tool === "read" || item.tool === "edit" || item.tool === "write" || item.tool === "apply_patch")) {
        const err = deny(file, item.tool === "read" ? "read" : "edit")
        if (err) {
          await mark({ last_block: item.tool, last_file: rel(input.worktree, file), last_reason: err })
          throw new Error(text(err))
        }
      }
      if (item.tool === "edit" || item.tool === "write" || item.tool === "apply_patch") {
        const err = await version(out.args ?? {})
        if (err) {
          await mark({ last_block: item.tool, last_file: file ? rel(input.worktree, file) : "", last_reason: err })
          throw new Error(text(err))
        }
      }
      if ((item.tool === "edit" || item.tool === "write" || item.tool === "apply_patch") && file && code(file)) {
        const count = await budget()
        if (count >= 4) {
          const budgetData = await stash(state)
          const readFiles = list(budgetData.read_files).slice(-5).join(", ")
          const err = `context budget exceeded after ${count} source reads (recent: ${readFiles || "unknown"}). Recovery options:\n(1) call \`team\` tool to delegate edit to isolated worker\n(2) use \`background\` tool for side work\n(3) narrow edit scope to a specific function/section rather than whole file\n(4) start a new session and continue from where you left off`
          await mark({ last_block: item.tool, last_file: rel(input.worktree, file), last_reason: err })
          throw new Error(text(err))
        }
      }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 103 to 107
const secExempt = /\.(example|sample|template)$/i

function has(file: string, list: RegExp[]) {
if (list === sec && secExempt.test(file)) return false
return list.some((item) => item.test(file))
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

secExempt currently exempts any path ending with .example, .sample, or .template from all sec protections. This weakens the secret-file guard broadly (e.g., a file matching *.key or *.pem but suffixed with .template would no longer be blocked). If the goal is only to allow .env.example (Issue #142), please narrow the exemption to explicit .env.(example|sample|template) (or a small allowlist) rather than a generic extension-based bypass.

Copilot uses AI. Check for mistakes.
Comment on lines +462 to +468
const REVIEW_POLL_GAP = 750

async function pollIdle(sessionID: string) {
for (;;) {
const stat = await input.client.session.status({ query: { directory: input.directory } })
const item = stat.data?.[sessionID]
if (!item || item.type === "idle") return
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pollIdle() loops indefinitely with no timeout/abort path. If the review session never reaches idle (API issue, stuck run, missing status entry), autoReview() never resolves/rejects and auto_review_in_progress can remain stuck true, preventing future auto-reviews and leaking a background task. Consider adding a max wait (and/or using input.client.session.abort) and ensuring auto_review_in_progress is cleared on timeout.

Suggested change
const REVIEW_POLL_GAP = 750
async function pollIdle(sessionID: string) {
for (;;) {
const stat = await input.client.session.status({ query: { directory: input.directory } })
const item = stat.data?.[sessionID]
if (!item || item.type === "idle") return
const REVIEW_POLL_GAP = 750
const REVIEW_POLL_TIMEOUT = 60_000
async function pollIdle(sessionID: string) {
const deadline = Date.now() + REVIEW_POLL_TIMEOUT
const sessionClient = input.client.session as typeof input.client.session & {
abort?: (args: { path: { id: string }; query: { directory: string } }) => Promise<unknown>
}
for (;;) {
const stat = await input.client.session.status({ query: { directory: input.directory } })
const item = stat.data?.[sessionID]
if (!item || item.type === "idle") return
if (Date.now() >= deadline) {
if (typeof sessionClient.abort === "function") {
await sessionClient.abort({ path: { id: sessionID }, query: { directory: input.directory } }).catch(() => undefined)
}
throw new Error(`Timed out waiting for review session ${sessionID} to become idle`)
}

Copilot uses AI. Check for mistakes.
Comment on lines +507 to +519
await pollIdle(made.data.id)
const result = await readResult(made.data.id)
const findings = parseFindings(result.text)
await mark({
auto_review_in_progress: false,
auto_review_session: made.data.id,
review_state: "done",
reviewed: true,
review_at: new Date().toISOString(),
edits_since_review: 0,
review_critical_count: findings.critical,
review_high_count: findings.high,
})
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

autoReview() marks review_state: "done" even when the spawned session returns no assistant message (result.text === "") or reports an error (result.error). This can incorrectly satisfy merge gates without a valid review result. Consider treating empty/error results as a failed review: keep review_state unset, record the error, and notify the parent session.

Copilot uses AI. Check for mistakes.
@@ -37,6 +37,8 @@ const mut = [
/>/,
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mutating-bash detector includes a bare />/, which matches > inside quoted strings (e.g., echo "version > 2") and will trigger mutating behavior like review_state resets and protected-mutation blocks. To avoid false positives (Issue #141), tighten this to only match redirections outside quotes (or add a scrub/parse step similar to team.ts before applying the regexes).

Suggested change
/>/,
/(?:^|[ \t;&|()])\d*>>?\s*(?:"[^"]+"|'[^']+'|(?!&?\d+\b)[^\s"'&|;()<>]+)/i,

Copilot uses AI. Check for mistakes.
Comment on lines 141 to 165
function scrub(cmd: string) {
return cmd
.replace(/(?:\d*>>?|\&>>?|\&>)\s*\/dev\/null\b/g, " ")
.replace(/\d*>\s*&\s*\d+\b/g, " ")
.replace(/\d*>\s*&-/g, " ")
}

function mut(cmd: string) {
const data = scrub(cmd)
return [
/\brm\b/i,
/\bmv\b/i,
/\bcp\b/i,
/\brm\s+/i,
/\bmv\s+/i,
/\bcp\s+/i,
/\bchmod\b/i,
/\bchown\b/i,
/\btouch\b/i,
/\btruncate\b/i,
/\btee\b/i,
/\bsed\s+-i\b/i,
/\bperl\s+-pi\b/i,
/>/,
/\bgit\s+(apply|am|merge|rebase|cherry-pick|checkout\s+--|reset\s+--hard)\b/i,
/\bgit\s+(apply|am|rebase|cherry-pick|checkout\s+--|reset\s+--hard)\b/i,
/\bgit\s+merge(\s|$)/i,
].some((item) => item.test(data))
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mut() still uses a bare />/ on the scrubbed command. Since scrub() does not remove quoted strings, this will still treat > inside quotes as a mutation (e.g., echo "version > 2"), which can incorrectly reset review state / trigger mutation gates. Consider tightening > detection to actual shell redirections outside quotes (or implement a small unquoted-token scan) per Issue #141.

Copilot uses AI. Check for mistakes.
Comment on lines +281 to +284
const files = await git(next, ["ls-files", "--cached"]).catch(() => ({ code: 1, out: "", err: "" }))
if (files.code !== 0 || !files.out.trim()) {
// Worktree might be empty — force checkout HEAD contents
await git(next, ["checkout", "HEAD", "--", "."])
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In yardadd(), if the worktree appears empty you run git checkout HEAD -- . but don't check the return code or re-validate that files are now present. If checkout fails (or the worktree is misconfigured), this will silently return a still-broken worktree and later steps will fail in harder-to-debug ways. Consider checking the checkout result and throwing (or at least surfacing a clear error) if the worktree remains empty.

Suggested change
const files = await git(next, ["ls-files", "--cached"]).catch(() => ({ code: 1, out: "", err: "" }))
if (files.code !== 0 || !files.out.trim()) {
// Worktree might be empty — force checkout HEAD contents
await git(next, ["checkout", "HEAD", "--", "."])
let files = await git(next, ["ls-files", "--cached"]).catch(() => ({ code: 1, out: "", err: "" }))
if (files.code !== 0 || !files.out.trim()) {
// Worktree might be empty — force checkout HEAD contents
const checkout = await git(next, ["checkout", "HEAD", "--", "."]).catch(() => ({ code: 1, out: "", err: "" }))
if (checkout.code !== 0) {
throw new Error(checkout.err || checkout.out || "Failed to restore files in git worktree")
}
files = await git(next, ["ls-files", "--cached"]).catch(() => ({ code: 1, out: "", err: "" }))
if (files.code !== 0 || !files.out.trim()) {
throw new Error("Git worktree is still empty after checking out HEAD")
}

Copilot uses AI. Check for mistakes.
await git(dir, ["add", "-A"])
const commitMsg = `chore(team): apply worker changes from task ${id}`
const commit = await git(dir, ["commit", "-m", commitMsg, "--no-verify"])
if (commit.code !== 0 && !commit.err.includes("nothing to commit")) {
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-commit failure detection only checks commit.err.includes("nothing to commit"), but git commit commonly prints "nothing to commit" to stdout (with non-zero exit) depending on config/version. This can incorrectly report an auto-commit failure in verification.issues. Consider checking both stdout and stderr for the "nothing to commit" condition (or using git diff --cached --quiet to decide whether to commit).

Suggested change
if (commit.code !== 0 && !commit.err.includes("nothing to commit")) {
const commitOutput = `${commit.out || ""}\n${commit.err || ""}`
if (commit.code !== 0 && !commitOutput.includes("nothing to commit")) {

Copilot uses AI. Check for mistakes.
Comment on lines +657 to +671
if (event.type === "session.idle") {
const data = await stash(state)
const edits = num(data.edit_count)
const pending = str(data.review_state) !== "done"
const inProgress = flag(data.auto_review_in_progress)
const sessionID = str(event.properties?.sessionID)

if (edits >= 3 && pending && !inProgress && sessionID) {
await mark({ auto_review_in_progress: true })
await seen("auto_review.triggered", { edit_count: edits, sessionID })
void autoReview(sessionID, data).catch(async (err) => {
await mark({ auto_review_in_progress: false })
await seen("auto_review.failed", { error: String(err) })
})
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-review behavior (trigger on session.idle after edit_count >= 3, spawning a review session, and updating review_state/counts) is newly introduced here, but there are no scenario/unit tests covering the trigger, the spawned-session polling, or the resulting state transitions (auto_review_in_progress, review_critical_count, etc.). Since packages/opencode/test/scenario/guardrails.test.ts already exercises other guardrail hooks, adding coverage for this path would help prevent regressions (especially around stuck in-progress state and merge gating).

Copilot uses AI. Check for mistakes.
terisuke and others added 2 commits April 9, 2026 00:53
- parseFindings(): skip negation lines ("No CRITICAL issues") to avoid
  false positive severity counts blocking clean reviews
- autoReview(): do not mark review_state="done" when session errored or
  returned empty — prevents merge gate bypass on failed reviews
- team.ts merge(): scope auto-commit to patch-applied files only instead
  of git add -A, preventing unrelated local edits from being swept in

Codex review: codex review --base dev (3 findings, all addressed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Scope secExempt to .env.example/.sample/.template only (not *.key.template)
- Add 2-minute timeout to pollIdle() to prevent stuck auto-review
- Tighten bare `>` regex to avoid matching inside quoted strings
- Add error check + re-verify after yardadd() checkout fallback

Review: PR #145 inline comments (6 findings, 4 new + 1 already fixed + 1 truncated)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(guardrails): apply_patch bypasses tool.execute.before preflight checks

2 participants