Workflows are multi-step agent pipelines defined as YAML files in `~/.agent007/workflows/`. Steps without `depends_on` run in parallel; steps with `depends_on` run after their predecessors complete.
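To make the execution semantics concrete, here is a minimal sketch using the YAML schema documented at the end of this section (workflow name, step IDs, agents, and prompts are all illustrative): two independent steps fan into one step that depends on both.

```yaml
name: parallel-demo
description: lint and docs-check run in parallel; report waits for both
steps:
  - id: lint                # no depends_on → starts immediately
    agent: Engineer
    prompt: |
      Lint the code touched by: {{task}}
    output: lint_findings
  - id: docs-check          # no depends_on → runs in parallel with lint
    agent: Researcher
    prompt: |
      Check the docs affected by: {{task}}
    output: docs_findings
  - id: report              # waits for both predecessors to complete
    agent: Engineer
    depends_on: [lint, docs-check]
    prompt: |
      Combine {{lint_findings}} and {{docs_findings}} into one report.
    output: report
```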
Command-style dispatch (Codex-friendly):
agent007_dispatch command="$agent007 wf tdd Add rate limiting to the API"
Via MCP (from your AI editor):
agent007_workflow_run name="tdd" task="Add rate limiting to the API"
Via CLI:
agent007 workflow run tdd "Add rate limiting to the API"

agent007_dispatch is additive convenience. Direct workflow tools still work.
For multi-step workflows where the host LLM executes each step:
agent007_workflow_start name="feature" task="..."
# Returns session ID
agent007_workflow_next session="<id>"
# Returns step prompts to execute
agent007_workflow_submit_step session="<id>" step="research" output="..."
# Submit your output; get next steps
agent007_workflow_approve session="<id>"
# Approve a human-gate step
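A hypothetical session, with a placeholder session ID and the step name taken from the example above:

```
agent007_workflow_start name="feature" task="Add rate limiting to the API"
# → returns session "<id>"
agent007_workflow_next session="<id>"
# → returns the prompt for step "research"
agent007_workflow_submit_step session="<id>" step="research" output="<your research findings>"
# → returns the next step's prompt, or an approval gate if one is configured
agent007_workflow_approve session="<id>"
# → clears the gate; call agent007_workflow_next again to continue
```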
Approval-gated workflows now follow a split ownership model:

- Workflow starts from Codex / Claude / Cursor / Copilot / Zed
  → approval surfaces back to the initiating client
  → the web dashboard stays read-only for that externally initiated run
- Workflow starts from the dashboard's standalone runtime
  → the dashboard can show Resume Workflow
  → the standalone run can resume there

The practical rule:

- Externally initiated workflows are approved and continued in the initiating client.
- Dashboard-owned standalone workflows can still be resumed from the dashboard.
For current limitations and guidance, see Known Issues.
Steps: 3 (sequential) | Use when: Writing a new feature or requirement with tests first.
red → green → refactor
- red — Write failing tests that define the expected behavior
- green — Write minimal implementation to make tests pass
- refactor — Clean up code while keeping tests green
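As a sketch, this red → green → refactor chain could be written in the workflow YAML schema documented at the end of this section (agent names and prompts here are illustrative, not the shipped definition):

```yaml
name: tdd
description: Red/green/refactor TDD loop
steps:
  - id: red
    agent: Engineer
    prompt: |
      Write failing tests that define the expected behavior of: {{task}}
    output: failing_tests
  - id: green
    agent: Engineer
    depends_on: [red]
    prompt: |
      Write the minimal implementation that makes these tests pass:
      {{failing_tests}}
    output: implementation
  - id: refactor
    agent: Engineer
    depends_on: [green]
    prompt: |
      Refactor {{implementation}} while keeping all tests green.
    output: refactored_code
```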
Steps: 4 (3 parallel + synthesize) | Use when: Reviewing code for quality, security, and performance.
security-review ────┐
performance-review ─┤─► synthesize
style-review ───────┘
- security-review — OWASP vulnerabilities, auth, injection, secrets
- performance-review — Allocations, N+1 queries, blocking calls, caching
- style-review — Code clarity, naming, patterns, maintainability
- synthesize — Aggregates all findings into a severity-ranked report
Steps: 5 (sequential) | Use when: Building a new feature end-to-end from requirements.
spec → pseudocode → architecture → refinement → completion
- spec — Detailed specification with requirements and constraints
- pseudocode — Structured pseudocode with logic flow
- architecture — Component design, data flow, interfaces
- refinement — Review for correctness, edge cases, quality
- completion — Final docs, tests, polish
Steps: 4 (3 parallel + synthesize) | Use when: Diagnosing errors, performance issues, or security events in logs.
error-finder ─────┐
pattern-analyst ──┤─► synthesize
security-checker ─┘
- error-finder — Identify errors, exceptions, and crashes
- pattern-analyst — Frequency patterns, timing, correlations
- security-checker — Auth failures, anomalies, potential attacks
- synthesize — Aggregated report with root cause and action items
Steps: 17 | Use when: Delivering a production feature with full review gates.
load-context → research → document-brief → feature-spec → architecture
→ implement → [APPROVAL GATE]
→ code-review, security-review, performance-review, gap-analysis, issue-analysis (parallel)
→ rework → test-design → test-coverage-review → document-feature
→ [APPROVAL GATE: release-signoff]
Human approval gates at implement and release-signoff.
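The YAML schema documented at the end of this section does not cover gate declarations, so purely as a hypothetical sketch, a gate might be attached to a step like this (the approval key is an assumption, not a documented field):

```yaml
steps:
  # ...earlier steps...
  - id: release-signoff
    approval: true                  # hypothetical key: pause here until a human approves
    depends_on: [document-feature]
```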
Steps: 7 | Use when: Turning a vague idea into a concrete PRD and architecture.
research → [APPROVAL] → document-ideation → prd → architecture → document-design → project-plan
Steps: 5 (4 parallel + synthesize) | Use when: Full security review before a release.
owasp-scan ────────┐
secrets-scan ──────┤─► synthesize
threat-model ──────┤
dependency-scan ───┘
Use when: You need fast idea exploration with an approval checkpoint before committing to full architecture work.
Typical flow:
brainstorm → approval → PRD + ideation doc
Eval Gates score each workflow run against a rolling baseline and make a pass / warn / block decision before the run proceeds past the gate.
```yaml
# Configured per-workflow in the workflow YAML
eval_gate:
  min_baseline_runs: 5   # runs needed before gating starts
  baseline_window: 20    # rolling window size
  warn_threshold: 0.15   # score drop that triggers warn
  block_threshold: 0.30  # score drop that triggers block
```

Decision fields surfaced in the dashboard:
| Field | Description |
|---|---|
| `decision` | pass / warn / block |
| `baseline_sample_size` | Runs in the current baseline window |
| `min_baseline_runs` | Minimum runs needed to activate the gate |
| `baseline_window` | Rolling window size used for scoring |
| `reason_codes` | Machine-readable codes explaining the decision |
| `message` | Human-readable explanation |
When the gate fires `block`, the workflow step is halted and the run is marked failed. A `warn` decision lets the run continue but records the degradation; `pass` continues silently.
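As a worked example, assuming the thresholds apply to the relative drop against the rolling baseline mean: with a baseline mean score of 0.80, a run scoring 0.66 represents a (0.80 − 0.66) / 0.80 = 17.5% drop, which exceeds warn_threshold (0.15) but not block_threshold (0.30), so the decision is `warn`.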
Eval gate results are visible in the Persisted Runs accordion in the web dashboard.
The reliability engine wraps each workflow step with four additive controls. All behaviors are feature-gated and backward-compatible — existing workflows are unaffected unless you opt in.
Tracks token spend per step and per run. On budget breach:

- If a degradation path is available → truncates step output and continues (`degrade` transition)
- If no degradation path → aborts the step (`budget-exceeded` transition)
Runs a pre-step check before executing any step marked as risky. If the guardrails check blocks the operation, the step is transitioned to `guardrail-blocked` and the run fails that step cleanly.
After a step produces output, the reliability engine scores confidence. Low-confidence output routes into the existing approval flow — a human can review and approve or reject before the workflow continues.
Each failure mode is recorded as an explicit transition (not a silent crash). Steps retry up to a bounded limit before marking as failed. Transition records are queryable on the run detail.
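Since all four controls are opt-in per workflow, enabling them presumably lives in the workflow YAML; the keys below are a hypothetical sketch of what that opt-in could look like, not documented fields:

```yaml
reliability:
  budget:
    max_tokens_per_step: 20000   # hypothetical: breach triggers degrade or budget-exceeded
  guardrails:
    risky_steps: [implement]     # hypothetical: run the pre-step check on these steps
  confidence:
    review_below: 0.6            # hypothetical: route low-confidence output to approval
  retries:
    max_attempts: 2              # hypothetical: bounded retries before a step is marked failed
```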
Adaptive Shadow records advisory routing recommendations alongside each run without changing execution. The router observes which model/route was used for each step and computes what it would recommend based on historical performance — logged as a shadow recommendation.
This is read-only. The actual route used does not change. Shadow recommendations accumulate over time and can be used to tune ModelRouter config.
Shadow data is surfaced in the Persisted Runs accordion under Routing Recommendations:
| step_id | current_route | recommended_route | confidence |
|---|---|---|---|
| researcher | claude-sonnet-4-6 | claude-opus-4-6 | 74% (12 samples) |
| implementer | claude-sonnet-4-6 | claude-sonnet-4-6 | 91% (12 samples) |
`fallback: true` means the router had insufficient data and used the default route.
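If the shadow data keeps recommending a different model for a step, you can apply it yourself in the ModelRouter config; the shape below is a hypothetical sketch, not the actual config schema:

```yaml
model_router:
  routes:
    researcher: claude-opus-4-6   # hypothetical: adopted from the 74%-confidence recommendation above
```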
agent007_workflow_create name="my-workflow" yaml="..."

Workflow YAML schema:
```yaml
name: my-workflow
description: What this workflow does
steps:
  - id: research
    agent: Researcher
    prompt: |
      Research the following: {{task}}
    output: research_findings
  - id: implement
    agent: Engineer
    depends_on: [research]
    prompt: |
      Based on: {{research_findings}}
      Implement: {{task}}
    output: implementation
```

Fields:
- `id` — unique step identifier
- `agent` — persona name to use for this step
- `prompt` — Tera template; `{{task}}` is always available; prior step outputs are available by their `output` variable name
- `output` — variable name this step's result is stored under
- `depends_on` — list of step IDs that must complete first (omit for parallel execution)
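Once created, the workflow can be run like any built-in one, using the CLI form shown earlier:

agent007 workflow run my-workflow "Describe the task here"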