From faf7e6698a418c89c3dfbecf289b0360f88375a2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jose=20Villase=C3=B1or=20Montfort?= <195970+montfort@users.noreply.github.com> Date: Tue, 5 May 2026 09:57:43 -0600 Subject: [PATCH] =?UTF-8?q?feat(framework):=20Phase=20v1=20PR=206=20?= =?UTF-8?q?=E2=80=94=20audit-prompt=20and=20audit-review=20skills=20update?= =?UTF-8?q?d?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Updates the two existing v0 skills to align with the v1 unified flow. audit-prompt is simplified to a thin orchestration shell (no more inline prompt surfacing); audit-review evolves substantially from "validate + merge YAML" to "consolidated analysis generator" producing a six-section review.md document. devtrail-audit-prompt (rewritten body, simplified to ~85 lines per platform): - No longer surfaces prompts inline. The v1 flow writes the prompt to .devtrail/audits//audit-prompt.md and the auditor-side CLIs read it from disk via /devtrail-audit-execute (PR 5). Operators never copy/paste prompts. - Calls `devtrail charter audit --prepare` (the v1 subcommand introduced in PR 4). - Next-steps guidance directs the operator to open N auditor-side CLIs and invoke /devtrail-audit-execute in each, then return to the main agent and run /devtrail-audit-review only when ALL audits commissioned have completed. devtrail-audit-review (substantial rewrite, ~253 lines per platform): - Replaces the v0 "validate + merge YAML" behavior with a consolidated analysis generator. The skill now: 1. Verifies report set under .devtrail/audits//. 2. Reads N reports, builds master finding list. 3. Verifies each finding against actual code via Explore agents in parallel (up to 3 at a time) — the calibrator role moves from a paste-based prompt template to in-conversation work with filesystem access. This is what makes the consolidated review substantive rather than mechanical. 4. Classifies findings by verdict (VALID / PARTIALLY VALID / MISATTRIBUTED / FALSE POSITIVE / DUPLICATE) and recalibrates severity against the active configuration (anti-inflation, anti-deflation per the audit prompt's Paso 5 discipline). 5. Identifies findings the auditors missed. 6. Writes review.md with six sections: Executive summary, Scope definition, Per-auditor evaluation, Remediation plan P0-P4, Discarded findings, Auditor ratings 1-10 across four weighted criteria. 7. Runs `devtrail charter audit --merge-reports` to validate all reports against the schema and emit/merge the external_audit YAML block. 8. Branch B handling: when telemetry doesn't exist yet, writes external-audit-pending.yaml for the operator to paste at charter close time. Per-platform variants: - dist/.claude/skills//SKILL.md (allowed-tools) - dist/.gemini/skills//SKILL.md (no allowed-tools) - dist/.agent/workflows/.md (description-only frontmatter) Both skills credit the lift from Sentinel's pre-DevTrail audit/SKILL.md and audit-review/SKILL.md (issue #102). Tests (cli/tests/audit_skill_test.rs): updated assertions for the parity test on each skill, removing v0-specific markers and adding v1 ones: - audit-prompt parity now asserts: /devtrail-audit-execute, /devtrail-audit-review, .devtrail/audits/, audit-prompt.md, --prepare, "ALL audits ... complete" wait warning, "different model families" recommendation. v0 markers ("Run AUDITOR PRIMARY PROMPT", "DO NOT use the same family for both") removed. - audit-review parity now asserts: --merge-reports, review.md, Executive summary, Remediation plan, Auditor ratings, the five verdicts (VALID/PARTIALLY VALID/MISATTRIBUTED/FALSE POSITIVE/ DUPLICATE), the four criterion names (Scope precision, Technical depth, Bug detection, False positive rate), and external-audit-pending.yaml. v0 markers (--calibrate, --finalize) removed. Test plan: - cargo test --test audit_skill_test → 12/12 green. - cargo test (full suite) → all suites green, no regressions. - No version bump (lands together with PRs 7-8 in the integrated v1 release per Propuesta v0.2 §5). Co-Authored-By: Claude Opus 4.7 (1M context) --- cli/tests/audit_skill_test.rs | 73 ++++- .../.agent/workflows/devtrail-audit-prompt.md | 85 +++--- .../.agent/workflows/devtrail-audit-review.md | 260 +++++++++++++---- .../skills/devtrail-audit-prompt/SKILL.md | 83 +++--- .../skills/devtrail-audit-review/SKILL.md | 262 ++++++++++++++---- .../skills/devtrail-audit-prompt/SKILL.md | 83 +++--- .../skills/devtrail-audit-review/SKILL.md | 260 +++++++++++++---- 7 files changed, 781 insertions(+), 325 deletions(-) diff --git a/cli/tests/audit_skill_test.rs b/cli/tests/audit_skill_test.rs index 2de88df..c993647 100644 --- a/cli/tests/audit_skill_test.rs +++ b/cli/tests/audit_skill_test.rs @@ -110,21 +110,36 @@ fn devtrail_audit_prompt_three_platforms_share_core_guidance() { .join("devtrail-audit-prompt.md"), ); - // The next-steps guidance text is load-bearing for the workflow — - // the operator gets the same instructions regardless of platform. + // v1: skill no longer surfaces prompts inline. It runs --prepare and + // points the operator at /devtrail-audit-execute in N CLIs. The + // wait-for-all warning is the load-bearing UX guarantee. for body in [&claude, &gemini, &agent] { assert!( - body.contains("Run AUDITOR PRIMARY PROMPT"), - "next-steps guidance missing" - ); - assert!( - body.contains("DO NOT use the same family for both"), - "heterogeneity recommendation missing" + body.contains("/devtrail-audit-execute"), + "skill must point operator at the auditor-side execute skill" ); assert!( body.contains("/devtrail-audit-review"), "skill must point operator at the follow-up review skill" ); + assert!( + body.contains(".devtrail/audits/") + && body.contains("audit-prompt.md"), + "skill must reference the v1 canonical prompt location" + ); + assert!( + body.contains("--prepare"), + "skill must invoke `devtrail charter audit ... --prepare`" + ); + assert!( + body.contains("ALL audits") && body.contains("complete"), + "skill must include the wait-for-all warning before review" + ); + assert!( + body.contains("DIFFERENT model families") + || body.contains("different model families"), + "skill must surface the heterogeneity inter-family recommendation" + ); } } @@ -221,21 +236,51 @@ fn devtrail_audit_review_three_platforms_share_core_guidance() { ); for body in [&claude, &gemini, &agent] { + // v1: review consolidates N reports + writes review.md + + // optionally merges YAML. No more --calibrate / --finalize + // round-trip. + assert!( + body.contains("--merge-reports"), + "skill must invoke the v1 merge-reports CLI subcommand" + ); + assert!( + body.contains("review.md"), + "skill must produce the consolidated review.md document" + ); + // Six-section structure lifted from Sentinel skill. + assert!( + body.contains("Executive summary") || body.contains("executive summary"), + "review.md must include an Executive summary section" + ); + assert!( + body.contains("Remediation plan") || body.contains("remediation plan"), + "review.md must include the prioritized remediation plan" + ); assert!( - body.contains("--calibrate"), - "skill must reference the CALIBRATE step" + body.contains("Auditor ratings") || body.contains("auditor ratings"), + "review.md must include the per-auditor ratings" ); + // Verdict vocabulary. assert!( - body.contains("--finalize"), - "skill must reference the FINALIZE step" + body.contains("VALID") + && body.contains("PARTIALLY VALID") + && body.contains("MISATTRIBUTED") + && body.contains("FALSE POSITIVE") + && body.contains("DUPLICATE"), + "skill must use the five-verdict vocabulary lifted from Sentinel" ); + // Branch B: pending YAML when telemetry doesn't exist yet. assert!( body.contains("external-audit-pending.yaml"), "skill must handle the Branch B (telemetry not yet present) case" ); + // The four-criterion weighted auditor rating. assert!( - body.contains("re-audit") || body.contains("re-merge"), - "skill must surface the v0 re-audit limitation" + body.contains("Scope precision") + && body.contains("Technical depth") + && body.contains("Bug detection") + && body.contains("False positive rate"), + "skill must include the four-criterion weighted rating" ); } } diff --git a/dist/.agent/workflows/devtrail-audit-prompt.md b/dist/.agent/workflows/devtrail-audit-prompt.md index 1037acb..6dd26f7 100644 --- a/dist/.agent/workflows/devtrail-audit-prompt.md +++ b/dist/.agent/workflows/devtrail-audit-prompt.md @@ -1,14 +1,14 @@ --- -description: Prepare external multi-model audit for a Charter. Generates auditor prompts inline so the operator can paste them into 2 LLM auditors of different families. +description: Generate the unified audit prompt for a Charter at the canonical filesystem location. The operator then opens N auditor-side CLIs (gemini-cli, claude-cli, copilot-cli, etc.) and invokes /devtrail-audit-execute in each — no copy/paste. Counterpart of /devtrail-audit-review. --- -# DevTrail Audit Prompt Workflow +# DevTrail Audit Prompt Skill -Generate prompts for external multi-model audit of a Charter, surfaced inline in the conversation so the operator can paste them into auditors of two different model families without leaving the chat. +Generate the unified audit prompt for a Charter and write it to the canonical filesystem location. The operator then opens auditor-side CLIs in the same repo and invokes `/devtrail-audit-execute` — the audit prompt is read directly from disk by each auditor; the operator never copies/pastes. ## When to invoke -Use this workflow when the developer agreed to run an external audit at the Charter checkpoint (see `.devtrail/00-governance/AGENT-RULES.md` § Audit checkpoint, available from `fw-4.8.0`). +Use this skill when the developer agreed to run an external audit at the Charter checkpoint (see `.devtrail/00-governance/AGENT-RULES.md` § Audit checkpoint). The Charter should be in `in-progress` or `declared` status — auditing closed Charters is allowed but atypical (warn the operator and proceed only on confirmation). @@ -16,7 +16,7 @@ The Charter should be in `in-progress` or `declared` status — auditing closed ### 1. Resolve the Charter -Argument: a Charter identifier (`CHARTER-04`, `04`, or the full id with slug). +Argument: a Charter identifier (`CHARTER-04`, `04`, or full id with slug). ```bash devtrail charter status @@ -24,67 +24,60 @@ devtrail charter status Verify the Charter exists and capture its `status`. If `status: closed`, surface a one-line warning to the operator and ask whether to proceed. -### 2. Generate the auditor prompts (PREPARE step) +### 2. Generate the unified audit prompt ```bash -devtrail charter audit +devtrail charter audit --prepare ``` -The CLI writes the resolved prompts to disk: +The CLI writes the resolved prompt to: -- `audit/charters//prompts/auditor-primary.prompt.md` -- `audit/charters//prompts/auditor-secondary.prompt.md` +``` +.devtrail/audits//audit-prompt.md +``` -This is `devtrail charter audit` step 1 (PREPARE). The CLI does NOT invoke any LLM — it only resolves placeholders against the Charter content, the git diff, and the originating AILOGs. +The prompt is self-contained: it embeds the Charter content, the originating AILOGs, the git diff over the resolved range (default `origin/main..HEAD`, falls back to `HEAD~1..HEAD` if no upstream is reachable), and the discipline rules (REGLA ABSOLUTA — SOLO LECTURA, evidence-citation, severity calibration). The prompt template lifts the seven universal sections from Sentinel's pre-DevTrail audit skill and parameterizes the project-specific hardcodes. -### 3. Surface the prompts inline +The CLI does NOT invoke any LLM. It only resolves placeholders. -Read both files and print their contents in the conversation, with clear separators: +### 3. Notify the operator -``` -═══════════════════ AUDITOR PRIMARY PROMPT ═══════════════════ -[full contents of auditor-primary.prompt.md] -══════════════════════════════════════════════════════════════ +After the CLI succeeds, print this guidance verbatim (substituting ``): -═══════════════════ AUDITOR SECONDARY PROMPT ═════════════════ -[full contents of auditor-secondary.prompt.md] -══════════════════════════════════════════════════════════════ ``` +Audit prompt prepared for . -Do not summarise or truncate. The operator needs the full prompts to paste into the external auditors as-is. - -### 4. Provide next-steps guidance - -After surfacing the prompts, print this guidance verbatim (substituting ``): + Location: .devtrail/audits//audit-prompt.md + (informational — you don't need to copy this path anywhere) -``` Next steps: - 1. Run AUDITOR PRIMARY PROMPT in a model of family A - (e.g., Anthropic — claude-sonnet-4-6 or claude-opus-4-7). + 1. Open one or more auditor-side CLIs (gemini-cli, claude-cli, + copilot-cli, codex-cli — whatever you have) in this repo. Each + CLI session uses its own model; recommendation is at least 2 + auditors of DIFFERENT model families, so cross-family blind + spots become signal when their findings converge. - 2. Run AUDITOR SECONDARY PROMPT in a model of family B - (e.g., Google — gemini-2.5-pro, or OpenAI — gpt-4o / Copilot). - DO NOT use the same family for both. Heterogeneity inter-family - is what makes convergent findings high-signal — same-family - auditors share blind spots. + 2. In each auditor CLI, invoke: - 3. Save the auditor responses to canonical paths: - audit/charters//auditor-primary.md - audit/charters//auditor-secondary.md + /devtrail-audit-execute - 4. Return with: /devtrail-audit-review - I will calibrate the responses, generate the calibrator analysis - locally, and merge the findings into the Charter telemetry. -``` + The skill is already installed (devtrail init copies it to all + three platform locations). It reads the prompt from disk + automatically — you do not need to copy/paste anything. -## Output schema for auditor responses + 3. When and only when ALL audits you commissioned have completed, + return to this main agent and run: -The `auditor-primary.md` and `auditor-secondary.md` files the operator saves must follow `audit-output.schema.v0.json` (validated by `devtrail charter audit --calibrate`). The required frontmatter is documented inside each prompt — the operator should preserve it when pasting the LLM response. + /devtrail-audit-review + + Reviewing with incomplete reports gives you a partial consolidated + analysis that you will have to discard or re-run. +``` ## Notes -- This workflow is **orchestration-only**. It does NOT invoke LLM APIs, decide which models the operator uses, or wait for responses. It surfaces prompts and exits. -- Re-running on the same Charter regenerates the prompts (idempotent at the prompt level). It does NOT overwrite operator-saved responses in `auditor-primary.md` / `auditor-secondary.md`. -- Heterogeneity inter-family is recommended but not enforced in v0. The operator decides the model pairing; the workflow surfaces the recommendation in the next-steps guidance. -- For the rationale on why dual auditors of different families produce calibration that mono-auditor cannot, see `dist/.devtrail/audit-prompts/auditor-primary.md` § heterogeneity (or `Propuesta/devtrail-cli-roadmap.md` §5.2 in the upstream repo). +- This skill is **orchestration-only**. It does NOT invoke LLM APIs, decide which models the operator uses, or wait for responses. It runs the CLI's `--prepare`, points the operator at the next-step skill, and exits. +- Re-running the skill on the same Charter regenerates the prompt at the same path. It does NOT touch operator-saved reports under `.devtrail/audits//report-*.md`. +- Heterogeneity inter-family is recommended but not enforced in v0; the operator decides the model pairing across the N CLIs they open. +- For the rationale on cross-family auditors, see `dist/.devtrail/audit-prompts/audit-prompt.md` § Tu rol or `Propuesta/devtrail-cli-roadmap.md` §5.2 in the upstream repo. diff --git a/dist/.agent/workflows/devtrail-audit-review.md b/dist/.agent/workflows/devtrail-audit-review.md index cd3443f..f28ed0c 100644 --- a/dist/.agent/workflows/devtrail-audit-review.md +++ b/dist/.agent/workflows/devtrail-audit-review.md @@ -1,107 +1,251 @@ --- -description: Calibrate the responses of two external auditors and merge findings into the Charter telemetry. Counterpart to /devtrail-audit-prompt — invoke after the operator has saved the auditor responses. +description: Consolidate N external auditor reports into a critical review document with verdicts, remediation plan, and auditor ratings. Then merge the external_audit YAML block into the Charter telemetry. Counterpart of /devtrail-audit-prompt and /devtrail-audit-execute. --- # DevTrail Audit Review Skill -Reconcile the responses of two external auditors and produce the calibrator analysis. When the Charter has telemetry, append the consolidated `external_audit:` array directly into the YAML so the operator does not have to copy/paste. +Critically evaluate the N external auditor reports for a Charter, cross-reference each finding against the actual source code, and produce a consolidated `review.md` with verdicts, a prioritized remediation plan, and per-auditor ratings. Then merge the `external_audit` YAML block into the Charter telemetry. + +This is the third and final step of the v1 audit cycle, and it is where the substance lives — the calibrator role (definitional reconciliation across heterogeneous auditor verdicts) is now performed by the main agent inline, replacing the v0 paste-based calibrator-reconciler prompt. ## When to invoke -After running `/devtrail-audit-prompt `, having pasted both prompts into LLMs of different families, and saved the responses to: +After the operator has commissioned the audits in N auditor-side CLIs (each running `/devtrail-audit-execute `) and **all of them have completed**. The operator returns to the main IDE and invokes: -- `audit/charters//auditor-primary.md` -- `audit/charters//auditor-secondary.md` +``` +/devtrail-audit-review +``` -This skill produces the calibrator response and merges findings into telemetry. +If only some audits have completed, **do not proceed** — invoking the skill with incomplete reports produces a partial consolidated analysis. Verify with the operator that all the CLIs they opened have finished writing their reports under `.devtrail/audits//report-*.md`. ## Instructions -### 1. Verify auditor responses exist +### 1. Resolve the Charter and verify report set + +Argument: a Charter identifier. ```bash -ls audit/charters//auditor-primary.md \ - audit/charters//auditor-secondary.md +ls -la .devtrail/audits// ``` -If either file is missing, instruct the operator to run `/devtrail-audit-prompt ` first and exit. +Confirm: -### 2. Resolve the calibrator prompt (CALIBRATE step) +- `audit-prompt.md` exists (the prepared prompt the auditors read). +- `report-*.md` files exist (one per auditor that completed). At least 2 reports are recommended for cross-family heterogeneity to deliver signal; warn (do not block) if only 1 report is present. +- Match each `report-.md` filename against the model identifier in its frontmatter (`auditor:` field) to confirm consistency. Discrepancies are surfaced to the operator. -```bash -devtrail charter audit --calibrate -``` +If only the prompt is present and no reports exist, instruct the operator to run `/devtrail-audit-execute` in the auditor CLIs first. -The CLI: -- Validates `auditor-primary.md` and `auditor-secondary.md` against `audit-output.schema.v0.json`. If validation fails, surface the error to the operator and exit — the auditor response frontmatter must follow the schema documented inside each prompt. -- Writes `audit/charters//prompts/calibrator-reconciler.prompt.md`. +### 2. Read all auditor reports + extract finding master list -### 3. Run the calibrator inline (this conversation IS the calibrator) +For each `.devtrail/audits//report-*.md`: -The calibrator-reconciler prompt is designed to run in any model family — the agent currently in this conversation can produce the calibrator response directly. Heterogeneity inter-family is required for the auditor pair, NOT for the calibrator (the calibrator's task is definitional, not discovery — see `Propuesta/devtrail-cli-roadmap.md` §5.2 for the rationale). +- Validate the frontmatter against `audit-output.schema.v0.json` (the CLI does this in step 6, but a soft check here gives an earlier and more readable error if any report is malformed). +- Extract: model identifier, total findings, findings by category, evidence_citations count, audit_quality. +- Parse the body for finding entries (typically `## Findings` → `### F1 — title — category` blocks with `Where:`, `What I observed:`, `Why I'm flagging it:`). -Read the resolved prompt: +Build a **master finding list** — every unique claim across all auditors, deduplicated when two auditors clearly describe the same thing. -```bash -cat audit/charters//prompts/calibrator-reconciler.prompt.md -``` +### 3. Verify every finding against actual code -Produce the response **following the prompt's output schema exactly** — the frontmatter is required (`audit_role: calibrator-reconciler`, `calibrator: `, `auditors_reconciled`, `findings_consolidated`, `findings_by_status`). Save to: +This is the substantive step. For EACH finding in the master list: -- `audit/charters//calibrator-reconciler.md` +**Launch Explore agents in parallel** (up to 3 at a time) to verify findings. Group findings by the file they reference and send related findings to the same agent. -### 4. Finalize and merge into telemetry +For each finding, the verification answers four questions: -Determine the telemetry path: `.devtrail/charters/.telemetry.yaml` (canonical form, `` without slug). +1. **Does the code actually have this problem?** Read the cited `path:line`. Is the claim accurate? +2. **Is it in scope?** Does the finding affect a task or file declared in the Charter? If it's outside the Charter's scope, classify as `MISATTRIBUTED`. +3. **What's the real severity given the CURRENT configuration?** Check active driver, feature flags, build tags, DB role, deployment scope (the calibration discipline from the audit prompt). The Etapa 12 example in the prompt is a real case of inflation that careful calibration catches. +4. **Is it a duplicate?** Does another auditor report the same finding differently? -```bash -test -f .devtrail/charters/.telemetry.yaml -``` +#### Verdict classification -**Branch A — telemetry exists** (operator has already run `devtrail charter close`): +Each finding gets one of these verdicts: -```bash -devtrail charter audit --finalize \ - --merge-into .devtrail/charters/.telemetry.yaml +- **VALID** — Real problem, in scope, correctly described, correctly severized. +- **PARTIALLY VALID** — Real observation but wrong severity, missing nuance, or triggers only under a config not active in main. Include the reclassification. +- **MISATTRIBUTED** — Real observation but belongs to a different Charter or scope unit. +- **FALSE POSITIVE** — Claim is factually incorrect (file exists, code works as expected, the assumed driver isn't active, etc.). +- **DUPLICATE** — Same finding reported by another auditor (reference the original). + +#### Severity reclassification + +When the auditor's severity does not match the active configuration: + +- **Inflation** (auditor says Critical/High, code confirms only conditional): downgrade to Medium with explicit note on what config would activate it, OR move to "post-Charter / no bloqueante" if the trigger is a component not yet implemented. +- **Deflation** (auditor says Low/None, code shows a real trigger in current config): upgrade with evidence from the code path that activates it. +- **Correct inflation** (auditor says Critical and config confirms it): keep. + +An auditor that consistently inflates or deflates loses points in the auditor rating (step 5). + +### 4. Identify findings the auditors missed + +Based on your code exploration, check whether there are problems the auditors did NOT find. Focus on: + +- Security: SQL queries missing ownership filters, missing transaction boundaries, secrets in logs. +- Logic: input parameters ignored, dead code paths, unreachable branches. +- Consistency: naming mismatches between layers (model vs handler vs API). + +Mark these as "Missed by all auditors" in the remediation plan. + +### 5. Build the consolidated review.md + +Write the consolidated analysis to `.devtrail/audits//review.md` with this structure (six sections, lifted from Sentinel's pre-DevTrail audit-review skill): + +```markdown +--- +audit_role: calibrator-reconciler +calibrator: +charter_id: +git_range: "" +prompt_used: ../audit-prompt.md +calibrated_at: +auditors_reconciled: + - report-.md + - report-.md + - ... +findings_consolidated: +findings_by_status: + agreed: + disputed: + unique_: + unique_: + rejected: +--- + +# Consolidated audit review — + +**Reviewer:** +**Date:** +**Confidence:** [High | Medium] + +## 1. Executive summary + +[2-3 paragraphs. Total findings count, scope confusion if any, most critical bug, +overall verdict on the Charter's implementation.] + +## 2. Scope definition + +[Table of Charter tasks, the closing criterion, and what IS vs IS NOT in scope +of this Charter. The auditors' findings are evaluated against THIS scope.] + +## 3. Per-auditor evaluation + +### 3.1 (model: ) + +| # | Finding | Reported severity | Verdict | Justification | +|---|---------|-------------------|---------|---------------| + +**Summary:** [2-3 sentences on this auditor's overall performance.] + +### 3.2 ... (same shape) + +## 4. Remediation plan — VALID and PARTIALLY VALID findings + +### P0 — Security +- **Files:** `path:line` +- **Problem:** [description with code evidence] +- **Remediation:** [specific approach] +- **Complexity:** [Low / Medium / High] +- **Detected by:** [auditor slug(s), or "Missed by all auditors" if you found it] + +### P1 — Integrity +[Same shape] + +### P2 — Consistency +[Same shape] + +### P3 — Robustness +[Same shape] + +### P4 — Documentation +[Same shape] + +## 5. Discarded findings — misattributions and false positives + +| Finding | Type | Charter / area | Auditor | +|---------|------|----------------|---------| + +## 6. Auditor ratings + +Score each auditor 1-10 across four criteria with weights: + +| Auditor | Scope precision (25%) | Technical depth (25%) | Bug detection (30%) | False positive rate (20%) | **Overall** | +|---------|:-:|:-:|:-:|:-:|:-:| +| | /10 | /10 | /10 | /10 | **/10** | +| ... + +### Justifications + +**/10**: [2-3 sentences on what this auditor did well and where they slipped.] + +## 7. Conclusion + +[State of the Charter — clean / partial / deviated. Critical findings count. +Key remediation items the operator should address before close. +Recommended next step.] ``` -The CLI validates all 3 outputs and appends `external_audit:` to the telemetry YAML directly. **Do NOT** edit the YAML by hand afterwards — `git diff` will show what changed; the operator reviews before commit. +### 6. Validate + emit external_audit YAML -If the CLI rejects the merge because `external_audit:` already exists (re-audit guard, v0 does not support re-merge), surface the message to the operator. Manual append of new findings is the v0 fallback for that case. +Run the CLI's merge step to validate all reports against the schema and emit the YAML block (combined with `--merge-into` if the operator's Charter telemetry already exists): -**Branch B — telemetry does NOT exist** (Charter not yet closed): +**Branch A — telemetry exists** (operator already ran `devtrail charter close` for this Charter, perhaps without audit, and is now adding audit findings retroactively): ```bash -devtrail charter audit --finalize > /tmp/external-audit-block.yaml -mkdir -p audit/charters/ -mv /tmp/external-audit-block.yaml audit/charters//external-audit-pending.yaml +devtrail charter audit --merge-reports \ + --merge-into .devtrail/charters/.telemetry.yaml ``` -Tell the operator: "The Charter is not yet closed. The findings are saved in `audit/charters//external-audit-pending.yaml`. When you run `devtrail charter close `, paste the `external_audit:` block from that file into the telemetry when prompted, or merge it manually after close completes." +The CLI appends the `external_audit:` array to the telemetry YAML. The CLI v1 deliberately rejects re-audit (file already has `external_audit:`) — if that fires, surface the message to the operator. Manual append is the fallback. -### 5. Print summary - -After step 4, print to the operator: +**Branch B — telemetry does NOT exist** (the typical case: operator audits BEFORE closing): +```bash +devtrail charter audit --merge-reports ``` -Audit review complete for . - Auditors reconciled: - - auditor-primary.md ( findings, ) - - auditor-secondary.md ( findings, ) - Calibrator: ( findings consolidated) +The YAML block prints to stdout. Capture it and write it to `.devtrail/audits//external-audit-pending.yaml` so the operator can paste it into the telemetry when running `devtrail charter close `. Tell the operator: "When you run `devtrail charter close `, paste the `external_audit:` block from `.devtrail/audits//external-audit-pending.yaml` when prompted." - external_audit YAML: - [merged into .devtrail/charters/.telemetry.yaml] - or - [pending in audit/charters//external-audit-pending.yaml] +### 7. Notify the operator with a summary - Run `git diff .devtrail/charters/.telemetry.yaml` to review. +``` +Consolidated audit review complete for . + + Auditors reconciled: + Findings reported: + Findings VALID: (%) + False positives: + Misattributions: + Missed by all (you): + + Remediation plan: + P0 (Security): + P1 (Integrity): + P2-P4 (Other): + + Auditor ratings: + : /10 + : /10 + ... + + Documents written: + .devtrail/audits//review.md + [either: .devtrail/charters/.telemetry.yaml (merged)] + [or: .devtrail/audits//external-audit-pending.yaml] + +Run `git diff` to review the changes before commit. ``` ## Notes -- This skill **does** invoke an LLM (the calibrator runs in the current conversation), unlike `/devtrail-audit-prompt` which is purely orchestration. The distinction is intentional: the calibrator is family-agnostic, so the agent driving the conversation is a valid calibrator. -- The skill is **idempotent for steps 1-3** (re-running regenerates the calibrator response if you delete `calibrator-reconciler.md`). It is **NOT idempotent for step 4 in Branch A** — once telemetry has `external_audit:`, the CLI rejects re-merge to prevent silent duplication. -- The auto-merge re-emits the YAML using the existing `charter_telemetry:` shape; cosmetic formatting of the merged file matches the close.rs output. Comments in the original telemetry YAML, if any, are preserved (the CLI uses string-level append, not full re-serialization). -- The `audit_notes:` field in the merged block points at the canonical paths `audit/charters//auditor-{primary,secondary}.md`. If you renamed those files, fix the field manually after merge. +- **The calibrator role is in-conversation, not in a paste-based prompt.** v0 had a `calibrator-reconciler.md` template that the operator pasted into a separate model. v1 eliminates that round-trip — the main agent (this conversation) has filesystem access and can verify findings against code, which is what makes the consolidated review substantive instead of mechanical. +- **Heterogeneity inter-family is for the auditor pair, not the calibrator.** The calibrator (this skill, running in the main agent) may be of any family — even the same family as one of the auditors — because its task is definitional (apply the schema to already-produced verdicts), not discovery. +- **The review.md is the human-readable delivery; the YAML block is the machine-readable input to telemetry.** Both coexist by design. The operator reads `review.md`; `devtrail metrics` and the Phase 2 telemetry aggregation read the YAML. +- **Re-running the skill** overwrites `review.md`. If the operator wants to keep the previous review (e.g., after a re-audit cycle), instruct them to copy it manually before re-running. +- **No HTTP API calls.** The skill runs the CLI's `--merge-reports` (which validates schemas and emits YAML), invokes Explore agents internally for code verification (which run in the main agent's context, not external APIs), and writes markdown. DevTrail does not invoke LLM APIs at any point. + +## Credit + +The six-section structure of `review.md` and the verdict vocabulary (VALID / PARTIALLY VALID / MISATTRIBUTED / FALSE POSITIVE / DUPLICATE) are lifted from the `audit-review/SKILL.md` skill mature pre-DevTrail in Sentinel, contributed via [issue #102](https://github.com/StrangeDaysTech/devtrail/issues/102) by José Villaseñor Montfort (StrangeDaysTech). The four-criterion weighted auditor rating (Scope precision 25% / Technical depth 25% / Bug detection 30% / False positive rate 20%) is the same. Project-specific hardcodes parameterized. diff --git a/dist/.claude/skills/devtrail-audit-prompt/SKILL.md b/dist/.claude/skills/devtrail-audit-prompt/SKILL.md index a3051c0..fcac07c 100644 --- a/dist/.claude/skills/devtrail-audit-prompt/SKILL.md +++ b/dist/.claude/skills/devtrail-audit-prompt/SKILL.md @@ -1,16 +1,16 @@ --- name: devtrail-audit-prompt -description: Prepare external multi-model audit for a Charter. Generates auditor prompts inline so the operator can paste them into 2 LLM auditors of different families. +description: Generate the unified audit prompt for a Charter at the canonical filesystem location. The operator then opens N auditor-side CLIs (gemini-cli, claude-cli, copilot-cli, etc.) and invokes /devtrail-audit-execute in each — no copy/paste. Counterpart of /devtrail-audit-review. allowed-tools: Read, Bash(devtrail charter audit *, devtrail charter status *, ls *) --- # DevTrail Audit Prompt Skill -Generate prompts for external multi-model audit of a Charter, surfaced inline in the conversation so the operator can paste them into auditors of two different model families without leaving the chat. +Generate the unified audit prompt for a Charter and write it to the canonical filesystem location. The operator then opens auditor-side CLIs in the same repo and invokes `/devtrail-audit-execute` — the audit prompt is read directly from disk by each auditor; the operator never copies/pastes. ## When to invoke -Use this skill when the developer agreed to run an external audit at the Charter checkpoint (see `.devtrail/00-governance/AGENT-RULES.md` § Audit checkpoint, available from `fw-4.8.0`). +Use this skill when the developer agreed to run an external audit at the Charter checkpoint (see `.devtrail/00-governance/AGENT-RULES.md` § Audit checkpoint). The Charter should be in `in-progress` or `declared` status — auditing closed Charters is allowed but atypical (warn the operator and proceed only on confirmation). @@ -18,7 +18,7 @@ The Charter should be in `in-progress` or `declared` status — auditing closed ### 1. Resolve the Charter -Argument: a Charter identifier (`CHARTER-04`, `04`, or the full id with slug). +Argument: a Charter identifier (`CHARTER-04`, `04`, or full id with slug). ```bash devtrail charter status @@ -26,67 +26,60 @@ devtrail charter status Verify the Charter exists and capture its `status`. If `status: closed`, surface a one-line warning to the operator and ask whether to proceed. -### 2. Generate the auditor prompts (PREPARE step) +### 2. Generate the unified audit prompt ```bash -devtrail charter audit +devtrail charter audit --prepare ``` -The CLI writes the resolved prompts to disk: +The CLI writes the resolved prompt to: -- `audit/charters//prompts/auditor-primary.prompt.md` -- `audit/charters//prompts/auditor-secondary.prompt.md` +``` +.devtrail/audits//audit-prompt.md +``` -This is `devtrail charter audit` step 1 (PREPARE). The CLI does NOT invoke any LLM — it only resolves placeholders against the Charter content, the git diff, and the originating AILOGs. +The prompt is self-contained: it embeds the Charter content, the originating AILOGs, the git diff over the resolved range (default `origin/main..HEAD`, falls back to `HEAD~1..HEAD` if no upstream is reachable), and the discipline rules (REGLA ABSOLUTA — SOLO LECTURA, evidence-citation, severity calibration). The prompt template lifts the seven universal sections from Sentinel's pre-DevTrail audit skill and parameterizes the project-specific hardcodes. -### 3. Surface the prompts inline +The CLI does NOT invoke any LLM. It only resolves placeholders. -Read both files and print their contents in the conversation, with clear separators: +### 3. Notify the operator -``` -═══════════════════ AUDITOR PRIMARY PROMPT ═══════════════════ -[full contents of auditor-primary.prompt.md] -══════════════════════════════════════════════════════════════ +After the CLI succeeds, print this guidance verbatim (substituting ``): -═══════════════════ AUDITOR SECONDARY PROMPT ═════════════════ -[full contents of auditor-secondary.prompt.md] -══════════════════════════════════════════════════════════════ ``` +Audit prompt prepared for . -Do not summarise or truncate. The operator needs the full prompts to paste into the external auditors as-is. - -### 4. Provide next-steps guidance - -After surfacing the prompts, print this guidance verbatim (substituting ``): + Location: .devtrail/audits//audit-prompt.md + (informational — you don't need to copy this path anywhere) -``` Next steps: - 1. Run AUDITOR PRIMARY PROMPT in a model of family A - (e.g., Anthropic — claude-sonnet-4-6 or claude-opus-4-7). + 1. Open one or more auditor-side CLIs (gemini-cli, claude-cli, + copilot-cli, codex-cli — whatever you have) in this repo. Each + CLI session uses its own model; recommendation is at least 2 + auditors of DIFFERENT model families, so cross-family blind + spots become signal when their findings converge. - 2. Run AUDITOR SECONDARY PROMPT in a model of family B - (e.g., Google — gemini-2.5-pro, or OpenAI — gpt-4o / Copilot). - DO NOT use the same family for both. Heterogeneity inter-family - is what makes convergent findings high-signal — same-family - auditors share blind spots. + 2. In each auditor CLI, invoke: - 3. Save the auditor responses to canonical paths: - audit/charters//auditor-primary.md - audit/charters//auditor-secondary.md + /devtrail-audit-execute - 4. Return with: /devtrail-audit-review - I will calibrate the responses, generate the calibrator analysis - locally, and merge the findings into the Charter telemetry. -``` + The skill is already installed (devtrail init copies it to all + three platform locations). It reads the prompt from disk + automatically — you do not need to copy/paste anything. -## Output schema for auditor responses + 3. When and only when ALL audits you commissioned have completed, + return to this main agent and run: -The `auditor-primary.md` and `auditor-secondary.md` files the operator saves must follow `audit-output.schema.v0.json` (validated by `devtrail charter audit --calibrate`). The required frontmatter is documented inside each prompt — the operator should preserve it when pasting the LLM response. + /devtrail-audit-review + + Reviewing with incomplete reports gives you a partial consolidated + analysis that you will have to discard or re-run. +``` ## Notes -- This skill is **orchestration-only**. It does NOT invoke LLM APIs, decide which models the operator uses, or wait for responses. It surfaces prompts and exits. -- Re-running the skill on the same Charter regenerates the prompts (idempotent at the prompt level). It does NOT overwrite operator-saved responses in `auditor-primary.md` / `auditor-secondary.md`. -- Heterogeneity inter-family is recommended but not enforced in v0. The operator decides the model pairing; the skill surfaces the recommendation in the next-steps guidance. -- For the rationale on why dual auditors of different families produce calibration that mono-auditor cannot, see `dist/.devtrail/audit-prompts/auditor-primary.md` § heterogeneity (or `Propuesta/devtrail-cli-roadmap.md` §5.2 in the upstream repo). +- This skill is **orchestration-only**. It does NOT invoke LLM APIs, decide which models the operator uses, or wait for responses. It runs the CLI's `--prepare`, points the operator at the next-step skill, and exits. +- Re-running the skill on the same Charter regenerates the prompt at the same path. It does NOT touch operator-saved reports under `.devtrail/audits//report-*.md`. +- Heterogeneity inter-family is recommended but not enforced in v0; the operator decides the model pairing across the N CLIs they open. +- For the rationale on cross-family auditors, see `dist/.devtrail/audit-prompts/audit-prompt.md` § Tu rol or `Propuesta/devtrail-cli-roadmap.md` §5.2 in the upstream repo. diff --git a/dist/.claude/skills/devtrail-audit-review/SKILL.md b/dist/.claude/skills/devtrail-audit-review/SKILL.md index 04587f7..65e4d23 100644 --- a/dist/.claude/skills/devtrail-audit-review/SKILL.md +++ b/dist/.claude/skills/devtrail-audit-review/SKILL.md @@ -1,109 +1,253 @@ --- name: devtrail-audit-review -description: Calibrate the responses of two external auditors and merge findings into the Charter telemetry. Counterpart to /devtrail-audit-prompt — invoke after the operator has saved the auditor responses. -allowed-tools: Read, Write, Bash(devtrail charter audit *, devtrail charter status *, ls *, git diff *) +description: Consolidate N external auditor reports into a critical review document with verdicts, remediation plan, and auditor ratings. Then merge the external_audit YAML block into the Charter telemetry. Counterpart of /devtrail-audit-prompt and /devtrail-audit-execute. +allowed-tools: Read, Write, Glob, Grep, Bash(devtrail charter audit *, devtrail charter status *, ls *, git diff *, git log *) --- # DevTrail Audit Review Skill -Reconcile the responses of two external auditors and produce the calibrator analysis. When the Charter has telemetry, append the consolidated `external_audit:` array directly into the YAML so the operator does not have to copy/paste. +Critically evaluate the N external auditor reports for a Charter, cross-reference each finding against the actual source code, and produce a consolidated `review.md` with verdicts, a prioritized remediation plan, and per-auditor ratings. Then merge the `external_audit` YAML block into the Charter telemetry. + +This is the third and final step of the v1 audit cycle, and it is where the substance lives — the calibrator role (definitional reconciliation across heterogeneous auditor verdicts) is now performed by the main agent inline, replacing the v0 paste-based calibrator-reconciler prompt. ## When to invoke -After running `/devtrail-audit-prompt `, having pasted both prompts into LLMs of different families, and saved the responses to: +After the operator has commissioned the audits in N auditor-side CLIs (each running `/devtrail-audit-execute `) and **all of them have completed**. The operator returns to the main IDE and invokes: -- `audit/charters//auditor-primary.md` -- `audit/charters//auditor-secondary.md` +``` +/devtrail-audit-review +``` -This skill produces the calibrator response and merges findings into telemetry. +If only some audits have completed, **do not proceed** — invoking the skill with incomplete reports produces a partial consolidated analysis. Verify with the operator that all the CLIs they opened have finished writing their reports under `.devtrail/audits//report-*.md`. ## Instructions -### 1. Verify auditor responses exist +### 1. Resolve the Charter and verify report set + +Argument: a Charter identifier. ```bash -ls audit/charters//auditor-primary.md \ - audit/charters//auditor-secondary.md +ls -la .devtrail/audits// ``` -If either file is missing, instruct the operator to run `/devtrail-audit-prompt ` first and exit. +Confirm: -### 2. Resolve the calibrator prompt (CALIBRATE step) +- `audit-prompt.md` exists (the prepared prompt the auditors read). +- `report-*.md` files exist (one per auditor that completed). At least 2 reports are recommended for cross-family heterogeneity to deliver signal; warn (do not block) if only 1 report is present. +- Match each `report-.md` filename against the model identifier in its frontmatter (`auditor:` field) to confirm consistency. Discrepancies are surfaced to the operator. -```bash -devtrail charter audit --calibrate -``` +If only the prompt is present and no reports exist, instruct the operator to run `/devtrail-audit-execute` in the auditor CLIs first. -The CLI: -- Validates `auditor-primary.md` and `auditor-secondary.md` against `audit-output.schema.v0.json`. If validation fails, surface the error to the operator and exit — the auditor response frontmatter must follow the schema documented inside each prompt. -- Writes `audit/charters//prompts/calibrator-reconciler.prompt.md`. +### 2. Read all auditor reports + extract finding master list -### 3. Run the calibrator inline (this conversation IS the calibrator) +For each `.devtrail/audits//report-*.md`: -The calibrator-reconciler prompt is designed to run in any model family — the agent currently in this conversation can produce the calibrator response directly. Heterogeneity inter-family is required for the auditor pair, NOT for the calibrator (the calibrator's task is definitional, not discovery — see `Propuesta/devtrail-cli-roadmap.md` §5.2 for the rationale). +- Validate the frontmatter against `audit-output.schema.v0.json` (the CLI does this in step 6, but a soft check here gives an earlier and more readable error if any report is malformed). +- Extract: model identifier, total findings, findings by category, evidence_citations count, audit_quality. +- Parse the body for finding entries (typically `## Findings` → `### F1 — title — category` blocks with `Where:`, `What I observed:`, `Why I'm flagging it:`). -Read the resolved prompt: +Build a **master finding list** — every unique claim across all auditors, deduplicated when two auditors clearly describe the same thing. -```bash -cat audit/charters//prompts/calibrator-reconciler.prompt.md -``` +### 3. Verify every finding against actual code -Produce the response **following the prompt's output schema exactly** — the frontmatter is required (`audit_role: calibrator-reconciler`, `calibrator: `, `auditors_reconciled`, `findings_consolidated`, `findings_by_status`). Save to: +This is the substantive step. For EACH finding in the master list: -- `audit/charters//calibrator-reconciler.md` +**Launch Explore agents in parallel** (up to 3 at a time) to verify findings. Group findings by the file they reference and send related findings to the same agent. -### 4. Finalize and merge into telemetry +For each finding, the verification answers four questions: -Determine the telemetry path: `.devtrail/charters/.telemetry.yaml` (canonical form, `` without slug). +1. **Does the code actually have this problem?** Read the cited `path:line`. Is the claim accurate? +2. **Is it in scope?** Does the finding affect a task or file declared in the Charter? If it's outside the Charter's scope, classify as `MISATTRIBUTED`. +3. **What's the real severity given the CURRENT configuration?** Check active driver, feature flags, build tags, DB role, deployment scope (the calibration discipline from the audit prompt). The Etapa 12 example in the prompt is a real case of inflation that careful calibration catches. +4. **Is it a duplicate?** Does another auditor report the same finding differently? -```bash -test -f .devtrail/charters/.telemetry.yaml -``` +#### Verdict classification -**Branch A — telemetry exists** (operator has already run `devtrail charter close`): +Each finding gets one of these verdicts: -```bash -devtrail charter audit --finalize \ - --merge-into .devtrail/charters/.telemetry.yaml +- **VALID** — Real problem, in scope, correctly described, correctly severized. +- **PARTIALLY VALID** — Real observation but wrong severity, missing nuance, or triggers only under a config not active in main. Include the reclassification. +- **MISATTRIBUTED** — Real observation but belongs to a different Charter or scope unit. +- **FALSE POSITIVE** — Claim is factually incorrect (file exists, code works as expected, the assumed driver isn't active, etc.). +- **DUPLICATE** — Same finding reported by another auditor (reference the original). + +#### Severity reclassification + +When the auditor's severity does not match the active configuration: + +- **Inflation** (auditor says Critical/High, code confirms only conditional): downgrade to Medium with explicit note on what config would activate it, OR move to "post-Charter / no bloqueante" if the trigger is a component not yet implemented. +- **Deflation** (auditor says Low/None, code shows a real trigger in current config): upgrade with evidence from the code path that activates it. +- **Correct inflation** (auditor says Critical and config confirms it): keep. + +An auditor that consistently inflates or deflates loses points in the auditor rating (step 5). + +### 4. Identify findings the auditors missed + +Based on your code exploration, check whether there are problems the auditors did NOT find. Focus on: + +- Security: SQL queries missing ownership filters, missing transaction boundaries, secrets in logs. +- Logic: input parameters ignored, dead code paths, unreachable branches. +- Consistency: naming mismatches between layers (model vs handler vs API). + +Mark these as "Missed by all auditors" in the remediation plan. + +### 5. Build the consolidated review.md + +Write the consolidated analysis to `.devtrail/audits//review.md` with this structure (six sections, lifted from Sentinel's pre-DevTrail audit-review skill): + +```markdown +--- +audit_role: calibrator-reconciler +calibrator: +charter_id: +git_range: "" +prompt_used: ../audit-prompt.md +calibrated_at: +auditors_reconciled: + - report-.md + - report-.md + - ... +findings_consolidated: +findings_by_status: + agreed: + disputed: + unique_: + unique_: + rejected: +--- + +# Consolidated audit review — + +**Reviewer:** +**Date:** +**Confidence:** [High | Medium] + +## 1. Executive summary + +[2-3 paragraphs. Total findings count, scope confusion if any, most critical bug, +overall verdict on the Charter's implementation.] + +## 2. Scope definition + +[Table of Charter tasks, the closing criterion, and what IS vs IS NOT in scope +of this Charter. The auditors' findings are evaluated against THIS scope.] + +## 3. Per-auditor evaluation + +### 3.1 (model: ) + +| # | Finding | Reported severity | Verdict | Justification | +|---|---------|-------------------|---------|---------------| + +**Summary:** [2-3 sentences on this auditor's overall performance.] + +### 3.2 ... (same shape) + +## 4. Remediation plan — VALID and PARTIALLY VALID findings + +### P0 — Security +- **Files:** `path:line` +- **Problem:** [description with code evidence] +- **Remediation:** [specific approach] +- **Complexity:** [Low / Medium / High] +- **Detected by:** [auditor slug(s), or "Missed by all auditors" if you found it] + +### P1 — Integrity +[Same shape] + +### P2 — Consistency +[Same shape] + +### P3 — Robustness +[Same shape] + +### P4 — Documentation +[Same shape] + +## 5. Discarded findings — misattributions and false positives + +| Finding | Type | Charter / area | Auditor | +|---------|------|----------------|---------| + +## 6. Auditor ratings + +Score each auditor 1-10 across four criteria with weights: + +| Auditor | Scope precision (25%) | Technical depth (25%) | Bug detection (30%) | False positive rate (20%) | **Overall** | +|---------|:-:|:-:|:-:|:-:|:-:| +| | /10 | /10 | /10 | /10 | **/10** | +| ... + +### Justifications + +**/10**: [2-3 sentences on what this auditor did well and where they slipped.] + +## 7. Conclusion + +[State of the Charter — clean / partial / deviated. Critical findings count. +Key remediation items the operator should address before close. +Recommended next step.] ``` -The CLI validates all 3 outputs and appends `external_audit:` to the telemetry YAML directly. **Do NOT** edit the YAML by hand afterwards — `git diff` will show what changed; the operator reviews before commit. +### 6. Validate + emit external_audit YAML -If the CLI rejects the merge because `external_audit:` already exists (re-audit guard, v0 does not support re-merge), surface the message to the operator. Manual append of new findings is the v0 fallback for that case. +Run the CLI's merge step to validate all reports against the schema and emit the YAML block (combined with `--merge-into` if the operator's Charter telemetry already exists): -**Branch B — telemetry does NOT exist** (Charter not yet closed): +**Branch A — telemetry exists** (operator already ran `devtrail charter close` for this Charter, perhaps without audit, and is now adding audit findings retroactively): ```bash -devtrail charter audit --finalize > /tmp/external-audit-block.yaml -mkdir -p audit/charters/ -mv /tmp/external-audit-block.yaml audit/charters//external-audit-pending.yaml +devtrail charter audit --merge-reports \ + --merge-into .devtrail/charters/.telemetry.yaml ``` -Tell the operator: "The Charter is not yet closed. The findings are saved in `audit/charters//external-audit-pending.yaml`. When you run `devtrail charter close `, paste the `external_audit:` block from that file into the telemetry when prompted, or merge it manually after close completes." +The CLI appends the `external_audit:` array to the telemetry YAML. The CLI v1 deliberately rejects re-audit (file already has `external_audit:`) — if that fires, surface the message to the operator. Manual append is the fallback. -### 5. Print summary - -After step 4, print to the operator: +**Branch B — telemetry does NOT exist** (the typical case: operator audits BEFORE closing): +```bash +devtrail charter audit --merge-reports ``` -Audit review complete for . - Auditors reconciled: - - auditor-primary.md ( findings, ) - - auditor-secondary.md ( findings, ) - Calibrator: ( findings consolidated) +The YAML block prints to stdout. Capture it and write it to `.devtrail/audits//external-audit-pending.yaml` so the operator can paste it into the telemetry when running `devtrail charter close `. Tell the operator: "When you run `devtrail charter close `, paste the `external_audit:` block from `.devtrail/audits//external-audit-pending.yaml` when prompted." - external_audit YAML: - [merged into .devtrail/charters/.telemetry.yaml] - or - [pending in audit/charters//external-audit-pending.yaml] +### 7. Notify the operator with a summary - Run `git diff .devtrail/charters/.telemetry.yaml` to review. +``` +Consolidated audit review complete for . + + Auditors reconciled: + Findings reported: + Findings VALID: (%) + False positives: + Misattributions: + Missed by all (you): + + Remediation plan: + P0 (Security): + P1 (Integrity): + P2-P4 (Other): + + Auditor ratings: + : /10 + : /10 + ... + + Documents written: + .devtrail/audits//review.md + [either: .devtrail/charters/.telemetry.yaml (merged)] + [or: .devtrail/audits//external-audit-pending.yaml] + +Run `git diff` to review the changes before commit. ``` ## Notes -- This skill **does** invoke an LLM (the calibrator runs in the current conversation), unlike `/devtrail-audit-prompt` which is purely orchestration. The distinction is intentional: the calibrator is family-agnostic, so the agent driving the conversation is a valid calibrator. -- The skill is **idempotent for steps 1-3** (re-running regenerates the calibrator response if you delete `calibrator-reconciler.md`). It is **NOT idempotent for step 4 in Branch A** — once telemetry has `external_audit:`, the CLI rejects re-merge to prevent silent duplication. -- The auto-merge re-emits the YAML using the existing `charter_telemetry:` shape; cosmetic formatting of the merged file matches the close.rs output. Comments in the original telemetry YAML, if any, are preserved (the CLI uses string-level append, not full re-serialization). -- The `audit_notes:` field in the merged block points at the canonical paths `audit/charters//auditor-{primary,secondary}.md`. If you renamed those files, fix the field manually after merge. +- **The calibrator role is in-conversation, not in a paste-based prompt.** v0 had a `calibrator-reconciler.md` template that the operator pasted into a separate model. v1 eliminates that round-trip — the main agent (this conversation) has filesystem access and can verify findings against code, which is what makes the consolidated review substantive instead of mechanical. +- **Heterogeneity inter-family is for the auditor pair, not the calibrator.** The calibrator (this skill, running in the main agent) may be of any family — even the same family as one of the auditors — because its task is definitional (apply the schema to already-produced verdicts), not discovery. +- **The review.md is the human-readable delivery; the YAML block is the machine-readable input to telemetry.** Both coexist by design. The operator reads `review.md`; `devtrail metrics` and the Phase 2 telemetry aggregation read the YAML. +- **Re-running the skill** overwrites `review.md`. If the operator wants to keep the previous review (e.g., after a re-audit cycle), instruct them to copy it manually before re-running. +- **No HTTP API calls.** The skill runs the CLI's `--merge-reports` (which validates schemas and emits YAML), invokes Explore agents internally for code verification (which run in the main agent's context, not external APIs), and writes markdown. DevTrail does not invoke LLM APIs at any point. + +## Credit + +The six-section structure of `review.md` and the verdict vocabulary (VALID / PARTIALLY VALID / MISATTRIBUTED / FALSE POSITIVE / DUPLICATE) are lifted from the `audit-review/SKILL.md` skill mature pre-DevTrail in Sentinel, contributed via [issue #102](https://github.com/StrangeDaysTech/devtrail/issues/102) by José Villaseñor Montfort (StrangeDaysTech). The four-criterion weighted auditor rating (Scope precision 25% / Technical depth 25% / Bug detection 30% / False positive rate 20%) is the same. Project-specific hardcodes parameterized. diff --git a/dist/.gemini/skills/devtrail-audit-prompt/SKILL.md b/dist/.gemini/skills/devtrail-audit-prompt/SKILL.md index b4ade68..2d68155 100644 --- a/dist/.gemini/skills/devtrail-audit-prompt/SKILL.md +++ b/dist/.gemini/skills/devtrail-audit-prompt/SKILL.md @@ -1,15 +1,15 @@ --- name: devtrail-audit-prompt -description: Prepare external multi-model audit for a Charter. Generates auditor prompts inline so the operator can paste them into 2 LLM auditors of different families. +description: Generate the unified audit prompt for a Charter at the canonical filesystem location. The operator then opens N auditor-side CLIs (gemini-cli, claude-cli, copilot-cli, etc.) and invokes /devtrail-audit-execute in each — no copy/paste. Counterpart of /devtrail-audit-review. --- # DevTrail Audit Prompt Skill -Generate prompts for external multi-model audit of a Charter, surfaced inline in the conversation so the operator can paste them into auditors of two different model families without leaving the chat. +Generate the unified audit prompt for a Charter and write it to the canonical filesystem location. The operator then opens auditor-side CLIs in the same repo and invokes `/devtrail-audit-execute` — the audit prompt is read directly from disk by each auditor; the operator never copies/pastes. ## When to invoke -Use this skill when the developer agreed to run an external audit at the Charter checkpoint (see `.devtrail/00-governance/AGENT-RULES.md` § Audit checkpoint, available from `fw-4.8.0`). +Use this skill when the developer agreed to run an external audit at the Charter checkpoint (see `.devtrail/00-governance/AGENT-RULES.md` § Audit checkpoint). The Charter should be in `in-progress` or `declared` status — auditing closed Charters is allowed but atypical (warn the operator and proceed only on confirmation). @@ -17,7 +17,7 @@ The Charter should be in `in-progress` or `declared` status — auditing closed ### 1. Resolve the Charter -Argument: a Charter identifier (`CHARTER-04`, `04`, or the full id with slug). +Argument: a Charter identifier (`CHARTER-04`, `04`, or full id with slug). ```bash devtrail charter status @@ -25,67 +25,60 @@ devtrail charter status Verify the Charter exists and capture its `status`. If `status: closed`, surface a one-line warning to the operator and ask whether to proceed. -### 2. Generate the auditor prompts (PREPARE step) +### 2. Generate the unified audit prompt ```bash -devtrail charter audit +devtrail charter audit --prepare ``` -The CLI writes the resolved prompts to disk: +The CLI writes the resolved prompt to: -- `audit/charters//prompts/auditor-primary.prompt.md` -- `audit/charters//prompts/auditor-secondary.prompt.md` +``` +.devtrail/audits//audit-prompt.md +``` -This is `devtrail charter audit` step 1 (PREPARE). The CLI does NOT invoke any LLM — it only resolves placeholders against the Charter content, the git diff, and the originating AILOGs. +The prompt is self-contained: it embeds the Charter content, the originating AILOGs, the git diff over the resolved range (default `origin/main..HEAD`, falls back to `HEAD~1..HEAD` if no upstream is reachable), and the discipline rules (REGLA ABSOLUTA — SOLO LECTURA, evidence-citation, severity calibration). The prompt template lifts the seven universal sections from Sentinel's pre-DevTrail audit skill and parameterizes the project-specific hardcodes. -### 3. Surface the prompts inline +The CLI does NOT invoke any LLM. It only resolves placeholders. -Read both files and print their contents in the conversation, with clear separators: +### 3. Notify the operator -``` -═══════════════════ AUDITOR PRIMARY PROMPT ═══════════════════ -[full contents of auditor-primary.prompt.md] -══════════════════════════════════════════════════════════════ +After the CLI succeeds, print this guidance verbatim (substituting ``): -═══════════════════ AUDITOR SECONDARY PROMPT ═════════════════ -[full contents of auditor-secondary.prompt.md] -══════════════════════════════════════════════════════════════ ``` +Audit prompt prepared for . -Do not summarise or truncate. The operator needs the full prompts to paste into the external auditors as-is. - -### 4. Provide next-steps guidance - -After surfacing the prompts, print this guidance verbatim (substituting ``): + Location: .devtrail/audits//audit-prompt.md + (informational — you don't need to copy this path anywhere) -``` Next steps: - 1. Run AUDITOR PRIMARY PROMPT in a model of family A - (e.g., Anthropic — claude-sonnet-4-6 or claude-opus-4-7). + 1. Open one or more auditor-side CLIs (gemini-cli, claude-cli, + copilot-cli, codex-cli — whatever you have) in this repo. Each + CLI session uses its own model; recommendation is at least 2 + auditors of DIFFERENT model families, so cross-family blind + spots become signal when their findings converge. - 2. Run AUDITOR SECONDARY PROMPT in a model of family B - (e.g., Google — gemini-2.5-pro, or OpenAI — gpt-4o / Copilot). - DO NOT use the same family for both. Heterogeneity inter-family - is what makes convergent findings high-signal — same-family - auditors share blind spots. + 2. In each auditor CLI, invoke: - 3. Save the auditor responses to canonical paths: - audit/charters//auditor-primary.md - audit/charters//auditor-secondary.md + /devtrail-audit-execute - 4. Return with: /devtrail-audit-review - I will calibrate the responses, generate the calibrator analysis - locally, and merge the findings into the Charter telemetry. -``` + The skill is already installed (devtrail init copies it to all + three platform locations). It reads the prompt from disk + automatically — you do not need to copy/paste anything. -## Output schema for auditor responses + 3. When and only when ALL audits you commissioned have completed, + return to this main agent and run: -The `auditor-primary.md` and `auditor-secondary.md` files the operator saves must follow `audit-output.schema.v0.json` (validated by `devtrail charter audit --calibrate`). The required frontmatter is documented inside each prompt — the operator should preserve it when pasting the LLM response. + /devtrail-audit-review + + Reviewing with incomplete reports gives you a partial consolidated + analysis that you will have to discard or re-run. +``` ## Notes -- This skill is **orchestration-only**. It does NOT invoke LLM APIs, decide which models the operator uses, or wait for responses. It surfaces prompts and exits. -- Re-running the skill on the same Charter regenerates the prompts (idempotent at the prompt level). It does NOT overwrite operator-saved responses in `auditor-primary.md` / `auditor-secondary.md`. -- Heterogeneity inter-family is recommended but not enforced in v0. The operator decides the model pairing; the skill surfaces the recommendation in the next-steps guidance. -- For the rationale on why dual auditors of different families produce calibration that mono-auditor cannot, see `dist/.devtrail/audit-prompts/auditor-primary.md` § heterogeneity (or `Propuesta/devtrail-cli-roadmap.md` §5.2 in the upstream repo). +- This skill is **orchestration-only**. It does NOT invoke LLM APIs, decide which models the operator uses, or wait for responses. It runs the CLI's `--prepare`, points the operator at the next-step skill, and exits. +- Re-running the skill on the same Charter regenerates the prompt at the same path. It does NOT touch operator-saved reports under `.devtrail/audits//report-*.md`. +- Heterogeneity inter-family is recommended but not enforced in v0; the operator decides the model pairing across the N CLIs they open. +- For the rationale on cross-family auditors, see `dist/.devtrail/audit-prompts/audit-prompt.md` § Tu rol or `Propuesta/devtrail-cli-roadmap.md` §5.2 in the upstream repo. diff --git a/dist/.gemini/skills/devtrail-audit-review/SKILL.md b/dist/.gemini/skills/devtrail-audit-review/SKILL.md index 4b547e8..fffc20d 100644 --- a/dist/.gemini/skills/devtrail-audit-review/SKILL.md +++ b/dist/.gemini/skills/devtrail-audit-review/SKILL.md @@ -1,108 +1,252 @@ --- name: devtrail-audit-review -description: Calibrate the responses of two external auditors and merge findings into the Charter telemetry. Counterpart to /devtrail-audit-prompt — invoke after the operator has saved the auditor responses. +description: Consolidate N external auditor reports into a critical review document with verdicts, remediation plan, and auditor ratings. Then merge the external_audit YAML block into the Charter telemetry. Counterpart of /devtrail-audit-prompt and /devtrail-audit-execute. --- # DevTrail Audit Review Skill -Reconcile the responses of two external auditors and produce the calibrator analysis. When the Charter has telemetry, append the consolidated `external_audit:` array directly into the YAML so the operator does not have to copy/paste. +Critically evaluate the N external auditor reports for a Charter, cross-reference each finding against the actual source code, and produce a consolidated `review.md` with verdicts, a prioritized remediation plan, and per-auditor ratings. Then merge the `external_audit` YAML block into the Charter telemetry. + +This is the third and final step of the v1 audit cycle, and it is where the substance lives — the calibrator role (definitional reconciliation across heterogeneous auditor verdicts) is now performed by the main agent inline, replacing the v0 paste-based calibrator-reconciler prompt. ## When to invoke -After running `/devtrail-audit-prompt `, having pasted both prompts into LLMs of different families, and saved the responses to: +After the operator has commissioned the audits in N auditor-side CLIs (each running `/devtrail-audit-execute `) and **all of them have completed**. The operator returns to the main IDE and invokes: -- `audit/charters//auditor-primary.md` -- `audit/charters//auditor-secondary.md` +``` +/devtrail-audit-review +``` -This skill produces the calibrator response and merges findings into telemetry. +If only some audits have completed, **do not proceed** — invoking the skill with incomplete reports produces a partial consolidated analysis. Verify with the operator that all the CLIs they opened have finished writing their reports under `.devtrail/audits//report-*.md`. ## Instructions -### 1. Verify auditor responses exist +### 1. Resolve the Charter and verify report set + +Argument: a Charter identifier. ```bash -ls audit/charters//auditor-primary.md \ - audit/charters//auditor-secondary.md +ls -la .devtrail/audits// ``` -If either file is missing, instruct the operator to run `/devtrail-audit-prompt ` first and exit. +Confirm: -### 2. Resolve the calibrator prompt (CALIBRATE step) +- `audit-prompt.md` exists (the prepared prompt the auditors read). +- `report-*.md` files exist (one per auditor that completed). At least 2 reports are recommended for cross-family heterogeneity to deliver signal; warn (do not block) if only 1 report is present. +- Match each `report-.md` filename against the model identifier in its frontmatter (`auditor:` field) to confirm consistency. Discrepancies are surfaced to the operator. -```bash -devtrail charter audit --calibrate -``` +If only the prompt is present and no reports exist, instruct the operator to run `/devtrail-audit-execute` in the auditor CLIs first. -The CLI: -- Validates `auditor-primary.md` and `auditor-secondary.md` against `audit-output.schema.v0.json`. If validation fails, surface the error to the operator and exit — the auditor response frontmatter must follow the schema documented inside each prompt. -- Writes `audit/charters//prompts/calibrator-reconciler.prompt.md`. +### 2. Read all auditor reports + extract finding master list -### 3. Run the calibrator inline (this conversation IS the calibrator) +For each `.devtrail/audits//report-*.md`: -The calibrator-reconciler prompt is designed to run in any model family — the agent currently in this conversation can produce the calibrator response directly. Heterogeneity inter-family is required for the auditor pair, NOT for the calibrator (the calibrator's task is definitional, not discovery — see `Propuesta/devtrail-cli-roadmap.md` §5.2 for the rationale). +- Validate the frontmatter against `audit-output.schema.v0.json` (the CLI does this in step 6, but a soft check here gives an earlier and more readable error if any report is malformed). +- Extract: model identifier, total findings, findings by category, evidence_citations count, audit_quality. +- Parse the body for finding entries (typically `## Findings` → `### F1 — title — category` blocks with `Where:`, `What I observed:`, `Why I'm flagging it:`). -Read the resolved prompt: +Build a **master finding list** — every unique claim across all auditors, deduplicated when two auditors clearly describe the same thing. -```bash -cat audit/charters//prompts/calibrator-reconciler.prompt.md -``` +### 3. Verify every finding against actual code -Produce the response **following the prompt's output schema exactly** — the frontmatter is required (`audit_role: calibrator-reconciler`, `calibrator: `, `auditors_reconciled`, `findings_consolidated`, `findings_by_status`). Save to: +This is the substantive step. For EACH finding in the master list: -- `audit/charters//calibrator-reconciler.md` +**Launch Explore agents in parallel** (up to 3 at a time) to verify findings. Group findings by the file they reference and send related findings to the same agent. -### 4. Finalize and merge into telemetry +For each finding, the verification answers four questions: -Determine the telemetry path: `.devtrail/charters/.telemetry.yaml` (canonical form, `` without slug). +1. **Does the code actually have this problem?** Read the cited `path:line`. Is the claim accurate? +2. **Is it in scope?** Does the finding affect a task or file declared in the Charter? If it's outside the Charter's scope, classify as `MISATTRIBUTED`. +3. **What's the real severity given the CURRENT configuration?** Check active driver, feature flags, build tags, DB role, deployment scope (the calibration discipline from the audit prompt). The Etapa 12 example in the prompt is a real case of inflation that careful calibration catches. +4. **Is it a duplicate?** Does another auditor report the same finding differently? -```bash -test -f .devtrail/charters/.telemetry.yaml -``` +#### Verdict classification -**Branch A — telemetry exists** (operator has already run `devtrail charter close`): +Each finding gets one of these verdicts: -```bash -devtrail charter audit --finalize \ - --merge-into .devtrail/charters/.telemetry.yaml +- **VALID** — Real problem, in scope, correctly described, correctly severized. +- **PARTIALLY VALID** — Real observation but wrong severity, missing nuance, or triggers only under a config not active in main. Include the reclassification. +- **MISATTRIBUTED** — Real observation but belongs to a different Charter or scope unit. +- **FALSE POSITIVE** — Claim is factually incorrect (file exists, code works as expected, the assumed driver isn't active, etc.). +- **DUPLICATE** — Same finding reported by another auditor (reference the original). + +#### Severity reclassification + +When the auditor's severity does not match the active configuration: + +- **Inflation** (auditor says Critical/High, code confirms only conditional): downgrade to Medium with explicit note on what config would activate it, OR move to "post-Charter / no bloqueante" if the trigger is a component not yet implemented. +- **Deflation** (auditor says Low/None, code shows a real trigger in current config): upgrade with evidence from the code path that activates it. +- **Correct inflation** (auditor says Critical and config confirms it): keep. + +An auditor that consistently inflates or deflates loses points in the auditor rating (step 5). + +### 4. Identify findings the auditors missed + +Based on your code exploration, check whether there are problems the auditors did NOT find. Focus on: + +- Security: SQL queries missing ownership filters, missing transaction boundaries, secrets in logs. +- Logic: input parameters ignored, dead code paths, unreachable branches. +- Consistency: naming mismatches between layers (model vs handler vs API). + +Mark these as "Missed by all auditors" in the remediation plan. + +### 5. Build the consolidated review.md + +Write the consolidated analysis to `.devtrail/audits//review.md` with this structure (six sections, lifted from Sentinel's pre-DevTrail audit-review skill): + +```markdown +--- +audit_role: calibrator-reconciler +calibrator: +charter_id: +git_range: "" +prompt_used: ../audit-prompt.md +calibrated_at: +auditors_reconciled: + - report-.md + - report-.md + - ... +findings_consolidated: +findings_by_status: + agreed: + disputed: + unique_: + unique_: + rejected: +--- + +# Consolidated audit review — + +**Reviewer:** +**Date:** +**Confidence:** [High | Medium] + +## 1. Executive summary + +[2-3 paragraphs. Total findings count, scope confusion if any, most critical bug, +overall verdict on the Charter's implementation.] + +## 2. Scope definition + +[Table of Charter tasks, the closing criterion, and what IS vs IS NOT in scope +of this Charter. The auditors' findings are evaluated against THIS scope.] + +## 3. Per-auditor evaluation + +### 3.1 (model: ) + +| # | Finding | Reported severity | Verdict | Justification | +|---|---------|-------------------|---------|---------------| + +**Summary:** [2-3 sentences on this auditor's overall performance.] + +### 3.2 ... (same shape) + +## 4. Remediation plan — VALID and PARTIALLY VALID findings + +### P0 — Security +- **Files:** `path:line` +- **Problem:** [description with code evidence] +- **Remediation:** [specific approach] +- **Complexity:** [Low / Medium / High] +- **Detected by:** [auditor slug(s), or "Missed by all auditors" if you found it] + +### P1 — Integrity +[Same shape] + +### P2 — Consistency +[Same shape] + +### P3 — Robustness +[Same shape] + +### P4 — Documentation +[Same shape] + +## 5. Discarded findings — misattributions and false positives + +| Finding | Type | Charter / area | Auditor | +|---------|------|----------------|---------| + +## 6. Auditor ratings + +Score each auditor 1-10 across four criteria with weights: + +| Auditor | Scope precision (25%) | Technical depth (25%) | Bug detection (30%) | False positive rate (20%) | **Overall** | +|---------|:-:|:-:|:-:|:-:|:-:| +| | /10 | /10 | /10 | /10 | **/10** | +| ... + +### Justifications + +**/10**: [2-3 sentences on what this auditor did well and where they slipped.] + +## 7. Conclusion + +[State of the Charter — clean / partial / deviated. Critical findings count. +Key remediation items the operator should address before close. +Recommended next step.] ``` -The CLI validates all 3 outputs and appends `external_audit:` to the telemetry YAML directly. **Do NOT** edit the YAML by hand afterwards — `git diff` will show what changed; the operator reviews before commit. +### 6. Validate + emit external_audit YAML -If the CLI rejects the merge because `external_audit:` already exists (re-audit guard, v0 does not support re-merge), surface the message to the operator. Manual append of new findings is the v0 fallback for that case. +Run the CLI's merge step to validate all reports against the schema and emit the YAML block (combined with `--merge-into` if the operator's Charter telemetry already exists): -**Branch B — telemetry does NOT exist** (Charter not yet closed): +**Branch A — telemetry exists** (operator already ran `devtrail charter close` for this Charter, perhaps without audit, and is now adding audit findings retroactively): ```bash -devtrail charter audit --finalize > /tmp/external-audit-block.yaml -mkdir -p audit/charters/ -mv /tmp/external-audit-block.yaml audit/charters//external-audit-pending.yaml +devtrail charter audit --merge-reports \ + --merge-into .devtrail/charters/.telemetry.yaml ``` -Tell the operator: "The Charter is not yet closed. The findings are saved in `audit/charters//external-audit-pending.yaml`. When you run `devtrail charter close `, paste the `external_audit:` block from that file into the telemetry when prompted, or merge it manually after close completes." +The CLI appends the `external_audit:` array to the telemetry YAML. The CLI v1 deliberately rejects re-audit (file already has `external_audit:`) — if that fires, surface the message to the operator. Manual append is the fallback. -### 5. Print summary - -After step 4, print to the operator: +**Branch B — telemetry does NOT exist** (the typical case: operator audits BEFORE closing): +```bash +devtrail charter audit --merge-reports ``` -Audit review complete for . - Auditors reconciled: - - auditor-primary.md ( findings, ) - - auditor-secondary.md ( findings, ) - Calibrator: ( findings consolidated) +The YAML block prints to stdout. Capture it and write it to `.devtrail/audits//external-audit-pending.yaml` so the operator can paste it into the telemetry when running `devtrail charter close `. Tell the operator: "When you run `devtrail charter close `, paste the `external_audit:` block from `.devtrail/audits//external-audit-pending.yaml` when prompted." - external_audit YAML: - [merged into .devtrail/charters/.telemetry.yaml] - or - [pending in audit/charters//external-audit-pending.yaml] +### 7. Notify the operator with a summary - Run `git diff .devtrail/charters/.telemetry.yaml` to review. +``` +Consolidated audit review complete for . + + Auditors reconciled: + Findings reported: + Findings VALID: (%) + False positives: + Misattributions: + Missed by all (you): + + Remediation plan: + P0 (Security): + P1 (Integrity): + P2-P4 (Other): + + Auditor ratings: + : /10 + : /10 + ... + + Documents written: + .devtrail/audits//review.md + [either: .devtrail/charters/.telemetry.yaml (merged)] + [or: .devtrail/audits//external-audit-pending.yaml] + +Run `git diff` to review the changes before commit. ``` ## Notes -- This skill **does** invoke an LLM (the calibrator runs in the current conversation), unlike `/devtrail-audit-prompt` which is purely orchestration. The distinction is intentional: the calibrator is family-agnostic, so the agent driving the conversation is a valid calibrator. -- The skill is **idempotent for steps 1-3** (re-running regenerates the calibrator response if you delete `calibrator-reconciler.md`). It is **NOT idempotent for step 4 in Branch A** — once telemetry has `external_audit:`, the CLI rejects re-merge to prevent silent duplication. -- The auto-merge re-emits the YAML using the existing `charter_telemetry:` shape; cosmetic formatting of the merged file matches the close.rs output. Comments in the original telemetry YAML, if any, are preserved (the CLI uses string-level append, not full re-serialization). -- The `audit_notes:` field in the merged block points at the canonical paths `audit/charters//auditor-{primary,secondary}.md`. If you renamed those files, fix the field manually after merge. +- **The calibrator role is in-conversation, not in a paste-based prompt.** v0 had a `calibrator-reconciler.md` template that the operator pasted into a separate model. v1 eliminates that round-trip — the main agent (this conversation) has filesystem access and can verify findings against code, which is what makes the consolidated review substantive instead of mechanical. +- **Heterogeneity inter-family is for the auditor pair, not the calibrator.** The calibrator (this skill, running in the main agent) may be of any family — even the same family as one of the auditors — because its task is definitional (apply the schema to already-produced verdicts), not discovery. +- **The review.md is the human-readable delivery; the YAML block is the machine-readable input to telemetry.** Both coexist by design. The operator reads `review.md`; `devtrail metrics` and the Phase 2 telemetry aggregation read the YAML. +- **Re-running the skill** overwrites `review.md`. If the operator wants to keep the previous review (e.g., after a re-audit cycle), instruct them to copy it manually before re-running. +- **No HTTP API calls.** The skill runs the CLI's `--merge-reports` (which validates schemas and emits YAML), invokes Explore agents internally for code verification (which run in the main agent's context, not external APIs), and writes markdown. DevTrail does not invoke LLM APIs at any point. + +## Credit + +The six-section structure of `review.md` and the verdict vocabulary (VALID / PARTIALLY VALID / MISATTRIBUTED / FALSE POSITIVE / DUPLICATE) are lifted from the `audit-review/SKILL.md` skill mature pre-DevTrail in Sentinel, contributed via [issue #102](https://github.com/StrangeDaysTech/devtrail/issues/102) by José Villaseñor Montfort (StrangeDaysTech). The four-criterion weighted auditor rating (Scope precision 25% / Technical depth 25% / Bug detection 30% / False positive rate 20%) is the same. Project-specific hardcodes parameterized.