audit-skills v0: substantive-depth limitations in primary-adopter cycle (Sentinel CHARTER-07) + prompt resolver HTML-comment bug #102

@montfort

Context

This issue consolidates feedback from the first primary-adopter run of the audit-skills flow shipped in fw-4.8.0 / cli-3.9.0 (3 May). Sentinel is executing CommsHub Etapa 2 as 9 Charters per the meta-plan; CHARTER-07 (foundation: setup + ports + 7 migrations + SQLC + scaffolding + Wire stub) just completed its checkpoint-driven audit cycle — the first such cycle in any adopter project.

This is exactly the data point Etapa 2 telemetry was set up to surface. We chose to file this before the consolidated snapshot at the end of Etapa 2 because the issues identified are blockers for substantive audit value and merit upstream iteration sooner, not later. Sentinel has paused CHARTER-07 close (Charter status: in-progress) so the audit can be re-run after upstream fixes, instead of polluting the telemetry with a structurally-limited cycle.

There are two distinct findings (R10 mechanical bug, R11 substantive-depth limitations) plus a forward-looking proposal that builds on the methodology Sentinel had pre-DevTrail. All evidence lives in the Sentinel repo paths cited at the bottom of this issue and is reproducible.


R10 — Prompt resolver duplicates content by ignoring HTML comment boundaries

Symptom

After devtrail charter audit CHARTER-07 (Step 1/3 PREPARE), audit/charters/CHARTER-07/prompts/auditor-primary.prompt.md was 1300 lines with the Charter content + AILOG + diff appearing twice — once inside the template's leading HTML comment block (lines 1-605), and again in the prompt body proper (lines 607+). The operator detected the duplication visually before pasting into the auditor LLM.

Root cause

.devtrail/audit-prompts/auditor-primary.md (the template) has an HTML comment header (lines 1-32) that documents available placeholders. Lines 21-31 list each placeholder with a description using literal {{placeholder}} — description syntax:

Placeholders supported by `devtrail charter audit`:
  {{charter_id}}        — e.g., CHARTER-05
  {{charter_content}}   — full body of the Charter doc
  {{git_diff}}          — output of `git diff <git_range>`
  ...

The CLI resolver performs a global string replacement of each {{placeholder}} with its resolved content, without checking whether the placeholder sits inside or outside an <!-- ... --> block. As a result, the documentation-only lines (e.g., {{charter_content}} — full body of the Charter doc) expand to (entire Charter body) — full body of the Charter doc. This inflates the comment block from 32 to 605 lines and duplicates every payload (Charter + AILOG + diff) that already appears in its proper place outside the comment.
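The mechanism can be reproduced in a few lines. This is a minimal sketch, not the actual DevTrail resolver: the function name `naive_resolve` and the toy template are assumptions made for illustration. Only the placeholder names and the global-replace behavior come from the description above.

```python
# Hypothetical reproduction of the R10 duplication; not the real DevTrail resolver.

TEMPLATE = (
    "<!--\n"
    "Placeholders supported by `devtrail charter audit`:\n"
    "  {{charter_id}}        \u2014 e.g., CHARTER-05\n"
    "  {{charter_content}}   \u2014 full body of the Charter doc\n"
    "-->\n"
    "# Audit of {{charter_id}}\n"
    "\n"
    "{{charter_content}}\n"
)


def naive_resolve(template: str, values: dict) -> str:
    """Replace every {{name}} occurrence, inside or outside HTML comments."""
    resolved = template
    for name, value in values.items():
        resolved = resolved.replace("{{" + name + "}}", value)
    return resolved


values = {
    "charter_id": "CHARTER-07",
    "charter_content": "line one of the Charter\nline two of the Charter",
}
prompt = naive_resolve(TEMPLATE, values)

# The payload now appears twice: once in the comment header, once in the body.
duplicates = prompt.count("line one of the Charter")
print(duplicates)  # 2
```

With a real Charter body, AILOG, and diff as values, the same effect would account for the comment header growing by hundreds of lines.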

Scope

Only affects auditor-primary.md template. The auditor-secondary.md template has a short comment (16 lines) without {{placeholders}} — no duplication there.

Operational impact

  • ~30k tokens of duplicated payload sent to the auditor LLM (cost overhead in gpt-4o / gemini chat).
  • Risk that the auditor interprets the duplication as a delta between two versions of the Charter, degrading finding quality.
  • Documentation block becomes noise (no longer useful as placeholder reference).

Workaround applied locally

cd audit/charters/CHARTER-07/prompts/
tail -n +607 auditor-primary.prompt.md > tmp && mv tmp auditor-primary.prompt.md

Reduces primary from 1300 → 694 lines (comparable with secondary at 704). Workaround is non-destructive (touches only the resolved prompt, not the template) and reversible (re-running devtrail charter audit regenerates the buggy version).
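The `tail -n +607` form depends on a line number that changes with every Charter. A variant that does not hard-code the offset could look like the sketch below. The helper name `strip_leading_comment` is hypothetical, and the approach assumes that the resolved prompt starts with the comment block and that the payload injected into that block contains no `-->` of its own. If the diff or Charter body does contain `-->`, the cut would land too early, and the verified line-number approach is the safer one.

```python
# Hypothetical helper: drop a leading HTML comment from a resolved prompt.
# Assumes the file starts with "<!--" and the injected payload contains no "-->".

def strip_leading_comment(text: str) -> str:
    """Return text without its leading <!-- ... --> block, if there is one."""
    if not text.lstrip().startswith("<!--"):
        return text  # nothing to strip
    end = text.find("-->")
    if end == -1:
        return text  # unterminated comment: leave the content untouched
    return text[end + len("-->"):].lstrip("\n")


resolved = "<!--\nplaceholder docs, inflated by the bug\n-->\n# Audit prompt\nbody\n"
cleaned = strip_leading_comment(resolved)
print(cleaned)
```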

Sentinel commit applying the workaround + AILOG R10 documentation: see references at bottom.

Proposed fix

Two complementary options for the resolver:

  1. Respect HTML comment boundaries: skip placeholder replacement inside <!-- ... --> ranges. Conservative; preserves comment block as documentation.
  2. Detect documentation-only lines: if a placeholder appears as {{placeholder}} — description (indented, followed by an em-dash and a description), skip the replacement on that specific line. Less conservative, but it doesn't require markdown comment parsing.

Either fix stops the template's documentation header from acting as a content duplicator.
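Both options can be sketched briefly. The code below is an illustration under assumptions, not the DevTrail implementation: the function names and regular expressions are hypothetical, and the option 2 heuristic assumes documentation lines are indented and use an em-dash (written as `\u2014`) between the placeholder and its description, as in the template excerpt quoted earlier.

```python
import re

# Hypothetical resolver sketches for the two proposed options; names are illustrative.

COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
# Option 2 heuristic: indented "{{name}}", whitespace, an em-dash, then a description.
DOC_LINE_RE = re.compile(r"^\s+\{\{\w+\}\}\s+\u2014\s")
PLACEHOLDER_RE = re.compile(r"\{\{(\w+)\}\}")


def substitute(text: str, values: dict) -> str:
    """Replace known {{name}} placeholders in one pass; leave unknown ones as-is."""
    return PLACEHOLDER_RE.sub(lambda m: values.get(m.group(1), m.group(0)), text)


def resolve_skipping_comments(template: str, values: dict) -> str:
    """Option 1: substitute only in the text between HTML comments."""
    out, pos = [], 0
    for match in COMMENT_RE.finditer(template):
        out.append(substitute(template[pos:match.start()], values))
        out.append(match.group(0))  # comment copied verbatim
        pos = match.end()
    out.append(substitute(template[pos:], values))
    return "".join(out)


def resolve_skipping_doc_lines(template: str, values: dict) -> str:
    """Option 2: substitute line by line, skipping documentation-style lines."""
    return "".join(
        line if DOC_LINE_RE.match(line) else substitute(line, values)
        for line in template.splitlines(keepends=True)
    )


template = (
    "<!--\n"
    "  {{charter_content}}   \u2014 full body of the Charter doc\n"
    "-->\n"
    "{{charter_content}}\n"
)
values = {"charter_content": "BODY"}
option_1 = resolve_skipping_comments(template, values)
option_2 = resolve_skipping_doc_lines(template, values)
print(option_1.count("BODY"), option_2.count("BODY"))  # 1 1
```

A side effect of substituting in a single regex pass is that placeholder-like text inside an injected payload (for example, a diff that itself mentions `{{git_diff}}`) is not expanded a second time.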


R11 — Audit cycle v0 produces audits of structurally limited substantive reach

The CHARTER-07 audit cycle completed with findings_total: 1 (one self-categorized false_positive from auditor-primary; auditor-secondary reported zero). On the surface that looks clean. It is not: the result is mechanically true for the audit window the auditors received, but vacuous relative to the Charter's actual scope. There are two structural causes.

Root cause A — git_range defaults to HEAD~1..HEAD

CHARTER-07 has 8 commits in the feature branch charter-07-commshub-foundation:

Commit   Scope
2ee3352  Setup: module skeleton + Preference Center embed assets
cee74f3  Migrations (T004-T010): 7 migrations 012-018 with RLS + partitioning
a6efa9c  SQLC (T011): 12 query files + generated code + 2 migration fixes
824ad76  Scaffolding: 4 ports + 28 errors + 21 models + 16 events + PII guard test + 8 OTel metrics
6d68392  Wire (T018) + compile gate (T019) + middleware verify (T019a)
7486b64  AILOG-035 + atomic Charter §Closing notes
c68702d  Charter originating_ailogs + canonical R drift suppression format
aa5e948  R10 workaround (the only commit the auditors saw)

The CLI defaults git_range to HEAD~1..HEAD, so the auditors processed only the final atomic-update commit — a metadata-only delta that updates Charter frontmatter and reorganizes AILOG R7-R9 into canonical drift-suppression format. They did not see migrations, SQLC, scaffolding, Wire, or the PII guard test — i.e., they did not audit the ~4150 lines that constitute CHARTER-07's actual implementation.

Both auditors performed correctly given what they were handed. But the convergence on "no substantive findings" carries no weight as evidence that the foundation is correct — they were not given the foundation to audit.

Root cause B — Paste-based mode without filesystem tool access

Even if git_range had been main..HEAD (the full branch), the auditors operated in paste-based mode. They could not:

  • Open files outside the diff to verify cross-references (e.g., does function X invoked here actually exist in file Y).
  • Consult data-model.md to validate migrations against documented schema intent.
  • Confirm the eventTypeToPayload map covers the full set declared in EmittedEventTypes.
  • Inspect spec-001 RLS patterns to validate the new templates_tenant_or_system policy doesn't leak system rows into spec-001 tests.

Without tool use, auditors fall back on training data and assumptions. Sentinel's pre-DevTrail audit methodology had an iteratively refined prompt that explicitly enforced "do not opine on files you have not opened". That discipline catches a known failure mode in which Gemini, in particular, tends to assume class/interface correctness from naming alone, without reading the file. It isn't representable in a paste-based prompt.

Calibrator compensation (this cycle only)

The calibrator (claude-opus-4-7[1m] running in the operator's IDE) has filesystem access and read both the diff and the 5 implementation commits not in the audit window. The calibrator output (audit/charters/CHARTER-07/calibrator-reconciler.md §Reconciliation summary) documents the structural limitations explicitly — as critical observations, NOT as fresh C<N> findings (the calibrator role explicitly forbids introducing fresh findings: "that's what the next audit cycle is for").

This compensation works for one Charter audited by an operator using a calibrator-class tool with filesystem access. It does not generalize: most adopters won't have a calibrator with full repo read.


Forward-looking proposal (not a regression)

The audit-skills v0 architecture has real strengths worth preserving:

  • Cross-family heterogeneity as the discovery mechanism (different model family per auditor → different blind spots → convergence is high signal).
  • Calibrator pattern (definitional, not discovery).
  • Opt-in checkpoint per AGENT-RULES §12 heuristic.
  • Telemetry capture as a side effect of the workflow (zero extra operator burden).

The proposal below preserves all four and addresses the substantive depth gap:

Audits, when run, should run via CLI with read-only filesystem access

The auditor-primary and auditor-secondary roles should be invoked via auditor-side CLI tooling (gemini-cli, claude-cli, copilot-cli, codex-cli — whatever the operator has) configured with read-only access to the repo. The DevTrail prompts then enforce the principle that gives the audit its substance:

"You may only opine on files you have read via tool call. Any finding you produce must cite the specific files (paths and lines) you opened. If you have not opened a file, you may not infer behavior, structure, or correctness about it."

This shifts the audit from "extend training data + diff" to "verify diff + read context + cite evidence." It is the discipline that distinguishes substantive code review from pattern matching.
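The rule in the quoted prompt is also mechanically checkable once the auditor runs with tool access, because the set of files it opened can be compared with the files its findings cite. The sketch below assumes a hypothetical `Finding` shape and a hypothetical `files_read` set collected from the auditor's tool-call log. Neither reflects an existing DevTrail schema, and the example paths are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """Hypothetical finding shape; the real audit-output schema may differ."""
    finding_id: str
    summary: str
    cited_files: list = field(default_factory=list)


def citation_violations(findings: list, files_read: set) -> dict:
    """Map each offending finding id to the reasons it breaks the read-before-opine rule."""
    problems = {}
    for finding in findings:
        reasons = []
        if not finding.cited_files:
            reasons.append("cites no files")
        for path in finding.cited_files:
            if path not in files_read:
                reasons.append(f"cites unread file: {path}")
        if reasons:
            problems[finding.finding_id] = reasons
    return problems


# Invented paths, for illustration only.
files_read = {"migrations/example_012.sql", "docs/data-model.md"}
findings = [
    Finding("F1", "Policy matches documented intent", ["migrations/example_012.sql"]),
    Finding("F2", "Event map looks complete", ["internal/events/example.go"]),
    Finding("F3", "Scaffolding seems fine"),
]
violations = citation_violations(findings, files_read)
print(violations)
```

Here F1 passes, F2 is flagged for citing a file that was never opened, and F3 is flagged for citing nothing at all.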

The Sentinel pre-DevTrail audit methodology used exactly this approach across the spec-001 implementation cycle. The prompt was iteratively refined as specific slip-ups were observed per model, with Gemini's class-by-name assumption being the canonical example we polished out. That mature prompt is shareable; we can contribute it as a starting point if useful.

git_range default should match the Charter's commit set, not the last commit

For Charters implemented as multiple commits on a feature branch (the common case for L Charters), the default HEAD~1..HEAD is the wrong unit. Sensible alternatives:

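  • $(git merge-base origin/main HEAD)..HEAD: captures all commits unique to the branch since divergence from main.
  • origin/main..HEAD: the same commit set, simpler to express. One caveat: when the range is passed to git diff, the two-dot form compares the two tips directly, so the three-dot form origin/main...HEAD is the one that diffs against the merge base.
  • A --range flag the operator overrides per invocation, but with the default biased toward "the full branch since main" rather than "the last commit."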
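The default-selection logic described in the list above is small. The following sketch shows one possible shape, with the caveat that `resolve_git_range` and `commits_in_range` are hypothetical names, `origin/main` as the base ref is an assumption about the adopter's repository, and the explicit argument stands in for the proposed `--range` override.

```python
import subprocess
from typing import Optional


def resolve_git_range(explicit: Optional[str] = None, base_ref: str = "origin/main") -> str:
    """Prefer an operator-supplied range; otherwise cover the whole branch since base_ref."""
    return explicit if explicit else f"{base_ref}..HEAD"


def commits_in_range(git_range: str, repo_dir: str = ".") -> list:
    """List abbreviated commit hashes in the range, oldest first (needs git and a repo)."""
    result = subprocess.run(
        ["git", "log", "--reverse", "--format=%h", git_range],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


default_range = resolve_git_range()
override_range = resolve_git_range("HEAD~1..HEAD")
print(default_range)   # origin/main..HEAD
print(override_range)  # HEAD~1..HEAD
```

Printing the output of `commits_in_range` before the PREPARE step would let the operator confirm that the audit window matches the Charter's commit set rather than discovering the gap after the auditors respond.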

This applies regardless of whether audits move to CLI mode or stay paste-based.

Optional: tighten the resolver (R10 fix)

The R10 fix above is small and mechanical, and worth shipping as a patch independent of any larger v1 redesign.


Evidence in Sentinel repo

All evidence is reproducible from the Sentinel branch charter-07-commshub-foundation (commits 2ee3352..f0f0cee). Branch is currently local; will push if the upstream maintainers want to inspect concretely — say the word.

Substantive references inside the Sentinel branch:

  • .devtrail/07-ai-audit/agent-logs/AILOG-2026-05-04-035-charter-07-commshub-foundation.md §Risk:
    • R10 (~lines 286-352): full root-cause analysis, scope, workaround applied, fix proposal.
    • R11 (~lines 354-440): structural-depth limitations, both root causes elaborated, calibrator compensation applied, forward-looking proposal.
  • audit/charters/CHARTER-07/auditor-primary.md: copilot-v1.0.40 (gpt-5.3-codex) actual response (1 finding, self-categorized false_positive).
  • audit/charters/CHARTER-07/auditor-secondary.md: gemini-2.5-pro-cli actual response (0 findings, "execution and accompanying documentation are pristine").
  • audit/charters/CHARTER-07/calibrator-reconciler.md §Reconciliation summary: detailed analysis of why the convergent zero-substantive-findings does not constitute evidence of foundation correctness.
  • audit/charters/CHARTER-07/external-audit-pending.yaml: the external_audit: block ready to merge into .devtrail/charters/CHARTER-07.telemetry.yaml once the Charter closes (currently held back; see operational decision below).

Six granular commits cover the implementation, plus 3 for the AILOG, the atomic update, and the audit cycle. The branch is paused at the audit-cycle close commit (f0f0cee): no devtrail charter close invoked, no PR opened.

Operational decision in Sentinel

Per the operator's call: pause CHARTER-07 close until upstream iterates on R10 + R11 fixes. Once a new cli version ships with the resolver fix (R10) + a sensible git_range default (R11 part A) — and ideally with the CLI-based audit pattern surfaced (R11 part B) — Sentinel will:

  1. devtrail update-cli + devtrail update-framework.
  2. Re-run the audit cycle on CHARTER-07 with the new flow.
  3. devtrail charter close CHARTER-07 with the substantive audit findings merged into telemetry.
  4. Continue with CHARTER-08 onward under the improved methodology.

The current audit cycle outputs (auditor responses + calibrator) stay in audit/charters/CHARTER-07/ as historical evidence of what the v0 produced; the re-run will overwrite or coexist as a separate cycle (decision pending upstream iteration).

This issue is the formal feedback channel from the first primary adopter. Sentinel will continue Etapa 2 telemetry per the agreed devtrail-telemetria-etapa-2.md v0.1 conventions and report a consolidated snapshot at the end (CHARTER-13 close), but R10 + R11 + the forward-looking proposal merit upstream attention now rather than at the end of a multi-week cycle.

Happy to provide more concrete material (the Sentinel pre-DevTrail audit prompt, full auditor responses, additional Charter execution data) if useful.


🤖 Filed by Sentinel operator at the close of CHARTER-07 audit cycle, with calibrator analysis from claude-opus-4-7[1m]. All assertions verifiable against the cited commits and file paths.
