Skip to content

docs(spec) post-Phase-3-close audit fix-ups — README honesty pass + observability PinballMap inventory#94

Merged
jkeeley2073 merged 1 commit into
mainfrom
Dev-Phase3PostCloseFixUps
May 7, 2026
Merged

docs(spec) post-Phase-3-close audit fix-ups — README honesty pass + observability PinballMap inventory#94
jkeeley2073 merged 1 commit into
mainfrom
Dev-Phase3PostCloseFixUps

Conversation

@jkeeley2073
Copy link
Copy Markdown
Contributor

Summary

Closes the three findings from the post-Phase-3-close due-diligence audit. Doc-only, ~30 lines net change across three files.

  1. README.md stale — Phase 3 row now ✅ Complete with the actual Phase 3 surface (Foundry orchestration, four-agent surface, confidence-threshold refusal, cost routing + LRU cache, evaluation harness, H2 baseline). Test count 566 → 687. ADR count 0001–0013 → 0001–0018. Project-structure tree updated to match.

  2. README + docs/vision.md "always cite" overclaim softened. The H2 baseline (data/eval/results/wizard.20260507T162529Z.json) scored citation_precision = 0.133 — the metric is floored by two structural gaps: (a) citations are extracted by regexing OPDB URLs out of agent prose (a Phase 3 placeholder — tool-call-trace inspection replaces it in Phase 4), and (b) eval ground-truth OPDB IDs were curated from titles, not verified against the deployed catalog. A new "Phase 3 — current capability honest read" section in README names these gaps plus connected-agents-scaffolding and RAG-corpus-deferred. vision.md line 18 likewise softens "sub-agent routing (Valuation / Rules / Repair)" to acknowledge structural wiring lands in Phase 4. Threshold-driven refusal (ADR-0017's draft 0.65) and the evaluation harness itself ARE shipped and unchanged.

  3. docs/observability.md gains the pinwiz.pinballmap.* instrument table that's existed in PinballWizardTelemetry.cs:62–80 since PR feat(pinballmap) Pinball Map client with on-disk cache + per-source politeness #83 but was never tabulated. Activity inventory also gains pinwiz.pinballmap.fetch for parity with the metric inventory (emitted by PinballMapClient.cs:74).

Why the soften, not just the mechanical fix

Per guardrails.md goal #1 (10-minute prospect landing), customer-facing surfaces must distinguish shipped vs scaffolded vs deferred. "Always cite" set an expectation the H2 baseline cannot honor today; refuse-rather-than-fabricate is the actual safety invariant per ADR-0017. The Phase 4 trajectory closes the gap between architectural invariant and empirical baseline; this PR makes the docs honest about where we are now.

Local review

0 🔴 / 1 ⚠️ / 4 doc-applicable categories ✅ (6 code categories N/A on a doc PR). The ⚠️ — Activity inventory parity gap in docs/observability.md — was fixed in the same PR before push.

Test plan

  • Doc-only diff — eligible for the doc-only carve-out per guardrails.md § Per-PR gate; /local-review ran anyway because Finding 2 is a posture-correction.
  • dotnet build PinballWizard.slnx -c Release — clean, 0 warnings.
  • dotnet test PinballWizard.slnx -c Release — 687/687 passing.
  • Identity check — git log -1 --format='%an <%ae>' shows personal noreply.
  • Anchor link [Phase 3 known limitations](#phase-3--current-capability-honest-read) resolves to GitHub-slugified ### Phase 3 — current capability honest read (em-dash drops, double-hyphen preserved).
  • pinwiz.pinballmap.* rows in observability.md cross-checked against PinballWizardTelemetry.cs — names, types, units, descriptions all match.
  • H2 baseline JSON path in README is a real file.

Per-phase-close lesson

This is the second time a post-close audit has surfaced a README + vision.md update that the closeout PR didn't include (Phase 2 close → PR #81; Phase 3 close → this). Worth adding to guardrails.md § Per-phase gate as a Phase 4 enhancement: "README marketing claims reviewed against the phase's actual demonstrable artifact." Surfaced as a Phase 4 follow-up, not bundled here (one PR, one purpose).

Memory references

  • session_handoff_2026_05_07_post_close_audit.md — the audit findings + the fix-it scope being closed by this PR
  • session_handoff_2026_05_07_phase3_close.md — Phase 3 closeout context
  • feedback_pre_pr_self_audit.md — two-step audit gate that produced this PR

🤖 Generated with Claude Code

…bservability.md PinballMap inventory

Closes the three findings from the post-Phase-3-close due-diligence
audit (memory: session_handoff_2026_05_07_post_close_audit.md):

1. README stale — Phase 3 status row now ✅ Complete with the actual
   Phase 3 surface (Foundry orchestration, four-agent surface,
   confidence-threshold refusal, cost routing + LRU cache, evaluation
   harness, H2 baseline). Test count 566 → 687. ADR count
   0001-0013 → 0001-0018. Project-structure tree updated to match.

2. README + vision.md "always cite" overclaim softened. The H2 baseline
   surfaced two structural gaps that floor citation_precision at 0.133:
   (a) citations are extracted by regexing OPDB URLs out of agent prose
   (a Phase 3 placeholder — tool-call-trace inspection replaces it in
   Phase 4), and (b) eval ground-truth OPDB IDs were curated from
   titles, not verified against the deployed catalog. New "Phase 3 —
   current capability honest read" section in README names these gaps
   plus the connected-agents scaffolding gap and the RAG-corpus-deferred
   gap. vision.md line 18 likewise softens "sub-agent routing
   (Valuation / Rules / Repair)" to acknowledge structural wiring lands
   in Phase 4. Threshold-driven refusal (ADR-0017's draft 0.65) and the
   evaluation harness itself ARE shipped and unchanged.

3. observability.md gains the pinwiz.pinballmap.* instrument table that
   existed in PinballWizardTelemetry.cs since PR #83 but was never
   tabulated. Activity inventory also gains pinwiz.pinballmap.fetch
   (parity with the metric inventory; emitted by PinballMapClient).

Why the soften, not just the mechanical fix: per guardrails.md goal #1
(10-minute prospect landing), customer-facing surfaces must distinguish
shipped vs scaffolded vs deferred. "Always cite" set an expectation the
H2 baseline cannot honor today; refuse-rather-than-fabricate is the
actual safety invariant per ADR-0017. The Phase 4 trajectory closes the
gap between architectural invariant and empirical baseline; this PR
makes the docs honest about where we are now.

Local review: 0 🔴 / 1 ⚠️ (Activity inventory parity — fixed in same PR
before push). Doc-only — build green, 687/687 tests pass.

Per-phase-close lesson recorded for guardrails.md: README + vision.md
updates should be a per-phase-close checklist item. The pattern (PR #81
after Phase 2, this PR after Phase 3) repeated itself; the post-close
audit caught both. Worth adding to guardrails.md § Per-phase gate as a
Phase 4 enhancement.
@jkeeley2073 jkeeley2073 added the claude-code Generated with Claude Code label May 7, 2026
@jkeeley2073 jkeeley2073 merged commit ac3c9c3 into main May 7, 2026
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

claude-code Generated with Claude Code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant