docs(spec) post-Phase-3-close audit fix-ups — README honesty pass + observability PinballMap inventory by jkeeley2073 · Pull Request #94 · Early-Bird-Solutions-LLC/PinballWizard

jkeeley2073 · 2026-05-07T17:12:31Z

Summary

Closes the three findings from the post-Phase-3-close due-diligence audit. Doc-only, ~30 lines net change across three files.

README.md stale — Phase 3 row now ✅ Complete with the actual Phase 3 surface (Foundry orchestration, four-agent surface, confidence-threshold refusal, cost routing + LRU cache, evaluation harness, H2 baseline). Test count 566 → 687. ADR count 0001–0013 → 0001–0018. Project-structure tree updated to match.
README + docs/vision.md "always cite" overclaim softened. The H2 baseline (data/eval/results/wizard.20260507T162529Z.json) scored citation_precision = 0.133 — the metric is floored by two structural gaps: (a) citations are extracted by regexing OPDB URLs out of agent prose (a Phase 3 placeholder — tool-call-trace inspection replaces it in Phase 4), and (b) eval ground-truth OPDB IDs were curated from titles, not verified against the deployed catalog. A new "Phase 3 — current capability honest read" section in README names these gaps plus connected-agents-scaffolding and RAG-corpus-deferred. vision.md line 18 likewise softens "sub-agent routing (Valuation / Rules / Repair)" to acknowledge structural wiring lands in Phase 4. Threshold-driven refusal (ADR-0017's draft 0.65) and the evaluation harness itself ARE shipped and unchanged.
docs/observability.md gains the pinwiz.pinballmap.* instrument table that's existed in PinballWizardTelemetry.cs:62–80 since PR feat(pinballmap) Pinball Map client with on-disk cache + per-source politeness #83 but was never tabulated. Activity inventory also gains pinwiz.pinballmap.fetch for parity with the metric inventory (emitted by PinballMapClient.cs:74).

Why the soften, not just the mechanical fix

Per guardrails.md goal #1 (10-minute prospect landing), customer-facing surfaces must distinguish shipped vs scaffolded vs deferred. "Always cite" set an expectation the H2 baseline cannot honor today; refuse-rather-than-fabricate is the actual safety invariant per ADR-0017. The Phase 4 trajectory closes the gap between architectural invariant and empirical baseline; this PR makes the docs honest about where we are now.

Local review

0 🔴 / 1 ⚠️ / 4 doc-applicable categories ✅ (6 code categories N/A on a doc PR). The ⚠️ — Activity inventory parity gap in docs/observability.md — was fixed in the same PR before push.

Test plan

Doc-only diff — eligible for the doc-only carve-out per guardrails.md § Per-PR gate; /local-review ran anyway because Finding 2 is a posture-correction.
dotnet build PinballWizard.slnx -c Release — clean, 0 warnings.
dotnet test PinballWizard.slnx -c Release — 687/687 passing.
Identity check — git log -1 --format='%an <%ae>' shows personal noreply.
Anchor link [Phase 3 known limitations](#phase-3--current-capability-honest-read) resolves to GitHub-slugified ### Phase 3 — current capability honest read (em-dash drops, double-hyphen preserved).
pinwiz.pinballmap.* rows in observability.md cross-checked against PinballWizardTelemetry.cs — names, types, units, descriptions all match.
H2 baseline JSON path in README is a real file.

Per-phase-close lesson

This is the second time a post-close audit has surfaced a README + vision.md update that the closeout PR didn't include (Phase 2 close → PR #81; Phase 3 close → this). Worth adding to guardrails.md § Per-phase gate as a Phase 4 enhancement: "README marketing claims reviewed against the phase's actual demonstrable artifact." Surfaced as a Phase 4 follow-up, not bundled here (one PR, one purpose).

Memory references

session_handoff_2026_05_07_post_close_audit.md — the audit findings + the fix-it scope being closed by this PR
session_handoff_2026_05_07_phase3_close.md — Phase 3 closeout context
feedback_pre_pr_self_audit.md — two-step audit gate that produced this PR

🤖 Generated with Claude Code

…bservability.md PinballMap inventory Closes the three findings from the post-Phase-3-close due-diligence audit (memory: session_handoff_2026_05_07_post_close_audit.md): 1. README stale — Phase 3 status row now ✅ Complete with the actual Phase 3 surface (Foundry orchestration, four-agent surface, confidence-threshold refusal, cost routing + LRU cache, evaluation harness, H2 baseline). Test count 566 → 687. ADR count 0001-0013 → 0001-0018. Project-structure tree updated to match. 2. README + vision.md "always cite" overclaim softened. The H2 baseline surfaced two structural gaps that floor citation_precision at 0.133: (a) citations are extracted by regexing OPDB URLs out of agent prose (a Phase 3 placeholder — tool-call-trace inspection replaces it in Phase 4), and (b) eval ground-truth OPDB IDs were curated from titles, not verified against the deployed catalog. New "Phase 3 — current capability honest read" section in README names these gaps plus the connected-agents scaffolding gap and the RAG-corpus-deferred gap. vision.md line 18 likewise softens "sub-agent routing (Valuation / Rules / Repair)" to acknowledge structural wiring lands in Phase 4. Threshold-driven refusal (ADR-0017's draft 0.65) and the evaluation harness itself ARE shipped and unchanged. 3. observability.md gains the pinwiz.pinballmap.* instrument table that existed in PinballWizardTelemetry.cs since PR #83 but was never tabulated. Activity inventory also gains pinwiz.pinballmap.fetch (parity with the metric inventory; emitted by PinballMapClient). Why the soften, not just the mechanical fix: per guardrails.md goal #1 (10-minute prospect landing), customer-facing surfaces must distinguish shipped vs scaffolded vs deferred. "Always cite" set an expectation the H2 baseline cannot honor today; refuse-rather-than-fabricate is the actual safety invariant per ADR-0017. The Phase 4 trajectory closes the gap between architectural invariant and empirical baseline; this PR makes the docs honest about where we are now. Local review: 0 🔴 / 1 ⚠️ (Activity inventory parity — fixed in same PR before push). Doc-only — build green, 687/687 tests pass. Per-phase-close lesson recorded for guardrails.md: README + vision.md updates should be a per-phase-close checklist item. The pattern (PR #81 after Phase 2, this PR after Phase 3) repeated itself; the post-close audit caught both. Worth adding to guardrails.md § Per-phase gate as a Phase 4 enhancement.

jkeeley2073 added the claude-code Generated with Claude Code label May 7, 2026

jkeeley2073 merged commit ac3c9c3 into main May 7, 2026
3 of 4 checks passed

This was referenced May 7, 2026

docs(spec) Phase 4 Wave 0 — § Scope + ADRs 0019-0024 + guardrails Per-phase gate #95

Merged

feat(ai) Phase 4 W4-3 citation-required guardrail (ADR-0023) #128

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs(spec) post-Phase-3-close audit fix-ups — README honesty pass + observability PinballMap inventory#94

docs(spec) post-Phase-3-close audit fix-ups — README honesty pass + observability PinballMap inventory#94
jkeeley2073 merged 1 commit into
mainfrom
Dev-Phase3PostCloseFixUps

jkeeley2073 commented May 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jkeeley2073 commented May 7, 2026

Summary

Why the soften, not just the mechanical fix

Local review

Test plan

Per-phase-close lesson

Memory references

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant