feat(ai) Phase 4 W1-2 — tool-trace citation extraction (ADR-0022) by jkeeley2073 · Pull Request #97 · Early-Bird-Solutions-LLC/PinballWizard

jkeeley2073 · 2026-05-08T12:18:49Z

Summary

Phase 4 Wave 1 PR W1-2 — closes § Scope item 10 / inherited Phase 3 follow-up #5. Implements ADR-0022: citations come from the agent's tool-call result trace, not regex over the Wizard's outer prose.

New abstraction ICitationExtractor (Application/Ai/Citations/) with two impls:

ToolTraceCitationExtractor: walks AgentResponse.Messages and extracts citations from FunctionResultContent instances. getMachineByTitle results produce structured citations from MachineGroundingDto.OpdbSourceUrl; connected sub-agent function results (Valuation/Rules/Repair, from the W1-1 connected-agents wiring via AsAIFunction in PR #96) produce citations from OPDB URLs embedded in the sub-agent's text reply. The Wizard's outer assistant text is structurally unreachable by the extractor, so hallucinated URLs in Wizard prose without backing tool-call results no longer count.
RegexLegacyCitationExtractor: the Phase 3 regex impl, kept behind AiFoundryOptions.RetainRegexCitationCutover (default true) for the cutover observability window per ADR-0022 § Telemetry. Both extractors emit pinwiz.ai.citations.extracted_total{source=...}, so a behavioral regression in the new tool-trace extractor would surface in metrics before the H3 baseline. The counter's name, type, unit, and tag shape exactly match the ADR spec. After H2 confirms parity-or-better, RegexLegacyCitationExtractor and the cutover flag will be deleted in a follow-up PR.
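
For readers who have not opened the diff, the tool-trace walk described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `*Sketch` records are hypothetical stand-ins for the real message and content types, the `MachineGroundingDto` shape is assumed beyond its `OpdbSourceUrl` property, and the OPDB URL pattern is a guess made only so the example runs.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-ins for the real message/content types; only the shape matters here.
public abstract record ContentSketch;
public sealed record TextSketch(string Text) : ContentSketch;
public sealed record FunctionResultSketch(object? Result) : ContentSketch;
public sealed record MessageSketch(string Role, IReadOnlyList<ContentSketch> Contents);

// Type and property names come from the PR description; the record shape is assumed.
public sealed record MachineGroundingDto(string? OpdbSourceUrl);

public static class ToolTraceSketch
{
    // Assumed URL shape, for illustration only; the real pattern is not shown in this PR text.
    private static readonly Regex OpdbUrl =
        new(@"https://opdb\.org/machines/[A-Za-z0-9_-]+", RegexOptions.Compiled);

    public static IReadOnlyList<string> Extract(IEnumerable<MessageSketch>? messages)
    {
        var citations = new List<string>();
        if (messages is null) return citations;

        // One shared set, so a URL seen via the DTO and again in sub-agent text counts once.
        var seenUrls = new HashSet<string>(StringComparer.Ordinal);

        foreach (var message in messages)
        {
            // A single tool message may carry several function results.
            foreach (var content in message.Contents)
            {
                // Plain text, including the Wizard's outer prose, is never inspected.
                if (content is not FunctionResultSketch functionResult) continue;

                switch (functionResult.Result)
                {
                    // Structured channel: take the URL straight from the grounding DTO.
                    case MachineGroundingDto dto when !string.IsNullOrWhiteSpace(dto.OpdbSourceUrl):
                        if (seenUrls.Add(dto.OpdbSourceUrl)) citations.Add(dto.OpdbSourceUrl);
                        break;

                    // Text channel: mine URLs from a sub-agent's reply.
                    case string subAgentReply:
                        foreach (Match match in OpdbUrl.Matches(subAgentReply))
                        {
                            if (seenUrls.Add(match.Value)) citations.Add(match.Value);
                        }
                        break;
                }
            }
        }

        return citations;
    }
}
```

In this sketch a URL that appears only in a `TextSketch` part is never returned, which mirrors the no-Wizard-prose-counted invariant; a null DTO result or a DTO without a URL simply contributes nothing.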

AiRouter refactored: inline ExtractCitationsFromText + OpdbMachineUrlRegex deleted; AiRouter is no longer partial; using System.Text.RegularExpressions removed. Both extractors are concrete-type-injected into AiRouter (the cutover semantics are asymmetric — tool-trace authoritative, regex_legacy telemetry-only).
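
The asymmetric cutover semantics can be pictured with the sketch below. It is an assumption-laden illustration rather than the AiRouter code: `CutoverSketch`, `ResolveCitations`, and the meter name are hypothetical, the two extractors are passed as delegates instead of concrete types, and the counter is incremented inline for brevity even though, per the description above, the extractors themselves emit it. Only the counter name and the `source` tag values are taken from this PR.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public static class CutoverSketch
{
    // Placeholder meter name; the counter name and tag values come from the PR description.
    private static readonly Meter SketchMeter = new("PinballWizard.Sketch");
    private static readonly Counter<long> Extracted =
        SketchMeter.CreateCounter<long>("pinwiz.ai.citations.extracted_total");

    public static IReadOnlyList<string> ResolveCitations(
        Func<IReadOnlyList<string>> toolTraceExtract,
        Func<IReadOnlyList<string>> regexLegacyExtract,
        bool retainRegexCitationCutover)
    {
        // Tool-trace is authoritative: its result is what the caller receives.
        var authoritative = toolTraceExtract();
        Extracted.Add(authoritative.Count,
            new KeyValuePair<string, object?>("source", "tool_trace"));

        if (retainRegexCitationCutover)
        {
            // Regex-legacy is telemetry-only: counted for comparison, then discarded.
            var legacy = regexLegacyExtract();
            Extracted.Add(legacy.Count,
                new KeyValuePair<string, object?>("source", "regex_legacy"));
        }

        return authoritative;
    }
}
```

With the flag off, the legacy delegate is never invoked, which is roughly what deleting RegexLegacyCitationExtractor in the follow-up PR would make permanent.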

This PR and W1-1 (PR #96) together address the two structural causes of Phase 3's H2 baseline citation_precision=0.133: connected-agents wiring (W1-1) makes sub-agent grounding reachable, and tool-trace extraction (W1-2) ties citations structurally to what the agent actually retrieved.

Test Plan

dotnet build PinballWizard.slnx — Build succeeded, 0 Warning(s), 0 Error(s)
dotnet test PinballWizard.slnx — Passed: 709, Failed: 0, Skipped: 0 (was 691; +18 new tests, including 11 tool-trace, two of which are the cross-channel-dedup and multi-FRC-in-message additions, and 6 regex-legacy)
Tool-trace tests cover: null response, no FRC, structured grounding DTO, null grounding, missing-OPDB-URL DTO, duplicate calls (same DTO twice), sub-agent text mining, mixed grounding+sub-agent (union), cross-channel dedup (DTO+text same URL), multiple FRC in single message, no-Wizard-prose-counted invariant (the headline behavior change vs. regex)
Regex-legacy tests pin Phase 3 behavior so the cutover comparison has a stable reference point
Live integration verification post-merge: H2 baseline rerun (operational hand-off) measures actual citation_precision lift from the new structural extractor

Out of Scope

searchCorpus tool integration (Phase 4 scope item 21, separate PR — extends ToolTraceCitationExtractor symmetrically with RetrievedChunk[] results from RAG retrieval)
Citation-required guardrail (ADR-0023, Phase 4 scope item 23 — depends on this extractor being structural; that PR uses citations.Count == 0 as the refusal trigger)
Removal of RegexLegacyCitationExtractor and RetainRegexCitationCutover flag (deferred to a follow-up PR after H2 baseline confirms parity-or-better)

Checklist

CI is green (build + test + coverage + CodeQL + sanitization) — verified pre-push; will re-confirm in PR checks
PR title follows the Conventional Commits format above
If this is a new architectural decision, an ADR has been added under docs/adr/ — implements ADR-0022 (already merged in Wave 0)
If user-visible behavior changes, README.md and/or docs/ are updated in the same PR — citation extraction is internal mechanics; no user-visible change yet (the citation surface in WizardAnswer is unchanged)
If a memory in ~/.claude/projects/c--projects-PinballWizard/memory/ is now stale, it has been updated or removed in the same PR — wave0_close handoff already references W1-2 as queued
No TODO / FIXME / commented-out code committed
No new entries in <NoWarn> without a comment explaining why and the removal criterion — N/A (no Directory.Build.props changes)

Pre-push self-audit

Step 0 — `/local-review` (qualitative)

Ran /local-review and addressed every 🔴 finding before push
Local review outcome: 0 🔴 / 2 ⚠️ (both fixed pre-push) / 12 categories ✅
- Fixed: comment typo in RegexLegacyCitationExtractor.cs:8 referenced wrong property name
- Fixed: added Extract_SameUrlInDtoAndSubAgentText_Deduplicates and Extract_MultipleFunctionResultsInSingleMessage_AllProcessed test-gap fills

Step 1 — Mechanical checklist

Every new *Options property has at least one real getter call in src/ — RetainRegexCitationCutover is read at AiRouter.cs:179
Sibling-diffed against the closest existing implementation — verified by /local-review § 4; RegexLegacyCitationExtractor preserves the prior inline implementation's behavior exactly so cutover comparison counts will be apples-to-apples
No bare catch { } — no new try/catch added
New ISourceScraper? — N/A
Tests assert behavior, not just structure — every test exercises a specific ADR-0022 invariant or behavior (no-prose-counted, structured grounding, sub-agent text mining, cross-channel dedup, etc.)
Build is zero-warning — verified
git log -1 --format='%an <%ae>' shows personal noreply, not work email — confirmed 94459922+jkeeley2073@users.noreply.github.com

Phase 4 W1-2 / inherited Phase 3 follow-up #5. Replaces the AiRouter inline regex over the agent's final prose with ToolTraceCitationExtractor reading citations from the AgentResponse's tool-call result trace per ADR-0022.

New abstraction ICitationExtractor (Application/Ai/Citations) localizes the extraction contract; two impls:
- ToolTraceCitationExtractor: walks AgentResponse.Messages, extracts citations from FunctionResultContent. getMachineByTitle results produce structured citations from MachineGroundingDto.OpdbSourceUrl; connected sub-agent function results (Valuation/Rules/Repair from W1-1 wiring) produce citations from OPDB URLs embedded in the sub-agent's text reply. The Wizard's outer prose is NOT scanned; hallucinated URLs in Wizard text without backing tool-call results no longer count.
- RegexLegacyCitationExtractor: the Phase 3 regex impl, kept behind AiFoundryOptions.RetainRegexCitationCutover (default true) for cutover observability per ADR-0022 § Telemetry. Both extractors emit pinwiz.ai.citations.extracted_total{source=...} so a behavioral regression in the new tool-trace one would surface before H3. RegexLegacyCitationExtractor + the cutover flag get deleted in a follow-up PR after H2 confirms parity-or-better.

AiRouter refactored to consume both extractors via concrete-type injection (the cutover semantics are asymmetric: tool-trace is authoritative; regex is telemetry-only). The old inline ExtractCitationsFromText method + OpdbMachineUrlRegex are deleted; AiRouter no longer needs RegexOptions / GeneratedRegex.

New telemetry: pinwiz.ai.citations.extracted_total counter, tagged with source (tool_trace | regex_legacy). Documented in the existing PinballWizardTelemetry instrument inventory.

Tests: 691 → 707 (+16). 9 tool-trace tests cover null, no-tool-result, structured grounding, null grounding, missing-OPDB-URL, duplicate-call, sub-agent-text-mining, mixed grounding+sub-agent, and the no-Wizard-prose-counted invariant. 6 regex-legacy tests pin the Phase 3 behavior so cutover comparison has a stable point. Build clean, zero warnings.

Pre-push /local-review surfaced 0 🔴 + 2 ⚠️; both addressed:
- ⚠️ Comment typo in RegexLegacyCitationExtractor referenced the wrong property name (RetainRegexCutover → RetainRegexCitationCutover). One-line fix.
- ⚠️ Two test-gap additions:
  - Extract_SameUrlInDtoAndSubAgentText_Deduplicates pins the cross-channel deduplication (DTO + sub-agent text returning the same OPDB URL collapse to one citation via the shared seenUrls HashSet).
  - Extract_MultipleFunctionResultsInSingleMessage_AllProcessed pins the inner foreach over message.Contents (Microsoft.Agents.AI can pack multiple FunctionResultContent into a single tool ChatMessage).

Tests: 707 → 709 (+2). 12 ✅ categories from the review confirmed. Build clean, zero warnings.

jkeeley2073 added 2 commits May 8, 2026 08:13

jkeeley2073 added the claude-code label (Generated with Claude Code) May 8, 2026

jkeeley2073 merged commit 0f23b7a into main May 8, 2026
4 of 5 checks passed

jkeeley2073 mentioned this pull request May 8, 2026

fix(ci) un-trip Sanitization on its own DL-0004 documentation #101

Merged

18 tasks

jkeeley2073 merged 2 commits into main from Dev-Phase4W12ToolTraceCitations
