feat(ai) Phase 4 W1-2 — tool-trace citation extraction (ADR-0022)#97

Merged: jkeeley2073 merged 2 commits into main from Dev-Phase4W12ToolTraceCitations on May 8, 2026.

@jkeeley2073
Contributor

Summary

Phase 4 Wave 1 PR W1-2. Closes § Scope item 10 (inherited Phase 3 follow-up #5). Implements ADR-0022: citations come from the agent's tool-call result trace, not from a regex over the Wizard's outer prose.

New abstraction ICitationExtractor (Application/Ai/Citations/) with two impls:

  • ToolTraceCitationExtractor: walks AgentResponse.Messages and extracts citations from FunctionResultContent instances. getMachineByTitle results produce structured citations from MachineGroundingDto.OpdbSourceUrl; connected sub-agent function results (Valuation/Rules/Repair, connected to the Wizard via AsAIFunction in PR #96 / W1-1) produce citations from OPDB URLs embedded in the sub-agent's text reply. The Wizard's outer assistant text is structurally unreachable by the extractor, so hallucinated URLs in Wizard prose without backing tool-call results no longer count.

  • RegexLegacyCitationExtractor: the Phase 3 regex implementation, kept behind AiFoundryOptions.RetainRegexCitationCutover (default true) for the cutover observability window defined in ADR-0022 § Telemetry. Both extractors emit pinwiz.ai.citations.extracted_total{source=...}, so a behavioral regression in the new tool-trace extractor would surface in metrics before the H3 baseline. The counter's name, instrument type, unit, and tag shape exactly match the ADR spec. Once H2 confirms parity-or-better, RegexLegacyCitationExtractor and the cutover flag will be deleted in a follow-up PR.
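The tool-trace extraction described in the first bullet can be sketched roughly as follows. This is a Python stand-in for the C# implementation, not the real code: the classes are simplified, hypothetical mirrors of AgentResponse, ChatMessage, FunctionResultContent, and MachineGroundingDto; the stand-in FunctionResultContent carries a `name` field for brevity, which may not match how the real type identifies its originating function; the getMachineByTitle result is assumed to be already materialized as a DTO; and the OPDB URL pattern and sub-agent function names are assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Hypothetical, simplified stand-ins for the agent-framework types named in the PR.
@dataclass
class MachineGroundingDto:
    opdb_source_url: Optional[str]

@dataclass
class FunctionResultContent:
    name: str  # assumption: function name carried directly on the result
    result: Union[MachineGroundingDto, str, None]

@dataclass
class TextContent:
    text: str

@dataclass
class ChatMessage:
    role: str  # e.g. "tool" or "assistant"
    contents: list

@dataclass
class AgentResponse:
    messages: list = field(default_factory=list)

# Assumed URL shape and sub-agent function names (hypothetical).
OPDB_URL = re.compile(r"https://opdb\.org/machines/[A-Za-z0-9-]+")
SUB_AGENT_FUNCTIONS = {"Valuation", "Rules", "Repair"}

def extract(response: Optional[AgentResponse]) -> List[str]:
    """Collect citations only from tool-call results, never from assistant prose."""
    if response is None:
        return []
    seen_urls: set = set()  # shared by both channels, so DTO and text URLs dedupe together
    citations: List[str] = []

    def add(url: Optional[str]) -> None:
        if url and url not in seen_urls:
            seen_urls.add(url)
            citations.append(url)

    for message in response.messages:
        for content in message.contents:  # one message may pack several results
            if not isinstance(content, FunctionResultContent):
                continue  # TextContent (the Wizard's own prose) is skipped entirely
            if content.name == "getMachineByTitle":
                if isinstance(content.result, MachineGroundingDto):
                    add(content.result.opdb_source_url)  # None is ignored by add()
            elif content.name in SUB_AGENT_FUNCTIONS and isinstance(content.result, str):
                for url in OPDB_URL.findall(content.result):
                    add(url)
    return citations
```

Because `seen_urls` is shared across both branches, a URL returned by the grounding DTO and repeated in a sub-agent reply yields a single citation, and a URL that appears only in an assistant TextContent is never reached.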

AiRouter refactored: the inline ExtractCitationsFromText method and OpdbMachineUrlRegex are deleted, AiRouter is no longer partial, and the using System.Text.RegularExpressions directive is removed. Both extractors are injected into AiRouter as concrete types rather than through ICitationExtractor, because the cutover semantics are asymmetric: tool-trace is authoritative, regex_legacy is telemetry-only.
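A minimal sketch of that asymmetry, under stated assumptions: AiRouterSketch and its fields are hypothetical, Python callables stand in for the two concretely injected C# extractor types, and a `collections.Counter` keyed by the `source` tag stands in for the real pinwiz.ai.citations.extracted_total counter. The sketch also assumes the counter is incremented by the number of citations extracted and that the legacy extractor runs only while RetainRegexCitationCutover is true.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List

# Stand-in for pinwiz.ai.citations.extracted_total, keyed by the `source` tag value.
citations_extracted_total: Counter = Counter()

@dataclass
class AiRouterSketch:
    tool_trace_extract: Callable[[object], List[str]]  # authoritative extractor
    regex_legacy_extract: Callable[[str], List[str]]   # telemetry-only extractor
    retain_regex_citation_cutover: bool = True         # mirrors the default-true flag

    def citations_for(self, response: object, final_text: str) -> List[str]:
        # Only the tool-trace result is returned to callers.
        citations = self.tool_trace_extract(response)
        citations_extracted_total["tool_trace"] += len(citations)
        if self.retain_regex_citation_cutover:
            # The legacy result feeds the comparison metric and is then discarded.
            legacy = self.regex_legacy_extract(final_text)
            citations_extracted_total["regex_legacy"] += len(legacy)
        return citations
```

Routing both paths through one interface would invite treating them as interchangeable; keeping them as distinct dependencies makes it obvious which result reaches the answer and which one only reaches the dashboard.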

Together with W1-1 (PR #96), this PR addresses the two structural causes of Phase 3's H2 baseline citation_precision of 0.133: connected-agents wiring (W1-1) made sub-agent grounding reachable, and tool-trace extraction (W1-2) ties citations structurally to what the agent actually retrieved.

Test Plan

  • dotnet build PinballWizard.slnx — Build succeeded, 0 Warning(s), 0 Error(s)
  • dotnet test PinballWizard.slnx: Passed: 709, Failed: 0, Skipped: 0 (was 691, so +18 new tests, including 11 tool-trace tests, which cover the cross-channel-dedup and multi-FRC-in-message additions, and 6 regex-legacy tests)
  • Tool-trace tests cover: null response, no FunctionResultContent (FRC), structured grounding DTO, null grounding, DTO missing the OPDB URL, duplicate calls (same DTO twice), sub-agent text mining, mixed grounding + sub-agent (union), cross-channel dedup (DTO and text carrying the same URL), multiple FRCs in a single message, and the no-Wizard-prose-counted invariant (the headline behavior change vs. regex)
  • Regex-legacy tests pin the Phase 3 behavior so the cutover comparison has a stable reference point
  • Post-merge live integration verification: an H2 baseline rerun (operational hand-off) will measure the actual citation_precision lift from the new structural extractor
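The regex-legacy tests pin the Phase 3 behavior that ADR-0022 moves away from. As a rough illustration only (a Python stand-in for the deleted C# code; the OPDB URL pattern is an assumption because the real OpdbMachineUrlRegex is not reproduced in this PR, and so is the order-preserving dedup), prose-scanning extraction accepts any well-formed OPDB URL the Wizard writes, whether or not a tool call ever returned it:

```python
import re
from typing import List, Optional

# Assumed pattern; not the actual OpdbMachineUrlRegex.
OPDB_MACHINE_URL = re.compile(r"https://opdb\.org/machines/[A-Za-z0-9-]+")

def extract_regex_legacy(final_text: Optional[str]) -> List[str]:
    """Phase 3-style extraction: scan the Wizard's final prose for OPDB URLs."""
    citations: List[str] = []
    for url in OPDB_MACHINE_URL.findall(final_text or ""):
        if url not in citations:  # keep the first occurrence, drop repeats
            citations.append(url)
    return citations
```

A URL invented in the answer text, such as `https://opdb.org/machines/made-up-42`, is returned as a citation here, which is exactly the case the no-Wizard-prose-counted test rules out for the tool-trace extractor.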

Out of Scope

  • searchCorpus tool integration (Phase 4 scope item 21, separate PR — extends ToolTraceCitationExtractor symmetrically with RetrievedChunk[] results from RAG retrieval)
  • Citation-required guardrail (ADR-0023, Phase 4 scope item 23 — depends on this extractor being structural; that PR uses citations.Count == 0 as the refusal trigger)
  • Removal of RegexLegacyCitationExtractor and RetainRegexCitationCutover flag (deferred to a follow-up PR after H2 baseline confirms parity-or-better)

Checklist

  • CI is green (build + test + coverage + CodeQL + sanitization) — verified pre-push; will re-confirm in PR checks
  • PR title follows the Conventional Commits format above
  • If this is a new architectural decision, an ADR has been added under docs/adr/ — implements ADR-0022 (already merged in Wave 0)
  • If user-visible behavior changes, README.md and/or docs/ are updated in the same PR — citation extraction is internal mechanics; no user-visible change yet (the citation surface in WizardAnswer is unchanged)
  • If a memory in ~/.claude/projects/c--projects-PinballWizard/memory/ is now stale, it has been updated or removed in the same PR — wave0_close handoff already references W1-2 as queued
  • No TODO / FIXME / commented-out code committed
  • No new entries in <NoWarn> without a comment explaining why and the removal criterion — N/A (no Directory.Build.props changes)

Pre-push self-audit

Step 0 — /local-review (qualitative)

  • Ran /local-review and addressed every 🔴 finding before push
  • Local review outcome: 0 🔴 / 2 ⚠️ (both fixed pre-push) / 12 categories ✅
    • Fixed: comment typo in RegexLegacyCitationExtractor.cs:8 referenced wrong property name
    • Fixed: added Extract_SameUrlInDtoAndSubAgentText_Deduplicates and Extract_MultipleFunctionResultsInSingleMessage_AllProcessed test-gap fills

Step 1 — Mechanical checklist

  • Every new *Options property has at least one real getter call in src/: RetainRegexCitationCutover is read at AiRouter.cs:179
  • Sibling-diffed against the closest existing implementation — verified by /local-review § 4; RegexLegacyCitationExtractor preserves the prior inline implementation's behavior exactly so cutover comparison counts will be apples-to-apples
  • No bare catch { } — no new try/catch added
  • New ISourceScraper? — N/A
  • Tests assert behavior, not just structure — every test exercises a specific ADR-0022 invariant or behavior (no-prose-counted, structured grounding, sub-agent text mining, cross-channel dedup, etc.)
  • Build is zero-warning — verified
  • git log -1 --format='%an <%ae>' shows personal noreply, not work email — confirmed 94459922+jkeeley2073@users.noreply.github.com

Phase 4 W1-2 / inherited Phase 3 follow-up #5.

Replaces the AiRouter inline regex over the agent's final prose with
ToolTraceCitationExtractor reading citations from the AgentResponse's
tool-call result trace per ADR-0022. New abstraction
ICitationExtractor (Application/Ai/Citations) localizes the extraction
contract; two impls:

- ToolTraceCitationExtractor: walks AgentResponse.Messages, extracts
  citations from FunctionResultContent. getMachineByTitle results
  produce structured citations from MachineGroundingDto.OpdbSourceUrl;
  connected sub-agent function results (Valuation/Rules/Repair from
  W1-1 wiring) produce citations from OPDB URLs embedded in the
  sub-agent's text reply. The Wizard's outer prose is NOT scanned —
  hallucinated URLs in Wizard text without backing tool-call results
  no longer count.

- RegexLegacyCitationExtractor: the Phase 3 regex impl, kept behind
  AiFoundryOptions.RetainRegexCitationCutover (default true) for
  cutover observability per ADR-0022 § Telemetry. Both extractors
  emit pinwiz.ai.citations.extracted_total{source=...} so a behavioral
  regression in the new tool-trace one would surface before H3.
  RegexLegacyCitationExtractor + the cutover flag get deleted in a
  follow-up PR after H2 confirms parity-or-better.

AiRouter refactored to consume both extractors via concrete-type
injection (the cutover semantics are asymmetric — tool-trace is
authoritative; regex is telemetry-only). The old inline
ExtractCitationsFromText method + OpdbMachineUrlRegex are deleted;
AiRouter no longer needs RegexOptions / GeneratedRegex.

New telemetry: pinwiz.ai.citations.extracted_total counter, tagged
with source (tool_trace | regex_legacy). Documented in the existing
PinballWizardTelemetry instrument inventory.

Tests: 691 → 707 (+16). 9 tool-trace tests cover null, no-tool-result,
structured grounding, null grounding, missing-OPDB-URL, duplicate-call,
sub-agent-text-mining, mixed grounding+sub-agent, and the
no-Wizard-prose-counted invariant. 6 regex-legacy tests pin the Phase
3 behavior so cutover comparison has a stable point.

Build clean, zero warnings.
Pre-push /local-review surfaced 0 🔴 + 2 ⚠️; both addressed:

⚠️ Comment typo in RegexLegacyCitationExtractor referenced the wrong
   property name (RetainRegexCutover → RetainRegexCitationCutover).
   One-line fix.

⚠️ Two test-gap additions:
   - Extract_SameUrlInDtoAndSubAgentText_Deduplicates pins the
     cross-channel deduplication (DTO + sub-agent text returning the
     same OPDB URL collapse to one citation via the shared seenUrls
     HashSet).
   - Extract_MultipleFunctionResultsInSingleMessage_AllProcessed pins
     the inner foreach over message.Contents — Microsoft.Agents.AI
     can pack multiple FunctionResultContent into a single tool
     ChatMessage.

Tests: 707 → 709 (+2). 12 ✅ categories from the review confirmed.
Build clean, zero warnings.
jkeeley2073 added the claude-code (Generated with Claude Code) label on May 8, 2026.
jkeeley2073 merged commit 0f23b7a into main on May 8, 2026.
4 of 5 checks passed.