feat(ai) Phase 4 W1-2 — tool-trace citation extraction (ADR-0022) #97
Merged
Phase 4 W1-2 / inherited Phase 3 follow-up #5. Replaces the AiRouter inline regex over the agent's final prose with ToolTraceCitationExtractor, which reads citations from the AgentResponse's tool-call result trace per ADR-0022.

New abstraction ICitationExtractor (Application/Ai/Citations) localizes the extraction contract; two impls:

- ToolTraceCitationExtractor: walks AgentResponse.Messages, extracts citations from FunctionResultContent. getMachineByTitle results produce structured citations from MachineGroundingDto.OpdbSourceUrl; connected sub-agent function results (Valuation/Rules/Repair from W1-1 wiring) produce citations from OPDB URLs embedded in the sub-agent's text reply. The Wizard's outer prose is NOT scanned, so hallucinated URLs in Wizard text without backing tool-call results no longer count.
- RegexLegacyCitationExtractor: the Phase 3 regex impl, kept behind AiFoundryOptions.RetainRegexCitationCutover (default true) for cutover observability per ADR-0022 § Telemetry. Both extractors emit pinwiz.ai.citations.extracted_total{source=...} so a behavioral regression in the new tool-trace one would surface before H3. RegexLegacyCitationExtractor + the cutover flag get deleted in a follow-up PR after H2 confirms parity-or-better.

AiRouter refactored to consume both extractors via concrete-type injection (the cutover semantics are asymmetric: tool-trace is authoritative; regex is telemetry-only). The old inline ExtractCitationsFromText method + OpdbMachineUrlRegex are deleted; AiRouter no longer needs RegexOptions / GeneratedRegex.

New telemetry: pinwiz.ai.citations.extracted_total counter, tagged with source (tool_trace | regex_legacy). Documented in the existing PinballWizardTelemetry instrument inventory.

Tests: 691 → 707 (+16). 9 tool-trace tests cover null, no-tool-result, structured grounding, null grounding, missing-OPDB-URL, duplicate-call, sub-agent-text-mining, mixed grounding+sub-agent, and the no-Wizard-prose-counted invariant. 6 regex-legacy tests pin the Phase 3 behavior so cutover comparison has a stable point. Build clean, zero warnings.
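The asymmetric cutover described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `RouterSketch`, `ToolTraceStub`, `RegexLegacyStub`, the meter name, and the placeholder URLs are all hypothetical, and the stubs return canned lists instead of inspecting an AgentResponse. For brevity the counter is emitted from the router here, whereas the PR says the extractors emit it themselves.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Usage: the legacy path "finds" an extra URL, but only the tool-trace result is returned.
var router = new RouterSketch(
    new ToolTraceStub(new[] { "https://opdb.org/machines/example-a" }),
    new RegexLegacyStub(new[]
    {
        "https://opdb.org/machines/example-a",
        "https://opdb.org/machines/example-b",
    }),
    retainRegexCitationCutover: true);

IReadOnlyList<string> citations = router.ExtractCitations();
Console.WriteLine(string.Join(", ", citations));

// Hypothetical stand-in for the authoritative extractor.
sealed class ToolTraceStub
{
    private readonly IReadOnlyList<string> _urls;
    public ToolTraceStub(IReadOnlyList<string> urls) => _urls = urls;
    public IReadOnlyList<string> Extract() => _urls;
}

// Hypothetical stand-in for the legacy, telemetry-only extractor.
sealed class RegexLegacyStub
{
    private readonly IReadOnlyList<string> _urls;
    public RegexLegacyStub(IReadOnlyList<string> urls) => _urls = urls;
    public IReadOnlyList<string> Extract() => _urls;
}

sealed class RouterSketch
{
    // Meter name is an assumption; the counter name and tag values come from the PR text.
    private static readonly Meter SketchMeter = new("PinballWizard.Sketch");
    private static readonly Counter<long> Extracted =
        SketchMeter.CreateCounter<long>("pinwiz.ai.citations.extracted_total");

    private readonly ToolTraceStub _toolTrace;
    private readonly RegexLegacyStub _regexLegacy;
    private readonly bool _retainRegexCitationCutover;

    public RouterSketch(ToolTraceStub toolTrace, RegexLegacyStub regexLegacy, bool retainRegexCitationCutover)
    {
        _toolTrace = toolTrace;
        _regexLegacy = regexLegacy;
        _retainRegexCitationCutover = retainRegexCitationCutover;
    }

    public IReadOnlyList<string> ExtractCitations()
    {
        // Authoritative path: these citations are what the caller receives.
        IReadOnlyList<string> authoritative = _toolTrace.Extract();
        Extracted.Add(authoritative.Count, new KeyValuePair<string, object?>("source", "tool_trace"));

        if (_retainRegexCitationCutover)
        {
            // Telemetry-only path: count what the legacy extractor would have found, then discard it.
            int legacyCount = _regexLegacy.Extract().Count;
            Extracted.Add(legacyCount, new KeyValuePair<string, object?>("source", "regex_legacy"));
        }

        return authoritative;
    }
}
```

Injecting two concrete types rather than a collection of ICitationExtractor keeps the roles explicit: one result is returned, the other only feeds the counter while the flag is on.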
Pre-push /local-review surfaced 0 🔴 + 2 ⚠️; both addressed:

⚠️ Comment typo in RegexLegacyCitationExtractor referenced the wrong property name (RetainRegexCutover → RetainRegexCitationCutover). One-line fix.

⚠️ Two test-gap additions:

- Extract_SameUrlInDtoAndSubAgentText_Deduplicates pins the cross-channel deduplication (DTO + sub-agent text returning the same OPDB URL collapse to one citation via the shared seenUrls HashSet).
- Extract_MultipleFunctionResultsInSingleMessage_AllProcessed pins the inner foreach over message.Contents; Microsoft.Agents.AI can pack multiple FunctionResultContent into a single tool ChatMessage.

Tests: 707 → 709 (+2). 12 ✅ categories from the review confirmed. Build clean, zero warnings.
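The two behaviors pinned by these tests (a shared seenUrls set across both citation channels, and an inner loop over every content item in a message) might look like the sketch below. The types `TraceMessage`, `ToolResultContent`, `ProseContent`, and `GroundingDto` are hypothetical stand-ins for ChatMessage, FunctionResultContent, assistant text, and MachineGroundingDto, and the OPDB URL pattern is an assumption because the real pattern is not shown in this PR.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// One tool message carrying two results, plus an assistant prose message.
var trace = new[]
{
    new TraceMessage(new Content[]
    {
        new ToolResultContent(new GroundingDto("https://opdb.org/machines/abc")),
        new ToolResultContent("Valued per https://opdb.org/machines/abc and https://opdb.org/machines/def."),
    }),
    new TraceMessage(new Content[] { new ProseContent("See https://opdb.org/machines/made-up") }),
};

List<string> found = ToolTraceSketch.Extract(trace);
Console.WriteLine(string.Join(", ", found));
// https://opdb.org/machines/abc, https://opdb.org/machines/def

static class ToolTraceSketch
{
    // Assumed URL shape for illustration only.
    private static readonly Regex OpdbUrl = new(@"https://opdb\.org/machines/[A-Za-z0-9_-]+");

    public static List<string> Extract(IEnumerable<TraceMessage>? messages)
    {
        var citations = new List<string>();
        if (messages is null) return citations;

        // Shared across both channels so a DTO URL and the same URL in sub-agent text collapse to one.
        var seenUrls = new HashSet<string>();

        foreach (var message in messages)
        {
            // Inner loop: a single message may carry several tool results.
            foreach (var content in message.Contents)
            {
                if (content is not ToolResultContent tool) continue; // prose is never scanned

                switch (tool.Result)
                {
                    case GroundingDto { OpdbSourceUrl: { Length: > 0 } url }:
                        if (seenUrls.Add(url)) citations.Add(url);
                        break;
                    case string subAgentReply:
                        foreach (Match m in OpdbUrl.Matches(subAgentReply))
                        {
                            if (seenUrls.Add(m.Value)) citations.Add(m.Value);
                        }
                        break;
                }
            }
        }

        return citations;
    }
}

abstract record Content;
record ProseContent(string Text) : Content;          // stand-in for assistant text
record ToolResultContent(object? Result) : Content;  // stand-in for FunctionResultContent
record GroundingDto(string? OpdbSourceUrl);          // stand-in for MachineGroundingDto
record TraceMessage(IReadOnlyList<Content> Contents); // stand-in for ChatMessage
```

The prose message contributes nothing because only tool-result content is ever inspected, which is the structural property the PR relies on.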
Summary
Phase 4 Wave 1 PR W1-2 — closes § Scope item 10 / inherited Phase 3 follow-up #5. Implements ADR-0022: citations come from the agent's tool-call result trace, not regex over the Wizard's outer prose.
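To illustrate why regex over the Wizard's outer prose was a problem, here is a small sketch of the Phase 3 style behavior. The URL pattern and the sample sentence are assumptions made up for the example; the real regex is not reproduced in this PR description.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Assumed URL shape for illustration only.
var opdbUrl = new Regex(@"https://opdb\.org/machines/[A-Za-z0-9_-]+");

// Final assistant prose: nothing in it is tied to a tool-call result.
string wizardProse = "Medieval Madness is documented at https://opdb.org/machines/not-retrieved.";

// Phase 3 style: every URL-shaped substring in the prose becomes a citation.
string[] legacyCitations = opdbUrl.Matches(wizardProse)
    .Cast<Match>()
    .Select(m => m.Value)
    .Distinct()
    .ToArray();

Console.WriteLine(legacyCitations.Length); // 1, although no tool returned this URL
```

A prose-based extractor has no way to tell a retrieved URL from an invented one, which is the gap that reading the tool-call result trace closes.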
New abstraction
ICitationExtractor (Application/Ai/Citations/) with two impls:

- ToolTraceCitationExtractor: walks AgentResponse.Messages and extracts citations from FunctionResultContent instances. getMachineByTitle results produce structured citations from MachineGroundingDto.OpdbSourceUrl; connected sub-agent function results (Valuation/Rules/Repair from the PR #96 wiring, "feat(ai) Phase 4 W1-1 — connected-agents wiring on Wizard via AsAIFunction") produce citations from OPDB URLs embedded in the sub-agent's text reply. The Wizard's outer assistant text is structurally unreachable by the extractor, so hallucinated URLs in Wizard prose without backing tool-call results no longer count.
- RegexLegacyCitationExtractor: the Phase 3 regex impl, kept behind AiFoundryOptions.RetainRegexCitationCutover (default true) for the cutover observability window per ADR-0022 § Telemetry. Both extractors emit pinwiz.ai.citations.extracted_total{source=...} so a behavioral regression in the new tool-trace one would surface in metrics before H3 baseline. Counter, type, unit, and tag shape exactly match the ADR spec. After H2 confirms parity-or-better, RegexLegacyCitationExtractor + the cutover flag get deleted in a follow-up PR.

AiRouter refactored: inline ExtractCitationsFromText + OpdbMachineUrlRegex deleted; AiRouter is no longer partial; using System.Text.RegularExpressions removed. Both extractors are concrete-type-injected into AiRouter (the cutover semantics are asymmetric: tool-trace authoritative, regex_legacy telemetry-only).

This PR + W1-1 (PR #96) together address the two structural causes of Phase 3's H2 baseline citation_precision=0.133: connected-agents wiring (W1-1) made sub-agent grounding reachable, and tool-trace extraction (W1-2) makes citations structurally tied to what the agent actually retrieved.

Test Plan
- dotnet build PinballWizard.slnx: Build succeeded, 0 Warning(s), 0 Error(s)
- dotnet test PinballWizard.slnx: Passed: 709, Failed: 0, Skipped: 0 (was 691; +18 new tests = 11 tool-trace + 6 regex-legacy + 1 cross-channel-dedup + 1 multi-FRC-in-message)
- citation_precision lift from the new structural extractor

Out of Scope
- searchCorpus tool integration (Phase 4 scope item 21, separate PR; extends ToolTraceCitationExtractor symmetrically with RetrievedChunk[] results from RAG retrieval)
- citations.Count == 0 as the refusal trigger)
- RegexLegacyCitationExtractor and RetainRegexCitationCutover flag (deferred to a follow-up PR after H2 baseline confirms parity-or-better)

Checklist
- docs/adr/ - implements ADR-0022 (already merged in Wave 0)
- WizardAnswer is unchanged)
- ~/.claude/projects/c--projects-PinballWizard/memory/ is now stale, it has been updated or removed in the same PR - wave0_close handoff already references W1-2 as queued
- TODO / FIXME / commented-out code committed
- <NoWarn> without a comment explaining why and the removal criterion - N/A (no Directory.Build.props changes)

Pre-push self-audit
Step 0 — /local-review (qualitative)

- /local-review and addressed every 🔴 finding before push
- RegexLegacyCitationExtractor.cs:8 referenced wrong property name
- Extract_SameUrlInDtoAndSubAgentText_Deduplicates and Extract_MultipleFunctionResultsInSingleMessage_AllProcessed test-gap fills

Step 1 — Mechanical checklist
- *Options property has at least one real getter call in src/ - RetainRegexCitationCutover is read at AiRouter.cs:179
- /local-review § 4; RegexLegacyCitationExtractor preserves the prior inline implementation's behavior exactly so cutover comparison counts will be apples-to-apples
- catch { } - no new try/catch added
- ISourceScraper? - N/A
- git log -1 --format='%an <%ae>' shows personal noreply, not work email - confirmed 94459922+jkeeley2073@users.noreply.github.com