feat(observability) Phase 2.1 wire pinwiz.ai.tool_duration_ms histogram by jkeeley2073 · Pull Request #127 · Early-Bird-Solutions-LLC/PinballWizard

jkeeley2073 · 2026-05-08T23:46:43Z

Summary

Lead action of the Phase 2 brainstorm half of the observability follow-up (memory/project_observability_followup_per_tool_metrics.md). Closes the pinwiz.rag.retrieval_duration_ms description's forward-reference left dangling on main: "pair with pinwiz.ai.tool_duration_ms (next observability batch)" — this PR is that batch.
New histogram pinwiz.ai.tool_duration_ms (unit ms, tagged tool) wired in MachineGroundingTool.GetMachineByTitleAsync and SearchCorpusTool.SearchCorpusAsync. Drives the docs/architecture-v2.md §7.1 user-delight revisit triggers (200ms p95 structured-records latency for getMachineByTitle, 500ms cold-start for searchCorpus).
Tests: 887/887 passing on this branch; build zero-warning.

Wiring details

Stopwatch + try/finally in both tools. The whitespace-input short-circuit fires BEFORE Stopwatch.StartNew() so dashboards don't see noise samples for empty/whitespace prompts the orchestrator accidentally produced.
SearchCorpusTool outer finally wraps the inner try/catch — the empty-result fail-closed path (ADR-0023 § Negative consequence Reframe project principle: enterprise-quality portfolio bar #3) still emits. Slow-then-empty and fast-then-empty are operationally different alerts; the metric must observe both.
ToolTagValue constants on both tool classes so dashboards, prompts, and tests agree on the label and a future rename can't drift unobserved.

Design pivot — `cache_state` tag dropped

The brainstorm draft (memory/project_observability_followup_per_tool_metrics.md § 2.1) proposed a second cache_state tag. Code-level review found the LRU semantic cache (per ADR-0015) lives in SemanticAnswerCache wrapping AiRouter — above the tools. When a tool fires, the cache state is structurally always "miss-path"; cache_state would be a constant column with zero dashboard signal. Cache effectiveness is observed via the existing pinwiz.ai.cache.hits + pinwiz.ai.cache.misses counters. The pivot is documented in:

The AiToolDurationMs description on PinballWizardTelemetry
A note paragraph under the AI-orchestrator instruments table in docs/observability.md
The follow-up memory entry (memory/project_observability_followup_per_tool_metrics.md § 2.1 "Design pivot")

A future per-tool internal cache (e.g. searchCorpus caching AI-Search query results) is the right time to revisit the tag.

Bonus closure

docs/observability.md was missing a row for the pinwiz.ai.tool_errors_total counter that PR #120 wired (the description claimed both tools tag it but the docs table never got the entry). Added the row alongside tool_duration_ms so both per-tool metrics are documented together.

Tests

Pinning test — AiToolDurationMsHistogram_HasExpectedNameAndUnit in PinballWizardTelemetryTests, matching the pattern used for the OPDB and RAG instrument suites.
Emission tests — six new tests across MachineGroundingToolTests + SearchCorpusToolTests:
- Success path emits with correct tool tag (both tools)
- Fast/no-result/throw paths still emit (no-match / retriever-throws — operational coverage of the "slow vs fast empty" distinction)
- Whitespace short-circuit does NOT emit (no-op-path noise prevention)
Test infrastructure uses ConcurrentBag + Assert.Contains rather than List + Single because parallel test classes that emit to the same process-global Meter produce concurrent measurement callbacks; the original Single assertion was flaky on the empty-result tests for that reason. The new pattern keeps tests parallel-by-default while pinning the actual contract ("tool emits the metric tagged with its name").

Test plan

CI green
Local-review subagent (running in parallel) returns SHIP verdict
Identity check: git log -1 --format='%an <%ae>' on this branch shows the personal noreply

🤖 Generated with Claude Code

Lead action of the Phase 2 brainstorm half of the observability follow-up (memory/project_observability_followup_per_tool_metrics.md). Closes the last cross-reference left dangling after PR #125 — the `pinwiz.rag.retrieval_duration_ms` description on main says "pair with `pinwiz.ai.tool_duration_ms` (next observability batch)" — that next batch is this PR. What this adds: - New `Histogram<double>` `AiToolDurationMs` on PinballWizardTelemetry, emission name `pinwiz.ai.tool_duration_ms`, unit `ms`, tagged `tool` (values: `searchCorpus` | `getMachineByTitle`). - Stopwatch + try/finally instrumentation in both MachineGroundingTool.GetMachineByTitleAsync and SearchCorpusTool.SearchCorpusAsync. Whitespace-input short-circuits fire BEFORE the Stopwatch starts so dashboards don't see noise samples for prompts the orchestrator accidentally produced. The SearchCorpus outer-finally wraps the inner try/catch so the empty-result fail- closed path (ADR-0023 § Negative consequence #3) still emits — slow- then-empty and fast-then-empty are operationally different alerts and the metric must observe both. - ToolTagValue constants on both tool classes so dashboards, prompts, and tests agree on the label and a future rename can't drift. - One pinning test (name + unit) and three emission tests per tool (success / not-found-or-throws-but-still-emits / whitespace-skips emission). Tests use ConcurrentBag for sample collection because parallel test classes that emit to the same process-global Meter produce concurrent measurement callbacks; assertions use Contains rather than Single because the listener captures emissions from any parallel test that hits the same instrument and "exactly one" would couple the test to scheduler. - `docs/observability.md`: new `pinwiz.ai.tool_duration_ms` row in the AI orchestrator instruments table, plus a paragraph explaining why the brainstorm-draft `cache_state` tag was dropped (the LRU semantic cache per ADR-0015 wraps IAiRouter ABOVE the tools, so when a tool fires the cache state is structurally always 'miss-path' — the signal lives on `pinwiz.ai.cache.{hits,misses}` instead). - Bonus: documented the `pinwiz.ai.tool_errors_total` counter that PR #120 wired but didn't add to the docs table. Why now: PR #125 closed the gap-closure half of the observability follow-up (the four `pinwiz.rag.*` instruments documented but not emitting). Phase 2.1 is the brainstorm half's lead action. Discipline per memory entry: gaps closed before brainstorm extension, so the canonical observability spec was honored end-to-end before adding new instruments. Tests: 887/887 passing, build zero-warning.

+            catch (Exception ex)
+            {
+                // Transport-level failures (auth, network, AI Search 5xx)
+                // are caught + surfaced as empty so the model can't loop
+                // on a transient failure. The tool-errors counter (per
+                // ADR-0023 § Negative consequence #3) tags by `tool` so
+                // operator dashboards distinguish retrieval-side failures
+                // from agent-didn't-call-tool cases — both produce empty
+                // citation sets but they need different alerts.
+                PinballWizardTelemetry.AiToolErrors.Add(1,
+                    new KeyValuePair<string, object?>("tool", ToolTagValue));
+                _logger.LogWarning(
+                    ex,
+                    "SearchCorpusTool: retriever threw — returning empty result. query length={QueryLength} machineId={MachineId} documentType={DocumentType} topK={TopK}",
+                    query.Length,
+                    options.MachineId ?? "(any)",
+                    options.DocumentType ?? "(any)",
+                    options.TopK);
+                return new SearchCorpusResult([]);
+            }


+            catch (Exception ex)
+            {
+                // Transport-level failures (auth, network, AI Search 5xx)
+                // are caught + surfaced as empty so the model can't loop
+                // on a transient failure. The tool-errors counter (per
+                // ADR-0023 § Negative consequence #3) tags by `tool` so
+                // operator dashboards distinguish retrieval-side failures
+                // from agent-didn't-call-tool cases — both produce empty
+                // citation sets but they need different alerts.
+                PinballWizardTelemetry.AiToolErrors.Add(1,
+                    new KeyValuePair<string, object?>("tool", ToolTagValue));
+                _logger.LogWarning(
+                    ex,
+                    "SearchCorpusTool: retriever threw — returning empty result. query length={QueryLength} machineId={MachineId} documentType={DocumentType} topK={TopK}",
+                    query.Length,
+                    options.MachineId ?? "(any)",
+                    options.DocumentType ?? "(any)",
+                    options.TopK);
+                return new SearchCorpusResult([]);
+            }


jkeeley2073 enabled auto-merge May 8, 2026 23:46

github-code-quality Bot found potential problems May 8, 2026

View reviewed changes

github-advanced-security AI found potential problems May 8, 2026

View reviewed changes

jkeeley2073 merged commit 1cc69ec into main May 8, 2026
5 checks passed

jkeeley2073 mentioned this pull request May 9, 2026

chore(docs+tests) close two small W3-2 follow-ups #138

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(observability) Phase 2.1 wire pinwiz.ai.tool_duration_ms histogram#127

feat(observability) Phase 2.1 wire pinwiz.ai.tool_duration_ms histogram#127
jkeeley2073 merged 1 commit into
mainfrom
Dev-Phase4Phase2ToolDurationMetric

jkeeley2073 commented May 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jkeeley2073 commented May 8, 2026

Summary

Wiring details

Design pivot — cache_state tag dropped

Bonus closure

Tests

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Design pivot — `cache_state` tag dropped