Skip to content

feat(observability) Phase 2.1 wire pinwiz.ai.tool_duration_ms histogram#127

Merged
jkeeley2073 merged 1 commit into
mainfrom
Dev-Phase4Phase2ToolDurationMetric
May 8, 2026
Merged

feat(observability) Phase 2.1 wire pinwiz.ai.tool_duration_ms histogram#127
jkeeley2073 merged 1 commit into
mainfrom
Dev-Phase4Phase2ToolDurationMetric

Conversation

@jkeeley2073
Copy link
Copy Markdown
Contributor

Summary

  • Lead action of the Phase 2 brainstorm half of the observability follow-up (memory/project_observability_followup_per_tool_metrics.md). Closes the pinwiz.rag.retrieval_duration_ms description's forward-reference left dangling on main: "pair with pinwiz.ai.tool_duration_ms (next observability batch)" — this PR is that batch.
  • New histogram pinwiz.ai.tool_duration_ms (unit ms, tagged tool) wired in MachineGroundingTool.GetMachineByTitleAsync and SearchCorpusTool.SearchCorpusAsync. Drives the docs/architecture-v2.md §7.1 user-delight revisit triggers (200ms p95 structured-records latency for getMachineByTitle, 500ms cold-start for searchCorpus).
  • Tests: 887/887 passing on this branch; build zero-warning.

Wiring details

  • Stopwatch + try/finally in both tools. The whitespace-input short-circuit fires BEFORE Stopwatch.StartNew() so dashboards don't see noise samples for empty/whitespace prompts the orchestrator accidentally produced.
  • SearchCorpusTool outer finally wraps the inner try/catch — the empty-result fail-closed path (ADR-0023 § Negative consequence Reframe project principle: enterprise-quality portfolio bar #3) still emits. Slow-then-empty and fast-then-empty are operationally different alerts; the metric must observe both.
  • ToolTagValue constants on both tool classes so dashboards, prompts, and tests agree on the label and a future rename can't drift unobserved.

Design pivot — cache_state tag dropped

The brainstorm draft (memory/project_observability_followup_per_tool_metrics.md § 2.1) proposed a second cache_state tag. Code-level review found the LRU semantic cache (per ADR-0015) lives in SemanticAnswerCache wrapping AiRouterabove the tools. When a tool fires, the cache state is structurally always "miss-path"; cache_state would be a constant column with zero dashboard signal. Cache effectiveness is observed via the existing pinwiz.ai.cache.hits + pinwiz.ai.cache.misses counters. The pivot is documented in:

  • The AiToolDurationMs description on PinballWizardTelemetry
  • A note paragraph under the AI-orchestrator instruments table in docs/observability.md
  • The follow-up memory entry (memory/project_observability_followup_per_tool_metrics.md § 2.1 "Design pivot")

A future per-tool internal cache (e.g. searchCorpus caching AI-Search query results) is the right time to revisit the tag.

Bonus closure

docs/observability.md was missing a row for the pinwiz.ai.tool_errors_total counter that PR #120 wired (the description claimed both tools tag it but the docs table never got the entry). Added the row alongside tool_duration_ms so both per-tool metrics are documented together.

Tests

  • Pinning testAiToolDurationMsHistogram_HasExpectedNameAndUnit in PinballWizardTelemetryTests, matching the pattern used for the OPDB and RAG instrument suites.
  • Emission tests — six new tests across MachineGroundingToolTests + SearchCorpusToolTests:
    • Success path emits with correct tool tag (both tools)
    • Fast/no-result/throw paths still emit (no-match / retriever-throws — operational coverage of the "slow vs fast empty" distinction)
    • Whitespace short-circuit does NOT emit (no-op-path noise prevention)
  • Test infrastructure uses ConcurrentBag + Assert.Contains rather than List + Single because parallel test classes that emit to the same process-global Meter produce concurrent measurement callbacks; the original Single assertion was flaky on the empty-result tests for that reason. The new pattern keeps tests parallel-by-default while pinning the actual contract ("tool emits the metric tagged with its name").

Test plan

  • CI green
  • Local-review subagent (running in parallel) returns SHIP verdict
  • Identity check: git log -1 --format='%an <%ae>' on this branch shows the personal noreply

🤖 Generated with Claude Code

Lead action of the Phase 2 brainstorm half of the observability follow-up
(memory/project_observability_followup_per_tool_metrics.md). Closes the
last cross-reference left dangling after PR #125 — the
`pinwiz.rag.retrieval_duration_ms` description on main says "pair with
`pinwiz.ai.tool_duration_ms` (next observability batch)" — that next
batch is this PR.

What this adds:
- New `Histogram<double>` `AiToolDurationMs` on PinballWizardTelemetry,
  emission name `pinwiz.ai.tool_duration_ms`, unit `ms`, tagged `tool`
  (values: `searchCorpus` | `getMachineByTitle`).
- Stopwatch + try/finally instrumentation in both
  MachineGroundingTool.GetMachineByTitleAsync and
  SearchCorpusTool.SearchCorpusAsync. Whitespace-input short-circuits
  fire BEFORE the Stopwatch starts so dashboards don't see noise samples
  for prompts the orchestrator accidentally produced. The SearchCorpus
  outer-finally wraps the inner try/catch so the empty-result fail-
  closed path (ADR-0023 § Negative consequence #3) still emits — slow-
  then-empty and fast-then-empty are operationally different alerts and
  the metric must observe both.
- ToolTagValue constants on both tool classes so dashboards, prompts,
  and tests agree on the label and a future rename can't drift.
- One pinning test (name + unit) and three emission tests per tool
  (success / not-found-or-throws-but-still-emits / whitespace-skips
  emission). Tests use ConcurrentBag for sample collection because
  parallel test classes that emit to the same process-global Meter
  produce concurrent measurement callbacks; assertions use Contains
  rather than Single because the listener captures emissions from any
  parallel test that hits the same instrument and "exactly one" would
  couple the test to scheduler.
- `docs/observability.md`: new `pinwiz.ai.tool_duration_ms` row in the
  AI orchestrator instruments table, plus a paragraph explaining why
  the brainstorm-draft `cache_state` tag was dropped (the LRU semantic
  cache per ADR-0015 wraps IAiRouter ABOVE the tools, so when a tool
  fires the cache state is structurally always 'miss-path' — the
  signal lives on `pinwiz.ai.cache.{hits,misses}` instead).
- Bonus: documented the `pinwiz.ai.tool_errors_total` counter that PR
  #120 wired but didn't add to the docs table.

Why now: PR #125 closed the gap-closure half of the observability
follow-up (the four `pinwiz.rag.*` instruments documented but not
emitting). Phase 2.1 is the brainstorm half's lead action. Discipline
per memory entry: gaps closed before brainstorm extension, so the
canonical observability spec was honored end-to-end before adding new
instruments.

Tests: 887/887 passing, build zero-warning.
@jkeeley2073 jkeeley2073 enabled auto-merge May 8, 2026 23:46
Comment on lines +109 to +128
catch (Exception ex)
{
// Transport-level failures (auth, network, AI Search 5xx)
// are caught + surfaced as empty so the model can't loop
// on a transient failure. The tool-errors counter (per
// ADR-0023 § Negative consequence #3) tags by `tool` so
// operator dashboards distinguish retrieval-side failures
// from agent-didn't-call-tool cases — both produce empty
// citation sets but they need different alerts.
PinballWizardTelemetry.AiToolErrors.Add(1,
new KeyValuePair<string, object?>("tool", ToolTagValue));
_logger.LogWarning(
ex,
"SearchCorpusTool: retriever threw — returning empty result. query length={QueryLength} machineId={MachineId} documentType={DocumentType} topK={TopK}",
query.Length,
options.MachineId ?? "(any)",
options.DocumentType ?? "(any)",
options.TopK);
return new SearchCorpusResult([]);
}
Comment on lines +109 to +128
catch (Exception ex)
{
// Transport-level failures (auth, network, AI Search 5xx)
// are caught + surfaced as empty so the model can't loop
// on a transient failure. The tool-errors counter (per
// ADR-0023 § Negative consequence #3) tags by `tool` so
// operator dashboards distinguish retrieval-side failures
// from agent-didn't-call-tool cases — both produce empty
// citation sets but they need different alerts.
PinballWizardTelemetry.AiToolErrors.Add(1,
new KeyValuePair<string, object?>("tool", ToolTagValue));
_logger.LogWarning(
ex,
"SearchCorpusTool: retriever threw — returning empty result. query length={QueryLength} machineId={MachineId} documentType={DocumentType} topK={TopK}",
query.Length,
options.MachineId ?? "(any)",
options.DocumentType ?? "(any)",
options.TopK);
return new SearchCorpusResult([]);
}
@jkeeley2073 jkeeley2073 merged commit 1cc69ec into main May 8, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants