feat(observability) Phase 2.1 wire pinwiz.ai.tool_duration_ms histogram#127
Merged
Conversation
Lead action of the Phase 2 brainstorm half of the observability follow-up (memory/project_observability_followup_per_tool_metrics.md). Closes the last cross-reference left dangling after PR #125 — the `pinwiz.rag.retrieval_duration_ms` description on main says "pair with `pinwiz.ai.tool_duration_ms` (next observability batch)" — that next batch is this PR. What this adds: - New `Histogram<double>` `AiToolDurationMs` on PinballWizardTelemetry, emission name `pinwiz.ai.tool_duration_ms`, unit `ms`, tagged `tool` (values: `searchCorpus` | `getMachineByTitle`). - Stopwatch + try/finally instrumentation in both MachineGroundingTool.GetMachineByTitleAsync and SearchCorpusTool.SearchCorpusAsync. Whitespace-input short-circuits fire BEFORE the Stopwatch starts so dashboards don't see noise samples for prompts the orchestrator accidentally produced. The SearchCorpus outer-finally wraps the inner try/catch so the empty-result fail- closed path (ADR-0023 § Negative consequence #3) still emits — slow- then-empty and fast-then-empty are operationally different alerts and the metric must observe both. - ToolTagValue constants on both tool classes so dashboards, prompts, and tests agree on the label and a future rename can't drift. - One pinning test (name + unit) and three emission tests per tool (success / not-found-or-throws-but-still-emits / whitespace-skips emission). Tests use ConcurrentBag for sample collection because parallel test classes that emit to the same process-global Meter produce concurrent measurement callbacks; assertions use Contains rather than Single because the listener captures emissions from any parallel test that hits the same instrument and "exactly one" would couple the test to scheduler. - `docs/observability.md`: new `pinwiz.ai.tool_duration_ms` row in the AI orchestrator instruments table, plus a paragraph explaining why the brainstorm-draft `cache_state` tag was dropped (the LRU semantic cache per ADR-0015 wraps IAiRouter ABOVE the tools, so when a tool fires the cache state is structurally always 'miss-path' — the signal lives on `pinwiz.ai.cache.{hits,misses}` instead). - Bonus: documented the `pinwiz.ai.tool_errors_total` counter that PR #120 wired but didn't add to the docs table. Why now: PR #125 closed the gap-closure half of the observability follow-up (the four `pinwiz.rag.*` instruments documented but not emitting). Phase 2.1 is the brainstorm half's lead action. Discipline per memory entry: gaps closed before brainstorm extension, so the canonical observability spec was honored end-to-end before adding new instruments. Tests: 887/887 passing, build zero-warning.
Comment on lines
+109
to
+128
| catch (Exception ex) | ||
| { | ||
| // Transport-level failures (auth, network, AI Search 5xx) | ||
| // are caught + surfaced as empty so the model can't loop | ||
| // on a transient failure. The tool-errors counter (per | ||
| // ADR-0023 § Negative consequence #3) tags by `tool` so | ||
| // operator dashboards distinguish retrieval-side failures | ||
| // from agent-didn't-call-tool cases — both produce empty | ||
| // citation sets but they need different alerts. | ||
| PinballWizardTelemetry.AiToolErrors.Add(1, | ||
| new KeyValuePair<string, object?>("tool", ToolTagValue)); | ||
| _logger.LogWarning( | ||
| ex, | ||
| "SearchCorpusTool: retriever threw — returning empty result. query length={QueryLength} machineId={MachineId} documentType={DocumentType} topK={TopK}", | ||
| query.Length, | ||
| options.MachineId ?? "(any)", | ||
| options.DocumentType ?? "(any)", | ||
| options.TopK); | ||
| return new SearchCorpusResult([]); | ||
| } |
Comment on lines
+109
to
+128
| catch (Exception ex) | ||
| { | ||
| // Transport-level failures (auth, network, AI Search 5xx) | ||
| // are caught + surfaced as empty so the model can't loop | ||
| // on a transient failure. The tool-errors counter (per | ||
| // ADR-0023 § Negative consequence #3) tags by `tool` so | ||
| // operator dashboards distinguish retrieval-side failures | ||
| // from agent-didn't-call-tool cases — both produce empty | ||
| // citation sets but they need different alerts. | ||
| PinballWizardTelemetry.AiToolErrors.Add(1, | ||
| new KeyValuePair<string, object?>("tool", ToolTagValue)); | ||
| _logger.LogWarning( | ||
| ex, | ||
| "SearchCorpusTool: retriever threw — returning empty result. query length={QueryLength} machineId={MachineId} documentType={DocumentType} topK={TopK}", | ||
| query.Length, | ||
| options.MachineId ?? "(any)", | ||
| options.DocumentType ?? "(any)", | ||
| options.TopK); | ||
| return new SearchCorpusResult([]); | ||
| } |
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
memory/project_observability_followup_per_tool_metrics.md). Closes thepinwiz.rag.retrieval_duration_msdescription's forward-reference left dangling on main: "pair withpinwiz.ai.tool_duration_ms(next observability batch)" — this PR is that batch.pinwiz.ai.tool_duration_ms(unitms, taggedtool) wired inMachineGroundingTool.GetMachineByTitleAsyncandSearchCorpusTool.SearchCorpusAsync. Drives thedocs/architecture-v2.md§7.1 user-delight revisit triggers (200ms p95 structured-records latency forgetMachineByTitle, 500ms cold-start forsearchCorpus).Wiring details
Stopwatch.StartNew()so dashboards don't see noise samples for empty/whitespace prompts the orchestrator accidentally produced.SearchCorpusToolouter finally wraps the inner try/catch — the empty-result fail-closed path (ADR-0023 § Negative consequence Reframe project principle: enterprise-quality portfolio bar #3) still emits. Slow-then-empty and fast-then-empty are operationally different alerts; the metric must observe both.ToolTagValueconstants on both tool classes so dashboards, prompts, and tests agree on the label and a future rename can't drift unobserved.Design pivot —
cache_statetag droppedThe brainstorm draft (
memory/project_observability_followup_per_tool_metrics.md§ 2.1) proposed a secondcache_statetag. Code-level review found the LRU semantic cache (per ADR-0015) lives inSemanticAnswerCachewrappingAiRouter— above the tools. When a tool fires, the cache state is structurally always "miss-path";cache_statewould be a constant column with zero dashboard signal. Cache effectiveness is observed via the existingpinwiz.ai.cache.hits+pinwiz.ai.cache.missescounters. The pivot is documented in:AiToolDurationMsdescription onPinballWizardTelemetrydocs/observability.mdmemory/project_observability_followup_per_tool_metrics.md§ 2.1 "Design pivot")A future per-tool internal cache (e.g.
searchCorpuscaching AI-Search query results) is the right time to revisit the tag.Bonus closure
docs/observability.mdwas missing a row for thepinwiz.ai.tool_errors_totalcounter that PR #120 wired (the description claimed both tools tag it but the docs table never got the entry). Added the row alongsidetool_duration_msso both per-tool metrics are documented together.Tests
AiToolDurationMsHistogram_HasExpectedNameAndUnitinPinballWizardTelemetryTests, matching the pattern used for the OPDB and RAG instrument suites.MachineGroundingToolTests+SearchCorpusToolTests:tooltag (both tools)ConcurrentBag+Assert.Containsrather thanList+Singlebecause parallel test classes that emit to the same process-global Meter produce concurrent measurement callbacks; the originalSingleassertion was flaky on the empty-result tests for that reason. The new pattern keeps tests parallel-by-default while pinning the actual contract ("tool emits the metric tagged with its name").Test plan
git log -1 --format='%an <%ae>'on this branch shows the personal noreply🤖 Generated with Claude Code