feat(ai) cost-attribution telemetry + per-call ceiling + observability docs (Wave 3 PR 7) #91

Merged
jkeeley2073 merged 1 commit into main from Dev-Phase3CostTelemetryAndCeiling
May 7, 2026

@jkeeley2073

Summary

Wave 3 PR 7 — closes the cost half of Wave 3. Per ADR-0015. PR 8 (eval harness) is in flight in a worktree-isolated subagent and lands separately; the two only conflict on Program.cs (which PR 7 does NOT touch).

What ships

Cost calculation surface

  • AiFoundryOptions.PricingTable — per-deployment USD-cent rates (per 1k input + output tokens), defaults populated for the three Phase 3 deployments from May 2026 Azure OpenAI public pricing:
    • gpt-4o-mini: 0.015c/1k in, 0.060c/1k out
    • gpt-4-1 (mapping to gpt-4.1): 0.20c/1k in, 0.80c/1k out
    • text-embedding-3-large: 0.013c/1k uniform
  • ModelPricing record — Industry units (per 1k); cost-counter values stay readable in dashboards
  • AiCostCalculator + IAiCostCalculator — Pure-function calculator. Missing rows → 0c + debug log
  • TokenUsage record — Per-call token-count snapshot (DeploymentName + InputTokens + OutputTokens)
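The calculation described above can be sketched as follows. This is a minimal illustration in Python, not the PR's C# code: the names `calculate_cost_cents`, `PRICING_TABLE`, and `per_million_usd_to_cents_per_1k` are hypothetical, `ValueError` stands in for whatever exception the real calculator throws on null usage, and "uniform" embedding pricing is assumed to mean the same rate for input and output.

```python
import logging
from dataclasses import dataclass
from typing import Dict, Optional

logger = logging.getLogger("ai.cost")


@dataclass(frozen=True)
class ModelPricing:
    input_cents_per_1k: float
    output_cents_per_1k: float


@dataclass(frozen=True)
class TokenUsage:
    deployment_name: str
    input_tokens: int
    output_tokens: int


# Default rates quoted in this PR, in US cents per 1k tokens.
# Assumption: "uniform" means the embedding rate applies to input and output alike.
PRICING_TABLE: Dict[str, ModelPricing] = {
    "gpt-4o-mini": ModelPricing(0.015, 0.060),
    "gpt-4-1": ModelPricing(0.20, 0.80),
    "text-embedding-3-large": ModelPricing(0.013, 0.013),
}


def per_million_usd_to_cents_per_1k(usd_per_million: float) -> float:
    """Convert a USD-per-1M-token price into cents per 1k tokens."""
    return usd_per_million * 100 / 1000


def calculate_cost_cents(
    usage: Optional[TokenUsage],
    table: Dict[str, ModelPricing] = PRICING_TABLE,
) -> float:
    """Pure function: token counts in, cost in US cents out."""
    if usage is None:
        raise ValueError("usage must not be None")
    pricing = table.get(usage.deployment_name)
    if pricing is None:
        # Best-effort: an unknown deployment costs 0c and leaves a debug trace.
        logger.debug("no pricing row for deployment %r", usage.deployment_name)
        return 0.0
    return (
        usage.input_tokens / 1000 * pricing.input_cents_per_1k
        + usage.output_tokens / 1000 * pricing.output_cents_per_1k
    )
```

For example, 2,000 input tokens and 500 output tokens on gpt-4o-mini cost 2 × 0.015c + 0.5 × 0.060c = 0.06c.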

Token-usage extraction abstraction

  • ITokenUsageReader + NullTokenUsageReader — Abstraction seam. As of Microsoft.Agents.AI 1.4.0 GA the SDK doesn't expose a stable Usage surface on response objects (agent-framework#2688 — open feature request). NullTokenUsageReader (default) returns null so cost stays at 0c until the SDK lands the API. Pricing + ceiling machinery is in place so the swap is a one-class change in a follow-up PR. Interface accepts object so the type contract doesn't break across SDK reshuffles
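A rough Python sketch of this seam, under stated assumptions: the `try_read` signature is a guess at the shape of `ITokenUsageReader.TryRead`, and `AttributeTokenUsageReader` is a purely hypothetical future implementation that assumes the response will one day carry a `usage` object with `input_tokens` and `output_tokens`. The SDK exposes no such surface today.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class TokenUsage:
    deployment_name: str
    input_tokens: int
    output_tokens: int


class TokenUsageReader(Protocol):
    # Accepts a plain object so the contract survives SDK type renames.
    def try_read(self, response: object, deployment_name: str) -> Optional[TokenUsage]:
        ...


class NullTokenUsageReader:
    """Default reader: never finds usage, so cost stays at 0c."""

    def try_read(self, response: object, deployment_name: str) -> Optional[TokenUsage]:
        return None


class AttributeTokenUsageReader:
    """Hypothetical replacement once the SDK exposes usage on responses."""

    def try_read(self, response: object, deployment_name: str) -> Optional[TokenUsage]:
        usage = getattr(response, "usage", None)
        inp = getattr(usage, "input_tokens", None)
        out = getattr(usage, "output_tokens", None)
        if not isinstance(inp, int) or not isinstance(out, int):
            return None  # Shape not recognised: report "no usage", do not guess.
        return TokenUsage(deployment_name, inp, out)
```

Because callers depend only on the protocol, swapping the null reader for a real one would be a registration change and would not touch the router.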

AiRouter integration

  • After agent invocation, attempts ITokenUsageReader.TryRead; on hit, computes cost via IAiCostCalculator and tags pinwiz.ai.cost_usd_cents (with model + sub_agent + prompt_version attributes per ADR-0015)
  • Per-call cost ceiling: if cumulative cost exceeds AiFoundryOptions.PerCallCostCeilingUsdCents (default 10 cents, i.e. $0.10), the router returns a refusal with RefusalCategory.CostCeilingHit (previously-reserved enum value, now wired)
  • AgentResponse type name correction: the actual Microsoft.Agents.AI 1.4.0 type is AgentResponse (NOT AgentRunResponse — Microsoft Learn docs are ahead of the package). Fixed in the new router branches; existing code already used var so prior PRs were unaffected
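The router flow described above might look roughly like this. It is a Python sketch under assumptions, not the AiRouter code: `route_call`, `RouterResult`, and the `recorded_costs` list (a stand-in for the pinwiz.ai.cost_usd_cents counter) are hypothetical names, and the strict `>` comparison is one reading of "exceeded".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class RefusalCategory(Enum):
    COST_CEILING_HIT = "cost_ceiling_hit"


@dataclass(frozen=True)
class RouterResult:
    text: str
    cumulative_cost_cents: float
    refusal: Optional[RefusalCategory] = None


# Stand-in for the cost counter: (value, attributes) pairs.
recorded_costs: List[Tuple[float, Dict[str, str]]] = []


def route_call(
    invoke_agent: Callable[[], str],
    read_cost_cents: Callable[[str], Optional[float]],
    attributes: Dict[str, str],
    ceiling_cents: float = 10.0,  # default ceiling of $0.10
    spent_so_far_cents: float = 0.0,
) -> RouterResult:
    response = invoke_agent()
    cost = read_cost_cents(response)
    if cost is None:
        # No usage available (the null-reader case): nothing to record or enforce.
        return RouterResult(response, spent_so_far_cents)
    # Tag the cost with model / sub_agent / prompt_version attributes.
    recorded_costs.append((cost, dict(attributes)))
    cumulative = spent_so_far_cents + cost
    if cumulative > ceiling_cents:
        return RouterResult(
            "This request exceeded its cost limit. Please ask a more focused question.",
            cumulative,
            RefusalCategory.COST_CEILING_HIT,
        )
    return RouterResult(response, cumulative)
```

With the null reader in place, `read_cost_cents` always yields `None`, so the ceiling branch is unreachable in production until a real reader lands.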

docs/observability.md

  • New section AI orchestrator instruments (Phase 3) with the full pinwiz.ai.* inventory + the inherited gen_ai.* attribute list (do NOT duplicate those into pinwiz.* per ADR-0015 lean posture)
  • New section Daily AI cost aggregation (Phase 6 deploy gate) with the KQL query template Phase 6 alert rules pin against
  • Deferred section updated: "Real ITokenUsageReader impl pending microsoft/agent-framework#2688 (.NET Agent Performance Metrics and Token Usage)" as explicit revisit trigger
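The KQL template itself lives in docs/observability.md and is not reproduced here. As a rough illustration of what a daily cost aggregation computes, the following Python sketch sums cost samples per UTC day and model; `daily_cost_cents` and the sample tuple shape are assumptions made for this example and are not taken from the docs.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple


def daily_cost_cents(
    samples: Iterable[Tuple[datetime, str, float]],
) -> Dict[Tuple[str, str], float]:
    """Sum cost samples (timestamp, model, cents) per (UTC day, model).

    Timestamps are expected to be timezone-aware.
    """
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for timestamp, model, cents in samples:
        day = timestamp.astimezone(timezone.utc).date().isoformat()
        totals[(day, model)] += cents
    return dict(totals)
```

An alert rule would then compare each day's total against a budget threshold.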

Tests

10 new AiCostCalculatorTests:

  • known-deployment per-k rates, heavy-tier higher rate, unknown-deployment-zero, zero-tokens-zero, empty-deployment-zero, null-usage-throws, empty-pricing-table-zero, ctor-arg-null guards, default-pricing-table-shape pin

629/629 tests passing (was 619, +10 new).

Audit posture

  • Build green: 0 warnings, 0 errors under TreatWarningsAsErrors
  • Tests green: 629/629
  • Identity check ✅
  • No bare catch
  • No XML doc comments per feedback_no_xml_docs.md
  • Sibling-diff against ConfidenceCalculator / existing options patterns: matches
  • Every option field is read (PricingTable, PerCallCostCeilingUsdCents consumed by AiRouter)
  • [GeneratedRegex] etc. — no new analyzer suppressions

What this PR does NOT do (intentional)

  • Real token usage extraction: NullTokenUsageReader is a placeholder until Microsoft.Agents.AI exposes Usage on AgentResponse (issue #2688). Cost stays at 0c in production until that swap; pricing + ceiling machinery is ready
  • Program.cs changes — none, so PR 7 + PR 8 don't conflict on the CLI surface
  • pinwiz.ai.escalations population — counter exists from PR 4; populated in a future PR when sub-agent escalation routing trace is read from Foundry's connected-agents dispatch
  • Unit tests for AiRouter cost-ceiling path — would require mocking IAiAgent which is sealed in Microsoft.Agents.AI. The cost-ceiling path is exercised end-to-end via the eval harness (PR 8); H2 hand-off is the live validation

Test plan

  • Build green (0 warnings, 0 errors)
  • Tests green (629/629)
  • Identity check (personal noreply on commit)
  • DI gating verified (NullTokenUsageReader registered as singleton; AiCostCalculator registered)
  • Reviewer confirms the deferred-real-impl approach is acceptable until Microsoft.Agents.AI lands Usage on AgentResponse
  • After PR 8 + H2 hand-off: live exercise validates the full cost-attribution + ceiling path against deployed Foundry

Sources:

…y docs (Wave 3 PR 7)

Wave 3 PR 7 of Phase 3 — closes the cost half of the wave per ADR-0015.
PR 8 (eval harness) lands separately; the two only conflict on
Program.cs (which PR 7 doesn't touch).

What ships:

PricingTable (AiFoundryOptions):
- Per-deployment USD-cent rates per 1k input/output tokens, keyed by
  deployment name. Defaults populated for the three Phase 3
  deployments at May 2026 Azure OpenAI public pricing:
    gpt-4o-mini       — 0.015c/1k in, 0.060c/1k out
    gpt-4-1 (gpt-4.1) — 0.20c/1k in, 0.80c/1k out
    text-embedding-3-large — 0.013c/1k uniform
  Operators override via configuration when prices shift.

ModelPricing record:
- Per-deployment pricing fact (InputCentsPer1K + OutputCentsPer1K).
  Industry units (per 1k) keep cost-counter values readable in
  dashboards even though Microsoft publishes pricing per 1M tokens.

AiCostCalculator + IAiCostCalculator (Application/Ai/Cost/):
- Pure-function calculator. Looks up the deployment in
  PricingTable; missing rows return 0 cents (best-effort) and emit
  a debug log so operators can spot mis-keyed deployments without
  breaking the hot path.

ITokenUsageReader + NullTokenUsageReader:
- Abstraction for extracting a TokenUsage snapshot from
  Microsoft.Agents.AI's response object. As of 1.4.0 GA the SDK
  doesn't yet expose a stable Usage surface (microsoft/agent-
  framework#2688); NullTokenUsageReader (default) returns null so
  cost stays at 0c until the SDK lands the API. Pricing + ceiling
  machinery is in place so the swap is a one-class change in a
  follow-up PR. Interface accepts `object` so the type contract
  doesn't break across SDK reshuffles.

AiRouter integration:
- After agent invocation, attempts to read TokenUsage; on hit,
  computes cost and tags pinwiz.ai.cost_usd_cents counter (with
  model + sub_agent + prompt_version attributes per ADR-0015).
- Per-call cost ceiling: if cumulative cost exceeds
  AiFoundryOptions.PerCallCostCeilingUsdCents (default 10 cents,
  i.e. $0.10), the router returns a refusal with
  RefusalCategory.CostCeilingHit (the previously-reserved enum value,
  now wired). Refusal text invites the user to ask a more focused
  question.
- AgentResponse type name correction: the actual Microsoft.Agents.AI
  1.4.0 type is AgentResponse (NOT AgentRunResponse — Microsoft Learn
  docs are ahead of the package as of build time). Fixed in the new
  router branches; the existing code already used `var` so the prior
  PRs were unaffected.

docs/observability.md:
- New section: AI orchestrator instruments (Phase 3) with the full
  pinwiz.ai.* inventory + the inherited gen_ai.* attribute list (do
  NOT duplicate those into pinwiz.* per ADR-0015 lean posture).
- New section: Daily AI cost aggregation (Phase 6 deploy gate) with
  the KQL query template Phase 6 alert rules pin against.
- Deferred section updated: 'Real ITokenUsageReader impl pending
  microsoft/agent-framework#2688' as the explicit revisit trigger.

Tests:
- AiCostCalculatorTests (10 tests): known-deployment per-k rates,
  heavy-tier higher rate, unknown-deployment-zero, zero-tokens-zero,
  empty-deployment-zero, null-usage-throws, empty-pricing-table-zero,
  ctor-arg-null guards, default-pricing-table-shape pin.

Build green (0 warnings under TreatWarningsAsErrors), 629/629 tests
passing (was 619, +10 new). Identity verified.

Sources:
- Microsoft.Agents.AI 1.4.0 GA NuGet (April 2026)
- microsoft/agent-framework#2688 (.NET token usage on response — open)
- Azure OpenAI public pricing (May 2026)
jkeeley2073 added the claude-code (Generated with Claude Code) label May 7, 2026
jkeeley2073 merged commit d672558 into main May 7, 2026
4 of 5 checks passed
jkeeley2073 added a commit that referenced this pull request May 7, 2026
Resolves the lone CONFLICTING file ServiceCollectionExtensions.cs in
Application/Ai/ — PR #91 added IAiCostCalculator + ITokenUsageReader
singletons; this PR (#92) added the four custom evaluators. Both sets
of singletons need to register; the resolved file keeps both blocks
side-by-side in the import order Cost-then-Evaluators (alphabetical).

Build green (0 warnings under TreatWarningsAsErrors); 687/687 tests
passing — that's 629 (PR #91 baseline after merge to main) + 58 new
from this PR's evaluator + harness + ground-truth-file tests.

Identity verified.