feat(ai) cost-attribution telemetry + per-call ceiling + observability docs (Wave 3 PR 7)#91
Merged
Merged
Conversation
…y docs (Wave 3 PR 7)
Wave 3 PR 7 of Phase 3 — closes the cost half of the wave per ADR-0015.
PR 8 (eval harness) lands separately; the two only conflict on
Program.cs (which PR 7 doesn't touch).
What ships:
PricingTable (AiFoundryOptions):
- Per-deployment USD-cent rates per 1k input/output tokens, keyed by
deployment name. Defaults populated for the three Phase 3
deployments at May 2026 Azure OpenAI public pricing:
gpt-4o-mini — 0.015c/1k in, 0.060c/1k out
gpt-4-1 (gpt-4.1) — 0.20c/1k in, 0.80c/1k out
text-embedding-3-large — 0.013c/1k uniform
Operators override via configuration when prices shift.
ModelPricing record:
- Per-deployment pricing fact (InputCentsPer1K + OutputCentsPer1K).
Industry units (per 1k) keep cost-counter values readable in
dashboards even though Microsoft publishes pricing per 1M tokens.
AiCostCalculator + IAiCostCalculator (Application/Ai/Cost/):
- Pure-function calculator. Looks up the deployment in
PricingTable; missing rows return 0 cents (best-effort) and emit
a debug log so operators can spot mis-keyed deployments without
breaking the hot path.
ITokenUsageReader + NullTokenUsageReader:
- Abstraction for extracting a TokenUsage snapshot from
Microsoft.Agents.AI's response object. As of 1.4.0 GA the SDK
doesn't yet expose a stable Usage surface (microsoft/agent-
framework#2688); NullTokenUsageReader (default) returns null so
cost stays at 0c until the SDK lands the API. Pricing + ceiling
machinery is in place so the swap is a one-class change in a
follow-up PR. Interface accepts `object` so the type contract
doesn't break across SDK reshuffles.
AiRouter integration:
- After agent invocation, attempts to read TokenUsage; on hit,
computes cost and tags pinwiz.ai.cost_usd_cents counter (with
model + sub_agent + prompt_version attributes per ADR-0015).
- Per-call cost ceiling: if cumulative cost exceeded
AiFoundryOptions.PerCallCostCeilingUsdCents (default $0.10), the
router returns a refusal with RefusalCategory.CostCeilingHit
(the previously-reserved enum value, now wired). Refusal text
invites the user to ask a more focused question.
- AgentResponse type name correction: the actual Microsoft.Agents.AI
1.4.0 type is AgentResponse (NOT AgentRunResponse — Microsoft Learn
docs are ahead of the package as of build time). Fixed in the new
router branches; the existing code already used `var` so the prior
PRs were unaffected.
docs/observability.md:
- New section: AI orchestrator instruments (Phase 3) with the full
pinwiz.ai.* inventory + the inherited gen_ai.* attribute list (do
NOT duplicate those into pinwiz.* per ADR-0015 lean posture).
- New section: Daily AI cost aggregation (Phase 6 deploy gate) with
the KQL query template Phase 6 alert rules pin against.
- Deferred section updated: 'Real ITokenUsageReader impl pending
microsoft/agent-framework#2688' as the explicit revisit trigger.
Tests:
- AiCostCalculatorTests (10 tests): known-deployment per-k rates,
heavy-tier higher rate, unknown-deployment-zero, zero-tokens-zero,
empty-deployment-zero, null-usage-throws, empty-pricing-table-zero,
ctor-arg-null guards, default-pricing-table-shape pin.
Build green (0 warnings under TreatWarningsAsErrors), 629/629 tests
passing (was 619, +10 new). Identity verified.
Sources:
- Microsoft.Agents.AI 1.4.0 GA NuGet (April 2026)
- microsoft/agent-framework#2688 (.NET token usage on response — open)
- Azure OpenAI public pricing (May 2026)
jkeeley2073
added a commit
that referenced
this pull request
May 7, 2026
Resolves the lone CONFLICTING file ServiceCollectionExtensions.cs in Application/Ai/ — PR #91 added IAiCostCalculator + ITokenUsageReader singletons; this PR (#92) added the four custom evaluators. Both sets of singletons need to register; the resolved file keeps both blocks side-by-side in the import order Cost-then-Evaluators (alphabetical). Build green (0 warnings under TreatWarningsAsErrors); 687/687 tests passing — that's 629 (PR #91 baseline after merge to main) + 58 new from this PR's evaluator + harness + ground-truth-file tests. Identity verified.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Wave 3 PR 7 — closes the cost half of Wave 3. Per ADR-0015. PR 8 (eval harness) is in flight in a worktree-isolated subagent and lands separately; the two only conflict on
Program.cs(which PR 7 does NOT touch).What ships
Cost calculation surface
AiFoundryOptions.PricingTable— per-deployment USD-cent rates (per 1k input + output tokens), defaults populated for the three Phase 3 deployments from May 2026 Azure OpenAI public pricing:gpt-4o-mini: 0.015c/1k in, 0.060c/1k outgpt-4-1(mapping togpt-4.1): 0.20c/1k in, 0.80c/1k outtext-embedding-3-large: 0.013c/1k uniformModelPricingrecord — Industry units (per 1k); cost-counter values stay readable in dashboardsAiCostCalculator+IAiCostCalculator— Pure-function calculator. Missing rows → 0c + debug logTokenUsagerecord — Per-call token-count snapshot (DeploymentName + InputTokens + OutputTokens)Token-usage extraction abstraction
ITokenUsageReader+NullTokenUsageReader— Abstraction seam. As ofMicrosoft.Agents.AI1.4.0 GA the SDK doesn't expose a stableUsagesurface on response objects (agent-framework#2688 — open feature request).NullTokenUsageReader(default) returns null so cost stays at 0c until the SDK lands the API. Pricing + ceiling machinery is in place so the swap is a one-class change in a follow-up PR. Interface acceptsobjectso the type contract doesn't break across SDK reshufflesAiRouterintegrationITokenUsageReader.TryRead; on hit, computes cost viaIAiCostCalculatorand tagspinwiz.ai.cost_usd_cents(withmodel+sub_agent+prompt_versionattributes per ADR-0015)AiFoundryOptions.PerCallCostCeilingUsdCents(default $0.10), router returns refusal withRefusalCategory.CostCeilingHit(previously-reserved enum value, now wired)AgentResponsetype name correction: the actualMicrosoft.Agents.AI1.4.0 type isAgentResponse(NOTAgentRunResponse— Microsoft Learn docs are ahead of the package). Fixed in the new router branches; existing code already usedvarso prior PRs were unaffecteddocs/observability.mdpinwiz.ai.*inventory + the inheritedgen_ai.*attribute list (do NOT duplicate those intopinwiz.*per ADR-0015 lean posture)ITokenUsageReaderimpl pending .NET Agent Performance Metrics and Token Usage microsoft/agent-framework#2688" as explicit revisit triggerTests
10 new
AiCostCalculatorTests:629/629 tests passing (was 619, +10 new).
Audit posture
TreatWarningsAsErrorscatchfeedback_no_xml_docs.mdConfidenceCalculator/ existing options patterns: matchesPricingTable,PerCallCostCeilingUsdCentsconsumed by AiRouter)[GeneratedRegex]etc. — no new analyzer suppressionsWhat this PR does NOT do (intentional)
NullTokenUsageReaderplaceholder untilMicrosoft.Agents.AIexposes Usage onAgentResponse(issue #2688). Cost stays at 0c in production until that swap; pricing + ceiling machinery is readyProgram.cschanges — none, so PR 7 + PR 8 don't conflict on the CLI surfacepinwiz.ai.escalationspopulation — counter exists from PR 4; populated in a future PR when sub-agent escalation routing trace is read from Foundry's connected-agents dispatchIAiAgentwhich is sealed inMicrosoft.Agents.AI. The cost-ceiling path is exercised end-to-end via the eval harness (PR 8); H2 hand-off is the live validationTest plan
Sources: