feat(token-usage): Add token usage tracking and cost calculation #40
Conversation
- Add token usage tracking to LLM providers (Anthropic, Azure OpenAI, Gemini, OpenAI)
- Implement `LLMResult` wrapper type to return both data and token usage from provider calls
- Add `TokenUsageStats` type and `calculateCost` function for pricing calculations
- Add environment variables for input and output token pricing configuration
- Integrate token usage accumulation in orchestrator during file evaluation
- Add `printTokenUsage` function to display token usage and cost in Line output format
- Include token usage stats in `EvaluateFileResult` for downstream consumption
- Add comprehensive tests for token usage calculation and provider integration
- Update provider interfaces to return structured results with usage metadata
- Move `token-usage.ts` from `src/types/` to `src/providers/` for better architectural organization
- Update all import paths across the codebase to reference the new token-usage location
- Add pricing configuration parameter to the `EvaluationOptions` interface
- Pass pricing config from CLI commands through the orchestrator to evaluation functions
- Remove redundant environment variable parsing from the orchestrator; use the passed config instead
- Update `PricingConfig` type annotations for explicit `undefined` handling
- Change the config schema `runRules` default from optional to an empty array
- Consolidate token usage and pricing logic in the providers module for improved separation of concerns
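As a rough sketch of the shapes this PR introduces: the `LLMResult` wrapper and the `calculateCost` helper below follow the type and function names used throughout this review (the bodies mirror the `calculateCost` logic shown in the review diffs further down; the example invocation is illustrative, not from the repo):

```typescript
// Minimal sketch of the types described in the PR summary.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

interface PricingConfig {
  inputPricePerMillion?: number;
  outputPricePerMillion?: number;
}

// Wrapper returned by providers: parsed payload plus optional usage.
interface LLMResult<T> {
  data: T;
  usage?: TokenUsage;
}

// Prices are per million tokens; returns undefined when pricing is absent.
function calculateCost(usage: TokenUsage, pricing?: PricingConfig): number | undefined {
  if (!pricing || pricing.inputPricePerMillion === undefined || pricing.outputPricePerMillion === undefined) {
    return undefined;
  }
  const inputCost = (usage.inputTokens / 1_000_000) * pricing.inputPricePerMillion;
  const outputCost = (usage.outputTokens / 1_000_000) * pricing.outputPricePerMillion;
  return inputCost + outputCost;
}

// Example: 1M input tokens at $3/M plus 0.5M output tokens at $15/M.
const cost = calculateCost(
  { inputTokens: 1_000_000, outputTokens: 500_000 },
  { inputPricePerMillion: 3, outputPricePerMillion: 15 }
);
console.log(cost); // 10.5
```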
📝 Walkthrough

Adds token-usage collection and optional cost calculation: LLM providers now return usage with responses; evaluators and orchestrator aggregate usage and cost (when pricing provided); CLI reads pricing and prints token usage and total cost via reporter.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant CLI as CLI (commands.ts)
    participant Orch as Orchestrator (orchestrator.ts)
    participant Eval as Evaluator (base/accuracy)
    participant LLM as LLM Provider
    participant Reporter as Reporter (reporter.ts)
    User->>CLI: run evaluation (env pricing may be set)
    CLI->>CLI: read INPUT_PRICE_PER_MILLION / OUTPUT_PRICE_PER_MILLION
    CLI->>Orch: evaluateFiles(options { pricing, outputFormat })
    rect rgb(240,248,255)
        Orch->>Orch: iterate files & prompts
        Orch->>Eval: runPromptEvaluation(...)
        Eval->>LLM: runPromptStructured(prompt)
        LLM-->>Eval: LLMResult { data, usage{ inputTokens, outputTokens } }
        Eval-->>Orch: Prompt result (includes usage)
        Orch->>Orch: aggregate usage per-file & total
    end
    rect rgb(255,250,240)
        Orch->>Orch: if pricing present → calculateCost(totalUsage, pricing)
        Orch-->>CLI: EvaluationResult { tokenUsage:{ totalInputTokens, totalOutputTokens, totalCost? } }
        CLI->>Reporter: printTokenUsage(tokenUsage)
        Reporter-->>User: display token counts and optional cost
    end
```
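The aggregation step in the diagram can be sketched as follows. The `usage` field shape matches the review snippets; `aggregateUsage` and `PromptResult` are illustrative names, not the repo's actual identifiers:

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Shape of a prompt-level result for this sketch; usage is optional
// because some providers may omit it.
interface PromptResult {
  usage?: TokenUsage;
}

// Sum usage across prompt results, skipping results that carry no
// usage (mirrors the orchestrator's guard described in this review).
function aggregateUsage(results: PromptResult[]): { totalInputTokens: number; totalOutputTokens: number } {
  let totalInputTokens = 0;
  let totalOutputTokens = 0;
  for (const r of results) {
    if (r.usage) {
      totalInputTokens += r.usage.inputTokens;
      totalOutputTokens += r.usage.outputTokens;
    }
  }
  return { totalInputTokens, totalOutputTokens };
}

const totals = aggregateUsage([
  { usage: { inputTokens: 1200, outputTokens: 300 } },
  {}, // failed or usage-less result is skipped
  { usage: { inputTokens: 800, outputTokens: 200 } },
]);
console.log(totals); // { totalInputTokens: 2000, totalOutputTokens: 500 }
```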
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
📜 Recent review details

Configuration used: defaults
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cli/orchestrator.ts (1)
786-837: Add `tokenUsage` aggregation to `evaluateFiles` or update `EvaluationResult`.

The `evaluateFile` function returns `tokenUsage` per file, but `evaluateFiles` does not aggregate it. All other per-file metrics (`errors`, `warnings`, `requestFailures`, and status flags) follow a consistent aggregation pattern, yet `tokenUsage` is excluded from both the aggregation loop and the `EvaluationResult` return type. Either aggregate token usage across files (tracking cumulative input/output tokens and cost) to match the pattern of other metrics, or document why token usage is intentionally excluded from multi-file results.
🧹 Nitpick comments (6)
src/output/reporter.ts (1)
203-210: Consider dynamic precision for cost display.

The cost is formatted with 4 decimal places, which works well for small amounts (e.g., $0.0001) but may be excessive for larger costs (e.g., $123.4567). Consider using dynamic precision based on the cost magnitude, or at least 2 decimal places for costs above $1.
🔎 View suggested refactor
```diff
 export function printTokenUsage(stats: TokenUsageStats) {
   console.log(chalk.bold('\nToken Usage:'));
   console.log(`  - Input tokens: ${stats.totalInputTokens.toLocaleString()}`);
   console.log(`  - Output tokens: ${stats.totalOutputTokens.toLocaleString()}`);
   if (stats.totalCost !== undefined) {
-    console.log(`  - Total cost: $${stats.totalCost.toFixed(4)}`);
+    const decimals = stats.totalCost >= 1 ? 2 : 4;
+    console.log(`  - Total cost: $${stats.totalCost.toFixed(decimals)}`);
   }
 }
```

src/providers/token-usage.ts (2)
12-15: Remove redundant `| undefined` type annotation.

The `?` operator already makes the field's type `type | undefined`, so explicitly adding `| undefined` is redundant.

🔎 Apply this diff

```diff
 export interface PricingConfig {
-  inputPricePerMillion?: number | undefined;
-  outputPricePerMillion?: number | undefined;
+  inputPricePerMillion?: number;
+  outputPricePerMillion?: number;
 }
```
21-30: Consider validating non-negative token counts.

The `calculateCost` function doesn't validate that token counts are non-negative. While the upstream data is likely valid, defensive validation could prevent unexpected negative costs from malformed usage data.

🔎 View suggested validation

```diff
 export function calculateCost(usage: TokenUsage, pricing?: PricingConfig): number | undefined {
   if (!pricing || pricing.inputPricePerMillion === undefined || pricing.outputPricePerMillion === undefined) {
     return undefined;
   }
+
+  if (usage.inputTokens < 0 || usage.outputTokens < 0) {
+    return undefined;
+  }

   const inputCost = (usage.inputTokens / 1_000_000) * pricing.inputPricePerMillion;
   const outputCost = (usage.outputTokens / 1_000_000) * pricing.outputPricePerMillion;
   return inputCost + outputCost;
 }
```

tests/anthropic-provider.test.ts (1)
339-343: Error mock pattern is valid but differs from OpenAI tests.

This file uses an explicit type cast pattern for error construction:

```typescript
const mockApiError = anthropic.APIError as unknown as new (params: MockAPIErrorParams) => Error;
```

While OpenAI tests use `@ts-expect-error` comments. Both work, but consider standardizing the approach across test files for consistency.

src/cli/orchestrator.ts (2)
604-612: Type cast may fail for non-BaseEvaluator implementations.

The cast `(evaluator as BaseEvaluator).getLastUsage?.()` assumes all evaluators extend `BaseEvaluator`. If `createEvaluator` can return a different evaluator type that doesn't have `getLastUsage`, the optional chaining protects at runtime, but the explicit cast is misleading. Consider checking if the evaluator is an instance of `BaseEvaluator` first, or ensure the interface `LLMProvider` includes `getLastUsage`.

🔎 Suggested improvement:

```diff
-  const usage = (evaluator as BaseEvaluator).getLastUsage?.();
+  const usage = evaluator instanceof BaseEvaluator ? evaluator.getLastUsage?.() : undefined;
```
745-762: Consider initializing `pricing` to avoid passing an empty object.

When `options.pricing` is undefined, `pricing` becomes `{}`. The `calculateCost` function handles missing pricing properties by returning `undefined`, so this is functionally correct. However, explicitly passing `options.pricing` (which may be `undefined`) rather than an empty object is clearer.

🔎 Suggested improvement:

```diff
-  const pricing = options.pricing || {};
-
-  const tokenUsageStats: TokenUsageStats = {
-    totalInputTokens,
-    totalOutputTokens,
-  };
-
-  const cost = calculateCost(
-    {
-      inputTokens: totalInputTokens,
-      outputTokens: totalOutputTokens
-    },
-    pricing
-  );
+  const tokenUsageStats: TokenUsageStats = {
+    totalInputTokens,
+    totalOutputTokens,
+  };
+
+  const cost = calculateCost(
+    {
+      inputTokens: totalInputTokens,
+      outputTokens: totalOutputTokens
+    },
+    options.pricing
+  );
```
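The `|| {}` fallback is inert because of how `calculateCost` short-circuits; this standalone sketch (the function body reproduced from the review diffs, the call sites illustrative) shows that an empty pricing object and no pricing object behave identically:

```typescript
interface TokenUsage { inputTokens: number; outputTokens: number; }
interface PricingConfig { inputPricePerMillion?: number; outputPricePerMillion?: number; }

// Same logic as in the review diffs: missing prices yield undefined.
function calculateCost(usage: TokenUsage, pricing?: PricingConfig): number | undefined {
  if (!pricing || pricing.inputPricePerMillion === undefined || pricing.outputPricePerMillion === undefined) {
    return undefined;
  }
  return (usage.inputTokens / 1_000_000) * pricing.inputPricePerMillion
       + (usage.outputTokens / 1_000_000) * pricing.outputPricePerMillion;
}

const usage = { inputTokens: 500_000, outputTokens: 100_000 };
console.log(calculateCost(usage, undefined)); // undefined
console.log(calculateCost(usage, {}));        // undefined — same result as no pricing
console.log(calculateCost(usage, { inputPricePerMillion: 2, outputPricePerMillion: 10 })); // 2
```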
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
- `src/cli/commands.ts` (3 hunks)
- `src/cli/orchestrator.ts` (6 hunks)
- `src/cli/types.ts` (4 hunks)
- `src/evaluators/base-evaluator.ts` (5 hunks)
- `src/output/reporter.ts` (2 hunks)
- `src/providers/anthropic-provider.ts` (4 hunks)
- `src/providers/azure-openai-provider.ts` (3 hunks)
- `src/providers/gemini-provider.ts` (3 hunks)
- `src/providers/llm-provider.ts` (1 hunks)
- `src/providers/openai-provider.ts` (3 hunks)
- `src/providers/token-usage.ts` (1 hunks)
- `src/schemas/config-schemas.ts` (1 hunks)
- `src/schemas/env-schemas.ts` (2 hunks)
- `tests/anthropic-e2e.test.ts` (7 hunks)
- `tests/anthropic-provider.test.ts` (8 hunks)
- `tests/openai-provider.test.ts` (15 hunks)
- `tests/scoring-types.test.ts` (6 hunks)
- `tests/token-usage.test.ts` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (13)
src/output/reporter.ts (1)
- src/providers/token-usage.ts (1): `TokenUsageStats` (6-10)

tests/token-usage.test.ts (1)
- src/providers/token-usage.ts (3): `TokenUsage` (1-4), `PricingConfig` (12-15), `calculateCost` (21-30)

src/providers/openai-provider.ts (1)
- src/providers/llm-provider.ts (1): `LLMResult` (3-6)

tests/openai-provider.test.ts (1)
- tests/schemas/mock-schemas.ts (1): `MockOpenAIClient` (55-64)

src/cli/types.ts (1)
- src/providers/token-usage.ts (3): `PricingConfig` (12-15), `TokenUsage` (1-4), `TokenUsageStats` (6-10)

src/providers/llm-provider.ts (1)
- src/providers/token-usage.ts (1): `TokenUsage` (1-4)

src/providers/azure-openai-provider.ts (1)
- src/providers/llm-provider.ts (1): `LLMResult` (3-6)

src/providers/gemini-provider.ts (1)
- src/providers/llm-provider.ts (1): `LLMResult` (3-6)

src/providers/anthropic-provider.ts (1)
- src/providers/llm-provider.ts (1): `LLMResult` (3-6)

src/evaluators/base-evaluator.ts (2)
- src/providers/token-usage.ts (1): `TokenUsage` (1-4)
- src/prompts/schema.ts (2): `SubjectiveLLMResult` (74-82), `SemiObjectiveLLMResult` (84-92)

src/cli/orchestrator.ts (4)
- src/evaluators/index.ts (1): `BaseEvaluator` (16-16)
- src/cli/types.ts (1): `RunPromptEvaluationResultSuccess` (130-134)
- src/providers/token-usage.ts (2): `TokenUsageStats` (6-10), `calculateCost` (21-30)
- src/output/reporter.ts (2): `printEvaluationSummaries` (138-175), `printTokenUsage` (203-210)

tests/anthropic-provider.test.ts (1)
- tests/schemas/mock-schemas.ts (4): `MockAPIErrorParams` (46-46), `MockRateLimitErrorParams` (48-48), `MockAuthenticationErrorParams` (47-47), `MockBadRequestErrorParams` (49-49)

tests/scoring-types.test.ts (2)
- src/providers/llm-provider.ts (1): `LLMResult` (3-6)
- src/prompts/schema.ts (2): `SubjectiveLLMResult` (74-82), `SemiObjectiveLLMResult` (84-92)
🔇 Additional comments (27)
src/schemas/config-schemas.ts (1)
11-11: The `runRules` default change is backward compatible and safe.

Using `.default([])` makes the input optional (accepts undefined) while ensuring the output is never undefined, which is actually an improvement over `.optional()`. Existing code patterns in the codebase work correctly with this change:

- The helper function in `tests/utils.ts` that checks `if (runRules !== undefined)` continues to work because Zod's `.default()` still accepts undefined input during parsing
- The truthiness check in `scan-path-resolver.ts` (`if (match.runRules)`) works correctly—empty arrays are truthy but iterate zero times
- The `FilePatternConfig` interface's optional `runRules` property accommodates the now-guaranteed string array type

This change aligns the type system to reflect the actual parsed output: `runRules` will always be present as either a user-provided array or an empty array default, eliminating the `undefined` case.

src/schemas/env-schemas.ts (1)
32-49: LGTM! Clean environment schema extension.

The `BASE_ENV_SCHEMA` with pricing fields is properly defined and consistently merged across all provider configurations. The use of `.positive()` ensures valid pricing when provided.

tests/token-usage.test.ts (1)
1-55: LGTM! Comprehensive test coverage.

The test suite thoroughly covers all scenarios including correct calculations, partial millions, missing pricing configurations, and edge cases like zero tokens.
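The scenarios listed above can be illustrated with plain assertions; this is a sketch of the kinds of cases the review describes, not the actual test file (the `calculateCost` body mirrors the review diffs, and the pricing values are made up):

```typescript
interface TokenUsage { inputTokens: number; outputTokens: number; }
interface PricingConfig { inputPricePerMillion?: number; outputPricePerMillion?: number; }

function calculateCost(usage: TokenUsage, pricing?: PricingConfig): number | undefined {
  if (!pricing || pricing.inputPricePerMillion === undefined || pricing.outputPricePerMillion === undefined) {
    return undefined;
  }
  return (usage.inputTokens / 1_000_000) * pricing.inputPricePerMillion
       + (usage.outputTokens / 1_000_000) * pricing.outputPricePerMillion;
}

const pricing = { inputPricePerMillion: 2, outputPricePerMillion: 8 };

// Partial millions: 250k input + 125k output = $0.50 + $1.00.
console.assert(calculateCost({ inputTokens: 250_000, outputTokens: 125_000 }, pricing) === 1.5);
// Zero tokens cost nothing.
console.assert(calculateCost({ inputTokens: 0, outputTokens: 0 }, pricing) === 0);
// Missing pricing yields undefined rather than 0.
console.assert(calculateCost({ inputTokens: 100, outputTokens: 100 }) === undefined);
```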
tests/anthropic-e2e.test.ts (1)
144-152: LGTM! Test updates correctly reflect the LLMResult wrapper.

All test assertions have been properly updated to access `result.data` for the response payload and `result.usage` for token tracking, aligning with the new structured response format.

Also applies to: 238-242, 480-500, 558-565, 617-621
src/cli/commands.ts (1)
17-17: LGTM! Clean integration of pricing configuration.

The `OutputFormat` type cast is safe given the validated CLI options, and the pricing configuration is correctly passed through from environment variables to the orchestrator.
Also applies to: 160-176
src/evaluators/base-evaluator.ts (2)
28-48: LGTM! Clean token usage tracking integration.

The protected `lastUsage` field and public `getLastUsage()` accessor provide a clean API for external access to token usage without coupling the evaluator to specific consumers.

68-75: LGTM! Consistent usage tracking across evaluation paths.

Both subjective and semi-objective evaluation paths correctly destructure the LLMResult wrapper and conditionally store usage data when present.
Also applies to: 122-129
src/providers/llm-provider.ts (1)
1-10: LGTM! Excellent abstraction for structured LLM responses.

The `LLMResult<T>` wrapper provides a clean, type-safe interface for returning both the response data and optional usage metrics. The generic type parameter ensures type safety is preserved for the data payload while standardizing usage reporting across all providers.

src/providers/azure-openai-provider.ts (2)

135-145: LGTM! Token usage extraction implemented correctly.

The implementation correctly wraps the parsed data in `LLMResult<T>` and populates usage metadata when available. The mapping from `prompt_tokens` → `inputTokens` and `completion_tokens` → `outputTokens` is consistent with the OpenAI provider.

51-51: Method signature correctly updated for LLMResult wrapper.

The return type change from `Promise<T>` to `Promise<LLMResult<T>>` aligns with the interface contract in `llm-provider.ts`.

tests/scoring-types.test.ts (2)
34-55: LGTM! Mock response correctly structured with LLMResult wrapper.

The test mock now properly uses the `LLMResult<SubjectiveLLMResult>` structure with a nested `data` field, aligning with the provider's new return type.

172-172: Type cast is pragmatic for testing mock data.

The `as unknown as LLMResult<any>` cast is acceptable here since this test mocks claim extraction which uses a different schema than the typed evaluation results.

src/providers/openai-provider.ts (1)
168-179: LGTM! Token usage extraction follows consistent pattern.

The implementation correctly:

- Wraps parsed JSON data in `LLMResult<T>`
- Conditionally populates `usage` only when present in response
- Maps OpenAI field names (`prompt_tokens`, `completion_tokens`) to standardized names (`inputTokens`, `outputTokens`)

src/providers/anthropic-provider.ts (2)
95-95: Good addition of explicit `stream: false`.

Explicitly setting `stream: false` ensures the response includes complete usage metadata, which is required for token tracking.

166-172: Verify that usage is always present in Anthropic responses.

Unlike the OpenAI/Azure providers, which conditionally set `usage`, this implementation always accesses `validatedResponse.usage.input_tokens` and `output_tokens`. This assumes Anthropic always returns usage data.

Based on the `ANTHROPIC_RESPONSE_SCHEMA` validation, verify that `usage` is a required field. If not, this could throw when accessing properties on `undefined`.

```shell
#!/bin/bash
# Check Anthropic response schema to verify if usage is required
rg -n -A 20 'ANTHROPIC_RESPONSE_SCHEMA' --type ts
```

tests/openai-provider.test.ts (1)
200-209: LGTM! Test assertions correctly validate the new response structure.

The test properly verifies:

- `result.data` contains the expected parsed JSON
- `result.usage` is defined when the mock includes usage data
- Token counts are correctly mapped from OpenAI field names

The conditional check `if (result.usage)` is appropriate given usage is optional.

tests/anthropic-provider.test.ts (1)
175-183: LGTM! Test assertions properly validate LLMResult wrapper and usage.

The test correctly:

- Asserts `result.data` contains expected tool output
- Validates `result.usage` is defined
- Verifies `inputTokens` and `outputTokens` match mock values

src/cli/types.ts (3)
130-138: LGTM! Clean type definitions for token usage support.

The new `RunPromptEvaluationResultSuccess` interface cleanly separates the success case with an optional `usage` field, and the discriminated union `RunPromptEvaluationResult` maintains type safety for error handling.

12-12: Imports correctly reference token-usage types.

The import from `../providers/token-usage` aligns with the type definitions shown in the relevant code snippets.

146-148: EvaluateFileResult properly extended for aggregated token usage.

Adding `tokenUsage?: TokenUsageStats` enables propagating aggregated token counts through the evaluation pipeline for final cost calculation.

src/providers/gemini-provider.ts (3)
2-2: LGTM!

Import of `LLMResult` aligns with the updated return type.

44-48: LGTM!

Return type correctly updated to `Promise<LLMResult<T>>` to match the interface contract.

77-93: Property names are correct.

The `promptTokenCount` and `candidatesTokenCount` properties are the correct property names for `usageMetadata` in the Gemini API, and the nullish coalescing operator provides appropriate safety for optional properties.

src/cli/orchestrator.ts (4)
8-8: LGTM!

Import of `printTokenUsage` is consistent with the new token usage reporting feature.

13-13: LGTM!

Import of `BaseEvaluator` is needed for accessing the `getLastUsage()` method.

631-632: LGTM!

Token accumulators properly initialized to zero.

720-724: LGTM!

Token usage accumulation correctly guards against undefined `r.usage` before adding.
- Import and call `printTokenUsage` function in main command handler
- Aggregate token usage statistics across all evaluated files
- Calculate total input and output tokens from file results
- Compute aggregated cost based on configured pricing
- Add `tokenUsage` field to `EvaluationResult` interface
- Improve cost formatting in reporter with dynamic precision (4 decimals for costs < $1, 2 for larger amounts)
- Token usage stats now displayed after global summary when available
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/cli/orchestrator.ts (1)
604-612: Type assertion could be made more defensive.

The `(evaluator as BaseEvaluator).getLastUsage?.()` pattern works due to optional chaining but relies on knowing that all evaluators extend `BaseEvaluator`. Consider adding a type guard or checking the method existence more explicitly if the evaluator hierarchy might change.

That said, the current implementation is functionally safe with optional chaining.

🔎 Alternative approach using explicit check:

```diff
-  const usage = (evaluator as BaseEvaluator).getLastUsage?.();
+  const usage = 'getLastUsage' in evaluator
+    ? (evaluator as BaseEvaluator).getLastUsage()
+    : undefined;
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- `src/cli/commands.ts` (4 hunks)
- `src/cli/orchestrator.ts` (9 hunks)
- `src/cli/types.ts` (5 hunks)
- `src/output/reporter.ts` (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/cli/types.ts
🧰 Additional context used
🧬 Code graph analysis (3)
src/output/reporter.ts (1)
- src/providers/token-usage.ts (1): `TokenUsageStats` (6-10)

src/cli/commands.ts (1)
- src/output/reporter.ts (1): `printTokenUsage` (203-212)

src/cli/orchestrator.ts (4)
- src/evaluators/base-evaluator.ts (1): `BaseEvaluator` (27-212)
- src/cli/types.ts (1): `RunPromptEvaluationResultSuccess` (131-135)
- src/providers/token-usage.ts (2): `TokenUsageStats` (6-10), `calculateCost` (21-30)
- src/output/reporter.ts (1): `printTokenUsage` (203-212)
🔇 Additional comments (5)
src/output/reporter.ts (1)
203-212: LGTM!

The implementation is clean and handles the optional `totalCost` appropriately. The dynamic precision (4 decimals for sub-dollar costs, 2 for larger amounts) provides good UX for typical LLM API pricing.

src/cli/commands.ts (2)

172-175: Pricing configuration integration looks correct.

The pricing object correctly passes the environment variables to the orchestrator. The `calculateCost` function in `token-usage.ts` handles `undefined` values gracefully, so this is safe even when pricing is not configured.

186-190: LGTM!

The token usage is printed after the global summary as specified in the requirements, and the guard on `result.tokenUsage` existence is appropriate.

src/cli/orchestrator.ts (2)

720-724: Token usage aggregation logic is correct.

The accumulation of input/output tokens both at the prompt level within a file and across files is implemented correctly. The cost is recalculated from aggregated tokens at the end rather than summing per-file costs, which is mathematically equivalent and avoids potential floating-point accumulation errors.

Also applies to: 819-824

838-849: LGTM!

The final token usage stats calculation correctly aggregates totals and only sets `totalCost` when pricing is configured. This is consistent with the type definition where `totalCost` is optional.
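The dynamic-precision rule discussed in this review (4 decimals for sub-dollar costs, 2 decimals at $1 and above) can be exercised in isolation; `formatCost` is an illustrative helper, not the repo's actual function:

```typescript
// Sketch of the dynamic-precision cost formatting described in the review.
function formatCost(totalCost: number): string {
  const decimals = totalCost >= 1 ? 2 : 4;
  return `$${totalCost.toFixed(decimals)}`;
}

console.log(formatCost(0.0375));   // "$0.0375"
console.log(formatCost(123.4567)); // "$123.46"
```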
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/cli/orchestrator.ts (3)
607-615: Consider clarifying the type safety pattern.

The code casts `evaluator` to `BaseEvaluator` and then uses optional chaining on `getLastUsage`, which is somewhat redundant. If all evaluators extend `BaseEvaluator`, the optional chaining is unnecessary; if some don't, the cast is misleading. The current code is safe and functional, but could be clearer.

💡 Alternative approaches

Option 1: If all evaluators extend BaseEvaluator, remove optional chaining:

```diff
-const usage = (evaluator as BaseEvaluator).getLastUsage?.();
+const usage = (evaluator as BaseEvaluator).getLastUsage();
```

Option 2: If some evaluators don't have `getLastUsage`, use a type guard instead of a cast:

```diff
-const usage = (evaluator as BaseEvaluator).getLastUsage?.();
+const usage = 'getLastUsage' in evaluator && typeof evaluator.getLastUsage === 'function'
+  ? evaluator.getLastUsage()
+  : undefined;
```
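Option 2's `in`-operator guard can be exercised in a standalone sketch; `TrackedEvaluator` and `PlainEvaluator` are illustrative stand-ins, not the repo's classes:

```typescript
interface TokenUsage { inputTokens: number; outputTokens: number; }

// Minimal stand-in for the repo's base class.
class BaseEvaluator {
  protected lastUsage?: TokenUsage;
  getLastUsage(): TokenUsage | undefined { return this.lastUsage; }
}

class TrackedEvaluator extends BaseEvaluator {
  run(): void { this.lastUsage = { inputTokens: 10, outputTokens: 5 }; }
}

// An evaluator that never implemented usage tracking.
class PlainEvaluator {
  run(): void {}
}

// The guard from Option 2: only call getLastUsage when it exists.
function readUsage(evaluator: object): TokenUsage | undefined {
  return 'getLastUsage' in evaluator && typeof evaluator.getLastUsage === 'function'
    ? (evaluator as BaseEvaluator).getLastUsage()
    : undefined;
}

const tracked = new TrackedEvaluator();
tracked.run();
console.log(readUsage(tracked));              // { inputTokens: 10, outputTokens: 5 }
console.log(readUsage(new PlainEvaluator())); // undefined
```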
748-765: Minor clarity improvements.

Two small issues:

1. The comment on line 748 says "Calculate costs if output format is Line" but the cost is calculated unconditionally for all formats (which is correct). The comment is misleading.
2. Line 749 uses `options.pricing || {}` as a fallback, but `calculateCost` already handles `undefined` pricing by returning `undefined`. The fallback to `{}` is unnecessary defensive code.

🔎 Suggested improvements

```diff
-  // Calculate costs if output format is Line
-  const pricing = options.pricing || {};
+  // Calculate token usage stats and cost
+  const pricing = options.pricing;
```

839-850: Same minor improvement as per-file calculation.

Line 846 uses `options.pricing || {}` as a fallback, but `calculateCost` already handles `undefined` pricing by returning `undefined`. The fallback is unnecessary.

🔎 Suggested improvement

```diff
-  const pricing = options.pricing || {};
+  const pricing = options.pricing;
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- `src/cli/commands.ts` (3 hunks)
- `src/cli/orchestrator.ts` (9 hunks)
- `src/output/reporter.ts` (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/cli/commands.ts
🧰 Additional context used
🧬 Code graph analysis (2)
src/output/reporter.ts (1)
- src/providers/token-usage.ts (1): `TokenUsageStats` (6-10)

src/cli/orchestrator.ts (4)
- src/evaluators/base-evaluator.ts (1): `BaseEvaluator` (27-212)
- src/evaluators/index.ts (1): `BaseEvaluator` (16-16)
- src/cli/types.ts (1): `RunPromptEvaluationResultSuccess` (131-135)
- src/providers/token-usage.ts (2): `TokenUsageStats` (6-10), `calculateCost` (21-30)
🔇 Additional comments (8)
src/output/reporter.ts (2)
5-5: LGTM: Import is correct.

The import of `TokenUsageStats` is properly structured and the relative path is correct.

203-213: LGTM: Well-implemented token usage display.

The function correctly implements the requirements from the PR objectives:

- Uses `.toLocaleString()` for readable token counts (e.g., "1,250")
- Conditionally displays cost only when available (line 207)
- Applies sensible dynamic precision (4 decimals for costs < $1, 2 decimals otherwise)
- Formatting and structure are consistent with other reporter functions
The implementation handles edge cases appropriately and matches the example output format specified in the PR.
src/cli/orchestrator.ts (6)
13-13: LGTM!

The new imports for token usage tracking are appropriate and all are utilized in the implementation below.

Also applies to: 20-26

634-635: LGTM!

The per-file token accumulation logic correctly aggregates usage only from successful prompt evaluations, with appropriate defensive checks for `r.usage` existence.

Also applies to: 723-727

767-770: Past review issue appears resolved.

The previous review comment flagged duplicate token usage printing at this location. The current code only prints evaluation summaries here and does not call `printTokenUsage`, so the duplication concern has been addressed.

778-778: LGTM!

Correctly includes token usage statistics in the file evaluation result.

799-800: LGTM!

The global token aggregation correctly accumulates usage across all evaluated files with appropriate defensive checks.

Also applies to: 820-825

859-859: LGTM!

Correctly includes aggregated token usage statistics in the final evaluation result.
Actionable comments posted: 0
🧹 Nitpick comments (4)
src/cli/commands.ts (1)
161-161: Unsafe cast from string to `OutputFormat` enum.

The CLI option `--output` accepts any string, but it's cast directly to `OutputFormat` without validation. If a user passes an invalid value (e.g., `--output xml`), the code will proceed with an invalid format value instead of failing fast.

Consider validating the output format before use:

🔎 Proposed fix

```diff
-  const outputFormat = cliOptions.output as OutputFormat;
+  const outputFormat = cliOptions.output as OutputFormat;
+  if (!Object.values(OutputFormat).includes(outputFormat)) {
+    console.error(`Error: Invalid output format '${cliOptions.output}'. Valid options: line, json, vale-json, rdjson`);
+    process.exit(1);
+  }
```

src/cli/orchestrator.ts (2)
701-709: Type assertion for `getLastUsage` access.

The cast `(evaluator as BaseEvaluator).getLastUsage?.()` works because `BaseEvaluator` defines `getLastUsage()`, but this couples the orchestrator to the base implementation. If a custom evaluator doesn't extend `BaseEvaluator`, the usage won't be captured.

This is acceptable for now since all evaluators extend `BaseEvaluator`, but consider adding `getLastUsage()` to the `Evaluator` interface in the future for type safety.

851-868: Per-file cost calculation is redundant with aggregate calculation.

Cost is calculated per-file at lines 859-868, but this per-file cost is never used for display or reporting—only the aggregate cost in `evaluateFiles` is shown. The per-file `tokenUsageStats.totalCost` is included in `EvaluateFileResult.tokenUsage`, but the aggregate function at lines 926-929 only sums `totalInputTokens` and `totalOutputTokens`, ignoring the per-file costs and recalculating at line 955.

This is technically correct (due to floating-point precision, recalculating from totals is more accurate), but the per-file cost calculation is dead code. Consider removing it to simplify:

🔎 Simplify by removing per-file cost calculation

```diff
   const tokenUsageStats: TokenUsageStats = {
     totalInputTokens,
     totalOutputTokens,
   };
-  const cost = calculateCost(
-    {
-      inputTokens: totalInputTokens,
-      outputTokens: totalOutputTokens
-    },
-    pricing
-  );
-  if (cost !== undefined) {
-    tokenUsageStats.totalCost = cost;
-  }
```

src/cli/types.ts (1)
12-12: Consider using `import type` for type-only imports.

The import on line 12 brings in `TokenUsage`, `TokenUsageStats`, and `PricingConfig`, which are all interfaces (types only). Per coding guidelines ("Use TypeScript ESM with explicit imports and narrow types"), consider using `import type` for consistency with other type imports in this file (lines 1-6).

🔎 Proposed fix

```diff
-import { TokenUsage, TokenUsageStats, PricingConfig } from '../providers/token-usage';
+import type { TokenUsage, TokenUsageStats, PricingConfig } from '../providers/token-usage';
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- `src/cli/commands.ts`
- `src/cli/orchestrator.ts`
- `src/cli/types.ts`
- `src/evaluators/base-evaluator.ts`
- `src/schemas/config-schemas.ts`
🚧 Files skipped from review as they are similar to previous changes (1)
- src/evaluators/base-evaluator.ts
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.ts
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.ts: Use TypeScript ESM with explicit imports and narrow types
Use 2-space indentation; avoid trailing whitespace
Maintain strict TypeScript with no `any`; use `unknown` + schema validation for external data
Use custom error types with proper inheritance; catch blocks use the `unknown` type
Files:
`src/cli/commands.ts`, `src/schemas/config-schemas.ts`, `src/cli/types.ts`, `src/cli/orchestrator.ts`
🧠 Learnings (4)
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to src/boundaries/**/*.ts : Use Zod schemas for boundary validation of all external data (files, CLI, env, APIs) at system boundaries
Applied to files:
src/schemas/config-schemas.ts
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to src/**/*.ts : Maintain strict TypeScript with no `any`; use `unknown` + schema validation for external data
Applied to files:
src/schemas/config-schemas.ts
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to src/output/**/*.ts : IDs should be shown as `PromptId.CriterionId` in output
Applied to files:
`src/cli/types.ts`, `src/cli/orchestrator.ts`
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to evals/**/*.md : Eval YAML should use `name` (human), `id` (PascalCase), and criteria `id` (PascalCase)
Applied to files:
src/cli/orchestrator.ts
🧬 Code graph analysis (3)
src/cli/commands.ts (1)
- src/output/reporter.ts (1): `printTokenUsage` (203-213)

src/cli/types.ts (1)
- src/providers/token-usage.ts (3): `PricingConfig` (12-15), `TokenUsageStats` (6-10), `TokenUsage` (1-4)

src/cli/orchestrator.ts (4)
- src/evaluators/base-evaluator.ts (1): `BaseEvaluator` (28-236)
- src/evaluators/index.ts (1): `BaseEvaluator` (16-16)
- src/cli/types.ts (1): `RunPromptEvaluationResultSuccess` (120-124)
- src/providers/token-usage.ts (2): `TokenUsageStats` (6-10), `calculateCost` (21-30)
🔇 Additional comments (9)
src/schemas/config-schemas.ts (1)
11-11: Type safety improvement is correct and consistent.

The change from `.optional()` to `.default([])` improves type safety by guaranteeing that `runRules` is always a defined array. This is consistent with the `concurrency` field pattern on line 6.

Verification shows all usages of `runRules` handle this change correctly. Code iterating over `runRules` works without issues since the schema now guarantees an empty array when not provided. Existing undefined checks in `file-section-parser.ts` and `tests/utils.ts` remain harmless though now always-true.

src/cli/commands.ts (2)
173-177: LGTM!The pricing configuration is correctly passed through from environment variables. The
PricingConfiginterface properly allows undefined values, and downstreamcalculateCosthandles missing pricing gracefully by returningundefinedfor cost.
180-190: Consider the display order of token usage vs. global summary.Based on the PR objective ("Display token usage and cost information in the terminal after each completed evaluation, shown after the calculation summary"), the token usage should be displayed after the global summary, not before. Currently,
printTokenUsageis called beforeprintGlobalSummary.If the current order is intentional for better UX, this is fine. Otherwise, consider swapping the order:
🔎 Suggested reorder (if after summary was intended)
```diff
 if (cliOptions.output === 'line') {
-  if (result.tokenUsage) {
-    printTokenUsage(result.tokenUsage);
-  }
   printGlobalSummary(
     result.totalFiles,
     result.totalErrors,
     result.totalWarnings,
     result.requestFailures
   );
+  if (result.tokenUsage) {
+    printTokenUsage(result.tokenUsage);
+  }
 }
```

src/cli/orchestrator.ts (3)
22-25: LGTM!

The token usage imports are correctly added and align with the types defined in `src/providers/token-usage.ts`.
947-958: LGTM!

The aggregate token usage and cost calculation is correctly implemented. Cost is calculated once from the total summed tokens, which is more accurate than summing per-file costs due to floating-point precision.
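The floating-point argument is easy to demonstrate with made-up numbers (the rate and token counts below are illustrative only):

```typescript
// Summing per-file costs accumulates rounding error; one computation over
// summed tokens does not. $1 per 1M input tokens, three 100k-token files.
const pricePerMillion = 1;
const fileTokens = [100_000, 100_000, 100_000];

// Per-file costs, summed afterwards
const perFileSum = fileTokens
  .map((t) => (t / 1_000_000) * pricePerMillion)
  .reduce((a, b) => a + b, 0);

// One cost computed from the summed tokens
const totalTokens = fileTokens.reduce((a, b) => a + b, 0);
const totalCost = (totalTokens / 1_000_000) * pricePerMillion;

console.log(perFileSum); // 0.30000000000000004
console.log(totalCost);  // 0.3
```

The two results differ only by accumulated rounding error, which is exactly why the orchestrator computes cost once from the totals.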
262-268: No functional change.

This is a formatting adjustment in the `criterionId` ternary expression. Logic remains unchanged.

src/cli/types.ts (3)
14-19: LGTM!

The `OutputFormat` enum properly defines all supported output formats with string values matching the CLI option strings.
120-128: LGTM!

The `RunPromptEvaluationResultSuccess` interface and updated `RunPromptEvaluationResult` union type follow a clean discriminated union pattern. The optional `usage` field appropriately handles cases where token usage isn't available.
30-30: LGTM!

The optional `pricing` and `tokenUsage` fields are correctly added to their respective interfaces, maintaining backward compatibility while enabling the new token tracking feature.

Also applies to: 40-40, 138-138
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/evaluators/evaluator.ts (1)
10-10: LGTM: Optional method enables token usage tracking.

The optional method maintains backward compatibility while enabling token usage tracking as intended by the PR objectives.
Optional refinements:
The return type could be simplified from `TokenUsage | undefined` to just `TokenUsage`, since the `?` already signals the method may not exist. However, the explicit `undefined` makes it clear the method can return undefined even when implemented (e.g., before first evaluation).

Consider adding JSDoc to document the method's purpose:
```ts
/**
 * Returns token usage from the last evaluation, if available.
 */
getLastUsage?(): TokenUsage | undefined;
```

src/cli/orchestrator.ts (1)
940-941: Consider passing `options.pricing` directly.

The `|| {}` fallback is unnecessary since `calculateCost` already handles `undefined` pricing gracefully (returns `undefined` when pricing config is missing). This simplifies the code and removes an unneeded intermediate variable.

🔎 Suggested simplification
```diff
- // Calculate cost if pricing is configured
- const pricing = options.pricing || {};
- const cost = calculateCost({ inputTokens: totalInputTokens, outputTokens: totalOutputTokens }, pricing);
+ // Calculate cost if pricing is configured
+ const cost = calculateCost({ inputTokens: totalInputTokens, outputTokens: totalOutputTokens }, options.pricing);
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/cli/commands.ts
- src/cli/orchestrator.ts
- src/cli/types.ts
- src/evaluators/evaluator.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- src/cli/types.ts
- src/cli/commands.ts
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.ts
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.ts: Use TypeScript ESM with explicit imports and narrow types
Use 2-space indentation; avoid trailing whitespace
Maintain strict TypeScript with no `any`; use `unknown` + schema validation for external data
Use custom error types with proper inheritance; catch blocks use `unknown` type
Files:
src/evaluators/evaluator.ts, src/cli/orchestrator.ts
🧠 Learnings (2)
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to src/output/**/*.ts : IDs should be shown as `PromptId.CriterionId` in output
Applied to files:
src/cli/orchestrator.ts
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to evals/**/*.md : Eval YAML should use `name` (human), `id` (PascalCase), and criteria `id` (PascalCase)
Applied to files:
src/cli/orchestrator.ts
🧬 Code graph analysis (2)
src/evaluators/evaluator.ts (4)
src/evaluators/index.ts (1)
`Evaluator` (13-13)
src/cli/types.ts (1)
`EvaluationResult` (33-41)
src/prompts/schema.ts (1)
`EvaluationResult` (171-171)
src/providers/token-usage.ts (1)
`TokenUsage` (1-4)
src/cli/orchestrator.ts (2)
src/cli/types.ts (1)
`RunPromptEvaluationResultSuccess` (120-124)
src/providers/token-usage.ts (2)
`TokenUsageStats` (6-10), `calculateCost` (21-30)
🔇 Additional comments (4)
src/evaluators/evaluator.ts (1)
2-2: LGTM: Type import follows best practices.

The type-only import is correctly specified and follows TypeScript ESM guidelines.
src/cli/orchestrator.ts (3)
701-709: LGTM!

Clean use of optional chaining for `getLastUsage?.()` and conditional attachment of usage data to the result object. This correctly handles evaluators that may not support token usage tracking.
824-829: LGTM!

Token usage accumulation is correctly guarded and only processes successful evaluations that return usage data.

910-915: LGTM!

Aggregation of token usage across files is correctly guarded.
There are TS errors in the following files:
Have you been able to test this feature with all supported providers?
- Remove `getLastUsage()` method and `lastUsage` field from BaseEvaluator
- Move token usage from orchestrator result wrapper to nested evaluation result
- Update `RunPromptEvaluationResultSuccess` to remove top-level usage field
- Modify accuracy evaluator to aggregate token usage from claim extraction and base evaluation
- Update orchestrator to access usage from `result.usage` instead of `r.usage`
- Create `ClaimExtractionResult` interface to return claims with optional usage data
- Ensure token usage is properly accumulated and returned within evaluation results
- Simplifies the result structure by keeping usage data closer to the actual evaluation data
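The aggregation step described in these commit notes (claim-extraction usage plus base-evaluation usage) can be sketched as a small helper. The helper name and field names are assumptions for illustration, not the repo's actual code:

```typescript
// Hedged sketch: combine usage from two pipeline steps, where either may be
// missing (e.g., an evaluator that doesn't report usage).
interface TokenUsage { inputTokens: number; outputTokens: number; }

function addUsage(a?: TokenUsage, b?: TokenUsage): TokenUsage | undefined {
  if (!a) return b; // only one side reported usage (or neither)
  if (!b) return a;
  return {
    inputTokens: a.inputTokens + b.inputTokens,
    outputTokens: a.outputTokens + b.outputTokens,
  };
}
```

Keeping the result optional mirrors the PR's design: if neither step reported usage, the evaluation result simply carries no `usage` field.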
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/evaluators/accuracy-evaluator.ts (1)
64-68: Consider unconditionally assigning usage to avoid mutation.

The pattern of conditionally mutating `result` after creation works, but could be cleaner. Since `calculateSemiObjectiveResult` returns a mutable object, this is fine, but an alternative would be to spread usage into a new object.

🔎 Optional refactor for immutability
```diff
 if (claims.length === 0) {
   const wordCount = content.trim().split(/\s+/).length || 1;
-  const result = this.calculateSemiObjectiveResult([], wordCount);
-  if (claimUsage) result.usage = claimUsage;
-  return result;
+  return {
+    ...this.calculateSemiObjectiveResult([], wordCount),
+    ...(claimUsage && { usage: claimUsage }),
+  };
 }
```
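The conditional spread in the suggested refactor, `...(claimUsage && { usage: claimUsage })`, adds the key only when the value is truthy. A standalone demonstration with simplified stand-in types:

```typescript
// Simplified stand-in types for demonstration only
interface TokenUsage { inputTokens: number; outputTokens: number; }

function buildResult(claims: string[], usage?: TokenUsage) {
  // Spreading `undefined` into an object literal is a no-op, so the `usage`
  // key is absent entirely (not present with value `undefined`) when no
  // usage was recorded.
  return { claims, ...(usage && { usage }) };
}

console.log('usage' in buildResult(['a'])); // false
console.log('usage' in buildResult(['a'], { inputTokens: 10, outputTokens: 2 })); // true
```

This is why the refactor is safe: downstream `result.usage` checks behave identically whether the key is absent or the field was never assigned.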
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- src/cli/orchestrator.ts
- src/cli/types.ts
- src/evaluators/accuracy-evaluator.ts
- src/evaluators/base-evaluator.ts
- src/prompts/schema.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/evaluators/base-evaluator.ts
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.ts
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.ts: Use TypeScript ESM with explicit imports and narrow types
Use 2-space indentation; avoid trailing whitespace
Maintain strict TypeScript with no `any`; use `unknown` + schema validation for external data
Use custom error types with proper inheritance; catch blocks use `unknown` type
Files:
src/prompts/schema.ts, src/cli/orchestrator.ts, src/cli/types.ts, src/evaluators/accuracy-evaluator.ts
🧠 Learnings (3)
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to src/output/**/*.ts : IDs should be shown as `PromptId.CriterionId` in output
Applied to files:
src/cli/orchestrator.ts, src/cli/types.ts
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to evals/**/*.md : Eval YAML should use `name` (human), `id` (PascalCase), and criteria `id` (PascalCase)
Applied to files:
src/cli/orchestrator.ts
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to src/providers/**/*.ts : Depend on `LLMProvider` and `SearchProvider` interfaces; keep providers thin (transport only)
Applied to files:
src/evaluators/accuracy-evaluator.ts
🧬 Code graph analysis (4)
src/prompts/schema.ts (2)
src/providers/token-usage.ts (1)
`TokenUsage` (1-4)
src/evaluators/types.ts (1)
`EvaluationType` (14-17)
src/cli/orchestrator.ts (2)
src/cli/types.ts (1)
`RunPromptEvaluationResultSuccess` (120-123)
src/providers/token-usage.ts (2)
`TokenUsageStats` (6-10), `calculateCost` (21-30)
src/cli/types.ts (9)
src/prompts/prompt-loader.ts (3)
`PromptFile` (8-8), `PromptCriterionSpec` (8-8), `PromptMeta` (8-8)
src/schemas/prompt-schemas.ts (3)
`PromptFile` (64-64), `PromptCriterionSpec` (62-62), `PromptMeta` (63-63)
src/providers/llm-provider.ts (1)
`LLMProvider` (8-10)
src/providers/search-provider.ts (1)
`SearchProvider` (5-7)
src/boundaries/file-section-parser.ts (1)
`FilePatternConfig` (2-6)
src/providers/token-usage.ts (2)
`PricingConfig` (12-15), `TokenUsageStats` (6-10)
src/prompts/schema.ts (2)
`EvaluationResult` (174-174), `SubjectiveResult` (123-143)
src/output/reporter.ts (1)
`EvaluationSummary` (7-11)
src/output/json-formatter.ts (2)
`JsonFormatter` (54-94), `ScoreComponent` (10-18)
src/evaluators/accuracy-evaluator.ts (1)
src/providers/token-usage.ts (1)
`TokenUsage` (1-4)
🔇 Additional comments (14)
src/prompts/schema.ts (1)
2-2: LGTM!

The `TokenUsage` type is correctly imported and consistently added as an optional field to both `SubjectiveResult` and `SemiObjectiveResult`. This maintains type safety while allowing evaluation results to optionally carry token usage metadata from LLM providers.

Also applies to: 142-142, 171-171
src/evaluators/accuracy-evaluator.ts (3)
28-31: LGTM!

The `ClaimExtractionResult` interface cleanly encapsulates the return type of `extractClaims`, making the optional `usage` field explicit in the contract.
95-101: LGTM on token aggregation logic.

The aggregation correctly combines claim extraction usage with the base evaluator's usage, using `|| 0` to handle cases where `result.usage` might be undefined. This ensures accurate total token counts across the multi-step evaluation pipeline.
129-139: LGTM on claim extraction with usage propagation.

The destructuring correctly extracts both `data` and `usage` from the LLM provider's structured response. The conditional spread `...(usage && { usage })` is an idiomatic way to optionally include usage in the return object.

src/cli/orchestrator.ts (5)
1-27: LGTM on imports.

The imports are well-organized, bringing in the necessary types (`TokenUsageStats`) and utilities (`calculateCost`) for token usage tracking.
701-704: LGTM on typed success result.

Using the `RunPromptEvaluationResultSuccess` interface provides better type safety and aligns with the discriminated union pattern used for `RunPromptEvaluationResult`.
819-823: LGTM on per-prompt usage accumulation.

The accumulation correctly extracts token counts from each successful prompt evaluation result. The optional chaining on `r.result.usage` safely handles cases where usage data isn't present.
846-862: LGTM on file-level token usage aggregation.

The `TokenUsageStats` object correctly captures per-file totals and is properly included in the `EvaluateFileResult`.
928-941: LGTM on cost calculation integration.

The cost calculation is correctly applied only when pricing config is available, and the result is conditionally added to `tokenUsage.totalCost`. This aligns with the `calculateCost` function signature, which returns `undefined` when pricing isn't configured.

src/cli/types.ts (5)
12-12: LGTM on type imports.

Using `type` imports for `TokenUsageStats` and `PricingConfig` correctly indicates these are type-only imports, which helps with tree-shaking and ensures no runtime code is generated.
30-30: LGTM on pricing config option.

Adding `pricing?: PricingConfig` to `EvaluationOptions` enables cost calculation when users provide pricing information via environment variables.
40-40: LGTM on token usage in EvaluationResult.

The optional `tokenUsage` field properly surfaces aggregated token statistics to callers of `evaluateFiles`.
120-127: LGTM on discriminated union refactor.

Extracting `RunPromptEvaluationResultSuccess` as a named interface improves code clarity and enables the orchestrator to reference this type explicitly when constructing success results. The union at lines 125-127 maintains the discriminated union pattern with `ok` as the discriminant.
135-138: LGTM on file result token usage.

Adding `tokenUsage?: TokenUsageStats` to `EvaluateFileResult` enables per-file token tracking, which is then aggregated in `evaluateFiles` for the overall total.
The errors from these files aren't related to this PR. I remember fixing them, but can't remember in which PR.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/providers/anthropic-provider.ts (2)
95-95: Clarify the necessity of setting `stream: false` in both locations.

The `stream: false` parameter is set on both line 95 (in `params`) and line 131 (in `anthropicParams`). Unlike other E2E compatibility aliases (e.g., `maxTokens`, `toolChoice`), this uses the same snake_case name in both places.

Since `anthropicParams` is constructed independently and sent to the Anthropic API, is the `stream: false` on line 95 necessary for E2E mocks? If not, it could be removed from the initial `params` object and only set in `anthropicParams`.

This addresses the previous review question: `stream: false` ensures the API returns a complete response rather than a streaming response.

🔎 Optional: Remove redundant stream parameter if not needed for E2E

If E2E mocks don't require `stream` in the params object, you could simplify:

```diff
   max_tokens: this.config.maxTokens!,
   tools: [toolSchema],
   tool_choice: { type: 'tool', name: schema.name },
-  stream: false,
   maxTokens: this.config.maxTokens!,
   toolChoice: { type: 'tool', name: schema.name },
```
132-134: Consider simplifying conditional spreads for always-defined fields.

The conditional spreads for `system`, `tools`, and `tool_choice` check for `undefined`, but these fields are always set earlier in the code:

- `system` is set on line 85 from `systemPrompt` (line 76)
- `tools` is always set on line 93
- `tool_choice` is always set on line 94

Unless there are edge cases or future scenarios where these might be undefined, you could simplify to unconditional spreads or direct assignment.
🔎 Simplified approach
```diff
 const anthropicParams: Anthropic.Messages.MessageCreateParams = {
   model: params.model,
   messages: params.messages,
   max_tokens: params.max_tokens,
   stream: false,
-  ...(params.system !== undefined && { system: params.system }),
-  ...(params.tools !== undefined && { tools: params.tools }),
-  ...(params.tool_choice !== undefined && { tool_choice: params.tool_choice }),
+  system: params.system,
+  tools: params.tools,
+  tool_choice: params.tool_choice,
   ...(params.temperature !== undefined && { temperature: params.temperature }),
 };
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/evaluators/base-evaluator.ts
- src/providers/anthropic-provider.ts
- tests/scoring-types.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/scoring-types.test.ts
🧰 Additional context used
📓 Path-based instructions (2)
src/**/*.ts
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.ts: Use TypeScript ESM with explicit imports and narrow types
Use 2-space indentation; avoid trailing whitespace
Maintain strict TypeScript with no `any`; use `unknown` + schema validation for external data
Use custom error types with proper inheritance; catch blocks use `unknown` type
Files:
src/providers/anthropic-provider.ts, src/evaluators/base-evaluator.ts
src/providers/**/*.ts
📄 CodeRabbit inference engine (AGENTS.md)
src/providers/**/*.ts: Depend on `LLMProvider` and `SearchProvider` interfaces; keep providers thin (transport only)
Inject `RequestBuilder` via provider constructor to avoid coupling
Files:
src/providers/anthropic-provider.ts
🧠 Learnings (1)
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to src/providers/**/*.ts : Depend on `LLMProvider` and `SearchProvider` interfaces; keep providers thin (transport only)
Applied to files:
src/providers/anthropic-provider.ts
🧬 Code graph analysis (2)
src/providers/anthropic-provider.ts (1)
src/providers/llm-provider.ts (1)
`LLMResult` (3-6)
src/evaluators/base-evaluator.ts (1)
src/prompts/schema.ts (2)
`SubjectiveLLMResult` (96-110), `SemiObjectiveLLMResult` (112-121)
🔇 Additional comments (4)
src/evaluators/base-evaluator.ts (1)
71-75: No changes needed: usage field is already properly typed.

The `usage` field in both `SubjectiveResult` and `SemiObjectiveResult` is already typed as `usage?: TokenUsage;`, where `TokenUsage` is a well-defined interface with `inputTokens` and `outputTokens` properties. The code already complies with the coding guidelines requiring strict TypeScript without `any`.

Likely an incorrect or invalid review comment.
src/providers/anthropic-provider.ts (3)
3-3: LGTM! Return type updated to support token usage tracking.

The import of `LLMResult` and the updated return type correctly align with the standardized interface for tracking token usage across providers.

Also applies to: 75-75
231-233: LGTM! Improved error handling for text blocks.

The safer extraction of `firstTextBlock` with an explicit existence check before accessing its properties is a good improvement that prevents potential undefined access errors.
163-171: No changes required. The `ANTHROPIC_RESPONSE_SCHEMA` defines `usage` as a required field (line 33 in `src/schemas/anthropic-responses.ts`), and `ANTHROPIC_USAGE_SCHEMA` requires both `input_tokens` and `output_tokens` as non-optional numbers. After schema validation via `ANTHROPIC_RESPONSE_SCHEMA.parse()`, TypeScript guarantees `validatedResponse.usage` is present, making direct access to `validatedResponse.usage.input_tokens` and `validatedResponse.usage.output_tokens` type-safe with no null checks needed.

Likely an incorrect or invalid review comment.
…prove type safety
feat(token-usage): Add token usage tracking and cost calculation
This PR resolves #35.
This PR implements comprehensive token usage tracking and cost calculation for LLM evaluations. It enables users to monitor token consumption per run and estimate costs by configuring pricing rates via environment variables.
Key Changes
Configuration
To enable cost estimation, add the following to your .env file (rates per 1 million tokens):
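A hypothetical example of such a configuration (the variable names below are placeholders, not the repo's actual keys; check the project's README or .env.example for the real names):

```shell
# Hypothetical variable names — rates are per 1 million tokens
LLM_INPUT_TOKEN_PRICE=3.00
LLM_OUTPUT_TOKEN_PRICE=15.00
```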
Example Output
Summary by CodeRabbit
New Features
Tests
Documentation / Config