
feat(token-usage): Add token usage tracking and cost calculation #40

Merged

hurshore merged 15 commits into main from ft/token-usage on Dec 30, 2025

Conversation

ayo6706 (Collaborator) commented Dec 18, 2025

This PR resolves #35.

This PR implements comprehensive token usage tracking and cost calculation for LLM evaluations. It enables users to monitor token consumption per run and estimate costs by configuring pricing rates via environment variables.

Key Changes

  • Core Types: Added TokenUsage, TokenUsageStats, and PricingConfig interfaces to standardize tracking across the system.
  • Provider Updates: Updated all LLM providers (Anthropic, OpenAI, Azure, Gemini) to extract and return input/output token counts in standard LLMResult format.
  • CLI Aggregation: The orchestrator now aggregates token counts across all evaluated files and prompts.
  • Reporting: Added a new summary section to the CLI output that displays total input/output tokens and estimated cost.
  • Configuration: Introduced INPUT_PRICE_PER_MILLION and OUTPUT_PRICE_PER_MILLION environment variables to enable cost calculation.
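
The core pieces above can be sketched in TypeScript. This is a simplified sketch based on the types and function named in this PR, not the exact source:

```typescript
// Simplified sketch of the core token-usage types and cost calculation.
// Shapes follow the PR description; the real source may differ in detail.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

interface PricingConfig {
  inputPricePerMillion?: number;
  outputPricePerMillion?: number;
}

// Returns undefined when pricing is missing or incomplete, so the reporter
// can skip the cost line gracefully.
function calculateCost(usage: TokenUsage, pricing?: PricingConfig): number | undefined {
  if (
    !pricing ||
    pricing.inputPricePerMillion === undefined ||
    pricing.outputPricePerMillion === undefined
  ) {
    return undefined;
  }
  const inputCost = (usage.inputTokens / 1_000_000) * pricing.inputPricePerMillion;
  const outputCost = (usage.outputTokens / 1_000_000) * pricing.outputPricePerMillion;
  return inputCost + outputCost;
}
```

Rates are expressed per one million tokens, matching the environment variables described below.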

Configuration

To enable cost estimation, add the following to your .env file (rates per 1 million tokens):

INPUT_PRICE_PER_MILLION=3.00   # e.g., $3.00/1M input tokens
OUTPUT_PRICE_PER_MILLION=15.00 # e.g., $15.00/1M output tokens
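
For illustration, turning these variables into a pricing config might look like the sketch below. The project actually validates them through its Zod env schema (with .positive()); readPricing here is a hypothetical helper:

```typescript
// Hypothetical helper: map the two env vars onto a pricing config.
// The real CLI validates these via its env schema; this only illustrates
// the expected semantics (positive numbers, otherwise treated as unset).
function readPricing(env: Record<string, string | undefined>): {
  inputPricePerMillion?: number;
  outputPricePerMillion?: number;
} {
  const parse = (v: string | undefined): number | undefined => {
    if (v === undefined) return undefined;
    const n = Number(v);
    return Number.isFinite(n) && n > 0 ? n : undefined;
  };
  return {
    inputPricePerMillion: parse(env.INPUT_PRICE_PER_MILLION),
    outputPricePerMillion: parse(env.OUTPUT_PRICE_PER_MILLION),
  };
}
```

When either variable is absent, the cost line is simply omitted from the output.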

Example Output

Token Usage:
  - Input tokens: 15,420
  - Output tokens: 4,210
  - Total cost: $0.1094
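
The totals above are consistent with the example rates: 15,420 input tokens at $3.00/1M plus 4,210 output tokens at $15.00/1M come to $0.1094:

```typescript
// Checking the example output against the configured rates.
const inputCost = (15_420 / 1_000_000) * 3.0;   // 0.04626
const outputCost = (4_210 / 1_000_000) * 15.0;  // 0.06315
const total = inputCost + outputCost;           // 0.10941
console.log(`$${total.toFixed(4)}`);            // prints "$0.1094"
```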

Summary by CodeRabbit

  • New Features

    • Evaluation reports now include token usage (input/output tokens) and optional total cost.
    • CLI prints a token usage summary; pricing can be provided via INPUT_PRICE_PER_MILLION and OUTPUT_PRICE_PER_MILLION env vars.
    • Provider responses now return a wrapped result with data + usage, enabling downstream usage reporting.
  • Tests

    • Added/updated tests for token usage reporting, cost calculation, and wrapped response shape.
  • Documentation / Config

    • Scan config runRules now defaults to an empty list.


- Add token usage tracking to LLM providers (Anthropic, Azure OpenAI, Gemini, OpenAI)
- Implement LLMResult wrapper type to return both data and token usage from provider calls
- Add TokenUsageStats type and calculateCost function for pricing calculations
- Add environment variables for input and output token pricing configuration
- Integrate token usage accumulation in orchestrator during file evaluation
- Add printTokenUsage function to display token usage and cost in Line output format
- Include token usage stats in EvaluateFileResult for downstream consumption
- Add comprehensive tests for token usage calculation and provider integration
- Update provider interfaces to return structured results with usage metadata
- Move token-usage.ts from src/types/ to src/providers/ for better architectural organization
- Update all import paths across codebase to reference new token-usage location
- Add pricing configuration parameter to EvaluationOptions interface
- Pass pricing config from CLI commands through orchestrator to evaluation functions
- Remove redundant environment variable parsing from orchestrator, use passed config instead
- Update PricingConfig type annotations for explicit undefined handling
- Change config schema runRules default from optional to empty array
- Consolidate token usage and pricing logic in providers module for improved separation of concerns
coderabbitai bot commented Dec 18, 2025

📝 Walkthrough

Walkthrough

Adds token-usage collection and optional cost calculation: LLM providers now return usage with responses; evaluators and orchestrator aggregate usage and cost (when pricing provided); CLI reads pricing and prints token usage and total cost via reporter.

Changes

  • Provider interfaces & implementations (src/providers/llm-provider.ts, src/providers/anthropic-provider.ts, src/providers/azure-openai-provider.ts, src/providers/gemini-provider.ts, src/providers/openai-provider.ts): Introduce LLMResult<T> ({ data: T; usage?: TokenUsage }) and change runPromptStructured to return Promise<LLMResult<T>>; providers now wrap parsed data and attach input/output token usage.
  • Token usage & pricing utilities (src/providers/token-usage.ts): New types TokenUsage, TokenUsageStats, PricingConfig and calculateCost(usage, pricing?), which computes a cost or returns undefined if pricing is incomplete.
  • CLI types & orchestration (src/cli/types.ts, src/cli/orchestrator.ts, src/cli/commands.ts): Types extended with optional tokenUsage and pricing?; evaluateFiles aggregates usage and computes totalCost when pricing is provided; the CLI reads env pricing, validates the output format, and prints token usage via printTokenUsage.
  • Evaluators & prompt flows (src/evaluators/base-evaluator.ts, src/evaluators/accuracy-evaluator.ts, ...src/evaluators/*): Consumers of runPromptStructured now handle LLMResult and propagate/merge usage into Subjective/SemiObjective results and per-file/overall evaluation outputs; claim extraction includes usage.
  • Reporting (src/output/reporter.ts): New exported printTokenUsage(stats: TokenUsageStats) prints input tokens, output tokens, and optional total cost.
  • Schemas / configuration (src/schemas/env-schemas.ts, src/schemas/config-schemas.ts): Env schema merged with base pricing fields INPUT_PRICE_PER_MILLION and OUTPUT_PRICE_PER_MILLION; runRules in the config schema now defaults to [].
  • Prompt schema changes (src/prompts/schema.ts): SubjectiveResult and SemiObjectiveResult gain optional usage?: TokenUsage.
  • Tests updated / added (tests/*, e.g., tests/anthropic-e2e.test.ts, tests/anthropic-provider.test.ts, tests/openai-provider.test.ts, tests/scoring-types.test.ts, tests/token-usage.test.ts): Tests adapted to the LLMResult shape (result.data) and to assert usage values; new tests for calculateCost() verify pricing scenarios.
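
The LLMResult<T> wrapper described above is small; a sketch of the shape and of how a provider might populate it (the usage field names come from this PR, while the raw-response shape and wrapResponse helper are illustrative, using OpenAI-style field names):

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Wrapper returned by runPromptStructured: parsed payload plus optional usage.
interface LLMResult<T> {
  data: T;
  usage?: TokenUsage;
}

// Illustrative provider-side wrapping; usage is only attached when the raw
// response actually reports it.
function wrapResponse<T>(
  parsed: T,
  rawUsage?: { prompt_tokens: number; completion_tokens: number }
): LLMResult<T> {
  const result: LLMResult<T> = { data: parsed };
  if (rawUsage) {
    result.usage = {
      inputTokens: rawUsage.prompt_tokens,
      outputTokens: rawUsage.completion_tokens,
    };
  }
  return result;
}
```

Keeping usage optional is what lets downstream aggregation guard on its presence rather than assume every provider reports counts.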

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI as CLI (commands.ts)
    participant Orch as Orchestrator (orchestrator.ts)
    participant Eval as Evaluator (base/accuracy)
    participant LLM as LLM Provider
    participant Reporter as Reporter (reporter.ts)

    User->>CLI: run evaluation (env pricing may be set)
    CLI->>CLI: read INPUT_PRICE_PER_MILLION / OUTPUT_PRICE_PER_MILLION
    CLI->>Orch: evaluateFiles(options { pricing, outputFormat })

    rect rgb(240,248,255)
      Orch->>Orch: iterate files & prompts
      Orch->>Eval: runPromptEvaluation(...)
      Eval->>LLM: runPromptStructured(prompt)
      LLM-->>Eval: LLMResult { data, usage{ inputTokens, outputTokens } }
      Eval-->>Orch: Prompt result (includes usage)
      Orch->>Orch: aggregate usage per-file & total
    end

    rect rgb(255,250,240)
      Orch->>Orch: if pricing present → calculateCost(totalUsage, pricing)
      Orch-->>CLI: EvaluationResult { tokenUsage:{ totalInputTokens, totalOutputTokens, totalCost? } }
      CLI->>Reporter: printTokenUsage(tokenUsage)
      Reporter-->>User: display token counts and optional cost
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 66.67%, which is below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (4 passed)
  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed: The pull request title accurately summarizes the main objective: adding token usage tracking and cost calculation across the LLM providers and CLI output.
  • Linked Issues check ✅ Passed: The pull request implements all core requirements from issue #35: token tracking infrastructure (TokenUsage, TokenUsageStats, PricingConfig types), per-provider token extraction, orchestrator aggregation, and CLI cost display including input tokens, output tokens, and total cost.
  • Out of Scope Changes check ✅ Passed: The pull request includes scope-appropriate changes: token-usage provider implementation, LLM provider updates to extract tokens, orchestrator aggregation, CLI reporting, schema updates, and related tests. Anthropic provider refactoring (type-safety improvements) is complementary to token tracking. The config schema change to default runRules to an empty array is a minor ancillary improvement unrelated to the core objectives but acceptable.

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e2b1b1d and 4240bae.

📒 Files selected for processing (2)
  • src/evaluators/base-evaluator.ts
  • src/providers/anthropic-provider.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/providers/anthropic-provider.ts
  • src/evaluators/base-evaluator.ts


coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cli/orchestrator.ts (1)

786-837: Add tokenUsage aggregation to evaluateFiles or update EvaluationResult.

The evaluateFile function returns tokenUsage per file, but evaluateFiles does not aggregate it. All other per-file metrics (errors, warnings, requestFailures, and status flags) follow a consistent aggregation pattern, yet tokenUsage is excluded from both the aggregation loop and the EvaluationResult return type. Either aggregate token usage across files (tracking cumulative input/output tokens and cost) to match the pattern of other metrics, or document why token usage is intentionally excluded from multi-file results.

🧹 Nitpick comments (6)
src/output/reporter.ts (1)

203-210: Consider dynamic precision for cost display.

The cost is formatted with 4 decimal places, which works well for small amounts (e.g., $0.0001) but may be excessive for larger costs (e.g., $123.4567). Consider using dynamic precision based on the cost magnitude, or at least 2 decimal places for costs above $1.

🔎 View suggested refactor
 export function printTokenUsage(stats: TokenUsageStats) {
   console.log(chalk.bold('\nToken Usage:'));
   console.log(`  - Input tokens: ${stats.totalInputTokens.toLocaleString()}`);
   console.log(`  - Output tokens: ${stats.totalOutputTokens.toLocaleString()}`);
   if (stats.totalCost !== undefined) {
-    console.log(`  - Total cost: $${stats.totalCost.toFixed(4)}`);
+    const decimals = stats.totalCost >= 1 ? 2 : 4;
+    console.log(`  - Total cost: $${stats.totalCost.toFixed(decimals)}`);
   }
 }
src/providers/token-usage.ts (2)

12-15: Remove redundant | undefined type annotation.

The ? operator already makes the fields type | undefined, so explicitly adding | undefined is redundant.

🔎 Apply this diff
 export interface PricingConfig {
-    inputPricePerMillion?: number | undefined;
-    outputPricePerMillion?: number | undefined;
+    inputPricePerMillion?: number;
+    outputPricePerMillion?: number;
 }

21-30: Consider validating non-negative token counts.

The calculateCost function doesn't validate that token counts are non-negative. While the upstream data is likely valid, defensive validation could prevent unexpected negative costs from malformed usage data.

🔎 View suggested validation
 export function calculateCost(usage: TokenUsage, pricing?: PricingConfig): number | undefined {
     if (!pricing || pricing.inputPricePerMillion === undefined || pricing.outputPricePerMillion === undefined) {
         return undefined;
     }
+    
+    if (usage.inputTokens < 0 || usage.outputTokens < 0) {
+        return undefined;
+    }
 
     const inputCost = (usage.inputTokens / 1_000_000) * pricing.inputPricePerMillion;
     const outputCost = (usage.outputTokens / 1_000_000) * pricing.outputPricePerMillion;
 
     return inputCost + outputCost;
 }
tests/anthropic-provider.test.ts (1)

339-343: Error mock pattern is valid but differs from OpenAI tests.

This file uses an explicit type cast pattern for error construction:

const mockApiError = anthropic.APIError as unknown as new (params: MockAPIErrorParams) => Error;

The OpenAI tests, by contrast, use @ts-expect-error comments. Both approaches work, but consider standardizing on one across test files for consistency.

src/cli/orchestrator.ts (2)

604-612: Type cast may fail for non-BaseEvaluator implementations.

The cast (evaluator as BaseEvaluator).getLastUsage?.() assumes all evaluators extend BaseEvaluator. If createEvaluator can return a different evaluator type that doesn't have getLastUsage, the optional chaining protects at runtime, but the explicit cast is misleading. Consider checking if the evaluator is an instance of BaseEvaluator first, or ensure the interface LLMProvider includes getLastUsage.

🔎 Suggested improvement:
-    const usage = (evaluator as BaseEvaluator).getLastUsage?.();
+    const usage = evaluator instanceof BaseEvaluator ? evaluator.getLastUsage?.() : undefined;

745-762: Consider initializing pricing to avoid passing an empty object.

When options.pricing is undefined, pricing becomes {}. The calculateCost function handles missing pricing properties by returning undefined, so this is functionally correct. However, explicitly passing options.pricing (which may be undefined) rather than an empty object is clearer.

🔎 Suggested improvement:
-  const pricing = options.pricing || {};
-
-  const tokenUsageStats: TokenUsageStats = {
-    totalInputTokens,
-    totalOutputTokens,
-  };
-
-  const cost = calculateCost(
-    {
-      inputTokens: totalInputTokens,
-      outputTokens: totalOutputTokens
-    },
-    pricing
-  );
+  const tokenUsageStats: TokenUsageStats = {
+    totalInputTokens,
+    totalOutputTokens,
+  };
+
+  const cost = calculateCost(
+    {
+      inputTokens: totalInputTokens,
+      outputTokens: totalOutputTokens
+    },
+    options.pricing
+  );
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 659ea2e and f43f4ff.

📒 Files selected for processing (18)
  • src/cli/commands.ts (3 hunks)
  • src/cli/orchestrator.ts (6 hunks)
  • src/cli/types.ts (4 hunks)
  • src/evaluators/base-evaluator.ts (5 hunks)
  • src/output/reporter.ts (2 hunks)
  • src/providers/anthropic-provider.ts (4 hunks)
  • src/providers/azure-openai-provider.ts (3 hunks)
  • src/providers/gemini-provider.ts (3 hunks)
  • src/providers/llm-provider.ts (1 hunks)
  • src/providers/openai-provider.ts (3 hunks)
  • src/providers/token-usage.ts (1 hunks)
  • src/schemas/config-schemas.ts (1 hunks)
  • src/schemas/env-schemas.ts (2 hunks)
  • tests/anthropic-e2e.test.ts (7 hunks)
  • tests/anthropic-provider.test.ts (8 hunks)
  • tests/openai-provider.test.ts (15 hunks)
  • tests/scoring-types.test.ts (6 hunks)
  • tests/token-usage.test.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (13)
src/output/reporter.ts (1)
src/providers/token-usage.ts (1)
  • TokenUsageStats (6-10)
tests/token-usage.test.ts (1)
src/providers/token-usage.ts (3)
  • TokenUsage (1-4)
  • PricingConfig (12-15)
  • calculateCost (21-30)
src/providers/openai-provider.ts (1)
src/providers/llm-provider.ts (1)
  • LLMResult (3-6)
tests/openai-provider.test.ts (1)
tests/schemas/mock-schemas.ts (1)
  • MockOpenAIClient (55-64)
src/cli/types.ts (1)
src/providers/token-usage.ts (3)
  • PricingConfig (12-15)
  • TokenUsage (1-4)
  • TokenUsageStats (6-10)
src/providers/llm-provider.ts (1)
src/providers/token-usage.ts (1)
  • TokenUsage (1-4)
src/providers/azure-openai-provider.ts (1)
src/providers/llm-provider.ts (1)
  • LLMResult (3-6)
src/providers/gemini-provider.ts (1)
src/providers/llm-provider.ts (1)
  • LLMResult (3-6)
src/providers/anthropic-provider.ts (1)
src/providers/llm-provider.ts (1)
  • LLMResult (3-6)
src/evaluators/base-evaluator.ts (2)
src/providers/token-usage.ts (1)
  • TokenUsage (1-4)
src/prompts/schema.ts (2)
  • SubjectiveLLMResult (74-82)
  • SemiObjectiveLLMResult (84-92)
src/cli/orchestrator.ts (4)
src/evaluators/index.ts (1)
  • BaseEvaluator (16-16)
src/cli/types.ts (1)
  • RunPromptEvaluationResultSuccess (130-134)
src/providers/token-usage.ts (2)
  • TokenUsageStats (6-10)
  • calculateCost (21-30)
src/output/reporter.ts (2)
  • printEvaluationSummaries (138-175)
  • printTokenUsage (203-210)
tests/anthropic-provider.test.ts (1)
tests/schemas/mock-schemas.ts (4)
  • MockAPIErrorParams (46-46)
  • MockRateLimitErrorParams (48-48)
  • MockAuthenticationErrorParams (47-47)
  • MockBadRequestErrorParams (49-49)
tests/scoring-types.test.ts (2)
src/providers/llm-provider.ts (1)
  • LLMResult (3-6)
src/prompts/schema.ts (2)
  • SubjectiveLLMResult (74-82)
  • SemiObjectiveLLMResult (84-92)
🔇 Additional comments (27)
src/schemas/config-schemas.ts (1)

11-11: The runRules default change is backward compatible and safe.

Using .default([]) makes the input optional (accepts undefined) while ensuring the output is never undefined, which is actually an improvement over .optional(). Existing code patterns in the codebase work correctly with this change:

  • The helper function in tests/utils.ts that checks if (runRules !== undefined) continues to work because Zod's .default() still accepts undefined input during parsing
  • The truthiness check in scan-path-resolver.ts (if (match.runRules)) works correctly—empty arrays are truthy but iterate zero times
  • The FilePatternConfig interface's optional runRules property accommodates the now-guaranteed string array type

This change aligns the type system to reflect the actual parsed output: runRules will always be present as either a user-provided array or an empty array default, eliminating the undefined case.

src/schemas/env-schemas.ts (1)

32-49: LGTM! Clean environment schema extension.

The BASE_ENV_SCHEMA with pricing fields is properly defined and consistently merged across all provider configurations. The use of .positive() ensures valid pricing when provided.

tests/token-usage.test.ts (1)

1-55: LGTM! Comprehensive test coverage.

The test suite thoroughly covers all scenarios including correct calculations, partial millions, missing pricing configurations, and edge cases like zero tokens.

tests/anthropic-e2e.test.ts (1)

144-152: LGTM! Test updates correctly reflect the LLMResult wrapper.

All test assertions have been properly updated to access result.data for the response payload and result.usage for token tracking, aligning with the new structured response format.

Also applies to: 238-242, 480-500, 558-565, 617-621

src/cli/commands.ts (1)

17-17: LGTM! Clean integration of pricing configuration.

The OutputFormat type cast is safe given the validated CLI options, and the pricing configuration is correctly passed through from environment variables to the orchestrator.

Also applies to: 160-176

src/evaluators/base-evaluator.ts (2)

28-48: LGTM! Clean token usage tracking integration.

The protected lastUsage field and public getLastUsage() accessor provide a clean API for external access to token usage without coupling the evaluator to specific consumers.


68-75: LGTM! Consistent usage tracking across evaluation paths.

Both subjective and semi-objective evaluation paths correctly destructure the LLMResult wrapper and conditionally store usage data when present.

Also applies to: 122-129

src/providers/llm-provider.ts (1)

1-10: LGTM! Excellent abstraction for structured LLM responses.

The LLMResult<T> wrapper provides a clean, type-safe interface for returning both the response data and optional usage metrics. The generic type parameter ensures type safety is preserved for the data payload while standardizing usage reporting across all providers.

src/providers/azure-openai-provider.ts (2)

135-145: LGTM! Token usage extraction implemented correctly.

The implementation correctly wraps the parsed data in LLMResult<T> and populates usage metadata when available. The mapping from prompt_tokens to inputTokens and from completion_tokens to outputTokens is consistent with the OpenAI provider.


51-51: Method signature correctly updated for LLMResult wrapper.

The return type change from Promise<T> to Promise<LLMResult<T>> aligns with the interface contract in llm-provider.ts.

tests/scoring-types.test.ts (2)

34-55: LGTM! Mock response correctly structured with LLMResult wrapper.

The test mock now properly uses the LLMResult<SubjectiveLLMResult> structure with nested data field, aligning with the provider's new return type.


172-172: Type cast is pragmatic for testing mock data.

The as unknown as LLMResult<any> cast is acceptable here since this test mocks claim extraction which uses a different schema than the typed evaluation results.

src/providers/openai-provider.ts (1)

168-179: LGTM! Token usage extraction follows consistent pattern.

The implementation correctly:

  1. Wraps parsed JSON data in LLMResult<T>
  2. Conditionally populates usage only when present in response
  3. Maps OpenAI field names (prompt_tokens, completion_tokens) to standardized names (inputTokens, outputTokens)
src/providers/anthropic-provider.ts (2)

95-95: Good addition of explicit stream: false.

Explicitly setting stream: false ensures the response includes complete usage metadata, which is required for token tracking.


166-172: Verify that usage is always present in Anthropic responses.

Unlike OpenAI/Azure providers which conditionally set usage, this implementation always accesses validatedResponse.usage.input_tokens and output_tokens. This assumes Anthropic always returns usage data.

Based on the ANTHROPIC_RESPONSE_SCHEMA validation, verify that usage is a required field. If not, this could throw when accessing properties on undefined.

#!/bin/bash
# Check Anthropic response schema to verify if usage is required
rg -n -A 20 'ANTHROPIC_RESPONSE_SCHEMA' --type ts
tests/openai-provider.test.ts (1)

200-209: LGTM! Test assertions correctly validate the new response structure.

The test properly verifies:

  1. result.data contains the expected parsed JSON
  2. result.usage is defined when the mock includes usage data
  3. Token counts are correctly mapped from OpenAI field names

The conditional check if (result.usage) is appropriate given usage is optional.

tests/anthropic-provider.test.ts (1)

175-183: LGTM! Test assertions properly validate LLMResult wrapper and usage.

The test correctly:

  1. Asserts result.data contains expected tool output
  2. Validates result.usage is defined
  3. Verifies inputTokens and outputTokens match mock values
src/cli/types.ts (3)

130-138: LGTM! Clean type definitions for token usage support.

The new RunPromptEvaluationResultSuccess interface cleanly separates the success case with optional usage field, and the discriminated union RunPromptEvaluationResult maintains type safety for error handling.


12-12: Imports correctly reference token-usage types.

The import from ../providers/token-usage aligns with the type definitions shown in the relevant code snippets.


146-148: EvaluateFileResult properly extended for aggregated token usage.

Adding tokenUsage?: TokenUsageStats enables propagating aggregated token counts through the evaluation pipeline for final cost calculation.

src/providers/gemini-provider.ts (3)

2-2: LGTM!

Import of LLMResult aligns with the updated return type.


44-48: LGTM!

Return type correctly updated to Promise<LLMResult<T>> to match the interface contract.


77-93: Property names are correct.

The promptTokenCount and candidatesTokenCount properties are the correct property names for usageMetadata in the Gemini API, and the nullish coalescing operator provides appropriate safety for optional properties.
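
The mapping with nullish-coalescing safety might look like the sketch below (the usageMetadata field names are as confirmed above; the surrounding helper is illustrative, not the provider's actual code):

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Gemini responses expose usage under usageMetadata with these field names.
interface GeminiUsageMetadata {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
}

function geminiUsage(meta?: GeminiUsageMetadata): TokenUsage {
  return {
    inputTokens: meta?.promptTokenCount ?? 0,    // ?? guards the optional fields
    outputTokens: meta?.candidatesTokenCount ?? 0,
  };
}
```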

src/cli/orchestrator.ts (4)

8-8: LGTM!

Import of printTokenUsage is consistent with the new token usage reporting feature.


13-13: LGTM!

Import of BaseEvaluator needed for accessing getLastUsage() method.


631-632: LGTM!

Token accumulators properly initialized to zero.


720-724: LGTM!

Token usage accumulation correctly guards against undefined r.usage before adding.

- Import and call printTokenUsage function in main command handler
- Aggregate token usage statistics across all evaluated files
- Calculate total input and output tokens from file results
- Compute aggregated cost based on configured pricing
- Add tokenUsage field to EvaluationResult interface
- Improve cost formatting in reporter with dynamic precision (4 decimals for costs < $1, 2 for larger amounts)
- Token usage stats now displayed after global summary when available
coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/cli/orchestrator.ts (1)

604-612: Type assertion could be made more defensive.

The (evaluator as BaseEvaluator).getLastUsage?.() pattern works due to optional chaining but relies on knowing that all evaluators extend BaseEvaluator. Consider adding a type guard or checking the method existence more explicitly if the evaluator hierarchy might change.

That said, the current implementation is functionally safe with optional chaining.

🔎 Alternative approach using explicit check:
-    const usage = (evaluator as BaseEvaluator).getLastUsage?.();
+    const usage = 'getLastUsage' in evaluator
+      ? (evaluator as BaseEvaluator).getLastUsage()
+      : undefined;
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f43f4ff and 244e3ae.

📒 Files selected for processing (4)
  • src/cli/commands.ts (4 hunks)
  • src/cli/orchestrator.ts (9 hunks)
  • src/cli/types.ts (5 hunks)
  • src/output/reporter.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/cli/types.ts
🧰 Additional context used
🧬 Code graph analysis (3)
src/output/reporter.ts (1)
src/providers/token-usage.ts (1)
  • TokenUsageStats (6-10)
src/cli/commands.ts (1)
src/output/reporter.ts (1)
  • printTokenUsage (203-212)
src/cli/orchestrator.ts (4)
src/evaluators/base-evaluator.ts (1)
  • BaseEvaluator (27-212)
src/cli/types.ts (1)
  • RunPromptEvaluationResultSuccess (131-135)
src/providers/token-usage.ts (2)
  • TokenUsageStats (6-10)
  • calculateCost (21-30)
src/output/reporter.ts (1)
  • printTokenUsage (203-212)
🔇 Additional comments (5)
src/output/reporter.ts (1)

203-212: LGTM!

The implementation is clean and handles the optional totalCost appropriately. The dynamic precision (4 decimals for sub-dollar costs, 2 for larger amounts) provides good UX for typical LLM API pricing.

src/cli/commands.ts (2)

172-175: Pricing configuration integration looks correct.

The pricing object correctly passes the environment variables to the orchestrator. The calculateCost function in token-usage.ts handles undefined values gracefully, so this is safe even when pricing is not configured.


186-190: LGTM!

The token usage is printed after the global summary as specified in the requirements, and the guard on result.tokenUsage existence is appropriate.

src/cli/orchestrator.ts (2)

720-724: Token usage aggregation logic is correct.

The accumulation of input/output tokens both at the prompt level within a file and across files is implemented correctly. The cost is recalculated from aggregated tokens at the end rather than summing per-file costs, which is mathematically equivalent and avoids potential floating-point accumulation errors.

Also applies to: 819-824
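
The equivalence noted above follows from cost being linear in token counts; a small illustrative check (numbers made up):

```typescript
// Cost is linear in tokens, so recomputing from aggregated totals equals
// summing per-file costs (up to floating-point rounding).
const perFile = [
  { inputTokens: 1_200, outputTokens: 300 },
  { inputTokens: 4_500, outputTokens: 900 },
];
const rateIn = 3 / 1_000_000;   // $3 per 1M input tokens
const rateOut = 15 / 1_000_000; // $15 per 1M output tokens

const summed = perFile
  .map((u) => u.inputTokens * rateIn + u.outputTokens * rateOut)
  .reduce((a, b) => a + b, 0);

const totalIn = perFile.reduce((a, u) => a + u.inputTokens, 0);
const totalOut = perFile.reduce((a, u) => a + u.outputTokens, 0);
const recomputed = totalIn * rateIn + totalOut * rateOut;
```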


838-849: LGTM!

The final token usage stats calculation correctly aggregates totals and only sets totalCost when pricing is configured. This is consistent with the type definition where totalCost is optional.

coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/cli/orchestrator.ts (3)

607-615: Consider clarifying the type safety pattern.

The code casts evaluator to BaseEvaluator and then uses optional chaining on getLastUsage, which is somewhat redundant. If all evaluators extend BaseEvaluator, the optional chaining is unnecessary; if some don't, the cast is misleading. The current code is safe and functional, but could be clearer.

💡 Alternative approaches

Option 1: If all evaluators extend BaseEvaluator, remove optional chaining:

-const usage = (evaluator as BaseEvaluator).getLastUsage?.();
+const usage = (evaluator as BaseEvaluator).getLastUsage();

Option 2: If some evaluators don't have getLastUsage, use type guard instead of cast:

-const usage = (evaluator as BaseEvaluator).getLastUsage?.();
+const usage = 'getLastUsage' in evaluator && typeof evaluator.getLastUsage === 'function'
+  ? evaluator.getLastUsage()
+  : undefined;

748-765: Minor clarity improvements.

Two small issues:

  1. The comment on Line 748 says "Calculate costs if output format is Line" but the cost is calculated unconditionally for all formats (which is correct). The comment is misleading.

  2. Line 749 uses options.pricing || {} as a fallback, but calculateCost already handles undefined pricing by returning undefined. The fallback to {} is unnecessary defensive code.

🔎 Suggested improvements
-  // Calculate costs if output format is Line
-  const pricing = options.pricing || {};
+  // Calculate token usage stats and cost
+  const pricing = options.pricing;

839-850: Same minor improvement as per-file calculation.

Line 846 uses options.pricing || {} as a fallback, but calculateCost already handles undefined pricing by returning undefined. The fallback is unnecessary.

🔎 Suggested improvement
-  const pricing = options.pricing || {};
+  const pricing = options.pricing;
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 244e3ae and cee544f.

📒 Files selected for processing (3)
  • src/cli/commands.ts (3 hunks)
  • src/cli/orchestrator.ts (9 hunks)
  • src/output/reporter.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/cli/commands.ts
🧰 Additional context used
🧬 Code graph analysis (2)
src/output/reporter.ts (1)
src/providers/token-usage.ts (1)
  • TokenUsageStats (6-10)
src/cli/orchestrator.ts (4)
src/evaluators/base-evaluator.ts (1)
  • BaseEvaluator (27-212)
src/evaluators/index.ts (1)
  • BaseEvaluator (16-16)
src/cli/types.ts (1)
  • RunPromptEvaluationResultSuccess (131-135)
src/providers/token-usage.ts (2)
  • TokenUsageStats (6-10)
  • calculateCost (21-30)
🔇 Additional comments (8)
src/output/reporter.ts (2)

5-5: LGTM: Import is correct.

The import of TokenUsageStats is properly structured and the relative path is correct.


203-213: LGTM: Well-implemented token usage display.

The function correctly implements the requirements from the PR objectives:

  • Uses .toLocaleString() for readable token counts (e.g., "1,250")
  • Conditionally displays cost only when available (line 207)
  • Applies sensible dynamic precision (4 decimals for costs < $1, 2 decimals otherwise)
  • Formatting and structure are consistent with other reporter functions

The implementation handles edge cases appropriately and matches the example output format specified in the PR.
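
The described behavior can be sketched as a pure formatting helper (a hedged stand-in for `printTokenUsage`; the real function writes to the console, and its exact wording may differ):

```typescript
// Hedged sketch of the token-usage display logic described above.
interface TokenUsageStats {
  totalInputTokens: number;
  totalOutputTokens: number;
  totalCost?: number;
}

function formatTokenUsage(stats: TokenUsageStats): string[] {
  const lines = [
    'Token Usage:',
    // The real code calls .toLocaleString() without an explicit locale;
    // 'en-US' is pinned here only to make the output deterministic.
    `  - Input tokens: ${stats.totalInputTokens.toLocaleString('en-US')}`,
    `  - Output tokens: ${stats.totalOutputTokens.toLocaleString('en-US')}`,
  ];
  if (stats.totalCost !== undefined) {
    // Dynamic precision: 4 decimals for sub-dollar costs, 2 otherwise.
    const precision = stats.totalCost < 1 ? 4 : 2;
    lines.push(`  - Total cost: $${stats.totalCost.toFixed(precision)}`);
  }
  return lines;
}

const out = formatTokenUsage({
  totalInputTokens: 15420,
  totalOutputTokens: 4210,
  totalCost: 0.10941,
});
```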

src/cli/orchestrator.ts (6)

13-13: LGTM!

The new imports for token usage tracking are appropriate and all are utilized in the implementation below.

Also applies to: 20-26


634-635: LGTM!

The per-file token accumulation logic correctly aggregates usage only from successful prompt evaluations, with appropriate defensive checks for r.usage existence.

Also applies to: 723-727
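
The guarded accumulation pattern can be sketched in isolation (the result-record shape is an assumption based on the types referenced in this review, simplified to the two fields the loop reads):

```typescript
// Hedged sketch of per-file token accumulation: only successful prompt
// evaluations that actually report usage contribute to the totals.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}
interface PromptResult {
  ok: boolean;
  usage?: TokenUsage;
}

function accumulate(results: PromptResult[]): { totalInputTokens: number; totalOutputTokens: number } {
  let totalInputTokens = 0;
  let totalOutputTokens = 0;
  for (const r of results) {
    if (r.ok && r.usage) {
      totalInputTokens += r.usage.inputTokens;
      totalOutputTokens += r.usage.outputTokens;
    }
  }
  return { totalInputTokens, totalOutputTokens };
}

const totals = accumulate([
  { ok: true, usage: { inputTokens: 100, outputTokens: 40 } },
  { ok: false },                                       // failed run: skipped
  { ok: true },                                        // no usage reported: skipped
  { ok: true, usage: { inputTokens: 50, outputTokens: 10 } },
]);
```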


767-770: Past review issue appears resolved.

The previous review comment flagged duplicate token usage printing at this location. The current code only prints evaluation summaries here and does not call printTokenUsage, so the duplication concern has been addressed.


778-778: LGTM!

Correctly includes token usage statistics in the file evaluation result.


799-800: LGTM!

The global token aggregation correctly accumulates usage across all evaluated files with appropriate defensive checks.

Also applies to: 820-825


859-859: LGTM!

Correctly includes aggregated token usage statistics in the final evaluation result.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
src/cli/commands.ts (1)

161-161: Unsafe cast from string to OutputFormat enum.

The CLI option --output accepts any string, but it's cast directly to OutputFormat without validation. If a user passes an invalid value (e.g., --output xml), the code will proceed with an invalid format value instead of failing fast.

Consider validating the output format before use:

🔎 Proposed fix
-      const outputFormat = cliOptions.output as OutputFormat;
+      const outputFormat = cliOptions.output as OutputFormat;
+      if (!Object.values(OutputFormat).includes(outputFormat)) {
+        console.error(`Error: Invalid output format '${cliOptions.output}'. Valid options: line, json, vale-json, rdjson`);
+        process.exit(1);
+      }
src/cli/orchestrator.ts (2)

701-709: Type assertion for getLastUsage access.

The cast (evaluator as BaseEvaluator).getLastUsage?.() works because BaseEvaluator defines getLastUsage(), but this couples the orchestrator to the base implementation. If a custom evaluator doesn't extend BaseEvaluator, the usage won't be captured.

This is acceptable for now since all evaluators extend BaseEvaluator, but consider adding getLastUsage() to the Evaluator interface in the future for type safety.
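
Promoting the method to the interface might look like the following hedged sketch (other `Evaluator` members are omitted; names follow the signature already quoted in this review):

```typescript
// Hedged sketch: adding getLastUsage to the Evaluator interface so the
// orchestrator no longer needs a cast to BaseEvaluator.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

interface Evaluator {
  getLastUsage?(): TokenUsage | undefined;
}

// Evaluators that don't track usage simply omit the optional method.
class CustomEvaluator implements Evaluator {}

class TrackingEvaluator implements Evaluator {
  private lastUsage?: TokenUsage;
  recordUsage(u: TokenUsage): void {
    this.lastUsage = u;
  }
  getLastUsage(): TokenUsage | undefined {
    return this.lastUsage;
  }
}

// Optional chaining handles both cases without any cast:
const tracking = new TrackingEvaluator();
tracking.recordUsage({ inputTokens: 5, outputTokens: 2 });
const evaluators: Evaluator[] = [new CustomEvaluator(), tracking];
const usages = evaluators.map((e) => e.getLastUsage?.());
```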


851-868: Per-file cost calculation is redundant with aggregate calculation.

Cost is calculated per-file at lines 859-868, but this per-file cost is never used for display or reporting—only the aggregate cost in evaluateFiles is shown. The per-file tokenUsageStats.totalCost is included in EvaluateFileResult.tokenUsage, but the aggregate function at lines 926-929 only sums totalInputTokens and totalOutputTokens, ignoring the per-file costs and recalculating at line 955.

This is technically correct (due to floating-point precision, recalculating from totals is more accurate), but the per-file cost calculation is dead code. Consider removing it to simplify:

🔎 Simplify by removing per-file cost calculation
  const tokenUsageStats: TokenUsageStats = {
    totalInputTokens,
    totalOutputTokens,
  };

-  const cost = calculateCost(
-    {
-      inputTokens: totalInputTokens,
-      outputTokens: totalOutputTokens
-    },
-    pricing
-  );
-  if (cost !== undefined) {
-    tokenUsageStats.totalCost = cost;
-  }
src/cli/types.ts (1)

12-12: Consider using import type for type-only imports.

The import on line 12 brings in TokenUsage, TokenUsageStats, and PricingConfig which are all interfaces (types only). Per coding guidelines ("Use TypeScript ESM with explicit imports and narrow types"), consider using import type for consistency with other type imports in this file (lines 1-6).

🔎 Proposed fix
-import { TokenUsage, TokenUsageStats, PricingConfig } from '../providers/token-usage';
+import type { TokenUsage, TokenUsageStats, PricingConfig } from '../providers/token-usage';
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cee544f and d0c425a.

📒 Files selected for processing (5)
  • src/cli/commands.ts
  • src/cli/orchestrator.ts
  • src/cli/types.ts
  • src/evaluators/base-evaluator.ts
  • src/schemas/config-schemas.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/evaluators/base-evaluator.ts
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.ts: Use TypeScript ESM with explicit imports and narrow types
Use 2-space indentation; avoid trailing whitespace
Maintain strict TypeScript with no any; use unknown + schema validation for external data
Use custom error types with proper inheritance; catch blocks use unknown type

Files:

  • src/cli/commands.ts
  • src/schemas/config-schemas.ts
  • src/cli/types.ts
  • src/cli/orchestrator.ts
🧠 Learnings (4)
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to src/boundaries/**/*.ts : Use Zod schemas for boundary validation of all external data (files, CLI, env, APIs) at system boundaries

Applied to files:

  • src/schemas/config-schemas.ts
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to src/**/*.ts : Maintain strict TypeScript with no `any`; use `unknown` + schema validation for external data

Applied to files:

  • src/schemas/config-schemas.ts
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to src/output/**/*.ts : IDs should be shown as `PromptId.CriterionId` in output

Applied to files:

  • src/cli/types.ts
  • src/cli/orchestrator.ts
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to evals/**/*.md : Eval YAML should use `name` (human), `id` (PascalCase), and criteria `id` (PascalCase)

Applied to files:

  • src/cli/orchestrator.ts
🧬 Code graph analysis (3)
src/cli/commands.ts (1)
src/output/reporter.ts (1)
  • printTokenUsage (203-213)
src/cli/types.ts (1)
src/providers/token-usage.ts (3)
  • PricingConfig (12-15)
  • TokenUsageStats (6-10)
  • TokenUsage (1-4)
src/cli/orchestrator.ts (4)
src/evaluators/base-evaluator.ts (1)
  • BaseEvaluator (28-236)
src/evaluators/index.ts (1)
  • BaseEvaluator (16-16)
src/cli/types.ts (1)
  • RunPromptEvaluationResultSuccess (120-124)
src/providers/token-usage.ts (2)
  • TokenUsageStats (6-10)
  • calculateCost (21-30)
🔇 Additional comments (9)
src/schemas/config-schemas.ts (1)

11-11: Type safety improvement is correct and consistent.

The change from .optional() to .default([]) improves type safety by guaranteeing that runRules is always a defined array. This is consistent with the concurrency field pattern on line 6.

Verification shows all usages of runRules handle this change correctly. Code iterating over runRules works without issues since the schema now guarantees an empty array when not provided. Existing undefined checks in file-section-parser.ts and tests/utils.ts remain harmless though now always-true.

src/cli/commands.ts (2)

173-177: LGTM!

The pricing configuration is correctly passed through from environment variables. The PricingConfig interface properly allows undefined values, and downstream calculateCost handles missing pricing gracefully by returning undefined for cost.


180-190: Consider the display order of token usage vs. global summary.

Based on the PR objective ("Display token usage and cost information in the terminal after each completed evaluation, shown after the calculation summary"), the token usage should be displayed after the global summary, not before. Currently, printTokenUsage is called before printGlobalSummary.

If the current order is intentional for better UX, this is fine. Otherwise, consider swapping the order:

🔎 Suggested reorder (if after summary was intended)
      if (cliOptions.output === 'line') {
-        if (result.tokenUsage) {
-          printTokenUsage(result.tokenUsage);
-        }
        printGlobalSummary(
          result.totalFiles,
          result.totalErrors,
          result.totalWarnings,
          result.requestFailures
        );
+        if (result.tokenUsage) {
+          printTokenUsage(result.tokenUsage);
+        }
      }
src/cli/orchestrator.ts (3)

22-25: LGTM!

The token usage imports are correctly added and align with the types defined in src/providers/token-usage.ts.


947-958: LGTM!

The aggregate token usage and cost calculation is correctly implemented. Cost is calculated once from the total summed tokens, which is more accurate than summing per-file costs due to floating-point precision.


262-268: No functional change.

This is a formatting adjustment in the criterionId ternary expression. Logic remains unchanged.

src/cli/types.ts (3)

14-19: LGTM!

The OutputFormat enum properly defines all supported output formats with string values matching the CLI option strings.


120-128: LGTM!

The RunPromptEvaluationResultSuccess interface and updated RunPromptEvaluationResult union type follow a clean discriminated union pattern. The optional usage field appropriately handles cases where token usage isn't available.


30-30: LGTM!

The optional pricing and tokenUsage fields are correctly added to their respective interfaces, maintaining backward compatibility while enabling the new token tracking feature.

Also applies to: 40-40, 138-138

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/evaluators/evaluator.ts (1)

10-10: LGTM: Optional method enables token usage tracking.

The optional method maintains backward compatibility while enabling token usage tracking as intended by the PR objectives.

Optional refinements:

  1. The return type could be simplified from TokenUsage | undefined to just TokenUsage, since the ? already signals the method may not exist. However, the explicit undefined makes it clear the method can return undefined even when implemented (e.g., before first evaluation).

  2. Consider adding JSDoc to document the method's purpose:

    /**
     * Returns token usage from the last evaluation, if available.
     */
    getLastUsage?(): TokenUsage | undefined;
src/cli/orchestrator.ts (1)

940-941: Consider passing options.pricing directly.

The || {} fallback is unnecessary since calculateCost already handles undefined pricing gracefully (returns undefined when pricing config is missing). This simplifies the code and removes an unneeded intermediate variable.

🔎 Suggested simplification
-  // Calculate cost if pricing is configured
-  const pricing = options.pricing || {};
-  const cost = calculateCost({ inputTokens: totalInputTokens, outputTokens: totalOutputTokens }, pricing);
+  // Calculate cost if pricing is configured
+  const cost = calculateCost({ inputTokens: totalInputTokens, outputTokens: totalOutputTokens }, options.pricing);
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d0c425a and 708e2c7.

📒 Files selected for processing (4)
  • src/cli/commands.ts
  • src/cli/orchestrator.ts
  • src/cli/types.ts
  • src/evaluators/evaluator.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/cli/types.ts
  • src/cli/commands.ts
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.ts: Use TypeScript ESM with explicit imports and narrow types
Use 2-space indentation; avoid trailing whitespace
Maintain strict TypeScript with no any; use unknown + schema validation for external data
Use custom error types with proper inheritance; catch blocks use unknown type

Files:

  • src/evaluators/evaluator.ts
  • src/cli/orchestrator.ts
🧠 Learnings (2)
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to src/output/**/*.ts : IDs should be shown as `PromptId.CriterionId` in output

Applied to files:

  • src/cli/orchestrator.ts
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to evals/**/*.md : Eval YAML should use `name` (human), `id` (PascalCase), and criteria `id` (PascalCase)

Applied to files:

  • src/cli/orchestrator.ts
🧬 Code graph analysis (2)
src/evaluators/evaluator.ts (4)
src/evaluators/index.ts (1)
  • Evaluator (13-13)
src/cli/types.ts (1)
  • EvaluationResult (33-41)
src/prompts/schema.ts (1)
  • EvaluationResult (171-171)
src/providers/token-usage.ts (1)
  • TokenUsage (1-4)
src/cli/orchestrator.ts (2)
src/cli/types.ts (1)
  • RunPromptEvaluationResultSuccess (120-124)
src/providers/token-usage.ts (2)
  • TokenUsageStats (6-10)
  • calculateCost (21-30)
🔇 Additional comments (4)
src/evaluators/evaluator.ts (1)

2-2: LGTM: Type import follows best practices.

The type-only import is correctly specified and follows TypeScript ESM guidelines.

src/cli/orchestrator.ts (3)

701-709: LGTM!

Clean use of optional chaining for getLastUsage?.() and conditional attachment of usage data to the result object. This correctly handles evaluators that may not support token usage tracking.


824-829: LGTM!

Token usage accumulation is correctly guarded and only processes successful evaluations that return usage data.


910-915: LGTM!

Aggregation of token usage across files is correctly guarded.

@hurshore
Collaborator

There are TS errors in the following files: anthropic-provider.ts, anthropic-e2e.test.ts, openai-provider.test.ts, scoring-types.test.ts

@hurshore
Collaborator

Have you been able to test this feature with all supported providers?

- Remove `getLastUsage()` method and `lastUsage` field from BaseEvaluator
- Move token usage from orchestrator result wrapper to nested evaluation result
- Update `RunPromptEvaluationResultSuccess` to remove top-level usage field
- Modify accuracy evaluator to aggregate token usage from claim extraction and base evaluation
- Update orchestrator to access usage from `result.usage` instead of `r.usage`
- Create `ClaimExtractionResult` interface to return claims with optional usage data
- Ensure token usage is properly accumulated and returned within evaluation results
- Simplifies the result structure by keeping usage data closer to the actual evaluation data
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/evaluators/accuracy-evaluator.ts (1)

64-68: Consider unconditionally assigning usage to avoid mutation.

The pattern of conditionally mutating result after creation works, but could be cleaner. Since calculateSemiObjectiveResult returns a mutable object, this is fine, but an alternative would be to spread usage into a new object.

🔎 Optional refactor for immutability
    if (claims.length === 0) {
      const wordCount = content.trim().split(/\s+/).length || 1;
-      const result = this.calculateSemiObjectiveResult([], wordCount);
-      if (claimUsage) result.usage = claimUsage;
-      return result;
+      return {
+        ...this.calculateSemiObjectiveResult([], wordCount),
+        ...(claimUsage && { usage: claimUsage }),
+      };
    }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da2ffe5 and 705d914.

📒 Files selected for processing (5)
  • src/cli/orchestrator.ts
  • src/cli/types.ts
  • src/evaluators/accuracy-evaluator.ts
  • src/evaluators/base-evaluator.ts
  • src/prompts/schema.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/evaluators/base-evaluator.ts
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.ts: Use TypeScript ESM with explicit imports and narrow types
Use 2-space indentation; avoid trailing whitespace
Maintain strict TypeScript with no any; use unknown + schema validation for external data
Use custom error types with proper inheritance; catch blocks use unknown type

Files:

  • src/prompts/schema.ts
  • src/cli/orchestrator.ts
  • src/cli/types.ts
  • src/evaluators/accuracy-evaluator.ts
🧠 Learnings (3)
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to src/output/**/*.ts : IDs should be shown as `PromptId.CriterionId` in output

Applied to files:

  • src/cli/orchestrator.ts
  • src/cli/types.ts
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to evals/**/*.md : Eval YAML should use `name` (human), `id` (PascalCase), and criteria `id` (PascalCase)

Applied to files:

  • src/cli/orchestrator.ts
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to src/providers/**/*.ts : Depend on `LLMProvider` and `SearchProvider` interfaces; keep providers thin (transport only)

Applied to files:

  • src/evaluators/accuracy-evaluator.ts
🧬 Code graph analysis (4)
src/prompts/schema.ts (2)
src/providers/token-usage.ts (1)
  • TokenUsage (1-4)
src/evaluators/types.ts (1)
  • EvaluationType (14-17)
src/cli/orchestrator.ts (2)
src/cli/types.ts (1)
  • RunPromptEvaluationResultSuccess (120-123)
src/providers/token-usage.ts (2)
  • TokenUsageStats (6-10)
  • calculateCost (21-30)
src/cli/types.ts (9)
src/prompts/prompt-loader.ts (3)
  • PromptFile (8-8)
  • PromptCriterionSpec (8-8)
  • PromptMeta (8-8)
src/schemas/prompt-schemas.ts (3)
  • PromptFile (64-64)
  • PromptCriterionSpec (62-62)
  • PromptMeta (63-63)
src/providers/llm-provider.ts (1)
  • LLMProvider (8-10)
src/providers/search-provider.ts (1)
  • SearchProvider (5-7)
src/boundaries/file-section-parser.ts (1)
  • FilePatternConfig (2-6)
src/providers/token-usage.ts (2)
  • PricingConfig (12-15)
  • TokenUsageStats (6-10)
src/prompts/schema.ts (2)
  • EvaluationResult (174-174)
  • SubjectiveResult (123-143)
src/output/reporter.ts (1)
  • EvaluationSummary (7-11)
src/output/json-formatter.ts (2)
  • JsonFormatter (54-94)
  • ScoreComponent (10-18)
src/evaluators/accuracy-evaluator.ts (1)
src/providers/token-usage.ts (1)
  • TokenUsage (1-4)
🔇 Additional comments (14)
src/prompts/schema.ts (1)

2-2: LGTM!

The TokenUsage type is correctly imported and consistently added as an optional field to both SubjectiveResult and SemiObjectiveResult. This maintains type safety while allowing evaluation results to optionally carry token usage metadata from LLM providers.

Also applies to: 142-142, 171-171

src/evaluators/accuracy-evaluator.ts (3)

28-31: LGTM!

The ClaimExtractionResult interface cleanly encapsulates the return type of extractClaims, making the optional usage field explicit in the contract.


95-101: LGTM on token aggregation logic.

The aggregation correctly combines claim extraction usage with the base evaluator's usage, using || 0 to handle cases where result.usage might be undefined. This ensures accurate total token counts across the multi-step evaluation pipeline.


129-139: LGTM on claim extraction with usage propagation.

The destructuring correctly extracts both data and usage from the LLM provider's structured response. The conditional spread ...(usage && { usage }) is an idiomatic way to optionally include usage in the return object.
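
The conditional-spread idiom can be seen in isolation — `...(usage && { usage })` adds the key only when `usage` is truthy, and spreading the resulting `undefined` is a no-op (the function name below is illustrative, not from the repository):

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical helper demonstrating the idiom: the `usage` key is present
// only when a usage value was supplied.
function makeResult(claims: string[], usage?: TokenUsage) {
  return { claims, ...(usage && { usage }) };
}

const withUsage = makeResult(['a'], { inputTokens: 10, outputTokens: 3 });
const withoutUsage = makeResult(['a']);
```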

src/cli/orchestrator.ts (5)

1-27: LGTM on imports.

The imports are well-organized, bringing in the necessary types (TokenUsageStats) and utilities (calculateCost) for token usage tracking.


701-704: LGTM on typed success result.

Using the RunPromptEvaluationResultSuccess interface provides better type safety and aligns with the discriminated union pattern used for RunPromptEvaluationResult.


819-823: LGTM on per-prompt usage accumulation.

The accumulation correctly extracts token counts from each successful prompt evaluation result. The optional chaining on r.result.usage safely handles cases where usage data isn't present.


846-862: LGTM on file-level token usage aggregation.

The TokenUsageStats object correctly captures per-file totals and is properly included in the EvaluateFileResult.


928-941: LGTM on cost calculation integration.

The cost calculation is correctly applied only when pricing config is available, and the result is conditionally added to tokenUsage.totalCost. This aligns with the calculateCost function signature which returns undefined when pricing isn't configured.

src/cli/types.ts (5)

12-12: LGTM on type imports.

Using type imports for TokenUsageStats and PricingConfig correctly indicates these are type-only imports, which helps with tree-shaking and ensures no runtime code is generated.


30-30: LGTM on pricing config option.

Adding pricing?: PricingConfig to EvaluationOptions enables cost calculation when users provide pricing information via environment variables.


40-40: LGTM on token usage in EvaluationResult.

The optional tokenUsage field properly surfaces aggregated token statistics to callers of evaluateFiles.


120-127: LGTM on discriminated union refactor.

Extracting RunPromptEvaluationResultSuccess as a named interface improves code clarity and enables the orchestrator to reference this type explicitly when constructing success results. The union at lines 125-127 maintains the discriminated union pattern with ok as the discriminant.


135-138: LGTM on file result token usage.

Adding tokenUsage?: TokenUsageStats to EvaluateFileResult enables per-file token tracking, which is then aggregated in evaluateFiles for the overall total.

@ayo6706
Collaborator Author

ayo6706 commented Dec 29, 2025


There are ts errors in the following files: anthropic-provider.ts, anthropic-e2e.test.ts, openai-provider.test.ts, scoring-types.test.ts

The errors in these files aren't related to this PR. I remember fixing them, but I can't remember in which PR.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/providers/anthropic-provider.ts (2)

95-95: Clarify the necessity of setting stream: false in both locations.

The stream: false parameter is set on both line 95 (in params) and line 131 (in anthropicParams). Unlike other E2E compatibility aliases (e.g., maxTokens, toolChoice), this uses the same snake_case name in both places.

Since anthropicParams is constructed independently and sent to the Anthropic API, is the stream: false on line 95 necessary for E2E mocks? If not, it could be removed from the initial params object and only set in anthropicParams.

This addresses the previous review question: "stream: false" ensures the API returns a complete response rather than a streaming response.

🔎 Optional: Remove redundant stream parameter if not needed for E2E

If E2E mocks don't require stream in the params object, you could simplify:

      max_tokens: this.config.maxTokens!,
      tools: [toolSchema],
      tool_choice: { type: 'tool', name: schema.name },
-      stream: false,
      maxTokens: this.config.maxTokens!,
      toolChoice: { type: 'tool', name: schema.name },

Also applies to: 131-131


132-134: Consider simplifying conditional spreads for always-defined fields.

The conditional spreads for system, tools, and tool_choice check for undefined, but these fields are always set earlier in the code:

  • system is set on line 85 from systemPrompt (line 76)
  • tools is always set on line 93
  • tool_choice is always set on line 94

Unless there are edge cases or future scenarios where these might be undefined, you could simplify to unconditional spreads or direct assignment.

🔎 Simplified approach
    const anthropicParams: Anthropic.Messages.MessageCreateParams = {
      model: params.model,
      messages: params.messages,
      max_tokens: params.max_tokens,
      stream: false,
-      ...(params.system !== undefined && { system: params.system }),
-      ...(params.tools !== undefined && { tools: params.tools }),
-      ...(params.tool_choice !== undefined && { tool_choice: params.tool_choice }),
+      system: params.system,
+      tools: params.tools,
+      tool_choice: params.tool_choice,
      ...(params.temperature !== undefined && { temperature: params.temperature }),
    };
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 45889c6 and e2b1b1d.

📒 Files selected for processing (3)
  • src/evaluators/base-evaluator.ts
  • src/providers/anthropic-provider.ts
  • tests/scoring-types.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/scoring-types.test.ts
🧰 Additional context used
📓 Path-based instructions (2)
src/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.ts: Use TypeScript ESM with explicit imports and narrow types
Use 2-space indentation; avoid trailing whitespace
Maintain strict TypeScript with no any; use unknown + schema validation for external data
Use custom error types with proper inheritance; catch blocks use unknown type

Files:

  • src/providers/anthropic-provider.ts
  • src/evaluators/base-evaluator.ts
src/providers/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

src/providers/**/*.ts: Depend on LLMProvider and SearchProvider interfaces; keep providers thin (transport only)
Inject RequestBuilder via provider constructor to avoid coupling

Files:

  • src/providers/anthropic-provider.ts
🧠 Learnings (1)
📚 Learning: 2025-12-28T19:43:51.176Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.176Z
Learning: Applies to src/providers/**/*.ts : Depend on `LLMProvider` and `SearchProvider` interfaces; keep providers thin (transport only)

Applied to files:

  • src/providers/anthropic-provider.ts
🧬 Code graph analysis (2)
src/providers/anthropic-provider.ts (1)
src/providers/llm-provider.ts (1)
  • LLMResult (3-6)
src/evaluators/base-evaluator.ts (1)
src/prompts/schema.ts (2)
  • SubjectiveLLMResult (96-110)
  • SemiObjectiveLLMResult (112-121)
🔇 Additional comments (4)
src/evaluators/base-evaluator.ts (1)

71-75: No changes needed—usage field is already properly typed.

The usage field in both SubjectiveResult and SemiObjectiveResult is already typed as usage?: TokenUsage;, where TokenUsage is a well-defined interface with inputTokens and outputTokens properties. The code already complies with the coding guidelines requiring strict TypeScript without any.

Likely an incorrect or invalid review comment.

src/providers/anthropic-provider.ts (3)

3-3: LGTM! Return type updated to support token usage tracking.

The import of LLMResult and the updated return type correctly align with the standardized interface for tracking token usage across providers.

Also applies to: 75-75


231-233: LGTM! Improved error handling for text blocks.

The safer extraction of firstTextBlock with an explicit existence check before accessing its properties is a good improvement that prevents potential undefined access errors.


163-171: No changes required. The ANTHROPIC_RESPONSE_SCHEMA defines usage as a required field (line 33 in src/schemas/anthropic-responses.ts), and ANTHROPIC_USAGE_SCHEMA requires both input_tokens and output_tokens as non-optional numbers. After schema validation via ANTHROPIC_RESPONSE_SCHEMA.parse(), TypeScript guarantees validatedResponse.usage is present, making direct access to validatedResponse.usage.input_tokens and validatedResponse.usage.output_tokens type-safe with no null checks needed.

Likely an incorrect or invalid review comment.
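
The guarantee being described — that post-validation access needs no null checks — can be illustrated with a dependency-free stand-in (the real code uses Zod's `ANTHROPIC_RESPONSE_SCHEMA.parse()`; this sketch mimics only the usage portion of that contract):

```typescript
// Hedged, dependency-free stand-in for the schema-validation guarantee: after
// a successful parse, input_tokens and output_tokens are present and numeric,
// so downstream code can read them directly.
interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
}

function parseUsage(value: unknown): AnthropicUsage {
  if (typeof value === 'object' && value !== null) {
    const v = value as { input_tokens?: unknown; output_tokens?: unknown };
    if (typeof v.input_tokens === 'number' && typeof v.output_tokens === 'number') {
      return { input_tokens: v.input_tokens, output_tokens: v.output_tokens };
    }
  }
  // Mirrors Zod's behavior of throwing on invalid input.
  throw new Error('Invalid Anthropic usage payload');
}

const validated = parseUsage({ input_tokens: 10, output_tokens: 5 });
```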

@hurshore hurshore merged commit 2f5fe6c into main Dec 30, 2025
3 checks passed


Development

Successfully merging this pull request may close these issues.

[Feature]: Evaluation Cost Tracking
