refactor: standardize evaluation terminology to Check and Judge#52
refactor: standardize evaluation terminology to Check and Judge#52oshorefueled merged 4 commits intomainfrom
Conversation
… to judge/check - Rename `buildSubjectiveLLMSchema()` to `buildJudgeLLMSchema()` - Rename `buildSemiObjectiveLLMSchema()` to `buildCheckLLMSchema()` - Rename `SubjectiveLLMResult` type to `JudgeLLMResult` - Rename `SemiObjectiveLLMResult` type to `CheckLLMResult` - Rename `SubjectiveResult` type to `JudgeResult` - Rename `SemiObjectiveItem` type to `CheckItem` - Rename `SemiObjectiveResult` type to `CheckResult` - Rename `isSubjectiveResult()` function to `isJudgeResult()` - Rename `isSemiObjectiveResult()` function to `isCheckResult()` - Update `EvaluationResult` union type to use new type names - Align schema naming with recent rule type terminology changes
- Rename SemiObjectiveItem to CheckItem in chunking/merger.ts - Rename SubjectiveResult to JudgeResult in cli/orchestrator.ts and cli/types.ts - Rename SemiObjectiveResult to CheckResult in evaluators - Rename buildSubjectiveLLMSchema to buildJudgeLLMSchema - Rename buildSemiObjectiveLLMSchema to buildCheckLLMSchema - Rename SubjectiveLLMResult to JudgeLLMResult - Rename SemiObjectiveLLMResult to CheckLLMResult - Rename calculateSubjectiveScore to calculateJudgeScore - Rename calculateSemiObjectiveScore to calculateCheckScore - Rename averageSubjectiveScores to averageJudgeScores - Rename isSubjectiveResult to isJudgeResult - Update all comments and documentation to use judge/check terminology - Update test files to reflect new type names - Ensures consistent naming convention across codebase following recent rule type refactoring
📝 WalkthroughWalkthroughThis PR renames evaluation-related types, functions, and predicates across the codebase, replacing "Subjective/Semi-Objective" terminology with "Judge/Check" naming (types, schema builders, scoring functions, predicates, and related imports/usages). Changes
Sequence Diagram(s)(omitted) Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/prompts/schema.ts (1)
58-94: Schema/type drift:buildCheckLLMSchema()requires fields thatCheckLLMResultmarks optional.Right now the schema requires
quoted_text,context_before,context_after,description,analysis,suggestion, butCheckLLMResultallows most of these to be absent. This is a maintainability trap (types suggest “optional”, runtime schema rejects).Proposed fix (align TS type to the schema)
export type CheckLLMResult = { violations: Array<{ - description: string; - analysis: string; - suggestion?: string; - quoted_text?: string; - context_before?: string; - context_after?: string; + line?: number; + description: string; + analysis: string; + suggestion: string; + quoted_text: string; + context_before: string; + context_after: string; }>; };
🧹 Nitpick comments (7)
src/prompts/schema.ts (2)
96-122: Optional: includeline?: numberinJudgeLLMResultviolations for parity with the schema.The judge schema allows
line, butJudgeLLMResultdrops it. That’s fine if intentionally unused, but it’s surprising given the schema.Possible tweak
export type JudgeLLMResult = { criteria: Array<{ @@ violations: Array<{ + line?: number; quoted_text: string; context_before: string; context_after: string; analysis: string; suggestion: string; }>; }>; };
123-173: Public API naming:export type EvaluationResultis easy to confuse with CLI’s aggregatedEvaluationResult.You’re already working around it via import-aliasing (
EvaluationResult as PromptEvaluationResult) elsewhere. Consider renaming this to something likePromptEvaluationResultto reduce repeated aliasing and accidental misuse.src/cli/orchestrator.ts (1)
538-645: Future-proof routing: preferisCheckResult(result)over!isJudgeResult(result).Today it’s equivalent (two-type union), but
!isJudgeResultwill silently treat any future third type as “Check”. Using the explicit guard makes intent and safety clearer.Proposed change
-import { isJudgeResult } from '../prompts/schema'; +import { isCheckResult, isJudgeResult } from '../prompts/schema'; @@ - // Handle Check Result - if (!isJudgeResult(result)) { + // Handle Check Result + if (isCheckResult(result)) { const severity = result.severity; const violationCount = result.violations.length; @@ - } + } else if (!isJudgeResult(result)) { + // If additional evaluation types are ever introduced, fail loudly. + throw new Error(`Unknown evaluation result type: ${(result as { type?: unknown }).type}`); + } - // Handle Judge Result + // Handle Judge Resulttests/scoring-types.test.ts (1)
80-135: Check mock result is aligned withCheckLLMResult; consider renamingid/namefromsemi→checkfor consistency.src/scoring/scorer.ts (3)
48-100: GuardwordCount(and consider clamping strictness) to avoid surprising scores from invalid inputs.Proposed diff
export function calculateCheckScore( violations: CheckItem[], wordCount: number, options: CheckScoringOptions = {} ): CheckResult { - const strictness = resolveStrictness(options.strictness); + const strictness = resolveStrictness(options.strictness); + const safeWordCount = Math.max(1, wordCount); // Map items to violation format const mappedViolations = violations.map((item) => ({ analysis: item.analysis, @@ })); // Density Calculation: Violations per 100 words - const density = (mappedViolations.length / wordCount) * 100; + const density = (mappedViolations.length / safeWordCount) * 100;
108-144: Use??instead of||for criterion weights to avoid treatingweight: 0as “unset”.Proposed diff
- const weight = definedCriterion?.weight || 1; + const weight = definedCriterion?.weight ?? 1;
147-165:averageJudgeScoreswarning on length mismatch is good; consider failing fast in tests/dev to avoid silently underweighting missing word counts.Also applies to: 232-281
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
src/chunking/merger.tssrc/cli/orchestrator.tssrc/cli/types.tssrc/evaluators/accuracy-evaluator.tssrc/evaluators/base-evaluator.tssrc/prompts/schema.tssrc/scoring/index.tssrc/scoring/scorer.tstests/scoring-types.test.ts
🧰 Additional context used
📓 Path-based instructions (2)
src/**/*.ts
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.ts: Use TypeScript ESM with explicit imports and narrow types
Use 2-space indentation; avoid trailing whitespace
Maintain strict TypeScript with noany; useunknown+ schema validation for external data
Use custom error types with proper inheritance; catch blocks useunknowntype
Files:
src/cli/types.tssrc/cli/orchestrator.tssrc/chunking/merger.tssrc/evaluators/accuracy-evaluator.tssrc/evaluators/base-evaluator.tssrc/prompts/schema.tssrc/scoring/scorer.tssrc/scoring/index.ts
tests/**/*.test.ts
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.test.ts: Write tests using Vitest framework with focus on config parsing, file discovery, schema/structured output, and locator
Use dependency injection in tests: mock providers; do not hit network in unit tests
Files:
tests/scoring-types.test.ts
🧠 Learnings (4)
📚 Learning: 2025-12-28T19:43:51.189Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.189Z
Learning: Applies to src/output/**/*.ts : IDs should be shown as `PromptId.CriterionId` in output
Applied to files:
src/cli/types.tssrc/prompts/schema.ts
📚 Learning: 2025-12-28T19:43:51.189Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.189Z
Learning: Applies to src/boundaries/**/*.ts : Use Zod schemas for boundary validation of all external data (files, CLI, env, APIs) at system boundaries
Applied to files:
src/evaluators/accuracy-evaluator.ts
📚 Learning: 2025-12-28T19:43:51.189Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.189Z
Learning: Applies to src/providers/**/*.ts : Depend on `LLMProvider` and `SearchProvider` interfaces; keep providers thin (transport only)
Applied to files:
tests/scoring-types.test.ts
📚 Learning: 2025-12-28T19:43:51.189Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.189Z
Learning: Applies to src/**/*.ts : Maintain strict TypeScript with no `any`; use `unknown` + schema validation for external data
Applied to files:
src/prompts/schema.ts
🧬 Code graph analysis (7)
src/cli/types.ts (1)
src/prompts/schema.ts (1)
JudgeResult(123-143)
src/cli/orchestrator.ts (1)
src/prompts/schema.ts (1)
isJudgeResult(175-179)
src/chunking/merger.ts (2)
src/chunking/index.ts (1)
mergeViolations(3-3)src/prompts/schema.ts (1)
CheckItem(145-152)
src/evaluators/accuracy-evaluator.ts (1)
src/scoring/scorer.ts (1)
calculateCheckScore(48-100)
tests/scoring-types.test.ts (2)
src/providers/llm-provider.ts (1)
LLMResult(3-6)src/prompts/schema.ts (2)
JudgeLLMResult(96-110)CheckLLMResult(112-121)
src/prompts/schema.ts (1)
src/cli/types.ts (1)
EvaluationResult(33-41)
src/scoring/scorer.ts (2)
src/scoring/index.ts (5)
CheckScoringOptions(2-2)JudgeScoringOptions(3-3)calculateCheckScore(4-4)calculateJudgeScore(5-5)averageJudgeScores(6-6)src/prompts/schema.ts (4)
CheckItem(145-152)CheckResult(154-171)JudgeLLMResult(96-110)JudgeResult(123-143)
🔇 Additional comments (16)
src/prompts/schema.ts (2)
4-56:buildJudgeLLMSchema()looks consistent and strict in the right places.The optional
linefield (present but not required) matches the intent of the rest of the schema and avoids hard-failing on missing line numbers.
175-185: Type guards look correct.
result.type === EvaluationType.JUDGE/CHECKis the cleanest discriminator here.src/cli/types.ts (3)
10-12: Import aliasing is appropriate to avoid theEvaluationResultname collision.No concerns with
EvaluationResult as PromptEvaluationResulthere.
103-106: Same forValidationParams.result: JudgeResult.Good tightening of types (validation is judge-only).
87-111: TheProcessCriterionParams.resultnarrowing toJudgeResultis correct and properly enforced.The call to
extractAndReportCriterionat line 662 is correctly placed inside the Judge Result branch (after theif (!isJudgeResult(result))guard). No changes needed.src/evaluators/accuracy-evaluator.ts (1)
13-14: Rename propagation looks correct:calculateCheckScoreused in the no-claims path.The call signature matches the provided scorer snippet (violations array + wordCount + options).
Also applies to: 66-76
src/cli/orchestrator.ts (1)
10-10: Import updated as expected for new terminology.src/chunking/merger.ts (1)
1-5: Type rename and deduplication logic verified as correct.The CheckItem type rename is fully applied with no remaining SemiObjectiveItem references. The merge/dedupe implementation correctly flattens input and deduplicates using the composite key (quoted_text + description + analysis). Call site in base-evaluator.ts properly passes CheckLLMResult["violations"][] which matches the CheckItem[][] signature.
tests/scoring-types.test.ts (3)
6-9: Type import rename looks consistent with the new public schema surface.
35-77: Judge mock result is aligned withJudgeLLMResult(score 1–4) and weighted scoring expectations.
137-159: Empty-violations case correctly exercises the “perfect score” path withCheckLLMResult.src/scoring/index.ts (1)
1-7: Public scoring exports are cleanly updated to Check/Judge terminology.src/evaluators/base-evaluator.ts (3)
4-12: Renames to Judge/Check schema + scoring functions are consistent and improve terminology alignment.Also applies to: 22-26, 32-42
186-231: Check evaluation path correctly scores once from merged violations using original (non-line-numbered) word count.
117-176: Remove this concern —countWords()explicitly strips line number prefixes before counting words.The
countWords()function insrc/chunking/utils.tsincludes logic to remove line number prefixes using the regex/^\d+\t/gmbefore word counting. This was intentionally designed to ensure accurate word counts for line-numbered content, preventing any skew in chunk weighting.Likely an incorrect or invalid review comment.
src/scoring/scorer.ts (1)
9-24: New scoring option types are clear and match the Check/Judge rename intent.
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/prompts/schema.ts (1)
112-121: Critical: Schema/type mismatch for CheckLLMResult.The
buildCheckLLMSchemafunction markssuggestion,quoted_text,context_before, andcontext_afteras required fields (lines 80-87), but theCheckLLMResulttype marks them as optional. This creates a mismatch between what the LLM is forced to return and what TypeScript enforces.🔧 Proposed fix to align type with schema
export type CheckLLMResult = { violations: Array<{ description: string; analysis: string; - suggestion?: string; - quoted_text?: string; - context_before?: string; - context_after?: string; + suggestion: string; + quoted_text: string; + context_before: string; + context_after: string; }>; };
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
src/cli/types.tssrc/evaluators/accuracy-evaluator.tssrc/evaluators/base-evaluator.tssrc/evaluators/evaluator.tssrc/prompts/schema.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- src/evaluators/accuracy-evaluator.ts
- src/cli/types.ts
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.ts
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.ts: Use TypeScript ESM with explicit imports and narrow types
Use 2-space indentation; avoid trailing whitespace
Maintain strict TypeScript with noany; useunknown+ schema validation for external data
Use custom error types with proper inheritance; catch blocks useunknowntype
Files:
src/evaluators/evaluator.tssrc/prompts/schema.tssrc/evaluators/base-evaluator.ts
🧠 Learnings (1)
📚 Learning: 2025-12-28T19:43:51.189Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.189Z
Learning: Applies to src/output/**/*.ts : IDs should be shown as `PromptId.CriterionId` in output
Applied to files:
src/prompts/schema.ts
🧬 Code graph analysis (2)
src/evaluators/evaluator.ts (1)
src/prompts/schema.ts (1)
PromptEvaluationResult(173-173)
src/evaluators/base-evaluator.ts (4)
src/prompts/schema.ts (7)
PromptEvaluationResult(173-173)JudgeResult(123-143)buildJudgeLLMSchema(4-56)JudgeLLMResult(96-110)CheckResult(154-171)buildCheckLLMSchema(58-94)CheckLLMResult(112-121)src/scoring/scorer.ts (3)
calculateJudgeScore(108-144)averageJudgeScores(147-281)calculateCheckScore(48-100)src/output/line-numbering.ts (1)
prependLineNumbers(13-18)src/chunking/utils.ts (1)
countWords(5-10)
🔇 Additional comments (6)
src/prompts/schema.ts (2)
4-94: LGTM!The schema builders have been correctly renamed to
buildJudgeLLMSchemaandbuildCheckLLMSchema, with appropriate internal schema names (vectorlint_judge_resultandvectorlint_check_result). The structure and strict mode usage are appropriate.
96-110: LGTM!The type definitions and type guards have been successfully renamed to use the new Check/Judge terminology. The discriminated union pattern with
EvaluationTypeis correctly implemented in the type guards.Also applies to: 123-185
src/evaluators/evaluator.ts (1)
1-9: LGTM!The
Evaluatorinterface has been correctly updated to use the renamedPromptEvaluationResulttype. The refactoring is clean and maintains the same contract.src/evaluators/base-evaluator.ts (3)
4-12: LGTM!The imports and class-level changes correctly adopt the new Check/Judge terminology. All type imports, scoring function imports, comments, and method signatures are properly updated.
Also applies to: 23-26, 33-50
117-176: LGTM!The
runJudgeEvaluationmethod has been correctly updated to use the new terminology:buildJudgeLLMSchema,JudgeLLMResult,JudgeResult,calculateJudgeScore, andaverageJudgeScores. The implementation logic remains unchanged and correct.
186-231: LGTM!The
runCheckEvaluationmethod has been correctly updated to use the new terminology:buildCheckLLMSchema,CheckLLMResult,CheckResult, andcalculateCheckScore. The array type annotationCheckLLMResult["violations"][]is properly used for type safety.
- Rename test prompt id from "test-semi" to "test-check" - Update prompt metadata id to match new naming convention - Change prompt name from "Test Semi" to "Test Check" - Align test identifiers with refactored check/judge type naming
* refactor(prompts): rename schema types from subjective/semi-objective to judge/check - Rename `buildSubjectiveLLMSchema()` to `buildJudgeLLMSchema()` - Rename `buildSemiObjectiveLLMSchema()` to `buildCheckLLMSchema()` - Rename `SubjectiveLLMResult` type to `JudgeLLMResult` - Rename `SemiObjectiveLLMResult` type to `CheckLLMResult` - Rename `SubjectiveResult` type to `JudgeResult` - Rename `SemiObjectiveItem` type to `CheckItem` - Rename `SemiObjectiveResult` type to `CheckResult` - Rename `isSubjectiveResult()` function to `isJudgeResult()` - Rename `isSemiObjectiveResult()` function to `isCheckResult()` - Update `EvaluationResult` union type to use new type names - Align schema naming with recent rule type terminology changes * refactor(schema): rename subjective/semi-objective types to judge/check - Rename SemiObjectiveItem to CheckItem in chunking/merger.ts - Rename SubjectiveResult to JudgeResult in cli/orchestrator.ts and cli/types.ts - Rename SemiObjectiveResult to CheckResult in evaluators - Rename buildSubjectiveLLMSchema to buildJudgeLLMSchema - Rename buildSemiObjectiveLLMSchema to buildCheckLLMSchema - Rename SubjectiveLLMResult to JudgeLLMResult - Rename SemiObjectiveLLMResult to CheckLLMResult - Rename calculateSubjectiveScore to calculateJudgeScore - Rename calculateSemiObjectiveScore to calculateCheckScore - Rename averageSubjectiveScores to averageJudgeScores - Rename isSubjectiveResult to isJudgeResult - Update all comments and documentation to use judge/check terminology - Update test files to reflect new type names - Ensures consistent naming convention across codebase following recent rule type refactoring * chore: Rename EvalutionResult to PromptEvaluationResult * test(scoring-types): Update check evaluation test identifiers - Rename test prompt id from "test-semi" to "test-check" - Update prompt metadata id to match new naming convention - Change prompt name from "Test Semi" to "Test Check" - Align test identifiers with refactored check/judge type naming
This refactor updates the codebase to align internal terminology with our product types. It replaces the legacy terms "Semi-Objective" and "Subjective" with the canonical "Check" and "Judge" respectively.
Changes
SemiObjective*toCheck*andSubjective*toJudge*in schema.ts and types.ts.Reasoning
Consistent terminology reduces cognitive load for developers and prevents confusion between internal code and user-facing documentation.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.