refactor: standardize evaluation terminology to Check and Judge#52

Merged
oshorefueled merged 4 commits into main from refactor/check-judge-terminology
Jan 13, 2026
@ayo6706 (Collaborator) commented Jan 9, 2026

This refactor updates the codebase to align internal terminology with our product types. It replaces the legacy terms "Semi-Objective" and "Subjective" with the canonical "Check" and "Judge" respectively.

Changes

  • Schemas & Types: Renamed SemiObjective* to Check* and Subjective* to Judge* in schema.ts and types.ts.
  • Scoring Engine: Updated scorer.ts functions (calculateCheckScore, calculateJudgeScore) to match the new naming.
  • Evaluators: Updated BaseEvaluator and AccuracyEvaluator to use the new types and scoring functions.
  • Orchestrator: Refactored orchestrator.ts to route results using isCheckResult / isJudgeResult.
  • Tests: Updated scoring-types.test.ts to verify the new terminology.

Reasoning

Consistent terminology reduces cognitive load for developers and prevents confusion between internal code and user-facing documentation.

Summary by CodeRabbit

  • Refactor
    • Unified public naming across prompts, evaluators, and scoring: "Subjective"/"SemiObjective" → "Judge"/"Check".
    • Renamed public types, scoring functions, and result types to the new vocabulary for a consistent API.
    • No changes to behavior or user-visible outputs; improves clarity and integration consistency for consumers.

… to judge/check

- Rename `buildSubjectiveLLMSchema()` to `buildJudgeLLMSchema()`
- Rename `buildSemiObjectiveLLMSchema()` to `buildCheckLLMSchema()`
- Rename `SubjectiveLLMResult` type to `JudgeLLMResult`
- Rename `SemiObjectiveLLMResult` type to `CheckLLMResult`
- Rename `SubjectiveResult` type to `JudgeResult`
- Rename `SemiObjectiveItem` type to `CheckItem`
- Rename `SemiObjectiveResult` type to `CheckResult`
- Rename `isSubjectiveResult()` function to `isJudgeResult()`
- Rename `isSemiObjectiveResult()` function to `isCheckResult()`
- Update `EvaluationResult` union type to use new type names
- Align schema naming with recent rule type terminology changes
- Rename SemiObjectiveItem to CheckItem in chunking/merger.ts
- Rename SubjectiveResult to JudgeResult in cli/orchestrator.ts and cli/types.ts
- Rename SemiObjectiveResult to CheckResult in evaluators
- Rename buildSubjectiveLLMSchema to buildJudgeLLMSchema
- Rename buildSemiObjectiveLLMSchema to buildCheckLLMSchema
- Rename SubjectiveLLMResult to JudgeLLMResult
- Rename SemiObjectiveLLMResult to CheckLLMResult
- Rename calculateSubjectiveScore to calculateJudgeScore
- Rename calculateSemiObjectiveScore to calculateCheckScore
- Rename averageSubjectiveScores to averageJudgeScores
- Rename isSubjectiveResult to isJudgeResult
- Update all comments and documentation to use judge/check terminology
- Update test files to reflect new type names
- Ensures consistent naming convention across codebase following recent rule type refactoring

coderabbitai bot commented Jan 9, 2026

📝 Walkthrough

This PR renames evaluation-related types, functions, and predicates across the codebase, replacing "Subjective/Semi-Objective" terminology with "Judge/Check" naming (types, schema builders, scoring functions, predicates, and related imports/usages).

Changes

| Cohort | File(s) | Change Summary |
|---|---|---|
| Schema & Type Definitions | src/prompts/schema.ts | Renamed public types and builders: SubjectiveLLMResult → JudgeLLMResult, SemiObjectiveLLMResult → CheckLLMResult, SubjectiveResult → JudgeResult, SemiObjectiveItem → CheckItem, SemiObjectiveResult → CheckResult; buildSubjectiveLLMSchema → buildJudgeLLMSchema, buildSemiObjectiveLLMSchema → buildCheckLLMSchema; type guards isSubjectiveResult → isJudgeResult, isSemiObjectiveResult → isCheckResult. |
| Scoring Implementation & Exports | src/scoring/scorer.ts, src/scoring/index.ts | Renamed scoring options and functions: SubjectiveScoringOptions → JudgeScoringOptions, SemiObjectiveScoringOptions → CheckScoringOptions (added promptSeverity?); calculateSubjectiveScore → calculateJudgeScore, calculateSemiObjectiveScore → calculateCheckScore, averageSubjectiveScores → averageJudgeScores; updated signatures, types, and aggregation logic. |
| Evaluation Layer | src/evaluators/base-evaluator.ts, src/evaluators/evaluator.ts | Replaced EvaluationResult with PromptEvaluationResult; replaced Subjective/Semi-Objective flows with Judge/Check: runJudgeEvaluation returns JudgeResult, runCheckEvaluation returns CheckResult; switched schema builders and scorer invocations to new names and updated chunk/result typings. |
| Accuracy Evaluator | src/evaluators/accuracy-evaluator.ts | Swapped calculateSemiObjectiveScore → calculateCheckScore; updated evaluate return typing to Promise<PromptEvaluationResult> and related imports. |
| CLI & Orchestration | src/cli/orchestrator.ts, src/cli/types.ts | Replaced predicate isSubjectiveResult → isJudgeResult in routing/control flow; updated ProcessCriterionParams.result and ValidationParams.result types from SubjectiveResult → JudgeResult; comments/docs adjusted. |
| Utilities / Chunking | src/chunking/merger.ts | Updated mergeViolations signature to accept chunkViolations: CheckItem[][] and return CheckItem[]; internal flatten/dedup logic unchanged. |
| Tests | tests/scoring-types.test.ts | Updated test typings/imports to use JudgeLLMResult and CheckLLMResult in mock LLM responses and generic test types. |

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • oshorefueled

"I hopped through code with a twitchy nose,
Names swapped like carrots in tidy rows,
Judge and Check now dance and play,
Old labels hopped away today,
Bugs stay small, the build still glows. 🐇✨"

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the primary change: a systematic refactor standardizing evaluation terminology from 'Semi-Objective/Subjective' to 'Check/Judge' across the entire codebase, affecting schemas, types, scoring functions, and evaluators. |

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad4cca6 and ea9743b.

📒 Files selected for processing (1)
  • tests/scoring-types.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/scoring-types.test.ts


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/prompts/schema.ts (1)

58-94: Schema/type drift: buildCheckLLMSchema() requires fields that CheckLLMResult marks optional.

Right now the schema requires quoted_text, context_before, context_after, description, analysis, suggestion, but CheckLLMResult allows most of these to be absent. This is a maintainability trap (types suggest “optional”, runtime schema rejects).

Proposed fix (align TS type to the schema)
 export type CheckLLMResult = {
   violations: Array<{
-    description: string;
-    analysis: string;
-    suggestion?: string;
-    quoted_text?: string;
-    context_before?: string;
-    context_after?: string;
+    line?: number;
+    description: string;
+    analysis: string;
+    suggestion: string;
+    quoted_text: string;
+    context_before: string;
+    context_after: string;
   }>;
 };
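
One way to keep this kind of drift from returning is to drive both the TypeScript type and the runtime schema from a single list of required fields. The sketch below is only an illustration of that idea, not the project's code: `REQUIRED_VIOLATION_FIELDS`, `CheckViolation`, `JsonObjectSchema`, and `buildViolationSchema` are hypothetical names, and the schema shape is a simplified stand-in for whatever `buildCheckLLMSchema()` actually returns.

```typescript
// Hypothetical sketch: one list of required fields feeds both the type and the schema.
const REQUIRED_VIOLATION_FIELDS = [
  "description",
  "analysis",
  "suggestion",
  "quoted_text",
  "context_before",
  "context_after",
] as const;

type RequiredField = (typeof REQUIRED_VIOLATION_FIELDS)[number];

// Every required field is a string; `line` stays optional, as in the proposed fix.
type CheckViolation = Record<RequiredField, string> & { line?: number };

// Simplified JSON-Schema-like shape (an assumption, not the real schema type).
type JsonObjectSchema = {
  type: "object";
  additionalProperties: false;
  properties: Record<string, { type: "string" | "number" }>;
  required: string[];
};

function buildViolationSchema(): JsonObjectSchema {
  const properties: Record<string, { type: "string" | "number" }> = {
    line: { type: "number" },
  };
  // Register each required field as a string property.
  REQUIRED_VIOLATION_FIELDS.forEach((field) => {
    properties[field] = { type: "string" };
  });
  return {
    type: "object",
    additionalProperties: false,
    properties,
    required: REQUIRED_VIOLATION_FIELDS.slice(),
  };
}

// A value that satisfies the type necessarily carries every field the schema requires.
const sampleViolation: CheckViolation = {
  description: "Passive voice",
  analysis: "The sentence hides the actor.",
  suggestion: "Name the actor explicitly.",
  quoted_text: "Mistakes were made",
  context_before: "",
  context_after: ".",
};
```

With this arrangement, adding or removing a field in the list changes the type and the `required` array together, so the two cannot disagree about optionality.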
🧹 Nitpick comments (7)
src/prompts/schema.ts (2)

96-122: Optional: include line?: number in JudgeLLMResult violations for parity with the schema.

The judge schema allows line, but JudgeLLMResult drops it. That’s fine if intentionally unused, but it’s surprising given the schema.

Possible tweak
 export type JudgeLLMResult = {
   criteria: Array<{
@@
     violations: Array<{
+      line?: number;
       quoted_text: string;
       context_before: string;
       context_after: string;
       analysis: string;
       suggestion: string;
     }>;
   }>;
 };

123-173: Public API naming: export type EvaluationResult is easy to confuse with CLI’s aggregated EvaluationResult.

You’re already working around it via import-aliasing (EvaluationResult as PromptEvaluationResult) elsewhere. Consider renaming this to something like PromptEvaluationResult to reduce repeated aliasing and accidental misuse.

src/cli/orchestrator.ts (1)

538-645: Future-proof routing: prefer isCheckResult(result) over !isJudgeResult(result).

Today it’s equivalent (two-type union), but !isJudgeResult will silently treat any future third type as “Check”. Using the explicit guard makes intent and safety clearer.

Proposed change
-import { isJudgeResult } from '../prompts/schema';
+import { isCheckResult, isJudgeResult } from '../prompts/schema';
@@
-  // Handle Check Result
-  if (!isJudgeResult(result)) {
+  // Handle Check Result
+  if (isCheckResult(result)) {
     const severity = result.severity;
     const violationCount = result.violations.length;
@@
-  }
+  } else if (!isJudgeResult(result)) {
+    // If additional evaluation types are ever introduced, fail loudly.
+    throw new Error(`Unknown evaluation result type: ${(result as { type?: unknown }).type}`);
+  }
 
-  // Handle Judge Result
+  // Handle Judge Result
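
An alternative to chaining guards is an exhaustive `switch` over the discriminant, so the compiler flags any branch that a future third result type would need. This is a minimal sketch under simplified assumptions: the result shapes and the string discriminants `"judge"` and `"check"` are stand-ins for the real types in src/prompts/schema.ts, and `assertNever` and `route` are hypothetical helpers.

```typescript
// Simplified, hypothetical result shapes; the real ones carry more fields.
type JudgeResult = { type: "judge"; final_score: number };
type CheckResult = { type: "check"; violations: string[] };
type PromptEvaluationResult = JudgeResult | CheckResult;

// Reaching this at runtime means a new variant was added without a matching case.
function assertNever(value: never): never {
  throw new Error(`Unknown evaluation result type: ${JSON.stringify(value)}`);
}

function route(result: PromptEvaluationResult): string {
  switch (result.type) {
    case "check":
      return `check:${result.violations.length}`;
    case "judge":
      return `judge:${result.final_score}`;
    default:
      // If the union grows, `result` is no longer `never` here and this line
      // stops compiling, which surfaces the missing branch at build time.
      return assertNever(result);
  }
}
```

The compile-time error covers the common case; the thrown error covers data that bypasses the type system, such as an unvalidated payload.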
tests/scoring-types.test.ts (1)

80-135: Check mock result is aligned with CheckLLMResult; consider renaming id/name from semi → check for consistency.

src/scoring/scorer.ts (3)

48-100: Guard wordCount (and consider clamping strictness) to avoid surprising scores from invalid inputs.

Proposed diff
 export function calculateCheckScore(
   violations: CheckItem[],
   wordCount: number,
   options: CheckScoringOptions = {}
 ): CheckResult {
-  const strictness = resolveStrictness(options.strictness);
+  const strictness = resolveStrictness(options.strictness);
+  const safeWordCount = Math.max(1, wordCount);

   // Map items to violation format
   const mappedViolations = violations.map((item) => ({
     analysis: item.analysis,
@@
   }));

   // Density Calculation: Violations per 100 words
-  const density = (mappedViolations.length / wordCount) * 100;
+  const density = (mappedViolations.length / safeWordCount) * 100;
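
To see why the guard matters, the density step can be isolated into a small function. This is a sketch only: `violationDensity` is a hypothetical helper, and the extra `isFinite` check goes slightly beyond the `Math.max(1, wordCount)` guard proposed above.

```typescript
// Hypothetical helper isolating the density step: violations per 100 words.
function violationDensity(violationCount: number, wordCount: number): number {
  // Treat zero, negative, NaN, or infinite word counts as a single word so the
  // division can never produce Infinity or NaN.
  const safeWordCount = isFinite(wordCount) ? Math.max(1, wordCount) : 1;
  return (violationCount / safeWordCount) * 100;
}

// One violation in four words is 25 violations per 100 words.
const typicalDensity = violationDensity(1, 4);

// Without the guard, 3 / 0 would be Infinity; with it, the count is divided by 1.
const emptyDocumentDensity = violationDensity(3, 0);
```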

108-144: Use ?? instead of || for criterion weights to avoid treating weight: 0 as “unset”.

Proposed diff
-    const weight = definedCriterion?.weight || 1;
+    const weight = definedCriterion?.weight ?? 1;
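
The difference only shows up when a criterion explicitly sets its weight to 0. A minimal illustration, using a hypothetical `Criterion` shape rather than the project's real type:

```typescript
// Hypothetical criterion shape with an optional weight.
type Criterion = { weight?: number };

// `||` falls back on any falsy value, so an explicit 0 is replaced by 1.
function weightWithOr(criterion?: Criterion): number {
  return criterion?.weight || 1;
}

// `??` falls back only on null or undefined, so an explicit 0 is preserved.
function weightWithNullish(criterion?: Criterion): number {
  return criterion?.weight ?? 1;
}

const zeroWeighted: Criterion = { weight: 0 };
const orResult = weightWithOr(zeroWeighted); // 1: the zero weight is lost
const nullishResult = weightWithNullish(zeroWeighted); // 0: the zero weight is kept
```

Both versions still return 1 when the criterion or its weight is missing, so the change affects only the zero-weight case.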

147-165: averageJudgeScores warning on length mismatch is good; consider failing fast in tests/dev to avoid silently underweighting missing word counts.

Also applies to: 232-281
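
The fail-fast suggestion could look roughly like the sketch below. It assumes a word-count-weighted average over plain numbers; the real `averageJudgeScores` works on Judge results and its signature is not shown here, so `weightedAverage` and its `strict` flag are hypothetical.

```typescript
// Hypothetical word-count-weighted average over plain scores.
function weightedAverage(
  scores: number[],
  wordCounts: number[],
  strict = false
): number {
  if (scores.length !== wordCounts.length) {
    const message =
      `Length mismatch: ${scores.length} scores vs ${wordCounts.length} word counts`;
    // In strict mode (for example tests or dev builds), fail loudly.
    if (strict) throw new Error(message);
    console.warn(message);
  }

  let weightedSum = 0;
  let totalWeight = 0;
  scores.forEach((score, index) => {
    // A missing word count contributes zero weight, which is the silent
    // underweighting that the strict mode is meant to surface.
    const weight = wordCounts[index] ?? 0;
    weightedSum += score * weight;
    totalWeight += weight;
  });

  return totalWeight > 0 ? weightedSum / totalWeight : 0;
}
```

For example, scores `[4, 2]` with word counts `[100, 300]` average to 2.5, because the longer chunk counts three times as much.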

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0fddb2c and f79bb9b.

📒 Files selected for processing (9)
  • src/chunking/merger.ts
  • src/cli/orchestrator.ts
  • src/cli/types.ts
  • src/evaluators/accuracy-evaluator.ts
  • src/evaluators/base-evaluator.ts
  • src/prompts/schema.ts
  • src/scoring/index.ts
  • src/scoring/scorer.ts
  • tests/scoring-types.test.ts
🧰 Additional context used
📓 Path-based instructions (2)
src/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.ts: Use TypeScript ESM with explicit imports and narrow types
Use 2-space indentation; avoid trailing whitespace
Maintain strict TypeScript with no any; use unknown + schema validation for external data
Use custom error types with proper inheritance; catch blocks use unknown type

Files:

  • src/cli/types.ts
  • src/cli/orchestrator.ts
  • src/chunking/merger.ts
  • src/evaluators/accuracy-evaluator.ts
  • src/evaluators/base-evaluator.ts
  • src/prompts/schema.ts
  • src/scoring/scorer.ts
  • src/scoring/index.ts
tests/**/*.test.ts

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.test.ts: Write tests using Vitest framework with focus on config parsing, file discovery, schema/structured output, and locator
Use dependency injection in tests: mock providers; do not hit network in unit tests

Files:

  • tests/scoring-types.test.ts
🧠 Learnings (4)
📚 Learning: 2025-12-28T19:43:51.189Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.189Z
Learning: Applies to src/output/**/*.ts : IDs should be shown as `PromptId.CriterionId` in output

Applied to files:

  • src/cli/types.ts
  • src/prompts/schema.ts
📚 Learning: 2025-12-28T19:43:51.189Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.189Z
Learning: Applies to src/boundaries/**/*.ts : Use Zod schemas for boundary validation of all external data (files, CLI, env, APIs) at system boundaries

Applied to files:

  • src/evaluators/accuracy-evaluator.ts
📚 Learning: 2025-12-28T19:43:51.189Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.189Z
Learning: Applies to src/providers/**/*.ts : Depend on `LLMProvider` and `SearchProvider` interfaces; keep providers thin (transport only)

Applied to files:

  • tests/scoring-types.test.ts
📚 Learning: 2025-12-28T19:43:51.189Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.189Z
Learning: Applies to src/**/*.ts : Maintain strict TypeScript with no `any`; use `unknown` + schema validation for external data

Applied to files:

  • src/prompts/schema.ts
🧬 Code graph analysis (7)
src/cli/types.ts (1)
src/prompts/schema.ts (1)
  • JudgeResult (123-143)
src/cli/orchestrator.ts (1)
src/prompts/schema.ts (1)
  • isJudgeResult (175-179)
src/chunking/merger.ts (2)
src/chunking/index.ts (1)
  • mergeViolations (3-3)
src/prompts/schema.ts (1)
  • CheckItem (145-152)
src/evaluators/accuracy-evaluator.ts (1)
src/scoring/scorer.ts (1)
  • calculateCheckScore (48-100)
tests/scoring-types.test.ts (2)
src/providers/llm-provider.ts (1)
  • LLMResult (3-6)
src/prompts/schema.ts (2)
  • JudgeLLMResult (96-110)
  • CheckLLMResult (112-121)
src/prompts/schema.ts (1)
src/cli/types.ts (1)
  • EvaluationResult (33-41)
src/scoring/scorer.ts (2)
src/scoring/index.ts (5)
  • CheckScoringOptions (2-2)
  • JudgeScoringOptions (3-3)
  • calculateCheckScore (4-4)
  • calculateJudgeScore (5-5)
  • averageJudgeScores (6-6)
src/prompts/schema.ts (4)
  • CheckItem (145-152)
  • CheckResult (154-171)
  • JudgeLLMResult (96-110)
  • JudgeResult (123-143)
🔇 Additional comments (16)
src/prompts/schema.ts (2)

4-56: buildJudgeLLMSchema() looks consistent and strict in the right places.

The optional line field (present but not required) matches the intent of the rest of the schema and avoids hard-failing on missing line numbers.


175-185: Type guards look correct.

result.type === EvaluationType.JUDGE/CHECK is the cleanest discriminator here.
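
For readers unfamiliar with the pattern, here is a minimal sketch of discriminant-based type guards. `EvaluationType.JUDGE` and `EvaluationType.CHECK` are named in this review, but the enum's string values and the trimmed-down result shapes below are assumptions made for illustration.

```typescript
// Enum member names come from the review; the string values are assumed.
enum EvaluationType {
  JUDGE = "judge",
  CHECK = "check",
}

// Trimmed-down, hypothetical result shapes.
type JudgeResult = { type: EvaluationType.JUDGE; final_score: number };
type CheckResult = { type: EvaluationType.CHECK; violations: string[] };
type PromptEvaluationResult = JudgeResult | CheckResult;

// Each guard compares the discriminant and narrows the union for the caller.
function isJudgeResult(result: PromptEvaluationResult): result is JudgeResult {
  return result.type === EvaluationType.JUDGE;
}

function isCheckResult(result: PromptEvaluationResult): result is CheckResult {
  return result.type === EvaluationType.CHECK;
}

function describeResult(result: PromptEvaluationResult): string {
  if (isJudgeResult(result)) {
    // Narrowed to JudgeResult: final_score is available.
    return `judge score ${result.final_score}`;
  }
  // Narrowed to CheckResult: violations is available.
  return `check with ${result.violations.length} violation(s)`;
}
```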

src/cli/types.ts (3)

10-12: Import aliasing is appropriate to avoid the EvaluationResult name collision.

No concerns with EvaluationResult as PromptEvaluationResult here.


103-106: Same for ValidationParams.result: JudgeResult.

Good tightening of types (validation is judge-only).


87-111: The ProcessCriterionParams.result narrowing to JudgeResult is correct and properly enforced.

The call to extractAndReportCriterion at line 662 is correctly placed inside the Judge Result branch (after the if (!isJudgeResult(result)) guard). No changes needed.

src/evaluators/accuracy-evaluator.ts (1)

13-14: Rename propagation looks correct: calculateCheckScore used in the no-claims path.

The call signature matches the provided scorer snippet (violations array + wordCount + options).

Also applies to: 66-76

src/cli/orchestrator.ts (1)

10-10: Import updated as expected for new terminology.

src/chunking/merger.ts (1)

1-5: Type rename and deduplication logic verified as correct.

The CheckItem type rename is fully applied with no remaining SemiObjectiveItem references. The merge/dedupe implementation correctly flattens input and deduplicates using the composite key (quoted_text + description + analysis). Call site in base-evaluator.ts properly passes CheckLLMResult["violations"][] which matches the CheckItem[][] signature.
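
As a rough illustration of the flatten-and-dedupe behavior described above, the sketch below merges per-chunk violations using a composite key of `quoted_text`, `description`, and `analysis`. The `CheckItem` shape is reduced to a few fields and the way the key is serialized is an assumption; the real implementation in src/chunking/merger.ts may differ.

```typescript
// Reduced, hypothetical CheckItem shape.
type CheckItem = {
  description: string;
  analysis: string;
  quoted_text?: string;
};

function mergeViolations(chunkViolations: CheckItem[][]): CheckItem[] {
  const seen: Record<string, boolean> = {};
  const merged: CheckItem[] = [];

  chunkViolations.forEach((chunk) => {
    chunk.forEach((item) => {
      // Composite key; JSON.stringify avoids collisions from naive concatenation.
      const key = JSON.stringify([
        item.quoted_text ?? "",
        item.description,
        item.analysis,
      ]);
      if (!seen[key]) {
        seen[key] = true;
        merged.push(item); // Keep the first occurrence, preserving order.
      }
    });
  });

  return merged;
}
```

Duplicates typically arise when overlapping chunks report the same violation, so keeping only the first occurrence avoids counting it twice in the density score.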

tests/scoring-types.test.ts (3)

6-9: Type import rename looks consistent with the new public schema surface.


35-77: Judge mock result is aligned with JudgeLLMResult (score 1–4) and weighted scoring expectations.


137-159: Empty-violations case correctly exercises the “perfect score” path with CheckLLMResult.

src/scoring/index.ts (1)

1-7: Public scoring exports are cleanly updated to Check/Judge terminology.

src/evaluators/base-evaluator.ts (3)

4-12: Renames to Judge/Check schema + scoring functions are consistent and improve terminology alignment.

Also applies to: 22-26, 32-42


186-231: Check evaluation path correctly scores once from merged violations using original (non-line-numbered) word count.


117-176: Remove this concern — countWords() explicitly strips line number prefixes before counting words.

The countWords() function in src/chunking/utils.ts includes logic to remove line number prefixes using the regex /^\d+\t/gm before word counting. This was intentionally designed to ensure accurate word counts for line-numbered content, preventing any skew in chunk weighting.

Likely an incorrect or invalid review comment.
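
The prefix-stripping behavior can be sketched as follows. The regex `/^\d+\t/gm` is the one quoted above; the bodies of `prependLineNumbers` and `countWords` are hypothetical reconstructions, assuming a `<number><tab>` prefix format, not copies of the project's code.

```typescript
// Assumed prefix format: "<line number><tab><original line>".
function prependLineNumbers(text: string): string {
  return text
    .split("\n")
    .map((line, index) => `${index + 1}\t${line}`)
    .join("\n");
}

function countWords(text: string): number {
  // Strip a leading "<digits><tab>" from every line before counting.
  const withoutLineNumbers = text.replace(/^\d+\t/gm, "");
  return withoutLineNumbers
    .trim()
    .split(/\s+/)
    .filter((word) => word.length > 0).length;
}

const original = "hello world\nfoo";
const numbered = prependLineNumbers(original); // "1\thello world\n2\tfoo"

// Both counts are 3: the line-number prefixes do not inflate the total.
const originalCount = countWords(original);
const numberedCount = countWords(numbered);
```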

src/scoring/scorer.ts (1)

9-24: New scoring option types are clear and match the Check/Judge rename intent.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/prompts/schema.ts (1)

112-121: Critical: Schema/type mismatch for CheckLLMResult.

The buildCheckLLMSchema function marks suggestion, quoted_text, context_before, and context_after as required fields (lines 80-87), but the CheckLLMResult type marks them as optional. This creates a mismatch between what the LLM is forced to return and what TypeScript enforces.

🔧 Proposed fix to align type with schema
 export type CheckLLMResult = {
   violations: Array<{
     description: string;
     analysis: string;
-    suggestion?: string;
-    quoted_text?: string;
-    context_before?: string;
-    context_after?: string;
+    suggestion: string;
+    quoted_text: string;
+    context_before: string;
+    context_after: string;
   }>;
 };
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f79bb9b and ad4cca6.

📒 Files selected for processing (5)
  • src/cli/types.ts
  • src/evaluators/accuracy-evaluator.ts
  • src/evaluators/base-evaluator.ts
  • src/evaluators/evaluator.ts
  • src/prompts/schema.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/evaluators/accuracy-evaluator.ts
  • src/cli/types.ts
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.ts

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.ts: Use TypeScript ESM with explicit imports and narrow types
Use 2-space indentation; avoid trailing whitespace
Maintain strict TypeScript with no any; use unknown + schema validation for external data
Use custom error types with proper inheritance; catch blocks use unknown type

Files:

  • src/evaluators/evaluator.ts
  • src/prompts/schema.ts
  • src/evaluators/base-evaluator.ts
🧠 Learnings (1)
📚 Learning: 2025-12-28T19:43:51.189Z
Learnt from: CR
Repo: TRocket-Labs/vectorlint PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-28T19:43:51.189Z
Learning: Applies to src/output/**/*.ts : IDs should be shown as `PromptId.CriterionId` in output

Applied to files:

  • src/prompts/schema.ts
🧬 Code graph analysis (2)
src/evaluators/evaluator.ts (1)
src/prompts/schema.ts (1)
  • PromptEvaluationResult (173-173)
src/evaluators/base-evaluator.ts (4)
src/prompts/schema.ts (7)
  • PromptEvaluationResult (173-173)
  • JudgeResult (123-143)
  • buildJudgeLLMSchema (4-56)
  • JudgeLLMResult (96-110)
  • CheckResult (154-171)
  • buildCheckLLMSchema (58-94)
  • CheckLLMResult (112-121)
src/scoring/scorer.ts (3)
  • calculateJudgeScore (108-144)
  • averageJudgeScores (147-281)
  • calculateCheckScore (48-100)
src/output/line-numbering.ts (1)
  • prependLineNumbers (13-18)
src/chunking/utils.ts (1)
  • countWords (5-10)
🔇 Additional comments (6)
src/prompts/schema.ts (2)

4-94: LGTM!

The schema builders have been correctly renamed to buildJudgeLLMSchema and buildCheckLLMSchema, with appropriate internal schema names (vectorlint_judge_result and vectorlint_check_result). The structure and strict mode usage are appropriate.


96-110: LGTM!

The type definitions and type guards have been successfully renamed to use the new Check/Judge terminology. The discriminated union pattern with EvaluationType is correctly implemented in the type guards.

Also applies to: 123-185

src/evaluators/evaluator.ts (1)

1-9: LGTM!

The Evaluator interface has been correctly updated to use the renamed PromptEvaluationResult type. The refactoring is clean and maintains the same contract.

src/evaluators/base-evaluator.ts (3)

4-12: LGTM!

The imports and class-level changes correctly adopt the new Check/Judge terminology. All type imports, scoring function imports, comments, and method signatures are properly updated.

Also applies to: 23-26, 33-50


117-176: LGTM!

The runJudgeEvaluation method has been correctly updated to use the new terminology: buildJudgeLLMSchema, JudgeLLMResult, JudgeResult, calculateJudgeScore, and averageJudgeScores. The implementation logic remains unchanged and correct.


186-231: LGTM!

The runCheckEvaluation method has been correctly updated to use the new terminology: buildCheckLLMSchema, CheckLLMResult, CheckResult, and calculateCheckScore. The array type annotation CheckLLMResult["violations"][] is properly used for type safety.

- Rename test prompt id from "test-semi" to "test-check"
- Update prompt metadata id to match new naming convention
- Change prompt name from "Test Semi" to "Test Check"
- Align test identifiers with refactored check/judge type naming
@oshorefueled oshorefueled merged commit 03ed298 into main Jan 13, 2026
3 checks passed
@oshorefueled oshorefueled deleted the refactor/check-judge-terminology branch January 13, 2026 11:04
oshorefueled pushed a commit that referenced this pull request Mar 2, 2026
* refactor(prompts): rename schema types from subjective/semi-objective to judge/check

- Rename `buildSubjectiveLLMSchema()` to `buildJudgeLLMSchema()`
- Rename `buildSemiObjectiveLLMSchema()` to `buildCheckLLMSchema()`
- Rename `SubjectiveLLMResult` type to `JudgeLLMResult`
- Rename `SemiObjectiveLLMResult` type to `CheckLLMResult`
- Rename `SubjectiveResult` type to `JudgeResult`
- Rename `SemiObjectiveItem` type to `CheckItem`
- Rename `SemiObjectiveResult` type to `CheckResult`
- Rename `isSubjectiveResult()` function to `isJudgeResult()`
- Rename `isSemiObjectiveResult()` function to `isCheckResult()`
- Update `EvaluationResult` union type to use new type names
- Align schema naming with recent rule type terminology changes

* refactor(schema): rename subjective/semi-objective types to judge/check

- Rename SemiObjectiveItem to CheckItem in chunking/merger.ts
- Rename SubjectiveResult to JudgeResult in cli/orchestrator.ts and cli/types.ts
- Rename SemiObjectiveResult to CheckResult in evaluators
- Rename buildSubjectiveLLMSchema to buildJudgeLLMSchema
- Rename buildSemiObjectiveLLMSchema to buildCheckLLMSchema
- Rename SubjectiveLLMResult to JudgeLLMResult
- Rename SemiObjectiveLLMResult to CheckLLMResult
- Rename calculateSubjectiveScore to calculateJudgeScore
- Rename calculateSemiObjectiveScore to calculateCheckScore
- Rename averageSubjectiveScores to averageJudgeScores
- Rename isSubjectiveResult to isJudgeResult
- Update all comments and documentation to use judge/check terminology
- Update test files to reflect new type names
- Ensures consistent naming convention across codebase following recent rule type refactoring

* chore: Rename EvaluationResult to PromptEvaluationResult

* test(scoring-types): Update check evaluation test identifiers

- Rename test prompt id from "test-semi" to "test-check"
- Update prompt metadata id to match new naming convention
- Change prompt name from "Test Semi" to "Test Check"
- Align test identifiers with refactored check/judge type naming