From 7055d27cafb987a7949bed99666b9a05a6fb6ce2 Mon Sep 17 00:00:00 2001
From: Christopher Tso <christso@gmail.com>
Date: Wed, 25 Mar 2026 02:14:26 +0000
Subject: [PATCH 01/11] docs: add design for --threshold flag (#698)

Design document for suite-level quality gate threshold flag
that fails CI when mean eval score drops below a specified value.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 1604663b3709
---
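Note (not part of the commit): the "Score Evaluation" steps in the design
document added by this patch can be sketched as below. This is a minimal
sketch under stated assumptions, not the planned implementation.
`ResultLike`, its `executionError` flag, `GateOutcome`, and `checkThreshold`
are hypothetical names used only for illustration; the design itself defers
to the existing `calculateEvaluationSummary()` for the mean.

```typescript
// Hypothetical result shape: only the fields this sketch needs.
interface ResultLike {
  readonly score: number;
  // Assumed marker for tests that errored instead of producing a quality score.
  readonly executionError?: boolean;
}

// Outcome of the gate: either no check ran, or a mean was compared.
type GateOutcome =
  | { readonly kind: 'skipped' }
  | { readonly kind: 'checked'; readonly mean: number; readonly passed: boolean };

function checkThreshold(
  results: readonly ResultLike[],
  threshold: number | undefined,
): GateOutcome {
  // No threshold configured: current behavior is preserved.
  if (threshold === undefined) return { kind: 'skipped' };

  // Step 1: only quality results count toward the mean.
  const quality = results.filter((r) => !r.executionError);

  // Step 4: nothing to average, so the check is skipped.
  if (quality.length === 0) return { kind: 'skipped' };

  const mean = quality.reduce((sum, r) => sum + r.score, 0) / quality.length;

  // Step 2: a mean below the threshold fails; equality passes.
  return { kind: 'checked', mean, passed: mean >= threshold };
}
```

A caller would set the process exit code to 1 only when the outcome is
`checked` with `passed` false; execution errors stay with `fail_on_error`.
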
 .../plans/2026-03-25-threshold-flag-design.md | 76 +++++++++++++++++++
 1 file changed, 76 insertions(+)
 create mode 100644 docs/plans/2026-03-25-threshold-flag-design.md
diff --git a/docs/plans/2026-03-25-threshold-flag-design.md b/docs/plans/2026-03-25-threshold-flag-design.md
new file mode 100644
index 000000000..29c6b5e74
--- /dev/null
+++ b/docs/plans/2026-03-25-threshold-flag-design.md
@@ -0,0 +1,76 @@
+# Design: `--threshold` flag for suite-level quality gates
+
+**Issue:** #698
+**Date:** 2026-03-25
+
+## Objective
+
+Add a `--threshold` CLI flag to `agentv eval` that fails (exit 1) if the mean score across all tests falls below the specified threshold. This enables CI/CD quality gating without needing `agentv compare --baseline`.
+
+## CLI Flag
+
+- `--threshold <number>` on `agentv eval run` (0–1 scale)
+- Optional — if omitted, no threshold check (current behavior preserved)
+- Overrides `execution.threshold` from YAML if both are set
+
+## YAML Config
+
+Add `threshold` to the `execution` block in eval YAML files:
+
+```yaml
+execution:
+  threshold: 0.8
+```
+
+Both `threshold` and `execution.threshold` accepted (snake_case wire format convention).
+
+## Score Evaluation
+
+After all tests complete:
+
+1. Compute mean score from quality results only (excluding `execution_error` tests — same as existing `calculateEvaluationSummary()`)
+2. If mean score < threshold → exit code 1
+3. Execution errors fail independently via existing `fail_on_error` mechanism (separate concern)
+4. If no quality results exist (all execution errors), threshold check is skipped
+
+## Output
+
+When threshold is active, append a summary line after the existing result summary:
+
+```
+Suite score: 0.53 (threshold: 0.60) — FAIL
+```
+
+or:
+
+```
+Suite score: 0.85 (threshold: 0.60) — PASS
+```
+
+## JUnit Integration
+
+The JUnit writer uses the threshold for per-test pass/fail:
+
+- If threshold is set: `score < threshold` → `<failure>` element
+- If threshold is not set: `score < 0.5` (current hardcoded behavior preserved)
+
+## Exit Code
+
+- Exit 0: mean score >= threshold (or no threshold set)
+- Exit 1: mean score < threshold
+- Execution errors handled separately by `fail_on_error`
+
+## Files to Modify
+
+1. `packages/core/src/evaluation/validation/eval-file.schema.ts` — add `threshold` to ExecutionSchema
+2. `apps/cli/src/commands/eval/commands/run.ts` — add `--threshold` CLI flag
+3. `apps/cli/src/commands/eval/run-eval.ts` — pass threshold through, check after results
+4. `apps/cli/src/commands/eval/statistics.ts` — add threshold summary formatting
+5. `apps/cli/src/commands/eval/junit-writer.ts` — use threshold for pass/fail
+6. Tests for new behavior
+
+## Non-Goals
+
+- Per-test threshold override (use `required` for that)
+- Replacement for `agentv compare` regression gating
+- Severity levels (#334)

From 44587b9eec003fd02d5aefee60e48ddb4c1def27 Mon Sep 17 00:00:00 2001
From: Christopher Tso <christso@gmail.com>
Date: Wed, 25 Mar 2026 02:19:22 +0000
Subject: [PATCH 02/11] docs: add implementation plan for --threshold flag
 (#698)

8-task TDD plan covering core extractor, YAML schema, CLI flag,
threshold check, JUnit integration, and manual UAT.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 6cfbff7718f7
---
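Note (not part of the commit): the precedence described in Tasks 3 and 4 of
the plan (CLI value first, otherwise the first defined YAML threshold when
several eval files are run) can be sketched as below. `resolveThreshold` is
a hypothetical helper, not a function named in the plan.

```typescript
// Hypothetical helper: picks the effective threshold for a run.
function resolveThreshold(
  cliThreshold: number | undefined,
  yamlThresholds: readonly (number | undefined)[],
): number | undefined {
  // Compare against undefined instead of using `||`: 0 is a valid threshold.
  if (cliThreshold !== undefined) return cliThreshold;

  // Otherwise take the first eval file that defines one, if any.
  return yamlThresholds.find((t) => t !== undefined);
}
```

With this shape, `resolveThreshold(0, [0.8])` yields `0`, so an explicit
`--threshold 0` still overrides the YAML value.
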
 docs/plans/2026-03-25-threshold-flag-plan.md | 562 +++++++++++++++++++
 1 file changed, 562 insertions(+)
 create mode 100644 docs/plans/2026-03-25-threshold-flag-plan.md
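
Note (not part of the commit): one display edge case follows from the
two-decimal formatting planned for `formatThresholdSummary`. The sketch
below assumes the same `>=` comparison and `toFixed(2)` rendering as Task 4;
`describeGate` is a hypothetical stand-in that returns the pieces separately.

```typescript
// Hypothetical stand-in: same comparison and rounding as the planned summary line.
function describeGate(meanScore: number, threshold: number) {
  return {
    passed: meanScore >= threshold, // compared at full precision
    shownScore: meanScore.toFixed(2), // rendered with two decimals
    shownThreshold: threshold.toFixed(2),
  };
}

// A mean just under the threshold fails, yet both values render identically.
const edge = describeGate(0.596, 0.6);
console.log(edge.passed, edge.shownScore, edge.shownThreshold);
```

Here the gate fails while the line would show a score of 0.60 against a
threshold of 0.60. Whether that matters for CI logs is a judgment call; the
plan as written does not address it.
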

diff --git a/docs/plans/2026-03-25-threshold-flag-plan.md b/docs/plans/2026-03-25-threshold-flag-plan.md
new file mode 100644
index 000000000..57ba2eb53
--- /dev/null
+++ b/docs/plans/2026-03-25-threshold-flag-plan.md
@@ -0,0 +1,562 @@
+# `--threshold` Flag Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Add a `--threshold` CLI flag and `execution.threshold` YAML field to `agentv eval` that exits 1 when mean quality score falls below the threshold.
+
+**Architecture:** The threshold value flows from CLI flag or YAML config through the existing options pipeline. After all tests complete, the summary is checked against the threshold. JUnit writer also uses the threshold for per-test pass/fail.
+
+**Tech Stack:** TypeScript, cmd-ts (CLI parsing), Zod (schema validation), Vitest (testing)
+
+---
+
+### Task 1: Add `extractThreshold` to core config-loader
+
+**Files:**
+- Modify: `packages/core/src/evaluation/loaders/config-loader.ts:287` (after `extractTotalBudgetUsd`)
+- Test: `packages/core/test/evaluation/loaders/config-loader.test.ts`
+
+**Step 1: Write the failing tests**
+
+Add to `packages/core/test/evaluation/loaders/config-loader.test.ts` after the `extractFailOnError` describe block:
+
+```typescript
+describe('extractThreshold', () => {
+  it('returns undefined when no execution block', () => {
+    const suite: JsonObject = { tests: [] };
+    expect(extractThreshold(suite)).toBeUndefined();
+  });
+
+  it('returns undefined when threshold not set', () => {
+    const suite: JsonObject = { execution: { target: 'default' } };
+    expect(extractThreshold(suite)).toBeUndefined();
+  });
+
+  it('parses valid threshold', () => {
+    const suite: JsonObject = { execution: { threshold: 0.8 } };
+    expect(extractThreshold(suite)).toBe(0.8);
+  });
+
+  it('accepts 0 as threshold', () => {
+    const suite: JsonObject = { execution: { threshold: 0 } };
+    expect(extractThreshold(suite)).toBe(0);
+  });
+
+  it('accepts 1 as threshold', () => {
+    const suite: JsonObject = { execution: { threshold: 1 } };
+    expect(extractThreshold(suite)).toBe(1);
+  });
+
+  it('returns undefined for negative threshold', () => {
+    const suite: JsonObject = { execution: { threshold: -0.1 } };
+    expect(extractThreshold(suite)).toBeUndefined();
+  });
+
+  it('returns undefined for threshold > 1', () => {
+    const suite: JsonObject = { execution: { threshold: 1.5 } };
+    expect(extractThreshold(suite)).toBeUndefined();
+  });
+
+  it('returns undefined for non-number threshold', () => {
+    const suite: JsonObject = { execution: { threshold: 'high' } };
+    expect(extractThreshold(suite)).toBeUndefined();
+  });
+});
+```
+
+Also add `extractThreshold` to the import at the top of the test file.
+
+**Step 2: Run tests to verify they fail**
+
+Run: `bun test packages/core/test/evaluation/loaders/config-loader.test.ts`
+Expected: FAIL — `extractThreshold` not found
+
+**Step 3: Implement `extractThreshold`**
+
+Add to `packages/core/src/evaluation/loaders/config-loader.ts` after `extractTotalBudgetUsd` (after line ~308):
+
+```typescript
+/**
+ * Extract `execution.threshold` from parsed eval suite.
+ * Accepts a number in [0, 1] range.
+ * Returns undefined when not specified.
+ */
+export function extractThreshold(suite: JsonObject): number | undefined {
+  const execution = suite.execution;
+  if (!execution || typeof execution !== 'object' || Array.isArray(execution)) {
+    return undefined;
+  }
+
+  const executionObj = execution as Record<string, unknown>;
+  const raw = executionObj.threshold;
+
+  if (raw === undefined || raw === null) {
+    return undefined;
+  }
+
+  if (typeof raw === 'number' && raw >= 0 && raw <= 1) {
+    return raw;
+  }
+
+  logWarning(
+    `Invalid execution.threshold: ${raw}. Must be a number between 0 and 1. Ignoring.`,
+  );
+  return undefined;
+}
+```
+
+**Step 4: Run tests to verify they pass**
+
+Run: `bun test packages/core/test/evaluation/loaders/config-loader.test.ts`
+Expected: PASS
+
+**Step 5: Commit**
+
+```bash
+git add packages/core/src/evaluation/loaders/config-loader.ts packages/core/test/evaluation/loaders/config-loader.test.ts
+git commit -m "feat(core): add extractThreshold for execution.threshold YAML field (#698)"
+```
+
+---
+
+### Task 2: Wire `extractThreshold` through YAML parser and schema
+
+**Files:**
+- Modify: `packages/core/src/evaluation/yaml-parser.ts:12` (imports), `:58` (re-exports), `:204` (loadTestSuite)
+- Modify: `packages/core/src/evaluation/yaml-parser.ts:168` (EvalSuiteResult type)
+- Modify: `packages/core/src/evaluation/validation/eval-file.schema.ts:317` (ExecutionSchema)
+
+**Step 1: Add `threshold` to ExecutionSchema in eval-file.schema.ts**
+
+In `packages/core/src/evaluation/validation/eval-file.schema.ts`, add to the `ExecutionSchema` object (after `failOnError` at line 330):
+
+```typescript
+  threshold: z.number().min(0).max(1).optional(),
+```
+
+**Step 2: Add to EvalSuiteResult type in yaml-parser.ts**
+
+In `packages/core/src/evaluation/yaml-parser.ts`, add to the `EvalSuiteResult` type (after `failOnError` at line 182):
+
+```typescript
+  /** Suite-level quality threshold (0-1) — suite fails if mean score is below */
+  readonly threshold?: number;
+```
+
+**Step 3: Import and re-export `extractThreshold` in yaml-parser.ts**
+
+Add `extractThreshold` to the import from `./loaders/config-loader.js` (line 12 area) and the re-export block (line 58 area).
+
+**Step 4: Use in `loadTestSuite`**
+
+In the `loadTestSuite` function (around line 203), extract and return threshold:
+
+```typescript
+  const threshold = extractThreshold(parsed);
+  return {
+    tests,
+    trials: extractTrialsConfig(parsed),
+    targets: extractTargetsFromSuite(parsed),
+    workers: extractWorkersFromSuite(parsed),
+    cacheConfig: extractCacheConfig(parsed),
+    totalBudgetUsd: extractTotalBudgetUsd(parsed),
+    ...(metadata !== undefined && { metadata }),
+    ...(failOnError !== undefined && { failOnError }),
+    ...(threshold !== undefined && { threshold }),
+  };
+```
+
+**Step 5: Regenerate the JSON schema**
+
+Run: `bun run generate:schema`
+
+**Step 6: Run core tests**
+
+Run: `bun test packages/core/test/evaluation/loaders/config-loader.test.ts`
+Expected: PASS
+
+**Step 7: Commit**
+
+```bash
+git add packages/core/src/evaluation/validation/eval-file.schema.ts packages/core/src/evaluation/yaml-parser.ts
+git commit -m "feat(core): wire extractThreshold through YAML parser and schema (#698)"
+```
+
+---
+
+### Task 3: Add `--threshold` CLI flag and pass through to run-eval
+
+**Files:**
+- Modify: `apps/cli/src/commands/eval/commands/run.ts` (add CLI flag)
+- Modify: `apps/cli/src/commands/eval/run-eval.ts` (NormalizedOptions, normalizeOptions, handler return)
+
+**Step 1: Add CLI flag to run.ts**
+
+In `apps/cli/src/commands/eval/commands/run.ts`, add after the `model` option (around line 171):
+
+```typescript
+    threshold: option({
+      type: optional(number),
+      long: 'threshold',
+      description: 'Suite-level quality gate: exit 1 if mean score falls below this value (0-1)',
+    }),
+```
+
+And add `threshold: args.threshold` to the `rawOptions` object in the handler (around line 219).
+
+**Step 2: Add to NormalizedOptions in run-eval.ts**
+
+In `apps/cli/src/commands/eval/run-eval.ts`, add to the `NormalizedOptions` interface:
+
+```typescript
+  readonly threshold?: number;
+```
+
+**Step 3: Add to normalizeOptions**
+
+In the `normalizeOptions` function, add threshold resolution (CLI > YAML):
+
+```typescript
+  // Resolve threshold: CLI --threshold > YAML execution.threshold
+  const cliThreshold = normalizeOptionalNumber(rawOptions.threshold);
+```
+
+And in the return statement:
+
+```typescript
+    threshold: cliThreshold,
+```
+
+**Step 4: Wire YAML threshold into normalized options**
+
+In `runEvalCommand`, after `prepareEvalFile` returns, merge the YAML threshold if CLI didn't set one. In the loop over eval files (around the `prepareEvalFile` call), capture `suite.threshold` and pass it through.
+
+The cleanest approach: read the YAML threshold in `prepareEvalFile` and return it alongside the other fields. Then in the main `runEvalCommand`, resolve CLI vs YAML threshold.
+
+Add `threshold` to the `prepareEvalFile` return type (alongside `failOnError`):
+
+```typescript
+  readonly threshold?: number;
+```
+
+And in `prepareEvalFile`, add after `failOnError: suite.failOnError`:
+
+```typescript
+    threshold: suite.threshold,
+```
+
+**Step 5: Commit**
+
+```bash
+git add apps/cli/src/commands/eval/commands/run.ts apps/cli/src/commands/eval/run-eval.ts
+git commit -m "feat(cli): add --threshold flag and wire through options pipeline (#698)"
+```
+
+---
+
+### Task 4: Add threshold check and summary output after eval completes
+
+**Files:**
+- Modify: `apps/cli/src/commands/eval/run-eval.ts` (after summary calculation ~line 1152)
+- Modify: `apps/cli/src/commands/eval/statistics.ts` (add `formatThresholdSummary`)
+- Test: `apps/cli/test/commands/eval/threshold.test.ts` (new)
+
+**Step 1: Write failing tests**
+
+Create `apps/cli/test/commands/eval/threshold.test.ts`:
+
+```typescript
+import { describe, expect, it } from 'bun:test';
+
+import type { EvaluationResult } from '@agentv/core';
+
+import { formatThresholdSummary } from '../../../src/commands/eval/statistics.js';
+
+function makeResult(overrides: Partial<EvaluationResult> = {}): EvaluationResult {
+  return {
+    timestamp: '2024-01-01T00:00:00Z',
+    testId: 'test-1',
+    score: 1.0,
+    assertions: [{ text: 'criterion-1', passed: true }],
+    output: [{ role: 'assistant' as const, content: 'answer' }],
+    target: 'default',
+    ...overrides,
+  };
+}
+
+describe('formatThresholdSummary', () => {
+  it('returns PASS when mean score meets threshold', () => {
+    const result = formatThresholdSummary(0.85, 0.6);
+    expect(result.passed).toBe(true);
+    expect(result.message).toContain('0.85');
+    expect(result.message).toContain('0.60');
+    expect(result.message).toContain('PASS');
+  });
+
+  it('returns FAIL when mean score is below threshold', () => {
+    const result = formatThresholdSummary(0.53, 0.6);
+    expect(result.passed).toBe(false);
+    expect(result.message).toContain('0.53');
+    expect(result.message).toContain('0.60');
+    expect(result.message).toContain('FAIL');
+  });
+
+  it('returns PASS when mean score exactly equals threshold', () => {
+    const result = formatThresholdSummary(0.6, 0.6);
+    expect(result.passed).toBe(true);
+  });
+
+  it('returns PASS for threshold 0 even when the score is 0', () => {
+    const result = formatThresholdSummary(0, 0);
+    expect(result.passed).toBe(true);
+  });
+});
+```
+
+**Step 2: Run tests to verify they fail**
+
+Run: `bun test apps/cli/test/commands/eval/threshold.test.ts`
+Expected: FAIL — `formatThresholdSummary` not found
+
+**Step 3: Implement `formatThresholdSummary` in statistics.ts**
+
+Add to `apps/cli/src/commands/eval/statistics.ts`:
+
+```typescript
+/**
+ * Format a threshold check summary line.
+ * Returns whether the threshold was met and the formatted message.
+ */
+export function formatThresholdSummary(
+  meanScore: number,
+  threshold: number,
+): { passed: boolean; message: string } {
+  const passed = meanScore >= threshold;
+  const verdict = passed ? 'PASS' : 'FAIL';
+  const message = `Suite score: ${meanScore.toFixed(2)} (threshold: ${threshold.toFixed(2)}) — ${verdict}`;
+  return { passed, message };
+}
+```
+
+**Step 4: Run tests to verify they pass**
+
+Run: `bun test apps/cli/test/commands/eval/threshold.test.ts`
+Expected: PASS
+
+**Step 5: Wire the threshold check into run-eval.ts**
+
+In `apps/cli/src/commands/eval/run-eval.ts`, after the summary is printed (around line 1153), add:
+
+```typescript
+    // Threshold quality gate check
+    const resolvedThreshold = options.threshold ?? yamlThreshold;
+    if (resolvedThreshold !== undefined) {
+      // formatThresholdSummary comes from the static './statistics.js' import shown after this block.
+      const thresholdResult = formatThresholdSummary(summary.mean, resolvedThreshold);
+      console.log(`\n${thresholdResult.message}`);
+      if (!thresholdResult.passed) {
+        process.exitCode = 1;
+      }
+    }
+```
+
+Note: `yamlThreshold` needs to be captured from the `prepareEvalFile` results. If multiple eval files are run, use the first non-undefined threshold (or the CLI value).
+
+Import `formatThresholdSummary` statically at the top (preferred over dynamic import since it's in the same package):
+
+```typescript
+import {
+  calculateEvaluationSummary,
+  formatEvaluationSummary,
+  formatMatrixSummary,
+  formatThresholdSummary,
+} from './statistics.js';
+```
+
+**Step 6: Commit**
+
+```bash
+git add apps/cli/src/commands/eval/statistics.ts apps/cli/src/commands/eval/run-eval.ts apps/cli/test/commands/eval/threshold.test.ts
+git commit -m "feat(cli): add threshold check with summary output after eval (#698)"
+```
+
+---
+
+### Task 5: JUnit writer uses threshold for per-test pass/fail
+
+**Files:**
+- Modify: `apps/cli/src/commands/eval/junit-writer.ts`
+- Modify: `apps/cli/test/commands/eval/output-writers.test.ts` (add tests)
+
+**Step 1: Write failing tests**
+
+Add to `apps/cli/test/commands/eval/output-writers.test.ts` in the JUnit describe block:
+
+```typescript
+  it('uses custom threshold for pass/fail when provided', async () => {
+    const filePath = path.join(testDir, `junit-threshold-${Date.now()}.xml`);
+    const writer = await JunitWriter.open(filePath, { threshold: 0.8 });
+
+    await writer.append(makeResult({ testId: 'high', score: 0.9 }));
+    await writer.append(makeResult({ testId: 'mid', score: 0.6 }));
+    await writer.close();
+
+    const xml = await readFile(filePath, 'utf8');
+    expect(xml).not.toContain('<failure message="score=0.900"');
+    expect(xml).toContain('<failure message="score=0.600"');
+  });
+
+  it('defaults to 0.5 threshold when none provided', async () => {
+    const filePath = path.join(testDir, `junit-default-${Date.now()}.xml`);
+    const writer = await JunitWriter.open(filePath);
+
+    await writer.append(makeResult({ testId: 'pass', score: 0.6 }));
+    await writer.append(makeResult({ testId: 'fail', score: 0.3 }));
+    await writer.close();
+
+    const xml = await readFile(filePath, 'utf8');
+    expect(xml).not.toContain('<failure message="score=0.600"');
+    expect(xml).toContain('<failure message="score=0.300"');
+  });
+```
+
+**Step 2: Run tests to verify they fail**
+
+Run: `bun test apps/cli/test/commands/eval/output-writers.test.ts`
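+Expected: FAIL (the custom-threshold test fails because `JunitWriter.open` does not yet honor a `threshold` option; the default-threshold test already passes)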
+
+**Step 3: Implement threshold support in JunitWriter**
+
+Modify `apps/cli/src/commands/eval/junit-writer.ts`:
+
+```typescript
+export interface JunitWriterOptions {
+  readonly threshold?: number;
+}
+
+export class JunitWriter {
+  private readonly filePath: string;
+  private readonly results: EvaluationResult[] = [];
+  private readonly threshold: number;
+  private closed = false;
+
+  private constructor(filePath: string, options?: JunitWriterOptions) {
+    this.filePath = filePath;
+    this.threshold = options?.threshold ?? 0.5;
+  }
+
+  static async open(filePath: string, options?: JunitWriterOptions): Promise<JunitWriter> {
+    await mkdir(path.dirname(filePath), { recursive: true });
+    return new JunitWriter(filePath, options);
+  }
+```
+
+Then replace all `r.score < 0.5` with `r.score < this.threshold` in the `close()` method.
+
+**Step 4: Pass threshold to JunitWriter in output-writer.ts**
+
+In `apps/cli/src/commands/eval/output-writer.ts`, where JunitWriter is created, pass the threshold. Check how output writers are created and thread the threshold through.
+
+**Step 5: Run tests to verify they pass**
+
+Run: `bun test apps/cli/test/commands/eval/output-writers.test.ts`
+Expected: PASS
+
+**Step 6: Commit**
+
+```bash
+git add apps/cli/src/commands/eval/junit-writer.ts apps/cli/src/commands/eval/output-writer.ts apps/cli/test/commands/eval/output-writers.test.ts
+git commit -m "feat(cli): JUnit writer uses --threshold for per-test pass/fail (#698)"
+```
+
+---
+
+### Task 6: Verify `threshold` in Zod schema and regenerate JSON schema
+
+**Files:**
+- Modify: `packages/core/src/evaluation/validation/eval-file.schema.ts` (already done in Task 2)
+- Run: `bun run generate:schema`
+
+**Step 1: Verify threshold is in ExecutionSchema**
+
+Read `packages/core/src/evaluation/validation/eval-file.schema.ts` and confirm `threshold` was added in Task 2.
+
+**Step 2: Regenerate JSON schema**
+
+Run: `bun run generate:schema`
+
+**Step 3: Run validate:examples to check existing YAML files still pass**
+
+Run: `bun run validate:examples`
+Expected: PASS (threshold is optional, so existing files are unaffected)
+
+**Step 4: Commit if schema file changed**
+
+```bash
+git add packages/core/
+git commit -m "chore: regenerate eval-schema.json with threshold field (#698)"
+```
+
+---
+
+### Task 7: Run full test suite and verify
+
+**Step 1: Run all tests**
+
+Run: `bun run test`
+Expected: PASS (except any pre-existing known failures)
+
+**Step 2: Run typecheck**
+
+Run: `bun run typecheck`
+Expected: PASS
+
+**Step 3: Run lint**
+
+Run: `bun run lint`
+Expected: PASS
+
+**Step 4: Run build**
+
+Run: `bun run build`
+Expected: PASS
+
+---
+
+### Task 8: Manual red/green UAT
+
+**Step 1: Red — verify no threshold behavior on main**
+
+Run an eval without --threshold:
+
+```bash
+bun apps/cli/src/cli.ts eval examples/features/rubric/evals/dataset.eval.yaml --test-id summary-1
+```
+
+Confirm: no "Suite score" line in output, exit code is 0.
+
+**Step 2: Green — verify --threshold works**
+
+Run with a threshold that should PASS:
+
+```bash
+bun apps/cli/src/cli.ts eval examples/features/rubric/evals/dataset.eval.yaml --test-id summary-1 --threshold 0.3
+```
+
+Confirm: "Suite score: X.XX (threshold: 0.30) — PASS" printed, exit code 0.
+
+Run with a threshold that should FAIL:
+
+```bash
+bun apps/cli/src/cli.ts eval examples/features/rubric/evals/dataset.eval.yaml --test-id summary-1 --threshold 0.99
+```
+
+Confirm: "Suite score: X.XX (threshold: 0.99) — FAIL" printed, exit code 1.
+
+**Step 3: Verify JUnit output uses threshold**
+
+```bash
+bun apps/cli/src/cli.ts eval examples/features/rubric/evals/dataset.eval.yaml --test-id summary-1 --threshold 0.9 -o /tmp/test-threshold.xml
+```
+
+Inspect the XML: tests with score < 0.9 should have `<failure>` elements.

From a1f283979757096f9ad9899a14d41ec00099576c Mon Sep 17 00:00:00 2001
From: Christopher Tso <christso@gmail.com>
Date: Wed, 25 Mar 2026 02:20:29 +0000
Subject: [PATCH 03/11] docs(repo): restore dropped sections in AGENTS.md

PR #757 moved content from CLAUDE.md to AGENTS.md but accidentally
dropped several sections: Evaluator Type System, Git Workflow (issue
claiming, PRs, worktrees), Version Management, Package Publishing,
and Python Scripts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 1ed266d094ed
---
 AGENTS.md | 182 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 182 insertions(+)

diff --git a/AGENTS.md b/AGENTS.md
index e0a6b1aa8..cb022cb1b 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -258,3 +258,185 @@ When making changes to functionality:
 2. **Skill files** (`plugins/agentv-dev/skills/agentv-eval-builder/`): Update the AI-focused reference card if the change affects YAML schema, evaluator types, or CLI commands. Keep concise — link to docs site for details.
 
 3. **Examples** (`examples/`): Update any example code, scripts, or eval YAML files that exercise the changed functionality. Examples are both documentation and integration tests.
+
+4. **README.md**: Keep minimal. Links point to agentv.dev.
+
+## Evaluator Type System
+
+Evaluator types use **kebab-case** everywhere (matching promptfoo convention):
+
+- **YAML config:** `type: llm-grader`, `type: is-json`, `type: execution-metrics`
+- **Internal TypeScript:** `EvaluatorKind = 'llm-grader' | 'is-json' | ...`
+- **Output `scores[].type`:** `"llm-grader"`, `"is-json"`
+- **Registry keys:** `registry.register('llm-grader', ...)`
+
+**Source of truth:** `EVALUATOR_KIND_VALUES` array in `packages/core/src/evaluation/types.ts`
+
+**Backward compatibility:** Snake_case is accepted in YAML (`llm_judge` → `llm-grader`) via `normalizeEvaluatorType()` in `evaluator-parser.ts`. Single-word types (`contains`, `equals`, `regex`, `latency`, `cost`) have no separator and are unchanged.
+
+**Two type definitions exist:**
+- `EvaluatorKind` in `packages/core/src/evaluation/types.ts` — internal, canonical
+- `AssertionType` in `packages/eval/src/assertion.ts` — SDK-facing, must stay in sync
+
+## Git Workflow
+
+### Commit Convention
+
+Follow conventional commits: `type(scope): description`
+
+Types: `feat`, `fix`, `docs`, `style`, `refactor`, `test`, `chore`
+
+### Issue Workflow
+
+When working on a GitHub issue, **ALWAYS** follow this workflow:
+
+1. **Claim the issue** — prevents other agents from duplicating work:
+   ```bash
+   # Load AGENT_ID from .env; if not set, ask the user or default to <harness>-<model>
+   # Harness = the coding tool (claude-code, opencode, codex-cli, cursor, etc.)
+   # Model = the LLM (opus, sonnet, o3, etc.)
+   # Examples: "claude-code-opus", "opencode-sonnet", "cursor-o3", "codex-cli-o3"
+   # In this local dev environment, default to "devbox2-codex" unless the user specifies another AGENT_ID.
+   # Do NOT use hostname or machine name.
+   source .env 2>/dev/null
+   if [ -z "$AGENT_ID" ]; then
+     echo "AGENT_ID is not set. Ask the user for an agent identifier, or default to devbox2-codex in this environment (otherwise use <harness>-<model>)."
+   fi
+
+   # Check if already claimed
+   gh issue view <number> --json labels --jq '.labels[].name' | grep -q "in-progress" && echo "SKIP — already claimed" && exit 1
+
+   # Claim it — label + project roadmap status
+   gh issue edit <number> --add-label "in-progress"
+
+   # Update project roadmap: set status to "In Progress" and stamp Agent ID
+   ITEM_ID=$(gh project item-list 1 --owner EntityProcess --format json | jq -r '.items[] | select(.content.number == <number> and .content.repository == "agentv") | .id')
+   if [ -n "$ITEM_ID" ]; then
+     gh project item-edit --project-id PVT_kwDOAIbbRc4BSmjF --id "$ITEM_ID" --field-id PVTSSF_lADOAIbbRc4BSmjFzhAFomw --single-select-option-id 47fc9ee4
+     gh project item-edit --project-id PVT_kwDOAIbbRc4BSmjF --id "$ITEM_ID" --field-id PVTF_lADOAIbbRc4BSmjFzhAHSnk --text "$AGENT_ID"
+   fi
+   ```
+   If the issue has the `in-progress` label, **do not work on it** — pick a different issue.
+
+2. **Create a worktree** with a feature branch:
+   ```bash
+   git worktree add agentv.worktrees/<branch-name> -b <type>/<issue-number>-<short-description>
+   cd agentv.worktrees/<branch-name>
+   bun install
+   cp "$(git worktree list --porcelain | head -1 | sed 's/worktree //')/.env" .env
+   # Example: git worktree add agentv.worktrees/feat/42-add-new-embedder -b feat/42-add-new-embedder
+   ```
+
+3. **Implement the changes** and commit following the commit convention
+
+4. **Push the branch and create a Pull Request**:
+   ```bash
+   git push -u origin <branch-name>
+   gh pr create --title "<type>(scope): description" --body "Closes #<issue-number>"
+   ```
+
+5. **Before merging**, ensure:
+   - **E2E verification completed** (see "Completing Work — E2E Checklist")
+   - CI pipeline passes (all checks green)
+   - Code has been reviewed if required
+   - No merge conflicts with `main`
+
+The `in-progress` label stays on the issue until the PR is merged and the issue is closed. Do not remove it manually.
+
+**IMPORTANT:** Never push directly to `main`. Always use branches and PRs.
+
+### Tracker Conventions
+
+- The roadmap project is the source of truth for prioritization.
+- Issues in the roadmap are prioritized; issues outside it are not.
+- `bug` marks defects.
+- Issues without `bug` are non-bug work by default.
+- `in-progress` marks an issue as claimed by an agent — do not start work on it.
+- `core`, `wui`, and `tui` are area labels.
+- Keep issue bodies focused on the handoff contract: objective, design latitude, acceptance signals, non-goals, and related links.
+- Do not put priority metadata in issue bodies.
+
+### Pull Requests
+
+**Always use squash merge** when merging PRs to main. This keeps the commit history clean with one commit per feature/fix.
+
+```bash
+# Using GitHub CLI to squash merge a PR
+gh pr merge <PR_NUMBER> --squash --delete-branch
+
+# Or with auto-merge enabled
+gh pr merge <PR_NUMBER> --squash --auto
+```
+
+Do NOT use regular merge or rebase merge, as these create noisy commit history with intermediate commits.
+
+### After Squash Merge
+
+Once a PR is squash-merged, its source branch diverges from main. **Do NOT** try to push additional commits from that branch—you will get merge conflicts.
+
+For follow-up fixes:
+```bash
+git checkout main
+git pull origin main
+git checkout -b fix/<short-description>
+# Apply fixes on the fresh branch
+```
+
+### Plans and Worktrees
+
+#### Plans
+
+Design documents and implementation plans are stored in `docs/plans/` inside the worktree (not the main repo). Save plans to the worktree so they are committed on the feature branch and visible in the draft PR.
+
+**Path warning:** When working in a worktree, use paths relative to the worktree root (e.g., `docs/plans/plan.md`). Do NOT prefix with the worktree directory from the main repo (e.g., `agentv.worktrees/feat/xxx/docs/plans/plan.md`) — this creates accidental nested directories inside the worktree.
+
+Plans are temporary working materials. **Before merging the PR**, delete the plan file and incorporate any user-relevant details into the official documentation.
+
+#### Git Worktrees
+
+Use the sibling `../agentv.worktrees/` directory for all AgentV worktrees. This overrides any generic skill or default preference for `.worktrees/` or `worktrees/` inside the repository. Do not create new AgentV worktrees inside the repository root.
+
+After creating a worktree, always run setup:
+```bash
+bun install                                    # worktrees do NOT share node_modules
+cp "$(git worktree list --porcelain | head -1 | sed 's/worktree //')/.env" .env    # required for e2e tests and LLM operations
+```
+Both steps are required before running builds, tests, or evals in the worktree.
+
+## Version Management
+
+This project uses a simple release script for version bumping. The git commit history serves as the changelog.
+
+### Releasing a new version
+
+Run the release script for a version bump:
+
+```bash
+bun run release          # patch bump (default)
+bun run release minor    # minor bump
+bun run release major    # major bump
+```
+
+The script will:
+1. Validate you're on the `main` branch with no uncommitted changes
+2. Pull latest changes from origin
+3. Bump version in all package.json files
+4. Commit the version bump
+5. Create and push a git tag
+
+Recommended publish flow:
+```bash
+bun run publish:next   # publish current version to npm `next`
+bun run promote:latest # promote same version to npm `latest`
+bun run tag:next 2.18.0
+bun run promote:latest 2.18.0
+```
+
+## Package Publishing
+- Core package (`packages/core/`) contains the core evaluation engine and grading logic and is published as `@agentv/core`
+- CLI package (`apps/cli/`) is published as `agentv` on npm
+- Uses tsup with `noExternal: ["@agentv/core"]` to bundle workspace dependencies
+- Install command: `bun install -g agentv` (preferred) or `npm install -g agentv`
+
+## Python Scripts
+When running Python scripts, always use: `uv run <script.py>`

From 430418a510b5c179ee4d4bb79d2c8c0c660a5adf Mon Sep 17 00:00:00 2001
From: Christopher Tso <christso@gmail.com>
Date: Wed, 25 Mar 2026 02:23:00 +0000
Subject: [PATCH 04/11] feat(core): add extractThreshold for
 execution.threshold YAML field (#698)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
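Reviewer note: a minimal sketch of how a caller might consume the `number | undefined` returned by `extractThreshold`. It follows the design document from patch 01: a CLI value overrides the YAML value, and the check is skipped when there are no quality results. `resolveThreshold`, `passesThreshold`, and their parameters are hypothetical names used for illustration only and are not part of this patch.

```typescript
// Hypothetical helpers for illustration; not part of this patch.

/** CLI value wins over the YAML value; undefined means no gate is configured. */
function resolveThreshold(
  cliValue: number | undefined,
  yamlValue: number | undefined,
): number | undefined {
  // `??` (not `||`) so that an explicit CLI threshold of 0 is respected.
  return cliValue ?? yamlValue;
}

/** Returns true when the gate passes or does not apply. */
function passesThreshold(
  qualityScores: readonly number[],
  threshold: number | undefined,
): boolean {
  if (threshold === undefined || qualityScores.length === 0) {
    return true; // no gate configured, or no quality results to judge
  }
  const mean = qualityScores.reduce((sum, s) => sum + s, 0) / qualityScores.length;
  return mean >= threshold;
}
```

The sketch assumes `qualityScores` already excludes execution-error tests, as the design document describes for the mean-score calculation.
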
 .../src/evaluation/loaders/config-loader.ts   | 28 ++++++++++++
 .../evaluation/loaders/config-loader.test.ts  | 43 +++++++++++++++++++
 2 files changed, 71 insertions(+)

diff --git a/packages/core/src/evaluation/loaders/config-loader.ts b/packages/core/src/evaluation/loaders/config-loader.ts
index 4835dcbd2..daa2aa7aa 100644
--- a/packages/core/src/evaluation/loaders/config-loader.ts
+++ b/packages/core/src/evaluation/loaders/config-loader.ts
@@ -333,6 +333,34 @@ export function extractFailOnError(suite: JsonObject): FailOnError | undefined {
   return undefined;
 }
 
+/**
+ * Extract `execution.threshold` from parsed eval suite.
+ * Accepts a number in the [0, 1] range.
+ * Returns undefined when unset or invalid; invalid values also log a warning.
+ */
+export function extractThreshold(suite: JsonObject): number | undefined {
+  const execution = suite.execution;
+  if (!execution || typeof execution !== 'object' || Array.isArray(execution)) {
+    return undefined;
+  }
+
+  const executionObj = execution as Record<string, unknown>;
+  const raw = executionObj.threshold;
+
+  if (raw === undefined || raw === null) {
+    return undefined;
+  }
+
+  if (typeof raw === 'number' && raw >= 0 && raw <= 1) {
+    return raw;
+  }
+
+  logWarning(
+    `Invalid execution.threshold: ${raw}. Must be a number between 0 and 1. Ignoring.`,
+  );
+  return undefined;
+}
+
 export function parseExecutionDefaults(
   raw: unknown,
   configPath: string,
diff --git a/packages/core/test/evaluation/loaders/config-loader.test.ts b/packages/core/test/evaluation/loaders/config-loader.test.ts
index 27dd52c1e..ac68e0eb9 100644
--- a/packages/core/test/evaluation/loaders/config-loader.test.ts
+++ b/packages/core/test/evaluation/loaders/config-loader.test.ts
@@ -5,6 +5,7 @@ import {
   extractTargetFromSuite,
   extractTargetsFromSuite,
   extractTargetsFromTestCase,
+  extractThreshold,
   extractTotalBudgetUsd,
   extractTrialsConfig,
   parseExecutionDefaults,
@@ -302,6 +303,48 @@ describe('extractFailOnError', () => {
   });
 });
 
+describe('extractThreshold', () => {
+  it('returns undefined when no execution block', () => {
+    const suite: JsonObject = { tests: [] };
+    expect(extractThreshold(suite)).toBeUndefined();
+  });
+
+  it('returns undefined when threshold not set', () => {
+    const suite: JsonObject = { execution: { target: 'default' } };
+    expect(extractThreshold(suite)).toBeUndefined();
+  });
+
+  it('parses valid threshold', () => {
+    const suite: JsonObject = { execution: { threshold: 0.8 } };
+    expect(extractThreshold(suite)).toBe(0.8);
+  });
+
+  it('accepts 0 as threshold', () => {
+    const suite: JsonObject = { execution: { threshold: 0 } };
+    expect(extractThreshold(suite)).toBe(0);
+  });
+
+  it('accepts 1 as threshold', () => {
+    const suite: JsonObject = { execution: { threshold: 1 } };
+    expect(extractThreshold(suite)).toBe(1);
+  });
+
+  it('returns undefined for negative threshold', () => {
+    const suite: JsonObject = { execution: { threshold: -0.1 } };
+    expect(extractThreshold(suite)).toBeUndefined();
+  });
+
+  it('returns undefined for threshold > 1', () => {
+    const suite: JsonObject = { execution: { threshold: 1.5 } };
+    expect(extractThreshold(suite)).toBeUndefined();
+  });
+
+  it('returns undefined for non-number threshold', () => {
+    const suite: JsonObject = { execution: { threshold: 'high' } };
+    expect(extractThreshold(suite)).toBeUndefined();
+  });
+});
+
 describe('parseExecutionDefaults', () => {
   it('returns undefined when no execution block', () => {
     expect(parseExecutionDefaults(undefined, '/test/config.yaml')).toBeUndefined();

From 5b6a80d8a378216bafe7332e3c7ecd05edc7a714 Mon Sep 17 00:00:00 2001
From: Christopher Tso <christso@gmail.com>
Date: Wed, 25 Mar 2026 02:27:42 +0000
Subject: [PATCH 05/11] feat(core): wire extractThreshold through YAML parser
 and schema (#698)

Add threshold to ExecutionSchema in Zod, wire extractThreshold through
yaml-parser.ts (import, re-export, EvalSuiteResult type, loadTestSuite),
and regenerate eval-schema.json.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
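Reviewer note: the conditional spread used in `loadTestSuite` checks `!== undefined` rather than truthiness, so a threshold of `0` is kept while an unset threshold leaves the key out of the result entirely. The sketch below isolates that pattern; `SuiteResultSketch` and `buildResult` are hypothetical stand-ins, not names from this patch.

```typescript
// Hypothetical stand-in for the object assembled in loadTestSuite.
type SuiteResultSketch = {
  readonly tests: readonly string[];
  readonly threshold?: number;
};

function buildResult(
  tests: readonly string[],
  threshold: number | undefined,
): SuiteResultSketch {
  return {
    tests,
    // Spreading `false` adds no keys, so `threshold` is omitted when undefined.
    ...(threshold !== undefined && { threshold }),
  };
}
```

A truthiness check (`threshold && { threshold }`) would silently drop a threshold of `0`, which `extractThreshold` accepts as valid.
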
 .../evaluation/validation/eval-file.schema.ts |    1 +
 packages/core/src/evaluation/yaml-parser.ts   |    6 +
 .../references/eval-schema.json               | 3545 +++++++++++++----
 3 files changed, 2881 insertions(+), 671 deletions(-)

diff --git a/packages/core/src/evaluation/validation/eval-file.schema.ts b/packages/core/src/evaluation/validation/eval-file.schema.ts
index 9ebf758a9..084eb3466 100644
--- a/packages/core/src/evaluation/validation/eval-file.schema.ts
+++ b/packages/core/src/evaluation/validation/eval-file.schema.ts
@@ -328,6 +328,7 @@ const ExecutionSchema = z.object({
   totalBudgetUsd: z.number().min(0).optional(),
   fail_on_error: FailOnErrorSchema.optional(),
   failOnError: FailOnErrorSchema.optional(),
+  threshold: z.number().min(0).max(1).optional(),
 });
 
 // ---------------------------------------------------------------------------
diff --git a/packages/core/src/evaluation/yaml-parser.ts b/packages/core/src/evaluation/yaml-parser.ts
index 119cadb47..e8004bbc9 100644
--- a/packages/core/src/evaluation/yaml-parser.ts
+++ b/packages/core/src/evaluation/yaml-parser.ts
@@ -13,6 +13,7 @@ import {
   extractTargetFromSuite,
   extractTargetsFromSuite,
   extractTargetsFromTestCase,
+  extractThreshold,
   extractTotalBudgetUsd,
   extractTrialsConfig,
   extractWorkersFromSuite,
@@ -59,6 +60,7 @@ export {
   extractTargetFromSuite,
   extractTargetsFromSuite,
   extractTargetsFromTestCase,
+  extractThreshold,
   extractTrialsConfig,
   extractWorkersFromSuite,
   loadConfig,
@@ -180,6 +182,8 @@ export type EvalSuiteResult = {
   readonly totalBudgetUsd?: number;
   /** Execution error tolerance: true or false */
   readonly failOnError?: import('./types.js').FailOnError;
+  /** Suite-level quality threshold (0-1); the suite fails if the mean score falls below it */
+  readonly threshold?: number;
 };
 
 /**
@@ -201,6 +205,7 @@ export async function loadTestSuite(
   const { tests, parsed } = await loadTestsFromYaml(evalFilePath, repoRoot, options);
   const metadata = parseMetadata(parsed);
   const failOnError = extractFailOnError(parsed);
+  const threshold = extractThreshold(parsed);
   return {
     tests,
     trials: extractTrialsConfig(parsed),
@@ -210,6 +215,7 @@ export async function loadTestSuite(
     totalBudgetUsd: extractTotalBudgetUsd(parsed),
     ...(metadata !== undefined && { metadata }),
     ...(failOnError !== undefined && { failOnError }),
+    ...(threshold !== undefined && { threshold }),
   };
 }
 
diff --git a/plugins/agentv-dev/skills/agentv-eval-writer/references/eval-schema.json b/plugins/agentv-dev/skills/agentv-eval-writer/references/eval-schema.json
index a7be8362b..4df59a334 100644
--- a/plugins/agentv-dev/skills/agentv-eval-writer/references/eval-schema.json
+++ b/plugins/agentv-dev/skills/agentv-eval-writer/references/eval-schema.json
@@ -53,7 +53,12 @@
                 "properties": {
                   "role": {
                     "type": "string",
-                    "enum": ["system", "user", "assistant", "tool"]
+                    "enum": [
+                      "system",
+                      "user",
+                      "assistant",
+                      "tool"
+                    ]
                   },
                   "content": {
                     "anyOf": [
@@ -67,20 +72,29 @@
                           "properties": {
                             "type": {
                               "type": "string",
-                              "enum": ["text", "file"]
+                              "enum": [
+                                "text",
+                                "file"
+                              ]
                             },
                             "value": {
                               "type": "string"
                             }
                           },
-                          "required": ["type", "value"],
+                          "required": [
+                            "type",
+                            "value"
+                          ],
                           "additionalProperties": false
                         }
                       }
                     ]
                   }
                 },
-                "required": ["role", "content"],
+                "required": [
+                  "role",
+                  "content"
+                ],
                 "additionalProperties": false
               }
             }
@@ -121,7 +135,12 @@
                           "properties": {
                             "role": {
                               "type": "string",
-                              "enum": ["system", "user", "assistant", "tool"]
+                              "enum": [
+                                "system",
+                                "user",
+                                "assistant",
+                                "tool"
+                              ]
                             },
                             "content": {
                               "anyOf": [
@@ -135,20 +154,29 @@
                                     "properties": {
                                       "type": {
                                         "type": "string",
-                                        "enum": ["text", "file"]
+                                        "enum": [
+                                          "text",
+                                          "file"
+                                        ]
                                       },
                                       "value": {
                                         "type": "string"
                                       }
                                     },
-                                    "required": ["type", "value"],
+                                    "required": [
+                                      "type",
+                                      "value"
+                                    ],
                                     "additionalProperties": false
                                   }
                                 }
                               ]
                             }
                           },
-                          "required": ["role", "content"],
+                          "required": [
+                            "role",
+                            "content"
+                          ],
                           "additionalProperties": false
                         }
                       }
@@ -176,7 +204,12 @@
                           "properties": {
                             "role": {
                               "type": "string",
-                              "enum": ["system", "user", "assistant", "tool"]
+                              "enum": [
+                                "system",
+                                "user",
+                                "assistant",
+                                "tool"
+                              ]
                             },
                             "content": {
                               "anyOf": [
@@ -190,20 +223,29 @@
                                     "properties": {
                                       "type": {
                                         "type": "string",
-                                        "enum": ["text", "file"]
+                                        "enum": [
+                                          "text",
+                                          "file"
+                                        ]
                                       },
                                       "value": {
                                         "type": "string"
                                       }
                                     },
-                                    "required": ["type", "value"],
+                                    "required": [
+                                      "type",
+                                      "value"
+                                    ],
                                     "additionalProperties": false
                                   }
                                 }
                               ]
                             }
                           },
-                          "required": ["role", "content"],
+                          "required": [
+                            "role",
+                            "content"
+                          ],
                           "additionalProperties": false
                         }
                       }
@@ -240,7 +282,12 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
+                              "enum": [
+                                "code-grader",
+                                "code_grader",
+                                "code-judge",
+                                "code_judge"
+                              ]
                             },
                             "command": {
                               "anyOf": [
@@ -292,7 +339,10 @@
                               "additionalProperties": {}
                             }
                           },
-                          "required": ["type", "command"],
+                          "required": [
+                            "type",
+                            "command"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -322,7 +372,12 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
+                              "enum": [
+                                "llm-grader",
+                                "llm_grader",
+                                "llm-judge",
+                                "llm_judge"
+                              ]
                             },
                             "prompt": {
                               "anyOf": [
@@ -416,7 +471,10 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": ["score_range", "outcome"],
+                                      "required": [
+                                        "score_range",
+                                        "outcome"
+                                      ],
                                       "additionalProperties": false
                                     }
                                   }
@@ -445,7 +503,9 @@
                               "maximum": 2
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -505,7 +565,9 @@
                                       }
                                     }
                                   },
-                                  "required": ["type"],
+                                  "required": [
+                                    "type"
+                                  ],
                                   "additionalProperties": false
                                 },
                                 {
@@ -521,7 +583,10 @@
                                       "maximum": 1
                                     }
                                   },
-                                  "required": ["type", "threshold"],
+                                  "required": [
+                                    "type",
+                                    "threshold"
+                                  ],
                                   "additionalProperties": false
                                 },
                                 {
@@ -538,7 +603,10 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": ["type", "path"],
+                                  "required": [
+                                    "type",
+                                    "path"
+                                  ],
                                   "additionalProperties": false
                                 },
                                 {
@@ -555,13 +623,18 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": ["type"],
+                                  "required": [
+                                    "type"
+                                  ],
                                   "additionalProperties": false
                                 }
                               ]
                             }
                           },
-                          "required": ["type", "aggregator"],
+                          "required": [
+                            "type",
+                            "aggregator"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -591,11 +664,20 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["tool-trajectory", "tool_trajectory"]
+                              "enum": [
+                                "tool-trajectory",
+                                "tool_trajectory"
+                              ]
                             },
                             "mode": {
                               "type": "string",
-                              "enum": ["any_order", "in_order", "exact", "subset", "superset"]
+                              "enum": [
+                                "any_order",
+                                "in_order",
+                                "exact",
+                                "subset",
+                                "superset"
+                              ]
                             },
                             "minimums": {
                               "type": "object",
@@ -636,7 +718,12 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": ["exact", "ignore", "subset", "superset"]
+                                        "enum": [
+                                          "exact",
+                                          "ignore",
+                                          "subset",
+                                          "superset"
+                                        ]
                                       },
                                       {
                                         "type": "array",
@@ -650,7 +737,12 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": ["exact", "ignore", "subset", "superset"]
+                                        "enum": [
+                                          "exact",
+                                          "ignore",
+                                          "subset",
+                                          "superset"
+                                        ]
                                       },
                                       {
                                         "type": "array",
@@ -661,7 +753,9 @@
                                     ]
                                   }
                                 },
-                                "required": ["tool"],
+                                "required": [
+                                  "tool"
+                                ],
                                 "additionalProperties": false
                               }
                             },
@@ -669,7 +763,12 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": ["exact", "ignore", "subset", "superset"]
+                                  "enum": [
+                                    "exact",
+                                    "ignore",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 {
                                   "type": "array",
@@ -683,7 +782,12 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": ["exact", "ignore", "subset", "superset"]
+                                  "enum": [
+                                    "exact",
+                                    "ignore",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 {
                                   "type": "array",
@@ -694,7 +798,10 @@
                               ]
                             }
                           },
-                          "required": ["type", "mode"],
+                          "required": [
+                            "type",
+                            "mode"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -724,7 +831,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["field-accuracy", "field_accuracy"]
+                              "enum": [
+                                "field-accuracy",
+                                "field_accuracy"
+                              ]
                             },
                             "fields": {
                               "type": "array",
@@ -736,7 +846,11 @@
                                   },
                                   "match": {
                                     "type": "string",
-                                    "enum": ["exact", "numeric_tolerance", "date"]
+                                    "enum": [
+                                      "exact",
+                                      "numeric_tolerance",
+                                      "date"
+                                    ]
                                   },
                                   "required": {
                                     "type": "boolean"
@@ -758,17 +872,26 @@
                                     }
                                   }
                                 },
-                                "required": ["path", "match"],
+                                "required": [
+                                  "path",
+                                  "match"
+                                ],
                                 "additionalProperties": false
                               },
                               "minItems": 1
                             },
                             "aggregation": {
                               "type": "string",
-                              "enum": ["weighted_average", "all_or_nothing"]
+                              "enum": [
+                                "weighted_average",
+                                "all_or_nothing"
+                              ]
                             }
                           },
-                          "required": ["type", "fields"],
+                          "required": [
+                            "type",
+                            "fields"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -805,7 +928,10 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type", "threshold"],
+                          "required": [
+                            "type",
+                            "threshold"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -842,7 +968,10 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type", "budget"],
+                          "required": [
+                            "type",
+                            "budget"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -872,7 +1001,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["token-usage", "token_usage"]
+                              "enum": [
+                                "token-usage",
+                                "token_usage"
+                              ]
                             },
                             "max_total": {
                               "type": "number",
@@ -887,7 +1019,9 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -917,7 +1051,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["execution-metrics", "execution_metrics"]
+                              "enum": [
+                                "execution-metrics",
+                                "execution_metrics"
+                              ]
                             },
                             "max_tool_calls": {
                               "type": "number",
@@ -949,7 +1086,9 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -985,7 +1124,10 @@
                               "type": "string"
                             }
                           },
-                          "required": ["type", "value"],
+                          "required": [
+                            "type",
+                            "value"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -1021,7 +1163,10 @@
                               "type": "string"
                             }
                           },
-                          "required": ["type", "value"],
+                          "required": [
+                            "type",
+                            "value"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -1051,10 +1196,15 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["is-json", "is_json"]
+                              "enum": [
+                                "is-json",
+                                "is_json"
+                              ]
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -1090,7 +1240,10 @@
                               "type": "string"
                             }
                           },
-                          "required": ["type", "value"],
+                          "required": [
+                            "type",
+                            "value"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -1171,7 +1324,10 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": ["score_range", "outcome"],
+                                      "required": [
+                                        "score_range",
+                                        "outcome"
+                                      ],
                                       "additionalProperties": false
                                     }
                                   }
@@ -1181,7 +1337,10 @@
                               "minItems": 1
                             }
                           },
-                          "required": ["type", "criteria"],
+                          "required": [
+                            "type",
+                            "criteria"
+                          ],
                           "additionalProperties": false
                         }
                       ]
@@ -1218,7 +1377,12 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
+                              "enum": [
+                                "code-grader",
+                                "code_grader",
+                                "code-judge",
+                                "code_judge"
+                              ]
                             },
                             "command": {
                               "anyOf": [
@@ -1270,7 +1434,10 @@
                               "additionalProperties": {}
                             }
                           },
-                          "required": ["type", "command"],
+                          "required": [
+                            "type",
+                            "command"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -1300,7 +1467,12 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
+                              "enum": [
+                                "llm-grader",
+                                "llm_grader",
+                                "llm-judge",
+                                "llm_judge"
+                              ]
                             },
                             "prompt": {
                               "anyOf": [
@@ -1394,7 +1566,10 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": ["score_range", "outcome"],
+                                      "required": [
+                                        "score_range",
+                                        "outcome"
+                                      ],
                                       "additionalProperties": false
                                     }
                                   }
@@ -1423,7 +1598,9 @@
                               "maximum": 2
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -1483,7 +1660,9 @@
                                       }
                                     }
                                   },
-                                  "required": ["type"],
+                                  "required": [
+                                    "type"
+                                  ],
                                   "additionalProperties": false
                                 },
                                 {
@@ -1499,7 +1678,10 @@
                                       "maximum": 1
                                     }
                                   },
-                                  "required": ["type", "threshold"],
+                                  "required": [
+                                    "type",
+                                    "threshold"
+                                  ],
                                   "additionalProperties": false
                                 },
                                 {
@@ -1516,7 +1698,10 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": ["type", "path"],
+                                  "required": [
+                                    "type",
+                                    "path"
+                                  ],
                                   "additionalProperties": false
                                 },
                                 {
@@ -1533,13 +1718,18 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": ["type"],
+                                  "required": [
+                                    "type"
+                                  ],
                                   "additionalProperties": false
                                 }
                               ]
                             }
                           },
-                          "required": ["type", "aggregator"],
+                          "required": [
+                            "type",
+                            "aggregator"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -1569,11 +1759,20 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["tool-trajectory", "tool_trajectory"]
+                              "enum": [
+                                "tool-trajectory",
+                                "tool_trajectory"
+                              ]
                             },
                             "mode": {
                               "type": "string",
-                              "enum": ["any_order", "in_order", "exact", "subset", "superset"]
+                              "enum": [
+                                "any_order",
+                                "in_order",
+                                "exact",
+                                "subset",
+                                "superset"
+                              ]
                             },
                             "minimums": {
                               "type": "object",
@@ -1614,7 +1813,12 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": ["exact", "ignore", "subset", "superset"]
+                                        "enum": [
+                                          "exact",
+                                          "ignore",
+                                          "subset",
+                                          "superset"
+                                        ]
                                       },
                                       {
                                         "type": "array",
@@ -1628,7 +1832,12 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": ["exact", "ignore", "subset", "superset"]
+                                        "enum": [
+                                          "exact",
+                                          "ignore",
+                                          "subset",
+                                          "superset"
+                                        ]
                                       },
                                       {
                                         "type": "array",
@@ -1639,7 +1848,9 @@
                                     ]
                                   }
                                 },
-                                "required": ["tool"],
+                                "required": [
+                                  "tool"
+                                ],
                                 "additionalProperties": false
                               }
                             },
@@ -1647,7 +1858,12 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": ["exact", "ignore", "subset", "superset"]
+                                  "enum": [
+                                    "exact",
+                                    "ignore",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 {
                                   "type": "array",
@@ -1661,7 +1877,12 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": ["exact", "ignore", "subset", "superset"]
+                                  "enum": [
+                                    "exact",
+                                    "ignore",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 {
                                   "type": "array",
@@ -1672,7 +1893,10 @@
                               ]
                             }
                           },
-                          "required": ["type", "mode"],
+                          "required": [
+                            "type",
+                            "mode"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -1702,7 +1926,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["field-accuracy", "field_accuracy"]
+                              "enum": [
+                                "field-accuracy",
+                                "field_accuracy"
+                              ]
                             },
                             "fields": {
                               "type": "array",
@@ -1714,7 +1941,11 @@
                                   },
                                   "match": {
                                     "type": "string",
-                                    "enum": ["exact", "numeric_tolerance", "date"]
+                                    "enum": [
+                                      "exact",
+                                      "numeric_tolerance",
+                                      "date"
+                                    ]
                                   },
                                   "required": {
                                     "type": "boolean"
@@ -1736,17 +1967,26 @@
                                     }
                                   }
                                 },
-                                "required": ["path", "match"],
+                                "required": [
+                                  "path",
+                                  "match"
+                                ],
                                 "additionalProperties": false
                               },
                               "minItems": 1
                             },
                             "aggregation": {
                               "type": "string",
-                              "enum": ["weighted_average", "all_or_nothing"]
+                              "enum": [
+                                "weighted_average",
+                                "all_or_nothing"
+                              ]
                             }
                           },
-                          "required": ["type", "fields"],
+                          "required": [
+                            "type",
+                            "fields"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -1783,7 +2023,10 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type", "threshold"],
+                          "required": [
+                            "type",
+                            "threshold"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -1820,7 +2063,10 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type", "budget"],
+                          "required": [
+                            "type",
+                            "budget"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -1850,7 +2096,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["token-usage", "token_usage"]
+                              "enum": [
+                                "token-usage",
+                                "token_usage"
+                              ]
                             },
                             "max_total": {
                               "type": "number",
@@ -1865,7 +2114,9 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -1895,7 +2146,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["execution-metrics", "execution_metrics"]
+                              "enum": [
+                                "execution-metrics",
+                                "execution_metrics"
+                              ]
                             },
                             "max_tool_calls": {
                               "type": "number",
@@ -1927,7 +2181,9 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -1963,7 +2219,10 @@
                               "type": "string"
                             }
                           },
-                          "required": ["type", "value"],
+                          "required": [
+                            "type",
+                            "value"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -1999,7 +2258,10 @@
                               "type": "string"
                             }
                           },
-                          "required": ["type", "value"],
+                          "required": [
+                            "type",
+                            "value"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -2029,10 +2291,15 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["is-json", "is_json"]
+                              "enum": [
+                                "is-json",
+                                "is_json"
+                              ]
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -2068,7 +2335,10 @@
                               "type": "string"
                             }
                           },
-                          "required": ["type", "value"],
+                          "required": [
+                            "type",
+                            "value"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -2149,7 +2419,10 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": ["score_range", "outcome"],
+                                      "required": [
+                                        "score_range",
+                                        "outcome"
+                                      ],
                                       "additionalProperties": false
                                     }
                                   }
@@ -2159,7 +2432,10 @@
                               "minItems": 1
                             }
                           },
-                          "required": ["type", "criteria"],
+                          "required": [
+                            "type",
+                            "criteria"
+                          ],
                           "additionalProperties": false
                         }
                       ]
@@ -2196,7 +2472,12 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
+                              "enum": [
+                                "code-grader",
+                                "code_grader",
+                                "code-judge",
+                                "code_judge"
+                              ]
                             },
                             "command": {
                               "anyOf": [
@@ -2248,7 +2529,10 @@
                               "additionalProperties": {}
                             }
                           },
-                          "required": ["type", "command"],
+                          "required": [
+                            "type",
+                            "command"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -2278,7 +2562,12 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
+                              "enum": [
+                                "llm-grader",
+                                "llm_grader",
+                                "llm-judge",
+                                "llm_judge"
+                              ]
                             },
                             "prompt": {
                               "anyOf": [
@@ -2372,7 +2661,10 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": ["score_range", "outcome"],
+                                      "required": [
+                                        "score_range",
+                                        "outcome"
+                                      ],
                                       "additionalProperties": false
                                     }
                                   }
@@ -2401,7 +2693,9 @@
                               "maximum": 2
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -2461,7 +2755,9 @@
                                       }
                                     }
                                   },
-                                  "required": ["type"],
+                                  "required": [
+                                    "type"
+                                  ],
                                   "additionalProperties": false
                                 },
                                 {
@@ -2477,7 +2773,10 @@
                                       "maximum": 1
                                     }
                                   },
-                                  "required": ["type", "threshold"],
+                                  "required": [
+                                    "type",
+                                    "threshold"
+                                  ],
                                   "additionalProperties": false
                                 },
                                 {
@@ -2494,7 +2793,10 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": ["type", "path"],
+                                  "required": [
+                                    "type",
+                                    "path"
+                                  ],
                                   "additionalProperties": false
                                 },
                                 {
@@ -2511,13 +2813,18 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": ["type"],
+                                  "required": [
+                                    "type"
+                                  ],
                                   "additionalProperties": false
                                 }
                               ]
                             }
                           },
-                          "required": ["type", "aggregator"],
+                          "required": [
+                            "type",
+                            "aggregator"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -2547,11 +2854,20 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["tool-trajectory", "tool_trajectory"]
+                              "enum": [
+                                "tool-trajectory",
+                                "tool_trajectory"
+                              ]
                             },
                             "mode": {
                               "type": "string",
-                              "enum": ["any_order", "in_order", "exact", "subset", "superset"]
+                              "enum": [
+                                "any_order",
+                                "in_order",
+                                "exact",
+                                "subset",
+                                "superset"
+                              ]
                             },
                             "minimums": {
                               "type": "object",
@@ -2592,7 +2908,12 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": ["exact", "ignore", "subset", "superset"]
+                                        "enum": [
+                                          "exact",
+                                          "ignore",
+                                          "subset",
+                                          "superset"
+                                        ]
                                       },
                                       {
                                         "type": "array",
@@ -2606,7 +2927,12 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": ["exact", "ignore", "subset", "superset"]
+                                        "enum": [
+                                          "exact",
+                                          "ignore",
+                                          "subset",
+                                          "superset"
+                                        ]
                                       },
                                       {
                                         "type": "array",
@@ -2617,7 +2943,9 @@
                                     ]
                                   }
                                 },
-                                "required": ["tool"],
+                                "required": [
+                                  "tool"
+                                ],
                                 "additionalProperties": false
                               }
                             },
@@ -2625,7 +2953,12 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": ["exact", "ignore", "subset", "superset"]
+                                  "enum": [
+                                    "exact",
+                                    "ignore",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 {
                                   "type": "array",
@@ -2639,7 +2972,12 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": ["exact", "ignore", "subset", "superset"]
+                                  "enum": [
+                                    "exact",
+                                    "ignore",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 {
                                   "type": "array",
@@ -2650,7 +2988,10 @@
                               ]
                             }
                           },
-                          "required": ["type", "mode"],
+                          "required": [
+                            "type",
+                            "mode"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -2680,7 +3021,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["field-accuracy", "field_accuracy"]
+                              "enum": [
+                                "field-accuracy",
+                                "field_accuracy"
+                              ]
                             },
                             "fields": {
                               "type": "array",
@@ -2692,7 +3036,11 @@
                                   },
                                   "match": {
                                     "type": "string",
-                                    "enum": ["exact", "numeric_tolerance", "date"]
+                                    "enum": [
+                                      "exact",
+                                      "numeric_tolerance",
+                                      "date"
+                                    ]
                                   },
                                   "required": {
                                     "type": "boolean"
@@ -2714,17 +3062,26 @@
                                     }
                                   }
                                 },
-                                "required": ["path", "match"],
+                                "required": [
+                                  "path",
+                                  "match"
+                                ],
                                 "additionalProperties": false
                               },
                               "minItems": 1
                             },
                             "aggregation": {
                               "type": "string",
-                              "enum": ["weighted_average", "all_or_nothing"]
+                              "enum": [
+                                "weighted_average",
+                                "all_or_nothing"
+                              ]
                             }
                           },
-                          "required": ["type", "fields"],
+                          "required": [
+                            "type",
+                            "fields"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -2761,7 +3118,10 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type", "threshold"],
+                          "required": [
+                            "type",
+                            "threshold"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -2798,7 +3158,10 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type", "budget"],
+                          "required": [
+                            "type",
+                            "budget"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -2828,7 +3191,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["token-usage", "token_usage"]
+                              "enum": [
+                                "token-usage",
+                                "token_usage"
+                              ]
                             },
                             "max_total": {
                               "type": "number",
@@ -2843,7 +3209,9 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -2873,7 +3241,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["execution-metrics", "execution_metrics"]
+                              "enum": [
+                                "execution-metrics",
+                                "execution_metrics"
+                              ]
                             },
                             "max_tool_calls": {
                               "type": "number",
@@ -2905,7 +3276,9 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -2941,7 +3314,10 @@
                               "type": "string"
                             }
                           },
-                          "required": ["type", "value"],
+                          "required": [
+                            "type",
+                            "value"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -2977,7 +3353,10 @@
                               "type": "string"
                             }
                           },
-                          "required": ["type", "value"],
+                          "required": [
+                            "type",
+                            "value"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -3007,10 +3386,15 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["is-json", "is_json"]
+                              "enum": [
+                                "is-json",
+                                "is_json"
+                              ]
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -3046,7 +3430,10 @@
                               "type": "string"
                             }
                           },
-                          "required": ["type", "value"],
+                          "required": [
+                            "type",
+                            "value"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -3127,7 +3514,10 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": ["score_range", "outcome"],
+                                      "required": [
+                                        "score_range",
+                                        "outcome"
+                                      ],
                                       "additionalProperties": false
                                     }
                                   }
@@ -3137,7 +3527,10 @@
                               "minItems": 1
                             }
                           },
-                          "required": ["type", "criteria"],
+                          "required": [
+                            "type",
+                            "criteria"
+                          ],
                           "additionalProperties": false
                         }
                       ]
@@ -3191,7 +3584,12 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
+                                  "enum": [
+                                    "code-grader",
+                                    "code_grader",
+                                    "code-judge",
+                                    "code_judge"
+                                  ]
                                 },
                                 "command": {
                                   "anyOf": [
@@ -3243,7 +3641,10 @@
                                   "additionalProperties": {}
                                 }
                               },
-                              "required": ["type", "command"],
+                              "required": [
+                                "type",
+                                "command"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -3273,7 +3674,12 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
+                                  "enum": [
+                                    "llm-grader",
+                                    "llm_grader",
+                                    "llm-judge",
+                                    "llm_judge"
+                                  ]
                                 },
                                 "prompt": {
                                   "anyOf": [
@@ -3367,7 +3773,10 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": ["score_range", "outcome"],
+                                          "required": [
+                                            "score_range",
+                                            "outcome"
+                                          ],
                                           "additionalProperties": false
                                         }
                                       }
@@ -3396,7 +3805,9 @@
                                   "maximum": 2
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -3456,7 +3867,9 @@
                                           }
                                         }
                                       },
-                                      "required": ["type"],
+                                      "required": [
+                                        "type"
+                                      ],
                                       "additionalProperties": false
                                     },
                                     {
@@ -3472,7 +3885,10 @@
                                           "maximum": 1
                                         }
                                       },
-                                      "required": ["type", "threshold"],
+                                      "required": [
+                                        "type",
+                                        "threshold"
+                                      ],
                                       "additionalProperties": false
                                     },
                                     {
@@ -3489,7 +3905,10 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": ["type", "path"],
+                                      "required": [
+                                        "type",
+                                        "path"
+                                      ],
                                       "additionalProperties": false
                                     },
                                     {
@@ -3506,13 +3925,18 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": ["type"],
+                                      "required": [
+                                        "type"
+                                      ],
                                       "additionalProperties": false
                                     }
                                   ]
                                 }
                               },
-                              "required": ["type", "aggregator"],
+                              "required": [
+                                "type",
+                                "aggregator"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -3542,11 +3966,20 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["tool-trajectory", "tool_trajectory"]
+                                  "enum": [
+                                    "tool-trajectory",
+                                    "tool_trajectory"
+                                  ]
                                 },
                                 "mode": {
                                   "type": "string",
-                                  "enum": ["any_order", "in_order", "exact", "subset", "superset"]
+                                  "enum": [
+                                    "any_order",
+                                    "in_order",
+                                    "exact",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 "minimums": {
                                   "type": "object",
@@ -3587,7 +4020,12 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": ["exact", "ignore", "subset", "superset"]
+                                            "enum": [
+                                              "exact",
+                                              "ignore",
+                                              "subset",
+                                              "superset"
+                                            ]
                                           },
                                           {
                                             "type": "array",
@@ -3601,7 +4039,12 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": ["exact", "ignore", "subset", "superset"]
+                                            "enum": [
+                                              "exact",
+                                              "ignore",
+                                              "subset",
+                                              "superset"
+                                            ]
                                           },
                                           {
                                             "type": "array",
@@ -3612,7 +4055,9 @@
                                         ]
                                       }
                                     },
-                                    "required": ["tool"],
+                                    "required": [
+                                      "tool"
+                                    ],
                                     "additionalProperties": false
                                   }
                                 },
@@ -3620,7 +4065,12 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": ["exact", "ignore", "subset", "superset"]
+                                      "enum": [
+                                        "exact",
+                                        "ignore",
+                                        "subset",
+                                        "superset"
+                                      ]
                                     },
                                     {
                                       "type": "array",
@@ -3634,7 +4084,12 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": ["exact", "ignore", "subset", "superset"]
+                                      "enum": [
+                                        "exact",
+                                        "ignore",
+                                        "subset",
+                                        "superset"
+                                      ]
                                     },
                                     {
                                       "type": "array",
@@ -3645,7 +4100,10 @@
                                   ]
                                 }
                               },
-                              "required": ["type", "mode"],
+                              "required": [
+                                "type",
+                                "mode"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -3675,7 +4133,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["field-accuracy", "field_accuracy"]
+                                  "enum": [
+                                    "field-accuracy",
+                                    "field_accuracy"
+                                  ]
                                 },
                                 "fields": {
                                   "type": "array",
@@ -3687,7 +4148,11 @@
                                       },
                                       "match": {
                                         "type": "string",
-                                        "enum": ["exact", "numeric_tolerance", "date"]
+                                        "enum": [
+                                          "exact",
+                                          "numeric_tolerance",
+                                          "date"
+                                        ]
                                       },
                                       "required": {
                                         "type": "boolean"
@@ -3709,17 +4174,26 @@
                                         }
                                       }
                                     },
-                                    "required": ["path", "match"],
+                                    "required": [
+                                      "path",
+                                      "match"
+                                    ],
                                     "additionalProperties": false
                                   },
                                   "minItems": 1
                                 },
                                 "aggregation": {
                                   "type": "string",
-                                  "enum": ["weighted_average", "all_or_nothing"]
+                                  "enum": [
+                                    "weighted_average",
+                                    "all_or_nothing"
+                                  ]
                                 }
                               },
-                              "required": ["type", "fields"],
+                              "required": [
+                                "type",
+                                "fields"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -3756,7 +4230,10 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type", "threshold"],
+                              "required": [
+                                "type",
+                                "threshold"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -3793,7 +4270,10 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type", "budget"],
+                              "required": [
+                                "type",
+                                "budget"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -3823,7 +4303,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["token-usage", "token_usage"]
+                                  "enum": [
+                                    "token-usage",
+                                    "token_usage"
+                                  ]
                                 },
                                 "max_total": {
                                   "type": "number",
@@ -3838,7 +4321,9 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -3868,7 +4353,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["execution-metrics", "execution_metrics"]
+                                  "enum": [
+                                    "execution-metrics",
+                                    "execution_metrics"
+                                  ]
                                 },
                                 "max_tool_calls": {
                                   "type": "number",
@@ -3900,7 +4388,9 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -3936,7 +4426,10 @@
                                   "type": "string"
                                 }
                               },
-                              "required": ["type", "value"],
+                              "required": [
+                                "type",
+                                "value"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -3972,7 +4465,10 @@
                                   "type": "string"
                                 }
                               },
-                              "required": ["type", "value"],
+                              "required": [
+                                "type",
+                                "value"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -4002,10 +4498,15 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["is-json", "is_json"]
+                                  "enum": [
+                                    "is-json",
+                                    "is_json"
+                                  ]
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -4041,7 +4542,10 @@
                                   "type": "string"
                                 }
                               },
-                              "required": ["type", "value"],
+                              "required": [
+                                "type",
+                                "value"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -4122,7 +4626,10 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": ["score_range", "outcome"],
+                                          "required": [
+                                            "score_range",
+                                            "outcome"
+                                          ],
                                           "additionalProperties": false
                                         }
                                       }
@@ -4132,7 +4639,10 @@
                                   "minItems": 1
                                 }
                               },
-                              "required": ["type", "criteria"],
+                              "required": [
+                                "type",
+                                "criteria"
+                              ],
                               "additionalProperties": false
                             }
                           ]
@@ -4169,7 +4679,12 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
+                                  "enum": [
+                                    "code-grader",
+                                    "code_grader",
+                                    "code-judge",
+                                    "code_judge"
+                                  ]
                                 },
                                 "command": {
                                   "anyOf": [
@@ -4221,7 +4736,10 @@
                                   "additionalProperties": {}
                                 }
                               },
-                              "required": ["type", "command"],
+                              "required": [
+                                "type",
+                                "command"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -4251,7 +4769,12 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
+                                  "enum": [
+                                    "llm-grader",
+                                    "llm_grader",
+                                    "llm-judge",
+                                    "llm_judge"
+                                  ]
                                 },
                                 "prompt": {
                                   "anyOf": [
@@ -4345,7 +4868,10 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": ["score_range", "outcome"],
+                                          "required": [
+                                            "score_range",
+                                            "outcome"
+                                          ],
                                           "additionalProperties": false
                                         }
                                       }
@@ -4374,7 +4900,9 @@
                                   "maximum": 2
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -4434,7 +4962,9 @@
                                           }
                                         }
                                       },
-                                      "required": ["type"],
+                                      "required": [
+                                        "type"
+                                      ],
                                       "additionalProperties": false
                                     },
                                     {
@@ -4450,7 +4980,10 @@
                                           "maximum": 1
                                         }
                                       },
-                                      "required": ["type", "threshold"],
+                                      "required": [
+                                        "type",
+                                        "threshold"
+                                      ],
                                       "additionalProperties": false
                                     },
                                     {
@@ -4467,7 +5000,10 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": ["type", "path"],
+                                      "required": [
+                                        "type",
+                                        "path"
+                                      ],
                                       "additionalProperties": false
                                     },
                                     {
@@ -4484,13 +5020,18 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": ["type"],
+                                      "required": [
+                                        "type"
+                                      ],
                                       "additionalProperties": false
                                     }
                                   ]
                                 }
                               },
-                              "required": ["type", "aggregator"],
+                              "required": [
+                                "type",
+                                "aggregator"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -4520,11 +5061,20 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["tool-trajectory", "tool_trajectory"]
+                                  "enum": [
+                                    "tool-trajectory",
+                                    "tool_trajectory"
+                                  ]
                                 },
                                 "mode": {
                                   "type": "string",
-                                  "enum": ["any_order", "in_order", "exact", "subset", "superset"]
+                                  "enum": [
+                                    "any_order",
+                                    "in_order",
+                                    "exact",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 "minimums": {
                                   "type": "object",
@@ -4565,7 +5115,12 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": ["exact", "ignore", "subset", "superset"]
+                                            "enum": [
+                                              "exact",
+                                              "ignore",
+                                              "subset",
+                                              "superset"
+                                            ]
                                           },
                                           {
                                             "type": "array",
@@ -4579,7 +5134,12 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": ["exact", "ignore", "subset", "superset"]
+                                            "enum": [
+                                              "exact",
+                                              "ignore",
+                                              "subset",
+                                              "superset"
+                                            ]
                                           },
                                           {
                                             "type": "array",
@@ -4590,7 +5150,9 @@
                                         ]
                                       }
                                     },
-                                    "required": ["tool"],
+                                    "required": [
+                                      "tool"
+                                    ],
                                     "additionalProperties": false
                                   }
                                 },
@@ -4598,7 +5160,12 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": ["exact", "ignore", "subset", "superset"]
+                                      "enum": [
+                                        "exact",
+                                        "ignore",
+                                        "subset",
+                                        "superset"
+                                      ]
                                     },
                                     {
                                       "type": "array",
@@ -4612,7 +5179,12 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": ["exact", "ignore", "subset", "superset"]
+                                      "enum": [
+                                        "exact",
+                                        "ignore",
+                                        "subset",
+                                        "superset"
+                                      ]
                                     },
                                     {
                                       "type": "array",
@@ -4623,7 +5195,10 @@
                                   ]
                                 }
                               },
-                              "required": ["type", "mode"],
+                              "required": [
+                                "type",
+                                "mode"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -4653,7 +5228,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["field-accuracy", "field_accuracy"]
+                                  "enum": [
+                                    "field-accuracy",
+                                    "field_accuracy"
+                                  ]
                                 },
                                 "fields": {
                                   "type": "array",
@@ -4665,7 +5243,11 @@
                                       },
                                       "match": {
                                         "type": "string",
-                                        "enum": ["exact", "numeric_tolerance", "date"]
+                                        "enum": [
+                                          "exact",
+                                          "numeric_tolerance",
+                                          "date"
+                                        ]
                                       },
                                       "required": {
                                         "type": "boolean"
@@ -4687,17 +5269,26 @@
                                         }
                                       }
                                     },
-                                    "required": ["path", "match"],
+                                    "required": [
+                                      "path",
+                                      "match"
+                                    ],
                                     "additionalProperties": false
                                   },
                                   "minItems": 1
                                 },
                                 "aggregation": {
                                   "type": "string",
-                                  "enum": ["weighted_average", "all_or_nothing"]
+                                  "enum": [
+                                    "weighted_average",
+                                    "all_or_nothing"
+                                  ]
                                 }
                               },
-                              "required": ["type", "fields"],
+                              "required": [
+                                "type",
+                                "fields"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -4734,7 +5325,10 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type", "threshold"],
+                              "required": [
+                                "type",
+                                "threshold"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -4771,7 +5365,10 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type", "budget"],
+                              "required": [
+                                "type",
+                                "budget"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -4801,7 +5398,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["token-usage", "token_usage"]
+                                  "enum": [
+                                    "token-usage",
+                                    "token_usage"
+                                  ]
                                 },
                                 "max_total": {
                                   "type": "number",
@@ -4816,7 +5416,9 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -4846,7 +5448,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["execution-metrics", "execution_metrics"]
+                                  "enum": [
+                                    "execution-metrics",
+                                    "execution_metrics"
+                                  ]
                                 },
                                 "max_tool_calls": {
                                   "type": "number",
@@ -4878,7 +5483,9 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -4914,7 +5521,10 @@
                                   "type": "string"
                                 }
                               },
-                              "required": ["type", "value"],
+                              "required": [
+                                "type",
+                                "value"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -4950,7 +5560,10 @@
                                   "type": "string"
                                 }
                               },
-                              "required": ["type", "value"],
+                              "required": [
+                                "type",
+                                "value"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -4980,10 +5593,15 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["is-json", "is_json"]
+                                  "enum": [
+                                    "is-json",
+                                    "is_json"
+                                  ]
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -5019,7 +5637,10 @@
                                   "type": "string"
                                 }
                               },
-                              "required": ["type", "value"],
+                              "required": [
+                                "type",
+                                "value"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -5100,7 +5721,10 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": ["score_range", "outcome"],
+                                          "required": [
+                                            "score_range",
+                                            "outcome"
+                                          ],
                                           "additionalProperties": false
                                         }
                                       }
@@ -5110,7 +5734,10 @@
                                   "minItems": 1
                                 }
                               },
-                              "required": ["type", "criteria"],
+                              "required": [
+                                "type",
+                                "criteria"
+                              ],
                               "additionalProperties": false
                             }
                           ]
@@ -5147,7 +5774,12 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
+                                  "enum": [
+                                    "code-grader",
+                                    "code_grader",
+                                    "code-judge",
+                                    "code_judge"
+                                  ]
                                 },
                                 "command": {
                                   "anyOf": [
@@ -5199,7 +5831,10 @@
                                   "additionalProperties": {}
                                 }
                               },
-                              "required": ["type", "command"],
+                              "required": [
+                                "type",
+                                "command"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -5229,7 +5864,12 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
+                                  "enum": [
+                                    "llm-grader",
+                                    "llm_grader",
+                                    "llm-judge",
+                                    "llm_judge"
+                                  ]
                                 },
                                 "prompt": {
                                   "anyOf": [
@@ -5323,7 +5963,10 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": ["score_range", "outcome"],
+                                          "required": [
+                                            "score_range",
+                                            "outcome"
+                                          ],
                                           "additionalProperties": false
                                         }
                                       }
@@ -5352,7 +5995,9 @@
                                   "maximum": 2
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -5412,7 +6057,9 @@
                                           }
                                         }
                                       },
-                                      "required": ["type"],
+                                      "required": [
+                                        "type"
+                                      ],
                                       "additionalProperties": false
                                     },
                                     {
@@ -5428,7 +6075,10 @@
                                           "maximum": 1
                                         }
                                       },
-                                      "required": ["type", "threshold"],
+                                      "required": [
+                                        "type",
+                                        "threshold"
+                                      ],
                                       "additionalProperties": false
                                     },
                                     {
@@ -5445,7 +6095,10 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": ["type", "path"],
+                                      "required": [
+                                        "type",
+                                        "path"
+                                      ],
                                       "additionalProperties": false
                                     },
                                     {
@@ -5462,13 +6115,18 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": ["type"],
+                                      "required": [
+                                        "type"
+                                      ],
                                       "additionalProperties": false
                                     }
                                   ]
                                 }
                               },
-                              "required": ["type", "aggregator"],
+                              "required": [
+                                "type",
+                                "aggregator"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -5498,11 +6156,20 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["tool-trajectory", "tool_trajectory"]
+                                  "enum": [
+                                    "tool-trajectory",
+                                    "tool_trajectory"
+                                  ]
                                 },
                                 "mode": {
                                   "type": "string",
-                                  "enum": ["any_order", "in_order", "exact", "subset", "superset"]
+                                  "enum": [
+                                    "any_order",
+                                    "in_order",
+                                    "exact",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 "minimums": {
                                   "type": "object",
@@ -5543,7 +6210,12 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": ["exact", "ignore", "subset", "superset"]
+                                            "enum": [
+                                              "exact",
+                                              "ignore",
+                                              "subset",
+                                              "superset"
+                                            ]
                                           },
                                           {
                                             "type": "array",
@@ -5557,7 +6229,12 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": ["exact", "ignore", "subset", "superset"]
+                                            "enum": [
+                                              "exact",
+                                              "ignore",
+                                              "subset",
+                                              "superset"
+                                            ]
                                           },
                                           {
                                             "type": "array",
@@ -5568,7 +6245,9 @@
                                         ]
                                       }
                                     },
-                                    "required": ["tool"],
+                                    "required": [
+                                      "tool"
+                                    ],
                                     "additionalProperties": false
                                   }
                                 },
@@ -5576,7 +6255,12 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": ["exact", "ignore", "subset", "superset"]
+                                      "enum": [
+                                        "exact",
+                                        "ignore",
+                                        "subset",
+                                        "superset"
+                                      ]
                                     },
                                     {
                                       "type": "array",
@@ -5590,7 +6274,12 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": ["exact", "ignore", "subset", "superset"]
+                                      "enum": [
+                                        "exact",
+                                        "ignore",
+                                        "subset",
+                                        "superset"
+                                      ]
                                     },
                                     {
                                       "type": "array",
@@ -5601,7 +6290,10 @@
                                   ]
                                 }
                               },
-                              "required": ["type", "mode"],
+                              "required": [
+                                "type",
+                                "mode"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -5631,7 +6323,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["field-accuracy", "field_accuracy"]
+                                  "enum": [
+                                    "field-accuracy",
+                                    "field_accuracy"
+                                  ]
                                 },
                                 "fields": {
                                   "type": "array",
@@ -5643,7 +6338,11 @@
                                       },
                                       "match": {
                                         "type": "string",
-                                        "enum": ["exact", "numeric_tolerance", "date"]
+                                        "enum": [
+                                          "exact",
+                                          "numeric_tolerance",
+                                          "date"
+                                        ]
                                       },
                                       "required": {
                                         "type": "boolean"
@@ -5665,17 +6364,26 @@
                                         }
                                       }
                                     },
-                                    "required": ["path", "match"],
+                                    "required": [
+                                      "path",
+                                      "match"
+                                    ],
                                     "additionalProperties": false
                                   },
                                   "minItems": 1
                                 },
                                 "aggregation": {
                                   "type": "string",
-                                  "enum": ["weighted_average", "all_or_nothing"]
+                                  "enum": [
+                                    "weighted_average",
+                                    "all_or_nothing"
+                                  ]
                                 }
                               },
-                              "required": ["type", "fields"],
+                              "required": [
+                                "type",
+                                "fields"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -5712,7 +6420,10 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type", "threshold"],
+                              "required": [
+                                "type",
+                                "threshold"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -5749,7 +6460,10 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type", "budget"],
+                              "required": [
+                                "type",
+                                "budget"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -5779,7 +6493,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["token-usage", "token_usage"]
+                                  "enum": [
+                                    "token-usage",
+                                    "token_usage"
+                                  ]
                                 },
                                 "max_total": {
                                   "type": "number",
@@ -5794,7 +6511,9 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -5824,7 +6543,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["execution-metrics", "execution_metrics"]
+                                  "enum": [
+                                    "execution-metrics",
+                                    "execution_metrics"
+                                  ]
                                 },
                                 "max_tool_calls": {
                                   "type": "number",
@@ -5856,7 +6578,9 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -5892,7 +6616,10 @@
                                   "type": "string"
                                 }
                               },
-                              "required": ["type", "value"],
+                              "required": [
+                                "type",
+                                "value"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -5928,7 +6655,10 @@
                                   "type": "string"
                                 }
                               },
-                              "required": ["type", "value"],
+                              "required": [
+                                "type",
+                                "value"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -5958,10 +6688,15 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["is-json", "is_json"]
+                                  "enum": [
+                                    "is-json",
+                                    "is_json"
+                                  ]
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -5997,7 +6732,10 @@
                                   "type": "string"
                                 }
                               },
-                              "required": ["type", "value"],
+                              "required": [
+                                "type",
+                                "value"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -6078,7 +6816,10 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": ["score_range", "outcome"],
+                                          "required": [
+                                            "score_range",
+                                            "outcome"
+                                          ],
                                           "additionalProperties": false
                                         }
                                       }
@@ -6088,7 +6829,10 @@
                                   "minItems": 1
                                 }
                               },
-                              "required": ["type", "criteria"],
+                              "required": [
+                                "type",
+                                "criteria"
+                              ],
                               "additionalProperties": false
                             }
                           ]
@@ -6109,7 +6853,11 @@
                           },
                           "strategy": {
                             "type": "string",
-                            "enum": ["pass_at_k", "mean", "confidence_interval"]
+                            "enum": [
+                              "pass_at_k",
+                              "mean",
+                              "confidence_interval"
+                            ]
                           },
                           "cost_limit_usd": {
                             "type": "number",
@@ -6120,7 +6868,9 @@
                             "minimum": 0
                           }
                         },
-                        "required": ["count"],
+                        "required": [
+                          "count"
+                        ],
                         "additionalProperties": false
                       },
                       "total_budget_usd": {
@@ -6136,6 +6886,11 @@
                       },
                       "failOnError": {
                         "type": "boolean"
+                      },
+                      "threshold": {
+                        "type": "number",
+                        "minimum": 0,
+                        "maximum": 1
                       }
                     },
                     "additionalProperties": false
@@ -6148,7 +6903,10 @@
                       },
                       "isolation": {
                         "type": "string",
-                        "enum": ["shared", "per_test"]
+                        "enum": [
+                          "shared",
+                          "per_test"
+                        ]
                       },
                       "repos": {
                         "type": "array",
@@ -6172,7 +6930,10 @@
                                       "format": "uri"
                                     }
                                   },
-                                  "required": ["type", "url"],
+                                  "required": [
+                                    "type",
+                                    "url"
+                                  ],
                                   "additionalProperties": false
                                 },
                                 {
@@ -6186,7 +6947,10 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": ["type", "path"],
+                                  "required": [
+                                    "type",
+                                    "path"
+                                  ],
                                   "additionalProperties": false
                                 }
                               ]
@@ -6199,7 +6963,10 @@
                                 },
                                 "resolve": {
                                   "type": "string",
-                                  "enum": ["remote", "local"]
+                                  "enum": [
+                                    "remote",
+                                    "local"
+                                  ]
                                 },
                                 "ancestor": {
                                   "type": "integer",
@@ -6228,7 +6995,10 @@
                               "additionalProperties": false
                             }
                           },
-                          "required": ["path", "source"],
+                          "required": [
+                            "path",
+                            "source"
+                          ],
                           "additionalProperties": false
                         }
                       },
@@ -6264,7 +7034,11 @@
                               },
                               "reset": {
                                 "type": "string",
-                                "enum": ["none", "fast", "strict"]
+                                "enum": [
+                                  "none",
+                                  "fast",
+                                  "strict"
+                                ]
                               }
                             },
                             "additionalProperties": false
@@ -6295,7 +7069,11 @@
                               },
                               "reset": {
                                 "type": "string",
-                                "enum": ["none", "fast", "strict"]
+                                "enum": [
+                                  "none",
+                                  "fast",
+                                  "strict"
+                                ]
                               }
                             },
                             "additionalProperties": false
@@ -6326,7 +7104,11 @@
                               },
                               "reset": {
                                 "type": "string",
-                                "enum": ["none", "fast", "strict"]
+                                "enum": [
+                                  "none",
+                                  "fast",
+                                  "strict"
+                                ]
                               }
                             },
                             "additionalProperties": false
@@ -6357,7 +7139,11 @@
                               },
                               "reset": {
                                 "type": "string",
-                                "enum": ["none", "fast", "strict"]
+                                "enum": [
+                                  "none",
+                                  "fast",
+                                  "strict"
+                                ]
                               }
                             },
                             "additionalProperties": false
@@ -6367,7 +7153,11 @@
                       },
                       "mode": {
                         "type": "string",
-                        "enum": ["pooled", "temp", "static"]
+                        "enum": [
+                          "pooled",
+                          "temp",
+                          "static"
+                        ]
                       },
                       "path": {
                         "type": "string"
@@ -6389,7 +7179,9 @@
                     "type": "string"
                   }
                 },
-                "required": ["id"],
+                "required": [
+                  "id"
+                ],
                 "additionalProperties": false
               }
             },
@@ -6427,7 +7219,12 @@
                           "properties": {
                             "role": {
                               "type": "string",
-                              "enum": ["system", "user", "assistant", "tool"]
+                              "enum": [
+                                "system",
+                                "user",
+                                "assistant",
+                                "tool"
+                              ]
                             },
                             "content": {
                               "anyOf": [
@@ -6441,20 +7238,29 @@
                                     "properties": {
                                       "type": {
                                         "type": "string",
-                                        "enum": ["text", "file"]
+                                        "enum": [
+                                          "text",
+                                          "file"
+                                        ]
                                       },
                                       "value": {
                                         "type": "string"
                                       }
                                     },
-                                    "required": ["type", "value"],
+                                    "required": [
+                                      "type",
+                                      "value"
+                                    ],
                                     "additionalProperties": false
                                   }
                                 }
                               ]
                             }
                           },
-                          "required": ["role", "content"],
+                          "required": [
+                            "role",
+                            "content"
+                          ],
                           "additionalProperties": false
                         }
                       }
@@ -6482,7 +7288,12 @@
                           "properties": {
                             "role": {
                               "type": "string",
-                              "enum": ["system", "user", "assistant", "tool"]
+                              "enum": [
+                                "system",
+                                "user",
+                                "assistant",
+                                "tool"
+                              ]
                             },
                             "content": {
                               "anyOf": [
@@ -6496,20 +7307,29 @@
                                     "properties": {
                                       "type": {
                                         "type": "string",
-                                        "enum": ["text", "file"]
+                                        "enum": [
+                                          "text",
+                                          "file"
+                                        ]
                                       },
                                       "value": {
                                         "type": "string"
                                       }
                                     },
-                                    "required": ["type", "value"],
+                                    "required": [
+                                      "type",
+                                      "value"
+                                    ],
                                     "additionalProperties": false
                                   }
                                 }
                               ]
                             }
                           },
-                          "required": ["role", "content"],
+                          "required": [
+                            "role",
+                            "content"
+                          ],
                           "additionalProperties": false
                         }
                       }
@@ -6546,7 +7366,12 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
+                              "enum": [
+                                "code-grader",
+                                "code_grader",
+                                "code-judge",
+                                "code_judge"
+                              ]
                             },
                             "command": {
                               "anyOf": [
@@ -6598,7 +7423,10 @@
                               "additionalProperties": {}
                             }
                           },
-                          "required": ["type", "command"],
+                          "required": [
+                            "type",
+                            "command"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -6628,7 +7456,12 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
+                              "enum": [
+                                "llm-grader",
+                                "llm_grader",
+                                "llm-judge",
+                                "llm_judge"
+                              ]
                             },
                             "prompt": {
                               "anyOf": [
@@ -6722,7 +7555,10 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": ["score_range", "outcome"],
+                                      "required": [
+                                        "score_range",
+                                        "outcome"
+                                      ],
                                       "additionalProperties": false
                                     }
                                   }
@@ -6751,7 +7587,9 @@
                               "maximum": 2
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -6811,7 +7649,9 @@
                                       }
                                     }
                                   },
-                                  "required": ["type"],
+                                  "required": [
+                                    "type"
+                                  ],
                                   "additionalProperties": false
                                 },
                                 {
@@ -6827,7 +7667,10 @@
                                       "maximum": 1
                                     }
                                   },
-                                  "required": ["type", "threshold"],
+                                  "required": [
+                                    "type",
+                                    "threshold"
+                                  ],
                                   "additionalProperties": false
                                 },
                                 {
@@ -6844,7 +7687,10 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": ["type", "path"],
+                                  "required": [
+                                    "type",
+                                    "path"
+                                  ],
                                   "additionalProperties": false
                                 },
                                 {
@@ -6861,13 +7707,18 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": ["type"],
+                                  "required": [
+                                    "type"
+                                  ],
                                   "additionalProperties": false
                                 }
                               ]
                             }
                           },
-                          "required": ["type", "aggregator"],
+                          "required": [
+                            "type",
+                            "aggregator"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -6897,11 +7748,20 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["tool-trajectory", "tool_trajectory"]
+                              "enum": [
+                                "tool-trajectory",
+                                "tool_trajectory"
+                              ]
                             },
                             "mode": {
                               "type": "string",
-                              "enum": ["any_order", "in_order", "exact", "subset", "superset"]
+                              "enum": [
+                                "any_order",
+                                "in_order",
+                                "exact",
+                                "subset",
+                                "superset"
+                              ]
                             },
                             "minimums": {
                               "type": "object",
@@ -6942,7 +7802,12 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": ["exact", "ignore", "subset", "superset"]
+                                        "enum": [
+                                          "exact",
+                                          "ignore",
+                                          "subset",
+                                          "superset"
+                                        ]
                                       },
                                       {
                                         "type": "array",
@@ -6956,7 +7821,12 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": ["exact", "ignore", "subset", "superset"]
+                                        "enum": [
+                                          "exact",
+                                          "ignore",
+                                          "subset",
+                                          "superset"
+                                        ]
                                       },
                                       {
                                         "type": "array",
@@ -6967,7 +7837,9 @@
                                     ]
                                   }
                                 },
-                                "required": ["tool"],
+                                "required": [
+                                  "tool"
+                                ],
                                 "additionalProperties": false
                               }
                             },
@@ -6975,7 +7847,12 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": ["exact", "ignore", "subset", "superset"]
+                                  "enum": [
+                                    "exact",
+                                    "ignore",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 {
                                   "type": "array",
@@ -6989,7 +7866,12 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": ["exact", "ignore", "subset", "superset"]
+                                  "enum": [
+                                    "exact",
+                                    "ignore",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 {
                                   "type": "array",
@@ -7000,7 +7882,10 @@
                               ]
                             }
                           },
-                          "required": ["type", "mode"],
+                          "required": [
+                            "type",
+                            "mode"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -7030,7 +7915,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["field-accuracy", "field_accuracy"]
+                              "enum": [
+                                "field-accuracy",
+                                "field_accuracy"
+                              ]
                             },
                             "fields": {
                               "type": "array",
@@ -7042,7 +7930,11 @@
                                   },
                                   "match": {
                                     "type": "string",
-                                    "enum": ["exact", "numeric_tolerance", "date"]
+                                    "enum": [
+                                      "exact",
+                                      "numeric_tolerance",
+                                      "date"
+                                    ]
                                   },
                                   "required": {
                                     "type": "boolean"
@@ -7064,17 +7956,26 @@
                                     }
                                   }
                                 },
-                                "required": ["path", "match"],
+                                "required": [
+                                  "path",
+                                  "match"
+                                ],
                                 "additionalProperties": false
                               },
                               "minItems": 1
                             },
                             "aggregation": {
                               "type": "string",
-                              "enum": ["weighted_average", "all_or_nothing"]
+                              "enum": [
+                                "weighted_average",
+                                "all_or_nothing"
+                              ]
                             }
                           },
-                          "required": ["type", "fields"],
+                          "required": [
+                            "type",
+                            "fields"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -7111,7 +8012,10 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type", "threshold"],
+                          "required": [
+                            "type",
+                            "threshold"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -7148,7 +8052,10 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type", "budget"],
+                          "required": [
+                            "type",
+                            "budget"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -7178,7 +8085,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["token-usage", "token_usage"]
+                              "enum": [
+                                "token-usage",
+                                "token_usage"
+                              ]
                             },
                             "max_total": {
                               "type": "number",
@@ -7193,7 +8103,9 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -7223,7 +8135,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["execution-metrics", "execution_metrics"]
+                              "enum": [
+                                "execution-metrics",
+                                "execution_metrics"
+                              ]
                             },
                             "max_tool_calls": {
                               "type": "number",
@@ -7255,7 +8170,9 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -7291,7 +8208,10 @@
                               "type": "string"
                             }
                           },
-                          "required": ["type", "value"],
+                          "required": [
+                            "type",
+                            "value"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -7327,7 +8247,10 @@
                               "type": "string"
                             }
                           },
-                          "required": ["type", "value"],
+                          "required": [
+                            "type",
+                            "value"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -7357,10 +8280,15 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["is-json", "is_json"]
+                              "enum": [
+                                "is-json",
+                                "is_json"
+                              ]
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -7396,7 +8324,10 @@
                               "type": "string"
                             }
                           },
-                          "required": ["type", "value"],
+                          "required": [
+                            "type",
+                            "value"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -7477,7 +8408,10 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": ["score_range", "outcome"],
+                                      "required": [
+                                        "score_range",
+                                        "outcome"
+                                      ],
                                       "additionalProperties": false
                                     }
                                   }
@@ -7487,7 +8421,10 @@
                               "minItems": 1
                             }
                           },
-                          "required": ["type", "criteria"],
+                          "required": [
+                            "type",
+                            "criteria"
+                          ],
                           "additionalProperties": false
                         }
                       ]
@@ -7524,7 +8461,12 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
+                              "enum": [
+                                "code-grader",
+                                "code_grader",
+                                "code-judge",
+                                "code_judge"
+                              ]
                             },
                             "command": {
                               "anyOf": [
@@ -7576,7 +8518,10 @@
                               "additionalProperties": {}
                             }
                           },
-                          "required": ["type", "command"],
+                          "required": [
+                            "type",
+                            "command"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -7606,7 +8551,12 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
+                              "enum": [
+                                "llm-grader",
+                                "llm_grader",
+                                "llm-judge",
+                                "llm_judge"
+                              ]
                             },
                             "prompt": {
                               "anyOf": [
@@ -7700,7 +8650,10 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": ["score_range", "outcome"],
+                                      "required": [
+                                        "score_range",
+                                        "outcome"
+                                      ],
                                       "additionalProperties": false
                                     }
                                   }
@@ -7729,7 +8682,9 @@
                               "maximum": 2
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -7789,7 +8744,9 @@
                                       }
                                     }
                                   },
-                                  "required": ["type"],
+                                  "required": [
+                                    "type"
+                                  ],
                                   "additionalProperties": false
                                 },
                                 {
@@ -7805,7 +8762,10 @@
                                       "maximum": 1
                                     }
                                   },
-                                  "required": ["type", "threshold"],
+                                  "required": [
+                                    "type",
+                                    "threshold"
+                                  ],
                                   "additionalProperties": false
                                 },
                                 {
@@ -7822,7 +8782,10 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": ["type", "path"],
+                                  "required": [
+                                    "type",
+                                    "path"
+                                  ],
                                   "additionalProperties": false
                                 },
                                 {
@@ -7839,13 +8802,18 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": ["type"],
+                                  "required": [
+                                    "type"
+                                  ],
                                   "additionalProperties": false
                                 }
                               ]
                             }
                           },
-                          "required": ["type", "aggregator"],
+                          "required": [
+                            "type",
+                            "aggregator"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -7875,11 +8843,20 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["tool-trajectory", "tool_trajectory"]
+                              "enum": [
+                                "tool-trajectory",
+                                "tool_trajectory"
+                              ]
                             },
                             "mode": {
                               "type": "string",
-                              "enum": ["any_order", "in_order", "exact", "subset", "superset"]
+                              "enum": [
+                                "any_order",
+                                "in_order",
+                                "exact",
+                                "subset",
+                                "superset"
+                              ]
                             },
                             "minimums": {
                               "type": "object",
@@ -7920,7 +8897,12 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": ["exact", "ignore", "subset", "superset"]
+                                        "enum": [
+                                          "exact",
+                                          "ignore",
+                                          "subset",
+                                          "superset"
+                                        ]
                                       },
                                       {
                                         "type": "array",
@@ -7934,7 +8916,12 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": ["exact", "ignore", "subset", "superset"]
+                                        "enum": [
+                                          "exact",
+                                          "ignore",
+                                          "subset",
+                                          "superset"
+                                        ]
                                       },
                                       {
                                         "type": "array",
@@ -7945,7 +8932,9 @@
                                     ]
                                   }
                                 },
-                                "required": ["tool"],
+                                "required": [
+                                  "tool"
+                                ],
                                 "additionalProperties": false
                               }
                             },
@@ -7953,7 +8942,12 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": ["exact", "ignore", "subset", "superset"]
+                                  "enum": [
+                                    "exact",
+                                    "ignore",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 {
                                   "type": "array",
@@ -7967,7 +8961,12 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": ["exact", "ignore", "subset", "superset"]
+                                  "enum": [
+                                    "exact",
+                                    "ignore",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 {
                                   "type": "array",
@@ -7978,7 +8977,10 @@
                               ]
                             }
                           },
-                          "required": ["type", "mode"],
+                          "required": [
+                            "type",
+                            "mode"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -8008,7 +9010,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["field-accuracy", "field_accuracy"]
+                              "enum": [
+                                "field-accuracy",
+                                "field_accuracy"
+                              ]
                             },
                             "fields": {
                               "type": "array",
@@ -8020,7 +9025,11 @@
                                   },
                                   "match": {
                                     "type": "string",
-                                    "enum": ["exact", "numeric_tolerance", "date"]
+                                    "enum": [
+                                      "exact",
+                                      "numeric_tolerance",
+                                      "date"
+                                    ]
                                   },
                                   "required": {
                                     "type": "boolean"
@@ -8042,17 +9051,26 @@
                                     }
                                   }
                                 },
-                                "required": ["path", "match"],
+                                "required": [
+                                  "path",
+                                  "match"
+                                ],
                                 "additionalProperties": false
                               },
                               "minItems": 1
                             },
                             "aggregation": {
                               "type": "string",
-                              "enum": ["weighted_average", "all_or_nothing"]
+                              "enum": [
+                                "weighted_average",
+                                "all_or_nothing"
+                              ]
                             }
                           },
-                          "required": ["type", "fields"],
+                          "required": [
+                            "type",
+                            "fields"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -8089,7 +9107,10 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type", "threshold"],
+                          "required": [
+                            "type",
+                            "threshold"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -8126,7 +9147,10 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type", "budget"],
+                          "required": [
+                            "type",
+                            "budget"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -8156,7 +9180,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["token-usage", "token_usage"]
+                              "enum": [
+                                "token-usage",
+                                "token_usage"
+                              ]
                             },
                             "max_total": {
                               "type": "number",
@@ -8171,7 +9198,9 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -8201,7 +9230,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["execution-metrics", "execution_metrics"]
+                              "enum": [
+                                "execution-metrics",
+                                "execution_metrics"
+                              ]
                             },
                             "max_tool_calls": {
                               "type": "number",
@@ -8233,7 +9265,9 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -8269,7 +9303,10 @@
                               "type": "string"
                             }
                           },
-                          "required": ["type", "value"],
+                          "required": [
+                            "type",
+                            "value"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -8305,7 +9342,10 @@
                               "type": "string"
                             }
                           },
-                          "required": ["type", "value"],
+                          "required": [
+                            "type",
+                            "value"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -8335,10 +9375,15 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["is-json", "is_json"]
+                              "enum": [
+                                "is-json",
+                                "is_json"
+                              ]
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -8374,7 +9419,10 @@
                               "type": "string"
                             }
                           },
-                          "required": ["type", "value"],
+                          "required": [
+                            "type",
+                            "value"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -8455,7 +9503,10 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": ["score_range", "outcome"],
+                                      "required": [
+                                        "score_range",
+                                        "outcome"
+                                      ],
                                       "additionalProperties": false
                                     }
                                   }
@@ -8465,7 +9516,10 @@
                               "minItems": 1
                             }
                           },
-                          "required": ["type", "criteria"],
+                          "required": [
+                            "type",
+                            "criteria"
+                          ],
                           "additionalProperties": false
                         }
                       ]
@@ -8502,7 +9556,12 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
+                              "enum": [
+                                "code-grader",
+                                "code_grader",
+                                "code-judge",
+                                "code_judge"
+                              ]
                             },
                             "command": {
                               "anyOf": [
@@ -8554,7 +9613,10 @@
                               "additionalProperties": {}
                             }
                           },
-                          "required": ["type", "command"],
+                          "required": [
+                            "type",
+                            "command"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -8584,7 +9646,12 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
+                              "enum": [
+                                "llm-grader",
+                                "llm_grader",
+                                "llm-judge",
+                                "llm_judge"
+                              ]
                             },
                             "prompt": {
                               "anyOf": [
@@ -8678,7 +9745,10 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": ["score_range", "outcome"],
+                                      "required": [
+                                        "score_range",
+                                        "outcome"
+                                      ],
                                       "additionalProperties": false
                                     }
                                   }
@@ -8707,7 +9777,9 @@
                               "maximum": 2
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -8767,7 +9839,9 @@
                                       }
                                     }
                                   },
-                                  "required": ["type"],
+                                  "required": [
+                                    "type"
+                                  ],
                                   "additionalProperties": false
                                 },
                                 {
@@ -8783,7 +9857,10 @@
                                       "maximum": 1
                                     }
                                   },
-                                  "required": ["type", "threshold"],
+                                  "required": [
+                                    "type",
+                                    "threshold"
+                                  ],
                                   "additionalProperties": false
                                 },
                                 {
@@ -8800,7 +9877,10 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": ["type", "path"],
+                                  "required": [
+                                    "type",
+                                    "path"
+                                  ],
                                   "additionalProperties": false
                                 },
                                 {
@@ -8817,13 +9897,18 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": ["type"],
+                                  "required": [
+                                    "type"
+                                  ],
                                   "additionalProperties": false
                                 }
                               ]
                             }
                           },
-                          "required": ["type", "aggregator"],
+                          "required": [
+                            "type",
+                            "aggregator"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -8853,11 +9938,20 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["tool-trajectory", "tool_trajectory"]
+                              "enum": [
+                                "tool-trajectory",
+                                "tool_trajectory"
+                              ]
                             },
                             "mode": {
                               "type": "string",
-                              "enum": ["any_order", "in_order", "exact", "subset", "superset"]
+                              "enum": [
+                                "any_order",
+                                "in_order",
+                                "exact",
+                                "subset",
+                                "superset"
+                              ]
                             },
                             "minimums": {
                               "type": "object",
@@ -8898,7 +9992,12 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": ["exact", "ignore", "subset", "superset"]
+                                        "enum": [
+                                          "exact",
+                                          "ignore",
+                                          "subset",
+                                          "superset"
+                                        ]
                                       },
                                       {
                                         "type": "array",
@@ -8912,7 +10011,12 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": ["exact", "ignore", "subset", "superset"]
+                                        "enum": [
+                                          "exact",
+                                          "ignore",
+                                          "subset",
+                                          "superset"
+                                        ]
                                       },
                                       {
                                         "type": "array",
@@ -8923,7 +10027,9 @@
                                     ]
                                   }
                                 },
-                                "required": ["tool"],
+                                "required": [
+                                  "tool"
+                                ],
                                 "additionalProperties": false
                               }
                             },
@@ -8931,7 +10037,12 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": ["exact", "ignore", "subset", "superset"]
+                                  "enum": [
+                                    "exact",
+                                    "ignore",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 {
                                   "type": "array",
@@ -8945,7 +10056,12 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": ["exact", "ignore", "subset", "superset"]
+                                  "enum": [
+                                    "exact",
+                                    "ignore",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 {
                                   "type": "array",
@@ -8956,7 +10072,10 @@
                               ]
                             }
                           },
-                          "required": ["type", "mode"],
+                          "required": [
+                            "type",
+                            "mode"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -8986,7 +10105,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["field-accuracy", "field_accuracy"]
+                              "enum": [
+                                "field-accuracy",
+                                "field_accuracy"
+                              ]
                             },
                             "fields": {
                               "type": "array",
@@ -8998,7 +10120,11 @@
                                   },
                                   "match": {
                                     "type": "string",
-                                    "enum": ["exact", "numeric_tolerance", "date"]
+                                    "enum": [
+                                      "exact",
+                                      "numeric_tolerance",
+                                      "date"
+                                    ]
                                   },
                                   "required": {
                                     "type": "boolean"
@@ -9020,17 +10146,26 @@
                                     }
                                   }
                                 },
-                                "required": ["path", "match"],
+                                "required": [
+                                  "path",
+                                  "match"
+                                ],
                                 "additionalProperties": false
                               },
                               "minItems": 1
                             },
                             "aggregation": {
                               "type": "string",
-                              "enum": ["weighted_average", "all_or_nothing"]
+                              "enum": [
+                                "weighted_average",
+                                "all_or_nothing"
+                              ]
                             }
                           },
-                          "required": ["type", "fields"],
+                          "required": [
+                            "type",
+                            "fields"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -9067,7 +10202,10 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type", "threshold"],
+                          "required": [
+                            "type",
+                            "threshold"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -9104,7 +10242,10 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type", "budget"],
+                          "required": [
+                            "type",
+                            "budget"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -9134,7 +10275,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["token-usage", "token_usage"]
+                              "enum": [
+                                "token-usage",
+                                "token_usage"
+                              ]
                             },
                             "max_total": {
                               "type": "number",
@@ -9149,7 +10293,9 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -9179,7 +10325,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["execution-metrics", "execution_metrics"]
+                              "enum": [
+                                "execution-metrics",
+                                "execution_metrics"
+                              ]
                             },
                             "max_tool_calls": {
                               "type": "number",
@@ -9211,7 +10360,9 @@
                               "minimum": 0
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -9247,7 +10398,10 @@
                               "type": "string"
                             }
                           },
-                          "required": ["type", "value"],
+                          "required": [
+                            "type",
+                            "value"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -9283,7 +10437,10 @@
                               "type": "string"
                             }
                           },
-                          "required": ["type", "value"],
+                          "required": [
+                            "type",
+                            "value"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -9313,10 +10470,15 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": ["is-json", "is_json"]
+                              "enum": [
+                                "is-json",
+                                "is_json"
+                              ]
                             }
                           },
-                          "required": ["type"],
+                          "required": [
+                            "type"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -9352,7 +10514,10 @@
                               "type": "string"
                             }
                           },
-                          "required": ["type", "value"],
+                          "required": [
+                            "type",
+                            "value"
+                          ],
                           "additionalProperties": false
                         },
                         {
@@ -9433,7 +10598,10 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": ["score_range", "outcome"],
+                                      "required": [
+                                        "score_range",
+                                        "outcome"
+                                      ],
                                       "additionalProperties": false
                                     }
                                   }
@@ -9443,7 +10611,10 @@
                               "minItems": 1
                             }
                           },
-                          "required": ["type", "criteria"],
+                          "required": [
+                            "type",
+                            "criteria"
+                          ],
                           "additionalProperties": false
                         }
                       ]
@@ -9497,7 +10668,12 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
+                                  "enum": [
+                                    "code-grader",
+                                    "code_grader",
+                                    "code-judge",
+                                    "code_judge"
+                                  ]
                                 },
                                 "command": {
                                   "anyOf": [
@@ -9549,7 +10725,10 @@
                                   "additionalProperties": {}
                                 }
                               },
-                              "required": ["type", "command"],
+                              "required": [
+                                "type",
+                                "command"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -9579,7 +10758,12 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
+                                  "enum": [
+                                    "llm-grader",
+                                    "llm_grader",
+                                    "llm-judge",
+                                    "llm_judge"
+                                  ]
                                 },
                                 "prompt": {
                                   "anyOf": [
@@ -9673,7 +10857,10 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": ["score_range", "outcome"],
+                                          "required": [
+                                            "score_range",
+                                            "outcome"
+                                          ],
                                           "additionalProperties": false
                                         }
                                       }
@@ -9702,7 +10889,9 @@
                                   "maximum": 2
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -9762,7 +10951,9 @@
                                           }
                                         }
                                       },
-                                      "required": ["type"],
+                                      "required": [
+                                        "type"
+                                      ],
                                       "additionalProperties": false
                                     },
                                     {
@@ -9778,7 +10969,10 @@
                                           "maximum": 1
                                         }
                                       },
-                                      "required": ["type", "threshold"],
+                                      "required": [
+                                        "type",
+                                        "threshold"
+                                      ],
                                       "additionalProperties": false
                                     },
                                     {
@@ -9795,7 +10989,10 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": ["type", "path"],
+                                      "required": [
+                                        "type",
+                                        "path"
+                                      ],
                                       "additionalProperties": false
                                     },
                                     {
@@ -9812,13 +11009,18 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": ["type"],
+                                      "required": [
+                                        "type"
+                                      ],
                                       "additionalProperties": false
                                     }
                                   ]
                                 }
                               },
-                              "required": ["type", "aggregator"],
+                              "required": [
+                                "type",
+                                "aggregator"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -9848,11 +11050,20 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["tool-trajectory", "tool_trajectory"]
+                                  "enum": [
+                                    "tool-trajectory",
+                                    "tool_trajectory"
+                                  ]
                                 },
                                 "mode": {
                                   "type": "string",
-                                  "enum": ["any_order", "in_order", "exact", "subset", "superset"]
+                                  "enum": [
+                                    "any_order",
+                                    "in_order",
+                                    "exact",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 "minimums": {
                                   "type": "object",
@@ -9893,7 +11104,12 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": ["exact", "ignore", "subset", "superset"]
+                                            "enum": [
+                                              "exact",
+                                              "ignore",
+                                              "subset",
+                                              "superset"
+                                            ]
                                           },
                                           {
                                             "type": "array",
@@ -9907,7 +11123,12 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": ["exact", "ignore", "subset", "superset"]
+                                            "enum": [
+                                              "exact",
+                                              "ignore",
+                                              "subset",
+                                              "superset"
+                                            ]
                                           },
                                           {
                                             "type": "array",
@@ -9918,7 +11139,9 @@
                                         ]
                                       }
                                     },
-                                    "required": ["tool"],
+                                    "required": [
+                                      "tool"
+                                    ],
                                     "additionalProperties": false
                                   }
                                 },
@@ -9926,7 +11149,12 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": ["exact", "ignore", "subset", "superset"]
+                                      "enum": [
+                                        "exact",
+                                        "ignore",
+                                        "subset",
+                                        "superset"
+                                      ]
                                     },
                                     {
                                       "type": "array",
@@ -9940,7 +11168,12 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": ["exact", "ignore", "subset", "superset"]
+                                      "enum": [
+                                        "exact",
+                                        "ignore",
+                                        "subset",
+                                        "superset"
+                                      ]
                                     },
                                     {
                                       "type": "array",
@@ -9951,7 +11184,10 @@
                                   ]
                                 }
                               },
-                              "required": ["type", "mode"],
+                              "required": [
+                                "type",
+                                "mode"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -9981,7 +11217,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["field-accuracy", "field_accuracy"]
+                                  "enum": [
+                                    "field-accuracy",
+                                    "field_accuracy"
+                                  ]
                                 },
                                 "fields": {
                                   "type": "array",
@@ -9993,7 +11232,11 @@
                                       },
                                       "match": {
                                         "type": "string",
-                                        "enum": ["exact", "numeric_tolerance", "date"]
+                                        "enum": [
+                                          "exact",
+                                          "numeric_tolerance",
+                                          "date"
+                                        ]
                                       },
                                       "required": {
                                         "type": "boolean"
@@ -10015,17 +11258,26 @@
                                         }
                                       }
                                     },
-                                    "required": ["path", "match"],
+                                    "required": [
+                                      "path",
+                                      "match"
+                                    ],
                                     "additionalProperties": false
                                   },
                                   "minItems": 1
                                 },
                                 "aggregation": {
                                   "type": "string",
-                                  "enum": ["weighted_average", "all_or_nothing"]
+                                  "enum": [
+                                    "weighted_average",
+                                    "all_or_nothing"
+                                  ]
                                 }
                               },
-                              "required": ["type", "fields"],
+                              "required": [
+                                "type",
+                                "fields"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -10062,7 +11314,10 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type", "threshold"],
+                              "required": [
+                                "type",
+                                "threshold"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -10099,7 +11354,10 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type", "budget"],
+                              "required": [
+                                "type",
+                                "budget"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -10129,7 +11387,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["token-usage", "token_usage"]
+                                  "enum": [
+                                    "token-usage",
+                                    "token_usage"
+                                  ]
                                 },
                                 "max_total": {
                                   "type": "number",
@@ -10144,7 +11405,9 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -10174,7 +11437,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["execution-metrics", "execution_metrics"]
+                                  "enum": [
+                                    "execution-metrics",
+                                    "execution_metrics"
+                                  ]
                                 },
                                 "max_tool_calls": {
                                   "type": "number",
@@ -10206,7 +11472,9 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -10242,7 +11510,10 @@
                                   "type": "string"
                                 }
                               },
-                              "required": ["type", "value"],
+                              "required": [
+                                "type",
+                                "value"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -10278,7 +11549,10 @@
                                   "type": "string"
                                 }
                               },
-                              "required": ["type", "value"],
+                              "required": [
+                                "type",
+                                "value"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -10308,10 +11582,15 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["is-json", "is_json"]
+                                  "enum": [
+                                    "is-json",
+                                    "is_json"
+                                  ]
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -10347,7 +11626,10 @@
                                   "type": "string"
                                 }
                               },
-                              "required": ["type", "value"],
+                              "required": [
+                                "type",
+                                "value"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -10428,7 +11710,10 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": ["score_range", "outcome"],
+                                          "required": [
+                                            "score_range",
+                                            "outcome"
+                                          ],
                                           "additionalProperties": false
                                         }
                                       }
@@ -10438,7 +11723,10 @@
                                   "minItems": 1
                                 }
                               },
-                              "required": ["type", "criteria"],
+                              "required": [
+                                "type",
+                                "criteria"
+                              ],
                               "additionalProperties": false
                             }
                           ]
@@ -10475,7 +11763,12 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
+                                  "enum": [
+                                    "code-grader",
+                                    "code_grader",
+                                    "code-judge",
+                                    "code_judge"
+                                  ]
                                 },
                                 "command": {
                                   "anyOf": [
@@ -10527,7 +11820,10 @@
                                   "additionalProperties": {}
                                 }
                               },
-                              "required": ["type", "command"],
+                              "required": [
+                                "type",
+                                "command"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -10557,7 +11853,12 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
+                                  "enum": [
+                                    "llm-grader",
+                                    "llm_grader",
+                                    "llm-judge",
+                                    "llm_judge"
+                                  ]
                                 },
                                 "prompt": {
                                   "anyOf": [
@@ -10651,7 +11952,10 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": ["score_range", "outcome"],
+                                          "required": [
+                                            "score_range",
+                                            "outcome"
+                                          ],
                                           "additionalProperties": false
                                         }
                                       }
@@ -10680,7 +11984,9 @@
                                   "maximum": 2
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -10740,7 +12046,9 @@
                                           }
                                         }
                                       },
-                                      "required": ["type"],
+                                      "required": [
+                                        "type"
+                                      ],
                                       "additionalProperties": false
                                     },
                                     {
@@ -10756,7 +12064,10 @@
                                           "maximum": 1
                                         }
                                       },
-                                      "required": ["type", "threshold"],
+                                      "required": [
+                                        "type",
+                                        "threshold"
+                                      ],
                                       "additionalProperties": false
                                     },
                                     {
@@ -10773,7 +12084,10 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": ["type", "path"],
+                                      "required": [
+                                        "type",
+                                        "path"
+                                      ],
                                       "additionalProperties": false
                                     },
                                     {
@@ -10790,13 +12104,18 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": ["type"],
+                                      "required": [
+                                        "type"
+                                      ],
                                       "additionalProperties": false
                                     }
                                   ]
                                 }
                               },
-                              "required": ["type", "aggregator"],
+                              "required": [
+                                "type",
+                                "aggregator"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -10826,11 +12145,20 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["tool-trajectory", "tool_trajectory"]
+                                  "enum": [
+                                    "tool-trajectory",
+                                    "tool_trajectory"
+                                  ]
                                 },
                                 "mode": {
                                   "type": "string",
-                                  "enum": ["any_order", "in_order", "exact", "subset", "superset"]
+                                  "enum": [
+                                    "any_order",
+                                    "in_order",
+                                    "exact",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 "minimums": {
                                   "type": "object",
@@ -10871,7 +12199,12 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": ["exact", "ignore", "subset", "superset"]
+                                            "enum": [
+                                              "exact",
+                                              "ignore",
+                                              "subset",
+                                              "superset"
+                                            ]
                                           },
                                           {
                                             "type": "array",
@@ -10885,7 +12218,12 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": ["exact", "ignore", "subset", "superset"]
+                                            "enum": [
+                                              "exact",
+                                              "ignore",
+                                              "subset",
+                                              "superset"
+                                            ]
                                           },
                                           {
                                             "type": "array",
@@ -10896,7 +12234,9 @@
                                         ]
                                       }
                                     },
-                                    "required": ["tool"],
+                                    "required": [
+                                      "tool"
+                                    ],
                                     "additionalProperties": false
                                   }
                                 },
@@ -10904,7 +12244,12 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": ["exact", "ignore", "subset", "superset"]
+                                      "enum": [
+                                        "exact",
+                                        "ignore",
+                                        "subset",
+                                        "superset"
+                                      ]
                                     },
                                     {
                                       "type": "array",
@@ -10918,7 +12263,12 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": ["exact", "ignore", "subset", "superset"]
+                                      "enum": [
+                                        "exact",
+                                        "ignore",
+                                        "subset",
+                                        "superset"
+                                      ]
                                     },
                                     {
                                       "type": "array",
@@ -10929,7 +12279,10 @@
                                   ]
                                 }
                               },
-                              "required": ["type", "mode"],
+                              "required": [
+                                "type",
+                                "mode"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -10959,7 +12312,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["field-accuracy", "field_accuracy"]
+                                  "enum": [
+                                    "field-accuracy",
+                                    "field_accuracy"
+                                  ]
                                 },
                                 "fields": {
                                   "type": "array",
@@ -10971,7 +12327,11 @@
                                       },
                                       "match": {
                                         "type": "string",
-                                        "enum": ["exact", "numeric_tolerance", "date"]
+                                        "enum": [
+                                          "exact",
+                                          "numeric_tolerance",
+                                          "date"
+                                        ]
                                       },
                                       "required": {
                                         "type": "boolean"
@@ -10993,17 +12353,26 @@
                                         }
                                       }
                                     },
-                                    "required": ["path", "match"],
+                                    "required": [
+                                      "path",
+                                      "match"
+                                    ],
                                     "additionalProperties": false
                                   },
                                   "minItems": 1
                                 },
                                 "aggregation": {
                                   "type": "string",
-                                  "enum": ["weighted_average", "all_or_nothing"]
+                                  "enum": [
+                                    "weighted_average",
+                                    "all_or_nothing"
+                                  ]
                                 }
                               },
-                              "required": ["type", "fields"],
+                              "required": [
+                                "type",
+                                "fields"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -11040,7 +12409,10 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type", "threshold"],
+                              "required": [
+                                "type",
+                                "threshold"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -11077,7 +12449,10 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type", "budget"],
+                              "required": [
+                                "type",
+                                "budget"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -11107,7 +12482,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["token-usage", "token_usage"]
+                                  "enum": [
+                                    "token-usage",
+                                    "token_usage"
+                                  ]
                                 },
                                 "max_total": {
                                   "type": "number",
@@ -11122,7 +12500,9 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -11152,7 +12532,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["execution-metrics", "execution_metrics"]
+                                  "enum": [
+                                    "execution-metrics",
+                                    "execution_metrics"
+                                  ]
                                 },
                                 "max_tool_calls": {
                                   "type": "number",
@@ -11184,7 +12567,9 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -11220,7 +12605,10 @@
                                   "type": "string"
                                 }
                               },
-                              "required": ["type", "value"],
+                              "required": [
+                                "type",
+                                "value"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -11256,7 +12644,10 @@
                                   "type": "string"
                                 }
                               },
-                              "required": ["type", "value"],
+                              "required": [
+                                "type",
+                                "value"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -11286,10 +12677,15 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["is-json", "is_json"]
+                                  "enum": [
+                                    "is-json",
+                                    "is_json"
+                                  ]
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -11325,7 +12721,10 @@
                                   "type": "string"
                                 }
                               },
-                              "required": ["type", "value"],
+                              "required": [
+                                "type",
+                                "value"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -11406,7 +12805,10 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": ["score_range", "outcome"],
+                                          "required": [
+                                            "score_range",
+                                            "outcome"
+                                          ],
                                           "additionalProperties": false
                                         }
                                       }
@@ -11416,7 +12818,10 @@
                                   "minItems": 1
                                 }
                               },
-                              "required": ["type", "criteria"],
+                              "required": [
+                                "type",
+                                "criteria"
+                              ],
                               "additionalProperties": false
                             }
                           ]
@@ -11453,7 +12858,12 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
+                                  "enum": [
+                                    "code-grader",
+                                    "code_grader",
+                                    "code-judge",
+                                    "code_judge"
+                                  ]
                                 },
                                 "command": {
                                   "anyOf": [
@@ -11505,7 +12915,10 @@
                                   "additionalProperties": {}
                                 }
                               },
-                              "required": ["type", "command"],
+                              "required": [
+                                "type",
+                                "command"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -11535,7 +12948,12 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
+                                  "enum": [
+                                    "llm-grader",
+                                    "llm_grader",
+                                    "llm-judge",
+                                    "llm_judge"
+                                  ]
                                 },
                                 "prompt": {
                                   "anyOf": [
@@ -11629,7 +13047,10 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": ["score_range", "outcome"],
+                                          "required": [
+                                            "score_range",
+                                            "outcome"
+                                          ],
                                           "additionalProperties": false
                                         }
                                       }
@@ -11658,7 +13079,9 @@
                                   "maximum": 2
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -11718,7 +13141,9 @@
                                           }
                                         }
                                       },
-                                      "required": ["type"],
+                                      "required": [
+                                        "type"
+                                      ],
                                       "additionalProperties": false
                                     },
                                     {
@@ -11734,7 +13159,10 @@
                                           "maximum": 1
                                         }
                                       },
-                                      "required": ["type", "threshold"],
+                                      "required": [
+                                        "type",
+                                        "threshold"
+                                      ],
                                       "additionalProperties": false
                                     },
                                     {
@@ -11751,7 +13179,10 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": ["type", "path"],
+                                      "required": [
+                                        "type",
+                                        "path"
+                                      ],
                                       "additionalProperties": false
                                     },
                                     {
@@ -11768,13 +13199,18 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": ["type"],
+                                      "required": [
+                                        "type"
+                                      ],
                                       "additionalProperties": false
                                     }
                                   ]
                                 }
                               },
-                              "required": ["type", "aggregator"],
+                              "required": [
+                                "type",
+                                "aggregator"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -11804,11 +13240,20 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["tool-trajectory", "tool_trajectory"]
+                                  "enum": [
+                                    "tool-trajectory",
+                                    "tool_trajectory"
+                                  ]
                                 },
                                 "mode": {
                                   "type": "string",
-                                  "enum": ["any_order", "in_order", "exact", "subset", "superset"]
+                                  "enum": [
+                                    "any_order",
+                                    "in_order",
+                                    "exact",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 "minimums": {
                                   "type": "object",
@@ -11849,7 +13294,12 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": ["exact", "ignore", "subset", "superset"]
+                                            "enum": [
+                                              "exact",
+                                              "ignore",
+                                              "subset",
+                                              "superset"
+                                            ]
                                           },
                                           {
                                             "type": "array",
@@ -11863,7 +13313,12 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": ["exact", "ignore", "subset", "superset"]
+                                            "enum": [
+                                              "exact",
+                                              "ignore",
+                                              "subset",
+                                              "superset"
+                                            ]
                                           },
                                           {
                                             "type": "array",
@@ -11874,7 +13329,9 @@
                                         ]
                                       }
                                     },
-                                    "required": ["tool"],
+                                    "required": [
+                                      "tool"
+                                    ],
                                     "additionalProperties": false
                                   }
                                 },
@@ -11882,7 +13339,12 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": ["exact", "ignore", "subset", "superset"]
+                                      "enum": [
+                                        "exact",
+                                        "ignore",
+                                        "subset",
+                                        "superset"
+                                      ]
                                     },
                                     {
                                       "type": "array",
@@ -11896,7 +13358,12 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": ["exact", "ignore", "subset", "superset"]
+                                      "enum": [
+                                        "exact",
+                                        "ignore",
+                                        "subset",
+                                        "superset"
+                                      ]
                                     },
                                     {
                                       "type": "array",
@@ -11907,7 +13374,10 @@
                                   ]
                                 }
                               },
-                              "required": ["type", "mode"],
+                              "required": [
+                                "type",
+                                "mode"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -11937,7 +13407,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["field-accuracy", "field_accuracy"]
+                                  "enum": [
+                                    "field-accuracy",
+                                    "field_accuracy"
+                                  ]
                                 },
                                 "fields": {
                                   "type": "array",
@@ -11949,7 +13422,11 @@
                                       },
                                       "match": {
                                         "type": "string",
-                                        "enum": ["exact", "numeric_tolerance", "date"]
+                                        "enum": [
+                                          "exact",
+                                          "numeric_tolerance",
+                                          "date"
+                                        ]
                                       },
                                       "required": {
                                         "type": "boolean"
@@ -11971,17 +13448,26 @@
                                         }
                                       }
                                     },
-                                    "required": ["path", "match"],
+                                    "required": [
+                                      "path",
+                                      "match"
+                                    ],
                                     "additionalProperties": false
                                   },
                                   "minItems": 1
                                 },
                                 "aggregation": {
                                   "type": "string",
-                                  "enum": ["weighted_average", "all_or_nothing"]
+                                  "enum": [
+                                    "weighted_average",
+                                    "all_or_nothing"
+                                  ]
                                 }
                               },
-                              "required": ["type", "fields"],
+                              "required": [
+                                "type",
+                                "fields"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -12018,7 +13504,10 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type", "threshold"],
+                              "required": [
+                                "type",
+                                "threshold"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -12055,7 +13544,10 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type", "budget"],
+                              "required": [
+                                "type",
+                                "budget"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -12085,7 +13577,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["token-usage", "token_usage"]
+                                  "enum": [
+                                    "token-usage",
+                                    "token_usage"
+                                  ]
                                 },
                                 "max_total": {
                                   "type": "number",
@@ -12100,7 +13595,9 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -12130,7 +13627,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["execution-metrics", "execution_metrics"]
+                                  "enum": [
+                                    "execution-metrics",
+                                    "execution_metrics"
+                                  ]
                                 },
                                 "max_tool_calls": {
                                   "type": "number",
@@ -12162,7 +13662,9 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -12198,7 +13700,10 @@
                                   "type": "string"
                                 }
                               },
-                              "required": ["type", "value"],
+                              "required": [
+                                "type",
+                                "value"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -12234,7 +13739,10 @@
                                   "type": "string"
                                 }
                               },
-                              "required": ["type", "value"],
+                              "required": [
+                                "type",
+                                "value"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -12264,10 +13772,15 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": ["is-json", "is_json"]
+                                  "enum": [
+                                    "is-json",
+                                    "is_json"
+                                  ]
                                 }
                               },
-                              "required": ["type"],
+                              "required": [
+                                "type"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -12303,7 +13816,10 @@
                                   "type": "string"
                                 }
                               },
-                              "required": ["type", "value"],
+                              "required": [
+                                "type",
+                                "value"
+                              ],
                               "additionalProperties": false
                             },
                             {
@@ -12384,7 +13900,10 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": ["score_range", "outcome"],
+                                          "required": [
+                                            "score_range",
+                                            "outcome"
+                                          ],
                                           "additionalProperties": false
                                         }
                                       }
@@ -12394,7 +13913,10 @@
                                   "minItems": 1
                                 }
                               },
-                              "required": ["type", "criteria"],
+                              "required": [
+                                "type",
+                                "criteria"
+                              ],
                               "additionalProperties": false
                             }
                           ]
@@ -12415,7 +13937,11 @@
                           },
                           "strategy": {
                             "type": "string",
-                            "enum": ["pass_at_k", "mean", "confidence_interval"]
+                            "enum": [
+                              "pass_at_k",
+                              "mean",
+                              "confidence_interval"
+                            ]
                           },
                           "cost_limit_usd": {
                             "type": "number",
@@ -12426,7 +13952,9 @@
                             "minimum": 0
                           }
                         },
-                        "required": ["count"],
+                        "required": [
+                          "count"
+                        ],
                         "additionalProperties": false
                       },
                       "total_budget_usd": {
@@ -12442,6 +13970,11 @@
                       },
                       "failOnError": {
                         "type": "boolean"
+                      },
+                      "threshold": {
+                        "type": "number",
+                        "minimum": 0,
+                        "maximum": 1
                       }
                     },
                     "additionalProperties": false
@@ -12454,7 +13987,10 @@
                       },
                       "isolation": {
                         "type": "string",
-                        "enum": ["shared", "per_test"]
+                        "enum": [
+                          "shared",
+                          "per_test"
+                        ]
                       },
                       "repos": {
                         "type": "array",
@@ -12478,7 +14014,10 @@
                                       "format": "uri"
                                     }
                                   },
-                                  "required": ["type", "url"],
+                                  "required": [
+                                    "type",
+                                    "url"
+                                  ],
                                   "additionalProperties": false
                                 },
                                 {
@@ -12492,7 +14031,10 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": ["type", "path"],
+                                  "required": [
+                                    "type",
+                                    "path"
+                                  ],
                                   "additionalProperties": false
                                 }
                               ]
@@ -12505,7 +14047,10 @@
                                 },
                                 "resolve": {
                                   "type": "string",
-                                  "enum": ["remote", "local"]
+                                  "enum": [
+                                    "remote",
+                                    "local"
+                                  ]
                                 },
                                 "ancestor": {
                                   "type": "integer",
@@ -12534,7 +14079,10 @@
                               "additionalProperties": false
                             }
                           },
-                          "required": ["path", "source"],
+                          "required": [
+                            "path",
+                            "source"
+                          ],
                           "additionalProperties": false
                         }
                       },
@@ -12570,7 +14118,11 @@
                               },
                               "reset": {
                                 "type": "string",
-                                "enum": ["none", "fast", "strict"]
+                                "enum": [
+                                  "none",
+                                  "fast",
+                                  "strict"
+                                ]
                               }
                             },
                             "additionalProperties": false
@@ -12601,7 +14153,11 @@
                               },
                               "reset": {
                                 "type": "string",
-                                "enum": ["none", "fast", "strict"]
+                                "enum": [
+                                  "none",
+                                  "fast",
+                                  "strict"
+                                ]
                               }
                             },
                             "additionalProperties": false
@@ -12632,7 +14188,11 @@
                               },
                               "reset": {
                                 "type": "string",
-                                "enum": ["none", "fast", "strict"]
+                                "enum": [
+                                  "none",
+                                  "fast",
+                                  "strict"
+                                ]
                               }
                             },
                             "additionalProperties": false
@@ -12663,7 +14223,11 @@
                               },
                               "reset": {
                                 "type": "string",
-                                "enum": ["none", "fast", "strict"]
+                                "enum": [
+                                  "none",
+                                  "fast",
+                                  "strict"
+                                ]
                               }
                             },
                             "additionalProperties": false
@@ -12673,7 +14237,11 @@
                       },
                       "mode": {
                         "type": "string",
-                        "enum": ["pooled", "temp", "static"]
+                        "enum": [
+                          "pooled",
+                          "temp",
+                          "static"
+                        ]
                       },
                       "path": {
                         "type": "string"
@@ -12695,7 +14263,9 @@
                     "type": "string"
                   }
                 },
-                "required": ["id"],
+                "required": [
+                  "id"
+                ],
                 "additionalProperties": false
               }
             },
@@ -12755,7 +14325,12 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
+                        "enum": [
+                          "code-grader",
+                          "code_grader",
+                          "code-judge",
+                          "code_judge"
+                        ]
                       },
                       "command": {
                         "anyOf": [
@@ -12807,7 +14382,10 @@
                         "additionalProperties": {}
                       }
                     },
-                    "required": ["type", "command"],
+                    "required": [
+                      "type",
+                      "command"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -12837,7 +14415,12 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
+                        "enum": [
+                          "llm-grader",
+                          "llm_grader",
+                          "llm-judge",
+                          "llm_judge"
+                        ]
                       },
                       "prompt": {
                         "anyOf": [
@@ -12931,7 +14514,10 @@
                                     "minLength": 1
                                   }
                                 },
-                                "required": ["score_range", "outcome"],
+                                "required": [
+                                  "score_range",
+                                  "outcome"
+                                ],
                                 "additionalProperties": false
                               }
                             }
@@ -12960,7 +14546,9 @@
                         "maximum": 2
                       }
                     },
-                    "required": ["type"],
+                    "required": [
+                      "type"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -13020,7 +14608,9 @@
                                 }
                               }
                             },
-                            "required": ["type"],
+                            "required": [
+                              "type"
+                            ],
                             "additionalProperties": false
                           },
                           {
@@ -13036,7 +14626,10 @@
                                 "maximum": 1
                               }
                             },
-                            "required": ["type", "threshold"],
+                            "required": [
+                              "type",
+                              "threshold"
+                            ],
                             "additionalProperties": false
                           },
                           {
@@ -13053,7 +14646,10 @@
                                 "type": "string"
                               }
                             },
-                            "required": ["type", "path"],
+                            "required": [
+                              "type",
+                              "path"
+                            ],
                             "additionalProperties": false
                           },
                           {
@@ -13070,13 +14666,18 @@
                                 "type": "string"
                               }
                             },
-                            "required": ["type"],
+                            "required": [
+                              "type"
+                            ],
                             "additionalProperties": false
                           }
                         ]
                       }
                     },
-                    "required": ["type", "aggregator"],
+                    "required": [
+                      "type",
+                      "aggregator"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -13106,11 +14707,20 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": ["tool-trajectory", "tool_trajectory"]
+                        "enum": [
+                          "tool-trajectory",
+                          "tool_trajectory"
+                        ]
                       },
                       "mode": {
                         "type": "string",
-                        "enum": ["any_order", "in_order", "exact", "subset", "superset"]
+                        "enum": [
+                          "any_order",
+                          "in_order",
+                          "exact",
+                          "subset",
+                          "superset"
+                        ]
                       },
                       "minimums": {
                         "type": "object",
@@ -13151,7 +14761,12 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": ["exact", "ignore", "subset", "superset"]
+                                  "enum": [
+                                    "exact",
+                                    "ignore",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 {
                                   "type": "array",
@@ -13165,7 +14780,12 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": ["exact", "ignore", "subset", "superset"]
+                                  "enum": [
+                                    "exact",
+                                    "ignore",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 {
                                   "type": "array",
@@ -13176,7 +14796,9 @@
                               ]
                             }
                           },
-                          "required": ["tool"],
+                          "required": [
+                            "tool"
+                          ],
                           "additionalProperties": false
                         }
                       },
@@ -13184,7 +14806,12 @@
                         "anyOf": [
                           {
                             "type": "string",
-                            "enum": ["exact", "ignore", "subset", "superset"]
+                            "enum": [
+                              "exact",
+                              "ignore",
+                              "subset",
+                              "superset"
+                            ]
                           },
                           {
                             "type": "array",
@@ -13198,7 +14825,12 @@
                         "anyOf": [
                           {
                             "type": "string",
-                            "enum": ["exact", "ignore", "subset", "superset"]
+                            "enum": [
+                              "exact",
+                              "ignore",
+                              "subset",
+                              "superset"
+                            ]
                           },
                           {
                             "type": "array",
@@ -13209,7 +14841,10 @@
                         ]
                       }
                     },
-                    "required": ["type", "mode"],
+                    "required": [
+                      "type",
+                      "mode"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -13239,7 +14874,10 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": ["field-accuracy", "field_accuracy"]
+                        "enum": [
+                          "field-accuracy",
+                          "field_accuracy"
+                        ]
                       },
                       "fields": {
                         "type": "array",
@@ -13251,7 +14889,11 @@
                             },
                             "match": {
                               "type": "string",
-                              "enum": ["exact", "numeric_tolerance", "date"]
+                              "enum": [
+                                "exact",
+                                "numeric_tolerance",
+                                "date"
+                              ]
                             },
                             "required": {
                               "type": "boolean"
@@ -13273,17 +14915,26 @@
                               }
                             }
                           },
-                          "required": ["path", "match"],
+                          "required": [
+                            "path",
+                            "match"
+                          ],
                           "additionalProperties": false
                         },
                         "minItems": 1
                       },
                       "aggregation": {
                         "type": "string",
-                        "enum": ["weighted_average", "all_or_nothing"]
+                        "enum": [
+                          "weighted_average",
+                          "all_or_nothing"
+                        ]
                       }
                     },
-                    "required": ["type", "fields"],
+                    "required": [
+                      "type",
+                      "fields"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -13320,7 +14971,10 @@
                         "minimum": 0
                       }
                     },
-                    "required": ["type", "threshold"],
+                    "required": [
+                      "type",
+                      "threshold"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -13357,7 +15011,10 @@
                         "minimum": 0
                       }
                     },
-                    "required": ["type", "budget"],
+                    "required": [
+                      "type",
+                      "budget"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -13387,7 +15044,10 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": ["token-usage", "token_usage"]
+                        "enum": [
+                          "token-usage",
+                          "token_usage"
+                        ]
                       },
                       "max_total": {
                         "type": "number",
@@ -13402,7 +15062,9 @@
                         "minimum": 0
                       }
                     },
-                    "required": ["type"],
+                    "required": [
+                      "type"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -13432,7 +15094,10 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": ["execution-metrics", "execution_metrics"]
+                        "enum": [
+                          "execution-metrics",
+                          "execution_metrics"
+                        ]
                       },
                       "max_tool_calls": {
                         "type": "number",
@@ -13464,7 +15129,9 @@
                         "minimum": 0
                       }
                     },
-                    "required": ["type"],
+                    "required": [
+                      "type"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -13500,7 +15167,10 @@
                         "type": "string"
                       }
                     },
-                    "required": ["type", "value"],
+                    "required": [
+                      "type",
+                      "value"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -13536,7 +15206,10 @@
                         "type": "string"
                       }
                     },
-                    "required": ["type", "value"],
+                    "required": [
+                      "type",
+                      "value"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -13566,10 +15239,15 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": ["is-json", "is_json"]
+                        "enum": [
+                          "is-json",
+                          "is_json"
+                        ]
                       }
                     },
-                    "required": ["type"],
+                    "required": [
+                      "type"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -13605,7 +15283,10 @@
                         "type": "string"
                       }
                     },
-                    "required": ["type", "value"],
+                    "required": [
+                      "type",
+                      "value"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -13686,7 +15367,10 @@
                                     "minLength": 1
                                   }
                                 },
-                                "required": ["score_range", "outcome"],
+                                "required": [
+                                  "score_range",
+                                  "outcome"
+                                ],
                                 "additionalProperties": false
                               }
                             }
@@ -13696,7 +15380,10 @@
                         "minItems": 1
                       }
                     },
-                    "required": ["type", "criteria"],
+                    "required": [
+                      "type",
+                      "criteria"
+                    ],
                     "additionalProperties": false
                   }
                 ]
@@ -13733,7 +15420,12 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
+                        "enum": [
+                          "code-grader",
+                          "code_grader",
+                          "code-judge",
+                          "code_judge"
+                        ]
                       },
                       "command": {
                         "anyOf": [
@@ -13785,7 +15477,10 @@
                         "additionalProperties": {}
                       }
                     },
-                    "required": ["type", "command"],
+                    "required": [
+                      "type",
+                      "command"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -13815,7 +15510,12 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
+                        "enum": [
+                          "llm-grader",
+                          "llm_grader",
+                          "llm-judge",
+                          "llm_judge"
+                        ]
                       },
                       "prompt": {
                         "anyOf": [
@@ -13909,7 +15609,10 @@
                                     "minLength": 1
                                   }
                                 },
-                                "required": ["score_range", "outcome"],
+                                "required": [
+                                  "score_range",
+                                  "outcome"
+                                ],
                                 "additionalProperties": false
                               }
                             }
@@ -13938,7 +15641,9 @@
                         "maximum": 2
                       }
                     },
-                    "required": ["type"],
+                    "required": [
+                      "type"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -13998,7 +15703,9 @@
                                 }
                               }
                             },
-                            "required": ["type"],
+                            "required": [
+                              "type"
+                            ],
                             "additionalProperties": false
                           },
                           {
@@ -14014,7 +15721,10 @@
                                 "maximum": 1
                               }
                             },
-                            "required": ["type", "threshold"],
+                            "required": [
+                              "type",
+                              "threshold"
+                            ],
                             "additionalProperties": false
                           },
                           {
@@ -14031,7 +15741,10 @@
                                 "type": "string"
                               }
                             },
-                            "required": ["type", "path"],
+                            "required": [
+                              "type",
+                              "path"
+                            ],
                             "additionalProperties": false
                           },
                           {
@@ -14048,13 +15761,18 @@
                                 "type": "string"
                               }
                             },
-                            "required": ["type"],
+                            "required": [
+                              "type"
+                            ],
                             "additionalProperties": false
                           }
                         ]
                       }
                     },
-                    "required": ["type", "aggregator"],
+                    "required": [
+                      "type",
+                      "aggregator"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -14084,11 +15802,20 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": ["tool-trajectory", "tool_trajectory"]
+                        "enum": [
+                          "tool-trajectory",
+                          "tool_trajectory"
+                        ]
                       },
                       "mode": {
                         "type": "string",
-                        "enum": ["any_order", "in_order", "exact", "subset", "superset"]
+                        "enum": [
+                          "any_order",
+                          "in_order",
+                          "exact",
+                          "subset",
+                          "superset"
+                        ]
                       },
                       "minimums": {
                         "type": "object",
@@ -14129,7 +15856,12 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": ["exact", "ignore", "subset", "superset"]
+                                  "enum": [
+                                    "exact",
+                                    "ignore",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 {
                                   "type": "array",
@@ -14143,7 +15875,12 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": ["exact", "ignore", "subset", "superset"]
+                                  "enum": [
+                                    "exact",
+                                    "ignore",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 {
                                   "type": "array",
@@ -14154,7 +15891,9 @@
                               ]
                             }
                           },
-                          "required": ["tool"],
+                          "required": [
+                            "tool"
+                          ],
                           "additionalProperties": false
                         }
                       },
@@ -14162,7 +15901,12 @@
                         "anyOf": [
                           {
                             "type": "string",
-                            "enum": ["exact", "ignore", "subset", "superset"]
+                            "enum": [
+                              "exact",
+                              "ignore",
+                              "subset",
+                              "superset"
+                            ]
                           },
                           {
                             "type": "array",
@@ -14176,7 +15920,12 @@
                         "anyOf": [
                           {
                             "type": "string",
-                            "enum": ["exact", "ignore", "subset", "superset"]
+                            "enum": [
+                              "exact",
+                              "ignore",
+                              "subset",
+                              "superset"
+                            ]
                           },
                           {
                             "type": "array",
@@ -14187,7 +15936,10 @@
                         ]
                       }
                     },
-                    "required": ["type", "mode"],
+                    "required": [
+                      "type",
+                      "mode"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -14217,7 +15969,10 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": ["field-accuracy", "field_accuracy"]
+                        "enum": [
+                          "field-accuracy",
+                          "field_accuracy"
+                        ]
                       },
                       "fields": {
                         "type": "array",
@@ -14229,7 +15984,11 @@
                             },
                             "match": {
                               "type": "string",
-                              "enum": ["exact", "numeric_tolerance", "date"]
+                              "enum": [
+                                "exact",
+                                "numeric_tolerance",
+                                "date"
+                              ]
                             },
                             "required": {
                               "type": "boolean"
@@ -14251,17 +16010,26 @@
                               }
                             }
                           },
-                          "required": ["path", "match"],
+                          "required": [
+                            "path",
+                            "match"
+                          ],
                           "additionalProperties": false
                         },
                         "minItems": 1
                       },
                       "aggregation": {
                         "type": "string",
-                        "enum": ["weighted_average", "all_or_nothing"]
+                        "enum": [
+                          "weighted_average",
+                          "all_or_nothing"
+                        ]
                       }
                     },
-                    "required": ["type", "fields"],
+                    "required": [
+                      "type",
+                      "fields"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -14298,7 +16066,10 @@
                         "minimum": 0
                       }
                     },
-                    "required": ["type", "threshold"],
+                    "required": [
+                      "type",
+                      "threshold"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -14335,7 +16106,10 @@
                         "minimum": 0
                       }
                     },
-                    "required": ["type", "budget"],
+                    "required": [
+                      "type",
+                      "budget"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -14365,7 +16139,10 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": ["token-usage", "token_usage"]
+                        "enum": [
+                          "token-usage",
+                          "token_usage"
+                        ]
                       },
                       "max_total": {
                         "type": "number",
@@ -14380,7 +16157,9 @@
                         "minimum": 0
                       }
                     },
-                    "required": ["type"],
+                    "required": [
+                      "type"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -14410,7 +16189,10 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": ["execution-metrics", "execution_metrics"]
+                        "enum": [
+                          "execution-metrics",
+                          "execution_metrics"
+                        ]
                       },
                       "max_tool_calls": {
                         "type": "number",
@@ -14442,7 +16224,9 @@
                         "minimum": 0
                       }
                     },
-                    "required": ["type"],
+                    "required": [
+                      "type"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -14478,7 +16262,10 @@
                         "type": "string"
                       }
                     },
-                    "required": ["type", "value"],
+                    "required": [
+                      "type",
+                      "value"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -14514,7 +16301,10 @@
                         "type": "string"
                       }
                     },
-                    "required": ["type", "value"],
+                    "required": [
+                      "type",
+                      "value"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -14544,10 +16334,15 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": ["is-json", "is_json"]
+                        "enum": [
+                          "is-json",
+                          "is_json"
+                        ]
                       }
                     },
-                    "required": ["type"],
+                    "required": [
+                      "type"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -14583,7 +16378,10 @@
                         "type": "string"
                       }
                     },
-                    "required": ["type", "value"],
+                    "required": [
+                      "type",
+                      "value"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -14664,7 +16462,10 @@
                                     "minLength": 1
                                   }
                                 },
-                                "required": ["score_range", "outcome"],
+                                "required": [
+                                  "score_range",
+                                  "outcome"
+                                ],
                                 "additionalProperties": false
                               }
                             }
@@ -14674,7 +16475,10 @@
                         "minItems": 1
                       }
                     },
-                    "required": ["type", "criteria"],
+                    "required": [
+                      "type",
+                      "criteria"
+                    ],
                     "additionalProperties": false
                   }
                 ]
@@ -14711,7 +16515,12 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
+                        "enum": [
+                          "code-grader",
+                          "code_grader",
+                          "code-judge",
+                          "code_judge"
+                        ]
                       },
                       "command": {
                         "anyOf": [
@@ -14763,7 +16572,10 @@
                         "additionalProperties": {}
                       }
                     },
-                    "required": ["type", "command"],
+                    "required": [
+                      "type",
+                      "command"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -14793,7 +16605,12 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
+                        "enum": [
+                          "llm-grader",
+                          "llm_grader",
+                          "llm-judge",
+                          "llm_judge"
+                        ]
                       },
                       "prompt": {
                         "anyOf": [
@@ -14887,7 +16704,10 @@
                                     "minLength": 1
                                   }
                                 },
-                                "required": ["score_range", "outcome"],
+                                "required": [
+                                  "score_range",
+                                  "outcome"
+                                ],
                                 "additionalProperties": false
                               }
                             }
@@ -14916,7 +16736,9 @@
                         "maximum": 2
                       }
                     },
-                    "required": ["type"],
+                    "required": [
+                      "type"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -14976,7 +16798,9 @@
                                 }
                               }
                             },
-                            "required": ["type"],
+                            "required": [
+                              "type"
+                            ],
                             "additionalProperties": false
                           },
                           {
@@ -14992,7 +16816,10 @@
                                 "maximum": 1
                               }
                             },
-                            "required": ["type", "threshold"],
+                            "required": [
+                              "type",
+                              "threshold"
+                            ],
                             "additionalProperties": false
                           },
                           {
@@ -15009,7 +16836,10 @@
                                 "type": "string"
                               }
                             },
-                            "required": ["type", "path"],
+                            "required": [
+                              "type",
+                              "path"
+                            ],
                             "additionalProperties": false
                           },
                           {
@@ -15026,13 +16856,18 @@
                                 "type": "string"
                               }
                             },
-                            "required": ["type"],
+                            "required": [
+                              "type"
+                            ],
                             "additionalProperties": false
                           }
                         ]
                       }
                     },
-                    "required": ["type", "aggregator"],
+                    "required": [
+                      "type",
+                      "aggregator"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -15062,11 +16897,20 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": ["tool-trajectory", "tool_trajectory"]
+                        "enum": [
+                          "tool-trajectory",
+                          "tool_trajectory"
+                        ]
                       },
                       "mode": {
                         "type": "string",
-                        "enum": ["any_order", "in_order", "exact", "subset", "superset"]
+                        "enum": [
+                          "any_order",
+                          "in_order",
+                          "exact",
+                          "subset",
+                          "superset"
+                        ]
                       },
                       "minimums": {
                         "type": "object",
@@ -15107,7 +16951,12 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": ["exact", "ignore", "subset", "superset"]
+                                  "enum": [
+                                    "exact",
+                                    "ignore",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 {
                                   "type": "array",
@@ -15121,7 +16970,12 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": ["exact", "ignore", "subset", "superset"]
+                                  "enum": [
+                                    "exact",
+                                    "ignore",
+                                    "subset",
+                                    "superset"
+                                  ]
                                 },
                                 {
                                   "type": "array",
@@ -15132,7 +16986,9 @@
                               ]
                             }
                           },
-                          "required": ["tool"],
+                          "required": [
+                            "tool"
+                          ],
                           "additionalProperties": false
                         }
                       },
@@ -15140,7 +16996,12 @@
                         "anyOf": [
                           {
                             "type": "string",
-                            "enum": ["exact", "ignore", "subset", "superset"]
+                            "enum": [
+                              "exact",
+                              "ignore",
+                              "subset",
+                              "superset"
+                            ]
                           },
                           {
                             "type": "array",
@@ -15154,7 +17015,12 @@
                         "anyOf": [
                           {
                             "type": "string",
-                            "enum": ["exact", "ignore", "subset", "superset"]
+                            "enum": [
+                              "exact",
+                              "ignore",
+                              "subset",
+                              "superset"
+                            ]
                           },
                           {
                             "type": "array",
@@ -15165,7 +17031,10 @@
                         ]
                       }
                     },
-                    "required": ["type", "mode"],
+                    "required": [
+                      "type",
+                      "mode"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -15195,7 +17064,10 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": ["field-accuracy", "field_accuracy"]
+                        "enum": [
+                          "field-accuracy",
+                          "field_accuracy"
+                        ]
                       },
                       "fields": {
                         "type": "array",
@@ -15207,7 +17079,11 @@
                             },
                             "match": {
                               "type": "string",
-                              "enum": ["exact", "numeric_tolerance", "date"]
+                              "enum": [
+                                "exact",
+                                "numeric_tolerance",
+                                "date"
+                              ]
                             },
                             "required": {
                               "type": "boolean"
@@ -15229,17 +17105,26 @@
                               }
                             }
                           },
-                          "required": ["path", "match"],
+                          "required": [
+                            "path",
+                            "match"
+                          ],
                           "additionalProperties": false
                         },
                         "minItems": 1
                       },
                       "aggregation": {
                         "type": "string",
-                        "enum": ["weighted_average", "all_or_nothing"]
+                        "enum": [
+                          "weighted_average",
+                          "all_or_nothing"
+                        ]
                       }
                     },
-                    "required": ["type", "fields"],
+                    "required": [
+                      "type",
+                      "fields"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -15276,7 +17161,10 @@
                         "minimum": 0
                       }
                     },
-                    "required": ["type", "threshold"],
+                    "required": [
+                      "type",
+                      "threshold"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -15313,7 +17201,10 @@
                         "minimum": 0
                       }
                     },
-                    "required": ["type", "budget"],
+                    "required": [
+                      "type",
+                      "budget"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -15343,7 +17234,10 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": ["token-usage", "token_usage"]
+                        "enum": [
+                          "token-usage",
+                          "token_usage"
+                        ]
                       },
                       "max_total": {
                         "type": "number",
@@ -15358,7 +17252,9 @@
                         "minimum": 0
                       }
                     },
-                    "required": ["type"],
+                    "required": [
+                      "type"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -15388,7 +17284,10 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": ["execution-metrics", "execution_metrics"]
+                        "enum": [
+                          "execution-metrics",
+                          "execution_metrics"
+                        ]
                       },
                       "max_tool_calls": {
                         "type": "number",
@@ -15420,7 +17319,9 @@
                         "minimum": 0
                       }
                     },
-                    "required": ["type"],
+                    "required": [
+                      "type"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -15456,7 +17357,10 @@
                         "type": "string"
                       }
                     },
-                    "required": ["type", "value"],
+                    "required": [
+                      "type",
+                      "value"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -15492,7 +17396,10 @@
                         "type": "string"
                       }
                     },
-                    "required": ["type", "value"],
+                    "required": [
+                      "type",
+                      "value"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -15522,10 +17429,15 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": ["is-json", "is_json"]
+                        "enum": [
+                          "is-json",
+                          "is_json"
+                        ]
                       }
                     },
-                    "required": ["type"],
+                    "required": [
+                      "type"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -15561,7 +17473,10 @@
                         "type": "string"
                       }
                     },
-                    "required": ["type", "value"],
+                    "required": [
+                      "type",
+                      "value"
+                    ],
                     "additionalProperties": false
                   },
                   {
@@ -15642,7 +17557,10 @@
                                     "minLength": 1
                                   }
                                 },
-                                "required": ["score_range", "outcome"],
+                                "required": [
+                                  "score_range",
+                                  "outcome"
+                                ],
                                 "additionalProperties": false
                               }
                             }
@@ -15652,7 +17570,10 @@
                         "minItems": 1
                       }
                     },
-                    "required": ["type", "criteria"],
+                    "required": [
+                      "type",
+                      "criteria"
+                    ],
                     "additionalProperties": false
                   }
                 ]
@@ -15673,7 +17594,11 @@
                 },
                 "strategy": {
                   "type": "string",
-                  "enum": ["pass_at_k", "mean", "confidence_interval"]
+                  "enum": [
+                    "pass_at_k",
+                    "mean",
+                    "confidence_interval"
+                  ]
                 },
                 "cost_limit_usd": {
                   "type": "number",
@@ -15684,7 +17609,9 @@
                   "minimum": 0
                 }
               },
-              "required": ["count"],
+              "required": [
+                "count"
+              ],
               "additionalProperties": false
             },
             "total_budget_usd": {
@@ -15700,6 +17627,11 @@
             },
             "failOnError": {
               "type": "boolean"
+            },
+            "threshold": {
+              "type": "number",
+              "minimum": 0,
+              "maximum": 1
             }
           },
           "additionalProperties": false
@@ -15735,7 +17667,12 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
+                    "enum": [
+                      "code-grader",
+                      "code_grader",
+                      "code-judge",
+                      "code_judge"
+                    ]
                   },
                   "command": {
                     "anyOf": [
@@ -15787,7 +17724,10 @@
                     "additionalProperties": {}
                   }
                 },
-                "required": ["type", "command"],
+                "required": [
+                  "type",
+                  "command"
+                ],
                 "additionalProperties": false
               },
               {
@@ -15817,7 +17757,12 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
+                    "enum": [
+                      "llm-grader",
+                      "llm_grader",
+                      "llm-judge",
+                      "llm_judge"
+                    ]
                   },
                   "prompt": {
                     "anyOf": [
@@ -15911,7 +17856,10 @@
                                 "minLength": 1
                               }
                             },
-                            "required": ["score_range", "outcome"],
+                            "required": [
+                              "score_range",
+                              "outcome"
+                            ],
                             "additionalProperties": false
                           }
                         }
@@ -15940,7 +17888,9 @@
                     "maximum": 2
                   }
                 },
-                "required": ["type"],
+                "required": [
+                  "type"
+                ],
                 "additionalProperties": false
               },
               {
@@ -16000,7 +17950,9 @@
                             }
                           }
                         },
-                        "required": ["type"],
+                        "required": [
+                          "type"
+                        ],
                         "additionalProperties": false
                       },
                       {
@@ -16016,7 +17968,10 @@
                             "maximum": 1
                           }
                         },
-                        "required": ["type", "threshold"],
+                        "required": [
+                          "type",
+                          "threshold"
+                        ],
                         "additionalProperties": false
                       },
                       {
@@ -16033,7 +17988,10 @@
                             "type": "string"
                           }
                         },
-                        "required": ["type", "path"],
+                        "required": [
+                          "type",
+                          "path"
+                        ],
                         "additionalProperties": false
                       },
                       {
@@ -16050,13 +18008,18 @@
                             "type": "string"
                           }
                         },
-                        "required": ["type"],
+                        "required": [
+                          "type"
+                        ],
                         "additionalProperties": false
                       }
                     ]
                   }
                 },
-                "required": ["type", "aggregator"],
+                "required": [
+                  "type",
+                  "aggregator"
+                ],
                 "additionalProperties": false
               },
               {
@@ -16086,11 +18049,20 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": ["tool-trajectory", "tool_trajectory"]
+                    "enum": [
+                      "tool-trajectory",
+                      "tool_trajectory"
+                    ]
                   },
                   "mode": {
                     "type": "string",
-                    "enum": ["any_order", "in_order", "exact", "subset", "superset"]
+                    "enum": [
+                      "any_order",
+                      "in_order",
+                      "exact",
+                      "subset",
+                      "superset"
+                    ]
                   },
                   "minimums": {
                     "type": "object",
@@ -16131,7 +18103,12 @@
                           "anyOf": [
                             {
                               "type": "string",
-                              "enum": ["exact", "ignore", "subset", "superset"]
+                              "enum": [
+                                "exact",
+                                "ignore",
+                                "subset",
+                                "superset"
+                              ]
                             },
                             {
                               "type": "array",
@@ -16145,7 +18122,12 @@
                           "anyOf": [
                             {
                               "type": "string",
-                              "enum": ["exact", "ignore", "subset", "superset"]
+                              "enum": [
+                                "exact",
+                                "ignore",
+                                "subset",
+                                "superset"
+                              ]
                             },
                             {
                               "type": "array",
@@ -16156,7 +18138,9 @@
                           ]
                         }
                       },
-                      "required": ["tool"],
+                      "required": [
+                        "tool"
+                      ],
                       "additionalProperties": false
                     }
                   },
@@ -16164,7 +18148,12 @@
                     "anyOf": [
                       {
                         "type": "string",
-                        "enum": ["exact", "ignore", "subset", "superset"]
+                        "enum": [
+                          "exact",
+                          "ignore",
+                          "subset",
+                          "superset"
+                        ]
                       },
                       {
                         "type": "array",
@@ -16178,7 +18167,12 @@
                     "anyOf": [
                       {
                         "type": "string",
-                        "enum": ["exact", "ignore", "subset", "superset"]
+                        "enum": [
+                          "exact",
+                          "ignore",
+                          "subset",
+                          "superset"
+                        ]
                       },
                       {
                         "type": "array",
@@ -16189,7 +18183,10 @@
                     ]
                   }
                 },
-                "required": ["type", "mode"],
+                "required": [
+                  "type",
+                  "mode"
+                ],
                 "additionalProperties": false
               },
               {
@@ -16219,7 +18216,10 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": ["field-accuracy", "field_accuracy"]
+                    "enum": [
+                      "field-accuracy",
+                      "field_accuracy"
+                    ]
                   },
                   "fields": {
                     "type": "array",
@@ -16231,7 +18231,11 @@
                         },
                         "match": {
                           "type": "string",
-                          "enum": ["exact", "numeric_tolerance", "date"]
+                          "enum": [
+                            "exact",
+                            "numeric_tolerance",
+                            "date"
+                          ]
                         },
                         "required": {
                           "type": "boolean"
@@ -16253,17 +18257,26 @@
                           }
                         }
                       },
-                      "required": ["path", "match"],
+                      "required": [
+                        "path",
+                        "match"
+                      ],
                       "additionalProperties": false
                     },
                     "minItems": 1
                   },
                   "aggregation": {
                     "type": "string",
-                    "enum": ["weighted_average", "all_or_nothing"]
+                    "enum": [
+                      "weighted_average",
+                      "all_or_nothing"
+                    ]
                   }
                 },
-                "required": ["type", "fields"],
+                "required": [
+                  "type",
+                  "fields"
+                ],
                 "additionalProperties": false
               },
               {
@@ -16300,7 +18313,10 @@
                     "minimum": 0
                   }
                 },
-                "required": ["type", "threshold"],
+                "required": [
+                  "type",
+                  "threshold"
+                ],
                 "additionalProperties": false
               },
               {
@@ -16337,7 +18353,10 @@
                     "minimum": 0
                   }
                 },
-                "required": ["type", "budget"],
+                "required": [
+                  "type",
+                  "budget"
+                ],
                 "additionalProperties": false
               },
               {
@@ -16367,7 +18386,10 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": ["token-usage", "token_usage"]
+                    "enum": [
+                      "token-usage",
+                      "token_usage"
+                    ]
                   },
                   "max_total": {
                     "type": "number",
@@ -16382,7 +18404,9 @@
                     "minimum": 0
                   }
                 },
-                "required": ["type"],
+                "required": [
+                  "type"
+                ],
                 "additionalProperties": false
               },
               {
@@ -16412,7 +18436,10 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": ["execution-metrics", "execution_metrics"]
+                    "enum": [
+                      "execution-metrics",
+                      "execution_metrics"
+                    ]
                   },
                   "max_tool_calls": {
                     "type": "number",
@@ -16444,7 +18471,9 @@
                     "minimum": 0
                   }
                 },
-                "required": ["type"],
+                "required": [
+                  "type"
+                ],
                 "additionalProperties": false
               },
               {
@@ -16480,7 +18509,10 @@
                     "type": "string"
                   }
                 },
-                "required": ["type", "value"],
+                "required": [
+                  "type",
+                  "value"
+                ],
                 "additionalProperties": false
               },
               {
@@ -16516,7 +18548,10 @@
                     "type": "string"
                   }
                 },
-                "required": ["type", "value"],
+                "required": [
+                  "type",
+                  "value"
+                ],
                 "additionalProperties": false
               },
               {
@@ -16546,10 +18581,15 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": ["is-json", "is_json"]
+                    "enum": [
+                      "is-json",
+                      "is_json"
+                    ]
                   }
                 },
-                "required": ["type"],
+                "required": [
+                  "type"
+                ],
                 "additionalProperties": false
               },
               {
@@ -16585,7 +18625,10 @@
                     "type": "string"
                   }
                 },
-                "required": ["type", "value"],
+                "required": [
+                  "type",
+                  "value"
+                ],
                 "additionalProperties": false
               },
               {
@@ -16666,7 +18709,10 @@
                                 "minLength": 1
                               }
                             },
-                            "required": ["score_range", "outcome"],
+                            "required": [
+                              "score_range",
+                              "outcome"
+                            ],
                             "additionalProperties": false
                           }
                         }
@@ -16676,7 +18722,10 @@
                     "minItems": 1
                   }
                 },
-                "required": ["type", "criteria"],
+                "required": [
+                  "type",
+                  "criteria"
+                ],
                 "additionalProperties": false
               }
             ]
@@ -16713,7 +18762,12 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
+                    "enum": [
+                      "code-grader",
+                      "code_grader",
+                      "code-judge",
+                      "code_judge"
+                    ]
                   },
                   "command": {
                     "anyOf": [
@@ -16765,7 +18819,10 @@
                     "additionalProperties": {}
                   }
                 },
-                "required": ["type", "command"],
+                "required": [
+                  "type",
+                  "command"
+                ],
                 "additionalProperties": false
               },
               {
@@ -16795,7 +18852,12 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
+                    "enum": [
+                      "llm-grader",
+                      "llm_grader",
+                      "llm-judge",
+                      "llm_judge"
+                    ]
                   },
                   "prompt": {
                     "anyOf": [
@@ -16889,7 +18951,10 @@
                                 "minLength": 1
                               }
                             },
-                            "required": ["score_range", "outcome"],
+                            "required": [
+                              "score_range",
+                              "outcome"
+                            ],
                             "additionalProperties": false
                           }
                         }
@@ -16918,7 +18983,9 @@
                     "maximum": 2
                   }
                 },
-                "required": ["type"],
+                "required": [
+                  "type"
+                ],
                 "additionalProperties": false
               },
               {
@@ -16978,7 +19045,9 @@
                             }
                           }
                         },
-                        "required": ["type"],
+                        "required": [
+                          "type"
+                        ],
                         "additionalProperties": false
                       },
                       {
@@ -16994,7 +19063,10 @@
                             "maximum": 1
                           }
                         },
-                        "required": ["type", "threshold"],
+                        "required": [
+                          "type",
+                          "threshold"
+                        ],
                         "additionalProperties": false
                       },
                       {
@@ -17011,7 +19083,10 @@
                             "type": "string"
                           }
                         },
-                        "required": ["type", "path"],
+                        "required": [
+                          "type",
+                          "path"
+                        ],
                         "additionalProperties": false
                       },
                       {
@@ -17028,13 +19103,18 @@
                             "type": "string"
                           }
                         },
-                        "required": ["type"],
+                        "required": [
+                          "type"
+                        ],
                         "additionalProperties": false
                       }
                     ]
                   }
                 },
-                "required": ["type", "aggregator"],
+                "required": [
+                  "type",
+                  "aggregator"
+                ],
                 "additionalProperties": false
               },
               {
@@ -17064,11 +19144,20 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": ["tool-trajectory", "tool_trajectory"]
+                    "enum": [
+                      "tool-trajectory",
+                      "tool_trajectory"
+                    ]
                   },
                   "mode": {
                     "type": "string",
-                    "enum": ["any_order", "in_order", "exact", "subset", "superset"]
+                    "enum": [
+                      "any_order",
+                      "in_order",
+                      "exact",
+                      "subset",
+                      "superset"
+                    ]
                   },
                   "minimums": {
                     "type": "object",
@@ -17109,7 +19198,12 @@
                           "anyOf": [
                             {
                               "type": "string",
-                              "enum": ["exact", "ignore", "subset", "superset"]
+                              "enum": [
+                                "exact",
+                                "ignore",
+                                "subset",
+                                "superset"
+                              ]
                             },
                             {
                               "type": "array",
@@ -17123,7 +19217,12 @@
                           "anyOf": [
                             {
                               "type": "string",
-                              "enum": ["exact", "ignore", "subset", "superset"]
+                              "enum": [
+                                "exact",
+                                "ignore",
+                                "subset",
+                                "superset"
+                              ]
                             },
                             {
                               "type": "array",
@@ -17134,7 +19233,9 @@
                           ]
                         }
                       },
-                      "required": ["tool"],
+                      "required": [
+                        "tool"
+                      ],
                       "additionalProperties": false
                     }
                   },
@@ -17142,7 +19243,12 @@
                     "anyOf": [
                       {
                         "type": "string",
-                        "enum": ["exact", "ignore", "subset", "superset"]
+                        "enum": [
+                          "exact",
+                          "ignore",
+                          "subset",
+                          "superset"
+                        ]
                       },
                       {
                         "type": "array",
@@ -17156,7 +19262,12 @@
                     "anyOf": [
                       {
                         "type": "string",
-                        "enum": ["exact", "ignore", "subset", "superset"]
+                        "enum": [
+                          "exact",
+                          "ignore",
+                          "subset",
+                          "superset"
+                        ]
                       },
                       {
                         "type": "array",
@@ -17167,7 +19278,10 @@
                     ]
                   }
                 },
-                "required": ["type", "mode"],
+                "required": [
+                  "type",
+                  "mode"
+                ],
                 "additionalProperties": false
               },
               {
@@ -17197,7 +19311,10 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": ["field-accuracy", "field_accuracy"]
+                    "enum": [
+                      "field-accuracy",
+                      "field_accuracy"
+                    ]
                   },
                   "fields": {
                     "type": "array",
@@ -17209,7 +19326,11 @@
                         },
                         "match": {
                           "type": "string",
-                          "enum": ["exact", "numeric_tolerance", "date"]
+                          "enum": [
+                            "exact",
+                            "numeric_tolerance",
+                            "date"
+                          ]
                         },
                         "required": {
                           "type": "boolean"
@@ -17231,17 +19352,26 @@
                           }
                         }
                       },
-                      "required": ["path", "match"],
+                      "required": [
+                        "path",
+                        "match"
+                      ],
                       "additionalProperties": false
                     },
                     "minItems": 1
                   },
                   "aggregation": {
                     "type": "string",
-                    "enum": ["weighted_average", "all_or_nothing"]
+                    "enum": [
+                      "weighted_average",
+                      "all_or_nothing"
+                    ]
                   }
                 },
-                "required": ["type", "fields"],
+                "required": [
+                  "type",
+                  "fields"
+                ],
                 "additionalProperties": false
               },
               {
@@ -17278,7 +19408,10 @@
                     "minimum": 0
                   }
                 },
-                "required": ["type", "threshold"],
+                "required": [
+                  "type",
+                  "threshold"
+                ],
                 "additionalProperties": false
               },
               {
@@ -17315,7 +19448,10 @@
                     "minimum": 0
                   }
                 },
-                "required": ["type", "budget"],
+                "required": [
+                  "type",
+                  "budget"
+                ],
                 "additionalProperties": false
               },
               {
@@ -17345,7 +19481,10 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": ["token-usage", "token_usage"]
+                    "enum": [
+                      "token-usage",
+                      "token_usage"
+                    ]
                   },
                   "max_total": {
                     "type": "number",
@@ -17360,7 +19499,9 @@
                     "minimum": 0
                   }
                 },
-                "required": ["type"],
+                "required": [
+                  "type"
+                ],
                 "additionalProperties": false
               },
               {
@@ -17390,7 +19531,10 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": ["execution-metrics", "execution_metrics"]
+                    "enum": [
+                      "execution-metrics",
+                      "execution_metrics"
+                    ]
                   },
                   "max_tool_calls": {
                     "type": "number",
@@ -17422,7 +19566,9 @@
                     "minimum": 0
                   }
                 },
-                "required": ["type"],
+                "required": [
+                  "type"
+                ],
                 "additionalProperties": false
               },
               {
@@ -17458,7 +19604,10 @@
                     "type": "string"
                   }
                 },
-                "required": ["type", "value"],
+                "required": [
+                  "type",
+                  "value"
+                ],
                 "additionalProperties": false
               },
               {
@@ -17494,7 +19643,10 @@
                     "type": "string"
                   }
                 },
-                "required": ["type", "value"],
+                "required": [
+                  "type",
+                  "value"
+                ],
                 "additionalProperties": false
               },
               {
@@ -17524,10 +19676,15 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": ["is-json", "is_json"]
+                    "enum": [
+                      "is-json",
+                      "is_json"
+                    ]
                   }
                 },
-                "required": ["type"],
+                "required": [
+                  "type"
+                ],
                 "additionalProperties": false
               },
               {
@@ -17563,7 +19720,10 @@
                     "type": "string"
                   }
                 },
-                "required": ["type", "value"],
+                "required": [
+                  "type",
+                  "value"
+                ],
                 "additionalProperties": false
               },
               {
@@ -17644,7 +19804,10 @@
                                 "minLength": 1
                               }
                             },
-                            "required": ["score_range", "outcome"],
+                            "required": [
+                              "score_range",
+                              "outcome"
+                            ],
                             "additionalProperties": false
                           }
                         }
@@ -17654,7 +19817,10 @@
                     "minItems": 1
                   }
                 },
-                "required": ["type", "criteria"],
+                "required": [
+                  "type",
+                  "criteria"
+                ],
                 "additionalProperties": false
               }
             ]
@@ -17670,7 +19836,10 @@
                 },
                 "isolation": {
                   "type": "string",
-                  "enum": ["shared", "per_test"]
+                  "enum": [
+                    "shared",
+                    "per_test"
+                  ]
                 },
                 "repos": {
                   "type": "array",
@@ -17694,7 +19863,10 @@
                                 "format": "uri"
                               }
                             },
-                            "required": ["type", "url"],
+                            "required": [
+                              "type",
+                              "url"
+                            ],
                             "additionalProperties": false
                           },
                           {
@@ -17708,7 +19880,10 @@
                                 "type": "string"
                               }
                             },
-                            "required": ["type", "path"],
+                            "required": [
+                              "type",
+                              "path"
+                            ],
                             "additionalProperties": false
                           }
                         ]
@@ -17721,7 +19896,10 @@
                           },
                           "resolve": {
                             "type": "string",
-                            "enum": ["remote", "local"]
+                            "enum": [
+                              "remote",
+                              "local"
+                            ]
                           },
                           "ancestor": {
                             "type": "integer",
@@ -17750,7 +19928,10 @@
                         "additionalProperties": false
                       }
                     },
-                    "required": ["path", "source"],
+                    "required": [
+                      "path",
+                      "source"
+                    ],
                     "additionalProperties": false
                   }
                 },
@@ -17786,7 +19967,11 @@
                         },
                         "reset": {
                           "type": "string",
-                          "enum": ["none", "fast", "strict"]
+                          "enum": [
+                            "none",
+                            "fast",
+                            "strict"
+                          ]
                         }
                       },
                       "additionalProperties": false
@@ -17817,7 +20002,11 @@
                         },
                         "reset": {
                           "type": "string",
-                          "enum": ["none", "fast", "strict"]
+                          "enum": [
+                            "none",
+                            "fast",
+                            "strict"
+                          ]
                         }
                       },
                       "additionalProperties": false
@@ -17848,7 +20037,11 @@
                         },
                         "reset": {
                           "type": "string",
-                          "enum": ["none", "fast", "strict"]
+                          "enum": [
+                            "none",
+                            "fast",
+                            "strict"
+                          ]
                         }
                       },
                       "additionalProperties": false
@@ -17879,7 +20072,11 @@
                         },
                         "reset": {
                           "type": "string",
-                          "enum": ["none", "fast", "strict"]
+                          "enum": [
+                            "none",
+                            "fast",
+                            "strict"
+                          ]
                         }
                       },
                       "additionalProperties": false
@@ -17889,7 +20086,11 @@
                 },
                 "mode": {
                   "type": "string",
-                  "enum": ["pooled", "temp", "static"]
+                  "enum": [
+                    "pooled",
+                    "temp",
+                    "static"
+                  ]
                 },
                 "path": {
                   "type": "string"
@@ -17903,7 +20104,9 @@
           ]
         }
       },
-      "required": ["tests"],
+      "required": [
+        "tests"
+      ],
       "additionalProperties": false
     }
   }

From f3d2bebb4afbae228385bf2b8210735543901acf Mon Sep 17 00:00:00 2001
From: Christopher Tso <christso@gmail.com>
Date: Wed, 25 Mar 2026 02:33:37 +0000
Subject: [PATCH 06/11] feat(cli): add --threshold flag and wire through
 options pipeline (#698)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 apps/cli/src/commands/eval/commands/run.ts | 6 ++++++
 apps/cli/src/commands/eval/run-eval.ts     | 9 +++++++++
 2 files changed, 15 insertions(+)
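
Reviewer note: the precedence rule wired up in this patch can be sketched in isolation. This is a minimal illustration, not code from the patch; `resolveThreshold` is a hypothetical helper name (the patch inlines the same `??` expression in `runEvalCommand`).

```typescript
// Hypothetical helper for illustration only; the patch inlines this expression.
function resolveThreshold(
  cliThreshold: number | undefined,
  yamlThreshold: number | undefined,
): number | undefined {
  // `??` falls back only on undefined/null, so an explicit CLI value of 0
  // still overrides the YAML value (unlike `||`, which would discard 0).
  return cliThreshold ?? yamlThreshold;
}

console.log(resolveThreshold(0.9, 0.8)); // 0.9
console.log(resolveThreshold(undefined, 0.8)); // 0.8
console.log(resolveThreshold(0, 0.8)); // 0
console.log(resolveThreshold(undefined, undefined)); // undefined
```

When neither source sets a value the result stays `undefined`, which is what lets the later gate check be skipped entirely.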

diff --git a/apps/cli/src/commands/eval/commands/run.ts b/apps/cli/src/commands/eval/commands/run.ts
index e680301f3..713366e7b 100644
--- a/apps/cli/src/commands/eval/commands/run.ts
+++ b/apps/cli/src/commands/eval/commands/run.ts
@@ -175,6 +175,11 @@ export const evalRunCommand = command({
       description:
         'Number of trailing messages to include in results output (default: 1, or "all")',
     }),
+    threshold: option({
+      type: optional(number),
+      long: 'threshold',
+      description: 'Suite-level quality gate: exit 1 if mean score falls below this value (0-1)',
+    }),
   },
   handler: async (args) => {
     // Launch interactive wizard when no eval paths and stdin is a TTY
@@ -217,6 +222,7 @@ export const evalRunCommand = command({
       graderTarget: args.graderTarget,
       model: args.model,
       outputMessages: args.outputMessages,
+      threshold: args.threshold,
     };
     await runEvalCommand({ testFiles: resolvedPaths, rawOptions });
   },
diff --git a/apps/cli/src/commands/eval/run-eval.ts b/apps/cli/src/commands/eval/run-eval.ts
index 2d486eab2..ace542fdd 100644
--- a/apps/cli/src/commands/eval/run-eval.ts
+++ b/apps/cli/src/commands/eval/run-eval.ts
@@ -86,6 +86,7 @@ interface NormalizedOptions {
   readonly graderTarget?: string;
   readonly model?: string;
   readonly outputMessages: number | 'all';
+  readonly threshold?: number;
 }
 
 function normalizeBoolean(value: unknown): boolean {
@@ -301,6 +302,7 @@ function normalizeOptions(
     graderTarget: normalizeString(rawOptions.graderTarget),
     model: normalizeString(rawOptions.model),
     outputMessages: normalizeOutputMessages(normalizeString(rawOptions.outputMessages)),
+    threshold: normalizeOptionalNumber(rawOptions.threshold),
   } satisfies NormalizedOptions;
 }
 
@@ -430,6 +432,7 @@ async function prepareFileMetadata(params: {
   readonly yamlCachePath?: string;
   readonly totalBudgetUsd?: number;
   readonly failOnError?: FailOnError;
+  readonly threshold?: number;
 }> {
   const { testFilePath, repoRoot, cwd, options } = params;
 
@@ -515,6 +518,7 @@ async function prepareFileMetadata(params: {
     yamlCachePath: suite.cacheConfig?.cachePath,
     totalBudgetUsd: suite.totalBudgetUsd,
     failOnError: suite.failOnError,
+    threshold: suite.threshold,
   };
 }
 
@@ -951,6 +955,7 @@ export async function runEvalCommand(
       readonly yamlCachePath?: string;
       readonly totalBudgetUsd?: number;
       readonly failOnError?: FailOnError;
+      readonly threshold?: number;
     }
   >();
   // Separate TypeScript/JS eval files from YAML files.
@@ -1006,6 +1011,10 @@ export async function runEvalCommand(
     console.log(`Response cache: enabled${yamlCachePath ? ` (${yamlCachePath})` : ''}`);
   }
 
+  // Resolve suite-level threshold: CLI --threshold takes precedence over YAML execution.threshold
+  const yamlThreshold = firstMeta?.threshold;
+  const resolvedThreshold = options.threshold ?? yamlThreshold;
+
   // Detect matrix mode: multiple targets for any file
   const isMatrixMode = Array.from(fileMetadata.values()).some((meta) => meta.selections.length > 1);
 

From 1ea5e067bc9bc45c58ca37641962f88146bf1037 Mon Sep 17 00:00:00 2001
From: Christopher Tso <christso@gmail.com>
Date: Wed, 25 Mar 2026 02:36:01 +0000
Subject: [PATCH 07/11] feat(cli): add threshold check with summary output
 after eval (#698)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 apps/cli/src/commands/eval/run-eval.ts        | 10 ++++++
 apps/cli/src/commands/eval/statistics.ts      | 14 +++++++++
 apps/cli/test/commands/eval/threshold.test.ts | 31 +++++++++++++++++++
 3 files changed, 55 insertions(+)
 create mode 100644 apps/cli/test/commands/eval/threshold.test.ts
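
Reviewer note: the gate compares the unrounded mean but prints both numbers with two decimals, so a near-miss can display identical values and still fail. A minimal sketch of that edge, assuming the same `>=` comparison and `toFixed(2)` formatting as `formatThresholdSummary`; `checkThreshold` is a hypothetical stand-in, not a function in this patch.

```typescript
// Hypothetical stand-in mirroring the comparison and number formatting in this patch.
function checkThreshold(
  meanScore: number,
  threshold: number,
): { passed: boolean; shownScore: string; shownThreshold: string } {
  return {
    passed: meanScore >= threshold, // inclusive: a mean equal to the threshold passes
    shownScore: meanScore.toFixed(2), // display value only, rounded to two decimals
    shownThreshold: threshold.toFixed(2),
  };
}

const edge = checkThreshold(0.596, 0.6);
console.log(edge.passed); // false
console.log(edge.shownScore, edge.shownThreshold); // 0.60 0.60
```

The comparison itself is unaffected; only the printed summary can look contradictory in that narrow band below the threshold.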

diff --git a/apps/cli/src/commands/eval/run-eval.ts b/apps/cli/src/commands/eval/run-eval.ts
index ace542fdd..7deac0024 100644
--- a/apps/cli/src/commands/eval/run-eval.ts
+++ b/apps/cli/src/commands/eval/run-eval.ts
@@ -45,6 +45,7 @@ import {
   calculateEvaluationSummary,
   formatEvaluationSummary,
   formatMatrixSummary,
+  formatThresholdSummary,
 } from './statistics.js';
 import { type TargetSelection, selectMultipleTargets, selectTarget } from './targets.js';
 
@@ -1161,6 +1162,15 @@ export async function runEvalCommand(
     const summary = calculateEvaluationSummary(allResults);
     console.log(formatEvaluationSummary(summary));
 
+    // Threshold quality gate check
+    if (resolvedThreshold !== undefined) {
+      const thresholdResult = formatThresholdSummary(summary.mean, resolvedThreshold);
+      console.log(`\n${thresholdResult.message}`);
+      if (!thresholdResult.passed) {
+        process.exitCode = 1;
+      }
+    }
+
     // Print matrix summary when multiple targets were evaluated
     if (isMatrixMode && allResults.length > 0) {
       console.log(formatMatrixSummary(allResults));
diff --git a/apps/cli/src/commands/eval/statistics.ts b/apps/cli/src/commands/eval/statistics.ts
index e47a65791..910052d24 100644
--- a/apps/cli/src/commands/eval/statistics.ts
+++ b/apps/cli/src/commands/eval/statistics.ts
@@ -334,3 +334,17 @@ export function formatMatrixSummary(results: readonly EvaluationResult[]): strin
 
   return lines.join('\n');
 }
+
+/**
+ * Format a threshold check summary line.
+ * Returns whether the threshold was met and the formatted message.
+ */
+export function formatThresholdSummary(
+  meanScore: number,
+  threshold: number,
+): { passed: boolean; message: string } {
+  const passed = meanScore >= threshold;
+  const verdict = passed ? 'PASS' : 'FAIL';
+  const message = `Suite score: ${meanScore.toFixed(2)} (threshold: ${threshold.toFixed(2)}) — ${verdict}`;
+  return { passed, message };
+}
diff --git a/apps/cli/test/commands/eval/threshold.test.ts b/apps/cli/test/commands/eval/threshold.test.ts
new file mode 100644
index 000000000..65c059167
--- /dev/null
+++ b/apps/cli/test/commands/eval/threshold.test.ts
@@ -0,0 +1,31 @@
+import { describe, expect, it } from 'bun:test';
+
+import { formatThresholdSummary } from '../../../src/commands/eval/statistics.js';
+
+describe('formatThresholdSummary', () => {
+  it('returns PASS when mean score meets threshold', () => {
+    const result = formatThresholdSummary(0.85, 0.6);
+    expect(result.passed).toBe(true);
+    expect(result.message).toContain('0.85');
+    expect(result.message).toContain('0.60');
+    expect(result.message).toContain('PASS');
+  });
+
+  it('returns FAIL when mean score is below threshold', () => {
+    const result = formatThresholdSummary(0.53, 0.6);
+    expect(result.passed).toBe(false);
+    expect(result.message).toContain('0.53');
+    expect(result.message).toContain('0.60');
+    expect(result.message).toContain('FAIL');
+  });
+
+  it('returns PASS when mean score exactly equals threshold', () => {
+    const result = formatThresholdSummary(0.6, 0.6);
+    expect(result.passed).toBe(true);
+  });
+
+  it('returns PASS when score and threshold are both 0', () => {
+    const result = formatThresholdSummary(0, 0);
+    expect(result.passed).toBe(true);
+  });
+});

From e6261a1efbde6d336596c9cd55abd5b918c5a6e7 Mon Sep 17 00:00:00 2001
From: Christopher Tso <christso@gmail.com>
Date: Wed, 25 Mar 2026 02:41:10 +0000
Subject: [PATCH 08/11] feat(cli): JUnit writer uses --threshold for per-test
 pass/fail (#698)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 apps/cli/src/commands/eval/junit-writer.ts    | 18 ++++++++-----
 apps/cli/src/commands/eval/output-writer.ts   | 18 ++++++++++---
 apps/cli/src/commands/eval/run-eval.ts        | 13 +++++++---
 .../test/commands/eval/output-writers.test.ts | 26 +++++++++++++++++++
 4 files changed, 62 insertions(+), 13 deletions(-)
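
Reviewer note: the per-test decision the JUnit writer makes after this patch can be sketched as below. This is an illustration under assumptions, not the writer's code: `classify` and `Outcome` are hypothetical names, and the sketch only mirrors the branch order used when rendering a single test case (error first, then score against the threshold, defaulting to 0.5).

```typescript
// Hypothetical classifier mirroring the branch order used for one JUnit test case.
type Outcome = 'error' | 'failure' | 'pass';

function classify(score: number, error: string | undefined, threshold = 0.5): Outcome {
  // An execution error is reported as <error>, regardless of the score.
  if (error) return 'error';
  // Strict comparison: a score equal to the threshold is not a failure.
  return score < threshold ? 'failure' : 'pass';
}

console.log(classify(0.6, undefined)); // pass (default threshold 0.5)
console.log(classify(0.6, undefined, 0.8)); // failure
console.log(classify(0.8, undefined, 0.8)); // pass
console.log(classify(0.9, 'timeout', 0.8)); // error
```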

diff --git a/apps/cli/src/commands/eval/junit-writer.ts b/apps/cli/src/commands/eval/junit-writer.ts
index f3bfb7f18..514b24585 100644
--- a/apps/cli/src/commands/eval/junit-writer.ts
+++ b/apps/cli/src/commands/eval/junit-writer.ts
@@ -3,6 +3,10 @@ import path from 'node:path';
 
 import type { EvaluationResult } from '@agentv/core';
 
+export interface JunitWriterOptions {
+  readonly threshold?: number;
+}
+
 export function escapeXml(str: string): string {
   return str
     .replace(/&/g, '&amp;')
@@ -15,15 +19,17 @@ export function escapeXml(str: string): string {
 export class JunitWriter {
   private readonly filePath: string;
   private readonly results: EvaluationResult[] = [];
+  private readonly threshold: number;
   private closed = false;
 
-  private constructor(filePath: string) {
+  private constructor(filePath: string, options?: JunitWriterOptions) {
     this.filePath = filePath;
+    this.threshold = options?.threshold ?? 0.5;
   }
 
-  static async open(filePath: string): Promise<JunitWriter> {
+  static async open(filePath: string, options?: JunitWriterOptions): Promise<JunitWriter> {
     await mkdir(path.dirname(filePath), { recursive: true });
-    return new JunitWriter(filePath);
+    return new JunitWriter(filePath, options);
   }
 
   async append(result: EvaluationResult): Promise<void> {
@@ -52,7 +58,7 @@ export class JunitWriter {
 
     const suiteXmls: string[] = [];
     for (const [suiteName, results] of grouped) {
-      const failures = results.filter((r) => r.score < 0.5).length;
+      const failures = results.filter((r) => r.score < this.threshold).length;
       const errors = results.filter((r) => r.error !== undefined).length;
 
       const testCases = results.map((r) => {
@@ -61,7 +67,7 @@ export class JunitWriter {
         let inner = '';
         if (r.error) {
           inner = `\n      <error message="${escapeXml(r.error)}">${escapeXml(r.error)}</error>\n    `;
-        } else if (r.score < 0.5) {
+        } else if (r.score < this.threshold) {
           const message = `score=${r.score.toFixed(3)}`;
           const failedAssertions = r.assertions.filter((a) => !a.passed);
           const detail = [
@@ -84,7 +90,7 @@ export class JunitWriter {
     }
 
     const totalTests = this.results.length;
-    const totalFailures = this.results.filter((r) => r.score < 0.5).length;
+    const totalFailures = this.results.filter((r) => r.score < this.threshold).length;
     const totalErrors = this.results.filter((r) => r.error !== undefined).length;
 
     const xml = `<?xml version="1.0" encoding="UTF-8"?>\n<testsuites tests="${totalTests}" failures="${totalFailures}" errors="${totalErrors}">\n${suiteXmls.join('\n')}\n</testsuites>\n`;
diff --git a/apps/cli/src/commands/eval/output-writer.ts b/apps/cli/src/commands/eval/output-writer.ts
index acaf757fe..e4d2cebd8 100644
--- a/apps/cli/src/commands/eval/output-writer.ts
+++ b/apps/cli/src/commands/eval/output-writer.ts
@@ -15,6 +15,10 @@ export interface OutputWriter {
   close(): Promise<void>;
 }
 
+export interface WriterOptions {
+  readonly threshold?: number;
+}
+
 export async function createOutputWriter(
   filePath: string,
   format: OutputFormat,
@@ -35,7 +39,10 @@ export async function createOutputWriter(
 
 const SUPPORTED_EXTENSIONS = new Set(['.jsonl', '.json', '.xml', '.yaml', '.yml', '.html', '.htm']);
 
-export function createWriterFromPath(filePath: string): Promise<OutputWriter> {
+export function createWriterFromPath(
+  filePath: string,
+  options?: WriterOptions,
+): Promise<OutputWriter> {
   const ext = path.extname(filePath).toLowerCase();
   switch (ext) {
     case '.jsonl':
@@ -43,7 +50,7 @@ export function createWriterFromPath(filePath: string): Promise<OutputWriter> {
     case '.json':
       return JsonWriter.open(filePath);
     case '.xml':
-      return JunitWriter.open(filePath);
+      return JunitWriter.open(filePath, { threshold: options?.threshold });
     case '.yaml':
     case '.yml':
       return YamlWriter.open(filePath);
@@ -57,8 +64,11 @@ export function createWriterFromPath(filePath: string): Promise<OutputWriter> {
   }
 }
 
-export async function createMultiWriter(filePaths: readonly string[]): Promise<OutputWriter> {
-  const writers = await Promise.all(filePaths.map((fp) => createWriterFromPath(fp)));
+export async function createMultiWriter(
+  filePaths: readonly string[],
+  options?: WriterOptions,
+): Promise<OutputWriter> {
+  const writers = await Promise.all(filePaths.map((fp) => createWriterFromPath(fp, options)));
   return {
     async append(result: EvaluationResult): Promise<void> {
       await Promise.all(writers.map((w) => w.append(result)));
diff --git a/apps/cli/src/commands/eval/run-eval.ts b/apps/cli/src/commands/eval/run-eval.ts
index 7deac0024..cd09e8e22 100644
--- a/apps/cli/src/commands/eval/run-eval.ts
+++ b/apps/cli/src/commands/eval/run-eval.ts
@@ -906,12 +906,9 @@ export async function runEvalCommand(
     extraOutputPaths.length > 0 ? [outputPath, ...extraOutputPaths] : [outputPath];
   const uniqueReportedOutputPaths = [...new Set(reportedOutputPaths)];
 
-  let outputWriter: OutputWriter;
   if (uniqueOutputPaths.length === 1) {
-    outputWriter = await createOutputWriter(primaryWritePath, options.format);
     console.log(`Output path: ${outputPath}`);
   } else {
-    outputWriter = await createMultiWriter(uniqueOutputPaths);
     console.log('Output paths:');
     for (const p of uniqueReportedOutputPaths) {
       console.log(`  ${p}`);
@@ -1016,6 +1013,16 @@ export async function runEvalCommand(
   const yamlThreshold = firstMeta?.threshold;
   const resolvedThreshold = options.threshold ?? yamlThreshold;
 
+  // Build the output writer (deferred until after threshold is resolved so JUnit
+  // writer can use the resolved threshold for per-test pass/fail decisions)
+  const writerOptions = resolvedThreshold !== undefined ? { threshold: resolvedThreshold } : undefined;
+  let outputWriter: OutputWriter;
+  if (uniqueOutputPaths.length === 1) {
+    outputWriter = await createOutputWriter(primaryWritePath, options.format);
+  } else {
+    outputWriter = await createMultiWriter(uniqueOutputPaths, writerOptions);
+  }
+
   // Detect matrix mode: multiple targets for any file
   const isMatrixMode = Array.from(fileMetadata.values()).some((meta) => meta.selections.length > 1);
 
diff --git a/apps/cli/test/commands/eval/output-writers.test.ts b/apps/cli/test/commands/eval/output-writers.test.ts
index 8c1ea67fb..75ff80da2 100644
--- a/apps/cli/test/commands/eval/output-writers.test.ts
+++ b/apps/cli/test/commands/eval/output-writers.test.ts
@@ -162,6 +162,32 @@ describe('JunitWriter', () => {
       'Cannot write to closed JUnit writer',
     );
   });
+
+  it('uses custom threshold for pass/fail when provided', async () => {
+    const filePath = path.join(testDir, `junit-threshold-${Date.now()}.xml`);
+    const writer = await JunitWriter.open(filePath, { threshold: 0.8 });
+
+    await writer.append(makeResult({ testId: 'high', score: 0.9 }));
+    await writer.append(makeResult({ testId: 'mid', score: 0.6 }));
+    await writer.close();
+
+    const xml = await readFile(filePath, 'utf8');
+    expect(xml).not.toContain('<failure message="score=0.900"');
+    expect(xml).toContain('<failure message="score=0.600"');
+  });
+
+  it('defaults to 0.5 threshold when none provided', async () => {
+    const filePath = path.join(testDir, `junit-default-${Date.now()}.xml`);
+    const writer = await JunitWriter.open(filePath);
+
+    await writer.append(makeResult({ testId: 'pass', score: 0.6 }));
+    await writer.append(makeResult({ testId: 'fail', score: 0.3 }));
+    await writer.close();
+
+    const xml = await readFile(filePath, 'utf8');
+    expect(xml).not.toContain('<failure message="score=0.600"');
+    expect(xml).toContain('<failure message="score=0.300"');
+  });
 });
 
 describe('escapeXml', () => {

From b66323dfb5eed7bdb858388564c42d9e688d5cdd Mon Sep 17 00:00:00 2001
From: Christopher Tso <christso@gmail.com>
Date: Wed, 25 Mar 2026 02:46:07 +0000
Subject: [PATCH 09/11] style: fix biome formatting in threshold implementation
 files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 apps/cli/src/commands/eval/run-eval.ts        |    3 +-
 .../src/evaluation/loaders/config-loader.ts   |    4 +-
 .../references/eval-schema.json               | 3530 ++++-------------
 3 files changed, 674 insertions(+), 2863 deletions(-)

diff --git a/apps/cli/src/commands/eval/run-eval.ts b/apps/cli/src/commands/eval/run-eval.ts
index cd09e8e22..8dc114969 100644
--- a/apps/cli/src/commands/eval/run-eval.ts
+++ b/apps/cli/src/commands/eval/run-eval.ts
@@ -1015,7 +1015,8 @@ export async function runEvalCommand(
 
   // Build the output writer (deferred until after threshold is resolved so JUnit
   // writer can use the resolved threshold for per-test pass/fail decisions)
-  const writerOptions = resolvedThreshold !== undefined ? { threshold: resolvedThreshold } : undefined;
+  const writerOptions =
+    resolvedThreshold !== undefined ? { threshold: resolvedThreshold } : undefined;
   let outputWriter: OutputWriter;
   if (uniqueOutputPaths.length === 1) {
     outputWriter = await createOutputWriter(primaryWritePath, options.format);
diff --git a/packages/core/src/evaluation/loaders/config-loader.ts b/packages/core/src/evaluation/loaders/config-loader.ts
index daa2aa7aa..54505cddc 100644
--- a/packages/core/src/evaluation/loaders/config-loader.ts
+++ b/packages/core/src/evaluation/loaders/config-loader.ts
@@ -355,9 +355,7 @@ export function extractThreshold(suite: JsonObject): number | undefined {
     return raw;
   }
 
-  logWarning(
-    `Invalid execution.threshold: ${raw}. Must be a number between 0 and 1. Ignoring.`,
-  );
+  logWarning(`Invalid execution.threshold: ${raw}. Must be a number between 0 and 1. Ignoring.`);
   return undefined;
 }
 
diff --git a/plugins/agentv-dev/skills/agentv-eval-writer/references/eval-schema.json b/plugins/agentv-dev/skills/agentv-eval-writer/references/eval-schema.json
index 4df59a334..9827ee04c 100644
--- a/plugins/agentv-dev/skills/agentv-eval-writer/references/eval-schema.json
+++ b/plugins/agentv-dev/skills/agentv-eval-writer/references/eval-schema.json
@@ -53,12 +53,7 @@
                 "properties": {
                   "role": {
                     "type": "string",
-                    "enum": [
-                      "system",
-                      "user",
-                      "assistant",
-                      "tool"
-                    ]
+                    "enum": ["system", "user", "assistant", "tool"]
                   },
                   "content": {
                     "anyOf": [
@@ -72,29 +67,20 @@
                           "properties": {
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "text",
-                                "file"
-                              ]
+                              "enum": ["text", "file"]
                             },
                             "value": {
                               "type": "string"
                             }
                           },
-                          "required": [
-                            "type",
-                            "value"
-                          ],
+                          "required": ["type", "value"],
                           "additionalProperties": false
                         }
                       }
                     ]
                   }
                 },
-                "required": [
-                  "role",
-                  "content"
-                ],
+                "required": ["role", "content"],
                 "additionalProperties": false
               }
             }
@@ -135,12 +121,7 @@
                           "properties": {
                             "role": {
                               "type": "string",
-                              "enum": [
-                                "system",
-                                "user",
-                                "assistant",
-                                "tool"
-                              ]
+                              "enum": ["system", "user", "assistant", "tool"]
                             },
                             "content": {
                               "anyOf": [
@@ -154,29 +135,20 @@
                                     "properties": {
                                       "type": {
                                         "type": "string",
-                                        "enum": [
-                                          "text",
-                                          "file"
-                                        ]
+                                        "enum": ["text", "file"]
                                       },
                                       "value": {
                                         "type": "string"
                                       }
                                     },
-                                    "required": [
-                                      "type",
-                                      "value"
-                                    ],
+                                    "required": ["type", "value"],
                                     "additionalProperties": false
                                   }
                                 }
                               ]
                             }
                           },
-                          "required": [
-                            "role",
-                            "content"
-                          ],
+                          "required": ["role", "content"],
                           "additionalProperties": false
                         }
                       }
@@ -204,12 +176,7 @@
                           "properties": {
                             "role": {
                               "type": "string",
-                              "enum": [
-                                "system",
-                                "user",
-                                "assistant",
-                                "tool"
-                              ]
+                              "enum": ["system", "user", "assistant", "tool"]
                             },
                             "content": {
                               "anyOf": [
@@ -223,29 +190,20 @@
                                     "properties": {
                                       "type": {
                                         "type": "string",
-                                        "enum": [
-                                          "text",
-                                          "file"
-                                        ]
+                                        "enum": ["text", "file"]
                                       },
                                       "value": {
                                         "type": "string"
                                       }
                                     },
-                                    "required": [
-                                      "type",
-                                      "value"
-                                    ],
+                                    "required": ["type", "value"],
                                     "additionalProperties": false
                                   }
                                 }
                               ]
                             }
                           },
-                          "required": [
-                            "role",
-                            "content"
-                          ],
+                          "required": ["role", "content"],
                           "additionalProperties": false
                         }
                       }
@@ -282,12 +240,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "code-grader",
-                                "code_grader",
-                                "code-judge",
-                                "code_judge"
-                              ]
+                              "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
                             },
                             "command": {
                               "anyOf": [
@@ -339,10 +292,7 @@
                               "additionalProperties": {}
                             }
                           },
-                          "required": [
-                            "type",
-                            "command"
-                          ],
+                          "required": ["type", "command"],
                           "additionalProperties": false
                         },
                         {
@@ -372,12 +322,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "llm-grader",
-                                "llm_grader",
-                                "llm-judge",
-                                "llm_judge"
-                              ]
+                              "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
                             },
                             "prompt": {
                               "anyOf": [
@@ -471,10 +416,7 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": [
-                                        "score_range",
-                                        "outcome"
-                                      ],
+                                      "required": ["score_range", "outcome"],
                                       "additionalProperties": false
                                     }
                                   }
@@ -503,9 +445,7 @@
                               "maximum": 2
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -565,9 +505,7 @@
                                       }
                                     }
                                   },
-                                  "required": [
-                                    "type"
-                                  ],
+                                  "required": ["type"],
                                   "additionalProperties": false
                                 },
                                 {
@@ -583,10 +521,7 @@
                                       "maximum": 1
                                     }
                                   },
-                                  "required": [
-                                    "type",
-                                    "threshold"
-                                  ],
+                                  "required": ["type", "threshold"],
                                   "additionalProperties": false
                                 },
                                 {
@@ -603,10 +538,7 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": [
-                                    "type",
-                                    "path"
-                                  ],
+                                  "required": ["type", "path"],
                                   "additionalProperties": false
                                 },
                                 {
@@ -623,18 +555,13 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": [
-                                    "type"
-                                  ],
+                                  "required": ["type"],
                                   "additionalProperties": false
                                 }
                               ]
                             }
                           },
-                          "required": [
-                            "type",
-                            "aggregator"
-                          ],
+                          "required": ["type", "aggregator"],
                           "additionalProperties": false
                         },
                         {
@@ -664,20 +591,11 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "tool-trajectory",
-                                "tool_trajectory"
-                              ]
+                              "enum": ["tool-trajectory", "tool_trajectory"]
                             },
                             "mode": {
                               "type": "string",
-                              "enum": [
-                                "any_order",
-                                "in_order",
-                                "exact",
-                                "subset",
-                                "superset"
-                              ]
+                              "enum": ["any_order", "in_order", "exact", "subset", "superset"]
                             },
                             "minimums": {
                               "type": "object",
@@ -718,12 +636,7 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": [
-                                          "exact",
-                                          "ignore",
-                                          "subset",
-                                          "superset"
-                                        ]
+                                        "enum": ["exact", "ignore", "subset", "superset"]
                                       },
                                       {
                                         "type": "array",
@@ -737,12 +650,7 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": [
-                                          "exact",
-                                          "ignore",
-                                          "subset",
-                                          "superset"
-                                        ]
+                                        "enum": ["exact", "ignore", "subset", "superset"]
                                       },
                                       {
                                         "type": "array",
@@ -753,9 +661,7 @@
                                     ]
                                   }
                                 },
-                                "required": [
-                                  "tool"
-                                ],
+                                "required": ["tool"],
                                 "additionalProperties": false
                               }
                             },
@@ -763,12 +669,7 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": [
-                                    "exact",
-                                    "ignore",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["exact", "ignore", "subset", "superset"]
                                 },
                                 {
                                   "type": "array",
@@ -782,12 +683,7 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": [
-                                    "exact",
-                                    "ignore",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["exact", "ignore", "subset", "superset"]
                                 },
                                 {
                                   "type": "array",
@@ -798,10 +694,7 @@
                               ]
                             }
                           },
-                          "required": [
-                            "type",
-                            "mode"
-                          ],
+                          "required": ["type", "mode"],
                           "additionalProperties": false
                         },
                         {
@@ -831,10 +724,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "field-accuracy",
-                                "field_accuracy"
-                              ]
+                              "enum": ["field-accuracy", "field_accuracy"]
                             },
                             "fields": {
                               "type": "array",
@@ -846,11 +736,7 @@
                                   },
                                   "match": {
                                     "type": "string",
-                                    "enum": [
-                                      "exact",
-                                      "numeric_tolerance",
-                                      "date"
-                                    ]
+                                    "enum": ["exact", "numeric_tolerance", "date"]
                                   },
                                   "required": {
                                     "type": "boolean"
@@ -872,26 +758,17 @@
                                     }
                                   }
                                 },
-                                "required": [
-                                  "path",
-                                  "match"
-                                ],
+                                "required": ["path", "match"],
                                 "additionalProperties": false
                               },
                               "minItems": 1
                             },
                             "aggregation": {
                               "type": "string",
-                              "enum": [
-                                "weighted_average",
-                                "all_or_nothing"
-                              ]
+                              "enum": ["weighted_average", "all_or_nothing"]
                             }
                           },
-                          "required": [
-                            "type",
-                            "fields"
-                          ],
+                          "required": ["type", "fields"],
                           "additionalProperties": false
                         },
                         {
@@ -928,10 +805,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type",
-                            "threshold"
-                          ],
+                          "required": ["type", "threshold"],
                           "additionalProperties": false
                         },
                         {
@@ -968,10 +842,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type",
-                            "budget"
-                          ],
+                          "required": ["type", "budget"],
                           "additionalProperties": false
                         },
                         {
@@ -1001,10 +872,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "token-usage",
-                                "token_usage"
-                              ]
+                              "enum": ["token-usage", "token_usage"]
                             },
                             "max_total": {
                               "type": "number",
@@ -1019,9 +887,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -1051,10 +917,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "execution-metrics",
-                                "execution_metrics"
-                              ]
+                              "enum": ["execution-metrics", "execution_metrics"]
                             },
                             "max_tool_calls": {
                               "type": "number",
@@ -1086,9 +949,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -1124,10 +985,7 @@
                               "type": "string"
                             }
                           },
-                          "required": [
-                            "type",
-                            "value"
-                          ],
+                          "required": ["type", "value"],
                           "additionalProperties": false
                         },
                         {
@@ -1163,10 +1021,7 @@
                               "type": "string"
                             }
                           },
-                          "required": [
-                            "type",
-                            "value"
-                          ],
+                          "required": ["type", "value"],
                           "additionalProperties": false
                         },
                         {
@@ -1196,15 +1051,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "is-json",
-                                "is_json"
-                              ]
+                              "enum": ["is-json", "is_json"]
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -1240,10 +1090,7 @@
                               "type": "string"
                             }
                           },
-                          "required": [
-                            "type",
-                            "value"
-                          ],
+                          "required": ["type", "value"],
                           "additionalProperties": false
                         },
                         {
@@ -1324,10 +1171,7 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": [
-                                        "score_range",
-                                        "outcome"
-                                      ],
+                                      "required": ["score_range", "outcome"],
                                       "additionalProperties": false
                                     }
                                   }
@@ -1337,10 +1181,7 @@
                               "minItems": 1
                             }
                           },
-                          "required": [
-                            "type",
-                            "criteria"
-                          ],
+                          "required": ["type", "criteria"],
                           "additionalProperties": false
                         }
                       ]
@@ -1377,12 +1218,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "code-grader",
-                                "code_grader",
-                                "code-judge",
-                                "code_judge"
-                              ]
+                              "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
                             },
                             "command": {
                               "anyOf": [
@@ -1434,10 +1270,7 @@
                               "additionalProperties": {}
                             }
                           },
-                          "required": [
-                            "type",
-                            "command"
-                          ],
+                          "required": ["type", "command"],
                           "additionalProperties": false
                         },
                         {
@@ -1467,12 +1300,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "llm-grader",
-                                "llm_grader",
-                                "llm-judge",
-                                "llm_judge"
-                              ]
+                              "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
                             },
                             "prompt": {
                               "anyOf": [
@@ -1566,10 +1394,7 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": [
-                                        "score_range",
-                                        "outcome"
-                                      ],
+                                      "required": ["score_range", "outcome"],
                                       "additionalProperties": false
                                     }
                                   }
@@ -1598,9 +1423,7 @@
                               "maximum": 2
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -1660,9 +1483,7 @@
                                       }
                                     }
                                   },
-                                  "required": [
-                                    "type"
-                                  ],
+                                  "required": ["type"],
                                   "additionalProperties": false
                                 },
                                 {
@@ -1678,10 +1499,7 @@
                                       "maximum": 1
                                     }
                                   },
-                                  "required": [
-                                    "type",
-                                    "threshold"
-                                  ],
+                                  "required": ["type", "threshold"],
                                   "additionalProperties": false
                                 },
                                 {
@@ -1698,10 +1516,7 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": [
-                                    "type",
-                                    "path"
-                                  ],
+                                  "required": ["type", "path"],
                                   "additionalProperties": false
                                 },
                                 {
@@ -1718,18 +1533,13 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": [
-                                    "type"
-                                  ],
+                                  "required": ["type"],
                                   "additionalProperties": false
                                 }
                               ]
                             }
                           },
-                          "required": [
-                            "type",
-                            "aggregator"
-                          ],
+                          "required": ["type", "aggregator"],
                           "additionalProperties": false
                         },
                         {
@@ -1759,20 +1569,11 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "tool-trajectory",
-                                "tool_trajectory"
-                              ]
+                              "enum": ["tool-trajectory", "tool_trajectory"]
                             },
                             "mode": {
                               "type": "string",
-                              "enum": [
-                                "any_order",
-                                "in_order",
-                                "exact",
-                                "subset",
-                                "superset"
-                              ]
+                              "enum": ["any_order", "in_order", "exact", "subset", "superset"]
                             },
                             "minimums": {
                               "type": "object",
@@ -1813,12 +1614,7 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": [
-                                          "exact",
-                                          "ignore",
-                                          "subset",
-                                          "superset"
-                                        ]
+                                        "enum": ["exact", "ignore", "subset", "superset"]
                                       },
                                       {
                                         "type": "array",
@@ -1832,12 +1628,7 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": [
-                                          "exact",
-                                          "ignore",
-                                          "subset",
-                                          "superset"
-                                        ]
+                                        "enum": ["exact", "ignore", "subset", "superset"]
                                       },
                                       {
                                         "type": "array",
@@ -1848,9 +1639,7 @@
                                     ]
                                   }
                                 },
-                                "required": [
-                                  "tool"
-                                ],
+                                "required": ["tool"],
                                 "additionalProperties": false
                               }
                             },
@@ -1858,12 +1647,7 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": [
-                                    "exact",
-                                    "ignore",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["exact", "ignore", "subset", "superset"]
                                 },
                                 {
                                   "type": "array",
@@ -1877,12 +1661,7 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": [
-                                    "exact",
-                                    "ignore",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["exact", "ignore", "subset", "superset"]
                                 },
                                 {
                                   "type": "array",
@@ -1893,10 +1672,7 @@
                               ]
                             }
                           },
-                          "required": [
-                            "type",
-                            "mode"
-                          ],
+                          "required": ["type", "mode"],
                           "additionalProperties": false
                         },
                         {
@@ -1926,10 +1702,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "field-accuracy",
-                                "field_accuracy"
-                              ]
+                              "enum": ["field-accuracy", "field_accuracy"]
                             },
                             "fields": {
                               "type": "array",
@@ -1941,11 +1714,7 @@
                                   },
                                   "match": {
                                     "type": "string",
-                                    "enum": [
-                                      "exact",
-                                      "numeric_tolerance",
-                                      "date"
-                                    ]
+                                    "enum": ["exact", "numeric_tolerance", "date"]
                                   },
                                   "required": {
                                     "type": "boolean"
@@ -1967,26 +1736,17 @@
                                     }
                                   }
                                 },
-                                "required": [
-                                  "path",
-                                  "match"
-                                ],
+                                "required": ["path", "match"],
                                 "additionalProperties": false
                               },
                               "minItems": 1
                             },
                             "aggregation": {
                               "type": "string",
-                              "enum": [
-                                "weighted_average",
-                                "all_or_nothing"
-                              ]
+                              "enum": ["weighted_average", "all_or_nothing"]
                             }
                           },
-                          "required": [
-                            "type",
-                            "fields"
-                          ],
+                          "required": ["type", "fields"],
                           "additionalProperties": false
                         },
                         {
@@ -2023,10 +1783,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type",
-                            "threshold"
-                          ],
+                          "required": ["type", "threshold"],
                           "additionalProperties": false
                         },
                         {
@@ -2063,10 +1820,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type",
-                            "budget"
-                          ],
+                          "required": ["type", "budget"],
                           "additionalProperties": false
                         },
                         {
@@ -2096,10 +1850,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "token-usage",
-                                "token_usage"
-                              ]
+                              "enum": ["token-usage", "token_usage"]
                             },
                             "max_total": {
                               "type": "number",
@@ -2114,9 +1865,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -2146,10 +1895,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "execution-metrics",
-                                "execution_metrics"
-                              ]
+                              "enum": ["execution-metrics", "execution_metrics"]
                             },
                             "max_tool_calls": {
                               "type": "number",
@@ -2181,9 +1927,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -2219,10 +1963,7 @@
                               "type": "string"
                             }
                           },
-                          "required": [
-                            "type",
-                            "value"
-                          ],
+                          "required": ["type", "value"],
                           "additionalProperties": false
                         },
                         {
@@ -2258,10 +1999,7 @@
                               "type": "string"
                             }
                           },
-                          "required": [
-                            "type",
-                            "value"
-                          ],
+                          "required": ["type", "value"],
                           "additionalProperties": false
                         },
                         {
@@ -2291,15 +2029,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "is-json",
-                                "is_json"
-                              ]
+                              "enum": ["is-json", "is_json"]
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -2335,10 +2068,7 @@
                               "type": "string"
                             }
                           },
-                          "required": [
-                            "type",
-                            "value"
-                          ],
+                          "required": ["type", "value"],
                           "additionalProperties": false
                         },
                         {
@@ -2419,10 +2149,7 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": [
-                                        "score_range",
-                                        "outcome"
-                                      ],
+                                      "required": ["score_range", "outcome"],
                                       "additionalProperties": false
                                     }
                                   }
@@ -2432,10 +2159,7 @@
                               "minItems": 1
                             }
                           },
-                          "required": [
-                            "type",
-                            "criteria"
-                          ],
+                          "required": ["type", "criteria"],
                           "additionalProperties": false
                         }
                       ]
@@ -2472,12 +2196,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "code-grader",
-                                "code_grader",
-                                "code-judge",
-                                "code_judge"
-                              ]
+                              "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
                             },
                             "command": {
                               "anyOf": [
@@ -2529,10 +2248,7 @@
                               "additionalProperties": {}
                             }
                           },
-                          "required": [
-                            "type",
-                            "command"
-                          ],
+                          "required": ["type", "command"],
                           "additionalProperties": false
                         },
                         {
@@ -2562,12 +2278,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "llm-grader",
-                                "llm_grader",
-                                "llm-judge",
-                                "llm_judge"
-                              ]
+                              "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
                             },
                             "prompt": {
                               "anyOf": [
@@ -2661,10 +2372,7 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": [
-                                        "score_range",
-                                        "outcome"
-                                      ],
+                                      "required": ["score_range", "outcome"],
                                       "additionalProperties": false
                                     }
                                   }
@@ -2693,9 +2401,7 @@
                               "maximum": 2
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -2755,9 +2461,7 @@
                                       }
                                     }
                                   },
-                                  "required": [
-                                    "type"
-                                  ],
+                                  "required": ["type"],
                                   "additionalProperties": false
                                 },
                                 {
@@ -2773,10 +2477,7 @@
                                       "maximum": 1
                                     }
                                   },
-                                  "required": [
-                                    "type",
-                                    "threshold"
-                                  ],
+                                  "required": ["type", "threshold"],
                                   "additionalProperties": false
                                 },
                                 {
@@ -2793,10 +2494,7 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": [
-                                    "type",
-                                    "path"
-                                  ],
+                                  "required": ["type", "path"],
                                   "additionalProperties": false
                                 },
                                 {
@@ -2813,18 +2511,13 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": [
-                                    "type"
-                                  ],
+                                  "required": ["type"],
                                   "additionalProperties": false
                                 }
                               ]
                             }
                           },
-                          "required": [
-                            "type",
-                            "aggregator"
-                          ],
+                          "required": ["type", "aggregator"],
                           "additionalProperties": false
                         },
                         {
@@ -2854,20 +2547,11 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "tool-trajectory",
-                                "tool_trajectory"
-                              ]
+                              "enum": ["tool-trajectory", "tool_trajectory"]
                             },
                             "mode": {
                               "type": "string",
-                              "enum": [
-                                "any_order",
-                                "in_order",
-                                "exact",
-                                "subset",
-                                "superset"
-                              ]
+                              "enum": ["any_order", "in_order", "exact", "subset", "superset"]
                             },
                             "minimums": {
                               "type": "object",
@@ -2908,12 +2592,7 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": [
-                                          "exact",
-                                          "ignore",
-                                          "subset",
-                                          "superset"
-                                        ]
+                                        "enum": ["exact", "ignore", "subset", "superset"]
                                       },
                                       {
                                         "type": "array",
@@ -2927,12 +2606,7 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": [
-                                          "exact",
-                                          "ignore",
-                                          "subset",
-                                          "superset"
-                                        ]
+                                        "enum": ["exact", "ignore", "subset", "superset"]
                                       },
                                       {
                                         "type": "array",
@@ -2943,9 +2617,7 @@
                                     ]
                                   }
                                 },
-                                "required": [
-                                  "tool"
-                                ],
+                                "required": ["tool"],
                                 "additionalProperties": false
                               }
                             },
@@ -2953,12 +2625,7 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": [
-                                    "exact",
-                                    "ignore",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["exact", "ignore", "subset", "superset"]
                                 },
                                 {
                                   "type": "array",
@@ -2972,12 +2639,7 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": [
-                                    "exact",
-                                    "ignore",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["exact", "ignore", "subset", "superset"]
                                 },
                                 {
                                   "type": "array",
@@ -2988,10 +2650,7 @@
                               ]
                             }
                           },
-                          "required": [
-                            "type",
-                            "mode"
-                          ],
+                          "required": ["type", "mode"],
                           "additionalProperties": false
                         },
                         {
@@ -3021,10 +2680,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "field-accuracy",
-                                "field_accuracy"
-                              ]
+                              "enum": ["field-accuracy", "field_accuracy"]
                             },
                             "fields": {
                               "type": "array",
@@ -3036,11 +2692,7 @@
                                   },
                                   "match": {
                                     "type": "string",
-                                    "enum": [
-                                      "exact",
-                                      "numeric_tolerance",
-                                      "date"
-                                    ]
+                                    "enum": ["exact", "numeric_tolerance", "date"]
                                   },
                                   "required": {
                                     "type": "boolean"
@@ -3062,26 +2714,17 @@
                                     }
                                   }
                                 },
-                                "required": [
-                                  "path",
-                                  "match"
-                                ],
+                                "required": ["path", "match"],
                                 "additionalProperties": false
                               },
                               "minItems": 1
                             },
                             "aggregation": {
                               "type": "string",
-                              "enum": [
-                                "weighted_average",
-                                "all_or_nothing"
-                              ]
+                              "enum": ["weighted_average", "all_or_nothing"]
                             }
                           },
-                          "required": [
-                            "type",
-                            "fields"
-                          ],
+                          "required": ["type", "fields"],
                           "additionalProperties": false
                         },
                         {
@@ -3118,10 +2761,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type",
-                            "threshold"
-                          ],
+                          "required": ["type", "threshold"],
                           "additionalProperties": false
                         },
                         {
@@ -3158,10 +2798,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type",
-                            "budget"
-                          ],
+                          "required": ["type", "budget"],
                           "additionalProperties": false
                         },
                         {
@@ -3191,10 +2828,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "token-usage",
-                                "token_usage"
-                              ]
+                              "enum": ["token-usage", "token_usage"]
                             },
                             "max_total": {
                               "type": "number",
@@ -3209,9 +2843,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -3241,10 +2873,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "execution-metrics",
-                                "execution_metrics"
-                              ]
+                              "enum": ["execution-metrics", "execution_metrics"]
                             },
                             "max_tool_calls": {
                               "type": "number",
@@ -3276,9 +2905,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -3314,10 +2941,7 @@
                               "type": "string"
                             }
                           },
-                          "required": [
-                            "type",
-                            "value"
-                          ],
+                          "required": ["type", "value"],
                           "additionalProperties": false
                         },
                         {
@@ -3353,10 +2977,7 @@
                               "type": "string"
                             }
                           },
-                          "required": [
-                            "type",
-                            "value"
-                          ],
+                          "required": ["type", "value"],
                           "additionalProperties": false
                         },
                         {
@@ -3386,15 +3007,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "is-json",
-                                "is_json"
-                              ]
+                              "enum": ["is-json", "is_json"]
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -3430,10 +3046,7 @@
                               "type": "string"
                             }
                           },
-                          "required": [
-                            "type",
-                            "value"
-                          ],
+                          "required": ["type", "value"],
                           "additionalProperties": false
                         },
                         {
@@ -3514,10 +3127,7 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": [
-                                        "score_range",
-                                        "outcome"
-                                      ],
+                                      "required": ["score_range", "outcome"],
                                       "additionalProperties": false
                                     }
                                   }
@@ -3527,10 +3137,7 @@
                               "minItems": 1
                             }
                           },
-                          "required": [
-                            "type",
-                            "criteria"
-                          ],
+                          "required": ["type", "criteria"],
                           "additionalProperties": false
                         }
                       ]
@@ -3584,12 +3191,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "code-grader",
-                                    "code_grader",
-                                    "code-judge",
-                                    "code_judge"
-                                  ]
+                                  "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
                                 },
                                 "command": {
                                   "anyOf": [
@@ -3641,10 +3243,7 @@
                                   "additionalProperties": {}
                                 }
                               },
-                              "required": [
-                                "type",
-                                "command"
-                              ],
+                              "required": ["type", "command"],
                               "additionalProperties": false
                             },
                             {
@@ -3674,12 +3273,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "llm-grader",
-                                    "llm_grader",
-                                    "llm-judge",
-                                    "llm_judge"
-                                  ]
+                                  "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
                                 },
                                 "prompt": {
                                   "anyOf": [
@@ -3773,10 +3367,7 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": [
-                                            "score_range",
-                                            "outcome"
-                                          ],
+                                          "required": ["score_range", "outcome"],
                                           "additionalProperties": false
                                         }
                                       }
@@ -3805,9 +3396,7 @@
                                   "maximum": 2
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -3867,9 +3456,7 @@
                                           }
                                         }
                                       },
-                                      "required": [
-                                        "type"
-                                      ],
+                                      "required": ["type"],
                                       "additionalProperties": false
                                     },
                                     {
@@ -3885,10 +3472,7 @@
                                           "maximum": 1
                                         }
                                       },
-                                      "required": [
-                                        "type",
-                                        "threshold"
-                                      ],
+                                      "required": ["type", "threshold"],
                                       "additionalProperties": false
                                     },
                                     {
@@ -3905,10 +3489,7 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": [
-                                        "type",
-                                        "path"
-                                      ],
+                                      "required": ["type", "path"],
                                       "additionalProperties": false
                                     },
                                     {
@@ -3925,18 +3506,13 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": [
-                                        "type"
-                                      ],
+                                      "required": ["type"],
                                       "additionalProperties": false
                                     }
                                   ]
                                 }
                               },
-                              "required": [
-                                "type",
-                                "aggregator"
-                              ],
+                              "required": ["type", "aggregator"],
                               "additionalProperties": false
                             },
                             {
@@ -3966,20 +3542,11 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "tool-trajectory",
-                                    "tool_trajectory"
-                                  ]
+                                  "enum": ["tool-trajectory", "tool_trajectory"]
                                 },
                                 "mode": {
                                   "type": "string",
-                                  "enum": [
-                                    "any_order",
-                                    "in_order",
-                                    "exact",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["any_order", "in_order", "exact", "subset", "superset"]
                                 },
                                 "minimums": {
                                   "type": "object",
@@ -4020,12 +3587,7 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": [
-                                              "exact",
-                                              "ignore",
-                                              "subset",
-                                              "superset"
-                                            ]
+                                            "enum": ["exact", "ignore", "subset", "superset"]
                                           },
                                           {
                                             "type": "array",
@@ -4039,12 +3601,7 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": [
-                                              "exact",
-                                              "ignore",
-                                              "subset",
-                                              "superset"
-                                            ]
+                                            "enum": ["exact", "ignore", "subset", "superset"]
                                           },
                                           {
                                             "type": "array",
@@ -4055,9 +3612,7 @@
                                         ]
                                       }
                                     },
-                                    "required": [
-                                      "tool"
-                                    ],
+                                    "required": ["tool"],
                                     "additionalProperties": false
                                   }
                                 },
@@ -4065,12 +3620,7 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": [
-                                        "exact",
-                                        "ignore",
-                                        "subset",
-                                        "superset"
-                                      ]
+                                      "enum": ["exact", "ignore", "subset", "superset"]
                                     },
                                     {
                                       "type": "array",
@@ -4084,12 +3634,7 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": [
-                                        "exact",
-                                        "ignore",
-                                        "subset",
-                                        "superset"
-                                      ]
+                                      "enum": ["exact", "ignore", "subset", "superset"]
                                     },
                                     {
                                       "type": "array",
@@ -4100,10 +3645,7 @@
                                   ]
                                 }
                               },
-                              "required": [
-                                "type",
-                                "mode"
-                              ],
+                              "required": ["type", "mode"],
                               "additionalProperties": false
                             },
                             {
@@ -4133,10 +3675,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "field-accuracy",
-                                    "field_accuracy"
-                                  ]
+                                  "enum": ["field-accuracy", "field_accuracy"]
                                 },
                                 "fields": {
                                   "type": "array",
@@ -4148,11 +3687,7 @@
                                       },
                                       "match": {
                                         "type": "string",
-                                        "enum": [
-                                          "exact",
-                                          "numeric_tolerance",
-                                          "date"
-                                        ]
+                                        "enum": ["exact", "numeric_tolerance", "date"]
                                       },
                                       "required": {
                                         "type": "boolean"
@@ -4174,26 +3709,17 @@
                                         }
                                       }
                                     },
-                                    "required": [
-                                      "path",
-                                      "match"
-                                    ],
+                                    "required": ["path", "match"],
                                     "additionalProperties": false
                                   },
                                   "minItems": 1
                                 },
                                 "aggregation": {
                                   "type": "string",
-                                  "enum": [
-                                    "weighted_average",
-                                    "all_or_nothing"
-                                  ]
+                                  "enum": ["weighted_average", "all_or_nothing"]
                                 }
                               },
-                              "required": [
-                                "type",
-                                "fields"
-                              ],
+                              "required": ["type", "fields"],
                               "additionalProperties": false
                             },
                             {
@@ -4230,10 +3756,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type",
-                                "threshold"
-                              ],
+                              "required": ["type", "threshold"],
                               "additionalProperties": false
                             },
                             {
@@ -4270,10 +3793,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type",
-                                "budget"
-                              ],
+                              "required": ["type", "budget"],
                               "additionalProperties": false
                             },
                             {
@@ -4303,10 +3823,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "token-usage",
-                                    "token_usage"
-                                  ]
+                                  "enum": ["token-usage", "token_usage"]
                                 },
                                 "max_total": {
                                   "type": "number",
@@ -4321,9 +3838,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -4353,10 +3868,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "execution-metrics",
-                                    "execution_metrics"
-                                  ]
+                                  "enum": ["execution-metrics", "execution_metrics"]
                                 },
                                 "max_tool_calls": {
                                   "type": "number",
@@ -4388,9 +3900,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -4426,10 +3936,7 @@
                                   "type": "string"
                                 }
                               },
-                              "required": [
-                                "type",
-                                "value"
-                              ],
+                              "required": ["type", "value"],
                               "additionalProperties": false
                             },
                             {
@@ -4465,10 +3972,7 @@
                                   "type": "string"
                                 }
                               },
-                              "required": [
-                                "type",
-                                "value"
-                              ],
+                              "required": ["type", "value"],
                               "additionalProperties": false
                             },
                             {
@@ -4498,15 +4002,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "is-json",
-                                    "is_json"
-                                  ]
+                                  "enum": ["is-json", "is_json"]
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -4542,10 +4041,7 @@
                                   "type": "string"
                                 }
                               },
-                              "required": [
-                                "type",
-                                "value"
-                              ],
+                              "required": ["type", "value"],
                               "additionalProperties": false
                             },
                             {
@@ -4626,10 +4122,7 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": [
-                                            "score_range",
-                                            "outcome"
-                                          ],
+                                          "required": ["score_range", "outcome"],
                                           "additionalProperties": false
                                         }
                                       }
@@ -4639,10 +4132,7 @@
                                   "minItems": 1
                                 }
                               },
-                              "required": [
-                                "type",
-                                "criteria"
-                              ],
+                              "required": ["type", "criteria"],
                               "additionalProperties": false
                             }
                           ]
@@ -4679,12 +4169,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "code-grader",
-                                    "code_grader",
-                                    "code-judge",
-                                    "code_judge"
-                                  ]
+                                  "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
                                 },
                                 "command": {
                                   "anyOf": [
@@ -4736,10 +4221,7 @@
                                   "additionalProperties": {}
                                 }
                               },
-                              "required": [
-                                "type",
-                                "command"
-                              ],
+                              "required": ["type", "command"],
                               "additionalProperties": false
                             },
                             {
@@ -4769,12 +4251,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "llm-grader",
-                                    "llm_grader",
-                                    "llm-judge",
-                                    "llm_judge"
-                                  ]
+                                  "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
                                 },
                                 "prompt": {
                                   "anyOf": [
@@ -4868,10 +4345,7 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": [
-                                            "score_range",
-                                            "outcome"
-                                          ],
+                                          "required": ["score_range", "outcome"],
                                           "additionalProperties": false
                                         }
                                       }
@@ -4900,9 +4374,7 @@
                                   "maximum": 2
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -4962,9 +4434,7 @@
                                           }
                                         }
                                       },
-                                      "required": [
-                                        "type"
-                                      ],
+                                      "required": ["type"],
                                       "additionalProperties": false
                                     },
                                     {
@@ -4980,10 +4450,7 @@
                                           "maximum": 1
                                         }
                                       },
-                                      "required": [
-                                        "type",
-                                        "threshold"
-                                      ],
+                                      "required": ["type", "threshold"],
                                       "additionalProperties": false
                                     },
                                     {
@@ -5000,10 +4467,7 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": [
-                                        "type",
-                                        "path"
-                                      ],
+                                      "required": ["type", "path"],
                                       "additionalProperties": false
                                     },
                                     {
@@ -5020,18 +4484,13 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": [
-                                        "type"
-                                      ],
+                                      "required": ["type"],
                                       "additionalProperties": false
                                     }
                                   ]
                                 }
                               },
-                              "required": [
-                                "type",
-                                "aggregator"
-                              ],
+                              "required": ["type", "aggregator"],
                               "additionalProperties": false
                             },
                             {
@@ -5061,20 +4520,11 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "tool-trajectory",
-                                    "tool_trajectory"
-                                  ]
+                                  "enum": ["tool-trajectory", "tool_trajectory"]
                                 },
                                 "mode": {
                                   "type": "string",
-                                  "enum": [
-                                    "any_order",
-                                    "in_order",
-                                    "exact",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["any_order", "in_order", "exact", "subset", "superset"]
                                 },
                                 "minimums": {
                                   "type": "object",
@@ -5115,12 +4565,7 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": [
-                                              "exact",
-                                              "ignore",
-                                              "subset",
-                                              "superset"
-                                            ]
+                                            "enum": ["exact", "ignore", "subset", "superset"]
                                           },
                                           {
                                             "type": "array",
@@ -5134,12 +4579,7 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": [
-                                              "exact",
-                                              "ignore",
-                                              "subset",
-                                              "superset"
-                                            ]
+                                            "enum": ["exact", "ignore", "subset", "superset"]
                                           },
                                           {
                                             "type": "array",
@@ -5150,9 +4590,7 @@
                                         ]
                                       }
                                     },
-                                    "required": [
-                                      "tool"
-                                    ],
+                                    "required": ["tool"],
                                     "additionalProperties": false
                                   }
                                 },
@@ -5160,12 +4598,7 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": [
-                                        "exact",
-                                        "ignore",
-                                        "subset",
-                                        "superset"
-                                      ]
+                                      "enum": ["exact", "ignore", "subset", "superset"]
                                     },
                                     {
                                       "type": "array",
@@ -5179,12 +4612,7 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": [
-                                        "exact",
-                                        "ignore",
-                                        "subset",
-                                        "superset"
-                                      ]
+                                      "enum": ["exact", "ignore", "subset", "superset"]
                                     },
                                     {
                                       "type": "array",
@@ -5195,10 +4623,7 @@
                                   ]
                                 }
                               },
-                              "required": [
-                                "type",
-                                "mode"
-                              ],
+                              "required": ["type", "mode"],
                               "additionalProperties": false
                             },
                             {
@@ -5228,10 +4653,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "field-accuracy",
-                                    "field_accuracy"
-                                  ]
+                                  "enum": ["field-accuracy", "field_accuracy"]
                                 },
                                 "fields": {
                                   "type": "array",
@@ -5243,11 +4665,7 @@
                                       },
                                       "match": {
                                         "type": "string",
-                                        "enum": [
-                                          "exact",
-                                          "numeric_tolerance",
-                                          "date"
-                                        ]
+                                        "enum": ["exact", "numeric_tolerance", "date"]
                                       },
                                       "required": {
                                         "type": "boolean"
@@ -5269,26 +4687,17 @@
                                         }
                                       }
                                     },
-                                    "required": [
-                                      "path",
-                                      "match"
-                                    ],
+                                    "required": ["path", "match"],
                                     "additionalProperties": false
                                   },
                                   "minItems": 1
                                 },
                                 "aggregation": {
                                   "type": "string",
-                                  "enum": [
-                                    "weighted_average",
-                                    "all_or_nothing"
-                                  ]
+                                  "enum": ["weighted_average", "all_or_nothing"]
                                 }
                               },
-                              "required": [
-                                "type",
-                                "fields"
-                              ],
+                              "required": ["type", "fields"],
                               "additionalProperties": false
                             },
                             {
@@ -5325,10 +4734,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type",
-                                "threshold"
-                              ],
+                              "required": ["type", "threshold"],
                               "additionalProperties": false
                             },
                             {
@@ -5365,10 +4771,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type",
-                                "budget"
-                              ],
+                              "required": ["type", "budget"],
                               "additionalProperties": false
                             },
                             {
@@ -5398,10 +4801,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "token-usage",
-                                    "token_usage"
-                                  ]
+                                  "enum": ["token-usage", "token_usage"]
                                 },
                                 "max_total": {
                                   "type": "number",
@@ -5416,9 +4816,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -5448,10 +4846,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "execution-metrics",
-                                    "execution_metrics"
-                                  ]
+                                  "enum": ["execution-metrics", "execution_metrics"]
                                 },
                                 "max_tool_calls": {
                                   "type": "number",
@@ -5483,9 +4878,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -5521,10 +4914,7 @@
                                   "type": "string"
                                 }
                               },
-                              "required": [
-                                "type",
-                                "value"
-                              ],
+                              "required": ["type", "value"],
                               "additionalProperties": false
                             },
                             {
@@ -5560,10 +4950,7 @@
                                   "type": "string"
                                 }
                               },
-                              "required": [
-                                "type",
-                                "value"
-                              ],
+                              "required": ["type", "value"],
                               "additionalProperties": false
                             },
                             {
@@ -5593,15 +4980,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "is-json",
-                                    "is_json"
-                                  ]
+                                  "enum": ["is-json", "is_json"]
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -5637,10 +5019,7 @@
                                   "type": "string"
                                 }
                               },
-                              "required": [
-                                "type",
-                                "value"
-                              ],
+                              "required": ["type", "value"],
                               "additionalProperties": false
                             },
                             {
@@ -5721,10 +5100,7 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": [
-                                            "score_range",
-                                            "outcome"
-                                          ],
+                                          "required": ["score_range", "outcome"],
                                           "additionalProperties": false
                                         }
                                       }
@@ -5734,10 +5110,7 @@
                                   "minItems": 1
                                 }
                               },
-                              "required": [
-                                "type",
-                                "criteria"
-                              ],
+                              "required": ["type", "criteria"],
                               "additionalProperties": false
                             }
                           ]
@@ -5774,12 +5147,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "code-grader",
-                                    "code_grader",
-                                    "code-judge",
-                                    "code_judge"
-                                  ]
+                                  "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
                                 },
                                 "command": {
                                   "anyOf": [
@@ -5831,10 +5199,7 @@
                                   "additionalProperties": {}
                                 }
                               },
-                              "required": [
-                                "type",
-                                "command"
-                              ],
+                              "required": ["type", "command"],
                               "additionalProperties": false
                             },
                             {
@@ -5864,12 +5229,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "llm-grader",
-                                    "llm_grader",
-                                    "llm-judge",
-                                    "llm_judge"
-                                  ]
+                                  "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
                                 },
                                 "prompt": {
                                   "anyOf": [
@@ -5963,10 +5323,7 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": [
-                                            "score_range",
-                                            "outcome"
-                                          ],
+                                          "required": ["score_range", "outcome"],
                                           "additionalProperties": false
                                         }
                                       }
@@ -5995,9 +5352,7 @@
                                   "maximum": 2
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -6057,9 +5412,7 @@
                                           }
                                         }
                                       },
-                                      "required": [
-                                        "type"
-                                      ],
+                                      "required": ["type"],
                                       "additionalProperties": false
                                     },
                                     {
@@ -6075,10 +5428,7 @@
                                           "maximum": 1
                                         }
                                       },
-                                      "required": [
-                                        "type",
-                                        "threshold"
-                                      ],
+                                      "required": ["type", "threshold"],
                                       "additionalProperties": false
                                     },
                                     {
@@ -6095,10 +5445,7 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": [
-                                        "type",
-                                        "path"
-                                      ],
+                                      "required": ["type", "path"],
                                       "additionalProperties": false
                                     },
                                     {
@@ -6115,18 +5462,13 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": [
-                                        "type"
-                                      ],
+                                      "required": ["type"],
                                       "additionalProperties": false
                                     }
                                   ]
                                 }
                               },
-                              "required": [
-                                "type",
-                                "aggregator"
-                              ],
+                              "required": ["type", "aggregator"],
                               "additionalProperties": false
                             },
                             {
@@ -6156,20 +5498,11 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "tool-trajectory",
-                                    "tool_trajectory"
-                                  ]
+                                  "enum": ["tool-trajectory", "tool_trajectory"]
                                 },
                                 "mode": {
                                   "type": "string",
-                                  "enum": [
-                                    "any_order",
-                                    "in_order",
-                                    "exact",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["any_order", "in_order", "exact", "subset", "superset"]
                                 },
                                 "minimums": {
                                   "type": "object",
@@ -6210,12 +5543,7 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": [
-                                              "exact",
-                                              "ignore",
-                                              "subset",
-                                              "superset"
-                                            ]
+                                            "enum": ["exact", "ignore", "subset", "superset"]
                                           },
                                           {
                                             "type": "array",
@@ -6229,12 +5557,7 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": [
-                                              "exact",
-                                              "ignore",
-                                              "subset",
-                                              "superset"
-                                            ]
+                                            "enum": ["exact", "ignore", "subset", "superset"]
                                           },
                                           {
                                             "type": "array",
@@ -6245,9 +5568,7 @@
                                         ]
                                       }
                                     },
-                                    "required": [
-                                      "tool"
-                                    ],
+                                    "required": ["tool"],
                                     "additionalProperties": false
                                   }
                                 },
@@ -6255,12 +5576,7 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": [
-                                        "exact",
-                                        "ignore",
-                                        "subset",
-                                        "superset"
-                                      ]
+                                      "enum": ["exact", "ignore", "subset", "superset"]
                                     },
                                     {
                                       "type": "array",
@@ -6274,12 +5590,7 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": [
-                                        "exact",
-                                        "ignore",
-                                        "subset",
-                                        "superset"
-                                      ]
+                                      "enum": ["exact", "ignore", "subset", "superset"]
                                     },
                                     {
                                       "type": "array",
@@ -6290,10 +5601,7 @@
                                   ]
                                 }
                               },
-                              "required": [
-                                "type",
-                                "mode"
-                              ],
+                              "required": ["type", "mode"],
                               "additionalProperties": false
                             },
                             {
@@ -6323,10 +5631,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "field-accuracy",
-                                    "field_accuracy"
-                                  ]
+                                  "enum": ["field-accuracy", "field_accuracy"]
                                 },
                                 "fields": {
                                   "type": "array",
@@ -6338,11 +5643,7 @@
                                       },
                                       "match": {
                                         "type": "string",
-                                        "enum": [
-                                          "exact",
-                                          "numeric_tolerance",
-                                          "date"
-                                        ]
+                                        "enum": ["exact", "numeric_tolerance", "date"]
                                       },
                                       "required": {
                                         "type": "boolean"
@@ -6364,26 +5665,17 @@
                                         }
                                       }
                                     },
-                                    "required": [
-                                      "path",
-                                      "match"
-                                    ],
+                                    "required": ["path", "match"],
                                     "additionalProperties": false
                                   },
                                   "minItems": 1
                                 },
                                 "aggregation": {
                                   "type": "string",
-                                  "enum": [
-                                    "weighted_average",
-                                    "all_or_nothing"
-                                  ]
+                                  "enum": ["weighted_average", "all_or_nothing"]
                                 }
                               },
-                              "required": [
-                                "type",
-                                "fields"
-                              ],
+                              "required": ["type", "fields"],
                               "additionalProperties": false
                             },
                             {
@@ -6420,10 +5712,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type",
-                                "threshold"
-                              ],
+                              "required": ["type", "threshold"],
                               "additionalProperties": false
                             },
                             {
@@ -6460,10 +5749,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type",
-                                "budget"
-                              ],
+                              "required": ["type", "budget"],
                               "additionalProperties": false
                             },
                             {
@@ -6493,10 +5779,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "token-usage",
-                                    "token_usage"
-                                  ]
+                                  "enum": ["token-usage", "token_usage"]
                                 },
                                 "max_total": {
                                   "type": "number",
@@ -6511,9 +5794,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -6543,10 +5824,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "execution-metrics",
-                                    "execution_metrics"
-                                  ]
+                                  "enum": ["execution-metrics", "execution_metrics"]
                                 },
                                 "max_tool_calls": {
                                   "type": "number",
@@ -6578,9 +5856,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -6616,10 +5892,7 @@
                                   "type": "string"
                                 }
                               },
-                              "required": [
-                                "type",
-                                "value"
-                              ],
+                              "required": ["type", "value"],
                               "additionalProperties": false
                             },
                             {
@@ -6655,10 +5928,7 @@
                                   "type": "string"
                                 }
                               },
-                              "required": [
-                                "type",
-                                "value"
-                              ],
+                              "required": ["type", "value"],
                               "additionalProperties": false
                             },
                             {
@@ -6688,15 +5958,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "is-json",
-                                    "is_json"
-                                  ]
+                                  "enum": ["is-json", "is_json"]
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -6732,10 +5997,7 @@
                                   "type": "string"
                                 }
                               },
-                              "required": [
-                                "type",
-                                "value"
-                              ],
+                              "required": ["type", "value"],
                               "additionalProperties": false
                             },
                             {
@@ -6816,10 +6078,7 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": [
-                                            "score_range",
-                                            "outcome"
-                                          ],
+                                          "required": ["score_range", "outcome"],
                                           "additionalProperties": false
                                         }
                                       }
@@ -6829,10 +6088,7 @@
                                   "minItems": 1
                                 }
                               },
-                              "required": [
-                                "type",
-                                "criteria"
-                              ],
+                              "required": ["type", "criteria"],
                               "additionalProperties": false
                             }
                           ]
@@ -6853,11 +6109,7 @@
                           },
                           "strategy": {
                             "type": "string",
-                            "enum": [
-                              "pass_at_k",
-                              "mean",
-                              "confidence_interval"
-                            ]
+                            "enum": ["pass_at_k", "mean", "confidence_interval"]
                           },
                           "cost_limit_usd": {
                             "type": "number",
@@ -6868,9 +6120,7 @@
                             "minimum": 0
                           }
                         },
-                        "required": [
-                          "count"
-                        ],
+                        "required": ["count"],
                         "additionalProperties": false
                       },
                       "total_budget_usd": {
@@ -6903,10 +6153,7 @@
                       },
                       "isolation": {
                         "type": "string",
-                        "enum": [
-                          "shared",
-                          "per_test"
-                        ]
+                        "enum": ["shared", "per_test"]
                       },
                       "repos": {
                         "type": "array",
@@ -6930,10 +6177,7 @@
                                       "format": "uri"
                                     }
                                   },
-                                  "required": [
-                                    "type",
-                                    "url"
-                                  ],
+                                  "required": ["type", "url"],
                                   "additionalProperties": false
                                 },
                                 {
@@ -6947,10 +6191,7 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": [
-                                    "type",
-                                    "path"
-                                  ],
+                                  "required": ["type", "path"],
                                   "additionalProperties": false
                                 }
                               ]
@@ -6963,10 +6204,7 @@
                                 },
                                 "resolve": {
                                   "type": "string",
-                                  "enum": [
-                                    "remote",
-                                    "local"
-                                  ]
+                                  "enum": ["remote", "local"]
                                 },
                                 "ancestor": {
                                   "type": "integer",
@@ -6995,10 +6233,7 @@
                               "additionalProperties": false
                             }
                           },
-                          "required": [
-                            "path",
-                            "source"
-                          ],
+                          "required": ["path", "source"],
                           "additionalProperties": false
                         }
                       },
@@ -7034,11 +6269,7 @@
                               },
                               "reset": {
                                 "type": "string",
-                                "enum": [
-                                  "none",
-                                  "fast",
-                                  "strict"
-                                ]
+                                "enum": ["none", "fast", "strict"]
                               }
                             },
                             "additionalProperties": false
@@ -7069,11 +6300,7 @@
                               },
                               "reset": {
                                 "type": "string",
-                                "enum": [
-                                  "none",
-                                  "fast",
-                                  "strict"
-                                ]
+                                "enum": ["none", "fast", "strict"]
                               }
                             },
                             "additionalProperties": false
@@ -7104,11 +6331,7 @@
                               },
                               "reset": {
                                 "type": "string",
-                                "enum": [
-                                  "none",
-                                  "fast",
-                                  "strict"
-                                ]
+                                "enum": ["none", "fast", "strict"]
                               }
                             },
                             "additionalProperties": false
@@ -7139,11 +6362,7 @@
                               },
                               "reset": {
                                 "type": "string",
-                                "enum": [
-                                  "none",
-                                  "fast",
-                                  "strict"
-                                ]
+                                "enum": ["none", "fast", "strict"]
                               }
                             },
                             "additionalProperties": false
@@ -7153,11 +6372,7 @@
                       },
                       "mode": {
                         "type": "string",
-                        "enum": [
-                          "pooled",
-                          "temp",
-                          "static"
-                        ]
+                        "enum": ["pooled", "temp", "static"]
                       },
                       "path": {
                         "type": "string"
@@ -7179,9 +6394,7 @@
                     "type": "string"
                   }
                 },
-                "required": [
-                  "id"
-                ],
+                "required": ["id"],
                 "additionalProperties": false
               }
             },
@@ -7219,12 +6432,7 @@
                           "properties": {
                             "role": {
                               "type": "string",
-                              "enum": [
-                                "system",
-                                "user",
-                                "assistant",
-                                "tool"
-                              ]
+                              "enum": ["system", "user", "assistant", "tool"]
                             },
                             "content": {
                               "anyOf": [
@@ -7238,29 +6446,20 @@
                                     "properties": {
                                       "type": {
                                         "type": "string",
-                                        "enum": [
-                                          "text",
-                                          "file"
-                                        ]
+                                        "enum": ["text", "file"]
                                       },
                                       "value": {
                                         "type": "string"
                                       }
                                     },
-                                    "required": [
-                                      "type",
-                                      "value"
-                                    ],
+                                    "required": ["type", "value"],
                                     "additionalProperties": false
                                   }
                                 }
                               ]
                             }
                           },
-                          "required": [
-                            "role",
-                            "content"
-                          ],
+                          "required": ["role", "content"],
                           "additionalProperties": false
                         }
                       }
@@ -7288,12 +6487,7 @@
                           "properties": {
                             "role": {
                               "type": "string",
-                              "enum": [
-                                "system",
-                                "user",
-                                "assistant",
-                                "tool"
-                              ]
+                              "enum": ["system", "user", "assistant", "tool"]
                             },
                             "content": {
                               "anyOf": [
@@ -7307,29 +6501,20 @@
                                     "properties": {
                                       "type": {
                                         "type": "string",
-                                        "enum": [
-                                          "text",
-                                          "file"
-                                        ]
+                                        "enum": ["text", "file"]
                                       },
                                       "value": {
                                         "type": "string"
                                       }
                                     },
-                                    "required": [
-                                      "type",
-                                      "value"
-                                    ],
+                                    "required": ["type", "value"],
                                     "additionalProperties": false
                                   }
                                 }
                               ]
                             }
                           },
-                          "required": [
-                            "role",
-                            "content"
-                          ],
+                          "required": ["role", "content"],
                           "additionalProperties": false
                         }
                       }
@@ -7366,12 +6551,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "code-grader",
-                                "code_grader",
-                                "code-judge",
-                                "code_judge"
-                              ]
+                              "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
                             },
                             "command": {
                               "anyOf": [
@@ -7423,10 +6603,7 @@
                               "additionalProperties": {}
                             }
                           },
-                          "required": [
-                            "type",
-                            "command"
-                          ],
+                          "required": ["type", "command"],
                           "additionalProperties": false
                         },
                         {
@@ -7456,12 +6633,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "llm-grader",
-                                "llm_grader",
-                                "llm-judge",
-                                "llm_judge"
-                              ]
+                              "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
                             },
                             "prompt": {
                               "anyOf": [
@@ -7555,10 +6727,7 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": [
-                                        "score_range",
-                                        "outcome"
-                                      ],
+                                      "required": ["score_range", "outcome"],
                                       "additionalProperties": false
                                     }
                                   }
@@ -7587,9 +6756,7 @@
                               "maximum": 2
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -7649,9 +6816,7 @@
                                       }
                                     }
                                   },
-                                  "required": [
-                                    "type"
-                                  ],
+                                  "required": ["type"],
                                   "additionalProperties": false
                                 },
                                 {
@@ -7667,10 +6832,7 @@
                                       "maximum": 1
                                     }
                                   },
-                                  "required": [
-                                    "type",
-                                    "threshold"
-                                  ],
+                                  "required": ["type", "threshold"],
                                   "additionalProperties": false
                                 },
                                 {
@@ -7687,10 +6849,7 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": [
-                                    "type",
-                                    "path"
-                                  ],
+                                  "required": ["type", "path"],
                                   "additionalProperties": false
                                 },
                                 {
@@ -7707,18 +6866,13 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": [
-                                    "type"
-                                  ],
+                                  "required": ["type"],
                                   "additionalProperties": false
                                 }
                               ]
                             }
                           },
-                          "required": [
-                            "type",
-                            "aggregator"
-                          ],
+                          "required": ["type", "aggregator"],
                           "additionalProperties": false
                         },
                         {
@@ -7748,20 +6902,11 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "tool-trajectory",
-                                "tool_trajectory"
-                              ]
+                              "enum": ["tool-trajectory", "tool_trajectory"]
                             },
                             "mode": {
                               "type": "string",
-                              "enum": [
-                                "any_order",
-                                "in_order",
-                                "exact",
-                                "subset",
-                                "superset"
-                              ]
+                              "enum": ["any_order", "in_order", "exact", "subset", "superset"]
                             },
                             "minimums": {
                               "type": "object",
@@ -7802,12 +6947,7 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": [
-                                          "exact",
-                                          "ignore",
-                                          "subset",
-                                          "superset"
-                                        ]
+                                        "enum": ["exact", "ignore", "subset", "superset"]
                                       },
                                       {
                                         "type": "array",
@@ -7821,12 +6961,7 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": [
-                                          "exact",
-                                          "ignore",
-                                          "subset",
-                                          "superset"
-                                        ]
+                                        "enum": ["exact", "ignore", "subset", "superset"]
                                       },
                                       {
                                         "type": "array",
@@ -7837,9 +6972,7 @@
                                     ]
                                   }
                                 },
-                                "required": [
-                                  "tool"
-                                ],
+                                "required": ["tool"],
                                 "additionalProperties": false
                               }
                             },
@@ -7847,12 +6980,7 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": [
-                                    "exact",
-                                    "ignore",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["exact", "ignore", "subset", "superset"]
                                 },
                                 {
                                   "type": "array",
@@ -7866,12 +6994,7 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": [
-                                    "exact",
-                                    "ignore",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["exact", "ignore", "subset", "superset"]
                                 },
                                 {
                                   "type": "array",
@@ -7882,10 +7005,7 @@
                               ]
                             }
                           },
-                          "required": [
-                            "type",
-                            "mode"
-                          ],
+                          "required": ["type", "mode"],
                           "additionalProperties": false
                         },
                         {
@@ -7915,10 +7035,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "field-accuracy",
-                                "field_accuracy"
-                              ]
+                              "enum": ["field-accuracy", "field_accuracy"]
                             },
                             "fields": {
                               "type": "array",
@@ -7930,11 +7047,7 @@
                                   },
                                   "match": {
                                     "type": "string",
-                                    "enum": [
-                                      "exact",
-                                      "numeric_tolerance",
-                                      "date"
-                                    ]
+                                    "enum": ["exact", "numeric_tolerance", "date"]
                                   },
                                   "required": {
                                     "type": "boolean"
@@ -7956,26 +7069,17 @@
                                     }
                                   }
                                 },
-                                "required": [
-                                  "path",
-                                  "match"
-                                ],
+                                "required": ["path", "match"],
                                 "additionalProperties": false
                               },
                               "minItems": 1
                             },
                             "aggregation": {
                               "type": "string",
-                              "enum": [
-                                "weighted_average",
-                                "all_or_nothing"
-                              ]
+                              "enum": ["weighted_average", "all_or_nothing"]
                             }
                           },
-                          "required": [
-                            "type",
-                            "fields"
-                          ],
+                          "required": ["type", "fields"],
                           "additionalProperties": false
                         },
                         {
@@ -8012,10 +7116,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type",
-                            "threshold"
-                          ],
+                          "required": ["type", "threshold"],
                           "additionalProperties": false
                         },
                         {
@@ -8052,10 +7153,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type",
-                            "budget"
-                          ],
+                          "required": ["type", "budget"],
                           "additionalProperties": false
                         },
                         {
@@ -8085,10 +7183,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "token-usage",
-                                "token_usage"
-                              ]
+                              "enum": ["token-usage", "token_usage"]
                             },
                             "max_total": {
                               "type": "number",
@@ -8103,9 +7198,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -8135,10 +7228,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "execution-metrics",
-                                "execution_metrics"
-                              ]
+                              "enum": ["execution-metrics", "execution_metrics"]
                             },
                             "max_tool_calls": {
                               "type": "number",
@@ -8170,9 +7260,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -8208,10 +7296,7 @@
                               "type": "string"
                             }
                           },
-                          "required": [
-                            "type",
-                            "value"
-                          ],
+                          "required": ["type", "value"],
                           "additionalProperties": false
                         },
                         {
@@ -8247,10 +7332,7 @@
                               "type": "string"
                             }
                           },
-                          "required": [
-                            "type",
-                            "value"
-                          ],
+                          "required": ["type", "value"],
                           "additionalProperties": false
                         },
                         {
@@ -8280,15 +7362,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "is-json",
-                                "is_json"
-                              ]
+                              "enum": ["is-json", "is_json"]
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -8324,10 +7401,7 @@
                               "type": "string"
                             }
                           },
-                          "required": [
-                            "type",
-                            "value"
-                          ],
+                          "required": ["type", "value"],
                           "additionalProperties": false
                         },
                         {
@@ -8408,10 +7482,7 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": [
-                                        "score_range",
-                                        "outcome"
-                                      ],
+                                      "required": ["score_range", "outcome"],
                                       "additionalProperties": false
                                     }
                                   }
@@ -8421,10 +7492,7 @@
                               "minItems": 1
                             }
                           },
-                          "required": [
-                            "type",
-                            "criteria"
-                          ],
+                          "required": ["type", "criteria"],
                           "additionalProperties": false
                         }
                       ]
@@ -8461,12 +7529,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "code-grader",
-                                "code_grader",
-                                "code-judge",
-                                "code_judge"
-                              ]
+                              "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
                             },
                             "command": {
                               "anyOf": [
@@ -8518,10 +7581,7 @@
                               "additionalProperties": {}
                             }
                           },
-                          "required": [
-                            "type",
-                            "command"
-                          ],
+                          "required": ["type", "command"],
                           "additionalProperties": false
                         },
                         {
@@ -8551,12 +7611,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "llm-grader",
-                                "llm_grader",
-                                "llm-judge",
-                                "llm_judge"
-                              ]
+                              "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
                             },
                             "prompt": {
                               "anyOf": [
@@ -8650,10 +7705,7 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": [
-                                        "score_range",
-                                        "outcome"
-                                      ],
+                                      "required": ["score_range", "outcome"],
                                       "additionalProperties": false
                                     }
                                   }
@@ -8682,9 +7734,7 @@
                               "maximum": 2
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -8744,9 +7794,7 @@
                                       }
                                     }
                                   },
-                                  "required": [
-                                    "type"
-                                  ],
+                                  "required": ["type"],
                                   "additionalProperties": false
                                 },
                                 {
@@ -8762,10 +7810,7 @@
                                       "maximum": 1
                                     }
                                   },
-                                  "required": [
-                                    "type",
-                                    "threshold"
-                                  ],
+                                  "required": ["type", "threshold"],
                                   "additionalProperties": false
                                 },
                                 {
@@ -8782,10 +7827,7 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": [
-                                    "type",
-                                    "path"
-                                  ],
+                                  "required": ["type", "path"],
                                   "additionalProperties": false
                                 },
                                 {
@@ -8802,18 +7844,13 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": [
-                                    "type"
-                                  ],
+                                  "required": ["type"],
                                   "additionalProperties": false
                                 }
                               ]
                             }
                           },
-                          "required": [
-                            "type",
-                            "aggregator"
-                          ],
+                          "required": ["type", "aggregator"],
                           "additionalProperties": false
                         },
                         {
@@ -8843,20 +7880,11 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "tool-trajectory",
-                                "tool_trajectory"
-                              ]
+                              "enum": ["tool-trajectory", "tool_trajectory"]
                             },
                             "mode": {
                               "type": "string",
-                              "enum": [
-                                "any_order",
-                                "in_order",
-                                "exact",
-                                "subset",
-                                "superset"
-                              ]
+                              "enum": ["any_order", "in_order", "exact", "subset", "superset"]
                             },
                             "minimums": {
                               "type": "object",
@@ -8897,12 +7925,7 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": [
-                                          "exact",
-                                          "ignore",
-                                          "subset",
-                                          "superset"
-                                        ]
+                                        "enum": ["exact", "ignore", "subset", "superset"]
                                       },
                                       {
                                         "type": "array",
@@ -8916,12 +7939,7 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": [
-                                          "exact",
-                                          "ignore",
-                                          "subset",
-                                          "superset"
-                                        ]
+                                        "enum": ["exact", "ignore", "subset", "superset"]
                                       },
                                       {
                                         "type": "array",
@@ -8932,9 +7950,7 @@
                                     ]
                                   }
                                 },
-                                "required": [
-                                  "tool"
-                                ],
+                                "required": ["tool"],
                                 "additionalProperties": false
                               }
                             },
@@ -8942,12 +7958,7 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": [
-                                    "exact",
-                                    "ignore",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["exact", "ignore", "subset", "superset"]
                                 },
                                 {
                                   "type": "array",
@@ -8961,12 +7972,7 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": [
-                                    "exact",
-                                    "ignore",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["exact", "ignore", "subset", "superset"]
                                 },
                                 {
                                   "type": "array",
@@ -8977,10 +7983,7 @@
                               ]
                             }
                           },
-                          "required": [
-                            "type",
-                            "mode"
-                          ],
+                          "required": ["type", "mode"],
                           "additionalProperties": false
                         },
                         {
@@ -9010,10 +8013,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "field-accuracy",
-                                "field_accuracy"
-                              ]
+                              "enum": ["field-accuracy", "field_accuracy"]
                             },
                             "fields": {
                               "type": "array",
@@ -9025,11 +8025,7 @@
                                   },
                                   "match": {
                                     "type": "string",
-                                    "enum": [
-                                      "exact",
-                                      "numeric_tolerance",
-                                      "date"
-                                    ]
+                                    "enum": ["exact", "numeric_tolerance", "date"]
                                   },
                                   "required": {
                                     "type": "boolean"
@@ -9051,26 +8047,17 @@
                                     }
                                   }
                                 },
-                                "required": [
-                                  "path",
-                                  "match"
-                                ],
+                                "required": ["path", "match"],
                                 "additionalProperties": false
                               },
                               "minItems": 1
                             },
                             "aggregation": {
                               "type": "string",
-                              "enum": [
-                                "weighted_average",
-                                "all_or_nothing"
-                              ]
+                              "enum": ["weighted_average", "all_or_nothing"]
                             }
                           },
-                          "required": [
-                            "type",
-                            "fields"
-                          ],
+                          "required": ["type", "fields"],
                           "additionalProperties": false
                         },
                         {
@@ -9107,10 +8094,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type",
-                            "threshold"
-                          ],
+                          "required": ["type", "threshold"],
                           "additionalProperties": false
                         },
                         {
@@ -9147,10 +8131,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type",
-                            "budget"
-                          ],
+                          "required": ["type", "budget"],
                           "additionalProperties": false
                         },
                         {
@@ -9180,10 +8161,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "token-usage",
-                                "token_usage"
-                              ]
+                              "enum": ["token-usage", "token_usage"]
                             },
                             "max_total": {
                               "type": "number",
@@ -9198,9 +8176,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -9230,10 +8206,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "execution-metrics",
-                                "execution_metrics"
-                              ]
+                              "enum": ["execution-metrics", "execution_metrics"]
                             },
                             "max_tool_calls": {
                               "type": "number",
@@ -9265,9 +8238,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -9303,10 +8274,7 @@
                               "type": "string"
                             }
                           },
-                          "required": [
-                            "type",
-                            "value"
-                          ],
+                          "required": ["type", "value"],
                           "additionalProperties": false
                         },
                         {
@@ -9342,10 +8310,7 @@
                               "type": "string"
                             }
                           },
-                          "required": [
-                            "type",
-                            "value"
-                          ],
+                          "required": ["type", "value"],
                           "additionalProperties": false
                         },
                         {
@@ -9375,15 +8340,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "is-json",
-                                "is_json"
-                              ]
+                              "enum": ["is-json", "is_json"]
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -9419,10 +8379,7 @@
                               "type": "string"
                             }
                           },
-                          "required": [
-                            "type",
-                            "value"
-                          ],
+                          "required": ["type", "value"],
                           "additionalProperties": false
                         },
                         {
@@ -9503,10 +8460,7 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": [
-                                        "score_range",
-                                        "outcome"
-                                      ],
+                                      "required": ["score_range", "outcome"],
                                       "additionalProperties": false
                                     }
                                   }
@@ -9516,10 +8470,7 @@
                               "minItems": 1
                             }
                           },
-                          "required": [
-                            "type",
-                            "criteria"
-                          ],
+                          "required": ["type", "criteria"],
                           "additionalProperties": false
                         }
                       ]
@@ -9556,12 +8507,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "code-grader",
-                                "code_grader",
-                                "code-judge",
-                                "code_judge"
-                              ]
+                              "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
                             },
                             "command": {
                               "anyOf": [
@@ -9613,10 +8559,7 @@
                               "additionalProperties": {}
                             }
                           },
-                          "required": [
-                            "type",
-                            "command"
-                          ],
+                          "required": ["type", "command"],
                           "additionalProperties": false
                         },
                         {
@@ -9646,12 +8589,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "llm-grader",
-                                "llm_grader",
-                                "llm-judge",
-                                "llm_judge"
-                              ]
+                              "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
                             },
                             "prompt": {
                               "anyOf": [
@@ -9745,10 +8683,7 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": [
-                                        "score_range",
-                                        "outcome"
-                                      ],
+                                      "required": ["score_range", "outcome"],
                                       "additionalProperties": false
                                     }
                                   }
@@ -9777,9 +8712,7 @@
                               "maximum": 2
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -9839,9 +8772,7 @@
                                       }
                                     }
                                   },
-                                  "required": [
-                                    "type"
-                                  ],
+                                  "required": ["type"],
                                   "additionalProperties": false
                                 },
                                 {
@@ -9857,10 +8788,7 @@
                                       "maximum": 1
                                     }
                                   },
-                                  "required": [
-                                    "type",
-                                    "threshold"
-                                  ],
+                                  "required": ["type", "threshold"],
                                   "additionalProperties": false
                                 },
                                 {
@@ -9877,10 +8805,7 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": [
-                                    "type",
-                                    "path"
-                                  ],
+                                  "required": ["type", "path"],
                                   "additionalProperties": false
                                 },
                                 {
@@ -9897,18 +8822,13 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": [
-                                    "type"
-                                  ],
+                                  "required": ["type"],
                                   "additionalProperties": false
                                 }
                               ]
                             }
                           },
-                          "required": [
-                            "type",
-                            "aggregator"
-                          ],
+                          "required": ["type", "aggregator"],
                           "additionalProperties": false
                         },
                         {
@@ -9938,20 +8858,11 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "tool-trajectory",
-                                "tool_trajectory"
-                              ]
+                              "enum": ["tool-trajectory", "tool_trajectory"]
                             },
                             "mode": {
                               "type": "string",
-                              "enum": [
-                                "any_order",
-                                "in_order",
-                                "exact",
-                                "subset",
-                                "superset"
-                              ]
+                              "enum": ["any_order", "in_order", "exact", "subset", "superset"]
                             },
                             "minimums": {
                               "type": "object",
@@ -9992,12 +8903,7 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": [
-                                          "exact",
-                                          "ignore",
-                                          "subset",
-                                          "superset"
-                                        ]
+                                        "enum": ["exact", "ignore", "subset", "superset"]
                                       },
                                       {
                                         "type": "array",
@@ -10011,12 +8917,7 @@
                                     "anyOf": [
                                       {
                                         "type": "string",
-                                        "enum": [
-                                          "exact",
-                                          "ignore",
-                                          "subset",
-                                          "superset"
-                                        ]
+                                        "enum": ["exact", "ignore", "subset", "superset"]
                                       },
                                       {
                                         "type": "array",
@@ -10027,9 +8928,7 @@
                                     ]
                                   }
                                 },
-                                "required": [
-                                  "tool"
-                                ],
+                                "required": ["tool"],
                                 "additionalProperties": false
                               }
                             },
@@ -10037,12 +8936,7 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": [
-                                    "exact",
-                                    "ignore",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["exact", "ignore", "subset", "superset"]
                                 },
                                 {
                                   "type": "array",
@@ -10056,12 +8950,7 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": [
-                                    "exact",
-                                    "ignore",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["exact", "ignore", "subset", "superset"]
                                 },
                                 {
                                   "type": "array",
@@ -10072,10 +8961,7 @@
                               ]
                             }
                           },
-                          "required": [
-                            "type",
-                            "mode"
-                          ],
+                          "required": ["type", "mode"],
                           "additionalProperties": false
                         },
                         {
@@ -10105,10 +8991,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "field-accuracy",
-                                "field_accuracy"
-                              ]
+                              "enum": ["field-accuracy", "field_accuracy"]
                             },
                             "fields": {
                               "type": "array",
@@ -10120,11 +9003,7 @@
                                   },
                                   "match": {
                                     "type": "string",
-                                    "enum": [
-                                      "exact",
-                                      "numeric_tolerance",
-                                      "date"
-                                    ]
+                                    "enum": ["exact", "numeric_tolerance", "date"]
                                   },
                                   "required": {
                                     "type": "boolean"
@@ -10146,26 +9025,17 @@
                                     }
                                   }
                                 },
-                                "required": [
-                                  "path",
-                                  "match"
-                                ],
+                                "required": ["path", "match"],
                                 "additionalProperties": false
                               },
                               "minItems": 1
                             },
                             "aggregation": {
                               "type": "string",
-                              "enum": [
-                                "weighted_average",
-                                "all_or_nothing"
-                              ]
+                              "enum": ["weighted_average", "all_or_nothing"]
                             }
                           },
-                          "required": [
-                            "type",
-                            "fields"
-                          ],
+                          "required": ["type", "fields"],
                           "additionalProperties": false
                         },
                         {
@@ -10202,10 +9072,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type",
-                            "threshold"
-                          ],
+                          "required": ["type", "threshold"],
                           "additionalProperties": false
                         },
                         {
@@ -10242,10 +9109,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type",
-                            "budget"
-                          ],
+                          "required": ["type", "budget"],
                           "additionalProperties": false
                         },
                         {
@@ -10275,10 +9139,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "token-usage",
-                                "token_usage"
-                              ]
+                              "enum": ["token-usage", "token_usage"]
                             },
                             "max_total": {
                               "type": "number",
@@ -10293,9 +9154,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -10325,10 +9184,7 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "execution-metrics",
-                                "execution_metrics"
-                              ]
+                              "enum": ["execution-metrics", "execution_metrics"]
                             },
                             "max_tool_calls": {
                               "type": "number",
@@ -10360,9 +9216,7 @@
                               "minimum": 0
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -10398,10 +9252,7 @@
                               "type": "string"
                             }
                           },
-                          "required": [
-                            "type",
-                            "value"
-                          ],
+                          "required": ["type", "value"],
                           "additionalProperties": false
                         },
                         {
@@ -10437,10 +9288,7 @@
                               "type": "string"
                             }
                           },
-                          "required": [
-                            "type",
-                            "value"
-                          ],
+                          "required": ["type", "value"],
                           "additionalProperties": false
                         },
                         {
@@ -10470,15 +9318,10 @@
                             },
                             "type": {
                               "type": "string",
-                              "enum": [
-                                "is-json",
-                                "is_json"
-                              ]
+                              "enum": ["is-json", "is_json"]
                             }
                           },
-                          "required": [
-                            "type"
-                          ],
+                          "required": ["type"],
                           "additionalProperties": false
                         },
                         {
@@ -10514,10 +9357,7 @@
                               "type": "string"
                             }
                           },
-                          "required": [
-                            "type",
-                            "value"
-                          ],
+                          "required": ["type", "value"],
                           "additionalProperties": false
                         },
                         {
@@ -10598,10 +9438,7 @@
                                           "minLength": 1
                                         }
                                       },
-                                      "required": [
-                                        "score_range",
-                                        "outcome"
-                                      ],
+                                      "required": ["score_range", "outcome"],
                                       "additionalProperties": false
                                     }
                                   }
@@ -10611,10 +9448,7 @@
                               "minItems": 1
                             }
                           },
-                          "required": [
-                            "type",
-                            "criteria"
-                          ],
+                          "required": ["type", "criteria"],
                           "additionalProperties": false
                         }
                       ]
@@ -10668,12 +9502,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "code-grader",
-                                    "code_grader",
-                                    "code-judge",
-                                    "code_judge"
-                                  ]
+                                  "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
                                 },
                                 "command": {
                                   "anyOf": [
@@ -10725,10 +9554,7 @@
                                   "additionalProperties": {}
                                 }
                               },
-                              "required": [
-                                "type",
-                                "command"
-                              ],
+                              "required": ["type", "command"],
                               "additionalProperties": false
                             },
                             {
@@ -10758,12 +9584,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "llm-grader",
-                                    "llm_grader",
-                                    "llm-judge",
-                                    "llm_judge"
-                                  ]
+                                  "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
                                 },
                                 "prompt": {
                                   "anyOf": [
@@ -10857,10 +9678,7 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": [
-                                            "score_range",
-                                            "outcome"
-                                          ],
+                                          "required": ["score_range", "outcome"],
                                           "additionalProperties": false
                                         }
                                       }
@@ -10889,9 +9707,7 @@
                                   "maximum": 2
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -10951,9 +9767,7 @@
                                           }
                                         }
                                       },
-                                      "required": [
-                                        "type"
-                                      ],
+                                      "required": ["type"],
                                       "additionalProperties": false
                                     },
                                     {
@@ -10969,10 +9783,7 @@
                                           "maximum": 1
                                         }
                                       },
-                                      "required": [
-                                        "type",
-                                        "threshold"
-                                      ],
+                                      "required": ["type", "threshold"],
                                       "additionalProperties": false
                                     },
                                     {
@@ -10989,10 +9800,7 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": [
-                                        "type",
-                                        "path"
-                                      ],
+                                      "required": ["type", "path"],
                                       "additionalProperties": false
                                     },
                                     {
@@ -11009,18 +9817,13 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": [
-                                        "type"
-                                      ],
+                                      "required": ["type"],
                                       "additionalProperties": false
                                     }
                                   ]
                                 }
                               },
-                              "required": [
-                                "type",
-                                "aggregator"
-                              ],
+                              "required": ["type", "aggregator"],
                               "additionalProperties": false
                             },
                             {
@@ -11050,20 +9853,11 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "tool-trajectory",
-                                    "tool_trajectory"
-                                  ]
+                                  "enum": ["tool-trajectory", "tool_trajectory"]
                                 },
                                 "mode": {
                                   "type": "string",
-                                  "enum": [
-                                    "any_order",
-                                    "in_order",
-                                    "exact",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["any_order", "in_order", "exact", "subset", "superset"]
                                 },
                                 "minimums": {
                                   "type": "object",
@@ -11104,12 +9898,7 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": [
-                                              "exact",
-                                              "ignore",
-                                              "subset",
-                                              "superset"
-                                            ]
+                                            "enum": ["exact", "ignore", "subset", "superset"]
                                           },
                                           {
                                             "type": "array",
@@ -11123,12 +9912,7 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": [
-                                              "exact",
-                                              "ignore",
-                                              "subset",
-                                              "superset"
-                                            ]
+                                            "enum": ["exact", "ignore", "subset", "superset"]
                                           },
                                           {
                                             "type": "array",
@@ -11139,9 +9923,7 @@
                                         ]
                                       }
                                     },
-                                    "required": [
-                                      "tool"
-                                    ],
+                                    "required": ["tool"],
                                     "additionalProperties": false
                                   }
                                 },
@@ -11149,12 +9931,7 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": [
-                                        "exact",
-                                        "ignore",
-                                        "subset",
-                                        "superset"
-                                      ]
+                                      "enum": ["exact", "ignore", "subset", "superset"]
                                     },
                                     {
                                       "type": "array",
@@ -11168,12 +9945,7 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": [
-                                        "exact",
-                                        "ignore",
-                                        "subset",
-                                        "superset"
-                                      ]
+                                      "enum": ["exact", "ignore", "subset", "superset"]
                                     },
                                     {
                                       "type": "array",
@@ -11184,10 +9956,7 @@
                                   ]
                                 }
                               },
-                              "required": [
-                                "type",
-                                "mode"
-                              ],
+                              "required": ["type", "mode"],
                               "additionalProperties": false
                             },
                             {
@@ -11217,10 +9986,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "field-accuracy",
-                                    "field_accuracy"
-                                  ]
+                                  "enum": ["field-accuracy", "field_accuracy"]
                                 },
                                 "fields": {
                                   "type": "array",
@@ -11232,11 +9998,7 @@
                                       },
                                       "match": {
                                         "type": "string",
-                                        "enum": [
-                                          "exact",
-                                          "numeric_tolerance",
-                                          "date"
-                                        ]
+                                        "enum": ["exact", "numeric_tolerance", "date"]
                                       },
                                       "required": {
                                         "type": "boolean"
@@ -11258,26 +10020,17 @@
                                         }
                                       }
                                     },
-                                    "required": [
-                                      "path",
-                                      "match"
-                                    ],
+                                    "required": ["path", "match"],
                                     "additionalProperties": false
                                   },
                                   "minItems": 1
                                 },
                                 "aggregation": {
                                   "type": "string",
-                                  "enum": [
-                                    "weighted_average",
-                                    "all_or_nothing"
-                                  ]
+                                  "enum": ["weighted_average", "all_or_nothing"]
                                 }
                               },
-                              "required": [
-                                "type",
-                                "fields"
-                              ],
+                              "required": ["type", "fields"],
                               "additionalProperties": false
                             },
                             {
@@ -11314,10 +10067,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type",
-                                "threshold"
-                              ],
+                              "required": ["type", "threshold"],
                               "additionalProperties": false
                             },
                             {
@@ -11354,10 +10104,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type",
-                                "budget"
-                              ],
+                              "required": ["type", "budget"],
                               "additionalProperties": false
                             },
                             {
@@ -11387,10 +10134,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "token-usage",
-                                    "token_usage"
-                                  ]
+                                  "enum": ["token-usage", "token_usage"]
                                 },
                                 "max_total": {
                                   "type": "number",
@@ -11405,9 +10149,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -11437,10 +10179,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "execution-metrics",
-                                    "execution_metrics"
-                                  ]
+                                  "enum": ["execution-metrics", "execution_metrics"]
                                 },
                                 "max_tool_calls": {
                                   "type": "number",
@@ -11472,9 +10211,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -11510,10 +10247,7 @@
                                   "type": "string"
                                 }
                               },
-                              "required": [
-                                "type",
-                                "value"
-                              ],
+                              "required": ["type", "value"],
                               "additionalProperties": false
                             },
                             {
@@ -11549,10 +10283,7 @@
                                   "type": "string"
                                 }
                               },
-                              "required": [
-                                "type",
-                                "value"
-                              ],
+                              "required": ["type", "value"],
                               "additionalProperties": false
                             },
                             {
@@ -11582,15 +10313,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "is-json",
-                                    "is_json"
-                                  ]
+                                  "enum": ["is-json", "is_json"]
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -11626,10 +10352,7 @@
                                   "type": "string"
                                 }
                               },
-                              "required": [
-                                "type",
-                                "value"
-                              ],
+                              "required": ["type", "value"],
                               "additionalProperties": false
                             },
                             {
@@ -11710,10 +10433,7 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": [
-                                            "score_range",
-                                            "outcome"
-                                          ],
+                                          "required": ["score_range", "outcome"],
                                           "additionalProperties": false
                                         }
                                       }
@@ -11723,10 +10443,7 @@
                                   "minItems": 1
                                 }
                               },
-                              "required": [
-                                "type",
-                                "criteria"
-                              ],
+                              "required": ["type", "criteria"],
                               "additionalProperties": false
                             }
                           ]
@@ -11763,12 +10480,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "code-grader",
-                                    "code_grader",
-                                    "code-judge",
-                                    "code_judge"
-                                  ]
+                                  "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
                                 },
                                 "command": {
                                   "anyOf": [
@@ -11820,10 +10532,7 @@
                                   "additionalProperties": {}
                                 }
                               },
-                              "required": [
-                                "type",
-                                "command"
-                              ],
+                              "required": ["type", "command"],
                               "additionalProperties": false
                             },
                             {
@@ -11853,12 +10562,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "llm-grader",
-                                    "llm_grader",
-                                    "llm-judge",
-                                    "llm_judge"
-                                  ]
+                                  "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
                                 },
                                 "prompt": {
                                   "anyOf": [
@@ -11952,10 +10656,7 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": [
-                                            "score_range",
-                                            "outcome"
-                                          ],
+                                          "required": ["score_range", "outcome"],
                                           "additionalProperties": false
                                         }
                                       }
@@ -11984,9 +10685,7 @@
                                   "maximum": 2
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -12046,9 +10745,7 @@
                                           }
                                         }
                                       },
-                                      "required": [
-                                        "type"
-                                      ],
+                                      "required": ["type"],
                                       "additionalProperties": false
                                     },
                                     {
@@ -12064,10 +10761,7 @@
                                           "maximum": 1
                                         }
                                       },
-                                      "required": [
-                                        "type",
-                                        "threshold"
-                                      ],
+                                      "required": ["type", "threshold"],
                                       "additionalProperties": false
                                     },
                                     {
@@ -12084,10 +10778,7 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": [
-                                        "type",
-                                        "path"
-                                      ],
+                                      "required": ["type", "path"],
                                       "additionalProperties": false
                                     },
                                     {
@@ -12104,18 +10795,13 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": [
-                                        "type"
-                                      ],
+                                      "required": ["type"],
                                       "additionalProperties": false
                                     }
                                   ]
                                 }
                               },
-                              "required": [
-                                "type",
-                                "aggregator"
-                              ],
+                              "required": ["type", "aggregator"],
                               "additionalProperties": false
                             },
                             {
@@ -12145,20 +10831,11 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "tool-trajectory",
-                                    "tool_trajectory"
-                                  ]
+                                  "enum": ["tool-trajectory", "tool_trajectory"]
                                 },
                                 "mode": {
                                   "type": "string",
-                                  "enum": [
-                                    "any_order",
-                                    "in_order",
-                                    "exact",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["any_order", "in_order", "exact", "subset", "superset"]
                                 },
                                 "minimums": {
                                   "type": "object",
@@ -12199,12 +10876,7 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": [
-                                              "exact",
-                                              "ignore",
-                                              "subset",
-                                              "superset"
-                                            ]
+                                            "enum": ["exact", "ignore", "subset", "superset"]
                                           },
                                           {
                                             "type": "array",
@@ -12218,12 +10890,7 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": [
-                                              "exact",
-                                              "ignore",
-                                              "subset",
-                                              "superset"
-                                            ]
+                                            "enum": ["exact", "ignore", "subset", "superset"]
                                           },
                                           {
                                             "type": "array",
@@ -12234,9 +10901,7 @@
                                         ]
                                       }
                                     },
-                                    "required": [
-                                      "tool"
-                                    ],
+                                    "required": ["tool"],
                                     "additionalProperties": false
                                   }
                                 },
@@ -12244,12 +10909,7 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": [
-                                        "exact",
-                                        "ignore",
-                                        "subset",
-                                        "superset"
-                                      ]
+                                      "enum": ["exact", "ignore", "subset", "superset"]
                                     },
                                     {
                                       "type": "array",
@@ -12263,12 +10923,7 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": [
-                                        "exact",
-                                        "ignore",
-                                        "subset",
-                                        "superset"
-                                      ]
+                                      "enum": ["exact", "ignore", "subset", "superset"]
                                     },
                                     {
                                       "type": "array",
@@ -12279,10 +10934,7 @@
                                   ]
                                 }
                               },
-                              "required": [
-                                "type",
-                                "mode"
-                              ],
+                              "required": ["type", "mode"],
                               "additionalProperties": false
                             },
                             {
@@ -12312,10 +10964,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "field-accuracy",
-                                    "field_accuracy"
-                                  ]
+                                  "enum": ["field-accuracy", "field_accuracy"]
                                 },
                                 "fields": {
                                   "type": "array",
@@ -12327,11 +10976,7 @@
                                       },
                                       "match": {
                                         "type": "string",
-                                        "enum": [
-                                          "exact",
-                                          "numeric_tolerance",
-                                          "date"
-                                        ]
+                                        "enum": ["exact", "numeric_tolerance", "date"]
                                       },
                                       "required": {
                                         "type": "boolean"
@@ -12353,26 +10998,17 @@
                                         }
                                       }
                                     },
-                                    "required": [
-                                      "path",
-                                      "match"
-                                    ],
+                                    "required": ["path", "match"],
                                     "additionalProperties": false
                                   },
                                   "minItems": 1
                                 },
                                 "aggregation": {
                                   "type": "string",
-                                  "enum": [
-                                    "weighted_average",
-                                    "all_or_nothing"
-                                  ]
+                                  "enum": ["weighted_average", "all_or_nothing"]
                                 }
                               },
-                              "required": [
-                                "type",
-                                "fields"
-                              ],
+                              "required": ["type", "fields"],
                               "additionalProperties": false
                             },
                             {
@@ -12409,10 +11045,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type",
-                                "threshold"
-                              ],
+                              "required": ["type", "threshold"],
                               "additionalProperties": false
                             },
                             {
@@ -12449,10 +11082,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type",
-                                "budget"
-                              ],
+                              "required": ["type", "budget"],
                               "additionalProperties": false
                             },
                             {
@@ -12482,10 +11112,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "token-usage",
-                                    "token_usage"
-                                  ]
+                                  "enum": ["token-usage", "token_usage"]
                                 },
                                 "max_total": {
                                   "type": "number",
@@ -12500,9 +11127,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -12532,10 +11157,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "execution-metrics",
-                                    "execution_metrics"
-                                  ]
+                                  "enum": ["execution-metrics", "execution_metrics"]
                                 },
                                 "max_tool_calls": {
                                   "type": "number",
@@ -12567,9 +11189,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -12605,10 +11225,7 @@
                                   "type": "string"
                                 }
                               },
-                              "required": [
-                                "type",
-                                "value"
-                              ],
+                              "required": ["type", "value"],
                               "additionalProperties": false
                             },
                             {
@@ -12644,10 +11261,7 @@
                                   "type": "string"
                                 }
                               },
-                              "required": [
-                                "type",
-                                "value"
-                              ],
+                              "required": ["type", "value"],
                               "additionalProperties": false
                             },
                             {
@@ -12677,15 +11291,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "is-json",
-                                    "is_json"
-                                  ]
+                                  "enum": ["is-json", "is_json"]
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -12721,10 +11330,7 @@
                                   "type": "string"
                                 }
                               },
-                              "required": [
-                                "type",
-                                "value"
-                              ],
+                              "required": ["type", "value"],
                               "additionalProperties": false
                             },
                             {
@@ -12805,10 +11411,7 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": [
-                                            "score_range",
-                                            "outcome"
-                                          ],
+                                          "required": ["score_range", "outcome"],
                                           "additionalProperties": false
                                         }
                                       }
@@ -12818,10 +11421,7 @@
                                   "minItems": 1
                                 }
                               },
-                              "required": [
-                                "type",
-                                "criteria"
-                              ],
+                              "required": ["type", "criteria"],
                               "additionalProperties": false
                             }
                           ]
@@ -12858,12 +11458,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "code-grader",
-                                    "code_grader",
-                                    "code-judge",
-                                    "code_judge"
-                                  ]
+                                  "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
                                 },
                                 "command": {
                                   "anyOf": [
@@ -12915,10 +11510,7 @@
                                   "additionalProperties": {}
                                 }
                               },
-                              "required": [
-                                "type",
-                                "command"
-                              ],
+                              "required": ["type", "command"],
                               "additionalProperties": false
                             },
                             {
@@ -12948,12 +11540,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "llm-grader",
-                                    "llm_grader",
-                                    "llm-judge",
-                                    "llm_judge"
-                                  ]
+                                  "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
                                 },
                                 "prompt": {
                                   "anyOf": [
@@ -13047,10 +11634,7 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": [
-                                            "score_range",
-                                            "outcome"
-                                          ],
+                                          "required": ["score_range", "outcome"],
                                           "additionalProperties": false
                                         }
                                       }
@@ -13079,9 +11663,7 @@
                                   "maximum": 2
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -13141,9 +11723,7 @@
                                           }
                                         }
                                       },
-                                      "required": [
-                                        "type"
-                                      ],
+                                      "required": ["type"],
                                       "additionalProperties": false
                                     },
                                     {
@@ -13159,10 +11739,7 @@
                                           "maximum": 1
                                         }
                                       },
-                                      "required": [
-                                        "type",
-                                        "threshold"
-                                      ],
+                                      "required": ["type", "threshold"],
                                       "additionalProperties": false
                                     },
                                     {
@@ -13179,10 +11756,7 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": [
-                                        "type",
-                                        "path"
-                                      ],
+                                      "required": ["type", "path"],
                                       "additionalProperties": false
                                     },
                                     {
@@ -13199,18 +11773,13 @@
                                           "type": "string"
                                         }
                                       },
-                                      "required": [
-                                        "type"
-                                      ],
+                                      "required": ["type"],
                                       "additionalProperties": false
                                     }
                                   ]
                                 }
                               },
-                              "required": [
-                                "type",
-                                "aggregator"
-                              ],
+                              "required": ["type", "aggregator"],
                               "additionalProperties": false
                             },
                             {
@@ -13240,20 +11809,11 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "tool-trajectory",
-                                    "tool_trajectory"
-                                  ]
+                                  "enum": ["tool-trajectory", "tool_trajectory"]
                                 },
                                 "mode": {
                                   "type": "string",
-                                  "enum": [
-                                    "any_order",
-                                    "in_order",
-                                    "exact",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["any_order", "in_order", "exact", "subset", "superset"]
                                 },
                                 "minimums": {
                                   "type": "object",
@@ -13294,12 +11854,7 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": [
-                                              "exact",
-                                              "ignore",
-                                              "subset",
-                                              "superset"
-                                            ]
+                                            "enum": ["exact", "ignore", "subset", "superset"]
                                           },
                                           {
                                             "type": "array",
@@ -13313,12 +11868,7 @@
                                         "anyOf": [
                                           {
                                             "type": "string",
-                                            "enum": [
-                                              "exact",
-                                              "ignore",
-                                              "subset",
-                                              "superset"
-                                            ]
+                                            "enum": ["exact", "ignore", "subset", "superset"]
                                           },
                                           {
                                             "type": "array",
@@ -13329,9 +11879,7 @@
                                         ]
                                       }
                                     },
-                                    "required": [
-                                      "tool"
-                                    ],
+                                    "required": ["tool"],
                                     "additionalProperties": false
                                   }
                                 },
@@ -13339,12 +11887,7 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": [
-                                        "exact",
-                                        "ignore",
-                                        "subset",
-                                        "superset"
-                                      ]
+                                      "enum": ["exact", "ignore", "subset", "superset"]
                                     },
                                     {
                                       "type": "array",
@@ -13358,12 +11901,7 @@
                                   "anyOf": [
                                     {
                                       "type": "string",
-                                      "enum": [
-                                        "exact",
-                                        "ignore",
-                                        "subset",
-                                        "superset"
-                                      ]
+                                      "enum": ["exact", "ignore", "subset", "superset"]
                                     },
                                     {
                                       "type": "array",
@@ -13374,10 +11912,7 @@
                                   ]
                                 }
                               },
-                              "required": [
-                                "type",
-                                "mode"
-                              ],
+                              "required": ["type", "mode"],
                               "additionalProperties": false
                             },
                             {
@@ -13407,10 +11942,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "field-accuracy",
-                                    "field_accuracy"
-                                  ]
+                                  "enum": ["field-accuracy", "field_accuracy"]
                                 },
                                 "fields": {
                                   "type": "array",
@@ -13422,11 +11954,7 @@
                                       },
                                       "match": {
                                         "type": "string",
-                                        "enum": [
-                                          "exact",
-                                          "numeric_tolerance",
-                                          "date"
-                                        ]
+                                        "enum": ["exact", "numeric_tolerance", "date"]
                                       },
                                       "required": {
                                         "type": "boolean"
@@ -13448,26 +11976,17 @@
                                         }
                                       }
                                     },
-                                    "required": [
-                                      "path",
-                                      "match"
-                                    ],
+                                    "required": ["path", "match"],
                                     "additionalProperties": false
                                   },
                                   "minItems": 1
                                 },
                                 "aggregation": {
                                   "type": "string",
-                                  "enum": [
-                                    "weighted_average",
-                                    "all_or_nothing"
-                                  ]
+                                  "enum": ["weighted_average", "all_or_nothing"]
                                 }
                               },
-                              "required": [
-                                "type",
-                                "fields"
-                              ],
+                              "required": ["type", "fields"],
                               "additionalProperties": false
                             },
                             {
@@ -13504,10 +12023,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type",
-                                "threshold"
-                              ],
+                              "required": ["type", "threshold"],
                               "additionalProperties": false
                             },
                             {
@@ -13544,10 +12060,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type",
-                                "budget"
-                              ],
+                              "required": ["type", "budget"],
                               "additionalProperties": false
                             },
                             {
@@ -13577,10 +12090,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "token-usage",
-                                    "token_usage"
-                                  ]
+                                  "enum": ["token-usage", "token_usage"]
                                 },
                                 "max_total": {
                                   "type": "number",
@@ -13595,9 +12105,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -13627,10 +12135,7 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "execution-metrics",
-                                    "execution_metrics"
-                                  ]
+                                  "enum": ["execution-metrics", "execution_metrics"]
                                 },
                                 "max_tool_calls": {
                                   "type": "number",
@@ -13662,9 +12167,7 @@
                                   "minimum": 0
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -13700,10 +12203,7 @@
                                   "type": "string"
                                 }
                               },
-                              "required": [
-                                "type",
-                                "value"
-                              ],
+                              "required": ["type", "value"],
                               "additionalProperties": false
                             },
                             {
@@ -13739,10 +12239,7 @@
                                   "type": "string"
                                 }
                               },
-                              "required": [
-                                "type",
-                                "value"
-                              ],
+                              "required": ["type", "value"],
                               "additionalProperties": false
                             },
                             {
@@ -13772,15 +12269,10 @@
                                 },
                                 "type": {
                                   "type": "string",
-                                  "enum": [
-                                    "is-json",
-                                    "is_json"
-                                  ]
+                                  "enum": ["is-json", "is_json"]
                                 }
                               },
-                              "required": [
-                                "type"
-                              ],
+                              "required": ["type"],
                               "additionalProperties": false
                             },
                             {
@@ -13816,10 +12308,7 @@
                                   "type": "string"
                                 }
                               },
-                              "required": [
-                                "type",
-                                "value"
-                              ],
+                              "required": ["type", "value"],
                               "additionalProperties": false
                             },
                             {
@@ -13900,10 +12389,7 @@
                                               "minLength": 1
                                             }
                                           },
-                                          "required": [
-                                            "score_range",
-                                            "outcome"
-                                          ],
+                                          "required": ["score_range", "outcome"],
                                           "additionalProperties": false
                                         }
                                       }
@@ -13913,10 +12399,7 @@
                                   "minItems": 1
                                 }
                               },
-                              "required": [
-                                "type",
-                                "criteria"
-                              ],
+                              "required": ["type", "criteria"],
                               "additionalProperties": false
                             }
                           ]
@@ -13937,11 +12420,7 @@
                           },
                           "strategy": {
                             "type": "string",
-                            "enum": [
-                              "pass_at_k",
-                              "mean",
-                              "confidence_interval"
-                            ]
+                            "enum": ["pass_at_k", "mean", "confidence_interval"]
                           },
                           "cost_limit_usd": {
                             "type": "number",
@@ -13952,9 +12431,7 @@
                             "minimum": 0
                           }
                         },
-                        "required": [
-                          "count"
-                        ],
+                        "required": ["count"],
                         "additionalProperties": false
                       },
                       "total_budget_usd": {
@@ -13987,10 +12464,7 @@
                       },
                       "isolation": {
                         "type": "string",
-                        "enum": [
-                          "shared",
-                          "per_test"
-                        ]
+                        "enum": ["shared", "per_test"]
                       },
                       "repos": {
                         "type": "array",
@@ -14014,10 +12488,7 @@
                                       "format": "uri"
                                     }
                                   },
-                                  "required": [
-                                    "type",
-                                    "url"
-                                  ],
+                                  "required": ["type", "url"],
                                   "additionalProperties": false
                                 },
                                 {
@@ -14031,10 +12502,7 @@
                                       "type": "string"
                                     }
                                   },
-                                  "required": [
-                                    "type",
-                                    "path"
-                                  ],
+                                  "required": ["type", "path"],
                                   "additionalProperties": false
                                 }
                               ]
@@ -14047,10 +12515,7 @@
                                 },
                                 "resolve": {
                                   "type": "string",
-                                  "enum": [
-                                    "remote",
-                                    "local"
-                                  ]
+                                  "enum": ["remote", "local"]
                                 },
                                 "ancestor": {
                                   "type": "integer",
@@ -14079,10 +12544,7 @@
                               "additionalProperties": false
                             }
                           },
-                          "required": [
-                            "path",
-                            "source"
-                          ],
+                          "required": ["path", "source"],
                           "additionalProperties": false
                         }
                       },
@@ -14118,11 +12580,7 @@
                               },
                               "reset": {
                                 "type": "string",
-                                "enum": [
-                                  "none",
-                                  "fast",
-                                  "strict"
-                                ]
+                                "enum": ["none", "fast", "strict"]
                               }
                             },
                             "additionalProperties": false
@@ -14153,11 +12611,7 @@
                               },
                               "reset": {
                                 "type": "string",
-                                "enum": [
-                                  "none",
-                                  "fast",
-                                  "strict"
-                                ]
+                                "enum": ["none", "fast", "strict"]
                               }
                             },
                             "additionalProperties": false
@@ -14188,11 +12642,7 @@
                               },
                               "reset": {
                                 "type": "string",
-                                "enum": [
-                                  "none",
-                                  "fast",
-                                  "strict"
-                                ]
+                                "enum": ["none", "fast", "strict"]
                               }
                             },
                             "additionalProperties": false
@@ -14223,11 +12673,7 @@
                               },
                               "reset": {
                                 "type": "string",
-                                "enum": [
-                                  "none",
-                                  "fast",
-                                  "strict"
-                                ]
+                                "enum": ["none", "fast", "strict"]
                               }
                             },
                             "additionalProperties": false
@@ -14237,11 +12683,7 @@
                       },
                       "mode": {
                         "type": "string",
-                        "enum": [
-                          "pooled",
-                          "temp",
-                          "static"
-                        ]
+                        "enum": ["pooled", "temp", "static"]
                       },
                       "path": {
                         "type": "string"
@@ -14263,9 +12705,7 @@
                     "type": "string"
                   }
                 },
-                "required": [
-                  "id"
-                ],
+                "required": ["id"],
                 "additionalProperties": false
               }
             },
@@ -14325,12 +12765,7 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": [
-                          "code-grader",
-                          "code_grader",
-                          "code-judge",
-                          "code_judge"
-                        ]
+                        "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
                       },
                       "command": {
                         "anyOf": [
@@ -14382,10 +12817,7 @@
                         "additionalProperties": {}
                       }
                     },
-                    "required": [
-                      "type",
-                      "command"
-                    ],
+                    "required": ["type", "command"],
                     "additionalProperties": false
                   },
                   {
@@ -14415,12 +12847,7 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": [
-                          "llm-grader",
-                          "llm_grader",
-                          "llm-judge",
-                          "llm_judge"
-                        ]
+                        "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
                       },
                       "prompt": {
                         "anyOf": [
@@ -14514,10 +12941,7 @@
                                     "minLength": 1
                                   }
                                 },
-                                "required": [
-                                  "score_range",
-                                  "outcome"
-                                ],
+                                "required": ["score_range", "outcome"],
                                 "additionalProperties": false
                               }
                             }
@@ -14546,9 +12970,7 @@
                         "maximum": 2
                       }
                     },
-                    "required": [
-                      "type"
-                    ],
+                    "required": ["type"],
                     "additionalProperties": false
                   },
                   {
@@ -14608,9 +13030,7 @@
                                 }
                               }
                             },
-                            "required": [
-                              "type"
-                            ],
+                            "required": ["type"],
                             "additionalProperties": false
                           },
                           {
@@ -14626,10 +13046,7 @@
                                 "maximum": 1
                               }
                             },
-                            "required": [
-                              "type",
-                              "threshold"
-                            ],
+                            "required": ["type", "threshold"],
                             "additionalProperties": false
                           },
                           {
@@ -14646,10 +13063,7 @@
                                 "type": "string"
                               }
                             },
-                            "required": [
-                              "type",
-                              "path"
-                            ],
+                            "required": ["type", "path"],
                             "additionalProperties": false
                           },
                           {
@@ -14666,18 +13080,13 @@
                                 "type": "string"
                               }
                             },
-                            "required": [
-                              "type"
-                            ],
+                            "required": ["type"],
                             "additionalProperties": false
                           }
                         ]
                       }
                     },
-                    "required": [
-                      "type",
-                      "aggregator"
-                    ],
+                    "required": ["type", "aggregator"],
                     "additionalProperties": false
                   },
                   {
@@ -14707,20 +13116,11 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": [
-                          "tool-trajectory",
-                          "tool_trajectory"
-                        ]
+                        "enum": ["tool-trajectory", "tool_trajectory"]
                       },
                       "mode": {
                         "type": "string",
-                        "enum": [
-                          "any_order",
-                          "in_order",
-                          "exact",
-                          "subset",
-                          "superset"
-                        ]
+                        "enum": ["any_order", "in_order", "exact", "subset", "superset"]
                       },
                       "minimums": {
                         "type": "object",
@@ -14761,12 +13161,7 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": [
-                                    "exact",
-                                    "ignore",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["exact", "ignore", "subset", "superset"]
                                 },
                                 {
                                   "type": "array",
@@ -14780,12 +13175,7 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": [
-                                    "exact",
-                                    "ignore",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["exact", "ignore", "subset", "superset"]
                                 },
                                 {
                                   "type": "array",
@@ -14796,9 +13186,7 @@
                               ]
                             }
                           },
-                          "required": [
-                            "tool"
-                          ],
+                          "required": ["tool"],
                           "additionalProperties": false
                         }
                       },
@@ -14806,12 +13194,7 @@
                         "anyOf": [
                           {
                             "type": "string",
-                            "enum": [
-                              "exact",
-                              "ignore",
-                              "subset",
-                              "superset"
-                            ]
+                            "enum": ["exact", "ignore", "subset", "superset"]
                           },
                           {
                             "type": "array",
@@ -14825,12 +13208,7 @@
                         "anyOf": [
                           {
                             "type": "string",
-                            "enum": [
-                              "exact",
-                              "ignore",
-                              "subset",
-                              "superset"
-                            ]
+                            "enum": ["exact", "ignore", "subset", "superset"]
                           },
                           {
                             "type": "array",
@@ -14841,10 +13219,7 @@
                         ]
                       }
                     },
-                    "required": [
-                      "type",
-                      "mode"
-                    ],
+                    "required": ["type", "mode"],
                     "additionalProperties": false
                   },
                   {
@@ -14874,10 +13249,7 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": [
-                          "field-accuracy",
-                          "field_accuracy"
-                        ]
+                        "enum": ["field-accuracy", "field_accuracy"]
                       },
                       "fields": {
                         "type": "array",
@@ -14889,11 +13261,7 @@
                             },
                             "match": {
                               "type": "string",
-                              "enum": [
-                                "exact",
-                                "numeric_tolerance",
-                                "date"
-                              ]
+                              "enum": ["exact", "numeric_tolerance", "date"]
                             },
                             "required": {
                               "type": "boolean"
@@ -14915,26 +13283,17 @@
                               }
                             }
                           },
-                          "required": [
-                            "path",
-                            "match"
-                          ],
+                          "required": ["path", "match"],
                           "additionalProperties": false
                         },
                         "minItems": 1
                       },
                       "aggregation": {
                         "type": "string",
-                        "enum": [
-                          "weighted_average",
-                          "all_or_nothing"
-                        ]
+                        "enum": ["weighted_average", "all_or_nothing"]
                       }
                     },
-                    "required": [
-                      "type",
-                      "fields"
-                    ],
+                    "required": ["type", "fields"],
                     "additionalProperties": false
                   },
                   {
@@ -14971,10 +13330,7 @@
                         "minimum": 0
                       }
                     },
-                    "required": [
-                      "type",
-                      "threshold"
-                    ],
+                    "required": ["type", "threshold"],
                     "additionalProperties": false
                   },
                   {
@@ -15011,10 +13367,7 @@
                         "minimum": 0
                       }
                     },
-                    "required": [
-                      "type",
-                      "budget"
-                    ],
+                    "required": ["type", "budget"],
                     "additionalProperties": false
                   },
                   {
@@ -15044,10 +13397,7 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": [
-                          "token-usage",
-                          "token_usage"
-                        ]
+                        "enum": ["token-usage", "token_usage"]
                       },
                       "max_total": {
                         "type": "number",
@@ -15062,9 +13412,7 @@
                         "minimum": 0
                       }
                     },
-                    "required": [
-                      "type"
-                    ],
+                    "required": ["type"],
                     "additionalProperties": false
                   },
                   {
@@ -15094,10 +13442,7 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": [
-                          "execution-metrics",
-                          "execution_metrics"
-                        ]
+                        "enum": ["execution-metrics", "execution_metrics"]
                       },
                       "max_tool_calls": {
                         "type": "number",
@@ -15129,9 +13474,7 @@
                         "minimum": 0
                       }
                     },
-                    "required": [
-                      "type"
-                    ],
+                    "required": ["type"],
                     "additionalProperties": false
                   },
                   {
@@ -15167,10 +13510,7 @@
                         "type": "string"
                       }
                     },
-                    "required": [
-                      "type",
-                      "value"
-                    ],
+                    "required": ["type", "value"],
                     "additionalProperties": false
                   },
                   {
@@ -15206,10 +13546,7 @@
                         "type": "string"
                       }
                     },
-                    "required": [
-                      "type",
-                      "value"
-                    ],
+                    "required": ["type", "value"],
                     "additionalProperties": false
                   },
                   {
@@ -15239,15 +13576,10 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": [
-                          "is-json",
-                          "is_json"
-                        ]
+                        "enum": ["is-json", "is_json"]
                       }
                     },
-                    "required": [
-                      "type"
-                    ],
+                    "required": ["type"],
                     "additionalProperties": false
                   },
                   {
@@ -15283,10 +13615,7 @@
                         "type": "string"
                       }
                     },
-                    "required": [
-                      "type",
-                      "value"
-                    ],
+                    "required": ["type", "value"],
                     "additionalProperties": false
                   },
                   {
@@ -15367,10 +13696,7 @@
                                     "minLength": 1
                                   }
                                 },
-                                "required": [
-                                  "score_range",
-                                  "outcome"
-                                ],
+                                "required": ["score_range", "outcome"],
                                 "additionalProperties": false
                               }
                             }
@@ -15380,10 +13706,7 @@
                         "minItems": 1
                       }
                     },
-                    "required": [
-                      "type",
-                      "criteria"
-                    ],
+                    "required": ["type", "criteria"],
                     "additionalProperties": false
                   }
                 ]
@@ -15420,12 +13743,7 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": [
-                          "code-grader",
-                          "code_grader",
-                          "code-judge",
-                          "code_judge"
-                        ]
+                        "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
                       },
                       "command": {
                         "anyOf": [
@@ -15477,10 +13795,7 @@
                         "additionalProperties": {}
                       }
                     },
-                    "required": [
-                      "type",
-                      "command"
-                    ],
+                    "required": ["type", "command"],
                     "additionalProperties": false
                   },
                   {
@@ -15510,12 +13825,7 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": [
-                          "llm-grader",
-                          "llm_grader",
-                          "llm-judge",
-                          "llm_judge"
-                        ]
+                        "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
                       },
                       "prompt": {
                         "anyOf": [
@@ -15609,10 +13919,7 @@
                                     "minLength": 1
                                   }
                                 },
-                                "required": [
-                                  "score_range",
-                                  "outcome"
-                                ],
+                                "required": ["score_range", "outcome"],
                                 "additionalProperties": false
                               }
                             }
@@ -15641,9 +13948,7 @@
                         "maximum": 2
                       }
                     },
-                    "required": [
-                      "type"
-                    ],
+                    "required": ["type"],
                     "additionalProperties": false
                   },
                   {
@@ -15703,9 +14008,7 @@
                                 }
                               }
                             },
-                            "required": [
-                              "type"
-                            ],
+                            "required": ["type"],
                             "additionalProperties": false
                           },
                           {
@@ -15721,10 +14024,7 @@
                                 "maximum": 1
                               }
                             },
-                            "required": [
-                              "type",
-                              "threshold"
-                            ],
+                            "required": ["type", "threshold"],
                             "additionalProperties": false
                           },
                           {
@@ -15741,10 +14041,7 @@
                                 "type": "string"
                               }
                             },
-                            "required": [
-                              "type",
-                              "path"
-                            ],
+                            "required": ["type", "path"],
                             "additionalProperties": false
                           },
                           {
@@ -15761,18 +14058,13 @@
                                 "type": "string"
                               }
                             },
-                            "required": [
-                              "type"
-                            ],
+                            "required": ["type"],
                             "additionalProperties": false
                           }
                         ]
                       }
                     },
-                    "required": [
-                      "type",
-                      "aggregator"
-                    ],
+                    "required": ["type", "aggregator"],
                     "additionalProperties": false
                   },
                   {
@@ -15802,20 +14094,11 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": [
-                          "tool-trajectory",
-                          "tool_trajectory"
-                        ]
+                        "enum": ["tool-trajectory", "tool_trajectory"]
                       },
                       "mode": {
                         "type": "string",
-                        "enum": [
-                          "any_order",
-                          "in_order",
-                          "exact",
-                          "subset",
-                          "superset"
-                        ]
+                        "enum": ["any_order", "in_order", "exact", "subset", "superset"]
                       },
                       "minimums": {
                         "type": "object",
@@ -15856,12 +14139,7 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": [
-                                    "exact",
-                                    "ignore",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["exact", "ignore", "subset", "superset"]
                                 },
                                 {
                                   "type": "array",
@@ -15875,12 +14153,7 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": [
-                                    "exact",
-                                    "ignore",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["exact", "ignore", "subset", "superset"]
                                 },
                                 {
                                   "type": "array",
@@ -15891,9 +14164,7 @@
                               ]
                             }
                           },
-                          "required": [
-                            "tool"
-                          ],
+                          "required": ["tool"],
                           "additionalProperties": false
                         }
                       },
@@ -15901,12 +14172,7 @@
                         "anyOf": [
                           {
                             "type": "string",
-                            "enum": [
-                              "exact",
-                              "ignore",
-                              "subset",
-                              "superset"
-                            ]
+                            "enum": ["exact", "ignore", "subset", "superset"]
                           },
                           {
                             "type": "array",
@@ -15920,12 +14186,7 @@
                         "anyOf": [
                           {
                             "type": "string",
-                            "enum": [
-                              "exact",
-                              "ignore",
-                              "subset",
-                              "superset"
-                            ]
+                            "enum": ["exact", "ignore", "subset", "superset"]
                           },
                           {
                             "type": "array",
@@ -15936,10 +14197,7 @@
                         ]
                       }
                     },
-                    "required": [
-                      "type",
-                      "mode"
-                    ],
+                    "required": ["type", "mode"],
                     "additionalProperties": false
                   },
                   {
@@ -15969,10 +14227,7 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": [
-                          "field-accuracy",
-                          "field_accuracy"
-                        ]
+                        "enum": ["field-accuracy", "field_accuracy"]
                       },
                       "fields": {
                         "type": "array",
@@ -15984,11 +14239,7 @@
                             },
                             "match": {
                               "type": "string",
-                              "enum": [
-                                "exact",
-                                "numeric_tolerance",
-                                "date"
-                              ]
+                              "enum": ["exact", "numeric_tolerance", "date"]
                             },
                             "required": {
                               "type": "boolean"
@@ -16010,26 +14261,17 @@
                               }
                             }
                           },
-                          "required": [
-                            "path",
-                            "match"
-                          ],
+                          "required": ["path", "match"],
                           "additionalProperties": false
                         },
                         "minItems": 1
                       },
                       "aggregation": {
                         "type": "string",
-                        "enum": [
-                          "weighted_average",
-                          "all_or_nothing"
-                        ]
+                        "enum": ["weighted_average", "all_or_nothing"]
                       }
                     },
-                    "required": [
-                      "type",
-                      "fields"
-                    ],
+                    "required": ["type", "fields"],
                     "additionalProperties": false
                   },
                   {
@@ -16066,10 +14308,7 @@
                         "minimum": 0
                       }
                     },
-                    "required": [
-                      "type",
-                      "threshold"
-                    ],
+                    "required": ["type", "threshold"],
                     "additionalProperties": false
                   },
                   {
@@ -16106,10 +14345,7 @@
                         "minimum": 0
                       }
                     },
-                    "required": [
-                      "type",
-                      "budget"
-                    ],
+                    "required": ["type", "budget"],
                     "additionalProperties": false
                   },
                   {
@@ -16139,10 +14375,7 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": [
-                          "token-usage",
-                          "token_usage"
-                        ]
+                        "enum": ["token-usage", "token_usage"]
                       },
                       "max_total": {
                         "type": "number",
@@ -16157,9 +14390,7 @@
                         "minimum": 0
                       }
                     },
-                    "required": [
-                      "type"
-                    ],
+                    "required": ["type"],
                     "additionalProperties": false
                   },
                   {
@@ -16189,10 +14420,7 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": [
-                          "execution-metrics",
-                          "execution_metrics"
-                        ]
+                        "enum": ["execution-metrics", "execution_metrics"]
                       },
                       "max_tool_calls": {
                         "type": "number",
@@ -16224,9 +14452,7 @@
                         "minimum": 0
                       }
                     },
-                    "required": [
-                      "type"
-                    ],
+                    "required": ["type"],
                     "additionalProperties": false
                   },
                   {
@@ -16262,10 +14488,7 @@
                         "type": "string"
                       }
                     },
-                    "required": [
-                      "type",
-                      "value"
-                    ],
+                    "required": ["type", "value"],
                     "additionalProperties": false
                   },
                   {
@@ -16301,10 +14524,7 @@
                         "type": "string"
                       }
                     },
-                    "required": [
-                      "type",
-                      "value"
-                    ],
+                    "required": ["type", "value"],
                     "additionalProperties": false
                   },
                   {
@@ -16334,15 +14554,10 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": [
-                          "is-json",
-                          "is_json"
-                        ]
+                        "enum": ["is-json", "is_json"]
                       }
                     },
-                    "required": [
-                      "type"
-                    ],
+                    "required": ["type"],
                     "additionalProperties": false
                   },
                   {
@@ -16378,10 +14593,7 @@
                         "type": "string"
                       }
                     },
-                    "required": [
-                      "type",
-                      "value"
-                    ],
+                    "required": ["type", "value"],
                     "additionalProperties": false
                   },
                   {
@@ -16462,10 +14674,7 @@
                                     "minLength": 1
                                   }
                                 },
-                                "required": [
-                                  "score_range",
-                                  "outcome"
-                                ],
+                                "required": ["score_range", "outcome"],
                                 "additionalProperties": false
                               }
                             }
@@ -16475,10 +14684,7 @@
                         "minItems": 1
                       }
                     },
-                    "required": [
-                      "type",
-                      "criteria"
-                    ],
+                    "required": ["type", "criteria"],
                     "additionalProperties": false
                   }
                 ]
@@ -16515,12 +14721,7 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": [
-                          "code-grader",
-                          "code_grader",
-                          "code-judge",
-                          "code_judge"
-                        ]
+                        "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
                       },
                       "command": {
                         "anyOf": [
@@ -16572,10 +14773,7 @@
                         "additionalProperties": {}
                       }
                     },
-                    "required": [
-                      "type",
-                      "command"
-                    ],
+                    "required": ["type", "command"],
                     "additionalProperties": false
                   },
                   {
@@ -16605,12 +14803,7 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": [
-                          "llm-grader",
-                          "llm_grader",
-                          "llm-judge",
-                          "llm_judge"
-                        ]
+                        "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
                       },
                       "prompt": {
                         "anyOf": [
@@ -16704,10 +14897,7 @@
                                     "minLength": 1
                                   }
                                 },
-                                "required": [
-                                  "score_range",
-                                  "outcome"
-                                ],
+                                "required": ["score_range", "outcome"],
                                 "additionalProperties": false
                               }
                             }
@@ -16736,9 +14926,7 @@
                         "maximum": 2
                       }
                     },
-                    "required": [
-                      "type"
-                    ],
+                    "required": ["type"],
                     "additionalProperties": false
                   },
                   {
@@ -16798,9 +14986,7 @@
                                 }
                               }
                             },
-                            "required": [
-                              "type"
-                            ],
+                            "required": ["type"],
                             "additionalProperties": false
                           },
                           {
@@ -16816,10 +15002,7 @@
                                 "maximum": 1
                               }
                             },
-                            "required": [
-                              "type",
-                              "threshold"
-                            ],
+                            "required": ["type", "threshold"],
                             "additionalProperties": false
                           },
                           {
@@ -16836,10 +15019,7 @@
                                 "type": "string"
                               }
                             },
-                            "required": [
-                              "type",
-                              "path"
-                            ],
+                            "required": ["type", "path"],
                             "additionalProperties": false
                           },
                           {
@@ -16856,18 +15036,13 @@
                                 "type": "string"
                               }
                             },
-                            "required": [
-                              "type"
-                            ],
+                            "required": ["type"],
                             "additionalProperties": false
                           }
                         ]
                       }
                     },
-                    "required": [
-                      "type",
-                      "aggregator"
-                    ],
+                    "required": ["type", "aggregator"],
                     "additionalProperties": false
                   },
                   {
@@ -16897,20 +15072,11 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": [
-                          "tool-trajectory",
-                          "tool_trajectory"
-                        ]
+                        "enum": ["tool-trajectory", "tool_trajectory"]
                       },
                       "mode": {
                         "type": "string",
-                        "enum": [
-                          "any_order",
-                          "in_order",
-                          "exact",
-                          "subset",
-                          "superset"
-                        ]
+                        "enum": ["any_order", "in_order", "exact", "subset", "superset"]
                       },
                       "minimums": {
                         "type": "object",
@@ -16951,12 +15117,7 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": [
-                                    "exact",
-                                    "ignore",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["exact", "ignore", "subset", "superset"]
                                 },
                                 {
                                   "type": "array",
@@ -16970,12 +15131,7 @@
                               "anyOf": [
                                 {
                                   "type": "string",
-                                  "enum": [
-                                    "exact",
-                                    "ignore",
-                                    "subset",
-                                    "superset"
-                                  ]
+                                  "enum": ["exact", "ignore", "subset", "superset"]
                                 },
                                 {
                                   "type": "array",
@@ -16986,9 +15142,7 @@
                               ]
                             }
                           },
-                          "required": [
-                            "tool"
-                          ],
+                          "required": ["tool"],
                           "additionalProperties": false
                         }
                       },
@@ -16996,12 +15150,7 @@
                         "anyOf": [
                           {
                             "type": "string",
-                            "enum": [
-                              "exact",
-                              "ignore",
-                              "subset",
-                              "superset"
-                            ]
+                            "enum": ["exact", "ignore", "subset", "superset"]
                           },
                           {
                             "type": "array",
@@ -17015,12 +15164,7 @@
                         "anyOf": [
                           {
                             "type": "string",
-                            "enum": [
-                              "exact",
-                              "ignore",
-                              "subset",
-                              "superset"
-                            ]
+                            "enum": ["exact", "ignore", "subset", "superset"]
                           },
                           {
                             "type": "array",
@@ -17031,10 +15175,7 @@
                         ]
                       }
                     },
-                    "required": [
-                      "type",
-                      "mode"
-                    ],
+                    "required": ["type", "mode"],
                     "additionalProperties": false
                   },
                   {
@@ -17064,10 +15205,7 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": [
-                          "field-accuracy",
-                          "field_accuracy"
-                        ]
+                        "enum": ["field-accuracy", "field_accuracy"]
                       },
                       "fields": {
                         "type": "array",
@@ -17079,11 +15217,7 @@
                             },
                             "match": {
                               "type": "string",
-                              "enum": [
-                                "exact",
-                                "numeric_tolerance",
-                                "date"
-                              ]
+                              "enum": ["exact", "numeric_tolerance", "date"]
                             },
                             "required": {
                               "type": "boolean"
@@ -17105,26 +15239,17 @@
                               }
                             }
                           },
-                          "required": [
-                            "path",
-                            "match"
-                          ],
+                          "required": ["path", "match"],
                           "additionalProperties": false
                         },
                         "minItems": 1
                       },
                       "aggregation": {
                         "type": "string",
-                        "enum": [
-                          "weighted_average",
-                          "all_or_nothing"
-                        ]
+                        "enum": ["weighted_average", "all_or_nothing"]
                       }
                     },
-                    "required": [
-                      "type",
-                      "fields"
-                    ],
+                    "required": ["type", "fields"],
                     "additionalProperties": false
                   },
                   {
@@ -17161,10 +15286,7 @@
                         "minimum": 0
                       }
                     },
-                    "required": [
-                      "type",
-                      "threshold"
-                    ],
+                    "required": ["type", "threshold"],
                     "additionalProperties": false
                   },
                   {
@@ -17201,10 +15323,7 @@
                         "minimum": 0
                       }
                     },
-                    "required": [
-                      "type",
-                      "budget"
-                    ],
+                    "required": ["type", "budget"],
                     "additionalProperties": false
                   },
                   {
@@ -17234,10 +15353,7 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": [
-                          "token-usage",
-                          "token_usage"
-                        ]
+                        "enum": ["token-usage", "token_usage"]
                       },
                       "max_total": {
                         "type": "number",
@@ -17252,9 +15368,7 @@
                         "minimum": 0
                       }
                     },
-                    "required": [
-                      "type"
-                    ],
+                    "required": ["type"],
                     "additionalProperties": false
                   },
                   {
@@ -17284,10 +15398,7 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": [
-                          "execution-metrics",
-                          "execution_metrics"
-                        ]
+                        "enum": ["execution-metrics", "execution_metrics"]
                       },
                       "max_tool_calls": {
                         "type": "number",
@@ -17319,9 +15430,7 @@
                         "minimum": 0
                       }
                     },
-                    "required": [
-                      "type"
-                    ],
+                    "required": ["type"],
                     "additionalProperties": false
                   },
                   {
@@ -17357,10 +15466,7 @@
                         "type": "string"
                       }
                     },
-                    "required": [
-                      "type",
-                      "value"
-                    ],
+                    "required": ["type", "value"],
                     "additionalProperties": false
                   },
                   {
@@ -17396,10 +15502,7 @@
                         "type": "string"
                       }
                     },
-                    "required": [
-                      "type",
-                      "value"
-                    ],
+                    "required": ["type", "value"],
                     "additionalProperties": false
                   },
                   {
@@ -17429,15 +15532,10 @@
                       },
                       "type": {
                         "type": "string",
-                        "enum": [
-                          "is-json",
-                          "is_json"
-                        ]
+                        "enum": ["is-json", "is_json"]
                       }
                     },
-                    "required": [
-                      "type"
-                    ],
+                    "required": ["type"],
                     "additionalProperties": false
                   },
                   {
@@ -17473,10 +15571,7 @@
                         "type": "string"
                       }
                     },
-                    "required": [
-                      "type",
-                      "value"
-                    ],
+                    "required": ["type", "value"],
                     "additionalProperties": false
                   },
                   {
@@ -17557,10 +15652,7 @@
                                     "minLength": 1
                                   }
                                 },
-                                "required": [
-                                  "score_range",
-                                  "outcome"
-                                ],
+                                "required": ["score_range", "outcome"],
                                 "additionalProperties": false
                               }
                             }
@@ -17570,10 +15662,7 @@
                         "minItems": 1
                       }
                     },
-                    "required": [
-                      "type",
-                      "criteria"
-                    ],
+                    "required": ["type", "criteria"],
                     "additionalProperties": false
                   }
                 ]
@@ -17594,11 +15683,7 @@
                 },
                 "strategy": {
                   "type": "string",
-                  "enum": [
-                    "pass_at_k",
-                    "mean",
-                    "confidence_interval"
-                  ]
+                  "enum": ["pass_at_k", "mean", "confidence_interval"]
                 },
                 "cost_limit_usd": {
                   "type": "number",
@@ -17609,9 +15694,7 @@
                   "minimum": 0
                 }
               },
-              "required": [
-                "count"
-              ],
+              "required": ["count"],
               "additionalProperties": false
             },
             "total_budget_usd": {
@@ -17667,12 +15750,7 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": [
-                      "code-grader",
-                      "code_grader",
-                      "code-judge",
-                      "code_judge"
-                    ]
+                    "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
                   },
                   "command": {
                     "anyOf": [
@@ -17724,10 +15802,7 @@
                     "additionalProperties": {}
                   }
                 },
-                "required": [
-                  "type",
-                  "command"
-                ],
+                "required": ["type", "command"],
                 "additionalProperties": false
               },
               {
@@ -17757,12 +15832,7 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": [
-                      "llm-grader",
-                      "llm_grader",
-                      "llm-judge",
-                      "llm_judge"
-                    ]
+                    "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
                   },
                   "prompt": {
                     "anyOf": [
@@ -17856,10 +15926,7 @@
                                 "minLength": 1
                               }
                             },
-                            "required": [
-                              "score_range",
-                              "outcome"
-                            ],
+                            "required": ["score_range", "outcome"],
                             "additionalProperties": false
                           }
                         }
@@ -17888,9 +15955,7 @@
                     "maximum": 2
                   }
                 },
-                "required": [
-                  "type"
-                ],
+                "required": ["type"],
                 "additionalProperties": false
               },
               {
@@ -17950,9 +16015,7 @@
                             }
                           }
                         },
-                        "required": [
-                          "type"
-                        ],
+                        "required": ["type"],
                         "additionalProperties": false
                       },
                       {
@@ -17968,10 +16031,7 @@
                             "maximum": 1
                           }
                         },
-                        "required": [
-                          "type",
-                          "threshold"
-                        ],
+                        "required": ["type", "threshold"],
                         "additionalProperties": false
                       },
                       {
@@ -17988,10 +16048,7 @@
                             "type": "string"
                           }
                         },
-                        "required": [
-                          "type",
-                          "path"
-                        ],
+                        "required": ["type", "path"],
                         "additionalProperties": false
                       },
                       {
@@ -18008,18 +16065,13 @@
                             "type": "string"
                           }
                         },
-                        "required": [
-                          "type"
-                        ],
+                        "required": ["type"],
                         "additionalProperties": false
                       }
                     ]
                   }
                 },
-                "required": [
-                  "type",
-                  "aggregator"
-                ],
+                "required": ["type", "aggregator"],
                 "additionalProperties": false
               },
               {
@@ -18049,20 +16101,11 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": [
-                      "tool-trajectory",
-                      "tool_trajectory"
-                    ]
+                    "enum": ["tool-trajectory", "tool_trajectory"]
                   },
                   "mode": {
                     "type": "string",
-                    "enum": [
-                      "any_order",
-                      "in_order",
-                      "exact",
-                      "subset",
-                      "superset"
-                    ]
+                    "enum": ["any_order", "in_order", "exact", "subset", "superset"]
                   },
                   "minimums": {
                     "type": "object",
@@ -18103,12 +16146,7 @@
                           "anyOf": [
                             {
                               "type": "string",
-                              "enum": [
-                                "exact",
-                                "ignore",
-                                "subset",
-                                "superset"
-                              ]
+                              "enum": ["exact", "ignore", "subset", "superset"]
                             },
                             {
                               "type": "array",
@@ -18122,12 +16160,7 @@
                           "anyOf": [
                             {
                               "type": "string",
-                              "enum": [
-                                "exact",
-                                "ignore",
-                                "subset",
-                                "superset"
-                              ]
+                              "enum": ["exact", "ignore", "subset", "superset"]
                             },
                             {
                               "type": "array",
@@ -18138,9 +16171,7 @@
                           ]
                         }
                       },
-                      "required": [
-                        "tool"
-                      ],
+                      "required": ["tool"],
                       "additionalProperties": false
                     }
                   },
@@ -18148,12 +16179,7 @@
                     "anyOf": [
                       {
                         "type": "string",
-                        "enum": [
-                          "exact",
-                          "ignore",
-                          "subset",
-                          "superset"
-                        ]
+                        "enum": ["exact", "ignore", "subset", "superset"]
                       },
                       {
                         "type": "array",
@@ -18167,12 +16193,7 @@
                     "anyOf": [
                       {
                         "type": "string",
-                        "enum": [
-                          "exact",
-                          "ignore",
-                          "subset",
-                          "superset"
-                        ]
+                        "enum": ["exact", "ignore", "subset", "superset"]
                       },
                       {
                         "type": "array",
@@ -18183,10 +16204,7 @@
                     ]
                   }
                 },
-                "required": [
-                  "type",
-                  "mode"
-                ],
+                "required": ["type", "mode"],
                 "additionalProperties": false
               },
               {
@@ -18216,10 +16234,7 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": [
-                      "field-accuracy",
-                      "field_accuracy"
-                    ]
+                    "enum": ["field-accuracy", "field_accuracy"]
                   },
                   "fields": {
                     "type": "array",
@@ -18231,11 +16246,7 @@
                         },
                         "match": {
                           "type": "string",
-                          "enum": [
-                            "exact",
-                            "numeric_tolerance",
-                            "date"
-                          ]
+                          "enum": ["exact", "numeric_tolerance", "date"]
                         },
                         "required": {
                           "type": "boolean"
@@ -18257,26 +16268,17 @@
                           }
                         }
                       },
-                      "required": [
-                        "path",
-                        "match"
-                      ],
+                      "required": ["path", "match"],
                       "additionalProperties": false
                     },
                     "minItems": 1
                   },
                   "aggregation": {
                     "type": "string",
-                    "enum": [
-                      "weighted_average",
-                      "all_or_nothing"
-                    ]
+                    "enum": ["weighted_average", "all_or_nothing"]
                   }
                 },
-                "required": [
-                  "type",
-                  "fields"
-                ],
+                "required": ["type", "fields"],
                 "additionalProperties": false
               },
               {
@@ -18313,10 +16315,7 @@
                     "minimum": 0
                   }
                 },
-                "required": [
-                  "type",
-                  "threshold"
-                ],
+                "required": ["type", "threshold"],
                 "additionalProperties": false
               },
               {
@@ -18353,10 +16352,7 @@
                     "minimum": 0
                   }
                 },
-                "required": [
-                  "type",
-                  "budget"
-                ],
+                "required": ["type", "budget"],
                 "additionalProperties": false
               },
               {
@@ -18386,10 +16382,7 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": [
-                      "token-usage",
-                      "token_usage"
-                    ]
+                    "enum": ["token-usage", "token_usage"]
                   },
                   "max_total": {
                     "type": "number",
@@ -18404,9 +16397,7 @@
                     "minimum": 0
                   }
                 },
-                "required": [
-                  "type"
-                ],
+                "required": ["type"],
                 "additionalProperties": false
               },
               {
@@ -18436,10 +16427,7 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": [
-                      "execution-metrics",
-                      "execution_metrics"
-                    ]
+                    "enum": ["execution-metrics", "execution_metrics"]
                   },
                   "max_tool_calls": {
                     "type": "number",
@@ -18471,9 +16459,7 @@
                     "minimum": 0
                   }
                 },
-                "required": [
-                  "type"
-                ],
+                "required": ["type"],
                 "additionalProperties": false
               },
               {
@@ -18509,10 +16495,7 @@
                     "type": "string"
                   }
                 },
-                "required": [
-                  "type",
-                  "value"
-                ],
+                "required": ["type", "value"],
                 "additionalProperties": false
               },
               {
@@ -18548,10 +16531,7 @@
                     "type": "string"
                   }
                 },
-                "required": [
-                  "type",
-                  "value"
-                ],
+                "required": ["type", "value"],
                 "additionalProperties": false
               },
               {
@@ -18581,15 +16561,10 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": [
-                      "is-json",
-                      "is_json"
-                    ]
+                    "enum": ["is-json", "is_json"]
                   }
                 },
-                "required": [
-                  "type"
-                ],
+                "required": ["type"],
                 "additionalProperties": false
               },
               {
@@ -18625,10 +16600,7 @@
                     "type": "string"
                   }
                 },
-                "required": [
-                  "type",
-                  "value"
-                ],
+                "required": ["type", "value"],
                 "additionalProperties": false
               },
               {
@@ -18709,10 +16681,7 @@
                                 "minLength": 1
                               }
                             },
-                            "required": [
-                              "score_range",
-                              "outcome"
-                            ],
+                            "required": ["score_range", "outcome"],
                             "additionalProperties": false
                           }
                         }
@@ -18722,10 +16691,7 @@
                     "minItems": 1
                   }
                 },
-                "required": [
-                  "type",
-                  "criteria"
-                ],
+                "required": ["type", "criteria"],
                 "additionalProperties": false
               }
             ]
@@ -18762,12 +16728,7 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": [
-                      "code-grader",
-                      "code_grader",
-                      "code-judge",
-                      "code_judge"
-                    ]
+                    "enum": ["code-grader", "code_grader", "code-judge", "code_judge"]
                   },
                   "command": {
                     "anyOf": [
@@ -18819,10 +16780,7 @@
                     "additionalProperties": {}
                   }
                 },
-                "required": [
-                  "type",
-                  "command"
-                ],
+                "required": ["type", "command"],
                 "additionalProperties": false
               },
               {
@@ -18852,12 +16810,7 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": [
-                      "llm-grader",
-                      "llm_grader",
-                      "llm-judge",
-                      "llm_judge"
-                    ]
+                    "enum": ["llm-grader", "llm_grader", "llm-judge", "llm_judge"]
                   },
                   "prompt": {
                     "anyOf": [
@@ -18951,10 +16904,7 @@
                                 "minLength": 1
                               }
                             },
-                            "required": [
-                              "score_range",
-                              "outcome"
-                            ],
+                            "required": ["score_range", "outcome"],
                             "additionalProperties": false
                           }
                         }
@@ -18983,9 +16933,7 @@
                     "maximum": 2
                   }
                 },
-                "required": [
-                  "type"
-                ],
+                "required": ["type"],
                 "additionalProperties": false
               },
               {
@@ -19045,9 +16993,7 @@
                             }
                           }
                         },
-                        "required": [
-                          "type"
-                        ],
+                        "required": ["type"],
                         "additionalProperties": false
                       },
                       {
@@ -19063,10 +17009,7 @@
                             "maximum": 1
                           }
                         },
-                        "required": [
-                          "type",
-                          "threshold"
-                        ],
+                        "required": ["type", "threshold"],
                         "additionalProperties": false
                       },
                       {
@@ -19083,10 +17026,7 @@
                             "type": "string"
                           }
                         },
-                        "required": [
-                          "type",
-                          "path"
-                        ],
+                        "required": ["type", "path"],
                         "additionalProperties": false
                       },
                       {
@@ -19103,18 +17043,13 @@
                             "type": "string"
                           }
                         },
-                        "required": [
-                          "type"
-                        ],
+                        "required": ["type"],
                         "additionalProperties": false
                       }
                     ]
                   }
                 },
-                "required": [
-                  "type",
-                  "aggregator"
-                ],
+                "required": ["type", "aggregator"],
                 "additionalProperties": false
               },
               {
@@ -19144,20 +17079,11 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": [
-                      "tool-trajectory",
-                      "tool_trajectory"
-                    ]
+                    "enum": ["tool-trajectory", "tool_trajectory"]
                   },
                   "mode": {
                     "type": "string",
-                    "enum": [
-                      "any_order",
-                      "in_order",
-                      "exact",
-                      "subset",
-                      "superset"
-                    ]
+                    "enum": ["any_order", "in_order", "exact", "subset", "superset"]
                   },
                   "minimums": {
                     "type": "object",
@@ -19198,12 +17124,7 @@
                           "anyOf": [
                             {
                               "type": "string",
-                              "enum": [
-                                "exact",
-                                "ignore",
-                                "subset",
-                                "superset"
-                              ]
+                              "enum": ["exact", "ignore", "subset", "superset"]
                             },
                             {
                               "type": "array",
@@ -19217,12 +17138,7 @@
                           "anyOf": [
                             {
                               "type": "string",
-                              "enum": [
-                                "exact",
-                                "ignore",
-                                "subset",
-                                "superset"
-                              ]
+                              "enum": ["exact", "ignore", "subset", "superset"]
                             },
                             {
                               "type": "array",
@@ -19233,9 +17149,7 @@
                           ]
                         }
                       },
-                      "required": [
-                        "tool"
-                      ],
+                      "required": ["tool"],
                       "additionalProperties": false
                     }
                   },
@@ -19243,12 +17157,7 @@
                     "anyOf": [
                       {
                         "type": "string",
-                        "enum": [
-                          "exact",
-                          "ignore",
-                          "subset",
-                          "superset"
-                        ]
+                        "enum": ["exact", "ignore", "subset", "superset"]
                       },
                       {
                         "type": "array",
@@ -19262,12 +17171,7 @@
                     "anyOf": [
                       {
                         "type": "string",
-                        "enum": [
-                          "exact",
-                          "ignore",
-                          "subset",
-                          "superset"
-                        ]
+                        "enum": ["exact", "ignore", "subset", "superset"]
                       },
                       {
                         "type": "array",
@@ -19278,10 +17182,7 @@
                     ]
                   }
                 },
-                "required": [
-                  "type",
-                  "mode"
-                ],
+                "required": ["type", "mode"],
                 "additionalProperties": false
               },
               {
@@ -19311,10 +17212,7 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": [
-                      "field-accuracy",
-                      "field_accuracy"
-                    ]
+                    "enum": ["field-accuracy", "field_accuracy"]
                   },
                   "fields": {
                     "type": "array",
@@ -19326,11 +17224,7 @@
                         },
                         "match": {
                           "type": "string",
-                          "enum": [
-                            "exact",
-                            "numeric_tolerance",
-                            "date"
-                          ]
+                          "enum": ["exact", "numeric_tolerance", "date"]
                         },
                         "required": {
                           "type": "boolean"
@@ -19352,26 +17246,17 @@
                           }
                         }
                       },
-                      "required": [
-                        "path",
-                        "match"
-                      ],
+                      "required": ["path", "match"],
                       "additionalProperties": false
                     },
                     "minItems": 1
                   },
                   "aggregation": {
                     "type": "string",
-                    "enum": [
-                      "weighted_average",
-                      "all_or_nothing"
-                    ]
+                    "enum": ["weighted_average", "all_or_nothing"]
                   }
                 },
-                "required": [
-                  "type",
-                  "fields"
-                ],
+                "required": ["type", "fields"],
                 "additionalProperties": false
               },
               {
@@ -19408,10 +17293,7 @@
                     "minimum": 0
                   }
                 },
-                "required": [
-                  "type",
-                  "threshold"
-                ],
+                "required": ["type", "threshold"],
                 "additionalProperties": false
               },
               {
@@ -19448,10 +17330,7 @@
                     "minimum": 0
                   }
                 },
-                "required": [
-                  "type",
-                  "budget"
-                ],
+                "required": ["type", "budget"],
                 "additionalProperties": false
               },
               {
@@ -19481,10 +17360,7 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": [
-                      "token-usage",
-                      "token_usage"
-                    ]
+                    "enum": ["token-usage", "token_usage"]
                   },
                   "max_total": {
                     "type": "number",
@@ -19499,9 +17375,7 @@
                     "minimum": 0
                   }
                 },
-                "required": [
-                  "type"
-                ],
+                "required": ["type"],
                 "additionalProperties": false
               },
               {
@@ -19531,10 +17405,7 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": [
-                      "execution-metrics",
-                      "execution_metrics"
-                    ]
+                    "enum": ["execution-metrics", "execution_metrics"]
                   },
                   "max_tool_calls": {
                     "type": "number",
@@ -19566,9 +17437,7 @@
                     "minimum": 0
                   }
                 },
-                "required": [
-                  "type"
-                ],
+                "required": ["type"],
                 "additionalProperties": false
               },
               {
@@ -19604,10 +17473,7 @@
                     "type": "string"
                   }
                 },
-                "required": [
-                  "type",
-                  "value"
-                ],
+                "required": ["type", "value"],
                 "additionalProperties": false
               },
               {
@@ -19643,10 +17509,7 @@
                     "type": "string"
                   }
                 },
-                "required": [
-                  "type",
-                  "value"
-                ],
+                "required": ["type", "value"],
                 "additionalProperties": false
               },
               {
@@ -19676,15 +17539,10 @@
                   },
                   "type": {
                     "type": "string",
-                    "enum": [
-                      "is-json",
-                      "is_json"
-                    ]
+                    "enum": ["is-json", "is_json"]
                   }
                 },
-                "required": [
-                  "type"
-                ],
+                "required": ["type"],
                 "additionalProperties": false
               },
               {
@@ -19720,10 +17578,7 @@
                     "type": "string"
                   }
                 },
-                "required": [
-                  "type",
-                  "value"
-                ],
+                "required": ["type", "value"],
                 "additionalProperties": false
               },
               {
@@ -19804,10 +17659,7 @@
                                 "minLength": 1
                               }
                             },
-                            "required": [
-                              "score_range",
-                              "outcome"
-                            ],
+                            "required": ["score_range", "outcome"],
                             "additionalProperties": false
                           }
                         }
@@ -19817,10 +17669,7 @@
                     "minItems": 1
                   }
                 },
-                "required": [
-                  "type",
-                  "criteria"
-                ],
+                "required": ["type", "criteria"],
                 "additionalProperties": false
               }
             ]
@@ -19836,10 +17685,7 @@
                 },
                 "isolation": {
                   "type": "string",
-                  "enum": [
-                    "shared",
-                    "per_test"
-                  ]
+                  "enum": ["shared", "per_test"]
                 },
                 "repos": {
                   "type": "array",
@@ -19863,10 +17709,7 @@
                                 "format": "uri"
                               }
                             },
-                            "required": [
-                              "type",
-                              "url"
-                            ],
+                            "required": ["type", "url"],
                             "additionalProperties": false
                           },
                           {
@@ -19880,10 +17723,7 @@
                                 "type": "string"
                               }
                             },
-                            "required": [
-                              "type",
-                              "path"
-                            ],
+                            "required": ["type", "path"],
                             "additionalProperties": false
                           }
                         ]
@@ -19896,10 +17736,7 @@
                           },
                           "resolve": {
                             "type": "string",
-                            "enum": [
-                              "remote",
-                              "local"
-                            ]
+                            "enum": ["remote", "local"]
                           },
                           "ancestor": {
                             "type": "integer",
@@ -19928,10 +17765,7 @@
                         "additionalProperties": false
                       }
                     },
-                    "required": [
-                      "path",
-                      "source"
-                    ],
+                    "required": ["path", "source"],
                     "additionalProperties": false
                   }
                 },
@@ -19967,11 +17801,7 @@
                         },
                         "reset": {
                           "type": "string",
-                          "enum": [
-                            "none",
-                            "fast",
-                            "strict"
-                          ]
+                          "enum": ["none", "fast", "strict"]
                         }
                       },
                       "additionalProperties": false
@@ -20002,11 +17832,7 @@
                         },
                         "reset": {
                           "type": "string",
-                          "enum": [
-                            "none",
-                            "fast",
-                            "strict"
-                          ]
+                          "enum": ["none", "fast", "strict"]
                         }
                       },
                       "additionalProperties": false
@@ -20037,11 +17863,7 @@
                         },
                         "reset": {
                           "type": "string",
-                          "enum": [
-                            "none",
-                            "fast",
-                            "strict"
-                          ]
+                          "enum": ["none", "fast", "strict"]
                         }
                       },
                       "additionalProperties": false
@@ -20072,11 +17894,7 @@
                         },
                         "reset": {
                           "type": "string",
-                          "enum": [
-                            "none",
-                            "fast",
-                            "strict"
-                          ]
+                          "enum": ["none", "fast", "strict"]
                         }
                       },
                       "additionalProperties": false
@@ -20086,11 +17904,7 @@
                 },
                 "mode": {
                   "type": "string",
-                  "enum": [
-                    "pooled",
-                    "temp",
-                    "static"
-                  ]
+                  "enum": ["pooled", "temp", "static"]
                 },
                 "path": {
                   "type": "string"
@@ -20104,9 +17918,7 @@
           ]
         }
       },
-      "required": [
-        "tests"
-      ],
+      "required": ["tests"],
       "additionalProperties": false
     }
   }

From 1fc65514ce6e00df3f534a75cbea7cc86251d4e5 Mon Sep 17 00:00:00 2001
From: Christopher Tso <christso@gmail.com>
Date: Wed, 25 Mar 2026 02:50:32 +0000
Subject: [PATCH 10/11] fix(cli): use process.exit for threshold gate exit code
 (#698)

process.exitCode was being reset by the cmd-ts handler wrapper.
Return thresholdFailed from runEvalCommand and call process.exit(1)
in the handler instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 apps/cli/src/commands/eval/commands/run.ts | 5 ++++-
 apps/cli/src/commands/eval/run-eval.ts     | 8 +++++---
 2 files changed, 9 insertions(+), 4 deletions(-)
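
Note on the pattern used here: the command function reports the gate outcome and the handler decides the exit code. The sketch below is a simplified illustration, not the code in this patch. `GateResult`, `runGate`, and `handleResult` are hypothetical names, and the `exit` callback is injected only so the sketch can run without ending the process.

```typescript
// Simplified, hypothetical stand-in for RunEvalResult; the real type has more fields.
interface GateResult {
  readonly thresholdFailed?: boolean;
}

// Stand-in for runEvalCommand: it reports the gate outcome and leaves the
// exit code to the caller.
function runGate(mean: number, threshold?: number): GateResult {
  return { thresholdFailed: threshold !== undefined && mean < threshold };
}

// Handler-side logic: exit explicitly, and only when the gate failed.
// `exit` is injected so the sketch can run without ending the process.
function handleResult(result: GateResult | undefined, exit: (code: number) => void): void {
  if (result?.thresholdFailed) {
    exit(1);
  }
}
```

In a real handler the second argument would be `process.exit`. Returning a flag instead of assigning `process.exitCode` keeps the outcome visible to the caller even if a wrapper later overwrites the exit code.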

diff --git a/apps/cli/src/commands/eval/commands/run.ts b/apps/cli/src/commands/eval/commands/run.ts
index 713366e7b..5df5ee42b 100644
--- a/apps/cli/src/commands/eval/commands/run.ts
+++ b/apps/cli/src/commands/eval/commands/run.ts
@@ -224,6 +224,9 @@ export const evalRunCommand = command({
       outputMessages: args.outputMessages,
       threshold: args.threshold,
     };
-    await runEvalCommand({ testFiles: resolvedPaths, rawOptions });
+    const result = await runEvalCommand({ testFiles: resolvedPaths, rawOptions });
+    if (result?.thresholdFailed) {
+      process.exit(1);
+    }
   },
 });
diff --git a/apps/cli/src/commands/eval/run-eval.ts b/apps/cli/src/commands/eval/run-eval.ts
index 8dc114969..98d5670fc 100644
--- a/apps/cli/src/commands/eval/run-eval.ts
+++ b/apps/cli/src/commands/eval/run-eval.ts
@@ -754,6 +754,8 @@ export interface RunEvalResult {
   readonly outputPath: string;
   readonly testFiles: readonly string[];
   readonly target?: string;
+  /** True when a suite threshold (CLI or YAML) is set and the mean score is below it */
+  readonly thresholdFailed?: boolean;
 }
 
 export async function runEvalCommand(
@@ -1171,12 +1173,11 @@ export async function runEvalCommand(
     console.log(formatEvaluationSummary(summary));
 
     // Threshold quality gate check
+    let thresholdFailed = false;
     if (resolvedThreshold !== undefined) {
       const thresholdResult = formatThresholdSummary(summary.mean, resolvedThreshold);
       console.log(`\n${thresholdResult.message}`);
-      if (!thresholdResult.passed) {
-        process.exitCode = 1;
-      }
+      thresholdFailed = !thresholdResult.passed;
     }
 
     // Print matrix summary when multiple targets were evaluated
@@ -1273,6 +1274,7 @@ export async function runEvalCommand(
       outputPath,
       testFiles: resolvedTestFiles,
       target: options.target,
+      thresholdFailed,
     };
   } finally {
     unsubscribeCodexLogs();

From 5ea4e4bfc616a3cab7ba3dfb3330b16822bdfeed Mon Sep 17 00:00:00 2001
From: Christopher Tso <christso@gmail.com>
Date: Wed, 25 Mar 2026 02:58:13 +0000
Subject: [PATCH 11/11] docs: add --threshold documentation and CLI validation
 (#698)

- Add CLI range validation (0-1) for --threshold flag
- Document threshold in running-evals.mdx, eval-files.mdx, and SKILL.md
- Remove temporary plan files before merge

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 apps/cli/src/commands/eval/run-eval.ts        |   3 +
 .../content/docs/evaluation/eval-files.mdx    |   2 +-
 .../content/docs/evaluation/running-evals.mdx |  27 +
 .../plans/2026-03-25-threshold-flag-design.md |  76 ---
 docs/plans/2026-03-25-threshold-flag-plan.md  | 562 ------------------
 .../skills/agentv-eval-writer/SKILL.md        |  15 +-
 6 files changed, 45 insertions(+), 640 deletions(-)
 delete mode 100644 docs/plans/2026-03-25-threshold-flag-design.md
 delete mode 100644 docs/plans/2026-03-25-threshold-flag-plan.md
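
Note on resolution order: the CLI value takes precedence over the YAML value, and the resolved value is range-checked before the run starts. A minimal sketch of that logic follows, assuming both inputs are already parsed numbers. `resolveThreshold` is a hypothetical helper name; the patch performs these steps inline in `runEvalCommand`.

```typescript
// Hypothetical helper mirroring the resolution order in this patch:
// the CLI value wins, the YAML value is the fallback.
function resolveThreshold(cli?: number, yaml?: number): number | undefined {
  const resolved = cli ?? yaml;
  // Both bounds are inclusive: 0 and 1 are accepted.
  if (resolved !== undefined && (resolved < 0 || resolved > 1)) {
    throw new Error('--threshold must be between 0 and 1');
  }
  return resolved;
}
```

Using `??` rather than `||` means an explicit `--threshold 0` is kept instead of falling through to the YAML value.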

diff --git a/apps/cli/src/commands/eval/run-eval.ts b/apps/cli/src/commands/eval/run-eval.ts
index 98d5670fc..ac3a84cd9 100644
--- a/apps/cli/src/commands/eval/run-eval.ts
+++ b/apps/cli/src/commands/eval/run-eval.ts
@@ -1014,6 +1014,9 @@ export async function runEvalCommand(
   // Resolve suite-level threshold: CLI --threshold takes precedence over YAML execution.threshold
   const yamlThreshold = firstMeta?.threshold;
   const resolvedThreshold = options.threshold ?? yamlThreshold;
+  if (resolvedThreshold !== undefined && (resolvedThreshold < 0 || resolvedThreshold > 1)) {
+    throw new Error('--threshold must be between 0 and 1');
+  }
 
   // Build the output writer (deferred until after threshold is resolved so JUnit
   // writer can use the resolved threshold for per-test pass/fail decisions)
diff --git a/apps/web/src/content/docs/evaluation/eval-files.mdx b/apps/web/src/content/docs/evaluation/eval-files.mdx
index 281614053..41c03eb97 100644
--- a/apps/web/src/content/docs/evaluation/eval-files.mdx
+++ b/apps/web/src/content/docs/evaluation/eval-files.mdx
@@ -34,7 +34,7 @@ tests:
 |-------|-------------|
 | `description` | Human-readable description of the evaluation |
 | `dataset` | Optional dataset identifier |
-| `execution` | Default execution config (`target`, `fail_on_error`, etc.) |
+| `execution` | Default execution config (`target`, `fail_on_error`, `threshold`, etc.) |
 | `workspace` | Suite-level workspace config — inline object or string path to an [external workspace file](/guides/workspace-pool/#external-workspace-config) |
 | `tests` | Array of individual tests, or a string path to an external file |
 | `assertions` | Suite-level evaluators appended to each test unless `execution.skip_defaults: true` is set on the test |
diff --git a/apps/web/src/content/docs/evaluation/running-evals.mdx b/apps/web/src/content/docs/evaluation/running-evals.mdx
index 7e221bbc6..5c502aa19 100644
--- a/apps/web/src/content/docs/evaluation/running-evals.mdx
+++ b/apps/web/src/content/docs/evaluation/running-evals.mdx
@@ -229,6 +229,33 @@ execution:
 
 When halted, remaining tests are recorded with `failureReasonCode: 'error_threshold_exceeded'`. With concurrency > 1, a few additional tests may complete before halting takes effect.
 
+### Suite-Level Quality Threshold
+
+Set a minimum mean score for the eval suite. If the mean quality score falls below the threshold, the CLI exits with code 1 — useful for CI/CD quality gates.
+
+**CLI flag:**
+
+```bash
+agentv eval evals/ --threshold 0.8
+```
+
+**YAML config:**
+
+```yaml
+execution:
+  threshold: 0.8
+```
+
+The CLI `--threshold` flag overrides the YAML value. The threshold must be a number from 0 to 1 inclusive. Mean score is computed from quality results only (execution errors are excluded).
+
+When active, a summary line is printed after the eval results:
+
+```
+Suite score: 0.85 (threshold: 0.80) — PASS
+```
+
+The threshold also controls JUnit XML pass/fail: tests with scores below the threshold are marked as `<failure>` in JUnit output. When no threshold is set, the JUnit writer falls back to a per-test cutoff of 0.5.
+
 ## Validate Before Running
 
 Check eval files for schema errors without executing:
diff --git a/docs/plans/2026-03-25-threshold-flag-design.md b/docs/plans/2026-03-25-threshold-flag-design.md
deleted file mode 100644
index 29c6b5e74..000000000
--- a/docs/plans/2026-03-25-threshold-flag-design.md
+++ /dev/null
@@ -1,76 +0,0 @@
-# Design: `--threshold` flag for suite-level quality gates
-
-**Issue:** #698
-**Date:** 2026-03-25
-
-## Objective
-
-Add a `--threshold` CLI flag to `agentv eval` that fails (exit 1) if the mean score across all tests falls below the specified threshold. This enables CI/CD quality gating without needing `agentv compare --baseline`.
-
-## CLI Flag
-
-- `--threshold <number>` on `agentv eval run` (0–1 scale)
-- Optional — if omitted, no threshold check (current behavior preserved)
-- Overrides `execution.threshold` from YAML if both set
-
-## YAML Config
-
-Add `threshold` to the `execution` block in eval YAML files:
-
-```yaml
-execution:
-  threshold: 0.8
-```
-
-Both `threshold` and `execution.threshold` accepted (snake_case wire format convention).
-
-## Score Evaluation
-
-After all tests complete:
-
-1. Compute mean score from quality results only (excluding `execution_error` tests — same as existing `calculateEvaluationSummary()`)
-2. If mean score < threshold → exit code 1
-3. Execution errors fail independently via existing `fail_on_error` mechanism (separate concern)
-4. If no quality results exist (all execution errors), threshold check is skipped
-
-## Output
-
-When threshold is active, append a summary line after the existing result summary:
-
-```
-Suite score: 0.53 (threshold: 0.60) — FAIL
-```
-
-or:
-
-```
-Suite score: 0.85 (threshold: 0.60) — PASS
-```
-
-## JUnit Integration
-
-The JUnit writer uses the threshold for per-test pass/fail:
-
-- If threshold is set: `score < threshold` → `<failure>` element
-- If threshold is not set: `score < 0.5` (current hardcoded behavior preserved)
-
-## Exit Code
-
-- Exit 0: mean score >= threshold (or no threshold set)
-- Exit 1: mean score < threshold
-- Execution errors handled separately by `fail_on_error`
-
-## Files to Modify
-
-1. `packages/core/src/evaluation/validation/eval-file.schema.ts` — add `threshold` to ExecutionSchema
-2. `apps/cli/src/commands/eval/commands/run.ts` — add `--threshold` CLI flag
-3. `apps/cli/src/commands/eval/run-eval.ts` — pass threshold through, check after results
-4. `apps/cli/src/commands/eval/statistics.ts` — add threshold summary formatting
-5. `apps/cli/src/commands/eval/junit-writer.ts` — use threshold for pass/fail
-6. Tests for new behavior
-
-## Non-Goals
-
-- Per-test threshold override (use `required` for that)
-- Replacement for `agentv compare` regression gating
-- Severity levels (#334)
diff --git a/docs/plans/2026-03-25-threshold-flag-plan.md b/docs/plans/2026-03-25-threshold-flag-plan.md
deleted file mode 100644
index 57ba2eb53..000000000
--- a/docs/plans/2026-03-25-threshold-flag-plan.md
+++ /dev/null
@@ -1,562 +0,0 @@
-# `--threshold` Flag Implementation Plan
-
-> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
-
-**Goal:** Add a `--threshold` CLI flag and `execution.threshold` YAML field to `agentv eval` that exits 1 when mean quality score falls below the threshold.
-
-**Architecture:** The threshold value flows from CLI flag or YAML config through the existing options pipeline. After all tests complete, the summary is checked against the threshold. JUnit writer also uses the threshold for per-test pass/fail.
-
-**Tech Stack:** TypeScript, cmd-ts (CLI parsing), Zod (schema validation), Vitest (testing)
-
----
-
-### Task 1: Add `extractThreshold` to core config-loader
-
-**Files:**
-- Modify: `packages/core/src/evaluation/loaders/config-loader.ts:287` (after `extractTotalBudgetUsd`)
-- Test: `packages/core/test/evaluation/loaders/config-loader.test.ts`
-
-**Step 1: Write the failing tests**
-
-Add to `packages/core/test/evaluation/loaders/config-loader.test.ts` after the `extractFailOnError` describe block:
-
-```typescript
-describe('extractThreshold', () => {
-  it('returns undefined when no execution block', () => {
-    const suite: JsonObject = { tests: [] };
-    expect(extractThreshold(suite)).toBeUndefined();
-  });
-
-  it('returns undefined when threshold not set', () => {
-    const suite: JsonObject = { execution: { target: 'default' } };
-    expect(extractThreshold(suite)).toBeUndefined();
-  });
-
-  it('parses valid threshold', () => {
-    const suite: JsonObject = { execution: { threshold: 0.8 } };
-    expect(extractThreshold(suite)).toBe(0.8);
-  });
-
-  it('accepts 0 as threshold', () => {
-    const suite: JsonObject = { execution: { threshold: 0 } };
-    expect(extractThreshold(suite)).toBe(0);
-  });
-
-  it('accepts 1 as threshold', () => {
-    const suite: JsonObject = { execution: { threshold: 1 } };
-    expect(extractThreshold(suite)).toBe(1);
-  });
-
-  it('returns undefined for negative threshold', () => {
-    const suite: JsonObject = { execution: { threshold: -0.1 } };
-    expect(extractThreshold(suite)).toBeUndefined();
-  });
-
-  it('returns undefined for threshold > 1', () => {
-    const suite: JsonObject = { execution: { threshold: 1.5 } };
-    expect(extractThreshold(suite)).toBeUndefined();
-  });
-
-  it('returns undefined for non-number threshold', () => {
-    const suite: JsonObject = { execution: { threshold: 'high' } };
-    expect(extractThreshold(suite)).toBeUndefined();
-  });
-});
-```
-
-Also add `extractThreshold` to the import at the top of the test file.
-
-**Step 2: Run tests to verify they fail**
-
-Run: `bun test packages/core/test/evaluation/loaders/config-loader.test.ts`
-Expected: FAIL — `extractThreshold` not found
-
-**Step 3: Implement `extractThreshold`**
-
-Add to `packages/core/src/evaluation/loaders/config-loader.ts` after `extractTotalBudgetUsd` (after line ~308):
-
-```typescript
-/**
- * Extract `execution.threshold` from parsed eval suite.
- * Accepts a number in [0, 1] range.
- * Returns undefined when not specified.
- */
-export function extractThreshold(suite: JsonObject): number | undefined {
-  const execution = suite.execution;
-  if (!execution || typeof execution !== 'object' || Array.isArray(execution)) {
-    return undefined;
-  }
-
-  const executionObj = execution as Record<string, unknown>;
-  const raw = executionObj.threshold;
-
-  if (raw === undefined || raw === null) {
-    return undefined;
-  }
-
-  if (typeof raw === 'number' && raw >= 0 && raw <= 1) {
-    return raw;
-  }
-
-  logWarning(
-    `Invalid execution.threshold: ${raw}. Must be a number between 0 and 1. Ignoring.`,
-  );
-  return undefined;
-}
-```
-
-**Step 4: Run tests to verify they pass**
-
-Run: `bun test packages/core/test/evaluation/loaders/config-loader.test.ts`
-Expected: PASS
-
-**Step 5: Commit**
-
-```bash
-git add packages/core/src/evaluation/loaders/config-loader.ts packages/core/test/evaluation/loaders/config-loader.test.ts
-git commit -m "feat(core): add extractThreshold for execution.threshold YAML field (#698)"
-```
-
----
-
-### Task 2: Wire `extractThreshold` through YAML parser and schema
-
-**Files:**
-- Modify: `packages/core/src/evaluation/yaml-parser.ts:12` (imports), `:58` (re-exports), `:204` (loadTestSuite)
-- Modify: `packages/core/src/evaluation/yaml-parser.ts:168` (EvalSuiteResult type)
-- Modify: `packages/core/src/evaluation/validation/eval-file.schema.ts:317` (ExecutionSchema)
-
-**Step 1: Add `threshold` to ExecutionSchema in eval-file.schema.ts**
-
-In `packages/core/src/evaluation/validation/eval-file.schema.ts`, add to the `ExecutionSchema` object (after `failOnError` at line 330):
-
-```typescript
-  threshold: z.number().min(0).max(1).optional(),
-```
-
-**Step 2: Add to EvalSuiteResult type in yaml-parser.ts**
-
-In `packages/core/src/evaluation/yaml-parser.ts`, add to the `EvalSuiteResult` type (after `failOnError` at line 182):
-
-```typescript
-  /** Suite-level quality threshold (0-1) — suite fails if mean score is below */
-  readonly threshold?: number;
-```
-
-**Step 3: Import and re-export `extractThreshold` in yaml-parser.ts**
-
-Add `extractThreshold` to the import from `./loaders/config-loader.js` (line 12 area) and the re-export block (line 58 area).
-
-**Step 4: Use in `loadTestSuite`**
-
-In the `loadTestSuite` function (around line 203), extract and return threshold:
-
-```typescript
-  const threshold = extractThreshold(parsed);
-  return {
-    tests,
-    trials: extractTrialsConfig(parsed),
-    targets: extractTargetsFromSuite(parsed),
-    workers: extractWorkersFromSuite(parsed),
-    cacheConfig: extractCacheConfig(parsed),
-    totalBudgetUsd: extractTotalBudgetUsd(parsed),
-    ...(metadata !== undefined && { metadata }),
-    ...(failOnError !== undefined && { failOnError }),
-    ...(threshold !== undefined && { threshold }),
-  };
-```
-
-**Step 5: Regenerate the JSON schema**
-
-Run: `bun run generate:schema`
-
-**Step 6: Run core tests**
-
-Run: `bun test packages/core/test/evaluation/loaders/config-loader.test.ts`
-Expected: PASS
-
-**Step 7: Commit**
-
-```bash
-git add packages/core/src/evaluation/validation/eval-file.schema.ts packages/core/src/evaluation/yaml-parser.ts
-git commit -m "feat(core): wire extractThreshold through YAML parser and schema (#698)"
-```
-
----
-
-### Task 3: Add `--threshold` CLI flag and pass through to run-eval
-
-**Files:**
-- Modify: `apps/cli/src/commands/eval/commands/run.ts` (add CLI flag)
-- Modify: `apps/cli/src/commands/eval/run-eval.ts` (NormalizedOptions, normalizeOptions, handler return)
-
-**Step 1: Add CLI flag to run.ts**
-
-In `apps/cli/src/commands/eval/commands/run.ts`, add after the `model` option (around line 171):
-
-```typescript
-    threshold: option({
-      type: optional(number),
-      long: 'threshold',
-      description: 'Suite-level quality gate: exit 1 if mean score falls below this value (0-1)',
-    }),
-```
-
-And add `threshold: args.threshold` to the `rawOptions` object in the handler (around line 219).
-
-**Step 2: Add to NormalizedOptions in run-eval.ts**
-
-In `apps/cli/src/commands/eval/run-eval.ts`, add to the `NormalizedOptions` interface:
-
-```typescript
-  readonly threshold?: number;
-```
-
-**Step 3: Add to normalizeOptions**
-
-In the `normalizeOptions` function, add threshold resolution (CLI > YAML):
-
-```typescript
-  // Resolve threshold: CLI --threshold > YAML execution.threshold
-  const cliThreshold = normalizeOptionalNumber(rawOptions.threshold);
-```
-
-And in the return statement:
-
-```typescript
-    threshold: cliThreshold,
-```
-
-**Step 4: Wire YAML threshold into normalized options**
-
-In `runEvalCommand`, after `prepareEvalFile` returns, merge the YAML threshold if CLI didn't set one. In the loop over eval files (around the `prepareEvalFile` call), capture `suite.threshold` and pass it through.
-
-The cleanest approach: read the YAML threshold in `prepareEvalFile` and return it alongside the other fields. Then in the main `runEvalCommand`, resolve CLI vs YAML threshold.
-
-Add `threshold` to the `prepareEvalFile` return type (alongside `failOnError`):
-
-```typescript
-  readonly threshold?: number;
-```
-
-And in `prepareEvalFile`, add after `failOnError: suite.failOnError`:
-
-```typescript
-    threshold: suite.threshold,
-```
-
-**Step 5: Commit**
-
-```bash
-git add apps/cli/src/commands/eval/commands/run.ts apps/cli/src/commands/eval/run-eval.ts
-git commit -m "feat(cli): add --threshold flag and wire through options pipeline (#698)"
-```
-
----
-
-### Task 4: Add threshold check and summary output after eval completes
-
-**Files:**
-- Modify: `apps/cli/src/commands/eval/run-eval.ts` (after summary calculation ~line 1152)
-- Modify: `apps/cli/src/commands/eval/statistics.ts` (add `formatThresholdSummary`)
-- Test: `apps/cli/test/commands/eval/threshold.test.ts` (new)
-
-**Step 1: Write failing tests**
-
-Create `apps/cli/test/commands/eval/threshold.test.ts`:
-
-```typescript
-import { describe, expect, it } from 'bun:test';
-
-import type { EvaluationResult } from '@agentv/core';
-
-import { formatThresholdSummary } from '../../../src/commands/eval/statistics.js';
-
-function makeResult(overrides: Partial<EvaluationResult> = {}): EvaluationResult {
-  return {
-    timestamp: '2024-01-01T00:00:00Z',
-    testId: 'test-1',
-    score: 1.0,
-    assertions: [{ text: 'criterion-1', passed: true }],
-    output: [{ role: 'assistant' as const, content: 'answer' }],
-    target: 'default',
-    ...overrides,
-  };
-}
-
-describe('formatThresholdSummary', () => {
-  it('returns PASS when mean score meets threshold', () => {
-    const result = formatThresholdSummary(0.85, 0.6);
-    expect(result.passed).toBe(true);
-    expect(result.message).toContain('0.85');
-    expect(result.message).toContain('0.60');
-    expect(result.message).toContain('PASS');
-  });
-
-  it('returns FAIL when mean score is below threshold', () => {
-    const result = formatThresholdSummary(0.53, 0.6);
-    expect(result.passed).toBe(false);
-    expect(result.message).toContain('0.53');
-    expect(result.message).toContain('0.60');
-    expect(result.message).toContain('FAIL');
-  });
-
-  it('returns PASS when mean score exactly equals threshold', () => {
-    const result = formatThresholdSummary(0.6, 0.6);
-    expect(result.passed).toBe(true);
-  });
-
-  it('returns PASS for threshold 0 with any score', () => {
-    const result = formatThresholdSummary(0, 0);
-    expect(result.passed).toBe(true);
-  });
-});
-```
-
-**Step 2: Run tests to verify they fail**
-
-Run: `bun test apps/cli/test/commands/eval/threshold.test.ts`
-Expected: FAIL — `formatThresholdSummary` not found
-
-**Step 3: Implement `formatThresholdSummary` in statistics.ts**
-
-Add to `apps/cli/src/commands/eval/statistics.ts`:
-
-```typescript
-/**
- * Format a threshold check summary line.
- * Returns whether the threshold was met and the formatted message.
- */
-export function formatThresholdSummary(
-  meanScore: number,
-  threshold: number,
-): { passed: boolean; message: string } {
-  const passed = meanScore >= threshold;
-  const verdict = passed ? 'PASS' : 'FAIL';
-  const message = `Suite score: ${meanScore.toFixed(2)} (threshold: ${threshold.toFixed(2)}) — ${verdict}`;
-  return { passed, message };
-}
-```
-
-**Step 4: Run tests to verify they pass**
-
-Run: `bun test apps/cli/test/commands/eval/threshold.test.ts`
-Expected: PASS
-
-**Step 5: Wire the threshold check into run-eval.ts**
-
-In `apps/cli/src/commands/eval/run-eval.ts`, after the summary is printed (around line 1153), add:
-
-```typescript
-    // Threshold quality gate check
-    const resolvedThreshold = options.threshold ?? yamlThreshold;
-    if (resolvedThreshold !== undefined) {
-      const { formatThresholdSummary } = await import('./statistics.js');
-      const thresholdResult = formatThresholdSummary(summary.mean, resolvedThreshold);
-      console.log(`\n${thresholdResult.message}`);
-      if (!thresholdResult.passed) {
-        process.exitCode = 1;
-      }
-    }
-```
-
-Note: `yamlThreshold` needs to be captured from the `prepareEvalFile` results. If multiple eval files are run, use the first non-undefined threshold (or the CLI value).
-
-Import `formatThresholdSummary` statically at the top (preferred over dynamic import since it's in the same package):
-
-```typescript
-import {
-  calculateEvaluationSummary,
-  formatEvaluationSummary,
-  formatMatrixSummary,
-  formatThresholdSummary,
-} from './statistics.js';
-```
-
-**Step 6: Commit**
-
-```bash
-git add apps/cli/src/commands/eval/statistics.ts apps/cli/src/commands/eval/run-eval.ts apps/cli/test/commands/eval/threshold.test.ts
-git commit -m "feat(cli): add threshold check with summary output after eval (#698)"
-```
-
----
-
-### Task 5: JUnit writer uses threshold for per-test pass/fail
-
-**Files:**
-- Modify: `apps/cli/src/commands/eval/junit-writer.ts`
-- Modify: `apps/cli/test/commands/eval/output-writers.test.ts` (add tests)
-
-**Step 1: Write failing tests**
-
-Add to `apps/cli/test/commands/eval/output-writers.test.ts` in the JUnit describe block:
-
-```typescript
-  it('uses custom threshold for pass/fail when provided', async () => {
-    const filePath = path.join(testDir, `junit-threshold-${Date.now()}.xml`);
-    const writer = await JunitWriter.open(filePath, { threshold: 0.8 });
-
-    await writer.append(makeResult({ testId: 'high', score: 0.9 }));
-    await writer.append(makeResult({ testId: 'mid', score: 0.6 }));
-    await writer.close();
-
-    const xml = await readFile(filePath, 'utf8');
-    expect(xml).not.toContain('<failure message="score=0.900"');
-    expect(xml).toContain('<failure message="score=0.600"');
-  });
-
-  it('defaults to 0.5 threshold when none provided', async () => {
-    const filePath = path.join(testDir, `junit-default-${Date.now()}.xml`);
-    const writer = await JunitWriter.open(filePath);
-
-    await writer.append(makeResult({ testId: 'pass', score: 0.6 }));
-    await writer.append(makeResult({ testId: 'fail', score: 0.3 }));
-    await writer.close();
-
-    const xml = await readFile(filePath, 'utf8');
-    expect(xml).not.toContain('<failure message="score=0.600"');
-    expect(xml).toContain('<failure message="score=0.300"');
-  });
-```
-
-**Step 2: Run tests to verify they fail**
-
-Run: `bun test apps/cli/test/commands/eval/output-writers.test.ts`
-Expected: FAIL — `JunitWriter.open` doesn't accept options
-
-**Step 3: Implement threshold support in JunitWriter**
-
-Modify `apps/cli/src/commands/eval/junit-writer.ts`:
-
-```typescript
-export interface JunitWriterOptions {
-  readonly threshold?: number;
-}
-
-export class JunitWriter {
-  private readonly filePath: string;
-  private readonly results: EvaluationResult[] = [];
-  private readonly threshold: number;
-  private closed = false;
-
-  private constructor(filePath: string, options?: JunitWriterOptions) {
-    this.filePath = filePath;
-    this.threshold = options?.threshold ?? 0.5;
-  }
-
-  static async open(filePath: string, options?: JunitWriterOptions): Promise<JunitWriter> {
-    await mkdir(path.dirname(filePath), { recursive: true });
-    return new JunitWriter(filePath, options);
-  }
-```
-
-Then replace all `r.score < 0.5` with `r.score < this.threshold` in the `close()` method.
-
-**Step 4: Pass threshold to JunitWriter in output-writer.ts**
-
-In `apps/cli/src/commands/eval/output-writer.ts`, where JunitWriter is created, pass the threshold. Check how output writers are created and thread the threshold through.
-
-**Step 5: Run tests to verify they pass**
-
-Run: `bun test apps/cli/test/commands/eval/output-writers.test.ts`
-Expected: PASS
-
-**Step 6: Commit**
-
-```bash
-git add apps/cli/src/commands/eval/junit-writer.ts apps/cli/src/commands/eval/output-writer.ts apps/cli/test/commands/eval/output-writers.test.ts
-git commit -m "feat(cli): JUnit writer uses --threshold for per-test pass/fail (#698)"
-```
-
----
-
-### Task 6: Add `threshold` to Zod schema and regenerate JSON schema
-
-**Files:**
-- Modify: `packages/core/src/evaluation/validation/eval-file.schema.ts` (already done in Task 2)
-- Run: `bun run generate:schema`
-
-**Step 1: Verify threshold is in ExecutionSchema**
-
-Read `packages/core/src/evaluation/validation/eval-file.schema.ts` and confirm `threshold` was added in Task 2.
-
-**Step 2: Regenerate JSON schema**
-
-Run: `bun run generate:schema`
-
-**Step 3: Run validate:examples to check existing YAML files still pass**
-
-Run: `bun run validate:examples`
-Expected: PASS (threshold is optional, so existing files are unaffected)
-
-**Step 4: Commit if schema file changed**
-
-```bash
-git add packages/core/
-git commit -m "chore: regenerate eval-schema.json with threshold field (#698)"
-```
-
----
-
-### Task 7: Run full test suite and verify
-
-**Step 1: Run all tests**
-
-Run: `bun run test`
-Expected: PASS (except any pre-existing known failures)
-
-**Step 2: Run typecheck**
-
-Run: `bun run typecheck`
-Expected: PASS
-
-**Step 3: Run lint**
-
-Run: `bun run lint`
-Expected: PASS
-
-**Step 4: Run build**
-
-Run: `bun run build`
-Expected: PASS
-
----
-
-### Task 8: Manual red/green UAT
-
-**Step 1: Red — verify no threshold behavior on main**
-
-Run an eval without --threshold:
-
-```bash
-bun apps/cli/src/cli.ts eval examples/features/rubric/evals/dataset.eval.yaml --test-id summary-1
-```
-
-Confirm: no "Suite score" line in output, exit code is 0.
-
-**Step 2: Green — verify --threshold works**
-
-Run with a threshold that should PASS:
-
-```bash
-bun apps/cli/src/cli.ts eval examples/features/rubric/evals/dataset.eval.yaml --test-id summary-1 --threshold 0.3
-```
-
-Confirm: "Suite score: X.XX (threshold: 0.30) — PASS" printed, exit code 0.
-
-Run with a threshold that should FAIL:
-
-```bash
-bun apps/cli/src/cli.ts eval examples/features/rubric/evals/dataset.eval.yaml --test-id summary-1 --threshold 0.99
-```
-
-Confirm: "Suite score: X.XX (threshold: 0.99) — FAIL" printed, exit code 1.
-
-**Step 3: Verify JUnit output uses threshold**
-
-```bash
-bun apps/cli/src/cli.ts eval examples/features/rubric/evals/dataset.eval.yaml --test-id summary-1 --threshold 0.9 -o /tmp/test-threshold.xml
-```
-
-Inspect the XML: tests with score < 0.9 should have `<failure>` elements.
diff --git a/plugins/agentv-dev/skills/agentv-eval-writer/SKILL.md b/plugins/agentv-dev/skills/agentv-eval-writer/SKILL.md
index efc818f3c..7a6f2c3f5 100644
--- a/plugins/agentv-dev/skills/agentv-eval-writer/SKILL.md
+++ b/plugins/agentv-dev/skills/agentv-eval-writer/SKILL.md
@@ -520,11 +520,24 @@ execution:
 
 When halted, remaining tests get `executionStatus: 'execution_error'` with `failureReasonCode: 'error_threshold_exceeded'`.
 
+## Suite-Level Quality Threshold
+
+Set a minimum mean score for the eval suite. If the mean quality score falls below the threshold, the CLI exits with code 1 — useful for CI/CD quality gates.
+
+```yaml
+execution:
+  threshold: 0.8
+```
+
+The CLI flag `--threshold 0.8` overrides the YAML value. The threshold must be a number between 0 and 1. The mean score is computed from quality results only; tests that end in `execution_error` are excluded.
+
+The threshold also controls per-test pass/fail in JUnit XML output: a test whose score is below the threshold gets a `<failure>` element. When no threshold is set, the JUnit writer uses a default of 0.5.
+
 ## CLI Commands
 
 ```bash
 # Run evaluation (requires API keys)
-agentv eval <file.yaml> [--test-id <id>] [--target <name>] [--dry-run]
+agentv eval <file.yaml> [--test-id <id>] [--target <name>] [--dry-run] [--threshold <0-1>]
 
 # Run with OTLP JSON file (importable by OTel backends)
 agentv eval <file.yaml> --otel-file traces/eval.otlp.json