diff --git a/AGENTS.md b/AGENTS.md
index e0a6b1aa8..cb022cb1b 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -258,3 +258,185 @@ When making changes to functionality:
 2. **Skill files** (`plugins/agentv-dev/skills/agentv-eval-builder/`): Update the AI-focused reference card if the change affects YAML schema, evaluator types, or CLI commands. Keep concise — link to docs site for details.
 
 3. **Examples** (`examples/`): Update any example code, scripts, or eval YAML files that exercise the changed functionality. Examples are both documentation and integration tests.
+
+4. **README.md**: Keep minimal. Links point to agentv.dev.
+
+## Evaluator Type System
+
+Evaluator types use **kebab-case** everywhere (matching promptfoo convention):
+
+- **YAML config:** `type: llm-grader`, `type: is-json`, `type: execution-metrics`
+- **Internal TypeScript:** `EvaluatorKind = 'llm-grader' | 'is-json' | ...`
+- **Output `scores[].type`:** `"llm-grader"`, `"is-json"`
+- **Registry keys:** `registry.register('llm-grader', ...)`
+
+**Source of truth:** `EVALUATOR_KIND_VALUES` array in `packages/core/src/evaluation/types.ts`
+
+**Backward compatibility:** Snake_case is accepted in YAML (`llm_judge` → `llm-grader`) via `normalizeEvaluatorType()` in `evaluator-parser.ts`. Single-word types (`contains`, `equals`, `regex`, `latency`, `cost`) have no separator and are unchanged.
+
+**Two type definitions exist:**
+- `EvaluatorKind` in `packages/core/src/evaluation/types.ts` — internal, canonical
+- `AssertionType` in `packages/eval/src/assertion.ts` — SDK-facing, must stay in sync
+
+## Git Workflow
+
+### Commit Convention
+
+Follow conventional commits: `type(scope): description`
+
+Types: `feat`, `fix`, `docs`, `style`, `refactor`, `test`, `chore`
+
+### Issue Workflow
+
+When working on a GitHub issue, **ALWAYS** follow this workflow:
+
+1. **Claim the issue** — prevents other agents from duplicating work:
+   ```bash
+   # Load AGENT_ID from .env; if not set, ask the user or default to <harness>-<model>
+   # Harness = the coding tool (claude-code, opencode, codex-cli, cursor, etc.)
+   # Model = the LLM (opus, sonnet, o3, etc.)
+   # Examples: "claude-code-opus", "opencode-sonnet", "cursor-o3", "codex-cli-o3"
+   # In this local dev environment, default to "devbox2-codex" unless the user specifies another AGENT_ID.
+   # Do NOT use hostname or machine name.
+   source .env 2>/dev/null
+   if [ -z "$AGENT_ID" ]; then
+     echo "AGENT_ID is not set. Ask the user for an agent identifier, or default to devbox2-codex in this environment (otherwise use <harness>-<model>)."
+   fi
+
+   # Check if already claimed
+   gh issue view <number> --json labels --jq '.labels[].name' | grep -q "in-progress" && echo "SKIP — already claimed" && exit 1
+
+   # Claim it — label + project roadmap status
+   gh issue edit <number> --add-label "in-progress"
+
+   # Update project roadmap: set status to "In Progress" and stamp Agent ID
+   ITEM_ID=$(gh project item-list 1 --owner EntityProcess --format json | jq -r '.items[] | select(.content.number == <number> and .content.repository == "agentv") | .id')
+   if [ -n "$ITEM_ID" ]; then
+     gh project item-edit --project-id PVT_kwDOAIbbRc4BSmjF --id "$ITEM_ID" --field-id PVTSSF_lADOAIbbRc4BSmjFzhAFomw --single-select-option-id 47fc9ee4
+     gh project item-edit --project-id PVT_kwDOAIbbRc4BSmjF --id "$ITEM_ID" --field-id PVTF_lADOAIbbRc4BSmjFzhAHSnk --text "$AGENT_ID"
+   fi
+   ```
+   If the issue has the `in-progress` label, **do not work on it** — pick a different issue.
+
+2. **Create a worktree** with a feature branch:
+   ```bash
+   git worktree add agentv.worktrees/<branch-name> -b <type>/<issue-number>-<short-description>
+   cd agentv.worktrees/<branch-name>
+   bun install
+   cp "$(git worktree list --porcelain | head -1 | sed 's/worktree //')/.env" .env
+   # Example: git worktree add agentv.worktrees/feat/42-add-new-embedder -b feat/42-add-new-embedder
+   ```
+
+3. **Implement the changes** and commit following the commit convention
+
+4. **Push the branch and create a Pull Request**:
+   ```bash
+   git push -u origin <branch-name>
+   gh pr create --title "<type>(scope): description" --body "Closes #<issue-number>"
+   ```
+
+5. **Before merging**, ensure:
+   - **E2E verification completed** (see "Completing Work — E2E Checklist")
+   - CI pipeline passes (all checks green)
+   - Code has been reviewed if required
+   - No merge conflicts with `main`
+
+The `in-progress` label stays on the issue until the PR is merged and the issue is closed. Do not remove it manually.
+
+**IMPORTANT:** Never push directly to `main`. Always use branches and PRs.
+
+### Tracker Conventions
+
+- The roadmap project is the source of truth for prioritization.
+- Issues in the roadmap are prioritized; issues outside it are not.
+- `bug` marks defects.
+- Issues without `bug` are non-bug work by default.
+- `in-progress` marks an issue as claimed by an agent — do not start work on it.
+- `core`, `wui`, and `tui` are area labels.
+- Keep issue bodies focused on the handoff contract: objective, design latitude, acceptance signals, non-goals, and related links.
+- Do not put priority metadata in issue bodies.
+
+### Pull Requests
+
+**Always use squash merge** when merging PRs to main. This keeps the commit history clean with one commit per feature/fix.
+
+```bash
+# Using GitHub CLI to squash merge a PR
+gh pr merge <PR_NUMBER> --squash --delete-branch
+
+# Or with auto-merge enabled
+gh pr merge <PR_NUMBER> --squash --auto
+```
+
+Do NOT use regular merge or rebase merge, as these create noisy commit history with intermediate commits.
+
+### After Squash Merge
+
+Once a PR is squash-merged, its source branch diverges from main. **Do NOT** try to push additional commits from that branch—you will get merge conflicts.
+
+For follow-up fixes:
+```bash
+git checkout main
+git pull origin main
+git checkout -b fix/<short-description>
+# Apply fixes on the fresh branch
+```
+
+### Plans and Worktrees
+
+#### Plans
+
+Design documents and implementation plans are stored in `docs/plans/` inside the worktree (not the main repo). Save plans to the worktree so they are committed on the feature branch and visible in the draft PR.
+
+**Path warning:** When working in a worktree, use paths relative to the worktree root (e.g., `docs/plans/plan.md`). Do NOT prefix with the worktree directory from the main repo (e.g., `agentv.worktrees/feat/xxx/docs/plans/plan.md`) — this creates accidental nested directories inside the worktree.
+
+Plans are temporary working materials. **Before merging the PR**, delete the plan file and incorporate any user-relevant details into the official documentation.
+
+#### Git Worktrees
+
+Use the sibling `../agentv.worktrees/` directory for all AgentV worktrees. This overrides any generic skill or default preference for `.worktrees/` or `worktrees/` inside the repository. Do not create new AgentV worktrees inside the repository root.
+
+After creating a worktree, always run setup:
+```bash
+bun install                                    # worktrees do NOT share node_modules
+cp "$(git worktree list --porcelain | head -1 | sed 's/worktree //')/.env" .env    # required for e2e tests and LLM operations
+```
+Both steps are required before running builds, tests, or evals in the worktree.
+
+## Version Management
+
+This project uses a simple release script for version bumping. The git commit history serves as the changelog.
+
+### Releasing a new version
+
+Run the release script for a version bump:
+
+```bash
+bun run release          # patch bump (default)
+bun run release minor    # minor bump
+bun run release major    # major bump
+```
+
+The script will:
+1. Validate you're on the `main` branch with no uncommitted changes
+2. Pull latest changes from origin
+3. Bump version in all package.json files
+4. Commit the version bump
+5. Create and push a git tag
+
+Recommended publish flow:
+```bash
+bun run publish:next   # publish current version to npm `next`
+bun run promote:latest # promote same version to npm `latest`
+bun run tag:next 2.18.0
+bun run promote:latest 2.18.0
+```
+
+## Package Publishing
+- Core package (`packages/core/`) - Core evaluation engine and grading logic (published as `@agentv/core`)
+- CLI package (`apps/cli/`) is published as `agentv` on npm
+- Uses tsup with `noExternal: ["@agentv/core"]` to bundle workspace dependencies
+- Install command: `bun install -g agentv` (preferred) or `npm install -g agentv`
+
+## Python Scripts
+When running Python scripts, always use: `uv run <script.py>`
diff --git a/apps/cli/src/commands/eval/commands/run.ts b/apps/cli/src/commands/eval/commands/run.ts
index e680301f3..5df5ee42b 100644
--- a/apps/cli/src/commands/eval/commands/run.ts
+++ b/apps/cli/src/commands/eval/commands/run.ts
@@ -175,6 +175,11 @@ export const evalRunCommand = command({
       description:
         'Number of trailing messages to include in results output (default: 1, or "all")',
     }),
+    threshold: option({
+      type: optional(number),
+      long: 'threshold',
+      description: 'Suite-level quality gate: exit 1 if mean score falls below this value (0-1)',
+    }),
   },
   handler: async (args) => {
     // Launch interactive wizard when no eval paths and stdin is a TTY
@@ -217,7 +222,11 @@ export const evalRunCommand = command({
       graderTarget: args.graderTarget,
       model: args.model,
       outputMessages: args.outputMessages,
+      threshold: args.threshold,
     };
-    await runEvalCommand({ testFiles: resolvedPaths, rawOptions });
+    const result = await runEvalCommand({ testFiles: resolvedPaths, rawOptions });
+    if (result?.thresholdFailed) {
+      process.exit(1);
+    }
   },
 });
diff --git a/apps/cli/src/commands/eval/junit-writer.ts b/apps/cli/src/commands/eval/junit-writer.ts
index f3bfb7f18..514b24585 100644
--- a/apps/cli/src/commands/eval/junit-writer.ts
+++ b/apps/cli/src/commands/eval/junit-writer.ts
@@ -3,6 +3,10 @@ import path from 'node:path';
 
 import type { EvaluationResult } from '@agentv/core';
 
+export interface JunitWriterOptions {
+  readonly threshold?: number;
+}
+
 export function escapeXml(str: string): string {
   return str
     .replace(/&/g, '&amp;')
@@ -15,15 +19,17 @@ export function escapeXml(str: string): string {
 export class JunitWriter {
   private readonly filePath: string;
   private readonly results: EvaluationResult[] = [];
+  private readonly threshold: number;
   private closed = false;
 
-  private constructor(filePath: string) {
+  private constructor(filePath: string, options?: JunitWriterOptions) {
     this.filePath = filePath;
+    this.threshold = options?.threshold ?? 0.5;
   }
 
-  static async open(filePath: string): Promise<JunitWriter> {
+  static async open(filePath: string, options?: JunitWriterOptions): Promise<JunitWriter> {
     await mkdir(path.dirname(filePath), { recursive: true });
-    return new JunitWriter(filePath);
+    return new JunitWriter(filePath, options);
   }
 
   async append(result: EvaluationResult): Promise<void> {
@@ -52,7 +58,7 @@ export class JunitWriter {
 
     const suiteXmls: string[] = [];
     for (const [suiteName, results] of grouped) {
-      const failures = results.filter((r) => r.score < 0.5).length;
+      const failures = results.filter((r) => r.score < this.threshold).length;
       const errors = results.filter((r) => r.error !== undefined).length;
 
       const testCases = results.map((r) => {
@@ -61,7 +67,7 @@ export class JunitWriter {
         let inner = '';
         if (r.error) {
           inner = `\n      <error message="${escapeXml(r.error)}">${escapeXml(r.error)}</error>\n    `;
-        } else if (r.score < 0.5) {
+        } else if (r.score < this.threshold) {
           const message = `score=${r.score.toFixed(3)}`;
           const failedAssertions = r.assertions.filter((a) => !a.passed);
           const detail = [
@@ -84,7 +90,7 @@ export class JunitWriter {
     }
 
     const totalTests = this.results.length;
-    const totalFailures = this.results.filter((r) => r.score < 0.5).length;
+    const totalFailures = this.results.filter((r) => r.score < this.threshold).length;
     const totalErrors = this.results.filter((r) => r.error !== undefined).length;
 
     const xml = `<?xml version="1.0" encoding="UTF-8"?>\n<testsuites tests="${totalTests}" failures="${totalFailures}" errors="${totalErrors}">\n${suiteXmls.join('\n')}\n</testsuites>\n`;
diff --git a/apps/cli/src/commands/eval/output-writer.ts b/apps/cli/src/commands/eval/output-writer.ts
index acaf757fe..e4d2cebd8 100644
--- a/apps/cli/src/commands/eval/output-writer.ts
+++ b/apps/cli/src/commands/eval/output-writer.ts
@@ -15,6 +15,10 @@ export interface OutputWriter {
   close(): Promise<void>;
 }
 
+export interface WriterOptions {
+  readonly threshold?: number;
+}
+
 export async function createOutputWriter(
   filePath: string,
   format: OutputFormat,
@@ -35,7 +39,10 @@ export async function createOutputWriter(
 
 const SUPPORTED_EXTENSIONS = new Set(['.jsonl', '.json', '.xml', '.yaml', '.yml', '.html', '.htm']);
 
-export function createWriterFromPath(filePath: string): Promise<OutputWriter> {
+export function createWriterFromPath(
+  filePath: string,
+  options?: WriterOptions,
+): Promise<OutputWriter> {
   const ext = path.extname(filePath).toLowerCase();
   switch (ext) {
     case '.jsonl':
@@ -43,7 +50,7 @@ export function createWriterFromPath(filePath: string): Promise<OutputWriter> {
     case '.json':
       return JsonWriter.open(filePath);
     case '.xml':
-      return JunitWriter.open(filePath);
+      return JunitWriter.open(filePath, { threshold: options?.threshold });
     case '.yaml':
     case '.yml':
       return YamlWriter.open(filePath);
@@ -57,8 +64,11 @@ export function createWriterFromPath(filePath: string): Promise<OutputWriter> {
   }
 }
 
-export async function createMultiWriter(filePaths: readonly string[]): Promise<OutputWriter> {
-  const writers = await Promise.all(filePaths.map((fp) => createWriterFromPath(fp)));
+export async function createMultiWriter(
+  filePaths: readonly string[],
+  options?: WriterOptions,
+): Promise<OutputWriter> {
+  const writers = await Promise.all(filePaths.map((fp) => createWriterFromPath(fp, options)));
   return {
     async append(result: EvaluationResult): Promise<void> {
       await Promise.all(writers.map((w) => w.append(result)));
diff --git a/apps/cli/src/commands/eval/run-eval.ts b/apps/cli/src/commands/eval/run-eval.ts
index 2d486eab2..ac3a84cd9 100644
--- a/apps/cli/src/commands/eval/run-eval.ts
+++ b/apps/cli/src/commands/eval/run-eval.ts
@@ -45,6 +45,7 @@ import {
   calculateEvaluationSummary,
   formatEvaluationSummary,
   formatMatrixSummary,
+  formatThresholdSummary,
 } from './statistics.js';
 import { type TargetSelection, selectMultipleTargets, selectTarget } from './targets.js';
 
@@ -86,6 +87,7 @@ interface NormalizedOptions {
   readonly graderTarget?: string;
   readonly model?: string;
   readonly outputMessages: number | 'all';
+  readonly threshold?: number;
 }
 
 function normalizeBoolean(value: unknown): boolean {
@@ -301,6 +303,7 @@ function normalizeOptions(
     graderTarget: normalizeString(rawOptions.graderTarget),
     model: normalizeString(rawOptions.model),
     outputMessages: normalizeOutputMessages(normalizeString(rawOptions.outputMessages)),
+    threshold: normalizeOptionalNumber(rawOptions.threshold),
   } satisfies NormalizedOptions;
 }
 
@@ -430,6 +433,7 @@ async function prepareFileMetadata(params: {
   readonly yamlCachePath?: string;
   readonly totalBudgetUsd?: number;
   readonly failOnError?: FailOnError;
+  readonly threshold?: number;
 }> {
   const { testFilePath, repoRoot, cwd, options } = params;
 
@@ -515,6 +519,7 @@ async function prepareFileMetadata(params: {
     yamlCachePath: suite.cacheConfig?.cachePath,
     totalBudgetUsd: suite.totalBudgetUsd,
     failOnError: suite.failOnError,
+    threshold: suite.threshold,
   };
 }
 
@@ -749,6 +754,8 @@ export interface RunEvalResult {
   readonly outputPath: string;
   readonly testFiles: readonly string[];
   readonly target?: string;
+  /** True when --threshold is set and mean score is below the threshold */
+  readonly thresholdFailed?: boolean;
 }
 
 export async function runEvalCommand(
@@ -901,12 +908,9 @@ export async function runEvalCommand(
     extraOutputPaths.length > 0 ? [outputPath, ...extraOutputPaths] : [outputPath];
   const uniqueReportedOutputPaths = [...new Set(reportedOutputPaths)];
 
-  let outputWriter: OutputWriter;
   if (uniqueOutputPaths.length === 1) {
-    outputWriter = await createOutputWriter(primaryWritePath, options.format);
     console.log(`Output path: ${outputPath}`);
   } else {
-    outputWriter = await createMultiWriter(uniqueOutputPaths);
     console.log('Output paths:');
     for (const p of uniqueReportedOutputPaths) {
       console.log(`  ${p}`);
@@ -951,6 +955,7 @@ export async function runEvalCommand(
       readonly yamlCachePath?: string;
       readonly totalBudgetUsd?: number;
       readonly failOnError?: FailOnError;
+      readonly threshold?: number;
     }
   >();
   // Separate TypeScript/JS eval files from YAML files.
@@ -1006,6 +1011,24 @@ export async function runEvalCommand(
     console.log(`Response cache: enabled${yamlCachePath ? ` (${yamlCachePath})` : ''}`);
   }
 
+  // Resolve suite-level threshold: CLI --threshold takes precedence over YAML execution.threshold
+  const yamlThreshold = firstMeta?.threshold;
+  const resolvedThreshold = options.threshold ?? yamlThreshold;
+  if (resolvedThreshold !== undefined && (resolvedThreshold < 0 || resolvedThreshold > 1)) {
+    throw new Error('--threshold must be between 0 and 1');
+  }
+
+  // Build the output writer (deferred until after threshold is resolved so JUnit
+  // writer can use the resolved threshold for per-test pass/fail decisions)
+  const writerOptions =
+    resolvedThreshold !== undefined ? { threshold: resolvedThreshold } : undefined;
+  let outputWriter: OutputWriter;
+  if (uniqueOutputPaths.length === 1) {
+    outputWriter = await createOutputWriter(primaryWritePath, options.format);
+  } else {
+    outputWriter = await createMultiWriter(uniqueOutputPaths, writerOptions);
+  }
+
   // Detect matrix mode: multiple targets for any file
   const isMatrixMode = Array.from(fileMetadata.values()).some((meta) => meta.selections.length > 1);
 
@@ -1152,6 +1175,14 @@ export async function runEvalCommand(
     const summary = calculateEvaluationSummary(allResults);
     console.log(formatEvaluationSummary(summary));
 
+    // Threshold quality gate check
+    let thresholdFailed = false;
+    if (resolvedThreshold !== undefined) {
+      const thresholdResult = formatThresholdSummary(summary.mean, resolvedThreshold);
+      console.log(`\n${thresholdResult.message}`);
+      thresholdFailed = !thresholdResult.passed;
+    }
+
     // Print matrix summary when multiple targets were evaluated
     if (isMatrixMode && allResults.length > 0) {
       console.log(formatMatrixSummary(allResults));
@@ -1246,6 +1277,7 @@ export async function runEvalCommand(
       outputPath,
       testFiles: resolvedTestFiles,
       target: options.target,
+      thresholdFailed,
     };
   } finally {
     unsubscribeCodexLogs();
diff --git a/apps/cli/src/commands/eval/statistics.ts b/apps/cli/src/commands/eval/statistics.ts
index e47a65791..910052d24 100644
--- a/apps/cli/src/commands/eval/statistics.ts
+++ b/apps/cli/src/commands/eval/statistics.ts
@@ -334,3 +334,17 @@ export function formatMatrixSummary(results: readonly EvaluationResult[]): strin
 
   return lines.join('\n');
 }
+
+/**
+ * Format a threshold check summary line.
+ * Returns whether the threshold was met and the formatted message.
+ */
+export function formatThresholdSummary(
+  meanScore: number,
+  threshold: number,
+): { passed: boolean; message: string } {
+  const passed = meanScore >= threshold;
+  const verdict = passed ? 'PASS' : 'FAIL';
+  const message = `Suite score: ${meanScore.toFixed(2)} (threshold: ${threshold.toFixed(2)}) — ${verdict}`;
+  return { passed, message };
+}
diff --git a/apps/cli/test/commands/eval/output-writers.test.ts b/apps/cli/test/commands/eval/output-writers.test.ts
index 8c1ea67fb..75ff80da2 100644
--- a/apps/cli/test/commands/eval/output-writers.test.ts
+++ b/apps/cli/test/commands/eval/output-writers.test.ts
@@ -162,6 +162,32 @@ describe('JunitWriter', () => {
       'Cannot write to closed JUnit writer',
     );
   });
+
+  it('uses custom threshold for pass/fail when provided', async () => {
+    const filePath = path.join(testDir, `junit-threshold-${Date.now()}.xml`);
+    const writer = await JunitWriter.open(filePath, { threshold: 0.8 });
+
+    await writer.append(makeResult({ testId: 'high', score: 0.9 }));
+    await writer.append(makeResult({ testId: 'mid', score: 0.6 }));
+    await writer.close();
+
+    const xml = await readFile(filePath, 'utf8');
+    expect(xml).not.toContain('<failure message="score=0.900"');
+    expect(xml).toContain('<failure message="score=0.600"');
+  });
+
+  it('defaults to 0.5 threshold when none provided', async () => {
+    const filePath = path.join(testDir, `junit-default-${Date.now()}.xml`);
+    const writer = await JunitWriter.open(filePath);
+
+    await writer.append(makeResult({ testId: 'pass', score: 0.6 }));
+    await writer.append(makeResult({ testId: 'fail', score: 0.3 }));
+    await writer.close();
+
+    const xml = await readFile(filePath, 'utf8');
+    expect(xml).not.toContain('<failure message="score=0.600"');
+    expect(xml).toContain('<failure message="score=0.300"');
+  });
 });
 
 describe('escapeXml', () => {
diff --git a/apps/cli/test/commands/eval/threshold.test.ts b/apps/cli/test/commands/eval/threshold.test.ts
new file mode 100644
index 000000000..65c059167
--- /dev/null
+++ b/apps/cli/test/commands/eval/threshold.test.ts
@@ -0,0 +1,31 @@
+import { describe, expect, it } from 'bun:test';
+
+import { formatThresholdSummary } from '../../../src/commands/eval/statistics.js';
+
+describe('formatThresholdSummary', () => {
+  it('returns PASS when mean score meets threshold', () => {
+    const result = formatThresholdSummary(0.85, 0.6);
+    expect(result.passed).toBe(true);
+    expect(result.message).toContain('0.85');
+    expect(result.message).toContain('0.60');
+    expect(result.message).toContain('PASS');
+  });
+
+  it('returns FAIL when mean score is below threshold', () => {
+    const result = formatThresholdSummary(0.53, 0.6);
+    expect(result.passed).toBe(false);
+    expect(result.message).toContain('0.53');
+    expect(result.message).toContain('0.60');
+    expect(result.message).toContain('FAIL');
+  });
+
+  it('returns PASS when mean score exactly equals threshold', () => {
+    const result = formatThresholdSummary(0.6, 0.6);
+    expect(result.passed).toBe(true);
+  });
+
+  it('returns PASS for threshold 0 with any score', () => {
+    const result = formatThresholdSummary(0, 0);
+    expect(result.passed).toBe(true);
+  });
+});
diff --git a/apps/web/src/content/docs/evaluation/eval-files.mdx b/apps/web/src/content/docs/evaluation/eval-files.mdx
index 281614053..41c03eb97 100644
--- a/apps/web/src/content/docs/evaluation/eval-files.mdx
+++ b/apps/web/src/content/docs/evaluation/eval-files.mdx
@@ -34,7 +34,7 @@ tests:
 |-------|-------------|
 | `description` | Human-readable description of the evaluation |
 | `dataset` | Optional dataset identifier |
-| `execution` | Default execution config (`target`, `fail_on_error`, etc.) |
+| `execution` | Default execution config (`target`, `fail_on_error`, `threshold`, etc.) |
 | `workspace` | Suite-level workspace config — inline object or string path to an [external workspace file](/guides/workspace-pool/#external-workspace-config) |
 | `tests` | Array of individual tests, or a string path to an external file |
 | `assertions` | Suite-level evaluators appended to each test unless `execution.skip_defaults: true` is set on the test |
diff --git a/apps/web/src/content/docs/evaluation/running-evals.mdx b/apps/web/src/content/docs/evaluation/running-evals.mdx
index 7e221bbc6..5c502aa19 100644
--- a/apps/web/src/content/docs/evaluation/running-evals.mdx
+++ b/apps/web/src/content/docs/evaluation/running-evals.mdx
@@ -229,6 +229,33 @@ execution:
 
 When halted, remaining tests are recorded with `failureReasonCode: 'error_threshold_exceeded'`. With concurrency > 1, a few additional tests may complete before halting takes effect.
 
+### Suite-Level Quality Threshold
+
+Set a minimum mean score for the eval suite. If the mean quality score falls below the threshold, the CLI exits with code 1 — useful for CI/CD quality gates.
+
+**CLI flag:**
+
+```bash
+agentv eval evals/ --threshold 0.8
+```
+
+**YAML config:**
+
+```yaml
+execution:
+  threshold: 0.8
+```
+
+The CLI `--threshold` flag overrides the YAML value. The threshold is a number between 0 and 1. Mean score is computed from quality results only (execution errors are excluded).
+
+When active, a summary line is printed after the eval results:
+
+```
+Suite score: 0.85 (threshold: 0.80) — PASS
+```
+
+The threshold also controls JUnit XML pass/fail: tests with scores below the threshold are marked as `<failure>` in JUnit output. When no threshold is set, JUnit defaults to 0.5.
+
 ## Validate Before Running
 
 Check eval files for schema errors without executing:
diff --git a/packages/core/src/evaluation/loaders/config-loader.ts b/packages/core/src/evaluation/loaders/config-loader.ts
index 4835dcbd2..54505cddc 100644
--- a/packages/core/src/evaluation/loaders/config-loader.ts
+++ b/packages/core/src/evaluation/loaders/config-loader.ts
@@ -333,6 +333,32 @@ export function extractFailOnError(suite: JsonObject): FailOnError | undefined {
   return undefined;
 }
 
+/**
+ * Extract `execution.threshold` from parsed eval suite.
+ * Accepts a number in [0, 1] range.
+ * Returns undefined when not specified.
+ */
+export function extractThreshold(suite: JsonObject): number | undefined {
+  const execution = suite.execution;
+  if (!execution || typeof execution !== 'object' || Array.isArray(execution)) {
+    return undefined;
+  }
+
+  const executionObj = execution as Record<string, unknown>;
+  const raw = executionObj.threshold;
+
+  if (raw === undefined || raw === null) {
+    return undefined;
+  }
+
+  if (typeof raw === 'number' && raw >= 0 && raw <= 1) {
+    return raw;
+  }
+
+  logWarning(`Invalid execution.threshold: ${raw}. Must be a number between 0 and 1. Ignoring.`);
+  return undefined;
+}
+
 export function parseExecutionDefaults(
   raw: unknown,
   configPath: string,
diff --git a/packages/core/src/evaluation/validation/eval-file.schema.ts b/packages/core/src/evaluation/validation/eval-file.schema.ts
index 9ebf758a9..084eb3466 100644
--- a/packages/core/src/evaluation/validation/eval-file.schema.ts
+++ b/packages/core/src/evaluation/validation/eval-file.schema.ts
@@ -328,6 +328,7 @@ const ExecutionSchema = z.object({
   totalBudgetUsd: z.number().min(0).optional(),
   fail_on_error: FailOnErrorSchema.optional(),
   failOnError: FailOnErrorSchema.optional(),
+  threshold: z.number().min(0).max(1).optional(),
 });
 
 // ---------------------------------------------------------------------------
diff --git a/packages/core/src/evaluation/yaml-parser.ts b/packages/core/src/evaluation/yaml-parser.ts
index 119cadb47..e8004bbc9 100644
--- a/packages/core/src/evaluation/yaml-parser.ts
+++ b/packages/core/src/evaluation/yaml-parser.ts
@@ -13,6 +13,7 @@ import {
   extractTargetFromSuite,
   extractTargetsFromSuite,
   extractTargetsFromTestCase,
+  extractThreshold,
   extractTotalBudgetUsd,
   extractTrialsConfig,
   extractWorkersFromSuite,
@@ -59,6 +60,7 @@ export {
   extractTargetFromSuite,
   extractTargetsFromSuite,
   extractTargetsFromTestCase,
+  extractThreshold,
   extractTrialsConfig,
   extractWorkersFromSuite,
   loadConfig,
@@ -180,6 +182,8 @@ export type EvalSuiteResult = {
   readonly totalBudgetUsd?: number;
   /** Execution error tolerance: true or false */
   readonly failOnError?: import('./types.js').FailOnError;
+  /** Suite-level quality threshold (0-1) — suite fails if mean score is below */
+  readonly threshold?: number;
 };
 
 /**
@@ -201,6 +205,7 @@ export async function loadTestSuite(
   const { tests, parsed } = await loadTestsFromYaml(evalFilePath, repoRoot, options);
   const metadata = parseMetadata(parsed);
   const failOnError = extractFailOnError(parsed);
+  const threshold = extractThreshold(parsed);
   return {
     tests,
     trials: extractTrialsConfig(parsed),
@@ -210,6 +215,7 @@ export async function loadTestSuite(
     totalBudgetUsd: extractTotalBudgetUsd(parsed),
     ...(metadata !== undefined && { metadata }),
     ...(failOnError !== undefined && { failOnError }),
+    ...(threshold !== undefined && { threshold }),
   };
 }
 
diff --git a/packages/core/test/evaluation/loaders/config-loader.test.ts b/packages/core/test/evaluation/loaders/config-loader.test.ts
index 27dd52c1e..ac68e0eb9 100644
--- a/packages/core/test/evaluation/loaders/config-loader.test.ts
+++ b/packages/core/test/evaluation/loaders/config-loader.test.ts
@@ -5,6 +5,7 @@ import {
   extractTargetFromSuite,
   extractTargetsFromSuite,
   extractTargetsFromTestCase,
+  extractThreshold,
   extractTotalBudgetUsd,
   extractTrialsConfig,
   parseExecutionDefaults,
@@ -302,6 +303,48 @@ describe('extractFailOnError', () => {
   });
 });
 
+describe('extractThreshold', () => {
+  it('returns undefined when no execution block', () => {
+    const suite: JsonObject = { tests: [] };
+    expect(extractThreshold(suite)).toBeUndefined();
+  });
+
+  it('returns undefined when threshold not set', () => {
+    const suite: JsonObject = { execution: { target: 'default' } };
+    expect(extractThreshold(suite)).toBeUndefined();
+  });
+
+  it('parses valid threshold', () => {
+    const suite: JsonObject = { execution: { threshold: 0.8 } };
+    expect(extractThreshold(suite)).toBe(0.8);
+  });
+
+  it('accepts 0 as threshold', () => {
+    const suite: JsonObject = { execution: { threshold: 0 } };
+    expect(extractThreshold(suite)).toBe(0);
+  });
+
+  it('accepts 1 as threshold', () => {
+    const suite: JsonObject = { execution: { threshold: 1 } };
+    expect(extractThreshold(suite)).toBe(1);
+  });
+
+  it('returns undefined for negative threshold', () => {
+    const suite: JsonObject = { execution: { threshold: -0.1 } };
+    expect(extractThreshold(suite)).toBeUndefined();
+  });
+
+  it('returns undefined for threshold > 1', () => {
+    const suite: JsonObject = { execution: { threshold: 1.5 } };
+    expect(extractThreshold(suite)).toBeUndefined();
+  });
+
+  it('returns undefined for non-number threshold', () => {
+    const suite: JsonObject = { execution: { threshold: 'high' } };
+    expect(extractThreshold(suite)).toBeUndefined();
+  });
+});
+
 describe('parseExecutionDefaults', () => {
   it('returns undefined when no execution block', () => {
     expect(parseExecutionDefaults(undefined, '/test/config.yaml')).toBeUndefined();
diff --git a/plugins/agentv-dev/skills/agentv-eval-writer/SKILL.md b/plugins/agentv-dev/skills/agentv-eval-writer/SKILL.md
index efc818f3c..7a6f2c3f5 100644
--- a/plugins/agentv-dev/skills/agentv-eval-writer/SKILL.md
+++ b/plugins/agentv-dev/skills/agentv-eval-writer/SKILL.md
@@ -520,11 +520,24 @@ execution:
 
 When halted, remaining tests get `executionStatus: 'execution_error'` with `failureReasonCode: 'error_threshold_exceeded'`.
 
+## Suite-Level Quality Threshold
+
+Set a minimum mean score for the eval suite. If the mean quality score falls below the threshold, the CLI exits with code 1 — useful for CI/CD quality gates.
+
+```yaml
+execution:
+  threshold: 0.8
+```
+
+CLI flag `--threshold 0.8` overrides the YAML value. Must be a number between 0 and 1. Mean score is computed from quality results only (execution errors excluded).
+
+The threshold also controls JUnit XML pass/fail: tests with scores below the threshold are marked as `<failure>`. When no threshold is set, JUnit defaults to 0.5.
+
 ## CLI Commands
 
 ```bash
 # Run evaluation (requires API keys)
-agentv eval <file.yaml> [--test-id <id>] [--target <name>] [--dry-run]
+agentv eval <file.yaml> [--test-id <id>] [--target <name>] [--dry-run] [--threshold <0-1>]
 
 # Run with OTLP JSON file (importable by OTel backends)
 agentv eval <file.yaml> --otel-file traces/eval.otlp.json
diff --git a/plugins/agentv-dev/skills/agentv-eval-writer/references/eval-schema.json b/plugins/agentv-dev/skills/agentv-eval-writer/references/eval-schema.json
index a7be8362b..9827ee04c 100644
--- a/plugins/agentv-dev/skills/agentv-eval-writer/references/eval-schema.json
+++ b/plugins/agentv-dev/skills/agentv-eval-writer/references/eval-schema.json
@@ -6136,6 +6136,11 @@
                       },
                       "failOnError": {
                         "type": "boolean"
+                      },
+                      "threshold": {
+                        "type": "number",
+                        "minimum": 0,
+                        "maximum": 1
                       }
                     },
                     "additionalProperties": false
@@ -12442,6 +12447,11 @@
                       },
                       "failOnError": {
                         "type": "boolean"
+                      },
+                      "threshold": {
+                        "type": "number",
+                        "minimum": 0,
+                        "maximum": 1
                       }
                     },
                     "additionalProperties": false
@@ -15700,6 +15710,11 @@
             },
             "failOnError": {
               "type": "boolean"
+            },
+            "threshold": {
+              "type": "number",
+              "minimum": 0,
+              "maximum": 1
             }
           },
           "additionalProperties": false