
tighten todo prompt and export exceptions #8

Merged
AustinKelsay merged 5 commits into main from
refactor/tighten-prompts-and-existing-tests
Feb 15, 2026

Conversation

Owner

@AustinKelsay AustinKelsay commented Feb 12, 2026

Summary by CodeRabbit

  • New Features
    • Two new scoring failure types added ("missing_export", "factory_init_failed"), richer per-case scoring details, and retry-aware generation/scoring for improved robustness.
  • Schema
    • Schema version bumped to 0.2.2 to include the new failure types.
  • Tests
    • Expanded tests to cover new failure categories, output-evaluation logic, and retry behaviors.
  • Documentation
    • Updated READMEs, prompts, and guides clarifying output contracts, CLI commands, and spec conventions.
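
A minimal sketch of how the two new failure types could sit in a typed union, assuming a plain `as const` array rather than the project's actual Zod schema. Only the failure types named in this PR are listed (the real enum may contain more), and `isScoringFailureType` is a hypothetical helper added for illustration.

```typescript
// Assumption: partial list; only the failure types mentioned in this PR.
const scoringFailureTypes = [
  "import",
  "export_validation",
  "missing_export", // new in schema 0.2.2
  "factory_init_failed", // new in schema 0.2.2
] as const;

type ScoringFailureType = (typeof scoringFailureTypes)[number];

// Additive enum change, so only the patch version moves.
const SCHEMA_VERSION = "0.2.2";

// Hypothetical type guard for values read back from result JSON.
function isScoringFailureType(value: unknown): value is ScoringFailureType {
  return (
    typeof value === "string" &&
    (scoringFailureTypes as readonly string[]).indexOf(value) !== -1
  );
}
```

A guard like this lets the dashboard reject result files that carry a failure type the current schema version does not know about.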


vercel Bot commented Feb 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| plebdev-bench-dashboard | Ready | Preview, Comment | Feb 15, 2026 3:15pm |



coderabbitai Bot commented Feb 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Adds two new scoring failure types ("missing_export", "factory_init_failed"); extends schemas/types and tests; updates error classification to return these types; extracts stdout suppression into src/lib/stdout-suppressor.ts; updates scorer to use suppression, aggregate expected totals, and return detailed failure results; introduces code-output policy, harness retry logic, and a compile-feedback retry flow in the runner.
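
The consolidated missing-export path described above might look roughly like the following. This is a sketch under assumptions: the `ExportDetail` and `ScoringResult` shapes and the `checkExports` helper are hypothetical, and only `expectedTotal` and `failureType: "missing_export"` come from the walkthrough.

```typescript
// Hypothetical result shapes; the real scorer types may differ.
interface ExportDetail {
  name: string;
  passed: boolean;
  error?: string;
}

interface ScoringResult {
  passed: number;
  failed: number;
  total: number;
  failureType?: string;
  error?: string;
  details: ExportDetail[];
}

// Returns a failure result when any expected export is absent, else undefined.
function checkExports(
  moduleExports: Record<string, unknown>,
  expectedExports: string[],
  expectedTotal: number,
): ScoringResult | undefined {
  const missing = expectedExports.filter(
    (name) => typeof moduleExports[name] === "undefined",
  );
  if (missing.length === 0) return undefined;
  return {
    passed: 0,
    failed: expectedTotal, // every expected case counts as failed
    total: expectedTotal, // same total as the success path would report
    failureType: "missing_export",
    error: `Missing export(s): ${missing.join(", ")}`,
    details: missing.map((name) => ({
      name,
      passed: false,
      error: `Missing export: ${name}`,
    })),
  };
}
```

Reporting `expectedTotal` on the failure path keeps pass rates comparable between runs that failed early and runs that executed every case.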

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Schema & Types: `apps/dashboard/src/lib/schemas.ts`, `apps/dashboard/src/lib/types.ts`, `src/schemas/common.schema.ts` | Added "missing_export" and "factory_init_failed" to enums/unions and bumped SCHEMA_VERSION to 0.2.2. |
| Failure Classification & Tests: `src/lib/failure-classifier.ts`, `test/failure-classifier.test.ts` | Mapped new error phrases to "missing_export" and "factory_init_failed" (kept export_validation); updated tests to assert new classifications and ordering. |
| Stdout Suppression Utility: `src/lib/stdout-suppressor.ts`, `src/lib/scorer.ts` | New suppressor module exported; scorer now imports suppressStdout/suppressStdoutAsync and removed in-file suppression implementations. |
| Scorer Logic: `src/lib/scorer.ts` | Introduced expectedTotal for unified totals; added consolidated missing-export path returning per-export details with failureType: "missing_export"; added factory_init_failed branch and harmonized totals across error branches. |
| Code Output Policy & Harnesses: `src/harnesses/code-output-policy.ts`, `src/harnesses/goose-adapter.ts`, `src/harnesses/opencode-adapter.ts` | New centralized code-only prompt/evaluation utilities; adapters now use decision-based evaluation, support a single retry with retry-marker, and handle off-task/suspicious outputs; Opencode adapter adds ping() and adapter retry-aware prompt handling. |
| Runner Compile-Feedback Retry: `src/runner/item-executor.ts` | Adds compile-feedback retry flow: builds retry prompts with compiler/import errors, re-runs generation/scoring when appropriate, and promotes better retry results. |
| Tests for New Policy: `test/code-output-policy.test.ts`, `test/schemas.test.ts` | Added tests for code-output policy behavior and updated schema-version expectation. |
| Docs, Prompts & Bench Tests: `README.md`, `llm/*`, `src/tests/*` | Numerous prompt/readme updates: added "Output contract" sections, renamed some test artifacts to scoring.spec.ts, refined rubrics and instructions across many tests. |
| Dashboard Results & Plan: `apps/dashboard/public/results/*.json` | Updated plan/result metadata to schemaVersion: 0.2.2, adjusted timeouts/runtimes, and replaced/removed run entries. |
| Misc (new modules & test artifacts): `src/lib/stdout-suppressor.ts`, `src/harnesses/*`, `test/*` | Added new utility module and policy harness, expanded adapter logic and tests; various files added/modified to support retry and evaluation flows. |
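
To illustrate why the summary mentions ordering for the failure classifier, here is a small sketch of a phrase-to-type mapping. It is an assumption, not the project's `failure-classifier.ts`: the regex patterns, the `"unknown"` fallback, and the function name `classifyScoringFailure` are hypothetical, while the two trigger phrases follow the sequence diagram in this PR.

```typescript
type ClassifiedFailure =
  | "missing_export"
  | "factory_init_failed"
  | "export_validation"
  | "unknown"; // hypothetical fallback

// Order matters: specific phrases must be checked before the broad "export" rule.
const rules: Array<{ pattern: RegExp; type: ClassifiedFailure }> = [
  { pattern: /missing export/i, type: "missing_export" },
  { pattern: /failed to create instance/i, type: "factory_init_failed" },
  { pattern: /export/i, type: "export_validation" },
];

function classifyScoringFailure(message: string): ClassifiedFailure {
  for (const rule of rules) {
    if (rule.pattern.test(message)) return rule.type;
  }
  return "unknown";
}
```

If the broad `export` rule came first, a "Missing export ..." message would be reported as `export_validation`, which is the kind of regression an ordering test guards against.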

Sequence Diagram(s)

sequenceDiagram
    participant Runner
    participant Harness as Harness (Goose/OpenCode)
    participant CodePolicy as CodeOutputPolicy
    participant Scorer
    participant StdSuppress as StdoutSuppressor
    participant Classifier

    Runner->>Harness: generateCode(prompt)
    Harness-->>CodePolicy: rawOutput
    CodePolicy-->>Harness: decision {shouldRetry, reason, code}
    alt decision.shouldRetry && retryAllowed
        Harness->>Harness: re-invoke with retry marker
        Harness-->>CodePolicy: rawOutput (retry)
    end
    CodePolicy-->>Runner: {code, method, shouldRetry=false}
    Runner->>Scorer: scoreGeneration(testSlug, codeFilePath?, rawOutput?)
    Scorer->>StdSuppress: suppressStdoutAsync(run import/execution)
    StdSuppress-->>Scorer: suppression active
    Scorer->>Scorer: import/execute candidate module
    alt missing exports detected
        Scorer->>Classifier: "Missing export ..."
        Classifier-->>Scorer: "missing_export"
        Scorer->>StdSuppress: restore
        Scorer-->>Runner: failure(failureType: "missing_export", totals, details)
    else factory init fails
        Scorer->>Classifier: "Failed to create instance ..."
        Classifier-->>Scorer: "factory_init_failed"
        Scorer->>StdSuppress: restore
        Scorer-->>Runner: failure(failureType: "factory_init_failed", totals, details)
    else imports & factory OK
        Scorer->>StdSuppress: restore
        Scorer-->>Runner: success(results)
    end
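
The harness half of the diagram (evaluate the raw output, retry at most once) can be sketched as below. The `OutputDecision` fields mirror the `{shouldRetry, reason, code}` decision in the diagram; the `Generate` and `Evaluate` callback types and the function `generateWithSingleRetry` are hypothetical stand-ins for the adapter internals.

```typescript
interface OutputDecision {
  shouldRetry: boolean;
  reason: string;
  code: string;
}

// Hypothetical callbacks standing in for the harness call and the policy check.
type Generate = (prompt: string, isRetry: boolean) => Promise<string>;
type Evaluate = (rawOutput: string) => OutputDecision;

async function generateWithSingleRetry(
  prompt: string,
  generate: Generate,
  evaluate: Evaluate,
): Promise<OutputDecision & { attempts: number }> {
  const first = evaluate(await generate(prompt, false));
  if (!first.shouldRetry) return { ...first, attempts: 1 };

  // Exactly one retry; its decision is final even if it is still unusable.
  const second = evaluate(await generate(prompt, true));
  return { ...second, shouldRetry: false, attempts: 2 };
}
```

Capping the loop at one retry keeps run time bounded and makes the attempt count easy to report alongside the result.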

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I hopped through code with tiny paws,
I named two faults with tidy claws,
I hushed the logs while imports tried,
Gave missing exports a place to hide,
Then nudged retries — the tests looked pleased.

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'tighten todo prompt and export exceptions' is directly related to the main changes, which involve tightening prompt specifications and handling two new export-related failure types (missing_export and factory_init_failed). |
| Docstring Coverage | ✅ Passed | Docstring coverage is 94.44%, which is sufficient. The required threshold is 80.00%. |
| Merge Conflict Detection | ✅ Passed | ✅ No merge conflicts detected when merging into main |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/schemas/common.schema.ts`:
- Around line 65-66: Update the schema version and test expectation to reflect
the additive enum change: bump SCHEMA_VERSION from "0.2.1" to "0.2.2" next to
the modified scoringFailureTypes (which now includes "missing_export" and
"factory_init_failed"), and update the assertion in test/schemas.test.ts that
checks SCHEMA_VERSION so the test expects "0.2.2". Ensure only the
SCHEMA_VERSION value and the test expectation are changed; do not alter the
scoringFailureTypes entries themselves.

In `@src/tests/todo-app/prompt.blind.md`:
- Around line 9-13: Remove the duplicate plain-list entry that names the methods
`listTodos`, `listCompleted`, `listPending`, and `clearCompleted` (the short
list without signatures) so only the detailed signature lines for `listTodos():
Array<{ id: number, text: string, completed: boolean }>`, `listCompleted():
Array<{ id: number, text: string, completed: boolean }>`, `listPending():
Array<{ id: number, text: string, completed: boolean }>`, and `clearCompleted():
void` remain; simply delete the redundant line to avoid repeating those method
names without signatures.

Comment thread src/schemas/common.schema.ts
Comment thread src/tests/todo-app/prompt.blind.md Outdated

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/schemas/common.schema.ts (1)

61-72: ⚠️ Potential issue | 🟡 Minor

Update header export list to include scoring failure exports.

The header’s export list omits the newly exported scoring failure constants/types, so it no longer matches the file’s public surface. Please update it for accuracy.

✍️ Suggested header update
- * Exports: SCHEMA_VERSION, passTypes, PassTypeSchema, PassType,
- *          itemStatusTypes, ItemStatusSchema, ItemStatus,
- *          runtimeNames, RuntimeNameSchema, RuntimeName
+ * Exports: SCHEMA_VERSION, runtimeNames, RuntimeNameSchema, RuntimeName,
+ *          passTypes, PassTypeSchema, PassType,
+ *          itemStatusTypes, ItemStatusSchema, ItemStatus,
+ *          generationFailureTypes, GenerationFailureTypeSchema, GenerationFailureType,
+ *          scoringFailureTypes, ScoringFailureTypeSchema, ScoringFailureType,
+ *          frontierEvalFailureTypes, FrontierEvalFailureTypeSchema, FrontierEvalFailureType

As per coding guidelines, "Every file begins with a short header documenting purpose, exports, and invariants".

Owner Author

@CodeRabbit full review


coderabbitai Bot commented Feb 12, 2026

✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/lib/stdout-suppressor.ts`:
- Around line 37-48: The return from result.finally(restore) is a Promise when
fn() yields a Promise, but the code casts it incorrectly to T; update the cast
so the expression in the try branch returns a Promise<T> by changing the cast on
result.finally(restore) to as Promise<T>, keeping the surrounding logic (fn,
result, restore) intact so the async overload's return type is correct.
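
A sketch of the pattern under discussion, assuming a simplified suppressor that silences `console.log` instead of patching the process stdout stream. The names `suppressStdout` and `suppressStdoutAsync` come from this PR; `muteConsoleLog` and the function bodies are hypothetical.

```typescript
// Hypothetical stand-in for the real suppression: mute console.log and
// hand back a function that restores it.
function muteConsoleLog(): () => void {
  const original = console.log;
  console.log = () => {};
  return () => {
    console.log = original;
  };
}

function suppressStdout<T>(fn: () => T): T {
  const restore = muteConsoleLog();
  try {
    return fn();
  } finally {
    restore();
  }
}

function suppressStdoutAsync<T>(fn: () => Promise<T>): Promise<T> {
  const restore = muteConsoleLog();
  try {
    // finally() yields a Promise, so the cast target is Promise<T>, not T.
    return fn().finally(restore) as Promise<T>;
  } catch (error) {
    // fn threw synchronously before returning a promise.
    restore();
    throw error;
  }
}
```

With this narrower async signature TypeScript already infers `Promise<T>`, so the cast is shown only to mirror the review's wording; a combined sync/async implementation would need it.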
🧹 Nitpick comments (1)
llm/project/project-rules.md (1)

41-41: Document the new *.spec.ts suffix convention.

The rename from scoring.test.ts to scoring.spec.ts introduces a new file naming pattern that distinguishes specification files from test files. However, the suffix conventions section (lines 70-71) only documents *.test.ts for vitest tests.

Consider adding an entry for *.spec.ts to clarify its purpose:

  • *.spec.ts — scoring specifications (define expected exports and test cases for benchmark tests)
  • *.test.ts — vitest tests
📝 Suggested addition to naming conventions

Add this to the "Suffix conventions" section around line 71:

 - Suffix conventions:
+  - `*.spec.ts` — scoring specifications (expected exports and test cases for benchmarks)
   - `*.test.ts` — vitest tests
   - `*.schema.ts` — Zod schemas

Also applies to: 159-159

Comment thread src/lib/stdout-suppressor.ts

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@apps/dashboard/public/results/20260213-075756-7dc316/plan.json`:
- Around line 4557-4559: The summary field "runtimes" is incorrect (shows 2)
while every item in the plan lists "runtime": "ollama", so update the runtimes
summary to the correct count (1) or compute it dynamically from unique runtime
values; specifically change the "runtimes" value to 1 (or derive it from the set
of runtime values) so it matches the items and prevents dashboard
inconsistency—look for the JSON keys "totalItems", "runtimes", "models" and the
per-item "runtime": "ollama" when making the fix.
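
One way to derive the summary counts rather than hard-coding them is sketched here. The `runtime` field and the `totalItems`, `runtimes`, and `models` keys come from the comment above; the `model` field on each item and the `summarizePlan` helper are assumptions.

```typescript
// Assumed item shape: only "runtime" is confirmed by the review comment.
interface PlanItem {
  runtime: string;
  model: string;
}

interface PlanSummary {
  totalItems: number;
  runtimes: number;
  models: number;
}

// Counts are computed from the items so they cannot drift from the data.
function summarizePlan(items: PlanItem[]): PlanSummary {
  return {
    totalItems: items.length,
    runtimes: new Set(items.map((item) => item.runtime)).size,
    models: new Set(items.map((item) => item.model)).size,
  };
}
```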

In `@apps/dashboard/public/results/index.json`:
- Around line 3-6: The durationMs (27786998) does not equal the delta between
startedAt and completedAt for runId "20260213-075756-7dc316"; recompute
durationMs as the millisecond difference between completedAt and startedAt
(completedAt 2026-02-14T01:38:41.264Z minus startedAt 2026-02-13T15:57:56.054Z =
34845210) and update the durationMs field accordingly (or adjust the timestamps
if those are wrong).
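
The arithmetic in the second item can be checked with a few lines. `computeDurationMs` is a hypothetical helper; the timestamps are the ones quoted in the comment.

```typescript
// Returns the elapsed milliseconds between two ISO 8601 timestamps.
function computeDurationMs(startedAt: string, completedAt: string): number {
  const start = Date.parse(startedAt);
  const end = Date.parse(completedAt);
  if (Number.isNaN(start) || Number.isNaN(end)) {
    throw new Error("Invalid ISO timestamp");
  }
  return end - start;
}

// 9 h 40 min 45.210 s between the two timestamps.
const durationMs = computeDurationMs(
  "2026-02-13T15:57:56.054Z",
  "2026-02-14T01:38:41.264Z",
);
console.log(durationMs); // 34845210
```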

Comment thread apps/dashboard/public/results/20260213-075756-7dc316/plan.json
Comment thread apps/dashboard/public/results/index.json
@AustinKelsay AustinKelsay merged commit 267a50c into main Feb 15, 2026
2 of 3 checks passed

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@src/harnesses/code-output-policy.ts`:
- Around line 53-77: The TSDoc for the exported function buildCodeOnlyPrompt is
missing an `@throws` tag; update its docblock to include an `@throws` entry
describing any errors that can propagate (e.g., if getTestSpecificContract
throws or if invalid input is provided) or explicitly state that it does not
throw, so the docblock contains purpose, `@param`, `@returns` and `@throws`; apply the
same pattern to the other exported functions noted (lines 112-178 and 230-251)
using their respective function names (e.g., getTestSpecificContract) to
indicate possible thrown errors or "never throws" as appropriate.

In `@src/harnesses/opencode-adapter.ts`:
- Around line 46-47: Update the TSDoc for the exported function
createOpenCodeAdapter to include a brief purpose description plus explicit
`@returns` and `@throws` tags: state that it returns a Harness instance (type:
Harness) and document any runtime errors that callers should expect (e.g.,
initialization/config errors or exceptions thrown during adapter construction).
Attach these tags to the existing docblock above createOpenCodeAdapter so the
exported function has purpose, returns, and throws documented for consumers.

In `@src/runner/item-executor.ts`:
- Around line 319-416: The current promotion logic in the run that sets
generation = retryAttempt.generation and scoringResult =
retryAttempt.scoringResult loses the original attempt evidence; before replacing
these fields in the promotion branch of runCompileFeedbackRetry (the block
handling shouldPromoteRetry around retryAttempt and variables generation,
scoringResult, previousPassed), capture the original attempt (generation,
scoringResult, duration, and any failureType/error) and append it to a
persistent attempts/retryHistory on the item result (or include an
originalAttempt field on MatrixItemResult) so both the original and promoted
attempts are stored; then perform the promotion (replace
generation/scoringResult) but leave the new history entry referencing both
attempts and any metadata (timestamps, durations, pass counts) so
auditing/metrics can reconstruct the full decision.

In `@test/code-output-policy.test.ts`:
- Around line 1-4: Add an "Invariants" line to the file header in
test/code-output-policy.test.ts so it follows the project header standard;
update the existing comment block (the header with Purpose and Exports) to
include a third line beginning with "Invariants:" and describe the file-level
guarantees (e.g., test-only helpers, no side effects, or the specific invariant
relevant to these unit tests) to make the header include Purpose, Exports, and
Invariants.
🧹 Nitpick comments (1)
src/runner/item-executor.ts (1)

70-179: Consider extracting the compile-retry helpers to keep the file under 500 lines.
With the new retry helpers, this file now ends around Line 538. Moving COMPILE_RETRY_ERROR_MAX_LENGTH, buildCompileRetryPrompt, CompileRetryContext, and runCompileFeedbackRetry into a small helper module (e.g., src/runner/compile-retry.ts) would keep responsibilities split and meet the size guideline.

As per coding guidelines: "Keep files under 500 lines; split aggressively by responsibility".
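
As a rough idea of what an extracted `src/runner/compile-retry.ts` could hold, here is a sketch of the prompt builder. The names `COMPILE_RETRY_ERROR_MAX_LENGTH` and `buildCompileRetryPrompt` appear in the comment above, but the signature, the limit of 2000 characters, and the prompt wording are assumptions made for illustration.

```typescript
// Assumed limit; the real constant's value is not shown in this PR page.
const COMPILE_RETRY_ERROR_MAX_LENGTH = 2000;

// Builds a retry prompt that feeds the compiler/import error back to the model.
function buildCompileRetryPrompt(basePrompt: string, compileError: string): string {
  const trimmed = compileError.trim();
  const clipped =
    trimmed.length > COMPILE_RETRY_ERROR_MAX_LENGTH
      ? `${trimmed.slice(0, COMPILE_RETRY_ERROR_MAX_LENGTH)}\n[error truncated]`
      : trimmed;
  return [
    basePrompt.trim(),
    "",
    "The previous attempt failed to compile or import with this error:",
    clipped,
    "",
    "Return a corrected, complete TypeScript implementation.",
  ].join("\n");
}
```

Truncating the error keeps a runaway stack trace from crowding the task description out of the model's context.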

Comment on lines +53 to +77
/**
 * Builds the strict code-only prompt contract.
 *
 * @param prompt - Base task prompt
 * @param isRetry - Whether this is the retry attempt
 * @returns Prompt text with strict output contract appended
 */
export function buildCodeOnlyPrompt(prompt: string, isRetry: boolean): string {
  const retryLine = isRetry
    ? "Previous output was unusable. Retry now and output only final TypeScript source."
    : "";
  const testSpecificLines = getTestSpecificContract(prompt);
  return [
    prompt.trim(),
    "",
    "Output contract:",
    "- Return only final TypeScript source code.",
    "- Do not include markdown fences, analysis, status messages, or repository exploration text.",
    "- If uncertain, still return your best complete TypeScript implementation.",
    ...testSpecificLines,
    retryLine,
  ]
    .filter((line) => line.length > 0)
    .join("\n");
}

⚠️ Potential issue | 🟡 Minor

Add @throws tags to exported function TSDoc.
The guideline requires purpose/params/returns/throws on exported functions; these blocks omit @throws.

✍️ Suggested doc updates
@@
  * `@param` prompt - Base task prompt
  * `@param` isRetry - Whether this is the retry attempt
  * `@returns` Prompt text with strict output contract appended
+ * `@throws` {Error} Never.
  */
 export function buildCodeOnlyPrompt(prompt: string, isRetry: boolean): string {
@@
  * `@param` output - Harness text output
  * `@param` minOutputLength - Minimum usable length
  * `@returns` Decision describing retry behavior and extracted code candidate
+ * `@throws` {Error} Never.
  */
 export function evaluateCodeOnlyOutput(
@@
  * `@param` prompt - Prompt text
  * `@returns` True when the prompt includes the internal retry marker
+ * `@throws` {Error} Never.
  */
 export function hasRetryMarker(prompt: string): boolean {
@@
  * `@param` prompt - Prompt text
  * `@returns` Prompt text with marker appended
+ * `@throws` {Error} Never.
  */
 export function appendRetryMarker(prompt: string): string {
@@
  * `@param` prompt - Prompt text
  * `@returns` Prompt without internal retry marker
+ * `@throws` {Error} Never.
  */
 export function stripRetryMarker(prompt: string): string {

As per coding guidelines: "All exported functions require TSDoc/JSDoc documentation (purpose, params, returns, throws)".

Also applies to: 112-178, 230-251
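
For readers unfamiliar with the marker helpers named in the suggestion, a minimal sketch follows. The three function names and their `(prompt: string)` signatures come from the diff above; the marker string `[[retry]]` and the bodies are hypothetical, since the real marker is not shown here.

```typescript
// Hypothetical marker value; the real internal marker is not shown in this PR page.
const RETRY_MARKER = "[[retry]]";

function hasRetryMarker(prompt: string): boolean {
  return prompt.indexOf(RETRY_MARKER) !== -1;
}

// Assumption: appending is idempotent so a prompt never carries two markers.
function appendRetryMarker(prompt: string): string {
  return hasRetryMarker(prompt) ? prompt : `${prompt}\n${RETRY_MARKER}`;
}

// Removes the marker and any trailing whitespace it left behind.
function stripRetryMarker(prompt: string): string {
  return prompt.split(RETRY_MARKER).join("").replace(/\s+$/, "");
}
```

Helpers like these operate on plain strings and have no failure path, which is consistent with the suggested "Never" wording for `@throws`.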


Comment on lines +46 to 47
/** Creates an OpenCode harness adapter. */
export function createOpenCodeAdapter(): Harness {

⚠️ Potential issue | 🟡 Minor

Expand createOpenCodeAdapter TSDoc to include returns/throws.
As an exported function, it should include the full TSDoc tag set.

✍️ Suggested doc update
-/** Creates an OpenCode harness adapter. */
+/**
+ * Creates an OpenCode harness adapter.
+ *
+ * `@returns` Harness instance for OpenCode
+ * `@throws` {Error} Never.
+ */

As per coding guidelines: "All exported functions require TSDoc/JSDoc documentation (purpose, params, returns, throws)".


Comment on lines +319 to +416
try {
  scoringResult = await scoreGeneration(
    item.test,
    generation.output ?? "", // empty string OK when codeFilePath is set
    undefined, // use default timeout
    generation.codeFilePath, // pass file path from tool-calling harness
  );
} catch (scoringError) {
  const scoringErrorMessage =
    scoringError instanceof Error
      ? scoringError.message
      : String(scoringError);
  if (
    supportsCompileRetry &&
    harnessForRetry &&
    runtimeForRetry &&
    promptForRetry.length > 0
  ) {
    const retryFromException = await runCompileFeedbackRetry({
      item,
      harness: harnessForRetry,
      runtime: runtimeForRetry,
      promptForRetry,
      timeoutMs,
      unloadAfter,
      log,
      currentGenerationDurationMs: generation.durationMs,
      compileError: scoringErrorMessage,
    });
    if (retryFromException) {
      compileRetryUsed = true;
      generation = retryFromException.generation;
      scoringResult = retryFromException.scoringResult;
    } else {
      throw scoringError;
    }
  } else {
    throw scoringError;
  }
}

const compileError =
  scoringResult.failureType === "import" ||
  scoringResult.failureType === "missing_export"
    ? scoringResult.error
    : undefined;
if (
  !compileRetryUsed &&
  supportsCompileRetry &&
  typeof compileError === "string" &&
  harnessForRetry &&
  runtimeForRetry &&
  promptForRetry.length > 0
) {
  const retryAttempt = await runCompileFeedbackRetry({
    item,
    harness: harnessForRetry,
    runtime: runtimeForRetry,
    promptForRetry,
    timeoutMs,
    unloadAfter,
    log,
    currentGenerationDurationMs: generation.durationMs,
    compileError,
  });
  if (retryAttempt) {
    const previousPassed = scoringResult.passed;
    const shouldPromoteRetry =
      retryAttempt.scoringResult.passed > previousPassed ||
      (retryAttempt.scoringResult.passed === previousPassed &&
        scoringResult.failureType === "import" &&
        retryAttempt.scoringResult.failureType !== "import");
    if (shouldPromoteRetry) {
      generation = retryAttempt.generation;
      scoringResult = retryAttempt.scoringResult;
      log.info(
        {
          harness: item.harness,
          test: item.test,
          passType: item.passType,
          beforePassed: previousPassed,
          afterPassed: retryAttempt.scoringResult.passed,
        },
        "Compile-feedback retry promoted as best attempt",
      );
    } else {
      log.warn(
        {
          harness: item.harness,
          test: item.test,
          passType: item.passType,
          beforePassed: previousPassed,
          retryPassed: retryAttempt.scoringResult.passed,
        },
        "Compile-feedback retry did not improve score; keeping original attempt",
      );
    }
  }

⚠️ Potential issue | 🟠 Major

Retry promotion overwrites the original attempt without preserving evidence.
When generation and scoringResult are replaced (Line 392), the initial failure context is lost in MatrixItemResult, which makes runs harder to explain or audit. Please persist both attempts (e.g., store an attempts/retryHistory array or include the original scoring failure alongside the promoted result) and keep the promoted attempt as best.

Based on learnings: "Never implicitly fix up results after a run; capture enough evidence to explain outcomes (test failures, eval reasoning, durations, best-effort metrics)".
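
A sketch of how both attempts could be kept while still promoting the better one. The promotion predicate restates the `shouldPromoteRetry` condition from the quoted code; the `AttemptRecord` shape, the `attempts` array, and `selectBestAttempt` are hypothetical and are not fields of the project's `MatrixItemResult`.

```typescript
interface ScoreSummary {
  passed: number;
  failureType?: string;
}

// Hypothetical history entry; one record per attempt, promoted or not.
interface AttemptRecord {
  label: "original" | "compile_retry";
  score: ScoreSummary;
  durationMs: number;
  promoted: boolean;
}

// Same rule as the quoted code: more passes wins; on a tie, a retry that
// got past the import failure wins.
function shouldPromoteRetry(original: ScoreSummary, retry: ScoreSummary): boolean {
  return (
    retry.passed > original.passed ||
    (retry.passed === original.passed &&
      original.failureType === "import" &&
      retry.failureType !== "import")
  );
}

function selectBestAttempt(
  original: { score: ScoreSummary; durationMs: number },
  retry: { score: ScoreSummary; durationMs: number },
): { best: ScoreSummary; attempts: AttemptRecord[] } {
  const promote = shouldPromoteRetry(original.score, retry.score);
  return {
    best: promote ? retry.score : original.score,
    attempts: [
      { label: "original", score: original.score, durationMs: original.durationMs, promoted: !promote },
      { label: "compile_retry", score: retry.score, durationMs: retry.durationMs, promoted: promote },
    ],
  };
}
```

Because the history is built before anything is overwritten, a later audit can see why the retry was or was not promoted.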


Comment on lines +1 to +4
/**
 * Purpose: Unit tests for harness code-output policy helpers.
 * Exports: (none)
 */

⚠️ Potential issue | 🟡 Minor

Add an invariants line to the file header.
The header is missing invariants, which is required for TS/JS files.

✍️ Suggested header update
 /**
  * Purpose: Unit tests for harness code-output policy helpers.
  * Exports: (none)
+ * Invariants: Tests are deterministic and network-free.
  */

As per coding guidelines: "Every file begins with a short header documenting purpose, exports, and invariants".

