tighten todo prompt and export exceptions#8
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds two new scoring failure types ("missing_export", "factory_init_failed"); extends schemas/types and tests; updates error classification to return these types; extracts stdout suppression into Changes
Sequence Diagram(s)sequenceDiagram
participant Runner
participant Harness as Harness (Goose/OpenCode)
participant CodePolicy as CodeOutputPolicy
participant Scorer
participant StdSuppress as StdoutSuppressor
participant Classifier
Runner->>Harness: generateCode(prompt)
Harness-->>CodePolicy: rawOutput
CodePolicy-->>Harness: decision {shouldRetry, reason, code}
alt decision.shouldRetry && retryAllowed
Harness->>Harness: re-invoke with retry marker
Harness-->>CodePolicy: rawOutput (retry)
end
CodePolicy-->>Runner: {code, method, shouldRetry=false}
Runner->>Scorer: scoreGeneration(testSlug, codeFilePath?, rawOutput?)
Scorer->>StdSuppress: suppressStdoutAsync(run import/execution)
StdSuppress-->>Scorer: suppression active
Scorer->>Scorer: import/execute candidate module
alt missing exports detected
Scorer->>Classifier: "Missing export ..."
Classifier-->>Scorer: "missing_export"
Scorer->>StdSuppress: restore
Scorer-->>Runner: failure(failureType: "missing_export", totals, details)
else factory init fails
Scorer->>Classifier: "Failed to create instance ..."
Classifier-->>Scorer: "factory_init_failed"
Scorer->>StdSuppress: restore
Scorer-->>Runner: failure(failureType: "factory_init_failed", totals, details)
else imports & factory OK
Scorer->>StdSuppress: restore
Scorer-->>Runner: success(results)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/schemas/common.schema.ts`:
- Around line 65-66: Update the schema version and test expectation to reflect
the additive enum change: bump SCHEMA_VERSION from "0.2.1" to "0.2.2" next to
the modified scoringFailureTypes (which now includes "missing_export" and
"factory_init_failed"), and update the assertion in test/schemas.test.ts that
checks SCHEMA_VERSION so the test expects "0.2.2". Ensure only the
SCHEMA_VERSION value and the test expectation are changed; do not alter the
scoringFailureTypes entries themselves.
In `@src/tests/todo-app/prompt.blind.md`:
- Around line 9-13: Remove the duplicate plain-list entry that names the methods
`listTodos`, `listCompleted`, `listPending`, and `clearCompleted` (the short
list without signatures) so only the detailed signature lines for `listTodos():
Array<{ id: number, text: string, completed: boolean }>`, `listCompleted():
Array<{ id: number, text: string, completed: boolean }>`, `listPending():
Array<{ id: number, text: string, completed: boolean }>`, and `clearCompleted():
void` remain; simply delete the redundant line to avoid repeating those method
names without signatures.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/schemas/common.schema.ts (1)
61-72:⚠️ Potential issue | 🟡 MinorUpdate header export list to include scoring failure exports.
The header’s export list omits the newly exported scoring failure constants/types, so it no longer matches the file’s public surface. Please update it for accuracy.
✍️ Suggested header update
- * Exports: SCHEMA_VERSION, passTypes, PassTypeSchema, PassType, - * itemStatusTypes, ItemStatusSchema, ItemStatus, - * runtimeNames, RuntimeNameSchema, RuntimeName + * Exports: SCHEMA_VERSION, runtimeNames, RuntimeNameSchema, RuntimeName, + * passTypes, PassTypeSchema, PassType, + * itemStatusTypes, ItemStatusSchema, ItemStatus, + * generationFailureTypes, GenerationFailureTypeSchema, GenerationFailureType, + * scoringFailureTypes, ScoringFailureTypeSchema, ScoringFailureType, + * frontierEvalFailureTypes, FrontierEvalFailureTypeSchema, FrontierEvalFailureTypeAs per coding guidelines, "Every file begins with a short header documenting purpose, exports, and invariants".
|
@CodeRabbit full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/lib/stdout-suppressor.ts`:
- Around line 37-48: The return from result.finally(restore) is a Promise when
fn() yields a Promise, but the code casts it incorrectly to T; update the cast
so the expression in the try branch returns a Promise<T> by changing the cast on
result.finally(restore) to as Promise<T>, keeping the surrounding logic (fn,
result, restore) intact so the async overload's return type is correct.
🧹 Nitpick comments (1)
llm/project/project-rules.md (1)
41-41: Document the new*.spec.tssuffix convention.The rename from
scoring.test.tstoscoring.spec.tsintroduces a new file naming pattern that distinguishes specification files from test files. However, the suffix conventions section (lines 70-71) only documents*.test.tsfor vitest tests.Consider adding an entry for
*.spec.tsto clarify its purpose:
*.spec.ts— scoring specifications (define expected exports and test cases for benchmark tests)*.test.ts— vitest tests📝 Suggested addition to naming conventions
Add this to the "Suffix conventions" section around line 71:
- Suffix conventions: + - `*.spec.ts` — scoring specifications (expected exports and test cases for benchmarks) - `*.test.ts` — vitest tests - `*.schema.ts` — Zod schemasAlso applies to: 159-159
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@apps/dashboard/public/results/20260213-075756-7dc316/plan.json`:
- Around line 4557-4559: The summary field "runtimes" is incorrect (shows 2)
while every item in the plan lists "runtime": "ollama", so update the runtimes
summary to the correct count (1) or compute it dynamically from unique runtime
values; specifically change the "runtimes" value to 1 (or derive it from the set
of runtime values) so it matches the items and prevents dashboard
inconsistency—look for the JSON keys "totalItems", "runtimes", "models" and the
per-item "runtime": "ollama" when making the fix.
In `@apps/dashboard/public/results/index.json`:
- Around line 3-6: The durationMs (27786998) does not equal the delta between
startedAt and completedAt for runId "20260213-075756-7dc316"; recompute
durationMs as the millisecond difference between completedAt and startedAt
(completedAt 2026-02-14T01:38:41.264Z minus startedAt 2026-02-13T15:57:56.054Z =
34845210) and update the durationMs field accordingly (or adjust the timestamps
if those are wrong).
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@src/harnesses/code-output-policy.ts`:
- Around line 53-77: The TSDoc for the exported function buildCodeOnlyPrompt is
missing an `@throws` tag; update its docblock to include an `@throws` entry
describing any errors that can propagate (e.g., if getTestSpecificContract
throws or if invalid input is provided) or explicitly state that it does not
throw, so the docblock contains purpose, `@param`, `@returns` and `@throws`; apply the
same pattern to the other exported functions noted (lines 112-178 and 230-251)
using their respective function names (e.g., getTestSpecificContract) to
indicate possible thrown errors or "never throws" as appropriate.
In `@src/harnesses/opencode-adapter.ts`:
- Around line 46-47: Update the TSDoc for the exported function
createOpenCodeAdapter to include a brief purpose description plus explicit
`@returns` and `@throws` tags: state that it returns a Harness instance (type:
Harness) and document any runtime errors that callers should expect (e.g.,
initialization/config errors or exceptions thrown during adapter construction).
Attach these tags to the existing docblock above createOpenCodeAdapter so the
exported function has purpose, returns, and throws documented for consumers.
In `@src/runner/item-executor.ts`:
- Around line 319-416: The current promotion logic in the run that sets
generation = retryAttempt.generation and scoringResult =
retryAttempt.scoringResult loses the original attempt evidence; before replacing
these fields in the promotion branch of runCompileFeedbackRetry (the block
handling shouldPromoteRetry around retryAttempt and variables generation,
scoringResult, previousPassed), capture the original attempt (generation,
scoringResult, duration, and any failureType/error) and append it to a
persistent attempts/retryHistory on the item result (or include an
originalAttempt field on MatrixItemResult) so both the original and promoted
attempts are stored; then perform the promotion (replace
generation/scoringResult) but leave the new history entry referencing both
attempts and any metadata (timestamps, durations, pass counts) so
auditing/metrics can reconstruct the full decision.
In `@test/code-output-policy.test.ts`:
- Around line 1-4: Add an "Invariants" line to the file header in
test/code-output-policy.test.ts so it follows the project header standard;
update the existing comment block (the header with Purpose and Exports) to
include a third line beginning with "Invariants:" and describe the file-level
guarantees (e.g., test-only helpers, no side effects, or the specific invariant
relevant to these unit tests) to make the header include Purpose, Exports, and
Invariants.
🧹 Nitpick comments (1)
src/runner/item-executor.ts (1)
70-179: Consider extracting the compile-retry helpers to keep the file under 500 lines.
With the new retry helpers, this file now ends around Line 538. MovingCOMPILE_RETRY_ERROR_MAX_LENGTH,buildCompileRetryPrompt,CompileRetryContext, andrunCompileFeedbackRetryinto a small helper module (e.g.,src/runner/compile-retry.ts) would keep responsibilities split and meet the size guideline.As per coding guidelines: "Keep files under 500 lines; split aggressively by responsibility".
| /** | ||
| * Builds the strict code-only prompt contract. | ||
| * | ||
| * @param prompt - Base task prompt | ||
| * @param isRetry - Whether this is the retry attempt | ||
| * @returns Prompt text with strict output contract appended | ||
| */ | ||
| export function buildCodeOnlyPrompt(prompt: string, isRetry: boolean): string { | ||
| const retryLine = isRetry | ||
| ? "Previous output was unusable. Retry now and output only final TypeScript source." | ||
| : ""; | ||
| const testSpecificLines = getTestSpecificContract(prompt); | ||
| return [ | ||
| prompt.trim(), | ||
| "", | ||
| "Output contract:", | ||
| "- Return only final TypeScript source code.", | ||
| "- Do not include markdown fences, analysis, status messages, or repository exploration text.", | ||
| "- If uncertain, still return your best complete TypeScript implementation.", | ||
| ...testSpecificLines, | ||
| retryLine, | ||
| ] | ||
| .filter((line) => line.length > 0) | ||
| .join("\n"); | ||
| } |
There was a problem hiding this comment.
Add @throws tags to exported function TSDoc.
The guideline requires purpose/params/returns/throws on exported functions; these blocks omit @throws.
✍️ Suggested doc updates
@@
* `@param` prompt - Base task prompt
* `@param` isRetry - Whether this is the retry attempt
* `@returns` Prompt text with strict output contract appended
+ * `@throws` {Error} Never.
*/
export function buildCodeOnlyPrompt(prompt: string, isRetry: boolean): string {
@@
* `@param` output - Harness text output
* `@param` minOutputLength - Minimum usable length
* `@returns` Decision describing retry behavior and extracted code candidate
+ * `@throws` {Error} Never.
*/
export function evaluateCodeOnlyOutput(
@@
* `@param` prompt - Prompt text
* `@returns` True when the prompt includes the internal retry marker
+ * `@throws` {Error} Never.
*/
export function hasRetryMarker(prompt: string): boolean {
@@
* `@param` prompt - Prompt text
* `@returns` Prompt text with marker appended
+ * `@throws` {Error} Never.
*/
export function appendRetryMarker(prompt: string): string {
@@
* `@param` prompt - Prompt text
* `@returns` Prompt without internal retry marker
+ * `@throws` {Error} Never.
*/
export function stripRetryMarker(prompt: string): string {As per coding guidelines: "All exported functions require TSDoc/JSDoc documentation (purpose, params, returns, throws)".
Also applies to: 112-178, 230-251
🤖 Prompt for AI Agents
In `@src/harnesses/code-output-policy.ts` around lines 53 - 77, The TSDoc for the
exported function buildCodeOnlyPrompt is missing an `@throws` tag; update its
docblock to include an `@throws` entry describing any errors that can propagate
(e.g., if getTestSpecificContract throws or if invalid input is provided) or
explicitly state that it does not throw, so the docblock contains purpose,
`@param`, `@returns` and `@throws`; apply the same pattern to the other exported
functions noted (lines 112-178 and 230-251) using their respective function
names (e.g., getTestSpecificContract) to indicate possible thrown errors or
"never throws" as appropriate.
| /** Creates an OpenCode harness adapter. */ | ||
| export function createOpenCodeAdapter(): Harness { |
There was a problem hiding this comment.
Expand createOpenCodeAdapter TSDoc to include returns/throws.
As an exported function, it should include the full TSDoc tag set.
✍️ Suggested doc update
-/** Creates an OpenCode harness adapter. */
+/**
+ * Creates an OpenCode harness adapter.
+ *
+ * `@returns` Harness instance for OpenCode
+ * `@throws` {Error} Never.
+ */As per coding guidelines: "All exported functions require TSDoc/JSDoc documentation (purpose, params, returns, throws)".
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /** Creates an OpenCode harness adapter. */ | |
| export function createOpenCodeAdapter(): Harness { | |
| /** | |
| * Creates an OpenCode harness adapter. | |
| * | |
| * `@returns` Harness instance for OpenCode | |
| * `@throws` {Error} Never. | |
| */ | |
| export function createOpenCodeAdapter(): Harness { |
🤖 Prompt for AI Agents
In `@src/harnesses/opencode-adapter.ts` around lines 46 - 47, Update the TSDoc for
the exported function createOpenCodeAdapter to include a brief purpose
description plus explicit `@returns` and `@throws` tags: state that it returns a
Harness instance (type: Harness) and document any runtime errors that callers
should expect (e.g., initialization/config errors or exceptions thrown during
adapter construction). Attach these tags to the existing docblock above
createOpenCodeAdapter so the exported function has purpose, returns, and throws
documented for consumers.
| try { | ||
| scoringResult = await scoreGeneration( | ||
| item.test, | ||
| generation.output ?? "", // empty string OK when codeFilePath is set | ||
| undefined, // use default timeout | ||
| generation.codeFilePath, // pass file path from tool-calling harness | ||
| ); | ||
| } catch (scoringError) { | ||
| const scoringErrorMessage = | ||
| scoringError instanceof Error | ||
| ? scoringError.message | ||
| : String(scoringError); | ||
| if ( | ||
| supportsCompileRetry && | ||
| harnessForRetry && | ||
| runtimeForRetry && | ||
| promptForRetry.length > 0 | ||
| ) { | ||
| const retryFromException = await runCompileFeedbackRetry({ | ||
| item, | ||
| harness: harnessForRetry, | ||
| runtime: runtimeForRetry, | ||
| promptForRetry, | ||
| timeoutMs, | ||
| unloadAfter, | ||
| log, | ||
| currentGenerationDurationMs: generation.durationMs, | ||
| compileError: scoringErrorMessage, | ||
| }); | ||
| if (retryFromException) { | ||
| compileRetryUsed = true; | ||
| generation = retryFromException.generation; | ||
| scoringResult = retryFromException.scoringResult; | ||
| } else { | ||
| throw scoringError; | ||
| } | ||
| } else { | ||
| throw scoringError; | ||
| } | ||
| } | ||
|
|
||
| const compileError = | ||
| scoringResult.failureType === "import" || | ||
| scoringResult.failureType === "missing_export" | ||
| ? scoringResult.error | ||
| : undefined; | ||
| if ( | ||
| !compileRetryUsed && | ||
| supportsCompileRetry && | ||
| typeof compileError === "string" && | ||
| harnessForRetry && | ||
| runtimeForRetry && | ||
| promptForRetry.length > 0 | ||
| ) { | ||
| const retryAttempt = await runCompileFeedbackRetry({ | ||
| item, | ||
| harness: harnessForRetry, | ||
| runtime: runtimeForRetry, | ||
| promptForRetry, | ||
| timeoutMs, | ||
| unloadAfter, | ||
| log, | ||
| currentGenerationDurationMs: generation.durationMs, | ||
| compileError, | ||
| }); | ||
| if (retryAttempt) { | ||
| const previousPassed = scoringResult.passed; | ||
| const shouldPromoteRetry = | ||
| retryAttempt.scoringResult.passed > previousPassed || | ||
| (retryAttempt.scoringResult.passed === previousPassed && | ||
| scoringResult.failureType === "import" && | ||
| retryAttempt.scoringResult.failureType !== "import"); | ||
| if (shouldPromoteRetry) { | ||
| generation = retryAttempt.generation; | ||
| scoringResult = retryAttempt.scoringResult; | ||
| log.info( | ||
| { | ||
| harness: item.harness, | ||
| test: item.test, | ||
| passType: item.passType, | ||
| beforePassed: previousPassed, | ||
| afterPassed: retryAttempt.scoringResult.passed, | ||
| }, | ||
| "Compile-feedback retry promoted as best attempt", | ||
| ); | ||
| } else { | ||
| log.warn( | ||
| { | ||
| harness: item.harness, | ||
| test: item.test, | ||
| passType: item.passType, | ||
| beforePassed: previousPassed, | ||
| retryPassed: retryAttempt.scoringResult.passed, | ||
| }, | ||
| "Compile-feedback retry did not improve score; keeping original attempt", | ||
| ); | ||
| } | ||
| } |
There was a problem hiding this comment.
Retry promotion overwrites the original attempt without preserving evidence.
When generation and scoringResult are replaced (Line 392), the initial failure context is lost in MatrixItemResult, which makes runs harder to explain or audit. Please persist both attempts (e.g., store an attempts/retryHistory array or include the original scoring failure alongside the promoted result) and keep the promoted attempt as best.
Based on learnings: "Never implicitly fix up results after a run; capture enough evidence to explain outcomes (test failures, eval reasoning, durations, best-effort metrics)".
🤖 Prompt for AI Agents
In `@src/runner/item-executor.ts` around lines 319 - 416, The current promotion
logic in the run that sets generation = retryAttempt.generation and
scoringResult = retryAttempt.scoringResult loses the original attempt evidence;
before replacing these fields in the promotion branch of runCompileFeedbackRetry
(the block handling shouldPromoteRetry around retryAttempt and variables
generation, scoringResult, previousPassed), capture the original attempt
(generation, scoringResult, duration, and any failureType/error) and append it
to a persistent attempts/retryHistory on the item result (or include an
originalAttempt field on MatrixItemResult) so both the original and promoted
attempts are stored; then perform the promotion (replace
generation/scoringResult) but leave the new history entry referencing both
attempts and any metadata (timestamps, durations, pass counts) so
auditing/metrics can reconstruct the full decision.
| /** | ||
| * Purpose: Unit tests for harness code-output policy helpers. | ||
| * Exports: (none) | ||
| */ |
There was a problem hiding this comment.
Add an invariants line to the file header.
The header is missing invariants, which is required for TS/JS files.
✍️ Suggested header update
/**
* Purpose: Unit tests for harness code-output policy helpers.
* Exports: (none)
+ * Invariants: Tests are deterministic and network-free.
*/As per coding guidelines: "Every file begins with a short header documenting purpose, exports, and invariants".
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /** | |
| * Purpose: Unit tests for harness code-output policy helpers. | |
| * Exports: (none) | |
| */ | |
| /** | |
| * Purpose: Unit tests for harness code-output policy helpers. | |
| * Exports: (none) | |
| * Invariants: Tests are deterministic and network-free. | |
| */ |
🤖 Prompt for AI Agents
In `@test/code-output-policy.test.ts` around lines 1 - 4, Add an "Invariants" line
to the file header in test/code-output-policy.test.ts so it follows the project
header standard; update the existing comment block (the header with Purpose and
Exports) to include a third line beginning with "Invariants:" and describe the
file-level guarantees (e.g., test-only helpers, no side effects, or the specific
invariant relevant to these unit tests) to make the header include Purpose,
Exports, and Invariants.
Summary by CodeRabbit