feat: hardened AI review — strict JSON, slop filter, quote-or-die (#24)
…-escape verdict

## Why

The deployed AI review on PR #23 was textbook small-model slop: duplicated content, generic platitudes, hallucinated risks (it claimed "tokens" in a config-flip PR), and zero file:line anchors. Root cause is the freeform 5-section prompt (`src/ai-review.js:336`) — it gave the 3B model permission to fill each section with whatever sounded reviewer-ish. Per the user's request: every escape hatch the model uses to avoid producing concrete output must be made mechanically invalid, not just discouraged.

## What

New module `src/ai-review-prompt.js` containing four locked-down primitives:

| Primitive | Job |
|---|---|
| `STRICT_SYSTEM_PROMPT` | Forces JSON output, bans hedging words and known slop phrases, forbids refusal. Includes a one-shot example to anchor the format. |
| `tryParseReview()` | Strict shape enforcement: enum verdict, array findings, all fields typed. On parse failure, returns `{ok: false}` — no retry. |
| `filterFindings()` | Quote-or-die: drops any finding whose `quoted_line` isn't verbatim in the diff. Slop filter: drops any claim containing hedging words or known filler phrases. |
| `computeVerdict()` | Deterministic from the filtered findings (`approve` if empty, `comment` if any). The model's verdict is advisory only — the postprocessor decides. |

`renderReviewMarkdown()` returns `null` when the verdict is `approve` and the findings list is empty — the bot then does NOT post. Silence > slop.

## Behaviour change

- Default `ai_review.system_prompt` flipped from the freeform 5-section string to `STRICT_SYSTEM_PROMPT`. Users with `system_prompt:` set explicitly in `config.local.yml` continue on the legacy freeform path (back-compat for existing deployments).
- New review records carry `assessment` and a `findings` count → the dashboard's `getReviewStats()` finally has real data instead of `unknown`.
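The parse → filter → verdict pipeline could be sketched roughly as follows. This is a minimal illustration, not the actual `src/ai-review-prompt.js`: the finding fields `file`, `line`, and `claim`, and the specific hedging/filler word lists, are assumptions; only the function names, `quoted_line`, `{ok: false}`, and the verdict enum come from the PR description.

```javascript
const VERDICTS = ["approve", "comment"];
// Illustrative word lists; the real banned lists live in the system prompt module.
const HEDGING = ["might", "could potentially", "consider", "perhaps"];
const FILLER = ["overall, the code is well-structured", "great job"];

// tryParseReview(): strict shape enforcement, no retry on failure.
function tryParseReview(text) {
  let data;
  try {
    data = JSON.parse(text);
  } catch {
    return { ok: false };
  }
  const shapeOk =
    data !== null &&
    typeof data === "object" &&
    VERDICTS.includes(data.verdict) &&
    Array.isArray(data.findings) &&
    data.findings.every(
      (f) =>
        typeof f.file === "string" &&
        Number.isInteger(f.line) &&
        typeof f.quoted_line === "string" &&
        typeof f.claim === "string"
    );
  return shapeOk ? { ok: true, review: data } : { ok: false };
}

// filterFindings(): quote-or-die plus slop filter.
function filterFindings(findings, diff) {
  // Index the diff's content lines, stripping the +/-/space prefix.
  const diffLines = new Set(
    diff.split("\n").map((l) => l.replace(/^[+\- ]/, "").trim())
  );
  return findings.filter((f) => {
    if (!diffLines.has(f.quoted_line.trim())) return false; // not verbatim in diff
    const claim = f.claim.toLowerCase();
    if (HEDGING.some((w) => claim.includes(w))) return false;
    if (FILLER.some((w) => claim.includes(w))) return false;
    return true;
  });
}

// computeVerdict(): deterministic; the model's own verdict is advisory only.
function computeVerdict(filtered) {
  return filtered.length === 0 ? "approve" : "comment";
}

// renderReviewMarkdown(): null means "do not post", silence over slop.
function renderReviewMarkdown(verdict, findings) {
  if (verdict === "approve" && findings.length === 0) return null;
  const lines = [`**Verdict:** ${verdict}`, ""];
  for (const f of findings) lines.push(`- \`${f.file}:${f.line}\`: ${f.claim}`);
  return lines.join("\n");
}
```

The key design point survives even in this sketch: the model never gets the last word. Its JSON either passes the shape check and the verbatim-quote check, or its output contributes nothing.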
## Test plan

- [x] All 737 tests pass (was 698 — added 39 covering the parser, slop filter, verdict computation, and render-skip behavior)
- [x] eslint clean
- [ ] After merge + self-update: the AI review on the next PR should produce one of (a) a tight comment with verdict + summary + file:line-anchored findings, or (b) no comment at all (when there is nothing concrete to flag).
- [ ] Verify on netcup logs that the JSON parse rate is high — if Ollama keeps producing prose around the JSON, we may need to tighten the `temperature` config (currently 0.3) or add an assistant-prefix pre-fill.

## Risk & rollout

- Risk: low. The worst case (the model produces unparseable output every time) is silence — no comments posted. Better than the current state, where every comment is slop.
- Rollout: self-update on merge. Confirmation = the next non-bot PR opened in any pulseengine repo gets either a strict-format review or no review.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
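If the parse rate does turn out low, one possible shape for the Ollama `/api/chat` request is sketched below. This is an assumption, not something this PR implements: the model name is a placeholder, and whether a trailing assistant message is continued as a pre-fill depends on the Ollama version; Ollama's `format: "json"` constraint is the more reliable lever.

```json
{
  "model": "llama3.2:3b",
  "format": "json",
  "options": { "temperature": 0.3 },
  "stream": false,
  "messages": [
    { "role": "system", "content": "…STRICT_SYSTEM_PROMPT…" },
    { "role": "user", "content": "…unified diff…" },
    { "role": "assistant", "content": "{\"verdict\":" }
  ]
}
```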
AI Code Review for PR #24 (pulseengine/temper)

## Summary of Changes

The PR introduces a new module …

## Potential Bugs or Issues

## Security Concerns

## Suggestions for Improvement

## Overall Assessment

The PR introduces significant changes to the AI review system, which is a positive step towards improving its reliability and consistency. However, it also introduces potential bugs, security concerns, and performance overhead that need to be addressed before deployment. The team should carefully evaluate these issues and make the necessary adjustments to ensure a robust and secure AI review process.

> This review was generated by a local AI model. It is advisory only and may contain inaccuracies. Reviewed at
## Follow-up (PR-B)
Wire `rivet validate` / `rivet impact` as a prepended mechanical-oracle finding when the target repo has `rivet.yaml`. Findings from the oracle bypass the model entirely. Already prototyped — `rivet v0.4.3` against the rivet repo emits exactly the kind of name-anchored finding we want (e.g. `spar:SPAR-REQ-001 — no downstream artifacts`).