fix(ce-code-review): keep finding numbers stable by hnshah · Pull Request #754 · EveryInc/compound-engineering-plugin

hnshah · 2026-05-02T19:42:59Z

Summary

Assign ce-code-review finding numbers once after sorting, rather than per severity/route bucket
Reuse stable finding IDs when residual/actionable findings are repeated after safe autofixes
Tighten the autofix return contract so downstream users can copy non-auto-fixable work without remapping IDs

Fixes #743.

Test plan

bun test tests/review-skill-contract.test.ts tests/pipeline-review-contract.test.ts
bun run release:validate
git diff --check

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 107fca76d2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-02T19:44:59Z


 - **Pipe-delimited markdown tables** for findings -- never ASCII box-drawing characters or per-finding horizontal-rule separators between entries (the report-level `---` before the verdict is still required)
 - **Severity-grouped sections** -- `### P0 -- Critical`, `### P1 -- High`, `### P2 -- Moderate`, `### P3 -- Low`. Omit empty severity levels.
+- **Stable sequential finding numbers** -- assign finding numbers once after sorting, continue them across severity sections, and reuse those same numbers when findings are repeated in Residual Actionable Work. Do not restart at `1` for each severity or route bucket.


Keep residual example IDs aligned with stable numbering

The new rule requires reusing stable finding numbers, but the example in this template still renumbers Residual Actionable Work entries (1, 2) instead of preserving original IDs from the findings table (1, 3 when #2 was auto-fixed). Because this file is used as the canonical output pattern, the contradiction can cause the reviewer output to keep remapping IDs, which breaks downstream correlation for residual work even though Stage 5 now defines stable numbering.

Useful? React with 👍 / 👎.

tmchow

Thanks for tackling this -- the core idea (assign once, reuse downstream) is right. Comparing against how ce-plan and ce-brainstorm handle stable IDs (R-IDs, U-IDs, etc.) surfaced a few places this could be tightened. Inline comments below; the priority items are the two on SKILL.md line 554 (step renumbering / fold into step 9) and the test loosening on the new contract test. Everything else is polish.

For reference, the durable-ID pattern those skills use lives in plugins/compound-engineering/skills/ce-plan/SKILL.md:401 and plugins/compound-engineering/skills/ce-plan/references/deepening-workflow.md:230 -- worth a skim if you haven't already, since the language patterns there transfer well.

tmchow · 2026-05-03T07:26:09Z

   - residual actionable queue: unresolved `gated_auto` or `manual` findings whose owner is `downstream-resolver`
   - report-only queue: `advisory` findings plus anything owned by `human` or `release`
 9. **Sort.** Order by severity (P0 first) -> anchor (descending) -> file path -> line number.
+9b. **Assign stable finding numbers once.** After sorting, assign monotonically increasing `#` values across the full primary finding set. Do not restart numbering inside each severity table or autofix/routing bucket. If later sections repeat a finding (for example Residual Actionable Work after `safe_auto` fixes are applied), reuse the same stable `#` so users can copy the non-auto-fixable work without remapping IDs.


Drop the 9b. workaround -- renumber, or fold into step 9.

Adding 9b. between existing steps 9 and 10 looks like a workaround to avoid renumbering 10. Collect coverage data and 11. Preserve CE agent artifacts. But these are private pipeline step numbers, not external references -- there's no ce-work analog citing them. The stability discipline ce-plan teaches applies to user-facing R/U IDs, not internal step numbers.

Pick one:

Preferred: renumber to 10. **Assign stable finding numbers once.** and bump the existing 10/11 to 11/12.

Even better: fold into existing step 9. Sorting and ID assignment are conceptually one operation -- the stable # is the post-sort position. Rewrite step 9 as:

9. **Sort and number.** Order by severity (P0 first) -> anchor (descending) -> file path -> line number, then assign monotonically increasing # values across the full primary finding set in that sorted order.

The folded version is clearer because it makes the dependency explicit: numbering follows from the sort key, so they belong together.

tmchow · 2026-05-03T07:26:09Z

   - residual actionable queue: unresolved `gated_auto` or `manual` findings whose owner is `downstream-resolver`
   - report-only queue: `advisory` findings plus anything owned by `human` or `release`
 9. **Sort.** Order by severity (P0 first) -> anchor (descending) -> file path -> line number.
+9b. **Assign stable finding numbers once.** After sorting, assign monotonically increasing `#` values across the full primary finding set. Do not restart numbering inside each severity table or autofix/routing bucket. If later sections repeat a finding (for example Residual Actionable Work after `safe_auto` fixes are applied), reuse the same stable `#` so users can copy the non-auto-fixable work without remapping IDs.


Name the failure mode in prose.

Current text ends with "so users can copy the non-auto-fixable work without remapping IDs." That's generic. Compare to ce-plan's analogous rule (plugins/compound-engineering/skills/ce-plan/SKILL.md:401 and the deepening-vector callout in references/deepening-workflow.md:230,244), which names a specific consumer ("breaks ce-work blocker and verification references"). Suggested replacement:

Suggested change

9b. **Assign stable finding numbers once.** After sorting, assign monotonically increasing `#` values across the full primary finding set. Do not restart numbering inside each severity table or autofix/routing bucket. If later sections repeat a finding (for example Residual Actionable Work after `safe_auto` fixes are applied), reuse the same stable `#` so users can copy the non-auto-fixable work without remapping IDs.

9b. **Assign stable finding numbers once.** After sorting, assign monotonically increasing `#` values across the full primary finding set. Do not restart numbering inside each severity table or autofix/routing bucket. If later sections repeat a finding (for example Residual Actionable Work after `safe_auto` fixes are applied), reuse the same stable `#` so users -- and downstream skills like `ce-resolve-pr-feedback` -- can reference findings by `#` after the autofix loop rewrites the report. Renumbering after autofix invalidates any prior reference: copied snippets, follow-up prompts citing `#3`, or tickets filed against an earlier render.

Naming the concrete failure mode lets the next reader judge edge cases the rule didn't anticipate.

tmchow · 2026-05-03T07:26:09Z


 1. **Header.** Scope, intent, mode, reviewer team with per-conditional justifications.
-2. **Findings.** Rendered as pipe-delimited tables grouped by severity (`### P0 -- Critical`, `### P1 -- High`, `### P2 -- Moderate`, `### P3 -- Low`). Each finding row shows `#`, file, issue, reviewer(s), confidence, and synthesized route. Omit empty severity levels. Never render findings as freeform text blocks or numbered lists.
+2. **Findings.** Rendered as pipe-delimited tables grouped by severity (`### P0 -- Critical`, `### P1 -- High`, `### P2 -- Moderate`, `### P3 -- Low`). Each finding row shows `#`, file, issue, reviewer(s), confidence, and synthesized route. Omit empty severity levels. Never render findings as freeform text blocks or numbered lists. Finding numbers must come from Stage 5's stable numbering and continue sequentially across severity sections; never restart at `1` for a new severity header.


Collapse the duplicated rule -- point back to Stage 5 instead of restating.

This sentence restates the rule already established in step 9b: "Finding numbers must come from Stage 5's stable numbering and continue sequentially across severity sections; never restart at 1 for a new severity header."

ce-plan keeps its stability rule in one canonical paragraph (SKILL.md:401) and references it from elsewhere -- so future edits have one home, not three (autofix bullet at L70, Stage 5 step, this Stage 6 rendering bullet). Suggested replacement:

Suggested change

2. **Findings.** Rendered as pipe-delimited tables grouped by severity (`### P0 -- Critical`, `### P1 -- High`, `### P2 -- Moderate`, `### P3 -- Low`). Each finding row shows `#`, file, issue, reviewer(s), confidence, and synthesized route. Omit empty severity levels. Never render findings as freeform text blocks or numbered lists. Finding numbers must come from Stage 5's stable numbering and continue sequentially across severity sections; never restart at `1` for a new severity header.

2. **Findings.** Rendered as pipe-delimited tables grouped by severity (`### P0 -- Critical`, `### P1 -- High`, `### P2 -- Moderate`, `### P3 -- Low`). Each finding row shows `#`, file, issue, reviewer(s), confidence, and synthesized route. Omit empty severity levels. Never render findings as freeform text blocks or numbered lists. Finding numbers come from the stable assignment in Stage 5 -- never re-derive them per severity table.

tmchow · 2026-05-03T07:26:09Z

 - **Apply only `safe_auto -> review-fixer` findings.** Leave `gated_auto`, `manual`, `human`, and `release` work unresolved.
 - **Write a run artifact** under `/tmp/compound-engineering/ce-code-review/<run-id>/` summarizing findings, applied fixes, residual actionable work, and advisory outputs. Orchestrators read this artifact to route residual `downstream-resolver` findings; the skill itself does not file tickets or prompt the user in autofix.
- **Emit a compact Residual Actionable Work summary in the autofix return** listing each residual `downstream-resolver` finding with severity, file:line, title, and autofix_class. Include the run-artifact path. Callers read this summary directly without parsing the artifact. When no residuals exist, state `Residual actionable work: none.` explicitly.
+- **Emit a compact Residual Actionable Work summary in the autofix return** listing each residual `downstream-resolver` finding with its stable `#`, severity, file:line, title, and autofix_class. Keep applied `safe_auto` fixes and residual non-auto findings in separate contiguous sections, and reuse the stable finding numbers from Stage 5 rather than renumbering residuals. Include the run-artifact path. Callers read this summary directly without parsing the artifact. When no residuals exist, state `Residual actionable work: none.` explicitly.


Minor: split the two unrelated rules into separate sentences.

The new clause mixes section structure ("keep applied fixes and residuals in separate contiguous sections") with ID reuse ("reuse stable finding numbers") into one sentence. Splitting reads more cleanly:

"Structure the summary as two separate contiguous sections: applied safe_auto fixes first, then residual non-auto findings. Within the residual section, reuse each finding's stable # from Stage 5 -- never renumber."

tmchow · 2026-05-03T07:26:09Z


 - **Pipe-delimited markdown tables** for findings -- never ASCII box-drawing characters or per-finding horizontal-rule separators between entries (the report-level `---` before the verdict is still required)
 - **Severity-grouped sections** -- `### P0 -- Critical`, `### P1 -- High`, `### P2 -- Moderate`, `### P3 -- Low`. Omit empty severity levels.
+- **Stable sequential finding numbers** -- assign finding numbers once after sorting, continue them across severity sections, and reuse those same numbers when findings are repeated in Residual Actionable Work. Do not restart at `1` for each severity or route bucket.


Show the rule in the example tables, don't just declare it as a bullet.

This template already teaches by example -- good rendering above (~L70-100) and bad rendering (~L100-114). The new rule appears only as a declarative bullet here. LLMs rendering the report lean on the example more than the bullet.

Two concrete updates to the example section above:

Update the existing P1 / P2 sample tables so the P2 section starts at #3 (not #1), continuing from the P1 section's two findings -- demonstrating the cross-severity continuation.

Add a small Residual Actionable Work mini-table below that re-cites one of those # values, demonstrating the post-autofix reuse.

Without an example, the rule is easy to read past.

tmchow · 2026-05-03T07:26:09Z

+  test("ce-code-review uses stable sequential finding numbers across grouped output", async () => {
+    const content = await readRepoFile("plugins/compound-engineering/skills/ce-code-review/SKILL.md")
+    const template = await readRepoFile(
+      "plugins/compound-engineering/skills/ce-code-review/references/review-output-template.md",
+    )
+
+    expect(content).toContain("Assign stable finding numbers once")
+    expect(content).toContain("Do not restart numbering inside each severity table or autofix/routing bucket")
+    expect(content).toContain("reuse the same stable `#`")
+    expect(content).toContain("never restart at `1` for a new severity header")
+    expect(template).toContain("Stable sequential finding numbers")
+    expect(template).toContain("reuse those same numbers when findings are repeated in Residual Actionable Work")
+  })


Assertions are too loose -- tighten location anchoring and consider one behavioral check.

Two issues with the current test:

expect(content).toContain("Assign stable finding numbers once") passes even if someone moves the rule into a stale comment, deletes the surrounding step, or adds contradictory guidance elsewhere. The test confirms the rule is written, not that it lives in the right place.

It checks prose presence, not behavior. Compare to ce-plan's Phase 6 checklist (SKILL.md:802) which asserts structural invariants ("U-IDs are unique within the plan and follow the stability rule").

Two concrete improvements -- either is worth doing; both is better:

(a) Anchor the prose assertions to their location by slicing the file around the Stage 5 / Stage 5b headings so the test fails loudly if the rule is moved out of context:

const stage5 = content.split("### Stage 5b:")[0].split("### Stage 5:")[1] expect(stage5).toMatch(/Assign stable finding numbers once/) expect(stage5).toMatch(/reuse the same stable `#`/)

(b) Add at least one behavioral check. Even if the rendering itself lives in skill prose (LLM-driven), you can add a fixture in tests/fixtures/ showing a correctly-rendered multi-severity report and assert:

numbers across ### P0, ### P1, ### P2, ### P3 tables are strictly monotonic (never reset to 1)

a finding repeated in a Residual Actionable Work section retains the same #

A fixture-based assertion exercises a concrete artifact rather than a string in skill prose, and would catch the actual failure mode the rule is trying to prevent.

tmchow

feedback to address posted

fix(ce-code-review): keep finding numbers stable

107fca7

hnshah mentioned this pull request May 2, 2026

fix(ce-polish-beta): support Bash 3.2 project detection #755

Merged

chatgpt-codex-connector Bot reviewed May 2, 2026

View reviewed changes

tmchow reviewed May 3, 2026

View reviewed changes

tmchow requested changes May 3, 2026

View reviewed changes

lilis-tokyo approved these changes May 3, 2026

View reviewed changes

fix: address stable finding number review feedback

b6e0251

tmchow self-requested a review May 4, 2026 01:51

tmchow approved these changes May 4, 2026

View reviewed changes

tmchow merged commit e856756 into EveryInc:main May 4, 2026
2 checks passed

github-actions Bot mentioned this pull request May 4, 2026

chore: release main #753

Merged

hnshah deleted the fix/ce-code-review-stable-finding-numbers branch May 4, 2026 02:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(ce-code-review): keep finding numbers stable#754

fix(ce-code-review): keep finding numbers stable#754
tmchow merged 2 commits intoEveryInc:mainfrom
hnshah:fix/ce-code-review-stable-finding-numbers

hnshah commented May 2, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 2, 2026

Uh oh!

tmchow left a comment

Uh oh!

tmchow May 3, 2026

Uh oh!

tmchow May 3, 2026

Uh oh!

tmchow May 3, 2026

Uh oh!

tmchow May 3, 2026

Uh oh!

tmchow May 3, 2026

Uh oh!

tmchow May 3, 2026

Uh oh!

tmchow left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

hnshah commented May 2, 2026

Summary

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

tmchow left a comment

Choose a reason for hiding this comment

Uh oh!

tmchow May 3, 2026

Choose a reason for hiding this comment

Uh oh!

tmchow May 3, 2026

Choose a reason for hiding this comment

Uh oh!

tmchow May 3, 2026

Choose a reason for hiding this comment

Uh oh!

tmchow May 3, 2026

Choose a reason for hiding this comment

Uh oh!

tmchow May 3, 2026

Choose a reason for hiding this comment

Uh oh!

tmchow May 3, 2026

Choose a reason for hiding this comment

Uh oh!

tmchow left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants