Skip to content

fix(ce-code-review): keep finding numbers stable#754

Merged
tmchow merged 2 commits intoEveryInc:mainfrom
hnshah:fix/ce-code-review-stable-finding-numbers
May 4, 2026
Merged

fix(ce-code-review): keep finding numbers stable#754
tmchow merged 2 commits intoEveryInc:mainfrom
hnshah:fix/ce-code-review-stable-finding-numbers

Conversation

@hnshah
Copy link
Copy Markdown
Contributor

@hnshah hnshah commented May 2, 2026

Summary

  • Assign ce-code-review finding numbers once after sorting, rather than per severity/route bucket
  • Reuse stable finding IDs when residual/actionable findings are repeated after safe autofixes
  • Tighten the autofix return contract so downstream users can copy non-auto-fixable work without remapping IDs

Fixes #743.

Test plan

  • bun test tests/review-skill-contract.test.ts tests/pipeline-review-contract.test.ts
  • bun run release:validate
  • git diff --check

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 107fca76d2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".


- **Pipe-delimited markdown tables** for findings -- never ASCII box-drawing characters or per-finding horizontal-rule separators between entries (the report-level `---` before the verdict is still required)
- **Severity-grouped sections** -- `### P0 -- Critical`, `### P1 -- High`, `### P2 -- Moderate`, `### P3 -- Low`. Omit empty severity levels.
- **Stable sequential finding numbers** -- assign finding numbers once after sorting, continue them across severity sections, and reuse those same numbers when findings are repeated in Residual Actionable Work. Do not restart at `1` for each severity or route bucket.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep residual example IDs aligned with stable numbering

The new rule requires reusing stable finding numbers, but the example in this template still renumbers Residual Actionable Work entries (1, 2) instead of preserving original IDs from the findings table (1, 3 when #2 was auto-fixed). Because this file is used as the canonical output pattern, the contradiction can cause the reviewer output to keep remapping IDs, which breaks downstream correlation for residual work even though Stage 5 now defines stable numbering.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator

@tmchow tmchow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this -- the core idea (assign once, reuse downstream) is right. Comparing against how ce-plan and ce-brainstorm handle stable IDs (R-IDs, U-IDs, etc.) surfaced a few places this could be tightened. Inline comments below; the priority items are the two on SKILL.md line 554 (step renumbering / fold into step 9) and the test loosening on the new contract test. Everything else is polish.

For reference, the durable-ID pattern those skills use lives in plugins/compound-engineering/skills/ce-plan/SKILL.md:401 and plugins/compound-engineering/skills/ce-plan/references/deepening-workflow.md:230 -- worth a skim if you haven't already, since the language patterns there transfer well.

- residual actionable queue: unresolved `gated_auto` or `manual` findings whose owner is `downstream-resolver`
- report-only queue: `advisory` findings plus anything owned by `human` or `release`
9. **Sort.** Order by severity (P0 first) -> anchor (descending) -> file path -> line number.
9b. **Assign stable finding numbers once.** After sorting, assign monotonically increasing `#` values across the full primary finding set. Do not restart numbering inside each severity table or autofix/routing bucket. If later sections repeat a finding (for example Residual Actionable Work after `safe_auto` fixes are applied), reuse the same stable `#` so users can copy the non-auto-fixable work without remapping IDs.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop the 9b. workaround -- renumber, or fold into step 9.

Adding 9b. between existing steps 9 and 10 looks like a workaround to avoid renumbering 10. Collect coverage data and 11. Preserve CE agent artifacts. But these are private pipeline step numbers, not external references -- there's no ce-work analog citing them. The stability discipline ce-plan teaches applies to user-facing R/U IDs, not internal step numbers.

Pick one:

  • Preferred: renumber to 10. **Assign stable finding numbers once.** and bump the existing 10/11 to 11/12.

  • Even better: fold into existing step 9. Sorting and ID assignment are conceptually one operation -- the stable # is the post-sort position. Rewrite step 9 as:

    9. **Sort and number.** Order by severity (P0 first) -> anchor (descending) -> file path -> line number, then assign monotonically increasing # values across the full primary finding set in that sorted order.

The folded version is clearer because it makes the dependency explicit: numbering follows from the sort key, so they belong together.

- residual actionable queue: unresolved `gated_auto` or `manual` findings whose owner is `downstream-resolver`
- report-only queue: `advisory` findings plus anything owned by `human` or `release`
9. **Sort.** Order by severity (P0 first) -> anchor (descending) -> file path -> line number.
9b. **Assign stable finding numbers once.** After sorting, assign monotonically increasing `#` values across the full primary finding set. Do not restart numbering inside each severity table or autofix/routing bucket. If later sections repeat a finding (for example Residual Actionable Work after `safe_auto` fixes are applied), reuse the same stable `#` so users can copy the non-auto-fixable work without remapping IDs.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name the failure mode in prose.

Current text ends with "so users can copy the non-auto-fixable work without remapping IDs." That's generic. Compare to ce-plan's analogous rule (plugins/compound-engineering/skills/ce-plan/SKILL.md:401 and the deepening-vector callout in references/deepening-workflow.md:230,244), which names a specific consumer ("breaks ce-work blocker and verification references"). Suggested replacement:

Suggested change
9b. **Assign stable finding numbers once.** After sorting, assign monotonically increasing `#` values across the full primary finding set. Do not restart numbering inside each severity table or autofix/routing bucket. If later sections repeat a finding (for example Residual Actionable Work after `safe_auto` fixes are applied), reuse the same stable `#` so users can copy the non-auto-fixable work without remapping IDs.
9b. **Assign stable finding numbers once.** After sorting, assign monotonically increasing `#` values across the full primary finding set. Do not restart numbering inside each severity table or autofix/routing bucket. If later sections repeat a finding (for example Residual Actionable Work after `safe_auto` fixes are applied), reuse the same stable `#` so users -- and downstream skills like `ce-resolve-pr-feedback` -- can reference findings by `#` after the autofix loop rewrites the report. Renumbering after autofix invalidates any prior reference: copied snippets, follow-up prompts citing `#3`, or tickets filed against an earlier render.

Naming the concrete failure mode lets the next reader judge edge cases the rule didn't anticipate.


1. **Header.** Scope, intent, mode, reviewer team with per-conditional justifications.
2. **Findings.** Rendered as pipe-delimited tables grouped by severity (`### P0 -- Critical`, `### P1 -- High`, `### P2 -- Moderate`, `### P3 -- Low`). Each finding row shows `#`, file, issue, reviewer(s), confidence, and synthesized route. Omit empty severity levels. Never render findings as freeform text blocks or numbered lists.
2. **Findings.** Rendered as pipe-delimited tables grouped by severity (`### P0 -- Critical`, `### P1 -- High`, `### P2 -- Moderate`, `### P3 -- Low`). Each finding row shows `#`, file, issue, reviewer(s), confidence, and synthesized route. Omit empty severity levels. Never render findings as freeform text blocks or numbered lists. Finding numbers must come from Stage 5's stable numbering and continue sequentially across severity sections; never restart at `1` for a new severity header.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Collapse the duplicated rule -- point back to Stage 5 instead of restating.

This sentence restates the rule already established in step 9b: "Finding numbers must come from Stage 5's stable numbering and continue sequentially across severity sections; never restart at 1 for a new severity header."

ce-plan keeps its stability rule in one canonical paragraph (SKILL.md:401) and references it from elsewhere -- so future edits have one home, not three (autofix bullet at L70, Stage 5 step, this Stage 6 rendering bullet). Suggested replacement:

Suggested change
2. **Findings.** Rendered as pipe-delimited tables grouped by severity (`### P0 -- Critical`, `### P1 -- High`, `### P2 -- Moderate`, `### P3 -- Low`). Each finding row shows `#`, file, issue, reviewer(s), confidence, and synthesized route. Omit empty severity levels. Never render findings as freeform text blocks or numbered lists. Finding numbers must come from Stage 5's stable numbering and continue sequentially across severity sections; never restart at `1` for a new severity header.
2. **Findings.** Rendered as pipe-delimited tables grouped by severity (`### P0 -- Critical`, `### P1 -- High`, `### P2 -- Moderate`, `### P3 -- Low`). Each finding row shows `#`, file, issue, reviewer(s), confidence, and synthesized route. Omit empty severity levels. Never render findings as freeform text blocks or numbered lists. Finding numbers come from the stable assignment in Stage 5 -- never re-derive them per severity table.

- **Apply only `safe_auto -> review-fixer` findings.** Leave `gated_auto`, `manual`, `human`, and `release` work unresolved.
- **Write a run artifact** under `/tmp/compound-engineering/ce-code-review/<run-id>/` summarizing findings, applied fixes, residual actionable work, and advisory outputs. Orchestrators read this artifact to route residual `downstream-resolver` findings; the skill itself does not file tickets or prompt the user in autofix.
- **Emit a compact Residual Actionable Work summary in the autofix return** listing each residual `downstream-resolver` finding with severity, file:line, title, and autofix_class. Include the run-artifact path. Callers read this summary directly without parsing the artifact. When no residuals exist, state `Residual actionable work: none.` explicitly.
- **Emit a compact Residual Actionable Work summary in the autofix return** listing each residual `downstream-resolver` finding with its stable `#`, severity, file:line, title, and autofix_class. Keep applied `safe_auto` fixes and residual non-auto findings in separate contiguous sections, and reuse the stable finding numbers from Stage 5 rather than renumbering residuals. Include the run-artifact path. Callers read this summary directly without parsing the artifact. When no residuals exist, state `Residual actionable work: none.` explicitly.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: split the two unrelated rules into separate sentences.

The new clause mixes section structure ("keep applied fixes and residuals in separate contiguous sections") with ID reuse ("reuse stable finding numbers") into one sentence. Splitting reads more cleanly:

"Structure the summary as two separate contiguous sections: applied safe_auto fixes first, then residual non-auto findings. Within the residual section, reuse each finding's stable # from Stage 5 -- never renumber."


- **Pipe-delimited markdown tables** for findings -- never ASCII box-drawing characters or per-finding horizontal-rule separators between entries (the report-level `---` before the verdict is still required)
- **Severity-grouped sections** -- `### P0 -- Critical`, `### P1 -- High`, `### P2 -- Moderate`, `### P3 -- Low`. Omit empty severity levels.
- **Stable sequential finding numbers** -- assign finding numbers once after sorting, continue them across severity sections, and reuse those same numbers when findings are repeated in Residual Actionable Work. Do not restart at `1` for each severity or route bucket.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Show the rule in the example tables, don't just declare it as a bullet.

This template already teaches by example -- good rendering above (~L70-100) and bad rendering (~L100-114). The new rule appears only as a declarative bullet here. LLMs rendering the report lean on the example more than the bullet.

Two concrete updates to the example section above:

  1. Update the existing P1 / P2 sample tables so the P2 section starts at #3 (not #1), continuing from the P1 section's two findings -- demonstrating the cross-severity continuation.
  2. Add a small Residual Actionable Work mini-table below that re-cites one of those # values, demonstrating the post-autofix reuse.

Without an example, the rule is easy to read past.

Comment on lines +739 to +751
test("ce-code-review uses stable sequential finding numbers across grouped output", async () => {
const content = await readRepoFile("plugins/compound-engineering/skills/ce-code-review/SKILL.md")
const template = await readRepoFile(
"plugins/compound-engineering/skills/ce-code-review/references/review-output-template.md",
)

expect(content).toContain("Assign stable finding numbers once")
expect(content).toContain("Do not restart numbering inside each severity table or autofix/routing bucket")
expect(content).toContain("reuse the same stable `#`")
expect(content).toContain("never restart at `1` for a new severity header")
expect(template).toContain("Stable sequential finding numbers")
expect(template).toContain("reuse those same numbers when findings are repeated in Residual Actionable Work")
})
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assertions are too loose -- tighten location anchoring and consider one behavioral check.

Two issues with the current test:

  1. expect(content).toContain("Assign stable finding numbers once") passes even if someone moves the rule into a stale comment, deletes the surrounding step, or adds contradictory guidance elsewhere. The test confirms the rule is written, not that it lives in the right place.
  2. It checks prose presence, not behavior. Compare to ce-plan's Phase 6 checklist (SKILL.md:802) which asserts structural invariants ("U-IDs are unique within the plan and follow the stability rule").

Two concrete improvements -- either is worth doing; both is better:

(a) Anchor the prose assertions to their location by slicing the file around the Stage 5 / Stage 5b headings so the test fails loudly if the rule is moved out of context:

const stage5 = content.split("### Stage 5b:")[0].split("### Stage 5:")[1]
expect(stage5).toMatch(/Assign stable finding numbers once/)
expect(stage5).toMatch(/reuse the same stable `#`/)

(b) Add at least one behavioral check. Even if the rendering itself lives in skill prose (LLM-driven), you can add a fixture in tests/fixtures/ showing a correctly-rendered multi-severity report and assert:

  • numbers across ### P0, ### P1, ### P2, ### P3 tables are strictly monotonic (never reset to 1)
  • a finding repeated in a Residual Actionable Work section retains the same #

A fixture-based assertion exercises a concrete artifact rather than a string in skill prose, and would catch the actual failure mode the rule is trying to prevent.

Copy link
Copy Markdown
Collaborator

@tmchow tmchow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feedback to address posted

@tmchow tmchow self-requested a review May 4, 2026 01:51
@tmchow tmchow merged commit e856756 into EveryInc:main May 4, 2026
2 checks passed
@github-actions github-actions Bot mentioned this pull request May 4, 2026
@hnshah hnshah deleted the fix/ce-code-review-stable-finding-numbers branch May 4, 2026 02:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

For ce code review: Please make auto fix numbers sequential

3 participants