-
Notifications
You must be signed in to change notification settings - Fork 1.3k
fix(ce-code-review): keep finding numbers stable #754
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -51,7 +51,7 @@ Use this **exact format** when presenting synthesized review findings. Findings | |
| | # | File | Issue | Route | Next Step | | ||
| |---|------|-------|-------|-----------| | ||
| | 1 | `orders_controller.rb:42` | Ownership check missing on export lookup | `gated_auto -> downstream-resolver` | Defer via tracker (requires explicit approval before behavior change) | | ||
| | 2 | `export_service.rb:91` | Pagination contract needs a broader API decision | `manual -> downstream-resolver` | Defer via tracker with contract and client impact details | | ||
| | 3 | `export_service.rb:91` | Pagination contract needs a broader API decision | `manual -> downstream-resolver` | Defer via tracker with contract and client impact details | | ||
|
|
||
| ### Pre-existing Issues | ||
|
|
||
|
|
@@ -117,6 +117,7 @@ This fails because: no pipe-delimited tables, no severity-grouped `###` headers, | |
|
|
||
| - **Pipe-delimited markdown tables** for findings -- never ASCII box-drawing characters or per-finding horizontal-rule separators between entries (the report-level `---` before the verdict is still required) | ||
| - **Severity-grouped sections** -- `### P0 -- Critical`, `### P1 -- High`, `### P2 -- Moderate`, `### P3 -- Low`. Omit empty severity levels. | ||
| - **Stable sequential finding numbers** -- assign finding numbers once after sorting, continue them across severity sections, and reuse those same numbers when findings are repeated in Residual Actionable Work. Do not restart at `1` for each severity or route bucket. | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Show the rule in the example tables, don't just declare it as a bullet. This template already teaches by example -- good rendering above (~L70-100) and bad rendering (~L100-114). The new rule appears only as a declarative bullet here. LLMs rendering the report lean on the example more than the bullet. Two concrete updates to the example section above:
Without an example, the rule is easy to read past. |
||
| - **Always include file:line location** for code review issues | ||
| - **Reviewer column** shows which persona(s) flagged the issue. Multiple reviewers = cross-reviewer agreement. | ||
| - **Confidence column** shows the finding's anchor as an integer (`50`, `75`, or `100`). Never render as a float. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,31 @@ | ||
| ## Code Review Results | ||
|
|
||
| **Scope:** merge-base with main -> working tree | ||
| **Intent:** Demonstrate stable finding numbering | ||
| **Mode:** autofix | ||
|
|
||
| **Reviewers:** correctness, testing, maintainability | ||
|
|
||
| ### P1 -- High | ||
|
|
||
| | # | File | Issue | Reviewer | Confidence | Route | | ||
| |---|------|-------|----------|------------|-------| | ||
| | 1 | `export_service.rb:87` | Loads all orders into memory | performance | 100 | `safe_auto -> review-fixer` | | ||
| | 2 | `export_service.rb:91` | Missing pagination contract | api-contract | 75 | `manual -> downstream-resolver` | | ||
|
|
||
| ### P2 -- Moderate | ||
|
|
||
| | # | File | Issue | Reviewer | Confidence | Route | | ||
| |---|------|-------|----------|------------|-------| | ||
| | 3 | `export_service.rb:45` | Missing error handling | correctness | 75 | `gated_auto -> downstream-resolver` | | ||
|
|
||
| ### Applied Fixes | ||
|
|
||
| - `safe_auto`: Applied bounded export loading fix for #1. | ||
|
|
||
| ### Residual Actionable Work | ||
|
|
||
| | # | File | Issue | Route | Next Step | | ||
| |---|------|-------|-------|-----------| | ||
| | 2 | `export_service.rb:91` | Missing pagination contract | `manual -> downstream-resolver` | Defer via tracker with API contract context | | ||
| | 3 | `export_service.rb:45` | Missing error handling | `gated_auto -> downstream-resolver` | Defer via tracker pending behavior approval | |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -731,8 +731,46 @@ describe("ce-code-review contract", () => { | |
| test("ce-code-review autofix emits a residual-work summary in-chat, not only in the artifact", async () => { | ||
| const content = await readRepoFile("plugins/compound-engineering/skills/ce-code-review/SKILL.md") | ||
| expect(content).toMatch(/Emit a compact Residual Actionable Work summary/) | ||
| expect(content).toContain("with its stable `#`, severity, file:line, title, and autofix_class") | ||
| expect(content).toContain("Structure the summary as two separate contiguous sections") | ||
| expect(content).toContain("applied `safe_auto` fixes first, then residual non-auto findings") | ||
| expect(content).toContain("reuse each finding's stable `#` from Stage 5 -- never renumber") | ||
| expect(content).toContain("Residual actionable work: none.") | ||
| }) | ||
|
|
||
| test("ce-code-review uses stable sequential finding numbers across grouped output", async () => { | ||
| const content = await readRepoFile("plugins/compound-engineering/skills/ce-code-review/SKILL.md") | ||
| const template = await readRepoFile( | ||
| "plugins/compound-engineering/skills/ce-code-review/references/review-output-template.md", | ||
| ) | ||
| const fixture = await readRepoFile("tests/fixtures/ce-code-review-stable-numbering.md") | ||
|
|
||
| const stage5 = content.split("### Stage 5b:")[0].split("### Stage 5:")[1] | ||
| expect(stage5).toMatch(/Sort and number/) | ||
| expect(stage5).toMatch(/Do not restart numbering inside each severity table or autofix\/routing bucket/) | ||
| expect(stage5).toMatch(/reuse the same stable `#`/) | ||
| expect(stage5).toMatch(/ce-resolve-pr-feedback/) | ||
|
|
||
| const stage6 = content.split("### Headless output format")[0].split("### Stage 6: Synthesize and present")[1] | ||
| expect(stage6).toContain("Finding numbers come from the stable assignment in Stage 5") | ||
| expect(stage6).toContain("never re-derive them per severity table") | ||
| expect(template).toContain("Stable sequential finding numbers") | ||
| expect(template).toContain("reuse those same numbers when findings are repeated in Residual Actionable Work") | ||
|
|
||
| const primaryFindingIds = Array.from( | ||
| fixture.matchAll(/^\| (\d+) \| `[^`]+` \| .* \| .* \| \d+ \| `.*` \|$/gm), | ||
| ([, id]) => Number(id), | ||
| ) | ||
| expect(primaryFindingIds).toEqual([1, 2, 3]) | ||
|
|
||
| const residualSection = fixture.split("### Residual Actionable Work")[1] | ||
| const residualIds = Array.from( | ||
| residualSection.matchAll(/^\| (\d+) \| `[^`]+` \| .* \| `.*` \| .* \|$/gm), | ||
| ([, id]) => Number(id), | ||
| ) | ||
| expect(residualIds).toEqual([2, 3]) | ||
| expect(residualIds.every((id) => primaryFindingIds.includes(id))).toBe(true) | ||
| }) | ||
|
Comment on lines
+741
to
+773
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Assertions are too loose -- tighten location anchoring and consider one behavioral check. Two issues with the current test:
Two concrete improvements -- either is worth doing; both is better: (a) Anchor the prose assertions to their location by slicing the file around the Stage 5 / Stage 5b headings so the test fails loudly if the rule is moved out of context: const stage5 = content.split("### Stage 5b:")[0].split("### Stage 5:")[1]
expect(stage5).toMatch(/Assign stable finding numbers once/)
expect(stage5).toMatch(/reuse the same stable `#`/)(b) Add at least one behavioral check. Even if the rendering itself lives in skill prose (LLM-driven), you can add a fixture in
A fixture-based assertion exercises a concrete artifact rather than a string in skill prose, and would catch the actual failure mode the rule is trying to prevent. |
||
| }) | ||
|
|
||
| describe("testing-reviewer contract", () => { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new rule requires reusing stable finding numbers, but the example in this template still renumbers
Residual Actionable Workentries (1,2) instead of preserving original IDs from the findings table (1,3when#2was auto-fixed). Because this file is used as the canonical output pattern, the contradiction can cause the reviewer output to keep remapping IDs, which breaks downstream correlation for residual work even though Stage 5 now defines stable numbering.Useful? React with 👍 / 👎.