Skip to content

[Changelog] Parse rendered fragment-merge with docutils to mirror doc build#5619

Draft
hujc7 wants to merge 1 commit into
isaac-sim:developfrom
hujc7:jichuanh/changelog-compile-lint
Draft

[Changelog] Parse rendered fragment-merge with docutils to mirror doc build#5619
hujc7 wants to merge 1 commit into
isaac-sim:developfrom
hujc7:jichuanh/changelog-compile-lint

Conversation

@hujc7
Copy link
Copy Markdown
Collaborator

@hujc7 hujc7 commented May 14, 2026

Summary

Replace the just-merged orphan-paragraph regex check (#5611) with a docutils-backed lint that parses the rendered fragment-merge output — the exact text the nightly bot prepends to CHANGELOG.rst — through the same engine sphinx-build uses. Any WARNING+ on the merged output fails the PR gate with the same diagnostic Sphinx would emit downstream.

Why this is more durable than the regex

The regex from #5611 catches the shape that broke develop on 2026-05-14 (orphan paragraph inside a section body) but drifts in two directions:

  • Stricter than needed: rejects shapes Sphinx would accept (e.g. a single-fragment Added section ending with a prose note, like the historical isaaclab_rl 0.1.0 initial-version entry).
  • Laxer than needed: passes any other RST defect that doesn't trip the column-zero-paragraph rule (unclosed \`\`...\`\, under-indented continuation, malformed roles in nitpicky-on contexts, etc.).

By compiling the fragments via the same FragmentBatch._merge_sections + _format_entry the bot uses, then parsing with docutils, the gate's accept/reject decision tracks what would actually happen at doc-build time. The remaining drift surface is Sphinx-specific extensions, currently empty for fragment content.

What changed

File Change
tools/changelog/cli.py Remove regex orphan-paragraph rule from Fragment.validate. Add FragmentBatch._CompiledLinter (nested class grouping role-stub registration + lint helper). Add FragmentBatch.validate_compile_output. Wire into PRDiff.evaluate as Rule 4.
.github/workflows/changelog-check.yml pip install docutils step.
tools/changelog/test/test_compile_lint.py 18 new tests: direct lint helper (clean/seam/unclosed-literal/under-indent/role-stubs/prose-intro/dedup/parametrized matrix) + batch integration (empty/skip-only/clean-single/isolated-orphan-tail/merge-seam/inline-defect/error-naming) + gate orchestration (clean PR / fragment compile failure / non-fragment PR).
tools/changelog/test/test_validate.py Replace orphan-paragraph rejection test with acceptance test for isolated orphan-tail shape (now valid per Layer 2 semantics).
tools/changelog/test/invalid_content/3004.rst Removed (was the regex-era fixture).

Why _CompiledLinter is a nested class

The two helpers (_register_sphinx_role_stubs and lint) exist solely to support one purpose — parse a compiled fragment block the way sphinx-build would. The role-stub registration is a required precondition of the lint, never called independently. Grouping them as FragmentBatch._CompiledLinter keeps the cohesion explicit and avoids module-level helpers that say "these belong together" without giving them a home.

Verification

  • pytest tools/changelog/test/105 pass, 0 fail.
  • End-to-end regression: ran the new validator on the actual changelog.d/ state at b65a1ac2b73^ (the bot-input state that produced the broken CHANGELOG.rst). Flags line 35: Unexpected indentation — the exact diagnostic sphinx-build -W emitted on the merged file.
  • Zero false positives: scanned across 49 historical bot-input batches (every auto-version-bump commit's parent state across all packages). Only the one known-bad batch (b65a1ac, the [Newton] Backend-agnostic task-space accessors for IK/OSC #5400 incident) is flagged.
  • Concrete Layer-1-misses-Layer-2-catches cases included as fixtures:
    • Unclosed \`\`literal: Layer 1 passes, Layer 2 reports Inline literal start-string without end-string.
    • Under-indented continuation: Layer 1 passes, Layer 2 reports Bullet list ends without a blank line; unexpected unindent.
  • pre-commit clean on all touched files.

Status

Opened as draft while we let the gate's behavior settle. Ready for review once CI is green; happy to iterate on naming / structure / error wording.

… doc build

The PR gate now compiles every accumulated fragment for a touched package
through the same FragmentBatch._merge_sections + _format_entry the
nightly bot uses, then parses the resulting RST block with the same
docutils engine that backs sphinx-build. Any WARNING+ on the merged
output fails the gate with the exact diagnostic Sphinx would emit at
doc-build time, so structural drift between the gate's checks and the
doc build's checks is bounded by Sphinx-specific extensions only
(currently empty for fragment content).

The previous regex check rejected orphan paragraphs anywhere inside a
section body — over-strict in isolation (false-positive on prose-intro
shapes that compile cleanly) and incomplete in coverage (silent on
unclosed inline literals, under-indented continuations, and other RST
defects that the regex shape can't see). The docutils-backed lint is
both narrower (only flags what the doc build would flag) and broader
(catches every structural class, not just the orphan-paragraph one).

Validator helpers (_register_sphinx_role_stubs + lint) are grouped on
FragmentBatch._CompiledLinter because they exist solely to support
FragmentBatch.validate_compile_output — the role-stub registration is a
required precondition of the lint, never called independently. Keeping
them in a shared namespace makes the relationship explicit.

Verified:
- 105 tests pass (24 prior validate + 18 new compile-lint cases incl.
  empty/skip-only/clean-single/seam/inline-defect/role-stub coverage +
  the rest of the changelog suite).
- Regression: validator on the actual bot-input state at b65a1ac^
  flags 'line 35: Unexpected indentation' — the exact diagnostic
  sphinx-build -W emitted on the merged CHANGELOG.rst.
- False-positive scan across 49 historical bot-input batches: only the
  one known-bad batch (b65a1ac, the isaac-sim#5400 incident) is flagged.
- pre-commit clean on all touched files.
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Summary

This PR replaces the regex-based orphan paragraph detection from #5611 with a proper docutils-based lint that parses the compiled fragment-merge output through the same RST engine used by sphinx-build. This approach catches the exact RST defects that would fail the doc build, eliminating both false positives (valid RST shapes the regex rejected) and false negatives (RST defects the regex could not detect).

Design Assessment

Architecture: ✅ Well-designed

The design choice to use docutils parsing rather than regex pattern matching is sound — it moves the validation from "guessing what Sphinx might reject" to "using the same parser Sphinx uses." Key architectural decisions:

  1. Nested _CompiledLinter class — Clean encapsulation of the role-stub registration and lint logic. The docstring correctly justifies why these helpers belong together.

  2. Layered validation — The batch-level validate_compile_output() only runs when fragments are valid per Rule 2/3, ensuring the lint sees well-formed inputs. The placement after per-fragment checks (Rule 4 position) is correct.

  3. Sphinx role stubs — Registering Sphinx-specific roles (:class:, :meth:, etc.) as no-ops prevents spurious warnings while still catching structural RST issues. The stub list covers the standard Sphinx domain roles used in IsaacLab.

  4. Deduplication — The seen set in lint() prevents duplicate diagnostics from cluttering error output.

Findings

🟡 Minor Suggestions

tools/changelog/cli.py — Consider halt_level documentation

settings.halt_level = 5  # never halt

The magic number 5 works but could benefit from a brief inline note that docutils severity levels are 0-4 (DEBUG through SEVERE), so level 5 effectively means "never halt."

tools/changelog/cli.py:406-410 — Error message wording
The error message mentions "two fragments collide at a section seam" as the primary cause, but also covers single-fragment defects. Consider reordering to lead with the more common case (single-fragment RST defect) since collision seams are rarer.

tools/changelog/cli.py:457-458 — Local import placement

import io  # local — keep cli.py importable when docutils is absent

Good defensive coding. The comment explains the rationale well.

🟢 Positive Observations

  1. Comprehensive test coverage — 370 lines of new tests covering:

    • Direct lint helper tests (clean/seam/unclosed-literal/under-indent/role-stubs/prose-intro/dedup)
    • Batch integration tests (empty/skip-only/clean-single/isolated-orphan/merge-seam/inline-defect/error-naming)
    • Gate orchestration tests (clean PR/fragment compile failure/non-fragment PR)
  2. Regression verification — The commit message documents verification against the actual b65a1ac2b73^ state that caused the #5400 incident, confirming it catches the exact diagnostic.

  3. False positive scan — Testing across 49 historical bot-input batches with only the known-bad batch flagged demonstrates production readiness.

  4. Rule renumbering — Rule 4→5 renumbering is correctly applied, maintaining gate logic coherence.

Test Coverage

Aspect Coverage
_CompiledLinter.lint() ✅ 7 direct unit tests
validate_compile_output() ✅ 8 batch integration tests
PRDiff.evaluate() wiring ✅ 3 gate orchestration tests
Edge cases (empty/skip-only) ✅ Covered
Regression shapes (#5400) ✅ Explicit test case
Parametrized matrix ✅ 5 good/bad shape combinations

Total: 18 new tests, 105 total suite passing

CI Status

Check Status
Check changelog fragments ✅ Pass
Check for Broken Links ✅ Pass
pre-commit ⏳ Pending
Build Latest Docs ⏳ Pending
Installation Tests ⏳ Pending
license-check ⏳ Pending

The changelog check passing is particularly meaningful here since this PR modifies the changelog validation system itself.

Verdict

Ship it 🚀

This is a well-designed improvement that replaces fragile regex heuristics with proper parser-based validation. The implementation is clean, the test coverage is thorough, and the verification against historical data demonstrates it solves the real problem without introducing regressions. The docutils dependency addition is minimal and justified.

The minor suggestions above are optional polish — nothing blocking merge once CI completes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant