Skip to content

fix(doc): exclude code regions and escaped markers from docs +update checks#578

Merged
fangshuyu-768 merged 2 commits intofeat/docs-update-semantic-checkfrom
fix/docs-update-check-code-regions
Apr 21, 2026
Merged

fix(doc): exclude code regions and escaped markers from docs +update checks#578
fangshuyu-768 merged 2 commits intofeat/docs-update-semantic-checkfrom
fix/docs-update-check-code-regions

Conversation

@fangshuyu-768
Copy link
Copy Markdown
Collaborator

Summary

Follow-up to #569, targeting feat/docs-update-semantic-check so the fixes land together with the feature. Addresses the three inline review comments I left on that PR:

  1. docs_update_check.go:50 — blank-line check must ignore fenced code blocks / inline code / literal markdown examples.
  2. docs_update_check.go:75 — bold+italic regexes must ignore code spans / code fences / escaped markers.
  3. docs_update_check_test.go:88 — test suite should carry negative cases for fenced / inline / escaped markdown so future changes don't silently reintroduce false positives.

Changes

shortcuts/doc/docs_update_check.go

  • proseHasBlankLine(markdown) — fence-aware replacement for strings.Contains(markdown, "\n\n"). Returns true only when a blank line sits outside ```...``` / ~~~...~~~ regions. A trailing empty line at EOF is not counted.
  • stripMarkdownCodeRegions(markdown) — blanks out fenced code lines and masks inline code spans with equal-length whitespace (reusing scanInlineCodeSpans from markdown_fix.go). Preserves byte offsets outside the stripped regions so a future check that wants to report positions to the user doesn't need its own offset map.
  • stripEscapedEmphasisMarkers(s) — removes \* and \_. CommonMark renders these as literal * / _ with no emphasis semantics, so they shouldn't trip the combined-emphasis regex.
  • checkDocsUpdateReplaceMultilineMarkdown now calls proseHasBlankLine instead of the raw substring check.
  • checkDocsUpdateBoldItalic now runs the regexes on stripEscapedEmphasisMarkers(stripMarkdownCodeRegions(markdown)).

Shared fence helpers (codeFenceOpenMarker, isCodeFenceClose, leadingRun) are kept local to this file to keep the change scoped to the reviewed PR. If they're useful elsewhere in the doc package later, they can be promoted then.

shortcuts/doc/docs_update_check_test.go

13 new cases across the two check tests:

Area Cases added
TestCheckDocsUpdateReplaceMultilineMarkdown blank line inside backtick fence (no flag), blank line inside tilde fence (no flag), prose-blank wins over fence-blank (flag), fenced code with no blank lines (no flag)
TestCheckDocsUpdateBoldItalic ***text*** in backtick fence, **_text_** in backtick fence, _**text**_ in tilde fence, ***text*** in inline code, **_text_** in inline code, \***text*** escaped, \*\*_text_\*\* escaped — all expected to not flag; plus two positive cases to verify real emphasis outside code still fires when code is nearby

Test plan

Note

maintainer_can_modify: false on #569, so this is submitted as a stacked PR against the feature branch. Merging it before #569 completes the feature in a single logical increment; no separate main-targeted follow-up needed.

…checks

Addresses the three review comments on #569: the blank-line paragraph
check and the bold+italic emphasis check both operate on the raw
markdown string, so fenced code blocks / inline code spans / literal
escaped markers produce false-positive warnings on content users
expect to pass through verbatim.

Changes:

- Add proseHasBlankLine(): fence-aware detector that returns true only
  when a blank line sits outside of ```...``` or ~~~...~~~ regions.
  Replaces the raw strings.Contains("\n\n") check in
  checkDocsUpdateReplaceMultilineMarkdown.

- Add stripMarkdownCodeRegions(): blanks out fenced code lines and
  masks inline code spans (via scanInlineCodeSpans from markdown_fix.go)
  with equal-length whitespace so byte offsets outside the stripped
  regions are preserved.

- Add stripEscapedEmphasisMarkers(): removes "\*" and "\_" so literal
  sequences like "\***text***" — which CommonMark renders as a literal
  asterisk plus bold — don't match the combined bold+italic regex.

- Wire both helpers into checkDocsUpdateBoldItalic(): the regex now runs
  on stripEscapedEmphasisMarkers(stripMarkdownCodeRegions(markdown)),
  so code samples and escaped markers are sanitized away before
  detection.

Shared fence-parsing helpers (codeFenceOpenMarker, isCodeFenceClose,
leadingRun) are kept local to this file to avoid touching files outside
the scope of the reviewed PR. If a future change wants to reuse them
across the doc package, they can be promoted then.

Tests:

- TestCheckDocsUpdateReplaceMultilineMarkdown: add 4 negative/positive
  cases — blank line inside backtick and tilde fences (no flag), blank
  line in prose while fence also has blanks (flag wins), fenced code
  with no blank lines (no flag).

- TestCheckDocsUpdateBoldItalic: add 9 cases — ***text*** / **_text_** /
  _**text**_ inside fenced code (backtick and tilde), inside inline
  code spans, and escaped \***text*** / \*\*_text_\*\* (none flagged);
  plus two positive cases to verify the strip doesn't over-sanitize
  (real emphasis in prose still fires when inline/fenced code is nearby).
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


fangshuyu-768 seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 20, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1bc1a8d9-afee-4160-91bc-a4d3adb18063

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/docs-update-check-code-regions

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added domain/ccm PR touches the ccm domain size/M Single-domain feat or fix with limited business impact labels Apr 20, 2026
…shapes

Self-review of the first commit turned up three issues:

- isCodeFenceClose was strict on exact marker length. Per CommonMark
  §4.5, a closing fence must be at least as long as the opener, not
  exactly the same length. A 3-backtick open legitimately closed by a
  4-backtick closer (used to embed triple-backticks inside the code
  sample) was left open-ended, causing the rest of the document to be
  treated as code and both checks to silently skip it.

- Both fence helpers accepted any amount of leading whitespace because
  they ran on strings.TrimSpace(line). CommonMark allows 0..3 leading
  spaces before a fence marker; 4+ spaces (or any tab in leading
  position, which expands to 4 columns) makes the line indented code
  block content, not a fence open/close. Indented fence-like lines now
  correctly remain prose and blank lines around them are detected.

- The bold/italic check only covered three of the six documented
  combined-emphasis shapes. Added ___text___, __*text*__, and
  *__text__* so parity with the asterisk variants is complete. The
  regex set is now table-driven (combinedEmphasisPatterns) to make
  adding future shapes a one-line change.

Implementation changes:

- New fenceIndentOK(line) helper: returns (body, true) for 0..3 leading
  spaces with no tabs, else (_, false). Used by both codeFenceOpenMarker
  and isCodeFenceClose.
- isCodeFenceClose now counts the fence-char run and accepts any run
  length >= len(marker), with trailing whitespace only.
- checkDocsUpdateBoldItalic replaced three named var regexes with a
  table of six {shape, re} entries and a single early-exit loop.
- Updated docsUpdateWarnings top docstring to list all six shapes.
- Noted the known limitation of stripEscapedEmphasisMarkers around
  doubled backslash escapes ("\\***text***"), which is a false negative
  we accept in exchange for keeping this a simple string replace.

Test additions (docs_update_check_test.go):

- Fence close: longer-marker close correctly ends fence; real prose
  blank after a longer-close fence is still detected.
- Indentation: 4-space indented fence-like line is not a fence open,
  so a surrounding blank line still flags; tab-indented variant same;
  3-space indented fence is still a real fence.
- New shapes: ___text___ positive + all three negative-guards (fenced
  code, inline code, escaped); __*text*__ and *__text__* positive +
  fenced/inline negative-guards; plus two composition tests to ensure
  the strip does not over-sanitize across the six-regex alternative set.

All 53 sub-tests in this file pass; go vet and gofmt are clean.
@fangshuyu-768 fangshuyu-768 merged commit 199aadb into feat/docs-update-semantic-check Apr 21, 2026
2 of 3 checks passed
@fangshuyu-768 fangshuyu-768 deleted the fix/docs-update-check-code-regions branch April 21, 2026 03:21
fangshuyu-768 added a commit that referenced this pull request Apr 21, 2026
* feat(doc): add pre-write semantic warnings to docs +update

Two static checks run before the MCP update-doc call:

1. replace_* + blank-line markdown: replace_range / replace_all only
   swap text inside an existing block — a \n\n in the payload will
   render as literal text, not a paragraph break. Hint to use
   delete_range + insert_before instead.

2. Combined bold+italic emphases (***text***, **_text_**, _**text**_)
   cannot round-trip through Lark and are silently downgraded to a
   single emphasis. Hint to split into two separate emphases.

Both warnings go to stderr and never block the update — they inform,
not gate. Adds table-driven tests for each check plus an aggregation
test, and wires the checks into Execute right before CallMCPTool.

Closes the first batch of items from the docs +update pitfalls
review (Cases 1 and 5).

* fix(doc): exclude code regions and escaped markers from docs +update checks (#578)

* fix(doc): exclude code regions and escaped markers from docs +update checks

Addresses the three review comments on #569: the blank-line paragraph
check and the bold+italic emphasis check both operate on the raw
markdown string, so fenced code blocks / inline code spans / literal
escaped markers produce false-positive warnings on content users
expect to pass through verbatim.

Changes:

- Add proseHasBlankLine(): fence-aware detector that returns true only
  when a blank line sits outside of ```...``` or ~~~...~~~ regions.
  Replaces the raw strings.Contains("\n\n") check in
  checkDocsUpdateReplaceMultilineMarkdown.

- Add stripMarkdownCodeRegions(): blanks out fenced code lines and
  masks inline code spans (via scanInlineCodeSpans from markdown_fix.go)
  with equal-length whitespace so byte offsets outside the stripped
  regions are preserved.

- Add stripEscapedEmphasisMarkers(): removes "\*" and "\_" so literal
  sequences like "\***text***" — which CommonMark renders as a literal
  asterisk plus bold — don't match the combined bold+italic regex.

- Wire both helpers into checkDocsUpdateBoldItalic(): the regex now runs
  on stripEscapedEmphasisMarkers(stripMarkdownCodeRegions(markdown)),
  so code samples and escaped markers are sanitized away before
  detection.

Shared fence-parsing helpers (codeFenceOpenMarker, isCodeFenceClose,
leadingRun) are kept local to this file to avoid touching files outside
the scope of the reviewed PR. If a future change wants to reuse them
across the doc package, they can be promoted then.

Tests:

- TestCheckDocsUpdateReplaceMultilineMarkdown: add 4 negative/positive
  cases — blank line inside backtick and tilde fences (no flag), blank
  line in prose while fence also has blanks (flag wins), fenced code
  with no blank lines (no flag).

- TestCheckDocsUpdateBoldItalic: add 9 cases — ***text*** / **_text_** /
  _**text**_ inside fenced code (backtick and tilde), inside inline
  code spans, and escaped \***text*** / \*\*_text_\*\* (none flagged);
  plus two positive cases to verify the strip doesn't over-sanitize
  (real emphasis in prose still fires when inline/fenced code is nearby).

* fix(doc): close CommonMark gaps and add three more combined-emphasis shapes

Self-review of the first commit turned up three issues:

- isCodeFenceClose was strict on exact marker length. Per CommonMark
  §4.5, a closing fence must be at least as long as the opener, not
  exactly the same length. A 3-backtick open legitimately closed by a
  4-backtick closer (used to embed triple-backticks inside the code
  sample) was left open-ended, causing the rest of the document to be
  treated as code and both checks to silently skip it.

- Both fence helpers accepted any amount of leading whitespace because
  they ran on strings.TrimSpace(line). CommonMark allows 0..3 leading
  spaces before a fence marker; 4+ spaces (or any tab in leading
  position, which expands to 4 columns) makes the line indented code
  block content, not a fence open/close. Indented fence-like lines now
  correctly remain prose and blank lines around them are detected.

- The bold/italic check only covered three of the six documented
  combined-emphasis shapes. Added ___text___, __*text*__, and
  *__text__* so parity with the asterisk variants is complete. The
  regex set is now table-driven (combinedEmphasisPatterns) to make
  adding future shapes a one-line change.

Implementation changes:

- New fenceIndentOK(line) helper: returns (body, true) for 0..3 leading
  spaces with no tabs, else (_, false). Used by both codeFenceOpenMarker
  and isCodeFenceClose.
- isCodeFenceClose now counts the fence-char run and accepts any run
  length >= len(marker), with trailing whitespace only.
- checkDocsUpdateBoldItalic replaced three named var regexes with a
  table of six {shape, re} entries and a single early-exit loop.
- Updated docsUpdateWarnings top docstring to list all six shapes.
- Noted the known limitation of stripEscapedEmphasisMarkers around
  doubled backslash escapes ("\\***text***"), which is a false negative
  we accept in exchange for keeping this a simple string replace.

Test additions (docs_update_check_test.go):

- Fence close: longer-marker close correctly ends fence; real prose
  blank after a longer-close fence is still detected.
- Indentation: 4-space indented fence-like line is not a fence open,
  so a surrounding blank line still flags; tab-indented variant same;
  3-space indented fence is still a real fence.
- New shapes: ___text___ positive + all three negative-guards (fenced
  code, inline code, escaped); __*text*__ and *__text__* positive +
  fenced/inline negative-guards; plus two composition tests to ensure
  the strip does not over-sanitize across the six-regex alternative set.

All 53 sub-tests in this file pass; go vet and gofmt are clean.

---------

Co-authored-by: fangshuyu-768 <shuyufang768@outlook.com>

* fix(doc): address CodeRabbit review on docs +update warnings (#581)

Two CodeRabbit nits from #569:

1. Unit test hint assertion only checked for `delete_range` in the
   remediation message; the companion `insert_before` half of the
   guidance could regress undetected. Broaden the assertion to require
   both tokens so a future edit that drops half the remediation
   produces an immediate test failure.

2. No E2E coverage proved the dry-run contract in the PR description
   ("Not emitted in dry-run mode — kept quiet during planning"). The
   helper itself is unit-tested, but nothing caught a regression where
   a later refactor wired docsUpdateWarnings into the DryRun path.

   Add tests/cli_e2e/docs/docs_update_dryrun_test.go:
   TestDocs_UpdateDryRunSuppressesSemanticWarnings invokes
   `docs +update --dry-run --mode=replace_range --markdown "***x***\n\nb"`
   — an input crafted to trip BOTH pre-write warnings — and asserts
   neither the "warning:" prefix, the blank-line message, nor the
   combined-emphasis message appears on stdout or stderr.

   Note: the file needs -f to add because .gitignore has a bare
   `docs/` rule that accidentally matches tests/cli_e2e/docs/. The
   existing tracked files under that directory predate the rule; new
   additions have to be force-added until the ignore pattern is
   narrowed. Not worth rewriting .gitignore for one file.

Verified manually that the new E2E fails cleanly when warnings are
injected into DryRun and passes again after reverting — the test has
real regression-detection power, not just a sticker.

Co-authored-by: fangshuyu-768 <shuyufang768@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain/ccm PR touches the ccm domain size/M Single-domain feat or fix with limited business impact

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants