fix(doc): preserve round-trip formatting in fetch output #469

Merged
fangshuyu-768 merged 4 commits into larksuite:main from GaoSSR:fix/docs-fetch-roundtrip-formatting
Apr 20, 2026

Conversation

@GaoSSR
Contributor

@GaoSSR GaoSSR commented Apr 14, 2026

Summary

Fixes a docs round-trip formatting regression in docs +fetch output. Some exported markdown could not be safely re-imported with docs +create / docs +update, which could flatten nested lists and surface inline formatting as literal markdown or text tags.

Changes

  • Trim leading spaces after opening bold and italic delimiters in fetched markdown
  • Normalize space-pair-indented nested list markers to tab indentation before returning fetched markdown
  • Add regression tests for emphasis spacing, text-tag emphasis, and nested list indentation

Test Plan

  • go test ./shortcuts/doc
  • go test ./shortcuts/doc -run 'TestFixExportedMarkdown|TestFixTopLevelSoftbreaks|TestFixBoldSpacing|TestNormalizeNestedListIndentation'
  • go run github.com/golangci/golangci-lint/v2/cmd/golangci-lint@v2.1.6 run --new-from-rev=origin/main
  • make build
  • Manual local verification: ./lark-cli docs +fetch --doc 'KHCydlulPoZosix0m0ccKPWnnsg' --as user now emits **SFT ...** instead of ** SFT ...**, and nested ordered list items are tab-indented instead of ambiguous space-pair indentation
  • make unit-test still fails locally in unchanged shortcuts/minutes download tests with blocked download URL: local/internal host is not allowed

Related Issues

  • None

Summary by CodeRabbit

  • New Features

    • More consistent nested list rendering by normalizing space-pair indentation into tab-style nesting.
  • Bug Fixes

    • Improved emphasis spacing: trims stray internal spaces in bold/italic while preserving ambiguous patterns, multi-asterisk cases, and inline code spans.
    • Emphasis normalization is more robust with Unicode-aware trimming and avoids altering code blocks.
  • Tests

    • Added/expanded tests covering nested lists, varied emphasis spacing, and inline-code interactions.

@CLAassistant

CLAassistant commented Apr 14, 2026

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions Bot added domain/ccm PR touches the ccm domain size/M Single-domain feat or fix with limited business impact labels Apr 14, 2026
@coderabbitai

coderabbitai Bot commented Apr 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds a nested-list indentation normalization pass and a UTF‑8‑aware emphasis spacing normalizer; both run only outside fenced code blocks (and inline code spans). Tests added for emphasis trimming and nested list indentation behaviors.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Markdown Processing Logic (shortcuts/doc/markdown_fix.go) | Inserted applyOutsideCodeFences(..., normalizeNestedListIndentation) into fixExportedMarkdown; replaced regex-only emphasis trimming with segment-level UTF‑8-aware fixEmphasisSpacingSegment and helpers (nextAsteriskRun, normalizeEmphasisPayload, normalizeNestedListIndentation, hasPreviousNonBlankListItem). Emphasis fixes skip inline code and avoid altering 3+ asterisk cases or payloads containing *. Nested-list normalization converts even-numbered leading-space "pairs" into tab indentation when a previous non-blank list item exists. |
| Test Suite (shortcuts/doc/markdown_fix_test.go) | Added TestFixBoldSpacing table cases covering single leading/trailing internal spaces for */**, preservation of ambiguous patterns and inline code, and a tag-contained case; added TestNormalizeNestedListIndentation validating space-pair→tab conversion and asserting no change for top-level lists, isolated indents, loose-list siblings, and indented code blocks. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • fangshuyu-768
  • SunPeiYang996

Poem

🐰 I hop through lists and tidy their tracks,
I swap space pairs for tabby slacks.
I nibble stray gaps from bold and from star,
I guard code spans — I touch them not, far.
Happy markdown, I bound and clap! 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: preserving round-trip formatting in fetch output by fixing markdown export/import regression. |
| Description check | ✅ Passed | The pull request description covers all required template sections: Summary explains the regression, Changes lists the key modifications, Test Plan details verification steps with checkmarks, and Related Issues is noted as None. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@shortcuts/doc/markdown_fix.go`:
- Around line 307-329: The current normalizeNestedListIndentation function
converts any even-space-indented list marker regardless of surrounding context;
change it to only rewrite indentation when the marker is nested under a
preceding list item. Before replacing indent in normalizeNestedListIndentation,
find the previous non-blank line and confirm it is a list item (use a list-item
detection regex similar to nestedListIndentRe but allowing leading tabs/spaces
and matching markers like [-*+]|\d+[.)]); if there is no previous non-blank list
item (start of file or top-level paragraph), skip conversion so top-level
even-space indents are preserved. Keep all other behavior (even-pair -> tabs
conversion, skipping odd counts) the same and reference nestedListIndentRe and
normalizeNestedListIndentation when making the change.
- Around line 126-129: The leading-space regexes (boldLeadingSpaceRe and
italicLeadingSpaceRe) are too permissive and match ambiguous spans like "* x *";
tighten them by requiring the captured inner text to end with a non-space
character (change the second group from ([^*\n].*?) to ([^*\n].*?\S)) so the
leading-space pass won't consume text that has a trailing space before the
closer and thus won't convert literal " * x * " / " ** x ** " into emphasis;
keep boldTrailingSpaceRe and italicTrailingSpaceRe as-is.
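
The regex tightening proposed in the second comment can be tried in isolation. The sketch below is a reconstruction under assumptions: only the capture groups `([^*\n].*?)` and `([^*\n].*?\S)` come from the review. The rest of each pattern, the replacement string, and the names `looseBoldLeadRe`, `tightBoldLeadRe`, and `trimLead` are hypothetical and may not match markdown_fix.go.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Assumed shape of the original, permissive pattern.
	looseBoldLeadRe = regexp.MustCompile(`\*\*[ \t]+([^*\n].*?)\*\*`)
	// Same pattern with the capture group required to end in a non-space.
	tightBoldLeadRe = regexp.MustCompile(`\*\*[ \t]+([^*\n].*?\S)\*\*`)
)

// trimLead drops the whitespace right after an opening ** wherever re matches.
func trimLead(re *regexp.Regexp, s string) string {
	return re.ReplaceAllString(s, "**$1**")
}

func main() {
	for _, in := range []string{"** SFT data**", "** x **"} {
		fmt.Printf("%q loose=%q tight=%q\n",
			in, trimLead(looseBoldLeadRe, in), trimLead(tightBoldLeadRe, in))
	}
}
```

With these assumed patterns, the loose variant rewrites the ambiguous `** x **` to `**x **`, while the tight variant leaves it alone and still trims `** SFT data**`. A side effect of requiring the trailing `\S` in this sketch is that a one-character payload such as `** x**` no longer matches either.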

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 43193614-e642-48df-a695-4340398bc951

📥 Commits

Reviewing files that changed from the base of the PR and between 20761fa and 6c8adfc.

📒 Files selected for processing (2)
  • shortcuts/doc/markdown_fix.go
  • shortcuts/doc/markdown_fix_test.go

Comment thread shortcuts/doc/markdown_fix.go Outdated
Comment thread shortcuts/doc/markdown_fix.go
GaoSSR added 2 commits April 19, 2026 18:46
- trim leading spaces inside bold and italic emphasis exported by docs +fetch

- normalize nested list indentation to avoid flattening and literal text on re-import

- add regression tests for emphasis spacing and nested list indentation
- keep literal * x * and ** x ** text unchanged

- only normalize indented nested list markers when a parent list item exists

- add regression coverage for both CodeRabbit findings
@GaoSSR GaoSSR force-pushed the fix/docs-fetch-roundtrip-formatting branch from 511d8bc to f0ce393 Compare April 19, 2026 10:47

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
shortcuts/doc/markdown_fix.go (1)

114-133: Docstring for fixBoldSpacing is stale after the regex expansion.

The comment still enumerates only "two issues" (trailing ** whitespace, redundant heading bold), but the function now also normalizes leading whitespace inside **…**, and the full set of italic (*…*) leading/trailing cases via italicLeadingSpaceRe / italicTrailingSpaceRe. Consider refreshing the doc block so future readers can see at a glance what each regex covers (and that italic spans are handled too).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/doc/markdown_fix.go` around lines 114 - 133, Update the docblock
for fixBoldSpacing to reflect all current behaviors: list normalization of
leading whitespace inside bold markers (boldLeadingSpaceRe), trimming trailing
whitespace before closing bold (boldTrailingSpaceRe), analogous leading/trailing
normalization for italic spans (italicLeadingSpaceRe and italicTrailingSpaceRe),
and removal of redundant bold in ATX headings (headingBoldRe); keep the note
about skipping inline code spans and briefly describe each regex's purpose so
readers can quickly see what each symbol handles.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 23241f93-3586-4a2b-8f1e-e6534f0df5ea

📥 Commits

Reviewing files that changed from the base of the PR and between 511d8bc and f0ce393.

📒 Files selected for processing (2)
  • shortcuts/doc/markdown_fix.go
  • shortcuts/doc/markdown_fix_test.go
✅ Files skipped from review due to trivial changes (1)
  • shortcuts/doc/markdown_fix_test.go

Collaborator

@fangshuyu-768 fangshuyu-768 left a comment

Thanks for the formatting fix. I left a few edge-case concerns around nested list normalization and the missing regression coverage.

Comment thread shortcuts/doc/markdown_fix.go
Comment thread shortcuts/doc/markdown_fix.go
Comment thread shortcuts/doc/markdown_fix_test.go
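- Stop the upward search for a parent list item at a blank line, so a loose-list sibling is not mistakenly rewritten as a nested list item
- Avoid rewriting a four-space-indented code block inside a list item as a tab-indented list item
- Add two regression tests and update the fixBoldSpacing comment to match the current implementation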
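
The nested-list heuristic discussed in this review (even space-pair indent, a parent list item required, upward search stopped by a blank line) can be sketched as follows. This is an illustration, not the merged code: the regexes and the names `normalizeNestedIndent` and `hasParentListItem` are assumptions, and the sketch does not model any separate rule the PR may use for four-space indented code blocks.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// listItemRe matches a list marker after optional leading tabs/spaces.
var listItemRe = regexp.MustCompile(`^[ \t]*(?:[-*+]|\d+[.)])[ \t]+`)

// spaceIndentedItemRe captures the leading spaces and the rest of a
// space-indented list item.
var spaceIndentedItemRe = regexp.MustCompile(`^( +)((?:[-*+]|\d+[.)])[ \t]+.*)$`)

// hasParentListItem walks upward from line i and reports whether a list item
// is found before a blank line or the start of the input.
func hasParentListItem(lines []string, i int) bool {
	for j := i - 1; j >= 0; j-- {
		if strings.TrimSpace(lines[j]) == "" {
			return false // blank line: treat as a loose-list boundary
		}
		if listItemRe.MatchString(lines[j]) {
			return true
		}
	}
	return false
}

// normalizeNestedIndent rewrites each pair of leading spaces before a nested
// list marker as one tab. Odd indents and parentless items are left alone.
func normalizeNestedIndent(md string) string {
	lines := strings.Split(md, "\n")
	for i, line := range lines {
		m := spaceIndentedItemRe.FindStringSubmatch(line)
		if m == nil || len(m[1])%2 != 0 || !hasParentListItem(lines, i) {
			continue
		}
		lines[i] = strings.Repeat("\t", len(m[1])/2) + m[2]
	}
	return strings.Join(lines, "\n")
}

func main() {
	fmt.Printf("%q\n", normalizeNestedIndent("1. a\n  1. b\n    - c"))
	fmt.Printf("%q\n", normalizeNestedIndent("- a\n\n  - b"))
}
```

Stopping at the first blank line is the conservative choice: it may skip some genuinely nested items in loose lists, but it avoids turning a sibling or an indented block into a nested item on re-import.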
Collaborator

@fangshuyu-768 fangshuyu-768 left a comment

My earlier comments on the nested-list heuristic are all resolved — confirmed by code + new test cases. Taking a fresh look at the new emphasis (bold/italic) leading-space rules, I spotted three issues worth addressing before merge. All verified by tracing the regexes by hand against the code paths in fixBoldSpacingLine.

Comment thread shortcuts/doc/markdown_fix.go Outdated
Comment thread shortcuts/doc/markdown_fix.go Outdated
Comment thread shortcuts/doc/markdown_fix.go Outdated
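- Change fixBoldSpacingLine to scan by asterisk runs, fixing whitespace cleanup for ** hello **, * hello *, and multiple italic spans on the same line
- Preserve inline code, headings, and near-miss literals such as *** hello** to avoid incorrectly altering emphasis nesting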

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
shortcuts/doc/markdown_fix.go (1)

218-222: Docstring slightly misdescribes the scanner's payload rule.

The comment says "any candidate whose payload contains another asterisk" is ignored, but by construction payload := seg[openEnd:closeStart] where closeStart is the next asterisk run — payloads can never contain *. The real rule is that a mismatched closing run length (e.g. opener *, next run **) aborts normalization and the opener is emitted literally, which is what keeps nested‑emphasis‑like text untouched. Consider rewording for accuracy; no code change needed.

📝 Proposed docstring tweak
 // fixEmphasisSpacingSegment trims only the whitespace immediately inside simple
 // *...* and **...** spans. It deliberately ignores runs of 3+ asterisks and
-// any candidate whose payload contains another asterisk so nested emphasis-like
-// text remains untouched. When both inner sides contain whitespace, single-rune
-// payloads are preserved as literal text (for example "* x *" and "** x **").
+// any candidate whose next asterisk run has a different length than the opener,
+// so nested or unbalanced emphasis-like text remains untouched. When both inner
+// sides contain whitespace, single-rune payloads are preserved as literal text
+// (for example "* x *" and "** x **").
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/doc/markdown_fix.go` around lines 218 - 222, The docstring for
fixEmphasisSpacingSegment is inaccurate about payloads containing asterisks;
update it to state that payload := seg[openEnd:closeStart] can never contain '*'
because closeStart points to the next asterisk run, and the actual rule that
prevents normalization is a mismatched closing run length (e.g., opener '*' vs
next run '**') which causes the opener to be emitted literally and preserves
nested-emphasis-like text—adjust the comment to describe this exact behavior and
reference variables seg, openEnd, closeStart and the mismatched run length rule
instead of saying the payload may contain another asterisk.
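
The scanner rule described in this comment (the payload ends at the next asterisk run, and a mismatched run length emits the opener literally) can be sketched as below. This is an approximation for a single line of text: `scanAsteriskRun`, `normalizePayload`, and `trimEmphasisSpacing` are hypothetical names, inline code spans are not handled, and the real fixEmphasisSpacingSegment may behave differently in other edge cases.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// scanAsteriskRun returns the [start, end) byte range of the first run of '*'
// at or after from, or (-1, -1) if there is none.
func scanAsteriskRun(s string, from int) (int, int) {
	i := strings.IndexByte(s[from:], '*')
	if i < 0 {
		return -1, -1
	}
	start := from + i
	end := start
	for end < len(s) && s[end] == '*' {
		end++
	}
	return start, end
}

// normalizePayload trims Unicode whitespace on both sides, but keeps
// whitespace-only payloads and single-rune payloads padded on both sides
// (for example " x ") exactly as they are.
func normalizePayload(p string) string {
	trimmed := strings.TrimFunc(p, unicode.IsSpace)
	lead := strings.TrimLeftFunc(p, unicode.IsSpace) != p
	trail := strings.TrimRightFunc(p, unicode.IsSpace) != p
	if trimmed == "" || (lead && trail && utf8.RuneCountInString(trimmed) == 1) {
		return p
	}
	return trimmed
}

// trimEmphasisSpacing normalizes simple *...* and **...** spans in one line.
func trimEmphasisSpacing(seg string) string {
	var b strings.Builder
	pos := 0
	for pos < len(seg) {
		openStart, openEnd := scanAsteriskRun(seg, pos)
		if openStart < 0 {
			break
		}
		b.WriteString(seg[pos:openStart])
		closeStart, closeEnd := scanAsteriskRun(seg, openEnd)
		n := openEnd - openStart
		if n > 2 || closeStart < 0 || closeEnd-closeStart != n {
			// 3+ asterisks, no closer, or mismatched closer length:
			// emit the opener literally and keep scanning after it.
			b.WriteString(seg[openStart:openEnd])
			pos = openEnd
			continue
		}
		b.WriteString(seg[openStart:openEnd])
		b.WriteString(normalizePayload(seg[openEnd:closeStart]))
		b.WriteString(seg[closeStart:closeEnd])
		pos = closeEnd
	}
	b.WriteString(seg[pos:])
	return b.String()
}

func main() {
	for _, in := range []string{"** SFT data**", "* hello *", "* x *", "*** hello**"} {
		fmt.Printf("%q -> %q\n", in, trimEmphasisSpacing(in))
	}
}
```

Because the closer is by construction the next asterisk run, the payload slice never contains `*`; the only decision left is whether the two run lengths agree.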

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d2bea55d-59a6-4a59-bd2f-a85abcbc299f

📥 Commits

Reviewing files that changed from the base of the PR and between 6b92fb7 and 2f734f8.

📒 Files selected for processing (2)
  • shortcuts/doc/markdown_fix.go
  • shortcuts/doc/markdown_fix_test.go
✅ Files skipped from review due to trivial changes (1)
  • shortcuts/doc/markdown_fix_test.go

fangshuyu-768 added a commit that referenced this pull request Apr 20, 2026
Adds 5 invariant-level tests on top of #469's transforms:

- TestFixExportedMarkdownIdempotent — f(f(x)) == f(x) across rich
  fixtures (kitchen sink, CJK, nested containers). Protects the core
  round-trip promise from future transform interactions that rewrite
  their own output.
- TestFixExportedMarkdownPreservesFencedCodeByteForByte — packs every
  pipeline-touching shape into a fence and asserts byte-identical output.
  Code samples must never be silently rewritten by a formatting pass.
- TestFixExportedMarkdownPreservesCRLF — CRLF input preserves line
  endings AND still triggers transforms. Windows-authored markdown
  should not be silently LF-normalized.
- TestFixExportedMarkdownTransformInteractions — composition regressions:
  nested-list + trailing-space bold, text→list transition, callout
  containing list with emphasis, heading vs paragraph bold.
- TestNormalizeNestedListIndentationDocumentedSkips — locks in the
  deliberate no-op branches (odd-space indent, blank-line loose-list
  sibling, 4-space indented code block, parentless two-space) as an
  explicit spec so future heuristic tweaks surface in the test diff.

All transforms, fixtures, and expectations are derived from the head of
PR #469. No production code changes.
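
The idempotence invariant named in this commit message, f(f(x)) == f(x), can be checked with a small generic helper. The sketch below is not taken from the repository: `firstNonIdempotent` and both stand-in transforms are hypothetical and exist only to show what such a check catches.

```go
package main

import (
	"fmt"
	"strings"
)

// firstNonIdempotent returns the first fixture x for which f(f(x)) != f(x),
// and false when every fixture is stable after one application.
func firstNonIdempotent(f func(string) string, fixtures []string) (string, bool) {
	for _, x := range fixtures {
		once := f(x)
		if f(once) != once {
			return x, true
		}
	}
	return "", false
}

// trimTrailingSpaces is a stand-in transform that is idempotent.
func trimTrailingSpaces(md string) string {
	lines := strings.Split(md, "\n")
	for i, l := range lines {
		lines[i] = strings.TrimRight(l, " \t")
	}
	return strings.Join(lines, "\n")
}

// halveSpaces is a stand-in transform that is NOT idempotent: four spaces
// become two on the first pass and one on the second.
func halveSpaces(md string) string {
	return strings.ReplaceAll(md, "  ", " ")
}

func main() {
	fixtures := []string{"- a  \n\t- b", "a    b"}
	_, bad := firstNonIdempotent(trimTrailingSpaces, fixtures)
	fmt.Println("trimTrailingSpaces rewrites its own output:", bad)
	x, bad := firstNonIdempotent(halveSpaces, fixtures)
	fmt.Printf("halveSpaces rewrites its own output: %v (fixture %q)\n", bad, x)
}
```

A transform that keeps changing its own output would make repeated fetch and update cycles drift, which is why this kind of check guards the round-trip promise.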
@codecov

codecov Bot commented Apr 20, 2026

Codecov Report

❌ Patch coverage is 93.67089% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.42%. Comparing base (1262aac) to head (2f734f8).
⚠️ Report is 6 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| shortcuts/doc/markdown_fix.go | 93.67% | 3 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #469      +/-   ##
==========================================
+ Coverage   60.19%   60.42%   +0.22%     
==========================================
  Files         390      393       +3     
  Lines       33433    33657     +224     
==========================================
+ Hits        20125    20336     +211     
- Misses      11426    11433       +7     
- Partials     1882     1888       +6     

@github-actions

🚀 PR Preview Install Guide

🧰 CLI update

npm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@2f734f8937590efec4218f371a56a72eb7b84eda

🧩 Skill update

npx skills add GaoSSR/cli#fix/docs-fetch-roundtrip-formatting -y -g

@fangshuyu-768 fangshuyu-768 merged commit 293a9f8 into larksuite:main Apr 20, 2026
17 checks passed
@GaoSSR
Contributor Author

GaoSSR commented Apr 20, 2026

@fangshuyu-768 Thanks for merging

fangshuyu-768 added a commit that referenced this pull request Apr 20, 2026
Co-authored-by: fangshuyu-768 <shuyufang768@outlook.com>
@liangshuo-1 liangshuo-1 mentioned this pull request Apr 21, 2026
3 tasks
@coderabbitai coderabbitai Bot mentioned this pull request Apr 27, 2026
2 tasks