feat(doc): convert lark-table XML to markdown tables in +fetch output by hanBufan · Pull Request #585 · larksuite/cli

hanBufan · 2026-04-21T08:08:22Z

Convert Feishu/Lark XML tables (<lark-table>, <lark-tr>, <lark-td>) to standard Markdown tables in docs +fetch output (applies to all formats).

Features:

Converts simple tables with rows and cells
Handles empty cells
Escapes pipe characters (|)
Converts multiline cell content to <br/>
Skips tables with merged cells (colspan/rowspan)
Skips tables inside fenced code blocks
Handles multiple tables per document
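
The feature list above can be sketched as a single regex-driven pass. This is a minimal illustration, not the PR's actual code: `convertLarkTables` is a hypothetical stand-in for `fixLarkTables`, and the row and cell patterns are assumptions (only the table pattern and the cell-handling steps appear in the review excerpts).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// The table pattern matches the one quoted in the review; the row and cell
// patterns are assumptions made for this sketch.
var (
	tableRe = regexp.MustCompile(`(?s)<lark-table[^>]*>(.*?)</lark-table>`)
	rowRe   = regexp.MustCompile(`(?s)<lark-tr[^>]*>(.*?)</lark-tr>`)
	cellRe  = regexp.MustCompile(`(?s)<lark-td[^>]*>(.*?)</lark-td>`)
)

// convertLarkTables is a hypothetical stand-in for the PR's fixLarkTables.
func convertLarkTables(md string) string {
	return tableRe.ReplaceAllStringFunc(md, func(table string) string {
		// Merged cells cannot be expressed in a Markdown table: keep the XML.
		if strings.Contains(table, "colspan=") || strings.Contains(table, "rowspan=") {
			return table
		}
		var rows []string
		colCount := 0
		for _, row := range rowRe.FindAllStringSubmatch(table, -1) {
			var cells []string
			for _, cell := range cellRe.FindAllStringSubmatch(row[1], -1) {
				content := strings.TrimSpace(cell[1])
				content = strings.ReplaceAll(content, "\n", "<br/>") // multiline cells
				content = strings.ReplaceAll(content, "|", `\|`)     // escape pipes
				cells = append(cells, content)
			}
			rows = append(rows, "| "+strings.Join(cells, " | ")+" |")
			if len(cells) > colCount {
				colCount = len(cells)
			}
		}
		if len(rows) == 0 {
			return table
		}
		// The first row is treated as the header, followed by a separator row.
		separator := "|" + strings.Repeat(" --- |", colCount)
		out := append([]string{rows[0], separator}, rows[1:]...)
		return strings.Join(out, "\n")
	})
}

func main() {
	in := "<lark-table><lark-tr><lark-td>Name</lark-td><lark-td>Note</lark-td></lark-tr>" +
		"<lark-tr><lark-td>a|b</lark-td><lark-td></lark-td></lark-tr></lark-table>"
	fmt.Println(convertLarkTables(in))
}
```

The sketch keeps the XML untouched whenever it cannot produce a faithful Markdown table (merged cells, no rows), which matches the "skip" behavior described in the list.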

Summary

Changes

Change 1
Change 2

Test Plan

Unit tests pass
Manual local verification confirms the lark xxx command works as expected

Related Issues

None

Summary by CodeRabbit

New Features
- Table markup is now automatically converted to proper Markdown table format during document export, with proper handling of special characters and multi-line cell content.

coderabbitai · 2026-04-21T08:08:36Z

Walkthrough

This pull request adds support for converting Lark-format HTML-like table markup (<lark-table>) to standard Markdown tables. The new fixLarkTables function is integrated into the markdown post-processing pipeline, escaping pipes and normalizing cell content while preserving tables with colspan/rowspan attributes.
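
The pipeline applies the conversion only outside fenced code blocks, via the repository's `applyOutsideCodeFences` helper, whose implementation is not shown in this PR. The following is a rough sketch of how such a helper could work; `transformOutsideFences` and `opensFence` are hypothetical names, and the fence detection is deliberately simplified.

```go
package main

import (
	"fmt"
	"strings"
)

// backtickFence is built with Repeat so this example contains no literal fence.
var backtickFence = strings.Repeat("`", 3)

// opensFence reports whether a trimmed line starts a fenced code block.
// Simplification: only three-character backtick and tilde fences are handled.
func opensFence(trimmed string) bool {
	return strings.HasPrefix(trimmed, backtickFence) || strings.HasPrefix(trimmed, "~~~")
}

// transformOutsideFences applies fn to each run of lines outside fenced code
// blocks and copies fenced lines through unchanged. Hypothetical helper.
func transformOutsideFences(md string, fn func(string) string) string {
	var out, pending []string
	fence := "" // marker of the currently open fence; empty when outside
	flush := func() {
		if len(pending) > 0 {
			out = append(out, fn(strings.Join(pending, "\n")))
			pending = nil
		}
	}
	for _, line := range strings.Split(md, "\n") {
		trimmed := strings.TrimSpace(line)
		switch {
		case fence != "":
			out = append(out, line)
			if strings.HasPrefix(trimmed, fence) {
				fence = "" // closing fence reached
			}
		case opensFence(trimmed):
			flush()
			fence = trimmed[:3]
			out = append(out, line)
		default:
			pending = append(pending, line)
		}
	}
	flush()
	return strings.Join(out, "\n")
}

func main() {
	md := "<t>\n" + backtickFence + "\n<t>\n" + backtickFence
	// Only the first <t> is transformed; the fenced copy is left as-is.
	fmt.Println(transformOutsideFences(md, strings.ToUpper))
}
```

Passing the transformation as a function keeps the fence tracking in one place, so each fix in the pipeline does not need its own code-block handling.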

Changes

Cohort / File(s)	Summary
Table Conversion Logic `shortcuts/doc/markdown_fix.go`	Introduced `fixLarkTables` function that converts `<lark-table>...</lark-table>` blocks to Markdown table syntax. Replaces `<lark-tr>` and `<lark-td>` with pipe-delimited rows, escapes `\|` characters, converts newlines to `<br/>`, and skips tables with `colspan` or `rowspan` attributes. Integrated as the first transformation in `fixExportedMarkdown` via `applyOutsideCodeFences`.
Unit Tests `shortcuts/doc/markdown_fix_test.go`	Added `TestFixLarkTables` and `TestFixLarkTablesIntegrated` covering: basic table conversion, empty cells, pipe escaping, multiline content handling, merged-cell preservation, fenced code block protection, whitespace normalization, tag attribute stripping, and end-to-end pipeline integration.
Idempotency Test `shortcuts/doc/markdown_fix_hardening_test.go`	Added fixture case `"lark-table converted to markdown"` to `TestFixExportedMarkdownIdempotent`, verifying that repeated application of `fixExportedMarkdown` produces consistent results for Lark table markup.
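
The idempotency property mentioned in the last row means that running the fix twice yields the same text as running it once. A minimal sketch of that check follows; `isIdempotent`, `collapseSpaces`, and `addPrefix` are hypothetical names used only for illustration and are not taken from the repository's test suite.

```go
package main

import (
	"fmt"
	"strings"
)

// isIdempotent reports whether applying fix twice gives the same result as
// applying it once for the given input.
func isIdempotent(fix func(string) string, input string) bool {
	once := fix(input)
	return fix(once) == once
}

// collapseSpaces is a toy transformation that is idempotent.
func collapseSpaces(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

// addPrefix is a toy transformation that is deliberately not idempotent.
func addPrefix(s string) string {
	return "> " + s
}

func main() {
	fmt.Println(isIdempotent(collapseSpaces, "a   b")) // stable after one pass
	fmt.Println(isIdempotent(addPrefix, "a"))          // grows on every pass
}
```

For the table conversion, the property holds as long as the converted Markdown no longer contains anything the pass would match again.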

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

fix(doc): post-process docs +fetch output to improve round-trip fidelity #214: Extends the markdown post-processing pipeline by adding fixLarkTables transformation to the same function.
test(doc): harden markdown_fix pipeline with invariant tests #576: Adds idempotency test fixture to the same hardening test file introduced in this PR.
fix(doc): preserve round-trip formatting in fetch output #469: Previously modified fixExportedMarkdown to insert applyOutsideCodeFences transformations using the same pattern.

Suggested labels

size/M, domain/ccm

Suggested reviewers

fangshuyu-768
SunPeiYang996

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: converting lark-table XML elements to markdown tables in the +fetch output, which is the primary focus of all file modifications.
Description check	✅ Passed	The description covers the main feature, implementation details, and uses the required template sections (Summary, Changes, Test Plan, Related Issues), though Changes section is incomplete with placeholder text.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

CLAassistant · 2026-04-21T08:08:37Z

All committers have signed the CLA.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

shortcuts/doc/markdown_fix.go (1)

45-50: ⚠️ Potential issue | 🔴 Critical

Keep converted table rows contiguous through the later softbreak pass.

Because fixLarkTables runs before fixTopLevelSoftbreaks, the converted rows (| Header |, | --- |, | Data |) are treated as adjacent top-level paragraphs and get blank lines inserted between them. That breaks the Markdown table in the actual +fetch output; TestFixLarkTablesIntegrated would still pass because it only checks substrings.

🐛 One way to preserve converted Markdown table blocks

 func fixTopLevelSoftbreaks(md string) string {
 	lines := strings.Split(md, "\n")
 	out := make([]string, 0, len(lines)*2)
@@
-					if prev != "" && !isTableStructuralTag(prev) {
+					if prev != "" && !isTableStructuralTag(prev) && !areConsecutiveMarkdownTableRows(prev, trimmed) {
 						out = append(out, "")
 					}
 				}
 			}
 		}
@@
 	return strings.Join(out, "\n")
 }
+
+func areConsecutiveMarkdownTableRows(prev, current string) bool {
+	return isMarkdownTableRow(prev) && isMarkdownTableRow(current)
+}
+
+func isMarkdownTableRow(line string) bool {
+	return strings.HasPrefix(line, "|") && strings.Count(line, "|") >= 2
+}
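
The comment notes that a substring-based test can pass even when blank lines split the table. A stricter assertion would check that the expected rows appear on consecutive lines. The sketch below shows one way to do that; `hasContiguousRows` is a hypothetical helper, not part of the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// hasContiguousRows reports whether wantRows appear in md as consecutive
// lines, in order, with nothing (not even a blank line) between them.
func hasContiguousRows(md string, wantRows []string) bool {
	lines := strings.Split(md, "\n")
	for i := 0; i+len(wantRows) <= len(lines); i++ {
		match := true
		for j, row := range wantRows {
			if strings.TrimSpace(lines[i+j]) != row {
				match = false
				break
			}
		}
		if match {
			return true
		}
	}
	return false
}

func main() {
	rows := []string{"| Header |", "| --- |", "| Data |"}
	intact := "| Header |\n| --- |\n| Data |"
	split := "| Header |\n\n| --- |\n\n| Data |"
	fmt.Println(hasContiguousRows(intact, rows)) // rows are adjacent
	fmt.Println(hasContiguousRows(split, rows))  // blank lines break the table
}
```

A check like this would fail on the split output, whereas testing each row with `strings.Contains` would pass on both inputs.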

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@shortcuts/doc/markdown_fix.go` around lines 45 - 50, Reorder the passes so
converted Markdown table rows are not split by the softbreak pass: move the
applyOutsideCodeFences(md, fixLarkTables) call to run after
fixTopLevelSoftbreaks (i.e., call md = fixTopLevelSoftbreaks(md) before applying
fixLarkTables), or equivalently ensure fixTopLevelSoftbreaks skips table blocks
created by fixLarkTables; update the sequence that currently lists
applyOutsideCodeFences(md, fixLarkTables) before fixTopLevelSoftbreaks to
preserve contiguous table rows.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@shortcuts/doc/markdown_fix.go`:
- Around line 558-560: The current exact substring checks on tableMatch miss
cases like `colspan = "2"` or `ROWSPAN="2"`; update the merged-cell detection to
use a case-insensitive regex that allows optional whitespace around the equals
sign (for example `(?i)\b(colspan|rowspan)\b\s*=`) instead of the two
strings.Contains checks. Replace the block that tests tableMatch for "colspan="
and "rowspan=" with a precompiled regexp (e.g. regexp.MustCompile) and use its
MatchString on tableMatch to decide whether to return tableMatch unchanged.
- Around line 582-606: Detect and preserve the original EOL when converting the
table: check for CRLF by testing if tableMatch contains "\r\n" and set a local
eol variable (e.g., eol := "\n" or "\r\n"); when transforming multiline cell
content in the content variable, replace "\r\n" first with "<br/>"+eol and then
replace remaining "\n" with "<br/>"+eol (so you don't leave a stray '\r' before
the <br/>); finally use strings.Join(mdRows, eol) when returning the assembled
table instead of joining with "\n" so mdRows, separator, and the overall return
preserve the document's original CRLF or LF; update references in this block
using content, mdRows, separator, colCount, and tableMatch.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 838a0b6f-762d-4685-952c-08835e5f03f2

📥 Commits

Reviewing files that changed from the base of the PR and between fbed6be and 498351f.

📒 Files selected for processing (3)

shortcuts/doc/markdown_fix.go
shortcuts/doc/markdown_fix_hardening_test.go
shortcuts/doc/markdown_fix_test.go

coderabbitai · 2026-04-21T08:13:47Z

+		// Check for merged cells - if present, skip conversion and keep XML
+		if strings.Contains(tableMatch, "colspan=") || strings.Contains(tableMatch, "rowspan=") {
+			return tableMatch


⚠️ Potential issue | 🟡 Minor

Detect merged-cell attributes with XML-compatible spacing/case.

The current exact substring check misses merged cells like colspan = "2" or ROWSPAN="2", so those tables would be converted instead of skipped.

🛡️ Proposed robust merged-cell detection

+var larkMergedCellAttrRe = regexp.MustCompile(`(?i)\b(?:colspan|rowspan)\s*=`)
+
 func fixLarkTables(md string) string {
 	// Match entire <lark-table>...</lark-table> blocks
 	tableRe := regexp.MustCompile(`(?s)<lark-table[^>]*>(.*?)</lark-table>`)
 	return tableRe.ReplaceAllStringFunc(md, func(tableMatch string) string {
 		// Check for merged cells - if present, skip conversion and keep XML
-		if strings.Contains(tableMatch, "colspan=") || strings.Contains(tableMatch, "rowspan=") {
+		if larkMergedCellAttrRe.MatchString(tableMatch) {
 			return tableMatch
 		}
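
To make the difference concrete, the sketch below runs both checks on a few attribute spellings. The regex is the one proposed in this comment; the function names `hasMergedCellsNaive` and `hasMergedCells` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// mergedCellAttrRe is the pattern proposed in the review: case-insensitive,
// with optional whitespace before the equals sign.
var mergedCellAttrRe = regexp.MustCompile(`(?i)\b(?:colspan|rowspan)\s*=`)

// hasMergedCellsNaive mirrors the exact-substring check from the PR.
func hasMergedCellsNaive(table string) bool {
	return strings.Contains(table, "colspan=") || strings.Contains(table, "rowspan=")
}

// hasMergedCells uses the regex-based detection instead.
func hasMergedCells(table string) bool {
	return mergedCellAttrRe.MatchString(table)
}

func main() {
	samples := []string{
		`<lark-td colspan="2">`,
		`<lark-td colspan = "2">`,
		`<lark-td ROWSPAN="2">`,
		`<lark-td>`,
	}
	for _, s := range samples {
		fmt.Printf("%-26s naive=%-5v regex=%v\n", s, hasMergedCellsNaive(s), hasMergedCells(s))
	}
}
```

Both checks scan the whole table match rather than only the tag attributes, so cell text that happens to contain something like `colspan=` would also cause the table to be skipped. That errs on the side of keeping the XML.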

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@shortcuts/doc/markdown_fix.go` around lines 558 - 560, The current exact substring checks on tableMatch miss cases like `colspan = "2"` or `ROWSPAN="2"`; update the merged-cell detection to use a case-insensitive regex that allows optional whitespace around the equals sign (for example `(?i)\b(colspan|rowspan)\b\s*=`) instead of the two strings.Contains checks. Replace the block that tests tableMatch for "colspan=" and "rowspan=" with a precompiled regexp (e.g. regexp.MustCompile) and use its MatchString on tableMatch to decide whether to return tableMatch unchanged.

coderabbitai · 2026-04-21T08:13:48Z

+				content := strings.TrimSpace(cell[1])
+				// Handle multiline content
+				content = strings.ReplaceAll(content, "\n", "<br/>")
+				// Escape pipe characters
+				content = strings.ReplaceAll(content, "|", `\|`)
+				cellContents = append(cellContents, content)
+			}
+
+			mdRows = append(mdRows, "| "+strings.Join(cellContents, " | ")+" |")
+			if len(cellContents) > colCount {
+				colCount = len(cellContents)
+			}
+		}
+
+		if len(mdRows) == 0 {
+			return tableMatch
+		}
+
+		// Build separator row after the first row (header)
+		separator := "|" + strings.Repeat(" --- |", colCount)
+		if len(mdRows) > 0 {
+			mdRows = append([]string{mdRows[0], separator}, mdRows[1:]...)
+		}
+
+		return strings.Join(mdRows, "\n")


⚠️ Potential issue | 🟡 Minor

Preserve CRLF line endings during table conversion.

For CRLF documents, this path emits converted table rows with \n and converts multiline cells as line1\r<br/>line2, which violates the existing CRLF preservation invariant for exported Markdown.

🧩 Proposed CRLF-safe conversion

+		lineEnding := "\n"
+		if strings.Contains(tableMatch, "\r\n") {
+			lineEnding = "\r\n"
+		}
+
 		var mdRows []string
 		colCount := 0
 		for _, row := range rows {
@@
 			var cellContents []string
 			for _, cell := range cells {
 				content := strings.TrimSpace(cell[1])
 				// Handle multiline content
+				content = strings.ReplaceAll(content, "\r\n", "<br/>")
 				content = strings.ReplaceAll(content, "\n", "<br/>")
+				content = strings.ReplaceAll(content, "\r", "<br/>")
 				// Escape pipe characters
 				content = strings.ReplaceAll(content, "|", `\|`)
 				cellContents = append(cellContents, content)
 			}
@@
-		return strings.Join(mdRows, "\n")
+		return strings.Join(mdRows, lineEnding)
 	})
 }
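
The stray `\r` problem is easy to reproduce in isolation. The sketch below contrasts the single `\n` replacement with a CRLF-aware one; `detectLineEnding` and `inlineCell` are hypothetical helper names chosen for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// detectLineEnding returns "\r\n" if s contains a CRLF pair, otherwise "\n".
func detectLineEnding(s string) string {
	if strings.Contains(s, "\r\n") {
		return "\r\n"
	}
	return "\n"
}

// inlineCell replaces every line break inside a cell with <br/>. CRLF is
// handled first so that no lone \r is left in front of the <br/>.
func inlineCell(content string) string {
	content = strings.ReplaceAll(content, "\r\n", "<br/>")
	content = strings.ReplaceAll(content, "\n", "<br/>")
	return strings.ReplaceAll(content, "\r", "<br/>")
}

func main() {
	cell := "line1\r\nline2"

	// Replacing only \n leaves the carriage return behind.
	naive := strings.ReplaceAll(cell, "\n", "<br/>")
	fmt.Printf("naive:  %q\n", naive)
	fmt.Printf("inline: %q\n", inlineCell(cell))

	// Rows are joined with the line ending detected in the source.
	rows := []string{"| A |", "| --- |", "| " + inlineCell(cell) + " |"}
	fmt.Printf("table:  %q\n", strings.Join(rows, detectLineEnding(cell)))
}
```

Printing with `%q` makes the control characters visible, which is convenient when comparing the two variants by eye.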

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@shortcuts/doc/markdown_fix.go` around lines 582 - 606, Detect and preserve the original EOL when converting the table: check for CRLF by testing if tableMatch contains "\r\n" and set a local eol variable (e.g., eol := "\n" or "\r\n"); when transforming multiline cell content in the content variable, replace "\r\n" first with "<br/>"+eol and then replace remaining "\n" with "<br/>"+eol (so you don't leave a stray '\r' before the <br/>); finally use strings.Join(mdRows, eol) when returning the assembled table instead of joining with "\n" so mdRows, separator, and the overall return preserve the document's original CRLF or LF; update references in this block using content, mdRows, separator, colCount, and tableMatch.

github-actions Bot added the domain/ccm (PR touches the ccm domain) and size/M (Single-domain feat or fix with limited business impact) labels on Apr 21, 2026

coderabbitai Bot reviewed Apr 21, 2026

hanBufan closed this Apr 21, 2026

hanBufan wants to merge 1 commit into larksuite:main from hanBufan:feat/convert-lark-tables-to-markdown

2 participants
