Skip to content

feat(doc): convert lark-table XML to markdown tables in +fetch output#585

Closed
hanBufan wants to merge 1 commit intolarksuite:mainfrom
hanBufan:feat/convert-lark-tables-to-markdown
Closed

feat(doc): convert lark-table XML to markdown tables in +fetch output#585
hanBufan wants to merge 1 commit intolarksuite:mainfrom
hanBufan:feat/convert-lark-tables-to-markdown

Conversation

@hanBufan
Copy link
Copy Markdown

@hanBufan hanBufan commented Apr 21, 2026

Convert Feishu/Lark XML tables (, , ) to standard Markdown tables in docs +fetch output (applies to all formats).

Features:

  • Converts simple tables with rows and cells
  • Handles empty cells
  • Escapes pipe characters (|)
  • Converts multiline cell content to
  • Skips tables with merged cells (colspan/rowspan)
  • Skips tables inside fenced code blocks
  • Handles multiple tables per document

Summary

Changes

  • Change 1
  • Change 2

Test Plan

  • Unit tests pass
  • Manual local verification confirms the lark xxx command works as expected

Related Issues

  • None

Summary by CodeRabbit

  • New Features
    • Table markup is now automatically converted to proper Markdown table format during document export, with proper handling of special characters and multi-line cell content.

Convert Feishu/Lark XML tables (<lark-table>, <lark-tr>, <lark-td>) to
standard Markdown tables in docs +fetch output (applies to all formats).

Features:
- Converts simple tables with rows and cells
- Handles empty cells
- Escapes pipe characters (\|)
- Converts multiline cell content to <br/>
- Skips tables with merged cells (colspan/rowspan)
- Skips tables inside fenced code blocks
- Handles multiple tables per document
@github-actions github-actions Bot added domain/ccm PR touches the ccm domain size/M Single-domain feat or fix with limited business impact labels Apr 21, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 21, 2026

📝 Walkthrough

Walkthrough

This pull request adds support for converting Lark-format HTML-like table markup (<lark-table>) to standard Markdown tables. The new fixLarkTables function is integrated into the markdown post-processing pipeline, escaping pipes and normalizing cell content while preserving tables with colspan/rowspan attributes.

Changes

Cohort / File(s) Summary
Table Conversion Logic
shortcuts/doc/markdown_fix.go
Introduced fixLarkTables function that converts <lark-table>...</lark-table> blocks to Markdown table syntax. Replaces <lark-tr> and <lark-td> with pipe-delimited rows, escapes | characters, converts newlines to <br/>, and skips tables with colspan or rowspan attributes. Integrated as the first transformation in fixExportedMarkdown via applyOutsideCodeFences.
Unit Tests
shortcuts/doc/markdown_fix_test.go
Added TestFixLarkTables and TestFixLarkTablesIntegrated covering: basic table conversion, empty cells, pipe escaping, multiline content handling, merged-cell preservation, fenced code block protection, whitespace normalization, tag attribute stripping, and end-to-end pipeline integration.
Idempotency Test
shortcuts/doc/markdown_fix_hardening_test.go
Added fixture case "lark-table converted to markdown" to TestFixExportedMarkdownIdempotent, verifying that repeated application of fixExportedMarkdown produces consistent results for Lark table markup.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

size/M, domain/ccm

Suggested reviewers

  • fangshuyu-768
  • SunPeiYang996

Poem

🐰 A rabbit hops through tables grand,
Where <lark-tags> once did stand.
Now pipes and cells in rows align,
Markdown magic, so divine!
Hopping through code with careful thought,
The finest table conversion wrought!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: converting lark-table XML elements to markdown tables in the +fetch output, which is the primary focus of all file modifications.
Description check ✅ Passed The description covers the main feature, implementation details, and uses the required template sections (Summary, Changes, Test Plan, Related Issues), though Changes section is incomplete with placeholder text.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 21, 2026

CLA assistant check
All committers have signed the CLA.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
shortcuts/doc/markdown_fix.go (1)

45-50: ⚠️ Potential issue | 🔴 Critical

Keep converted table rows contiguous through the later softbreak pass.

Because fixLarkTables runs before fixTopLevelSoftbreaks, the converted rows (| Header |, | --- |, | Data |) are treated as adjacent top-level paragraphs and get blank lines inserted between them. That breaks the Markdown table in the actual +fetch output; TestFixLarkTablesIntegrated would still pass because it only checks substrings.

🐛 One way to preserve converted Markdown table blocks
 func fixTopLevelSoftbreaks(md string) string {
 	lines := strings.Split(md, "\n")
 	out := make([]string, 0, len(lines)*2)
@@
-					if prev != "" && !isTableStructuralTag(prev) {
+					if prev != "" && !isTableStructuralTag(prev) && !areConsecutiveMarkdownTableRows(prev, trimmed) {
 						out = append(out, "")
 					}
 				}
 			}
 		}
@@
 	return strings.Join(out, "\n")
 }
+
+func areConsecutiveMarkdownTableRows(prev, current string) bool {
+	return isMarkdownTableRow(prev) && isMarkdownTableRow(current)
+}
+
+func isMarkdownTableRow(line string) bool {
+	return strings.HasPrefix(line, "|") && strings.Count(line, "|") >= 2
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/doc/markdown_fix.go` around lines 45 - 50, Reorder the passes so
converted Markdown table rows are not split by the softbreak pass: move the
applyOutsideCodeFences(md, fixLarkTables) call to run after
fixTopLevelSoftbreaks (i.e., call md = fixTopLevelSoftbreaks(md) before applying
fixLarkTables), or equivalently ensure fixTopLevelSoftbreaks skips table blocks
created by fixLarkTables; update the sequence that currently lists
applyOutsideCodeFences(md, fixLarkTables) before fixTopLevelSoftbreaks to
preserve contiguous table rows.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@shortcuts/doc/markdown_fix.go`:
- Around line 558-560: The current exact substring checks on tableMatch miss
cases like `colspan = "2"` or `ROWSPAN="2"`; update the merged-cell detection to
use a case-insensitive regex that allows optional whitespace around the equals
sign (for example `(?i)\b(colspan|rowspan)\b\s*=`) instead of the two
strings.Contains checks. Replace the block that tests tableMatch for "colspan="
and "rowspan=" with a precompiled regexp (e.g. regexp.MustCompile) and use its
MatchString on tableMatch to decide whether to return tableMatch unchanged.
- Around line 582-606: Detect and preserve the original EOL when converting the
table: check for CRLF by testing if tableMatch contains "\r\n" and set a local
eol variable (e.g., eol := "\n" or "\r\n"); when transforming multiline cell
content in the content variable, replace "\r\n" first with "<br/>"+eol and then
replace remaining "\n" with "<br/>"+eol (so you don't leave a stray '\r' before
the <br/>); finally use strings.Join(mdRows, eol) when returning the assembled
table instead of joining with "\n" so mdRows, separator, and the overall return
preserve the document's original CRLF or LF; update references in this block
using content, mdRows, separator, colCount, and tableMatch.

---

Outside diff comments:
In `@shortcuts/doc/markdown_fix.go`:
- Around line 45-50: Reorder the passes so converted Markdown table rows are not
split by the softbreak pass: move the applyOutsideCodeFences(md, fixLarkTables)
call to run after fixTopLevelSoftbreaks (i.e., call md =
fixTopLevelSoftbreaks(md) before applying fixLarkTables), or equivalently ensure
fixTopLevelSoftbreaks skips table blocks created by fixLarkTables; update the
sequence that currently lists applyOutsideCodeFences(md, fixLarkTables) before
fixTopLevelSoftbreaks to preserve contiguous table rows.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 838a0b6f-762d-4685-952c-08835e5f03f2

📥 Commits

Reviewing files that changed from the base of the PR and between fbed6be and 498351f.

📒 Files selected for processing (3)
  • shortcuts/doc/markdown_fix.go
  • shortcuts/doc/markdown_fix_hardening_test.go
  • shortcuts/doc/markdown_fix_test.go

Comment on lines +558 to +560
// Check for merged cells - if present, skip conversion and keep XML
if strings.Contains(tableMatch, "colspan=") || strings.Contains(tableMatch, "rowspan=") {
return tableMatch
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Detect merged-cell attributes with XML-compatible spacing/case.

The current exact substring check misses merged cells like colspan = "2" or ROWSPAN="2", so those tables would be converted instead of skipped.

🛡️ Proposed robust merged-cell detection
+var larkMergedCellAttrRe = regexp.MustCompile(`(?i)\b(?:colspan|rowspan)\s*=`)
+
 func fixLarkTables(md string) string {
 	// Match entire <lark-table>...</lark-table> blocks
 	tableRe := regexp.MustCompile(`(?s)<lark-table[^>]*>(.*?)</lark-table>`)
 	return tableRe.ReplaceAllStringFunc(md, func(tableMatch string) string {
 		// Check for merged cells - if present, skip conversion and keep XML
-		if strings.Contains(tableMatch, "colspan=") || strings.Contains(tableMatch, "rowspan=") {
+		if larkMergedCellAttrRe.MatchString(tableMatch) {
 			return tableMatch
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/doc/markdown_fix.go` around lines 558 - 560, The current exact
substring checks on tableMatch miss cases like `colspan = "2"` or `ROWSPAN="2"`;
update the merged-cell detection to use a case-insensitive regex that allows
optional whitespace around the equals sign (for example
`(?i)\b(colspan|rowspan)\b\s*=`) instead of the two strings.Contains checks.
Replace the block that tests tableMatch for "colspan=" and "rowspan=" with a
precompiled regexp (e.g. regexp.MustCompile) and use its MatchString on
tableMatch to decide whether to return tableMatch unchanged.

Comment on lines +582 to +606
content := strings.TrimSpace(cell[1])
// Handle multiline content
content = strings.ReplaceAll(content, "\n", "<br/>")
// Escape pipe characters
content = strings.ReplaceAll(content, "|", `\|`)
cellContents = append(cellContents, content)
}

mdRows = append(mdRows, "| "+strings.Join(cellContents, " | ")+" |")
if len(cellContents) > colCount {
colCount = len(cellContents)
}
}

if len(mdRows) == 0 {
return tableMatch
}

// Build separator row after the first row (header)
separator := "|" + strings.Repeat(" --- |", colCount)
if len(mdRows) > 0 {
mdRows = append([]string{mdRows[0], separator}, mdRows[1:]...)
}

return strings.Join(mdRows, "\n")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Preserve CRLF line endings during table conversion.

For CRLF documents, this path emits converted table rows with \n and converts multiline cells as line1\r<br/>line2, which violates the existing CRLF preservation invariant for exported Markdown.

🧩 Proposed CRLF-safe conversion
+		lineEnding := "\n"
+		if strings.Contains(tableMatch, "\r\n") {
+			lineEnding = "\r\n"
+		}
+
 		var mdRows []string
 		colCount := 0
 
 		for _, row := range rows {
@@
 			var cellContents []string
 			for _, cell := range cells {
 				content := strings.TrimSpace(cell[1])
 				// Handle multiline content
+				content = strings.ReplaceAll(content, "\r\n", "<br/>")
 				content = strings.ReplaceAll(content, "\n", "<br/>")
+				content = strings.ReplaceAll(content, "\r", "<br/>")
 				// Escape pipe characters
 				content = strings.ReplaceAll(content, "|", `\|`)
 				cellContents = append(cellContents, content)
 			}
@@
-		return strings.Join(mdRows, "\n")
+		return strings.Join(mdRows, lineEnding)
 	})
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/doc/markdown_fix.go` around lines 582 - 606, Detect and preserve
the original EOL when converting the table: check for CRLF by testing if
tableMatch contains "\r\n" and set a local eol variable (e.g., eol := "\n" or
"\r\n"); when transforming multiline cell content in the content variable,
replace "\r\n" first with "<br/>"+eol and then replace remaining "\n" with
"<br/>"+eol (so you don't leave a stray '\r' before the <br/>); finally use
strings.Join(mdRows, eol) when returning the assembled table instead of joining
with "\n" so mdRows, separator, and the overall return preserve the document's
original CRLF or LF; update references in this block using content, mdRows,
separator, colCount, and tableMatch.

@hanBufan hanBufan closed this Apr 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain/ccm PR touches the ccm domain size/M Single-domain feat or fix with limited business impact

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants