feat(doc): convert lark-table XML to markdown tables in +fetch output#585
feat(doc): convert lark-table XML to markdown tables in +fetch output#585hanBufan wants to merge 1 commit intolarksuite:mainfrom
Conversation
Convert Feishu/Lark XML tables (<lark-table>, <lark-tr>, <lark-td>) to standard Markdown tables in docs +fetch output (applies to all formats). Features: - Converts simple tables with rows and cells - Handles empty cells - Escapes pipe characters (\|) - Converts multiline cell content to <br/> - Skips tables with merged cells (colspan/rowspan) - Skips tables inside fenced code blocks - Handles multiple tables per document
📝 WalkthroughWalkthroughThis pull request adds support for converting Lark-format HTML-like table markup ( Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
shortcuts/doc/markdown_fix.go (1)
45-50:⚠️ Potential issue | 🔴 CriticalKeep converted table rows contiguous through the later softbreak pass.
Because
fixLarkTablesruns beforefixTopLevelSoftbreaks, the converted rows (| Header |,| --- |,| Data |) are treated as adjacent top-level paragraphs and get blank lines inserted between them. That breaks the Markdown table in the actual+fetchoutput;TestFixLarkTablesIntegratedwould still pass because it only checks substrings.🐛 One way to preserve converted Markdown table blocks
func fixTopLevelSoftbreaks(md string) string { lines := strings.Split(md, "\n") out := make([]string, 0, len(lines)*2) @@ - if prev != "" && !isTableStructuralTag(prev) { + if prev != "" && !isTableStructuralTag(prev) && !areConsecutiveMarkdownTableRows(prev, trimmed) { out = append(out, "") } } } } @@ return strings.Join(out, "\n") } + +func areConsecutiveMarkdownTableRows(prev, current string) bool { + return isMarkdownTableRow(prev) && isMarkdownTableRow(current) +} + +func isMarkdownTableRow(line string) bool { + return strings.HasPrefix(line, "|") && strings.Count(line, "|") >= 2 +}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@shortcuts/doc/markdown_fix.go` around lines 45 - 50, Reorder the passes so converted Markdown table rows are not split by the softbreak pass: move the applyOutsideCodeFences(md, fixLarkTables) call to run after fixTopLevelSoftbreaks (i.e., call md = fixTopLevelSoftbreaks(md) before applying fixLarkTables), or equivalently ensure fixTopLevelSoftbreaks skips table blocks created by fixLarkTables; update the sequence that currently lists applyOutsideCodeFences(md, fixLarkTables) before fixTopLevelSoftbreaks to preserve contiguous table rows.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@shortcuts/doc/markdown_fix.go`:
- Around line 558-560: The current exact substring checks on tableMatch miss
cases like `colspan = "2"` or `ROWSPAN="2"`; update the merged-cell detection to
use a case-insensitive regex that allows optional whitespace around the equals
sign (for example `(?i)\b(colspan|rowspan)\b\s*=`) instead of the two
strings.Contains checks. Replace the block that tests tableMatch for "colspan="
and "rowspan=" with a precompiled regexp (e.g. regexp.MustCompile) and use its
MatchString on tableMatch to decide whether to return tableMatch unchanged.
- Around line 582-606: Detect and preserve the original EOL when converting the
table: check for CRLF by testing if tableMatch contains "\r\n" and set a local
eol variable (e.g., eol := "\n" or "\r\n"); when transforming multiline cell
content in the content variable, replace "\r\n" first with "<br/>"+eol and then
replace remaining "\n" with "<br/>"+eol (so you don't leave a stray '\r' before
the <br/>); finally use strings.Join(mdRows, eol) when returning the assembled
table instead of joining with "\n" so mdRows, separator, and the overall return
preserve the document's original CRLF or LF; update references in this block
using content, mdRows, separator, colCount, and tableMatch.
---
Outside diff comments:
In `@shortcuts/doc/markdown_fix.go`:
- Around line 45-50: Reorder the passes so converted Markdown table rows are not
split by the softbreak pass: move the applyOutsideCodeFences(md, fixLarkTables)
call to run after fixTopLevelSoftbreaks (i.e., call md =
fixTopLevelSoftbreaks(md) before applying fixLarkTables), or equivalently ensure
fixTopLevelSoftbreaks skips table blocks created by fixLarkTables; update the
sequence that currently lists applyOutsideCodeFences(md, fixLarkTables) before
fixTopLevelSoftbreaks to preserve contiguous table rows.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 838a0b6f-762d-4685-952c-08835e5f03f2
📒 Files selected for processing (3)
shortcuts/doc/markdown_fix.goshortcuts/doc/markdown_fix_hardening_test.goshortcuts/doc/markdown_fix_test.go
| // Check for merged cells - if present, skip conversion and keep XML | ||
| if strings.Contains(tableMatch, "colspan=") || strings.Contains(tableMatch, "rowspan=") { | ||
| return tableMatch |
There was a problem hiding this comment.
Detect merged-cell attributes with XML-compatible spacing/case.
The current exact substring check misses merged cells like colspan = "2" or ROWSPAN="2", so those tables would be converted instead of skipped.
🛡️ Proposed robust merged-cell detection
+var larkMergedCellAttrRe = regexp.MustCompile(`(?i)\b(?:colspan|rowspan)\s*=`)
+
func fixLarkTables(md string) string {
// Match entire <lark-table>...</lark-table> blocks
tableRe := regexp.MustCompile(`(?s)<lark-table[^>]*>(.*?)</lark-table>`)
return tableRe.ReplaceAllStringFunc(md, func(tableMatch string) string {
// Check for merged cells - if present, skip conversion and keep XML
- if strings.Contains(tableMatch, "colspan=") || strings.Contains(tableMatch, "rowspan=") {
+ if larkMergedCellAttrRe.MatchString(tableMatch) {
return tableMatch
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@shortcuts/doc/markdown_fix.go` around lines 558 - 560, The current exact
substring checks on tableMatch miss cases like `colspan = "2"` or `ROWSPAN="2"`;
update the merged-cell detection to use a case-insensitive regex that allows
optional whitespace around the equals sign (for example
`(?i)\b(colspan|rowspan)\b\s*=`) instead of the two strings.Contains checks.
Replace the block that tests tableMatch for "colspan=" and "rowspan=" with a
precompiled regexp (e.g. regexp.MustCompile) and use its MatchString on
tableMatch to decide whether to return tableMatch unchanged.
| content := strings.TrimSpace(cell[1]) | ||
| // Handle multiline content | ||
| content = strings.ReplaceAll(content, "\n", "<br/>") | ||
| // Escape pipe characters | ||
| content = strings.ReplaceAll(content, "|", `\|`) | ||
| cellContents = append(cellContents, content) | ||
| } | ||
|
|
||
| mdRows = append(mdRows, "| "+strings.Join(cellContents, " | ")+" |") | ||
| if len(cellContents) > colCount { | ||
| colCount = len(cellContents) | ||
| } | ||
| } | ||
|
|
||
| if len(mdRows) == 0 { | ||
| return tableMatch | ||
| } | ||
|
|
||
| // Build separator row after the first row (header) | ||
| separator := "|" + strings.Repeat(" --- |", colCount) | ||
| if len(mdRows) > 0 { | ||
| mdRows = append([]string{mdRows[0], separator}, mdRows[1:]...) | ||
| } | ||
|
|
||
| return strings.Join(mdRows, "\n") |
There was a problem hiding this comment.
Preserve CRLF line endings during table conversion.
For CRLF documents, this path emits converted table rows with \n and converts multiline cells as line1\r<br/>line2, which violates the existing CRLF preservation invariant for exported Markdown.
🧩 Proposed CRLF-safe conversion
+ lineEnding := "\n"
+ if strings.Contains(tableMatch, "\r\n") {
+ lineEnding = "\r\n"
+ }
+
var mdRows []string
colCount := 0
for _, row := range rows {
@@
var cellContents []string
for _, cell := range cells {
content := strings.TrimSpace(cell[1])
// Handle multiline content
+ content = strings.ReplaceAll(content, "\r\n", "<br/>")
content = strings.ReplaceAll(content, "\n", "<br/>")
+ content = strings.ReplaceAll(content, "\r", "<br/>")
// Escape pipe characters
content = strings.ReplaceAll(content, "|", `\|`)
cellContents = append(cellContents, content)
}
@@
- return strings.Join(mdRows, "\n")
+ return strings.Join(mdRows, lineEnding)
})
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@shortcuts/doc/markdown_fix.go` around lines 582 - 606, Detect and preserve
the original EOL when converting the table: check for CRLF by testing if
tableMatch contains "\r\n" and set a local eol variable (e.g., eol := "\n" or
"\r\n"); when transforming multiline cell content in the content variable,
replace "\r\n" first with "<br/>"+eol and then replace remaining "\n" with
"<br/>"+eol (so you don't leave a stray '\r' before the <br/>); finally use
strings.Join(mdRows, eol) when returning the assembled table instead of joining
with "\n" so mdRows, separator, and the overall return preserve the document's
original CRLF or LF; update references in this block using content, mdRows,
separator, colCount, and tableMatch.
Convert Feishu/Lark XML tables (, , ) to standard Markdown tables in docs +fetch output (applies to all formats).
Features:
Summary
Changes
Test Plan
lark xxxcommand works as expectedRelated Issues
Summary by CodeRabbit