Skip to content

feat(doc): normalize exported sheet and video tags#672

Closed
HomyeeKing wants to merge 2 commits intolarksuite:mainfrom
HomyeeKing:feat/export
Closed

feat(doc): normalize exported sheet and video tags#672
HomyeeKing wants to merge 2 commits intolarksuite:mainfrom
HomyeeKing:feat/export

Conversation

@HomyeeKing
Copy link
Copy Markdown

@HomyeeKing HomyeeKing commented Apr 27, 2026

变更说明

  • 将导出的电子表格标签从 sheet-id 规范化为 id
  • 将导出的视频结构从 <figure><source .../></figure> 规范化为 <video> 标签
  • 为真实的 docs_ai 导出结构补充对应测试

导出格式变更对比

电子表格

场景 导出形式
PR 修改前 <sheet sheet-id="jkxrFs" token="FhJTsZVIihsBEStE5RVcr6Fbnrf"></sheet>
PR 修改后 <sheet id="jkxrFs" token="FhJTsZVIihsBEStE5RVcr6Fbnrf"></sheet>

视频

场景 导出形式
PR 修改前 <figure view-type="Preview"><source href="..." mime="video/quicktime" origin-height="1892.000000" origin-width="3016.000000" token="RXwJbGRG9orDqhxuYKvch0TqnBf"/></figure>
PR 修改后 <video controls src="feishu://media/RXwJbGRG9orDqhxuYKvch0TqnBf" data-mime="video/quicktime" data-view-type="Preview"></video>

测试

  • go test ./shortcuts/doc

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 27, 2026

📝 Walkthrough

Walkthrough

This PR introduces a bidirectional content normalization layer for Feishu document operations. It adds a compatibility transformation system that converts Feishu-specific video markup to file tokens on input (create/update operations) and reverses this conversion on output (fetch operations) to maintain format compatibility across different doc formats.

Changes

Cohort / File(s) Summary
Content Normalization Layer
shortcuts/doc/docs_content_compat.go, shortcuts/doc/docs_content_compat_test.go
Implements normalizeDocInputContent function that rewrites <video> tags with feishu://media/ sources into self-closing <file> elements with extracted token, name, and view-type attributes for xml/markdown formats; includes comprehensive test coverage for format-specific behavior and attribute preservation.
Create Operation Integration
shortcuts/doc/docs_create_v2.go, shortcuts/doc/docs_create_test.go
Modifies buildCreateBody to capture doc-format and pass it through normalizeDocInputContent before sending content payload; adds test validating video markup normalization during creation.
Fetch Operation Post-Processing
shortcuts/doc/docs_fetch_v2.go, shortcuts/doc/docs_fetch_v2_test.go
Adds normalizeFetchedDocumentContent function to reverse normalization on fetched content: converts <file> and <figure><source> elements back to <video> tags with proper attributes, and rewrites <sheet sheet-id=...> to <sheet id=...>; includes extensive test coverage for video/sheet tag conversions.
Update Operation Integration
shortcuts/doc/docs_update_v2.go, shortcuts/doc/docs_update_test.go
Modifies buildUpdateBody to capture doc-format and normalize content payload via normalizeDocInputContent before transmission; adds test validating content transformation during updates.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

size/M, domain/ccm

Suggested reviewers

  • fangshuyu-768
  • liangshuo-1

Poem

🐰 Through formats wide, a rabbit hops with glee,
Converting videos and sheets with parsing spree!
From feishu tokens, tags transform and dance,
Normalizing up and down—a document's sweet chance! 📹✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.13% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ⚠️ Warning The PR description is provided in Chinese but lacks the required English sections (Summary, Changes, Test Plan) from the template and misses key structural information. Provide a complete description in the required template format with English sections including a brief summary, list of main changes, test plan with checkboxes, and related issues.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main change: normalizing exported sheet and video tags, which aligns with the core functionality across the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added domain/ccm PR touches the ccm domain size/M Single-domain feat or fix with limited business impact labels Apr 27, 2026
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
shortcuts/doc/docs_fetch_v2.go (2)

83-87: Order coupling: <file> regex requires token before name.

fetchedVideoFileTagRe only matches <file> tags where token="…" appears before name="…". That happens to match what normalizeInputVideoTags in shortcuts/doc/docs_content_compat.go emits (line 42-49), and the test fixtures all use that order. If the docs_ai export ever serializes attributes in a different order, the file→video rewrite will silently no-op (the tag passes through unchanged) rather than fail — which would be a hard-to-spot regression. Two low-cost options:

  • Decouple the regex (match <file …/>) and pull token/name via fetchedAttrValue, mirroring how view-type is already extracted at line 150.
  • Or add a regression test asserting that <file name="…" token="…"/> is also rewritten, so the coupling is at least pinned.

Not blocking since current callers are consistent.

♻️ Sketch: attribute-order-independent file regex
-	fetchedVideoFileTagRe   = regexp.MustCompile(`<file\s+([^>]*\btoken="([^"]+)"[^>]*\bname="([^"]+)"[^>]*)/>`)
+	fetchedVideoFileTagRe   = regexp.MustCompile(`<file\s+[^>]*/>`)

Then in normalizeFetchedVideoTags extract via fetchedAttrValue(tag, "token") / fetchedAttrValue(tag, "name") and early-return when either is empty.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/doc/docs_fetch_v2.go` around lines 83 - 87, The <file> tag regex
fetchedVideoFileTagRe is order-coupled (requires token before name) and can miss
tags with attributes in different orders; update normalizeFetchedVideoTags to
use a relaxed regex for fetchedVideoFileTagRe that simply matches <file .../>
(mirror how view-type is handled) and then extract token and name using
fetchedAttrValue(tag, "token") and fetchedAttrValue(tag, "name"), returning
early if either is empty; alternatively add a regression test ensuring a tag
like <file name="..." token="..."/> is rewritten to pin the current behavior.

185-203: Optional: consider a single regex over the suffix list.

Six strings.HasSuffix branches read fine, but a single switch over filepath.Ext(name) would compress this and make the supported-extensions list scannable. Purely a style/readability nit.

♻️ Suggested simplification
 func isFetchedVideoFilename(name string) bool {
-	name = strings.ToLower(strings.TrimSpace(name))
-	switch {
-	case strings.HasSuffix(name, ".mp4"):
-		return true
-	case strings.HasSuffix(name, ".mov"):
-		return true
-	case strings.HasSuffix(name, ".m4v"):
-		return true
-	case strings.HasSuffix(name, ".webm"):
-		return true
-	case strings.HasSuffix(name, ".avi"):
-		return true
-	case strings.HasSuffix(name, ".mkv"):
-		return true
-	default:
-		return false
-	}
+	switch strings.ToLower(filepath.Ext(strings.TrimSpace(name))) {
+	case ".mp4", ".mov", ".m4v", ".webm", ".avi", ".mkv":
+		return true
+	default:
+		return false
+	}
 }

Requires adding "path/filepath" to imports.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/doc/docs_fetch_v2.go` around lines 185 - 203, The
isFetchedVideoFilename function currently checks many suffixes with repeated
strings.HasSuffix calls; replace that with a single extraction of the file
extension (use filepath.Ext(name) after trimming/ToLower) and switch or map over
the extension (e.g., case ".mp4", ".mov", etc.) to return true for supported
extensions; update imports to include "path/filepath" and keep the existing
normalization (strings.TrimSpace/ToLower) before calling filepath.Ext.
shortcuts/doc/docs_content_compat.go (1)

12-12: Optional: name overlap with fetch-side regexes.

fetchedVideoTagRe here matches input <video> tags being normalized back into <file/>, while fetchedVideoFileTagRe / fetchedVideoFigureTagRe in shortcuts/doc/docs_fetch_v2.go operate on fetch output. The shared fetched* prefix is technically accurate (these tags originate from a prior fetch) but on a quick read it can look like input/output regexes are intermixed. A name like inputVideoTagRe (or moving all three regexes into a single file with directional prefixes inboundVideoTagRe / outboundVideo*TagRe) would make the direction obvious. No behavior change.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shortcuts/doc/docs_content_compat.go` at line 12, Rename the ambiguous regex
variable fetchedVideoTagRe to a direction-explicit name (e.g., inputVideoTagRe
or inboundVideoTagRe) to avoid confusion with fetchedVideoFileTagRe and
fetchedVideoFigureTagRe; update the declaration (var fetchedVideoTagRe = ...)
and all references/usages across the codebase to the new identifier so the regex
that normalizes incoming <video> tags is clearly distinguished from the
fetch-output regexes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@shortcuts/doc/docs_content_compat.go`:
- Line 12: Rename the ambiguous regex variable fetchedVideoTagRe to a
direction-explicit name (e.g., inputVideoTagRe or inboundVideoTagRe) to avoid
confusion with fetchedVideoFileTagRe and fetchedVideoFigureTagRe; update the
declaration (var fetchedVideoTagRe = ...) and all references/usages across the
codebase to the new identifier so the regex that normalizes incoming <video>
tags is clearly distinguished from the fetch-output regexes.

In `@shortcuts/doc/docs_fetch_v2.go`:
- Around line 83-87: The <file> tag regex fetchedVideoFileTagRe is order-coupled
(requires token before name) and can miss tags with attributes in different
orders; update normalizeFetchedVideoTags to use a relaxed regex for
fetchedVideoFileTagRe that simply matches <file .../> (mirror how view-type is
handled) and then extract token and name using fetchedAttrValue(tag, "token")
and fetchedAttrValue(tag, "name"), returning early if either is empty;
alternatively add a regression test ensuring a tag like <file name="..."
token="..."/> is rewritten to pin the current behavior.
- Around line 185-203: The isFetchedVideoFilename function currently checks many
suffixes with repeated strings.HasSuffix calls; replace that with a single
extraction of the file extension (use filepath.Ext(name) after trimming/ToLower)
and switch or map over the extension (e.g., case ".mp4", ".mov", etc.) to return
true for supported extensions; update imports to include "path/filepath" and
keep the existing normalization (strings.TrimSpace/ToLower) before calling
filepath.Ext.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2374a61d-e7dc-4673-a740-6e0af88020f1

📥 Commits

Reviewing files that changed from the base of the PR and between 1e2144e and c23c163.

📒 Files selected for processing (8)
  • shortcuts/doc/docs_content_compat.go
  • shortcuts/doc/docs_content_compat_test.go
  • shortcuts/doc/docs_create_test.go
  • shortcuts/doc/docs_create_v2.go
  • shortcuts/doc/docs_fetch_v2.go
  • shortcuts/doc/docs_fetch_v2_test.go
  • shortcuts/doc/docs_update_test.go
  • shortcuts/doc/docs_update_v2.go

@HomyeeKing HomyeeKing closed this Apr 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain/ccm PR touches the ccm domain size/M Single-domain feat or fix with limited business impact

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants