feat: support importing documents larger than 20MB #220
Conversation
Note: Reviews paused. This branch is under active development; to avoid overwhelming you with review comments during an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Drive import now preflights and validates local files, choosing single-request or multipart upload based on file size.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as Client (CLI)
    participant Drive as Drive API
    participant Import as Import Tasks
    CLI->>CLI: preflightDriveImportFile (validate path, stat, validate size)
    alt fileSize <= maxDriveUploadFileSize
        CLI->>Drive: POST /open-apis/drive/v1/medias/upload_all (file)
        Drive-->>CLI: { file_token }
    else fileSize > maxDriveUploadFileSize
        CLI->>Drive: POST /open-apis/drive/v1/medias/upload_prepare (meta)
        Drive-->>CLI: { upload_id, block_size, block_num }
        loop for seq in 1..block_num
            CLI->>Drive: POST /open-apis/drive/v1/medias/upload_part (upload_id, seq, chunk)
            Drive-->>CLI: { success }
        end
        CLI->>Drive: POST /open-apis/drive/v1/medias/upload_finish (upload_id)
        Drive-->>CLI: { file_token }
    end
    CLI->>Import: POST /open-apis/import_tasks (file_token, extra)
    Import-->>CLI: { task_id / status }
```
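The size-based branch at the top of the sequence can be sketched in Go. This is a hypothetical illustration: `maxDriveUploadFileSize` and `chooseUploadPath` mirror the names and the 20 MB threshold described in this PR, but are not the actual CLI code.

```go
package main

import "fmt"

// maxDriveUploadFileSize mirrors the 20 MB threshold described in the PR;
// the real constant lives in the CLI source and may be named differently.
const maxDriveUploadFileSize int64 = 20 * 1024 * 1024

// chooseUploadPath picks the branch from the sequence diagram: a single
// upload_all request for small files, the multipart flow otherwise.
func chooseUploadPath(fileSize int64) string {
	if fileSize <= maxDriveUploadFileSize {
		return "upload_all"
	}
	return "multipart"
}

func main() {
	fmt.Println(chooseUploadPath(1 << 20)) // 1 MB file → upload_all
	fmt.Println(chooseUploadPath(1 << 25)) // 32 MB file → multipart
}
```

In the multipart branch, the server's `upload_prepare` response (`upload_id`, `block_size`, `block_num`) then drives one `upload_part` POST per chunk.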
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Greptile Summary

This PR adds multipart chunked upload support.

Confidence Score: 5/5. Safe to merge — the one remaining finding is a minor test coverage gap and does not affect production behaviour. All P0/P1 concerns from prior rounds (unused ctx, duplicate stderr prints, dead code, dry-run parent_node omission) are pre-existing or already fixed. The only new finding is a P2 test gap: the integration test for multipart upload doesn't assert the required `parent_node: ""` field in the prepare body. Production code sends it correctly; this is purely a regression-safety suggestion.

Important Files Changed

shortcuts/drive/drive_import_common_test.go — add a `parent_node` assertion in the multipart prepare body check.
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[drive +import --file F --type T] --> B[preflightDriveImportFile\npath normalize · stat · size limit check]
    B -->|error| Z[return error]
    B -->|ok, fileSize| C{fileSize > 20 MB?}
    C -->|no| D[uploadMediaForImportAll\nPOST medias/upload_all]
    C -->|yes| E[prepareMediaImportUpload\nPOST medias/upload_prepare\nreturns upload_id · block_size · block_num]
    E --> F[Loop: uploadMediaImportPart\nPOST medias/upload_part\nfor each chunk]
    F --> G[finishMediaImportUpload\nPOST medias/upload_finish\nreturns file_token]
    D --> H[createDriveImportTask\nPOST import_tasks → ticket]
    G --> H
    H --> I[pollDriveImportTask\nGET import_tasks/:ticket\npoll up to 30x · 2s interval]
    I -->|ready| J[output token + URL]
    I -->|timeout| K[output next_command for resume]
    I -->|failed| Z
```
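The poll step at the bottom of the flowchart (up to 30 attempts, 2 s apart, with a resume hint on timeout) is a bounded retry loop. A standalone sketch — `pollImportTask`, the `check` callback, and the status strings are illustrative stand-ins for the CLI's `pollDriveImportTask` and the import-task API, not its real signatures:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// pollImportTask retries a status check up to maxAttempts times with a fixed
// interval, mirroring the "poll up to 30x · 2s interval" step above. The
// check func stands in for GET /open-apis/.../import_tasks/:ticket.
func pollImportTask(check func() (string, error), maxAttempts int, interval time.Duration) (string, error) {
	for i := 0; i < maxAttempts; i++ {
		status, err := check()
		if err != nil {
			return "", err
		}
		switch status {
		case "ready":
			return status, nil
		case "failed":
			return "", errors.New("import task failed")
		}
		time.Sleep(interval)
	}
	// Timeout: the real CLI prints a next_command hint so the user can resume.
	return "", errors.New("import task still pending after timeout")
}

func main() {
	attempts := 0
	status, err := pollImportTask(func() (string, error) {
		attempts++
		if attempts < 3 {
			return "processing", nil
		}
		return "ready", nil
	}, 30, time.Millisecond)
	fmt.Println(status, err, attempts) // ready <nil> 3
}
```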
Reviews (6). Last reviewed commit: "support importing documents larger than ..."
🚀 PR Preview Install Guide

🧰 CLI update: `npm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@e49413d1f28b7625e6f21ee936810197b070cc46`

🧩 Skill update: `npx skills add larksuite/cli#feat/media-chunked-upload -y -g`
Actionable comments posted: 1
🧹 Nitpick comments (1)
shortcuts/drive/drive_import_common_test.go (1)
226-236: Assert `parent_node` while you're already validating the prepare body.

`prepareMediaImportUpload()` treats `parent_node: ""` as required for `/medias/upload_prepare`, but this check never verifies it. Dropping that field would break multipart upload while this suite still passes.

Suggested assertion:

```diff
 if got, _ := prepareBody["size"].(float64); got != float64(maxDriveUploadFileSize+1) {
 	t.Fatalf("prepare size = %v, want %d", got, maxDriveUploadFileSize+1)
 }
+if got, _ := prepareBody["parent_node"].(string); got != "" {
+	t.Fatalf("prepare parent_node = %q, want empty string", got)
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@shortcuts/drive/drive_import_common_test.go` around lines 226 - 236, The test currently validates fields in the prepare request body but misses asserting parent_node; update the assertions after calling decodeCapturedJSONBody(t, prepareStub) to check that prepareBody["parent_node"] exists and equals the empty string (""), mirroring the requirement in prepareMediaImportUpload; use the same pattern as the existing checks (e.g., if got, _ := prepareBody["parent_node"].(string); got != "" { t.Fatalf(...)} ) so dropping parent_node will fail the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@shortcuts/drive/drive_import.go`:
- Around line 143-166: The dry-run multipart sequence in drive_import.go is not
mirroring the real upload flow in drive_import_common.go: include the same
"parent_node": "" field in the upload_prepare Body and emit one upload_part POST
per actual chunk instead of a single placeholder; update the Body payloads to
use the real size and per-chunk fields (use spec.SourceFileName(), spec.DocType,
spec.FileExtension() and compute chunk count from the file size / chunk size
logic used where maxDriveUploadFileSize is referenced) so the dry-run reports
the correct request bodies and number of /open-apis/drive/v1/medias/upload_part
calls.
---
Nitpick comments:
In `@shortcuts/drive/drive_import_common_test.go`:
- Around line 226-236: The test currently validates fields in the prepare
request body but misses asserting parent_node; update the assertions after
calling decodeCapturedJSONBody(t, prepareStub) to check that
prepareBody["parent_node"] exists and equals the empty string (""), mirroring
the requirement in prepareMediaImportUpload; use the same pattern as the
existing checks (e.g., if got, _ := prepareBody["parent_node"].(string); got !=
"" { t.Fatalf(...)} ) so dropping parent_node will fail the test.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 53227584-86f5-456d-ad41-c0b1ff89eb24
📒 Files selected for processing (5)
- shortcuts/drive/drive_import.go
- shortcuts/drive/drive_import_common.go
- shortcuts/drive/drive_import_common_test.go
- shortcuts/drive/drive_import_test.go
- skills/lark-drive/references/lark-drive-import.md
2e528f1 to 2394bd2
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@shortcuts/drive/drive_import.go`:
- Around line 146-151: The dry-run payload builds the JSON "extra" string with
fmt.Sprintf in handle(s) that use spec (currently at the Body call and again
later), which duplicates logic and can produce invalid JSON; replace those
fmt.Sprintf calls with the shared helper buildImportMediaExtra(spec.DocType,
spec.FileExtension()) (from shortcuts/drive/drive_import_common.go) so the
"extra" field is constructed centrally and correctly (use spec as the source for
DocType/FileExtension and pass its values to buildImportMediaExtra in both spots
where the manual JSON string is created).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 58d4b7b2-ff83-4f68-aab2-8d87bf88f40a
📒 Files selected for processing (5)
- shortcuts/drive/drive_import.go
- shortcuts/drive/drive_import_common.go
- shortcuts/drive/drive_import_common_test.go
- shortcuts/drive/drive_import_test.go
- skills/lark-drive/references/lark-drive-import.md
✅ Files skipped from review due to trivial changes (1)
- skills/lark-drive/references/lark-drive-import.md
🚧 Files skipped from review as they are similar to previous changes (2)
- shortcuts/drive/drive_import_common.go
- shortcuts/drive/drive_import_common_test.go
2394bd2 to 334ad36
Actionable comments posted: 1
🧹 Nitpick comments (1)
shortcuts/drive/drive_import_common.go (1)
164-198: Validate `block_num` before streaming the file.

You already reject a prepare/upload mismatch here, but only after all chunks have been sent. Since `fileSize` and `block_size` are known immediately after `upload_prepare`, you can fail fast on an inconsistent session and avoid wasting a large upload on a doomed transfer.

♻️ Suggested change:

```diff
 totalBlocks := session.BlockNum
+expectedBlocks := (fileSize + session.BlockSize - 1) / session.BlockSize
+if expectedBlocks != session.BlockNum {
+	return "", output.Errorf(
+		output.ExitAPI,
+		"api_error",
+		"upload prepare mismatch: expected %d blocks, got %d",
+		expectedBlocks,
+		session.BlockNum,
+	)
+}
 fmt.Fprintf(runtime.IO().ErrOut, "Multipart upload initialized: %d chunks x %s\n", totalBlocks, common.FormatSize(int64(session.BlockSize)))
@@
-	if session.BlockNum > 0 && session.BlockNum != uploadedBlocks {
-		return "", output.Errorf(output.ExitAPI, "api_error", "upload prepare mismatch: expected %d blocks, uploaded %d", session.BlockNum, uploadedBlocks)
+	if expectedBlocks != uploadedBlocks {
+		return "", output.Errorf(output.ExitAPI, "api_error", "upload prepare mismatch: expected %d blocks, uploaded %d", expectedBlocks, uploadedBlocks)
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@shortcuts/drive/drive_import_common.go` around lines 164 - 198, Check and validate session.BlockNum against the computed number of chunks before opening/streaming the file: compute expectedBlocks = int((fileSize + int64(session.BlockSize) - 1) / int64(session.BlockSize)) and if session.BlockNum == 0 or session.BlockNum != expectedBlocks return an error immediately (using the same output.Errorf/output.ErrValidation pattern) so the code using session.BlockNum, session.BlockSize, fileSize, filePath and the multipart loop (uploadMediaImportPart) fails fast instead of streaming all chunks.
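The expected-block computation in the suggestion above is plain integer ceiling division. A self-contained sketch under the same assumptions as the review comment (the function names and error text here are illustrative, not the CLI's):

```go
package main

import "fmt"

// expectedBlocks computes how many fixed-size chunks a file of fileSize
// bytes needs: ceiling division, as in the suggested fail-fast check.
func expectedBlocks(fileSize, blockSize int64) int64 {
	if blockSize <= 0 {
		return 0
	}
	return (fileSize + blockSize - 1) / blockSize
}

// validateSession fails fast when the server-reported block count does not
// match what the local file size implies, before any bytes are streamed.
func validateSession(fileSize, blockSize, blockNum int64) error {
	want := expectedBlocks(fileSize, blockSize)
	if blockNum <= 0 || blockNum != want {
		return fmt.Errorf("upload prepare mismatch: expected %d blocks, got %d", want, blockNum)
	}
	return nil
}

func main() {
	fmt.Println(expectedBlocks(10, 4))            // 3: two full chunks plus a 2-byte tail
	fmt.Println(validateSession(10, 4, 3) == nil) // true
	fmt.Println(validateSession(10, 4, 2) != nil) // true: mismatch caught up front
}
```

Running the check right after `upload_prepare` means a corrupted session costs one cheap comparison instead of a full streamed upload.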
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@shortcuts/drive/drive_import.go`:
- Around line 141-184: appendDriveImportUploadDryRun currently branches only on
os.Stat size; instead run the same preflight used by Execute: call
validate.SafeInputPath(spec.FilePath) and validateDriveImportFileSize(...) (the
same signature/arguments Execute uses) before deciding upload path. If
SafeInputPath or validateDriveImportFileSize returns an error that would
abort/skip the real upload, do not append any upload POSTs in the dry-run;
otherwise use the validated size/result to choose the multipart flow
(upload_prepare/upload_part/upload_finish) vs single upload_all, and keep using
buildImportMediaExtra(spec.FilePath, spec.DocType) for the extra payload.
---
Nitpick comments:
In `@shortcuts/drive/drive_import_common.go`:
- Around line 164-198: Check and validate session.BlockNum against the computed
number of chunks before opening/streaming the file: compute expectedBlocks =
int((fileSize + int64(session.BlockSize) - 1) / int64(session.BlockSize)) and if
session.BlockNum == 0 or session.BlockNum != expectedBlocks return an error
immediately (using the same output.Errorf/output.ErrValidation pattern) so the
code using session.BlockNum, session.BlockSize, fileSize, filePath and the
multipart loop (uploadMediaImportPart) fails fast instead of streaming all
chunks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ed640133-72a1-4776-9c2f-9ccd21acee25
📒 Files selected for processing (5)
- shortcuts/drive/drive_import.go
- shortcuts/drive/drive_import_common.go
- shortcuts/drive/drive_import_common_test.go
- shortcuts/drive/drive_import_test.go
- skills/lark-drive/references/lark-drive-import.md
334ad36 to 57d03aa
🧹 Nitpick comments (1)
shortcuts/drive/drive_import_common.go (1)
96-125: Redundant file stat and validation in `uploadMediaForImport`.

The `Execute` path in `drive_import.go` (lines 78-79) already calls `preflightDriveImportFile`, which performs `os.Stat` and `validateDriveImportFileSize`, but discards the returned `fileSize`. Then `uploadMediaForImport` repeats both operations here. Consider passing the pre-computed `fileSize` to avoid redundant I/O and validation.

♻️ Suggested refactor

Update the function signature to accept an optional pre-computed size:

```diff
-func uploadMediaForImport(ctx context.Context, runtime *common.RuntimeContext, filePath, fileName, docType string) (string, error) {
-	importInfo, err := os.Stat(filePath)
-	if err != nil {
-		return "", output.ErrValidation("cannot read file: %s", err)
-	}
-
-	fileSize := importInfo.Size()
-	if err = validateDriveImportFileSize(filePath, docType, fileSize); err != nil {
-		return "", err
-	}
+func uploadMediaForImport(ctx context.Context, runtime *common.RuntimeContext, filePath, fileName, docType string, precomputedSize int64) (string, error) {
+	fileSize := precomputedSize
+	if fileSize == 0 {
+		importInfo, err := os.Stat(filePath)
+		if err != nil {
+			return "", output.ErrValidation("cannot read file: %s", err)
+		}
+		fileSize = importInfo.Size()
+		if err = validateDriveImportFileSize(filePath, docType, fileSize); err != nil {
+			return "", err
+		}
+	}
```

And in `drive_import.go` `Execute`:

```diff
-	if _, err := preflightDriveImportFile(&spec); err != nil {
+	fileSize, err := preflightDriveImportFile(&spec)
+	if err != nil {
 		return err
 	}
-	fileToken, uploadErr := uploadMediaForImport(ctx, runtime, spec.FilePath, spec.SourceFileName(), spec.DocType)
+	fileToken, uploadErr := uploadMediaForImport(ctx, runtime, spec.FilePath, spec.SourceFileName(), spec.DocType, fileSize)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@shortcuts/drive/drive_import_common.go` around lines 96 - 125, uploadMediaForImport currently repeats os.Stat and validateDriveImportFileSize even though preflightDriveImportFile (called from drive_import.go Execute) already did those checks and computed the size; change uploadMediaForImport to accept an optional pre-computed fileSize (e.g., add a fileSize int64 parameter or pointer/zero sentinel), use that value when provided and skip os.Stat/validateDriveImportFileSize, otherwise fall back to the existing logic; update the caller in drive_import.go Execute to pass the size returned by preflightDriveImportFile; keep subsequent uses of driveUploadSizeValue, buildImportMediaExtra, uploadMediaForImportAll and uploadMediaForImportMultipart unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@shortcuts/drive/drive_import_common.go`:
- Around line 96-125: uploadMediaForImport currently repeats os.Stat and
validateDriveImportFileSize even though preflightDriveImportFile (called from
drive_import.go Execute) already did those checks and computed the size; change
uploadMediaForImport to accept an optional pre-computed fileSize (e.g., add a
fileSize int64 parameter or pointer/zero sentinel), use that value when provided
and skip os.Stat/validateDriveImportFileSize, otherwise fall back to the
existing logic; update the caller in drive_import.go Execute to pass the size
returned by preflightDriveImportFile; keep subsequent uses of
driveUploadSizeValue, buildImportMediaExtra, uploadMediaForImportAll and
uploadMediaForImportMultipart unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 050424b3-6f3c-4141-8731-1de48b37eabd
📒 Files selected for processing (5)
- shortcuts/drive/drive_import.go
- shortcuts/drive/drive_import_common.go
- shortcuts/drive/drive_import_common_test.go
- shortcuts/drive/drive_import_test.go
- skills/lark-drive/references/lark-drive-import.md
57d03aa to 2965a0c
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@shortcuts/drive/drive_import.go`:
- Around line 150-157: The preflightDriveImportFile flow uses os.Stat but
doesn't ensure the path is a regular file, so directories/FIFOs can pass and
later cause os.Open to block or fail; update preflightDriveImportFile to check
info.Mode().IsRegular() (negate with !info.Mode().IsRegular()) after os.Stat and
before calling validateDriveImportFileSize, and if not regular return a
validation error (e.g., use output.ErrValidation("cannot read file: not a
regular file: %s", spec.FilePath) or similar) so
uploadMediaForImportAll/uploadMediaForImportMultipart won't attempt to open
non-regular files.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8bafbfb9-fb27-4cc7-bb45-9a9ddfcd3398
📒 Files selected for processing (5)
- shortcuts/drive/drive_import.go
- shortcuts/drive/drive_import_common.go
- shortcuts/drive/drive_import_common_test.go
- shortcuts/drive/drive_import_test.go
- skills/lark-drive/references/lark-drive-import.md
🚧 Files skipped from review as they are similar to previous changes (2)
- shortcuts/drive/drive_import_test.go
- shortcuts/drive/drive_import_common.go
Change-Id: I445d629c080a5e9834e277d871406d34452bf1be
2965a0c to e49413d
Summary

This PR adds support for importing documents larger than 20MB in `lark-cli drive +import`. For large files, the shortcut now switches from `medias/upload_all` to the multipart upload flow before creating the import task and polling the final result.

Changes

- `upload_prepare` -> `upload_part` -> `upload_finish`

Test Plan

- `lark xxx` command works as expected

Manual verification:

- `go test ./shortcuts/drive -run 'TestDriveImportDryRunUsesExtensionlessDefaultName|TestDriveImportDryRunShowsMultipartUploadForLargeFile|TestValidateDriveImportSpecRejectsMismatchedType'`
- `lark-cli drive +import --file ./20M.docx --type docx` succeeds for a file larger than 20MB
- `upload_prepare`, `upload_part`, and `upload_finish` are issued for large files

Related Issues
Summary by CodeRabbit
New Features
Bug Fixes
- (`.xls` no longer allowed for certain targets)

Documentation
Tests