feat: add MinerU document parse channel#4757
Conversation
- Add ChannelTypeMinerU (58) and APITypeMinerU - Add POST /v1/file_parse relay endpoint with multipart support - Implement MinerU adaptor for /file_parse synchronous parsing - model parameter is used for gateway channel selection only - backend parameter is passed through to MinerU directly - Support per-call billing (UsePrice) mechanism - Add channel health check test (/health endpoint) Tested locally against http://192.168.225.193:9000/file_parse
WalkthroughThis PR adds MinerU document parsing support by introducing a new channel type ( ChangesMinerU Document Extraction Integration
Frontend Configuration & OAuth Updates
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
relay/channel/mineru/adaptor.go (1)
187-194: 💤 Low valueClarify the status check comment.
The comment mentions "pending" but the code checks for
"failed". If"pending"also represents an error state for synchronous parsing, it should be checked as well. Otherwise, update the comment to accurately reflect the check.📝 Suggested comment fix
- // Check status - pending means something went wrong since we expect completed + // Check status - failed means processing failed if result.Status == "failed" { return nil, types.NewErrorWithStatusCode( errors.New("mineru processing failed"),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@relay/channel/mineru/adaptor.go` around lines 187 - 194, Decide whether a "pending" status should be treated as an error; if it should, change the condition in the adaptor.go block that checks result.Status to include pending (e.g., result.Status == "failed" || result.Status == "pending") and keep the comment indicating pending means something went wrong; otherwise update the comment to accurately state that the code checks for "failed" (replace "pending" with "failed") so the comment matches the condition that uses result.Status.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@controller/channel-test.go`:
- Line 84: Replace the blocking call at "resp, err := http.Get(healthURL)" with
an http.Client that has a timeout (e.g., client := &http.Client{Timeout: 5 *
time.Second}) and call client.Get(healthURL) instead; ensure you still check err
before using resp, always defer resp.Body.Close() when resp != nil, and consider
making the timeout a constant or config value so the channel test goroutine
cannot hang indefinitely.
In `@middleware/distributor.go`:
- Around line 273-283: ParseMultipartForm's error must be checked and handled:
call c.Request.ParseMultipartForm(64<<20), capture the error returned from
ParseMultipartForm, and if non-nil log the error (use existing logger or
c.Error/c.AbortWithStatusJSON) and avoid proceeding to read
c.Request.MultipartForm; only read MultipartForm.Value["model"] and set
modelRequest.Model and c.Set("relay_mode",
relayconstant.RelayModeDocumentExtract) when parsing succeeded and MultipartForm
is non-nil. Update the block that uses c.Request.ParseMultipartForm,
modelRequest.Model, and relayconstant.RelayModeDocumentExtract to perform the
error check and short-circuit on failure.
In `@relay/channel/mineru/adaptor.go`:
- Around line 104-121: In the loop that iterates fileHeaders, after opening each
file with fileHeader.Open() immediately call defer file.Close() to ensure the
file is always closed on function exit (replace the explicit file.Close() calls
in each error path and at the end of the loop); keep the existing error handling
for writer.CreateFormFile and io.Copy but remove manual closes there so
resources are managed by the deferred file.Close() tied to the file returned by
fileHeader.Open().
In `@relay/document_handler.go`:
- Line 60: The call to service.PostTextConsumeQuota assumes usage is *dto.Usage
and will panic if DoResponse returned another type; change the unchecked
assertion usage.(*dto.Usage) to a checked form: perform a type assertion with
the ok comma form (e.g., u, ok := usage.(*dto.Usage)), check ok (and nil) and
handle the failure path (log/error/return) before calling
service.PostTextConsumeQuota; update any related variable names around
DoResponse and service.PostTextConsumeQuota to use the safely asserted u value.
---
Nitpick comments:
In `@relay/channel/mineru/adaptor.go`:
- Around line 187-194: Decide whether a "pending" status should be treated as an
error; if it should, change the condition in the adaptor.go block that checks
result.Status to include pending (e.g., result.Status == "failed" ||
result.Status == "pending") and keep the comment indicating pending means
something went wrong; otherwise update the comment to accurately state that the
code checks for "failed" (replace "pending" with "failed") so the comment
matches the condition that uses result.Status.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: eff785db-4442-4629-924f-08df0d4d74a6
📒 Files selected for processing (20)
common/api_type.goconstant/api_type.goconstant/channel.gocontroller/channel-test.gocontroller/relay.godto/document.gomiddleware/distributor.gorelay/channel/mineru/adaptor.gorelay/channel/mineru/constants.gorelay/channel/mineru/dto.gorelay/common/relay_info.gorelay/constant/relay_mode.gorelay/document_handler.gorelay/helper/valid_request.gorelay/relay_adaptor.gorouter/relay-router.gotypes/relay_format.goweb/src/constants/channel.constants.jsweb/src/helpers/render.jsxweb/vite.config.js
💤 Files with no reviewable changes (1)
- web/src/helpers/render.jsx
| baseURL = *channel.BaseURL | ||
| } | ||
| healthURL := strings.TrimSuffix(baseURL, "/") + "/health" | ||
| resp, err := http.Get(healthURL) |
There was a problem hiding this comment.
Add timeout to the HTTP GET request.
The http.Get call has no timeout and will block indefinitely if the MinerU backend is unresponsive, potentially hanging the channel test goroutine and degrading the test-all-channels flow.
⏱️ Proposed fix: use http.Client with timeout
- healthURL := strings.TrimSuffix(baseURL, "/") + "/health"
- resp, err := http.Get(healthURL)
+ healthURL := strings.TrimSuffix(baseURL, "/") + "/health"
+ client := &http.Client{Timeout: 10 * time.Second}
+ resp, err := client.Get(healthURL)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| resp, err := http.Get(healthURL) | |
| client := &http.Client{Timeout: 10 * time.Second} | |
| resp, err := client.Get(healthURL) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@controller/channel-test.go` at line 84, Replace the blocking call at "resp,
err := http.Get(healthURL)" with an http.Client that has a timeout (e.g., client
:= &http.Client{Timeout: 5 * time.Second}) and call client.Get(healthURL)
instead; ensure you still check err before using resp, always defer
resp.Body.Close() when resp != nil, and consider making the timeout a constant
or config value so the channel test goroutine cannot hang indefinitely.
| } else if strings.HasPrefix(c.Request.URL.Path, "/v1/file_parse") { | ||
| // MinerU document extract uses multipart form-data | ||
| if c.Request.MultipartForm == nil { | ||
| c.Request.ParseMultipartForm(64 << 20) | ||
| } | ||
| if c.Request.MultipartForm != nil { | ||
| if values, ok := c.Request.MultipartForm.Value["model"]; ok && len(values) > 0 { | ||
| modelRequest.Model = values[0] | ||
| } | ||
| } | ||
| c.Set("relay_mode", relayconstant.RelayModeDocumentExtract) |
There was a problem hiding this comment.
Missing error handling for multipart form parsing.
Line 276 calls ParseMultipartForm but ignores the returned error. If parsing fails (e.g., due to malformed multipart data), the error is silently discarded, and subsequent code at lines 278-282 proceeds to access c.Request.MultipartForm, which may be nil or incomplete. This could lead to unexpected behavior or incomplete model extraction.
🛡️ Proposed fix to handle parsing errors
} else if strings.HasPrefix(c.Request.URL.Path, "/v1/file_parse") {
// MinerU document extract uses multipart form-data
if c.Request.MultipartForm == nil {
- c.Request.ParseMultipartForm(64 << 20)
+ if err := c.Request.ParseMultipartForm(64 << 20); err != nil {
+ // Log the error but continue - model extraction is optional
+ common.SysError(fmt.Sprintf("Failed to parse multipart form: %v", err))
+ }
}
if c.Request.MultipartForm != nil {📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| } else if strings.HasPrefix(c.Request.URL.Path, "/v1/file_parse") { | |
| // MinerU document extract uses multipart form-data | |
| if c.Request.MultipartForm == nil { | |
| c.Request.ParseMultipartForm(64 << 20) | |
| } | |
| if c.Request.MultipartForm != nil { | |
| if values, ok := c.Request.MultipartForm.Value["model"]; ok && len(values) > 0 { | |
| modelRequest.Model = values[0] | |
| } | |
| } | |
| c.Set("relay_mode", relayconstant.RelayModeDocumentExtract) | |
| } else if strings.HasPrefix(c.Request.URL.Path, "/v1/file_parse") { | |
| // MinerU document extract uses multipart form-data | |
| if c.Request.MultipartForm == nil { | |
| if err := c.Request.ParseMultipartForm(64 << 20); err != nil { | |
| // Log the error but continue - model extraction is optional | |
| common.SysError(fmt.Sprintf("Failed to parse multipart form: %v", err)) | |
| } | |
| } | |
| if c.Request.MultipartForm != nil { | |
| if values, ok := c.Request.MultipartForm.Value["model"]; ok && len(values) > 0 { | |
| modelRequest.Model = values[0] | |
| } | |
| } | |
| c.Set("relay_mode", relayconstant.RelayModeDocumentExtract) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@middleware/distributor.go` around lines 273 - 283, ParseMultipartForm's error
must be checked and handled: call c.Request.ParseMultipartForm(64<<20), capture
the error returned from ParseMultipartForm, and if non-nil log the error (use
existing logger or c.Error/c.AbortWithStatusJSON) and avoid proceeding to read
c.Request.MultipartForm; only read MultipartForm.Value["model"] and set
modelRequest.Model and c.Set("relay_mode",
relayconstant.RelayModeDocumentExtract) when parsing succeeded and MultipartForm
is non-nil. Update the block that uses c.Request.ParseMultipartForm,
modelRequest.Model, and relayconstant.RelayModeDocumentExtract to perform the
error check and short-circuit on failure.
| for _, fileHeader := range fileHeaders { | ||
| file, err := fileHeader.Open() | ||
| if err != nil { | ||
| return nil, fmt.Errorf("failed to open file %s: %w", fileHeader.Filename, err) | ||
| } | ||
|
|
||
| part, err := writer.CreateFormFile(fieldName, fileHeader.Filename) | ||
| if err != nil { | ||
| file.Close() | ||
| return nil, fmt.Errorf("failed to create form file for %s: %w", fileHeader.Filename, err) | ||
| } | ||
|
|
||
| if _, err := io.Copy(part, file); err != nil { | ||
| file.Close() | ||
| return nil, fmt.Errorf("failed to copy file %s: %w", fileHeader.Filename, err) | ||
| } | ||
| file.Close() | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Prefer defer file.Close() for safer resource management.
The current code manually closes file in each error path and at the end. Using defer immediately after Open() is safer and prevents resource leaks if the code changes or panics.
♻️ Proposed refactor
for _, fileHeader := range fileHeaders {
file, err := fileHeader.Open()
if err != nil {
return nil, fmt.Errorf("failed to open file %s: %w", fileHeader.Filename, err)
}
+ defer file.Close()
part, err := writer.CreateFormFile(fieldName, fileHeader.Filename)
if err != nil {
- file.Close()
return nil, fmt.Errorf("failed to create form file for %s: %w", fileHeader.Filename, err)
}
if _, err := io.Copy(part, file); err != nil {
- file.Close()
return nil, fmt.Errorf("failed to copy file %s: %w", fileHeader.Filename, err)
}
- file.Close()
}
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@relay/channel/mineru/adaptor.go` around lines 104 - 121, In the loop that
iterates fileHeaders, after opening each file with fileHeader.Open() immediately
call defer file.Close() to ensure the file is always closed on function exit
(replace the explicit file.Close() calls in each error path and at the end of
the loop); keep the existing error handling for writer.CreateFormFile and
io.Copy but remove manual closes there so resources are managed by the deferred
file.Close() tied to the file returned by fileHeader.Open().
| return newAPIError | ||
| } | ||
|
|
||
| service.PostTextConsumeQuota(c, info, usage.(*dto.Usage), nil) |
There was a problem hiding this comment.
Add type assertion safety check.
The type assertion usage.(*dto.Usage) will panic if DoResponse returns a different type. Use a checked assertion instead.
🛡️ Proposed fix
- service.PostTextConsumeQuota(c, info, usage.(*dto.Usage), nil)
- return nil
+ dtoUsage, ok := usage.(*dto.Usage)
+ if !ok {
+ return types.NewError(fmt.Errorf("invalid usage type, expected *dto.Usage, got %T", usage), types.ErrorCodeInvalidRequest, types.ErrOptionWithSkipRetry())
+ }
+ service.PostTextConsumeQuota(c, info, dtoUsage, nil)
+ return nil📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| service.PostTextConsumeQuota(c, info, usage.(*dto.Usage), nil) | |
| dtoUsage, ok := usage.(*dto.Usage) | |
| if !ok { | |
| return types.NewError(fmt.Errorf("invalid usage type, expected *dto.Usage, got %T", usage), types.ErrorCodeInvalidRequest, types.ErrOptionWithSkipRetry()) | |
| } | |
| service.PostTextConsumeQuota(c, info, dtoUsage, nil) | |
| return nil |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@relay/document_handler.go` at line 60, The call to
service.PostTextConsumeQuota assumes usage is *dto.Usage and will panic if
DoResponse returned another type; change the unchecked assertion
usage.(*dto.Usage) to a checked form: perform a type assertion with the ok comma
form (e.g., u, ok := usage.(*dto.Usage)), check ok (and nil) and handle the
failure path (log/error/return) before calling service.PostTextConsumeQuota;
update any related variable names around DoResponse and
service.PostTextConsumeQuota to use the safely asserted u value.
There was a problem hiding this comment.
Pull request overview
This PR adds a new “MinerU” relay channel to support synchronous document parsing via a new multipart endpoint (POST /v1/file_parse), including backend wiring for relay format/mode routing and basic channel health checking.
Changes:
- Introduces
RelayFormatDocumentExtract/RelayModeDocumentExtractand wires/v1/file_parseinto the relay router + request validation + handler dispatch. - Adds a new MinerU channel/adaptor implementation that proxies multipart uploads to MinerU’s
/file_parseand bills per call via fixed usage. - Updates channel type/API type constants and the web UI channel list (plus a Vite alias tweak).
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| web/vite.config.js | Adds an explicit alias for Semi UI CSS resolution. |
| web/src/helpers/render.jsx | Removes LinkedIn icon import/mapping. |
| web/src/constants/channel.constants.js | Adds MinerU (58) to UI channel options. |
| types/relay_format.go | Adds RelayFormatDocumentExtract. |
| router/relay-router.go | Adds POST /v1/file_parse relay route. |
| relay/relay_adaptor.go | Registers MinerU adaptor selection by API type. |
| relay/helper/valid_request.go | Adds multipart validation for document extract requests. |
| relay/document_handler.go | Adds relay handler helper for document extraction flow. |
| relay/constant/relay_mode.go | Adds RelayModeDocumentExtract and path mapping. |
| relay/common/relay_info.go | Adds GenRelayInfoDocumentExtract and format dispatch. |
| relay/channel/mineru/dto.go | Defines MinerU /file_parse response DTOs. |
| relay/channel/mineru/constants.go | Adds MinerU model list + channel name. |
| relay/channel/mineru/adaptor.go | Implements multipart proxying + response passthrough + per-call usage for MinerU. |
| middleware/distributor.go | Extracts model from multipart form for /v1/file_parse channel selection. |
| dto/document.go | Adds DocumentExtractRequest request type. |
| controller/relay.go | Routes RelayModeDocumentExtract to the new helper. |
| controller/channel-test.go | Adds MinerU /health connectivity test path. |
| constant/channel.go | Adds ChannelTypeMinerU (58), base URL slot, and name. |
| constant/api_type.go | Adds APITypeMinerU. |
| common/api_type.go | Maps MinerU channel type to API type. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } else if strings.HasPrefix(c.Request.URL.Path, "/v1/file_parse") { | ||
| // MinerU document extract uses multipart form-data | ||
| if c.Request.MultipartForm == nil { | ||
| c.Request.ParseMultipartForm(64 << 20) | ||
| } |
| healthURL := strings.TrimSuffix(baseURL, "/") + "/health" | ||
| resp, err := http.Get(healthURL) | ||
| if err != nil { | ||
| return testResult{ | ||
| localErr: fmt.Errorf("mineru health check failed: %w", err), | ||
| } |
| var requestBuf bytes.Buffer | ||
| writer := multipart.NewWriter(&requestBuf) | ||
|
|
| } | ||
|
|
||
| part, err := writer.CreateFormFile(fieldName, fileHeader.Filename) | ||
| if err != nil { | ||
| file.Close() | ||
| return nil, fmt.Errorf("failed to create form file for %s: %w", fileHeader.Filename, err) | ||
| } |
|
|
||
| // Read response body | ||
| body, readErr := io.ReadAll(resp.Body) | ||
| if readErr != nil { | ||
| return nil, types.NewError(fmt.Errorf("failed to read response body: %w", readErr), types.ErrorCodeDoRequestFailed) | ||
| } | ||
| resp.Body.Close() |
| } | ||
| } | ||
| defer resp.Body.Close() | ||
| body, _ := io.ReadAll(resp.Body) |
|
误触创建,关闭。 |
Summary
Test Plan
Summary by CodeRabbit
Release Notes
New Features
/v1/file_parseendpoint for document extraction integrationRemoved Features