Add support for Alibaba Tongyi Wanxiang (wan) video models #2139

Closed
feitianbubu wants to merge 6 commits into QuantumNous:main from feitianbubu:pr/add-wan-video

Conversation

@feitianbubu
Contributor

@feitianbubu feitianbubu commented Oct 31, 2025

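  1. Supported models:
{
   "wan2.5-i2v-preview", // Wanxiang 2.5 preview (video with audio), recommended
   "wan2.2-i2v-flash",   // Wanxiang 2.2 fast edition (silent video)
   "wan2.2-i2v-plus",    // Wanxiang 2.2 professional edition (silent video)
   "wanx2.1-i2v-plus",   // Wanxiang 2.1 professional edition (silent video)
   "wanx2.1-i2v-turbo",  // Wanxiang 2.1 fast edition (silent video)
}

  2. Official documentation: https://help.aliyun.com/zh/model-studio/image-to-video-api-reference?spm=a2c4g.11186623.help-menu-2400256.d_2_3_0.2fd96340ai7bZG&scm=20140722.H_2867393._.OR_help-T_cn~zh-V_1
  3. Supports video generation requests made through the OpenAI SDK.
  4. Request format: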

curl http://localhost:3000/v1/videos \
  --request POST \
  --header 'Content-Type: multipart/form-data' \
  --form 'prompt=奇幻艺术' \
  --form 'model=wan2.5-i2v-preview' \
  --form 'seconds=5' \
  --form 'size=1080P' \
  --form 'metadata={"input": {"img_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png","audio_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3"}}'

Vendor-specific extra parameters: pass them as JSON in metadata.
  5. Request example:
image

curl http://localhost:3000/v1/videos \
  --request POST \
  --header 'Content-Type: multipart/form-data' \
  --form 'prompt=奇幻艺术' \
  --form 'model=wan2.5-i2v-preview' \
  --form 'seconds=5' \
  --form 'size=1080P' \
  --form 'metadata={"input": {"img_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/wpimhv/rap.png","audio_url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250925/ozwpvi/rap.mp3"}}'
  6. Query example:
image

curl http://localhost:3000/v1/videos/f06a6f77-e5ab-4eff-8549-3e3eeff4e6ad
  7. Task example:
image

image
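
The curl request in the description can also be issued from Go. The following is a minimal sketch, assuming the gateway listens on http://localhost:3000 as in the examples above. The helper name buildVideoRequest and the placeholder image URL are illustrative only, and no authentication header is set because the description does not show one.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"mime/multipart"
	"net/http"
)

// buildVideoRequest is a hypothetical helper that assembles the same
// multipart form as the curl example: plain fields plus a JSON "metadata" field.
func buildVideoRequest(baseURL string, fields map[string]string, metadata map[string]any) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	for name, value := range fields {
		if err := w.WriteField(name, value); err != nil {
			return nil, err
		}
	}
	// Vendor-specific parameters are sent as a JSON string in "metadata".
	meta, err := json.Marshal(metadata)
	if err != nil {
		return nil, err
	}
	if err := w.WriteField("metadata", string(meta)); err != nil {
		return nil, err
	}
	// Close writes the trailing boundary; without it the body is incomplete.
	if err := w.Close(); err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/videos", &body)
	if err != nil {
		return nil, err
	}
	// FormDataContentType carries the boundary the server needs for parsing.
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	req, err := buildVideoRequest("http://localhost:3000",
		map[string]string{
			"prompt":  "fantasy art",
			"model":   "wan2.5-i2v-preview",
			"seconds": "5",
			"size":    "1080P",
		},
		map[string]any{"input": map[string]any{
			"img_url": "https://example.com/input.png", // placeholder URL
		}},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// To actually send it: resp, err := http.DefaultClient.Do(req)
}
```

Letting the multipart writer produce the Content-Type header avoids having to keep the boundary string in sync by hand.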

@coderabbitai
Contributor

coderabbitai Bot commented Oct 31, 2025

Walkthrough

Adds Ali video generation channel support by introducing a new TaskAdaptor implementation, refactoring form/JSON unmarshalling in common utilities, extending relay integration structures, and updating video proxy handling to support Ali-specific videoURL sources.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Form/JSON unmarshalling refactoring: common/gin.go | Removed encoding/json dependency, refactored Unmarshal calls to pass values directly instead of pointers, introduced processFormMap() helper to consolidate map-to-JSON-to-struct conversion for form data (URL-encoded and multipart), updated parseFormData and parseMultipartFormData to delegate to new helper. |
| Ali channel adaptor: relay/channel/task/ali/adaptor.go, relay/channel/task/ali/constants.go | New Ali video generation adaptor with request/response models (AliVideoRequest, AliVideoInput, AliVideoParameters, AliVideoResponse, AliVideoOutput, AliUsage, AliMetadata), TaskAdaptor implementation handling initialization, request validation, URL/header/body construction, request execution, response parsing, task status retrieval, and OpenAI-like response conversion; ModelList and ChannelName constants defined. |
| Video proxy controller: controller/video_proxy.go | Replaced conditional branch with switch statement to handle Ali channel type, extracting videoURL directly from task.FailReason for Ali (new behavior), retaining existing logic for Gemini and default cases. |
| Relay data structures and integration: relay/common/relay_info.go, relay/common/relay_utils.go, relay/relay_adaptor.go | Extended TaskSubmitReq with Seconds and InputReference fields and custom UnmarshalJSON for flexible metadata handling; updated ValidateMultipartDirect to perform JSON-only unmarshalling with InputReference-to-Images mapping and Seconds fallback logic; extended GetTaskAdaptor in relay_adaptor to return new taskali.TaskAdaptor for Ali channel type. |

Sequence Diagram

sequenceDiagram
    participant Client
    participant Controller as video_proxy
    participant Relay as relay/adaptor
    participant Ali as ali/adaptor
    participant AliAPI as Ali API

    Client->>Controller: Video request (Ali channel)
    Controller->>Relay: GetTaskAdaptor(ChannelTypeAli)
    Relay-->>Controller: taskali.TaskAdaptor
    
    Controller->>Ali: ValidateRequestAndSetAction()
    Ali->>Ali: Parse JSON body, validate model/prompt
    Ali-->>Controller: ✓ Valid
    
    Controller->>Ali: BuildRequestURL()
    Ali-->>Controller: Ali API endpoint
    
    Controller->>Ali: BuildRequestHeader()
    Ali-->>Controller: Headers (Authorization, Content-Type)
    
    Controller->>Ali: BuildRequestBody()
    Ali->>Ali: convertToAliRequest()
    Ali-->>Controller: JSON request body
    
    Controller->>Ali: DoRequest()
    Ali->>AliAPI: POST video generation request
    AliAPI-->>Ali: Response (task_id, output, etc.)
    Ali-->>Controller: HTTP Response
    
    Controller->>Ali: DoResponse()
    Ali->>Ali: validateErrorCode, convertToOpenAIVideo()
    Ali-->>Controller: taskID, taskData, taskErr
    
    Controller-->>Client: Video generation response (OpenAI format)
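
The call order in the diagram can be sketched in Go. This is only an illustration: the method names follow the diagram, but the signatures, the fakeAli type, the submit function, and the placeholder endpoint are all hypothetical and much simpler than the project's real TaskAdaptor interface.

```go
package main

import (
	"errors"
	"fmt"
)

// taskAdaptor is a simplified, hypothetical stand-in for the project's
// TaskAdaptor interface. Method names follow the diagram; signatures do not.
type taskAdaptor interface {
	ValidateRequestAndSetAction(form map[string]string) error
	BuildRequestURL() string
	BuildRequestHeader() map[string]string
	BuildRequestBody(form map[string]string) (string, error)
	DoRequest(url string, header map[string]string, body string) (string, error)
	DoResponse(raw string) (taskID string, err error)
}

// fakeAli records the call order instead of contacting a real API.
type fakeAli struct{ calls []string }

func (f *fakeAli) ValidateRequestAndSetAction(form map[string]string) error {
	f.calls = append(f.calls, "validate")
	if form["model"] == "" || form["prompt"] == "" {
		return errors.New("model and prompt are required")
	}
	return nil
}

func (f *fakeAli) BuildRequestURL() string {
	f.calls = append(f.calls, "url")
	return "https://ali.example.invalid/video" // placeholder endpoint
}

func (f *fakeAli) BuildRequestHeader() map[string]string {
	f.calls = append(f.calls, "header")
	return map[string]string{"Content-Type": "application/json"}
}

func (f *fakeAli) BuildRequestBody(form map[string]string) (string, error) {
	f.calls = append(f.calls, "body")
	return fmt.Sprintf(`{"model":%q}`, form["model"]), nil
}

func (f *fakeAli) DoRequest(url string, header map[string]string, body string) (string, error) {
	f.calls = append(f.calls, "request")
	return "task-123", nil // canned upstream reply
}

func (f *fakeAli) DoResponse(raw string) (string, error) {
	f.calls = append(f.calls, "response")
	if raw == "" {
		return "", errors.New("empty task id")
	}
	return raw, nil
}

// submit drives the adaptor in the order shown in the sequence diagram.
func submit(a taskAdaptor, form map[string]string) (string, error) {
	if err := a.ValidateRequestAndSetAction(form); err != nil {
		return "", err
	}
	url := a.BuildRequestURL()
	header := a.BuildRequestHeader()
	body, err := a.BuildRequestBody(form)
	if err != nil {
		return "", err
	}
	raw, err := a.DoRequest(url, header, body)
	if err != nil {
		return "", err
	}
	return a.DoResponse(raw)
}

func main() {
	f := &fakeAli{}
	id, err := submit(f, map[string]string{"model": "wan2.5-i2v-preview", "prompt": "fantasy art"})
	fmt.Println(id, err, f.calls)
}
```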

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • common/gin.go refactoring: Inspect the new processFormMap() helper logic and ensure Unmarshal pointer/value handling is correct across JSON, form, and multipart paths.
  • Ali adaptor implementation: Review the complete TaskAdaptor implementation, particularly convertToAliRequest() mapping logic, response status translation (convertAliStatus), and OpenAI response conversion (ConvertToOpenAIVideo()).
  • Relay integration: Verify TaskSubmitReq UnmarshalJSON custom logic handles both string and object metadata correctly, and that InputReference-to-Images mapping in ValidateMultipartDirect preserves existing behavior for other channels.

Suggested reviewers

  • seefs001
  • creamlike1024
  • xyfacai

Poem

🐰 A new friend named Ali hops into the relay,
Forms and JSON dance in a fresh, clever way,
Videos flow through adapted streams,
Converting to OpenAI's format with dreams,
The adaptor's complete—let the rebase convey!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The PR title "新增阿里通义万相wan视频模型支持" (Add support for Alibaba Tongyi Wanxiang video models) is directly related to the main objective and changeset. The primary changes introduce a complete Ali video generation adapter (relay/channel/task/ali/adaptor.go, relay/channel/task/ali/constants.go) with model definitions and integration logic, while supporting changes in common/gin.go, relay_info.go, relay_utils.go, and controller/video_proxy.go enable this new provider. The title is concrete, specific, and clearly conveys the core feature being added without vagueness or generic language. |

@feitianbubu feitianbubu changed the title from "Add support for Alibaba Wanxiang (wan) video models" to "Add support for Alibaba Tongyi Wanxiang (wan) video models" Oct 31, 2025
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
relay/common/relay_utils.go (1)

110-176: Function name is misleading AND missing data URL validation on InputReference.

ValidateMultipartDirect now exclusively handles JSON requests for Ali and Sora channels (verified by call sites). However, two issues need correction:

  1. Semantic mismatch: Function name suggests multipart/form-data handling but implementation is JSON-only. Consider renaming to ValidateTextToImageRequest or ValidateDirectImageGenRequest.

  2. Missing data URL validation: Lines 128–130 map InputReference directly to Images array without checking for data: prefix. This is inconsistent with the pattern used in relay/relay_task.go:354 and relay/channel/task/vertex/adaptor.go:322, where data URLs are explicitly prevented. Add validation:

if req.InputReference != "" {
    if !strings.HasPrefix(req.InputReference, "data:") {
        req.Images = []string{req.InputReference}
    }
}
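
A self-contained version of this check is sketched below. The submitReq type is a hypothetical, reduced stand-in for TaskSubmitReq, and mapInputReference is an illustrative name rather than a function from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// submitReq is a hypothetical, reduced stand-in for TaskSubmitReq.
type submitReq struct {
	InputReference string
	Images         []string
}

// mapInputReference copies InputReference into Images unless it is empty
// or a data: URL (an inline base64 payload).
func mapInputReference(req *submitReq) {
	if req.InputReference == "" || strings.HasPrefix(req.InputReference, "data:") {
		return
	}
	req.Images = []string{req.InputReference}
}

func main() {
	ok := &submitReq{InputReference: "https://example.com/input.png"}
	mapInputReference(ok)
	fmt.Println(len(ok.Images)) // the https URL is mapped

	inline := &submitReq{InputReference: "data:image/png;base64,AAAA"}
	mapInputReference(inline)
	fmt.Println(len(inline.Images)) // the data: URL is skipped
}
```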
🧹 Nitpick comments (2)
relay/common/relay_info.go (1)

509-539: Consider logging metadata parsing errors for debugging.

The flexible metadata parsing (supporting both stringified JSON and direct objects) is well-implemented. However, parsing errors are silently ignored, which could make debugging difficult if clients send malformed metadata.

Consider adding debug logging when metadata parsing fails:

 func (t *TaskSubmitReq) UnmarshalJSON(data []byte) error {
 	type Alias TaskSubmitReq
 	aux := &struct {
 		Metadata json.RawMessage `json:"metadata,omitempty"`
 		*Alias
 	}{
 		Alias: (*Alias)(t),
 	}
 
 	if err := common.Unmarshal(data, &aux); err != nil {
 		return err
 	}
 
 	if len(aux.Metadata) > 0 {
 		var metadataStr string
 		if err := common.Unmarshal(aux.Metadata, &metadataStr); err == nil && metadataStr != "" {
 			var metadataObj map[string]interface{}
 			if err := common.Unmarshal([]byte(metadataStr), &metadataObj); err == nil {
 				t.Metadata = metadataObj
 				return nil
+			} else {
+				common.LogDebug(context.Background(), fmt.Sprintf("Failed to parse metadata string: %v", err))
 			}
 		}
 
 		var metadataObj map[string]interface{}
 		if err := common.Unmarshal(aux.Metadata, &metadataObj); err == nil {
 			t.Metadata = metadataObj
+		} else {
+			common.LogDebug(context.Background(), fmt.Sprintf("Failed to parse metadata object: %v", err))
 		}
 	}
 
 	return nil
 }

Note: UnmarshalJSON has no gin context (c) in scope, so the suggestion passes context.Background() instead. It also assumes that a debug logging function such as common.LogDebug exists and that context and fmt are imported. Adjust based on your logging infrastructure.
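
For comparison, here is a stricter variant that returns parse failures as errors instead of logging them. It is a sketch built on the standard encoding/json package rather than the project's common.Unmarshal, and the submitReq type and parse helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// submitReq is a hypothetical, reduced stand-in for TaskSubmitReq.
type submitReq struct {
	Prompt   string         `json:"prompt"`
	Metadata map[string]any `json:"metadata,omitempty"`
}

// UnmarshalJSON accepts metadata as a JSON object or as a string holding one.
func (t *submitReq) UnmarshalJSON(data []byte) error {
	type Alias submitReq // Alias has no methods, so decoding does not recurse
	aux := &struct {
		Metadata json.RawMessage `json:"metadata,omitempty"` // shadows Alias.Metadata
		*Alias
	}{Alias: (*Alias)(t)}
	if err := json.Unmarshal(data, aux); err != nil {
		return err
	}
	if len(aux.Metadata) == 0 || string(aux.Metadata) == "null" {
		return nil
	}
	raw := []byte(aux.Metadata)
	var s string
	if err := json.Unmarshal(raw, &s); err == nil {
		if s == "" {
			return nil
		}
		raw = []byte(s) // metadata was a string; parse its contents next
	}
	var obj map[string]any
	if err := json.Unmarshal(raw, &obj); err != nil {
		return fmt.Errorf("metadata is not a JSON object: %w", err)
	}
	t.Metadata = obj
	return nil
}

// parse is a small helper for the demo below.
func parse(input string) (submitReq, error) {
	var r submitReq
	err := json.Unmarshal([]byte(input), &r)
	return r, err
}

func main() {
	fromObject, _ := parse(`{"prompt":"p","metadata":{"k":"v"}}`)
	fromString, _ := parse(`{"prompt":"p","metadata":"{\"k\":\"v\"}"}`)
	fmt.Println(fromObject.Metadata["k"], fromString.Metadata["k"])

	_, err := parse(`{"prompt":"p","metadata":"not json"}`)
	fmt.Println(err != nil)
}
```

Returning the error makes malformed metadata visible to the client. Whether that is preferable to silently ignoring it depends on how lenient the endpoint is meant to be.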

relay/channel/task/ali/adaptor.go (1)

149-198: Review metadata extraction approach and consider error handling.

The conversion logic is generally solid, but there are a few areas to consider:

  1. Metadata extraction (lines 191-195): The marshal-then-unmarshal approach to extract metadata into the request is clever, but it silently ignores errors and could potentially overwrite previously set default values (like PromptExtend and Watermark). This might be intentional for allowing user overrides, but consider whether this behavior should be documented or if certain fields should be protected from metadata overrides.

  2. Images array handling (line 154): The code maps InputReference to ImgURL, but the TaskSubmitReq also has an Images[] array. From the context in relay_utils.go, InputReference is converted to Images[]. Should this code handle the Images array directly, or is the single InputReference sufficient for Ali's API?

Consider making the metadata extraction more explicit:

 	// Extract extra parameters from metadata
 	if req.Metadata != nil {
-		if metadataBytes, err := common.Marshal(req.Metadata); err == nil {
-			_ = common.Unmarshal(metadataBytes, aliReq)
+		var aliMetadata AliMetadata
+		if metadataBytes, err := common.Marshal(req.Metadata); err == nil {
+			if err := common.Unmarshal(metadataBytes, &aliMetadata); err == nil {
+				// Explicitly map metadata fields to avoid unintended overwrites
+				if aliMetadata.AudioURL != "" {
+					aliReq.Input.AudioURL = aliMetadata.AudioURL
+				}
+				// ... map other fields explicitly
+			}
 		}
 	}

Or, document the current behavior if metadata overrides are intentional.
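
The difference between the two approaches can be shown with a small runnable sketch. The aliParams type, its JSON tags, and both merge functions are hypothetical and only loosely modelled on the fields named in this comment.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// aliParams is a hypothetical, reduced version of the adaptor's request type.
type aliParams struct {
	PromptExtend bool   `json:"prompt_extend"`
	Watermark    bool   `json:"watermark"`
	AudioURL     string `json:"audio_url,omitempty"`
}

// mergeBlind decodes metadata straight onto the request, so every key
// present in metadata overrides the corresponding default.
func mergeBlind(defaults aliParams, metadata map[string]any) (aliParams, error) {
	b, err := json.Marshal(metadata)
	if err != nil {
		return defaults, err
	}
	out := defaults
	if err := json.Unmarshal(b, &out); err != nil {
		return defaults, err
	}
	return out, nil
}

// mergeExplicit copies only allow-listed fields and leaves defaults alone.
func mergeExplicit(defaults aliParams, metadata map[string]any) aliParams {
	out := defaults
	if v, ok := metadata["audio_url"].(string); ok && v != "" {
		out.AudioURL = v
	}
	return out
}

func main() {
	defaults := aliParams{PromptExtend: true}
	metadata := map[string]any{
		"prompt_extend": false,
		"audio_url":     "https://example.com/a.mp3", // placeholder URL
	}
	blind, err := mergeBlind(defaults, metadata)
	if err != nil {
		panic(err)
	}
	explicit := mergeExplicit(defaults, metadata)
	fmt.Println(blind.PromptExtend, explicit.PromptExtend)
}
```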

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fc56f45 and ddf5ea7.

📒 Files selected for processing (7)
  • common/gin.go (4 hunks)
  • controller/video_proxy.go (1 hunk)
  • relay/channel/task/ali/adaptor.go (1 hunk)
  • relay/channel/task/ali/constants.go (1 hunk)
  • relay/common/relay_info.go (3 hunks)
  • relay/common/relay_utils.go (1 hunk)
  • relay/relay_adaptor.go (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-26T09:59:00.337Z
Learnt from: Sh1n3zZ
Repo: QuantumNous/new-api PR: 1659
File: relay/relay_task.go:285-305
Timestamp: 2025-08-26T09:59:00.337Z
Learning: In controller/task_video.go, data: URLs (containing base64 encoded video data) are prevented from being stored in task.FailReason by checking if the URL starts with "data:" before assignment. This same pattern should be applied consistently across the codebase.

Applied to files:

  • controller/video_proxy.go
🧬 Code graph analysis (6)
relay/relay_adaptor.go (3)
constant/channel.go (1)
  • ChannelTypeAli (21-21)
relay/channel/task/ali/adaptor.go (1)
  • TaskAdaptor (105-109)
relay/channel/adapter.go (1)
  • TaskAdaptor (34-53)
relay/common/relay_info.go (1)
common/json.go (1)
  • Unmarshal (9-11)
controller/video_proxy.go (1)
constant/channel.go (2)
  • ChannelTypeGemini (28-28)
  • ChannelTypeAli (21-21)
common/gin.go (1)
common/json.go (2)
  • Unmarshal (9-11)
  • Marshal (21-23)
relay/channel/task/ali/adaptor.go (11)
model/task.go (5)
  • TaskStatus (13-13)
  • TaskStatusQueued (35-35)
  • TaskStatusInProgress (36-36)
  • TaskStatusSuccess (38-38)
  • TaskStatusFailure (37-37)
relay/common/relay_info.go (3)
  • RelayInfo (76-123)
  • TaskSubmitReq (488-499)
  • TaskInfo (541-550)
relay/common/relay_utils.go (1)
  • ValidateMultipartDirect (110-176)
common/gin.go (1)
  • UnmarshalBodyReusable (33-58)
common/json.go (2)
  • Marshal (21-23)
  • Unmarshal (9-11)
relay/channel/api_request.go (1)
  • DoTaskApiRequest (301-323)
service/error.go (1)
  • TaskErrorWrapper (140-157)
dto/openai_video.go (7)
  • NewOpenAIVideo (43-47)
  • OpenAIVideoError (49-52)
  • VideoStatusQueued (10-10)
  • VideoStatusInProgress (11-11)
  • VideoStatusCompleted (12-12)
  • VideoStatusFailed (13-13)
  • VideoStatusUnknown (9-9)
common/utils.go (1)
  • GetTimestamp (266-268)
service/http_client.go (1)
  • GetHttpClient (49-51)
relay/channel/task/ali/constants.go (2)
  • ModelList (3-9)
  • ChannelName (11-11)
relay/common/relay_utils.go (2)
relay/common/relay_info.go (1)
  • TaskSubmitReq (488-499)
common/gin.go (1)
  • UnmarshalBodyReusable (33-58)
🔇 Additional comments (15)
common/gin.go (2)

43-47: LGTM: Corrected pointer semantics.

The change from &v to v in the Unmarshal calls is correct. Since v is already a pointer type (passed by callers as &req), taking its address again (&v) would create a pointer-to-pointer, which is incorrect. The updated code properly passes the pointer directly to Unmarshal.
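
The calling convention can be illustrated with the standard library. This sketch uses encoding/json, not the project's common.Unmarshal wrapper, and the videoReq type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// videoReq is a hypothetical request type used only for this demo.
type videoReq struct {
	Model string `json:"model"`
}

// unmarshalInto forwards v unchanged. Callers pass a pointer such as &req,
// so v already holds the address the decoder must write to.
func unmarshalInto(data []byte, v any) error {
	return json.Unmarshal(data, v)
}

// decodeModel shows the intended call pattern and returns the decoded model.
func decodeModel(body string) (string, error) {
	var req videoReq
	if err := unmarshalInto([]byte(body), &req); err != nil {
		return "", err
	}
	return req.Model, nil
}

// decodeByValue passes a struct value by mistake; encoding/json rejects it
// because a non-pointer gives the decoder nothing to fill in.
func decodeByValue(body string) error {
	var req videoReq
	return unmarshalInto([]byte(body), req)
}

func main() {
	model, err := decodeModel(`{"model":"wan2.2-i2v-flash"}`)
	fmt.Println(model, err)
	fmt.Println(decodeByValue(`{"model":"wan2.2-i2v-flash"}`) != nil)
}
```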


147-159: LGTM: Centralized form-to-JSON conversion.

The new processFormMap helper eliminates code duplication by centralizing the map-to-struct conversion logic via JSON marshaling/unmarshaling, improving maintainability.
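
The map-to-JSON-to-struct idea can be sketched as follows. This is not the PR's processFormMap: the videoForm type, the function names, and the handling of repeated form keys are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// videoForm is a hypothetical target struct; all fields stay strings
// because form values arrive as text.
type videoForm struct {
	Prompt  string `json:"prompt"`
	Model   string `json:"model"`
	Seconds string `json:"seconds"`
	Size    string `json:"size"`
}

// formToStruct flattens form values into a map, marshals the map to JSON,
// and decodes that JSON into out, reusing the struct's json tags.
func formToStruct(values url.Values, out any) error {
	m := make(map[string]any, len(values))
	for key, vs := range values {
		switch len(vs) {
		case 0:
			// no value for this key: skip it
		case 1:
			m[key] = vs[0]
		default:
			m[key] = vs // repeated keys become a JSON array (an assumption)
		}
	}
	b, err := json.Marshal(m)
	if err != nil {
		return err
	}
	return json.Unmarshal(b, out)
}

// parseQuery decodes a URL-encoded form body into a videoForm.
func parseQuery(raw string) (videoForm, error) {
	var f videoForm
	values, err := url.ParseQuery(raw)
	if err != nil {
		return f, err
	}
	err = formToStruct(values, &f)
	return f, err
}

func main() {
	f, err := parseQuery("prompt=fantasy+art&model=wan2.5-i2v-preview&seconds=5&size=1080P")
	fmt.Println(f.Prompt, f.Model, f.Seconds, f.Size, err)
}
```

Routing both URL-encoded and multipart forms through one helper means the json struct tags are the single source of truth for field names.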

relay/relay_adaptor.go (1)

31-31: LGTM: Ali task adaptor integration.

The import and switch case addition follow the established pattern for channel adaptor registration and are correctly implemented.

Also applies to: 137-138

relay/channel/task/ali/constants.go (1)

3-11: LGTM: Ali channel constants.

The model list and channel name declarations are clear and align with the PR objectives. The inline comments (in Chinese) provide helpful context for each model variant.

relay/common/relay_utils.go (1)

128-130: LGTM: InputReference to Images mapping.

The mapping of InputReference to the Images array (when provided) correctly supports the new Ali video generation flow where a single reference input can be specified.

controller/video_proxy.go (1)

94-105: No issues found. Data: URL filtering is already in place across the codebase.

The concern about data: URLs being stored in task.FailReason is already addressed. The relay_task.go:356-363 applies data: URL filtering—if the URL starts with "data:", it's skipped; otherwise, it stores in FailReason. Ali's VideoURL flows through taskResult.Url and is filtered before storage, consistent with the pattern in controller/task_video.go. The video_proxy.go code is safe.

While FailReason semantically implies error information rather than successful URLs, this is an established pattern in the codebase and does not introduce security or functional risks.

relay/common/relay_info.go (2)

4-4: LGTM!

The encoding/json import is appropriately added to support the custom UnmarshalJSON method for flexible metadata parsing.


488-499: LGTM!

The new fields Seconds, InputReference, and Metadata appropriately extend TaskSubmitReq to support video generation workflows. The use of omitempty keeps these fields optional, which is suitable for backward compatibility.

relay/channel/task/ali/adaptor.go (7)

25-100: LGTM!

The Ali API request/response structures are well-defined with appropriate JSON tags and comprehensive field coverage. The use of pointers for optional fields in AliVideoParameters (like Audio *bool) correctly distinguishes between unset and false values.


111-120: Initialization and validation logic looks correct.

The adaptor properly initializes with channel metadata and delegates validation to the common utility. Note: The function name ValidateMultipartDirect might seem contradictory with the comment stating Ali uses JSON format, but from the codebase context, this function actually handles JSON via UnmarshalBodyReusable.


122-132: LGTM!

The request URL and headers are correctly configured for Ali's async video generation API. The X-DashScope-Async: enable header is properly set to enable asynchronous task processing.
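
A minimal sketch of such a request is shown below. The endpoint is a placeholder passed in by the caller, the Bearer scheme for the Authorization header is an assumption, and newAsyncRequest is a hypothetical name; only the X-DashScope-Async: enable header is taken from this comment.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// newAsyncRequest builds a JSON POST with the headers named in the review.
// The endpoint comes from the caller; the real URL is not given here.
func newAsyncRequest(endpoint, apiKey string, body []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey) // scheme is an assumption
	req.Header.Set("Content-Type", "application/json")
	// Asks the upstream API to create an asynchronous task.
	req.Header.Set("X-DashScope-Async", "enable")
	return req, nil
}

func main() {
	req, err := newAsyncRequest("https://ali.example.invalid/video", "sk-placeholder", []byte(`{}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("X-DashScope-Async"))
}
```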


134-148: LGTM!

The request body building correctly unmarshals the task request and delegates to the conversion logic.


200-243: LGTM!

The response handling properly unmarshals the Ali API response, validates the task ID, and converts to OpenAI-compatible format. Error cases are appropriately handled with informative error messages.


246-305: LGTM!

The task fetching and result parsing logic correctly handles Ali's async task API. The status mapping comprehensively covers all Ali task states (PENDING, RUNNING, SUCCEEDED, FAILED, CANCELED, UNKNOWN) and appropriately translates them to internal statuses.
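
A status mapping of this kind might look like the sketch below. The Ali state names come from this comment; the internal status type, its string values, and the choice to treat CANCELED as a failure are assumptions, not the PR's convertAliStatus.

```go
package main

import (
	"fmt"
	"strings"
)

// taskStatus and its values are hypothetical stand-ins for the project's
// internal task statuses.
type taskStatus string

const (
	statusQueued     taskStatus = "QUEUED"
	statusInProgress taskStatus = "IN_PROGRESS"
	statusSuccess    taskStatus = "SUCCESS"
	statusFailure    taskStatus = "FAILURE"
	statusUnknown    taskStatus = "UNKNOWN"
)

// convertStatus maps an Ali task state to an internal status.
// Unrecognised states fall through to statusUnknown.
func convertStatus(aliState string) taskStatus {
	switch strings.ToUpper(aliState) {
	case "PENDING":
		return statusQueued
	case "RUNNING":
		return statusInProgress
	case "SUCCEEDED":
		return statusSuccess
	case "FAILED", "CANCELED": // treating CANCELED as failure is an assumption
		return statusFailure
	default:
		return statusUnknown
	}
}

func main() {
	for _, s := range []string{"PENDING", "RUNNING", "SUCCEEDED", "FAILED", "CANCELED", "UNKNOWN"} {
		fmt.Println(s, "->", convertStatus(s))
	}
}
```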


264-347: LGTM!

The helper methods are well-implemented:

  • GetModelList() and GetChannelName() return appropriate constants
  • ConvertToOpenAIVideo() correctly transforms the stored task data into OpenAI-compatible format, with the video URL properly set in metadata (line 321)
  • convertAliStatus() provides consistent status mapping throughout the adaptor

@seefs001
Collaborator

I've carried your commits over into another PR.

@seefs001 seefs001 closed this Oct 31, 2025