feat: add Vision Language Model (VLM) support #57
Conversation
Add support for image inputs in chat completions, enabling users to send
images for model analysis (e.g., GPT-4V, Claude Vision, Doubao Vision).
Changes:
- Add ImageContentBlock type to internal adapter types
- Update request adapters to parse image content:
- openai-chat: parse image_url content parts
- anthropic: parse image content blocks with base64 source
- openai-response: parse input_image parts
- Update upstream adapters to build provider-specific formats:
- openai: build content arrays with image_url parts
- anthropic: build image content blocks
- openai-responses: build input_image content
- Update API validation schema to accept content arrays with images
- Add frontend image rendering in request detail panel
Supported image formats:
- Remote URL: {"type": "image_url", "image_url": {"url": "https://..."}}
- Base64 data URL: {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
Refs #56
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
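For illustration, a user message combining text with both supported image forms could be assembled like this — a minimal sketch with assumed local types (`TextPart`, `ImageUrlPart`, and `buildUserMessage` are illustrative, not part of the codebase):

```typescript
// Sketch: building a multimodal chat message using the two image formats above.
type ImageUrlPart = {
  type: "image_url";
  image_url: { url: string; detail?: "auto" | "low" | "high" };
};
type TextPart = { type: "text"; text: string };
type ContentPart = TextPart | ImageUrlPart;

function buildUserMessage(
  text: string,
  imageUrls: string[],
): { role: "user"; content: ContentPart[] } {
  const content: ContentPart[] = [{ type: "text", text }];
  for (const url of imageUrls) {
    content.push({ type: "image_url", image_url: { url } });
  }
  return { role: "user", content };
}

const msg = buildUserMessage("What is in this image?", [
  "https://example.com/cat.png",        // remote URL
  "data:image/png;base64,iVBORw0KGgo=", // base64 data URL
]);
```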
📝 Walkthrough

Introduces ImageContentBlock and ImageSource, adds support for passing and rendering images (base64 or URL) in messages across the request/upstream/response adapters and the frontend, and extends the API schema so message content can be an array of text/image parts.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant ReqAdapter as Request adapter
    participant Internal as Internal format
    participant Upstream as Upstream adapter
    participant Provider
    Client->>ReqAdapter: Send message with text / image_url or image
    ReqAdapter->>Internal: Parse into InternalContentBlock[] (text + ImageContentBlock)
    Internal->>Upstream: Provide mixed content blocks
    Upstream->>Upstream: convertImageToUrl / build provider format (url or base64)
    Upstream->>Provider: Send vision-capable request
    Provider-->>Upstream: Return result
    Upstream-->>Internal: Convert to internal messages (may contain input_image)
    Internal-->>Client: Return message with text and/or images
```
```mermaid
sequenceDiagram
    participant OpenAIResp as OpenAI response parser
    participant InternalMsg as Internal message
    participant Frontend as Frontend view
    OpenAIResp->>OpenAIResp: Detect input_text / input_image parts
    OpenAIResp->>InternalMsg: Build InternalContentBlock[] (text + image)
    InternalMsg->>Frontend: Pass message
    Frontend->>Frontend: getMessageImages extracts image_url parts
    Frontend->>Frontend: Render images with ImageContentDisplay
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Summary of Changes

Hello @pescn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces comprehensive support for Vision Language Models (VLMs) by enabling the system to process, transmit, and display image content. It establishes a unified internal type for images, integrates this capability into the backend adapters for several popular AI providers, and enhances the frontend to render these images. This change significantly expands the application's multimodal interaction capabilities.

Highlights
Code Review
This pull request introduces Vision Language Model (VLM) support, which is a significant feature enhancement. The changes span across backend adapters for handling image content from different providers and frontend components for rendering images. The implementation is comprehensive, and the identified areas for improvement related to type safety, bug prevention in the backend, and refactoring in the frontend will make the VLM support more robust and maintainable.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@backend/src/adapters/request/anthropic.ts`:
- Around line 140-148: The image branch in convertContentBlock and the
AnthropicContentBlock interface assume only base64 sources; update the
AnthropicContentBlock type (lines around the current interface) to allow
source.type = "url" with a url/uri string property, then modify
convertContentBlock's case "image" to check block.source?.type — if "base64" map
to ImageContentBlock with source.type "base64" and data/mediaType, if "url" map
to ImageContentBlock with source.type "url" and the url/uri field set (not
data), ensuring no undefined data is assigned when the Anthropic image source is
a URL.
In `@backend/src/adapters/upstream/anthropic.ts`:
- Around line 204-223: The image base64 branch is currently defaulting
block.source.data to an empty string which can cause Anthropic API rejects;
update the logic around the block.type === "image" / block.source.type ===
"base64" handling so you only push a content image entry when block.source.data
is present and non-empty (remove the || "" default and/or early-skip the image
block if block.source.data is falsy), e.g., validate block.source.data before
calling content.push so content only contains valid base64 payloads.
🧹 Nitpick comments (4)
backend/src/adapters/types.ts (1)
48-60: Consider a discriminated union for stronger type safety. All fields on the `source` object are currently optional, which allows inconsistent states (e.g. `type: "base64"` with `url` set but no `data`). A discriminated union would enforce the correct field combinations.

♻️ Suggested refactor

```diff
 export interface ImageContentBlock {
   type: "image";
-  source: {
-    type: "base64" | "url";
-    mediaType?: string; // "image/jpeg", "image/png", etc.
-    data?: string; // base64 data (when type is "base64")
-    url?: string; // image URL (when type is "url")
-  };
+  source:
+    | {
+        type: "base64";
+        mediaType?: string; // "image/jpeg", "image/png", etc.
+        data: string;
+      }
+    | {
+        type: "url";
+        url: string;
+      };
   detail?: "auto" | "low" | "high"; // OpenAI vision detail level
 }
```

This refactor ensures that:
- `type: "base64"` requires the `data` field
- `type: "url"` requires the `url` field
- missing fields are caught at compile time
frontend/src/pages/requests/detail-panel/pretty-view.tsx (1)
537-551: Consider adding image load error handling. When an image URL is invalid or the image fails to load, users see a broken-image icon. Adding an `onError` handler would improve the experience.

♻️ Suggested improvement

```diff
 <img
   src={url}
   alt={t('pages.requests.detail-panel.pretty-view.UserImage', { defaultValue: 'User provided image' })}
   className="max-h-64 max-w-full object-contain"
   loading="lazy"
+  onError={(e) => {
+    e.currentTarget.style.display = 'none'
+  }}
 />
```

Alternatively, a placeholder icon could be shown in place of the broken image.
backend/src/adapters/upstream/openai-responses.ts (1)
108-124: Consider extracting shared utilities to eliminate duplication. The `convertImageToUrl` and `hasImages` functions have identical implementations in `backend/src/adapters/upstream/openai.ts`. These helpers could be moved to a shared module such as `backend/src/adapters/utils.ts`.

♻️ Suggested refactor

Create `backend/src/adapters/utils.ts`:

```ts
import type { ImageContentBlock, InternalContentBlock } from "./types";

/**
 * Convert image source to URL format
 */
export function convertImageToUrl(block: ImageContentBlock): string {
  if (block.source.type === "url" && block.source.url) {
    return block.source.url;
  }
  // Convert base64 to data URL
  return `data:${block.source.mediaType || "image/jpeg"};base64,${block.source.data}`;
}

/**
 * Check if content blocks contain any images
 */
export function hasImages(content: InternalContentBlock[]): boolean {
  return content.some((b) => b.type === "image");
}
```

Then import these in both upstream adapters.
backend/src/adapters/upstream/openai.ts (1)
141-147: Reconsider the default when `mediaType` is missing. When `mediaType` is undefined the code defaults to `"image/jpeg"`, but the actual image may be PNG, GIF, or WebP. Browsers are usually lenient about data-URL MIME types, but it would be better to preserve the original `mediaType` upstream when parsing the request.
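One hedged option (an illustrative sketch, not existing code) is to guess the media type from well-known base64 magic-byte prefixes before falling back to JPEG:

```typescript
// Sketch: infer a mediaType from base64 magic-byte prefixes instead of
// blindly defaulting to "image/jpeg". The prefixes are the base64 encodings
// of each format's file signature (PNG, JPEG, GIF, RIFF/WebP).
function guessMediaType(base64Data: string): string {
  if (base64Data.startsWith("iVBOR")) return "image/png";
  if (base64Data.startsWith("/9j/")) return "image/jpeg";
  if (base64Data.startsWith("R0lGO")) return "image/gif";
  if (base64Data.startsWith("UklGR")) return "image/webp";
  return "image/jpeg"; // last-resort default, as in the current code
}
```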
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- backend/src/adapters/request/anthropic.ts
- backend/src/adapters/request/openai-chat.ts
- backend/src/adapters/request/openai-response.ts
- backend/src/adapters/response/anthropic.ts
- backend/src/adapters/types.ts
- backend/src/adapters/upstream/anthropic.ts
- backend/src/adapters/upstream/openai-responses.ts
- backend/src/adapters/upstream/openai.ts
- backend/src/api/v1/completions.ts
- frontend/src/pages/requests/detail-panel/pretty-view.tsx
🧰 Additional context used
🧬 Code graph analysis (6)
backend/src/adapters/request/openai-response.ts (1)
- backend/src/adapters/types.ts (2): InternalContentBlock (65-70), ImageContentBlock (51-60)

backend/src/adapters/request/openai-chat.ts (1)
- backend/src/adapters/types.ts (2): InternalContentBlock (65-70), ImageContentBlock (51-60)

backend/src/adapters/upstream/openai.ts (1)
- backend/src/adapters/types.ts (2): ImageContentBlock (51-60), InternalContentBlock (65-70)

backend/src/adapters/request/anthropic.ts (1)
- backend/src/adapters/types.ts (1): ImageContentBlock (51-60)

frontend/src/pages/requests/detail-panel/pretty-view.tsx (1)
- frontend/src/components/ui/indicator-badge.tsx (1): IndicatorBadge (6-16)

backend/src/adapters/upstream/openai-responses.ts (1)
- backend/src/adapters/types.ts (2): ImageContentBlock (51-60), InternalContentBlock (65-70)
🔇 Additional comments (10)
backend/src/adapters/response/anthropic.ts (1)
97-99: LGTM! The `image` content-block type is handled correctly. Anthropic assistant responses never contain image blocks (images appear only in requests), so returning `null` is reasonable and consistent with the `tool_result` handling.

backend/src/adapters/request/openai-chat.ts (1)
126-140: LGTM! The conversion from OpenAI `image_url` content parts to the internal `ImageContentBlock` is implemented correctly. Both the URL source type and the `detail` field are handled and passed through.

backend/src/api/v1/completions.ts (1)
75-102: LGTM! The API validation schema is correctly extended to support VLM image content:
- `tContentPart` defines both text and `image_url` content types
- the `detail` field supports OpenAI's vision detail levels ("auto" | "low" | "high")
- `tMessage.content` stays backward compatible while accepting the new content-array format

frontend/src/pages/requests/detail-panel/pretty-view.tsx (3)
51-58: LGTM! The `ImageContentPart` type definition matches OpenAI's `image_url` content format, including the optional `detail` field.

493-525: LGTM! The function is defensively written, correctly handling the edge cases where `content` is `null`, a `string`, or not an array. Runtime type checks ensure type safety.

163-176: LGTM! The image section's UI is consistent with the tool-call section, and using `IndicatorBadge` to show the image count is good UX.

backend/src/adapters/request/openai-response.ts (1)
109-140: LGTM! The logic is correct: when images are present the function returns `InternalContentBlock[]`, otherwise it keeps the original string-concatenation behavior. This design preserves backward compatibility.

backend/src/adapters/upstream/openai-responses.ts (1)

150-191: LGTM! The message-conversion logic cleanly handles all three cases: string content, arrays containing images, and text-only arrays. The `detail` field is passed through to the upstream API correctly.

backend/src/adapters/upstream/openai.ts (2)
27-38: LGTM! The `OpenAIContentPart` interface correctly models the OpenAI Vision API content format, and `OpenAIMessage.content` is updated to allow string, array, or null, matching the OpenAI Chat Completions API spec.

199-240: LGTM! The message conversion correctly implements the OpenAI Vision API content format: when images are present it builds `image_url` parts, otherwise it joins the text blocks into a string.
- Support URL source type in Anthropic request adapter (not just base64)
- Validate base64 data is present before sending to Anthropic API

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add validation in OpenAI upstream adapter to skip images with missing data
- Add validation in OpenAI Responses upstream adapter for the same
- Prevents invalid data URLs being sent to providers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Use discriminated union for ImageSource to ensure type safety
- When type is "base64", data field is required
- When type is "url", url field is required
- Update request adapters to validate data before creating blocks

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
/gemini review
Code Review
This pull request introduces Vision Language Model (VLM) support, a significant feature enhancement. The changes are comprehensive, touching multiple backend adapters to handle image content and updating the frontend to render images. My review focuses on improving maintainability by addressing code duplication and simplifying complex type checks. I've also identified a potential bug in the Anthropic request adapter's image handling logic that could lead to incorrect data processing. Overall, this is a solid implementation of a complex feature.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@backend/src/adapters/upstream/openai-responses.ts`:
- Around line 153-192: When hasImages(msg.content) is true but convertImageToUrl
filters out all images, contentParts can be empty which would send an empty
content array to the OpenAI Responses API; update the image-processing branch in
openai-responses.ts (symbols: hasImages, convertImageToUrl, contentParts,
msg.content) to check if contentParts.length === 0 before returning and, if
empty, fall back to the text-only behavior used later (build a joined string
from text blocks) or otherwise avoid returning an empty content array (e.g.,
skip image-only return and let the subsequent text-only join handle the
response); ensure the returned object always contains a non-empty content field.
In `@backend/src/adapters/upstream/openai.ts`:
- Around line 202-241: The image-branch builds contentParts (in the hasImages
branch) but may return an empty array if all images lack URLs; instead, after
populating contentParts in the hasImages branch, check if contentParts.length
=== 0 and if so fall back to producing a text string from msg.content (same
logic as the text-only path that creates the content variable) and return {
role: msg.role, content: contentString } rather than returning an empty array;
reference symbols: hasImages, contentParts, convertImageToUrl, msg.content, and
the existing text-join logic that builds content.
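The guard these comments ask for can be sketched roughly as follows, using simplified stand-in types (`Block` and `buildContent` are hypothetical, not the repository's actual symbols):

```typescript
// Sketch of the suggested fallback: never return an empty content array.
// Images without a usable URL are dropped; if that leaves no parts at all,
// fall back to a joined string built from the text blocks.
type Block = { type: "text"; text: string } | { type: "image"; url?: string };

function buildContent(blocks: Block[]): string | Array<Record<string, unknown>> {
  const parts: Array<Record<string, unknown>> = [];
  for (const b of blocks) {
    if (b.type === "text") {
      parts.push({ type: "text", text: b.text });
    } else if (b.url) {
      parts.push({ type: "image_url", image_url: { url: b.url } });
    }
    // image blocks without a URL are skipped entirely
  }
  if (parts.length === 0) {
    // Fallback: join the text blocks (empty string if there are none)
    return blocks
      .filter((b): b is { type: "text"; text: string } => b.type === "text")
      .map((b) => b.text)
      .join("\n");
  }
  return parts;
}
```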
🧹 Nitpick comments (1)
backend/src/adapters/request/anthropic.ts (1)
141-160: The base64 image source is missing data validation. The URL branch validates that `url` exists (line 143), but the base64 branch never validates that `data` exists. This can produce an `ImageContentBlock` containing `undefined` data — allowed by the types, but semantically questionable. Consider matching the URL branch's validation:

♻️ Suggested fix

```diff
 case "image":
   // Handle both base64 and URL source types
   if (block.source?.type === "url" && block.source.url) {
     return {
       type: "image",
       source: {
         type: "url",
         url: block.source.url,
       },
     } as ImageContentBlock;
   }
-  // Default to base64
-  return {
-    type: "image",
-    source: {
-      type: "base64",
-      mediaType: block.source?.media_type,
-      data: block.source?.data,
-    },
-  } as ImageContentBlock;
+  // Default to base64, validate data exists
+  if (block.source?.type === "base64" && block.source.data) {
+    return {
+      type: "image",
+      source: {
+        type: "base64",
+        mediaType: block.source.media_type,
+        data: block.source.data,
+      },
+    } as ImageContentBlock;
+  }
+  // Skip invalid image blocks
+  return null;
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- backend/src/adapters/request/anthropic.ts
- backend/src/adapters/upstream/anthropic.ts
- backend/src/adapters/upstream/openai-responses.ts
- backend/src/adapters/upstream/openai.ts
🧰 Additional context used
🧬 Code graph analysis (3)
backend/src/adapters/upstream/openai-responses.ts (1)
- backend/src/adapters/types.ts (2): ImageContentBlock (51-60), InternalContentBlock (65-70)

backend/src/adapters/upstream/openai.ts (1)
- backend/src/adapters/types.ts (2): ImageContentBlock (51-60), InternalContentBlock (65-70)

backend/src/adapters/request/anthropic.ts (1)
- backend/src/adapters/types.ts (1): ImageContentBlock (51-60)
🔇 Additional comments (10)
backend/src/adapters/upstream/anthropic.ts (2)
29-34: LGTM! The `source` field type on `AnthropicContentBlock` is defined correctly, supporting both `base64` and `url` image source types, consistent with the Anthropic API spec.

204-224: LGTM! The image handling logic is correct:
- base64 sources are validated for a present `data` before sending
- URL sources are validated for a present `url` before sending
- invalid image blocks are silently skipped, avoiding sending bad data to the Anthropic API

The empty-base64-data issue raised in the previous review was fixed in commit 8585bc0.
backend/src/adapters/request/anthropic.ts (2)
7-7: LGTM! The `ImageContentBlock` type is correctly imported to support the new image handling logic.

28-32: LGTM! The `AnthropicContentBlock.source` type is correctly extended to support both `base64` and `url` source types, consistent with the Anthropic API documentation.

backend/src/adapters/upstream/openai-responses.ts (3)
6-19: The new `ImageContentBlock` import makes sense. The subsequent image-conversion logic needs it, and the change is clean.

25-30: Extending `ResponseApiContentPart` to support `input_image` is appropriate. With the added fields it better fits multimodal input.

108-127: Image URL/base64 conversion and empty checks look good. The logic is clear and avoids producing invalid data URLs.

backend/src/adapters/upstream/openai.ts (3)
6-21: The new `ImageContentBlock` import makes sense. The type dependency is explicit and ready for the conversion logic.

27-39: The `OpenAIContentPart` and message `content` extensions go in the right direction. Supporting image content makes the structure fit multimodal messages.

138-157: Image URL/base64 conversion and empty checks look good. The logic is concise and easy to reuse.
- Check source.type === "base64" before processing as base64 in Anthropic adapter
- Handle empty content array edge case in OpenAI upstream adapters
- Fallback to empty string if contentParts is empty

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/src/adapters/request/openai-chat.ts`:
- Around line 126-139: The image handling loop that builds blocks (the for...of
over content) currently always maps image_url parts to source.type="url"; update
the logic in the image branch (where ImageContentBlock is constructed) to detect
data URLs (strings starting with "data:") in part.image_url.url, parse them into
mediaType and base64 payload, and set source to { type: "base64", mediaType,
data } instead of a URL; keep the existing fallback to { type: "url", url } for
non-data URLs and preserve detail, so downstream adapters that expect base64
(e.g., Anthropic) receive the correct shape.
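A data-URL parser along these lines might look like the following sketch (the repository's actual `parseImageUrl` helper, added in a later commit, may differ in shape):

```typescript
// Sketch: split an image reference into a base64 source (for data: URLs)
// or a plain URL source (for everything else), so downstream adapters that
// expect base64 (e.g. Anthropic) receive the correct shape.
type ParsedImageSource =
  | { type: "base64"; mediaType: string; data: string }
  | { type: "url"; url: string };

function parseImageUrl(url: string): ParsedImageSource {
  // data:<mediaType>;base64,<payload>
  const match = /^data:([^;,]+);base64,(.+)$/.exec(url);
  if (match) {
    return { type: "base64", mediaType: match[1], data: match[2] };
  }
  return { type: "url", url };
}
```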
♻️ Duplicate comments (1)
backend/src/adapters/request/anthropic.ts (1)
152-161: The base64 check is not strict enough. Line 153 only checks that `block.source?.data` exists, without verifying `type === "base64"`. A malformed request such as `{ type: 'url', data: '...' }` (missing the `url` field) would be wrongly treated as a base64 image. Check the type explicitly, matching the URL branch:

```diff
- if (block.source?.data) {
+ if (block.source?.type === "base64" && block.source.data) {
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- backend/src/adapters/request/anthropic.ts
- backend/src/adapters/request/openai-chat.ts
- backend/src/adapters/types.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/src/adapters/types.ts
🧰 Additional context used
🧬 Code graph analysis (2)
backend/src/adapters/request/openai-chat.ts (1)
- backend/src/adapters/types.ts (2): InternalContentBlock (74-79), ImageContentBlock (65-69)

backend/src/adapters/request/anthropic.ts (1)
- backend/src/adapters/types.ts (1): ImageContentBlock (65-69)
🔇 Additional comments (3)
backend/src/adapters/request/anthropic.ts (3)
7-7: LGTM! The `ImageContentBlock` type is correctly imported to support the new image handling logic.

28-33: LGTM! The interface is correctly extended to support both `base64` and `url` image source types; the field design follows the Anthropic API spec.

154-161: `mediaType` is optional; no default value is needed. Per the type definitions, `mediaType` is explicitly optional (`mediaType?: string`) in the base64 variant of `ImageSource`, and the incoming `media_type` is optional too. The code correctly passes the source data through: `media_type` is forwarded when present, `undefined` otherwise. That matches the type system's design. If downstream code depends on `mediaType`, that code should handle the optional field — the converter's job is to convert data, not to invent data absent from the source. Likely an incorrect or invalid review comment.
- Create backend/src/adapters/upstream/utils.ts with shared functions - Extract convertImageToUrl and hasImages to shared module - Update openai.ts and openai-responses.ts to use shared utilities - Reduces code duplication and improves maintainability Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixed in commit a66cc43. Extracted shared utility functions to `backend/src/adapters/upstream/utils.ts`:

```ts
// backend/src/adapters/upstream/utils.ts
export function convertImageToUrl(block: ImageContentBlock): string { ... }
export function hasImages(content: InternalContentBlock[]): boolean { ... }
```

Both `openai.ts` and `openai-responses.ts` now import these shared helpers.
/gemini review
Code Review
This pull request introduces Vision Language Model (VLM) support, a significant feature enhancement. The changes are well-architected, with a new unified ImageContentBlock type and updates across various request and upstream adapters to handle image content. The addition of a shared utils.ts for image conversion logic is a good practice for code reuse. The frontend is also correctly updated to display images in the request details view. Overall, this is a solid implementation. I have one suggestion to improve the readability of the image parsing logic on the frontend.
- Add parseImageUrl helper to detect and parse data URLs - Convert data URLs (data:image/...;base64,...) to base64 source type - Ensures cross-provider compatibility (e.g., OpenAI to Anthropic) - Applied to both openai-chat and openai-response adapters Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
- Added `ImageContentBlock` to internal types for unified image handling

Changes
Backend
- backend/src/adapters/types.ts - Added ImageContentBlock type
- backend/src/adapters/request/openai-chat.ts - Parse image_url content parts
- backend/src/adapters/request/anthropic.ts - Handle image type blocks
- backend/src/adapters/request/openai-response.ts - Parse input_image content
- backend/src/adapters/upstream/openai.ts - Build content arrays with image_url parts
- backend/src/adapters/upstream/anthropic.ts - Include image blocks in requests
- backend/src/adapters/upstream/openai-responses.ts - Handle image inputs
- backend/src/adapters/response/anthropic.ts - Handle image type in switch
- backend/src/api/v1/completions.ts - Update validation schema for image content
- frontend/src/pages/requests/detail-panel/pretty-view.tsx - Add ImageContentDisplay component

Test plan
Closes #56
🤖 Generated with Claude Code
Summary by CodeRabbit
Release notes

New features

Compatibility notes