feat: add Vision Language Model (VLM) support #57

Merged

pescn merged 7 commits into main from feat/vlm-support on Jan 24, 2026
Conversation

@pescn (Contributor) commented Jan 24, 2026

Summary

  • Add ImageContentBlock to internal types for unified image handling (see the type sketch after this list)
  • Update request adapters (OpenAI Chat, Anthropic, OpenAI Response API) to parse image content
  • Update upstream adapters (OpenAI, Anthropic, OpenAI Responses) to send images to providers
  • Add frontend image rendering in request detail view
  • Support both remote URL and base64 encoded image formats
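
For reference, here is a sketch of the internal image type in its final, post-review shape (a discriminated union, as adopted in the follow-up commits below); treat it as illustrative rather than the verbatim contents of backend/src/adapters/types.ts:

// Illustrative sketch of the internal image block (backend/src/adapters/types.ts).
// The discriminated union matches the shape adopted after review; details may differ.
export type ImageSource =
  | {
      type: "base64";
      mediaType?: string; // e.g. "image/jpeg", "image/png"
      data: string; // raw base64 payload, without a "data:" prefix
    }
  | {
      type: "url";
      url: string; // remote image URL
    };

export interface ImageContentBlock {
  type: "image";
  source: ImageSource;
  detail?: "auto" | "low" | "high"; // OpenAI vision detail level
}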

Changes

Backend

  • backend/src/adapters/types.ts - Added ImageContentBlock type
  • backend/src/adapters/request/openai-chat.ts - Parse image_url content parts
  • backend/src/adapters/request/anthropic.ts - Handle image type blocks
  • backend/src/adapters/request/openai-response.ts - Parse input_image content
  • backend/src/adapters/upstream/openai.ts - Build content arrays with image_url parts
  • backend/src/adapters/upstream/anthropic.ts - Include image blocks in requests
  • backend/src/adapters/upstream/openai-responses.ts - Handle image inputs
  • backend/src/adapters/response/anthropic.ts - Handle image type in switch
  • backend/src/api/v1/completions.ts - Update validation schema for image content

Frontend

  • frontend/src/pages/requests/detail-panel/pretty-view.tsx - Add ImageContentDisplay component

Test plan

  • Test VLM request with remote image URL
  • Test VLM request with base64 encoded image
  • Verify images render correctly in frontend request detail view
  • Test cross-format: OpenAI format request → Anthropic provider

Closes #56

🤖 Generated with Claude Code

Summary by CodeRabbit

Release notes

  • New features

    • Visual content support: messages can include images, sourced from a URL or from inline data (base64), and can be sent and received mixed with text.
    • Frontend display enhancements: the conversation detail view shows image thumbnails, counts, and the optional detail level (auto/low/high), with direct preview.
  • Compatibility note

    • The message content structure has been extended to hold text or a text-plus-image mix; the related interfaces and display behavior have been adjusted accordingly.


Add support for image inputs in chat completions, enabling users to send
images for model analysis (e.g., GPT-4V, Claude Vision, Doubao Vision).

Changes:
- Add ImageContentBlock type to internal adapter types
- Update request adapters to parse image content:
  - openai-chat: parse image_url content parts
  - anthropic: parse image content blocks with base64 source
  - openai-response: parse input_image parts
- Update upstream adapters to build provider-specific formats:
  - openai: build content arrays with image_url parts
  - anthropic: build image content blocks
  - openai-responses: build input_image content
- Update API validation schema to accept content arrays with images
- Add frontend image rendering in request detail panel

Supported image formats:
- Remote URL: {"type": "image_url", "image_url": {"url": "https://..."}}
- Base64 data URL: {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}

Refs #56

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
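
To make the two supported formats concrete, here is a hypothetical request mixing a text part and an image part in the OpenAI chat format (the model name, endpoint, and image URL are placeholders, not taken from this PR):

// Hypothetical request body mixing text and image parts (OpenAI chat format).
// The model name, endpoint, and image URL are placeholders, not from this PR.
const body = {
  model: "some-vision-model",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What is in this picture?" },
        {
          type: "image_url",
          image_url: {
            url: "https://example.com/cat.png", // or "data:image/png;base64,..."
            detail: "auto", // optional: "auto" | "low" | "high"
          },
        },
      ],
    },
  ],
};

// Assumes a module context where top-level await is available.
const res = await fetch("/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(body),
});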
@coderabbitai Bot commented Jan 24, 2026

Warning

Rate limit exceeded

@pescn has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 12 minutes and 8 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between a66cc43 and 7acbe04.

📒 Files selected for processing (2)
  • backend/src/adapters/request/openai-chat.ts
  • backend/src/adapters/request/openai-response.ts

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Introduces ImageContentBlock and ImageSource, adds support for passing and rendering images (base64 or URL) in messages across the request/upstream/response adapters and the frontend, and extends the API schema so message content can be an array of text/image parts.

Changes

  • Type system (backend/src/adapters/types.ts): adds ImageSource and ImageContentBlock, and extends the InternalContentBlock union to include image blocks (with optional detail).
  • Request adapter, Anthropic (backend/src/adapters/request/anthropic.ts): extends AnthropicContentBlock.source to type "base64" | "url" with the corresponding data/url fields.
  • Request adapters, OpenAI chat / response (backend/src/adapters/request/openai-chat.ts, backend/src/adapters/request/openai-response.ts): supports image_url / input_image parts and builds ImageContentBlock; convertContentParts now returns string | InternalContentBlock[] instead of only string.
  • Upstream adapter, Anthropic (backend/src/adapters/upstream/anthropic.ts): adds an optional source field to AnthropicContentBlock; when building upstream messages, emits base64 or url image blocks based on source.type (skipping invalid data).
  • Upstream adapters, OpenAI / Responses (backend/src/adapters/upstream/openai.ts, backend/src/adapters/upstream/openai-responses.ts): adds OpenAIContentPart / ResponseApiContentPart with image parts (image_url / input_image); OpenAIMessage.content can be a string or an array of content parts; adds convertImageToUrl and hasImages helpers.
  • Response adapter, Anthropic (backend/src/adapters/response/anthropic.ts): treats image blocks in responses as ignorable (consistent with tool_result), avoiding serializing them as text.
  • Upstream utils (backend/src/adapters/upstream/utils.ts): adds and exports convertImageToUrl(block) (base64 → data: URL, or returns the url) and hasImages(content).
  • API layer (backend/src/api/v1/completions.ts): extends the message schema so tMessage.content supports a string, null, or tContentPart[] (text and image_url parts).
  • Frontend UI (frontend/src/pages/requests/detail-panel/pretty-view.tsx): adds ImageContentDisplay and getMessageImages to extract and render image_url parts in the message detail view (supports data: URLs and external URLs); introduces ImageIcon.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant ReqAdapter as Request adapter
    participant Internal as Internal format
    participant Upstream as Upstream adapter
    participant Provider

    Client->>ReqAdapter: Send message with text / image_url or image blocks
    ReqAdapter->>Internal: Parse into InternalContentBlock[] (text + ImageContentBlock)
    Internal->>Upstream: Provide mixed content blocks
    Upstream->>Upstream: convertImageToUrl / build provider format (url or base64)
    Upstream->>Provider: Send vision-capable request
    Provider-->>Upstream: Return result
    Upstream-->>Internal: Convert to internal messages (possibly with input_image)
    Internal-->>Client: Return message with text and/or images

sequenceDiagram
    participant OpenAIResp as OpenAI response parser
    participant InternalMsg as Internal message
    participant Frontend as Frontend view

    OpenAIResp->>OpenAIResp: Detect input_text / input_image parts
    OpenAIResp->>InternalMsg: Build InternalContentBlock[] (text + image)
    InternalMsg->>Frontend: Pass message
    Frontend->>Frontend: getMessageImages extracts image_url parts
    Frontend->>Frontend: Render images with ImageContentDisplay

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 A busy little rabbit, sniffing pixel air,
Text and images run side by side, light leaping in the frame,
URL and Base64 stroll hand in hand,
The adapters gently pass them on,
Vision goes live, hearts aglow 🖼️✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 78.57%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.
✅ Passed checks (4 passed)
  • Description Check ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed: the title 'feat: add Vision Language Model (VLM) support' clearly and accurately summarizes the main objective of the pull request.
  • Linked Issues check ✅ Passed: the pull request implements all coding requirements from issue #56: the ImageContentBlock type with proper structure, request adapters for OpenAI/Anthropic/OpenAI-Responses image parsing, upstream adapters for sending images to providers, and frontend image display.
  • Out of Scope Changes check ✅ Passed: all changes are within scope of VLM image input support. The PR includes internal type extensions, request/upstream adapter modifications, API schema updates, and frontend rendering, all aligned with issue #56 requirements.




@gemini-code-assist

Summary of Changes

Hello @pescn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces comprehensive support for Vision Language Models (VLMs) by enabling the system to process, transmit, and display image content. It establishes a unified internal type for images and integrates this capability across various backend adapters for popular AI providers, as well as enhancing the frontend to visually render these images. This change significantly expands the application's multimodal interaction capabilities.

Highlights

  • Unified Image Handling: Introduced a new ImageContentBlock type to standardize the representation and handling of image content across the backend, enabling consistent processing of visual data.
  • Adapter Integration: Updated request adapters (OpenAI Chat, Anthropic, OpenAI Response API) to correctly parse incoming image content (both URL and base64 formats) and upstream adapters (OpenAI, Anthropic, OpenAI Responses) to send image data to AI providers.
  • Frontend Image Display: Implemented a new ImageContentDisplay component in the frontend to render images within the request detail view, providing visual feedback for VLM interactions.
  • API Schema Update: Modified the API validation schema to officially support image content within messages, ensuring proper data structure for VLM requests.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces Vision Language Model (VLM) support, which is a significant feature enhancement. The changes span across backend adapters for handling image content from different providers and frontend components for rendering images. The implementation is comprehensive, and the identified areas for improvement related to type safety, bug prevention in the backend, and refactoring in the frontend will make the VLM support more robust and maintainable.

Comment thread backend/src/adapters/types.ts Outdated
Comment thread backend/src/adapters/upstream/anthropic.ts Outdated
Comment thread backend/src/adapters/upstream/openai.ts Outdated
Comment thread frontend/src/pages/requests/detail-panel/pretty-view.tsx

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@backend/src/adapters/request/anthropic.ts`:
- Around line 140-148: The image branch in convertContentBlock and the
AnthropicContentBlock interface assume only base64 sources; update the
AnthropicContentBlock type (lines around the current interface) to allow
source.type = "url" with a url/uri string property, then modify
convertContentBlock's case "image" to check block.source?.type — if "base64" map
to ImageContentBlock with source.type "base64" and data/mediaType, if "url" map
to ImageContentBlock with source.type "url" and the url/uri field set (not
data), ensuring no undefined data is assigned when the Anthropic image source is
a URL.

In `@backend/src/adapters/upstream/anthropic.ts`:
- Around line 204-223: The image base64 branch is currently defaulting
block.source.data to an empty string which can cause Anthropic API rejects;
update the logic around the block.type === "image" / block.source.type ===
"base64" handling so you only push a content image entry when block.source.data
is present and non-empty (remove the || "" default and/or early-skip the image
block if block.source.data is falsy), e.g., validate block.source.data before
calling content.push so content only contains valid base64 payloads.
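
A minimal sketch of the guard these prompts describe, with the block types and import path assumed for illustration (the actual adapter code may differ):

// Sketch only: types and import path are assumptions, not the PR's exact code.
import type { ImageContentBlock } from "../types";

// Anthropic-style image block, with the shape assumed for this sketch.
type AnthropicImageBlock = {
  type: "image";
  source:
    | { type: "base64"; media_type: string; data: string }
    | { type: "url"; url: string };
};

// Return null for invalid blocks so the caller can skip them instead of
// sending empty base64 payloads or URLs to the Anthropic API.
function toAnthropicImage(block: ImageContentBlock): AnthropicImageBlock | null {
  if (block.source.type === "base64" && block.source.data) {
    return {
      type: "image",
      source: {
        type: "base64",
        media_type: block.source.mediaType ?? "image/jpeg", // fallback assumption
        data: block.source.data,
      },
    };
  }
  if (block.source.type === "url" && block.source.url) {
    return { type: "image", source: { type: "url", url: block.source.url } };
  }
  return null; // skip invalid image blocks entirely
}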
🧹 Nitpick comments (4)
backend/src/adapters/types.ts (1)

48-60: Consider a discriminated union for stronger type safety.

All fields on the current source object are optional, which allows inconsistent states (e.g., type: "base64" with url set but no data).

A discriminated union would enforce the correct field combinations.

♻️ Suggested refactor
 export interface ImageContentBlock {
   type: "image";
-  source: {
-    type: "base64" | "url";
-    mediaType?: string; // "image/jpeg", "image/png", etc.
-    data?: string; // base64 data (when type is "base64")
-    url?: string; // image URL (when type is "url")
-  };
+  source:
+    | {
+        type: "base64";
+        mediaType?: string; // "image/jpeg", "image/png", etc.
+        data: string;
+      }
+    | {
+        type: "url";
+        url: string;
+      };
   detail?: "auto" | "low" | "high"; // OpenAI vision detail level
 }

This refactor ensures:

  • data is required when type is "base64"
  • url is required when type is "url"
  • missing fields are caught at compile time
frontend/src/pages/requests/detail-panel/pretty-view.tsx (1)

537-551: Consider adding image load error handling.

When the image URL is invalid or the image fails to load, the user sees a broken image icon. Adding an onError handler would provide a better experience.

♻️ Suggested improvement
         <img
           src={url}
           alt={t('pages.requests.detail-panel.pretty-view.UserImage', { defaultValue: 'User provided image' })}
           className="max-h-64 max-w-full object-contain"
           loading="lazy"
+          onError={(e) => {
+            e.currentTarget.style.display = 'none'
+          }}
         />

Alternatively, consider showing a placeholder icon in place of the broken image.

backend/src/adapters/upstream/openai-responses.ts (1)

108-124: Consider extracting shared utility functions to eliminate the duplication.

The convertImageToUrl and hasImages functions have identical implementations in backend/src/adapters/upstream/openai.ts. Consider extracting these utilities into a shared module (e.g., backend/src/adapters/utils.ts).

♻️ Suggested refactor

Create backend/src/adapters/utils.ts:

import type { ImageContentBlock, InternalContentBlock } from "./types";

/**
 * Convert image source to URL format
 */
export function convertImageToUrl(block: ImageContentBlock): string {
  if (block.source.type === "url" && block.source.url) {
    return block.source.url;
  }
  // Convert base64 to data URL
  return `data:${block.source.mediaType || "image/jpeg"};base64,${block.source.data}`;
}

/**
 * Check if content blocks contain any images
 */
export function hasImages(content: InternalContentBlock[]): boolean {
  return content.some((b) => b.type === "image");
}

Then import and use these in both upstream adapters.

backend/src/adapters/upstream/openai.ts (1)

141-147: Reconsider the default when mediaType is missing.

When mediaType is undefined the code defaults to "image/jpeg", but the actual image may be PNG, GIF, or WebP. Browsers are usually lenient about MIME types in data URLs, but it is better to preserve the original mediaType upstream (when parsing the request) whenever possible.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c70250 and 480f8d4.

📒 Files selected for processing (10)
  • backend/src/adapters/request/anthropic.ts
  • backend/src/adapters/request/openai-chat.ts
  • backend/src/adapters/request/openai-response.ts
  • backend/src/adapters/response/anthropic.ts
  • backend/src/adapters/types.ts
  • backend/src/adapters/upstream/anthropic.ts
  • backend/src/adapters/upstream/openai-responses.ts
  • backend/src/adapters/upstream/openai.ts
  • backend/src/api/v1/completions.ts
  • frontend/src/pages/requests/detail-panel/pretty-view.tsx
🧰 Additional context used
🧬 Code graph analysis (6)
backend/src/adapters/request/openai-response.ts (1)
backend/src/adapters/types.ts (2)
  • InternalContentBlock (65-70)
  • ImageContentBlock (51-60)
backend/src/adapters/request/openai-chat.ts (1)
backend/src/adapters/types.ts (2)
  • InternalContentBlock (65-70)
  • ImageContentBlock (51-60)
backend/src/adapters/upstream/openai.ts (1)
backend/src/adapters/types.ts (2)
  • ImageContentBlock (51-60)
  • InternalContentBlock (65-70)
backend/src/adapters/request/anthropic.ts (1)
backend/src/adapters/types.ts (1)
  • ImageContentBlock (51-60)
frontend/src/pages/requests/detail-panel/pretty-view.tsx (1)
frontend/src/components/ui/indicator-badge.tsx (1)
  • IndicatorBadge (6-16)
backend/src/adapters/upstream/openai-responses.ts (1)
backend/src/adapters/types.ts (2)
  • ImageContentBlock (51-60)
  • InternalContentBlock (65-70)
🔇 Additional comments (10)
backend/src/adapters/response/anthropic.ts (1)

97-99: LGTM!

Image content blocks are handled correctly. Anthropic assistant responses do not contain image blocks (images appear only in requests), so returning null is reasonable and consistent with the tool_result handling.

backend/src/adapters/request/openai-chat.ts (1)

126-140: LGTM!

Correctly converts OpenAI image_url content parts to the internal ImageContentBlock. Both the URL source type and the detail field are handled and passed through properly.

backend/src/api/v1/completions.ts (1)

75-102: LGTM!

The API validation schema is correctly extended to support VLM image content:

  • tContentPart defines both text and image_url content types
  • The detail field supports OpenAI's vision detail levels ("auto" | "low" | "high")
  • tMessage.content stays backward compatible while supporting the new content-array format
frontend/src/pages/requests/detail-panel/pretty-view.tsx (3)

51-58: LGTM!

The ImageContentPart type definition matches OpenAI's image_url content format, including the optional detail field.


493-525: LGTM!

The function is defensively written, correctly handling the edge cases where content is null, a string, or not an array. Runtime type checks ensure type safety.
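
As an illustration, a defensive extractor in this spirit might look like the following sketch (the shape is assumed; the PR's getMessageImages may differ):

// Sketch of a defensive image-part extractor; the PR's getMessageImages may differ.
type ImageContentPart = {
  type: "image_url";
  image_url: { url: string; detail?: "auto" | "low" | "high" };
};

function getMessageImages(content: unknown): ImageContentPart[] {
  // content may be null, a plain string, or a malformed value; only arrays qualify.
  if (!Array.isArray(content)) return [];
  return content.filter(
    (part): part is ImageContentPart =>
      typeof part === "object" &&
      part !== null &&
      (part as { type?: unknown }).type === "image_url" &&
      typeof (part as ImageContentPart).image_url?.url === "string",
  );
}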


163-176: LGTM!

The image section's UI is consistent with the tool-call section, and using IndicatorBadge to show the image count is good UX.

backend/src/adapters/request/openai-response.ts (1)

109-140: LGTM!

The function logic is correct: it returns InternalContentBlock[] when images are present and otherwise keeps the original string-concatenation behavior. This design preserves backward compatibility.

backend/src/adapters/upstream/openai-responses.ts (1)

150-191: LGTM!

The message conversion cleanly handles three cases: string content, arrays containing images, and text-only arrays. The detail field is correctly passed through to the upstream API.

backend/src/adapters/upstream/openai.ts (2)

27-38: LGTM!

The OpenAIContentPart interface correctly models the OpenAI Vision API content format, and OpenAIMessage.content is updated to allow a string, an array, or null, matching the OpenAI Chat Completions API spec.


199-240: LGTM!

The message conversion correctly implements the OpenAI Vision API content format: it builds image_url parts when images are present and otherwise joins text blocks into a single string.


Comment thread backend/src/adapters/request/anthropic.ts Outdated
Comment thread backend/src/adapters/upstream/anthropic.ts
pescn and others added 3 commits January 24, 2026 17:16
- Support URL source type in Anthropic request adapter (not just base64)
- Validate base64 data is present before sending to Anthropic API

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add validation in OpenAI upstream adapter to skip images with missing data
- Add validation in OpenAI Responses upstream adapter for the same
- Prevents invalid data URLs being sent to providers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use discriminated union for ImageSource to ensure type safety
- When type is "base64", data field is required
- When type is "url", url field is required
- Update request adapters to validate data before creating blocks

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@pescn (Contributor, Author) commented Jan 24, 2026

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces Vision Language Model (VLM) support, a significant feature enhancement. The changes are comprehensive, touching multiple backend adapters to handle image content and updating the frontend to render images. My review focuses on improving maintainability by addressing code duplication and simplifying complex type checks. I've also identified a potential bug in the Anthropic request adapter's image handling logic that could lead to incorrect data processing. Overall, this is a solid implementation of a complex feature.

Comment thread backend/src/adapters/request/anthropic.ts
Comment thread backend/src/adapters/upstream/openai.ts Outdated
Comment thread frontend/src/pages/requests/detail-panel/pretty-view.tsx

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@backend/src/adapters/upstream/openai-responses.ts`:
- Around line 153-192: When hasImages(msg.content) is true but convertImageToUrl
filters out all images, contentParts can be empty which would send an empty
content array to the OpenAI Responses API; update the image-processing branch in
openai-responses.ts (symbols: hasImages, convertImageToUrl, contentParts,
msg.content) to check if contentParts.length === 0 before returning and, if
empty, fall back to the text-only behavior used later (build a joined string
from text blocks) or otherwise avoid returning an empty content array (e.g.,
skip image-only return and let the subsequent text-only join handle the
response); ensure the returned object always contains a non-empty content field.

In `@backend/src/adapters/upstream/openai.ts`:
- Around line 202-241: The image-branch builds contentParts (in the hasImages
branch) but may return an empty array if all images lack URLs; instead, after
populating contentParts in the hasImages branch, check if contentParts.length
=== 0 and if so fall back to producing a text string from msg.content (same
logic as the text-only path that creates the content variable) and return {
role: msg.role, content: contentString } rather than returning an empty array;
reference symbols: hasImages, contentParts, convertImageToUrl, msg.content, and
the existing text-join logic that builds content.
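
A sketch of the fallback both prompts describe, assuming the shared convertImageToUrl / hasImages helpers this PR later extracts into upstream/utils.ts and a discriminated InternalContentBlock union (both assumptions for illustration):

// Sketch only: import paths and block shapes are assumptions, not the PR's exact code.
import type { InternalContentBlock } from "../types";
import { convertImageToUrl, hasImages } from "./utils";

type OpenAIContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string; detail?: "auto" | "low" | "high" } };

// Build message content, falling back to a text-only string whenever every
// image is filtered out, so an empty content array is never returned.
function buildContent(blocks: InternalContentBlock[]): string | OpenAIContentPart[] {
  const joinText = (): string => {
    const texts: string[] = [];
    for (const b of blocks) {
      if (b.type === "text") texts.push(b.text); // assumes a { type: "text"; text } variant
    }
    return texts.join("\n");
  };

  if (!hasImages(blocks)) return joinText();

  const parts: OpenAIContentPart[] = [];
  for (const b of blocks) {
    if (b.type === "text") {
      parts.push({ type: "text", text: b.text });
    } else if (b.type === "image") {
      const url = convertImageToUrl(b);
      if (url) parts.push({ type: "image_url", image_url: { url, detail: b.detail } });
    }
  }
  return parts.length > 0 ? parts : joinText(); // never emit an empty content array
}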
🧹 Nitpick comments (1)
backend/src/adapters/request/anthropic.ts (1)

141-160: base64 image sources lack data validation.

URL-type image sources are validated for a url (line 143), but base64 sources are never validated for data. This can produce an ImageContentBlock with undefined data, which the types allow but which makes little semantic sense.

Consider mirroring the validation logic of the URL branch:

♻️ Suggested fix
     case "image":
       // Handle both base64 and URL source types
       if (block.source?.type === "url" && block.source.url) {
         return {
           type: "image",
           source: {
             type: "url",
             url: block.source.url,
           },
         } as ImageContentBlock;
       }
-      // Default to base64
-      return {
-        type: "image",
-        source: {
-          type: "base64",
-          mediaType: block.source?.media_type,
-          data: block.source?.data,
-        },
-      } as ImageContentBlock;
+      // Default to base64, validate data exists
+      if (block.source?.type === "base64" && block.source.data) {
+        return {
+          type: "image",
+          source: {
+            type: "base64",
+            mediaType: block.source.media_type,
+            data: block.source.data,
+          },
+        } as ImageContentBlock;
+      }
+      // Skip invalid image blocks
+      return null;
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 480f8d4 and 1e7fcc4.

📒 Files selected for processing (4)
  • backend/src/adapters/request/anthropic.ts
  • backend/src/adapters/upstream/anthropic.ts
  • backend/src/adapters/upstream/openai-responses.ts
  • backend/src/adapters/upstream/openai.ts
🧰 Additional context used
🧬 Code graph analysis (3)
backend/src/adapters/upstream/openai-responses.ts (1)
backend/src/adapters/types.ts (2)
  • ImageContentBlock (51-60)
  • InternalContentBlock (65-70)
backend/src/adapters/upstream/openai.ts (1)
backend/src/adapters/types.ts (2)
  • ImageContentBlock (51-60)
  • InternalContentBlock (65-70)
backend/src/adapters/request/anthropic.ts (1)
backend/src/adapters/types.ts (1)
  • ImageContentBlock (51-60)
🔇 Additional comments (10)
backend/src/adapters/upstream/anthropic.ts (2)

29-34: LGTM!

The source field type on AnthropicContentBlock is defined correctly, supporting both base64 and url image source types, consistent with the Anthropic API spec.


204-224: LGTM!

The image-handling logic is implemented correctly:

  • base64 sources are validated for data before sending
  • URL sources are validated for url before sending
  • Invalid image blocks are silently skipped, so no invalid data is sent to the Anthropic API

The empty base64 data issue raised in the previous review was fixed in commit 8585bc0.

backend/src/adapters/request/anthropic.ts (2)

7-7: LGTM!

Correctly imports the ImageContentBlock type to support the new image-handling logic.


28-32: LGTM!

The AnthropicContentBlock.source type is correctly extended to support both base64 and url source types, consistent with the Anthropic API docs.

backend/src/adapters/upstream/openai-responses.ts (3)

6-19: The new ImageContentBlock import is reasonable.
The image-conversion logic that follows needs it; the change is clean.


25-30: Extending ResponseApiContentPart to support input_image is appropriate.
With the added fields it better fits multimodal input needs.


108-127: Image URL/base64 conversion and the empty checks look good.
The logic is clear and avoids invalid data URLs.

backend/src/adapters/upstream/openai.ts (3)

6-21: The new ImageContentBlock import is reasonable.
The type dependency is explicit and supports the conversion that follows.


27-39: Extending OpenAIContentPart and the message content type is the right direction.
Supporting image content makes the structure fit multimodal message needs.


138-157: Image URL/base64 conversion and the empty checks look good.
The logic is concise and easy to reuse.


Comment thread backend/src/adapters/upstream/openai-responses.ts
Comment thread backend/src/adapters/upstream/openai.ts
- Check source.type === "base64" before processing as base64 in Anthropic adapter
- Handle empty content array edge case in OpenAI upstream adapters
- Fallback to empty string if contentParts is empty

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@backend/src/adapters/request/openai-chat.ts`:
- Around line 126-139: The image handling loop that builds blocks (the for...of
over content) currently always maps image_url parts to source.type="url"; update
the logic in the image branch (where ImageContentBlock is constructed) to detect
data URLs (strings starting with "data:") in part.image_url.url, parse them into
mediaType and base64 payload, and set source to { type: "base64", mediaType,
data } instead of a URL; keep the existing fallback to { type: "url", url } for
non-data URLs and preserve detail, so downstream adapters that expect base64
(e.g., Anthropic) receive the correct shape.
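
A minimal sketch of such a data-URL parser (an assumed implementation; the parseImageUrl added later in this PR may differ in details):

// Sketch of a data-URL parser like the one described above; the PR's actual
// parseImageUrl may differ, e.g. in error handling.
type ImageSource =
  | { type: "base64"; mediaType?: string; data: string }
  | { type: "url"; url: string };

function parseImageUrl(url: string): ImageSource {
  // "data:image/png;base64,iVBORw0..." -> { type: "base64", mediaType, data }
  const match = /^data:([^;,]*);base64,(.*)$/s.exec(url);
  if (match) {
    return { type: "base64", mediaType: match[1] || undefined, data: match[2] };
  }
  // Anything else (including non-base64 data URLs) is treated as a plain URL.
  return { type: "url", url };
}

// parseImageUrl("data:image/png;base64,iVBORw0KGgo=")
//   -> { type: "base64", mediaType: "image/png", data: "iVBORw0KGgo=" }
// parseImageUrl("https://example.com/cat.png")
//   -> { type: "url", url: "https://example.com/cat.png" }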
♻️ Duplicate comments (1)
backend/src/adapters/request/anthropic.ts (1)

152-161: The base64 type check is not strict enough.

Line 153 only checks that block.source?.data exists, without verifying type === "base64". A malformed request such as { type: 'url', data: '...' } (missing the url field) would be incorrectly treated as a base64 image.

Consider checking the type explicitly, matching the URL branch:

-      if (block.source?.data) {
+      if (block.source?.type === "base64" && block.source.data) {
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1e7fcc4 and 88f4b0f.

📒 Files selected for processing (3)
  • backend/src/adapters/request/anthropic.ts
  • backend/src/adapters/request/openai-chat.ts
  • backend/src/adapters/types.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/src/adapters/types.ts
🧰 Additional context used
🧬 Code graph analysis (2)
backend/src/adapters/request/openai-chat.ts (1)
backend/src/adapters/types.ts (2)
  • InternalContentBlock (74-79)
  • ImageContentBlock (65-69)
backend/src/adapters/request/anthropic.ts (1)
backend/src/adapters/types.ts (1)
  • ImageContentBlock (65-69)
🔇 Additional comments (3)
backend/src/adapters/request/anthropic.ts (3)

7-7: LGTM!

Correctly imports the ImageContentBlock type to support the new image-handling logic.


28-33: LGTM!

The interface is correctly extended to support both base64 and url image source types; the field design matches the Anthropic API spec.


154-161: mediaType is an optional field; no default value is needed.

Per the type definition, mediaType is explicitly optional (mediaType?: string) in the base64 variant of ImageSource, and the incoming media_type is optional too. The code correctly passes the source data through: block.source.media_type when present, otherwise undefined.

This matches the type system's design. If downstream code depends on the mediaType field, that code should handle the optional field itself; the converter's job is to convert data, not to add data absent from the source.

Likely an incorrect or invalid review comment.


Comment thread backend/src/adapters/request/openai-chat.ts
- Create backend/src/adapters/upstream/utils.ts with shared functions
- Extract convertImageToUrl and hasImages to shared module
- Update openai.ts and openai-responses.ts to use shared utilities
- Reduces code duplication and improves maintainability

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@pescn (Contributor, Author) commented Jan 24, 2026

Fixed in commit a66cc43. Extracted shared utility functions to backend/src/adapters/upstream/utils.ts:

// backend/src/adapters/upstream/utils.ts
export function convertImageToUrl(block: ImageContentBlock): string { ... }
export function hasImages(content: InternalContentBlock[]): boolean { ... }

Both openai.ts and openai-responses.ts now import from this shared module, eliminating code duplication.

@pescn (Contributor, Author) commented Jan 24, 2026

/gemini review


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces Vision Language Model (VLM) support, a significant feature enhancement. The changes are well-architected, with a new unified ImageContentBlock type and updates across various request and upstream adapters to handle image content. The addition of a shared utils.ts for image conversion logic is a good practice for code reuse. The frontend is also correctly updated to display images in the request details view. Overall, this is a solid implementation. I have one suggestion to improve the readability of the image parsing logic on the frontend.

Comment thread frontend/src/pages/requests/detail-panel/pretty-view.tsx
- Add parseImageUrl helper to detect and parse data URLs
- Convert data URLs (data:image/...;base64,...) to base64 source type
- Ensures cross-provider compatibility (e.g., OpenAI to Anthropic)
- Applied to both openai-chat and openai-response adapters

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@pescn pescn merged commit 50f9873 into main Jan 24, 2026
2 checks passed
@pescn pescn deleted the feat/vlm-support branch January 24, 2026 09:40
@pescn pescn restored the feat/vlm-support branch January 24, 2026 13:37
@pescn pescn deleted the feat/vlm-support branch January 24, 2026 13:38