Summary
Add Vision Language Model (VLM) support: accept image input so models can analyze images.
Scope
Supported
- ✅ Image Input (Vision): Users send images for model analysis (GPT-4V, Claude Vision, etc.)
Not Supported (Future)
- ❌ Image Generation (DALL-E)
- ❌ Audio Input/Output
- ❌ Video
- ❌ Realtime/Omni
Requirements
Internal Format Extension
```typescript
// New ImageContentBlock
interface ImageContentBlock {
  type: "image"
  source: {
    type: "base64" | "url"
    mediaType?: string // "image/jpeg", "image/png", etc.
    data?: string      // base64 data
    url?: string       // image URL
  }
  detail?: "auto" | "low" | "high" // OpenAI vision detail
}

// Update the InternalContentBlock union
type InternalContentBlock =
  | TextContentBlock
  | ThinkingContentBlock
  | ToolUseContentBlock
  | ToolResultContentBlock
  | ImageContentBlock // New
```
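For illustration, a user turn mixing text and an image in the extended internal format might look like the following sketch. The `TextContentBlock` shape is a minimal stand-in assumed for this example:

```typescript
// Minimal stand-in for the existing text block (shape assumed for illustration).
interface TextContentBlock { type: "text"; text: string }

interface ImageContentBlock {
  type: "image";
  source: { type: "base64" | "url"; mediaType?: string; data?: string; url?: string };
  detail?: "auto" | "low" | "high";
}

type ContentBlock = TextContentBlock | ImageContentBlock;

// A user turn carrying a question plus an image reference.
const userContent: ContentBlock[] = [
  { type: "text", text: "What is in this image?" },
  {
    type: "image",
    source: { type: "url", url: "https://example.com/cat.jpg", mediaType: "image/jpeg" },
    detail: "auto",
  },
];
```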
Adapter Modifications
Request Adapters (parse image input):
- `openai-chat.ts`: parse the `image_url` content part
- `anthropic.ts`: parse the `image` content block
- `openai-responses.ts`: parse the `input_image` part
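As a sketch of the `openai-chat.ts` side: the OpenAI Chat Completions `image_url` part carries either an https URL or a `data:` URL with inline base64, so the parser can split on that to choose the internal `source.type`. The function name is hypothetical:

```typescript
interface ImageContentBlock {
  type: "image";
  source: { type: "base64" | "url"; mediaType?: string; data?: string; url?: string };
  detail?: "auto" | "low" | "high";
}

// OpenAI Chat Completions image part: { type: "image_url", image_url: { url, detail? } }.
// The url may be an https URL or a data URL ("data:image/png;base64,....").
function parseOpenAIImagePart(part: {
  type: "image_url";
  image_url: { url: string; detail?: "auto" | "low" | "high" };
}): ImageContentBlock {
  const { url, detail } = part.image_url;
  const dataUrl = /^data:([^;]+);base64,(.*)$/s.exec(url);
  if (dataUrl) {
    return {
      type: "image",
      source: { type: "base64", mediaType: dataUrl[1], data: dataUrl[2] },
      detail,
    };
  }
  return { type: "image", source: { type: "url", url }, detail };
}
```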
Upstream Adapters (send to provider):
- `openai.ts`: build the OpenAI Vision request format
- `anthropic.ts`: build the Anthropic Vision request format
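On the upstream side, the Anthropic Messages API expects `{ type: "image", source: { type: "base64", media_type, data } }` (or a URL source), and has no `detail` field, so that field is dropped. A minimal sketch of the `anthropic.ts` mapping; the function name and the `image/jpeg` fallback are assumptions:

```typescript
interface ImageContentBlock {
  type: "image";
  source: { type: "base64" | "url"; mediaType?: string; data?: string; url?: string };
  detail?: "auto" | "low" | "high";
}

// Map the internal block to an Anthropic Messages API image block.
// Note: Anthropic uses snake_case "media_type"; the OpenAI-only "detail" is dropped.
function toAnthropicImageBlock(block: ImageContentBlock):
  | { type: "image"; source: { type: "base64"; media_type: string; data: string } }
  | { type: "image"; source: { type: "url"; url: string } } {
  if (block.source.type === "base64") {
    return {
      type: "image",
      source: {
        type: "base64",
        media_type: block.source.mediaType ?? "image/jpeg", // fallback is an assumption
        data: block.source.data ?? "",
      },
    };
  }
  return { type: "image", source: { type: "url", url: block.source.url ?? "" } };
}
```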
Database Changes
```sql
ALTER TABLE models ADD COLUMN supports_vision BOOLEAN DEFAULT false;
ALTER TABLE models ADD COLUMN max_image_size INTEGER; -- bytes
ALTER TABLE models ADD COLUMN max_images_per_request INTEGER;
```
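These columns enable a pre-flight check before forwarding a request upstream. A hypothetical validator (names and null-means-unlimited semantics are assumptions, not existing code):

```typescript
// Shape of the new model columns (names from the migration above).
interface ModelVisionLimits {
  supports_vision: boolean;
  max_image_size: number | null;         // bytes; null = no limit
  max_images_per_request: number | null; // null = no limit
}

// Returns an error message if the request violates the model's limits, else null.
function validateImages(model: ModelVisionLimits, imageSizes: number[]): string | null {
  if (imageSizes.length === 0) return null; // text-only request, nothing to check
  if (!model.supports_vision) return "model does not support image input";
  if (model.max_images_per_request !== null && imageSizes.length > model.max_images_per_request)
    return `too many images (max ${model.max_images_per_request})`;
  if (model.max_image_size !== null) {
    for (const size of imageSizes)
      if (size > model.max_image_size) return `image exceeds ${model.max_image_size} bytes`;
  }
  return null;
}
```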
Image Handling Considerations
- Base64 encoding inflates payload size by ~33% over the raw bytes
- Large images: consider server-side compression, or rejecting uploads that exceed the model's limit
- JSONB storage: decide whether to persist original image data in message logs (storage cost vs. replayability)
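To enforce a byte limit without decoding, the raw size of a base64 payload can be derived from its length: every 4 base64 characters encode 3 bytes, minus any `=` padding. A small helper sketch:

```typescript
// Approximate decoded byte size of a base64 payload without decoding it:
// 4 base64 characters encode 3 bytes; "=" padding trims the tail.
function base64DecodedSize(b64: string): number {
  const padding = b64.endsWith("==") ? 2 : b64.endsWith("=") ? 1 : 0;
  return Math.floor((b64.length * 3) / 4) - padding;
}
```

Comparing this value against `max_image_size` avoids decoding oversized payloads just to reject them.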