feat: Vision Language Model (VLM) Support #56

## Summary

Add support for Vision Language Models (VLMs): accept image input and pass it to vision-capable models for analysis.

## Scope

### Supported
- ✅ **Image Input (Vision)**: Users send images for the model to analyze (e.g., GPT-4V/GPT-4o, vision-capable Claude models)

### Not Supported (Future)
- ❌ Image Generation (DALL-E)
- ❌ Audio Input/Output
- ❌ Video
- ❌ Realtime/Omni

## Requirements

### Internal Format Extension
```typescript
// New ImageContentBlock
interface ImageContentBlock {
  type: "image"
  source: {
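    type: "base64" | "url"
    mediaType?: string  // "image/jpeg", "image/png", etc.; required when type is "base64"
    data?: string       // base64 payload; required when type is "base64"
    url?: string        // image URL; required when type is "url"
  }
  detail?: "auto" | "low" | "high"  // OpenAI vision detail level; Anthropic has no equivalent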
}

// Update InternalContentBlock union
type InternalContentBlock =
  | TextContentBlock
  | ThinkingContentBlock
  | ToolUseContentBlock
  | ToolResultContentBlock
  | ImageContentBlock  // New
```
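
Because every field of `source` is optional, the type alone accepts inconsistent blocks, such as a `base64` source with no `data`. A minimal runtime check is sketched below. `validateImageBlock` and `SUPPORTED_MEDIA_TYPES` are hypothetical names, and the allow-list (JPEG, PNG, GIF, WebP) is an assumption based on the formats OpenAI and Anthropic commonly document for vision input.

```typescript
// Mirrors ImageContentBlock from the spec above.
interface ImageContentBlock {
  type: "image"
  source: {
    type: "base64" | "url"
    mediaType?: string
    data?: string
    url?: string
  }
  detail?: "auto" | "low" | "high"
}

// Assumed allow-list; the exact set should follow each provider's documentation.
const SUPPORTED_MEDIA_TYPES: string[] = ["image/jpeg", "image/png", "image/gif", "image/webp"]

// Hypothetical helper: returns a list of problems; an empty array means the block is consistent.
function validateImageBlock(block: ImageContentBlock): string[] {
  const errors: string[] = []
  const { source } = block
  if (source.type === "base64") {
    if (!source.data) errors.push("base64 source requires `data`")
    if (!source.mediaType) {
      errors.push("base64 source requires `mediaType`")
    } else if (SUPPORTED_MEDIA_TYPES.indexOf(source.mediaType) === -1) {
      errors.push(`unsupported mediaType: ${source.mediaType}`)
    }
    if (source.url) errors.push("base64 source must not also set `url`")
  } else {
    if (!source.url) errors.push("url source requires `url`")
    if (source.data) errors.push("url source must not also set `data`")
  }
  return errors
}
```

Request adapters could call a check like this right after parsing and reject the request when the returned list is non-empty.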

### Adapter Modifications

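**Request Adapters** (parse image input):
- `openai-chat.ts`: Parse `image_url` content parts (`{ type: "image_url", image_url: { url, detail } }`, where `url` may be an HTTPS URL or a `data:` URL)
- `anthropic.ts`: Parse `image` content blocks (`source.type` of `base64` or `url`)
- `openai-responses.ts`: Parse `input_image` parts (`image_url`, `detail`)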
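
A sketch of request-side parsing for two of these adapters, assuming the public OpenAI Chat Completions and Anthropic Messages shapes as we understand them. The helper names (`sourceFromUrl`, `parseOpenAIChatImage`, `parseAnthropicImage`) are hypothetical. The main subtlety is that an OpenAI `image_url.url` may carry inline base64 as a `data:` URL, which is split into `mediaType` and `data` so both providers map onto the same internal block.

```typescript
// Internal shape from the spec (same optional-field layout).
interface ImageContentBlock {
  type: "image"
  source: { type: "base64" | "url"; mediaType?: string; data?: string; url?: string }
  detail?: "auto" | "low" | "high"
}

// OpenAI Chat Completions image part.
interface OpenAIImageUrlPart {
  type: "image_url"
  image_url: { url: string; detail?: "auto" | "low" | "high" }
}

// Anthropic Messages image block with a base64 or URL source.
interface AnthropicImageBlock {
  type: "image"
  source:
    | { type: "base64"; media_type: string; data: string }
    | { type: "url"; url: string }
}

// Matches "data:<mediaType>;base64,<payload>".
const DATA_URL_RE = /^data:([^;,]+);base64,(.+)$/

// Hypothetical helper: a data URL becomes a base64 source, anything else stays a URL source.
function sourceFromUrl(url: string): ImageContentBlock["source"] {
  const m = DATA_URL_RE.exec(url)
  return m ? { type: "base64", mediaType: m[1], data: m[2] } : { type: "url", url }
}

function parseOpenAIChatImage(part: OpenAIImageUrlPart): ImageContentBlock {
  const block: ImageContentBlock = { type: "image", source: sourceFromUrl(part.image_url.url) }
  if (part.image_url.detail) block.detail = part.image_url.detail
  return block
}

function parseAnthropicImage(part: AnthropicImageBlock): ImageContentBlock {
  const src = part.source
  return src.type === "base64"
    ? { type: "image", source: { type: "base64", mediaType: src.media_type, data: src.data } }
    : { type: "image", source: { type: "url", url: src.url } }
}
```

The Responses API `input_image.image_url` is also a URL-or-data-URL string, so a helper like `sourceFromUrl` could plausibly be reused in `openai-responses.ts`.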

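**Upstream Adapters** (send to provider):
- `openai.ts`: Build OpenAI `image_url` content parts (base64 images sent as `data:<mediaType>;base64,<data>` URLs)
- `anthropic.ts`: Build Anthropic `image` blocks (`source.media_type` + `source.data`, or `source.url`)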
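
The reverse direction, again as a hedged sketch with hypothetical function names: base64 blocks become `data:` URLs for OpenAI and `media_type`/`data` pairs for Anthropic. `detail` is forwarded only to OpenAI, since Anthropic has no equivalent field.

```typescript
interface ImageContentBlock {
  type: "image"
  source: { type: "base64" | "url"; mediaType?: string; data?: string; url?: string }
  detail?: "auto" | "low" | "high"
}

interface OpenAIImageUrlPart {
  type: "image_url"
  image_url: { url: string; detail?: "auto" | "low" | "high" }
}

interface AnthropicImageBlock {
  type: "image"
  source:
    | { type: "base64"; media_type: string; data: string }
    | { type: "url"; url: string }
}

// OpenAI: base64 payloads are wrapped in a data: URL; detail is passed through.
function toOpenAIImagePart(block: ImageContentBlock): OpenAIImageUrlPart {
  const { source } = block
  let url: string
  if (source.type === "base64") {
    if (!source.data || !source.mediaType) throw new Error("base64 image requires data and mediaType")
    url = `data:${source.mediaType};base64,${source.data}`
  } else {
    if (!source.url) throw new Error("url image requires url")
    url = source.url
  }
  const part: OpenAIImageUrlPart = { type: "image_url", image_url: { url } }
  if (block.detail) part.image_url.detail = block.detail
  return part
}

// Anthropic: base64 maps to media_type + data; detail is dropped (no equivalent).
function toAnthropicImageBlock(block: ImageContentBlock): AnthropicImageBlock {
  const { source } = block
  if (source.type === "base64") {
    if (!source.data || !source.mediaType) throw new Error("base64 image requires data and mediaType")
    return { type: "image", source: { type: "base64", media_type: source.mediaType, data: source.data } }
  }
  if (!source.url) throw new Error("url image requires url")
  return { type: "image", source: { type: "url", url: source.url } }
}
```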

### Database Changes
```sql
ALTER TABLE models ADD COLUMN supports_vision BOOLEAN NOT NULL DEFAULT false;
ALTER TABLE models ADD COLUMN max_image_size INTEGER;  -- bytes
ALTER TABLE models ADD COLUMN max_images_per_request INTEGER;
```
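
One way the new columns could gate a request before it reaches an upstream adapter. This sketch assumes that `NULL` in `max_image_size` / `max_images_per_request` means "no limit" and that sizes are measured on decoded bytes; `ModelVisionLimits`, `ImageInfo`, and `checkVisionRequest` are hypothetical names.

```typescript
// Hypothetical row shape for the new `models` columns (snake_case as in the SQL).
interface ModelVisionLimits {
  supports_vision: boolean
  max_image_size: number | null          // bytes; NULL assumed to mean "no limit"
  max_images_per_request: number | null  // NULL assumed to mean "no limit"
}

// Decoded size of one image in the incoming request.
interface ImageInfo {
  sizeBytes: number
}

// Returns an error message if the request violates the model's limits, else null.
function checkVisionRequest(model: ModelVisionLimits, images: ImageInfo[]): string | null {
  if (images.length === 0) return null  // text-only requests are unaffected
  if (!model.supports_vision) return "model does not support image input"

  const maxCount = model.max_images_per_request
  if (maxCount !== null && images.length > maxCount) {
    return `too many images: ${images.length} > ${maxCount}`
  }

  const maxSize = model.max_image_size
  if (maxSize !== null) {
    for (let i = 0; i < images.length; i++) {
      if (images[i].sizeBytes > maxSize) {
        return `image ${i} exceeds max_image_size (${images[i].sizeBytes} > ${maxSize} bytes)`
      }
    }
  }
  return null
}
```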

### Image Handling Considerations
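- Base64 encoding inflates image payloads by ~33% (every 3 raw bytes become 4 characters)
- Large images: consider server-side downscaling/compression, or rejecting images that exceed the model's `max_image_size`
- JSONB storage: decide whether to persist the original base64 image data, which can bloat stored rows, or only metadata such as media type and size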
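
The ~33% figure follows from base64 mapping every 3 bytes to 4 characters. The sketch below (hypothetical helper names) computes both directions arithmetically, which lets a size limit be enforced on the base64 string without decoding it first.

```typescript
// Length of the padded base64 encoding (no line breaks) of `rawBytes` bytes:
// every 3-byte group becomes 4 characters, so the overhead is ~4/3 (~33%).
function base64EncodedLength(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3)
}

// Number of bytes a base64 string decodes to, computed without decoding it.
// Works for padded and unpadded input; whitespace is ignored.
function base64DecodedLength(b64: string): number {
  const s = b64.replace(/\s/g, "")
  let padding = 0
  if (s.charAt(s.length - 1) === "=") {
    padding++
    if (s.charAt(s.length - 2) === "=") padding++
  }
  return Math.floor((s.length * 3) / 4) - padding
}
```

For example, a 3,000,000-byte image encodes to 4,000,000 base64 characters, and the 12-character string `iVBORw0KGgo=` (the 8-byte PNG signature) decodes to 8 bytes.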

