Add vision_model registration & vision tools for image input support

## Problem

Mainstream DeepSeek models (e.g., `deepseek-v4-pro`) **do not natively support multimodal image input**. Offloading image recognition to a dedicated, fast vision-capable LLM allows the main agent to stay focused on reasoning and coding, without being distracted by visual parsing work.

## Proposed solution

### 1. Clipboard image paste (`Ctrl+V`)

Pressing `Ctrl+V` in the composer reads the system clipboard. If it contains an image (RGBA bitmap), the TUI encodes it as PNG, persists it to `~/.deepseek/clipboard-images/clipboard-{timestamp}.png`, and inserts a text reference into the input buffer:

`[image:/home/user/.deepseek/clipboard-images/clipboard-1715030400123456789.png]`

A status hint is displayed, e.g. `Attached image: 1024x768 PNG (235KB)`.

This works **independently** of `vision_model` — the image is saved to disk and referenced by path so the main model or any sub-agent can read it via standard file tools.

### 2. Dedicated vision model (`[vision_model]`)

Users can configure a standalone [vision_model] in config.toml. Once enabled via the vision_model feature flag, two built-in vision tools become available:

- **`vision_analyze`** — reads an image file from disk, base64-encodes it, sends it to the configured vision model for analysis
- **`vision_ocr`** — delegates to `vision_analyze` with an OCR-specific prompt to extract text from images

The vision model runs with fully **independent session state**, isolated from the main model's context window.

### Configure in config.toml
```
[features]
  vision_model = true 

[vision_model]
  model = ""gemini-3.1-flash-lite-preview"          # vision_model
  provider = "openai"        # optional
  api_key = "..."
  base_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
```

## End-to-end flow

  Ctrl+V (clipboard has image)
    → PNG saved to ~/.deepseek/clipboard-images/
    → [image:path] inserted into composer
    → user types prompt: "what does this screenshot show?"
    → DeepSeek main model decides to call vision_analyze
    → tool reads image file, base64-encodes it
    → independent vision session sends OpenAI-compatible request to Gemini
    → analysis result flows back → main model → user

  Supported image formats

  - Clipboard paste: any RGBA bitmap on the system clipboard
  - File analysis (vision_analyze / vision_ocr): png, jpg/jpeg, gif, webp, bmp, svg

## Additional context

### Input Image

<img width="600" height="400" alt="Image" src="https://github.com/user-attachments/assets/4fbdb577-95a9-4a50-b277-9f136b2e9d8a" />

### Input Prompt

<img width="1277" height="136" alt="Image" src="https://github.com/user-attachments/assets/bfde934e-bdeb-4901-92ff-f9fabcec2f28" />


### Output Result

<img width="1200" height="452" alt="Image" src="https://github.com/user-attachments/assets/3bbf6286-b5dd-4029-9c65-8cdebb40ef27" />



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add vision_model registration & vision tools for image input support #868

Problem

Proposed solution

1. Clipboard image paste (`Ctrl+V`)

2. Dedicated vision model (`[vision_model]`)

Configure in config.toml

End-to-end flow

Additional context

Input Image

Input Prompt

Output Result

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Add vision_model registration & vision tools for image input support #868

Description

Problem

Proposed solution

1. Clipboard image paste (Ctrl+V)

2. Dedicated vision model ([vision_model])

Configure in config.toml

End-to-end flow

Additional context

Input Image

Input Prompt

Output Result

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

1. Clipboard image paste (`Ctrl+V`)

2. Dedicated vision model (`[vision_model]`)