feat: support openrouter gemini 2.5 flash image preview #791
Conversation
Walkthrough

Implements streaming-time image handling in OpenAICompatibleProvider: detects image deltas, caches image URLs via devicePresenter, emits image_data events (with the cached URL, or the original URL on cache failure), and supports Gemini inline image parts by emitting image_data with the provided data and mime type. Regular text delta processing is skipped for chunks containing images.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant L as LLM Stream
    participant P as OpenAICompatibleProvider
    participant D as DevicePresenter
    participant C as Client
    Note over L,P: Streaming chat completion deltas
    L->>P: delta (content/text or image)
    alt delta has image_url
        P->>D: cacheImage(image_url)
        alt cache success
            D-->>P: cachedUrl
            P-->>C: event: image_data(data=cachedUrl, mime=deepchat/image-url)
        else cache failure
            D-->>P: error
            P-->>C: event: image_data(data=image_url, mime=deepchat/image-url)
        end
        Note over P: Skip further delta processing for this chunk
    else delta has Gemini inlineData
        Note over P: parts[].inlineData.{data,mimeType}
        P-->>C: event: image_data(data=inlineData.data, mime=inlineData.mimeType or image/png)
        Note over P: Skip further delta processing for this chunk
    else text/other delta
        P-->>C: existing text/role delta handling
    end
```
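In code terms, both image branches in the diagram reduce to the same event shape with different mime markers. A minimal sketch (the field names match the code quoted later in this review; the surrounding generator context is assumed):

```ts
// URL path: emit the cached URL (or the original URL if caching failed);
// 'deepchat/image-url' tells downstream renderers to resolve the payload by URL.
yield { type: 'image_data', image_data: { data: cachedUrl, mimeType: 'deepchat/image-url' } }

// Inline path: emit the base64 payload with its declared mime type, defaulting to PNG.
yield { type: 'image_data', image_data: { data: part.inlineData.data, mimeType: part.inlineData.mimeType || 'image/png' } }
```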
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (8)
**/*.{js,jsx,ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/development-setup.mdc)
**/*.{js,jsx,ts,tsx}: Use OxLint for linting
Write logs and comments in English
Files:
src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
src/{main,renderer}/**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/electron-best-practices.mdc)
src/{main,renderer}/**/*.ts: Use context isolation for improved security
Implement proper inter-process communication (IPC) patterns
Optimize application startup time with lazy loading
Implement proper error handling and logging for debugging
Files:
src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
src/main/**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/electron-best-practices.mdc)
Use Electron's built-in APIs for file system and native dialogs
From main to renderer, broadcast events via EventBus using mainWindow.webContents.send()
Files:
src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/error-logging.mdc)
**/*.{ts,tsx}: Always use try-catch to handle potential errors
Provide meaningful error messages
Record detailed error logs
Degrade gracefully
Logs should include timestamp, log level, error code, error description, stack trace (where applicable), and relevant context
Log levels should include ERROR, WARN, INFO, DEBUG
Do not swallow errors
Provide user-friendly error messages
Implement error retry mechanisms
Avoid logging sensitive information
Use structured logging
Set appropriate log levels

Enable and adhere to strict TypeScript type checking
Files:
src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
src/main/presenter/llmProviderPresenter/providers/*.ts
📄 CodeRabbit inference engine (.cursor/rules/llm-agent-loop.mdc)
src/main/presenter/llmProviderPresenter/providers/*.ts: Each file in `src/main/presenter/llmProviderPresenter/providers/*.ts` should handle interaction with a specific LLM API, including request/response formatting, tool definition conversion, native/non-native tool call management, and standardizing output streams to a common event format.
Provider implementations must use a `coreStream` method that yields standardized stream events to decouple the main loop from provider-specific details.
The `coreStream` method in each Provider must perform a single streaming API request per conversation round and must not contain multi-round tool call loop logic.
Provider files should implement helper methods such as `formatMessages`, `convertToProviderTools`, `parseFunctionCalls`, and `prepareFunctionCallPrompt` as needed for provider-specific logic.
All provider implementations must parse provider-specific data chunks and yield standardized events for text, reasoning, tool calls, usage, errors, stop reasons, and image data.
When a provider does not support native function calling, it must prepare messages using prompt wrapping (e.g., `prepareFunctionCallPrompt`) before making the API call.
When a provider supports native function calling, MCP tools must be converted to the provider's format (e.g., using `convertToProviderTools`) and included in the API request.
Provider implementations should aggregate and yield usage events as part of the standardized stream.
Provider implementations should yield image data events in the standardized format when applicable.
Provider implementations should yield reasoning events in the standardized format when applicable.
Provider implementations should yield tool call events (`tool_call_start`, `tool_call_chunk`, `tool_call_end`) in the standardized format.
Provider implementations should yield stop events with appropriate `stop_reason` in the standardized format.
Provider implementations should yield error events in the standardized format...
Files:
src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
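To make the event contract concrete, here is a hedged sketch of what the standardized stream union and `coreStream` shape could look like — the event kinds are the ones enumerated in the guidelines above, but the type name and payload fields are illustrative, not the repository's actual definitions:

```ts
// Illustrative only: event kinds mirror the guidelines above; payload shapes are assumed.
type CoreStreamEvent =
  | { type: 'text'; content: string }
  | { type: 'reasoning'; reasoning_content: string }
  | { type: 'tool_call_start'; tool_call_id: string; tool_call_name: string }
  | { type: 'tool_call_chunk'; tool_call_id: string; tool_call_arguments_chunk: string }
  | { type: 'tool_call_end'; tool_call_id: string }
  | { type: 'usage'; usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number } }
  | { type: 'image_data'; image_data: { data: string; mimeType: string } }
  | { type: 'error'; error_message: string }
  | { type: 'stop'; stop_reason: string }

// One streaming API request per conversation round; no multi-round tool-call loop inside.
// async *coreStream(...): AsyncGenerator<CoreStreamEvent>
```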
src/main/**/*.{ts,js,tsx,jsx}
📄 CodeRabbit inference engine (.cursor/rules/project-structure.mdc)
Main process code goes in
src/main
Files:
src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
src/**/*.{ts,tsx,vue}
📄 CodeRabbit inference engine (CLAUDE.md)
Use English for all logs and comments
Files:
src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
src/main/presenter/**/*.ts
📄 CodeRabbit inference engine (CLAUDE.md)
Maintain one presenter per functional domain in src/main/presenter/
Files:
src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
🧠 Learnings (9)
📓 Common learnings
Learnt from: CR
PR: ThinkInAIXYZ/deepchat#0
File: .cursor/rules/llm-agent-loop.mdc:0-0
Timestamp: 2025-07-21T01:46:52.880Z
Learning: Applies to src/main/presenter/llmProviderPresenter/providers/*.ts : Provider implementations should yield image data events in the standardized format when applicable.
📚 Learning: 2025-07-21T01:46:52.880Z
Learnt from: CR
PR: ThinkInAIXYZ/deepchat#0
File: .cursor/rules/llm-agent-loop.mdc:0-0
Timestamp: 2025-07-21T01:46:52.880Z
Learning: Applies to src/main/presenter/llmProviderPresenter/providers/*.ts : Provider implementations should yield image data events in the standardized format when applicable.
Applied to files:
src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
📚 Learning: 2025-07-21T01:46:52.880Z
Learnt from: CR
PR: ThinkInAIXYZ/deepchat#0
File: .cursor/rules/llm-agent-loop.mdc:0-0
Timestamp: 2025-07-21T01:46:52.880Z
Learning: Applies to src/main/presenter/llmProviderPresenter/providers/*.ts : All provider implementations must parse provider-specific data chunks and yield standardized events for text, reasoning, tool calls, usage, errors, stop reasons, and image data.
Applied to files:
src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
📚 Learning: 2025-07-21T01:46:52.880Z
Learnt from: CR
PR: ThinkInAIXYZ/deepchat#0
File: .cursor/rules/llm-agent-loop.mdc:0-0
Timestamp: 2025-07-21T01:46:52.880Z
Learning: Applies to src/main/presenter/llmProviderPresenter/providers/*.ts : Provider implementations must use a `coreStream` method that yields standardized stream events to decouple the main loop from provider-specific details.
Applied to files:
src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
📚 Learning: 2025-08-26T14:13:46.578Z
Learnt from: CR
PR: ThinkInAIXYZ/deepchat#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-26T14:13:46.578Z
Learning: Applies to src/main/presenter/llmProviderPresenter/providers/*.ts : Each LLM provider must implement provider-specific API interactions, convert MCP tools, and normalize streaming responses
Applied to files:
src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
📚 Learning: 2025-08-26T14:13:46.578Z
Learnt from: CR
PR: ThinkInAIXYZ/deepchat#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-26T14:13:46.578Z
Learning: Applies to src/main/presenter/llmProviderPresenter/providers/*.ts : Implement a coreStream method for new providers following the standardized event interface
Applied to files:
src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
📚 Learning: 2025-07-21T01:46:52.880Z
Learnt from: CR
PR: ThinkInAIXYZ/deepchat#0
File: .cursor/rules/llm-agent-loop.mdc:0-0
Timestamp: 2025-07-21T01:46:52.880Z
Learning: Applies to src/main/presenter/llmProviderPresenter/providers/*.ts : The `coreStream` method in each Provider must perform a single streaming API request per conversation round and must not contain multi-round tool call loop logic.
Applied to files:
src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
📚 Learning: 2025-07-21T01:46:52.880Z
Learnt from: CR
PR: ThinkInAIXYZ/deepchat#0
File: .cursor/rules/llm-agent-loop.mdc:0-0
Timestamp: 2025-07-21T01:46:52.880Z
Learning: Applies to src/main/presenter/llmProviderPresenter/providers/*.ts : Provider implementations should aggregate and yield usage events as part of the standardized stream.
Applied to files:
src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
📚 Learning: 2025-07-21T01:46:52.880Z
Learnt from: CR
PR: ThinkInAIXYZ/deepchat#0
File: .cursor/rules/llm-agent-loop.mdc:0-0
Timestamp: 2025-07-21T01:46:52.880Z
Learning: Applies to src/main/presenter/llmProviderPresenter/providers/*.ts : Each file in `src/main/presenter/llmProviderPresenter/providers/*.ts` should handle interaction with a specific LLM API, including request/response formatting, tool definition conversion, native/non-native tool call management, and standardizing output streams to a common event format.
Applied to files:
src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
🧬 Code graph analysis (1)
src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts (1)
src/main/presenter/index.ts (1)
presenter(188-188)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-check (x64)
```ts
// Handle image data (OpenRouter Gemini format)
if (delta?.images && Array.isArray(delta.images)) {
  for (const image of delta.images) {
    if (image.type === 'image_url' && image.image_url?.url) {
      try {
        const cachedUrl = await presenter.devicePresenter.cacheImage(image.image_url.url)
        yield {
          type: 'image_data',
          image_data: {
            data: cachedUrl,
            mimeType: 'deepchat/image-url'
          }
        }
      } catch (cacheError) {
        console.warn('[handleChatCompletion] Failed to cache image:', cacheError)
        yield {
          type: 'image_data',
          image_data: {
            data: image.image_url.url,
            mimeType: 'deepchat/image-url'
          }
        }
      }
    }
  }
  continue
}
```
Avoid early-continue; it can drop finish_reason/tool_calls in mixed chunks
If a chunk contains both images and finish_reason/tool_calls, the early continue skips subsequent handlers, potentially causing missing stop events or tool-call deltas.
Apply this diff to skip only text parsing for the current chunk while still processing tool_calls/finish_reason:
```diff
 // Handle image data (OpenRouter Gemini format)
 if (delta?.images && Array.isArray(delta.images)) {
   for (const image of delta.images) {
     if (image.type === 'image_url' && image.image_url?.url) {
       try {
         const cachedUrl = await presenter.devicePresenter.cacheImage(image.image_url.url)
         yield {
           type: 'image_data',
           image_data: {
             data: cachedUrl,
             mimeType: 'deepchat/image-url'
           }
         }
       } catch (cacheError) {
         console.warn('[handleChatCompletion] Failed to cache image:', cacheError)
         yield {
           type: 'image_data',
           image_data: {
             data: image.image_url.url,
             mimeType: 'deepchat/image-url'
           }
         }
       }
     }
   }
-  continue
+  // Skip only text parsing for this chunk; allow tool_calls/finish_reason handling below.
+  skipTextForChunk = true
 }
```

Add this guard (outside the selected range) right before the character-level text handling:

```ts
// declare near other state vars at the top of the loop scope
let skipTextForChunk = false
```

```diff
- // If there is no content, continue to the next chunk
- if (!currentContent) continue
+ // If there is no content, or text parsing was skipped, continue to the next chunk
+ if (!currentContent || skipTextForChunk) continue
```
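For clarity, a sketch of where the flag and guard sit in the chunk loop — assuming the usual OpenAI-compatible `for await` iteration; everything except `skipTextForChunk` and `currentContent` is illustrative:

```ts
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta
  // Declared per chunk, so the flag resets automatically on each iteration.
  let skipTextForChunk = false

  // ...image handling sets skipTextForChunk = true instead of `continue`...
  // ...tool_calls / finish_reason handling still runs for this chunk...

  const currentContent = delta?.content
  // Skip to the next chunk if there is no content, or text parsing was suppressed.
  if (!currentContent || skipTextForChunk) continue
  // ...character-level text handling...
}
```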
```ts
// Handle Gemini-native image data (inlineData)
if (delta?.content?.parts && Array.isArray(delta.content.parts)) {
  for (const part of delta.content.parts) {
    if (part.inlineData && part.inlineData.data) {
      yield {
        type: 'image_data',
        image_data: {
          data: part.inlineData.data,
          mimeType: part.inlineData.mimeType || 'image/png'
        }
      }
    }
  }
  continue
}
```
🛠️ Refactor suggestion
Standardize inline image mimeType and avoid early-continue
- Inline Gemini images are emitted with mimeType set to the real image type, but elsewhere you use the sentinel 'deepchat/image-base64'. This inconsistency can break downstream renderers expecting the standardized format.
- Same early-continue issue as above.
Apply this diff to emit a consistent mimeType and skip only text parsing:
```diff
 // Handle Gemini-native image data (inlineData)
 if (delta?.content?.parts && Array.isArray(delta.content.parts)) {
   for (const part of delta.content.parts) {
     if (part.inlineData && part.inlineData.data) {
       yield {
         type: 'image_data',
         image_data: {
           data: part.inlineData.data,
-          mimeType: part.inlineData.mimeType || 'image/png'
+          // Use standardized sentinel for base64 payloads for consistency with other paths
+          mimeType: 'deepchat/image-base64'
         }
       }
     }
   }
-  continue
+  // Skip only text parsing for this chunk; allow tool_calls/finish_reason handling below.
+  skipTextForChunk = true
 }
```

Optional improvement: convert inline base64 to a data URI and cache it via devicePresenter to unify URL-based rendering and reduce large in-memory payloads.
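A minimal sketch of that optional improvement — it assumes `devicePresenter.cacheImage` accepts `data:` URIs as well as remote URLs, which the quoted code does not confirm, so verify before adopting:

```ts
// Assumption: cacheImage also handles data: URIs; only remote URLs are shown elsewhere.
const mime = part.inlineData.mimeType || 'image/png'
const dataUri = `data:${mime};base64,${part.inlineData.data}`
try {
  const cachedUrl = await presenter.devicePresenter.cacheImage(dataUri)
  yield { type: 'image_data', image_data: { data: cachedUrl, mimeType: 'deepchat/image-url' } }
} catch (cacheError) {
  console.warn('[handleChatCompletion] Failed to cache inline image:', cacheError)
  // Fall back to the raw base64 payload with the standardized sentinel.
  yield { type: 'image_data', image_data: { data: part.inlineData.data, mimeType: 'deepchat/image-base64' } }
}
```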
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```ts
// Handle Gemini-native image data (inlineData)
if (delta?.content?.parts && Array.isArray(delta.content.parts)) {
  for (const part of delta.content.parts) {
    if (part.inlineData && part.inlineData.data) {
      yield {
        type: 'image_data',
        image_data: {
          data: part.inlineData.data,
          // Use standardized sentinel for base64 payloads for consistency with other paths
          mimeType: 'deepchat/image-base64'
        }
      }
    }
  }
  // Skip only text parsing for this chunk; allow tool_calls/finish_reason handling below.
  skipTextForChunk = true
}
```
🤖 Prompt for AI Agents

```
In src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
around lines 673-688, the handler emits inline Gemini images with their native
mimeType and uses an early continue that skips too much; change it to emit the
sentinel mimeType 'deepchat/image-base64' for consistency (set
image_data.mimeType = 'deepchat/image-base64') and remove the broad continue so
only text parsing is skipped (i.e., process image parts but don't early-return
the whole delta handling), and optionally convert the inline base64 into a data
URI and register/cache it via devicePresenter to enable URL-based rendering and
avoid large in-memory payloads.
```
support openrouter gemini 2.5 flash image preview

Summary by CodeRabbit

- New Features
- Bug Fixes