
Conversation

@yyhhyyyyyy (Collaborator) commented Aug 27, 2025

support openrouter gemini 2.5 flash image preview

Summary by CodeRabbit

  • New Features

    • Streamed responses now include images, emitted as image data events.
    • Supports image URLs in OpenAI‑compatible streams and inline image data from Gemini.
    • Provides consistent mime types for image data (e.g., image/png or image-url).
    • Caches image URLs for faster loading when possible.
  • Bug Fixes

    • Improved resilience: if image caching fails, the original URL is used so images still render without interruption.

@coderabbitai coderabbitai bot (Contributor) commented Aug 27, 2025

Walkthrough

Implements streaming-time image handling in OpenAICompatibleProvider: detects image deltas, caches image URLs via devicePresenter, emits image_data events (with cached URL or original), and supports Gemini inline image parts by emitting image_data with provided data and mime type. Regular text delta processing is skipped for chunks containing images.
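The standardized event shape implied by this walkthrough can be sketched as follows. This is a minimal illustration: the `ImageDataEvent` name and `makeImageEvent` helper are invented for the sketch; only the `type`, `image_data`, `data`, and `mimeType` fields appear in the PR diff.

```typescript
// Sketch of the standardized image stream event, inferred from the PR diff.
// ImageDataEvent and makeImageEvent are illustrative names, not provider code.
type ImageDataEvent = {
  type: 'image_data'
  image_data: {
    data: string // cached URL, original URL, or base64 payload
    mimeType: string // e.g. 'deepchat/image-url' or 'image/png'
  }
}

function makeImageEvent(data: string, mimeType: string): ImageDataEvent {
  return { type: 'image_data', image_data: { data, mimeType } }
}

const ev = makeImageEvent('https://example.com/cat.png', 'deepchat/image-url')
console.log(ev.image_data.mimeType) // deepchat/image-url
```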

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| **OpenAI-compatible provider image streaming**<br>`src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts` | Added logic in `handleChatCompletion` to detect image deltas from OpenAI-compatible streams and Gemini parts, cache image URLs via `presenter.devicePresenter.cacheImage`, and emit `image_data` events with the appropriate `mimeType`. Added a warning and fallback when caching fails. No exported API changes. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant L as LLM Stream
  participant P as OpenAICompatibleProvider
  participant D as DevicePresenter
  participant C as Client

  Note over L,P: Streaming chat completion deltas

  L->>P: delta (content/text or image)
  alt delta has image_url
    P->>D: cacheImage(image_url)
    alt cache success
      D-->>P: cachedUrl
      P-->>C: event: image_data(data=cachedUrl, mime=deepchat/image-url)
    else cache failure
      D-->>P: error
      P-->>C: event: image_data(data=image_url, mime=deepchat/image-url)
    end
    Note over P: Skip further delta processing for this chunk
  else delta has Gemini inlineData
    Note over P: parts[].inlineData.{data,mimeType}
    P-->>C: event: image_data(data=inlineData.data, mime=inlineData.mimeType or image/png)
    Note over P: Skip further delta processing for this chunk
  else text/other delta
    P-->>C: existing text/role delta handling
  end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • zerob13

Poem

A nibble, a hop, a stream in the night,
Images peek in the token-light—delight! 🥕
Cache if you can, fallback if you must,
Emit with mime, in stream we trust.
Little ears perk at each data clue—
Picture by picture, the chat hops through.


@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3c3cc80 and 99074d8.

📒 Files selected for processing (1)
  • src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (8)
**/*.{js,jsx,ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/development-setup.mdc)

**/*.{js,jsx,ts,tsx}: Use OxLint for code linting
Write logs and comments in English

Files:

  • src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
src/{main,renderer}/**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/electron-best-practices.mdc)

src/{main,renderer}/**/*.ts: Use context isolation for improved security
Implement proper inter-process communication (IPC) patterns
Optimize application startup time with lazy loading
Implement proper error handling and logging for debugging

Files:

  • src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
src/main/**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/electron-best-practices.mdc)

Use Electron's built-in APIs for file system and native dialogs

From main to renderer, broadcast events via EventBus using mainWindow.webContents.send()

Files:

  • src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/error-logging.mdc)

**/*.{ts,tsx}: Always use try-catch to handle potential errors
Provide meaningful error messages
Record detailed error logs
Degrade gracefully
Logs should include timestamp, log level, error code, error description, stack trace (where applicable), and relevant context
Log levels should include ERROR, WARN, INFO, DEBUG
Do not swallow errors
Provide user-friendly error messages
Implement error retry mechanisms
Avoid logging sensitive information
Use structured logging
Set appropriate log levels

Enable and adhere to strict TypeScript type checking

Files:

  • src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
src/main/presenter/llmProviderPresenter/providers/*.ts

📄 CodeRabbit inference engine (.cursor/rules/llm-agent-loop.mdc)

src/main/presenter/llmProviderPresenter/providers/*.ts: Each file in src/main/presenter/llmProviderPresenter/providers/*.ts should handle interaction with a specific LLM API, including request/response formatting, tool definition conversion, native/non-native tool call management, and standardizing output streams to a common event format.
Provider implementations must use a coreStream method that yields standardized stream events to decouple the main loop from provider-specific details.
The coreStream method in each Provider must perform a single streaming API request per conversation round and must not contain multi-round tool call loop logic.
Provider files should implement helper methods such as formatMessages, convertToProviderTools, parseFunctionCalls, and prepareFunctionCallPrompt as needed for provider-specific logic.
All provider implementations must parse provider-specific data chunks and yield standardized events for text, reasoning, tool calls, usage, errors, stop reasons, and image data.
When a provider does not support native function calling, it must prepare messages using prompt wrapping (e.g., prepareFunctionCallPrompt) before making the API call.
When a provider supports native function calling, MCP tools must be converted to the provider's format (e.g., using convertToProviderTools) and included in the API request.
Provider implementations should aggregate and yield usage events as part of the standardized stream.
Provider implementations should yield image data events in the standardized format when applicable.
Provider implementations should yield reasoning events in the standardized format when applicable.
Provider implementations should yield tool call events (tool_call_start, tool_call_chunk, tool_call_end) in the standardized format.
Provider implementations should yield stop events with appropriate stop_reason in the standardized format.
Provider implementations should yield error events in the standardized format...

Files:

  • src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
src/main/**/*.{ts,js,tsx,jsx}

📄 CodeRabbit inference engine (.cursor/rules/project-structure.mdc)

Main-process code goes in src/main

Files:

  • src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
src/**/*.{ts,tsx,vue}

📄 CodeRabbit inference engine (CLAUDE.md)

Use English for all logs and comments

Files:

  • src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
src/main/presenter/**/*.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Maintain one presenter per functional domain in src/main/presenter/

Files:

  • src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
🧠 Learnings (9)
📓 Common learnings
Learnt from: CR
PR: ThinkInAIXYZ/deepchat#0
File: .cursor/rules/llm-agent-loop.mdc:0-0
Timestamp: 2025-07-21T01:46:52.880Z
Learning: Applies to src/main/presenter/llmProviderPresenter/providers/*.ts : Provider implementations should yield image data events in the standardized format when applicable.
📚 Learning: 2025-07-21T01:46:52.880Z
Learnt from: CR
PR: ThinkInAIXYZ/deepchat#0
File: .cursor/rules/llm-agent-loop.mdc:0-0
Timestamp: 2025-07-21T01:46:52.880Z
Learning: Applies to src/main/presenter/llmProviderPresenter/providers/*.ts : All provider implementations must parse provider-specific data chunks and yield standardized events for text, reasoning, tool calls, usage, errors, stop reasons, and image data.

Applied to files:

  • src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
📚 Learning: 2025-07-21T01:46:52.880Z
Learnt from: CR
PR: ThinkInAIXYZ/deepchat#0
File: .cursor/rules/llm-agent-loop.mdc:0-0
Timestamp: 2025-07-21T01:46:52.880Z
Learning: Applies to src/main/presenter/llmProviderPresenter/providers/*.ts : Provider implementations must use a `coreStream` method that yields standardized stream events to decouple the main loop from provider-specific details.

Applied to files:

  • src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
📚 Learning: 2025-08-26T14:13:46.578Z
Learnt from: CR
PR: ThinkInAIXYZ/deepchat#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-26T14:13:46.578Z
Learning: Applies to src/main/presenter/llmProviderPresenter/providers/*.ts : Each LLM provider must implement provider-specific API interactions, convert MCP tools, and normalize streaming responses

Applied to files:

  • src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
📚 Learning: 2025-08-26T14:13:46.578Z
Learnt from: CR
PR: ThinkInAIXYZ/deepchat#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-26T14:13:46.578Z
Learning: Applies to src/main/presenter/llmProviderPresenter/providers/*.ts : Implement a coreStream method for new providers following the standardized event interface

Applied to files:

  • src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
📚 Learning: 2025-07-21T01:46:52.880Z
Learnt from: CR
PR: ThinkInAIXYZ/deepchat#0
File: .cursor/rules/llm-agent-loop.mdc:0-0
Timestamp: 2025-07-21T01:46:52.880Z
Learning: Applies to src/main/presenter/llmProviderPresenter/providers/*.ts : The `coreStream` method in each Provider must perform a single streaming API request per conversation round and must not contain multi-round tool call loop logic.

Applied to files:

  • src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
📚 Learning: 2025-07-21T01:46:52.880Z
Learnt from: CR
PR: ThinkInAIXYZ/deepchat#0
File: .cursor/rules/llm-agent-loop.mdc:0-0
Timestamp: 2025-07-21T01:46:52.880Z
Learning: Applies to src/main/presenter/llmProviderPresenter/providers/*.ts : Provider implementations should aggregate and yield usage events as part of the standardized stream.

Applied to files:

  • src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
📚 Learning: 2025-07-21T01:46:52.880Z
Learnt from: CR
PR: ThinkInAIXYZ/deepchat#0
File: .cursor/rules/llm-agent-loop.mdc:0-0
Timestamp: 2025-07-21T01:46:52.880Z
Learning: Applies to src/main/presenter/llmProviderPresenter/providers/*.ts : Each file in `src/main/presenter/llmProviderPresenter/providers/*.ts` should handle interaction with a specific LLM API, including request/response formatting, tool definition conversion, native/non-native tool call management, and standardizing output streams to a common event format.

Applied to files:

  • src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
🧬 Code graph analysis (1)
src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts (1)
src/main/presenter/index.ts (1)
  • presenter (188-188)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-check (x64)

Comment on lines +645 to +672

```ts
// Handle image data (OpenRouter Gemini format)
if (delta?.images && Array.isArray(delta.images)) {
  for (const image of delta.images) {
    if (image.type === 'image_url' && image.image_url?.url) {
      try {
        const cachedUrl = await presenter.devicePresenter.cacheImage(image.image_url.url)
        yield {
          type: 'image_data',
          image_data: {
            data: cachedUrl,
            mimeType: 'deepchat/image-url'
          }
        }
      } catch (cacheError) {
        console.warn('[handleChatCompletion] Failed to cache image:', cacheError)
        yield {
          type: 'image_data',
          image_data: {
            data: image.image_url.url,
            mimeType: 'deepchat/image-url'
          }
        }
      }
    }
  }
  continue
}
```

⚠️ Potential issue

Avoid early-continue; it can drop finish_reason/tool_calls in mixed chunks

If a chunk contains both images and finish_reason/tool_calls, the early continue skips subsequent handlers, potentially causing missing stop events or tool-call deltas.

Apply this diff to skip only text parsing for the current chunk while still processing tool_calls/finish_reason:

```diff
-      // Handle image data (OpenRouter Gemini format)
+      // Handle image data (OpenRouter Gemini format)
       if (delta?.images && Array.isArray(delta.images)) {
         for (const image of delta.images) {
           if (image.type === 'image_url' && image.image_url?.url) {
             try {
               const cachedUrl = await presenter.devicePresenter.cacheImage(image.image_url.url)
               yield {
                 type: 'image_data',
                 image_data: {
                   data: cachedUrl,
                   mimeType: 'deepchat/image-url'
                 }
               }
             } catch (cacheError) {
               console.warn('[handleChatCompletion] Failed to cache image:', cacheError)
               yield {
                 type: 'image_data',
                 image_data: {
                   data: image.image_url.url,
                   mimeType: 'deepchat/image-url'
                 }
               }
             }
           }
         }
-        continue
+        // Skip only text parsing for this chunk; allow tool_calls/finish_reason handling below.
+        skipTextForChunk = true
       }
```

Add this guard (outside the selected range) right before the character-level text handling:

```diff
 // declare near other state vars at the top of the loop scope
 let skipTextForChunk = false
-      // If there is no content, continue to the next chunk
-      if (!currentContent) continue
+      // If there is no content, or text parsing is skipped for this chunk, continue
+      if (!currentContent || skipTextForChunk) continue
```

Comment on lines +673 to +688
// 处理 Gemini 原生格式的图片数据(inlineData)
if (delta?.content?.parts && Array.isArray(delta.content.parts)) {
for (const part of delta.content.parts) {
if (part.inlineData && part.inlineData.data) {
yield {
type: 'image_data',
image_data: {
data: part.inlineData.data,
mimeType: part.inlineData.mimeType || 'image/png'
}
}
}
}
continue
}


🛠️ Refactor suggestion

Standardize inline image mimeType and avoid early-continue

  • Inline Gemini images are emitted with mimeType set to the real image type, but elsewhere you use the sentinel 'deepchat/image-base64'. This inconsistency can break downstream renderers expecting the standardized format.
  • Same early-continue issue as above.

Apply this diff to emit a consistent mimeType and skip only text parsing:

```diff
-      // Handle image data in Gemini native format (inlineData)
+      // Handle image data in Gemini native format (inlineData)
       if (delta?.content?.parts && Array.isArray(delta.content.parts)) {
         for (const part of delta.content.parts) {
           if (part.inlineData && part.inlineData.data) {
             yield {
               type: 'image_data',
               image_data: {
-                data: part.inlineData.data,
-                mimeType: part.inlineData.mimeType || 'image/png'
+                data: part.inlineData.data,
+                // Use standardized sentinel for base64 payloads for consistency with other paths
+                mimeType: 'deepchat/image-base64'
               }
             }
           }
         }
-        continue
+        // Skip only text parsing for this chunk; allow tool_calls/finish_reason handling below.
+        skipTextForChunk = true
       }
```

Optional improvement: convert inline base64 to a data URI and cache it via devicePresenter to unify URL-based rendering and reduce large in-memory payloads.
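That optional improvement could start from a small helper like the one below. `toDataUri` is a hypothetical name, the caching step via `devicePresenter` is omitted, and the base64 payload in the example is simply the string "hi" encoded:

```typescript
// Sketch: wrap an inline base64 payload in a data URI so URL-based
// rendering (and URL caching) can be reused for Gemini inlineData parts.
function toDataUri(base64: string, mimeType: string): string {
  return `data:${mimeType};base64,${base64}`
}

const uri = toDataUri('aGk=', 'image/png') // 'aGk=' is base64 for "hi"
console.log(uri) // data:image/png;base64,aGk=
```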

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
 // Handle image data in Gemini native format (inlineData)
 if (delta?.content?.parts && Array.isArray(delta.content.parts)) {
   for (const part of delta.content.parts) {
     if (part.inlineData && part.inlineData.data) {
       yield {
         type: 'image_data',
         image_data: {
           data: part.inlineData.data,
-          mimeType: part.inlineData.mimeType || 'image/png'
+          // Use standardized sentinel for base64 payloads for consistency with other paths
+          mimeType: 'deepchat/image-base64'
         }
       }
     }
   }
-  continue
+  // Skip only text parsing for this chunk; allow tool_calls/finish_reason handling below.
+  skipTextForChunk = true
 }
```
🤖 Prompt for AI Agents
In src/main/presenter/llmProviderPresenter/providers/openAICompatibleProvider.ts
around lines 673-688, the handler emits inline Gemini images with their native
mimeType and uses an early continue that skips too much; change it to emit the
sentinel mimeType 'deepchat/image-base64' for consistency (set
image_data.mimeType = 'deepchat/image-base64') and remove the broad continue so
only text parsing is skipped (i.e., process image parts but don’t early-return
the whole delta handling), and optionally convert the inline base64 into a data
URI and register/cache it via devicePresenter to enable URL-based rendering and
avoid large in-memory payloads.

@zerob13 zerob13 merged commit 3643fb2 into dev Aug 27, 2025
2 checks passed
@zerob13 zerob13 deleted the feat/openrouter-gemini-image-generation branch January 6, 2026 12:16