Skip to content

feat: multimodal image input mvp#324

Merged
yacosta738 merged 14 commits into
mainfrom
feat/multimodal-image-input-mvp
Mar 26, 2026
Merged

feat: multimodal image input mvp#324
yacosta738 merged 14 commits into
mainfrom
feat/multimodal-image-input-mvp

Conversation

@yacosta738
Copy link
Copy Markdown
Contributor

This pull request introduces significant improvements to project documentation, pull request templates, and code ownership assignments. The main changes focus on enhancing clarity for contributors (both human and AI), enforcing documentation practices in PRs, and refining code ownership for docs-related files.

Documentation and Contributor Guidance:

  • Replaced .agents/AGENTS.md with a new, much more comprehensive CLAUDE.md that provides detailed project architecture, coding standards, build/test commands, and guidelines specifically for AI contributors (Claude Code). This new file gives a clearer overview of the Corvus project structure, conventions, and best practices.

Pull Request Template Improvements:

  • Added a required "Documentation Impact" section to all PR templates (1_fix_issue.yml, 2_new_feature.yml, 3_other_type.yml) to ensure contributors explicitly state which docs were updated or explain why no docs update is needed. [1] [2] [3]
  • Updated checklist items in all PR templates to require contributors to confirm that documentation is up-to-date and matches current behavior, and to provide a direct link to the Contributing Guidelines. [1] [2] [3]

Code Ownership:

  • Assigned ownership of documentation-related files and scripts (clients/web/apps/docs/, scripts/validate-docs-metadata.mjs, .github/pull_request_template.md, .github/PULL_REQUEST_TEMPLATE/) to @yacosta738 in CODEOWNERS, clarifying responsibility for these areas.

Add image attachment support across the agent runtime: channel media
handling traits, provider vision capabilities (Gemini, OpenAI-compatible),
gateway multipart endpoints, observability media metrics, and config
schema for image processing limits. Includes updated specs and archived
SDD change artifacts.
Keep docs accurate by validating ownership, review freshness, locale parity, and discoverability in CI while tightening PR accountability. Also clear chat warnings so the full repository pipeline stays green.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds canonical multimodal channel contract (ordered ContentPart parts), image staging/validation with RAII cleanup, multimodal config and routing, provider image-capability plumbing and adapters, gateway enqueueing for WhatsApp/Telegram into a shared channel runtime, observability/metrics for image ingress, docs metadata validation, and CI/workflow updates.

Changes

Cohort / File(s) Summary
Core channel contract & media
clients/agent-runtime/src/channels/traits.rs, clients/agent-runtime/src/channels/media.rs, clients/agent-runtime/src/channels/mod.rs
Add ContentPart and ChannelMessage.parts; implement StagedImage, MIME sniffing, size/count limits, StagedImageGuard, ChannelRuntimeHandle, and image-routing/gating in the channel loop.
Channel adapters & ingest
clients/agent-runtime/src/channels/.../*
.../{telegram.rs,whatsapp.rs,lark.rs,discord.rs,qq.rs,cli.rs,dingtalk.rs,email_channel.rs,imessage.rs,irc.rs,matrix.rs,mattermost.rs,signal.rs,slack.rs}
Populate parts across adapters; Telegram/WhatsApp now extract image parts and add fetch_and_stage_image helpers; many listeners updated to construct multimodal ChannelMessage.
Media staging runtime plumbing
clients/agent-runtime/src/channels/mod.rs, clients/agent-runtime/src/channels/media.rs
Introduce staging APIs, pass staged images into channel runtime params, implement admission gating, staging pipeline, ingress events, and enqueue handle lifecycle.
Provider adapters & routing
clients/agent-runtime/src/providers/{traits.rs,compatible.rs,gemini.rs,reliable.rs,router.rs}
Add images: &'a [StagedImage] to ChatRequest; extend ProviderCapabilities (image_input, transport forms); implement multimodal chat for OpenAI-compatible & Gemini; Router/Reliable aggregate capabilities and enforce fail‑closed image routing.
Config, onboarding & gateway
clients/agent-runtime/src/config/schema.rs, clients/agent-runtime/src/config/mod.rs, clients/agent-runtime/src/onboard/wizard.rs, clients/agent-runtime/src/gateway/mod.rs
Add MultimodalConfig, ModelRouteConfig.allow_image_input, validation; onboarding populates defaults; gateway spawns ChannelRuntimeHandle and enqueues verified WhatsApp messages when present.
Observability & metrics
clients/agent-runtime/src/observability/{traits.rs,log.rs,multi.rs,otel.rs,prometheus.rs,mod.rs}
New ImageIngressEvent/Outcome/Reason, observer hook on_image_ingress, logging arm, Prometheus/Otel counters and fan-out.
Tests & callsite updates
clients/agent-runtime/src/{agent/agent.rs,channels/mod.rs,update/mod.rs}, clients/agent-runtime/tests/*, clients/agent-runtime/src/channels/*
Update callsites/tests to pass images: &[]; add tests for staging, gating, routing, staged-image cleanup, and runtime enqueue behavior.
Docs content & validation CI
scripts/validate-docs-metadata.mjs, clients/web/apps/docs/{package.json,README.md,content.config.ts,astro.config.mjs}, clients/web/apps/docs/src/content/docs/**/*
Add docs metadata validator, pnpm script and check integration, extend frontmatter schema and update many docs frontmatter and sidebar entries.
Repo governance & CI templates
.github/{CODEOWNERS,pull_request_template.md,PULL_REQUEST_TEMPLATE/*}, .github/workflows/docs-quality.yml, .github/workflows/sonarqube-analysis.yml
Add CODEOWNERS for docs, require Documentation Impact in PR templates, add Docs Quality workflow and adjust Sonar exclusions.
Design/specs & verification
openspec/changes/archive/..., openspec/specs/...
Add design/proposal/exploration/specs/tasks/verify-report documenting MVP constraints (Telegram+WhatsApp, provider capability declarations, fail‑closed routing, staging rules).
Minor web & tests
clients/web/apps/chat/src/{App.vue,App.spec.ts}
Small test robustness fixes and a lint-ignore directive; no functional frontend changes.

Sequence Diagram(s)

sequenceDiagram
    participant Channel as Channel (Telegram/WhatsApp)
    participant Stager as Image Staging
    participant Router as Route Resolver
    participant Provider as Provider (OpenAI/Gemini)
    participant Runtime as Channel Runtime

    Channel->>Runtime: raw inbound webhook/update
    Runtime->>Stager: extract parts and request staging
    Stager->>Stager: sniff MIME, enforce size/count, write temp file
    alt staging rejected
        Stager-->>Runtime: ImageIngressEvent(Rejected, reason)
        Runtime->>Channel: send rejection reply (fail-closed)
    else staging admitted
        Stager->>Router: resolve vision_model_hint → provider+model
        alt no eligible provider
            Router-->>Runtime: ImageIngressEvent(Rejected, NoProvider)
            Runtime->>Channel: send unavailable/unable-to-handle
        else eligible provider
            Runtime->>Provider: forward staged images + text (images inline/transport)
            Provider-->>Runtime: chat response / tool calls
            Runtime->>Channel: deliver response
            Runtime-->>Stager: cleanup staged temp files
        end
    end
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related issues

Possibly related PRs

Suggested labels

area:vue

Suggested reviewers

  • yuniel-acosta
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/multimodal-image-input-mvp

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 26, 2026

✅ Contributor Report

User: @yacosta738
Status: Passed (12/13 metrics passed)

Metric Description Value Threshold Status
PR Merge Rate PRs merged vs closed 89% >= 30%
Repo Quality Repos with ≥100 stars 0 >= 0
Positive Reactions Positive reactions received 9 >= 1
Negative Reactions Negative reactions received 0 <= 5
Account Age GitHub account age 3071 days >= 30 days
Activity Consistency Regular activity over time 108% >= 0%
Issue Engagement Issues with community engagement 0 >= 0
Code Reviews Code reviews given to others 455 >= 0
Merger Diversity Unique maintainers who merged PRs 2 >= 0
Repo History Merge Rate Merge rate in this repo 91% >= 0%
Repo History Min PRs Previous PRs in this repo 176 >= 0
Profile Completeness Profile richness (bio, followers) 90 >= 0
Suspicious Patterns Spam-like activity detection 1 N/A

Contributor Report evaluates based on public GitHub activity. Analysis period: 2025-03-26 to 2026-03-26

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 43

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
clients/web/apps/docs/src/content/docs/guides/features.md (1)

54-54: ⚠️ Potential issue | 🟡 Minor

Update stale AGENTS reference to CLAUDE to avoid canonical-doc drift.

Line 54 still references README/AGENTS, but this PR replaces .agents/AGENTS.md with CLAUDE.md. Since this page is now status: canonical, this should be corrected in the same change.

Suggested fix
-- [x] **README/AGENTS**: In-repo documentation for developers and agents.
+- [x] **README/CLAUDE**: In-repo documentation for developers and AI contributors.

As per coding guidelines, “Verify technical accuracy and that docs stay aligned with code changes.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/web/apps/docs/src/content/docs/guides/features.md` at line 54, Update
the stale checklist entry that reads "[x] **README/AGENTS**" to reference the
new CLAUDE doc (e.g., change the text to something like "[x] **CLAUDE**: In-repo
documentation for developers and agents") so the canonical docs reflect the
replacement of .agents/AGENTS.md with CLAUDE.md; locate the string "[x]
**README/AGENTS**" in features.md and replace it accordingly, keeping the
checklist formatting and intent intact.
clients/web/apps/docs/src/content/docs/guides/structure.md (1)

22-22: ⚠️ Potential issue | 🟠 Major

Update stale contributor-doc reference (AGENTS.mdCLAUDE.md).

This page still points to AGENTS.md, but this PR replaces that guidance file with CLAUDE.md. Please update this reference to avoid sending contributors to deprecated docs.

As per coding guidelines: "**/*.{md,mdx}: Verify technical accuracy and that docs stay aligned with code changes."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/web/apps/docs/src/content/docs/guides/structure.md` at line 22, The
docs page references a stale file name `AGENTS.md`; update the single occurrence
of that reference in the `structure.md` content to `CLAUDE.md` so contributors
are routed to the new guidance file, ensuring the text and any surrounding
punctuation remain consistent with existing phrasing.
clients/web/apps/docs/src/content/docs/es/guides/structure.md (1)

18-22: ⚠️ Potential issue | 🟠 Major

Fix ES guide parity with EN paths and contributor-doc naming.

This ES page diverges from the EN structure guide on docs paths (docs/... vs clients/web/apps/docs/src/content/docs/...) and still references AGENTS.md instead of CLAUDE.md. Please align both to keep localization and contributor guidance consistent.

As per coding guidelines: "For user-facing docs, check EN/ES parity or explicitly note pending translation gaps."

Also applies to: 52-57

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/web/apps/docs/src/content/docs/es/guides/structure.md` around lines
18 - 22, Update the Spanish guide entries so they match the English guide
structure: replace the shortened `docs/` path with the full docs path used in EN
(use the same path pattern `clients/web/apps/docs/src/content/docs/...`), rename
the contributor doc reference from `AGENTS.md` to `CLAUDE.md`, and make the same
parallel fixes for the later section referenced (lines 52-57) so the ES page
mirrors EN parity for docs paths and contributor-doc naming (look for the list
items containing `docs/`, `AGENTS.md`, and their duplicates).
clients/web/apps/docs/src/content/docs/clients/agent-runtime/providers/index.mdx (1)

15-41: ⚠️ Potential issue | 🟠 Major

Document multimodal/image support in provider capabilities.

This provider reference does not reflect the image-input MVP shipped in this PR (which providers accept images, any constraints/fallback behavior). Please add capability details so docs match runtime behavior.

As per coding guidelines: "**/*.{md,mdx}: Verify technical accuracy and that docs stay aligned with code changes."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@clients/web/apps/docs/src/content/docs/clients/agent-runtime/providers/index.mdx`
around lines 15 - 41, Add a "Multimodal/Image support" column to the provider
reference table in index.mdx and populate it per each provider ID (e.g., openai,
openrouter, anthropic, gemini, ollama, lmstudio, venice, groq, mistral, etc.)
indicating whether image input is supported, any constraints (supported formats,
max size/resolution, accepted endpoints), and fallback behavior (e.g., "not
supported — use text-only" or "accepts base64 or multipart; will downscale to
X"). Ensure entries match the runtime behavior implemented for those provider
IDs so the docs reflect the image-input MVP (specify exact provider capabilities
and limitations in the Notes cell for each provider).
clients/agent-runtime/src/gateway/mod.rs (1)

2010-2034: ⚠️ Potential issue | 🔴 Critical

Fail closed when the WhatsApp signing secret is missing.

If state.whatsapp_app_secret is None, this handler skips signature verification entirely and accepts unsigned /whatsapp POSTs. For a public webhook, reject requests (or fail startup) until the app secret is configured.

As per coding guidelines, clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs: Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/gateway/mod.rs` around lines 2010 - 2034, The
handler currently skips verification when state.whatsapp_app_secret is None;
change it to fail-closed by rejecting requests when the app secret is not
configured: in the /whatsapp request handler check state.whatsapp_app_secret and
if None immediately return UNAUTHORIZED (or a clear error response) instead of
proceeding, and keep the existing verify_whatsapp_signature(app_secret, &body,
signature) branch when Some(app_secret) is present; update any logging to
indicate "missing whatsapp_app_secret" and ensure the behavior is enforced in
the handler around state.whatsapp_app_secret and verify_whatsapp_signature so
unsigned POSTs are not accepted by default.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/AGENTS.md:
- Around line 1-208: The AGENTS metadata block is missing from
.agents/AGENTS.md; add a dedicated "Agent Metadata" section near the top that
restores the required contract by including explicit fields for agent name,
short description, purpose, capabilities (enumerated), version, and
compatibility details; ensure the section is clearly labeled (e.g., "Agent
Metadata" header) and formatted as a structured, machine-parseable block
(YAML/JSON-style) so tools reading .agents/AGENTS.md can find keys like name,
description, purpose, capabilities, version, and compatibility without changing
other guidance content.

In @.github/CODEOWNERS:
- Around line 14-18: The CODEOWNERS entries for the critical paths
(/clients/web/apps/docs/, /scripts/validate-docs-metadata.mjs,
/.github/workflows, /.github/pull_request_template.md,
/.github/PULL_REQUEST_TEMPLATE/) currently list only an individual
(`@yacosta738`); update each of those lines to include the appropriate team owner
as primary and keep `@yacosta738` as secondary (for example: add @<org>/docs-team
or @<org>/governance as the first owner before `@yacosta738`) so the critical
docs/governance paths are owned by the team while retaining the individual as a
fallback.

In @.github/PULL_REQUEST_TEMPLATE/1_fix_issue.yml:
- Around line 76-77: Update the two checklist items that read "I have updated
the documentation, or I explained above why no documentation update is needed."
and "I verified the documentation matches the current behavior." by adding
required: true to each label entry so these checks become mandatory; apply the
same change to the identical entries in the other two templates (the files
containing 2_new_feature.yml and 3_other_type.yml) to keep them consistent and
enforce governance across all PR templates.

In `@clients/agent-runtime/src/channels/discord.rs`:
- Around line 408-415: The message being constructed stores Discord text in
`content` but leaves `parts` empty, so downstream consumers see an empty
canonical message; update the constructor that builds the message (the struct
instance with fields `content`, `channel`, `timestamp`, `parts`) to initialize
`parts` from `clean_content` (e.g., create a single text part using the same
`clean_content` value) instead of an empty vec, ensuring `parts` mirrors the
`content` field for downstream consumers.
- Around line 397-401: The ID fallback branch currently returns a bare UUID when
message_id.is_empty(), causing inconsistent IDs versus the normal format
"discord_{message_id}"; update the branch that constructs id (the code around
the id: if message_id.is_empty() { ... } else { ... } block) to prefix the
generated UUID with "discord_" (e.g., format!("discord_{}", Uuid::new_v4())) so
both paths produce IDs in the same discord_ namespace and preserve downstream
dedupe/routing expectations.

In `@clients/agent-runtime/src/channels/lark.rs`:
- Around line 450-461: The ChannelMessage is being created with content
populated but parts left empty; update the websocket handler (where
ChannelMessage is constructed) and the parse_event_payload() path (also
referenced around lines 615-623) to populate ChannelMessage.parts with a single
text part derived from the parsed text instead of vec![]; create a
ChannelMessage part object (using the project’s ChannelMessagePart/Part struct)
with the text as its body/kind (e.g., kind/type "text" and the same content) and
include any required metadata (id/timestamp/role) so consumers reading parts see
the canonical text representation.

In `@clients/agent-runtime/src/channels/mod.rs`:
- Around line 750-799: The code currently calls stage_channel_images(...) before
checking media::validate_image_count(...), causing over-limit requests to
fetch/write attachments unnecessarily and bypass StagedImageGuard; move the
image-count validation to run immediately after if msg.has_image_parts() (before
calling stage_channel_images), emit the same ImageIngressOutcome::Rejected and
send the same rejection message when media::validate_image_count(image_count)
returns an error, and only call stage_channel_images and construct
StagedImageGuard if the count validation succeeds so no staging work occurs for
rejected requests.
- Around line 905-919: The code currently calls emit_image_ingress with
ImageIngressOutcome::ProviderSent inside the block checking image_route_metadata
and staged_guard before the llm_result match, causing error/timeout flows to be
reported as successful; remove or move that emit_image_ingress call and instead
invoke emit_image_ingress with ProviderSent only after verifying a real
successful send (e.g., inside the branch that matches a successful
llm_result/send), or alternatively change the outcome to an “Attempt” value if
you want to record attempts earlier; update references to image_route_metadata,
staged_guard, emit_image_ingress and the llm_result match so the telemetry
reflects actual success rather than pre-send attempts.
- Around line 603-615: The approval/safety blocking path still reads msg.content
and can miss multimodal prompts; replace usages of msg.content in downstream
gating checks with the computed user_text (or directly call
msg.text_projection()) so the canonical blocking call and all downstream gates
use the multimodal text projection; update any references in
approval/safety/blocking logic that currently reference msg.content to use
user_text (the variable computed from msg.text_projection() or the projection
call itself) to ensure image-only text in msg.parts is considered.

In `@clients/agent-runtime/src/channels/qq.rs`:
- Around line 352-363: The QQ ingress builds ChannelMessage instances with
content set but parts left empty; update both ChannelMessage constructors (the
channel_msg in the C2C branch and the similar constructor around the 391-402
region) to populate the parts field from the content (e.g., create a single text
part derived from content) so the content-part contract is satisfied; ensure you
construct the appropriate Part/MessagePart structure used elsewhere and push it
into parts instead of leaving parts: vec![] while keeping id, sender,
reply_target, channel, and timestamp unchanged.

In `@clients/agent-runtime/src/channels/telegram.rs`:
- Around line 768-783: The code currently pushes caption text (variable caption
and ContentPart::Text) before the document/image MIME check, which allows
unsupported image documents (e.g., image/gif) to be downgraded to a text-only
turn; update the logic in the handler that builds parts so that when an
image-like document is present and fails the MIME admission check you either
retain the media reference (do not drop the document) so downstream staging can
reject it, or return None for the entire turn instead of pushing caption text;
specifically, move or gate the caption handling (the caption variable and any
push of ContentPart::Text) to occur after verifying the document/media via the
same MIME admission logic used for images/documents (the document handling code
around the document MIME gate), and ensure functions/variables like caption,
parts, ContentPart::Text, and the document MIME gate follow this new flow so
unsupported media do not produce a text-only turn.
- Around line 1538-1550: The reqwest errors logged in the Telegram client
closures leak the bot token because the error Display includes the full URL;
update the three failing map_err/log calls (the getFile POST in the map_err at
the post(self.api_url("getFile")) call, the file download request map_err around
post(self.file_download_url(...)), and the bytes read error path after
.bytes().await) to avoid printing the raw error: call the existing
providers::sanitize_api_error() (or a small wrapper that strips self.bot_token
from api_url()/file_download_url()) and log that sanitized string or a generic
classified message instead of `{e}`, ensuring you still map to
media::ImageRejectionReason::FetchFailed (or the existing rejection) and
preserve the original error handling flow; reference api_url(),
file_download_url(), and providers::sanitize_api_error for where to apply the
change.
- Around line 1579-1590: The code currently calls dl_resp.bytes().await which
buffers the whole response; instead iterate dl_resp.bytes_stream(), accumulate a
running total and call media::validate_size(total, media::MAX_IMAGE_BYTES) after
each chunk to early-reject when the total exceeds the limit, and map stream
errors to the same media::ImageRejectionReason::FetchFailed as done now. Locate
the download handling around dl_resp and replace the bytes().await usage with a
chunk loop over bytes_stream(), updating byte_len and validating per-chunk, and
ensure you return the appropriate Err variant immediately when validation fails
(and preserve logging via tracing::warn for stream errors).

In `@clients/agent-runtime/src/channels/traits.rs`:
- Around line 41-58: text_projection currently only concatenates non-empty
strings from self.parts and thus returns empty for legacy text-only messages
modeled as content != "" with parts = []; update text_projection to fall back to
the legacy self.content when parts is empty (or prepend/append it when
appropriate) so those messages are preserved: inside fn text_projection(&self)
-> String, after computing blocks from self.parts, if blocks.is_empty() and
self.content.is_empty() is false (or simply if self.parts.is_empty()), return
self.content.clone() (or include self.content together with blocks if both
exist); reference the text_projection method, the parts field, and DummyChannel
shape to locate where to add this fallback.

In `@clients/agent-runtime/src/channels/whatsapp.rs`:
- Around line 222-233: Replace the eager full-buffer read of
dl_resp.bytes().await with streaming accumulation using dl_resp.bytes_stream():
iterate the stream, append chunks to a mutable accumulator (e.g. BytesMut), keep
a running total and call media::validate_size(total, media::MAX_IMAGE_BYTES) on
each chunk to early-return the media::ImageRejectionReason::FetchFailed error
when the limit is exceeded, and only after the loop produce the final bytes;
apply the same pattern used in http_request.rs (lines around bytes_stream usage)
and mirror the fix in the analogous telegram.rs location to prevent memory
exhaustion when Content-Length is missing or incorrect.
- Around line 206-215: The map_err closures that call tracing::warn!("WhatsApp
media download failed: {e}") (around the dl_resp download logic and the later
similar log) must not interpolate the reqwest::Error into logs because its
Display can contain the full signed download_url; update those places (the
map_err on self.client.get(download_url)... and the other tracing::warn at the
later download handling) to emit a non-sensitive, classified message such as
"WhatsApp media download failed" without the raw error or URL, or
sanitize/redact the URL before logging; if you need error context include only
non-secret fields (e.g., response status code or a simple error kind) extracted
from the error rather than the full Display.

In `@clients/agent-runtime/src/config/schema.rs`:
- Around line 3123-3147: Update validate_multimodal_config to not only check
multimodal.vision_model_hint is Some but also verify there exists a matching
[[model_routes]] entry whose route/model identifier equals the hint and whose
allow_image_input == true; use self.model_routes (or the struct field holding
routes) to find a route matching multimodal.vision_model_hint.as_ref().unwrap()
and if none found or the matched route has allow_image_input == false, return an
error similar to the existing anyhow::bail messages; keep the existing checks
for allowed_channels and invalid channel names (MVP_VALID_MULTIMODAL_CHANNELS)
intact.

In `@clients/agent-runtime/src/gateway/mod.rs`:
- Around line 2053-2072: Before returning 202, reserve and deduplicate each
incoming message id in idempotency_store and only enqueue messages after
successful reservation; use state.channel_runtime_handle and its
handle.enqueue(msg) call for queueing, skip any msg whose id is already reserved
(treat as duplicate) and do not re-enqueue, and if any enqueue fails (e.g.,
closed/full channel) abort and return a non-accepted response (do not return
202). Ensure you atomically record msg.id in idempotency_store prior to calling
handle.enqueue(msg) and only return StatusCode::ACCEPTED when all non-duplicate
messages were successfully enqueued.
- Around line 2056-2060: The tracing::info call that logs WhatsApp message
content must stop including truncate_with_ellipsis(&msg.content, 50); update the
log in gateway::mod.rs (the tracing::info near the WhatsApp → canonical runtime
line) to only emit stable metadata such as msg.id, msg.sender, and a boolean for
whether the message has image parts (inspect msg.parts or an existing helper
like has_image_parts) and remove any raw payload/caption from logs; ensure the
new log shape includes those fields and no sensitive content.

In `@clients/agent-runtime/src/observability/otel.rs`:
- Around line 194-195: The branch that matches ObserverEvent::ImageIngress
currently no-ops and must instead record telemetry; update the match arm for
ObserverEvent::ImageIngress in observability::otel.rs to emit an appropriate
image-ingress metric (and optionally a span) via the existing OTLP/metric
exporter used by this module (reuse the same metric recorder/handle used for
other observer events), include relevant tags (e.g., image id/source, size) and
ensure the code path forwards the event rather than dropping it; then add a
unit/integration test that sends an ObserverEvent::ImageIngress through the same
recording path and asserts the metric/record was produced (not discarded), using
the same test helpers/mocks the file uses for other events.

In `@clients/agent-runtime/src/observability/prometheus.rs`:
- Around line 176-177: The Prometheus backend currently ignores
ObserverEvent::ImageIngress in the event match arm; update the handler in
clients/agent-runtime/src/observability/prometheus.rs to record a counter for
ImageIngress events (e.g., labels for stage and outcome) instead of doing
nothing, using the same metric conventions as other observer events so it shows
up in encode(); also add a unit test that emits an ImageIngress event and
asserts the encoded output contains the corresponding metric name/labels
produced by encode(). Ensure you reference ObserverEvent::ImageIngress, the
Prometheus recorder function handling observer events, and the encode() method
in your changes and test.

In `@clients/agent-runtime/src/observability/traits.rs`:
- Around line 21-29: The ImageIngressEvent.reason field currently allows
free-form text; change it to a closed enum (e.g., ImageIngressReason) so callers
must supply one of predefined codes (e.g., ProviderError, BadMimeType, TooLarge,
Timeout, Unknown) instead of raw strings; update the ImageIngressEvent struct to
use this enum for reason, adjust any constructors/serializers that set reason
(search for ImageIngressEvent:: and assignments to .reason) to map external
error text to one of the enum variants and strip/redact sensitive details before
sending telemetry, and add a placeholder variant (e.g., Other) for unmapped
cases to ensure no secrets are logged.
- Around line 166-167: Change the default implementation of on_image_ingress so
it forwards the incoming ImageIngressEvent into record_event instead of being a
no-op: construct an ObserverEvent::ImageIngress from the &ImageIngressEvent
(clone or convert as appropriate) and call
self.record_event(&ObserverEvent::ImageIngress(...)). Update the signature in
the on_image_ingress default impl to use the event value/reference you need to
build the ObserverEvent; keep the method name on_image_ingress and target
record_event and the ObserverEvent::ImageIngress variant so existing observers
that only implement record_event continue to receive image ingress events.

In `@clients/agent-runtime/src/providers/compatible.rs`:
- Around line 494-577: chat_multimodal currently builds only message content
(text+image data URLs) and returns ProviderChatResponse without sending tool
specifications or parsing provider tool_calls, causing multimodal user turns to
lose tool-calling behavior; update chat_multimodal to mirror chat()’s tool flow:
include request.tools (serialize tool specs) in the request body when present,
and after receiving ApiChatResponse inspect/parse choice.tool_calls (or
equivalent field on choice) into ProviderChatResponse.tool_calls instead of
always returning an empty Vec; ensure the same auth/send/response-error handling
is kept (functions: chat_multimodal, chat, ProviderChatResponse,
ApiChatResponse, choice/tool_calls) so multimodal requests preserve native tool
invocation.
- Around line 511-538: The loop over request.messages currently appends
request.images to every msg where msg.role == "user", causing duplicated images;
modify the loop in compatible.rs so images are only attached to the current/new
user turn (e.g., the last user message). Concretely, iterate with an index or
find the last user message (use request.messages.iter().enumerate() or scan to
find the last user index) and only run the image-reading/base64/data_url block
and push image blocks into blocks when the current msg is that last user
message; keep building text blocks for all user msgs but only add images for
that single identified message before pushing into api_messages.

In `@clients/agent-runtime/src/providers/gemini.rs`:
- Around line 372-410: The current code only serializes the last user message
into user_parts and then creates GenerateContentRequest.contents with a single
user Content, losing earlier turns; instead iterate over request.messages to
build a Content per message (preserving role into Content.role and text into
Part::Text) and push those into contents in order, and then for the current
(last) user Content append the image Part::InlineData items (created from
request.images) to that Content's parts; update the code around user_parts, the
loop over request.images, and the construction of GenerateContentRequest (and
references to system_instruction/contents) so history is preserved and images
are attached only to the latest user turn.
- Around line 417-433: The error handling leaks sensitive provider data (the API
key in the URL and raw body) when build_generate_content_url (used in the
multimodal path) and the text-only generate_content path receive non-2xx
responses or transport errors; change the logic in the code that calls
build_generate_content_request/send() so that transport failures and non-success
HTTP responses are wrapped with generic, URL-free errors (e.g., "Gemini API
request failed") and do NOT include response.text() or the full URL; instead log
or return a redacted error that omits the query string/API key and only surface
safe details (status code and a fixed message), and update the branch that
inspects GenerateContentResponse.error to also return a redacted generic error
message rather than the provider's raw err.message; apply the same pattern to
the text-only generate_content path.

In `@clients/agent-runtime/src/providers/reliable.rs`:
- Around line 321-326: The capabilities() implementation currently returns only
the first wrapped provider’s capabilities, under-advertising the chain; update
ReliableProvider::capabilities to aggregate capabilities from all providers by
iterating self.providers and folding/merging each provider.capabilities() into
an accumulated ProviderCapabilities (e.g., use .iter().map(|(_, p)|
p.capabilities()).fold(ProviderCapabilities::default(), |acc, c| acc.merge(c))
or a bitwise OR/union if ProviderCapabilities supports it) so the returned
capabilities represent the union of every wrapped provider rather than just the
first.
- Around line 398-435: This image-turn loop bypasses execute_single_provider()
(losing retry/backoff, Retry-After handling, non-retryable classification and
API-key rotation) and logs raw errors; change the inner provider invocation to
call the existing retry/wrapping path (use execute_single_provider or the same
wrapper used for text turns) for each image-capable provider (i.e., inside the
loop over self.model_chain() and self.providers) so retries/backoff and API-key
rotation are applied, collect failures from the wrapped call, and replace the
direct tracing::warn!("{e}") with a sanitized/log-safe message (no raw error
payloads or secrets) or rely on the wrapper’s safe logging; keep the same
failure aggregation (failures.push(format!(...))) and preserve the final bail
messages when all image-capable providers fail.

In `@clients/agent-runtime/src/providers/router.rs`:
- Around line 95-100: capabilities() currently returns only the default
provider's ProviderCapabilities causing incorrect router-level capabilities;
change capabilities() to iterate over self.providers, collect each provider's
p.capabilities(), and return the union/aggregate of all capabilities (so
supports_image_input() and other flags are true if any registered provider
reports them). Use the existing ProviderCapabilities type's union/merge
operation or OR together the relevant flags when aggregating from self.providers
(affecting the behavior used by chat() and upstream capability checks).

In `@clients/agent-runtime/src/providers/traits.rs`:
- Around line 79-81: The default Provider::chat() currently delegates to
chat_with_history() and silently drops ChatRequest.images (StagedImage via
StagedImage.temp_path), causing image turns to be treated as text; change
Provider::chat() to detect non-empty ChatRequest.images and fail-closed (return
an Err/unsupported-media error) instead of delegating, so image-capable
providers must explicitly override chat(); apply the same guard to the other
identical default implementation at the other location (the block around the
second delegation at lines referenced in the review).

In `@clients/web/apps/chat/src/App.spec.ts`:
- Around line 104-109: The current pattern calls
expect(button?.exists()).toBe(true) then immediately has an if (!button) throw,
which is unreachable; move the null-check guard before the expect so the
descriptive error can run and prevent calling .exists() on undefined. For each
occurrence in App.spec.ts (e.g., pairButton at the block around the lines with
pairButton?.exists()), change the order to: if (!pairButton) throw new
Error("Expected pairing button to exist"); then await
expect(pairButton.exists()).toBe(true) (or
expect(pairButton.exists()).resolves/toBe true as appropriate), and apply the
same change to the other three button blocks referenced in the comment so the
explicit guards are effective.

In `@clients/web/apps/docs/astro.config.mjs`:
- Around line 152-165: The sidebar entry for "Cerebro Migration" uses an
incorrect slug value; update the slug in the object with label "Cerebro
Migration" from "guides/cerebro/migration-guide" to "guides/cerebro/migration"
so it matches the actual file (migration.md) and keeps the translations block
as-is; locate the object that contains label: "Cerebro Migration" and modify its
slug property accordingly.

In `@clients/web/apps/docs/src/content/docs/clients/agent-runtime/pr-workflow.md`:
- Around line 3-8: The PR updated the English frontmatter (description, owner,
status, lastReviewed, appliesTo, docType) for the PR workflow but did not update
or mark the Spanish counterpart; add a synced ES version of the document
(translate frontmatter and content for pr-workflow.md into Spanish in the same
docs directory) or explicitly mark the EN doc with a pending-translation flag
(e.g., add a frontmatter key like translationPending: ["es"] or translationNote:
"ES pending") and update any translation index/metadata used by the docs
pipeline so the missing Spanish page is recorded as intentional.

In `@clients/web/apps/docs/src/content/docs/es/guides/development.md`:
- Around line 3-8: The Spanish guide frontmatter (fields like description,
owner, status, lastReviewed, appliesTo, docType) was updated but the equivalent
English frontmatter was not updated; either update the matching EN document's
frontmatter to mirror these changes or add an explicit "translationPending:
true" (or a parity note) in the ES frontmatter to mark the translation gap, and
ensure the EN doc's metadata (description/status/lastReviewed) is kept in sync
with the ES changes so parity checks for "**/*.{md,mdx}" pass.

In `@clients/web/apps/docs/src/content/docs/es/guides/features.md`:
- Around line 3-8: Update this canonical guide to remove the deprecated "AGENTS"
reference and replace it with "CLAUDE" everywhere it appears (text, headings,
and any internal links), ensuring any link targets or API/feature names are
adjusted to the CLAUDE migration; specifically search for the string "AGENTS" in
the features.md content and change it to "CLAUDE", update related wording or
URLs that point to the old AGENTS docs, and keep the frontmatter
(description/owner/status) intact and correct.

In `@clients/web/apps/docs/src/content/docs/guides/architecture.md`:
- Around line 3-8: The architecture tree in the docs' architecture.md currently
shows a top-level "docs/" node but the actual app lives under
"clients/web/apps/docs/"; update the diagram/text in the architecture tree
section (the tree diagram that contains the string "docs/") to use
"clients/web/apps/docs/" instead of "docs/" and verify any sibling/child entries
and links in that same tree block reference the corrected path.

In `@clients/web/apps/docs/src/content/docs/guides/customization.md`:
- Around line 3-8: The frontmatter for this EN guide is present but you must
ensure EN/ES parity: locate the Spanish counterpart (same slug with .es or in
i18n directories) and update its frontmatter fields (description, owner, status,
lastReviewed, appliesTo, docType) to match or, if the ES page does not exist,
add an explicit "translationPending: true" (or a clear note in the
frontmatter/body) to this EN file so readers know the Spanish translation is
pending; update the metadata fields named description, owner, status,
lastReviewed, appliesTo, and docType (or add translationPending) accordingly.

In
`@openspec/changes/archive/2026-03-26-2026-03-25-multimodal-image-input-mvp/design.md`:
- Around line 138-141: The documentation incorrectly states that the canonical
`text` part requires `source_role` while the code emits ContentPart::Text { text
} only; update the table row for `text` to list only `text` as the required
field (remove `source_role`) so the documented shape matches the implementation
(refer to ContentPart::Text and the `text` part kind).

In `@scripts/validate-docs-metadata.mjs`:
- Line 37: The module currently calls getSidebarSlugs() and
collectReferencedSlugs() at top-level causing full doc walks on import; convert
these to lazy-initializer functions (e.g., create getSidebarSlugsLazy() and
getCollectedReferencedSlugsLazy() or wrap existing logic in functions like
lazyGetSidebarSlugs and lazyCollectReferencedSlugs that compute and cache
results on first call) and update validateOrphanStatus to call those lazy
getters instead of the module-level variables so the doc-walking only runs when
validation actually needs the data.
- Around line 115-124: The current extractFrontmatter/getField approach uses a
fragile regex that breaks on multiline block scalars, values containing colons,
and complex YAML; replace the regex-based extraction by parsing the frontmatter
with a proper YAML parser (e.g., js-yaml) — call extractFrontmatter to get the
raw frontmatter string, pass it to yaml.load (or safeLoad) to get an object,
then read fields from that object (handle null/undefined safely) instead of
using getField's regex; update any callers of getField to use the parsed object
or provide a small wrapper function that returns string values from the parsed
frontmatter.
- Around line 110-112: The catch block currently swallows errors and falls back
to walk(docsRoot) silently; change the catch to capture the error (e.g., catch
(err)) and log it before returning walk(docsRoot) so git failures are visible —
update the existing anonymous catch in scripts/validate-docs-metadata.mjs to
call console.error (or the script's logger) with a clear message including the
error and context, then return walk(docsRoot).
- Around line 323-336: The code computes ageMs/ageDays from reviewedAt before
validating the date, so invalid lastReviewed values can produce NaN and bypass
the age check; change the order in the validation block: after constructing
reviewedAt from lastReviewed, immediately check
Number.isNaN(reviewedAt.getTime()) and push the error
(`errors.push(\`${relativePath}: invalid lastReviewed '${lastReviewed}'\`)`) if
invalid, and only if valid (else branch) compute ageMs and ageDays and then
perform the status !== "deprecated" / maxReviewAgeDaysByDocType lookup and
ageDays > maxReviewAgeDays comparison to potentially push the stale-review
error.

---

Outside diff comments:
In `@clients/agent-runtime/src/gateway/mod.rs`:
- Around line 2010-2034: The handler currently skips verification when
state.whatsapp_app_secret is None; change it to fail-closed by rejecting
requests when the app secret is not configured: in the /whatsapp request handler
check state.whatsapp_app_secret and if None immediately return UNAUTHORIZED (or
a clear error response) instead of proceeding, and keep the existing
verify_whatsapp_signature(app_secret, &body, signature) branch when
Some(app_secret) is present; update any logging to indicate "missing
whatsapp_app_secret" and ensure the behavior is enforced in the handler around
state.whatsapp_app_secret and verify_whatsapp_signature so unsigned POSTs are
not accepted by default.

In
`@clients/web/apps/docs/src/content/docs/clients/agent-runtime/providers/index.mdx`:
- Around line 15-41: Add a "Multimodal/Image support" column to the provider
reference table in index.mdx and populate it per each provider ID (e.g., openai,
openrouter, anthropic, gemini, ollama, lmstudio, venice, groq, mistral, etc.)
indicating whether image input is supported, any constraints (supported formats,
max size/resolution, accepted endpoints), and fallback behavior (e.g., "not
supported — use text-only" or "accepts base64 or multipart; will downscale to
X"). Ensure entries match the runtime behavior implemented for those provider
IDs so the docs reflect the image-input MVP (specify exact provider capabilities
and limitations in the Notes cell for each provider).

In `@clients/web/apps/docs/src/content/docs/es/guides/structure.md`:
- Around line 18-22: Update the Spanish guide entries so they match the English
guide structure: replace the shortened `docs/` path with the full docs path used
in EN (use the same path pattern `clients/web/apps/docs/src/content/docs/...`),
rename the contributor doc reference from `AGENTS.md` to `CLAUDE.md`, and make
the same parallel fixes for the later section referenced (lines 52-57) so the ES
page mirrors EN parity for docs paths and contributor-doc naming (look for the
list items containing `docs/`, `AGENTS.md`, and their duplicates).

In `@clients/web/apps/docs/src/content/docs/guides/features.md`:
- Line 54: Update the stale checklist entry that reads "[x] **README/AGENTS**"
to reference the new CLAUDE doc (e.g., change the text to something like "[x]
**CLAUDE**: In-repo documentation for developers and agents") so the canonical
docs reflect the replacement of .agents/AGENTS.md with CLAUDE.md; locate the
string "[x] **README/AGENTS**" in features.md and replace it accordingly,
keeping the checklist formatting and intent intact.

In `@clients/web/apps/docs/src/content/docs/guides/structure.md`:
- Line 22: The docs page references a stale file name `AGENTS.md`; update the
single occurrence of that reference in the `structure.md` content to `CLAUDE.md`
so contributors are routed to the new guidance file, ensuring the text and any
surrounding punctuation remain consistent with existing phrasing.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 12f3f73a-e1f3-4e49-bd7a-7f2a9589d54d

📥 Commits

Reviewing files that changed from the base of the PR and between 93be06c and 3a49f5d.

📒 Files selected for processing (101)
  • .agents/AGENTS.md
  • .github/CODEOWNERS
  • .github/PULL_REQUEST_TEMPLATE/1_fix_issue.yml
  • .github/PULL_REQUEST_TEMPLATE/2_new_feature.yml
  • .github/PULL_REQUEST_TEMPLATE/3_other_type.yml
  • .github/pull_request_template.md
  • .github/workflows/docs-quality.yml
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/channels/cli.rs
  • clients/agent-runtime/src/channels/dingtalk.rs
  • clients/agent-runtime/src/channels/discord.rs
  • clients/agent-runtime/src/channels/email_channel.rs
  • clients/agent-runtime/src/channels/imessage.rs
  • clients/agent-runtime/src/channels/irc.rs
  • clients/agent-runtime/src/channels/lark.rs
  • clients/agent-runtime/src/channels/matrix.rs
  • clients/agent-runtime/src/channels/mattermost.rs
  • clients/agent-runtime/src/channels/media.rs
  • clients/agent-runtime/src/channels/mod.rs
  • clients/agent-runtime/src/channels/qq.rs
  • clients/agent-runtime/src/channels/signal.rs
  • clients/agent-runtime/src/channels/slack.rs
  • clients/agent-runtime/src/channels/telegram.rs
  • clients/agent-runtime/src/channels/traits.rs
  • clients/agent-runtime/src/channels/whatsapp.rs
  • clients/agent-runtime/src/config/mod.rs
  • clients/agent-runtime/src/config/schema.rs
  • clients/agent-runtime/src/doctor/mod.rs
  • clients/agent-runtime/src/gateway/mod.rs
  • clients/agent-runtime/src/observability/log.rs
  • clients/agent-runtime/src/observability/mod.rs
  • clients/agent-runtime/src/observability/multi.rs
  • clients/agent-runtime/src/observability/otel.rs
  • clients/agent-runtime/src/observability/prometheus.rs
  • clients/agent-runtime/src/observability/traits.rs
  • clients/agent-runtime/src/onboard/wizard.rs
  • clients/agent-runtime/src/providers/compatible.rs
  • clients/agent-runtime/src/providers/gemini.rs
  • clients/agent-runtime/src/providers/reliable.rs
  • clients/agent-runtime/src/providers/router.rs
  • clients/agent-runtime/src/providers/traits.rs
  • clients/agent-runtime/src/update/mod.rs
  • clients/agent-runtime/tests/admin_config_api_integration.rs
  • clients/web/apps/chat/src/App.spec.ts
  • clients/web/apps/chat/src/App.vue
  • clients/web/apps/docs/README.md
  • clients/web/apps/docs/astro.config.mjs
  • clients/web/apps/docs/package.json
  • clients/web/apps/docs/src/content.config.ts
  • clients/web/apps/docs/src/content/docs/clients/agent-runtime/architecture.md
  • clients/web/apps/docs/src/content/docs/clients/agent-runtime/ci-map.md
  • clients/web/apps/docs/src/content/docs/clients/agent-runtime/index.mdx
  • clients/web/apps/docs/src/content/docs/clients/agent-runtime/pr-workflow.md
  • clients/web/apps/docs/src/content/docs/clients/agent-runtime/providers/index.mdx
  • clients/web/apps/docs/src/content/docs/es/clients/agent-runtime/architecture.md
  • clients/web/apps/docs/src/content/docs/es/clients/agent-runtime/ci-map.md
  • clients/web/apps/docs/src/content/docs/es/clients/agent-runtime/index.mdx
  • clients/web/apps/docs/src/content/docs/es/clients/agent-runtime/pr-workflow.md
  • clients/web/apps/docs/src/content/docs/es/clients/agent-runtime/providers/index.mdx
  • clients/web/apps/docs/src/content/docs/es/guides/architecture.md
  • clients/web/apps/docs/src/content/docs/es/guides/architecture/overview.md
  • clients/web/apps/docs/src/content/docs/es/guides/cerebro/migration.md
  • clients/web/apps/docs/src/content/docs/es/guides/cli-reference.md
  • clients/web/apps/docs/src/content/docs/es/guides/configuration.md
  • clients/web/apps/docs/src/content/docs/es/guides/customization.md
  • clients/web/apps/docs/src/content/docs/es/guides/development.md
  • clients/web/apps/docs/src/content/docs/es/guides/features.md
  • clients/web/apps/docs/src/content/docs/es/guides/getting-started.md
  • clients/web/apps/docs/src/content/docs/es/guides/gpg-setup.md
  • clients/web/apps/docs/src/content/docs/es/guides/hardware-peripherals-design.md
  • clients/web/apps/docs/src/content/docs/es/guides/release.md
  • clients/web/apps/docs/src/content/docs/es/guides/structure.md
  • clients/web/apps/docs/src/content/docs/es/guides/surrealdb.md
  • clients/web/apps/docs/src/content/docs/es/index.mdx
  • clients/web/apps/docs/src/content/docs/es/intro/introduction.mdx
  • clients/web/apps/docs/src/content/docs/guides/architecture.md
  • clients/web/apps/docs/src/content/docs/guides/architecture/overview.md
  • clients/web/apps/docs/src/content/docs/guides/cerebro/migration.md
  • clients/web/apps/docs/src/content/docs/guides/cli-reference.md
  • clients/web/apps/docs/src/content/docs/guides/configuration.md
  • clients/web/apps/docs/src/content/docs/guides/customization.md
  • clients/web/apps/docs/src/content/docs/guides/development.md
  • clients/web/apps/docs/src/content/docs/guides/features.md
  • clients/web/apps/docs/src/content/docs/guides/getting-started.md
  • clients/web/apps/docs/src/content/docs/guides/gpg-setup.md
  • clients/web/apps/docs/src/content/docs/guides/hardware-peripherals-design.md
  • clients/web/apps/docs/src/content/docs/guides/release.md
  • clients/web/apps/docs/src/content/docs/guides/structure.md
  • clients/web/apps/docs/src/content/docs/guides/surrealdb.md
  • clients/web/apps/docs/src/content/docs/index.mdx
  • clients/web/apps/docs/src/content/docs/intro/introduction.mdx
  • openspec/changes/archive/2026-03-26-2026-03-25-multimodal-image-input-mvp/design.md
  • openspec/changes/archive/2026-03-26-2026-03-25-multimodal-image-input-mvp/exploration.md
  • openspec/changes/archive/2026-03-26-2026-03-25-multimodal-image-input-mvp/proposal.md
  • openspec/changes/archive/2026-03-26-2026-03-25-multimodal-image-input-mvp/specs/agent-loop/spec.md
  • openspec/changes/archive/2026-03-26-2026-03-25-multimodal-image-input-mvp/specs/agent-runtime-providers/spec.md
  • openspec/changes/archive/2026-03-26-2026-03-25-multimodal-image-input-mvp/tasks.md
  • openspec/changes/archive/2026-03-26-2026-03-25-multimodal-image-input-mvp/verify-report.md
  • openspec/specs/agent-loop/spec.md
  • openspec/specs/agent-runtime-providers/spec.md
  • scripts/validate-docs-metadata.mjs

Comment thread .agents/AGENTS.md Outdated
Comment thread .github/CODEOWNERS Outdated
Comment thread .github/PULL_REQUEST_TEMPLATE/1_fix_issue.yml
Comment thread clients/agent-runtime/src/channels/discord.rs
Comment thread clients/agent-runtime/src/channels/discord.rs Outdated
Comment thread scripts/validate-docs-metadata.mjs Outdated
Comment thread scripts/validate-docs-metadata.mjs Outdated
Comment thread scripts/validate-docs-metadata.mjs
Comment thread scripts/validate-docs-metadata.mjs
…s, and test alignment

- Fix brace mismatch in channels/mod.rs fail-closed image staging guard
- Fix ownership error: clone user_text before move into enriched_message
- Fix partial move: extract text before consuming tool_calls in compatible.rs
- Remove unused truncate_with_ellipsis import from gateway/mod.rs
- Fix test assertions: use ImageIngressReason enum variants instead of as_deref strings
- Add model_routes to config validation tests for vision route cross-reference
- Override chat() in ImageCapableMock so image turns use the correct trait path
- Align channel parts, observability, provider adapters, and docs with verify findings
- Replace backtracking-prone markdown link regex with simpler pattern
- Add --ignore-scripts to pnpm install in docs-quality workflow
- PATH hotspots in execFileSync calls reviewed as SAFE (no shell bypass)
@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented Mar 26, 2026

Deploying corvus with  Cloudflare Pages  Cloudflare Pages

Latest commit: 09adb0a
Status: ✅  Deploy successful!
Preview URL: https://e5f389fd.corvus-42x.pages.dev
Branch Preview URL: https://feat-multimodal-image-input.corvus-42x.pages.dev

View logs

@coderabbitai coderabbitai Bot added area:web and removed area:rust labels Mar 26, 2026
@sentry
Copy link
Copy Markdown

sentry Bot commented Mar 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

The frontmatter slug used 'migration-guide' while the sidebar
referenced 'migration', causing orphan detection failures in CI.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 32

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
clients/agent-runtime/src/channels/lark.rs (1)

561-623: ⚠️ Potential issue | 🟠 Major

Webhook mode still bypasses websocket normalization.

The websocket path strips @_user_N placeholders, trims text, skips empty turns, and suppresses unmentioned group-chat traffic before building ChannelMessage. This webhook path now populates parts, but it still skips that normalization/gating, so webhook mode can forward blank or placeholder-only turns and answer group traffic that websocket mode ignores. Reuse the same normalization logic before messages.push(...). As per coding guidelines "Implement Channel trait in src/channels/ with consistent send, listen, and health_check semantics and cover auth/allowlist/health behavior with tests".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/channels/lark.rs` around lines 561 - 623, The
webhook branch currently builds a ChannelMessage directly from the extracted
text (msg_type / parse_post_content -> text) without applying the same
normalization/gating used by the websocket path; extract and reuse that
normalization (strip `@_user_N` placeholders, trim, drop empty turns, and suppress
unmentioned group-chat traffic) before pushing the message. Concretely: refactor
the websocket normalization into a reusable helper (e.g., normalize_message_text
or reuse the existing helper if present), call it on the `text` variable after
computing `timestamp` and `chat_id`, and only push ChannelMessage (id/ sender/
reply_target/ content/ channel/ timestamp/ parts: vec![ContentPart::Text { text
}]) when the helper returns a non-empty/allowed result so webhook mode follows
the same gating as websocket mode. Ensure you reference parse_post_content,
msg_type, ChannelMessage, and ContentPart when locating the code to change.
♻️ Duplicate comments (7)
clients/agent-runtime/src/observability/prometheus.rs (1)

191-201: ⚠️ Potential issue | 🟡 Minor

Add a metric assertion test for corvus_image_ingress_total.

The new handler at Line 191 increments the counter, but no test currently verifies this metric is exported by encode(). Please add one regression test that emits ObserverEvent::ImageIngress(...) and asserts corvus_image_ingress_total (with expected labels) is present in output.

As per coding guidelines "**/*: Security first, performance second. Look for behavioral regressions, missing tests, and contract breaks across modules."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/observability/prometheus.rs` around lines 191 -
201, Add a regression test that emits an ObserverEvent::ImageIngress and
verifies the corvus_image_ingress_total metric is exported by encode(): create a
test (e.g., in the same module tests or prometheus tests) that constructs an
ObserverEvent::ImageIngress with specific channel, outcome and reason values,
calls the code path that records the event (triggering
image_ingress.with_label_values(...).inc()), then calls the prometheus encode()
function and asserts the output contains the metric name
corvus_image_ingress_total with the exact label set (channel, outcome, reason)
and an increased count; reference the ObserverEvent::ImageIngress variant, the
image_ingress counter, and the encode() exporter when adding the assertion.
clients/agent-runtime/src/observability/otel.rs (1)

202-216: ⚠️ Potential issue | 🟡 Minor

Add regression coverage for the new ImageIngress OTel path.

Line 202 now records image ingress metrics, but tests still don’t exercise this branch. Please add a test that emits ObserverEvent::ImageIngress(...) to guard against future silent drops/regressions in this path.

As per coding guidelines "**/*: Security first, performance second. Look for behavioral regressions, missing tests, and contract breaks across modules."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/observability/otel.rs` around lines 202 - 216, Add
a unit/integration test that exercises the ObserverEvent::ImageIngress branch:
construct an ObserverEvent::ImageIngress with a channel, outcome and reason,
send it through the same event-handling entry point that matches on
ObserverEvent (the match arm that calls self.image_ingress.add), and assert the
image_ingress metric was updated (via the test OTLP/metric exporter or the
observability struct's test-accessible state) to prevent regressions in the
image_ingress.add path.
clients/web/apps/docs/src/content/docs/guides/architecture.md (1)

16-35: ⚠️ Potential issue | 🟡 Minor

Nest the docs app under clients/web/ in the tree.

clients/web/apps/docs/ is the right path, but showing it as a root-level sibling still misrepresents the repo layout. Expand the web/ node to include apps/docs/ here instead of listing the full nested path as its own top-level entry, and mirror the same structure in the Spanish page to keep EN/ES aligned.

Suggested doc tree shape
 │   ├── iosApp/                 # iOS host app (Xcode project)
-│   └── web/                    # Web app and operator dashboard
+│   └── web/                    # Web app and operator dashboard
+│       └── apps/
+│           └── docs/           # Documentation (Astro + Starlight)
 ...
-├── clients/web/apps/docs/      # Documentation (Astro + Starlight)

As per coding guidelines, **/*.{md,mdx}: Verify technical accuracy and that docs stay aligned with code changes. For user-facing docs, check EN/ES parity or explicitly note pending translation gaps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/web/apps/docs/src/content/docs/guides/architecture.md` around lines
16 - 35, The docs tree lists "clients/web/apps/docs/" as a top-level sibling;
update the ASCII tree in the architecture guide so the web node expands to
include an apps/docs subtree (e.g., under the "web/" branch add "apps/" then
"docs/") instead of repeating the full nested path as a root item, and apply the
same change to the Spanish version to keep EN/ES aligned; ensure the updated
diagram preserves the existing nodes (clients/, modules/, gradle/, Makefile,
settings.gradle.kts) and verify the docs file references to
"clients/web/apps/docs/" remain accurate and compliant with the **/*.{md,mdx}
guideline for EN/ES parity.
scripts/validate-docs-metadata.mjs (1)

127-149: ⚠️ Potential issue | 🟠 Major

Use a real YAML parser for frontmatter extraction.

extractFrontmatter() / getField() only handle single-line key: value entries. Block scalars, arrays, inline comments, and values containing additional colons will be misread, which makes this required CI check prone to false passes and false failures. Parse the frontmatter once with YAML and read fields from the resulting object instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/validate-docs-metadata.mjs` around lines 127 - 149, Replace the
fragile line-based parsing in getMetadata/getField with a real YAML parse: have
extractFrontmatter return the raw frontmatter string, import a YAML parser
(e.g., js-yaml or yaml), parse the frontmatter once into an object, and then
read description, owner, status, lastReviewed, appliesTo, docType, and slug from
that parsed object (falling back to empty string or null as before); remove or
stop using getField() for value extraction and ensure getMetadata still returns
null when no frontmatter and returns the normalized fields from the parsed YAML
when present.
clients/agent-runtime/src/gateway/mod.rs (1)

2064-2105: ⚠️ Potential issue | 🔴 Critical

Rollback and namespace queue reservations before returning 2xx.

Line 2068 records a bare msg.id in the same idempotency_store used by /webhook keys at Line 1734, and Lines 2083-2105 keep that reservation even when enqueue fails while still allowing 202 Accepted if another message in the batch succeeded. That can permanently drop the failed WhatsApp turn and lets /whatsapp traffic collide with or evict /webhook dedupe state.

Suggested fix
     if let Some(ref handle) = state.channel_runtime_handle {
         let mut enqueued = 0u32;
+        let mut enqueue_failed = false;
         for msg in messages {
+            let message_id = msg.id.clone();
+            let dedupe_key = format!("whatsapp:{message_id}");
+
             // Deduplicate: skip already-seen message ids.
-            if !state.idempotency_store.record_if_new(&msg.id) {
+            if !state.idempotency_store.record_if_new(&dedupe_key) {
                 tracing::debug!(
-                    msg.id = %msg.id,
+                    message_id = %message_id,
                     "WhatsApp duplicate skipped",
                 );
                 continue;
             }
 
             tracing::info!(
                 msg.id = %msg.id,
                 msg.sender = %msg.sender,
                 has_image = msg.has_image_parts(),
                 "WhatsApp → canonical runtime",
             );
 
             if let Err(e) = handle.enqueue(msg) {
-                tracing::error!("Failed to enqueue WhatsApp message: {e}");
-                // Enqueue failed — don't count as accepted.
+                state.idempotency_store.remove(&dedupe_key);
+                enqueue_failed = true;
+                tracing::error!(
+                    message_id = %message_id,
+                    "Failed to enqueue WhatsApp message: {e}"
+                );
             } else {
                 enqueued += 1;
             }
         }
 
+        if enqueue_failed {
+            return (
+                StatusCode::SERVICE_UNAVAILABLE,
+                Json(serde_json::json!({
+                    "error": "Failed to enqueue one or more WhatsApp messages"
+                })),
+            );
+        }
+
         if enqueued > 0 {
             return (
                 StatusCode::ACCEPTED,
                 Json(serde_json::json!({
                     "status": "accepted"
                 })),
             );
         }
-        // All messages were duplicates or failed to enqueue.
+        // All messages were duplicates.
         return (
             StatusCode::OK,
             Json(serde_json::json!({
                 "status": "ok"
             })),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/gateway/mod.rs` around lines 2064 - 2105, The loop
records message ids via state.idempotency_store.record_if_new(&msg.id) before
enqueueing, which leaves reservations even when handle.enqueue(msg) fails and
can collide with /webhook keys; change the logic so you either (a) only call
record_if_new after a successful handle.enqueue, or (b) if you must reserve
before enqueue, ensure you remove/rollback that reservation on enqueue failure
by calling a remove/unrecord method on idempotency_store, and ensure the
idempotency entries for WhatsApp are namespaced (e.g., use a "whatsapp" prefix
or dedicated key method) so they do not collide with /webhook entries; update
the code paths around state.idempotency_store.record_if_new, handle.enqueue, and
the final response decision so failed enqueues do not leave permanent
reservations and the 202/200 response reflects only truly accepted messages.
clients/agent-runtime/src/config/schema.rs (1)

3128-3135: ⚠️ Potential issue | 🔴 Critical

Fix compile-time type mismatch: match &Option<String> instead of match Option<String>.

mm is a shared borrow, so mm.vision_model_hint has type &Option<String>, but the pattern Some(ref h) expects Option<String> by value. This causes a type mismatch and will not compile. Use as_deref() to borrow the inner String and validate the hint is not empty.

Suggested fix
-        let hint = match mm.vision_model_hint {
-            Some(ref h) => h,
-            None => {
-                anyhow::bail!(
-                    "multimodal.enabled=true requires multimodal.vision_model_hint to be set"
-                );
-            }
-        };
+        let hint = mm
+            .vision_model_hint
+            .as_deref()
+            .map(str::trim)
+            .filter(|hint| !hint.is_empty())
+            .ok_or_else(|| {
+                anyhow::anyhow!(
+                    "multimodal.enabled=true requires multimodal.vision_model_hint to be set"
+                )
+            })?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/config/schema.rs` around lines 3128 - 3135, The
match is using mm.vision_model_hint by-value but mm is a shared borrow so change
the match to use a borrowed string (e.g. mm.vision_model_hint.as_deref()) and
bind to Some(h) (a &str) instead of Some(ref h); also validate h is not empty
and bail with the same message if it is empty or None so that the hint variable
becomes a non-empty &str for further use (referencing mm, vision_model_hint, and
the hint binding).
clients/agent-runtime/src/channels/telegram.rs (1)

1562-1564: ⚠️ Potential issue | 🔴 Critical

Sanitize the getFile decode error too.

Line 1563 still formats the raw reqwest::Error. Response::json() can carry the request URL, so a malformed Telegram response can re-expose bot_token even after the earlier sanitization fixes.

Minimal fix
-        let data: serde_json::Value = resp.json().await.map_err(|e| {
-            tracing::warn!("Telegram getFile response parse error: {e}");
+        let data: serde_json::Value = resp.json().await.map_err(|e| {
+            tracing::warn!(
+                "Telegram getFile response parse error: {}",
+                self.sanitize_error(&e)
+            );
             media::ImageRejectionReason::FetchFailed
         })?;

Based on learnings: Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/channels/telegram.rs` around lines 1562 - 1564, The
parse-error log for resp.json() in the getFile flow currently interpolates the
raw reqwest::Error (resp.json().await.map_err(|e| { tracing::warn!("Telegram
getFile response parse error: {e}"); ... })), which can leak URLs or tokens;
replace that interpolation with a non-sensitive message—e.g. call
tracing::warn!("Telegram getFile response parse error (sanitized)"); without
including {e} or any request data, or extract and log only a safe error
code/enum that contains no URL/token info—keeping the map_err to return
media::ImageRejectionReason::FetchFailed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/AGENTS.md:
- Around line 1-174: The MD022 warnings are caused by missing blank lines
surrounding headings in AGENTS.md; add a single blank line before and after each
top-level and sub-heading (e.g., the "Agent Metadata" YAML block, "## Mission",
"## Repo rule sources", "### Whole repo", "### Kotlin / Android / KMP", etc.)
and ensure fenced code blocks are followed and preceded by a blank line so
headings are separated from content and code blocks; update AGENTS.md to insert
these blank lines around the headings referenced to satisfy markdownlint MD022.

In @.github/pull_request_template.md:
- Around line 75-79: The pull request template currently pre-checks verification
items which undermines explicit confirmation; update the checklist entries for
"I have read the Contributing Guidelines", "I have added or updated tests...",
and "I verified the documentation matches the current behavior" (the three items
currently marked with [x]) to be unchecked ([ ]) instead of checked so
reviewers/authors must intentionally confirm them; edit the three checklist
lines in .github/pull_request_template.md accordingly while keeping the exact
text of each item unchanged.

In `@clients/agent-runtime/src/channels/discord.rs`:
- Around line 369-372: The warning logs raw Discord user IDs when
is_user_allowed(author_id) returns false; change the tracing::warn call to avoid
emitting the raw author_id by either replacing it with a constant redacted
string (e.g., "<redacted>") or logging an irreversible hash of author_id
(compute a fast one-way hash inside the same block before logging) so the
message remains informative without retaining the original identifier; update
the branch where is_user_allowed is called and ensure any new hash function used
is deterministic but one-way and not reversible.
- Around line 299-305: The heartbeat arm currently uses `_ = hb_rx.recv()` which
ignores the Option and never breaks when the ticker task closes; change the arm
to await hb_rx.recv(), pattern-match the result (e.g., if let Some(_) =
hb_rx.recv().await or match hb_rx.recv().await) and if it returns None, break
the loop; otherwise construct the heartbeat payload (using sequence) and call
write.send(Message::Text(...)).await as before so the loop stops sending
heartbeats when the ticker channel closes.

In `@clients/agent-runtime/src/channels/lark.rs`:
- Around line 400-418: The logs currently emit raw identifiers (sender_open_id,
lark_msg.message_id, chat_id) in the Lark WS handling; update the logging calls
around self.is_user_allowed, the dedup branch (ws_seen_ids) and any chat_id log
to redact or hash identifiers before logging (e.g., compute a stable SHA-256 or
blake2b hex digest of sender_open_id/message_id/chat_id and log the short hex or
a masked form). Locate the tracing::warn! that references sender_open_id, the
tracing::debug! that prints lark_msg.message_id, and any tracing::... that
prints chat_id, and replace the direct identifiers with the hashed/masked
variable; keep the is_user_allowed check and dedup logic unchanged except use
the raw ids for internal checks but never log them directly.

In `@clients/agent-runtime/src/channels/media.rs`:
- Around line 117-124: The PNG detection only checks the first four bytes;
change it to require the full 8‑byte PNG signature by verifying
sniffed_bytes.len() >= 8 and comparing bytes 0..7 against the canonical PNG
signature (0x89,0x50,0x4E,0x47,0x0D,0x0A,0x1A,0x0A) before returning
AllowedImageMime::Png (update the same detection block that currently checks
sniffed_bytes[0..3]); this tightens the admission boundary and prevents
truncated/crafted payloads from being accepted.

In `@clients/agent-runtime/src/channels/mod.rs`:
- Around line 914-920: The success path still uses the original ctx.model for
post-response validation; compute/store the resolved execution model (the result
of execution_model_for_turn(ctx.model.as_str(), image_route_metadata.as_ref()))
and thread that value into handle_successful_response so that
enforce_strict_memory_validation receives the effective model instead of
ctx.model; update the handle_successful_response call/site and any downstream
calls to enforce_strict_memory_validation to use this resolved model variable to
ensure image-grounded answers are validated with the vision-aware model.

In `@clients/agent-runtime/src/channels/qq.rs`:
- Around line 345-347: The deny-path warn logs currently print persistent QQ IDs
(user_openid/member_openid); update the logging in the relevant blocks (where
is_user_allowed(...) is checked and the similar block around member_openid) to
avoid emitting raw identifiers—either remove the identifier from the message or
replace it with a deterministic short hash/truncated form (e.g., SHA256 hex
truncated to 8 chars) before logging; ensure the change is applied to the warn!
calls referencing user_openid and member_openid so no raw PII is written to
logs.

In `@clients/agent-runtime/src/channels/telegram.rs`:
- Around line 1547-1552: Add explicit timeouts in fetch_and_stage_image(): wrap
the reqwest .send().await call that posts to self.api_url("getFile") with a
tokio::time::timeout using CHANNEL_MESSAGE_TIMEOUT_SECS (or a small per-request
timeout) and map timeout errors to FetchFailed; also wrap the entire file
download/streaming loop (the part that reads the response body) in a timeout so
a stalled stream also yields FetchFailed. While here, change the error logging
after resp.json().await to use self.sanitize_error(&e) instead of interpolating
{e} so the bot token isn’t leaked (mirror how getFile errors are sanitized).
Ensure all timeout branches convert to the same FetchFailed error path used by
stage_channel_images().
- Around line 1621-1634: The current temp_path based on sha256 is shared across
concurrent handlers (see temp_path, sha256 and tokio::fs::write), leading to
deletion races when StagedImageGuard drops; change the staging to create a
unique temp file per invocation using atomic creation (e.g.,
std::fs::OpenOptions::create_new() or tempfile::NamedTempFile::new_in()) so each
handler gets its own exclusive path, write bytes into that unique file and
propagate errors as before, and ensure StagedImageGuard deletes only its own
file.

In `@clients/agent-runtime/src/channels/whatsapp.rs`:
- Around line 257-266: The current code builds a predictable temp_path and
writes with tokio::fs::write which risks TOCTOU, symlink attacks, and
collisions; replace that with a securely-created unique temp file (e.g.,
tempfile::Builder or tempfile::NamedTempFile created with create_new/owner-only
perms) and write bytes into that file atomically, returning or moving the path
only on success and mapping errors to media::ImageRejectionReason::FetchFailed
as before; update the code paths that reference temp_path/sha256 and the
tokio::fs::write call so the temporary file is created securely and written to
(and ensure file permissions are owner-only).
- Around line 166-170: fetch_and_stage_image currently hardcodes
media::MAX_IMAGE_BYTES for both the header check and the streaming loop,
ignoring the configurable multimodal.max_image_bytes; change the API to accept
an effective image-size ceiling (e.g., add a parameter max_image_bytes: usize or
an Options struct) to the fetch_and_stage_image method and use that value for
the Content-Length/header pre-check and the streaming byte-accumulator loop
(replace uses of media::MAX_IMAGE_BYTES inside fetch_and_stage_image and the
related helper logic at the other affected block) and propagate the same
parameter from callers so stricter or larger deployments are honored
consistently.
- Around line 179-184: The warning log currently slices media_id bytes with
&media_id[..media_id.len().min(8)], which can panic on non-ASCII UTF-8; update
the handler that logs media_id (the closure used in the map_err for the media
metadata fetch) to safely truncate the ID using character boundaries (e.g.,
media_id.chars().take(8).collect::<String>() or an equivalent safe char-boundary
helper) before interpolating it into tracing::warn!; ensure you replace the
byte-slice reference to media_id with the safe truncated string.

In `@clients/agent-runtime/src/observability/multi.rs`:
- Around line 28-32: Add a unit test that verifies
MultiObserver::on_image_ingress fans out to each contained observer: create two
or more lightweight test observers (e.g., a TestObserver struct implementing the
Observer trait that records calls to a Vec or Atomic counter), construct a
MultiObserver with those observers, call
multi.on_image_ingress(&ImageIngressEvent{...}), and assert each test observer
received exactly one call with the same event (or equal contents). Place the
test alongside existing MultiObserver tests and reference MultiObserver,
on_image_ingress, ImageIngressEvent, and the test observer implementation in the
test to ensure the dispatch path is covered.

In `@clients/agent-runtime/src/providers/compatible.rs`:
- Around line 508-539: The code silently drops images when there is no user turn
because last_user_idx (from rposition) can be None; add an explicit validation
before building api_messages: if request.images is non-empty and
last_user_idx.is_none(), return an error (e.g., anyhow::anyhow!("Multimodal
request contains images but no user message")); this check should occur before
the for (idx, msg) loop so functions like last_user_idx and request.images are
used to detect the invalid state and fail closed rather than quietly producing a
text-only request.
- Around line 528-531: The loop that reads images currently calls blocking
std::fs::read(&image.temp_path) which will block Tokio worker threads; replace
that call with the async tokio::fs::read(&image.temp_path).await and preserve
the existing error mapping (e.g.,
tokio::fs::read(&image.temp_path).await.map_err(|e| anyhow::anyhow!("Failed to
read staged image: {e}"))?), ensuring the enclosing function is async (or, if it
cannot be async, use tokio::task::spawn_blocking to perform the read) so the
base64_encode(&bytes) line remains unchanged and the request.images loop no
longer blocks the runtime.

In `@clients/agent-runtime/src/providers/gemini.rs`:
- Around line 369-387: The code currently attaches images only when Some(idx) ==
last_user_idx, but if last_user_idx is None the request proceeds without images;
update the logic in the message-to-contents assembly (look at last_user_idx,
request.images, contents and the loop that pushes Part::Text and image parts) to
explicitly reject/return an error when request.images is non-empty and
last_user_idx.is_none(), so image-bearing requests with no user turn fail fast
instead of silently dropping InlineData parts; ensure the error path returns a
clear validation error to callers.
- Around line 387-390: The loop that reads staged images uses blocking
std::fs::read(&image.temp_path) inside the async flow (for image in
request.images) — change this to tokio::fs::read(&image.temp_path).await to
avoid blocking Tokio worker threads; update the enclosing function to be async
if it isn't already, adjust the error mapping to handle the async read (preserve
the anyhow::anyhow! message), and continue to call base64_encode(&bytes) as
before (symbols to locate: request.images, image.temp_path, base64_encode).

In `@clients/agent-runtime/src/providers/reliable.rs`:
- Around line 411-428: The loop currently only gates on
provider.capabilities().supports_image_input(), which allows an image-capable
but tool-incapable provider to be chosen when request.tools is non-empty; update
the provider selection in the loop (the iteration over self.providers and the
call site around execute_single_provider) to also require the provider support
tool execution when request.tools is non-empty (e.g. check
provider.capabilities().supports_tools() or equivalent), and if no provider
satisfies both supports_image_input() and the tool capability when request.tools
is non-empty, return an explicit error (or fail fast) instead of silently
continuing; ensure the checks reference self.providers, provider.capabilities(),
request.tools, and the execute_single_provider call so the correct providers are
filtered before calling p.chat.

In `@clients/agent-runtime/src/providers/router.rs`:
- Around line 486-540: The tests only cover text-only providers and don't verify
successful routing when a provider supports images; add a test that registers at
least one provider that advertises image support and returns a valid response
for image turns, then call router.chat with a request containing images and
assert the call succeeds and that the image-capable mock's call_count()
increments; also update capabilities_merges_all_providers to construct multiple
providers (including one that supports images) and assert
caps.supports_image_input() is true to ensure capability aggregation/selection
is exercised (refer to tests chat_rejects_image_turn_to_text_only_provider,
chat_text_only_passes_through_without_image_check and
RouterProvider::capabilities).
- Around line 95-108: supports_native_tools() is inconsistent with
capabilities(): capabilities() unions native_tool_calling across all routed
providers but supports_native_tools() still checks only the default provider;
update supports_native_tools() to iterate self.providers and return true if any
provider.capabilities().native_tool_calling is true (mirroring the union logic
in capabilities()), and apply the same fix to the second occurrence noted around
the other method (lines referenced in the review) so both implementations remain
consistent.

In `@clients/agent-runtime/src/providers/traits.rs`:
- Around line 327-331: Add a regression test exercising the fail-closed image
path by creating a request with a non-empty images slice and asserting that the
default Provider::chat() returns an Err; specifically, in the existing test
block near the current chat/image tests add a case that constructs a Request (or
equivalent struct used in these tests) with images = vec![/* dummy image value
*/] and call Provider::chat(&default_provider, request).assert_err() (or
assert!(result.is_err())) to ensure the "Provider does not support image input
(no chat() override)" path is exercised and the text-only fallback does not
silently reappear; reference the Provider::chat() implementation and the
request.images check so the test targets the fail-closed behavior.

In
`@clients/web/apps/docs/src/content/docs/clients/agent-runtime/providers/index.mdx`:
- Around line 15-40: The Image Support column is wrong — audit
clients/agent-runtime/src/providers/mod.rs to see which concrete provider types
are instantiated and then check each provider's capabilities() method to
determine if image_input: true (GeminiProvider and any provider implementing
OpenAiCompatibleProvider should be marked Yes); update the English table rows
accordingly (e.g., OpenRouter/Anthropic/OpenAI are non-overriding providers so
set Image Support to No, and providers that use OpenAiCompatibleProvider such as
Groq, Mistral, xAI, DeepSeek, Together, Fireworks, Perplexity, Cohere, Bedrock,
Vercel, Cloudflare, Venice, Synthetic, OpenCode should be set to Yes where
capabilities() exposes image_input) and mirror this corrected table schema into
es/clients/agent-runtime/providers/index.mdx (add the Image Support column and
synchronize all rows).

In `@clients/web/apps/docs/src/content/docs/es/guides/cerebro/migration.md`:
- Around line 5-9: The Spanish "cerebro/migration" doc is orphaned (not present
in the sidebar and not referenced); add it to the docs navigation and create at
least one cross-reference from related docs (or from the English counterpart) so
the validator can find it. Update the sidebar configuration to include the
document slug/title and add a link from the English migration doc (or another
related page) to this Spanish locale, ensuring frontmatter parity
(owner/status/lastReviewed/appliesTo/docType) is preserved; re-run the docs
quality check to confirm the orphan error is resolved.

In
`@clients/web/apps/docs/src/content/docs/es/guides/hardware-peripherals-design.md`:
- Around line 3-8: The page's content still references "AGENTS.md" even though
the PR renamed that file to "CLAUDE.md" and the frontmatter `lastReviewed` was
updated; find the stale reference text that states "this phase was documented in
AGENTS.md" (occurring around the section referencing the review phase / lines
~263–267) and replace it with "CLAUDE.md", updating any markdown link or
filename mention to the new CLAUDE.md name so the doc and links match the code
changes.

In
`@openspec/changes/archive/2026-03-26-2026-03-25-multimodal-image-input-mvp/exploration.md`:
- Around line 10-13: Update the "Current State" section in exploration.md to
indicate this is a historical snapshot by adding an explicit "as-of" timestamp
or commit reference (e.g., "As of 2026-03-26 / commit <hash>") and a short note
that runtime contracts have since changed due to the multimodal implementation;
locate the "Current State" heading and append the timestamp/commit note and a
one-sentence clarification that this file is archived and no longer reflects the
live runtime behavior.

In
`@openspec/changes/archive/2026-03-26-2026-03-25-multimodal-image-input-mvp/specs/agent-loop/spec.md`:
- Around line 21-106: The new headings in this diff (e.g., "## ADDED
Requirements", "### Requirement: MVP Inbound Image Turn Contract", "###
Requirement: Image Admission Safety and Retention Controls", "### Requirement:
MVP Channel Boundaries and Ingress Fallback" and each "#### Scenario: ..." block
such as "Scenario: Telegram photo with caption is normalized into canonical
parts", "Scenario: Non-image attachment is not coerced into an image turn",
etc.) need a blank line after them to satisfy markdownlint MD022; update spec.md
by inserting a single empty line immediately after each of those added heading
lines so each heading is followed by a blank line before the subsequent prose or
list items.

In
`@openspec/changes/archive/2026-03-26-2026-03-25-multimodal-image-input-mvp/specs/agent-runtime-providers/spec.md`:
- Around line 15-89: Add a blank line between each "#### Scenario" heading and
the following list to satisfy MD022; specifically, in the archived spec update
ensure a single empty line is inserted before the bullets under the headings
"Scenario: Provider declares accepted image transport forms", "Scenario:
Undeclared provider remains text-only", "Scenario: In-scope providers are
eligible for MVP image routing", "Scenario: Out-of-scope provider is excluded
from MVP promise", "Scenario: Eligible provider receives the canonical image
turn", "Scenario: No capable provider is available", "Scenario: Runtime-managed
inline payload is used for a validated image", and "Scenario: Remote reference
is rejected when it would weaken safety controls" so each heading is followed by
a blank line before its list.

In
`@openspec/changes/archive/2026-03-26-2026-03-25-multimodal-image-input-mvp/tasks.md`:
- Line 40: In the archived task line that begins "3.4 Add WhatsApp Graph media
resolution/download helpers..." replace the phrase "hand validated" with the
hyphenated form "hand-validated" so the task reads "...stream bounded bytes, and
hand-validated payloads to the shared staging pipeline."; also scan the same
tasks.md for any other occurrences of "hand validated" and update them to
"hand-validated" to satisfy the linter.

In `@openspec/specs/agent-loop/spec.md`:
- Around line 37-120: The new markdown headings (e.g., "#### Scenario: WhatsApp
image turn enters the canonical runtime seam", "#### Scenario: Rejected WhatsApp
transport never reaches the runtime", "### Requirement: MVP Inbound Image Turn
Contract", and all subsequent "#### Scenario:" and "### Requirement:" headings)
are followed immediately by content and must have a blank line inserted after
each heading to satisfy markdownlint MD022; update each heading in this diff by
adding a single empty line between the heading line and the following paragraph
so each heading is separated from its content.

In `@openspec/specs/agent-runtime-providers/spec.md`:
- Around line 102-176: The MD022 warnings come from missing blank lines between
several "#### Scenario:" headings and their following list items; update each
"#### Scenario: ..." heading (e.g., headings that begin "#### Scenario: Provider
declares accepted image transport forms", "#### Scenario: Undeclared provider
remains text-only", "#### Scenario: In-scope providers are eligible for MVP
image routing", "#### Scenario: Out-of-scope provider is excluded from MVP
promise", "#### Scenario: Eligible provider receives the canonical image turn",
"#### Scenario: No capable provider is available", "#### Scenario:
Runtime-managed inline payload is used for a validated image", "#### Scenario:
Remote reference is rejected when it would weaken safety controls") to ensure
there is a single blank line between the heading line and the subsequent
bullet/list block; insert the blank line in each place so the "#### Scenario"
headings are separated from the following "-" list items to satisfy markdownlint
MD022.

In `@scripts/validate-docs-metadata.mjs`:
- Around line 65-119: GetChangedDocsFiles currently limits git queries to
docsRootRelative and returns deleted paths verbatim; change its logic to query
all changed files (remove docsRootRelative from the git diff/ls-files calls),
then build two lists: (A) all changed paths under docsRootRelative and (B) all
other changed paths; map to absolute paths, filter A for .md/.mdx, then if A is
empty but B contains validator inputs (e.g., sidebar/config/other docs inputs)
OR if any path in A does not exist (use fs.existsSync) or any deleted paths were
detected, return walk(docsRoot) instead of returning the raw list; otherwise
return the filtered, existing .md/.mdx absolute paths. Ensure you update logic
inside getChangedDocsFiles to use docsRoot and fs.existsSync checks and to treat
non-doc changes as a trigger to fall back to walk(docsRoot).

---

Outside diff comments:
In `@clients/agent-runtime/src/channels/lark.rs`:
- Around line 561-623: The webhook branch currently builds a ChannelMessage
directly from the extracted text (msg_type / parse_post_content -> text) without
applying the same normalization/gating used by the websocket path; extract and
reuse that normalization (strip `@_user_N` placeholders, trim, drop empty turns,
and suppress unmentioned group-chat traffic) before pushing the message.
Concretely: refactor the websocket normalization into a reusable helper (e.g.,
normalize_message_text or reuse the existing helper if present), call it on the
`text` variable after computing `timestamp` and `chat_id`, and only push
ChannelMessage (id/ sender/ reply_target/ content/ channel/ timestamp/ parts:
vec![ContentPart::Text { text }]) when the helper returns a non-empty/allowed
result so webhook mode follows the same gating as websocket mode. Ensure you
reference parse_post_content, msg_type, ChannelMessage, and ContentPart when
locating the code to change.

---

Duplicate comments:
In `@clients/agent-runtime/src/channels/telegram.rs`:
- Around line 1562-1564: The parse-error log for resp.json() in the getFile flow
currently interpolates the raw reqwest::Error (resp.json().await.map_err(|e| {
tracing::warn!("Telegram getFile response parse error: {e}"); ... })), which can
leak URLs or tokens; replace that interpolation with a non-sensitive
message—e.g. call tracing::warn!("Telegram getFile response parse error
(sanitized)"); without including {e} or any request data, or extract and log
only a safe error code/enum that contains no URL/token info—keeping the map_err
to return media::ImageRejectionReason::FetchFailed.

In `@clients/agent-runtime/src/config/schema.rs`:
- Around line 3128-3135: The match is using mm.vision_model_hint by-value but mm
is a shared borrow so change the match to use a borrowed string (e.g.
mm.vision_model_hint.as_deref()) and bind to Some(h) (a &str) instead of
Some(ref h); also validate h is not empty and bail with the same message if it
is empty or None so that the hint variable becomes a non-empty &str for further
use (referencing mm, vision_model_hint, and the hint binding).

In `@clients/agent-runtime/src/gateway/mod.rs`:
- Around line 2064-2105: The loop records message ids via
state.idempotency_store.record_if_new(&msg.id) before enqueueing, which leaves
reservations even when handle.enqueue(msg) fails and can collide with /webhook
keys; change the logic so you either (a) only call record_if_new after a
successful handle.enqueue, or (b) if you must reserve before enqueue, ensure you
remove/rollback that reservation on enqueue failure by calling a remove/unrecord
method on idempotency_store, and ensure the idempotency entries for WhatsApp are
namespaced (e.g., use a "whatsapp" prefix or dedicated key method) so they do
not collide with /webhook entries; update the code paths around
state.idempotency_store.record_if_new, handle.enqueue, and the final response
decision so failed enqueues do not leave permanent reservations and the 202/200
response reflects only truly accepted messages.

In `@clients/agent-runtime/src/observability/otel.rs`:
- Around line 202-216: Add a unit/integration test that exercises the
ObserverEvent::ImageIngress branch: construct an ObserverEvent::ImageIngress
with a channel, outcome and reason, send it through the same event-handling
entry point that matches on ObserverEvent (the match arm that calls
self.image_ingress.add), and assert the image_ingress metric was updated (via
the test OTLP/metric exporter or the observability struct's test-accessible
state) to prevent regressions in the image_ingress.add path.

In `@clients/agent-runtime/src/observability/prometheus.rs`:
- Around line 191-201: Add a regression test that emits an
ObserverEvent::ImageIngress and verifies the corvus_image_ingress_total metric
is exported by encode(): create a test (e.g., in the same module tests or
prometheus tests) that constructs an ObserverEvent::ImageIngress with specific
channel, outcome and reason values, calls the code path that records the event
(triggering image_ingress.with_label_values(...).inc()), then calls the
prometheus encode() function and asserts the output contains the metric name
corvus_image_ingress_total with the exact label set (channel, outcome, reason)
and an increased count; reference the ObserverEvent::ImageIngress variant, the
image_ingress counter, and the encode() exporter when adding the assertion.

In `@clients/web/apps/docs/src/content/docs/guides/architecture.md`:
- Around line 16-35: The docs tree lists "clients/web/apps/docs/" as a top-level
sibling; update the ASCII tree in the architecture guide so the web node expands
to include an apps/docs subtree (e.g., under the "web/" branch add "apps/" then
"docs/") instead of repeating the full nested path as a root item, and apply the
same change to the Spanish version to keep EN/ES aligned; ensure the updated
diagram preserves the existing nodes (clients/, modules/, gradle/, Makefile,
settings.gradle.kts) and verify the docs file references to
"clients/web/apps/docs/" remain accurate and compliant with the **/*.{md,mdx}
guideline for EN/ES parity.

In `@scripts/validate-docs-metadata.mjs`:
- Around line 127-149: Replace the fragile line-based parsing in
getMetadata/getField with a real YAML parse: have extractFrontmatter return the
raw frontmatter string, import a YAML parser (e.g., js-yaml or yaml), parse the
frontmatter once into an object, and then read description, owner, status,
lastReviewed, appliesTo, docType, and slug from that parsed object (falling back
to empty string or null as before); remove or stop using getField() for value
extraction and ensure getMetadata still returns null when no frontmatter and
returns the normalized fields from the parsed YAML when present.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7c26ff46-e019-491a-b18a-6c44fc9c63eb

📥 Commits

Reviewing files that changed from the base of the PR and between 93be06c and a1098e4.

📒 Files selected for processing (101)
  • .agents/AGENTS.md
  • .github/CODEOWNERS
  • .github/PULL_REQUEST_TEMPLATE/1_fix_issue.yml
  • .github/PULL_REQUEST_TEMPLATE/2_new_feature.yml
  • .github/PULL_REQUEST_TEMPLATE/3_other_type.yml
  • .github/pull_request_template.md
  • .github/workflows/docs-quality.yml
  • clients/agent-runtime/src/agent/agent.rs
  • clients/agent-runtime/src/channels/cli.rs
  • clients/agent-runtime/src/channels/dingtalk.rs
  • clients/agent-runtime/src/channels/discord.rs
  • clients/agent-runtime/src/channels/email_channel.rs
  • clients/agent-runtime/src/channels/imessage.rs
  • clients/agent-runtime/src/channels/irc.rs
  • clients/agent-runtime/src/channels/lark.rs
  • clients/agent-runtime/src/channels/matrix.rs
  • clients/agent-runtime/src/channels/mattermost.rs
  • clients/agent-runtime/src/channels/media.rs
  • clients/agent-runtime/src/channels/mod.rs
  • clients/agent-runtime/src/channels/qq.rs
  • clients/agent-runtime/src/channels/signal.rs
  • clients/agent-runtime/src/channels/slack.rs
  • clients/agent-runtime/src/channels/telegram.rs
  • clients/agent-runtime/src/channels/traits.rs
  • clients/agent-runtime/src/channels/whatsapp.rs
  • clients/agent-runtime/src/config/mod.rs
  • clients/agent-runtime/src/config/schema.rs
  • clients/agent-runtime/src/doctor/mod.rs
  • clients/agent-runtime/src/gateway/mod.rs
  • clients/agent-runtime/src/observability/log.rs
  • clients/agent-runtime/src/observability/mod.rs
  • clients/agent-runtime/src/observability/multi.rs
  • clients/agent-runtime/src/observability/otel.rs
  • clients/agent-runtime/src/observability/prometheus.rs
  • clients/agent-runtime/src/observability/traits.rs
  • clients/agent-runtime/src/onboard/wizard.rs
  • clients/agent-runtime/src/providers/compatible.rs
  • clients/agent-runtime/src/providers/gemini.rs
  • clients/agent-runtime/src/providers/reliable.rs
  • clients/agent-runtime/src/providers/router.rs
  • clients/agent-runtime/src/providers/traits.rs
  • clients/agent-runtime/src/update/mod.rs
  • clients/agent-runtime/tests/admin_config_api_integration.rs
  • clients/web/apps/chat/src/App.spec.ts
  • clients/web/apps/chat/src/App.vue
  • clients/web/apps/docs/README.md
  • clients/web/apps/docs/astro.config.mjs
  • clients/web/apps/docs/package.json
  • clients/web/apps/docs/src/content.config.ts
  • clients/web/apps/docs/src/content/docs/clients/agent-runtime/architecture.md
  • clients/web/apps/docs/src/content/docs/clients/agent-runtime/ci-map.md
  • clients/web/apps/docs/src/content/docs/clients/agent-runtime/index.mdx
  • clients/web/apps/docs/src/content/docs/clients/agent-runtime/pr-workflow.md
  • clients/web/apps/docs/src/content/docs/clients/agent-runtime/providers/index.mdx
  • clients/web/apps/docs/src/content/docs/es/clients/agent-runtime/architecture.md
  • clients/web/apps/docs/src/content/docs/es/clients/agent-runtime/ci-map.md
  • clients/web/apps/docs/src/content/docs/es/clients/agent-runtime/index.mdx
  • clients/web/apps/docs/src/content/docs/es/clients/agent-runtime/pr-workflow.md
  • clients/web/apps/docs/src/content/docs/es/clients/agent-runtime/providers/index.mdx
  • clients/web/apps/docs/src/content/docs/es/guides/architecture.md
  • clients/web/apps/docs/src/content/docs/es/guides/architecture/overview.md
  • clients/web/apps/docs/src/content/docs/es/guides/cerebro/migration.md
  • clients/web/apps/docs/src/content/docs/es/guides/cli-reference.md
  • clients/web/apps/docs/src/content/docs/es/guides/configuration.md
  • clients/web/apps/docs/src/content/docs/es/guides/customization.md
  • clients/web/apps/docs/src/content/docs/es/guides/development.md
  • clients/web/apps/docs/src/content/docs/es/guides/features.md
  • clients/web/apps/docs/src/content/docs/es/guides/getting-started.md
  • clients/web/apps/docs/src/content/docs/es/guides/gpg-setup.md
  • clients/web/apps/docs/src/content/docs/es/guides/hardware-peripherals-design.md
  • clients/web/apps/docs/src/content/docs/es/guides/release.md
  • clients/web/apps/docs/src/content/docs/es/guides/structure.md
  • clients/web/apps/docs/src/content/docs/es/guides/surrealdb.md
  • clients/web/apps/docs/src/content/docs/es/index.mdx
  • clients/web/apps/docs/src/content/docs/es/intro/introduction.mdx
  • clients/web/apps/docs/src/content/docs/guides/architecture.md
  • clients/web/apps/docs/src/content/docs/guides/architecture/overview.md
  • clients/web/apps/docs/src/content/docs/guides/cerebro/migration.md
  • clients/web/apps/docs/src/content/docs/guides/cli-reference.md
  • clients/web/apps/docs/src/content/docs/guides/configuration.md
  • clients/web/apps/docs/src/content/docs/guides/customization.md
  • clients/web/apps/docs/src/content/docs/guides/development.md
  • clients/web/apps/docs/src/content/docs/guides/features.md
  • clients/web/apps/docs/src/content/docs/guides/getting-started.md
  • clients/web/apps/docs/src/content/docs/guides/gpg-setup.md
  • clients/web/apps/docs/src/content/docs/guides/hardware-peripherals-design.md
  • clients/web/apps/docs/src/content/docs/guides/release.md
  • clients/web/apps/docs/src/content/docs/guides/structure.md
  • clients/web/apps/docs/src/content/docs/guides/surrealdb.md
  • clients/web/apps/docs/src/content/docs/index.mdx
  • clients/web/apps/docs/src/content/docs/intro/introduction.mdx
  • openspec/changes/archive/2026-03-26-2026-03-25-multimodal-image-input-mvp/design.md
  • openspec/changes/archive/2026-03-26-2026-03-25-multimodal-image-input-mvp/exploration.md
  • openspec/changes/archive/2026-03-26-2026-03-25-multimodal-image-input-mvp/proposal.md
  • openspec/changes/archive/2026-03-26-2026-03-25-multimodal-image-input-mvp/specs/agent-loop/spec.md
  • openspec/changes/archive/2026-03-26-2026-03-25-multimodal-image-input-mvp/specs/agent-runtime-providers/spec.md
  • openspec/changes/archive/2026-03-26-2026-03-25-multimodal-image-input-mvp/tasks.md
  • openspec/changes/archive/2026-03-26-2026-03-25-multimodal-image-input-mvp/verify-report.md
  • openspec/specs/agent-loop/spec.md
  • openspec/specs/agent-runtime-providers/spec.md
  • scripts/validate-docs-metadata.mjs

Comment thread .agents/AGENTS.md
Comment thread .github/pull_request_template.md
Comment thread clients/agent-runtime/src/channels/discord.rs
Comment thread clients/agent-runtime/src/channels/discord.rs
Comment thread clients/agent-runtime/src/channels/lark.rs
Comment thread openspec/specs/agent-loop/spec.md
Comment thread openspec/specs/agent-runtime-providers/spec.md
Comment thread scripts/validate-docs-metadata.mjs
- Replace ReDoS-vulnerable markdown link regex with backtrack-safe pattern
- Add --ignore-scripts to pnpm install in docs-quality workflow
- Add Debug derive to ResolvedImageRoute for test assertions
- Add coverage tests for image route resolution and config validation
- PATH hotspots reviewed as SAFE: execFileSync bypasses shell in CI
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
scripts/validate-docs-metadata.mjs (1)

65-121: ⚠️ Potential issue | 🟠 Major

The changed-file resolver still skips full validation for non-doc inputs and deletions.

getChangedDocsFiles() still scopes git to clients/web/apps/docs/src/content/docs/** and drops missing paths with existsSync(), so Line 382 can still print “metadata validation skipped” after sidebar/config/validator changes or doc deletions. That misses the full-walk needed to catch orphan and parity fallout.

As per coding guidelines **/*: Look for behavioral regressions, missing tests, and contract breaks across modules.

Also applies to: 382-384

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/validate-docs-metadata.mjs` around lines 65 - 121,
getChangedDocsFiles currently narrows git results to docsRootRelative and
discards missing files via existsSync, which lets config/sidebar/validator
changes or deletions skip the full walk; modify getChangedDocsFiles so after
obtaining the raw git list (variables/locations:
diffRange/output/trackedOutput/untrackedOutput) you first inspect the raw paths
for any of: files outside docsRootRelative, deleted/missing entries (existsSync
false), or repo-level docs config files (e.g., sidebar/config/validator paths);
if any such condition is present, return walk(docsRoot) to force full
validation, otherwise continue with the existing filtering
(.filter(/\.(md|mdx)$/) and mapping to path.resolve). Ensure checks reference
getChangedDocsFiles, docsRoot, docsRootRelative, existsSync, walk, and the raw
git outputs so missing/deleted files no longer cause false "metadata validation
skipped".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@clients/agent-runtime/src/channels/mod.rs`:
- Around line 2158-2229: spawn_runtime_handle currently creates the provider
(via create_routed_provider) but doesn't call provider.warmup(), so the gateway
can suffer cold-start latency; after creating let provider: Arc<dyn Provider> =
Arc::from(...), clone the Arc and kick off an async warmup in the background
(e.g. let p_clone = Arc::clone(&provider); tokio::spawn(async move { let _ =
p_clone.warmup().await; });) before spawning the dispatch loop so TLS/DNS/HTTP2
setup occurs proactively; ensure you handle/ignore the Result from warmup or log
errors as appropriate.

In `@scripts/validate-docs-metadata.mjs`:
- Around line 241-254: parityMetadata is only checked for existence and three
fields, so a locale counterpart that lacks other required frontmatter (e.g.,
owner, description, lastReviewed) can slip through; update the parity validation
in the block that uses getMetadata/parityMetadata to either (A) validate the
full required set on parityMetadata (same checks you run on metadata or call the
existing validateMetadata/validateRequiredFields helper) and push errors to
errors with the same parityRelativePath context, or (B) add parityPath to the
overall target set so it is independently validated; ensure you reference
parityMetadata, metadata, parityPath, parityRelativePath and push descriptive
messages to errors when parity frontmatter is missing or mismatched.
- Around line 333-349: The validator currently allows future lastReviewed dates
because negative ageDays bypasses the freshness check; update the logic around
reviewedAt (derived from lastReviewed) to explicitly reject dates in the future:
after constructing reviewedAt (and validating its parse), check if
reviewedAt.getTime() > Date.now() and push an error to errors (e.g.,
"missing/invalid lastReviewed" style: "relativePath: lastReviewed 'YYYY-MM-DD'
is in the future") and return/skip the staleness computation; modify the block
handling lastReviewed/reviewedAt (search for variables lastReviewed, reviewedAt,
ageDays, maxReviewAgeDaysByDocType, docType) to perform this early future-date
rejection before computing ageDays.

---

Duplicate comments:
In `@scripts/validate-docs-metadata.mjs`:
- Around line 65-121: getChangedDocsFiles currently narrows git results to
docsRootRelative and discards missing files via existsSync, which lets
config/sidebar/validator changes or deletions skip the full walk; modify
getChangedDocsFiles so after obtaining the raw git list (variables/locations:
diffRange/output/trackedOutput/untrackedOutput) you first inspect the raw paths
for any of: files outside docsRootRelative, deleted/missing entries (existsSync
false), or repo-level docs config files (e.g., sidebar/config/validator paths);
if any such condition is present, return walk(docsRoot) to force full
validation, otherwise continue with the existing filtering
(.filter(/\.(md|mdx)$/) and mapping to path.resolve). Ensure checks reference
getChangedDocsFiles, docsRoot, docsRootRelative, existsSync, walk, and the raw
git outputs so missing/deleted files no longer cause false "metadata validation
skipped".
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 30e68c80-7b5a-44ad-87aa-7d46e674daa6

📥 Commits

Reviewing files that changed from the base of the PR and between 755f45a and dd55608.

📒 Files selected for processing (4)
  • clients/agent-runtime/src/channels/media.rs
  • clients/agent-runtime/src/channels/mod.rs
  • clients/web/apps/docs/src/content/docs/es/guides/cerebro/migration.md
  • scripts/validate-docs-metadata.mjs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: sonar
  • GitHub Check: pr-checks
  • GitHub Check: pr-checks
  • GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (6)
**/*

⚙️ CodeRabbit configuration file

**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.

Files:

  • scripts/validate-docs-metadata.mjs
  • clients/web/apps/docs/src/content/docs/es/guides/cerebro/migration.md
  • clients/agent-runtime/src/channels/media.rs
  • clients/agent-runtime/src/channels/mod.rs
**/*.{md,mdx}

⚙️ CodeRabbit configuration file

**/*.{md,mdx}: Verify technical accuracy and that docs stay aligned with code changes.
For user-facing docs, check EN/ES parity or explicitly note pending translation gaps.

Files:

  • clients/web/apps/docs/src/content/docs/es/guides/cerebro/migration.md
clients/agent-runtime/src/channels/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Implement Channel trait in src/channels/ with consistent send, listen, and health_check semantics and cover auth/allowlist/health behavior with tests

Files:

  • clients/agent-runtime/src/channels/media.rs
  • clients/agent-runtime/src/channels/mod.rs
clients/agent-runtime/src/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

clients/agent-runtime/src/**/*.rs: Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements
Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency

Files:

  • clients/agent-runtime/src/channels/media.rs
  • clients/agent-runtime/src/channels/mod.rs
clients/agent-runtime/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Run cargo fmt --all -- --check, cargo clippy --all-targets -- -D warnings, and cargo test for code validation, or document which checks were skipped and why

Files:

  • clients/agent-runtime/src/channels/media.rs
  • clients/agent-runtime/src/channels/mod.rs
**/*.rs

⚙️ CodeRabbit configuration file

**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.

Files:

  • clients/agent-runtime/src/channels/media.rs
  • clients/agent-runtime/src/channels/mod.rs
🧠 Learnings (5)
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/*.rs : Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why

Applied to files:

  • scripts/validate-docs-metadata.mjs
  • clients/agent-runtime/src/channels/media.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/.github/**/*.{yml,yaml} : For workflow/template-only changes, ensure YAML/template syntax validity

Applied to files:

  • scripts/validate-docs-metadata.mjs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests

Applied to files:

  • clients/agent-runtime/src/channels/media.rs
  • clients/agent-runtime/src/channels/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths

Applied to files:

  • clients/agent-runtime/src/channels/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Keep startup path lean and avoid heavy initialization in command parsing flow

Applied to files:

  • clients/agent-runtime/src/channels/mod.rs
🔇 Additional comments (19)
clients/agent-runtime/src/channels/media.rs (7)

102-144: LGTM: Magic-byte sniffing correctly prioritizes content over declared MIME.

The PNG signature now checks all 8 bytes. Declared MIME is intentionally ignored when sniffing fails—correct security posture.


1-8: LGTM: Constants are well-documented and appropriately sized.

10 MiB max and 1 image per turn are reasonable MVP limits.


10-37: LGTM: Clean MIME type enum with proper parsing.

The roundtrip between from_mime_str and as_str is tested and correct.


39-68: LGTM: Rejection reasons cover all failure modes with safe Display output.


77-100: LGTM: StagedImage with safe best-effort cleanup.

The cleanup is idempotent (checks exists() first) and logs only the path, not sensitive data.


146-162: LGTM: Boundary validators are straightforward and correct.


164-363: LGTM: Comprehensive test coverage for all validation paths.

Tests cover MIME sniffing success/failure, size/count boundaries, and cleanup behavior including the idempotent noop case.

clients/agent-runtime/src/channels/mod.rs (11)

237-244: LGTM: Model selection correctly threads vision route into execution path.

The effective_model is now passed through to handle_successful_response and used in enforce_strict_memory_validation, addressing the concern about post-response validation using the correct model.


624-654: LGTM: Memory recall/autosave correctly skipped for blank text.

Image-only turns with empty text_projection() and empty content now skip build_memory_context and autosave, preventing irrelevant memory entries.


774-805: LGTM: Image count validation correctly precedes staging work.

validate_image_count at line 777 runs before stage_channel_images at line 808, preventing unnecessary fetch/write operations for over-limit requests.


853-877: LGTM: Unsupported channel rejection now emits telemetry.

The emit_image_ingress call at lines 856-866 records the rejection before returning, making these cases visible in metrics.


95-123: LGTM: ChannelRuntimeHandle provides clean enqueue interface.

Clone + Send + Sync traits are appropriate for sharing across gateway and channel listeners. The try_send provides backpressure without blocking.


125-136: LGTM: RAII guard ensures temp files are cleaned on all exit paths.


160-191: LGTM: Image route resolution with proper fail-closed semantics.

Empty/missing hints and non-image-capable routes correctly return rejection errors.


213-235: LGTM: Telemetry helper emits only operational metadata, no secrets.


504-534: LGTM: Staged images forwarded to provider without cloning.

The slice reference &[StagedImage] aligns with ChatRequest.images signature for zero-copy forwarding.


1014-1050: LGTM: Channel-specific staging with fail-closed fallback for unknown channels.

Unknown channels return empty Vec, triggering the fail-closed rejection at line 855. The per-channel fetch_and_stage_image methods validate MIME and size before writing.


3860-4693: LGTM: Comprehensive test coverage for multimodal gating and telemetry.

Tests verify:

  • ChannelRuntimeHandle enqueue/backpressure behavior
  • Image count, staging, and config-based rejections
  • Telemetry emission for all rejection paths
  • Model selection for image vs text turns
  • Memory skip for image-only messages
  • RAII cleanup on guard drop
clients/web/apps/docs/src/content/docs/es/guides/cerebro/migration.md (1)

4-9: Metadata and credential guidance now match the runtime contract.

The new slug lines up with the registered sidebar entry, and the switch to CORVUS_CEREBRO_AUTH_TOKEN matches the runtime/config references.

Also applies to: 44-56

Comment thread clients/agent-runtime/src/channels/mod.rs
Comment thread scripts/validate-docs-metadata.mjs
Comment thread scripts/validate-docs-metadata.mjs
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 26, 2026

Caution

Review failed

An error occurred during the review process. Please try again later.

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds canonical multimodal channel contract (ordered ContentPart parts), image staging/validation with RAII cleanup, multimodal config and routing, provider image-capability plumbing and adapters, gateway enqueueing for WhatsApp/Telegram into a shared channel runtime, observability/metrics for image ingress, docs metadata validation, and CI/workflow updates.

Changes

Cohort / File(s) Summary
Core channel contract & media
clients/agent-runtime/src/channels/traits.rs, clients/agent-runtime/src/channels/media.rs, clients/agent-runtime/src/channels/mod.rs
Add ContentPart and ChannelMessage.parts; implement StagedImage, MIME sniffing, size/count limits, StagedImageGuard, ChannelRuntimeHandle, and image-routing/gating in the channel loop.
Channel adapters & ingest
clients/agent-runtime/src/channels/.../*
.../{telegram.rs,whatsapp.rs,lark.rs,discord.rs,qq.rs,cli.rs,dingtalk.rs,email_channel.rs,imessage.rs,irc.rs,matrix.rs,mattermost.rs,signal.rs,slack.rs}
Populate parts across adapters; Telegram/WhatsApp now extract image parts and add fetch_and_stage_image helpers; many listeners updated to construct multimodal ChannelMessage.
Media staging runtime plumbing
clients/agent-runtime/src/channels/mod.rs, clients/agent-runtime/src/channels/media.rs
Introduce staging APIs, pass staged images into channel runtime params, implement admission gating, staging pipeline, ingress events, and enqueue handle lifecycle.
Provider adapters & routing
clients/agent-runtime/src/providers/{traits.rs,compatible.rs,gemini.rs,reliable.rs,router.rs}
Add images: &'a [StagedImage] to ChatRequest; extend ProviderCapabilities (image_input, transport forms); implement multimodal chat for OpenAI-compatible & Gemini; Router/Reliable aggregate capabilities and enforce fail‑closed image routing.
Config, onboarding & gateway
clients/agent-runtime/src/config/schema.rs, clients/agent-runtime/src/config/mod.rs, clients/agent-runtime/src/onboard/wizard.rs, clients/agent-runtime/src/gateway/mod.rs
Add MultimodalConfig, ModelRouteConfig.allow_image_input, validation; onboarding populates defaults; gateway spawns ChannelRuntimeHandle and enqueues verified WhatsApp messages when present.
Observability & metrics
clients/agent-runtime/src/observability/{traits.rs,log.rs,multi.rs,otel.rs,prometheus.rs,mod.rs}
New ImageIngressEvent/Outcome/Reason, observer hook on_image_ingress, logging arm, Prometheus/Otel counters and fan-out.
Tests & callsite updates
clients/agent-runtime/src/{agent/agent.rs,channels/mod.rs,update/mod.rs}, clients/agent-runtime/tests/*, clients/agent-runtime/src/channels/*
Update callsites/tests to pass images: &[]; add tests for staging, gating, routing, staged-image cleanup, and runtime enqueue behavior.
Docs content & validation CI
scripts/validate-docs-metadata.mjs, clients/web/apps/docs/{package.json,README.md,content.config.ts,astro.config.mjs}, clients/web/apps/docs/src/content/docs/**/*
Add docs metadata validator, pnpm script and check integration, extend frontmatter schema and update many docs frontmatter and sidebar entries.
Repo governance & CI templates
.github/{CODEOWNERS,pull_request_template.md,PULL_REQUEST_TEMPLATE/*}, .github/workflows/docs-quality.yml, .github/workflows/sonarqube-analysis.yml
Add CODEOWNERS for docs, require Documentation Impact in PR templates, add Docs Quality workflow and adjust Sonar exclusions.
Design/specs & verification
openspec/changes/archive/..., openspec/specs/...
Add design/proposal/exploration/specs/tasks/verify-report documenting MVP constraints (Telegram+WhatsApp, provider capability declarations, fail‑closed routing, staging rules).
Minor web & tests
clients/web/apps/chat/src/{App.vue,App.spec.ts}
Small test robustness fixes and a lint-ignore directive; no functional frontend changes.

Sequence Diagram(s)

sequenceDiagram
    participant Channel as Channel (Telegram/WhatsApp)
    participant Stager as Image Staging
    participant Router as Route Resolver
    participant Provider as Provider (OpenAI/Gemini)
    participant Runtime as Channel Runtime

    Channel->>Runtime: raw inbound webhook/update
    Runtime->>Stager: extract parts and request staging
    Stager->>Stager: sniff MIME, enforce size/count, write temp file
    alt staging rejected
        Stager-->>Runtime: ImageIngressEvent(Rejected, reason)
        Runtime->>Channel: send rejection reply (fail-closed)
    else staging admitted
        Stager->>Router: resolve vision_model_hint → provider+model
        alt no eligible provider
            Router-->>Runtime: ImageIngressEvent(Rejected, NoProvider)
            Runtime->>Channel: send unavailable/unable-to-handle
        else eligible provider
            Runtime->>Provider: forward staged images + text (images inline/transport)
            Provider-->>Runtime: chat response / tool calls
            Runtime->>Channel: deliver response
            Runtime-->>Stager: cleanup staged temp files
        end
    end
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related issues

Possibly related PRs

Suggested labels

area:vue

Suggested reviewers

  • yuniel-acosta
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/multimodal-image-input-mvp

- Add background provider warmup in spawn_runtime_handle to avoid cold-start latency
- Validate required fields (description, owner, lastReviewed) on locale parity counterparts
- Reject future lastReviewed dates in docs metadata validator
- Fall back to full walk when config/sidebar/non-docs files change in diff
@sonarqubecloud
Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
1 Security Hotspot

See analysis details on SonarQube Cloud

@yacosta738 yacosta738 merged commit b0aeb20 into main Mar 26, 2026
17 of 19 checks passed
@yacosta738 yacosta738 deleted the feat/multimodal-image-input-mvp branch March 26, 2026 13:38
@yacosta738 yacosta738 mentioned this pull request Mar 26, 2026
This was referenced Apr 11, 2026
@dallay-bot dallay-bot Bot mentioned this pull request May 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement multimodal image input MVP Define multimodal image input support for selected channels and providers

1 participant