feat(runtime): add audio input support with local transcription for Telegram by yacosta738 · Pull Request #413 · dallay/corvus

yacosta738 · 2026-04-03T17:58:10Z

Add audio-to-text input capability so agents can receive voice notes and audio files (OGG/Opus, MP3, WAV, M4A) via Telegram, transcribe them locally using whisper.cpp, and feed the transcription into the normal agent conversation flow.

Key changes:

ContentPart::Audio variant for multimodal message parsing
Transcriber trait as new runtime extension point for STT engines
WhisperCliTranscriber wrapping whisper.cpp CLI with concurrency guard
Audio media module: MIME sniffing, size/duration validation, staging
7-step pipeline: parse → gate → fetch → validate → stage → transcribe → inject
[audio] TOML config section (disabled by default, fail-closed)
AudioIngressEvent observability for all admission/rejection paths
StagedAudioGuard RAII cleanup on all exit paths
Doctor health checks for whisper binary and model availability
Zero new Rust crate dependencies

Privacy: all transcription is local (NFR1); no audio data leaves the operator's infrastructure.

Closes #246

linear · 2026-04-03T17:58:14Z

DALLAY-150 Add audio input support for agents (Telegram, HTTP Gateway, CLI)

coderabbitai · 2026-04-03T17:58:19Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

📝 Walkthrough

This PR implements Phase 1 audio input support for agents, introducing local audio ingestion via Telegram, validation through MIME sniffing and size/duration limits, transcription using a whisper.cpp CLI wrapper, and injection into the existing agent conversation pipeline with observability instrumentation and configuration validation.

Changes

Cohort / File(s)	Summary
Audio Core Media & Validation `clients/agent-runtime/src/channels/audio_media.rs`	New module defining audio MIME enum with parsing/canonical serialization, magic-byte MIME sniffing (with precedence over declared MIME), rejection reason error types, size/duration validators, and `StagedAudio` container with SHA-256, sniffed MIME, and RAII cleanup; includes `AudioHistoryMeta` for conversation context serialization with truncation/formatting.
Message Type Extensions `clients/agent-runtime/src/channels/traits.rs`	Added `ContentPart::Audio` variant with channel metadata, optional MIME/caption/filename/bytes/duration; added helpers `has_audio_parts()` and `audio_parts()` to `ChannelMessage`; updated `text_projection()` to include non-empty audio captions.
Channel Message Processing `clients/agent-runtime/src/channels/mod.rs`	Integrated audio pipeline into `process_channel_message()` before memory enrichment: gating on config/transcriber availability, staging audio (Telegram-only), MIME/size/duration validation, transcription with `Transcriber` trait, and injection of transcription as text; added observability event emission and RAII cleanup guard; extended `handle_successful_response` to store audio metadata via `ChatMessage::user_with_audio()`/`user_with_media()`; updated `build_history()` to include audio context.
Telegram Audio Integration `clients/agent-runtime/src/channels/telegram.rs`	Extended parsing to recognize `voice` and `audio` message fields, emit `ContentPart::Audio`; added `TelegramChannel::fetch_and_stage_audio()` method performing pre-flight duration/size validation, HTTP download with streaming size enforcement, MIME sniffing, SHA-256 computation, and atomic temp-file staging; updated error-handling control flow for unauthorized audio messages.
Transcription Abstraction `clients/agent-runtime/src/transcription/traits.rs`, `clients/agent-runtime/src/transcription/whisper_cli.rs`	New `Transcriber` trait with async `transcribe()` and `health_check()` methods; `WhisperCliTranscriber` implementation using subprocess spawning with configurable binary/model/language, semaphore-based concurrency limiting, per-call timeout, stderr inspection for error classification (`Corrupted` vs `TranscriptionFailed`), and `[BLANK_AUDIO]` marker filtering; includes `resolve_model_path()` for model file discovery.
Audio Configuration & Validation `clients/agent-runtime/src/config/schema.rs`, `clients/agent-runtime/src/config/mod.rs`	New `AudioConfig` struct with enable flag, allowed-channel allowlist, size/duration ceilings, transcription model/language, whisper binary path, concurrency, and timeout; wired into `Config` with serde defaults; validation enforces nonzero bounds and phase-1-channel warnings when enabled; re-exported in public API.
Observability Audio Events `clients/agent-runtime/src/observability/traits.rs`, `clients/agent-runtime/src/observability/mod.rs`, `clients/agent-runtime/src/observability/log.rs`, `clients/agent-runtime/src/observability/otel.rs`, `clients/agent-runtime/src/observability/prometheus.rs`	Added `AudioIngressOutcome` (Admitted/Rejected), `AudioIngressReason` enum (11 variants with snake_case `Display`), and `AudioIngressEvent` struct; extended `ObserverEvent` with `AudioIngress` variant; implemented `on_audio_ingress()` trait method with default forward to `record_event()`; added logging/OTEL/Prometheus metrics with `channel`, `outcome`, `reason` labels.
Provider Message & History Integration `clients/agent-runtime/src/providers/traits.rs`	Added `audio_metadata: Option<Vec<AudioHistoryMeta>>` field to `ChatMessage` (with Serde `default` and `skip_serializing_if`); added constructors `user_with_audio()` and `user_with_media()` that conditionally set metadata when vectors are non-empty; updated existing constructors to initialize `audio_metadata: None`.
Provider Test Fixtures `clients/agent-runtime/src/providers/anthropic.rs`, `compatible.rs`, `copilot.rs`, `openrouter.rs`, `router.rs`	Updated `ChatMessage` test constructions across unit/integration tests to include `audio_metadata: None` field, aligning fixture initialization with updated struct shape.
Test & Utility Updates `clients/agent-runtime/src/channels/discord.rs`, `clients/agent-runtime/src/channels/whatsapp.rs`	Generalized panic-match arms from specific `ContentPart` variants to wildcard `_` patterns, improving test robustness to new variants without behavior change.
Runtime Startup & Diagnostics `clients/agent-runtime/src/main.rs`, `clients/agent-runtime/src/lib.rs`, `clients/agent-runtime/src/doctor/mod.rs`	Added `transcription` module declaration; extended `run(config)` in doctor to call `check_audio_health()`, which verifies whisper binary with `--help` and resolves transcription model file presence when audio is enabled; logs skipped checks when disabled.
Configuration Wizard `clients/agent-runtime/src/onboard/wizard.rs`	Updated `run_wizard()` and `run_quick_setup()` to initialize `audio: AudioConfig::default()` in `Config` construction.
Design & Specification Documentation `openspec/changes/archive/2026-04-03-audio-input-support/*`, `openspec/specs/audio-input/spec.md`	Added comprehensive design, exploration, proposal, specification, task plan, verification and archive reports documenting Phase 1 audio ingestion, transcription, integration points, error taxonomy, privacy/concurrency constraints, and compliance matrix.

Sequence Diagram(s)

sequenceDiagram
    participant User as Telegram User
    participant Telegram as Telegram Channel
    participant Stage as Staging (Temp File)
    participant Transcriber as Whisper CLI
    participant Provider as LLM Provider
    participant Agent as Agent Loop

    User->>Telegram: Send voice/audio message
    Telegram->>Telegram: Parse ContentPart::Audio
    Telegram->>Telegram: Gate on config/transcriber
    Telegram->>Telegram: Fetch file from Telegram API
    Telegram->>Stage: Validate MIME (magic bytes)<br/>Check size/duration<br/>Write temp file with SHA-256
    
    rect rgba(100, 150, 200, 0.5)
    Note over Stage,Transcriber: Audio Pipeline
    Stage->>Transcriber: Pass StagedAudio
    Transcriber->>Transcriber: Acquire semaphore permit<br/>(concurrency control)
    Transcriber->>Transcriber: Spawn whisper subprocess<br/>with model/timeout
    Transcriber->>Transcriber: Parse stdout<br/>Filter [BLANK_AUDIO]<br/>Guard empty transcription
    Transcriber-->>Stage: Return TranscriptionResult
    end
    
    Stage->>Telegram: Emit AudioIngressEvent<br/>(Admitted/Rejected)
    Stage->>Stage: RAII cleanup temp file
    
    Telegram->>Telegram: Inject transcription<br/>Replace Audio with Text
    Telegram->>Telegram: Build AudioHistoryMeta<br/>for conversation context
    Telegram->>Provider: Send ChatMessage<br/>with audio_metadata
    Provider->>Agent: Include audio history context
    Agent->>Provider: Generate response
    Provider-->>Telegram: Response text
    Telegram-->>User: Send text response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Rationale: Heterogeneous changes spanning new abstraction layers (Transcriber trait, audio media module with MIME sniffing), deep integration into the message processing pipeline with careful ordering (gating before memory enrichment), RAII cleanup guarantees, subprocess concurrency control with semaphores, Telegram-specific fetch/stage implementation, and widespread observability instrumentation. Requires careful validation of error propagation paths, transcription rejection handling, cleanup side effects, and concurrency correctness.

Possibly related PRs

feat: runtime image normalization pipeline with history support (#267) #333: Implements parallel media pipelines modifying overlapping runtime surfaces (build_history/handle_successful_response, ChannelMessage/ChatMessage history metadata, staging patterns), closely related to audio metadata integration.
feat(agent-runtime): add auth and provider runtime upgrades #29: Modifies channels/mod.rs and ChannelRuntimeContext with transcriber field initialization and pipeline extension, directly related to audio runtime context setup.
feat: multimodal image input mvp #324: Implements multimodal ingress with similar ContentPart extensions, staging/RAII patterns, observability trait changes, and config/doctor validation structure.

Suggested reviewers

yuniel-acosta

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The title uses conventional commit format (feat prefix) and clearly summarizes the main change: adding audio input support with local transcription for Telegram. However, it exceeds the 72-character limit at 76 characters.	Reduce title to 72 characters or fewer, e.g., 'feat(runtime): add audio input support for Telegram' (51 chars) or similar.
Description check	❓ Inconclusive	The PR description covers key changes, motivation, and links the related issue (`#246`). However, it does not follow the provided template structure (missing discrete sections for Related Issues, Tested Information, Documentation Impact, Breaking Changes, and Checklist).	Restructure the description to match the template: add discrete sections for Related Issues, Tested Information, Documentation Impact, Breaking Changes, and a completed Checklist.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	The PR implements all primary coding requirements from issue `#246`: audio parsing (ContentPart::Audio), local transcription (Transcriber trait + WhisperCliTranscriber), MIME/size/duration validation, staging with RAII cleanup, injection into the message flow, observability (AudioIngressEvent), error taxonomy, and privacy constraints (local-only processing).
Out of Scope Changes check	✅ Passed	All changes are scoped to issue `#246` objectives: audio pipeline (parse/gate/fetch/validate/stage/transcribe/inject), configuration, observability, Telegram integration, and doctor health checks. No unrelated refactoring, feature creep, or tangential changes detected.
Docstring Coverage	✅ Passed	Docstring coverage is 92.45% which is sufficient. The required threshold is 80.00%.

github-actions · 2026-04-03T17:59:47Z

✅ Contributor Report

User: @yacosta738
Status: Passed (12/13 metrics passed)

Metric	Description	Value	Threshold	Status
PR Merge Rate	PRs merged vs closed	89%	>= 30%	✅
Repo Quality	Repos with ≥100 stars	0	>= 0	✅
Positive Reactions	Positive reactions received	10	>= 1	✅
Negative Reactions	Negative reactions received	0	<= 5	✅
Account Age	GitHub account age	3080 days	>= 30 days	✅
Activity Consistency	Regular activity over time	108%	>= 0%	✅
Issue Engagement	Issues with community engagement	0	>= 0	✅
Code Reviews	Code reviews given to others	510	>= 0	✅
Merger Diversity	Unique maintainers who merged PRs	2	>= 0	✅
Repo History Merge Rate	Merge rate in this repo	91%	>= 0%	✅
Repo History Min PRs	Previous PRs in this repo	195	>= 0	✅
Profile Completeness	Profile richness (bio, followers)	90	>= 0	✅
Suspicious Patterns	Spam-like activity detection	1	N/A	❌

Contributor Report evaluates based on public GitHub activity. Analysis period: 2025-04-04 to 2026-04-04

Add unit tests for audio rejection user messages, ingress reason mapping, config validation, Telegram voice/audio JSON parsing, and pipeline integration to reach ≥80% coverage on new code.

cloudflare-workers-and-pages · 2026-04-03T18:59:09Z

Deploying corvus with Cloudflare Pages

Latest commit:	`b4f304f`
Status:	✅ Deploy successful!
Preview URL:	https://c785439c.corvus-42x.pages.dev
Branch Preview URL:	https://feature-dallay-150-add-audio.corvus-42x.pages.dev

coderabbitai

Actionable comments posted: 27

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

clients/agent-runtime/src/channels/mod.rs (2)
619-688: ⚠️ Potential issue | 🟠 Major

Put the audio stages under the per-turn timeout.

This block now does fetch/stage/transcribe work before the only timeout in the handler. A slow Telegram download or wedged whisper process can hold a worker past CHANNEL_MESSAGE_TIMEOUT_SECS and never hit the timeout reply path. Wrap the whole turn, or pass a remaining budget into the audio stages.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/channels/mod.rs` around lines 619 - 688,
process_channel_message currently performs the audio pipeline
(gate_audio_config, gate_and_stage_audio, transcribe_audio,
inject_transcription) before the per-turn timeout, allowing slow
downloads/transcription to exceed CHANNEL_MESSAGE_TIMEOUT_SECS; move the entire
audio stages under the per-turn timeout boundary (or compute remaining_budget
and pass it into gate_audio_config/gate_and_stage_audio/transcribe_audio) so
that these calls are canceled when the channel turn times out, and ensure any
temp resources from audio_guard are still cleaned up on timeout.
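
A minimal std-only sketch of the "remaining budget" option mentioned above. `TurnBudget` and its methods are hypothetical names, not part of the runtime; the real handler is async and would presumably feed the remaining duration into its own timeout primitive for each audio stage.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: tracks how much of the per-turn deadline is left.
struct TurnBudget {
    started: Instant,
    total: Duration,
}

impl TurnBudget {
    fn new(total: Duration) -> Self {
        Self { started: Instant::now(), total }
    }

    /// Time left at `now`, or `None` once the turn deadline has passed.
    fn remaining_at(&self, now: Instant) -> Option<Duration> {
        let elapsed = now.saturating_duration_since(self.started);
        self.total.checked_sub(elapsed).filter(|left| !left.is_zero())
    }

    /// Time left right now; each stage would check this before starting.
    fn remaining(&self) -> Option<Duration> {
        self.remaining_at(Instant::now())
    }
}
```

Under this sketch, fetch, stage, and transcribe would each ask for `remaining()` and bail out to the timeout reply path on `None`, so no single stage can outlive the turn.
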
2705-2721: ⚠️ Potential issue | 🔴 Critical

Instantiate the transcriber in both runtime constructors.

gate_audio_config() rejects whenever ctx.transcriber is empty, but both production ChannelRuntimeContext builders still hard-code transcriber: None. With [audio] enabled, every audio turn will fail as TranscriberUnavailable, so the new feature never actually admits audio in either start_channels() or spawn_runtime_handle().

Also applies to: 2784-2800
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/channels/mod.rs` around lines 2705 - 2721, The
ChannelRuntimeContext currently sets transcriber: None which causes
gate_audio_config() to reject audio; fix by constructing the transcriber before
building runtime_ctx and passing it into ChannelRuntimeContext as transcriber:
Some(...) instead of None. Locate where runtime_ctx is created (the
ChannelRuntimeContext instantiation in start_channels() and the analogous one in
spawn_runtime_handle()) and call the existing audio/transcriber factory (e.g.,
the module/function used to create transcribers in this crate—invoke it with the
runtime/config) to produce a transcriber instance, then set transcriber:
Some(transcriber) in both constructors so gate_audio_config() sees a transcriber
present. Ensure any errors from creating the transcriber are handled/propagated
consistent with existing error handling.
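
To illustrate the wiring this comment asks for, here is a simplified sketch in which both context builders go through one factory. The trait is a synchronous stand-in for the runtime's async `Transcriber`, and `build_transcriber`, `context_for`, and the `enabled` field name are assumptions made for the example.

```rust
use std::sync::Arc;

/// Simplified, synchronous stand-in for the runtime's `Transcriber` trait.
trait Transcriber: Send + Sync {
    fn name(&self) -> &str;
}

struct WhisperCli {
    binary: String,
}

impl Transcriber for WhisperCli {
    fn name(&self) -> &str {
        &self.binary
    }
}

/// Hypothetical subset of the `[audio]` config.
struct AudioConfig {
    enabled: bool,
    whisper_binary: String,
}

/// Single factory so every context constructor gets the same wiring.
fn build_transcriber(cfg: &AudioConfig) -> Option<Arc<dyn Transcriber>> {
    if !cfg.enabled {
        return None; // audio disabled: the gate is expected to reject
    }
    Some(Arc::new(WhisperCli { binary: cfg.whisper_binary.clone() }))
}

struct ChannelRuntimeContext {
    transcriber: Option<Arc<dyn Transcriber>>,
}

/// Both production builders would call this instead of hard-coding `None`.
fn context_for(cfg: &AudioConfig) -> ChannelRuntimeContext {
    ChannelRuntimeContext { transcriber: build_transcriber(cfg) }
}
```
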

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@clients/agent-runtime/src/channels/audio_media.rs`:
- Around line 118-124: The MP3 magic-byte check in audio_media.rs is too strict;
update the second-byte test to accept any value with the top three bits set (the
MPEG sync continuation) instead of only 0xFB/0xF3/0xF2. Replace the explicit
equality checks on sniffed_bytes[1] with a mask test like (sniffed_bytes[1] &
0xE0) == 0xE0 in the same if that returns AllowedAudioMime::Mp3 so valid MP3
frame headers aren’t rejected.
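
The mask test suggested above can be sketched as a small standalone function. The function name is hypothetical; only the `0xFF` first byte and the `(byte & 0xE0) == 0xE0` check come from the comment.

```rust
/// Returns true when the first two bytes carry the 11-bit MPEG audio
/// frame sync: 0xFF followed by a byte whose top three bits are set.
fn looks_like_mp3_frame_sync(sniffed_bytes: &[u8]) -> bool {
    sniffed_bytes.len() >= 2
        && sniffed_bytes[0] == 0xFF
        && (sniffed_bytes[1] & 0xE0) == 0xE0
}
```

One caveat worth checking against the real sniffing order: the mask alone also accepts second bytes whose layer bits are `00`, which is a reserved value in MPEG audio headers, so a stricter variant might inspect those bits as well.
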

In `@clients/agent-runtime/src/channels/discord.rs`:
- Line 955: The panic message in the wildcard match arm (currently `_ =>
panic!("expected Image, got Text")`) is inaccurate; change the arm to bind the
unmatched value (e.g., `_` -> `other`) and update the panic to either a generic
message like "expected Image, got non-Image variant" or include the actual
variant via formatting (e.g., panic!("expected Image, got {:?}", other)) so the
failure text matches the wildcard behavior.
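
A sketch of the binding form suggested above, using a cut-down `ContentPart` whose fields are invented for the example. The point is only that the catch-all arm names the value it received.

```rust
/// Cut-down stand-in for the runtime's `ContentPart`; fields are illustrative.
#[derive(Debug)]
enum ContentPart {
    Text(String),
    Image { url: String },
    Audio { duration_secs: u32 },
}

/// Test-style helper: unwrap an image part or fail with an accurate message.
fn expect_image_url(part: ContentPart) -> String {
    match part {
        ContentPart::Image { url } => url,
        // Bind the unmatched value so the panic reports what actually arrived.
        other => panic!("expected Image, got {other:?}"),
    }
}
```
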

In `@clients/agent-runtime/src/channels/mod.rs`:
- Around line 1271-1283: The code currently maps the "more than one audio part"
validation to audio_media::AudioRejectionReason::SystemError; update the
reject_audio_turn call in the audio_parts.len() > 1 branch to use a specific
rejection reason (e.g.
audio_media::AudioRejectionReason::TooManyAudioAttachments or a clearly named
variant like MultipleAudioAttachments). If that enum variant does not exist, add
it to audio_media::AudioRejectionReason and use it here so the rejection path in
reject_audio_turn, telemetry, and any user-facing messages can distinguish
validation failures from internal system errors. Ensure you reference the
audio_parts check, the reject_audio_turn call, and replace the SystemError enum
usage with the new specific variant.
- Around line 670-681: The emitted transcription latency is using
tx.duration_secs (clip length) instead of the actual processing time; update the
code that calls emit_audio_ingress in the loop over audio_guard and
transcriptions to use the TranscriptionResult.processing_ms (preserved from
transcribe_audio()) converted to ms instead of
tx.duration_secs.map(duration_f64_to_ms); ensure TranscriptionResult returned by
transcribe_audio() includes processing_ms and that the other similar
emit_audio_ingress usage (the block around lines 1376-1405) is updated the same
way so both places report real transcription latency.
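
For the latency point above, a minimal sketch of measuring processing time separately from clip length. `transcribe_timed` and `reported_latency_ms` are hypothetical helpers, and the struct is a simplified guess at `TranscriptionResult`; only the `duration_secs` and `processing_ms` names come from the comment.

```rust
use std::time::Instant;

/// Simplified stand-in for the runtime's `TranscriptionResult`.
struct TranscriptionResult {
    text: String,
    /// Length of the audio clip, as reported by the channel.
    duration_secs: Option<f64>,
    /// Wall-clock time spent transcribing.
    processing_ms: u64,
}

/// Runs `run` and records how long it took, independent of clip length.
fn transcribe_timed<F>(clip_duration_secs: Option<f64>, run: F) -> TranscriptionResult
where
    F: FnOnce() -> String,
{
    let started = Instant::now();
    let text = run();
    let processing_ms = started.elapsed().as_millis() as u64;
    TranscriptionResult { text, duration_secs: clip_duration_secs, processing_ms }
}

/// The value a latency metric should carry: processing time, not clip length.
fn reported_latency_ms(tx: &TranscriptionResult) -> u64 {
    tx.processing_ms
}
```
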

In `@clients/agent-runtime/src/channels/telegram.rs`:
- Around line 1824-1833: The current staging uses a predictable temp path built
from sha256 (temp_path) and writes via tokio::fs::write, which risks races and
symlink/clobber attacks; change the logic in the block that constructs temp_path
and calls tokio::fs::write (and returns
audio_media::AudioRejectionReason::FetchFailed on error) to create a secure,
unique temp file (use std::fs::OpenOptions with create_new or a NamedTempFile
equivalent), write the bytes via the file handle rather than atomically
overwriting a predictable path, and then use/rename that file to the final name
or return its path; retain the same error mapping but ensure failures are logged
with the file handle/path for debugging.
- Around line 63-107: The audio/voice parsing now creates ContentPart::Audio
even when derive_text_projection() returns None, which means
handle_unauthorized_message() bails out without sending an approval prompt;
update the unauthorized-path logic (where derive_text_projection(),
handle_unauthorized_message(), and send_unauthorized_notification() are used) to
detect media-only updates (parts contains audio but text projection is None) and
call send_unauthorized_notification() so unapproved senders still get the
notification; apply the same fix for the other audio-handling blocks (the voice
and audio branches that push ContentPart::Audio) and add a regression test that
posts an unauthorized audio-only update and asserts
send_unauthorized_notification() (or the channel’s outbound notification) was
invoked.
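
For the staging comment above (predictable temp path), a std-only sketch of writing through a `create_new` handle. `stage_audio_bytes` and the naming scheme are assumptions; a production version would more likely use a randomized name or a dedicated temp-file crate, and the real code is async.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

static STAGE_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Writes `bytes` to a fresh file under `dir`, failing rather than clobbering.
fn stage_audio_bytes(dir: &Path, bytes: &[u8]) -> io::Result<PathBuf> {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let seq = STAGE_COUNTER.fetch_add(1, Ordering::Relaxed);
    // Name no longer depends on the content hash alone.
    let name = format!("audio-{}-{}-{}.tmp", std::process::id(), nanos, seq);
    let path = dir.join(name);

    // `create_new` fails with AlreadyExists if anything is already at `path`,
    // so an existing file or symlink is never overwritten.
    let mut file = OpenOptions::new().write(true).create_new(true).open(&path)?;
    file.write_all(bytes)?;
    file.sync_all()?;
    Ok(path)
}
```
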

In `@clients/agent-runtime/src/config/schema.rs`:
- Around line 304-308: Replace the duplicated hard-coded constants in schema.rs
with the shared definitions from the audio_media module: remove the local
MAX_AUDIO_BYTES_CEILING and MAX_AUDIO_DURATION_SECS_CEILING declarations and
import the constants from channels::audio_media (e.g. use
crate::channels::audio_media::{MAX_AUDIO_BYTES_CEILING,
MAX_AUDIO_DURATION_SECS_CEILING};), so startup validation uses the same values
as runtime media validation (refer to the constants named
MAX_AUDIO_BYTES_CEILING and MAX_AUDIO_DURATION_SECS_CEILING).
- Around line 336-341: Add validation to reject zero for the transcription
controls: ensure max_concurrent_transcriptions and transcription_timeout_secs
are > 0 during config validation (e.g., in the Config/Schema validation method
in clients/agent-runtime/src/config/schema.rs). If either field equals 0, return
a clear startup error (with context naming max_concurrent_transcriptions or
transcription_timeout_secs) rather than allowing runtime operation; use the
existing validation/error pattern used elsewhere in the file (and reference
default_max_concurrent_transcriptions and default_transcription_timeout_secs
when documenting/recovering).
- Around line 313-342: The AudioConfig struct currently allows
unknown/misspelled TOML keys which silently fall back to defaults; update the
AudioConfig definition to add the serde attribute #[serde(deny_unknown_fields)]
so deserialization fails on unknown fields (matching the parent Config behavior)
— locate the AudioConfig struct in schema.rs and add that attribute above its
#[derive(...)] line.
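
The non-zero check requested for the transcription controls could look roughly like this. The struct is a two-field subset of `AudioConfig`, and the function name and error wording are illustrative rather than taken from the codebase.

```rust
/// Hypothetical subset of `AudioConfig`; field names follow the review comment.
struct AudioConfig {
    max_concurrent_transcriptions: usize,
    transcription_timeout_secs: u64,
}

/// Rejects zero values at startup instead of letting them reach the runtime.
fn validate_transcription_controls(cfg: &AudioConfig) -> Result<(), String> {
    if cfg.max_concurrent_transcriptions == 0 {
        return Err("audio.max_concurrent_transcriptions must be greater than 0".to_string());
    }
    if cfg.transcription_timeout_secs == 0 {
        return Err("audio.transcription_timeout_secs must be greater than 0".to_string());
    }
    Ok(())
}
```

Failing at startup matters here because a zero-permit semaphore would presumably never admit a transcription, and a zero timeout would expire every call immediately.
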

In `@clients/agent-runtime/src/doctor/mod.rs`:
- Around line 944-971: The test audio_health_pass_model_exists duplicates the
production model-check logic instead of exercising check_audio_health(), so
replace the inline existence checks with a call to check_audio_health() (or the
specific helper it uses) to ensure the real path is tested; set up the TempDir
and model file as before, then call check_audio_health() (or the exported
function that returns Vec<DiagItem>), and assert on the returned items' length,
Severity::Ok and message contains "found" to validate the real logic (reference
test name audio_health_pass_model_exists, function check_audio_health, types
DiagItem and Severity).
- Around line 659-677: The check currently treats any existing filesystem entry
as a valid whisper model; change the logic to verify the resolved model_path is
a regular file (e.g., use model_path.is_file() or metadata().is_file()) before
pushing DiagItem::ok for the transcription model (ac.transcription_model); if
not a file, push DiagItem::error with the same contextual message referencing
model_path.display() so directories don't produce false-positive doctor results.
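
A small sketch of the regular-file check described above. `DiagItem` and `Severity` are reduced to the minimum needed here, and `check_model_path` is a hypothetical helper rather than the doctor module's actual function.

```rust
use std::path::Path;

#[derive(Debug, PartialEq)]
enum Severity {
    Ok,
    Error,
}

/// Reduced stand-in for the doctor module's diagnostic item.
struct DiagItem {
    severity: Severity,
    message: String,
}

/// Accepts only regular files; directories and missing paths are errors.
fn check_model_path(model_path: &Path) -> DiagItem {
    if model_path.is_file() {
        DiagItem {
            severity: Severity::Ok,
            message: format!("transcription model found: {}", model_path.display()),
        }
    } else {
        DiagItem {
            severity: Severity::Error,
            message: format!(
                "transcription model missing or not a regular file: {}",
                model_path.display()
            ),
        }
    }
}
```
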

In `@clients/agent-runtime/src/observability/otel.rs`:
- Around line 201-202: ObserverEvent::AudioIngress is currently ignored in the
OTEL backend (the match arm with ObserverEvent::AudioIngress(_) is a no-op), so
audio admit/reject telemetry and reasons are not recorded; update the OTEL
handler in otel.rs to record a metric and associated attributes for audio
ingress events instead of silently dropping them—extract the admit/reject status
and reason from the AudioIngress payload and use the existing OTEL metric
recorder (same subsystem used for other ObserverEvent arms) to emit a counter or
histogram and set attributes like "audio.admit" (bool/string) and "audio.reason"
(string) so OTEL deployments capture admit/reject counts and reasons.

In `@clients/agent-runtime/src/observability/prometheus.rs`:
- Around line 190-191: The match arm currently ignores
ObserverEvent::AudioIngress which prevents audio admit/reject/failure metrics
from being exposed; update the handler (the match over ObserverEvent in
prometheus.rs) to process ObserverEvent::AudioIngress instead of discarding it,
and map its inner variants to the same Prometheus counters used for other
ingress types (increment the appropriate admit/reject/failure metrics and set
labels/timestamps as done for the existing ingress events), using the
ObserverEvent::AudioIngress symbol to locate the code and mirror the logic used
for the other ingress-related arms.
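
To make the intended metric shape concrete, here is a std-only stand-in for a labelled counter keyed by `channel`, `outcome`, and `reason`. It is not the Prometheus or OTEL client API; the event fields and the example reason strings are assumptions.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum AudioIngressOutcome {
    Admitted,
    Rejected,
}

/// Simplified event; the real `AudioIngressEvent` likely carries more fields.
struct AudioIngressEvent {
    channel: String,
    outcome: AudioIngressOutcome,
    reason: String,
}

/// In-memory stand-in for a counter with (channel, outcome, reason) labels.
#[derive(Default)]
struct AudioIngressCounters {
    counts: HashMap<(String, AudioIngressOutcome, String), u64>,
}

impl AudioIngressCounters {
    /// What the observer's `AudioIngress` arm would do instead of a no-op.
    fn record(&mut self, ev: &AudioIngressEvent) {
        let key = (ev.channel.clone(), ev.outcome, ev.reason.clone());
        *self.counts.entry(key).or_insert(0) += 1;
    }

    fn get(&self, channel: &str, outcome: AudioIngressOutcome, reason: &str) -> u64 {
        let key = (channel.to_string(), outcome, reason.to_string());
        self.counts.get(&key).copied().unwrap_or(0)
    }
}
```
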

In `@clients/agent-runtime/src/providers/traits.rs`:
- Around line 54-64: Add symmetric serde tests for the new audio_metadata field
matching the existing image_metadata tests: write a missing-field
deserialization test that deserializes JSON lacking "audio_metadata" into the
same struct used in traits.rs (exercise user_with_audio / the message struct)
and asserts audio_metadata == None, and write a skip-serialize-none test that
constructs the struct with audio_metadata = None, serializes it to JSON, and
asserts the "audio_metadata" key is not present; place these tests alongside the
existing image_metadata serde tests so they run in the same test module.
- Around line 54-64: The PR currently builds mixed-media turns by calling
user_with_audio(...) and then mutating image_metadata later, which creates
partial-state; add a single constructor (e.g., user_with_media) that accepts
content plus both image_metadata and audio_metadata (or Option-wrapped Vecs) and
returns Self with role, content, image_metadata and audio_metadata set
atomically; update callers that currently call user_with_audio and then set
image_metadata (the code mutating image_metadata) to call user_with_media
instead; optionally keep thin helpers user_with_audio and user_with_image that
forward to user_with_media to preserve existing call-sites.
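
A sketch of the single-constructor approach suggested above. The metadata structs are placeholders (`ImageHistoryMeta` is an assumed name, and the `sha256` fields are illustrative); the non-empty rule mirrors the walkthrough's description of the constructors.

```rust
#[derive(Debug, Clone, PartialEq)]
struct ImageHistoryMeta {
    sha256: String,
}

#[derive(Debug, Clone, PartialEq)]
struct AudioHistoryMeta {
    sha256: String,
}

#[derive(Debug, PartialEq)]
struct ChatMessage {
    role: String,
    content: String,
    image_metadata: Option<Vec<ImageHistoryMeta>>,
    audio_metadata: Option<Vec<AudioHistoryMeta>>,
}

impl ChatMessage {
    /// Sets both metadata fields in one step; empty vectors become `None`.
    fn user_with_media(
        content: impl Into<String>,
        images: Vec<ImageHistoryMeta>,
        audio: Vec<AudioHistoryMeta>,
    ) -> Self {
        Self {
            role: "user".to_string(),
            content: content.into(),
            image_metadata: (!images.is_empty()).then_some(images),
            audio_metadata: (!audio.is_empty()).then_some(audio),
        }
    }

    /// Thin helper kept for existing call sites.
    fn user_with_audio(content: impl Into<String>, audio: Vec<AudioHistoryMeta>) -> Self {
        Self::user_with_media(content, Vec::new(), audio)
    }
}
```
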

In `@clients/agent-runtime/src/transcription/whisper_cli.rs`:
- Around line 150-156: The current check in the whisper-cli subprocess handling
(the branch that tests output.status.success() in the function handling
transcription) treats any non-zero exit as AudioRejectionReason::Corrupted;
change this so the default error returned for non-zero exits is a
transcription/system error (e.g., AudioRejectionReason::TranscriptionFailed or
SystemError) and only map to AudioRejectionReason::Corrupted when stderr
contains clear media-decode/input failure signatures (detect keywords like
"decode", "unsupported format", "invalid data", "couldn't parse", "ffmpeg",
"libav", or similar). Update the error path that logs via tracing::error! to
still include stderr and exit code, and perform a small pattern match on the
stderr string to switch to Corrupted only when those decode-related tokens are
present; otherwise return the TranscriptionFailed/SystemError variant.
- Around line 73-89: The function resolve_model_path currently returns the
per-user path whenever a home directory exists without checking whether the file
actually exists; change it to construct the user path (using
user_dirs.home_dir() and the filename), test that path.exists(), and only return
it if present—otherwise fall back to the system path (e.g.,
PathBuf::from(format!("/usr/local/share/whisper/{filename}"))). Ensure you still
keep the existing fallback when directories::UserDirs::new() is None. Use the
same local variables (filename, user_dirs, home_dir) so callers of
resolve_model_path need no changes.
- Around line 126-147: The spawned whisper-cli child is left running on timeout
because the wait future is dropped; fix by enabling kill-on-drop on the Command
before spawning (call cmd.kill_on_drop(true)) or by ensuring the Child is
explicitly killed (call child.kill().await and wait for its exit) in the timeout
branch and any early-return error branches; update the code around cmd.spawn()
and the timeout match (the variables cmd, child, self.timeout, and the timeout
Err(_) branch) so the child is terminated before returning and the semaphore
permit is only released after the child is killed/awaited.
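
For the exit-status comment in this file (lines 150-156), the stderr-based classification might be sketched as below. The marker list is copied from the review comment and has not been checked against real whisper.cpp output; the function name is hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum AudioRejectionReason {
    Corrupted,
    TranscriptionFailed,
}

/// Substrings the review comment proposes as decode-failure signatures.
const DECODE_FAILURE_MARKERS: &[&str] = &[
    "decode",
    "unsupported format",
    "invalid data",
    "couldn't parse",
    "ffmpeg",
    "libav",
];

/// Maps a non-zero exit to `Corrupted` only when stderr points at the input;
/// every other failure defaults to `TranscriptionFailed`.
fn classify_whisper_failure(stderr: &str) -> AudioRejectionReason {
    let lowered = stderr.to_lowercase();
    if DECODE_FAILURE_MARKERS.iter().any(|marker| lowered.contains(*marker)) {
        AudioRejectionReason::Corrupted
    } else {
        AudioRejectionReason::TranscriptionFailed
    }
}
```
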

In `@openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md`:
- Line 4: Update the stale GitHub issue link in archive-report.md by replacing
the repository segment "anthropics/corvus" with "dallay/corvus" so the issue URL
becomes https://github.com/dallay/corvus/issues/246; ensure the change updates
the exact text shown in the file (the string
"https://github.com/anthropics/corvus/issues/246") to the new repository target.

In `@openspec/changes/archive/2026-04-03-audio-input-support/design.md`:
- Around line 82-88: The markdown fenced code blocks showing the pipeline steps
and directory tree are missing language tags (triggering MD040); update each
fenced block around the snippets that list
extract_user_text()/gate_audio_config()/transcription/… and the src/ tree to
include a language identifier (e.g., ```text) so markdownlint passes;
specifically add the same language tag to the three fenced blocks containing the
step list (extract_user_text → enrich_with_memory → …), the expanded audio-step
list (→ gate_audio_config → … → inject_transcription → …), and the src/
directory tree block.
- Around line 826-829: Update the design.md text to remove the incorrect claim
that a standalone doctor module doesn't exist and instead state that the doctor
command is implemented at clients/agent-runtime/src/doctor and is invoked from
the CLI via Commands::Doctor => doctor::run(); also revise the wording to
reflect that health checks are integrated into the runtime startup validation
path (see src/config/validation.rs) and that audio diagnostics are included,
removing any "will be added in future" phrasing and ensuring the document
accurately describes the existing integration.

In `@openspec/changes/archive/2026-04-03-audio-input-support/exploration.md`:
- Around line 434-449: The markdown subsections "### Phase 1 Scope (MVP)", "###
Phase 2 (Follow-up)", and "### Effort Estimate" need blank lines added before
and after each heading to satisfy markdownlint MD022; edit the block containing
those headings in exploration.md so there is an empty line above and below each
`###` heading (ensure you also add a trailing blank line after the final
subsection) to fix the spacing.
- Around line 71-83: The unlabeled fenced code blocks in the exploration
examples (the sequence showing Channel.listen() → parse message → build
ContentPart::Image and the other two similar blocks) violate markdownlint MD040;
update each fenced block that documents the flow (the one containing
Channel.listen(), and the other blocks around the same example) to include a
language tag (e.g., ```text or ```mermaid as appropriate) so the fences are
labeled; locate the blocks near the sequence using identifiers like
Channel.listen(), process_channel_message(), extract_user_text(),
gate_and_stage_images(), StagedImageGuard and run_unified_channel_tool_loop()
and add the language label to each opening fence.

In `@openspec/changes/archive/2026-04-03-audio-input-support/proposal.md`:
- Around line 69-72: Two fenced code blocks (the one showing "Image flow:
Channel → ContentPart::Image → ..." and the pipeline block starting with
"extract_user_text()") are missing language identifiers which triggers
markdownlint MD040; update both fences to include a language label such as
```text so the blocks become labeled code fences (e.g., add "text" to the
opening backticks for the ContentPart::Image/Audio flow block and the
extract_user_text() pipeline block).

In
`@openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md`:
- Around line 341-347: Update the archived spec to match the runtime: change the
Transcriber trait code fence to rust and ensure it reflects current runtime
usage; update the [audio] contract to include the AudioConfig fields
whisper_binary (default "whisper-cli"), max_concurrent_transcriptions, and
transcription_timeout_secs, and replace any lingering references to the binary
named "whisper" with "whisper-cli"; apply the same fixes to the other
Transcriber snippets and [audio] contract occurrences (the other two locations
mentioned) so the archived contract matches AudioConfig and runtime behavior.

In `@openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md`:
- Around line 25-37: Update the Markdown in verify-report.md to satisfy
MD022/MD031/MD040 by adding blank lines before and after the "Anti-Patterns
Check" and "Code Style" headings and by converting the three fenced code blocks
to have language tags and surrounding blank lines; specifically change the
blocks containing "cargo check --manifest-path clients/agent-runtime/Cargo.toml
→ Finished dev profile" and "cargo clippy --manifest-path
clients/agent-runtime/Cargo.toml --all-targets -- -D warnings → Finished dev
profile" to ```bash fenced blocks with a blank line before and after, and change
the final test summary block ("All test suites pass: unit tests...") to a
```text fenced block with a blank line before and after so the headings and
fenced blocks comply with linting rules.

In `@openspec/specs/audio-input/spec.md`:
- Around line 97-99: The spec's placement of the 7-step audio pipeline is
incorrect; update the documentation so it reflects the actual implementation
order used in clients/agent-runtime: the audio gating/staging/transcription
pipeline is executed before extract_user_text() (i.e., inserted into
process_channel_message() prior to calling extract_user_text()), not between
extract_user_text() and enrich_with_memory(); reference
process_channel_message(), extract_user_text(), and enrich_with_memory() when
making the spec change.

---

Outside diff comments:
In `@clients/agent-runtime/src/channels/mod.rs`:
- Around line 619-688: process_channel_message currently runs the audio
pipeline (gate_audio_config, gate_and_stage_audio, transcribe_audio,
inject_transcription) before the per-turn timeout is applied, so slow
downloads or transcription can exceed CHANNEL_MESSAGE_TIMEOUT_SECS; move all
audio stages inside the per-turn timeout boundary (or compute remaining_budget
and pass it into gate_audio_config/gate_and_stage_audio/transcribe_audio) so
that these calls are canceled when the channel turn times out, and ensure any
temp resources held by audio_guard are still cleaned up on timeout.
- Around line 2705-2721: The ChannelRuntimeContext currently sets transcriber:
None which causes gate_audio_config() to reject audio; fix by constructing the
transcriber before building runtime_ctx and passing it into
ChannelRuntimeContext as transcriber: Some(...) instead of None. Locate where
runtime_ctx is created (the ChannelRuntimeContext instantiation in
start_channels() and the analogous one in spawn_runtime_handle()) and call the
existing audio/transcriber factory (e.g., the module/function used to create
transcribers in this crate—invoke it with the runtime/config) to produce a
transcriber instance, then set transcriber: Some(transcriber) in both
constructors so gate_audio_config() sees a transcriber present. Ensure any
errors from creating the transcriber are handled/propagated consistent with
existing error handling.
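
The remaining-budget alternative from the first outside-diff comment above could be sketched roughly as follows, using only the standard library. This is an illustration under assumptions, not the runtime's code: `TurnDeadline`, `remaining_budget`, the stage names, and the 30-second budget are all hypothetical, and elapsed times are simulated rather than measured.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: one deadline shared by every stage of a channel turn.
struct TurnDeadline {
    expires_at: Instant,
}

impl TurnDeadline {
    fn new(start: Instant, budget: Duration) -> Self {
        Self { expires_at: start + budget }
    }

    /// Time left at `now`, or `None` once the turn budget is spent.
    fn remaining_budget(&self, now: Instant) -> Option<Duration> {
        let left = self.expires_at.checked_duration_since(now)?;
        if left.is_zero() {
            None
        } else {
            Some(left)
        }
    }
}

fn main() {
    let start = Instant::now();
    // Placeholder budget; the real value would come from the channel timeout.
    let deadline = TurnDeadline::new(start, Duration::from_secs(30));

    // Simulated seconds elapsed before each audio stage (placeholder numbers).
    let stages = [("fetch", 5u64), ("transcribe", 20), ("inject", 31)];
    for (name, elapsed_secs) in stages {
        let now = start + Duration::from_secs(elapsed_secs);
        match deadline.remaining_budget(now) {
            Some(left) => println!("{}: run with at most {:?}", name, left),
            None => println!("{}: skipped, turn budget exhausted", name),
        }
    }
}
```

Passing the remaining duration into each stage keeps a single source of truth for the turn limit; wrapping all stages in one outer timeout, as the comment also suggests, achieves the same bound with less plumbing.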

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5be0db47-780b-4ae1-9073-e138f500a063

📥 Commits

Reviewing files that changed from the base of the PR and between 522f1fe and c2f6341.

📒 Files selected for processing (35)

clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/discord.rs
clients/agent-runtime/src/channels/mod.rs
clients/agent-runtime/src/channels/telegram.rs
clients/agent-runtime/src/channels/traits.rs
clients/agent-runtime/src/channels/whatsapp.rs
clients/agent-runtime/src/config/mod.rs
clients/agent-runtime/src/config/schema.rs
clients/agent-runtime/src/doctor/mod.rs
clients/agent-runtime/src/lib.rs
clients/agent-runtime/src/main.rs
clients/agent-runtime/src/observability/log.rs
clients/agent-runtime/src/observability/mod.rs
clients/agent-runtime/src/observability/otel.rs
clients/agent-runtime/src/observability/prometheus.rs
clients/agent-runtime/src/observability/traits.rs
clients/agent-runtime/src/onboard/wizard.rs
clients/agent-runtime/src/providers/anthropic.rs
clients/agent-runtime/src/providers/compatible.rs
clients/agent-runtime/src/providers/copilot.rs
clients/agent-runtime/src/providers/openrouter.rs
clients/agent-runtime/src/providers/router.rs
clients/agent-runtime/src/providers/traits.rs
clients/agent-runtime/src/transcription/mod.rs
clients/agent-runtime/src/transcription/traits.rs
clients/agent-runtime/src/transcription/whisper_cli.rs
openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md
openspec/changes/archive/2026-04-03-audio-input-support/design.md
openspec/changes/archive/2026-04-03-audio-input-support/exploration.md
openspec/changes/archive/2026-04-03-audio-input-support/proposal.md
openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md
openspec/changes/archive/2026-04-03-audio-input-support/state.yaml
openspec/changes/archive/2026-04-03-audio-input-support/tasks.md
openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md
openspec/specs/audio-input/spec.md

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: submit-gradle
GitHub Check: pr-checks
GitHub Check: sonar
GitHub Check: Cloudflare Pages

🧰 Additional context used

📓 Path-based instructions (9)

clients/agent-runtime/src/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

clients/agent-runtime/src/**/*.rs: Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements
Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency

Files:

clients/agent-runtime/src/observability/prometheus.rs
clients/agent-runtime/src/providers/router.rs
clients/agent-runtime/src/providers/anthropic.rs
clients/agent-runtime/src/observability/otel.rs
clients/agent-runtime/src/transcription/mod.rs
clients/agent-runtime/src/providers/compatible.rs
clients/agent-runtime/src/lib.rs
clients/agent-runtime/src/channels/whatsapp.rs
clients/agent-runtime/src/main.rs
clients/agent-runtime/src/observability/mod.rs
clients/agent-runtime/src/providers/copilot.rs
clients/agent-runtime/src/providers/openrouter.rs
clients/agent-runtime/src/observability/log.rs
clients/agent-runtime/src/channels/discord.rs
clients/agent-runtime/src/config/mod.rs
clients/agent-runtime/src/providers/traits.rs
clients/agent-runtime/src/doctor/mod.rs
clients/agent-runtime/src/onboard/wizard.rs
clients/agent-runtime/src/transcription/traits.rs
clients/agent-runtime/src/channels/telegram.rs
clients/agent-runtime/src/channels/mod.rs
clients/agent-runtime/src/observability/traits.rs
clients/agent-runtime/src/config/schema.rs
clients/agent-runtime/src/transcription/whisper_cli.rs
clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/traits.rs

clients/agent-runtime/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Run cargo fmt --all -- --check, cargo clippy --all-targets -- -D warnings, and cargo test for code validation, or document which checks were skipped and why

Files:

clients/agent-runtime/src/observability/prometheus.rs
clients/agent-runtime/src/providers/router.rs
clients/agent-runtime/src/providers/anthropic.rs
clients/agent-runtime/src/observability/otel.rs
clients/agent-runtime/src/transcription/mod.rs
clients/agent-runtime/src/providers/compatible.rs
clients/agent-runtime/src/lib.rs
clients/agent-runtime/src/channels/whatsapp.rs
clients/agent-runtime/src/main.rs
clients/agent-runtime/src/observability/mod.rs
clients/agent-runtime/src/providers/copilot.rs
clients/agent-runtime/src/providers/openrouter.rs
clients/agent-runtime/src/observability/log.rs
clients/agent-runtime/src/channels/discord.rs
clients/agent-runtime/src/config/mod.rs
clients/agent-runtime/src/providers/traits.rs
clients/agent-runtime/src/doctor/mod.rs
clients/agent-runtime/src/onboard/wizard.rs
clients/agent-runtime/src/transcription/traits.rs
clients/agent-runtime/src/channels/telegram.rs
clients/agent-runtime/src/channels/mod.rs
clients/agent-runtime/src/observability/traits.rs
clients/agent-runtime/src/config/schema.rs
clients/agent-runtime/src/transcription/whisper_cli.rs
clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/traits.rs

**/*.rs

⚙️ CodeRabbit configuration file

**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.

Files:

clients/agent-runtime/src/observability/prometheus.rs
clients/agent-runtime/src/providers/router.rs
clients/agent-runtime/src/providers/anthropic.rs
clients/agent-runtime/src/observability/otel.rs
clients/agent-runtime/src/transcription/mod.rs
clients/agent-runtime/src/providers/compatible.rs
clients/agent-runtime/src/lib.rs
clients/agent-runtime/src/channels/whatsapp.rs
clients/agent-runtime/src/main.rs
clients/agent-runtime/src/observability/mod.rs
clients/agent-runtime/src/providers/copilot.rs
clients/agent-runtime/src/providers/openrouter.rs
clients/agent-runtime/src/observability/log.rs
clients/agent-runtime/src/channels/discord.rs
clients/agent-runtime/src/config/mod.rs
clients/agent-runtime/src/providers/traits.rs
clients/agent-runtime/src/doctor/mod.rs
clients/agent-runtime/src/onboard/wizard.rs
clients/agent-runtime/src/transcription/traits.rs
clients/agent-runtime/src/channels/telegram.rs
clients/agent-runtime/src/channels/mod.rs
clients/agent-runtime/src/observability/traits.rs
clients/agent-runtime/src/config/schema.rs
clients/agent-runtime/src/transcription/whisper_cli.rs
clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/traits.rs

**/*

⚙️ CodeRabbit configuration file

**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.

Files:

clients/agent-runtime/src/observability/prometheus.rs
clients/agent-runtime/src/providers/router.rs
clients/agent-runtime/src/providers/anthropic.rs
clients/agent-runtime/src/observability/otel.rs
clients/agent-runtime/src/transcription/mod.rs
clients/agent-runtime/src/providers/compatible.rs
clients/agent-runtime/src/lib.rs
clients/agent-runtime/src/channels/whatsapp.rs
clients/agent-runtime/src/main.rs
clients/agent-runtime/src/observability/mod.rs
openspec/changes/archive/2026-04-03-audio-input-support/state.yaml
clients/agent-runtime/src/providers/copilot.rs
clients/agent-runtime/src/providers/openrouter.rs
clients/agent-runtime/src/observability/log.rs
clients/agent-runtime/src/channels/discord.rs
clients/agent-runtime/src/config/mod.rs
clients/agent-runtime/src/providers/traits.rs
clients/agent-runtime/src/doctor/mod.rs
openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md
openspec/changes/archive/2026-04-03-audio-input-support/proposal.md
openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md
clients/agent-runtime/src/onboard/wizard.rs
clients/agent-runtime/src/transcription/traits.rs
openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md
openspec/changes/archive/2026-04-03-audio-input-support/exploration.md
openspec/specs/audio-input/spec.md
openspec/changes/archive/2026-04-03-audio-input-support/tasks.md
clients/agent-runtime/src/channels/telegram.rs
clients/agent-runtime/src/channels/mod.rs
clients/agent-runtime/src/observability/traits.rs
clients/agent-runtime/src/config/schema.rs
clients/agent-runtime/src/transcription/whisper_cli.rs
openspec/changes/archive/2026-04-03-audio-input-support/design.md
clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/traits.rs

clients/agent-runtime/src/providers/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Implement Provider trait in src/providers/ and register in src/providers/mod.rs factory when adding a new provider

Files:

clients/agent-runtime/src/providers/router.rs
clients/agent-runtime/src/providers/anthropic.rs
clients/agent-runtime/src/providers/compatible.rs
clients/agent-runtime/src/providers/copilot.rs
clients/agent-runtime/src/providers/openrouter.rs
clients/agent-runtime/src/providers/traits.rs

clients/agent-runtime/src/channels/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Implement Channel trait in src/channels/ with consistent send, listen, and health_check semantics and cover auth/allowlist/health behavior with tests

Files:

clients/agent-runtime/src/channels/whatsapp.rs
clients/agent-runtime/src/channels/discord.rs
clients/agent-runtime/src/channels/telegram.rs
clients/agent-runtime/src/channels/mod.rs
clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/traits.rs

clients/agent-runtime/src/main.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

clients/agent-runtime/src/main.rs: Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths
Keep startup path lean and avoid heavy initialization in command parsing flow

Files:

clients/agent-runtime/src/main.rs

clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable

Files:

clients/agent-runtime/src/config/mod.rs
clients/agent-runtime/src/config/schema.rs

**/*.{md,mdx}

⚙️ CodeRabbit configuration file

**/*.{md,mdx}: Verify technical accuracy and that docs stay aligned with code changes.
For user-facing docs, check EN/ES parity or explicitly note pending translation gaps.

Files:

openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md
openspec/changes/archive/2026-04-03-audio-input-support/proposal.md
openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md
openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md
openspec/changes/archive/2026-04-03-audio-input-support/exploration.md
openspec/specs/audio-input/spec.md
openspec/changes/archive/2026-04-03-audio-input-support/tasks.md
openspec/changes/archive/2026-04-03-audio-input-support/design.md

🧠 Learnings (10)

📓 Common learnings

Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests

📚 Learning: 2026-02-17T12:31:17.076Z

Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths

Applied to files:

clients/agent-runtime/src/observability/prometheus.rs
clients/agent-runtime/src/transcription/mod.rs
clients/agent-runtime/src/providers/compatible.rs
clients/agent-runtime/src/channels/whatsapp.rs
clients/agent-runtime/src/main.rs
clients/agent-runtime/src/channels/discord.rs
clients/agent-runtime/src/config/mod.rs
clients/agent-runtime/src/doctor/mod.rs
clients/agent-runtime/src/config/schema.rs
clients/agent-runtime/src/transcription/whisper_cli.rs

📚 Learning: 2026-02-17T12:31:17.076Z

Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests

Applied to files:

clients/agent-runtime/src/providers/anthropic.rs
clients/agent-runtime/src/providers/compatible.rs
clients/agent-runtime/src/channels/whatsapp.rs
clients/agent-runtime/src/observability/mod.rs
clients/agent-runtime/src/providers/openrouter.rs
clients/agent-runtime/src/doctor/mod.rs
clients/agent-runtime/src/transcription/traits.rs
openspec/changes/archive/2026-04-03-audio-input-support/tasks.md
clients/agent-runtime/src/channels/telegram.rs
clients/agent-runtime/src/channels/mod.rs
clients/agent-runtime/src/observability/traits.rs
clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/traits.rs

📚 Learning: 2026-02-17T12:31:17.076Z

Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/providers/**/*.rs : Implement `Provider` trait in `src/providers/` and register in `src/providers/mod.rs` factory when adding a new provider

Applied to files:

clients/agent-runtime/src/transcription/mod.rs
clients/agent-runtime/src/lib.rs
clients/agent-runtime/src/observability/mod.rs
clients/agent-runtime/src/transcription/traits.rs

📚 Learning: 2026-02-17T12:31:17.076Z

Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/Cargo.toml : Do not add heavy dependencies for minor convenience; justify new crate additions

Applied to files:

clients/agent-runtime/src/transcription/mod.rs
clients/agent-runtime/src/main.rs
clients/agent-runtime/src/observability/mod.rs
clients/agent-runtime/src/config/mod.rs

📚 Learning: 2026-02-17T12:31:17.076Z

Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/*.rs : Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why

Applied to files:

clients/agent-runtime/src/transcription/mod.rs
clients/agent-runtime/src/channels/whatsapp.rs
clients/agent-runtime/src/main.rs
clients/agent-runtime/src/config/mod.rs
clients/agent-runtime/src/doctor/mod.rs
clients/agent-runtime/src/config/schema.rs

📚 Learning: 2026-02-17T12:31:17.076Z

Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Keep startup path lean and avoid heavy initialization in command parsing flow

Applied to files:

clients/agent-runtime/src/main.rs
clients/agent-runtime/src/channels/mod.rs

📚 Learning: 2026-02-17T12:31:17.076Z

Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/**/*.rs : Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency

Applied to files:

clients/agent-runtime/src/main.rs

📚 Learning: 2026-02-17T12:31:17.076Z

Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs : Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable

Applied to files:

clients/agent-runtime/src/config/mod.rs

📚 Learning: 2026-02-17T12:31:17.076Z

Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/tools/**/*.rs : Implement `Tool` trait in `src/tools/` with strict parameter schema, validate and sanitize all inputs, and return structured `ToolResult` without panics in runtime path

Applied to files:

clients/agent-runtime/src/transcription/traits.rs

🪛 LanguageTool

openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md

[style] ~399-~399: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...tion failure — process timeout - GIVEN a very large audio file near the duration limit - WH...

(EN_WEAK_ADJECTIVE)

[grammar] ~479-~479: Use a hyphen to join words.
Context: ...tartup validation error indicating the 1 hour ceiling #### Scenario: Missing aud...

(QB_NEW_EN_HYPHEN)

[style] ~838-~838: As an alternative to the over-used intensifier ‘extremely’, consider replacing this phrase.
Context: ...t, including: - Zero-byte audio files - Extremely large files (rejected by size limit) - Files ...

(EN_WEAK_ADJECTIVE)

[locale-violation] ~847-~847: In American English, ‘afterward’ is the preferred variant. ‘Afterwards’ is more commonly used in British English and other dialects.
Context: ... same user sends a text message "hello" afterwards - THEN the text message is processed no...

(AFTERWARDS_US)

openspec/changes/archive/2026-04-03-audio-input-support/exploration.md

[style] ~181-~181: Consider using a different adverb to strengthen your wording.
Context: ...) and audio files (audio field) are completely ignored — messages with only voice/au...

(COMPLETELY_ENTIRELY)

openspec/specs/audio-input/spec.md

[style] ~399-~399: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...tion failure — process timeout - GIVEN a very large audio file near the duration limit - WH...

(EN_WEAK_ADJECTIVE)

[grammar] ~479-~479: Use a hyphen to join words.
Context: ...tartup validation error indicating the 1 hour ceiling #### Scenario: Missing aud...

(QB_NEW_EN_HYPHEN)

[style] ~838-~838: As an alternative to the over-used intensifier ‘extremely’, consider replacing this phrase.
Context: ...t, including: - Zero-byte audio files - Extremely large files (rejected by size limit) - Files ...

(EN_WEAK_ADJECTIVE)

[locale-violation] ~847-~847: In American English, ‘afterward’ is the preferred variant. ‘Afterwards’ is more commonly used in British English and other dialects.
Context: ... same user sends a text message "hello" afterwards - THEN the text message is processed no...

(AFTERWARDS_US)

openspec/changes/archive/2026-04-03-audio-input-support/design.md

[style] ~943-~943: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...nges. - No provider contract changes. - No existing behavior modified. - Rollout: ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 markdownlint-cli2 (0.22.0)

openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md

[warning] 25-25: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

[warning] 25-25: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 30-30: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

[warning] 30-30: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 35-35: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

[warning] 35-35: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 313-313: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

[warning] 320-320: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

openspec/changes/archive/2026-04-03-audio-input-support/proposal.md

[warning] 69-69: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 79-79: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md

[warning] 341-341: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

openspec/changes/archive/2026-04-03-audio-input-support/exploration.md

[warning] 71-71: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 359-359: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 370-370: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 434-434: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

[warning] 441-441: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

[warning] 447-447: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

openspec/specs/audio-input/spec.md

[warning] 341-341: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

openspec/changes/archive/2026-04-03-audio-input-support/design.md

[warning] 82-82: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 92-92: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 949-949: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (15)

clients/agent-runtime/src/main.rs (2)

74-74: Good module wiring for transcription integration.

mod transcription; cleanly wires the new runtime transcription module into the binary crate.

718-719: Scoped lint suppression is acceptable here.

Applying clippy::large_futures at the dispatcher boundary is reasonable for this async match-heavy function.

clients/agent-runtime/src/providers/anthropic.rs (1)

975-976: Test fixture updates are correct.

Adding audio_metadata: None consistently keeps test ChatMessage fixtures aligned with the current provider trait contract.

Also applies to: 981-982, 987-988, 1000-1001, 1008-1009, 1023-1024, 1033-1034, 1155-1156, 1175-1176, 1277-1278, 1340-1341, 1346-1347, 1352-1353

clients/agent-runtime/src/transcription/mod.rs (1)

1-2: Module exports look clean and intentional.

Publicly exposing traits and whisper_cli is a solid, minimal surface for the transcription subsystem.

clients/agent-runtime/src/observability/log.rs (1)

192-203: Good audio ingress log coverage with safe metadata fields.

This adds the expected ingress lifecycle visibility without logging raw audio/transcript payloads.

clients/agent-runtime/src/providers/openrouter.rs (1)

538-539: Fixture alignment is correct.

The added audio_metadata: None keeps tests in sync with the expanded ChatMessage schema.

Also applies to: 544-545, 588-589, 594-595, 642-643, 738-739, 759-760

openspec/changes/archive/2026-04-03-audio-input-support/state.yaml (1)

1-8: Archive state entry looks complete and consistent.

Phase state, references, and branch linkage are properly captured.

clients/agent-runtime/src/observability/mod.rs (1)

17-19: Re-export update is correct.

Adding audio ingress types to the module surface keeps observability APIs coherent for downstream users.

clients/agent-runtime/src/onboard/wizard.rs (1)

799-799: Good fail-closed wiring for audio config defaults.

Both onboarding paths now initialize audio explicitly, which keeps generated configs complete and secure-by-default.

Also applies to: 1037-1037
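
As a rough illustration of what fail-closed defaults for the `[audio]` section might look like, here is a minimal sketch. It is not the runtime's `AudioConfig`: the `enabled` field name and the numeric placeholder values are assumptions, and only the `whisper-cli` binary default and the three field names `whisper_binary`, `max_concurrent_transcriptions`, and `transcription_timeout_secs` come from the review text.

```rust
/// Hypothetical mirror of the `[audio]` section; not the runtime's actual struct.
#[derive(Debug, Clone, PartialEq)]
struct AudioConfig {
    /// Assumed field name for the on/off switch.
    enabled: bool,
    whisper_binary: String,
    max_concurrent_transcriptions: usize,
    transcription_timeout_secs: u64,
}

impl Default for AudioConfig {
    fn default() -> Self {
        Self {
            // Fail-closed: audio stays off until the operator opts in.
            enabled: false,
            whisper_binary: "whisper-cli".to_string(),
            // Placeholder values, not taken from the PR.
            max_concurrent_transcriptions: 1,
            transcription_timeout_secs: 60,
        }
    }
}

/// Admit audio only when the operator explicitly enabled it.
fn audio_admitted(cfg: &AudioConfig) -> bool {
    cfg.enabled
}

fn main() {
    let cfg = AudioConfig::default();
    println!("audio admitted by default: {}", audio_admitted(&cfg));
    println!("default config: {:?}", cfg);
}
```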

clients/agent-runtime/src/config/mod.rs (1)

5-15: Re-export update is correct and coherent.

Adding AudioConfig to the schema re-export keeps the config API aligned with the new [audio] section.

clients/agent-runtime/src/providers/traits.rs (1)

17-19: audio_metadata addition is backward-compatible and safely defaulted.

Using #[serde(default, skip_serializing_if = "Option::is_none")] here is the right compatibility choice for existing stored history payloads.

Also applies to: 23-29, 32-38, 41-52, 54-64, 67-73, 76-82

clients/agent-runtime/src/transcription/traits.rs (1)

23-42: Transcriber interface is clean and runtime-safe.

Good separation between user-facing transcription errors and startup/doctor health diagnostics.

clients/agent-runtime/src/channels/telegram.rs (1)

3077-3077: LGTM on the widened test matches.

Using _ => panic!(...) keeps these assertions focused on the expected variant now that ContentPart has another case.

Also applies to: 3140-3140, 3171-3171, 3184-3184, 3224-3224, 3279-3279, 3332-3332, 3389-3389
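
The wildcard-arm pattern praised here can be sketched as below. The enum is a hypothetical stand-in for the runtime's `ContentPart` (the payload shapes are assumed), and `expect_text` is an illustrative helper, not a function from the PR.

```rust
/// Hypothetical stand-in for the runtime's `ContentPart`; payload shapes are assumed.
#[derive(Debug)]
enum ContentPart {
    Text(String),
    Image(String),
    Audio(String),
}

/// Returns the text payload, panicking on any other variant (test-only style).
fn expect_text(part: &ContentPart) -> &str {
    match part {
        ContentPart::Text(text) => text.as_str(),
        // The wildcard also covers variants added later, such as `Audio`,
        // so the test does not need editing each time the enum grows.
        _ => panic!("expected ContentPart::Text"),
    }
}

fn main() {
    let parts = vec![
        ContentPart::Text("hello".to_string()),
        ContentPart::Image("photo.jpg".to_string()),
        ContentPart::Audio("voice.ogg".to_string()),
    ];
    println!("{}", expect_text(&parts[0]));
    println!("{} parts parsed", parts.len());
}
```

The trade-off is that a wildcard arm silences the compiler's exhaustiveness check, which is usually acceptable in tests but worth avoiding in production matches.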

clients/agent-runtime/src/transcription/whisper_cli.rs (1)

50-70: Nice output normalization.

Filtering [BLANK_AUDIO] and collapsing multiline stdout here is a good guard against injecting empty/silent transcripts.
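
A minimal sketch of this kind of normalization, assuming the marker appears on its own line and that lines are joined with single spaces. `normalize_transcript` is a hypothetical name, and the actual logic in `whisper_cli.rs` may differ.

```rust
/// Hypothetical normalizer: returns `None` when nothing usable was transcribed.
fn normalize_transcript(stdout: &str) -> Option<String> {
    let joined = stdout
        .lines()
        .map(str::trim)
        // Drop empty lines and the silence marker emitted for blank audio.
        .filter(|line| !line.is_empty() && *line != "[BLANK_AUDIO]")
        .collect::<Vec<_>>()
        .join(" ");

    if joined.is_empty() {
        None
    } else {
        Some(joined)
    }
}

fn main() {
    let spoken = "  hello there \n[BLANK_AUDIO]\nhow are you\n";
    println!("{:?}", normalize_transcript(spoken));
    println!("{:?}", normalize_transcript("[BLANK_AUDIO]\n\n"));
}
```

Returning `Option<String>` lets the caller treat silence as a rejection path instead of injecting an empty message into the conversation.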

clients/agent-runtime/src/channels/audio_media.rs (1)

282-725: Strong boundary-focused test coverage for the new audio media layer.

Coverage across MIME sniffing, size/duration boundaries, cleanup behavior, and context rendering looks solid for this critical ingress path.

coderabbitai · 2026-04-03T19:14:15Z

+```
+cargo check --manifest-path clients/agent-runtime/Cargo.toml → Finished dev profile
+```
+
+**Clippy**: ✅ Passed (zero warnings)
+```
+cargo clippy --manifest-path clients/agent-runtime/Cargo.toml --all-targets -- -D warnings → Finished dev profile
+```
+
+**Tests**: ✅ 6,487 passed / 0 failed / 0 ignored
+```
+All test suites pass: unit tests (3193 lib + 3220 bin), 15 integration test suites, 2 doc-tests.
+```


⚠️ Potential issue | 🟡 Minor

Fix markdownlint violations in fenced blocks and heading spacing.

The fenced blocks need language tags and surrounding blank lines, and the “Anti-Patterns Check” / “Code Style” headings need blank lines around them per MD022/MD031/MD040.

Proposed markdown fix

 **Build**: ✅ Passed
-```
+
+```bash
 cargo check --manifest-path clients/agent-runtime/Cargo.toml → Finished dev profile

 **Clippy**: ✅ Passed (zero warnings)
-```
+
+```bash
 cargo clippy --manifest-path clients/agent-runtime/Cargo.toml --all-targets -- -D warnings → Finished dev profile

 **Tests**: ✅ 6,487 passed / 0 failed / 0 ignored
-```
+
+```text
 All test suites pass: unit tests (3193 lib + 3220 bin), 15 integration test suites, 2 doc-tests.

@@

Code Quality Assessment

Anti-Patterns Check

✅ No unwrap()/expect() in production code — all occurrences are in #[cfg(test)] blocks
@@

Code Style

✅ Follows existing codebase patterns (mirrors StagedImageGuard, ImageRejectionReason, etc.)

</details>

Also applies to: 313-313, 320-320

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.22.0)</summary>

[warning] 25-25: Fenced code blocks should be surrounded by blank lines (MD031, blanks-around-fences)

---

[warning] 25-25: Fenced code blocks should have a language specified (MD040, fenced-code-language)

---

[warning] 30-30: Fenced code blocks should be surrounded by blank lines (MD031, blanks-around-fences)

---

[warning] 30-30: Fenced code blocks should have a language specified (MD040, fenced-code-language)

---

[warning] 35-35: Fenced code blocks should be surrounded by blank lines (MD031, blanks-around-fences)

---

[warning] 35-35: Fenced code blocks should have a language specified (MD040, fenced-code-language)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md
around lines 25 - 37, Update the Markdown in verify-report.md to satisfy
MD022/MD031/MD040 by adding blank lines before and after the "Anti-Patterns
Check" and "Code Style" headings and by converting the three fenced code blocks
to have language tags and surrounding blank lines; specifically change the
blocks containing "cargo check --manifest-path clients/agent-runtime/Cargo.toml
→ Finished dev profile" and "cargo clippy --manifest-path
clients/agent-runtime/Cargo.toml --all-targets -- -D warnings → Finished dev
profile" to ```bash fenced blocks with a blank line before and after, and change
the final test summary block ("All test suites pass: unit tests...") to a

fenced blocks comply with linting rules.

High priority:

- Broaden MP3 magic-byte sync detection to accept full MPEG frame mask
- Map whisper non-zero exits to TranscriptionFailed by default, Corrupted only on decode-related stderr keywords
- Add kill_on_drop(true) to prevent orphaned whisper child processes
- Use create_new(true) for temp file creation to prevent symlink attacks
- Add MultipleAudioParts rejection variant instead of SystemError
- Check model path is_file() not just exists() in doctor
- Measure and report actual transcription latency instead of clip duration
- Wire WhisperCliTranscriber into ChannelRuntimeContext when audio enabled
- Move audio pipeline stages under per-turn timeout boundary
- Fall back to system model path when user path doesn't exist

Medium priority:

- Add deny_unknown_fields to AudioConfig for strict TOML parsing
- Validate transcription concurrency and timeout are non-zero at startup
- Deduplicate audio constants between schema and audio_media modules
- Record audio ingress metrics in OTEL and Prometheus backends
- Add user_with_media constructor to avoid partial-state mutations
- Add symmetric serde tests for audio_metadata field
- Fix discord test panic message for wildcard match arm
- Handle unauthorized audio-only messages in Telegram channel

Documentation:

- Fix stale GitHub link in archive report
- Add markdown language tags to fenced code blocks (MD040)
- Fix heading spacing (MD022) in exploration and verify-report
- Update spec pipeline order to match implementation
- Sync archived spec with AudioConfig runtime fields

coderabbitai

Actionable comments posted: 17

♻️ Duplicate comments (3)

clients/agent-runtime/src/doctor/mod.rs (1)

944-971: ⚠️ Potential issue | 🟡 Minor

audio_health_pass_model_exists still bypasses production logic.

This test rebuilds model-check behavior inline instead of exercising check_audio_health, so it can pass while real doctor logic regresses.

As per coding guidelines **/*: “Look for behavioral regressions, missing tests, and contract breaks across modules.”

openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md (1)

24-37: ⚠️ Potential issue | 🟡 Minor

Resolve remaining markdownlint spacing violations (MD031/MD022).

Fences at Line 25/30/35 need surrounding blank lines, and headings at Line 313 and Line 320 need blank lines below them.
🧹 Proposed markdown fix
 **Build**: ✅ Passed
+
 ```bash
 cargo check --manifest-path clients/agent-runtime/Cargo.toml → Finished dev profile
 ```

 **Clippy**: ✅ Passed (zero warnings)
+
 ```bash
 cargo clippy --manifest-path clients/agent-runtime/Cargo.toml --all-targets -- -D warnings → Finished dev profile
 ```

 **Tests**: ✅ 6,487 passed / 0 failed / 0 ignored
+
 ```text
 All test suites pass: unit tests (3193 lib + 3220 bin), 15 integration test suites, 2 doc-tests.
 ```
@@

Anti-Patterns Check

✅ No unwrap()/expect() in production code — all occurrences are in #[cfg(test)] blocks
@@

Code Style

✅ Follows existing codebase patterns (mirrors StagedImageGuard, ImageRejectionReason, etc.)
</details>

 


Also applies to: 313-321

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against the current code and only fix it if needed.

In @openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md
around lines 24 - 37, The markdown has missing blank lines around fenced code
blocks and after two headings; add a blank line before and after each
triple-backtick fence that wraps the "cargo check --manifest-path
clients/agent-runtime/Cargo.toml → Finished dev profile" and the "cargo clippy
..." and the "All test suites pass: ..." code blocks, and insert a blank line
immediately below the "Anti-Patterns Check" and "Code Style" headings so each
heading is followed by an empty line; update the verify-report.md content
accordingly to satisfy MD031/MD022.
</details>

</blockquote></details>
<details>
<summary>clients/agent-runtime/src/channels/mod.rs (1)</summary><blockquote>

`1123-1125`: _⚠️ Potential issue_ | _🟡 Minor_

**Keep `MultipleAudioParts` distinct in ingress telemetry.**

This still collapses a known validation failure into `SystemError`, so rejected multi-audio turns are indistinguishable from real runtime faults in `AudioIngressEvent`. Add a dedicated `AudioIngressReason` variant and map it here instead.
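
A minimal sketch of the suggested mapping follows. Both enums are trimmed, hypothetical stand-ins with only two variants each; the real `AudioRejectionReason` and `AudioIngressReason` in the PR have more cases that are not reproduced here.

```rust
// Trimmed stand-ins for illustration only; the real enums have more variants.
#[derive(Debug, PartialEq)]
enum AudioRejectionReason {
    MultipleAudioParts,
    SystemError,
}

#[derive(Debug, PartialEq)]
enum AudioIngressReason {
    MultipleAudioParts,
    SystemError,
}

/// Map a rejection to its telemetry reason without collapsing a known
/// validation failure into `SystemError`.
fn to_ingress_reason(reason: &AudioRejectionReason) -> AudioIngressReason {
    match reason {
        AudioRejectionReason::MultipleAudioParts => AudioIngressReason::MultipleAudioParts,
        AudioRejectionReason::SystemError => AudioIngressReason::SystemError,
    }
}
```

Keeping the match exhaustive (no wildcard arm) means the compiler flags any future rejection variant that lacks a telemetry mapping.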

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against the current code and only fix it if needed.

In @clients/agent-runtime/src/channels/mod.rs around lines 1123 - 1125, The
match currently maps audio_media::AudioRejectionReason::MultipleAudioParts to
AudioIngressReason::SystemError, collapsing validation failures with real
faults; add a new AudioIngressReason variant (e.g., MultipleAudioParts) to the
AudioIngressReason enum and update the match in the code handling
audio_media::AudioRejectionReason so MultipleAudioParts maps to the new
AudioIngressReason::MultipleAudioParts; ensure any places that construct or
pattern-match AudioIngressEvent/AudioIngressReason are updated to handle the new
variant so telemetry distinguishes validation rejection from system errors.
</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/settings.local.json:

Around line 5-6: The new permissions are too broad: replace the unrestricted
"Bash(gh pr:*)" with a scoped, read-only GH CLI set (e.g., "Bash(gh pr:view,gh
pr:list,gh pr:status)" or the minimal verbs your workflow needs) and change
"Read(//tmp/)" to a dedicated temp subdirectory used only by this workflow
(e.g., "Read(//tmp//)") so the CLAUDE settings grant least
privilege while still allowing the workflow’s required read-only PR queries and
access to its own temp folder.

In @clients/agent-runtime/src/channels/audio_media.rs:

Around line 120-124: The current MP3 sniff accepts any second byte with top 3
bits set (0xE0) which also matches reserved layer values and ADTS AAC headers
(e.g., 0xFF 0xF1); update the check that returns AllowedAudioMime::Mp3 to also
require valid (non-zero) MPEG layer bits so reserved/ADTS frames are rejected —
replace the condition in the snippet that inspects sniffed_bytes (and the
duplicate at the other occurrence) with a combined check: ensure
(sniffed_bytes[1] & 0xE0) == 0xE0 AND (sniffed_bytes[1] & 0x06) != 0, and flip
the test validate_audio_mime_detects_mp3_sync_e0 to expect rejection instead of
acceptance.
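
The combined check described above can be sketched as follows. The function name is hypothetical, and the sketch assumes the first sniffed byte must be `0xFF`, as in a standard MPEG audio frame sync; the surrounding `AllowedAudioMime` logic from the PR is not reproduced.

```rust
/// Hypothetical helper: true when the first two bytes look like an MPEG audio
/// frame header with a non-reserved layer field.
fn looks_like_mpeg_audio_sync(sniffed_bytes: &[u8]) -> bool {
    if sniffed_bytes.len() < 2 {
        return false;
    }
    // 11-bit frame sync: 0xFF followed by the top three bits of the next byte.
    let has_sync = sniffed_bytes[0] == 0xFF && (sniffed_bytes[1] & 0xE0) == 0xE0;
    // Layer bits (mask 0x06) are 00 for the reserved value, which is also what
    // ADTS AAC headers such as 0xFF 0xF1 carry, so require them to be non-zero.
    let layer_is_valid = (sniffed_bytes[1] & 0x06) != 0;
    has_sync && layer_is_valid
}
```

With this mask, `0xFF 0xFB` (MPEG-1 Layer III) is accepted, while `0xFF 0xF1` (ADTS AAC) and `0xFF 0xE0` (reserved layer) are rejected.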

Around line 239-277: The to_context_string() method in audio_media.rs
currently injects sha256 and byte_len into model-facing history; remove those
fields so the context string only contains the media marker (mime), optional
duration, the sanitized transcription, and the sanitized caption (retain the
existing newline stripping and 200-char truncation logic). Update the initial
format call in to_context_string() (and any subsequent writes) to no longer
include byte_len or sha256, ensure the closing ']' behavior stays the same, and
update tests that assert the produced context to expect strings without the hash
prefix and byte size (see to_context_string and ChatMessage::user_with_audio for
where this string is consumed).

In @clients/agent-runtime/src/channels/mod.rs:

Around line 642-688: The per-turn timeout (CHANNEL_MESSAGE_TIMEOUT_SECS) and
started_at stopwatch must be started before any audio work so audio
fetch/staging and transcriber semaphore waits are counted; move the creation of
the per-turn timeout/stopwatch out of the post-audio section and into the code
path before evaluating msg.has_audio_parts(), or alternatively compute the
remaining budget and thread it into the audio functions (pass the
deadline/timeout into gate_audio_config, gate_and_stage_audio, and
transcribe_audio) so those calls respect the same timeout; update uses in the
audio pipeline (audio_history_metas, gate_audio_config, gate_and_stage_audio,
transcribe_audio, inject_transcription) to either run under the pre-started
timeout or accept and honor the remaining deadline.
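
One way to thread the remaining budget into the audio stages is sketched below. The helper name and signature are assumptions for illustration; the PR's actual timeout wiring around `CHANNEL_MESSAGE_TIMEOUT_SECS` is not shown in this review.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: how much of the per-turn timeout is left at `now`,
/// given a stopwatch started at `started_at`. Returns `None` once the budget
/// is exhausted, so callers can stop before starting more audio work.
fn remaining_budget(started_at: Instant, now: Instant, turn_timeout: Duration) -> Option<Duration> {
    // Saturates to zero if `now` is somehow earlier than `started_at`.
    let elapsed = now.saturating_duration_since(started_at);
    turn_timeout
        .checked_sub(elapsed)
        .filter(|left| !left.is_zero())
}
```

Each audio stage would then receive the value returned here as its own timeout, so fetch, staging, and semaphore waits all draw from the same per-turn budget.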

Around line 1479-1483: The injected_text currently prefixes the transcript
with a synthetic label depending on caption_text, which alters downstream
behavior; instead set injected_text to the transcript content itself (the
trimmed string) without any “[Voice message transcription]”/“[Audio
transcription]” prefix, removing the conditional formatting logic around
caption_text and leaving audio provenance to AudioHistoryMeta so downstream
memory/pre-execution checks and providers see the exact transcribed text.

In @clients/agent-runtime/src/channels/telegram.rs:

Around line 1856-1870: The current TOCTOU happens because you create the file
with OpenOptions::create_new and then drop the std::fs::File handle before
calling tokio::fs::write, allowing the file to be swapped; fix it by writing
through the original file handle instead of dropping it: keep the std::fs::File
returned by OpenOptions::open (the variable currently named file), and either
call file.write_all(&bytes) (or wrap it in spawn_blocking if you must avoid
blocking the async runtime) or convert it to a tokio::fs::File via
tokio::fs::File::from_std(file) and call async write_all; ensure you flush (and
optionally sync_all) and close the handle before proceeding.
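
A minimal blocking sketch of writing through the original handle, using only `std`. The function name is hypothetical; in the async runtime this would need to run inside `spawn_blocking` or use the `tokio::fs::File::from_std` conversion mentioned above, neither of which is shown here.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: create `path` exclusively and write `bytes` through
/// the same handle, so the file cannot be swapped between create and write.
fn stage_bytes_exclusively(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        // Fails with `AlreadyExists` if anything (including a symlink) is at `path`.
        .create_new(true)
        .open(path)?;
    file.write_all(bytes)?;
    file.flush()?;
    // Ask the OS to persist the data before the handle is dropped.
    file.sync_all()?;
    Ok(())
}
```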

In @clients/agent-runtime/src/config/schema.rs:

Around line 3345-3371: When audio is enabled (ac.enabled), validate that
ac.whisper_binary and ac.transcription_model are not blank: trim whitespace and
if either is empty, return an error (anyhow::bail!) stating that whisper_binary
and/or transcription_model must be non-empty when audio is enabled; perform
these checks alongside the existing ac.allowed_channels validation (before the
tracing::info! log) so the audio path fails closed at startup rather than later
when spawning whisper or resolving models.
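
The startup check could look roughly like this. The struct below is a trimmed, hypothetical stand-in for the `[audio]` section, and the sketch returns `Result<(), String>` to stay dependency-free, whereas the comment above suggests `anyhow::bail!` in the real code.

```rust
/// Trimmed stand-in for the audio config; the real struct has more fields.
struct AudioConfig {
    enabled: bool,
    whisper_binary: String,
    transcription_model: String,
}

/// Fail closed at startup when audio is enabled but required fields are blank.
fn validate_audio_config(ac: &AudioConfig) -> Result<(), String> {
    if !ac.enabled {
        return Ok(());
    }
    let mut blank_fields = Vec::new();
    if ac.whisper_binary.trim().is_empty() {
        blank_fields.push("whisper_binary");
    }
    if ac.transcription_model.trim().is_empty() {
        blank_fields.push("transcription_model");
    }
    if blank_fields.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "audio is enabled but {} must be non-empty",
            blank_fields.join(" and ")
        ))
    }
}
```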

In @clients/agent-runtime/src/doctor/mod.rs:

Around line 687-692: The code currently pushes DiagItem::ok for the whisper
binary when the spawned process returns any Ok result, which can mark a non-zero
exit (e.g., from running --help) as healthy; change the check after running
the binary so you inspect the child process ExitStatus (use status.success())
and only push DiagItem::ok when success() is true, otherwise push a failing
DiagItem (e.g., DiagItem::err) with the binary_path, the non-zero exit code or
status, and any stderr/stdout to make the failure clear.

In @clients/agent-runtime/src/providers/traits.rs:

Around line 17-19: The audio_metadata field currently serializes
AudioHistoryMeta.transcription duplicating the transcript already stored in
ChatMessage.content; remove or scrub the transcription before persisting by
ensuring AudioHistoryMeta.transcription is either omitted or set to None/empty
during serialization for traits.rs (affecting the audio_metadata field and any
roundtrip test logic that inspects AudioHistoryMeta in the tests around the
lines referenced); update the serializer behavior or the code that constructs
audio_metadata to not carry user speech text while keeping other metadata fields
intact so only ChatMessage.content retains the transcript.

In @clients/agent-runtime/src/transcription/whisper_cli.rs:

Around line 113-116: The model path check uses exists() which allows
directories; update both transcribe() and health_check() to validate the
resolved model path with is_file() instead of exists() (i.e., replace checks
that call self.model_path.exists() with self.model_path.is_file()) so
directories are rejected early and behavior matches the doctor module's
validation.
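
To illustrate why `is_file()` is the stricter check, here is a small stdlib-only sketch. The function name and error wording are hypothetical.

```rust
use std::path::Path;

/// Hypothetical helper: accept the model path only if it is a regular file.
/// A directory at the same path passes `exists()` but is rejected here.
fn model_path_is_usable(model_path: &Path) -> Result<(), String> {
    if model_path.is_file() {
        Ok(())
    } else {
        Err(format!(
            "model not found or not a regular file: {}",
            model_path.display()
        ))
    }
}
```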

Around line 194-214: The health_check in whisper_cli.rs currently spawns
Command::new(&self.binary_path).arg("--help") and treats any Ok(_) from
status().await as success; change the logic in health_check to inspect the
returned std::process::ExitStatus (from binary_check) and fail if
!status.success(): return Err including the exit code or status (use
status.code() or the ExitStatus) so non‑zero exits of the whisper-cli --help
properly produce an error for self.binary_path / health_check / binary_check.
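
A blocking sketch of the exit-status check using `std::process::Command`. The PR's `health_check` is async, so this only illustrates the `status.success()` branch; the function name and messages are assumptions.

```rust
use std::process::{Command, Stdio};

/// Hypothetical helper: run `<binary> --help` and treat only a zero exit
/// status as healthy.
fn check_binary_runs(binary_path: &str) -> Result<(), String> {
    let status = Command::new(binary_path)
        .arg("--help")
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map_err(|e| format!("failed to spawn {binary_path}: {e}"))?;
    if status.success() {
        return Ok(());
    }
    match status.code() {
        Some(code) => Err(format!("{binary_path} --help exited with code {code}")),
        // No exit code is available, for example when killed by a signal on Unix.
        None => Err(format!("{binary_path} --help ended without an exit code")),
    }
}
```

The same branch applies to the doctor check: a spawn that succeeds but exits non-zero should produce a failing diagnostic rather than an OK item.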

In @openspec/changes/archive/2026-04-03-audio-input-support/design.md:

Around line 829-850: The doc snippet is stale: replace the old
check_audio_config/DoctorWarning logic with the current
check_audio_health/DiagItem semantics—call check_audio_health (or rename the
snippet) and produce DiagItem entries instead of DoctorWarning, using the
module's file-existence check used elsewhere (e.g., check_model_file_exists or
model_path.is_file() rather than model_path.exists()) and update messages to
match DiagItem fields (kind/source "audio", diagnostic message, and appropriate
severity/category). Also update references from whisper_binary/whisper model
checks to the actual identifiers used by check_audio_health (e.g.,
resolve_model_path -> the current resolver) so the doc mirrors the real function
names and return structure.

Around line 364-379: Update the design docs to match the implemented
transcription trait: change TranscriptionResult.duration_secs from f64 to
Option, update any function signatures that still show anyhow::Result or
Result<TranscriptionResult, _> to the actual Result<TranscriptionResult,
AudioRejectionReason> shape, and change health_check() return documentation from
bool to the implemented Result<(), String>; ensure all occurrences (including
the other referenced sections) reference the concrete types and error enum
AudioRejectionReason and the TranscriptionResult struct as implemented.

In @openspec/changes/archive/2026-04-03-audio-input-support/proposal.md:

Line 39: Update the stale count "6 error types" in the proposal where the
phrase "unsupported format, too large, too long, corrupted, transcription
failed, no speech" appears—either remove the numeric count or replace it with an
accurate description that includes the additional cases (disabled, channel,
transcriber, system) implemented in the PR so the taxonomy in the text matches
the code/implementation; ensure the phrase describing error types reflects the
full set or uses non-numeric language like "the following error types" to avoid
future drift.

In
@openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md:

Around line 632-647: The spec omits the new
AudioRejectionReason::MultipleAudioParts variant; add a 12th row to the
rejection table for MultipleAudioParts with a clear user-facing message (e.g.,
"Please send only one audio file at a time.") and an "Emitted When" description
like "More than one audio attachment is present / runtime rejects multiple audio
parts", ensuring the taxonomy is updated from 11 to 12 variants so the archived
spec matches the shipped enum and runtime behavior; reference
AudioRejectionReason::MultipleAudioParts and update the surrounding text
asserting exhaustiveness (REQ-11) to reflect 12 variants.

In @openspec/specs/audio-input/spec.md:

Around line 404-415: Update the spec text for the health_check() scenarios to
describe the resolved whisper model lookup order rather than a single hard-coded
path: list the precedence used by the implementation (explicit configured path,
user home path like ~/.corvus/models/whisper/{model}.bin, then
system/package-managed locations) or reference the resolved path returned by the
health_check() logic; ensure the "unhealthy" scenario expects an Err(String)
that names the resolved path it tried. Apply the same wording change to the
other affected section referenced (the block around lines 892-915) so
package-managed installs and the documented system fallback are covered
consistently.
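
The lookup order described above might be sketched like this. The function names are hypothetical, and the user and system directories are passed in by the caller because the concrete system locations are not specified in this review.

```rust
use std::path::{Path, PathBuf};

/// Build the lookup order: explicit configured path first, then the per-user
/// directory, then the system directory.
fn model_candidates(
    explicit: Option<&Path>,
    user_dir: &Path,
    system_dir: &Path,
    model: &str,
) -> Vec<PathBuf> {
    let file_name = format!("{model}.bin");
    let mut candidates = Vec::new();
    if let Some(path) = explicit {
        candidates.push(path.to_path_buf());
    }
    candidates.push(user_dir.join(&file_name));
    candidates.push(system_dir.join(&file_name));
    candidates
}

/// Return the first candidate that is a regular file. On failure, the error
/// names every path that was tried.
fn resolve_model_path(candidates: &[PathBuf]) -> Result<PathBuf, String> {
    for candidate in candidates {
        if candidate.is_file() {
            return Ok(candidate.clone());
        }
    }
    let tried: Vec<String> = candidates
        .iter()
        .map(|p| p.display().to_string())
        .collect();
    Err(format!("whisper model not found; tried: {}", tried.join(", ")))
}
```

Naming the tried paths in the error is what lets the "unhealthy" scenario assert on the resolved path instead of a hard-coded one.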

Around line 97-115: Add and document an early fail-closed check for multiple
audio parts: before the 7-step pipeline in process_channel_message() (i.e.,
prior to Parse/extract_user_text()), detect if more than one ContentPart::Audio
is present and immediately reject with AudioRejectionReason::MultipleAudioParts
and emit an AudioIngressEvent; update REQ-2 to state this early rejection and
update the REQ-11 taxonomy/table to include the MultipleAudioParts reason and
its human-readable message so the docs match the runtime behavior (also apply
the same insertion/update in the corresponding section around lines 632-647).

Duplicate comments:
In @clients/agent-runtime/src/channels/mod.rs:

Around line 1123-1125: The match currently maps
audio_media::AudioRejectionReason::MultipleAudioParts to
AudioIngressReason::SystemError, collapsing validation failures with real
faults; add a new AudioIngressReason variant (e.g., MultipleAudioParts) to the
AudioIngressReason enum and update the match in the code handling
audio_media::AudioRejectionReason so MultipleAudioParts maps to the new
AudioIngressReason::MultipleAudioParts; ensure any places that construct or
pattern-match AudioIngressEvent/AudioIngressReason are updated to handle the new
variant so telemetry distinguishes validation rejection from system errors.

In @openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md:

Around line 24-37: The markdown has missing blank lines around fenced code
blocks and after two headings; add a blank line before and after each
triple-backtick fence that wraps the "cargo check --manifest-path
clients/agent-runtime/Cargo.toml → Finished dev profile" and the "cargo clippy
..." and the "All test suites pass: ..." code blocks, and insert a blank line
immediately below the "Anti-Patterns Check" and "Code Style" headings so each
heading is followed by an empty line; update the verify-report.md content
accordingly to satisfy MD031/MD022.
</details>


---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Path: .coderabbit.yaml

**Review profile**: ASSERTIVE

**Plan**: Pro

**Run ID**: `3d6a679b-6c7c-46da-983f-3bf8712ead66`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between c2f63419f794663da8ed48e9ffea658da3c5dbac and aa848be8752cee1ea4c499f8b9e299899d19a998.

</details>

<details>
<summary>📒 Files selected for processing (19)</summary>

* `.claude/settings.local.json`
* `clients/agent-runtime/src/channels/audio_media.rs`
* `clients/agent-runtime/src/channels/discord.rs`
* `clients/agent-runtime/src/channels/mod.rs`
* `clients/agent-runtime/src/channels/telegram.rs`
* `clients/agent-runtime/src/config/schema.rs`
* `clients/agent-runtime/src/doctor/mod.rs`
* `clients/agent-runtime/src/observability/otel.rs`
* `clients/agent-runtime/src/observability/prometheus.rs`
* `clients/agent-runtime/src/providers/traits.rs`
* `clients/agent-runtime/src/transcription/traits.rs`
* `clients/agent-runtime/src/transcription/whisper_cli.rs`
* `openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md`
* `openspec/changes/archive/2026-04-03-audio-input-support/design.md`
* `openspec/changes/archive/2026-04-03-audio-input-support/exploration.md`
* `openspec/changes/archive/2026-04-03-audio-input-support/proposal.md`
* `openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md`
* `openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md`
* `openspec/specs/audio-input/spec.md`

</details>

</details>

<details>
<summary>📜 Review details</summary>

<details>
<summary>⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)</summary>

* GitHub Check: sonar
* GitHub Check: pr-checks
* GitHub Check: submit-gradle
* GitHub Check: Cloudflare Pages

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>📓 Path-based instructions (8)</summary>

<details>
<summary>clients/agent-runtime/src/channels/**/*.rs</summary>


**📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)**

> Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests

Files:
- `clients/agent-runtime/src/channels/discord.rs`
- `clients/agent-runtime/src/channels/mod.rs`
- `clients/agent-runtime/src/channels/telegram.rs`
- `clients/agent-runtime/src/channels/audio_media.rs`

</details>
<details>
<summary>clients/agent-runtime/src/**/*.rs</summary>


**📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)**

> `clients/agent-runtime/src/**/*.rs`: Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements
> Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency

Files:
- `clients/agent-runtime/src/channels/discord.rs`
- `clients/agent-runtime/src/observability/prometheus.rs`
- `clients/agent-runtime/src/observability/otel.rs`
- `clients/agent-runtime/src/transcription/traits.rs`
- `clients/agent-runtime/src/providers/traits.rs`
- `clients/agent-runtime/src/doctor/mod.rs`
- `clients/agent-runtime/src/transcription/whisper_cli.rs`
- `clients/agent-runtime/src/channels/mod.rs`
- `clients/agent-runtime/src/channels/telegram.rs`
- `clients/agent-runtime/src/config/schema.rs`
- `clients/agent-runtime/src/channels/audio_media.rs`

</details>
<details>
<summary>clients/agent-runtime/**/*.rs</summary>


**📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)**

> Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why

Files:
- `clients/agent-runtime/src/channels/discord.rs`
- `clients/agent-runtime/src/observability/prometheus.rs`
- `clients/agent-runtime/src/observability/otel.rs`
- `clients/agent-runtime/src/transcription/traits.rs`
- `clients/agent-runtime/src/providers/traits.rs`
- `clients/agent-runtime/src/doctor/mod.rs`
- `clients/agent-runtime/src/transcription/whisper_cli.rs`
- `clients/agent-runtime/src/channels/mod.rs`
- `clients/agent-runtime/src/channels/telegram.rs`
- `clients/agent-runtime/src/config/schema.rs`
- `clients/agent-runtime/src/channels/audio_media.rs`

</details>
<details>
<summary>**/*.rs</summary>


**⚙️ CodeRabbit configuration file**

> `**/*.rs`: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
> Flag unnecessary clones, unchecked panics in production paths, and weak error context.
> Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.
> 

Files:
- `clients/agent-runtime/src/channels/discord.rs`
- `clients/agent-runtime/src/observability/prometheus.rs`
- `clients/agent-runtime/src/observability/otel.rs`
- `clients/agent-runtime/src/transcription/traits.rs`
- `clients/agent-runtime/src/providers/traits.rs`
- `clients/agent-runtime/src/doctor/mod.rs`
- `clients/agent-runtime/src/transcription/whisper_cli.rs`
- `clients/agent-runtime/src/channels/mod.rs`
- `clients/agent-runtime/src/channels/telegram.rs`
- `clients/agent-runtime/src/config/schema.rs`
- `clients/agent-runtime/src/channels/audio_media.rs`

</details>
<details>
<summary>**/*</summary>


**⚙️ CodeRabbit configuration file**

> `**/*`: Security first, performance second.
> Validate input boundaries, auth/authz implications, and secret management.
> Look for behavioral regressions, missing tests, and contract breaks across modules.
> 

Files:
- `clients/agent-runtime/src/channels/discord.rs`
- `clients/agent-runtime/src/observability/prometheus.rs`
- `clients/agent-runtime/src/observability/otel.rs`
- `openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md`
- `openspec/changes/archive/2026-04-03-audio-input-support/proposal.md`
- `clients/agent-runtime/src/transcription/traits.rs`
- `openspec/changes/archive/2026-04-03-audio-input-support/exploration.md`
- `clients/agent-runtime/src/providers/traits.rs`
- `openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md`
- `openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md`
- `openspec/specs/audio-input/spec.md`
- `clients/agent-runtime/src/doctor/mod.rs`
- `clients/agent-runtime/src/transcription/whisper_cli.rs`
- `clients/agent-runtime/src/channels/mod.rs`
- `openspec/changes/archive/2026-04-03-audio-input-support/design.md`
- `clients/agent-runtime/src/channels/telegram.rs`
- `clients/agent-runtime/src/config/schema.rs`
- `clients/agent-runtime/src/channels/audio_media.rs`

</details>
<details>
<summary>**/*.{md,mdx}</summary>


**⚙️ CodeRabbit configuration file**

> `**/*.{md,mdx}`: Verify technical accuracy and that docs stay aligned with code changes.
> For user-facing docs, check EN/ES parity or explicitly note pending translation gaps.
> 

Files:
- `openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md`
- `openspec/changes/archive/2026-04-03-audio-input-support/proposal.md`
- `openspec/changes/archive/2026-04-03-audio-input-support/exploration.md`
- `openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md`
- `openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md`
- `openspec/specs/audio-input/spec.md`
- `openspec/changes/archive/2026-04-03-audio-input-support/design.md`

</details>
<details>
<summary>clients/agent-runtime/src/providers/**/*.rs</summary>


**📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)**

> Implement `Provider` trait in `src/providers/` and register in `src/providers/mod.rs` factory when adding a new provider

Files:
- `clients/agent-runtime/src/providers/traits.rs`

</details>
<details>
<summary>clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs</summary>


**📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)**

> Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable

Files:
- `clients/agent-runtime/src/config/schema.rs`

</details>

</details><details>
<summary>🧠 Learnings (7)</summary>

<details>
<summary>📚 Learning: 2026-02-17T12:31:17.076Z</summary>
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths
**Applied to files:**
- `clients/agent-runtime/src/channels/discord.rs`
- `openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md`
- `openspec/specs/audio-input/spec.md`
- `clients/agent-runtime/src/doctor/mod.rs`
- `clients/agent-runtime/src/transcription/whisper_cli.rs`
- `clients/agent-runtime/src/config/schema.rs`

</details>
<details>
<summary>📚 Learning: 2026-02-17T12:31:17.076Z</summary>
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/providers/**/*.rs : Implement Provider trait in src/providers/ and register in src/providers/mod.rs factory when adding a new provider
**Applied to files:**
- `clients/agent-runtime/src/transcription/traits.rs`

</details>
<details>
<summary>📚 Learning: 2026-02-17T12:31:17.076Z</summary>
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement Channel trait in src/channels/ with consistent send, listen, and health_check semantics and cover auth/allowlist/health behavior with tests
**Applied to files:**
- `clients/agent-runtime/src/transcription/traits.rs`
- `clients/agent-runtime/src/providers/traits.rs`
- `openspec/specs/audio-input/spec.md`
- `clients/agent-runtime/src/channels/mod.rs`
- `clients/agent-runtime/src/channels/telegram.rs`
- `clients/agent-runtime/src/channels/audio_media.rs`

</details>
<details>
<summary>📚 Learning: 2026-02-17T12:31:17.076Z</summary>
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/*.rs : Run cargo fmt --all -- --check, cargo clippy --all-targets -- -D warnings, and cargo test for code validation, or document which checks were skipped and why
**Applied to files:**
- `clients/agent-runtime/src/providers/traits.rs`
- `openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md`
- `clients/agent-runtime/src/doctor/mod.rs`
- `openspec/changes/archive/2026-04-03-audio-input-support/design.md`
- `clients/agent-runtime/src/config/schema.rs`
- `clients/agent-runtime/src/channels/audio_media.rs`

</details>
<details>
<summary>📚 Learning: 2026-02-17T12:31:17.076Z</summary>
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/.github/**/*.{yml,yaml} : For workflow/template-only changes, ensure YAML/template syntax validity
**Applied to files:**
- `openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md`

</details>
<details>
<summary>📚 Learning: 2026-02-17T12:31:17.076Z</summary>
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools}/**/*.rs : Treat src/security/, src/gateway/, src/tools/ as high-risk surfaces and never broaden filesystem/network execution scope without explicit policy checks
**Applied to files:**
- `clients/agent-runtime/src/channels/telegram.rs`
- `clients/agent-runtime/src/config/schema.rs`

</details>
<details>
<summary>📚 Learning: 2026-02-17T12:31:17.076Z</summary>
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs : Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable
**Applied to files:**
- `clients/agent-runtime/src/config/schema.rs`

</details>

</details><details>
<summary>🪛 LanguageTool</summary>

<details>
<summary>openspec/changes/archive/2026-04-03-audio-input-support/exploration.md</summary>

[style] ~181-~181: Consider using a different adverb to strengthen your wording.
Context: ...) and audio files (`audio` field) are **completely ignored** — messages with only voice/au...

(COMPLETELY_ENTIRELY)

</details>
<details>
<summary>openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md</summary>

[style] ~399-~399: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...tion failure — process timeout  - GIVEN a very large audio file near the duration limit - WH...

(EN_WEAK_ADJECTIVE)

---

[grammar] ~482-~482: Use a hyphen to join words.
Context: ...tartup validation error indicating the 1 hour ceiling  #### Scenario: Missing aud...

(QB_NEW_EN_HYPHEN)

---

[style] ~841-~841: As an alternative to the over-used intensifier ‘extremely’, consider replacing this phrase.
Context: ...t, including: - Zero-byte audio files - Extremely large files (rejected by size limit) - Files ...

(EN_WEAK_ADJECTIVE)

---

[locale-violation] ~850-~850: In American English, ‘afterward’ is the preferred variant. ‘Afterwards’ is more commonly used in British English and other dialects.
Context: ... same user sends a text message "hello" afterwards - THEN the text message is processed no...

(AFTERWARDS_US)

</details>
<details>
<summary>openspec/specs/audio-input/spec.md</summary>

[style] ~399-~399: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...tion failure — process timeout  - GIVEN a very large audio file near the duration limit - WH...

(EN_WEAK_ADJECTIVE)

---

[grammar] ~482-~482: Use a hyphen to join words.
Context: ...tartup validation error indicating the 1 hour ceiling  #### Scenario: Missing aud...

(QB_NEW_EN_HYPHEN)

---

[style] ~841-~841: As an alternative to the over-used intensifier ‘extremely’, consider replacing this phrase.
Context: ...t, including: - Zero-byte audio files - Extremely large files (rejected by size limit) - Files ...

(EN_WEAK_ADJECTIVE)

---

[locale-violation] ~850-~850: In American English, ‘afterward’ is the preferred variant. ‘Afterwards’ is more commonly used in British English and other dialects.
Context: ... same user sends a text message "hello" afterwards - THEN the text message is processed no...

(AFTERWARDS_US)

</details>
<details>
<summary>openspec/changes/archive/2026-04-03-audio-input-support/design.md</summary>

[style] ~942-~942: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...nges. - No provider contract changes. - No existing behavior modified. - Rollout: ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

</details>

</details>
<details>
<summary>🪛 markdownlint-cli2 (0.22.0)</summary>

<details>
<summary>openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md</summary>

[warning] 25-25: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

---

[warning] 30-30: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

---

[warning] 35-35: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

---

[warning] 313-313: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

---

[warning] 320-320: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

</details>

</details>

</details>

<details>
<summary>🔇 Additional comments (10)</summary><blockquote>

<details>
<summary>clients/agent-runtime/src/channels/discord.rs (1)</summary><blockquote>

`955-955`: **LGTM - test diagnostic improved.**

The panic message now accurately reflects the unexpected variant, addressing the prior feedback. Good fix.

</blockquote></details>
<details>
<summary>clients/agent-runtime/src/observability/otel.rs (1)</summary><blockquote>

`209-223`: **Audio ingress OTEL handling is now correctly wired.**

Line 209 no longer drops `ObserverEvent::AudioIngress`, and Line 215 emits the counter with structured attributes.

</blockquote></details>
<details>
<summary>clients/agent-runtime/src/observability/prometheus.rs (1)</summary><blockquote>

`120-131`: **Prometheus audio ingress metrics are correctly added.**

The new counter registration and the `ObserverEvent::AudioIngress` handler cover the previously missing telemetry path.



Also applies to: 205-215

</blockquote></details>
<details>
<summary>openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md (1)</summary><blockquote>

`4-7`: **Archive metadata looks aligned and corrected.**

Issue linking and archive metadata are consistent with the PR context.

</blockquote></details>
<details>
<summary>clients/agent-runtime/src/transcription/traits.rs (1)</summary><blockquote>

`35-45`: **Transcriber contract is well-shaped for pipeline error mapping.**

Returning `AudioRejectionReason` from `transcribe()` and structured readiness errors from `health_check()` keeps ingress handling deterministic.

</blockquote></details>
<details>
<summary>clients/agent-runtime/src/channels/telegram.rs (4)</summary><blockquote>

`1-1`: **LGTM!**

Import aligns with the new audio_media module structure.

---

`63-107`: **LGTM!**

Voice/audio parsing correctly mirrors the image parsing pattern. The forced `audio/ogg` for voice notes matches Telegram's spec.

---

`749-776`: **LGTM!**

Correctly addresses the unauthorized media-only message handling gap. Senders now receive notification regardless of whether they sent text or audio-only content.

---

`3114-3114`: **LGTM!**

Test assertion pattern change is fine.

</blockquote></details>
<details>
<summary>openspec/changes/archive/2026-04-03-audio-input-support/exploration.md (1)</summary><blockquote>

`1-463`: **LGTM!**

Documentation accurately reflects the implementation design. The markdown formatting issues from previous review (MD040/MD022) have been addressed — code blocks are labeled and headings have proper spacing.

</blockquote></details>

</blockquote></details>

</details>

coderabbitai · 2026-04-03T20:43:33Z

+                    let injected_text = if caption_text.is_some() {
+                        format!("[Audio transcription]: {trimmed}")
+                    } else {
+                        format!("[Voice message transcription]: {trimmed}")
+                    };


⚠️ Potential issue | 🟠 Major

Inject the transcript text itself, not a synthetic label.

Prefixing the transcript with [Voice message transcription]: / [Audio transcription]: changes what memory, pre-execution checks, and the provider see, so audio input is no longer equivalent to the typed user text. The audio provenance is already preserved in AudioHistoryMeta.

✂️ Suggested change

-                    let injected_text = if caption_text.is_some() {
-                        format!("[Audio transcription]: {trimmed}")
-                    } else {
-                        format!("[Voice message transcription]: {trimmed}")
-                    };
+                    let injected_text = trimmed.clone();


🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/channels/mod.rs` around lines 1479 - 1483, The injected_text currently prefixes the transcript with a synthetic label depending on caption_text, which alters downstream behavior; instead set injected_text to the transcript content itself (the trimmed string) without any “[Voice message transcription]”/“[Audio transcription]” prefix, removing the conditional formatting logic around caption_text and leaving audio provenance to AudioHistoryMeta so downstream memory/pre-execution checks and providers see the exact transcribed text.

coderabbitai · 2026-04-03T20:43:33Z

+```rust
+fn check_audio_config(config: &AudioConfig) -> Vec<DoctorWarning> {
+    let mut warnings = Vec::new();
+    if config.enabled {
+        // 1. Check whisper binary is in PATH or at configured path
+        if which::which(&config.whisper_binary).is_err() {
+            warnings.push(DoctorWarning::new(
+                "audio",
+                format!("whisper binary '{}' not found in PATH", config.whisper_binary),
+            ));
+        }
+        // 2. Check model file exists
+        let model_path = resolve_model_path(&config.transcription_model);
+        if !model_path.exists() {
+            warnings.push(DoctorWarning::new(
+                "audio",
+                format!("Whisper model not found at {}", model_path.display()),
+            ));
+        }
+    }
+    warnings
+}


⚠️ Potential issue | 🟡 Minor

Doctor API snippet is stale relative to current module.

The snippet references check_audio_config, DoctorWarning, and model_path.exists(); the implemented path is check_audio_health with DiagItem and file check semantics. Please refresh this block to avoid misleading future changes.

As per coding guidelines **/*.{md,mdx}: “Verify technical accuracy and that docs stay aligned with code changes.”

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@openspec/changes/archive/2026-04-03-audio-input-support/design.md` around lines 829 - 850, The doc snippet is stale: replace the old check_audio_config/DoctorWarning logic with the current check_audio_health/DiagItem semantics—call check_audio_health (or rename the snippet) and produce DiagItem entries instead of DoctorWarning, using the module's file-existence check used elsewhere (e.g., check_model_file_exists or model_path.is_file() rather than model_path.exists()) and update messages to match DiagItem fields (kind/source "audio", diagnostic message, and appropriate severity/category). Also update references from whisper_binary/whisper model checks to the actual identifiers used by check_audio_health (e.g., resolve_model_path -> the current resolver) so the doc mirrors the real function names and return structure.
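
To make the `exists()` versus `is_file()` distinction concrete, here is a minimal sketch. Only the names `check_audio_health` and `DiagItem` come from the comment above; the `DiagItem` fields, the `Severity` enum, and the `check_model_file` helper are hypothetical stand-ins, not the module's real definitions.

```rust
use std::path::Path;

/// Hypothetical severity levels; the real `DiagItem` may model this differently.
#[derive(Debug, PartialEq)]
enum Severity {
    Ok,
    Warn,
}

/// Hypothetical diagnostic record standing in for the module's `DiagItem`.
#[derive(Debug)]
struct DiagItem {
    source: &'static str,
    severity: Severity,
    message: String,
}

/// Reports whether the resolved model path points at a regular file.
/// `is_file()` rejects directories, which `exists()` would accept.
fn check_model_file(model_path: &Path) -> DiagItem {
    if model_path.is_file() {
        DiagItem {
            source: "audio",
            severity: Severity::Ok,
            message: format!("whisper model found at {}", model_path.display()),
        }
    } else {
        DiagItem {
            source: "audio",
            severity: Severity::Warn,
            message: format!("whisper model not found at {}", model_path.display()),
        }
    }
}
```

Passing a directory such as `std::env::temp_dir()` yields a warning here, whereas an `exists()` check would have reported the model as present.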

coderabbitai · 2026-04-03T20:43:33Z

+- Audio observability events (`AudioIngressEvent`, `on_audio_ingress()`)
+- Audio history metadata (`AudioHistoryMeta` with transcription text)
+- `StagedAudioGuard` RAII cleanup
+- 6 error types: unsupported format, too large, too long, corrupted, transcription failed, no speech


⚠️ Potential issue | 🟡 Minor

Fix rejection taxonomy count in proposal.

Line 39 states “6 error types”, but this PR’s documented/implemented taxonomy is larger (including disabled/channel/transcriber/system cases). Update this line to avoid stale requirements language.

✏️ Proposed doc fix

-- 6 error types: unsupported format, too large, too long, corrupted, transcription failed, no speech
+- 11+ rejection/error categories covering format, size/duration, corruption, transcription/no-speech, and runtime availability/system failures

As per coding guidelines **/*.{md,mdx}: “Verify technical accuracy and that docs stay aligned with code changes.”


🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@openspec/changes/archive/2026-04-03-audio-input-support/proposal.md` at line 39, Update the stale count "6 error types" in the proposal where the phrase "unsupported format, too large, too long, corrupted, transcription failed, no speech" appears—either remove the numeric count or replace it with an accurate description that includes the additional cases (disabled, channel, transcriber, system) implemented in the PR so the taxonomy in the text matches the code/implementation; ensure the phrase describing error types reflects the full set or uses non-numeric language like "the following error types" to avoid future drift.

coderabbitai · 2026-04-03T20:43:33Z

+| Rejection Reason        | User-Facing Message                                                                  | Emitted When                                              |
+|-------------------------|--------------------------------------------------------------------------------------|-----------------------------------------------------------|
+| `Disabled`              | "Audio input is currently disabled."                                                 | `audio.enabled` is `false`                                |
+| `ChannelNotAllowed`     | "Audio input is not enabled for this channel."                                       | Channel not in `audio.allowed_channels`                   |
+| `FetchFailed`           | "I couldn't download that audio safely. Please try again."                           | Channel fetch fails (network, auth, timeout)              |
+| `MimeRejected`          | "That audio format is not supported. Supported formats: OGG, MP3, WAV, M4A."        | Magic-byte sniffing does not match allowed formats        |
+| `Oversize`              | "That audio file is too large to process. Maximum size: 25 MB."                      | Audio bytes exceed effective size limit                   |
+| `TooLong`               | "That audio is too long to process. Maximum duration: 10 minutes."                   | Duration exceeds effective duration limit                 |
+| `Corrupted`             | "That audio file appears to be corrupted and cannot be processed."                   | Transcription engine cannot decode the audio              |
+| `TranscriptionFailed`   | "Audio transcription failed. Please try again or send text instead."                 | Transcriber returns error (process crash, timeout, etc.)  |
+| `NoSpeechDetected`      | "No speech was detected in that audio. Please try again with a clearer recording."   | Transcription produces empty/whitespace-only text         |
+| `TranscriberUnavailable`| "Audio transcription is not available on this agent. Please send text instead."      | No healthy Transcriber is registered or health check fails|
+| `SystemError`           | "An internal error occurred while processing audio. Please try again."               | Unexpected internal error (e.g., temp file I/O failure, semaphore poisoning) |
+
+This taxonomy (11 variants) MUST be exhaustive for Phase 1. Every audio rejection MUST map to
+exactly one of these reasons.


⚠️ Potential issue | 🟡 Minor

Document the MultipleAudioParts rejection.

REQ-11 still declares an exhaustive 11-variant taxonomy, but the runtime now rejects multiple audio attachments with AudioRejectionReason::MultipleAudioParts and a dedicated user-facing message. Please add that 12th variant here, along with its ingress-reason behavior, so the archived spec matches the shipped contract.

As per coding guidelines, "Verify technical accuracy and that docs stay aligned with code changes."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md` around lines 632 - 647, The spec omits the new AudioRejectionReason::MultipleAudioParts variant; add a 12th row to the rejection table for `MultipleAudioParts` with a clear user-facing message (e.g., "Please send only one audio file at a time.") and an "Emitted When" description like "More than one audio attachment is present / runtime rejects multiple audio parts", ensuring the taxonomy is updated from 11 to 12 variants so the archived spec matches the shipped enum and runtime behavior; reference AudioRejectionReason::MultipleAudioParts and update the surrounding text asserting exhaustiveness (REQ-11) to reflect 12 variants.
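
One way to keep the spec table and the enum from drifting is an exhaustive `match`, which the compiler rejects when a variant is missing. The sketch below is an assumption about shape, not the shipped code: the first eleven messages are copied from the table above (with its example limits hard-coded), and the `MultipleAudioParts` wording is the example suggested in this comment.

```rust
/// The eleven reasons from the REQ-11 table plus `MultipleAudioParts`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AudioRejectionReason {
    Disabled,
    ChannelNotAllowed,
    FetchFailed,
    MimeRejected,
    Oversize,
    TooLong,
    Corrupted,
    TranscriptionFailed,
    NoSpeechDetected,
    TranscriberUnavailable,
    SystemError,
    MultipleAudioParts,
}

impl AudioRejectionReason {
    /// Every variant, so tests can assert the documented count.
    const ALL: [AudioRejectionReason; 12] = [
        Self::Disabled,
        Self::ChannelNotAllowed,
        Self::FetchFailed,
        Self::MimeRejected,
        Self::Oversize,
        Self::TooLong,
        Self::Corrupted,
        Self::TranscriptionFailed,
        Self::NoSpeechDetected,
        Self::TranscriberUnavailable,
        Self::SystemError,
        Self::MultipleAudioParts,
    ];

    /// User-facing text. The match has no wildcard arm, so adding a
    /// variant without a message fails to compile.
    fn user_message(self) -> &'static str {
        match self {
            Self::Disabled => "Audio input is currently disabled.",
            Self::ChannelNotAllowed => "Audio input is not enabled for this channel.",
            Self::FetchFailed => "I couldn't download that audio safely. Please try again.",
            Self::MimeRejected => "That audio format is not supported. Supported formats: OGG, MP3, WAV, M4A.",
            // Limits are hard-coded here only because the table shows them that way.
            Self::Oversize => "That audio file is too large to process. Maximum size: 25 MB.",
            Self::TooLong => "That audio is too long to process. Maximum duration: 10 minutes.",
            Self::Corrupted => "That audio file appears to be corrupted and cannot be processed.",
            Self::TranscriptionFailed => "Audio transcription failed. Please try again or send text instead.",
            Self::NoSpeechDetected => "No speech was detected in that audio. Please try again with a clearer recording.",
            Self::TranscriberUnavailable => "Audio transcription is not available on this agent. Please send text instead.",
            Self::SystemError => "An internal error occurred while processing audio. Please try again.",
            // Assumed wording; the shipped message may differ.
            Self::MultipleAudioParts => "Please send only one audio file at a time.",
        }
    }
}
```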

coderabbitai · 2026-04-03T20:43:33Z

+The runtime MUST process every inbound audio through a 7-step pipeline inserted into
+`process_channel_message()` before `extract_user_text()` and `enrich_with_memory()`:
+
+1. **Parse**: Channel extracts audio metadata into `ContentPart::Audio` (REQ-1)
+2. **Gate config**: Check `[audio]` config — `enabled` and `allowed_channels` (REQ-7)
+3. **Fetch**: Download audio bytes from the channel's platform API (REQ-10)
+4. **Validate**: Apply MIME sniffing, size limit, and duration limit (REQ-3, REQ-4)
+5. **Stage**: Write validated bytes to temp file as `StagedAudio`, protected by `StagedAudioGuard`
+   RAII cleanup (REQ-5)
+6. **Transcribe**: Invoke `Transcriber::transcribe()` to produce text (REQ-6)
+7. **Inject**: Replace `ContentPart::Audio` with `ContentPart::Text` containing the transcription;
+   store `AudioHistoryMeta` (REQ-8)
+
+After injection, the message continues through the normal text-only flow (`enrich_with_memory()` →
+`run_unified_channel_tool_loop()` → provider). The provider MUST NOT receive audio bytes or any
+audio-specific payload.
+
+The pipeline MUST be fail-closed: any step that cannot be completed MUST reject the audio with an
+appropriate `AudioRejectionReason` and emit an `AudioIngressEvent`.


⚠️ Potential issue | 🟠 Major

Document the multiple-audio rejection contract.

Issue #246 scopes this feature to a single audio part per message, and the runtime already has AudioRejectionReason::MultipleAudioParts, but REQ-2 still reads as if every inbound audio proceeds into the 7-step pipeline, and REQ-11 says the 11-row table is exhaustive. Please add the early rejection when more than one audio part is present and include the missing reason/message in the taxonomy.

As per coding guidelines, "Verify technical accuracy and that docs stay aligned with code changes."

Also applies to: 632-647

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@openspec/specs/audio-input/spec.md` around lines 97 - 115, Add and document an early fail-closed check for multiple audio parts: before the 7-step pipeline in process_channel_message() (i.e., prior to Parse/extract_user_text()), detect if more than one ContentPart::Audio is present and immediately reject with AudioRejectionReason::MultipleAudioParts and emit an AudioIngressEvent; update REQ-2 to state this early rejection and update the REQ-11 taxonomy/table to include the MultipleAudioParts reason and its human-readable message so the docs match the runtime behavior (also apply the same insertion/update in the corresponding section around lines 632-647).
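
The early rejection described above could be sketched as a count over the parsed parts before any fetch or staging happens. The `ContentPart` shape (including the `file_id` field) and the `check_single_audio` helper are hypothetical simplifications; only the variant names come from this PR.

```rust
/// Simplified stand-in for the runtime's multimodal message parts.
#[derive(Debug)]
enum ContentPart {
    Text(String),
    Audio { file_id: String },
}

/// Only the variant relevant to this check is modeled here.
#[derive(Debug, PartialEq)]
enum AudioRejectionReason {
    MultipleAudioParts,
}

/// Fails closed when a message carries more than one audio part.
/// Messages with zero or one audio part pass through unchanged.
fn check_single_audio(parts: &[ContentPart]) -> Result<(), AudioRejectionReason> {
    let audio_count = parts
        .iter()
        .filter(|part| matches!(part, ContentPart::Audio { .. }))
        .count();
    if audio_count > 1 {
        Err(AudioRejectionReason::MultipleAudioParts)
    } else {
        Ok(())
    }
}
```

Running this check ahead of the gate step means a rejected message never triggers a download, which keeps the fail-closed property cheap.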

coderabbitai · 2026-04-03T20:43:33Z

+#### Scenario: Health check — healthy
+
+- GIVEN whisper binary exists at the configured path
+- AND the configured model file exists at `~/.corvus/models/whisper/{model}.bin`
+- WHEN `health_check()` is called
+- THEN it MUST return `Ok(())`
+
+#### Scenario: Health check — unhealthy (missing model)
+
+- GIVEN whisper binary exists but the configured model file does not exist
+- WHEN `health_check()` is called
+- THEN it MUST return `Err(String)` with a descriptive message about the missing model


⚠️ Potential issue | 🟡 Minor

Describe the resolved whisper model path, not a single hard-coded location.

The implementation notes a system model-path fallback, but these sections only allow ~/.corvus/models/whisper/{model}.bin. Please document the actual lookup order or refer to the resolved model path so package-managed installs remain in spec.

As per coding guidelines, "Verify technical accuracy and that docs stay aligned with code changes."

Also applies to: 892-915

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@openspec/specs/audio-input/spec.md` around lines 404 - 415, Update the spec text for the health_check() scenarios to describe the resolved whisper model lookup order rather than a single hard-coded path: list the precedence used by the implementation (explicit configured path, user home path like ~/.corvus/models/whisper/{model}.bin, then system/package-managed locations) or reference the resolved path returned by the health_check() logic; ensure the "unhealthy" scenario expects an Err(String) that names the resolved path it tried. Apply the same wording change to the other affected section referenced (the block around lines 892-915) so package-managed installs and the documented system fallback are covered consistently.
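
A lookup order like the one requested here might be expressed as an ordered candidate list, with the error naming every path that was tried. This is a sketch under assumptions: the function signature, the injected `is_model_file` predicate, and any system directory passed in `search_dirs` are illustrative and do not describe the actual resolver.

```rust
use std::path::{Path, PathBuf};

/// Returns the first candidate accepted by `is_model_file`, trying the
/// explicit path first and then `<dir>/<model>.bin` for each search
/// directory in order. The predicate is injected so the order can be
/// tested without touching the filesystem.
fn resolve_model_path(
    explicit: Option<&Path>,
    search_dirs: &[PathBuf],
    model: &str,
    is_model_file: impl Fn(&Path) -> bool,
) -> Result<PathBuf, String> {
    let mut tried: Vec<PathBuf> = Vec::new();
    if let Some(path) = explicit {
        tried.push(path.to_path_buf());
    }
    for dir in search_dirs {
        tried.push(dir.join(format!("{model}.bin")));
    }

    match tried.iter().find(|candidate| is_model_file(candidate.as_path())) {
        Some(found) => Ok(found.clone()),
        None => {
            // Name every location so the health check message is actionable.
            let listed: Vec<String> = tried.iter().map(|p| p.display().to_string()).collect();
            Err(format!(
                "whisper model '{model}' not found; tried: {}",
                listed.join(", ")
            ))
        }
    }
}
```

In production the predicate would presumably be `|p| p.is_file()`; the spec scenarios can then refer to "the resolved model path" instead of one fixed location.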

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (5)

clients/agent-runtime/src/channels/mod.rs (3)
640-689: ⚠️ Potential issue | 🟠 Major

The turn timeout still starts after audio work.

Moving started_at earlier only changes logging. The actual tokio::time::timeout(...) is still created at Line 775, after gating, fetch/staging, and transcription complete, so slow downloads or transcriber queue waits can still bypass CHANNEL_MESSAGE_TIMEOUT_SECS. Start one deadline before the audio branch and spend the remaining budget inside the audio helpers.

Also applies to: 775-791
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/channels/mod.rs` around lines 640 - 689, The
timeout/deadline must be created before doing audio work so slow gating, staging
or transcription can't bypass CHANNEL_MESSAGE_TIMEOUT_SECS; move creation of the
deadline/timeout (currently using started_at + CHANNEL_MESSAGE_TIMEOUT_SECS and
tokio::time::timeout(...)) to immediately after computing session_id/started_at,
then thread the remaining time/deadline into the audio helpers
(gate_audio_config, gate_and_stage_audio, transcribe_audio) or wrap those calls
with a timeout using the precomputed remaining Duration so they honor the same
CHANNEL_MESSAGE_TIMEOUT_SECS budget.
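
The shared-budget idea can be sketched with the standard library alone. `TurnDeadline` is a hypothetical helper, not something in the codebase, and the budget passed to `start` stands in for `CHANNEL_MESSAGE_TIMEOUT_SECS`, whose value is not shown in this review.

```rust
use std::time::{Duration, Instant};

/// One deadline for the whole turn, created before any audio work starts.
#[derive(Debug, Clone, Copy)]
struct TurnDeadline {
    deadline: Instant,
}

impl TurnDeadline {
    /// Fixes the deadline at `now + budget`.
    fn start(now: Instant, budget: Duration) -> Self {
        Self { deadline: now + budget }
    }

    /// Time left at `now`, or `None` once the budget is spent.
    fn remaining(&self, now: Instant) -> Option<Duration> {
        self.deadline
            .checked_duration_since(now)
            .filter(|left| !left.is_zero())
    }
}
```

Each stage (gating, fetch and staging, transcription, then the tool loop) would ask for `remaining(Instant::now())` and wrap its future in `tokio::time::timeout` with that value, rejecting immediately when it is `None`.
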
1479-1483: ⚠️ Potential issue | 🟠 Major

Inject the transcript text verbatim.

These prefixes change what memory, pre-execution checks, and the provider see, so audio input is no longer equivalent to typed input. The provenance already lives in AudioHistoryMeta.
Suggested change
-                    let injected_text = if caption_text.is_some() {
-                        format!("[Audio transcription]: {trimmed}")
-                    } else {
-                        format!("[Voice message transcription]: {trimmed}")
-                    };
+                    let injected_text = trimmed.clone();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/channels/mod.rs` around lines 1479 - 1483, The
injected transcript currently prepends a prefix based on caption_text (the
injected_text construction) which changes how downstream memory and providers
treat audio vs typed input; remove those prefixes and inject the transcript
verbatim (use the trimmed transcript string directly) so provenance remains in
AudioHistoryMeta and the transcript is equivalent to typed input; update the
code that builds injected_text in channels/mod.rs (the block referencing
caption_text and injected_text) to assign the plain trimmed text without "[Audio
transcription]" or "[Voice message transcription]" prefixes.
1102-1125: ⚠️ Potential issue | 🟠 Major

Preserve MultipleAudioParts in observability.

This mapping still collapses a known validation failure into AudioIngressReason::SystemError, so dashboards and alerts cannot distinguish “one audio per message” rejections from real runtime failures. Add a dedicated observability reason and map it through here instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/channels/mod.rs` around lines 1102 - 1125, The
match in audio_rejection_to_ingress_reason collapses
audio_media::AudioRejectionReason::MultipleAudioParts into
AudioIngressReason::SystemError; introduce a dedicated observability variant
(e.g., AudioIngressReason::MultipleAudioParts) in the observability enum and
update audio_rejection_to_ingress_reason to map
audio_media::AudioRejectionReason::MultipleAudioParts to that new variant
instead of SystemError so validation rejections are distinguishable in
dashboards and alerts; ensure any serialization/usage sites of
AudioIngressReason handle the new variant.
clients/agent-runtime/src/channels/audio_media.rs (2)
247-254: ⚠️ Potential issue | 🟠 Major

Keep trace metadata out of model-facing history.

to_context_string() is replayed into chat history from build_history(). Including byte_len and sha256 here leaks internal trace metadata into the prompt and burns tokens on every prior audio turn even though those fields already live in structured history. Keep the synthetic context to modality, duration, transcription, and caption only.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/channels/audio_media.rs` around lines 247 - 254,
The to_context_string method currently injects internal trace metadata (byte_len
and sha256) into model-facing history; modify to_context_string (used by
build_history) to only include modality (mime), duration, transcription, and
caption in the produced string, removing byte_len, sha256, and any related
prefix_len logic so the returned context string is concise and safe for replay
into chat history.
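
A trimmed `to_context_string()` might look like the sketch below. The field names, field types, and output format are assumptions made for illustration; the point is only that `byte_len` and `sha256` stay in the struct and never reach the rendered string.

```rust
/// Hypothetical shape of the audio history record.
struct AudioHistoryMeta {
    mime: String,
    duration_secs: Option<u32>,
    transcription: String,
    caption: Option<String>,
    // Trace-only fields: kept in structured history, never rendered.
    byte_len: u64,
    sha256: String,
}

impl AudioHistoryMeta {
    /// Renders only modality, duration, transcription, and caption.
    fn to_context_string(&self) -> String {
        let mut out = format!("[{}", self.mime);
        if let Some(secs) = self.duration_secs {
            out.push_str(&format!(", {secs}s"));
        }
        out.push_str("] ");
        out.push_str(&self.transcription);
        if let Some(caption) = &self.caption {
            out.push_str(&format!(" (caption: {caption})"));
        }
        out
    }
}
```
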
120-129: ⚠️ Potential issue | 🟠 Major

Reject reserved MPEG version IDs too.

This still accepts headers like 0xFF 0xEA/0xEC/0xEE because only the layer bits are checked. Those use the reserved MPEG version id (0b01), so the MIME gate can still misclassify invalid frames as MP3. Add a version_bits != 0b01 guard and a regression test.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/channels/audio_media.rs` around lines 120 - 129,
The MP3 sniffing still accepts frames with the reserved MPEG version id (0b01)
because only layer bits were checked; update the guard in the sniffing branch
that inspects sniffed_bytes so it also rejects version bits == 0b01 (i.e., check
the MPEG version bits in sniffed_bytes[1] and skip/return non-MP3 when they
equal the reserved value) before returning AllowedAudioMime::Mp3, and add a
regression test (e.g., feed bytes like 0xFF 0xEA/0xEC/0xEE) to assert these are
not classified as AllowedAudioMime::Mp3.
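
For reference, the second header byte packs three sync bits, two version bits, two layer bits, and a protection bit. A minimal sketch of a check that rejects both reserved values follows; `looks_like_mp3_frame` is a hypothetical name, and the real sniffer may apply further constraints (for example, accepting only Layer III or handling a leading ID3 tag).

```rust
/// Returns true when the first two bytes look like an MPEG audio frame
/// header: 11-bit sync word, non-reserved version ID, non-reserved layer.
fn looks_like_mp3_frame(bytes: &[u8]) -> bool {
    if bytes.len() < 2 {
        return false;
    }
    // Sync word: all eight bits of byte 0 plus the top three bits of byte 1.
    let sync_ok = bytes[0] == 0xFF && (bytes[1] & 0xE0) == 0xE0;
    // Bits 4-3 of byte 1: MPEG version ID; 0b01 is reserved.
    let version_bits = (bytes[1] >> 3) & 0b11;
    // Bits 2-1 of byte 1: layer description; 0b00 is reserved.
    let layer_bits = (bytes[1] >> 1) & 0b11;
    sync_ok && version_bits != 0b01 && layer_bits != 0b00
}
```

The bytes named in the comment (`0xEA`, `0xEC`, `0xEE`) all carry version bits `0b01`, so this check rejects them, while a common MPEG-1 Layer III header such as `0xFF 0xFB` still passes.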

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@clients/agent-runtime/src/channels/audio_media.rs`:
- Around line 140-142: The current ftyp check on sniffed_bytes (using
sniffed_bytes[4..8] == b"ftyp") is too broad and treats any ISO BMFF as
AllowedAudioMime::M4a; update the detection in the same code path that returns
AllowedAudioMime::M4a to either (1) parse the ftyp box further and verify the
major_brand or any compatible_brand (bytes after the 8-byte header) contains an
audio-specific brand (e.g., "M4A " / "M4B " or other known audio brands) before
returning AllowedAudioMime::M4a, or (2) if brands are absent/unreliable, parse
the MP4 boxes to locate the moov->trak->mdia->hdlr box and ensure the
handler_type equals "soun" (audio) before accepting as M4a; apply this check
where sniffed_bytes and AllowedAudioMime::M4a are referenced so non-audio MP4
containers are rejected.
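
Option (1) can be sketched as follows. An `ftyp` box is laid out as a 4-byte big-endian size, the `ftyp` tag, a 4-byte major brand, a 4-byte minor version, and then zero or more 4-byte compatible brands. The `AUDIO_BRANDS` list and the `is_m4a_ftyp` name are assumptions for illustration, not an exhaustive brand registry.

```rust
/// Brands treated as audio-only in this sketch (assumed, not exhaustive).
const AUDIO_BRANDS: [&[u8]; 2] = [b"M4A ", b"M4B "];

/// Accepts an ISO BMFF prefix only when the major brand or one of the
/// compatible brands in the `ftyp` box is an audio brand.
fn is_m4a_ftyp(bytes: &[u8]) -> bool {
    // Need size + tag + major brand + minor version at minimum.
    if bytes.len() < 16 || &bytes[4..8] != b"ftyp" {
        return false;
    }
    let box_size = u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize;
    // Size 0 means "to end of data"; otherwise stay inside both the
    // declared box and the sniffed prefix.
    let end = if box_size == 0 {
        bytes.len()
    } else {
        box_size.min(bytes.len())
    };
    if end < 16 {
        return false;
    }
    let major = &bytes[8..12];
    let compatible = bytes[16..end].chunks_exact(4);
    std::iter::once(major)
        .chain(compatible)
        .any(|brand| AUDIO_BRANDS.iter().any(|audio| *audio == brand))
}
```

A brand check is cheap because it only needs the sniffed prefix; the `hdlr`-based option (2) is stricter but requires walking boxes that may sit far beyond the first few bytes.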

In `@clients/agent-runtime/src/channels/mod.rs`:
- Around line 1173-1175: The message for AudioRejectionReason::TooLong
incorrectly computes the displayed limit using integer division
(config.audio.max_audio_duration_secs / 60) which yields 0 for sub-minute limits
and underreports others; update the formatting in the match arm for
audio_media::AudioRejectionReason::TooLong to compute minutes and seconds from
config.audio.max_audio_duration_secs (or round up minutes when you prefer
minute-only display), choose pluralization ("minute"/"minutes",
"second"/"seconds") accordingly, and emit either "X minutes Y seconds" for
sub-minute and mixed values or "N minute(s)" when exact; ensure you reference
config.audio.max_audio_duration_secs and the AudioRejectionReason::TooLong
branch when making the change.
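
The duration formatting described above could be sketched like this. `format_duration_limit` is a hypothetical helper, and the `u64` parameter type is an assumption about `max_audio_duration_secs`.

```rust
/// Formats a limit in seconds as "N minutes", "N seconds", or
/// "N minutes M seconds", with singular units where the count is 1.
fn format_duration_limit(total_secs: u64) -> String {
    let minutes = total_secs / 60;
    let seconds = total_secs % 60;
    // Appends "s" to the unit unless the count is exactly one.
    let unit = |n: u64, singular: &str| {
        if n == 1 {
            format!("{n} {singular}")
        } else {
            format!("{n} {singular}s")
        }
    };
    match (minutes, seconds) {
        (0, s) => unit(s, "second"),
        (m, 0) => unit(m, "minute"),
        (m, s) => format!("{} {}", unit(m, "minute"), unit(s, "second")),
    }
}
```

With plain integer division, a 30-second limit would be reported as 0 minutes and a 90-second limit as 1 minute; splitting into minutes and seconds avoids both.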

---

Duplicate comments:
In `@clients/agent-runtime/src/channels/audio_media.rs`:
- Around line 247-254: The to_context_string method currently injects internal
trace metadata (byte_len and sha256) into model-facing history; modify
to_context_string (used by build_history) to only include modality (mime),
duration, transcription, and caption in the produced string, removing byte_len,
sha256, and any related prefix_len logic so the returned context string is
concise and safe for replay into chat history.
- Around line 120-129: The MP3 sniffing still accepts frames with the reserved
MPEG version id (0b01) because only layer bits were checked; update the guard in
the sniffing branch that inspects sniffed_bytes so it also rejects version bits
== 0b01 (i.e., check the MPEG version bits in sniffed_bytes[1] and skip/return
non-MP3 when they equal the reserved value) before returning
AllowedAudioMime::Mp3, and add a regression test (e.g., feed bytes like 0xFF
0xEA/0xEC/0xEE) to assert these are not classified as AllowedAudioMime::Mp3.

In `@clients/agent-runtime/src/channels/mod.rs`:
- Around line 640-689: The timeout/deadline must be created before doing audio
work so slow gating, staging or transcription can't bypass
CHANNEL_MESSAGE_TIMEOUT_SECS; move creation of the deadline/timeout (currently
using started_at + CHANNEL_MESSAGE_TIMEOUT_SECS and tokio::time::timeout(...))
to immediately after computing session_id/started_at, then thread the remaining
time/deadline into the audio helpers (gate_audio_config, gate_and_stage_audio,
transcribe_audio) or wrap those calls with a timeout using the precomputed
remaining Duration so they honor the same CHANNEL_MESSAGE_TIMEOUT_SECS budget.
- Around line 1479-1483: The injected transcript currently prepends a prefix
based on caption_text (the injected_text construction) which changes how
downstream memory and providers treat audio vs typed input; remove those
prefixes and inject the transcript verbatim (use the trimmed transcript string
directly) so provenance remains in AudioHistoryMeta and the transcript is
equivalent to typed input; update the code that builds injected_text in
channels/mod.rs (the block referencing caption_text and injected_text) to assign
the plain trimmed text without "[Audio transcription]" or "[Voice message
transcription]" prefixes.
- Around line 1102-1125: The match in audio_rejection_to_ingress_reason
collapses audio_media::AudioRejectionReason::MultipleAudioParts into
AudioIngressReason::SystemError; introduce a dedicated observability variant
(e.g., AudioIngressReason::MultipleAudioParts) in the observability enum and
update audio_rejection_to_ingress_reason to map
audio_media::AudioRejectionReason::MultipleAudioParts to that new variant
instead of SystemError so validation rejections are distinguishable in
dashboards and alerts; ensure any serialization/usage sites of
AudioIngressReason handle the new variant.
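
The three `channels/mod.rs` suggestions above can be sketched together as follows. This is an illustration under stated assumptions, not the project's code: the timeout value of 300 seconds, the helper names `remaining_budget` and `build_injected_text`, and every enum variant other than `MultipleAudioParts` and `SystemError` are hypothetical.

```rust
use std::time::{Duration, Instant};

// Assumed value for illustration; the real constant is defined in channels/mod.rs.
const CHANNEL_MESSAGE_TIMEOUT_SECS: u64 = 300;

/// Time left in the per-message budget, or `None` once the deadline has passed.
/// The deadline is derived from `started_at`, so audio work counts against it.
fn remaining_budget(started_at: Instant, now: Instant) -> Option<Duration> {
    let deadline = started_at + Duration::from_secs(CHANNEL_MESSAGE_TIMEOUT_SECS);
    deadline
        .checked_duration_since(now)
        .filter(|left| !left.is_zero())
}

/// Transcript injected verbatim: trimmed, with no marker prefix.
fn build_injected_text(transcript: &str) -> &str {
    transcript.trim()
}

// Reduced stand-ins for the real enums; only MultipleAudioParts and
// SystemError are named in the review, the other variants are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AudioRejectionReason {
    TooLarge,
    MultipleAudioParts,
    StagingFailed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AudioIngressReason {
    TooLarge,
    MultipleAudioParts,
    SystemError,
}

/// Maps each validation rejection to its own observability reason instead of
/// collapsing MultipleAudioParts into SystemError.
fn audio_rejection_to_ingress_reason(reason: AudioRejectionReason) -> AudioIngressReason {
    match reason {
        AudioRejectionReason::TooLarge => AudioIngressReason::TooLarge,
        AudioRejectionReason::MultipleAudioParts => AudioIngressReason::MultipleAudioParts,
        AudioRejectionReason::StagingFailed => AudioIngressReason::SystemError,
    }
}
```

In an async handler, the `Duration` returned by `remaining_budget` could be passed to `tokio::time::timeout` around each audio helper so that gating, staging, and transcription share one budget.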


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7e814888-7a47-45ec-a21d-880d1e4bf5be

📥 Commits

Reviewing files that changed from the base of the PR and between aa848be and 30ed5eb.

📒 Files selected for processing (2)

clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/mod.rs

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: pr-checks
GitHub Check: sonar
GitHub Check: submit-gradle
GitHub Check: Cloudflare Pages

🧰 Additional context used

📓 Path-based instructions (5)

clients/agent-runtime/src/channels/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Implement Channel trait in src/channels/ with consistent send, listen, and health_check semantics and cover auth/allowlist/health behavior with tests

Files:

clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/mod.rs

clients/agent-runtime/src/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

clients/agent-runtime/src/**/*.rs: Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements
Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency

Files:

clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/mod.rs

clients/agent-runtime/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Run cargo fmt --all -- --check, cargo clippy --all-targets -- -D warnings, and cargo test for code validation, or document which checks were skipped and why

Files:

clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/mod.rs

**/*.rs

⚙️ CodeRabbit configuration file

**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.

Files:

clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/mod.rs

**/*

⚙️ CodeRabbit configuration file

**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.

Files:

clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/mod.rs

🧠 Learnings (3)

📚 Learning: 2026-02-17T12:31:17.076Z

Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests

Applied to files:

clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/mod.rs

📚 Learning: 2026-02-17T12:31:17.076Z

Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/*.rs : Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why

Applied to files:

clients/agent-runtime/src/channels/audio_media.rs

📚 Learning: 2026-02-17T12:31:17.076Z

Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths

Applied to files:

clients/agent-runtime/src/channels/audio_media.rs

coderabbitai · 2026-04-04T06:55:39Z

+    // M4A: bytes 4-7 = "ftyp" (ISO base media file format)
+    if sniffed_bytes.len() >= 8 && &sniffed_bytes[4..8] == b"ftyp" {
+        return Ok(AllowedAudioMime::M4a);


⚠️ Potential issue | 🟠 Major

ftyp alone is too broad for M4A detection.

Any ISO BMFF file with an ftyp box here — including plain MP4 video containers — is accepted as AllowedAudioMime::M4a. That weakens the fail-closed MIME gate and can route non-audio payloads into transcription. Check an audio-specific brand or inspect the track handler before returning M4a.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/channels/audio_media.rs` around lines 140 - 142, The current ftyp check on sniffed_bytes (using sniffed_bytes[4..8] == b"ftyp") is too broad and treats any ISO BMFF as AllowedAudioMime::M4a; update the detection in the same code path that returns AllowedAudioMime::M4a to either (1) parse the ftyp box further and verify the major_brand or any compatible_brand (bytes after the 8-byte header) contains an audio-specific brand (e.g., "M4A " / "M4B " or other known audio brands) before returning AllowedAudioMime::M4a, or (2) if brands are absent/unreliable, parse the MP4 boxes to locate the moov->trak->mdia->hdlr box and ensure the handler_type equals "soun" (audio) before accepting as M4a; apply this check where sniffed_bytes and AllowedAudioMime::M4a are referenced so non-audio MP4 containers are rejected.
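
A sketch of option (1) from the prompt above, assuming the sniffed prefix contains the whole `ftyp` box. The helper name `is_m4a_ftyp` and the brand allowlist are assumptions for illustration; only "M4A " and "M4B " are taken from the review text. The function checks the major brand first, then scans the compatible brands within the declared box size.

```rust
// Assumed allowlist of audio-specific brands (from the review's examples).
const AUDIO_BRANDS: [&[u8]; 2] = [b"M4A ", b"M4B "];

/// Hypothetical helper: true only when the ftyp box advertises an audio brand.
fn is_m4a_ftyp(sniffed_bytes: &[u8]) -> bool {
    // Layout: size (4 bytes, big-endian) | "ftyp" | major_brand (4) |
    // minor_version (4) | compatible_brands (4 bytes each).
    if sniffed_bytes.len() < 12 || &sniffed_bytes[4..8] != b"ftyp" {
        return false;
    }
    let is_audio_brand = |brand: &[u8]| AUDIO_BRANDS.iter().any(|known| *known == brand);

    if is_audio_brand(&sniffed_bytes[8..12]) {
        return true;
    }

    // Never read compatible brands past the declared box size or the buffer.
    let declared = u32::from_be_bytes([
        sniffed_bytes[0],
        sniffed_bytes[1],
        sniffed_bytes[2],
        sniffed_bytes[3],
    ]) as usize;
    let box_end = declared.min(sniffed_bytes.len());

    sniffed_bytes
        .get(16..box_end)
        .map_or(false, |rest| rest.chunks_exact(4).any(|brand| is_audio_brand(brand)))
}
```

With this check, a plain MP4 container whose brands are only `isom`/`mp42` is no longer classified as M4A. Files that omit audio brands would need the `hdlr` inspection from option (2), which this sketch does not attempt.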

- Reject reserved MPEG version bits (0b01) in MP3 magic-byte detection to exclude more invalid frame headers
- Fix TooLong user message for sub-minute durations (was showing 0 min)
- Remove sha256/byte_len from to_context_string() to reduce model tokens
- Add dedicated MultipleAudioParts variant to AudioIngressReason for dashboards/alerts instead of collapsing into SystemError
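
The TooLong fix listed above most likely stems from integer division: a limit below 60 seconds divided by 60 yields 0 minutes. A minimal sketch of a formatter that avoids this, with a hypothetical function name and hypothetical message wording (the real user-facing text is not shown in this PR excerpt):

```rust
/// Hypothetical formatter for the "too long" rejection message.
/// Falls back to seconds when the limit is under one minute.
fn too_long_message(max_duration_secs: u64) -> String {
    let limit = if max_duration_secs < 60 {
        format!("{} sec", max_duration_secs)
    } else {
        let (min, sec) = (max_duration_secs / 60, max_duration_secs % 60);
        if sec == 0 {
            format!("{} min", min)
        } else {
            format!("{} min {} sec", min, sec)
        }
    };
    format!("Audio is too long. Maximum allowed duration is {}.", limit)
}
```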

Check cumulative size against max_audio_bytes before extending the byte buffer in fetch_and_stage_audio to prevent OOM from oversized chunks sent by a malicious upstream server.
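
A sketch of the check described above, assuming the download loop appends chunks to a `Vec<u8>`. The helper `append_chunk` and the error type `AudioTooLarge` are hypothetical names; the real logic lives inside `fetch_and_stage_audio`. The key point is that the cumulative size is validated before the buffer grows, so an oversized chunk is never copied.

```rust
/// Hypothetical error returned when the cumulative size would exceed the limit.
#[derive(Debug, PartialEq, Eq)]
struct AudioTooLarge {
    limit: usize,
}

/// Appends `chunk` only if the resulting length stays within `max_audio_bytes`.
/// The buffer is left untouched on rejection.
fn append_chunk(
    buffer: &mut Vec<u8>,
    chunk: &[u8],
    max_audio_bytes: usize,
) -> Result<(), AudioTooLarge> {
    // checked_add guards against usize overflow on absurd chunk sizes.
    let fits = buffer
        .len()
        .checked_add(chunk.len())
        .map_or(false, |total| total <= max_audio_bytes);
    if !fits {
        return Err(AudioTooLarge { limit: max_audio_bytes });
    }
    buffer.extend_from_slice(chunk);
    Ok(())
}
```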

Replace the monolithic pre-push hook that runs all checks (~2-7 min) with a diff-aware version that only checks stacks with changed files:

- Rust (fmt + clippy + unit tests): only if clients/agent-runtime/ changed
- Kotlin (compile check): only if composeApp/agent-core-kmp/gradle changed
- Web (biome lint): only if clients/web/ changed
- Docs (lychee links): only if .md files changed
- Gradle locks: only if build config changed

Expected improvement: 2-7 minutes → 0-25 seconds for typical pushes. CI remains the comprehensive quality gate.

Escape hatches:

- SKIP_GIT_HOOKS=1 git push (bypass entirely)
- FULL_PRE_PUSH=1 git push (run all checks like before)

Add unit tests for build_transcriber, gate_audio_config edge cases, inject_transcription, TooLong message variants, Telegram voice/audio JSON parsing, and AudioConfig zero-value validation to close the 1.6 percent coverage gap on new code.

sonarqubecloud · 2026-04-04T08:33:46Z

Quality Gate passed

Issues
4 New issues
0 Accepted issues

Measures
0 Security Hotspots
86.7% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

test(runtime): add audio pipeline coverage tests for SonarCloud gate

c2f6341

Add unit tests for audio rejection user messages, ingress reason mapping, config validation, Telegram voice/audio JSON parsing, and pipeline integration to reach ≥80% coverage on new code.

coderabbitai Bot added area:rust labels Apr 3, 2026

coderabbitai Bot reviewed Apr 3, 2026


coderabbitai Bot reviewed Apr 4, 2026


yacosta738 force-pushed the feature/dallay-150-add-audio-input-support-for-agents-telegram-http-gateway-cli branch from 30ed5eb to c78f06d Compare April 4, 2026 07:24

yacosta738 added 3 commits April 4, 2026 09:43

fix(runtime): validate audio chunk size before buffer allocation

0450096

Check cumulative size against max_audio_bytes before extending the byte buffer in fetch_and_stage_audio to prevent OOM from oversized chunks sent by a malicious upstream server.

yacosta738 merged commit 258d3c3 into main Apr 4, 2026
16 checks passed

yacosta738 deleted the feature/dallay-150-add-audio-input-support-for-agents-telegram-http-gateway-cli branch April 4, 2026 08:38

yacosta738 mentioned this pull request Apr 4, 2026

chore: release v1.0.0 #237

Merged

coderabbitai Bot mentioned this pull request Apr 4, 2026

feat: add audio input support for http gateway and cli phase 2 #437

Merged

This was referenced Apr 9, 2026

chore: release v2.0.0 #465

Merged

docs: broken links report #464

Closed

chore: release v2.2.0 #471

Merged

chore: release v3.0.0 #517

Merged

docs: broken links report #531

Closed

dallay-bot Bot mentioned this pull request Apr 19, 2026

docs: broken links report #570

Closed

This was referenced Apr 25, 2026

docs: broken links report #645

Closed

chore: release main #655

Closed

chore: release main #708

Merged

docs: broken links report #718

Closed

docs: broken links report #733

Closed

Copilot AI mentioned this pull request May 1, 2026

fix(docs): exclude npmjs.com from lychee link checker #734

Merged

8 tasks

dallay-bot Bot mentioned this pull request May 3, 2026

docs: broken links report #761

Closed

Copilot AI mentioned this pull request May 3, 2026

fix(docs): correct broken CHANGELOG links from /issues/ to /pull/ for PRs #767

Merged

8 tasks

dallay-bot Bot mentioned this pull request May 6, 2026

docs: broken links report #787

Closed

Copilot AI mentioned this pull request May 6, 2026

docs: fix broken CHANGELOG comparison links #790

Merged

8 tasks

This was referenced May 10, 2026

docs: broken links report #801

Closed

docs: broken links report #804

Closed

dallay-bot Bot mentioned this pull request May 17, 2026

docs: broken links report #811

Open

	- 6 error types: unsupported format, too large, too long, corrupted, transcription failed, no speech
	- 11+ rejection/error categories covering format, size/duration, corruption, transcription/no-speech, and runtime availability/system failures
