
feat(runtime): add audio input support with local transcription for Telegram #413

Merged
yacosta738 merged 7 commits into main from feature/dallay-150-add-audio-input-support-for-agents-telegram-http-gateway-cli on Apr 4, 2026

Conversation

@yacosta738
Contributor

Add audio-to-text input capability so agents can receive voice notes and audio files (OGG/Opus, MP3, WAV, M4A) via Telegram, transcribe them locally using whisper.cpp, and feed the transcription into the normal agent conversation flow.

Key changes:

  • ContentPart::Audio variant for multimodal message parsing
  • Transcriber trait as new runtime extension point for STT engines
  • WhisperCliTranscriber wrapping whisper.cpp CLI with concurrency guard
  • Audio media module: MIME sniffing, size/duration validation, staging
  • 7-step pipeline: parse → gate → fetch → validate → stage → transcribe → inject
  • [audio] TOML config section (disabled by default, fail-closed)
  • AudioIngressEvent observability for all admission/rejection paths
  • StagedAudioGuard RAII cleanup on all exit paths
  • Doctor health checks for whisper binary and model availability
  • Zero new Rust crate dependencies
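
The RAII cleanup bullet can be illustrated with a minimal sketch. `TempFileGuard` below is a hypothetical stand-in for `StagedAudioGuard` (the real type's fields and methods are not shown in this description). It only assumes that the guard owns one staged file and deletes it in `Drop`, so cleanup runs on every exit path, including early returns through `?`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for `StagedAudioGuard`: owns one staged file
/// and removes it when the guard goes out of scope.
pub struct TempFileGuard {
    path: PathBuf,
}

impl TempFileGuard {
    /// Writes `bytes` to `path` and returns a guard that owns the file.
    pub fn stage(path: PathBuf, bytes: &[u8]) -> io::Result<Self> {
        fs::write(&path, bytes)?;
        Ok(Self { path })
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TempFileGuard {
    fn drop(&mut self) {
        // Best-effort cleanup: errors are ignored because `drop` cannot fail.
        let _ = fs::remove_file(&self.path);
    }
}

/// Simulates a pipeline stage that may bail out early; the guard cleans up
/// the staged file on both the success path and the error path.
pub fn process(path: PathBuf, bytes: &[u8]) -> io::Result<usize> {
    let guard = TempFileGuard::stage(path, bytes)?;
    if bytes.is_empty() {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "empty audio"));
    }
    Ok(fs::read(guard.path())?.len())
}
```

Because the guard is a local value, no stage has to remember to delete the file; dropping the guard is the only cleanup path.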

Privacy: all transcription is local (NFR1), no audio data leaves the operator's infrastructure.

Closes #246

@linear

linear Bot commented Apr 3, 2026

@coderabbitai
Contributor

coderabbitai Bot commented Apr 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

This PR implements Phase 1 audio input support for agents, introducing local audio ingestion via Telegram, validation through MIME sniffing and size/duration limits, transcription using a whisper.cpp CLI wrapper, and injection into the existing agent conversation pipeline with observability instrumentation and configuration validation.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Audio Core Media & Validation<br>clients/agent-runtime/src/channels/audio_media.rs | New module defining audio MIME enum with parsing/canonical serialization, magic-byte MIME sniffing (with precedence over declared MIME), rejection reason error types, size/duration validators, and StagedAudio container with SHA-256, sniffed MIME, and RAII cleanup; includes AudioHistoryMeta for conversation context serialization with truncation/formatting. |
| Message Type Extensions<br>clients/agent-runtime/src/channels/traits.rs | Added ContentPart::Audio variant with channel metadata, optional MIME/caption/filename/bytes/duration; added helpers has_audio_parts() and audio_parts() to ChannelMessage; updated text_projection() to include non-empty audio captions. |
| Channel Message Processing<br>clients/agent-runtime/src/channels/mod.rs | Integrated audio pipeline into process_channel_message() before memory enrichment: gating on config/transcriber availability, staging audio (Telegram-only), MIME/size/duration validation, transcription with Transcriber trait, and injection of transcription as text; added observability event emission and RAII cleanup guard; extended handle_successful_response to store audio metadata via ChatMessage::user_with_audio()/user_with_media(); updated build_history() to include audio context. |
| Telegram Audio Integration<br>clients/agent-runtime/src/channels/telegram.rs | Extended parsing to recognize voice and audio message fields, emit ContentPart::Audio; added TelegramChannel::fetch_and_stage_audio() method performing pre-flight duration/size validation, HTTP download with streaming size enforcement, MIME sniffing, SHA-256 computation, and atomic temp-file staging; updated error-handling control flow for unauthorized audio messages. |
| Transcription Abstraction<br>clients/agent-runtime/src/transcription/traits.rs, clients/agent-runtime/src/transcription/whisper_cli.rs | New Transcriber trait with async transcribe() and health_check() methods; WhisperCliTranscriber implementation using subprocess spawning with configurable binary/model/language, semaphore-based concurrency limiting, per-call timeout, stderr inspection for error classification (Corrupted vs TranscriptionFailed), and [BLANK_AUDIO] marker filtering; includes resolve_model_path() for model file discovery. |
| Audio Configuration & Validation<br>clients/agent-runtime/src/config/schema.rs, clients/agent-runtime/src/config/mod.rs | New AudioConfig struct with enable flag, allowed-channel allowlist, size/duration ceilings, transcription model/language, whisper binary path, concurrency, and timeout; wired into Config with serde defaults; validation enforces nonzero bounds and phase-1-channel warnings when enabled; re-exported in public API. |
| Observability Audio Events<br>clients/agent-runtime/src/observability/traits.rs, clients/agent-runtime/src/observability/mod.rs, clients/agent-runtime/src/observability/log.rs, clients/agent-runtime/src/observability/otel.rs, clients/agent-runtime/src/observability/prometheus.rs | Added AudioIngressOutcome (Admitted/Rejected), AudioIngressReason enum (11 variants with snake_case Display), and AudioIngressEvent struct; extended ObserverEvent with AudioIngress variant; implemented on_audio_ingress() trait method with default forward to record_event(); added logging/OTEL/Prometheus metrics with channel, outcome, reason labels. |
| Provider Message & History Integration<br>clients/agent-runtime/src/providers/traits.rs | Added audio_metadata: `Option<Vec<AudioHistoryMeta>>` field to ChatMessage (with Serde default and skip_serializing_if); added constructors user_with_audio() and user_with_media() that conditionally set metadata when vectors are non-empty; updated existing constructors to initialize audio_metadata: None. |
| Provider Test Fixtures<br>clients/agent-runtime/src/providers/anthropic.rs, compatible.rs, copilot.rs, openrouter.rs, router.rs | Updated ChatMessage test constructions across unit/integration tests to include audio_metadata: None field, aligning fixture initialization with updated struct shape. |
| Test & Utility Updates<br>clients/agent-runtime/src/channels/discord.rs, clients/agent-runtime/src/channels/whatsapp.rs | Generalized panic-match arms from specific ContentPart variants to wildcard _ patterns, improving test robustness to new variants without behavior change. |
| Runtime Startup & Diagnostics<br>clients/agent-runtime/src/main.rs, clients/agent-runtime/src/lib.rs, clients/agent-runtime/src/doctor/mod.rs | Added transcription module declaration; extended run(config) in doctor to call check_audio_health(), which verifies whisper binary with --help and resolves transcription model file presence when audio is enabled; logs skipped checks when disabled. |
| Configuration Wizard<br>clients/agent-runtime/src/onboard/wizard.rs | Updated run_wizard() and run_quick_setup() to initialize audio: AudioConfig::default() in Config construction. |
| Design & Specification Documentation<br>openspec/changes/archive/2026-04-03-audio-input-support/*, openspec/specs/audio-input/spec.md | Added comprehensive design, exploration, proposal, specification, task plan, verification and archive reports documenting Phase 1 audio ingestion, transcription, integration points, error taxonomy, privacy/concurrency constraints, and compliance matrix. |

Sequence Diagram(s)

sequenceDiagram
    participant User as Telegram User
    participant Telegram as Telegram Channel
    participant Stage as Staging (Temp File)
    participant Transcriber as Whisper CLI
    participant Provider as LLM Provider
    participant Agent as Agent Loop

    User->>Telegram: Send voice/audio message
    Telegram->>Telegram: Parse ContentPart::Audio
    Telegram->>Telegram: Gate on config/transcriber
    Telegram->>Telegram: Fetch file from Telegram API
    Telegram->>Stage: Validate MIME (magic bytes)<br/>Check size/duration<br/>Write temp file with SHA-256
    
    rect rgba(100, 150, 200, 0.5)
    Note over Stage,Transcriber: Audio Pipeline
    Stage->>Transcriber: Pass StagedAudio
    Transcriber->>Transcriber: Acquire semaphore permit<br/>(concurrency control)
    Transcriber->>Transcriber: Spawn whisper subprocess<br/>with model/timeout
    Transcriber->>Transcriber: Parse stdout<br/>Filter [BLANK_AUDIO]<br/>Guard empty transcription
    Transcriber-->>Stage: Return TranscriptionResult
    end
    
    Stage->>Telegram: Emit AudioIngressEvent<br/>(Admitted/Rejected)
    Stage->>Stage: RAII cleanup temp file
    
    Telegram->>Telegram: Inject transcription<br/>Replace Audio with Text
    Telegram->>Telegram: Build AudioHistoryMeta<br/>for conversation context
    Telegram->>Provider: Send ChatMessage<br/>with audio_metadata
    Provider->>Agent: Include audio history context
    Agent->>Provider: Generate response
    Provider-->>Telegram: Response text
    Telegram-->>User: Send text response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Rationale: Heterogeneous changes spanning new abstraction layers (Transcriber trait, audio media module with MIME sniffing), deep integration into the message processing pipeline with careful ordering (gating before memory enrichment), RAII cleanup guarantees, subprocess concurrency control with semaphores, Telegram-specific fetch/stage implementation, and widespread observability instrumentation. Requires careful validation of error propagation paths, transcription rejection handling, cleanup side effects, and concurrency correctness.

Possibly related PRs

Suggested reviewers

  • yuniel-acosta
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ⚠️ Warning | The title uses conventional commit format (feat prefix) and clearly summarizes the main change: adding audio input support with local transcription for Telegram. However, it exceeds the 72-character limit at 76 characters. | Reduce title to 72 characters or fewer, e.g., 'feat(runtime): add audio input support for Telegram' (51 chars) or similar. |
| Description check | ❓ Inconclusive | The PR description covers key changes, motivation, and links the related issue (#246). However, it does not follow the provided template structure (missing discrete sections for Related Issues, Tested Information, Documentation Impact, Breaking Changes, and Checklist). | Restructure the description to match the template: add discrete sections for Related Issues, Tested Information, Documentation Impact, Breaking Changes, and a completed Checklist. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Linked Issues check | ✅ Passed | The PR implements all primary coding requirements from issue #246: audio parsing (ContentPart::Audio), local transcription (Transcriber trait + WhisperCliTranscriber), MIME/size/duration validation, staging with RAII cleanup, injection into the message flow, observability (AudioIngressEvent), error taxonomy, and privacy constraints (local-only processing). |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to issue #246 objectives: audio pipeline (parse/gate/fetch/validate/stage/transcribe/inject), configuration, observability, Telegram integration, and doctor health checks. No unrelated refactoring, feature creep, or tangential changes detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 92.45% which is sufficient. The required threshold is 80.00%. |


@github-actions
Contributor

github-actions Bot commented Apr 3, 2026

✅ Contributor Report

User: @yacosta738
Status: Passed (12/13 metrics passed)

| Metric | Description | Value | Threshold | Status |
|---|---|---|---|---|
| PR Merge Rate | PRs merged vs closed | 89% | >= 30% | |
| Repo Quality | Repos with ≥100 stars | 0 | >= 0 | |
| Positive Reactions | Positive reactions received | 10 | >= 1 | |
| Negative Reactions | Negative reactions received | 0 | <= 5 | |
| Account Age | GitHub account age | 3080 days | >= 30 days | |
| Activity Consistency | Regular activity over time | 108% | >= 0% | |
| Issue Engagement | Issues with community engagement | 0 | >= 0 | |
| Code Reviews | Code reviews given to others | 510 | >= 0 | |
| Merger Diversity | Unique maintainers who merged PRs | 2 | >= 0 | |
| Repo History Merge Rate | Merge rate in this repo | 91% | >= 0% | |
| Repo History Min PRs | Previous PRs in this repo | 195 | >= 0 | |
| Profile Completeness | Profile richness (bio, followers) | 90 | >= 0 | |
| Suspicious Patterns | Spam-like activity detection | 1 | N/A | |

Contributor Report evaluates based on public GitHub activity. Analysis period: 2025-04-04 to 2026-04-04

Add unit tests for audio rejection user messages, ingress reason
mapping, config validation, Telegram voice/audio JSON parsing, and
pipeline integration to reach ≥80% coverage on new code.
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Apr 3, 2026

Deploying corvus with Cloudflare Pages

Latest commit: b4f304f
Status: ✅  Deploy successful!
Preview URL: https://c785439c.corvus-42x.pages.dev
Branch Preview URL: https://feature-dallay-150-add-audio.corvus-42x.pages.dev


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 27

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
clients/agent-runtime/src/channels/mod.rs (2)

619-688: ⚠️ Potential issue | 🟠 Major

Put the audio stages under the per-turn timeout.

This block now does fetch/stage/transcribe work before the only timeout in the handler. A slow Telegram download or wedged whisper process can hold a worker past CHANNEL_MESSAGE_TIMEOUT_SECS and never hit the timeout reply path. Wrap the whole turn, or pass a remaining budget into the audio stages.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/channels/mod.rs` around lines 619 - 688,
process_channel_message currently performs the audio pipeline
(gate_audio_config, gate_and_stage_audio, transcribe_audio,
inject_transcription) before the per-turn timeout, allowing slow
downloads/transcription to exceed CHANNEL_MESSAGE_TIMEOUT_SECS; move the entire
audio stages under the per-turn timeout boundary (or compute remaining_budget
and pass it into gate_audio_config/gate_and_stage_audio/transcribe_audio) so
that these calls are canceled when the channel turn times out, and ensure any
temp resources from audio_guard are still cleaned up on timeout.
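
The "remaining budget" option can be sketched as follows. This is a std-only illustration, not the runtime's code: `TurnBudget`, `BudgetExhausted`, and `run_stages` are hypothetical names, and the real handler is async, where wrapping the whole turn in `tokio::time::timeout` is the other option the comment names. The sketch only shows how one deadline can be shared so that every stage sees the time that is actually left.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-turn budget shared by every stage of a turn.
#[derive(Clone, Copy, Debug)]
pub struct TurnBudget {
    deadline: Instant,
}

#[derive(Debug, PartialEq, Eq)]
pub struct BudgetExhausted;

impl TurnBudget {
    pub fn new(total: Duration) -> Self {
        Self { deadline: Instant::now() + total }
    }

    /// Time left before the turn deadline, or an error once it has passed.
    pub fn remaining(&self) -> Result<Duration, BudgetExhausted> {
        let left = self.deadline.saturating_duration_since(Instant::now());
        if left.is_zero() {
            Err(BudgetExhausted)
        } else {
            Ok(left)
        }
    }
}

/// Runs stages in order; each stage receives the time it may still spend.
/// Returns the number of stages that were started before the budget ran out.
pub fn run_stages<F>(budget: TurnBudget, stages: &mut [F]) -> Result<usize, BudgetExhausted>
where
    F: FnMut(Duration),
{
    let mut completed = 0;
    for stage in stages.iter_mut() {
        let left = budget.remaining()?;
        stage(left);
        completed += 1;
    }
    Ok(completed)
}
```

In the real pipeline each stage would presumably use the `Duration` it receives as its own download or subprocess timeout, so a slow fetch shortens the time available to transcription instead of extending the turn.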

2705-2721: ⚠️ Potential issue | 🔴 Critical

Instantiate the transcriber in both runtime constructors.

gate_audio_config() rejects whenever ctx.transcriber is empty, but both production ChannelRuntimeContext builders still hard-code transcriber: None. With [audio] enabled, every audio turn will fail as TranscriberUnavailable, so the new feature never actually admits audio in either start_channels() or spawn_runtime_handle().

Also applies to: 2784-2800

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/channels/mod.rs` around lines 2705 - 2721, The
ChannelRuntimeContext currently sets transcriber: None which causes
gate_audio_config() to reject audio; fix by constructing the transcriber before
building runtime_ctx and passing it into ChannelRuntimeContext as transcriber:
Some(...) instead of None. Locate where runtime_ctx is created (the
ChannelRuntimeContext instantiation in start_channels() and the analogous one in
spawn_runtime_handle()) and call the existing audio/transcriber factory (e.g.,
the module/function used to create transcribers in this crate—invoke it with the
runtime/config) to produce a transcriber instance, then set transcriber:
Some(transcriber) in both constructors so gate_audio_config() sees a transcriber
present. Ensure any errors from creating the transcriber are handled/propagated
consistent with existing error handling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@clients/agent-runtime/src/channels/audio_media.rs`:
- Around line 118-124: The MP3 magic-byte check in audio_media.rs is too strict;
update the second-byte test to accept any value with the top three bits set (the
MPEG sync continuation) instead of only 0xFB/0xF3/0xF2. Replace the explicit
equality checks on sniffed_bytes[1] with a mask test like (sniffed_bytes[1] &
0xE0) == 0xE0 in the same if that returns AllowedAudioMime::Mp3 so valid MP3
frame headers aren’t rejected.
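
A minimal sketch of the mask test described above, assuming the first byte is already compared against `0xFF` (the MPEG frame sync is 11 set bits: one full byte plus the top three bits of the next). The function names are hypothetical, and `strict_check` only mirrors the three values the comment quotes.

```rust
/// Returns true when the first two bytes look like an MPEG audio frame sync:
/// 0xFF followed by a byte whose top three bits are set.
pub fn looks_like_mpeg_frame(sniffed_bytes: &[u8]) -> bool {
    matches!(sniffed_bytes, [0xFF, second, ..] if (second & 0xE0) == 0xE0)
}

/// The stricter comparison quoted in the review comment, kept for contrast.
pub fn strict_check(sniffed_bytes: &[u8]) -> bool {
    matches!(sniffed_bytes, [0xFF, 0xFB | 0xF3 | 0xF2, ..])
}
```

For example, a header starting `0xFF 0xFA` passes the mask test but fails the strict one. The mask is also more permissive: it matches other MPEG audio layers that share the same sync pattern, so later validation still has to confirm the stream.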

In `@clients/agent-runtime/src/channels/discord.rs`:
- Line 955: The panic message in the wildcard match arm (currently `_ =>
panic!("expected Image, got Text")`) is inaccurate; change the arm to bind the
unmatched value (e.g., `_` -> `other`) and update the panic to either a generic
message like "expected Image, got non-Image variant" or include the actual
variant via formatting (e.g., panic!("expected Image, got {:?}", other)) so the
failure text matches the wildcard behavior.
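
One way to write the suggested arm, shown on a reduced, hypothetical `ContentPart` (the real enum has different fields). Binding the unmatched value lets the panic message report the actual variant.

```rust
/// Hypothetical, reduced version of `ContentPart` for illustration only.
#[derive(Debug)]
pub enum ContentPart {
    Text(String),
    Image { url: String },
    Audio { file_id: String },
}

/// Returns the image URL, or panics with the variant that was actually found.
pub fn expect_image(part: &ContentPart) -> &str {
    match part {
        ContentPart::Image { url } => url,
        other => panic!("expected Image, got {:?}", other),
    }
}
```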

In `@clients/agent-runtime/src/channels/mod.rs`:
- Around line 1271-1283: The code currently maps the "more than one audio part"
validation to audio_media::AudioRejectionReason::SystemError; update the
reject_audio_turn call in the audio_parts.len() > 1 branch to use a specific
rejection reason (e.g.
audio_media::AudioRejectionReason::TooManyAudioAttachments or a clearly named
variant like MultipleAudioAttachments). If that enum variant does not exist, add
it to audio_media::AudioRejectionReason and use it here so the rejection path in
reject_audio_turn, telemetry, and any user-facing messages can distinguish
validation failures from internal system errors. Ensure you reference the
audio_parts check, the reject_audio_turn call, and replace the SystemError enum
usage with the new specific variant.
- Around line 670-681: The emitted transcription latency is using
tx.duration_secs (clip length) instead of the actual processing time; update the
code that calls emit_audio_ingress in the loop over audio_guard and
transcriptions to use the TranscriptionResult.processing_ms (preserved from
transcribe_audio()) converted to ms instead of
tx.duration_secs.map(duration_f64_to_ms); ensure TranscriptionResult returned by
transcribe_audio() includes processing_ms and that the other similar
emit_audio_ingress usage (the block around lines 1376-1405) is updated the same
way so both places report real transcription latency.

In `@clients/agent-runtime/src/channels/telegram.rs`:
- Around line 1824-1833: The current staging uses a predictable temp path built
from sha256 (temp_path) and writes via tokio::fs::write, which risks races and
symlink/clobber attacks; change the logic in the block that constructs temp_path
and calls tokio::fs::write (and returns
audio_media::AudioRejectionReason::FetchFailed on error) to create a secure,
unique temp file (use std::fs::OpenOptions with create_new or a NamedTempFile
equivalent), write the bytes via the file handle rather than atomically
overwriting a predictable path, and then use/rename that file to the final name
or return its path; retain the same error mapping but ensure failures are logged
with the file handle/path for debugging.
- Around line 63-107: The audio/voice parsing now creates ContentPart::Audio
even when derive_text_projection() returns None, which means
handle_unauthorized_message() bails out without sending an approval prompt;
update the unauthorized-path logic (where derive_text_projection(),
handle_unauthorized_message(), and send_unauthorized_notification() are used) to
detect media-only updates (parts contains audio but text projection is None) and
call send_unauthorized_notification() so unapproved senders still get the
notification; apply the same fix for the other audio-handling blocks (the voice
and audio branches that push ContentPart::Audio) and add a regression test that
posts an unauthorized audio-only update and asserts
send_unauthorized_notification() (or the channel’s outbound notification) was
invoked.
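
For the staging item, a std-only sketch of exclusive file creation. `stage_exclusive` is a hypothetical helper, and the real code is async, so it would need an equivalent tokio call. The point is only that `create_new(true)` makes the open fail when anything, including a symlink, already exists at the path, where a plain write would overwrite it.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Writes `bytes` to `path`, failing if anything already exists there.
pub fn stage_exclusive(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true) // atomic "create only if absent"; never clobbers an existing entry
        .open(path)?;
    file.write_all(bytes)?;
    // Flush data and metadata to disk before the path is handed to another process.
    file.sync_all()
}
```

Exclusive creation does not make the name unpredictable. A random suffix or a temp-file helper, as the comment suggests, is still needed so that two downloads with the same hash do not collide.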

In `@clients/agent-runtime/src/config/schema.rs`:
- Around line 304-308: Replace the duplicated hard-coded constants in schema.rs
with the shared definitions from the audio_media module: remove the local
MAX_AUDIO_BYTES_CEILING and MAX_AUDIO_DURATION_SECS_CEILING declarations and
import the constants from channels::audio_media (e.g. use
crate::channels::audio_media::{MAX_AUDIO_BYTES_CEILING,
MAX_AUDIO_DURATION_SECS_CEILING};), so startup validation uses the same values
as runtime media validation (refer to the constants named
MAX_AUDIO_BYTES_CEILING and MAX_AUDIO_DURATION_SECS_CEILING).
- Around line 336-341: Add validation to reject zero for the transcription
controls: ensure max_concurrent_transcriptions and transcription_timeout_secs
are > 0 during config validation (e.g., in the Config/Schema validation method
in clients/agent-runtime/src/config/schema.rs). If either field equals 0, return
a clear startup error (with context naming max_concurrent_transcriptions or
transcription_timeout_secs) rather than allowing runtime operation; use the
existing validation/error pattern used elsewhere in the file (and reference
default_max_concurrent_transcriptions and default_transcription_timeout_secs
when documenting/recovering).
- Around line 313-342: The AudioConfig struct currently allows
unknown/misspelled TOML keys which silently fall back to defaults; update the
AudioConfig definition to add the serde attribute #[serde(deny_unknown_fields)]
so deserialization fails on unknown fields (matching the parent Config behavior)
— locate the AudioConfig struct in schema.rs and add that attribute above its
#[derive(...)] line.
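
The nonzero check from the second item could look like the sketch below. The struct is a hypothetical two-field subset of `AudioConfig`, and the `Result<(), String>` error type is an assumption; the real validation would follow whatever error pattern `schema.rs` already uses.

```rust
/// Hypothetical subset of `AudioConfig`: only the two fields named in the review.
#[derive(Debug, Clone)]
pub struct AudioConfig {
    pub max_concurrent_transcriptions: usize,
    pub transcription_timeout_secs: u64,
}

impl AudioConfig {
    /// Rejects zero values at startup and names the offending key in the error.
    pub fn validate(&self) -> Result<(), String> {
        if self.max_concurrent_transcriptions == 0 {
            return Err("audio.max_concurrent_transcriptions must be greater than 0".to_string());
        }
        if self.transcription_timeout_secs == 0 {
            return Err("audio.transcription_timeout_secs must be greater than 0".to_string());
        }
        Ok(())
    }
}
```

Failing at startup matters because, at runtime, a zero concurrency limit would presumably leave the semaphore with no permits to hand out, and a zero timeout would expire before whisper produces any output.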

In `@clients/agent-runtime/src/doctor/mod.rs`:
- Around line 944-971: The test audio_health_pass_model_exists duplicates the
production model-check logic instead of exercising check_audio_health(), so
replace the inline existence checks with a call to check_audio_health() (or the
specific helper it uses) to ensure the real path is tested; set up the TempDir
and model file as before, then call check_audio_health() (or the exported
function that returns Vec<DiagItem>), and assert on the returned items' length,
Severity::Ok and message contains "found" to validate the real logic (reference
test name audio_health_pass_model_exists, function check_audio_health, types
DiagItem and Severity).
- Around line 659-677: The check currently treats any existing filesystem entry
as a valid whisper model; change the logic to verify the resolved model_path is
a regular file (e.g., use model_path.is_file() or metadata().is_file()) before
pushing DiagItem::ok for the transcription model (ac.transcription_model); if
not a file, push DiagItem::error with the same contextual message referencing
model_path.display() so directories don't produce false-positive doctor results.
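
A sketch of the regular-file check from the second item. `DiagItem` and `Severity` are reduced, hypothetical versions of the doctor types, and `check_model_file` is an illustrative helper, not the body of `check_audio_health()`.

```rust
use std::path::Path;

/// Hypothetical, reduced versions of the doctor's diagnostic types.
#[derive(Debug, PartialEq, Eq)]
pub enum Severity {
    Ok,
    Error,
}

#[derive(Debug)]
pub struct DiagItem {
    pub severity: Severity,
    pub message: String,
}

/// Reports Ok only when `model_path` resolves to a regular file.
pub fn check_model_file(model_path: &Path) -> DiagItem {
    if model_path.is_file() {
        DiagItem {
            severity: Severity::Ok,
            message: format!("transcription model found: {}", model_path.display()),
        }
    } else {
        // Covers both a missing path and a directory at that path.
        DiagItem {
            severity: Severity::Error,
            message: format!(
                "transcription model is missing or not a regular file: {}",
                model_path.display()
            ),
        }
    }
}
```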

In `@clients/agent-runtime/src/observability/otel.rs`:
- Around line 201-202: ObserverEvent::AudioIngress is currently ignored in the
OTEL backend (the match arm with ObserverEvent::AudioIngress(_) is a no-op), so
audio admit/reject telemetry and reasons are not recorded; update the OTEL
handler in otel.rs to record a metric and associated attributes for audio
ingress events instead of silently dropping them—extract the admit/reject status
and reason from the AudioIngress payload and use the existing OTEL metric
recorder (same subsystem used for other ObserverEvent arms) to emit a counter or
histogram and set attributes like "audio.admit" (bool/string) and "audio.reason"
(string) so OTEL deployments capture admit/reject counts and reasons.

In `@clients/agent-runtime/src/observability/prometheus.rs`:
- Around line 190-191: The match arm currently ignores
ObserverEvent::AudioIngress which prevents audio admit/reject/failure metrics
from being exposed; update the handler (the match over ObserverEvent in
prometheus.rs) to process ObserverEvent::AudioIngress instead of discarding it,
and map its inner variants to the same Prometheus counters used for other
ingress types (increment the appropriate admit/reject/failure metrics and set
labels/timestamps as done for the existing ingress events), using the
ObserverEvent::AudioIngress symbol to locate the code and mirror the logic used
for the other ingress-related arms.

In `@clients/agent-runtime/src/providers/traits.rs`:
- Around line 54-64: Add symmetric serde tests for the new audio_metadata field
matching the existing image_metadata tests: write a missing-field
deserialization test that deserializes JSON lacking "audio_metadata" into the
same struct used in traits.rs (exercise user_with_audio / the message struct)
and asserts audio_metadata == None, and write a skip-serialize-none test that
constructs the struct with audio_metadata = None, serializes it to JSON, and
asserts the "audio_metadata" key is not present; place these tests alongside the
existing image_metadata serde tests so they run in the same test module.
- Around line 54-64: The PR currently builds mixed-media turns by calling
user_with_audio(...) and then mutating image_metadata later, which creates
partial-state; add a single constructor (e.g., user_with_media) that accepts
content plus both image_metadata and audio_metadata (or Option-wrapped Vecs) and
returns Self with role, content, image_metadata and audio_metadata set
atomically; update callers that currently call user_with_audio and then set
image_metadata (the code mutating image_metadata) to call user_with_media
instead; optionally keep thin helpers user_with_audio and user_with_image that
forward to user_with_media to preserve existing call-sites.
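
The single-constructor idea can be sketched as below. The struct is a reduced, hypothetical `ChatMessage` without serde attributes, `ImageHistoryMeta` is an assumed name for the image metadata type, and the field sets of both metadata types are invented for illustration.

```rust
/// Hypothetical metadata types; the real ones carry more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct ImageHistoryMeta {
    pub mime: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct AudioHistoryMeta {
    pub mime: String,
    pub duration_secs: Option<f64>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ChatMessage {
    pub role: String,
    pub content: String,
    pub image_metadata: Option<Vec<ImageHistoryMeta>>,
    pub audio_metadata: Option<Vec<AudioHistoryMeta>>,
}

impl ChatMessage {
    /// Sets both metadata fields in one step; empty vectors become `None`.
    pub fn user_with_media(
        content: impl Into<String>,
        images: Vec<ImageHistoryMeta>,
        audio: Vec<AudioHistoryMeta>,
    ) -> Self {
        Self {
            role: "user".to_string(),
            content: content.into(),
            image_metadata: (!images.is_empty()).then_some(images),
            audio_metadata: (!audio.is_empty()).then_some(audio),
        }
    }

    /// Thin helper that forwards to `user_with_media`.
    pub fn user_with_audio(content: impl Into<String>, audio: Vec<AudioHistoryMeta>) -> Self {
        Self::user_with_media(content, Vec::new(), audio)
    }
}
```

Callers never observe a message that has audio metadata set while image metadata is still pending, which is the partial state the comment warns about.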

In `@clients/agent-runtime/src/transcription/whisper_cli.rs`:
- Around line 150-156: The current check in the whisper-cli subprocess handling
(the branch that tests output.status.success() in the function handling
transcription) treats any non-zero exit as AudioRejectionReason::Corrupted;
change this so the default error returned for non-zero exits is a
transcription/system error (e.g., AudioRejectionReason::TranscriptionFailed or
SystemError) and only map to AudioRejectionReason::Corrupted when stderr
contains clear media-decode/input failure signatures (detect keywords like
"decode", "unsupported format", "invalid data", "couldn't parse", "ffmpeg",
"libav", or similar). Update the error path that logs via tracing::error! to
still include stderr and exit code, and perform a small pattern match on the
stderr string to switch to Corrupted only when those decode-related tokens are
present; otherwise return the TranscriptionFailed/SystemError variant.
- Around line 73-89: The function resolve_model_path currently returns the
per-user path whenever a home directory exists without checking whether the file
actually exists; change it to construct the user path (using
user_dirs.home_dir() and the filename), test that path.exists(), and only return
it if present—otherwise fall back to the system path (e.g.,
PathBuf::from(format!("/usr/local/share/whisper/{filename}"))). Ensure you still
keep the existing fallback when directories::UserDirs::new() is None. Use the
same local variables (filename, user_dirs, home_dir) so callers of
resolve_model_path need no changes.
- Around line 126-147: The spawned whisper-cli child is left running on timeout
because the wait future is dropped; fix by enabling kill-on-drop on the Command
before spawning (call cmd.kill_on_drop(true)) or by ensuring the Child is
explicitly killed (call child.kill().await and wait for its exit) in the timeout
branch and any early-return error branches; update the code around cmd.spawn()
and the timeout match (the variables cmd, child, self.timeout, and the timeout
Err(_) branch) so the child is terminated before returning and the semaphore
permit is only released after the child is killed/awaited.
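
For the first item in this file, a sketch of stderr-based classification. The marker list is copied from the review comment and is a heuristic, not a documented whisper.cpp error format. The enum is a hypothetical two-variant subset, and `classify_failure` is an illustrative name.

```rust
/// Hypothetical subset of `AudioRejectionReason`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AudioRejectionReason {
    Corrupted,
    TranscriptionFailed,
}

/// Substrings that suggest the input media could not be decoded.
/// Taken from the review comment; the real list may differ.
const DECODE_MARKERS: &[&str] = &[
    "decode",
    "unsupported format",
    "invalid data",
    "couldn't parse",
    "ffmpeg",
    "libav",
];

/// Maps the stderr of a failed run to a rejection reason.
/// Defaults to `TranscriptionFailed` unless a decode marker is present.
pub fn classify_failure(stderr: &str) -> AudioRejectionReason {
    let lower = stderr.to_lowercase();
    if DECODE_MARKERS.iter().any(|marker| lower.contains(*marker)) {
        AudioRejectionReason::Corrupted
    } else {
        AudioRejectionReason::TranscriptionFailed
    }
}
```

Defaulting to `TranscriptionFailed` keeps problems such as a missing model from being reported to the user as a corrupted upload.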

In `@openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md`:
- Line 4: Update the stale GitHub issue link in archive-report.md by replacing
the repository segment "anthropics/corvus" with "dallay/corvus" so the issue URL
becomes https://github.com/dallay/corvus/issues/246; ensure the change updates
the exact text shown in the file (the string
"https://github.com/anthropics/corvus/issues/246") to the new repository target.

In `@openspec/changes/archive/2026-04-03-audio-input-support/design.md`:
- Around line 82-88: The markdown fenced code blocks showing the pipeline steps
and directory tree are missing language tags (triggering MD040); update each
fenced block around the snippets that list
extract_user_text()/gate_audio_config()/transcription/… and the src/ tree to
include a language identifier (e.g., ```text) so markdownlint passes;
specifically add the same language tag to the three fenced blocks containing the
step list (extract_user_text → enrich_with_memory → …), the expanded audio-step
list (→ gate_audio_config → … → inject_transcription → …), and the src/
directory tree block.
- Around line 826-829: Update the design.md text to remove the incorrect claim
that a standalone doctor module doesn't exist and instead state that the doctor
command is implemented at clients/agent-runtime/src/doctor and is invoked from
the CLI via Commands::Doctor => doctor::run(); also revise the wording to
reflect that health checks are integrated into the runtime startup validation
path (see src/config/validation.rs) and that audio diagnostics are included,
removing any "will be added in future" phrasing and ensuring the document
accurately describes the existing integration.

In `@openspec/changes/archive/2026-04-03-audio-input-support/exploration.md`:
- Around line 434-449: The markdown subsections "### Phase 1 Scope (MVP)", "###
Phase 2 (Follow-up)", and "### Effort Estimate" need blank lines added before
and after each heading to satisfy markdownlint MD022; edit the block containing
those headings in exploration.md so there is an empty line above and below each
`###` heading (ensure you also add a trailing blank line after the final
subsection) to fix the spacing.
- Around line 71-83: The unlabeled fenced code blocks in the exploration
examples (the sequence showing Channel.listen() → parse message → build
ContentPart::Image and the other two similar blocks) violate markdownlint MD040;
update each fenced block that documents the flow (the one containing
Channel.listen(), and the other blocks around the same example) to include a
language tag (e.g., ```text or ```mermaid as appropriate) so the fences are
labeled; locate the blocks near the sequence using identifiers like
Channel.listen(), process_channel_message(), extract_user_text(),
gate_and_stage_images(), StagedImageGuard and run_unified_channel_tool_loop()
and add the language label to each opening fence.

In `@openspec/changes/archive/2026-04-03-audio-input-support/proposal.md`:
- Around line 69-72: Two fenced code blocks (the one showing "Image flow:
Channel → ContentPart::Image → ..." and the pipeline block starting with
"extract_user_text()") are missing language identifiers which triggers
markdownlint MD040; update both fences to include a language label such as
```text so the blocks become labeled code fences (e.g., add "text" to the
opening backticks for the ContentPart::Image/Audio flow block and the
extract_user_text() pipeline block).

In
`@openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md`:
- Around line 341-347: Update the archived spec to match the runtime: change the
Transcriber trait code fence to rust and ensure it reflects current runtime
usage; update the [audio] contract to include the AudioConfig fields
whisper_binary (default "whisper-cli"), max_concurrent_transcriptions, and
transcription_timeout_secs, and replace any lingering references to the binary
named "whisper" with "whisper-cli"; apply the same fixes to the other
Transcriber snippets and [audio] contract occurrences (the other two locations
mentioned) so the archived contract matches AudioConfig and runtime behavior.
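
To make the requested `[audio]` contract concrete, here is a minimal sketch of what an `AudioConfig` with those three fields might look like. Only the field names and the `"whisper-cli"` default come from the comment above; the `enabled` flag, the numeric defaults, and the `validate` helper are assumptions for illustration, not the runtime's actual definitions.

```rust
/// Hypothetical mirror of the `[audio]` config section.
/// Field names follow the review comment; everything else is assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct AudioConfig {
    /// Assumed master switch: audio stays off unless the operator opts in.
    pub enabled: bool,
    /// Name or path of the whisper.cpp CLI binary.
    pub whisper_binary: String,
    /// Upper bound on simultaneous transcriptions (placeholder default).
    pub max_concurrent_transcriptions: usize,
    /// Per-transcription timeout in seconds (placeholder default).
    pub transcription_timeout_secs: u64,
}

impl Default for AudioConfig {
    fn default() -> Self {
        Self {
            enabled: false,
            whisper_binary: "whisper-cli".to_string(),
            max_concurrent_transcriptions: 1, // placeholder, not from the PR
            transcription_timeout_secs: 60,   // placeholder, not from the PR
        }
    }
}

impl AudioConfig {
    /// Fail-closed validation: reject values that would remove a limit.
    pub fn validate(&self) -> Result<(), String> {
        if self.whisper_binary.trim().is_empty() {
            return Err("whisper_binary must not be empty".to_string());
        }
        if self.max_concurrent_transcriptions == 0 {
            return Err("max_concurrent_transcriptions must be at least 1".to_string());
        }
        if self.transcription_timeout_secs == 0 {
            return Err("transcription_timeout_secs must be at least 1".to_string());
        }
        Ok(())
    }
}
```

Keeping the defaults in one `Default` impl gives the archived spec a single place to be checked against when the runtime changes.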

In `@openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md`:
- Around line 25-37: Update the Markdown in verify-report.md to satisfy
MD022/MD031/MD040 by adding blank lines before and after the "Anti-Patterns
Check" and "Code Style" headings and by converting the three fenced code blocks
to have language tags and surrounding blank lines; specifically change the
blocks containing "cargo check --manifest-path clients/agent-runtime/Cargo.toml
→ Finished dev profile" and "cargo clippy --manifest-path
clients/agent-runtime/Cargo.toml --all-targets -- -D warnings → Finished dev
profile" to ```bash fenced blocks with a blank line before and after, and change
the final test summary block ("All test suites pass: unit tests...") to a
```text fenced block with a blank line before and after so the headings and
fenced blocks comply with linting rules.

In `@openspec/specs/audio-input/spec.md`:
- Around line 97-99: The spec's placement of the 7-step audio pipeline is
incorrect; update the documentation so it reflects the actual implementation
order used in clients/agent-runtime: the audio gating/staging/transcription
pipeline is executed before extract_user_text() (i.e., inserted into
process_channel_message() prior to calling extract_user_text()), not between
extract_user_text() and enrich_with_memory(); reference
process_channel_message(), extract_user_text(), and enrich_with_memory() when
making the spec change.
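
The ordering described above can be sketched as a trace of stage names. This is an illustration only: the function names are the ones cited in this review, but the bodies, the `Trace` type, and the `AUDIO_STAGES` constant are hypothetical stand-ins that record call order and do no real work.

```rust
/// Records stage names in the order they run (hypothetical helper).
#[derive(Debug, Default)]
pub struct Trace(pub Vec<&'static str>);

/// Audio stages as named in this review, in execution order.
const AUDIO_STAGES: [&str; 4] = [
    "gate_audio_config",
    "gate_and_stage_audio",
    "transcribe_audio",
    "inject_transcription",
];

fn run_audio_pipeline(trace: &mut Trace) {
    trace.0.extend(AUDIO_STAGES.iter().copied());
}

fn extract_user_text(trace: &mut Trace) {
    trace.0.push("extract_user_text");
}

fn enrich_with_memory(trace: &mut Trace) {
    trace.0.push("enrich_with_memory");
}

/// Order the spec should document: audio first, then text extraction,
/// then memory enrichment.
pub fn process_channel_message(trace: &mut Trace) {
    run_audio_pipeline(trace);
    extract_user_text(trace);
    enrich_with_memory(trace);
}
```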

---

Outside diff comments:
In `@clients/agent-runtime/src/channels/mod.rs`:
- Around line 619-688: process_channel_message currently runs the audio
pipeline (gate_audio_config, gate_and_stage_audio, transcribe_audio,
inject_transcription) before the per-turn timeout starts, so slow downloads or
transcription can exceed CHANNEL_MESSAGE_TIMEOUT_SECS; move all audio stages
inside the per-turn timeout boundary (or compute a remaining_budget and pass it
into gate_audio_config/gate_and_stage_audio/transcribe_audio) so that these
calls are canceled when the channel turn times out, and ensure any temp
resources held by audio_guard are still cleaned up on timeout.
- Around line 2705-2721: The ChannelRuntimeContext is currently built with
transcriber: None, which causes gate_audio_config() to reject audio; fix this
by constructing the transcriber before building runtime_ctx and passing it into
ChannelRuntimeContext as transcriber: Some(...). Locate both places where
runtime_ctx is created (the ChannelRuntimeContext instantiation in
start_channels() and the analogous one in spawn_runtime_handle()), call the
crate's existing audio/transcriber factory with the runtime/config to produce a
transcriber instance, and set transcriber: Some(transcriber) in both
constructors so gate_audio_config() sees a transcriber present. Handle or
propagate any errors from creating the transcriber consistently with the
existing error handling.
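
For the remaining-budget option mentioned in the first comment above, a std-only sketch might look like the following. `TurnBudget` and `stage_timeout` are hypothetical names, not part of the runtime; the idea is that each audio stage receives at most what is left of the per-turn deadline and fails fast once nothing is left.

```rust
use std::time::{Duration, Instant};

/// Tracks how much of a per-turn deadline is left (hypothetical type).
pub struct TurnBudget {
    deadline: Instant,
}

impl TurnBudget {
    /// Start the clock with the full per-turn allowance.
    pub fn start(total: Duration) -> Self {
        Self { deadline: Instant::now() + total }
    }

    /// Time left before the deadline, or `None` once it has passed.
    pub fn remaining(&self) -> Option<Duration> {
        let left = self.deadline.checked_duration_since(Instant::now())?;
        if left.is_zero() { None } else { Some(left) }
    }
}

/// Timeout to hand to one audio stage: the smaller of the stage's own cap
/// and whatever is left of the turn. Errors if the turn is already over.
pub fn stage_timeout(
    budget: &TurnBudget,
    stage_cap: Duration,
) -> Result<Duration, &'static str> {
    match budget.remaining() {
        Some(left) => Ok(left.min(stage_cap)),
        None => Err("channel turn timed out before audio stage"),
    }
}
```

A caller would create one `TurnBudget` at the start of the turn and call `stage_timeout` before each of the download, staging, and transcription steps, so a slow early stage shrinks the allowance of later ones. Cleanup of staged files would still rely on the guard's `Drop`, which runs on the error path as well.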

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5be0db47-780b-4ae1-9073-e138f500a063

📥 Commits

Reviewing files that changed from the base of the PR and between 522f1fe and c2f6341.

📒 Files selected for processing (35)
  • clients/agent-runtime/src/channels/audio_media.rs
  • clients/agent-runtime/src/channels/discord.rs
  • clients/agent-runtime/src/channels/mod.rs
  • clients/agent-runtime/src/channels/telegram.rs
  • clients/agent-runtime/src/channels/traits.rs
  • clients/agent-runtime/src/channels/whatsapp.rs
  • clients/agent-runtime/src/config/mod.rs
  • clients/agent-runtime/src/config/schema.rs
  • clients/agent-runtime/src/doctor/mod.rs
  • clients/agent-runtime/src/lib.rs
  • clients/agent-runtime/src/main.rs
  • clients/agent-runtime/src/observability/log.rs
  • clients/agent-runtime/src/observability/mod.rs
  • clients/agent-runtime/src/observability/otel.rs
  • clients/agent-runtime/src/observability/prometheus.rs
  • clients/agent-runtime/src/observability/traits.rs
  • clients/agent-runtime/src/onboard/wizard.rs
  • clients/agent-runtime/src/providers/anthropic.rs
  • clients/agent-runtime/src/providers/compatible.rs
  • clients/agent-runtime/src/providers/copilot.rs
  • clients/agent-runtime/src/providers/openrouter.rs
  • clients/agent-runtime/src/providers/router.rs
  • clients/agent-runtime/src/providers/traits.rs
  • clients/agent-runtime/src/transcription/mod.rs
  • clients/agent-runtime/src/transcription/traits.rs
  • clients/agent-runtime/src/transcription/whisper_cli.rs
  • openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md
  • openspec/changes/archive/2026-04-03-audio-input-support/design.md
  • openspec/changes/archive/2026-04-03-audio-input-support/exploration.md
  • openspec/changes/archive/2026-04-03-audio-input-support/proposal.md
  • openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md
  • openspec/changes/archive/2026-04-03-audio-input-support/state.yaml
  • openspec/changes/archive/2026-04-03-audio-input-support/tasks.md
  • openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md
  • openspec/specs/audio-input/spec.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: submit-gradle
  • GitHub Check: pr-checks
  • GitHub Check: sonar
  • GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (9)
clients/agent-runtime/src/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

clients/agent-runtime/src/**/*.rs: Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements
Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency

Files:

  • clients/agent-runtime/src/observability/prometheus.rs
  • clients/agent-runtime/src/providers/router.rs
  • clients/agent-runtime/src/providers/anthropic.rs
  • clients/agent-runtime/src/observability/otel.rs
  • clients/agent-runtime/src/transcription/mod.rs
  • clients/agent-runtime/src/providers/compatible.rs
  • clients/agent-runtime/src/lib.rs
  • clients/agent-runtime/src/channels/whatsapp.rs
  • clients/agent-runtime/src/main.rs
  • clients/agent-runtime/src/observability/mod.rs
  • clients/agent-runtime/src/providers/copilot.rs
  • clients/agent-runtime/src/providers/openrouter.rs
  • clients/agent-runtime/src/observability/log.rs
  • clients/agent-runtime/src/channels/discord.rs
  • clients/agent-runtime/src/config/mod.rs
  • clients/agent-runtime/src/providers/traits.rs
  • clients/agent-runtime/src/doctor/mod.rs
  • clients/agent-runtime/src/onboard/wizard.rs
  • clients/agent-runtime/src/transcription/traits.rs
  • clients/agent-runtime/src/channels/telegram.rs
  • clients/agent-runtime/src/channels/mod.rs
  • clients/agent-runtime/src/observability/traits.rs
  • clients/agent-runtime/src/config/schema.rs
  • clients/agent-runtime/src/transcription/whisper_cli.rs
  • clients/agent-runtime/src/channels/audio_media.rs
  • clients/agent-runtime/src/channels/traits.rs
clients/agent-runtime/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Run cargo fmt --all -- --check, cargo clippy --all-targets -- -D warnings, and cargo test for code validation, or document which checks were skipped and why

Files:

  • clients/agent-runtime/src/observability/prometheus.rs
  • clients/agent-runtime/src/providers/router.rs
  • clients/agent-runtime/src/providers/anthropic.rs
  • clients/agent-runtime/src/observability/otel.rs
  • clients/agent-runtime/src/transcription/mod.rs
  • clients/agent-runtime/src/providers/compatible.rs
  • clients/agent-runtime/src/lib.rs
  • clients/agent-runtime/src/channels/whatsapp.rs
  • clients/agent-runtime/src/main.rs
  • clients/agent-runtime/src/observability/mod.rs
  • clients/agent-runtime/src/providers/copilot.rs
  • clients/agent-runtime/src/providers/openrouter.rs
  • clients/agent-runtime/src/observability/log.rs
  • clients/agent-runtime/src/channels/discord.rs
  • clients/agent-runtime/src/config/mod.rs
  • clients/agent-runtime/src/providers/traits.rs
  • clients/agent-runtime/src/doctor/mod.rs
  • clients/agent-runtime/src/onboard/wizard.rs
  • clients/agent-runtime/src/transcription/traits.rs
  • clients/agent-runtime/src/channels/telegram.rs
  • clients/agent-runtime/src/channels/mod.rs
  • clients/agent-runtime/src/observability/traits.rs
  • clients/agent-runtime/src/config/schema.rs
  • clients/agent-runtime/src/transcription/whisper_cli.rs
  • clients/agent-runtime/src/channels/audio_media.rs
  • clients/agent-runtime/src/channels/traits.rs
**/*.rs

⚙️ CodeRabbit configuration file

**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.

Files:

  • clients/agent-runtime/src/observability/prometheus.rs
  • clients/agent-runtime/src/providers/router.rs
  • clients/agent-runtime/src/providers/anthropic.rs
  • clients/agent-runtime/src/observability/otel.rs
  • clients/agent-runtime/src/transcription/mod.rs
  • clients/agent-runtime/src/providers/compatible.rs
  • clients/agent-runtime/src/lib.rs
  • clients/agent-runtime/src/channels/whatsapp.rs
  • clients/agent-runtime/src/main.rs
  • clients/agent-runtime/src/observability/mod.rs
  • clients/agent-runtime/src/providers/copilot.rs
  • clients/agent-runtime/src/providers/openrouter.rs
  • clients/agent-runtime/src/observability/log.rs
  • clients/agent-runtime/src/channels/discord.rs
  • clients/agent-runtime/src/config/mod.rs
  • clients/agent-runtime/src/providers/traits.rs
  • clients/agent-runtime/src/doctor/mod.rs
  • clients/agent-runtime/src/onboard/wizard.rs
  • clients/agent-runtime/src/transcription/traits.rs
  • clients/agent-runtime/src/channels/telegram.rs
  • clients/agent-runtime/src/channels/mod.rs
  • clients/agent-runtime/src/observability/traits.rs
  • clients/agent-runtime/src/config/schema.rs
  • clients/agent-runtime/src/transcription/whisper_cli.rs
  • clients/agent-runtime/src/channels/audio_media.rs
  • clients/agent-runtime/src/channels/traits.rs
**/*

⚙️ CodeRabbit configuration file

**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.

Files:

  • clients/agent-runtime/src/observability/prometheus.rs
  • clients/agent-runtime/src/providers/router.rs
  • clients/agent-runtime/src/providers/anthropic.rs
  • clients/agent-runtime/src/observability/otel.rs
  • clients/agent-runtime/src/transcription/mod.rs
  • clients/agent-runtime/src/providers/compatible.rs
  • clients/agent-runtime/src/lib.rs
  • clients/agent-runtime/src/channels/whatsapp.rs
  • clients/agent-runtime/src/main.rs
  • clients/agent-runtime/src/observability/mod.rs
  • openspec/changes/archive/2026-04-03-audio-input-support/state.yaml
  • clients/agent-runtime/src/providers/copilot.rs
  • clients/agent-runtime/src/providers/openrouter.rs
  • clients/agent-runtime/src/observability/log.rs
  • clients/agent-runtime/src/channels/discord.rs
  • clients/agent-runtime/src/config/mod.rs
  • clients/agent-runtime/src/providers/traits.rs
  • clients/agent-runtime/src/doctor/mod.rs
  • openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md
  • openspec/changes/archive/2026-04-03-audio-input-support/proposal.md
  • openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md
  • clients/agent-runtime/src/onboard/wizard.rs
  • clients/agent-runtime/src/transcription/traits.rs
  • openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md
  • openspec/changes/archive/2026-04-03-audio-input-support/exploration.md
  • openspec/specs/audio-input/spec.md
  • openspec/changes/archive/2026-04-03-audio-input-support/tasks.md
  • clients/agent-runtime/src/channels/telegram.rs
  • clients/agent-runtime/src/channels/mod.rs
  • clients/agent-runtime/src/observability/traits.rs
  • clients/agent-runtime/src/config/schema.rs
  • clients/agent-runtime/src/transcription/whisper_cli.rs
  • openspec/changes/archive/2026-04-03-audio-input-support/design.md
  • clients/agent-runtime/src/channels/audio_media.rs
  • clients/agent-runtime/src/channels/traits.rs
clients/agent-runtime/src/providers/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Implement Provider trait in src/providers/ and register in src/providers/mod.rs factory when adding a new provider

Files:

  • clients/agent-runtime/src/providers/router.rs
  • clients/agent-runtime/src/providers/anthropic.rs
  • clients/agent-runtime/src/providers/compatible.rs
  • clients/agent-runtime/src/providers/copilot.rs
  • clients/agent-runtime/src/providers/openrouter.rs
  • clients/agent-runtime/src/providers/traits.rs
clients/agent-runtime/src/channels/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Implement Channel trait in src/channels/ with consistent send, listen, and health_check semantics and cover auth/allowlist/health behavior with tests

Files:

  • clients/agent-runtime/src/channels/whatsapp.rs
  • clients/agent-runtime/src/channels/discord.rs
  • clients/agent-runtime/src/channels/telegram.rs
  • clients/agent-runtime/src/channels/mod.rs
  • clients/agent-runtime/src/channels/audio_media.rs
  • clients/agent-runtime/src/channels/traits.rs
clients/agent-runtime/src/main.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

clients/agent-runtime/src/main.rs: Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths
Keep startup path lean and avoid heavy initialization in command parsing flow

Files:

  • clients/agent-runtime/src/main.rs
clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable

Files:

  • clients/agent-runtime/src/config/mod.rs
  • clients/agent-runtime/src/config/schema.rs
**/*.{md,mdx}

⚙️ CodeRabbit configuration file

**/*.{md,mdx}: Verify technical accuracy and that docs stay aligned with code changes.
For user-facing docs, check EN/ES parity or explicitly note pending translation gaps.

Files:

  • openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md
  • openspec/changes/archive/2026-04-03-audio-input-support/proposal.md
  • openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md
  • openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md
  • openspec/changes/archive/2026-04-03-audio-input-support/exploration.md
  • openspec/specs/audio-input/spec.md
  • openspec/changes/archive/2026-04-03-audio-input-support/tasks.md
  • openspec/changes/archive/2026-04-03-audio-input-support/design.md
🧠 Learnings (10)
📓 Common learnings
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths

Applied to files:

  • clients/agent-runtime/src/observability/prometheus.rs
  • clients/agent-runtime/src/transcription/mod.rs
  • clients/agent-runtime/src/providers/compatible.rs
  • clients/agent-runtime/src/channels/whatsapp.rs
  • clients/agent-runtime/src/main.rs
  • clients/agent-runtime/src/channels/discord.rs
  • clients/agent-runtime/src/config/mod.rs
  • clients/agent-runtime/src/doctor/mod.rs
  • clients/agent-runtime/src/config/schema.rs
  • clients/agent-runtime/src/transcription/whisper_cli.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests

Applied to files:

  • clients/agent-runtime/src/providers/anthropic.rs
  • clients/agent-runtime/src/providers/compatible.rs
  • clients/agent-runtime/src/channels/whatsapp.rs
  • clients/agent-runtime/src/observability/mod.rs
  • clients/agent-runtime/src/providers/openrouter.rs
  • clients/agent-runtime/src/doctor/mod.rs
  • clients/agent-runtime/src/transcription/traits.rs
  • openspec/changes/archive/2026-04-03-audio-input-support/tasks.md
  • clients/agent-runtime/src/channels/telegram.rs
  • clients/agent-runtime/src/channels/mod.rs
  • clients/agent-runtime/src/observability/traits.rs
  • clients/agent-runtime/src/channels/audio_media.rs
  • clients/agent-runtime/src/channels/traits.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/providers/**/*.rs : Implement `Provider` trait in `src/providers/` and register in `src/providers/mod.rs` factory when adding a new provider

Applied to files:

  • clients/agent-runtime/src/transcription/mod.rs
  • clients/agent-runtime/src/lib.rs
  • clients/agent-runtime/src/observability/mod.rs
  • clients/agent-runtime/src/transcription/traits.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/Cargo.toml : Do not add heavy dependencies for minor convenience; justify new crate additions

Applied to files:

  • clients/agent-runtime/src/transcription/mod.rs
  • clients/agent-runtime/src/main.rs
  • clients/agent-runtime/src/observability/mod.rs
  • clients/agent-runtime/src/config/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/*.rs : Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why

Applied to files:

  • clients/agent-runtime/src/transcription/mod.rs
  • clients/agent-runtime/src/channels/whatsapp.rs
  • clients/agent-runtime/src/main.rs
  • clients/agent-runtime/src/config/mod.rs
  • clients/agent-runtime/src/doctor/mod.rs
  • clients/agent-runtime/src/config/schema.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Keep startup path lean and avoid heavy initialization in command parsing flow

Applied to files:

  • clients/agent-runtime/src/main.rs
  • clients/agent-runtime/src/channels/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/**/*.rs : Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency

Applied to files:

  • clients/agent-runtime/src/main.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs : Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable

Applied to files:

  • clients/agent-runtime/src/config/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/tools/**/*.rs : Implement `Tool` trait in `src/tools/` with strict parameter schema, validate and sanitize all inputs, and return structured `ToolResult` without panics in runtime path

Applied to files:

  • clients/agent-runtime/src/transcription/traits.rs
🪛 LanguageTool
openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md

[style] ~399-~399: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...tion failure — process timeout - GIVEN a very large audio file near the duration limit - WH...

(EN_WEAK_ADJECTIVE)


[grammar] ~479-~479: Use a hyphen to join words.
Context: ...tartup validation error indicating the 1 hour ceiling #### Scenario: Missing aud...

(QB_NEW_EN_HYPHEN)


[style] ~838-~838: As an alternative to the over-used intensifier ‘extremely’, consider replacing this phrase.
Context: ...t, including: - Zero-byte audio files - Extremely large files (rejected by size limit) - Files ...

(EN_WEAK_ADJECTIVE)


[locale-violation] ~847-~847: In American English, ‘afterward’ is the preferred variant. ‘Afterwards’ is more commonly used in British English and other dialects.
Context: ... same user sends a text message "hello" afterwards - THEN the text message is processed no...

(AFTERWARDS_US)

openspec/changes/archive/2026-04-03-audio-input-support/exploration.md

[style] ~181-~181: Consider using a different adverb to strengthen your wording.
Context: ...) and audio files (audio field) are completely ignored — messages with only voice/au...

(COMPLETELY_ENTIRELY)

openspec/specs/audio-input/spec.md

[style] ~399-~399: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...tion failure — process timeout - GIVEN a very large audio file near the duration limit - WH...

(EN_WEAK_ADJECTIVE)


[grammar] ~479-~479: Use a hyphen to join words.
Context: ...tartup validation error indicating the 1 hour ceiling #### Scenario: Missing aud...

(QB_NEW_EN_HYPHEN)


[style] ~838-~838: As an alternative to the over-used intensifier ‘extremely’, consider replacing this phrase.
Context: ...t, including: - Zero-byte audio files - Extremely large files (rejected by size limit) - Files ...

(EN_WEAK_ADJECTIVE)


[locale-violation] ~847-~847: In American English, ‘afterward’ is the preferred variant. ‘Afterwards’ is more commonly used in British English and other dialects.
Context: ... same user sends a text message "hello" afterwards - THEN the text message is processed no...

(AFTERWARDS_US)

openspec/changes/archive/2026-04-03-audio-input-support/design.md

[style] ~943-~943: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...nges. - No provider contract changes. - No existing behavior modified. - Rollout: ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 markdownlint-cli2 (0.22.0)
openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md

[warning] 25-25: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 25-25: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 30-30: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 30-30: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 35-35: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 35-35: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 313-313: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 320-320: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

openspec/changes/archive/2026-04-03-audio-input-support/proposal.md

[warning] 69-69: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 79-79: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md

[warning] 341-341: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

openspec/changes/archive/2026-04-03-audio-input-support/exploration.md

[warning] 71-71: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 359-359: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 370-370: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 434-434: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 441-441: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 447-447: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

openspec/specs/audio-input/spec.md

[warning] 341-341: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

openspec/changes/archive/2026-04-03-audio-input-support/design.md

[warning] 82-82: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 92-92: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 949-949: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (15)
clients/agent-runtime/src/main.rs (2)

74-74: Good module wiring for transcription integration.

mod transcription; cleanly wires the new runtime transcription module into the binary crate.


718-719: Scoped lint suppression is acceptable here.

Applying clippy::large_futures at the dispatcher boundary is reasonable for this async match-heavy function.

clients/agent-runtime/src/providers/anthropic.rs (1)

975-976: Test fixture updates are correct.

Adding audio_metadata: None consistently keeps test ChatMessage fixtures aligned with the current provider trait contract.

Also applies to: 981-982, 987-988, 1000-1001, 1008-1009, 1023-1024, 1033-1034, 1155-1156, 1175-1176, 1277-1278, 1340-1341, 1346-1347, 1352-1353

clients/agent-runtime/src/transcription/mod.rs (1)

1-2: Module exports look clean and intentional.

Publicly exposing traits and whisper_cli is a solid, minimal surface for the transcription subsystem.

clients/agent-runtime/src/observability/log.rs (1)

192-203: Good audio ingress log coverage with safe metadata fields.

This adds the expected ingress lifecycle visibility without logging raw audio/transcript payloads.

clients/agent-runtime/src/providers/openrouter.rs (1)

538-539: Fixture alignment is correct.

The added audio_metadata: None keeps tests in sync with the expanded ChatMessage schema.

Also applies to: 544-545, 588-589, 594-595, 642-643, 738-739, 759-760

openspec/changes/archive/2026-04-03-audio-input-support/state.yaml (1)

1-8: Archive state entry looks complete and consistent.

Phase state, references, and branch linkage are properly captured.

clients/agent-runtime/src/observability/mod.rs (1)

17-19: Re-export update is correct.

Adding audio ingress types to the module surface keeps observability APIs coherent for downstream users.

clients/agent-runtime/src/onboard/wizard.rs (1)

799-799: Good fail-closed wiring for audio config defaults.

Both onboarding paths now initialize audio explicitly, which keeps generated configs complete and secure-by-default.

Also applies to: 1037-1037

clients/agent-runtime/src/config/mod.rs (1)

5-15: Re-export update is correct and coherent.

Adding AudioConfig to the schema re-export keeps the config API aligned with the new [audio] section.

clients/agent-runtime/src/providers/traits.rs (1)

17-19: audio_metadata addition is backward-compatible and safely defaulted.

Using #[serde(default, skip_serializing_if = "Option::is_none")] here is the right compatibility choice for existing stored history payloads.

Also applies to: 23-29, 32-38, 41-52, 54-64, 67-73, 76-82

clients/agent-runtime/src/transcription/traits.rs (1)

23-42: Transcriber interface is clean and runtime-safe.

Good separation between user-facing transcription errors and startup/doctor health diagnostics.
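
As a rough illustration of that separation, the sketch below keeps per-request transcription errors and health diagnostics as distinct types and shows a fail-closed gate that rejects audio when no transcriber is wired in. The shapes here are assumptions: the trait is synchronous for brevity, and `TranscriptionError`, `HealthStatus`, `gate_audio`, and `FixedTranscriber` are hypothetical names, not the signatures in `traits.rs`.

```rust
use std::path::Path;
use std::sync::Arc;

/// Errors surfaced to the user for one transcription attempt (assumed shape).
#[derive(Debug, PartialEq)]
pub enum TranscriptionError {
    Timeout,
    EngineFailed(String),
}

/// Diagnostics for startup and doctor checks (assumed shape).
#[derive(Debug, PartialEq)]
pub enum HealthStatus {
    Ready,
    Unavailable(String),
}

/// Simplified, synchronous stand-in for the runtime's extension point.
pub trait Transcriber: Send + Sync {
    fn transcribe(&self, audio: &Path) -> Result<String, TranscriptionError>;
    fn health_check(&self) -> HealthStatus;
}

/// Fail-closed gate: audio is admitted only when it is enabled and a
/// transcriber is actually present.
pub fn gate_audio(
    enabled: bool,
    transcriber: Option<&Arc<dyn Transcriber>>,
) -> Result<(), &'static str> {
    if !enabled {
        return Err("audio input is disabled");
    }
    match transcriber {
        Some(_) => Ok(()),
        None => Err("no transcriber configured"),
    }
}

/// Test double that always returns the same transcript.
pub struct FixedTranscriber(pub &'static str);

impl Transcriber for FixedTranscriber {
    fn transcribe(&self, _audio: &Path) -> Result<String, TranscriptionError> {
        Ok(self.0.to_string())
    }

    fn health_check(&self) -> HealthStatus {
        HealthStatus::Ready
    }
}
```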

clients/agent-runtime/src/channels/telegram.rs (1)

3077-3077: LGTM on the widened test matches.

Using _ => panic!(...) keeps these assertions focused on the expected variant now that ContentPart has another case.

Also applies to: 3140-3140, 3171-3171, 3184-3184, 3224-3224, 3279-3279, 3332-3332, 3389-3389

clients/agent-runtime/src/transcription/whisper_cli.rs (1)

50-70: Nice output normalization.

Filtering [BLANK_AUDIO] and collapsing multiline stdout here is a good guard against injecting empty/silent transcripts.
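
For readers without the diff open, the kind of normalization praised here can be sketched in a few lines. This is not the code from `whisper_cli.rs`; `normalize_transcript` is a hypothetical name, and the sketch assumes the marker appears alone on its own line.

```rust
/// Marker for silent segments, as named in the review comment.
const BLANK_AUDIO: &str = "[BLANK_AUDIO]";

/// Collapse multiline stdout into one line, dropping empty lines and
/// blank-audio markers. Returns `None` when nothing usable remains, so the
/// caller can skip injecting an empty transcript.
pub fn normalize_transcript(stdout: &str) -> Option<String> {
    let joined = stdout
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && *line != BLANK_AUDIO)
        .collect::<Vec<_>>()
        .join(" ");

    if joined.is_empty() {
        None
    } else {
        Some(joined)
    }
}
```

Returning `Option<String>` instead of an empty string pushes the "was anything said?" decision to the type level, which makes it harder to inject a silent transcript by accident.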

clients/agent-runtime/src/channels/audio_media.rs (1)

282-725: Strong boundary-focused test coverage for the new audio media layer.

Coverage across MIME sniffing, size/duration boundaries, cleanup behavior, and context rendering looks solid for this critical ingress path.

Comment thread clients/agent-runtime/src/channels/audio_media.rs Outdated
Comment thread clients/agent-runtime/src/channels/discord.rs Outdated
Comment thread clients/agent-runtime/src/channels/mod.rs
Comment thread clients/agent-runtime/src/channels/mod.rs
Comment thread clients/agent-runtime/src/channels/telegram.rs
Comment thread openspec/changes/archive/2026-04-03-audio-input-support/proposal.md Outdated
Comment thread openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md Outdated
Comment on lines +25 to +37
```
cargo check --manifest-path clients/agent-runtime/Cargo.toml → Finished dev profile
```

**Clippy**: ✅ Passed (zero warnings)
```
cargo clippy --manifest-path clients/agent-runtime/Cargo.toml --all-targets -- -D warnings → Finished dev profile
```

**Tests**: ✅ 6,487 passed / 0 failed / 0 ignored
```
All test suites pass: unit tests (3193 lib + 3220 bin), 15 integration test suites, 2 doc-tests.
```

⚠️ Potential issue | 🟡 Minor

Fix markdownlint violations in fenced blocks and heading spacing.

The fenced blocks need language tags and surrounding blank lines, and the “Anti-Patterns Check” / “Code Style” headings need blank lines around them per MD022/MD031/MD040.

Proposed markdown fix
````diff
 **Build**: ✅ Passed
-```
+
+```bash
 cargo check --manifest-path clients/agent-runtime/Cargo.toml → Finished dev profile
 ```

 **Clippy**: ✅ Passed (zero warnings)
-```
+
+```bash
 cargo clippy --manifest-path clients/agent-runtime/Cargo.toml --all-targets -- -D warnings → Finished dev profile
 ```

 **Tests**: ✅ 6,487 passed / 0 failed / 0 ignored
-```
+
+```text
 All test suites pass: unit tests (3193 lib + 3220 bin), 15 integration test suites, 2 doc-tests.
 ```
@@
 Code Quality Assessment

 Anti-Patterns Check
+
 - No unwrap()/expect() in production code — all occurrences are in #[cfg(test)] blocks
@@
 Code Style
+
 - ✅ Follows existing codebase patterns (mirrors StagedImageGuard, ImageRejectionReason, etc.)
````
</details>



Also applies to: 313-313, 320-320

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.22.0)</summary>

[warning] 25-25: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

---

[warning] 25-25: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

---

[warning] 30-30: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

---

[warning] 30-30: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

---

[warning] 35-35: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

---

[warning] 35-35: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md
around lines 25 - 37, Update the Markdown in verify-report.md to satisfy
MD022/MD031/MD040 by adding blank lines before and after the "Anti-Patterns
Check" and "Code Style" headings and by converting the three fenced code blocks
to have language tags and surrounding blank lines; specifically change the
blocks containing "cargo check --manifest-path clients/agent-runtime/Cargo.toml
→ Finished dev profile" and "cargo clippy --manifest-path
clients/agent-runtime/Cargo.toml --all-targets -- -D warnings → Finished dev
profile" to ```bash fenced blocks with a blank line before and after, and change
the final test summary block ("All test suites pass: unit tests...") to a ```text
fenced block so that all fenced blocks comply with linting rules.

Comment thread openspec/specs/audio-input/spec.md
High priority:
- Broaden MP3 magic-byte sync detection to accept full MPEG frame mask
- Map whisper non-zero exits to TranscriptionFailed by default, Corrupted
  only on decode-related stderr keywords
- Add kill_on_drop(true) to prevent orphaned whisper child processes
- Use create_new(true) for temp file creation to prevent symlink attacks
- Add MultipleAudioParts rejection variant instead of SystemError
- Check model path is_file() not just exists() in doctor
- Measure and report actual transcription latency instead of clip duration
- Wire WhisperCliTranscriber into ChannelRuntimeContext when audio enabled
- Move audio pipeline stages under per-turn timeout boundary
- Fall back to system model path when user path doesn't exist

Medium priority:
- Add deny_unknown_fields to AudioConfig for strict TOML parsing
- Validate transcription concurrency and timeout are non-zero at startup
- Deduplicate audio constants between schema and audio_media modules
- Record audio ingress metrics in OTEL and Prometheus backends
- Add user_with_media constructor to avoid partial-state mutations
- Add symmetric serde tests for audio_metadata field
- Fix discord test panic message for wildcard match arm
- Handle unauthorized audio-only messages in Telegram channel

Documentation:
- Fix stale GitHub link in archive report
- Add markdown language tags to fenced code blocks (MD040)
- Fix heading spacing (MD022) in exploration and verify-report
- Update spec pipeline order to match implementation
- Sync archived spec with AudioConfig runtime fields

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 17

♻️ Duplicate comments (3)
clients/agent-runtime/src/doctor/mod.rs (1)

944-971: ⚠️ Potential issue | 🟡 Minor

audio_health_pass_model_exists still bypasses production logic.

This test rebuilds model-check behavior inline instead of exercising check_audio_health, so it can pass while real doctor logic regresses.

As per coding guidelines **/*: “Look for behavioral regressions, missing tests, and contract breaks across modules.”

openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md (1)

24-37: ⚠️ Potential issue | 🟡 Minor

Resolve remaining markdownlint spacing violations (MD031/MD022).

Fences at Line 25/30/35 need surrounding blank lines, and headings at Line 313 and Line 320 need blank lines below them.

🧹 Proposed markdown fix
````diff
 **Build**: ✅ Passed
+
 ```bash
 cargo check --manifest-path clients/agent-runtime/Cargo.toml → Finished dev profile
 ```

 **Clippy**: ✅ Passed (zero warnings)
+
 ```bash
 cargo clippy --manifest-path clients/agent-runtime/Cargo.toml --all-targets -- -D warnings → Finished dev profile
 ```

 **Tests**: ✅ 6,487 passed / 0 failed / 0 ignored
+
 ```text
 All test suites pass: unit tests (3193 lib + 3220 bin), 15 integration test suites, 2 doc-tests.
 ```
@@
 Anti-Patterns Check
+
 - No unwrap()/expect() in production code — all occurrences are in #[cfg(test)] blocks
@@
 Code Style
+
 - ✅ Follows existing codebase patterns (mirrors StagedImageGuard, ImageRejectionReason, etc.)
````
</details>

 


Also applies to: 313-321

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md
around lines 24 - 37, The markdown has missing blank lines around fenced code
blocks and after two headings; add a blank line before and after each
triple-backtick fence that wraps the "cargo check --manifest-path
clients/agent-runtime/Cargo.toml → Finished dev profile" and the "cargo clippy
..." and the "All test suites pass: ..." code blocks, and insert a blank line
immediately below the "Anti-Patterns Check" and "Code Style" headings so each
heading is followed by an empty line; update the verify-report.md content
accordingly to satisfy MD031/MD022.


</details>

</blockquote></details>
<details>
<summary>clients/agent-runtime/src/channels/mod.rs (1)</summary><blockquote>

`1123-1125`: _⚠️ Potential issue_ | _🟡 Minor_

**Keep `MultipleAudioParts` distinct in ingress telemetry.**

This still collapses a known validation failure into `SystemError`, so rejected multi-audio turns are indistinguishable from real runtime faults in `AudioIngressEvent`. Add a dedicated `AudioIngressReason` variant and map it here instead.
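
A minimal sketch of the one-to-one mapping this comment asks for. The two enums below are illustrative subsets (the real `AudioRejectionReason` and `AudioIngressReason` have more cases), and the helper name `to_ingress_reason` is hypothetical rather than taken from the PR.

```rust
// Illustrative subsets of the real enums. Only MultipleAudioParts and
// SystemError are named in the review; everything else is omitted here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AudioRejectionReason {
    MultipleAudioParts,
    SystemError,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AudioIngressReason {
    MultipleAudioParts,
    SystemError,
}

/// Hypothetical helper: maps each rejection to its own telemetry reason,
/// so a validation rejection is never reported as a runtime fault.
pub fn to_ingress_reason(reason: AudioRejectionReason) -> AudioIngressReason {
    match reason {
        AudioRejectionReason::MultipleAudioParts => AudioIngressReason::MultipleAudioParts,
        AudioRejectionReason::SystemError => AudioIngressReason::SystemError,
    }
}
```

Because the `match` has no wildcard arm, adding a future rejection variant fails to compile until a telemetry reason is chosen for it.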

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @clients/agent-runtime/src/channels/mod.rs around lines 1123 - 1125, The
match currently maps audio_media::AudioRejectionReason::MultipleAudioParts to
AudioIngressReason::SystemError, collapsing validation failures with real
faults; add a new AudioIngressReason variant (e.g., MultipleAudioParts) to the
AudioIngressReason enum and update the match in the code handling
audio_media::AudioRejectionReason so MultipleAudioParts maps to the new
AudioIngressReason::MultipleAudioParts; ensure any places that construct or
pattern-match AudioIngressEvent/AudioIngressReason are updated to handle the new
variant so telemetry distinguishes validation rejection from system errors.


</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/settings.local.json:

  • Around line 5-6: The new permissions are too broad: replace the unrestricted
    "Bash(gh pr:*)" with a scoped, read-only GH CLI set (e.g., "Bash(gh pr:view,gh
    pr:list,gh pr:status)" or the minimal verbs your workflow needs) and change
    "Read(//tmp/)" to a dedicated temp subdirectory used only by this workflow
    (e.g., "Read(//tmp//
    )") so the CLAUDE settings grant least
    privilege while still allowing the workflow’s required read-only PR queries and
    access to its own temp folder.

In @clients/agent-runtime/src/channels/audio_media.rs:

  • Around line 120-124: The current MP3 sniff accepts any second byte with top 3
    bits set (0xE0) which also matches reserved layer values and ADTS AAC headers
    (e.g., 0xFF 0xF1); update the check that returns AllowedAudioMime::Mp3 to also
    require valid (non-zero) MPEG layer bits so reserved/ADTS frames are rejected —
    replace the condition in the snippet that inspects sniffed_bytes (and the
    duplicate at the other occurrence) with a combined check: ensure
    (sniffed_bytes[1] & 0xE0) == 0xE0 AND (sniffed_bytes[1] & 0x06) != 0, and flip
    the test validate_audio_mime_detects_mp3_sync_e0 to expect rejection instead of
    acceptance.
  • Around line 239-277: The to_context_string() method in audio_media.rs
    currently injects sha256 and byte_len into model-facing history; remove those
    fields so the context string only contains the media marker (mime), optional
    duration, the sanitized transcription, and the sanitized caption (retain the
    existing newline stripping and 200-char truncation logic). Update the initial
    format call in to_context_string() (and any subsequent writes) to no longer
    include byte_len or sha256, ensure the closing ']' behavior stays the same, and
    update tests that assert the produced context to expect strings without the hash
    prefix and byte size (see to_context_string and ChatMessage::user_with_audio for
    where this string is consumed).
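
The MP3 sniff described in the first item above can be sketched as a standalone predicate. This is an illustration under assumptions: the function name `looks_like_mpeg_audio_frame` is hypothetical, and the `0xFF` check on the first byte is assumed from the MPEG sync word rather than quoted from the PR.

```rust
/// Returns true when the first two bytes look like an MPEG audio frame header
/// with a valid (non-reserved) layer field.
pub fn looks_like_mpeg_audio_frame(sniffed_bytes: &[u8]) -> bool {
    if sniffed_bytes.len() < 2 {
        return false;
    }
    // 11-bit sync word: 0xFF followed by the top three bits of the next byte.
    let sync_ok = sniffed_bytes[0] == 0xFF && (sniffed_bytes[1] & 0xE0) == 0xE0;
    // Bits 1-2 of the second byte carry the layer. The value 00 is reserved,
    // and ADTS AAC headers (for example 0xFF 0xF1) always have 00 here.
    let layer_ok = (sniffed_bytes[1] & 0x06) != 0;
    sync_ok && layer_ok
}
```

With this check, `0xFF 0xFB` (MPEG-1 Layer III) is accepted, while `0xFF 0xF1` (ADTS) and `0xFF 0xE0` (reserved layer) are rejected.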

In @clients/agent-runtime/src/channels/mod.rs:

  • Around line 642-688: The per-turn timeout (CHANNEL_MESSAGE_TIMEOUT_SECS) and
    started_at stopwatch must be started before any audio work so audio
    fetch/staging and transcriber semaphore waits are counted; move the creation of
    the per-turn timeout/stopwatch out of the post-audio section and into the code
    path before evaluating msg.has_audio_parts(), or alternatively compute the
    remaining budget and thread it into the audio functions (pass the
    deadline/timeout into gate_audio_config, gate_and_stage_audio, and
    transcribe_audio) so those calls respect the same timeout; update uses in the
    audio pipeline (audio_history_metas, gate_audio_config, gate_and_stage_audio,
    transcribe_audio, inject_transcription) to either run under the pre-started
    timeout or accept and honor the remaining deadline.
  • Around line 1479-1483: The injected_text currently prefixes the transcript
    with a synthetic label depending on caption_text, which alters downstream
    behavior; instead set injected_text to the transcript content itself (the
    trimmed string) without any “[Voice message transcription]”/“[Audio
    transcription]” prefix, removing the conditional formatting logic around
    caption_text and leaving audio provenance to AudioHistoryMeta so downstream
    memory/pre-execution checks and providers see the exact transcribed text.
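
One way to thread a shared budget through the audio stages, as the first item above suggests, is a small deadline value created before any audio work. The type `TurnDeadline` and its methods are hypothetical names for illustration, not part of the PR.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-turn deadline, started before any audio work begins.
#[derive(Debug, Clone, Copy)]
pub struct TurnDeadline {
    deadline: Instant,
}

impl TurnDeadline {
    /// Starts the budget at `now`; every later stage shares this deadline.
    pub fn start(now: Instant, budget: Duration) -> Self {
        Self { deadline: now + budget }
    }

    /// Time left for the next stage, or None once the budget is spent.
    pub fn remaining(&self, now: Instant) -> Option<Duration> {
        self.deadline
            .checked_duration_since(now)
            .filter(|left| !left.is_zero())
    }
}
```

Each stage (fetch, staging, transcription) would ask for `remaining(Instant::now())` and use that as its own timeout, so time spent waiting on the transcriber semaphore counts against the same turn.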

In @clients/agent-runtime/src/channels/telegram.rs:

  • Around line 1856-1870: The current TOCTOU happens because you create the file
    with OpenOptions::create_new and then drop the std::fs::File handle before
    calling tokio::fs::write, allowing the file to be swapped; fix it by writing
    through the original file handle instead of dropping it: keep the std::fs::File
    returned by OpenOptions::open (the variable currently named file), and either
    call file.write_all(&bytes) (or wrap it in spawn_blocking if you must avoid
    blocking the async runtime) or convert it to a tokio::fs::File via
    tokio::fs::File::from_std(file) and call async write_all; ensure you flush (and
    optionally sync_all) and close the handle before proceeding.
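
A minimal synchronous sketch of writing through the handle returned by `create_new`. The function name `stage_bytes_exclusive` is hypothetical; in the async runtime this body would run inside `spawn_blocking`, or the handle would be converted with `tokio::fs::File::from_std` as the comment describes.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Creates `path` exclusively and writes `bytes` through the same handle,
/// so the path cannot be swapped between creation and the write.
pub fn stage_bytes_exclusive(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true) // fails with AlreadyExists if anything is at `path`
        .open(path)?;
    file.write_all(bytes)?;
    file.sync_all()?; // flush data and metadata before the file is handed on
    Ok(())
}
```

The handle is dropped only after the write and sync complete, which closes the window the comment identifies.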

In @clients/agent-runtime/src/config/schema.rs:

  • Around line 3345-3371: When audio is enabled (ac.enabled), validate that
    ac.whisper_binary and ac.transcription_model are not blank: trim whitespace and
    if either is empty, return an error (anyhow::bail!) stating that whisper_binary
    and/or transcription_model must be non-empty when audio is enabled; perform
    these checks alongside the existing ac.allowed_channels validation (before the
    tracing::info! log) so the audio path fails closed at startup rather than later
    when spawning whisper or resolving models.
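
The startup check described above might look like the following. This is a sketch under assumptions: the struct is a three-field subset of the real `AudioConfig`, and errors are plain `String` values here where the PR uses `anyhow::bail!`.

```rust
/// Illustrative subset of the `[audio]` config; field names follow the review text.
pub struct AudioConfig {
    pub enabled: bool,
    pub whisper_binary: String,
    pub transcription_model: String,
}

/// Fails closed at startup when audio is enabled but a required field is blank.
pub fn validate_audio_config(ac: &AudioConfig) -> Result<(), String> {
    if !ac.enabled {
        return Ok(()); // disabled audio needs no transcriber settings
    }
    let mut missing = Vec::new();
    if ac.whisper_binary.trim().is_empty() {
        missing.push("whisper_binary");
    }
    if ac.transcription_model.trim().is_empty() {
        missing.push("transcription_model");
    }
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "audio is enabled but {} must be non-empty",
            missing.join(" and ")
        ))
    }
}
```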

In @clients/agent-runtime/src/doctor/mod.rs:

  • Around line 687-692: The code currently pushes DiagItem::ok for the whisper
    binary when the spawned process returns any Ok result, which can mark a non-zero
    exit (e.g., from running --help) as healthy; change the check after running
    the binary so you inspect the child process ExitStatus (use status.success())
    and only push DiagItem::ok when success() is true, otherwise push a failing
    DiagItem (e.g., DiagItem::err) with the binary_path, the non-zero exit code or
    status, and any stderr/stdout to make the failure clear.
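
A hedged sketch of the exit-status check, using `std::process::Command` instead of the async command in the PR. The names `classify_exit` and `check_binary` are hypothetical, and results are plain `Result<(), String>` values instead of `DiagItem` entries.

```rust
use std::process::{Command, Stdio};

/// Pure decision step: only a successful exit counts as healthy.
pub fn classify_exit(binary_path: &str, success: bool, code: Option<i32>) -> Result<(), String> {
    if success {
        return Ok(());
    }
    match code {
        Some(c) => Err(format!("{} exited with status {}", binary_path, c)),
        // On Unix, a missing code means the process was killed by a signal.
        None => Err(format!("{} terminated without an exit code", binary_path)),
    }
}

/// Runs `<binary> --help` and reports spawn failures and non-zero exits alike.
pub fn check_binary(binary_path: &str) -> Result<(), String> {
    let status = Command::new(binary_path)
        .arg("--help")
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map_err(|e| format!("failed to run {}: {}", binary_path, e))?;
    classify_exit(binary_path, status.success(), status.code())
}
```

Splitting the decision from the spawn keeps the pass/fail rule testable without a real whisper binary.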

In @clients/agent-runtime/src/providers/traits.rs:

  • Around line 17-19: The audio_metadata field currently serializes
    AudioHistoryMeta.transcription duplicating the transcript already stored in
    ChatMessage.content; remove or scrub the transcription before persisting by
    ensuring AudioHistoryMeta.transcription is either omitted or set to None/empty
    during serialization for traits.rs (affecting the audio_metadata field and any
    roundtrip test logic that inspects AudioHistoryMeta in the tests around the
    lines referenced); update the serializer behavior or the code that constructs
    audio_metadata to not carry user speech text while keeping other metadata fields
    intact so only ChatMessage.content retains the transcript.

In @clients/agent-runtime/src/transcription/whisper_cli.rs:

  • Around line 113-116: The model path check uses exists() which allows
    directories; update both transcribe() and health_check() to validate the
    resolved model path with is_file() instead of exists() (i.e., replace checks
    that call self.model_path.exists() with self.model_path.is_file()) so
    directories are rejected early and behavior matches the doctor module's
    validation.
  • Around line 194-214: The health_check in whisper_cli.rs currently spawns
    Command::new(&self.binary_path).arg("--help") and treats any Ok(_) from
    status().await as success; change the logic in health_check to inspect the
    returned std::process::ExitStatus (from binary_check) and fail if
    !status.success(): return Err including the exit code or status (use
    status.code() or the ExitStatus) so non‑zero exits of the whisper-cli --help
    properly produce an error for self.binary_path / health_check / binary_check.
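
The difference between `exists()` and `is_file()` in the first item above can be shown with a small check. The function name `check_model_file` and the error strings are illustrative assumptions.

```rust
use std::path::Path;

/// Accepts only regular files, so a directory at the model path is rejected early.
pub fn check_model_file(model_path: &Path) -> Result<(), String> {
    if model_path.is_file() {
        Ok(())
    } else if model_path.exists() {
        // exists() alone would have let this case through.
        Err(format!("{} exists but is not a regular file", model_path.display()))
    } else {
        Err(format!("{} not found", model_path.display()))
    }
}
```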

In @openspec/changes/archive/2026-04-03-audio-input-support/design.md:

  • Around line 829-850: The doc snippet is stale: replace the old
    check_audio_config/DoctorWarning logic with the current
    check_audio_health/DiagItem semantics—call check_audio_health (or rename the
    snippet) and produce DiagItem entries instead of DoctorWarning, using the
    module's file-existence check used elsewhere (e.g., check_model_file_exists or
    model_path.is_file() rather than model_path.exists()) and update messages to
    match DiagItem fields (kind/source "audio", diagnostic message, and appropriate
    severity/category). Also update references from whisper_binary/whisper model
    checks to the actual identifiers used by check_audio_health (e.g.,
    resolve_model_path -> the current resolver) so the doc mirrors the real function
    names and return structure.
  • Around line 364-379: Update the design docs to match the implemented
    transcription trait: change TranscriptionResult.duration_secs from f64 to
    Option<f64>, update any function signatures that still show anyhow::Result or
    Result<TranscriptionResult, _> to the actual Result<TranscriptionResult,
    AudioRejectionReason> shape, and change health_check() return documentation from
    bool to the implemented Result<(), String>; ensure all occurrences (including
    the other referenced sections) reference the concrete types and error enum
    AudioRejectionReason and the TranscriptionResult struct as implemented.

In @openspec/changes/archive/2026-04-03-audio-input-support/proposal.md:

  • Line 39: Update the stale count "6 error types" in the proposal where the
    phrase "unsupported format, too large, too long, corrupted, transcription
    failed, no speech" appears—either remove the numeric count or replace it with an
    accurate description that includes the additional cases (disabled, channel,
    transcriber, system) implemented in the PR so the taxonomy in the text matches
    the code/implementation; ensure the phrase describing error types reflects the
    full set or uses non-numeric language like "the following error types" to avoid
    future drift.

In
@openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md:

  • Around line 632-647: The spec omits the new
    AudioRejectionReason::MultipleAudioParts variant; add a 12th row to the
    rejection table for MultipleAudioParts with a clear user-facing message (e.g.,
    "Please send only one audio file at a time.") and an "Emitted When" description
    like "More than one audio attachment is present / runtime rejects multiple audio
    parts", ensuring the taxonomy is updated from 11 to 12 variants so the archived
    spec matches the shipped enum and runtime behavior; reference
    AudioRejectionReason::MultipleAudioParts and update the surrounding text
    asserting exhaustiveness (REQ-11) to reflect 12 variants.

In @openspec/specs/audio-input/spec.md:

  • Around line 404-415: Update the spec text for the health_check() scenarios to
    describe the resolved whisper model lookup order rather than a single hard-coded
    path: list the precedence used by the implementation (explicit configured path,
    user home path like ~/.corvus/models/whisper/{model}.bin, then
    system/package-managed locations) or reference the resolved path returned by the
    health_check() logic; ensure the "unhealthy" scenario expects an Err(String)
    that names the resolved path it tried. Apply the same wording change to the
    other affected section referenced (the block around lines 892-915) so
    package-managed installs and the documented system fallback are covered
    consistently.
  • Around line 97-115: Add and document an early fail-closed check for multiple
    audio parts: before the 7-step pipeline in process_channel_message() (i.e.,
    prior to Parse/extract_user_text()), detect if more than one ContentPart::Audio
    is present and immediately reject with AudioRejectionReason::MultipleAudioParts
    and emit an AudioIngressEvent; update REQ-2 to state this early rejection and
    update the REQ-11 taxonomy/table to include the MultipleAudioParts reason and
    its human-readable message so the docs match the runtime behavior (also apply
    the same insertion/update in the corresponding section around lines 632-647).
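
The lookup order described in the first item above can be sketched as a resolver over an ordered candidate list. The function name `resolve_model_path` and the error wording are assumptions; the caller would supply the candidates in precedence order (explicit configured path, user home path, system locations).

```rust
use std::path::PathBuf;

/// Returns the first candidate that is a regular file. On failure, the error
/// names every path that was tried, in the order it was tried.
pub fn resolve_model_path(candidates: &[PathBuf]) -> Result<PathBuf, String> {
    for candidate in candidates {
        if candidate.is_file() {
            return Ok(candidate.clone());
        }
    }
    let tried: Vec<String> = candidates
        .iter()
        .map(|p| p.display().to_string())
        .collect();
    Err(format!("whisper model not found; tried: {}", tried.join(", ")))
}
```

Listing the tried paths in the error lets the spec's "unhealthy" scenario assert on the resolved locations without hard-coding a single path.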

Duplicate comments:
In @clients/agent-runtime/src/channels/mod.rs:

  • Around line 1123-1125: The match currently maps
    audio_media::AudioRejectionReason::MultipleAudioParts to
    AudioIngressReason::SystemError, collapsing validation failures with real
    faults; add a new AudioIngressReason variant (e.g., MultipleAudioParts) to the
    AudioIngressReason enum and update the match in the code handling
    audio_media::AudioRejectionReason so MultipleAudioParts maps to the new
    AudioIngressReason::MultipleAudioParts; ensure any places that construct or
    pattern-match AudioIngressEvent/AudioIngressReason are updated to handle the new
    variant so telemetry distinguishes validation rejection from system errors.

In @openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md:

  • Around line 24-37: The markdown has missing blank lines around fenced code
    blocks and after two headings; add a blank line before and after each
    triple-backtick fence that wraps the "cargo check --manifest-path
    clients/agent-runtime/Cargo.toml → Finished dev profile" and the "cargo clippy
    ..." and the "All test suites pass: ..." code blocks, and insert a blank line
    immediately below the "Anti-Patterns Check" and "Code Style" headings so each
    heading is followed by an empty line; update the verify-report.md content
    accordingly to satisfy MD031/MD022.

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Path: .coderabbit.yaml

**Review profile**: ASSERTIVE

**Plan**: Pro

**Run ID**: `3d6a679b-6c7c-46da-983f-3bf8712ead66`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between c2f63419f794663da8ed48e9ffea658da3c5dbac and aa848be8752cee1ea4c499f8b9e299899d19a998.

</details>

<details>
<summary>📒 Files selected for processing (19)</summary>

* `.claude/settings.local.json`
* `clients/agent-runtime/src/channels/audio_media.rs`
* `clients/agent-runtime/src/channels/discord.rs`
* `clients/agent-runtime/src/channels/mod.rs`
* `clients/agent-runtime/src/channels/telegram.rs`
* `clients/agent-runtime/src/config/schema.rs`
* `clients/agent-runtime/src/doctor/mod.rs`
* `clients/agent-runtime/src/observability/otel.rs`
* `clients/agent-runtime/src/observability/prometheus.rs`
* `clients/agent-runtime/src/providers/traits.rs`
* `clients/agent-runtime/src/transcription/traits.rs`
* `clients/agent-runtime/src/transcription/whisper_cli.rs`
* `openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md`
* `openspec/changes/archive/2026-04-03-audio-input-support/design.md`
* `openspec/changes/archive/2026-04-03-audio-input-support/exploration.md`
* `openspec/changes/archive/2026-04-03-audio-input-support/proposal.md`
* `openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md`
* `openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md`
* `openspec/specs/audio-input/spec.md`

</details>

</details>

<details>
<summary>📜 Review details</summary>

<details>
<summary>⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)</summary>

* GitHub Check: sonar
* GitHub Check: pr-checks
* GitHub Check: submit-gradle
* GitHub Check: Cloudflare Pages

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>📓 Path-based instructions (8)</summary>

<details>
<summary>clients/agent-runtime/src/channels/**/*.rs</summary>


**📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)**

> Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests

Files:
- `clients/agent-runtime/src/channels/discord.rs`
- `clients/agent-runtime/src/channels/mod.rs`
- `clients/agent-runtime/src/channels/telegram.rs`
- `clients/agent-runtime/src/channels/audio_media.rs`

</details>
<details>
<summary>clients/agent-runtime/src/**/*.rs</summary>


**📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)**

> `clients/agent-runtime/src/**/*.rs`: Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements
> Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency

Files:
- `clients/agent-runtime/src/channels/discord.rs`
- `clients/agent-runtime/src/observability/prometheus.rs`
- `clients/agent-runtime/src/observability/otel.rs`
- `clients/agent-runtime/src/transcription/traits.rs`
- `clients/agent-runtime/src/providers/traits.rs`
- `clients/agent-runtime/src/doctor/mod.rs`
- `clients/agent-runtime/src/transcription/whisper_cli.rs`
- `clients/agent-runtime/src/channels/mod.rs`
- `clients/agent-runtime/src/channels/telegram.rs`
- `clients/agent-runtime/src/config/schema.rs`
- `clients/agent-runtime/src/channels/audio_media.rs`

</details>
<details>
<summary>clients/agent-runtime/**/*.rs</summary>


**📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)**

> Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why

Files:
- `clients/agent-runtime/src/channels/discord.rs`
- `clients/agent-runtime/src/observability/prometheus.rs`
- `clients/agent-runtime/src/observability/otel.rs`
- `clients/agent-runtime/src/transcription/traits.rs`
- `clients/agent-runtime/src/providers/traits.rs`
- `clients/agent-runtime/src/doctor/mod.rs`
- `clients/agent-runtime/src/transcription/whisper_cli.rs`
- `clients/agent-runtime/src/channels/mod.rs`
- `clients/agent-runtime/src/channels/telegram.rs`
- `clients/agent-runtime/src/config/schema.rs`
- `clients/agent-runtime/src/channels/audio_media.rs`

</details>
<details>
<summary>**/*.rs</summary>


**⚙️ CodeRabbit configuration file**

> `**/*.rs`: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
> Flag unnecessary clones, unchecked panics in production paths, and weak error context.
> Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.
> 

Files:
- `clients/agent-runtime/src/channels/discord.rs`
- `clients/agent-runtime/src/observability/prometheus.rs`
- `clients/agent-runtime/src/observability/otel.rs`
- `clients/agent-runtime/src/transcription/traits.rs`
- `clients/agent-runtime/src/providers/traits.rs`
- `clients/agent-runtime/src/doctor/mod.rs`
- `clients/agent-runtime/src/transcription/whisper_cli.rs`
- `clients/agent-runtime/src/channels/mod.rs`
- `clients/agent-runtime/src/channels/telegram.rs`
- `clients/agent-runtime/src/config/schema.rs`
- `clients/agent-runtime/src/channels/audio_media.rs`

</details>
<details>
<summary>**/*</summary>


**⚙️ CodeRabbit configuration file**

> `**/*`: Security first, performance second.
> Validate input boundaries, auth/authz implications, and secret management.
> Look for behavioral regressions, missing tests, and contract breaks across modules.
> 

Files:
- `clients/agent-runtime/src/channels/discord.rs`
- `clients/agent-runtime/src/observability/prometheus.rs`
- `clients/agent-runtime/src/observability/otel.rs`
- `openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md`
- `openspec/changes/archive/2026-04-03-audio-input-support/proposal.md`
- `clients/agent-runtime/src/transcription/traits.rs`
- `openspec/changes/archive/2026-04-03-audio-input-support/exploration.md`
- `clients/agent-runtime/src/providers/traits.rs`
- `openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md`
- `openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md`
- `openspec/specs/audio-input/spec.md`
- `clients/agent-runtime/src/doctor/mod.rs`
- `clients/agent-runtime/src/transcription/whisper_cli.rs`
- `clients/agent-runtime/src/channels/mod.rs`
- `openspec/changes/archive/2026-04-03-audio-input-support/design.md`
- `clients/agent-runtime/src/channels/telegram.rs`
- `clients/agent-runtime/src/config/schema.rs`
- `clients/agent-runtime/src/channels/audio_media.rs`

</details>
<details>
<summary>**/*.{md,mdx}</summary>


**⚙️ CodeRabbit configuration file**

> `**/*.{md,mdx}`: Verify technical accuracy and that docs stay aligned with code changes.
> For user-facing docs, check EN/ES parity or explicitly note pending translation gaps.
> 

Files:
- `openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md`
- `openspec/changes/archive/2026-04-03-audio-input-support/proposal.md`
- `openspec/changes/archive/2026-04-03-audio-input-support/exploration.md`
- `openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md`
- `openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md`
- `openspec/specs/audio-input/spec.md`
- `openspec/changes/archive/2026-04-03-audio-input-support/design.md`

</details>
<details>
<summary>clients/agent-runtime/src/providers/**/*.rs</summary>


**📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)**

> Implement `Provider` trait in `src/providers/` and register in `src/providers/mod.rs` factory when adding a new provider

Files:
- `clients/agent-runtime/src/providers/traits.rs`

</details>
<details>
<summary>clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs</summary>


**📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)**

> Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable

Files:
- `clients/agent-runtime/src/config/schema.rs`

</details>

</details><details>
<summary>🧠 Learnings (7)</summary>

<details>
<summary>📚 Learning: 2026-02-17T12:31:17.076Z</summary>

Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths


**Applied to files:**
- `clients/agent-runtime/src/channels/discord.rs`
- `openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md`
- `openspec/specs/audio-input/spec.md`
- `clients/agent-runtime/src/doctor/mod.rs`
- `clients/agent-runtime/src/transcription/whisper_cli.rs`
- `clients/agent-runtime/src/config/schema.rs`

</details>
<details>
<summary>📚 Learning: 2026-02-17T12:31:17.076Z</summary>

Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/providers/**/*.rs : Implement Provider trait in src/providers/ and register in src/providers/mod.rs factory when adding a new provider


**Applied to files:**
- `clients/agent-runtime/src/transcription/traits.rs`

</details>
<details>
<summary>📚 Learning: 2026-02-17T12:31:17.076Z</summary>

Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement Channel trait in src/channels/ with consistent send, listen, and health_check semantics and cover auth/allowlist/health behavior with tests


**Applied to files:**
- `clients/agent-runtime/src/transcription/traits.rs`
- `clients/agent-runtime/src/providers/traits.rs`
- `openspec/specs/audio-input/spec.md`
- `clients/agent-runtime/src/channels/mod.rs`
- `clients/agent-runtime/src/channels/telegram.rs`
- `clients/agent-runtime/src/channels/audio_media.rs`

</details>
<details>
<summary>📚 Learning: 2026-02-17T12:31:17.076Z</summary>

Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/*.rs : Run cargo fmt --all -- --check, cargo clippy --all-targets -- -D warnings, and cargo test for code validation, or document which checks were skipped and why


**Applied to files:**
- `clients/agent-runtime/src/providers/traits.rs`
- `openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md`
- `clients/agent-runtime/src/doctor/mod.rs`
- `openspec/changes/archive/2026-04-03-audio-input-support/design.md`
- `clients/agent-runtime/src/config/schema.rs`
- `clients/agent-runtime/src/channels/audio_media.rs`

</details>
<details>
<summary>📚 Learning: 2026-02-17T12:31:17.076Z</summary>

Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/.github/**/*.{yml,yaml} : For workflow/template-only changes, ensure YAML/template syntax validity


**Applied to files:**
- `openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md`

</details>
<details>
<summary>📚 Learning: 2026-02-17T12:31:17.076Z</summary>

Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools}/**/*.rs : Treat src/security/, src/gateway/, src/tools/ as high-risk surfaces and never broaden filesystem/network execution scope without explicit policy checks


**Applied to files:**
- `clients/agent-runtime/src/channels/telegram.rs`
- `clients/agent-runtime/src/config/schema.rs`

</details>
<details>
<summary>📚 Learning: 2026-02-17T12:31:17.076Z</summary>

Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs : Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable


**Applied to files:**
- `clients/agent-runtime/src/config/schema.rs`

</details>

</details><details>
<summary>🪛 LanguageTool</summary>

<details>
<summary>openspec/changes/archive/2026-04-03-audio-input-support/exploration.md</summary>

[style] ~181-~181: Consider using a different adverb to strengthen your wording.
Context: ...) and audio files (`audio` field) are **completely ignored** — messages with only voice/au...

(COMPLETELY_ENTIRELY)

</details>
<details>
<summary>openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md</summary>

[style] ~399-~399: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...tion failure — process timeout  - GIVEN a very large audio file near the duration limit - WH...

(EN_WEAK_ADJECTIVE)

---

[grammar] ~482-~482: Use a hyphen to join words.
Context: ...tartup validation error indicating the 1 hour ceiling  #### Scenario: Missing aud...

(QB_NEW_EN_HYPHEN)

---

[style] ~841-~841: As an alternative to the over-used intensifier ‘extremely’, consider replacing this phrase.
Context: ...t, including: - Zero-byte audio files - Extremely large files (rejected by size limit) - Files ...

(EN_WEAK_ADJECTIVE)

---

[locale-violation] ~850-~850: In American English, ‘afterward’ is the preferred variant. ‘Afterwards’ is more commonly used in British English and other dialects.
Context: ... same user sends a text message "hello" afterwards - THEN the text message is processed no...

(AFTERWARDS_US)

</details>
<details>
<summary>openspec/specs/audio-input/spec.md</summary>

[style] ~399-~399: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...tion failure — process timeout  - GIVEN a very large audio file near the duration limit - WH...

(EN_WEAK_ADJECTIVE)

---

[grammar] ~482-~482: Use a hyphen to join words.
Context: ...tartup validation error indicating the 1 hour ceiling  #### Scenario: Missing aud...

(QB_NEW_EN_HYPHEN)

---

[style] ~841-~841: As an alternative to the over-used intensifier ‘extremely’, consider replacing this phrase.
Context: ...t, including: - Zero-byte audio files - Extremely large files (rejected by size limit) - Files ...

(EN_WEAK_ADJECTIVE)

---

[locale-violation] ~850-~850: In American English, ‘afterward’ is the preferred variant. ‘Afterwards’ is more commonly used in British English and other dialects.
Context: ... same user sends a text message "hello" afterwards - THEN the text message is processed no...

(AFTERWARDS_US)

</details>
<details>
<summary>openspec/changes/archive/2026-04-03-audio-input-support/design.md</summary>

[style] ~942-~942: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...nges. - No provider contract changes. - No existing behavior modified. - Rollout: ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

</details>

</details>
<details>
<summary>🪛 markdownlint-cli2 (0.22.0)</summary>

<details>
<summary>openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md</summary>

[warning] 25-25: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

---

[warning] 30-30: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

---

[warning] 35-35: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

---

[warning] 313-313: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

---

[warning] 320-320: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

</details>

</details>

</details>

<details>
<summary>🔇 Additional comments (10)</summary><blockquote>

<details>
<summary>clients/agent-runtime/src/channels/discord.rs (1)</summary><blockquote>

`955-955`: **LGTM - test diagnostic improved.**

The panic message now accurately reflects the unexpected variant, addressing the prior feedback. Good fix.

</blockquote></details>
<details>
<summary>clients/agent-runtime/src/observability/otel.rs (1)</summary><blockquote>

`209-223`: **Audio ingress OTEL handling is now correctly wired.**

Line 209 no longer drops `ObserverEvent::AudioIngress`, and Line 215 emits the counter with structured attributes.

</blockquote></details>
<details>
<summary>clients/agent-runtime/src/observability/prometheus.rs (1)</summary><blockquote>

`120-131`: **Prometheus audio ingress metrics are correctly added.**

The new counter registration and the `ObserverEvent::AudioIngress` handler cover the previously missing telemetry path.



Also applies to: 205-215

</blockquote></details>
<details>
<summary>openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md (1)</summary><blockquote>

`4-7`: **Archive metadata looks aligned and corrected.**

Issue linking and archive metadata are consistent with the PR context.

</blockquote></details>
<details>
<summary>clients/agent-runtime/src/transcription/traits.rs (1)</summary><blockquote>

`35-45`: **Transcriber contract is well-shaped for pipeline error mapping.**

Returning `AudioRejectionReason` from `transcribe()` and structured readiness errors from `health_check()` keeps ingress handling deterministic.

</blockquote></details>
<details>
<summary>clients/agent-runtime/src/channels/telegram.rs (4)</summary><blockquote>

`1-1`: **LGTM!**

Import aligns with the new audio_media module structure.

---

`63-107`: **LGTM!**

Voice/audio parsing correctly mirrors the image parsing pattern. The forced `audio/ogg` for voice notes matches Telegram's spec.

---

`749-776`: **LGTM!**

Correctly addresses the unauthorized media-only message handling gap. Senders now receive notification regardless of whether they sent text or audio-only content.

---

`3114-3114`: **LGTM!**

Test assertion pattern change is fine.

</blockquote></details>
<details>
<summary>openspec/changes/archive/2026-04-03-audio-input-support/exploration.md (1)</summary><blockquote>

`1-463`: **LGTM!**

Documentation accurately reflects the implementation design. The markdown formatting issues from previous review (MD040/MD022) have been addressed — code blocks are labeled and headings have proper spacing.

</blockquote></details>

</blockquote></details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Comment thread .claude/settings.local.json Outdated
Comment thread clients/agent-runtime/src/channels/audio_media.rs Outdated
Comment thread clients/agent-runtime/src/channels/audio_media.rs
Comment thread clients/agent-runtime/src/channels/mod.rs
Comment on lines +1479 to +1483
let injected_text = if caption_text.is_some() {
    format!("[Audio transcription]: {trimmed}")
} else {
    format!("[Voice message transcription]: {trimmed}")
};

⚠️ Potential issue | 🟠 Major

Inject the transcript text itself, not a synthetic label.

Prefixing the transcript with [Voice message transcription]: / [Audio transcription]: changes what memory, pre-execution checks, and the provider see, so audio input is no longer equivalent to the typed user text. The audio provenance is already preserved in AudioHistoryMeta.

✂️ Suggested change
-                    let injected_text = if caption_text.is_some() {
-                        format!("[Audio transcription]: {trimmed}")
-                    } else {
-                        format!("[Voice message transcription]: {trimmed}")
-                    };
+                    let injected_text = trimmed.clone();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/channels/mod.rs` around lines 1479 - 1483, The
injected_text currently prefixes the transcript with a synthetic label depending
on caption_text, which alters downstream behavior; instead set injected_text to
the transcript content itself (the trimmed string) without any “[Voice message
transcription]”/“[Audio transcription]” prefix, removing the conditional
formatting logic around caption_text and leaving audio provenance to
AudioHistoryMeta so downstream memory/pre-execution checks and providers see the
exact transcribed text.

Comment on lines +829 to +850
```rust
fn check_audio_config(config: &AudioConfig) -> Vec<DoctorWarning> {
    let mut warnings = Vec::new();
    if config.enabled {
        // 1. Check whisper binary is in PATH or at configured path
        if which::which(&config.whisper_binary).is_err() {
            warnings.push(DoctorWarning::new(
                "audio",
                format!("whisper binary '{}' not found in PATH", config.whisper_binary),
            ));
        }
        // 2. Check model file exists
        let model_path = resolve_model_path(&config.transcription_model);
        if !model_path.exists() {
            warnings.push(DoctorWarning::new(
                "audio",
                format!("Whisper model not found at {}", model_path.display()),
            ));
        }
    }
    warnings
}
```

⚠️ Potential issue | 🟡 Minor

Doctor API snippet is stale relative to current module.

The snippet references check_audio_config, DoctorWarning, and model_path.exists(); the implemented path is check_audio_health with DiagItem and file check semantics. Please refresh this block to avoid misleading future changes.

As per coding guidelines **/*.{md,mdx}: “Verify technical accuracy and that docs stay aligned with code changes.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openspec/changes/archive/2026-04-03-audio-input-support/design.md` around
lines 829 - 850, The doc snippet is stale: replace the old
check_audio_config/DoctorWarning logic with the current
check_audio_health/DiagItem semantics—call check_audio_health (or rename the
snippet) and produce DiagItem entries instead of DoctorWarning, using the
module's file-existence check used elsewhere (e.g., check_model_file_exists or
model_path.is_file() rather than model_path.exists()) and update messages to
match DiagItem fields (kind/source "audio", diagnostic message, and appropriate
severity/category). Also update references from whisper_binary/whisper model
checks to the actual identifiers used by check_audio_health (e.g.,
resolve_model_path -> the current resolver) so the doc mirrors the real function
names and return structure.

- Audio observability events (`AudioIngressEvent`, `on_audio_ingress()`)
- Audio history metadata (`AudioHistoryMeta` with transcription text)
- `StagedAudioGuard` RAII cleanup
- 6 error types: unsupported format, too large, too long, corrupted, transcription failed, no speech

⚠️ Potential issue | 🟡 Minor

Fix rejection taxonomy count in proposal.

Line 39 states “6 error types”, but this PR’s documented/implemented taxonomy is larger (including disabled/channel/transcriber/system cases). Update this line to avoid stale requirements language.

✏️ Proposed doc fix
-- 6 error types: unsupported format, too large, too long, corrupted, transcription failed, no speech
+- 11+ rejection/error categories covering format, size/duration, corruption, transcription/no-speech, and runtime availability/system failures

As per coding guidelines **/*.{md,mdx}: “Verify technical accuracy and that docs stay aligned with code changes.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openspec/changes/archive/2026-04-03-audio-input-support/proposal.md` at line
39, Update the stale count "6 error types" in the proposal where the phrase
"unsupported format, too large, too long, corrupted, transcription failed, no
speech" appears—either remove the numeric count or replace it with an accurate
description that includes the additional cases (disabled, channel, transcriber,
system) implemented in the PR so the taxonomy in the text matches the
code/implementation; ensure the phrase describing error types reflects the full
set or uses non-numeric language like "the following error types" to avoid
future drift.

Comment on lines +632 to +647
| Rejection Reason | User-Facing Message | Emitted When |
|-------------------------|--------------------------------------------------------------------------------------|-----------------------------------------------------------|
| `Disabled` | "Audio input is currently disabled." | `audio.enabled` is `false` |
| `ChannelNotAllowed` | "Audio input is not enabled for this channel." | Channel not in `audio.allowed_channels` |
| `FetchFailed` | "I couldn't download that audio safely. Please try again." | Channel fetch fails (network, auth, timeout) |
| `MimeRejected` | "That audio format is not supported. Supported formats: OGG, MP3, WAV, M4A." | Magic-byte sniffing does not match allowed formats |
| `Oversize` | "That audio file is too large to process. Maximum size: 25 MB." | Audio bytes exceed effective size limit |
| `TooLong` | "That audio is too long to process. Maximum duration: 10 minutes." | Duration exceeds effective duration limit |
| `Corrupted` | "That audio file appears to be corrupted and cannot be processed." | Transcription engine cannot decode the audio |
| `TranscriptionFailed` | "Audio transcription failed. Please try again or send text instead." | Transcriber returns error (process crash, timeout, etc.) |
| `NoSpeechDetected` | "No speech was detected in that audio. Please try again with a clearer recording." | Transcription produces empty/whitespace-only text |
| `TranscriberUnavailable`| "Audio transcription is not available on this agent. Please send text instead." | No healthy Transcriber is registered or health check fails|
| `SystemError` | "An internal error occurred while processing audio. Please try again." | Unexpected internal error (e.g., temp file I/O failure, semaphore poisoning) |

This taxonomy (11 variants) MUST be exhaustive for Phase 1. Every audio rejection MUST map to
exactly one of these reasons.

⚠️ Potential issue | 🟡 Minor

Document the MultipleAudioParts rejection.

REQ-11 still declares an exhaustive 11-variant taxonomy, but the runtime now rejects multiple audio attachments with AudioRejectionReason::MultipleAudioParts and a dedicated user-facing message. Please add that 12th variant here, along with its ingress-reason behavior, so the archived spec matches the shipped contract.

As per coding guidelines, "Verify technical accuracy and that docs stay aligned with code changes."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md`
around lines 632 - 647, The spec omits the new
AudioRejectionReason::MultipleAudioParts variant; add a 12th row to the
rejection table for `MultipleAudioParts` with a clear user-facing message (e.g.,
"Please send only one audio file at a time.") and an "Emitted When" description
like "More than one audio attachment is present / runtime rejects multiple audio
parts", ensuring the taxonomy is updated from 11 to 12 variants so the archived
spec matches the shipped enum and runtime behavior; reference
AudioRejectionReason::MultipleAudioParts and update the surrounding text
asserting exhaustiveness (REQ-11) to reflect 12 variants.
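
A minimal sketch of how the 12-variant taxonomy could be kept exhaustive in code. The variant names and the first eleven messages are taken from the table above; the `MultipleAudioParts` wording is the example suggested in this comment, not a confirmed shipped string, and `ALL` and `user_message` are hypothetical helper names. The size and duration messages are shown with the table's fixed limits, whereas the runtime presumably formats them from config.

```rust
/// Sketch of the rejection taxonomy with the 12th variant added.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AudioRejectionReason {
    Disabled,
    ChannelNotAllowed,
    FetchFailed,
    MimeRejected,
    Oversize,
    TooLong,
    Corrupted,
    TranscriptionFailed,
    NoSpeechDetected,
    TranscriberUnavailable,
    SystemError,
    MultipleAudioParts,
}

impl AudioRejectionReason {
    /// Hypothetical helper: every variant, so docs and tests can count them.
    pub const ALL: [AudioRejectionReason; 12] = [
        Self::Disabled,
        Self::ChannelNotAllowed,
        Self::FetchFailed,
        Self::MimeRejected,
        Self::Oversize,
        Self::TooLong,
        Self::Corrupted,
        Self::TranscriptionFailed,
        Self::NoSpeechDetected,
        Self::TranscriberUnavailable,
        Self::SystemError,
        Self::MultipleAudioParts,
    ];

    /// The match has no wildcard arm, so adding a variant without a
    /// message is a compile error rather than a silent gap.
    pub fn user_message(self) -> &'static str {
        match self {
            Self::Disabled => "Audio input is currently disabled.",
            Self::ChannelNotAllowed => "Audio input is not enabled for this channel.",
            Self::FetchFailed => "I couldn't download that audio safely. Please try again.",
            Self::MimeRejected => {
                "That audio format is not supported. Supported formats: OGG, MP3, WAV, M4A."
            }
            Self::Oversize => "That audio file is too large to process. Maximum size: 25 MB.",
            Self::TooLong => "That audio is too long to process. Maximum duration: 10 minutes.",
            Self::Corrupted => "That audio file appears to be corrupted and cannot be processed.",
            Self::TranscriptionFailed => {
                "Audio transcription failed. Please try again or send text instead."
            }
            Self::NoSpeechDetected => {
                "No speech was detected in that audio. Please try again with a clearer recording."
            }
            Self::TranscriberUnavailable => {
                "Audio transcription is not available on this agent. Please send text instead."
            }
            Self::SystemError => {
                "An internal error occurred while processing audio. Please try again."
            }
            // Wording suggested in the review, not confirmed against the runtime.
            Self::MultipleAudioParts => "Please send only one audio file at a time.",
        }
    }
}
```

Because the match is exhaustive, the spec table and the enum can be checked against each other by counting `ALL`.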

Comment on lines +97 to +115
The runtime MUST process every inbound audio through a 7-step pipeline inserted into
`process_channel_message()` before `extract_user_text()` and `enrich_with_memory()`:

1. **Parse**: Channel extracts audio metadata into `ContentPart::Audio` (REQ-1)
2. **Gate config**: Check `[audio]` config — `enabled` and `allowed_channels` (REQ-7)
3. **Fetch**: Download audio bytes from the channel's platform API (REQ-10)
4. **Validate**: Apply MIME sniffing, size limit, and duration limit (REQ-3, REQ-4)
5. **Stage**: Write validated bytes to temp file as `StagedAudio`, protected by `StagedAudioGuard`
RAII cleanup (REQ-5)
6. **Transcribe**: Invoke `Transcriber::transcribe()` to produce text (REQ-6)
7. **Inject**: Replace `ContentPart::Audio` with `ContentPart::Text` containing the transcription;
store `AudioHistoryMeta` (REQ-8)

After injection, the message continues through the normal text-only flow (`enrich_with_memory()` →
`run_unified_channel_tool_loop()` → provider). The provider MUST NOT receive audio bytes or any
audio-specific payload.

The pipeline MUST be fail-closed: any step that cannot be completed MUST reject the audio with an
appropriate `AudioRejectionReason` and emit an `AudioIngressEvent`.

⚠️ Potential issue | 🟠 Major

Document the multiple-audio rejection contract.

Issue #246 is single-audio-per-message, and the runtime already has AudioRejectionReason::MultipleAudioParts, but REQ-2 still reads as if every inbound audio proceeds into the 7-step pipeline and REQ-11 says the 11-row table is exhaustive. Please add the early rejection when more than one audio part is present and include the missing reason/message in the taxonomy.

As per coding guidelines, "Verify technical accuracy and that docs stay aligned with code changes."

Also applies to: 632-647

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openspec/specs/audio-input/spec.md` around lines 97 - 115, Add and document
an early fail-closed check for multiple audio parts: before the 7-step pipeline
in process_channel_message() (i.e., prior to Parse/extract_user_text()), detect
if more than one ContentPart::Audio is present and immediately reject with
AudioRejectionReason::MultipleAudioParts and emit an AudioIngressEvent; update
REQ-2 to state this early rejection and update the REQ-11 taxonomy/table to
include the MultipleAudioParts reason and its human-readable message so the docs
match the runtime behavior (also apply the same insertion/update in the
corresponding section around lines 632-647).
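
The early check described above could look roughly like the following sketch. `ContentPart` and `AudioRejectionReason` are reduced to the minimum needed here, and `check_single_audio` is a hypothetical name, not the runtime's actual function.

```rust
/// Reduced stand-in for the runtime's multimodal message parts.
#[derive(Debug, Clone, PartialEq)]
pub enum ContentPart {
    Text(String),
    Audio { mime: String },
}

/// Only the variant relevant to this check is modelled.
#[derive(Debug, PartialEq, Eq)]
pub enum AudioRejectionReason {
    MultipleAudioParts,
}

/// Returns the single audio part (if any), or rejects the message when
/// more than one audio part is present. Intended to run before the
/// fetch, validate, stage and transcribe steps, so nothing is downloaded
/// for a message that will be rejected anyway.
pub fn check_single_audio(
    parts: &[ContentPart],
) -> Result<Option<&ContentPart>, AudioRejectionReason> {
    let mut audio = parts
        .iter()
        .filter(|part| matches!(part, ContentPart::Audio { .. }));
    let first = audio.next();
    if audio.next().is_some() {
        return Err(AudioRejectionReason::MultipleAudioParts);
    }
    Ok(first)
}
```

In the real pipeline the `Err` branch would also emit an `AudioIngressEvent`, which is omitted here.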

Comment on lines +404 to +415
#### Scenario: Health check — healthy

- GIVEN whisper binary exists at the configured path
- AND the configured model file exists at `~/.corvus/models/whisper/{model}.bin`
- WHEN `health_check()` is called
- THEN it MUST return `Ok(())`

#### Scenario: Health check — unhealthy (missing model)

- GIVEN whisper binary exists but the configured model file does not exist
- WHEN `health_check()` is called
- THEN it MUST return `Err(String)` with a descriptive message about the missing model

⚠️ Potential issue | 🟡 Minor

Describe the resolved whisper model path, not a single hard-coded location.

The implementation notes a system model-path fallback, but these sections only allow ~/.corvus/models/whisper/{model}.bin. Please document the actual lookup order or refer to the resolved model path so package-managed installs remain in spec.

As per coding guidelines, "Verify technical accuracy and that docs stay aligned with code changes."

Also applies to: 892-915

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openspec/specs/audio-input/spec.md` around lines 404 - 415, Update the spec
text for the health_check() scenarios to describe the resolved whisper model
lookup order rather than a single hard-coded path: list the precedence used by
the implementation (explicit configured path, user home path like
~/.corvus/models/whisper/{model}.bin, then system/package-managed locations) or
reference the resolved path returned by the health_check() logic; ensure the
"unhealthy" scenario expects an Err(String) that names the resolved path it
tried. Apply the same wording change to the other affected section referenced
(the block around lines 892-915) so package-managed installs and the documented
system fallback are covered consistently.
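
To illustrate the lookup order the comment asks the spec to document, here is a hedged sketch. The function name, its signature, and the injected `is_file` predicate are assumptions made for illustration; the real resolver and its system fallback locations are not shown in this thread.

```rust
use std::path::{Path, PathBuf};

/// Tries an explicitly configured path first, then each fallback in
/// order (for example the per-user path followed by package-managed
/// locations). The `is_file` predicate is injected so the order can be
/// tested without touching the filesystem; production code would pass
/// `|p| p.is_file()`.
pub fn resolve_model_path(
    explicit: Option<&Path>,
    fallbacks: &[PathBuf],
    is_file: impl Fn(&Path) -> bool,
) -> Result<PathBuf, String> {
    let candidates: Vec<&Path> = explicit
        .into_iter()
        .chain(fallbacks.iter().map(PathBuf::as_path))
        .collect();

    for candidate in candidates.iter().copied() {
        if is_file(candidate) {
            return Ok(candidate.to_path_buf());
        }
    }

    // Name every path that was tried so the error is actionable.
    let tried: Vec<String> = candidates
        .iter()
        .map(|path| path.display().to_string())
        .collect();
    Err(format!("whisper model not found; tried: {}", tried.join(", ")))
}
```

Returning the tried paths in the error matches the request that the unhealthy scenario name the resolved path rather than a single hard-coded location.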

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (5)
clients/agent-runtime/src/channels/mod.rs (3)

640-689: ⚠️ Potential issue | 🟠 Major

The turn timeout still starts after audio work.

Moving started_at earlier only changes logging. The actual tokio::time::timeout(...) is still created at Line 775, after gating, fetch/staging, and transcription complete, so slow downloads or transcriber queue waits can still bypass CHANNEL_MESSAGE_TIMEOUT_SECS. Start one deadline before the audio branch and spend the remaining budget inside the audio helpers.

Also applies to: 775-791

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/channels/mod.rs` around lines 640 - 689, The
timeout/deadline must be created before doing audio work so slow gating, staging
or transcription can't bypass CHANNEL_MESSAGE_TIMEOUT_SECS; move creation of the
deadline/timeout (currently using started_at + CHANNEL_MESSAGE_TIMEOUT_SECS and
tokio::time::timeout(...)) to immediately after computing session_id/started_at,
then thread the remaining time/deadline into the audio helpers
(gate_audio_config, gate_and_stage_audio, transcribe_audio) or wrap those calls
with a timeout using the precomputed remaining Duration so they honor the same
CHANNEL_MESSAGE_TIMEOUT_SECS budget.
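
One way to share a single budget across the audio stages and the tool loop is a small deadline value created once per turn. This is a std-only sketch under assumed names (`TurnDeadline`, `start`, `remaining`); it is not the runtime's code.

```rust
use std::time::{Duration, Instant};

/// A fixed point in time by which the whole turn must finish.
#[derive(Debug, Clone, Copy)]
pub struct TurnDeadline {
    at: Instant,
}

impl TurnDeadline {
    /// Start the clock before any gating, fetch, or transcription work.
    pub fn start(budget: Duration) -> Self {
        Self { at: Instant::now() + budget }
    }

    /// Time left in the budget, or `None` once it is exhausted.
    /// Each stage asks for the remaining time instead of receiving a
    /// fresh full-length timeout.
    pub fn remaining(&self) -> Option<Duration> {
        let left = self.at.saturating_duration_since(Instant::now());
        if left.is_zero() {
            None
        } else {
            Some(left)
        }
    }
}
```

In the async code path, each audio helper call and the later tool loop could then be wrapped in `tokio::time::timeout(remaining, future)` using the value from `remaining()`, so a slow download shortens the time left for transcription and the provider call.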

1479-1483: ⚠️ Potential issue | 🟠 Major

Inject the transcript text verbatim.

These prefixes change what memory, pre-execution checks, and the provider see, so audio input is no longer equivalent to typed input. The provenance already lives in AudioHistoryMeta.

Suggested change
-                    let injected_text = if caption_text.is_some() {
-                        format!("[Audio transcription]: {trimmed}")
-                    } else {
-                        format!("[Voice message transcription]: {trimmed}")
-                    };
+                    let injected_text = trimmed.clone();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/channels/mod.rs` around lines 1479 - 1483, The
injected transcript currently prepends a prefix based on caption_text (the
injected_text construction) which changes how downstream memory and providers
treat audio vs typed input; remove those prefixes and inject the transcript
verbatim (use the trimmed transcript string directly) so provenance remains in
AudioHistoryMeta and the transcript is equivalent to typed input; update the
code that builds injected_text in channels/mod.rs (the block referencing
caption_text and injected_text) to assign the plain trimmed text without "[Audio
transcription]" or "[Voice message transcription]" prefixes.

1102-1125: ⚠️ Potential issue | 🟠 Major

Preserve MultipleAudioParts in observability.

This mapping still collapses a known validation failure into AudioIngressReason::SystemError, so dashboards and alerts cannot distinguish “one audio per message” rejections from real runtime failures. Add a dedicated observability reason and map it through here instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/channels/mod.rs` around lines 1102 - 1125, The
match in audio_rejection_to_ingress_reason collapses
audio_media::AudioRejectionReason::MultipleAudioParts into
AudioIngressReason::SystemError; introduce a dedicated observability variant
(e.g., AudioIngressReason::MultipleAudioParts) in the observability enum and
update audio_rejection_to_ingress_reason to map
audio_media::AudioRejectionReason::MultipleAudioParts to that new variant
instead of SystemError so validation rejections are distinguishable in
dashboards and alerts; ensure any serialization/usage sites of
AudioIngressReason handle the new variant.
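
A reduced sketch of the mapping with a dedicated observability variant. Only three variants per enum are modelled, and the `as_label` strings are hypothetical metric labels rather than the names the runtime emits.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AudioRejectionReason {
    Oversize,
    MultipleAudioParts,
    SystemError,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AudioIngressReason {
    Oversize,
    MultipleAudioParts,
    SystemError,
}

impl AudioIngressReason {
    /// Hypothetical label used as a metric attribute.
    pub fn as_label(self) -> &'static str {
        match self {
            Self::Oversize => "oversize",
            Self::MultipleAudioParts => "multiple_audio_parts",
            Self::SystemError => "system_error",
        }
    }
}

/// One-to-one mapping: a validation rejection is never folded into
/// `SystemError`, so dashboards can tell user mistakes from real faults.
pub fn audio_rejection_to_ingress_reason(reason: AudioRejectionReason) -> AudioIngressReason {
    match reason {
        AudioRejectionReason::Oversize => AudioIngressReason::Oversize,
        AudioRejectionReason::MultipleAudioParts => AudioIngressReason::MultipleAudioParts,
        AudioRejectionReason::SystemError => AudioIngressReason::SystemError,
    }
}
```
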
clients/agent-runtime/src/channels/audio_media.rs (2)

247-254: ⚠️ Potential issue | 🟠 Major

Keep trace metadata out of model-facing history.

to_context_string() is replayed into chat history from build_history(). Including byte_len and sha256 here leaks internal trace metadata into the prompt and burns tokens on every prior audio turn even though those fields already live in structured history. Keep the synthetic context to modality, duration, transcription, and caption only.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/channels/audio_media.rs` around lines 247 - 254,
The to_context_string method currently injects internal trace metadata (byte_len
and sha256) into model-facing history; modify to_context_string (used by
build_history) to only include modality (mime), duration, transcription, and
caption in the produced string, removing byte_len, sha256, and any related
prefix_len logic so the returned context string is concise and safe for replay
into chat history.
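
As a sketch of the trimmed-down context string: the struct below carries the fields named in this comment, but the field types and the output format are assumptions for illustration only.

```rust
/// Reduced stand-in for the structured audio history record.
pub struct AudioHistoryMeta {
    pub mime: String,
    pub duration_secs: Option<u32>,
    pub transcription: String,
    pub caption: Option<String>,
    // Trace metadata: stays in structured history, never in the prompt.
    pub byte_len: u64,
    pub sha256: String,
}

impl AudioHistoryMeta {
    /// Model-facing summary: modality, duration, transcription, caption.
    /// `byte_len` and `sha256` are deliberately left out.
    pub fn to_context_string(&self) -> String {
        let mut out = format!("[audio {}", self.mime);
        if let Some(secs) = self.duration_secs {
            out.push_str(&format!(", {secs}s"));
        }
        out.push_str("] ");
        out.push_str(self.transcription.trim());
        if let Some(caption) = self.caption.as_deref().filter(|c| !c.trim().is_empty()) {
            out.push_str(&format!(" (caption: {})", caption.trim()));
        }
        out
    }
}
```
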

120-129: ⚠️ Potential issue | 🟠 Major

Reject reserved MPEG version IDs too.

This still accepts headers like 0xFF 0xEA/0xEC/0xEE because only the layer bits are checked. Those use the reserved MPEG version id (0b01), so the MIME gate can still misclassify invalid frames as MP3. Add a version_bits != 0b01 guard and a regression test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/channels/audio_media.rs` around lines 120 - 129,
The MP3 sniffing still accepts frames with the reserved MPEG version id (0b01)
because only layer bits were checked; update the guard in the sniffing branch
that inspects sniffed_bytes so it also rejects version bits == 0b01 (i.e., check
the MPEG version bits in sniffed_bytes[1] and skip/return non-MP3 when they
equal the reserved value) before returning AllowedAudioMime::Mp3, and add a
regression test (e.g., feed bytes like 0xFF 0xEA/0xEC/0xEE) to assert these are
not classified as AllowedAudioMime::Mp3.
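
For reference, the header check with both guards could be sketched as below. It covers only the bare MPEG frame-sync case (no ID3 handling), and `looks_like_mp3_frame` is a hypothetical name rather than the function in `audio_media.rs`.

```rust
/// Checks the first two bytes of an MPEG audio frame header.
///
/// Layout of the second byte: `SSSVVLLP`, where `SSS` are the low three
/// sync bits, `VV` the MPEG version id, `LL` the layer, and `P` the
/// protection bit. Version `0b01` and layer `0b00` are both reserved.
pub fn looks_like_mp3_frame(bytes: &[u8]) -> bool {
    if bytes.len() < 2 {
        return false;
    }
    let sync_ok = bytes[0] == 0xFF && (bytes[1] & 0xE0) == 0xE0;
    let version_bits = (bytes[1] >> 3) & 0b11;
    let layer_bits = (bytes[1] >> 1) & 0b11;
    sync_ok && version_bits != 0b01 && layer_bits != 0b00
}
```

The bytes `0xEA`, `0xEC`, and `0xEE` all carry version bits `0b01` with a non-reserved layer, which is why a layer-only check lets them through.
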
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@clients/agent-runtime/src/channels/audio_media.rs`:
- Around line 140-142: The current ftyp check on sniffed_bytes (using
sniffed_bytes[4..8] == b"ftyp") is too broad and treats any ISO BMFF as
AllowedAudioMime::M4a; update the detection in the same code path that returns
AllowedAudioMime::M4a to either (1) parse the ftyp box further and verify the
major_brand or any compatible_brand (bytes after the 8-byte header) contains an
audio-specific brand (e.g., "M4A " / "M4B " or other known audio brands) before
returning AllowedAudioMime::M4a, or (2) if brands are absent/unreliable, parse
the MP4 boxes to locate the moov->trak->mdia->hdlr box and ensure the
handler_type equals "soun" (audio) before accepting as M4a; apply this check
where sniffed_bytes and AllowedAudioMime::M4a are referenced so non-audio MP4
containers are rejected.
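
A rough sketch of option (1), the brand check. The brand list holds only the two brands named above, and `looks_like_m4a` is a hypothetical name. It reads the `ftyp` box as size (4 bytes), type (4), major brand (4), minor version (4), then compatible brands in 4-byte units, and it does not attempt the `hdlr` fallback from option (2).

```rust
/// Audio-specific brands accepted by this sketch (assumed, not exhaustive).
const AUDIO_BRANDS: [&[u8]; 2] = [b"M4A ", b"M4B "];

/// Accepts an ISO BMFF file only if its `ftyp` box declares an audio
/// brand as the major brand or as one of the compatible brands.
pub fn looks_like_m4a(bytes: &[u8]) -> bool {
    if bytes.len() < 16 || &bytes[4..8] != b"ftyp" {
        return false;
    }
    // The box size is big-endian; clamp it to what was actually sniffed.
    let declared = u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize;
    let end = declared.min(bytes.len());
    if end < 16 {
        return false;
    }
    let major = &bytes[8..12];
    let compatible = bytes[16..end].chunks_exact(4);
    std::iter::once(major)
        .chain(compatible)
        .any(|brand| AUDIO_BRANDS.iter().any(|known| *known == brand))
}
```

A video container with major brand `isom` and no audio brand in its compatible list is rejected by this check, which is the case the broad `ftyp` match currently lets through.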

In `@clients/agent-runtime/src/channels/mod.rs`:
- Around line 1173-1175: The message for AudioRejectionReason::TooLong
incorrectly computes the displayed limit using integer division
(config.audio.max_audio_duration_secs / 60) which yields 0 for sub-minute limits
and underreports others; update the formatting in the match arm for
audio_media::AudioRejectionReason::TooLong to compute minutes and seconds from
config.audio.max_audio_duration_secs (or round up minutes when you prefer
minute-only display), choose pluralization ("minute"/"minutes",
"second"/"seconds") accordingly, and emit either "X minutes Y seconds" for
sub-minute and mixed values or "N minute(s)" when exact; ensure you reference
config.audio.max_audio_duration_secs and the AudioRejectionReason::TooLong
branch when making the change.
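
One possible shape for the TooLong limit text, as a sketch only. `format_duration_limit` is a hypothetical helper, and the exact wording of the user-facing message is an assumption. It splits the configured seconds into minutes and seconds and pluralizes each unit, so a sub-minute limit never renders as 0 minutes:

```rust
/// Hypothetical helper (not the PR's code): renders a duration limit given in
/// seconds as "N minute(s)", "N second(s)", or "N minute(s) M second(s)".
fn format_duration_limit(total_secs: u64) -> String {
    // Pluralize a unit name based on its count.
    fn unit(n: u64, singular: &str) -> String {
        if n == 1 {
            format!("{} {}", n, singular)
        } else {
            format!("{} {}s", n, singular)
        }
    }

    let (minutes, seconds) = (total_secs / 60, total_secs % 60);
    match (minutes, seconds) {
        // Sub-minute limits are shown in seconds only.
        (0, s) => unit(s, "second"),
        // Exact multiples of 60 are shown in minutes only.
        (m, 0) => unit(m, "minute"),
        // Everything else shows both parts.
        (m, s) => format!("{} {}", unit(m, "minute"), unit(s, "second")),
    }
}
```
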

---

Duplicate comments:
In `@clients/agent-runtime/src/channels/audio_media.rs`:
- Around line 247-254: The to_context_string method currently injects internal
trace metadata (byte_len and sha256) into model-facing history; modify
to_context_string (used by build_history) to only include modality (mime),
duration, transcription, and caption in the produced string, removing byte_len,
sha256, and any related prefix_len logic so the returned context string is
concise and safe for replay into chat history.
- Around line 120-129: The MP3 sniffing still accepts frames with the reserved
MPEG version id (0b01) because only layer bits were checked; update the guard in
the sniffing branch that inspects sniffed_bytes so it also rejects version bits
== 0b01 (i.e., check the MPEG version bits in sniffed_bytes[1] and skip/return
non-MP3 when they equal the reserved value) before returning
AllowedAudioMime::Mp3, and add a regression test (e.g., feed bytes like 0xFF
0xEA/0xEC/0xEE) to assert these are not classified as AllowedAudioMime::Mp3.

In `@clients/agent-runtime/src/channels/mod.rs`:
- Around line 640-689: The timeout/deadline must be created before doing audio
work so slow gating, staging or transcription can't bypass
CHANNEL_MESSAGE_TIMEOUT_SECS; move creation of the deadline/timeout (currently
using started_at + CHANNEL_MESSAGE_TIMEOUT_SECS and tokio::time::timeout(...))
to immediately after computing session_id/started_at, then thread the remaining
time/deadline into the audio helpers (gate_audio_config, gate_and_stage_audio,
transcribe_audio) or wrap those calls with a timeout using the precomputed
remaining Duration so they honor the same CHANNEL_MESSAGE_TIMEOUT_SECS budget.
- Around line 1479-1483: The injected transcript currently prepends a prefix
based on caption_text (the injected_text construction) which changes how
downstream memory and providers treat audio vs typed input; remove those
prefixes and inject the transcript verbatim (use the trimmed transcript string
directly) so provenance remains in AudioHistoryMeta and the transcript is
equivalent to typed input; update the code that builds injected_text in
channels/mod.rs (the block referencing caption_text and injected_text) to assign
the plain trimmed text without "[Audio transcription]" or "[Voice message
transcription]" prefixes.
- Around line 1102-1125: The match in audio_rejection_to_ingress_reason
collapses audio_media::AudioRejectionReason::MultipleAudioParts into
AudioIngressReason::SystemError; introduce a dedicated observability variant
(e.g., AudioIngressReason::MultipleAudioParts) in the observability enum and
update audio_rejection_to_ingress_reason to map
audio_media::AudioRejectionReason::MultipleAudioParts to that new variant
instead of SystemError so validation rejections are distinguishable in
dashboards and alerts; ensure any serialization/usage sites of
AudioIngressReason handle the new variant.
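
For the timeout finding above (lines 640-689), a std-only sketch of how a single budget could be shared across the audio steps. `remaining_budget` is a hypothetical helper, and how the result is threaded into `gate_and_stage_audio` or `transcribe_audio` is left open. The idea is to fix the start instant once, then ask for the time left before each step and stop as soon as nothing remains:

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not the PR's code): time left in a fixed budget that
/// started at `started_at`, or `None` once the budget is exhausted.
fn remaining_budget(started_at: Instant, now: Instant, total: Duration) -> Option<Duration> {
    // saturating_duration_since avoids a panic if `now` is earlier than `started_at`.
    let elapsed = now.saturating_duration_since(started_at);
    // checked_sub returns None when elapsed exceeds the budget;
    // a zero remainder is also treated as exhausted.
    total.checked_sub(elapsed).filter(|left| !left.is_zero())
}
```

Each audio step could then be wrapped in a timeout that uses the returned `Duration`, so gating, staging, and transcription all draw from the same CHANNEL_MESSAGE_TIMEOUT_SECS allowance rather than each starting a fresh clock.
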

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7e814888-7a47-45ec-a21d-880d1e4bf5be

📥 Commits

Reviewing files that changed from the base of the PR and between aa848be and 30ed5eb.

📒 Files selected for processing (2)
  • clients/agent-runtime/src/channels/audio_media.rs
  • clients/agent-runtime/src/channels/mod.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: pr-checks
  • GitHub Check: sonar
  • GitHub Check: submit-gradle
  • GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (5)
clients/agent-runtime/src/channels/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Implement Channel trait in src/channels/ with consistent send, listen, and health_check semantics and cover auth/allowlist/health behavior with tests

Files:

  • clients/agent-runtime/src/channels/audio_media.rs
  • clients/agent-runtime/src/channels/mod.rs
clients/agent-runtime/src/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

clients/agent-runtime/src/**/*.rs: Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements
Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency

Files:

  • clients/agent-runtime/src/channels/audio_media.rs
  • clients/agent-runtime/src/channels/mod.rs
clients/agent-runtime/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Run cargo fmt --all -- --check, cargo clippy --all-targets -- -D warnings, and cargo test for code validation, or document which checks were skipped and why

Files:

  • clients/agent-runtime/src/channels/audio_media.rs
  • clients/agent-runtime/src/channels/mod.rs
**/*.rs

⚙️ CodeRabbit configuration file

**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.

Files:

  • clients/agent-runtime/src/channels/audio_media.rs
  • clients/agent-runtime/src/channels/mod.rs
**/*

⚙️ CodeRabbit configuration file

**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.

Files:

  • clients/agent-runtime/src/channels/audio_media.rs
  • clients/agent-runtime/src/channels/mod.rs
🧠 Learnings (3)
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests

Applied to files:

  • clients/agent-runtime/src/channels/audio_media.rs
  • clients/agent-runtime/src/channels/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/*.rs : Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why

Applied to files:

  • clients/agent-runtime/src/channels/audio_media.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths

Applied to files:

  • clients/agent-runtime/src/channels/audio_media.rs

Comment on lines +140 to +142
// M4A: bytes 4-7 = "ftyp" (ISO base media file format)
if sniffed_bytes.len() >= 8 && &sniffed_bytes[4..8] == b"ftyp" {
return Ok(AllowedAudioMime::M4a);

⚠️ Potential issue | 🟠 Major

ftyp alone is too broad for M4A detection.

Any ISO BMFF file with an ftyp box here — including plain MP4 video containers — is accepted as AllowedAudioMime::M4a. That weakens the fail-closed MIME gate and can route non-audio payloads into transcription. Check an audio-specific brand or inspect the track handler before returning M4a.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@clients/agent-runtime/src/channels/audio_media.rs` around lines 140 - 142,
The current ftyp check on sniffed_bytes (using sniffed_bytes[4..8] == b"ftyp")
is too broad and treats any ISO BMFF as AllowedAudioMime::M4a; update the
detection in the same code path that returns AllowedAudioMime::M4a to either (1)
parse the ftyp box further and verify the major_brand or any compatible_brand
(bytes after the 8-byte header) contains an audio-specific brand (e.g., "M4A " /
"M4B " or other known audio brands) before returning AllowedAudioMime::M4a, or
(2) if brands are absent/unreliable, parse the MP4 boxes to locate the
moov->trak->mdia->hdlr box and ensure the handler_type equals "soun" (audio)
before accepting as M4a; apply this check where sniffed_bytes and
AllowedAudioMime::M4a are referenced so non-audio MP4 containers are rejected.
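
Option (1) could look roughly like the sketch below. `ftyp_has_audio_brand` and the `AUDIO_BRANDS` list are hypothetical, and the set of brands a real gate should accept is an assumption that would need checking against the files Telegram actually delivers. The sketch reads the box size, then checks the major brand and any compatible brands that fall inside both the declared box and the sniffed bytes:

```rust
/// Assumed allowlist of audio-specific brands; extend only after verification.
const AUDIO_BRANDS: [&[u8; 4]; 2] = [b"M4A ", b"M4B "];

fn is_audio_brand(brand: &[u8]) -> bool {
    AUDIO_BRANDS.iter().any(|b| &b[..] == brand)
}

/// Hypothetical helper (not the PR's code): true when the leading `ftyp` box
/// names an audio brand as its major brand or as a compatible brand.
fn ftyp_has_audio_brand(sniffed: &[u8]) -> bool {
    // Need size(4) + "ftyp"(4) + major_brand(4) at minimum.
    if sniffed.len() < 12 || &sniffed[4..8] != b"ftyp" {
        return false;
    }
    let box_size =
        u32::from_be_bytes([sniffed[0], sniffed[1], sniffed[2], sniffed[3]]) as usize;
    // Only trust bytes inside the declared box that were actually sniffed.
    // Special sizes 0 and 1 end up below 12 here and are rejected (fail closed).
    let end = box_size.min(sniffed.len());
    if end < 12 {
        return false;
    }
    if is_audio_brand(&sniffed[8..12]) {
        return true;
    }
    // Compatible brands follow major_brand(4) + minor_version(4).
    if end <= 16 {
        return false;
    }
    sniffed[16..end].chunks_exact(4).any(is_audio_brand)
}
```

A brand check is cheap but only as good as the allowlist. Inspecting the `hdlr` handler type (option 2) is stricter, though it needs more than the first few sniffed bytes.
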

Comment thread clients/agent-runtime/src/channels/mod.rs Outdated
- Reject reserved MPEG version bits (0b01) in MP3 magic-byte detection
  to exclude more invalid frame headers
- Fix TooLong user message for sub-minute durations (was showing 0 min)
- Remove sha256/byte_len from to_context_string() to reduce model tokens
- Add dedicated MultipleAudioParts variant to AudioIngressReason for
  dashboards/alerts instead of collapsing into SystemError
@yacosta738 yacosta738 force-pushed the feature/dallay-150-add-audio-input-support-for-agents-telegram-http-gateway-cli branch from 30ed5eb to c78f06d Compare April 4, 2026 07:24
Check cumulative size against max_audio_bytes before extending the byte
buffer in fetch_and_stage_audio to prevent OOM from oversized chunks
sent by a malicious upstream server.
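
A minimal sketch of the check-before-extend pattern described in this commit, assuming chunks arrive as byte slices. `append_chunk_bounded` and `AppendError` are hypothetical names, not the ones used in `fetch_and_stage_audio`. The cumulative length is computed and compared against the limit before any bytes are copied, so an oversized chunk never grows the buffer:

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum AppendError {
    TooLarge { limit: usize },
}

/// Hypothetical helper (not the PR's code): appends `chunk` to `buf` only if
/// the resulting length stays within `max_bytes`.
fn append_chunk_bounded(
    buf: &mut Vec<u8>,
    chunk: &[u8],
    max_bytes: usize,
) -> Result<(), AppendError> {
    // checked_add guards against usize overflow on absurd chunk sizes.
    match buf.len().checked_add(chunk.len()) {
        Some(new_len) if new_len <= max_bytes => {
            buf.extend_from_slice(chunk);
            Ok(())
        }
        // Over the limit (or overflow): leave the buffer untouched.
        _ => Err(AppendError::TooLarge { limit: max_bytes }),
    }
}
```
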
Replace the monolithic pre-push hook that runs all checks (~2-7 min)
with a diff-aware version that only checks stacks with changed files:

- Rust (fmt + clippy + unit tests): only if clients/agent-runtime/ changed
- Kotlin (compile check): only if composeApp/agent-core-kmp/gradle changed
- Web (biome lint): only if clients/web/ changed
- Docs (lychee links): only if .md files changed
- Gradle locks: only if build config changed

Expected improvement: 2-7 minutes → 0-25 seconds for typical pushes.
CI remains the comprehensive quality gate.

Escape hatches:
- SKIP_GIT_HOOKS=1 git push  (bypass entirely)
- FULL_PRE_PUSH=1 git push   (run all checks like before)
Add unit tests for build_transcriber, gate_audio_config edge cases,
inject_transcription, TooLong message variants, Telegram voice/audio
JSON parsing, and AudioConfig zero-value validation to close the 1.6
percent coverage gap on new code.
@sonarqubecloud

sonarqubecloud Bot commented Apr 4, 2026

@yacosta738 yacosta738 merged commit 258d3c3 into main Apr 4, 2026
16 checks passed
@yacosta738 yacosta738 deleted the feature/dallay-150-add-audio-input-support-for-agents-telegram-http-gateway-cli branch April 4, 2026 08:38
@yacosta738 yacosta738 mentioned this pull request Apr 4, 2026
@dallay-bot dallay-bot Bot mentioned this pull request Apr 19, 2026
Development

Successfully merging this pull request may close these issues.

Add audio input support for agents (Telegram, HTTP Gateway, CLI)

1 participant