feat(runtime): add audio input support with local transcription for Telegram #413
…elegram

Add audio-to-text input capability so agents can receive voice notes and audio files (OGG/Opus, MP3, WAV, M4A) via Telegram, transcribe them locally using whisper.cpp, and feed the transcription into the normal agent conversation flow.

Key changes:
- ContentPart::Audio variant for multimodal message parsing
- Transcriber trait as new runtime extension point for STT engines
- WhisperCliTranscriber wrapping whisper.cpp CLI with concurrency guard
- Audio media module: MIME sniffing, size/duration validation, staging
- 7-step pipeline: parse → gate → fetch → validate → stage → transcribe → inject
- [audio] TOML config section (disabled by default, fail-closed)
- AudioIngressEvent observability for all admission/rejection paths
- StagedAudioGuard RAII cleanup on all exit paths
- Doctor health checks for whisper binary and model availability
- Zero new Rust crate dependencies

Privacy: all transcription is local (NFR1), no audio data leaves the operator's infrastructure.

Closes #246
📝 Walkthrough
This PR implements Phase 1 audio input support for agents, introducing local audio ingestion via Telegram, validation through MIME sniffing and size/duration limits, transcription using a whisper.cpp CLI wrapper, and injection into the existing agent conversation pipeline with observability instrumentation and configuration validation.
Sequence Diagram(s)
sequenceDiagram
participant User as Telegram User
participant Telegram as Telegram Channel
participant Stage as Staging (Temp File)
participant Transcriber as Whisper CLI
participant Provider as LLM Provider
participant Agent as Agent Loop
User->>Telegram: Send voice/audio message
Telegram->>Telegram: Parse ContentPart::Audio
Telegram->>Telegram: Gate on config/transcriber
Telegram->>Telegram: Fetch file from Telegram API
Telegram->>Stage: Validate MIME (magic bytes)<br/>Check size/duration<br/>Write temp file with SHA-256
rect rgba(100, 150, 200, 0.5)
Note over Stage,Transcriber: Audio Pipeline
Stage->>Transcriber: Pass StagedAudio
Transcriber->>Transcriber: Acquire semaphore permit<br/>(concurrency control)
Transcriber->>Transcriber: Spawn whisper subprocess<br/>with model/timeout
Transcriber->>Transcriber: Parse stdout<br/>Filter [BLANK_AUDIO]<br/>Guard empty transcription
Transcriber-->>Stage: Return TranscriptionResult
end
Stage->>Telegram: Emit AudioIngressEvent<br/>(Admitted/Rejected)
Stage->>Stage: RAII cleanup temp file
Telegram->>Telegram: Inject transcription<br/>Replace Audio with Text
Telegram->>Telegram: Build AudioHistoryMeta<br/>for conversation context
Telegram->>Provider: Send ChatMessage<br/>with audio_metadata
Provider->>Agent: Include audio history context
Agent->>Provider: Generate response
Provider-->>Telegram: Response text
Telegram-->>User: Send text response
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Rationale: Heterogeneous changes spanning new abstraction layers (Transcriber trait, audio media module with MIME sniffing), deep integration into the message processing pipeline with careful ordering (gating before memory enrichment), RAII cleanup guarantees, subprocess concurrency control with semaphores, Telegram-specific fetch/stage implementation, and widespread observability instrumentation. Requires careful validation of error propagation paths, transcription rejection handling, cleanup side effects, and concurrency correctness.
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✅ Contributor Report
User: @yacosta738
Contributor Report evaluates based on public GitHub activity. Analysis period: 2025-04-04 to 2026-04-04
Add unit tests for audio rejection user messages, ingress reason mapping, config validation, Telegram voice/audio JSON parsing, and pipeline integration to reach ≥80% coverage on new code.
Deploying corvus with
| Latest commit: | b4f304f |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://c785439c.corvus-42x.pages.dev |
| Branch Preview URL: | https://feature-dallay-150-add-audio.corvus-42x.pages.dev |
Actionable comments posted: 27
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
clients/agent-runtime/src/channels/mod.rs (2)
619-688: ⚠️ Potential issue | 🟠 Major
Put the audio stages under the per-turn timeout.
This block now does fetch/stage/transcribe work before the only timeout in the handler. A slow Telegram download or wedged whisper process can hold a worker past `CHANNEL_MESSAGE_TIMEOUT_SECS` and never hit the timeout reply path. Wrap the whole turn, or pass a remaining budget into the audio stages.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/channels/mod.rs` around lines 619 - 688, process_channel_message currently performs the audio pipeline (gate_audio_config, gate_and_stage_audio, transcribe_audio, inject_transcription) before the per-turn timeout, allowing slow downloads/transcription to exceed CHANNEL_MESSAGE_TIMEOUT_SECS; move the entire audio stages under the per-turn timeout boundary (or compute remaining_budget and pass it into gate_audio_config/gate_and_stage_audio/transcribe_audio) so that these calls are canceled when the channel turn times out, and ensure any temp resources from audio_guard are still cleaned up on timeout.
2705-2721: ⚠️ Potential issue | 🔴 Critical
Instantiate the transcriber in both runtime constructors.
`gate_audio_config()` rejects whenever `ctx.transcriber` is empty, but both production `ChannelRuntimeContext` builders still hard-code `transcriber: None`. With `[audio]` enabled, every audio turn will fail as `TranscriberUnavailable`, so the new feature never actually admits audio in either `start_channels()` or `spawn_runtime_handle()`.
Also applies to: 2784-2800
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/channels/mod.rs` around lines 2705 - 2721, The ChannelRuntimeContext currently sets transcriber: None which causes gate_audio_config() to reject audio; fix by constructing the transcriber before building runtime_ctx and passing it into ChannelRuntimeContext as transcriber: Some(...) instead of None. Locate where runtime_ctx is created (the ChannelRuntimeContext instantiation in start_channels() and the analogous one in spawn_runtime_handle()) and call the existing audio/transcriber factory (e.g., the module/function used to create transcribers in this crate—invoke it with the runtime/config) to produce a transcriber instance, then set transcriber: Some(transcriber) in both constructors so gate_audio_config() sees a transcriber present. Ensure any errors from creating the transcriber are handled/propagated consistent with existing error handling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@clients/agent-runtime/src/channels/audio_media.rs`:
- Around line 118-124: The MP3 magic-byte check in audio_media.rs is too strict;
update the second-byte test to accept any value with the top three bits set (the
MPEG sync continuation) instead of only 0xFB/0xF3/0xF2. Replace the explicit
equality checks on sniffed_bytes[1] with a mask test like (sniffed_bytes[1] &
0xE0) == 0xE0 in the same if that returns AllowedAudioMime::Mp3 so valid MP3
frame headers aren’t rejected.
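For the MP3 sniffing comment above, a minimal sketch of the mask-based sync test. The function name `looks_like_mpeg_frame_sync` is hypothetical and not taken from the PR; only the `(byte & 0xE0) == 0xE0` check comes from the review text, and the leading `0xFF` byte is assumed to be what the existing code already tests first.

```rust
/// Hypothetical helper: returns true when the first two sniffed bytes look
/// like an MPEG audio frame sync (0xFF followed by a byte whose top three
/// bits are set), instead of matching only 0xFB / 0xF3 / 0xF2.
fn looks_like_mpeg_frame_sync(sniffed_bytes: &[u8]) -> bool {
    match sniffed_bytes {
        // The remaining five bits carry version/layer/CRC flags, so they
        // vary between valid files and must not be compared exactly.
        [0xFF, second, ..] => (*second & 0xE0) == 0xE0,
        // Fewer than two bytes, or a different first byte: not a frame sync.
        _ => false,
    }
}
```

A mask alone is a permissive test, so in practice it would sit after the checks for other container signatures rather than before them.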
In `@clients/agent-runtime/src/channels/discord.rs`:
- Line 955: The panic message in the wildcard match arm (currently `_ =>
panic!("expected Image, got Text")`) is inaccurate; change the arm to bind the
unmatched value (e.g., `_` -> `other`) and update the panic to either a generic
message like "expected Image, got non-Image variant" or include the actual
variant via formatting (e.g., panic!("expected Image, got {:?}", other)) so the
failure text matches the wildcard behavior.
In `@clients/agent-runtime/src/channels/mod.rs`:
- Around line 1271-1283: The code currently maps the "more than one audio part"
validation to audio_media::AudioRejectionReason::SystemError; update the
reject_audio_turn call in the audio_parts.len() > 1 branch to use a specific
rejection reason (e.g.
audio_media::AudioRejectionReason::TooManyAudioAttachments or a clearly named
variant like MultipleAudioAttachments). If that enum variant does not exist, add
it to audio_media::AudioRejectionReason and use it here so the rejection path in
reject_audio_turn, telemetry, and any user-facing messages can distinguish
validation failures from internal system errors. Ensure you reference the
audio_parts check, the reject_audio_turn call, and replace the SystemError enum
usage with the new specific variant.
- Around line 670-681: The emitted transcription latency is using
tx.duration_secs (clip length) instead of the actual processing time; update the
code that calls emit_audio_ingress in the loop over audio_guard and
transcriptions to use the TranscriptionResult.processing_ms (preserved from
transcribe_audio()) converted to ms instead of
tx.duration_secs.map(duration_f64_to_ms); ensure TranscriptionResult returned by
transcribe_audio() includes processing_ms and that the other similar
emit_audio_ingress usage (the block around lines 1376-1405) is updated the same
way so both places report real transcription latency.
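For the latency comment above, a small sketch of keeping clip length and processing time as separate values. The struct below is a hypothetical stand-in for the PR's `TranscriptionResult` (its real fields are not shown in this review), and `timed_transcribe` and `latency_ms_for_telemetry` are illustrative names.

```rust
use std::time::Instant;

/// Hypothetical stand-in for the PR's `TranscriptionResult`.
#[derive(Debug)]
struct TranscriptionResult {
    text: String,
    /// Length of the audio clip itself, if known.
    duration_secs: Option<f64>,
    /// Wall-clock time spent transcribing, in milliseconds.
    processing_ms: u64,
}

/// Runs `transcribe` and records how long the call took, independently of
/// how long the clip is.
fn timed_transcribe<F>(clip_duration_secs: Option<f64>, transcribe: F) -> TranscriptionResult
where
    F: FnOnce() -> String,
{
    let started = Instant::now();
    let text = transcribe();
    let processing_ms = u64::try_from(started.elapsed().as_millis()).unwrap_or(u64::MAX);
    TranscriptionResult {
        text,
        duration_secs: clip_duration_secs,
        processing_ms,
    }
}

/// The value an ingress event would report as latency: processing time,
/// never the clip length.
fn latency_ms_for_telemetry(result: &TranscriptionResult) -> u64 {
    result.processing_ms
}
```

With this split, a 30-second voice note that transcribes quickly reports a small latency rather than 30000 ms.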
In `@clients/agent-runtime/src/channels/telegram.rs`:
- Around line 1824-1833: The current staging uses a predictable temp path built
from sha256 (temp_path) and writes via tokio::fs::write, which risks races and
symlink/clobber attacks; change the logic in the block that constructs temp_path
and calls tokio::fs::write (and returns
audio_media::AudioRejectionReason::FetchFailed on error) to create a secure,
unique temp file (use std::fs::OpenOptions with create_new or a NamedTempFile
equivalent), write the bytes via the file handle rather than atomically
overwriting a predictable path, and then use/rename that file to the final name
or return its path; retain the same error mapping but ensure failures are logged
with the file handle/path for debugging.
- Around line 63-107: The audio/voice parsing now creates ContentPart::Audio
even when derive_text_projection() returns None, which means
handle_unauthorized_message() bails out without sending an approval prompt;
update the unauthorized-path logic (where derive_text_projection(),
handle_unauthorized_message(), and send_unauthorized_notification() are used) to
detect media-only updates (parts contains audio but text projection is None) and
call send_unauthorized_notification() so unapproved senders still get the
notification; apply the same fix for the other audio-handling blocks (the voice
and audio branches that push ContentPart::Audio) and add a regression test that
posts an unauthorized audio-only update and asserts
send_unauthorized_notification() (or the channel’s outbound notification) was
invoked.
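For the staging comment above (predictable temp path), a std-only sketch of exclusive creation with `OpenOptions::create_new`, which fails rather than overwriting an existing entry. The function name, the file-name scheme, and the retry count are assumptions made for illustration; the PR's actual staging code in telegram.rs is not shown here.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-wide counter so two concurrent stagings never pick the same name.
static STAGE_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Hypothetical helper: writes `bytes` to a freshly created file in `dir`
/// and returns its path. An existing file or symlink at the candidate path
/// is never followed or clobbered because `create_new` refuses to open it.
fn stage_audio_bytes(dir: &Path, bytes: &[u8]) -> io::Result<PathBuf> {
    for _ in 0..16 {
        let unique = STAGE_COUNTER.fetch_add(1, Ordering::Relaxed);
        let path = dir.join(format!("audio-{}-{}.tmp", std::process::id(), unique));
        match OpenOptions::new().write(true).create_new(true).open(&path) {
            Ok(mut file) => {
                // Write through the handle we just created, not by path.
                if let Err(err) = file.write_all(bytes).and_then(|_| file.sync_all()) {
                    let _ = std::fs::remove_file(&path);
                    return Err(err);
                }
                return Ok(path);
            }
            // Name already taken: try the next counter value.
            Err(err) if err.kind() == io::ErrorKind::AlreadyExists => continue,
            Err(err) => return Err(err),
        }
    }
    Err(io::Error::new(
        io::ErrorKind::AlreadyExists,
        "could not create a unique staging file",
    ))
}
```

The SHA-256 of the content can still be recorded as metadata; the point of the sketch is only that the digest no longer decides the path on disk.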
In `@clients/agent-runtime/src/config/schema.rs`:
- Around line 304-308: Replace the duplicated hard-coded constants in schema.rs
with the shared definitions from the audio_media module: remove the local
MAX_AUDIO_BYTES_CEILING and MAX_AUDIO_DURATION_SECS_CEILING declarations and
import the constants from channels::audio_media (e.g. use
crate::channels::audio_media::{MAX_AUDIO_BYTES_CEILING,
MAX_AUDIO_DURATION_SECS_CEILING};), so startup validation uses the same values
as runtime media validation (refer to the constants named
MAX_AUDIO_BYTES_CEILING and MAX_AUDIO_DURATION_SECS_CEILING).
- Around line 336-341: Add validation to reject zero for the transcription
controls: ensure max_concurrent_transcriptions and transcription_timeout_secs
are > 0 during config validation (e.g., in the Config/Schema validation method
in clients/agent-runtime/src/config/schema.rs). If either field equals 0, return
a clear startup error (with context naming max_concurrent_transcriptions or
transcription_timeout_secs) rather than allowing runtime operation; use the
existing validation/error pattern used elsewhere in the file (and reference
default_max_concurrent_transcriptions and default_transcription_timeout_secs
when documenting/recovering).
- Around line 313-342: The AudioConfig struct currently allows
unknown/misspelled TOML keys which silently fall back to defaults; update the
AudioConfig definition to add the serde attribute #[serde(deny_unknown_fields)]
so deserialization fails on unknown fields (matching the parent Config behavior)
— locate the AudioConfig struct in schema.rs and add that attribute above its
#[derive(...)] line.
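For the zero-value comment above, a minimal sketch of the startup check. `AudioLimits` is a hypothetical two-field subset of the PR's `AudioConfig`, and the error type is simplified to `String`; the real validation would follow whatever error pattern schema.rs already uses.

```rust
/// Hypothetical subset of the `[audio]` section; the real `AudioConfig`
/// has more fields than these two.
#[derive(Debug, Clone)]
struct AudioLimits {
    max_concurrent_transcriptions: usize,
    transcription_timeout_secs: u64,
}

/// Rejects zero for both transcription controls and names the offending
/// key in the error so the operator knows which setting to fix.
fn validate_audio_limits(limits: &AudioLimits) -> Result<(), String> {
    if limits.max_concurrent_transcriptions == 0 {
        return Err("audio.max_concurrent_transcriptions must be greater than 0".to_string());
    }
    if limits.transcription_timeout_secs == 0 {
        return Err("audio.transcription_timeout_secs must be greater than 0".to_string());
    }
    Ok(())
}
```

Failing at startup matters here because a zero concurrency limit would otherwise show up only at runtime, as transcriptions that never acquire a permit.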
In `@clients/agent-runtime/src/doctor/mod.rs`:
- Around line 944-971: The test audio_health_pass_model_exists duplicates the
production model-check logic instead of exercising check_audio_health(), so
replace the inline existence checks with a call to check_audio_health() (or the
specific helper it uses) to ensure the real path is tested; set up the TempDir
and model file as before, then call check_audio_health() (or the exported
function that returns Vec<DiagItem>), and assert on the returned items' length,
Severity::Ok and message contains "found" to validate the real logic (reference
test name audio_health_pass_model_exists, function check_audio_health, types
DiagItem and Severity).
- Around line 659-677: The check currently treats any existing filesystem entry
as a valid whisper model; change the logic to verify the resolved model_path is
a regular file (e.g., use model_path.is_file() or metadata().is_file()) before
pushing DiagItem::ok for the transcription model (ac.transcription_model); if
not a file, push DiagItem::error with the same contextual message referencing
model_path.display() so directories don't produce false-positive doctor results.
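For the model-path comment above, a sketch of the regular-file check. `ModelCheck` and `check_model_path` are hypothetical names standing in for the doctor module's `DiagItem::ok` / `DiagItem::error` calls, whose exact signatures are not shown in this review.

```rust
use std::path::Path;

/// Hypothetical result type standing in for the doctor's `DiagItem`.
#[derive(Debug, PartialEq)]
enum ModelCheck {
    Found(String),
    NotAFile(String),
}

/// Accepts the model only when the resolved path is a regular file, so a
/// directory with the expected name does not count as a valid model.
fn check_model_path(model_path: &Path) -> ModelCheck {
    if model_path.is_file() {
        ModelCheck::Found(format!("transcription model found: {}", model_path.display()))
    } else {
        ModelCheck::NotAFile(format!(
            "transcription model missing or not a regular file: {}",
            model_path.display()
        ))
    }
}
```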
In `@clients/agent-runtime/src/observability/otel.rs`:
- Around line 201-202: ObserverEvent::AudioIngress is currently ignored in the
OTEL backend (the match arm with ObserverEvent::AudioIngress(_) is a no-op), so
audio admit/reject telemetry and reasons are not recorded; update the OTEL
handler in otel.rs to record a metric and associated attributes for audio
ingress events instead of silently dropping them—extract the admit/reject status
and reason from the AudioIngress payload and use the existing OTEL metric
recorder (same subsystem used for other ObserverEvent arms) to emit a counter or
histogram and set attributes like "audio.admit" (bool/string) and "audio.reason"
(string) so OTEL deployments capture admit/reject counts and reasons.
In `@clients/agent-runtime/src/observability/prometheus.rs`:
- Around line 190-191: The match arm currently ignores
ObserverEvent::AudioIngress which prevents audio admit/reject/failure metrics
from being exposed; update the handler (the match over ObserverEvent in
prometheus.rs) to process ObserverEvent::AudioIngress instead of discarding it,
and map its inner variants to the same Prometheus counters used for other
ingress types (increment the appropriate admit/reject/failure metrics and set
labels/timestamps as done for the existing ingress events), using the
ObserverEvent::AudioIngress symbol to locate the code and mirror the logic used
for the other ingress-related arms.
In `@clients/agent-runtime/src/providers/traits.rs`:
- Around line 54-64: Add symmetric serde tests for the new audio_metadata field
matching the existing image_metadata tests: write a missing-field
deserialization test that deserializes JSON lacking "audio_metadata" into the
same struct used in traits.rs (exercise user_with_audio / the message struct)
and asserts audio_metadata == None, and write a skip-serialize-none test that
constructs the struct with audio_metadata = None, serializes it to JSON, and
asserts the "audio_metadata" key is not present; place these tests alongside the
existing image_metadata serde tests so they run in the same test module.
- Around line 54-64: The PR currently builds mixed-media turns by calling
user_with_audio(...) and then mutating image_metadata later, which creates
partial-state; add a single constructor (e.g., user_with_media) that accepts
content plus both image_metadata and audio_metadata (or Option-wrapped Vecs) and
returns Self with role, content, image_metadata and audio_metadata set
atomically; update callers that currently call user_with_audio and then set
image_metadata (the code mutating image_metadata) to call user_with_media
instead; optionally keep thin helpers user_with_audio and user_with_image that
forward to user_with_media to preserve existing call-sites.
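For the constructor comment above, a sketch of building a mixed-media user turn in one step. The field types, the `ImageMeta` and `AudioMeta` structs, and the `"user"` role string are assumptions; only the names `user_with_media`, `user_with_audio`, `user_with_image`, `image_metadata`, and `audio_metadata` come from the review text.

```rust
/// Hypothetical metadata types; the PR's real ones are not shown here.
#[derive(Debug, Clone, PartialEq)]
struct ImageMeta {
    mime: String,
}

#[derive(Debug, Clone, PartialEq)]
struct AudioMeta {
    duration_secs: Option<f64>,
}

#[derive(Debug, Clone, PartialEq)]
struct ChatMessage {
    role: String,
    content: String,
    image_metadata: Option<Vec<ImageMeta>>,
    audio_metadata: Option<Vec<AudioMeta>>,
}

impl ChatMessage {
    /// Sets both metadata fields at construction time, so no caller has to
    /// mutate the message afterwards.
    fn user_with_media(
        content: impl Into<String>,
        image_metadata: Option<Vec<ImageMeta>>,
        audio_metadata: Option<Vec<AudioMeta>>,
    ) -> Self {
        Self {
            role: "user".to_string(),
            content: content.into(),
            image_metadata,
            audio_metadata,
        }
    }

    /// Thin helper kept for existing audio-only call sites.
    fn user_with_audio(content: impl Into<String>, audio_metadata: Vec<AudioMeta>) -> Self {
        Self::user_with_media(content, None, Some(audio_metadata))
    }

    /// Thin helper kept for existing image-only call sites.
    fn user_with_image(content: impl Into<String>, image_metadata: Vec<ImageMeta>) -> Self {
        Self::user_with_media(content, Some(image_metadata), None)
    }
}
```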
In `@clients/agent-runtime/src/transcription/whisper_cli.rs`:
- Around line 150-156: The current check in the whisper-cli subprocess handling
(the branch that tests output.status.success() in the function handling
transcription) treats any non-zero exit as AudioRejectionReason::Corrupted;
change this so the default error returned for non-zero exits is a
transcription/system error (e.g., AudioRejectionReason::TranscriptionFailed or
SystemError) and only map to AudioRejectionReason::Corrupted when stderr
contains clear media-decode/input failure signatures (detect keywords like
"decode", "unsupported format", "invalid data", "couldn't parse", "ffmpeg",
"libav", or similar). Update the error path that logs via tracing::error! to
still include stderr and exit code, and perform a small pattern match on the
stderr string to switch to Corrupted only when those decode-related tokens are
present; otherwise return the TranscriptionFailed/SystemError variant.
- Around line 73-89: The function resolve_model_path currently returns the
per-user path whenever a home directory exists without checking whether the file
actually exists; change it to construct the user path (using
user_dirs.home_dir() and the filename), test that path.exists(), and only return
it if present—otherwise fall back to the system path (e.g.,
PathBuf::from(format!("/usr/local/share/whisper/{filename}"))). Ensure you still
keep the existing fallback when directories::UserDirs::new() is None. Use the
same local variables (filename, user_dirs, home_dir) so callers of
resolve_model_path need no changes.
- Around line 126-147: The spawned whisper-cli child is left running on timeout
because the wait future is dropped; fix by enabling kill-on-drop on the Command
before spawning (call cmd.kill_on_drop(true)) or by ensuring the Child is
explicitly killed (call child.kill().await and wait for its exit) in the timeout
branch and any early-return error branches; update the code around cmd.spawn()
and the timeout match (the variables cmd, child, self.timeout, and the timeout
Err(_) branch) so the child is terminated before returning and the semaphore
permit is only released after the child is killed/awaited.
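For the exit-status comment above, a sketch of classifying a non-zero exit by its stderr text. The marker strings are the ones listed in the review comment and have not been checked against what whisper.cpp actually prints; the two-variant enum and the `TranscriptionFailed` name are assumptions for illustration.

```rust
/// Hypothetical two-variant subset of the PR's `AudioRejectionReason`.
#[derive(Debug, PartialEq, Eq)]
enum AudioRejectionReason {
    Corrupted,
    TranscriptionFailed,
}

/// Substrings treated as evidence of a media-decode failure. Taken from the
/// review comment, not verified against real whisper-cli output.
const DECODE_FAILURE_MARKERS: [&str; 6] = [
    "decode",
    "unsupported format",
    "invalid data",
    "couldn't parse",
    "ffmpeg",
    "libav",
];

/// Maps the stderr of a failed run to a rejection reason: `Corrupted` only
/// when a decode-related marker is present, `TranscriptionFailed` otherwise.
fn classify_whisper_failure(stderr: &str) -> AudioRejectionReason {
    let lowered = stderr.to_lowercase();
    if DECODE_FAILURE_MARKERS
        .iter()
        .any(|marker| lowered.contains(*marker))
    {
        AudioRejectionReason::Corrupted
    } else {
        AudioRejectionReason::TranscriptionFailed
    }
}
```

Defaulting to the system-side reason keeps a missing model or a crashed binary from being reported to the user as a bad audio file.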
In `@openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md`:
- Line 4: Update the stale GitHub issue link in archive-report.md by replacing
the repository segment "anthropics/corvus" with "dallay/corvus" so the issue URL
becomes https://github.com/dallay/corvus/issues/246; ensure the change updates
the exact text shown in the file (the string
"https://github.com/anthropics/corvus/issues/246") to the new repository target.
In `@openspec/changes/archive/2026-04-03-audio-input-support/design.md`:
- Around line 82-88: The markdown fenced code blocks showing the pipeline steps
and directory tree are missing language tags (triggering MD040); update each
fenced block around the snippets that list
extract_user_text()/gate_audio_config()/transcription/… and the src/ tree to
include a language identifier (e.g., ```text) so markdownlint passes;
specifically add the same language tag to the three fenced blocks containing the
step list (extract_user_text → enrich_with_memory → …), the expanded audio-step
list (→ gate_audio_config → … → inject_transcription → …), and the src/
directory tree block.
- Around line 826-829: Update the design.md text to remove the incorrect claim
that a standalone doctor module doesn't exist and instead state that the doctor
command is implemented at clients/agent-runtime/src/doctor and is invoked from
the CLI via Commands::Doctor => doctor::run(); also revise the wording to
reflect that health checks are integrated into the runtime startup validation
path (see src/config/validation.rs) and that audio diagnostics are included,
removing any "will be added in future" phrasing and ensuring the document
accurately describes the existing integration.
In `@openspec/changes/archive/2026-04-03-audio-input-support/exploration.md`:
- Around line 434-449: The markdown subsections "### Phase 1 Scope (MVP)", "###
Phase 2 (Follow-up)", and "### Effort Estimate" need blank lines added before
and after each heading to satisfy markdownlint MD022; edit the block containing
those headings in exploration.md so there is an empty line above and below each
`###` heading (ensure you also add a trailing blank line after the final
subsection) to fix the spacing.
- Around line 71-83: The unlabeled fenced code blocks in the exploration
examples (the sequence showing Channel.listen() → parse message → build
ContentPart::Image and the other two similar blocks) violate markdownlint MD040;
update each fenced block that documents the flow (the one containing
Channel.listen(), and the other blocks around the same example) to include a
language tag (e.g., ```text or ```mermaid as appropriate) so the fences are
labeled; locate the blocks near the sequence using identifiers like
Channel.listen(), process_channel_message(), extract_user_text(),
gate_and_stage_images(), StagedImageGuard and run_unified_channel_tool_loop()
and add the language label to each opening fence.
In `@openspec/changes/archive/2026-04-03-audio-input-support/proposal.md`:
- Around line 69-72: Two fenced code blocks (the one showing "Image flow:
Channel → ContentPart::Image → ..." and the pipeline block starting with
"extract_user_text()") are missing language identifiers which triggers
markdownlint MD040; update both fences to include a language label such as
```text so the blocks become labeled code fences (e.g., add "text" to the
opening backticks for the ContentPart::Image/Audio flow block and the
extract_user_text() pipeline block).
In `@openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md`:
- Around line 341-347: Update the archived spec to match the runtime: change the
Transcriber trait code fence to rust and ensure it reflects current runtime
usage; update the [audio] contract to include the AudioConfig fields
whisper_binary (default "whisper-cli"), max_concurrent_transcriptions, and
transcription_timeout_secs, and replace any lingering references to the binary
named "whisper" with "whisper-cli"; apply the same fixes to the other
Transcriber snippets and [audio] contract occurrences (the other two locations
mentioned) so the archived contract matches AudioConfig and runtime behavior.
In `@openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md`:
- Around line 25-37: Update the Markdown in verify-report.md to satisfy
MD022/MD031/MD040 by adding blank lines before and after the "Anti-Patterns
Check" and "Code Style" headings and by converting the three fenced code blocks
to have language tags and surrounding blank lines; specifically change the
blocks containing "cargo check --manifest-path clients/agent-runtime/Cargo.toml
→ Finished dev profile" and "cargo clippy --manifest-path
clients/agent-runtime/Cargo.toml --all-targets -- -D warnings → Finished dev
profile" to ```bash fenced blocks with a blank line before and after, and change
the final test summary block ("All test suites pass: unit tests...") to a
```text fenced block with a blank line before and after so the headings and
fenced blocks comply with linting rules.
In `@openspec/specs/audio-input/spec.md`:
- Around line 97-99: The spec's placement of the 7-step audio pipeline is
incorrect; update the documentation so it reflects the actual implementation
order used in clients/agent-runtime: the audio gating/staging/transcription
pipeline is executed before extract_user_text() (i.e., inserted into
process_channel_message() prior to calling extract_user_text()), not between
extract_user_text() and enrich_with_memory(); reference
process_channel_message(), extract_user_text(), and enrich_with_memory() when
making the spec change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5be0db47-780b-4ae1-9073-e138f500a063
📒 Files selected for processing (35)
- clients/agent-runtime/src/channels/audio_media.rs
- clients/agent-runtime/src/channels/discord.rs
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/channels/telegram.rs
- clients/agent-runtime/src/channels/traits.rs
- clients/agent-runtime/src/channels/whatsapp.rs
- clients/agent-runtime/src/config/mod.rs
- clients/agent-runtime/src/config/schema.rs
- clients/agent-runtime/src/doctor/mod.rs
- clients/agent-runtime/src/lib.rs
- clients/agent-runtime/src/main.rs
- clients/agent-runtime/src/observability/log.rs
- clients/agent-runtime/src/observability/mod.rs
- clients/agent-runtime/src/observability/otel.rs
- clients/agent-runtime/src/observability/prometheus.rs
- clients/agent-runtime/src/observability/traits.rs
- clients/agent-runtime/src/onboard/wizard.rs
- clients/agent-runtime/src/providers/anthropic.rs
- clients/agent-runtime/src/providers/compatible.rs
- clients/agent-runtime/src/providers/copilot.rs
- clients/agent-runtime/src/providers/openrouter.rs
- clients/agent-runtime/src/providers/router.rs
- clients/agent-runtime/src/providers/traits.rs
- clients/agent-runtime/src/transcription/mod.rs
- clients/agent-runtime/src/transcription/traits.rs
- clients/agent-runtime/src/transcription/whisper_cli.rs
- openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md
- openspec/changes/archive/2026-04-03-audio-input-support/design.md
- openspec/changes/archive/2026-04-03-audio-input-support/exploration.md
- openspec/changes/archive/2026-04-03-audio-input-support/proposal.md
- openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md
- openspec/changes/archive/2026-04-03-audio-input-support/state.yaml
- openspec/changes/archive/2026-04-03-audio-input-support/tasks.md
- openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md
- openspec/specs/audio-input/spec.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: submit-gradle
- GitHub Check: pr-checks
- GitHub Check: sonar
- GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (9)
clients/agent-runtime/src/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
clients/agent-runtime/src/**/*.rs: Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements
Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency
Files:
- clients/agent-runtime/src/observability/prometheus.rs
- clients/agent-runtime/src/providers/router.rs
- clients/agent-runtime/src/providers/anthropic.rs
- clients/agent-runtime/src/observability/otel.rs
- clients/agent-runtime/src/transcription/mod.rs
- clients/agent-runtime/src/providers/compatible.rs
- clients/agent-runtime/src/lib.rs
- clients/agent-runtime/src/channels/whatsapp.rs
- clients/agent-runtime/src/main.rs
- clients/agent-runtime/src/observability/mod.rs
- clients/agent-runtime/src/providers/copilot.rs
- clients/agent-runtime/src/providers/openrouter.rs
- clients/agent-runtime/src/observability/log.rs
- clients/agent-runtime/src/channels/discord.rs
- clients/agent-runtime/src/config/mod.rs
- clients/agent-runtime/src/providers/traits.rs
- clients/agent-runtime/src/doctor/mod.rs
- clients/agent-runtime/src/onboard/wizard.rs
- clients/agent-runtime/src/transcription/traits.rs
- clients/agent-runtime/src/channels/telegram.rs
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/observability/traits.rs
- clients/agent-runtime/src/config/schema.rs
- clients/agent-runtime/src/transcription/whisper_cli.rs
- clients/agent-runtime/src/channels/audio_media.rs
- clients/agent-runtime/src/channels/traits.rs
clients/agent-runtime/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why
**/*.rs
⚙️ CodeRabbit configuration file
**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.
Files:
clients/agent-runtime/src/observability/prometheus.rs
clients/agent-runtime/src/providers/router.rs
clients/agent-runtime/src/providers/anthropic.rs
clients/agent-runtime/src/observability/otel.rs
clients/agent-runtime/src/transcription/mod.rs
clients/agent-runtime/src/providers/compatible.rs
clients/agent-runtime/src/lib.rs
clients/agent-runtime/src/channels/whatsapp.rs
clients/agent-runtime/src/main.rs
clients/agent-runtime/src/observability/mod.rs
clients/agent-runtime/src/providers/copilot.rs
clients/agent-runtime/src/providers/openrouter.rs
clients/agent-runtime/src/observability/log.rs
clients/agent-runtime/src/channels/discord.rs
clients/agent-runtime/src/config/mod.rs
clients/agent-runtime/src/providers/traits.rs
clients/agent-runtime/src/doctor/mod.rs
clients/agent-runtime/src/onboard/wizard.rs
clients/agent-runtime/src/transcription/traits.rs
clients/agent-runtime/src/channels/telegram.rs
clients/agent-runtime/src/channels/mod.rs
clients/agent-runtime/src/observability/traits.rs
clients/agent-runtime/src/config/schema.rs
clients/agent-runtime/src/transcription/whisper_cli.rs
clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/traits.rs
**/*
⚙️ CodeRabbit configuration file
**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.
Files:
clients/agent-runtime/src/observability/prometheus.rs
clients/agent-runtime/src/providers/router.rs
clients/agent-runtime/src/providers/anthropic.rs
clients/agent-runtime/src/observability/otel.rs
clients/agent-runtime/src/transcription/mod.rs
clients/agent-runtime/src/providers/compatible.rs
clients/agent-runtime/src/lib.rs
clients/agent-runtime/src/channels/whatsapp.rs
clients/agent-runtime/src/main.rs
clients/agent-runtime/src/observability/mod.rs
openspec/changes/archive/2026-04-03-audio-input-support/state.yaml
clients/agent-runtime/src/providers/copilot.rs
clients/agent-runtime/src/providers/openrouter.rs
clients/agent-runtime/src/observability/log.rs
clients/agent-runtime/src/channels/discord.rs
clients/agent-runtime/src/config/mod.rs
clients/agent-runtime/src/providers/traits.rs
clients/agent-runtime/src/doctor/mod.rs
openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md
openspec/changes/archive/2026-04-03-audio-input-support/proposal.md
openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md
clients/agent-runtime/src/onboard/wizard.rs
clients/agent-runtime/src/transcription/traits.rs
openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md
openspec/changes/archive/2026-04-03-audio-input-support/exploration.md
openspec/specs/audio-input/spec.md
openspec/changes/archive/2026-04-03-audio-input-support/tasks.md
clients/agent-runtime/src/channels/telegram.rs
clients/agent-runtime/src/channels/mod.rs
clients/agent-runtime/src/observability/traits.rs
clients/agent-runtime/src/config/schema.rs
clients/agent-runtime/src/transcription/whisper_cli.rs
openspec/changes/archive/2026-04-03-audio-input-support/design.md
clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/traits.rs
clients/agent-runtime/src/providers/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
Implement `Provider` trait in `src/providers/` and register in `src/providers/mod.rs` factory when adding a new provider
Files:
clients/agent-runtime/src/providers/router.rs
clients/agent-runtime/src/providers/anthropic.rs
clients/agent-runtime/src/providers/compatible.rs
clients/agent-runtime/src/providers/copilot.rs
clients/agent-runtime/src/providers/openrouter.rs
clients/agent-runtime/src/providers/traits.rs
clients/agent-runtime/src/channels/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests
Files:
clients/agent-runtime/src/channels/whatsapp.rs
clients/agent-runtime/src/channels/discord.rs
clients/agent-runtime/src/channels/telegram.rs
clients/agent-runtime/src/channels/mod.rs
clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/traits.rs
clients/agent-runtime/src/main.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
clients/agent-runtime/src/main.rs: Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths
Keep startup path lean and avoid heavy initialization in command parsing flow
Files:
clients/agent-runtime/src/main.rs
clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable
Files:
clients/agent-runtime/src/config/mod.rs
clients/agent-runtime/src/config/schema.rs
**/*.{md,mdx}
⚙️ CodeRabbit configuration file
**/*.{md,mdx}: Verify technical accuracy and that docs stay aligned with code changes.
For user-facing docs, check EN/ES parity or explicitly note pending translation gaps.
Files:
openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md
openspec/changes/archive/2026-04-03-audio-input-support/proposal.md
openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md
openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md
openspec/changes/archive/2026-04-03-audio-input-support/exploration.md
openspec/specs/audio-input/spec.md
openspec/changes/archive/2026-04-03-audio-input-support/tasks.md
openspec/changes/archive/2026-04-03-audio-input-support/design.md
🧠 Learnings (10)
📓 Common learnings
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths
Applied to files:
clients/agent-runtime/src/observability/prometheus.rs
clients/agent-runtime/src/transcription/mod.rs
clients/agent-runtime/src/providers/compatible.rs
clients/agent-runtime/src/channels/whatsapp.rs
clients/agent-runtime/src/main.rs
clients/agent-runtime/src/channels/discord.rs
clients/agent-runtime/src/config/mod.rs
clients/agent-runtime/src/doctor/mod.rs
clients/agent-runtime/src/config/schema.rs
clients/agent-runtime/src/transcription/whisper_cli.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests
Applied to files:
clients/agent-runtime/src/providers/anthropic.rs
clients/agent-runtime/src/providers/compatible.rs
clients/agent-runtime/src/channels/whatsapp.rs
clients/agent-runtime/src/observability/mod.rs
clients/agent-runtime/src/providers/openrouter.rs
clients/agent-runtime/src/doctor/mod.rs
clients/agent-runtime/src/transcription/traits.rs
openspec/changes/archive/2026-04-03-audio-input-support/tasks.md
clients/agent-runtime/src/channels/telegram.rs
clients/agent-runtime/src/channels/mod.rs
clients/agent-runtime/src/observability/traits.rs
clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/traits.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/providers/**/*.rs : Implement `Provider` trait in `src/providers/` and register in `src/providers/mod.rs` factory when adding a new provider
Applied to files:
clients/agent-runtime/src/transcription/mod.rs
clients/agent-runtime/src/lib.rs
clients/agent-runtime/src/observability/mod.rs
clients/agent-runtime/src/transcription/traits.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/Cargo.toml : Do not add heavy dependencies for minor convenience; justify new crate additions
Applied to files:
clients/agent-runtime/src/transcription/mod.rs
clients/agent-runtime/src/main.rs
clients/agent-runtime/src/observability/mod.rs
clients/agent-runtime/src/config/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/*.rs : Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why
Applied to files:
clients/agent-runtime/src/transcription/mod.rs
clients/agent-runtime/src/channels/whatsapp.rs
clients/agent-runtime/src/main.rs
clients/agent-runtime/src/config/mod.rs
clients/agent-runtime/src/doctor/mod.rs
clients/agent-runtime/src/config/schema.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Keep startup path lean and avoid heavy initialization in command parsing flow
Applied to files:
clients/agent-runtime/src/main.rs
clients/agent-runtime/src/channels/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/**/*.rs : Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency
Applied to files:
clients/agent-runtime/src/main.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs : Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable
Applied to files:
clients/agent-runtime/src/config/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/tools/**/*.rs : Implement `Tool` trait in `src/tools/` with strict parameter schema, validate and sanitize all inputs, and return structured `ToolResult` without panics in runtime path
Applied to files:
clients/agent-runtime/src/transcription/traits.rs
🪛 LanguageTool
openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md
[style] ~399-~399: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...tion failure — process timeout - GIVEN a very large audio file near the duration limit - WH...
(EN_WEAK_ADJECTIVE)
[grammar] ~479-~479: Use a hyphen to join words.
Context: ...tartup validation error indicating the 1 hour ceiling #### Scenario: Missing aud...
(QB_NEW_EN_HYPHEN)
[style] ~838-~838: As an alternative to the over-used intensifier ‘extremely’, consider replacing this phrase.
Context: ...t, including: - Zero-byte audio files - Extremely large files (rejected by size limit) - Files ...
(EN_WEAK_ADJECTIVE)
[locale-violation] ~847-~847: In American English, ‘afterward’ is the preferred variant. ‘Afterwards’ is more commonly used in British English and other dialects.
Context: ... same user sends a text message "hello" afterwards - THEN the text message is processed no...
(AFTERWARDS_US)
openspec/changes/archive/2026-04-03-audio-input-support/exploration.md
[style] ~181-~181: Consider using a different adverb to strengthen your wording.
Context: ...) and audio files (audio field) are completely ignored — messages with only voice/au...
(COMPLETELY_ENTIRELY)
openspec/specs/audio-input/spec.md
[style] ~399-~399: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...tion failure — process timeout - GIVEN a very large audio file near the duration limit - WH...
(EN_WEAK_ADJECTIVE)
[grammar] ~479-~479: Use a hyphen to join words.
Context: ...tartup validation error indicating the 1 hour ceiling #### Scenario: Missing aud...
(QB_NEW_EN_HYPHEN)
[style] ~838-~838: As an alternative to the over-used intensifier ‘extremely’, consider replacing this phrase.
Context: ...t, including: - Zero-byte audio files - Extremely large files (rejected by size limit) - Files ...
(EN_WEAK_ADJECTIVE)
[locale-violation] ~847-~847: In American English, ‘afterward’ is the preferred variant. ‘Afterwards’ is more commonly used in British English and other dialects.
Context: ... same user sends a text message "hello" afterwards - THEN the text message is processed no...
(AFTERWARDS_US)
openspec/changes/archive/2026-04-03-audio-input-support/design.md
[style] ~943-~943: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...nges. - No provider contract changes. - No existing behavior modified. - Rollout: ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 markdownlint-cli2 (0.22.0)
openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md
[warning] 25-25: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 25-25: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 30-30: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 30-30: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 35-35: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 35-35: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 313-313: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 320-320: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
openspec/changes/archive/2026-04-03-audio-input-support/proposal.md
[warning] 69-69: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 79-79: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md
[warning] 341-341: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
openspec/changes/archive/2026-04-03-audio-input-support/exploration.md
[warning] 71-71: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 359-359: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 370-370: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 434-434: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 441-441: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 447-447: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
openspec/specs/audio-input/spec.md
[warning] 341-341: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
openspec/changes/archive/2026-04-03-audio-input-support/design.md
[warning] 82-82: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 92-92: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 949-949: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (15)
clients/agent-runtime/src/main.rs (2)
74-74: Good module wiring for transcription integration.
`mod transcription;` cleanly wires the new runtime transcription module into the binary crate.
718-719: Scoped lint suppression is acceptable here.
Applying `clippy::large_futures` at the dispatcher boundary is reasonable for this async match-heavy function.
clients/agent-runtime/src/providers/anthropic.rs (1)
975-976: Test fixture updates are correct.
Adding `audio_metadata: None` consistently keeps test `ChatMessage` fixtures aligned with the current provider trait contract.
Also applies to: 981-982, 987-988, 1000-1001, 1008-1009, 1023-1024, 1033-1034, 1155-1156, 1175-1176, 1277-1278, 1340-1341, 1346-1347, 1352-1353
clients/agent-runtime/src/transcription/mod.rs (1)
1-2: Module exports look clean and intentional.
Publicly exposing `traits` and `whisper_cli` is a solid, minimal surface for the transcription subsystem.
clients/agent-runtime/src/observability/log.rs (1)
192-203: Good audio ingress log coverage with safe metadata fields.
This adds the expected ingress lifecycle visibility without logging raw audio/transcript payloads.
clients/agent-runtime/src/providers/openrouter.rs (1)
538-539: Fixture alignment is correct.
The added `audio_metadata: None` keeps tests in sync with the expanded `ChatMessage` schema.
Also applies to: 544-545, 588-589, 594-595, 642-643, 738-739, 759-760
openspec/changes/archive/2026-04-03-audio-input-support/state.yaml (1)
1-8: Archive state entry looks complete and consistent.
Phase state, references, and branch linkage are properly captured.
clients/agent-runtime/src/observability/mod.rs (1)
17-19: Re-export update is correct.
Adding audio ingress types to the module surface keeps observability APIs coherent for downstream users.
clients/agent-runtime/src/onboard/wizard.rs (1)
799-799: Good fail-closed wiring for audio config defaults.
Both onboarding paths now initialize `audio` explicitly, which keeps generated configs complete and secure-by-default.
Also applies to: 1037-1037
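To make the fail-closed default concrete, here is a minimal sketch of a disabled-by-default audio config with a startup check. The struct is a simplified stand-in rather than the runtime's actual `AudioConfig`: the field types, the placeholder default values, the `validate_audio` name, and the `String` error type are all assumptions made for illustration. The blank-field check mirrors the inline suggestion later in this review for `config/schema.rs`.

```rust
/// Hypothetical, simplified stand-in for the runtime's `AudioConfig`.
#[derive(Debug, Clone)]
struct AudioConfig {
    enabled: bool,
    whisper_binary: String,
    transcription_model: String,
}

impl Default for AudioConfig {
    fn default() -> Self {
        Self {
            // Fail-closed: audio stays off until an operator opts in.
            enabled: false,
            // Placeholder values, not the project's real defaults.
            whisper_binary: String::from("whisper-cli"),
            transcription_model: String::from("base"),
        }
    }
}

/// Rejects blank binary or model names, but only when audio is enabled.
fn validate_audio(ac: &AudioConfig) -> Result<(), String> {
    if !ac.enabled {
        return Ok(());
    }
    if ac.whisper_binary.trim().is_empty() {
        return Err(String::from(
            "whisper_binary must be non-empty when audio is enabled",
        ));
    }
    if ac.transcription_model.trim().is_empty() {
        return Err(String::from(
            "transcription_model must be non-empty when audio is enabled",
        ));
    }
    Ok(())
}
```

With this shape, a generated config that never mentions audio validates cleanly, while an enabled config with a whitespace-only binary path fails at startup instead of at the first voice note.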
clients/agent-runtime/src/config/mod.rs (1)
5-15: Re-export update is correct and coherent.
Adding `AudioConfig` to the schema re-export keeps the config API aligned with the new `[audio]` section.
clients/agent-runtime/src/providers/traits.rs (1)
17-19: `audio_metadata` addition is backward-compatible and safely defaulted.
Using `#[serde(default, skip_serializing_if = "Option::is_none")]` here is the right compatibility choice for existing stored history payloads.
Also applies to: 23-29, 32-38, 41-52, 54-64, 67-73, 76-82
clients/agent-runtime/src/transcription/traits.rs (1)
23-42: Transcriber interface is clean and runtime-safe.
Good separation between user-facing transcription errors and startup/doctor health diagnostics.
clients/agent-runtime/src/channels/telegram.rs (1)
3077-3077: LGTM on the widened test matches.
Using `_ => panic!(...)` keeps these assertions focused on the expected variant now that `ContentPart` has another case.
Also applies to: 3140-3140, 3171-3171, 3184-3184, 3224-3224, 3279-3279, 3332-3332, 3389-3389
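As a rough illustration of the widened match, the sketch below uses a stand-in enum. The variant payloads and the `expect_text` helper are hypothetical; the real `ContentPart` in `channels/traits.rs` may carry different fields.

```rust
/// Hypothetical stand-in for the runtime's `ContentPart`; payloads are invented.
#[allow(dead_code)]
#[derive(Debug)]
enum ContentPart {
    Text(String),
    Image(String),
    Audio(String),
}

/// Test-style helper: extract the text or fail loudly on any other variant.
fn expect_text(part: &ContentPart) -> &str {
    match part {
        ContentPart::Text(text) => text.as_str(),
        // The wildcard covers Image, Audio, and any variant added later,
        // so the assertion keeps compiling when the enum grows.
        _ => panic!("expected ContentPart::Text"),
    }
}
```

The trade-off is that a wildcard arm will not prompt a compile error when a new variant appears, which is usually acceptable in tests that only care about one expected case.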
clients/agent-runtime/src/transcription/whisper_cli.rs (1)
50-70: Nice output normalization.
Filtering `[BLANK_AUDIO]` and collapsing multiline stdout here is a good guard against injecting empty/silent transcripts.
clients/agent-runtime/src/channels/audio_media.rs (1)
282-725: Strong boundary-focused test coverage for the new audio media layer.
Coverage across MIME sniffing, size/duration boundaries, cleanup behavior, and context rendering looks solid for this critical ingress path.
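The output normalization noted above for `whisper_cli.rs` could look roughly like the following. This is a sketch under assumptions, not the PR's code: the function name `normalize_transcript`, the `Option<String>` return type, and the choice to join lines with a single space are all illustrative.

```rust
/// Hypothetical normalizer for whisper CLI stdout.
/// Drops empty lines and `[BLANK_AUDIO]` markers, then joins the rest
/// into one line. Returns `None` when nothing usable remains.
fn normalize_transcript(stdout: &str) -> Option<String> {
    let joined = stdout
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && *line != "[BLANK_AUDIO]")
        .collect::<Vec<_>>()
        .join(" ");

    if joined.is_empty() {
        None
    } else {
        Some(joined)
    }
}
```

Returning `None` for silent clips lets the caller reject the turn explicitly instead of injecting an empty transcript into the conversation.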
```
cargo check --manifest-path clients/agent-runtime/Cargo.toml → Finished dev profile
```

**Clippy**: ✅ Passed (zero warnings)
```
cargo clippy --manifest-path clients/agent-runtime/Cargo.toml --all-targets -- -D warnings → Finished dev profile
```

**Tests**: ✅ 6,487 passed / 0 failed / 0 ignored
```
All test suites pass: unit tests (3193 lib + 3220 bin), 15 integration test suites, 2 doc-tests.
```
Fix markdownlint violations in fenced blocks and heading spacing.
The fenced blocks need language tags and surrounding blank lines, and the “Anti-Patterns Check” / “Code Style” headings need blank lines around them per MD022/MD031/MD040.
Proposed markdown fix
**Build**: ✅ Passed
-```
+
+```bash
cargo check --manifest-path clients/agent-runtime/Cargo.toml → Finished dev profileClippy: ✅ Passed (zero warnings)
- + +bash
cargo clippy --manifest-path clients/agent-runtime/Cargo.toml --all-targets -- -D warnings → Finished dev profile
**Tests**: ✅ 6,487 passed / 0 failed / 0 ignored
-```
+
+```text
All test suites pass: unit tests (3193 lib + 3220 bin), 15 integration test suites, 2 doc-tests.
@@
Code Quality Assessment
Anti-Patterns Check
- ✅ No
unwrap()/expect()in production code — all occurrences are in#[cfg(test)]blocks
@@
Code Style
- ✅ Follows existing codebase patterns (mirrors
StagedImageGuard,ImageRejectionReason, etc.)
</details>
Also applies to: 313-313, 320-320
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.22.0)</summary>
[warning] 25-25: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
---
[warning] 25-25: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
---
[warning] 30-30: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
---
[warning] 30-30: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
---
[warning] 35-35: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
---
[warning] 35-35: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against the current code and only fix it if needed.
In @openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md
around lines 25 - 37, Update the Markdown in verify-report.md to satisfy
MD022/MD031/MD040 by adding blank lines before and after the "Anti-Patterns
Check" and "Code Style" headings and by converting the three fenced code blocks
to have language tags and surrounding blank lines; specifically change the
blocks containing "cargo check --manifest-path clients/agent-runtime/Cargo.toml
→ Finished dev profile" and "cargo clippy --manifest-path
clients/agent-runtime/Cargo.toml --all-targets -- -D warnings → Finished dev
profile" to ```bash fenced blocks with a blank line before and after, and change
the final test summary block ("All test suites pass: unit tests...") to a
fenced blocks comply with linting rules.
High priority:
- Broaden MP3 magic-byte sync detection to accept full MPEG frame mask
- Map whisper non-zero exits to TranscriptionFailed by default, Corrupted only on decode-related stderr keywords
- Add kill_on_drop(true) to prevent orphaned whisper child processes
- Use create_new(true) for temp file creation to prevent symlink attacks
- Add MultipleAudioParts rejection variant instead of SystemError
- Check model path is_file() not just exists() in doctor
- Measure and report actual transcription latency instead of clip duration
- Wire WhisperCliTranscriber into ChannelRuntimeContext when audio enabled
- Move audio pipeline stages under per-turn timeout boundary
- Fall back to system model path when user path doesn't exist

Medium priority:
- Add deny_unknown_fields to AudioConfig for strict TOML parsing
- Validate transcription concurrency and timeout are non-zero at startup
- Deduplicate audio constants between schema and audio_media modules
- Record audio ingress metrics in OTEL and Prometheus backends
- Add user_with_media constructor to avoid partial-state mutations
- Add symmetric serde tests for audio_metadata field
- Fix discord test panic message for wildcard match arm
- Handle unauthorized audio-only messages in Telegram channel

Documentation:
- Fix stale GitHub link in archive report
- Add markdown language tags to fenced code blocks (MD040)
- Fix heading spacing (MD022) in exploration and verify-report
- Update spec pipeline order to match implementation
- Sync archived spec with AudioConfig runtime fields
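For the first high-priority item, a minimal sketch of an MPEG frame-sync check may help. It combines the broadened `0xE0` sync mask from this commit with the layer-bit check that the follow-up review comment on `audio_media.rs` proposes. The function name and the slice-based signature are assumptions; the real sniffing code may be structured differently.

```rust
/// Hypothetical MPEG audio frame-sync sniff over the first bytes of a file.
///
/// Byte 0 must be 0xFF and the top three bits of byte 1 must be set
/// (the 11-bit frame sync). Bits 2-1 of byte 1 encode the layer, and the
/// value 00 is reserved, so it is rejected. That also rejects ADTS AAC
/// headers such as 0xFF 0xF1, whose layer bits are always 00.
fn looks_like_mpeg_audio_frame(sniffed_bytes: &[u8]) -> bool {
    if sniffed_bytes.len() < 2 {
        return false;
    }
    let has_sync = sniffed_bytes[0] == 0xFF && (sniffed_bytes[1] & 0xE0) == 0xE0;
    let layer_is_valid = (sniffed_bytes[1] & 0x06) != 0;
    has_sync && layer_is_valid
}
```

Under this check a typical MPEG-1 Layer III header (`0xFF 0xFB`) is accepted, while `0xFF 0xE0` and `0xFF 0xF1` are rejected. A sync match alone is a weak signal, so a sniff like this is best treated as one input alongside size and duration validation.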
Actionable comments posted: 17
♻️ Duplicate comments (3)
clients/agent-runtime/src/doctor/mod.rs (1)
944-971: ⚠️ Potential issue | 🟡 Minor
`audio_health_pass_model_exists` still bypasses production logic.
This test rebuilds model-check behavior inline instead of exercising `check_audio_health`, so it can pass while real doctor logic regresses.
As per coding guidelines `**/*`: “Look for behavioral regressions, missing tests, and contract breaks across modules.”
openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md (1)
24-37: ⚠️ Potential issue | 🟡 Minor
Resolve remaining markdownlint spacing violations (MD031/MD022).
Fences at Line 25/30/35 need surrounding blank lines, and headings at Line 313 and Line 320 need blank lines below them.
🧹 Proposed markdown fix
**Build**: ✅ Passed + ```bash cargo check --manifest-path clients/agent-runtime/Cargo.toml → Finished dev profileClippy: ✅ Passed (zero warnings)
+cargo clippy --manifest-path clients/agent-runtime/Cargo.toml --all-targets -- -D warnings → Finished dev profileTests: ✅ 6,487 passed / 0 failed / 0 ignored
+All test suites pass: unit tests (3193 lib + 3220 bin), 15 integration test suites, 2 doc-tests.@@
Anti-Patterns Check
- ✅ No
unwrap()/expect()in production code — all occurrences are in#[cfg(test)]blocks
@@Code Style
- ✅ Follows existing codebase patterns (mirrors
StagedImageGuard,ImageRejectionReason, etc.)</details> Also applies to: 313-321 <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against the current code and only fix it if needed.
In
@openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md
around lines 24 - 37, The markdown has missing blank lines around fenced code
blocks and after two headings; add a blank line before and after each
triple-backtick fence that wraps the "cargo check --manifest-path
clients/agent-runtime/Cargo.toml → Finished dev profile" and the "cargo clippy
..." and the "All test suites pass: ..." code blocks, and insert a blank line
immediately below the "Anti-Patterns Check" and "Code Style" headings so each
heading is followed by an empty line; update the verify-report.md content
accordingly to satisfy MD031/MD022.</details> </blockquote></details> <details> <summary>clients/agent-runtime/src/channels/mod.rs (1)</summary><blockquote> `1123-1125`: _⚠️ Potential issue_ | _🟡 Minor_ **Keep `MultipleAudioParts` distinct in ingress telemetry.** This still collapses a known validation failure into `SystemError`, so rejected multi-audio turns are indistinguishable from real runtime faults in `AudioIngressEvent`. Add a dedicated `AudioIngressReason` variant and map it here instead. <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against the current code and only fix it if needed.
In
@clients/agent-runtime/src/channels/mod.rs around lines 1123 - 1125, The
match currently maps audio_media::AudioRejectionReason::MultipleAudioParts to
AudioIngressReason::SystemError, collapsing validation failures with real
faults; add a new AudioIngressReason variant (e.g., MultipleAudioParts) to the
AudioIngressReason enum and update the match in the code handling
audio_media::AudioRejectionReason so MultipleAudioParts maps to the new
AudioIngressReason::MultipleAudioParts; ensure any places that construct or
pattern-match AudioIngressEvent/AudioIngressReason are updated to handle the new
variant so telemetry distinguishes validation rejection from system errors.</details> </blockquote></details> </blockquote></details> <details> <summary>🤖 Prompt for all review comments with AI agents</summary>Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/settings.local.json:
- Around line 5-6: The new permissions are too broad: replace the unrestricted
"Bash(gh pr:*)" with a scoped, read-only GH CLI set (e.g., "Bash(gh pr:view,gh
pr:list,gh pr:status)" or the minimal verbs your workflow needs) and change
"Read(//tmp/)" to a dedicated temp subdirectory used only by this workflow
(e.g., "Read(//tmp//)") so the CLAUDE settings grant least
privilege while still allowing the workflow’s required read-only PR queries and
access to its own temp folder.

In @clients/agent-runtime/src/channels/audio_media.rs:
- Around line 120-124: The current MP3 sniff accepts any second byte with top 3
bits set (0xE0) which also matches reserved layer values and ADTS AAC headers
(e.g., 0xFF 0xF1); update the check that returns AllowedAudioMime::Mp3 to also
require valid (non-zero) MPEG layer bits so reserved/ADTS frames are rejected —
replace the condition in the snippet that inspects sniffed_bytes (and the
duplicate at the other occurrence) with a combined check: ensure
(sniffed_bytes[1] & 0xE0) == 0xE0 AND (sniffed_bytes[1] & 0x06) != 0, and flip
the test validate_audio_mime_detects_mp3_sync_e0 to expect rejection instead of
acceptance.
- Around line 239-277: The to_context_string() method in audio_media.rs
currently injects sha256 and byte_len into model-facing history; remove those
fields so the context string only contains the media marker (mime), optional
duration, the sanitized transcription, and the sanitized caption (retain the
existing newline stripping and 200-char truncation logic). Update the initial
format call in to_context_string() (and any subsequent writes) to no longer
include byte_len or sha256, ensure the closing ']' behavior stays the same, and
update tests that assert the produced context to expect strings without the hash
prefix and byte size (see to_context_string and ChatMessage::user_with_audio for
where this string is consumed).

In @clients/agent-runtime/src/channels/mod.rs:
- Around line 642-688: The per-turn timeout (CHANNEL_MESSAGE_TIMEOUT_SECS) and
started_at stopwatch must be started before any audio work so audio
fetch/staging and transcriber semaphore waits are counted; move the creation of
the per-turn timeout/stopwatch out of the post-audio section and into the code
path before evaluating msg.has_audio_parts(), or alternatively compute the
remaining budget and thread it into the audio functions (pass the
deadline/timeout into gate_audio_config, gate_and_stage_audio, and
transcribe_audio) so those calls respect the same timeout; update uses in the
audio pipeline (audio_history_metas, gate_audio_config, gate_and_stage_audio,
transcribe_audio, inject_transcription) to either run under the pre-started
timeout or accept and honor the remaining deadline.- Around line 1479-1483: The injected_text currently prefixes the transcript
with a synthetic label depending on caption_text, which alters downstream
behavior; instead set injected_text to the transcript content itself (the
trimmed string) without any “[Voice message transcription]”/“[Audio
transcription]” prefix, removing the conditional formatting logic around
caption_text and leaving audio provenance to AudioHistoryMeta so downstream
memory/pre-execution checks and providers see the exact transcribed text.

In @clients/agent-runtime/src/channels/telegram.rs:
- Around line 1856-1870: The current TOCTOU happens because you create the file
with OpenOptions::create_new and then drop the std::fs::File handle before
calling tokio::fs::write, allowing the file to be swapped; fix it by writing
through the original file handle instead of dropping it: keep the std::fs::File
returned by OpenOptions::open (the variable currently named file), and either
call file.write_all(&bytes) (or wrap it in spawn_blocking if you must avoid
blocking the async runtime) or convert it to a tokio::fs::File via
tokio::fs::File::from_std(file) and call async write_all; ensure you flush (and
optionally sync_all) and close the handle before proceeding.
In @clients/agent-runtime/src/config/schema.rs:
- Around line 3345-3371: When audio is enabled (ac.enabled), validate that
ac.whisper_binary and ac.transcription_model are not blank: trim whitespace and
if either is empty, return an error (anyhow::bail!) stating that whisper_binary
and/or transcription_model must be non-empty when audio is enabled; perform
these checks alongside the existing ac.allowed_channels validation (before the
tracing::info! log) so the audio path fails closed at startup rather than later
when spawning whisper or resolving models.
In @clients/agent-runtime/src/doctor/mod.rs:
- Around line 687-692: The code currently pushes DiagItem::ok for the whisper
binary when the spawned process returns any Ok result, which can mark a non-zero
exit (e.g., from running --help) as healthy; change the check after running
the binary so you inspect the child process ExitStatus (use status.success())
and only push DiagItem::ok when success() is true, otherwise push a failing
DiagItem (e.g., DiagItem::err) with the binary_path, the non-zero exit code or
status, and any stderr/stdout to make the failure clear.
In @clients/agent-runtime/src/providers/traits.rs:
- Around line 17-19: The audio_metadata field currently serializes
AudioHistoryMeta.transcription duplicating the transcript already stored in
ChatMessage.content; remove or scrub the transcription before persisting by
ensuring AudioHistoryMeta.transcription is either omitted or set to None/empty
during serialization for traits.rs (affecting the audio_metadata field and any
roundtrip test logic that inspects AudioHistoryMeta in the tests around the
lines referenced); update the serializer behavior or the code that constructs
audio_metadata to not carry user speech text while keeping other metadata fields
intact so only ChatMessage.content retains the transcript.
In @clients/agent-runtime/src/transcription/whisper_cli.rs:
- Around line 113-116: The model path check uses exists() which allows
directories; update both transcribe() and health_check() to validate the
resolved model path with is_file() instead of exists() (i.e., replace checks
that call self.model_path.exists() with self.model_path.is_file()) so
directories are rejected early and behavior matches the doctor module's
validation.
- Around line 194-214: The health_check in whisper_cli.rs currently spawns
Command::new(&self.binary_path).arg("--help") and treats any Ok(_) from
status().await as success; change the logic in health_check to inspect the
returned std::process::ExitStatus (from binary_check) and fail if
!status.success(): return Err including the exit code or status (use
status.code() or the ExitStatus) so non‑zero exits of the whisper-cli --help
properly produce an error for self.binary_path / health_check / binary_check.
In @openspec/changes/archive/2026-04-03-audio-input-support/design.md:
- Around line 829-850: The doc snippet is stale: replace the old
check_audio_config/DoctorWarning logic with the current
check_audio_health/DiagItem semantics—call check_audio_health (or rename the
snippet) and produce DiagItem entries instead of DoctorWarning, using the
module's file-existence check used elsewhere (e.g., check_model_file_exists or
model_path.is_file() rather than model_path.exists()) and update messages to
match DiagItem fields (kind/source "audio", diagnostic message, and appropriate
severity/category). Also update references from whisper_binary/whisper model
checks to the actual identifiers used by check_audio_health (e.g.,
resolve_model_path -> the current resolver) so the doc mirrors the real function
names and return structure.
- Around line 364-379: Update the design docs to match the implemented
transcription trait: change TranscriptionResult.duration_secs from f64 to
Option, update any function signatures that still show anyhow::Result or
Result<TranscriptionResult, _> to the actual Result<TranscriptionResult,
AudioRejectionReason> shape, and change health_check() return documentation from
bool to the implemented Result<(), String>; ensure all occurrences (including
the other referenced sections) reference the concrete types and error enum
AudioRejectionReason and the TranscriptionResult struct as implemented.
In @openspec/changes/archive/2026-04-03-audio-input-support/proposal.md:
- Line 39: Update the stale count "6 error types" in the proposal where the
phrase "unsupported format, too large, too long, corrupted, transcription
failed, no speech" appears—either remove the numeric count or replace it with an
accurate description that includes the additional cases (disabled, channel,
transcriber, system) implemented in the PR so the taxonomy in the text matches
the code/implementation; ensure the phrase describing error types reflects the
full set or uses non-numeric language like "the following error types" to avoid
future drift.
In @openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md:
- Around line 632-647: The spec omits the new
AudioRejectionReason::MultipleAudioParts variant; add a 12th row to the
rejection table for MultipleAudioParts with a clear user-facing message (e.g.,
"Please send only one audio file at a time.") and an "Emitted When" description
like "More than one audio attachment is present / runtime rejects multiple audio
parts", ensuring the taxonomy is updated from 11 to 12 variants so the archived
spec matches the shipped enum and runtime behavior; reference
AudioRejectionReason::MultipleAudioParts and update the surrounding text
asserting exhaustiveness (REQ-11) to reflect 12 variants.
In @openspec/specs/audio-input/spec.md:
- Around line 404-415: Update the spec text for the health_check() scenarios to
describe the resolved whisper model lookup order rather than a single hard-coded
path: list the precedence used by the implementation (explicit configured path,
user home path like ~/.corvus/models/whisper/{model}.bin, then
system/package-managed locations) or reference the resolved path returned by the
health_check() logic; ensure the "unhealthy" scenario expects an Err(String)
that names the resolved path it tried. Apply the same wording change to the
other affected section referenced (the block around lines 892-915) so
package-managed installs and the documented system fallback are covered
consistently.
- Around line 97-115: Add and document an early fail-closed check for multiple
audio parts: before the 7-step pipeline in process_channel_message() (i.e.,
prior to Parse/extract_user_text()), detect if more than one ContentPart::Audio
is present and immediately reject with AudioRejectionReason::MultipleAudioParts
and emit an AudioIngressEvent; update REQ-2 to state this early rejection and
update the REQ-11 taxonomy/table to include the MultipleAudioParts reason and
its human-readable message so the docs match the runtime behavior (also apply
the same insertion/update in the corresponding section around lines 632-647).
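The early multiple-audio rejection described in the last item can be sketched as follows. This is a minimal illustration, not the runtime's code: `ContentPart` and `AudioRejectionReason` here are simplified stand-ins with only the variants needed for the example, `gate_audio_parts` is a hypothetical helper name, and the user-facing message is the example wording suggested in the review.

```rust
/// Simplified, hypothetical stand-in for the runtime's message part type.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ContentPart {
    Text(String),
    Audio { mime: String },
}

/// Simplified stand-in; the real enum has more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AudioRejectionReason {
    MultipleAudioParts,
}

#[allow(dead_code)]
impl AudioRejectionReason {
    /// Example user-facing text taken from the review suggestion.
    fn user_message(self) -> &'static str {
        match self {
            Self::MultipleAudioParts => "Please send only one audio file at a time.",
        }
    }
}

/// Hypothetical gate that runs before the parse step: returns the single
/// audio part if there is one, `None` if there is no audio, and rejects
/// as soon as a second audio part is seen.
fn gate_audio_parts(
    parts: &[ContentPart],
) -> Result<Option<&ContentPart>, AudioRejectionReason> {
    let mut audio = parts
        .iter()
        .filter(|p| matches!(p, ContentPart::Audio { .. }));
    let first = audio.next();
    if audio.next().is_some() {
        return Err(AudioRejectionReason::MultipleAudioParts);
    }
    Ok(first)
}
```

Because the gate returns a dedicated reason rather than a generic error, the caller can map it to its own ingress telemetry value instead of folding it into a system-error bucket.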
Duplicate comments:
In @clients/agent-runtime/src/channels/mod.rs:
- Around line 1123-1125: The match currently maps
audio_media::AudioRejectionReason::MultipleAudioParts to
AudioIngressReason::SystemError, collapsing validation failures with real
faults; add a new AudioIngressReason variant (e.g., MultipleAudioParts) to the
AudioIngressReason enum and update the match in the code handling
audio_media::AudioRejectionReason so MultipleAudioParts maps to the new
AudioIngressReason::MultipleAudioParts; ensure any places that construct or
pattern-match AudioIngressEvent/AudioIngressReason are updated to handle the new
variant so telemetry distinguishes validation rejection from system errors.
In @openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md:
- Around line 24-37: The markdown has missing blank lines around fenced code
blocks and after two headings; add a blank line before and after each
triple-backtick fence that wraps the "cargo check --manifest-path
clients/agent-runtime/Cargo.toml → Finished dev profile" and the "cargo clippy
..." and the "All test suites pass: ..." code blocks, and insert a blank line
immediately below the "Anti-Patterns Check" and "Code Style" headings so each
heading is followed by an empty line; update the verify-report.md content
accordingly to satisfy MD031/MD022.</details> --- <details> <summary>ℹ️ Review info</summary> <details> <summary>⚙️ Run configuration</summary> **Configuration used**: Path: .coderabbit.yaml **Review profile**: ASSERTIVE **Plan**: Pro **Run ID**: `3d6a679b-6c7c-46da-983f-3bf8712ead66` </details> <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between c2f63419f794663da8ed48e9ffea658da3c5dbac and aa848be8752cee1ea4c499f8b9e299899d19a998. </details> <details> <summary>📒 Files selected for processing (19)</summary> * `.claude/settings.local.json` * `clients/agent-runtime/src/channels/audio_media.rs` * `clients/agent-runtime/src/channels/discord.rs` * `clients/agent-runtime/src/channels/mod.rs` * `clients/agent-runtime/src/channels/telegram.rs` * `clients/agent-runtime/src/config/schema.rs` * `clients/agent-runtime/src/doctor/mod.rs` * `clients/agent-runtime/src/observability/otel.rs` * `clients/agent-runtime/src/observability/prometheus.rs` * `clients/agent-runtime/src/providers/traits.rs` * `clients/agent-runtime/src/transcription/traits.rs` * `clients/agent-runtime/src/transcription/whisper_cli.rs` * `openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md` * `openspec/changes/archive/2026-04-03-audio-input-support/design.md` * `openspec/changes/archive/2026-04-03-audio-input-support/exploration.md` * `openspec/changes/archive/2026-04-03-audio-input-support/proposal.md` * `openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md` * `openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md` * `openspec/specs/audio-input/spec.md` </details> 
</details> <details> <summary>📜 Review details</summary> <details> <summary>⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)</summary> * GitHub Check: sonar * GitHub Check: pr-checks * GitHub Check: submit-gradle * GitHub Check: Cloudflare Pages </details> <details> <summary>🧰 Additional context used</summary> <details> <summary>📓 Path-based instructions (8)</summary> <details> <summary>clients/agent-runtime/src/channels/**/*.rs</summary> **📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)** > Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests Files: - `clients/agent-runtime/src/channels/discord.rs` - `clients/agent-runtime/src/channels/mod.rs` - `clients/agent-runtime/src/channels/telegram.rs` - `clients/agent-runtime/src/channels/audio_media.rs` </details> <details> <summary>clients/agent-runtime/src/**/*.rs</summary> **📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)** > `clients/agent-runtime/src/**/*.rs`: Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements > Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency Files: - `clients/agent-runtime/src/channels/discord.rs` - `clients/agent-runtime/src/observability/prometheus.rs` - `clients/agent-runtime/src/observability/otel.rs` - `clients/agent-runtime/src/transcription/traits.rs` - `clients/agent-runtime/src/providers/traits.rs` - `clients/agent-runtime/src/doctor/mod.rs` - `clients/agent-runtime/src/transcription/whisper_cli.rs` - `clients/agent-runtime/src/channels/mod.rs` - `clients/agent-runtime/src/channels/telegram.rs` - `clients/agent-runtime/src/config/schema.rs` - `clients/agent-runtime/src/channels/audio_media.rs` </details> <details> 
<summary>clients/agent-runtime/**/*.rs</summary> **📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)** > Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why Files: - `clients/agent-runtime/src/channels/discord.rs` - `clients/agent-runtime/src/observability/prometheus.rs` - `clients/agent-runtime/src/observability/otel.rs` - `clients/agent-runtime/src/transcription/traits.rs` - `clients/agent-runtime/src/providers/traits.rs` - `clients/agent-runtime/src/doctor/mod.rs` - `clients/agent-runtime/src/transcription/whisper_cli.rs` - `clients/agent-runtime/src/channels/mod.rs` - `clients/agent-runtime/src/channels/telegram.rs` - `clients/agent-runtime/src/config/schema.rs` - `clients/agent-runtime/src/channels/audio_media.rs` </details> <details> <summary>**/*.rs</summary> **⚙️ CodeRabbit configuration file** > `**/*.rs`: Focus on Rust idioms, memory safety, and ownership/borrowing correctness. > Flag unnecessary clones, unchecked panics in production paths, and weak error context. > Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling. > Files: - `clients/agent-runtime/src/channels/discord.rs` - `clients/agent-runtime/src/observability/prometheus.rs` - `clients/agent-runtime/src/observability/otel.rs` - `clients/agent-runtime/src/transcription/traits.rs` - `clients/agent-runtime/src/providers/traits.rs` - `clients/agent-runtime/src/doctor/mod.rs` - `clients/agent-runtime/src/transcription/whisper_cli.rs` - `clients/agent-runtime/src/channels/mod.rs` - `clients/agent-runtime/src/channels/telegram.rs` - `clients/agent-runtime/src/config/schema.rs` - `clients/agent-runtime/src/channels/audio_media.rs` </details> <details> <summary>**/*</summary> **⚙️ CodeRabbit configuration file** > `**/*`: Security first, performance second. > Validate input boundaries, auth/authz implications, and secret management. 
> Look for behavioral regressions, missing tests, and contract breaks across modules. > Files: - `clients/agent-runtime/src/channels/discord.rs` - `clients/agent-runtime/src/observability/prometheus.rs` - `clients/agent-runtime/src/observability/otel.rs` - `openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md` - `openspec/changes/archive/2026-04-03-audio-input-support/proposal.md` - `clients/agent-runtime/src/transcription/traits.rs` - `openspec/changes/archive/2026-04-03-audio-input-support/exploration.md` - `clients/agent-runtime/src/providers/traits.rs` - `openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md` - `openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md` - `openspec/specs/audio-input/spec.md` - `clients/agent-runtime/src/doctor/mod.rs` - `clients/agent-runtime/src/transcription/whisper_cli.rs` - `clients/agent-runtime/src/channels/mod.rs` - `openspec/changes/archive/2026-04-03-audio-input-support/design.md` - `clients/agent-runtime/src/channels/telegram.rs` - `clients/agent-runtime/src/config/schema.rs` - `clients/agent-runtime/src/channels/audio_media.rs` </details> <details> <summary>**/*.{md,mdx}</summary> **⚙️ CodeRabbit configuration file** > `**/*.{md,mdx}`: Verify technical accuracy and that docs stay aligned with code changes. > For user-facing docs, check EN/ES parity or explicitly note pending translation gaps. 
> Files: - `openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md` - `openspec/changes/archive/2026-04-03-audio-input-support/proposal.md` - `openspec/changes/archive/2026-04-03-audio-input-support/exploration.md` - `openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md` - `openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md` - `openspec/specs/audio-input/spec.md` - `openspec/changes/archive/2026-04-03-audio-input-support/design.md` </details> <details> <summary>clients/agent-runtime/src/providers/**/*.rs</summary> **📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)** > Implement `Provider` trait in `src/providers/` and register in `src/providers/mod.rs` factory when adding a new provider Files: - `clients/agent-runtime/src/providers/traits.rs` </details> <details> <summary>clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs</summary> **📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)** > Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable Files: - `clients/agent-runtime/src/config/schema.rs` </details> </details><details> <summary>🧠 Learnings (7)</summary> <details> <summary>📚 Learning: 2026-02-17T12:31:17.076Z</summary>Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths **Applied to files:** - `clients/agent-runtime/src/channels/discord.rs` - `openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md` - `openspec/specs/audio-input/spec.md` - `clients/agent-runtime/src/doctor/mod.rs` - `clients/agent-runtime/src/transcription/whisper_cli.rs` - `clients/agent-runtime/src/config/schema.rs` </details> <details> <summary>📚 Learning: 2026-02-17T12:31:17.076Z</summary>Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/providers/**/*.rs : Implement `Provider` trait in `src/providers/` and register in `src/providers/mod.rs` factory when adding a new provider **Applied to files:** - `clients/agent-runtime/src/transcription/traits.rs` </details> <details> <summary>📚 Learning: 2026-02-17T12:31:17.076Z</summary>Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests **Applied to files:** - `clients/agent-runtime/src/transcription/traits.rs` - `clients/agent-runtime/src/providers/traits.rs` - `openspec/specs/audio-input/spec.md` - `clients/agent-runtime/src/channels/mod.rs` - `clients/agent-runtime/src/channels/telegram.rs` - `clients/agent-runtime/src/channels/audio_media.rs` </details> <details> <summary>📚 Learning: 2026-02-17T12:31:17.076Z</summary>Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/*.rs : Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why **Applied to files:** - `clients/agent-runtime/src/providers/traits.rs` - `openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md` - `clients/agent-runtime/src/doctor/mod.rs` - `openspec/changes/archive/2026-04-03-audio-input-support/design.md` - `clients/agent-runtime/src/config/schema.rs` - `clients/agent-runtime/src/channels/audio_media.rs` </details> <details> <summary>📚 Learning: 2026-02-17T12:31:17.076Z</summary>Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/.github/**/*.{yml,yaml} : For workflow/template-only changes, ensure YAML/template syntax validity **Applied to files:** - `openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md` </details> <details> <summary>📚 Learning: 2026-02-17T12:31:17.076Z</summary>Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools}/**/*.rs : Treat `src/security/`, `src/gateway/`, `src/tools/` as high-risk surfaces and never broaden filesystem/network execution scope without explicit policy checks **Applied to files:** - `clients/agent-runtime/src/channels/telegram.rs` - `clients/agent-runtime/src/config/schema.rs` </details> <details> <summary>📚 Learning: 2026-02-17T12:31:17.076Z</summary>Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs : Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable**Applied to files:** - `clients/agent-runtime/src/config/schema.rs` </details> </details><details> <summary>🪛 LanguageTool</summary> <details> <summary>openspec/changes/archive/2026-04-03-audio-input-support/exploration.md</summary> [style] ~181-~181: Consider using a different adverb to strengthen your wording. Context: ...) and audio files (`audio` field) are **completely ignored** — messages with only voice/au... (COMPLETELY_ENTIRELY) </details> <details> <summary>openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md</summary> [style] ~399-~399: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase. Context: ...tion failure — process timeout - GIVEN a very large audio file near the duration limit - WH... (EN_WEAK_ADJECTIVE) --- [grammar] ~482-~482: Use a hyphen to join words. Context: ...tartup validation error indicating the 1 hour ceiling #### Scenario: Missing aud... (QB_NEW_EN_HYPHEN) --- [style] ~841-~841: As an alternative to the over-used intensifier ‘extremely’, consider replacing this phrase. Context: ...t, including: - Zero-byte audio files - Extremely large files (rejected by size limit) - Files ... (EN_WEAK_ADJECTIVE) --- [locale-violation] ~850-~850: In American English, ‘afterward’ is the preferred variant. ‘Afterwards’ is more commonly used in British English and other dialects. Context: ... same user sends a text message "hello" afterwards - THEN the text message is processed no... (AFTERWARDS_US) </details> <details> <summary>openspec/specs/audio-input/spec.md</summary> [style] ~399-~399: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase. 
Context: ...tion failure — process timeout - GIVEN a very large audio file near the duration limit - WH... (EN_WEAK_ADJECTIVE) --- [grammar] ~482-~482: Use a hyphen to join words. Context: ...tartup validation error indicating the 1 hour ceiling #### Scenario: Missing aud... (QB_NEW_EN_HYPHEN) --- [style] ~841-~841: As an alternative to the over-used intensifier ‘extremely’, consider replacing this phrase. Context: ...t, including: - Zero-byte audio files - Extremely large files (rejected by size limit) - Files ... (EN_WEAK_ADJECTIVE) --- [locale-violation] ~850-~850: In American English, ‘afterward’ is the preferred variant. ‘Afterwards’ is more commonly used in British English and other dialects. Context: ... same user sends a text message "hello" afterwards - THEN the text message is processed no... (AFTERWARDS_US) </details> <details> <summary>openspec/changes/archive/2026-04-03-audio-input-support/design.md</summary> [style] ~942-~942: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym. Context: ...nges. - No provider contract changes. - No existing behavior modified. - Rollout: ... 
(ENGLISH_WORD_REPEAT_BEGINNING_RULE) </details> </details> <details> <summary>🪛 markdownlint-cli2 (0.22.0)</summary> <details> <summary>openspec/changes/archive/2026-04-03-audio-input-support/verify-report.md</summary> [warning] 25-25: Fenced code blocks should be surrounded by blank lines (MD031, blanks-around-fences) --- [warning] 30-30: Fenced code blocks should be surrounded by blank lines (MD031, blanks-around-fences) --- [warning] 35-35: Fenced code blocks should be surrounded by blank lines (MD031, blanks-around-fences) --- [warning] 313-313: Headings should be surrounded by blank lines Expected: 1; Actual: 0; Below (MD022, blanks-around-headings) --- [warning] 320-320: Headings should be surrounded by blank lines Expected: 1; Actual: 0; Below (MD022, blanks-around-headings) </details> </details> </details> <details> <summary>🔇 Additional comments (10)</summary><blockquote> <details> <summary>clients/agent-runtime/src/channels/discord.rs (1)</summary><blockquote> `955-955`: **LGTM - test diagnostic improved.** The panic message now accurately reflects the unexpected variant, addressing the prior feedback. Good fix. </blockquote></details> <details> <summary>clients/agent-runtime/src/observability/otel.rs (1)</summary><blockquote> `209-223`: **Audio ingress OTEL handling is now correctly wired.** Line 209 no longer drops `ObserverEvent::AudioIngress`, and Line 215 emits the counter with structured attributes. </blockquote></details> <details> <summary>clients/agent-runtime/src/observability/prometheus.rs (1)</summary><blockquote> `120-131`: **Prometheus audio ingress metrics are correctly added.** The new counter registration and the `ObserverEvent::AudioIngress` handler cover the previously missing telemetry path. 
Also applies to: 205-215 </blockquote></details> <details> <summary>openspec/changes/archive/2026-04-03-audio-input-support/archive-report.md (1)</summary><blockquote> `4-7`: **Archive metadata looks aligned and corrected.** Issue linking and archive metadata are consistent with the PR context. </blockquote></details> <details> <summary>clients/agent-runtime/src/transcription/traits.rs (1)</summary><blockquote> `35-45`: **Transcriber contract is well-shaped for pipeline error mapping.** Returning `AudioRejectionReason` from `transcribe()` and structured readiness errors from `health_check()` keeps ingress handling deterministic. </blockquote></details> <details> <summary>clients/agent-runtime/src/channels/telegram.rs (4)</summary><blockquote> `1-1`: **LGTM!** Import aligns with the new audio_media module structure. --- `63-107`: **LGTM!** Voice/audio parsing correctly mirrors the image parsing pattern. The forced `audio/ogg` for voice notes matches Telegram's spec. --- `749-776`: **LGTM!** Correctly addresses the unauthorized media-only message handling gap. Senders now receive notification regardless of whether they sent text or audio-only content. --- `3114-3114`: **LGTM!** Test assertion pattern change is fine. </blockquote></details> <details> <summary>openspec/changes/archive/2026-04-03-audio-input-support/exploration.md (1)</summary><blockquote> `1-463`: **LGTM!** Documentation accurately reflects the implementation design. The markdown formatting issues from previous review (MD040/MD022) have been addressed — code blocks are labeled and headings have proper spacing. </blockquote></details> </blockquote></details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
    let injected_text = if caption_text.is_some() {
        format!("[Audio transcription]: {trimmed}")
    } else {
        format!("[Voice message transcription]: {trimmed}")
    };
Inject the transcript text itself, not a synthetic label.
Prefixing the transcript with [Voice message transcription]: / [Audio transcription]: changes what memory, pre-execution checks, and the provider see, so audio input is no longer equivalent to the typed user text. The audio provenance is already preserved in AudioHistoryMeta.
✂️ Suggested change
- let injected_text = if caption_text.is_some() {
- format!("[Audio transcription]: {trimmed}")
- } else {
- format!("[Voice message transcription]: {trimmed}")
- };
+ let injected_text = trimmed.clone();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@clients/agent-runtime/src/channels/mod.rs` around lines 1479 - 1483, The
injected_text currently prefixes the transcript with a synthetic label depending
on caption_text, which alters downstream behavior; instead set injected_text to
the transcript content itself (the trimmed string) without any “[Voice message
transcription]”/“[Audio transcription]” prefix, removing the conditional
formatting logic around caption_text and leaving audio provenance to
AudioHistoryMeta so downstream memory/pre-execution checks and providers see the
exact transcribed text.
```rust
fn check_audio_config(config: &AudioConfig) -> Vec<DoctorWarning> {
    let mut warnings = Vec::new();
    if config.enabled {
        // 1. Check whisper binary is in PATH or at configured path
        if which::which(&config.whisper_binary).is_err() {
            warnings.push(DoctorWarning::new(
                "audio",
                format!("whisper binary '{}' not found in PATH", config.whisper_binary),
            ));
        }
        // 2. Check model file exists
        let model_path = resolve_model_path(&config.transcription_model);
        if !model_path.exists() {
            warnings.push(DoctorWarning::new(
                "audio",
                format!("Whisper model not found at {}", model_path.display()),
            ));
        }
    }
    warnings
}
```
Doctor API snippet is stale relative to current module.
The snippet references check_audio_config, DoctorWarning, and model_path.exists(); the implemented path is check_audio_health with DiagItem and file check semantics. Please refresh this block to avoid misleading future changes.
As per coding guidelines **/*.{md,mdx}: “Verify technical accuracy and that docs stay aligned with code changes.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@openspec/changes/archive/2026-04-03-audio-input-support/design.md` around
lines 829 - 850, The doc snippet is stale: replace the old
check_audio_config/DoctorWarning logic with the current
check_audio_health/DiagItem semantics—call check_audio_health (or rename the
snippet) and produce DiagItem entries instead of DoctorWarning, using the
module's file-existence check used elsewhere (e.g., check_model_file_exists or
model_path.is_file() rather than model_path.exists()) and update messages to
match DiagItem fields (kind/source "audio", diagnostic message, and appropriate
severity/category). Also update references from whisper_binary/whisper model
checks to the actual identifiers used by check_audio_health (e.g.,
resolve_model_path -> the current resolver) so the doc mirrors the real function
names and return structure.
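For illustration, a refreshed doc snippet could look roughly like the sketch below. The `DiagItem`, `Severity`, and `AudioConfig` shapes here are assumptions made up for the example, not the module's real definitions, and the binary check is simplified to an explicit path rather than a `PATH` lookup. The one point it is meant to show is the `is_file()` check in place of `exists()`.

```rust
use std::path::{Path, PathBuf};

// Hypothetical stand-ins; the real DiagItem and config types may differ.
#[derive(Debug, PartialEq)]
enum Severity {
    Ok,
    Warn,
}

#[derive(Debug)]
struct DiagItem {
    source: &'static str,
    severity: Severity,
    message: String,
}

struct AudioConfig {
    enabled: bool,
    whisper_binary: PathBuf,
    model_path: PathBuf,
}

/// Returns one diagnostic per checked file, or nothing when audio is disabled.
fn check_audio_health(config: &AudioConfig) -> Vec<DiagItem> {
    if !config.enabled {
        return Vec::new();
    }
    vec![
        file_check("whisper binary", &config.whisper_binary),
        file_check("whisper model", &config.model_path),
    ]
}

fn file_check(label: &str, path: &Path) -> DiagItem {
    // is_file() is false for directories and missing paths alike,
    // whereas exists() would accept a directory.
    let (severity, verb) = if path.is_file() {
        (Severity::Ok, "found")
    } else {
        (Severity::Warn, "not found")
    };
    DiagItem {
        source: "audio",
        severity,
        message: format!("{label} {verb} at {}", path.display()),
    }
}
```

A directory sitting at the model path would pass an `exists()` check but is reported as a warning here.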
| - Audio observability events (`AudioIngressEvent`, `on_audio_ingress()`) | ||
| - Audio history metadata (`AudioHistoryMeta` with transcription text) | ||
| - `StagedAudioGuard` RAII cleanup | ||
| - 6 error types: unsupported format, too large, too long, corrupted, transcription failed, no speech |
Fix rejection taxonomy count in proposal.
Line 39 states “6 error types”, but this PR’s documented/implemented taxonomy is larger (including disabled/channel/transcriber/system cases). Update this line to avoid stale requirements language.
✏️ Proposed doc fix
-- 6 error types: unsupported format, too large, too long, corrupted, transcription failed, no speech
+- 11+ rejection/error categories covering format, size/duration, corruption, transcription/no-speech, and runtime availability/system failures
As per coding guidelines **/*.{md,mdx}: “Verify technical accuracy and that docs stay aligned with code changes.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@openspec/changes/archive/2026-04-03-audio-input-support/proposal.md` at line
39, Update the stale count "6 error types" in the proposal where the phrase
"unsupported format, too large, too long, corrupted, transcription failed, no
speech" appears—either remove the numeric count or replace it with an accurate
description that includes the additional cases (disabled, channel, transcriber,
system) implemented in the PR so the taxonomy in the text matches the
code/implementation; ensure the phrase describing error types reflects the full
set or uses non-numeric language like "the following error types" to avoid
future drift.
| | Rejection Reason | User-Facing Message | Emitted When | | ||
| |-------------------------|--------------------------------------------------------------------------------------|-----------------------------------------------------------| | ||
| | `Disabled` | "Audio input is currently disabled." | `audio.enabled` is `false` | | ||
| | `ChannelNotAllowed` | "Audio input is not enabled for this channel." | Channel not in `audio.allowed_channels` | | ||
| | `FetchFailed` | "I couldn't download that audio safely. Please try again." | Channel fetch fails (network, auth, timeout) | | ||
| | `MimeRejected` | "That audio format is not supported. Supported formats: OGG, MP3, WAV, M4A." | Magic-byte sniffing does not match allowed formats | | ||
| | `Oversize` | "That audio file is too large to process. Maximum size: 25 MB." | Audio bytes exceed effective size limit | | ||
| | `TooLong` | "That audio is too long to process. Maximum duration: 10 minutes." | Duration exceeds effective duration limit | | ||
| | `Corrupted` | "That audio file appears to be corrupted and cannot be processed." | Transcription engine cannot decode the audio | | ||
| | `TranscriptionFailed` | "Audio transcription failed. Please try again or send text instead." | Transcriber returns error (process crash, timeout, etc.) | | ||
| | `NoSpeechDetected` | "No speech was detected in that audio. Please try again with a clearer recording." | Transcription produces empty/whitespace-only text | | ||
| | `TranscriberUnavailable`| "Audio transcription is not available on this agent. Please send text instead." | No healthy Transcriber is registered or health check fails| | ||
| | `SystemError` | "An internal error occurred while processing audio. Please try again." | Unexpected internal error (e.g., temp file I/O failure, semaphore poisoning) | | ||
|
|
||
| This taxonomy (11 variants) MUST be exhaustive for Phase 1. Every audio rejection MUST map to | ||
| exactly one of these reasons. |
Document the MultipleAudioParts rejection.
REQ-11 still declares an exhaustive 11-variant taxonomy, but the runtime now rejects multiple audio attachments with AudioRejectionReason::MultipleAudioParts and a dedicated user-facing message. Please add that 12th variant here, along with its ingress-reason behavior, so the archived spec matches the shipped contract.
As per coding guidelines, "Verify technical accuracy and that docs stay aligned with code changes."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@openspec/changes/archive/2026-04-03-audio-input-support/specs/audio-input/spec.md`
around lines 632 - 647, The spec omits the new
AudioRejectionReason::MultipleAudioParts variant; add a 12th row to the
rejection table for `MultipleAudioParts` with a clear user-facing message (e.g.,
"Please send only one audio file at a time.") and an "Emitted When" description
like "More than one audio attachment is present / runtime rejects multiple audio
parts", ensuring the taxonomy is updated from 11 to 12 variants so the archived
spec matches the shipped enum and runtime behavior; reference
AudioRejectionReason::MultipleAudioParts and update the surrounding text
asserting exhaustiveness (REQ-11) to reflect 12 variants.
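One way to keep the spec table and the code from drifting again is to derive the user-facing messages from an exhaustive `match`, as in the sketch below. The variant names come from the table above plus `MultipleAudioParts`; the `ALL` constant and `user_message` method are hypothetical, the size and duration strings are copied from the table rather than formatted from config, and the `MultipleAudioParts` wording is only the example suggested in this comment, not necessarily the shipped text.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AudioRejectionReason {
    Disabled,
    ChannelNotAllowed,
    FetchFailed,
    MimeRejected,
    Oversize,
    TooLong,
    Corrupted,
    TranscriptionFailed,
    NoSpeechDetected,
    TranscriberUnavailable,
    SystemError,
    MultipleAudioParts,
}

impl AudioRejectionReason {
    // Hypothetical helper listing every variant, so a test can pin the count.
    const ALL: [AudioRejectionReason; 12] = [
        Self::Disabled,
        Self::ChannelNotAllowed,
        Self::FetchFailed,
        Self::MimeRejected,
        Self::Oversize,
        Self::TooLong,
        Self::Corrupted,
        Self::TranscriptionFailed,
        Self::NoSpeechDetected,
        Self::TranscriberUnavailable,
        Self::SystemError,
        Self::MultipleAudioParts,
    ];

    // No wildcard arm: adding a variant without a message fails to compile.
    fn user_message(self) -> &'static str {
        match self {
            Self::Disabled => "Audio input is currently disabled.",
            Self::ChannelNotAllowed => "Audio input is not enabled for this channel.",
            Self::FetchFailed => "I couldn't download that audio safely. Please try again.",
            Self::MimeRejected => {
                "That audio format is not supported. Supported formats: OGG, MP3, WAV, M4A."
            }
            Self::Oversize => "That audio file is too large to process. Maximum size: 25 MB.",
            Self::TooLong => "That audio is too long to process. Maximum duration: 10 minutes.",
            Self::Corrupted => "That audio file appears to be corrupted and cannot be processed.",
            Self::TranscriptionFailed => {
                "Audio transcription failed. Please try again or send text instead."
            }
            Self::NoSpeechDetected => {
                "No speech was detected in that audio. Please try again with a clearer recording."
            }
            Self::TranscriberUnavailable => {
                "Audio transcription is not available on this agent. Please send text instead."
            }
            Self::SystemError => {
                "An internal error occurred while processing audio. Please try again."
            }
            // Example wording from this review comment; assumed, not confirmed.
            Self::MultipleAudioParts => "Please send only one audio file at a time.",
        }
    }
}
```

A unit test asserting `ALL.len() == 12` would then fail loudly the next time the taxonomy grows without a matching spec update.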
| The runtime MUST process every inbound audio through a 7-step pipeline inserted into | ||
| `process_channel_message()` before `extract_user_text()` and `enrich_with_memory()`: | ||
|
|
||
| 1. **Parse**: Channel extracts audio metadata into `ContentPart::Audio` (REQ-1) | ||
| 2. **Gate config**: Check `[audio]` config — `enabled` and `allowed_channels` (REQ-7) | ||
| 3. **Fetch**: Download audio bytes from the channel's platform API (REQ-10) | ||
| 4. **Validate**: Apply MIME sniffing, size limit, and duration limit (REQ-3, REQ-4) | ||
| 5. **Stage**: Write validated bytes to temp file as `StagedAudio`, protected by `StagedAudioGuard` | ||
| RAII cleanup (REQ-5) | ||
| 6. **Transcribe**: Invoke `Transcriber::transcribe()` to produce text (REQ-6) | ||
| 7. **Inject**: Replace `ContentPart::Audio` with `ContentPart::Text` containing the transcription; | ||
| store `AudioHistoryMeta` (REQ-8) | ||
|
|
||
| After injection, the message continues through the normal text-only flow (`enrich_with_memory()` → | ||
| `run_unified_channel_tool_loop()` → provider). The provider MUST NOT receive audio bytes or any | ||
| audio-specific payload. | ||
|
|
||
| The pipeline MUST be fail-closed: any step that cannot be completed MUST reject the audio with an | ||
| appropriate `AudioRejectionReason` and emit an `AudioIngressEvent`. |
Document the multiple-audio rejection contract.
Issue #246 is single-audio-per-message, and the runtime already has AudioRejectionReason::MultipleAudioParts, but REQ-2 still reads as if every inbound audio proceeds into the 7-step pipeline and REQ-11 says the 11-row table is exhaustive. Please add the early rejection when more than one audio part is present and include the missing reason/message in the taxonomy.
As per coding guidelines, "Verify technical accuracy and that docs stay aligned with code changes."
Also applies to: 632-647
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@openspec/specs/audio-input/spec.md` around lines 97 - 115, Add and document
an early fail-closed check for multiple audio parts: before the 7-step pipeline
in process_channel_message() (i.e., prior to Parse/extract_user_text()), detect
if more than one ContentPart::Audio is present and immediately reject with
AudioRejectionReason::MultipleAudioParts and emit an AudioIngressEvent; update
REQ-2 to state this early rejection and update the REQ-11 taxonomy/table to
include the MultipleAudioParts reason and its human-readable message so the docs
match the runtime behavior (also apply the same insertion/update in the
corresponding section around lines 632-647).
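The early rejection described here can be sketched as a small pre-pipeline check. The `ContentPart` shape and the `single_audio_index` helper below are hypothetical simplifications for illustration; the runtime's real types carry more fields, and the real check would also emit an `AudioIngressEvent`.

```rust
// Simplified stand-in for the runtime's multimodal content parts.
enum ContentPart {
    Text(String),
    Audio { file_id: String },
}

#[derive(Debug, PartialEq, Eq)]
enum AudioRejectionReason {
    MultipleAudioParts,
}

/// Returns the index of the single audio part, `None` when the message has no
/// audio, or a rejection when more than one audio part is present.
fn single_audio_index(parts: &[ContentPart]) -> Result<Option<usize>, AudioRejectionReason> {
    let mut audio_indices = parts
        .iter()
        .enumerate()
        .filter(|(_, part)| matches!(part, ContentPart::Audio { .. }))
        .map(|(index, _)| index);

    let first = audio_indices.next();
    // A second hit means the message violates the one-audio-per-message rule.
    if audio_indices.next().is_some() {
        return Err(AudioRejectionReason::MultipleAudioParts);
    }
    Ok(first)
}
```

Running this before the gate step keeps the 7-step pipeline itself unchanged: it only ever sees zero or one audio part.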
| #### Scenario: Health check — healthy | ||
|
|
||
| - GIVEN whisper binary exists at the configured path | ||
| - AND the configured model file exists at `~/.corvus/models/whisper/{model}.bin` | ||
| - WHEN `health_check()` is called | ||
| - THEN it MUST return `Ok(())` | ||
|
|
||
| #### Scenario: Health check — unhealthy (missing model) | ||
|
|
||
| - GIVEN whisper binary exists but the configured model file does not exist | ||
| - WHEN `health_check()` is called | ||
| - THEN it MUST return `Err(String)` with a descriptive message about the missing model |
Describe the resolved whisper model path, not a single hard-coded location.
The implementation notes a system model-path fallback, but these sections only allow ~/.corvus/models/whisper/{model}.bin. Please document the actual lookup order or refer to the resolved model path so package-managed installs remain in spec.
As per coding guidelines, "Verify technical accuracy and that docs stay aligned with code changes."
Also applies to: 892-915
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@openspec/specs/audio-input/spec.md` around lines 404 - 415, Update the spec
text for the health_check() scenarios to describe the resolved whisper model
lookup order rather than a single hard-coded path: list the precedence used by
the implementation (explicit configured path, user home path like
~/.corvus/models/whisper/{model}.bin, then system/package-managed locations) or
reference the resolved path returned by the health_check() logic; ensure the
"unhealthy" scenario expects an Err(String) that names the resolved path it
tried. Apply the same wording change to the other affected section referenced
(the block around lines 892-915) so package-managed installs and the documented
system fallback are covered consistently.
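A lookup order like the one suggested above could be expressed as follows. This is a sketch under stated assumptions: the function names and signatures are invented for the example, the system directory is passed in as a parameter because the actual fallback location is not given in this thread, and the real resolver may order or name things differently.

```rust
use std::path::{Path, PathBuf};

/// Builds the candidate list in precedence order: explicit path first, then
/// the per-user location, then a system/package-managed directory.
fn candidate_model_paths(
    explicit: Option<&Path>,
    home: &Path,
    system_dir: &Path, // assumed parameter; the real fallback path is not shown here
    model: &str,
) -> Vec<PathBuf> {
    let file_name = format!("{model}.bin");
    let mut candidates = Vec::new();
    if let Some(path) = explicit {
        candidates.push(path.to_path_buf());
    }
    candidates.push(home.join(".corvus/models/whisper").join(&file_name));
    candidates.push(system_dir.join(&file_name));
    candidates
}

/// Picks the first candidate that is a regular file; the error names every
/// path that was tried so the health check message stays actionable.
fn resolve_model_path(candidates: &[PathBuf]) -> Result<PathBuf, String> {
    candidates
        .iter()
        .find(|path| path.is_file())
        .cloned()
        .ok_or_else(|| {
            let tried: Vec<String> = candidates
                .iter()
                .map(|path| path.display().to_string())
                .collect();
            format!("whisper model not found; tried: {}", tried.join(", "))
        })
}
```

With this shape, the "unhealthy" scenario can assert on the resolved paths in the error string instead of a single hard-coded location.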
Actionable comments posted: 2
♻️ Duplicate comments (5)
clients/agent-runtime/src/channels/mod.rs (3)
640-689: ⚠️ Potential issue | 🟠 Major
The turn timeout still starts after audio work.
Moving `started_at` earlier only changes logging. The actual `tokio::time::timeout(...)` is still created at Line 775, after gating, fetch/staging, and transcription complete, so slow downloads or transcriber queue waits can still bypass `CHANNEL_MESSAGE_TIMEOUT_SECS`. Start one deadline before the audio branch and spend the remaining budget inside the audio helpers.
Also applies to: 775-791
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/channels/mod.rs` around lines 640 - 689, The timeout/deadline must be created before doing audio work so slow gating, staging or transcription can't bypass CHANNEL_MESSAGE_TIMEOUT_SECS; move creation of the deadline/timeout (currently using started_at + CHANNEL_MESSAGE_TIMEOUT_SECS and tokio::time::timeout(...)) to immediately after computing session_id/started_at, then thread the remaining time/deadline into the audio helpers (gate_audio_config, gate_and_stage_audio, transcribe_audio) or wrap those calls with a timeout using the precomputed remaining Duration so they honor the same CHANNEL_MESSAGE_TIMEOUT_SECS budget.
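The "one deadline, shared budget" idea can be sketched with the standard library alone. `TurnDeadline` is a hypothetical helper, not something in the module; in the async code each stage would presumably be wrapped as `tokio::time::timeout(deadline.remaining()?, stage)` or similar.

```rust
use std::time::{Duration, Instant};

/// A single deadline created once per inbound message and shared by every
/// stage (gating, fetch/staging, transcription, provider call).
struct TurnDeadline {
    deadline: Instant,
}

impl TurnDeadline {
    fn start(budget: Duration) -> Self {
        Self {
            deadline: Instant::now() + budget,
        }
    }

    /// Time left in the budget, or `None` once it is exhausted. Each stage
    /// asks for this right before it starts, so time spent by earlier stages
    /// is automatically deducted.
    fn remaining(&self) -> Option<Duration> {
        let left = self.deadline.saturating_duration_since(Instant::now());
        if left.is_zero() {
            None
        } else {
            Some(left)
        }
    }
}
```

Because every stage draws from the same `Instant`, a slow download shortens the time available to transcription and the provider call instead of extending the overall turn.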
1479-1483: ⚠️ Potential issue | 🟠 Major
Inject the transcript text verbatim.
These prefixes change what memory, pre-execution checks, and the provider see, so audio input is no longer equivalent to typed input. The provenance already lives in `AudioHistoryMeta`.
Suggested change
- let injected_text = if caption_text.is_some() {
-     format!("[Audio transcription]: {trimmed}")
- } else {
-     format!("[Voice message transcription]: {trimmed}")
- };
+ let injected_text = trimmed.clone();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/channels/mod.rs` around lines 1479 - 1483, The injected transcript currently prepends a prefix based on caption_text (the injected_text construction) which changes how downstream memory and providers treat audio vs typed input; remove those prefixes and inject the transcript verbatim (use the trimmed transcript string directly) so provenance remains in AudioHistoryMeta and the transcript is equivalent to typed input; update the code that builds injected_text in channels/mod.rs (the block referencing caption_text and injected_text) to assign the plain trimmed text without "[Audio transcription]" or "[Voice message transcription]" prefixes.
1102-1125: ⚠️ Potential issue | 🟠 Major
Preserve `MultipleAudioParts` in observability.
This mapping still collapses a known validation failure into `AudioIngressReason::SystemError`, so dashboards and alerts cannot distinguish “one audio per message” rejections from real runtime failures. Add a dedicated observability reason and map it through here instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/channels/mod.rs` around lines 1102 - 1125, The match in audio_rejection_to_ingress_reason collapses audio_media::AudioRejectionReason::MultipleAudioParts into AudioIngressReason::SystemError; introduce a dedicated observability variant (e.g., AudioIngressReason::MultipleAudioParts) in the observability enum and update audio_rejection_to_ingress_reason to map audio_media::AudioRejectionReason::MultipleAudioParts to that new variant instead of SystemError so validation rejections are distinguishable in dashboards and alerts; ensure any serialization/usage sites of AudioIngressReason handle the new variant.
clients/agent-runtime/src/channels/audio_media.rs (2)
247-254: ⚠️ Potential issue | 🟠 Major
Keep trace metadata out of model-facing history.
`to_context_string()` is replayed into chat history from `build_history()`. Including `byte_len` and `sha256` here leaks internal trace metadata into the prompt and burns tokens on every prior audio turn even though those fields already live in structured history. Keep the synthetic context to modality, duration, transcription, and caption only.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/channels/audio_media.rs` around lines 247 - 254, The to_context_string method currently injects internal trace metadata (byte_len and sha256) into model-facing history; modify to_context_string (used by build_history) to only include modality (mime), duration, transcription, and caption in the produced string, removing byte_len, sha256, and any related prefix_len logic so the returned context string is concise and safe for replay into chat history.
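A trimmed-down `to_context_string()` might look like the sketch below. The field names and the bracketed output format are assumptions for illustration only; the point is that `byte_len` and `sha256` stay on the struct for structured history but never reach the string replayed to the model.

```rust
// Hypothetical field layout; the real AudioHistoryMeta may differ.
struct AudioHistoryMeta {
    mime: String,
    duration_secs: Option<u32>,
    byte_len: u64,  // kept for structured history, not rendered below
    sha256: String, // kept for structured history, not rendered below
    transcription: String,
    caption: Option<String>,
}

impl AudioHistoryMeta {
    /// Renders only what the model needs: modality, duration, transcription,
    /// and caption.
    fn to_context_string(&self) -> String {
        let mut out = format!("[audio {}", self.mime);
        if let Some(secs) = self.duration_secs {
            out.push_str(&format!(", {secs}s"));
        }
        out.push_str("] ");
        out.push_str(&self.transcription);
        if let Some(caption) = &self.caption {
            out.push_str(&format!(" (caption: {caption})"));
        }
        out
    }
}
```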
120-129: ⚠️ Potential issue | 🟠 Major
Reject reserved MPEG version IDs too.
This still accepts headers like `0xFF 0xEA/0xEC/0xEE` because only the layer bits are checked. Those use the reserved MPEG version id (`0b01`), so the MIME gate can still misclassify invalid frames as MP3. Add a `version_bits != 0b01` guard and a regression test.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/channels/audio_media.rs` around lines 120 - 129, The MP3 sniffing still accepts frames with the reserved MPEG version id (0b01) because only layer bits were checked; update the guard in the sniffing branch that inspects sniffed_bytes so it also rejects version bits == 0b01 (i.e., check the MPEG version bits in sniffed_bytes[1] and skip/return non-MP3 when they equal the reserved value) before returning AllowedAudioMime::Mp3, and add a regression test (e.g., feed bytes like 0xFF 0xEA/0xEC/0xEE) to assert these are not classified as AllowedAudioMime::Mp3.
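For reference, the guard can be sketched as a standalone header check. The function name is hypothetical and the real sniffing branch sits inside a larger detector; the bit layout itself (11 sync bits, then 2 version bits, then 2 layer bits in the second byte) is the standard MPEG audio frame header.

```rust
/// Returns true when the first two bytes look like a valid MPEG audio frame
/// header: full 11-bit sync, non-reserved version, non-reserved layer.
fn looks_like_mpeg_audio_frame(bytes: &[u8]) -> bool {
    if bytes.len() < 2 {
        return false;
    }
    // Frame sync: all 8 bits of byte 0 plus the top 3 bits of byte 1.
    if bytes[0] != 0xFF || (bytes[1] & 0xE0) != 0xE0 {
        return false;
    }
    // Bits 4-3 of byte 1: 00 = MPEG 2.5, 01 = reserved, 10 = MPEG 2, 11 = MPEG 1.
    let version_bits = (bytes[1] >> 3) & 0b11;
    // Bits 2-1 of byte 1: 00 = reserved, 01 = Layer III, 10 = Layer II, 11 = Layer I.
    let layer_bits = (bytes[1] >> 1) & 0b11;
    version_bits != 0b01 && layer_bits != 0b00
}
```

The three headers named in the comment (`0xEA`, `0xEC`, `0xEE`) all carry version bits `0b01`, so they make a natural regression test alongside a known-good `0xFF 0xFB` header.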
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@clients/agent-runtime/src/channels/audio_media.rs`:
- Around line 140-142: The current ftyp check on sniffed_bytes (using
sniffed_bytes[4..8] == b"ftyp") is too broad and treats any ISO BMFF as
AllowedAudioMime::M4a; update the detection in the same code path that returns
AllowedAudioMime::M4a to either (1) parse the ftyp box further and verify the
major_brand or any compatible_brand (bytes after the 8-byte header) contains an
audio-specific brand (e.g., "M4A " / "M4B " or other known audio brands) before
returning AllowedAudioMime::M4a, or (2) if brands are absent/unreliable, parse
the MP4 boxes to locate the moov->trak->mdia->hdlr box and ensure the
handler_type equals "soun" (audio) before accepting as M4a; apply this check
where sniffed_bytes and AllowedAudioMime::M4a are referenced so non-audio MP4
containers are rejected.
In `@clients/agent-runtime/src/channels/mod.rs`:
- Around line 1173-1175: The message for AudioRejectionReason::TooLong
incorrectly computes the displayed limit using integer division
(config.audio.max_audio_duration_secs / 60) which yields 0 for sub-minute limits
and underreports others; update the formatting in the match arm for
audio_media::AudioRejectionReason::TooLong to compute minutes and seconds from
config.audio.max_audio_duration_secs (or round up minutes when you prefer
minute-only display), choose pluralization ("minute"/"minutes",
"second"/"seconds") accordingly, and emit either "X minutes Y seconds" for
sub-minute and mixed values or "N minute(s)" when exact; ensure you reference
config.audio.max_audio_duration_secs and the AudioRejectionReason::TooLong
branch when making the change.
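A possible shape for the duration formatting is sketched below. Both helper names are hypothetical, and the limit is assumed to be a `u64` number of seconds; the actual type of `config.audio.max_audio_duration_secs` is not shown in this thread.

```rust
/// Formats a count with a singular or plural unit, e.g. "1 minute", "2 minutes".
fn pluralize(n: u64, unit: &str) -> String {
    if n == 1 {
        format!("{n} {unit}")
    } else {
        format!("{n} {unit}s")
    }
}

/// Renders a duration limit without losing sub-minute precision.
fn format_duration_limit(total_secs: u64) -> String {
    let minutes = total_secs / 60;
    let seconds = total_secs % 60;
    match (minutes, seconds) {
        // Sub-minute limits: integer division alone would print "0 minutes".
        (0, s) => pluralize(s, "second"),
        // Exact minutes: keep the short form.
        (m, 0) => pluralize(m, "minute"),
        // Mixed values: show both parts.
        (m, s) => format!("{} {}", pluralize(m, "minute"), pluralize(s, "second")),
    }
}
```

For example, a 45-second limit renders as "45 seconds", 90 seconds as "1 minute 30 seconds", and 600 seconds as "10 minutes".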
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7e814888-7a47-45ec-a21d-880d1e4bf5be
📒 Files selected for processing (2)
clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/mod.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pr-checks
- GitHub Check: sonar
- GitHub Check: submit-gradle
- GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (5)
clients/agent-runtime/src/channels/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests
Files:
clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/mod.rs
clients/agent-runtime/src/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
clients/agent-runtime/src/**/*.rs: Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements
Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency
Files:
clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/mod.rs
clients/agent-runtime/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why
Files:
clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/mod.rs
**/*.rs
⚙️ CodeRabbit configuration file
**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.
Files:
clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/mod.rs
**/*
⚙️ CodeRabbit configuration file
**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.
Files:
clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/mod.rs
🧠 Learnings (3)
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests
Applied to files:
clients/agent-runtime/src/channels/audio_media.rs
clients/agent-runtime/src/channels/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/*.rs : Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why
Applied to files:
clients/agent-runtime/src/channels/audio_media.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths
Applied to files:
clients/agent-runtime/src/channels/audio_media.rs
| // M4A: bytes 4-7 = "ftyp" (ISO base media file format) | ||
| if sniffed_bytes.len() >= 8 && &sniffed_bytes[4..8] == b"ftyp" { | ||
| return Ok(AllowedAudioMime::M4a); |
ftyp alone is too broad for M4A detection.
Any ISO BMFF file with an ftyp box here — including plain MP4 video containers — is accepted as AllowedAudioMime::M4a. That weakens the fail-closed MIME gate and can route non-audio payloads into transcription. Check an audio-specific brand or inspect the track handler before returning M4a.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@clients/agent-runtime/src/channels/audio_media.rs` around lines 140 - 142,
The current ftyp check on sniffed_bytes (using sniffed_bytes[4..8] == b"ftyp")
is too broad and treats any ISO BMFF as AllowedAudioMime::M4a; update the
detection in the same code path that returns AllowedAudioMime::M4a to either (1)
parse the ftyp box further and verify the major_brand or any compatible_brand
(bytes after the 8-byte header) contains an audio-specific brand (e.g., "M4A " /
"M4B " or other known audio brands) before returning AllowedAudioMime::M4a, or
(2) if brands are absent/unreliable, parse the MP4 boxes to locate the
moov->trak->mdia->hdlr box and ensure the handler_type equals "soun" (audio)
before accepting as M4a; apply this check where sniffed_bytes and
AllowedAudioMime::M4a are referenced so non-audio MP4 containers are rejected.
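Option (1), the brand check, could be sketched as follows. The function names and the brand list are assumptions for the example (only `M4A ` and `M4B ` are included, as named in this comment); the `ftyp` layout used here is the standard one: a 4-byte big-endian box size, the `ftyp` tag, a 4-byte major brand, a 4-byte minor version, then 4-byte compatible brands.

```rust
// Audio-specific brands accepted by this sketch; extend only with care.
const AUDIO_BRANDS: [&[u8; 4]; 2] = [b"M4A ", b"M4B "];

fn is_audio_brand(brand: &[u8]) -> bool {
    AUDIO_BRANDS.iter().any(|known| &known[..] == brand)
}

/// Accepts the buffer as M4A only when the leading `ftyp` box names an
/// audio-specific major or compatible brand.
fn is_m4a_ftyp(bytes: &[u8]) -> bool {
    if bytes.len() < 12 || &bytes[4..8] != b"ftyp" {
        return false;
    }
    let declared = u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize;
    // Size 0 means "box extends to end of data"; never read past the sniffed bytes.
    // A 64-bit largesize box (declared == 1) is simply rejected by the check below.
    let end = if declared == 0 {
        bytes.len()
    } else {
        declared.min(bytes.len())
    };
    if end < 12 {
        return false;
    }
    if is_audio_brand(&bytes[8..12]) {
        return true;
    }
    // Compatible brands start after the 4-byte minor version.
    bytes
        .get(16..end)
        .map_or(false, |rest| rest.chunks_exact(4).any(is_audio_brand))
}
```

One caveat with the brand-only approach: files whose encoder wrote a generic brand such as `isom` without any audio-specific compatible brand would be rejected, which is where the `hdlr` handler-type check from option (2) would be the more thorough alternative.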
- Reject reserved MPEG version bits (0b01) in MP3 magic-byte detection to exclude more invalid frame headers
- Fix TooLong user message for sub-minute durations (was showing 0 min)
- Remove sha256/byte_len from to_context_string() to reduce model tokens
- Add dedicated MultipleAudioParts variant to AudioIngressReason for dashboards/alerts instead of collapsing into SystemError
30ed5eb to
c78f06d
Check cumulative size against max_audio_bytes before extending the byte buffer in fetch_and_stage_audio to prevent OOM from oversized chunks sent by a malicious upstream server.
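The check-before-extend pattern from this commit can be sketched as below. `append_chunk` and `FetchError` are hypothetical names for illustration; the actual `fetch_and_stage_audio` streams from the channel API and maps the failure to the `Oversize` rejection.

```rust
#[derive(Debug, PartialEq, Eq)]
enum FetchError {
    Oversize { limit: usize },
}

/// Appends a chunk only if the cumulative size stays within the limit.
/// The comparison happens before the buffer grows, so an oversized chunk
/// from the upstream server never gets copied into memory.
fn append_chunk(
    buf: &mut Vec<u8>,
    chunk: &[u8],
    max_audio_bytes: usize,
) -> Result<(), FetchError> {
    match buf.len().checked_add(chunk.len()) {
        Some(total) if total <= max_audio_bytes => {
            buf.extend_from_slice(chunk);
            Ok(())
        }
        // Covers both "over the limit" and arithmetic overflow.
        _ => Err(FetchError::Oversize {
            limit: max_audio_bytes,
        }),
    }
}
```

On rejection the buffer is left untouched, so the caller can drop it and report the rejection without any partial oversized state.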
Replace the monolithic pre-push hook that runs all checks (~2-7 min) with a diff-aware version that only checks stacks with changed files:
- Rust (fmt + clippy + unit tests): only if clients/agent-runtime/ changed
- Kotlin (compile check): only if composeApp/agent-core-kmp/gradle changed
- Web (biome lint): only if clients/web/ changed
- Docs (lychee links): only if .md files changed
- Gradle locks: only if build config changed
Expected improvement: 2-7 minutes → 0-25 seconds for typical pushes. CI remains the comprehensive quality gate.
Escape hatches:
- SKIP_GIT_HOOKS=1 git push (bypass entirely)
- FULL_PRE_PUSH=1 git push (run all checks like before)
Add unit tests for build_transcriber, gate_audio_config edge cases, inject_transcription, TooLong message variants, Telegram voice/audio JSON parsing, and AudioConfig zero-value validation to close the 1.6 percent coverage gap on new code.