feat: add audio input support for http gateway and cli phase 2 #437
Implements issue #412 (Phase 2), adding audio transcription ingress to two new channels:

- HTTP gateway: POST /web/chat/audio accepts multipart/form-data with OGG/Opus, MP3, WAV, M4A files up to 25 MiB; a nested router gives this route its own DefaultBodyLimit while the global 64 KB limit stays for all other routes; JSON error responses carry AudioRejectionReason codes
- CLI: the /audio <path> command reads a local file, stages it, transcribes via Transcriber, and injects the result as plain text into the normal agent pipeline (Option A: pre-pipeline, no bypass of process_channel_message)

Shared infrastructure:

- stage_audio_from_bytes() extracted from the Telegram channel into audio_media.rs so all three channels reuse the same MIME sniff, size validation, SHA-256, and temp file staging logic
- VALID_AUDIO_CHANNELS extended with "gateway" and "cli"
- AudioIngressEvent emitted with the correct channel name from all surfaces

Also fixes the tungstenite 0.28.0 breaking change (Message::Text now takes Utf8Bytes) across the dingtalk, lark, discord, and qq channels, triggered by the axum/multipart feature addition required for the gateway endpoint.

Closes #412
Note: Reviews paused
It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough
Adds Phase 2 audio ingress: a shared async staging utility
Sequence Diagram(s)
sequenceDiagram
participant Client
participant GatewayHandler as Gateway Handler
participant Auth as Auth/Rate-Limit
participant Stager as Audio Staging
participant Transcriber
participant SSE as SSE Stream
participant Observer
Client->>GatewayHandler: POST /web/chat/audio (multipart)
GatewayHandler->>Auth: validate bearer token & rate limits
Auth-->>GatewayHandler: ok / reject
GatewayHandler->>GatewayHandler: extract `audio` part
GatewayHandler->>Stager: stage_audio_from_bytes(bytes,...)
Stager->>Stager: magic-byte sniff, size/duration checks
Stager->>Stager: compute SHA-256, write temp file
Stager-->>GatewayHandler: StagedAudio or Rejection
alt staging failed
GatewayHandler->>Observer: emit AudioIngressEvent(rejected)
GatewayHandler-->>Client: HTTP error JSON
else staging ok
GatewayHandler->>Transcriber: transcribe(staged_audio) with timeout
Transcriber-->>GatewayHandler: transcription or error
alt transcription success
GatewayHandler->>Observer: emit AudioIngressEvent(admitted)
GatewayHandler-->>SSE: event: transcription (json)
GatewayHandler-->>SSE: event: done
else transcription failed
GatewayHandler->>Observer: emit AudioIngressEvent(rejected)
GatewayHandler-->>Client: HTTP/SSE error
end
end
GatewayHandler->>Stager: cleanup temp file
sequenceDiagram
participant User
participant CLI as CLI Listen
participant FS as File System
participant Stager as Audio Staging
participant Transcriber
participant Observer
participant Runtime as Channel Runtime
User->>CLI: /audio ~/path/file.ogg
CLI->>CLI: expand_home_tilde()
CLI->>FS: stat(expanded_path)
FS-->>CLI: metadata or not found
alt not found
CLI->>User: error (no telemetry)
else found
FS->>CLI: read_async(bytes)
CLI->>Stager: stage_audio_from_bytes(bytes,...)
Stager-->>CLI: StagedAudio or Rejection
alt staging failed
CLI->>Observer: emit AudioIngressEvent(rejected)
CLI->>User: error message
else staging ok
CLI->>Transcriber: transcribe(staged_audio)
Transcriber-->>CLI: transcription or error
alt success
CLI->>Observer: emit AudioIngressEvent(admitted)
CLI->>Runtime: inject ChannelMessage(text)
Runtime-->>User: agent response
else failure
CLI->>Observer: emit AudioIngressEvent(rejected)
CLI->>User: error message
end
end
end
CLI->>Stager: cleanup temp file
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
✅ Contributor Report
User: @yacosta738
Contributor Report evaluates based on public GitHub activity. Analysis period: 2025-04-04 to 2026-04-04
Actionable comments posted: 22
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@clients/agent-runtime/src/channels/audio_media.rs`:
- Around line 329-341: Replace the blocking std::fs::OpenOptions usage and
separate tokio::fs::write call with fully async file creation and write, and
ensure cleanup on failure: use
tokio::fs::OpenOptions::new().write(true).create_new(true).open(&temp_path).await
to create the temp file, then write the bytes with
tokio::io::AsyncWriteExt::write_all on the opened file (or use tokio::fs::write
directly if you accept overwrite semantics), map errors to
AudioRejectionReason::FetchFailed while logging the actual error via
tracing::warn, and on any write/open error call
tokio::fs::remove_file(&temp_path).await.ok() to remove the empty temp file so
it is not leaked; update the tracing messages to include the error variable in
both open and write error branches.
In `@clients/agent-runtime/src/channels/cli.rs`:
- Around line 525-538: The tests test_cli_audio_tilde_expansion and
test_cli_audio_tilde_alone call std::env::set_var which is not thread-safe;
either mark these tests to run serially with #[serial_test::serial] above each
#[test] or refactor the expansion logic so parse_audio_command does not depend
on global env (e.g., extract a helper like expand_home_tilde_with_home(home:
&str, path: &str) and call that from parse_audio_command), then update the tests
to call the new helper with a synthetic HOME value instead of mutating std::env.
- Around line 163-173: Replace the manual staged.cleanup() calls with an RAII
guard so temporary files are always removed: introduce a StagedAudioGuard that
holds a reference to the StagedAudio and implements Drop calling
StagedAudio::cleanup(), create the guard immediately after
stage_audio_from_bytes (or wherever StagedAudio is obtained) so it covers all
exit paths, remove the explicit staged.cleanup() calls around
transcriber.transcribe(&staged).await and any early returns, and keep using
transcriber.transcribe(&staged) as before so the guard automatically cleans up
on both success and error paths.
- Around line 430-456: The OkTranscriber mock bypasses semaphore acquisition so
tests can't exercise concurrency controls; modify OkTranscriber (impl
Transcriber for OkTranscriber) to accept or access the same semaphore used by
the real pipeline and acquire/release it inside transcribe (and possibly
health_check) so the mock simulates blocking behavior—add a field (e.g.,
semaphore: Arc<Semaphore> or a handle) to OkTranscriber, update its constructor
in tests to pass the pipeline semaphore, and wrap the transcribe body with
semaphore.acquire().await (and drop/release when done) while still returning the
same TranscriptionResult to preserve test semantics.
- Around line 119-127: Replace the ad-hoc println size message with the
standardized rejection message generator to match REQ-11: call
cli_rejection_message(&AudioRejectionReason::Oversize, Some(file_size),
Some(self.audio_config.max_audio_bytes)) (or equivalent) instead of the current
println, then call self.emit_rejected(&AudioRejectionReason::Oversize,
Some(file_size), None) as before; this keeps the early pre-check in cli.rs while
ensuring message text matches the one produced by stage_audio_from_bytes() and
the rest of the system.
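Two of the `cli.rs` items above (the env-free tilde helper and the RAII cleanup guard) can be sketched with the standard library alone. This is only an illustration under assumptions: `StagedAudio` here is a hypothetical stand-in with a synchronous `cleanup()`, and `expand_home_tilde_with_home` uses the helper name suggested in the review rather than anything confirmed to exist in the codebase.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Expands a leading `~` using an explicitly supplied home directory,
/// so callers and tests never need to mutate the process environment.
pub fn expand_home_tilde_with_home(home: &str, path: &str) -> PathBuf {
    if path == "~" {
        return PathBuf::from(home);
    }
    if let Some(rest) = path.strip_prefix("~/") {
        // Trim extra slashes so `join` does not treat `rest` as absolute.
        return Path::new(home).join(rest.trim_start_matches('/'));
    }
    // `~user/...` and ordinary paths are returned unchanged.
    PathBuf::from(path)
}

/// Hypothetical stand-in for the staged temp file handle (assumption).
pub struct StagedAudio {
    pub temp_path: PathBuf,
}

impl StagedAudio {
    /// Best-effort removal; a missing file is not treated as an error.
    pub fn cleanup(&self) {
        let _ = fs::remove_file(&self.temp_path);
    }
}

/// Calls `cleanup()` when dropped, covering early returns and `?` exits.
pub struct StagedAudioGuard<'a> {
    staged: &'a StagedAudio,
}

impl<'a> StagedAudioGuard<'a> {
    pub fn new(staged: &'a StagedAudio) -> Self {
        Self { staged }
    }
}

impl Drop for StagedAudioGuard<'_> {
    fn drop(&mut self) {
        self.staged.cleanup();
    }
}
```

With a helper of this shape, the tilde tests can pass a synthetic home such as `"/home/u"` instead of calling `std::env::set_var`. If the real `cleanup()` is async, a `Drop`-based guard would need a different design, since `drop` cannot await.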
In `@clients/agent-runtime/src/channels/mod.rs`:
- Around line 1381-1385: The fallback `_ => return Ok(Vec::new())` silently
swallows unexpected raw-audio on non-Telegram channels; change it to fail fast
by returning an explicit error instead of an empty Vec. Replace that arm with a
Err(...) return using the crate's error type (e.g., return
Err(AgentError::SystemError("unexpected raw audio on non-Telegram
channel".into())) or a dedicated ContractViolation/UnexpectedRawAudio variant if
one exists), so callers get a precise rejection when raw audio arrives for
channels like "gateway" or "cli".
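The fail-fast change described above might look roughly like the following. The function name, its return type, and the single-variant `AgentError` are hypothetical placeholders for illustration; only the idea of replacing the silent `Ok(Vec::new())` arm with an explicit error comes from the review comment.

```rust
/// Hypothetical error type; the real crate may use a different variant.
#[derive(Debug, PartialEq)]
pub enum AgentError {
    SystemError(String),
}

/// Hypothetical resolver: only Telegram is expected to carry raw audio here.
pub fn staged_parts_for_channel(channel: &str) -> Result<Vec<String>, AgentError> {
    match channel {
        "telegram" => Ok(vec!["staged-audio".to_string()]),
        // Previously `_ => return Ok(Vec::new())`; now a precise rejection.
        other => Err(AgentError::SystemError(format!(
            "unexpected raw audio on non-Telegram channel: {other}"
        ))),
    }
}
```

Returning an error keeps a contract violation visible to callers instead of letting an empty result look like a successful no-op.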
In `@clients/agent-runtime/src/channels/telegram.rs`:
- Around line 1829-1839: The staging in audio_media::stage_audio_from_bytes is
not atomic because it creates the final temp_path and then writes to it; change
the implementation to write to a separate temporary file (e.g., final_path +
".tmp" or a unique tmp name created with create_new), open that temp file for
writing, write_all the bytes, call sync_all on the File (and optionally parent
dir via tokio::fs::File::open on the parent and sync_all) to flush to disk,
close it, then atomically rename (tokio::fs::rename) the temp file to the
intended final path; ensure you still use create_new when creating the temp
target to avoid races and remove the temp file on error.
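The write-then-rename pattern described above can be sketched synchronously with `std::fs`; the review asks for the `tokio::fs` equivalents, which mirror these calls. The function name `stage_atomically` and the `.tmp` suffix are assumptions for illustration, not the actual implementation.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Writes `bytes` to `<final_path>.tmp`, flushes, then renames into place.
pub fn stage_atomically(final_path: &Path, bytes: &[u8]) -> io::Result<()> {
    // Build the sibling temp path by appending ".tmp" to the full file name.
    let mut tmp_name = final_path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);

    // `create_new` fails if the temp file already exists, avoiding races.
    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true)
        .open(&tmp_path)?;

    // Write and flush before the data becomes visible under its final name.
    let written = file.write_all(bytes).and_then(|_| file.sync_all());
    drop(file);

    let result = written.and_then(|_| fs::rename(&tmp_path, final_path));
    if result.is_err() {
        // Do not leak a partial temp file on any failure after creation.
        let _ = fs::remove_file(&tmp_path);
    }
    result
}
```

Readers of the final path then see either no file or the complete contents, assuming the temp file and the target live on the same filesystem so that the rename is atomic.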
In `@clients/agent-runtime/src/gateway/mod.rs`:
- Around line 1992-1994: The match arm currently maps R::FetchFailed together
with client errors (R::MimeRejected | R::Corrupted | R::MultipleAudioParts |
R::FetchFailed) to StatusCode::BAD_REQUEST; change the logic so that
R::FetchFailed is not treated as a 4xx client error but returns a 5xx (e.g.,
StatusCode::INTERNAL_SERVER_ERROR). Update the match in gateway/mod.rs to remove
R::FetchFailed from the BAD_REQUEST group and add a separate branch mapping
R::FetchFailed to StatusCode::INTERNAL_SERVER_ERROR (or another appropriate
5xx), so server-side staging failures are reported as server errors.
- Around line 1223-1224: The AppState is initialized with transcriber: None in
run_gateway(), causing the /web/chat/audio route to always return
TranscriberUnavailable; update run_gateway() to construct and assign a
configured transcriber into AppState.transcriber (instead of None) using the
configured audio settings (e.g., config.audio) and the project's transcriber
constructor/initializer so the transcriber is available at runtime; ensure any
fallible initialization returns an error or logs and exits appropriately so
AppState never contains None when audio is enabled.
- Around line 2033-2265: The handle_chat_audio handler currently only reads the
"audio" part and returns a transcription + done, so it never resumes/creates a
conversation or forwards the transcript into the existing agent flow; modify
handle_chat_audio to also parse optional multipart fields session_id and
language, resolve or create the session (using the same session resolution logic
used elsewhere), and then call webhook_dispatch::execute() (or the equivalent
existing agent flow function) with the transcribed text and language so the
agent can process the message; replace the current final SSE payload
(AudioTranscriptionEvent + done) with streaming responses returned by
webhook_dispatch::execute() so clients receive the agent reply, ensure
staged.cleanup() and observer.on_audio_ingress telemetry remain on all paths,
and propagate any errors from transcription/dispatch as the existing audio
rejection responses (use symbols: handle_chat_audio, stage_audio_from_bytes,
transcriber.transcribe, AudioTranscriptionEvent, webhook_dispatch::execute,
staged.cleanup, observer.on_audio_ingress).
- Around line 6493-6499: The test helper build_audio_router currently only
applies DefaultBodyLimit and so tests calling .oneshot() can't detect
RequestBodyLimitLayer (64 KiB) or TimeoutLayer (30s) behavior; update
build_audio_router in gateway/mod.rs to wrap the Router returned for
handle_chat_audio with the same outer middleware stack used in production — add
RequestBodyLimitLayer configured to 64 * 1024 bytes and a TimeoutLayer set to
30s (in the same ordering as production) so tests exercising handle_chat_audio
via .oneshot() will observe the real limits, or alternatively change tests to
use the real app router instead of this helper.
- Around line 1266-1270: The audio endpoint (/web/chat/audio handled by
handle_chat_audio configured in Router::new() with DefaultBodyLimit::max(25 *
1024 * 1024)) is still being wrapped by the global
RequestBodyLimitLayer::new(MAX_BODY_SIZE) and TimeoutLayer, causing the 64
KiB/app-wide limits to reject large uploads; fix by moving the audio route into
its own router stack that is mounted separately (so it is not wrapped by the
global RequestBodyLimitLayer and TimeoutLayer) or by building the router so
those global layers are applied only to the non-audio subrouter and the audio
subrouter keeps its DefaultBodyLimit and its own timeout handling, ensuring
handle_chat_audio receives requests up to 25 MiB and can use
tokio::time::timeout without being preempted by the global middleware.
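For the first `gateway/mod.rs` item above (the `FetchFailed` status mapping), a minimal sketch of the separated match arms follows. It uses plain `u16` codes in place of the HTTP library's `StatusCode` so that it stays self-contained, and it lists only the four variants named in that comment; the function name `status_for` is hypothetical.

```rust
/// Subset of rejection reasons named in the review comment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AudioRejectionReason {
    MimeRejected,
    Corrupted,
    MultipleAudioParts,
    FetchFailed,
}

/// Returns the numeric HTTP status for a rejection reason.
pub fn status_for(reason: AudioRejectionReason) -> u16 {
    use AudioRejectionReason as R;
    match reason {
        // Problems with what the client sent: 400 Bad Request.
        R::MimeRejected | R::Corrupted | R::MultipleAudioParts => 400,
        // Server-side staging failure: 500 Internal Server Error.
        R::FetchFailed => 500,
    }
}
```

Keeping `FetchFailed` in its own arm means clients and monitoring can distinguish malformed uploads from staging failures on the server.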
In `@openspec/changes/archive/2026-04-04-audio-input-phase2/design.md`:
- Around line 49-89: The gateway implementation in
clients/agent-runtime/src/gateway/mod.rs only returns raw transcription and
snapshots audio_config at startup; update the handler to match the design by
parsing multipart session_id and language in handle_chat_audio(), pass them into
stage_audio_from_bytes() instead of ignoring them, call
transcriber.transcribe(staged_audio) then build a text ChannelMessage from the
TranscriptionResult and invoke webhook_dispatch::execute(message,
include_sse_frames: true) to route through the agent path and produce the
documented agent/error payloads, ensure SSE stream includes transcription
metadata plus agent response, and stop snapshotting audio_config at startup
(read it dynamically or subscribe to config changes) so removing "gateway"/"cli"
flags takes immediate effect.
- Around line 180-241: The design incorrectly claims per-route overrides work
when the merged audio router is wrapped by global layers; fix by ensuring the
audio router's layers are applied after merge or by moving global layers to be
applied before merge so DefaultBodyLimit::max(25 * 1024 * 1024) on the audio
Router and an elevated TimeoutLayer are effective for /web/chat/audio, or
alternatively explicitly add a nested TimeoutLayer and ensure
merge(audio_router) happens prior to applying
RequestBodyLimitLayer::new(MAX_BODY_SIZE) and the global TimeoutLayer so the
audio route's DefaultBodyLimit and its own TimeoutLayer take precedence over the
global RequestBodyLimitLayer and TimeoutLayer (refer to merge(),
DefaultBodyLimit, RequestBodyLimitLayer, TimeoutLayer, and
tokio::time::timeout).
In `@openspec/changes/archive/2026-04-04-audio-input-phase2/proposal.md`:
- Around line 73-81: Two fenced code blocks in the proposal are missing language
identifiers; update both triple-backtick blocks shown around the HTTP/audio flow
and the CLI flow to use a language tag (e.g., ```text) so markdown lint passes
and rendering improves; the blocks reference symbols like
stage_audio_from_bytes(), transcribe_audio(), ChannelMessage,
CliChannel::listen(), process_channel_message(), ContentPart::Audio and
Agent::turn()—ensure you add the same language tag to both the block containing
the POST /web/chat/audio flow and the block containing the "User types: /audio
~/recording.mp3" CLI flow (also apply the same change to the second occurrence
noted around lines 95-103).
- Around line 116-124: The documented signature for stage_audio_from_bytes is
out of sync: replace the audio_config: &AudioConfig parameter with the explicit
parameters used in the implementation—channel_abbrev: &str, max_bytes:
Option<usize>, max_duration_secs: Option<u64>—and ensure channel_origin &str
remains present and the return type and Result error type stay the same so the
proposal snippet exactly matches the implemented function signature
(stage_audio_from_bytes).
In
`@openspec/changes/archive/2026-04-04-audio-input-phase2/specs/audio-input-phase2-spec.md`:
- Around line 31-40: The code block for the function signature
stage_audio_from_bytes is missing a language identifier; update the Markdown
fence opening from ``` to ```rust so the signature block is marked as Rust
(i.e., modify the code fence that wraps the stage_audio_from_bytes(...)
signature to include the "rust" language tag).
- Around line 275-277: The fenced code block showing the command syntax
currently lacks a language identifier; update the block containing "/audio
<file-path>" in audio-input-phase2-spec.md to include a language tag (e.g.,
change the opening fence to ```text) so the snippet reads with a language
identifier and closes with ``` as before.
- Around line 294-296: The spec says to route an audio ChannelMessage through
process_channel_message() but cli.rs currently transcribes first and sends a
text ChannelMessage directly via tx.send() (the "pre-pipeline" path). Either
update the spec to document the pre-pipeline behavior or change the
implementation: modify cli.rs to construct a ChannelMessage with
ContentPart::Audio, send it into the runtime pipeline by calling
process_channel_message() (so the runtime's Transcriber handles transcription),
and remove the direct tx.send() text-send flow; ensure references to
Transcriber, ChannelMessage, process_channel_message(), and tx.send() are
updated/removed accordingly to keep code and spec consistent.
In `@openspec/changes/archive/2026-04-04-audio-input-phase2/tasks.md`:
- Line 7: Update the checklist entry for stage_audio_from_bytes to match the
current API by replacing the old parameter list that included audio_config with
the actual explicit limits/metadata parameters now used by the function;
specifically edit the signature text to read something like async fn
stage_audio_from_bytes(bytes, channel_abbrev, declared_mime,
declared_duration_secs, <explicit limits/metadata params such as max_size_bytes,
max_duration_secs, ...>) -> Result<StagedAudio, AudioRejectionReason>, so the
archived task log accurately reflects the shipped function signature (refer to
stage_audio_from_bytes for exact parameter names).
In `@openspec/specs/audio-input/spec.md`:
- Around line 1184-1186: The fenced code block showing the command "/audio
<file-path>" lacks a language identifier; update that markdown block (the fenced
block containing /audio <file-path>) to include a language tag such as text or
shell (e.g., change ``` to ```text or ```shell) so the snippet is correctly
highlighted and the docs pass linting.
- Around line 944-953: The fenced code block containing the function signature
stage_audio_from_bytes(...) in the spec is missing a language identifier; update
the opening fence from ``` to ```rust (or ```text if preferred) so the block is
consistently marked as Rust code across the document and enables proper syntax
highlighting for the signature and types like StagedAudio and
AudioRejectionReason.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3fc5d429-1ca3-4ab4-92ff-7a3337427a65
⛔ Files ignored due to path filters (2)
- clients/agent-runtime/Cargo.lock is excluded by !**/*.lock
- pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (20)
- clients/agent-runtime/Cargo.toml
- clients/agent-runtime/src/channels/audio_media.rs
- clients/agent-runtime/src/channels/cli.rs
- clients/agent-runtime/src/channels/dingtalk.rs
- clients/agent-runtime/src/channels/discord.rs
- clients/agent-runtime/src/channels/lark.rs
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/channels/qq.rs
- clients/agent-runtime/src/channels/telegram.rs
- clients/agent-runtime/src/config/schema.rs
- clients/agent-runtime/src/gateway/admin.rs
- clients/agent-runtime/src/gateway/mod.rs
- clients/agent-runtime/src/transcription/whisper_cli.rs
- clients/agent-runtime/tests/admin_config_api_integration.rs
- openspec/changes/archive/2026-04-04-audio-input-phase2/design.md
- openspec/changes/archive/2026-04-04-audio-input-phase2/proposal.md
- openspec/changes/archive/2026-04-04-audio-input-phase2/specs/audio-input-phase2-spec.md
- openspec/changes/archive/2026-04-04-audio-input-phase2/state.yaml
- openspec/changes/archive/2026-04-04-audio-input-phase2/tasks.md
- openspec/specs/audio-input/spec.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: pr-checks
- GitHub Check: sonar
- GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (9)
**/*
⚙️ CodeRabbit configuration file
**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.
Files:
- openspec/changes/archive/2026-04-04-audio-input-phase2/state.yaml
- clients/agent-runtime/Cargo.toml
- clients/agent-runtime/tests/admin_config_api_integration.rs
- clients/agent-runtime/src/channels/discord.rs
- clients/agent-runtime/src/channels/lark.rs
- clients/agent-runtime/src/channels/qq.rs
- clients/agent-runtime/src/gateway/admin.rs
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/transcription/whisper_cli.rs
- clients/agent-runtime/src/channels/dingtalk.rs
- clients/agent-runtime/src/config/schema.rs
- clients/agent-runtime/src/channels/audio_media.rs
- openspec/changes/archive/2026-04-04-audio-input-phase2/proposal.md
- openspec/changes/archive/2026-04-04-audio-input-phase2/tasks.md
- openspec/changes/archive/2026-04-04-audio-input-phase2/design.md
- clients/agent-runtime/src/channels/telegram.rs
- openspec/specs/audio-input/spec.md
- clients/agent-runtime/src/gateway/mod.rs
- openspec/changes/archive/2026-04-04-audio-input-phase2/specs/audio-input-phase2-spec.md
- clients/agent-runtime/src/channels/cli.rs
clients/agent-runtime/**/Cargo.toml
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
clients/agent-runtime/**/Cargo.toml: Preserve release-size profile assumptions in `Cargo.toml` and avoid adding heavy dependencies unless clearly justified
Do not add heavy dependencies for minor convenience; justify new crate additions
Files:
clients/agent-runtime/Cargo.toml
clients/agent-runtime/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why
Files:
- clients/agent-runtime/tests/admin_config_api_integration.rs
- clients/agent-runtime/src/channels/discord.rs
- clients/agent-runtime/src/channels/lark.rs
- clients/agent-runtime/src/channels/qq.rs
- clients/agent-runtime/src/gateway/admin.rs
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/transcription/whisper_cli.rs
- clients/agent-runtime/src/channels/dingtalk.rs
- clients/agent-runtime/src/config/schema.rs
- clients/agent-runtime/src/channels/audio_media.rs
- clients/agent-runtime/src/channels/telegram.rs
- clients/agent-runtime/src/gateway/mod.rs
- clients/agent-runtime/src/channels/cli.rs
**/*.rs
⚙️ CodeRabbit configuration file
**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.
Files:
- clients/agent-runtime/tests/admin_config_api_integration.rs
- clients/agent-runtime/src/channels/discord.rs
- clients/agent-runtime/src/channels/lark.rs
- clients/agent-runtime/src/channels/qq.rs
- clients/agent-runtime/src/gateway/admin.rs
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/transcription/whisper_cli.rs
- clients/agent-runtime/src/channels/dingtalk.rs
- clients/agent-runtime/src/config/schema.rs
- clients/agent-runtime/src/channels/audio_media.rs
- clients/agent-runtime/src/channels/telegram.rs
- clients/agent-runtime/src/gateway/mod.rs
- clients/agent-runtime/src/channels/cli.rs
clients/agent-runtime/src/channels/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests
Files:
- clients/agent-runtime/src/channels/discord.rs
- clients/agent-runtime/src/channels/lark.rs
- clients/agent-runtime/src/channels/qq.rs
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/channels/dingtalk.rs
- clients/agent-runtime/src/channels/audio_media.rs
- clients/agent-runtime/src/channels/telegram.rs
- clients/agent-runtime/src/channels/cli.rs
clients/agent-runtime/src/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
clients/agent-runtime/src/**/*.rs: Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements
Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency
Files:
- clients/agent-runtime/src/channels/discord.rs
- clients/agent-runtime/src/channels/lark.rs
- clients/agent-runtime/src/channels/qq.rs
- clients/agent-runtime/src/gateway/admin.rs
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/transcription/whisper_cli.rs
- clients/agent-runtime/src/channels/dingtalk.rs
- clients/agent-runtime/src/config/schema.rs
- clients/agent-runtime/src/channels/audio_media.rs
- clients/agent-runtime/src/channels/telegram.rs
- clients/agent-runtime/src/gateway/mod.rs
- clients/agent-runtime/src/channels/cli.rs
clients/agent-runtime/src/{security,gateway,tools}/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
Treat `src/security/`, `src/gateway/`, `src/tools/` as high-risk surfaces and never broaden filesystem/network execution scope without explicit policy checks
Files:
- clients/agent-runtime/src/gateway/admin.rs
- clients/agent-runtime/src/gateway/mod.rs
clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable
Files:
- clients/agent-runtime/src/gateway/admin.rs
- clients/agent-runtime/src/config/schema.rs
- clients/agent-runtime/src/gateway/mod.rs
**/*.{md,mdx}
⚙️ CodeRabbit configuration file
**/*.{md,mdx}: Verify technical accuracy and that docs stay aligned with code changes.
For user-facing docs, check EN/ES parity or explicitly note pending translation gaps.
Files:
- openspec/changes/archive/2026-04-04-audio-input-phase2/proposal.md
- openspec/changes/archive/2026-04-04-audio-input-phase2/tasks.md
- openspec/changes/archive/2026-04-04-audio-input-phase2/design.md
- openspec/specs/audio-input/spec.md
- openspec/changes/archive/2026-04-04-audio-input-phase2/specs/audio-input-phase2-spec.md
🧠 Learnings (7)
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/Cargo.toml : Do not add heavy dependencies for minor convenience; justify new crate additions
Applied to files:
clients/agent-runtime/Cargo.toml
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/Cargo.toml : Preserve release-size profile assumptions in `Cargo.toml` and avoid adding heavy dependencies unless clearly justified
Applied to files:
clients/agent-runtime/Cargo.toml
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths
Applied to files:
- clients/agent-runtime/src/channels/discord.rs
- clients/agent-runtime/src/channels/lark.rs
- clients/agent-runtime/src/gateway/admin.rs
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/transcription/whisper_cli.rs
- clients/agent-runtime/src/config/schema.rs
- clients/agent-runtime/src/channels/telegram.rs
- clients/agent-runtime/src/channels/cli.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests
Applied to files:
- clients/agent-runtime/src/channels/discord.rs
- clients/agent-runtime/src/channels/lark.rs
- clients/agent-runtime/src/channels/qq.rs
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/transcription/whisper_cli.rs
- clients/agent-runtime/src/channels/dingtalk.rs
- clients/agent-runtime/src/config/schema.rs
- clients/agent-runtime/src/channels/audio_media.rs
- openspec/changes/archive/2026-04-04-audio-input-phase2/tasks.md
- clients/agent-runtime/src/channels/telegram.rs
- openspec/specs/audio-input/spec.md
- clients/agent-runtime/src/gateway/mod.rs
- clients/agent-runtime/src/channels/cli.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Keep startup path lean and avoid heavy initialization in command parsing flow
Applied to files:
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/channels/cli.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs : Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable
Applied to files:
clients/agent-runtime/src/config/schema.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/*.rs : Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why
Applied to files:
clients/agent-runtime/src/channels/telegram.rs
🪛 LanguageTool
openspec/changes/archive/2026-04-04-audio-input-phase2/tasks.md
[style] ~15-~15: ‘without warning’ might be wordy. Consider a shorter alternative.
Context: ...: ["telegram","gateway","cli"] passes without warning; ["telegram","gateway","discord"] war...
(EN_WORDINESS_PREMIUM_WITHOUT_WARNING)
openspec/changes/archive/2026-04-04-audio-input-phase2/design.md
[style] ~537-~537: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...nges. - No provider contract changes. - No existing behavior modified for text or ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~550-~550: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...t support chunked/resumable uploads for very large files? Recommendation: No for Phase 2...
(EN_WEAK_ADJECTIVE)
openspec/changes/archive/2026-04-04-audio-input-phase2/specs/audio-input-phase2-spec.md
[grammar] ~498-~498: Use a hyphen to join words.
Context: ... Some(MimeRejected)` #### Scenario: CLI admitted event - GIVEN a valid OGG file...
(QB_NEW_EN_HYPHEN)
🪛 markdownlint-cli2 (0.22.0)
openspec/changes/archive/2026-04-04-audio-input-phase2/proposal.md
[warning] 73-73: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 95-95: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
openspec/specs/audio-input/spec.md
[warning] 944-944: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 1184-1184: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
openspec/changes/archive/2026-04-04-audio-input-phase2/specs/audio-input-phase2-spec.md
[warning] 28-28: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 138-138: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 138-138: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 435-435: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 460-460: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 470-470: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 470-470: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (24)
clients/agent-runtime/src/channels/lark.rs (1)
283-283: WebSocket binary payload conversions are correct and consistent.
These explicit `.into()` conversions keep all binary send paths aligned and avoid type ambiguity at the call sites.
Also applies to: 304-304, 364-364
clients/agent-runtime/src/channels/qq.rs (1)
248-250: Text frame construction updates look good.
The explicit `.into()` conversion is consistently applied across identify and heartbeat sends and keeps the WebSocket write path type-safe.
Also applies to: 274-274, 302-302
clients/agent-runtime/src/channels/discord.rs (1)
292-294: Gateway send-path typing cleanup is solid.
Identify and heartbeat sends now use explicit conversion uniformly, with no behavioral regression in the event loop logic.
Also applies to: 323-323, 351-351
clients/agent-runtime/src/channels/dingtalk.rs (1)
126-126: Send-path and test updates are aligned.
Pong/ack frame construction and ping test payload typing are consistently updated, which keeps runtime and test expectations in sync.
Also applies to: 229-229, 506-507
openspec/changes/archive/2026-04-04-audio-input-phase2/state.yaml (1)
1-5: Archive state metadata looks correct.
The phase, completion list, and update timestamp are internally consistent for an archived change record.
clients/agent-runtime/tests/admin_config_api_integration.rs (1)
91-92: `AppState` test fixture update is correct.
Adding `transcriber` and `audio_config` keeps `state_with_config` synchronized with the current `AppState` shape and avoids integration test drift.
clients/agent-runtime/Cargo.toml (2)
131-131: `axum` feature expansion is appropriately scoped.
Enabling `multipart` here is a focused dependency change that matches the new audio ingress route requirements.
213-213: Dev-only `tower::util` addition looks reasonable.
Adding `util` under `[dev-dependencies]` keeps production dependency surface unchanged while supporting test/router utilities.
clients/agent-runtime/src/transcription/whisper_cli.rs (1)
17-26: Documentation is accurate and aligns with the runtime sharing model.
Good clarification of the shared semaphore behavior across Telegram/Gateway/CLI via a single `Arc<dyn Transcriber>` instance.
clients/agent-runtime/src/gateway/admin.rs (1)
2403-2404: Test state stays aligned with the expanded `AppState`.
Good update. Adding the new audio fields here keeps the admin integration harness exercising the same state shape as production.
clients/agent-runtime/src/config/schema.rs (2)
301-303: Good allowlist expansion with fail-closed behavior preserved.
Recognizing `gateway` and `cli` while keeping unknown channels warning-only (runtime fail-closed) is aligned with rollout safety.
Also applies to: 3354-3357
7053-7101: Nice targeted tests for the new accepted channels.
The added cases for `gateway`, `cli`, and combined known channels provide clear regression coverage for this config change.
clients/agent-runtime/src/channels/audio_media.rs (1)
855-958: Strong focused tests for the shared staging utility.
These cover key rejection paths plus temp-file write integrity and SHA-256 correctness.
openspec/changes/archive/2026-04-04-audio-input-phase2/tasks.md (1)
44-44: The semaphore implementation is correct; claim about location/mechanism needs correction.
The shared semaphore is an instance field (`Arc<Semaphore>`) in `WhisperCliTranscriber` within `src/transcription/whisper_cli.rs`, not a module-level `OnceLock` in `audio_media.rs`. The single `Transcriber` instance is wrapped in `Arc<dyn Transcriber>` and passed to all channels at startup, which is the correct mechanism for enforcing the unified `max_concurrent_transcriptions` budget. The code comments already document this cross-channel sharing explicitly. No changes are needed.
openspec/specs/audio-input/spec.md (4)
935-1028: REQ-19 spec looks solid and aligns with implementation.
The shared `stage_audio_from_bytes()` requirement is well-defined with clear validation order, rejection semantics, and scenarios. The implementation in `audio_media.rs` (per AI summary) should follow this contract.
1030-1176: REQ-20 HTTP Gateway endpoint spec is comprehensive.
Good coverage of multipart fields, error responses mapped to HTTP status codes, and gating behavior. The 25 MiB route-specific body limit override is correctly specified.
1177-1306: REQ-21 CLI `/audio` command spec aligns with implementation.
The spec correctly defines `~` expansion, error messages, and channel gating. One observation: the spec requires routing through `process_channel_message()` with `ContentPart::Audio` (lines 1203-1204), but the implementation in `cli.rs` sends a text `ChannelMessage` directly after transcription (pre-pipeline approach). This is intentional per the AI summary ("Option A, pre-pipeline"), but the spec text could be clearer that this is the chosen approach.
1431-1474: REQ-24 implementation is correct: semaphore acquisition IS visible and properly shared.
The semaphore in `WhisperCliTranscriber::transcribe()` is explicitly acquired on line 117 (`self.semaphore.acquire().await`), where the permit is held for the entire transcription duration and released on drop. The single `Arc<Transcriber>` created via `build_transcriber()` is stored in `ChannelRuntimeContext` (line 95) and passed globally to all channel handlers, ensuring all channels (Telegram, gateway, CLI) compete for the same semaphore permits and respect the configured `max_concurrent_transcriptions` limit as a global ceiling.
clients/agent-runtime/src/channels/cli.rs (5)
349-358: `expand_home_tilde` doesn't handle `~user` syntax.
The function only handles `~` and `~/path`, not `~otheruser/path`. This is acceptable if undocumented, but the spec (REQ-21) only mentions `~` expansion for home directory, so this is compliant.
66-217: `handle_audio_command` implementation is well-structured with proper gating.
The gate order (enabled → allowed_channels → transcriber → file exists → size → stage → transcribe) follows the spec. Telemetry emission for both admitted and rejected outcomes is correct. The `started_at` timing captures the full pipeline duration as required.
263-293: Good handling of `/audio` vs `/audiobook` disambiguation.
The `parse_audio_command` correctly returns `None` for inputs like `/audiobook` so they fall through to the normal text path. This prevents breaking existing behavior for commands that happen to start with `/audio`.
622-647: Test correctly expects no `AudioIngressEvent` for file-not-found.
Per the implementation comment at line 114, file access errors before entering the audio pipeline don't emit telemetry. This matches the test expectation and is reasonable, since the pipeline was never entered.
43-56: > Likely an incorrect or invalid review comment.
openspec/changes/archive/2026-04-04-audio-input-phase2/specs/audio-input-phase2-spec.md (1)
1-629: Overall delta spec is well-organized and comprehensive.
Good structure with ADDED/MODIFIED/REMOVED sections, clear scenario definitions, and thorough cross-references. The spec correctly extends Phase 1 without breaking existing requirements.
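To make the two `cli.rs` parsing comments above concrete (`/audio` vs `/audiobook`, and `~` vs `~user`), here is a minimal std-only sketch. It is not the code in the PR: the function bodies, the `&str`/`&Path` signatures, and the choice to return `None` for a bare `/audio` are assumptions for illustration; only the names `parse_audio_command` and `expand_home_tilde_with_home` come from the review and commit message.

```rust
use std::path::{Path, PathBuf};

/// Returns the path argument when `input` is the `/audio` command followed by
/// whitespace and a non-empty argument. Look-alikes such as `/audiobook`
/// return `None` so they can fall through to the normal text path.
/// (Assumed behavior, sketched from the review comment.)
fn parse_audio_command(input: &str) -> Option<&str> {
    let rest = input.trim().strip_prefix("/audio")?;
    // The command name must end here: require whitespace right after "/audio".
    if !rest.starts_with(char::is_whitespace) {
        return None;
    }
    let arg = rest.trim();
    if arg.is_empty() {
        None
    } else {
        Some(arg)
    }
}

/// Expands a leading `~` or `~/` against an explicitly supplied home directory,
/// which avoids reading or mutating the process environment in tests.
/// `~otheruser/...` is deliberately left untouched.
fn expand_home_tilde_with_home(path: &str, home: &Path) -> PathBuf {
    if path == "~" {
        return home.to_path_buf();
    }
    match path.strip_prefix("~/") {
        Some(rest) => home.join(rest),
        None => PathBuf::from(path),
    }
}
```

Passing the home directory as a parameter is one way to keep tests independent of `HOME`; the real helper may take a different shape.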
// ── Transcribe — staged.cleanup() on ALL exit paths ──
let transcription_result = match transcriber.transcribe(&staged).await {
    Ok(r) => r,
    Err(reason) => {
        staged.cleanup();
        println!("{}", cli_rejection_message(&reason, &self.audio_config));
        self.emit_rejected(&reason, Some(staged_byte_len), staged_duration);
        return;
    }
};
staged.cleanup(); // success path
🧹 Nitpick | 🔵 Trivial
Manual cleanup pattern is error-prone — consider RAII wrapper.
While cleanup is called on both success and error paths, early returns or panics after staging but before cleanup could leak temp files. The spec (REQ-5) mandates RAII cleanup via StagedAudioGuard.
♻️ Suggested RAII pattern
// Create a guard that calls cleanup on drop
struct StagedAudioGuard<'a>(&'a StagedAudio);
impl Drop for StagedAudioGuard<'_> {
    fn drop(&mut self) {
        self.0.cleanup();
    }
}
// Usage:
let staged = stage_audio_from_bytes(...).await?;
let _guard = StagedAudioGuard(&staged); // cleanup on any exit
let transcription_result = transcriber.transcribe(&staged).await?;
// guard drops here, cleanup happens automatically
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@clients/agent-runtime/src/channels/cli.rs` around lines 163 - 173, Replace
the manual staged.cleanup() calls with an RAII guard so temporary files are
always removed: introduce a StagedAudioGuard that holds a reference to the
StagedAudio and implements Drop calling StagedAudio::cleanup(), create the guard
immediately after stage_audio_from_bytes (or wherever StagedAudio is obtained)
so it covers all exit paths, remove the explicit staged.cleanup() calls around
transcriber.transcribe(&staged).await and any early returns, and keep using
transcriber.transcribe(&staged) as before so the guard automatically cleans up
on both success and error paths.
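A self-contained version of the guard idea, runnable without the runtime crate, might look like the sketch below. The `StagedAudio` struct here is a hypothetical stand-in that only carries a temp path, and `transcribe` and `process` are illustrative helpers, not functions from `cli.rs`.

```rust
use std::fs;
use std::path::PathBuf;

/// Hypothetical stand-in for the real `StagedAudio`; only the temp path matters here.
struct StagedAudio {
    path: PathBuf,
}

impl StagedAudio {
    /// Best-effort removal of the staged temp file.
    fn cleanup(&self) {
        let _ = fs::remove_file(&self.path);
    }
}

/// Calls `cleanup()` when dropped, so early returns, `?`, and panic unwinding are all covered.
struct StagedAudioGuard<'a>(&'a StagedAudio);

impl Drop for StagedAudioGuard<'_> {
    fn drop(&mut self) {
        self.0.cleanup();
    }
}

/// Illustrative transcription step: fails on an empty file, otherwise reports the size.
fn transcribe(staged: &StagedAudio) -> Result<String, String> {
    let bytes = fs::read(&staged.path).map_err(|e| e.to_string())?;
    if bytes.is_empty() {
        return Err("empty audio".to_string());
    }
    Ok(format!("{} bytes transcribed", bytes.len()))
}

/// No explicit `cleanup()` calls: the guard removes the file on both paths.
fn process(staged: &StagedAudio) -> Result<String, String> {
    let _guard = StagedAudioGuard(staged);
    let text = transcribe(staged)?; // an early return here still runs Drop
    Ok(text)
}
```

Binding the guard to `_guard` rather than `_` matters: `let _ = ...` would drop it immediately and delete the file before transcription.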
// ── Mock transcriber for pipeline tests ───────────────────

struct OkTranscriber;

#[async_trait]
impl Transcriber for OkTranscriber {
    fn name(&self) -> &str {
        "mock-ok"
    }

    async fn transcribe(
        &self,
        _audio: &StagedAudio,
    ) -> Result<TranscriptionResult, AudioRejectionReason> {
        Ok(TranscriptionResult {
            text: "hello world".to_string(),
            language: Some("en".to_string()),
            duration_secs: Some(3.0),
            confidence: None,
            processing_ms: None,
        })
    }

    async fn health_check(&self) -> Result<(), String> {
        Ok(())
    }
}
🧹 Nitpick | 🔵 Trivial
Mock transcriber lacks semaphore — tests don't verify concurrency behavior.
The OkTranscriber mock returns immediately without semaphore acquisition. This is fine for unit tests, but integration tests should verify REQ-24 (cross-channel concurrency) with a real or semaphore-aware mock.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@clients/agent-runtime/src/channels/cli.rs` around lines 430 - 456, The
OkTranscriber mock bypasses semaphore acquisition so tests can't exercise
concurrency controls; modify OkTranscriber (impl Transcriber for OkTranscriber)
to accept or access the same semaphore used by the real pipeline and
acquire/release it inside transcribe (and possibly health_check) so the mock
simulates blocking behavior—add a field (e.g., semaphore: Arc<Semaphore> or a
handle) to OkTranscriber, update its constructor in tests to pass the pipeline
semaphore, and wrap the transcribe body with semaphore.acquire().await (and
drop/release when done) while still returning the same TranscriptionResult to
preserve test semantics.
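As a rough illustration of what a semaphore-aware mock could verify, the sketch below uses plain threads and a small hand-rolled counting semaphore instead of `tokio::sync::Semaphore` and the async `Transcriber` trait, so it runs with the standard library alone. `Semaphore`, `SemaphoreAwareMock`, and `peak_concurrency` are hypothetical names, and the 20 ms sleep merely simulates work.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal blocking counting semaphore (std-only stand-in for `tokio::sync::Semaphore`).
struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), available: Condvar::new() }
    }

    /// Blocks until a permit is free, then takes it.
    fn acquire(&self) {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.available.wait(permits).unwrap();
        }
        *permits -= 1;
    }

    /// Returns a permit and wakes one waiter.
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.available.notify_one();
    }
}

/// Hypothetical mock: holds a permit while "transcribing" and records peak concurrency.
struct SemaphoreAwareMock {
    semaphore: Arc<Semaphore>,
    in_flight: AtomicUsize,
    peak: AtomicUsize,
}

impl SemaphoreAwareMock {
    fn new(semaphore: Arc<Semaphore>) -> Self {
        Self { semaphore, in_flight: AtomicUsize::new(0), peak: AtomicUsize::new(0) }
    }

    fn transcribe(&self) -> String {
        self.semaphore.acquire();
        let now = self.in_flight.fetch_add(1, Ordering::SeqCst) + 1;
        self.peak.fetch_max(now, Ordering::SeqCst);
        thread::sleep(Duration::from_millis(20)); // simulated work while the permit is held
        self.in_flight.fetch_sub(1, Ordering::SeqCst);
        self.semaphore.release();
        "hello world".to_string()
    }
}

/// Runs `callers` concurrent transcriptions against one shared budget and
/// returns the highest number that were ever in flight at once.
fn peak_concurrency(callers: usize, max_concurrent: usize) -> usize {
    let mock = Arc::new(SemaphoreAwareMock::new(Arc::new(Semaphore::new(max_concurrent))));
    let handles: Vec<_> = (0..callers)
        .map(|_| {
            let mock = Arc::clone(&mock);
            thread::spawn(move || mock.transcribe())
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    mock.peak.load(Ordering::SeqCst)
}
```

An integration test for REQ-24 would assert the same invariant, that the observed peak never exceeds `max_concurrent_transcriptions`, with callers coming from different channels that share one transcriber instance.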
```
/audio <file-path>
```
🧹 Nitpick | 🔵 Trivial
Add language identifier to command syntax block.
-```
+```text
/audio <file-path>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@openspec/changes/archive/2026-04-04-audio-input-phase2/specs/audio-input-phase2-spec.md`
around lines 275 - 277, The fenced code block showing the command syntax
currently lacks a language identifier; update the block containing "/audio
<file-path>" in audio-input-phase2-spec.md to include a language tag (e.g.,
change the opening fence to ```text) so the snippet reads with a language
identifier and closes with ``` as before.
6. Build a `ChannelMessage` with `ContentPart::Audio` and route through
   `process_channel_message()` to ensure the full audio pipeline runs
7. Transcribe via the runtime's `Transcriber` (REQ-6)
Spec says route through process_channel_message() but implementation uses pre-pipeline approach.
Lines 294-295 state: "Build a ChannelMessage with ContentPart::Audio and route through process_channel_message()". However, the cli.rs implementation builds a text ChannelMessage after transcription and sends it directly via tx.send(). This is the "Option A, pre-pipeline" approach described in the code comments.
Either the spec should be updated to reflect the chosen approach, or the implementation should be changed. The current mismatch could cause confusion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@openspec/changes/archive/2026-04-04-audio-input-phase2/specs/audio-input-phase2-spec.md`
around lines 294 - 296, The spec says to route an audio ChannelMessage through
process_channel_message() but cli.rs currently transcribes first and sends a
text ChannelMessage directly via tx.send() (the "pre-pipeline" path). Either
update the spec to document the pre-pipeline behavior or change the
implementation: modify cli.rs to construct a ChannelMessage with
ContentPart::Audio, send it into the runtime pipeline by calling
process_channel_message() (so the runtime's Transcriber handles transcription),
and remove the direct tx.send() text-send flow; ensure references to
Transcriber, ChannelMessage, process_channel_message(), and tx.send() are
updated/removed accordingly to keep code and spec consistent.

## Phase 1: Foundation

- [x] 1.1 Extract `stage_audio_from_bytes()` into `src/channels/audio_media.rs` — async fn accepting `bytes, channel_abbrev, declared_mime, declared_duration_secs, audio_config` → `Result<StagedAudio, AudioRejectionReason>`. Includes MIME sniff, size check, duration pre-check, SHA-256, atomic temp file write. (REQ-19)
Fix task 1.1 signature text to match current API.
The checklist still documents stage_audio_from_bytes(..., audio_config); current code uses explicit limits/metadata parameters. Please update this line so the archived task log matches shipped behavior.
As per coding guidelines: **/*.{md,mdx}: "Verify technical accuracy and that docs stay aligned with code changes."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@openspec/changes/archive/2026-04-04-audio-input-phase2/tasks.md` at line 7,
Update the checklist entry for stage_audio_from_bytes to match the current API
by replacing the old parameter list that included audio_config with the actual
explicit limits/metadata parameters now used by the function; specifically edit
the signature text to read something like async fn stage_audio_from_bytes(bytes,
channel_abbrev, declared_mime, declared_duration_secs, <explicit limits/metadata
params such as max_size_bytes, max_duration_secs, ...>) -> Result<StagedAudio,
AudioRejectionReason>, so the archived task log accurately reflects the shipped
function signature (refer to stage_audio_from_bytes for exact parameter names).
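For readers unfamiliar with the validation steps task 1.1 lists, the sketch below shows the first three (MIME sniff, size check, duration pre-check) using explicit limit parameters. It is an illustration under stated assumptions, not the code in `audio_media.rs`: the `Rejection` enum is a local stand-in, `TooLong` is a hypothetical variant name, the check order simply follows the task text, and the SHA-256 and temp file steps are omitted because they need more than the standard library.

```rust
/// Local stand-in for `AudioRejectionReason`. `MimeRejected` and `Oversize`
/// appear in the PR text; `TooLong` is a hypothetical name.
#[derive(Debug, PartialEq)]
enum Rejection {
    MimeRejected,
    Oversize { actual: u64, max: u64 },
    TooLong,
}

/// Sniffs the container from leading magic bytes for the four accepted formats.
/// The `ftyp` check is coarse: it matches any MP4-family file, not only M4A audio.
fn sniff_audio_mime(bytes: &[u8]) -> Option<&'static str> {
    if bytes.starts_with(b"OggS") {
        Some("audio/ogg")
    } else if bytes.starts_with(b"RIFF") && bytes.get(8..12) == Some(&b"WAVE"[..]) {
        Some("audio/wav")
    } else if bytes.get(4..8) == Some(&b"ftyp"[..]) {
        Some("audio/mp4")
    } else if bytes.starts_with(b"ID3")
        || (bytes.len() >= 2 && bytes[0] == 0xFF && (bytes[1] & 0xE0) == 0xE0)
    {
        // ID3v2 tag, or an MPEG audio frame sync (eleven set bits).
        Some("audio/mpeg")
    } else {
        None
    }
}

/// Applies the checks in the order the task lists them and returns the sniffed MIME type.
fn validate(
    bytes: &[u8],
    declared_duration_secs: Option<u64>,
    max_bytes: u64,
    max_duration_secs: u64,
) -> Result<&'static str, Rejection> {
    let mime = sniff_audio_mime(bytes).ok_or(Rejection::MimeRejected)?;
    let actual = bytes.len() as u64;
    if actual > max_bytes {
        return Err(Rejection::Oversize { actual, max: max_bytes });
    }
    if declared_duration_secs.map_or(false, |secs| secs > max_duration_secs) {
        return Err(Rejection::TooLong);
    }
    Ok(mime)
}
```

Taking `max_bytes` and `max_duration_secs` directly, rather than a whole config struct, keeps the function easy to call from any channel and easy to test.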
```
stage_audio_from_bytes(
    bytes: &[u8],
    declared_mime: Option<&str>,
    declared_duration_secs: Option<u64>,
    max_bytes: u64,
    max_duration_secs: u64,
    channel_origin: &str,
) -> Result<StagedAudio, AudioRejectionReason>
```
🧹 Nitpick | 🔵 Trivial
Add language identifier to fenced code block.
The function signature block lacks a language specifier. Use rust or text for consistency with other code blocks in this spec.
-```
+```rust
stage_audio_from_bytes(
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 944-944: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@openspec/specs/audio-input/spec.md` around lines 944 - 953, The fenced code
block containing the function signature stage_audio_from_bytes(...) in the spec
is missing a language identifier; update the opening fence from ``` to ```rust
(or ```text if preferred) so the block is consistently marked as Rust code
across the document and enables proper syntax highlighting for the signature and
types like StagedAudio and AudioRejectionReason.
```
/audio <file-path>
```
🧹 Nitpick | 🔵 Trivial
Add language identifier to fenced code block.
Same issue — add text or shell for the command syntax block.
-```
+```text
/audio <file-path>
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 1184-1184: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@openspec/specs/audio-input/spec.md` around lines 1184 - 1186, The fenced code
block showing the command "/audio <file-path>" lacks a language identifier;
update that markdown block (the fenced block containing /audio <file-path>) to
include a language tag such as text or shell (e.g., change ``` to ```text or
```shell) so the snippet is correctly highlighted and the docs pass linting.
Implements issue #412 (Phase 2) adding audio transcription ingress to two new channels:
- HTTP gateway: POST /web/chat/audio accepts multipart/form-data with OGG/Opus, MP3, WAV, M4A files up to 25 MiB; nested router gives this route its own DefaultBodyLimit while the global 64 KB limit stays for all other routes; JSON error responses with AudioRejectionReason codes
- CLI: /audio <path> command reads a local file, stages it, transcribes via Transcriber, and injects the result as plain text into the normal agent pipeline (Option A: pre-pipeline, no bypass of process_channel_message)
Shared infrastructure:
- stage_audio_from_bytes() extracted from the Telegram channel into audio_media.rs so all three channels reuse the same MIME sniff, size validation, SHA-256, and temp file staging logic
- VALID_AUDIO_CHANNELS extended with "gateway" and "cli"
- AudioIngressEvent emitted with correct channel name from all surfaces
Also fixes:
- tungstenite 0.28.0 breaking change (Message::Text/Ping now take Utf8Bytes/Bytes) across dingtalk, lark, discord, and qq channels
- clippy errors: u128→u64 truncation, redundant match arm, Observer trait mock
- async file write in stage_audio_from_bytes (was blocking std::fs)
- cli_rejection_message signature to accept actual/max values for Oversize
- expand_home_tilde_with_home helper to avoid global env mutation in tests
- transcriber init in gateway (was None, now calls build_transcriber)
- FetchFailed mapped to 500 (was 400) as server-side error
Closes #412
008e321 to 3cf904e
Deploying corvus with Cloudflare Pages
| Latest commit: | c45ad7f |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://e9ac6b18.corvus-42x.pages.dev |
| Branch Preview URL: | https://feat-412-add-audio-input-sup.corvus-42x.pages.dev |
Actionable comments posted: 2
♻️ Duplicate comments (4)
clients/agent-runtime/src/channels/audio_media.rs (1)
325-341: ⚠️ Potential issue | 🟠 Major
`stage_audio_from_bytes` can still overwrite an existing temp path and is not atomic as documented.
Line 334 writes directly to the final path via `tokio::fs::write`. If a filename collision occurs, the existing file can be truncated/overwritten, and this does not match the “atomic staging” contract in the function docs.
🔧 Proposed fix
+use tokio::io::AsyncWriteExt;
@@
-    let temp_path_async = temp_path.clone();
-    let bytes_to_write = bytes.to_vec(); // Copy bytes for async write (owned)
-
-    // Write to a temporary file first, then atomically rename
-    if let Err(e) = tokio::fs::write(&temp_path_async, &bytes_to_write).await {
-        tracing::warn!(
-            "Failed to stage audio to {}: {e}",
-            temp_path_async.display()
-        );
-        // Clean up temp file on write failure
-        let _ = tokio::fs::remove_file(&temp_path_async).await;
-        return Err(AudioRejectionReason::FetchFailed);
-    }
+    let mut file = tokio::fs::OpenOptions::new()
+        .write(true)
+        .create_new(true)
+        .open(&temp_path)
+        .await
+        .map_err(|e| {
+            tracing::warn!("Failed to create staged audio {}: {e}", temp_path.display());
+            AudioRejectionReason::FetchFailed
+        })?;
+
+    if let Err(e) = file.write_all(bytes).await {
+        let _ = tokio::fs::remove_file(&temp_path).await;
+        tracing::warn!("Failed to write staged audio {}: {e}", temp_path.display());
+        return Err(AudioRejectionReason::FetchFailed);
+    }
#!/bin/bash
# Verify current staging path write semantics in the file.
rg -n -C3 'stage_audio_from_bytes|tokio::fs::write|OpenOptions::new|create_new' clients/agent-runtime/src/channels/audio_media.rs
As per coding guidelines, “Security first, performance second… Look for behavioral regressions and contract breaks across modules.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/channels/audio_media.rs` around lines 325 - 341, The staging logic in stage_audio_from_bytes currently writes directly to temp_path_async which can overwrite an existing file; change it to create a uniquely-created temporary file using tokio::fs::OpenOptions::new().write(true).create_new(true) (or equivalent) so the open fails if the name exists, write the owned bytes_to_write into that new file, flush/close, then atomically rename/replace into the final path with tokio::fs::rename; on any error ensure you remove the temp file and return AudioRejectionReason::FetchFailed. Reference stage_audio_from_bytes, temp_path/temp_path_async, bytes_to_write, and ensure cleanup and proper error mapping around the create_new/open/write/rename sequence.
clients/agent-runtime/src/channels/mod.rs (1)
1383-1387: ⚠️ Potential issue | 🟡 Minor
Return an explicit error instead of silently swallowing unexpected raw audio.
Line 1387 still returns `Ok(Vec::new())`, which masks routing contract breaks and gets translated later into a generic failure path. Fail fast here with an explicit rejection reason.
🔧 Proposed fix
-        _ => return Ok(Vec::new()),
+        _ => {
+            tracing::error!(
+                "Unexpected raw audio routed to stage_channel_audio for channel={}",
+                msg.channel
+            );
+            return Err(audio_media::AudioRejectionReason::SystemError);
+        }
As per coding guidelines, “Security first, performance second… Look for behavioral regressions and contract breaks across modules.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/channels/mod.rs` around lines 1383 - 1387, The match fallback currently swallows unexpected raw audio by returning Ok(Vec::new()). Replace the "_ => return Ok(Vec::new())" arm with a fail-fast error return that conveys an explicit rejection reason (for example, return Err(ChannelRoutingError::UnexpectedRawAudio) or use anyhow::anyhow!("unexpected raw audio for channel") or the crate's existing rejection enum) so callers know routing contract was violated; update the error type/variant to use the module's existing error conversion (Into or From) so the function signature stays consistent and callers receive a clear error instead of an empty Vec.
clients/agent-runtime/src/gateway/mod.rs (2)
2104-2145: ⚠️ Potential issue | 🔴 Critical
`/web/chat/audio` still stops at transcription instead of entering the conversation flow.
This handler only treats the `audio` part specially, silently drains the optional `session_id` and `language` parts, and then returns `transcription` + `done`. It never resolves/creates a session or forwards the transcript through `webhook_dispatch::execute()`, so the gateway path still cannot continue a conversation or return the agent response required by Issue #412.
Also applies to: 2242-2263
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/gateway/mod.rs` around lines 2104 - 2145, The multipart handling loop currently only extracts the "audio" part and silently drains other fields, then returns transcription+done without creating/resolving a session or forwarding the transcript; update the handler (the multipart loop that updates audio_part_count, declared_mime, audio_bytes_opt) to also read and save optional "session_id" and "language" fields, then after obtaining the transcript invoke session resolution/creation (the same session logic used by other chat paths) and call webhook_dispatch::execute() with the resolved session and the transcript so the gateway continues the conversation flow and returns the agent response instead of stopping at transcription; ensure you do not return early after transcription and handle errors consistently with existing audio_rejection_to_response and observer notifications (e.g., use AudioIngressEvent and AudioIngressOutcome as currently done).
1266-1275: ⚠️ Potential issue | 🔴 Critical
The audio route is still inside the app-wide 64 KiB / 30s middleware stack.
Because `/web/chat/audio` is merged before the parent `.layer(RequestBodyLimitLayer::new(MAX_BODY_SIZE))` and `.layer(TimeoutLayer::with_status_code(...))`, those outer layers still wrap the audio endpoint. Uploads over 64 KiB and transcriptions over 30s will fail before the route’s `DefaultBodyLimit::max(25 * 1024 * 1024)` or inner `tokio::time::timeout(...)` can take effect. Split the audio router out of the global stack, or apply those global layers only to the non-audio subrouter.
In axum/tower, if a merged sub-router has `DefaultBodyLimit::max(25 * 1024 * 1024)` but the parent router later adds `RequestBodyLimitLayer` and `TimeoutLayer` with `Router::layer(...)`, do those outer layers still wrap the merged route and run before the inner route-specific limit/handler?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/gateway/mod.rs` around lines 1266 - 1275, The audio route "/web/chat/audio" is still wrapped by the app-wide RequestBodyLimitLayer::new(MAX_BODY_SIZE) and TimeoutLayer::with_status_code(...) because you merge the subrouter before applying those global .layer(...) calls; move the audio subrouter out of that global stack or apply the global layers only to the non-audio router so the audio route's DefaultBodyLimit::max(25 * 1024 * 1024) and its own timeout (e.g., tokio::time::timeout(...) used in handle_chat_audio) run first; specifically, create a separate Router with Router::new().route("/web/chat/audio", post(handle_chat_audio)).layer(DefaultBodyLimit::max(...)) and combine it with the main app router after the main router has been wrapped by RequestBodyLimitLayer and TimeoutLayer, or conversely apply the RequestBodyLimitLayer and TimeoutLayer to a subrouter that excludes the "/web/chat/audio" route.
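Returning to the `stage_audio_from_bytes` overwrite concern raised in the first duplicate comment above: the key property of `create_new(true)` is that the open fails when the path already exists, so a colliding name can never truncate another file. The sketch below demonstrates that with the synchronous `std::fs` API; the PR itself uses `tokio::fs`, and `stage_exclusive` is a hypothetical helper name.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Writes `bytes` to `path` only if no file exists there yet.
/// On a write failure the partially written file is removed before the error is returned.
fn stage_exclusive(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // `create_new(true)` makes the open fail with `AlreadyExists` instead of truncating.
    let mut file = OpenOptions::new().write(true).create_new(true).open(path)?;
    if let Err(e) = file.write_all(bytes) {
        drop(file);
        let _ = fs::remove_file(path);
        return Err(e);
    }
    Ok(())
}
```

A caller that hits `AlreadyExists` can pick a fresh name and retry, or map the error to a rejection reason, depending on how collisions should be reported.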
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@clients/agent-runtime/src/channels/cli.rs`:
- Around line 223-235: The transcription path currently ignores the result of
tx.send(msg).await causing dropped messages when the receiver is closed; change
the send handling in the transcription branch (where ChannelMessage is
constructed and tx.send(msg).await is called) to mirror the normal text path
behavior in listen() by checking the Result, stopping/returning when the
receiver is gone (e.g., propagate an Err or break the listen loop) instead of
silently dropping; while here also ensure this matches the intended Channel
trait semantics (send/listen/health_check) used across src/channels/ and add
tests covering closed receiver behavior, auth/allowlist and health checks to
enforce consistent behavior.
- Around line 31-56: The interactive CLI currently constructs CliChannel::new()
(text-only) so audio commands never reach handle_audio_command; update the
interactive entrypoint that builds the channel to call
CliChannel::with_audio(...) instead, passing the actual transcriber (or None if
you want gated behavior), the configured AudioConfig, and the observer used by
the interactive runtime so audio is enabled when available; locate where
CliChannel::new() is instantiated in the interactive entrypoint (the live
interactive entry function) and replace it with a with_audio call wiring the
runtime's transcriber/audio_config/observer.
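The first inline comment asks that a failed `tx.send(...)` stop the listen loop instead of being ignored. The sketch below shows that control flow with `std::sync::mpsc`, whose `send` also returns an error once the receiver has been dropped; the actual channel in `cli.rs` is async, and `forward_lines` is a hypothetical helper used only to illustrate the pattern.

```rust
use std::sync::mpsc::{channel, Sender};

/// Forwards each line to the receiver and returns how many were delivered.
/// Stops at the first send error, which signals that the receiver is gone.
fn forward_lines(tx: &Sender<String>, lines: &[&str]) -> usize {
    let mut delivered = 0;
    for line in lines {
        // Check the result instead of discarding it: a closed receiver ends the loop.
        if tx.send((*line).to_string()).is_err() {
            break;
        }
        delivered += 1;
    }
    delivered
}
```

Applying the same check on both the text path and the transcription path keeps shutdown behavior consistent regardless of which command produced the message.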
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b44f30ef-53d9-4f4c-a9a7-0447c0efd746
⛔ Files ignored due to path filters (1)
- `pnpm-lock.yaml` is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (6)
- clients/agent-runtime/src/channels/audio_media.rs
- clients/agent-runtime/src/channels/cli.rs
- clients/agent-runtime/src/channels/dingtalk.rs
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/gateway/mod.rs
- clients/agent-runtime/tests/admin_config_api_integration.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pr-checks
- GitHub Check: sonar
- GitHub Check: submit-gradle
- GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (7)
clients/agent-runtime/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why
Files:
- clients/agent-runtime/tests/admin_config_api_integration.rs
- clients/agent-runtime/src/channels/dingtalk.rs
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/channels/audio_media.rs
- clients/agent-runtime/src/gateway/mod.rs
- clients/agent-runtime/src/channels/cli.rs
**/*.rs
⚙️ CodeRabbit configuration file
**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.
Files:
- clients/agent-runtime/tests/admin_config_api_integration.rs
- clients/agent-runtime/src/channels/dingtalk.rs
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/channels/audio_media.rs
- clients/agent-runtime/src/gateway/mod.rs
- clients/agent-runtime/src/channels/cli.rs
**/*
⚙️ CodeRabbit configuration file
**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.
Files:
- clients/agent-runtime/tests/admin_config_api_integration.rs
- clients/agent-runtime/src/channels/dingtalk.rs
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/channels/audio_media.rs
- clients/agent-runtime/src/gateway/mod.rs
- clients/agent-runtime/src/channels/cli.rs
clients/agent-runtime/src/channels/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests
Files:
- clients/agent-runtime/src/channels/dingtalk.rs
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/channels/audio_media.rs
- clients/agent-runtime/src/channels/cli.rs
clients/agent-runtime/src/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
clients/agent-runtime/src/**/*.rs: Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements
Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency
Files:
- clients/agent-runtime/src/channels/dingtalk.rs
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/channels/audio_media.rs
- clients/agent-runtime/src/gateway/mod.rs
- clients/agent-runtime/src/channels/cli.rs
clients/agent-runtime/src/{security,gateway,tools}/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
Treat `src/security/`, `src/gateway/`, `src/tools/` as high-risk surfaces and never broaden filesystem/network execution scope without explicit policy checks
Files:
clients/agent-runtime/src/gateway/mod.rs
clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable
Files:
clients/agent-runtime/src/gateway/mod.rs
🧠 Learnings (9)
📓 Common learnings
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests
Applied to files:
- clients/agent-runtime/src/channels/dingtalk.rs
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/channels/audio_media.rs
- clients/agent-runtime/src/gateway/mod.rs
- clients/agent-runtime/src/channels/cli.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths
Applied to files:
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/channels/audio_media.rs
- clients/agent-runtime/src/gateway/mod.rs
- clients/agent-runtime/src/channels/cli.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs : Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable
Applied to files:
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/channels/audio_media.rs
- clients/agent-runtime/src/gateway/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Keep startup path lean and avoid heavy initialization in command parsing flow
Applied to files:
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/gateway/mod.rs
- clients/agent-runtime/src/channels/cli.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/**/*.rs : Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency
Applied to files:
clients/agent-runtime/src/channels/audio_media.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools}/**/*.rs : Treat `src/security/`, `src/gateway/`, `src/tools/` as high-risk surfaces and never broaden filesystem/network execution scope without explicit policy checks
Applied to files:
- clients/agent-runtime/src/channels/audio_media.rs
- clients/agent-runtime/src/gateway/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/*.rs : Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why
Applied to files:
- clients/agent-runtime/src/channels/audio_media.rs
- clients/agent-runtime/src/gateway/mod.rs
- clients/agent-runtime/src/channels/cli.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/tools/**/*.rs : Implement `Tool` trait in `src/tools/` with strict parameter schema, validate and sanitize all inputs, and return structured `ToolResult` without panics in runtime path
Applied to files:
clients/agent-runtime/src/channels/cli.rs
🔇 Additional comments (5)
clients/agent-runtime/src/channels/dingtalk.rs (2)
126-126: WebSocket text payload conversion looks correct.
Line 126 and Line 229 now use `Message::Text(...into())`, which is the right typed conversion and preserves existing ack/pong behavior.
Also applies to: 229-229
506-507: Ping-frame test update is valid.
Lines 506-507 correctly construct `Message::Ping` with `bytes::Bytes`, matching current tungstenite payload typing.
clients/agent-runtime/src/channels/audio_media.rs (1)
857-960: Good coverage for stage validation and staging behavior.
These async tests exercise rejection branches plus success-path invariants (MIME detection, disk write parity, origin propagation, SHA-256 shape/value). Nice coverage for this new shared ingress primitive.
clients/agent-runtime/tests/admin_config_api_integration.rs (1)
91-92: `AppState` fixture update is correct for the new audio runtime fields.
Adding `transcriber` and `audio_config` here keeps integration test state construction aligned with the runtime struct shape.
clients/agent-runtime/src/channels/mod.rs (1)
1084-1086: `build_transcriber` visibility change looks appropriate.
Making this `pub(crate)` cleanly enables gateway reuse without widening public API surface.
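The comment on `audio_media.rs` refers to MIME detection and size validation in the shared staging primitive. A minimal std-only sketch of that kind of check, assuming magic-byte sniffing for the four formats the PR lists; `Rejection`, `sniff_audio_mime`, and `validate_audio` are hypothetical names, not the PR's actual `AudioRejectionReason` or `stage_audio_from_bytes` API:

```rust
/// Hypothetical rejection reasons; the PR's real enum may differ.
#[derive(Debug, PartialEq)]
enum Rejection {
    Empty,
    Oversize,
    UnsupportedFormat,
}

/// Guess a MIME type from leading magic bytes (illustrative, not exhaustive).
fn sniff_audio_mime(bytes: &[u8]) -> Option<&'static str> {
    // OGG pages start with the capture pattern "OggS".
    if bytes.starts_with(b"OggS") {
        return Some("audio/ogg");
    }
    // WAV is a RIFF container whose form type at offset 8 is "WAVE".
    if bytes.len() >= 12 && &bytes[..4] == b"RIFF" && &bytes[8..12] == b"WAVE" {
        return Some("audio/wav");
    }
    // ISO base media files (including M4A) carry an "ftyp" box at offset 4.
    // This matches any MP4-family file, so a real check would inspect the brand.
    if bytes.len() >= 8 && &bytes[4..8] == b"ftyp" {
        return Some("audio/mp4");
    }
    // MP3 starts with an ID3v2 tag or an MPEG frame sync (11 set bits).
    let frame_sync = bytes.len() >= 2 && bytes[0] == 0xFF && (bytes[1] & 0xE0) == 0xE0;
    if bytes.starts_with(b"ID3") || frame_sync {
        return Some("audio/mpeg");
    }
    None
}

/// Validate size first, then sniff the format from content rather than
/// trusting a declared MIME type.
fn validate_audio(bytes: &[u8], max_bytes: usize) -> Result<&'static str, Rejection> {
    if bytes.is_empty() {
        return Err(Rejection::Empty);
    }
    if bytes.len() > max_bytes {
        return Err(Rejection::Oversize);
    }
    sniff_audio_mime(bytes).ok_or(Rejection::UnsupportedFormat)
}
```

Hashing and temp-file staging are left out because they need crates beyond the standard library.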
+    /// Create a bare CLI channel with no audio support.
+    ///
+    /// Backward-compatible with `run_interactive()` which has no transcriber.
+    /// Attempting `/audio` will print a user-friendly "not available" message.
     pub fn new() -> Self {
-        Self
+        Self {
+            transcriber: None,
+            audio_config: AudioConfig::default(),
+            observer: Arc::new(NoopObserver),
+        }
     }

+    /// Create a CLI channel with full audio support.
+    ///
+    /// `transcriber: None` → audio pipeline gated at the transcriber check.
+    pub fn with_audio(
+        transcriber: Option<Arc<dyn Transcriber>>,
+        audio_config: AudioConfig,
+        observer: Arc<dyn Observer>,
+    ) -> Self {
+        Self {
+            transcriber,
+            audio_config,
+            observer,
+        }
+    }
Interactive mode still instantiates the text-only CLI channel.
CliChannel::with_audio() is added here, but the live interactive entrypoint in clients/agent-runtime/src/agent/agent.rs:1585-1605 still constructs CliChannel::new(). In the shipped CLI path that leaves audio_config default-disabled and transcriber unset, so /audio <path> never reaches handle_audio_command().
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@clients/agent-runtime/src/channels/cli.rs` around lines 31 - 56, The
interactive CLI currently constructs CliChannel::new() (text-only) so audio
commands never reach handle_audio_command; update the interactive entrypoint
that builds the channel to call CliChannel::with_audio(...) instead, passing the
actual transcriber (or None if you want gated behavior), the configured
AudioConfig, and the observer used by the interactive runtime so audio is
enabled when available; locate where CliChannel::new() is instantiated in the
interactive entrypoint (the live interactive entry function) and replace it with
a with_audio call wiring the runtime's transcriber/audio_config/observer.
… fallback
- cli.rs: Check tx.send() result instead of ignoring it
- mod.rs: Return ChannelNotAllowed error for unknown channels instead of silently returning empty vec (security hardening)
Actionable comments posted: 2
♻️ Duplicate comments (1)
clients/agent-runtime/src/channels/cli.rs (1)
66-70: ⚠️ Potential issue | 🟠 Major
Actually stop `listen()` when the `/audio` send fails.
This still keeps the outer loop alive after receiver shutdown: `handle_audio_command()` only logs the failed `tx.send`, and the `/audio` branch keeps going. That diverges from the normal text path, which breaks immediately on a closed receiver. Bubble a status/result back to `listen()` so this branch can break too, and add a closed-receiver regression test for `/audio`.
Based on learnings "Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests".
Also applies to: 223-238, 287-290
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/agent-runtime/src/channels/cli.rs` around lines 66 - 70, The /audio branch in handle_audio_command currently logs a failed tx.send but does not inform listen() to stop, leaving the outer loop alive; change handle_audio_command to return a Result or status (e.g., Result<(), ChannelClosedError>) and propagate that result to listen() so listen() can break on a closed receiver just like the text path, update listen() to check the returned status and exit the loop on Err/closed, and add a regression test that closes the receiver and asserts that the /audio command path stops (covering the same behavior referenced around handle_audio_command, listen, and Channel send/listen semantics).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@clients/agent-runtime/src/channels/cli.rs`:
- Around line 156-163: The CLI prints incorrect duration limits when
AudioRejectionReason::TooLong because the Err branch always passes bytes.len()
and self.audio_config.max_audio_bytes into cli_rejection_message; update the
Err(reason) handling so that when reason matches AudioRejectionReason::TooLong
you pass duration-specific values (e.g., the actual max duration or None if
unknown) into cli_rejection_message instead of byte counts, and for non-TooLong
reasons continue to pass byte values; locate the match arm that calls
cli_rejection_message and branch on AudioRejectionReason::TooLong to supply
appropriate duration arguments (using the audio_config duration field or None)
while keeping other paths unchanged.
- Around line 105-141: The code currently calls tokio::fs::metadata(path) then
tokio::fs::read(path), which is vulnerable to TOCTOU and special-file attacks;
change the flow to open the file once with tokio::fs::File::open(path).await,
call file.metadata() on that File and ensure metadata.file_type().is_file()
(reject otherwise), then wrap the File in a capped reader (e.g.
tokio::io::AsyncRead::take) limited to self.audio_config.max_audio_bytes + 1 and
read through that reader so you can detect and reject oversize reads before
allocating; when the capped read returns more-than-max, use
AudioRejectionReason::Oversize, cli_rejection_message and self.emit_rejected the
same as current logic, and only call stage_audio_from_bytes() with the validated
buffer.
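The second comment above (open once, check the file type on the handle, read through a capped reader) can be sketched as follows. This is a blocking std-only variant, assuming `std::fs` in place of the `tokio::fs` calls the review names; `ReadRejection` and `read_capped` are hypothetical names:

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical error type for this sketch only.
#[derive(Debug)]
enum ReadRejection {
    NotRegularFile,
    Oversize,
    Io(io::Error),
}

/// Read a file of at most `max_bytes`, rejecting special files and oversize input.
fn read_capped(path: &Path, max_bytes: u64) -> Result<Vec<u8>, ReadRejection> {
    // Open once; every later check uses this handle, never the path again,
    // so the file cannot be swapped between the check and the read.
    let file = File::open(path).map_err(ReadRejection::Io)?;
    let meta = file.metadata().map_err(ReadRejection::Io)?;
    if !meta.file_type().is_file() {
        return Err(ReadRejection::NotRegularFile);
    }
    // Cap the reader at max_bytes + 1: one extra byte is enough to prove the
    // input is oversize without buffering the whole file.
    let mut buf = Vec::new();
    file.take(max_bytes.saturating_add(1))
        .read_to_end(&mut buf)
        .map_err(ReadRejection::Io)?;
    if buf.len() as u64 > max_bytes {
        return Err(ReadRejection::Oversize);
    }
    Ok(buf)
}
```

Only the buffer returned on `Ok` would then be handed to the staging step.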
---
Duplicate comments:
In `@clients/agent-runtime/src/channels/cli.rs`:
- Around line 66-70: The /audio branch in handle_audio_command currently logs a
failed tx.send but does not inform listen() to stop, leaving the outer loop
alive; change handle_audio_command to return a Result or status (e.g.,
Result<(), ChannelClosedError>) and propagate that result to listen() so
listen() can break on a closed receiver just like the text path, update listen()
to check the returned status and exit the loop on Err/closed, and add a
regression test that closes the receiver and asserts that the /audio command
path stops (covering the same behavior referenced around handle_audio_command,
listen, and Channel send/listen semantics).
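A minimal sketch of the propagation this comment asks for, assuming a synchronous `std::sync::mpsc` channel in place of the runtime's async sender. The `ReceiverClosed` name follows the later commit message; the function bodies and the fake transcript string are hypothetical:

```rust
use std::sync::mpsc::Sender;

/// Explicit signal that the receiving side has gone away.
#[derive(Debug, PartialEq)]
struct ReceiverClosed;

/// Forward a transcript; report a closed receiver instead of only logging it.
fn handle_audio_command(transcript: &str, tx: &Sender<String>) -> Result<(), ReceiverClosed> {
    tx.send(transcript.to_owned()).map_err(|_| ReceiverClosed)
}

/// Process input lines until they run out or the receiver closes.
/// Returns how many messages were forwarded.
fn listen<'a>(lines: impl IntoIterator<Item = &'a str>, tx: &Sender<String>) -> usize {
    let mut forwarded = 0;
    for line in lines {
        let result = match line.strip_prefix("/audio ") {
            // Stand-in for staging and transcribing the file at `path`.
            Some(path) => handle_audio_command(&format!("[transcript of {path}]"), tx),
            // Plain text path: same failure signal as the audio path.
            None => tx.send(line.to_owned()).map_err(|_| ReceiverClosed),
        };
        if result.is_err() {
            // Both branches stop the loop on a closed receiver.
            break;
        }
        forwarded += 1;
    }
    forwarded
}
```

Routing both branches through one `Result` keeps the text and `/audio` paths from diverging again.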
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3e186182-e807-469a-8ef5-aaa0613079d3
📒 Files selected for processing (2)
- clients/agent-runtime/src/channels/cli.rs
- clients/agent-runtime/src/channels/mod.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: sonar
- GitHub Check: pr-checks
- GitHub Check: submit-gradle
- GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (5)
clients/agent-runtime/src/channels/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests
Files:
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/channels/cli.rs
clients/agent-runtime/src/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
clients/agent-runtime/src/**/*.rs: Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements
Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency
Files:
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/channels/cli.rs
clients/agent-runtime/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why
Files:
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/channels/cli.rs
**/*.rs
⚙️ CodeRabbit configuration file
**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.
Files:
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/channels/cli.rs
**/*
⚙️ CodeRabbit configuration file
**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.
Files:
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/channels/cli.rs
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths
Applied to files:
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/channels/cli.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests
Applied to files:
- clients/agent-runtime/src/channels/mod.rs
- clients/agent-runtime/src/channels/cli.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs : Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable
Applied to files:
clients/agent-runtime/src/channels/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/*.rs : Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why
Applied to files:
clients/agent-runtime/src/channels/cli.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/tools/**/*.rs : Implement `Tool` trait in `src/tools/` with strict parameter schema, validate and sanitize all inputs, and return structured `ToolResult` without panics in runtime path
Applied to files:
clients/agent-runtime/src/channels/cli.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Keep startup path lean and avoid heavy initialization in command parsing flow
Applied to files:
clients/agent-runtime/src/channels/cli.rs
🔇 Additional comments (1)
clients/agent-runtime/src/channels/cli.rs (1)
31-56: Verify the live interactive path switched to `CliChannel::with_audio()`.
`CliChannel::new()` still hard-disables audio with `AudioConfig::default()` and `transcriber: None`. If the real interactive entrypoint still constructs `new()`, `/audio <path>` remains unreachable despite this implementation.
Expected result: the live CLI/interactive entrypoint uses `CliChannel::with_audio(...)`; if it still uses `CliChannel::new()`, the new command is effectively disabled.
    #!/bin/bash
    # Search for the live CLI constructor sites.
    # Expect interactive entrypoints to use CliChannel::with_audio(...).
    rg -n -C3 'run_interactive|CliChannel::(new|with_audio)\(' clients/agent-runtime/src
Based on learnings "Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths".
…tion
- TOCTOU: Open file once, check metadata.file_type().is_file(), use capped reader to detect oversize before allocating full buffer
- Duration limits: Show duration (not bytes) for TooLong rejections
- Receiver close: Propagate error from handle_audio_command to listen() so the loop stops when receiver is closed (matching text path behavior)
- Add ReceiverClosed error type for explicit signaling
…tion
- TOCTOU: Open file once, check metadata.file_type().is_file(), use capped reader to detect oversize before allocating full buffer
- Duration limits: Show duration (not bytes) for TooLong rejections
- Receiver close: Propagate error from handle_audio_command to listen() so the loop stops when receiver is closed (matching text path behavior)
- Add ReceiverClosed error type for explicit signaling
- Fix clippy: usize::try_from() for truncation, unused Result in tests
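The "Duration limits" item in these commits (report a duration, not a byte count, for `TooLong` rejections) could look roughly like the sketch below. The enum, function name, signature, and message wording are all hypothetical; the PR's actual `cli_rejection_message` may take different arguments:

```rust
use std::time::Duration;

/// Hypothetical subset of rejection reasons for this sketch.
enum Rejection {
    Oversize,
    TooLong,
}

/// Build a user-facing message whose units match the rejection reason.
fn rejection_message(
    reason: &Rejection,
    actual_bytes: usize,
    max_bytes: usize,
    max_duration: Option<Duration>,
) -> String {
    match reason {
        // Size rejections are reported in bytes.
        Rejection::Oversize => {
            format!("audio too large: {actual_bytes} bytes (limit {max_bytes} bytes)")
        }
        // Duration rejections never mention byte counts; if the limit is
        // unknown, the message omits it rather than printing a wrong unit.
        Rejection::TooLong => match max_duration {
            Some(limit) => format!("audio too long (limit {} s)", limit.as_secs()),
            None => "audio too long".to_owned(),
        },
    }
}
```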
Note: Unit test generation is a public access feature. Expect some limitations and changes as we gather feedback and continue to improve it.
Generating unit tests... This may take up to 20 minutes.
✅ Created PR with unit tests: #438
This pull request introduces several improvements and refactorings to the audio ingestion pipeline, WebSocket message handling, and configuration validation in the agent runtime. The most significant change is the introduction of a shared utility function for staging and validating audio bytes, which is now used by the Telegram channel and tested extensively. Additional changes include minor improvements to WebSocket message handling in multiple channels and configuration updates to support more audio ingress channels.
Audio ingestion pipeline improvements:
- `stage_audio_from_bytes` in `audio_media.rs` validates, hashes, and atomically stages raw audio bytes to a temp file, consolidating logic previously duplicated in channel implementations. This function is now used by the Telegram channel for audio staging.
- Tests for `stage_audio_from_bytes` cover all validation branches and file staging logic.
WebSocket message handling consistency:
- `.send()` calls in `dingtalk.rs`, `discord.rs`, `qq.rs`, and `lark.rs` explicitly convert payloads to `Bytes` or `String` as needed, improving type safety and compatibility with recent dependency versions.
Configuration and channel support:
- The valid audio channels in `schema.rs` are extended from just "telegram" to include "gateway" and "cli", with validation and warning messages updated accordingly.
- `channels/mod.rs` notes that "gateway" and "cli" channels handle audio pre-pipeline and do not route raw audio through the staging function.
Dependency updates:
- The `multipart` feature for `axum` and the `util` feature for `tower` are enabled in `Cargo.toml` to support new or upcoming functionality.
Closes: #412