Context
Follow-up from #246 (audio input support Phase 1). Phase 1 delivered core transcription infrastructure + Telegram channel. Phase 2 extends audio input to the remaining two entry points defined in the PRD.
Scope
HTTP Gateway
- New POST /web/chat/audio endpoint accepting multipart/form-data
- Fields: audio (file), session_id (optional), language (optional)
- Body limit increased to 25 MiB for this endpoint only
- Validate file, stage, transcribe, dispatch text through existing path
- Return transcription + agent response
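The gateway flow above (validate, transcribe, dispatch, return both texts) could be sketched roughly as follows. This is a minimal illustration, not the project's handler: `AudioChatRequest`, `AudioChatResponse`, `AudioChatError`, and `handle_audio_chat` are hypothetical names, multipart parsing is assumed to have happened already, and the staging step is left as a comment because the shape of StagedAudio is not described here. The transcriber and the existing text dispatch path are passed in as closures so the sketch stays independent of any web framework.

```rust
/// Body limit for this endpoint only (25 MiB, taken from the scope above).
const MAX_AUDIO_BODY_BYTES: usize = 25 * 1024 * 1024;

/// Hypothetical parsed form of the multipart/form-data request.
struct AudioChatRequest {
    audio: Vec<u8>,
    session_id: Option<String>,
    language: Option<String>,
}

/// Hypothetical response carrying the transcription and the agent reply.
#[derive(Debug, PartialEq)]
struct AudioChatResponse {
    transcription: String,
    response: String,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum AudioChatError {
    EmptyFile,
    TooLarge { size: usize },
    Transcription(String),
}

/// Reject empty uploads and uploads above the endpoint's body limit.
fn validate(req: &AudioChatRequest) -> Result<(), AudioChatError> {
    if req.audio.is_empty() {
        return Err(AudioChatError::EmptyFile);
    }
    if req.audio.len() > MAX_AUDIO_BODY_BYTES {
        return Err(AudioChatError::TooLarge { size: req.audio.len() });
    }
    Ok(())
}

/// Validate, transcribe, then hand the text to the existing dispatch path.
fn handle_audio_chat<T, D>(
    req: AudioChatRequest,
    transcribe: T,
    dispatch: D,
) -> Result<AudioChatResponse, AudioChatError>
where
    T: Fn(&[u8], Option<&str>) -> Result<String, String>,
    D: Fn(Option<&str>, &str) -> String,
{
    validate(&req)?;
    // Staging as StagedAudio would happen here; omitted in this sketch.
    let transcription = transcribe(&req.audio, req.language.as_deref())
        .map_err(AudioChatError::Transcription)?;
    let response = dispatch(req.session_id.as_deref(), &transcription);
    Ok(AudioChatResponse { transcription, response })
}
```

Passing the transcriber and dispatcher as parameters is only one possible design; it makes the size and error paths easy to exercise without a running server.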
CLI
- /audio <path> command for local file transcription
- Read local file, validate format/size/duration
- Stage as StagedAudio, transcribe, inject text
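As a rough sketch of the first CLI steps, the snippet below parses an `/audio <path>` line and checks extension and size before any staging. All names are hypothetical, the extension allow-list is an assumption (the supported formats are not listed here), and the size cap simply reuses the gateway's 25 MiB figure as a placeholder. Duration validation is not shown, since it would need an audio decoder.

```rust
use std::path::{Path, PathBuf};

/// Assumed allow-list; the actual supported formats are not stated here.
const ALLOWED_EXTENSIONS: [&str; 4] = ["mp3", "wav", "ogg", "m4a"];
/// Assumed cap, mirroring the gateway's 25 MiB body limit.
const MAX_AUDIO_BYTES: u64 = 25 * 1024 * 1024;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum AudioFileError {
    UnsupportedFormat,
    TooLarge(u64),
    Io(String),
}

/// Extract the path from a line of the form "/audio <path>".
fn parse_audio_command(line: &str) -> Option<PathBuf> {
    let rest = line.trim().strip_prefix("/audio")?;
    // Require whitespace after the command name so "/audiofoo" is rejected.
    if !rest.starts_with(char::is_whitespace) {
        return None;
    }
    let path = rest.trim();
    if path.is_empty() {
        None
    } else {
        Some(PathBuf::from(path))
    }
}

/// Case-insensitive extension check against the assumed allow-list.
fn has_allowed_extension(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| ALLOWED_EXTENSIONS.iter().any(|a| ext.eq_ignore_ascii_case(a)))
        .unwrap_or(false)
}

/// Check format and size; returns the file length in bytes on success.
fn check_audio_file(path: &Path) -> Result<u64, AudioFileError> {
    if !has_allowed_extension(path) {
        return Err(AudioFileError::UnsupportedFormat);
    }
    let len = std::fs::metadata(path)
        .map_err(|e| AudioFileError::Io(e.to_string()))?
        .len();
    if len > MAX_AUDIO_BYTES {
        return Err(AudioFileError::TooLarge(len));
    }
    Ok(len)
}
```

Checking the extension before touching the filesystem keeps the cheap rejection first; a real implementation might also sniff the file header rather than trust the extension.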
Optional
- whisper-rs embedded transcription behind --features audio-transcription (zero external dependency)
- Model auto-download tooling
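One way the feature gate might look in code is shown below. It only illustrates how a Cargo feature named audio-transcription could switch the backend at compile time; the `Backend` enum and `selected_backend` function are hypothetical, and no whisper-rs calls are shown because its integration is not described here.

```rust
/// Hypothetical marker for which transcription backend a build uses.
#[derive(Debug, PartialEq)]
enum Backend {
    /// Embedded transcription, compiled in only with the feature enabled.
    Embedded,
    /// Whatever transcription path the build uses without the feature.
    Existing,
}

/// Pick the backend at compile time based on the Cargo feature flag.
#[allow(unexpected_cfgs)]
fn selected_backend() -> Backend {
    if cfg!(feature = "audio-transcription") {
        Backend::Embedded
    } else {
        Backend::Existing
    }
}
```

Built with `--features audio-transcription`, this would select the embedded backend; a default build falls through to the existing path.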
Acceptance Criteria
References
- openspec/changes/archive/2026-04-03-audio-input-support/proposal.md (Phase 2 section)
- openspec/changes/archive/2026-04-03-audio-input-support/design.md
- openspec/specs/audio-input/spec.md