1,438 changes: 521 additions & 917 deletions Cargo.lock

Large diffs are not rendered by default.

6 changes: 6 additions & 0 deletions Cargo.toml
@@ -150,12 +150,18 @@ tempfile = "3"

# Prometheus metrics (optional, behind "metrics" feature)
prometheus = { version = "0.13", optional = true }
whisper-rs = { version = "0.15", optional = true, features = ["vulkan"] }
hf-hub = { version = "0.5", optional = true }
symphonia = { version = "0.5", features = ["mp3", "aac", "flac", "ogg", "wav", "isomp4"], optional = true }
ogg = { version = "0.9", optional = true }
opus = { version = "0.3", optional = true }
pdf-extract = "0.10.0"
open = "5.3.3"
urlencoding = "2.1.3"
moka = "0.12.13"

[features]
stt-whisper = ["dep:whisper-rs", "dep:hf-hub", "dep:symphonia", "dep:ogg", "dep:opus"]
metrics = ["dep:prometheus"]

[lints.clippy]
24 changes: 24 additions & 0 deletions README.md
@@ -197,6 +197,30 @@ channel = "my-provider/my-model"

Additional built-in providers include **Kilo Gateway**, **OpenCode Go**, **NVIDIA**, **MiniMax**, **Moonshot AI (Kimi)**, and **Z.AI Coding Plan** — configure with `kilo_key`, `opencode_go_key`, `nvidia_key`, `minimax_key`, `moonshot_key`, or `zai_coding_plan_key` in `[llm]`.

### Voice Transcription

Audio attachments (voice messages, audio files) are transcribed before being passed to the channel. Set `routing.voice` to choose the backend:

**Provider-based** — route through any configured LLM provider that supports audio input:

```toml
[defaults.routing]
voice = "openai/whisper-1"
```

**Local Whisper** (requires building with `--features stt-whisper`) — run inference locally via [whisper-rs](https://codeberg.org/tazz4843/whisper-rs); no API call needed:

```toml
[defaults.routing]
voice = "whisper-local://small"
```

The model is downloaded automatically from [`ggerganov/whisper.cpp`](https://huggingface.co/ggerganov/whisper.cpp) on first use and cached in `~/.cache/huggingface/hub`. Supported size names: `tiny`, `tiny.en`, `base`, `base.en`, `small`, `small.en`, `medium`, `medium.en`, `large`, `large-v1`, `large-v2`, `large-v3`. An absolute path to a GGML model file also works.
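The whisper.cpp repository publishes these models as `ggml-<size>.bin` files. A minimal sketch of how a resolver might distinguish a size name from a filesystem path — illustrative only, not the crate's actual code:

```rust
use std::path::Path;

/// Size names accepted by `whisper-local://`, as listed above.
const SIZES: &[&str] = &[
    "tiny", "tiny.en", "base", "base.en", "small", "small.en",
    "medium", "medium.en", "large", "large-v1", "large-v2", "large-v3",
];

/// Map a known size name to the GGML file name published in
/// ggerganov/whisper.cpp; treat an absolute path as a direct model file.
fn resolve_model(target: &str) -> Option<String> {
    if SIZES.contains(&target) {
        Some(format!("ggml-{target}.bin"))
    } else if Path::new(target).is_absolute() {
        Some(target.to_string())
    } else {
        None
    }
}

fn main() {
    assert_eq!(resolve_model("small").as_deref(), Some("ggml-small.bin"));
    assert_eq!(resolve_model("large-v3").as_deref(), Some("ggml-large-v3.bin"));
    assert_eq!(resolve_model("not-a-size"), None);
}
```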

GPU acceleration via Vulkan is enabled automatically when a compatible device is detected. The loaded model is cached for the process lifetime — restart to switch models.
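A process-lifetime cache of this kind can be built on `std::sync::OnceLock`: the first load wins and later calls return the same instance, which is why switching models requires a restart. A minimal sketch, with `Model` standing in as a placeholder for the real whisper-rs context type:

```rust
use std::sync::OnceLock;

/// Placeholder for the loaded whisper context (the real type lives in whisper-rs).
struct Model {
    name: String,
}

static MODEL: OnceLock<Model> = OnceLock::new();

/// Load the model exactly once; every later call returns the cached instance,
/// even if a different name is requested.
fn model(name: &str) -> &'static Model {
    MODEL.get_or_init(|| Model { name: name.to_string() })
}

fn main() {
    let a = model("small");
    let b = model("medium"); // ignored: the first load wins
    assert_eq!(a.name, "small");
    assert_eq!(b.name, "small");
    assert!(std::ptr::eq(a, b)); // same cached instance
}
```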

Ogg/Opus audio (Telegram voice messages) is decoded natively; all other formats are handled by symphonia.
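Whatever the container, whisper.cpp ultimately consumes 16 kHz mono f32 samples, while Opus typically decodes at 48 kHz. A naive sketch of the conversion — stereo downmix plus 3:1 decimation, with the caveat that a real pipeline would low-pass filter before decimating:

```rust
/// Convert interleaved stereo i16 at 48 kHz into mono f32 at 16 kHz.
/// Naive approach: average the two channels, keep every third frame
/// (48_000 / 16_000 = 3), and scale i16 into the [-1.0, 1.0] range.
fn to_whisper_input(stereo_48k: &[i16]) -> Vec<f32> {
    stereo_48k
        .chunks_exact(2)  // one L/R frame per chunk
        .step_by(3)       // 48 kHz -> 16 kHz decimation
        .map(|f| (f[0] as f32 + f[1] as f32) / 2.0 / 32768.0)
        .collect()
}

fn main() {
    // 6 stereo frames in -> 2 mono samples out.
    let input: Vec<i16> = vec![100, 100, 0, 0, 0, 0, -200, -200, 0, 0, 0, 0];
    let out = to_whisper_input(&input);
    assert_eq!(out.len(), 2);
    assert!((out[0] - 100.0 / 32768.0).abs() < 1e-6);
    assert!((out[1] + 200.0 / 32768.0).abs() < 1e-6);
}
```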

### Skills

Extensible skill system integrated with [skills.sh](https://skills.sh):
1 change: 1 addition & 0 deletions prompts/en/tools/transcribe_audio_description.md.j2
@@ -0,0 +1 @@
Transcribe an audio file to text using local speech-to-text. Provide the path to the audio file. Supports ogg, opus, mp3, flac, wav, and m4a formats. Use this instead of external whisper CLI tools.