feat: Telegram two-way communication with voice message support #2

Merged
dzianisv merged 3 commits into main from feature/telegram-voice-messages on Jan 24, 2026

Conversation

@dzianisv
Owner

Summary

Implements full Telegram integration for OpenCode notifications with two-way communication including voice message support.

Features

Outbound Notifications

  • Task completion notifications via Telegram (text + TTS audio)
  • Session context tracking for reply routing

Inbound Replies

  • Text message replies forwarded to OpenCode sessions
  • Voice/video message support with local Whisper STT transcription
  • Unified architecture: voice messages use telegram_replies table

Architecture

┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Telegram   │     │  telegram-  │     │  Supabase   │     │ TTS Plugin  │     │  Whisper    │
│   User      │     │  webhook    │     │  Realtime   │     │             │     │  Server     │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │                   │                   │
       │ 🎤 Voice message  │                   │                   │                   │
       │──────────────────>│                   │                   │                   │
       │                   │ Download audio    │                   │                   │
       │                   │ (has BOT_TOKEN)   │                   │                   │
       │                   │                   │                   │                   │
       │                   │ INSERT telegram_  │                   │                   │
       │                   │ replies (audio)   │                   │                   │
       │                   │──────────────────>│                   │                   │
       │                   │                   │ WebSocket push    │                   │
       │                   │                   │──────────────────>│                   │
       │                   │                   │                   │ POST /transcribe  │
       │                   │                   │                   │──────────────────>│
       │                   │                   │                   │ {text: "..."}     │
       │                   │                   │                   │<──────────────────│
       │                   │                   │                   │ Forward to session│
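
The last two steps of the diagram (transcribe, then forward) can be sketched as a small function. This is a minimal illustration, not the code in tts.ts: the `ReplyRow` shape, the `session_id` and `text` fields, and the `Transcriber` type are assumptions, and the transcriber is injected so the example runs without a Whisper server. Only the `/transcribe` endpoint and the `{text: "..."}` response come from the diagram.

```typescript
// Hypothetical row shape; only is_voice and audio_base64 are named in this PR.
interface ReplyRow {
  session_id: string; // assumed routing column
  text: string | null; // assumed column for plain text replies
  is_voice: boolean;
  audio_base64: string | null;
}

// Stand-in for the call to the Whisper server's POST /transcribe endpoint,
// which the diagram shows returning an object with a text field.
type Transcriber = (audioBase64: string) => Promise<{ text: string }>;

// Turn an incoming reply row into the text that gets forwarded to the session.
async function resolveReplyText(row: ReplyRow, transcribe: Transcriber): Promise<string> {
  if (!row.is_voice) {
    // Text replies are forwarded as they are, minus surrounding whitespace.
    return (row.text ?? "").trim();
  }
  if (!row.audio_base64) {
    throw new Error("voice reply has no audio payload");
  }
  const result = await transcribe(row.audio_base64);
  return result.text.trim();
}

// A fake transcriber for local experiments; a real one would call the server.
const fakeTranscribe: Transcriber = async () => ({ text: " run the tests again " });
```

Injecting the transcriber keeps the text and voice paths in one handler, which matches the unified design described above, and makes the voice path testable without audio files.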

Key Components

| Component | Description |
| --- | --- |
| telegram-webhook Edge Function | Handles /start, /stop, /status, and incoming replies |
| send-notify Edge Function | Sends notifications with session context |
| whisper/whisper_server.py | Local Whisper STT server (port 8787) |
| subscribeToReplies() in tts.ts | Unified handler for text + voice messages |

Database Schema

  • telegram_subscribers - User subscriptions
  • telegram_reply_contexts - Active session routing (24h TTL)
  • telegram_replies - Incoming messages with new voice columns:
    • is_voice - Boolean flag for voice messages
    • audio_base64 - Base64-encoded audio from Edge Function
    • voice_file_type - voice, video_note, or video
    • voice_duration_seconds - Duration in seconds
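
A few helpers show how a consumer might check these columns before using a row. This is a sketch under assumptions: the three `voice_file_type` values and the 24h TTL come from the list above, while `createdAt` is a hypothetical timestamp field (the PR does not name one) and the helper names are illustrative.

```typescript
// Allowed values for voice_file_type, as listed in the schema notes.
const VOICE_FILE_TYPES = ["voice", "video_note", "video"] as const;
type VoiceFileType = (typeof VOICE_FILE_TYPES)[number];

// Type guard for values read from the voice_file_type column.
function isVoiceFileType(value: string): value is VoiceFileType {
  return (VOICE_FILE_TYPES as readonly string[]).indexOf(value) !== -1;
}

// Decoded size in bytes of a padded base64 string, computed without decoding.
// Assumes standard base64 with '=' padding and no embedded whitespace.
function base64ByteLength(b64: string): number {
  const tail = b64.slice(-2);
  const padding = tail === "==" ? 2 : tail.slice(-1) === "=" ? 1 : 0;
  return (b64.length / 4) * 3 - padding;
}

// 24h TTL for telegram_reply_contexts, from the schema notes.
const CONTEXT_TTL_MS = 24 * 60 * 60 * 1000;

// createdAt is a hypothetical field; a context counts as active while it is
// younger than the TTL.
function isContextActive(createdAt: Date, now: Date): boolean {
  return now.getTime() - createdAt.getTime() < CONTEXT_TTL_MS;
}
```

Computing the size from the string length is a cheap way to reject oversized `audio_base64` payloads before handing them to the transcriber.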

Configuration

{
  "telegram": {
    "enabled": true,
    "uuid": "your-uuid-here",
    "receiveReplies": true
  },
  "whisper": {
    "enabled": true,
    "model": "base",
    "port": 8787
  }
}
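
One way to read this configuration is to merge it over defaults and validate it. The sketch below is illustrative only: the field names come from the JSON above, but the default values, the validation rules, and the `loadConfig` and `whisperTranscribeUrl` helpers are assumptions rather than the plugin's actual loader.

```typescript
interface TelegramConfig { enabled: boolean; uuid: string; receiveReplies: boolean; }
interface WhisperConfig { enabled: boolean; model: string; port: number; }
interface PluginConfig { telegram: TelegramConfig; whisper: WhisperConfig; }

// Assumed defaults for illustration; the PR only shows an example config.
const DEFAULT_CONFIG: PluginConfig = {
  telegram: { enabled: false, uuid: "", receiveReplies: false },
  whisper: { enabled: false, model: "base", port: 8787 },
};

// Parse a JSON config string, fill in missing fields, and validate the result.
function loadConfig(json: string): PluginConfig {
  const raw = JSON.parse(json) as {
    telegram?: Partial<TelegramConfig>;
    whisper?: Partial<WhisperConfig>;
  };
  const config: PluginConfig = {
    telegram: { ...DEFAULT_CONFIG.telegram, ...raw.telegram },
    whisper: { ...DEFAULT_CONFIG.whisper, ...raw.whisper },
  };
  if (config.telegram.enabled && config.telegram.uuid === "") {
    throw new Error("telegram.uuid is required when telegram is enabled");
  }
  const port = config.whisper.port;
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error("whisper.port must be an integer between 1 and 65535");
  }
  return config;
}

// The Whisper server runs on localhost; /transcribe is the endpoint shown
// in the architecture diagram.
function whisperTranscribeUrl(config: PluginConfig): string {
  return `http://localhost:${config.whisper.port}/transcribe`;
}
```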

Testing

  • 168 tests passing
  • Covers webhook handling, voice transcription, migration structure

Files Changed

  • tts.ts - Unified voice handling in subscribeToReplies()
  • supabase/functions/telegram-webhook/index.ts - Voice → telegram_replies
  • supabase/functions/send-notify/index.ts - Outbound notifications
  • supabase/migrations/ - 3 migration files
  • whisper/whisper_server.py - Local Whisper HTTP server
  • docs/telegram.md - Architecture documentation
  • test/tts.test.ts - Updated tests

Closes #1

Implements full Telegram integration for OpenCode notifications:

Outbound notifications:
- Task completion notifications via Telegram (text + TTS audio)
- Session context tracking for reply routing

Inbound replies:
- Text message replies forwarded to OpenCode sessions
- Voice/video message support with local Whisper STT transcription
- Unified architecture: voice messages use telegram_replies table

Key components:
- telegram-webhook Edge Function: handles /start, /stop, /status, replies
- send-notify Edge Function: sends notifications with session context
- Whisper server (localhost:8787): local speech-to-text transcription
- Supabase Realtime: WebSocket subscription for incoming messages

Database schema:
- telegram_subscribers: user subscriptions
- telegram_reply_contexts: active session routing (24h TTL)
- telegram_replies: incoming messages (text + voice with audio_base64)

Tests: 168 passing

- Merge telegram.design.md content into telegram.md (cleaner architecture)
- Delete obsolete telegram.design.md
- Add Whisper Server integration tests (health, models, transcribe)
- Add Whisper dependencies availability checks
- All 176 tests passing
…e-helpers

- Move whisper/, chatterbox/, coqui/ under opencode-helpers/
- Add HELPERS_DIR base constant in tts.ts
- Update all paths in code, tests, and documentation
- All 176 tests passing
dzianisv merged commit 484192d into main on Jan 24, 2026
dzianisv deleted the feature/telegram-voice-messages branch on January 24, 2026 at 20:39
@dzianisv
Owner Author

Post-Merge Fix: Deployment Issue (Jan 25, 2026)

What Happened

After this PR was merged, Telegram replies weren't working. The code was correct but the Edge Functions weren't deployed to Supabase.

Root Cause

  • send-notify was deployed at 04:25 UTC on Jan 24
  • This PR was merged at 20:39 UTC on Jan 24
  • The deployed function was about 16 hours older than the merged code, so it lacked the reply context storage
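
The gap between the two timestamps can be checked with a few lines of arithmetic. The helper name `stalenessMinutes` is illustrative; the timestamps are the ones quoted above.

```typescript
// Timestamps from the root cause notes (UTC).
const deployedAt = new Date("2026-01-24T04:25:00Z");
const mergedAt = new Date("2026-01-24T20:39:00Z");

// Minutes by which the deployed function lags behind the merged code.
function stalenessMinutes(deployed: Date, merged: Date): number {
  return Math.round((merged.getTime() - deployed.getTime()) / 60000);
}

const lagMinutes = stalenessMinutes(deployedAt, mergedAt);
const lagLabel = `${Math.floor(lagMinutes / 60)}h ${lagMinutes % 60}m`;
console.log(lagLabel); // 16h 14m
```

A check like this, comparing a function's last deploy time with the latest merge that touched it, is one way to catch the same kind of drift before users notice it.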

Fix Applied

  1. ✅ Deployed send-notify v5 with reply context code
  2. ✅ Deployed telegram-webhook v8 with simplified emoji confirmations
  3. ✅ Added CI/CD: .github/workflows/deploy-supabase.yml
  4. ✅ Added deploy script: scripts/deploy-supabase.sh
  5. ✅ Set GitHub secrets: SUPABASE_ACCESS_TOKEN, SUPABASE_PROJECT_REF, SUPABASE_DB_PASSWORD

Going Forward

Supabase functions now auto-deploy whenever changes to files under supabase/ are merged to main or master.
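
The trigger rule can be expressed as a small predicate. This is only a sketch of the rule as stated in this comment; the contents of deploy-supabase.yml are not shown here, and `shouldDeploy` is a hypothetical helper, not part of the repository.

```typescript
// Branch names and the watched directory come from the comment above.
const DEPLOY_BRANCHES = ["main", "master"];
const WATCHED_PREFIX = "supabase/";

// True when a merge to a deploy branch touches at least one file under supabase/.
function shouldDeploy(branch: string, changedFiles: string[]): boolean {
  const onDeployBranch = DEPLOY_BRANCHES.indexOf(branch) !== -1;
  const touchesSupabase = changedFiles.some((file) => file.indexOf(WATCHED_PREFIX) === 0);
  return onDeployBranch && touchesSupabase;
}
```

For example, a merge to main that changes supabase/functions/send-notify/index.ts would deploy, while one that changes only tts.ts would not.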

Verification

Test Session ID: ses_test_1769374929564
✓ Notification sent: reply_enabled=true
✓ Reply forwarded to session

dzianisv added a commit that referenced this pull request Jan 26, 2026
- Remove telegram.e2e.test.ts which spawned full OpenCode server and had
  model authentication issues causing timeouts
- Add telegram.integration.test.ts with 10 focused tests that verify:
  - Bug fix #1: Uses 👍 emoji instead of invalid ✅ for reactions
  - Bug fix #2: Skips subagent sessions (checks parentID)
  - API function signatures and documentation
- Update package.json test scripts

All 193 tests now pass reliably.
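
The two bug fixes can be sketched as follows. The payload fields follow the Telegram Bot API `setMessageReaction` method (`chat_id`, `message_id`, and a `reaction` array of `{type: "emoji", emoji}` objects); the `SessionInfo` shape and both helper names are assumptions for illustration, not the plugin's code.

```typescript
// Hypothetical session shape; the commit only says subagent sessions are
// detected by checking parentID.
interface SessionInfo {
  id: string;
  parentID?: string;
}

// Subagent sessions have a parent, so they are skipped.
function isSubagentSession(session: SessionInfo): boolean {
  return session.parentID !== undefined && session.parentID !== "";
}

// Request body for setMessageReaction. The commit notes that ✅ is rejected
// as a reaction, so the confirmation uses 👍 instead.
function buildReactionPayload(chatId: number, messageId: number) {
  return {
    chat_id: chatId,
    message_id: messageId,
    reaction: [{ type: "emoji", emoji: "👍" }],
  };
}
```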
dzianisv added a commit that referenced this pull request Feb 11, 2026
* feat: Add Telegram two-way communication with voice message support

Implements full Telegram integration for OpenCode notifications:

Outbound notifications:
- Task completion notifications via Telegram (text + TTS audio)
- Session context tracking for reply routing

Inbound replies:
- Text message replies forwarded to OpenCode sessions
- Voice/video message support with local Whisper STT transcription
- Unified architecture: voice messages use telegram_replies table

Key components:
- telegram-webhook Edge Function: handles /start, /stop, /status, replies
- send-notify Edge Function: sends notifications with session context
- Whisper server (localhost:8787): local speech-to-text transcription
- Supabase Realtime: WebSocket subscription for incoming messages

Database schema:
- telegram_subscribers: user subscriptions
- telegram_reply_contexts: active session routing (24h TTL)
- telegram_replies: incoming messages (text + voice with audio_base64)

Tests: 168 passing

* docs: Consolidate telegram docs and add Whisper integration tests

- Merge telegram.design.md content into telegram.md (cleaner architecture)
- Delete obsolete telegram.design.md
- Add Whisper Server integration tests (health, models, transcribe)
- Add Whisper dependencies availability checks
- All 176 tests passing

* refactor: Consolidate plugin helpers under ~/.config/opencode/opencode-helpers

- Move whisper/, chatterbox/, coqui/ under opencode-helpers/
- Add HELPERS_DIR base constant in tts.ts
- Update all paths in code, tests, and documentation
- All 176 tests passing
dzianisv added a commit that referenced this pull request Feb 11, 2026
- Remove telegram.e2e.test.ts which spawned full OpenCode server and had
  model authentication issues causing timeouts
- Add telegram.integration.test.ts with 10 focused tests that verify:
  - Bug fix #1: Uses 👍 emoji instead of invalid ✅ for reactions
  - Bug fix #2: Skips subagent sessions (checks parentID)
  - API function signatures and documentation
- Update package.json test scripts

All 193 tests now pass reliably.
dzianisv pushed a commit that referenced this pull request Feb 13, 2026
The speak() function had its own reflection verdict check (requireVerdict)
that was independent from the event handler's check (waitForVerdict).
Setting waitForVerdict:false in config bypassed gate #1 in the event
handler, but gate #2 in speak() still blocked all speech because
requireVerdict defaults to true independently.

This caused 100% of TTS attempts to be blocked with either:
- 'Speak blocked: missing reflection verdict'
- 'Speak blocked: reflection verdict incomplete'

Fix: Remove the redundant verdict check from speak(). The event handler
already makes the verdict decision before calling speak() — having
speak() second-guess that decision was a design bug.
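
The shape of the fix can be illustrated with a single gate in the event handler and a speak() that performs no check of its own. This is a simplified sketch: `TtsConfig`, `Verdict`, `shouldSpeak`, and `onSessionIdle` are hypothetical names, and the real speak() does far more than record its input.

```typescript
interface TtsConfig { waitForVerdict: boolean; }
interface Verdict { complete: boolean; }

// The only verdict gate: when waitForVerdict is off, speech always proceeds.
function shouldSpeak(config: TtsConfig, verdict: Verdict | undefined): boolean {
  if (!config.waitForVerdict) return true;
  return verdict !== undefined && verdict.complete;
}

const spoken: string[] = [];

// After the fix, speak() trusts its caller and has no verdict check.
function speak(text: string): void {
  spoken.push(text);
}

// Hypothetical event handler: decides once, then calls speak().
function onSessionIdle(config: TtsConfig, verdict: Verdict | undefined, text: string): void {
  if (shouldSpeak(config, verdict)) {
    speak(text);
  }
}
```

With one decision point, setting waitForVerdict to false is enough to enable speech, which is the behavior the config option promises.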
dzianisv added a commit that referenced this pull request Feb 13, 2026
#70)

* fix(tts): remove redundant reflection verdict gate in speak()

The speak() function had its own reflection verdict check (requireVerdict)
that was independent from the event handler's check (waitForVerdict).
Setting waitForVerdict:false in config bypassed gate #1 in the event
handler, but gate #2 in speak() still blocked all speech because
requireVerdict defaults to true independently.

This caused 100% of TTS attempts to be blocked with either:
- 'Speak blocked: missing reflection verdict'
- 'Speak blocked: reflection verdict incomplete'

Fix: Remove the redundant verdict check from speak(). The event handler
already makes the verdict decision before calling speak() — having
speak() second-guess that decision was a design bug.

* feat(tts): add Coqui TTS setup script and update install:tts

- Add scripts/setup-coqui.sh: creates Python venv, installs TTS + PyTorch
  + transformers<4.50, verifies import, runs synthesis test with playback
- Update install:tts npm script to run setup-coqui.sh after deploying plugin
- Supports --force flag to recreate existing venv
- Requires Python 3.10-3.12 for TTS compatibility

* fix(tts): add Coqui health check logging and OS TTS fallback

- setupCoqui() now logs clear error messages instead of silently returning false
- Verify TTS import after pip install to catch broken installs
- speak() falls back to OS TTS when Coqui is unavailable or synthesis fails
- Error messages include 'Run: npm run install:tts' for manual recovery
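
The fallback behavior described in this commit can be sketched as below. The `Engine` type and `speakWithFallback` are hypothetical; engines and the logger are injected so the example runs without Coqui installed, and only the recovery hint text is taken from the commit message.

```typescript
// An engine speaks the text or rejects on failure.
type Engine = (text: string) => Promise<void>;

// Try Coqui first; fall back to OS TTS when Coqui is unavailable (null) or
// synthesis fails. Returns which engine actually spoke.
async function speakWithFallback(
  text: string,
  coqui: Engine | null,
  osTts: Engine,
  log: (message: string) => void,
): Promise<"coqui" | "os"> {
  if (coqui !== null) {
    try {
      await coqui(text);
      return "coqui";
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      log(`Coqui synthesis failed: ${reason}. Run: npm run install:tts`);
    }
  } else {
    log("Coqui is unavailable. Run: npm run install:tts");
  }
  await osTts(text);
  return "os";
}
```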

* docs: refactor tts.design.md to tts.md with updated content

- Rename docs/tts.design.md → docs/tts.md
- Update model from Jenny to VCTK VITS (multi-speaker, p226)
- Update device from cpu to mps (Apple Silicon)
- Add setup section (npm run install:tts, setup-coqui.sh)
- Add fallback behavior section (Coqui → OS TTS)
- Add full engine/model table with all supported options
- Update config example with speaker, correct model/device
- Simplify architecture diagram

---------

Co-authored-by: engineer <engineer@opencode.ai>
dzianisv pushed a commit that referenced this pull request Feb 15, 2026
Fix 4 anomalies in existing eval assertions:
- promptfooconfig.yaml #19: misleading description (said COMPLETE, asserted incomplete)
- stuck-detection.yaml 'Task finished': loose assertion allowed reason=working
- stuck-detection.yaml 'Very short delay': tautological assertion always passed
- post-compression.yaml #2/#3: accepted continue_task when needs_github_update correct

Add 19 new eval test cases:
- 8 judge eval cases (23→31): mid-task stop, subtle warnings, retry loops,
  partial impl, gold-plating, context exhaustion, missing tests, main push
- 6 stuck detection cases (12→18): retry loop, slow build, incomplete msg,
  planning-only, rate limited, stuck-not-complete
- 5 post-compression cases (12→17): failing CI, mid-debug, multi-PR,
  blocked on secrets, force-push

All evals pass: judge 31/31, stuck 18/18, compression 17/17.
Unit tests: 319 passed, 5 skipped.

Development

Successfully merging this pull request may close these issues.

Telegram Voice Message Support - Unified Architecture