feat(livekit-plugins-typecast): add Typecast TTS plugin #3633
base: main
0d2a125 to 03c6bf6
Update (2026-01-08): Rebased onto the latest main.
Hi @theomonnom @chenghao-mou 👋 I rebased this PR onto the latest main today and aligned it with the current plugin and test patterns. For context, this adds a Typecast TTS plugin following the same structure as existing providers (e.g. ElevenLabs, Cartesia). FYI: Typecast is a TTS service developed by the company I work at, and I'm committed to maintaining this plugin long-term. Could you advise on the preferred path to land this?
Happy to adapt naming, env vars, or structure based on your guidance. Thanks!
27a4a08 to 4f8272b
Gentle bump @theomonnom: just looking for a quick direction on in-tree merge vs. external plugin.
📝 Walkthrough
This pull request introduces a complete Typecast Text-to-Speech plugin for LiveKit, including the plugin implementation with voice discovery and synthesis capabilities, supporting models and type definitions, example code demonstrating various synthesis scenarios, and updated test infrastructure to integrate the new backend.
Sequence Diagram(s)
sequenceDiagram
participant Agent as LiveKit Agent
participant TTS as Typecast TTS
participant API as Typecast API
participant Room as LiveKit Room
Agent->>TTS: synthesize(text, conn_options)
TTS->>TTS: apply PromptOptions & OutputOptions
TTS->>API: POST /synthesize (with payload)
API-->>TTS: audio stream (chunks)
TTS->>TTS: map format to MIME type
TTS->>Room: stream audio chunks via AudioTrack
Note over TTS,Room: Live audio playback
TTS-->>Agent: ChunkedStream complete
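The chunked hand-off in the diagram can be sketched without any LiveKit or Typecast dependency. This is only an illustration of the streaming pattern: `fake_audio_stream` and `collect` are hypothetical names, and the list stands in for whatever the real emitter does with each chunk.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_audio_stream(payload: bytes, chunk_size: int) -> AsyncIterator[bytes]:
    """Yield the payload in fixed-size chunks, like a streamed HTTP body."""
    for start in range(0, len(payload), chunk_size):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield payload[start : start + chunk_size]


async def collect(payload: bytes, chunk_size: int = 4) -> List[bytes]:
    """Consume the stream chunk by chunk and record what was forwarded."""
    pushed: List[bytes] = []
    async for chunk in fake_audio_stream(payload, chunk_size):
        # A real emitter would forward the chunk to the room here.
        pushed.append(chunk)
    return pushed
```

Calling `asyncio.run(collect(b"0123456789"))` returns the payload split into 4-byte pieces, with a shorter final chunk.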
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@examples/other/text-to-speech/typecast_tts.py`:
- Around line 16-127: Add a Google-style docstring and an explicit return type
to the entrypoint (change async def entrypoint(job: JobContext) to async def
entrypoint(job: JobContext) -> None) and ensure the Typecast client is closed to
avoid leaking aiohttp sessions by wrapping the existing logic in a try/finally
(create tts = typecast.TTS(...) before the try, keep current try/except for
list_voices and the rest in the try block, and in the finally call await
tts.close()) so the tts client is always closed even on errors; reference
symbols: entrypoint, tts, typecast.TTS, and tts.close().
In `@livekit-plugins/livekit-plugins-typecast/livekit/plugins/typecast/tts.py`:
- Around line 285-299: The code builds the API payload with user-provided
options but later uses self._opts.sample_rate (user-controlled) to initialize
the output emitter, causing a metadata mismatch; update the emitter
initialization to always use sample_rate=44100 (hardcode 44100 Hz) instead of
self._opts.sample_rate so the output metadata matches Typecast v1's fixed 44,100
Hz output (refer to the payload construction and the use of
self._opts.sample_rate in this module).
- Around line 339-346: The emitter's MIME type is hardcoded to "audio/wav" but
must reflect output_options.audio_format; update the call to
output_emitter.initialize (in function/method where request_id, sample_rate,
num_channels are set) to set mime_type dynamically by mapping
output_options.audio_format == "mp3" -> "audio/mpeg", "wav" -> "audio/wav" (and
fall back to a safe default or raise if unsupported). Ensure you reference
output_options.audio_format when constructing the mime_type argument so the
declared MIME matches the Typecast API response.
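The first item above (always closing the client) follows a common try/finally shape. The sketch below uses a stand-in `FakeTTS` class rather than the real `typecast.TTS`, and passes it in directly instead of a `JobContext`; both are assumptions made so the example runs on its own. The `close()` name mirrors the review comment and is not checked against the plugin.

```python
import asyncio
from typing import List


class FakeTTS:
    """Stand-in for a TTS client that owns a network session (hypothetical)."""

    def __init__(self) -> None:
        self.closed = False

    async def list_voices(self) -> List[str]:
        # Simulate a failing optional call.
        raise RuntimeError("voice listing unavailable")

    async def close(self) -> None:
        self.closed = True


async def entrypoint(tts: FakeTTS) -> None:
    """Run the demo and always release the client.

    Args:
        tts: Client to use; closed on exit even if a step fails.
    """
    try:
        try:
            voices = await tts.list_voices()
            print(f"found {len(voices)} voices")
        except Exception as exc:  # listing is optional: report and continue
            print(f"could not list voices: {exc}")
        # Synthesis steps would go here.
    finally:
        await tts.close()
```

The inner try/except keeps a failed voice listing from aborting the demo, while the outer finally guarantees the session is released on every path.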
🧹 Nitpick comments (2)
livekit-plugins/livekit-plugins-typecast/livekit/plugins/typecast/log.py (1)
1-3: Add a short module docstring for consistency.
Optional but aligns with the docstring guideline.
Suggested change
+"""Logging utilities for the Typecast plugin."""
 import logging
 logger = logging.getLogger("livekit.plugins.typecast")
As per coding guidelines: Use Google-style docstrings.
tests/test_tts.py (1)
253-257: Prefer the shared DEFAULT_VOICE_ID instead of a hard-coded literal.
This keeps the test aligned with the plugin constant and avoids drift if the default changes.
🔧 Proposed change
-    "tts": typecast.TTS(
-        voice=os.getenv("TYPECAST_VOICE_ID", "tc_62a8975e695ad26f7fb514d1")
-    ),
+    "tts": typecast.TTS(
+        voice=os.getenv("TYPECAST_VOICE_ID", typecast.DEFAULT_VOICE_ID)
+    ),
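The fallback behaviour this nitpick relies on can be shown in isolation. In this sketch `resolve_voice_id` is a hypothetical helper, and the constant's value is copied from the literal in the diff; that it matches the plugin's actual `DEFAULT_VOICE_ID` is an assumption.

```python
import os
from typing import Mapping, Optional

# Assumption: copied from the literal in the proposed change, not from the plugin.
DEFAULT_VOICE_ID = "tc_62a8975e695ad26f7fb514d1"


def resolve_voice_id(env: Optional[Mapping[str, str]] = None) -> str:
    """Return TYPECAST_VOICE_ID from the environment, else the shared default."""
    source = os.environ if env is None else env
    return source.get("TYPECAST_VOICE_ID", DEFAULT_VOICE_ID)
```

Referencing one constant means a future change to the default voice only has to be made in one place.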
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
examples/other/text-to-speech/requirements.txt
examples/other/text-to-speech/typecast_tts.py
livekit-agents/livekit/agents/cli/cli.py
livekit-plugins/livekit-plugins-typecast/README.md
livekit-plugins/livekit-plugins-typecast/livekit/plugins/typecast/__init__.py
livekit-plugins/livekit-plugins-typecast/livekit/plugins/typecast/log.py
livekit-plugins/livekit-plugins-typecast/livekit/plugins/typecast/models.py
livekit-plugins/livekit-plugins-typecast/livekit/plugins/typecast/py.typed
livekit-plugins/livekit-plugins-typecast/livekit/plugins/typecast/tts.py
livekit-plugins/livekit-plugins-typecast/livekit/plugins/typecast/version.py
livekit-plugins/livekit-plugins-typecast/pyproject.toml
pyproject.toml
tests/docker-compose.yml
tests/test_tts.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings
Files:
livekit-plugins/livekit-plugins-typecast/livekit/plugins/typecast/version.py
examples/other/text-to-speech/typecast_tts.py
livekit-agents/livekit/agents/cli/cli.py
livekit-plugins/livekit-plugins-typecast/livekit/plugins/typecast/log.py
tests/test_tts.py
livekit-plugins/livekit-plugins-typecast/livekit/plugins/typecast/models.py
livekit-plugins/livekit-plugins-typecast/livekit/plugins/typecast/tts.py
livekit-plugins/livekit-plugins-typecast/livekit/plugins/typecast/__init__.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: livekit/agents PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-16T07:44:56.353Z
Learning: Implement Model Interface Pattern for STT, TTS, LLM, and Realtime models with provider-agnostic interfaces, fallback adapters for resilience, and stream adapters for different streaming patterns
📚 Learning: 2026-01-16T07:44:56.353Z
Learnt from: CR
Repo: livekit/agents PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-16T07:44:56.353Z
Learning: Follow the Plugin System pattern where plugins in livekit-plugins/ are separate packages registered via the Plugin base class
Applied to files:
examples/other/text-to-speech/requirements.txt
pyproject.toml
📚 Learning: 2026-01-22T03:28:16.289Z
Learnt from: longcw
Repo: livekit/agents PR: 4563
File: livekit-agents/livekit/agents/beta/tools/end_call.py:65-65
Timestamp: 2026-01-22T03:28:16.289Z
Learning: In code paths that check capabilities or behavior of the LLM processing the current interaction, prefer using the activity's LLM obtained via ctx.session.current_agent._get_activity_or_raise().llm instead of ctx.session.llm. The session-level LLM may be a fallback and not reflect the actual agent handling the interaction. Use the activity LLM to determine capabilities and to make capability checks or feature toggles relevant to the current processing agent.
Applied to files:
livekit-agents/livekit/agents/cli/cli.py
🧬 Code graph analysis (3)
tests/test_tts.py (1)
livekit-plugins/livekit-plugins-typecast/livekit/plugins/typecast/tts.py (1)
TTS(55-256)
livekit-plugins/livekit-plugins-typecast/livekit/plugins/typecast/models.py (2)
livekit-plugins/livekit-plugins-hume/livekit/plugins/hume/tts.py (1)
AudioFormat(68-73)
livekit-plugins/livekit-plugins-typecast/livekit/plugins/typecast/tts.py (1)
model(112-113)
livekit-plugins/livekit-plugins-typecast/livekit/plugins/typecast/tts.py (3)
livekit-agents/livekit/agents/_exceptions.py (3)
APIConnectionError(84-88)
APIStatusError(45-81)
APITimeoutError(91-95)
livekit-agents/livekit/agents/types.py (1)
APIConnectOptions(54-88)
livekit-agents/livekit/agents/utils/misc.py (2)
is_given(25-26)
shortuuid(21-22)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit-tests
- GitHub Check: type-check (3.13)
🔇 Additional comments (9)
livekit-plugins/livekit-plugins-typecast/README.md (1)
1-15: Clear and helpful README.
livekit-agents/livekit/agents/cli/cli.py (4)
724-726: Formatting improvement looks good.
980-982: Readable prompt formatting.
989-991: Shortcut rendering update is clear.
1167-1183: Nice UX improvement for JSON outputs.
examples/other/text-to-speech/requirements.txt (1)
6-6: Dependency addition looks good.
livekit-plugins/livekit-plugins-typecast/livekit/plugins/typecast/version.py (1)
1-15: Version module is straightforward.
tests/docker-compose.yml (1)
63-64: Test environment wiring looks consistent. Also applies to: 84-84
pyproject.toml (1)
49-49: Workspace source addition is correct.
- Add emotion control with presets and intensity adjustment
- Support 30+ languages (ISO 639-3)
- Audio customization support
- Include standard LiveKit agent example
- Add Docker test configuration
Address PR review feedback to improve consistency with other TTS plugins.
Changes:
- Add DEFAULT_VOICE_ID constant (Olivia voice)
- Add Voice dataclass for voice metadata
- Make voice parameter optional with default value
- Implement list_voices() method to query available voices
- Add voice listing example to demo script
- Export Voice and DEFAULT_VOICE_ID in public API
The plugin now follows the same pattern as ElevenLabs, providing:
- Default voice for quick start
- Voice discovery through list_voices()
- Improved user experience
- Change mime_type from 'audio/pcm' to 'audio/wav' to leverage AudioEmitter's built-in WAV header parsing
- Remove manual 44-byte WAV header stripping (AudioEmitter handles this automatically)
- Update livekit-agents dependency from >=1.2.14 to >=1.3.10
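To illustrate why parsing the WAV header is preferable to slicing off a fixed 44 bytes, here is a small sketch using only Python's standard `wave` module. It is not how AudioEmitter is implemented; `make_wav` and `read_wav` are hypothetical helpers for this example.

```python
import io
import wave
from typing import Tuple


def make_wav(pcm: bytes, sample_rate: int = 44100, channels: int = 1) -> bytes:
    """Wrap 16-bit PCM samples in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        writer.setframerate(sample_rate)
        writer.writeframes(pcm)
    return buf.getvalue()


def read_wav(data: bytes) -> Tuple[int, int, bytes]:
    """Return (sample_rate, channels, pcm) by parsing the header."""
    with wave.open(io.BytesIO(data), "rb") as reader:
        frames = reader.readframes(reader.getnframes())
        return reader.getframerate(), reader.getnchannels(), frames
```

A parser reads the sample rate and channel count from the file itself and locates the data chunk wherever it starts, whereas a fixed-offset slice silently corrupts audio if the header carries extra chunks.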
…string and improved error handling
- Added a comprehensive docstring to the entrypoint function, outlining its purpose and arguments.
- Improved error handling when listing available voices, logging warnings for exceptions.
- Streamlined the synthesis examples for emotional expression and audio adjustments, ensuring clarity and consistency in the demonstration flow.
- Implemented dynamic mapping of audio formats to MIME types, supporting 'mp3' and 'wav'.
- Updated output emitter initialization to use the appropriate MIME type based on user options.
- Ensured consistent sample rate of 44,100 Hz for Typecast v1 API outputs.
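A minimal sketch of the mapping this commit describes, assuming the two formats named in the review. The helper name `mime_type_for` and the choice to raise on unknown formats are assumptions; the actual plugin may fall back to a default instead.

```python
# Formats and MIME types as named in the review comment.
_MIME_BY_FORMAT = {"mp3": "audio/mpeg", "wav": "audio/wav"}

# Fixed output rate for Typecast v1, as stated in the review.
TYPECAST_SAMPLE_RATE = 44100


def mime_type_for(audio_format: str) -> str:
    """Return the MIME type for an audio format, or raise if unsupported."""
    try:
        return _MIME_BY_FORMAT[audio_format.lower()]
    except KeyError:
        raise ValueError(f"unsupported audio format: {audio_format!r}") from None
```

Raising on an unknown format surfaces a misconfiguration immediately instead of declaring a MIME type that does not match the bytes being streamed.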
ed10b26 to 6315704
Summary
Adds Typecast TTS plugin for creating lifelike speech with unique character voices and precise emotion control.
About Typecast
Typecast is a text-to-speech service that offers 180+ unique AI voices with distinct personalities across multiple languages. The Typecast API enables precise customization of tone, emotion, speed, and pitch, allowing developers to create rich, expressive audio that brings conversational AI to life. With ultra-fast synthesis speeds, it can turn hours of content into high-quality audio in minutes.
Key Features
Character Voices & Emotion Control
normal, happy, sad, angry, etc.
Audio Customization
Implementation
Testing
tests/test_tts.py
Files Changed
Plugin: livekit-plugins/livekit-plugins-typecast/
Example: examples/other/text-to-speech/typecast_tts.py
Configuration:
pyproject.toml: workspace configuration
tests/docker-compose.yml: test environment
tests/test_tts.py: test integration
examples/other/text-to-speech/requirements.txt
Summary by CodeRabbit
New Features
Tests
Chores