feat(azure): Add streaming TTS support with connection pooling #4659
base: main
📝 Walkthrough
This PR expands Azure plugin capabilities by adding Azure to the test workflow matrix, introducing Azure TTS streaming support with synthesizer pool management and prewarming, updating documentation with prerequisites and quick-start examples, and extending test coverage for Azure TTS integration.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant TTS as Azure TTS
participant Pool as Synthesizer Pool
participant Stream as SynthesizeStream
participant Synth as SpeechSynthesizer
participant Azure as Azure WebSocket
Client->>TTS: stream()
TTS->>Stream: create SynthesizeStream
Stream->>Stream: start background prewarm
loop Prewarm (num_prewarm synthesizers)
Pool->>Synth: create & configure
Synth->>Azure: connect WebSocket
Azure-->>Synth: ready
Synth-->>Pool: add to pool
end
Client->>Stream: send text chunks
Stream->>Pool: acquire synthesizer
Pool-->>Stream: return warmed synth
Stream->>Synth: stream text (configure voice/format)
Synth->>Azure: stream audio via WebSocket
Azure-->>Synth: audio chunks + metadata
loop Audio streaming
Synth->>Stream: on_synthesis_complete callback
Stream->>Stream: queue audio data
Stream-->>Client: emit audio chunks
end
Stream->>Pool: release/replace synthesizer
Pool->>Synth: close if expired/failed
Client->>TTS: aclose()
TTS->>Stream: cancel prewarm tasks
Stream->>Pool: shutdown all synthesizers
Pool->>Synth: detach handlers & close
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In `@livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py`:
- Line 685: The call to
asyncio.create_task(self._create_replacement_synthesizers(needed)) is
fire-and-forget and may be garbage-collected; change it to create and retain the
Task so it isn’t collected and can be cleaned up: instantiate the Task via
asyncio.create_task for _create_replacement_synthesizers, store the Task in a
container (e.g., self._background_tasks set) and register a done callback to
remove it (use task.add_done_callback(self._background_tasks.discard)) so you
can manage lifecycle and cancel/await tasks during shutdown.
- Around line 374-384: Replace the deprecated asyncio.get_event_loop() call with
asyncio.get_running_loop() inside the async block that runs
_sync_create_and_warmup in a threadpool; specifically update the variable
assignment used by loop.run_in_executor in the async create/warmup section (the
code that catches asyncio.TimeoutError and raises APITimeoutError /
APIConnectionError) so it uses asyncio.get_running_loop() to obtain the current
loop.
- Around line 802-803: Remove the raw ANSI escape sequences from the logger
output in the SDK synthesis thread startup log: update the logger.debug call
that currently uses threading.current_thread().ident (variable thread_id) to log
a plain, uncolored message like "SDK synthesis thread started
(thread_id={thread_id})" so coloring is left to logging formatters instead of
embedding "\033[92m" and "\033[0m" in the message.
- Around line 794-796: Replace deprecated asyncio.get_event_loop() with
asyncio.get_running_loop() when capturing the loop before spawning threads;
specifically update the call in the tts module where "loop =
asyncio.get_event_loop()" is used (and the similar usage inside
_create_and_warmup_synthesizer) to use asyncio.get_running_loop() so the running
event loop is retrieved safely in async contexts.
- Around line 165-183: Summary: Docstring default for num_prewarm is
inconsistent with the function signature. Update the constructor docstring for
the Azure TTS class (the __init__ that defines the num_prewarm parameter) so the
described default matches the actual default of num_prewarm: int = 3 (or
alternatively change the parameter default to 10 if you intended the docstring
to be correct); specifically edit the "num_prewarm" line in the docstring to
state default 3 (or change the parameter in the __init__ signature) so the
docstring and the num_prewarm parameter remain consistent.
- Around line 54-60: The SDK_OUTPUT_FORMATS mapping is missing entries for 22050
and 44100 Hz, so a valid SUPPORTED_OUTPUT_FORMATS selection (checked in the
class constructor) silently falls back during streaming synthesis; add
mappings for 22050 ->
speechsdk.SpeechSynthesisOutputFormat.Raw22050Hz16BitMonoPcm and 44100 ->
speechsdk.SpeechSynthesisOutputFormat.Raw44100Hz16BitMonoPcm to
SDK_OUTPUT_FORMATS so it matches SUPPORTED_OUTPUT_FORMATS and the HTTP API, and
update the comment to reflect Raw formats for these sample rates as well.
In `@tests/test_tts.py`:
- Around line 415-421: The test creates azure.TTS() without specifying region
which lets it read AZURE_SPEECH_REGION and can mismatch the hardcoded
proxy-upstream "westus.tts.speech.microsoft.com:443"; update the test parameter
to construct the client with speech_region="westus" (i.e.,
azure.TTS(speech_region="westus")) so the TTS client builds endpoints that match
the proxy-upstream and ensure consistent routing.
🧹 Nitpick comments (2)
livekit-plugins/livekit-plugins-azure/README.md (1)
48-69: Consider clarifying the Quick Start introduction.
Line 50 says "For more control over individual components:" which implies there should be a simpler example shown first (e.g., a basic pipeline mode). However, no such simpler example precedes this section. Consider either:
- Removing/rephrasing the introductory text, or
- Adding a simpler example before this one
Also, the example shows hardcoded credentials (api_key="your-api-key"). While this is placeholder text, consider adding a comment or note encouraging users to use environment variables instead for better security practices.
livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py (1)
419-450: Accessing private pool internals creates fragile coupling.
The prewarm() method directly accesses _connect_lock, _connections, and _available on the ConnectionPool. This couples tightly to the pool's internal implementation and will break if ConnectionPool changes its internals.
Consider whether ConnectionPool should expose a public prewarm API, or if this logic should be handled differently.
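One way to picture the suggested alternative is a pool that owns its own prewarm logic. The sketch below assumes a hypothetical `SimplePool` class with invented method names; it is not livekit's `ConnectionPool` and makes no claim about that class's real API.

```python
import asyncio
from typing import Awaitable, Callable, Generic, List, TypeVar

T = TypeVar("T")


class SimplePool(Generic[T]):
    """Hypothetical pool used only to illustrate a public prewarm method."""

    def __init__(self, connect: Callable[[], Awaitable[T]]) -> None:
        self._connect = connect
        self._available: List[T] = []
        self._lock = asyncio.Lock()

    async def prewarm(self, count: int) -> None:
        """Top the pool up to `count` idle connections."""
        async with self._lock:
            missing = max(0, count - len(self._available))
            conns = await asyncio.gather(*(self._connect() for _ in range(missing)))
            self._available.extend(conns)

    async def acquire(self) -> T:
        # Reuse an idle connection when one exists, otherwise connect on demand.
        async with self._lock:
            if self._available:
                return self._available.pop()
        return await self._connect()

    def idle_count(self) -> int:
        return len(self._available)
```

With this shape, callers never touch the lock or the idle list directly, so the pool can change its internals without breaking them.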
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- .github/workflows/tests.yml
- livekit-plugins/livekit-plugins-azure/README.md
- livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py
- tests/test_tts.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings
Files:
- livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py
- tests/test_tts.py
🧬 Code graph analysis (1)
tests/test_tts.py (2)
livekit-agents/livekit/agents/tts/tts.py (2)
- label (85-86)
- TTS (66-166)
livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py (1)
- TTS (149-476)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: unit-tests
- GitHub Check: type-check (3.13)
- GitHub Check: type-check (3.9)
🔇 Additional comments (7)
.github/workflows/tests.yml (1)
86-95: LGTM!
Azure is correctly added to the plugin test matrix, and the required environment variables (AZURE_SPEECH_KEY, AZURE_SPEECH_REGION) are already configured in the secrets section.
tests/test_tts.py (3)
154-160: LGTM!
Azure TTS correctly added to the SYNTHESIZE_TTS test matrix with appropriate proxy configuration.
331-335: Good defensive cleanup before skip.
Properly closing the TTS instance before skipping prevents leaked async tasks. The skip reason clearly explains the WebSocket SDK incompatibility with toxiproxy.
678-682: Consistent skip pattern with synthesize timeout test.
The skip logic mirrors the pattern used in test_tts_synthesize_timeout, maintaining consistency across timeout tests.
livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py (3)
825-858: Blocking future.result() in callback may cause issues.
In synthesizing_callback, calling future.result() blocks until the async queue put completes. While this ensures ordering, it stalls the Azure SDK callback thread whenever the event loop is blocked or slow, because that thread waits on the asyncio operation.
This is likely acceptable given the need for ordering, but worth noting for debugging if latency issues arise.
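The trade-off described here can be reproduced with the standard library alone. This is a minimal sketch, assuming a hypothetical `sdk_thread` function that stands in for the SDK callback thread: each chunk is handed to the event loop with `asyncio.run_coroutine_threadsafe`, and `future.result()` blocks the worker thread (not the loop) until the put completes. The timeout value is an illustrative assumption.

```python
import asyncio
import threading


def sdk_thread(loop: asyncio.AbstractEventLoop, audio: "asyncio.Queue[bytes]") -> None:
    """Simulates a callback thread delivering audio chunks in order."""
    for i in range(3):
        future = asyncio.run_coroutine_threadsafe(audio.put(bytes([i])), loop)
        # Blocks this thread until the put has completed, which preserves
        # chunk ordering. The timeout bounds the stall if the loop is slow.
        future.result(timeout=5)
    # An empty chunk marks the end of the stream.
    asyncio.run_coroutine_threadsafe(audio.put(b""), loop).result(timeout=5)


async def collect() -> list:
    loop = asyncio.get_running_loop()
    audio: "asyncio.Queue[bytes]" = asyncio.Queue(maxsize=1)
    thread = threading.Thread(target=sdk_thread, args=(loop, audio), daemon=True)
    thread.start()
    chunks = []
    while True:
        chunk = await audio.get()
        if not chunk:
            break
        chunks.append(chunk)
    # Join off the loop: the thread may still be waiting on a loop callback.
    await loop.run_in_executor(None, thread.join)
    return chunks
```

Joining the thread directly inside the coroutine would block the loop while the thread waits on it, which is the same class of stall the comment warns about.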
1088-1137: Cancellation handling is thorough and well-structured.
The cancellation flow properly:
- Sets the cancelled flag
- Sends stop signal to synthesizer (non-blocking)
- Signals SDK thread to stop via text_queue
- Waits for synthesis thread with timeout
- Flushes and ends segment
- Drains queues
- Re-raises to trigger pool cleanup
This comprehensive cleanup prevents resource leaks on interruption.
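A reduced version of this flow can be sketched with a worker thread and a sentinel. The example below is an assumption-laden illustration, not the PR's code: `worker`, `run_stream`, and `_STOP` are hypothetical names, and the synthesizer stop signal and segment flushing steps are omitted.

```python
import asyncio
import queue
import threading

_STOP = object()  # sentinel telling the worker thread to exit


def worker(text_queue: "queue.Queue[object]", seen: list) -> None:
    """Stand-in for the SDK synthesis thread: consumes text until stopped."""
    while True:
        item = text_queue.get()
        if item is _STOP:
            return
        seen.append(item)


async def run_stream(text_queue: "queue.Queue[object]", state: dict) -> None:
    seen: list = []
    thread = threading.Thread(target=worker, args=(text_queue, seen), daemon=True)
    thread.start()
    loop = asyncio.get_running_loop()
    try:
        await asyncio.sleep(3600)  # placeholder for the real streaming work
    except asyncio.CancelledError:
        state["cancelled"] = True  # set the cancelled flag
        text_queue.put(_STOP)  # signal the thread to stop via the text queue
        # Wait for the thread with a timeout, without blocking the event loop.
        await loop.run_in_executor(None, thread.join, 2.0)
        state["thread_alive"] = thread.is_alive()
        # Drain anything left in the queue.
        while not text_queue.empty():
            text_queue.get_nowait()
        raise  # re-raise so the caller can run its own cleanup


async def demo() -> dict:
    state: dict = {}
    text_queue: "queue.Queue[object]" = queue.Queue()
    text_queue.put("hello")
    task = asyncio.create_task(run_stream(text_queue, state))
    await asyncio.sleep(0.05)  # let the stream start before cancelling
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        state["reraised"] = True
    return state
```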
696-718: Task cancellation ordering could leave orphaned tasks.
If asyncio.gather(*tasks) raises an exception, gracefully_cancel(*tasks) is called in the finally block. However, if the gather completes normally but one task had already failed, the other task might still be running when gracefully_cancel is called.
The current pattern should work correctly since gracefully_cancel handles already-completed tasks, but ensure gracefully_cancel properly handles all edge cases.
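The edge case can be exercised in isolation. The sketch below assumes a hypothetical `cancel_and_wait` helper as a stand-in; it does not reproduce the behavior of livekit's `gracefully_cancel`. It shows that when one task fails, `asyncio.gather` propagates the error while the sibling keeps running until it is cancelled explicitly.

```python
import asyncio


async def cancel_and_wait(*tasks: "asyncio.Task") -> None:
    """Hypothetical helper: cancel unfinished tasks and wait for all to settle."""
    for task in tasks:
        if not task.done():
            task.cancel()
    # return_exceptions=True absorbs CancelledError and earlier failures alike.
    await asyncio.gather(*tasks, return_exceptions=True)


async def run_pair() -> list:
    async def fails() -> None:
        raise RuntimeError("input failed")

    async def long_running() -> None:
        await asyncio.sleep(3600)

    tasks = [asyncio.create_task(fails()), asyncio.create_task(long_running())]
    try:
        await asyncio.gather(*tasks)
    except RuntimeError:
        pass  # gather propagates the first failure; the sibling is still running
    finally:
        await cancel_and_wait(*tasks)
    return [task.done() for task in tasks]
```

Skipping tasks that are already done keeps the helper safe to call on any mix of finished, failed, and pending tasks.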
speech_config = speechsdk.SpeechConfig(
    endpoint=endpoint,
    subscription=self._opts.subscription_key or "",
)
🔴 Streaming TTS ignores auth_token authentication, only uses subscription_key
When users authenticate using speech_auth_token instead of speech_key, the streaming TTS implementation fails silently because it only passes subscription_key to the Azure SDK.
Problem
The _create_and_warmup_synthesizer method at lines 310-312 creates a SpeechConfig with only the subscription key:
speech_config = speechsdk.SpeechConfig(
    endpoint=endpoint,
    subscription=self._opts.subscription_key or "",
)
When subscription_key is None (because the user provided speech_auth_token instead), this passes an empty string, causing authentication to fail.
Contrast with ChunkedStream
The HTTP-based ChunkedStream._run() at tts.py:526-530 correctly handles both authentication methods:
if self._opts.auth_token:
    headers["Authorization"] = f"Bearer {self._opts.auth_token}"
elif self._opts.subscription_key:
    headers["Ocp-Apim-Subscription-Key"] = self._opts.subscription_key
Impact
Users who configure Azure TTS with speech_auth_token (e.g., using Microsoft Entra authentication) will have working non-streaming synthesis but broken streaming synthesis.
Recommendation: After creating the SpeechConfig, check for auth_token and set it:
speech_config = speechsdk.SpeechConfig(
    endpoint=endpoint,
    subscription=self._opts.subscription_key or "",
)
if self._opts.auth_token:
    speech_config.authorization_token = self._opts.auth_token
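The precedence that ChunkedStream applies (token first, then key) can be isolated so that both code paths resolve credentials the same way. This is a minimal sketch, assuming a hypothetical `build_auth_headers` function that is not part of the plugin; raising `ValueError` when neither credential is present is also an assumption.

```python
from typing import Dict, Optional


def build_auth_headers(
    auth_token: Optional[str], subscription_key: Optional[str]
) -> Dict[str, str]:
    """Prefer a bearer token; fall back to the subscription key."""
    if auth_token:
        return {"Authorization": f"Bearer {auth_token}"}
    if subscription_key:
        return {"Ocp-Apim-Subscription-Key": subscription_key}
    # Assumption: failing loudly is preferable to sending an empty credential.
    raise ValueError("either auth_token or subscription_key is required")
```

A shared helper like this makes the empty-string fallback in the streaming path easier to spot, since an absent key becomes an explicit error.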
Summary
Adds streaming text-to-speech (TTS) support to the Azure plugin using the Azure Speech SDK with WebSocket-based streaming. This enables real-time audio synthesis with lower latency compared to the chunked HTTP approach.
Changes
Streaming TTS Implementation
- SynthesizeStream class using Azure Speech SDK's TextStream input type
- stop_speaking_async() for clean interruption
Connection Pooling
- num_prewarm parameter (default: 3)
- Connection.from_speech_synthesizer() for faster first synthesis
Documentation
Testing
- STREAM_TTS test suite
Summary by CodeRabbit
New Features
Documentation