Skip to content

Conversation

@007DXR
Copy link

@007DXR 007DXR commented Jan 30, 2026

Summary

Adds streaming text-to-speech (TTS) support to the Azure plugin using the Azure Speech SDK with WebSocket-based streaming. This enables real-time audio synthesis with lower latency compared to the chunked HTTP approach.

Changes

Streaming TTS Implementation

  • Implemented SynthesizeStream class using Azure Speech SDK's TextStream input type
  • Audio chunks are streamed back via SDK callbacks as synthesis progresses
  • Proper cancellation handling with stop_speaking_async() for clean interruption

Connection Pooling

  • Added connection pool for Azure Speech synthesizers with configurable pre-warming (num_prewarm parameter, default: 3)
  • Pre-connects WebSocket connections using Connection.from_speech_synthesizer() for faster first synthesis
  • Automatic pool maintenance with jittered expiry times to prevent thundering herd on reconnection
  • Failed/interrupted synthesizers are removed from pool and proactively replaced

Documentation

  • Updated README with comprehensive setup instructions for Azure Speech Services
  • Added quick start examples for pipeline mode usage
  • Documented environment variables for Azure Speech and Azure OpenAI

Testing

  • Added Azure to the STREAM_TTS test suite
  • Skipped toxiproxy timeout tests for Azure (WebSocket SDK incompatible with HTTP proxy timeouts)
  • Added Azure to CI test matrix

Summary by CodeRabbit

  • New Features

    • Azure text-to-speech now supports streaming capabilities
  • Documentation

    • Azure plugin documentation significantly updated with detailed setup guide, prerequisites for Azure Speech Services and Azure OpenAI, environment variable configuration instructions, and Quick Start example

✏️ Tip: You can customize this high-level summary in your review settings.


Open with Devin

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 30, 2026

📝 Walkthrough

Walkthrough

This PR expands Azure plugin capabilities by adding Azure to the test workflow matrix, introducing Azure TTS streaming support with synthesizer pool management and prewarming, updating documentation with prerequisites and quick-start examples, and extending test coverage for Azure TTS integration.

Changes

Cohort / File(s) Summary
CI/CD Configuration
.github/workflows/tests.yml
Added Azure plugin to test matrix to expand CI/CD coverage.
Documentation Updates
livekit-plugins/livekit-plugins-azure/README.md
Expanded with title capitalization, comprehensive prerequisites section (Azure Credentials, Azure OpenAI), quick-start example demonstrating STT/TTS/LLM integration, and additional resource links.
TTS Streaming Implementation
livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py
Introduced Azure TTS streaming with synthesizer pool management, prewarming logic, async synthesizer creation/cleanup, new SynthesizeStream class for streaming pipelines, and support for audio/text persistence. Updated TTS.__init__ with num_prewarm parameter and enabled streaming capabilities.
Test Coverage
tests/test_tts.py
Added Azure TTS to synthesize and stream test parameterizations with timeout test skips and proper cleanup to prevent task leaks.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant TTS as Azure TTS
    participant Pool as Synthesizer Pool
    participant Stream as SynthesizeStream
    participant Synth as SpeechSynthesizer
    participant Azure as Azure WebSocket

    Client->>TTS: stream()
    TTS->>Stream: create SynthesizeStream
    Stream->>Stream: start background prewarm
    
    loop Prewarm (num_prewarm synthesizers)
        Pool->>Synth: create & configure
        Synth->>Azure: connect WebSocket
        Azure-->>Synth: ready
        Synth-->>Pool: add to pool
    end

    Client->>Stream: send text chunks
    Stream->>Pool: acquire synthesizer
    Pool-->>Stream: return warmed synth
    
    Stream->>Synth: stream text (configure voice/format)
    Synth->>Azure: stream audio via WebSocket
    Azure-->>Synth: audio chunks + metadata
    
    loop Audio streaming
        Synth->>Stream: on_synthesis_complete callback
        Stream->>Stream: queue audio data
        Stream-->>Client: emit audio chunks
    end

    Stream->>Pool: release/replace synthesizer
    Pool->>Synth: close if expired/failed
    
    Client->>TTS: aclose()
    TTS->>Stream: cancel prewarm tasks
    Stream->>Pool: shutdown all synthesizers
    Pool->>Synth: detach handlers & close
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested reviewers

  • chenghao-mou
  • davidzhao

Poem

🐰 Azure streams now flow so fast,
With pools of synths, they're built to last!
Pre-warmed connections, ready to sing,
TTS streaming takes its wing! 🎵

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 48.28% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat(azure): Add streaming TTS support with connection pooling' accurately summarizes the main changes: adding streaming TTS support to Azure with connection pooling, which aligns with the PR's primary objective.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

🧪 Unit Test Generation v2 is now available!

We have significantly improved our unit test generation capabilities.

To enable: Add this to your .coderabbit.yaml configuration:

reviews:
  finishing_touches:
    unit_tests:
      enabled: true

Try it out by using the @coderabbitai generate unit tests command on your code files or under ✨ Finishing Touches on the walkthrough!

Have feedback? Share your thoughts on our Discord thread!


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@CLAassistant
Copy link

CLAassistant commented Jan 30, 2026

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Fix all issues with AI agents
In `@livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py`:
- Line 685: The call to
asyncio.create_task(self._create_replacement_synthesizers(needed)) is
fire-and-forget and may be garbage-collected; change it to create and retain the
Task so it isn’t collected and can be cleaned up: instantiate the Task via
asyncio.create_task for _create_replacement_synthesizers, store the Task in a
container (e.g., self._background_tasks set) and register a done callback to
remove it (use task.add_done_callback(self._background_tasks.discard)) so you
can manage lifecycle and cancel/await tasks during shutdown.
- Around line 374-384: Replace the deprecated asyncio.get_event_loop() call with
asyncio.get_running_loop() inside the async block that runs
_sync_create_and_warmup in a threadpool; specifically update the variable
assignment used by loop.run_in_executor in the async create/warmup section (the
code that catches asyncio.TimeoutError and raises APITimeoutError /
APIConnectionError) so it uses asyncio.get_running_loop() to obtain the current
loop.
- Around line 802-803: Remove the raw ANSI escape sequences from the logger
output in the SDK synthesis thread startup log: update the logger.debug call
that currently uses threading.current_thread().ident (variable thread_id) to log
a plain, uncolored message like "SDK synthesis thread started
(thread_id={thread_id})" so coloring is left to logging formatters instead of
embedding "\033[92m" and "\033[0m" in the message.
- Around line 794-796: Replace deprecated asyncio.get_event_loop() with
asyncio.get_running_loop() when capturing the loop before spawning threads;
specifically update the call in the tts module where "loop =
asyncio.get_event_loop()" is used (and the similar usage inside
_create_and_warmup_synthesizer) to use asyncio.get_running_loop() so the running
event loop is retrieved safely in async contexts.
- Around line 165-183: Summary: Docstring default for num_prewarm is
inconsistent with the function signature. Update the constructor docstring for
the Azure TTS class (the __init__ that defines the num_prewarm parameter) so the
described default matches the actual default of num_prewarm: int = 3 (or
alternatively change the parameter default to 10 if you intended the docstring
to be correct); specifically edit the "num_prewarm" line in the docstring to
state default 3 (or change the parameter in the __init__ signature) so the
docstring and the num_prewarm parameter remain consistent.
- Around line 54-60: The SDK_OUTPUT_FORMATS mapping is missing entries for 22050
and 44100 Hz, causing a legit SUPPORTED_OUTPUT_FORMATS selection (checked in the
class constructor) to silently fall back during streaming synthesis; add
mappings for 22050 ->
speechsdk.SpeechSynthesisOutputFormat.Raw22050Hz16BitMonoPcm and 44100 ->
speechsdk.SpeechSynthesisOutputFormat.Raw44100Hz16BitMonoPcm to
SDK_OUTPUT_FORMATS so it matches SUPPORTED_OUTPUT_FORMATS and the HTTP API, and
update the comment to reflect Raw formats for these sample rates as well.

In `@tests/test_tts.py`:
- Around line 415-421: The test creates azure.TTS() without specifying region
which lets it read AZURE_SPEECH_REGION and can mismatch the hardcoded
proxy-upstream "westus.tts.speech.microsoft.com:443"; update the test parameter
to construct the client with speech_region="westus" (i.e.,
azure.TTS(speech_region="westus")) so the TTS client builds endpoints that match
the proxy-upstream and ensure consistent routing.
🧹 Nitpick comments (2)
livekit-plugins/livekit-plugins-azure/README.md (1)

48-69: Consider clarifying the Quick Start introduction.

Line 50 says "For more control over individual components:" which implies there should be a simpler example shown first (e.g., a basic pipeline mode). However, no such simpler example precedes this section. Consider either:

  1. Removing/rephrasing the introductory text, or
  2. Adding a simpler example before this one

Also, the example shows hardcoded credentials (api_key="your-api-key"). While this is placeholder text, consider adding a comment or note encouraging users to use environment variables instead for better security practices.

livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py (1)

419-450: Accessing private pool internals creates fragile coupling.

The prewarm() method directly accesses _connect_lock, _connections, and _available on the ConnectionPool. This couples tightly to the pool's internal implementation and will break if ConnectionPool changes its internals.

Consider whether ConnectionPool should expose a public prewarm API, or if this logic should be handled differently.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 843ffeb and ff43637.

📒 Files selected for processing (4)
  • .github/workflows/tests.yml
  • livekit-plugins/livekit-plugins-azure/README.md
  • livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py
  • tests/test_tts.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings

Files:

  • livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py
  • tests/test_tts.py
🧬 Code graph analysis (1)
tests/test_tts.py (2)
livekit-agents/livekit/agents/tts/tts.py (2)
  • label (85-86)
  • TTS (66-166)
livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py (1)
  • TTS (149-476)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: unit-tests
  • GitHub Check: type-check (3.13)
  • GitHub Check: type-check (3.9)
🔇 Additional comments (7)
.github/workflows/tests.yml (1)

86-95: LGTM!

Azure is correctly added to the plugin test matrix, and the required environment variables (AZURE_SPEECH_KEY, AZURE_SPEECH_REGION) are already configured in the secrets section.

tests/test_tts.py (3)

154-160: LGTM!

Azure TTS correctly added to the SYNTHESIZE_TTS test matrix with appropriate proxy configuration.


331-335: Good defensive cleanup before skip.

Properly closing the TTS instance before skipping prevents leaked async tasks. The skip reason clearly explains the WebSocket SDK incompatibility with toxiproxy.


678-682: Consistent skip pattern with synthesize timeout test.

The skip logic mirrors the pattern used in test_tts_synthesize_timeout, maintaining consistency across timeout tests.

livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py (3)

825-858: Blocking future.result() in callback may cause issues.

In synthesizing_callback, calling future.result() blocks until the async queue put completes. While this ensures ordering, it could potentially cause issues if the event loop is blocked or slow. The Azure SDK callback thread will be blocked waiting for the asyncio operation.

This is likely acceptable given the need for ordering, but worth noting for debugging if latency issues arise.


1088-1137: Cancellation handling is thorough and well-structured.

The cancellation flow properly:

  1. Sets the cancelled flag
  2. Sends stop signal to synthesizer (non-blocking)
  3. Signals SDK thread to stop via text_queue
  4. Waits for synthesis thread with timeout
  5. Flushes and ends segment
  6. Drains queues
  7. Re-raises to trigger pool cleanup

This comprehensive cleanup prevents resource leaks on interruption.


696-718: Task cancellation ordering could leave orphaned tasks.

If asyncio.gather(*tasks) raises an exception, gracefully_cancel(*tasks) is called in the finally block. However, if the gather completes normally but one task had already failed, the other task might still be running when gracefully_cancel is called.

The current pattern should work correctly since gracefully_cancel handles already-completed tasks, but ensure gracefully_cancel properly handles all edge cases.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Copy link

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View issue and 5 additional flags in Devin Review.

Open in Devin Review

Copy link

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View issue and 8 additional flags in Devin Review.

Open in Devin Review

Comment on lines +310 to +313
speech_config = speechsdk.SpeechConfig(
endpoint=endpoint,
subscription=self._opts.subscription_key or "",
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Streaming TTS ignores auth_token authentication, only uses subscription_key

When users authenticate using speech_auth_token instead of speech_key, the streaming TTS implementation fails silently because it only passes subscription_key to the Azure SDK.

Click to expand

Problem

The _create_and_warmup_synthesizer method at line 310-312 creates a SpeechConfig with only the subscription key:

speech_config = speechsdk.SpeechConfig(
    endpoint=endpoint,
    subscription=self._opts.subscription_key or "",
)

When subscription_key is None (because the user provided speech_auth_token instead), this passes an empty string, causing authentication to fail.

Contrast with ChunkedStream

The HTTP-based ChunkedStream._run() at tts.py:526-530 correctly handles both authentication methods:

if self._opts.auth_token:
    headers["Authorization"] = f"Bearer {self._opts.auth_token}"
elif self._opts.subscription_key:
    headers["Ocp-Apim-Subscription-Key"] = self._opts.subscription_key

Impact

Users who configure Azure TTS with speech_auth_token (e.g., using Microsoft Entra authentication) will have working non-streaming synthesis but broken streaming synthesis.

Recommendation: After creating the SpeechConfig, check for auth_token and set it:

speech_config = speechsdk.SpeechConfig(
    endpoint=endpoint,
    subscription=self._opts.subscription_key or "",
)
if self._opts.auth_token:
    speech_config.authorization_token = self._opts.auth_token
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants