feat(azure): Add streaming TTS support with connection pooling #4659

007DXR · 2026-01-30T11:04:05Z

Summary

Adds streaming text-to-speech (TTS) support to the Azure plugin using the Azure Speech SDK with WebSocket-based streaming. This enables real-time audio synthesis with lower latency than the existing chunked HTTP approach.

Changes

Streaming TTS Implementation

- Implemented SynthesizeStream class using Azure Speech SDK's TextStream input type
- Audio chunks are streamed back via SDK callbacks as synthesis progresses
- Proper cancellation handling with stop_speaking_async() for clean interruption

Connection Pooling

- Added connection pool for Azure Speech synthesizers with configurable pre-warming (num_prewarm parameter, default: 3)
- Pre-connects WebSocket connections using Connection.from_speech_synthesizer() for faster first synthesis
- Automatic pool maintenance with jittered expiry times to prevent thundering herd on reconnection
- Failed/interrupted synthesizers are removed from pool and proactively replaced
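
The jittered expiry mentioned above can be illustrated with a minimal sketch. This is not the plugin's code: `PooledItem`, `jittered_expiry`, `prune_expired`, the 300-second lifetime and the 20% jitter ratio are all hypothetical names and values chosen for illustration. The idea is only that each pooled connection gets a slightly different lifetime, so a pool that was warmed at the same moment does not reconnect all at once.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class PooledItem:
    """Hypothetical pool entry; `resource` stands in for a warmed synthesizer."""

    resource: object
    expires_at: float


def jittered_expiry(
    now: float, base_ttl: float, jitter_ratio: float, rng: random.Random
) -> float:
    """Return an expiry time drawn uniformly from base_ttl * (1 +/- jitter_ratio)."""
    spread = base_ttl * jitter_ratio
    return now + base_ttl + rng.uniform(-spread, spread)


def prune_expired(pool: list[PooledItem], now: float) -> list[PooledItem]:
    """Keep only the entries whose jittered lifetime has not elapsed yet."""
    return [item for item in pool if item.expires_at > now]


if __name__ == "__main__":
    rng = random.Random()
    now = time.monotonic()
    # Assumed values: 300 s base lifetime, +/- 20% jitter, 3 pre-warmed entries.
    pool = [
        PooledItem(f"synth-{i}", jittered_expiry(now, 300.0, 0.2, rng))
        for i in range(3)
    ]
    for item in pool:
        print(item.resource, round(item.expires_at - now, 1), "seconds to live")
```

Because the expiry is fixed when an entry is created, pruning stays a cheap comparison against the clock; replacements created after pruning get their own independent jitter.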

Documentation

- Updated README with comprehensive setup instructions for Azure Speech Services
- Added quick start examples for pipeline mode usage
- Documented environment variables for Azure Speech and Azure OpenAI

Testing

- Added Azure to the STREAM_TTS test suite
- Skipped toxiproxy timeout tests for Azure (WebSocket SDK incompatible with HTTP proxy timeouts)
- Added Azure to CI test matrix

Summary by CodeRabbit

New Features
- Azure text-to-speech now supports streaming capabilities
Documentation
- Azure plugin documentation significantly updated with detailed setup guide, prerequisites for Azure Speech Services and Azure OpenAI, environment variable configuration instructions, and Quick Start example

coderabbitai · 2026-01-30T11:04:24Z

📝 Walkthrough

This PR expands Azure plugin capabilities by adding Azure to the test workflow matrix, introducing Azure TTS streaming support with synthesizer pool management and prewarming, updating documentation with prerequisites and quick-start examples, and extending test coverage for Azure TTS integration.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CI/CD Configuration `.github/workflows/tests.yml` | Added Azure plugin to test matrix to expand CI/CD coverage. |
| Documentation Updates `livekit-plugins/livekit-plugins-azure/README.md` | Expanded with title capitalization, comprehensive prerequisites section (Azure Credentials, Azure OpenAI), quick-start example demonstrating STT/TTS/LLM integration, and additional resource links. |
| TTS Streaming Implementation `livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py` | Introduced Azure TTS streaming with synthesizer pool management, prewarming logic, async synthesizer creation/cleanup, new `SynthesizeStream` class for streaming pipelines, and support for audio/text persistence. Updated `TTS.__init__` with `num_prewarm` parameter and enabled streaming capabilities. |
| Test Coverage `tests/test_tts.py` | Added Azure TTS to synthesize and stream test parameterizations with timeout test skips and proper cleanup to prevent task leaks. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant TTS as Azure TTS
    participant Pool as Synthesizer Pool
    participant Stream as SynthesizeStream
    participant Synth as SpeechSynthesizer
    participant Azure as Azure WebSocket

    Client->>TTS: stream()
    TTS->>Stream: create SynthesizeStream
    Stream->>Stream: start background prewarm
    
    loop Prewarm (num_prewarm synthesizers)
        Pool->>Synth: create & configure
        Synth->>Azure: connect WebSocket
        Azure-->>Synth: ready
        Synth-->>Pool: add to pool
    end

    Client->>Stream: send text chunks
    Stream->>Pool: acquire synthesizer
    Pool-->>Stream: return warmed synth
    
    Stream->>Synth: stream text (configure voice/format)
    Synth->>Azure: stream audio via WebSocket
    Azure-->>Synth: audio chunks + metadata
    
    loop Audio streaming
        Synth->>Stream: on_synthesis_complete callback
        Stream->>Stream: queue audio data
        Stream-->>Client: emit audio chunks
    end

    Stream->>Pool: release/replace synthesizer
    Pool->>Synth: close if expired/failed
    
    Client->>TTS: aclose()
    TTS->>Stream: cancel prewarm tasks
    Stream->>Pool: shutdown all synthesizers
    Pool->>Synth: detach handlers & close

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested reviewers

chenghao-mou
davidzhao

Poem

🐰 Azure streams now flow so fast,
With pools of synths, they're built to last!
Pre-warmed connections, ready to sing,
TTS streaming takes its wing! 🎵

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 48.28% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(azure): Add streaming TTS support with connection pooling' accurately summarizes the main changes: adding streaming TTS support to Azure with connection pooling, which aligns with the PR's primary objective. |

CLAassistant · 2026-01-30T11:06:49Z

All committers have signed the CLA.

coderabbitai

Actionable comments posted: 7

🤖 Fix all issues with AI agents

In `@livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py`:
- Line 685: The call to
asyncio.create_task(self._create_replacement_synthesizers(needed)) is
fire-and-forget and may be garbage-collected; change it to create and retain the
Task so it isn’t collected and can be cleaned up: instantiate the Task via
asyncio.create_task for _create_replacement_synthesizers, store the Task in a
container (e.g., self._background_tasks set) and register a done callback to
remove it (use task.add_done_callback(self._background_tasks.discard)) so you
can manage lifecycle and cancel/await tasks during shutdown.
- Around line 374-384: Replace the deprecated asyncio.get_event_loop() call with
asyncio.get_running_loop() inside the async block that runs
_sync_create_and_warmup in a threadpool; specifically update the variable
assignment used by loop.run_in_executor in the async create/warmup section (the
code that catches asyncio.TimeoutError and raises APITimeoutError /
APIConnectionError) so it uses asyncio.get_running_loop() to obtain the current
loop.
- Around line 802-803: Remove the raw ANSI escape sequences from the logger
output in the SDK synthesis thread startup log: update the logger.debug call
that currently uses threading.current_thread().ident (variable thread_id) to log
a plain, uncolored message like "SDK synthesis thread started
(thread_id={thread_id})" so coloring is left to logging formatters instead of
embedding "\033[92m" and "\033[0m" in the message.
- Around line 794-796: Replace deprecated asyncio.get_event_loop() with
asyncio.get_running_loop() when capturing the loop before spawning threads;
specifically update the call in the tts module where "loop =
asyncio.get_event_loop()" is used (and the similar usage inside
_create_and_warmup_synthesizer) to use asyncio.get_running_loop() so the running
event loop is retrieved safely in async contexts.
- Around line 165-183: Summary: Docstring default for num_prewarm is
inconsistent with the function signature. Update the constructor docstring for
the Azure TTS class (the __init__ that defines the num_prewarm parameter) so the
described default matches the actual default of num_prewarm: int = 3 (or
alternatively change the parameter default to 10 if you intended the docstring
to be correct); specifically edit the "num_prewarm" line in the docstring to
state default 3 (or change the parameter in the __init__ signature) so the
docstring and the num_prewarm parameter remain consistent.
- Around line 54-60: The SDK_OUTPUT_FORMATS mapping is missing entries for 22050
and 44100 Hz, causing a legit SUPPORTED_OUTPUT_FORMATS selection (checked in the
class constructor) to silently fall back during streaming synthesis; add
mappings for 22050 ->
speechsdk.SpeechSynthesisOutputFormat.Raw22050Hz16BitMonoPcm and 44100 ->
speechsdk.SpeechSynthesisOutputFormat.Raw44100Hz16BitMonoPcm to
SDK_OUTPUT_FORMATS so it matches SUPPORTED_OUTPUT_FORMATS and the HTTP API, and
update the comment to reflect Raw formats for these sample rates as well.
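
Two of the items above (retaining background tasks, and using `asyncio.get_running_loop()` with `run_in_executor`) follow a common asyncio pattern that can be sketched as below. This is an illustrative sketch only: `BackgroundWork`, `spawn`, `run_blocking` and `aclose` are hypothetical names, not the plugin's API, and the blocking callable stands in for something like `_sync_create_and_warmup`.

```python
import asyncio
from typing import Any, Callable, Coroutine


class BackgroundWork:
    """Hypothetical helper showing task retention and executor usage."""

    def __init__(self) -> None:
        # Strong references keep fire-and-forget tasks from being garbage-collected.
        self._background_tasks: set[asyncio.Task[Any]] = set()

    def spawn(self, coro: Coroutine[Any, Any, Any]) -> asyncio.Task[Any]:
        task = asyncio.create_task(coro)
        self._background_tasks.add(task)
        # Drop the reference automatically once the task finishes.
        task.add_done_callback(self._background_tasks.discard)
        return task

    async def run_blocking(self, fn: Callable[[], Any], timeout: float) -> Any:
        # get_running_loop() is the non-deprecated way to obtain the loop
        # from inside a coroutine.
        loop = asyncio.get_running_loop()
        return await asyncio.wait_for(loop.run_in_executor(None, fn), timeout=timeout)

    async def aclose(self) -> None:
        # Cancel whatever is still running and wait for it to settle.
        tasks = list(self._background_tasks)
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


async def main() -> None:
    work = BackgroundWork()
    work.spawn(asyncio.sleep(0.01))
    value = await work.run_blocking(lambda: sum(range(10)), timeout=1.0)
    print("blocking call returned", value)
    await work.aclose()


if __name__ == "__main__":
    asyncio.run(main())
```

Note that `asyncio.wait_for` only stops waiting when the timeout elapses; the function already running in the executor thread is not interrupted, so a real implementation would still need its own way to abandon or clean up that work.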

In `@tests/test_tts.py`:
- Around line 415-421: The test creates azure.TTS() without specifying region
which lets it read AZURE_SPEECH_REGION and can mismatch the hardcoded
proxy-upstream "westus.tts.speech.microsoft.com:443"; update the test parameter
to construct the client with speech_region="westus" (i.e.,
azure.TTS(speech_region="westus")) so the TTS client builds endpoints that match
the proxy-upstream and ensure consistent routing.

🧹 Nitpick comments (2)

livekit-plugins/livekit-plugins-azure/README.md (1)

48-69: Consider clarifying the Quick Start introduction.

Line 50 says "For more control over individual components:" which implies there should be a simpler example shown first (e.g., a basic pipeline mode). However, no such simpler example precedes this section. Consider either:

- Removing/rephrasing the introductory text, or
- Adding a simpler example before this one

Also, the example shows hardcoded credentials (api_key="your-api-key"). While this is placeholder text, consider adding a comment or note encouraging users to use environment variables instead for better security practices.

livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py (1)

419-450: Accessing private pool internals creates fragile coupling.

The prewarm() method directly accesses _connect_lock, _connections, and _available on the ConnectionPool. This couples tightly to the pool's internal implementation and will break if ConnectionPool changes its internals.

Consider whether ConnectionPool should expose a public prewarm API, or if this logic should be handled differently.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 843ffeb and ff43637.

📒 Files selected for processing (4)

.github/workflows/tests.yml
livekit-plugins/livekit-plugins-azure/README.md
livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py
tests/test_tts.py

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings

Files:

livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py
tests/test_tts.py

🧬 Code graph analysis (1)

tests/test_tts.py (2)

livekit-agents/livekit/agents/tts/tts.py (2)

label (85-86)

TTS (66-166)

livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py (1)

TTS (149-476)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: unit-tests
GitHub Check: type-check (3.13)
GitHub Check: type-check (3.9)

🔇 Additional comments (7)

.github/workflows/tests.yml (1)

86-95: LGTM!

Azure is correctly added to the plugin test matrix, and the required environment variables (AZURE_SPEECH_KEY, AZURE_SPEECH_REGION) are already configured in the secrets section.

tests/test_tts.py (3)

154-160: LGTM!

Azure TTS correctly added to the SYNTHESIZE_TTS test matrix with appropriate proxy configuration.

331-335: Good defensive cleanup before skip.

Properly closing the TTS instance before skipping prevents leaked async tasks. The skip reason clearly explains the WebSocket SDK incompatibility with toxiproxy.

678-682: Consistent skip pattern with synthesize timeout test.

The skip logic mirrors the pattern used in test_tts_synthesize_timeout, maintaining consistency across timeout tests.

livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py (3)

825-858: Blocking future.result() in callback may cause issues.

In synthesizing_callback, calling future.result() blocks until the async queue put completes. While this ensures ordering, it could potentially cause issues if the event loop is blocked or slow. The Azure SDK callback thread will be blocked waiting for the asyncio operation.

This is likely acceptable given the need for ordering, but worth noting for debugging if latency issues arise.
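
The trade-off described here can be seen in a small sketch of the general pattern: a non-asyncio thread hands chunks to an `asyncio.Queue` through `asyncio.run_coroutine_threadsafe` and blocks on the returned future. The names `sdk_thread` and `collect`, the queue size and the 5-second timeout are assumptions made for illustration and do not come from the PR.

```python
import asyncio
import threading
from typing import List, Optional


def sdk_thread(
    loop: asyncio.AbstractEventLoop,
    queue: "asyncio.Queue[Optional[bytes]]",
    chunks: List[bytes],
) -> None:
    """Stands in for an SDK callback thread delivering audio chunks in order."""
    for chunk in chunks:
        future = asyncio.run_coroutine_threadsafe(queue.put(chunk), loop)
        # Blocks this worker thread (not the event loop) until the put completes,
        # which preserves ordering and applies backpressure.
        future.result(timeout=5.0)
    # None is used here as an end-of-stream marker.
    asyncio.run_coroutine_threadsafe(queue.put(None), loop).result(timeout=5.0)


async def collect(chunks: List[bytes]) -> List[bytes]:
    loop = asyncio.get_running_loop()
    queue: "asyncio.Queue[Optional[bytes]]" = asyncio.Queue(maxsize=1)
    worker = threading.Thread(
        target=sdk_thread, args=(loop, queue, chunks), daemon=True
    )
    worker.start()

    received: List[bytes] = []
    while True:
        item = await queue.get()
        if item is None:
            break
        received.append(item)

    # Join off the event loop so the loop can still resolve the worker's last future.
    await loop.run_in_executor(None, worker.join)
    return received


if __name__ == "__main__":
    print(asyncio.run(collect([b"chunk-1", b"chunk-2", b"chunk-3"])))
```

The timeout on `future.result()` is one way to keep a stalled event loop from blocking the callback thread indefinitely; whether the plugin needs one depends on how the SDK reacts to a callback that raises.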

1088-1137: Cancellation handling is thorough and well-structured.

The cancellation flow properly:

1. Sets the cancelled flag
2. Sends stop signal to synthesizer (non-blocking)
3. Signals SDK thread to stop via text_queue
4. Waits for synthesis thread with timeout
5. Flushes and ends segment
6. Drains queues
7. Re-raises to trigger pool cleanup

This comprehensive cleanup prevents resource leaks on interruption.

696-718: Task cancellation ordering could leave orphaned tasks.

If asyncio.gather(*tasks) raises an exception, gracefully_cancel(*tasks) is called in the finally block. However, if the gather completes normally but one task had already failed, the other task might still be running when gracefully_cancel is called.

The current pattern should work correctly since gracefully_cancel handles already-completed tasks, but ensure gracefully_cancel properly handles all edge cases.
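
For reference, the gather-then-cancel pattern under discussion behaves roughly as in the sketch below. `cancel_and_wait` is a hypothetical stand-in whose behaviour is assumed for illustration; it is not the actual `gracefully_cancel` implementation, and `run_pair` is likewise an invented name. With `asyncio.gather` and the default `return_exceptions=False`, the first exception propagates to the caller while sibling tasks keep running, which is why the `finally` block matters.

```python
import asyncio
from typing import Any, Coroutine


async def cancel_and_wait(*tasks: "asyncio.Task[Any]") -> None:
    """Hypothetical stand-in for a helper such as gracefully_cancel."""
    for task in tasks:
        # cancel() is a no-op on tasks that have already finished.
        task.cancel()
    # Wait until every task has actually stopped, swallowing their outcomes.
    await asyncio.gather(*tasks, return_exceptions=True)


async def run_pair(
    first: Coroutine[Any, Any, Any], second: Coroutine[Any, Any, Any]
) -> None:
    tasks = [asyncio.create_task(first), asyncio.create_task(second)]
    try:
        # Raises as soon as either task fails; the other one is still running then.
        await asyncio.gather(*tasks)
    finally:
        await cancel_and_wait(*tasks)


async def main() -> None:
    async def failing() -> None:
        await asyncio.sleep(0)
        raise ValueError("boom")

    async def slow() -> None:
        await asyncio.sleep(60)

    try:
        await run_pair(failing(), slow())
    except ValueError as exc:
        print("first failure propagated:", exc)


if __name__ == "__main__":
    asyncio.run(main())
```

In this sketch the sibling task is cancelled and awaited before the original exception reaches the caller, so nothing is left running after `run_pair` returns or raises.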

livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py

tests/test_tts.py

devin-ai-integration

Devin Review found 1 potential issue.

View issue and 5 additional flags in Devin Review.

livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py

devin-ai-integration

Devin Review found 1 new potential issue.

View issue and 8 additional flags in Devin Review.

devin-ai-integration · 2026-02-02T07:45:46Z

livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py

+            speech_config = speechsdk.SpeechConfig(
+                endpoint=endpoint,
+                subscription=self._opts.subscription_key or "",
+            )


🔴 Streaming TTS ignores auth_token authentication, only uses subscription_key

When users authenticate using speech_auth_token instead of speech_key, the streaming TTS implementation fails silently because it only passes subscription_key to the Azure SDK.

Problem

The _create_and_warmup_synthesizer method at lines 310-312 creates a SpeechConfig with only the subscription key:

    speech_config = speechsdk.SpeechConfig(
        endpoint=endpoint,
        subscription=self._opts.subscription_key or "",
    )

When subscription_key is None (because the user provided speech_auth_token instead), this passes an empty string, causing authentication to fail.

Contrast with ChunkedStream

The HTTP-based ChunkedStream._run() at tts.py:526-530 correctly handles both authentication methods:

    if self._opts.auth_token:
        headers["Authorization"] = f"Bearer {self._opts.auth_token}"
    elif self._opts.subscription_key:
        headers["Ocp-Apim-Subscription-Key"] = self._opts.subscription_key

Impact

Users who configure Azure TTS with speech_auth_token (e.g., using Microsoft Entra authentication) will have working non-streaming synthesis but broken streaming synthesis.

Recommendation: After creating the SpeechConfig, check for auth_token and set it:

    speech_config = speechsdk.SpeechConfig(
        endpoint=endpoint,
        subscription=self._opts.subscription_key or "",
    )
    if self._opts.auth_token:
        speech_config.authorization_token = self._opts.auth_token

coderabbitai bot reviewed Jan 30, 2026

007DXR added 2 commits February 2, 2026 14:44

add tts streaming feature and correspoding test

46bf4d0

fix issues brought up by coderabbitai

c25eef9

007DXR force-pushed the xinran/tts branch from ff43637 to c25eef9 · February 2, 2026 07:17

devin-ai-integration bot reviewed Feb 2, 2026

fix issues brought up by coderabbitai

dac03e0

devin-ai-integration bot reviewed Feb 2, 2026
