@ianbbqzy ianbbqzy commented Jan 29, 2026

auto_mode will be added as a config parameter in a separate PR, once the word tokenizer and user-controlled manual flushes are supported. For now, enabling auto_mode should improve the quality and naturalness of the agent's responses.

Summary by CodeRabbit

  • Updates
    • Auto Mode enabled for Inworld TTS operations.
    • Streaming text now flushes context more frequently during transmission, improving responsiveness and reducing latency for long messages.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Ian Lee does not appear to be a GitHub user. You need a GitHub account to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
Already signed the CLA but the status is still pending? Let us recheck it.

coderabbitai bot commented Jan 29, 2026

Walkthrough

The Inworld TTS plugin now sets autoMode to True when creating a WebSocket context and calls flush_context inside the per-chunk sending loop (after each ~1000-character chunk) instead of only once after all chunks are sent.
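To make the ordering change concrete, here is a minimal sketch of the two flush strategies. The message shapes (`send_text`, `flush_context`, `contextId`) and the helper names `chunk_text` and `plan_messages` are illustrative assumptions, not the plugin's actual wire format or code; only the roughly 1000-character chunk size and the "flush inside the loop" behavior come from the walkthrough.

```python
from typing import Any, Dict, List


def chunk_text(text: str, max_chars: int = 1000) -> List[str]:
    """Split text into consecutive pieces of at most max_chars characters."""
    return [text[i : i + max_chars] for i in range(0, len(text), max_chars)]


def plan_messages(
    text: str, context_id: str, flush_per_chunk: bool, max_chars: int = 1000
) -> List[Dict[str, Any]]:
    """Return the ordered messages a sender would emit (hypothetical shapes).

    flush_per_chunk=True models the behavior described for this PR;
    flush_per_chunk=False models the previous single flush after all chunks.
    """
    messages: List[Dict[str, Any]] = []
    for chunk in chunk_text(text, max_chars):
        messages.append({"send_text": {"text": chunk}, "contextId": context_id})
        if flush_per_chunk:
            # New behavior: flush right after each chunk is sent.
            messages.append({"flush_context": {}, "contextId": context_id})
    if not flush_per_chunk:
        # Old behavior: one flush once every chunk has been sent.
        messages.append({"flush_context": {}, "contextId": context_id})
    return messages
```

For a 2500-character input this yields three chunks; the per-chunk variant interleaves three flushes, while the previous variant emits a single trailing flush.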

Changes

| Cohort / File(s) | Summary |
|---|---|
| Inworld TTS Auto Mode & Flushing: livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py | Set pkt["create"]["autoMode"] = True when creating the WebSocket context. Moved flush_context into the per-chunk send loop so it runs after each ~1000-character chunk; marked as temporary until auto_mode is released. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I nibble bytes and polish code with care,
AutoMode set true — a bright new hare!
Flush after chunks, hop by hop I bound,
Temporary steps where changes are found,
A rabbit cheers at fixes neatly sound. 🐇✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title '[Draft] inworld tts auto mode' clearly describes the main change: enabling auto mode for inworld TTS. It directly relates to the implementation of auto_mode behavior in the TTS module. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py`:
- Around line 358-359: The unconditional setting of pkt["create"]["autoMode"] =
True (in the TTS packet creation where SentenceTokenizer is used) should be
removed or gated to avoid sending an undocumented field; change this to follow
the existing pattern used for timestampType/applyTextNormalization (only set
autoMode if the corresponding config value is provided/is_given()), or otherwise
remove the line and/or wrap the send in graceful error handling to ignore server
rejections related to unknown fields; update the code around the packet
construction in the TTS send/create logic (the block that references
SentenceTokenizer and pkt["create"]) to implement the conditional check or
removal.
🧹 Nitpick comments (1)
livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py (1)

1081-1086: Gate flush_context behind a flag when autoMode is enabled.

Currently, flush_context is called after every sentence token. With autoMode enabled (which is unconditional), the Inworld server handles buffering internally, making per-token flushes redundant and increasing message overhead. Consider gating this behind a flag or removing it entirely in favor of explicit flushes via user input or end_input().
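One way to read this suggestion is sketched below. The function `plan_stream` and its `flush_each_sentence` flag are hypothetical names introduced for illustration, and the rule "without auto mode, keep flushing per sentence" is an assumption about the intended fallback rather than something the review states.

```python
from typing import List, Tuple


def plan_stream(
    sentences: List[str], auto_mode: bool, flush_each_sentence: bool = False
) -> List[Tuple[str, str]]:
    """Return the ordered (operation, payload) pairs for one input stream.

    With auto_mode on, per-sentence flushes are skipped unless the
    hypothetical flush_each_sentence flag opts back in. A final flush is
    always planned, standing in for the explicit flush at end_input().
    """
    ops: List[Tuple[str, str]] = []
    for sentence in sentences:
        ops.append(("send_text", sentence))
        if flush_each_sentence or not auto_mode:
            ops.append(("flush_context", ""))
    # Explicit flush at end of input, unless the last operation already flushed.
    if not ops or ops[-1][0] != "flush_context":
        ops.append(("flush_context", ""))
    return ops
```

With `auto_mode=True` and the flag left off, two sentences produce two sends followed by a single flush, which is the reduced message overhead the nitpick is after.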

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 618c4b5 and 0595314.

📒 Files selected for processing (1)
  • livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings

Files:

  • livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: unit-tests

Comment on lines +358 to +359
# Always enable auto_mode since we always use SentenceTokenizer
pkt["create"]["autoMode"] = True

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

wc -l "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py"

Repository: livekit/agents

Length of output: 133


🏁 Script executed:

sed -n '350,370p' "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py"

Repository: livekit/agents

Length of output: 1118


🏁 Script executed:

sed -n '1070,1100p' "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py"

Repository: livekit/agents

Length of output: 1339


🏁 Script executed:

rg "autoMode|auto_mode" "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py" -B 3 -A 3

Repository: livekit/agents

Length of output: 864


🏁 Script executed:

rg "create\[" "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py" -B 2 -A 5

Repository: livekit/agents

Length of output: 40


🏁 Script executed:

rg "version|Version|release|Release" "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py" -i | head -20

Repository: livekit/agents

Length of output: 802


🏁 Script executed:

rg "try|except|error|Error" "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py" -B 2 -A 2 | head -40

Repository: livekit/agents

Length of output: 1393


🏁 Script executed:

find "livekit-plugins/livekit-plugins-inworld" -name "*.py" -type f

Repository: livekit/agents

Length of output: 350


🏁 Script executed:

cat "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/version.py"

Repository: livekit/agents

Length of output: 658


🏁 Script executed:

cat "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/__init__.py"

Repository: livekit/agents

Length of output: 1447


🌐 Web query:

Inworld TTS WebSocket "autoMode" field API documentation release notes

💡 Result:

I couldn’t find any official Inworld TTS WebSocket API documentation or release-notes entry that defines an autoMode field.

What is documented for “automatic” behavior in the WebSocket TTS API is buffering + auto-flush via fields like bufferCharThreshold (start synthesis when buffered text reaches a threshold) and maxBufferDelayMs (time-based buffering), plus explicit flush_context messages. This is described in the WebSocket API reference and Inworld’s WebSocket/TTS announcement blog post. [1], [2]

Inworld’s published TTS release notes (as of the latest entry on January 21, 2026) don’t mention adding/renaming any WebSocket request field called autoMode. [3]

Sources
[1] Inworld API Reference — “Synthesize speech (WebSocket)” (voice:streamBidirectional)
[2] Inworld blog (June 11, 2025) — “Introducing timestamp alignment, WebSockets and more for Inworld TTS”
[3] Inworld Docs — TTS Release Notes


🏁 Script executed:

rg "autoMode|auto_mode" "livekit-plugins/livekit-plugins-inworld/" -r --type py

Repository: livekit/agents

Length of output: 518


🏁 Script executed:

rg "autoMode" "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py"

Repository: livekit/agents

Length of output: 110


🏁 Script executed:

rg "auto_mode" "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py"

Repository: livekit/agents

Length of output: 200


🏁 Script executed:

sed -n '340,365p' "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py"

Repository: livekit/agents

Length of output: 1478


🏁 Script executed:

rg "Inworld|API|compatibility" "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py" -i -B 2 -A 2 | head -50

Repository: livekit/agents

Length of output: 1514


Remove or conditionally gate undocumented autoMode field to prevent API failures.

The autoMode field is not documented in the Inworld TTS WebSocket API and the code comment ("To be removed once auto_mode is released") indicates it is a future/unreleased feature. Sending this unrecognized field unconditionally could cause stream creation failures if Inworld does not silently ignore unknown fields. Either:

  • Wait for official autoMode release and remove the comment, or
  • Make it conditional (similar to timestampType and applyTextNormalization which use is_given() checks), or
  • Add error handling to gracefully degrade if the field is rejected.
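The conditional option could look roughly like the sketch below. It uses a local stand-in for the not-given sentinel so the example runs on its own; `_NotGiven`, `NOT_GIVEN`, `is_given`, `build_create_packet`, and the `voiceId`/`modelId` keys are all assumptions made for illustration, and only `pkt["create"]["autoMode"]` and `timestampType` are names taken from the review.

```python
from typing import Any, Dict, Union


class _NotGiven:
    """Local stand-in sentinel meaning 'the caller did not set this option'."""

    def __bool__(self) -> bool:
        return False

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


def is_given(value: Any) -> bool:
    """Return True when value is anything other than the sentinel."""
    return not isinstance(value, _NotGiven)


def build_create_packet(
    voice: str,
    model: str,
    auto_mode: Union[bool, _NotGiven] = NOT_GIVEN,
    timestamp_type: Union[str, _NotGiven] = NOT_GIVEN,
) -> Dict[str, Dict[str, Any]]:
    """Build a create packet, adding optional fields only when provided."""
    # voiceId / modelId are placeholder keys, not confirmed field names.
    pkt: Dict[str, Dict[str, Any]] = {"create": {"voiceId": voice, "modelId": model}}
    if is_given(timestamp_type):
        pkt["create"]["timestampType"] = timestamp_type
    if is_given(auto_mode):
        # Sent only when the caller opted in or out explicitly.
        pkt["create"]["autoMode"] = auto_mode
    return pkt
```

Because the sentinel is distinct from `False`, a caller can still send `autoMode: false` explicitly, while the default leaves the field out of the packet entirely.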
