[Draft] inworld tts auto mode #4655
base: main
Ian Lee seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
📝 Walkthrough
The Inworld TTS plugin now sets …
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
618c4b5 to 0595314
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py`:
- Around line 358-359: The unconditional setting of pkt["create"]["autoMode"] =
True (in the TTS packet creation where SentenceTokenizer is used) should be
removed or gated to avoid sending an undocumented field; change this to follow
the existing pattern used for timestampType/applyTextNormalization (only set
autoMode if the corresponding config value is provided/is_given()), or otherwise
remove the line and/or wrap the send in graceful error handling to ignore server
rejections related to unknown fields; update the code around the packet
construction in the TTS send/create logic (the block that references
SentenceTokenizer and pkt["create"]) to implement the conditional check or
removal.
🧹 Nitpick comments (1)
livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py (1)
1081-1086: Gate `flush_context` behind a flag when `autoMode` is enabled.
Currently, `flush_context` is called after every sentence token. With `autoMode` enabled (which is unconditional), the Inworld server handles buffering internally, making per-token flushes redundant and increasing message overhead. Consider gating this behind a flag or removing it entirely in favor of explicit flushes via user input or `end_input()`.
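The nitpick above can be illustrated with a minimal sketch. It is not the plugin's actual code: the function name `outbound_messages` and the message shapes (`send_text`, `flush_context`, `contextId`) are hypothetical placeholders chosen for illustration, and the real Inworld WebSocket payloads may differ. The sketch only shows the control flow: flush after every sentence when the server is not buffering, and flush once at end of input when it is.

```python
from __future__ import annotations

from typing import Any, Iterable, Iterator


def outbound_messages(
    sentences: Iterable[str],
    context_id: str,
    auto_mode: bool,
) -> Iterator[dict[str, Any]]:
    """Yield the messages a client would send for a stream of sentences.

    Message shapes here are hypothetical, not the documented Inworld schema.
    """
    for sentence in sentences:
        yield {"send_text": {"text": sentence}, "contextId": context_id}
        if not auto_mode:
            # Without server-side buffering, flush after every sentence.
            yield {"flush_context": {}, "contextId": context_id}
    if auto_mode:
        # With server-side buffering, a single flush at end of input suffices.
        yield {"flush_context": {}, "contextId": context_id}
```

With two sentences, the auto-mode path produces one flush message instead of two, which is the overhead reduction the review comment is pointing at.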
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings
Files:
livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: unit-tests
    # Always enable auto_mode since we always use SentenceTokenizer
    pkt["create"]["autoMode"] = True
🧩 Analysis chain
🏁 Script executed:
wc -l "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py"
Repository: livekit/agents
Length of output: 133
🏁 Script executed:
sed -n '350,370p' "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py"
Repository: livekit/agents
Length of output: 1118
🏁 Script executed:
sed -n '1070,1100p' "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py"
Repository: livekit/agents
Length of output: 1339
🏁 Script executed:
rg "autoMode|auto_mode" "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py" -B 3 -A 3
Repository: livekit/agents
Length of output: 864
🏁 Script executed:
rg "create\[" "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py" -B 2 -A 5
Repository: livekit/agents
Length of output: 40
🏁 Script executed:
rg "version|Version|release|Release" "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py" -i | head -20
Repository: livekit/agents
Length of output: 802
🏁 Script executed:
rg "try|except|error|Error" "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py" -B 2 -A 2 | head -40
Repository: livekit/agents
Length of output: 1393
🏁 Script executed:
find "livekit-plugins/livekit-plugins-inworld" -name "*.py" -type f
Repository: livekit/agents
Length of output: 350
🏁 Script executed:
cat "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/version.py"
Repository: livekit/agents
Length of output: 658
🏁 Script executed:
cat "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/__init__.py"
Repository: livekit/agents
Length of output: 1447
🌐 Web query:
Inworld TTS WebSocket "autoMode" field API documentation release notes
💡 Result:
I couldn’t find any official Inworld TTS WebSocket API documentation or release-notes entry that defines an autoMode field.
What is documented for “automatic” behavior in the WebSocket TTS API is buffering + auto-flush via fields like bufferCharThreshold (start synthesis when buffered text reaches a threshold) and maxBufferDelayMs (time-based buffering), plus explicit flush_context messages. This is described in the WebSocket API reference and Inworld’s WebSocket/TTS announcement blog post. [1], [2]
Inworld’s published TTS release notes (as of the latest entry on January 21, 2026) don’t mention adding/renaming any WebSocket request field called autoMode. [3]
Sources
[1] Inworld API Reference — “Synthesize speech (WebSocket)” (voice:streamBidirectional)
[2] Inworld blog (June 11, 2025) — “Introducing timestamp alignment, WebSockets and more for Inworld TTS”
[3] Inworld Docs — TTS Release Notes
🏁 Script executed:
rg "autoMode|auto_mode" "livekit-plugins/livekit-plugins-inworld/" -r --type py
Repository: livekit/agents
Length of output: 518
🏁 Script executed:
rg "autoMode" "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py"
Repository: livekit/agents
Length of output: 110
🏁 Script executed:
rg "auto_mode" "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py"
Repository: livekit/agents
Length of output: 200
🏁 Script executed:
sed -n '340,365p' "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py"
Repository: livekit/agents
Length of output: 1478
🏁 Script executed:
rg "Inworld|API|compatibility" "livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/tts.py" -i -B 2 -A 2 | head -50
Repository: livekit/agents
Length of output: 1514
Remove or conditionally gate undocumented autoMode field to prevent API failures.
The autoMode field is not documented in the Inworld TTS WebSocket API and the code comment ("To be removed once auto_mode is released") indicates it is a future/unreleased feature. Sending this unrecognized field unconditionally could cause stream creation failures if Inworld does not silently ignore unknown fields. Either:
- Wait for the official `autoMode` release and remove the comment, or
- Make it conditional (similar to `timestampType` and `applyTextNormalization`, which use `is_given()` checks), or
- Add error handling to gracefully degrade if the field is rejected.
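The conditional option could look roughly like the sketch below. It is an assumption-laden illustration, not the plugin's code: `_NotGiven`, `NOT_GIVEN`, and `is_given` are local stand-ins for whatever sentinel helpers the plugin already uses, `build_create_packet` is a hypothetical helper, and `voiceId` is a placeholder field. Only the `pkt["create"]["autoMode"]` key comes from the diff under review.

```python
from __future__ import annotations

from typing import Any, Union


class _NotGiven:
    """Local stand-in for an 'argument not provided' sentinel (hypothetical)."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


def is_given(value: Any) -> bool:
    """Return True unless the value is the NOT_GIVEN sentinel."""
    return not isinstance(value, _NotGiven)


def build_create_packet(
    voice_id: str,
    auto_mode: Union[bool, _NotGiven] = NOT_GIVEN,
) -> dict[str, Any]:
    """Build a 'create' packet, adding autoMode only when explicitly set."""
    # "voiceId" is a placeholder field for illustration.
    pkt: dict[str, Any] = {"create": {"voiceId": voice_id}}
    if is_given(auto_mode):
        # The field is sent only when the caller opted in or out explicitly.
        pkt["create"]["autoMode"] = auto_mode
    return pkt
```

Under this pattern, callers who never pass `auto_mode` send a packet without the undocumented field, so a server that rejects unknown fields is unaffected by default.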
`auto_mode` will be added as a config parameter in a separate PR, once the word tokenizer and user-controlled manual flushes are supported. For now, `auto_mode` should improve the quality and naturalness of agent responses.