Claude/slack update stt assembly to inworld jx5y b#56
- Create InworldSTTNode using Inworld's REST API (POST /stt/v1/transcribe) with energy-based VAD for end-of-turn detection
- Remove assembly-ai-stt-ws-node.ts and its WebSocket-based streaming logic
- Update ConversationGraphWrapper to hold inworldSTTNode reference
- Update ConversationGraphConfig to accept inworldApiKey (was assemblyAIApiKey)
- Replace ASSEMBLY_AI_API_KEY env var with INWORLD_API_KEY in graph-service, .env.example, and render.yaml (single key for all Inworld services)
- Replace AssemblyAI turn-detection presets in server config with equivalent Inworld STT VAD presets (silenceThresholdMs / minSpeechMs / energyThreshold)
- Rename ASSEMBLY_AI_EAGERNESS env var to INWORLD_STT_EAGERNESS
- Update comments in connection-manager, transcript-extractor-node, server, and audio-processor.js to reflect the new STT provider

https://claude.ai/code/session_01EDqcCeQHNj2f2TVeFb5Dxh

- Fix prettier line-length formatting for Buffer.from() call in inworld-stt-node.ts
- Remove unused samplesPerMs variable in inworld-stt-node.ts
- Fix prettier line-length formatting for description string in server.ts

https://claude.ai/code/session_01EDqcCeQHNj2f2TVeFb5Dxh

- Add 30s timeout protection to callInworldSTT to prevent network stalls
- Fix session lifecycle: remove shared STT node destruction from ConnectionManager.destroy() since the node is shared across sessions

https://claude.ai/code/session_01TJxsn7u4AVgj7UHWPffSha
Pull request overview
This PR migrates the backend speech-to-text path from an AssemblyAI streaming WebSocket node to a new Inworld STT node that uses energy-based VAD and an Inworld STT REST call, and updates configuration/env wiring accordingly.
Changes:
- Replaced AssemblyAISTTWebSocketNode with a new InworldSTTNode implementation and rewired the conversation graph to use it.
- Removed AssemblyAI environment/config plumbing and updated logging/comments to reflect Inworld STT usage.
- Updated deployment/env examples to drop ASSEMBLY_AI_API_KEY and introduce INWORLD_STT_EAGERNESS-based VAD presets.
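
For readers unfamiliar with energy-based VAD, the end-of-turn logic described in this overview can be sketched roughly as follows. This is an illustration only, not the code in inworld-stt-node.ts: the names `rmsEnergy`, `VadState`, and `stepVad` are hypothetical, and the sketch assumes audio arrives as Float32Array chunks and that a turn ends once enough speech chunks have been followed by enough consecutive silent chunks.

```typescript
// Hypothetical sketch of energy-based end-of-turn detection.
// None of these names are taken from the PR.

/** Root-mean-square energy of one audio chunk. */
function rmsEnergy(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  const sumSquares = samples.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumSquares / samples.length);
}

interface VadState {
  speechChunks: number; // chunks at or above the energy threshold so far
  silenceChunks: number; // consecutive quiet chunks after speech began
}

interface VadParams {
  energyThreshold: number;
  minSpeechChunks: number;
  silenceChunksThreshold: number;
}

/** Feeds one chunk into the detector and reports whether the turn has ended. */
function stepVad(
  state: VadState,
  chunk: Float32Array,
  params: VadParams
): { state: VadState; endOfTurn: boolean } {
  const next: VadState = { ...state };
  if (rmsEnergy(chunk) >= params.energyThreshold) {
    next.speechChunks += 1;
    next.silenceChunks = 0;
  } else if (next.speechChunks > 0) {
    // Silence only counts once some speech has been heard.
    next.silenceChunks += 1;
  }
  const endOfTurn =
    next.speechChunks >= params.minSpeechChunks &&
    next.silenceChunks >= params.silenceChunksThreshold;
  return { state: next, endOfTurn };
}
```

In a sketch like this, the buffered audio for the turn would be sent to the transcription endpoint once `endOfTurn` becomes true, and the state would then be reset.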
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| render.yaml | Removes ASSEMBLY_AI_API_KEY from Render env configuration. |
| frontend/public/audio-processor.js | Updates comments to reflect Inworld STT + energy-based VAD assumptions (100ms/16kHz). |
| backend/src/services/graph-service.ts | Switches graph initialization to require INWORLD_API_KEY and pass it into graph config. |
| backend/src/server.ts | Updates documentation/log strings to reflect Inworld STT. |
| backend/src/helpers/connection-manager.ts | Removes AssemblyAI session-close behavior; updates comments for Inworld STT. |
| backend/src/graphs/nodes/transcript-extractor-node.ts | Updates documentation to reference InworldSTTNode output. |
| backend/src/graphs/nodes/inworld-stt-node.ts | Adds new STT node with energy-based VAD + REST transcription. |
| backend/src/graphs/nodes/assembly-ai-stt-ws-node.ts | Removes the AssemblyAI streaming WebSocket STT node. |
| backend/src/graphs/conversation-graph.ts | Rewires graph from AssemblyAI STT node to Inworld STT node and updates wrapper/config. |
| backend/src/config/server.ts | Replaces AssemblyAI turn-detection presets with Inworld STT VAD presets + env override. |
| backend/package-lock.json | Lockfile metadata changes (peer flags removed in places). |
| backend/.env.example | Removes ASSEMBLY_AI_API_KEY from example env file. |
Files not reviewed (1)
- backend/package-lock.json: Language not supported

    inworldSTT: {
      /** VAD eagerness level */
      eagerness: (process.env.INWORLD_STT_EAGERNESS ||
        'high') as InworldSTTEagerness,
    },

INWORLD_STT_EAGERNESS is cast to InworldSTTEagerness without validation, so an invalid env value will make getInworldSTTSettings() return undefined and cause downstream runtime errors. Consider validating against {'low','medium','high'} (fallback to 'high' and log a warning) before indexing into inworldSTTPresets.
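
One possible shape for that validation, as a minimal sketch: the `InworldSTTEagerness` type is redeclared here so the example stands alone, `parseEagerness` is a hypothetical helper name, and trimming/lowercasing the raw value is an added assumption rather than something the PR specifies.

```typescript
// Hypothetical helper; the real config module may structure this differently.
type InworldSTTEagerness = 'low' | 'medium' | 'high';

const EAGERNESS_LEVELS: readonly InworldSTTEagerness[] = [
  'low',
  'medium',
  'high',
];

/**
 * Returns a valid eagerness level, falling back to 'high' (the existing
 * default) and emitting a warning when the raw value is not recognised.
 */
function parseEagerness(
  raw: string | undefined,
  warn: (message: string) => void = console.warn
): InworldSTTEagerness {
  if (raw === undefined || raw === '') return 'high';
  // Assumption: tolerate surrounding whitespace and mixed case.
  const normalized = raw.trim().toLowerCase();
  const match = EAGERNESS_LEVELS.find((level) => level === normalized);
  if (match !== undefined) return match;
  warn(`Invalid INWORLD_STT_EAGERNESS "${raw}"; falling back to "high"`);
  return 'high';
}

// Intended use in the config object (not executed here):
//   eagerness: parseEagerness(process.env.INWORLD_STT_EAGERNESS),
```

Passing the warning function as a parameter keeps the helper independent of any particular logger.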

    const silenceChunksThreshold = Math.ceil(
      this.silenceThresholdMs / 100 // chunks are ~100ms each (1600 samples @ 16kHz)
    );
    const minSpeechChunks = Math.ceil(this.minSpeechMs / 100);

VAD chunk thresholds are computed using a hard-coded 100ms assumption (silenceThresholdMs / 100, minSpeechMs / 100), but the node also accepts sampleRate and the actual chunk duration depends on the incoming frame size. To avoid mis-tuned VAD if chunk size/sample rate changes, derive the chunk duration from audioData.length and sampleRate (or assert/enforce 1600@16kHz explicitly).
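
A sketch of the first option, deriving the thresholds from the actual frame size: `chunkDurationMs` and `chunksForDuration` are hypothetical helper names, and the sketch assumes `audioData.length` counts samples (not bytes) for a single channel.

```typescript
// Hypothetical helpers; not part of inworld-stt-node.ts.

/** Duration of one chunk in milliseconds, from its sample count and rate. */
function chunkDurationMs(sampleCount: number, sampleRate: number): number {
  if (sampleRate <= 0) {
    throw new RangeError('sampleRate must be positive');
  }
  return (sampleCount * 1000) / sampleRate;
}

/** Number of chunks needed to cover durationMs, rounded up. */
function chunksForDuration(durationMs: number, chunkMs: number): number {
  if (chunkMs <= 0) {
    throw new RangeError('chunkMs must be positive');
  }
  return Math.ceil(durationMs / chunkMs);
}

// With the frame size assumed in the PR (1600 samples @ 16 kHz):
const chunkMs = chunkDurationMs(1600, 16000); // 100
const silenceChunks = chunksForDuration(800, chunkMs); // 8

// With a smaller frame (800 samples @ 16 kHz) the same 800 ms needs more chunks:
const smallChunkMs = chunkDurationMs(800, 16000); // 50
const smallSilenceChunks = chunksForDuration(800, smallChunkMs); // 16
```

The second option from the comment, asserting that every frame is exactly 1600 samples at 16 kHz, is simpler but would reject audio from any client that uses a different frame size.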

    private sendPartialTranscript(
      sessionId: string,
      interactionId: string,
      text: string
    ): void {
      const connection = this.connections[sessionId];
      if (!connection?.onPartialTranscript) return;

      try {
        connection.onPartialTranscript(text, interactionId);
      } catch (error) {
        logger.error({ err: error }, 'error_sending_partial_transcript');
      }
    }

sendPartialTranscript() is declared but never called in this node, which adds dead code and suggests partial transcript support that isn't actually implemented. Either remove it or wire it up (e.g., emit interim text when available) so the class API matches its behavior.