google realtime: generateReply() throws for gemini-3.1-flash-live-preview, breaking voice.AgentSession tool flows #1197

@rahulmanuwas

Describe the bug

When using the realtime model from @livekit/agents-plugin-google with model gemini-3.1-flash-live-preview, session.generateReply() throws:

generateReply is not compatible with 'gemini-3.1-flash-live-preview'

This is more than an app-level incompatibility. It breaks common voice.AgentSession programmatic reply flows, including greeting generation, reconnect recovery, guardrail block replies, and explicit post-tool continuation.

In our production voice agent, the most important failure mode is post-tool continuation: after a function call completes, the server needs to trigger the assistant to continue the turn. With Gemini 3.1 Live through the current LiveKit wrapper, that path is not available, so we have to downgrade realtime tool-enabled agents to gemini-2.5-flash-native-audio-preview-12-2025.
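The downgrade described above can be sketched as a small model-selection helper. This is only an illustration of the workaround, not part of @livekit/agents: the names selectRealtimeModel, MODELS_REJECTING_GENERATE_REPLY, and FALLBACK_MODEL are hypothetical, and the model IDs are the ones quoted in this issue.

```typescript
// Hypothetical helper illustrating the workaround; not a LiveKit API.
// Model IDs are copied from this issue.
const MODELS_REJECTING_GENERATE_REPLY = new Set<string>([
  'gemini-3.1-flash-live-preview',
]);

const FALLBACK_MODEL = 'gemini-2.5-flash-native-audio-preview-12-2025';

// Returns the model to use for a realtime session. If the agent depends on
// generateReply() (tools, greetings, reconnect recovery) and the requested
// model is known to reject it, fall back to the 2.5 model.
function selectRealtimeModel(requested: string, needsGenerateReply: boolean): string {
  if (needsGenerateReply && MODELS_REJECTING_GENERATE_REPLY.has(requested)) {
    return FALLBACK_MODEL;
  }
  return requested;
}
```

Agents that never call generateReply() keep the requested model, so the downgrade only applies where the incompatibility actually bites.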

Relevant log output

From our voice-agent logs while running a realtime tool-enabled session:

[agent] Switching realtime session model from gemini-3.1-flash-live-preview to gemini-2.5-flash-native-audio-preview-12-2025 because LiveKit 1.2.3 rejects generateReply() on Gemini 3.1 Live

And a minimal direct repro of the failing API call produces:

generateReply is not compatible with 'gemini-3.1-flash-live-preview'

Describe your environment

  • @livekit/agents: 1.2.3
  • @livekit/agents-plugin-google: 1.2.3
  • @google/genai: 1.47.0
  • Node: v25.8.2

Minimal reproducible example

import { voice } from '@livekit/agents';
import * as google from '@livekit/agents-plugin-google';

const agent = new voice.Agent({
  instructions: 'You are a helpful assistant.',
});

const session = new voice.AgentSession({
  llm: new google.beta.realtime.RealtimeModel({
    model: 'gemini-3.1-flash-live-preview',
    apiKey: process.env.GOOGLE_API_KEY!,
    instructions: 'You are a helpful assistant.',
  }),
});

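// `room` is not defined in this snippet; it is assumed to be the connected room,
// e.g. ctx.room inside a defineAgent entry function.
await session.start({ agent, room });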

await session.generateReply({
  allowInterruptions: true,
  instructions: 'Say hello in one short sentence.',
});
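Until the adapter supports this path, a caller can at least detect the failure and degrade gracefully. The sketch below assumes the thrown value is an Error whose message contains the text quoted in this issue; matching on message text is brittle, and the helper names are hypothetical.

```typescript
// Message fragment copied from the error text quoted in this issue.
const INCOMPATIBLE_FRAGMENT = 'generateReply is not compatible with';

// Assumption: the adapter throws an Error whose message contains the fragment above.
function isGenerateReplyIncompatible(err: unknown): boolean {
  return err instanceof Error && err.message.includes(INCOMPATIBLE_FRAGMENT);
}

// Runs the supplied generateReply call. Returns false if it fails with the
// incompatibility error, true if it succeeds, and rethrows anything else.
async function tryGenerateReply(generate: () => unknown): Promise<boolean> {
  try {
    await generate();
    return true;
  } catch (err) {
    if (isGenerateReplyIncompatible(err)) return false;
    throw err;
  }
}
```

For example, `await tryGenerateReply(() => session.generateReply({ instructions: 'Say hello in one short sentence.' }))` would resolve to false in the environment described here instead of crashing the session.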

Additional information

A few details that may help narrow the issue:

  • The Google realtime adapter currently has an explicit guard that throws for gemini-3.1-flash-live-preview inside generateReply().
  • The adapter implementation appears to synthesize a content turn for generateReply(), which seems aligned with Gemini 2.5 Live behavior.
  • According to Google's Gemini Live docs, Gemini 3.1 Live differs materially from 2.5 here: send_client_content is supported only for initial history seeding, and ongoing text input after that should use send_realtime_input.
  • So this does not look like a simple missing model allowlist entry; it looks like the current generateReply() strategy may be 2.5-specific.
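The distinction in the points above can be sketched as a per-model choice of input path. This reflects only this issue's reading of Google's docs, not the adapter's actual implementation; pickTextInputPath and isGemini31Live are hypothetical names, and the prefix check is an assumption about how model IDs are structured.

```typescript
// The two ways of sending text to a Live session discussed above.
type TextInputPath = 'clientContent' | 'realtimeInput';

// Assumption: Gemini 3.1 Live models can be recognized by this ID prefix.
function isGemini31Live(model: string): boolean {
  return model.startsWith('gemini-3.1-');
}

// Chooses the input path for a text turn. Initial history seeding uses client
// content on both families; later turns use realtime input on 3.1 only.
function pickTextInputPath(model: string, isInitialHistory: boolean): TextInputPath {
  if (isInitialHistory) return 'clientContent';
  return isGemini31Live(model) ? 'realtimeInput' : 'clientContent';
}
```

A generateReply() implementation built on such a switch would send its synthesized instruction through the realtime input path on 3.1 rather than throwing.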

Because voice.AgentSession relies on programmatic continuation for several standard flows, at least one of the following should hold:

  • generateReply() should work with gemini-3.1-flash-live-preview, or
  • a supported Gemini 3.1-compatible continuation primitive should exist for server-side voice agents, or
  • the docs should clearly state that Gemini 3.1 Live is not currently compatible with voice.AgentSession flows that depend on generateReply()

Separately, Gemini 3.1 Live introduces thinkingLevel semantics, while Gemini 2.5 Live uses thinkingBudget. That makes the current compatibility boundary especially important for anyone trying to adopt Gemini 3.1 features in server-side realtime agents.
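To illustrate why the two families cannot share one thinking configuration, here is a minimal sketch that emits only the field each family understands. The field names thinkingLevel and thinkingBudget come from this issue; the types, the function names, and the prefix check are hypothetical, and no particular set of accepted values is implied.

```typescript
// Caller-supplied thinking preferences; either field may be omitted.
type ThinkingOptions = { level?: string; budget?: number };

// Exactly one of the two fields, or nothing when no preference applies.
type ThinkingConfig = { thinkingLevel: string } | { thinkingBudget: number } | undefined;

// Assumption: Gemini 3.1 Live models can be recognized by this ID prefix.
function usesThinkingLevel(model: string): boolean {
  return model.startsWith('gemini-3.1-');
}

// Builds the thinking config for a model, ignoring the option that belongs to
// the other family.
function buildThinkingConfig(model: string, opts: ThinkingOptions): ThinkingConfig {
  if (usesThinkingLevel(model)) {
    return opts.level !== undefined ? { thinkingLevel: opts.level } : undefined;
  }
  return opts.budget !== undefined ? { thinkingBudget: opts.budget } : undefined;
}
```

With a helper like this, falling back from 3.1 to 2.5 silently drops the level setting, which is one concrete cost of the current compatibility boundary.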

Metadata

Labels: bug (Something isn't working)
