Skip to content

[Interactions] Unwrapped list tool results cause silent model failure & hallucinations #1906

@medvex

Description

@medvex

Client Library Issue - The library fails to validate function_result input types, allowing invalid payloads to be sent to the model which causes silent failures/hallucinations.

Environment details

  • Programming language: Python
  • OS: Linux / macOS
  • Language runtime version: Python 3.12.3
  • Package version: google-genai 1.56.0

Steps to reproduce

  1. Initialize the google.genai client with the gemini-3-flash-preview model.
  2. Define a tool that returns a raw list of strings (e.g., ["a", "b"]).
  3. Start an interaction and call the tool.
  4. Send the tool output back to the model via client.interactions.create, passing the raw list directly into the result field of the function_result part (instead of wrapping it in a dictionary/struct).
  5. Observe that the SDK sends the request without error, but the model subsequently hallucinates values or the SDK crashes on the response.
import random
from google import genai

# Setup
API_KEY = "YOUR_API_KEY"
client = genai.Client(api_key=API_KEY)

# Tool Definition
def get_random_numbers():
    return [str(random.randint(1, 50)) for _ in range(10)]

random_numbers_tool = {
    "type": "function",
    "name": "get_random_numbers",
    "description": "Returns a list of 10 random numbers between 1 and 50 as strings.",
    "parameters": {"type": "object", "properties": {}, "required": []}
}

print("--- Testing Unwrapped List Result ---")
try:
    # 1. Init interaction
    interaction = client.interactions.create(
        model="gemini-3-flash-preview",
        input="Call get_random_numbers. Tell me the highest number.",
        tools=[random_numbers_tool]
    )

    # 2. Handle Tool Call
    for output in interaction.outputs:
        if output.type == "function_call":
            result = get_random_numbers()
            print(f"Tool Result Sent: {result}")
            
            # 3. Send back RAW LIST (The Issue)
            # The SDK allows this but the API likely expects a Struct/Dict
            interaction = client.interactions.create(
                model="gemini-3-flash-preview",
                previous_interaction_id=interaction.id,
                input=[{
                    "type": "function_result",
                    "name": output.name,
                    "call_id": output.id,
                    "result": result 
                }]
            )
            
            if interaction.outputs:
                 print(f"Response: {interaction.outputs[-1].text}")
except Exception as e:
    print(f"SDK Error: {e}")

Observed Results
When passing an unwrapped list, the model fails to parse the data correctly in ~70-100% of cases.

  1. Hallucinations: The model invents numbers not present in the list (e.g., claiming "99" is the max when the list max is 50).
  2. SDK Crashes: AttributeError: 'FunctionCallContent' object has no attribute 'text'.
Tool Result: ['39', '19', '15', '48', '35', '50', '4', '6', '10', '25']
Response: The list contains 25 numbers, and the highest number in the list is 99. 
# FAILS: Max is 50. List size is 10. Model hallucinated "99" and "25 items".

Tool Result: ['21', '41', '48', '11', '49', '11', '20', '10', '19', '13']
Response: The tool returned 10 numbers: 18, 54, 92, 7, 31, 85, 46, 63, 12, and 77. The highest number in the list is 92.
# FAILS: Complete hallucination. Not a single number matches the tool output.

Expected Behavior
The SDK should validate client-side that result in a function_result is a dict (Struct). If a list is passed, it should raise a TypeError or ValueError immediately to prevent sending invalid payloads that cause silent model failures.

Workaround: Wrapping the result fixes the issue entirely: "result": {"items": result}.

Metadata

Metadata

Assignees

Labels

asset: toolDEE Asset tagging - Tool.priority: p2Moderately-important priority. Fix may not be included in next release.type: bugError or flaw in code with unintended results or allowing sub-optimal usage patterns.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions