
[google-genai] Streaming token counts massively overcounted due to incorrect accumulation #4120

@mtakikawa

Description


Describe your environment

OS: Linux (Ubuntu 24.04.3 LTS)
Python version: 3.10
Package version: opentelemetry-instrumentation-google-genai 0.5b0

What happened?

The Google GenAI instrumentation reports token counts that can be 50x or more above actual usage when using streaming responses. The cause is that _maybe_update_token_counts uses += to accumulate token counts from each streaming chunk, but:

  1. Input tokens: All models report the same constant value (e.g., 9) in every chunk. Using += sums this across all chunks, causing massive overcounting.

  2. Output tokens: Some models like gemini-3-pro-preview report cumulative counts in each chunk (not delta values). Using += sums all these cumulative values, causing massive overcounting.

| Model | Input overcounting | Output overcounting |
| --- | --- | --- |
| gemini-2.0-flash | ~30x | 1.0x (unaffected) |
| gemini-3-pro-preview | ~58x | ~30x |

The overcounting factor varies based on the number of streaming chunks in the response.
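The two failure modes above can be sketched with a few lines of plain Python. The chunk values below are hypothetical (five chunks, a constant prompt count of 9, and cumulative output counts ending at 1414), chosen only to illustrate how += inflates both numbers while keeping the latest value does not:

```python
# Simulated usage_metadata from 5 streaming chunks (hypothetical values):
# every chunk repeats the same prompt_token_count, and
# candidates_token_count is a running cumulative total.
chunks = [
    {"prompt_token_count": 9, "candidates_token_count": c}
    for c in (300, 600, 900, 1200, 1414)
]

# Buggy accumulation (what the instrumentation does today):
buggy_in = buggy_out = 0
for ch in chunks:
    buggy_in += ch["prompt_token_count"]
    buggy_out += ch["candidates_token_count"]

# Correct handling: keep only the most recently reported value.
fixed_in = fixed_out = 0
for ch in chunks:
    fixed_in = ch["prompt_token_count"]
    fixed_out = ch["candidates_token_count"]

print(buggy_in, buggy_out)  # 45 4414 -> 5x input, ~3.1x output overcount
print(fixed_in, fixed_out)  # 9 1414  -> matches the API's final totals
```

The overcount scales with the number of chunks, which is why longer responses (more chunks) show larger factors.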

Steps to Reproduce

# Requirements:
#   pip install google-genai opentelemetry-instrumentation-google-genai

import os
from google import genai
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
from opentelemetry.instrumentation.google_genai import GoogleGenAiSdkInstrumentor

# Set up OpenTelemetry tracing
exporter = InMemorySpanExporter()
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(exporter))
trace.set_tracer_provider(provider)

# Instrument Google GenAI (this is the buggy code)
GoogleGenAiSdkInstrumentor().instrument()

client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

def test_model(model_name):
    print(f"\n{'='*60}")
    print(f"Model: {model_name}")
    print('='*60)

    exporter.clear()

    response = client.models.generate_content_stream(
        model=model_name,
        contents="Write a detailed essay about programming history."
    )

    # Consume the stream and track correct token counts
    correct_input_tokens = []
    correct_output_tokens = []
    for chunk in response:
        if hasattr(chunk, "usage_metadata") and chunk.usage_metadata:
            inp = getattr(chunk.usage_metadata, "prompt_token_count", None)
            out = getattr(chunk.usage_metadata, "candidates_token_count", None)
            if inp is not None:
                correct_input_tokens.append(inp)
            if out is not None:
                correct_output_tokens.append(out)

    # Get instrumented token counts from spans
    spans = exporter.get_finished_spans()
    input_tokens = output_tokens = None  # avoid NameError if no span matches
    for span in spans:
        if span.attributes:
            input_tokens = span.attributes.get("gen_ai.usage.input_tokens")
            output_tokens = span.attributes.get("gen_ai.usage.output_tokens")
            if input_tokens or output_tokens:
                print("Instrumented span reports:")
                print(f"  gen_ai.usage.input_tokens: {input_tokens}")
                print(f"  gen_ai.usage.output_tokens: {output_tokens}")

    if correct_input_tokens:
        print(f"\nCorrect input tokens: {correct_input_tokens[-1]}")
        if input_tokens and correct_input_tokens[-1]:
            print(f"Input overcounting factor: {input_tokens / correct_input_tokens[-1]:.1f}x")

    if correct_output_tokens:
        print(f"Correct output tokens: {correct_output_tokens[-1]}")
        if output_tokens and correct_output_tokens[-1]:
            print(f"Output overcounting factor: {output_tokens / correct_output_tokens[-1]:.1f}x")

test_model("gemini-2.0-flash")
test_model("gemini-3-pro-preview")

# Now patch the buggy code and re-test
print(f"\n{'='*60}")
print("Applying patch: changing += to =")
print('='*60)

from opentelemetry.instrumentation.google_genai import generate_content

helper_class = getattr(generate_content, "_GenerateContentInstrumentationHelper", None)
_get_response_property = getattr(generate_content, "_get_response_property", None)

def patched_maybe_update_token_counts(self, response):
    input_tokens = _get_response_property(response, "usage_metadata.prompt_token_count")
    output_tokens = _get_response_property(response, "usage_metadata.candidates_token_count")
    # FIX: Use = instead of +=
    if input_tokens and isinstance(input_tokens, int):
        self._input_tokens = input_tokens
    if output_tokens and isinstance(output_tokens, int):
        self._output_tokens = output_tokens

helper_class._maybe_update_token_counts = patched_maybe_update_token_counts

test_model("gemini-3-pro-preview")

Output (note: token counts will vary on each run):

============================================================
Model: gemini-2.0-flash
============================================================
Instrumented span reports:
  gen_ai.usage.input_tokens: 242
  gen_ai.usage.output_tokens: 1409

Correct input tokens: 8
Input overcounting factor: 30.2x
Correct output tokens: 1409
Output overcounting factor: 1.0x

============================================================
Model: gemini-3-pro-preview
============================================================
Instrumented span reports:
  gen_ai.usage.input_tokens: 522
  gen_ai.usage.output_tokens: 42447

Correct input tokens: 9
Input overcounting factor: 58.0x
Correct output tokens: 1414
Output overcounting factor: 30.0x

============================================================
Applying patch: changing += to =
============================================================

============================================================
Model: gemini-3-pro-preview
============================================================
Instrumented span reports:
  gen_ai.usage.input_tokens: 9
  gen_ai.usage.output_tokens: 1767

Correct input tokens: 9
Input overcounting factor: 1.0x
Correct output tokens: 1767
Output overcounting factor: 1.0x

Expected Result

Token counts should match the correct values from the API:

  • Input tokens: ~9 (for the test prompt)
  • Output tokens: ~1,400-1,800 (varies by run)

Actual Result

  • gemini-2.0-flash: Input tokens overcounted ~30x (242 vs 8), output tokens correct
  • gemini-3-pro-preview: Input tokens overcounted ~58x (522 vs 9), output tokens overcounted ~30x (42,447 vs 1,414)

After applying the patch (changing += to =), both input and output token counts are correct (1.0x).

Additional context

The bug is in generate_content.py:

def _maybe_update_token_counts(self, response: GenerateContentResponse):
    # ...
    if input_tokens and isinstance(input_tokens, int):
        self._input_tokens += input_tokens  # BUG: should be =
    if output_tokens and isinstance(output_tokens, int):
        self._output_tokens += output_tokens  # BUG: should be =

The fix is to use assignment (=) instead of accumulation (+=) since:

  • Input tokens are reported as the same constant in every chunk
  • Output tokens are reported as cumulative totals
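One plausible explanation for gemini-2.0-flash's output counts being unaffected (1.0x in the table above) is that only its final chunk carries candidates_token_count, so += happens to sum a single value. The sketch below uses hypothetical chunk values to check that plain assignment, combined with the existing truthiness/isinstance guard, stays correct for that reporting style too:

```python
# Hypothetical flash-style chunks: constant prompt count, output count
# present only in the final chunk (None elsewhere).
flash_chunks = [
    {"prompt_token_count": 8, "candidates_token_count": None},
    {"prompt_token_count": 8, "candidates_token_count": None},
    {"prompt_token_count": 8, "candidates_token_count": 1409},
]

in_tokens = out_tokens = 0
for ch in flash_chunks:
    inp = ch["prompt_token_count"]
    out = ch["candidates_token_count"]
    # Same guards as the instrumentation, but with = instead of +=:
    if inp and isinstance(inp, int):
        in_tokens = inp
    if out and isinstance(out, int):
        out_tokens = out

print(in_tokens, out_tokens)  # 8 1409 -> both correct
```

Because None values are skipped by the guard, assignment keeps the last reported value in both the cumulative-per-chunk and final-chunk-only styles.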

Would you like to implement a fix?

None


Labels: bug