
Conversation


@fredvollmer fredvollmer commented Jan 26, 2026

VertexAI supports some request options that users may want to enable on retried requests, but not on every request. For example, the X-Vertex-AI-LLM-Request-Type header can be set to "priority" to decrease the odds of receiving a 429. However, this option incurs 1.8x the standard cost, so it would be beneficial to enable it only on retries. I'm sure there are other scenarios as well, but this is the one my team is currently facing.
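A minimal sketch of the scenario above, assuming the PR's callable form of `http_options` (which receives the 1-indexed attempt number); a plain dict stands in for `google.genai`'s `types.HttpOptions` so the example runs standalone:

```python
# Sketch only: in the real plugin the factory would return a
# google.genai types.HttpOptions; a dict stands in here so the
# example has no external dependencies.
def http_options_for_attempt(attempt: int) -> dict:
    """Return per-attempt request options (attempt is 1-indexed)."""
    headers = {}
    if attempt > 1:
        # Pay the ~1.8x "priority" premium only on retries,
        # i.e. after the first attempt likely hit a 429.
        headers["X-Vertex-AI-LLM-Request-Type"] = "priority"
    return {"headers": headers}

print(http_options_for_attempt(1))  # {'headers': {}}
print(http_options_for_attempt(2)["headers"])
# {'X-Vertex-AI-LLM-Request-Type': 'priority'}
```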

Summary by CodeRabbit

  • New Features

    • HTTP options can now be provided as a callable to customize request options (headers, timeouts) per retry attempt.
    • Public constructor accepts either static HTTP options or a per-attempt callable for dynamic request configuration.
  • Refactor

    • Streaming retry logic now records the current attempt number to enable per-attempt behavior.
  • Tests

    • Added comprehensive tests covering static and per-attempt HTTP options, retry behavior, streaming, and attempt-tracking.



coderabbitai bot commented Jan 26, 2026

📝 Walkthrough

Added per-attempt tracking to LLMStream via a private _attempt_number that is updated on each retry, and extended the Google LLM plugin to accept either a static http_options value or a per-attempt callable, resolved using the current 1-indexed attempt number. Tests were added to validate propagation and retry behavior.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Attempt tracking**<br>`livekit-agents/livekit/agents/llm/llm.py` | Added a private `_attempt_number` attribute (initialized to 1) on `LLMStream`. In the `_main_task` retry loop, `_attempt_number = i + 1` is set at the start of each iteration. |
| **Per-attempt HTTP options (Google plugin)**<br>`livekit-plugins/livekit-plugins-google/.../livekit/plugins/google/llm.py` | `_LLMOptions.http_options` and `LLM.__init__` now accept either a static `types.HttpOptions` or a callable that receives the 1-indexed attempt number and returns `types.HttpOptions`. |
| **Tests (new)**<br>`tests/test_google_llm_http_options.py` | New tests for static and callable `http_options`, factory invocation with 1-indexed attempts, header/timeout propagation, retry scenarios (including 429), streaming mocks, and verification of `_attempt_number` on streams. |
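The two changes above can be sketched together: the stream records the 1-indexed attempt at the top of its retry loop, and the plugin resolves `http_options` by calling it when it is callable. This is an illustrative stand-in, not the actual LiveKit code; dicts substitute for `types.HttpOptions`, and all names except `_attempt_number` are hypothetical:

```python
from typing import Callable, Union

HttpOptions = dict  # stand-in for google.genai types.HttpOptions


def resolve_http_options(
    http_options: Union[HttpOptions, Callable[[int], HttpOptions]],
    attempt_number: int,
) -> HttpOptions:
    # Callable form receives the current 1-indexed attempt;
    # a static value passes through unchanged.
    if callable(http_options):
        return http_options(attempt_number)
    return http_options


class FakeStream:
    def __init__(self) -> None:
        self._attempt_number = 1  # mirrors the new LLMStream default

    def run_with_retries(self, max_retry: int, request):
        for i in range(max_retry + 1):
            self._attempt_number = i + 1  # set at each iteration start
            try:
                return request(self._attempt_number)
            except ConnectionError:
                continue
        raise RuntimeError("all attempts failed")


seen = []


def request(attempt: int) -> str:
    opts = resolve_http_options(lambda n: {"timeout": 4000 + n * 1000}, attempt)
    seen.append(opts["timeout"])
    if attempt < 3:
        raise ConnectionError  # force two retries
    return "ok"


assert FakeStream().run_with_retries(max_retry=3, request=request) == "ok"
print(seen)  # [5000, 6000, 7000]
```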

Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Stream as LLMStream
    participant Plugin as GoogleLLM
    participant API as GoogleAPI
    rect rgba(200,200,255,0.5)
        Stream->>Stream: start _main_task retry loop<br/>set _attempt_number = i + 1
    end
    rect rgba(200,255,200,0.5)
        Stream->>Plugin: request completion (includes attempt context)
        Plugin->>Plugin: resolve http_options<br/>(if callable) call with attempt
        Plugin->>API: send request with resolved http_options
        API-->>Plugin: response (200 / 429 / stream chunks)
        alt 429 or retryable
            Plugin-->>Stream: indicate retry needed
            Stream->>Stream: increment retry index -> repeat
        else success
            API-->>Stream: stream data / final result
        end
    end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰
I count each hop — one, two — _attempt_number in stride,
Per-attempt headers and timeouts walk by my side.
Static or callable, each retry gets a say,
I stream, I retry, and nibble bugs away. ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 55.56%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title clearly and concisely summarizes the main change: adding an http_options factory feature to the Google plugins. It accurately reflects the primary objective of allowing selective per-attempt request options configuration. |



📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 26fdf36 and 90bf198.

📒 Files selected for processing (1)
  • tests/test_google_llm_http_options.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings

Files:

  • tests/test_google_llm_http_options.py
🧬 Code graph analysis (1)
tests/test_google_llm_http_options.py (3)
livekit-agents/livekit/agents/_exceptions.py (1)
  • APIStatusError (45-81)
livekit-agents/livekit/agents/types.py (1)
  • APIConnectOptions (54-88)
livekit-plugins/livekit-plugins-google/livekit/plugins/google/llm.py (2)
  • LLM (95-391)
  • model (229-230)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: unit-tests
  • GitHub Check: type-check (3.13)
  • GitHub Check: type-check (3.9)
🔇 Additional comments (15)
tests/test_google_llm_http_options.py (15)

1-15: Clean module setup and imports.
Everything necessary for the test suite is in place, with no obvious lint or style issues.


20-33: Static http_options constructor coverage looks good.
Clear, focused assertions on storage and callability.


34-48: Callable http_options acceptance is well-tested.
The factory wiring is exercised directly and succinctly.


50-69: Factory output behavior checks are solid.
The 1‑indexed attempt assumptions are explicitly validated.


75-88: Attempt recording test is straightforward and clear.
Good minimal verification of 1‑indexed semantics.


90-114: Header mutation by attempt is covered well.
The priority header logic mirrors the intended behavior.


115-125: Timeout progression test is concise and correct.
Matches the intended “increase per retry” semantics.


128-139: Mock response construction is clear and sufficient.
No concerns with the response shape for these tests.


181-211: Static http_options propagation test is solid.
Captures config and validates header forwarding clearly.


212-242: Attempt=1 invocation behavior is well-validated.
Simple and accurate coverage of first attempt semantics.


244-294: Retry behavior and priority header propagation are covered thoroughly.
Good verification of both attempt numbers and headers on retry.


295-327: Static http_options reuse across retries is validated.
Matches expected behavior without extra complexity.


333-359: Attempt number initialization test looks good.
Directly verifies the new _attempt_number default.


360-385: Stream subclass attribute existence test is solid.
Ensures the base contract is visible across implementations.


142-175: Mock generate_content_stream signature is correct and matches production API usage.

The current implementation correctly matches how the production code consumes this API. In llm.py:451-459, the code pattern is stream = await self._client.aio.models.generate_content_stream(...); async for response in stream:, which requires generate_content_stream to be an async function that returns an async generator object when awaited. The mock does exactly this—it's an async function that returns an async generator object. The suggested change would make the mock an async generator function itself, which would break the production code since it wouldn't be awaitable.

Likely an incorrect or invalid review comment.
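The shape the comment describes (an async function that, when awaited, returns an async generator object) can be sketched as a self-contained mock; `MagicMock` chunks stand in for Google response objects:

```python
import asyncio
from unittest.mock import MagicMock


async def fake_generate_content_stream(*args, **kwargs):
    # Async *function*, so callers can `await` it; the awaited result is an
    # async generator object, matching the production pattern
    #   stream = await client.aio.models.generate_content_stream(...)
    #   async for response in stream: ...
    async def chunks():
        yield MagicMock(text="hello")
        yield MagicMock(text=" world")

    return chunks()


async def consume():
    stream = await fake_generate_content_stream()
    return [chunk.text async for chunk in stream]


print(asyncio.run(consume()))  # ['hello', ' world']
```

Making the mock an async generator function directly would break the `await` step, since calling an async generator function returns a non-awaitable generator object.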





@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@tests/test_google_llm_http_options.py`:
- Around line 5-6: Remove the unused AsyncMock import to satisfy ruff F401: edit
the import line in tests/test_google_llm_http_options.py so it only imports the
used symbols (e.g., replace "from unittest.mock import AsyncMock, MagicMock,
patch" with "from unittest.mock import MagicMock, patch"), ensuring AsyncMock is
no longer referenced in the file.
- Around line 53-55: The test currently assigns a lambda to factory which
triggers ruff E731; replace the lambda with a regular function definition named
factory that accepts attempt and returns types.HttpOptions(timeout=4000 +
attempt * 1000). Update any references to the existing factory variable to use
this new def factory(attempt) function so ruff E731 is resolved while preserving
the same behavior and the use of types.HttpOptions.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ddc52c7 and 2287f7f.

📒 Files selected for processing (2)
  • livekit-plugins/livekit-plugins-google/livekit/plugins/google/llm.py
  • tests/test_google_llm_http_options.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings

Files:

  • livekit-plugins/livekit-plugins-google/livekit/plugins/google/llm.py
  • tests/test_google_llm_http_options.py
🧬 Code graph analysis (2)
livekit-plugins/livekit-plugins-google/livekit/plugins/google/llm.py (1)
livekit-agents/livekit/agents/utils/misc.py (1)
  • is_given (25-26)
tests/test_google_llm_http_options.py (4)
livekit-agents/livekit/agents/_exceptions.py (1)
  • APIStatusError (45-81)
livekit-agents/livekit/agents/llm/chat_context.py (2)
  • ChatContext (218-656)
  • add_message (234-267)
livekit-agents/livekit/agents/types.py (1)
  • APIConnectOptions (54-88)
tests/fake_llm.py (3)
  • FakeLLM (45-67)
  • FakeLLMResponse (28-42)
  • FakeLLMStream (70-136)
🪛 GitHub Actions: CI
tests/test_google_llm_http_options.py

[error] 6-6: F401 'AsyncMock' imported but unused. (unittest.mock.AsyncMock)

🪛 GitHub Check: ruff
tests/test_google_llm_http_options.py

[failure] 53-55: Ruff (E731)
tests/test_google_llm_http_options.py:53:9: E731 Do not assign a lambda expression, use a def


[failure] 6-6: Ruff (F401)
tests/test_google_llm_http_options.py:6:27: F401 unittest.mock.AsyncMock imported but unused

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: unit-tests
🔇 Additional comments (7)
livekit-plugins/livekit-plugins-google/livekit/plugins/google/llm.py (2)

21-21: Nice API flexibility for per-attempt http_options.
Type updates and docstrings make the factory option clear and discoverable.

Also applies to: 80-80, 116-118, 148-148


431-438: Per-attempt resolution + fallback looks solid.
Default timeout fallback and header injection are straightforward and consistent.

tests/test_google_llm_http_options.py (5)

20-49: Good coverage for static vs callable http_options.
The polymorphism tests clearly validate static, callable, and 1‑indexed attempt behavior.

Also applies to: 57-69


72-126: Factory behavior tests are clear and targeted.
Attempt numbering, header variance, and timeout growth are well exercised.


128-175: Mock stream helpers are clean and reusable.
The simulated response/429 flow reads clearly and supports the integration tests well.


178-332: Integration coverage for retry + header propagation looks strong.
Static and callable paths are both exercised with good assertions on headers and attempts.


334-381: Attempt-number assertions add good safety.
These checks ensure the retry tracking contract holds across base and subclass streams.


@fredvollmer (author) commented:

I now realize that a FallbackAdapter can largely handle this use case. I'll leave this open in case the maintainers feel there's a use for it, but feel free to close as well.
