feat(frontend): gate nvext response metadata behind extra_fields#8252
Conversation
WalkthroughThe changes introduce a granular response field selection system for NvExt responses. A new Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/components/frontend/nvext.md (1)
160-167:⚠️ Potential issue | 🟡 MinorDocument the
query_instance_idexception too.This section now implies
worker_idandtimingonly come back when explicitly requested viaextra_fields, but the implementation also auto-enablesworker_id,timing, andtoken_idsfor thequery_instance_idflow. That will mislead GAIE callers about the expected response shape.Suggested doc tweak
-When the client requests response metadata via `extra_fields`, the response includes an `nvext` object with the requested fields: +When the client requests response metadata via `extra_fields` — or uses the `query_instance_id` flow — the response includes an `nvext` object with the requested or automatically enabled fields: -| `worker_id` | `extra_fields: ["worker_id"]` | Prefill/decode worker IDs and data parallel ranks that processed the request. | -| `timing` | `extra_fields: ["timing"]` | Per-request timing information (TTFT, ITL, queue time, etc.). | +| `worker_id` | `extra_fields: ["worker_id"]` or automatic with `query_instance_id` | Prefill/decode worker IDs and data parallel ranks that processed the request. | +| `timing` | `extra_fields: ["timing"]` or automatic with `query_instance_id` | Per-request timing information (TTFT, ITL, queue time, etc.). | | `routed_experts` | `extra_fields: ["routed_experts"]` | Routed expert capture payload returned by SGLang-backed requests. | -| `token_ids` | Automatic (GAIE Stage 1) | Tokenized prompt for reuse in Stage 2 query-only mode. | +| `token_ids` | Automatic with `query_instance_id` (GAIE Stage 1) | Tokenized prompt for reuse in Stage 2 query-only mode. |🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/components/frontend/nvext.md` around lines 160 - 167, Update the nvext docs to document the query_instance_id exception: clarify that when a request is part of the query_instance_id flow the server automatically returns nvext.worker_id, nvext.timing and nvext.token_ids even if they are not listed in extra_fields; mention that token_ids auto-enable applies to GAIE Stage 1 (and is used for Stage 2 query-only mode); reference the nvext response object and the extra_fields parameter so readers know this behavior differs from normal extra_fields-driven inclusion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/llm/src/protocols/openai/nvext.rs`:
- Around line 323-329: The has_query_instance_id_annotation function currently
uses annotation.starts_with("query_instance_id") which is too permissive; change
the check in has_query_instance_id_annotation (and the annotations.iter().any
closure) to only accept an exact key match or the key followed immediately by
the expected delimiter (e.g., "query_instance_id" == annotation ||
annotation.starts_with("query_instance_id=") or other delimiter used by your
annotations format) so strings like "query_instance_identifier" do not match.
---
Outside diff comments:
In `@docs/components/frontend/nvext.md`:
- Around line 160-167: Update the nvext docs to document the query_instance_id
exception: clarify that when a request is part of the query_instance_id flow the
server automatically returns nvext.worker_id, nvext.timing and nvext.token_ids
even if they are not listed in extra_fields; mention that token_ids auto-enable
applies to GAIE Stage 1 (and is used for Stage 2 query-only mode); reference the
nvext response object and the extra_fields parameter so readers know this
behavior differs from normal extra_fields-driven inclusion.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: adaf0fc3-9d30-4416-9ff4-683f6081f6a8
📒 Files selected for processing (4)
docs/components/frontend/nvext.mdlib/llm/src/protocols/openai/chat_completions/delta.rslib/llm/src/protocols/openai/completions/delta.rslib/llm/src/protocols/openai/nvext.rs
|
/ok to test 4ad7ead |
|
/ok to test 51c4117 |
51c4117 to
dce56e0
Compare
|
Rebasing - trying to fix the DCO issue. |
b92a3f8 to
926bbbc
Compare
|
/ok to test 926bbbc |
926bbbc to
be79fc3
Compare
|
/ok to test be79fc3 |
Signed-off-by: AmeenP <ameenp360@gmail.com>
…tive-path tests Address review feedback on PR #8250: - Restore the pre-regression behavior where query_instance_id automatically enables timing in the response (was lost during the refactor to NvExtResponseFieldSelection). - Remove dead any() method that had no callers. - Add positive-path unit tests for each gated field: worker_id, timing, routed_experts, query_instance_id, and combined fields. - Update stale doc comment on NvExtResponse.timing to mention query_instance_id auto-enablement. - Use assert_eq! with struct literals in tests for consistency and more informative failure output.
Address two issues found by codex review:
HIGH: has_query_instance_id_annotation() used starts_with("query_instance_id")
which would match stray annotations like "query_instance_id_extra:foo" that
the router does not recognize. Tightened to starts_with("query_instance_id:")
to match the exact key:value format used by PreprocessedRequest::get_annotation_value
and the KvPushRouter query-only detection.
MEDIUM: timing was auto-enabled for query_instance_id, but the query-only
fast path returns LLMEngineOutput::default() with no finish_reason, and
timing is only emitted when finish_reason.is_some(). Reverted timing to
extra_fields-only opt-in since it can never be emitted on the query-only
path. Updated doc comments to accurately reflect this behavior.
Added test_nvext_response_field_selection_rejects_stray_annotation to
verify the tightened matching.
5463f96 to
17b5d23
Compare
Overview:
Fix regression where plain OpenAI-compatible requests leaked
nvextresponse metadata (worker_id,timing) by default. IntroduceNvExtResponseFieldSelectionto gate each response field independently behindextra_fieldsopt-in, while preserving thequery_instance_idGAIE exception.Before commit b2f7f22 (regression state) — plain request leaks nvext.timing:
// curl /v1/chat/completions (no nvext.extra_fields)
"nvext": { "timing": { "request_received_ms": 1776373206718, "total_time_ms": 10.862539 } }
After commit 51c4117 (HEAD, fixed) — plain request omits nvext, opt-in still works:
Details:
Root Cause: The February 4, 2026 per-worker metrics change made request tracking unconditional in the chat/completions delta generators. Response shaping still emitted
nvextwhenever tracker-backed metadata was present, so plain requests could leakworker_idandtimingdata by default.What Changed:
NvExtResponseFieldSelectioninnvext.rswithfrom_nvext()that walksextra_fieldsonce and maps"worker_id"/"timing"/"routed_experts"to independent flags.worker_id,timing,routed_experts, andtoken_idsinchoice_from_postprocessorfor both chat and text completions so each field requires explicit opt-in (or thequery_instance_idexception).query_instance_idexception: auto-enablesworker_id+token_ids.timingis not auto-enabled because the query-only fast path has nofinish_reasonchunk and timing is only emitted on the final chunk."query_instance_id:"key prefix, consistent withPreprocessedRequest::get_annotation_valueandKvPushRouter. Prevents stray annotations like"query_instance_id_extra:..."from enabling the exception.record_finish()unconditional so timing/ITL accounting and Prometheus metrics do not regress.routed_expertsas a thirdextra_fieldsvalue indocs/components/frontend/nvext.md.Tests:
nvext.rscover defaults, each individualextra_fieldsvalue, combined fields, thequery_instance_id:exception, and the stray-annotation negative case.choice_from_postprocessorin bothchat_completions/delta.rsandcompletions/delta.rs: plain request omitsnvext;extra_fields: ["timing"]emits onlytiming;query_instance_id:...emitsworker_id+token_ids(nottiming);extra_fields: ["routed_experts"]emits onlyrouted_experts.Impact:
/v1/chat/completionsand/v1/completionsrequests no longer returnnvextby default.nvext.worker_id,nvext.timing, andnvext.routed_expertsremain opt-in vianvext.extra_fields.query_instance_idbehavior is preserved forworker_id+token_ids;timingis now opt-in even in that flow.Where should the reviewer start?
lib/llm/src/protocols/openai/nvext.rs--NvExtResponseFieldSelection::from_nvextandhas_query_instance_id_annotation.lib/llm/src/protocols/openai/chat_completions/delta.rs-- response gating inchoice_from_postprocessorand the new positive/negative delta tests.lib/llm/src/protocols/openai/completions/delta.rs-- parallel gating and tests for the text-completions endpoint.Related Issues:
Validation:
cargo fmt -p dynamo-llmcargo test -p dynamo-llm --lib -- protocols::openai::nvext:: protocols::openai::chat_completions::delta::tests:: protocols::openai::completions::delta::tests::(23 tests pass)