feat: request lifecycle logging + metrics capture [DIS-1643]#7840
Conversation
WalkthroughThis PR threads request IDs through inflight guard creation across multiple LLM service endpoints (OpenAI, Anthropic, Tensor) and enhances the metrics infrastructure to track and log request lifecycle events with request context. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
lib/llm/src/http/service/metrics.rs (1)
1049-1083: Keep latency fields numeric in the structured logs.
elapsed_ms,ttft_ms, andavg_itl_msare currently going throughDisplay/Stringformatting, so the JSONL output will carry them as strings instead of numbers. That makes the new request-level latency fields much harder to query and aggregate downstream.Suggested fix
- let elapsed_ms = self.timer.elapsed().as_millis(); + let elapsed_ms = self.timer.elapsed().as_millis() as u64; ... - elapsed_ms = %elapsed_ms, + elapsed_ms = elapsed_ms, ... - elapsed_ms = %elapsed_ms, + elapsed_ms = elapsed_ms,if let Some(ttft_ms) = self.ttft_ms { - span.record("ttft_ms", format!("{:.2}", ttft_ms).as_str()); + let ttft_ms = (ttft_ms * 100.0).round() / 100.0; + span.record("ttft_ms", ttft_ms); } if self.itl_count > 0 { - let avg_ms = (self.itl_sum_secs / self.itl_count as f64) * 1000.0; - span.record("avg_itl_ms", format!("{:.2}", avg_ms).as_str()); + let avg_ms = + ((self.itl_sum_secs / self.itl_count as f64) * 1000.0 * 100.0).round() / 100.0; + span.record("avg_itl_ms", avg_ms); }Also applies to: 1392-1397
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/llm/src/http/service/metrics.rs` around lines 1049 - 1083, The latency fields are being logged with the `%` formatter which forces string/display formatting; change the tracing fields to pass the numeric values directly (e.g., use elapsed_ms = elapsed_ms instead of elapsed_ms = %elapsed_ms) so they are emitted as numbers in structured logs; apply the same change for ttft_ms and avg_itl_ms wherever they are logged (look for the local elapsed_ms variable computed from self.timer.elapsed().as_millis() and the ttft_ms/avg_itl_ms variables and update their uses in the tracing::error! / tracing::info! calls).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/llm/src/http/service/anthropic.rs`:
- Around line 308-313: The inflight guard creation
(state.metrics_clone().create_inflight_guard(...) assigned to inflight_guard)
must be moved earlier into the resolved-model block so it is created before
request translation and the call to engine.generate(); this ensures lifecycle
logs cover translation/setup failures and that elapsed_ms includes that
phase—create the guard there using the already-available request_id (use
request_id.clone()) and then reuse that inflight_guard variable in the current
location instead of creating it after translation/generation.
In `@lib/llm/src/http/service/disconnect.rs`:
- Around line 214-225: The tracing::warn call in the context.stopped() branch
currently logs "request cancelled by client" which misattributes timeouts;
update the log to a neutral message (e.g., "request cancelled" or "request
stopped") while keeping the existing fields and mark_error(ErrorType::Cancelled)
call intact so metrics still record cancellation—modify the string inside
tracing::warn in disconnect.rs (near inflight_guard.mark_error and
ErrorType::Cancelled) to a neutral wording.
In `@lib/llm/src/http/service/metrics.rs`:
- Around line 1053-1055: The current match arm for ErrorType::Cancelled sets
detail to "client disconnected before completion" which omits timeouts; update
the ErrorType::Cancelled match arm inside the detail = match self.error_type {
... } in metrics.rs (used for the emitted error_detail in completion logs) to a
string that covers both disconnects and timeouts (e.g., "client disconnected or
request timed out before completion" or similar) so timeout paths are labeled
correctly.
---
Nitpick comments:
In `@lib/llm/src/http/service/metrics.rs`:
- Around line 1049-1083: The latency fields are being logged with the `%`
formatter which forces string/display formatting; change the tracing fields to
pass the numeric values directly (e.g., use elapsed_ms = elapsed_ms instead of
elapsed_ms = %elapsed_ms) so they are emitted as numbers in structured logs;
apply the same change for ttft_ms and avg_itl_ms wherever they are logged (look
for the local elapsed_ms variable computed from self.timer.elapsed().as_millis()
and the ttft_ms/avg_itl_ms variables and update their uses in the
tracing::error! / tracing::info! calls).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4bbbfce2-9661-4082-b63e-98729a909979
📒 Files selected for processing (8)
lib/llm/src/grpc/service/openai.rslib/llm/src/grpc/service/tensor.rslib/llm/src/http/service/anthropic.rslib/llm/src/http/service/disconnect.rslib/llm/src/http/service/metrics.rslib/llm/src/http/service/openai.rslib/llm/tests/http_metrics.rslib/runtime/src/pipeline/network/ingress/push_handler.rs
|
Addressed CodeRabbit feedback: Fixed:
Pre-existing / follow-up:
|
e3f648f to
26dc587
Compare
|
Addressed @jh-nv feedback: #1 (metrics.rs:922): Fixed — #2 (metrics.rs:980): Fixed — removed the misleading comment. The #3 (metrics.rs:1294): #4 (push_handler.rs): Keeping "request received" — it's intentionally consistent with the frontend's lifecycle log pair ("request received" / "request completed") for end-to-end correlation across frontend and worker logs. |
|
Correction on #3: |
Add "request received" (INFO) and "request completed" (INFO/ERROR) logs to InflightGuard with structured fields (request_id, model, endpoint, request_type, status, elapsed_ms). Add cancellation WARN in disconnect monitor. Add worker lifecycle logs in push_handler. All call sites now pass request_id to create_inflight_guard. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Record input_tokens, output_tokens, ttft_ms, avg_itl_ms, prefill_worker_id, decode_worker_id on the enclosing tracing span via ResponseMetricCollector::Drop so they appear in JSONL logs alongside request lifecycle events. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Always record input_tokens/output_tokens on span (zero is meaningful, distinguishes "no tokens" from "field never set"). Change cancellation messages from "client disconnected" to "cancelled" since context.stopped() can also be triggered by server-side timeouts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Match the OpenAI handler pattern ("create inflight_guard early to ensure
all errors are counted"). Previously the guard was created after
engine.generate(), so backend failures were not tracked.
Closes #7843
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move "request received"/"request completed" logs from manual calls into RequestMetricsGuard via set_request_id() and Drop. This ensures "request completed" fires on all exit paths (errors, panics), matching the frontend's InflightGuard pattern. Closes #7844 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…omment Change create_inflight_guard request_id param from String to &str to avoid unnecessary clones at call sites. Remove misleading comment about span field inheritance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
0642ad8 to
a144102
Compare
a144102 to
7448dfa
Compare
Inject request-id into transport headers from the request context directly, independent of DistributedTraceIdLayer. This ensures worker lifecycle logs have request_id in both JSONL and READABLE modes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7448dfa to
c458b00
Compare
| ErrorType::NotImplemented => "requested feature not implemented", | ||
| ErrorType::None => "unknown error", | ||
| }; | ||
| tracing::error!( |
There was a problem hiding this comment.
Just a nit, canceled requests get logged at error! here, but cancellations are mostly just clients disconnecting or LB timeouts. So on production this might be excessive.
Just something to look for in the future -- too bad we don't dog food our own product at scale with real clients.
| @@ -998,26 +1032,56 @@ impl InflightGuard { | |||
| impl Drop for InflightGuard { | |||
| fn drop(&mut self) { | |||
| let duration = self.timer.elapsed().as_secs_f64(); | |||
There was a problem hiding this comment.
Just a very small thing... you have this again in L1048 (you're calling elapsed twice)
| .with_label_values(&[&self.model]) | ||
| .observe(duration); | ||
|
|
||
| let elapsed_ms = self.timer.elapsed().as_millis(); |
There was a problem hiding this comment.
Very small nit, elapsed is already called right after drop(), on the top of this method.
The images(), videos(), and video_stream() handlers were missing explicit inflight.mark_error() calls on their early-return error paths. While the InflightGuard RAII defaults to Status::Error on drop, it uses ErrorType::Internal for all cases, causing the video_stream() ctx.stopped() cancellation path to be misclassified (should be ErrorType::Cancelled) and structured lifecycle logging added by #7840 to not be triggered correctly. Apply the same pattern used in all LLM handlers: - images(): mark_error on engine.generate() failure and from_annotated_stream() failure - videos(): same two paths - video_stream(): mark_error(Internal) on engine.generate() failure, mark_error(Cancelled) in ctx.stopped() select arm, mark_error(Internal) in Response::builder() map_err Fixes #7645 Signed-off-by: Matej Kosec <mkosec@4u2g-0421.ipp3a2.colossus.nvidia.com> Signed-off-by: Matej Kosec <mkosec@nvidia.com>
The images(), videos(), and video_stream() handlers were missing explicit inflight.mark_error() calls on their early-return error paths. While the InflightGuard RAII defaults to Status::Error on drop, it uses ErrorType::Internal for all cases, causing the video_stream() ctx.stopped() cancellation path to be misclassified (should be ErrorType::Cancelled) and structured lifecycle logging added by #7840 to not be triggered correctly. Apply the same pattern used in all LLM handlers: - images(): mark_error on engine.generate() failure and from_annotated_stream() failure - videos(): same two paths - video_stream(): mark_error(Internal) on engine.generate() failure, mark_error(Cancelled) in ctx.stopped() select arm, mark_error(Internal) in Response::builder() map_err Fixes #7645 Signed-off-by: Matej Kosec <mkosec@nvidia.com>
The images(), videos(), and video_stream() handlers were missing explicit inflight.mark_error() calls on their early-return error paths. While the InflightGuard RAII defaults to Status::Error on drop, it uses ErrorType::Internal for all cases, causing the video_stream() ctx.stopped() cancellation path to be misclassified (should be ErrorType::Cancelled) and structured lifecycle logging added by #7840 to not be triggered correctly. Apply the same pattern used in all LLM handlers: - images(): mark_error on engine.generate() failure and from_annotated_stream() failure - videos(): same two paths - video_stream(): mark_error(Internal) on engine.generate() failure, mark_error(Cancelled) in ctx.stopped() select arm, mark_error(Internal) in Response::builder() map_err Fixes #7645 Signed-off-by: Matej Kosec <mkosec@nvidia.com>
The images(), videos(), and video_stream() handlers were missing explicit inflight.mark_error() calls on their early-return error paths. While the InflightGuard RAII defaults to Status::Error on drop, it uses ErrorType::Internal for all cases, causing the video_stream() ctx.stopped() cancellation path to be misclassified (should be ErrorType::Cancelled) and structured lifecycle logging added by #7840 to not be triggered correctly. Apply the same pattern used in all LLM handlers: - images(): mark_error on engine.generate() failure and from_annotated_stream() failure - videos(): same two paths - video_stream(): mark_error(Internal) on engine.generate() failure, mark_error(Cancelled) in ctx.stopped() select arm, mark_error(Internal) in Response::builder() map_err Fixes #7645 Signed-off-by: Matej Kosec <mkosec@4u2g-0421.ipp3a2.colossus.nvidia.com> Signed-off-by: Matej Kosec <mkosec@nvidia.com>
…mo#7840) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
request_idfield toInflightGuardwith structured "request received" (INFO) and "request completed" (INFO/ERROR) lifecycle logsDisplayimpls forRequestType/ErrorTypetrace\!towarn\!with structured fields indisconnect.rspush_handler.rscreate_inflight_guardcall sites now passrequest_idinput_tokens,output_tokens,ttft_ms,avg_itl_ms,prefill_worker_id,decode_worker_idon the enclosing tracing span viaResponseMetricCollector::DropTest plan
cargo clippy --workspace -- -D warningscleancargo fmt --checkcleanPart 2 of 3 for DIS-1643: Consistent Error Tracing
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes