Skip to content

feat: request lifecycle logging + metrics capture [DIS-1643]#7840

Merged
nnshah1 merged 7 commits into
mainfrom
nnshah1/DIS-1643-pr2-lifecycle-and-metrics
Apr 3, 2026
Merged

feat: request lifecycle logging + metrics capture [DIS-1643]#7840
nnshah1 merged 7 commits into
mainfrom
nnshah1/DIS-1643-pr2-lifecycle-and-metrics

Conversation

@nnshah1
Copy link
Copy Markdown
Contributor

@nnshah1 nnshah1 commented Apr 2, 2026

Summary

  • Add request_id field to InflightGuard with structured "request received" (INFO) and "request completed" (INFO/ERROR) lifecycle logs
  • Add accessor methods and Display impls for RequestType/ErrorType
  • Change cancellation logging from trace\! to warn\! with structured fields in disconnect.rs
  • Add worker lifecycle logs ("request received"/"request completed") in push_handler.rs
  • All create_inflight_guard call sites now pass request_id
  • Record input_tokens, output_tokens, ttft_ms, avg_itl_ms, prefill_worker_id, decode_worker_id on the enclosing tracing span via ResponseMetricCollector::Drop

Test plan

  • cargo clippy --workspace -- -D warnings clean
  • cargo fmt --check clean
  • CI checks pass
  • E2e tests pass (PR3/tests PR)

Part 2 of 3 for DIS-1643: Consistent Error Tracing

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Enhanced request lifecycle tracking with unique identifiers across all request types.
    • Added comprehensive request completion logging with timing and metric details.
  • Bug Fixes

    • Improved disconnect handling with elevated warning-level logging for unexpected client/stream closures.
    • Enhanced request metadata visibility in disconnect scenarios for better troubleshooting.

@nnshah1 nnshah1 requested a review from a team as a code owner April 2, 2026 23:02
@github-actions github-actions Bot added feat frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels Apr 2, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 2, 2026

Walkthrough

This PR threads request IDs through inflight guard creation across multiple LLM service endpoints (OpenAI, Anthropic, Tensor) and enhances the metrics infrastructure to track and log request lifecycle events with request context.

Changes

Cohort / File(s) Summary
Inflight Guard Initialization
lib/llm/src/grpc/service/openai.rs, lib/llm/src/grpc/service/tensor.rs, lib/llm/src/http/service/anthropic.rs, lib/llm/src/http/service/openai.rs
Updated create_inflight_guard calls to pass request_id as an additional parameter, threading request identifiers through inflight metrics tracking across multiple service endpoints.
Metrics Infrastructure
lib/llm/src/http/service/metrics.rs
Added request_id field to InflightGuard, exposed new public accessors (request_id, model, endpoint, request_type, error_type, elapsed_ms), enhanced request lifecycle logging with severity levels based on status, added ResponseMetricCollector per-request aggregation state tracking, and implemented Display trait for RequestType and ErrorType.
Disconnect & Request Lifecycle Logging
lib/llm/src/http/service/disconnect.rs
Elevated disconnect logging from trace! to warn! level and added detailed request metadata logging when streams stop, including request ID, model, endpoint, and elapsed time.
Test Updates
lib/llm/tests/http_metrics.rs
Updated test cases to pass request_id argument to create_inflight_guard calls.
Request Tracking Logging
lib/runtime/src/pipeline/network/ingress/push_handler.rs
Added tracing::info! log events for request received and request completed milestones in the ingress push handler.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main changes: adding request lifecycle logging and metrics capture across the codebase for DIS-1643.
Description check ✅ Passed The description includes all required template sections: Overview (Summary), Details (bullet points of changes), though 'Where should the reviewer start?' section is missing.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
lib/llm/src/http/service/metrics.rs (1)

1049-1083: Keep latency fields numeric in the structured logs.

elapsed_ms, ttft_ms, and avg_itl_ms are currently going through Display/String formatting, so the JSONL output will carry them as strings instead of numbers. That makes the new request-level latency fields much harder to query and aggregate downstream.

Suggested fix
-        let elapsed_ms = self.timer.elapsed().as_millis();
+        let elapsed_ms = self.timer.elapsed().as_millis() as u64;
...
-                    elapsed_ms = %elapsed_ms,
+                    elapsed_ms = elapsed_ms,
...
-                    elapsed_ms = %elapsed_ms,
+                    elapsed_ms = elapsed_ms,
         if let Some(ttft_ms) = self.ttft_ms {
-            span.record("ttft_ms", format!("{:.2}", ttft_ms).as_str());
+            let ttft_ms = (ttft_ms * 100.0).round() / 100.0;
+            span.record("ttft_ms", ttft_ms);
         }
         if self.itl_count > 0 {
-            let avg_ms = (self.itl_sum_secs / self.itl_count as f64) * 1000.0;
-            span.record("avg_itl_ms", format!("{:.2}", avg_ms).as_str());
+            let avg_ms =
+                ((self.itl_sum_secs / self.itl_count as f64) * 1000.0 * 100.0).round() / 100.0;
+            span.record("avg_itl_ms", avg_ms);
         }

Also applies to: 1392-1397

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/http/service/metrics.rs` around lines 1049 - 1083, The latency
fields are being logged with the `%` formatter which forces string/display
formatting; change the tracing fields to pass the numeric values directly (e.g.,
use elapsed_ms = elapsed_ms instead of elapsed_ms = %elapsed_ms) so they are
emitted as numbers in structured logs; apply the same change for ttft_ms and
avg_itl_ms wherever they are logged (look for the local elapsed_ms variable
computed from self.timer.elapsed().as_millis() and the ttft_ms/avg_itl_ms
variables and update their uses in the tracing::error! / tracing::info! calls).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/llm/src/http/service/anthropic.rs`:
- Around line 308-313: The inflight guard creation
(state.metrics_clone().create_inflight_guard(...) assigned to inflight_guard)
must be moved earlier into the resolved-model block so it is created before
request translation and the call to engine.generate(); this ensures lifecycle
logs cover translation/setup failures and that elapsed_ms includes that
phase—create the guard there using the already-available request_id (use
request_id.clone()) and then reuse that inflight_guard variable in the current
location instead of creating it after translation/generation.

In `@lib/llm/src/http/service/disconnect.rs`:
- Around line 214-225: The tracing::warn call in the context.stopped() branch
currently logs "request cancelled by client" which misattributes timeouts;
update the log to a neutral message (e.g., "request cancelled" or "request
stopped") while keeping the existing fields and mark_error(ErrorType::Cancelled)
call intact so metrics still record cancellation—modify the string inside
tracing::warn in disconnect.rs (near inflight_guard.mark_error and
ErrorType::Cancelled) to a neutral wording.

In `@lib/llm/src/http/service/metrics.rs`:
- Around line 1053-1055: The current match arm for ErrorType::Cancelled sets
detail to "client disconnected before completion" which omits timeouts; update
the ErrorType::Cancelled match arm inside the detail = match self.error_type {
... } in metrics.rs (used for the emitted error_detail in completion logs) to a
string that covers both disconnects and timeouts (e.g., "client disconnected or
request timed out before completion" or similar) so timeout paths are labeled
correctly.

---

Nitpick comments:
In `@lib/llm/src/http/service/metrics.rs`:
- Around line 1049-1083: The latency fields are being logged with the `%`
formatter which forces string/display formatting; change the tracing fields to
pass the numeric values directly (e.g., use elapsed_ms = elapsed_ms instead of
elapsed_ms = %elapsed_ms) so they are emitted as numbers in structured logs;
apply the same change for ttft_ms and avg_itl_ms wherever they are logged (look
for the local elapsed_ms variable computed from self.timer.elapsed().as_millis()
and the ttft_ms/avg_itl_ms variables and update their uses in the
tracing::error! / tracing::info! calls).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4bbbfce2-9661-4082-b63e-98729a909979

📥 Commits

Reviewing files that changed from the base of the PR and between c376655 and 5849668.

📒 Files selected for processing (8)
  • lib/llm/src/grpc/service/openai.rs
  • lib/llm/src/grpc/service/tensor.rs
  • lib/llm/src/http/service/anthropic.rs
  • lib/llm/src/http/service/disconnect.rs
  • lib/llm/src/http/service/metrics.rs
  • lib/llm/src/http/service/openai.rs
  • lib/llm/tests/http_metrics.rs
  • lib/runtime/src/pipeline/network/ingress/push_handler.rs

Comment thread lib/llm/src/http/service/anthropic.rs Outdated
Comment thread lib/llm/src/http/service/disconnect.rs Outdated
Comment thread lib/llm/src/http/service/metrics.rs
Comment thread lib/runtime/src/pipeline/network/ingress/push_handler.rs Outdated
@nnshah1
Copy link
Copy Markdown
Contributor Author

nnshah1 commented Apr 2, 2026

Addressed CodeRabbit feedback:

Fixed:

  • disconnect.rs / metrics.rs wording: Changed "client disconnected" → "cancelled before completion" and "request cancelled by client" → "request cancelled" since context.stopped() covers both client disconnects and server-side timeouts.
  • metrics.rs token counts: Always record input_tokens/output_tokens on span (zero is meaningful — distinguishes "no tokens generated" from "field never set").

Pre-existing / follow-up:

  • anthropic.rs InflightGuard placement: The guard position is pre-existing behavior, same pattern as all OpenAI handlers. We only added request_id to the existing call site. Moving the guard earlier is a separate refactor.
  • push_handler.rs lifecycle alignment: The "request received"/"request completed" pairing not covering all error paths is a known gap. Filed as follow-up.

Comment thread lib/llm/src/http/service/metrics.rs Outdated
Comment thread lib/llm/src/http/service/metrics.rs Outdated
Comment thread lib/llm/src/http/service/metrics.rs
Comment thread lib/runtime/src/pipeline/network/ingress/push_handler.rs Outdated
@nnshah1 nnshah1 force-pushed the nnshah1/DIS-1643-pr2-lifecycle-and-metrics branch from e3f648f to 26dc587 Compare April 3, 2026 00:07
@nnshah1
Copy link
Copy Markdown
Contributor Author

nnshah1 commented Apr 3, 2026

Addressed @jh-nv feedback:

#1 (metrics.rs:922): Fixed — create_inflight_guard now takes request_id: &str instead of String. Call sites use &request_id or request.id() directly, no more clones.

#2 (metrics.rs:980): Fixed — removed the misleading comment. The span.record("model", ...) sets the span field; the info\! log explicitly includes model as a field for that specific event.

#3 (metrics.rs:1294): as_secs_f64() * 1000.0 gives fractional millisecond precision (e.g. 12.34ms), while as_millis() truncates to integer (e.g. 12ms).

#4 (push_handler.rs): Keeping "request received" — it's intentionally consistent with the frontend's lifecycle log pair ("request received" / "request completed") for end-to-end correlation across frontend and worker logs.

@nnshah1
Copy link
Copy Markdown
Contributor Author

nnshah1 commented Apr 3, 2026

Correction on #3: as_secs_f64() * 1000.0 preserves sub-millisecond precision (e.g. 12.34ms formatted to 2 decimal places on the span), while as_millis() truncates to integer (12ms). This is consistent with how the Prometheus histograms observe duration via as_secs_f64(). For latency-sensitive metrics like TTFT and ITL, the fractional precision is worth keeping.

@nnshah1 nnshah1 enabled auto-merge (squash) April 3, 2026 00:40
nnshah1 and others added 4 commits April 2, 2026 17:44
Add "request received" (INFO) and "request completed" (INFO/ERROR) logs
to InflightGuard with structured fields (request_id, model, endpoint,
request_type, status, elapsed_ms). Add cancellation WARN in disconnect
monitor. Add worker lifecycle logs in push_handler. All call sites now
pass request_id to create_inflight_guard.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Record input_tokens, output_tokens, ttft_ms, avg_itl_ms,
prefill_worker_id, decode_worker_id on the enclosing tracing span via
ResponseMetricCollector::Drop so they appear in JSONL logs alongside
request lifecycle events.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Always record input_tokens/output_tokens on span (zero is meaningful,
distinguishes "no tokens" from "field never set"). Change cancellation
messages from "client disconnected" to "cancelled" since context.stopped()
can also be triggered by server-side timeouts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Match the OpenAI handler pattern ("create inflight_guard early to ensure
all errors are counted"). Previously the guard was created after
engine.generate(), so backend failures were not tracked.

Closes #7843

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
nnshah1 and others added 2 commits April 2, 2026 17:44
Move "request received"/"request completed" logs from manual calls into
RequestMetricsGuard via set_request_id() and Drop. This ensures
"request completed" fires on all exit paths (errors, panics), matching
the frontend's InflightGuard pattern.

Closes #7844

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…omment

Change create_inflight_guard request_id param from String to &str to
avoid unnecessary clones at call sites. Remove misleading comment about
span field inheritance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@nnshah1 nnshah1 force-pushed the nnshah1/DIS-1643-pr2-lifecycle-and-metrics branch from 0642ad8 to a144102 Compare April 3, 2026 00:44
@nnshah1 nnshah1 force-pushed the nnshah1/DIS-1643-pr2-lifecycle-and-metrics branch from a144102 to 7448dfa Compare April 3, 2026 00:51
Inject request-id into transport headers from the request context
directly, independent of DistributedTraceIdLayer. This ensures worker
lifecycle logs have request_id in both JSONL and READABLE modes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@nnshah1 nnshah1 force-pushed the nnshah1/DIS-1643-pr2-lifecycle-and-metrics branch from 7448dfa to c458b00 Compare April 3, 2026 00:54
@nnshah1 nnshah1 merged commit 679d9f1 into main Apr 3, 2026
95 checks passed
@nnshah1 nnshah1 deleted the nnshah1/DIS-1643-pr2-lifecycle-and-metrics branch April 3, 2026 01:38
ErrorType::NotImplemented => "requested feature not implemented",
ErrorType::None => "unknown error",
};
tracing::error!(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nit, canceled requests get logged at error! here, but cancellations are mostly just clients disconnecting or LB timeouts. So on production this might be excessive.

Just something to look for in the future -- too bad we don't dog food our own product at scale with real clients.

@@ -998,26 +1032,56 @@ impl InflightGuard {
impl Drop for InflightGuard {
fn drop(&mut self) {
let duration = self.timer.elapsed().as_secs_f64();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a very small thing... you have this again in L1048 (you're calling elapsed twice)

.with_label_values(&[&self.model])
.observe(duration);

let elapsed_ms = self.timer.elapsed().as_millis();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very small nit, elapsed is already called right after drop(), on the top of this method.

MatejKosec pushed a commit that referenced this pull request Apr 6, 2026
The images(), videos(), and video_stream() handlers were missing
explicit inflight.mark_error() calls on their early-return error paths.
While the InflightGuard RAII defaults to Status::Error on drop, it uses
ErrorType::Internal for all cases, causing the video_stream()
ctx.stopped() cancellation path to be misclassified (should be
ErrorType::Cancelled) and structured lifecycle logging added by #7840 to
not be triggered correctly.

Apply the same pattern used in all LLM handlers:
- images(): mark_error on engine.generate() failure and from_annotated_stream() failure
- videos(): same two paths
- video_stream(): mark_error(Internal) on engine.generate() failure,
  mark_error(Cancelled) in ctx.stopped() select arm,
  mark_error(Internal) in Response::builder() map_err

Fixes #7645

Signed-off-by: Matej Kosec <mkosec@4u2g-0421.ipp3a2.colossus.nvidia.com>
Signed-off-by: Matej Kosec <mkosec@nvidia.com>
MatejKosec added a commit that referenced this pull request Apr 6, 2026
The images(), videos(), and video_stream() handlers were missing
explicit inflight.mark_error() calls on their early-return error paths.
While the InflightGuard RAII defaults to Status::Error on drop, it uses
ErrorType::Internal for all cases, causing the video_stream()
ctx.stopped() cancellation path to be misclassified (should be
ErrorType::Cancelled) and structured lifecycle logging added by #7840 to
not be triggered correctly.

Apply the same pattern used in all LLM handlers:
- images(): mark_error on engine.generate() failure and from_annotated_stream() failure
- videos(): same two paths
- video_stream(): mark_error(Internal) on engine.generate() failure,
  mark_error(Cancelled) in ctx.stopped() select arm,
  mark_error(Internal) in Response::builder() map_err

Fixes #7645

Signed-off-by: Matej Kosec <mkosec@nvidia.com>
MatejKosec added a commit that referenced this pull request Apr 6, 2026
The images(), videos(), and video_stream() handlers were missing
explicit inflight.mark_error() calls on their early-return error paths.
While the InflightGuard RAII defaults to Status::Error on drop, it uses
ErrorType::Internal for all cases, causing the video_stream()
ctx.stopped() cancellation path to be misclassified (should be
ErrorType::Cancelled) and structured lifecycle logging added by #7840 to
not be triggered correctly.

Apply the same pattern used in all LLM handlers:
- images(): mark_error on engine.generate() failure and from_annotated_stream() failure
- videos(): same two paths
- video_stream(): mark_error(Internal) on engine.generate() failure,
  mark_error(Cancelled) in ctx.stopped() select arm,
  mark_error(Internal) in Response::builder() map_err

Fixes #7645

Signed-off-by: Matej Kosec <mkosec@nvidia.com>
MatejKosec pushed a commit that referenced this pull request Apr 6, 2026
The images(), videos(), and video_stream() handlers were missing
explicit inflight.mark_error() calls on their early-return error paths.
While the InflightGuard RAII defaults to Status::Error on drop, it uses
ErrorType::Internal for all cases, causing the video_stream()
ctx.stopped() cancellation path to be misclassified (should be
ErrorType::Cancelled) and structured lifecycle logging added by #7840 to
not be triggered correctly.

Apply the same pattern used in all LLM handlers:
- images(): mark_error on engine.generate() failure and from_annotated_stream() failure
- videos(): same two paths
- video_stream(): mark_error(Internal) on engine.generate() failure,
  mark_error(Cancelled) in ctx.stopped() select arm,
  mark_error(Internal) in Response::builder() map_err

Fixes #7645

Signed-off-by: Matej Kosec <mkosec@4u2g-0421.ipp3a2.colossus.nvidia.com>
Signed-off-by: Matej Kosec <mkosec@nvidia.com>
yao531441 pushed a commit to yao531441/dynamo that referenced this pull request May 13, 2026
…mo#7840)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feat frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants