feat: request lifecycle logging + metrics capture [DIS-1643] by nnshah1 · Pull Request #7840 · ai-dynamo/dynamo

nnshah1 · 2026-04-02T23:02:01Z

Summary

Add request_id field to InflightGuard with structured "request received" (INFO) and "request completed" (INFO/ERROR) lifecycle logs
Add accessor methods and Display impls for RequestType/ErrorType
Change cancellation logging from trace! to warn! with structured fields in disconnect.rs
Add worker lifecycle logs ("request received"/"request completed") in push_handler.rs
All create_inflight_guard call sites now pass request_id
Record input_tokens, output_tokens, ttft_ms, avg_itl_ms, prefill_worker_id, decode_worker_id on the enclosing tracing span via ResponseMetricCollector::Drop
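
The guard-based lifecycle logging listed above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the code in metrics.rs: `LifecycleGuard`, `mark_ok`, `run_request`, and the in-memory `LogSink` are hypothetical stand-ins for InflightGuard and the tracing macros, and the log line format is invented for the example.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::time::Instant;

/// Hypothetical sink standing in for a tracing subscriber.
type LogSink = Rc<RefCell<Vec<String>>>;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Status {
    Success,
    Error,
}

/// Simplified stand-in for InflightGuard: logs on creation and on drop.
struct LifecycleGuard {
    request_id: String,
    status: Status,
    timer: Instant,
    sink: LogSink,
}

impl LifecycleGuard {
    fn new(request_id: &str, sink: LogSink) -> Self {
        sink.borrow_mut()
            .push(format!("INFO request received request_id={request_id}"));
        Self {
            request_id: request_id.to_string(),
            // Default to Error so early returns still count as failures.
            status: Status::Error,
            timer: Instant::now(),
            sink,
        }
    }

    fn mark_ok(&mut self) {
        self.status = Status::Success;
    }
}

impl Drop for LifecycleGuard {
    fn drop(&mut self) {
        // Read the timer once and reuse the value.
        let elapsed_ms = self.timer.elapsed().as_millis() as u64;
        let level = match self.status {
            Status::Success => "INFO",
            Status::Error => "ERROR",
        };
        self.sink.borrow_mut().push(format!(
            "{level} request completed request_id={} elapsed_ms={elapsed_ms}",
            self.request_id
        ));
    }
}

/// Runs one fake request and returns the log lines it produced.
fn run_request(request_id: &str, succeed: bool) -> Vec<String> {
    let sink: LogSink = Rc::new(RefCell::new(Vec::new()));
    {
        let mut guard = LifecycleGuard::new(request_id, Rc::clone(&sink));
        if succeed {
            guard.mark_ok();
        }
        // The guard drops here, so the completion line is emitted on every exit path.
    }
    let lines = sink.borrow().clone();
    lines
}
```

Calling `run_request("req-1", true)` yields a "request received" line followed by an INFO "request completed" line; passing `false` yields an ERROR completion line instead, because the guard was never marked successful.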

Test plan

cargo clippy --workspace -- -D warnings clean
cargo fmt --check clean
CI checks pass
E2e tests pass (PR3/tests PR)

Part 2 of 3 for DIS-1643: Consistent Error Tracing

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Enhanced request lifecycle tracking with unique identifiers across all request types.
- Added comprehensive request completion logging with timing and metric details.
Bug Fixes
- Improved disconnect handling with elevated warning-level logging for unexpected client/stream closures.
- Enhanced request metadata visibility in disconnect scenarios for better troubleshooting.

coderabbitai · 2026-04-02T23:12:52Z

Walkthrough

This PR threads request IDs through inflight guard creation across multiple LLM service endpoints (OpenAI, Anthropic, Tensor) and enhances the metrics infrastructure to track and log request lifecycle events with request context.

Changes

Cohort / File(s)	Summary
Inflight Guard Initialization: `lib/llm/src/grpc/service/openai.rs`, `lib/llm/src/grpc/service/tensor.rs`, `lib/llm/src/http/service/anthropic.rs`, `lib/llm/src/http/service/openai.rs`	Updated `create_inflight_guard` calls to pass `request_id` as an additional parameter, threading request identifiers through inflight metrics tracking across multiple service endpoints.
Metrics Infrastructure: `lib/llm/src/http/service/metrics.rs`	Added `request_id` field to `InflightGuard`, exposed new public accessors (`request_id`, `model`, `endpoint`, `request_type`, `error_type`, `elapsed_ms`), enhanced request lifecycle logging with severity levels based on status, added `ResponseMetricCollector` per-request aggregation state tracking, and implemented `Display` trait for `RequestType` and `ErrorType`.
Disconnect & Request Lifecycle Logging: `lib/llm/src/http/service/disconnect.rs`	Elevated disconnect logging from `trace!` to `warn!` level and added detailed request metadata logging when streams stop, including request ID, model, endpoint, and elapsed time.
Test Updates: `lib/llm/tests/http_metrics.rs`	Updated test cases to pass `request_id` argument to `create_inflight_guard` calls.
Request Tracking Logging: `lib/runtime/src/pipeline/network/ingress/push_handler.rs`	Added `tracing::info!` log events for request received and request completed milestones in the ingress push handler.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main changes: adding request lifecycle logging and metrics capture across the codebase for DIS-1643.
Description check	✅ Passed	The description includes all required template sections: Overview (Summary), Details (bullet points of changes), though 'Where should the reviewer start?' section is missing.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

lib/llm/src/http/service/metrics.rs (1)

1049-1083: Keep latency fields numeric in the structured logs.

elapsed_ms, ttft_ms, and avg_itl_ms are currently going through Display/String formatting, so the JSONL output will carry them as strings instead of numbers. That makes the new request-level latency fields much harder to query and aggregate downstream.

Suggested fix

-        let elapsed_ms = self.timer.elapsed().as_millis();
+        let elapsed_ms = self.timer.elapsed().as_millis() as u64;
...
-                    elapsed_ms = %elapsed_ms,
+                    elapsed_ms = elapsed_ms,
...
-                    elapsed_ms = %elapsed_ms,
+                    elapsed_ms = elapsed_ms,

         if let Some(ttft_ms) = self.ttft_ms {
-            span.record("ttft_ms", format!("{:.2}", ttft_ms).as_str());
+            let ttft_ms = (ttft_ms * 100.0).round() / 100.0;
+            span.record("ttft_ms", ttft_ms);
         }
         if self.itl_count > 0 {
-            let avg_ms = (self.itl_sum_secs / self.itl_count as f64) * 1000.0;
-            span.record("avg_itl_ms", format!("{:.2}", avg_ms).as_str());
+            let avg_ms =
+                ((self.itl_sum_secs / self.itl_count as f64) * 1000.0 * 100.0).round() / 100.0;
+            span.record("avg_itl_ms", avg_ms);
         }

Also applies to: 1392-1397
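
The rounding step in the suggested fix can be tried in isolation. This is a minimal sketch, assuming the goal is a two-decimal value that stays an `f64` so a structured logger can emit it as a number; the helper names `round_ms` and `avg_itl_ms` are hypothetical and do not come from metrics.rs.

```rust
/// Hypothetical helper: round a millisecond value to two decimal places
/// while keeping it an f64 rather than formatting it into a string.
fn round_ms(value_ms: f64) -> f64 {
    (value_ms * 100.0).round() / 100.0
}

/// Average inter-token latency in milliseconds, or None when there are no samples.
fn avg_itl_ms(itl_sum_secs: f64, itl_count: u64) -> Option<f64> {
    if itl_count == 0 {
        return None;
    }
    Some(round_ms((itl_sum_secs / itl_count as f64) * 1000.0))
}
```

Compared with `format!("{:.2}", ...)`, the value keeps its numeric type, which is what allows downstream queries to aggregate it without parsing.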

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/http/service/metrics.rs` around lines 1049 - 1083, The latency
fields are being logged with the `%` formatter which forces string/display
formatting; change the tracing fields to pass the numeric values directly (e.g.,
use elapsed_ms = elapsed_ms instead of elapsed_ms = %elapsed_ms) so they are
emitted as numbers in structured logs; apply the same change for ttft_ms and
avg_itl_ms wherever they are logged (look for the local elapsed_ms variable
computed from self.timer.elapsed().as_millis() and the ttft_ms/avg_itl_ms
variables and update their uses in the tracing::error! / tracing::info! calls).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/llm/src/http/service/anthropic.rs`:
- Around line 308-313: The inflight guard creation
(state.metrics_clone().create_inflight_guard(...) assigned to inflight_guard)
must be moved earlier into the resolved-model block so it is created before
request translation and the call to engine.generate(); this ensures lifecycle
logs cover translation/setup failures and that elapsed_ms includes that
phase—create the guard there using the already-available request_id (use
request_id.clone()) and then reuse that inflight_guard variable in the current
location instead of creating it after translation/generation.

In `@lib/llm/src/http/service/disconnect.rs`:
- Around line 214-225: The tracing::warn call in the context.stopped() branch
currently logs "request cancelled by client" which misattributes timeouts;
update the log to a neutral message (e.g., "request cancelled" or "request
stopped") while keeping the existing fields and mark_error(ErrorType::Cancelled)
call intact so metrics still record cancellation—modify the string inside
tracing::warn in disconnect.rs (near inflight_guard.mark_error and
ErrorType::Cancelled) to a neutral wording.

In `@lib/llm/src/http/service/metrics.rs`:
- Around line 1053-1055: The current match arm for ErrorType::Cancelled sets
detail to "client disconnected before completion" which omits timeouts; update
the ErrorType::Cancelled match arm inside the detail = match self.error_type {
... } in metrics.rs (used for the emitted error_detail in completion logs) to a
string that covers both disconnects and timeouts (e.g., "client disconnected or
request timed out before completion" or similar) so timeout paths are labeled
correctly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4bbbfce2-9661-4082-b63e-98729a909979

📥 Commits

Reviewing files that changed from the base of the PR and between c376655 and 5849668.

📒 Files selected for processing (8)

lib/llm/src/grpc/service/openai.rs
lib/llm/src/grpc/service/tensor.rs
lib/llm/src/http/service/anthropic.rs
lib/llm/src/http/service/disconnect.rs
lib/llm/src/http/service/metrics.rs
lib/llm/src/http/service/openai.rs
lib/llm/tests/http_metrics.rs
lib/runtime/src/pipeline/network/ingress/push_handler.rs

nnshah1 · 2026-04-02T23:32:12Z

Addressed CodeRabbit feedback:

Fixed:

disconnect.rs / metrics.rs wording: Changed "client disconnected" → "cancelled before completion" and "request cancelled by client" → "request cancelled" since context.stopped() covers both client disconnects and server-side timeouts.
metrics.rs token counts: Always record input_tokens/output_tokens on span (zero is meaningful — distinguishes "no tokens generated" from "field never set").

Pre-existing / follow-up:

anthropic.rs InflightGuard placement: The guard position is pre-existing behavior, same pattern as all OpenAI handlers. We only added request_id to the existing call site. Moving the guard earlier is a separate refactor.
push_handler.rs lifecycle alignment: The "request received"/"request completed" pairing not covering all error paths is a known gap. Filed as follow-up.

nnshah1 · 2026-04-03T00:19:50Z

Addressed @jh-nv feedback:

#1 (metrics.rs:922): Fixed — create_inflight_guard now takes request_id: &str instead of String. Call sites use &request_id or request.id() directly, no more clones.

#2 (metrics.rs:980): Fixed — removed the misleading comment. The span.record("model", ...) sets the span field; the info! log explicitly includes model as a field for that specific event.

#3 (metrics.rs:1294): as_secs_f64() * 1000.0 gives fractional millisecond precision (e.g. 12.34ms), while as_millis() truncates to integer (e.g. 12ms).

#4 (push_handler.rs): Keeping "request received" — it's intentionally consistent with the frontend's lifecycle log pair ("request received" / "request completed") for end-to-end correlation across frontend and worker logs.
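
The signature change in #1 can be illustrated with a small sketch. `Guard` and `create_guard` are hypothetical stand-ins, not the real `create_inflight_guard`; the point is only that a `&str` parameter lets callers pass a borrowed id without cloning.

```rust
/// Hypothetical stand-in for the guard returned by create_inflight_guard.
struct Guard {
    request_id: String,
}

/// Borrowing the id leaves ownership with the caller; the one allocation
/// happens here, where the guard needs its own copy.
fn create_guard(request_id: &str) -> Guard {
    Guard {
        request_id: request_id.to_owned(),
    }
}
```

A caller holding a `String` passes `&id` and can keep using `id` afterwards, and a caller holding a `&str` passes it through unchanged.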

nnshah1 · 2026-04-03T00:21:08Z

Correction on #3: as_secs_f64() * 1000.0 preserves sub-millisecond precision (e.g. 12.34ms formatted to 2 decimal places on the span), while as_millis() truncates to integer (12ms). This is consistent with how the Prometheus histograms observe duration via as_secs_f64(). For latency-sensitive metrics like TTFT and ITL, the fractional precision is worth keeping.
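
The difference between the two conversions can be checked directly with `std::time::Duration`. The function names below are hypothetical wrappers added only to label the two calls.

```rust
use std::time::Duration;

/// Integer milliseconds: the fractional part is truncated.
fn truncated_ms(d: Duration) -> u128 {
    d.as_millis()
}

/// Fractional milliseconds: sub-millisecond precision is preserved.
fn fractional_ms(d: Duration) -> f64 {
    d.as_secs_f64() * 1000.0
}
```

For a duration of 12,340 microseconds, `truncated_ms` gives 12 while `fractional_ms` formatted with `{:.2}` gives `12.34`.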

Add "request received" (INFO) and "request completed" (INFO/ERROR) logs to InflightGuard with structured fields (request_id, model, endpoint, request_type, status, elapsed_ms). Add cancellation WARN in disconnect monitor. Add worker lifecycle logs in push_handler. All call sites now pass request_id to create_inflight_guard. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Record input_tokens, output_tokens, ttft_ms, avg_itl_ms, prefill_worker_id, decode_worker_id on the enclosing tracing span via ResponseMetricCollector::Drop so they appear in JSONL logs alongside request lifecycle events. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Always record input_tokens/output_tokens on span (zero is meaningful, distinguishes "no tokens" from "field never set"). Change cancellation messages from "client disconnected" to "cancelled" since context.stopped() can also be triggered by server-side timeouts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Match the OpenAI handler pattern ("create inflight_guard early to ensure all errors are counted"). Previously the guard was created after engine.generate(), so backend failures were not tracked. Closes #7843 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move "request received"/"request completed" logs from manual calls into RequestMetricsGuard via set_request_id() and Drop. This ensures "request completed" fires on all exit paths (errors, panics), matching the frontend's InflightGuard pattern. Closes #7844 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…omment Change create_inflight_guard request_id param from String to &str to avoid unnecessary clones at call sites. Remove misleading comment about span field inheritance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Inject request-id into transport headers from the request context directly, independent of DistributedTraceIdLayer. This ensures worker lifecycle logs have request_id in both JSONL and READABLE modes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

keivenchang · 2026-04-03T01:40:30Z

+                    ErrorType::NotImplemented => "requested feature not implemented",
+                    ErrorType::None => "unknown error",
+                };
+                tracing::error!(


Just a nit, canceled requests get logged at error! here, but cancellations are mostly just clients disconnecting or LB timeouts. So on production this might be excessive.

Just something to look for in the future -- too bad we don't dog food our own product at scale with real clients.

keivenchang · 2026-04-03T01:41:29Z

@@ -998,26 +1032,56 @@ impl InflightGuard {
 impl Drop for InflightGuard {
    fn drop(&mut self) {
        let duration = self.timer.elapsed().as_secs_f64();


Just a very small thing... you have this again in L1048 (you're calling elapsed twice)

keivenchang · 2026-04-03T01:42:01Z

            .with_label_values(&[&self.model])
            .observe(duration);
+
+        let elapsed_ms = self.timer.elapsed().as_millis();


Very small nit, elapsed is already called right after drop(), on the top of this method.
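
One way to address this nit is to read the timer once and derive both representations from that single sample. The sketch below is an assumption about how that could look, not the change that was merged; `ElapsedViews`, `views_from`, and `sample_once` are hypothetical names.

```rust
use std::time::{Duration, Instant};

/// Seconds for a histogram observation and whole milliseconds for a log field,
/// both derived from the same elapsed-time sample.
#[derive(Debug, PartialEq)]
struct ElapsedViews {
    secs: f64,
    millis: u64,
}

fn views_from(elapsed: Duration) -> ElapsedViews {
    ElapsedViews {
        secs: elapsed.as_secs_f64(),
        millis: elapsed.as_millis() as u64,
    }
}

/// Reads the timer exactly once, then derives both representations.
fn sample_once(timer: &Instant) -> ElapsedViews {
    views_from(timer.elapsed())
}
```

Besides saving a call, a single sample guarantees that the histogram value and the logged `elapsed_ms` describe the same instant.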

The images(), videos(), and video_stream() handlers were missing explicit inflight.mark_error() calls on their early-return error paths. While the InflightGuard RAII defaults to Status::Error on drop, it uses ErrorType::Internal for all cases, causing the video_stream() ctx.stopped() cancellation path to be misclassified (should be ErrorType::Cancelled) and structured lifecycle logging added by #7840 to not be triggered correctly.

Apply the same pattern used in all LLM handlers:
- images(): mark_error on engine.generate() failure and from_annotated_stream() failure
- videos(): same two paths
- video_stream(): mark_error(Internal) on engine.generate() failure, mark_error(Cancelled) in ctx.stopped() select arm, mark_error(Internal) in Response::builder() map_err

Fixes #7645

Signed-off-by: Matej Kosec <mkosec@4u2g-0421.ipp3a2.colossus.nvidia.com>
Signed-off-by: Matej Kosec <mkosec@nvidia.com>
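
The misclassification described in this commit message can be modeled in a few lines. This is a sketch under assumptions: `Guard` and `stopped_request_label` are hypothetical, the enum is reduced to two variants, and the label strings in the `Display` impl are illustrative rather than the project's actual values.

```rust
use std::fmt;

#[derive(Clone, Copy, Debug, PartialEq)]
enum ErrorType {
    Internal,
    Cancelled,
}

impl fmt::Display for ErrorType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Label strings are illustrative assumptions, not the project's values.
        f.write_str(match self {
            ErrorType::Internal => "internal",
            ErrorType::Cancelled => "cancelled",
        })
    }
}

struct Guard {
    error_type: ErrorType,
}

impl Guard {
    fn new() -> Self {
        // Per the commit message, an unmarked guard reports Internal.
        Self {
            error_type: ErrorType::Internal,
        }
    }

    fn mark_error(&mut self, error_type: ErrorType) {
        self.error_type = error_type;
    }
}

/// Models a handler whose context was stopped and returns the label
/// that would be reported for the request.
fn stopped_request_label(marks_cancellation: bool) -> String {
    let mut guard = Guard::new();
    if marks_cancellation {
        guard.mark_error(ErrorType::Cancelled);
    }
    guard.error_type.to_string()
}
```

Without the explicit `mark_error(ErrorType::Cancelled)` call, the stopped request is reported under the default label, which is the behavior the commit sets out to fix.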

…mo#7840) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

nnshah1 requested a review from a team as a code owner April 2, 2026 23:02

pull-request-size Bot added the size/L label Apr 2, 2026

github-actions Bot added feat frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels Apr 2, 2026

This was referenced Apr 2, 2026

feat: request lifecycle logging via InflightGuard #7815

Closed

feat: token counts, TTFT, ITL, worker IDs on span #7816

Closed

coderabbitai Bot reviewed Apr 2, 2026

Comment thread lib/llm/src/http/service/anthropic.rs Outdated

Comment thread lib/llm/src/http/service/disconnect.rs Outdated

Comment thread lib/llm/src/http/service/metrics.rs

Comment thread lib/runtime/src/pipeline/network/ingress/push_handler.rs Outdated

copy-pr-bot Bot temporarily deployed to GITLAB April 2, 2026 23:30 Inactive

This was referenced Apr 2, 2026

fix: InflightGuard created after engine.generate() in anthropic handler #7843

Closed

fix: worker lifecycle logs not aligned with error paths in push_handler #7844

Closed

jh-nv reviewed Apr 2, 2026

Comment thread lib/llm/src/http/service/metrics.rs Outdated

copy-pr-bot Bot temporarily deployed to GITLAB April 2, 2026 23:50 Inactive

jh-nv reviewed Apr 2, 2026

Comment thread lib/llm/src/http/service/metrics.rs Outdated

copy-pr-bot Bot temporarily deployed to GITLAB April 2, 2026 23:56 Inactive

jh-nv reviewed Apr 3, 2026

Comment thread lib/llm/src/http/service/metrics.rs

jh-nv reviewed Apr 3, 2026

Comment thread lib/runtime/src/pipeline/network/ingress/push_handler.rs Outdated

nnshah1 force-pushed the nnshah1/DIS-1643-pr2-lifecycle-and-metrics branch from e3f648f to 26dc587 Compare April 3, 2026 00:07

copy-pr-bot Bot temporarily deployed to GITLAB April 3, 2026 00:07 Inactive

jh-nv approved these changes Apr 3, 2026

copy-pr-bot Bot temporarily deployed to GITLAB April 3, 2026 00:20 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 3, 2026 00:37 Inactive

nnshah1 enabled auto-merge (squash) April 3, 2026 00:40

nnshah1 and others added 4 commits April 2, 2026 17:44

nnshah1 and others added 2 commits April 2, 2026 17:44

nnshah1 force-pushed the nnshah1/DIS-1643-pr2-lifecycle-and-metrics branch from 0642ad8 to a144102 Compare April 3, 2026 00:44

copy-pr-bot Bot temporarily deployed to GITLAB April 3, 2026 00:44 Inactive

nnshah1 force-pushed the nnshah1/DIS-1643-pr2-lifecycle-and-metrics branch from a144102 to 7448dfa Compare April 3, 2026 00:51

copy-pr-bot Bot temporarily deployed to GITLAB April 3, 2026 00:51 Inactive

nnshah1 force-pushed the nnshah1/DIS-1643-pr2-lifecycle-and-metrics branch from 7448dfa to c458b00 Compare April 3, 2026 00:54

copy-pr-bot Bot temporarily deployed to GITLAB April 3, 2026 00:54 Inactive

nnshah1 merged commit 679d9f1 into main Apr 3, 2026
95 checks passed

nnshah1 deleted the nnshah1/DIS-1643-pr2-lifecycle-and-metrics branch April 3, 2026 01:38

keivenchang reviewed Apr 3, 2026

copy-pr-bot Bot had a problem deploying to GITLAB April 3, 2026 06:59 Failure

sengopal mentioned this pull request Apr 17, 2026

fix(logging): remove duplicate request_id field in request lifecycle logs #8307

Open

2 tasks

jthomson04 mentioned this pull request May 11, 2026

[FEATURE]: server-side per-request logs upon sending a response #4967

Open

yao531441 pushed a commit to yao531441/dynamo that referenced this pull request May 13, 2026

feat: request lifecycle logging + metrics capture [DIS-1643] (ai-dyna…

5331de3

…mo#7840) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: request lifecycle logging + metrics capture [DIS-1643] #7840
nnshah1 merged 7 commits into main from nnshah1/DIS-1643-pr2-lifecycle-and-metrics

3 participants
