doc: update metrics doc regarding frontend staged gauges by jh-nv · Pull Request #8459 · ai-dynamo/dynamo

jh-nv · 2026-04-21T19:18:54Z

Overview:

Update the metrics documentation for the frontend staged gauges.

Details:

Follow-up docs for PR #8162 (staged frontend gauges).

Updates the documentation for the new staged frontend gauges and adds deprecation notes for the metrics slated for removal.

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

Summary by CodeRabbit

Documentation

- Introduced improved frontend metrics: dynamo_frontend_active_requests for request lifetime tracking and dynamo_frontend_stage_requests with granular stage/phase labels
- Updated autoscaling configuration guides and monitoring query examples with new metrics
- Marked legacy metrics as deprecated with clear migration guidance
- Enhanced troubleshooting documentation and metrics reference with examples and derived signal definitions

coderabbitai · 2026-04-21T19:20:55Z

Walkthrough

Documentation updated to reflect new frontend metrics. Replaced deprecated dynamo_frontend_queued_requests and dynamo_frontend_inflight_requests with dynamo_frontend_stage_requests{stage,phase} and dynamo_frontend_active_requests. Prometheus Adapter and KEDA configuration examples revised accordingly.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation Updates `docs/kubernetes/autoscaling.md`, `docs/observability/metrics.md` | Replaced deprecated frontend metrics with new metrics. Updated autoscaling examples to use `dynamo_frontend_stage_requests` and `dynamo_frontend_active_requests`. Added detailed label semantics documentation and deprecated metric guidance. Added PromQL-style derived signal formulas and cross-references between documentation sections. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: updating metrics documentation for frontend staged gauges, which aligns with the PR's core objective of documenting new staged metrics. |
| Description check | ✅ Passed | The description covers the overview, details about the follow-up documentation, and references the related PR `#8162`, but the 'Where should the reviewer start' section is empty and 'Related Issues' references a placeholder issue number. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

🧹 Nitpick comments (2)

docs/kubernetes/autoscaling.md (1)

351-365: Make queue-depth queries resilient to future stage additions.

Docs define queue depth as preprocess + route + dispatch, but this rule sums all stage series. Consider filtering explicitly (and mirroring the same change in the KEDA query at line 511) to avoid semantic drift if new stages are added later.

Suggested doc diff

-  metricsQuery: |
-    sum(<<.Series>>{<<.LabelMatchers>>}) by (namespace, dynamo_namespace)
+  metricsQuery: |
+    sum(<<.Series>>{<<.LabelMatchers>>,stage=~"preprocess|route|dispatch"}) by (namespace, dynamo_namespace)

-        sum(dynamo_frontend_stage_requests{dynamo_namespace="default-sglang-agg"})
+        sum(dynamo_frontend_stage_requests{dynamo_namespace="default-sglang-agg",stage=~"preprocess|route|dispatch"})
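One way to check whether the explicit stage filter changes anything today is to compare the filtered and unfiltered sums. The following is a minimal PromQL sketch that uses only the metric and stage names quoted in this review. The `dynamo_namespace` value is copied from the KEDA example in the diff, and whether other stage values exist in a given deployment is an assumption to verify, not something this PR confirms. Each expression is a separate query.

```promql
# List the stage values currently exported, to see which ones exist.
count by (stage) (dynamo_frontend_stage_requests{dynamo_namespace="default-sglang-agg"})

# Requests counted by the unfiltered sum but not by the filtered one.
# A result of 0 means the two queries agree at this point in time.
sum(dynamo_frontend_stage_requests{dynamo_namespace="default-sglang-agg"})
  - sum(dynamo_frontend_stage_requests{dynamo_namespace="default-sglang-agg",stage=~"preprocess|route|dispatch"})
```

If the difference is nonzero, the unfiltered rule is already counting requests outside the three stages that the docs define as queue depth.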

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@docs/kubernetes/autoscaling.md` around lines 351 - 365, The current
prometheus-adapter rule for dynamo_queued_requests sums all
dynamo_frontend_stage_requests series which risks semantic drift if new stages
are added; update the metricsQuery for the rule named "dynamo_queued_requests"
to explicitly sum only the preprocess, route, and dispatch stages (e.g., sum of
those three label-filtered series) instead of using a wildcard of <<.Series>>;
also apply the same explicit-stage filtering change to the corresponding KEDA
query that references frontend stage requests so both rules remain consistent.

docs/observability/metrics.md (1)

176-180: Clarify aggregation scope in derived PromQL examples.

Line 176 says these are “per frontend pod,” but the shown sum(...) expressions aggregate across all matched series unless scoped/grouped. Consider rewording or adding by(...) examples so operators don’t misread query scope.
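The difference in scope can be sketched in PromQL as follows. This assumes the scraped series carry a `pod` label, which depends on the Prometheus scrape configuration and is not confirmed by this PR. The underlying expressions are the ones quoted in this review, and each line is a separate query.

```promql
# Cluster-wide totals: one value across every matched frontend series.
sum(dynamo_frontend_stage_requests)
sum(dynamo_frontend_active_requests) - sum(dynamo_frontend_stage_requests)
sum(dynamo_frontend_stage_requests{stage="route"})

# Per-pod variants: one value per frontend pod (assumes a `pod` label).
sum by (pod) (dynamo_frontend_stage_requests)
sum by (pod) (dynamo_frontend_active_requests) - sum by (pod) (dynamo_frontend_stage_requests)
sum by (pod) (dynamo_frontend_stage_requests{stage="route"})
```

In the per-pod subtraction, both sides are grouped by the same label, so Prometheus matches the operands pod by pod and returns one result per pod.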

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@docs/observability/metrics.md` around lines 176 - 180, The PromQL examples
claim "per frontend pod" but use bare sum(...) which aggregates across all
series; update the docs around dynamo_frontend_stage_requests and
dynamo_frontend_active_requests to either (a) clarify that the shown sum()
examples produce cluster-wide totals, or (b) provide explicit per-pod
aggregation variants such as using sum by(pod)(...) or sum without reduction but
grouped appropriately; reference the three derived operators (the two
expressions using sum(dynamo_frontend_stage_requests) and
sum(dynamo_frontend_active_requests) - sum(dynamo_frontend_stage_requests), and
the Router saturation sum(dynamo_frontend_stage_requests{stage="route"})) and
add a short note showing the per-pod and cluster-wide forms so readers aren’t
misled about scope.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c658035a-fead-43c4-8a2f-56a426fd75dc

📥 Commits

Reviewing files that changed from the base of the PR and between ddd19a6 and b63fddd.

📒 Files selected for processing (2)

docs/kubernetes/autoscaling.md
docs/observability/metrics.md

github-actions · 2026-04-21T19:21:02Z

🌿 Fern Docs Preview: https://nvidia-preview-9f539174-9876-43fd-bddb-d3df75be522e.docs.buildwithfern.com/dynamo/dev

doc: update metrics doc regarding frontend staged gauge

b63fddd

pull-request-size Bot added the size/M label Apr 21, 2026

github-actions Bot added the documentation Improvements or additions to documentation label Apr 21, 2026

coderabbitai Bot reviewed Apr 21, 2026

keivenchang approved these changes Apr 22, 2026

Comment thread docs/kubernetes/autoscaling.md Outdated

jh-nv added 3 commits April 22, 2026 12:20

fix KEDA/Prometheus-Adapter metric identifier to match the PromQL it actually queries.

a13be81

address comments

f354f32

update

e2abd4a

jh-nv merged commit 5135c32 into main Apr 22, 2026
63 of 64 checks passed

jh-nv deleted the jihao/frontend_doc branch April 22, 2026 18:51

jh-nv merged 4 commits into main from jihao/frontend_doc
