feat: monitoring demand endpoint decoupled from completion windows#987
feat: monitoring demand endpoint decoupled from completion windows#987fergusfinn wants to merge 4 commits into
Conversation
Adds \`GET /admin/api/v1/monitoring/demand?window=<...>\` — returns the same per-model deadline-bucketed counts that scouter and friends need for planning, but lets the caller choose the window shape instead of implicitly taking whatever is in \`config.batches.allowed_completion_windows\`. Each \`window=\` entry is either \`<end>\` (shorthand for \`0s:<end>\`, matching the legacy "due within N" behaviour) or \`<start>:<end>\` for a disjoint range. Both bounds are offsets from \`now\`. Caller can mix shapes in one request and the raw string is echoed back as the response key. Example: GET /admin/api/v1/monitoring/demand?window=1h,24h → cumulative: due within 1h, due within 24h GET /admin/api/v1/monitoring/demand?window=0s:1h,1h:24h → disjoint: due 0-1h, due 1-24h Follows the trait rename in fusillade#217 (\`get_pending_request_counts_by_model_and_completion_window\` → \`get_pending_request_counts_by_model_and_window\`, windows now carry both start and end). Old \`/pending-request-counts\` endpoint is kept as-is so consumers can migrate at their own pace; it now feeds the new signature with \`start=0\` per window for behavioural parity. Pins fusillade via \`[patch.crates-io]\` temporarily; will drop once fusillade releases.
Deploying control-layer with
|
| Latest commit: |
81926ed
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://1e5e0f13.control-layer.pages.dev |
| Branch Preview URL: | https://feat-demand-by-lookahead-end.control-layer.pages.dev |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4e37416144
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # TODO(demand-endpoint): remove once fusillade releases the | ||
| # `get_pending_request_counts_by_model_and_window` signature (doublewordai/fusillade#217). | ||
| [patch.crates-io] | ||
| fusillade = { path = "../fusillade" } |
There was a problem hiding this comment.
Remove local fusillade path override before merging
Adding [patch.crates-io] fusillade = { path = "../fusillade" } makes this repository unbuildable unless a sibling ../fusillade checkout is manually present. In this repo’s CI (.github/workflows/ci.yaml), jobs only do actions/checkout of this repository, so dependency resolution fails (e.g., cargo metadata --format-version 1 errors reading ../fusillade/Cargo.toml) and backend/build pipelines are blocked.
Useful? React with 👍 / 👎.
| return None; | ||
| } | ||
| let (start_secs, end_secs) = match trimmed.split_once(':') { | ||
| Some((start, end)) => (parse_window_to_seconds(start), parse_window_to_seconds(end)), |
There was a problem hiding this comment.
Parse
0s demand ranges as zero, not 24h
parse_demand_window reuses parse_window_to_seconds for the range start, but that helper treats non-positive values as invalid and converts them to 24h. As a result, documented inputs like window=0s:1h are parsed as (86400, 3600), yielding an inverted/empty bucket instead of “now to 1h,” so the new disjoint-window API returns incorrect demand counts for valid caller input.
Useful? React with 👍 / 👎.
| ) -> Result<Json<PendingCountsByModelAndWindow>, Error> { | ||
| let windows: Vec<(String, i64, i64)> = params.window.split(',').filter_map(parse_demand_window).collect(); | ||
|
|
||
| if windows.is_empty() { |
There was a problem hiding this comment.
Reject malformed demand windows instead of coercing
get_demand only returns 400 when no windows are parsed, but malformed entries are still parsed because parse_window_to_seconds silently defaults invalid tokens to 24h. That means requests like window=bad or window=1h:bad return 200 with misleading bucket calculations instead of a client error, which contradicts the endpoint contract and can hide caller mistakes.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
Adds a new admin monitoring endpoint to query per-model pending/claimed/processing request demand using caller-defined deadline windows, decoupling monitoring buckets from batch completion-window configuration.
Changes:
- Introduces
GET /admin/api/v1/monitoring/demand?window=...and exposes it via router + OpenAPI. - Updates internal fusillade calls to use the new
(label, start, end)window tuple signature. - Temporarily patches the
fusilladedependency to a local path to consume an unreleased API change.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| dwctl/src/openapi/admin.rs | Registers the new demand handler in the admin OpenAPI surface. |
| dwctl/src/lib.rs | Routes /monitoring/demand to the new handler. |
| dwctl/src/api/handlers/queue.rs | Implements get_demand, query parsing, and adds endpoint tests; updates legacy endpoint to new fusillade signature. |
| dwctl/src/api/handlers/batches.rs | Updates window tuple shapes and fusillade callsites for capacity reservation logic + tests. |
| Cargo.toml | Adds a workspace [patch.crates-io] override for fusillade. |
| Cargo.lock | Reflects the path override by removing registry source/checksum for fusillade. |
| fn parse_demand_window(raw: &str) -> Option<(String, i64, i64)> { | ||
| let trimmed = raw.trim(); | ||
| if trimmed.is_empty() { | ||
| return None; | ||
| } | ||
| let (start_secs, end_secs) = match trimmed.split_once(':') { | ||
| Some((start, end)) => (parse_window_to_seconds(start), parse_window_to_seconds(end)), | ||
| None => (0, parse_window_to_seconds(trimmed)), | ||
| }; | ||
| Some((trimmed.to_string(), start_secs, end_secs)) |
| response.assert_status_ok(); | ||
| let counts: HashMap<String, HashMap<String, i64>> = response.json(); | ||
| assert_eq!(counts.len(), 0, "no pending requests exist in a clean database"); | ||
| } |
| # TODO(demand-endpoint): remove once fusillade releases the | ||
| # `get_pending_request_counts_by_model_and_window` signature (doublewordai/fusillade#217). | ||
| [patch.crates-io] | ||
| fusillade = { path = "../fusillade" } |
Follow-up to fusillade#217 review feedback — the trait method now takes Option<i64> for the lower bound to preserve the old "includes overdue" semantic when start is None, and uses a half-open interval on the upper bound so adjacent disjoint windows don't double-count. control-layer changes: - \`/pending-request-counts\`: windows now pass \`None\` for start, so overdue requests are still counted exactly like before the rename. - \`/demand\`: shorthand \`window=N\` maps to \`(label, None, N_secs)\` — behavioural parity with the legacy endpoint for scouter and friends. Explicit \`window=S:E\` maps to \`(label, Some(S_secs), E_secs)\` for strict half-open bounds. - \`reserve_capacity_for_batch\`: same signature change. The reservation path wants to include overdue work too.
fusillade 16.0.0 is now on crates.io with the \`get_pending_request_counts_by_model_and_window\` signature this PR uses (doublewordai/fusillade#217). Drop the \`[patch.crates-io]\` block that pointed at the local checkout and pin the released version. Cargo.lock also picks up a few windows-sys 0.60 → 0.61 transitive bumps as a side-effect of re-resolution; unrelated to this PR.
Addresses copilot feedback on #987: The original \`parse_demand_window\` reused \`parse_window_to_seconds\` from the batch-API path, which is forgiving by design — it defaults zero/negative/malformed input to 24h. That meant: - \`window=0s:1h\` parsed \`0s\` as 86400 (the 24h default for non-positive values), flipping the intended "strict from now" semantic. - \`window=foo\` silently succeeded with 24h buckets instead of being rejected. Fix: a local \`parse_demand_duration\` that strictly parses \`<int><unit>\` (h/m/s, non-negative integer) and returns \`None\` on anything malformed. \`parse_demand_window\` propagates the error up as a string, and the handler maps it to \`400 Bad Request\`. Tests: added regressions for \`window=0s:1h\` (must succeed) and \`window=foo\` / \`1x\` / \`1h,bad\` / \`-1h:1h\` (must 400).
|
Review feedback addressed in the follow-up commit:
|
Summary
Adds `GET /admin/api/v1/monitoring/demand?window=<...>` — same per-model deadline-bucketed counts that the existing `/pending-request-counts` returns, but the caller picks the window shape via query parameter instead of implicitly inheriting `config.batches.allowed_completion_windows`.
Each `window=` entry is either:
Both bounds are offsets from `now`. Shapes can be mixed in a single call. The raw input string is echoed back as the response key, so scouter-style callers that send `window=1h,24h` still see `"1h"` / `"24h"` keys.
Why
Scouter reads this endpoint to plan replica allocation and only cares about "demand in the next N hours". The old endpoint picked its buckets from whatever completion-window SLAs the batch API exposed to users — e.g. if product adds a `6h` SLA tier for a customer, scouter's buckets would silently shift under it. This PR treats that coupling as the implementation detail it is and lets planning consumers name their own windows.
Dependencies
Kept untouched
`GET /admin/api/v1/monitoring/pending-request-counts` still returns the same shape and still pulls windows from `allowed_completion_windows` — no scouter-side deployment coordination needed. We can remove it once no caller uses it.
Test plan
Post-merge