Instant-admit free sessions when below per-model capacity #530

Merged
jahooma merged 1 commit into main from jahooma/instant-admit
Apr 21, 2026

@jahooma jahooma commented Apr 21, 2026

Summary

When a freebuff model's active-session count is below its configured instantAdmitCapacity, requestSession promotes the user inline instead of making them wait up to 15s for the next admission tick. Capacities are server-side config (web/src/server/free-session/config.ts): GLM=50, MiniMax=200 — tuned per deployment without bumping the shared common package. Above the threshold the existing FIFO queue + tick still apply, so backpressure kicks in exactly when it's needed.
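The admission rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the capacities and model IDs are taken from this page, while `canInstantAdmit` is a hypothetical helper name and the `Map` is an assumption about how the lookup could be stored.

```typescript
// Per-model instant-admit capacities (values from the PR description).
const INSTANT_ADMIT_CAPACITY = new Map<string, number>([
  ['z-ai/glm-5.1', 50],
  ['minimax/minimax-m2.7', 200],
])

// Unknown models get 0, which keeps them on the queue-only path.
function getInstantAdmitCapacity(model: string): number {
  return INSTANT_ADMIT_CAPACITY.get(model) ?? 0
}

// Hypothetical helper: the fast path applies only strictly below capacity.
function canInstantAdmit(model: string, activeCount: number): boolean {
  return activeCount < getInstantAdmitCapacity(model)
}
```

With these values, a GLM request arriving at 49 active sessions would be admitted inline, while one arriving at 50 would wait for the admission tick.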

Test plan

  • Unit tests: below-capacity admits, at-capacity queues, per-model capacities independent
  • Existing free-session + handler tests still pass (65 tests)
  • Smoke-test a real requestSession hop in dev to confirm active response without polling

When a model's active-session count is under its configured
`instantAdmitCapacity`, requestSession now promotes the user inline
instead of queuing for the 15s admission tick. Capacity is configured
server-side (web/src/server/free-session/config.ts): GLM=50, MiniMax=200.
Above the threshold the existing FIFO queue + tick still apply.

greptile-apps Bot commented Apr 21, 2026

Greptile Summary

This PR introduces an instant-admit fast path in requestSession: when a freebuff model's concurrent active-session count is below its configured instantAdmitCapacity, a newly-queued user is promoted to active inline — within the same HTTP request — instead of waiting up to 15 s for the next FIFO admission tick. Above the threshold the existing queue + tick behaviour is unchanged, preserving backpressure exactly when needed.

Key changes:

  • config.ts: Adds getInstantAdmitCapacity with per-model soft limits (GLM=50, MiniMax=200). Returns 0 for unknown models, keeping the queue-only default for any model not listed.
  • store.ts: Adds activeCountForModel (single-column COUNT for the threshold check) and promoteQueuedUser (targeted UPDATE that flips one queued row to active; is a no-op if the row already moved on).
  • public-api.ts: After joinOrTakeOver resolves to a queued row, the instant-admit block fires: it reads the active count and, if that count is below capacity, calls promoteQueuedUser. The acknowledged TOCTOU (time-of-check to time-of-use) race between the count and the promotion can overshoot capacity by up to concurrency − 1 sessions; this is acceptable since capacities are sized with headroom.
  • Tests: Three new requestSession tests (below-cap admit, at-cap queue, per-model independence) plus backward-compatible stub deps added to the handler test suite.
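To make the interplay of the three changed modules concrete, here is a simplified, synchronous sketch of the fast path. It is an assumption-laden stand-in: `MemoryStore` is a hypothetical in-memory replacement for the database-backed store, `enqueue` stands in for joinOrTakeOver, and the real functions are asynchronous.

```typescript
type Status = 'queued' | 'active'
interface Row { userId: string; model: string; status: Status }

// Hypothetical in-memory stand-in for the free-session table.
class MemoryStore {
  private rows = new Map<string, Row>()

  // Stands in for joinOrTakeOver: always creates a queued row here.
  enqueue(userId: string, model: string): Row {
    const row: Row = { userId, model, status: 'queued' }
    this.rows.set(userId, row)
    return { ...row }
  }

  activeCountForModel(model: string): number {
    let n = 0
    for (const r of this.rows.values()) {
      if (r.status === 'active' && r.model === model) n++
    }
    return n
  }

  // Flips one queued row to active; returns null if the row already moved on.
  promoteQueuedUser(userId: string): Row | null {
    const row = this.rows.get(userId)
    if (!row || row.status !== 'queued') return null
    row.status = 'active'
    return { ...row }
  }
}

function requestSession(
  store: MemoryStore,
  capacityFor: (model: string) => number,
  userId: string,
  model: string,
): Row {
  const row = store.enqueue(userId, model)
  if (row.status !== 'queued') return row
  const capacity = capacityFor(model)
  if (capacity <= 0) return row // instant-admit disabled for this model
  if (store.activeCountForModel(model) >= capacity) return row // at capacity: stay queued
  return store.promoteQueuedUser(userId) ?? row // null promotion falls back to the queue
}
```

The three early returns mirror the three ways a request stays queued: no capacity configured, capacity reached, or a promotion that found the row already changed.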

Confidence Score: 4/5

Safe to merge — the instant-admit path degrades gracefully to the existing queue on any failure, and the acknowledged race overshoot is acceptable given headroom in configured capacities.

Logic is sound, tests cover the three key scenarios, and the fallback on a null promoteQueuedUser return is correct. Two P2 observations remain: activeCountForModel slightly overcounts by including grace-window-expired rows, and hardcoded model ID strings in config.ts could silently drift from the canonical registry. Neither breaks the primary user path.

web/src/server/free-session/config.ts — model ID strings should ideally reference canonical constants to prevent silent drift on rename.
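The race overshoot mentioned above can be illustrated with a small worst-case model. This sketch assumes the least favourable interleaving, in which every concurrent request reads the active count before any of them promotes its row; `simulateOvershoot` is a hypothetical name used only for illustration.

```typescript
// Returns how many sessions end up above capacity in the worst-case interleaving.
function simulateOvershoot(
  capacity: number,
  activeBefore: number,
  concurrency: number,
): number {
  // Phase 1: every request checks against the same stale snapshot.
  const snapshot = activeBefore
  const admitted = Array.from({ length: concurrency }, () => snapshot < capacity)
  // Phase 2: every request that passed the check promotes its row.
  const activeAfter = activeBefore + admitted.filter(Boolean).length
  return Math.max(0, activeAfter - capacity)
}

// One free slot and 8 simultaneous requests: all 8 pass the check.
console.log(simulateOvershoot(50, 49, 8)) // 7, i.e. concurrency - 1
```

The overshoot is largest when exactly one slot is free; with the model already at capacity, no request passes the check and nothing is over-admitted.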

Important Files Changed

| Filename | Overview |
| --- | --- |
| web/src/server/free-session/public-api.ts | Core logic change: adds instant-admit check after joinOrTakeOver. Race condition is acknowledged and acceptable; fallback to queue on null promotion is correct; existing gate paths are unaffected. |
| web/src/server/free-session/store.ts | Adds activeCountForModel and promoteQueuedUser. Both are correctly implemented; minor note that activeCountForModel counts grace-window-expired rows, making capacity checks slightly conservative. |
| web/src/server/free-session/config.ts | Adds getInstantAdmitCapacity with hardcoded per-model capacities. Unknown models return 0 (always queue). Model ID strings are not tied to canonical constants, so a rename would silently degrade to queue-only. |
| web/src/server/free-session/tests/public-api.test.ts | Three new instant-admit tests covering below-capacity admit, at-capacity queue fallback, and per-model independence. Existing tests isolated via getInstantAdmitCapacity: () => 0 default. |
| web/src/app/api/v1/freebuff/session/tests/session.test.ts | Handler test deps updated with new required fields. Correctly disables instant-admit for handler-level tests via stub deps. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([POST /session]) --> B[resolveFreebuffModel]
    B --> C{waiting room\nenabled?}
    C -- No --> D([status: disabled])
    C -- Yes --> E[joinOrTakeOver]
    E -- model_locked --> F([status: model_locked])
    E -- row returned --> G{row.status\n=== queued?}
    G -- No / already active --> K[viewForRow]
    G -- Yes --> H{getInstantAdmitCapacity\n> 0?}
    H -- No / 0 --> K
    H -- Yes --> I[activeCountForModel]
    I --> J{activeCount\n< capacity?}
    J -- No --> K
    J -- Yes --> L[promoteQueuedUser\nrow → active]
    L -- null / race lost --> K
    L -- promoted row --> K
    K --> M([return SessionStateResponse])

Reviews (1): Last reviewed commit: "Instant-admit free sessions when below p..."

Comment on lines +184 to +195
export async function activeCountForModel(model: string): Promise<number> {
  const rows = await db
    .select({ n: count() })
    .from(schema.freeSession)
    .where(
      and(
        eq(schema.freeSession.status, 'active'),
        eq(schema.freeSession.model, model),
      ),
    )
  return Number(rows[0]?.n ?? 0)
}

P2 activeCountForModel includes expired-but-still-active sessions

activeCountForModel counts every row where status = 'active', including sessions that have passed expires_at but are still within the grace window (not yet swept by sweepExpired). This means instant-admit can be blocked even when real capacity exists — e.g., if 10 of the 50 configured GLM slots have expired but haven't been swept yet, the capacity check still reads 50 and falls back to the queue unnecessarily.

Since the comment in requestSession already states capacities are chosen "with headroom for this," the conservative behaviour is intentional, but it's worth knowing. Adding an expires_at > now filter here would make the count more accurate at the cost of passing now through the call chain:

// Optional tighter version — only count truly live sessions:
export async function activeCountForModel(model: string, now?: Date): Promise<number> {
  const conditions = [
    eq(schema.freeSession.status, 'active'),
    eq(schema.freeSession.model, model),
    ...(now ? [gt(schema.freeSession.expires_at, now)] : []),
  ]
  ...
}

Not a bug given the acknowledged headroom, but worth documenting or addressing in a follow-up.
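The difference between the two counting strategies can be seen on an in-memory list. This is only a sketch of the idea: `SessionRow` and `activeCount` are hypothetical names, and a plain array filter stands in for the database query.

```typescript
interface SessionRow {
  model: string
  status: 'queued' | 'active'
  expiresAt: Date
}

// Without `now`, counts every active row; with `now`, skips rows past their expiry.
function activeCount(rows: SessionRow[], model: string, now?: Date): number {
  return rows.filter(
    (r) =>
      r.status === 'active' &&
      r.model === model &&
      (now === undefined || r.expiresAt.getTime() > now.getTime()),
  ).length
}

// 50 active GLM rows, of which the first 10 expired one second ago (unswept).
const now = new Date('2026-04-21T12:00:00Z')
const rows: SessionRow[] = Array.from({ length: 50 }, (_, i) => ({
  model: 'z-ai/glm-5.1',
  status: 'active' as const,
  expiresAt: new Date(now.getTime() + (i < 10 ? -1_000 : 60_000)),
}))

console.log(activeCount(rows, 'z-ai/glm-5.1')) // 50: at capacity, falls back to the queue
console.log(activeCount(rows, 'z-ai/glm-5.1', now)) // 40: below capacity, would instant-admit
```

Keeping `now` optional means existing callers keep the conservative count, which matches the shape of the suggestion above.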

Comment on lines +50 to +53
const INSTANT_ADMIT_CAPACITY: Record<string, number> = {
  'z-ai/glm-5.1': 50,
  'minimax/minimax-m2.7': 200,
}

P2 Hardcoded model IDs could silently drift from the canonical registry

INSTANT_ADMIT_CAPACITY hardcodes model ID strings ('z-ai/glm-5.1', 'minimax/minimax-m2.7'). If a model is renamed or a new default is added to the @codebuff/common/constants/freebuff-models registry, this map will silently return 0 for the new ID and every user will fall back to the FIFO queue — without any error or warning.

Consider importing the canonical model ID constants from @codebuff/common so a rename triggers a compile-time error rather than a silent behaviour change:

import { GLM_MODEL_ID, MINIMAX_MODEL_ID } from '@codebuff/common/constants/freebuff-models'

const INSTANT_ADMIT_CAPACITY: Record<string, number> = {
  [GLM_MODEL_ID]: 50,
  [MINIMAX_MODEL_ID]: 200,
}
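One way to push this further, sketched under assumptions: if the registry also exposed a union type of model IDs, keying the map by that type would make a missing entry a type error as well. The constants below are local stand-ins so the example is self-contained; `FreebuffModelId` is a hypothetical type name, and the actual exports of the shared package are not confirmed here.

```typescript
// Local stand-ins for the constants the suggestion above imports.
const GLM_MODEL_ID = 'z-ai/glm-5.1' as const
const MINIMAX_MODEL_ID = 'minimax/minimax-m2.7' as const

// Hypothetical union of all free model IDs.
type FreebuffModelId = typeof GLM_MODEL_ID | typeof MINIMAX_MODEL_ID

// Record<FreebuffModelId, number> requires an entry for every member of the
// union, so adding a model without a capacity fails type-checking.
const INSTANT_ADMIT_CAPACITY: Record<FreebuffModelId, number> = {
  [GLM_MODEL_ID]: 50,
  [MINIMAX_MODEL_ID]: 200,
}

// Callers still pass arbitrary strings, so unknown IDs fall back to 0.
function getInstantAdmitCapacity(model: string): number {
  return Object.prototype.hasOwnProperty.call(INSTANT_ADMIT_CAPACITY, model)
    ? INSTANT_ADMIT_CAPACITY[model as FreebuffModelId]
    : 0
}
```

The own-property check avoids accidentally matching inherited keys such as `toString` when the lookup receives an arbitrary string.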

@jahooma jahooma merged commit 950b2b4 into main Apr 21, 2026
34 checks passed
@jahooma jahooma deleted the jahooma/instant-admit branch April 21, 2026 23:32