Instant-admit free sessions when below per-model capacity #530
When a model's active-session count is under its configured `instantAdmitCapacity`, requestSession now promotes the user inline instead of queuing for the 15s admission tick. Capacity is configured server-side (web/src/server/free-session/config.ts): GLM=50, MiniMax=200. Above the threshold the existing FIFO queue + tick still apply.
Greptile Summary
This PR introduces an instant-admit fast path in Key changes:
Confidence Score: 4/5
Safe to merge: the instant-admit path degrades gracefully to the existing queue on any failure, and the acknowledged race overshoot is acceptable given headroom in configured capacities. Logic is sound, tests cover the three key scenarios, and the fallback on a null
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A([POST /session]) --> B[resolveFreebuffModel]
B --> C{waiting room\nenabled?}
C -- No --> D([status: disabled])
C -- Yes --> E[joinOrTakeOver]
E -- model_locked --> F([status: model_locked])
E -- row returned --> G{row.status\n=== queued?}
G -- No / already active --> K[viewForRow]
G -- Yes --> H{getInstantAdmitCapacity\n> 0?}
H -- No / 0 --> K
H -- Yes --> I[activeCountForModel]
I --> J{activeCount\n< capacity?}
J -- No --> K
J -- Yes --> L[promoteQueuedUser\nrow → active]
L -- null / race lost --> K
L -- promoted row --> K
K --> M([return SessionStateResponse])
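The queued-row branch of the flowchart can be sketched in TypeScript as below. This is a minimal sketch, not the PR's implementation: the `SessionRow` shape, the `Deps` interface, and the helper names `shouldInstantAdmit` and `maybeInstantAdmit` are hypothetical, and the dependency signatures are assumed from the node labels in the diagram.

```typescript
// Hypothetical, simplified row shape; the real freeSession row is not shown here.
type SessionStatus = 'queued' | 'active'
interface SessionRow {
  userId: string
  model: string
  status: SessionStatus
}

// Pure decision mirroring nodes G, H and J of the flowchart.
function shouldInstantAdmit(
  status: SessionStatus,
  capacity: number,
  activeCount: number,
): boolean {
  if (status !== 'queued') return false // already active: nothing to promote
  if (capacity <= 0) return false // instant-admit disabled for this model
  return activeCount < capacity
}

// Assumed dependency signatures, named after the flowchart nodes.
interface Deps {
  getInstantAdmitCapacity(model: string): number
  activeCountForModel(model: string): Promise<number>
  // Resolves to the promoted row, or null when the promotion lost a race.
  promoteQueuedUser(row: SessionRow): Promise<SessionRow | null>
}

async function maybeInstantAdmit(row: SessionRow, deps: Deps): Promise<SessionRow> {
  if (row.status !== 'queued') return row
  const capacity = deps.getInstantAdmitCapacity(row.model)
  if (capacity <= 0) return row // skip the count query entirely
  const activeCount = await deps.activeCountForModel(row.model)
  if (!shouldInstantAdmit(row.status, capacity, activeCount)) return row
  // On a null result the user simply stays queued for the next admission tick.
  return (await deps.promoteQueuedUser(row)) ?? row
}
```

Every branch ends by returning a row, which matches the diagram: all paths converge on `viewForRow`, so a failed or skipped promotion leaves the user in the ordinary queue.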
export async function activeCountForModel(model: string): Promise<number> {
  const rows = await db
    .select({ n: count() })
    .from(schema.freeSession)
    .where(
      and(
        eq(schema.freeSession.status, 'active'),
        eq(schema.freeSession.model, model),
      ),
    )
  return Number(rows[0]?.n ?? 0)
}
activeCountForModel includes expired-but-still-active sessions
activeCountForModel counts every row where status = 'active', including sessions that have passed expires_at but are still within the grace window (not yet swept by sweepExpired). This means instant-admit can be blocked even when real capacity exists — e.g., if 10 of the 50 configured GLM slots have expired but haven't been swept yet, the capacity check still reads 50 and falls back to the queue unnecessarily.
Since the comment in requestSession already states capacities are chosen "with headroom for this," the conservative behaviour is intentional, but it's worth knowing. Adding an expires_at > now filter here would make the count more accurate at the cost of passing now through the call chain:
// Optional tighter version: only count truly live sessions.
export async function activeCountForModel(model: string, now?: Date): Promise<number> {
  const conditions = [
    eq(schema.freeSession.status, 'active'),
    eq(schema.freeSession.model, model),
    ...(now ? [gt(schema.freeSession.expires_at, now)] : []),
  ]
  ...
}
Not a bug given the acknowledged headroom, but worth documenting or addressing in a follow-up.
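To make the effect of the extra filter concrete, here is a small in-memory sketch that counts the same rows with and without an expiry check. The `SessionRow` shape, the `expiresAt` field name, and `countActive` are hypothetical stand-ins for the table and query; the sketch only illustrates the counting rule, not the database code.

```typescript
// Hypothetical in-memory stand-in for the free-session table.
interface SessionRow {
  model: string
  status: 'queued' | 'active'
  expiresAt: Date
}

// Counts active rows for a model. When `now` is given, rows whose expiry has
// passed are excluded, mirroring the optional expires_at > now condition.
function countActive(rows: SessionRow[], model: string, now?: Date): number {
  return rows.filter(
    (r) =>
      r.status === 'active' &&
      r.model === model &&
      (now === undefined || r.expiresAt.getTime() > now.getTime()),
  ).length
}

const now = new Date('2025-01-01T00:10:00Z')
const rows: SessionRow[] = [
  { model: 'z-ai/glm-5.1', status: 'active', expiresAt: new Date('2025-01-01T00:30:00Z') },
  // Expired but not yet swept: still 'active' in the table.
  { model: 'z-ai/glm-5.1', status: 'active', expiresAt: new Date('2025-01-01T00:05:00Z') },
  { model: 'z-ai/glm-5.1', status: 'queued', expiresAt: new Date('2025-01-01T00:30:00Z') },
]

const conservative = countActive(rows, 'z-ai/glm-5.1') // counts the unswept row
const tight = countActive(rows, 'z-ai/glm-5.1', now) // excludes it
```

The conservative count can only overestimate load, so it errs toward queuing a user rather than over-admitting, which is consistent with the headroom argument above.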
const INSTANT_ADMIT_CAPACITY: Record<string, number> = {
  'z-ai/glm-5.1': 50,
  'minimax/minimax-m2.7': 200,
}
Hardcoded model IDs could silently drift from the canonical registry
INSTANT_ADMIT_CAPACITY hardcodes model ID strings ('z-ai/glm-5.1', 'minimax/minimax-m2.7'). If a model is renamed or a new default is added to the @codebuff/common/constants/freebuff-models registry, this map will silently return 0 for the new ID and every user will fall back to the FIFO queue — without any error or warning.
Consider importing the canonical model ID constants from @codebuff/common so a rename triggers a compile-time error rather than a silent behaviour change:
import { GLM_MODEL_ID, MINIMAX_MODEL_ID } from '@codebuff/common/constants/freebuff-models'

const INSTANT_ADMIT_CAPACITY: Record<string, number> = {
  [GLM_MODEL_ID]: 50,
  [MINIMAX_MODEL_ID]: 200,
}
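A stricter variant is to key the map by a union of the model IDs rather than by `string`, so that adding a model to the union without a capacity entry also fails to compile. The sketch below is self-contained and therefore declares local stand-ins: `GLM_MODEL_ID`, `MINIMAX_MODEL_ID`, and `FreebuffModelId` are assumptions here, and the real exports of `@codebuff/common` may be named or typed differently.

```typescript
// Local stand-ins for the canonical IDs (assumed; the real exports may differ).
const GLM_MODEL_ID = 'z-ai/glm-5.1' as const
const MINIMAX_MODEL_ID = 'minimax/minimax-m2.7' as const
type FreebuffModelId = typeof GLM_MODEL_ID | typeof MINIMAX_MODEL_ID

// Keyed by the union: omitting any member of FreebuffModelId is a compile error.
const INSTANT_ADMIT_CAPACITY: Record<FreebuffModelId, number> = {
  [GLM_MODEL_ID]: 50,
  [MINIMAX_MODEL_ID]: 200,
}

// Runtime lookup for arbitrary strings; unknown models get 0, which disables
// instant admit and leaves the user in the FIFO queue.
function getInstantAdmitCapacity(model: string): number {
  if (!Object.prototype.hasOwnProperty.call(INSTANT_ADMIT_CAPACITY, model)) {
    return 0
  }
  return (INSTANT_ADMIT_CAPACITY as Record<string, number>)[model]
}
```

The runtime fallback to 0 is kept on purpose: compile-time checks cover IDs known to the registry, while a model string arriving from a request still needs a safe default.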
Summary
When a freebuff model's active-session count is below its configured `instantAdmitCapacity`, `requestSession` promotes the user inline instead of making them wait up to 15s for the next admission tick. Capacities are server-side config (web/src/server/free-session/config.ts): GLM=50, MiniMax=200, tuned per deployment without bumping the shared `common` package. Above the threshold the existing FIFO queue + tick still apply, so backpressure kicks in exactly when it's needed.
Test plan
`requestSession` hop in dev to confirm active response without polling