Rate-limit freebuff GLM sessions to 5 per 20 hours #537
Conversation
Adds a `free_session_admit` audit log (one row per queued→active transition) and gates `POST /api/v1/freebuff/session` against it so GLM 5.1 users who've already had 5 one-hour sessions in the last 20h are blocked with a new `rate_limited` status (HTTP 429). Queued/active responses now carry an optional `rateLimit` quota the CLI renders as "N / 5 used in last 20h" so users see their remaining allowance as soon as they join the waitlist. Minimax is left unlimited.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
**Greptile Summary**

This PR introduces a per-user, per-model session rate limit for freebuff GLM 5.1 users (5 sessions per 20-hour rolling window), backed by a new `free_session_admit` audit table. One P1 logic issue was found in the rate-limit check in `web/src/server/free-session/public-api.ts` (rate-limit check ordering, lines 235–255).

**Confidence Score: 3/5.** Needs one targeted fix before merge: reconnecting to an already-admitted session incorrectly triggers the rate-limit block. The overall design is solid — DB schema, transaction safety, CLI rendering, and test coverage are all well done. The P1 bug where the rate-limit check runs before reading the existing session row means a user promoted from the queue while their CLI is offline loses their legitimately-earned 5th session on reconnect. This directly affects the primary user path (rejoining after disconnection at the limit boundary) and is worth fixing before shipping.
**Sequence Diagram**

```mermaid
sequenceDiagram
    participant CLI
    participant API as POST /session
    participant PublicAPI as public-api.ts
    participant Store as store.ts
    participant DB
    CLI->>API: POST (model=z-ai/glm-5.1)
    API->>PublicAPI: requestSession()
    PublicAPI->>DB: listRecentAdmits(userId, model, since, limit=5)
    DB-->>PublicAPI: [Date, Date, Date, Date, Date] (5 rows)
    alt recentCount >= 5 (rate limited)
        PublicAPI-->>API: { status: rate_limited, retryAfterMs }
        API-->>CLI: 429 { status: rate_limited, ... }
        CLI->>CLI: Show rate-limited screen, stop polling
    else recentCount < 5 (allowed)
        PublicAPI->>DB: joinOrTakeOver (UPSERT queued)
        alt instant-admit capacity available
            PublicAPI->>DB: activeCountForModel()
            DB-->>PublicAPI: count < capacity
            PublicAPI->>Store: promoteQueuedUser()
            Store->>DB: UPDATE free_session SET status=active
            Store->>DB: INSERT free_session_admit (audit row)
            Store-->>PublicAPI: active row
        end
        PublicAPI->>DB: listRecentAdmits() again (attachRateLimit)
        DB-->>PublicAPI: updated count
        PublicAPI-->>API: { status: queued/active, rateLimit }
        API-->>CLI: 200 with quota info
        CLI->>CLI: Show N/5 used in waiting room
    end
    Note over Store,DB: admitFromQueue tick also writes free_session_admit
```
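The two-step gate in the diagram (filter admits to the rolling window, then compare against the cap) can be sketched as pure functions. This is an illustrative sketch only; `admitsInWindow` and `gateSession` are hypothetical names, not the PR's actual `public-api.ts` code:

```typescript
// Illustrative sketch of the window filter + cap check from the diagram.
// Constants (20h window, limit of 5) come from the PR description.
function admitsInWindow(allAdmits: Date[], now: Date, windowHours = 20): Date[] {
  const since = now.getTime() - windowHours * 60 * 60 * 1000
  // Only admits inside the rolling window count against the quota.
  return allAdmits.filter((d) => d.getTime() >= since)
}

function gateSession(
  allAdmits: Date[],
  now: Date,
  limit = 5,
): 'rate_limited' | 'proceed' {
  const recentCount = admitsInWindow(allAdmits, now).length
  return recentCount >= limit ? 'rate_limited' : 'proceed'
}
```

Admits older than 20h simply stop counting, which is what makes this a rolling window rather than a daily reset.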
Reviews (1): Last reviewed commit: "Rate-limit freebuff GLM sessions to 5 pe..."
```typescript
// Rate-limit check runs before joinOrTakeOver so heavy users never even
// create a queued row. Only models listed in RATE_LIMITS are gated; others
// (Minimax today) fall through unchanged.
const snapshot = await fetchRateLimitSnapshot(params.userId, model, deps)
if (snapshot && snapshot.info.recentCount >= snapshot.info.limit) {
  // Oldest admit's window-anniversary is when one slot opens back up.
  // Clamped at 0 so a clock skew can't surface a negative retry-after.
  const windowMs = snapshot.info.windowHours * 60 * 60 * 1000
  const retryAfterMs = Math.max(
    0,
    (snapshot.oldest?.getTime() ?? 0) + windowMs - nowOf(deps).getTime(),
  )
  return {
    status: 'rate_limited',
    model,
    limit: snapshot.info.limit,
    windowHours: snapshot.info.windowHours,
    recentCount: snapshot.info.recentCount,
    retryAfterMs,
  }
}
```
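The retry-after clamp in that hunk can be exercised standalone. A minimal sketch of the same arithmetic (the `retryAfterMs` helper is hypothetical; the 20h constant comes from the PR description):

```typescript
// Standalone version of the retry-after arithmetic from the diff above.
const WINDOW_HOURS = 20

function retryAfterMs(oldestAdmit: Date | undefined, now: Date): number {
  const windowMs = WINDOW_HOURS * 60 * 60 * 1000
  // When the oldest admit ages out of the window, one slot frees up.
  // Math.max(0, ...) keeps clock skew from producing a negative retry-after.
  return Math.max(0, (oldestAdmit?.getTime() ?? 0) + windowMs - now.getTime())
}
```

With the oldest admit 19 hours old, the caller is told to retry in one hour; an admit already older than the window yields 0.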
**Rate limit check fires before join, blocking reconnect to legitimately-admitted session**
The rate-limit check in `requestSession` runs unconditionally before `joinOrTakeOver`. This means a user who is promoted from the queue while their CLI is offline (5th admission written by the tick) and then reconnects will hit the check in this order:

1. CLI starts → GET returns `active` (they were promoted while away)
2. Startup-takeover branch fires a POST to rotate instance id
3. POST enters `requestSession` → `fetchRateLimitSnapshot` → `recentCount = 5 >= 5`
4. Returns `rate_limited` without ever calling `joinOrTakeOver`
5. User sees "Session limit reached" and their legitimately-earned 5th session expires unused

The correct behaviour is that the rate-limit gate should only block new queue entries, not reconnections to an already-active or already-queued slot. A minimal fix is to read the existing row first and only apply the rate-limit check when no active/queued row exists.

The test suite does not cover this reconnect-while-promoted scenario, which is why it wasn't caught.
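The suggested ordering can be sketched as a small decision function: read the caller's existing row first, and only consult the quota when no live row is held. `shouldRateLimit` and its row shape are hypothetical names for illustration, not the PR's actual types:

```typescript
// Hypothetical sketch of the suggested fix: the quota only gates fresh
// admissions, never re-anchoring to a row the user already holds.
type ExistingRow = { status: 'queued' | 'active'; expired: boolean } | undefined

function shouldRateLimit(
  existing: ExistingRow,
  recentCount: number,
  limit = 5,
): boolean {
  // A live queued/active row means this is a reconnect, not a new admission.
  if (existing && !existing.expired) return false
  // Otherwise (no row, or an expired one) the rolling-window quota applies.
  return recentCount >= limit
}
```

Note the expired-row branch: per the follow-up commit, an expired row still counts as a fresh admission attempt and stays blocked at the cap.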
```diff
  const status =
-   state.status === 'model_locked' ? 409 : state.status === 'banned' ? 403 : 200
+   state.status === 'model_locked'
+     ? 409
+     : state.status === 'banned'
+       ? 403
+       : state.status === 'rate_limited'
+         ? 429
+         : 200
  return NextResponse.json(state, { status })
```
**Missing `Retry-After` HTTP header on 429 response**

RFC 6585 and standard API practice expect a `Retry-After` header on 429 responses so HTTP middleware, proxies, and any future clients can honour the back-off without parsing the JSON body. The `state` object already carries `retryAfterMs`, so the header can be set cheaply:
```typescript
// Round up so a sub-second remainder still tells clients to wait 1s.
const retryAfterSec = Math.ceil(
  ((state as { retryAfterMs?: number }).retryAfterMs ?? 0) / 1000,
)
return NextResponse.json(state, {
  status: 429,
  headers: { 'Retry-After': String(retryAfterSec) },
})
```

```sql
);
--> statement-breakpoint
ALTER TABLE "free_session_admit" ADD CONSTRAINT "free_session_admit_user_id_user_id_fk" FOREIGN KEY ("user_id") REFERENCES "public"."user"("id") ON DELETE cascade ON UPDATE no action;--> statement-breakpoint
CREATE INDEX "idx_free_session_admit_user_model_time" ON "free_session_admit" USING btree ("user_id","model","admitted_at");
```
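The composite index lines up with the read path one would expect `listRecentAdmits` to take — equality on the leading columns, then a range scan on time. The query below is a hypothetical sketch of that shape (column names from the migration; the actual query in `store.ts` is not shown in this diff):

```sql
-- Hypothetical lookup shape served by idx_free_session_admit_user_model_time:
-- equality on (user_id, model), range scan on admitted_at.
SELECT "admitted_at"
FROM "free_session_admit"
WHERE "user_id" = $1
  AND "model" = $2
  AND "admitted_at" >= $3   -- e.g. now() - interval '20 hours'
ORDER BY "admitted_at" ASC
LIMIT 5;
```

Putting `admitted_at` last lets the btree satisfy both the window filter and the oldest-first ordering without a separate sort.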
…limit

```
# Conflicts:
#   cli/src/hooks/use-freebuff-session.ts
#   web/src/app/api/v1/freebuff/session/_handlers.ts
#   web/src/server/free-session/public-api.ts
```
`requestSession` is the takeover path as well as the join path, so a user whose 5th GLM admit put them at the cap would get `rate_limited` on CLI restart and lose access to their still-active session (or their queue position). Skip the quota check when the caller already holds a queued or active+unexpired row for the same model — admit counts only need to gate fresh admissions, not re-anchoring to an existing row. Expired rows still count as fresh and remain blocked.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
**Summary**

- Adds a `free_session_admit` audit log (one row per queued→active transition) via a new migration, and records admissions from both `admitFromQueue` and `promoteQueuedUser`.
- Gates `POST /api/v1/freebuff/session` against that log so GLM 5.1 users who've already had 5 one-hour sessions in the last 20h get a new `rate_limited` response (HTTP 429). Minimax stays unlimited.
- Queued/active responses carry an optional `rateLimit` quota snapshot that the CLI renders as "N / 5 used in last 20h" the moment the user joins the waitlist; the `rate_limited` terminal screen shows "X of 5 sessions used on <model> in the last 20h. Try again in <retry-after>."

**Test plan**

- `tsc --noEmit` clean across `web`, `cli`, `common`, `packages/internal`.

🤖 Generated with Claude Code