Rate-limit freebuff GLM sessions to 5 per 20 hours #537

Merged
jahooma merged 3 commits into main from jahooma/freebuff-glm-limit
Apr 24, 2026

@jahooma jahooma commented Apr 22, 2026

Summary

  • Adds a free_session_admit audit log (one row per queued→active transition) via a new migration, and records admissions from both admitFromQueue and promoteQueuedUser.
  • Gates POST /api/v1/freebuff/session against that log so GLM 5.1 users who've already had 5 one-hour sessions in the last 20h get a new rate_limited response (HTTP 429). Minimax stays unlimited.
  • Queued/active responses now carry an optional rateLimit quota snapshot that the CLI renders as "N / 5 used in last 20h" the moment the user joins the waitlist; the rate_limited terminal screen shows "X of 5 sessions used on <model> in the last 20h. Try again in <retry-after>."
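A minimal sketch of how the two CLI strings quoted above could be produced. The `RateLimitInfo` shape is an assumption modelled on the fields visible in this PR's `rate_limited` response (`limit`, `windowHours`, `recentCount`). The helper names, including `formatRetryAfterSketch`, are hypothetical and are not the PR's own `formatRetryAfter`.

```typescript
// Hypothetical shape, modelled on fields visible in this PR's responses.
interface RateLimitInfo {
  limit: number
  windowHours: number
  recentCount: number
}

// Waiting-room quota line, e.g. "3 / 5 used in last 20h".
function formatQuotaLine(info: RateLimitInfo): string {
  return `${info.recentCount} / ${info.limit} used in last ${info.windowHours}h`
}

// Coarse "Xh Ym" label. Non-finite or non-positive input falls back to
// "now" (an assumed fallback, not confirmed by the PR).
function formatRetryAfterSketch(ms: number): string {
  if (!Number.isFinite(ms) || ms <= 0) return 'now'
  const totalMinutes = Math.ceil(ms / 60000)
  const hours = Math.floor(totalMinutes / 60)
  const minutes = totalMinutes % 60
  if (hours === 0) return `${minutes}m`
  return minutes === 0 ? `${hours}h` : `${hours}h ${minutes}m`
}

// Terminal rate-limited message, following the wording in the summary.
function formatRateLimitedMessage(
  info: RateLimitInfo,
  model: string,
  retryAfterMs: number,
): string {
  return (
    `${info.recentCount} of ${info.limit} sessions used on ${model} ` +
    `in the last ${info.windowHours}h. ` +
    `Try again in ${formatRetryAfterSketch(retryAfterMs)}.`
  )
}
```

Keeping the formatting in pure functions like these makes the quota display testable without rendering the CLI.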

Test plan

  • 574 web unit tests pass, including 5 new rate-limit cases (over-limit block, window rolloff, Minimax unlimited, queued quota display, instant-admit quota bump).
  • tsc --noEmit clean across web, cli, common, packages/internal.
  • Manual: run the freebuff CLI, join the GLM queue, verify the quota line appears; seed 5 admits in the last 20h and confirm the 6th attempt shows the rate-limited screen.

🤖 Generated with Claude Code

Adds a free_session_admit audit log (one row per queued→active transition)
and gates POST /api/v1/freebuff/session against it so GLM 5.1 users who've
already had 5 one-hour sessions in the last 20h are blocked with a new
rate_limited status (HTTP 429). Queued/active responses now carry an
optional rateLimit quota the CLI renders as "N / 5 used in last 20h" so
users see their remaining allowance as soon as they join the waitlist.
Minimax is left unlimited.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

greptile-apps Bot commented Apr 22, 2026

Greptile Summary

This PR introduces a per-user, per-model session rate limit for freebuff GLM 5.1 users (5 sessions per 20-hour rolling window). It adds a new free_session_admit audit table that records every queued→active promotion, and gates POST /api/v1/freebuff/session against that log. Minimax remains unlimited. The CLI gains a "N / 5 used in last 20h" quota line in the waiting-room screen and a terminal rate_limited screen with a retry-after countdown.
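The rolling-window check described above can be sketched in memory as follows. This is an illustration only: `recentAdmits` is a hypothetical stand-in for the PR's `listRecentAdmits` database query, and the strict `>` comparison at the window boundary is an assumption rather than something the PR confirms.

```typescript
const HOUR_MS = 60 * 60 * 1000

// Hypothetical in-memory stand-in for a query over free_session_admit:
// returns the admits that fall inside the rolling window, oldest first.
function recentAdmits(admits: Date[], now: Date, windowHours: number): Date[] {
  const since = now.getTime() - windowHours * HOUR_MS
  return admits
    .filter((d) => d.getTime() > since) // boundary handling is assumed
    .sort((a, b) => a.getTime() - b.getTime())
}

// True when the user has used up the quota for this window.
function isRateLimited(
  admits: Date[],
  now: Date,
  limit = 5,
  windowHours = 20,
): boolean {
  return recentAdmits(admits, now, windowHours).length >= limit
}
```

With five admits in the last 20 hours the check blocks; once the oldest admit is more than 20 hours old it rolls off and a slot opens again.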

One P1 logic issue was found: the rate-limit check in requestSession runs before joinOrTakeOver, so a user who is promoted from the queue while their CLI is offline (their 5th admission is written by the tick) will get a rate_limited response when they try to reconnect and take over their active slot — effectively losing that session. The fix is to skip the rate-limit gate for existing active/queued rows.

Confidence Score: 3/5

Needs one targeted fix before merge: reconnecting to an already-admitted session incorrectly triggers the rate-limit block.

The overall design is solid — DB schema, transaction safety, CLI rendering, and test coverage are all well done. The P1 bug where the rate-limit check runs before reading the existing session row means a user promoted from the queue while their CLI is offline loses their legitimately-earned 5th session on reconnect. This directly affects the primary user path (rejoining after disconnection at the limit boundary) and is worth fixing before shipping.

web/src/server/free-session/public-api.ts (rate-limit check ordering, lines 235–255)

Important Files Changed

| Filename | Overview |
|---|---|
| web/src/server/free-session/public-api.ts | Core rate-limit gate + quota-snapshot logic; contains a P1 bug where reconnects to an already-admitted session are incorrectly blocked by the rate-limit check. |
| web/src/server/free-session/store.ts | Adds freeSessionAdmit write in both admitFromQueue and promoteQueuedUser within transactions, and adds listRecentAdmits query; implementation is correct and well-structured. |
| web/src/app/api/v1/freebuff/session/_handlers.ts | Routes rate_limited to HTTP 429 consistently with the existing pattern for banned (403) and model_locked (409); missing Retry-After header on the 429 response (P2). |
| common/src/types/freebuff-session.ts | Adds FreebuffSessionRateLimit interface and rate_limited discriminant to the shared response union; types are accurate and well-documented. |
| packages/internal/src/db/migrations/0046_cloudy_firedrake.sql | New free_session_admit table with composite index on (user_id, model, admitted_at); schema matches Drizzle definition; missing trailing newline (cosmetic). |
| web/src/server/free-session/tests/public-api.test.ts | Good coverage of rate-limit happy path, window rolloff, Minimax unlimited, and instant-admit quota bump; missing a test for the reconnect-to-active-after-queue-promotion scenario. |
| cli/src/hooks/use-freebuff-session.ts | Handles 429 from POST as a terminal non-throw state; rate_limited correctly returns null from nextDelayMs stopping the poll loop. |
| cli/src/components/waiting-room-screen.tsx | Adds rate-limited terminal screen and inline quota display for rate-limited models; formatRetryAfter handles edge cases (non-finite, ≤0) gracefully. |
| cli/src/app.tsx | Correctly routes rate_limited to the WaitingRoomScreen alongside other terminal statuses. |

Sequence Diagram

sequenceDiagram
    participant CLI
    participant API as POST /session
    participant PublicAPI as public-api.ts
    participant Store as store.ts
    participant DB

    CLI->>API: POST (model=z-ai/glm-5.1)
    API->>PublicAPI: requestSession()
    PublicAPI->>DB: listRecentAdmits(userId, model, since, limit=5)
    DB-->>PublicAPI: [Date, Date, Date, Date, Date] (5 rows)

    alt recentCount >= 5 (rate limited)
        PublicAPI-->>API: { status: rate_limited, retryAfterMs }
        API-->>CLI: 429 { status: rate_limited, ... }
        CLI->>CLI: Show rate-limited screen, stop polling
    else recentCount < 5 (allowed)
        PublicAPI->>DB: joinOrTakeOver (UPSERT queued)
        alt instant-admit capacity available
            PublicAPI->>DB: activeCountForModel()
            DB-->>PublicAPI: count < capacity
            PublicAPI->>Store: promoteQueuedUser()
            Store->>DB: UPDATE free_session SET status=active
            Store->>DB: INSERT free_session_admit (audit row)
            Store-->>PublicAPI: active row
        end
        PublicAPI->>DB: listRecentAdmits() again (attachRateLimit)
        DB-->>PublicAPI: updated count
        PublicAPI-->>API: { status: queued/active, rateLimit }
        API-->>CLI: 200 with quota info
        CLI->>CLI: Show N/5 used in waiting room
    end

    Note over Store,DB: admitFromQueue tick also writes free_session_admit

Reviews (1): Last reviewed commit: "Rate-limit freebuff GLM sessions to 5 pe..."

Comment on lines +235 to +255

  // Rate-limit check runs before joinOrTakeOver so heavy users never even
  // create a queued row. Only models listed in RATE_LIMITS are gated; others
  // (Minimax today) fall through unchanged.
  const snapshot = await fetchRateLimitSnapshot(params.userId, model, deps)
  if (snapshot && snapshot.info.recentCount >= snapshot.info.limit) {
    // Oldest admit's window-anniversary is when one slot opens back up.
    // Clamped at 0 so a clock skew can't surface a negative retry-after.
    const windowMs = snapshot.info.windowHours * 60 * 60 * 1000
    const retryAfterMs = Math.max(
      0,
      (snapshot.oldest?.getTime() ?? 0) + windowMs - nowOf(deps).getTime(),
    )
    return {
      status: 'rate_limited',
      model,
      limit: snapshot.info.limit,
      windowHours: snapshot.info.windowHours,
      recentCount: snapshot.info.recentCount,
      retryAfterMs,
    }
  }
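The retry-after arithmetic in the snippet above can be isolated as a small pure function. This is a sketch under the assumption that `oldest` is the earliest admit still inside the window; the standalone function name and signature are hypothetical and do not appear in the PR.

```typescript
// Time until the oldest in-window admit ages out, in milliseconds.
// Clamped at 0 so clock skew cannot produce a negative value.
function computeRetryAfterMs(
  oldest: Date,
  now: Date,
  windowHours: number,
): number {
  const windowMs = windowHours * 60 * 60 * 1000
  return Math.max(0, oldest.getTime() + windowMs - now.getTime())
}
```

For example, if the oldest of the five admits happened 18 hours ago and the window is 20 hours, the slot reopens in 2 hours, which is 7,200,000 ms. An admit older than the window yields 0.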
P1 Rate limit check fires before join, blocking reconnect to legitimately-admitted session

The rate-limit check in requestSession runs unconditionally before joinOrTakeOver. This means a user who is promoted from the queue while their CLI is offline (5th admission written by the tick) and then reconnects will hit the check in this order:

  1. CLI starts → GET returns active (they were promoted while away)
  2. Startup-takeover branch fires a POST to rotate instance id
  3. POST enters requestSession → fetchRateLimitSnapshot → recentCount = 5 >= 5
  4. Returns rate_limited without ever calling joinOrTakeOver
  5. User sees "Session limit reached" and their legitimately-earned 5th session expires unused

The correct behaviour is that the rate-limit gate should only block new queue entries, not reconnections to an already-active or already-queued slot. A minimal fix is to read the existing row first and only apply the rate-limit check when no active/queued row exists.

The test suite does not cover this reconnect-while-promoted scenario, which is why it wasn't caught.

Comment on lines 147 to 155

    const status =
  -   state.status === 'model_locked' ? 409 : state.status === 'banned' ? 403 : 200
  +   state.status === 'model_locked'
  +     ? 409
  +     : state.status === 'banned'
  +       ? 403
  +       : state.status === 'rate_limited'
  +         ? 429
  +         : 200
    return NextResponse.json(state, { status })
P2 Missing Retry-After HTTP header on 429 response

RFC 6585 allows a 429 response to include a Retry-After header, and standard API practice is to send one so that HTTP middleware, proxies, and any future clients can honour the back-off without parsing the JSON body. The state object already carries retryAfterMs, so the header can be set cheaply:

const retryAfterSec = Math.ceil(((state as { retryAfterMs?: number }).retryAfterMs ?? 0) / 1000)
return NextResponse.json(state, {
  status: 429,
  headers: { 'Retry-After': String(retryAfterSec) },
})
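As a worked example of the conversion, the header value can be computed by a small helper. This is a sketch, and `retryAfterHeaderValue` is a hypothetical name. The Retry-After delay form is a non-negative integer number of seconds, so the millisecond value is rounded up to avoid telling a client to retry too early.

```typescript
// Converts a millisecond delay into a Retry-After delay-seconds string.
// Missing, non-finite, or negative input is treated as 0 (an assumed
// fallback for illustration).
function retryAfterHeaderValue(retryAfterMs: number | undefined): string {
  const ms =
    typeof retryAfterMs === 'number' && Number.isFinite(retryAfterMs)
      ? Math.max(0, retryAfterMs)
      : 0
  return String(Math.ceil(ms / 1000))
}
```

A delay of 1500 ms becomes "2" and a delay of 7,200,000 ms becomes "7200".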

);
--> statement-breakpoint
ALTER TABLE "free_session_admit" ADD CONSTRAINT "free_session_admit_user_id_user_id_fk" FOREIGN KEY ("user_id") REFERENCES "public"."user"("id") ON DELETE cascade ON UPDATE no action;--> statement-breakpoint
CREATE INDEX "idx_free_session_admit_user_model_time" ON "free_session_admit" USING btree ("user_id","model","admitted_at");
\ No newline at end of file
P2 SQL file is missing a trailing newline

The diff ends with \ No newline at end of file. POSIX defines a text line as ending in a newline, so many SQL linters, editors, and git diff tools flag a file without a trailing newline as malformed. A newline should be added after the final ;.

jahooma and others added 2 commits April 24, 2026 15:16
…limit

# Conflicts:
#	cli/src/hooks/use-freebuff-session.ts
#	web/src/app/api/v1/freebuff/session/_handlers.ts
#	web/src/server/free-session/public-api.ts
requestSession is the takeover path as well as the join path, so a user
whose 5th GLM admit put them at the cap would get rate_limited on CLI
restart and lose access to their still-active session (or their queue
position). Skip the quota check when the caller already holds a queued
or active+unexpired row for the same model — admit counts only need to
gate fresh admissions, not re-anchoring to an existing row. Expired
rows still count as fresh and remain blocked.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
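The exemption described in this commit message can be sketched as a predicate evaluated before the quota check. This is an illustration under stated assumptions: the `SessionRow` shape, the `expiresAt` field, and both function names are hypothetical and are not taken from the PR's code.

```typescript
// Hypothetical row shape for illustration only.
interface SessionRow {
  model: string
  status: 'queued' | 'active'
  expiresAt: Date | null // assumed to be set for active rows
}

// True when the caller already holds a queued row, or an active and
// unexpired row, for the requested model.
function holdsLiveRow(row: SessionRow | null, model: string, now: Date): boolean {
  if (row === null || row.model !== model) return false
  if (row.status === 'queued') return true
  return row.expiresAt !== null && row.expiresAt.getTime() > now.getTime()
}

// The quota gate applies only to fresh admissions. Expired rows count as
// fresh, so they remain subject to the gate.
function shouldApplyQuotaGate(
  row: SessionRow | null,
  model: string,
  now: Date,
): boolean {
  return !holdsLiveRow(row, model, now)
}
```

Under this sketch, a user whose fifth admit is still active skips the gate on CLI restart, while a user with no row, an expired row, or a row for a different model is still checked against the quota.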
@jahooma jahooma merged commit 585260b into main Apr 24, 2026
34 checks passed
@jahooma jahooma deleted the jahooma/freebuff-glm-limit branch April 24, 2026 22:42