fix: synchronous endpoint + Cloud Run scaling (fixes prod timeout)#282

Merged
beveradb merged 1 commit into main from hotfix-sync-endpoint on Mar 26, 2026

Conversation

@beveradb
Collaborator

Hotfix for prod job timeout. Remove GPU semaphore, make endpoint synchronous, increase POST timeout to 1800s.

@coderabbitai ignore

The fire-and-forget + semaphore design caused all jobs to queue on one
instance. The POST returned immediately, so Cloud Run saw no in-flight
request while background threads did the work and never scaled to new
instances.

Fix: make endpoint synchronous (await executor) with concurrency=1.
Cloud Run sees each request as active during processing and scales to
new GPU instances for concurrent jobs, matching Modal's .spawn().
The POST now blocks until the job finishes, so increase the client POST
timeout to 1800s to match.
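
To illustrate the difference between the two handler shapes, here is a
minimal sketch using only the standard library. It is not the code from
this PR: `run_job`, `handle_fire_and_forget`, `handle_synchronous`, and
the single-worker executor are hypothetical names chosen for the
example, and the sleep stands in for the real GPU work.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the GPU job; the real work is not shown here.
def run_job(job_id: str) -> str:
    time.sleep(0.05)
    return f"done:{job_id}"

# One worker per instance, mirroring the idea of concurrency=1.
executor = ThreadPoolExecutor(max_workers=1)

async def handle_fire_and_forget(job_id: str) -> dict:
    # Old shape: hand the job to a background thread and return at once.
    # The request is already finished while the job is still running.
    executor.submit(run_job, job_id)
    return {"status": "accepted", "job_id": job_id}

async def handle_synchronous(job_id: str) -> dict:
    # New shape: await the executor, so the request stays open for the
    # whole duration of the job and only returns when it is done.
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(executor, run_job, job_id)
    return {"status": "completed", "job_id": job_id, "result": result}
```

In the second handler the request lifetime equals the job lifetime,
which is what lets a request-based autoscaler count the instance as
busy. The trade-off is that the caller has to hold the connection open,
hence the longer client timeout.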

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@beveradb beveradb merged commit b8ded33 into main Mar 26, 2026
2 checks passed
@beveradb beveradb deleted the hotfix-sync-endpoint branch March 26, 2026 16:35