
Shorten TMDB request timeout and retries; make env-tunable #451

Open
funkypenguin wants to merge 1 commit into cedya77:dev from funkypenguin:fix/tmdb-shed-timeouts

Conversation

@funkypenguin

Summary

Cut TMDB per-request timeout from 15s → 5s and max attempts from 3 → 2, both env-tunable. Worst-case held time per request drops from 45s to 10s.
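The worst-case arithmetic behind those numbers (per-attempt timeout × attempts, ignoring any 429 Retry-After waits) works out as a quick sanity check:

```javascript
// Worst-case time a single TMDB call can hold its buffers and promise
// chain: every attempt times out, so held time = timeoutMs * attempts.
const worstCaseMs = (timeoutMs, attempts) => timeoutMs * attempts;

const before = worstCaseMs(15000, 3); // old hard-coded values
const after = worstCaseMs(5000, 2);   // new defaults

console.log(before, after); // 45000 10000
```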

Motivation

Under upstream TMDB brownouts (observed on a busy public instance), every concurrent TMDB call was holding its response buffer + promise chain for up to 45 seconds waiting on retries. A single manifest load fans out to dozens of TMDB calls; concurrent user traffic multiplies that. During a brief TMDB slowdown, the in-flight state stacked up faster than GC could reclaim it, taking the process to OOM even with --max-old-space-size=8192.

Example from the recent incident:

[Meta] WARN [MovieMeta] TMDB fallback failed for tmdb:1241921: The operation was aborted due to timeout
[Meta] WARN [MovieMeta] TMDB fallback failed for tmdb:1125257: The operation was aborted due to timeout
... [50+ more in a few seconds] ...
[Trakt] Up Next: fetchTraktUpNextEpisodes took 124964ms   <-- single request, 2+ minutes
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

The process wasn't leaking memory — it was holding 8 GB of concurrent pending-TMDB-request state. Raising the heap limit doesn't help; shedding load faster does.

Change

Two small changes in makeTmdbRequest (the retry count swap, plus a new env-backed timeout constant):

```diff
- const maxRetries = 3;
+ const maxRetries = Math.max(1, parseInt(process.env.TMDB_MAX_RETRIES || '2', 10));
+ const requestTimeoutMs = Math.max(1000, parseInt(process.env.TMDB_REQUEST_TIMEOUT_MS || '5000', 10));

- signal: AbortSignal.timeout(15000)
+ signal: AbortSignal.timeout(requestTimeoutMs)
```
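A minimal sketch of how those values behave in context. The `envInt` helper and `fetchWithRetry` loop are illustrative, not the actual makeTmdbRequest; the clamping mirrors the `Math.max(..., parseInt(...))` pattern in the diff, with an extra NaN guard:

```javascript
// Parse an env var as an integer, falling back to a default and
// clamping to a floor (so a typo can't disable timeouts entirely).
function envInt(name, fallback, floor) {
  const parsed = parseInt(process.env[name] || String(fallback), 10);
  return Math.max(floor, Number.isNaN(parsed) ? fallback : parsed);
}

const maxRetries = envInt('TMDB_MAX_RETRIES', 2, 1);                    // floor: 1 attempt
const requestTimeoutMs = envInt('TMDB_REQUEST_TIMEOUT_MS', 5000, 1000); // floor: 1s

// Illustrative retry loop: each attempt is bounded by AbortSignal.timeout,
// so worst-case held time is maxRetries * requestTimeoutMs.
async function fetchWithRetry(url) {
  let lastErr;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await fetch(url, { signal: AbortSignal.timeout(requestTimeoutMs) });
    } catch (err) {
      lastErr = err; // timed out or network error; retry if attempts remain
    }
  }
  throw lastErr;
}
```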

New env vars

| Var | Default | Prior behaviour |
| --- | --- | --- |
| `TMDB_REQUEST_TIMEOUT_MS` | 5000 | 15000 (hard-coded) |
| `TMDB_MAX_RETRIES` | 2 | 3 (hard-coded) |
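For example, an operator could override the defaults at launch (`node server.js` is a placeholder for however the instance is actually started):

```shell
# Restore the old aggressive-retry behaviour on a self-hosted instance
# with a private TMDB key and spare quota:
export TMDB_REQUEST_TIMEOUT_MS=15000
export TMDB_MAX_RETRIES=3
# node server.js
echo "$TMDB_REQUEST_TIMEOUT_MS $TMDB_MAX_RETRIES"
```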

Self-hosters with private TMDB keys and spare quota can raise these back to prior values if they prefer aggressive retrying. Operators hitting rate limits or upstream flakiness can tune down further.

What's preserved

  • 429 backoff via Retry-After header — untouched
  • Non-retryable codes fast-fail path — untouched
  • Error classification and logging — untouched
  • The scrapedImdbIdCache clear — untouched

Follow-ups (not in this PR)

Similar timeout/retry tuning is warranted for: TVmaze (75s worst case), Trakt (90s), AniList (90s), MAL/Jikan (45s), MDBList (50s), Letterboxd (90s), IMDb Ratings (120s), TVDB (24s). Keeping this PR focused on TMDB since that's the one in the incident log and the most frequently hit provider.

Test plan

  • node --check on modified file
  • Applies cleanly against current dev
  • With defaults, verify typical TMDB latency stays well under 5s (normal TMDB is ~200-800ms)
  • Under induced upstream slowness, verify requests fail fast at 5-10s instead of 45s
  • Confirm Retry-After handling still works for 429 responses (existing test path)

🤖 Generated with Claude Code

makeTmdbRequest() retried up to 3 times with a 15-second AbortSignal
timeout per attempt, meaning a single failing TMDB call could hold its
response buffers and promise chain for 45 seconds before giving up. Under
upstream brownouts (observed recently on a busy public instance) this
stacks up across the dozens of concurrent TMDB calls that a single
manifest/search/meta request can trigger, and the in-flight state piles
up in the Node heap until it OOMs — raising --max-old-space-size to 8GB
just delayed the failure by a few minutes because the actual problem was
in-flight request volume, not retained data.

New defaults: 5s × 2 attempts = 10s worst case (4.5x less held time).

Env vars:
  TMDB_REQUEST_TIMEOUT_MS (default 5000)
  TMDB_MAX_RETRIES        (default 2)

Operators running into TMDB rate limits or flaky upstreams can tune these
without rebuilding; self-hosted instances with spare TMDB quota can bump
them back up. The existing backoff-on-429 logic is preserved and still
honours the Retry-After header.

Only affects TMDB. Similar treatment is warranted for TVmaze, Trakt,
AniList, MAL, MDBList, Letterboxd, and IMDb Ratings — leaving those for
follow-up PRs to keep this change focused and easy to review.
@github-actions

PR Guard

  • Missing template sections: ## linked issue, ## type of change, ## why this approach, ## testing, ## documentation, ## author checklist, ## ai usage disclosure
  • Non-trivial PRs must link an issue in the PR body.
  • Please complete the relevant checkboxes in the PR template.

Maintainers may still close PRs that do not match project direction or review capacity.
