
Shorten TMDB request timeout and retries; make env-tunable #451

Open
funkypenguin wants to merge 1 commit into cedya77:dev from funkypenguin:fix/tmdb-shed-timeouts

Conversation

@funkypenguin

Summary

Cut TMDB per-request timeout from 15s → 5s and max attempts from 3 → 2, both env-tunable. Worst-case held time per request drops from 45s to 10s.
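The worst-case arithmetic behind those numbers (per-attempt timeout × attempts, ignoring any 429 Retry-After waits) works out as a quick sanity check:

```javascript
// Worst-case time a single TMDB call can hold its buffers and promise
// chain: every attempt times out, so held time = timeoutMs * attempts.
const worstCaseMs = (timeoutMs, attempts) => timeoutMs * attempts;

const before = worstCaseMs(15000, 3); // old hard-coded values
const after = worstCaseMs(5000, 2);   // new defaults

console.log(before, after); // 45000 10000
```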

Motivation

Under upstream TMDB brownouts (observed on a busy public instance), every concurrent TMDB call was holding its response buffer + promise chain for up to 45 seconds waiting on retries. A single manifest load fans out to dozens of TMDB calls; concurrent user traffic multiplies that. During a brief TMDB slowdown, the in-flight state stacked up faster than GC could reclaim it, taking the process to OOM even with --max-old-space-size=8192.

Example from the recent incident:

[Meta] WARN [MovieMeta] TMDB fallback failed for tmdb:1241921: The operation was aborted due to timeout
[Meta] WARN [MovieMeta] TMDB fallback failed for tmdb:1125257: The operation was aborted due to timeout
... [50+ more in a few seconds] ...
[Trakt] Up Next: fetchTraktUpNextEpisodes took 124964ms   <-- single request, 2+ minutes
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

The process wasn't leaking memory — it was holding 8 GB of concurrent pending-TMDB-request state. Raising the heap limit doesn't help; shedding load faster does.

Change

Two small changes in makeTmdbRequest (the retry count swap, plus a new env-backed timeout constant):

```diff
- const maxRetries = 3;
+ const maxRetries = Math.max(1, parseInt(process.env.TMDB_MAX_RETRIES || '2', 10));
+ const requestTimeoutMs = Math.max(1000, parseInt(process.env.TMDB_REQUEST_TIMEOUT_MS || '5000', 10));

- signal: AbortSignal.timeout(15000)
+ signal: AbortSignal.timeout(requestTimeoutMs)
```
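A minimal sketch of how those values behave in context. The `envInt` helper and `fetchWithRetry` loop are illustrative, not the actual makeTmdbRequest; the clamping mirrors the `Math.max(..., parseInt(...))` pattern in the diff, with an extra NaN guard:

```javascript
// Parse an env var as an integer, falling back to a default and
// clamping to a floor (so a typo can't disable timeouts entirely).
function envInt(name, fallback, floor) {
  const parsed = parseInt(process.env[name] || String(fallback), 10);
  return Math.max(floor, Number.isNaN(parsed) ? fallback : parsed);
}

const maxRetries = envInt('TMDB_MAX_RETRIES', 2, 1);                    // floor: 1 attempt
const requestTimeoutMs = envInt('TMDB_REQUEST_TIMEOUT_MS', 5000, 1000); // floor: 1s

// Illustrative retry loop: each attempt is bounded by AbortSignal.timeout,
// so worst-case held time is maxRetries * requestTimeoutMs.
async function fetchWithRetry(url) {
  let lastErr;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await fetch(url, { signal: AbortSignal.timeout(requestTimeoutMs) });
    } catch (err) {
      lastErr = err; // timed out or network error; retry if attempts remain
    }
  }
  throw lastErr;
}
```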

New env vars

| Var | Default | Prior behaviour |
| --- | --- | --- |
| `TMDB_REQUEST_TIMEOUT_MS` | 5000 | 15000 (hard-coded) |
| `TMDB_MAX_RETRIES` | 2 | 3 (hard-coded) |
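For example, an operator could override the defaults at launch (`node server.js` is a placeholder for however the instance is actually started):

```shell
# Restore the old aggressive-retry behaviour on a self-hosted instance
# with a private TMDB key and spare quota:
export TMDB_REQUEST_TIMEOUT_MS=15000
export TMDB_MAX_RETRIES=3
# node server.js
echo "$TMDB_REQUEST_TIMEOUT_MS $TMDB_MAX_RETRIES"
```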

Self-hosters with private TMDB keys and spare quota can raise these back to prior values if they prefer aggressive retrying. Operators hitting rate limits or upstream flakiness can tune down further.

What's preserved

  • 429 backoff via Retry-After header — untouched
  • Non-retryable codes fast-fail path — untouched
  • Error classification and logging — untouched
  • The scrapedImdbIdCache clear — untouched

Follow-ups (not in this PR)

Similar timeout/retry tuning is warranted for: TVmaze (75s worst case), Trakt (90s), AniList (90s), MAL/Jikan (45s), MDBList (50s), Letterboxd (90s), IMDb Ratings (120s), TVDB (24s). Keeping this PR focused on TMDB since that's the one in the incident log and the most frequently hit provider.

Test plan

  • node --check on modified file
  • Applies cleanly against current dev
  • With defaults, verify typical TMDB latency stays well under 5s (normal TMDB is ~200-800ms)
  • Under induced upstream slowness, verify requests fail fast at 5-10s instead of 45s
  • Confirm Retry-After handling still works for 429 responses (existing test path)

🤖 Generated with Claude Code

makeTmdbRequest() retried up to 3 times with a 15-second AbortSignal
timeout per attempt, meaning a single failing TMDB call could hold its
response buffers and promise chain for 45 seconds before giving up. Under
upstream brownouts (observed recently on a busy public instance) this
stacks up across the dozens of concurrent TMDB calls that a single
manifest/search/meta request can trigger, and the in-flight state piles
up in the Node heap until it OOMs — raising --max-old-space-size to 8GB
just delayed the failure by a few minutes because the actual problem was
in-flight request volume, not retained data.

New defaults: 5s × 2 attempts = 10s worst case (4.5x less held time).

Env vars:
  TMDB_REQUEST_TIMEOUT_MS (default 5000)
  TMDB_MAX_RETRIES        (default 2)

Operators running into TMDB rate limits or flaky upstreams can tune these
without rebuilding; self-hosted instances with spare TMDB quota can bump
them back up. The existing backoff-on-429 logic is preserved and still
honours the Retry-After header.

Only affects TMDB. Similar treatment is warranted for TVmaze, Trakt,
AniList, MAL, MDBList, Letterboxd, and IMDb Ratings — leaving those for
follow-up PRs to keep this change focused and easy to review.
@github-actions

PR Guard

  • Missing template sections: ## linked issue, ## type of change, ## why this approach, ## testing, ## documentation, ## author checklist, ## ai usage disclosure
  • Non-trivial PRs must link an issue in the PR body.
  • Please complete the relevant checkboxes in the PR template.

Maintainers may still close PRs that do not match project direction or review capacity.
