Skip to content

fix(opdb) use /api/export, not the non-existent paginated /api/machines#76

Merged
jkeeley2073 merged 1 commit into
mainfrom
Dev-OpdbExportEndpoint
May 4, 2026
Merged

fix(opdb) use /api/export, not the non-existent paginated /api/machines#76
jkeeley2073 merged 1 commit into
mainfrom
Dev-OpdbExportEndpoint

Conversation

@jkeeley2073
Copy link
Copy Markdown
Contributor

Summary

  • Item 4 hand-off (Phase 2 § Scope) live-validation surfaced a regression in PR d9face6. The original OPDB integration was built against /api/machines?page=...&page_size=... — an endpoint that does not exist on the live OPDB API (404). PR d9face6's StubHandler faithfully pinned a contract that the real API never honored. Same failure pattern as PR refactor(stern) revert LinkRaw / BulletinRaw to positional records #72 / Item 6 (PR fix(stern) revert Playwright DTOs to classes — records broke live deserialization #75) — tests pinning a fiction.
  • Live-API probing (separate from the codebase) confirmed /api/export is the actual bulk-machines endpoint: 200, single 2.4 MB JSON array, ~2,360 machines (the build-spec's "~12k" estimate was off by ~5×). /api/machines/{opdb_id} (single-machine lookup) and /api/changelog (incremental) work; /api/machines?page=... 404s.
  • Fix: rewrite OpdbClient.StreamAllMachinesAsync as a single GET to /api/export + JsonSerializer.DeserializeAsyncEnumerable for streaming parse (no full materialization). Remove the now-dead OpdbOptions.PageSize (dead-config-grep clean). Bump OpdbOptions.HttpTimeoutSeconds default 60s → 120s.
  • Resilience pipeline bump: the global AddStandardResilienceHandler in ServiceDefaults was configured with the Microsoft-default 30s TotalRequestTimeout / 10s AttemptTimeout, which guillotines OPDB's bulk export response (cold cache can take 30s+). Bumped to 120s / 50s / 120s CircuitBreaker.SamplingDuration. A per-client override on "OpdbClient-standard" was tried and proved a no-op — the named-options key behavior of ConfigureHttpClientDefaults doesn't match the obvious convention; the global bump is the simpler deterministic fix and benefits Stern Vue.js networkidle waits too (15–25s typical).
  • Documented: decision-log DL-0003 captures the full PR d9face6 → Item 4 hand-off → revert history with a "revisit when" trigger; build-spec Phase 2 § Hand-off outcomes appended with the Item 4 outcome and the corrected machine count; docs/http-resilience-research.md gets a status note at the top reconciling the original "use per-client custom pipelines" recommendation against what actually shipped (bumped global defaults).

Changes

  • src/PinballWizard.Infrastructure/Integrations/Opdb/OpdbClient.csStreamAllMachinesAsync rewritten; xmldoc rewritten; BuildPagedUrl helper deleted.
  • src/PinballWizard.Infrastructure/Integrations/Opdb/ServiceCollectionExtensions.cs — comment updated to describe the inner-vs-outer relationship between HttpClient.Timeout and the resilience pipeline.
  • src/PinballWizard.Core/Configuration/OpdbOptions.csPageSize removed; HttpTimeoutSeconds default 60s → 120s; xmldoc rewritten.
  • src/PinballWizard.ServiceDefaults/Extensions.csAddStandardResilienceHandler configured with explicit timeout overrides (120s/50s/120s).
  • tests/PinballWizard.Scraper.Tests/Integrations/Opdb/OpdbClientTests.cs — paginated tests replaced with HitsExportEndpoint_NotPaginatedMachines (asserts exactly one request to /api/export) + EmptyArray_YieldsNothing. PageSize removed from setup.
  • tests/PinballWizard.Scraper.Tests/Integrations/Opdb/OpdbSyncServiceTests.cs — 9 SetResponseFor calls updated /api/machines?page=1&page_size=100/api/export. PageSize removed from setup.
  • docs/decision-log.md — new DL-0003 entry.
  • docs/build-spec.md — Phase 2 § Hand-off outcomes appended; "~12k" estimate corrected to "~2.4k" inline.
  • docs/http-resilience-research.md — status note at the top.
  • packages.lock.json files updated (CRLF normalization + checksum updates).

Work Item

No Jira/ADO ticket — PinballWizard is a personal Earlybird showcase project (no work-item tracker integration per feedback_personal_identity_only.md).

Test Plan

  • dotnet build PinballWizard.slnx — clean, zero warnings
  • dotnet test535 / 535 passing (was 534; -2 paginated-shape tests, +3 new tests)
  • Standalone probe of https://opdb.org/api/export — 200, 2.4 MB, 2,359 records, expected DTO field shape
  • Reviewer / post-merge: --source opdb --dry-run against deployed Cosmos succeeds — blocked at PR-creation time by an OPDB rate-limit window from validation probes; will run cleanly once the window clears (15–30 min). Same playbook as PR fix(stern) revert Playwright DTOs to classes — records broke live deserialization #75's games scrape.
  • Reviewer / post-merge: --source opdb (apply mode) against deployed Cosmos verifies ~2,360 machines upserted to the machines container with full provenance.
  • git log -1 --format='%an <%ae>': personal noreply identity confirmed
  • /local-review: 0 🔴 / 3 ⚠️ / 7 ✅ — one ⚠️ addressed in-PR (http-resilience-research.md status note); two deferred as follow-ups with justification

Notes for Reviewer

  • The headline pattern carried forward from PR fix(stern) revert Playwright DTOs to classes — records broke live deserialization #75: when wiring an external API, treat the live response shape as the contract. The StubHandler is a derivative of that, not its source of truth. PR d9face6's tests passed against SetResponseFor("/api/machines?page=1...") — the real API was never consulted at integration time. Item 4's deferred operational hand-off is what surfaced this — same dynamic as Item 6.
  • The HitsExportEndpoint_NotPaginatedMachines test is the durable pin worth carrying forward: it stubs only /api/export AND asserts Assert.Single(_handler.Requests) with PathAndQuery == "/api/export". A future regression to pagination either 404s (no stub) or trips the Single() assertion — deterministic catch.
  • Per-client resilience override didn't work, deserves a follow-up. I tried services.AddOptions<HttpStandardResilienceOptions>("OpdbClient-standard").Configure(...) and it silently no-op'd — the standard handler from ConfigureHttpClientDefaults doesn't read options bound to that key. Documented in DL-0003 alternatives-considered. The global bump is fine; if a future client genuinely needs different bounds, the right move is migrating away from ConfigureHttpClientDefaults to per-client AddResilienceHandler (which is what docs/http-resilience-research.md originally recommended). Not blocking this PR.
  • Two deferred follow-ups (called out by /local-review and worth tracking): (a) malformed-row resilience — DeserializeAsyncEnumerable aborts on first bad row, killing the whole 2,360-row sync; wrap inner DTO read in try/catch + skip + counter. (b) Mid-stream cancellation test — framework-honored via token threading but not explicitly tested.
  • Operational close: the deployed-Cosmos sync run was blocked at commit time by OPDB's rate limit. I'll attempt it again post-merge and post a confirmation comment with the actual machine count.

Generated with Claude Code

@jkeeley2073 jkeeley2073 added the claude-code Generated with Claude Code label May 4, 2026
@jkeeley2073
Copy link
Copy Markdown
Contributor Author

🟡 Operational sync still blocked by OPDB rate-limit; deferring to next session.

Post-push, attempted --source opdb --dry-run against deployed Cosmos at pinwiz-cosmos-dev-hlpz4. Hit the new 120s TotalRequestTimeout on Polly's retry-with-backoff chain. Direct probe of https://opdb.org/api/export confirms OPDB is returning HTTP 429 in 400 ms — a hard cooldown, not a transient delay. The validation probes done while diagnosing this PR's bug appear to have triggered a multi-hour throttle window.

The code fix in this PR remains correct — verified by:

  • 535 / 535 unit tests passing (including the new pinning tests against /api/export)
  • Standalone probe earlier in the session: 200, 2.4 MB, 2,359 records with the expected DTO field shape

Operational close deferred to a future session once OPDB's rate-limit window clears (likely tomorrow or after several hours of no traffic). Same playbook as PR #75's games scrape; will post a confirmation comment with the actual machine count when it runs cleanly.

This is the canonical "polite-by-construction" lesson: aggressive validation probes against a rate-limited API trade short-term diagnostic speed for medium-term operational lockout. For OPDB specifically, the per-source politeness override in data/seeds/ingestion_sources.v1.json is null — worth setting an explicit longer requestDelayMs for OPDB given the export endpoint's apparent sensitivity. Tracked as a follow-up.

Item 4 hand-off (Phase 2 § Scope) live-validation against
https://opdb.org/api/ surfaced a regression in the original OPDB
integration (PR `d9face6`): the live API returns 404 on
/api/machines?page=...&page_size=... — the endpoint does not exist.
The PR `d9face6` unit tests pinned a `StubHandler` contract that the
real API never honored. Same failure pattern as PR #72 / Item 6
(PR #75): tests faithfully pinning a fiction.

Live-API probing confirmed:
- /api/export — 200, single 2.4 MB JSON array, ~2,360 machines (the
  full catalog; what we want for Phase 1's "full re-sync each run"
  semantics)
- /api/machines/{opdb_id} — 200 (single-machine lookup; unchanged)
- /api/changelog — 200 (incremental; reserved for Phase 4+)
- /api/machines?page=... — 404 (does not exist)

What ships:

- OpdbClient.StreamAllMachinesAsync rewritten as a single GET to
  /api/export, stream-parsed via JsonSerializer.DeserializeAsyncEnumerable
  so the JSON array is consumed element-by-element rather than
  fully buffered. Routes through SendPolitelyAsync (politeness gate
  unchanged). BuildPagedUrl helper deleted.
- OpdbOptions.PageSize property removed (only consumer was the
  removed paginated path; dead-config-grep clean).
- OpdbOptions.HttpTimeoutSeconds default bumped 60s → 120s. /api/export
  on a cold cache routinely takes 30s+; 120s gives reasonable headroom
  while still bounding hung calls.
- ServiceDefaults global standard-resilience defaults bumped:
  TotalRequestTimeout 30s → 120s, AttemptTimeout 10s → 50s,
  CircuitBreaker.SamplingDuration default → 120s (must be ≥ 2 ×
  AttemptTimeout per HttpStandardResilienceOptions validation). Per-client
  override on HttpStandardResilienceOptions named "OpdbClient-standard"
  was attempted and proved a no-op (the named-options key the standard
  handler uses when added via ConfigureHttpClientDefaults does not match
  the per-client name in the obvious way). Global bump is the simpler
  deterministic fix; benefits Stern Vue.js networkidle waits too (which
  routinely take 15–25s, well within the new budget).
- OpdbClientTests: replaces the paginated-shape tests with
  StreamAllMachinesAsync_HitsExportEndpoint_NotPaginatedMachines and
  StreamAllMachinesAsync_EmptyArray_YieldsNothing. The first asserts
  exactly one request is made AND that PathAndQuery equals
  "/api/export" — a future regression to pagination would either 404
  (no stub) or trip Single().
- OpdbSyncServiceTests: 7 SetResponseFor calls updated from
  /api/machines?page=1&page_size=100 to /api/export.
- decision-log.md DL-0003 captures the full history (PR `d9face6` →
  Item 4 hand-off → revert) and the "revisit when" trigger (OPDB ships
  a paginated machines endpoint, or Phase 4+ adopts changelog-based
  incremental sync).
- build-spec.md Phase 2 retrospective: Item 4 outcome added to the
  § Hand-off outcomes section; the "~12k machines" estimate corrected
  to "~2.4k" inline.
- http-resilience-research.md gets a status note at the top: the
  research recommended per-client custom AddResilienceHandler pipelines
  but what shipped is bumped global AddStandardResilienceHandler
  defaults. Reconciliation is honest about the trade-off (FileDownloader's
  300s wall is now soft-capped at 120s; fine for Phase 1 PDFs <20MB).
- memory project_external_apis_and_politeness.md updated with the
  actual OPDB endpoints in use.

Live-validation status:
- Unit tests: 535 / 535 passing (was 534; +1 from new EmptyArray test).
- Build clean, zero warnings.
- /api/export endpoint shape was verified by direct probe (200, 2.4 MB,
  2,359 records, expected DTO field shape).
- Live --source opdb --dry-run against deployed Cosmos was BLOCKED at
  PR-creation time by an OPDB rate-limit window from validation probes
  (we hit /api/export multiple times in quick succession during the
  initial endpoint probing). The CODE fix is fully validated by unit
  tests + the standalone endpoint probe; the operational deployed-Cosmos
  sync runs cleanly post-merge once the rate-limit window clears
  (parallel to PR #75's games scrape, which was in flight at PR-creation
  time and confirmed in a follow-up comment).

Local review summary: 0 🔴 / 3 ⚠️ / 7 categories ✅. ⚠️ findings:

1. ServiceDefaults blast-radius (global change to fix one client's
   problem) — fixed by adding a status note to http-resilience-research.md
   so the architectural reasoning stays coherent.
2. Malformed-row resilience: DeserializeAsyncEnumerable aborts on first
   bad row → kills the whole 2,360-row sync. Not a regression vs. the
   old paginated code (same behavior), but worse than DL-0003's framing
   implies. Deferred as a follow-up: wrap inner DTO read in try/catch
   that logs+skips and increments OpdbSyncMalformed counter. Cost-of-
   deferring: low (OPDB hasn't shipped a malformed row historically).
3. Mid-stream cancellation test: not added; framework-honored via
   token threading. Noted in review, not blocking.

Tests: 535 / 535 passing (was 534; -2 paginated-shape tests, +3
new tests including EmptyArray + endpoint pinning).
Build clean, zero warnings.
@jkeeley2073 jkeeley2073 force-pushed the Dev-OpdbExportEndpoint branch from edc5683 to 607f518 Compare May 4, 2026 19:51
@jkeeley2073 jkeeley2073 merged commit aa1c17b into main May 4, 2026
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

claude-code Generated with Claude Code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant