fix(opdb) use /api/export, not the non-existent paginated /api/machines#76
Conversation
|
🟡 Operational sync still blocked by OPDB rate-limit; deferring to next session. Post-push, attempted The code fix in this PR remains correct — verified by:
Operational close deferred to a future session once OPDB's rate-limit window clears (likely tomorrow or after several hours of no traffic). Same playbook as PR #75's games scrape; will post a confirmation comment with the actual machine count when it runs cleanly. This is the canonical "polite-by-construction" lesson: aggressive validation probes against a rate-limited API trade short-term diagnostic speed for medium-term operational lockout. For OPDB specifically, the per-source politeness override in |
Item 4 hand-off (Phase 2 § Scope) live-validation against https://opdb.org/api/ surfaced a regression in the original OPDB integration (PR `d9face6`): the live API returns 404 on /api/machines?page=...&page_size=... — the endpoint does not exist. The PR `d9face6` unit tests pinned a `StubHandler` contract that the real API never honored. Same failure pattern as PR #72 / Item 6 (PR #75): tests faithfully pinning a fiction. Live-API probing confirmed: - /api/export — 200, single 2.4 MB JSON array, ~2,360 machines (the full catalog; what we want for Phase 1's "full re-sync each run" semantics) - /api/machines/{opdb_id} — 200 (single-machine lookup; unchanged) - /api/changelog — 200 (incremental; reserved for Phase 4+) - /api/machines?page=... — 404 (does not exist) What ships: - OpdbClient.StreamAllMachinesAsync rewritten as a single GET to /api/export, stream-parsed via JsonSerializer.DeserializeAsyncEnumerable so the JSON array is consumed element-by-element rather than fully buffered. Routes through SendPolitelyAsync (politeness gate unchanged). BuildPagedUrl helper deleted. - OpdbOptions.PageSize property removed (only consumer was the removed paginated path; dead-config-grep clean). - OpdbOptions.HttpTimeoutSeconds default bumped 60s → 120s. /api/export on a cold cache routinely takes 30s+; 120s gives reasonable headroom while still bounding hung calls. - ServiceDefaults global standard-resilience defaults bumped: TotalRequestTimeout 30s → 120s, AttemptTimeout 10s → 50s, CircuitBreaker.SamplingDuration default → 120s (must be ≥ 2 × AttemptTimeout per HttpStandardResilienceOptions validation). Per-client override on HttpStandardResilienceOptions named "OpdbClient-standard" was attempted and proved a no-op (the named-options key the standard handler uses when added via ConfigureHttpClientDefaults does not match the per-client name in the obvious way). Global bump is the simpler deterministic fix; benefits Stern Vue.js networkidle waits too (which routinely take 15–25s, well within the new budget). - OpdbClientTests: replaces the paginated-shape tests with StreamAllMachinesAsync_HitsExportEndpoint_NotPaginatedMachines and StreamAllMachinesAsync_EmptyArray_YieldsNothing. The first asserts exactly one request is made AND that PathAndQuery equals "/api/export" — a future regression to pagination would either 404 (no stub) or trip Single(). - OpdbSyncServiceTests: 7 SetResponseFor calls updated from /api/machines?page=1&page_size=100 to /api/export. - decision-log.md DL-0003 captures the full history (PR `d9face6` → Item 4 hand-off → revert) and the "revisit when" trigger (OPDB ships a paginated machines endpoint, or Phase 4+ adopts changelog-based incremental sync). - build-spec.md Phase 2 retrospective: Item 4 outcome added to the § Hand-off outcomes section; the "~12k machines" estimate corrected to "~2.4k" inline. - http-resilience-research.md gets a status note at the top: the research recommended per-client custom AddResilienceHandler pipelines but what shipped is bumped global AddStandardResilienceHandler defaults. Reconciliation is honest about the trade-off (FileDownloader's 300s wall is now soft-capped at 120s; fine for Phase 1 PDFs <20MB). - memory project_external_apis_and_politeness.md updated with the actual OPDB endpoints in use. Live-validation status: - Unit tests: 535 / 535 passing (was 534; +1 from new EmptyArray test). - Build clean, zero warnings. - /api/export endpoint shape was verified by direct probe (200, 2.4 MB, 2,359 records, expected DTO field shape). - Live --source opdb --dry-run against deployed Cosmos was BLOCKED at PR-creation time by an OPDB rate-limit window from validation probes (we hit /api/export multiple times in quick succession during the initial endpoint probing). The CODE fix is fully validated by unit tests + the standalone endpoint probe; the operational deployed-Cosmos sync runs cleanly post-merge once the rate-limit window clears (parallel to PR #75's games scrape, which was in flight at PR-creation time and confirmed in a follow-up comment). Local review summary: 0 🔴 / 3⚠️ / 7 categories ✅.⚠️ findings: 1. ServiceDefaults blast-radius (global change to fix one client's problem) — fixed by adding a status note to http-resilience-research.md so the architectural reasoning stays coherent. 2. Malformed-row resilience: DeserializeAsyncEnumerable aborts on first bad row → kills the whole 2,360-row sync. Not a regression vs. the old paginated code (same behavior), but worse than DL-0003's framing implies. Deferred as a follow-up: wrap inner DTO read in try/catch that logs+skips and increments OpdbSyncMalformed counter. Cost-of- deferring: low (OPDB hasn't shipped a malformed row historically). 3. Mid-stream cancellation test: not added; framework-honored via token threading. Noted in review, not blocking. Tests: 535 / 535 passing (was 534; -2 paginated-shape tests, +3 new tests including EmptyArray + endpoint pinning). Build clean, zero warnings.
edc5683 to
607f518
Compare
Summary
d9face6. The original OPDB integration was built against/api/machines?page=...&page_size=...— an endpoint that does not exist on the live OPDB API (404). PRd9face6'sStubHandlerfaithfully pinned a contract that the real API never honored. Same failure pattern as PR refactor(stern) revert LinkRaw / BulletinRaw to positional records #72 / Item 6 (PR fix(stern) revert Playwright DTOs to classes — records broke live deserialization #75) — tests pinning a fiction./api/exportis the actual bulk-machines endpoint: 200, single 2.4 MB JSON array, ~2,360 machines (the build-spec's "~12k" estimate was off by ~5×)./api/machines/{opdb_id}(single-machine lookup) and/api/changelog(incremental) work;/api/machines?page=...404s.OpdbClient.StreamAllMachinesAsyncas a single GET to/api/export+JsonSerializer.DeserializeAsyncEnumerablefor streaming parse (no full materialization). Remove the now-deadOpdbOptions.PageSize(dead-config-grep clean). BumpOpdbOptions.HttpTimeoutSecondsdefault 60s → 120s.AddStandardResilienceHandlerinServiceDefaultswas configured with the Microsoft-default 30sTotalRequestTimeout/ 10sAttemptTimeout, which guillotines OPDB's bulk export response (cold cache can take 30s+). Bumped to 120s / 50s / 120sCircuitBreaker.SamplingDuration. A per-client override on"OpdbClient-standard"was tried and proved a no-op — the named-options key behavior ofConfigureHttpClientDefaultsdoesn't match the obvious convention; the global bump is the simpler deterministic fix and benefits Stern Vue.jsnetworkidlewaits too (15–25s typical).d9face6→ Item 4 hand-off → revert history with a "revisit when" trigger; build-spec Phase 2 § Hand-off outcomes appended with the Item 4 outcome and the corrected machine count;docs/http-resilience-research.mdgets a status note at the top reconciling the original "use per-client custom pipelines" recommendation against what actually shipped (bumped global defaults).Changes
src/PinballWizard.Infrastructure/Integrations/Opdb/OpdbClient.cs—StreamAllMachinesAsyncrewritten; xmldoc rewritten;BuildPagedUrlhelper deleted.src/PinballWizard.Infrastructure/Integrations/Opdb/ServiceCollectionExtensions.cs— comment updated to describe the inner-vs-outer relationship betweenHttpClient.Timeoutand the resilience pipeline.src/PinballWizard.Core/Configuration/OpdbOptions.cs—PageSizeremoved;HttpTimeoutSecondsdefault 60s → 120s; xmldoc rewritten.src/PinballWizard.ServiceDefaults/Extensions.cs—AddStandardResilienceHandlerconfigured with explicit timeout overrides (120s/50s/120s).tests/PinballWizard.Scraper.Tests/Integrations/Opdb/OpdbClientTests.cs— paginated tests replaced withHitsExportEndpoint_NotPaginatedMachines(asserts exactly one request to/api/export) +EmptyArray_YieldsNothing.PageSizeremoved from setup.tests/PinballWizard.Scraper.Tests/Integrations/Opdb/OpdbSyncServiceTests.cs— 9SetResponseForcalls updated/api/machines?page=1&page_size=100→/api/export.PageSizeremoved from setup.docs/decision-log.md— new DL-0003 entry.docs/build-spec.md— Phase 2 § Hand-off outcomes appended; "~12k" estimate corrected to "~2.4k" inline.docs/http-resilience-research.md— status note at the top.packages.lock.jsonfiles updated (CRLF normalization + checksum updates).Work Item
No Jira/ADO ticket — PinballWizard is a personal Earlybird showcase project (no work-item tracker integration per
feedback_personal_identity_only.md).Test Plan
dotnet build PinballWizard.slnx— clean, zero warningsdotnet test— 535 / 535 passing (was 534; -2 paginated-shape tests, +3 new tests)https://opdb.org/api/export— 200, 2.4 MB, 2,359 records, expected DTO field shape--source opdb --dry-runagainst deployed Cosmos succeeds — blocked at PR-creation time by an OPDB rate-limit window from validation probes; will run cleanly once the window clears (15–30 min). Same playbook as PR fix(stern) revert Playwright DTOs to classes — records broke live deserialization #75's games scrape.--source opdb(apply mode) against deployed Cosmos verifies ~2,360 machines upserted to themachinescontainer with full provenance.git log -1 --format='%an <%ae>': personal noreply identity confirmed/local-review: 0 🔴 / 3Notes for Reviewer
StubHandleris a derivative of that, not its source of truth. PRd9face6's tests passed againstSetResponseFor("/api/machines?page=1...")— the real API was never consulted at integration time. Item 4's deferred operational hand-off is what surfaced this — same dynamic as Item 6.HitsExportEndpoint_NotPaginatedMachinestest is the durable pin worth carrying forward: it stubs only/api/exportAND assertsAssert.Single(_handler.Requests)withPathAndQuery == "/api/export". A future regression to pagination either 404s (no stub) or trips theSingle()assertion — deterministic catch.services.AddOptions<HttpStandardResilienceOptions>("OpdbClient-standard").Configure(...)and it silently no-op'd — the standard handler fromConfigureHttpClientDefaultsdoesn't read options bound to that key. Documented in DL-0003 alternatives-considered. The global bump is fine; if a future client genuinely needs different bounds, the right move is migrating away fromConfigureHttpClientDefaultsto per-clientAddResilienceHandler(which is whatdocs/http-resilience-research.mdoriginally recommended). Not blocking this PR./local-reviewand worth tracking): (a) malformed-row resilience —DeserializeAsyncEnumerableaborts on first bad row, killing the whole 2,360-row sync; wrap inner DTO read in try/catch + skip + counter. (b) Mid-stream cancellation test — framework-honored via token threading but not explicitly tested.Generated with Claude Code