refactor(stern) revert LinkRaw / BulletinRaw to positional records#72
Merged
Conversation
Phase 2 § Scope item 6 (Wave 2). The package bump itself (Microsoft.Playwright 1.12.0 → 1.59.0) already landed in PR #61; this PR closes the scope item by completing the records-workaround revert from PR #34. Background: PR #34 converted LinkRaw / BulletinRaw / EditionRaw from positional records to class-with-init-properties as a workaround for Playwright 1.12.0's deserialization path, which used Activator.CreateInstance(type) (no parameterless ctor on positional records → throws). Playwright 1.59.0 deserializes via System.Text.Json, which supports positional records natively. (EditionRaw was removed in a separate refactor and is not part of this revert.) What ships: - LinkRaw and BulletinRaw revert from class-with-init-properties to positional records with [property: JsonPropertyName] attributes. - The Href parameter is `string?` (not `string`) — this matches the System.Text.Json contract on missing fields (STJ assigns null, not empty string, when a property is absent in the JSON). The downstream IsNullOrWhiteSpace guards already coalesce null and empty, so behavior is preserved; the type now honestly reflects the runtime nullability the deserializer enforces. - Both records change from `private` to `internal` so the new SternPlaywrightRecordDeserializationTests in the test assembly can pin the System.Text.Json deserialization contract Playwright invokes. - Infrastructure.csproj adds [InternalsVisibleTo] for PinballWizard.Scraper.Tests, with a comment explaining why (to pin the deserialization contract — needed because Stern Playwright scrapers have no automated integration tests per Phase 2 § Scope item 8 route ii, and the only other validation surface is the live site). - 6 new tests in SternPlaywrightRecordDeserializationTests pin: full-payload deserialization (LinkRaw, BulletinRaw); optional fields omitted → null; required field (Href) omitted → null, matching the downstream guards' contract. JsonPropertyName typos would otherwise only surface as "0 results discovered" against the live site. - docs/build-spec.md Phase 2 § Scope item 6 updated to reflect reality: the package bump already landed in PR #61; this PR finishes the records-workaround revert. Local review summary: 0 🔴 / 2⚠️ / 8 categories ✅.⚠️ — Fixed: - Tighten Href: string → string?: matches the System.Text.Json contract on missing fields. Downstream IsNullOrWhiteSpace guards already handle both null and empty; the type is now honest. - Add System.Text.Json deserialization unit tests for LinkRaw and BulletinRaw: pins the contract Playwright invokes; catches JsonPropertyName typos that the build cannot. Tests: 524 / 524 passing (was 518; +6 deserialization tests). Build clean, zero warnings. After merge — operational hand-off: - Run `dotnet run --project src/PinballWizard.Cli -- --source games` and `--source bulletins` against sternpinball.com and confirm the scrapers still produce non-zero ScrapedItem yields with full provenance. The deserialization tests pin the JSON contract; the live-site validation pins the end-to-end Playwright integration. - If the live-site validation surfaces a regression, revert this commit and document why classes are still required.
jkeeley2073
added a commit
that referenced
this pull request
May 4, 2026
Item 4 hand-off (Phase 2 § Scope) live-validation against https://opdb.org/api/ surfaced a regression in the original OPDB integration (PR `d9face6`): the live API returns 404 on /api/machines?page=...&page_size=... — the endpoint does not exist. The PR `d9face6` unit tests pinned a `StubHandler` contract that the real API never honored. Same failure pattern as PR #72 / Item 6 (PR #75): tests faithfully pinning a fiction. Live-API probing confirmed: - /api/export — 200, single 2.4 MB JSON array, ~2,360 machines (the full catalog; what we want for Phase 1's "full re-sync each run" semantics) - /api/machines/{opdb_id} — 200 (single-machine lookup; unchanged) - /api/changelog — 200 (incremental; reserved for Phase 4+) - /api/machines?page=... — 404 (does not exist) What ships: - OpdbClient.StreamAllMachinesAsync rewritten as a single GET to /api/export, stream-parsed via JsonSerializer.DeserializeAsyncEnumerable so the JSON array is consumed element-by-element rather than fully buffered. Routes through SendPolitelyAsync (politeness gate unchanged). BuildPagedUrl helper deleted. - OpdbOptions.PageSize property removed (only consumer was the removed paginated path; dead-config-grep clean). - OpdbOptions.HttpTimeoutSeconds default bumped 60s → 120s. /api/export on a cold cache routinely takes 30s+; 120s gives reasonable headroom while still bounding hung calls. - ServiceDefaults global standard-resilience defaults bumped: TotalRequestTimeout 30s → 120s, AttemptTimeout 10s → 50s, CircuitBreaker.SamplingDuration default → 120s (must be ≥ 2 × AttemptTimeout per HttpStandardResilienceOptions validation). Per-client override on HttpStandardResilienceOptions named "OpdbClient-standard" was attempted and proved a no-op (the named-options key the standard handler uses when added via ConfigureHttpClientDefaults does not match the per-client name in the obvious way). Global bump is the simpler deterministic fix; benefits Stern Vue.js networkidle waits too (which routinely take 15–25s, well within the new budget). - OpdbClientTests: replaces the paginated-shape tests with StreamAllMachinesAsync_HitsExportEndpoint_NotPaginatedMachines and StreamAllMachinesAsync_EmptyArray_YieldsNothing. The first asserts exactly one request is made AND that PathAndQuery equals "/api/export" — a future regression to pagination would either 404 (no stub) or trip Single(). - OpdbSyncServiceTests: 7 SetResponseFor calls updated from /api/machines?page=1&page_size=100 to /api/export. - decision-log.md DL-0003 captures the full history (PR `d9face6` → Item 4 hand-off → revert) and the "revisit when" trigger (OPDB ships a paginated machines endpoint, or Phase 4+ adopts changelog-based incremental sync). - build-spec.md Phase 2 retrospective: Item 4 outcome added to the § Hand-off outcomes section; the "~12k machines" estimate corrected to "~2.4k" inline. - http-resilience-research.md gets a status note at the top: the research recommended per-client custom AddResilienceHandler pipelines but what shipped is bumped global AddStandardResilienceHandler defaults. Reconciliation is honest about the trade-off (FileDownloader's 300s wall is now soft-capped at 120s; fine for Phase 1 PDFs <20MB). - memory project_external_apis_and_politeness.md updated with the actual OPDB endpoints in use. Live-validation status: - Unit tests: 535 / 535 passing (was 534; +1 from new EmptyArray test). - Build clean, zero warnings. - /api/export endpoint shape was verified by direct probe (200, 2.4 MB, 2,359 records, expected DTO field shape). - Live --source opdb --dry-run against deployed Cosmos was BLOCKED at PR-creation time by an OPDB rate-limit window from validation probes (we hit /api/export multiple times in quick succession during the initial endpoint probing). The CODE fix is fully validated by unit tests + the standalone endpoint probe; the operational deployed-Cosmos sync runs cleanly post-merge once the rate-limit window clears (parallel to PR #75's games scrape, which was in flight at PR-creation time and confirmed in a follow-up comment). Local review summary: 0 🔴 / 3⚠️ / 7 categories ✅.⚠️ findings: 1. ServiceDefaults blast-radius (global change to fix one client's problem) — fixed by adding a status note to http-resilience-research.md so the architectural reasoning stays coherent. 2. Malformed-row resilience: DeserializeAsyncEnumerable aborts on first bad row → kills the whole 2,360-row sync. Not a regression vs. the old paginated code (same behavior), but worse than DL-0003's framing implies. Deferred as a follow-up: wrap inner DTO read in try/catch that logs+skips and increments OpdbSyncMalformed counter. Cost-of- deferring: low (OPDB hasn't shipped a malformed row historically). 3. Mid-stream cancellation test: not added; framework-honored via token threading. Noted in review, not blocking. Tests: 535 / 535 passing (was 534; -2 paginated-shape tests, +3 new tests including EmptyArray + endpoint pinning). Build clean, zero warnings.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes
docs/build-spec.mdPhase 2 § Scope item 6 (Wave 2).The package bump itself already landed in PR #61 (commit
43c1f23,feat(deps) bump all NuGet packages to latest stable) —Microsoft.Playwrightis at 1.59.0. The remaining work for this scope item was the records-workaround revert from PR #34: convertLinkRaw/BulletinRaw(class-with-init-properties workaround for Playwright 1.12.0) back to positional records, since Playwright 1.59.0 deserializes via System.Text.Json which supports records natively.The build-spec entry has been corrected to reflect this reality.
Background
PR #34 (Oct 2025) converted these records to classes because Playwright 1.12.0 used
Activator.CreateInstance(type)to materialize JS-evaluation results — that call fails on positional records (no parameterless ctor). Playwright 1.59.0 usesSystem.Text.Jsoninstead, which supports positional records natively via the synthesized constructor.What ships
LinkRawandBulletinRawrevert from class-with-init-properties to positional records with[property: JsonPropertyName]attributesHrefisstring?(notstring): matches the System.Text.Json contract on missing fields (STJ assigns null, not empty string, when a property is absent). The downstreamIsNullOrWhiteSpaceguards already coalesce null and empty, so behavior is preserved; the type now honestly reflects the runtime nullability the deserializer enforcesprivatetointernalso the newSternPlaywrightRecordDeserializationTestsin the test assembly can pin the contractPinballWizard.Infrastructure.csprojadds[InternalsVisibleTo]forPinballWizard.Scraper.Testswith a comment explaining why (Stern Playwright scrapers have no automated integration tests per Phase 2 § Scope item 8 route ii; live-site validation is the only other check, and aJsonPropertyNametypo would only surface as "0 results discovered" in production)SternPlaywrightRecordDeserializationTestspin: full-payload deserialization (both records), optional fields omitted → null, required field (Href) omitted → null (matching the downstream guards' contract)docs/build-spec.mdPhase 2 § Scope item 6 updated to reflect realityTest Plan
dotnet test PinballWizard.slnx --nologo→ 524 / 524 passing (was 518; +6 deserialization tests)dotnet build PinballWizard.slnx --nologo→ clean, zero warningsJsonPropertyNametypo or breaking change to STJ's record handling fails these tests at build time, not at "0 results" against the live siteLocal review
/local-reviewoutcome: 0 🔴 / 2Href: string→string?: matches the System.Text.Json contract on missing fields. Downstream guards unchanged; type is now honest.SternPlaywrightRecordDeserializationTests(6 cases) to pin the contract Playwright invokes.Operational hand-off after merge
dotnet run --project src/PinballWizard.Cli -- --source gamesand--source bulletinsagainststernpinball.comand confirm the scrapers still produce non-zeroScrapedItemyields with full provenance. The new tests pin the JSON contract; the live-site validation pins the end-to-end Playwright integration.7-item self-audit
LinkRawandBulletinRaware now syntactically identical (positional records,[property: JsonPropertyName],string?for Href, comments cross-reference each other consistently)catch { }— pre-existing bare catches inClickTabAsync/TryClickLoadMoreAsyncare NOT introduced by this PR (out of scope)--source games/--source bulletinspaths invokegit log -1 --format='%an <%ae>'→Jim Keeley <94459922+jkeeley2073@users.noreply.github.com>✅Out of Scope
sternpinball.com)🤖 Generated with Claude Code