Skip to content

refactor(stern) revert LinkRaw / BulletinRaw to positional records#72

Merged
jkeeley2073 merged 1 commit into
mainfrom
Dev-PlaywrightRecordsRevert
May 4, 2026
Merged

refactor(stern) revert LinkRaw / BulletinRaw to positional records#72
jkeeley2073 merged 1 commit into
mainfrom
Dev-PlaywrightRecordsRevert

Conversation

@jkeeley2073
Copy link
Copy Markdown
Contributor

Summary

Closes docs/build-spec.md Phase 2 § Scope item 6 (Wave 2).

The package bump itself already landed in PR #61 (commit 43c1f23, feat(deps) bump all NuGet packages to latest stable) — Microsoft.Playwright is at 1.59.0. The remaining work for this scope item was the records-workaround revert from PR #34: convert LinkRaw / BulletinRaw (class-with-init-properties workaround for Playwright 1.12.0) back to positional records, since Playwright 1.59.0 deserializes via System.Text.Json which supports records natively.

The build-spec entry has been corrected to reflect this reality.

Background

PR #34 (Oct 2025) converted these records to classes because Playwright 1.12.0 used Activator.CreateInstance(type) to materialize JS-evaluation results — that call fails on positional records (no parameterless ctor). Playwright 1.59.0 uses System.Text.Json instead, which supports positional records natively via the synthesized constructor.

What ships

  • LinkRaw and BulletinRaw revert from class-with-init-properties to positional records with [property: JsonPropertyName] attributes
  • Href is string? (not string): matches the System.Text.Json contract on missing fields (STJ assigns null, not empty string, when a property is absent). The downstream IsNullOrWhiteSpace guards already coalesce null and empty, so behavior is preserved; the type now honestly reflects the runtime nullability the deserializer enforces
  • Both records change from private to internal so the new SternPlaywrightRecordDeserializationTests in the test assembly can pin the contract
  • PinballWizard.Infrastructure.csproj adds [InternalsVisibleTo] for PinballWizard.Scraper.Tests with a comment explaining why (Stern Playwright scrapers have no automated integration tests per Phase 2 § Scope item 8 route ii; live-site validation is the only other check, and a JsonPropertyName typo would only surface as "0 results discovered" in production)
  • 6 new tests in SternPlaywrightRecordDeserializationTests pin: full-payload deserialization (both records), optional fields omitted → null, required field (Href) omitted → null (matching the downstream guards' contract)
  • docs/build-spec.md Phase 2 § Scope item 6 updated to reflect reality

Test Plan

  • dotnet test PinballWizard.slnx --nologo524 / 524 passing (was 518; +6 deserialization tests)
  • dotnet build PinballWizard.slnx --nologo → clean, zero warnings
  • The new tests deliberately exercise the same path Playwright invokes (System.Text.Json against the record types). A JsonPropertyName typo or breaking change to STJ's record handling fails these tests at build time, not at "0 results" against the live site

Local review

/local-review outcome: 0 🔴 / 2 ⚠️ / 8 categories ✅.

  • ⚠️ FixedHref: stringstring?: matches the System.Text.Json contract on missing fields. Downstream guards unchanged; type is now honest.
  • ⚠️ Fixed — added SternPlaywrightRecordDeserializationTests (6 cases) to pin the contract Playwright invokes.

Operational hand-off after merge

  • Run dotnet run --project src/PinballWizard.Cli -- --source games and --source bulletins against sternpinball.com and confirm the scrapers still produce non-zero ScrapedItem yields with full provenance. The new tests pin the JSON contract; the live-site validation pins the end-to-end Playwright integration.
  • If the live-site validation surfaces a regression, revert this commit and document why classes are still required (the comment in the file already references the historical PR fix(scrapers) wire JJP machine filter, harden Spooky, add pre-push self-audit #34 reasoning).

7-item self-audit

  1. Every option field is read — N/A (no options)
  2. Sibling-diffLinkRaw and BulletinRaw are now syntactically identical (positional records, [property: JsonPropertyName], string? for Href, comments cross-reference each other consistently)
  3. No bare catch { } — pre-existing bare catches in ClickTabAsync / TryClickLoadMoreAsync are NOT introduced by this PR (out of scope)
  4. CLI / orchestrator wiring — no CLI changes; tests pin the deserialization that the existing --source games / --source bulletins paths invoke
  5. Tests assert behavior — the 6 new tests deliberately use real System.Text.Json against real record types, with full / partial / missing-required payloads
  6. Build is zero-warning — verified
  7. Identity checkgit log -1 --format='%an <%ae>'Jim Keeley <94459922+jkeeley2073@users.noreply.github.com>

Out of Scope

🤖 Generated with Claude Code

Phase 2 § Scope item 6 (Wave 2). The package bump itself
(Microsoft.Playwright 1.12.0 → 1.59.0) already landed in PR #61;
this PR closes the scope item by completing the records-workaround
revert from PR #34.

Background: PR #34 converted LinkRaw / BulletinRaw / EditionRaw from
positional records to class-with-init-properties as a workaround for
Playwright 1.12.0's deserialization path, which used
Activator.CreateInstance(type) (no parameterless ctor on positional
records → throws). Playwright 1.59.0 deserializes via System.Text.Json,
which supports positional records natively. (EditionRaw was removed
in a separate refactor and is not part of this revert.)

What ships:

- LinkRaw and BulletinRaw revert from class-with-init-properties to
  positional records with [property: JsonPropertyName] attributes.
- The Href parameter is `string?` (not `string`) — this matches the
  System.Text.Json contract on missing fields (STJ assigns null,
  not empty string, when a property is absent in the JSON). The
  downstream IsNullOrWhiteSpace guards already coalesce null and
  empty, so behavior is preserved; the type now honestly reflects
  the runtime nullability the deserializer enforces.
- Both records change from `private` to `internal` so the new
  SternPlaywrightRecordDeserializationTests in the test assembly can
  pin the System.Text.Json deserialization contract Playwright invokes.
- Infrastructure.csproj adds [InternalsVisibleTo] for
  PinballWizard.Scraper.Tests, with a comment explaining why (to
  pin the deserialization contract — needed because Stern Playwright
  scrapers have no automated integration tests per Phase 2 § Scope
  item 8 route ii, and the only other validation surface is the
  live site).
- 6 new tests in SternPlaywrightRecordDeserializationTests pin:
  full-payload deserialization (LinkRaw, BulletinRaw); optional
  fields omitted → null; required field (Href) omitted → null,
  matching the downstream guards' contract. JsonPropertyName typos
  would otherwise only surface as "0 results discovered" against
  the live site.
- docs/build-spec.md Phase 2 § Scope item 6 updated to reflect
  reality: the package bump already landed in PR #61; this PR
  finishes the records-workaround revert.

Local review summary: 0 🔴 / 2 ⚠️ / 8 categories ✅.

⚠️ — Fixed:
- Tighten Href: string → string?: matches the System.Text.Json
  contract on missing fields. Downstream IsNullOrWhiteSpace guards
  already handle both null and empty; the type is now honest.
- Add System.Text.Json deserialization unit tests for LinkRaw and
  BulletinRaw: pins the contract Playwright invokes; catches
  JsonPropertyName typos that the build cannot.

Tests: 524 / 524 passing (was 518; +6 deserialization tests).
Build clean, zero warnings.

After merge — operational hand-off:
- Run `dotnet run --project src/PinballWizard.Cli -- --source games`
  and `--source bulletins` against sternpinball.com and confirm the
  scrapers still produce non-zero ScrapedItem yields with full
  provenance. The deserialization tests pin the JSON contract; the
  live-site validation pins the end-to-end Playwright integration.
- If the live-site validation surfaces a regression, revert this
  commit and document why classes are still required.
@jkeeley2073 jkeeley2073 added the claude-code Generated with Claude Code label May 4, 2026
@jkeeley2073 jkeeley2073 merged commit 76d3e8b into main May 4, 2026
3 of 4 checks passed
jkeeley2073 added a commit that referenced this pull request May 4, 2026
Item 4 hand-off (Phase 2 § Scope) live-validation against
https://opdb.org/api/ surfaced a regression in the original OPDB
integration (PR `d9face6`): the live API returns 404 on
/api/machines?page=...&page_size=... — the endpoint does not exist.
The PR `d9face6` unit tests pinned a `StubHandler` contract that the
real API never honored. Same failure pattern as PR #72 / Item 6
(PR #75): tests faithfully pinning a fiction.

Live-API probing confirmed:
- /api/export — 200, single 2.4 MB JSON array, ~2,360 machines (the
  full catalog; what we want for Phase 1's "full re-sync each run"
  semantics)
- /api/machines/{opdb_id} — 200 (single-machine lookup; unchanged)
- /api/changelog — 200 (incremental; reserved for Phase 4+)
- /api/machines?page=... — 404 (does not exist)

What ships:

- OpdbClient.StreamAllMachinesAsync rewritten as a single GET to
  /api/export, stream-parsed via JsonSerializer.DeserializeAsyncEnumerable
  so the JSON array is consumed element-by-element rather than
  fully buffered. Routes through SendPolitelyAsync (politeness gate
  unchanged). BuildPagedUrl helper deleted.
- OpdbOptions.PageSize property removed (only consumer was the
  removed paginated path; dead-config-grep clean).
- OpdbOptions.HttpTimeoutSeconds default bumped 60s → 120s. /api/export
  on a cold cache routinely takes 30s+; 120s gives reasonable headroom
  while still bounding hung calls.
- ServiceDefaults global standard-resilience defaults bumped:
  TotalRequestTimeout 30s → 120s, AttemptTimeout 10s → 50s,
  CircuitBreaker.SamplingDuration default → 120s (must be ≥ 2 ×
  AttemptTimeout per HttpStandardResilienceOptions validation). Per-client
  override on HttpStandardResilienceOptions named "OpdbClient-standard"
  was attempted and proved a no-op (the named-options key the standard
  handler uses when added via ConfigureHttpClientDefaults does not match
  the per-client name in the obvious way). Global bump is the simpler
  deterministic fix; benefits Stern Vue.js networkidle waits too (which
  routinely take 15–25s, well within the new budget).
- OpdbClientTests: replaces the paginated-shape tests with
  StreamAllMachinesAsync_HitsExportEndpoint_NotPaginatedMachines and
  StreamAllMachinesAsync_EmptyArray_YieldsNothing. The first asserts
  exactly one request is made AND that PathAndQuery equals
  "/api/export" — a future regression to pagination would either 404
  (no stub) or trip Single().
- OpdbSyncServiceTests: 7 SetResponseFor calls updated from
  /api/machines?page=1&page_size=100 to /api/export.
- decision-log.md DL-0003 captures the full history (PR `d9face6` →
  Item 4 hand-off → revert) and the "revisit when" trigger (OPDB ships
  a paginated machines endpoint, or Phase 4+ adopts changelog-based
  incremental sync).
- build-spec.md Phase 2 retrospective: Item 4 outcome added to the
  § Hand-off outcomes section; the "~12k machines" estimate corrected
  to "~2.4k" inline.
- http-resilience-research.md gets a status note at the top: the
  research recommended per-client custom AddResilienceHandler pipelines
  but what shipped is bumped global AddStandardResilienceHandler
  defaults. Reconciliation is honest about the trade-off (FileDownloader's
  300s wall is now soft-capped at 120s; fine for Phase 1 PDFs <20MB).
- memory project_external_apis_and_politeness.md updated with the
  actual OPDB endpoints in use.

Live-validation status:
- Unit tests: 535 / 535 passing (was 534; +1 from new EmptyArray test).
- Build clean, zero warnings.
- /api/export endpoint shape was verified by direct probe (200, 2.4 MB,
  2,359 records, expected DTO field shape).
- Live --source opdb --dry-run against deployed Cosmos was BLOCKED at
  PR-creation time by an OPDB rate-limit window from validation probes
  (we hit /api/export multiple times in quick succession during the
  initial endpoint probing). The CODE fix is fully validated by unit
  tests + the standalone endpoint probe; the operational deployed-Cosmos
  sync runs cleanly post-merge once the rate-limit window clears
  (parallel to PR #75's games scrape, which was in flight at PR-creation
  time and confirmed in a follow-up comment).

Local review summary: 0 🔴 / 3 ⚠️ / 7 categories ✅. ⚠️ findings:

1. ServiceDefaults blast-radius (global change to fix one client's
   problem) — fixed by adding a status note to http-resilience-research.md
   so the architectural reasoning stays coherent.
2. Malformed-row resilience: DeserializeAsyncEnumerable aborts on first
   bad row → kills the whole 2,360-row sync. Not a regression vs. the
   old paginated code (same behavior), but worse than DL-0003's framing
   implies. Deferred as a follow-up: wrap inner DTO read in try/catch
   that logs+skips and increments OpdbSyncMalformed counter. Cost-of-
   deferring: low (OPDB hasn't shipped a malformed row historically).
3. Mid-stream cancellation test: not added; framework-honored via
   token threading. Noted in review, not blocking.

Tests: 535 / 535 passing (was 534; -2 paginated-shape tests, +3
new tests including EmptyArray + endpoint pinning).
Build clean, zero warnings.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

claude-code Generated with Claude Code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant