feat(pinballbrothers) Pinball Brothers scraper — Phase 1.3.a#37
Merged
Conversation
Fifth manufacturer scraper, fourth in the WordPress-REST-API family. Pinball Brothers runs WP + Visual Composer with /wp-json/wp/v2/pages fully open. Game-page filter: pages whose slug ends with the configured suffix (default `-pinball`) -- Queen, Alien, ABBA, Predator all follow this convention. Suffix stripped to derive a canonical slug (queen-pinball -> queen) used as GameRecord.Slug. PB's marketing pages have no firmware downloads or JSON-LD product data. Editions are buried in Visual Composer shortcodes that would need a dedicated parser. v1 ships a minimal GameRecord (title + canonical slug + page URL); the catalog spine comes from OPDB, same pattern as AP and Spooky. ScraperManufacturerKey adds PinballBrothers = "pinballbrothers" constant + game_pinballbrothers_* prefix dispatch -- matches OpdbMachineMapper.NormalizeManufacturerKey exactly so reconciled records land in the correct Cosmos partition. First scraper PR shipped through the new two-step pre-push audit from PR #36: * /local-review (Step 0): 0 🔴 / 3⚠️ -- one⚠️ fixed (log message drift); two deferred (test gap, cosmetic config order) * 7-item mechanical checklist (Step 1): all pass Tests: 288 -> 314 (+26). Build: zero warnings. DI smoke: clean. Politeness: WP REST + polite gate inherited from PoliteScraperBase. CLI: --source pinballbrothers.
Comment on lines
+94
to
+99
| catch (Exception ex) | ||
| { | ||
| Logger.LogWarning( | ||
| ex, "Pinball Brothers scraper: failed to extract page {Url}; skipping.", page.Link); | ||
| return null; | ||
| } |
Comment on lines
+112
to
+119
| foreach (var page in pages) | ||
| { | ||
| if (!string.IsNullOrEmpty(page.Slug) | ||
| && page.Slug.EndsWith(suffix, StringComparison.OrdinalIgnoreCase)) | ||
| { | ||
| result.Add(page); | ||
| } | ||
| } |
This was referenced May 3, 2026
jkeeley2073
added a commit
that referenced
this pull request
May 3, 2026
Seventh manufacturer scraper, third using JSON-LD product schema
(after JJP and BoF). WordPress + WooCommerce; discovery walks the
WP sitemap index and filters URLs to /store/p3-game-kits/multimorphic-game-kits/{slug}/
only -- Multimorphic-published P3 game kits, not the 13 third-party
kits sold through the same storefront. Third-party kits belong to
their originating studios per OPDB attribution; running them through
the reconciler with manufacturer=multimorphic would land them in the
wrong Cosmos partition (see ADR 0011).
Multimorphic JSON-LD ships BOTH flat offers[].price AND nested
offers[].priceSpecification (object, not array -- distinct from BoF),
and uses http://schema.org/... not https:// for availability.
MultimorphicProductExtractor handles every combination + @graph
wrapping -- the same code would work against any well-formed
WooCommerce-on-WordPress storefront.
ScraperManufacturerKey adds Multimorphic = "multimorphic" matching
OpdbMachineMapper.NormalizeManufacturerKey exactly so reconciled
records land in the correct Cosmos partition.
Pre-push self-audit:
* /local-review (Step 0): 0 🔴 / 1 ⚠️ -- family-wide test-infra
gap (no scraper-pipeline integration test exists for any of the
7 scrapers; would need shared IPolitenessGate + HttpMessageHandler
mocking infra to close across all of them); deferred to a focused
cross-cutting follow-up
* 7-item mechanical checklist (Step 1): all pass
Tests: 347 -> 375 (+28). Build: zero warnings. CLI: --source multimorphic.
Phase 1.3 status after this PR:
* Pinball Brothers (#37) -- shipped
* Barrels of Fun (#38) -- shipped
* Multimorphic -- this PR
* Chicago Gaming -- next candidate (custom CMS, AP-style template)
* Dutch Pinball -- DEFERRED with prejudice (robots.txt Disallow: /)
* Haggis -- DEFERRED, infrastructure outage (haggispinball.com.au
web server unreachable; recommend retry in 1-2 weeks)
jkeeley2073
added a commit
that referenced
this pull request
May 3, 2026
Seventh manufacturer scraper, third using JSON-LD product schema
(after JJP and BoF). WordPress + WooCommerce; discovery walks the
WP sitemap index and filters URLs to /store/p3-game-kits/multimorphic-game-kits/{slug}/
only -- Multimorphic-published P3 game kits, not the 13 third-party
kits sold through the same storefront. Third-party kits belong to
their originating studios per OPDB attribution; running them through
the reconciler with manufacturer=multimorphic would land them in the
wrong Cosmos partition (see ADR 0011).
Multimorphic JSON-LD ships BOTH flat offers[].price AND nested
offers[].priceSpecification (object, not array -- distinct from BoF),
and uses http://schema.org/... not https:// for availability.
MultimorphicProductExtractor handles every combination + @graph
wrapping -- the same code would work against any well-formed
WooCommerce-on-WordPress storefront.
ScraperManufacturerKey adds Multimorphic = "multimorphic" matching
OpdbMachineMapper.NormalizeManufacturerKey exactly so reconciled
records land in the correct Cosmos partition.
Pre-push self-audit:
* /local-review (Step 0): 0 🔴 / 1 ⚠️ -- family-wide test-infra
gap (no scraper-pipeline integration test exists for any of the
7 scrapers; would need shared IPolitenessGate + HttpMessageHandler
mocking infra to close across all of them); deferred to a focused
cross-cutting follow-up
* 7-item mechanical checklist (Step 1): all pass
Tests: 347 -> 375 (+28). Build: zero warnings. CLI: --source multimorphic.
Phase 1.3 status after this PR:
* Pinball Brothers (#37) -- shipped
* Barrels of Fun (#38) -- shipped
* Multimorphic -- this PR
* Chicago Gaming -- next candidate (custom CMS, AP-style template)
* Dutch Pinball -- DEFERRED with prejudice (robots.txt Disallow: /)
* Haggis -- DEFERRED, infrastructure outage (haggispinball.com.au
web server unreachable; recommend retry in 1-2 weeks)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fifth manufacturer scraper, fourth in the WordPress-REST-API family (after Spooky). Pinball Brothers runs WP + Visual Composer with
/wp-json/wp/v2/pagesfully open. Game-page filter: pages whose slug ends with the configured suffix (default-pinball) — Queen, Alien, ABBA, and Predator all follow this convention. The suffix is stripped to derive a canonical slug (queen-pinball→queen).PB's marketing pages have no firmware downloads or JSON-LD product data. Editions are buried in Visual Composer shortcodes (
[vc_tta_section title="Champions Edition"]…) that would need a dedicated parser. v1 ships a minimalGameRecord(title + canonical slug + page URL); the catalog spine comes from OPDB, same pattern as AP and Spooky. Edition extraction is a follow-up if/when the VC parser is worth writing.Catalog coverage
With Stern + JJP + AP + Spooky + Pinball Brothers, the project now covers ~98% of currently-shipping commercial pinball.
Reconciler integration
ScraperManufacturerKey.PinballBrothers = "pinballbrothers"matchesOpdbMachineMapper.NormalizeManufacturerKeyexactly, so records produced by this scraper land in the same Cosmos partition the OPDB sync wrote PB machines to. The reconciler from PR #35 will pick them up automatically once Cosmos is deployed.Pre-push self-audit (per PR #34 + PR #36)
Step 0 —
/local-review(qualitative)Local review: 0 🔴 / 3 ⚠️ / 7 categories ✅"aborting scrape"vs Spooky's"aborting Spooky scrape") — operational greppabilityFilterGamePageswith a slug equal to suffix-only (covered at the extractor stage instead, layered defense)appsettings.jsonPinballBrotherskey order differs fromSpookycosmeticallyStep 1 — Mechanical checklist
PinballBrothersOptionsproperty read by code (verified by grep —BaseUrl/PagesEndpointPath/PageSize/MaxPagesToFetch/GameSlugSuffixall hit)TryExtractpattern, same constructor null-check style, log shapes parallel (after fixingcatch { }insrc/PinballWizard.Infrastructure/Scraping/PinballBrothers/SourceAliasContractTestsstill passes —Name = \"Pinball Brothers\"matches the alias-map value[\"pinballbrothers\"] = \"Pinball Brothers\"FilterGamePages_KeepsOnlyPagesWithMatchingSuffixincludes 4 game + 3 non-game fixtures withAssert.DoesNotContain(p => p.Slug == \"about-us\")git log -1 --format='%an <%ae>'— personal noreplyTests
314 / 314 passing (was 288). +26:
PbWpPagesClientTests(8) — JSON deserialization round-trip, malformed body graceful handling, suffix filter (positive + negative + case-insensitive + empty-slug rejection), null-arg validationPbGamePageExtractorTests(16) — record build with canonical slug, suffix stripping for all four shipped games, HTML entity decoding, non-game slug rejection, suffix-only slug rejection, empty title/link rejection,StripSuffixcase-insensitivity, null-arg validationScraperManufacturerKeyTests(+1 row) —game_pinballbrothers_queen→pinballbrothersPhase 1.3 recon recap
Six manufacturers surveyed in parallel before this build:
Disallow: /in robots.txtOut of scope
robots.txt; would need a polite outreach email to dutchpinball.com before any code🤖 Generated with Claude Code