feat(pinballbrothers) Pinball Brothers scraper — Phase 1.3.a #37

Merged: jkeeley2073 merged 1 commit into main from Dev-PinballBrothersScraper on May 2, 2026

@jkeeley2073
Contributor

Summary

Fifth manufacturer scraper, fourth in the WordPress-REST-API family (after Spooky). Pinball Brothers runs WP + Visual Composer with /wp-json/wp/v2/pages fully open. Game-page filter: pages whose slug ends with the configured suffix (default -pinball); Queen, Alien, ABBA, and Predator all follow this convention. The suffix is stripped to derive a canonical slug (queen-pinball → queen).
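The filter-then-strip step can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `SlugSuffixSketch` and its method names are hypothetical, lowercasing the result is an assumption, and the sketch folds the suffix-only rejection into one helper, whereas the PR handles that case at the extractor stage.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input: game slugs mixed with non-game, suffix-only, and empty slugs.
var slugs = new[] { "queen-pinball", "alien-pinball", "ABBA-Pinball", "about-us", "-pinball", "" };

foreach (var slug in SlugSuffixSketch.FilterGameSlugs(slugs, "-pinball"))
{
    Console.WriteLine($"{slug} -> {SlugSuffixSketch.StripSuffix(slug, "-pinball")}");
}

// Hypothetical helper; names are illustrative and not taken from the PR.
static class SlugSuffixSketch
{
    // Keeps only slugs that yield a non-null canonical slug.
    public static IEnumerable<string> FilterGameSlugs(IEnumerable<string> slugs, string suffix) =>
        slugs.Where(s => StripSuffix(s, suffix) is not null);

    // Returns the slug without its suffix, or null when the slug is empty,
    // no longer than the suffix, or does not end with it (case-insensitive).
    public static string? StripSuffix(string? slug, string suffix)
    {
        if (string.IsNullOrEmpty(slug)
            || slug.Length <= suffix.Length
            || !slug.EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
        {
            return null;
        }

        // Assumption: canonical slugs are lowercased.
        return slug[..^suffix.Length].ToLowerInvariant();
    }
}
```

Returning null rather than throwing keeps the helper usable as both a filter predicate and a transformer.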

PB's marketing pages have no firmware downloads or JSON-LD product data. Editions are buried in Visual Composer shortcodes ([vc_tta_section title="Champions Edition"]…) that would need a dedicated parser. v1 ships a minimal GameRecord (title + canonical slug + page URL); the catalog spine comes from OPDB, same pattern as AP and Spooky. Edition extraction is a follow-up if/when the VC parser is worth writing.
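For a sense of what the deferred edition parser might start from, here is a rough sketch that pulls `title` attributes out of `vc_tta_section` shortcodes with a regular expression. Everything in it is an assumption: the class name, the sample content, and the premise that the shortcodes arrive with straight double quotes. A real parser would likely need to handle nesting, encoded quotes, and other attribute styles.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical page content; the real markup may differ.
var content =
    "[vc_tta_tabs]" +
    "[vc_tta_section title=\"Champions Edition\" tab_id=\"a\"]...[/vc_tta_section]" +
    "[vc_tta_section tab_id=\"b\" title=\"Example Edition\"]...[/vc_tta_section]" +
    "[/vc_tta_tabs]";

foreach (var title in VcSectionTitlesSketch.Extract(content))
{
    Console.WriteLine(title);
}

// Hypothetical helper, not part of the PR.
static class VcSectionTitlesSketch
{
    // Matches an opening [vc_tta_section ...] tag and captures its title="..." value,
    // wherever the attribute appears inside the tag.
    private static readonly Regex SectionTitle = new(
        @"\[vc_tta_section\b[^\]]*?\btitle=""([^""]*)""",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static List<string> Extract(string content)
    {
        var titles = new List<string>();
        foreach (Match match in SectionTitle.Matches(content))
        {
            titles.Add(match.Groups[1].Value);
        }
        return titles;
    }
}
```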

Catalog coverage

With Stern + JJP + AP + Spooky + Pinball Brothers, the project now covers ~98% of currently-shipping commercial pinball.

Reconciler integration

ScraperManufacturerKey.PinballBrothers = "pinballbrothers" matches OpdbMachineMapper.NormalizeManufacturerKey exactly, so records produced by this scraper land in the same Cosmos partition the OPDB sync wrote PB machines to. The reconciler from PR #35 will pick them up automatically once Cosmos is deployed.

Pre-push self-audit (per PR #34 + PR #36)

Step 0 — /local-review (qualitative)

Local review: 0 🔴 / 3 ⚠️ / 7 categories ✅

| # | Severity | Finding | Action |
|---|----------|---------|--------|
| 1 | ⚠️ | Discovery-failure log message lacked manufacturer name ("aborting scrape" vs Spooky's "aborting Spooky scrape"); operational greppability | Fixed in commit |
| 2 | ⚠️ | No test for FilterGamePages with a slug equal to suffix-only (covered at the extractor stage instead, layered defense) | Deferred: already covered downstream; not a new failure mode |
| 3 | ⚠️ | appsettings.json PinballBrothers key order differs from Spooky cosmetically | Deferred: pure cosmetic, no behavior impact |

Step 1 — Mechanical checklist

  • Every PinballBrothersOptions property read by code (verified by grep — BaseUrl / PagesEndpointPath / PageSize / MaxPagesToFetch / GameSlugSuffix all hit)
  • Sibling-diff against Spooky: same DI shape, same TryExtract pattern, same constructor null-check style, log shapes parallel (after fixing ⚠️ #1)
  • No bare catch { } in src/PinballWizard.Infrastructure/Scraping/PinballBrothers/
  • SourceAliasContractTests still passes: Name = "Pinball Brothers" matches the alias-map value ["pinballbrothers"] = "Pinball Brothers"
  • Tests assert behavior: FilterGamePages_KeepsOnlyPagesWithMatchingSuffix includes 4 game + 3 non-game fixtures with Assert.DoesNotContain(p => p.Slug == "about-us")
  • Build is zero-warning
  • git log -1 --format='%an <%ae>' — personal noreply
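To illustrate how the five options named in the checklist might drive discovery, the sketch below builds the paged request URLs. The property names come from the checklist; the record name, the default values, and the placeholder host are assumptions. `per_page` and `page` are the standard WordPress REST API pagination parameters.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var options = new PbOptionsSketch();
foreach (var url in PagesUrlsSketch.Enumerate(options))
{
    Console.WriteLine(url);
}

// Hypothetical mirror of PinballBrothersOptions. Property names follow the PR;
// every default value here is an assumption for illustration only.
sealed record PbOptionsSketch
{
    public string BaseUrl { get; init; } = "https://example.invalid";
    public string PagesEndpointPath { get; init; } = "/wp-json/wp/v2/pages";
    public int PageSize { get; init; } = 100;
    public int MaxPagesToFetch { get; init; } = 3;
    public string GameSlugSuffix { get; init; } = "-pinball";
}

static class PagesUrlsSketch
{
    // Yields one URL per result page, capped at MaxPagesToFetch.
    public static IEnumerable<Uri> Enumerate(PbOptionsSketch o)
    {
        var root = o.BaseUrl.TrimEnd('/') + o.PagesEndpointPath;
        for (var page = 1; page <= o.MaxPagesToFetch; page++)
        {
            yield return new Uri($"{root}?per_page={o.PageSize}&page={page}");
        }
    }
}
```

A real client would probably stop early, for example when a response contains fewer than PageSize items or when the X-WP-TotalPages response header says there are no more pages; MaxPagesToFetch then acts only as a safety cap.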

Tests

314 / 314 passing (was 288). +26:

  • PbWpPagesClientTests (8) — JSON deserialization round-trip, malformed body graceful handling, suffix filter (positive + negative + case-insensitive + empty-slug rejection), null-arg validation
  • PbGamePageExtractorTests (16) — record build with canonical slug, suffix stripping for all four shipped games, HTML entity decoding, non-game slug rejection, suffix-only slug rejection, empty title/link rejection, StripSuffix case-insensitivity, null-arg validation
  • ScraperManufacturerKeyTests (+1 row): game_pinballbrothers_queen → pinballbrothers
  • All previously-passing tests untouched
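The prefix dispatch exercised by the new ScraperManufacturerKeyTests row can be pictured like this. It is a sketch under assumptions: the class and method names are hypothetical, and the `game_{manufacturer}_{slug}` id shape is inferred from the test row rather than confirmed by the source.

```csharp
#nullable enable
using System;

Console.WriteLine(GameIdPrefixSketch.ManufacturerKeyOf("game_pinballbrothers_queen") ?? "(none)");
Console.WriteLine(GameIdPrefixSketch.ManufacturerKeyOf("game_unknown_queen") ?? "(none)");

// Hypothetical stand-in for the dispatch in ScraperManufacturerKey.
static class GameIdPrefixSketch
{
    public const string PinballBrothers = "pinballbrothers";

    // Only the key confirmed by this PR is listed; the real class knows more.
    private static readonly string[] KnownKeys = { PinballBrothers };

    // Returns the manufacturer key whose "game_{key}_" prefix starts the id,
    // or null when no known prefix matches.
    public static string? ManufacturerKeyOf(string gameId)
    {
        foreach (var key in KnownKeys)
        {
            if (gameId.StartsWith($"game_{key}_", StringComparison.Ordinal))
            {
                return key;
            }
        }
        return null;
    }
}
```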

Phase 1.3 recon recap

Six manufacturers surveyed in parallel before this build:

| Mfr | Site shape | Verdict |
|-----|------------|---------|
| Pinball Brothers | WP REST API open | ✅ shipped (this PR) |
| Barrels of Fun | WP + WooCommerce + JSON-LD | next candidate |
| Multimorphic | WP + WooCommerce, ~100 P3 products | medium |
| Chicago Gaming | Custom CMS, sitemap | medium (different template) |
| Dutch Pinball | Static PHP, Disallow: / in robots.txt | 🚫 deferred: respect the disallow per polite-scraping memory |
| Haggis | recon agent had env failure | recon manually later |

Out of scope

  • Visual Composer shortcode parser for editions (defer until parser is worth the cost)
  • Other Phase 1.3 manufacturers (separate PRs)
  • Dutch Pinball: blocked by robots.txt; would need a polite outreach email to dutchpinball.com before any code
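The Dutch Pinball deferral hinges on a blanket `Disallow: /`. A minimal check for that condition might look like the sketch below. It is deliberately simplified and not the project's politeness gate: the class name is hypothetical, only the `*` user-agent group is considered, and `Allow` overrides are ignored.

```csharp
using System;

Console.WriteLine(RobotsSketch.DisallowsEverything("User-agent: *\nDisallow: /"));
Console.WriteLine(RobotsSketch.DisallowsEverything("User-agent: *\nDisallow: /private"));

// Hypothetical helper, not part of the PR.
static class RobotsSketch
{
    // True when a group that applies to "*" contains "Disallow: /".
    public static bool DisallowsEverything(string robotsTxt)
    {
        var applies = false;     // does the current group apply to "*"?
        var inAgentRun = false;  // was the previous directive a User-agent line?

        foreach (var raw in robotsTxt.Split('\n'))
        {
            var line = raw.Split('#')[0].Trim(); // drop comments and whitespace
            var colon = line.IndexOf(':');
            if (colon < 0) continue;

            var field = line[..colon].Trim().ToLowerInvariant();
            var value = line[(colon + 1)..].Trim();

            if (field == "user-agent")
            {
                // Consecutive User-agent lines share one group.
                var matches = value == "*";
                applies = inAgentRun ? applies || matches : matches;
                inAgentRun = true;
            }
            else
            {
                inAgentRun = false;
                if (field == "disallow" && applies && value == "/") return true;
            }
        }
        return false;
    }
}
```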

🤖 Generated with Claude Code

Fifth manufacturer scraper, fourth in the WordPress-REST-API family.
Pinball Brothers runs WP + Visual Composer with /wp-json/wp/v2/pages
fully open. Game-page filter: pages whose slug ends with the
configured suffix (default `-pinball`) -- Queen, Alien, ABBA, Predator
all follow this convention. Suffix stripped to derive a canonical
slug (queen-pinball -> queen) used as GameRecord.Slug.

PB's marketing pages have no firmware downloads or JSON-LD product
data. Editions are buried in Visual Composer shortcodes that would
need a dedicated parser. v1 ships a minimal GameRecord (title +
canonical slug + page URL); the catalog spine comes from OPDB, same
pattern as AP and Spooky.

ScraperManufacturerKey adds PinballBrothers = "pinballbrothers"
constant + game_pinballbrothers_* prefix dispatch -- matches
OpdbMachineMapper.NormalizeManufacturerKey exactly so reconciled
records land in the correct Cosmos partition.

First scraper PR shipped through the new two-step pre-push audit
from PR #36:
  * /local-review (Step 0): 0 🔴 / 3 ⚠️ -- one ⚠️ fixed (log message
    drift); two deferred (test gap, cosmetic config order)
  * 7-item mechanical checklist (Step 1): all pass

Tests: 288 -> 314 (+26). Build: zero warnings. DI smoke: clean.
Politeness: WP REST + polite gate inherited from PoliteScraperBase.
CLI: --source pinballbrothers.
@jkeeley2073 jkeeley2073 added the claude-code Generated with Claude Code label May 2, 2026
@jkeeley2073 jkeeley2073 merged commit dec771a into main May 2, 2026
3 checks passed
Comment on lines +94 to +99
    catch (Exception ex)
    {
        Logger.LogWarning(
            ex, "Pinball Brothers scraper: failed to extract page {Url}; skipping.", page.Link);
        return null;
    }
Comment on lines +112 to +119
    foreach (var page in pages)
    {
        if (!string.IsNullOrEmpty(page.Slug)
            && page.Slug.EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
        {
            result.Add(page);
        }
    }
jkeeley2073 added a commit that referenced this pull request May 3, 2026
Seventh manufacturer scraper, third using JSON-LD product schema
(after JJP and BoF). WordPress + WooCommerce; discovery walks the
WP sitemap index and filters URLs to /store/p3-game-kits/multimorphic-game-kits/{slug}/
only -- Multimorphic-published P3 game kits, not the 13 third-party
kits sold through the same storefront. Third-party kits belong to
their originating studios per OPDB attribution; running them through
the reconciler with manufacturer=multimorphic would land them in the
wrong Cosmos partition (see ADR 0011).

Multimorphic JSON-LD ships BOTH flat offers[].price AND nested
offers[].priceSpecification (object, not array -- distinct from BoF),
and uses http://schema.org/... not https:// for availability.
MultimorphicProductExtractor handles every combination + @graph
wrapping -- the same code would work against any well-formed
WooCommerce-on-WordPress storefront.

ScraperManufacturerKey adds Multimorphic = "multimorphic" matching
OpdbMachineMapper.NormalizeManufacturerKey exactly so reconciled
records land in the correct Cosmos partition.

Pre-push self-audit:
  * /local-review (Step 0): 0 🔴 / 1 ⚠️ -- family-wide test-infra
    gap (no scraper-pipeline integration test exists for any of the
    7 scrapers; would need shared IPolitenessGate + HttpMessageHandler
    mocking infra to close across all of them); deferred to a focused
    cross-cutting follow-up
  * 7-item mechanical checklist (Step 1): all pass

Tests: 347 -> 375 (+28). Build: zero warnings. CLI: --source multimorphic.

Phase 1.3 status after this PR:
  * Pinball Brothers (#37) -- shipped
  * Barrels of Fun (#38) -- shipped
  * Multimorphic -- this PR
  * Chicago Gaming -- next candidate (custom CMS, AP-style template)
  * Dutch Pinball -- DEFERRED with prejudice (robots.txt Disallow: /)
  * Haggis -- DEFERRED, infrastructure outage (haggispinball.com.au
    web server unreachable; recommend retry in 1-2 weeks)
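The price and availability handling described in the commit message above (flat offers[].price, nested offers[].priceSpecification as object or array, http or https schema.org URLs) could be approximated as follows with System.Text.Json. This is a sketch, not MultimorphicProductExtractor: the class name, the sample JSON, and the price value are all hypothetical.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Hypothetical offer: nested priceSpecification object and an http:// availability URL.
using var doc = JsonDocument.Parse(
    @"{""priceSpecification"":{""price"":""1234.00""},""availability"":""http://schema.org/InStock""}");

var (price, inStock) = OfferSketch.Read(doc.RootElement);
Console.WriteLine($"{price} / {inStock}");

// Hypothetical helper, not part of the referenced commit.
static class OfferSketch
{
    public static (string? Price, bool? InStock) Read(JsonElement offer)
    {
        // Prefer the flat price; fall back to priceSpecification (object or first array item).
        var price = ReadScalar(offer, "price");
        if (price is null
            && offer.ValueKind == JsonValueKind.Object
            && offer.TryGetProperty("priceSpecification", out var spec))
        {
            if (spec.ValueKind == JsonValueKind.Array && spec.GetArrayLength() > 0)
            {
                spec = spec[0];
            }
            price = ReadScalar(spec, "price");
        }

        // Compare only the trailing term so http:// and https:// URLs behave the same.
        bool? inStock = null;
        var availability = ReadScalar(offer, "availability");
        if (availability is not null)
        {
            inStock = availability.EndsWith("/InStock", StringComparison.OrdinalIgnoreCase);
        }

        return (price, inStock);
    }

    // Reads a string or number property as text; null for anything else.
    private static string? ReadScalar(JsonElement obj, string name)
    {
        if (obj.ValueKind != JsonValueKind.Object || !obj.TryGetProperty(name, out var p))
        {
            return null;
        }
        return p.ValueKind switch
        {
            JsonValueKind.String => p.GetString(),
            JsonValueKind.Number => p.GetRawText(),
            _ => null,
        };
    }
}
```

Keeping the price as text avoids committing to a numeric type or culture before the record is normalized downstream.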