
feat(barrelsoffun) Barrels of Fun scraper — Phase 1.3.b #38

Merged
jkeeley2073 merged 1 commit into main from Dev-BarrelsOfFunScraper on May 3, 2026

@jkeeley2073 (Contributor)

Summary

Sixth manufacturer scraper, second to consume JSON-LD schema.org/Product (after JJP). Barrels of Fun sells through WooCommerce on a separate storefront domain — shop.kollectfun.com — while the marketing site www.barrelsoffun.com has no products. The scraper targets the storefront.

Discovery: the /product-category/machines/ HTML page is the canonical filter for what counts as a pinball machine, because the /product/* URL space also contains apparel, parts, and accessories. This is the same defence-in-depth pattern as JJP's collection-handle filter shipped in PR #34, applied to a different storefront stack.
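A minimal sketch of that filter in top-level C# (.NET 6 or later), assuming the anchor href values have already been pulled out of the category page by an HTML parser. FilterProductLinks and its parameters are hypothetical names for illustration, not the PR's API; the rules (same host, /product/ prefix, exactly one segment after it, query and fragment dropped, duplicates removed) follow the description above, and the sample hrefs are made up.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical sketch: reduce raw href values from the machines category page
// to canonical single-product URLs on the storefront host.
static List<Uri> FilterProductLinks(IEnumerable<string?> hrefs, Uri baseUri, string productPathPrefix)
{
    var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    var urls = new List<Uri>();

    foreach (var href in hrefs)
    {
        if (string.IsNullOrWhiteSpace(href)) continue;

        // Resolve relative hrefs against the category page URL.
        if (!Uri.TryCreate(baseUri, href, out var absolute)) continue;

        // Same storefront host only; links to other domains are dropped.
        if (!string.Equals(absolute.Host, baseUri.Host, StringComparison.OrdinalIgnoreCase)) continue;
        if (!absolute.AbsolutePath.StartsWith(productPathPrefix, StringComparison.OrdinalIgnoreCase)) continue;

        // Exactly one segment after the prefix: rejects "/product/" and "/product/x/reviews/".
        var afterPrefix = absolute.AbsolutePath[productPathPrefix.Length..].TrimEnd('/');
        if (afterPrefix.Length == 0 || afterPrefix.Contains('/')) continue;

        // Scheme + authority + path only, so "#reviews" and "?variant=1" variants collapse.
        var canonical = new Uri(absolute.GetLeftPart(UriPartial.Path));
        if (seen.Add(canonical.AbsoluteUri)) urls.Add(canonical);
    }

    return urls;
}

var categoryPage = new Uri("https://shop.kollectfun.com/product-category/machines/");
var links = FilterProductLinks(new string?[]
{
    "/product/example-machine/",
    "https://shop.kollectfun.com/product/example-machine/#reviews",
    "/product/example-machine/?variant=1",
    "/product/example-machine/reviews/",
    "/product-category/machines/",
    "/cart/",
    "https://example.com/product/other/",
    null,
}, categoryPage, "/product/");

foreach (var link in links) Console.WriteLine(link);
```

Working on plain strings keeps the filter testable without fetching a page. Note that /product/x and /product/x/ still canonicalise to different strings in this sketch.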

Per-product: the scraper parses the JSON-LD entry with @type: Product, which has two wrinkles:

  1. @graph wrapping. Yoast / RankMath SEO plugins wrap multiple schema entries in a @graph array; the Product entry can be anywhere inside.
  2. Two price shapes. WooCommerce nests the price under offers[].priceSpecification[].price (BoF's actual shape, verified during recon); Shopify puts it directly on offers[].price (JJP's shape). BofProductExtractor reads both, with the nested shape taking precedence, so the same extractor should work against other WooCommerce-on-WordPress storefronts without modification.
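Both wrinkles can be sketched with System.Text.Json. This is a hypothetical, simplified reader, not BofProductExtractor itself: it takes the first Product found in a plain object, a top-level array, or an @graph array, then reads the price from priceSpecification first (accepting either an array or a single object) and falls back to the flat offers[].price. The JSON payloads are illustrative, not captured BoF or JJP markup.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.Json;

// Hypothetical sketch of the two JSON-LD wrinkles; not the PR's BofProductExtractor.

// Treat a single value and an array of values uniformly.
static IEnumerable<JsonElement> AsItems(JsonElement e) =>
    e.ValueKind == JsonValueKind.Array ? (IEnumerable<JsonElement>)e.EnumerateArray() : new[] { e };

// Wrinkle 1: a plain object, a top-level array, or a Yoast/RankMath @graph wrapper.
static IEnumerable<JsonElement> Candidates(JsonElement root)
{
    if (root.ValueKind == JsonValueKind.Object &&
        root.TryGetProperty("@graph", out var graph) &&
        graph.ValueKind == JsonValueKind.Array)
    {
        return AsItems(graph);
    }
    return AsItems(root);
}

// "@type" may be a single string or an array of strings.
static bool IsProduct(JsonElement e)
{
    if (e.ValueKind != JsonValueKind.Object || !e.TryGetProperty("@type", out var type)) return false;
    foreach (var t in AsItems(type))
    {
        if (t.ValueKind == JsonValueKind.String && t.GetString() == "Product") return true;
    }
    return false;
}

// Prices arrive as JSON numbers or as numeric strings.
static decimal? ReadPrice(JsonElement value)
{
    if (value.ValueKind == JsonValueKind.Number && value.TryGetDecimal(out var number)) return number;
    if (value.ValueKind == JsonValueKind.String &&
        decimal.TryParse(value.GetString(), NumberStyles.Number, CultureInfo.InvariantCulture, out var parsed))
        return parsed;
    return null;
}

// Wrinkle 2: within each offer, nested priceSpecification wins; flat price is the fallback.
static decimal? GetPrice(JsonElement product)
{
    if (!product.TryGetProperty("offers", out var offers)) return null;
    foreach (var offer in AsItems(offers))
    {
        if (offer.ValueKind != JsonValueKind.Object) continue;

        if (offer.TryGetProperty("priceSpecification", out var specs))
        {
            foreach (var spec in AsItems(specs))
            {
                if (spec.ValueKind == JsonValueKind.Object &&
                    spec.TryGetProperty("price", out var nested) &&
                    ReadPrice(nested) is { } nestedPrice)
                    return nestedPrice;
            }
        }

        if (offer.TryGetProperty("price", out var flat) && ReadPrice(flat) is { } flatPrice)
            return flatPrice;
    }
    return null;
}

static decimal? FindProductPrice(string json)
{
    using var doc = JsonDocument.Parse(json);
    foreach (var candidate in Candidates(doc.RootElement))
    {
        if (IsProduct(candidate)) return GetPrice(candidate);
    }
    return null;
}

// Illustrative payloads only.
var wooGraph = @"{""@graph"":[{""@type"":""WebPage""},{""@type"":""Product"",""offers"":[{""priceSpecification"":[{""price"":""10600.00""}]}]}]}";
var shopifyFlat = @"[{""@type"":""Product"",""offers"":[{""price"":99.5}]}]";
var bothShapes = @"{""@type"":""Product"",""offers"":{""price"":""1"",""priceSpecification"":{""price"":""2""}}}";

Console.WriteLine(FindProductPrice(wooGraph));
Console.WriteLine(FindProductPrice(shopifyFlat));
Console.WriteLine(FindProductPrice(bothShapes));
```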

Catalog state

As of this PR, /product-category/machines/ returns one machine: Jim Henson's Labyrinth (1,100 units, $10,600). The category remains the right filter as the catalog grows; Bowie, Dune, and future titles will appear there as they ship.

Reconciler integration

ScraperManufacturerKey.BarrelsOfFun = "barrelsoffun" matches OpdbMachineMapper.NormalizeManufacturerKey exactly, so scraped records land in the same Cosmos partition the OPDB sync wrote BoF machines to. The reconciler from PR #35 will pick them up automatically once Cosmos is deployed.
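The exact-match requirement can be pinned down with a small contract check. NormalizeManufacturerKey below is a hypothetical stand-in (lower-case, keep letters and digits), not the real OpdbMachineMapper.NormalizeManufacturerKey, whose rules aren't shown here; what matters is that the scraper constant and the OPDB-derived key compare equal under ordinal comparison, since a near-miss such as "barrels-of-fun" would be a different partition key.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for OpdbMachineMapper.NormalizeManufacturerKey:
// lower-case the display name and keep only letters and digits.
static string NormalizeManufacturerKey(string displayName) =>
    new string(displayName.Where(char.IsLetterOrDigit).Select(char.ToLowerInvariant).ToArray());

// Mirrors the scraper-side constant ScraperManufacturerKey.BarrelsOfFun.
const string BarrelsOfFunKey = "barrelsoffun";

// A check like this fails loudly if either side drifts.
var opdbKey = NormalizeManufacturerKey("Barrels of Fun");
Console.WriteLine(string.Equals(opdbKey, BarrelsOfFunKey, StringComparison.Ordinal)
    ? "same partition key"
    : $"mismatch: {opdbKey} vs {BarrelsOfFunKey}");
```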

Pre-push self-audit (per PR #34 + PR #36)

Step 0 — /local-review (qualitative)

Local review: 0 🔴 / 1 ⚠️ / 9 categories ✅ — cleanest review yet.

| # | Severity | Finding | Action |
|---|---|---|---|
| 1 | ⚠️ | No full-pipeline integration test for BofProductScraper.ScrapeAsync (yield order, DiscoveryUrl/Context propagation, per-page failure isolation) | Deferred: sibling parity with JJP/AP, which have the same gap; extractor and category-client tests separately cover the relevant logic; not a new failure mode |
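For reference, the behaviour that the deferred test would pin down (yield order and per-page failure isolation) can be sketched as an async iterator. ScrapeAllAsync, FakeExtractAsync, and the delegate parameters are hypothetical, not the real BofProductScraper.ScrapeAsync signature. Two details worth noting: C# does not allow yield return inside a try block that has a catch clause, so the result is captured first and yielded afterwards; and the catch filter that lets OperationCanceledException propagate is a choice made in this sketch so that a cancelled run stops instead of logging one warning per remaining page.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical sketch: process product URLs in discovery order; a failing page is
// logged and skipped, and the remaining pages are still processed.
static async IAsyncEnumerable<string> ScrapeAllAsync(
    IEnumerable<Uri> productUrls,
    Func<Uri, Task<string?>> extractAsync,
    Action<string> logWarning)
{
    foreach (var url in productUrls)
    {
        string? extracted;
        try
        {
            extracted = await extractAsync(url);
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            logWarning($"failed to fetch / extract {url}; skipping ({ex.Message})");
            continue;
        }

        // yield return is not permitted inside try/catch, so yield after it.
        if (extracted is not null) yield return extracted;
    }
}

// Fake extractor: the "broken" page throws; the others return their slug.
static async Task<string?> FakeExtractAsync(Uri url)
{
    await Task.Yield();
    if (url.AbsolutePath.Contains("broken")) throw new InvalidOperationException("parse failure");
    return url.Segments[^1].TrimEnd('/');
}

var pages = new[]
{
    new Uri("https://shop.example/product/first/"),
    new Uri("https://shop.example/product/broken/"),
    new Uri("https://shop.example/product/third/"),
};
var warnings = new List<string>();
var records = new List<string>();

await foreach (var item in ScrapeAllAsync(pages, FakeExtractAsync, warnings.Add))
{
    records.Add(item);
}

Console.WriteLine(string.Join(", ", records));
```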

Step 1 — Mechanical checklist

  • Every BarrelsOfFunOptions property is read by code (verified by grep: BaseUrl, MachinesCategoryPath, and ProductPathPrefix all hit; no dead config)
  • Sibling-diff against JJP: identical constructor signatures, null-check pattern, TryExtractAsync shape, log message including the manufacturer name (the finding from the Pinball Brothers scraper PR, #37), DI registration pattern, and yield order
  • No bare catch { } in src/PinballWizard.Infrastructure/Scraping/BarrelsOfFun/
  • SourceAliasContractTests still passes: Name = "Barrels of Fun" matches the alias-map value
  • Tests assert behavior: the ParseProductLinks fixture mixes machine, category, cart, external, sub-page, fragment, and query links; Extract_NestedPriceSpecification_BuildsRecord and Extract_FlatPriceShape_AlsoWorks cover both real price shapes
  • Build is zero-warning
  • git log -1 --format='%an <%ae>' shows the personal noreply address

Tests

347 / 347 passing (was 314). +33:

  • BofCategoryClientTests (7) — canonical product URL filter; sub-page rejection; fragment + query dedup; relative href base resolution; null/blank-arg validation
  • BofProductExtractorTests (15) — nested priceSpecification shape (BoF actual); flat offers.price shape; @graph wrapping; og:title fallback; h1 fallback; missing-everything → null; malformed JSON-LD fall-through; non-Product type ignored; slug extraction edge cases; full availability normalization table; null-arg validation
  • ScraperManufacturerKeyTests (+1 row) — game_barrelsoffun_jim-hensons-labyrinth → barrelsoffun
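Two of the behaviours those tests exercise can be sketched as small pure functions. Both are hypothetical: SlugFromProductUrl assumes the slug is the last non-empty path segment of a /product/{slug}/ URL, and ManufacturerKeyFromGameId assumes IDs have the shape game_{manufacturerKey}_{slug}, as the single test row above suggests (manufacturer keys contain no underscores; slugs use hyphens). The example.com-style host is made up.

```csharp
#nullable enable
using System;

// Hypothetical: the slug is the last non-empty path segment, e.g. /product/{slug}/.
static string? SlugFromProductUrl(Uri productUrl)
{
    var path = productUrl.AbsolutePath.TrimEnd('/');
    var slug = path[(path.LastIndexOf('/') + 1)..];
    return slug.Length == 0 ? null : slug;
}

// Hypothetical: IDs shaped like game_{manufacturerKey}_{slug}.
static string? ManufacturerKeyFromGameId(string gameId)
{
    const string prefix = "game_";
    if (!gameId.StartsWith(prefix, StringComparison.Ordinal)) return null;
    var end = gameId.IndexOf('_', prefix.Length);
    return end > prefix.Length ? gameId[prefix.Length..end] : null;
}

Console.WriteLine(SlugFromProductUrl(new Uri("https://shop.example/product/jim-hensons-labyrinth/")));
Console.WriteLine(ManufacturerKeyFromGameId("game_barrelsoffun_jim-hensons-labyrinth"));
```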

Phase 1.3 status

| Mfr | Status |
|---|---|
| Pinball Brothers | ✅ shipped (#37) |
| Barrels of Fun | ✅ this PR |
| Multimorphic | ✅ next candidate (~100 P3 products, more filtering work) |
| Chicago Gaming | medium: custom CMS, AP-style template |
| Dutch Pinball | 🚫 deferred (robots.txt has Disallow: /); would need polite outreach |
| Haggis | recon agent hit an environment failure earlier; manual recon needed |

After this merges, catalog coverage is ~99% of currently shipping commercial pinball machines.

Out of scope

  • Full scraper-pipeline integration test (deferred per Step 0 review — same gap exists in JJP/AP and is not a new regression)
  • Multimorphic / Chicago Gaming / Haggis scrapers (separate PRs)
  • Dutch Pinball (blocked by robots.txt)

🤖 Generated with Claude Code

Sixth manufacturer scraper, second using JSON-LD product schema
(after JJP). BoF sells through WooCommerce on a separate storefront
(shop.kollectfun.com); the marketing site has no products.

Discovery: /product-category/machines/ HTML page -- canonical filter
for what counts as a pinball machine, since /product/* URL space
also contains apparel, parts, and accessories. Same defence-in-depth
pattern as JJP's collection-handle filter from PR #34.

Per-product: parses JSON-LD which (a) is sometimes wrapped in @graph
(Yoast/RankMath), (b) exposes price under EITHER WooCommerce nested
offers[].priceSpecification[].price OR flat Shopify offers[].price
shape. BofProductExtractor reads both -- same code works against
any WooCommerce-on-WordPress storefront without modification.

ScraperManufacturerKey adds BarrelsOfFun = "barrelsoffun" matching
OpdbMachineMapper.NormalizeManufacturerKey exactly so reconciled
records land in the correct Cosmos partition.

Pre-push self-audit:
  * /local-review (Step 0): 0 🔴 / 1 ⚠️ -- defer-with-justification
    (no full scraper-pipeline integration test; sibling parity with
    JJP, not a new gap)
  * 7-item mechanical checklist (Step 1): all pass

Tests: 314 -> 347 (+33). Build: zero warnings. CLI: --source barrelsoffun.
jkeeley2073 added the claude-code (Generated with Claude Code) label on May 3, 2026
jkeeley2073 merged commit 4a02cdd into main on May 3, 2026
3 checks passed

```csharp
    "outofstock" => "out_of_stock",
    "preorder" => "preorder",
    "discontinued" => "discontinued",
    _ => lastSegment?.ToLowerInvariant(),
```
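The fragment above is the tail of a switch expression that normalizes availability. A self-contained sketch of that idea, assuming the input is a schema.org availability value such as https://schema.org/OutOfStock: comparing only the last path segment makes the http:// and https:// forms map identically (a difference later noted for Multimorphic). NormalizeAvailability is a hypothetical name, and the "instock" => "in_stock" arm is an assumption; only the three explicit mappings and the lower-cased fallback come from the excerpt.

```csharp
#nullable enable
using System;

// Hypothetical sketch: map a schema.org availability value to a snake_case token.
// Only the last path segment is compared, so http:// and https:// forms both match.
static string? NormalizeAvailability(string? raw)
{
    if (string.IsNullOrWhiteSpace(raw)) return null;

    var trimmed = raw.Trim().TrimEnd('/');
    var lastSegment = trimmed[(trimmed.LastIndexOf('/') + 1)..];

    return lastSegment.ToLowerInvariant() switch
    {
        "instock" => "in_stock",          // assumed token; not shown in the PR excerpt
        "outofstock" => "out_of_stock",
        "preorder" => "preorder",
        "discontinued" => "discontinued",
        var other => other,               // unknown values pass through lower-cased
    };
}

foreach (var value in new[] { "https://schema.org/OutOfStock", "http://schema.org/PreOrder", "InStock", "https://schema.org/BackOrder" })
{
    Console.WriteLine($"{value} -> {NormalizeAvailability(value)}");
}
```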
Comment on lines +81 to +103

```csharp
foreach (var anchor in doc.QuerySelectorAll("a[href]"))
{
    var href = anchor.GetAttribute("href");
    if (string.IsNullOrWhiteSpace(href)) continue;
    if (!Uri.TryCreate(baseUri, href, out var absolute)) continue;

    if (!string.Equals(absolute.Host, baseUri.Host, StringComparison.OrdinalIgnoreCase)) continue;
    if (!absolute.AbsolutePath.StartsWith(productPathPrefix, StringComparison.OrdinalIgnoreCase)) continue;

    // Reject links to sub-pages of a product (e.g. /product/x/reviews) by
    // requiring exactly one path segment after the prefix.
    var afterPrefix = absolute.AbsolutePath[productPathPrefix.Length..].TrimEnd('/');
    if (afterPrefix.Length == 0) continue;
    if (afterPrefix.Contains('/', StringComparison.Ordinal)) continue;

    // Drop the fragment and query so two anchor variants of the same
    // product canonicalise to one URL in the result set.
    var canonical = new UriBuilder(absolute) { Fragment = "", Query = "" }.Uri;
    if (seen.Add(canonical.AbsoluteUri))
    {
        urls.Add(canonical);
    }
}
```
Comment on lines +130 to +165

```csharp
foreach (var script in doc.QuerySelectorAll("script[type='application/ld+json']"))
{
    var text = script.TextContent;
    if (string.IsNullOrWhiteSpace(text)) continue;

    JsonElement root;
    try
    {
        using var parsed = JsonDocument.Parse(text);
        root = parsed.RootElement.Clone();
    }
    catch (JsonException)
    {
        continue;
    }

    // WooCommerce sometimes wraps JSON-LD in @graph; sometimes not.
    if (root.ValueKind == JsonValueKind.Object && root.TryGetProperty("@graph", out var graph) && graph.ValueKind == JsonValueKind.Array)
    {
        foreach (var item in graph.EnumerateArray())
        {
            if (TryReadProduct(item) is { } prod) return prod;
        }
    }
    else if (root.ValueKind == JsonValueKind.Array)
    {
        foreach (var item in root.EnumerateArray())
        {
            if (TryReadProduct(item) is { } prod) return prod;
        }
    }
    else
    {
        if (TryReadProduct(root) is { } prod) return prod;
    }
}
```
Comment on lines +204 to +210

```csharp
foreach (var item in imageProp.EnumerateArray())
{
    if (item.ValueKind == JsonValueKind.String && item.GetString() is { Length: > 0 } s)
    {
        images.Add(s);
    }
}
```
Comment on lines +238 to +241

```csharp
if (offer.TryGetProperty("price", out var direct))
{
    if (FormatPrice(direct) is { } flat) return flat;
}
```
Comment on lines +97 to +102

```csharp
catch (Exception ex)
{
    Logger.LogWarning(
        ex, "Barrels of Fun scraper: failed to fetch / extract {Url}; skipping.", productUrl);
    return null;
}
```
jkeeley2073 added a commit that referenced this pull request May 3, 2026
Seventh manufacturer scraper, third using JSON-LD product schema
(after JJP and BoF). WordPress + WooCommerce; discovery walks the
WP sitemap index and filters URLs to /store/p3-game-kits/multimorphic-game-kits/{slug}/
only -- Multimorphic-published P3 game kits, not the 13 third-party
kits sold through the same storefront. Third-party kits belong to
their originating studios per OPDB attribution; running them through
the reconciler with manufacturer=multimorphic would land them in the
wrong Cosmos partition (see ADR 0011).

Multimorphic JSON-LD ships BOTH flat offers[].price AND nested
offers[].priceSpecification (object, not array -- distinct from BoF),
and uses http://schema.org/... not https:// for availability.
MultimorphicProductExtractor handles every combination + @graph
wrapping -- the same code would work against any well-formed
WooCommerce-on-WordPress storefront.

ScraperManufacturerKey adds Multimorphic = "multimorphic" matching
OpdbMachineMapper.NormalizeManufacturerKey exactly so reconciled
records land in the correct Cosmos partition.

Pre-push self-audit:
  * /local-review (Step 0): 0 🔴 / 1 ⚠️ -- family-wide test-infra
    gap (no scraper-pipeline integration test exists for any of the
    7 scrapers; would need shared IPolitenessGate + HttpMessageHandler
    mocking infra to close across all of them); deferred to a focused
    cross-cutting follow-up
  * 7-item mechanical checklist (Step 1): all pass

Tests: 347 -> 375 (+28). Build: zero warnings. CLI: --source multimorphic.

Phase 1.3 status after this PR:
  * Pinball Brothers (#37) -- shipped
  * Barrels of Fun (#38) -- shipped
  * Multimorphic -- this PR
  * Chicago Gaming -- next candidate (custom CMS, AP-style template)
  * Dutch Pinball -- DEFERRED with prejudice (robots.txt Disallow: /)
  * Haggis -- DEFERRED, infrastructure outage (haggispinball.com.au
    web server unreachable; recommend retry in 1-2 weeks)
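The Multimorphic discovery filter can be sketched with System.Xml.Linq over a single sitemap file. Everything here is hypothetical: FilterFirstPartyKitUrls is an illustrative name, the store.example host and the other-studio path in the sample are made up, and walking the sitemap index itself (whose sitemap/loc entries point at child sitemaps) is left out. Only URLs with exactly one segment after /store/p3-game-kits/multimorphic-game-kits/ survive, so third-party kits and the category page itself are dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical sketch: keep only first-party kit pages from one <urlset> sitemap.
static List<Uri> FilterFirstPartyKitUrls(string sitemapXml, string kitPathPrefix)
{
    XNamespace sm = "http://www.sitemaps.org/schemas/sitemap/0.9";
    var kits = new List<Uri>();

    foreach (var loc in XDocument.Parse(sitemapXml).Descendants(sm + "loc"))
    {
        if (!Uri.TryCreate(loc.Value.Trim(), UriKind.Absolute, out var uri)) continue;
        if (!uri.AbsolutePath.StartsWith(kitPathPrefix, StringComparison.OrdinalIgnoreCase)) continue;

        // Exactly one segment ({slug}) after the prefix.
        var rest = uri.AbsolutePath[kitPathPrefix.Length..].Trim('/');
        if (rest.Length == 0 || rest.Contains('/')) continue;

        kits.Add(uri);
    }
    return kits;
}

var sitemap = @"<urlset xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">
  <url><loc>https://store.example/store/p3-game-kits/multimorphic-game-kits/first-party-kit/</loc></url>
  <url><loc>https://store.example/store/p3-game-kits/other-studio-kits/third-party-kit/</loc></url>
  <url><loc>https://store.example/store/p3-game-kits/multimorphic-game-kits/</loc></url>
</urlset>";

var kitUrls = FilterFirstPartyKitUrls(sitemap, "/store/p3-game-kits/multimorphic-game-kits/");
foreach (var kit in kitUrls) Console.WriteLine(kit);
```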