feat(barrelsoffun) Barrels of Fun scraper — Phase 1.3.b#38
Merged
Sixth manufacturer scraper, second using JSON-LD product schema (after JJP). BoF sells through WooCommerce on a separate storefront (shop.kollectfun.com); the marketing site has no products.

Discovery: /product-category/machines/ HTML page -- canonical filter for what counts as a pinball machine, since the /product/* URL space also contains apparel, parts, and accessories. Same defence-in-depth pattern as JJP's collection-handle filter from PR #34.

Per-product: parses JSON-LD which (a) is sometimes wrapped in @graph (Yoast/RankMath), (b) exposes price under EITHER the WooCommerce nested offers[].priceSpecification[].price shape OR the flat Shopify offers[].price shape. BofProductExtractor reads both -- same code works against any WooCommerce-on-WordPress storefront without modification.

ScraperManufacturerKey adds BarrelsOfFun = "barrelsoffun" matching OpdbMachineMapper.NormalizeManufacturerKey exactly so reconciled records land in the correct Cosmos partition.

Pre-push self-audit:
* /local-review (Step 0): 0 🔴 / 1 ⚠️ -- defer-with-justification (no full scraper-pipeline integration test; sibling parity with JJP, not a new gap)
* 7-item mechanical checklist (Step 1): all pass

Tests: 314 -> 347 (+33). Build: zero warnings. CLI: --source barrelsoffun.
"outofstock" => "out_of_stock",
"preorder" => "preorder",
"discontinued" => "discontinued",
_ => lastSegment?.ToLowerInvariant(),
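The excerpt above starts mid-switch, so the surrounding method is not visible. As a rough sketch of how such an availability normalizer might look end to end: the class name `AvailabilitySketch`, the method name `Normalize`, and the `"instock"` arm are assumptions for illustration and are not taken from the PR. Taking the last path segment means the `http://schema.org/...` and `https://schema.org/...` forms are treated alike.

```csharp
#nullable enable
using System;

public static class AvailabilitySketch
{
    // Hypothetical helper: maps a schema.org availability value
    // (e.g. "https://schema.org/OutOfStock") to a normalized token.
    public static string? Normalize(string? availability)
    {
        if (string.IsNullOrWhiteSpace(availability)) return null;

        // Keep only the text after the last '/', so the URL scheme
        // (http vs https) and host do not affect the result.
        var trimmed = availability.Trim().TrimEnd('/');
        var lastSegment = trimmed[(trimmed.LastIndexOf('/') + 1)..];
        if (lastSegment.Length == 0) return null;

        return lastSegment.ToLowerInvariant() switch
        {
            "instock" => "in_stock",          // assumption: arm not visible in the excerpt
            "outofstock" => "out_of_stock",
            "preorder" => "preorder",
            "discontinued" => "discontinued",
            var other => other,               // unknown values pass through lower-cased
        };
    }
}
```

Unknown values fall through lower-cased rather than being dropped, which mirrors the default arm in the excerpt.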
Comment on lines +81 to +103
foreach (var anchor in doc.QuerySelectorAll("a[href]"))
{
    var href = anchor.GetAttribute("href");
    if (string.IsNullOrWhiteSpace(href)) continue;
    if (!Uri.TryCreate(baseUri, href, out var absolute)) continue;

    if (!string.Equals(absolute.Host, baseUri.Host, StringComparison.OrdinalIgnoreCase)) continue;
    if (!absolute.AbsolutePath.StartsWith(productPathPrefix, StringComparison.OrdinalIgnoreCase)) continue;

    // Reject links to sub-pages of a product (e.g. /product/x/reviews) by
    // requiring exactly one path segment after the prefix.
    var afterPrefix = absolute.AbsolutePath[productPathPrefix.Length..].TrimEnd('/');
    if (afterPrefix.Length == 0) continue;
    if (afterPrefix.Contains('/', StringComparison.Ordinal)) continue;

    // Drop the fragment and query so two anchor variants of the same
    // product canonicalise to one URL in the result set.
    var canonical = new UriBuilder(absolute) { Fragment = "", Query = "" }.Uri;
    if (seen.Add(canonical.AbsoluteUri))
    {
        urls.Add(canonical);
    }
}
```
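To try the filtering rules above without an HTML parser, the same checks can be run over a plain list of href strings. This is a minimal sketch, not the PR's `BofCategoryClient`: the names `ProductLinkSketch` and `Filter`, the default prefix `/product/`, and the sample slug used in the usage note are assumptions.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class ProductLinkSketch
{
    // Hypothetical stand-in for the anchor loop: takes raw href values
    // instead of DOM nodes and returns de-duplicated product URLs.
    public static List<Uri> Filter(
        Uri baseUri, IEnumerable<string?> hrefs, string productPathPrefix = "/product/")
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var urls = new List<Uri>();

        foreach (var href in hrefs)
        {
            if (string.IsNullOrWhiteSpace(href)) continue;
            if (!Uri.TryCreate(baseUri, href, out var absolute)) continue;

            // Same host only, and only paths under the product prefix.
            if (!string.Equals(absolute.Host, baseUri.Host, StringComparison.OrdinalIgnoreCase)) continue;
            if (!absolute.AbsolutePath.StartsWith(productPathPrefix, StringComparison.OrdinalIgnoreCase)) continue;

            // Exactly one path segment after the prefix (no sub-pages).
            var afterPrefix = absolute.AbsolutePath[productPathPrefix.Length..].TrimEnd('/');
            if (afterPrefix.Length == 0 || afterPrefix.Contains('/')) continue;

            // Strip query and fragment before de-duplicating.
            var canonical = new UriBuilder(absolute) { Fragment = "", Query = "" }.Uri;
            if (seen.Add(canonical.AbsoluteUri)) urls.Add(canonical);
        }

        return urls;
    }
}
```

Called with a base of `https://shop.kollectfun.com/product-category/machines/` and hrefs such as `/product/labyrinth/`, `/product/labyrinth/?ref=1#top`, `/product/labyrinth/reviews`, and `/cart/` (all hypothetical), only the first two survive the filters and they collapse into a single canonical URL.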
Comment on lines +130 to +165
foreach (var script in doc.QuerySelectorAll("script[type='application/ld+json']"))
{
    var text = script.TextContent;
    if (string.IsNullOrWhiteSpace(text)) continue;

    JsonElement root;
    try
    {
        using var parsed = JsonDocument.Parse(text);
        root = parsed.RootElement.Clone();
    }
    catch (JsonException)
    {
        continue;
    }

    // WooCommerce sometimes wraps JSON-LD in @graph; sometimes not.
    if (root.ValueKind == JsonValueKind.Object && root.TryGetProperty("@graph", out var graph) && graph.ValueKind == JsonValueKind.Array)
    {
        foreach (var item in graph.EnumerateArray())
        {
            if (TryReadProduct(item) is { } prod) return prod;
        }
    }
    else if (root.ValueKind == JsonValueKind.Array)
    {
        foreach (var item in root.EnumerateArray())
        {
            if (TryReadProduct(item) is { } prod) return prod;
        }
    }
    else
    {
        if (TryReadProduct(root) is { } prod) return prod;
    }
}
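The three branches above (`@graph` wrapper, bare array, single object) all end in the same "try each candidate" loop. One possible way to express that, offered only as a sketch and not as the PR's implementation, is to flatten the payload into a sequence of candidate nodes first. The names `JsonLdSketch`, `Candidates`, `IsProduct`, and `FirstProductName` are hypothetical, and reading just the `name` property stands in for the PR's `TryReadProduct`, whose body is not shown.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class JsonLdSketch
{
    // Yields the nodes to inspect, whichever of the three shapes the payload uses.
    public static IEnumerable<JsonElement> Candidates(JsonElement root)
    {
        if (root.ValueKind == JsonValueKind.Object
            && root.TryGetProperty("@graph", out var graph)
            && graph.ValueKind == JsonValueKind.Array)
        {
            foreach (var item in graph.EnumerateArray()) yield return item;
        }
        else if (root.ValueKind == JsonValueKind.Array)
        {
            foreach (var item in root.EnumerateArray()) yield return item;
        }
        else
        {
            yield return root;
        }
    }

    // True when @type is "Product", either as a string or inside an array of types.
    public static bool IsProduct(JsonElement node)
    {
        if (node.ValueKind != JsonValueKind.Object) return false;
        if (!node.TryGetProperty("@type", out var type)) return false;
        if (type.ValueKind == JsonValueKind.String) return type.GetString() == "Product";
        if (type.ValueKind != JsonValueKind.Array) return false;
        foreach (var t in type.EnumerateArray())
        {
            if (t.ValueKind == JsonValueKind.String && t.GetString() == "Product") return true;
        }
        return false;
    }

    // Returns the name of the first Product node, or null (also for malformed JSON).
    public static string? FirstProductName(string jsonLd)
    {
        JsonElement root;
        try
        {
            using var parsed = JsonDocument.Parse(jsonLd);
            root = parsed.RootElement.Clone(); // Clone so the element outlives the document
        }
        catch (JsonException)
        {
            return null;
        }

        foreach (var node in Candidates(root))
        {
            if (IsProduct(node)
                && node.TryGetProperty("name", out var name)
                && name.ValueKind == JsonValueKind.String)
            {
                return name.GetString();
            }
        }
        return null;
    }
}
```

The `ValueKind` check before each `TryGetProperty` call matters: `JsonElement.TryGetProperty` throws `InvalidOperationException` when the element is not an object.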
Comment on lines +149 to +152
foreach (var item in graph.EnumerateArray())
{
    if (TryReadProduct(item) is { } prod) return prod;
}
Comment on lines +156 to +159
foreach (var item in root.EnumerateArray())
{
    if (TryReadProduct(item) is { } prod) return prod;
}
Comment on lines +204 to +210
foreach (var item in imageProp.EnumerateArray())
{
    if (item.ValueKind == JsonValueKind.String && item.GetString() is { Length: > 0 } s)
    {
        images.Add(s);
    }
}
Comment on lines +238 to +241
if (offer.TryGetProperty("price", out var direct))
{
    if (FormatPrice(direct) is { } flat) return flat;
}
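The flat `offers[].price` read above is one half of the price lookup; the PR description says the nested `priceSpecification` shape takes precedence. A minimal sketch of that ordering follows, assuming hypothetical names (`PriceSketch`, `ReadPrice`, `ReadDirect`) and a simplified `FormatPrice`, since the PR's own helper body is not shown. It accepts `priceSpecification` as either an array or a single object; the referenced Multimorphic commit describes the single-object form.

```csharp
#nullable enable
using System;
using System.Text.Json;

public static class PriceSketch
{
    // Simplified stand-in for the PR's FormatPrice: accepts string or numeric prices.
    private static string? FormatPrice(JsonElement value) => value.ValueKind switch
    {
        JsonValueKind.String when !string.IsNullOrWhiteSpace(value.GetString())
            => value.GetString()!.Trim(),
        JsonValueKind.Number => value.GetRawText(), // keep the digits exactly as written
        _ => null,
    };

    // Reads a "price" property directly off an object node, if present.
    private static string? ReadDirect(JsonElement node) =>
        node.ValueKind == JsonValueKind.Object && node.TryGetProperty("price", out var p)
            ? FormatPrice(p)
            : null;

    // Nested priceSpecification first, then the flat offers[].price shape.
    public static string? ReadPrice(JsonElement offer)
    {
        if (offer.ValueKind != JsonValueKind.Object) return null;

        if (offer.TryGetProperty("priceSpecification", out var spec))
        {
            if (spec.ValueKind == JsonValueKind.Array)
            {
                foreach (var entry in spec.EnumerateArray())
                {
                    if (ReadDirect(entry) is { } fromArray) return fromArray;
                }
            }
            else if (ReadDirect(spec) is { } fromObject)
            {
                return fromObject;
            }
        }

        return ReadDirect(offer);
    }
}
```

Returning the price as a string avoids committing to a numeric type or culture at extraction time; whether the real extractor does the same is not visible in the excerpt.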
Comment on lines +97 to +102
catch (Exception ex)
{
    Logger.LogWarning(
        ex, "Barrels of Fun scraper: failed to fetch / extract {Url}; skipping.", productUrl);
    return null;
}
This was referenced May 3, 2026
jkeeley2073 added a commit that referenced this pull request May 3, 2026
Seventh manufacturer scraper, third using JSON-LD product schema
(after JJP and BoF). WordPress + WooCommerce; discovery walks the
WP sitemap index and filters URLs to /store/p3-game-kits/multimorphic-game-kits/{slug}/
only -- Multimorphic-published P3 game kits, not the 13 third-party
kits sold through the same storefront. Third-party kits belong to
their originating studios per OPDB attribution; running them through
the reconciler with manufacturer=multimorphic would land them in the
wrong Cosmos partition (see ADR 0011).
Multimorphic JSON-LD ships BOTH flat offers[].price AND nested
offers[].priceSpecification (object, not array -- distinct from BoF),
and uses http://schema.org/... not https:// for availability.
MultimorphicProductExtractor handles every combination + @graph
wrapping -- the same code would work against any well-formed
WooCommerce-on-WordPress storefront.
ScraperManufacturerKey adds Multimorphic = "multimorphic" matching
OpdbMachineMapper.NormalizeManufacturerKey exactly so reconciled
records land in the correct Cosmos partition.
Pre-push self-audit:
* /local-review (Step 0): 0 🔴 / 1 ⚠️ -- family-wide test-infra
gap (no scraper-pipeline integration test exists for any of the
7 scrapers; would need shared IPolitenessGate + HttpMessageHandler
mocking infra to close across all of them); deferred to a focused
cross-cutting follow-up
* 7-item mechanical checklist (Step 1): all pass
Tests: 347 -> 375 (+28). Build: zero warnings. CLI: --source multimorphic.
Phase 1.3 status after this PR:
* Pinball Brothers (#37) -- shipped
* Barrels of Fun (#38) -- shipped
* Multimorphic -- this PR
* Chicago Gaming -- next candidate (custom CMS, AP-style template)
* Dutch Pinball -- DEFERRED with prejudice (robots.txt Disallow: /)
* Haggis -- DEFERRED, infrastructure outage (haggispinball.com.au
web server unreachable; recommend retry in 1-2 weeks)
Summary

Sixth manufacturer scraper, second to consume JSON-LD schema.org/Product (after JJP). Barrels of Fun sells through WooCommerce on a separate storefront domain, shop.kollectfun.com, while the marketing site www.barrelsoffun.com has no products. The scraper targets the storefront.

Discovery: the /product-category/machines/ HTML page is the canonical filter for what counts as a pinball machine. The /product/* URL space also contains apparel, parts, and accessories, so this is the same defence-in-depth pattern as JJP's collection-handle filter shipped in PR #34, applied to a different storefront stack.

Per-product: parses JSON-LD @type: Product, which has two interesting wrinkles:
* @graph wrapping. Yoast / RankMath SEO plugins wrap multiple schema entries in a @graph array; the Product entry can be anywhere inside.
* Price shape. WooCommerce nests the price under offers[].priceSpecification[].price (BoF's actual shape, verified during recon). Shopify puts it directly on offers[].price (JJP's shape). BofProductExtractor reads both, with nested taking precedence; the same extractor would work against another WooCommerce-on-WordPress storefront without modification.

Catalog state

Today's /product-category/machines/ returns one machine: Jim Henson's Labyrinth (1,100 units, $10,600). The category is the right filter as the catalog grows: Bowie / Dune / future titles will appear there as they ship.

Reconciler integration

ScraperManufacturerKey.BarrelsOfFun = "barrelsoffun" matches OpdbMachineMapper.NormalizeManufacturerKey exactly, so records land in the same Cosmos partition the OPDB sync wrote BoF machines to. The reconciler from PR #35 will pick them up automatically once Cosmos deploys.

Pre-push self-audit (per PR #34 + PR #36)

Step 0: /local-review (qualitative)

Local review: 0 🔴 / 1 ⚠️ / 9 categories ✅, cleanest review yet.
* ⚠️ BofProductScraper.ScrapeAsync (yield order, DiscoveryUrl/Context propagation, per-page failure isolation): no full scraper-pipeline integration test; deferred with justification (sibling parity with JJP, not a new gap)

Step 1: Mechanical checklist

* BarrelsOfFunOptions property read by code (verified by grep: BaseUrl / MachinesCategoryPath / ProductPathPrefix all hit, no dead config)
* TryExtractAsync shape, log message including manufacturer name (the PR #37 finding), DI registration pattern, yield order
* catch { } in src/PinballWizard.Infrastructure/Scraping/BarrelsOfFun/
* SourceAliasContractTests still passes: Name = "Barrels of Fun" matches alias-map value
* ParseProductLinks fixture mixes machines + category + cart + external + sub-page + fragment + query; Extract_NestedPriceSpecification_BuildsRecord and Extract_FlatPriceShape_AlsoWorks cover both real shapes
* git log -1 --format='%an <%ae>': personal noreply

Tests

347 / 347 passing (was 314). +33:
* BofCategoryClientTests (7): canonical product URL filter; sub-page rejection; fragment + query dedup; relative href base resolution; null/blank-arg validation
* BofProductExtractorTests (15): nested priceSpecification shape (BoF actual); flat offers.price shape; @graph wrapping; og:title fallback; h1 fallback; missing-everything → null; malformed JSON-LD fall-through; non-Product type ignored; slug extraction edge cases; full availability normalization table; null-arg validation
* ScraperManufacturerKeyTests (+1 row): game_barrelsoffun_jim-hensons-labyrinth → barrelsoffun

Phase 1.3 status

Disallow: / in robots.txt; would need polite outreach

After this merges, catalog coverage is ~99% of currently-shipping commercial pinball.
Out of scope
🤖 Generated with Claude Code