feat(multimorphic) Multimorphic scraper — Phase 1.3.c#39
Merged
    "outofstock" => "out_of_stock",
    "preorder" => "preorder",
    "discontinued" => "discontinued",
    _ => lastSegment?.ToLowerInvariant(),
Comment on lines +133 to +167

    foreach (var script in doc.QuerySelectorAll("script[type='application/ld+json']"))
    {
        var text = script.TextContent;
        if (string.IsNullOrWhiteSpace(text)) continue;

        JsonElement root;
        try
        {
            using var parsed = JsonDocument.Parse(text);
            root = parsed.RootElement.Clone();
        }
        catch (JsonException)
        {
            continue;
        }

        if (root.ValueKind == JsonValueKind.Object && root.TryGetProperty("@graph", out var graph) && graph.ValueKind == JsonValueKind.Array)
        {
            foreach (var item in graph.EnumerateArray())
            {
                if (TryReadProduct(item) is { } prod) return prod;
            }
        }
        else if (root.ValueKind == JsonValueKind.Array)
        {
            foreach (var item in root.EnumerateArray())
            {
                if (TryReadProduct(item) is { } prod) return prod;
            }
        }
        else
        {
            if (TryReadProduct(root) is { } prod) return prod;
        }
    }
Comment on lines +151 to +154

    foreach (var item in graph.EnumerateArray())
    {
        if (TryReadProduct(item) is { } prod) return prod;
    }
Comment on lines +158 to +161

    foreach (var item in root.EnumerateArray())
    {
        if (TryReadProduct(item) is { } prod) return prod;
    }
Comment on lines +206 to +212

    foreach (var item in imageProp.EnumerateArray())
    {
        if (item.ValueKind == JsonValueKind.String && item.GetString() is { Length: > 0 } s)
        {
            images.Add(s);
        }
    }
Comment on lines +240 to +243

    if (offer.TryGetProperty("price", out var direct))
    {
        if (FormatPrice(direct) is { } flat) return flat;
    }
Comment on lines +99 to +104

    catch (Exception ex)
    {
        Logger.LogWarning(
            ex, "Multimorphic scraper: failed to fetch / extract {Url}; skipping.", productUrl);
        return null;
    }
Comment on lines +91 to +100

    foreach (var sitemap in doc.Descendants(SitemapNs + "sitemap"))
    {
        var loc = sitemap.Element(SitemapNs + "loc")?.Value;
        if (string.IsNullOrWhiteSpace(loc)) continue;
        if (loc.Contains("wp-sitemap-posts-product", StringComparison.OrdinalIgnoreCase)
            && Uri.TryCreate(loc, UriKind.Absolute, out var uri))
        {
            sitemaps.Add(uri);
        }
    }
Comment on lines +121 to +135

    foreach (var url in doc.Descendants(SitemapNs + "url"))
    {
        var loc = url.Element(SitemapNs + "loc")?.Value;
        if (string.IsNullOrWhiteSpace(loc)) continue;
        if (!Uri.TryCreate(loc, UriKind.Absolute, out var uri)) continue;

        var path = uri.AbsolutePath;
        if (!path.StartsWith(normalizedPrefix, StringComparison.OrdinalIgnoreCase)) continue;

        var afterPrefix = path[normalizedPrefix.Length..].TrimEnd('/');
        if (afterPrefix.Length == 0) continue;
        if (afterPrefix.Contains('/', StringComparison.Ordinal)) continue;

        urls.Add(uri);
    }
7 tasks
9c644a4 to 5cdf85e
This was referenced May 3, 2026
Seventh manufacturer scraper, third using JSON-LD product schema
(after JJP and BoF). WordPress + WooCommerce; discovery walks the
WP sitemap index and filters URLs to /store/p3-game-kits/multimorphic-game-kits/{slug}/
only -- Multimorphic-published P3 game kits, not the 13 third-party
kits sold through the same storefront. Third-party kits belong to
their originating studios per OPDB attribution; running them through
the reconciler with manufacturer=multimorphic would land them in the
wrong Cosmos partition (see ADR 0011).
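The path-prefix rule described above can be sketched in isolation. This is a minimal illustration, not the scraper's actual code: the class name `KitUrlFilter`, the method `IsFirstPartyKit`, and the `example.test` host are hypothetical, and only the prefix string is taken from the text.

```csharp
using System;

public static class KitUrlFilter
{
    // Prefix quoted in the commit message; everything else here is illustrative.
    public const string Prefix = "/store/p3-game-kits/multimorphic-game-kits/";

    // True only for URLs shaped like {Prefix}{slug}/ where the slug is a
    // single path segment. The category page itself and deeper sub-pages
    // are rejected.
    public static bool IsFirstPartyKit(Uri uri)
    {
        var path = uri.AbsolutePath;
        if (!path.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase)) return false;

        var slug = path[Prefix.Length..].TrimEnd('/');
        return slug.Length > 0 && !slug.Contains('/');
    }
}
```

With this sketch, a URL such as `https://example.test/store/p3-game-kits/multimorphic-game-kits/some-kit/` is accepted, while the bare category page, a sub-page under a kit, and any URL under a different store path are rejected.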
Multimorphic JSON-LD ships BOTH flat offers[].price AND nested
offers[].priceSpecification (object, not array -- distinct from BoF),
and uses http://schema.org/... not https:// for availability.
MultimorphicProductExtractor handles every combination + @graph
wrapping -- the same code would work against any well-formed
WooCommerce-on-WordPress storefront.
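As a rough sketch of what "every combination" could look like, the following reads a price from a flat `price` first, then from a nested `priceSpecification` that may be an object or an array, and normalizes availability regardless of the `http://` or `https://` prefix. The names `OfferReader`, `ReadPrice`, `Format`, and `NormalizeAvailability` are hypothetical and do not come from `MultimorphicProductExtractor`. Only the three availability mappings visible in the reviewed diff are reproduced.

```csharp
#nullable enable
using System;
using System.Globalization;
using System.Text.Json;

public static class OfferReader
{
    // Flat "price" wins; otherwise fall back to "priceSpecification",
    // accepting either a single object or an array of objects.
    public static string? ReadPrice(JsonElement offer)
    {
        if (offer.ValueKind != JsonValueKind.Object) return null;

        if (offer.TryGetProperty("price", out var flat) && Format(flat) is { } p) return p;
        if (!offer.TryGetProperty("priceSpecification", out var spec)) return null;

        if (spec.ValueKind == JsonValueKind.Object)
            return spec.TryGetProperty("price", out var nested) ? Format(nested) : null;

        if (spec.ValueKind == JsonValueKind.Array)
        {
            foreach (var item in spec.EnumerateArray())
            {
                if (item.ValueKind == JsonValueKind.Object
                    && item.TryGetProperty("price", out var n)
                    && Format(n) is { } q)
                    return q;
            }
        }
        return null;
    }

    // Accepts a JSON number or a non-empty string; anything else yields null.
    private static string? Format(JsonElement value) => value.ValueKind switch
    {
        JsonValueKind.Number when value.TryGetDecimal(out var d)
            => d.ToString(CultureInfo.InvariantCulture),
        JsonValueKind.String when value.GetString() is { Length: > 0 } s => s,
        _ => null,
    };

    // Keeps only the last URL segment, so the scheme (http or https) and a
    // bare token such as "OutOfStock" all normalize the same way.
    public static string? NormalizeAvailability(string? raw)
    {
        if (string.IsNullOrWhiteSpace(raw)) return null;

        var last = raw.Trim().TrimEnd('/');
        var idx = last.LastIndexOf('/');
        if (idx >= 0) last = last[(idx + 1)..];

        return last.ToLowerInvariant() switch
        {
            "outofstock" => "out_of_stock",
            "preorder" => "preorder",
            "discontinued" => "discontinued",
            var other => other,
        };
    }
}
```

Checking the flat price before the nested one mirrors the order suggested by the reviewed `offer.TryGetProperty("price", ...)` fragment. Whether the real extractor prefers flat over nested when both are present and disagree is not stated in the source.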
ScraperManufacturerKey adds Multimorphic = "multimorphic" matching
OpdbMachineMapper.NormalizeManufacturerKey exactly so reconciled
records land in the correct Cosmos partition.
Pre-push self-audit:
* /local-review (Step 0): 0 🔴 / 1 ⚠️ -- family-wide test-infra
gap (no scraper-pipeline integration test exists for any of the
7 scrapers; would need shared IPolitenessGate + HttpMessageHandler
mocking infra to close across all of them); deferred to a focused
cross-cutting follow-up
* 7-item mechanical checklist (Step 1): all pass
Tests: 347 -> 375 (+28). Build: zero warnings. CLI: --source multimorphic.
Phase 1.3 status after this PR:
* Pinball Brothers (#37) -- shipped
* Barrels of Fun (#38) -- shipped
* Multimorphic -- this PR
* Chicago Gaming -- next candidate (custom CMS, AP-style template)
* Dutch Pinball -- DEFERRED with prejudice (robots.txt Disallow: /)
* Haggis -- DEFERRED, infrastructure outage (haggispinball.com.au
web server unreachable; recommend retry in 1-2 weeks)
5cdf85e to 0f3dcea
This was referenced May 3, 2026
Summary
Seventh manufacturer scraper, third to consume JSON-LD schema.org/Product (after JJP and BoF). Multimorphic is WordPress + WooCommerce; discovery walks the WP sitemap index (/wp-sitemap.xml), then the product sub-sitemaps, and filters URLs to /store/p3-game-kits/multimorphic-game-kits/{slug}/ only: the 16 Multimorphic-published P3 game kits.

Why exclude third-party kits
Multimorphic's storefront sells both its own game kits and third-party kits (Drained, Princess Bride, Portal, Silver Falls, Flipper Foxtrot, etc.) for the P3 platform. Per ADR 0011, OPDB owns the catalog spine: it attributes those games to their originating studios, not Multimorphic. Running third-party kits through the reconciler with manufacturer = multimorphic would land them in the wrong Cosmos partition.

The path-prefix filter (/store/p3-game-kits/multimorphic-game-kits/ exclusively) draws the line cleanly. Code-level rationale lives in MultimorphicOptions.cs and MultimorphicProductScraper.cs, not just this PR description.

JSON-LD shape: third permutation handled
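The three JSON-LD scrapers differ in their offers / priceSpecification / price shapes: the two earlier sites use https://schema.org/... availability URLs, while Multimorphic ships flat price AND nested priceSpecification and uses http://schema.org/... for availability.

MultimorphicProductExtractor handles every combination: flat-only / nested-only / both, object-or-array priceSpecification, http-or-https availability, plus @graph wrapping.

Phase 1.3 status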
* Dutch Pinball: robots.txt says Disallow: / (complete crawl ban). We honor robots.txt; user explicitly confirmed the policy. Would require polite outreach to dutchpinball.com for explicit permission before any code.
* Haggis: haggispinball.com.au resolves to a real Australian IP but the web server is unreachable (connection timeouts on all ports). Different blocker than Dutch Pinball; retry in 1-2 weeks.

Pre-push self-audit (per PR #34 + PR #36)
Step 0: /local-review (qualitative)

Local review: 0 🔴 / 1 ⚠️ / 9 categories ✅, the same clean result as BoF.

The one ⚠️ is a test-infra gap: no scraper-pipeline integration test covers ScrapedItem.DiscoveryUrl / DiscoveryContext / Source.ScrapedFrom propagation, and closing it needs shared IPolitenessGate + HttpMessageHandler mocking infrastructure that doesn't exist yet. Better as a focused cross-cutting test-infra PR than a one-off addition here.

Step 1: Mechanical checklist
* Every MultimorphicOptions property read by code (verified by grep: BaseUrl / SitemapPath / MultimorphicGameKitsPathPrefix all hit, no dead config)
* MultimorphicSitemapClient mirrors JjpSitemapClient (index walk); MultimorphicProductExtractor mirrors BofProductExtractor (JSON-LD parser); MultimorphicProductScraper mirrors BofProductScraper (TryExtractAsync, log shapes including the manufacturer name per the PR #37 finding)
* No catch { } in src/PinballWizard.Infrastructure/Scraping/Multimorphic/
* SourceAliasContractTests still passes: Name = "Multimorphic" matches alias-map value
* git log -1 --format='%an <%ae>': personal noreply

Tests
375 / 375 passing (was 347). +28:
* MultimorphicSitemapClientTests (8): index parse rejects non-product sub-sitemaps; paginated sub-sitemaps walked; path-prefix filter rejects 3rd-party kits / circuit boards / accessories / apparel / the P3 platform; sub-page rejection; prefix-without-trailing-slash; null/blank-arg validation
* MultimorphicProductExtractorTests (12): real Multimorphic shape (flat + nested + http schema.org); nested-only path; @graph wrap; og:title fallback; no-title → null; malformed JSON-LD fall-through; slug extraction (positive + negative landmark multimorphic-game-kits); availability normalization across http / https / bare token; null-arg validation
* ScraperManufacturerKeyTests (+1 row): game_multimorphic_lexy-lightspeed-escape-from-earth → multimorphic

Out of scope
🤖 Generated with Claude Code