test(infra) family-wide scraper-pipeline test infrastructure#41
Merged
Conversation
Closes the recurring⚠️ finding from /local-review runs across PRs #38 / #39 / #40: no scraper-pipeline integration test asserts yield order, provenance propagation, per-page failure isolation, or the polite-gate routing invariants. Adds two shared fakes plus a proof-of-concept against CgcGamePageScraper. * FakePolitenessGate: records every Acquire/Report and lease dispose so tests can assert the polite-scraping invariants end-to-end. Includes ThrowOnAcquire / ThrowOnReport for the abort paths. * QueueingHttpMessageHandler: pre-canned URL→response map; throws UnexpectedRequestException with the mapped-URL list on misses, so a regression fetching the wrong URL fails loudly with a diff-friendly error message instead of silently 404-ing. * CgcGamePageScraperTests (proof-of-concept): 5 tests covering happy path (yield order + provenance + URL-equality between gate and wire), per-page failure isolation with politeness invariants still holding under failure, discovery-failure yield-break, PolitenessException propagation on both Acquire and Report paths. Picked CGC for the POC because it exercises BOTH .Game and .Link yields — the template covers the dual-yield case so backfills to single-yield scrapers will be a strict subset. Pre-push self-audit: * /local-review (Step 0): 0 🔴 / 5⚠️ -- all five fixed in this PR rather than deferred, because the proof-of-concept becomes the template for ~7 future backfill PRs and template gaps multiply. Reviewer's exact recommendations applied: 1. URL-equality assertion between handler.Requests and gate.Acquired/Reported in the happy path 2. Politeness invariants pinned under per-page failure path 3. PolitenessException test asserts zero requests + zero reports on the abort 4. Provenance assertions on Source.ScrapedFrom / ScrapedAt / DiscoveredOn 5. New test for ThrowOnReport (was unused API surface) * 7-item mechanical checklist (Step 1): all pass Tests: 373 -> 378 (+5). Build: zero warnings. Backfill across the other 7 scrapers (Stern, JJP, AP, Spooky, PB, BoF, Multimorphic) is intentionally deferred -- follow-up PRs can land each independently using this template. The audit-flagged invariants are now codified once and reusable.
This was referenced May 3, 2026
jkeeley2073
added a commit
that referenced
this pull request
May 3, 2026
Backfill of the PR #41 template across the family. 5 tests covering happy-path with full provenance + gate-vs-wire URL equality, per-page failure isolation, discovery failure, PolitenessException on both Acquire and Report. Single-yield (.Game only) WooCommerce storefront via /product-category/machines/. Pre-push audit: 7-item mechanical (all pass). /local-review deferred to human reviewer at merge time.
jkeeley2073
added a commit
that referenced
this pull request
May 3, 2026
Backfill of the PR #41 template across the family. 5 tests covering happy-path with full provenance + gate-vs-wire URL equality, per-page failure isolation, discovery failure, PolitenessException on both Acquire and Report. Multi-yield (.Game + .Link) scraper so tests pin both kinds plus .Link.GameSlug lineage to parent game. Pre-push audit: 7-item mechanical (all pass). /local-review deferred to human reviewer at merge time.
jkeeley2073
added a commit
that referenced
this pull request
May 3, 2026
Backfill of the PR #41 template across the family. 5 tests covering happy-path with full provenance + gate-vs-wire URL equality, per-page failure isolation, discovery failure, PolitenessException on both Acquire and Report. Single-yield (.Game only) Shopify storefront via /collections/{slug}/products.json. Pre-push audit: 7-item mechanical (all pass). /local-review deferred to human reviewer at merge time.
jkeeley2073
added a commit
that referenced
this pull request
May 3, 2026
Backfill of the PR #41 template across the family. 5 tests covering happy-path with full provenance + gate-vs-wire URL equality, per-page failure isolation, sitemap discovery failure, PolitenessException on both Acquire and Report. Single-yield (.Game only) WordPress storefront via /wp-sitemap.xml walk filtered to /store/p3-game-kits/multimorphic-game-kits/. Pre-push audit: 7-item mechanical (all pass). /local-review deferred to human reviewer at merge time.
jkeeley2073
added a commit
that referenced
this pull request
May 3, 2026
Backfill of the PR #41 template across the family. 5 tests covering happy-path with full provenance + gate-vs-wire URL equality, per-page failure isolation, discovery failure, PolitenessException on both Acquire and Report. Multi-yield (.Game + .Link) scraper so tests pin both kinds plus .Link.GameSlug lineage to parent game. Pre-push audit: 7-item mechanical (all pass). /local-review deferred to human reviewer at merge time.
jkeeley2073
added a commit
that referenced
this pull request
May 3, 2026
Backfill of the PR #41 template across the family. 5 tests covering happy-path with full provenance + gate-vs-wire URL equality, per-link extraction failure isolation, discovery failure, PolitenessException on both Acquire and Report. Single-yield-link scraper (manuals are documents, not games) so the tests assert .Link yield order only with no .Game items. Pre-push audit: 7-item mechanical (all pass). /local-review deferred to human reviewer at merge time.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes the recurring⚠️ finding from
/local-reviewruns across PRs #38 / #39 / #40: "no scraper-pipeline integration test asserts yield order, provenance propagation, per-page failure isolation, or the polite-gate routing invariants."Two shared test fakes plus a proof-of-concept against
CgcGamePageScraper. The fakes are reusable across all 8 scrapers; backfilling each one is intentionally deferred to follow-up PRs (so each can land independently as scrapers are touched). What this PR establishes is the template.What's added
FakePolitenessGate— implementsIPolitenessGate, records every Acquire/Report/lease-dispose so tests can assert the polite-scraping invariants are honored end-to-end.ThrowOnAcquire/ThrowOnReportfor testing abort paths.QueueingHttpMessageHandler— maps absolute URLs to pre-canned responses. ThrowsUnexpectedRequestException(with the mapped-URL list embedded in the message) on unmapped requests so a regression that fetches the wrong URL fails loudly instead of silently 404-ing.CgcGamePageScraperTests(proof-of-concept) — 5 tests covering:PolitenessExceptionpropagation on the Acquire path with assertions that zero requests firedPolitenessExceptionpropagation on the Report pathPicked CGC for the proof-of-concept because it exercises BOTH
.Gameand.Linkyields — the template covers the dual-yield case, so backfills to single-yield scrapers (PB, JJP, BoF, Multimorphic, Spooky) will be a strict subset.Pre-push self-audit (per PR #34 + PR #36)
Step 0 —
/local-review0 🔴 / 5⚠️ / 4 ✅ — and all 5 ⚠️ fixed in this PR rather than deferred. The reviewer made a strong asymmetric-cost argument: this PR is the template for ~7 future backfill PRs, so template gaps multiply. Fixing once now is much cheaper than fixing 7 times later.
handler.Requestsandgate.Acquired/Reportedmissing — without it a future re-canonicalisation refactor could silently throttle a different originAssert.Equal(...)on URL sequences in the happy-path testgate+handler— politeness invariants under failure not pinnedPolitenessExceptiontest asserted propagation but not "zero requests fired" — a regression that swallowed the throw and continued would still pass for the first callAssert.Empty(handler.Requests)andAssert.Empty(gate.Reported)Source.ScrapedFrom,ScrapedAt,DiscoveredOn) not asserted — these are what survive intocatalog.jsonand Phase 2 RAG citations; provenance is "sacred" per CLAUDE.mdfirstGame.Source!.ScrapedFrometc.ThrowOnReportwas unused API surface ("I added a hook, never tested it")Step 1 — Mechanical checklist
*Optionsproperties (N/A)catch { }intests/PinballWizard.Scraper.Tests/Scraping/_TestInfra/SourceAliasContractTestsstill passesgit log -1 --format='%an <%ae>'— personal noreplyTests
378 / 378 passing (was 373). +5:
CgcGamePageScraperTests(5)Folder convention
Established new convention:
tests/.../Scraping/_TestInfra/for shared test fakes. Leading underscore sorts to top of directory listings — discoverability for the next test author. (Reviewer noted this is a new convention; one-line README in a follow-up could document it formally if it sticks.)Out of scope
CgcGamePageScraperTests. Single-yield scrapers (PB / JJP / BoF / Multimorphic / Spooky) drop the Link-yield assertions; multi-yield scrapers (AP / Stern's GamePageScraper) keep them._TestInfra/— defer until pattern proves out across 2-3 backfills.🤖 Generated with Claude Code