test(infra) JJP scraper-pipeline integration tests #47
Merged
Backfill of the PR #41 template across the family. 5 tests covering happy-path with full provenance + gate-vs-wire URL equality, per-page failure isolation, discovery failure, PolitenessException on both Acquire and Report. Single-yield (.Game only) Shopify storefront via /collections/{slug}/products.json. Pre-push audit: 7-item mechanical (all pass). /local-review deferred to human reviewer at merge time.
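The gate-vs-wire URL equality check mentioned above can be illustrated with a small stand-alone sketch. This is not the repository's QueueingHttpMessageHandler or FakePolitenessGate, whose APIs are not shown here; `RecordingQueueHandler`, `GateWireDemo`, the `gateAcquired` list, and the example.test URLs are all hypothetical stand-ins, assuming the real doubles record URLs in a comparable way.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a queueing handler: replays canned responses
// in order and records every URL that actually reaches the wire.
public sealed class RecordingQueueHandler : HttpMessageHandler
{
    private readonly Queue<HttpResponseMessage> _responses = new();

    public List<string> WireUrls { get; } = new();

    public void Enqueue(HttpStatusCode status, string body = "") =>
        _responses.Enqueue(new HttpResponseMessage(status) { Content = new StringContent(body) });

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // OriginalString keeps the URL exactly as the caller supplied it.
        WireUrls.Add(request.RequestUri!.OriginalString);
        return Task.FromResult(_responses.Dequeue());
    }
}

public static class GateWireDemo
{
    public static async Task<(List<string> Gate, List<string> Wire)> RunAsync()
    {
        var handler = new RecordingQueueHandler();
        handler.Enqueue(HttpStatusCode.OK, "{}");
        handler.Enqueue(HttpStatusCode.InternalServerError);

        using var client = new HttpClient(handler);

        // Assumption: a fake gate would record each URL it is asked to admit.
        var gateAcquired = new List<string>();

        foreach (var url in new[] { "https://example.test/a", "https://example.test/b" })
        {
            gateAcquired.Add(url); // "acquire" happens before the request is sent
            using var response = await client.GetAsync(url);
        }

        return (gateAcquired, handler.WireUrls);
    }
}
```

A test built this way would compare the two lists with an ordinal comparer, so any normalization applied between the gate and the wire (trailing slash, casing, re-encoding) shows up as a failure.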
Summary
Backfill of the PR #41 family-wide scraper-pipeline test-infra template against JjpProductScraper. JJP is a single-yield scraper (.Game only) on a Shopify storefront. Discovery is the canonical Shopify pattern: fetch /collections/{slug}/products.json for the pinball-machine handle set, walk the sitemap index for product sub-sitemaps, filter the listed product URLs against the handle set, then fetch each filtered product page and yield. The five tests use the shared FakePolitenessGate + QueueingHttpMessageHandler from PR #41 to pin behaviors the unit-test surface couldn't reach: yield order, provenance-field propagation onto ScrapedItem, SourceType.JjpProductPage and "JJP Product Page" discovery context, gate-vs-wire URL byte-equality, per-page failure isolation, discovery failure aborting only this source, and PolitenessException propagation on both Acquire and Report paths.

Touches only the new test file plus a single CHANGELOG bullet inserted directly after the family-wide test-infra entry. No production code changed; no JjpProductScraper bugs found in the test path. (Note: the two pre-existing JJP drift items surfaced by /local-review of PR #43, the ExtractSlug null-guard and NormalizeAvailability visibility, are intentionally out of scope for this test-infra PR and tracked separately.)

Test Plan
- dotnet build -nologo -v:m → 0 warnings, 0 errors
- dotnet test --filter "FullyQualifiedName~JjpProductScraperTests" → 5/5 pass (878 ms)
- dotnet test (full suite) → 435/435 pass (was 430 on main; +5 = the new tests)

Tests added (all in tests/PinballWizard.Scraper.Tests/Scraping/Jjp/JjpProductScraperTests.cs):
- ScrapeAsync_HappyPath_YieldsGamesInListOrderWithProvenance: 3 fixtured products surface from the collection JSON + sitemap walk; assert .Game yield order matches sitemap order, SourceType.JjpProductPage, "JJP Product Page" context, DiscoveryUrl per item, Source.ScrapedFrom = product URL, Source.ScrapedAt non-default, DiscoveredOn contains "jjp_products", every wire URL byte-equals every gate-acquired and gate-reported URL.
- ScrapeAsync_PerPageFetchFailure_DoesNotAbortRun: middle product page returns 500; the other two still yield; the 500 is reported to the gate.
- ScrapeAsync_DiscoveryFailure_AbortsThisSourceOnly: /collections/pinball-machines-for-sale/products.json returns 500; scraper yields nothing without throwing.
- ScrapeAsync_PolitenessExceptionFromGate_PropagatesUp: gate.ThrowOnAcquire set; assert PolitenessException propagates and zero wire requests fired.
- ScrapeAsync_GateThrowsOnReport_BubblesUp: gate.ThrowOnReport set; assert PolitenessException propagates from the first response report (collection JSON fetch).

Out of Scope
- /local-review (ExtractSlug null-guard, NormalizeAvailability visibility): tracked separately.

Checklist
- docs/adr/: N/A, test-only
- README.md and/or docs/ are updated in the same PR: N/A, no behavior change
- ~/.claude/projects/c--projects-PinballWizard/memory/ is now stale, it has been updated or removed in the same PR: N/A
- TODO / FIXME / commented-out code committed
- <NoWarn> without a comment explaining why and the removal criterion

Pre-push self-audit (additive PRs)
Step 0: /local-review (qualitative)
- /local-review and addressed every 🔴 finding before push
- /local-review once across the family-merge sequence to catch any cross-PR drift.

Step 1: Mechanical checklist
- *Options property has at least one real getter call in src/: N/A, no production code added
- CgcGamePageScraperTests); drift is justified. JJP-specific deltas: discovery is collection JSON + sitemap walk vs. menu HTML; single-yield (.Game only) vs. mixed yields; provenance asserts DiscoveryUrl equals product page URL because there is no separate index discovery URL on the per-yield path
- catch { }: none added (only Assert.ThrowsAsync<PolitenessException> for the propagation tests)
- ISourceScraper? N/A, test-only PR
- gate.ThrowOnAcquire / ThrowOnReport)
- dotnet build -nologo -v:m
- git log -1 --format='%an <%ae>' shows personal noreply, not work email. Verified: Jim Keeley <94459922+jkeeley2073@users.noreply.github.com>
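
For reviewers unfamiliar with the Shopify discovery pattern described in the Summary, the filtering step (keep only sitemap product URLs whose handle appears in the collection's handle set, in sitemap order) can be sketched roughly as below. This is an illustration, not JjpProductScraper's code: `DiscoverySketch`, `FilterByHandle`, the last-path-segment matching rule, the case-insensitive comparison, and the example.test URLs are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DiscoverySketch
{
    // Keeps only sitemap URLs whose final path segment is a known collection
    // handle. Output order follows the sitemap, not the handle list.
    public static IReadOnlyList<string> FilterByHandle(
        IEnumerable<string> sitemapUrls, IEnumerable<string> handles)
    {
        // Assumption: handles are compared case-insensitively.
        var handleSet = new HashSet<string>(handles, StringComparer.OrdinalIgnoreCase);

        return sitemapUrls
            .Where(url => Uri.TryCreate(url, UriKind.Absolute, out var uri)
                          && handleSet.Contains(LastSegment(uri)))
            .ToList();
    }

    // Returns the last path segment, ignoring a trailing slash.
    private static string LastSegment(Uri uri) =>
        uri.AbsolutePath.TrimEnd('/').Split('/').Last();
}

public static class DiscoveryDemo
{
    public static IReadOnlyList<string> Run()
    {
        var sitemapUrls = new[]
        {
            "https://example.test/products/alpha",
            "https://example.test/products/gift-card",
            "https://example.test/products/beta",
        };
        var handles = new[] { "beta", "alpha" };

        // gift-card is dropped; alpha stays ahead of beta because the sitemap lists it first.
        return DiscoverySketch.FilterByHandle(sitemapUrls, handles);
    }
}
```

Preserving sitemap order in the filter is what makes an assertion like "yield order matches sitemap order" meaningful, since the handle set itself carries no ordering.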