feat(sync) scraper-to-Machine reconciliation service + ADR 0011 #35
Merged
Bridges the legacy GameRecord shape (Core/Models) to the OPDB-keyed
Machine aggregate (Core/Domain). Architectural debt that had been
"deferred to follow-up" in every Phase 1.2 PR description -- four PRs
deep was enough.
Per ADR 0011 the bridge is a one-way reconciliation: OPDB owns the
catalog spine (Title / Year / Designers / Themes) and the scrapers
contribute Editions plus the ManufacturerSlugs[mfg] back-reference.
Two-pass match strategy:
1. Slug fast path: Machine.ManufacturerSlugs[mfg] == GameRecord.Slug
2. Title-normalize fallback (bootstrap): lowercase + strip
non-alphanumeric. Populates the slug map on first match so
subsequent runs hit the fast path. Ambiguous matches (>=2
Machines with the same normalized title) are logged with both
candidate IDs and skipped -- never pick one arbitrarily.
Per ADR 0011 unmatched scraper records are skipped, not inserted --
OPDB is the gate for what counts as a real machine. Per-partition
cache means O(P) repository streams per run (one stream per
manufacturer), not O(N).
ScraperManufacturerKey constants match
OpdbMachineMapper.NormalizeManufacturerKey exactly so scraped data
lands in the same Cosmos partition OPDB wrote to.
CLI integration is deferred until Cosmos is deployed; in production
the reconciler runs in the scraper-mfg-sync ACA Job.
28 new tests (288 total) with NSubstitute. Self-audit checklist
shipped in PR #34 ran clean: zero warnings, identity verified,
SourceAliasContractTests still passing.
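
The per-partition cache described above (one repository stream per manufacturer rather than one per scraped record) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the body of `StreamByManufacturerAsync` is a hypothetical stand-in that yields made-up IDs, and `GetPartitionAsync` and the `cache` dictionary are names introduced only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Counts how often the stand-in repository is asked to stream a partition.
var streamCalls = 0;

// Hypothetical stand-in for a repository stream; the real repository API
// is not shown in this PR text, and the IDs yielded here are invented.
async IAsyncEnumerable<string> StreamByManufacturerAsync(string manufacturer)
{
    streamCalls++;
    await Task.Yield();
    yield return $"{manufacturer}-machine-1";
    yield return $"{manufacturer}-machine-2";
}

// One cached list per manufacturer key, filled on first use.
var cache = new Dictionary<string, List<string>>();

async Task<List<string>> GetPartitionAsync(string manufacturer)
{
    if (cache.TryGetValue(manufacturer, out var cached)) return cached;

    var loaded = new List<string>();
    await foreach (var id in StreamByManufacturerAsync(manufacturer))
    {
        loaded.Add(id);
    }
    cache[manufacturer] = loaded;
    return loaded;
}

// N = 5 scraped records spread over P = 2 manufacturers.
foreach (var mfg in new[] { "stern", "jjp", "stern", "stern", "jjp" })
{
    await GetPartitionAsync(mfg);
}

Console.WriteLine(streamCalls); // 2
```

With five records across two manufacturers the stand-in repository is streamed twice, which is the O(P) behaviour the commit message claims.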
Comment on lines +137 to +144

    foreach (var machine in partition)
    {
        if (machine.ManufacturerSlugs.TryGetValue(manufacturer, out var existingSlug)
            && string.Equals(existingSlug, game.Slug, StringComparison.OrdinalIgnoreCase))
        {
            return (machine, MatchOutcome.Slug);
        }
    }

Comment on lines +153 to +161

    foreach (var machine in partition)
    {
        if (NormalizeTitle(machine.Title) == normalizedScraped)
        {
            candidate = machine;
            matchCount++;
            if (matchCount > 1) break;
        }
    }

Comment on lines +212 to +215

    foreach (var c in title)
    {
        if (char.IsLetterOrDigit(c)) sb.Append(char.ToLowerInvariant(c));
    }

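
To make the effect of this normalization loop concrete, here is a small self-contained sketch. The loop body is the one under review; the surrounding `NormalizeTitle` signature and the `StringBuilder` setup are assumptions (the excerpt shows only the loop), and the sample titles are invented for illustration.

```csharp
using System;
using System.Text;

// Same loop as the reviewed lines, wrapped in a function. The wrapper's exact
// signature is an assumption, since the PR excerpt shows only the loop body.
static string NormalizeTitle(string title)
{
    var sb = new StringBuilder(title.Length);
    foreach (var c in title)
    {
        if (char.IsLetterOrDigit(c)) sb.Append(char.ToLowerInvariant(c));
    }
    return sb.ToString();
}

// Example titles are illustrative, not taken from OPDB or the scrapers.
Console.WriteLine(NormalizeTitle("Attack From Mars!"));  // attackfrommars
Console.WriteLine(NormalizeTitle("attack-from-mars"));   // attackfrommars
Console.WriteLine(NormalizeTitle("Godzilla (Premium)")); // godzillapremium
```

Casing and punctuation differences collapse to the same key, but a scraped title that carries extra words (the third example) does not normalize to the bare title, so under this scheme it would fall through as unmatched rather than match loosely.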
This was referenced May 2, 2026
Summary
Bridges the legacy/working `GameRecord` shape (in `Core/Models`) to the OPDB-keyed `Machine` aggregate (in `Core/Domain/`). Architectural debt that had been "deferred to follow-up" in every Phase 1.2 PR description (#30 OPDB / #31 JJP / #32 AP / #33 Spooky); four PRs deep was the right time to land this.

Per ADR 0011 (added in this PR), the bridge is a one-way reconciliation: OPDB owns the catalog spine, the scrapers contribute edition data and populate `Machine.ManufacturerSlugs[manufacturerKey]` back-references.

Field ownership (locked in ADR 0011)

- `Id`, `PartitionKey`, `ManufacturerDisplayName`
- `Title`, `Year`, `Designers`, `Themes`
- `Editions`
- `ManufacturerSlugs[mfg]`
- `LastSeenAt`

A scraped game with no matching Machine is logged and skipped, never written. Per ADR 0011 OPDB is the gate for what counts as a real machine; manufacturers shipping machines OPDB doesn't yet know about appear in warning logs and require an OPDB contribution to reconcile.

Two-pass match strategy

1. Slug fast path: `Machine.ManufacturerSlugs[mfg] == GameRecord.Slug`. Constant per-record work after the first sync seeds the slug map.
2. Title-normalize fallback (bootstrap), used when the `ManufacturerSlugs` map has no matching slug. Match by normalized title (lowercase + strip non-alphanumeric). Exactly one match → populate the slug map and treat as matched. Zero or multiple matches → log warning (with both candidate Machine IDs in the ambiguous case) and skip.

The fallback handles bootstrap automatically; subsequent runs hit the fast path.

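
The two passes can be sketched end to end as below. This is a simplified illustration under stated assumptions, not the service's implementation: machines are modelled as plain tuples instead of the `Machine` aggregate, outcomes are strings instead of the `MatchOutcome` enum, and the `Match` function, the `opdb-1` / `opdb-2` IDs and the sample titles are all hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Lowercase + strip non-alphanumeric, as described in the PR.
static string NormalizeTitle(string title)
{
    var sb = new StringBuilder(title.Length);
    foreach (var c in title)
    {
        if (char.IsLetterOrDigit(c)) sb.Append(char.ToLowerInvariant(c));
    }
    return sb.ToString();
}

// Hypothetical matcher over one manufacturer partition.
static (string? MachineId, string Outcome) Match(
    IReadOnlyList<(string Id, string Title, Dictionary<string, string> Slugs)> partition,
    string manufacturer, string scrapedSlug, string scrapedTitle)
{
    // Pass 1: slug fast path.
    foreach (var m in partition)
    {
        if (m.Slugs.TryGetValue(manufacturer, out var slug)
            && string.Equals(slug, scrapedSlug, StringComparison.OrdinalIgnoreCase))
        {
            return (m.Id, "Slug");
        }
    }

    // Pass 2: normalized-title fallback; only a unique match is accepted.
    var normalized = NormalizeTitle(scrapedTitle);
    var candidates = partition.Where(m => NormalizeTitle(m.Title) == normalized).ToList();
    if (candidates.Count != 1)
    {
        return (null, candidates.Count == 0 ? "Unmatched" : "Ambiguous");
    }

    // Seed the slug map so the next run takes the fast path.
    candidates[0].Slugs[manufacturer] = scrapedSlug;
    return (candidates[0].Id, "Title");
}

var partition = new List<(string Id, string Title, Dictionary<string, string> Slugs)>
{
    ("opdb-1", "Godzilla", new Dictionary<string, string>()),
    ("opdb-2", "Jaws", new Dictionary<string, string>()),
};

var first = Match(partition, "stern", "godzilla", "GODZILLA!");
var second = Match(partition, "stern", "godzilla", "a different title");
Console.WriteLine($"{first.Outcome}, {second.Outcome}"); // Title, Slug
```

The second call matches on the slug seeded by the first call even though its title no longer matches, which is the bootstrap behaviour described above.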
Manufacturer-key alignment
`ScraperManufacturerKey` constants (`stern`/`jjp`/`americanpinball`/`spooky`) match `OpdbMachineMapper.NormalizeManufacturerKey` exactly so a JJP `GameRecord` lands in the same Cosmos partition the OPDB sync wrote JJP machines to. Pinned by `ScraperManufacturerKeyTests.Constants_AreLowercaseToMatchPartitionKeys`.

Deferred (intentional)

- `--sync-machines` flag etc. Cosmos infrastructure isn't deployed yet, so wiring it into Program.cs would create a code path that fails at DI startup. The reconciler ships as pure Application + tests, exactly matching the existing `OpdbSyncService` pattern. CLI integration lands when Bicep deploys.
- `games.json`: the legacy catalog continues to be written alongside Cosmos data. Sunset decision is its own ADR/PR.

Tests

- `ScraperManufacturerKeyTests` (5): every prefix recognised, sentinel-prefix gating, blank-input throws, lowercase-key invariant
- `ScraperReconciliationServiceTests` (23), full coverage:
  - `LastSeenAt` update
  - `GameId` prefix → counted as `FailedMapping`
  - `StreamByManufacturerAsync` called exactly once per partition per run
  - `ReconcileAsync(null)` throws

NSubstitute against `IMachineRepository` and a fake `TimeProvider`, the same pattern as `CosmosRepositoryTests` and `OpdbMachineMapperTests`.

Self-audit checklist (per CLAUDE.md § PR self-audit)

Ran the checklist shipped in PR #34. All items pass:

- `IScraperReconciliationService` mirrors `IOpdbSyncService` shape
- `catch { }`
- `SourceAliasContractTests` still passes (no scraper added/changed)
- `git log -1 --format='%an <%ae>'` → personal noreply

Test plan

- `dotnet build`: clean
- `dotnet test`: 288/288 pass
- `dotnet test --filter Sync`: reconciler suite isolated, 28 pass

Out of scope

- `scraper-mfg-sync`: Bicep PR territory
- `games.json` sunsetting: its own ADR

🤖 Generated with Claude Code