Skip to content

feat(sync) scraper-to-Machine reconciliation service + ADR 0011#35

Merged
jkeeley2073 merged 1 commit into
mainfrom
Dev-CosmosMigration
May 2, 2026
Merged

feat(sync) scraper-to-Machine reconciliation service + ADR 0011#35
jkeeley2073 merged 1 commit into
mainfrom
Dev-CosmosMigration

Conversation

@jkeeley2073
Copy link
Copy Markdown
Contributor

Summary

Bridges the legacy/working GameRecord shape (in Core/Models) to the OPDB-keyed Machine aggregate (in Core/Domain/). Architectural debt that had been "deferred to follow-up" in every Phase 1.2 PR description (#30 OPDB / #31 JJP / #32 AP / #33 Spooky) — four PRs deep was the right time to land this.

Per ADR 0011 (added in this PR), the bridge is a one-way reconciliation: OPDB owns the catalog spine, the scrapers contribute edition data and populate Machine.ManufacturerSlugs[manufacturerKey] back-references.

Field ownership (locked in ADR 0011)

Field Owner Behavior on reconcile
Id, PartitionKey, ManufacturerDisplayName OPDB Never changed
Title, Year, Designers, Themes OPDB Never changed
Editions Scraper Replaced wholesale (current pricing/availability is fresher on the manufacturer site than on OPDB)
ManufacturerSlugs[mfg] Scraper Set / replaced on reconcile
LastSeenAt Either Updated to now on reconcile

A scraped game with no matching Machine is logged and skipped — never written. Per ADR 0011 OPDB is the gate for what counts as a real machine; manufacturers shipping machines OPDB doesn't yet know about appear in warning logs and require an OPDB contribution to reconcile.

Two-pass match strategy

  1. Slug fast path: Machine.ManufacturerSlugs[mfg] == GameRecord.Slug — constant per-record work after the first sync seeds the slug map.
  2. Title-normalize fallback (bootstrap path): First run, every Machine in the partition has an empty ManufacturerSlugs map. Match by normalized title (lowercase + strip non-alphanumeric). Exactly one match → populate the slug map and treat as matched. Zero or multiple matches → log warning (with both candidate Machine IDs in the ambiguous case) and skip.

The fallback handles bootstrap automatically; subsequent runs hit the fast path.

Manufacturer-key alignment

ScraperManufacturerKey constants (stern / jjp / americanpinball / spooky) match OpdbMachineMapper.NormalizeManufacturerKey exactly so a JJP GameRecord lands in the same Cosmos partition the OPDB sync wrote JJP machines to. Pinned by ScraperManufacturerKeyTests.Constants_AreLowercaseToMatchPartitionKeys.

Deferred (intentional)

  • CLI integration--sync-machines flag etc. Cosmos infrastructure isn't deployed yet, so wiring it into Program.cs would create a code path that fails at DI startup. The reconciler ships as pure Application + tests, exactly matching the existing OpdbSyncService pattern. CLI integration lands when Bicep deploys.
  • Sunsetting games.json — the legacy catalog continues to be written alongside Cosmos data. Sunset decision is its own ADR/PR.

Tests

  • 288 / 288 passing (was 260) — +28 new tests
  • ScraperManufacturerKeyTests (5) — every prefix recognised, sentinel-prefix gating, blank-input throws, lowercase-key invariant
  • ScraperReconciliationServiceTests (23) — full coverage:
    • Slug fast-path merge with edition copy + LastSeenAt update
    • OPDB-owned-field preservation (a scraper-side title change must NOT overwrite the OPDB title)
    • Bootstrap title-normalize match (empty slug map → populated after match)
    • Title normalization across case / punctuation / digits / whitespace / null
    • Ambiguous title (≥2 candidates with same normalized title) → no upsert, both IDs logged
    • Unmatched record → skip, no insert
    • Unrecognised GameId prefix → counted as FailedMapping
    • Partition cache: StreamByManufacturerAsync called exactly once per partition per run
    • Idempotent re-reconcile flips title-fallback → slug-fast-path on second run
    • Constructor null-checks + ReconcileAsync(null) throws
  • Build: zero warnings

NSubstitute against IMachineRepository and a fake TimeProvider — same pattern as CosmosRepositoryTests and OpdbMachineMapperTests.

Self-audit checklist (per CLAUDE.md § PR self-audit)

Ran the checklist shipped in PR #34. All items pass:

  • No new options classes (N/A)
  • Sibling-diff: IScraperReconciliationService mirrors IOpdbSyncService shape
  • No bare catch { }
  • SourceAliasContractTests still passes (no scraper added/changed)
  • Tests assert behavior — "ambiguous → skip" includes 2 same-title machines; "preserves OPDB-owned" sets a different scraper title and asserts OPDB title survives
  • Build is zero-warning
  • git log -1 --format='%an <%ae>' → personal noreply

Test plan

  • dotnet build — clean
  • dotnet test — 288/288 pass
  • dotnet test --filter Sync — reconciler suite isolated, 28 pass
  • End-to-end against live Cosmos — deferred until Bicep deploys

Out of scope

  • CLI command wiring (deferred — Cosmos not deployed)
  • ACA Job definition for scraper-mfg-sync — Bicep PR territory
  • games.json sunsetting — its own ADR
  • Phase 1.3 manufacturer scrapers (Multimorphic / Chicago Gaming / Haggis / Pinball Brothers / Dutch / Barrels of Fun) — same scraper template; can land independently

🤖 Generated with Claude Code

Bridges the legacy GameRecord shape (Core/Models) to the OPDB-keyed
Machine aggregate (Core/Domain). Architectural debt that had been
"deferred to follow-up" in every Phase 1.2 PR description -- four PRs
deep was enough.

Per ADR 0011 the bridge is a one-way reconciliation: OPDB owns the
catalog spine (Title / Year / Designers / Themes) and the scrapers
contribute Editions plus the ManufacturerSlugs[mfg] back-reference.
Two-pass match strategy:

  1. Slug fast path: Machine.ManufacturerSlugs[mfg] == GameRecord.Slug
  2. Title-normalize fallback (bootstrap): lowercase + strip
     non-alphanumeric. Populates the slug map on first match so
     subsequent runs hit the fast path. Ambiguous matches (>=2
     Machines with the same normalized title) are logged with both
     candidate IDs and skipped -- never pick one arbitrarily.

Per ADR 0011 unmatched scraper records are skipped, not inserted --
OPDB is the gate for what counts as a real machine. Per-partition
cache means O(P) repository streams per run (one stream per
manufacturer), not O(N).

ScraperManufacturerKey constants match
OpdbMachineMapper.NormalizeManufacturerKey exactly so scraped data
lands in the same Cosmos partition OPDB wrote to.

CLI integration is deferred until Cosmos is deployed; in production
the reconciler runs in the scraper-mfg-sync ACA Job.

28 new tests (288 total) with NSubstitute. Self-audit checklist
shipped in PR #34 ran clean: zero warnings, identity verified,
SourceAliasContractTests still passing.
@jkeeley2073 jkeeley2073 added the claude-code Generated with Claude Code label May 2, 2026
Comment on lines +137 to +144
foreach (var machine in partition)
{
if (machine.ManufacturerSlugs.TryGetValue(manufacturer, out var existingSlug)
&& string.Equals(existingSlug, game.Slug, StringComparison.OrdinalIgnoreCase))
{
return (machine, MatchOutcome.Slug);
}
}
Comment on lines +153 to +161
foreach (var machine in partition)
{
if (NormalizeTitle(machine.Title) == normalizedScraped)
{
candidate = machine;
matchCount++;
if (matchCount > 1) break;
}
}
Comment on lines +212 to +215
foreach (var c in title)
{
if (char.IsLetterOrDigit(c)) sb.Append(char.ToLowerInvariant(c));
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

claude-code Generated with Claude Code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants