Skip to content

fix(opdb) treat blank strings as null in OpdbMachineMapper fallback chain#100

Merged
jkeeley2073 merged 3 commits into
mainfrom
Dev-Phase4OpdbMapperEmptyStringFix
May 8, 2026
Merged

fix(opdb) treat blank strings as null in OpdbMachineMapper fallback chain#100
jkeeley2073 merged 3 commits into
mainfrom
Dev-Phase4OpdbMapperEmptyStringFix

Conversation

@jkeeley2073
Copy link
Copy Markdown
Contributor

Summary

Critical data-correctness fix uncovered while spot-checking the Phase 4 W1-3 eval re-curation outcome (PR #98 merged 2026-05-08).

The bug. OpdbMachineMapper.Map() and MergeOpdbFieldsInto() fell back through dto.CommonName ?? dto.Name ?? dto.OpdbId. C#'s null-coalescing operator preserves empty strings — null ?? "" ?? OpdbId evaluates to "", never reaching OpdbId. OPDB's /api/export returns some modern Stern records with empty name / common_name strings, so the existing chain produced empty titles in deployed Cosmos.

The blast radius. A spot-check of the deployed catalog (probe script, since deleted) found 58 Stern records from 2018+ with title="" — including GweeP-MW95j (Stern Godzilla Pro 2021), GRBN-MQR4P (Stern Foo Fighters), and the rest of the modern Stern catalog. Empty titles silently break IMachineRepository.QueryByTitleAsync — the function tool the Wizard agent uses to ground every machine reference. Phase 3 H2 baseline citation_precision=0.133 was substantially driven by this single bug: when the eval asked about Stern Godzilla 2021, the deployed catalog had only Sega Godzilla 1998 with a non-empty title.

The fix. New FirstNonBlank(params string?[]) helper that treats null/empty/whitespace identically. Applied at four sites:

  • OpdbMachineMapper.Map() title fallback chain
  • OpdbMachineMapper.Map() manufacturer-key resolution (ShortName fallback)
  • OpdbMachineMapper.MergeOpdbFieldsInto() title refresh
  • OpdbSyncService.cs:191 alias pass-2 partition-key resolution (sibling drift caught by /local-review)

Operator hand-off — do this after merge. The deployed catalog stays broken until the operator re-runs OPDB sync. After this PR merges:

  1. dotnet run --project src/PinballWizard.Cli -- --source opdb (with OPDB__APITOKEN and az login against the personal Earlybird sub)
  2. Verify with dotnet script tools/eval/Recurate.csx -- --dry-run (the W1-3 hardened recuration tool from PR chore(eval) Phase 4 W1-3 hardening — mfg-aware re-curation skip-on-mismatch #99 should now resolve Godzilla to GweeP-MW95j instead of Sega's G5po2-MeP6B)
  3. Re-run live recuration to update the eval ground truth: dotnet script tools/eval/Recurate.csx
  4. Re-run H2 baseline to confirm the citation_precision lift: dotnet run --project src/PinballWizard.Cli -- --eval

Memory hand-off written at ~/.claude/projects/c--projects-PinballWizard/memory/session_handoff_2026_05_08_opdb_mapper_blank_string_fix.md so a future session asking "what's the state of the deployed catalog?" finds the answer without git archaeology.

Test Plan

  • dotnet build PinballWizard.slnx — Build succeeded, 0 Warning(s), 0 Error(s)
  • dotnet test PinballWizard.slnx — Passed: 718, Failed: 0, Skipped: 0 (was 709; +9 regression tests)
  • Mapper tests (8 new):
    • Map_BlankCommonNameAndName_FallsBackToOpdbId — Theory with 5 InlineData rows: (null, ""), ("", null), ("", ""), (" ", " "), ("\t", "\n"). Each fails against pre-fix code (returns "" or whitespace, not OpdbId).
    • Map_BlankCommonName_FallsThroughToName — pins the falls-through-to-Name behavior when CommonName is blank but Name is non-blank.
    • Map_BlankShortName_FallsBackToFullManufacturerName — pins the defense-in-depth fix on the manufacturer-key chain.
    • MergeOpdbFieldsInto_BlankCommonNameAndName_PreservesExistingTitle — pins that blanks on a re-sync don't wipe an existing title.
  • Sync-service test (1 new):
    • SyncAsync_AliasBlankShortName_FallsThroughToFullManufacturerName — pins the alias is folded as an edition (not silently skipped) when OPDB returns shortname="".
  • Live integration verification post-merge: H2 baseline rerun (operator hand-off above) measures actual citation_precision lift.

Out of Scope

  • Lower-severity drift in storefront JSON-LD extractors (BofProductExtractor, JjpProductExtractor, MultimorphicProductExtractor) — similar ?? chains for description fields. Storefront JSON-LD rarely emits empty descriptions; impact theoretical. Tracked as Phase 4.x follow-up; not expanded into this PR to keep scope bounded.
  • Re-running the OPDB sync — operator hand-off documented above; not a code change.
  • Re-running W1-3 hardened recuration — sequenced after sync re-run.

Checklist

  • CI is green (build + test + coverage + CodeQL + sanitization) — verified pre-push; will re-confirm in PR checks
  • PR title follows the Conventional Commits format above
  • If this is a new architectural decision, an ADR has been added under docs/adr/ — N/A (data-correctness fix; the existing OpdbMachineMapper design is correct, only the fallback chain semantics were wrong)
  • If user-visible behavior changes, README.md and/or docs/ are updated in the same PR — internal mechanics; user-visible behavior emerges only after the operator re-runs OPDB sync per the hand-off above
  • If a memory in ~/.claude/projects/c--projects-PinballWizard/memory/ is now stale, it has been updated or removed in the same PR — new memory note added documenting the operator hand-off
  • No TODO / FIXME / commented-out code committed
  • No new entries in <NoWarn> without a comment explaining why and the removal criterion — N/A

Pre-push self-audit

Step 0 — /local-review (qualitative)

  • Ran /local-review and addressed every 🔴 finding before push
  • Local review outcome: 1 🔴 (fixed pre-push) / 2 ⚠️ (1 fixed: operator hand-off memory note added; 1 deferred: storefront JSON-LD description ?? OG chains, see Out of Scope above) / 9 categories ✅
    • Fixed: sibling drift in OpdbSyncService.cs:191 (alias pass-2 partition-key fallback had the same bug). Promoted FirstNonBlank to internal static so the sync service can use it. New regression test pins the alias fold-as-edition behavior on shortname="".

Step 1 — Mechanical checklist

  • Every new *Options property has at least one real getter call in src/ — N/A (no options touched)
  • Sibling-diffed against the closest existing implementation — /local-review § 4 flagged the alias-path drift; fixed in commit bab335a
  • No bare catch { } — no new try/catch added; existing per-alias try/catch unchanged
  • New ISourceScraper? — N/A
  • Tests assert behavior, not just structure — 9 regression tests pin specific input → output contracts (blank/null/whitespace permutations vs. OpdbId fallback)
  • Build is zero-warning — verified 0 Warning(s), 0 Error(s)
  • git log -1 --format='%an <%ae>' shows personal noreply, not work email — confirmed 94459922+jkeeley2073@users.noreply.github.com

@jkeeley2073 jkeeley2073 added the claude-code Generated with Claude Code label May 8, 2026
Comment on lines +192 to +195
foreach (var candidate in candidates)
{
if (!string.IsNullOrWhiteSpace(candidate)) return candidate;
}
Comment on lines +192 to +195
foreach (var candidate in candidates)
{
if (!string.IsNullOrWhiteSpace(candidate)) return candidate;
}
…hain

Critical data-correctness fix uncovered while spot-checking the W1-3
recuration outcome. The deployed Cosmos catalog has 58 modern Stern
records (2018+, including Stern Godzilla Pro 2021 / GweeP-MW95j) where
title is empty string, breaking IMachineRepository.QueryByTitleAsync —
the function tool the Wizard agent uses to ground every machine
reference. The eval's W1-3 recuration silently picked the Sega 1998
Godzilla because that was the only "Godzilla" with a non-empty title
the title-keyed lookup could find.

ROOT CAUSE
C#'s null-coalescing operator (??) preserves empty strings:
`null ?? "" ?? fallback` evaluates to `""`, never reaching `fallback`.
OPDB's /api/export returns some modern Stern records with empty
`name` / `common_name`, so the existing chain
`Title = dto.CommonName ?? dto.Name ?? dto.OpdbId` produced an empty
title instead of the documented OpdbId fallback. Same issue affected
MergeOpdbFieldsInto on subsequent re-syncs (would WIPE existing titles
back to empty if OPDB returned blanks). Defense-in-depth: the
manufacturer ShortName fallback had the same shape; a blank ShortName
would have produced a "unknown" partition key.

FIX
Introduced FirstNonBlank(params string?[] candidates) helper that
returns the first non-null AND non-whitespace candidate, or null if
every candidate is blank. Applied at three sites in
OpdbMachineMapper:
  - Map() Title fallback chain
  - Map() manufacturer-key resolution (ShortName → Name)
  - MergeOpdbFieldsInto() Title refresh

REGRESSION TESTS (5 new)
  - Map_BlankCommonNameAndName_FallsBackToOpdbId — Theory with 5
    blank/null/whitespace permutations; pins the documented OpdbId
    fallback fires when both name fields are blank
  - Map_BlankCommonName_FallsThroughToName — pins the
    falls-through-to-Name behavior when CommonName is blank but Name
    is non-blank
  - Map_BlankShortName_FallsBackToFullManufacturerName — pins the
    defense-in-depth fix on the manufacturer-key chain
  - MergeOpdbFieldsInto_BlankCommonNameAndName_PreservesExistingTitle
    — pins that blanks on a re-sync don't wipe the existing title

OPERATOR HAND-OFF
The fix corrects the mapping logic; the deployed Cosmos catalog is
still in the broken state until the OPDB sync is re-run. Sequenced
hand-off: after this PR merges, re-run `dotnet run --project
src/PinballWizard.Cli -- --source opdb` against the personal
Earlybird subscription. The 58 Stern modern records will repopulate
their titles. Then re-run the W1-3 hardened recuration tool (per PR
#99) to reconcile the eval ground truth against the corrected
catalog.

Tests: 709 → 717. Build clean, zero warnings.
Pre-push /local-review surfaced 1 🔴 + 2 ⚠️.

🔴 Sibling drift caught: OpdbSyncService.cs:191 had the same
   blank-string-in-?? bug pattern in the alias pass-2 partition-key
   resolution. A blank ShortName on an alias DTO was preserved by ??
   as "", which then tripped NormalizeManufacturerKey's blank-input
   guard — caught by the per-alias try/catch and counted as a
   silently-dropped alias edition. Same bug, separate code path.

   Fix: promoted FirstNonBlank from `private static` to
   `internal static` so OpdbSyncService can use the same helper. New
   regression test SyncAsync_AliasBlankShortName_FallsThroughToFullManufacturerName
   pins the alias is folded as an edition (not skipped) when OPDB
   returns shortname="".

⚠️ Operator hand-off documented in memory (separate non-PR commit
   below). Code comments stay focused on the bug; operational state
   lives in memory.

⚠️ Lower-severity drift in storefront JSON-LD extractors
   (BofProductExtractor, JjpProductExtractor, MultimorphicProductExtractor)
   has similar `?? ` chains for `description` fields. Defer:
   storefront JSON-LD rarely emits empty descriptions; impact
   theoretical; tracked as Phase 4.x follow-up.

Tests: 717 → 718. Build clean, zero warnings.
@jkeeley2073 jkeeley2073 force-pushed the Dev-Phase4OpdbMapperEmptyStringFix branch from 6b88b39 to 1b15214 Compare May 8, 2026 14:01
@jkeeley2073 jkeeley2073 merged commit bc8230d into main May 8, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

claude-code Generated with Claude Code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants