fix(opdb) treat blank strings as null in OpdbMachineMapper fallback chain#100
Merged
Conversation
18 tasks
bab335a to
6bdc826
Compare
19 tasks
…hain
Critical data-correctness fix uncovered while spot-checking the W1-3
recuration outcome. The deployed Cosmos catalog has 58 modern Stern
records (2018+, including Stern Godzilla Pro 2021 / GweeP-MW95j) where
title is empty string, breaking IMachineRepository.QueryByTitleAsync —
the function tool the Wizard agent uses to ground every machine
reference. The eval's W1-3 recuration silently picked the Sega 1998
Godzilla because that was the only "Godzilla" with a non-empty title
the title-keyed lookup could find.
ROOT CAUSE
C#'s null-coalescing operator (??) preserves empty strings:
`null ?? "" ?? fallback` evaluates to `""`, never reaching `fallback`.
OPDB's /api/export returns some modern Stern records with empty
`name` / `common_name`, so the existing chain
`Title = dto.CommonName ?? dto.Name ?? dto.OpdbId` produced an empty
title instead of the documented OpdbId fallback. Same issue affected
MergeOpdbFieldsInto on subsequent re-syncs (would WIPE existing titles
back to empty if OPDB returned blanks). Defense-in-depth: the
manufacturer ShortName fallback had the same shape; a blank ShortName
would have produced a "unknown" partition key.
FIX
Introduced FirstNonBlank(params string?[] candidates) helper that
returns the first non-null AND non-whitespace candidate, or null if
every candidate is blank. Applied at three sites in
OpdbMachineMapper:
- Map() Title fallback chain
- Map() manufacturer-key resolution (ShortName → Name)
- MergeOpdbFieldsInto() Title refresh
REGRESSION TESTS (5 new)
- Map_BlankCommonNameAndName_FallsBackToOpdbId — Theory with 5
blank/null/whitespace permutations; pins the documented OpdbId
fallback fires when both name fields are blank
- Map_BlankCommonName_FallsThroughToName — pins the
falls-through-to-Name behavior when CommonName is blank but Name
is non-blank
- Map_BlankShortName_FallsBackToFullManufacturerName — pins the
defense-in-depth fix on the manufacturer-key chain
- MergeOpdbFieldsInto_BlankCommonNameAndName_PreservesExistingTitle
— pins that blanks on a re-sync don't wipe the existing title
OPERATOR HAND-OFF
The fix corrects the mapping logic; the deployed Cosmos catalog is
still in the broken state until the OPDB sync is re-run. Sequenced
hand-off: after this PR merges, re-run `dotnet run --project
src/PinballWizard.Cli -- --source opdb` against the personal
Earlybird subscription. The 58 Stern modern records will repopulate
their titles. Then re-run the W1-3 hardened recuration tool (per PR
#99) to reconcile the eval ground truth against the corrected
catalog.
Tests: 709 → 717. Build clean, zero warnings.
Pre-push /local-review surfaced 1 🔴 + 2⚠️ . 🔴 Sibling drift caught: OpdbSyncService.cs:191 had the same blank-string-in-?? bug pattern in the alias pass-2 partition-key resolution. A blank ShortName on an alias DTO was preserved by ?? as "", which then tripped NormalizeManufacturerKey's blank-input guard — caught by the per-alias try/catch and counted as a silently-dropped alias edition. Same bug, separate code path. Fix: promoted FirstNonBlank from `private static` to `internal static` so OpdbSyncService can use the same helper. New regression test SyncAsync_AliasBlankShortName_FallsThroughToFullManufacturerName pins the alias is folded as an edition (not skipped) when OPDB returns shortname="".⚠️ Operator hand-off documented in memory (separate non-PR commit below). Code comments stay focused on the bug; operational state lives in memory.⚠️ Lower-severity drift in storefront JSON-LD extractors (BofProductExtractor, JjpProductExtractor, MultimorphicProductExtractor) has similar `?? ` chains for `description` fields. Defer: storefront JSON-LD rarely emits empty descriptions; impact theoretical; tracked as Phase 4.x follow-up. Tests: 717 → 718. Build clean, zero warnings.
6b88b39 to
1b15214
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Critical data-correctness fix uncovered while spot-checking the Phase 4 W1-3 eval re-curation outcome (PR #98 merged 2026-05-08).
The bug.
OpdbMachineMapper.Map()andMergeOpdbFieldsInto()fell back throughdto.CommonName ?? dto.Name ?? dto.OpdbId. C#'s null-coalescing operator preserves empty strings —null ?? "" ?? OpdbIdevaluates to"", never reachingOpdbId. OPDB's/api/exportreturns some modern Stern records with emptyname/common_namestrings, so the existing chain produced empty titles in deployed Cosmos.The blast radius. A spot-check of the deployed catalog (probe script, since deleted) found 58 Stern records from 2018+ with
title=""— includingGweeP-MW95j(Stern Godzilla Pro 2021),GRBN-MQR4P(Stern Foo Fighters), and the rest of the modern Stern catalog. Empty titles silently breakIMachineRepository.QueryByTitleAsync— the function tool the Wizard agent uses to ground every machine reference. Phase 3 H2 baselinecitation_precision=0.133was substantially driven by this single bug: when the eval asked about Stern Godzilla 2021, the deployed catalog had only Sega Godzilla 1998 with a non-empty title.The fix. New
FirstNonBlank(params string?[])helper that treats null/empty/whitespace identically. Applied at four sites:OpdbMachineMapper.Map()title fallback chainOpdbMachineMapper.Map()manufacturer-key resolution (ShortName fallback)OpdbMachineMapper.MergeOpdbFieldsInto()title refreshOpdbSyncService.cs:191alias pass-2 partition-key resolution (sibling drift caught by/local-review)Operator hand-off — do this after merge. The deployed catalog stays broken until the operator re-runs OPDB sync. After this PR merges:
dotnet run --project src/PinballWizard.Cli -- --source opdb(withOPDB__APITOKENandaz loginagainst the personal Earlybird sub)dotnet script tools/eval/Recurate.csx -- --dry-run(the W1-3 hardened recuration tool from PR chore(eval) Phase 4 W1-3 hardening — mfg-aware re-curation skip-on-mismatch #99 should now resolve Godzilla toGweeP-MW95jinstead of Sega'sG5po2-MeP6B)dotnet script tools/eval/Recurate.csxcitation_precisionlift:dotnet run --project src/PinballWizard.Cli -- --evalMemory hand-off written at
~/.claude/projects/c--projects-PinballWizard/memory/session_handoff_2026_05_08_opdb_mapper_blank_string_fix.mdso a future session asking "what's the state of the deployed catalog?" finds the answer without git archaeology.Test Plan
dotnet build PinballWizard.slnx— Build succeeded, 0 Warning(s), 0 Error(s)dotnet test PinballWizard.slnx— Passed: 718, Failed: 0, Skipped: 0 (was 709; +9 regression tests)Map_BlankCommonNameAndName_FallsBackToOpdbId— Theory with 5 InlineData rows:(null, ""),("", null),("", ""),(" ", " "),("\t", "\n"). Each fails against pre-fix code (returns""or whitespace, not OpdbId).Map_BlankCommonName_FallsThroughToName— pins the falls-through-to-Name behavior when CommonName is blank but Name is non-blank.Map_BlankShortName_FallsBackToFullManufacturerName— pins the defense-in-depth fix on the manufacturer-key chain.MergeOpdbFieldsInto_BlankCommonNameAndName_PreservesExistingTitle— pins that blanks on a re-sync don't wipe an existing title.SyncAsync_AliasBlankShortName_FallsThroughToFullManufacturerName— pins the alias is folded as an edition (not silently skipped) when OPDB returnsshortname="".citation_precisionlift.Out of Scope
BofProductExtractor,JjpProductExtractor,MultimorphicProductExtractor) — similar??chains fordescriptionfields. Storefront JSON-LD rarely emits empty descriptions; impact theoretical. Tracked as Phase 4.x follow-up; not expanded into this PR to keep scope bounded.Checklist
docs/adr/— N/A (data-correctness fix; the existingOpdbMachineMapperdesign is correct, only the fallback chain semantics were wrong)~/.claude/projects/c--projects-PinballWizard/memory/is now stale, it has been updated or removed in the same PR — new memory note added documenting the operator hand-offTODO/FIXME/ commented-out code committed<NoWarn>without a comment explaining why and the removal criterion — N/APre-push self-audit
Step 0 —
/local-review(qualitative)/local-reviewand addressed every 🔴 finding before pushdescription ?? OGchains, see Out of Scope above) / 9 categories ✅OpdbSyncService.cs:191(alias pass-2 partition-key fallback had the same bug). PromotedFirstNonBlanktointernal staticso the sync service can use it. New regression test pins the alias fold-as-edition behavior onshortname="".Step 1 — Mechanical checklist
*Optionsproperty has at least one real getter call insrc/— N/A (no options touched)/local-review § 4flagged the alias-path drift; fixed in commitbab335acatch { }— no new try/catch added; existing per-alias try/catch unchangedISourceScraper? — N/A0 Warning(s), 0 Error(s)git log -1 --format='%an <%ae>'shows personal noreply, not work email — confirmed94459922+jkeeley2073@users.noreply.github.com