feat(cosmos) PR 5: machine_title_lookups point-read + dual-write — user-delight headline#145
Merged
Merged
Conversation
…er-delight headline
PR 5 of 6 in the Cosmos for User Delight track per ADR-0025 § 4 —
the user-delight headline change and the final PR in the track.
Replaces the cross-partition `STRINGEQUALS` query on the Wizard
answer flow's cache-miss path with a two-point-read pattern: first
the new `machine_title_lookups` container by normalized title,
then `machines` by (opdb_id, manufacturer). Validated against PR 4's
`pinwiz.cosmos.query_duration_ms{operation=query, container=machines}`
baseline — the existing query is retained as a logged-warning
fallback so the tool degrades gracefully when the lookup is missing
or stale.
What lands:
- New container `machine_title_lookups` (partition `/normalizedTitle`,
doc id == partition-key value, selective indexing per ADR-0025 § 3,
no TTL — bounded by OPDB catalog).
- New `MachineTitleLookup` domain entity (Core) with `NormalizeTitle`
static helper that lowercases + trims + escapes Cosmos-id-forbidden
chars (`/`, `\`, `?`, `#` → `_`) so id == partition-key value is
always valid; `UpsertEntry(opdbId, manufacturer)` and
`RemoveEntry(opdbId)` mutate the parallel `OpdbIds` /
`Manufacturers` arrays in lock-step.
- New `IMachineTitleLookupRepository` (Application) with convenience
`GetByTitleAsync` / `DeleteByTitleAsync` that normalize the input
title and delegate to `IRepository<T>`'s point-read methods.
- New `MachineTitleLookupRepository` (Infrastructure) extends
`CosmosRepository<MachineTitleLookup>` so every SDK call inherits
PR 4's metering — RU + duration land on `pinwiz.cosmos.*` tagged
`container=machine_title_lookups`.
- `OpdbSyncService` gains `IMachineTitleLookupRepository` dependency
+ `UpdateTitleLookupAsync` helper called inside the per-machine
pass-1 loop with rename detection (capture `priorTitle` before
`OpdbMachineMapper.MergeOpdbFieldsInto`; if normalized prior
title differs from new normalized title, remove entry from old
row, delete row if it becomes empty, upsert new row). Per-step
exception isolation — a failed lookup write logs at warning and
the sync continues; the machine row already landed and the
cross-partition fallback resolves queries until the next sync.
- `MachineGroundingTool.GetMachineByTitleAsync` refactored to
point-read first via lookup, fall back to `QueryByTitleAsync`
when the lookup row is missing OR stale (lookup hit but the
pointed-at machine row is gone). Stale lookup logs warning;
missing lookup with successful fallback logs warning so operators
see the gap (operator action: re-run OPDB sync). Missing lookup
with no fallback match returns null silently (normal "no such
machine" flow).
- Five-layer enforcement scaffolding self-references: ADR-0025 § 4
is the canonical spec; CLAUDE.md item 8(e) (PR 4 wording) is
satisfied because every new repo method routes through
`ExecuteWithMetricsAsync` via inheritance.
Tests (1009 -> 1034, +25 net new):
- `MachineTitleLookupTests` — 16 tests covering normalization
(lowercase/trim, forbidden-char substitution, blank rejection)
and parallel-array invariants (UpsertEntry replaces+appends,
RemoveEntry returns true/false correctly, lengths stay equal).
- 4 new `MachineGroundingToolTests` — lookup-hit-uses-point-read-no-
fallback, lookup-hit-but-machine-missing-falls-back, lookup-missing-
falls-back, lookup-collision-returns-first.
- 5 new `OpdbSyncServiceTests` — insert dual-writes lookup, rename
moves entry to new row, rename of last entry deletes old row,
dry-run doesn't touch lookups, lookup-upsert-throws still completes
the sync.
- `CosmosOptionsTests` — pin new container's partition key + TTL=null
decision; `IndexingPolicyContractTests` — pin new container's
selective policy.
Pre-push self-audit:
- /local-review qualitative: 0 🔴 / 1 ⚠️ / 10 categories ✅
* ⚠️ Missing TTL=null comment on machine_title_lookups + missing
pinning test — addressed in this PR before push (CosmosOptions
now has a documented "No TTL" rationale block; CosmosOptionsTests
gains Defaults_MachineTitleLookups_HasNoTtl).
- 8-item mechanical: ✅ all 8 pass; sub-rules (a) + (b) + (c) + (d) +
(e) of item 8 all explicitly satisfied for the new container.
- Build: 0/0 zero warnings as errors.
- Identity: personal noreply.
Decision-log entry under docs/decision-log.md captures the three
sub-ADR decisions (dual-write over Change-Feed for a single
projection; parallel-arrays over list-of-records for wire-format
compactness; one-way Cosmos-id-safe normalization over hashed-id
or two-keyed-storage).
This closes the Cosmos for User Delight track — all 6 PRs shipped.
The §7.1 architecture-v2 user-delight revisit triggers (200ms p95
on the getMachineByTitle path, RU cost dominance) are now both
measurable and structurally bounded.
Comment on lines
+454
to
+460
| catch (Exception ex) | ||
| { | ||
| _logger.LogWarning( | ||
| ex, | ||
| "OPDB sync: failed to clean up old title lookup row for machine {MachineId} (prior title '{PriorTitle}' → new title '{NewTitle}'). The stale entry will produce a logged-warning fallback in MachineGroundingTool until the next sync repopulates the row.", | ||
| machineId, priorTitle, newTitle); | ||
| } |
Comment on lines
+483
to
+489
| catch (Exception ex) | ||
| { | ||
| _logger.LogWarning( | ||
| ex, | ||
| "OPDB sync: failed to update title lookup row for machine {MachineId} title '{Title}'. The cross-partition fallback in MachineGroundingTool will resolve queries for this title until the next sync repopulates the lookup.", | ||
| machineId, newTitle); | ||
| } |
|
|
||
| var repo = Substitute.For<IMachineRepository>(); | ||
| // The lookup-pointed-at machine is gone (point-read returns null). | ||
| repo.GetByOpdbIdAsync("GRBN-STALE", "stern", Arg.Any<CancellationToken>()).Returns((Machine?)null); |
| // transient lookup-write failure during OPDB sync). The tool | ||
| // must still resolve the answer via the fallback path. | ||
| var lookups = Substitute.For<IMachineTitleLookupRepository>(); | ||
| lookups.GetByTitleAsync("Godzilla", Arg.Any<CancellationToken>()).Returns((MachineTitleLookup?)null); |
| _handler.SetResponseFor("/api/export", JsonArray( | ||
| MachineJson("GRBN-MQR4P", manufacturer: "Stern Pinball, Inc.", name: "Stranger Things (Pro)", commonName: "Stranger Things"))); | ||
|
|
||
| _repository.GetByOpdbIdAsync("GRBN-MQR4P", "stern", Arg.Any<CancellationToken>()).Returns((Machine?)null); |
| // Lookup repo returns null on first call (no existing row), so | ||
| // the helper creates a new row. | ||
| _titleLookups.GetByTitleAsync(Arg.Any<string>(), Arg.Any<CancellationToken>()) | ||
| .Returns((MachineTitleLookup?)null); |
| _titleLookups.GetByTitleAsync("Old Name", Arg.Any<CancellationToken>()).Returns(oldLookup); | ||
| // New lookup row doesn't exist yet. | ||
| _titleLookups.GetByTitleAsync("New Name", Arg.Any<CancellationToken>()) | ||
| .Returns((MachineTitleLookup?)null); |
| _handler.SetResponseFor("/api/export", JsonArray( | ||
| MachineJson("GRBN-MQR4P", manufacturer: "Stern Pinball, Inc.", name: "Stranger Things (Pro)", commonName: "Stranger Things"))); | ||
|
|
||
| _repository.GetByOpdbIdAsync("GRBN-MQR4P", "stern", Arg.Any<CancellationToken>()).Returns((Machine?)null); |
| _handler.SetResponseFor("/api/export", JsonArray( | ||
| MachineJson("GRBN-MQR4P", manufacturer: "Stern Pinball, Inc.", name: "Stranger Things (Pro)", commonName: "Stranger Things"))); | ||
|
|
||
| _repository.GetByOpdbIdAsync("GRBN-MQR4P", "stern", Arg.Any<CancellationToken>()).Returns((Machine?)null); |
|
|
||
| _repository.GetByOpdbIdAsync("GRBN-MQR4P", "stern", Arg.Any<CancellationToken>()).Returns((Machine?)null); | ||
| _titleLookups.GetByTitleAsync(Arg.Any<string>(), Arg.Any<CancellationToken>()) | ||
| .Returns((MachineTitleLookup?)null); |
Comment on lines
+454
to
+460
| catch (Exception ex) | ||
| { | ||
| _logger.LogWarning( | ||
| ex, | ||
| "OPDB sync: failed to clean up old title lookup row for machine {MachineId} (prior title '{PriorTitle}' → new title '{NewTitle}'). The stale entry will produce a logged-warning fallback in MachineGroundingTool until the next sync repopulates the row.", | ||
| machineId, priorTitle, newTitle); | ||
| } |
Comment on lines
+483
to
+489
| catch (Exception ex) | ||
| { | ||
| _logger.LogWarning( | ||
| ex, | ||
| "OPDB sync: failed to update title lookup row for machine {MachineId} title '{Title}'. The cross-partition fallback in MachineGroundingTool will resolve queries for this title until the next sync repopulates the lookup.", | ||
| machineId, newTitle); | ||
| } |
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
PR 5 of 6 in the Cosmos for User Delight track per ADR-0025 § 4 — the user-delight headline change and the final PR in the track. Replaces the cross-partition
STRINGEQUALSquery on the Wizard answer flow's cache-miss path with a two-point-read pattern: first the newmachine_title_lookupscontainer by normalized title, thenmachinesby(opdb_id, manufacturer). Validated against PR 4'spinwiz.cosmos.query_duration_ms{operation=query, container=machines}baseline. The existing query is retained as a logged-warning fallback so the tool degrades gracefully when the lookup is missing or stale.What lands
machine_title_lookups— partition/normalizedTitle, docid == partitionKey valuefor pure point reads, selective indexing per ADR-0025 § 3 (/id/?,/normalizedTitle/?, exclude/*), no TTL (rows bounded by OPDB catalog, refreshed every sync).MachineTitleLookup(Core) — entity withNormalizeTitlestatic helper that lowercases + trims + escapes Cosmos-id-forbidden chars (/,\,?,#→_), plusUpsertEntry/RemoveEntrykeepingOpdbIdsandManufacturersparallel arrays in lock-step.IMachineTitleLookupRepository(Application) — convenienceGetByTitleAsync/DeleteByTitleAsyncover the inheritedIRepository<T>.MachineTitleLookupRepository(Infrastructure) — extendsCosmosRepository<MachineTitleLookup>so every SDK call inherits PR 4's metering (RU + duration onpinwiz.cosmos.*taggedcontainer=machine_title_lookups+CosmosException.Diagnosticscapture on non-404 failures).OpdbSyncServicedual-write — newUpdateTitleLookupAsynchelper called inside the per-machine pass-1 loop. Rename detection capturespriorTitlebefore the merge; if normalized differs, removes entry from old row (deletes row if it becomes empty), upserts new row. Per-step exception isolation — failed lookup write logs warning, sync continues, cross-partition fallback resolves queries until the next sync.MachineGroundingTool.GetMachineByTitleAsyncrefactor — point-read first via lookup, fall back toQueryByTitleAsyncwhen lookup row is missing OR stale (lookup hit but pointed-at machine row is gone). Stale lookup logs warning; missing lookup with successful fallback logs warning (operator action: re-run OPDB sync).Test Plan
dotnet build PinballWizard.slnx -p:TreatWarningsAsErrors=true→ 0/0dotnet test PinballWizard.slnx→ 1034 passed (+25 net new)MachineTitleLookupTests— 16 tests (normalization edge cases, parallel-array invariants)MachineGroundingToolTests— point-read path, stale-lookup fallback, missing-lookup fallback, collision-returns-firstOpdbSyncServiceTests— insert dual-writes, rename moves entry, rename of last entry deletes row, dry-run doesn't touch lookups, lookup upsert throws still completes syncCosmosOptionsTests+IndexingPolicyContractTestsupdated to pin the new container's partition key, selective policy, and TTL=null decision/local-reviewqualitative: 0 🔴 / 1git log -1 --format='%an <%ae>'shows personal noreply--ensure-cosmos-containersagainst deployed Cosmos creates the new container; re-run OPDB sync against deployed Cosmos populatesmachine_title_lookups; observe in App Insights thatpinwiz.cosmos.query_duration_ms{operation=read, container=machine_title_lookups}p95 is ~5ms vs the pre-merge{operation=query, container=machines}baseline of ~50-150msOut of Scope
MachineGroundingToolfor title collisions — deferred until eval data surfaces an ambiguity case the agent can't resolve from the first entrymachines→machine_title_lookups— deferred until a 2ndmachinesmaterialized view lands (revisit trigger documented in ADR-0025 § 1)rawTitles: string[]field; defer until such a query emergesCloses the Cosmos for User Delight track
All 6 PRs in the track now shipped:
pinwiz.cosmos.*instruments +ExecuteWithMetricsAsync) ✅ mergedThe §7.1 architecture-v2 user-delight revisit triggers (200ms p95 on the
getMachineByTitlepath, RU cost dominance) are now both measurable and structurally bounded.🤖 Generated with Claude Code