Skip to content

feat(cosmos) PR 5: machine_title_lookups point-read + dual-write — user-delight headline#145

Merged
jkeeley2073 merged 1 commit into
mainfrom
Dev-Phase4Adr0025Pr5PointReadLookup
May 9, 2026
Merged

feat(cosmos) PR 5: machine_title_lookups point-read + dual-write — user-delight headline#145
jkeeley2073 merged 1 commit into
mainfrom
Dev-Phase4Adr0025Pr5PointReadLookup

Conversation

@jkeeley2073
Copy link
Copy Markdown
Contributor

Summary

PR 5 of 6 in the Cosmos for User Delight track per ADR-0025 § 4 — the user-delight headline change and the final PR in the track. Replaces the cross-partition STRINGEQUALS query on the Wizard answer flow's cache-miss path with a two-point-read pattern: first the new machine_title_lookups container by normalized title, then machines by (opdb_id, manufacturer). Validated against PR 4's pinwiz.cosmos.query_duration_ms{operation=query, container=machines} baseline. The existing query is retained as a logged-warning fallback so the tool degrades gracefully when the lookup is missing or stale.

What lands

  • New container machine_title_lookups — partition /normalizedTitle, doc id == partitionKey value for pure point reads, selective indexing per ADR-0025 § 3 (/id/?, /normalizedTitle/?, exclude /*), no TTL (rows bounded by OPDB catalog, refreshed every sync).
  • MachineTitleLookup (Core) — entity with NormalizeTitle static helper that lowercases + trims + escapes Cosmos-id-forbidden chars (/, \, ?, #_), plus UpsertEntry/RemoveEntry keeping OpdbIds and Manufacturers parallel arrays in lock-step.
  • IMachineTitleLookupRepository (Application) — convenience GetByTitleAsync / DeleteByTitleAsync over the inherited IRepository<T>.
  • MachineTitleLookupRepository (Infrastructure) — extends CosmosRepository<MachineTitleLookup> so every SDK call inherits PR 4's metering (RU + duration on pinwiz.cosmos.* tagged container=machine_title_lookups + CosmosException.Diagnostics capture on non-404 failures).
  • OpdbSyncService dual-write — new UpdateTitleLookupAsync helper called inside the per-machine pass-1 loop. Rename detection captures priorTitle before the merge; if normalized differs, removes entry from old row (deletes row if it becomes empty), upserts new row. Per-step exception isolation — failed lookup write logs warning, sync continues, cross-partition fallback resolves queries until the next sync.
  • MachineGroundingTool.GetMachineByTitleAsync refactor — point-read first via lookup, fall back to QueryByTitleAsync when lookup row is missing OR stale (lookup hit but pointed-at machine row is gone). Stale lookup logs warning; missing lookup with successful fallback logs warning (operator action: re-run OPDB sync).

Test Plan

  • dotnet build PinballWizard.slnx -p:TreatWarningsAsErrors=true → 0/0
  • dotnet test PinballWizard.slnx1034 passed (+25 net new)
  • New tests:
    • MachineTitleLookupTests — 16 tests (normalization edge cases, parallel-array invariants)
    • 4 new MachineGroundingToolTests — point-read path, stale-lookup fallback, missing-lookup fallback, collision-returns-first
    • 5 new OpdbSyncServiceTests — insert dual-writes, rename moves entry, rename of last entry deletes row, dry-run doesn't touch lookups, lookup upsert throws still completes sync
    • CosmosOptionsTests + IndexingPolicyContractTests updated to pin the new container's partition key, selective policy, and TTL=null decision
  • Pre-push self-audit:
    • Step 0 /local-review qualitative: 0 🔴 / 1 ⚠️ / 10 categories ✅ — the ⚠️ on missing TTL=null comment + test pin was surfaced and addressed in this PR before push
    • Step 1 8-item mechanical: ✅ all 8 PASS; item 8 sub-rules (a)+(b)+(c)+(d)+(e) all explicitly satisfied for the new container
  • Identity: git log -1 --format='%an <%ae>' shows personal noreply
  • Operator validation deferred to whole-sequence verification: --ensure-cosmos-containers against deployed Cosmos creates the new container; re-run OPDB sync against deployed Cosmos populates machine_title_lookups; observe in App Insights that pinwiz.cosmos.query_duration_ms{operation=read, container=machine_title_lookups} p95 is ~5ms vs the pre-merge {operation=query, container=machines} baseline of ~50-150ms

Out of Scope

  • Up-to-N matches return surface from MachineGroundingTool for title collisions — deferred until eval data surfaces an ambiguity case the agent can't resolve from the first entry
  • Change-Feed-driven projection from machinesmachine_title_lookups — deferred until a 2nd machines materialized view lands (revisit trigger documented in ADR-0025 § 1)
  • Re-keying lookups on raw (un-normalized) title for hypothetical analytic queries — would need a new rawTitles: string[] field; defer until such a query emerges

Closes the Cosmos for User Delight track

All 6 PRs in the track now shipped:

The §7.1 architecture-v2 user-delight revisit triggers (200ms p95 on the getMachineByTitle path, RU cost dominance) are now both measurable and structurally bounded.

🤖 Generated with Claude Code

…er-delight headline

PR 5 of 6 in the Cosmos for User Delight track per ADR-0025 § 4 —
the user-delight headline change and the final PR in the track.
Replaces the cross-partition `STRINGEQUALS` query on the Wizard
answer flow's cache-miss path with a two-point-read pattern: first
the new `machine_title_lookups` container by normalized title,
then `machines` by (opdb_id, manufacturer). Validated against PR 4's
`pinwiz.cosmos.query_duration_ms{operation=query, container=machines}`
baseline — the existing query is retained as a logged-warning
fallback so the tool degrades gracefully when the lookup is missing
or stale.

What lands:

- New container `machine_title_lookups` (partition `/normalizedTitle`,
  doc id == partition-key value, selective indexing per ADR-0025 § 3,
  no TTL — bounded by OPDB catalog).
- New `MachineTitleLookup` domain entity (Core) with `NormalizeTitle`
  static helper that lowercases + trims + escapes Cosmos-id-forbidden
  chars (`/`, `\`, `?`, `#` → `_`) so id == partition-key value is
  always valid; `UpsertEntry(opdbId, manufacturer)` and
  `RemoveEntry(opdbId)` mutate the parallel `OpdbIds` /
  `Manufacturers` arrays in lock-step.
- New `IMachineTitleLookupRepository` (Application) with convenience
  `GetByTitleAsync` / `DeleteByTitleAsync` that normalize the input
  title and delegate to `IRepository<T>`'s point-read methods.
- New `MachineTitleLookupRepository` (Infrastructure) extends
  `CosmosRepository<MachineTitleLookup>` so every SDK call inherits
  PR 4's metering — RU + duration land on `pinwiz.cosmos.*` tagged
  `container=machine_title_lookups`.
- `OpdbSyncService` gains `IMachineTitleLookupRepository` dependency
  + `UpdateTitleLookupAsync` helper called inside the per-machine
  pass-1 loop with rename detection (capture `priorTitle` before
  `OpdbMachineMapper.MergeOpdbFieldsInto`; if normalized prior
  title differs from new normalized title, remove entry from old
  row, delete row if it becomes empty, upsert new row). Per-step
  exception isolation — a failed lookup write logs at warning and
  the sync continues; the machine row already landed and the
  cross-partition fallback resolves queries until the next sync.
- `MachineGroundingTool.GetMachineByTitleAsync` refactored to
  point-read first via lookup, fall back to `QueryByTitleAsync`
  when the lookup row is missing OR stale (lookup hit but the
  pointed-at machine row is gone). Stale lookup logs warning;
  missing lookup with successful fallback logs warning so operators
  see the gap (operator action: re-run OPDB sync). Missing lookup
  with no fallback match returns null silently (normal "no such
  machine" flow).
- Five-layer enforcement scaffolding self-references: ADR-0025 § 4
  is the canonical spec; CLAUDE.md item 8(e) (PR 4 wording) is
  satisfied because every new repo method routes through
  `ExecuteWithMetricsAsync` via inheritance.

Tests (1009 -> 1034, +25 net new):

- `MachineTitleLookupTests` — 16 tests covering normalization
  (lowercase/trim, forbidden-char substitution, blank rejection)
  and parallel-array invariants (UpsertEntry replaces+appends,
  RemoveEntry returns true/false correctly, lengths stay equal).
- 4 new `MachineGroundingToolTests` — lookup-hit-uses-point-read-no-
  fallback, lookup-hit-but-machine-missing-falls-back, lookup-missing-
  falls-back, lookup-collision-returns-first.
- 5 new `OpdbSyncServiceTests` — insert dual-writes lookup, rename
  moves entry to new row, rename of last entry deletes old row,
  dry-run doesn't touch lookups, lookup-upsert-throws still completes
  the sync.
- `CosmosOptionsTests` — pin new container's partition key + TTL=null
  decision; `IndexingPolicyContractTests` — pin new container's
  selective policy.

Pre-push self-audit:

- /local-review qualitative: 0 🔴 / 1 ⚠️ / 10 categories ✅
    * ⚠️ Missing TTL=null comment on machine_title_lookups + missing
      pinning test — addressed in this PR before push (CosmosOptions
      now has a documented "No TTL" rationale block; CosmosOptionsTests
      gains Defaults_MachineTitleLookups_HasNoTtl).
- 8-item mechanical: ✅ all 8 pass; sub-rules (a) + (b) + (c) + (d) +
  (e) of item 8 all explicitly satisfied for the new container.
- Build: 0/0 zero warnings as errors.
- Identity: personal noreply.

Decision-log entry under docs/decision-log.md captures the three
sub-ADR decisions (dual-write over Change-Feed for a single
projection; parallel-arrays over list-of-records for wire-format
compactness; one-way Cosmos-id-safe normalization over hashed-id
or two-keyed-storage).

This closes the Cosmos for User Delight track — all 6 PRs shipped.
The §7.1 architecture-v2 user-delight revisit triggers (200ms p95
on the getMachineByTitle path, RU cost dominance) are now both
measurable and structurally bounded.
@jkeeley2073 jkeeley2073 added the claude-code Generated with Claude Code label May 9, 2026
@jkeeley2073 jkeeley2073 enabled auto-merge May 9, 2026 14:43
Comment on lines +454 to +460
catch (Exception ex)
{
_logger.LogWarning(
ex,
"OPDB sync: failed to clean up old title lookup row for machine {MachineId} (prior title '{PriorTitle}' → new title '{NewTitle}'). The stale entry will produce a logged-warning fallback in MachineGroundingTool until the next sync repopulates the row.",
machineId, priorTitle, newTitle);
}
Comment on lines +483 to +489
catch (Exception ex)
{
_logger.LogWarning(
ex,
"OPDB sync: failed to update title lookup row for machine {MachineId} title '{Title}'. The cross-partition fallback in MachineGroundingTool will resolve queries for this title until the next sync repopulates the lookup.",
machineId, newTitle);
}

var repo = Substitute.For<IMachineRepository>();
// The lookup-pointed-at machine is gone (point-read returns null).
repo.GetByOpdbIdAsync("GRBN-STALE", "stern", Arg.Any<CancellationToken>()).Returns((Machine?)null);
// transient lookup-write failure during OPDB sync). The tool
// must still resolve the answer via the fallback path.
var lookups = Substitute.For<IMachineTitleLookupRepository>();
lookups.GetByTitleAsync("Godzilla", Arg.Any<CancellationToken>()).Returns((MachineTitleLookup?)null);
_handler.SetResponseFor("/api/export", JsonArray(
MachineJson("GRBN-MQR4P", manufacturer: "Stern Pinball, Inc.", name: "Stranger Things (Pro)", commonName: "Stranger Things")));

_repository.GetByOpdbIdAsync("GRBN-MQR4P", "stern", Arg.Any<CancellationToken>()).Returns((Machine?)null);
// Lookup repo returns null on first call (no existing row), so
// the helper creates a new row.
_titleLookups.GetByTitleAsync(Arg.Any<string>(), Arg.Any<CancellationToken>())
.Returns((MachineTitleLookup?)null);
_titleLookups.GetByTitleAsync("Old Name", Arg.Any<CancellationToken>()).Returns(oldLookup);
// New lookup row doesn't exist yet.
_titleLookups.GetByTitleAsync("New Name", Arg.Any<CancellationToken>())
.Returns((MachineTitleLookup?)null);
_handler.SetResponseFor("/api/export", JsonArray(
MachineJson("GRBN-MQR4P", manufacturer: "Stern Pinball, Inc.", name: "Stranger Things (Pro)", commonName: "Stranger Things")));

_repository.GetByOpdbIdAsync("GRBN-MQR4P", "stern", Arg.Any<CancellationToken>()).Returns((Machine?)null);
_handler.SetResponseFor("/api/export", JsonArray(
MachineJson("GRBN-MQR4P", manufacturer: "Stern Pinball, Inc.", name: "Stranger Things (Pro)", commonName: "Stranger Things")));

_repository.GetByOpdbIdAsync("GRBN-MQR4P", "stern", Arg.Any<CancellationToken>()).Returns((Machine?)null);

_repository.GetByOpdbIdAsync("GRBN-MQR4P", "stern", Arg.Any<CancellationToken>()).Returns((Machine?)null);
_titleLookups.GetByTitleAsync(Arg.Any<string>(), Arg.Any<CancellationToken>())
.Returns((MachineTitleLookup?)null);
Comment on lines +454 to +460
catch (Exception ex)
{
_logger.LogWarning(
ex,
"OPDB sync: failed to clean up old title lookup row for machine {MachineId} (prior title '{PriorTitle}' → new title '{NewTitle}'). The stale entry will produce a logged-warning fallback in MachineGroundingTool until the next sync repopulates the row.",
machineId, priorTitle, newTitle);
}
Comment on lines +483 to +489
catch (Exception ex)
{
_logger.LogWarning(
ex,
"OPDB sync: failed to update title lookup row for machine {MachineId} title '{Title}'. The cross-partition fallback in MachineGroundingTool will resolve queries for this title until the next sync repopulates the lookup.",
machineId, newTitle);
}
@jkeeley2073 jkeeley2073 merged commit de2e6fb into main May 9, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

claude-code Generated with Claude Code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants