feat(cosmos) PR 2: CosmosClientOptions wins + warmup + health check + ApplicationName per host #140
Merged
… ApplicationName per host

PR 2 of 6 in the Cosmos for User Delight track per ADR-0025 (plan: ~/.claude/plans/lets-take-some-time-ticklish-storm.md). PR 1 (#139) shipped the ADR + 5-layer enforcement scaffolding; this PR implements the locked CosmosClientOptions posture + host-startup wiring (no behavioral change to user-facing query paths; PR 5 is the user-delight headline change).

What lands:
- CosmosClientOptions changes (per ADR-0025 § 2):
  - EnableContentResponseOnWrite = false: saves one round-trip + ~1 RU per write. Required pre-flight: refactored CosmosRepository<T>.UpsertAsync to return the input entity directly instead of response.Resource (which is null when the option is off). Audited every IRepository<T>.UpsertAsync caller; none consume response.Resource. Updated IRepository<T> XML doc + the affected CosmosRepositoryTests case to reflect the new contract.
  - AllowBulkExecution = true: auto-batches concurrent same-partition operations. Zero risk for current single-op call sites; meaningful win for OPDB sync (~2,400 sequential upserts) and the future Phase 1 → Cosmos backfill.
- MachineRepository.QueryByTitleAsync: refactored to use Container.GetItemQueryIterator directly with MaxItemCount = 1 (the caller breaks on first match). Comment cross-references ADR-0025 § 4 bridge state; PR 5 replaces the cross-partition query with a point-read against machine_title_lookups.
- New CosmosClientWarmupHostedService (BackgroundService): calls CosmosClient.ReadAccountAsync() at host startup to amortize the SDK's lazy-connection cost (~300-500ms) off the first user query. A failure logs a Warning instead of throwing (warmup is a latency optimization, not a hard dependency).
- New CosmosHealthCheck (IHealthCheck): probes the machines container via Container.ReadContainerAsync as a canary. Tagged 'live' so /healthz reports Cosmos reachability for ACA / Aspire liveness probes. On CosmosException, captures Diagnostics into health-check data so operators see region + retry count + RU consumed without a separate trace lookup.
- Cosmos:ApplicationName per host: pinwiz-cli (CLI) + pinwiz-rag-worker (RagIngestionWorker). Distinguishes hosts in Cosmos diagnostics + custom-metric tagging without changing client behavior.
- New Microsoft.Extensions.Diagnostics.HealthChecks 10.0.7 package in Directory.Packages.props + Infrastructure.csproj. Narrower than taking a Microsoft.AspNetCore.App framework reference; gives just the IHealthCheck abstractions + AddCheck<T> registration extension the Infrastructure layer needs.

Tests (979 -> 987, +8):
- CosmosClientOptionsTests (NEW, 2 tests): pins all 6 ADR-0025 § 2 client options via real DI registration (catches wiring drift, not just config drift). Includes the null-safe ApplicationName path pinning the SDK constraint that empty strings are rejected.
- CosmosHealthCheckTests (NEW, 6 tests): healthy / CosmosException with diagnostics data keys / generic exception / cancellation propagation / 2 ctor null guards.
- CosmosRepositoryTests.UpsertAsync_PassesEntityAndPartitionKey_ReturnsInputEntity (UPDATED): asserts Assert.Same(entity, result) AND null ETag to prove the new contract returns the input instance, not the persisted body.

Pre-push self-audit:
- /local-review qualitative: ✅ 0 🔴 / 0 ⚠️ / 7 categories ✅
- 8-item mechanical (now includes Cosmos surface conformance from PR 1): ✅ all 8 items pass
- Build: 0/0 zero warnings as errors
- Identity: personal noreply

Per ADR-0025 § Architectural style, this PR applies "Cosmos document store + targeted CQRS materialized views; NOT full event sourcing" to the client-options layer specifically; the materialized-view container (machine_title_lookups) lands in PR 5.
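The client-options posture, the new upsert contract, and the first-match query described above can be sketched roughly as follows. This is a minimal illustration that assumes the Microsoft.Azure.Cosmos v3 package, not the PR's actual code: the class name, the method names, the query text, and the `c.title` property are all hypothetical, and the real change wires the options through DI registration.

```csharp
#nullable enable
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Hypothetical helper class; every name here is illustrative only.
public static class CosmosClientOptionsSketch
{
    // Builds options with the two flags the PR turns on.
    public static CosmosClientOptions Build(string? applicationName)
    {
        var options = new CosmosClientOptions
        {
            // Write responses no longer carry the document body,
            // so response.Resource is null after an upsert.
            EnableContentResponseOnWrite = false,
            // Lets the SDK group concurrent operations into batched requests.
            AllowBulkExecution = true,
        };

        // Assign the name only when one is configured; the PR's tests pin
        // that the SDK rejects empty strings.
        if (!string.IsNullOrWhiteSpace(applicationName))
        {
            options.ApplicationName = applicationName;
        }

        return options;
    }

    // New contract: hand back the caller's instance, not response.Resource.
    public static async Task<T> UpsertAndReturnInputAsync<T>(
        Container container, T entity, PartitionKey partitionKey,
        CancellationToken ct = default)
    {
        await container.UpsertItemAsync(entity, partitionKey, cancellationToken: ct);
        return entity;
    }

    // First-match query with a page size of one (assumed query shape).
    public static async Task<T?> FirstByTitleAsync<T>(
        Container container, string title, CancellationToken ct = default)
        where T : class
    {
        QueryDefinition query =
            new QueryDefinition("SELECT * FROM c WHERE c.title = @title")
                .WithParameter("@title", title);
        var requestOptions = new QueryRequestOptions { MaxItemCount = 1 };

        using FeedIterator<T> iterator =
            container.GetItemQueryIterator<T>(query, requestOptions: requestOptions);
        while (iterator.HasMoreResults)
        {
            FeedResponse<T> page = await iterator.ReadNextAsync(ct);
            foreach (T item in page)
            {
                return item; // stop at the first match
            }
        }

        return null;
    }
}
```

Returning the input instance is what makes the `Assert.Same(entity, result)` check in the updated repository test meaningful: server-populated fields such as the ETag are not reflected on the returned object.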
Comment on lines +58 to +65
catch (Exception ex)
{
    stopwatch.Stop();
    _logger.LogWarning(
        ex,
        "Cosmos client warmup failed after {DurationMs:F0}ms. The first user query will pay the lazy-connection cost. Health check (`/healthz`) will surface persistent unreachability.",
        stopwatch.Elapsed.TotalMilliseconds);
}
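For context, the catch block under review sits inside a warmup service that might look roughly like the sketch below. It assumes the Microsoft.Azure.Cosmos v3 and Microsoft.Extensions.Hosting packages. The class name and the success log message are hypothetical; only the `ReadAccountAsync()` call and the warn-instead-of-throw behavior come from the PR description.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

// Hypothetical stand-in for CosmosClientWarmupHostedService.
public sealed class CosmosWarmupSketch : BackgroundService
{
    private readonly CosmosClient _client;
    private readonly ILogger<CosmosWarmupSketch> _logger;

    public CosmosWarmupSketch(CosmosClient client, ILogger<CosmosWarmupSketch> logger)
    {
        _client = client ?? throw new ArgumentNullException(nameof(client));
        _logger = logger ?? throw new ArgumentNullException(nameof(logger));
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            // Forces the SDK to open its connections now rather than on the
            // first user query. ReadAccountAsync takes no cancellation token.
            await _client.ReadAccountAsync();
            stopwatch.Stop();
            _logger.LogInformation(
                "Cosmos client warmup completed in {DurationMs:F0}ms.",
                stopwatch.Elapsed.TotalMilliseconds);
        }
        catch (Exception ex)
        {
            // Warmup is a latency optimization, so a failure is logged
            // and swallowed instead of taking the host down.
            stopwatch.Stop();
            _logger.LogWarning(
                ex,
                "Cosmos client warmup failed after {DurationMs:F0}ms.",
                stopwatch.Elapsed.TotalMilliseconds);
        }
    }
}
```

A service like this would typically be registered with `services.AddHostedService<CosmosWarmupSketch>()`. Running it as a `BackgroundService` means a slow or failing warmup does not block the rest of host startup.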
Comment on lines +80 to +83
catch (Exception ex)
{
    return HealthCheckResult.Unhealthy($"Cosmos unreachable: {ex.Message}", exception: ex);
}
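The generic catch above is the last of several paths the PR's tests enumerate (healthy, CosmosException with diagnostics data, generic exception, cancellation propagation). A sketch of how those paths could fit together is shown below, assuming the Microsoft.Azure.Cosmos v3 and Microsoft.Extensions.Diagnostics.HealthChecks packages. The class name, the "Cosmos reachable" message, and the data-key names are hypothetical; the PR only states that Diagnostics is captured into the health-check data.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical stand-in for CosmosHealthCheck.
public sealed class CosmosHealthCheckSketch : IHealthCheck
{
    private readonly Container _container;

    public CosmosHealthCheckSketch(Container container) =>
        _container = container ?? throw new ArgumentNullException(nameof(container));

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        try
        {
            // Cheap metadata read used as the reachability canary.
            await _container.ReadContainerAsync(cancellationToken: cancellationToken);
            return HealthCheckResult.Healthy("Cosmos reachable");
        }
        catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
        {
            // Let caller-requested cancellation propagate rather than
            // reporting it as an unhealthy dependency.
            throw;
        }
        catch (CosmosException ex)
        {
            // Key names are illustrative; the diagnostics string carries
            // the region and retry details.
            var data = new Dictionary<string, object>
            {
                ["statusCode"] = (int)ex.StatusCode,
                ["requestCharge"] = ex.RequestCharge,
                ["diagnostics"] = ex.Diagnostics?.ToString() ?? string.Empty,
            };
            return HealthCheckResult.Unhealthy(
                $"Cosmos unreachable: {ex.Message}", ex, data);
        }
        catch (Exception ex)
        {
            return HealthCheckResult.Unhealthy(
                $"Cosmos unreachable: {ex.Message}", exception: ex);
        }
    }
}
```

The order of the catch clauses matters: the cancellation filter and the `CosmosException` clause must come before the generic `Exception` clause, otherwise the generic clause would absorb both.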
Summary
PR 2 of 6 in the Cosmos for User Delight track per ADR-0025. PR 1 (#139) shipped the ADR + 5-layer enforcement scaffolding; this PR implements the locked CosmosClientOptions posture + host-startup wiring. No behavioral change to user-facing query paths — PR 5 is the user-delight headline change.
What lands
CosmosClientOptions changes (per ADR-0025 § 2):
- EnableContentResponseOnWrite = false: saves one round-trip + ~1 RU per write. Required pre-flight: refactored CosmosRepository<T>.UpsertAsync to return the input entity directly. Audited every IRepository<T>.UpsertAsync caller; none consume response.Resource. Updated IRepository<T> XML doc + CosmosRepositoryTests for the new contract.
- AllowBulkExecution = true: auto-batches concurrent same-partition operations. Zero risk for current single-op call sites; meaningful win for OPDB sync (~2,400 sequential upserts) and the future Phase 1 → Cosmos backfill.
Hot-path query tuning:
- MachineRepository.QueryByTitleAsync refactored to use Container.GetItemQueryIterator directly with MaxItemCount = 1 (the caller breaks on first match). Comment cross-references ADR-0025 § 4 bridge state; PR 5 replaces the cross-partition query with a point-read.
Host-startup posture:
- CosmosClientWarmupHostedService: BackgroundService calling CosmosClient.ReadAccountAsync() at host startup to amortize the SDK's lazy-connection cost (~300-500ms) off the first user query.
- CosmosHealthCheck: IHealthCheck probing the machines container as a canary. Tagged live so /healthz reports Cosmos reachability for ACA / Aspire liveness probes. On CosmosException, captures Diagnostics into health-check data (region, retry count, RU consumed) so operators see context without a separate trace lookup.
Per-host identity:
- Cosmos:ApplicationName = pinwiz-cli (CLI) and pinwiz-rag-worker (RagIngestionWorker). Distinguishes hosts in Cosmos diagnostics + custom-metric tagging.
Package add:
- Microsoft.Extensions.Diagnostics.HealthChecks 10.0.7: narrower than taking a Microsoft.AspNetCore.App framework reference; gives just the abstractions + AddCheck<T> extension the Infrastructure layer needs.
Tests (979 → 987, +8)
- CosmosClientOptionsTests (2 tests): pins all 6 ADR-0025 § 2 client options via real DI registration (catches wiring drift, not just config drift). Includes the null-safe ApplicationName path pinning the SDK constraint that empty strings are rejected.
- CosmosHealthCheckTests (6 tests): healthy / CosmosException with diagnostics data keys / generic exception / cancellation propagation / 2 ctor null guards.
- CosmosRepositoryTests.UpsertAsync_PassesEntityAndPartitionKey_ReturnsInputEntity: asserts Assert.Same(entity, result) AND null ETag to prove the new contract.
Pre-push self-audit
/local-review): ✅ 0 🔴 / 0
Test plan
- dotnet build PinballWizard.slnx -p:TreatWarningsAsErrors=true: 0/0 warnings
- dotnet test PinballWizard.slnx: 987/987 passing
- No IRepository<T>.UpsertAsync callers consume response.Resource
- Cosmos:ApplicationName shows up in Cosmos diagnostic logs as pinwiz-rag-worker once the Container App image is rebuilt + swapped
Track progress
- rag_dead_letters
- pinwiz.cosmos.* instruments + MeteredCosmosRepository<T> decorator (gates PR 5)
🤖 Generated with Claude Code