feat(cosmos) PR 3: selective indexing on rag_index_state + rag_dead_letters #142
Merged
PR 3 of 6 in the Cosmos for User Delight track per ADR-0025
(plan: ~/.claude/plans/lets-take-some-time-ticklish-storm.md).
Adds an optional CosmosIndexingPolicyOptions to CosmosContainerOptions
and applies selective indexing to two write-heavy RAG containers,
saving RU on every upsert without affecting any existing read path.
What lands:
- CosmosIndexingPolicyOptions (new) — IncludedPaths + ExcludedPaths
shape on the existing CosmosContainerOptions; null = default policy
(all-paths). Cosmos requires at least one included path when any
path is set; the provisioners pass paths through unmodified, so an
empty IncludedPaths would surface as a Cosmos create-time rejection.
- Per-container defaults (per ADR-0025 § 3):
* rag_index_state -> include /id/?, /document_id/?, /recorded_utc/?
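and exclude /*. recorded_utc is load-bearing because the
reconciler (CosmosAiSearchRagReconciler) issues
`SELECT TOP @n * FROM c ORDER BY c.recorded_utc DESC`. Cosmos
needs a range index on the ORDER BY path to serve that query; per
the Cosmos indexing-policy documentation, a single-property
ORDER BY on an unindexed path fails rather than falling back to a
scan, so excluding it would break every reconcile cycle.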
* rag_dead_letters -> include /id/?, /document_id/?, /attempt_count/?,
/last_attempt_utc/? and exclude /*. SDK access is point-reads
only; the remaining indexed paths support operator queries in
Data Explorer when triaging failed deliveries.
* rag_leases -> NO override (intentionally left at default). The
lease container is owned by Cosmos.ChangeFeedProcessor and its
query surface is SDK-internal; selective indexing here would
manifest as a silent perf regression if a future SDK version
added queries against fields we excluded.
- ArmCosmosProvisioner — applies CosmosDBIndexingPolicy on container
create (IndexingMode = Consistent, IsAutomatic = true). On existing
containers, drift logs LogWarning (not throw) per ADR-0025 § 3 —
re-applying a policy is metadata-only and Cosmos re-indexes in the
background; partition-key drift remains fatal because it would
silently misroute writes.
- DataPlaneCosmosProvisioner — same drift-warning semantics. Both
provisioners share an identical IndexingPolicyMatches contract:
ordinal-sorted SequenceEqual on included + excluded paths.
- IndexingPolicyContractTests (NEW, 6 tests) — pins each container's
policy decision (selective / default / explicitly-not-overridden)
with inline rationale citing ADR-0025 § 3 and the actual code path
that drives each include/exclude choice.
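The shared IndexingPolicyMatches contract described above (ordinal-sorted SequenceEqual on included + excluded paths) can be sketched as follows. This is a minimal illustration, not the PR's code: `IndexingPolicySketch`, `IndexingPolicySketches`, and `Matches` are hypothetical names, and the real `CosmosIndexingPolicyOptions` definition is not shown in this PR text. Only the path strings are taken from the per-container defaults listed above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for CosmosIndexingPolicyOptions; the real shape is not shown in the PR text.
public sealed record IndexingPolicySketch(
    IReadOnlyList<string> IncludedPaths,
    IReadOnlyList<string> ExcludedPaths);

public static class IndexingPolicySketches
{
    // Paths copied from the rag_index_state default listed above.
    public static readonly IndexingPolicySketch RagIndexState = new(
        new[] { "/id/?", "/document_id/?", "/recorded_utc/?" },
        new[] { "/*" });

    // Paths copied from the rag_dead_letters default listed above.
    public static readonly IndexingPolicySketch RagDeadLetters = new(
        new[] { "/id/?", "/document_id/?", "/attempt_count/?", "/last_attempt_utc/?" },
        new[] { "/*" });

    // Order-insensitive comparison: sort each path list ordinally, then compare element by element.
    public static bool Matches(IndexingPolicySketch desired, IndexingPolicySketch actual) =>
        SortedEqual(desired.IncludedPaths, actual.IncludedPaths) &&
        SortedEqual(desired.ExcludedPaths, actual.ExcludedPaths);

    private static bool SortedEqual(IEnumerable<string> a, IEnumerable<string> b) =>
        a.OrderBy(p => p, StringComparer.Ordinal)
         .SequenceEqual(b.OrderBy(p => p, StringComparer.Ordinal), StringComparer.Ordinal);
}
```

Sorting before comparing means a policy read back with its paths in a different order does not register as drift. One caveat, stated as an assumption rather than a description of this PR: Cosmos commonly reports a system-added `/"_etag"/?` excluded path on read-back, so a real comparison may need to account for it.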
Three deviations from the plan, all driven by reading the actual code:
1. rag_leases left at default (plan said selective). Reason: SDK-
managed lease container; query surface is opaque.
2. rag_index_state indexes /recorded_utc/? not /contentHash/?. Reason:
no `contentHash` JSON path exists (the field is `last_indexed_hash`,
never queried). The reconciler's actual `ORDER BY c.recorded_utc
DESC` query is what dictates the index.
3. rag_dead_letters uses snake_case /attempt_count/? + /last_attempt_utc/?
not PascalCase /AttemptCount/? + /createdAt/?. Reason: Cosmos
indexes JSON-on-the-wire paths; DeadLetterDocument's
[JsonPropertyName] attributes emit snake_case.
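Deviation 3 rests on the fact that index paths must match the serialized JSON names, not the C# property names. A minimal sketch of that check, assuming System.Text.Json serialization: `DeadLetterSketch` and `WirePathSketch` are hypothetical stand-ins, and only the two snake_case names come from this PR text.

```csharp
using System;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical stand-in for DeadLetterDocument; the real class is not shown in the PR text.
public sealed class DeadLetterSketch
{
    [JsonPropertyName("attempt_count")]
    public int AttemptCount { get; set; }

    [JsonPropertyName("last_attempt_utc")]
    public DateTimeOffset LastAttemptUtc { get; set; }
}

public static class WirePathSketch
{
    // Serializes a sample document and formats each top-level JSON property name
    // as a Cosmos scalar index path ("/name/?").
    public static string[] IndexPaths()
    {
        string json = JsonSerializer.Serialize(new DeadLetterSketch { AttemptCount = 3 });
        using JsonDocument doc = JsonDocument.Parse(json);
        return doc.RootElement.EnumerateObject()
                  .Select(p => $"/{p.Name}/?")
                  .ToArray();
    }
}
```

The returned paths include `/attempt_count/?` and `/last_attempt_utc/?` and no PascalCase variants, so an included path such as `/AttemptCount/?` would never match a stored document.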
Tests: 987 -> 993 (+6, all in IndexingPolicyContractTests).
Pre-push self-audit:
- /local-review qualitative: 0 🔴 / 0 ⚠️ (1 surfaced + fixed) / 11 categories ✅
- 8-item mechanical (now includes Cosmos surface conformance from PR 1):
✅ all 8 items pass; item 8 (e) `MeteredCosmosRepository<T>` wrap
is N/A here — the decorator lands in PR 4.
- Build: 0/0 (clean) with warnings treated as errors
- Identity: personal noreply
Per ADR-0025 § Architectural style this PR adds another increment of
the "Cosmos document store + targeted CQRS materialized views; NOT
full event sourcing" posture: the write-heavy projection containers
(rag_index_state, rag_dead_letters) get RU-discipline tuning matched
to their actual read patterns rather than over-indexing for hypothetical
future query shapes.
Summary
PR 3 of 6 in the Cosmos for User Delight track per ADR-0025. Adds an optional `CosmosIndexingPolicyOptions` shape to `CosmosContainerOptions` and applies selective indexing to two write-heavy RAG containers, saving RU on every upsert without affecting any existing read path.

Per-container decisions (per ADR-0025 § 3):
- `rag_index_state`: include `/id/?`, `/document_id/?`, `/recorded_utc/?`; exclude `/*`. `recorded_utc` is load-bearing because the reconciler issues `SELECT TOP @n * FROM c ORDER BY c.recorded_utc DESC`.
- `rag_dead_letters`: include `/id/?`, `/document_id/?`, `/attempt_count/?`, `/last_attempt_utc/?`; exclude `/*`.
- `rag_leases`: no override (left at default). Owned by `Cosmos.ChangeFeedProcessor`; query surface is SDK-internal, so selective indexing would risk a silent perf regression on a future SDK version.
- `machines`, `ingestion_sources`, `scraped_documents`: …

Drift posture: indexing-policy drift on existing containers logs `LogWarning` (not throw) per ADR-0025 § 3. Re-applying a policy is metadata-only and Cosmos re-indexes in the background. Partition-key drift remains fatal because it would silently misroute writes.

Three deviations from the plan, driven by reading the actual code
1. `rag_leases` left at default indexing (plan said selective). Reason: SDK-managed lease container; query surface is opaque.
2. `rag_index_state` indexes `/recorded_utc/?` not `/contentHash/?`. Reason: no `contentHash` JSON property exists (the field is `last_indexed_hash` and is never queried); the reconciler's actual `ORDER BY c.recorded_utc DESC` is what dictates the index.
3. `rag_dead_letters` uses snake_case `/attempt_count/?` + `/last_attempt_utc/?` not PascalCase `/AttemptCount/?` + `/createdAt/?`. Reason: Cosmos indexes JSON-on-the-wire paths; `DeadLetterDocument`'s `[JsonPropertyName]` attributes emit snake_case.

Test Plan
- `dotnet build PinballWizard.slnx -p:TreatWarningsAsErrors=true` → 0/0 (clean)
- `dotnet test PinballWizard.slnx` → 993 passed (+6 new in `IndexingPolicyContractTests`; was 987 after PR 2)
- `IndexingPolicyContractTests` (new): 6 tests pinning per-container policy decisions with inline rationale citing ADR-0025 § 3 and the code paths that drive each include/exclude choice
- `/local-review` qualitative: 0 🔴 / 0 ⚠️ (1 surfaced + fixed) / 11 categories ✅
- 8-item mechanical: all 8 items pass; item 8 (e) `MeteredCosmosRepository<T>` wrap N/A, decorator lands in PR 4
- `git log -1 --format='%an <%ae>'` shows personal noreply
- Run `--ensure-cosmos-containers` against deployed Cosmos and inspect the two RAG containers in Azure Portal Data Explorer for the selective indexing JSON

Out of Scope
- `MeteredCosmosRepository<T>` decorator and `pinwiz.cosmos.*` instruments: PR 4
- … (`machine_title_lookups` container): PR 5
- … `rag_dead_letters`: PR 6 (independent of this PR; can ship in parallel)

🤖 Generated with Claude Code