feat(cosmos) PR 3: selective indexing on rag_index_state + rag_dead_letters #142

Merged: jkeeley2073 merged 1 commit into main from Dev-Phase4Adr0025Pr3SelectiveIndexing on May 9, 2026

@jkeeley2073 (Contributor)

Summary

PR 3 of 6 in the Cosmos for User Delight track per ADR-0025. Adds an optional CosmosIndexingPolicyOptions shape to CosmosContainerOptions and applies selective indexing to two write-heavy RAG containers — saving RU on every upsert without affecting any existing read path.

Per-container decisions (per ADR-0025 § 3):

| Container | Policy | Rationale |
| --- | --- | --- |
| rag_index_state | include `/id/?`, `/document_id/?`, `/recorded_utc/?`; exclude `/*` | recorded_utc is load-bearing because the reconciler issues `SELECT TOP @n * FROM c ORDER BY c.recorded_utc DESC` |
| rag_dead_letters | include `/id/?`, `/document_id/?`, `/attempt_count/?`, `/last_attempt_utc/?`; exclude `/*` | SDK access is point-reads only; remaining paths support operator queries in Data Explorer |
| rag_leases | (none; default policy) | Owned by Cosmos.ChangeFeedProcessor; query surface is SDK-internal, so selective indexing would risk a silent perf regression on a future SDK version |
| machines, ingestion_sources, scraped_documents | (none; default policy) | Read-side query patterns still being tuned |
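
For operators who will later inspect the containers in Data Explorer, the selective policies above should correspond to an indexing policy document shaped roughly like the one this sketch produces. It is a Python illustration, not the project's C# code: `build_indexing_policy` and the two constants are hypothetical names, while the JSON field names follow the Cosmos DB indexing policy format and the paths come from the per-container decisions in this PR. The portal may also show system-managed entries that are not listed here.

```python
import json


def build_indexing_policy(included, excluded):
    """Return a Cosmos-style indexing policy document for the given paths.

    Hypothetical helper for illustration; paths are passed through unmodified.
    """
    return {
        "indexingMode": "consistent",
        "automatic": True,
        "includedPaths": [{"path": p} for p in included],
        "excludedPaths": [{"path": p} for p in excluded],
    }


# Paths taken from the per-container decisions in this PR.
RAG_INDEX_STATE = build_indexing_policy(
    ["/id/?", "/document_id/?", "/recorded_utc/?"],
    ["/*"],
)
RAG_DEAD_LETTERS = build_indexing_policy(
    ["/id/?", "/document_id/?", "/attempt_count/?", "/last_attempt_utc/?"],
    ["/*"],
)

if __name__ == "__main__":
    # Print the document an operator could compare against Data Explorer.
    print(json.dumps(RAG_INDEX_STATE, indent=2))
```

Excluding `/*` and opting specific scalar paths back in is what keeps every other property out of the index on each upsert.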

Drift posture: indexing-policy drift on existing containers logs LogWarning (not throw) per ADR-0025 § 3 — re-applying a policy is metadata-only and Cosmos re-indexes in the background. Partition-key drift remains fatal because it would silently misroute writes.
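
The warn-versus-fail split can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the provisioners' C# code: the function names, the dictionary shape, the logger name, and the `/document_id` partition key are all hypothetical. The path comparison mirrors the contract described in the commit message (ordinal-sorted equality on included and excluded paths), and the sketch assumes both sides carry an explicit policy.

```python
import logging

logger = logging.getLogger("cosmos.provisioner")  # hypothetical logger name


class PartitionKeyDriftError(Exception):
    """Raised when an existing container's partition key differs from the desired one."""


def indexing_policy_matches(existing, desired):
    """Compare included and excluded paths, ignoring declaration order."""
    return (
        sorted(existing["included"]) == sorted(desired["included"])
        and sorted(existing["excluded"]) == sorted(desired["excluded"])
    )


def check_drift(container, existing, desired):
    """Return True when the indexing policy matches, False when it has drifted.

    Partition-key drift is fatal and raises; indexing drift only logs a warning.
    """
    if existing["partition_key"] != desired["partition_key"]:
        raise PartitionKeyDriftError(
            f"{container}: partition key {existing['partition_key']!r} "
            f"does not match {desired['partition_key']!r}"
        )
    if not indexing_policy_matches(existing, desired):
        logger.warning("%s: indexing policy differs from configured policy", container)
        return False
    return True


# Hypothetical container definitions; the partition key path is an assumption.
desired = {
    "partition_key": "/document_id",
    "included": ["/id/?", "/document_id/?", "/recorded_utc/?"],
    "excluded": ["/*"],
}
deployed_default = {"partition_key": "/document_id", "included": ["/*"], "excluded": []}
```

Calling `check_drift("rag_index_state", deployed_default, desired)` returns `False` after logging a warning, which matches the posture above: the gap is surfaced to operators and nothing is rewritten automatically.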

Three deviations from the plan, driven by reading the actual code

  1. rag_leases left at default indexing (plan said selective). Reason: SDK-managed lease container; query surface is opaque.
  2. rag_index_state indexes /recorded_utc/? not /contentHash/?. Reason: no `contentHash` JSON property exists (the field is `last_indexed_hash` and is never queried); the reconciler's actual `ORDER BY c.recorded_utc DESC` is what dictates the index.
  3. rag_dead_letters uses snake_case /attempt_count/? + /last_attempt_utc/? not PascalCase /AttemptCount/? + /createdAt/?. Reason: Cosmos indexes JSON-on-the-wire paths; DeadLetterDocument's [JsonPropertyName] attributes emit snake_case.
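
Deviations 2 and 3 share one cause: an index path only helps if it names a property that exists in the serialized JSON. A small check like the one below would catch that mismatch. It is a hedged Python sketch rather than anything in this PR: both helpers and the sample document values are hypothetical, and only the simple `/<name>/?` path shape used here is handled.

```python
def top_level_property(path):
    """Extract the property name from a scalar index path such as '/attempt_count/?'."""
    parts = path.strip("/").split("/")
    # Only the simple '/<name>/?' shape used in this PR is supported.
    if len(parts) != 2 or parts[1] != "?":
        raise ValueError(f"unsupported path shape: {path}")
    return parts[0]


def paths_missing_on_wire(document, include_paths):
    """Return the include paths whose property is absent from the serialized document."""
    return [p for p in include_paths if top_level_property(p) not in document]


# Hypothetical dead-letter document as it would appear on the wire (snake_case keys).
sample = {
    "id": "dl-1",
    "document_id": "doc-1",
    "attempt_count": 3,
    "last_attempt_utc": "2026-05-09T13:00:00Z",
}

planned = ["/AttemptCount/?", "/createdAt/?"]
shipped = ["/id/?", "/document_id/?", "/attempt_count/?", "/last_attempt_utc/?"]
```

Against this sample, `paths_missing_on_wire(sample, planned)` returns both planned paths, while the shipped paths all resolve and the result is an empty list.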

Test Plan

  • dotnet build PinballWizard.slnx -p:TreatWarningsAsErrors=true → 0/0 (clean)
  • dotnet test PinballWizard.slnx → 993 passed (+6 new in IndexingPolicyContractTests; was 987 after PR 2)
  • IndexingPolicyContractTests (new) — 6 tests pinning per-container policy decisions with inline rationale citing ADR-0025 § 3 and the code paths that drive each include/exclude choice
  • Pre-push self-audit:
    • Step 0 /local-review qualitative: 0 🔴 / 0 ⚠️ / 11 categories ✅ (one ⚠️ surfaced about a misleading remarks docstring claim — fixed in the same PR before push)
    • Step 1 8-item mechanical (incl. new Cosmos surface conformance from PR 1): all 8 PASS; item 8 (e) MeteredCosmosRepository<T> wrap N/A — decorator lands in PR 4
  • Identity: git log -1 --format='%an <%ae>' shows personal noreply
  • Operator validation deferred to whole-sequence verification: run --ensure-cosmos-containers against deployed Cosmos and inspect the two RAG containers in Azure Portal Data Explorer for the selective indexing JSON

Out of Scope

  • MeteredCosmosRepository<T> decorator and pinwiz.cosmos.* instruments — PR 4
  • Title→OpdbId point-read lookup (machine_title_lookups container) — PR 5
  • TTL on rag_dead_letters — PR 6 (independent of this PR; can ship in parallel)
  • Re-applying selective policy to ALREADY-existing containers in deployed Cosmos: this PR's drift-warning surfaces the gap to operators; an explicit policy-replacement reconcile path is intentionally NOT added because it would obscure the warn-vs-fix distinction and tempt future "auto-heal everything" drift creep

🤖 Generated with Claude Code

…etters

PR 3 of 6 in the Cosmos for User Delight track per ADR-0025
(plan: ~/.claude/plans/lets-take-some-time-ticklish-storm.md).
Adds an optional CosmosIndexingPolicyOptions to CosmosContainerOptions
and applies selective indexing to two write-heavy RAG containers,
saving RU on every upsert without affecting any existing read path.

What lands:

- CosmosIndexingPolicyOptions (new) — IncludedPaths + ExcludedPaths
  shape on the existing CosmosContainerOptions; null = default policy
  (all-paths). Cosmos requires at least one included path when any
  path is set; the provisioners pass paths through unmodified, so an
  empty IncludedPaths would surface as a Cosmos create-time rejection.
- Per-container defaults (per ADR-0025 § 3):
    * rag_index_state -> include /id/?, /document_id/?, /recorded_utc/?
      and exclude /*. recorded_utc is load-bearing because the
      reconciler (CosmosAiSearchRagReconciler) issues
      `SELECT TOP @n * FROM c ORDER BY c.recorded_utc DESC` — without
      the index that ORDER BY would scan the entire container on
      Cosmos serverless every reconcile cycle.
    * rag_dead_letters -> include /id/?, /document_id/?, /attempt_count/?,
      /last_attempt_utc/? and exclude /*. SDK access is point-reads
      only; the remaining indexed paths support operator queries in
      Data Explorer when triaging failed deliveries.
    * rag_leases -> NO override (intentionally left at default). The
      lease container is owned by Cosmos.ChangeFeedProcessor and its
      query surface is SDK-internal; selective indexing here would
      manifest as a silent perf regression if a future SDK version
      added queries against fields we excluded.
- ArmCosmosProvisioner — applies CosmosDBIndexingPolicy on container
  create (IndexingMode = Consistent, IsAutomatic = true). On existing
  containers, drift logs LogWarning (not throw) per ADR-0025 § 3 —
  re-applying a policy is metadata-only and Cosmos re-indexes in the
  background; partition-key drift remains fatal because it would
  silently misroute writes.
- DataPlaneCosmosProvisioner — same drift-warning semantics. Both
  provisioners share an identical IndexingPolicyMatches contract:
  ordinal-sorted SequenceEqual on included + excluded paths.
- IndexingPolicyContractTests (NEW, 6 tests) — pins each container's
  policy decision (selective / default / explicitly-not-overridden)
  with inline rationale citing ADR-0025 § 3 and the actual code path
  that drives each include/exclude choice.

Three deviations from the plan, all driven by reading the actual code:

1. rag_leases left at default (plan said selective). Reason: SDK-
   managed lease container; query surface is opaque.
2. rag_index_state indexes /recorded_utc/? not /contentHash/?. Reason:
   no `contentHash` JSON path exists (the field is `last_indexed_hash`,
   never queried). The reconciler's actual `ORDER BY c.recorded_utc
   DESC` query is what dictates the index.
3. rag_dead_letters uses snake_case /attempt_count/? + /last_attempt_utc/?
   not PascalCase /AttemptCount/? + /createdAt/?. Reason: Cosmos
   indexes JSON-on-the-wire paths; DeadLetterDocument's
   [JsonPropertyName] attributes emit snake_case.

Tests: 987 -> 993 (+6, all in IndexingPolicyContractTests).

Pre-push self-audit:
- /local-review qualitative: 0 🔴 / 0 ⚠️ (1 surfaced + fixed) / 11 categories ✅
- 8-item mechanical (now includes Cosmos surface conformance from PR 1):
  ✅ all 8 items pass; item 8 (e) `MeteredCosmosRepository<T>` wrap
  is N/A here — the decorator lands in PR 4.
- Build: 0/0 (clean) with TreatWarningsAsErrors=true
- Identity: personal noreply

Per ADR-0025 § Architectural style this PR adds another increment of
the "Cosmos document store + targeted CQRS materialized views; NOT
full event sourcing" posture: the write-heavy projection containers
(rag_index_state, rag_dead_letters) get RU-discipline tuning matched
to their actual read patterns rather than over-indexing for hypothetical
future query shapes.
@jkeeley2073 added the claude-code label (Generated with Claude Code) on May 9, 2026
@jkeeley2073 enabled auto-merge on May 9, 2026 at 13:43
@jkeeley2073 merged commit c63ef08 into main on May 9, 2026
5 checks passed