docs: revise ADR-005 — staged overlay refactoring with prototype findings #439

Merged
yuanchen8911 merged 2 commits into NVIDIA:main from
yuanchen8911:docs/adr-004-overlay-refactoring
Apr 7, 2026

Conversation

@yuanchen8911
Contributor

@yuanchen8911 yuanchen8911 commented Mar 19, 2026

Summary

Revise ADR-005 based on prototype implementation findings. The original proposal (reorder inheritance tree as a no-code Phase 1, then add mixins) is not viable as written — reparenting conflicts with the current all-match resolver semantics. The revised ADR resequences the work: fix resolver correctness first, then add mixins for maintenance-critical shared fragments.

Motivation / Context

A full prototype of the original Phase 1 (4 intermediate overlays, 8 reparented intent overlays) revealed:

  1. Reparenting breaks under all-match semantics — old parent overlays (eks-training) still match queries independently and compete with intermediates at the same specificity level, producing non-deterministic merge results
  2. Specificity() bug — all overlays report the same specificity because YAML-parsed empty strings aren't treated as "any", making the sort meaningless
  3. Net line count increases — intermediates add ~251 lines while removing ~117, a net +134
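The Specificity() finding can be illustrated with a minimal sketch. The `Criteria` type, its field names, and the `isWildcard` helper are hypothetical stand-ins, not the project's actual code; the sketch only assumes that an unset YAML field arrives as `""` and that the pre-fix logic recognised only the literal `"any"` as a wildcard.

```go
package main

import "fmt"

// Criteria is a hypothetical stand-in for an overlay's match criteria.
type Criteria struct {
	Service, Accelerator, OS, Intent string
}

// isWildcard reports whether a field means "match anything".
// Treating "" as a wildcard is the fix described in the PR.
func isWildcard(v string) bool {
	return v == "" || v == "any"
}

// Specificity counts the fields that pin a concrete value.
func Specificity(c Criteria) int {
	n := 0
	for _, v := range []string{c.Service, c.Accelerator, c.OS, c.Intent} {
		if !isWildcard(v) {
			n++
		}
	}
	return n
}

// buggySpecificity mimics the reported bug (assumed form): an empty
// string is counted as if it were a concrete value.
func buggySpecificity(c Criteria) int {
	n := 0
	for _, v := range []string{c.Service, c.Accelerator, c.OS, c.Intent} {
		if v != "any" {
			n++
		}
	}
	return n
}

func main() {
	eks := Criteria{Service: "eks"}
	leaf := Criteria{Service: "eks", Accelerator: "h100", OS: "ubuntu", Intent: "training"}
	fmt.Println(buggySpecificity(eks), buggySpecificity(leaf)) // 4 4
	fmt.Println(Specificity(eks), Specificity(leaf))           // 1 4
}
```

With the buggy variant every overlay scores the same, so sorting by specificity cannot order them.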

A subsequent mixin prototype validated that spec.mixins composition works correctly, producing identical hydrated output for all converted overlays.

Fixes: N/A
Related: #305

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactoring (no functional changes)
  • Build/CI/tooling

Component(s) Affected

  • CLI (cmd/aicr, pkg/cli)
  • API server (cmd/aicrd, pkg/api, pkg/server)
  • Recipe engine / data (pkg/recipe)
  • Bundlers (pkg/bundler, pkg/component/*)
  • Collectors / snapshotter (pkg/collector, pkg/snapshotter)
  • Validator (pkg/validator)
  • Core libraries (pkg/errors, pkg/k8s)
  • Docs/examples (docs/, examples/)
  • Other: ____________

Implementation Notes

Revised phasing (was: Reorder → Mixins → Deep Mixins):

  1. Phase 1: Specificity() bug fix + validation lift-up (correctness fix + structure cleanup)
  2. Phase 2: Maximal leaf candidate selection as shared helper under both BuildRecipeResult and BuildRecipeResultWithEvaluator (candidate selection refinement)
  3. Phase 3: Mixins for OS/platform fragments with post-compose constraint evaluation and conflict policy enforcement
  4. Deferred A: Intermediates + reparenting — only after Phase 2, only if future accelerator growth justifies the file churn
  5. Deferred B: Deep-merge validation mixins — only after Phase 3 proven stable

Key changes from original ADR:

  • Justification reframed: maintenance cost and drift resistance, not line-count reduction (~80 lines net from mixins, ~67 from validation lift-up)
  • "No resolver model replacement" and "no code changes" removed from Non-Goals — both are now required
  • Intermediates demoted from recommended Phase 1 to Deferred A
  • Prototype findings documented with measured numbers

Testing

N/A — documentation only.

Risk Assessment

  • Low — Isolated change, well-tested, easy to revert

Rollout notes: N/A — ADR document only, no code changes.

Checklist

  • Tests pass locally (make test with -race)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality
  • I updated docs if user-facing behavior changed
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S) — GPG signing info

@yuanchen8911 yuanchen8911 requested a review from a team as a code owner March 19, 2026 18:23
@yuanchen8911 yuanchen8911 force-pushed the docs/adr-004-overlay-refactoring branch 2 times, most recently from 52ad12b to 6cb02dc Compare March 19, 2026 18:35
@yuanchen8911 yuanchen8911 force-pushed the docs/adr-004-overlay-refactoring branch from 6cb02dc to 0f27c00 Compare March 19, 2026 18:38
@yuanchen8911 yuanchen8911 force-pushed the docs/adr-004-overlay-refactoring branch from 0f27c00 to e2ef22f Compare March 19, 2026 21:14
@yuanchen8911 yuanchen8911 changed the title docs: add ADR-004 for overlay refactoring to reduce duplication docs: add ADR-005 for overlay refactoring to reduce duplication Mar 19, 2026
@yuanchen8911 yuanchen8911 force-pushed the docs/adr-004-overlay-refactoring branch from e2ef22f to a19016d Compare March 19, 2026 21:21
Member

@mchmarny mchmarny left a comment


Good ADR. Three areas need clarification before this is ready to merge:

  1. P0: Constraint evaluator change: Moving from per-overlay to post-merge evaluation is a significant behavioral change that affects ExcludedOverlays/ConstraintWarnings semantics. This ADR should address what happens when mixin-contributed constraints fail against a snapshot — does the whole recipe fail, or is there a graceful degradation path?

  2. Mixin-vs-inherited conflict policy: CI lint covers mixin-vs-mixin conflicts but not mixin-vs-inherited conflicts. A mixin could silently weaken or override a constraint from the inheritance chain. We need explicit policy for that.

  3. Decision section: The analysis clearly supports Reorder + Mixins. Either commit to it as "Proposed" or explicitly frame the TBD. An ADR is meant to force a team vote between specific options.

Member

@mchmarny mchmarny left a comment


Only the constraint evaluator change is a blocker

@yuanchen8911
Contributor Author

Re: P0 — Constraint evaluator change

Good catch. The intent is not to introduce partial mixin degradation. In Phase 2, constraint evaluation would run against the fully composed leaf candidate: inheritance chain + mixins + leaf-local content. If any constraint contributed by that composed candidate fails, that candidate is excluded as a unit. Recipe generation still succeeds by falling back to the remaining matching overlays, consistent with today's ExcludedOverlays behavior; only if every matching overlay is excluded do we end up with base-only output.

To keep this debuggable, ConstraintWarnings will preserve provenance for the failing constraint via a Source field (e.g., "inheritance/h100-eks", "mixin/os-ubuntu", "leaf/h100-eks-ubuntu-training") — not just richer warning text. This makes it clear whether the failure came from the inheritance chain, a mixin, or the leaf overlay itself.

I'll add a "Constraint Failure Semantics" section to the ADR documenting this explicitly.
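The "excluded as a unit, with provenance" behaviour described above might look roughly like the following sketch. All type names, the `evaluate` function, and the `check` predicate are hypothetical; only the idea of a Source tag on each warning comes from the comment itself.

```go
package main

import "fmt"

// Constraint carries a hypothetical Source tag recording where it came from,
// e.g. "inheritance/<overlay>", "mixin/<name>", or "leaf/<overlay>".
type Constraint struct {
	Name, Value, Source string
}

// Candidate is a fully composed leaf: chain + mixins + leaf-local content.
type Candidate struct {
	Name        string
	Constraints []Constraint
}

// Warning records one failed constraint together with its provenance.
type Warning struct {
	Overlay, Constraint, Source string
}

// evaluate keeps candidates whose constraints all pass and excludes the rest
// as whole units. check stands in for the real snapshot evaluator.
func evaluate(cands []Candidate, check func(Constraint) bool) (kept, excluded []string, warns []Warning) {
	for _, c := range cands {
		ok := true
		for _, k := range c.Constraints {
			if !check(k) {
				ok = false
				warns = append(warns, Warning{c.Name, k.Name, k.Source})
			}
		}
		if ok {
			kept = append(kept, c.Name)
		} else {
			excluded = append(excluded, c.Name)
		}
	}
	return kept, excluded, warns
}

func main() {
	cands := []Candidate{
		{Name: "h100-eks-ubuntu-training", Constraints: []Constraint{
			{"K8s.server.version", ">= 1.32.4", "inheritance/h100-eks-training"},
			{"OS.release.ID", "ubuntu", "mixin/os-ubuntu"},
		}},
		{Name: "eks-training", Constraints: []Constraint{
			{"K8s.server.version", ">= 1.30", "leaf/eks-training"},
		}},
	}
	// Pretend the snapshot is not Ubuntu: only the OS constraint fails.
	check := func(k Constraint) bool { return k.Name != "OS.release.ID" }
	kept, excluded, warns := evaluate(cands, check)
	fmt.Println(kept, excluded, warns[0].Source)
}
```

One failing mixin constraint removes the whole composed candidate, while the remaining matching overlay is still used and the warning names the mixin as the source.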

@yuanchen8911
Contributor Author

Thanks for the thorough review, Mark. Addressing each point:

1. Constraint evaluator change (P0)

Good catch. The intent is not to introduce partial mixin degradation. In Phase 2, constraint evaluation would run against the fully composed leaf candidate: inheritance chain + mixins + leaf-local content. If any constraint contributed by that composed candidate fails, that candidate is excluded as a unit. Recipe generation still succeeds by falling back to the remaining matching overlays, consistent with today's ExcludedOverlays behavior; only if every matching overlay is excluded do we end up with base-only output.

To keep this debuggable, ConstraintWarnings will preserve provenance for the failing constraint via a Source field (e.g., "inheritance/h100-eks", "mixin/os-ubuntu", "leaf/h100-eks-ubuntu-training") — not just richer warning text. This makes it clear whether the failure came from the inheritance chain, a mixin, or the leaf overlay itself.

I'll add a "Constraint Failure Semantics" section to the ADR documenting this explicitly.

2. Mixin-vs-inherited conflict policy

Agreed — the current "mixin wins" wording is too implicit. I'll revise the ADR to make the policy explicit: mixin-vs-inherited duplicate constraint names are forbidden, and CI lint rejects them. This keeps mixins truly orthogonal and prevents accidental weakening or silent override of inherited constraints.

The platform-dynamo example currently shows the mixin raising K8s.server.version from the parent's >= 1.32.4 to >= 1.34. Under the strict policy, that would be a lint error, so I'll update the example so that constraint is defined in one place only.
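A lint check for the strict policy could be as small as the sketch below. The function name and the idea of passing plain name lists are assumptions for illustration; the real CI lint may be structured differently.

```go
package main

import (
	"fmt"
	"sort"
)

// lintMixinConflicts returns the constraint names that a mixin declares even
// though the inheritance chain already declares them. Under the strict
// policy any non-empty result is a lint error.
func lintMixinConflicts(inherited, mixin []string) []string {
	seen := make(map[string]bool, len(inherited))
	for _, n := range inherited {
		seen[n] = true
	}
	var dup []string
	for _, n := range mixin {
		if seen[n] {
			dup = append(dup, n)
		}
	}
	sort.Strings(dup)
	return dup
}

func main() {
	// Mirrors the platform-dynamo case: the mixin re-declares a name
	// that the parent chain already owns.
	inherited := []string{"K8s.server.version"}
	mixin := []string{"K8s.server.version", "OS.release.ID"}
	if dup := lintMixinConflicts(inherited, mixin); len(dup) > 0 {
		fmt.Println("lint error: duplicate constraint names:", dup)
	}
}
```

Comparing by name only, without looking at values, is what keeps a mixin from silently tightening or weakening an inherited constraint.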

3. Decision section

Agreed. The ADR currently says Status: Proposed but leaves the decision itself as TBD, which is too ambiguous. I'll update the decision section to make the proposed choice explicit: adopt Reorder + Mixins as the selected direction for implementation via Phase 1 (Reorder) and Phase 2 (Mixins), with Reorder + Deep Mixins deferred until validation deep-merge semantics are proven.

I'll also make Phase 2's boundary explicit: validation is not allowed in mixins in this phase because current validation merge semantics are replacement-based, not deep-merge.

mchmarny previously approved these changes Apr 2, 2026
Member

@mchmarny mchmarny left a comment


/lgtm

@yuanchen8911 yuanchen8911 requested a review from mchmarny April 2, 2026 19:28
@lockwobr
Contributor

lockwobr commented Apr 3, 2026

Consider New Phase 1.5: Push validation up + delete redundant constraints

No code changes. Only overlay YAML restructuring.

The problem in one picture

Here's the inheritance chain for h100-eks-ubuntu-training today:

base                          K8s >= 1.28 (from eks)
└── eks                       K8s >= 1.28, validation: {conformance: [5 checks]}
    └── eks-training          K8s >= 1.30
        └── h100-eks-training K8s >= 1.32.4, gpu-operator overrides, skyhook
            └── h100-eks-ubuntu-training   ← LEAF

The leaf file (h100-eks-ubuntu-training.yaml) is 73 lines. Here's what's in it:

spec:
  base: h100-eks-training
  criteria: { service: eks, accelerator: h100, os: ubuntu, intent: training }

  constraints:
    - name: K8s.server.version          # ← REDUNDANT: parent already says >= 1.32.4
      value: ">= 1.32.4"
    - name: OS.release.ID               # ← UNIQUE to this leaf (ubuntu-specific)
      value: ubuntu
    - name: OS.release.VERSION_ID       # ← UNIQUE to this leaf
      value: "24.04"
    - name: OS.sysctl.../osrelease      # ← UNIQUE to this leaf
      value: ">= 6.8"

  componentRefs: []                     # ← EMPTY, adds nothing

  validation:                           # ← 28 lines of validation config
    deployment:                         #    that could live in the parent
      checks: [operator-health, expected-resources, gpu-operator-version, check-nvidia-smi]
      constraints: [{ name: Deployment.gpu-operator.version, value: ">= v24.6.0" }]
    performance:
      checks: [nccl-all-reduce-bw]
      constraints: [{ name: nccl-all-reduce-bw, value: ">= 300" }]
    conformance:
      checks: [platform-health, gpu-operator-health, dra-support, ...]

Two problems:

  1. K8s >= 1.32.4 is already in the parent (h100-eks-training). Re-declaring it here does nothing — the merge engine already inherits it by constraint name. It's copy-paste noise.
  2. The validation block has nothing Ubuntu-specific. It's about H100 + EKS + training — it checks GPU operator health, NCCL bandwidth, conformance. It belongs in h100-eks-training, not in the ubuntu leaf.

This same pattern repeats across ~16 leaf overlays.

The fix (two moves, zero code changes)

Move 1: Push validation up to the intent overlay. The validation block describes what to check for a given {accelerator, service, intent} combination. Move it from the -ubuntu- leaf to its parent.

Before:
h100-eks-training.yaml        → has GPU config, NO validation
h100-eks-ubuntu-training.yaml → has OS constraints + validation (28 lines)

After:
h100-eks-training.yaml        → has GPU config + validation (moved here)
h100-eks-ubuntu-training.yaml → has OS constraints only

This works because Merge() already handles validation inheritance — if the parent has a validation block, children inherit it. Children only need to re-declare validation if they want to override a phase.

Move 2: Delete redundant constraint re-declarations. If h100-eks-training already says K8s >= 1.32.4, every child inherits it automatically. Delete the duplicate line from every leaf that just re-states what the parent already says.

After both moves: leaf becomes trivial

h100-eks-ubuntu-training.yaml — AFTER (was 73 lines, now ~25 with license header)

kind: RecipeMetadata
apiVersion: aicr.nvidia.com/v1alpha1
metadata:
  name: h100-eks-ubuntu-training
spec:
  base: h100-eks-training
  criteria:
    service: eks
    accelerator: h100
    os: ubuntu
    intent: training
  constraints:
    - name: OS.release.ID
      value: ubuntu
    - name: OS.release.VERSION_ID
      value: "24.04"
    - name: OS.sysctl./proc/sys/kernel/osrelease
      value: ">= 6.8"

That's it. Only the things that are genuinely unique to "this is Ubuntu" remain.

Why this is safe

  • Constraint inheritance already works this way. Merge() merges constraints by name — child overrides parent for the same name, parent values carry through for names the child doesn't mention. Deleting a redundant re-declaration changes nothing about the merged output.
  • Validation inheritance already works this way. Merge() does phase-level replacement — if the child doesn't declare a validation phase, it inherits the parent's. Moving validation to the parent and removing it from the child produces the same merged result.
  • Verifiable with a golden-file test. Generate recipe output for every leaf overlay before and after. Diff must be empty. If it's not empty, something was moved wrong.
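The two merge rules above can be checked on a toy model. The `Overlay` type and `merge` function below are simplified, hypothetical stand-ins for the real Merge(); they only encode "child wins per constraint name" and "a declared validation phase replaces the parent's phase".

```go
package main

import (
	"fmt"
	"reflect"
)

// Overlay is a simplified, hypothetical model: constraints keyed by name and
// validation keyed by phase ("deployment", "performance", "conformance").
type Overlay struct {
	Constraints map[string]string
	Validation  map[string][]string
}

// merge applies child on top of parent: child wins per constraint name, and
// a validation phase declared by the child replaces the parent's phase.
func merge(parent, child Overlay) Overlay {
	out := Overlay{Constraints: map[string]string{}, Validation: map[string][]string{}}
	for k, v := range parent.Constraints {
		out.Constraints[k] = v
	}
	for k, v := range child.Constraints {
		out.Constraints[k] = v
	}
	for p, checks := range parent.Validation {
		out.Validation[p] = checks
	}
	for p, checks := range child.Validation {
		out.Validation[p] = checks
	}
	return out
}

func main() {
	parent := Overlay{Constraints: map[string]string{"K8s.server.version": ">= 1.32.4"}}
	perf := map[string][]string{"performance": {"nccl-all-reduce-bw"}}

	// Before: the leaf re-declares the K8s constraint and owns validation.
	before := merge(parent, Overlay{
		Constraints: map[string]string{"K8s.server.version": ">= 1.32.4", "OS.release.ID": "ubuntu"},
		Validation:  perf,
	})

	// After: validation lives in the parent, the leaf keeps only the OS constraint.
	parent.Validation = perf
	after := merge(parent, Overlay{Constraints: map[string]string{"OS.release.ID": "ubuntu"}})

	fmt.Println(reflect.DeepEqual(before, after)) // true
}
```

The equality check plays the role of the golden-file diff: both layouts hydrate to the same result in this model.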

What's left after Phase 1 + 1.5

Leaf overlays are now ~10-15 lines of actual content. The remaining duplication:

| Still duplicated | Lines | Files | Notes |
|---|---|---|---|
| Ubuntu OS constraints (3 lines) | 3 × 12 = 36 | 12 | Genuinely per-leaf (criteria-specific) |
| Kubeflow component block | ~8 × 4 = 32 | 4 | Could go in a parent overlay |
| Dynamo component block | ~25 × 4 = 100 | 4 | Strongest case for sharing |

The Ubuntu constraints are only 3 lines per file — arguably not worth any machinery to deduplicate. The kubeflow block could be moved to {accel}-{service}-training-kubeflow parent overlays. The dynamo block is the only one where a sharing mechanism (mixins or otherwise) has a real payoff, and even that could be handled with a {service}-inference-dynamo parent overlay instead.

Summary

| Phase | Dedup | Code changes | Risk |
|---|---|---|---|
| 1: Reorder tree | ~40% | None | Low (overlay restructure) |
| 1.5: Push validation up + delete redundant constraints | ~70% | None | Low (verify with golden-file diff) |
| 2: Mixins (if still needed) | ~75-80% | ~80 lines Go | Medium |

Phase 1.5 captures most of the mixin benefit with zero code changes and zero new concepts.

@yuanchen8911
Contributor Author

@lockwobr Good point. I agree this identifies a meaningful no-code cleanup step: push validation up to the {accelerator}-{service}-{intent} layer and delete leaf-level re-statements of constraints that are already inherited from the parent. That should be safe as long as the hydrated recipe output stays identical before and after — verifiable with aicr query golden-file diffs.
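One way to automate the before/after comparison, assuming the hydrated output for each leaf has already been captured as JSON (for example from aicr query, as mentioned above), is a semantic JSON comparison rather than a byte diff. The `sameJSON` helper is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// sameJSON reports whether two JSON documents are semantically equal,
// ignoring key order and whitespace.
func sameJSON(before, after []byte) (bool, error) {
	var x, y interface{}
	if err := json.Unmarshal(before, &x); err != nil {
		return false, fmt.Errorf("before: %w", err)
	}
	if err := json.Unmarshal(after, &y); err != nil {
		return false, fmt.Errorf("after: %w", err)
	}
	return reflect.DeepEqual(x, y), nil
}

func main() {
	// Inline samples stand in for the captured golden files.
	before := []byte(`{"constraints":[{"name":"OS.release.ID","value":"ubuntu"}],"base":"h100-eks-training"}`)
	after := []byte(`{"base":"h100-eks-training","constraints":[{"name":"OS.release.ID","value":"ubuntu"}]}`)
	same, err := sameJSON(before, after)
	fmt.Println(same, err) // true <nil>
}
```

Array order still matters in this comparison, which is usually what a golden-file check wants for ordered lists such as constraints.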

I don't think it fully removes the case for mixins, because the platform overlays still duplicate substantial component blocks (dynamo in particular at ~25 lines × 4 files). But it does strengthen the case that the Ubuntu layer itself should become much thinner before introducing any new mechanism. I'll fold this into the ADR framing, either as part of Phase 1 or as an explicit refinement step between Phase 1 and Phase 2.

@yuanchen8911
Contributor Author

PR #439 Review Summary

Mark (mchmarny) — 8 inline comments

P0: Constraint evaluator change — Moving to post-merge evaluation changes ExcludedOverlays semantics. What happens when mixin constraints fail?
→ Composed leaf candidate is excluded as a unit. Added candidate selection and constraint failure semantics sections to ADR.

Mixin-vs-inherited conflict policy — Mixins could silently weaken inherited constraints.
→ Strict policy: duplicate constraint/component names between mixins and inheritance chain are forbidden. CI lint enforces.

Decision section TBD — Analysis clearly supports Reorder + Mixins, commit to it.
→ Updated to "Proposed: Reorder + Mixins." Validation excluded from Phase 2 mixins as a hard boundary.

Mixin field stripping — spec.mixins could leak into recipe output.
→ Added implementation note: merge a copy with mixins cleared.

Use aicr query for golden-file testing — Hydrated output verification without running bundle.
→ Added to Phase 1 exit criteria.

Nits — ADR-004→005 in PR description; rename file to drop -adr suffix.
→ Both fixed.
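The mixin field stripping note in the summary above ("merge a copy with mixins cleared") can be pictured with a small sketch. The `Spec` type and `withoutMixins` helper are hypothetical; the real spec has more fields.

```go
package main

import "fmt"

// Spec is a hypothetical slice of the overlay spec.
type Spec struct {
	Base        string
	Mixins      []string
	Constraints []string
}

// withoutMixins returns a copy of s with Mixins cleared, so the composition
// directive cannot leak into the materialized recipe. The original is left
// untouched; Constraints is copied so the two values do not share storage.
func withoutMixins(s Spec) Spec {
	out := s
	out.Mixins = nil
	out.Constraints = append([]string(nil), s.Constraints...)
	return out
}

func main() {
	leaf := Spec{
		Base:        "h100-eks-training",
		Mixins:      []string{"os-ubuntu"},
		Constraints: []string{"OS.release.ID"},
	}
	out := withoutMixins(leaf)
	fmt.Println(len(leaf.Mixins), len(out.Mixins)) // 1 0
}
```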

Jason (xdu31) — 1 inline comment

Does the OS layer need to exist? — Ubuntu overlays add no components, no validation. Kernel >= 6.8 is a GPU driver requirement, not OS. Removing the OS layer eliminates 12 files and the main mixin justification.
→ Agreed the Ubuntu layer is thinner than framed. Phase 1 should move validation up to intent layer and kernel constraint to accelerator layer. Platform overlays (dynamo, kubeflow) remain independent justification for mixins.

Brian (lockwobr) — 1 conversation comment

Phase 1.5: Push validation up + delete redundant constraints — No code changes, captures ~70% dedup by moving validation to {accel}-{service}-{intent} and removing re-stated constraints from leaves.
→ Agreed. Will fold into ADR as part of Phase 1 or explicit refinement step. Doesn't fully remove mixin case (platform overlays still duplicate component blocks).

Alex (ayuskauskas) — 2 comments

Auto-mixin — If mixins only add constraints, the engine could auto-apply them by criteria match.
→ Explicit spec.mixins preferred for Phase 2 — keeps composition visible and debuggable. Criteria-driven selection could be a future direction.

Flat leaf-only structure — Remove inheritance entirely, one file per combo.
→ Valid alternative but different from ADR's Flat Mixins option. Tradeoff: simpler locally, but version bumps become N-file edits.

@yuanchen8911
Contributor Author

yuanchen8911 commented Apr 3, 2026

Recent Updates

Replies posted:

  1. Reply to @xdu31 — OS layer comment (inline reply)
  2. Reply to @lockwobr — Phase 1.5 comment (conversation comment)
  3. Reply to @mchmarny — mixin field stripping (inline reply)
  4. Reply to @mchmarny — aicr query for golden-file testing (inline reply)
  5. Reply to @ayuskauskas — auto-mixin (inline reply)
  6. Reply to @ayuskauskas — flat leaf-only structure (conversation comment)

ADR content updates pushed:

  7. Mixin stripping note — added implementation note that spec.mixins must be cleared before materializing the recipe result
  8. Phase 1 exit criteria — added aicr query --format json golden-file verification across all leaf overlay combinations
  9. Exit criteria wording — tightened "recipe generation commands" → "recipe resolutions produce identical hydrated output" in both Phase 1 and Phase 2
  10. Phase 1 now includes validation lift-up and constraint cleanup (based on feedback from @xdu31 and @lockwobr):
      • Move validation blocks up to {accelerator}-{service}-{intent} layer — these checks are not OS-specific
      • Move kernel >= 6.8 out of Ubuntu leaf to the highest shared layer where it is a driver requirement
      • Delete redundant constraint re-declarations from leaf overlays
  11. Dedup estimates updated from ~40% to ~40%+ across all references
  12. os-ubuntu mixin now described as Ubuntu release constraints only (kernel moves to accelerator layer in Phase 1)
  13. New Phase 1 exit criteria: no leaf carries validation duplicated across siblings; no leaf re-declares inherited constraints
  14. Full consistency pass across options summary, tradeoffs, Flat Mixins, Auto-Compose, and Recommendation sections

@yuanchen8911
Contributor Author

Summary of Review Feedback and ADR Updates

| # | Concern | From | ADR Update |
|---|---|---|---|
| 1 | Constraint failure semantics (P0) | Mark | Added candidate selection + constraint failure sections: composed leaf candidate excluded as a unit |
| 2 | Mixin conflict policy | Mark | Strict duplicate-name prohibition for constraints and components, CI lint enforces |
| 3 | Mixin field stripping | Mark | Implementation note: merge a copy with mixins cleared before materializing recipe output |
| 4 | Golden-file testing | Mark | Added aicr query --format json as verification path in Phase 1 exit criteria |
| 5 | OS layer justification | Jason | Phase 1 now moves validation up to intent layer and kernel constraint to accelerator layer |
| 6 | Phase 1.5: validation lift-up | Brian | Folded into Phase 1 — validation lift-up + redundant constraint cleanup, no code changes |
| 7 | Auto-mixin / flat structure | Alex | Explicit spec.mixins preferred; flat leaf-only acknowledged as valid alternative |
| 8 | ADR-004 → ADR-005 naming | Mark | Fixed PR description; renamed file to 005-overlay-refactoring.md |
| 9 | Dedup estimates | | Updated from ~40% to ~40%+ across all references |
| 10 | os-ubuntu mixin scope | | Now Ubuntu release constraints only; kernel moves to accelerator layer in Phase 1 |
| 11 | Consistency pass | | Updated options summary, tradeoffs, Flat Mixins, Auto-Compose, example trees, and Recommendation sections |
| 12 | Phase 1 exit criteria | | Added: no duplicated validation across siblings; no re-declared inherited constraints |

@yuanchen8911
Contributor Author

Here's the plan for Phase 1. I hope it addresses the comments. If we agree, I'll start the implementation.

Phase 1 (no code changes):

  1. Create {accelerator}-{service} intermediate overlays (h100-eks, gb200-eks, etc.)
  2. Move shared GPU operator overrides, skyhook config, and K8s constraints into those intermediates
  3. Re-parent intent overlays to inherit from the new intermediates
  4. Move validation blocks up to the {accelerator}-{service}-{intent} layer
  5. Move kernel >= 6.8 out of the Ubuntu leaf to the highest shared layer where it is actually a driver requirement
  6. Delete redundant constraint re-declarations from leaf overlays

@yuanchen8911 yuanchen8911 changed the title docs: add ADR-005 for overlay refactoring to reduce duplication docs: revise ADR-005 — staged overlay refactoring with prototype findings Apr 6, 2026
@yuanchen8911
Contributor Author

ADR-005 Revision Summary

Based on a full prototype implementation of both the intermediate/reparenting approach and the mixin approach, the ADR needs material revision. Here are the findings and the proposed new phasing.

What Changed and Why

Prototype implementation of the original Phase 1 (intermediates + reparenting) revealed three issues:

  1. Reparenting is incompatible with the current resolver. The resolver applies all matching overlays independently. Reparenting h100-eks-training from eks-training to a new h100-eks intermediate doesn't remove eks-training from the merge — it still matches the query and competes at the same specificity level, producing non-deterministic results.

  2. Specificity() has a pre-existing correctness bug. YAML-parsed empty strings aren't treated as "any", so all overlays report the same specificity. The sort is meaningless. This was masked by the current linear chain structure.

  3. Intermediates increase total line count for the current accelerator/service set. Net +134 lines — intent overlays must absorb content from their former parents, offsetting dedup gains.

A subsequent mixin prototype validated that spec.mixins composition works correctly with zero semantic regressions across all current leaf overlays discovered from recipes/overlays/.
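The first finding (reparenting under all-match semantics) can be reproduced on a toy resolver. The types and the `allMatch` function are hypothetical simplifications; the only behaviour taken from the findings is that every overlay whose criteria match the query participates, regardless of its `base` link.

```go
package main

import "fmt"

// Criteria and Overlay are hypothetical, trimmed-down stand-ins.
type Criteria struct{ Service, Accelerator, Intent string }

type Overlay struct {
	Name, Base string
	Criteria   Criteria
}

// fieldMatches treats an empty overlay field as a wildcard.
func fieldMatches(overlay, query string) bool {
	return overlay == "" || overlay == query
}

func matches(o, q Criteria) bool {
	return fieldMatches(o.Service, q.Service) &&
		fieldMatches(o.Accelerator, q.Accelerator) &&
		fieldMatches(o.Intent, q.Intent)
}

// allMatch returns every overlay whose criteria match the query. Base links
// play no role in selection, which is the crux of the problem.
func allMatch(overlays []Overlay, q Criteria) []string {
	var names []string
	for _, o := range overlays {
		if matches(o.Criteria, q) {
			names = append(names, o.Name)
		}
	}
	return names
}

func main() {
	overlays := []Overlay{
		{Name: "eks-training", Base: "eks", Criteria: Criteria{Service: "eks", Intent: "training"}},
		{Name: "h100-eks", Base: "eks", Criteria: Criteria{Service: "eks", Accelerator: "h100"}},
		// Reparented: base is now h100-eks instead of eks-training.
		{Name: "h100-eks-training", Base: "h100-eks", Criteria: Criteria{Service: "eks", Accelerator: "h100", Intent: "training"}},
	}
	q := Criteria{Service: "eks", Accelerator: "h100", Intent: "training"}
	fmt.Println(allMatch(overlays, q)) // [eks-training h100-eks h100-eks-training]
}
```

In this model eks-training and h100-eks each pin two fields, so after reparenting they sit at the same specificity and nothing fixes their relative merge order.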

Revised Phases

Phase 1 — Correctness fix + structure cleanup

  • Fix Specificity() to treat "" as "any" (with regression test)
  • Lift validation blocks from ubuntu leaves to intent overlays (~67 lines eliminated)

Phase 2 — Candidate selection refinement

  • Implement maximal leaf candidate selection as a shared helper under both BuildRecipeResult and BuildRecipeResultWithEvaluator
  • Filter candidates to maximal matching leaves — prerequisite for any composition mechanism
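The summary does not spell out what "maximal" means. One plausible reading, assumed here, is that a matching candidate is dropped when another matching candidate pins the same fields plus at least one more. The sketch below implements that reading with hypothetical types; it is not the prototype's helper.

```go
package main

import "fmt"

// Criteria and Overlay are hypothetical stand-ins; "" means "any".
type Criteria struct{ Service, Accelerator, OS, Intent string }

type Overlay struct {
	Name     string
	Criteria Criteria
}

// dominates reports whether a pins every field that b pins, with the same
// value, and pins at least one field that b leaves open.
func dominates(a, b Criteria) bool {
	af := []string{a.Service, a.Accelerator, a.OS, a.Intent}
	bf := []string{b.Service, b.Accelerator, b.OS, b.Intent}
	strict := false
	for i := range af {
		switch {
		case bf[i] == "":
			if af[i] != "" {
				strict = true
			}
		case af[i] != bf[i]:
			return false
		}
	}
	return strict
}

// maximal keeps only the candidates that no other candidate dominates.
func maximal(cands []Overlay) []string {
	var out []string
	for i, c := range cands {
		dominated := false
		for j, d := range cands {
			if i != j && dominates(d.Criteria, c.Criteria) {
				dominated = true
				break
			}
		}
		if !dominated {
			out = append(out, c.Name)
		}
	}
	return out
}

func main() {
	matching := []Overlay{
		{"eks-training", Criteria{Service: "eks", Intent: "training"}},
		{"h100-eks", Criteria{Service: "eks", Accelerator: "h100"}},
		{"h100-eks-training", Criteria{Service: "eks", Accelerator: "h100", Intent: "training"}},
	}
	fmt.Println(maximal(matching)) // [h100-eks-training]
}
```

Under this reading the former parent no longer competes as an independent candidate; it contributes only if it is on the surviving leaf's base chain.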

Phase 3 — Mixins (OS + platform)

  • RecipeMixin kind with os-ubuntu, platform-kubeflow, platform-dynamo (~80 lines net reduction)
  • Post-compose constraint evaluation so mixin constraints are visible to the evaluator
  • Conflict policy enforcement (no duplicate names between mixin and chain)

Deferred A — Intermediates + reparenting

  • Only viable after Phase 2; only justified when future accelerator growth (B200, GB300) makes the file-count reduction worth the added files

Deferred B — Deep-merge validation mixins

  • Only after Phase 3 proven stable and deep-merge semantics verified safe

Key Reframing

  • Phase 1 original reparenting was disproven by the prototype because current all-match resolver semantics make old parents continue to participate independently
  • Specificity() had a real pre-existing correctness bug that was masked by the current linear tree
  • Intermediates are deferred because, for today's tree, they add files and net lines before they add enough value
  • Mixins were validated as a maintenance/drift-reduction tool, not a major line-count win
  • The new sequencing is: correctness first, candidate selection second, composition abstractions third

The full revised ADR text is ready — I'll push it as a commit once we align on the direction. The prototype code (Specificity fix, maximal leaf selection, mixin loader/merger) is available as a working reference.

@mchmarny @xdu31 @ayuskauskas — would appreciate another look given the material changes to the phasing and justification.

…ings

Revise ADR-005 based on prototype implementation findings:

- Original Phase 1 (intermediates + reparenting) is not viable with the
  current all-match resolver. Reparented overlays still compete with their
  former parents at the same specificity level, producing non-deterministic
  merge results.

- Specificity() has a pre-existing bug: YAML-parsed empty strings are not
  treated as "any", making all overlays report the same specificity.

- Intermediates increase net line count (+134 lines) for the current
  accelerator/service set due to content absorption from former parents.

- Mixin prototype validated: spec.mixins composition produces identical
  hydrated output for all current leaf overlays.

Revised sequencing:
  Phase 1: Specificity fix + validation lift-up
  Phase 2: Maximal leaf candidate selection (shared helper)
  Phase 3: Mixins for OS/platform composition
  Deferred A: Intermediates (after Phase 2, if accelerator growth justifies)
  Deferred B: Deep-merge validation mixins (after Phase 3 proven stable)

Justification reframed from line-count reduction to maintenance cost,
drift resistance, correctness, and review burden.

Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
@ayuskauskas
Contributor

Seems reasonable

@yuanchen8911 yuanchen8911 requested a review from xdu31 April 6, 2026 21:42
@yuanchen8911 yuanchen8911 merged commit 51f2b8c into NVIDIA:main Apr 7, 2026
9 checks passed
@mchmarny mchmarny added this to the v0.12 milestone Apr 7, 2026

6 participants