Skip to content

refactor: Streamline component manager provider bootstrap#504

Open
jw-nvidia wants to merge 1 commit intoNVIDIA:mainfrom
jw-nvidia:refactor/cm-provider-bootstrap
Open

refactor: Streamline component manager provider bootstrap#504
jw-nvidia wants to merge 1 commit intoNVIDIA:mainfrom
jw-nvidia:refactor/cm-provider-bootstrap

Conversation

@jw-nvidia
Copy link
Copy Markdown
Contributor

Description

  • split component manager config into dedicated config and builtin packages
  • move provider registry APIs under providerapi, and remove componentmanager provider re-export wrappers.
  • normalize provider config construction so YAML/default paths share config helpers while cmd/serve only wires provider creation.

Type of Change

  • Feature - New feature or functionality (feat:)
  • Fix - Bug fixes (fix:)
  • Chore - Modification or removal of existing functionality (chore:)
  • Refactor - Refactoring of existing functionality (refactor:)
  • Docs - Changes in documentation or OpenAPI schema (docs:)
  • CI - Changes in GitHub workflows. Requires additional scrutiny (ci:)
  • Version - Issuing a new release version (version:)

Services Affected

  • API - API models or endpoints updated
  • Workflow - Workflow service updated
  • DB - DB DAOs or migrations updated
  • Site Manager - Site Manager updated
  • Cert Manager - Cert Manager updated
  • Site Agent - Site Agent updated
  • RLA - RLA service updated
  • Powershelf Manager - Powershelf Manager updated
  • NVSwitch Manager - NVSwitch Manager updated

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@jw-nvidia jw-nvidia requested a review from a team as a code owner May 7, 2026 22:24
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 7, 2026

Review Change Stack
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d759b4da-48ca-4f92-9b7e-8e04c6580455

📥 Commits

Reviewing files that changed from the base of the PR and between 0a819d4 and 6b50d8f.

📒 Files selected for processing (33)
  • rla/cmd/serve.go
  • rla/docs/component-manager-architecture.md
  • rla/docs/component-manager-config.md
  • rla/internal/scheduler/jobs/inventorysync/inventory.go
  • rla/internal/scheduler/jobs/inventorysync/inventory_test.go
  • rla/internal/scheduler/jobs/inventorysync/job.go
  • rla/internal/scheduler/jobs/leakdetection/job.go
  • rla/internal/service/config.go
  • rla/internal/task/componentmanager/builtin/component_manager_factories.go
  • rla/internal/task/componentmanager/builtin/component_manager_factories_test.go
  • rla/internal/task/componentmanager/builtin/config.go
  • rla/internal/task/componentmanager/builtin/config_test.go
  • rla/internal/task/componentmanager/builtin/provider_config_decoders.go
  • rla/internal/task/componentmanager/builtin/provider_config_decoders_test.go
  • rla/internal/task/componentmanager/componentmanager.go
  • rla/internal/task/componentmanager/componentmanager_test.go
  • rla/internal/task/componentmanager/compute/nico/nico.go
  • rla/internal/task/componentmanager/config.go
  • rla/internal/task/componentmanager/config/config.go
  • rla/internal/task/componentmanager/config/config_test.go
  • rla/internal/task/componentmanager/config/doc.go
  • rla/internal/task/componentmanager/config/errors.go
  • rla/internal/task/componentmanager/config/yaml.go
  • rla/internal/task/componentmanager/errors.go
  • rla/internal/task/componentmanager/mock/mock.go
  • rla/internal/task/componentmanager/nvlswitch/nico/nico.go
  • rla/internal/task/componentmanager/nvlswitch/nvswitchmanager/nvswitchmanager.go
  • rla/internal/task/componentmanager/powershelf/nico/nico.go
  • rla/internal/task/componentmanager/powershelf/psm/psm.go
  • rla/internal/task/componentmanager/providerapi/errors.go
  • rla/internal/task/componentmanager/providerapi/providerapi.go
  • rla/internal/task/componentmanager/providerapi/providerapi_test.go
  • rla/internal/task/componentmanager/providerapi/registry.go
💤 Files with no reviewable changes (1)
  • rla/internal/task/componentmanager/config.go
✅ Files skipped from review due to trivial changes (6)
  • rla/internal/task/componentmanager/config/errors.go
  • rla/internal/task/componentmanager/config/doc.go
  • rla/internal/task/componentmanager/nvlswitch/nvswitchmanager/nvswitchmanager.go
  • rla/internal/task/componentmanager/builtin/provider_config_decoders_test.go
  • rla/internal/scheduler/jobs/inventorysync/inventory.go
  • rla/internal/task/componentmanager/providerapi/providerapi_test.go
🚧 Files skipped from review as they are similar to previous changes (22)
  • rla/internal/task/componentmanager/builtin/config.go
  • rla/internal/task/componentmanager/powershelf/nico/nico.go
  • rla/internal/task/componentmanager/mock/mock.go
  • rla/internal/service/config.go
  • rla/internal/task/componentmanager/builtin/component_manager_factories_test.go
  • rla/internal/task/componentmanager/config/config.go
  • rla/internal/task/componentmanager/builtin/component_manager_factories.go
  • rla/internal/task/componentmanager/providerapi/providerapi.go
  • rla/internal/task/componentmanager/nvlswitch/nico/nico.go
  • rla/internal/task/componentmanager/componentmanager_test.go
  • rla/docs/component-manager-config.md
  • rla/internal/task/componentmanager/compute/nico/nico.go
  • rla/internal/task/componentmanager/componentmanager.go
  • rla/internal/scheduler/jobs/inventorysync/inventory_test.go
  • rla/docs/component-manager-architecture.md
  • rla/internal/task/componentmanager/builtin/config_test.go
  • rla/internal/task/componentmanager/errors.go
  • rla/internal/task/componentmanager/config/config_test.go
  • rla/internal/task/componentmanager/providerapi/errors.go
  • rla/cmd/serve.go
  • rla/internal/task/componentmanager/config/yaml.go
  • rla/internal/task/componentmanager/providerapi/registry.go

Summary by CodeRabbit

  • New Features

    • Introduced a typed provider API and a config-driven provider registry plus improved built-in config loading and provider completion.
  • Refactor

    • Rewired component-manager/provider interactions to use the new provider API with stricter validation and fail-fast initialization; config loading now prefers CLI, then env, then embedded defaults.
  • Bug Fixes

    • Startup now fails on provider misconfiguration instead of silently skipping providers.
  • Documentation

    • Updated architecture and config docs to reflect new provider patterns, loading priorities, and provider completion semantics.
  • Tests

    • Added and updated tests covering the new provider/config behavior and nil-safety.

Walkthrough

This PR restructures component-manager code into cmconfig and providerapi subpackages, implements strict YAML config loading with provider completion, introduces a nil-safe error-returning provider registry, provides builtin defaults and factory registration, and migrates startup, scheduler jobs, tests, and docs to the new APIs.

Changes

Component Manager Configuration Refactor

Layer / File(s) Summary
Service bootstrap
rla/cmd/serve.go, rla/internal/service/config.go
rla serve loads cmconfig.Config (CLI → env → builtin), builds providerapi.ProviderRegistry from ProviderConfigs (strict name checks and provider.NewProvider(ctx)), constructs builtin component-manager registry, and passes config + provider registry into service initialization.
providerapi: Errors & Registry
rla/internal/task/componentmanager/providerapi/errors.go, registry.go, providerapi.go, providerapi_test.go
Adds provider sentinel/typed errors; moves provider registry into providerapi, changes Register to return error, adds Has/List, and makes decoder/registry read methods nil-safe; tests cover nil-safety and duplicate registration.
cmconfig: Errors & Types
rla/internal/task/componentmanager/config/errors.go, config/config.go, doc.go
Adds config-scoped sentinel/typed errors and cmconfig.Config with normalized ComponentManagers and ProviderConfigs; provides New, Validate, HasProvider, and provider-completion helpers.
YAML parsing & LoadConfig
rla/internal/task/componentmanager/config/yaml.go, builtin/config.go
Implements strict YAML parsing (KnownFields(true)), ParseConfig/LoadConfig, decodes provider overrides via decoder registry, rejects duplicate provider entries, completes missing providers implied by component managers, and exposes builtin defaults.
Builtin component manager factories
rla/internal/task/componentmanager/builtin/component_manager_factories.go, tests
Adds NewComponentManagerRegistry to register compiled-in managers, compute Nico ComputePowerDelay from provider config, and initialize Registry with cmconfig.Config and *providerapi.ProviderRegistry.
Factory signature updates
rla/internal/task/componentmanager/{compute/nico,powershelf,nvlswitch,psm,mock}/...
Updates all manager factories to accept *providerapi.ProviderRegistry and to use providerapi.GetTyped for typed lookups.
Componentmanager core & error consolidation
rla/internal/task/componentmanager/componentmanager.go, errors.go
Updates ManagerFactory and Registry.Initialize to use cmconfig/providerapi, re-exports/aliases config/provider errors, preserves manager-specific error types, and adapts tests.
Scheduler integration
rla/internal/scheduler/jobs/inventorysync/*, leakdetection/*
Jobs now accept *providerapi.ProviderRegistry and cmconfig.Config; provider lookups use providerapi.GetTyped; required vs optional semantics unchanged.
Docs & Tests
rla/docs/*, *_test.go across packages
Updates docs to describe providerapi registry and ConfigDecoder pattern, clarifies authoritative YAML vs embedded defaults and provider completion semantics, and adds/updates tests for parsing, provider registry behavior, builtin defaults, and factory initialization.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.42% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'refactor: Streamline component manager provider bootstrap' accurately reflects the main objective: restructuring the component manager provider initialization flow across dedicated packages while simplifying configuration paths.
Description check ✅ Passed The description is well-aligned with the changeset, detailing the three primary refactoring efforts: package restructuring, API reorganization, and configuration normalization.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
rla/internal/task/componentmanager/builtin/config.go (1)

53-70: 💤 Low value

Consider replacing the IIFE with a direct conditional assignment for idiomatic Go.

The Immediately Invoked Function Expression exists only to reuse the already-declared err variable while introducing config. A plain conditional achieves the same with no cognitive overhead and eliminates the duplicated path != "" check across the IIFE body and the error-wrapping block.

♻️ Proposed refactor
-	config, err := func() (cmconfig.Config, error) {
-		if path != "" {
-			// Load config from file
-			return cmconfig.LoadConfig(path, decoders)
-		}
-
-		// Use embedded service config
-		return cmconfig.New(defaultServiceComponentManagers(), decoders)
-	}()
-
-	if err != nil {
-		if path != "" {
-			err = fmt.Errorf("load config from file: %w", err)
-		} else {
-			err = fmt.Errorf("get default config: %w", err)
-		}
-		return cmconfig.Config{}, err
-	}
+	var config cmconfig.Config
+	if path != "" {
+		config, err = cmconfig.LoadConfig(path, decoders)
+		err = wrapErr(err, "load config from file")
+	} else {
+		config, err = cmconfig.New(defaultServiceComponentManagers(), decoders)
+		err = wrapErr(err, "get default config")
+	}
+	if err != nil {
+		return cmconfig.Config{}, err
+	}

Where wrapErr is a small inline helper, or simply inline fmt.Errorf in each branch. Either way the single path != "" check drives both loading and error-message selection.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rla/internal/task/componentmanager/builtin/config.go` around lines 53 - 70,
Replace the IIFE by a straightforward conditional: if path != "" call
cmconfig.LoadConfig(path, decoders) and assign to config, err; else call
cmconfig.New(defaultServiceComponentManagers(), decoders) and assign to config,
err; then check err and wrap it with fmt.Errorf("load config from file: %w",
err) when path != "" or fmt.Errorf("get default config: %w", err) otherwise—this
removes the IIFE and the duplicated path != "" check while keeping the same
calls to cmconfig.LoadConfig, cmconfig.New, defaultServiceComponentManagers, and
the existing err wrapping.
rla/internal/task/componentmanager/config/yaml.go (1)

116-140: 💤 Low value

Consider wrapping decoder errors with provider context.

When decoder.DecodeYAML fails at Line 131, the returned error may lack sufficient context about which provider configuration failed. Wrapping with the provider name would aid debugging.

♻️ Optional: Add provider context to decode errors
 		providerConfig, err := decoder.DecodeYAML(rawNode)
 		if err != nil {
-			return err
+			return fmt.Errorf("provider %q: %w", name, err)
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rla/internal/task/componentmanager/config/yaml.go` around lines 116 - 140,
The decoder.DecodeYAML error returned in setYAMLProviderConfigs lacks provider
context; update the error handling after calling decoder.DecodeYAML(rawNode) to
wrap or annotate the returned error with the provider name (use the `name` from
prepareProviderConfigForAdd) so the error message identifies which provider
failed—e.g., replace the direct return of err with a wrapped error that includes
`name` and the original err for decoder.DecodeYAML failures, leaving other logic
(prepareProviderConfigForAdd, HasProvider, ProviderConfigs assignment)
unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@rla/cmd/serve.go`:
- Around line 108-115: The code in the provider creation loop silently skips
providers when providerConfig.NewProvider returns an error (the block around
provider, err := providerConfig.NewProvider(ctx)), which can cause downstream
runtime failures; change this so you either treat creation failures as fatal
during startup (returning error / exiting) or, if non-fatal, record the failed
provider names in a skippedProviders set and after the provider loop validate
all configured component managers (e.g., wherever component managers are
constructed/registered) to ensure none depend on a skipped provider; if any do,
fail startup with a clear error referencing the missing provider and the
dependent component manager name.

In `@rla/docs/component-manager-architecture.md`:
- Around line 250-256: The import paths in the example are inconsistent: they
use github.com/NVIDIA/ncx-infra-controller-rest/... but the repository uses
github.com/NVIDIA/infra-controller-rest/; update the import strings in the
example (e.g., the imports referencing componentmanager, providerapi,
myapiprovider, common, operations, devicetypes) to use the correct base path
github.com/NVIDIA/infra-controller-rest/ so they match the rest of the codebase.

In `@rla/docs/component-manager-config.md`:
- Around line 37-38: The docs are ambiguous about whether an authoritative
service-loaded YAML's component_managers map can be partial; update the sentence
around "component_managers" to state the exact behavior: explicitly say whether
all three keys ("compute", "nvlswitch", "powershelf") are required when the file
is authoritative (for example: "All three component types must be present;
partial maps are not supported") or, if partial maps are merged with defaults,
state that and describe how missing keys are filled (e.g., defaults applied).
Reference the terms component_managers, compute, nvlswitch, powershelf and
"authoritative" in the new sentence so readers know which behavior applies.

---

Nitpick comments:
In `@rla/internal/task/componentmanager/builtin/config.go`:
- Around line 53-70: Replace the IIFE by a straightforward conditional: if path
!= "" call cmconfig.LoadConfig(path, decoders) and assign to config, err; else
call cmconfig.New(defaultServiceComponentManagers(), decoders) and assign to
config, err; then check err and wrap it with fmt.Errorf("load config from file:
%w", err) when path != "" or fmt.Errorf("get default config: %w", err)
otherwise—this removes the IIFE and the duplicated path != "" check while
keeping the same calls to cmconfig.LoadConfig, cmconfig.New,
defaultServiceComponentManagers, and the existing err wrapping.

In `@rla/internal/task/componentmanager/config/yaml.go`:
- Around line 116-140: The decoder.DecodeYAML error returned in
setYAMLProviderConfigs lacks provider context; update the error handling after
calling decoder.DecodeYAML(rawNode) to wrap or annotate the returned error with
the provider name (use the `name` from prepareProviderConfigForAdd) so the error
message identifies which provider failed—e.g., replace the direct return of err
with a wrapped error that includes `name` and the original err for
decoder.DecodeYAML failures, leaving other logic (prepareProviderConfigForAdd,
HasProvider, ProviderConfigs assignment) unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6d6ffa7d-c6b8-407f-9691-95efcd4e8a3f

📥 Commits

Reviewing files that changed from the base of the PR and between f02eabc and 3eef9db.

📒 Files selected for processing (33)
  • rla/cmd/serve.go
  • rla/docs/component-manager-architecture.md
  • rla/docs/component-manager-config.md
  • rla/internal/scheduler/jobs/inventorysync/inventory.go
  • rla/internal/scheduler/jobs/inventorysync/inventory_test.go
  • rla/internal/scheduler/jobs/inventorysync/job.go
  • rla/internal/scheduler/jobs/leakdetection/job.go
  • rla/internal/service/config.go
  • rla/internal/task/componentmanager/builtin/component_manager_factories.go
  • rla/internal/task/componentmanager/builtin/component_manager_factories_test.go
  • rla/internal/task/componentmanager/builtin/config.go
  • rla/internal/task/componentmanager/builtin/config_test.go
  • rla/internal/task/componentmanager/builtin/provider_config_decoders.go
  • rla/internal/task/componentmanager/builtin/provider_config_decoders_test.go
  • rla/internal/task/componentmanager/componentmanager.go
  • rla/internal/task/componentmanager/componentmanager_test.go
  • rla/internal/task/componentmanager/compute/nico/nico.go
  • rla/internal/task/componentmanager/config.go
  • rla/internal/task/componentmanager/config/config.go
  • rla/internal/task/componentmanager/config/config_test.go
  • rla/internal/task/componentmanager/config/doc.go
  • rla/internal/task/componentmanager/config/errors.go
  • rla/internal/task/componentmanager/config/yaml.go
  • rla/internal/task/componentmanager/errors.go
  • rla/internal/task/componentmanager/mock/mock.go
  • rla/internal/task/componentmanager/nvlswitch/nico/nico.go
  • rla/internal/task/componentmanager/nvlswitch/nvswitchmanager/nvswitchmanager.go
  • rla/internal/task/componentmanager/powershelf/nico/nico.go
  • rla/internal/task/componentmanager/powershelf/psm/psm.go
  • rla/internal/task/componentmanager/providerapi/errors.go
  • rla/internal/task/componentmanager/providerapi/providerapi.go
  • rla/internal/task/componentmanager/providerapi/providerapi_test.go
  • rla/internal/task/componentmanager/providerapi/registry.go
💤 Files with no reviewable changes (1)
  • rla/internal/task/componentmanager/config.go

Comment thread rla/cmd/serve.go
Comment thread rla/docs/component-manager-architecture.md Outdated
Comment thread rla/docs/component-manager-config.md Outdated
@jw-nvidia jw-nvidia force-pushed the refactor/cm-provider-bootstrap branch from 3eef9db to 0a819d4 Compare May 7, 2026 23:01
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
rla/internal/scheduler/jobs/inventorysync/job.go (1)

69-76: ⚠️ Potential issue | 🟠 Major

Reconsider unconditional NICo requirement for partial configurations.

The constructor disables the entire inventory sync job when the NICo provider is absent (lines 69–76), but runInventoryOne demonstrates capability-aware dispatch: syncNVSwitches and syncPowershelves both guard against nil clients and dispatch based on cmConfig. However, syncMachines unconditionally requires nicoClient. This creates an inconsistency: a deployment configured with PSM and NSM but intentionally without NICo would fail to sync switches and shelves despite having the necessary providers.

Either document that machine sync is mandatory (making the entire job unsyncable without NICo), or permit graceful degradation by allowing the job to initialize when partial providers are available, with individual sync functions gating their operations on client availability—aligning with the existing cmConfig-based dispatch pattern.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rla/internal/scheduler/jobs/inventorysync/job.go` around lines 69 - 76, The
constructor currently bails out when providerapi.GetTyped for
nicoprovider.ProviderName fails; instead, stop returning nil—log the warning (as
you already do) but keep constructing the job with a nil nicoProvider so
runInventoryOne can perform capability-aware dispatch; specifically, remove the
early return in the block that calls
providerapi.GetTyped[*nicoprovider.Provider], allow nicoProvider to be nil, and
update syncMachines (and any other sync* helpers like syncNVSwitches and
syncPowershelves if needed) to explicitly guard on the stored
nicoProvider/nicoClient and the cmConfig before performing NICo-specific work so
the job gracefully degrades when NICo is absent (or alternatively add a clear
documented assertion in the constructor if machine sync must be mandatory).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@rla/cmd/serve.go`:
- Around line 99-113: The loop over config.ProviderConfigs calls
providerConfig.Name() before checking for nil, which can panic; instead, first
check if providerConfig == nil and return the typed
providerapi.ProviderNotConfiguredError{Name: name} (the same error used below)
before calling providerConfig.Name() or providerConfig.NewProvider(ctx); update
the loop handling around providerConfig.Name(), providerConfig.NewProvider, and
the existing ProviderNotConfiguredError / ProviderConfigNameMismatchError
branches so nil entries yield the typed error rather than a panic.

In `@rla/docs/component-manager-config.md`:
- Around line 44-48: Update the documentation to include the NVSwitch
Manager-backed nvlswitch implementation: add the new implementation name (the
provider key used in the builtin registry for NVSwitch Manager) to the
`nvlswitch` row under "Available Implementations" and to any provider/completion
lists that currently show only `nico`/`psm` (e.g., the sections around the
existing nvlswitch/provider examples at the other noted ranges). Specifically,
edit the table row for the `nvlswitch` component and the provider completion
examples that mention `nico`/`psm` so they include the NVSwitch Manager provider
name exactly as it appears in the registry.

---

Outside diff comments:
In `@rla/internal/scheduler/jobs/inventorysync/job.go`:
- Around line 69-76: The constructor currently bails out when
providerapi.GetTyped for nicoprovider.ProviderName fails; instead, stop
returning nil—log the warning (as you already do) but keep constructing the job
with a nil nicoProvider so runInventoryOne can perform capability-aware
dispatch; specifically, remove the early return in the block that calls
providerapi.GetTyped[*nicoprovider.Provider], allow nicoProvider to be nil, and
update syncMachines (and any other sync* helpers like syncNVSwitches and
syncPowershelves if needed) to explicitly guard on the stored
nicoProvider/nicoClient and the cmConfig before performing NICo-specific work so
the job gracefully degrades when NICo is absent (or alternatively add a clear
documented assertion in the constructor if machine sync must be mandatory).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 11ee2044-5037-4867-9c56-9d15d4faae69

📥 Commits

Reviewing files that changed from the base of the PR and between 3eef9db and 0a819d4.

📒 Files selected for processing (33)
  • rla/cmd/serve.go
  • rla/docs/component-manager-architecture.md
  • rla/docs/component-manager-config.md
  • rla/internal/scheduler/jobs/inventorysync/inventory.go
  • rla/internal/scheduler/jobs/inventorysync/inventory_test.go
  • rla/internal/scheduler/jobs/inventorysync/job.go
  • rla/internal/scheduler/jobs/leakdetection/job.go
  • rla/internal/service/config.go
  • rla/internal/task/componentmanager/builtin/component_manager_factories.go
  • rla/internal/task/componentmanager/builtin/component_manager_factories_test.go
  • rla/internal/task/componentmanager/builtin/config.go
  • rla/internal/task/componentmanager/builtin/config_test.go
  • rla/internal/task/componentmanager/builtin/provider_config_decoders.go
  • rla/internal/task/componentmanager/builtin/provider_config_decoders_test.go
  • rla/internal/task/componentmanager/componentmanager.go
  • rla/internal/task/componentmanager/componentmanager_test.go
  • rla/internal/task/componentmanager/compute/nico/nico.go
  • rla/internal/task/componentmanager/config.go
  • rla/internal/task/componentmanager/config/config.go
  • rla/internal/task/componentmanager/config/config_test.go
  • rla/internal/task/componentmanager/config/doc.go
  • rla/internal/task/componentmanager/config/errors.go
  • rla/internal/task/componentmanager/config/yaml.go
  • rla/internal/task/componentmanager/errors.go
  • rla/internal/task/componentmanager/mock/mock.go
  • rla/internal/task/componentmanager/nvlswitch/nico/nico.go
  • rla/internal/task/componentmanager/nvlswitch/nvswitchmanager/nvswitchmanager.go
  • rla/internal/task/componentmanager/powershelf/nico/nico.go
  • rla/internal/task/componentmanager/powershelf/psm/psm.go
  • rla/internal/task/componentmanager/providerapi/errors.go
  • rla/internal/task/componentmanager/providerapi/providerapi.go
  • rla/internal/task/componentmanager/providerapi/providerapi_test.go
  • rla/internal/task/componentmanager/providerapi/registry.go
💤 Files with no reviewable changes (1)
  • rla/internal/task/componentmanager/config.go
✅ Files skipped from review due to trivial changes (8)
  • rla/internal/task/componentmanager/mock/mock.go
  • rla/internal/task/componentmanager/config/doc.go
  • rla/internal/task/componentmanager/builtin/component_manager_factories_test.go
  • rla/internal/task/componentmanager/config/errors.go
  • rla/internal/task/componentmanager/config/yaml.go
  • rla/internal/task/componentmanager/builtin/config_test.go
  • rla/internal/task/componentmanager/config/config.go
  • rla/docs/component-manager-architecture.md
🚧 Files skipped from review as they are similar to previous changes (7)
  • rla/internal/task/componentmanager/componentmanager_test.go
  • rla/internal/task/componentmanager/powershelf/psm/psm.go
  • rla/internal/task/componentmanager/compute/nico/nico.go
  • rla/internal/service/config.go
  • rla/internal/task/componentmanager/config/config_test.go
  • rla/internal/task/componentmanager/providerapi/providerapi_test.go
  • rla/internal/scheduler/jobs/inventorysync/inventory.go

Comment thread rla/cmd/serve.go
Comment on lines +99 to +113
for name, providerConfig := range config.ProviderConfigs {
configName := providerConfig.Name()
if name != configName {
return nil, providerapi.ProviderConfigNameMismatchError{
Name: name,
ConfigName: configName,
}
}
}

// Initialize PSM provider if configured
if config.Providers.PSM != nil {
psmProvider, err := psm.New(*config.Providers.PSM)
provider, err := providerConfig.NewProvider(ctx)
if err != nil {
log.Warn().Err(err).Msg("Unable to create PSM client (powershelf operations may not work)")
} else {
providerRegistry.Register(psmProvider)
log.Info().
Dur("timeout", config.Providers.PSM.Timeout).
Msg("Initialized PSM provider")
return nil, fmt.Errorf("create provider %q: %w", name, err)
}
if provider == nil {
return nil, providerapi.ProviderNotConfiguredError{Name: name}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard nil provider configs before calling Name().

providerConfig.Name() is dereferenced before any nil check. A nil entry in config.ProviderConfigs will panic during startup instead of returning the typed ProviderNotConfiguredError used elsewhere in this refactor.

Suggested fix
 	for name, providerConfig := range config.ProviderConfigs {
+		if providerConfig == nil {
+			return nil, providerapi.ProviderNotConfiguredError{Name: name}
+		}
+
 		configName := providerConfig.Name()
 		if name != configName {
 			return nil, providerapi.ProviderConfigNameMismatchError{
 				Name:       name,
 				ConfigName: configName,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rla/cmd/serve.go` around lines 99 - 113, The loop over config.ProviderConfigs
calls providerConfig.Name() before checking for nil, which can panic; instead,
first check if providerConfig == nil and return the typed
providerapi.ProviderNotConfiguredError{Name: name} (the same error used below)
before calling providerConfig.Name() or providerConfig.NewProvider(ctx); update
the loop handling around providerConfig.Name(), providerConfig.NewProvider, and
the existing ProviderNotConfiguredError / ProviderConfigNameMismatchError
branches so nil entries yield the typed error rather than a panic.

Comment thread rla/docs/component-manager-config.md
- split component manager config into dedicated config and builtin packages
- move provider registry APIs under providerapi, and remove componentmanager
  provider re-export wrappers.
- normalize provider config construction so YAML/default paths share config
  helpers while cmd/serve only wires provider creation.

Signed-off-by: Jin Wang <jinwan@nvidia.com>
@jw-nvidia jw-nvidia force-pushed the refactor/cm-provider-bootstrap branch from 0a819d4 to 6b50d8f Compare May 8, 2026 15:42
@jw-nvidia jw-nvidia requested a review from kunzhao-nv May 8, 2026 15:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant