Feat/rename ncx to nico#507
Conversation
Summary by CodeRabbit
WalkthroughThis pull request effectuates a complete conversion of the codebase, build/test/deploy infrastructure, configuration, and user/documentation surface from "Carbide" to "NICo". New canonical unprefixed role constants are introduced and used throughout all authorization, RBAC, and role-based checks. Every endpoint, YAML manifest, Helm chart, Docker image, test/fixture, and CLI/SDK component is updated for NICo branding and correctness. Handler/test logic generalize error and role matching for uniformity and future migration safety. ChangesNICo Rebranding and Canonical Authorization Refactor
Sequence Diagram(s)Not present—this refactor/unification targets structural, config, and naming layers globally and does not introduce new orchestrated flows between three or more runtime actors. Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
|
There was a problem hiding this comment.
Actionable comments posted: 18
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (10)
ipam/bun_test.go (1)
41-47:⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
TestDBConfig.Namerename is not propagated to test connections.Line 45 sets
Nametonicotest, but the connection helpers still hardcode"postgres"(Line 82, Line 87, Line 92), so this rename is effectively bypassed in this file.🔧 Proposed fix
func getTestBundbSession(t *testing.T) (*bun.DB, error) { cfg := getTestDBParams() - return newBunPostgres(cfg.Host, cfg.Port, cfg.User, cfg.Password, "postgres") + return newBunPostgres(cfg.Host, cfg.Port, cfg.User, cfg.Password, cfg.Name) } func getTestSqlxSession(t *testing.T) (sqlIf, error) { cfg := getTestDBParams() - return newPostgres(cfg.Host, cfg.Port, cfg.User, cfg.Password, "postgres", SSLModeDisable) + return newPostgres(cfg.Host, cfg.Port, cfg.User, cfg.Password, cfg.Name, SSLModeDisable) } func getTestBadBundbSession(t *testing.T) (*bun.DB, error) { cfg := getTestDBParams() - return newBunPostgres("1.2.3.4", cfg.Port, cfg.User, cfg.Password, "postgres") + return newBunPostgres("1.2.3.4", cfg.Port, cfg.User, cfg.Password, cfg.Name) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ipam/bun_test.go` around lines 41 - 47, The TestDBConfig value sets Name to "nicotest" in getTestDBParams but the test connection helpers in this file still hardcode "postgres", so update those helpers to use the TestDBConfig.Name field instead of the literal "postgres"; locate the connection-building functions in this file (the helpers that create DB connection strings) and replace the hardcoded database name with the passed-in cfg.Name/TestDBConfig.Name when constructing the DSN or sql.Open calls.api/pkg/api/handler/interface.go (1)
72-80:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winCorrect the Swagger path parameter name to match the route placeholder.
The
@Paramannotation usesinstanceId(camelCase), but the corresponding@Routerplaceholder isinstance_id(snake_case). This inconsistency will cause the generated OpenAPI documentation to desynchronize with the actual route definition.Proposed fix
-// `@Param` instanceId path string true "ID of Instance" +// `@Param` instance_id path string true "ID of Instance" // `@Param` status query string false "Filter by status" e.g. 'Pending', 'Error'" // `@Param` includeRelation query string false "Related entities to include in response e.g. 'Instance', 'Subnet'" // `@Param` pageNumber query integer false "Page number of results returned" // `@Param` pageSize query integer false "Number of results per page" // `@Param` orderBy query string false "Order by field" // `@Success` 200 {object} model.APIInterface // `@Router` /v2/org/{org}/nico/instance/{instance_id}/interface [get]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/pkg/api/handler/interface.go` around lines 72 - 80, Update the Swagger `@Param` for the path parameter in GetAllInterfaceHandler.Handle so the parameter name matches the route placeholder: change the path param identifier from instanceId to instance_id in the `@Param` annotation used for the instance ID; ensure the edited annotation exactly matches the `@Router` placeholder {instance_id} to keep generated OpenAPI docs in sync with the route.api/pkg/api/handler/infinibandinterface.go (1)
73-82:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winCorrect Swagger path parameter declarations to match route templates.
The annotation blocks contain misaligned path parameters: Block 1 declares
instanceId(camelCase) but the route template uses{instance_id}(snake_case); Block 2 declaresinstanceIdandinfinibandPartitionIdas path parameters despite the route containing no such placeholders. These inconsistencies produce incorrect OpenAPI documentation.Proposed annotation fixes
-// `@Param` instanceId path string true "ID of Instance" +// `@Param` instance_id path string true "ID of Instance" // `@Param` status query string false "Filter by status" e.g. 'Pending', 'Error'" // `@Param` includeRelation query string false "Related entities to include in response e.g. 'Instance', 'Subnet'" // `@Param` pageNumber query integer false "Page number of results returned" // `@Param` pageSize query integer false "Number of results per page" // `@Param` orderBy query string false "Order by field" // `@Success` 200 {object} model.APIInterface // `@Router` /v2/org/{org}/nico/instance/{instance_id}/interface [get]For lines 128–137, remove the spurious path declarations:
// `@Param` org path string true "Name of NGC organization" // `@Param` siteId query string true "ID of Site" -// `@Param` instanceId path string true "ID of Instance" -// `@Param` infinibandPartitionId path string true "ID of InfiniBandPartition" // `@Param` status query string false "Filter by status" e.g. 'Pending', 'Error'" // `@Param` includeRelation query string false "Related entities to include in response e.g. 'InfiniBandPartition, Instance'" // `@Param` pageNumber query integer false "Page number of results returned" // `@Param` pageSize query integer false "Number of results per page" // `@Param` orderBy query string false "Order by field" // `@Success` 200 {object} model.APIInfiniBandInterface // `@Router` /v2/org/{org}/nico/infiniband-interface [get]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/pkg/api/handler/infinibandinterface.go` around lines 73 - 82, The Swagger annotations for GetAllInstanceInfiniBandInterfaceHandler.Handle contain incorrect path parameter names and extraneous path params; update the comment block above Handle so path parameters exactly match the route template (use instance_id not instanceId) and remove any path declarations that are not present in the route (e.g., infinibandPartitionId and any other spurious path params); ensure `@Param` lines reflect only actual path placeholders and correct casing to produce accurate OpenAPI docs.cli/pkg/app.go (1)
52-98:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRestore legacy
CARBIDE_*env-var fallbacks for CLI flags.Lines 52–98 only register
NICO_*env vars. This can break existing scripts that still export legacy names. Keep both during transition and preferNICO_*first.🔧 Proposed compatibility patch
&cli.StringFlag{ Name: "config", Usage: "Path to config file", - EnvVars: []string{"NICO_CONFIG"}, + EnvVars: []string{"NICO_CONFIG", "CARBIDE_CONFIG"}, }, &cli.StringFlag{ Name: "base-url", Usage: "API base URL", - EnvVars: []string{"NICO_BASE_URL"}, + EnvVars: []string{"NICO_BASE_URL", "CARBIDE_BASE_URL"}, Value: defaultBaseURL, }, &cli.StringFlag{ Name: "org", Usage: "Organization name", - EnvVars: []string{"NICO_ORG"}, + EnvVars: []string{"NICO_ORG", "CARBIDE_ORG"}, }, &cli.StringFlag{ Name: "token", Usage: "API bearer token", - EnvVars: []string{"NICO_TOKEN"}, + EnvVars: []string{"NICO_TOKEN", "CARBIDE_TOKEN"}, }, @@ &cli.StringFlag{ Name: "token-url", Usage: "OIDC token endpoint URL for login and token refresh", - EnvVars: []string{"NICO_TOKEN_URL"}, + EnvVars: []string{"NICO_TOKEN_URL", "CARBIDE_TOKEN_URL"}, }, &cli.StringFlag{ Name: "keycloak-url", Usage: "Keycloak base URL (constructs token-url if --token-url is not set)", - EnvVars: []string{"NICO_KEYCLOAK_URL"}, + EnvVars: []string{"NICO_KEYCLOAK_URL", "CARBIDE_KEYCLOAK_URL"}, }, &cli.StringFlag{ Name: "keycloak-realm", Usage: "Keycloak realm (used with --keycloak-url)", - EnvVars: []string{"NICO_KEYCLOAK_REALM"}, + EnvVars: []string{"NICO_KEYCLOAK_REALM", "CARBIDE_KEYCLOAK_REALM"}, Value: "nico-dev", }, &cli.StringFlag{ Name: "client-id", Usage: "OAuth client ID", - EnvVars: []string{"NICO_CLIENT_ID"}, + EnvVars: []string{"NICO_CLIENT_ID", "CARBIDE_CLIENT_ID"}, Value: "nico-api", },🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cli/pkg/app.go` around lines 52 - 98, The CLI flags currently only use NICO_* environment variable names (e.g., flags with EnvVars: []string{"NICO_CONFIG"}, "NICO_BASE_URL", "NICO_ORG", "NICO_TOKEN", "NICO_TOKEN_URL", "NICO_KEYCLOAK_URL", "NICO_KEYCLOAK_REALM", "NICO_CLIENT_ID"); restore backward-compatible CARBIDE_* fallbacks by updating each flag's EnvVars slice to include both names with the NICO_* name first (for example EnvVars: []string{"NICO_CONFIG", "CARBIDE_CONFIG"}) so existing scripts still work while preferring the new NICO variables; do this for all the flags that currently reference only NICO_* (leave flags that had no env var like token-command/debug unchanged) and keep existing Value/default settings.api/pkg/api/handler/fabric.go (1)
67-74:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix Swagger
siteIdparameter annotations to match the path routes.Line 73 and Line 235 routes require
{siteId}in the path, but docs are inconsistent:GetAllomits it andGetmarks it as query on Line 232. This causes inaccurate OpenAPI output.📝 Suggested annotation correction
// `@Param` org path string true "Name of NGC organization" +// `@Param` siteId path string true "ID of Site" // `@Param` includeRelation query string false "Related entities to include in response e.g. 'InfrastructureProvider', 'Site'" @@ -// `@Param` siteId query string true "ID of Site" +// `@Param` siteId path string true "ID of Site"Also applies to: 230-235
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/pkg/api/handler/fabric.go` around lines 67 - 74, The Swagger annotations for the Fabric handlers are missing/incorrect for the siteId path parameter; update the doc comments for GetAllFabricHandler.Handle (and the corresponding Get handler) so that `@Param` siteId is declared as a path parameter (e.g. `@Param` siteId path string true "Site ID") rather than omitted or marked as query, and ensure the `@Router` paths include {siteId} to match the actual route signatures; locate the comment blocks above the Handle methods in fabric.go and replace the incorrect `@Param` for siteId (and any misplaced query annotation) with the correct path declaration.helm/charts/nico-rest/charts/nico-rest-cert-manager/values.yaml (1)
19-20:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winInconsistent image name still references "carbide".
The
image.namevalue remainscarbide-rest-cert-managerwhile other identifiers in this chart have been renamed to usenico-rest-cert-manager. This inconsistency may cause confusion or deployment failures if the container image registry is also being updated.Proposed fix
image: - name: carbide-rest-cert-manager + name: nico-rest-cert-manager🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@helm/charts/nico-rest/charts/nico-rest-cert-manager/values.yaml` around lines 19 - 20, The image.name value is inconsistent and should be updated from "carbide-rest-cert-manager" to "nico-rest-cert-manager"; edit the values.yaml entry for image.name to use "nico-rest-cert-manager" and then search the chart for any remaining "carbide-rest-cert-manager" references (templates, Chart.yaml, other values keys) to ensure all identifiers and image references are renamed consistently.api/pkg/api/handler/expectedmachine_test.go (1)
1902-1917:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMigrate the remaining regression tests to canonical NICo route/role inputs.
Line 1902 and Line 1995 still hit
/carbide/..., and Line 1916 and Line 2010 still use hardcodedFORGE_PROVIDER_ADMIN. These two tests now validate legacy compatibility rather than the primary NICo surface, so NICo-specific regressions could slip through.Proposed patch
- req := httptest.NewRequest(http.MethodPost, "/v2/org/"+org+"/carbide/expected-machine", bytes.NewReader(reqBody)) + req := httptest.NewRequest(http.MethodPost, "/v2/org/"+org+"/nico/expected-machine", bytes.NewReader(reqBody)) @@ - Roles: []string{"FORGE_PROVIDER_ADMIN"}, + Roles: []string{authz.ProviderAdminRole}, @@ - url := "/v2/org/" + org + "/carbide/expected-machine/" + testEM.ID.String() + url := "/v2/org/" + org + "/nico/expected-machine/" + testEM.ID.String() @@ - Roles: []string{"FORGE_PROVIDER_ADMIN"}, + Roles: []string{authz.ProviderAdminRole},Also applies to: 1995-2011
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/pkg/api/handler/expectedmachine_test.go` around lines 1902 - 1917, Update the test to use the canonical NICo route and role inputs instead of the legacy carbide/ and FORGE_PROVIDER_ADMIN values: change the request path created in req (currently "/v2/org/"+org+"/carbide/expected-machine") to the NICo path "/v2/org/"+org+"/nico/expected-machine", and replace the hardcoded role in the c.Set("user", &cdbm.User{... Roles: []string{"FORGE_PROVIDER_ADMIN"}, ...}) block with the NICo role (use the project canonical NICo role constant if available, otherwise "NICO_PROVIDER_ADMIN") so the test validates the primary NICo surface rather than legacy compatibility.auth/pkg/authentication/keycloak_test.go (1)
1809-1823:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRemove the unreachable duplicate token-endpoint case.
Lines 1810 and 1819 contain identical predicates in the same switch statement. The first case matches and executes, rendering the second case unreachable dead code. The token-exchange error path in the mock server is never exercised.
Suggested fix
switch { case strings.Contains(req.URL.Path, "/realms/nico/protocol/openid-connect/token"): // Always return unauthorized for admin login res.WriteHeader(http.StatusUnauthorized) res.Write([]byte(`{"error":"invalid_grant","error_description":"Invalid credentials"}`)) case strings.Contains(req.URL.Path, "/admin/realms/nico/identity-provider/instances"): // Return empty IDP list res.WriteHeader(http.StatusOK) res.Header().Set("Content-Type", "application/json") res.Write([]byte(`[]`)) - case strings.Contains(req.URL.Path, "/realms/nico/protocol/openid-connect/token"): - // Return error for token exchange - res.WriteHeader(http.StatusBadRequest) - res.Write([]byte(`{"error":"invalid_grant","error_description":"Authorization code expired"}`)) default: res.WriteHeader(http.StatusNotFound) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@auth/pkg/authentication/keycloak_test.go` around lines 1809 - 1823, The switch in the test HTTP handler in keycloak_test.go has two identical cases using strings.Contains(req.URL.Path, "/realms/nico/protocol/openid-connect/token"), making the second (token-exchange error) branch unreachable; remove the duplicate predicate and ensure there are distinct predicates so admin login returns 401 while the token-exchange path returns 400. Locate the handler switch (the cases using strings.Contains(req.URL.Path, "/realms/nico/protocol/openid-connect/token") and the token-exchange error branch) and either remove/merge the redundant case or change the second case's predicate to the actual token-exchange endpoint used by the code under test so the token-exchange error path is exercised.deploy/kustomize/base/site-agent/rbac.yaml (1)
31-34:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftRestrict secret permissions to explicitly required resources for least-privilege compliance.
The Role grants access to all secrets in the namespace. The site-agent statefulset references only three specific secrets (
site-registration,core-grpc-client-site-agent-certs,temporal-client-site-agent-certs), making the current broad permission unnecessary. ApplyresourceNamesto limit access to only these required secrets:rules: - apiGroups: [""] resources: ["secrets"] resourceNames: ["site-registration", "core-grpc-client-site-agent-certs", "temporal-client-site-agent-certs"] verbs: ["get", "list", "watch", "update", "patch"] - apiGroups: ["cert-manager.io"] resources: ["certificaterequests"] verbs: ["get", "list", "watch", "update", "patch"]Splitting the rules allows certificaterequests to remain unrestricted (cert-manager creates these dynamically), while core secret access is scoped to known resources.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deploy/kustomize/base/site-agent/rbac.yaml` around lines 31 - 34, The Role rule currently grants namespace-wide secret access; split the combined rule into two rules: one for secrets that adds resourceNames limiting access to only the three secrets referenced by the site-agent (site-registration, core-grpc-client-site-agent-certs, temporal-client-site-agent-certs) and a separate rule for cert-manager.io certificaterequests (leave certificaterequests unrestricted), ensuring you update the rules block to reference resources ["secrets"] with resourceNames set to those three names and resources ["certificaterequests"] under apiGroups ["cert-manager.io"] for the other rule.deploy/INSTALLATION.md (1)
899-906:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAlign the token example with the documented seeded admin user.
This guide lists the pre-seeded admin username as
admin, but the curl example authenticates asadmin@example.com. That inconsistency is enough to block first-time setup unless email login is explicitly enabled and documented here.As per coding guidelines,
**/*.md: "Assess the documentation for misspellings, grammatical errors, missing documentation and correctness".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deploy/INSTALLATION.md` around lines 899 - 906, The token curl example uses username=admin@example.com which conflicts with the documented seeded admin user "admin"; update the TOKEN acquisition snippet (the curl POST that sets TOKEN and uses client_id=nico-api and realm nico-dev) to use username=admin (or clearly document that email login must be enabled) so the example matches the seeded admin credentials in the guide.
🟡 Minor comments (15)
api/pkg/api/handler/machinevalidation.go-697-697 (1)
697-697:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAlign router path placeholder names with handler parameter keys.
Lines 697/833/1098/1386/1594 use
{id}or{name}, while handlers readmachineID/cfgNamefromc.Param(...). Keep these names consistent to prevent ambiguous or incorrect generated API docs.Proposed fix
-// `@Router` /v2/org/{org}/nico/site/{site}/machine-validation/results/machine/{id} [get] +// `@Router` /v2/org/{org}/nico/site/{site}/machine-validation/results/machine/{machineID} [get] -// `@Router` /v2/org/{org}/nico/site/{site}/machine-validation/runs/machine/{id} [get] +// `@Router` /v2/org/{org}/nico/site/{site}/machine-validation/runs/machine/{machineID} [get] -// `@Router` /v2/org/{org}/nico/site/{site}/machine-validation/external-config/{name} [get] +// `@Router` /v2/org/{org}/nico/site/{site}/machine-validation/external-config/{cfgName} [get] -// `@Router` /v2/org/{org}/nico/site/{site}/machine-validation/external-config/{name} [patch] +// `@Router` /v2/org/{org}/nico/site/{site}/machine-validation/external-config/{cfgName} [patch] -// `@Router` /v2/org/{org}/nico/site/{site}/machine-validation/external-config/{name} [delete] +// `@Router` /v2/org/{org}/nico/site/{site}/machine-validation/external-config/{cfgName} [delete]Also applies to: 833-833, 1098-1098, 1386-1386, 1594-1594
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/pkg/api/handler/machinevalidation.go` at line 697, Route placeholder names in the swagger route comments must match the handler parameter keys; update each router comment that currently uses {id} or {name} to use {machineID} or {cfgName} respectively so they match c.Param("machineID") and c.Param("cfgName") used in the handlers (e.g., the comment above the handler that reads c.Param("machineID") should use /machine/{machineID} and the one that reads c.Param("cfgName") should use /{cfgName}); locate the affected router comments near the functions that call c.Param("machineID") / c.Param("cfgName") and replace the placeholder tokens accordingly to keep generated API docs unambiguous.api/pkg/api/handler/nvlinkinterface.go-81-81 (1)
81-81:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUse consistent instance path parameter naming in router annotation.
Line 81 uses
{instance_id}but the handler and@ParamuseinstanceId. This inconsistency can degrade generated API docs/clients.Proposed fix
-// `@Router` /v2/org/{org}/nico/instance/{instance_id}/interface [get] +// `@Router` /v2/org/{org}/nico/instance/{instanceId}/interface [get]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/pkg/api/handler/nvlinkinterface.go` at line 81, The router annotation uses a different instance path parameter name ({instance_id}) than the handler/@Param (instanceId); update the `@Router` annotation to use {instanceId} so it matches the `@Param` and handler signature in nvlinkinterface.go (ensure the route line with "/v2/org/{org}/nico/instance/{instanceId}/interface [get]" matches the existing `@Param` instanceId and the handler function that expects instanceId).api/pkg/api/handler/site.go-1169-1169 (1)
1169-1169:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix path casing mismatch in status-history route annotation.
Line 1169 uses
/nico/Site/{id}/...while the rest of this file uses lowercasesite. This can produce an incorrect OpenAPI path for consumers.Suggested fix
-// `@Router` /v2/org/{org}/nico/Site/{id}/status-history [get] +// `@Router` /v2/org/{org}/nico/site/{id}/status-history [get]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/pkg/api/handler/site.go` at line 1169, Update the Router annotation string that currently reads "/v2/org/{org}/nico/Site/{id}/status-history" to use lowercase "site" so it matches the rest of the file and generated OpenAPI paths; i.e., change the annotation in site.go for the status-history route to "/v2/org/{org}/nico/site/{id}/status-history" (the comment/annotation on the status-history handler)..github/scripts/promote-single-ngc-resource.sh-32-33 (1)
32-33:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winKeep the promotion log aligned with
DEST_TEAM.Line 77 now reports an upload to
nico, but Lines 78-79 still authenticate and push toDEST_TEAM, which remainscarbide. That makes promotion logs misleading until the actual destination team changes.Suggested fix
- echo "Uploading to nico/$RESOURCE_NAME:$SEMANTIC_VERSION..." + echo "Uploading to ${DEST_TEAM}/$RESOURCE_NAME:$SEMANTIC_VERSION..."Also applies to: 77-79
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/scripts/promote-single-ngc-resource.sh around lines 32 - 33, The promotion log prints a hardcoded "nico" while the script uses the DEST_TEAM variable ("carbide"), causing misleading messages; update the upload/promotion log to reference the DEST_TEAM variable (use "${DEST_TEAM}" consistently) instead of the literal "nico" and ensure the subsequent authentication/push commands that already use DEST_TEAM remain in sync with that log message so all output reflects the actual destination team.cli/README.md-183-189 (1)
183-189:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd a language identifier to this fenced block.
This block is still unlabeled, so markdownlint MD040 will keep warning on the README.
textis sufficient here.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cli/README.md` around lines 183 - 189, The README's fenced code block listing config files is unlabeled causing markdownlint MD040; update the triple-backtick fence that encloses the lines starting with "~/.nico/config.yaml" so it becomes "```text" (i.e., add the language identifier "text" to the opening fence) to satisfy the linter without changing the block content.cli/README.md-147-147 (1)
147-147:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUse
nicocliin the command synopsis.Line 147 is the only place still documenting the command as
cli <resource> ..., which contradicts the install and usage sections and points readers at a nonexistent binary name.As per coding guidelines "Assess the documentation for misspellings, grammatical errors, missing documentation and correctness".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cli/README.md` at line 147, Replace the incorrect command synopsis text "cli <resource> [sub-resource] <action> [args] [flags]" in the README with the actual binary name used elsewhere by the project: "nicocli <resource> [sub-resource] <action> [args] [flags]"; update that single occurrence so the command synopsis matches the install and usage sections and points to the real executable.api/pkg/api/handler/vpc_test.go-2021-2021 (1)
2021-2021:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUse the actual virtualization route template here.
Line 2021 still sets
/v2/org/%v/nico/vpc/%v, but the handler is documented and routed as/v2/org/{org}/nico/vpc/{id}/virtualization. Keeping the Echo path aligned prevents this test from missing path-template-dependent behavior in middleware and tracing.Suggested fix
- ec.SetPath(fmt.Sprintf("/v2/org/%v/nico/vpc/%v", tt.args.reqOrg, tt.args.reqVPCID)) + ec.SetPath(fmt.Sprintf("/v2/org/%v/nico/vpc/%v/virtualization", tt.args.reqOrg, tt.args.reqVPCID))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/pkg/api/handler/vpc_test.go` at line 2021, The test currently sets the Echo request path to "/v2/org/%v/nico/vpc/%v" which does not match the handler's routed template; update the ec.SetPath call in vpc_test.go to use the full virtualization route "/v2/org/%v/nico/vpc/%v/virtualization" (still formatting with tt.args.reqOrg and tt.args.reqVPCID) so that middleware/tracing and path-template-dependent behavior for the handler (referenced by ec.SetPath) are exercised correctly.auth/README.md-199-201 (1)
199-201:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAlign Keycloak example values with the shipped kustomize defaults.
This guide tells users to edit the existing
nico-rest-api-config, but the sample usesrealm: nico/clientID: nico-cloudwhile the base manifest usesrealm: nico-dev/clientID: nico-api. Please make the difference explicit (e.g., “production override”) or make values consistent to avoid broken copy/paste setups.As per coding guidelines,
**/*.md: "Assess the documentation for misspellings, grammatical errors, missing documentation and correctness".Also applies to: 230-233
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@auth/README.md` around lines 199 - 201, Update the Keycloak example values in the README so they match the shipped kustomize defaults or clearly mark them as production overrides: change the sample for `keycloak.realm` and `keycloak.clientID` to the base manifest values (`nico-dev` and `nico-api`) or add an explicit note (e.g., “production override: realm=nico, clientID=nico-cloud”) next to `keycloak.realm`, `keycloak.clientID`, and keep `keycloak.clientSecretPath` as-is; ensure the same change is applied to the other occurrence mentioned (lines 230-233) so copy/paste won’t break users..github/workflows/lint-and-test.yml-306-315 (1)
306-315:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winEnsure mock servers are always stopped when tests fail.
With current flow, a failing
go testexits before the stop commands execute. Usetrapso cleanup runs on all exit paths.Proposed change
if [ "${{ matrix.module }}" = "site-agent" ]; then make nico-mock-server-start make rla-mock-server-start + cleanup() { + make nico-mock-server-stop || true + make rla-mock-server-stop || true + } + trap cleanup EXIT fi go test -v -race -p 1 ./${{ matrix.module }}/... -coverprofile=coverage-${{ matrix.module }}.txt -covermode=atomic 2>&1 | tee test-output-${{ matrix.module }}.txt - - if [ "${{ matrix.module }}" = "site-agent" ]; then - make nico-mock-server-stop - make rla-mock-server-stop - fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/lint-and-test.yml around lines 306 - 315, The test job may exit early on failure and skip stopping the mock servers; wrap the lifecycle of the mocks with a trap so they are stopped on any exit. Specifically, after starting the mocks via make nico-mock-server-start and make rla-mock-server-start, register a trap that runs make nico-mock-server-stop and make rla-mock-server-stop on EXIT (or ERR/INT) before running the go test command; ensure the trap is cleared or remains in place as appropriate so the stop targets always execute even if go test fails.cli/pkg/commands.go-367-367 (1)
367-367:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUse the actual CLI binary name in help and auth guidance.
Line 367 and Line 721 use
cli, but users invokenicocli; this makes usage/help and recovery messaging misleading.Proposed fix
+const cliBinaryName = "nicocli" + func buildActionCommand(spec *Spec, ro resolvedOp, subResource string) *cli.Command { @@ - usageText := "cli " + tagToCommand(ro.tag) + usageText := cliBinaryName + " " + tagToCommand(ro.tag) @@ - return nil, fmt.Errorf("no token available; run 'cli login' or set --token / NICO_TOKEN") + return nil, fmt.Errorf("no token available; run '%s login' or set --token / NICO_TOKEN", cliBinaryName) }Also applies to: 721-721
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cli/pkg/commands.go` at line 367, The help/usage string is hardcoded to "cli " which is wrong for users; update the construction of usage text (usageText := "cli " + tagToCommand(ro.tag)) and any other hardcoded "cli" occurrences (e.g., the message around line 721) to use the real binary name instead—either replace the literal with "nicocli " or derive it dynamically (e.g., from a single shared constant like CLIName or os.Args[0]) so that tagToCommand(ro.tag) is prefixed by the actual CLI binary name in all help/auth messages..github/workflows/README.md-132-132 (1)
132-132:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix Docker tag format typo in docs.
Line 132 uses
=in<image>:1.0.0=<short-sha>, but Docker tags should use-(as your own example already shows).Proposed doc fix
-1. **VERSION + Git Short SHA**: `<image>:1.0.0=<short-sha>` (e.g., `nico-rest-api:1.0.0-abc1234`) +1. **VERSION + Git Short SHA**: `<image>:1.0.0-<short-sha>` (e.g., `nico-rest-api:1.0.0-abc1234`)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/README.md at line 132, The doc has a typo in the Docker tag example: change the incorrect `<image>:1.0.0=<short-sha>` to use a hyphen instead of `=` so the tag format becomes `<image>:1.0.0-<short-sha>` (e.g., `nico-rest-api:1.0.0-abc1234`) to match Docker tag semantics and the example shown.helm/README.md-44-45 (1)
44-45:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDisambiguate chart path commands for copy-paste reliability.
These commands assume the current directory is
helm/. If run from repo root,charts/...fails. Either prependhelm/in command paths or explicitly statecd helmbefore examples.Example normalization (repo-root friendly)
-helm upgrade --install nico-rest charts/nico-rest/ \ +helm upgrade --install nico-rest helm/charts/nico-rest/ \ ... -helm upgrade --install nico-rest-site-agent charts/nico-rest-site-agent/ \ +helm upgrade --install nico-rest-site-agent helm/charts/nico-rest-site-agent/ \As per coding guidelines:
**/*.md: "Assess the documentation for misspellings, grammatical errors, missing documentation and correctness".Also applies to: 68-69, 105-106, 118-119
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@helm/README.md` around lines 44 - 45, Update the helm command examples so they work when run from the repository root: either prefix chart paths with "helm/" (e.g., change "charts/nico-rest/" to "helm/charts/nico-rest/") or add an explicit instruction to "cd helm" before the examples; apply the same normalization to the other affected command examples noted (lines showing "helm upgrade --install ... charts/...") so copy-paste from repo root doesn't break and the README is unambiguous.deploy/README.md-159-159 (1)
159-159:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix relative CLI link target.
Line 159 uses
cli/README.md, which is relative todeploy/and likely resolves incorrectly. Use../cli/README.mdso the link works from this document.Proposed fix
-- **CLI** — install `nicocli` to interact with the deployed cluster. See [cli/README.md](cli/README.md). +- **CLI** — install `nicocli` to interact with the deployed cluster. See [cli/README.md](../cli/README.md).As per coding guidelines:
**/*.md: "Assess the documentation for misspellings, grammatical errors, missing documentation and correctness".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deploy/README.md` at line 159, The link target "cli/README.md" in the README is incorrect for the deploy/ document; update the link string to "../cli/README.md" so the CLI README resolves correctly when referenced from deploy/ (replace the occurrence of "cli/README.md" with "../cli/README.md")..github/workflows/promotion.yaml-55-59 (1)
55-59:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate approval message to reflect NICo branding.
The approval step message still references "Carbide REST" instead of "NICo REST", creating inconsistent user-facing messaging.
📝 Proposed fix
- name: Approve Promote to NICo Registry run: | echo "Received build artifacts: ${{ inputs.build_artifacts }}" echo "Approve Promote to NICo Registry" - echo "Promote Carbide REST to registry ngcr.io/0837451325059433/carbide" + echo "Promote NICo REST to registry ngcr.io/0837451325059433/carbide"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/promotion.yaml around lines 55 - 59, Update the approval step's user-facing echo to use the NICo branding: in the GitHub Actions step with name "Approve Promote to NICo Registry" replace the echo that currently says "Promote Carbide REST to registry ngcr.io/0837451325059433/carbide" with "Promote NICo REST to registry ngcr.io/0837451325059433/carbide" (leave the other echo lines unchanged).helm/charts/nico-rest/Chart.yaml-8-9 (1)
8-9:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRemove duplicate keyword.
The
keywordslist containsnicotwice. Remove the duplicate entry.🧹 Proposed fix
keywords: - nico - - nico - bare-metal🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@helm/charts/nico-rest/Chart.yaml` around lines 8 - 9, The Chart.yaml keywords array contains a duplicate entry "nico"; locate the keywords section in Chart.yaml and remove the repeated "nico" so each keyword appears only once (leave a single "nico" entry in the keywords list).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 61eb3e4c-9857-42ca-8a37-6778095b5277
⛔ Files ignored due to path filters (2)
helm/charts/carbide-rest/Chart.lockis excluded by!**/*.lockhelm/charts/nico-rest/Chart.lockis excluded by!**/*.lock
📒 Files selected for processing (298)
.coderabbit.yaml.github/scripts/promote-single-ngc-resource.sh.github/workflows/README.md.github/workflows/build-push-docker.yml.github/workflows/build-push-service.yml.github/workflows/helm-workflows.yml.github/workflows/lint-and-test.yml.github/workflows/main-build.yml.github/workflows/prepare-build-info.yml.github/workflows/promotion.yaml.github/workflows/slack-notifications.yml.gitignoreAGENTS.mdCHANGELOG.mdCODE_OF_CONDUCT.mdCONTRIBUTING.mdMakefileREADME.mdapi/cmd/api/main.goapi/config.yamlapi/internal/config/config.goapi/internal/config/db_test.goapi/internal/server/server.goapi/internal/server/server_test.goapi/pkg/api/handler/allocation.goapi/pkg/api/handler/allocation_test.goapi/pkg/api/handler/allocationconstraint.goapi/pkg/api/handler/allocationconstraint_test.goapi/pkg/api/handler/audit_entry.goapi/pkg/api/handler/audit_entry_test.goapi/pkg/api/handler/dpuextensionservice.goapi/pkg/api/handler/dpuextensionservice_test.goapi/pkg/api/handler/expectedmachine.goapi/pkg/api/handler/expectedmachine_test.goapi/pkg/api/handler/expectedpowershelf.goapi/pkg/api/handler/expectedpowershelf_test.goapi/pkg/api/handler/expectedswitch.goapi/pkg/api/handler/expectedswitch_test.goapi/pkg/api/handler/fabric.goapi/pkg/api/handler/fabric_test.goapi/pkg/api/handler/infinibandinterface.goapi/pkg/api/handler/infinibandinterface_test.goapi/pkg/api/handler/infinibandpartition.goapi/pkg/api/handler/infinibandpartition_test.goapi/pkg/api/handler/infrastructureprovider.goapi/pkg/api/handler/infrastructureprovider_test.goapi/pkg/api/handler/instance.goapi/pkg/api/handler/instance_test.goapi/pkg/api/handler/instancebatch.goapi/pkg/api/handler/instancebatch_test.goapi/pkg/api/handler/instancetype.goapi/pkg/api/handler/instancetype_test.goapi/pkg/api/handler/interface.goapi/pkg/api/handler/interface_test.goapi/pkg/api/handler/ipblock.goapi/pkg/api/handler/ipblock_test.goapi/pkg/api/handler/machine.goapi/pkg/api/handler/machine_test.goapi/pkg/api/handler/machinecapability.goapi/pkg/api/handler/machinecapability_test.goapi/pkg/api/handler/machineinstancetype.goapi/pkg/api/handler/machineinstancetype_test.goapi/pkg/api/handler/machinevalidation.goapi/pkg/api/handler/machinevalidation_test.goapi/pkg/api/handler/metadata.goapi/pkg/api/handler/networksecuritygroup.goapi/pkg/api/handler/networksecuritygroup_test.goapi/pkg/api/handler/nvlinkinterface.goapi/pkg/api/handler/nvlinkinterface_test.goapi/pkg/api/handler/nvlinklogicalpartition.goapi/pkg/api/handler/nvlinklogicalpartition_test.goapi/pkg/api/handler/operatingsystem.goapi/pkg/api/handler/operatingsystem_test.goapi/pkg/api/handler/rack.goapi/pkg/api/handler/rack_test.goapi/pkg/api/handler/serviceaccount.goapi/pkg/api/handler/serviceaccount_test.goapi/pkg/api/handler/site.goapi/pkg/api/handler/site_test.goapi/pkg/api/handler/sku.goapi/pkg/api/handler/sku_test.goapi/pkg/api/handler/sshkey.goapi/pkg/api/handler/sshkey_test.goapi/pkg/api/handler/sshkeygroup.goapi/pkg/api/handler/sshkeygroup_test.goapi/pkg/api/handler/stats.goapi/pkg/api/handler/stats_test.goapi/pkg/api/handler/subnet.goapi/pkg/api/handler/subnet_test.goapi/pkg/api/handler/task.goapi/pkg/api/handler/task_test.goapi/pkg/api/handler/tenant.goapi/pkg/api/handler/tenant_test.goapi/pkg/api/handler/tenantaccount.goapi/pkg/api/handler/tenantaccount_test.goapi/pkg/api/handler/tray.goapi/pkg/api/handler/tray_test.goapi/pkg/api/handler/user.goapi/pkg/api/handler/util/common/common.goapi/pkg/api/handler/util/common/common_test.goapi/pkg/api/handler/util/common/testing.goapi/pkg/api/handler/vpc.goapi/pkg/api/handler/vpc_test.goapi/pkg/api/handler/vpcpeering.goapi/pkg/api/handler/vpcpeering_test.goapi/pkg/api/handler/vpcprefix.goapi/pkg/api/handler/vpcprefix_test.goapi/pkg/api/model/audit_entry_test.goapi/pkg/api/model/dpuextensionservice.goapi/pkg/api/model/expectedmachine_test.goapi/pkg/api/model/health.goapi/pkg/api/model/infrastructureprovider.goapi/pkg/api/model/ipblock.goapi/pkg/api/model/machine.goapi/pkg/api/model/machine_test.goapi/pkg/api/model/metadata.goapi/pkg/api/model/networksecuritygroup.goapi/pkg/api/model/networksecuritygroup_test.goapi/pkg/api/model/rack_test.goapi/pkg/api/model/site.goapi/pkg/api/model/site_test.goapi/pkg/api/model/subnet.goapi/pkg/api/model/tenant.goapi/pkg/api/model/tenantaccount.goapi/pkg/api/model/tray_test.goapi/pkg/api/model/user.goapi/pkg/api/model/vpc.goapi/pkg/api/model/vpcpeering.goauth/README.mdauth/pkg/api/handler/keycloak_test.goauth/pkg/authentication/keycloak_test.goauth/pkg/authentication/middleware_test.goauth/pkg/authorization/org.goauth/pkg/authorization/org_test.goauth/pkg/config/jwks.goauth/pkg/config/jwks_test.goauth/pkg/config/keycloak_test.goauth/pkg/config/roles.goauth/pkg/core/claim/keycloak_test.goauth/pkg/processors/custom_test.goauth/pkg/testing/constants.gocert-manager/pkg/certs/cli.gocert-manager/pkg/pki/ca_helpers_test.gocert-manager/pkg/pki/pki_test.gocli/README.mdcli/cmd/cli/main.gocli/pkg/app.gocli/pkg/auth.gocli/pkg/auth_test.gocli/pkg/client.gocli/pkg/commands.gocli/pkg/commands_test.gocli/pkg/config.gocli/pkg/config_test.gocli/pkg/output.gocli/pkg/spec.gocli/pkg/spec_test.gocli/tui/commands.gocli/tui/commands_test.gocli/tui/configselector.gocli/tui/repl.gocli/tui/session.gocommon/pkg/roles/roles.godb/cmd/migrations/main.godb/create-postgres-extensions.shdb/pkg/db/model/audit_entry_test.godb/pkg/db/model/expectedmachine.godb/pkg/db/model/expectedpowershelf.godb/pkg/db/model/expectedswitch.godb/pkg/db/model/sku.godb/pkg/db/model/tenantsite_test.godb/pkg/db/model/user_test.godb/pkg/db/model/vpc.godb/pkg/migrations/migrations_test.godb/pkg/util/testing.godb/scripts/README.mddb/scripts/setup.sqldeploy/INSTALLATION.mddeploy/README.mddeploy/kind/cluster-config.yamldeploy/kustomize/base/api/configmap.yamldeploy/kustomize/base/api/deployment.yamldeploy/kustomize/base/api/service.yamldeploy/kustomize/base/cert-manager-io/cluster-issuer.yamldeploy/kustomize/base/cert-manager/deployment.yamldeploy/kustomize/base/cert-manager/rbac.yamldeploy/kustomize/base/cert-manager/service.yamldeploy/kustomize/base/common/db-creds.yamldeploy/kustomize/base/common/image-pull-secret.yamldeploy/kustomize/base/common/keycloak-client-secret.yamldeploy/kustomize/base/common/kustomization.yamldeploy/kustomize/base/common/temporal-client-cloud-cert.yamldeploy/kustomize/base/common/temporal-encryption-key.yamldeploy/kustomize/base/db/job.yamldeploy/kustomize/base/keycloak/kustomization.yamldeploy/kustomize/base/keycloak/realm-configmap.yamldeploy/kustomize/base/mock-core/deployment.yamldeploy/kustomize/base/mock-core/kustomization.yamldeploy/kustomize/base/mock-core/service.yamldeploy/kustomize/base/postgres/init-configmap.yamldeploy/kustomize/base/postgres/kustomization.yamldeploy/kustomize/base/site-agent/certificate.yamldeploy/kustomize/base/site-agent/configmap.yamldeploy/kustomize/base/site-agent/rbac.yamldeploy/kustomize/base/site-agent/service.yamldeploy/kustomize/base/site-agent/site-registration-secret.yamldeploy/kustomize/base/site-agent/statefulset.yamldeploy/kustomize/base/site-agent/temporal-client-site-agent-certs.yamldeploy/kustomize/base/temporal-helm/certificates.yamldeploy/kustomize/base/workflow/configmap.yamldeploy/kustomize/base/workflow/deployment.yamldeploy/kustomize/overlays/api/kustomization.yamldeploy/kustomize/overlays/cert-manager/kustomization.yamldeploy/kustomize/overlays/db/kustomization.yamldeploy/kustomize/overlays/mock-core/kustomization.yamldeploy/kustomize/overlays/site-agent/kustomization.yamldeploy/kustomize/overlays/site-manager/kustomization.yamldeploy/kustomize/overlays/workflow/kustomization.yamldocker/local/Dockerfile.nico-nsmdocker/local/Dockerfile.nico-psmdocker/local/Dockerfile.nico-rest-apidocker/local/Dockerfile.nico-rest-cert-managerdocker/local/Dockerfile.nico-rest-dbdocker/local/Dockerfile.nico-rest-mock-coredocker/local/Dockerfile.nico-rest-site-agentdocker/local/Dockerfile.nico-rest-site-managerdocker/local/Dockerfile.nico-rest-workflowdocker/local/Dockerfile.nico-rladocker/production/Dockerfile.nico-nsmdocker/production/Dockerfile.nico-psmdocker/production/Dockerfile.nico-rest-apidocker/production/Dockerfile.nico-rest-cert-managerdocker/production/Dockerfile.nico-rest-dbdocker/production/Dockerfile.nico-rest-site-agentdocker/production/Dockerfile.nico-rest-site-managerdocker/production/Dockerfile.nico-rest-workflowdocker/production/Dockerfile.nico-rladocker/production/README.mdhelm/README.mdhelm/charts/carbide-rest-site-agent/templates/configmap.yamlhelm/charts/carbide-rest-site-agent/templates/rbac.yamlhelm/charts/carbide-rest/Chart.yamlhelm/charts/carbide-rest/charts/carbide-rest-cert-manager/Chart.yamlhelm/charts/carbide-rest/charts/carbide-rest-common/Chart.yamlhelm/charts/carbide-rest/charts/carbide-rest-site-manager/Chart.yamlhelm/charts/carbide-rest/charts/carbide-rest-site-manager/templates/rbac.yamlhelm/charts/carbide-rest/charts/carbide-rest-site-manager/templates/service.yamlhelm/charts/carbide-rest/templates/NOTES.txthelm/charts/nico-rest-site-agent/Chart.yamlhelm/charts/nico-rest-site-agent/templates/_helpers.tplhelm/charts/nico-rest-site-agent/templates/bootstrap.yamlhelm/charts/nico-rest-site-agent/templates/certificate.yamlhelm/charts/nico-rest-site-agent/templates/configmap.yamlhelm/charts/nico-rest-site-agent/templates/rbac.yamlhelm/charts/nico-rest-site-agent/templates/service.yamlhelm/charts/nico-rest-site-agent/templates/statefulset.yamlhelm/charts/nico-rest-site-agent/templates/temporal-certs-secret.yamlhelm/charts/nico-rest-site-agent/values.yamlhelm/charts/nico-rest/.helmignorehelm/charts/nico-rest/Chart.yamlhelm/charts/nico-rest/charts/nico-rest-api/Chart.yamlhelm/charts/nico-rest/charts/nico-rest-api/templates/_helpers.tplhelm/charts/nico-rest/charts/nico-rest-api/templates/configmap.yamlhelm/charts/nico-rest/charts/nico-rest-api/templates/deployment.yamlhelm/charts/nico-rest/charts/nico-rest-api/templates/service.yamlhelm/charts/nico-rest/charts/nico-rest-api/values.yamlhelm/charts/nico-rest/charts/nico-rest-cert-manager/Chart.yamlhelm/charts/nico-rest/charts/nico-rest-cert-manager/templates/_helpers.tplhelm/charts/nico-rest/charts/nico-rest-cert-manager/templates/deployment.yamlhelm/charts/nico-rest/charts/nico-rest-cert-manager/templates/rbac.yamlhelm/charts/nico-rest/charts/nico-rest-cert-manager/templates/service.yamlhelm/charts/nico-rest/charts/nico-rest-cert-manager/values.yamlhelm/charts/nico-rest/charts/nico-rest-common/Chart.yamlhelm/charts/nico-rest/charts/nico-rest-common/templates/_helpers.tplhelm/charts/nico-rest/charts/nico-rest-common/templates/certificate.yamlhelm/charts/nico-rest/charts/nico-rest-common/templates/secrets.yamlhelm/charts/nico-rest/charts/nico-rest-common/values.yamlhelm/charts/nico-rest/charts/nico-rest-db/Chart.yamlhelm/charts/nico-rest/charts/nico-rest-db/templates/_helpers.tplhelm/charts/nico-rest/charts/nico-rest-db/templates/migration-job.yamlhelm/charts/nico-rest/charts/nico-rest-db/values.yamlhelm/charts/nico-rest/charts/nico-rest-site-manager/Chart.yamlhelm/charts/nico-rest/charts/nico-rest-site-manager/crds/site.yamlhelm/charts/nico-rest/charts/nico-rest-site-manager/templates/_helpers.tplhelm/charts/nico-rest/charts/nico-rest-site-manager/templates/certificate.yamlhelm/charts/nico-rest/charts/nico-rest-site-manager/templates/deployment.yamlhelm/charts/nico-rest/charts/nico-rest-site-manager/templates/rbac.yamlhelm/charts/nico-rest/charts/nico-rest-site-manager/templates/service.yamlhelm/charts/nico-rest/charts/nico-rest-site-manager/values.yamlhelm/charts/nico-rest/charts/nico-rest-workflow/Chart.yamlhelm/charts/nico-rest/charts/nico-rest-workflow/templates/_helpers.tplhelm/charts/nico-rest/charts/nico-rest-workflow/templates/configmap.yamlhelm/charts/nico-rest/charts/nico-rest-workflow/templates/deployment-cloud-worker.yamlhelm/charts/nico-rest/charts/nico-rest-workflow/templates/deployment-site-worker.yamlhelm/charts/nico-rest/charts/nico-rest-workflow/values.yamlhelm/charts/nico-rest/templates/NOTES.txthelm/charts/nico-rest/values.yamlipam/bun_test.go
💤 Files with no reviewable changes (9)
- helm/charts/carbide-rest/Chart.yaml
- helm/charts/carbide-rest/charts/carbide-rest-site-manager/Chart.yaml
- helm/charts/carbide-rest/charts/carbide-rest-common/Chart.yaml
- helm/charts/carbide-rest-site-agent/templates/configmap.yaml
- helm/charts/carbide-rest/charts/carbide-rest-cert-manager/Chart.yaml
- helm/charts/carbide-rest/templates/NOTES.txt
- helm/charts/carbide-rest-site-agent/templates/rbac.yaml
- helm/charts/carbide-rest/charts/carbide-rest-site-manager/templates/service.yaml
- helm/charts/carbide-rest/charts/carbide-rest-site-manager/templates/rbac.yaml
| "service": "nico-psm", | ||
| "image_ref": "${{ needs.build-nico-psm.outputs.image_ref }}", | ||
| "binary_resource": "${{ needs.build-nico-psm.ngc_binary_resource }}", | ||
| "image_resource": "${{ needs.build-nico-psm.outputs.ngc_image_resource }}" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Verify all needs-based artifact fields use .outputs.<name>
rg -nP 'needs\.build-[a-z0-9-]+\.(image_ref|ngc_binary_resource|ngc_image_resource)' .github/workflows/build-push-docker.ymlRepository: NVIDIA/infra-controller-rest
Length of output: 160
🏁 Script executed:
# Check the full context around lines 384-387
sed -n '375,395p' .github/workflows/build-push-docker.ymlRepository: NVIDIA/infra-controller-rest
Length of output: 1162
🏁 Script executed:
# Search for the build-nico-psm job definition and its outputs
rg -n 'build-nico-psm' .github/workflows/build-push-docker.yml | head -20Repository: NVIDIA/infra-controller-rest
Length of output: 505
🏁 Script executed:
# Verify the job definition has outputs declared
rg -A 10 'jobs:' .github/workflows/build-push-docker.yml | grep -A 10 'build-nico-psm'Repository: NVIDIA/infra-controller-rest
Length of output: 54
Correct the output accessor for nico-psm binary resource.
Line 386 references needs.build-nico-psm.ngc_binary_resource without the .outputs accessor. This produces an invalid or empty value in the aggregated artifacts JSON. All other services in this block correctly use the .outputs syntax for accessing job outputs.
Proposed fix
- "binary_resource": "${{ needs.build-nico-psm.ngc_binary_resource }}",
+ "binary_resource": "${{ needs.build-nico-psm.outputs.ngc_binary_resource }}",📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "service": "nico-psm", | |
| "image_ref": "${{ needs.build-nico-psm.outputs.image_ref }}", | |
| "binary_resource": "${{ needs.build-nico-psm.ngc_binary_resource }}", | |
| "image_resource": "${{ needs.build-nico-psm.outputs.ngc_image_resource }}" | |
| "service": "nico-psm", | |
| "image_ref": "${{ needs.build-nico-psm.outputs.image_ref }}", | |
| "binary_resource": "${{ needs.build-nico-psm.outputs.ngc_binary_resource }}", | |
| "image_resource": "${{ needs.build-nico-psm.outputs.ngc_image_resource }}" |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/build-push-docker.yml around lines 384 - 387, The
aggregated artifacts JSON uses an incorrect output accessor for the nico-psm
binary resource: change the reference from
needs.build-nico-psm.ngc_binary_resource to use the outputs object
(needs.build-nico-psm.outputs.ngc_binary_resource) so it matches the other
services and returns the actual job output; update the "binary_resource" value
for the "nico-psm" service accordingly.
| # Helm chart version: read from Chart.yaml (umbrella chart as source of truth) | ||
| HELM_VERSION=$(grep '^version:' helm/charts/carbide-rest/Chart.yaml | awk '{print $2}') | ||
| HELM_VERSION=$(grep '^version:' helm/charts/nico-rest/Chart.yaml | awk '{print $2}') | ||
| echo "helm_version=$HELM_VERSION" >> $GITHUB_OUTPUT |
There was a problem hiding this comment.
Fail fast if Helm version cannot be resolved.
If the chart file path changes or parsing fails, HELM_VERSION becomes empty and downstream jobs get invalid metadata. Add explicit file/value validation.
Proposed change
- # Helm chart version: read from Chart.yaml (umbrella chart as source of truth)
- HELM_VERSION=$(grep '^version:' helm/charts/nico-rest/Chart.yaml | awk '{print $2}')
+ # Helm chart version: read from Chart.yaml (umbrella chart as source of truth)
+ CHART_FILE="helm/charts/nico-rest/Chart.yaml"
+ if [ ! -f "$CHART_FILE" ]; then
+ echo "ERROR: $CHART_FILE not found"
+ exit 1
+ fi
+ HELM_VERSION=$(awk -F': *' '$1=="version"{print $2; exit}' "$CHART_FILE")
+ if [ -z "$HELM_VERSION" ]; then
+ echo "ERROR: failed to parse version from $CHART_FILE"
+ exit 1
+ fi
echo "helm_version=$HELM_VERSION" >> $GITHUB_OUTPUT📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Helm chart version: read from Chart.yaml (umbrella chart as source of truth) | |
| HELM_VERSION=$(grep '^version:' helm/charts/carbide-rest/Chart.yaml | awk '{print $2}') | |
| HELM_VERSION=$(grep '^version:' helm/charts/nico-rest/Chart.yaml | awk '{print $2}') | |
| echo "helm_version=$HELM_VERSION" >> $GITHUB_OUTPUT | |
| # Helm chart version: read from Chart.yaml (umbrella chart as source of truth) | |
| CHART_FILE="helm/charts/nico-rest/Chart.yaml" | |
| if [ ! -f "$CHART_FILE" ]; then | |
| echo "ERROR: $CHART_FILE not found" | |
| exit 1 | |
| fi | |
| HELM_VERSION=$(awk -F': *' '$1=="version"{print $2; exit}' "$CHART_FILE") | |
| if [ -z "$HELM_VERSION" ]; then | |
| echo "ERROR: failed to parse version from $CHART_FILE" | |
| exit 1 | |
| fi | |
| echo "helm_version=$HELM_VERSION" >> $GITHUB_OUTPUT |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/prepare-build-info.yml around lines 143 - 145, After
extracting HELM_VERSION from helm/charts/nico-rest/Chart.yaml, add explicit
validation: verify the Chart.yaml file exists/readable and that the HELM_VERSION
variable is non-empty; if either check fails, print a clear error to stderr
(including the path and the parsed value) and exit non‑zero so the workflow
fails fast instead of writing an empty helm_version to GITHUB_OUTPUT. Reference
the HELM_VERSION variable and the existing grep/awk extraction block so you
insert the file-existence check and a conditional that aborts with exit 1 when
HELM_VERSION is empty.
|
|
||
| build/ | ||
| /carbide-rest | ||
| /nico-rest |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Search for any lingering references to carbide-rest across the repository.
# Search for carbide-rest references (case-insensitive to catch variations)
rg -i 'carbide-rest' --type-not gitignore -C2Repository: NVIDIA/infra-controller-rest
Length of output: 106
🏁 Script executed:
#!/bin/bash
# Search for carbide-rest references with corrected syntax
rg -i 'carbide-rest' --glob '!.gitignore' -C2Repository: NVIDIA/infra-controller-rest
Length of output: 50386
🏁 Script executed:
# Also verify the .gitignore change
cat -n .gitignore | head -25Repository: NVIDIA/infra-controller-rest
Length of output: 966
🏁 Script executed:
# Check for any carbide-rest or carbide references in the entire codebase
rg -i 'carbide' | head -50Repository: NVIDIA/infra-controller-rest
Length of output: 5283
Incomplete rebranding—critical files still reference carbide-rest.
While the .gitignore change is syntactically sound, the renaming effort is incomplete across the codebase. Multiple critical files retain outdated references:
site-manager/pkg/sitemgr/cli.go(line ~70): Default namespace still hardcoded to"carbide-rest"instead of"nico-rest"- Helm charts: Image names in values.yaml files still reference
carbide-rest-*variants (nico-rest-site-agent,nico-rest-db, etc.) - Kubernetes manifests: Service, RBAC, Certificate, and Deployment files in
deploy/kustomize/base/site-manager/still usecarbide-restidentifiers
The .gitignore entry correctly targets the new directory name, but the namespace default in cli.go contradicts this intent and will prevent proper deployment. Ensure all generation scripts, configuration defaults, and infrastructure definitions are updated to reflect the new namespace.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.gitignore at line 17, The repo rename to "nico-rest" is incomplete: update
all remaining hardcoded "carbide-rest" identifiers to "nico-rest" so defaults
and manifests align with the new .gitignore; specifically change the default
namespace value in site-manager/pkg/sitemgr/cli.go (where the default namespace
is set to "carbide-rest") to "nico-rest", update image names in Helm values.yaml
files (replace any `carbide-rest-*` with `nico-rest-*`), and replace identifiers
in Kubernetes manifests under deploy/kustomize/base/site-manager/ (Services,
RBAC, Certificates, Deployments) to use "nico-rest" consistently; run a
repository-wide search for "carbide-rest" to ensure no remaining references and
update any generation scripts or configuration defaults that emit the old name.
| func InitMetricsServer(e *echo.Echo, cfg *config.Config) *echo.Echo { | ||
| ep := echo.New() | ||
| ep.HideBanner = true | ||
|
|
||
| conf := echoPrometheus.MiddlewareConfig{ | ||
| Subsystem: "carbide_api", | ||
| Subsystem: fmt.Sprintf("%s_api", cfg.GetAPIName()), | ||
| Skipper: api.MetricsURLSkipper, |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Locate GetAPIName and apiName defaults =="
rg -n --type=go -C3 '\bGetAPIName\s*\('
rg -n --type=go -C2 'apiName\s*[:=]\s*"[^"]+"'
echo
echo "== Validate derived <apiName>_api against Prometheus naming rules =="
python - <<'PY'
import re
from pathlib import Path
name_assign = re.compile(r'apiName\s*[:=]\s*"([^"]+)"')
valid = re.compile(r'^[a-zA-Z_:][a-zA-Z0-9_:]*$')
found = []
for p in Path(".").rglob("*.go"):
try:
text = p.read_text(encoding="utf-8")
except Exception:
continue
for m in name_assign.finditer(text):
api_name = m.group(1)
subsystem = f"{api_name}_api"
found.append((str(p), api_name, subsystem, bool(valid.match(subsystem))))
if not found:
print("No explicit apiName literals found; inspect GetAPIName source manually.")
else:
for path, api_name, subsystem, ok in found:
status = "VALID" if ok else "INVALID"
print(f"{status}: apiName='{api_name}' -> subsystem='{subsystem}' ({path})")
PYRepository: NVIDIA/infra-controller-rest
Length of output: 6406
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Find ConfigAPIName definition =="
rg -n --type=go 'ConfigAPIName' -B2 -A2
echo
echo "== Check config validation and how ConfigAPIName is set =="
rg -n --type=go 'GetString.*ConfigAPIName|viper.*GetString' api/internal/config/config.go -B3 -A3
echo
echo "== Look for config file examples or environment variable binding =="
fd -t f '\.yml|\.yaml|\.toml|\.env' | head -20Repository: NVIDIA/infra-controller-rest
Length of output: 2358
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Examine echoPrometheus library usage and validation =="
rg -n --type=go 'echoPrometheus|MiddlewareConfig' api/internal/server/server.go -B5 -A5
echo
echo "== Check if there are any tests with invalid API names =="
rg -n --type=go 'GetAPIName|apiName' api/internal/config/config.go -B5 -A10Repository: NVIDIA/infra-controller-rest
Length of output: 2711
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Locate and check viper configuration binding =="
fd -t f config.go | xargs grep -l "viper\|venv" | head -5
rg -n 'BindEnv|BindFlag|Set.*ConfigAPIName' --type=go
echo
echo "== Check imports to understand config library =="
head -30 api/internal/config/config.go | grep -E '^import|viper'Repository: NVIDIA/infra-controller-rest
Length of output: 320
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Check actual config files for api.name values =="
rg -n 'api\.name|api:.*name' api/config.yaml deploy/kustomize/base/api/configmap.yaml 2>/dev/null || echo "Pattern not found or files missing"
echo
echo "== Check if there's any validation logic for api name =="
rg -n --type=go 'api\.name|APIName' api/internal/config/config.go -B10 -A2 | head -50
echo
echo "== Look for any helper validation functions =="
rg -n --type=go 'sanitize|validate|valid' api/internal/config/config.go --ignore-case
echo
echo "== Check if echoPrometheus returns errors on invalid subsystem =="
rg -n --type=go 'NewMiddleware|error|Error' api/internal/server/server.go | grep -A5 -B5 'echoPrometheus\|InitMetrics'Repository: NVIDIA/infra-controller-rest
Length of output: 4033
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Read actual config.yaml to see default/example api name =="
cat api/config.yaml 2>/dev/null | head -100
echo
echo "== Check deployment configs =="
grep -r "api.name\|api:\|apiName" deploy/kustomize/base/api/ 2>/dev/null | head -20Repository: NVIDIA/infra-controller-rest
Length of output: 2175
🏁 Script executed:
#!/bin/bash
set -eulo pipefail
echo "== Check if there are tests for invalid API names or metric registration =="
rg -n --type=go 'GetAPIName|Prometheus|subsystem' api/internal/server/server_test.go -B3 -A3
echo
echo "== Check if echoPrometheus.NewMiddlewareWithConfig has error handling =="
rg -n 'NewMiddlewareWithConfig' api/internal/server/server.go -B2 -A5
echo
echo "== Search for any Prometheus validation or sanitization =="
rg -n --type=go 'Prometheus|metric|subsystem' . -i --type=go | grep -i 'valid\|sanitize\|clean\|format'Repository: NVIDIA/infra-controller-rest
Length of output: 2780
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Check Prometheus metric naming rules documentation in codebase =="
rg -n --type=go 'metric.*name|subsystem' . -B2 -A2 | grep -i 'rule\|valid\|format\|pattern'
echo
echo "== Look for any github.com/prometheus imports or usages =="
rg -n 'prometheus|metric' api/internal/server/server.go -i
echo
echo "== Check if there's existing sanitization anywhere else =="
rg -n --type=go 'Sanitize|Clean|Format' . | head -20Repository: NVIDIA/infra-controller-rest
Length of output: 3810
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Check what happens when NewMiddlewareWithConfig is called - look at full InitMetricsServer =="
sed -n '296,315p' api/internal/server/server.go
echo
echo "== Search for any comment or documentation about API name format =="
rg -n -B5 'api\.name|ConfigAPIName|GetAPIName' api/internal/config/config.go | grep -A5 -B5 '//'Repository: NVIDIA/infra-controller-rest
Length of output: 1666
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Check how echoPrometheus handles invalid subsystem names =="
rg -n 'type.*MiddlewareConfig|Subsystem' --type=go api/internal/server/server.go
echo
echo "== Verify if Validate() method in Config covers api.name =="
sed -n '321,400p' api/internal/config/config.go
echo
echo "== Search for any environment variable binding or override =="
rg -n 'BindEnv|viper.*Bind|SetEnv' api/internal/config/config.goRepository: NVIDIA/infra-controller-rest
Length of output: 2888
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Check if there's any documentation or comment about expected API name format =="
rg -n 'api\.name|GetAPIName' --type=go -B5 -A3 | grep -E '(//|/\*|\*/)' -A2 -B2
echo
echo "== Look for any test or example showing what api.name can be =="
find . -name '*.yaml' -o -name '*.yml' | xargs grep -l 'api' 2>/dev/null | head -5 | xargs grep 'api:' -A5 2>/dev/null
echo
echo "== Check if echoPrometheus library itself validates or sanitizes =="
rg -n 'github\.com/labstack/echo-contrib' . 2>/dev/null | head -3Repository: NVIDIA/infra-controller-rest
Length of output: 6582
Sanitize cfg.GetAPIName() before using it as Prometheus subsystem.
Line 301 constructs the subsystem directly from config without validation. The config system accepts arbitrary values for api.name via YAML, environment variables, or programmatic override, with no format constraints. If the API name contains characters invalid in Prometheus metrics (e.g., hyphens), metric registration will fail at runtime.
Proposed hardening
func InitMetricsServer(e *echo.Echo, cfg *config.Config) *echo.Echo {
ep := echo.New()
ep.HideBanner = true
+ subsystem := sanitizePromSubsystem(cfg.GetAPIName()) + "_api"
conf := echoPrometheus.MiddlewareConfig{
- Subsystem: fmt.Sprintf("%s_api", cfg.GetAPIName()),
+ Subsystem: subsystem,
Skipper: api.MetricsURLSkipper,
}
@@
return ep
}
+
+func sanitizePromSubsystem(name string) string {
+ if name == "" {
+ return "api"
+ }
+ b := make([]rune, 0, len(name))
+ for _, r := range name {
+ ok := (r >= 'a' && r <= 'z') ||
+ (r >= 'A' && r <= 'Z') ||
+ (r >= '0' && r <= '9') ||
+ r == '_' || r == ':'
+ if ok {
+ b = append(b, r)
+ } else {
+ b = append(b, '_')
+ }
+ }
+ // First rune cannot be a digit.
+ if len(b) > 0 && b[0] >= '0' && b[0] <= '9' {
+ b = append([]rune{'_'}, b...)
+ }
+ return string(b)
+}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func InitMetricsServer(e *echo.Echo, cfg *config.Config) *echo.Echo { | |
| ep := echo.New() | |
| ep.HideBanner = true | |
| conf := echoPrometheus.MiddlewareConfig{ | |
| Subsystem: "carbide_api", | |
| Subsystem: fmt.Sprintf("%s_api", cfg.GetAPIName()), | |
| Skipper: api.MetricsURLSkipper, | |
| func InitMetricsServer(e *echo.Echo, cfg *config.Config) *echo.Echo { | |
| ep := echo.New() | |
| ep.HideBanner = true | |
| subsystem := sanitizePromSubsystem(cfg.GetAPIName()) + "_api" | |
| conf := echoPrometheus.MiddlewareConfig{ | |
| Subsystem: subsystem, | |
| Skipper: api.MetricsURLSkipper, | |
| } | |
| return ep | |
| } | |
| func sanitizePromSubsystem(name string) string { | |
| if name == "" { | |
| return "api" | |
| } | |
| b := make([]rune, 0, len(name)) | |
| for _, r := range name { | |
| ok := (r >= 'a' && r <= 'z') || | |
| (r >= 'A' && r <= 'Z') || | |
| (r >= '0' && r <= '9') || | |
| r == '_' || r == ':' | |
| if ok { | |
| b = append(b, r) | |
| } else { | |
| b = append(b, '_') | |
| } | |
| } | |
| // First rune cannot be a digit. | |
| if len(b) > 0 && b[0] >= '0' && b[0] <= '9' { | |
| b = append([]rune{'_'}, b...) | |
| } | |
| return string(b) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@api/internal/server/server.go` around lines 296 - 302, InitMetricsServer
currently uses cfg.GetAPIName() directly for the Prometheus Subsystem
(echoPrometheus.MiddlewareConfig.Subsystem) which can contain invalid chars;
update InitMetricsServer to sanitize the API name before use by normalizing to a
safe Prometheus subsystem token (e.g., replace or remove invalid characters such
as hyphens, spaces, and non-alphanumerics, collapse consecutive separators, and
ensure it starts with a letter or underscore), and fall back to a fixed safe
default like "api" if the sanitized result is empty; apply this transformation
just before creating the MiddlewareConfig.Subsystem so all calls that reference
cfg.GetAPIName() in InitMetricsServer use the sanitized value.
| // @Param org path string true "Name of NGC organization" | ||
| // @Param site_id query string true "Filter by site ID" | ||
| // @Success 200 {object} model.APIUser | ||
| // @Router /v2/org/{org}/carbide/machine-capability [get] | ||
| // @Router /v2/org/{org}/nico/machine-capability [get] |
There was a problem hiding this comment.
Swagger annotation block is still inconsistent with the handler contract
Line 64 correctly moves the route to /nico, but the same doc block still advertises mismatched request/response metadata (for example, Line 62 uses site_id while handler reads siteId, and Line 63 advertises model.APIUser although this endpoint returns machine capabilities). Please update this annotation block fully so generated OpenAPI stays accurate.
Suggested annotation fix
// Handle godoc
-// `@Summary` Return information about the current user
-// `@Description` Get basic information about the user making the request
-// `@Tags` user
+// `@Summary` Return distinct machine capabilities
+// `@Description` Get distinct machine capabilities for an organization, optionally filtered
+// `@Tags` machine-capability
// `@Accept` */*
// `@Produce` json
// `@Security` ApiKeyAuth
// `@Param` org path string true "Name of NGC organization"
-// `@Param` site_id query string true "Filter by site ID"
-// `@Success` 200 {object} model.APIUser
+// `@Param` siteId query string false "Filter by site ID"
+// `@Success` 200 {array} model.APIMachineCapability
// `@Router` /v2/org/{org}/nico/machine-capability [get]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@api/pkg/api/handler/machinecapability.go` around lines 61 - 64, The Swagger
doc block above the handler is inconsistent: update the annotation in
machinecapability.go to match the handler contract by using the exact parameter
names and response shape the handler expects—change the query param from
"site_id" to "siteId" (keep the path param "org" as-is), ensure the route stays
"/v2/org/{org}/nico/machine-capability [get]", and change the Success line to
reflect the actual return type (e.g., 200 {array} model.MachineCapability or the
correct model name the handler returns) so generated OpenAPI matches the
handler.
| name: nico | ||
| user: nico | ||
| password: nico |
There was a problem hiding this comment.
Move database credentials out of ConfigMap data.
db.user and db.password are plaintext in a ConfigMap. This weakens secret handling and broadens accidental exposure risk. Use a Kubernetes Secret for credentials and inject them via mounted file/env reference instead.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deploy/kustomize/base/api/configmap.yaml` around lines 42 - 44, ConfigMap
currently contains plaintext DB credentials (keys "user" and "password" under
the ConfigMap with name "nico"); remove those keys from the ConfigMap data and
create a Kubernetes Secret containing db.user and db.password instead, then
update any workloads that consume this ConfigMap to read credentials from the
Secret (via env/secretKeyRef or a mounted volume) while keeping non-sensitive
values like "name" in the ConfigMap; search for usages of the ConfigMap name
"nico" and change the Pod/Deployment env or volume references to use the new
Secret to inject db.user and db.password securely.
| name: nico | ||
| user: nico | ||
| password: nico | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect workflow configmaps and overlays for password overrides/patches.
rg -n -C3 'nico-rest-workflow-config|db:|name:\s*nico|user:\s*nico|password:\s*nico|secret|patches|patchesStrategicMerge' --glob '*.y*ml'Repository: NVIDIA/infra-controller-rest
Length of output: 50385
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find kustomize structure and overlays
find . -name "kustomization.yaml" -o -name "kustomization.yml" | grep -E "workflow|deploy" | head -20
# Look for deployment/overlay directories
fd -t d "overlay|prod|staging|dev" --path "*kustomize*" | head -20Repository: NVIDIA/infra-controller-rest
Length of output: 1350
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check the actual configmap file mentioned in the review
find . -path "*deploy/kustomize/base/workflow/configmap.yaml" -exec cat {} \;
# Also search more broadly for workflow configmaps
find . -path "*configmap.yaml" | grep workflowRepository: NVIDIA/infra-controller-rest
Length of output: 1877
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Search for kustomize patches that might override credentials
rg -l "patchesStrategicMerge|patchesJson6902" --glob "*/kustomization.*" | head -20
# Look for overlay structure with credential patches
rg -B5 -A10 "name.*nico|password.*nico" --glob "*/kustomization*" 2>/dev/null || trueRepository: NVIDIA/infra-controller-rest
Length of output: 221
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check the base kustomization.yaml for workflow
echo "=== BASE KUSTOMIZATION ==="
cat ./deploy/kustomize/base/workflow/kustomization.yaml
echo ""
echo "=== OVERLAY KUSTOMIZATION ==="
cat ./deploy/kustomize/overlays/workflow/kustomization.yamlRepository: NVIDIA/infra-controller-rest
Length of output: 2519
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Search for any patches or strategic merge patches related to workflow config
echo "=== Searching for patches in workflow directories ==="
find ./deploy/kustomize -path "*workflow*" -type f \( -name "*.yaml" -o -name "*.yml" \) -exec grep -l "patch\|secret\|name.*nico" {} \;
echo ""
echo "=== Check for overlay subdirectories ==="
ls -la ./deploy/kustomize/overlays/workflow/ 2>/dev/null || echo "No overlay subdirs"Repository: NVIDIA/infra-controller-rest
Length of output: 500
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check the deployment to see if it references secrets for DB credentials
echo "=== WORKFLOW DEPLOYMENT ==="
cat ./deploy/kustomize/base/workflow/deployment.yaml | head -100Repository: NVIDIA/infra-controller-rest
Length of output: 3071
Confirm that production overlays use secret-backed database credentials instead of the hardcoded base defaults.
The base configmap embeds password: nico inline. The current overlay references the base without patching this ConfigMap field, and the deployment reads configuration directly from the ConfigMap without external secret injection. Ensure production deployments override db.password through either:
- A Kustomize strategic merge patch on the ConfigMap data section
- Mounting database credentials as secrets and referencing them via environment variables in the deployment
This prevents shipping plaintext default credentials to production environments.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deploy/kustomize/base/workflow/configmap.yaml` around lines 34 - 37, The base
ConfigMap exposes plaintext credentials (data.password = "nico") which must not
be used in production; update production overlays to override the base by either
(a) applying a Kustomize strategicMergePatch to remove/replace the ConfigMap
data.password with a reference to a Secret, or (b) create a Secret (via
secretGenerator or Kubernetes Secret manifest) and modify the Deployment spec to
read DB credentials from the Secret (use env valueFrom.secretKeyRef or
volumeMount with secretKeyRef) instead of reading directly from the base
ConfigMap; ensure you target the ConfigMap data key "password" and the
Deployment's env variables that currently reference the ConfigMap so production
no longer ships plaintext defaults.
| # -- Plain-text password. Ignored when secrets.dbCreds is set. | ||
| password: forge | ||
| password: nico |
There was a problem hiding this comment.
Avoid shipping a static default DB password in chart defaults.
Using a predictable default (nico) makes accidental insecure deployments too easy. Prefer empty-by-default and require secrets.dbCreds or environment-specific overrides.
Proposed change
db:
host: postgres.postgres
port: 5432
name: nico
user: nico
# -- Plain-text password. Ignored when secrets.dbCreds is set.
- password: nico
+ password: ""📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # -- Plain-text password. Ignored when secrets.dbCreds is set. | |
| password: forge | |
| password: nico | |
| # -- Plain-text password. Ignored when secrets.dbCreds is set. | |
| password: "" |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@helm/charts/nico-rest/charts/nico-rest-workflow/values.yaml` around lines 59
- 60, Remove the static default DB password "nico" from the chart values and
make the DB password empty-by-default (e.g., set the top-level value for
password to an empty string or placeholder) so deployments must explicitly
provide credentials via secrets.dbCreds or environment-specific overrides;
update the values.yaml entry for "password" and add a short comment clarifying
that a value must be provided through secrets.dbCreds (or via environment) and
that leaving it empty will fail validation/require a secret.
| @echo "Waiting for API service..." | ||
| kubectl -n carbide-rest rollout status deployment/carbide-rest-api --timeout=120s || true | ||
| kubectl -n nico-rest rollout status deployment/nico-rest-api --timeout=120s || true |
There was a problem hiding this comment.
Do not swallow API rollout failures in kind-apply.
kind-deploy is the happy-path local bring-up target, so masking the API rollout with || true lets the target report success while the control plane is still down. If only bootstrap-dependent workloads need soft failure handling, keep that exception scoped to those workloads.
Suggested fix
- kubectl -n nico-rest rollout status deployment/nico-rest-api --timeout=120s || true
+ kubectl -n nico-rest rollout status deployment/nico-rest-api --timeout=120s📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @echo "Waiting for API service..." | |
| kubectl -n carbide-rest rollout status deployment/carbide-rest-api --timeout=120s || true | |
| kubectl -n nico-rest rollout status deployment/nico-rest-api --timeout=120s || true | |
| `@echo` "Waiting for API service..." | |
| kubectl -n nico-rest rollout status deployment/nico-rest-api --timeout=120s |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Makefile` around lines 337 - 338, The Makefile currently swallows failures
for the API rollout by appending "|| true" to the kubectl rollout status for
deployment/nico-rest-api in the kind-apply/ kind-deploy flow; remove the "||
true" so the target fails on rollout errors, or if you only intend soft-failure
for bootstrap-only workloads, move the "|| true" to those specific kubectl
rollout lines instead of the nico-rest-api line (identify and edit the kubectl
-n nico-rest rollout status deployment/nico-rest-api --timeout=120s invocation
in the Makefile and either delete the "|| true" or relocate it to the
bootstrap-specific deployment rollout commands).
| @echo "Setting up common secrets..." | ||
| kubectl apply -k deploy/kustomize/base/common | ||
| kubectl -n carbide-rest wait --for=condition=Ready certificate/temporal-client-cloud-cert --timeout=240s || true | ||
| kubectl -n nico-rest wait --for=condition=Ready certificate/temporal-client-cloud-cert --timeout=240s || true |
There was a problem hiding this comment.
Fail fast when the Temporal client certificate is not ready.
The API and workflow workers depend on temporal-client-cloud-cert, so continuing past a failed wait only defers the real error into later rollout failures. This target should stop at the missing certificate.
Suggested fix
- kubectl -n nico-rest wait --for=condition=Ready certificate/temporal-client-cloud-cert --timeout=240s || true
+ kubectl -n nico-rest wait --for=condition=Ready certificate/temporal-client-cloud-cert --timeout=240s🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Makefile` around lines 477 - 479, The Makefile currently swallows failures
when waiting for the Temporal client cert because the kubectl wait command ends
with "|| true"; remove that fallback so the target fails fast if the certificate
(certificate/temporal-client-cloud-cert) is not Ready—either delete "|| true" or
replace it with "|| exit 1" (or rely on set -e) so the Makefile stops
immediately on a failed kubectl -n nico-rest wait for the certificate.
- nico-rest-site-agent: also mount mTLS certs at /etc/carbide so the
chart works against the not-yet-renamed carbide-rest-site-agent image
in addition to the new nico-rest-site-agent binary.
- Revert image.name to carbide-rest-* in nico-rest-{api,cert-manager,
db,site-manager,workflow} and nico-rest-site-agent values.yaml — the
published images still use the carbide-* names, so the chart renames
alone shouldn't change the image references.
- nico-rest-site-manager rbac: add the missing "create" verb on
forge.nvidia.io/sites (already present for nico.nvidia.io/sites);
required while legacy site agents still create Site CRs in the
forge group.
- nico-rest-api configmap: drop the trailing "-" on the
validateAuth include so the rendered ConfigMap has no leading
blank line.
Signed-off-by: Shayan Namaghi <snamaghi@nvidia.com>
9e19866 to
6c297ee
Compare
🔐 TruffleHog Secret Scan✅ No secrets or credentials found! Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉 🕐 Last updated: 2026-05-08 22:04:10 UTC | Commit: 6c297ee |
Description
Type of Change
Services Affected
Related Issues (Optional)
Breaking Changes
Testing
Additional Notes