fix: Verify if Interfaces exists for deletion of NVLink and IB Partition#503
fix: Verify if Interfaces exists for deletion of NVLink and IB Partition#503hwadekar-nv wants to merge 5 commits intomainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughDelete handlers for InfiniBandPartition and NVLinkLogicalPartition now perform pre-deletion checks: they load interface rows referencing the partition, aggregate referenced InstanceIDs, count Instances in active/pending/power states, and return HTTP 400 with a message when any active Instances block deletion. Tests are extended to cover these failure responses. ChangesReferential Integrity Checks
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
1488873 to
a49d437
Compare
🔐 TruffleHog Secret Scan✅ No secrets or credentials found! Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉 🕐 Last updated: 2026-05-07 20:02:19 UTC | Commit: a49d437 |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/pkg/api/handler/infinibandpartition_test.go`:
- Around line 1593-1606: The new 400 case for "error when partition has
InfiniBand Interfaces" must assert the error response contains the validation
field "infiniBandInterfaceIds"; update the test branch that handles
expectedStatus 400 (the table-driven case using ibpBlocked.ID and expectedStatus
http.StatusBadRequest) to decode the response body and assert that the
validation payload includes a non-empty "infiniBandInterfaceIds" entry (or key)
so the contract is validated; locate the test case in
infinibandpartition_test.go and add the assertion where other error body checks
are performed.
In `@api/pkg/api/handler/nvlinklogicalpartition_test.go`:
- Around line 1838-1851: The test case for "error when partition has NVLink
Interfaces" currently only asserts HTTP 400; update the test that exercises the
handler (in nvlinklogicalpartition_test.go around the case using nvllp4, tnOrg4,
tnu4) to also decode the response body and assert there is a validation entry
named "nvLinkInterfaceIds" that is present and non-empty; modify the
table-driven test's expectations (beside expectedErr/expectedStatus) to check
the response validation payload and fail if nvLinkInterfaceIds is missing or
empty.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: cb89ed18-b600-4ab8-b50f-a6de3622cca1
📒 Files selected for processing (5)
api/pkg/api/handler/Untitled-1.jsapi/pkg/api/handler/infinibandpartition.goapi/pkg/api/handler/infinibandpartition_test.goapi/pkg/api/handler/nvlinklogicalpartition.goapi/pkg/api/handler/nvlinklogicalpartition_test.go
🔍 Container Scan Summary
Per-CVE detail lives in the per-service |
72d574d to
dc343b9
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/pkg/api/handler/infinibandpartition.go`:
- Around line 1030-1059: The current deletion guard only blocks deletion when
attached Instances are in specific "active" statuses (code around
InfiniBandInterfaceDAO.GetAll -> ibInterfaces, instanceIDSet,
InstanceDAO.GetCount -> activeCount), which lets a partition be deleted if
interfaces exist but their Instances are terminated; change the logic to reject
deletion as soon as any InfiniBandInterface rows exist for the
InfiniBandPartition (i.e., if len(ibInterfaces) > 0 return a 400 with a clear
message) instead of relying on Instance status counts, and add a regression test
that creates an InfiniBandInterface for an Instance in a terminated/inactive
status and asserts that DELETE of the partition is rejected.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 2f444181-03bc-4f1e-928f-3d6b8b70dc74
📒 Files selected for processing (5)
api/pkg/api/handler/Untitled-1.jsapi/pkg/api/handler/infinibandpartition.goapi/pkg/api/handler/infinibandpartition_test.goapi/pkg/api/handler/nvlinklogicalpartition.goapi/pkg/api/handler/nvlinklogicalpartition_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- api/pkg/api/handler/nvlinklogicalpartition.go
- api/pkg/api/handler/nvlinklogicalpartition_test.go
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/pkg/api/handler/nvlinklogicalpartition.go`:
- Around line 1280-1290: The GetCount call uses only InstanceIDs and can be
blocked by inactive instances; modify the instanceDAO.GetCount invocation to
include the same active/pending/power-state filters used by the IB partition
deletion guard by adding the corresponding Statuses and PowerStates fields to
the cdbm.InstanceFilterInput (alongside InstanceIDs from
instanceIDSet.ToSlice()) so only active/pending/power-state matches are counted
when deciding whether to block delete; update any variable names as needed to
mirror the IB partition guard's filter values.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: fdd5bcd2-d7c5-428d-b3d7-e2e331bd0224
📒 Files selected for processing (2)
api/pkg/api/handler/infinibandpartition.goapi/pkg/api/handler/nvlinklogicalpartition.go
🚧 Files skipped from review as they are similar to previous changes (1)
- api/pkg/api/handler/infinibandpartition.go
| activeCount, err := instanceDAO.GetCount(ctx, nil, cdbm.InstanceFilterInput{ | ||
| InstanceIDs: instanceIDSet.ToSlice(), | ||
| }) | ||
| if err != nil { | ||
| logger.Error().Err(err).Msg("error retrieving count of Instances from DB for NVLink Logical Partition interface check") | ||
| return cutil.NewAPIErrorResponse(c, http.StatusInternalServerError, "Failed to retrieve count of Instances for NVLink Logical Partition", nil) | ||
| } | ||
| if activeCount > 0 { | ||
| logger.Warn().Int("active_instance_count", activeCount).Msg("NVLink Logical Partition has active Instances associated via interfaces") | ||
| msg := fmt.Sprintf("%d active Instances are associated with this NVLink Logical Partition, unable to delete", activeCount) | ||
| return cutil.NewAPIErrorResponse(c, http.StatusBadRequest, msg, nil) |
There was a problem hiding this comment.
Filter by active instance states before returning the delete blocker.
The count query currently filters only by InstanceIDs, so any existing instance record can block deletion. That does not match the “active via interfaces” rule and can return false 400 responses for inactive/terminated instances.
Please add the same active/pending/power-state status filter used by the IB partition deletion guard to this GetCount call.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@api/pkg/api/handler/nvlinklogicalpartition.go` around lines 1280 - 1290, The
GetCount call uses only InstanceIDs and can be blocked by inactive instances;
modify the instanceDAO.GetCount invocation to include the same
active/pending/power-state filters used by the IB partition deletion guard by
adding the corresponding Statuses and PowerStates fields to the
cdbm.InstanceFilterInput (alongside InstanceIDs from instanceIDSet.ToSlice()) so
only active/pending/power-state matches are counted when deciding whether to
block delete; update any variable names as needed to mirror the IB partition
guard's filter values.
There was a problem hiding this comment.
I have removed status check as active/pending/power-state filters used by the IB partition, so we can skip this. as if Instances are exists in DB, we will block Pertitions to delete it
There was a problem hiding this comment.
@hwadekar-nv, understood — the intent is clear: any Instance record present in the DB, regardless of lifecycle state, constitutes a blocking condition for partition deletion.
One residual risk worth acknowledging: if your system ever retains Instance rows in a terminal state (e.g., Deleted, Terminated, or similar) without purging them, those records will permanently prevent partition deletion even when no operationally active Instance exists. This would require a manual DB cleanup or a compensating API call to unblock. If Instance records are always hard-deleted from the DB upon termination, this concern does not apply.
If that invariant holds, the current implementation is correct and consistent with your intent.
(*>_<)
(contemplating the instance lifecycle invariant...)
✏️ Learnings added
Learnt from: hwadekar-nv
Repo: NVIDIA/infra-controller-rest PR: 503
File: api/pkg/api/handler/nvlinklogicalpartition.go:1280-1290
Timestamp: 2026-05-07T22:47:14.528Z
Learning: In `api/pkg/api/handler/nvlinklogicalpartition.go`, the `DeleteNVLinkLogicalPartitionHandler` intentionally does NOT apply any status/power-state filters when checking for blocking Instances via NVLink Interfaces before deletion. Any Instance record present in the DB (regardless of lifecycle state) is treated as a block on partition deletion. This differs from the IB partition deletion guard which uses active/pending/power-state filters. The design relies on the assumption that Instance DB records are hard-deleted (not soft-deleted) when an instance terminates, so stale terminal-state records will not exist to create false blocks.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: thossain-nv
Repo: NVIDIA/ncx-infra-controller-rest PR: 257
File: api/pkg/api/handler/vpcpeering.go:203-262
Timestamp: 2026-03-18T01:37:21.156Z
Learning: In Go handler code under api/pkg/api/handler, RBAC should not short-circuit on the first matching role (e.g., if isProviderAdmin { ... } else { ... }). For dual-role service accounts (e.g., Latinum), an org may have both ProviderAdmin and TenantAdmin. Evaluate both authorization paths independently, accumulate errors per path, and only reject if all applicable paths fail. Implement a helper that checks each role path and collects errors, then decide based on whether any path grants access. Add tests covering mixed-role scenarios and ensure CI gates enforce this behavior across all relevant handlers in this directory.
Learnt from: bcavnvidia
Repo: NVIDIA/ncx-infra-controller-rest PR: 350
File: api/pkg/api/handler/vpc.go:303-313
Timestamp: 2026-04-06T21:09:55.214Z
Learning: In this codebase, the database interface is intentionally designed to persist the exact value it is given (including values derived from protobuf types, e.g., enum `.String()`). Repo-owned DB constants (e.g., `cdbm.VpcRoutingProfileTypeExternal`) are published for external callers/consumers, not as an indirection layer that the handler should use. Therefore, when reviewing this repository’s handlers, do not flag as a coupling issue the pattern of persisting a protobuf-derived string value directly via calls like `cdb.GetStrPtr(routingProfileTypeValue.String())`, as this is expected behavior rather than an accidental dependency on those string representations.
Learnt from: bcavnvidia
Repo: NVIDIA/ncx-infra-controller-rest PR: 350
File: api/pkg/api/handler/vpc.go:303-313
Timestamp: 2026-04-06T21:22:33.123Z
Learning: Do not flag as an unacceptable coupling the persistence of protobuf enum values using the enum’s `.String()` output (e.g., storing `cdb.GetStrPtr(myProtoEnum.String())`) in inventory/round-trip paths. This codebase stores these `.String()` values because the gRPC back-end inventory process returns the same protobuf-derived string format, so DB round-trips must preserve the exact representation. If and when these inventory values change to plain strings with no backing proto enum, reassess the storage approach, but until then treat proto-enum `.String()` persistence as intentional and compatible.
Learnt from: bcavnvidia
Repo: NVIDIA/ncx-infra-controller-rest PR: 367
File: api/pkg/api/handler/machineinstancetype.go:593-606
Timestamp: 2026-04-10T16:39:25.404Z
Learning: In this codebase, Machine IDs are NOT UUIDs, while the deprecated Machine/InstanceType association IDs ARE UUIDs. When reviewing code that parses/validates incoming machine-related identifiers, treat `uuid.Parse` as a reliable discriminator between the deprecated UUID-based association IDs and the non-UUID Machine IDs. Do not raise/flag potential ambiguity between these two ID namespaces when a `uuid.Parse` check is used to separate them.
Learnt from: parmani-nv
Repo: NVIDIA/ncx-infra-controller-rest PR: 372
File: api/pkg/api/handler/fabric.go:137-143
Timestamp: 2026-04-15T23:49:00.903Z
Learning: In this repository, `cdb.ErrDoesNotExist` (from `db/pkg/db`) is a DB-layer sentinel error that is never wrapped. Therefore, in Go code that calls DAO/DB methods (e.g., `TenantSiteDAO.GetByTenantIDAndSiteID`), direct equality checks like `err == cdb.ErrDoesNotExist` / `err != cdb.ErrDoesNotExist` are intentional and safe. Do not flag these comparisons as robustness issues and do not suggest replacing them with `errors.Is()` for this specific sentinel.
Learnt from: thossain-nv
Repo: NVIDIA/infra-controller-rest PR: 462
File: api/pkg/api/handler/expectedswitch.go:158-214
Timestamp: 2026-05-01T21:21:59.704Z
Learning: For Expected* handler code under api/pkg/api/handler/, do NOT flag the pattern of executing Temporal workflows via common.ExecuteSyncWorkflow inside cdb.WithTx / cdb.WithTxResult transaction closures as a critical architectural issue. This codebase intentionally accepts the risk of DB commit failure after a successful Temporal call potentially diverging Site/Cloud state, based on: (1) DB transaction errors are rare, (2) certain Expected* inventory reconciliation can self-heal a Cloud rollback after a Site-side create, and (3) for deletes, IsMissingOnSite enables safe retry.
Learnt from: chet
Repo: NVIDIA/infra-controller-rest PR: 494
File: api/pkg/api/handler/nvlinklogicalpartition.go:210-215
Timestamp: 2026-05-07T06:30:32.433Z
Learning: In handler files under `api/pkg/api/handler/` that are still in the team’s mid-migration state toward the `WithTx` pattern (i.e., they follow the known interim flow where `timeoutResp != nil` is checked before evaluating the transaction `err` from `cdb.WithTx` / `cdb.WithTxResult`), do not raise or flag the `timeoutResp`-ordering issue. The intended follow-up in a later pass is to surface transaction errors before handling timeout/`TerminateWorkflowOnTimeOut` for consistency; until then, treat this ordering as intentional for these in-progress handlers.
d804c42 to
8481171
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
api/pkg/api/handler/nvlinklogicalpartition.go (1)
1280-1290:⚠️ Potential issue | 🟡 Minor | ⚡ Quick win“active” wording is misleading for the current count filter.
At Line 1280, the query filters only by
InstanceIDs, soactiveCountand the Line 1289 error message can report non-active rows as “active”. If the intended behavior is “any existing instance blocks delete,” please rename the variable/log/message accordingly.Proposed wording/variable alignment
- activeCount, err := instanceDAO.GetCount(ctx, nil, cdbm.InstanceFilterInput{ + blockingCount, err := instanceDAO.GetCount(ctx, nil, cdbm.InstanceFilterInput{ InstanceIDs: instanceIDSet.ToSlice(), }) @@ - if activeCount > 0 { - logger.Warn().Int("active_instance_count", activeCount).Msg("NVLink Logical Partition has active Instances associated via interfaces") - msg := fmt.Sprintf("%d active Instances are associated with this NVLink Logical Partition, unable to delete", activeCount) + if blockingCount > 0 { + logger.Warn().Int("blocking_instance_count", blockingCount).Msg("NVLink Logical Partition has Instances associated via interfaces") + msg := fmt.Sprintf("%d Instances are associated with this NVLink Logical Partition, unable to delete", blockingCount) return cutil.NewAPIErrorResponse(c, http.StatusBadRequest, msg, nil) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@api/pkg/api/handler/nvlinklogicalpartition.go` around lines 1280 - 1290, The variable and messages use “active” but the GetCount call only filters by InstanceIDs (no active-state filter), so rename activeCount and related log/messages to reflect any existing instances (e.g., instanceCount or existingInstanceCount) and update the logger.Warn and returned error message to say "Instances are associated with this NVLink Logical Partition, unable to delete" (or similar); locate the code around instanceDAO.GetCount, the activeCount variable, the logger.Warn().Int("active_instance_count", ...), and the fmt.Sprintf(...) return to apply the renames and message edits (or alternatively, if the original intent was to check only active instances, add the active-state filter to the GetCount call instead of renaming).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@api/pkg/api/handler/nvlinklogicalpartition.go`:
- Around line 1280-1290: The variable and messages use “active” but the GetCount
call only filters by InstanceIDs (no active-state filter), so rename activeCount
and related log/messages to reflect any existing instances (e.g., instanceCount
or existingInstanceCount) and update the logger.Warn and returned error message
to say "Instances are associated with this NVLink Logical Partition, unable to
delete" (or similar); locate the code around instanceDAO.GetCount, the
activeCount variable, the logger.Warn().Int("active_instance_count", ...), and
the fmt.Sprintf(...) return to apply the renames and message edits (or
alternatively, if the original intent was to check only active instances, add
the active-state filter to the GetCount call instead of renaming).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6134dc63-dd43-479d-90c6-4a2d5301306b
📒 Files selected for processing (5)
api/pkg/api/handler/Untitled-1.jsapi/pkg/api/handler/infinibandpartition.goapi/pkg/api/handler/infinibandpartition_test.goapi/pkg/api/handler/nvlinklogicalpartition.goapi/pkg/api/handler/nvlinklogicalpartition_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
- api/pkg/api/handler/infinibandpartition.go
- api/pkg/api/handler/nvlinklogicalpartition_test.go
- api/pkg/api/handler/infinibandpartition_test.go
a425b2f to
169de72
Compare
Description
Before making deletion call to Partition,
if exists raise an error, prevent deleting Partition
Type of Change
Services Affected
Related Issues (Optional)
Breaking Changes
Testing
Additional Notes