Skip to content

refactor: Migrate NVLink Logical Partition API handlers to WithTx#494

Open
chet wants to merge 1 commit intoNVIDIA:mainfrom
chet:with-tx-nvlinklogicalpartition
Open

refactor: Migrate NVLink Logical Partition API handlers to WithTx#494
chet wants to merge 1 commit intoNVIDIA:mainfrom
chet:with-tx-nvlinklogicalpartition

Conversation

@chet
Copy link
Copy Markdown
Contributor

@chet chet commented May 6, 2026

Description

Applies the WithTx pattern from #462 to the NVLink Logical Partition handlers. Integrates TerminateWorkflowOnTimeout as well for extra squeaky clean-ness.

CodeRabbit feedback addressed:

  • Propagate StatusDetail creation failure as an API error in the Delete handler (matches the Create handler) instead of just logging.

Signed-off-by: Chet Nichols III chetn@nvidia.com

Type of Change

  • Feature - New feature or functionality (feat:)
  • Fix - Bug fixes (fix:)
  • Chore - Modification or removal of existing functionality (chore:)
  • Refactor - Refactoring of existing functionality (refactor:)
  • Docs - Changes in documentation or OpenAPI schema (docs:)
  • CI - Changes in GitHub workflows. Requires additional scrutiny (ci:)
  • Version - Issuing a new release version (version:)

Services Affected

  • API - API models or endpoints updated
  • Workflow - Workflow service updated
  • DB - DB DAOs or migrations updated
  • Site Manager - Site Manager updated
  • Cert Manager - Cert Manager updated
  • Site Agent - Site Agent updated
  • RLA - RLA service updated
  • Powershelf Manager - Powershelf Manager updated
  • NVSwitch Manager - NVSwitch Manager updated

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@chet chet requested a review from a team as a code owner May 6, 2026 18:57
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 6, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Replace manual database/sql transaction management in NVLink Logical Partition Create/Update/Delete handlers with cdb.WithTx/cdb.WithTxResult; integrate StatusDetail creation, synchronous Site Temporal workflow execution within transactions, and deferred TerminateWorkflow on workflow timeout.

Changes

NVLink Logical Partition Transaction & Workflow Migration

Layer / File(s) Summary
Imports
api/pkg/api/handler/nvlinklogicalpartition.go
Removed database/sql import; adjusted imports to rely on cdb transaction helpers.
Data / DAO
api/pkg/api/handler/nvlinklogicalpartition.go (create/update/delete sections)
Introduced and used StatusDetail creation alongside NVLinkLogicalPartition state mutations inside transactions.
Core Transactional Flow
api/pkg/api/handler/nvlinklogicalpartition.go (create: lines ~205–317, update: lines ~1011–1102, delete: lines ~1232–1334)
Replaced explicit Begin/commit/rollback with cdb.WithTx and cdb.WithTxResult; DB create/update/status transitions and workflow scheduling now occur inside transactional closures, returning results via closure returns.
Temporal Workflow Orchestration
api/pkg/api/handler/nvlinklogicalpartition.go (same sections)
Within transaction closures, obtain Site Temporal client, schedule and wait for Create/Update/Delete workflows with timeout-bounded contexts. On timeout, capture a timeoutResp closure to terminate the workflow after the transaction unwinds. Delete treats NICo object-not-found as skippable.
Post-Transaction State Sync & Response
api/pkg/api/handler/nvlinklogicalpartition.go (create/update sections)
After transaction completes, reconcile DB status/details using workflow outputs via wfutil, aggregate StatusDetails, and construct the API response.
Error Handling Simplification
api/pkg/api/handler/nvlinklogicalpartition.go
Removed manual commit/rollback paths and replaced with unified error handling via cdb.WithTx* closures and consistent API error construction.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant API as API Handler
  participant DB
  participant Temporal as Site Temporal
  participant WFUtil as wfutil

  Client->>API: Create/Update/Delete request
  API->>DB: cdb.WithTx / cdb.WithTxResult (start tx)
  DB-->>API: tx context
  API->>DB: insert/update NVLinkLogicalPartition + StatusDetail (within tx)
  API->>Temporal: Schedule & wait Site workflow (within tx)
  Temporal-->>API: workflow result or timeout
  API->>DB: tx commit/rollback (via cdb helper)
  alt workflow timed out
    API->>Temporal: TerminateWorkflow (deferred, post-rollback)
  end
  API->>WFUtil: reconcile status from workflow result (post-tx)
  WFUtil-->>DB: update partition status/details
  API->>Client: return entity + aggregated status details
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description check ✅ Passed The description is directly related to the changeset, explaining the application of the WithTx pattern, integration of TerminateWorkflowOnTimeout, and the specific Delete handler error handling improvement.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title accurately summarizes the primary change: migrating NVLink Logical Partition API handlers to use the WithTx transaction helper pattern, which is the core refactoring objective.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 6, 2026

🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉

🔗 View scan details

🕐 Last updated: 2026-05-06 19:00:34 UTC | Commit: 38d169e

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/pkg/api/handler/nvlinklogicalpartition.go`:
- Around line 1251-1256: The delete path currently only checks the error from
sdDAO.CreateFromParams and may allow a nil result to pass, so update the block
using sdDAO.CreateFromParams (the call that passes nvllp.ID.String(),
cdb.GetStrPtr(cdbm.NVLinkLogicalPartitionStatusDeleting), and the message
"Received request for deletion, pending processing") to also validate the
returned result is non-nil; if the returned status-detail is nil (even with nil
error) log an error with logger.Error().Msg and return the same
cutil.NewAPIError(http.StatusInternalServerError, "Failed to create Status
Detail for NVLink Logical Partition deletion", nil) to abort the transaction.
Ensure you reference the same variables (ctx, tx) and preserve existing error
logging for non-nil errors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: efc04cfb-ab90-4db9-9cbe-f7f294d9de00

📥 Commits

Reviewing files that changed from the base of the PR and between 7da45c1 and 38d169e.

📒 Files selected for processing (1)
  • api/pkg/api/handler/nvlinklogicalpartition.go

Comment thread api/pkg/api/handler/nvlinklogicalpartition.go
@chet chet changed the title refactor: Migrate nvlinklogicalpartition handler to WithTx refactor: Migrate NVLink Logical Partition API handlers to WithTx transaction helper May 6, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 6, 2026

🔍 Container Scan Summary

Service Total Critical High Medium Low Other
nico-nsm 64 2 20 33 9 0
nico-psm 56 4 29 13 2 8
nico-rest-api 57 4 30 13 2 8
nico-rest-cert-manager 54 4 28 13 1 8
nico-rest-db 55 4 28 13 2 8
nico-rest-site-agent 54 4 28 13 1 8
nico-rest-site-manager 54 4 28 13 1 8
nico-rest-workflow 56 4 29 13 2 8
nico-rla 55 4 28 13 2 8
TOTAL 505 34 248 137 22 64

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

chet added a commit to chet/bare-metal-manager-rest that referenced this pull request May 7, 2026
…ain-nv

Match the convention in the rest of the codebase by splitting the
combined `if X := Y(); X != nil { ... }` patterns introduced by the
WithTx migration into two statements. Called out by CodeRabbit on
PR NVIDIA#494.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Chet Nichols III <chetn@nvidia.com>
@chet chet force-pushed the with-tx-nvlinklogicalpartition branch from 38d169e to 10c660c Compare May 7, 2026 04:45
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/pkg/api/handler/nvlinklogicalpartition.go`:
- Around line 210-215: Handlers currently call timeoutResp() and return its
result before checking the transaction helper error from cdb.WithTx /
cdb.WithTxResult, which masks DB/tx failures; change each handler (the
timeoutResp variable usage and the cdb.WithTx / cdb.WithTxResult call sites) to
first check the tx helper error (err) and if err != nil return that error (or
wrap/preserve it), otherwise if err == nil then call and return timeoutResp();
do the same for all listed locations so transaction rollback/unwind failures
surface before returning the timeout/TerminateWorkflow response.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1dd3754b-0861-4494-a6e4-503efdd207c5

📥 Commits

Reviewing files that changed from the base of the PR and between 38d169e and 10c660c.

📒 Files selected for processing (1)
  • api/pkg/api/handler/nvlinklogicalpartition.go

Comment thread api/pkg/api/handler/nvlinklogicalpartition.go
@chet chet force-pushed the with-tx-nvlinklogicalpartition branch 2 times, most recently from 1457877 to 5094e9a Compare May 7, 2026 06:25
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
api/pkg/api/handler/nvlinklogicalpartition.go (2)

237-238: 💤 Low value

Minor inefficiency: unnecessary pointer allocation and dereference.

The expression *cdb.GetStrPtr(cdbm.NVLinkLogicalPartitionStatusPending) allocates a pointer to the status constant and immediately dereferences it. If CreateFromParams accepts a plain string for the status parameter, pass the constant directly.

Proposed simplification
-		ssd, derr = sdDAO.CreateFromParams(ctx, tx, nvllp.ID.String(), *cdb.GetStrPtr(cdbm.NVLinkLogicalPartitionStatusPending),
+		ssd, derr = sdDAO.CreateFromParams(ctx, tx, nvllp.ID.String(), cdbm.NVLinkLogicalPartitionStatusPending,
 			cdb.GetStrPtr("received NVLink Logical Partition creation request, pending"))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/pkg/api/handler/nvlinklogicalpartition.go` around lines 237 - 238, The
call to sdDAO.CreateFromParams is needlessly allocating and immediately
dereferencing a pointer for the status argument using
*cdb.GetStrPtr(cdbm.NVLinkLogicalPartitionStatusPending); change that argument
to pass the plain constant cdbm.NVLinkLogicalPartitionStatusPending directly
(leave other params like nvllp.ID.String() and the description as-is) to avoid
the unnecessary pointer allocation and dereference.

1276-1277: 💤 Low value

Minor inefficiency: same unnecessary pointer allocation pattern.

Same observation as in the Create handler—*cdb.GetStrPtr(cdbm.NVLinkLogicalPartitionStatusDeleting) can be simplified if the parameter accepts a plain string.

Proposed simplification
-		ssd, derr := sdDAO.CreateFromParams(ctx, tx, nvllp.ID.String(), *cdb.GetStrPtr(cdbm.NVLinkLogicalPartitionStatusDeleting),
+		ssd, derr := sdDAO.CreateFromParams(ctx, tx, nvllp.ID.String(), cdbm.NVLinkLogicalPartitionStatusDeleting,
 			cdb.GetStrPtr("Received request for deletion, pending processing"))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/pkg/api/handler/nvlinklogicalpartition.go` around lines 1276 - 1277, In
sdDAO.CreateFromParams call in nvlinklogicalpartition.go (the line creating
`ssd`), remove the unnecessary pointer allocation
`*cdb.GetStrPtr(cdbm.NVLinkLogicalPartitionStatusDeleting)` and pass the plain
string constant `cdbm.NVLinkLogicalPartitionStatusDeleting` (or a local string
variable) directly; likewise stop calling `cdb.GetStrPtr` for that status
value—update the `ssd, derr := sdDAO.CreateFromParams(ctx, tx,
nvllp.ID.String(), ...)` invocation to use the plain string status and keep the
message parameter as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@api/pkg/api/handler/nvlinklogicalpartition.go`:
- Around line 237-238: The call to sdDAO.CreateFromParams is needlessly
allocating and immediately dereferencing a pointer for the status argument using
*cdb.GetStrPtr(cdbm.NVLinkLogicalPartitionStatusPending); change that argument
to pass the plain constant cdbm.NVLinkLogicalPartitionStatusPending directly
(leave other params like nvllp.ID.String() and the description as-is) to avoid
the unnecessary pointer allocation and dereference.
- Around line 1276-1277: In sdDAO.CreateFromParams call in
nvlinklogicalpartition.go (the line creating `ssd`), remove the unnecessary
pointer allocation `*cdb.GetStrPtr(cdbm.NVLinkLogicalPartitionStatusDeleting)`
and pass the plain string constant `cdbm.NVLinkLogicalPartitionStatusDeleting`
(or a local string variable) directly; likewise stop calling `cdb.GetStrPtr` for
that status value—update the `ssd, derr := sdDAO.CreateFromParams(ctx, tx,
nvllp.ID.String(), ...)` invocation to use the plain string status and keep the
message parameter as-is.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 93d546a6-7e9b-4aa8-b0a8-22b24020fb78

📥 Commits

Reviewing files that changed from the base of the PR and between 1457877 and 5094e9a.

📒 Files selected for processing (1)
  • api/pkg/api/handler/nvlinklogicalpartition.go

Applies the `WithTx` pattern from NVIDIA#462 to the NVLink Logical Partition handlers. Integrates `TerminateWorkflowOnTimeout` as well for extra squeaky clean-ness.

CodeRabbit feedback addressed:
- Propagate `StatusDetail` creation failure as an API error in the Delete handler (matches the Create handler) instead of just logging.

Signed-off-by: Chet Nichols III <chetn@nvidia.com>
@chet chet force-pushed the with-tx-nvlinklogicalpartition branch from 5094e9a to 95359e6 Compare May 7, 2026 06:29
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
api/pkg/api/handler/nvlinklogicalpartition.go (1)

205-317: 🏗️ Heavy lift

Add regression coverage for the new WithTx write paths.

These branches now depend on transaction-helper rollback semantics, delayed timeout termination, and special-case workflow error handling. Please add focused coverage for at least: workflow timeout in create/update/delete, description-only update payload generation, delete StatusDetail creation failure, and delete NICo object-not-found handling. This refactor is otherwise easy to regress silently.

Also applies to: 1011-1102, 1234-1334

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/pkg/api/handler/nvlinklogicalpartition.go` around lines 205 - 317, Add
regression tests exercising the new cdb.WithTx write paths: create focused tests
that (1) simulate Temporal client (scp.GetClientByID / stc.ExecuteWorkflow)
timeouts when running "CreateNVLinkLogicalPartition" so the wferr path that sets
timeoutResp and calls common.TerminateWorkflowOnTimeOut after tx rollback is
exercised, (2) verify update flows that only change Description produce the
correct payload and DB updates, (3) simulate sdDAO.CreateFromParams failing
during delete to assert proper error handling/reporting, and (4) simulate delete
NICo object-not-found responses to ensure that path is handled as expected; use
mocks for nvllpDAO, sdDAO, scp/stc, and common.UnwrapWorkflowError to drive the
error branches and assert timeoutResp is non-nil and TerminateWorkflowOnTimeOut
is invoked after WithTx returns.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@api/pkg/api/handler/nvlinklogicalpartition.go`:
- Around line 205-317: Add regression tests exercising the new cdb.WithTx write
paths: create focused tests that (1) simulate Temporal client (scp.GetClientByID
/ stc.ExecuteWorkflow) timeouts when running "CreateNVLinkLogicalPartition" so
the wferr path that sets timeoutResp and calls common.TerminateWorkflowOnTimeOut
after tx rollback is exercised, (2) verify update flows that only change
Description produce the correct payload and DB updates, (3) simulate
sdDAO.CreateFromParams failing during delete to assert proper error
handling/reporting, and (4) simulate delete NICo object-not-found responses to
ensure that path is handled as expected; use mocks for nvllpDAO, sdDAO, scp/stc,
and common.UnwrapWorkflowError to drive the error branches and assert
timeoutResp is non-nil and TerminateWorkflowOnTimeOut is invoked after WithTx
returns.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: dfc2b363-d7b4-456c-be6d-bca30021c4dd

📥 Commits

Reviewing files that changed from the base of the PR and between 5094e9a and 95359e6.

📒 Files selected for processing (1)
  • api/pkg/api/handler/nvlinklogicalpartition.go

@chet chet changed the title refactor: Migrate NVLink Logical Partition API handlers to WithTx transaction helper refactor: Migrate NVLink Logical Partition API handlers to WithTx May 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants