refactor: Migrate Network Security Group API handlers to WithTx transaction helper#478

Open
chet wants to merge 1 commit into NVIDIA:main from chet:with-tx-networksecuritygroup

Conversation

Contributor

@chet chet commented May 4, 2026

Description

Applies WithTx (and WithTxResult!) from #462 (also now used in #472, #473, and #476) to the Create/Update/Delete Network Security Group (NSG) handlers.

Implements our "timeoutResp pattern" (introduced in #472; @coderabbitai then said we should be consistent and do it everywhere). TL;DR: the existing code calls common.TerminateWorkflowOnTimeOut as soon as it detects a timeout, but we want to defer that call until after the transaction has unwound and the DB connection has been released, so the transaction is not held open while the call blocks on the network.

The adjustment (which we've done before, but I figured I'd call it out more explicitly here) was discussed in #472 (comment) and is effectively:

    var timeoutResp func() error

    err = cdb.WithTx(ctx, ..., func(tx *cdb.Tx) error {
      ...
      if /* workflow timeout detected */ {
        // capture the terminate work, but DON'T do it yet
        timeoutResp = func() error {
          return common.TerminateWorkflowOnTimeOut(...)
        }
        return cutil.NewAPIError(...)   // forces rollback
      }
      ...
    })

    // rollback has now completed, now we do potentially blocking network work
    if timeoutResp != nil {
      return timeoutResp()
    }
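
For anyone who wants to run the pattern end to end, here is a minimal, self-contained sketch. Everything in it is a hypothetical stand-in invented for illustration: `withTx`, `fakeTx`, `errWorkflowTimeout`, and `handle` are not the real `cdb.WithTx`, `cutil.NewAPIError`, or `common.TerminateWorkflowOnTimeOut`, and the `events` slice exists only to make the ordering visible.

```go
package main

import "errors"

// fakeTx is a hypothetical stand-in for a real transaction handle.
type fakeTx struct{ events *[]string }

// withTx is a hypothetical stand-in for a WithTx-style helper: it runs fn,
// then "commits" on a nil error and "rolls back" otherwise.
func withTx(events *[]string, fn func(tx *fakeTx) error) error {
	*events = append(*events, "begin")
	if err := fn(&fakeTx{events: events}); err != nil {
		*events = append(*events, "rollback")
		return err
	}
	*events = append(*events, "commit")
	return nil
}

// errWorkflowTimeout is an assumed sentinel standing in for the API error.
var errWorkflowTimeout = errors.New("workflow timed out")

// handle mimics a handler: on timeout it captures the terminate work in
// timeoutResp and runs it only after withTx has returned.
func handle(events *[]string, timedOut bool) error {
	var timeoutResp func() error

	err := withTx(events, func(tx *fakeTx) error {
		*tx.events = append(*tx.events, "db write")
		if timedOut {
			// Capture the terminate work, but do not run it yet.
			timeoutResp = func() error {
				*events = append(*events, "terminate workflow")
				return errWorkflowTimeout
			}
			return errWorkflowTimeout // non-nil error forces rollback
		}
		return nil
	})

	// The transaction has finished, so blocking network work is safe now.
	if timeoutResp != nil {
		return timeoutResp()
	}
	return err
}
```

With `timedOut` set to true, the recorded events are `begin`, `db write`, `rollback`, `terminate workflow`, in that order, which is the property the pattern is meant to guarantee.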

Also preemptively addressed some @coderabbitai feedback about log messages.

Signed-off-by: Chet Nichols III <chetn@nvidia.com>

Type of Change

  • Feature - New feature or functionality (feat:)
  • Fix - Bug fixes (fix:)
  • Chore - Modification or removal of existing functionality (chore:)
  • Refactor - Refactoring of existing functionality (refactor:)
  • Docs - Changes in documentation or OpenAPI schema (docs:)
  • CI - Changes in GitHub workflows. Requires additional scrutiny (ci:)
  • Version - Issuing a new release version (version:)

Services Affected

  • API - API models or endpoints updated
  • Workflow - Workflow service updated
  • DB - DB DAOs or migrations updated
  • Site Manager - Site Manager updated
  • Cert Manager - Cert Manager updated
  • Site Agent - Site Agent updated
  • RLA - RLA service updated
  • Powershelf Manager - Powershelf Manager updated
  • NVSwitch Manager - NVSwitch Manager updated

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@chet chet requested a review from a team as a code owner May 4, 2026 16:57
Contributor

coderabbitai Bot commented May 4, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 321c2193-a66b-4b49-97cb-ec317fba8bc4

📥 Commits

Reviewing files that changed from the base of the PR and between b41687c and 37f086a.

📒 Files selected for processing (1)
  • api/pkg/api/handler/networksecuritygroup.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • api/pkg/api/handler/networksecuritygroup.go

Summary by CodeRabbit

  • Refactor
    • Improved transaction handling for network security group create, update, and delete operations to be more reliable and consistent.
  • Bug Fixes
    • Enhanced timeout and cleanup behavior so long-running operations are properly finalized and scheduled for termination when needed, reducing risk of incomplete or stuck requests.

Walkthrough

Replace manual SQL transaction handling in NetworkSecurityGroup create/update/delete handlers with cdb.WithTx / cdb.WithTxResult closures. Move Temporal synchronous workflow invocation inside those closures, introduce outer-scope timeout callbacks deferred until after closure return, and remove the unused database/sql import.
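
As a rough illustration of what a closure-based helper replaces, the sketch below shows one way such a helper could own the commit/rollback decision. It is an assumption-laden sketch, not the actual `cdb.WithTx`: the `txn` interface, the `begin` callback, `recordingTx`, and `errExample` are all hypothetical names introduced here, and the real helper's signature may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// txn is the minimal surface this sketch needs from a transaction
// (hypothetical; *sql.Tx happens to satisfy it).
type txn interface {
	Commit() error
	Rollback() error
}

// withTx begins a transaction, runs fn, and commits only if fn returns nil.
// Any error from fn triggers a rollback and is returned to the caller.
func withTx(begin func() (txn, error), fn func(tx txn) error) error {
	tx, err := begin()
	if err != nil {
		return fmt.Errorf("begin transaction: %w", err)
	}
	if err := fn(tx); err != nil {
		if rbErr := tx.Rollback(); rbErr != nil {
			return fmt.Errorf("%w (rollback also failed: %v)", err, rbErr)
		}
		return err
	}
	return tx.Commit()
}

// recordingTx is a fake transaction that records which method was called.
type recordingTx struct{ calls []string }

func (r *recordingTx) Commit() error {
	r.calls = append(r.calls, "commit")
	return nil
}

func (r *recordingTx) Rollback() error {
	r.calls = append(r.calls, "rollback")
	return nil
}

// errExample is an assumed sentinel used to force the rollback path.
var errExample = errors.New("example failure")
```

The handler body then only decides whether to return an error; it no longer needs its own Commit/Rollback blocks, which is what lets the `database/sql` import go away in the handlers.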

Changes

NetworkSecurityGroup Transaction Management Refactor

| Layer | File(s) | Summary |
|---|---|---|
| Import Cleanup | api/pkg/api/handler/networksecuritygroup.go | Removes database/sql import now that explicit sql.TxOptions / BeginTx/Commit/Rollback are eliminated. |
| Data / Local IDs | api/pkg/api/handler/networksecuritygroup.go | networkSecurityGroupID and transaction-scoped values are generated/assigned inside transactional closures; outer-scope placeholders (e.g., timeoutResp, ssd, ssds) are declared to be populated by the closures. |
| Core Transactional Pattern | api/pkg/api/handler/networksecuritygroup.go (create: ~218–360, update: ~1193–1325, delete: ~907–1018) | Replaces manual tx management with cdb.WithTxResult (create/update) and cdb.WithTx (delete). DB insert/update/delete and status-detail creation occur inside closure-based transactions. |
| Workflow Execution Moved Inside Transaction | api/pkg/api/handler/networksecuritygroup.go (create/update/delete closures) | Temporal client retrieval and synchronous workflow Start/Complete calls are invoked from inside the transaction closures rather than spanning manual transaction boundaries. |
| Timeout / Termination Handling | api/pkg/api/handler/networksecuritygroup.go (create/update/delete) | Introduces outer-scope timeoutResp callbacks; if a workflow timeout occurs, scheduling of TerminateWorkflowOnTimeOut is deferred until after the transactional closure returns. |
| Response Construction / Wiring | api/pkg/api/handler/networksecuritygroup.go (response sections near ~1323 and other handlers) | HTTP responses are constructed from NSG and status-detail values produced and captured inside the transactional closures. |
| Removed Manual Commit Blocks | api/pkg/api/handler/networksecuritygroup.go | Old explicit Commit/Rollback blocks removed; commit/rollback now managed by cdb.WithTx/WithTxResult helper. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Handler
    participant DB
    participant Temporal
    Client->>Handler: HTTP Create/Update/Delete request
    Handler->>DB: cdb.WithTx / cdb.WithTxResult (enter closure)
    DB-->>Handler: tx context
    Handler->>DB: insert/update/delete NSG + create status-details
    Handler->>Temporal: start synchronous workflow (within closure)
    Temporal-->>Handler: workflow result or timeout
    DB-->>Handler: commit or rollback (closure returns)
    alt workflow timeout detected
        Handler->>Temporal: TerminateWorkflowOnTimeOut (scheduled after closure)
    end
    Handler-->>Client: HTTP response built from closure-captured values

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately and specifically describes the primary change: migration of Network Security Group handlers to the WithTx transaction helper pattern. |
| Description check | ✅ Passed | The description is directly relevant to the changeset, detailing the refactoring rationale, the timeoutResp pattern implementation, and references to prior related work. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Comment @coderabbitai help to get the list of available commands and usage tips.

github-actions Bot commented May 4, 2026

🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉

🔗 View scan details

🕐 Last updated: 2026-05-04 16:58:16 UTC | Commit: 064594a

Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
api/pkg/api/handler/networksecuritygroup.go (1)

937-940: 💤 Low value

Inconsistent error handling for status detail creation.

In the Create handler (lines 253-256), a failed sdDAO.CreateFromParams call returns an error and aborts the transaction. Here in Delete, the error is logged but silently ignored, allowing the deletion to proceed. If this is intentional—treating status detail as non-critical for deletes—consider adding a brief comment to document this design decision; otherwise, align with the Create handler's behavior.

💡 Suggested documentation if intentional
 		// Create status detail
+		// NOTE: Status detail creation is non-critical for deletes; log and continue.
 		if _, derr := sdDAO.CreateFromParams(ctx, tx, nsg.ID, *cdb.GetStrPtr(cdbm.NetworkSecurityGroupStatusDeleting),
 			cdb.GetStrPtr("received request for deletion, pending processing")); derr != nil {
 			logger.Error().Err(derr).Msg("error creating Status Detail DB entry")
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/handler/networksecuritygroup.go` around lines 937 - 940, The
Delete handler currently calls sdDAO.CreateFromParams and only logs errors while
Create handler treats that error as fatal and aborts the transaction; make them
consistent by either (A) propagating the error from sdDAO.CreateFromParams in
the Delete handler (roll back/abort the current transaction and return the error
exactly like the Create handler does), or (B) if the status-detail write is
intentionally non-critical on delete, add a concise comment above the
sdDAO.CreateFromParams call explaining this design decision so future readers
know the difference; reference sdDAO.CreateFromParams, the Delete handler in
networksecuritygroup.go, and the Create handler behavior when making the change.
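
To make the two options concrete, here is a small sketch of how each would look from inside a transactional closure. It rests on stated assumptions: `createStatusDetail` is a hypothetical stand-in for `sdDAO.CreateFromParams`, `errStatusDetail` is an invented sentinel, and the returned error is taken to be what a WithTx-style helper would use to decide between commit and rollback.

```go
package main

import (
	"errors"
	"fmt"
)

// errStatusDetail is an assumed sentinel for a failed status-detail insert.
var errStatusDetail = errors.New("status detail insert failed")

// createStatusDetail is a hypothetical stand-in for the DAO call.
func createStatusDetail(fail bool) error {
	if fail {
		return errStatusDetail
	}
	return nil
}

// deleteStrict follows option (A): propagate the error so the surrounding
// transaction rolls back, matching the Create handler's behavior.
func deleteStrict(fail bool) error {
	if err := createStatusDetail(fail); err != nil {
		return fmt.Errorf("creating status detail for delete: %w", err)
	}
	return nil
}

// deleteLenient follows option (B): log and continue, with a comment that
// documents the decision for future readers.
func deleteLenient(fail bool, logf func(format string, args ...any)) error {
	// NOTE: status detail creation is non-critical for deletes; log and continue.
	if err := createStatusDetail(fail); err != nil {
		logf("error creating Status Detail DB entry: %v", err)
	}
	return nil
}
```

Option (A) trades a failed delete for a consistent audit trail, while option (B) keeps deletes succeeding at the cost of a possibly missing status-detail row; which one is right depends on how the status details are consumed.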

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 32121a4c-e1eb-42a6-8182-d78927e24bf4

📥 Commits

Reviewing files that changed from the base of the PR and between bce7503 and 064594a.

📒 Files selected for processing (1)
  • api/pkg/api/handler/networksecuritygroup.go

github-actions Bot commented May 4, 2026

🔍 Container Scan Summary

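| Service | Total | Critical | High | Medium | Low | Other |
|---|---|---|---|---|---|---|
| nico-nsm | 64 | 2 | 20 | 33 | 9 | 0 |
| nico-psm | 56 | 4 | 29 | 13 | 2 | 8 |
| nico-rest-api | 57 | 4 | 30 | 13 | 2 | 8 |
| nico-rest-cert-manager | 54 | 4 | 28 | 13 | 1 | 8 |
| nico-rest-db | 55 | 4 | 28 | 13 | 2 | 8 |
| nico-rest-site-agent | 54 | 4 | 28 | 13 | 1 | 8 |
| nico-rest-site-manager | 54 | 4 | 28 | 13 | 1 | 8 |
| nico-rest-workflow | 56 | 4 | 29 | 13 | 2 | 8 |
| nico-rla | 55 | 4 | 28 | 13 | 2 | 8 |
| TOTAL | 505 | 34 | 248 | 137 | 22 | 64 |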

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

@chet chet force-pushed the with-tx-networksecuritygroup branch from 064594a to 214a91e Compare May 6, 2026 17:52
@thossain-nv thossain-nv changed the title from "refactor: Migrate networksecuritygroup handler to WithTx" to "refactor: Migrate Network Security Group API handlers to WithTx transaction helper" May 6, 2026
var timeoutResp func() error
var ssd *cdbm.StatusDetail

networkSecurityGroup, err := cdb.WithTxResult(ctx, cnsgh.dbSession, func(tx *cdb.Tx) (*cdbm.NetworkSecurityGroup, error) {
Contributor

Is there any difference between using WithTx with an outer-scope variable and calling WithTxResult?

Contributor Author

Oh yeah -- that's a lot better -- fixed!
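
For readers following the thread, the two shapes being compared look roughly like this. This is only a sketch under assumptions: `tx`, `withTx`, `withTxResult`, and `group` are hypothetical names, the transaction mechanics are deliberately left out, and the real `cdb.WithTxResult` may be defined differently.

```go
package main

// tx is a placeholder transaction handle (hypothetical).
type tx struct{}

// withTx stands in for a helper that commits or rolls back around fn.
// The transaction mechanics are omitted; only the call shape matters here.
func withTx(fn func(t *tx) error) error { return fn(&tx{}) }

// withTxResult is the value-returning variant, built on top of withTx.
// On error it returns the zero value so callers never see a partial result.
func withTxResult[T any](fn func(t *tx) (T, error)) (T, error) {
	var out T
	err := withTx(func(t *tx) error {
		var ferr error
		out, ferr = fn(t)
		return ferr
	})
	if err != nil {
		var zero T
		return zero, err
	}
	return out, nil
}

// group is a hypothetical stand-in for the NSG model.
type group struct{ ID string }

// createWithOuterVar smuggles the result out through a captured variable.
func createWithOuterVar() (*group, error) {
	var g *group
	err := withTx(func(t *tx) error {
		g = &group{ID: "nsg-1"}
		return nil
	})
	if err != nil {
		return nil, err
	}
	return g, nil
}

// createWithResult returns the value directly from the closure.
func createWithResult() (*group, error) {
	return withTxResult(func(t *tx) (*group, error) {
		return &group{ID: "nsg-1"}, nil
	})
}
```

Both functions return the same value; the difference is mostly readability. The result-returning form avoids a mutable outer variable that could be read after a failed transaction, though side values such as `timeoutResp` still have to be captured from the outer scope.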

NetworkSecurityGroupID: nsg.ID,
Status: cdb.GetStrPtr(cdbm.NetworkSecurityGroupStatusDeleting),
}
if _, derr := nsgDAO.Update(ctx, tx, unsgInput); derr != nil {
Contributor

Maybe we can detach these as well?

Contributor Author

shoot, missed that one -- done, sorry!

@chet chet force-pushed the with-tx-networksecuritygroup branch from 214a91e to b41687c Compare May 6, 2026 19:56
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/pkg/api/handler/networksecuritygroup.go`:
- Around line 940-945: The StatusDetail creation error is currently only logged
and swallowed after calling sdDAO.CreateFromParams in the delete flow; change
this to fail the surrounding transaction by returning or propagating the error
(derr) so the caller can rollback/abort instead of committing; locate the
sdDAO.CreateFromParams call (using ctx, tx, nsg.ID and
cdbm.NetworkSecurityGroupStatusDeleting) and replace the logger-only path with a
return/propagate of derr (or wrap it with context) consistent with the
create/update error handling.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ab031316-39c3-4890-ac47-f25c627cdf52

📥 Commits

Reviewing files that changed from the base of the PR and between 064594a and b41687c.

📒 Files selected for processing (1)
  • api/pkg/api/handler/networksecuritygroup.go

Comment thread api/pkg/api/handler/networksecuritygroup.go
Applies `WithTx` (and `WithTxResult`!) from NVIDIA#462 to the `Create`/`Update`/`Delete` NSG handlers.

Implements our "`timeoutResp` pattern" (which is something we had introduced in NVIDIA#472, and then @coderabbitai said we should be consistent by doing it everywhere). TLDR is the existing code calls `common.TerminateWorkflowOnTimeOut` on timeout, but we want to defer that until after the transaction is unwound + DB connection back (because we don't want it to block waiting on the network).

The adjustment (which we've done before, but figured I'd call it out more explicitly here) is effectively:
```
    var timeoutResp func() error

    err = cdb.WithTx(ctx, ..., func(tx *cdb.Tx) error {
      ...
      if /* workflow timeout detected */ {
        // capture the terminate work, but DON'T do it yet
        timeoutResp = func() error {
          return common.TerminateWorkflowOnTimeOut(...)
        }
        return cutil.NewAPIError(...)   // forces rollback
      }
      ...
    })

    // rollback has now completed, now we do potentially blocking network work
    if timeoutResp != nil {
      return timeoutResp()
    }
```

Also addressed some @coderabbitai feedback around log messages in advance.

Signed-off-by: Chet Nichols III <chetn@nvidia.com>
@chet chet force-pushed the with-tx-networksecuritygroup branch from b41687c to 37f086a Compare May 6, 2026 20:14