Skip to content

refactor: Migrate VPC Prefix API handlers to WithTx#498

Merged
chet merged 1 commit into
NVIDIA:mainfrom
chet:with-tx-vpcprefix
May 7, 2026
Merged

refactor: Migrate VPC Prefix API handlers to WithTx#498
chet merged 1 commit into
NVIDIA:mainfrom
chet:with-tx-vpcprefix

Conversation

@chet
Copy link
Copy Markdown
Contributor

@chet chet commented May 6, 2026

Description

Applies WithTx from #462 to the VPC Prefix API handlers.

Takes into consideration a few previous code reviews for integrating these, ensuring:

  • We split assignment from error condition checking (thanks @thossain-nv).
  • We use the TerminateWorkflowOnTimeOut helper and not duplicate code (thanks @thossain-nv).
  • We make sure we're consistently using outer-scope vars with WithTx and not a weird mix of WithTx and WithTxResult.

Signed-off-by: Chet Nichols III chetn@nvidia.com

Type of Change

  • Feature - New feature or functionality (feat:)
  • Fix - Bug fixes (fix:)
  • Chore - Modification or removal of existing functionality (chore:)
  • Refactor - Refactoring of existing functionality (refactor:)
  • Docs - Changes in documentation or OpenAPI schema (docs:)
  • CI - Changes in GitHub workflows. Requires additional scrutiny (ci:)
  • Version - Issuing a new release version (version:)

Services Affected

  • API - API models or endpoints updated
  • Workflow - Workflow service updated
  • DB - DB DAOs or migrations updated
  • Site Manager - Site Manager updated
  • Cert Manager - Cert Manager updated
  • Site Agent - Site Agent updated
  • RLA - RLA service updated
  • Powershelf Manager - Powershelf Manager updated
  • NVSwitch Manager - NVSwitch Manager updated

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@chet chet requested a review from a team as a code owner May 6, 2026 20:49
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 6, 2026

Review Change Stack

Summary by CodeRabbit

  • Refactor
    • VPC prefix handlers unified under a safer transactional flow, consolidating create/update/delete steps to run atomically.
    • Added advisory locking to reduce concurrent conflicts and integrated synchronous background workflow execution into handler flows.
    • Improved timeout-aware handling so workflow termination runs only after transaction unwind, and standardized transaction error handling for more reliable responses.

Walkthrough

VPC prefix Create/Update/Delete handlers were refactored to use cdb.WithTx closure transactions, remove direct database/sql usage, centralize advisory locking and status-detail writes inside the closure, and execute synchronous Temporal workflows with deferred timeout termination after the transaction unwinds.

Changes

VPC Prefix Handler Refactoring with Temporal Workflow Integration

Layer / File(s) Summary
Import and Infrastructure
api/pkg/api/handler/vpcprefix.go
Removed direct database/sql import; transaction control now delegated to cdb.WithTx.
Create Handler Implementation
api/pkg/api/handler/vpcprefix.go (lines 204–331)
CreateVpcPrefixHandler.Handle refactored to run inside WithTx: acquires advisory lock, creates IPAM child prefix, inserts VPC prefix and status detail, looks up Temporal client, executes synchronous CreateVpcPrefix workflow (with timeout handling deferred until after tx), and returns created entity plus status detail.
Update Handler Implementation
api/pkg/api/handler/vpcprefix.go (lines 780–872)
UpdateVpcPrefixHandler.Handle now uses WithTx for the update, status-detail retrieval for the response, Temporal client lookup, and synchronous UpdateVpcPrefix workflow execution; defers timeout-based TerminateWorkflow until after transaction unwinds.
Delete Handler Implementation
api/pkg/api/handler/vpcprefix.go (lines 994–1093)
DeleteVpcPrefixHandler.Handle migrated to WithTx, acquires advisory lock inside closure, sets status to Deleting and creates status detail, executes synchronous DeleteVpcPrefix workflow (tolerating object-not-found Site errors), and defers termination on timeout until after the transaction returns; uses common.HandleTxError for tx error handling.

Sequence Diagram

sequenceDiagram
    participant Handler as Handler
    participant WithTx as WithTx\n(Transaction)
    participant DB as Database
    participant Status as Status\nDetails
    participant Temporal as Temporal\nClient
    participant Workflow as Workflow\nExecutor

    Handler->>WithTx: Begin closure with TX
    WithTx->>DB: Acquire advisory lock (if needed)
    Note over DB: Lock held for duration

    rect rgba(100, 150, 200, 0.5)
    Note over WithTx,DB: Core Operation (Create/Update/Delete)
    WithTx->>DB: Execute entity operation (insert/update/delete), create/read status detail
    WithTx->>Status: create/read status detail
    DB-->>WithTx: Operation result
    end

    WithTx->>Temporal: Retrieve Temporal client
    WithTx->>Workflow: Construct workflow request
    WithTx->>Workflow: Execute synchronously (with timeout)
    Workflow-->>WithTx: Workflow status/result

    WithTx->>Handler: Closure returns (tx unwinds)
    alt timeout
      Handler->>Temporal: TerminateWorkflowOnTimeOut deferred
    end
    Handler-->>Client: API response with entity + status detail
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main refactoring effort: migrating VPC Prefix API handlers to use the WithTx pattern, which is the primary change across the modified file.
Description check ✅ Passed The description clearly relates to the changeset by explaining the WithTx migration pattern application and referencing specific implementation considerations from prior reviews.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 6, 2026

🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉

🔗 View scan details

🕐 Last updated: 2026-05-06 20:51:26 UTC | Commit: 49fedcb

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 6, 2026

🔍 Container Scan Summary

Service Total Critical High Medium Low Other
nico-nsm 64 2 20 33 9 0
nico-psm 56 4 29 13 2 8
nico-rest-api 57 4 30 13 2 8
nico-rest-cert-manager 54 4 28 13 1 8
nico-rest-db 55 4 28 13 2 8
nico-rest-site-agent 54 4 28 13 1 8
nico-rest-site-manager 54 4 28 13 1 8
nico-rest-workflow 56 4 29 13 2 8
nico-rla 55 4 28 13 2 8
TOTAL 505 34 248 137 22 64

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

Copy link
Copy Markdown
Contributor

@thossain-nv thossain-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks @chet

Applies `WithTx` from NVIDIA#462 to the VPC Prefix API handlers.

Takes into consideration a few previous code reviews for integrating these, ensuring:
- We split assignment from error condition checking (thanks @thossain-nv).
- We use the `TerminateWorkflowOnTimeOut` helper and not duplicate code (thanks @thossain-nv).
- We make sure we're consistently using outer-scope vars with `WithTx` and not a weird mix of `WithTx` and `WithTxResult`.

Signed-off-by: Chet Nichols III <chetn@nvidia.com>
@chet chet force-pushed the with-tx-vpcprefix branch from 49fedcb to d6d2a6b Compare May 7, 2026 19:26
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/pkg/api/handler/vpcprefix.go`:
- Around line 1038-1077: The delete handler uses the unbounded ctx and omits
WorkflowExecutionTimeout, which can leave DB transactions/locks open; wrap the
call to stc.ExecuteWorkflow in a bounded context via ctxBound, cancel :=
context.WithTimeout(ctx, cutil.WorkflowContextTimeout) with defer cancel(), pass
ctxBound into stc.ExecuteWorkflow and we.Get (replace uses of ctx), and add
workflowOptions.WorkflowExecutionTimeout = cutil.WorkflowExecutionTimeout so the
ExecuteWorkflow/WorkflowExecution has the same timeouts as create/update
handlers (refer to symbols workflowOptions, stc.ExecuteWorkflow, we.Get,
cutil.WorkflowContextTimeout, cutil.WorkflowExecutionTimeout).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 67b69b5f-2121-4e68-ba12-4f2e0c5b7143

📥 Commits

Reviewing files that changed from the base of the PR and between 49fedcb and d6d2a6b.

📒 Files selected for processing (1)
  • api/pkg/api/handler/vpcprefix.go

Comment on lines +1038 to +1077
workflowOptions := temporalClient.StartWorkflowOptions{
ID: "vpc-prefix-delete-" + vpcPrefix.ID.String(),
TaskQueue: queue.SiteTaskQueue,
}

logger.Error().Err(err).Msg("failed to delete VPC Prefix, timeout occurred executing workflow on Site.")
logger.Info().Msg("triggering VPC prefix delete workflow")

// Create a new context deadlines
newctx, newcancel := context.WithTimeout(context.Background(), cutil.WorkflowContextNewAfterTimeout)
defer newcancel()
// Trigger Site workflow to delete VPC prefix VPC prefix
we, derr := stc.ExecuteWorkflow(ctx, workflowOptions, "DeleteVpcPrefix", deleteVpcPrefixRequest)
if derr != nil {
logger.Error().Err(derr).Msg("failed to synchronously start Temporal workflow to delete VPC prefix")
return cutil.NewAPIError(http.StatusInternalServerError, fmt.Sprintf("Failed to start sync workflow to delete VPC prefix on Site: %s", derr), nil)
}

// Initiate termination workflow
serr := stc.TerminateWorkflow(newctx, wid, "", "timeout occurred executing delete VPC prefix workflow")
if serr != nil {
logger.Error().Err(serr).Msg("failed to terminate Temporal workflow for deleting VPC prefix")
return cutil.NewAPIErrorResponse(c, http.StatusInternalServerError, fmt.Sprintf("Failed to terminate synchronous VPC prefix deletion workflow after timeout, Cloud and Site data may be de-synced: %s", serr), nil)
wid := we.GetID()
logger.Info().Str("Workflow ID", wid).Msg("executed synchronous delete VPC prefix workflow")

// Execute the workflow synchronously
wferr := we.Get(ctx, nil)
// Handle skippable errors
if wferr != nil {
// If this was a 404 back from NICo, we can treat the object as already having been deleted and allow things to proceed.
var applicationErr *tp.ApplicationError
if errors.As(wferr, &applicationErr) && slices.Contains(swe.ObjectNotFoundErrTypes(), applicationErr.Type()) {
logger.Warn().Msg(swe.ErrTypeNICoObjectNotFound + " received from Site")
// Reset error to nil
wferr = nil
}
}

logger.Info().Str("Workflow ID", wid).Msg("initiated terminate synchronous delete VPC prefix workflow successfully")
// Check if wferr is still nil now that we've handled any skippable errors.
if wferr != nil {
var timeoutErr *tp.TimeoutError
if errors.As(wferr, &timeoutErr) || ctx.Err() != nil {
logger.Error().Err(wferr).Msg("failed to delete VPC Prefix, timeout occurred executing workflow on Site.")
timeoutResp = func() error {
return common.TerminateWorkflowOnTimeOut(c, logger, stc, wid, wferr, "VPCPrefix", "Delete")
}
return cutil.NewAPIError(http.StatusInternalServerError, "VPC Prefix delete workflow timed out", nil)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
sed -n '277,321p;818,865p;1038,1082p' api/pkg/api/handler/vpcprefix.go

Repository: NVIDIA/infra-controller-rest

Length of output: 6458


Align delete workflow timeout handling with create and update handlers.

The delete handler omits WorkflowExecutionTimeout and executes the workflow against the unbounded handler context, whereas both create and update handlers establish a bounded workflow context via context.WithTimeout(ctx, cutil.WorkflowContextTimeout) and set WorkflowExecutionTimeout: cutil.WorkflowExecutionTimeout. A stalled Site workflow will hold the DB transaction and advisory lock open until the handler context is canceled, degrading system reliability and blocking other operations.

Suggested fix
 		workflowOptions := temporalClient.StartWorkflowOptions{
-			ID:        "vpc-prefix-delete-" + vpcPrefix.ID.String(),
-			TaskQueue: queue.SiteTaskQueue,
+			ID:                       "vpc-prefix-delete-" + vpcPrefix.ID.String(),
+			WorkflowExecutionTimeout: cutil.WorkflowExecutionTimeout,
+			TaskQueue:                queue.SiteTaskQueue,
 		}
 
 		logger.Info().Msg("triggering VPC prefix delete workflow")
 
+		workflowCtx, cancel := context.WithTimeout(ctx, cutil.WorkflowContextTimeout)
+		defer cancel()
+
 		// Trigger Site workflow to delete VPC prefix VPC prefix
-		we, derr := stc.ExecuteWorkflow(ctx, workflowOptions, "DeleteVpcPrefix", deleteVpcPrefixRequest)
+		we, derr := stc.ExecuteWorkflow(workflowCtx, workflowOptions, "DeleteVpcPrefix", deleteVpcPrefixRequest)
 		if derr != nil {
 			logger.Error().Err(derr).Msg("failed to synchronously start Temporal workflow to delete VPC prefix")
 			return cutil.NewAPIError(http.StatusInternalServerError, fmt.Sprintf("Failed to start sync workflow to delete VPC prefix on Site: %s", derr), nil)
 		}
 
 		wid := we.GetID()
 		logger.Info().Str("Workflow ID", wid).Msg("executed synchronous delete VPC prefix workflow")
 
 		// Execute the workflow synchronously
-		wferr := we.Get(ctx, nil)
+		wferr := we.Get(workflowCtx, nil)
 		// Handle skippable errors
 		if wferr != nil {
 			// If this was a 404 back from NICo, we can treat the object as already having been deleted and allow things to proceed.
 			var applicationErr *tp.ApplicationError
 			if errors.As(wferr, &applicationErr) && slices.Contains(swe.ObjectNotFoundErrTypes(), applicationErr.Type()) {
 				logger.Warn().Msg(swe.ErrTypeNICoObjectNotFound + " received from Site")
 				// Reset error to nil
 				wferr = nil
 			}
 		}
 
 		// Check if wferr is still nil now that we've handled any skippable errors.
 		if wferr != nil {
 			var timeoutErr *tp.TimeoutError
-			if errors.As(wferr, &timeoutErr) || ctx.Err() != nil {
+			if errors.As(wferr, &timeoutErr) || wferr == context.DeadlineExceeded || workflowCtx.Err() != nil {
 				logger.Error().Err(wferr).Msg("failed to delete VPC Prefix, timeout occurred executing workflow on Site.")
 				timeoutResp = func() error {
 					return common.TerminateWorkflowOnTimeOut(c, logger, stc, wid, wferr, "VPCPrefix", "Delete")
 				}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/pkg/api/handler/vpcprefix.go` around lines 1038 - 1077, The delete
handler uses the unbounded ctx and omits WorkflowExecutionTimeout, which can
leave DB transactions/locks open; wrap the call to stc.ExecuteWorkflow in a
bounded context via ctxBound, cancel := context.WithTimeout(ctx,
cutil.WorkflowContextTimeout) with defer cancel(), pass ctxBound into
stc.ExecuteWorkflow and we.Get (replace uses of ctx), and add
workflowOptions.WorkflowExecutionTimeout = cutil.WorkflowExecutionTimeout so the
ExecuteWorkflow/WorkflowExecution has the same timeouts as create/update
handlers (refer to symbols workflowOptions, stc.ExecuteWorkflow, we.Get,
cutil.WorkflowContextTimeout, cutil.WorkflowExecutionTimeout).

@chet chet merged commit 31aa375 into NVIDIA:main May 7, 2026
53 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants