refactor: Migrate vpc handlers to WithTx#472
Conversation
|
Caution Review failedFailed to post review comments Summary by CodeRabbit
WalkthroughRefactors VPC handlers to run DB updates and synchronous Temporal workflow invocations inside ChangesVPC Handler Transaction Refactoring
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
🔐 TruffleHog Secret Scan✅ No secrets or credentials found! Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉 🕐 Last updated: 2026-05-02 05:41:34 UTC | Commit: e1bbf86 |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/pkg/api/handler/vpc.go`:
- Around line 449-466: The timeout handling currently calls
stc.TerminateWorkflow from inside the cdb.WithTx closure (checking
errors.As(wferr, &timeoutErr) || wferr == context.DeadlineExceeded ||
wfCtx.Err() != nil), holding the DB transaction during the RPC; instead, set a
local flag (e.g., timeoutResp or a boolean like needTerminate) and capture the
workflow ID (wid) and wferr inside the closure, then return from WithTx; after
WithTx completes, if needTerminate is true create a new context with
context.WithTimeout using cutil.WorkflowContextNewAfterTimeout and call
stc.TerminateWorkflow (handling serr/logging and returning the appropriate
cutil.NewAPIError), mirroring the pattern used by UpdateVirtualization to avoid
making remote calls while the transaction is open; reference
stc.TerminateWorkflow, timeoutErr, wferr, wfCtx, wid,
cutil.WorkflowContextNewAfterTimeout and logger to locate and implement the
change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f7a330e7-4070-4834-be73-35fa422f9384
📒 Files selected for processing (1)
api/pkg/api/handler/vpc.go
🔍 Container Scan Summary
Per-CVE detail lives in the per-service |
| var timeoutErr *tp.TimeoutError | ||
| if errors.As(wferr, &timeoutErr) || wferr == context.DeadlineExceeded || wfCtx.Err() != nil { | ||
| logger.Error().Err(wferr).Msg("failed to create VPC, timeout occurred executing workflow on Site.") | ||
| timeoutResp = func() error { |
There was a problem hiding this comment.
I’m trying to understand how this cancel workflow will execute based on this pattern.
There was a problem hiding this comment.
WithTx returns APIError, which gets processed after WithTx executes. Other HandleTxError manages returning error.
There was a problem hiding this comment.
@hwadekar-nv Yeahhh good question! The timeoutResp flow is:
- Inside the closure, when we detect a timeout, we don't call
TerminateWorkflowdirectly. Instead, we stash a closure in the outer-scopetimeoutRespthat knows how to do the termination work. - We then return a
*cutil.APIErrorfrom theWithTxclosure, which causesWithTxto roll back the transaction. - After
WithTxreturns (rollback complete, locks released, etc), the outer code checks iftimeoutRespis set; that's when the actualTerminateWorkflowRPC fires (callingTerminateWorkflowOnTimeOut, which builds a fresh context withWorkflowContextNewAfterTimeout+ hits Site to terminate the orphaned workflow + and returns an error to the caller.
And if it's not obvious at first glance, the reason we don't inline TeriminateWorkflow while still inside the closure is because the DB transaction would stay open during that second RPC, which would continue to hold locks + a connection while we wait on the network/RPC.
We ran into this a bunch in the Core codebase, and have spent almost year trying to clean it up and unwind it (we even have a custom compiler extension now that looks for blocking calls during transactions).
Lemme know if that all makes sense!
thossain-nv
left a comment
There was a problem hiding this comment.
Thank you for the great refactor @chet Added a suggestion for timeout handling.
| var timeoutErr *tp.TimeoutError | ||
| if errors.As(wferr, &timeoutErr) || wferr == context.DeadlineExceeded || wfCtx.Err() != nil { | ||
| logger.Error().Err(wferr).Msg("failed to create VPC, timeout occurred executing workflow on Site.") | ||
| timeoutResp = func() error { |
There was a problem hiding this comment.
Should we make use if the existing utility method TerminateWorkflowOnTimeOut for the timeout handling?
This applies the new `WithTx` pattern from NVIDIA#462 to the `vpc` handlers (Create/Update/Delete plus the virtualization-type update). The `UpdateVirtualization` handler uses the `timeoutResp` closure pattern for `common.TerminateWorkflowOnTimeOut` so the helper still runs after the tx unwinds; the other three retain their inline `stc.TerminateWorkflow` paths Keeping these PRs smaller and tightly scoped so they're: - In theory a little easier to read. - More tightly scoped/less "blast radius" per PR. - A little nicer on/for @coderabbitai. 😆 I do I wish the diffs were easier to read, but it is what it is! Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Applies `WithTx` (and `WithTxResult`!) from NVIDIA#462 to the `Create`/`Update`/`Delete` NSG handlers. Implements our "`timeoutResp` pattern" (which is something we had introduced in NVIDIA#472, and then @coderabbitai said we should be consistent by doing it everywhere). TLDR is the existing code calls `common.TerminateWorkflowOnTimeOut` on timeout, but we want to defer that until after the transaction is unwound + DB connection back (because we don't want it to block waiting on the network). The adjustment (which we've done before, but figured I'd call it out more explicitly here) is effectively: ``` var timeoutResp func() error err = cdb.WithTx(ctx, ..., func(tx *cdb.Tx) error { ... if /* workflow timeout detected */ { // capture the terminate work, but DON'T do it yet timeoutResp = func() error { return common.TerminateWorkflowOnTimeOut(...) } return cutil.NewAPIError(...) // forces rollback } ... }) // rollback has now completed, now we do potentially blocking network work if timeoutResp != nil { return timeoutResp() } ``` Also addressed some @coderabbitai feedback around log messages in advance. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Applies `WithTx` (and `WithTxResult`!) from NVIDIA#462 to the `Create`/`Update`/`Delete` NSG handlers. Implements our "`timeoutResp` pattern" (which is something we had introduced in NVIDIA#472, and then @coderabbitai said we should be consistent by doing it everywhere). TLDR is the existing code calls `common.TerminateWorkflowOnTimeOut` on timeout, but we want to defer that until after the transaction is unwound + DB connection back (because we don't want it to block waiting on the network). The adjustment (which we've done before, but figured I'd call it out more explicitly here) is effectively: ``` var timeoutResp func() error err = cdb.WithTx(ctx, ..., func(tx *cdb.Tx) error { ... if /* workflow timeout detected */ { // capture the terminate work, but DON'T do it yet timeoutResp = func() error { return common.TerminateWorkflowOnTimeOut(...) } return cutil.NewAPIError(...) // forces rollback } ... }) // rollback has now completed, now we do potentially blocking network work if timeoutResp != nil { return timeoutResp() } ``` Also addressed some @coderabbitai feedback around log messages in advance. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Applies `WithTx` (and `WithTxResult`!) from NVIDIA#462 to the `Create`/`Update`/`Delete` NSG handlers. Implements our "`timeoutResp` pattern" (which is something we had introduced in NVIDIA#472, and then @coderabbitai said we should be consistent by doing it everywhere). TLDR is the existing code calls `common.TerminateWorkflowOnTimeOut` on timeout, but we want to defer that until after the transaction is unwound + DB connection back (because we don't want it to block waiting on the network). The adjustment (which we've done before, but figured I'd call it out more explicitly here) is effectively: ``` var timeoutResp func() error err = cdb.WithTx(ctx, ..., func(tx *cdb.Tx) error { ... if /* workflow timeout detected */ { // capture the terminate work, but DON'T do it yet timeoutResp = func() error { return common.TerminateWorkflowOnTimeOut(...) } return cutil.NewAPIError(...) // forces rollback } ... }) // rollback has now completed, now we do potentially blocking network work if timeoutResp != nil { return timeoutResp() } ``` Also addressed some @coderabbitai feedback around log messages in advance. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
Description
This applies the new
WithTxpattern from #462 to thevpchandlers (Create/Update/Delete plus the virtualization-type update).The
UpdateVirtualizationhandler uses thetimeoutRespclosure pattern forcommon.TerminateWorkflowOnTimeOutso the helper still runs after the tx unwinds; the other three retain their inlinestc.TerminateWorkflowpaths.Keeping these PRs smaller and tightly scoped so they're:
I do I wish the diffs were easier to read, but it is what it is!
Signed-off-by: Chet Nichols III chetn@nvidia.com
Type of Change
Services Affected
Related Issues (Optional)
Breaking Changes
Testing
Additional Notes